nvidia2

author: leshe4ka46 <alex9102naid1@ya.ru> 2025-10-18 12:25:53 +0300
committer: leshe4ka46 <alex9102naid1@ya.ru> 2025-10-18 12:25:53 +0300
commit: 910a222fa60ce6ea0831f2956470b8a0b9f62670 (patch)
tree: 1d6bbccafb667731ad127f93390761100fc11b53 /Fundamentals_of_Accelerated_Data_Science/1-03_memory_management.ipynb
parent: 35b9040e4104b0e79bf243a2c9769c589f96e2c4 (diff)
1 files changed, 1135 insertions, 0 deletions
diff --git a/Fundamentals_of_Accelerated_Data_Science/1-03_memory_management.ipynb b/Fundamentals_of_Accelerated_Data_Science/1-03_memory_management.ipynb
new file mode 100644
index 0000000..cae5373
--- /dev/null
+++ b/Fundamentals_of_Accelerated_Data_Science/1-03_memory_management.ipynb
@@ -0,0 +1,1135 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "def31b0f-921a-43eb-9807-8b9b31eb7b32",
+   "metadata": {},
+   "source": [
+    "<img src=\"./images/DLI_Header.png\" width=400/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a0fd4dd-f7be-4c90-8ddd-384a760ac04f",
+   "metadata": {},
+   "source": [
+    "# Fundamentals of Accelerated Data Science # "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6a8fdf2e-a481-455e-8a52-8be8472b63bf",
+   "metadata": {},
+   "source": [
+    "## 03 - Memory Management ##\n",
+    "\n",
+    "**Table of Contents**\n",
+    "<br>\n",
+    "This notebook explores the relationship between data and memory. It covers the sections below: \n",
+    "1. [Memory Management](#Memory-Management)\n",
+    "    * [Memory Usage](#Memory-Usage)\n",
+    "2. [Data Types](#Data-Types)\n",
+    "    * [Convert Data Types](#Convert-Data-Types)\n",
+    "    * [Exercise #1 - Modify `dtypes`](#Exercise-#1---Modify-dtypes)\n",
+    "    * [Categorical](#Categorical)\n",
+    "3. [Efficient Data Loading](#Efficient-Data-Loading)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1b59367c-48bc-4c72-b1f4-4cfdfa5470cf",
+   "metadata": {},
+   "source": [
+    "## Memory Management ##\n",
+    "During the data acquisition process, data is transferred to memory in order to be operated on by the processor. Memory management is crucial for cuDF and GPU operations for several key reasons: \n",
+    "* **Limited GPU memory**: GPUs typically have less memory than the host system memory available to the CPU, so efficient memory management is essential to make the most of the available GPU memory, especially for large datasets.\n",
+    "* **Data transfer overhead**: Transferring data between CPU and GPU memory is relatively slow compared to GPU computation speed. Minimizing these transfers through smart memory management is critical for performance.\n",
+    "* **Performance tuning**: Understanding and optimizing memory usage is key to achieving peak performance in GPU-accelerated data processing tasks.\n",
+    "\n",
+    "When done correctly, keeping the data on the GPU can enable cuDF and the RAPIDS ecosystem to achieve significant performance improvements, handle larger datasets, and provide more efficient data processing capabilities. \n",
+    "\n",
+    "Below we import the data from the csv file. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "b7b8a623-f799-4dad-aca9-0e571bb6e527",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "import pandas as pd\n",
+    "import random\n",
+    "import time"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "711d0a7f-8598-49fc-949c-5caf6029ce47",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>age</th>\n",
+       "      <th>sex</th>\n",
+       "      <th>county</th>\n",
+       "      <th>lat</th>\n",
+       "      <th>long</th>\n",
+       "      <th>name</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.533644</td>\n",
+       "      <td>-1.524401</td>\n",
+       "      <td>FRANCIS</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.426256</td>\n",
+       "      <td>-1.465314</td>\n",
+       "      <td>EDWARD</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.555200</td>\n",
+       "      <td>-1.496417</td>\n",
+       "      <td>TEDDY</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.547906</td>\n",
+       "      <td>-1.572341</td>\n",
+       "      <td>ANGUS</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.477639</td>\n",
+       "      <td>-1.605995</td>\n",
+       "      <td>CHARLIE</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   age sex      county        lat      long     name\n",
+       "0    0   m  DARLINGTON  54.533644 -1.524401  FRANCIS\n",
+       "1    0   m  DARLINGTON  54.426256 -1.465314   EDWARD\n",
+       "2    0   m  DARLINGTON  54.555200 -1.496417    TEDDY\n",
+       "3    0   m  DARLINGTON  54.547906 -1.572341    ANGUS\n",
+       "4    0   m  DARLINGTON  54.477639 -1.605995  CHARLIE"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "df=pd.read_csv('./data/uk_pop.csv')\n",
+    "\n",
+    "# preview\n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36416fd0-7081-42aa-bf31-d1231b81ec0b",
+   "metadata": {},
+   "source": [
+    "### Memory Usage ###\n",
+    "Memory utilization of a DataFrame depends on the data types of its columns.\n",
+    "\n",
+    "<p><img src='images/dtypes.png' width=720></p>\n",
+    "\n",
+    "We can use `DataFrame.memory_usage()` to see the memory usage for each column (in bytes). Most of the common data types have a fixed size in memory, such as `int`, `float`, `datetime`, and `bool`. Memory usage for these data types is the per-value size multiplied by the number of data points. For the `string` data type, the memory usage reported _for pandas_ by default is the number of elements times 8 bytes. This accounts for the 64 bits required for each pointer to an address in memory, but not for the memory used by the actual string values. Each string value itself requires 49 bytes plus an additional byte for each character. Setting the `deep` parameter to `True` provides a more accurate memory usage report that also accounts for the system-level memory consumption of the contained `string` values. \n",
+    "\n",
+    "Below we get the memory usage. "
+   ]
+  },
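+  {
+   "cell_type": "markdown",
+   "id": "sketch-shallow-vs-deep",
+   "metadata": {},
+   "source": [
+    "As a small-scale illustration of the difference between the default report and `deep=True`, the sketch below compares both on a three-element Series. The Series `s` and its values are made up for this example, and `dtype='object'` is set explicitly because newer pandas versions may infer a dedicated string type instead. It is a rough check under those assumptions, not part of the original course material. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sketch-shallow-vs-deep-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# illustrative sketch, not part of the original course material\n",
+    "import sys\n",
+    "import pandas as pd\n",
+    "\n",
+    "# made-up sample values stored as Python string objects\n",
+    "s=pd.Series(['YORK', 'WIGAN', 'BARNET'], dtype='object')\n",
+    "\n",
+    "# default (shallow) report: one 8-byte pointer per element\n",
+    "shallow=s.memory_usage(index=False)\n",
+    "\n",
+    "# deep report: pointers plus the size of each Python string object\n",
+    "deep=s.memory_usage(index=False, deep=True)\n",
+    "\n",
+    "# the difference should match the summed sizes of the string objects\n",
+    "object_bytes=sum(sys.getsizeof(x) for x in s)\n",
+    "print(shallow, deep, deep-shallow==object_bytes)"
+   ]
+  },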
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "8378207b-2d9e-4102-8408-c2dddafc8a40",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "RangeIndex: 58479894 entries, 0 to 58479893\n",
+      "Data columns (total 6 columns):\n",
+      " #   Column  Dtype  \n",
+      "---  ------  -----  \n",
+      " 0   age     int64  \n",
+      " 1   sex     object \n",
+      " 2   county  object \n",
+      " 3   lat     float64\n",
+      " 4   long    float64\n",
+      " 5   name    object \n",
+      "dtypes: float64(2), int64(1), object(3)\n",
+      "memory usage: 11.5 GB\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Index            128\n",
+       "age        467839152\n",
+       "sex       3391833852\n",
+       "county    3934985133\n",
+       "lat        467839152\n",
+       "long       467839152\n",
+       "name      3666922374\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "# pandas memory utilization\n",
+    "df.info(memory_usage='deep')\n",
+    "mem_usage_df=df.memory_usage(deep=True)\n",
+    "mem_usage_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "07c24bb1-c4f7-440c-a949-d4c57800ec61",
+   "metadata": {},
+   "source": [
+    "Below we define a `make_decimal()` function to convert memory size into units based on powers of 2. In contrast to units based on powers of 10, this customary convention is commonly used to report memory capacity. More information about the two definitions can be found [here](https://en.wikipedia.org/wiki/Byte#Multiple-byte_units). "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "5ae42218-1547-49fd-9123-ab508a2b03de",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "suffixes = ['B', 'kB', 'MB', 'GB', 'TB', 'PB']\n",
+    "def make_decimal(nbytes):\n",
+    "    i=0\n",
+    "    while nbytes >= 1024 and i < len(suffixes)-1:\n",
+    "        nbytes/=1024.\n",
+    "        i+=1\n",
+    "    f=('%.2f' % nbytes).rstrip('0').rstrip('.')\n",
+    "    return '%s %s' % (f, suffixes[i])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "e6d4a613-3eea-4dce-8e71-39593ff6f226",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'11.55 GB'"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "make_decimal(mem_usage_df.sum())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a352c0b2-65aa-4231-b753-556aca46ff49",
+   "metadata": {},
+   "source": [
+    "Below we calculate the memory usage manually based on the data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "630327b9-6dc1-4b70-9fdf-9f7763ec4d50",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Numerical columns use 467839152 bytes of memory\n"
+     ]
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "# get number of rows\n",
+    "num_rows=len(df)\n",
+    "\n",
+    "# 64-bit numbers use 8 bytes of memory each\n",
+    "print(f'Numerical columns use {num_rows*8} bytes of memory')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "bb22b5f4-e38f-438e-9426-61746b509e50",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "county column uses 3934985133 bytes of memory.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "# check random string-typed column\n",
+    "string_cols=[col for col in df.columns if df[col].dtype=='object' ]\n",
+    "column_to_check=random.choice(string_cols)\n",
+    "\n",
+    "overhead=49\n",
+    "pointer_size=8\n",
+    "\n",
+    "# missing values are float NaN, which is truthy and has no len(), so test the type explicitly\n",
+    "# a NaN entry is counted as 32 bytes of memory\n",
+    "string_col_mem_usage_df=df[column_to_check].map(lambda x: len(x)+overhead+pointer_size if isinstance(x, str) else 32)\n",
+    "string_col_mem_usage=string_col_mem_usage_df.sum()\n",
+    "print(f'{column_to_check} column uses {string_col_mem_usage} bytes of memory.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "94e393c2-c0d0-40ee-82d2-730c4667e9b8",
+   "metadata": {},
+   "source": [
+    "**Note**: The `string` data type is stored differently in cuDF than it is in pandas. More information about how `libcudf` stores string data using the [Arrow format](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) can be found [here](https://developer.nvidia.com/blog/mastering-string-transformations-in-rapids-libcudf/). "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "737ff50b-9426-4e08-a00a-d7ee69f48b9f",
+   "metadata": {},
+   "source": [
+    "## Data Types ##\n",
+    "By default, pandas (and cuDF) use 64-bit types for numerical values. 64-bit numbers provide the highest precision, but many applications do not require 64-bit precision, even when aggregating over a very large number of data points. When possible, using 32-bit numbers cuts storage and memory requirements in half, and it typically also speeds up computations considerably because only half as much data needs to be accessed in memory. "
+   ]
+  },
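+  {
+   "cell_type": "markdown",
+   "id": "sketch-float32-halving",
+   "metadata": {},
+   "source": [
+    "The halving described above, and the small loss of precision that comes with it, can be checked on a tiny array. The sketch below uses NumPy directly (an assumption of this example; the notebook itself only imports pandas) with two coordinates taken from the preview shown earlier. The variable names are hypothetical. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sketch-float32-halving-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# illustrative sketch, not part of the original course material\n",
+    "import numpy as np\n",
+    "\n",
+    "values64=np.array([54.533644, -1.524401], dtype='float64')\n",
+    "values32=values64.astype('float32')\n",
+    "\n",
+    "# 8 bytes per element versus 4 bytes per element\n",
+    "print(values64.nbytes, values32.nbytes)\n",
+    "\n",
+    "# float32 keeps roughly 7 significant decimal digits, so a small rounding difference appears\n",
+    "print(np.abs(values64-values32.astype('float64')).max())"
+   ]
+  },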
+  {
+   "cell_type": "markdown",
+   "id": "0b77d450-c415-44b8-87ac-20ce616ec809",
+   "metadata": {},
+   "source": [
+    "### Convert Data Types ###\n",
+    "The `.astype()` method can be used to convert numerical data types to use different bit-size containers. Here we convert the `age` column from `int64` to `int8`. "
+   ]
+  },
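+  {
+   "cell_type": "markdown",
+   "id": "sketch-int8-range-check",
+   "metadata": {},
+   "source": [
+    "Converting to a smaller integer type only preserves the data when every value fits in the target range. As a hedged sketch, the cell below checks the range with `numpy.iinfo` before converting. The `ages` Series and its values are made up for illustration, and the NumPy import is an assumption of this example. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sketch-int8-range-check-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# illustrative sketch, not part of the original course material\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "# made-up sample values standing in for an age column\n",
+    "ages=pd.Series([0, 35, 90], dtype='int64')\n",
+    "\n",
+    "# int8 can represent whole numbers from -128 to 127\n",
+    "limits=np.iinfo(np.int8)\n",
+    "fits=bool(ages.min() >= limits.min and ages.max() <= limits.max)\n",
+    "print(limits.min, limits.max, fits)\n",
+    "\n",
+    "# convert only when every value fits the target range\n",
+    "if fits:\n",
+    "    ages=ages.astype('int8')\n",
+    "print(ages.dtype)"
+   ]
+  },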
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "603f7c70-134e-4466-a790-8a18b9088ca6",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "age          int8\n",
+       "sex        object\n",
+       "county     object\n",
+       "lat       float64\n",
+       "long      float64\n",
+       "name       object\n",
+       "dtype: object"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "df['age']=df['age'].astype('int8')\n",
+    "\n",
+    "df.dtypes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "973a6dd4-2aef-44d9-8b01-8853032eddae",
+   "metadata": {},
+   "source": [
+    "### Exercise #1 - Modify `dtypes` ###\n",
+    "**Instructions**: <br>\n",
+    "* Modify the `<FIXME>` only and execute the below cell to convert any 64-bit data types to their 32-bit counterparts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "beb7d71b-6672-462e-b65c-a64dbe5f7a57",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df['lat']=df['lat'].astype('float32')\n",
+    "df['long']=df['long'].astype('float32')"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "id": "3b44fb22-a0f1-4e43-a332-1ccbad50caee",
+   "metadata": {},
+   "source": [
+    "\n",
+    "df['lat']=df['lat'].astype('float32')\n",
+    "df['long']=df['long'].astype('float32')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "98b6542d-22cc-4926-b600-a3e052c37c96",
+   "metadata": {},
+   "source": [
+    "Click ... for solution. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b2cd622-977c-4915-a87f-2fe03c1793f5",
+   "metadata": {},
+   "source": [
+    "### Categorical ###\n",
+    "Categorical data represents discrete, distinct categories or groups. The categories can have a meaningful order or ranking but generally cannot be used for numerical operations. When appropriate, using the `categorical` data type can reduce memory usage and lead to faster operations. It can also be used to define and maintain a custom order of categories. \n",
+    "\n",
+    "Below we get the number of unique values in the string columns. "
+   ]
+  },
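+  {
+   "cell_type": "markdown",
+   "id": "sketch-ordered-categories",
+   "metadata": {},
+   "source": [
+    "To illustrate the custom ordering mentioned above, the sketch below builds an ordered categorical with `pd.CategoricalDtype`. The `sizes` Series and the `low`/`medium`/`high` labels are made up for this example and do not come from the dataset. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sketch-ordered-categories-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# illustrative sketch, not part of the original course material\n",
+    "import pandas as pd\n",
+    "\n",
+    "# made-up labels with a natural ranking\n",
+    "sizes=pd.Series(['medium', 'low', 'high', 'low'])\n",
+    "ordered_dtype=pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)\n",
+    "sizes=sizes.astype(ordered_dtype)\n",
+    "\n",
+    "# sorting follows the declared category order rather than alphabetical order\n",
+    "print(sizes.sort_values().tolist())\n",
+    "\n",
+    "# comparisons are allowed because the categories are ordered\n",
+    "print((sizes >= 'medium').tolist())"
+   ]
+  },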
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "f249e4b8-5d7a-4b44-ac15-bd3360a43f2a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "sex           2\n",
+       "county      171\n",
+       "name      13212\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "df.select_dtypes(include='object').nunique()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f1d8bd88-b39b-4043-9039-d8bd75fe851a",
+   "metadata": {},
+   "source": [
+    "Below we convert columns with few discrete values to `category`. The `category` data type has `.categories` and `.codes` properties that are accessed through `.cat`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "a99bebbf-2e5b-4720-96f9-9fd7d42d2fe8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "df['sex']=df['sex'].astype('category')\n",
+    "df['county']=df['county'].astype('category')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "41b7b290-cfcf-4ff6-b6b4-454c19b44a62",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Index(['BARKING AND DAGENHAM', 'BARNET', 'BARNSLEY',\n",
+       "       'BATH AND NORTH EAST SOMERSET', 'BEDFORD', 'BEXLEY', 'BIRMINGHAM',\n",
+       "       'BLACKBURN WITH DARWEN', 'BLACKPOOL', 'BLAENAU GWENT',\n",
+       "       ...\n",
+       "       'WESTMINSTER', 'WIGAN', 'WILTSHIRE', 'WINDSOR AND MAIDENHEAD', 'WIRRAL',\n",
+       "       'WOKINGHAM', 'WOLVERHAMPTON', 'WORCESTERSHIRE', 'WREXHAM', 'YORK'],\n",
+       "      dtype='object', length=171)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "----------------------------------------\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "0           37\n",
+       "1           37\n",
+       "2           37\n",
+       "3           37\n",
+       "4           37\n",
+       "            ..\n",
+       "58479889    96\n",
+       "58479890    96\n",
+       "58479891    96\n",
+       "58479892    96\n",
+       "58479893    96\n",
+       "Length: 58479894, dtype: int16"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "display(df['county'].cat.categories)\n",
+    "print('-'*40)\n",
+    "display(df['county'].cat.codes)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "01d12a78-5f70-4152-a708-68ee68046a1e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "age           int8\n",
+       "sex       category\n",
+       "county    category\n",
+       "lat        float32\n",
+       "long       float32\n",
+       "name        object\n",
+       "dtype: object"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df.dtypes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "24138ffc-80b2-46ea-930d-1c1ab9706b10",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>age</th>\n",
+       "      <th>sex</th>\n",
+       "      <th>county</th>\n",
+       "      <th>lat</th>\n",
+       "      <th>long</th>\n",
+       "      <th>name</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.533646</td>\n",
+       "      <td>-1.524401</td>\n",
+       "      <td>FRANCIS</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.426254</td>\n",
+       "      <td>-1.465314</td>\n",
+       "      <td>EDWARD</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.555199</td>\n",
+       "      <td>-1.496417</td>\n",
+       "      <td>TEDDY</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.547905</td>\n",
+       "      <td>-1.572341</td>\n",
+       "      <td>ANGUS</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0</td>\n",
+       "      <td>m</td>\n",
+       "      <td>DARLINGTON</td>\n",
+       "      <td>54.477638</td>\n",
+       "      <td>-1.605994</td>\n",
+       "      <td>CHARLIE</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   age sex      county        lat      long     name\n",
+       "0    0   m  DARLINGTON  54.533646 -1.524401  FRANCIS\n",
+       "1    0   m  DARLINGTON  54.426254 -1.465314   EDWARD\n",
+       "2    0   m  DARLINGTON  54.555199 -1.496417    TEDDY\n",
+       "3    0   m  DARLINGTON  54.547905 -1.572341    ANGUS\n",
+       "4    0   m  DARLINGTON  54.477638 -1.605994  CHARLIE"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3d0addcc-c078-42f5-a66a-3bb9a969d7e8",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "737385ab-677c-4bef-a86a-10aa3119e29a",
+   "metadata": {},
+   "source": [
+    "**Note**: `.astype()` can also be used to convert data to `datetime` or `object` to enable datetime and string methods. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "552c47c2-0fbc-455e-8745-cb98fc777243",
+   "metadata": {},
+   "source": [
+    "## Efficient Data Loading ##\n",
+    "It is often advantageous to specify the most appropriate data type for each column when loading the data, based on its range, precision requirements, and how it is used. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "c2b9f0c3-8598-4a28-9481-ce28fea7544b",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Index            128\n",
+       "age        467839152\n",
+       "sex       3391833852\n",
+       "county    3934985133\n",
+       "lat        467839152\n",
+       "long       467839152\n",
+       "name      3666922374\n",
+       "dtype: int64"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loading 11.55 GB took 33.87 seconds.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "start=time.time()\n",
+    "df=pd.read_csv('./data/uk_pop.csv')\n",
+    "duration=time.time()-start\n",
+    "\n",
+    "mem_usage_df=df.memory_usage(deep=True)\n",
+    "display(mem_usage_df)\n",
+    "\n",
+    "print(f'Loading {make_decimal(mem_usage_df.sum())} took {round(duration, 2)} seconds.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5729520e-3ed8-4ec6-ae1f-ba46d642f48d",
+   "metadata": {},
+   "source": [
+    "Below we enable `cudf.pandas` to see the difference. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "99aa0f32-4d2a-43a7-bec1-f1b88bcc37c2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "%load_ext cudf.pandas\n",
+    "\n",
+    "import pandas as pd\n",
+    "import time"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "2b724201-9ad1-4e9b-b712-f3b31bdc4104",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "suffixes = ['B', 'kB', 'MB', 'GB', 'TB', 'PB']\n",
+    "def make_decimal(nbytes):\n",
+    "    i=0\n",
+    "    while nbytes >= 1024 and i < len(suffixes)-1:\n",
+    "        nbytes/=1024.\n",
+    "        i+=1\n",
+    "    f=('%.2f' % nbytes).rstrip('0').rstrip('.')\n",
+    "    return '%s %s' % (f, suffixes[i])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "99bdd7b0-8563-41db-bd8e-3a7279394ede",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "age        58479894\n",
+       "sex        58479908\n",
+       "county     58482446\n",
+       "lat       467839152\n",
+       "long      467839152\n",
+       "name      117096917\n",
+       "Index             0\n",
+       "dtype: int64"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loading 1.14 GB took 2.12 seconds.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-style: italic\">                                                                                                                   </span>\n",
+       "<span style=\"font-style: italic\">                                             Total time elapsed: 2.687 seconds                                     </span>\n",
+       "<span style=\"font-style: italic\">                                                                                                                   </span>\n",
+       "<span style=\"font-style: italic\">                                                           Stats                                                   </span>\n",
+       "<span style=\"font-style: italic\">                                                                                                                   </span>\n",
+       "┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
+       "┃<span style=\"font-weight: bold\"> Line no. </span>┃<span style=\"font-weight: bold\"> Line                                                                     </span>┃<span style=\"font-weight: bold\"> GPU TIME(s) </span>┃<span style=\"font-weight: bold\"> CPU TIME(s) </span>┃\n",
+       "┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
+       "│ 2        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    start</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time()</span><span style=\"background-color: #272822\">                                                   </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 5        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    dtype_dict</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">{</span><span style=\"background-color: #272822\">                                                        </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 6        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">        </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'age'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'int8'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, </span><span style=\"background-color: #272822\">                                                 </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 7        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">        </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'sex'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'category'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, </span><span style=\"background-color: #272822\">                                             </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 8        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">        </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'county'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'category'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, </span><span style=\"background-color: #272822\">                                          </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 9        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">        </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'lat'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'float64'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, </span><span style=\"background-color: #272822\">                                              </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 10       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">        </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'long'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'float64'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, </span><span style=\"background-color: #272822\">                                             </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 11       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">        </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'name'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'category'</span><span style=\"background-color: #272822\">                                              </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 14       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    efficient_df</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">pd</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">read_csv(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'./data/uk_pop.csv'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, dtype</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">dtype_dict)</span><span style=\"background-color: #272822\">     </span> │ 1.718211215 │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 15       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    duration</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time()</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">-</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">start</span><span style=\"background-color: #272822\">                                          </span> │             │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 17       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    mem_usage_df</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">efficient_df</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">memory_usage(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'deep'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                      </span> │ 0.005751408 │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 18       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    display(mem_usage_df)</span><span style=\"background-color: #272822\">                                               </span> │ 0.011270449 │ 0.007067286 │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "│ 20       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    print(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">f'Loading {</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">make_decimal(mem_usage_df</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">sum())</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">} took {</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">round(dura…</span> │ 0.004789912 │             │\n",
+       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
+       "└──────────┴──────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────┘\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[3m                                                                                                                   \u001b[0m\n",
+       "\u001b[3m                                             Total time elapsed: 2.687 seconds                                     \u001b[0m\n",
+       "\u001b[3m                                                                                                                   \u001b[0m\n",
+       "\u001b[3m                                                           Stats                                                   \u001b[0m\n",
+       "\u001b[3m                                                                                                                   \u001b[0m\n",
+       "┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
+       "┃\u001b[1m \u001b[0m\u001b[1mLine no.\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mLine                                                                    \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mGPU TIME(s)\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mCPU TIME(s)\u001b[0m\u001b[1m \u001b[0m┃\n",
+       "┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
+       "│ 2        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstart\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                                   \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 5        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdtype_dict\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m{\u001b[0m\u001b[48;2;39;40;34m                                                        \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 6        │ \u001b[38;2;248;248;242;48;2;39;40;34m        \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mage\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mint8\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[48;2;39;40;34m                                                 \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 7        │ \u001b[38;2;248;248;242;48;2;39;40;34m        \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34msex\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcategory\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[48;2;39;40;34m                                             \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 8        │ \u001b[38;2;248;248;242;48;2;39;40;34m        \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcounty\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcategory\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[48;2;39;40;34m                                          \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 9        │ \u001b[38;2;248;248;242;48;2;39;40;34m        \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mlat\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mfloat64\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[48;2;39;40;34m                                              \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 10       │ \u001b[38;2;248;248;242;48;2;39;40;34m        \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mlong\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mfloat64\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[48;2;39;40;34m                                             \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 11       │ \u001b[38;2;248;248;242;48;2;39;40;34m        \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mname\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcategory\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[48;2;39;40;34m                                              \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 14       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mefficient_df\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mpd\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mread_csv\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m./data/uk_pop.csv\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdtype\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdtype_dict\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m     \u001b[0m │ 1.718211215 │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 15       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mduration\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m-\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstart\u001b[0m\u001b[48;2;39;40;34m                                          \u001b[0m │             │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 17       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mmem_usage_df\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mefficient_df\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mmemory_usage\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mdeep\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                      \u001b[0m │ 0.005751408 │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 18       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdisplay\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mmem_usage_df\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                               \u001b[0m │ 0.011270449 │ 0.007067286 │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "│ 20       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mprint\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mf\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mLoading \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m{\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mmake_decimal\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mmem_usage_df\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34msum\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m}\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m took \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m{\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mround\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdura…\u001b[0m │ 0.004789912 │             │\n",
+       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
+       "└──────────┴──────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────┘\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%cudf.pandas.line_profile\n",
+    "# DO NOT CHANGE THIS CELL\n",
+    "start=time.time()\n",
+    "\n",
+    "# define data types for each column\n",
+    "dtype_dict={\n",
+    "    'age': 'int8', \n",
+    "    'sex': 'category', \n",
+    "    'county': 'category', \n",
+    "    'lat': 'float64', \n",
+    "    'long': 'float64', \n",
+    "    'name': 'category'\n",
+    "}\n",
+    "        \n",
+    "efficient_df=pd.read_csv('./data/uk_pop.csv', dtype=dtype_dict)\n",
+    "duration=time.time()-start\n",
+    "\n",
+    "mem_usage_df=efficient_df.memory_usage('deep')\n",
+    "display(mem_usage_df)\n",
+    "\n",
+    "print(f'Loading {make_decimal(mem_usage_df.sum())} took {round(duration, 2)} seconds.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f4607d8-6de3-4b27-96d4-a9720d268333",
+   "metadata": {},
+   "source": [
+    "By specifying data types at load time, we were able to load the data faster and with a smaller memory footprint. \n",
+    "\n",
+    "**Note**: The memory in use on the GPU is larger than the memory used by the DataFrame. This is expected: intermediate steps in the data loading process, in this case parsing the CSV file, also consume GPU memory. \n",
+    "\n",
+    "```\n",
+    "+-----------------------------------------------------------------------------+\n",
+    "| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |\n",
+    "|-------------------------------+----------------------+----------------------+\n",
+    "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
+    "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
+    "|                               |                      |               MIG M. |\n",
+    "|===============================+======================+======================|\n",
+    "|   0  Tesla T4            Off  | 00000000:00:1B.0 Off |                    0 |\n",
+    "| N/A   32C    P0    26W /  70W |   1378MiB / 15360MiB |      0%      Default |\n",
+    "|                               |                      |                  N/A |\n",
+    "+-------------------------------+----------------------+----------------------+\n",
+    "|   1  Tesla T4            Off  | 00000000:00:1C.0 Off |                    0 |\n",
+    "| N/A   31C    P0    26W /  70W |    168MiB / 15360MiB |      0%      Default |\n",
+    "|                               |                      |                  N/A |\n",
+    "+-------------------------------+----------------------+----------------------+\n",
+    "|   2  Tesla T4            Off  | 00000000:00:1D.0 Off |                    0 |\n",
+    "| N/A   30C    P0    26W /  70W |    168MiB / 15360MiB |      0%      Default |\n",
+    "|                               |                      |                  N/A |\n",
+    "+-------------------------------+----------------------+----------------------+\n",
+    "|   3  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |\n",
+    "| N/A   30C    P0    26W /  70W |    168MiB / 15360MiB |      0%      Default |\n",
+    "|                               |                      |                  N/A |\n",
+    "+-------------------------------+----------------------+----------------------+\n",
+    "                                                                               \n",
+    "+-----------------------------------------------------------------------------+\n",
+    "| Processes:                                                                  |\n",
+    "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
+    "|        ID   ID                                                   Usage      |\n",
+    "|=============================================================================|\n",
+    "+-----------------------------------------------------------------------------+\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "92f7ee37-4acb-46aa-bb73-4c0139d3f6b8",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Sat Oct 11 16:44:59 2025       \n",
+      "+-----------------------------------------------------------------------------+\n",
+      "| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |\n",
+      "|-------------------------------+----------------------+----------------------+\n",
+      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
+      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
+      "|                               |                      |               MIG M. |\n",
+      "|===============================+======================+======================|\n",
+      "|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |\n",
+      "| N/A   30C    P0    25W /  70W |  11338MiB / 15360MiB |      0%      Default |\n",
+      "|                               |                      |                  N/A |\n",
+      "+-------------------------------+----------------------+----------------------+\n",
+      "|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |\n",
+      "| N/A   31C    P0    25W /  70W |    168MiB / 15360MiB |      0%      Default |\n",
+      "|                               |                      |                  N/A |\n",
+      "+-------------------------------+----------------------+----------------------+\n",
+      "|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |\n",
+      "| N/A   31C    P0    25W /  70W |    168MiB / 15360MiB |      0%      Default |\n",
+      "|                               |                      |                  N/A |\n",
+      "+-------------------------------+----------------------+----------------------+\n",
+      "|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |\n",
+      "| N/A   31C    P0    25W /  70W |    168MiB / 15360MiB |      0%      Default |\n",
+      "|                               |                      |                  N/A |\n",
+      "+-------------------------------+----------------------+----------------------+\n",
+      "                                                                               \n",
+      "+-----------------------------------------------------------------------------+\n",
+      "| Processes:                                                                  |\n",
+      "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
+      "|        ID   ID                                                   Usage      |\n",
+      "|=============================================================================|\n",
+      "+-----------------------------------------------------------------------------+\n"
+     ]
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "!nvidia-smi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c031d2c7-03cb-4ac7-a195-70fc25cb191d",
+   "metadata": {},
+   "source": [
+    "Loading data with compact data types lets us fit more rows into the same amount of memory. The optimal dataset size depends on the operations being performed, the complexity of the workload, and the available GPU memory. To maximize acceleration, a dataset should fit within GPU memory with ample headroom for operations that temporarily spike memory use. As a general rule of thumb, cuDF recommends datasets that are smaller than 50% of the GPU memory capacity. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "ec6cefea-dc64-4f13-815e-081cd35651b9",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "We can load 408997980 rows.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
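+    "# 1 GiB = 1073741824 bytes (2**30); this estimate assumes 16 GiB of GPU memory\n",
+    "mem_capacity=16*1073741824\n",
+    "\n",
+    "# average memory footprint of one row, in bytes\n",
+    "mem_per_record=mem_usage_df.sum()/len(efficient_df)\n",
+    "\n",
+    "# apply the 50% rule of thumb: budget at most half of the GPU memory for data\n",
+    "print(f'We can load {int(mem_capacity/2/mem_per_record)} rows.')"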
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "ddaaa1ac-66ec-4323-9842-2543c6d85e4e",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'status': 'ok', 'restart': True}"
+      ]
+     },
+     "execution_count": 23,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# DO NOT CHANGE THIS CELL\n",
+    "import IPython\n",
+    "app = IPython.Application.instance()\n",
+    "app.kernel.do_shutdown(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "658e9847-775f-4d12-af4e-8f896df4e6fe",
+   "metadata": {},
+   "source": [
+    "**Well Done!** Let's move to the [next notebook](1-04_interoperability.ipynb). "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b86451cf-60e6-4733-b431-1bc0bd586bc2",
+   "metadata": {},
+   "source": [
+    "<img src=\"./images/DLI_Header.png\" width=400/>"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}