<img src="./images/DLI_Header.png" width=400/>

# Fundamentals of Accelerated Data Science # 

## 03 - Memory Management ##

**Table of Contents**
<br>
This notebook explores the relationship between data and memory. It covers the following sections: 
1. [Memory Management](#Memory-Management)
    * [Memory Usage](#Memory-Usage)
2. [Data Types](#Data-Types)
    * [Convert Data Types](#Convert-Data-Types)
    * [Exercise #1 - Modify `dtypes`](#Exercise-#1---Modify-dtypes)
    * [Categorical](#Categorical)
3. [Efficient Data Loading](#Efficient-Data-Loading)

## Memory Management ##
During the data acquisition process, data is transferred to memory in order to be operated on by the processor. Memory management is crucial for cuDF and GPU operations for several key reasons: 
* **Limited GPU memory**: GPUs typically have less memory than the host RAM available to the CPU, so efficient memory management is essential to make the most of the available GPU memory, especially for large datasets.
* **Data transfer overhead**: Transferring data between CPU and GPU memory is relatively slow compared to GPU computation speed. Minimizing these transfers through smart memory management is critical for performance.
* **Performance tuning**: Understanding and optimizing memory usage is key to achieving peak performance in GPU-accelerated data processing tasks.

When done correctly, keeping the data on the GPU can enable cuDF and the RAPIDS ecosystem to achieve significant performance improvements, handle larger datasets, and provide more efficient data processing capabilities. 

Below we import the data from the CSV file. 

In [1]:
# DO NOT CHANGE THIS CELL
import pandas as pd
import random
import time

In [2]:
# DO NOT CHANGE THIS CELL
df=pd.read_csv('./data/uk_pop.csv')

# preview
df.head()

Unnamed: 0,age,sex,county,lat,long,name
0,0,m,DARLINGTON,54.533644,-1.524401,FRANCIS
1,0,m,DARLINGTON,54.426256,-1.465314,EDWARD
2,0,m,DARLINGTON,54.5552,-1.496417,TEDDY
3,0,m,DARLINGTON,54.547906,-1.572341,ANGUS
4,0,m,DARLINGTON,54.477639,-1.605995,CHARLIE


### Memory Usage ###
Memory utilization of a DataFrame depends on the data type of each column.

<p><img src='images/dtypes.png' width=720></p>

We can use `DataFrame.memory_usage()` to see the memory usage of each column (in bytes). Most of the common data types, such as `int`, `float`, `datetime`, and `bool`, have a fixed size in memory. Memory usage for these data types is the size of one element multiplied by the number of data points. For the `string` data type, the memory usage reported _for pandas_ is the number of elements times 8 bytes. This accounts for the 64 bits required for each pointer to an address in memory, but not for the memory used by the actual string values. The actual memory required for a string value is 49 bytes plus one additional byte for each character. Setting the `deep` parameter to `True` gives a more accurate memory usage report that also accounts for the system-level memory consumption of the contained `string` values. 

Below we get the memory usage. 

In [9]:
# DO NOT CHANGE THIS CELL
# pandas memory utilization
df.info(memory_usage='deep')
mem_usage_df=df.memory_usage(deep=True)
mem_usage_df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58479894 entries, 0 to 58479893
Data columns (total 6 columns):
 #   Column  Dtype  
---  ------  -----  
 0   age     int64  
 1   sex     object 
 2   county  object 
 3   lat     float64
 4   long    float64
 5   name    object 
dtypes: float64(2), int64(1), object(3)
memory usage: 11.5 GB


Index            128
age        467839152
sex       3391833852
county    3934985133
lat        467839152
long       467839152
name      3666922374
dtype: int64
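
To see the difference between the shallow and deep reports on a small scale, the following is a minimal sketch that uses a three-element Series. The sample values are taken from the preview above, and `dtype=object` is set explicitly as an assumption so that the strings are stored as Python objects, as in the DataFrame above.

```python
import sys
import pandas as pd

# Sample values from the preview; dtype=object mirrors the string columns above.
names = ["FRANCIS", "EDWARD", "TEDDY"]
s = pd.Series(names, dtype=object)

# Shallow report: one 8-byte pointer per element.
shallow_bytes = s.memory_usage(index=False, deep=False)

# Deep report: the pointers plus the size of each Python string object.
deep_bytes = s.memory_usage(index=False, deep=True)

print(f"shallow: {shallow_bytes} bytes, deep: {deep_bytes} bytes")
```

The gap between the two numbers is the memory held by the string objects themselves, which is why the `object` columns dominate the deep report above.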

Below we define a `make_decimal()` function to convert a memory size in bytes into units based on powers of 2, so that each unit is 1024 times the previous one. This convention, rather than units based on powers of 10, is customarily used to report memory capacity. More information about the two definitions can be found [here](https://en.wikipedia.org/wiki/Byte#Multiple-byte_units). 

In [4]:
# DO NOT CHANGE THIS CELL
suffixes = ['B', 'kB', 'MB', 'GB', 'TB', 'PB']
def make_decimal(nbytes):
    i=0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes/=1024.
        i+=1
    f=('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

In [5]:
make_decimal(mem_usage_df.sum())

'11.55 GB'
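
As a rough illustration of how much the two unit conventions differ, the sketch below formats the same total under both. The byte counts are copied from the `memory_usage` output above. The `format_bytes` helper and the `GiB`/`GB` suffix lists are illustrative names, not part of the notebook.

```python
# Per-column byte counts copied from the memory_usage output above.
column_bytes = [128, 467839152, 3391833852, 3934985133,
                467839152, 467839152, 3666922374]
total = sum(column_bytes)

def format_bytes(nbytes, base, suffixes):
    """Divide by `base` until the value is smaller than one unit step."""
    value = float(nbytes)
    i = 0
    while value >= base and i < len(suffixes) - 1:
        value /= base
        i += 1
    return f"{value:.2f} {suffixes[i]}"

# Powers of 2 (the convention used by make_decimal above).
binary = format_bytes(total, 1024, ["B", "KiB", "MiB", "GiB", "TiB"])
# Powers of 10.
decimal = format_bytes(total, 1000, ["B", "kB", "MB", "GB", "TB"])

print(binary)   # 11.55 GiB
print(decimal)  # 12.40 GB
```

The same DataFrame is about 7% larger when stated in powers of 10, so it helps to know which convention a tool reports.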

Below we calculate the memory usage manually based on the data types. 

In [6]:
# DO NOT CHANGE THIS CELL
# get number of rows
num_rows=len(df)

# 64-bit numbers use 8 bytes of memory
print(f'Numerical columns use {num_rows*8} bytes of memory')

Numerical columns use 467839152 bytes of memory


In [7]:
# DO NOT CHANGE THIS CELL
# check random string-typed column
string_cols=[col for col in df.columns if df[col].dtype=='object' ]
column_to_check=random.choice(string_cols)

overhead=49
pointer_size=8

# a missing value (NaN) is a float, not a string, so it has no length
# nan uses 32 bytes of memory
string_col_mem_usage_df=df[column_to_check].map(lambda x: len(x)+overhead+pointer_size if isinstance(x, str) else 32)
string_col_mem_usage=string_col_mem_usage_df.sum()
print(f'{column_to_check} column uses {string_col_mem_usage} bytes of memory.')

county column uses 3934985133 bytes of memory.
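
The overhead and per-character figures used in the manual calculation can be checked with `sys.getsizeof`. This is a minimal sketch that assumes CPython and ASCII-only text. The fixed overhead is 49 bytes on the interpreter used for this notebook but may differ on other Python versions, so the code measures it rather than hard-coding it.

```python
import sys

# Fixed cost of a string object, measured on the running interpreter.
empty_overhead = sys.getsizeof("")

county = "DARLINGTON"  # value taken from the preview above
county_size = sys.getsizeof(county)

# For ASCII text, each character adds one byte on top of the fixed overhead.
per_char = (county_size - empty_overhead) / len(county)

pointer_size = 8  # 64-bit reference stored in the column itself
total_per_value = county_size + pointer_size

print(f"overhead={empty_overhead}, per_char={per_char}, total={total_per_value}")
```

Strings that contain non-ASCII characters can use more than one byte per character, so the simple formula is only an estimate for such data.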


**Note**: The `string` data type is stored differently in cuDF than it is in pandas. More information about how `libcudf` stores string data using the [Arrow format](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) can be found [here](https://developer.nvidia.com/blog/mastering-string-transformations-in-rapids-libcudf/). 

## Data Types ##
By default, pandas (and cuDF) uses 64 bits for numerical values. Using 64-bit numbers provides the highest precision, but many applications do not require 64-bit precision, even when aggregating over a very large number of data points. When possible, using 32-bit numbers cuts storage and memory requirements in half, and typically also speeds up computations considerably because only half as much data needs to be read from memory. 
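
The trade-off between size and precision can be sketched with NumPy on a few latitude values from the preview above. This is an illustration only; whether the rounding error is acceptable depends on the application.

```python
import numpy as np

# Latitude values taken from the preview of the dataset.
lat64 = np.array([54.533644, 54.426256, 54.5552], dtype=np.float64)
lat32 = lat64.astype(np.float32)

# Half the bits per value means half the bytes for the array.
print(lat64.nbytes, lat32.nbytes)

# The price is a small rounding error in each value.
max_error = np.abs(lat64 - lat32.astype(np.float64)).max()
print(max_error)
```

For coordinates of this magnitude, the 32-bit rounding error is on the order of a millionth of a degree.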

### Convert Data Types ###
The `.astype()` method can be used to convert numerical data types to use different bit-size containers. Here we convert the `age` column from `int64` to `int8`. 

In [10]:
# DO NOT CHANGE THIS CELL
df['age']=df['age'].astype('int8')

df.dtypes

age          int8
sex        object
county     object
lat       float64
long      float64
name       object
dtype: object
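
Before narrowing an integer column, it is worth confirming that every value fits in the target type. The sketch below uses a hypothetical sample of ages and compares the column's range against `np.iinfo`. It also shows `pd.to_numeric` with `downcast='integer'`, which picks the smallest signed integer type that can hold the values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real column holds one age per person.
ages = pd.Series([0, 17, 45, 90], dtype="int64")

# int8 can represent values from -128 to 127.
limits = np.iinfo("int8")
fits = limits.min <= ages.min() and ages.max() <= limits.max

# Only convert when no value would fall outside the int8 range.
ages_small = ages.astype("int8") if fits else ages

# Alternative: let pandas choose the smallest suitable integer type.
ages_auto = pd.to_numeric(ages, downcast="integer")

print(ages_small.dtype, ages_auto.dtype)
```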

### Exercise #1 - Modify `dtypes` ###
**Instructions**: <br>
* Modify the `<FIXME>` only and execute the below cell to convert any 64-bit data types to their 32-bit counterparts.

In [11]:
df['lat']=df['lat'].astype('float32')
df['long']=df['long'].astype('float32')


### Categorical ###
Categorical data is a type of data that represents discrete, distinct categories or groups. They can have a meaningful order or ranking but generally cannot be used for numerical operations. When appropriate, using the `categorical` data type can reduce memory usage and lead to faster operations. It can also be used to define and maintain a custom order of categories. 
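
As a small sketch of the custom ordering mentioned above, the example below defines an ordered categorical type. The `sizes` values and the `size_type` name are hypothetical and are not part of the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical values with a natural order that is not alphabetical.
sizes = pd.Series(["medium", "small", "large", "small"])

# Declare the categories and their order explicitly.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
ordered = sizes.astype(size_type)

# Sorting and comparisons follow the declared order, not the alphabet.
sorted_sizes = ordered.sort_values().tolist()
largest = ordered.max()

print(sorted_sizes, largest)
```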

Below we get the number of unique values in the string columns. 

In [12]:
# DO NOT CHANGE THIS CELL
df.select_dtypes(include='object').nunique()

sex           2
county      171
name      13212
dtype: int64

Below we convert columns with few discrete values to `category`. The `category` data type has `.categories` and `.codes` properties that are accessed through `.cat`. 

In [13]:
# DO NOT CHANGE THIS CELL
df['sex']=df['sex'].astype('category')
df['county']=df['county'].astype('category')

In [14]:
# DO NOT CHANGE THIS CELL
display(df['county'].cat.categories)
print('-'*40)
display(df['county'].cat.codes)

Index(['BARKING AND DAGENHAM', 'BARNET', 'BARNSLEY',
       'BATH AND NORTH EAST SOMERSET', 'BEDFORD', 'BEXLEY', 'BIRMINGHAM',
       'BLACKBURN WITH DARWEN', 'BLACKPOOL', 'BLAENAU GWENT',
       ...
       'WESTMINSTER', 'WIGAN', 'WILTSHIRE', 'WINDSOR AND MAIDENHEAD', 'WIRRAL',
       'WOKINGHAM', 'WOLVERHAMPTON', 'WORCESTERSHIRE', 'WREXHAM', 'YORK'],
      dtype='object', length=171)

----------------------------------------


0           37
1           37
2           37
3           37
4           37
            ..
58479889    96
58479890    96
58479891    96
58479892    96
58479893    96
Length: 58479894, dtype: int16

In [15]:
df.dtypes

age           int8
sex       category
county    category
lat        float32
long       float32
name        object
dtype: object
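
The memory effect of the conversion can be sketched on a small repeated sample. The county names below are taken from the categories listed above, and `dtype=object` is an assumption that mirrors how the string columns are stored in this notebook.

```python
import pandas as pd

# A few county names repeated many times, stored as Python string objects.
values = ["DARLINGTON", "YORK", "WIGAN"] * 1000
as_object = pd.Series(values, dtype=object)
as_category = as_object.astype("category")

object_bytes = as_object.memory_usage(index=False, deep=True)
category_bytes = as_category.memory_usage(index=False, deep=True)

# Each row now stores a small integer code instead of a pointer to a string.
print(object_bytes, category_bytes, as_category.cat.codes.dtype)
```

The saving grows with the number of rows because each distinct string is stored only once, in `.cat.categories`.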

In [16]:
df.head()

Unnamed: 0,age,sex,county,lat,long,name
0,0,m,DARLINGTON,54.533646,-1.524401,FRANCIS
1,0,m,DARLINGTON,54.426254,-1.465314,EDWARD
2,0,m,DARLINGTON,54.555199,-1.496417,TEDDY
3,0,m,DARLINGTON,54.547905,-1.572341,ANGUS
4,0,m,DARLINGTON,54.477638,-1.605994,CHARLIE


**Note**: `.astype()` can also be used to convert data to `datetime` or `object` to enable datetime and string methods. 

## Efficient Data Loading ##
It is often advantageous to specify the most appropriate data type for each column, based on its range, precision requirements, and how it is used. 

In [17]:
# DO NOT CHANGE THIS CELL
start=time.time()
df=pd.read_csv('./data/uk_pop.csv')
duration=time.time()-start

mem_usage_df=df.memory_usage(deep=True)
display(mem_usage_df)

print(f'Loading {make_decimal(mem_usage_df.sum())} took {round(duration, 2)} seconds.')

Index            128
age        467839152
sex       3391833852
county    3934985133
lat        467839152
long       467839152
name      3666922374
dtype: int64

Loading 11.55 GB took 33.87 seconds.


Below we enable `cudf.pandas` to see the difference. 

In [18]:
# DO NOT CHANGE THIS CELL
%load_ext cudf.pandas

import pandas as pd
import time

In [19]:
# DO NOT CHANGE THIS CELL
suffixes = ['B', 'kB', 'MB', 'GB', 'TB', 'PB']
def make_decimal(nbytes):
    i=0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes/=1024.
        i+=1
    f=('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

In [20]:
%%cudf.pandas.line_profile
# DO NOT CHANGE THIS CELL
start=time.time()

# define data types for each column
dtype_dict={
    'age': 'int8', 
    'sex': 'category', 
    'county': 'category', 
    'lat': 'float64', 
    'long': 'float64', 
    'name': 'category'
}
        
efficient_df=pd.read_csv('./data/uk_pop.csv', dtype=dtype_dict)
duration=time.time()-start

mem_usage_df=efficient_df.memory_usage(deep=True)
display(mem_usage_df)

print(f'Loading {make_decimal(mem_usage_df.sum())} took {round(duration, 2)} seconds.')

age        58479894
sex        58479908
county     58482446
lat       467839152
long      467839152
name      117096917
Index             0
dtype: int64

Loading 1.14 GB took 2.12 seconds.


We were able to load data faster and more efficiently. 
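
The effect of the `dtype` argument can be tried without the full file. The sketch below parses two rows from the preview through `io.StringIO`, which stands in for the CSV file. Using `float32` for the coordinates follows Exercise #1 and is an assumption here; the cell above keeps `float64`.

```python
import io
import pandas as pd

# Two rows from the preview, used as a stand-in for the real CSV file.
csv_text = """age,sex,county,lat,long,name
0,m,DARLINGTON,54.533644,-1.524401,FRANCIS
0,m,DARLINGTON,54.426256,-1.465314,EDWARD
"""

# Declare the types up front so no 64-bit or object columns are created.
dtype_dict = {
    "age": "int8",
    "sex": "category",
    "county": "category",
    "lat": "float32",
    "long": "float32",
}

small_df = pd.read_csv(io.StringIO(csv_text), dtype=dtype_dict)
print(small_df.dtypes)
```

Declaring the types at load time avoids first materializing the wider default types and converting them afterwards.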

**Note**: The memory utilized on the GPU is larger than the memory used by the DataFrame. This is expected because intermediate steps use some memory during the data loading process, in this case specifically for parsing the CSV file. 

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   32C    P0    26W /  70W |   1378MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   31C    P0    26W /  70W |    168MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   30C    P0    26W /  70W |    168MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P0    26W /  70W |    168MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```

In [21]:
# DO NOT CHANGE THIS CELL
!nvidia-smi

Sat Oct 11 16:44:59 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   30C    P0    25W /  70W |  11338MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   31C    P0    25W /  70W |    168MiB / 15360MiB |      0%      Default |
|       

When loading data this way, we may be able to fit more data. The optimal dataset size depends on various factors, including the specific operations being performed, the complexity of the workload, and the available GPU memory. To maximize acceleration, datasets should ideally fit within GPU memory, with ample space left for operations that can spike memory requirements. As a general rule of thumb, cuDF recommends datasets that are less than 50% of the GPU memory capacity. 

In [22]:
# DO NOT CHANGE THIS CELL
# 1 GiB = 1073741824 bytes
mem_capacity=16*1073741824

mem_per_record=mem_usage_df.sum()/len(efficient_df)

print(f'We can load {int(mem_capacity/2/mem_per_record)} rows.')

We can load 408997980 rows.
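
The estimate above can be reproduced from the printed numbers alone. The sketch below copies the per-column byte counts and the row count from the outputs above. It also repeats the calculation with the 15360 MiB that `nvidia-smi` reports for the Tesla T4. The `max_rows` helper and its `fraction` parameter are illustrative names.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Per-column byte counts of efficient_df and the row count, from the outputs above.
column_bytes = [58479894, 58479908, 58482446, 467839152, 467839152, 117096917, 0]
num_rows = 58479894
bytes_per_row = sum(column_bytes) / num_rows

def max_rows(capacity_bytes, fraction=0.5):
    """Rows that fit when the data may use `fraction` of the GPU memory."""
    return int(capacity_bytes * fraction / bytes_per_row)

rows_nominal = max_rows(16 * GIB)       # capacity assumed in the cell above
rows_reported = max_rows(15360 * MIB)   # capacity shown by nvidia-smi

print(f"{bytes_per_row:.2f} bytes per row")
print(rows_nominal, rows_reported)
```

Using the capacity that `nvidia-smi` reports gives a slightly lower and more conservative row count than the nominal 16 GiB.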


In [23]:
# DO NOT CHANGE THIS CELL
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

**Well Done!** Let's move to the [next notebook](1-04_interoperability.ipynb). 
