{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b53a7b12-538d-4459-b82a-a35c8c417849",
   "metadata": {},
   "source": [
    "<img src=\"./images/DLI_Header.png\" width=400/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae497b71-bc43-471e-8970-88a1878e7cf9",
   "metadata": {},
   "source": [
    "# Fundamentals of Accelerated Data Science # "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a149b6d1-1880-4a5d-9d71-f963d3097aa4",
   "metadata": {},
   "source": [
    "## 02 - Data Manipulation ##\n",
    "\n",
    "**Table of Contents**\n",
    "<br>\n",
    "This notebook explores the fundamentals of data acquisition and manipulation using DataFrame APIs, covering essential techniques for handling and processing datasets. This notebook covers the below sections: \n",
    "1. [Data Background](#Data-Background)\n",
    "1. [cuDF and pandas](#cuDF-and-pandas)\n",
    "    * [pandas](#pandas)\n",
    "    * [cuDF](#cuDF)\n",
    "3. [Data Acquisition](#Data-Acquisition)\n",
    "4. [Initial Data Exploration](#Initial-Data-Exploration)\n",
    "5. [Indexing and Data Selection with `.loc` Accessor](#Indexing-and-Data-Selection-with-.loc-Accessor)\n",
    "6. [Basic Operations](#Basic-Operations)\n",
    "    * [Exercise #1 - Convert `county` Column to Title Case](#Exercise-#1---Convert-county-Column-to-Title-Case)\n",
    "7. [Aggregation](#Aggregation)\n",
    "8. [Applying User-Defined Functions (UDFs) with `.map()` and `.apply()`](#Applying-User-Defined-Functions-(UDFs)-with-.map()-and-.apply())\n",
    "9. [Filtering with `.loc` and Boolean Mask](#Filtering-with-.loc-and-Boolean-Mask)\n",
    "    * [Exercise #2 - Counties North of Sunderland](#Exercise-#2---Counties-North-of-Sunderland)\n",
    "10. [Creating New Columns](#Creating-New-Columns)\n",
    "11. [pandas vs. cuDF](#pandas-vs.-cuDF)\n",
    "12. [cuDF pandas](#cuDF-pandas)\n",
    "    * [Exercise #3 - Automatic Acceleration](#Exercise-#3---Automatic-Acceleration)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b739635-4883-40b2-94e9-7a08f853871c",
   "metadata": {},
   "source": [
    "## Data Background ##\n",
    "For this workshop, we will be reading almost 60 million records (corresponding to the entire population of England and Wales) which were synthesized from official UK census data. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95e6bbed-1c08-4002-837c-392d5a12658f",
   "metadata": {},
   "source": [
    "## cuDF and pandas ##"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "050926cb-1dee-447a-9da8-49ebb1292d55",
   "metadata": {},
   "source": [
    "### pandas ###\n",
    "[pandas](https://pandas.pydata.org/) is a widely-used open-source library for data manipulation and analysis in Python. It provides high-performance, easy-to-use data structures and tools for working with structured data. It popularized the term DataFrame as a data structure for statistical computing. In data science, pandas is used for: \n",
    "* **Data loading and writing**: reads from and writes to various file formats like CSV, Excel, JSON, and SQL databases\n",
    "* **Data cleaning and processing/preprocessing**: helps users with handling missing data, merging datasets, and reshaping data\n",
    "* **Data analysis**: performs grouping, aggregating, and statistical operations\n",
    "\n",
    "**Note**: Data preprocessing refers to the process of transforming raw data into a format that is more suitable for analysis and other downstream tasks. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79e09f10-be1d-4ffe-9247-1e605e3f450f",
   "metadata": {},
   "source": [
    "### cuDF ###\n",
    "Similarly, [cuDF](https://docs.rapids.ai/api/cudf/stable/) is a Python GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF is designed to accelerate data science workflows by utilizing the parallel processing power of GPUs, potentially offering significant speed improvements over CPU-based alternatives for large datasets. The key features of cuDF include: \n",
    "* **GPU Acceleration**: leverages NVIDIA GPUs for fast data processing and analysis\n",
    "* **pandas-like API**: provides users a familiar interface and transition to GPU-based computing\n",
    "* **Integration with other RAPIDS libraries**: works seamlessly with other GPU-accelerated tools in the RAPIDS ecosystem"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff5519e2-f77f-4160-b362-979301705733",
   "metadata": {},
   "source": [
    "**Note**: Both Pandas and cuDF serve similar purposes in data manipulation and analysis, but cuDF is specifically optimized for GPU acceleration, making it particularly useful for working with large datasets where performance is critical."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "770fb1d8-73c5-4c45-a1e4-599f66e6b833",
   "metadata": {},
   "source": [
    "## Data Acquisition ##\n",
    "In our context, data acquisition refers to the process of collecting and importing data from various sources into a Python environment for analysis, processing, and manipulation. Data can come from a variety of sources: \n",
    "* Local file in various formats\n",
    "* Databases\n",
    "* APIs\n",
    "* Web scraping\n",
    "\n",
    "It's worth noting that Python's rich ecosystem of libraries makes it versatile for acquiring data from various sources, allowing data scientists to work with diverse datasets efficiently. CPU processing will be responsible for acquiring data from APIs or Web Scraping. In most cases, network bandwidth will likely be the bottleneck. Furthermore, cuDF doesn't have a way to get transactions from SQL data bases directly into GPU memory. The recommended approach for reading data from a database is to first use CPU-based methods (i.e. pandas), then convert to cuDF for GPU-accelerated processing.  \n",
    "\n",
    "Below we use the `head` linux command to display the beginning of the data file. This allows us to understand how to read the data correctly. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "247d2b96-1bce-4e26-89bd-d659df3528d7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\"age\",\"sex\",\"county\",\"lat\",\"long\",\"name\"\n",
      "0,\"m\",\"DARLINGTON\",54.53364379,-1.524400639,\"FRANCIS\"\n",
      "0,\"m\",\"DARLINGTON\",54.42625551,-1.465313919,\"EDWARD\"\n",
      "0,\"m\",\"DARLINGTON\",54.55520036,-1.496417277,\"TEDDY\"\n",
      "0,\"m\",\"DARLINGTON\",54.54790635,-1.572341399,\"ANGUS\"\n"
     ]
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "!head -n 5 data/uk_pop.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be7bd168-eb68-45cc-8009-64569974a187",
   "metadata": {},
   "source": [
    "One row will represent one person. We have information about their `age`, `sex`, `county`, location, and `name`. Using cuDF, the RAPIDS API providing a GPU-accelerated DataFrame, we can read data from [a variety of formats](https://rapidsai.github.io/projects/cudf/en/0.10.0/api.html#module-cudf.io.csv), including csv, json, parquet, feather, orc, and pandas DataFrames, among others. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "89c435b1-35d5-4971-ade1-549ae77d22db",
   "metadata": {},
   "outputs": [],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "import cudf\n",
    "import cupy as cp\n",
    "import numpy as np\n",
    "\n",
    "from datetime import datetime\n",
    "import random\n",
    "import time"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cb9faf3-4dc9-42bf-b481-98fb4155033e",
   "metadata": {},
   "source": [
    "Below we read the data from a local csv file directly into GPU memory with the `read_csv()` function. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a343a943-fd64-45f6-abd5-a991810cf5f3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Duration: 3.72 seconds\n"
     ]
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "start=time.time()\n",
    "df=cudf.read_csv('./data/uk_pop.csv')\n",
    "print(f'Duration: {round(time.time()-start, 2)} seconds')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c406541c-884a-49c3-b5cb-7aaf21b60403",
   "metadata": {},
   "source": [
    "**Note**: Because of the sophisticated GPU memory management behind the scenes in cuDF, the first data load into a fresh RAPIDS memory environment is sometimes substantially slower than subsequent loads. The [RAPIDS Memory Manager](https://github.com/rapidsai/rmm) is preparing additional memory to accommodate the array of data science operations that we may be interested in using on the data, rather than allocating and deallocating the memory repeatedly throughout the workflow. \n",
    "\n",
    "Below we get the general information about the DataFrame with the `DataFrame.info()` method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "18cd5602-9129-4809-a95f-1e30940558c5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'cudf.core.dataframe.DataFrame'>\n",
      "RangeIndex: 58479894 entries, 0 to 58479893\n",
      "Data columns (total 6 columns):\n",
      " #   Column  Dtype\n",
      "---  ------  -----\n",
      " 0   age     int64\n",
      " 1   sex     object\n",
      " 2   county  object\n",
      " 3   lat     float64\n",
      " 4   long    float64\n",
      " 5   name    object\n",
      "dtypes: float64(2), int64(1), object(3)\n",
      "memory usage: 2.9 GB\n"
     ]
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "df.info(memory_usage='deep')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8289385-f1ac-4ccd-8ba4-6b127b200b42",
   "metadata": {},
   "source": [
    "The **DataFrame** is a two-dimensional labeled data structure. It's organized in rows and columns, similar to a spreadsheet or SQL table. Both rows and columns have labels. Rows are typically labeled with an index, while columns have column names. Data is aligned based on row and column labels when performing operations. This is useful for enabling highly efficient vectorized operations across columns or rows. A **Series** refers to a one-dimensional array and is typically associated with a single column of data with an index. \n",
    "\n",
    "There are ~60MM records across 6 columns. cuDF is able to read data from local files directly into the GPU very efficiently. By default, cuDF samples the dataset to infer the most appropriate data types for each columns. \n",
    "\n",
    "**Note**: The DataFrame has `.dtypes` and `.columns` attributes that can be used to get similar information. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af6127ab-437e-4a60-b9bd-5f9671c10602",
   "metadata": {},
   "source": [
    "## Initial Data Exploration ##\n",
    "Now that we have some data loaded, let's do some initial exploration. \n",
    "\n",
    "Below we preview the DataFrame with the `DataFrame.head()` method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "7faf372e-644c-4120-8080-779f3a23a152",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "      <th>lat</th>\n",
       "      <th>long</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.533644</td>\n",
       "      <td>-1.524401</td>\n",
       "      <td>FRANCIS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.426256</td>\n",
       "      <td>-1.465314</td>\n",
       "      <td>EDWARD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.555200</td>\n",
       "      <td>-1.496417</td>\n",
       "      <td>TEDDY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.547906</td>\n",
       "      <td>-1.572341</td>\n",
       "      <td>ANGUS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.477639</td>\n",
       "      <td>-1.605995</td>\n",
       "      <td>CHARLIE</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age sex      county        lat      long     name\n",
       "0    0   m  DARLINGTON  54.533644 -1.524401  FRANCIS\n",
       "1    0   m  DARLINGTON  54.426256 -1.465314   EDWARD\n",
       "2    0   m  DARLINGTON  54.555200 -1.496417    TEDDY\n",
       "3    0   m  DARLINGTON  54.547906 -1.572341    ANGUS\n",
       "4    0   m  DARLINGTON  54.477639 -1.605995  CHARLIE"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee4649b1-2730-47e9-a9d7-331fe7514241",
   "metadata": {},
   "source": [
    "## Indexing and Data Selection with `.loc` Accessor ##\n",
    "The `.loc` accessor in cuDF DataFrames is used for label-based indexing and selection of data. It allows us to access and manipulate data in a DataFrame based on row and column labels. We can use `DataFrame.loc[row_label(s), column_label(s)]` to access a group of rows and columns. When selecting multiple labels, a list (`[]`) is used. Furthermore, we can use the slicing operator (`:`, i.e. `start:end`) to specify a range of elements. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "40d63289-8f23-424c-b3f4-0b7098c9b5a1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age sex      county\n",
       "0    0   m  DARLINGTON\n",
       "1    0   m  DARLINGTON\n",
       "2    0   m  DARLINGTON"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age sex      county\n",
       "0    0   m  DARLINGTON\n",
       "1    0   m  DARLINGTON\n",
       "2    0   m  DARLINGTON\n",
       "3    0   m  DARLINGTON\n",
       "4    0   m  DARLINGTON\n",
       "5    0   m  DARLINGTON"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "      <th>lat</th>\n",
       "      <th>long</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.533644</td>\n",
       "      <td>-1.524401</td>\n",
       "      <td>FRANCIS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.426256</td>\n",
       "      <td>-1.465314</td>\n",
       "      <td>EDWARD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.555200</td>\n",
       "      <td>-1.496417</td>\n",
       "      <td>TEDDY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.547906</td>\n",
       "      <td>-1.572341</td>\n",
       "      <td>ANGUS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.477639</td>\n",
       "      <td>-1.605995</td>\n",
       "      <td>CHARLIE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.522900</td>\n",
       "      <td>-1.599255</td>\n",
       "      <td>VICTOR</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.501872</td>\n",
       "      <td>-1.667874</td>\n",
       "      <td>EAMONN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.554709</td>\n",
       "      <td>-1.494506</td>\n",
       "      <td>HARRY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.602288</td>\n",
       "      <td>-1.586457</td>\n",
       "      <td>HECTOR</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.489992</td>\n",
       "      <td>-1.652537</td>\n",
       "      <td>THEODORE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.551315</td>\n",
       "      <td>-1.519593</td>\n",
       "      <td>MUHAMMAD</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    age sex      county        lat      long      name\n",
       "0     0   m  DARLINGTON  54.533644 -1.524401   FRANCIS\n",
       "1     0   m  DARLINGTON  54.426256 -1.465314    EDWARD\n",
       "2     0   m  DARLINGTON  54.555200 -1.496417     TEDDY\n",
       "3     0   m  DARLINGTON  54.547906 -1.572341     ANGUS\n",
       "4     0   m  DARLINGTON  54.477639 -1.605995   CHARLIE\n",
       "5     0   m  DARLINGTON  54.522900 -1.599255    VICTOR\n",
       "6     0   m  DARLINGTON  54.501872 -1.667874    EAMONN\n",
       "7     0   m  DARLINGTON  54.554709 -1.494506     HARRY\n",
       "8     0   m  DARLINGTON  54.602288 -1.586457    HECTOR\n",
       "9     0   m  DARLINGTON  54.489992 -1.652537  THEODORE\n",
       "10    0   m  DARLINGTON  54.551315 -1.519593  MUHAMMAD"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "# get first cell\n",
    "display(df.loc[0, 'age'])\n",
    "print('-'*40)\n",
    "\n",
    "# get multiple rows and columns\n",
    "display(df.loc[[0, 1, 2], ['age', 'sex', 'county']])\n",
    "print('-'*40)\n",
    "\n",
    "# slice a range of rows and columns\n",
    "display(df.loc[0:5, 'age':'county'])\n",
    "print('-'*40)\n",
    "\n",
    "# slice a range of rows and columns\n",
    "display(df.loc[:10, :'name'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a451118f-b986-49b6-ae03-2526e44007a7",
   "metadata": {},
   "source": [
    "**Note**: `df[column_label(s)]` is another way to access specific columns, similar to `df.loc[:, column_labels]`. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "055f8828-db5b-419a-aab5-bf149b9fd829",
   "metadata": {},
   "source": [
    "## Basic Operations ##\n",
    "cuDF support a wide range of operations for numerical data. Although strings are not a data type traditionally associated with GPUs, cuDF supports powerful accelerated string operations.\n",
    "* Numerical operations:\n",
    "    * Arithmetic operations: addition, subtraction, multiplication, division\n",
    "* String operations:\n",
    "    * Case conversion: `.upper()`, `.lower()`, `.title()`\n",
    "    * String manipulation: concatenation, substring, extraction, padding\n",
    "    * Pattern matching: `contains()`\n",
    "    * Splitting: `.split()`\n",
    "* Comparison operations: greater than, less than, equal to, etc.\n",
    "\n",
    "These operations will be performed element-wise for each row. This allows for efficient **vectorized operations** across entire columns. These operations are implemented as vector operations instead of iteration because vector operations can be applied to entire arrays of data, instead of iterating through each element individually. Vectorization is significantly faster than iterating over elements, especially for large datasets. When operating on multiple columns, operations are aligned by index, ensuring that calculations are performed on the correct corresponding elements across columns. These element-wise operations are typically highly optimized and can be much faster than explicit loops, especially for large datasets. We can get the underlying array of data with the `.values` attribute. This is useful when we want to perform operations on the underlying data. \n",
    "\n",
    "**Note**: Iterating over a cuDF Series, DataFrame or Index is not supported. This is because iterating over data that resides on the GPU will yield extremely poor performance, as GPUs are optimized for highly parallel operations rather than sequential operations. \n",
    "\n",
    "Below we calculate the birth year for each person. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d4286299-3a43-4e53-a9fb-04e1f20a40a4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           2025\n",
       "1           2025\n",
       "2           2025\n",
       "3           2025\n",
       "4           2025\n",
       "            ... \n",
       "58479889    1935\n",
       "58479890    1935\n",
       "58479891    1935\n",
       "58479892    1935\n",
       "58479893    1935\n",
       "Name: age, Length: 58479894, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "array([2025, 2025, 2025, ..., 1935, 1935, 1935])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "# get current year\n",
    "current_year=datetime.now().year\n",
    "\n",
    "# derive the birth year\n",
    "display(current_year-df.loc[:, 'age'])\n",
    "\n",
    "# get the age array (CuPy for cuDF)\n",
    "age_ary=df.loc[:, 'age'].values\n",
    "\n",
    "# derive the birth year\n",
    "current_year-age_ary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00213228-e32e-4a88-853e-eef53fad4da8",
   "metadata": {},
   "source": [
    "When performing operations between a DataFrame and a scalar value, the scalar is \"broadcast\" to match the shape of the DataFrame, effectively applying it to each element. \n",
    "\n",
    "```\n",
    "current_year - df.loc[:, 'age']\n",
    "-------------------------------\n",
    "  (scalar)          (array)    \n",
    "    2024,    -         0\n",
    "    2024,    -         0\n",
    "    2024,    -         0\n",
    "    2024,    -         0\n",
    "    2024,    -         0\n",
    "    ...      -         ...\n",
    "```\n",
    "\n",
    "This partially explains why cuDF provides significant performance improvements over pandas, especially for large datasets. The parallel processing architecture of GPUs are designed with thousands of small, specialized cores that can execute many operations simultaneously. This architecture is ideal for vectorized operations, which perform the same instruction on multiple data elements in parallel. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "760a2729-9c7a-4602-83ac-5e171cc4f5f9",
   "metadata": {},
   "source": [
    "<a name='s4.1'></a>\n",
    "### Exercise #1 - Convert `county` Column to Title Case ###\n",
    "As it stands, all of the counties are UPPERCASE. We want to convert the `county` column to title case. \n",
    "\n",
    "**Instructions**: <br>\n",
    "* Modify the `<FIXME>` only and execute the below cell to convert the `county` column to title case. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5365b5b6-6a00-47e7-8839-0cc8b183c6d7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           Darlington\n",
       "1           Darlington\n",
       "2           Darlington\n",
       "3           Darlington\n",
       "4           Darlington\n",
       "               ...    \n",
       "58479889       Newport\n",
       "58479890       Newport\n",
       "58479891       Newport\n",
       "58479892       Newport\n",
       "58479893       Newport\n",
       "Name: county, Length: 58479894, dtype: object"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['county'].str.title()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "738892d3-4bab-4404-af83-b8623804ca5d",
   "metadata": {},
   "source": [
    "\n",
    "df['county'].str.title()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35a2520b-5eec-4d59-a956-3f31b43a98b2",
   "metadata": {},
   "source": [
    "Click ... for solution. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d163438b-9993-41e7-856c-76101135a9ad",
   "metadata": {},
   "source": [
    "Performing comparison operations or applying conditions create boolean values (`True`/`False`) that corresponds element-wise. \n",
    "\n",
    "Below we check if each person is an adult. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "481218f1-ec09-4776-bda6-b039ccc190ab",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           False\n",
       "1           False\n",
       "2           False\n",
       "3           False\n",
       "4           False\n",
       "            ...  \n",
       "58479889     True\n",
       "58479890     True\n",
       "58479891     True\n",
       "58479892     True\n",
       "58479893     True\n",
       "Name: age, Length: 58479894, dtype: bool"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "df['age']>=18"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b98c380-828a-404b-9fb2-1d215337eff0",
   "metadata": {},
   "source": [
    "## Aggregation ##\n",
    "Aggregation is an important operation for data science tasks, allowing us to summarize and analyze grouped data. It's commonly used for tasks like calculating totals, averages, counts, etc. cuDF supports common aggregations like `.sum()`, `.mean()`, `.min()`, `.max()`, `.count()`, `.std()`(standard deviation), etc. It also supports more advanced aggregations like `.quantile()` and `.corr()` (correlation). With the `axis` parameter, aggregation operations can be applied column-wise (`0`) or row-wise (`1`). \n",
    "\n",
    "When the aggregation is implemented as a vector operation, specifically a reduction operation, it is very efficient on the GPU becasue a large number of data elements can be processed simultaneously and in parallel. Column-wise operations also benefit from the [Apache Arrow columnar memory format](https://arrow.apache.org/docs/format/Columnar.html). \n",
    "\n",
    "<p><img src='images/sum_reduction.png' width=720></p>\n",
    "\n",
    "Below we calculate the arithmetic mean of `lat` and `long` to get an approximate center. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "bee8be82-8631-4863-a1fa-e46eed47e334",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "lat     52.350600\n",
       "long    -1.304956\n",
       "dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "df[['lat', 'long']].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a53f0f4-7dc5-40fd-af07-3e82d6556393",
   "metadata": {},
   "source": [
    "## Applying User-Defined Functions (UDFs) with `.map()` and `.apply()` ##\n",
    "The `.map()` and `.apply()` methods are the primary ways of applying user-defined functions element-wise, and row or column-wise, respectively. We can pass a callable function (built-in or user-defined) as the argument, which is then applied to the entire data structure. Not all operations can be vectorized, especially complex custom logic. In such cases, methods like `.apply()` or custom UDFs might be necessary.\n",
    "\n",
    "Below we use `.apply()` to check if each person is an adult. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c65e80f8-1cc3-453c-85f2-910dab451228",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           0\n",
       "1           0\n",
       "2           0\n",
       "3           0\n",
       "4           0\n",
       "           ..\n",
       "58479889    1\n",
       "58479890    1\n",
       "58479891    1\n",
       "58479892    1\n",
       "58479893    1\n",
       "Length: 58479894, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Duration: 2.19 seconds\n"
     ]
    }
   ],
   "source": [
    "# DO NOT CHNAGE THIS CELL\n",
    "# define a function to check if age is greater than or equal to 18\n",
    "start=time.time()\n",
    "def is_adult(row): \n",
    "    if row['age']>=18: \n",
    "        return 1\n",
    "    else: \n",
    "        return 0\n",
    "\n",
    "# derive the birth year\n",
    "display(df.apply(is_adult, axis=1))\n",
    "print(f'Duration: {round(time.time()-start, 2)} seconds')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02828781-6f5b-49d5-87a5-aa5ef08adf15",
   "metadata": {},
   "source": [
    "We can also use a [**lambda function**](https://docs.python.org/3/glossary.html#term-lambda) when the function is simple. Lambda functions are limited to a single expression but can include a conditional statement and mulitple arguments. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "31d6e7fb-f435-4b1f-8e74-e732cc406b51",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           0\n",
       "1           0\n",
       "2           0\n",
       "3           0\n",
       "4           0\n",
       "           ..\n",
       "58479889    1\n",
       "58479890    1\n",
       "58479891    1\n",
       "58479892    1\n",
       "58479893    1\n",
       "Length: 58479894, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Duration: 0.05 seconds\n"
     ]
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "# derive the birth year\n",
    "start=time.time()\n",
    "display(df.apply(lambda x: 1 if x['age']>=18 else 0, axis=1))\n",
    "print(f'Duration: {round(time.time()-start, 2)} seconds')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "475cc2f1-4dc9-4492-aab5-9e51ebf54ebb",
   "metadata": {},
   "source": [
    "**Note**: The `.apply()` function in pandas accepts any user-defined function that can include arbitrary operations that are applied to each value of a Series and DataFrame. cuDF also supports `.apply()`, but it relies on Numba to JIT compile the UDF (not in scope) and execute it on the GPU. This can be extremely fast, but imposes a few limitations on what operations are allowed in the UDF. See the docs on [UDFs](https://docs.rapids.ai/api/cudf/stable/user_guide/guide-to-udfs/) for details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "ecadefaa-380c-412c-87af-05c63d3f7871",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           0\n",
       "1           0\n",
       "2           0\n",
       "3           0\n",
       "4           0\n",
       "           ..\n",
       "58479889    1\n",
       "58479890    1\n",
       "58479891    1\n",
       "58479892    1\n",
       "58479893    1\n",
       "Name: age, Length: 58479894, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Duration: 0.02 seconds\n"
     ]
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "# derive the birth year\n",
    "start=time.time()\n",
    "display((df['age']>=18).astype('int'))\n",
    "print(f'Duration: {round(time.time()-start, 2)} seconds')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84f0c41b-ad56-4cc7-b074-f2390f41cc70",
   "metadata": {},
   "source": [
    "Below we use `Series.map()` to determine the number of characters in each person's name. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4a3e4e1-fd83-4024-bcbf-29216c11016f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "df['name'].map(lambda x: len(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c717ada-69b2-4982-b81b-8594af6d9bf1",
   "metadata": {},
   "source": [
    "## Filtering with `.loc` and Boolean Mask ##\n",
    "A boolean mask is an array of `True`/`False` values that corresponds element-wise to another array or data structure. It's used for filtering and selecting data based on certain conditions. In this context, the mask can be used to index or filter a DataFrame with `.loc`, selecting only the elements where the mask is `True`. \n",
    "\n",
    "**Note**: Boolean masking is often more efficient than iterative approaches, especially for large datasets, as it leverages vectorized operations. \n",
    "\n",
    "Below we use the `.loc` accessor and a boolean mask to filter people whose names start with an `E`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "cf9cc540-1de6-4e50-986a-5bf9bd9056a6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "      <th>lat</th>\n",
       "      <th>long</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.426256</td>\n",
       "      <td>-1.465314</td>\n",
       "      <td>EDWARD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.501872</td>\n",
       "      <td>-1.667874</td>\n",
       "      <td>EAMONN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.483065</td>\n",
       "      <td>-1.501312</td>\n",
       "      <td>ETHAN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.640205</td>\n",
       "      <td>-1.558986</td>\n",
       "      <td>ELVIN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.575450</td>\n",
       "      <td>-1.600592</td>\n",
       "      <td>EDWARD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479859</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.576452</td>\n",
       "      <td>-2.891774</td>\n",
       "      <td>EDIE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479867</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.555083</td>\n",
       "      <td>-3.080259</td>\n",
       "      <td>ELEANOR</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479871</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.515820</td>\n",
       "      <td>-2.839532</td>\n",
       "      <td>EMERSON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479875</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.510140</td>\n",
       "      <td>-3.004406</td>\n",
       "      <td>ELLA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479885</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.586575</td>\n",
       "      <td>-2.799302</td>\n",
       "      <td>ELIN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5081794 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          age sex      county        lat      long     name\n",
       "1           0   m  DARLINGTON  54.426256 -1.465314   EDWARD\n",
       "6           0   m  DARLINGTON  54.501872 -1.667874   EAMONN\n",
       "34          0   m  DARLINGTON  54.483065 -1.501312    ETHAN\n",
       "45          0   m  DARLINGTON  54.640205 -1.558986    ELVIN\n",
       "49          0   m  DARLINGTON  54.575450 -1.600592   EDWARD\n",
       "...       ...  ..         ...        ...       ...      ...\n",
       "58479859   90   f     NEWPORT  51.576452 -2.891774     EDIE\n",
       "58479867   90   f     NEWPORT  51.555083 -3.080259  ELEANOR\n",
       "58479871   90   f     NEWPORT  51.515820 -2.839532  EMERSON\n",
       "58479875   90   f     NEWPORT  51.510140 -3.004406     ELLA\n",
       "58479885   90   f     NEWPORT  51.586575 -2.799302     ELIN\n",
       "\n",
       "[5081794 rows x 6 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "boolean_mask=df['name'].str.startswith('E')\n",
    "df.loc[boolean_mask]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d76a6dd-5d0b-4fd2-868a-235255375af0",
   "metadata": {},
   "source": [
    "Multiple conditions can be combined using logical operators (`&` and `|`). \n",
    "\n",
    "**Note**: When using multiple conditions, it's important to wrap each condition in parentheses (`(` and `)`) to ensure correct order of operations. \n",
    "\n",
    "Below we use the `.loc` accessor and multiple conditions to filter adults whose names start with an `E`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "03713403-6575-437d-99f0-c7f8ec3cb13b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "      <th>lat</th>\n",
       "      <th>long</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.426256</td>\n",
       "      <td>-1.465314</td>\n",
       "      <td>EDWARD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.501872</td>\n",
       "      <td>-1.667874</td>\n",
       "      <td>EAMONN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.483065</td>\n",
       "      <td>-1.501312</td>\n",
       "      <td>ETHAN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.640205</td>\n",
       "      <td>-1.558986</td>\n",
       "      <td>ELVIN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.575450</td>\n",
       "      <td>-1.600592</td>\n",
       "      <td>EDWARD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479889</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.626744</td>\n",
       "      <td>-2.859381</td>\n",
       "      <td>FREYA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479890</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.546043</td>\n",
       "      <td>-2.897815</td>\n",
       "      <td>GEORGINA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479891</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.605268</td>\n",
       "      <td>-2.849656</td>\n",
       "      <td>REBECCA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479892</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.554649</td>\n",
       "      <td>-2.934364</td>\n",
       "      <td>JESSICA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58479893</th>\n",
       "      <td>90</td>\n",
       "      <td>f</td>\n",
       "      <td>NEWPORT</td>\n",
       "      <td>51.578787</td>\n",
       "      <td>-2.827954</td>\n",
       "      <td>FLORENCE</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>47085782 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          age sex      county        lat      long      name\n",
       "1           0   m  DARLINGTON  54.426256 -1.465314    EDWARD\n",
       "6           0   m  DARLINGTON  54.501872 -1.667874    EAMONN\n",
       "34          0   m  DARLINGTON  54.483065 -1.501312     ETHAN\n",
       "45          0   m  DARLINGTON  54.640205 -1.558986     ELVIN\n",
       "49          0   m  DARLINGTON  54.575450 -1.600592    EDWARD\n",
       "...       ...  ..         ...        ...       ...       ...\n",
       "58479889   90   f     NEWPORT  51.626744 -2.859381     FREYA\n",
       "58479890   90   f     NEWPORT  51.546043 -2.897815  GEORGINA\n",
       "58479891   90   f     NEWPORT  51.605268 -2.849656   REBECCA\n",
       "58479892   90   f     NEWPORT  51.554649 -2.934364   JESSICA\n",
       "58479893   90   f     NEWPORT  51.578787 -2.827954  FLORENCE\n",
       "\n",
       "[47085782 rows x 6 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "df[(df['age']>=18) | (df['name'].str.startswith('E'))]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69b7f7ef-b270-4eae-8852-d6f48bf83086",
   "metadata": {},
   "source": [
    "<a name='s4.1'></a>\n",
    "### Exercise #2 - Counties North of Sunderland ###\n",
    "This exercise will require to use the `.loc` accessor, and several of the techniques described above. We want to identify the latitude of the northernmost resident of Sunderland county (the person with the maximum `lat` value), and then determine which counties have any residents north of this resident. Use the `Series.unique()` method of to de-duplicate the result.\n",
    "\n",
    "**Instructions**: <br>\n",
    "* Modify the `<FIXME>` only and execute the below cell to identify counties north of Sunderland. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "8c1394db-6bca-473b-a053-61a6066bd835",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0          COUNTY DURHAM\n",
       "1         NORTHUMBERLAND\n",
       "2              GATESHEAD\n",
       "3    NEWCASTLE UPON TYNE\n",
       "4         NORTH TYNESIDE\n",
       "5         SOUTH TYNESIDE\n",
       "6                CUMBRIA\n",
       "7        NORTH YORKSHIRE\n",
       "Name: county, dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sunderland_residents=df.loc[df['county'].str.upper() == 'SUNDERLAND']\n",
    "northmost_sunderland_lat=sunderland_residents['lat'].max()\n",
    "df.loc[df['lat'] > northmost_sunderland_lat]['county'].unique()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "0cf99881-1a27-409a-822b-7e62b5953f3a",
   "metadata": {},
   "source": [
    "\n",
    "sunderland_residents=df.loc[df['county'] == 'SUNDERLAND']\n",
    "northmost_sunderland_lat=sunderland_residents['lat'].max()\n",
    "df.loc[df['lat'] > northmost_sunderland_lat]['county'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0594efe7-97d4-4884-bffb-a26f5144ad54",
   "metadata": {},
   "source": [
    "Click ... for solution. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5ead43e-37bc-4a3d-a64f-bb82ad01ad99",
   "metadata": {},
   "source": [
    "## Creating New Columns ##"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45ada779-bc14-4fba-88d8-62c282345a63",
   "metadata": {},
   "source": [
    "We can create new columns by assigning values to the column label. The new column should have the same number of rows as the existing DataFrame. Typically, we create new columns by performing operations on existing columns. \n",
    "\n",
    "Below we create a few additional columns. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "977fdb2b-dbf1-4842-ab0f-31b9af65e0d1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "      <th>lat</th>\n",
       "      <th>long</th>\n",
       "      <th>name</th>\n",
       "      <th>birth_year</th>\n",
       "      <th>sex_normalize</th>\n",
       "      <th>county_normalize</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.533644</td>\n",
       "      <td>-1.524401</td>\n",
       "      <td>Francis</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.426256</td>\n",
       "      <td>-1.465314</td>\n",
       "      <td>Edward</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.555200</td>\n",
       "      <td>-1.496417</td>\n",
       "      <td>Teddy</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.547906</td>\n",
       "      <td>-1.572341</td>\n",
       "      <td>Angus</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.477639</td>\n",
       "      <td>-1.605995</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age sex      county        lat      long     name  birth_year  \\\n",
       "0    0   m  DARLINGTON  54.533644 -1.524401  Francis        2025   \n",
       "1    0   m  DARLINGTON  54.426256 -1.465314   Edward        2025   \n",
       "2    0   m  DARLINGTON  54.555200 -1.496417    Teddy        2025   \n",
       "3    0   m  DARLINGTON  54.547906 -1.572341    Angus        2025   \n",
       "4    0   m  DARLINGTON  54.477639 -1.605995  Charlie        2025   \n",
       "\n",
       "  sex_normalize county_normalize  \n",
       "0             M       Darlington  \n",
       "1             M       Darlington  \n",
       "2             M       Darlington  \n",
       "3             M       Darlington  \n",
       "4             M       Darlington  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "# get current year\n",
    "current_year=datetime.now().year\n",
    "\n",
    "# numerical operations\n",
    "df['birth_year']=current_year-df['age']\n",
    "\n",
    "# string operations\n",
    "df['sex_normalize']=df['sex'].str.upper()\n",
    "df['county_normalize']=df['county'].str.title().str.replace(' ', '_')\n",
    "df['name']=df['name'].str.title()\n",
    "\n",
    "# preview\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "66c5d332-a6ef-4f8d-9560-fa860ea1679a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'status': 'ok', 'restart': True}"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "import IPython\n",
    "app = IPython.Application.instance()\n",
    "app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd7bed0a-d46d-402a-884e-e2e17c98738c",
   "metadata": {},
   "source": [
    "## pandas vs. cuDF ##\n",
    "Except for being much more performant with large datasets, cuDF looks and feels a lot like Pandas. By way of review, cuDF and pandas share the below similarities: \n",
    "* **API similarity**: cuDF provides a pandas-like API that is familiar to data engineers and data scientists. It aims to implement many of the same functions and operations as pandas, allowing users to easily accelerate their existing pandas workflows.\n",
    "* **Similar operations**: cuDF implements many common pandas operations such as filtering, joining, aggregating, and groupby.\n",
    "\n",
    "<p><img src='images/pandas_vs_cudf.png' width=1080></p>\n",
    "\n",
    "Comparing to pandas, cuDF tends to perform better for large datasets because of the follow features: \n",
    "* GPUs excel at parallel computation, which is advantageous for many data science and machine learning tasks.\n",
    "* GPUs typically have much higher memory bandwidth than CPUs, allowing for faster data access in memory-bound operations.\n",
    "* cuDF leverages GPU's ability to perform vectorized operations efficiently, which is particularly beneficial for large datasets.\n",
    "* cuDF uses a columnar data format, which can lead to more efficient memory access patterns on GPUs. When performing data operations on cuDF Dataframes, column operations are typically much more performant than row-wise operations.\n",
    "\n",
    "**Note**: It's important to note that the performance advantage of cuDF over pandas can vary depending on the specific operation, data size, and hardware configuration. For smaller datasets or simpler operations, the overhead of GPU initialization might make pandas on CPU faster."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9c00b32-f5f2-46de-a50a-fc54f3244dab",
   "metadata": {},
   "source": [
    "## cuDF pandas ##\n",
    "Starting with version `23.10.01`, cuDF introduced a **pandas accelerator mode** (`cudf.pandas`) that supports 100% of the pandas API. This mode allows users to accelerate pandas code on the GPU without requiring any code changes. Not all operations can be performed on the GPU. When using `cudf.pandas`, operations that can be accelerated will run on the GPU, while unsupported operations will automatically fall back to pandas on the CPU. For example, `.read_sql()`. this will first read sql with pandas and move the data to cuDF. \n",
    "\n",
    "There are two ways to activate cuDF pandas:\n",
    "- Jupyter Magic Command\n",
    "```\n",
    "%load_ext cudf.pandas\n",
    "import pandas\n",
    "...\n",
    "```\n",
    "- Python Import\n",
    "```\n",
    "import cudf.pandas\n",
    "cudf.pandas.install()\n",
    "\n",
    "import pandas as pd\n",
    "...\n",
    "```\n",
    "\n",
    "**Note**: There are no other changes required - this is useful to quickly accelerate existing workloads with minimum code change. More information about cuDF pandas can be found [here](https://docs.rapids.ai/api/cudf/stable/cudf_pandas/). \n",
    "\n",
    "cuDF pandas is a no code change accelerator for pandas for automatic acceleration of any supported pandas call. \n",
    "\n",
    "Below we run some basic DataFrame operations with pandas, before demonstrating how cudf pandas is enabled. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5fed82ae-0ecb-4471-bb8f-060b1bf4542f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "%load_ext cudf.pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7671791e-c491-4831-bd1b-956de6b455e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "import pandas as pd\n",
    "import time\n",
    "from datetime import datetime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "47c87c9f-5b97-4a0d-bfa7-a26c1369314f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Duration: 1.99 seconds\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>county</th>\n",
       "      <th>lat</th>\n",
       "      <th>long</th>\n",
       "      <th>name</th>\n",
       "      <th>birth_year</th>\n",
       "      <th>sex_normalize</th>\n",
       "      <th>county_normalize</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.533644</td>\n",
       "      <td>-1.524401</td>\n",
       "      <td>Francis</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.426256</td>\n",
       "      <td>-1.465314</td>\n",
       "      <td>Edward</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.555200</td>\n",
       "      <td>-1.496417</td>\n",
       "      <td>Teddy</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.547906</td>\n",
       "      <td>-1.572341</td>\n",
       "      <td>Angus</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>m</td>\n",
       "      <td>DARLINGTON</td>\n",
       "      <td>54.477639</td>\n",
       "      <td>-1.605995</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>2025</td>\n",
       "      <td>M</td>\n",
       "      <td>Darlington</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age sex      county        lat      long     name  birth_year  \\\n",
       "0    0   m  DARLINGTON  54.533644 -1.524401  Francis        2025   \n",
       "1    0   m  DARLINGTON  54.426256 -1.465314   Edward        2025   \n",
       "2    0   m  DARLINGTON  54.555200 -1.496417    Teddy        2025   \n",
       "3    0   m  DARLINGTON  54.547906 -1.572341    Angus        2025   \n",
       "4    0   m  DARLINGTON  54.477639 -1.605995  Charlie        2025   \n",
       "\n",
       "  sex_normalize county_normalize  \n",
       "0             M       Darlington  \n",
       "1             M       Darlington  \n",
       "2             M       Darlington  \n",
       "3             M       Darlington  \n",
       "4             M       Darlington  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-style: italic\">                                                                                                                   </span>\n",
       "<span style=\"font-style: italic\">                                             Total time elapsed: 3.759 seconds                                     </span>\n",
       "<span style=\"font-style: italic\">                                                                                                                   </span>\n",
       "<span style=\"font-style: italic\">                                                           Stats                                                   </span>\n",
       "<span style=\"font-style: italic\">                                                                                                                   </span>\n",
       "┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> Line no. </span>┃<span style=\"font-weight: bold\"> Line                                                                     </span>┃<span style=\"font-weight: bold\"> GPU TIME(s) </span>┃<span style=\"font-weight: bold\"> CPU TIME(s) </span>┃\n",
       "┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
       "│ 2        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    start</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time()</span><span style=\"background-color: #272822\">                                                   </span> │             │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 4        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    df</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">pd</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">read_csv(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'./data/uk_pop.csv'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">                                 </span> │ 1.540004319 │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 5        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    current_year</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">datetime</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">now()</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">year</span><span style=\"background-color: #272822\">                                    </span> │             │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 7        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'birth_year'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">current_year</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">-</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'age'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"background-color: #272822\">                             </span> │ 0.011366894 │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 9        │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'sex_normalize'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'sex'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">str</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">upper()</span><span style=\"background-color: #272822\">                           </span> │ 0.015399261 │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 10       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'county_normalize'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'county'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">str</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">title()</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">str</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">replace(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">' '</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">, </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'_…</span> │ 0.066522101 │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 11       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'name'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">df[</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">'name'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">]</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">str</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">title()</span><span style=\"background-color: #272822\">                                   </span> │ 0.037128624 │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 13       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    print(</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">f'Duration: {</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">round(time</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">time()</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">-</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">start, </span><span style=\"color: #ae81ff; text-decoration-color: #ae81ff; background-color: #272822\">2</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">} seconds'</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">)</span><span style=\"background-color: #272822\">           </span> │             │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "│ 15       │ <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">    display(df</span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">.</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">head())</span><span style=\"background-color: #272822\">                                                  </span> │ 1.463330139 │             │\n",
       "│          │ <span style=\"background-color: #272822\">                                                                        </span> │             │             │\n",
       "└──────────┴──────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[3m                                                                                                                   \u001b[0m\n",
       "\u001b[3m                                             Total time elapsed: 3.759 seconds                                     \u001b[0m\n",
       "\u001b[3m                                                                                                                   \u001b[0m\n",
       "\u001b[3m                                                           Stats                                                   \u001b[0m\n",
       "\u001b[3m                                                                                                                   \u001b[0m\n",
       "┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1mLine no.\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mLine                                                                    \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mGPU TIME(s)\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mCPU TIME(s)\u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
       "│ 2        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstart\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                                   \u001b[0m │             │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 4        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mpd\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mread_csv\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m./data/uk_pop.csv\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                 \u001b[0m │ 1.540004319 │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 5        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcurrent_year\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdatetime\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnow\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34myear\u001b[0m\u001b[48;2;39;40;34m                                    \u001b[0m │             │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 7        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mbirth_year\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcurrent_year\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m-\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mage\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[48;2;39;40;34m                             \u001b[0m │ 0.011366894 │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 9        │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34msex_normalize\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34msex\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstr\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mupper\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                           \u001b[0m │ 0.015399261 │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 10       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcounty_normalize\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcounty\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstr\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtitle\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstr\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mreplace\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m_\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m…\u001b[0m │ 0.066522101 │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 11       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mname\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m[\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mname\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m]\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstr\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtitle\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                   \u001b[0m │ 0.037128624 │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 13       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mprint\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mf\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mDuration: \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m{\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mround\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mtime\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m-\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstart\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m2\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m}\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m seconds\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m'\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m           \u001b[0m │             │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "│ 15       │ \u001b[38;2;248;248;242;48;2;39;40;34m    \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdisplay\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdf\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mhead\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m                                                  \u001b[0m │ 1.463330139 │             │\n",
       "│          │ \u001b[48;2;39;40;34m                                                                        \u001b[0m │             │             │\n",
       "└──────────┴──────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%cudf.pandas.line_profile\n",
    "# DO NOT CHANGE THIS CELL\n",
    "start=time.time()\n",
    "\n",
    "df=pd.read_csv('./data/uk_pop.csv')\n",
    "current_year=datetime.now().year\n",
    "\n",
    "df['birth_year']=current_year-df['age']\n",
    "\n",
    "df['sex_normalize']=df['sex'].str.upper()\n",
    "df['county_normalize']=df['county'].str.title().str.replace(' ', '_')\n",
    "df['name']=df['name'].str.title()\n",
    "\n",
    "print(f'Duration: {round(time.time()-start, 2)} seconds')\n",
    "\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ebe113c-9fc0-4da5-932c-0a68af0d5a31",
   "metadata": {},
   "source": [
    "<a name='s4.1'></a>\n",
    "### Exercise #3 - Automatic Acceleration ###\n",
    "**Instructions**: <br>\n",
    "* Go back to the top of this subsection, re-execute the cells and uncomment the `%load_ext` magic command to accelerate with cuDF pandas. \n",
    "* Observe the acceleration. \n",
    "* Go back to the top of this subsection, re-execute the cells and uncomment the `%%cudf.pandas.line_profile` magic command to use the line profiler. \n",
    "* Observe the output from the line profiler. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f1688462-783c-4fea-ae18-5d37524d26d8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'status': 'ok', 'restart': True}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DO NOT CHANGE THIS CELL\n",
    "import IPython\n",
    "app = IPython.Application.instance()\n",
    "app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a691784-dac5-4485-89fb-5e405f10c05c",
   "metadata": {},
   "source": [
    "**Well Done!** Let's move to the [next notebook](1-03_memory_management.ipynb). "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81e47f0a-547e-4714-878d-34eb9b75c835",
   "metadata": {},
   "source": [
    "<img src=\"./images/DLI_Header.png\" width=400/>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}