{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fundamentals of Accelerated Data Science # "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 02 - K-Means ##\n",
"\n",
"**Table of Contents**\n",
"\n",
"This notebook uses GPU-accelerated K-means to find the best locations for a fixed number of humanitarian supply airdrop depots. It covers the following sections:\n",
"1. [Environment](#Environment)\n",
"2. [Load Data](#Load-Data)\n",
"3. [K-Means Clustering](#K-Means-Clustering)\n",
" * [Exercise #1 - Make Another `KMeans` Instance](#Exercise-#1---Make-Another-KMeans-Instance)\n",
"4. [Visualize the Clusters](#Visualize-the-Clusters)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment ##\n",
"This is the first notebook in which we import `cuml`, the RAPIDS GPU-accelerated library of common machine learning algorithms. Because we will visualize the clustering results in this notebook, we also import `cuxfilter`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"import cudf\n",
"import cuml\n",
"import cupy as cp\n",
"import cuxfilter as cxf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data ##\n",
"In this notebook we again load the cleaned UK population data. We are not looking at counties here, so we omit that column and keep only the grid coordinate columns, `easting` and `northing`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"northing float64\n",
"easting float64\n",
"dtype: object\n"
]
},
{
"data": {
"text/plain": [
"(58479894, 2)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"gdf = cudf.read_csv('./data/clean_uk_pop.csv', usecols=['easting', 'northing'])\n",
"print(gdf.dtypes)\n",
"gdf.shape"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
| \n", " | northing | \n", "easting | \n", "cluster | \n", "
|---|---|---|---|
| 0 | \n", "515491.5313 | \n", "430772.1875 | \n", "1 | \n", "
| 1 | \n", "503572.4688 | \n", "434685.8750 | \n", "1 | \n", "
| 2 | \n", "517903.6563 | \n", "432565.5313 | \n", "1 | \n", "
| 3 | \n", "517059.9063 | \n", "427660.6250 | \n", "1 | \n", "
| 4 | \n", "509228.6875 | \n", "425527.7813 | \n", "1 | \n", "