{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fundamentals of Accelerated Data Science # " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 05 - KNN ##\n", "\n", "**Table of Contents**\n", "
\n", "This notebook uses GPU-accelerated k-nearest neighbors to identify the nearest road nodes to hospitals. This notebook covers the below sections: \n", "1. [Environment](#Environment)\n", "2. [Load Data](#Load-Data)\n", " * [Road Nodes](#Road-Nodes)\n", " * [Hospitals](#Hospitals)\n", "3. [K-Nearest Neighbors](#K-Nearest-Neighbors)\n", " * [Road Nodes Closest to Each Hospital](#Road-Nodes-Closest-to-Each-Hospital)\n", " * [Viewing a Specific Hospital](#Viewing-a-Specific-Hospital)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment ##" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cudf\n", "import cuml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Road Nodes ###\n", "We begin by reading our road nodes data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# road_nodes = cudf.read_csv('./data/road_nodes_2-06.csv', dtype=['str', 'float32', 'float32', 'str'])\n", "road_nodes = cudf.read_csv('./data/road_nodes.csv', dtype=['str', 'float32', 'float32', 'str'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "road_nodes.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "road_nodes.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "road_nodes.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hospitals ###\n", "Next we load the hospital data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hospitals = cudf.read_csv('./data/clean_hospitals_full.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hospitals.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hospitals.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hospitals.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## K-Nearest Neighbors ##\n", "We are going to use the [k-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm to find the nearest *k* road nodes for every hospital. We will need to fit a KNN model with road data, and then give our trained model hospital locations so that it can return the nearest roads.\n", "\n", "Create a k-nearest neighbors model `knn` by using the `cuml.NearestNeighbors` constructor, passing it the named argument `n_neighbors` set to 3.\n", "\n", "Create a new dataframe `road_locs` using the `road_nodes` columns `east` and `north`. The order of the columns doesn't matter, except that we will need them to remain consistent over multiple operations, so please use the ordering `['east', 'north']`.\n", "\n", "Fit the `knn` model with `road_locs` using the `knn.fit` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "knn = cuml.NearestNeighbors(n_neighbors=3)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "road_locs = road_nodes[['east', 'north']]\n", "knn.fit(road_locs)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Road Nodes Closest to Each Hospital ###\n", "Use the `knn.kneighbors` method to find the 3 closest road nodes to each hospital. `knn.kneighbors` expects 2 arguments: `X`, for which you should use the `easting` and `northing` columns of `hospitals` (remember to retain the same column order as when you fit the `knn` model above), and `n_neighbors`, the number of neighbors to search for--in this case, 3. \n", "\n", "`knn.kneighbors` will return 2 cudf dataframes, which you should name `distances` and `indices` respectively." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "distances, indices = knn.kneighbors(hospitals[['easting', 'northing']], 3) # order has to match the knn fit order (east, north)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing a Specific Hospital ###\n", "We can now use `indices`, `hospitals`, and `road_nodes` to derive information specific to a given hospital. Here we will examine the hospital at index `10`. First we view the hospital's grid coordinates:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SELECTED_RESULT = 10\n", "print('hospital coordinates:\\n', hospitals.loc[SELECTED_RESULT, ['easting', 'northing']], sep='')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we view the road node IDs for the 3 closest road nodes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nearest_road_nodes = indices.iloc[SELECTED_RESULT, 0:3]\n", "print('node_id:\\n', nearest_road_nodes, sep='')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally the grid coordinates for the 3 nearest road nodes, which we can confirm are located in order of increasing distance from the hospital:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('road_node coordinates:\\n', road_nodes.loc[nearest_road_nodes, ['east', 'north']], sep='')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import IPython\n", "app = IPython.Application.instance()\n", "app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Well Done!** Let's move to the [next notebook](3-06_xgboost.ipynb). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 4 }