{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fundamentals of Accelerated Data Science # " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 05 - KNN ##\n", "\n", "**Table of Contents**\n", "
\n", "This notebook uses GPU-accelerated k-nearest neighbors to identify the nearest road nodes to hospitals. It covers the following sections: \n", "1. [Environment](#Environment)\n", "2. [Load Data](#Load-Data)\n", "    * [Road Nodes](#Road-Nodes)\n", "    * [Hospitals](#Hospitals)\n", "3. [K-Nearest Neighbors](#K-Nearest-Neighbors)\n", "    * [Road Nodes Closest to Each Hospital](#Road-Nodes-Closest-to-Each-Hospital)\n", "    * [Viewing a Specific Hospital](#Viewing-a-Specific-Hospital)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment ##" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import cudf\n", "import cuml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data ##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Road Nodes ###\n", "We begin by reading our road nodes data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# dtype lists one type per column, in file order: node_id, east, north, type\n", "road_nodes = cudf.read_csv('./data/road_nodes.csv', dtype=['str', 'float32', 'float32', 'str'])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "node_id object\n", "east float32\n", "north float32\n", "type object\n", "dtype: object" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "road_nodes.dtypes" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3121148, 4)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "road_nodes.shape" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>node_id</th><th>east</th><th>north</th><th>type</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>id02FE73D4-E88D-4119-8DC2-6E80DE6F6594</td><td>320608.09375</td><td>870994.0000</td><td>junction</td></tr>\n",
"<tr><th>1</th><td>id634D65C1-C38B-4868-9080-2E1E47F0935C</td><td>320628.50000</td><td>871103.8125</td><td>road end</td></tr>\n",
"<tr><th>2</th><td>idDC14D4D1-774E-487D-8EDE-60B129E5482C</td><td>320635.46875</td><td>870983.8750</td><td>junction</td></tr>\n",
"<tr><th>3</th><td>id51555819-1A39-4B41-B0C9-C6D2086D9921</td><td>320648.68750</td><td>871083.5625</td><td>junction</td></tr>\n",
"<tr><th>4</th><td>id9E362428-79D7-4EE3-B015-0CE3F6A78A69</td><td>320658.18750</td><td>871162.3750</td><td>junction</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " node_id east north type\n", "0 id02FE73D4-E88D-4119-8DC2-6E80DE6F6594 320608.09375 870994.0000 junction\n", "1 id634D65C1-C38B-4868-9080-2E1E47F0935C 320628.50000 871103.8125 road end\n", "2 idDC14D4D1-774E-487D-8EDE-60B129E5482C 320635.46875 870983.8750 junction\n", "3 id51555819-1A39-4B41-B0C9-C6D2086D9921 320648.68750 871083.5625 junction\n", "4 id9E362428-79D7-4EE3-B015-0CE3F6A78A69 320658.18750 871162.3750 junction" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "road_nodes.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hospitals ###\n", "Next we load the hospital data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "hospitals = cudf.read_csv('./data/clean_hospitals_full.csv')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "OrganisationID int64\n", "OrganisationCode object\n", "OrganisationType object\n", "SubType object\n", "Sector object\n", "OrganisationStatus object\n", "IsPimsManaged object\n", "OrganisationName object\n", "Address1 object\n", "Address2 object\n", "Address3 object\n", "City object\n", "County object\n", "Postcode object\n", "Latitude float64\n", "Longitude float64\n", "ParentODSCode object\n", "ParentName object\n", "Phone object\n", "Email object\n", "Website object\n", "Fax object\n", "northing float64\n", "easting float64\n", "dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hospitals.dtypes" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1226, 24)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hospitals.shape" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>OrganisationID</th><th>OrganisationCode</th><th>OrganisationType</th><th>SubType</th><th>Sector</th><th>OrganisationStatus</th><th>IsPimsManaged</th><th>OrganisationName</th><th>Address1</th><th>Address2</th><th>...</th><th>Latitude</th><th>Longitude</th><th>ParentODSCode</th><th>ParentName</th><th>Phone</th><th>Email</th><th>Website</th><th>Fax</th><th>northing</th><th>easting</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>17970</td><td>NDA07</td><td>Hospital</td><td>Hospital</td><td>Independent Sector</td><td>Visible</td><td>TRUE</td><td>Walton Community Hospital - Virgin Care Servic...</td><td>&lt;NA&gt;</td><td>Rodney Road</td><td>...</td><td>51.379997</td><td>-0.406042</td><td>NDA</td><td>Virgin Care Services Ltd</td><td>01932 414205</td><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td><td>01932 253674</td><td>165810.4688</td><td>510917.5313</td></tr>\n",
"<tr><th>1</th><td>17981</td><td>NDA18</td><td>Hospital</td><td>Hospital</td><td>Independent Sector</td><td>Visible</td><td>TRUE</td><td>Woking Community Hospital (Virgin Care)</td><td>&lt;NA&gt;</td><td>Heathside Road</td><td>...</td><td>51.315132</td><td>-0.556289</td><td>NDA</td><td>Virgin Care Services Ltd</td><td>01483 715911</td><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td><td>158381.3438</td><td>500604.8438</td></tr>\n",
"<tr><th>2</th><td>18102</td><td>NLT02</td><td>Hospital</td><td>Hospital</td><td>NHS Sector</td><td>Visible</td><td>TRUE</td><td>North Somerset Community Hospital</td><td>North Somerset Community Hospital</td><td>Old Street</td><td>...</td><td>51.437195</td><td>-2.847193</td><td>NLT</td><td>North Somerset Community Partnership Community...</td><td>01275 872212</td><td>&lt;NA&gt;</td><td>http://www.nscphealth.co.uk</td><td>&lt;NA&gt;</td><td>171305.7813</td><td>341119.3750</td></tr>\n",
"<tr><th>3</th><td>18138</td><td>NMP01</td><td>Hospital</td><td>Hospital</td><td>Independent Sector</td><td>Visible</td><td>FALSE</td><td>Bridgewater Hospital</td><td>120 Princess Road</td><td>&lt;NA&gt;</td><td>...</td><td>53.459743</td><td>-2.245469</td><td>NMP</td><td>Bridgewater Hospital (Manchester) Ltd</td><td>0161 2270000</td><td>&lt;NA&gt;</td><td>www.bridgewaterhospital.com</td><td>&lt;NA&gt;</td><td>395944.5625</td><td>383703.5938</td></tr>\n",
"<tr><th>4</th><td>18142</td><td>NMV01</td><td>Hospital</td><td>Hospital</td><td>Independent Sector</td><td>Visible</td><td>TRUE</td><td>Kneesworth House</td><td>Old North Road</td><td>Bassingbourn</td><td>...</td><td>52.078121</td><td>-0.030604</td><td>NMV</td><td>Partnerships In Care Ltd</td><td>01763 255 700</td><td>reception_kneesworthhouse@partnershipsincare.c...</td><td>www.partnershipsincare.co.uk</td><td>&lt;NA&gt;</td><td>244071.7031</td><td>534945.1875</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows × 24 columns</p>
" ], "text/plain": [ " OrganisationID OrganisationCode OrganisationType SubType \\\n", "0 17970 NDA07 Hospital Hospital \n", "1 17981 NDA18 Hospital Hospital \n", "2 18102 NLT02 Hospital Hospital \n", "3 18138 NMP01 Hospital Hospital \n", "4 18142 NMV01 Hospital Hospital \n", "\n", " Sector OrganisationStatus IsPimsManaged \\\n", "0 Independent Sector Visible TRUE \n", "1 Independent Sector Visible TRUE \n", "2 NHS Sector Visible TRUE \n", "3 Independent Sector Visible FALSE \n", "4 Independent Sector Visible TRUE \n", "\n", " OrganisationName \\\n", "0 Walton Community Hospital - Virgin Care Servic... \n", "1 Woking Community Hospital (Virgin Care) \n", "2 North Somerset Community Hospital \n", "3 Bridgewater Hospital \n", "4 Kneesworth House \n", "\n", " Address1 Address2 ... Latitude \\\n", "0 Rodney Road ... 51.379997 \n", "1 Heathside Road ... 51.315132 \n", "2 North Somerset Community Hospital Old Street ... 51.437195 \n", "3 120 Princess Road ... 53.459743 \n", "4 Old North Road Bassingbourn ... 52.078121 \n", "\n", " Longitude ParentODSCode \\\n", "0 -0.406042 NDA \n", "1 -0.556289 NDA \n", "2 -2.847193 NLT \n", "3 -2.245469 NMP \n", "4 -0.030604 NMV \n", "\n", " ParentName Phone \\\n", "0 Virgin Care Services Ltd 01932 414205 \n", "1 Virgin Care Services Ltd 01483 715911 \n", "2 North Somerset Community Partnership Community... 01275 872212 \n", "3 Bridgewater Hospital (Manchester) Ltd 0161 2270000 \n", "4 Partnerships In Care Ltd 01763 255 700 \n", "\n", " Email \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 reception_kneesworthhouse@partnershipsincare.c... 
\n", "\n", " Website Fax northing easting \n", "0 01932 253674 165810.4688 510917.5313 \n", "1 158381.3438 500604.8438 \n", "2 http://www.nscphealth.co.uk 171305.7813 341119.3750 \n", "3 www.bridgewaterhospital.com 395944.5625 383703.5938 \n", "4 www.partnershipsincare.co.uk 244071.7031 534945.1875 \n", "\n", "[5 rows x 24 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hospitals.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## K-Nearest Neighbors ##\n", "We are going to use the [k-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm to find the nearest *k* road nodes for every hospital. We will need to fit a KNN model with road data, and then give our trained model hospital locations so that it can return the nearest roads.\n", "\n", "Create a k-nearest neighbors model `knn` by using the `cuml.NearestNeighbors` constructor, passing it the named argument `n_neighbors` set to 3.\n", "\n", "Create a new dataframe `road_locs` using the `road_nodes` columns `east` and `north`. The order of the columns doesn't matter, except that we will need them to remain consistent over multiple operations, so please use the ordering `['east', 'north']`.\n", "\n", "Fit the `knn` model with `road_locs` using the `knn.fit` method." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "\n", "knn = cuml.NearestNeighbors(n_neighbors=3)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NearestNeighbors()
" ], "text/plain": [ "NearestNeighbors()" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "road_locs = road_nodes[['east', 'north']]\n", "knn.fit(road_locs)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Road Nodes Closest to Each Hospital ###\n", "Use the `knn.kneighbors` method to find the 3 closest road nodes to each hospital. `knn.kneighbors` expects 2 arguments: `X`, for which you should use the `easting` and `northing` columns of `hospitals` (remember to retain the same column order as when you fit the `knn` model above), and `n_neighbors`, the number of neighbors to search for--in this case, 3. \n", "\n", "`knn.kneighbors` will return 2 cudf dataframes, which you should name `distances` and `indices` respectively." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "distances, indices = knn.kneighbors(hospitals[['easting', 'northing']], 3) # order has to match the knn fit order (east, north)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing a Specific Hospital ###\n", "We can now use `indices`, `hospitals`, and `road_nodes` to derive information specific to a given hospital. Here we will examine the hospital at index `10`. 
First we view the hospital's grid coordinates:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hospital coordinates:\n", "easting 260713.17190\n", "northing 56303.21875\n", "Name: 10, dtype: float64\n" ] } ], "source": [ "# Row label of the hospital to inspect\n", "SELECTED_RESULT = 10\n", "print('hospital coordinates:\\n', hospitals.loc[SELECTED_RESULT, ['easting', 'northing']], sep='')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we view the row indices of the 3 closest road nodes. These are positions in `road_locs` (and therefore in `road_nodes`), not `node_id` values:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "road node indices:\n", "0 118559\n", "1 118560\n", "2 118678\n", "Name: 10, dtype: int64\n" ] } ], "source": [ "# Row 10 of `indices` holds the positions of that hospital's 3 nearest road nodes\n", "nearest_road_nodes = indices.iloc[SELECTED_RESULT, 0:3]\n", "print('road node indices:\\n', nearest_road_nodes, sep='')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we view the grid coordinates of the 3 nearest road nodes. Comparing them with the hospital's coordinates confirms that they are listed in order of increasing distance from the hospital:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "road_node coordinates:\n", " east north\n", "118559 260697.859375 56322.710938\n", "118560 260722.812500 56207.925781\n", "118678 260540.000000 56105.000000\n" ] } ], "source": [ "print('road_node coordinates:\\n', road_nodes.loc[nearest_road_nodes, ['east', 'north']], sep='')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'status': 'ok', 'restart': True}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Shut down the kernel and request a restart (the True argument)\n", "import IPython\n", "app = IPython.Application.instance()\n", "app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Well Done!** Let's move to the [next notebook](3-06_xgboost.ipynb). 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 4 }