From 910a222fa60ce6ea0831f2956470b8a0b9f62670 Mon Sep 17 00:00:00 2001 From: leshe4ka46 Date: Sat, 18 Oct 2025 12:25:53 +0300 Subject: nvidia2 --- .../3-05_knn.ipynb | 300 +++++++++++++++++++++ 1 file changed, 300 insertions(+) create mode 100644 Fundamentals_of_Accelerated_Data_Science/3-05_knn.ipynb (limited to 'Fundamentals_of_Accelerated_Data_Science/3-05_knn.ipynb') diff --git a/Fundamentals_of_Accelerated_Data_Science/3-05_knn.ipynb b/Fundamentals_of_Accelerated_Data_Science/3-05_knn.ipynb new file mode 100644 index 0000000..e6d543f --- /dev/null +++ b/Fundamentals_of_Accelerated_Data_Science/3-05_knn.ipynb @@ -0,0 +1,300 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Fundamentals of Accelerated Data Science # " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 05 - KNN ##\n", + "\n", + "**Table of Contents**\n", + "
\n", + "This notebook uses GPU-accelerated k-nearest neighbors to identify the nearest road nodes to hospitals. This notebook covers the below sections: \n", + "1. [Environment](#Environment)\n", + "2. [Load Data](#Load-Data)\n", + " * [Road Nodes](#Road-Nodes)\n", + " * [Hospitals](#Hospitals)\n", + "3. [K-Nearest Neighbors](#K-Nearest-Neighbors)\n", + " * [Road Nodes Closest to Each Hospital](#Road-Nodes-Closest-to-Each-Hospital)\n", + " * [Viewing a Specific Hospital](#Viewing-a-Specific-Hospital)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Environment ##" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import cudf\n", + "import cuml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Road Nodes ###\n", + "We begin by reading our road nodes data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# road_nodes = cudf.read_csv('./data/road_nodes_2-06.csv', dtype=['str', 'float32', 'float32', 'str'])\n", + "road_nodes = cudf.read_csv('./data/road_nodes.csv', dtype=['str', 'float32', 'float32', 'str'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "road_nodes.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "road_nodes.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "road_nodes.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Hospitals ###\n", + "Next we load the hospital data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hospitals = cudf.read_csv('./data/clean_hospitals_full.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hospitals.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hospitals.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hospitals.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## K-Nearest Neighbors ##\n", + "We are going to use the [k-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm to find the nearest *k* road nodes for every hospital. We will need to fit a KNN model with road data, and then give our trained model hospital locations so that it can return the nearest roads.\n", + "\n", + "Create a k-nearest neighbors model `knn` by using the `cuml.NearestNeighbors` constructor, passing it the named argument `n_neighbors` set to 3.\n", + "\n", + "Create a new dataframe `road_locs` using the `road_nodes` columns `east` and `north`. The order of the columns doesn't matter, except that we will need them to remain consistent over multiple operations, so please use the ordering `['east', 'north']`.\n", + "\n", + "Fit the `knn` model with `road_locs` using the `knn.fit` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "knn = cuml.NearestNeighbors(n_neighbors=3)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "road_locs = road_nodes[['east', 'north']]\n", + "knn.fit(road_locs)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Road Nodes Closest to Each Hospital ###\n", + "Use the `knn.kneighbors` method to find the 3 closest road nodes to each hospital. `knn.kneighbors` expects 2 arguments: `X`, for which you should use the `easting` and `northing` columns of `hospitals` (remember to retain the same column order as when you fit the `knn` model above), and `n_neighbors`, the number of neighbors to search for--in this case, 3. \n", + "\n", + "`knn.kneighbors` will return 2 cudf dataframes, which you should name `distances` and `indices` respectively." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "distances, indices = knn.kneighbors(hospitals[['easting', 'northing']], 3) # order has to match the knn fit order (east, north)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Viewing a Specific Hospital ###\n", + "We can now use `indices`, `hospitals`, and `road_nodes` to derive information specific to a given hospital. Here we will examine the hospital at index `10`. First we view the hospital's grid coordinates:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "SELECTED_RESULT = 10\n", + "print('hospital coordinates:\\n', hospitals.loc[SELECTED_RESULT, ['easting', 'northing']], sep='')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we view the road node IDs for the 3 closest road nodes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nearest_road_nodes = indices.iloc[SELECTED_RESULT, 0:3]\n", + "print('node_id:\\n', nearest_road_nodes, sep='')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And finally the grid coordinates for the 3 nearest road nodes, which we can confirm are located in order of increasing distance from the hospital:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('road_node coordinates:\\n', road_nodes.loc[nearest_road_nodes, ['east', 'north']], sep='')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import IPython\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Well Done!** Let's move to the [next notebook](3-06_xgboost.ipynb). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} -- cgit v1.2.3