From 910a222fa60ce6ea0831f2956470b8a0b9f62670 Mon Sep 17 00:00:00 2001 From: leshe4ka46 Date: Sat, 18 Oct 2025 12:25:53 +0300 Subject: nvidia2 --- .../2-02_prep_graph.ipynb | 1699 ++++++++++++++++++++ 1 file changed, 1699 insertions(+) create mode 100644 Fundamentals_of_Accelerated_Data_Science/2-02_prep_graph.ipynb (limited to 'Fundamentals_of_Accelerated_Data_Science/2-02_prep_graph.ipynb') diff --git a/Fundamentals_of_Accelerated_Data_Science/2-02_prep_graph.ipynb b/Fundamentals_of_Accelerated_Data_Science/2-02_prep_graph.ipynb new file mode 100644 index 0000000..83e2049 --- /dev/null +++ b/Fundamentals_of_Accelerated_Data_Science/2-02_prep_graph.ipynb @@ -0,0 +1,1699 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Fundamentals of Accelerated Data Science # " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 02 - Preparing Data for Graph Construction ##\n", + "\n", + "**Table of Contents**\n", + "
\n", + "This notebook introduces the basics of representing and constructing a graph. It covers the following sections:\n", + "1. [Background](#Background)\n", + "2. [Environment](#Environment)\n", + "3. [Read Data](#Read-Data)\n", + " * [UK Road Nodes](#UK-Road-Nodes)\n", + " * [UK Road Edges](#UK-Road-Edges)\n", + " * [Exercise #1 - Make IDs Compatible](#Exercise-#1---Make-IDs-Compatible)\n", + "4. [Data Summary](#Data-Summary)\n", + "5. [Building the Road Network Graph](#Building-the-Road-Network-Graph)\n", + " * [Reindex `road_nodes`](#Reindex-road_nodes)\n", + " * [Analyzing the Graph](#Analyzing-the-Graph)\n", + "6. [Construct a Graph of Roads with Time Weights](#Construct-a-Graph-of-Roads-with-Time-Weights)\n", + " * [Road Type to Speed Conversion](#Road-Type-to-Speed-Conversion)\n", + " * [Step 1: Merge `speed_gdf` into `road_edges`](#Step-1:-Merge-speed_gdf-into-road_edges)\n", + " * [Exercise #2 - Step 2: Add Length in Seconds Column](#Exercse-#2---Step-2:-Add-Length-in-Seconds-Column)\n", + " * [Exercise #3 - Step 3: Construct the Graph](#Exercise-#3---Step-3:-Construct-the-Graph)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Background ##\n", + "As part of our larger data science goal for this workshop, we will be working with data reflecting the entire road network of Great Britain. As a starting point, we have road data extracted into tabular CSV format from official [GML](https://en.wikipedia.org/wiki/Geography_Markup_Language) files. Ultimately, we would like to use cuGraph to perform GPU-accelerated graph analytics on this data, but first we need to preprocess it so that it is ready for graph creation.\n", + "\n", + "In this notebook you will learn additional cuDF data transformation techniques through a demonstration of preparing data for ingestion by cuGraph. 
Next, you will do a series of exercises to perform a similar transformation of the data for the creation of a graph with different edge weights." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Environment ##" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In addition to `cudf`, for this notebook we will also import `cugraph`, which we will use (after data preparation) to construct a GPU-accelerated graph. We also import `networkx` for a brief performance comparison later on." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import warnings\n", + "warnings.filterwarnings('ignore')\n", + "\n", + "import cudf\n", + "import cugraph as cg\n", + "\n", + "import networkx as nx" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read Data ##" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook we will be working with two data sources that will help us create a graph of the UK's road networks." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### UK Road Nodes ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first data table describes the nodes in the road network: endpoints, junctions (including roundabouts), and points that break up a long stretch of curving road so that it can be mapped correctly (instead of as a straight line).\n", + "\n", + "The coordinates for each point are in the OSGB36 format we explored earlier in section 1-05." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " node_id east north type\n", + "0 id02FE73D4-E88D-4119-8DC2-6E80DE6F6594 320608.0938 870994.0000 junction\n", + "1 id634D65C1-C38B-4868-9080-2E1E47F0935C 320628.5000 871103.8125 road end\n", + "2 idDC14D4D1-774E-487D-8EDE-60B129E5482C 320635.4688 870983.9375 junction\n", + "3 id51555819-1A39-4B41-B0C9-C6D2086D9921 320648.7188 871083.5625 junction\n", + "4 id9E362428-79D7-4EE3-B015-0CE3F6A78A69 320658.1875 871162.3750 junction" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_nodes = cudf.read_csv('./data/road_nodes.csv')\n", + "road_nodes.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "node_id object\n", + "east float64\n", + "north float64\n", + "type object\n", + "dtype: object" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_nodes.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(3121148, 4)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_nodes.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 junction\n", + "1 road end\n", + "2 pseudo node\n", + "3 roundabout\n", + "Name: type, dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_nodes['type'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### UK Road Edges ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The second data table describes road segments, including their start and end points, how long they are, and what kind of road they are." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " src_id \\\n", + "0 #id138447A5-91D4-4642-BFAC-13F309705429 \n", + "1 #idD615F9C5-5BE9-412D-9FED-F4928BAB4146 \n", + "2 #idDC14D4D1-774E-487D-8EDE-60B129E5482C \n", + "3 #id626FC567-199C-41FB-9F29-1AB718874128 \n", + "4 #id03312900-B147-4CA3-A858-E2BF6AD1ECA7 \n", + "\n", + " dst_id length \\\n", + "0 #id84C9DAD4-9243-4742-B582-E8CBC848E08A 314 \n", + "1 #idA1BB20B9-0751-4B42-9925-20607ABF5027 104 \n", + "2 #id51555819-1A39-4B41-B0C9-C6D2086D9921 100 \n", + "3 #idACD1B0A9-F870-4B46-88CF-C870A9EDAF8B 93 \n", + "4 #id02FE73D4-E88D-4119-8DC2-6E80DE6F6594 95 \n", + "\n", + " type form \n", + "0 Restricted Local Access Road Single Carriageway \n", + "1 Restricted Local Access Road Single Carriageway \n", + "2 Restricted Local Access Road Single Carriageway \n", + "3 Restricted Local Access Road Single Carriageway \n", + "4 Restricted Local Access Road Single Carriageway " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges = cudf.read_csv('./data/road_edges.csv')\n", + "road_edges.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "src_id object\n", + "dst_id object\n", + "length int64\n", + "type object\n", + "form object\n", + "dtype: object" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(3725531, 5)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Restricted Local Access Road\n", + "1 Local Road\n", + "2 B Road\n", + "3 Secondary Access Road\n", + "4 Minor Road\n", + 
"5 A Road\n", + "6 Local Access Road\n", + "7 Motorway\n", + "Name: type, dtype: object" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges['type'].unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Single Carriageway\n", + "1 Collapsed Dual Carriageway\n", + "2 Roundabout\n", + "3 Slip Road\n", + "4 Shared Use Carriageway\n", + "5 Dual Carriageway\n", + "6 Guided Busway\n", + "Name: form, dtype: object" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges['form'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise #1 - Make IDs Compatible ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our csv files were derived from original [GML](https://en.wikipedia.org/wiki/Geography_Markup_Language) files, and as you can see from the above, both `road_edges['src_id']` and `road_edges['dst_id']` contain a leading `#` character that `road_nodes['node_id']` does not. To make the IDs compatible between the edges and nodes, use cuDF's [string method](https://docs.rapids.ai/api/nvstrings/stable/) `.str.lstrip` to replace the `src_id` and `dst_id` columns in `road_edges` with values stripped of the leading `#` characters." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "road_edges['src_id']=road_edges['src_id'].str.lstrip('#')\n", + "road_edges['dst_id']=road_edges['dst_id'].str.lstrip('#')" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " src_id \\\n", + "0 id138447A5-91D4-4642-BFAC-13F309705429 \n", + "1 idD615F9C5-5BE9-412D-9FED-F4928BAB4146 \n", + "2 idDC14D4D1-774E-487D-8EDE-60B129E5482C \n", + "3 id626FC567-199C-41FB-9F29-1AB718874128 \n", + "4 id03312900-B147-4CA3-A858-E2BF6AD1ECA7 \n", + "\n", + " dst_id length \\\n", + "0 id84C9DAD4-9243-4742-B582-E8CBC848E08A 314 \n", + "1 idA1BB20B9-0751-4B42-9925-20607ABF5027 104 \n", + "2 id51555819-1A39-4B41-B0C9-C6D2086D9921 100 \n", + "3 idACD1B0A9-F870-4B46-88CF-C870A9EDAF8B 93 \n", + "4 id02FE73D4-E88D-4119-8DC2-6E80DE6F6594 95 \n", + "\n", + " type form \n", + "0 Restricted Local Access Road Single Carriageway \n", + "1 Restricted Local Access Road Single Carriageway \n", + "2 Restricted Local Access Road Single Carriageway \n", + "3 Restricted Local Access Road Single Carriageway \n", + "4 Restricted Local Access Road Single Carriageway " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges.head()" + ] + }, + { + "cell_type": "raw", + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "source": [ + "\n", + "road_edges['src_id'] = road_edges['src_id'].str.lstrip('#')\n", + "road_edges['dst_id'] = road_edges['dst_id'].str.lstrip('#')\n", + "road_edges[['src_id', 'dst_id']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click ... for solution. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Summary ##" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that the data is cleaned we can see just how many roads and endpoints/junctions/curve points we will be working with, as well as its memory footprint in our GPU. The GPUs we are using can hold and analyze much larger graphs than this one!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3725531 edges, 3121148 nodes\n" + ] + } + ], + "source": [ + "print(f'{road_edges.shape[0]} edges, {road_nodes.shape[0]} nodes')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Wed Oct 15 12:04:42 2025 \n", + "+-----------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |\n", + "|-------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|===============================+======================+======================|\n", + "| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |\n", + "| N/A 29C P0 27W / 70W | 3416MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-------------------------------+----------------------+----------------------+\n", + "| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |\n", + "| N/A 30C P0 27W / 70W | 168MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-------------------------------+----------------------+----------------------+\n", + "| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |\n", + "| N/A 30C P0 27W / 70W | 168MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-------------------------------+----------------------+----------------------+\n", + "| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |\n", + "| N/A 29C P0 28W / 70W | 168MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-------------------------------+----------------------+----------------------+\n", + " \n", + "+-----------------------------------------------------------------------------+\n", + "| Processes: |\n", + 
"| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=============================================================================|\n", + "+-----------------------------------------------------------------------------+\n" + ] + } + ], + "source": [ + "!nvidia-smi" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building the Road Network Graph ##" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We don't have information on the direction of the roads (some of them are one-way), so we will assume all of them are two-way for simplicity. That makes the graph \"undirected,\" so we will build a cuGraph `Graph` rather than a directed graph or `DiGraph`.\n", + "\n", + "We initialize it with edge sources, destinations, and attributes, which for our data will be the length of the roads:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 399 ms, sys: 95.7 ms, total: 495 ms\n", + "Wall time: 494 ms\n" + ] + } + ], + "source": [ + "G = cg.Graph()\n", + "%time G.from_cudf_edgelist(road_edges, source='src_id', destination='dst_id', edge_attr='length')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As a point of comparison, we also construct the equivalent graph in NetworkX from the same cleaned and prepped data, converted to a Pandas dataframe." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 15.4 s, sys: 922 ms, total: 16.3 s\n", + "Wall time: 16.3 s\n" + ] + } + ], + "source": [ + "road_edges_cpu = road_edges.to_pandas()\n", + "%time G_cpu = nx.convert_matrix.from_pandas_edgelist(road_edges_cpu, source='src_id', target='dst_id', edge_attr='length')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reindex `road_nodes` ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For efficient lookup later, we will reindex `road_nodes` to use the `node_id` as its index - remember, we will typically get results from the graph analytics in terms of `node_id`s, so this lets us easily pull other information about the nodes (like their locations). We then sort the dataframe on this new index:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 260 ms, sys: 12 ms, total: 272 ms\n", + "Wall time: 272 ms\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " east north type\n", + "node_id \n", + "id000000F5-5180-4C03-B05D-B01352C54F89 432920.250 572547.4375 road end\n", + "id000003F8-9E09-4829-AD87-6DA4438D22D8 526616.375 189678.3906 junction\n", + "id000010DA-C89A-4198-847A-6E62815E038A 336879.000 731824.0000 junction\n", + "id000017A0-1843-4BC7-BCF7-C943B6780839 380635.000 390153.0000 junction\n", + "id00001B2A-155F-4CD3-8E06-7677ADC6AF74 337481.000 350509.7188 junction" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_nodes = road_nodes.set_index('node_id', drop=True)\n", + "%time road_nodes = road_nodes.sort_index()\n", + "road_nodes.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Analyzing the Graph ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have created the graph we can analyze the number of nodes and edges in it:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3078117" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "G.number_of_nodes()" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1875790" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "G.number_of_edges()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the number of edges is smaller than the number of rows in `road_edges` printed above--the original data came from map tiles, and roads that passed over the edge of a tile were listed in both tiles, so cuGraph de-duplicated them. If we were creating a `MultiGraph` or `MultiDiGraph`--a graph that can have multiple edges in the same direction between nodes--then duplicates could be preserved." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also analyze the degrees of our graph nodes:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "deg_df = G.degree()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In an undirected graph, every edge entering a node is simultaneously an edge leaving the node, so we expect the nodes to have a minimum degree of 2:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "mean 4.689990\n", + "std 1.913452\n", + "min 2.000000\n", + "25% 2.000000\n", + "50% 6.000000\n", + "75% 6.000000\n", + "max 16.000000\n", + "Name: degree, dtype: float64" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "deg_df['degree'].describe()[1:]" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "count 3.078117e+06\n", + "mean 4.689990e+00\n", + "std 1.913452e+00\n", + "min 2.000000e+00\n", + "25% 2.000000e+00\n", + "50% 6.000000e+00\n", + "75% 6.000000e+00\n", + "max 1.600000e+01\n", + "Name: degree, dtype: float64" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "deg_df['degree'].describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You will spend more time using this GPU-accelerated graph later in the workshop." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Construct a Graph of Roads with Time Weights ##" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this series of exercises, you are going to construct and analyze a new graph of Great Britain's roads using the techniques just demonstrated, but this time, instead of using raw distance for the edges' weights, you will be using the time it will take to travel between the two nodes at a notional speed limit.\n", + "\n", + "You will be beginning this exercise with the `road_edges` dataframe from earlier:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "src_id object\n", + "dst_id object\n", + "length int64\n", + "type object\n", + "form object\n", + "dtype: object" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Road Type to Speed Conversion ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to calculate how long it should take to travel along a road, we need to know its speed limit. 
We will do this by utilizing `road_edges['type']`, along with rules for the speed limits for each type of road.\n", + "\n", + "Here are the unique types of roads in our data:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Restricted Local Access Road\n", + "1 Local Road\n", + "2 B Road\n", + "3 Secondary Access Road\n", + "4 Minor Road\n", + "5 A Road\n", + "6 Local Access Road\n", + "7 Motorway\n", + "Name: type, dtype: object" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges['type'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And here is a table with assumptions about speed limits we can use for our conversion:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "# https://www.rac.co.uk/drive/advice/legal/speed-limits/\n", + "# Technically, speed limits depend on whether a road is in a built-up area and the form of carriageway,\n", + "# but we can use road type as a proxy for built-up areas.\n", + "# Values are in mph.\n", + "\n", + "speed_limits = {'Motorway': 70,\n", + " 'A Road': 60,\n", + " 'B Road': 60,\n", + " 'Local Road': 30,\n", + " 'Local Access Road': 30,\n", + " 'Minor Road': 30,\n", + " 'Restricted Local Access Road': 30,\n", + " 'Secondary Access Road': 30}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by creating `speed_gdf` to store each road type with its speed limit:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " type limit_mph\n", + "0 Motorway 70\n", + "1 A Road 60\n", + "2 B Road 60\n", + "3 Local Road 30\n", + "4 Local Access Road 30\n", + "5 Minor Road 30\n", + "6 Restricted Local Access Road 30\n", + "7 Secondary Access Road 30" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "speed_gdf = cudf.DataFrame()\n", + "\n", + "speed_gdf['type'] = speed_limits.keys()\n", + "speed_gdf['limit_mph'] = [speed_limits[key] for key in speed_limits.keys()]\n", + "speed_gdf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we add an additional column, `limit_m/s`, which for each road type will give us a measure of how fast one can travel on it in meters / second." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " type limit_mph limit_m/s\n", + "0 Motorway 70 31.292722\n", + "1 A Road 60 26.822333\n", + "2 B Road 60 26.822333\n", + "3 Local Road 30 13.411167\n", + "4 Local Access Road 30 13.411167\n", + "5 Minor Road 30 13.411167\n", + "6 Restricted Local Access Road 30 13.411167\n", + "7 Secondary Access Road 30 13.411167" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We will have road distances in meters (m), so to get road distances in seconds (s), we need to multiply by meters/mile and divide by seconds/hour\n", + "# 1 mile ~ 1609.34 m\n", + "speed_gdf['limit_m/s'] = speed_gdf['limit_mph'] * 1609.34 / 3600\n", + "speed_gdf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1: Merge `speed_gdf` into `road_edges` ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "cuDF provides merging functionality just like Pandas. Since we will be using values in `road_edges` to construct our graph, we need to merge `speed_gdf` into `road_edges` (similar to a database join). You can merge on the `type` column, which both of these dataframes share." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 54.4 ms, sys: 4.66 ms, total: 59 ms\n", + "Wall time: 58.1 ms\n" + ] + } + ], + "source": [ + "%time road_edges = road_edges.merge(speed_gdf, on='type')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercse #2 - Step 2: Add Length in Seconds Column ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You now need to calculate the number of seconds it will take to traverse a given road at the speed limit. This can be done by dividing a road's length in m by its speed limit in m/s. 
Perform this calculation on `road_edges` and store the results in a new column `length_s`." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "road_edges['length_s'] = road_edges['length'] / road_edges['limit_m/s']" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   src_id                                  dst_id                                  length  type        form                limit_mph  limit_m/s  length_s
0  idC70BECA6-F99E-47BB-9E09-815D611F5D12  idDB390BD6-EACC-4FD7-AA3E-933DF4B38529     200  Minor Road  Single Carriageway         30  13.411167  14.912946
1  id6584F688-B7D8-421D-AF57-819DD952B691  id7CE6BA3A-A498-4790-B881-4A64B2280AA0     115  Local Road  Single Carriageway         30  13.411167   8.574944
2  idB5B2189B-EB23-4CAA-A5E7-FB67AD622B0A  id2EEE7A72-9DF6-4013-98CE-458037738C04      14  Local Road  Single Carriageway         30  13.411167   1.043906
3  idC4093DE6-6D9A-49DE-9F9B-1FC25704D669  id95067828-6B99-46BD-AF69-4BE4A9187DA6      67  Local Road  Single Carriageway         30  13.411167   4.995837
4  idCCF6E708-B31E-48EC-8412-F0669885BB40  idB5B2189B-EB23-4CAA-A5E7-FB67AD622B0A      51  Local Road  Single Carriageway         30  13.411167   3.802801
\n", + "
" + ], + "text/plain": [ + " src_id \\\n", + "0 idC70BECA6-F99E-47BB-9E09-815D611F5D12 \n", + "1 id6584F688-B7D8-421D-AF57-819DD952B691 \n", + "2 idB5B2189B-EB23-4CAA-A5E7-FB67AD622B0A \n", + "3 idC4093DE6-6D9A-49DE-9F9B-1FC25704D669 \n", + "4 idCCF6E708-B31E-48EC-8412-F0669885BB40 \n", + "\n", + " dst_id length type \\\n", + "0 idDB390BD6-EACC-4FD7-AA3E-933DF4B38529 200 Minor Road \n", + "1 id7CE6BA3A-A498-4790-B881-4A64B2280AA0 115 Local Road \n", + "2 id2EEE7A72-9DF6-4013-98CE-458037738C04 14 Local Road \n", + "3 id95067828-6B99-46BD-AF69-4BE4A9187DA6 67 Local Road \n", + "4 idB5B2189B-EB23-4CAA-A5E7-FB67AD622B0A 51 Local Road \n", + "\n", + " form limit_mph limit_m/s length_s \n", + "0 Single Carriageway 30 13.411167 14.912946 \n", + "1 Single Carriageway 30 13.411167 8.574944 \n", + "2 Single Carriageway 30 13.411167 1.043906 \n", + "3 Single Carriageway 30 13.411167 4.995837 \n", + "4 Single Carriageway 30 13.411167 3.802801 " + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "road_edges.head()" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "\n", + "road_edges['length_s'] = road_edges['length'] / road_edges['limit_m/s']\n", + "road_edges['length_s'].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click ... for solution. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise #3 - Step 3: Construct the Graph ###" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Construct a cuGraph `Graph` called `G_ex` using the sources and destinations found in `road_edges`, along with length-in-seconds values for the edges' weights." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "G_ex = cg.Graph()\n", + "G_ex.from_cudf_edgelist(road_edges, source='src_id', destination='dst_id', edge_attr='length_s')" + ] + }, + { + "cell_type": "raw", + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "source": [ + "\n", + "G_ex = cg.Graph()\n", + "G_ex.from_cudf_edgelist(road_edges, source='src_id', destination='dst_id', edge_attr='length_s')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click ... for solution. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import IPython\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Well Done!** Let's move to the [next notebook](2-03_cugraph.ipynb). " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} -- cgit v1.2.3
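The notebook's time-weight calculation (convert each speed limit from mph to m/s, then divide road length in meters by that speed) can be checked by hand without a GPU. The sketch below is plain Python rather than cuDF, and every name in it (`mph_to_mps`, `travel_time_s`, `speed_limits_mph`, `sample_edges`) is illustrative and not part of the notebook. It assumes the same 1609.34 m per mile approximation the notebook uses and takes its sample values from the `road_edges.head()` output.

```python
# Plain-Python sketch of the unit conversion and travel-time step.
# All names below are illustrative and are not part of the notebook.

METERS_PER_MILE = 1609.34   # same approximation the notebook uses
SECONDS_PER_HOUR = 3600


def mph_to_mps(limit_mph):
    """Convert a speed limit from miles per hour to meters per second."""
    return limit_mph * METERS_PER_MILE / SECONDS_PER_HOUR


def travel_time_s(length_m, limit_mph):
    """Seconds needed to cover length_m meters at limit_mph."""
    return length_m / mph_to_mps(limit_mph)


# Speed limits per road type, copied from the speed table in the notebook.
speed_limits_mph = {"Motorway": 70, "A Road": 60, "Minor Road": 30, "Local Road": 30}

# (length in meters, road type) for the first three rows of road_edges.head().
sample_edges = [(200, "Minor Road"), (115, "Local Road"), (14, "Local Road")]

# Looking up the limit by road type plays the role of the merge on 'type'.
travel_times = [
    travel_time_s(length_m, speed_limits_mph[road_type])
    for length_m, road_type in sample_edges
]

for (length_m, road_type), seconds in zip(sample_edges, travel_times):
    print(f"{road_type:<12} {length_m:>4} m -> {seconds:.6f} s")
```

The dictionary lookup stands in for the merge on `type`, and the resulting times are what the `length_s` column holds, which is why the graph in Exercise #3 should be weighted by `length_s` rather than `length`.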