{ "cells": [ { "cell_type": "markdown", "id": "2d190a78-7253-4fad-9d9c-6b4fb33c8bf2", "metadata": { "tags": [] }, "source": [ "" ] }, { "cell_type": "markdown", "id": "8a2c4abf-6278-4edd-83f8-f0afac4c834f", "metadata": {}, "source": [ "# Fundamentals of Accelerated Data Science #" ] }, { "cell_type": "markdown", "id": "e1e78ef4-c0de-433e-8616-bd946f69d30e", "metadata": {}, "source": [ "## 04 - cuGraph as a NetworkX backend ##" ] }, { "cell_type": "markdown", "id": "0828e0b4-7935-4b77-95ef-e06b72f0319e", "metadata": {}, "source": [ "**Table of Contents**\n", "
\n", "This notebook introduces the various methods of utilizing the cuGraph backend for NetworkX and runs centrality algorithms on the dataset. This notebook covers the below sections:\n", "1. [Background](#Background)\n", "2. [Installation](#Installation)\n", "3. [Utilizing nx-cugraph](#Utilizing-nx-cugraph)\n", " * [Runtime Environment Variable](#Runtime-Environment-Variable)\n", " * [Backend Keyword Argument](#Backend-Keyword-Argument)\n", " * [Type-Based Dispatching](#Type-Based-Dispatching)\n", "4. [Computing Centrality](#Computing-Centrality)\n", " * [Creating Graph](#Creating-Graph)\n", " * [Running Centrality Algorithms](#Running-Centrality-Algorithms)\n", " * [Betweenness Centrality](#Betweenness-Centrality)\n", " * [Degree Centrality](#Degree-Centrality)\n", " * [Katz Centrality](#Katz-Centrality)\n", " * [Pagerank Centrality](#Pagerank-Centrality)\n", " * [Eigenvector Centrality](#Eigenvector-Centrality)\n", " * [Visualize Results](#Visualize-Results)\n", " * [Exercise #1 - Type Dispatch](#Exercise-#1---Type-Dispatch)" ] }, { "cell_type": "markdown", "id": "c57b79ba-c7c7-49d2-9e21-c388bbe6ca98", "metadata": {}, "source": [ "## Background ##\n", "RAPIDS recently introduced a new backend to NetworkX called nx-cugraph. With this backend, you can automatically accelerate supported algorithms. In this notebook, we will cover the various methods of enabling the cugraph backend, and use the backend to run different centrality algorithms." ] }, { "cell_type": "code", "execution_count": null, "id": "7375dbf9", "metadata": {}, "outputs": [], "source": [ "# https://networkx.org/documentation/stable/reference/configs.html\n", "\n", "# nx.config.backend_priority = [\"cugraph\", \"..\"]\n", "# env NETWORKX_BACKEND_PRIORITY=\"cugraph,..\"" ] }, { "cell_type": "markdown", "id": "697ea4c9-b416-43d5-9d2c-28aa41ef2561", "metadata": {}, "source": [ "## Installation ##\n", "We have already prepared the environment with nx-cugraph installed. When you are using your own environment, below is the command for installation. " ] }, { "cell_type": "code", "execution_count": 1, "id": "8bce324e-ec19-4d87-96e5-bc134ab53baa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://pypi.org/simple, https://pypi.anaconda.org/rapidsai-wheels-nightly/simple\n", "Requirement already satisfied: nx-cugraph-cu12 in /opt/conda/lib/python3.10/site-packages (25.4.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.\u001b[0m\u001b[33m\n", "\u001b[0m" ] } ], "source": [ "!pip install nx-cugraph-cu12 --no-deps --extra-index-url https://pypi.anaconda.org/rapidsai-wheels-nightly/simple" ] }, { "cell_type": "markdown", "id": "a9ea09f4-6c93-4785-bcc3-44c6f040dfc6", "metadata": {}, "source": [ "## Utilizing nx-cugraph ##\n", "There are 3 ways to utilize nx-cugraph\n", "\n", "1. **Environment Variable at Runtime**\n", "2. **Backend keyword argument**\n", "3. **Type-Based dispatching**\n", "\n", "Let's dig a little deeper in to each of these methods." ] }, { "cell_type": "markdown", "id": "8b4322fd-9f56-4cbc-a00c-8fac4b2b2fe1", "metadata": {}, "source": [ "### Runtime Environment Variable ###\n", "The NETWORKX_AUTOMATIC_BACKENDS environment variable can be used to have NetworkX automatically dispatch to specified backends. Set NETWORKX_AUTOMATIC_BACKENDS=cugraph to use nx-cugraph to GPU accelerate supported APIs with no code changes. We will also be loading the cuDF pandas module to accelerate csv loading." ] }, { "cell_type": "code", "execution_count": 2, "id": "b41fef7f-5d43-4481-98a7-d9f3cb54066c", "metadata": {}, "outputs": [], "source": [ "!NETWORKX_AUTOMATIC_BACKENDS=cugraph python -m cudf.pandas scripts/networkx.py" ] }, { "cell_type": "markdown", "id": "5ffb6c4b-a03a-4bfb-9b92-14c59e6dcd75", "metadata": {}, "source": [ "### Backend Keyword Argument ###\n", "NetworkX also supports explicitly specifying a particular backend for supported APIs with the backend= keyword argument. This argument takes precedence over the NETWORKX_AUTOMATIC_BACKENDS environment variable. This method also requires that the specified backend already be installed." ] }, { "cell_type": "code", "execution_count": 3, "id": "8183ecc7-8544-4914-8c07-c904ba12225a", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "%load_ext cudf.pandas\n", "import networkx as nx\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Load the CSV file\n", "road_graph = pd.read_csv('./data/road_graph.csv', dtype=['int32', 'int32', 'float32'], nrows=1000)\n", "\n", "# Create an empty graph\n", "G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length')\n", "b = nx.betweenness_centrality(G, k=1000, backend=\"cugraph\")" ] }, { "cell_type": "markdown", "id": "e588aa65-6281-4c19-a51c-42f044636ac0", "metadata": {}, "source": [ "### Type-Based Dispatching ###\n", "For users wanting to ensure a particular behavior, without the potential for runtime conversions, NetworkX offers type-based dispatching. To utilize this method, users must import the desired backend and create a Graph instance for it." ] }, { "cell_type": "code", "execution_count": 4, "id": "5fea9300-8d75-443a-9ec0-ee65c8ccaf0f", "metadata": {}, "outputs": [], "source": [ "import networkx as nx\n", "import nx_cugraph as nxcg\n", "\n", "# Loading data from previous cell\n", "G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length') \n", "\n", "nxcg_G = nxcg.from_networkx(G) # conversion happens once here\n", "b = nx.betweenness_centrality(nxcg_G, k=1000) # nxcg Graph type causes cugraph backend to be used, no conversion necessary" ] }, { "cell_type": "markdown", "id": "cb5a17e1-d886-4d20-8d4b-ce900280279c", "metadata": {}, "source": [ "## Computing Centrality ##\n", "Now that we learned how to enable nx-cugraph, let's try to use it in a workflow! We will be using the backend argument for this example. First let's create a graph." ] }, { "cell_type": "markdown", "id": "19bea37c-bccf-4815-81bd-aa1de553812d", "metadata": {}, "source": [ "### Creating Graph ###" ] }, { "cell_type": "code", "execution_count": 5, "id": "2b4420d7-7c89-4914-809f-4e323a12f47f", "metadata": {}, "outputs": [], "source": [ "# Create a graph from already loaded dataframe\n", "G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length')" ] }, { "cell_type": "markdown", "id": "7dc1ad5b-8454-4277-9568-0cdacbebd9f1", "metadata": {}, "source": [ "### Running Centrality Algorithms ###\n", "Now, let's run the various centrality algorithms!" ] }, { "cell_type": "markdown", "id": "1c52b7b3-6c23-45be-9ace-34a667f132aa", "metadata": {}, "source": [ "### Betweenness Centrality ###\n", "Quantifies the number of times a node acts as a bridge along the shortest path between two other nodes, highlighting its importance in information flow\n", "\n", "For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices, that is, there exists at least one path such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "281374af-c7cf-4592-a34d-796c1158dab6", "metadata": {}, "outputs": [], "source": [ "b = nx.betweenness_centrality(G, backend=\"cugraph\")" ] }, { "cell_type": "markdown", "id": "f98b2975-1f72-4bff-83c7-ace7aab65d98", "metadata": {}, "source": [ "### Degree Centrality ###\n", "Measures the number of direct connections a node has, indicating how well-connected it is within the network" ] }, { "cell_type": "code", "execution_count": 7, "id": "3e0c4460-6d25-4a2b-8b8f-8f8c6ef617b0", "metadata": {}, "outputs": [], "source": [ "d = nx.degree_centrality(G, backend=\"cugraph\")" ] }, { "cell_type": "markdown", "id": "0665a659-16b1-48b4-b3bb-9aa5659ef91c", "metadata": {}, "source": [ "### Katz Centrality ###\n", "Measures a node's centrality based on its global influence in the network, considering both direct and indirect connections\n", "\n", "\n", "Katz centrality measures influence by taking into account the total number of walks between a pair of actors" ] }, { "cell_type": "code", "execution_count": 8, "id": "8ce418d2-9eda-40bc-9733-b82d8d7556b1", "metadata": {}, "outputs": [], "source": [ "k = nx.katz_centrality(G, backend=\"cugraph\")" ] }, { "cell_type": "markdown", "id": "0712cedb-87ba-4a08-a74d-24997d02a636", "metadata": {}, "source": [ "### Pagerank Centrality ###\n", "Determines a node's importance based on the quantity and quality of links to it, similar to Google's original PageRank algorithm\n", "\n", "PageRank’s main difference from EigenCentrality is that it accounts for link direction. Each node in a network is assigned a score based on its number of incoming links (its ‘indegree’). These links are also weighted depending on the relative score of its originating node.\n", "\n", "1/n" ] }, { "cell_type": "code", "execution_count": null, "id": "a17ee15b-8758-484b-82b9-a158187231c5", "metadata": {}, "outputs": [], "source": [ "p = nx.pagerank(G, max_iter=10, tol=1.0e-3, backend=\"cugraph\")" ] }, { "cell_type": "markdown", "id": "c5f57a5e-95e4-47f7-a9ec-04a99fa2c1dc", "metadata": {}, "source": [ "### Eigenvector Centrality ###\n", "Assigns scores to nodes based on the principle that connections to high-scoring nodes contribute more to the node's own score than connections to low-scoring nodes\n", "\n", "connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. A high eigenvector score means that a node is connected to many nodes who themselves have high scores." ] }, { "cell_type": "code", "execution_count": 12, "id": "3eb1e358-ae8e-4399-bf45-90616b663e9d", "metadata": {}, "outputs": [], "source": [ "e = nx.eigenvector_centrality(G, max_iter=1000, tol=1.0e-3, backend=\"cugraph\")" ] }, { "cell_type": "markdown", "id": "0bc9178c-e66a-4c75-bf91-0c5d668b5634", "metadata": {}, "source": [ "### Visualize Results ###\n", "Now let's visualize results! We will only display the top 5 rows for readibility. " ] }, { "cell_type": "code", "execution_count": 13, "id": "69b6c23d-78a0-4dbb-be19-913ad180fe94", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Degree
vertexdegree_centrality
240.002847
720.002847
860.002847
1270.002847
1330.002847
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Betweenness
vertexbetweenness_centrality
2220.000007
3810.000007
240.000006
720.000006
860.000006
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Katz
vertexkatz_centrality
240.033058
720.033058
860.033058
1270.033058
1330.033058
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PageRank
vertexpagerank
240.002525
720.002525
860.002525
1270.002525
1330.002525
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EigenVector
vertexeigenvector_centrality
240.064086
720.064086
860.064086
1270.064086
1330.064086
\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display_html\n", "dc_top = pd.DataFrame(sorted(d.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"degree_centrality\"])\n", "bc_top = pd.DataFrame(sorted(b.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"betweenness_centrality\"])\n", "katz_top = pd.DataFrame(sorted(k.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"katz_centrality\"])\n", "pr_top = pd.DataFrame(sorted(p.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"pagerank\"])\n", "ev_top = pd.DataFrame(sorted(e.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"eigenvector_centrality\"])\n", "\n", "df1_styler = dc_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Degree').hide(axis='index')\n", "df2_styler = bc_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Betweenness').hide(axis='index')\n", "df3_styler = katz_top.style.set_table_attributes(\"style='display:inline'\").set_caption('Katz').hide(axis='index')\n", "df4_styler = pr_top.style.set_table_attributes(\"style='display:inline'\").set_caption('PageRank').hide(axis='index')\n", "df5_styler = ev_top.style.set_table_attributes(\"style='display:inline'\").set_caption('EigenVector').hide(axis='index')\n", "\n", "display_html(df1_styler._repr_html_()+df2_styler._repr_html_()+df3_styler._repr_html_()+df4_styler._repr_html_()+df5_styler._repr_html_(), raw=True)" ] }, { "cell_type": "markdown", "id": "1a653ca9-9448-4ba5-85b2-f6c885c273a9", "metadata": {}, "source": [ "### Exercise #1 - Type Dispatch ###\n", "Use the type dispatching method to obtain pagerank centrality results with the cugraph backend." ] }, { "cell_type": "code", "execution_count": 17, "id": "6eb90078-1479-4847-97b7-eb119e9d5478", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
vertexpagerank
0240.002525
1720.002525
2860.002525
31270.002525
41330.002525
\n", "
" ], "text/plain": [ " vertex pagerank\n", "0 24 0.002525\n", "1 72 0.002525\n", "2 86 0.002525\n", "3 127 0.002525\n", "4 133 0.002525" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = nx.pagerank(G, max_iter=10, tol=1.0e-3, backend=\"cugraph\")\n", "\n", "pd.DataFrame(sorted(p.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"pagerank\"])" ] }, { "cell_type": "raw", "id": "1771c759-fbae-4831-b9b9-663514e460d7", "metadata": {}, "source": [ "\n", "import networkx as nx\n", "import nx_cugraph as nxcg\n", "\n", "# Loading data from previous cell\n", "G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length') \n", "\n", "nxcg_G = nxcg.from_networkx(G) # conversion happens once here\n", "p = nx.pagerank(nxcg_G, max_iter=10, tol=1.0e-3) # nxcg Graph type causes cugraph backend to be used, no conversion necessary\n", "\n", "pd.DataFrame(sorted(p.items(), key=lambda x:x[1], reverse=True)[:5], columns=[\"vertex\", \"pagerank\"])" ] }, { "cell_type": "markdown", "id": "bcdee8d8-c8e0-4521-b7ec-1f85f014e3ca", "metadata": {}, "source": [ "Click ... for solution. " ] }, { "cell_type": "code", "execution_count": null, "id": "d70c78b7-551d-4d9e-b428-32b26adcd3c4", "metadata": {}, "outputs": [], "source": [ "import IPython\n", "app = IPython.Application.instance()\n", "app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "id": "2279fdf1-82c0-4c6e-ac8e-b952f4777562", "metadata": {}, "source": [ "**Well Done!** " ] }, { "cell_type": "markdown", "id": "3fbc12b2-585c-48a9-a176-b2572040d378", "metadata": { "tags": [] }, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 5 }