<img src="./images/DLI_Header.png" width=400/>

# Fundamentals of Accelerated Data Science #

## 04 - cuGraph as a NetworkX backend  ##

**Table of Contents**
<br>
This notebook introduces the ways to use the cuGraph backend for NetworkX and then runs centrality algorithms on a road network dataset. It covers the following sections:
1. [Background](#Background)
2. [Installation](#Installation)
3. [Utilizing nx-cugraph](#Utilizing-nx-cugraph)
    * [Runtime Environment Variable](#Runtime-Environment-Variable)
    * [Backend Keyword Argument](#Backend-Keyword-Argument)
    * [Type-Based Dispatching](#Type-Based-Dispatching)
4. [Computing Centrality](#Computing-Centrality)
    * [Creating Graph](#Creating-Graph)
    * [Running Centrality Algorithms](#Running-Centrality-Algorithms)
    * [Betweenness Centrality](#Betweenness-Centrality)
    * [Degree Centrality](#Degree-Centrality)
    * [Katz Centrality](#Katz-Centrality)
    * [Pagerank Centrality](#Pagerank-Centrality)
    * [Eigenvector Centrality](#Eigenvector-Centrality)
    * [Visualize Results](#Visualize-Results)
    * [Exercise #1 - Type Dispatch](#Exercise-#1---Type-Dispatch)

## Background ##
RAPIDS recently introduced nx-cugraph, a GPU-accelerated backend for NetworkX. With this backend, supported NetworkX algorithms can run on the GPU without rewriting your code. In this notebook, we will cover the ways to enable the cugraph backend and use it to run different centrality algorithms.

In [None]:
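# Newer NetworkX releases can also set backend priority in code or through an environment variable:
# https://networkx.org/documentation/stable/reference/configs.html

# nx.config.backend_priority = ["cugraph", ".."]
# env NETWORKX_BACKEND_PRIORITY="cugraph,.."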

## Installation ##
This environment already has nx-cugraph installed. To install it in your own environment, run the command below, which installs the CUDA 12 build (`nx-cugraph-cu12`) from the RAPIDS nightly package index.

In [1]:
!pip install nx-cugraph-cu12 --no-deps --extra-index-url https://pypi.anaconda.org/rapidsai-wheels-nightly/simple

Looking in indexes: https://pypi.org/simple, https://pypi.anaconda.org/rapidsai-wheels-nightly/simple

## Utilizing nx-cugraph ##
There are three ways to use nx-cugraph:

1. **Runtime environment variable**
2. **Backend keyword argument**
3. **Type-based dispatching**

Let's dig a little deeper into each of these methods.

### Runtime Environment Variable ###
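The `NETWORKX_AUTOMATIC_BACKENDS` environment variable tells NetworkX to dispatch supported functions to the listed backends automatically (newer NetworkX releases prefer `NETWORKX_BACKEND_PRIORITY`). Setting `NETWORKX_AUTOMATIC_BACKENDS=cugraph` GPU-accelerates every API that nx-cugraph supports with no code changes; unsupported APIs still run on the default NetworkX implementation. The command below also runs the script under the `cudf.pandas` module to accelerate CSV loading.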

In [2]:
!NETWORKX_AUTOMATIC_BACKENDS=cugraph python -m cudf.pandas scripts/networkx.py

### Backend Keyword Argument ###
NetworkX also lets you name a backend explicitly for supported APIs with the `backend=` keyword argument. This argument takes precedence over the `NETWORKX_AUTOMATIC_BACKENDS` environment variable, and the named backend must already be installed.

In [3]:
import warnings
warnings.filterwarnings('ignore')

%load_ext cudf.pandas
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt

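# Load the first 1000 edges (src, dst, length) from the CSV file.
# A dict maps each column to its dtype, which both cuDF and pandas accept.
road_graph = pd.read_csv('./data/road_graph.csv',
                         dtype={'src': 'int32', 'dst': 'int32', 'length': 'float32'},
                         nrows=1000)

# Build an undirected NetworkX graph from the edge list, keeping length as an edge attribute
G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length')
# backend="cugraph" runs this call on the GPU; k=1000 estimates betweenness from sampled source nodes
b = nx.betweenness_centrality(G, k=1000, backend="cugraph")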

### Type-Based Dispatching ###
If you want to guarantee that a particular backend is used, without converting the graph every time an algorithm is called, NetworkX offers type-based dispatching. Import the desired backend and convert your graph to that backend's graph type once; NetworkX then routes every supported call on that graph to the backend with no further conversion.

In [4]:
import networkx as nx
import nx_cugraph as nxcg

# Reuse the road_graph DataFrame loaded in the previous cell
G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length')

nxcg_G = nxcg.from_networkx(G)             # conversion happens once here
b = nx.betweenness_centrality(nxcg_G, k=1000)  # nxcg Graph type causes cugraph backend to be used, no conversion necessary
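
The three mechanisms above can be summarized with a toy model of how a dispatcher might pick a backend. This is a simplified, hypothetical sketch, not NetworkX's dispatch code: `ToyGraph`, `SUPPORTED`, and `choose_backend` are illustrative names, and the real dispatcher handles more cases (several input graphs, conversion caching, configuration). It assumes the precedence described in this notebook: an explicit `backend=` argument wins, a graph that already belongs to a backend (such as `nxcg_G`) stays on that backend, and a plain NetworkX graph is tried against the automatic backends in order before falling back to NetworkX itself.

```python
from dataclasses import dataclass


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a graph object.

    Real backend graph types advertise their backend through a class attribute
    (__networkx_backend__ in recent NetworkX versions); `backend` plays that role here.
    """
    backend: str = "networkx"


# Hypothetical table of the algorithms each backend implements.
SUPPORTED = {
    "networkx": {"betweenness_centrality", "pagerank", "some_rare_algorithm"},
    "cugraph": {"betweenness_centrality", "pagerank"},
}


def choose_backend(algorithm, graph, backend=None, automatic_backends=()):
    """Return the name of the backend that would run `algorithm` (simplified model)."""
    # 1. An explicit backend= keyword argument takes precedence.
    if backend is not None:
        if algorithm not in SUPPORTED.get(backend, set()):
            raise NotImplementedError(f"{algorithm} is not implemented by {backend}")
        return backend
    # 2. Type-based dispatch: a backend graph stays on its own backend.
    if graph.backend != "networkx":
        if algorithm not in SUPPORTED.get(graph.backend, set()):
            raise NotImplementedError(f"{algorithm} is not implemented by {graph.backend}")
        return graph.backend
    # 3. Automatic dispatch (e.g. configured by an environment variable), tried in order.
    for name in automatic_backends:
        if algorithm in SUPPORTED.get(name, set()):
            return name
    # 4. Otherwise run the default NetworkX implementation.
    return "networkx"
```

For example, with `automatic_backends=["cugraph"]` a plain graph is routed to `"cugraph"` for `pagerank`, while an algorithm missing from the hypothetical `SUPPORTED["cugraph"]` set falls back to `"networkx"`.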

## Computing Centrality ##
Now that we have learned how to enable nx-cugraph, let's use it in a workflow! We will use the `backend` keyword argument for this example. First, let's create a graph.

### Creating Graph ###

In [5]:
# Create a graph from already loaded dataframe
G = nx.from_pandas_edgelist(road_graph, source='src', target='dst', edge_attr='length')

### Running Centrality Algorithms ###
Now, let's run the various centrality algorithms!

### Betweenness Centrality ###
Quantifies how often a node lies on the shortest paths between pairs of other nodes, highlighting its importance to the flow of information through the network.

For every pair of vertices in a connected graph, there exists at least one shortest path between them: a path that minimizes either the number of edges it passes through (for unweighted graphs) or the sum of its edge weights (for weighted graphs). The betweenness centrality of a node is the fraction of shortest paths between each pair of other nodes that pass through it, summed over all such pairs. The call below does not pass `weight`, so every edge counts as length 1.

In [None]:
b = nx.betweenness_centrality(G, backend="cugraph")
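
For intuition about what this call computes, here is a pure-Python sketch of Brandes' algorithm for exact betweenness on a small unweighted, undirected graph. It is not nx-cugraph's GPU code; the adjacency-dict input format and the function name are assumptions made for the example. It runs one breadth-first search per source node to count shortest paths, then walks back from the farthest nodes to accumulate each node's share of those paths. The final scaling is intended to mirror NetworkX's default `normalized=True` for undirected graphs. The `k` argument used in earlier cells estimates the same quantity by starting the search from only `k` sampled source nodes.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact, normalized betweenness for an unweighted, undirected graph.

    adj maps each node to a list of its neighbors (a hypothetical input format).
    """
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Visit nodes farthest-first, pushing each node's dependency to its predecessors.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted from both endpoints, and there are
    # (n - 1)(n - 2) / 2 pairs that exclude a given node, so scale by 1 / ((n - 1)(n - 2)).
    n = len(nodes)
    if n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
        bc = {v: c * scale for v, c in bc.items()}
    return bc
```

On the path 0-1-2, the middle node lies on the only shortest path between the two ends, so it scores 1.0 and the ends score 0.0.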

### Degree Centrality ###
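Measures the number of direct connections a node has, indicating how well connected it is within the network. NetworkX normalizes each degree by the largest possible degree, n - 1, where n is the number of nodes.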

In [7]:
d = nx.degree_centrality(G, backend="cugraph")
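
Degree centrality is simple enough to compute by hand. A minimal sketch, assuming a simple undirected graph (no self-loops or parallel edges) stored in a hypothetical adjacency-dict format: each node's degree is divided by n - 1, the largest degree possible in a graph with n nodes, which is the normalization NetworkX's `degree_centrality` uses.

```python
def degree_centrality(adj):
    """Degree of each node divided by n - 1 for a simple undirected graph.

    adj maps each node to a list of its neighbors (a hypothetical input format).
    """
    n = len(adj)
    if n <= 1:
        # A graph with at most one node: NetworkX also returns 1 here.
        return {v: 1.0 for v in adj}
    scale = 1.0 / (n - 1)
    return {v: len(neighbors) * scale for v, neighbors in adj.items()}
```

On a four-node star, the center is adjacent to every other node and scores 1.0, while each leaf scores 1/3.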

### Katz Centrality ###
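Measures a node's influence across the whole network, considering both direct and indirect connections.

Katz centrality takes into account the total number of walks between pairs of nodes, weighting each walk of length k by `alpha` to the power k (`alpha` is a parameter of `nx.katz_centrality`, 0.1 by default), so distant connections contribute less than direct ones.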

In [8]:
k = nx.katz_centrality(G, backend="cugraph")
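
A pure-Python sketch of the same idea, assuming an undirected graph in a hypothetical adjacency-dict format: iterate x = alpha * A x + beta until the scores stop changing, then scale them to unit Euclidean length. NetworkX's `katz_centrality` uses this kind of fixed-point iteration with the same defaults (`alpha=0.1`, `beta=1.0`) and normalizes the same way when `normalized=True`; this sketch is for intuition, not a reproduction of its code.

```python
import math


def katz_centrality(adj, alpha=0.1, beta=1.0, max_iter=1000, tol=1.0e-6):
    """Fixed-point iteration for x = alpha * A x + beta on an undirected graph.

    adj maps each node to a list of its neighbors (a hypothetical input format).
    Convergence requires alpha < 1 / (largest eigenvalue of the adjacency matrix A).
    """
    nodes = list(adj)
    x = dict.fromkeys(nodes, 0.0)
    for _ in range(max_iter):
        x_last = x
        # New score: alpha times the sum of the neighbors' scores, plus beta.
        # After t steps this counts each node's walks of length k < t, weighted by beta * alpha**k.
        x = {v: alpha * sum(x_last[u] for u in adj[v]) + beta for v in nodes}
        if sum(abs(x[v] - x_last[v]) for v in nodes) < len(nodes) * tol:
            # Scale to unit Euclidean length.
            norm = math.sqrt(sum(val * val for val in x.values()))
            return {v: val / norm for v, val in x.items()}
    raise RuntimeError("Katz iteration did not converge within max_iter iterations")
```

In a graph where every node looks the same, such as a four-node cycle, all nodes get the same score (0.5 after normalization); in a star, the center outscores the leaves.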

### Pagerank Centrality ###
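Determines a node's importance from the quantity and quality of the links pointing to it. This is the algorithm Google originally used to rank web pages.

PageRank's main difference from eigenvector centrality is that it accounts for link direction. Each node is assigned a score based on its incoming links (its 'indegree'), and each link is weighted by the score of the node it comes from, shared equally among that node's outgoing links. Every node starts with the same score, 1/n, where n is the number of nodes, and a damping factor (`alpha`, 0.85 by default in NetworkX) blends in a uniform jump to any node. Because `G` is undirected, each edge counts as a link in both directions.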

In [None]:
p = nx.pagerank(G, max_iter=10, tol=1.0e-3, backend="cugraph")
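
The description above translates into a short power iteration. This is a minimal pure-Python sketch, assuming a hypothetical input format that maps every node to the list of nodes it links to; it uses the NetworkX defaults `alpha=0.85`, `max_iter=100`, and `tol=1.0e-6` (the cell above loosens these to `max_iter=10` and `tol=1.0e-3`) and a convergence test in the same spirit as NetworkX's, but it is not NetworkX's or nx-cugraph's code. Nodes with no out-links ("dangling" nodes) hand their score to all nodes evenly, so the scores always sum to 1.

```python
def pagerank(links, alpha=0.85, max_iter=100, tol=1.0e-6):
    """Power-iteration PageRank.

    links maps every node to the list of nodes it links to (a hypothetical input
    format); for an undirected graph, list each edge in both directions.
    """
    nodes = list(links)
    n = len(nodes)
    x = dict.fromkeys(nodes, 1.0 / n)  # every node starts with score 1/n
    for _ in range(max_iter):
        x_last = x
        # Score held by dangling nodes is spread evenly over all nodes, and the
        # damping factor reserves (1 - alpha) for a uniform random jump.
        dangling = sum(x_last[v] for v in nodes if not links[v])
        x = dict.fromkeys(nodes, (1.0 - alpha) / n + alpha * dangling / n)
        for v in nodes:
            out = links[v]
            # Each node splits its current score equally among its out-links.
            for u in out:
                x[u] += alpha * x_last[v] / len(out)
        # Stop once the total change is small (L1 norm, scaled by the node count).
        if sum(abs(x[v] - x_last[v]) for v in nodes) < n * tol:
            return x
    raise RuntimeError("PageRank did not converge within max_iter iterations")
```

On a directed three-node cycle every node ends at 1/3; on the two-node graph `{0: [1], 1: []}`, node 1 collects node 0's link and scores higher.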

### Eigenvector Centrality ###
Assigns scores to nodes on the principle that connections to high-scoring nodes contribute more to a node's own score than equal connections to low-scoring nodes. A high eigenvector score means that a node is connected to many nodes that themselves have high scores.

In [12]:
e = nx.eigenvector_centrality(G, max_iter=1000, tol=1.0e-3, backend="cugraph")
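
The principle above is a fixed point: each score is proportional to the sum of the neighbors' scores, i.e. the scores form the leading eigenvector of the adjacency matrix A. A minimal pure-Python power-iteration sketch, assuming a connected undirected graph in a hypothetical adjacency-dict format, is shown below. It iterates with A + I rather than A; NetworkX's pure-Python implementation uses the same shift, which leaves the eigenvectors unchanged but stops the iteration from oscillating on bipartite graphs such as paths.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1.0e-6):
    """Power iteration on (A + I) for a connected undirected graph.

    adj maps each node to a list of its neighbors (a hypothetical input format).
    """
    nodes = list(adj)
    n = len(nodes)
    x = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        x_last = x
        # Keeping x_last[v] in the sum is the "+ I" shift: every eigenvalue moves
        # up by 1, so the largest one dominates even when the spectrum is symmetric.
        x = {v: x_last[v] + sum(x_last[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in x.values())) or 1.0
        x = {v: val / norm for v, val in x.items()}
        if sum(abs(x[v] - x_last[v]) for v in nodes) < n * tol:
            return x
    raise RuntimeError("eigenvector iteration did not converge within max_iter iterations")
```

On the path 0-1-2 the scores approach 0.5, about 0.7071 (that is, 1/sqrt(2)), and 0.5: the middle node is connected to both others, so it ranks highest.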

### Visualize Results ###
Now let's visualize the results! For readability, we display only the top 5 vertices for each measure.

In [13]:
from IPython.display import display_html

def top5(scores, column):
    """Return the five highest-scoring vertices as a two-column DataFrame."""
    rows = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:5]
    return pd.DataFrame(rows, columns=["vertex", column])

# (scores, column name, table caption) for each centrality measure
results = [
    (d, "degree_centrality", "Degree"),
    (b, "betweenness_centrality", "Betweenness"),
    (k, "katz_centrality", "Katz"),
    (p, "pagerank", "PageRank"),
    (e, "eigenvector_centrality", "Eigenvector"),
]

# Render the five tables side by side as inline HTML
html = ""
for scores, column, caption in results:
    styler = (top5(scores, column).style
              .set_table_attributes("style='display:inline'")
              .set_caption(caption)
              .hide(axis='index'))
    html += styler._repr_html_()
display_html(html, raw=True)

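| vertex | degree_centrality |
|---|---|
| 24 | 0.002847 |
| 72 | 0.002847 |
| 86 | 0.002847 |
| 127 | 0.002847 |
| 133 | 0.002847 |

| vertex | betweenness_centrality |
|---|---|
| 222 | 7e-06 |
| 381 | 7e-06 |
| 24 | 6e-06 |
| 72 | 6e-06 |
| 86 | 6e-06 |

| vertex | katz_centrality |
|---|---|
| 24 | 0.033058 |
| 72 | 0.033058 |
| 86 | 0.033058 |
| 127 | 0.033058 |
| 133 | 0.033058 |

| vertex | pagerank |
|---|---|
| 24 | 0.002525 |
| 72 | 0.002525 |
| 86 | 0.002525 |
| 127 | 0.002525 |
| 133 | 0.002525 |

| vertex | eigenvector_centrality |
|---|---|
| 24 | 0.064086 |
| 72 | 0.064086 |
| 86 | 0.064086 |
| 127 | 0.064086 |
| 133 | 0.064086 |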
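
The visualization code takes each top 5 with `sorted(scores.items(), ...)[:5]`, which sorts every vertex. For large graphs, `heapq.nlargest` from the Python standard library selects the same items while keeping only k candidates in a heap; the Python documentation describes it as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, so ties (which are common in these results) come out in the same order. A minimal sketch, where `top_k` is a hypothetical helper name:

```python
import heapq


def top_k(scores, k=5):
    """Return the k highest-scoring (vertex, score) pairs, best first.

    Equivalent to sorted(scores.items(), key=..., reverse=True)[:k], but avoids
    sorting every vertex.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

`pd.DataFrame(top_k(p), columns=["vertex", "pagerank"])` would then build the same table as the sorted version.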


### Exercise #1 - Type Dispatch ###
Use the type-based dispatching method to compute PageRank centrality with the cugraph backend.

In [17]:
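import nx_cugraph as nxcg

# Convert G to an nx-cugraph graph once; its type makes NetworkX dispatch to the cugraph backend
nxcg_G = nxcg.from_networkx(G)
p = nx.pagerank(nxcg_G, max_iter=10, tol=1.0e-3)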

pd.DataFrame(sorted(p.items(), key=lambda x:x[1], reverse=True)[:5], columns=["vertex", "pagerank"])

|   | vertex | pagerank |
|---|---|---|
| 0 | 24 | 0.002525 |
| 1 | 72 | 0.002525 |
| 2 | 86 | 0.002525 |
| 3 | 127 | 0.002525 |
| 4 | 133 | 0.002525 |


Click ... for solution. 

In [None]:
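import IPython

# Restart the kernel (do_shutdown(True) requests a restart) to free the GPU memory held by this notebook
app = IPython.Application.instance()
app.kernel.do_shutdown(True)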

**Well Done!** 
