
Graph Theory

Graph theory provides the mathematical language for describing relationships between entities. This file covers nodes, edges, adjacency matrices, graph types, degree and connectivity, the graph Laplacian, spectral graph theory, and real-world graph applications. We will go deeper into graphs in the pure computer science chapters.

  • So far in this book, data has lived on regular structures: vectors in \(\mathbb{R}^n\) (chapter 1), matrices as grids of numbers (chapter 2), images as pixel grids (chapter 8), sequences as ordered lists (chapter 7). But many real-world systems are irregular: a social network has no grid structure, a molecule has no left-to-right order, and a road network does not tile neatly into rows and columns.

  • Graphs are the mathematical tool for representing these irregular, relational structures. A graph captures entities (nodes) and relationships (edges) between them. Once data is represented as a graph, we can apply the geometric deep learning principles from file 1 to learn from it.

Nodes, Edges, and Adjacency

  • A graph \(G = (V, E)\) consists of a set of nodes (or vertices) \(V = \{v_1, v_2, \ldots, v_n\}\) and a set of edges \(E \subseteq V \times V\) connecting pairs of nodes.

  • Nodes represent entities: people, atoms, cities, web pages, neurons. Edges represent relationships: friendships, chemical bonds, roads, hyperlinks, synapses.

  • The adjacency matrix \(A\) is the matrix representation of a graph. For a graph with \(n\) nodes, \(A\) is an \(n \times n\) matrix where \(A_{ij} = 1\) if there is an edge from node \(i\) to node \(j\), and \(A_{ij} = 0\) otherwise.

  • For example, a triangle graph (3 nodes, all connected) has:

\[ A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \]

A triangle graph and its adjacency matrix: 1 where edges exist, 0 otherwise

  • The diagonal is zero because nodes are not connected to themselves (no self-loops by default). The adjacency matrix is a direct application of the Boolean matrices we studied in chapter 2: each entry is a binary relationship.

  • The adjacency matrix encodes the graph's structure completely. Matrix operations on \(A\) reveal graph properties: \(A^2_{ij}\) counts the number of paths of length 2 between nodes \(i\) and \(j\) (recall matrix multiplication from chapter 2: each entry is a sum of products over intermediate nodes). More generally, \(A^k_{ij}\) counts length-\(k\) paths.

  • Each node can carry a feature vector \(\mathbf{x}_i \in \mathbb{R}^d\). For a social network, this might be a user's profile information. For a molecule, it encodes atom type, charge, and other properties. The full set of node features is a matrix \(X \in \mathbb{R}^{n \times d}\), where each row is one node's features.

  • Edges can also carry features: bond type in molecules, distance in spatial graphs, relationship type in knowledge graphs. The edge feature for edge \((i, j)\) is a vector \(\mathbf{e}_{ij} \in \mathbb{R}^{d_e}\).
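To make the shapes concrete, here is a minimal sketch (with made-up feature values) of a node feature matrix \(X\) for the triangle graph, together with the neighbour aggregation \(AX\) that message passing (file 3) builds on:

```python
import jax.numpy as jnp

# Triangle graph from the text: 3 nodes, all connected
A = jnp.array([[0., 1., 1.],
               [1., 0., 1.],
               [1., 1., 0.]])

# Hypothetical node features (d = 2): two numeric attributes per node
X = jnp.array([[1.0, 0.5],
               [0.2, 1.3],
               [0.9, 0.1]])   # shape (n, d) = (3, 2)

# Row i of A @ X is the sum of node i's neighbours' feature vectors --
# the core aggregation step behind message passing
neighbour_sum = A @ X
print(neighbour_sum)
```

Row 0 of the result is \(\mathbf{x}_1 + \mathbf{x}_2\): multiplying by \(A\) sums over neighbours, exactly as the path-counting interpretation of matrix multiplication suggests.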

Graph Types

  • An undirected graph has symmetric edges: if \(i\) is connected to \(j\), then \(j\) is connected to \(i\). The adjacency matrix is symmetric: \(A = A^T\) (a symmetric matrix, chapter 2). Friendships and chemical bonds are undirected.

  • A directed graph (digraph) has edges with direction: an edge from \(i\) to \(j\) does not imply an edge from \(j\) to \(i\). The adjacency matrix is generally not symmetric. Twitter follows, web hyperlinks, and citation networks are directed.

  • A weighted graph assigns a numerical weight to each edge. The adjacency matrix has real-valued entries instead of binary: \(A_{ij} = w_{ij}\). Distances in a road network, correlation strengths in brain connectivity, and interaction frequencies in social networks are weighted.

  • A bipartite graph has two disjoint sets of nodes, with edges only between the sets (never within). Users and products form a bipartite graph: users rate products, but users do not rate users. The adjacency matrix of a bipartite graph has a block structure:

\[ A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix} \]
  • where \(B\) is the bipartite adjacency matrix between the two node sets.
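As a quick sketch (with a hypothetical 2-user, 3-product rating matrix \(B\)), the block structure can be assembled and checked for symmetry:

```python
import jax.numpy as jnp

# Hypothetical user-product ratings: 2 users, 3 products
B = jnp.array([[1., 0., 1.],
               [0., 1., 1.]])   # B[u, p] = 1 if user u rated product p

# Full adjacency has the block structure [[0, B], [B^T, 0]]
n_u, n_p = B.shape
A = jnp.block([[jnp.zeros((n_u, n_u)), B],
               [B.T, jnp.zeros((n_p, n_p))]])

print(A)
print("Symmetric:", bool(jnp.allclose(A, A.T)))
```

The zero diagonal blocks encode the bipartite constraint: no user-user or product-product edges.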

  • A multigraph allows multiple edges between the same pair of nodes and/or self-loops. Knowledge graphs are typically multigraphs: two entities can have multiple relationships (e.g., "born in," "lives in," "works in").

  • A hypergraph generalises edges to connect more than two nodes at once. A hyperedge connects a set of nodes, representing higher-order relationships. A research paper co-authored by five people is a hyperedge connecting five author nodes.

  • A complete graph \(K_n\) has an edge between every pair of nodes. This is the graph analogue of a fully connected layer, and it is the structure that transformers operate on (every token attends to every other token).
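\(K_n\) is easy to construct directly as a ones matrix minus the identity; a quick check confirms every node has degree \(n - 1\):

```python
import jax.numpy as jnp

n = 4
K = jnp.ones((n, n)) - jnp.eye(n)   # K_n: 1 everywhere off the diagonal
degrees = K.sum(axis=1)             # every node has degree n - 1
print(degrees)
```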

Degree, Paths, and Connectivity

  • The degree of a node is the number of edges connected to it. In an undirected graph, the degree of node \(i\) is \(d_i = \sum_j A_{ij}\). High-degree nodes are "hubs" with many connections.

  • The degree matrix \(D\) is a diagonal matrix with degrees on the diagonal: \(D_{ii} = d_i\). This matrix appears throughout graph theory and GNN formulas.

  • A path between two nodes is a sequence of edges connecting them. The shortest path (or geodesic) between \(i\) and \(j\) is the path with the fewest edges (or lowest total weight in a weighted graph). Dijkstra's algorithm finds shortest paths in \(O((|V| + |E|) \log |V|)\) time.
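A minimal Dijkstra sketch, using Python's heapq on a hypothetical weighted graph stored as an adjacency list (node labels and weights are illustrative):

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from source.
    adj: dict mapping node -> list of (neighbour, weight) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical weighted (undirected) road graph
adj = {0: [(1, 4.0), (2, 1.0)],
       1: [(0, 4.0), (2, 2.0), (3, 5.0)],
       2: [(0, 1.0), (1, 2.0), (3, 8.0)],
       3: [(1, 5.0), (2, 8.0)]}
dist = dijkstra(adj, 0)
print(dist)
```

Note that the direct edge 0-1 (weight 4) loses to the detour 0-2-1 (weight 3): shortest paths in weighted graphs need not use the fewest edges.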

  • A graph is connected if there is a path between every pair of nodes. If not, it has multiple connected components: isolated subgraphs with no edges between them.

  • The diameter of a graph is the longest shortest path between any pair of nodes. It measures how "spread out" the graph is. Social networks have famously small diameters ("six degrees of separation").

  • A cycle is a path that starts and ends at the same node. A connected graph with no cycles is a tree (a disconnected one is a forest). Trees are the simplest connected graphs: \(n\) nodes and exactly \(n-1\) edges.

  • Centrality measures the importance of a node. Degree centrality is simply the degree. Betweenness centrality counts how many shortest paths pass through a node. Eigenvector centrality assigns importance based on the importance of a node's neighbours, leading to the eigenvector equation \(A\mathbf{x} = \lambda \mathbf{x}\) (chapter 2). Google's PageRank is a variant of eigenvector centrality for directed graphs.
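Eigenvector centrality can be sketched with power iteration: repeatedly apply \(A\) and renormalise, and the leading eigenvector emerges (the graph and iteration count here are illustrative):

```python
import jax.numpy as jnp

A = jnp.array([[0., 1., 1., 0.],
               [1., 0., 1., 0.],
               [1., 1., 0., 1.],
               [0., 0., 1., 0.]])

# Power iteration: converges to the leading eigenvector of A
# (eigenvector centrality) for a connected graph
x = jnp.ones(4)
for _ in range(100):
    x = A @ x
    x = x / jnp.linalg.norm(x)

print(x)  # node 2, connected to all others, scores highest
```

PageRank follows the same idea, with \(A\) replaced by a damped, column-normalised transition matrix.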

The Graph Laplacian

  • The graph Laplacian is perhaps the most important matrix in graph theory. It is defined as:
\[L = D - A\]
  • where \(D\) is the degree matrix and \(A\) is the adjacency matrix. For our triangle example:
\[ L = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} - \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \]
  • The Laplacian has remarkable properties:

    • It is always symmetric and positive semi-definite (recall from chapter 2: all eigenvalues are \(\geq 0\)). For any vector \(\mathbf{x}\):
\[\mathbf{x}^T L \mathbf{x} = \sum_{(i,j) \in E} (x_i - x_j)^2\]

Graph Laplacian measures signal smoothness: smooth signals have similar values on connected nodes, non-smooth signals vary sharply

    • This quadratic form measures how much a signal \(\mathbf{x}\) on the graph varies across edges. If neighbouring nodes have similar values, \(\mathbf{x}^T L \mathbf{x}\) is small. If they differ sharply, it is large. The Laplacian measures smoothness of signals on the graph.

    • The smallest eigenvalue is always 0, with eigenvector \(\mathbf{1} = [1, 1, \ldots, 1]^T\) (a constant signal has zero variation). The number of zero eigenvalues equals the number of connected components.

    • The second-smallest eigenvalue \(\lambda_2\) is the algebraic connectivity (Fiedler value). It measures how well-connected the graph is: \(\lambda_2 = 0\) means the graph is disconnected, and a large \(\lambda_2\) means the graph is tightly connected.
  • The normalised Laplacian scales by the degrees:
\[\hat{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}\]
  • This normalisation ensures that the Laplacian's properties do not depend on the absolute scale of node degrees. The term \(D^{-1/2} A D^{-1/2}\) is the symmetrically normalised adjacency, and it appears directly in the GCN formula (file 3).
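A quick sketch for the triangle graph: the normalised Laplacian can be computed directly, and its eigenvalues always lie in \([0, 2]\):

```python
import jax.numpy as jnp

A = jnp.array([[0., 1., 1.],
               [1., 0., 1.],
               [1., 1., 0.]])
d = A.sum(axis=1)
D_inv_sqrt = jnp.diag(1.0 / jnp.sqrt(d))

# L_hat = I - D^{-1/2} A D^{-1/2}
L_hat = jnp.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt

eigvals = jnp.linalg.eigvalsh(L_hat)
print(eigvals)  # all in [0, 2]; smallest is 0 for a connected graph
```

For the triangle every degree is 2, so \(\hat{L} = I - A/2\) and the spectrum is \(\{0, 1.5, 1.5\}\).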

Spectral Graph Theory

  • The eigenvalues and eigenvectors of the graph Laplacian define the spectrum of the graph, and they serve as the graph analogue of the Fourier transform.

  • In classical signal processing, the Fourier transform decomposes a signal into frequency components (sines and cosines). On a graph, the eigenvectors of the Laplacian play the role of these frequency bases. Low-eigenvalue eigenvectors vary slowly across the graph (low frequency, smooth), while high-eigenvalue eigenvectors vary rapidly (high frequency, oscillatory).

  • The Graph Fourier Transform (GFT) of a signal \(\mathbf{x}\) on the graph is:

\[\hat{\mathbf{x}} = U^T \mathbf{x}\]
  • where \(U\) is the matrix of Laplacian eigenvectors (recall eigendecomposition from chapter 2: \(L = U \Lambda U^T\)). The inverse transform is \(\mathbf{x} = U \hat{\mathbf{x}}\).
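A round-trip sketch of the GFT on the 5-node graph from the coding tasks: transform a signal into the spectral domain, invert, and verify the original is recovered:

```python
import jax.numpy as jnp

A = jnp.array([[0., 1., 1., 0., 0.],
               [1., 0., 1., 0., 0.],
               [1., 1., 0., 1., 0.],
               [0., 0., 1., 0., 1.],
               [0., 0., 0., 1., 0.]])
L = jnp.diag(A.sum(axis=1)) - A
eigvals, U = jnp.linalg.eigh(L)   # L = U diag(eigvals) U^T

x = jnp.array([1.0, 2.0, 0.5, -1.0, 0.0])  # a signal on the nodes
x_hat = U.T @ x        # Graph Fourier Transform
x_back = U @ x_hat     # inverse transform
print(jnp.allclose(x_back, x, atol=1e-5))
```

Because \(U\) is orthogonal, the inverse is just \(U\) itself, mirroring the invertibility of the classical Fourier transform.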

  • Graph convolution in the spectral domain is pointwise multiplication in the frequency domain, just as convolution in the spatial domain corresponds to multiplication in the Fourier domain (the convolution theorem from chapter 8):

\[g_\theta \star \mathbf{x} = U \left( (U^T g_\theta) \odot (U^T \mathbf{x}) \right) = U \, \text{diag}(\hat{g}_\theta) \, U^T \mathbf{x}\]
  • The filter \(\hat{g}_\theta\) is a learnable function of the eigenvalues. This is the foundation of spectral GNNs, which we will simplify into the practical GCN in file 3.

  • The computational bottleneck is the eigendecomposition of \(L\), which costs \(O(n^3)\) for a graph with \(n\) nodes. This is impractical for large graphs (millions of nodes). Polynomial approximations (Chebyshev polynomials) avoid the eigendecomposition entirely, and this approximation leads directly to the GCN.
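A minimal sketch of the Chebyshev idea (the coefficients \(\theta_k\) here are illustrative, not learned): the filter is applied through the recurrence \(T_0 \mathbf{x} = \mathbf{x}\), \(T_1 \mathbf{x} = \tilde{L} \mathbf{x}\), \(T_k = 2\tilde{L} T_{k-1} - T_{k-2}\), touching only matrix-vector products and no eigendecomposition:

```python
import jax.numpy as jnp

# K = 2 Chebyshev filter: g(L) x ~ sum_k theta_k T_k(L_tilde) x,
# where L_tilde = 2 L / lambda_max - I rescales the spectrum to [-1, 1]
A = jnp.array([[0., 1., 1.],
               [1., 0., 1.],
               [1., 1., 0.]])
L = jnp.diag(A.sum(axis=1)) - A
lam_max = 3.0                        # triangle Laplacian spectrum is {0, 3, 3}
L_t = 2.0 * L / lam_max - jnp.eye(3)

x = jnp.array([1.0, -1.0, 0.0])      # a signal on the nodes
theta = jnp.array([0.5, 0.3, 0.2])   # illustrative filter coefficients

# Chebyshev recurrence: T0 x = x, T1 x = L_t x, T2 x = 2 L_t (T1 x) - T0 x
T0x = x
T1x = L_t @ x
T2x = 2.0 * L_t @ T1x - T0x
out = theta[0] * T0x + theta[1] * T1x + theta[2] * T2x
print(out)
```

Each extra polynomial order extends the filter's reach by one hop, which is why truncating at small \(K\) yields a local, and cheap, operation.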

Community Detection

  • Many real-world graphs have community structure: clusters of densely connected nodes with sparse connections between clusters. Social networks have friend groups, biological networks have functional modules, citation networks have research fields.

  • Spectral clustering uses the Laplacian eigenvectors to find communities. The idea: embed each node using the \(k\) smallest non-trivial eigenvectors of \(L\), then apply k-means (chapter 6) in this embedding space. Nodes in the same community end up close together in the spectral embedding.

  • This works because the Fiedler vector (eigenvector of \(\lambda_2\)) naturally separates the graph into two groups: nodes with positive values and nodes with negative values, cutting through the sparsest connections. Higher eigenvectors refine this into more groups.

  • Modularity \(Q\) measures the quality of a community partition. It compares the number of within-community edges to the expected number in a random graph:

\[Q = \frac{1}{2|E|} \sum_{ij} \left( A_{ij} - \frac{d_i d_j}{2|E|} \right) \delta(c_i, c_j)\]
  • where \(c_i\) is the community assignment of node \(i\) and \(\delta(c_i, c_j)\) is 1 if nodes \(i\) and \(j\) are in the same community and 0 otherwise. \(Q\) ranges from \(-0.5\) to \(1\), with higher values indicating stronger community structure.
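A direct sketch of the modularity formula on the 5-node graph from the coding tasks, with a hypothetical two-community partition:

```python
import jax.numpy as jnp

A = jnp.array([[0., 1., 1., 0., 0.],
               [1., 0., 1., 0., 0.],
               [1., 1., 0., 1., 0.],
               [0., 0., 1., 0., 1.],
               [0., 0., 0., 1., 0.]])
d = A.sum(axis=1)
m2 = d.sum()                      # 2|E|
c = jnp.array([0, 0, 0, 1, 1])    # candidate partition: {0,1,2} vs {3,4}

# delta(c_i, c_j) as a 0/1 matrix, then the modularity sum
delta = (c[:, None] == c[None, :]).astype(float)
Q = ((A - jnp.outer(d, d) / m2) * delta).sum() / m2
print(f"Modularity: {Q:.3f}")
```

For this partition \(Q = 0.22\): positive, because the dense triangle and the 3-4 pair each contain more internal edges than a degree-matched random graph would predict.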

Real-World Graphs

  • Social networks: nodes are people, edges are friendships or interactions. Facebook has billions of nodes and hundreds of billions of edges. These graphs are typically sparse (each person has hundreds of friends, not billions), exhibit small-world properties (short average path length), and have heavy-tailed degree distributions (a few hubs with millions of connections).

  • Molecular graphs: nodes are atoms, edges are chemical bonds. Each atom has features (element type, charge, hybridisation) and each bond has features (single, double, triple, aromatic). Molecular graphs are small (tens to hundreds of nodes) but highly structured. Predicting molecular properties from graph structure is a major application of GNNs.

  • Knowledge graphs: nodes are entities (people, places, concepts), edges are typed relationships ("born in," "capital of," "instance of"). Knowledge graphs power search engines, recommendation systems, and question-answering. They are typically directed multigraphs with millions of entities and billions of relationships.

  • Citation networks: nodes are papers, edges are citations (directed). Clustering reveals research communities. Node features include title, abstract, and publication year.

  • Protein interaction networks: nodes are proteins, edges indicate physical interactions or functional associations. Understanding these graphs helps identify drug targets and disease mechanisms.

  • Road networks and transportation: nodes are intersections, edges are road segments with distance/time weights. Shortest-path algorithms on these graphs power navigation systems. Self-driving motion prediction (chapter 11) represents agent interactions as graphs.

Coding Tasks (use Colab or a notebook)

  1. Build a small graph as an adjacency matrix and compute basic properties: degree of each node, number of paths of length 2, and whether the graph is connected.

    import jax.numpy as jnp
    
    # A simple graph: 5 nodes
    # 0-1, 0-2, 1-2, 2-3, 3-4
    A = jnp.array([[0, 1, 1, 0, 0],
                   [1, 0, 1, 0, 0],
                   [1, 1, 0, 1, 0],
                   [0, 0, 1, 0, 1],
                   [0, 0, 0, 1, 0]], dtype=float)
    
    # Degree
    degrees = A.sum(axis=1)
    print(f"Degrees: {degrees}")
    
    # Paths of length 2
    A2 = A @ A
    print(f"Paths of length 2 (node 0 to 3): {int(A2[0, 3])}")
    
    # Connected? (A + I)^(n-1) has all positive entries iff the graph is connected
    An = jnp.linalg.matrix_power(A + jnp.eye(5), 4)  # (A + I)^4 for n = 5
    connected = jnp.all(An > 0)
    print(f"Connected: {connected}")
    

  2. Compute the graph Laplacian and its eigenvalues. Verify that the smallest eigenvalue is 0 and the corresponding eigenvector is constant.

    import jax.numpy as jnp
    
    A = jnp.array([[0, 1, 1, 0, 0],
                   [1, 0, 1, 0, 0],
                   [1, 1, 0, 1, 0],
                   [0, 0, 1, 0, 1],
                   [0, 0, 0, 1, 0]], dtype=float)
    
    D = jnp.diag(A.sum(axis=1))
    L = D - A
    
    eigenvalues, eigenvectors = jnp.linalg.eigh(L)
    print(f"Eigenvalues: {eigenvalues}")
    print(f"Smallest eigenvector: {eigenvectors[:, 0]}")
    print(f"Fiedler value (algebraic connectivity): {eigenvalues[1]:.4f}")
    
    # Verify: x^T L x measures smoothness
    x = jnp.array([1.0, 1.0, 1.0, -1.0, -1.0])  # two groups
    smoothness = x @ L @ x
    print(f"Smoothness of two-group signal: {smoothness:.2f}")
    

  3. Perform spectral clustering on a graph with two communities. Embed nodes using the Fiedler vector and separate them by sign.

    import jax.numpy as jnp
    import matplotlib.pyplot as plt
    
    # Two communities of 5 nodes each, weakly connected
    A = jnp.zeros((10, 10))
    # Community 1: nodes 0-4 (dense)
    for i in range(5):
        for j in range(i+1, 5):
            A = A.at[i, j].set(1).at[j, i].set(1)
    # Community 2: nodes 5-9 (dense)
    for i in range(5, 10):
        for j in range(i+1, 10):
            A = A.at[i, j].set(1).at[j, i].set(1)
    # One bridge edge
    A = A.at[2, 7].set(1).at[7, 2].set(1)
    
    D = jnp.diag(A.sum(axis=1))
    L = D - A
    eigenvalues, eigenvectors = jnp.linalg.eigh(L)
    
    # Fiedler vector (2nd smallest eigenvalue)
    fiedler = eigenvectors[:, 1]
    communities = (fiedler > 0).astype(int)
    
    print(f"Fiedler vector: {fiedler}")
    print(f"Clusters: {communities}")
    
    plt.bar(range(10), fiedler, color=["#3498db" if c == 0 else "#e74c3c" for c in communities])
    plt.xlabel("Node"); plt.ylabel("Fiedler vector value")
    plt.title("Spectral Clustering via Fiedler Vector")
    plt.show()