Graphs¶
Graphs model relationships and connections, from social networks to road maps to dependency chains. This file covers representations, BFS, DFS, shortest paths, topological sort, and connected components, with the traversal and pathfinding patterns that dominate graph interview problems.
-
We covered graph theory in chapter 12 (adjacency matrices, Laplacian, spectral properties) and chapter 13 (trees, planarity, colouring). Here we focus on algorithmic patterns: how to traverse, search, and optimise over graphs in code.
-
The two fundamental graph algorithms are BFS and DFS. Nearly every graph problem reduces to one of these, possibly with modifications. Master these two and you can solve the vast majority of graph problems.
Graph Representations¶
- Adjacency list: for each node, store a list of its neighbours. Space: \(O(|V| + |E|)\). Best for sparse graphs (most real-world graphs).
# Undirected graph
graph = {
0: [1, 2],
1: [0, 3],
2: [0, 3],
3: [1, 2]
}
# From edge list
def build_graph(n, edges):
graph = {i: [] for i in range(n)}
for u, v in edges:
graph[u].append(v)
graph[v].append(u) # omit for directed
return graph
-
Adjacency matrix: \(n \times n\) matrix where \(A[i][j] = 1\) if edge \((i, j)\) exists. Space: \(O(|V|^2)\). Best for dense graphs or when you need \(O(1)\) edge lookup.
-
When to use which: adjacency list for almost everything. Matrix only when the graph is dense (\(|E| \approx |V|^2\)) or you need constant-time edge existence checks.
Pattern: BFS (Breadth-First Search)¶
- BFS explores nodes level by level using a queue. It is the go-to algorithm for:
- Shortest path in unweighted graphs
- Level-order traversal
- Finding connected components
- Any problem asking for the "minimum number of steps"
from collections import deque
def bfs(graph, start):
visited = {start}
queue = deque([start])
while queue:
node = queue.popleft()
for neighbour in graph[node]:
if neighbour not in visited:
visited.add(neighbour)
queue.append(neighbour)
- Critical: add to
visitedwhen enqueuing, not when dequeuing. If you mark visited on dequeue, the same node can be enqueued multiple times from different predecessors, wasting time and potentially causing incorrect results.
Easy: Number of Islands¶
-
Problem: given a 2D grid of '1's (land) and '0's (water), count the number of islands.
-
Pattern: iterate through the grid. When you find a '1', BFS/DFS to mark all connected land cells as visited. Each BFS initiation is one island.
from collections import deque
def num_islands(grid):
if not grid:
return 0
rows, cols = len(grid), len(grid[0])
count = 0
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1':
count += 1
# BFS to mark entire island
queue = deque([(r, c)])
grid[r][c] = '0' # mark visited
while queue:
cr, cc = queue.popleft()
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
nr, nc = cr + dr, cc + dc
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
grid[nr][nc] = '0'
queue.append((nr, nc))
return count
-
Pitfall: the
directions = [(0,1),(0,-1),(1,0),(-1,0)]pattern for 4-connected grid neighbours is used in almost every grid problem. Memorise it. For 8-connected, add the diagonals. -
Pitfall: modifying the input grid (
grid[r][c] = '0') avoids needing a separatevisitedset. This is acceptable in interviews but be explicit about the tradeoff (mutates input).
Medium: Rotting Oranges¶
-
Problem: fresh oranges rot if adjacent to a rotten orange. Return the minimum time until all oranges are rotten (or -1 if impossible).
-
Pattern: multi-source BFS. Start with all initially rotten oranges in the queue simultaneously. Each BFS level is one time step.
from collections import deque
def oranges_rotting(grid):
rows, cols = len(grid), len(grid[0])
queue = deque()
fresh = 0
for r in range(rows):
for c in range(cols):
if grid[r][c] == 2:
queue.append((r, c))
elif grid[r][c] == 1:
fresh += 1
if fresh == 0:
return 0
time = 0
while queue and fresh > 0:
time += 1
for _ in range(len(queue)):
cr, cc = queue.popleft()
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
nr, nc = cr + dr, cc + dc
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
grid[nr][nc] = 2
fresh -= 1
queue.append((nr, nc))
return time if fresh == 0 else -1
- Key insight: multi-source BFS processes all sources simultaneously. This gives the shortest distance from any source, which is exactly "how long until the last fresh orange rots."
Pattern: DFS (Depth-First Search)¶
- DFS explores as deep as possible before backtracking. It uses a stack (explicit or the call stack via recursion). DFS is the go-to for:
- Cycle detection
- Topological sorting
- Connected components
- Backtracking / exhaustive search
- Path-finding with constraints
def dfs(graph, node, visited=None):
if visited is None:
visited = set()
visited.add(node)
for neighbour in graph[node]:
if neighbour not in visited:
dfs(graph, neighbour, visited)
Medium: Course Schedule (Cycle Detection)¶
-
Problem: given \(n\) courses and prerequisites, determine if all courses can be completed (i.e., no circular dependencies).
-
Pattern: detect a cycle in a directed graph. Use DFS with three states: unvisited, in-progress (on the current DFS path), and completed.
def can_finish(num_courses, prerequisites):
graph = {i: [] for i in range(num_courses)}
for course, prereq in prerequisites:
graph[course].append(prereq)
# 0 = unvisited, 1 = in-progress, 2 = completed
state = [0] * num_courses
def has_cycle(node):
if state[node] == 1:
return True # back edge → cycle
if state[node] == 2:
return False # already fully explored
state[node] = 1 # mark in-progress
for neighbour in graph[node]:
if has_cycle(neighbour):
return True
state[node] = 2 # mark completed
return False
for course in range(num_courses):
if has_cycle(course):
return False
return True
- Why three states: two states (visited/unvisited) cannot distinguish "I am currently exploring this node" from "I finished exploring this node." Finding a node that is currently being explored (state = 1) means we have found a cycle. Finding one that is fully explored (state = 2) is just a cross edge, not a cycle.
Medium: Course Schedule II (Topological Sort)¶
-
Problem: return a valid course order (topological ordering).
-
Pattern (Kahn's algorithm — BFS-based): start with nodes that have no incoming edges (indegree 0). Process them, reduce indegree of their neighbours. Repeat.
from collections import deque
def find_order(num_courses, prerequisites):
graph = {i: [] for i in range(num_courses)}
indegree = [0] * num_courses
for course, prereq in prerequisites:
graph[prereq].append(course)
indegree[course] += 1
queue = deque([i for i in range(num_courses) if indegree[i] == 0])
order = []
while queue:
node = queue.popleft()
order.append(node)
for neighbour in graph[node]:
indegree[neighbour] -= 1
if indegree[neighbour] == 0:
queue.append(neighbour)
return order if len(order) == num_courses else [] # empty = cycle exists
- Pitfall: if the result has fewer nodes than the graph, a cycle exists (some nodes' indegree never reached 0).
Shortest Paths¶
Dijkstra's Algorithm¶
- Finds shortest paths from a source to all other nodes in a non-negative weighted graph. Uses a priority queue (min-heap).
import heapq
def dijkstra(graph, start):
# graph: {node: [(neighbour, weight), ...]}
dist = {node: float('inf') for node in graph}
dist[start] = 0
heap = [(0, start)]
while heap:
d, node = heapq.heappop(heap)
if d > dist[node]:
continue # stale entry
for neighbour, weight in graph[node]:
new_dist = d + weight
if new_dist < dist[neighbour]:
dist[neighbour] = new_dist
heapq.heappush(heap, (new_dist, neighbour))
return dist
-
Time: \(O((|V| + |E|) \log |V|)\) with a binary heap.
-
Pitfall: the
if d > dist[node]: continueline is essential. Without it, you process stale heap entries, potentially degrading to \(O(|V|^2)\). -
Pitfall: Dijkstra does not work with negative weights. If an edge has negative weight, the greedy assumption (once a node is finalised, its distance is optimal) breaks. Use Bellman-Ford instead.
Hard: Network Delay Time¶
- Problem: given \(n\) nodes and weighted directed edges, find the time for a signal to reach all nodes from a source. Return -1 if not all nodes are reachable.
def network_delay(times, n, k):
graph = {i: [] for i in range(1, n + 1)}
for u, v, w in times:
graph[u].append((v, w))
dist = dijkstra(graph, k)
max_time = max(dist.values())
return max_time if max_time < float('inf') else -1
Strongly Connected Components¶
-
In a directed graph, a strongly connected component (SCC) is a maximal set of nodes where every node can reach every other.
-
Kosaraju's algorithm: (1) DFS on the original graph, recording finish order. (2) Transpose the graph (reverse all edges). (3) DFS on the transposed graph in reverse finish order. Each DFS tree in step 3 is one SCC.
-
When to use: finding cyclic dependencies, 2-SAT, condensing a directed graph into a DAG of SCCs.
Common Pitfalls Summary¶
| Pitfall | Example | Fix |
|---|---|---|
| Marking visited on dequeue | Same node enqueued multiple times | Mark visited when enqueuing |
| Two-state visited in directed graphs | Cannot distinguish back edge from cross edge | Use three states: unvisited/in-progress/done |
| Dijkstra with negative weights | Incorrect shortest paths | Use Bellman-Ford |
Forgetting if d > dist[node]: continue |
Processing stale heap entries | Always skip if current distance is worse |
| Grid bounds checking | Index out of range | 0 <= nr < rows and 0 <= nc < cols |
| Not counting time=0 edge case | Rotting oranges: no fresh oranges | Check fresh == 0 before BFS |
| Building directed graph as undirected | Prerequisites go one way | Add edge only in one direction |
Take-Home Problems (NeetCode)¶
BFS Patterns¶
- Number of Islands — grid BFS/DFS
- Rotting Oranges — multi-source BFS
- Clone Graph — BFS + hash map for cloning
- Pacific Atlantic Water Flow — BFS from both oceans
- Word Ladder — BFS on implicit graph
DFS Patterns¶
- Max Area of Island — DFS with area counting
- Course Schedule — cycle detection in directed graph
- Course Schedule II — topological sort
- Number of Connected Components — DFS or Union-Find
- Graph Valid Tree — connected + no cycle
Shortest Paths¶
- Network Delay Time — Dijkstra
- Cheapest Flights Within K Stops — modified BFS/Bellman-Ford with constraint
- Swim in Rising Water — binary search + BFS or Dijkstra on grid
Advanced¶
- Alien Dictionary — topological sort from character ordering