Skip to content

Graphs

Graphs model relationships and connections, from social networks to road maps to dependency chains. This file covers representations, BFS, DFS, shortest paths, topological sort, and connected components, with the traversal and pathfinding patterns that dominate graph interview problems.

  • We covered graph theory in chapter 12 (adjacency matrices, Laplacian, spectral properties) and chapter 13 (trees, planarity, colouring). Here we focus on algorithmic patterns: how to traverse, search, and optimise over graphs in code.

  • The two fundamental graph algorithms are BFS and DFS. Nearly every graph problem reduces to one of these, possibly with modifications. Master these two and you can solve the vast majority of graph problems.

Graph Representations

  • Adjacency list: for each node, store a list of its neighbours. Space: \(O(|V| + |E|)\). Best for sparse graphs (most real-world graphs).
# Undirected graph
graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2]
}

# From edge list
def build_graph(n, edges):
    graph = {i: [] for i in range(n)}
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)  # omit for directed
    return graph
  • Adjacency matrix: \(n \times n\) matrix where \(A[i][j] = 1\) if edge \((i, j)\) exists. Space: \(O(|V|^2)\). Best for dense graphs or when you need \(O(1)\) edge lookup.

  • When to use which: adjacency list for almost everything. Matrix only when the graph is dense (\(|E| \approx |V|^2\)) or you need constant-time edge existence checks.

  • BFS explores nodes level by level using a queue. It is the go-to algorithm for:
    • Shortest path in unweighted graphs
    • Level-order traversal
    • Finding connected components
    • Any problem asking for the "minimum number of steps"
from collections import deque

def bfs(graph, start):
    visited = {start}
    queue = deque([start])

    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
  • Critical: add to visited when enqueuing, not when dequeuing. If you mark visited on dequeue, the same node can be enqueued multiple times from different predecessors, wasting time and potentially causing incorrect results.

Easy: Number of Islands

  • Problem: given a 2D grid of '1's (land) and '0's (water), count the number of islands.

  • Pattern: iterate through the grid. When you find a '1', BFS/DFS to mark all connected land cells as visited. Each BFS initiation is one island.

from collections import deque

def num_islands(grid):
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    count = 0

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                # BFS to mark entire island
                queue = deque([(r, c)])
                grid[r][c] = '0'  # mark visited
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                        nr, nc = cr + dr, cc + dc
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
                            grid[nr][nc] = '0'
                            queue.append((nr, nc))

    return count
  • Pitfall: the directions = [(0,1),(0,-1),(1,0),(-1,0)] pattern for 4-connected grid neighbours is used in almost every grid problem. Memorise it. For 8-connected, add the diagonals.

  • Pitfall: modifying the input grid (grid[r][c] = '0') avoids needing a separate visited set. This is acceptable in interviews but be explicit about the tradeoff (mutates input).

Medium: Rotting Oranges

  • Problem: fresh oranges rot if adjacent to a rotten orange. Return the minimum time until all oranges are rotten (or -1 if impossible).

  • Pattern: multi-source BFS. Start with all initially rotten oranges in the queue simultaneously. Each BFS level is one time step.

from collections import deque

def oranges_rotting(grid):
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c))
            elif grid[r][c] == 1:
                fresh += 1

    if fresh == 0:
        return 0

    time = 0
    while queue and fresh > 0:
        time += 1
        for _ in range(len(queue)):
            cr, cc = queue.popleft()
            for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                nr, nc = cr + dr, cc + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2
                    fresh -= 1
                    queue.append((nr, nc))

    return time if fresh == 0 else -1
  • Key insight: multi-source BFS processes all sources simultaneously. This gives the shortest distance from any source, which is exactly "how long until the last fresh orange rots."
  • DFS explores as deep as possible before backtracking. It uses a stack (explicit or the call stack via recursion). DFS is the go-to for:
    • Cycle detection
    • Topological sorting
    • Connected components
    • Backtracking / exhaustive search
    • Path-finding with constraints
def dfs(graph, node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)
    for neighbour in graph[node]:
        if neighbour not in visited:
            dfs(graph, neighbour, visited)

Medium: Course Schedule (Cycle Detection)

  • Problem: given \(n\) courses and prerequisites, determine if all courses can be completed (i.e., no circular dependencies).

  • Pattern: detect a cycle in a directed graph. Use DFS with three states: unvisited, in-progress (on the current DFS path), and completed.

def can_finish(num_courses, prerequisites):
    graph = {i: [] for i in range(num_courses)}
    for course, prereq in prerequisites:
        graph[course].append(prereq)

    # 0 = unvisited, 1 = in-progress, 2 = completed
    state = [0] * num_courses

    def has_cycle(node):
        if state[node] == 1:
            return True   # back edge → cycle
        if state[node] == 2:
            return False  # already fully explored

        state[node] = 1  # mark in-progress
        for neighbour in graph[node]:
            if has_cycle(neighbour):
                return True
        state[node] = 2  # mark completed
        return False

    for course in range(num_courses):
        if has_cycle(course):
            return False
    return True
  • Why three states: two states (visited/unvisited) cannot distinguish "I am currently exploring this node" from "I finished exploring this node." Finding a node that is currently being explored (state = 1) means we have found a cycle. Finding one that is fully explored (state = 2) is just a cross edge, not a cycle.

Medium: Course Schedule II (Topological Sort)

  • Problem: return a valid course order (topological ordering).

  • Pattern (Kahn's algorithm — BFS-based): start with nodes that have no incoming edges (indegree 0). Process them, reduce indegree of their neighbours. Repeat.

from collections import deque

def find_order(num_courses, prerequisites):
    graph = {i: [] for i in range(num_courses)}
    indegree = [0] * num_courses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indegree[course] += 1

    queue = deque([i for i in range(num_courses) if indegree[i] == 0])
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            indegree[neighbour] -= 1
            if indegree[neighbour] == 0:
                queue.append(neighbour)

    return order if len(order) == num_courses else []  # empty = cycle exists
  • Pitfall: if the result has fewer nodes than the graph, a cycle exists (some nodes' indegree never reached 0).

Shortest Paths

Dijkstra's Algorithm

  • Finds shortest paths from a source to all other nodes in a non-negative weighted graph. Uses a priority queue (min-heap).
import heapq

def dijkstra(graph, start):
    # graph: {node: [(neighbour, weight), ...]}
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    heap = [(0, start)]

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry

        for neighbour, weight in graph[node]:
            new_dist = d + weight
            if new_dist < dist[neighbour]:
                dist[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))

    return dist
  • Time: \(O((|V| + |E|) \log |V|)\) with a binary heap.

  • Pitfall: the if d > dist[node]: continue line is essential. Without it, you process stale heap entries, potentially degrading to \(O(|V|^2)\).

  • Pitfall: Dijkstra does not work with negative weights. If an edge has negative weight, the greedy assumption (once a node is finalised, its distance is optimal) breaks. Use Bellman-Ford instead.

Hard: Network Delay Time

  • Problem: given \(n\) nodes and weighted directed edges, find the time for a signal to reach all nodes from a source. Return -1 if not all nodes are reachable.
def network_delay(times, n, k):
    graph = {i: [] for i in range(1, n + 1)}
    for u, v, w in times:
        graph[u].append((v, w))

    dist = dijkstra(graph, k)
    max_time = max(dist.values())
    return max_time if max_time < float('inf') else -1

Strongly Connected Components

  • In a directed graph, a strongly connected component (SCC) is a maximal set of nodes where every node can reach every other.

  • Kosaraju's algorithm: (1) DFS on the original graph, recording finish order. (2) Transpose the graph (reverse all edges). (3) DFS on the transposed graph in reverse finish order. Each DFS tree in step 3 is one SCC.

  • When to use: finding cyclic dependencies, 2-SAT, condensing a directed graph into a DAG of SCCs.


Common Pitfalls Summary

Pitfall Example Fix
Marking visited on dequeue Same node enqueued multiple times Mark visited when enqueuing
Two-state visited in directed graphs Cannot distinguish back edge from cross edge Use three states: unvisited/in-progress/done
Dijkstra with negative weights Incorrect shortest paths Use Bellman-Ford
Forgetting if d > dist[node]: continue Processing stale heap entries Always skip if current distance is worse
Grid bounds checking Index out of range 0 <= nr < rows and 0 <= nc < cols
Not counting time=0 edge case Rotting oranges: no fresh oranges Check fresh == 0 before BFS
Building directed graph as undirected Prerequisites go one way Add edge only in one direction

Take-Home Problems (NeetCode)

BFS Patterns

DFS Patterns

Shortest Paths

Advanced