
Foundations: Big O, Recursion, Backtracking, and Dynamic Programming

Before diving into data structures and algorithms, you need four foundational concepts: Big O notation for measuring efficiency, recursion for breaking problems into subproblems, backtracking for exhaustive search with pruning, and dynamic programming for avoiding redundant computation. This file teaches each from first principles.

  • The remaining files in this chapter assume you are comfortable with these four ideas. If you skip this file, the \(O(n \log n)\) annotations, recursive tree traversals, backtracking templates, and DP state transitions in later files will feel like magic rather than engineering.

Why Patterns, Not Memorisation

  • There are thousands of coding problems on LeetCode, NeetCode, and HackerRank. Nobody can memorise them all, and trying to is a losing strategy. Interviewers do not pick problems from a fixed list; they modify, combine, and disguise them. A memorised solution to "Two Sum" will not help you when the interviewer asks a variant you have never seen.

  • The good news: there are only about 15-20 core patterns (two pointers, sliding window, BFS/DFS, DP, backtracking, etc.). Every problem, no matter how novel it looks on the surface, reduces to one or a combination of these patterns. The interview is not testing whether you have seen this exact problem before. It is testing whether you can peel away the context, the story, the specific data types, the edge cases, and recognise the underlying pattern.

  • Consider these three problems:

    • "Find two numbers in an array that sum to a target."
    • "Find two molecules whose binding energies sum to a threshold."
    • "Given a list of account balances, find two accounts whose combined value equals a debt."
  • They look different. They are the same problem: Two Sum. The context (numbers, molecules, accounts) is irrelevant. The structure is: search for a complement in a collection → hash map lookup.
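That shared structure can be sketched directly; the function name `two_sum` and its return convention (a pair of indices, or None) are illustrative choices here, not part of any one problem statement:

```python
def two_sum(values, target):
    """Return indices of two entries summing to target, else None."""
    seen = {}  # value -> index of a previously scanned entry
    for i, v in enumerate(values):
        complement = target - v
        if complement in seen:          # O(1) hash lookup for the complement
            return seen[complement], i
        seen[v] = i
    return None
```

Whether `values` holds prices, binding energies, or account balances changes nothing: the single pass with a hash map is the pattern.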

  • This is why this chapter teaches patterns through intuition, not solutions through repetition. For each pattern, we explain:

    • What structural property of the problem signals this pattern (sorted input → two pointers; subarray constraint → sliding window; optimal substructure + overlapping subproblems → DP).
    • Why the pattern works — the mathematical or logical reasoning, not just "it gives the right answer."
    • How to adapt it — by showing easy, medium, and hard variants where the same core idea applies in different contexts.
  • When you deeply understand why sliding window works (the monotonicity of the constraint means expanding/contracting is sufficient), you can apply it to any problem with that structure, even one you have never seen. When you only memorise the code for "Longest Substring Without Repeating Characters," you are stuck the moment the problem changes.
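As a taste of what that looks like in code, here is a minimal sketch of the sliding-window idea for that problem (the pattern itself is developed properly in a later file):

```python
def longest_unique_substring(s):
    """Length of the longest substring of s with all distinct characters."""
    last_seen = {}  # char -> most recent index where it appeared
    left = 0        # window is s[left..right]
    best = 0
    for right, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1    # contract: jump past the duplicate
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best
```

The window only ever expands on the right and contracts on the left, which is exactly the monotonicity argument above.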

  • The practical strategy:

    1. Learn the pattern (this chapter).
    2. Practise recognising it in disguised problems (the NeetCode take-homes at the end of each file).
    3. Practise implementing it under time pressure.
    4. In the interview: read the problem → strip the context → identify the pattern → implement.

Big O Notation

  • When we say an algorithm is "fast" or "slow," we need a precise way to measure it. Big O notation describes how an algorithm's runtime (or space usage) grows as the input size \(n\) increases, ignoring constant factors and lower-order terms.

  • The formal definition: \(f(n) = O(g(n))\) means there exist constants \(c > 0\) and \(n_0\) such that \(f(n) \leq c \cdot g(n)\) for all \(n \geq n_0\). In plain English: \(f\) grows no faster than \(g\) for large inputs.

  • Why ignore constants? Because a \(2n\) algorithm and a \(5n\) algorithm are both \(O(n)\): they scale the same way. On a faster computer, the constants change, but the scaling does not. Big O captures the intrinsic difficulty of the problem, independent of hardware.

The Growth Rate Hierarchy

  • From fastest to slowest:

| Big O | Name | Example | Ops at \(n = 10^6\) |
| --- | --- | --- | --- |
| \(O(1)\) | Constant | Array access, hash lookup | 1 |
| \(O(\log n)\) | Logarithmic | Binary search | 20 |
| \(O(n)\) | Linear | Linear scan, single loop | \(10^6\) |
| \(O(n \log n)\) | Linearithmic | Merge sort, efficient sorting | \(2 \times 10^7\) |
| \(O(n^2)\) | Quadratic | Nested loops, brute-force pairs | \(10^{12}\) (too slow) |
| \(O(n^3)\) | Cubic | Triple nested loops, matrix multiply | \(10^{18}\) (way too slow) |
| \(O(2^n)\) | Exponential | All subsets, brute-force backtracking | \(10^{301030}\) (impossible) |
| \(O(n!)\) | Factorial | All permutations | absurd |
  • Rule of thumb: modern computers execute roughly \(10^8\) to \(10^9\) simple operations per second. For a 1-second time limit:

    • \(O(n)\) works for \(n \leq 10^8\)
    • \(O(n \log n)\) works for \(n \leq 10^7\)
    • \(O(n^2)\) works for \(n \leq 10^4\)
    • \(O(2^n)\) works for \(n \leq 25\)
  • This table tells you immediately whether your approach is fast enough. If \(n = 10^5\) and your solution is \(O(n^2)\), it will be \(10^{10}\) operations — too slow. You need a better algorithm.

How to Analyse Big O

  • Single loop over \(n\) elements: \(O(n)\).
total = 0
for x in arr:   # n iterations
    total += x   # O(1) per iteration
# Total: O(n)
  • Nested loops: multiply the iteration counts.
for i in range(n):       # n iterations
    for j in range(n):   # n iterations each
        process(i, j)    # O(1)
# Total: O(n^2)
  • Loop with halving: \(O(\log n)\). Each iteration halves the problem size, so it takes \(\log_2 n\) iterations.
i = n
while i > 0:
    process(i)
    i //= 2
# Total: O(log n)
  • Nested loop where inner depends on outer:
for i in range(n):
    for j in range(i):   # j goes from 0 to i-1
        process(i, j)
# Total: 0 + 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
  • Recursive: write the recurrence relation and solve it (chapter 13 covered the Master Theorem). For example, merge sort: \(T(n) = 2T(n/2) + O(n) = O(n \log n)\).
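A sketch of merge sort makes that recurrence concrete: the two recursive calls contribute the \(2T(n/2)\) term, and the merge loop contributes the \(O(n)\) term.

```python
def merge_sort(arr):
    """Sort a list: T(n) = 2T(n/2) + O(n), which solves to O(n log n)."""
    if len(arr) <= 1:                 # base case: already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])      # T(n/2)
    right = merge_sort(arr[mid:])     # T(n/2)
    # Merge the two sorted halves: O(n)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```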

Common Traps

  • Hidden loops: x in list is \(O(n)\) in Python (linear scan), but x in set is \(O(1)\). Using in on a list inside a loop gives \(O(n^2)\), not \(O(n)\).
# BAD: O(n^2) — "in" on a list is O(n)
for x in arr:
    if x in another_list:
        process(x)

# GOOD: O(n) — convert to set first
another_set = set(another_list)
for x in arr:
    if x in another_set:
        process(x)
  • String concatenation: s += c in Python copies the entire string each time. Inside a loop of \(n\) iterations: \(O(1 + 2 + \cdots + n) = O(n^2)\).
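A quick illustration of the fix. (CPython sometimes optimises string += in place, but that is an implementation detail you should not rely on; the quadratic bound is the safe assumption.)

```python
parts = ["ab", "ra", "ca", "dabra"]

# BAD: quadratic in the worst case — each += may copy the whole string so far
slow = ""
for piece in parts:
    slow += piece

# GOOD: linear — collect the pieces, join once at the end
fast = "".join(parts)
```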

  • Sorting dominance: if your algorithm sorts (\(O(n \log n)\)) and then does a linear scan (\(O(n)\)), the total is \(O(n \log n)\) — the sort dominates.

  • Amortised complexity: some operations are expensive occasionally but cheap on average. Dynamic array append is \(O(1)\) amortised because the rare \(O(n)\) resize is spread across \(n\) cheap appends. Do not confuse amortised \(O(1)\) with worst-case \(O(1)\).
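To see where the amortised bound comes from, here is a toy dynamic array (illustrative only, not how CPython's list is actually implemented) that counts the elements copied by resizes. With capacity doubling, the total copy work over \(n\) appends is \(1 + 2 + 4 + \cdots < 2n\), so each append costs \(O(1)\) on average:

```python
class DynArray:
    """Toy dynamic array: doubling capacity makes append O(1) amortised."""

    def __init__(self):
        self.cap = 1
        self.n = 0
        self.data = [None]
        self.copies = 0                 # total elements copied by resizes

    def append(self, x):
        if self.n == self.cap:          # full: rare O(n) resize
            self.cap *= 2
            new = [None] * self.cap
            new[:self.n] = self.data
            self.data = new
            self.copies += self.n
        self.data[self.n] = x           # common O(1) write
        self.n += 1
```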

Space Complexity

  • Space complexity follows the same Big O rules, applied to memory usage instead of time.

  • In-place algorithms use \(O(1)\) extra space (not counting the input). Quicksort uses \(O(\log n)\) extra space on average for its recursion stack (\(O(n)\) in the worst case). Merge sort uses \(O(n)\) (temporary arrays for merging).

  • Recursion stack: every recursive call uses stack space. A recursion \(n\) levels deep uses \(O(n)\) space, even if each call does not allocate additional memory. This is why recursive DFS on a graph with \(n\) nodes uses \(O(n)\) space.

  • For interviews, always state both time and space complexity. An \(O(n)\) time, \(O(n)\) space solution is often acceptable, but an \(O(n)\) time, \(O(1)\) space solution is better. The interviewer may ask you to optimise one or the other.


Recursion

  • Recursion is when a function calls itself to solve a smaller instance of the same problem. It is the most natural approach for problems with recursive structure: trees, nested structures, divide-and-conquer, and mathematical sequences.

  • Every recursive function has two parts:

    1. Base case: the smallest instance that can be solved directly (without recursion). This is what stops the recursion.
    2. Recursive case: break the problem into smaller subproblems, solve them recursively, and combine the results.

Example: Factorial

def factorial(n):
    if n <= 1:        # base case
        return 1
    return n * factorial(n - 1)  # recursive case
  • How it executes for factorial(4):

    • factorial(4) calls factorial(3)
    • factorial(3) calls factorial(2)
    • factorial(2) calls factorial(1)
    • factorial(1) returns 1 (base case)
    • factorial(2) returns 2 * 1 = 2
    • factorial(3) returns 3 * 2 = 6
    • factorial(4) returns 4 * 6 = 24
  • Each call is pushed onto the call stack. The stack grows until the base case is reached, then unwinds as each call returns. If the recursion is too deep (e.g., factorial(1000000) in Python), the stack overflows (RecursionError). Python's default recursion limit is 1000.

How to Think Recursively

  • The key mental shift: trust the recursion. When writing a recursive function, assume that the recursive call returns the correct answer for the smaller subproblem. Your job is only to:

    1. Handle the base case.
    2. Break the problem into smaller pieces.
    3. Combine the results.
  • You do not need to trace through every recursive call in your head. That is like trying to understand a loop by mentally executing every iteration. Instead, verify: "if the recursive call gives me the right answer for the smaller input, does my combination step give the right answer for the full input?"

Example: Recursion on a Linked List

  • Reverse a linked list recursively:
def reverse(head):
    if not head or not head.next:   # base case: 0 or 1 nodes
        return head

    new_head = reverse(head.next)   # reverse the rest
    head.next.next = head           # point the next node back to me
    head.next = None                # I am now the tail
    return new_head
  • Trust the recursion: reverse(head.next) correctly reverses the rest of the list and returns the new head. We just need to attach the current node at the end.

Example: Recursion on a Tree

  • Compute the height of a binary tree:
def height(root):
    if not root:           # base case: empty tree has height 0
        return 0
    left_h = height(root.left)    # height of left subtree
    right_h = height(root.right)  # height of right subtree
    return 1 + max(left_h, right_h)  # this node adds 1 level
  • This pattern — "recurse on left, recurse on right, combine" — solves the vast majority of tree problems (see file 03).

Recursion vs Iteration

  • Every recursive algorithm can be converted to an iterative one (using an explicit stack or loop). Iteration avoids the call stack overhead and the risk of stack overflow.

  • When to prefer recursion: the problem has natural recursive structure (trees, nested data, divide-and-conquer). The recursive solution is cleaner and easier to reason about.

  • When to prefer iteration: the recursion depth could be very large (e.g., processing a linked list of \(10^6\) nodes). The iterative solution avoids stack overflow.
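For comparison with the recursive reversal above, the iterative version uses \(O(1)\) extra space and cannot overflow the stack, whatever the list length (the Node class here is a minimal stand-in):

```python
class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def reverse_iterative(head):
    """Reverse a singly linked list: O(n) time, O(1) space, no stack risk."""
    prev = None
    while head:
        nxt = head.next     # remember the rest of the list
        head.next = prev    # point the current node backwards
        prev = head         # advance prev
        head = nxt          # advance head
    return prev
```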

  • Tail recursion: a recursive call is "tail recursive" if it is the last operation in the function (no work is done after the recursive call returns). Some languages (Scheme, Scala) optimise tail calls to use constant stack space. Python does not optimise tail calls, so tail recursion in Python still uses \(O(n)\) stack space.
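A tail-recursive factorial and its loop equivalent illustrate the point: the accumulator version is tail recursive, but Python still grows the stack for it, so the loop is the safe form for large n.

```python
def factorial_tail(n, acc=1):
    if n <= 1:
        return acc
    return factorial_tail(n - 1, acc * n)  # tail call — Python does NOT optimise this

def factorial_iter(n):
    acc = 1
    for k in range(2, n + 1):              # same accumulator, explicit loop
        acc *= k
    return acc
```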

Common Pitfalls

| Pitfall | Example | Fix |
| --- | --- | --- |
| Missing base case | Infinite recursion → stack overflow | Always define when to stop |
| Wrong base case | Off-by-one in the recursive decomposition | Test with the smallest inputs (0, 1, 2) |
| Not reducing the problem | f(n) calls f(n) instead of f(n-1) | Ensure the subproblem is strictly smaller |
| Redundant computation | Fibonacci: f(n) = f(n-1) + f(n-2) recomputes exponentially | Use memoisation (→ DP) |
| Python recursion limit | factorial(10000) crashes | Use sys.setrecursionlimit or convert to iteration |

Backtracking

  • Backtracking is a systematic way to explore all possible solutions by building them incrementally and abandoning partial solutions that cannot possibly lead to a valid answer.

  • Think of it as navigating a maze. At each junction, you pick a path. If you hit a dead end, you go back to the last junction and try a different path. You do not start over from the beginning — you backtrack to the most recent decision point.

The Three Steps

Every backtracking algorithm follows the same pattern:

  1. Choose: select a candidate to extend the current partial solution.
  2. Explore: recursively try to build a complete solution from this candidate.
  3. Unchoose: undo the choice (backtrack) and try the next candidate.
def backtrack(state, choices, result):
    if is_complete(state):
        result.append(state.copy())
        return

    for choice in choices:
        if is_valid(choice, state):
            state.add(choice)           # 1. choose
            backtrack(state, choices, result)  # 2. explore
            state.remove(choice)        # 3. unchoose (backtrack)
  • The unchoose step is what distinguishes backtracking from plain recursion. Without it, the state accumulates all choices and you cannot explore alternative paths.

When to Use Backtracking

  • The problem asks to enumerate all valid configurations: all permutations, all subsets, all valid arrangements (e.g., N-Queens).
  • The problem asks to find any valid configuration: Sudoku solving, maze path finding.
  • The search space is large but can be pruned: most partial solutions can be rejected early without exploring them fully.

How Pruning Makes It Fast

  • Without pruning, backtracking explores every possible combination — exponential time. Pruning cuts branches early:
for choice in choices:
    if not is_valid(choice, state):
        continue  # PRUNE: skip this entire subtree

    state.add(choice)
    backtrack(state, choices, result)
    state.remove(choice)
  • In N-Queens (file 05), checking column and diagonal conflicts before placing a queen prunes the search tree from \(n^n\) to roughly \(n!\) candidates. For \(n = 8\), that is 16 million → 40,000. Good pruning makes exponential algorithms practical for moderate \(n\).
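A compact sketch of that pruning (the full treatment is in file 05): the three sets reject any conflicting column or diagonal before recursing, so whole subtrees are never entered.

```python
def count_queens(n):
    """Count N-Queens placements, pruning column/diagonal conflicts early."""
    cols, diag1, diag2 = set(), set(), set()

    def place(row):
        if row == n:                    # all n queens placed: one solution
            return 1
        total = 0
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue                # PRUNE: conflict, skip the subtree
            cols.add(col); diag1.add(row - col); diag2.add(row + col)   # choose
            total += place(row + 1)                                     # explore
            cols.remove(col); diag1.remove(row - col); diag2.remove(row + col)  # unchoose
        return total

    return place(0)
```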

Generating All Subsets (The Simplest Backtracking)

def subsets(nums):
    result = []

    def backtrack(start, path):
        result.append(path[:])  # every partial solution is a valid subset

        for i in range(start, len(nums)):
            path.append(nums[i])        # choose
            backtrack(i + 1, path)       # explore (i+1: no reuse)
            path.pop()                   # unchoose

    backtrack(0, [])
    return result
  • For [1, 2, 3], the subsets are generated in this order:

    • [] → [1] → [1,2] → [1,2,3] → (backtrack) → [1,3] → (backtrack) → [2] → [2,3] → (backtrack) → [3]
  • Each node in the tree is one call to backtrack, and every node — not just the leaves — produces exactly one subset. Total subsets: \(2^n\).

Generating All Permutations

def permutations(nums):
    result = []

    def backtrack(path, remaining):
        if not remaining:
            result.append(path[:])
            return

        for i in range(len(remaining)):
            path.append(remaining[i])                    # choose
            backtrack(path, remaining[:i] + remaining[i+1:])  # explore
            path.pop()                                   # unchoose

    backtrack([], nums)
    return result
  • Total permutations: \(n!\). Each requires \(O(n)\) work to construct remaining, so the total is \(O(n \cdot n!)\).

Common Pitfalls

| Pitfall | Example | Fix |
| --- | --- | --- |
| Forgetting to copy the path | result.append(path) — all entries share the same list | result.append(path[:]) or path.copy() |
| Not backtracking (unchoosing) | State keeps growing; later candidates see stale state | Always path.pop() or state.remove() after the recursive call |
| Wrong loop start | Duplicate subsets, or permutations with unwanted reuse | Use the start parameter to avoid revisiting earlier indices |
| Skipping pruning | Exploring obviously invalid branches | Add "if not is_valid: continue" before the recursive call |

Dynamic Programming

  • Dynamic programming (DP) is an optimisation technique for problems where the same subproblems are solved repeatedly. Instead of recomputing, DP solves each subproblem once and stores the result.

  • DP applies when a problem has two properties:

    1. Optimal substructure: the optimal solution can be built from optimal solutions to subproblems.
    2. Overlapping subproblems: the same subproblems appear multiple times in the recursion.

The Fibonacci Motivation

  • Naive recursive Fibonacci:
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
  • For fib(5), the recursion tree:

    • fib(5) calls fib(4) and fib(3)
    • fib(4) calls fib(3) and fib(2)
    • fib(3) is computed twice, fib(2) is computed three times
  • This is exponential — \(O(2^n)\) as a loose bound — because the tree branches at every level, and most branches recompute the same values. For fib(50), the tree contains tens of billions of calls — infeasible.

  • With memoisation (top-down DP):

def fib_memo(n, memo=None):
    if memo is None:   # fresh cache per top-level call (avoids the mutable-default pitfall)
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]
  • Now fib(3) is computed once, stored, and looked up on subsequent calls. Total: \(O(n)\) time, \(O(n)\) space.
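In Python, functools.lru_cache gives you the same memoisation without writing the cache by hand:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache: every fib(k) is stored after first computation
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
```

The decorator keys the cache on the arguments, so fib(50) now runs in O(n) time.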

  • With tabulation (bottom-up DP):

def fib_tab(n):
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
  • Same \(O(n)\) time, but builds the solution from the bottom up without recursion. Can be further optimised to \(O(1)\) space since each value depends only on the previous two.
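The \(O(1)\)-space version keeps only the last two values:

```python
def fib_const(n):
    """O(n) time, O(1) space: roll the last two Fibonacci values forward."""
    a, b = 0, 1            # fib(0), fib(1)
    for _ in range(n):
        a, b = b, a + b    # shift the window one step
    return a
```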

The DP Recipe

For any DP problem, follow these steps:

  1. Define the state: what does dp[i] (or dp[i][j]) represent? This is the hardest step. The state must capture enough information to make optimal decisions.

  2. Write the recurrence: how does dp[i] relate to smaller subproblems? This is the transition formula.

  3. Identify the base case: what are the smallest subproblems that can be solved directly?

  4. Determine the iteration order: which subproblems must be solved before which? Bottom-up: iterate in an order that ensures dependencies are resolved. Top-down: recursion handles this automatically.

  5. Optimise space (optional): if dp[i] only depends on the previous row or the previous few entries, you do not need the full table.

Example: The Thinking Process

Problem: given an array of positive integers, find the maximum sum of non-adjacent elements (House Robber).

Step 1 — Define the state: dp[i] = maximum sum considering elements nums[0..i].

Step 2 — Write the recurrence: for element \(i\), we either:

    • Skip it: dp[i] = dp[i-1] (best sum without element \(i\)).
    • Take it: dp[i] = dp[i-2] + nums[i] (must skip element \(i-1\), then add element \(i\)).

So: dp[i] = max(dp[i-1], dp[i-2] + nums[i]).

Step 3 — Base cases: dp[0] = nums[0], dp[1] = max(nums[0], nums[1]).

Step 4 — Iteration order: left to right (each state depends on the two previous states).

Step 5 — Space optimisation: only need the last two values.

def rob(nums):
    if len(nums) == 1:
        return nums[0]

    prev2, prev1 = nums[0], max(nums[0], nums[1])

    for i in range(2, len(nums)):
        curr = max(prev1, prev2 + nums[i])
        prev2, prev1 = prev1, curr

    return prev1

How to Identify DP Problems

  • The problem asks for an optimum (minimum cost, maximum profit, longest sequence) or a count (number of ways).
  • The problem has choices at each step (take/skip, go left/right, use this coin or not), and the best overall answer depends on the best answers to subproblems.
  • Drawing the recursion tree reveals repeated subproblems.
  • The brute force is exponential, but there are far fewer distinct states than recursive calls.

Categories of DP

  • 1D DP: state depends on a single index. Examples: climbing stairs, house robber, maximum subarray.

  • 2D DP: state depends on two indices. Examples: longest common subsequence (dp[i][j] for first \(i\) chars of string 1 and first \(j\) chars of string 2), edit distance, grid path problems.

  • Interval DP: state is a range dp[i][j] representing the subproblem on arr[i..j]. Examples: matrix chain multiplication, burst balloons.

  • Knapsack DP: state is an item index and a capacity. Examples: 0/1 knapsack, coin change, subset sum.

  • Bitmask DP: state includes a bitmask representing which elements have been used. Examples: TSP, assignment problem. The state space is \(O(2^n \cdot n)\), feasible for \(n \leq 20\).
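As a concrete 2D example, a sketch of longest common subsequence — note the (m+1) × (n+1) table and the i-1/j-1 offset into the strings:

```python
def lcs_length(a, b):
    """dp[i][j] = LCS length of a[:i] and b[:j]; table is (m+1) x (n+1)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 / col 0 = empty prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:             # chars match: extend the diagonal
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:                                # skip a char from one string
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```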

Top-Down vs Bottom-Up

| | Top-Down (Memoisation) | Bottom-Up (Tabulation) |
| --- | --- | --- |
| Implementation | Recursive + cache | Iterative + table |
| Computes | Only subproblems that are actually needed | All subproblems up to the target |
| Stack overflow risk | Yes (deep recursion) | No |
| Space optimisation | Harder | Easier (use a rolling array) |
| Ease of coding | Often more natural (write the recursion, add a cache) | Requires thinking about iteration order |
  • In interviews, top-down is often faster to code. In production, bottom-up is usually preferred for performance (no recursion overhead, better cache behaviour).

Common Pitfalls

| Pitfall | Example | Fix |
| --- | --- | --- |
| Wrong state definition | dp[i] does not capture enough info to make decisions | Add dimensions (e.g., dp[i][j] instead of dp[i]) |
| Missing base case | dp[0] is wrong → all subsequent values are wrong | Verify the base case by hand |
| Wrong iteration order | Computing dp[i] before its dependencies | Draw the dependency arrows and iterate accordingly |
| Not initialising dp correctly | Using 0 when it should be infinity (for min problems) | float('inf') for minimisation, float('-inf') for maximisation |
| Forgetting the "skip" option | Always taking the current element | The recurrence usually has max(take, skip) |
| Mutable default argument | def f(memo={}) shares the cache across calls | def f(memo=None), then create the dict inside |
| Off-by-one in 2D DP | Indexing text1[i] when dp is 1-indexed | dp has size (m+1) × (n+1); access text1[i-1] |

Putting It All Together

  • These four concepts form a progression:

    1. Big O tells you whether an approach is fast enough.
    2. Recursion breaks problems into subproblems.
    3. Backtracking is recursion + choices + undo, for exhaustive search.
    4. DP is recursion + caching, for optimisation over overlapping subproblems.
  • When you see a new problem:

    • Estimate the input size \(n\). What Big O is acceptable?
    • If the brute force is exponential and the problem asks to enumerate/find configurations: backtracking (with pruning to make it practical).
    • If the brute force is exponential and the problem asks for an optimum or count, and you see overlapping subproblems: DP.
    • If the problem has structure that halves the search space: binary search or divide and conquer.
    • If the problem is on sequences with a constraint on subarrays: sliding window or two pointers.
    • If the problem needs fast lookups: hash map.
  • The remaining files in this chapter apply these ideas to specific data structures and patterns.