Foundations: Big O, Recursion, Backtracking, and Dynamic Programming¶
Before diving into data structures and algorithms, you need four foundational concepts: Big O notation for measuring efficiency, recursion for breaking problems into subproblems, backtracking for exhaustive search with pruning, and dynamic programming for avoiding redundant computation. This file teaches each from first principles.
- The remaining files in this chapter assume you are comfortable with these four ideas. If you skip this file, the \(O(n \log n)\) annotations, recursive tree traversals, backtracking templates, and DP state transitions in later files will feel like magic rather than engineering.
Why Patterns, Not Memorisation¶
There are thousands of coding problems on LeetCode, NeetCode, and HackerRank. Nobody can memorise them all, and trying to is a losing strategy. Interviewers do not pick problems from a fixed list; they modify, combine, and disguise them. A memorised solution to "Two Sum" will not help you when the interviewer asks a variant you have never seen.

The good news: there are only about 15-20 core patterns (two pointers, sliding window, BFS/DFS, DP, backtracking, etc.). Every problem, no matter how novel it looks on the surface, reduces to one or a combination of these patterns. The interview is not testing whether you have seen this exact problem before. It is testing whether you can peel away the context (the story, the specific data types, the edge cases) and recognise the underlying pattern.

Consider these three problems:
- "Find two numbers in an array that sum to a target."
- "Find two molecules whose binding energies sum to a threshold."
- "Given a list of account balances, find two accounts whose combined value equals a debt."
They look different. They are the same problem: Two Sum. The context (numbers, molecules, accounts) is irrelevant. The structure is: search for a complement in a collection → hash map lookup.

This is why this chapter teaches patterns through intuition, not solutions through repetition. For each pattern, we explain:
- What structural property of the problem signals this pattern (sorted input → two pointers; subarray constraint → sliding window; optimal substructure + overlapping subproblems → DP).
- Why the pattern works — the mathematical or logical reasoning, not just "it gives the right answer."
- How to adapt it — by showing easy, medium, and hard variants where the same core idea applies in different contexts.
When you deeply understand why sliding window works (the monotonicity of the constraint means expanding/contracting is sufficient), you can apply it to any problem with that structure, even one you have never seen. When you only memorise the code for "Longest Substring Without Repeating Characters", you are stuck the moment the problem changes.

The practical strategy:
- Learn the pattern (this chapter).
- Practise recognising it in disguised problems (the NeetCode take-homes at the end of each file).
- Practise implementing it under time pressure.
- In the interview: read the problem → strip the context → identify the pattern → implement.
Big O Notation¶
When we say an algorithm is "fast" or "slow," we need a precise way to measure it. Big O notation describes how an algorithm's runtime (or space usage) grows as the input size \(n\) increases, ignoring constant factors and lower-order terms.

The formal definition: \(f(n) = O(g(n))\) means there exist constants \(c > 0\) and \(n_0\) such that \(f(n) \leq c \cdot g(n)\) for all \(n \geq n_0\). In plain English: \(f\) grows no faster than \(g\) for large inputs.

Why ignore constants? Because a \(2n\) algorithm and a \(5n\) algorithm are both \(O(n)\): they scale the same way. On a faster computer, the constants change, but the scaling does not. Big O captures the intrinsic difficulty of the problem, independent of hardware.

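To see the definition at work: \(f(n) = 5n + 20\) is \(O(n)\), because choosing \(c = 6\) and \(n_0 = 20\) gives \(5n + 20 \leq 6n\) for all \(n \geq 20\). The constant 5 and the offset 20 disappear into the witnesses \(c\) and \(n_0\), which is exactly why Big O can ignore them.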
The Growth Rate Hierarchy¶
- From fastest to slowest:
| Big O | Name | Example | \(n = 10^6\) operations |
|---|---|---|---|
| \(O(1)\) | Constant | Array access, hash lookup | 1 |
| \(O(\log n)\) | Logarithmic | Binary search | 20 |
| \(O(n)\) | Linear | Linear scan, single loop | \(10^6\) |
| \(O(n \log n)\) | Linearithmic | Merge sort, efficient sorting | \(2 \times 10^7\) |
| \(O(n^2)\) | Quadratic | Nested loops, brute-force pairs | \(10^{12}\) (too slow) |
| \(O(n^3)\) | Cubic | Triple nested loops, matrix multiply | \(10^{18}\) (way too slow) |
| \(O(2^n)\) | Exponential | All subsets, brute-force backtracking | \(10^{301030}\) (impossible) |
| \(O(n!)\) | Factorial | All permutations | absurd |
Rule of thumb: modern computers execute roughly \(10^8\)–\(10^9\) simple operations per second. For a 1-second time limit:
- \(O(n)\) works for \(n \leq 10^8\)
- \(O(n \log n)\) works for \(n \leq 10^7\)
- \(O(n^2)\) works for \(n \leq 10^4\)
- \(O(2^n)\) works for \(n \leq 25\)
This table tells you immediately whether your approach is fast enough. If \(n = 10^5\) and your solution is \(O(n^2)\), it will be \(10^{10}\) operations — too slow. You need a better algorithm.
How to Analyse Big O¶
- Single loop over \(n\) elements: \(O(n)\).
- Nested loops: multiply the iteration counts.
```python
for i in range(n):          # n iterations
    for j in range(n):      # n iterations each
        process(i, j)       # O(1) work
# Total: O(n^2)
```
- Loop with halving: \(O(\log n)\). Each iteration halves the problem size, so it takes \(\log_2 n\) iterations.
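A sketch of such a loop (`count_halvings` is an illustrative name, not a standard function):

```python
def count_halvings(n):
    """Return how many halvings it takes to reduce n to 1 (about log2 n)."""
    steps = 0
    while n > 1:
        n //= 2        # the problem size halves every iteration
        steps += 1
    return steps
```

For \(n = 1024 = 2^{10}\), the loop runs exactly 10 times.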
- Nested loop where inner depends on outer:
```python
for i in range(n):
    for j in range(i):      # j goes from 0 to i-1
        process(i, j)
# Total: 0 + 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
```
- Recursive: write the recurrence relation and solve it (chapter 13 covered the Master Theorem). For example, merge sort: \(T(n) = 2T(n/2) + O(n) = O(n \log n)\).
Common Traps¶
- Hidden loops: `x in list` is \(O(n)\) in Python (linear scan), but `x in set` is \(O(1)\). Using `in` on a list inside a loop gives \(O(n^2)\), not \(O(n)\).

```python
# BAD: O(n^2) — "in" on a list is O(n)
for x in arr:
    if x in another_list:
        process(x)

# GOOD: O(n) — convert to a set first
another_set = set(another_list)
for x in arr:
    if x in another_set:
        process(x)
```
- String concatenation: `s += c` in Python copies the entire string each time. Inside a loop of \(n\) iterations: \(O(1 + 2 + \cdots + n) = O(n^2)\).
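A common fix is to collect the pieces in a list and join once at the end; a sketch with illustrative helper names:

```python
def build_slow(chars):
    s = ""
    for c in chars:
        s += c              # copies the whole string each time: O(n^2) overall
    return s

def build_fast(chars):
    parts = []
    for c in chars:
        parts.append(c)     # amortised O(1) per append
    return "".join(parts)   # a single O(n) pass at the end
```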
- Sorting dominance: if your algorithm sorts (\(O(n \log n)\)) and then does a linear scan (\(O(n)\)), the total is \(O(n \log n)\) — the sort dominates.
- Amortised complexity: some operations are expensive occasionally but cheap on average. Dynamic array append is \(O(1)\) amortised because the rare \(O(n)\) resize is spread across \(n\) cheap appends. Do not confuse amortised \(O(1)\) with worst-case \(O(1)\).
Space Complexity¶
Space complexity follows the same Big O rules, applied to memory usage instead of time.

In-place algorithms use \(O(1)\) extra space (not counting the input). Quicksort is \(O(\log n)\) space (recursion stack depth). Merge sort is \(O(n)\) (temporary arrays for merging).

Recursion stack: every recursive call uses stack space. A recursion \(n\) levels deep uses \(O(n)\) space, even if each call does not allocate additional memory. This is why recursive DFS on a graph with \(n\) nodes uses \(O(n)\) space.

For interviews, always state both time and space complexity. An \(O(n)\) time, \(O(n)\) space solution is often acceptable, but an \(O(n)\) time, \(O(1)\) space solution is better. The interviewer may ask you to optimise one or the other.

Recursion¶
Recursion is when a function calls itself to solve a smaller instance of the same problem. It is the most natural approach for problems with recursive structure: trees, nested structures, divide-and-conquer, and mathematical sequences.

Every recursive function has two parts:
- Base case: the smallest instance that can be solved directly (without recursion). This is what stops the recursion.
- Recursive case: break the problem into smaller subproblems, solve them recursively, and combine the results.
Example: Factorial¶
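The code for this example is the textbook recursive factorial; a minimal version:

```python
def factorial(n):
    if n <= 1:                      # base case: 0! = 1! = 1
        return 1
    return n * factorial(n - 1)     # recursive case: n! = n * (n-1)!
```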
How it executes for `factorial(4)`:

- `factorial(4)` calls `factorial(3)`
- `factorial(3)` calls `factorial(2)`
- `factorial(2)` calls `factorial(1)`
- `factorial(1)` returns `1` (base case)
- `factorial(2)` returns `2 * 1 = 2`
- `factorial(3)` returns `3 * 2 = 6`
- `factorial(4)` returns `4 * 6 = 24`

Each call is pushed onto the call stack. The stack grows until the base case is reached, then unwinds as each call returns. If the recursion is too deep (e.g., `factorial(1000000)` in Python), the stack overflows with a `RecursionError`. Python's default recursion limit is 1000.
How to Think Recursively¶
The key mental shift: trust the recursion. When writing a recursive function, assume that the recursive call returns the correct answer for the smaller subproblem. Your job is only to:
- Handle the base case.
- Break the problem into smaller pieces.
- Combine the results.
You do not need to trace through every recursive call in your head. That is like trying to understand a loop by mentally executing every iteration. Instead, verify: "if the recursive call gives me the right answer for the smaller input, does my combination step give the right answer for the full input?"

Example: Recursion on a Linked List¶
- Reverse a linked list recursively:
```python
def reverse(head):
    if not head or not head.next:   # base case: 0 or 1 nodes
        return head
    new_head = reverse(head.next)   # reverse the rest
    head.next.next = head           # point the next node back to me
    head.next = None                # I am now the tail
    return new_head
```
- Trust the recursion: `reverse(head.next)` correctly reverses the rest of the list and returns the new head. We just need to attach the current node at the end.
Example: Recursion on a Tree¶
- Compute the height of a binary tree:
```python
def height(root):
    if not root:                        # base case: empty tree has height 0
        return 0
    left_h = height(root.left)          # height of left subtree
    right_h = height(root.right)        # height of right subtree
    return 1 + max(left_h, right_h)     # this node adds 1 level
```
- This pattern — "recurse on left, recurse on right, combine" — solves the vast majority of tree problems (see file 03).
Recursion vs Iteration¶
Every recursive algorithm can be converted to an iterative one (using an explicit stack or loop). Iteration avoids the call stack overhead and the risk of stack overflow.

When to prefer recursion: the problem has natural recursive structure (trees, nested data, divide-and-conquer). The recursive solution is cleaner and easier to reason about.

When to prefer iteration: the recursion depth could be very large (e.g., processing a linked list of \(10^6\) nodes). The iterative solution avoids stack overflow.

Tail recursion: a recursive call is "tail recursive" if it is the last operation in the function (no work is done after the recursive call returns). Some languages (Scheme, Scala) optimise tail calls to use constant stack space. Python does not optimise tail calls, so tail recursion in Python still uses \(O(n)\) stack space.

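To make the distinction concrete, here is a sketch of both forms (and again: Python will not optimise either one):

```python
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)            # NOT tail recursive: the multiply runs after the call returns

def fact_tail(n, acc=1):
    if n <= 1:
        return acc
    return fact_tail(n - 1, acc * n)  # tail recursive: the recursive call is the last operation
```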
Common Pitfalls¶
| Pitfall | Example | Fix |
|---|---|---|
| Missing base case | Infinite recursion → stack overflow | Always define when to stop |
| Wrong base case | Off-by-one in recursive decomposition | Test with the smallest inputs (0, 1, 2) |
| Not reducing the problem | `f(n)` calls `f(n)` instead of `f(n-1)` | Ensure the subproblem is strictly smaller |
| Redundant computation | Fibonacci: `f(n) = f(n-1) + f(n-2)` recomputes exponentially | Use memoisation (→ DP) |
| Python recursion limit | `factorial(10000)` crashes | Use `sys.setrecursionlimit` or convert to iteration |
Backtracking¶
Backtracking is a systematic way to explore all possible solutions by building them incrementally and abandoning partial solutions that cannot possibly lead to a valid answer.

Think of it as navigating a maze. At each junction, you pick a path. If you hit a dead end, you go back to the last junction and try a different path. You do not start over from the beginning — you backtrack to the most recent decision point.

The Three Steps¶
Every backtracking algorithm follows the same pattern:
- Choose: select a candidate to extend the current partial solution.
- Explore: recursively try to build a complete solution from this candidate.
- Unchoose: undo the choice (backtrack) and try the next candidate.
```python
def backtrack(state, choices, result):
    if is_complete(state):
        result.append(state.copy())
        return
    for choice in choices:
        if is_valid(choice, state):
            state.add(choice)                  # 1. choose
            backtrack(state, choices, result)  # 2. explore
            state.remove(choice)               # 3. unchoose (backtrack)
```
- The unchoose step is what distinguishes backtracking from plain recursion. Without it, the state accumulates all choices and you cannot explore alternative paths.
When to Use Backtracking¶
- The problem asks to enumerate all valid configurations: all permutations, all subsets, all valid arrangements (e.g., N-Queens).
- The problem asks to find any valid configuration: Sudoku solving, maze path finding.
- The search space is large but can be pruned: most partial solutions can be rejected early without exploring them fully.
How Pruning Makes It Fast¶
- Without pruning, backtracking explores every possible combination — exponential time. Pruning cuts branches early:
```python
for choice in choices:
    if not is_valid(choice, state):
        continue   # PRUNE: skip this entire subtree
    state.add(choice)
    backtrack(state, choices, result)
    state.remove(choice)
```
- In N-Queens (file 05), checking column and diagonal conflicts before placing a queen prunes the search tree from \(n^n\) to roughly \(n!\) candidates. For \(n = 8\), that is 16 million → 40,000. Good pruning makes exponential algorithms practical for moderate \(n\).
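A minimal sketch of that pruning, counting N-Queens solutions with set-based conflict checks (file 05 develops this fully; `n_queens_count` is an illustrative name):

```python
def n_queens_count(n):
    """Count N-Queens solutions, rejecting conflicts before recursing."""
    cols, diag1, diag2 = set(), set(), set()

    def place(row):
        if row == n:
            return 1                    # all n queens placed: one complete solution
        total = 0
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue                # PRUNE: conflict, skip this whole subtree
            cols.add(col)               # choose
            diag1.add(row - col)
            diag2.add(row + col)
            total += place(row + 1)     # explore
            cols.remove(col)            # unchoose
            diag1.remove(row - col)
            diag2.remove(row + col)
        return total

    return place(0)
```

For \(n = 8\) this returns 92, the classic answer, after visiting only a tiny fraction of the \(8^8\) unpruned placements.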
Generating All Subsets (The Simplest Backtracking)¶
```python
def subsets(nums):
    result = []

    def backtrack(start, path):
        result.append(path[:])       # every partial solution is a valid subset
        for i in range(start, len(nums)):
            path.append(nums[i])     # choose
            backtrack(i + 1, path)   # explore (i+1: no reuse)
            path.pop()               # unchoose

    backtrack(0, [])
    return result
```
For `[1, 2, 3]`, the recursion tree:

`[]` → `[1]` → `[1,2]` → `[1,2,3]` (backtrack) → `[1,3]` (backtrack) → `[2]` → `[2,3]` (backtrack) → `[3]`

Each node in the tree is one call to `backtrack`. Each leaf (and intermediate node) produces a subset. Total subsets: \(2^n\).
Generating All Permutations¶
```python
def permutations(nums):
    result = []

    def backtrack(path, remaining):
        if not remaining:
            result.append(path[:])
            return
        for i in range(len(remaining)):
            path.append(remaining[i])                         # choose
            backtrack(path, remaining[:i] + remaining[i+1:])  # explore
            path.pop()                                        # unchoose

    backtrack([], nums)
    return result
```
- Total permutations: \(n!\). Each requires \(O(n)\) work to construct `remaining`, so the total is \(O(n \cdot n!)\).
Common Pitfalls¶
| Pitfall | Example | Fix |
|---|---|---|
| Forgetting to copy the path | `result.append(path)` — all entries share the same list | `result.append(path[:])` or `path.copy()` |
| Not backtracking (unchoosing) | State keeps growing; later candidates see stale state | Always `path.pop()` or `state.remove()` after the recursive call |
| Wrong loop start | Subsets with duplicates, or permutations with unwanted reuse | Use a `start` parameter to avoid revisiting earlier indices |
| Skipping pruning | Exploring obviously invalid branches | Add `if not is_valid: continue` before the recursive call |
Dynamic Programming¶
Dynamic programming (DP) is an optimisation technique for problems where the same subproblems are solved repeatedly. Instead of recomputing, DP solves each subproblem once and stores the result.

DP applies when a problem has two properties:
- Optimal substructure: the optimal solution can be built from optimal solutions to subproblems.
- Overlapping subproblems: the same subproblems appear multiple times in the recursion.
The Fibonacci Motivation¶
- Naive recursive Fibonacci:
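A minimal version of the naive recursion:

```python
def fib(n):
    if n <= 1:                      # base cases: fib(0) = 0, fib(1) = 1
        return n
    return fib(n - 1) + fib(n - 2)  # recomputes the same subproblems over and over
```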
For `fib(5)`, the recursion tree:

- `fib(5)` calls `fib(4)` and `fib(3)`
- `fib(4)` calls `fib(3)` and `fib(2)`
- `fib(3)` is computed twice, `fib(2)` is computed three times

This is \(O(2^n)\) because the tree branches at every level, and most branches recompute the same values. For `fib(50)`, the naive version makes over \(10^{10}\) recursive calls — infeasible in practice.

With memoisation (top-down DP):
With memoisation (top-down DP):
```python
def fib_memo(n, memo=None):
    if memo is None:    # fresh cache per top-level call (avoids the mutable-default-argument trap)
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]
```
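In everyday Python the cache does not have to be hand-rolled; `functools.lru_cache` memoises a function automatically (a sketch of the same idea):

```python
from functools import lru_cache

@lru_cache(maxsize=None)    # unbounded cache of n -> fib(n)
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
```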
Now `fib(3)` is computed once, stored, and looked up on subsequent calls. Total: \(O(n)\) time, \(O(n)\) space.

With tabulation (bottom-up DP):
```python
def fib_tab(n):
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```
- Same \(O(n)\) time, but builds the solution from the bottom up without recursion. Can be further optimised to \(O(1)\) space since each value depends only on the previous two.
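That space optimisation, as a sketch:

```python
def fib_opt(n):
    if n <= 1:
        return n
    prev2, prev1 = 0, 1                        # fib(0), fib(1)
    for _ in range(2, n + 1):
        prev2, prev1 = prev1, prev2 + prev1    # slide the two-value window forward
    return prev1
```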
The DP Recipe¶
For any DP problem, follow these steps:
- Define the state: what does `dp[i]` (or `dp[i][j]`) represent? This is the hardest step. The state must capture enough information to make optimal decisions.
- Write the recurrence: how does `dp[i]` relate to smaller subproblems? This is the transition formula.
- Identify the base case: what are the smallest subproblems that can be solved directly?
- Determine the iteration order: which subproblems must be solved before which? Bottom-up: iterate in an order that ensures dependencies are resolved. Top-down: recursion handles this automatically.
- Optimise space (optional): if `dp[i]` only depends on the previous row or the previous few entries, you do not need the full table.
Example: The Thinking Process¶
Problem: given an array of positive integers, find the maximum sum of non-adjacent elements (House Robber).
Step 1 — Define the state: dp[i] = maximum sum considering elements nums[0..i].
Step 2 — Write the recurrence: for element \(i\), we either:
- Skip it: dp[i] = dp[i-1] (best sum without element \(i\)).
- Take it: dp[i] = dp[i-2] + nums[i] (must skip element \(i-1\), then add element \(i\)).
So: dp[i] = max(dp[i-1], dp[i-2] + nums[i]).
Step 3 — Base cases: dp[0] = nums[0], dp[1] = max(nums[0], nums[1]).
Step 4 — Iteration order: left to right (each state depends on the two previous states).
Step 5 — Space optimisation: only need the last two values.
```python
def rob(nums):
    if not nums:                             # defensive guard for an empty input
        return 0
    if len(nums) == 1:
        return nums[0]
    prev2, prev1 = nums[0], max(nums[0], nums[1])
    for i in range(2, len(nums)):
        curr = max(prev1, prev2 + nums[i])   # skip element i, or take it on top of dp[i-2]
        prev2, prev1 = prev1, curr
    return prev1
```
How to Identify DP Problems¶
- The problem asks for an optimum (minimum cost, maximum profit, longest sequence) or a count (number of ways).
- The problem has choices at each step (take/skip, go left/right, use this coin or not), and the best overall answer depends on the best answers to subproblems.
- Drawing the recursion tree reveals repeated subproblems.
- The brute force is exponential, but there are far fewer distinct states than recursive calls.
Categories of DP¶
- 1D DP: state depends on a single index. Examples: climbing stairs, house robber, maximum subarray.
- 2D DP: state depends on two indices. Examples: longest common subsequence (`dp[i][j]` for the first \(i\) chars of string 1 and the first \(j\) chars of string 2), edit distance, grid path problems.
- Interval DP: state is a range `dp[i][j]` representing the subproblem on `arr[i..j]`. Examples: matrix chain multiplication, burst balloons.
- Knapsack DP: state is an item index and a capacity. Examples: 0/1 knapsack, coin change, subset sum.
- Bitmask DP: state includes a bitmask recording which elements have been used. Examples: TSP, assignment problem. The state space is \(O(2^n \cdot n)\), feasible for \(n \leq 20\).
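As a worked instance of the knapsack category, here is a sketch of coin change (fewest coins summing to `amount`, with unlimited supply of each coin; `min_coins` is an illustrative name):

```python
def min_coins(coins, amount):
    INF = float('inf')                 # "unreachable" sentinel, per the pitfalls table below
    dp = [0] + [INF] * amount          # dp[a] = fewest coins that sum to a
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1  # use coin c on top of the best answer for a - c
    return dp[amount] if dp[amount] != INF else -1
```

For `min_coins([1, 2, 5], 11)` the answer is 3 (5 + 5 + 1); an unreachable amount such as `min_coins([2], 3)` returns -1.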
Top-Down vs Bottom-Up¶
| | Top-Down (Memoisation) | Bottom-Up (Tabulation) |
|---|---|---|
| Implementation | Recursive + cache | Iterative + table |
| Computes | Only subproblems that are actually needed | All subproblems up to the target |
| Stack overflow risk | Yes (deep recursion) | No |
| Space optimisation | Harder | Easier (use a rolling array) |
| Ease of coding | Often more natural (write the recursion, add a cache) | Requires thinking about iteration order |
- In interviews, top-down is often faster to code. In production, bottom-up is usually preferred for performance (no recursion overhead, better cache behaviour).
Common Pitfalls¶
| Pitfall | Example | Fix |
|---|---|---|
| Wrong state definition | `dp[i]` does not capture enough info to make decisions | Add dimensions (e.g., `dp[i][j]` instead of `dp[i]`) |
| Missing base case | `dp[0]` is wrong → all subsequent values are wrong | Verify the base case by hand |
| Wrong iteration order | Computing `dp[i]` before its dependencies | Draw the dependency arrows and iterate accordingly |
| Not initialising `dp` correctly | Using 0 when it should be infinity (for min problems) | `float('inf')` for minimisation, `float('-inf')` for maximisation |
| Forgetting to consider the "skip" option | Always taking the current element | The recurrence usually has `max(take, skip)` |
| Mutable default argument | `def f(memo={})` shares the cache across calls | `def f(memo=None): if memo is None: memo = {}` |
| Off-by-one in 2D DP | Indexing `text1[i]` when `dp` is 1-indexed | Give `dp` size \((m+1) \times (n+1)\) and access `text1[i-1]` |
Putting It All Together¶
These four concepts form a progression:
- Big O tells you whether an approach is fast enough.
- Recursion breaks problems into subproblems.
- Backtracking is recursion + choices + undo, for exhaustive search.
- DP is recursion + caching, for optimisation over overlapping subproblems.
When you see a new problem:
- Estimate the input size \(n\). What Big O is acceptable?
- If the brute force is exponential and the problem asks to enumerate/find configurations: backtracking (with pruning to make it practical).
- If the brute force is exponential and the problem asks for an optimum or count, and you see overlapping subproblems: DP.
- If the problem has structure that halves the search space: binary search or divide and conquer.
- If the problem is on sequences with a constraint on subarrays: sliding window or two pointers.
- If the problem needs fast lookups: hash map.
The remaining files in this chapter apply these ideas to specific data structures and patterns.