Sorting, Search, and Algorithm Design¶
Sorting and searching are the most fundamental algorithmic operations. This file covers sorting algorithms, binary search patterns, divide and conquer, greedy algorithms, dynamic programming, and backtracking.
- Every data structure enables algorithms, and every algorithm relies on data structures. These are the design paradigms: the high-level strategies for solving problems. Once you recognise which paradigm applies, the implementation follows naturally.
Sorting Algorithms¶
- Sorting is the most studied problem in computer science. Understanding the algorithms builds intuition for recursion, divide-and-conquer, and complexity analysis.
| Algorithm | Best | Average | Worst | Space | Stable? |
|---|---|---|---|---|---|
| Bubble sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes |
| Insertion sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes |
| Merge sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n)\) | Yes |
| Quick sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n^2)\) | \(O(\log n)\) | No |
| Heap sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n \log n)\) | \(O(1)\) | No |
| Counting sort | \(O(n + k)\) | \(O(n + k)\) | \(O(n + k)\) | \(O(k)\) | Yes |
| Radix sort | \(O(d(n + k))\) | \(O(d(n + k))\) | \(O(d(n + k))\) | \(O(n + k)\) | Yes |
- Stable means equal elements preserve their relative order. This matters when sorting by multiple keys.
- The lower bound for comparison-based sorting is \(\Omega(n \log n)\). The proof uses decision trees (chapter 13): any comparison sort must distinguish all \(n!\) permutations, requiring at least \(\log_2(n!) = \Omega(n \log n)\) comparisons. Counting sort and radix sort beat this by not comparing elements.
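- Python's built-in `sorted` is stable (Timsort), so a multi-key sort can be written as chained single-key sorts from least- to most-significant key. A quick sketch (the sample records are made up):

```python
# Sort alphabetically first, then stably by score: records with equal
# scores keep the alphabetical order established by the first pass.
records = [("carol", 2), ("alice", 2), ("dave", 1), ("bob", 1)]
by_name = sorted(records, key=lambda r: r[0])
by_score_then_name = sorted(by_name, key=lambda r: r[1])
# by_score_then_name == [('bob', 1), ('dave', 1), ('alice', 2), ('carol', 2)]
```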
Merge Sort¶
- Divide the array in half, recursively sort each half, then merge the sorted halves. \(O(n \log n)\) always, \(O(n)\) extra space.
```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= for stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
```
- Pitfall: using `<` instead of `<=` in the merge breaks stability (equal elements from the right half would come before those from the left).
Quick Sort¶
- Pick a pivot, partition elements into "less than pivot" and "greater than pivot," recursively sort each partition. \(O(n \log n)\) average, \(O(n^2)\) worst case (when the pivot is always the smallest or largest element).
```python
def quicksort(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return
    pivot_idx = partition(arr, lo, hi)
    quicksort(arr, lo, pivot_idx - 1)
    quicksort(arr, pivot_idx + 1, hi)

def partition(arr, lo, hi):
    pivot = arr[hi]  # Lomuto: pivot is last element
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i
```
- Pivot strategies: last element (simple, bad for sorted input), random (expected \(O(n \log n)\)), median-of-three (practical choice). Always prefer a random pivot in interviews to avoid worst-case discussion.
- Pitfall: quicksort's \(O(n^2)\) worst case occurs on already-sorted arrays with a first/last-element pivot. In practice, a random pivot or median-of-three eliminates this.
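- The random-pivot fix is a two-line change to the code above: swap a random index into the pivot slot before partitioning. A sketch (the function name `quicksort_random` is mine):

```python
import random

def quicksort_random(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return
    # Move a random element into the last slot, then Lomuto-partition as usual.
    r = random.randint(lo, hi)
    arr[r], arr[hi] = arr[hi], arr[r]
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    quicksort_random(arr, lo, i - 1)
    quicksort_random(arr, i + 1, hi)
```

Sorted input, the classic worst case for a fixed last-element pivot, now runs in expected \(O(n \log n)\).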
Counting Sort¶
- When values are integers in a known range \([0, k)\), count occurrences and reconstruct: \(O(n + k)\) time. Not comparison-based, so it can beat \(O(n \log n)\).
```python
def counting_sort(arr, k):
    count = [0] * k
    for x in arr:
        count[x] += 1
    result = []
    for val in range(k):
        result.extend([val] * count[val])
    return result
```
- When to use: the range \(k\) is not much larger than \(n\). If \(k = O(n)\), this is \(O(n)\). If \(k \gg n\) (e.g., sorting 10 numbers in range \([0, 10^9]\)), counting sort wastes memory.
Pattern: Binary Search¶
- Binary search finds a target in a sorted array in \(O(\log n)\) by repeatedly halving the search space. But binary search is much more than "find a number in a sorted array." The general pattern is: search on a monotonic condition.
- Template (the one that avoids off-by-one errors):
```python
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # avoids overflow in other languages
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found
```
- Lower bound (first element \(\geq\) target):
```python
def lower_bound(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```
- Pitfall: the difference between `lo <= hi` and `lo < hi`, and between `hi = mid` and `hi = mid - 1`, determines whether you find the exact match or the boundary. Draw it out with a 2-element array to verify.
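- The `lower_bound` template matches the standard library's `bisect.bisect_left`, a convenient reference when checking a hand-rolled version:

```python
import bisect

arr = [1, 3, 3, 5]
# bisect_left returns the first index i with arr[i] >= target.
assert bisect.bisect_left(arr, 3) == 1  # before the existing 3s
assert bisect.bisect_left(arr, 4) == 3  # between the 3s and the 5
assert bisect.bisect_left(arr, 6) == 4  # past the end: no such element
```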
Easy: Binary Search¶
- The standard problem. Use the template above.
Medium: Search in Rotated Sorted Array¶
- Problem: a sorted array is rotated at some pivot. Find a target.
- Pattern: at each step, one half is always sorted. Determine which half is sorted, and check if the target lies in that half.
```python
def search_rotated(nums, target):
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:  # left half is sorted
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:  # right half is sorted
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```
- Pitfall: the `<=` in `nums[lo] <= nums[mid]` (not `<`) is critical. When `lo == mid` (2 elements left), we must correctly identify the sorted half.
Hard: Median of Two Sorted Arrays¶
- Problem: find the median of two sorted arrays in \(O(\log(m + n))\).
- Pattern: binary search on the partition point of the smaller array. The partition divides both arrays such that all elements on the left are smaller than all elements on the right.
```python
def find_median(nums1, nums2):
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1  # ensure nums1 is shorter
    m, n = len(nums1), len(nums2)
    lo, hi = 0, m
    half = (m + n + 1) // 2
    while lo <= hi:
        i = (lo + hi) // 2  # partition point in nums1
        j = half - i        # partition point in nums2
        left1 = nums1[i - 1] if i > 0 else float('-inf')
        right1 = nums1[i] if i < m else float('inf')
        left2 = nums2[j - 1] if j > 0 else float('-inf')
        right2 = nums2[j] if j < n else float('inf')
        if left1 <= right2 and left2 <= right1:
            # correct partition
            if (m + n) % 2 == 1:
                return max(left1, left2)
            return (max(left1, left2) + min(right1, right2)) / 2
        elif left1 > right2:
            hi = i - 1
        else:
            lo = i + 1
```
- This is one of the hardest binary search problems. The key insight is that you are not searching for a value but for a partition point that satisfies a condition.
Meta-Pattern: Binary Search on Answer¶
- Many problems that do not look like binary search can be solved by binary searching on the answer. If the answer is a number, and you can write a function `is_feasible(x)` that is monotonic (True for every \(x\) at or above the optimum, False for every \(x\) below it), then binary search over \(x\).
- Example: "what is the minimum capacity of a ship to deliver all packages within \(d\) days?" Binary search on the capacity. For each candidate capacity, greedily check if all packages can be delivered in \(d\) days.
```python
def ship_within_days(weights, days):
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        # can we ship with capacity mid in <= days?
        current_load, num_days = 0, 1
        for w in weights:
            if current_load + w > mid:
                num_days += 1
                current_load = 0
            current_load += w
        if num_days <= days:
            hi = mid
        else:
            lo = mid + 1
    return lo
```
Pattern: Greedy Algorithms¶
- A greedy algorithm makes the locally optimal choice at each step, hoping this leads to the globally optimal solution. Greedy works when the problem has the greedy-choice property (a local optimum leads to a global optimum) and optimal substructure (an optimal solution contains optimal solutions to subproblems).
Medium: Jump Game¶
- Problem: given an array where `nums[i]` is the max jump length at position \(i\), determine if you can reach the last index.
```python
def can_jump(nums):
    max_reach = 0
    for i, jump in enumerate(nums):
        if i > max_reach:
            return False  # cannot reach this position
        max_reach = max(max_reach, i + jump)
    return True
```
- Why greedy works: we only need to know the farthest reachable position. If the current position is beyond the farthest reach, we are stuck. Otherwise, we update the farthest reach.
Medium: Merge Intervals¶
- Problem: merge overlapping intervals.
```python
def merge_intervals(intervals):
    intervals.sort(key=lambda x: x[0])
    merged = [intervals[0]]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```
- Pattern: sort by start time, then merge greedily. If the current interval overlaps with the last merged one, extend it. Otherwise, start a new merged interval.
- Pitfall: using `merged[-1][1] = end` instead of `merged[-1][1] = max(merged[-1][1], end)`. An interval can be fully contained within another (e.g., [1, 10] and [2, 5]).
Pattern: Dynamic Programming¶
- Dynamic programming (DP) solves problems by breaking them into overlapping subproblems, solving each subproblem once, and storing the results. It works when a problem has optimal substructure and overlapping subproblems.
- Two approaches:
    - Top-down (memoisation): write the natural recursive solution, then cache results in a dictionary.
    - Bottom-up (tabulation): build a table from the smallest subproblems up.
- How to identify DP: the problem asks for an optimum (min/max), counting, or existence, and the current decision depends on previous decisions. If you draw the recursion tree and see repeated subproblems, it is DP.
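- The top-down approach is easy to see on Fibonacci: the naive recursion is exponential, but caching each result makes it linear. A minimal sketch using `functools.lru_cache` as the memo table:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recursion is exponential; with it,
    # each subproblem is computed exactly once: O(n) total calls.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

# fib(50) == 12586269025, computed instantly
```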
Easy: Climbing Stairs¶
- Problem: \(n\) steps, you can climb 1 or 2 at a time. How many distinct ways?
- This is Fibonacci: \(f(n) = f(n-1) + f(n-2)\).
```python
def climb_stairs(n):
    if n <= 2:
        return n
    a, b = 1, 2
    for _ in range(3, n + 1):
        a, b = b, a + b
    return b
```
- \(O(n)\) time, \(O(1)\) space. The full memoisation table is not needed since each state depends only on the previous two.
Medium: Coin Change¶
- Problem: given coin denominations and a target amount, find the minimum number of coins needed.
- State: `dp[amount]` = minimum coins to make `amount`.
- Transition: `dp[amount] = min(dp[amount - coin] + 1)` over each coin.
- Base case: `dp[0] = 0`.
```python
def coin_change(coins, amount):
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and dp[a - coin] + 1 < dp[a]:
                dp[a] = dp[a - coin] + 1
    return dp[amount] if dp[amount] != float('inf') else -1
```
- Pitfall: initialise with `float('inf')` (not 0 or -1). The minimum comparison only works if unreachable states are infinity.
Medium: Longest Common Subsequence¶
- Problem: given two strings, find the length of their longest common subsequence.
- State: `dp[i][j]` = LCS of `text1[:i]` and `text2[:j]`.
- Transition: if `text1[i-1] == text2[j-1]`, then `dp[i][j] = dp[i-1][j-1] + 1`. Otherwise, `dp[i][j] = max(dp[i-1][j], dp[i][j-1])`.
```python
def longest_common_subsequence(text1, text2):
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```
Hard: 0/1 Knapsack¶
- Problem: given items with weights and values, and a capacity \(W\), maximise the total value without exceeding \(W\).
- State: `dp[i][w]` = max value using the first \(i\) items with capacity \(w\).
- Transition: `dp[i][w] = max(dp[i-1][w], dp[i-1][w - weight[i]] + value[i])` (skip or take item \(i\)).
```python
def knapsack(weights, values, capacity):
    n = len(weights)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]  # skip item i
            if weights[i - 1] <= w:
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])
    return dp[n][capacity]
```
- Space optimisation: since each row only depends on the previous row, use a 1D array and iterate \(w\) from right to left:
```python
def knapsack_optimised(weights, values, capacity):
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        for w in range(capacity, weights[i] - 1, -1):  # right to left!
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]
```
- Pitfall: iterating left to right in the 1D version allows using item \(i\) multiple times (unbounded knapsack). Right to left ensures each item is used at most once.
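- That "bug" is itself the correct algorithm for a different problem: iterating left to right gives the unbounded knapsack, because `dp[w - weights[i]]` may already include item \(i\). A sketch for contrast (the function name is mine):

```python
def knapsack_unbounded(weights, values, capacity):
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        # Left to right: dp[w - weights[i]] may already use item i,
        # so each item can be taken any number of times.
        for w in range(weights[i], capacity + 1):
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]

# knapsack_unbounded([2, 3], [3, 4], 7) == 10  (take 2 + 2 + 3)
```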
Pattern: Backtracking¶
- Backtracking is exhaustive search with pruning. Build a solution incrementally, and abandon (backtrack) as soon as the partial solution cannot possibly lead to a valid complete solution.
- Template:
```python
def backtrack(candidates, path, result):
    if is_solution(path):
        result.append(path[:])  # copy!
        return
    for candidate in get_candidates(path):
        if is_valid(candidate, path):
            path.append(candidate)               # choose
            backtrack(candidates, path, result)  # explore
            path.pop()                           # unchoose (backtrack)
```
Medium: Subsets¶
```python
def subsets(nums):
    result = []
    def backtrack(start, path):
        result.append(path[:])
        for i in range(start, len(nums)):
            path.append(nums[i])
            backtrack(i + 1, path)
            path.pop()
    backtrack(0, [])
    return result
```
Medium: Combination Sum¶
- Problem: find all unique combinations that sum to a target (elements can be reused).
```python
def combination_sum(candidates, target):
    result = []
    def backtrack(start, path, remaining):
        if remaining == 0:
            result.append(path[:])
            return
        for i in range(start, len(candidates)):
            if candidates[i] > remaining:
                break  # prune: sorted, so all further candidates are too large
            path.append(candidates[i])
            backtrack(i, path, remaining - candidates[i])  # i, not i+1: reuse allowed
            path.pop()
    candidates.sort()  # sort for pruning
    backtrack(0, [], target)
    return result
```
- Pitfall: `backtrack(i, ...)` allows reuse of the same element. `backtrack(i + 1, ...)` would move to the next element (no reuse). Getting this wrong is the most common backtracking bug.
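- For contrast, the same function with `i + 1` in the recursive call uses each element at most once (a variant written here purely for illustration):

```python
def combination_sum_no_reuse(candidates, target):
    result = []
    def backtrack(start, path, remaining):
        if remaining == 0:
            result.append(path[:])
            return
        for i in range(start, len(candidates)):
            if candidates[i] > remaining:
                break  # sorted: everything further is too large
            path.append(candidates[i])
            backtrack(i + 1, path, remaining - candidates[i])  # i + 1: no reuse
            path.pop()
    candidates.sort()
    backtrack(0, [], target)
    return result

# combination_sum_no_reuse([2, 3, 5, 7], 7) == [[2, 5], [7]]
```

Note that with duplicate values in the input this can emit duplicate combinations; the skip-duplicates trick from Subsets II handles that.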
Hard: N-Queens¶
- Problem: place \(n\) queens on an \(n \times n\) board such that no two attack each other.
```python
def solve_n_queens(n):
    result = []
    cols = set()
    pos_diag = set()  # (row + col) is constant on / diagonals
    neg_diag = set()  # (row - col) is constant on \ diagonals
    board = [['.'] * n for _ in range(n)]
    def backtrack(row):
        if row == n:
            result.append([''.join(r) for r in board])
            return
        for col in range(n):
            if col in cols or (row + col) in pos_diag or (row - col) in neg_diag:
                continue
            cols.add(col)
            pos_diag.add(row + col)
            neg_diag.add(row - col)
            board[row][col] = 'Q'
            backtrack(row + 1)
            cols.remove(col)
            pos_diag.remove(row + col)
            neg_diag.remove(row - col)
            board[row][col] = '.'
    backtrack(0)
    return result
```
- Key insight: the diagonal encoding. For `/` diagonals, `row + col` is constant. For `\` diagonals, `row - col` is constant. Using sets for column and diagonal tracking makes the validity check \(O(1)\).
Common Pitfalls Summary¶
| Pitfall | Example | Fix |
|---|---|---|
| `lo <= hi` vs `lo < hi` in binary search | Off-by-one in bounds | Choose based on whether `hi` is inclusive or exclusive |
| Left-to-right 1D knapsack | Items used multiple times | Iterate right-to-left for 0/1 knapsack |
| Not copying `path` in backtracking | `result.append(path)` — all entries point to same list | `result.append(path[:])` or `path.copy()` |
| `backtrack(i)` vs `backtrack(i+1)` | Reuse vs no-reuse of elements | Match the problem statement |
| Missing `break` in sorted backtracking | Exploring candidates that are too large | Sort + `break` when candidate exceeds remaining |
| DP initialisation | `dp[0]` wrong → all subsequent values wrong | Carefully define the base case |
| Greedy without proof | Greedy does not always work | Verify greedy choice property |
| Unstable sort for multi-key | Relative order of equal elements lost | Use stable sort (merge sort, Python's `sorted`) |
Take-Home Problems (NeetCode)¶
Binary Search¶
- Binary Search — standard template
- Search a 2D Matrix — binary search on flattened matrix
- Koko Eating Bananas — binary search on answer
- Search in Rotated Sorted Array — identify sorted half
- Find Minimum in Rotated Sorted Array — binary search for inflection
- Median of Two Sorted Arrays — partition-based binary search
Greedy¶
- Jump Game — track max reach
- Jump Game II — BFS-style level tracking
- Merge Intervals — sort + merge
- Insert Interval — find overlap region
- Non-overlapping Intervals — sort by end time
Dynamic Programming¶
- Climbing Stairs — Fibonacci DP
- House Robber — take-or-skip DP
- House Robber II — circular: run twice
- Coin Change — unbounded knapsack
- Longest Common Subsequence — 2D DP on two strings
- Word Break — DP with set lookup
- Longest Increasing Subsequence — \(O(n^2)\) DP or \(O(n \log n)\) with binary search
- Edit Distance — classic 2D DP
- Partition Equal Subset Sum — 0/1 knapsack variant
Backtracking¶
- Subsets — enumerate all subsets
- Combination Sum — backtrack with reuse
- Permutations — backtrack with used set
- Subsets II — skip duplicates
- Word Search — grid backtracking
- Palindrome Partitioning — backtrack + palindrome check
- N-Queens — constraint propagation