
Sorting, Search, and Algorithm Design

Sorting and searching are among the most fundamental algorithmic operations. This file covers sorting algorithms, binary search patterns, divide and conquer, greedy algorithms, dynamic programming, and backtracking.

  • Every data structure enables algorithms, and every algorithm relies on data structures. The emphasis here is on design paradigms: the high-level strategies for solving problems. Once you recognise which paradigm applies, the implementation follows naturally.

Sorting Algorithms

  • Sorting is one of the most studied problems in computer science. Understanding the classic algorithms builds intuition for recursion, divide-and-conquer, and complexity analysis.
| Algorithm | Best | Average | Worst | Space | Stable? |
| --- | --- | --- | --- | --- | --- |
| Bubble sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes |
| Insertion sort | \(O(n)\) | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) | Yes |
| Merge sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n)\) | Yes |
| Quick sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n^2)\) | \(O(\log n)\) | No |
| Heap sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n \log n)\) | \(O(1)\) | No |
| Counting sort | \(O(n + k)\) | \(O(n + k)\) | \(O(n + k)\) | \(O(k)\) | Yes |
| Radix sort | \(O(d(n + k))\) | \(O(d(n + k))\) | \(O(d(n + k))\) | \(O(n + k)\) | Yes |
  • Stable means equal elements preserve their relative order. This matters when sorting by multiple keys.
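
  • A quick illustration with Python's stable sorted (the record data here is made up): sort by the secondary key first, then by the primary key, and equal primary keys keep their secondary order.

```python
# Sort records by grade, then alphabetically within equal grades.
# Because sorted() is stable, two passes compose the keys correctly.
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]

by_name = sorted(records)                       # secondary key first
by_grade = sorted(by_name, key=lambda r: r[1])  # stable pass on primary key

print(by_grade)
# [('alice', 1), ('dave', 1), ('bob', 2), ('carol', 2)]
```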

  • The lower bound for comparison-based sorting is \(\Omega(n \log n)\). The proof uses decision trees (chapter 13): any comparison sort must distinguish all \(n!\) permutations, requiring at least \(\log_2(n!) = \Omega(n \log n)\) comparisons. Counting sort and radix sort beat this by not comparing elements.
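
  • As a numerical sanity check of the bound (standard library only), \(\log_2(n!)\) tracks \(n \log_2 n\):

```python
import math

# log2(n!) is the information-theoretic number of comparisons needed
# to distinguish all n! orderings; it grows like n * log2(n).
for n in (10, 100, 1000):
    exact = math.log2(math.factorial(n))
    bound = n * math.log2(n)
    print(f"n={n}: log2(n!) ~ {exact:.0f}, n*log2(n) = {bound:.0f}")
```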

Merge Sort

  • Divide the array in half, recursively sort each half, then merge the sorted halves. \(O(n \log n)\) always, \(O(n)\) extra space.
def merge_sort(arr):
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= for stability
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
  • Pitfall: using < instead of <= in the merge breaks stability (equal elements from the right half would come before those from the left).

Quick Sort

  • Pick a pivot, partition elements into "less than pivot" and "greater than pivot," recursively sort each partition. \(O(n \log n)\) average, \(O(n^2)\) worst case (when the pivot is always the smallest or largest element).
def quicksort(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return

    pivot_idx = partition(arr, lo, hi)
    quicksort(arr, lo, pivot_idx - 1)
    quicksort(arr, pivot_idx + 1, hi)

def partition(arr, lo, hi):
    pivot = arr[hi]  # Lomuto: pivot is last element
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i
  • Pivot strategies: last element (simple, bad for sorted input), random (expected \(O(n \log n)\)), median-of-three (practical choice). Always prefer random pivot in interviews to avoid worst-case discussion.

  • Pitfall: quicksort's \(O(n^2)\) worst case occurs on already-sorted arrays with first/last pivot. In practice, random pivot or median-of-three eliminates this.
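
  • A minimal sketch of the random-pivot variant, reusing the Lomuto partition from above:

```python
import random

def quicksort_random(arr, lo=0, hi=None):
    """In-place quicksort with a random pivot (expected O(n log n))."""
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return
    # Swap a random element into the last slot, then Lomuto-partition.
    r = random.randint(lo, hi)
    arr[r], arr[hi] = arr[hi], arr[r]
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    quicksort_random(arr, lo, i - 1)
    quicksort_random(arr, i + 1, hi)
```

Already-sorted input is no longer a worst case, since no fixed input forces bad pivots on every level.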

Counting Sort

  • When values are integers in a known range \([0, k)\), count occurrences and reconstruct: \(O(n + k)\) time. Not comparison-based, so it can beat \(O(n \log n)\).
def counting_sort(arr, k):
    count = [0] * k
    for x in arr:
        count[x] += 1
    result = []
    for val in range(k):
        result.extend([val] * count[val])
    return result
  • When to use: the range \(k\) is not much larger than \(n\). If \(k = O(n)\), this is \(O(n)\). If \(k \gg n\) (e.g., sorting 10 numbers in range \([0, 10^9]\)), counting sort wastes memory.
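
  • Radix sort (from the table above) applies the counting idea one digit at a time; a sketch for non-negative base-10 integers:

```python
def radix_sort(arr, base=10):
    """LSD radix sort for non-negative integers: O(d * (n + base))."""
    if not arr:
        return []
    result = list(arr)
    exp = 1
    while max(result) // exp > 0:
        # Stable bucket pass on the current digit -- stability is what
        # lets earlier (less significant) passes survive later ones.
        buckets = [[] for _ in range(base)]
        for x in result:
            buckets[(x // exp) % base].append(x)
        result = [x for bucket in buckets for x in bucket]
        exp *= base
    return result
```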

Pattern: Binary Search

  • Binary search finds a target in a sorted array in \(O(\log n)\) by repeatedly halving the search space. But binary search is much more than "find a number in a sorted array." The general pattern is: search on a monotonic condition.

  • Template (the one that avoids off-by-one errors):

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1

    while lo <= hi:
        mid = lo + (hi - lo) // 2  # avoids overflow in other languages
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1

    return -1  # not found
  • Lower bound (first element \(\geq\) target):
def lower_bound(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
  • Pitfall: the difference between lo <= hi and lo < hi, and between hi = mid and hi = mid - 1, determines whether you find the exact match or the boundary. Draw it out with a 2-element array to verify.
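
  • Python's bisect module ships these boundary searches (bisect_left is exactly the lower bound); cross-checking a hand-rolled version against it is a cheap way to validate the loop conditions:

```python
import bisect

def lower_bound(arr, target):
    """First index whose element is >= target (hi is exclusive here)."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

arr = [1, 2, 2, 2, 5, 7]
for t in range(0, 9):
    assert lower_bound(arr, t) == bisect.bisect_left(arr, t)
```
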
Easy: Binary Search

  • The standard problem. Use the template above.

Medium: Search in Rotated Sorted Array

  • Problem: a sorted array is rotated at some pivot. Find a target.

  • Pattern: at each step, one half is always sorted. Determine which half is sorted, and check if the target lies in that half.

def search_rotated(nums, target):
    lo, hi = 0, len(nums) - 1

    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid

        # left half is sorted
        if nums[lo] <= nums[mid]:
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        # right half is sorted
        else:
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1

    return -1
  • Pitfall: the <= in nums[lo] <= nums[mid] (not <) is critical. When lo == mid (2 elements left), we must correctly identify the sorted half.

Hard: Median of Two Sorted Arrays

  • Problem: find the median of two sorted arrays in \(O(\log(m + n))\).

  • Pattern: binary search on the partition point of the smaller array. The partition divides both arrays such that all elements on the left are smaller than all elements on the right.

def find_median(nums1, nums2):
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1  # ensure nums1 is shorter

    m, n = len(nums1), len(nums2)
    lo, hi = 0, m
    half = (m + n + 1) // 2

    while lo <= hi:
        i = (lo + hi) // 2          # partition point in nums1
        j = half - i                 # partition point in nums2

        left1 = nums1[i - 1] if i > 0 else float('-inf')
        right1 = nums1[i] if i < m else float('inf')
        left2 = nums2[j - 1] if j > 0 else float('-inf')
        right2 = nums2[j] if j < n else float('inf')

        if left1 <= right2 and left2 <= right1:
            # correct partition
            if (m + n) % 2 == 1:
                return max(left1, left2)
            return (max(left1, left2) + min(right1, right2)) / 2
        elif left1 > right2:
            hi = i - 1
        else:
            lo = i + 1
  • This is one of the hardest binary search problems. The key insight is that you are not searching for a value but for a partition point that satisfies a condition.

Meta-Pattern: Binary Search on Answer

  • Many problems that do not look like binary search can be solved by binary searching on the answer. If the answer is a number, and you can write a feasibility check is_feasible(x) that is monotonic in \(x\) (False below some threshold and True for every \(x\) at or above it, or vice versa), then binary search over \(x\) to find that threshold.

  • Example: "what is the minimum capacity of a ship to deliver all packages within \(d\) days?" Binary search on the capacity. For each candidate capacity, greedily check if all packages can be delivered in \(d\) days.

def ship_within_days(weights, days):
    lo, hi = max(weights), sum(weights)

    while lo < hi:
        mid = (lo + hi) // 2
        # can we ship with capacity mid in <= days?
        current_load, num_days = 0, 1
        for w in weights:
            if current_load + w > mid:
                num_days += 1
                current_load = 0
            current_load += w

        if num_days <= days:
            hi = mid
        else:
            lo = mid + 1

    return lo

Pattern: Greedy Algorithms

  • A greedy algorithm makes the locally optimal choice at each step, hoping this leads to the globally optimal solution. Greedy works when the problem has greedy choice property (local optimum leads to global optimum) and optimal substructure (optimal solution contains optimal solutions to subproblems).
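
  • A textbook instance with both properties is interval scheduling: to pick the maximum number of non-overlapping intervals, sort by end time and greedily take each interval that starts at or after the last chosen end. A sketch:

```python
def max_non_overlapping(intervals):
    """Greedy interval scheduling: sort by end time, take what fits."""
    count, last_end = 0, float('-inf')
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:  # does not overlap the last chosen interval
            count += 1
            last_end = end
    return count
```

Finishing earliest leaves the most room for the rest, which is the greedy choice property in action.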

Medium: Jump Game

  • Problem: given an array where nums[i] is the max jump length at position \(i\), determine if you can reach the last index.
def can_jump(nums):
    max_reach = 0
    for i, jump in enumerate(nums):
        if i > max_reach:
            return False  # cannot reach this position
        max_reach = max(max_reach, i + jump)
    return True
  • Why greedy works: we only need to know the farthest reachable position. If the current position is beyond the farthest reach, we are stuck. Otherwise, we update the farthest reach.

Medium: Merge Intervals

  • Problem: merge overlapping intervals.
def merge_intervals(intervals):
    if not intervals:
        return []  # guard: indexing below assumes at least one interval
    intervals.sort(key=lambda x: x[0])
    merged = [intervals[0]]

    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    return merged
  • Pattern: sort by start time, then merge greedily. If the current interval overlaps with the last merged one, extend it. Otherwise, start a new merged interval.

  • Pitfall: using merged[-1][1] = end instead of merged[-1][1] = max(merged[-1][1], end). An interval can be fully contained within another (e.g., [1, 10] and [2, 5]).


Pattern: Dynamic Programming

  • Dynamic programming (DP) solves problems by breaking them into overlapping subproblems, solving each subproblem once, and storing the results. It works when a problem has optimal substructure and overlapping subproblems.

  • Two approaches:

    • Top-down (memoisation): write the natural recursive solution, then cache results in a dictionary.
    • Bottom-up (tabulation): build a table from the smallest subproblems up.
  • How to identify DP: the problem asks for an optimum (min/max), counting, or existence, and the current decision depends on previous decisions. If you draw the recursion tree and see repeated subproblems, it is DP.
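
  • The two approaches side by side, on Fibonacci purely as a stand-in recurrence:

```python
from functools import lru_cache

# Top-down: the natural recursion, with results cached automatically.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up: fill a table from the smallest subproblems upward.
def fib_table(n):
    if n < 2:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```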

Easy: Climbing Stairs

  • Problem: \(n\) steps, you can climb 1 or 2 at a time. How many distinct ways?

  • This is Fibonacci: \(f(n) = f(n-1) + f(n-2)\).

def climb_stairs(n):
    if n <= 2:
        return n
    a, b = 1, 2
    for _ in range(3, n + 1):
        a, b = b, a + b
    return b
  • \(O(n)\) time, \(O(1)\) space. The full memoisation table is not needed since each state depends only on the previous two.

Medium: Coin Change

  • Problem: given coin denominations and a target amount, find the minimum number of coins needed.

  • State: dp[amount] = minimum coins to make amount.

  • Transition: dp[amount] = min(dp[amount - coin] + 1) for each coin.
  • Base case: dp[0] = 0.
def coin_change(coins, amount):
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0

    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and dp[a - coin] + 1 < dp[a]:
                dp[a] = dp[a - coin] + 1

    return dp[amount] if dp[amount] != float('inf') else -1
  • Pitfall: initialise the table with float('inf'), not 0 or -1. The minimum comparison only works if unreachable states are infinity.

Medium: Longest Common Subsequence

  • Problem: given two strings, find the length of their longest common subsequence.

  • State: dp[i][j] = LCS of text1[:i] and text2[:j].

  • Transition: if text1[i-1] == text2[j-1], then dp[i][j] = dp[i-1][j-1] + 1. Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
def longest_common_subsequence(text1, text2):
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]
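
  • The same table also yields a subsequence itself, not just its length: walk back from dp[m][n], collecting a character on each diagonal move. A sketch:

```python
def lcs_string(text1, text2):
    """Build the LCS length table, then backtrack to recover one LCS."""
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if text1[i - 1] == text2[j - 1]:
            out.append(text1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(out))
```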

Hard: 0/1 Knapsack

  • Problem: given items with weights and values, and a capacity \(W\), maximise the total value without exceeding \(W\).

  • State: dp[i][w] = max value using first \(i\) items with capacity \(w\).

  • Transition: dp[i][w] = max(dp[i-1][w], dp[i-1][w - weight[i]] + value[i]) (skip or take item \(i\)).
def knapsack(weights, values, capacity):
    n = len(weights)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]  # skip item i
            if weights[i - 1] <= w:
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])

    return dp[n][capacity]
  • Space optimisation: since each row only depends on the previous row, use a 1D array and iterate \(w\) from right to left:
def knapsack_optimised(weights, values, capacity):
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        for w in range(capacity, weights[i] - 1, -1):  # right to left!
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]
  • Pitfall: iterating left to right in the 1D version allows using item \(i\) multiple times (unbounded knapsack). Right to left ensures each item is used at most once.
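
  • That "bug" is a correct algorithm for a different problem: a left-to-right pass solves the unbounded knapsack, where each item may be taken any number of times. For contrast:

```python
def knapsack_unbounded(weights, values, capacity):
    """Unbounded knapsack: the left-to-right pass lets items repeat."""
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        for w in range(weights[i], capacity + 1):  # left to right: reuse OK
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]
```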

Pattern: Backtracking

  • Backtracking is exhaustive search with pruning. Build a solution incrementally, and abandon (backtrack) as soon as the partial solution cannot possibly lead to a valid complete solution.

  • Template:

def backtrack(candidates, path, result):
    if is_solution(path):
        result.append(path[:])  # copy!
        return

    for candidate in get_candidates(path):
        if is_valid(candidate, path):
            path.append(candidate)     # choose
            backtrack(candidates, path, result)  # explore
            path.pop()                 # unchoose (backtrack)

Medium: Subsets

def subsets(nums):
    result = []
    def backtrack(start, path):
        result.append(path[:])
        for i in range(start, len(nums)):
            path.append(nums[i])
            backtrack(i + 1, path)
            path.pop()
    backtrack(0, [])
    return result

Medium: Combination Sum

  • Problem: find all unique combinations that sum to a target (elements can be reused).
def combination_sum(candidates, target):
    result = []
    def backtrack(start, path, remaining):
        if remaining == 0:
            result.append(path[:])
            return
        for i in range(start, len(candidates)):
            if candidates[i] > remaining:
                break  # prune: sorted, so all further candidates are too large
            path.append(candidates[i])
            backtrack(i, path, remaining - candidates[i])  # i, not i+1: reuse allowed
            path.pop()

    candidates.sort()  # sort for pruning
    backtrack(0, [], target)
    return result
  • Pitfall: backtrack(i, ...) allows reuse of the same element. backtrack(i + 1, ...) would move to the next element (no reuse). Getting this wrong is the most common backtracking bug.

Hard: N-Queens

  • Problem: place \(n\) queens on an \(n \times n\) board such that no two attack each other.
def solve_n_queens(n):
    result = []
    cols = set()
    pos_diag = set()  # (row + col) is constant on / diagonals
    neg_diag = set()  # (row - col) is constant on \ diagonals

    board = [['.'] * n for _ in range(n)]

    def backtrack(row):
        if row == n:
            result.append([''.join(r) for r in board])
            return

        for col in range(n):
            if col in cols or (row + col) in pos_diag or (row - col) in neg_diag:
                continue

            cols.add(col)
            pos_diag.add(row + col)
            neg_diag.add(row - col)
            board[row][col] = 'Q'

            backtrack(row + 1)

            cols.remove(col)
            pos_diag.remove(row + col)
            neg_diag.remove(row - col)
            board[row][col] = '.'

    backtrack(0)
    return result
  • Key insight: the diagonal encoding. For / diagonals, row + col is constant. For \ diagonals, row - col is constant. Using sets for column and diagonal tracking makes the validity check \(O(1)\).
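
  • A counting-only variant of the same search (dropping the board) is easy to check against known values: n = 4 has 2 solutions and n = 8 has 92.

```python
def count_n_queens(n):
    """Count N-Queens placements using the same set-based O(1) checks."""
    cols, pos_diag, neg_diag = set(), set(), set()

    def backtrack(row):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            if col in cols or (row + col) in pos_diag or (row - col) in neg_diag:
                continue
            cols.add(col)
            pos_diag.add(row + col)
            neg_diag.add(row - col)
            total += backtrack(row + 1)
            cols.remove(col)
            pos_diag.remove(row + col)
            neg_diag.remove(row - col)
        return total

    return backtrack(0)
```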

Common Pitfalls Summary

| Pitfall | Example | Fix |
| --- | --- | --- |
| lo <= hi vs lo < hi in binary search | Off-by-one in bounds | Choose based on whether hi is inclusive or exclusive |
| Left-to-right 1D knapsack | Items used multiple times | Iterate right-to-left for 0/1 knapsack |
| Not copying path in backtracking | result.append(path) makes all entries point to the same list | result.append(path[:]) or path.copy() |
| backtrack(i) vs backtrack(i+1) | Reuse vs no-reuse of elements | Match the problem statement |
| Missing break in sorted backtracking | Exploring candidates that are too large | Sort, then break when a candidate exceeds the remaining target |
| Wrong DP initialisation | dp[0] wrong → all subsequent values wrong | Carefully define the base case |
| Greedy without proof | Greedy does not always work | Verify the greedy choice property |
| Unstable sort for multi-key sorting | Relative order of equal elements lost | Use a stable sort (merge sort, Python's sorted) |

Take-Home Problems (NeetCode)
