Induction and Loop Invariants
Motivation
Iterative algorithms execute a loop whose body changes the program state on every pass. To prove such an algorithm correct, we need a way to talk about “what is true at every point during the loop.” A loop invariant is exactly that: a logical condition that holds before each iteration begins. Combined with mathematical induction, loop invariants reduce algorithm correctness to checking three small claims, no matter how many iterations the loop runs (Kleinberg and Tardos 2005; Cormen et al. 2009).
Without invariants, correctness arguments degenerate into “trace through a few examples and convince yourself it works” — fine for confidence, useless for guarantees on every input.
Mathematical induction
Induction proves that a statement \(P(n)\) holds for every natural number \(n\) by establishing two things:
- Base case: \(P(0)\) (or \(P(1)\)) is true.
- Inductive step: for every \(n \ge 0\), \(P(n) \implies P(n+1)\).
By the well-ordering of \(\mathbb{N}\), these two facts force \(P\) to hold for all \(n\).
A common variant is strong induction: assume \(P(0), P(1), \dots, P(n)\) all hold and prove \(P(n+1)\). Strong induction is logically equivalent to ordinary induction but is often easier to apply when the inductive step needs more than just \(P(n)\) (e.g., divide-and-conquer recurrences that recurse on smaller-but-not-strictly-immediately-smaller subproblems).
Loop invariants
A loop invariant is a predicate \(I\) on the program state such that:
- Initialization. \(I\) holds just before the first iteration of the loop.
- Maintenance. If \(I\) holds at the start of an iteration, then \(I\) holds at the start of the next iteration.
- Termination. When the loop exits, \(I\) together with the negation of the loop condition implies the desired postcondition.
The first two parts are precisely the base case and inductive step of an induction over the iteration index, so a loop invariant is induction in disguise.
Worked example: insertion sort
For insertion sort, the canonical invariant is:
Before iteration \(j\) of the outer loop, the subarray \(A[1 \dots j-1]\) is a sorted permutation of the elements originally in those positions.
- Initialization: at \(j = 2\), the subarray \(A[1 \dots 1]\) is a single element and trivially sorted.
- Maintenance: the inner loop slides \(A[j-1], A[j-2], \dots\) rightward while they exceed
key, then dropskeyinto the gap. The result is a sorted \(A[1 \dots j]\). - Termination: the loop exits at \(j = n+1\), so the invariant says all of \(A[1 \dots n]\) is sorted — the desired postcondition.
Example: binary search
Binary search locates a target \(x\) in a sorted array \(A[1 \dots n]\) by halving the search region each step.
lo = 1; hi = n
while lo <= hi:
mid = floor((lo + hi) / 2)
if A[mid] == x: return mid
if A[mid] < x: lo = mid + 1
else: hi = mid - 1
return "not found"
Invariant. At the top of each iteration: if \(x\) appears in \(A\), then \(x \in A[\text{lo} \dots \text{hi}]\).
- Initialization: \(\text{lo} = 1\), \(\text{hi} = n\), so \(A[\text{lo} \dots \text{hi}] = A[1 \dots n]\). If \(x\) is in \(A\) at all, it is in this range.
- Maintenance: if \(A[\text{mid}] < x\), then all elements in \(A[1 \dots \text{mid}]\) are less than \(x\) (by sortedness), so \(x\) can only be in \(A[\text{mid}+1 \dots \text{hi}]\); setting \(\text{lo} = \text{mid}+1\) preserves the invariant. The symmetric argument applies when \(A[\text{mid}] > x\).
- Termination: the loop exits when \(\text{lo} > \text{hi}\), making \(A[\text{lo} \dots \text{hi}]\) empty. The invariant then says \(x\) is not in \(A\), so “not found” is correct.
The region \([\text{lo}, \text{hi}]\) shrinks by at least half each iteration, so the loop runs at most \(\lceil \log_2(n+1) \rceil\) times — establishing \(O(\log n)\) as a side effect of the correctness argument.
Choosing a good invariant
A loop invariant should be:
- Strong enough that, combined with the loop’s exit condition, it implies the postcondition.
- Weak enough that it actually holds at initialization and is preserved by each iteration.
These two pressures are in tension: too weak and termination tells you nothing; too strong and you cannot prove maintenance. Designing the invariant is often the hardest part of writing the proof, but once it is correct the maintenance step is usually mechanical.
Beyond loops: recursion
Recursive algorithms admit a similar treatment, but now the induction is on input size (or some structural measure). The invariant-style argument becomes:
- Base case: the algorithm is correct on the smallest inputs.
- Inductive step: assuming correctness on all strictly smaller inputs, the recursive case is correct.
This is the standard technique for mergesort, Karatsuba multiplication, and most other divide-and-conquer algorithms.