Insertion Sort
Motivation
Insertion sort is the algorithm most people invent on their own when asked to sort a hand of cards: walk through the input one element at a time and slide each new element back into the already-sorted prefix until it lands in its correct spot. It is a useful starting point for studying algorithms because its correctness is the textbook example of a loop invariant proof, and it is genuinely the fastest algorithm in some regimes — small arrays or arrays that are already nearly sorted (Kleinberg and Tardos 2005; Cormen et al. 2009).
Faster sorting algorithms (mergesort, heapsort, quicksort) are asymptotically better but have larger constants and more complicated control flow. Most production sort routines fall back to insertion sort once a recursive subproblem becomes small enough.
Problem
Given an array \(A[1 \dots n]\) of comparable elements, produce a permutation in non-decreasing order. Cost is measured in comparisons and swaps (or element shifts).
Key Ideas
The driving idea is to grow a sorted prefix one element at a time. After processing the first \(j-1\) elements, \(A[1 \dots j-1]\) is sorted; processing element \(j\) means finding its correct position within that prefix and shifting larger elements one slot right to make room.
Two properties fall out of this design:
- In-place. The sorted prefix and the unprocessed suffix share the same array; no auxiliary buffer is needed, only a single `key` variable.
- Adaptive. The inner shift loop stops as soon as it finds an element no greater than `key`. If the input is already nearly sorted, each iteration does \(O(1)\) work and the algorithm runs in \(\Theta(n + d)\) time, where \(d\) is the number of inversions.
The trade-off is that finding the insertion position is sequential. Even though a binary search could locate the slot in \(O(\log j)\) comparisons, the shifts themselves are \(\Theta(j)\) in the worst case — so the worst-case running time is \(\Theta(n^2)\) regardless of how cleverly we search.
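To make the search/shift distinction concrete, here is a sketch of the binary-search variant in Python (the function name is ours): `bisect` finds the slot in \(O(\log j)\) comparisons, but the slice assignment still performs the \(\Theta(j)\) worst-case shifts, so the overall bound does not improve.

```python
import bisect

def binary_insertion_sort(a):
    """Insertion sort that locates each slot by binary search.

    Comparisons drop to O(n log n) in total, but the element moves
    performed by the slice assignment are still Theta(j) per insertion
    in the worst case, so the running time remains Theta(n^2).
    """
    for j in range(1, len(a)):
        key = a[j]
        # O(log j) comparisons to find the insertion point.
        # bisect_right keeps the sort stable: key goes after equal elements.
        pos = bisect.bisect_right(a, key, 0, j)
        # Theta(j - pos) element moves to open the slot.
        a[pos + 1:j + 1] = a[pos:j]
        a[pos] = key
    return a
```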
Algorithm
```
for j = 2 to n:
    key = A[j]
    i = j - 1
    while i >= 1 and A[i] > key:
        A[i+1] = A[i]
        i = i - 1
    A[i+1] = key
```
At iteration \(j\), the prefix \(A[1 \dots j-1]\) is already sorted. We pick up \(A[j]\) as key, shift larger elements one position to the right, and drop key into the gap.
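The pseudocode translates directly to a 0-indexed language. A minimal Python sketch (the only change from the pseudocode is shifting the indices down by one):

```python
def insertion_sort(a):
    """In-place insertion sort; 0-indexed translation of the pseudocode."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix that exceed key one slot right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key  # Drop key into the gap.
    return a
```

The list is sorted in place; it is returned as well only for convenience.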
Walkthrough
Sorting six elements
Input array \(A = [5, 2, 4, 6, 1, 3]\). Each row shows the array state at the start of outer iteration \(j\): the elements before position \(j\) form the already-sorted prefix, and \(A[j]\) is the next key about to be inserted.

| \(j\) | Array at start of iteration | key \(= A[j]\) |
|---|---|---|
| 2 | 5, 2, 4, 6, 1, 3 | 2 |
| 3 | 2, 5, 4, 6, 1, 3 | 4 |
| 4 | 2, 4, 5, 6, 1, 3 | 6 |
| 5 | 2, 4, 5, 6, 1, 3 | 1 |
| 6 | 1, 2, 4, 5, 6, 3 | 3 |
| done | 1, 2, 3, 4, 5, 6 | |
The total comparison count for this input is \(1 + 2 + 1 + 4 + 4 = 12\), well below the worst case \(\binom{6}{2} = 15\).
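The per-iteration tallies are easy to confirm mechanically. A sketch with an added counter (the `comps` variable is our instrumentation, counting each `A[i] > key` test):

```python
def insertion_sort_counting(a):
    """Insertion sort that also tallies element comparisons A[i] > key."""
    comps = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comps += 1          # one A[i] > key test
            if a[i] <= key:
                break           # test failed: key's slot is found
            a[i + 1] = a[i]     # test succeeded: shift right
            i -= 1
        a[i + 1] = key
    return a, comps

arr, comps = insertion_sort_counting([5, 2, 4, 6, 1, 3])
# comps == 12 for this input
```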
Correctness
The invariant maintained at the top of the outer loop is:
Invariant. Before iteration \(j\), the subarray \(A[1 \dots j-1]\) contains the original first \(j-1\) elements in sorted order.
The three steps of an induction proof using a loop invariant — initialization, maintenance, termination — establish correctness:
- Initialization (\(j = 2\)): the prefix \(A[1]\) is trivially sorted.
- Maintenance: the inner loop shifts every element of \(A[1 \dots j-1]\) that exceeds `key` one slot right, then writes `key` into the resulting hole. The new prefix \(A[1 \dots j]\) is therefore sorted.
- Termination (\(j = n+1\)): the invariant says \(A[1 \dots n]\) is sorted, which is the postcondition.
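The invariant can also be checked mechanically: assert it at the top of every outer iteration. The sketch below adds `assert` lines as instrumentation (they are ours, purely for illustration) to a 0-indexed version of the algorithm.

```python
def insertion_sort_checked(a):
    """Insertion sort with the loop invariant asserted each iteration."""
    original = list(a)
    for j in range(1, len(a)):
        # Invariant: a[0..j-1] holds the original first j elements, sorted.
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the invariant for j = n gives the postcondition.
    assert a == sorted(original)
    return a
```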
Complexity and Tradeoffs
Let \(t_j\) be the number of times the while-loop test runs at outer iteration \(j\). The total cost is \(\Theta(n + \sum_j t_j)\).
- Best case (already sorted): each \(t_j = 1\), total \(\Theta(n)\).
- Worst case (reverse sorted): each \(t_j = j\), total \(\sum_{j=2}^n j = \Theta(n^2)\).
- Average case (random permutation): on average the key moves past half of the sorted prefix, so each \(t_j \approx j/2 = \Theta(j)\), giving \(\Theta(n^2)\).
Insertion sort is in-place (\(O(1)\) extra memory) and stable (does not reorder equal elements). The \(\Theta(n^2)\) worst case rules it out for large inputs, but the small constants and adaptive best case make it competitive on short or nearly-sorted arrays — which is why it appears as the base case inside otherwise asymptotically faster sorts.
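Stability follows from the strict `>` in the inner-loop test: an element is never shifted past an equal one. A small check with (key, label) pairs (the labels and the key-extractor parameter are our illustration):

```python
def stable_insertion_sort(a, key=lambda x: x):
    """Stable insertion sort: the strict > keeps equal elements in order."""
    for j in range(1, len(a)):
        item = a[j]
        i = j - 1
        while i >= 0 and key(a[i]) > key(item):  # strict >, never >=
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = item
    return a

pairs = [(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
stable_insertion_sort(pairs, key=lambda p: p[0])
# Equal keys keep input order: (1,'b') before (1,'d'), (2,'a') before (2,'c').
```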
When to Use It
| Situation | Use insertion sort? |
|---|---|
| Input size \(n \lesssim 50\) | Yes — small constants beat \(O(n \log n)\) algorithms. |
| Nearly-sorted input (\(d\) inversions, \(d = O(n)\)) | Yes — runs in \(\Theta(n + d)\). |
| Base case of a hybrid sort (introsort, Timsort, std::sort) | Yes — standard production pattern. |
| Large random input | No — use mergesort, heapsort, or quicksort. |
| Stability required, \(n\) large | No — use a stable \(O(n \log n)\) sort. |
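The hybrid pattern from the table is straightforward to sketch: a recursive sort hands small subranges to insertion sort. Everything below is our illustration, not a production recipe; in particular the cutoff of 16 is an arbitrary choice (real libraries tune it empirically).

```python
CUTOFF = 16  # illustrative threshold; libraries tune this empirically

def insertion_sort_range(a, lo, hi):
    """Insertion-sort the inclusive range a[lo..hi] in place."""
    for j in range(lo + 1, hi + 1):
        key = a[j]
        i = j - 1
        while i >= lo and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that falls back to insertion sort on small ranges."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort_range(a, lo, hi)  # small base case: insertion sort wins
        return a
    # Lomuto partition around the last element.
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    hybrid_quicksort(a, lo, i - 1)
    hybrid_quicksort(a, i + 1, hi)
    return a
```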