Insertion Sort

Motivation

Insertion sort is the algorithm most people invent on their own when asked to sort a hand of cards: walk through the input one element at a time and slide each new element back into the already-sorted prefix until it lands in its correct spot. It is a useful starting point for studying algorithms because its correctness is the textbook example of a loop invariant proof, and it is genuinely the fastest algorithm in some regimes — small arrays or arrays that are already nearly sorted (Kleinberg and Tardos 2005; Cormen et al. 2009).

Faster sorting algorithms (mergesort, heapsort, quicksort) are asymptotically better but have larger constants and more complicated control flow. Most production sort routines fall back to insertion sort once a recursive subproblem becomes small enough.

Problem

Given an array \(A[1 \dots n]\) of comparable elements, produce a permutation of its elements in non-decreasing order. Cost is measured in comparisons and swaps (or element shifts).

Key Ideas

The driving idea is to grow a sorted prefix one element at a time. After processing the first \(j-1\) elements, \(A[1 \dots j-1]\) is sorted; processing element \(j\) means finding its correct position within that prefix and shifting larger elements one slot right to make room.

Two properties fall out of this design:

  • In-place. The sorted prefix and the unprocessed suffix share the same array — no auxiliary buffer is needed, only a single key variable.
  • Adaptive. The inner shift loop stops as soon as it finds an element no greater than key. If the input is already nearly sorted, each iteration does \(O(1)\) work and the algorithm runs in \(\Theta(n + d)\) time, where \(d\) is the number of inversions.

The trade-off is that finding the insertion position is sequential. Even though a binary search could locate the slot in \(O(\log j)\) comparisons, the shifts themselves are \(\Theta(j)\) in the worst case — so the worst-case running time is \(\Theta(n^2)\) regardless of how cleverly we search.
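To make the search/shift asymmetry concrete, here is a sketch of binary insertion sort using Python's standard bisect module (function name and structure are illustrative): the search takes \(O(\log j)\) comparisons, but the slice assignment still moves \(\Theta(j)\) elements in the worst case, so the overall bound stays \(\Theta(n^2)\).

```python
import bisect

def binary_insertion_sort(a):
    """Insertion sort that finds each slot by binary search.

    Fewer comparisons than linear scanning, but the shifts still
    dominate, so the worst case remains Theta(n^2)."""
    for j in range(1, len(a)):
        key = a[j]
        # O(log j) comparisons to find the insertion point in the sorted prefix.
        # bisect_right inserts after equal keys, which preserves stability.
        pos = bisect.bisect_right(a, key, 0, j)
        # Theta(j - pos) element moves to open the slot -- the real bottleneck.
        a[pos + 1:j + 1] = a[pos:j]
        a[pos] = key
    return a
```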

Algorithm

for j = 2 to n:
    key = A[j]
    i = j - 1
    while i >= 1 and A[i] > key:
        A[i+1] = A[i]
        i = i - 1
    A[i+1] = key

At iteration \(j\), the prefix \(A[1 \dots j-1]\) is already sorted. We pick up \(A[j]\) as key, shift larger elements one position to the right, and drop key into the gap.
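The pseudocode translates directly into Python; the only change in this sketch is 0-based indexing, so the outer loop starts at index 1 rather than 2 and the inner loop stops at index 0 rather than 1.

```python
def insertion_sort(a):
    """Sort list `a` in place (ascending, stable) and return it."""
    for j in range(1, len(a)):          # a[0:j] is the sorted prefix
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:    # shift larger elements one slot right
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                  # drop key into the gap
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # → [1, 2, 3, 4, 5, 6]
```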

Walkthrough

Sorting six elements

Input array \(A = [5, 2, 4, 6, 1, 3]\). Each row shows the array state at the start of the outer iteration \(j\): the first \(j-1\) positions hold the already-sorted prefix, and the key column shows the element about to be inserted.

iteration    array state      key
j = 2        5 2 4 6 1 3       2
j = 3        2 5 4 6 1 3       4
j = 4        2 4 5 6 1 3       6
j = 5        2 4 5 6 1 3       1
j = 6        1 2 4 5 6 3       3
done         1 2 3 4 5 6       —

The sorted prefix grows by one element per pass. Pass j = 5 is the most expensive: shifting 1 past four larger elements costs four comparisons and four shifts.

The total comparison count for this input is \(1 + 2 + 1 + 4 + 4 = 12\), well below the worst case \(\binom{6}{2} = 15\).
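The per-pass counts can be tallied mechanically. A small instrumentation sketch for the walkthrough input; the counting convention (one tick per evaluation of the test A[i] > key) is an assumption chosen to match the comparison counts quoted above.

```python
def insertion_sort_counting(a):
    """Return (sorted copy, per-pass comparison counts)."""
    a = list(a)
    counts = []
    for j in range(1, len(a)):
        key, i, c = a[j], j - 1, 0
        while i >= 0:
            c += 1                      # one evaluation of the test a[i] > key
            if a[i] <= key:
                break                   # found an element no greater than key
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
        counts.append(c)
    return a, counts

sorted_a, per_pass = insertion_sort_counting([5, 2, 4, 6, 1, 3])
print(per_pass, sum(per_pass))
```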

Correctness

The invariant maintained at the top of the outer loop is:

Invariant. Before iteration \(j\), the subarray \(A[1 \dots j-1]\) contains the original first \(j-1\) elements in sorted order.

The three steps of a loop-invariant proof — initialization, maintenance, termination — establish correctness:

  • Initialization (\(j = 2\)): the prefix \(A[1]\) is trivially sorted.
  • Maintenance: the inner loop shifts every element of \(A[1 \dots j-1]\) that exceeds key one slot right, then writes key into the resulting hole. The new prefix \(A[1 \dots j]\) is therefore sorted.
  • Termination (\(j = n+1\)): the invariant says \(A[1 \dots n]\) is sorted, which is the postcondition.
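The invariant can also be checked mechanically. A test-harness sketch (names are illustrative) that re-verifies the sorted-prefix property at the top of every outer iteration, over many random inputs:

```python
import random

def insertion_sort_checked(a):
    """Insertion sort that asserts the loop invariant before each pass:
    a[0:j] holds the original first j elements, in sorted order."""
    original = list(a)
    for j in range(1, len(a)):
        # Invariant: the prefix is the same multiset, already sorted.
        assert a[:j] == sorted(original[:j])
        key, i = a[j], j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

random.seed(0)
for _ in range(100):
    data = [random.randrange(50) for _ in range(random.randrange(20))]
    assert insertion_sort_checked(list(data)) == sorted(data)
```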

Complexity and Tradeoffs

Let \(t_j\) be the number of times the while-loop test runs at outer iteration \(j\). The total cost is \(\Theta(n + \sum_j t_j)\).

  • Best case (already sorted): each \(t_j = 1\), total \(\Theta(n)\).
  • Worst case (reverse sorted): each \(t_j = j\), total \(\sum_{j=2}^n j = \Theta(n^2)\).
  • Average case (random permutation): the new element lands, on average, near the middle of the sorted prefix, so about half of it is shifted — each \(t_j = \Theta(j)\), giving \(\Theta(n^2)\).

Insertion sort is in-place (\(O(1)\) extra memory) and stable (does not reorder equal elements). The \(\Theta(n^2)\) worst case rules it out for large inputs, but the small constants and adaptive best case make it competitive on short or nearly-sorted arrays — which is why it appears as the base case inside otherwise asymptotically faster sorts.
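These bounds are easy to observe empirically. A sketch that counts the while-test comparisons on sorted, reverse-sorted, and nearly-sorted inputs (the counting convention and the one-swap "nearly sorted" input are illustrative choices):

```python
def comparisons(a):
    """Count evaluations of the test a[i] > key made by insertion sort."""
    a, total = list(a), 0
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0:
            total += 1
            if a[i] <= key:
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return total

n = 100
nearly = list(range(n))
nearly[10], nearly[11] = nearly[11], nearly[10]   # one inversion, d = 1

print(comparisons(range(n)))         # best case, already sorted: n - 1 = 99
print(comparisons(range(n, 0, -1)))  # worst case, reverse sorted: n(n-1)/2 = 4950
print(comparisons(nearly))           # nearly sorted, Theta(n + d): 100
```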

When to Use It

  • Input size \(n \le 50\) or so: yes — small constants beat \(O(n \log n)\) algorithms.
  • Nearly-sorted input (\(d\) inversions, \(d = O(n)\)): yes — runs in \(\Theta(n + d)\).
  • Base case of a hybrid sort (introsort, Timsort, std::sort): yes — the standard production pattern.
  • Large random input: no — use mergesort, heapsort, or quicksort.
  • Stability required and \(n\) large: no — use a stable \(O(n \log n)\) sort.
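The hybrid pattern noted above can be sketched in a few lines: a merge sort (names and the cutoff value are illustrative, not taken from any particular library) that hands small subproblems to insertion sort, mirroring what introsort and Timsort do in production.

```python
CUTOFF = 16  # illustrative threshold; real libraries tune this empirically

def insertion_sort_range(a, lo, hi):
    """Sort the slice a[lo:hi] in place by insertion sort."""
    for j in range(lo + 1, hi):
        key, i = a[j], j - 1
        while i >= lo and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key

def hybrid_merge_sort(a, lo=0, hi=None):
    """Merge sort that falls back to insertion sort below CUTOFF."""
    if hi is None:
        hi = len(a)
    if hi - lo <= CUTOFF:
        insertion_sort_range(a, lo, hi)   # small subproblem: quadratic but cheap
        return a
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    # Standard stable merge of the two sorted halves.
    merged, i, k = [], lo, mid
    while i < mid and k < hi:
        if a[i] <= a[k]:
            merged.append(a[i]); i += 1
        else:
            merged.append(a[k]); k += 1
    merged += a[i:mid] + a[k:hi]
    a[lo:hi] = merged
    return a
```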

References

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. 3rd ed. MIT Press. https://mitpress.mit.edu/9780262533058/introduction-to-algorithms/.
Kleinberg, Jon, and Éva Tardos. 2005. Algorithm Design. Pearson/Addison-Wesley. https://www.pearson.com/en-us/subject-catalog/p/algorithm-design/P200000003259.