Mergesort
Motivation
Mergesort sorts an array of \(n\) elements in \(\Theta(n \log n)\) time — a meaningful improvement over insertion sort's \(\Theta(n^2)\) worst case on any non-trivial input. It was one of the first algorithms designed for a stored-program computer (von Neumann, 1945) and remains the standard example of divide and conquer: split the input into halves, recursively solve each, and combine the answers. The combine step (merging two sorted lists) is the entire algorithmic content; everything else is bookkeeping (Kleinberg and Tardos 2005; Cormen et al. 2009).
Beyond its asymptotics, mergesort is the foundation of virtually every external-memory sort (its sequential access pattern is friendly to disks and tapes) and the inspiration for Timsort, the standard sorting algorithm in Python and in Java's object sorts.
Problem
Input: an array \(A[1 \dots n]\) of comparable elements. Output: a permutation of \(A\) in non-decreasing order.
We measure cost by the number of comparisons and the total work performed (including the auxiliary memory used to hold merged output).
Examples
Merging two sorted halves
The combine step is the only place mergesort actually compares elements. Given two already-sorted lists, we walk a pointer along each, repeatedly emitting the smaller front element:
| step | \(L\) remaining | \(R\) remaining | compare | output |
|---|---|---|---|---|
| 0 | \([1, 4, 7]\) | \([2, 3, 8]\) | \(1\) vs \(2\) | \([\,]\) |
| 1 | \([4, 7]\) | \([2, 3, 8]\) | \(4\) vs \(2\) | \([1]\) |
| 2 | \([4, 7]\) | \([3, 8]\) | \(4\) vs \(3\) | \([1, 2]\) |
| 3 | \([4, 7]\) | \([8]\) | \(4\) vs \(8\) | \([1, 2, 3]\) |
| 4 | \([7]\) | \([8]\) | \(7\) vs \(8\) | \([1, 2, 3, 4]\) |
| 5 | \([\,]\) | \([8]\) | drain \(R\) | \([1, 2, 3, 4, 7]\) |
| 6 | \([\,]\) | \([\,]\) | done | \([1, 2, 3, 4, 7, 8]\) |
Six elements merged in five comparisons. Each comparison emits exactly one output element, so merging two lists of total size \(k\) uses at most \(k - 1\) comparisons and \(\Theta(k)\) work. This linear-time merge is what makes the whole algorithm fast.
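The table's procedure is easy to express directly. The sketch below (the name `merge_count` is ours, for illustration) merges two sorted lists and tallies comparisons:

```python
def merge_count(L, R):
    """Merge two sorted lists; return the result and the
    number of comparisons performed (illustrative sketch)."""
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(L) and j < len(R):
        comparisons += 1
        if L[i] <= R[j]:          # <= keeps the merge stable
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:])             # at most one of these drains
    out.extend(R[j:])
    return out, comparisons

merged, c = merge_count([1, 4, 7], [2, 3, 8])
# merged == [1, 2, 3, 4, 7, 8], c == 5 — matching the table above
```

The final drain costs no comparisons, which is why the count is \(k - 1\) at most, not \(k\).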
Already-sorted input is not faster
Standard mergesort always recurses to the bottom and always runs the merge step, even when both halves are already in order. On the input \([1, 2, 3, 4, 5, 6, 7, 8]\) the recursion tree, the number of merges, and the comparison count are essentially the same as on a random permutation. The Timsort variant exploits already-sorted runs; plain mergesort does not.
Key Ideas
The whole algorithm rests on one observation: two sorted lists can be combined into one sorted list in linear time. Given that, sorting becomes recursive:
- If the input has \(\le 1\) element, it is already sorted.
- Otherwise, split it in half, recursively sort each half, and merge the two sorted results.
The split is trivial — just compute a midpoint. The recursion is generic. The merge is where every comparison happens, and its linear cost is what holds the whole structure together.
The shape of the recursion is the same on every input: \(\log_2 n\) levels deep, with the work at each level summing to \(\Theta(n)\). There are \(n\) leaves (size-\(1\) subproblems), and at every internal level the merges between siblings cover the entire array exactly once. So the total work is \(\Theta(n \log n)\), regardless of how scrambled the input was.
Deriving the Solution
Let \(T(n)\) be the worst-case running time. The algorithm makes two recursive calls on inputs of size \(n/2\) and then merges in linear time:
\[ T(n) = 2 \, T(n/2) + \Theta(n), \qquad T(1) = \Theta(1). \]
By the master theorem with \(a = 2\), \(b = 2\), \(c = 1\) (so \(a = b^c\)), the solution is \(T(n) = \Theta(n \log n)\).
A direct argument gives the same answer. The recursion tree has \(\log_2 n\) levels. At level \(\ell\), the algorithm runs \(2^\ell\) merges, each on inputs of size \(n / 2^\ell\), for total work \(2^\ell \cdot \Theta(n / 2^\ell) = \Theta(n)\). Summing across \(\log_2 n\) levels gives \(\Theta(n \log n)\).
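Written out, with \(c\) the constant hidden in the merge's \(\Theta(n)\) term:

\[
T(n) \;=\; \sum_{\ell=0}^{\log_2 n - 1} 2^\ell \cdot c \, \frac{n}{2^\ell} \;+\; \Theta(n)
\;=\; \sum_{\ell=0}^{\log_2 n - 1} c\,n \;+\; \Theta(n)
\;=\; c\,n \log_2 n + \Theta(n) \;=\; \Theta(n \log n),
\]

where the trailing \(\Theta(n)\) covers the \(n\) constant-time base cases.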
Algorithm
```python
def mergesort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    L = mergesort(A[:mid])    # sort the first half
    R = mergesort(A[mid:])    # sort the second half
    return merge(L, R)

def merge(L, R):
    out = []
    i = j = 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:      # <= (not <) for stability
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:])         # exactly one of these is non-empty
    out.extend(R[j:])
    return out
```
The split step does no real work; the merge step is where every comparison happens. The condition `L[i] <= R[j]` (rather than `<`) is what makes mergesort stable — equal-keyed elements from the left half are emitted first, preserving their original order.
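Stability only matters when elements carry more than their key. The standalone sketch below (the key-extracting helper `merge_by_key` is illustrative, not part of the algorithm above) shows two records with equal keys keeping their left-before-right order:

```python
def merge_by_key(L, R, key):
    """Stable merge of two key-sorted lists; `key` extracts the
    comparison key, mirroring the <= tie-breaking rule."""
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if key(L[i]) <= key(R[j]):   # ties go to the left list
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:]); out.extend(R[j:])
    return out

# Two records share key 2; the left half's copy comes out first.
left  = [(1, "a"), (2, "left")]
right = [(2, "right"), (3, "b")]
print(merge_by_key(left, right, key=lambda rec: rec[0]))
# [(1, 'a'), (2, 'left'), (2, 'right'), (3, 'b')]
```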
Walkthrough
Input array \(A = [5, 2, 4, 7, 1, 3, 2, 6]\). The recursion splits down to singletons in three levels — \([5,2,4,7]\) and \([1,3,2,6]\), then \([5,2]\), \([4,7]\), \([1,3]\), \([2,6]\), then single elements — and merges back up, producing a sorted result at each node.
The merge at the root combines \([2,4,5,7]\) and \([1,2,3,6]\) in seven comparisons, taking the smaller front element each step. Note that level 2's pairs \([4,7]\), \([1,3]\), and \([2,6]\) are already sorted; the merge step still runs on them and simply emits the elements in order. The total work across one level is always \(\Theta(n)\), regardless of how the input was distributed among children.
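One way to check the walkthrough is to instrument the sort so every merge reports its inputs and output. The sketch below is a plain top-down mergesort with a `print` added (the `depth` parameter only controls indentation); the root merge of \([2,4,5,7]\) and \([1,2,3,6]\) prints last:

```python
def mergesort_traced(A, depth=0):
    # Mergesort with a print at each merge, to make the
    # recursion tree visible (didactic sketch).
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    L = mergesort_traced(A[:mid], depth + 1)
    R = mergesort_traced(A[mid:], depth + 1)
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:]); out.extend(R[j:])
    print("  " * depth + f"merge {L} + {R} -> {out}")
    return out

mergesort_traced([5, 2, 4, 7, 1, 3, 2, 6])
```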
Correctness
- Base case: an array of length \(\le 1\) is sorted.
- Inductive step: assume `mergesort` is correct on inputs of size \(< n\). Then `L` and `R` are sorted permutations of the original halves, and `merge` produces a sorted permutation of their union — the merge invariant (“`out` is sorted, every element of `L[0..i)` and `R[0..j)` has been emitted, and the smallest unemitted element is at position \(i\) or \(j\)”) establishes this.
The merge invariant itself is a small loop-invariant proof: it holds initially (`out` is empty, \(i = j = 0\)), is maintained by each iteration (the front of \(L\) or \(R\) is, by sortedness, \(\le\) every later element of either list, so emitting it preserves sortedness of `out`), and at termination implies `out` is exactly the sorted merge of \(L\) and \(R\).
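The invariant can also be checked mechanically by asserting it on every iteration. A didactic sketch (the assertions make the loop quadratic, so this is for checking the proof, not for production use):

```python
def merge_checked(L, R):
    """Merge with the loop invariant asserted each iteration."""
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        # Invariant: out is sorted and holds exactly the
        # elements of L[:i] and R[:j].
        assert out == sorted(out)
        assert sorted(out) == sorted(L[:i] + R[:j])
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    out.extend(L[i:]); out.extend(R[j:])
    assert out == sorted(L + R)   # invariant + termination
    return out
```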
Complexity and Tradeoffs
The recurrence \(T(n) = 2 T(n/2) + \Theta(n)\) with \(T(1) = \Theta(1)\) gives
\[ T(n) = \Theta(n \log n) \]
in the best, average, and worst cases — the recursion structure does not depend on the input order.
Auxiliary memory is \(\Theta(n)\): each merge writes its output into a fresh buffer of total size equal to the merged input. In-place merging is possible but adds substantial constant-factor overhead and complicates the code; standard implementations accept the linear extra memory.
Important tradeoffs:
- Memory: \(\Theta(n)\) extra space, versus \(O(1)\) for insertion sort, heapsort, and in-place quicksort.
- Worst case: \(\Theta(n \log n)\), beating quicksort’s \(\Theta(n^2)\) worst case.
- Already-sorted input: still \(\Theta(n \log n)\), unlike insertion sort’s \(\Theta(n)\) or Timsort’s run-aware variant.
- Sequential access: merges read each input list front-to-back, ideal for disks, tapes, and streaming inputs.
- Stability: preserved by the standard implementation; quicksort and heapsort are not stable.
When to Use It
Mergesort is the right default when the worst case matters or when stability is required. The broader pattern is divide and conquer with a linear-time combine: same recurrence, same \(\Theta(n \log n)\) bound.
| Situation | Typical algorithm |
|---|---|
| Small array (\(n \lesssim 50\)) | Insertion sort — small constants beat \(O(n \log n)\) |
| Nearly-sorted input | Insertion sort or Timsort — exploit existing order |
| In-memory, no stability needed | Quicksort — typically faster constants, but \(\Theta(n^2)\) worst case |
| Stability required | Mergesort or Timsort |
| Worst-case bound required | Mergesort or heapsort — heapsort uses \(O(1)\) extra space |
| Data exceeds RAM (external sort) | External mergesort — sequential I/O is the deciding factor |
| Linked-list input | Mergesort — no random access needed, can run in-place on links |
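For linked lists the merge can splice nodes instead of copying into a buffer, which is why the table lists mergesort as the natural linked-list sort. A minimal sketch (the `Node` class and the slow/fast-pointer split are conventional choices, not prescribed by the text):

```python
class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

def merge_lists(a, b):
    """Splice two sorted linked lists together, reusing the
    nodes — no auxiliary array needed."""
    dummy = tail = Node(None)
    while a and b:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b            # attach whichever list remains
    return dummy.next

def mergesort_list(head):
    if head is None or head.next is None:
        return head
    # Find the midpoint with slow/fast pointers, then cut.
    slow, fast = head, head.next
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    mid, slow.next = slow.next, None
    return merge_lists(mergesort_list(head), mergesort_list(mid))
```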
Properties
- Stable: mergesort preserves the relative order of equal-keyed elements (provided `merge` uses `L[i] <= R[j]`, never `<`).
- Comparison-based: makes only \(\Theta(n \log n)\) comparisons, matching the information-theoretic lower bound for comparison sorts.
- Not in-place: the standard implementation needs \(\Theta(n)\) auxiliary memory for the merge buffer. In-place merge is possible but complicated and slower in practice.
- External-friendly: sequential reads and writes, ideal for disk/tape sorting.
Variants
- Bottom-up mergesort: avoid recursion by iteratively merging size-\(1\) runs into size-\(2\), size-\(2\) into size-\(4\), and so on. Same complexity, smaller constant factors.
- Timsort: detect already-sorted runs in the input and merge them, achieving \(\Theta(n)\) on nearly-sorted inputs while remaining \(\Theta(n \log n)\) in the worst case.
- External mergesort: when \(n\) exceeds memory, use \(k\)-way merging with a priority queue across disk-resident runs.
- Parallel mergesort: the divide phase is embarrassingly parallel; the merge step admits a logarithmic-depth parallel algorithm.
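The bottom-up variant can be sketched without recursion: repeated passes merge runs of doubling width (an illustrative implementation with an out-of-place merge per run pair):

```python
def mergesort_bottom_up(A):
    """Iterative mergesort: merge runs of width 1, 2, 4, ...
    Same Theta(n log n) bound as the recursive version."""
    A = list(A)
    width = 1
    while width < len(A):
        for lo in range(0, len(A), 2 * width):
            mid = min(lo + width, len(A))
            hi = min(lo + 2 * width, len(A))
            L, R = A[lo:mid], A[mid:hi]
            out, i, j = [], 0, 0
            while i < len(L) and j < len(R):
                if L[i] <= R[j]:
                    out.append(L[i]); i += 1
                else:
                    out.append(R[j]); j += 1
            A[lo:hi] = out + L[i:] + R[j:]   # write the run back
        width *= 2
    return A
```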