Greedy Exchange Argument
Motivation
A greedy algorithm commits to one local choice at a time without revisiting it. To prove a greedy algorithm is optimal, we have to rule out the possibility that some clever non-greedy solution does better. The exchange argument does this by transforming any optimal solution, step by step, into the greedy one, showing that each transformation does not make the solution worse. If we can reach the greedy solution from the optimum without losing anything, the greedy must already be optimal (Kleinberg and Tardos 2005).
This is the second of the two standard greedy proof patterns. The other, greedy stays ahead, tracks a scalar progress measure across both solutions. Exchange is the right tool when the natural progress measure is missing or non-monotone — when the greedy and the optimum disagree on the order of items, on which items to include, or on how to assign resources.
The template
Let \(G\) be the greedy solution and \(O^*\) any optimal solution. The argument proceeds:
- Identify a difference. If \(O^* \ne G\), locate a structural disagreement: an inverted pair, a swapped element, a different choice at the first divergence.
- Define a local exchange. Specify a transformation \(O^* \to O'\) that resolves one disagreement, moving \(O'\) “closer” to \(G\) by some well-founded measure (number of inversions, position of first divergence, symmetric difference size).
- Prove the exchange is safe. Show \(\text{cost}(O') \le \text{cost}(O^*)\) (or value is non-decreasing).
- Iterate. Each exchange strictly decreases the discrepancy measure, so finitely many exchanges yield the greedy solution \(G\) with cost no worse than \(O^*\). Therefore \(G\) is optimal.
The art is choosing the exchange. It must be both safe (it does not worsen the solution) and productive (it moves strictly closer to \(G\)). A swap that leaves the solution structurally unchanged is useless; a swap that improves the cost too much is suspect, since it would mean \(O^*\) was not optimal in the first place, which usually signals an error in the cost comparison.
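To make the loop concrete, here is a minimal, hypothetical skeleton of the iterate step in Python. The names `find_disagreement`, `exchange`, `cost`, and `measure` are placeholders that a specific proof (or a testing harness for one) would supply; the assertions encode the safety and productivity obligations.

```python
def exchange_to_greedy(opt, greedy, find_disagreement, exchange, cost, measure):
    """Transform an optimal solution into the greedy one by safe local exchanges.

    measure(sol, greedy) is a well-founded discrepancy (0 iff sol equals greedy);
    the two assertions are exactly the proof obligations of the template.
    """
    current = opt
    while measure(current, greedy) > 0:
        spot = find_disagreement(current, greedy)   # step 1: locate a difference
        patched = exchange(current, spot)           # step 2: apply the local exchange
        assert cost(patched) <= cost(current)       # step 3: the exchange is safe
        assert measure(patched, greedy) < measure(current, greedy)  # step 4: productive
        current = patched
    return current  # equals greedy, and the cost never increased along the way
```

A concrete instantiation of each placeholder appears in the lateness example that follows.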
Canonical example: minimizing maximum lateness
Given \(n\) jobs with processing times \(t_i\) and deadlines \(d_i\), schedule them back to back on one machine, starting at time \(0\) with no idle time, to minimize the maximum lateness \(L = \max_i \max(0, f_i - d_i)\), where \(f_i\) is the finish time of job \(i\).
Greedy choice: earliest deadline first (EDF). The exchange argument:
- An inversion is a pair of jobs scheduled in the wrong deadline order (\(d_i > d_j\) with \(i\) before \(j\)); if a schedule has any inversion, it has one between adjacent jobs.
- EDF has no inversions.
- Any two inversion-free schedules with no idle time differ only in the order of jobs with equal deadlines, so they all have the same maximum lateness as EDF.
- Swapping an adjacent inverted pair never increases \(L\): only the two swapped jobs change finish times. The earlier-deadline job moves earlier, so its lateness cannot increase; the later-deadline job now finishes exactly where the pair ended before, and because its deadline is later, its new lateness is at most the old lateness of the earlier-deadline job.
Starting from any optimal schedule (which we may assume has no idle time) and repeatedly swapping adjacent inversions, we reach an inversion-free schedule without ever increasing \(L\), and any inversion-free schedule has the same maximum lateness as EDF. So EDF is optimal.
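A small simulation can check this mechanically. The sketch below (function names are illustrative, and the job data is the three jobs from the worked example that follows) computes the maximum lateness of a schedule, then repeatedly swaps adjacent inversions, asserting that \(L\) never increases until the EDF order is reached.

```python
def max_lateness(jobs):
    """jobs: list of (processing_time, deadline); run back-to-back from time 0."""
    finish, worst = 0, 0
    for t, d in jobs:
        finish += t
        worst = max(worst, finish - d)  # lateness is max(0, f - d); worst stays >= 0
    return worst

def swap_inversions_to_edf(jobs):
    """Repeatedly swap adjacent inverted pairs; L must never increase."""
    jobs = list(jobs)
    L = max_lateness(jobs)
    changed = True
    while changed:
        changed = False
        for i in range(len(jobs) - 1):
            if jobs[i][1] > jobs[i + 1][1]:           # adjacent inversion: d_i > d_j
                jobs[i], jobs[i + 1] = jobs[i + 1], jobs[i]
                new_L = max_lateness(jobs)
                assert new_L <= L                      # the exchange is safe
                L = new_L
                changed = True
    return jobs, L

schedule = [(2, 5), (3, 3), (1, 7)]                    # jobs from the example below
print(swap_inversions_to_edf(schedule))                # -> ([(3, 3), (2, 5), (1, 7)], 0)
```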
Example: swapping one inversion in a lateness schedule
Three jobs: \(J_1 = (t=2, d=5)\), \(J_2 = (t=3, d=3)\), \(J_3 = (t=1, d=7)\).
EDF order (sorted by deadline): \(J_2, J_1, J_3\).
| Order | Finish times | Latenesses | \(L\) |
|---|---|---|---|
| \(J_2, J_1, J_3\) (EDF) | \(f_2=3,\ f_1=5,\ f_3=6\) | \(0,\ 0,\ 0\) | 0 |
| \(J_1, J_2, J_3\) (one inversion) | \(f_1=2,\ f_2=5,\ f_3=6\) | \(0,\ 2,\ 0\) | 2 |
The second schedule has the inversion \((J_1, J_2)\): \(J_1\) runs first but \(d_1=5 > d_2=3\). The exchange argument says we can swap them without worsening \(L\). Let the start of the pair be \(s=0\), \(t_1=2\), \(t_2=3\).
Before swap: \(f_1=2\), \(f_2=5\). Latenesses: \(\max(0,2-5)=0\) and \(\max(0,5-3)=2\). Pair max = \(2\).
After swap (\(J_2\) first): \(f_2=3\), \(f_1=5\). Latenesses: \(\max(0,3-3)=0\) and \(\max(0,5-5)=0\). Pair max = \(0\).
The swap resolved the inversion and dropped the pair’s contribution to \(L\) from \(2\) to \(0\). Moving \(J_2\) earlier can only reduce its lateness, and \(J_1\), which now finishes last, ends at \(s+t_1+t_2\), exactly where \(J_2\) finished before the swap; since \(d_1 > d_2\), its lateness is at most \(J_2\)’s old lateness. So the maximum lateness of the pair can only decrease or stay the same.
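A few lines of arithmetic reproduce the numbers in the table and the before/after comparison; the tuple layout and helper name here are just for illustration.

```python
def finishes_and_latenesses(jobs):
    """jobs: list of (name, t, d), run back-to-back from time 0."""
    f, rows = 0, []
    for name, t, d in jobs:
        f += t
        rows.append((name, f, max(0, f - d)))
    return rows

J1, J2, J3 = ("J1", 2, 5), ("J2", 3, 3), ("J3", 1, 7)
print(finishes_and_latenesses([J2, J1, J3]))  # [('J2', 3, 0), ('J1', 5, 0), ('J3', 6, 0)] -> L = 0
print(finishes_and_latenesses([J1, J2, J3]))  # [('J1', 2, 0), ('J2', 5, 2), ('J3', 6, 0)] -> L = 2
```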
Other examples
- Optimal offline caching with farthest-in-future (FIF) eviction: exchange one eviction decision at a time, replacing the optimum’s eviction with FIF’s. A case analysis shows the patched schedule has no more misses.
- Huffman coding: the optimum can be modified so the two least-frequent symbols are siblings at the deepest level — exactly what the greedy merge does first.
- Kruskal’s MST: the cut property is an exchange. Add the lightest edge crossing a cut to a spanning tree that omits it; the cycle this closes crosses the cut a second time, and removing that other crossing edge, which is no lighter, gives a spanning tree of no greater weight (sketched below).
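A tiny numeric check of that cut-property exchange, on a made-up 4-vertex graph (the graph, cut, and helper name are illustrative only): add the lightest cut-crossing edge to a spanning tree that omits it, find the cycle this closes, and remove another crossing edge on that cycle; the total weight cannot go up.

```python
from collections import defaultdict

def tree_path(edges, src, dst):
    """Return the list of tree edges on the unique path src -> dst (DFS)."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, (u, v, w)))
        adj[v].append((u, (u, v, w)))
    stack, seen = [(src, [])], {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt, edge in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [edge]))
    raise ValueError("dst not reachable")

# Hypothetical instance: cut S = {0, 1}; lightest crossing edge is e = (1, 2, 2).
S = {0, 1}
tree = [(0, 1, 1), (2, 3, 1), (0, 3, 5)]   # spanning tree using the heavier crossing edge
e = (1, 2, 2)

# Adding e closes a cycle: e plus the tree path between its endpoints.
cycle_path = tree_path(tree, e[0], e[1])
# The cycle crosses the cut a second time; pick such an edge f (here (0, 3, 5)).
f = next((u, v, w) for u, v, w in cycle_path if (u in S) != (v in S))
swapped = [edge for edge in tree if edge != f] + [e]

old_w, new_w = sum(w for *_, w in tree), sum(w for *_, w in swapped)
print(f, old_w, new_w)                      # (0, 3, 5) 7 4: the weight never increases
assert e[2] <= f[2] and new_w <= old_w
```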
Stays ahead vs. exchange
| Pattern | Best when… | Mechanism | Example |
|---|---|---|---|
| Stays ahead | objective counts items or distance; greedy sorts by a scalar key | inductive comparison of \(r\)-th element | interval scheduling, Dijkstra |
| Exchange | objective is min-max, total cost over a permutation, or set selection | local swap or substitution into the optimum | minimizing maximum lateness, optimal caching |
For some problems the two patterns are interchangeable (interval scheduling has proofs of both flavors), but each fits its canonical examples more naturally. When in doubt, try stays-ahead first, since it is usually shorter when it works, and fall back to exchange when no scalar progress measure is available.
Common pitfalls
- The exchange is not productive. If the swap leaves a different inversion in place, the discrepancy measure does not decrease, and the iteration may not terminate. Always identify the discrepancy measure explicitly.
- The exchange is not safe. Showing one job’s lateness decreases is not enough; the maximum must not increase. Bound the new maximum carefully.
- Optimality is conflated with uniqueness. Exchange shows the greedy is no worse than the optimum, not that the optimum equals the greedy. Multiple optima can coexist.
- Adjacency matters. Many exchange arguments work only for adjacent swaps; non-adjacent swaps change too many finish times at once (see the sketch below). The “every inversion implies an adjacent inversion” lemma is what reduces the general case to the local one.
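To see the last pitfall concretely, here is a small made-up instance where swapping a non-adjacent inverted pair does increase \(L\): the long job that moves forward pushes the middle job past its deadline, while the route through adjacent swaps never makes things worse.

```python
def max_lateness(jobs):
    """jobs: list of (name, t, d), run back-to-back from time 0."""
    finish, worst = 0, 0
    for _, t, d in jobs:
        finish += t
        worst = max(worst, finish - d)
    return worst

A, M, B = ("A", 1, 100), ("M", 1, 1), ("B", 10, 11)  # hypothetical jobs
start = [A, M, B]            # A before B is an inversion (d_A = 100 > d_B = 11), but not adjacent
print(max_lateness(start))           # 1
print(max_lateness([B, M, A]))       # 10: the non-adjacent swap pushes M past its deadline
print(max_lateness([M, A, B]))       # 1:  adjacent swap of (A, M) is safe
print(max_lateness([M, B, A]))       # 0:  then (A, B) is adjacent and the safe swap reaches EDF
```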