Optimal Caching
Motivation
A cache is a small, fast memory that holds a subset of items drawn from a larger, slower memory. Given a sequence of accesses, every miss (the requested item is not in the cache) requires bringing the requested item in, evicting some currently cached item if the cache is full. Which item should we evict?
In real systems we cannot see the future, so we use heuristics like LRU (least recently used). But the theoretically optimal policy — the one that minimizes total misses given the entire access sequence in advance — has a strikingly simple form: evict the item whose next access is farthest in the future (Belady 1966; Kleinberg and Tardos 2005).
This Belady / farthest-in-future rule is interesting for two reasons. First, its correctness proof is the prototype of an exchange argument for greedy algorithms. Second, it gives a yardstick: the worst-case ratio between an online policy’s misses and Belady’s is the policy’s “competitive ratio,” the fundamental measure of online cache performance.
Problem
- A cache of capacity \(k\) items.
- A sequence of requests \(r_1, r_2, \dots, r_n\), each a memory item.
- An eviction schedule decides, on each cache miss, which currently-cached item to discard. (A hit costs nothing.)
A schedule is reduced if it brings an item into the cache only at the step when that item is requested and evicts only when forced by a miss on a full cache. Any schedule can be transformed into a reduced one without increasing the miss count, so we restrict attention to reduced schedules.
The goal is to minimize the number of cache misses over the whole sequence.
Key Ideas
The optimal eviction rule has a strikingly simple form: on each forced eviction, discard the cached item whose next access is farthest in the future.
The intuition is that every forced eviction will eventually cost one miss — when the evicted item is requested again. Among the current candidates, the one used soonest is the most expensive to evict (a near-certain near-future miss), and the one used latest is the cheapest, because its next miss is delayed the most. Choosing the latest pushes the inevitable next miss as far into the future as possible; if the item is never requested again, the miss never materializes at all.
This local rule extends to a global guarantee through an exchange argument: any optimal schedule can be edited, one decision at a time, into the FIF schedule without increasing the miss count. The surprise is that a single local criterion suffices — no lookahead beyond “next use” is needed. The formal proof appears under Correctness.
Algorithm
```
on request r_t:
    if r_t is in the cache:
        hit
    else:
        miss
        if the cache has a free slot:
            load r_t
        else:
            evict the cached item whose next access is farthest in the future
            (if some cached item is never accessed again, evict it;
             break ties arbitrarily)
            load r_t
```
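As a concrete reference, here is a minimal Python sketch of the same logic (the function name `fif_misses` is ours). It rescans the remaining requests on every forced eviction, so it costs \(O(nk)\) per miss; a faster variant appears under Complexity and Tradeoffs.

```python
def fif_misses(requests, k):
    """Simulate the farthest-in-future policy; return the miss count.

    Naive version: on each forced eviction, rescan the remaining
    requests to find every cached item's next use.
    """
    cache = set()
    misses = 0
    for t, r in enumerate(requests):
        if r in cache:
            continue                      # hit: a reduced schedule does nothing
        misses += 1
        if len(cache) < k:
            cache.add(r)                  # free slot: no eviction needed
            continue

        def next_use(item):               # index of item's next request,
            for s in range(t + 1, len(requests)):
                if requests[s] == item:   # or infinity if never again
                    return s
            return float("inf")

        cache.remove(max(cache, key=next_use))  # farthest-in-future victim
        cache.add(r)
    return misses
```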
Walkthrough
A nine-request trace
Cache of size \(k = 3\). Requests \(a, b, c, d, a, c, d, b, c\). Each column shows the cache contents after that step is processed; bold marks the slot rewritten on an eviction.

| Request | a | b | c | d | a | c | d | b | c |
|---|---|---|---|---|---|---|---|---|---|
| Result | miss | miss | miss | miss | hit | hit | hit | miss | hit |
| Slot 1 | a | a | a | a | a | a | a | **b** | b |
| Slot 2 | | b | b | **d** | d | d | d | d | d |
| Slot 3 | | | c | c | c | c | c | c | c |

At step 4 the next uses of the cached items are \(a\) at step 5, \(b\) at step 8, and \(c\) at step 6, so FIF evicts \(b\). At step 8 neither \(a\) nor \(d\) is ever requested again, so either may be evicted; here \(a\) goes.

Five misses on nine requests; no schedule does better on this sequence, since the first four misses are compulsory and whichever of \(a, b, c\) is evicted at step 4 forces a fifth. A natural rival, least-recently-used (LRU), misses six times on this input: at step 4 it evicts \(a\), the very item requested next. And on adversarial sequences LRU can miss up to \(k\) times for every miss FIF makes.
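Running the sketch from the Algorithm section on this trace reproduces the count:

```python
trace = list("abcdacdbc")
print(fif_misses(trace, k=3))   # -> 5
```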
Correctness
Theorem. The FIF algorithm produces a schedule with the minimum number of misses.
Proof sketch. Let \(S_{FIF}\) be the FIF schedule and \(S^*\) any optimal reduced schedule. We transform \(S^*\) into \(S_{FIF}\) one decision at a time without increasing the miss count.
Suppose \(S^*\) and \(S_{FIF}\) agree on the first \(j-1\) decisions but differ at decision \(j\); up to that point they are in the same cache state. On request \(r_j\) both miss, and they evict different items: \(S_{FIF}\) evicts \(f\) (farthest-in-future) and \(S^*\) evicts \(e \ne f\). Construct \(S'\) by making the FIF choice at step \(j\): \(S'\) evicts \(f\) and then shadows \(S^*\). After step \(j\) the two caches differ in exactly one item: \(S^*\) holds \(f\) where \(S'\) holds \(e\). We patch \(S'\) so that this difference stays confined to the pair \(\{e, f\}\) until it disappears.
A case analysis on the first later step at which the difference matters shows that \(S'\) incurs at most as many misses as \(S^*\):
- \(S^*\) misses on some item \(g \notin \{e, f\}\) and evicts \(f\). Then \(S'\) also misses (neither cache holds \(g\)) and evicts \(e\) instead. Both pay one miss, the caches are now identical, and \(S'\) copies \(S^*\) exactly from here on. (If \(S^*\) instead evicts some \(h \ne f\), \(S'\) evicts \(h\) as well and the one-item difference simply persists.)
- \(e\) is requested again. \(S'\) holds \(e\) and hits, while \(S^*\) misses and must evict some item \(h\). If \(h = f\), the caches reconverge and \(S'\) is strictly ahead. If \(h \ne f\), patch \(S'\): discard \(h\) and bring in \(f\) at this step, charging \(S'\) one miss. Either way \(S'\) pays no more than \(S^*\) here, and the caches now agree. (The patched \(S'\) may no longer be reduced; converting it to a reduced schedule at the end cannot increase its miss count.)
- A request to \(f\) cannot arrive before one of the cases above resolves the difference: by the FIF choice, \(f\)’s next access is no earlier than \(e\)’s. And if neither \(e\) nor \(f\) is ever requested again, the difference never costs anything.
In every case, \(S'\) has at most as many misses as \(S^*\) and agrees with \(S_{FIF}\) on one more decision. Repeating the exchange at each remaining disagreement, we reach \(S_{FIF}\) without ever increasing the miss count, so \(S_{FIF}\) is optimal.
Complexity and Tradeoffs
Each request can be handled in \(O(\log n)\) time if we precompute, for each position in the sequence, the index of the next request to the same item (one backward \(O(n)\) pass with a hash map) and keep the cached items in a max-heap keyed on these “next-use times,” discarding stale heap entries lazily. Total: \(O(n \log n)\) given the sequence in advance.
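A sketch of that version, under the same conventions as the earlier snippet (the name `fif_misses_fast` is ours). Python’s `heapq` is a min-heap, so keys are negated; an entry made stale by a later access to the same item is skipped when popped:

```python
import heapq

def fif_misses_fast(requests, k):
    """Farthest-in-future miss count in O(n log n) via a lazy max-heap."""
    n = len(requests)
    next_use = [n] * n                    # n stands in for "never again"
    last_seen = {}
    for t in range(n - 1, -1, -1):        # pass 1: next occurrence of each item
        next_use[t] = last_seen.get(requests[t], n)
        last_seen[requests[t]] = t

    cache = {}                            # item -> its current next-use time
    heap = []                             # (-next_use, item): a max-heap
    misses = 0
    for t, r in enumerate(requests):      # pass 2: simulate the cache
        if r in cache:
            cache[r] = next_use[t]        # hit: refresh the key; the old
            heapq.heappush(heap, (-next_use[t], r))  # heap entry goes stale
            continue
        misses += 1
        if len(cache) >= k:               # full: pop until a live entry
            while True:
                neg, victim = heapq.heappop(heap)
                if victim in cache and cache[victim] == -neg:
                    del cache[victim]     # live: farthest-in-future victim
                    break
        cache[r] = next_use[t]
        heapq.heappush(heap, (-next_use[t], r))
    return misses
```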
Important tradeoffs:
- Offline only. FIF requires the entire request sequence in advance. Real caches receive requests one at a time and must commit to evictions immediately.
- Two passes over the sequence. The first pass annotates each request with its next-use index; the second simulates the cache. \(\Theta(n)\) extra memory beyond the cache itself.
- Tie-breaking. When two cached items both have no future use, the choice is arbitrary; the miss count is identical.
When to Use It
FIF cannot be deployed in a live cache because future requests are unknown. Its role is analytical: it serves as the offline benchmark \(\text{OPT}\) against which real, online policies are measured.
The standard measure is the competitive ratio: an online policy \(\mathcal{A}\) is \(c\)-competitive if \(\mathcal{A}(\sigma) \le c \cdot \text{OPT}(\sigma) + O(1)\) on every sequence \(\sigma\). A celebrated result is that LRU is \(k\)-competitive on a cache of size \(k\), and no deterministic online policy can do better than \(k\)-competitive in the worst case (Sleator and Tarjan 1985). Randomized online policies (such as Marker) achieve \(O(\log k)\)-competitive ratios, again with FIF as the benchmark.
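The \(k\) lower bound is easy to witness: cycling through \(k+1\) distinct items makes LRU evict each item just before it is needed again, so it misses on every request, while FIF misses only about once every \(k\) requests. A small demo, reusing `fif_misses` from the Algorithm section (the LRU simulator and its name `lru_misses` are ours):

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Miss count for LRU, with an OrderedDict as the recency queue."""
    cache = OrderedDict()
    misses = 0
    for r in requests:
        if r in cache:
            cache.move_to_end(r)          # hit: mark r most recently used
            continue
        misses += 1
        if len(cache) >= k:
            cache.popitem(last=False)     # evict the least recently used
        cache[r] = None
    return misses

k = 4
trace = [i % (k + 1) for i in range(100)]          # cycle through k+1 items
print(lru_misses(trace, k), fif_misses(trace, k))  # -> 100 28
```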
| Use case | Policy |
|---|---|
| Offline analysis, sequence known | FIF / Belady — optimal |
| Online deployment, deterministic | LRU — \(k\)-competitive, optimal among deterministic policies |
| Online deployment, randomized | Marker — \(O(\log k)\)-competitive |
| Workload-tuned cache | LFU, ARC, CLOCK, LIRS, and many heuristics that beat LRU on specific workloads but offer no worst-case guarantee |