Contraction Mappings
Motivation
Value iteration and policy iteration both work by applying the Bellman operator repeatedly and relying on convergence to a fixed point. But why do these iterations converge, and why is the fixed point unique? The answer is that the Bellman operators are contraction mappings, and the Banach fixed-point theorem guarantees that any contraction on a complete metric space converges geometrically to a unique fixed point from any initialization. Contraction mappings are therefore the mathematical foundation on which the correctness of these RL algorithms rests.
Definition
Figure: Repeated contraction into a fixed point. A contraction maps a whole ball into a smaller ball around its image; iterating it collapses the remaining uncertainty toward the unique fixed point.
Let \((X, d)\) be a metric space. A function \(f : X \to X\) is a contraction (or contraction mapping) with constant \(c \in [0, 1)\) if
\[ d(f(x), f(y)) \leq c \cdot d(x, y) \quad \text{for all } x, y \in X. \]
The constant \(c\) is called the contraction factor (it is also a Lipschitz constant for \(f\)). Because \(c < 1\), each application of \(f\) shrinks the distance between any two distinct points by at least the factor \(c\).
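As a concrete illustration (a toy map chosen here, not taken from the text), \(f(x) = 0.5x + 2\) on the real line with the usual metric \(d(x, y) = |x - y|\) is a contraction with \(c = 0.5\), since \(|f(x) - f(y)| = 0.5\,|x - y|\). A quick numerical spot-check:

```python
import random

def f(x):
    # toy contraction on (R, |.|) with factor c = 0.5
    return 0.5 * x + 2

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-100, 100), random.uniform(-100, 100)
    # verify d(f(x), f(y)) <= 0.5 * d(x, y)
    assert abs(f(x) - f(y)) <= 0.5 * abs(x - y) + 1e-12
```

Its unique fixed point is \(x^* = 4\), the solution of \(x = 0.5x + 2\).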
Banach Fixed-Point Theorem
If \((X, d)\) is a complete metric space and \(f\) is a contraction with constant \(c\), then (Banach 1922):
- \(f\) has a unique fixed point \(x^* \in X\) satisfying \(f(x^*) = x^*\).
- For any starting point \(x_0 \in X\), the iterates \(x_{k+1} = f(x_k)\) converge to \(x^*\).
- The error after \(k\) steps satisfies
\[ d(x_k, x^*) \leq \frac{c^k}{1 - c}\, d(x_1, x_0). \]
Convergence is geometric with ratio \(c\).
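A minimal sketch of the theorem in action, reusing the toy contraction \(f(x) = 0.5x + 2\) (an assumed example with fixed point \(x^* = 4\)) and checking the a priori error bound \(d(x_k, x^*) \leq \frac{c^k}{1-c}\, d(x_1, x_0)\) at every step:

```python
c = 0.5

def f(x):
    # toy contraction on the reals with factor c; fixed point x* = 4
    return c * x + 2

x_star = 4.0          # solves x = 0.5 * x + 2
x0 = 100.0            # arbitrary initialization
x1 = f(x0)
x = x1                # x holds the k-th iterate
for k in range(1, 31):
    bound = c**k / (1 - c) * abs(x1 - x0)   # a priori error bound
    assert abs(x - x_star) <= bound + 1e-12
    x = f(x)
```

After 30 iterations the error has shrunk by a factor of roughly \(0.5^{30} \approx 10^{-9}\), illustrating geometric convergence with ratio \(c\).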
Contraction in the \(\ell^\infty\) Norm
In reinforcement learning the relevant metric space is the set of bounded functions \(V : S \to \mathbb{R}\) equipped with the sup-norm
\[ \|V - W\|_\infty = \sup_{s \in S} |V(s) - W(s)|. \]
This space is complete. An operator \(\mathcal{T}\) on this space is a \(\gamma\)-contraction if
\[ \|\mathcal{T} V - \mathcal{T} W\|_\infty \leq \gamma \|V - W\|_\infty \]
for some \(\gamma \in [0, 1)\). The Banach fixed-point theorem then guarantees a unique fixed point and geometric convergence from any initialization.
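One standard family of sup-norm contractions consists of affine operators \(\mathcal{T}V = r + \gamma P V\) with \(P\) row-stochastic, since averaging under \(P\) is nonexpansive in the sup-norm. The sketch below uses assumed toy sizes and random data to check the contraction inequality numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
# random row-stochastic matrix P and reward vector r (illustrative data)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(5)

def T(V):
    # affine operator T V = r + gamma * P V
    return r + gamma * P @ V

sup = lambda v: np.max(np.abs(v))   # the sup-norm on R^5

for _ in range(100):
    V, W = rng.normal(size=5), rng.normal(size=5)
    # verify ||T V - T W||_inf <= gamma * ||V - W||_inf
    assert sup(T(V) - T(W)) <= gamma * sup(V - W) + 1e-12
```

The inequality holds because each row of \(P\) forms convex weights, so \(\|P(V - W)\|_\infty \leq \|V - W\|_\infty\), leaving \(\gamma\) as the only scaling.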
Application: Bellman Operators
Both the Bellman optimality operator \(\mathcal{T}\) and the Bellman expectation operator \(\mathcal{T}^\pi\) are \(\gamma\)-contractions in the \(\ell^\infty\) norm, where \(\gamma < 1\) is the MDP discount factor. The proof rests on the fact that both the max over actions and the expectation over next states are nonexpansive (e.g. \(|\max_a f(a) - \max_a g(a)| \leq \max_a |f(a) - g(a)|\)), so the factor \(\gamma\) multiplying the next-state values is the only source of scaling.
This is the key property underlying the convergence of value iteration and policy iteration.
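To make this concrete, here is a minimal value-iteration sketch on a hypothetical 2-state, 2-action MDP (the transition probabilities and rewards are made-up toy values). The gap between successive iterates, \(\|V_{k+1} - V_k\|_\infty\), shrinks by at least the factor \(\gamma\) each step, exactly as the contraction property predicts:

```python
import numpy as np

gamma = 0.9
# P[a, s, s'] = transition probability, R[s, a] = reward (toy values)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def bellman(V):
    # Bellman optimality operator: (T V)(s) = max_a [R(s,a) + gamma * E_{s'} V(s')]
    Q = R + gamma * (P @ V).T   # Q[s, a]
    return Q.max(axis=1)

V = np.zeros(2)
prev_gap = np.inf
for _ in range(100):
    V_new = bellman(V)
    gap = np.max(np.abs(V_new - V))
    # successive gaps shrink geometrically with ratio at most gamma
    assert gap <= gamma * prev_gap + 1e-12
    prev_gap = gap
    V = V_new
```

After the loop, \(V\) is close to the unique fixed point \(V^*\); since \(\|V_k - V^*\|_\infty \leq \|V_{k+1} - V_k\|_\infty / (1 - \gamma)\), the final gap also certifies how far the iterate can be from \(V^*\).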