Contraction Mappings
Motivation
Value iteration and policy iteration both work by applying the Bellman operator repeatedly and relying on convergence to a fixed point. But why do these iterations converge, and why is the fixed point unique? The answer is that the Bellman operators are contraction mappings, and the Banach fixed-point theorem guarantees that any contraction on a complete metric space converges geometrically to a unique fixed point from any initialization. Contraction mappings are therefore the mathematical foundation on which the correctness of these RL algorithms rests.
Definition
Figure: Repeated contraction into a fixed point. A contraction maps a whole ball into a smaller ball around its image; iterating it collapses the remaining uncertainty toward the unique fixed point.
Let \((X, d)\) be a metric space. A function \(f : X \to X\) is a contraction (or contraction mapping) with constant \(c \in [0, 1)\) if
\[ d(f(x), f(y)) \leq c \cdot d(x, y) \quad \text{for all } x, y \in X. \]
The constant \(c\) is called the contraction factor (it is also a Lipschitz constant for \(f\)). Because \(c < 1\), each application of \(f\) shrinks the distance between any two distinct points by at least the factor \(c\).
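As a concrete illustration (a toy map chosen here, not taken from the text), \(f(x) = 0.5x + 2\) on the real line with the usual metric \(d(x, y) = |x - y|\) is a contraction with \(c = 0.5\), since \(|f(x) - f(y)| = 0.5\,|x - y|\). A quick numerical spot-check:

```python
import random

def f(x):
    # toy contraction on (R, |.|) with factor c = 0.5
    return 0.5 * x + 2

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-100, 100), random.uniform(-100, 100)
    # verify d(f(x), f(y)) <= 0.5 * d(x, y)
    assert abs(f(x) - f(y)) <= 0.5 * abs(x - y) + 1e-12
```

Its unique fixed point is \(x^* = 4\), the solution of \(x = 0.5x + 2\).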
Banach Fixed-Point Theorem
If \((X, d)\) is a complete metric space and \(f\) is a contraction with constant \(c\), then (Banach 1922):
- \(f\) has a unique fixed point \(x^* \in X\) satisfying \(f(x^*) = x^*\).
- For any starting point \(x_0 \in X\), the iterates \(x_{k+1} = f(x_k)\) converge to \(x^*\).
- The error after \(k\) steps satisfies
\[ d(x_k, x^*) \leq \frac{c^k}{1 - c}\, d(x_1, x_0). \]
Convergence is geometric with ratio \(c\).
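A minimal sketch of the theorem in action, reusing the toy contraction \(f(x) = 0.5x + 2\) (an assumed example with fixed point \(x^* = 4\)) and checking the a priori error bound \(d(x_k, x^*) \leq \frac{c^k}{1-c}\, d(x_1, x_0)\) at every step:

```python
c = 0.5

def f(x):
    # toy contraction on the reals with factor c; fixed point x* = 4
    return c * x + 2

x_star = 4.0          # solves x = 0.5 * x + 2
x0 = 100.0            # arbitrary initialization
x1 = f(x0)
x = x1                # x holds the k-th iterate
for k in range(1, 31):
    bound = c**k / (1 - c) * abs(x1 - x0)   # a priori error bound
    assert abs(x - x_star) <= bound + 1e-12
    x = f(x)
```

After 30 iterations the error has shrunk by a factor of roughly \(0.5^{30} \approx 10^{-9}\), illustrating geometric convergence with ratio \(c\).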
Contraction in the \(\ell^\infty\) Norm
In reinforcement learning the relevant metric space is the set of bounded functions \(V : S \to \mathbb{R}\) equipped with the sup-norm
\[ \|V - W\|_\infty = \sup_{s \in S} |V(s) - W(s)|. \]
This space is complete. An operator \(\mathcal{T}\) on this space is a \(\gamma\)-contraction if
\[ \|\mathcal{T} V - \mathcal{T} W\|_\infty \leq \gamma \|V - W\|_\infty \]
for some \(\gamma \in [0, 1)\). The Banach fixed-point theorem then guarantees a unique fixed point and geometric convergence from any initialization.
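One standard family of sup-norm contractions consists of affine operators \(\mathcal{T}V = r + \gamma P V\) with \(P\) row-stochastic, since averaging under \(P\) is nonexpansive in the sup-norm. The sketch below uses assumed toy sizes and random data to check the contraction inequality numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
# random row-stochastic matrix P and reward vector r (illustrative data)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(5)

def T(V):
    # affine operator T V = r + gamma * P V
    return r + gamma * P @ V

sup = lambda v: np.max(np.abs(v))   # the sup-norm on R^5

for _ in range(100):
    V, W = rng.normal(size=5), rng.normal(size=5)
    # verify ||T V - T W||_inf <= gamma * ||V - W||_inf
    assert sup(T(V) - T(W)) <= gamma * sup(V - W) + 1e-12
```

The inequality holds because each row of \(P\) forms convex weights, so \(\|P(V - W)\|_\infty \leq \|V - W\|_\infty\), leaving \(\gamma\) as the only scaling.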
Application: Bellman Operators
Both the Bellman optimality operator \(\mathcal{T}\) and the Bellman expectation operator \(\mathcal{T}^\pi\) are \(\gamma\)-contractions in the \(\ell^\infty\) norm, where \(\gamma < 1\) is the MDP discount factor. The proof rests on the fact that both the max over actions and the expectation over next states are nonexpansive (e.g. \(|\max_a f(a) - \max_a g(a)| \leq \max_a |f(a) - g(a)|\)), so the factor \(\gamma\) multiplying the next-state values is the only source of scaling.
This is the key property underlying the convergence of value iteration and policy iteration.
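To make this concrete, here is a minimal value-iteration sketch on a hypothetical 2-state, 2-action MDP (the transition probabilities and rewards are made-up toy values). The gap between successive iterates, \(\|V_{k+1} - V_k\|_\infty\), shrinks by at least the factor \(\gamma\) each step, exactly as the contraction property predicts:

```python
import numpy as np

gamma = 0.9
# P[a, s, s'] = transition probability, R[s, a] = reward (toy values)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def bellman(V):
    # Bellman optimality operator: (T V)(s) = max_a [R(s,a) + gamma * E_{s'} V(s')]
    Q = R + gamma * (P @ V).T   # Q[s, a]
    return Q.max(axis=1)

V = np.zeros(2)
prev_gap = np.inf
for _ in range(100):
    V_new = bellman(V)
    gap = np.max(np.abs(V_new - V))
    # successive gaps shrink geometrically with ratio at most gamma
    assert gap <= gamma * prev_gap + 1e-12
    prev_gap = gap
    V = V_new
```

After the loop, \(V\) is close to the unique fixed point \(V^*\); since \(\|V_k - V^*\|_\infty \leq \|V_{k+1} - V_k\|_\infty / (1 - \gamma)\), the final gap also certifies how far the iterate can be from \(V^*\).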