Value Iteration Converges to V*

Claim

Starting from any bounded \(V_0 : S \to \mathbb{R}\), the sequence \(V_{k+1} = \mathcal{T} V_k\) converges to the optimal value function \(V^*\) (Bellman 1957; Puterman 1994). The error after \(k\) iterations satisfies:

\[ \|V_k - V^*\|_\infty \leq \frac{\gamma^k}{1 - \gamma} \|V_1 - V_0\|_\infty. \]

Proof

Step 1: \(\mathcal{T}\) is a \(\gamma\)-contraction with unique fixed point \(V^*\).

The Bellman optimality operator satisfies \(\|\mathcal{T} V - \mathcal{T} W\|_\infty \leq \gamma \|V - W\|_\infty\) for all bounded \(V, W\): at each state, the difference of the two maxima over actions is bounded by the largest per-action difference, and each action's backup differs only through a \(\gamma\)-discounted expectation of \(V - W\). Since \(\gamma < 1\), the Banach fixed-point theorem applies: \(\mathcal{T}\) has a unique fixed point, and that fixed point is \(V^*\), because the Bellman optimality equations state precisely that \(\mathcal{T} V^* = V^*\).
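The contraction inequality can be checked numerically. The two-state, two-action MDP below is invented purely for illustration; only the property being tested comes from the text:

```python
import random

GAMMA = 0.9  # discount factor, gamma < 1

# A made-up 2-state, 2-action MDP for illustration.
# P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
P = {0: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
     1: {0: [(0, 0.4), (1, 0.6)], 1: [(0, 1.0)]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}

def bellman(V):
    """One application of the Bellman optimality operator T."""
    return [max(R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s])
            for s in sorted(P)]

def sup_dist(V, W):
    """Sup-norm distance between two value functions."""
    return max(abs(v - w) for v, w in zip(V, W))

random.seed(0)
V = [random.uniform(-10, 10) for _ in range(2)]
W = [random.uniform(-10, 10) for _ in range(2)]

# ||T V - T W||_inf <= gamma * ||V - W||_inf  (the contraction inequality)
assert sup_dist(bellman(V), bellman(W)) <= GAMMA * sup_dist(V, W) + 1e-12
```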

Step 2: Geometric convergence.

Since \(V^*\) is the fixed point of \(\mathcal{T}\), we have \(V^* = \mathcal{T}^k V^*\); applying the contraction inequality \(k\) times:

\[ \|V_k - V^*\|_\infty = \|\mathcal{T}^k V_0 - \mathcal{T}^k V^*\|_\infty \leq \gamma^k \|V_0 - V^*\|_\infty. \]

This shows \(V_k \to V^*\) as \(k \to \infty\).
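The geometric decay of Step 2 can be observed directly on a toy MDP (the tables below are invented for illustration); here \(V^*\) is approximated by running the iteration far past convergence:

```python
import random

GAMMA = 0.9
# A made-up 2-state, 2-action MDP for illustration.
P = {0: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
     1: {0: [(0, 0.4), (1, 0.6)], 1: [(0, 1.0)]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}

def bellman(V):
    """One application of the Bellman optimality operator T."""
    return [max(R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s])
            for s in sorted(P)]

def sup_dist(V, W):
    return max(abs(v - w) for v, w in zip(V, W))

# Approximate V* by iterating far past any reasonable tolerance.
V_star = [0.0, 0.0]
for _ in range(2000):
    V_star = bellman(V_star)

random.seed(1)
V = [random.uniform(-5.0, 5.0) for _ in range(2)]
err0 = sup_dist(V, V_star)  # ||V_0 - V*||_inf

for k in range(1, 30):
    V = bellman(V)
    # ||V_k - V*||_inf <= gamma^k * ||V_0 - V*||_inf  (Step 2)
    assert sup_dist(V, V_star) <= GAMMA ** k * err0 + 1e-9
```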

Step 3: A-posteriori error bound.

Writing \(V_k - V^*\) as the telescoping sum \(\sum_{j=0}^{\infty} (V_{k+j} - V_{k+j+1})\), which is valid because \(V_{k+j} \to V^*\), the triangle inequality gives:

\[ \|V_k - V^*\|_\infty \leq \sum_{j=0}^{\infty} \|V_{k+j+1} - V_{k+j}\|_\infty. \]

The contraction applied to consecutive iterates gives \(\|V_{k+j+1} - V_{k+j}\|_\infty \leq \gamma^j \|V_{k+1} - V_k\|_\infty\), so:

\[ \|V_k - V^*\|_\infty \leq \|V_{k+1} - V_k\|_\infty \sum_{j=0}^{\infty} \gamma^j = \frac{\|V_{k+1} - V_k\|_\infty}{1 - \gamma}. \]

Applying the contraction \(k\) more times to bound \(\|V_{k+1} - V_k\|_\infty \leq \gamma^k \|V_1 - V_0\|_\infty\):

\[ \|V_k - V^*\|_\infty \leq \frac{\gamma^k}{1 - \gamma} \|V_1 - V_0\|_\infty. \quad \square \]

Consequence for the Stopping Criterion

Applying the contraction once more inside the a-posteriori bound, via \(\|V_{k+1} - V_k\|_\infty \leq \gamma \|V_k - V_{k-1}\|_\infty\), gives:

\[ \|V_k - V^*\|_\infty \leq \frac{\gamma}{1 - \gamma} \|V_k - V_{k-1}\|_\infty. \]

When the algorithm halts because \(\|V_{k+1} - V_k\|_\infty < \varepsilon\), this bound applied at index \(k+1\) shows the returned iterate satisfies \(\|V_{k+1} - V^*\|_\infty < \frac{\gamma \varepsilon}{1 - \gamma}\).
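A minimal sketch of value iteration with this stopping rule, again on a made-up two-state MDP (the transition and reward tables are illustrative, not from the text):

```python
GAMMA = 0.9
EPS = 1e-6  # stopping tolerance on ||V_{k+1} - V_k||_inf

# A made-up 2-state, 2-action MDP for illustration.
P = {0: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
     1: {0: [(0, 0.4), (1, 0.6)], 1: [(0, 1.0)]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}

def bellman(V):
    """One application of the Bellman optimality operator T."""
    return [max(R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s])
            for s in sorted(P)]

def sup_dist(V, W):
    return max(abs(v - w) for v, w in zip(V, W))

# Value iteration: stop when consecutive iterates are within EPS.
V = [0.0, 0.0]
while True:
    V_next = bellman(V)
    if sup_dist(V_next, V) < EPS:
        break
    V = V_next

# Near-exact V* for comparison: keep iterating far past the tolerance.
V_star = V_next
for _ in range(2000):
    V_star = bellman(V_star)

# The guarantee from the text: true error < gamma * eps / (1 - gamma).
assert sup_dist(V_next, V_star) < GAMMA * EPS / (1 - GAMMA)
```

The stopping test compares consecutive iterates only, yet the final assertion checks the distance to (a near-exact) \(V^*\); that is exactly what the a-posteriori bound buys.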

References

Bellman, Richard. 1957. Dynamic Programming. Princeton University Press. https://press.princeton.edu/books/paperback/9780691146683/dynamic-programming.
Puterman, Martin L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley. https://doi.org/10.1002/9780470316887.