Value Iteration Converges to V*
Claim
Starting from any bounded \(V_0 : S \to \mathbb{R}\), the sequence \(V_{k+1} = \mathcal{T} V_k\) converges to the optimal value function \(V^*\) (Bellman 1957; Puterman 1994). The error after \(k\) iterations satisfies:
\[ \|V_k - V^*\|_\infty \leq \frac{\gamma^k}{1 - \gamma} \|V_1 - V_0\|_\infty. \]
Proof
Step 1: \(\mathcal{T}\) is a \(\gamma\)-contraction with unique fixed point \(V^*\).
The Bellman optimality operator satisfies \(\|\mathcal{T} V - \mathcal{T} W\|_\infty \leq \gamma \|V - W\|_\infty\) for all bounded \(V, W\): for each state \(s\), \(|\max_a x_a - \max_a y_a| \leq \max_a |x_a - y_a|\), and the per-action difference is \(\gamma \left|\sum_{s'} P(s' \mid s, a)\,(V(s') - W(s'))\right| \leq \gamma \|V - W\|_\infty\). Since \(\gamma < 1\), the Banach fixed-point theorem applies: \(\mathcal{T}\) has a unique bounded fixed point, which is \(V^*\), because \(V^*\) satisfies the Bellman optimality equations \(\mathcal{T} V^* = V^*\).
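As a numerical sanity check (not part of the proof), the contraction inequality can be verified on a small randomly generated MDP; the sizes, discount, and array layout below are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9                  # illustrative state/action counts and discount
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)        # P[a, s] is a distribution over next states s'
R = rng.random((A, S))                   # R[a, s] = r(s, a)

def bellman(V):
    # (T V)(s) = max_a [ r(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
    return np.max(R + gamma * (P @ V), axis=0)

# ||T V - T W||_inf <= gamma * ||V - W||_inf for arbitrary bounded V, W.
V, W = rng.normal(size=S), rng.normal(size=S)
lhs = np.max(np.abs(bellman(V) - bellman(W)))
rhs = gamma * np.max(np.abs(V - W))
assert lhs <= rhs + 1e-12
```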
Step 2: Geometric convergence.
Since \(V^*\) is a fixed point, \(V^* = \mathcal{T}^k V^*\); applying the contraction inequality \(k\) times:
\[ \|V_k - V^*\|_\infty = \|\mathcal{T}^k V_0 - \mathcal{T}^k V^*\|_\infty \leq \gamma^k \|V_0 - V^*\|_\infty. \]
This shows \(V_k \to V^*\) as \(k \to \infty\).
Step 3: A-posteriori error bound.
Telescoping \(V_k - V^* = \sum_{j=0}^{\infty} (V_{k+j} - V_{k+j+1})\) (the series converges because \(V_{k+j} \to V^*\)), the triangle inequality gives:
\[ \|V_k - V^*\|_\infty \leq \sum_{j=0}^{\infty} \|V_{k+j+1} - V_{k+j}\|_\infty. \]
The contraction applied to consecutive iterates gives \(\|V_{k+j+1} - V_{k+j}\|_\infty \leq \gamma^j \|V_{k+1} - V_k\|_\infty\), so:
\[ \|V_k - V^*\|_\infty \leq \|V_{k+1} - V_k\|_\infty \sum_{j=0}^{\infty} \gamma^j = \frac{\|V_{k+1} - V_k\|_\infty}{1 - \gamma}. \]
Applying the contraction \(k\) more times to bound \(\|V_{k+1} - V_k\|_\infty \leq \gamma^k \|V_1 - V_0\|_\infty\):
\[ \|V_k - V^*\|_\infty \leq \frac{\gamma^k}{1 - \gamma} \|V_1 - V_0\|_\infty. \quad \square \]
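The finished bound can also be checked empirically. The sketch below approximates \(V^*\) by iterating far past convergence on a small random MDP (all sizes and the discount are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 4, 3, 0.9                  # illustrative sizes and discount
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)        # P[a, s] is a distribution over s'
R = rng.random((A, S))                   # R[a, s] = r(s, a)

def T(V):
    return np.max(R + gamma * (P @ V), axis=0)

# Approximate V* by running the iteration far past convergence
# (the residual error ~ gamma^2000 is negligible).
V_star = np.zeros(S)
for _ in range(2000):
    V_star = T(V_star)

V0 = np.zeros(S)
V1 = T(V0)
c = np.max(np.abs(V1 - V0))
V = V0
for k in range(30):
    # ||V_k - V*||_inf <= gamma^k / (1 - gamma) * ||V_1 - V_0||_inf
    assert np.max(np.abs(V - V_star)) <= gamma**k / (1 - gamma) * c + 1e-10
    V = T(V)
```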
Consequence for the Stopping Criterion
Combining the a-posteriori bound from Step 3 with one application of the contraction, \(\|V_{k+1} - V_k\|_\infty \leq \gamma \|V_k - V_{k-1}\|_\infty\), gives:
\[ \|V_k - V^*\|_\infty \leq \frac{\gamma}{1 - \gamma} \|V_k - V_{k-1}\|_\infty. \]
When the algorithm halts because \(\|V_{k+1} - V_k\|_\infty < \varepsilon\), the true error of the returned iterate satisfies \(\|V_{k+1} - V^*\|_\infty < \frac{\gamma \varepsilon}{1 - \gamma}\).
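This stopping rule translates directly into code. A minimal sketch, assuming a discrete MDP given as arrays `R[a, s]` and `P[a, s, s']` (the names and the random MDP are illustrative):

```python
import numpy as np

def value_iteration(R, P, gamma, eps=1e-6):
    """Iterate V <- T V until ||V_{k+1} - V_k||_inf < eps.
    The returned V then satisfies ||V - V*||_inf < gamma * eps / (1 - gamma)."""
    V = np.zeros(P.shape[1])
    while True:
        V_next = np.max(R + gamma * (P @ V), axis=0)
        if np.max(np.abs(V_next - V)) < eps:
            return V_next
        V = V_next

# Illustrative random MDP.
rng = np.random.default_rng(2)
S, A, gamma = 5, 2, 0.95
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((A, S))

eps = 1e-8
V = value_iteration(R, P, gamma, eps)
# Near-exact V* from a much tighter tolerance; its own error is ~1e-11,
# so a small slack is added when checking the guarantee.
V_star = value_iteration(R, P, gamma, eps=1e-12)
assert np.max(np.abs(V - V_star)) < gamma * eps / (1 - gamma) + 1e-9
```

The guarantee is on the sup-norm distance to \(V^*\); note that \(\varepsilon\) must be chosen with \(\frac{\gamma}{1-\gamma}\) in mind, since for \(\gamma\) close to 1 the multiplier is large.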