POMDP Value Functions Are Piecewise Linear and Convex
Claim
For a finite POMDP \((S, A, \Omega, P, O, R, \gamma)\), the finite-horizon optimal value function \(V_k^* : \Delta(S) \to \mathbb{R}\), where \(\Delta(S)\) is the probability simplex over \(S\), is piecewise linear and convex (PWLC) in the belief \(b\) for every horizon \(k \geq 0\) (Kaelbling et al. 1998). That is, there exists a finite set \(\Gamma_k \subset \mathbb{R}^{|S|}\) of \(\alpha\)-vectors such that:
\[V_k^*(b) = \max_{\alpha \in \Gamma_k}\, \alpha \cdot b \qquad \forall b \in \Delta(S).\]
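To make the representation concrete, here is a minimal numpy sketch of evaluating a PWLC value function; the \(\alpha\)-vectors below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical alpha-vectors for a 2-state POMDP (illustrative values only).
Gamma_k = np.array([
    [1.0, 0.0],   # per-state value of one k-step plan
    [0.0, 1.0],
    [0.6, 0.6],
])

def value(b: np.ndarray, Gamma: np.ndarray) -> float:
    """Evaluate V(b) = max_{alpha in Gamma} alpha . b."""
    return float(np.max(Gamma @ b))

b = np.array([0.3, 0.7])   # a belief over the two states
print(value(b, Gamma_k))   # 0.7: the second vector attains the max
```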
Proof
By induction on \(k\).
Base case (\(k = 0\)). With no steps remaining, no reward is collected, so \(V_0^*(b) = 0\) for all \(b\). This is PWLC with \(\Gamma_0 = \{\mathbf{0}\}\).
Inductive step. Assume \(V_k^*\) is PWLC with \(\alpha\)-vector set \(\Gamma_k\):
\[V_k^*(b) = \max_{\alpha \in \Gamma_k}\, \alpha \cdot b.\]
We show \(V_{k+1}^*\) is PWLC.
Step 1: Transform \(\alpha\)-vectors through one action and observation.
Fix action \(a \in A\) and observation \(o \in \Omega\). For each \(\alpha \in \Gamma_k\), define the backed-up \(\alpha\)-vector \(\hat{\alpha}^{a,o} \in \mathbb{R}^{|S|}\) by:
\[\hat{\alpha}^{a,o}_s = \gamma \sum_{s' \in S} O(o \mid s', a)\, P(s' \mid s, a)\, \alpha_{s'}.\]
This is a linear transformation of \(\alpha\) through the observation and transition matrices, with the discount factor \(\gamma\) folded in.
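In code, the backup is one matrix product. The sketch below assumes (these array conventions are mine, not the text's) a transition matrix \(T_a[s, s'] = P(s' \mid s, a)\) and an observation matrix \(Z_a[s', o] = O(o \mid s', a)\):

```python
import numpy as np

def backup_alpha(alpha: np.ndarray, T_a: np.ndarray, Z_a: np.ndarray,
                 o: int, gamma: float) -> np.ndarray:
    """Back up one alpha-vector through a fixed action a and observation o.

    Computes hat_alpha[s] = gamma * sum_{s'} O(o|s',a) P(s'|s,a) alpha[s'],
    with T_a[s, s'] = P(s'|s,a) and Z_a[s', o] = O(o|s',a).
    """
    return gamma * (T_a @ (Z_a[:, o] * alpha))
```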
Step 2: The discounted contribution of one observation is linear in \(b\).
The Bayes filter gives \(b^{a,o}(s') \propto O(o \mid s', a) \sum_s P(s' \mid s, a)\, b(s)\), with normalization \(P(o \mid b, a) = \sum_{s'} O(o \mid s', a) \sum_s P(s' \mid s, a)\, b(s)\). Therefore:
\[\gamma\, P(o \mid b, a)\, V_k^*(b^{a,o}) = \gamma\, P(o \mid b, a) \max_{\alpha \in \Gamma_k}\, \alpha \cdot b^{a,o} = \max_{\alpha \in \Gamma_k}\, \hat{\alpha}^{a,o} \cdot b.\]
The last equality holds because \(\gamma\, P(o \mid b, a) \geq 0\), so the factor can be moved inside the max; substituting the Bayes filter expression then cancels the normalization, leaving exactly \(\sum_s \hat{\alpha}^{a,o}_s\, b(s)\). (Note the leading \(\gamma\): it is already folded into \(\hat{\alpha}^{a,o}\) in Step 1.)
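As a sanity check, the identity can be verified numerically. This sketch draws a random model (the sizes and array conventions are assumptions for illustration, matching the earlier sketch) and confirms both sides agree:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nO, gamma, o = 3, 2, 0.95, 1

# Random row-stochastic transition and observation matrices for one action a.
T_a = rng.random((nS, nS)); T_a /= T_a.sum(axis=1, keepdims=True)  # P(s'|s,a)
Z_a = rng.random((nS, nO)); Z_a /= Z_a.sum(axis=1, keepdims=True)  # O(o|s',a)
Gamma_k = rng.random((4, nS))                                      # alpha-vectors
b = rng.random(nS); b /= b.sum()                                   # belief

# Left side: gamma * P(o|b,a) * V_k(b^{a,o}) via the Bayes filter.
unnorm = Z_a[:, o] * (T_a.T @ b)   # unnormalized posterior over s'
p_o = unnorm.sum()                 # P(o | b, a)
lhs = gamma * p_o * np.max(Gamma_k @ (unnorm / p_o))

# Right side: max over backed-up vectors, each linear in b.
hats = gamma * (Gamma_k * Z_a[:, o]) @ T_a.T   # row i is hat_alpha_i^{a,o}
rhs = np.max(hats @ b)

assert np.isclose(lhs, rhs)
```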
Step 3: Sum over observations.
The one-step value for a fixed action \(a\) is:
\[Q_{k+1}(b, a) = \rho(b, a) + \gamma \sum_{o \in \Omega} P(o \mid b, a)\, V_k^*(b^{a,o}) = r^a \cdot b + \sum_{o \in \Omega} \max_{\alpha \in \Gamma_k}\, \hat{\alpha}^{a,o} \cdot b,\]
where \(r^a_s = R(s, a)\), so the expected immediate reward is \(\rho(b, a) = \sum_s b(s)\, R(s, a) = r^a \cdot b\). For each \(b\), the max selects one \(\alpha\)-vector per observation, and these choices are independent across observations, so the sum of maxima equals a single max over all selection functions \(\sigma : \Omega \to \Gamma_k\) (one \(\alpha\)-vector per observation):
\[Q_{k+1}(b, a) = \max_{\sigma : \Omega \to \Gamma_k} \left( r^a + \sum_{o \in \Omega} \widehat{\sigma(o)}^{\,a,o} \right) \cdot b,\]
where \(\widehat{\sigma(o)}^{\,a,o}\) denotes the Step 1 backup of the vector \(\sigma(o)\).
Each candidate is a fixed vector dotted with \(b\), so \(Q_{k+1}(\cdot, a)\) is PWLC for each \(a\).
Step 4: Max over actions.
\[V_{k+1}^*(b) = \max_{a \in A} Q_{k+1}(b, a).\]
A finite max of PWLC functions is PWLC. The combined \(\alpha\)-vector set is:
\[\Gamma_{k+1} = \left\{ r^a + \sum_{o \in \Omega} \widehat{\sigma(o)}^{\,a,o} \;\middle|\; a \in A,\; \sigma : \Omega \to \Gamma_k \right\},\]
which is finite (at most \(|A| \cdot |\Gamma_k|^{|\Omega|}\) vectors, pruned to remove dominated ones). Therefore \(V_{k+1}^*\) is PWLC. \(\blacksquare\)
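Putting the steps together, here is a sketch of one exact backup \(\Gamma_k \to \Gamma_{k+1}\) by brute-force enumeration of the selection functions \(\sigma\), using the same array conventions assumed in the earlier sketches; the pruning shown is simple pointwise dominance, not the LP-based pruning exact solvers use.

```python
import itertools
import numpy as np

def exact_backup(Gamma_k: np.ndarray, T: list, Z: list, r: list,
                 gamma: float) -> np.ndarray:
    """One exact DP backup Gamma_k -> Gamma_{k+1}.

    T[a][s, s'] = P(s'|s,a); Z[a][s', o] = O(o|s',a); r[a][s] = R(s,a).
    Enumerates every sigma: Omega -> Gamma_k for each action, then drops
    vectors pointwise dominated by another single vector.
    """
    nO = Z[0].shape[1]
    candidates = []
    for a in range(len(T)):
        # hat[o][i] is the backup of Gamma_k[i] through (a, o)  -- Step 1.
        hat = [gamma * (Gamma_k * Z[a][:, o]) @ T[a].T for o in range(nO)]
        # Cross-sum over all sigma (Step 3); pooling the candidates across
        # actions realizes the max over a (Step 4).
        for sigma in itertools.product(range(len(Gamma_k)), repeat=nO):
            candidates.append(r[a] + sum(hat[o][sigma[o]] for o in range(nO)))
    G = np.array(candidates)
    keep = [i for i in range(len(G))
            if not any(j != i and np.all(G[j] >= G[i]) and np.any(G[j] > G[i])
                       for j in range(len(G)))]
    return G[keep]
```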
Consequence
Each \(\alpha\)-vector in \(\Gamma_k\) encodes the value of a specific \(k\)-step conditional plan: a complete decision tree mapping observation histories to actions. The PWLC structure enables exact POMDP algorithms: instead of representing \(V_k^*\) over the continuous belief simplex, one only needs to store the finite set \(\Gamma_k\). The worst-case count grows as \(|A| \cdot |\Gamma_k|^{|\Omega|}\) per step, so \(|\Gamma_k|\) grows doubly exponentially with the horizon before pruning (for example, with \(|A| = |\Omega| = 2\) and \(|\Gamma_0| = 1\), the unpruned counts are \(2, 8, 128, 32768, \ldots\)), which is why exact methods are tractable only for small problems.