The Kalman Gain Minimizes the Posterior Covariance
Claim
In the Kalman filter update step, the gain
\[ K = \Sigma_{-} C^\top \bigl( C \Sigma_{-} C^\top + V \bigr)^{-1} \]
minimizes the posterior error covariance over all linear updates of the form \(\hat{x}_+ = \hat{x}_- + K'(y - C\hat{x}_-)\); at the optimum the covariance is \(\Sigma_+ = (I - KC)\Sigma_{-}\).
More precisely, for any matrix \(K'\), the posterior covariance \(\Sigma_+'\) under \(K'\) satisfies \(\Sigma_+' \succeq \Sigma_+\) in the positive semidefinite order.
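Before the proof, a quick numerical sanity check of the claim. The sketch below uses randomly generated matrices and hypothetical names (`Sigma_prior`, `posterior_cov`, and so on); it is an illustration, not part of the argument:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2  # state and measurement dimensions (arbitrary)

A = rng.standard_normal((n, n))
Sigma_prior = A @ A.T                     # Sigma_- (PSD by construction)
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
V = B @ B.T + np.eye(m)                   # V > 0

S = C @ Sigma_prior @ C.T + V             # innovation covariance
K = Sigma_prior @ C.T @ np.linalg.inv(S)  # Kalman gain

def posterior_cov(Kp):
    """Joseph-form posterior covariance for an arbitrary gain Kp."""
    I_KC = np.eye(n) - Kp @ C
    return I_KC @ Sigma_prior @ I_KC.T + Kp @ V @ Kp.T

Kp = K + 0.1 * rng.standard_normal(K.shape)   # any perturbed gain
diff = posterior_cov(Kp) - posterior_cov(K)
print(np.linalg.eigvalsh(diff).min())         # >= 0 (up to rounding): Sigma_+' >= Sigma_+
```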
Setup
Let \(\hat{x}_-\) denote the predicted estimate with prior error \(e_- = x - \hat{x}_-\) satisfying \(\mathbb{E}[e_-] = 0\) and \(\mathbb{E}[e_- e_-^\top] = \Sigma_- \succeq 0\), and let \(y = Cx + v\) with \(v \sim \mathcal{N}(0, V)\), \(V \succ 0\), independent of \(e_-\). Only the first two moments of \(v\) and the uncorrelatedness \(\mathbb{E}[e_- v^\top] = 0\) are used below, so Gaussianity is not required for this argument. A general linear update takes the form
\[ \hat{x}_+ = \hat{x}_- + K'(y - C\hat{x}_-), \]
so the posterior error is
\[ e_+ = x - \hat{x}_+ = (I - K'C)\,e_- - K'v. \]
Proof
The posterior covariance under an arbitrary gain \(K'\) is
\[ \Sigma_+' = \mathbb{E}[e_+ e_+^\top] = (I - K'C)\,\Sigma_-\,(I - K'C)^\top + K' V K'{}^\top, \]
where the cross terms \(\mathbb{E}[e_- v^\top]\) vanish by independence.
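This identity can be spot-checked by simulation. A minimal Monte Carlo sketch, with randomly generated matrices and Gaussian draws purely for convenience (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200_000

A = rng.standard_normal((n, n)); Sigma_prior = A @ A.T
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m)); V = B @ B.T + np.eye(m)
Kp = rng.standard_normal((n, m))              # arbitrary gain K'

# Draw e_- with covariance Sigma_- and v with covariance V, independently.
L = np.linalg.cholesky(Sigma_prior)
e_prior = rng.standard_normal((N, n)) @ L.T
v = rng.standard_normal((N, m)) @ np.linalg.cholesky(V).T

# e_+ = (I - K'C) e_-  -  K' v, one sample per row
e_post = e_prior @ (np.eye(n) - Kp @ C).T - v @ Kp.T
empirical = e_post.T @ e_post / N

I_KC = np.eye(n) - Kp @ C
analytic = I_KC @ Sigma_prior @ I_KC.T + Kp @ V @ Kp.T
print(np.abs(empirical - analytic).max())     # Monte Carlo error, shrinks like N**-0.5
```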
Let \(S = C\Sigma_- C^\top + V\) denote the innovation covariance. Expanding and collecting terms:
\[ \Sigma_+' = \Sigma_- - K'C\Sigma_- - \Sigma_- C^\top K'{}^\top + K'\bigl(C\Sigma_- C^\top + V\bigr)K'{}^\top = \Sigma_- - K'C\Sigma_- - \Sigma_- C^\top K'{}^\top + K' S K'{}^\top. \]
Write \(K' = K + \Delta\) where \(K = \Sigma_- C^\top S^{-1}\) is the Kalman gain. Substituting:
\[ \Sigma_+' = \Sigma_- - (K+\Delta)C\Sigma_- - \Sigma_- C^\top(K+\Delta)^\top + (K+\Delta)S(K+\Delta)^\top. \]
Separate into the Kalman-gain part and the \(\Delta\) terms. Using \(KS = \Sigma_- C^\top\) (which follows from \(K = \Sigma_- C^\top S^{-1}\)), the cross terms involving \(\Delta\) cancel:
\[ -\Delta C\Sigma_- - \Sigma_- C^\top \Delta^\top + KS\Delta^\top + \Delta SK^\top = -\Delta C\Sigma_- - \Sigma_- C^\top \Delta^\top + \Sigma_- C^\top \Delta^\top + \Delta C\Sigma_- = 0. \]
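The cancellation is easy to verify numerically; a sketch under the same definitions (matrices randomly generated, names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n)); Sigma = A @ A.T       # Sigma_-
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m)); V = B @ B.T + np.eye(m)

S = C @ Sigma @ C.T + V
K = Sigma @ C.T @ np.linalg.inv(S)
D = rng.standard_normal((n, m))                        # Delta, arbitrary

cross = -D @ C @ Sigma - Sigma @ C.T @ D.T + K @ S @ D.T + D @ S @ K.T
print(np.abs(cross).max())  # ~ 1e-13: the Delta cross terms cancel
```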
What remains is
\[ \Sigma_+' = \underbrace{\Sigma_- - KC\Sigma_- - \Sigma_- C^\top K^\top + KSK^\top}_{\Sigma_+} + \Delta S \Delta^\top. \]
The first group simplifies to \(\Sigma_+ = (I - KC)\Sigma_-\) (verifiable by expanding \((I-KC)\Sigma_-\) and noting \(KS = \Sigma_- C^\top\)). Therefore:
\[ \Sigma_+' = \Sigma_+ + \Delta S \Delta^\top. \]
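A numerical check of the full decomposition, again with randomly generated matrices (hypothetical names; `joseph` computes the covariance from the proof's first display):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n)); Sigma = A @ A.T
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m)); V = B @ B.T + np.eye(m)

S = C @ Sigma @ C.T + V
K = Sigma @ C.T @ np.linalg.inv(S)
D = rng.standard_normal((n, m))                        # Delta = K' - K

def joseph(Kx):
    I_KC = np.eye(n) - Kx @ C
    return I_KC @ Sigma @ I_KC.T + Kx @ V @ Kx.T

Sigma_post = (np.eye(n) - K @ C) @ Sigma               # Sigma_+ = (I - KC) Sigma_-
print(np.abs(joseph(K + D) - (Sigma_post + D @ S @ D.T)).max())  # ~ 1e-13
print(np.abs(joseph(K) - Sigma_post).max())            # Joseph form at K equals (I - KC) Sigma_-
```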
Since \(S \succ 0\) (because \(V \succ 0\)), we have \(\Delta S \Delta^\top \succeq 0\) for any \(\Delta\), with equality if and only if \(\Delta = 0\). Thus
\[ \Sigma_+' \succeq \Sigma_+, \]
with equality uniquely at \(K' = K\).
Remarks
- Minimizing \(\Sigma_+\) in the PSD sense simultaneously minimizes every scalar error criterion that is monotone in that order, such as the mean-squared error \(\mathbb{E}[\|e_+\|^2] = \operatorname{tr}(\Sigma_+)\) or the variance \(a^\top \Sigma_+ a\) of any linear functional of the error.
- Under Gaussian priors and likelihoods, \(\hat{x}_+ = \hat{x}_- + K(y - C\hat{x}_-)\) is not merely the best linear estimate but the full conditional mean \(\mathbb{E}[x \mid y_1, \ldots, y_t]\), so the Kalman filter is the MMSE estimator (not just linear-MMSE) in the Gaussian case (Kalman 1960).
- In practice the covariance update is written in the numerically stable Joseph form \((I - KC)\Sigma_-(I-KC)^\top + KVK^\top\), which equals \((I-KC)\Sigma_-\) only at the Kalman gain but remains positive semidefinite for any \(K'\); see the sketch below.
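To make the last remark concrete, here is a minimal Joseph-form measurement update as it might appear in a filter implementation. Function and variable names are illustrative, not taken from any particular library:

```python
import numpy as np

def kalman_update(x_prior, Sigma_prior, y, C, V):
    """One measurement update; the Joseph form keeps Sigma_post symmetric PSD."""
    S = C @ Sigma_prior @ C.T + V               # innovation covariance
    # K = Sigma_- C^T S^{-1}; solve with S (symmetric) instead of forming S^{-1}
    K = np.linalg.solve(S, C @ Sigma_prior).T
    x_post = x_prior + K @ (y - C @ x_prior)
    I_KC = np.eye(len(x_prior)) - K @ C
    Sigma_post = I_KC @ Sigma_prior @ I_KC.T + K @ V @ K.T
    return x_post, Sigma_post

# Example: scalar measurement of the first component of a 2-state system
x, P = kalman_update(
    x_prior=np.zeros(2),
    Sigma_prior=np.eye(2),
    y=np.array([1.0]),
    C=np.array([[1.0, 0.0]]),
    V=np.array([[0.5]]),
)
print(x, P)
```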