The Kalman Gain Minimizes the Posterior Covariance
Claim
In the Kalman filter update step, the gain
\[ K = \Sigma_{-} C^\top \bigl( C \Sigma_{-} C^\top + V \bigr)^{-1} \]
minimizes the posterior error covariance over all linear updates of the form \(\hat{x}_+ = \hat{x}_- + K'(y - C\hat{x}_-)\); at the optimum the covariance is \(\Sigma_+ = (I - KC)\Sigma_{-}\).
More precisely, for any matrix \(K'\), the posterior covariance \(\Sigma_+'\) under \(K'\) satisfies \(\Sigma_+' \succeq \Sigma_+\) in the positive semidefinite order.
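Before the proof, a quick numerical sanity check of the claim. The sketch below uses randomly generated matrices and hypothetical names (`Sigma_prior`, `posterior_cov`, and so on); it is an illustration, not part of the argument:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2  # state and measurement dimensions (arbitrary)

A = rng.standard_normal((n, n))
Sigma_prior = A @ A.T                     # Sigma_- (PSD by construction)
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
V = B @ B.T + np.eye(m)                   # V > 0

S = C @ Sigma_prior @ C.T + V             # innovation covariance
K = Sigma_prior @ C.T @ np.linalg.inv(S)  # Kalman gain

def posterior_cov(Kp):
    """Joseph-form posterior covariance for an arbitrary gain Kp."""
    I_KC = np.eye(n) - Kp @ C
    return I_KC @ Sigma_prior @ I_KC.T + Kp @ V @ Kp.T

Kp = K + 0.1 * rng.standard_normal(K.shape)   # any perturbed gain
diff = posterior_cov(Kp) - posterior_cov(K)
print(np.linalg.eigvalsh(diff).min())         # >= 0 (up to rounding): Sigma_+' >= Sigma_+
```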
Setup
Let \(\hat{x}_-\) denote the predicted estimate with prior error \(e_- = x - \hat{x}_-\) satisfying \(\mathbb{E}[e_-] = 0\) and \(\mathbb{E}[e_- e_-^\top] = \Sigma_- \succeq 0\), and let \(y = Cx + v\) with \(v \sim \mathcal{N}(0, V)\), \(V \succ 0\), independent of \(e_-\). Only the first two moments of \(v\) and the uncorrelatedness \(\mathbb{E}[e_- v^\top] = 0\) are used below, so Gaussianity is not required for this argument. A general linear update takes the form
\[ \hat{x}_+ = \hat{x}_- + K'(y - C\hat{x}_-), \]
so the posterior error is
\[ e_+ = x - \hat{x}_+ = (I - K'C)\,e_- - K'v. \]
Proof
The posterior covariance under an arbitrary gain \(K'\) is
\[ \Sigma_+' = \mathbb{E}[e_+ e_+^\top] = (I - K'C)\,\Sigma_-\,(I - K'C)^\top + K' V K'{}^\top, \]
where the cross terms \(\mathbb{E}[e_- v^\top]\) vanish by independence.
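This identity can be spot-checked by simulation. A minimal Monte Carlo sketch, with randomly generated matrices and Gaussian draws purely for convenience (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200_000

A = rng.standard_normal((n, n)); Sigma_prior = A @ A.T
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m)); V = B @ B.T + np.eye(m)
Kp = rng.standard_normal((n, m))              # arbitrary gain K'

# Draw e_- with covariance Sigma_- and v with covariance V, independently.
L = np.linalg.cholesky(Sigma_prior)
e_prior = rng.standard_normal((N, n)) @ L.T
v = rng.standard_normal((N, m)) @ np.linalg.cholesky(V).T

# e_+ = (I - K'C) e_-  -  K' v, one sample per row
e_post = e_prior @ (np.eye(n) - Kp @ C).T - v @ Kp.T
empirical = e_post.T @ e_post / N

I_KC = np.eye(n) - Kp @ C
analytic = I_KC @ Sigma_prior @ I_KC.T + Kp @ V @ Kp.T
print(np.abs(empirical - analytic).max())     # Monte Carlo error, shrinks like N**-0.5
```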
Let \(S = C\Sigma_- C^\top + V\) denote the innovation covariance. Expanding and collecting terms:
\[ \Sigma_+' = \Sigma_- - K'C\Sigma_- - \Sigma_- C^\top K'{}^\top + K'\bigl(C\Sigma_- C^\top + V\bigr)K'{}^\top = \Sigma_- - K'C\Sigma_- - \Sigma_- C^\top K'{}^\top + K' S K'{}^\top. \]
Write \(K' = K + \Delta\) where \(K = \Sigma_- C^\top S^{-1}\) is the Kalman gain. Substituting:
\[ \Sigma_+' = \Sigma_- - (K+\Delta)C\Sigma_- - \Sigma_- C^\top(K+\Delta)^\top + (K+\Delta)S(K+\Delta)^\top. \]
Separate into the Kalman-gain part and the \(\Delta\) terms. Using \(KS = \Sigma_- C^\top\) (which follows from \(K = \Sigma_- C^\top S^{-1}\)), the cross terms involving \(\Delta\) cancel:
\[ -\Delta C\Sigma_- - \Sigma_- C^\top \Delta^\top + KS\Delta^\top + \Delta SK^\top = -\Delta C\Sigma_- - \Sigma_- C^\top \Delta^\top + \Sigma_- C^\top \Delta^\top + \Delta C\Sigma_- = 0. \]
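The cancellation is easy to verify numerically; a sketch under the same definitions (matrices randomly generated, names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n)); Sigma = A @ A.T       # Sigma_-
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m)); V = B @ B.T + np.eye(m)

S = C @ Sigma @ C.T + V
K = Sigma @ C.T @ np.linalg.inv(S)
D = rng.standard_normal((n, m))                        # Delta, arbitrary

cross = -D @ C @ Sigma - Sigma @ C.T @ D.T + K @ S @ D.T + D @ S @ K.T
print(np.abs(cross).max())  # ~ 1e-13: the Delta cross terms cancel
```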
What remains is
\[ \Sigma_+' = \underbrace{\Sigma_- - KC\Sigma_- - \Sigma_- C^\top K^\top + KSK^\top}_{\Sigma_+} + \Delta S \Delta^\top. \]
The first group simplifies to \(\Sigma_+ = (I - KC)\Sigma_-\) (verifiable by expanding \((I-KC)\Sigma_-\) and noting \(KS = \Sigma_- C^\top\)). Therefore:
\[ \Sigma_+' = \Sigma_+ + \Delta S \Delta^\top. \]
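A numerical check of the full decomposition, again with randomly generated matrices (hypothetical names; `joseph` computes the covariance from the proof's first display):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n)); Sigma = A @ A.T
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, m)); V = B @ B.T + np.eye(m)

S = C @ Sigma @ C.T + V
K = Sigma @ C.T @ np.linalg.inv(S)
D = rng.standard_normal((n, m))                        # Delta = K' - K

def joseph(Kx):
    I_KC = np.eye(n) - Kx @ C
    return I_KC @ Sigma @ I_KC.T + Kx @ V @ Kx.T

Sigma_post = (np.eye(n) - K @ C) @ Sigma               # Sigma_+ = (I - KC) Sigma_-
print(np.abs(joseph(K + D) - (Sigma_post + D @ S @ D.T)).max())  # ~ 1e-13
print(np.abs(joseph(K) - Sigma_post).max())            # Joseph form at K equals (I - KC) Sigma_-
```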
Since \(S \succ 0\) (because \(V \succ 0\)), we have \(\Delta S \Delta^\top \succeq 0\) for any \(\Delta\), with equality if and only if \(\Delta = 0\). Thus
\[ \Sigma_+' \succeq \Sigma_+, \]
with equality uniquely at \(K' = K\).
Remarks
- Minimizing \(\Sigma_+\) in the PSD sense simultaneously minimizes every scalar error criterion that is monotone in that order, such as the mean-squared error \(\mathbb{E}[\|e_+\|^2] = \operatorname{tr}(\Sigma_+)\) or the variance \(a^\top \Sigma_+ a\) of any linear functional of the error.
- Under Gaussian priors and likelihoods, \(\hat{x}_+ = \hat{x}_- + K(y - C\hat{x}_-)\) is not merely the best linear estimate but the full conditional mean \(\mathbb{E}[x \mid y_1, \ldots, y_t]\), so the Kalman filter is the MMSE estimator (not just linear-MMSE) in the Gaussian case (Kalman 1960).
- In practice the covariance update is written in the numerically stable Joseph form \((I - KC)\Sigma_-(I-KC)^\top + KVK^\top\), which equals \((I-KC)\Sigma_-\) only at the Kalman gain but remains positive semidefinite for any \(K'\); see the sketch below.
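To make the last remark concrete, here is a minimal Joseph-form measurement update as it might appear in a filter implementation. Function and variable names are illustrative, not taken from any particular library:

```python
import numpy as np

def kalman_update(x_prior, Sigma_prior, y, C, V):
    """One measurement update; the Joseph form keeps Sigma_post symmetric PSD."""
    S = C @ Sigma_prior @ C.T + V               # innovation covariance
    # K = Sigma_- C^T S^{-1}; solve with S (symmetric) instead of forming S^{-1}
    K = np.linalg.solve(S, C @ Sigma_prior).T
    x_post = x_prior + K @ (y - C @ x_prior)
    I_KC = np.eye(len(x_prior)) - K @ C
    Sigma_post = I_KC @ Sigma_prior @ I_KC.T + K @ V @ K.T
    return x_post, Sigma_post

# Example: scalar measurement of the first component of a 2-state system
x, P = kalman_update(
    x_prior=np.zeros(2),
    Sigma_prior=np.eye(2),
    y=np.array([1.0]),
    C=np.array([[1.0, 0.0]]),
    V=np.array([[0.5]]),
)
print(x, P)
```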