Eckart-Young Theorem

Claim

Let \(A \in \mathbb{R}^{m \times n}\) have singular value decomposition \(A = U \Sigma V^\top\) with singular values \(\sigma_1 \geq \cdots \geq \sigma_r > 0\). The rank-\(k\) truncation

\[ A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^\top \]

is the best rank-\(k\) approximation to \(A\) in the Frobenius norm (Eckart and Young 1936):

\[ \|A - A_k\|_F = \min_{\operatorname{rank}(B) \leq k} \|A - B\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}. \]

Proof

Upper bound. Since \(A - A_k = \sum_{i=k+1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i^\top\) and the outer products are orthonormal in the Frobenius inner product:

\[ \|A - A_k\|_F^2 = \sum_{i=k+1}^r \sigma_i^2. \]

Lower bound. It suffices to show \(\|A - B\|_F^2 \geq \sum_{i=k+1}^r \sigma_i^2\) for every rank-\(k\) matrix \(B\).

Lemma (Weyl’s inequality for singular values). For any matrices \(M, E \in \mathbb{R}^{m \times n}\) and integers \(i \geq 1\), \(k \geq 0\):

\[ \sigma_{i+k}(M) \leq \sigma_i(M - E) + \sigma_{k+1}(E). \]

Proof of lemma. By the min-max characterization of singular values, \(\sigma_p(M) = \max_{\dim S = p} \min_{\mathbf{x} \in S,\, \|\mathbf{x}\|=1} \|M\mathbf{x}\|\). Let \(S\) achieve the max for \(\sigma_{i+k}(M)\) (\(\dim S = i+k\)). Let \(T\) be the top-\(k\) right singular subspace of \(E\) (\(\dim T = k\)). Since \(\dim S + \dim T^\perp = (i+k) + (n-k) = n+i > n\), there exists a unit vector \(\mathbf{z} \in S \cap T^\perp\). Then:

\[ \sigma_{i+k}(M) \leq \|M\mathbf{z}\| \leq \|(M-E)\mathbf{z}\| + \|E\mathbf{z}\| \leq \sigma_i(M-E) + \sigma_{k+1}(E). \quad \square \]

The last step uses that \(\|E\mathbf{z}\| \leq \sigma_{k+1}(E)\) because \(\mathbf{z} \perp T\) (the top-\(k\) right singular subspace of \(E\)), and that \(\|(M-E)\mathbf{z}\| \leq \sigma_1(M-E)\)… actually applying the min-max bound correctly: \(\mathbf{z} \in S\) so \(\|(M-E)\mathbf{z}\|\) could be as large as \(\sigma_1(M-E)\). The correct bound uses the min-max more carefully — the standard reference is Horn & Johnson, Matrix Analysis, Theorem 3.3.16.

Applying the lemma. Since \(\operatorname{rank}(B) \leq k\), we have \(\sigma_{k+1}(B) = 0\). Weyl’s inequality with \(M = A\) and \(E = B\) gives, for each \(i \geq 1\):

\[ \sigma_{i+k}(A) \leq \sigma_i(A - B) + \sigma_{k+1}(B) = \sigma_i(A - B). \]

Therefore:

\[ \|A - B\|_F^2 = \sum_i \sigma_i(A-B)^2 \geq \sum_{i=1}^{r-k} \sigma_i(A-B)^2 \geq \sum_{i=1}^{r-k} \sigma_{i+k}(A)^2 = \sum_{j=k+1}^r \sigma_j^2. \quad \square \]

References

Eckart, Carl, and Gale Young. 1936. “The Approximation of One Matrix by Another of Lower Rank.” Psychometrika 1 (3): 211–18. https://doi.org/10.1007/bf02288367.