Singular Value Decomposition
Motivation
Every matrix represents a linear transformation. The singular value decomposition (SVD) factors any matrix into three interpretable pieces: a rotation, a scaling, and another rotation. This decomposition reveals the intrinsic geometry of the transformation and underlies a wide range of algorithms in data analysis, signal processing, and numerical linear algebra.
Definition
For any matrix \(A \in \mathbb{R}^{m \times n}\) of rank \(r\), the SVD is
\[ A = U \Sigma V^\top, \]
where:
- \(U \in \mathbb{R}^{m \times m}\) is orthogonal — the left singular vectors \(\mathbf{u}_1, \ldots, \mathbf{u}_m\) form its columns
- \(V \in \mathbb{R}^{n \times n}\) is orthogonal — the right singular vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) form its columns
- \(\Sigma \in \mathbb{R}^{m \times n}\) is zero everywhere except \(\Sigma_{ii} = \sigma_i \geq 0\) for \(i = 1, \ldots, \min(m,n)\)
The values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0\) are the singular values of \(A\).
Every real matrix has an SVD.
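The definition above can be checked numerically. A minimal sketch using NumPy (the matrix is random and illustrative): `np.linalg.svd` with `full_matrices=True` returns the square orthogonal factors \(U\) and \(V^\top\), and the rectangular \(\Sigma\) must be assembled from the returned vector of singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Full SVD: U is 4x4, Vt is 3x3, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the 4x3 rectangular Sigma with s on its diagonal.
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)               # A = U Sigma V^T
assert np.allclose(U.T @ U, np.eye(4))              # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(3))            # V orthogonal
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)   # sorted, nonnegative
```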
Singular Vectors
The columns \(\mathbf{u}_i\) of \(U\) and \(\mathbf{v}_i\) of \(V\) are the left and right singular vectors of \(A\). They come in matched pairs \((\mathbf{u}_i, \mathbf{v}_i)\) tied together by the singular value \(\sigma_i\) through
\[ A \mathbf{v}_i = \sigma_i \mathbf{u}_i, \qquad A^\top \mathbf{u}_i = \sigma_i \mathbf{v}_i. \]
Equivalently, a unit vector \(\mathbf{v} \in \mathbb{R}^n\) is a right singular vector with singular value \(\sigma \geq 0\) if there exists a unit vector \(\mathbf{u} \in \mathbb{R}^m\) satisfying both identities above; \(\mathbf{u}\) is then the matched left singular vector.
The right singular vectors form an orthonormal basis of \(\mathbb{R}^n\), and the left singular vectors form an orthonormal basis of \(\mathbb{R}^m\). They admit a variational characterization: \(\mathbf{v}_1\) maximizes \(\|A\mathbf{v}\|\) over unit vectors \(\mathbf{v}\), and each subsequent \(\mathbf{v}_k\) maximizes \(\|A\mathbf{v}\|\) over unit vectors orthogonal to \(\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}\). The maximum value at step \(k\) is \(\sigma_k\), and \(\mathbf{u}_k = A\mathbf{v}_k / \sigma_k\) whenever \(\sigma_k > 0\).
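The matched-pair identities and the variational characterization can both be verified empirically. In this sketch (random test matrix, illustrative sample size), \(\|A\mathbf{v}_1\| = \sigma_1\), no random unit vector achieves a larger value, and \(A\mathbf{v}_1/\sigma_1\) recovers \(\mathbf{u}_1\) up to sign:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A)
v1 = Vt[0]  # first right singular vector

# ||A v1|| equals sigma_1 ...
assert np.isclose(np.linalg.norm(A @ v1), s[0])

# ... and no random unit vector does better.
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= s[0] + 1e-12

# u1 = A v1 / sigma_1 recovers the matched left singular vector (up to sign).
u1 = A @ v1 / s[0]
assert np.isclose(abs(u1 @ U[:, 0]), 1.0)
```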
Geometric Interpretation
Rotate, scale, rotate
The SVD decomposes a linear map into an input rotation, axis-aligned scaling, and an output rotation.
The action of \(A\) on a unit vector \(\mathbf{x}\) decomposes as:
- \(V^\top\): rotate/reflect \(\mathbf{x}\) into the coordinate system of right singular vectors
- \(\Sigma\): scale each coordinate by \(\sigma_i\) (and embed into \(\mathbb{R}^m\))
- \(U\): rotate/reflect back into \(\mathbb{R}^m\)
The right singular vectors \(\mathbf{v}_i\) are the input directions that \(A\) maps to scalar multiples of \(\mathbf{u}_i\):
\[ A \mathbf{v}_i = \sigma_i \mathbf{u}_i. \]
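The three-step action can be traced explicitly. A short sketch (random matrix and vector, purely illustrative) that applies each stage in turn and confirms the composition equals \(A\mathbf{x}\):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))
U, s, Vt = np.linalg.svd(A, full_matrices=True)

x = rng.standard_normal(2)

# Step 1: rotate x into the right-singular-vector coordinate system.
c = Vt @ x                    # coordinates <v_i, x>

# Step 2: scale each coordinate by sigma_i and embed into R^3.
scaled = np.zeros(3)
scaled[:2] = s * c

# Step 3: rotate into the output space.
y = U @ scaled

assert np.allclose(y, A @ x)  # identical to applying A directly
```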
Relationship to Eigendecomposition
The singular vectors are eigenvectors of the associated symmetric matrices:
\[ A^\top A \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i, \qquad A A^\top \mathbf{u}_i = \sigma_i^2 \mathbf{u}_i. \]
The singular values are the square roots of the eigenvalues of \(A^\top A\) (equivalently \(AA^\top\)).
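A quick numerical check of this relationship (random matrix, illustrative): the eigenvalues of the symmetric matrix \(A^\top A\), sorted in descending order, are the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A (eigh returns them ascending).
eigvals = np.linalg.eigh(A.T @ A)[0][::-1]

# Clip tiny negative values caused by floating-point roundoff.
assert np.allclose(np.sqrt(np.clip(eigvals, 0, None)), s)
```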
Thin SVD
For \(m \geq n\), only the first \(n\) rows of \(\Sigma\) can be nonzero, so the last \(m - n\) columns of \(U\) are multiplied by zeros and contribute nothing to the product. The thin SVD discards them:
\[ A = U_n \Sigma_n V^\top, \quad U_n \in \mathbb{R}^{m \times n},\; \Sigma_n \in \mathbb{R}^{n \times n}. \]
This is the form returned by most numerical libraries.
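In NumPy, for example, passing `full_matrices=False` to `np.linalg.svd` returns the thin factors directly (random matrix, illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

# full_matrices=False returns the thin factors U_n (m x n) and Vt (n x n).
Un, s, Vt = np.linalg.svd(A, full_matrices=False)

assert Un.shape == (6, 3) and s.shape == (3,) and Vt.shape == (3, 3)
assert np.allclose(Un * s @ Vt, A)        # A = U_n Sigma_n V^T
assert np.allclose(Un.T @ Un, np.eye(3))  # columns still orthonormal
```

Note that `Un * s` scales the columns of `Un` by broadcasting, which is equivalent to `Un @ np.diag(s)` without forming the diagonal matrix.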
Low-Rank Approximation
The rank-\(k\) truncation
\[ A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^\top = U_k \Sigma_k V_k^\top \]
is the best rank-\(k\) approximation to \(A\) in both the Frobenius norm and the spectral norm (Eckart and Young, 1936).
The approximation error is \(\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}\).
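Both the truncation and the error formula are easy to verify numerically (random matrix, illustrative): the Frobenius error equals the root-sum-of-squares of the discarded singular values, and the spectral-norm error equals \(\sigma_{k+1}\).

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] * s[:k] @ Vt[:k]   # rank-k truncation U_k Sigma_k V_k^T

# Frobenius error matches sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
err = np.linalg.norm(A - Ak, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))

# Spectral-norm error is sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
```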
Applications
Principal component analysis. For a centered data matrix \(X \in \mathbb{R}^{n \times d}\) (rows are observations), the SVD \(X = U \Sigma V^\top\) gives the principal directions as the columns of \(V\), and the singular values satisfy \(\sigma_i = \sqrt{n \lambda_i}\), where \(\lambda_i\) are the eigenvalues of the covariance matrix \(\frac{1}{n} X^\top X\).
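A sketch of this correspondence on synthetic data (the data matrix and the \(\frac{1}{n}\) covariance convention are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 4))
X -= X.mean(axis=0)                    # center the columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Covariance with the 1/n convention; its eigenvectors are the columns of V.
C = X.T @ X / X.shape[0]
eigvals = np.linalg.eigh(C)[0][::-1]   # descending order

# sigma_i = sqrt(n * lambda_i)
assert np.allclose(s, np.sqrt(X.shape[0] * eigvals))
```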
Pseudoinverse. The Moore-Penrose pseudoinverse is \(A^+ = V \Sigma^+ U^\top\) where \(\Sigma^+\) replaces each nonzero \(\sigma_i\) with \(1/\sigma_i\).
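Built from the thin SVD, this construction agrees with NumPy's `np.linalg.pinv` (random full-column-rank matrix, illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert the singular values (all nonzero here: A has full column rank).
A_pinv = Vt.T * (1.0 / s) @ U.T        # V Sigma^+ U^T

assert np.allclose(A_pinv, np.linalg.pinv(A))

# For full column rank, A^+ is a left inverse of A.
assert np.allclose(A_pinv @ A, np.eye(3))
```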
Condition number. For full-rank \(A\), \(\kappa(A) = \sigma_1 / \sigma_{\min(m,n)}\) measures the sensitivity of the solution of \(A\mathbf{x} = \mathbf{b}\) to perturbations in \(A\) and \(\mathbf{b}\).
Numerical rank. The number of singular values exceeding a tolerance \(\varepsilon\) gives a stable definition of effective rank in the presence of noise.
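A sketch of numerical rank detection (the rank-2 construction, noise level, and tolerance are assumptions of this example): a matrix of true rank 2 perturbed by tiny noise has all singular values nonzero, yet only 2 exceed a sensible tolerance.

```python
import numpy as np

rng = np.random.default_rng(8)

# Build a matrix of true rank 2, then add small noise.
B = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
A = B + 1e-10 * rng.standard_normal((8, 6))

s = np.linalg.svd(A, compute_uv=False)

# All 6 singular values are nonzero, but only 2 exceed the tolerance.
tol = 1e-8 * s[0]
assert np.all(s > 0)
assert np.sum(s > tol) == 2
assert np.linalg.matrix_rank(A, tol=tol) == 2
```

Scaling the tolerance by \(\sigma_1\) makes the rank decision invariant under rescaling of \(A\), which is the convention `np.linalg.matrix_rank` also follows for its default tolerance.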