Singular Value Decomposition
Motivation
Every matrix represents a linear transformation. The singular value decomposition (SVD) factors any matrix into three interpretable pieces: a rotation, a scaling, and another rotation. This decomposition reveals the intrinsic geometry of the transformation and underlies a wide range of algorithms in data analysis, signal processing, and numerical linear algebra.
Definition
For any matrix \(A \in \mathbb{R}^{m \times n}\) of rank \(r\), the SVD is
\[ A = U \Sigma V^\top, \]
where:
- \(U \in \mathbb{R}^{m \times m}\) is orthogonal — the left singular vectors \(\mathbf{u}_1, \ldots, \mathbf{u}_m\) form its columns
- \(V \in \mathbb{R}^{n \times n}\) is orthogonal — the right singular vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) form its columns
- \(\Sigma \in \mathbb{R}^{m \times n}\) is zero everywhere except \(\Sigma_{ii} = \sigma_i \geq 0\) for \(i = 1, \ldots, \min(m,n)\)
The values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0\) are the singular values of \(A\).
Every real matrix has an SVD.
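The definition above can be checked numerically. A minimal sketch using NumPy (the matrix is random and illustrative): `np.linalg.svd` with `full_matrices=True` returns the square orthogonal factors \(U\) and \(V^\top\), and the rectangular \(\Sigma\) must be assembled from the returned vector of singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Full SVD: U is 4x4, Vt is 3x3, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the 4x3 rectangular Sigma with s on its diagonal.
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)               # A = U Sigma V^T
assert np.allclose(U.T @ U, np.eye(4))              # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(3))            # V orthogonal
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)   # sorted, nonnegative
```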
Singular Vectors
The columns \(\mathbf{u}_i\) of \(U\) and \(\mathbf{v}_i\) of \(V\) are the left and right singular vectors of \(A\). They come in matched pairs \((\mathbf{u}_i, \mathbf{v}_i)\) tied together by the singular value \(\sigma_i\) through
\[ A \mathbf{v}_i = \sigma_i \mathbf{u}_i, \qquad A^\top \mathbf{u}_i = \sigma_i \mathbf{v}_i. \]
Equivalently, a unit vector \(\mathbf{v} \in \mathbb{R}^n\) is a right singular vector with singular value \(\sigma \geq 0\) if there exists a unit vector \(\mathbf{u} \in \mathbb{R}^m\) satisfying both identities above; \(\mathbf{u}\) is then the matched left singular vector.
The right singular vectors form an orthonormal basis of \(\mathbb{R}^n\), and the left singular vectors form an orthonormal basis of \(\mathbb{R}^m\). They admit a variational characterization: \(\mathbf{v}_1\) maximizes \(\|A\mathbf{v}\|\) over unit vectors \(\mathbf{v}\), and each subsequent \(\mathbf{v}_k\) maximizes \(\|A\mathbf{v}\|\) over unit vectors orthogonal to \(\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}\). The maximum value at step \(k\) is \(\sigma_k\), and \(\mathbf{u}_k = A\mathbf{v}_k / \sigma_k\) whenever \(\sigma_k > 0\).
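The matched-pair identities and the variational characterization can both be verified empirically. In this sketch (random test matrix, illustrative sample size), \(\|A\mathbf{v}_1\| = \sigma_1\), no random unit vector achieves a larger value, and \(A\mathbf{v}_1/\sigma_1\) recovers \(\mathbf{u}_1\) up to sign:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A)
v1 = Vt[0]  # first right singular vector

# ||A v1|| equals sigma_1 ...
assert np.isclose(np.linalg.norm(A @ v1), s[0])

# ... and no random unit vector does better.
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= s[0] + 1e-12

# u1 = A v1 / sigma_1 recovers the matched left singular vector (up to sign).
u1 = A @ v1 / s[0]
assert np.isclose(abs(u1 @ U[:, 0]), 1.0)
```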
Geometric Interpretation
Rotate, scale, rotate
The SVD decomposes a linear map into an input rotation, axis-aligned scaling, and an output rotation.
The action of \(A\) on a unit vector \(\mathbf{x}\) decomposes as:
- \(V^\top\): rotate/reflect \(\mathbf{x}\) into the coordinate system of right singular vectors
- \(\Sigma\): scale each coordinate by \(\sigma_i\) (and embed into \(\mathbb{R}^m\))
- \(U\): rotate/reflect back into \(\mathbb{R}^m\)
The right singular vectors \(\mathbf{v}_i\) are the input directions that \(A\) maps to scalar multiples of \(\mathbf{u}_i\):
\[ A \mathbf{v}_i = \sigma_i \mathbf{u}_i. \]
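The three-step action can be traced explicitly. A short sketch (random matrix and vector, purely illustrative) that applies each stage in turn and confirms the composition equals \(A\mathbf{x}\):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))
U, s, Vt = np.linalg.svd(A, full_matrices=True)

x = rng.standard_normal(2)

# Step 1: rotate x into the right-singular-vector coordinate system.
c = Vt @ x                    # coordinates <v_i, x>

# Step 2: scale each coordinate by sigma_i and embed into R^3.
scaled = np.zeros(3)
scaled[:2] = s * c

# Step 3: rotate into the output space.
y = U @ scaled

assert np.allclose(y, A @ x)  # identical to applying A directly
```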
Relationship to Eigendecomposition
The singular vectors are eigenvectors of the associated symmetric matrices:
\[ A^\top A \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i, \qquad A A^\top \mathbf{u}_i = \sigma_i^2 \mathbf{u}_i. \]
The singular values are the square roots of the eigenvalues of \(A^\top A\) (equivalently \(AA^\top\)).
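A quick numerical check of this relationship (random matrix, illustrative): the eigenvalues of the symmetric matrix \(A^\top A\), sorted in descending order, are the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A (eigh returns them ascending).
eigvals = np.linalg.eigh(A.T @ A)[0][::-1]

# Clip tiny negative values caused by floating-point roundoff.
assert np.allclose(np.sqrt(np.clip(eigvals, 0, None)), s)
```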
Thin SVD
For \(m \geq n\), only the first \(n\) rows of \(\Sigma\) can be nonzero, so the last \(m - n\) columns of \(U\) are multiplied by zeros and contribute nothing to the product. The thin SVD discards them:
\[ A = U_n \Sigma_n V^\top, \quad U_n \in \mathbb{R}^{m \times n},\; \Sigma_n \in \mathbb{R}^{n \times n}. \]
This is the form returned by most numerical libraries.
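In NumPy, for example, passing `full_matrices=False` to `np.linalg.svd` returns the thin factors directly (random matrix, illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

# full_matrices=False returns the thin factors U_n (m x n) and Vt (n x n).
Un, s, Vt = np.linalg.svd(A, full_matrices=False)

assert Un.shape == (6, 3) and s.shape == (3,) and Vt.shape == (3, 3)
assert np.allclose(Un * s @ Vt, A)        # A = U_n Sigma_n V^T
assert np.allclose(Un.T @ Un, np.eye(3))  # columns still orthonormal
```

Note that `Un * s` scales the columns of `Un` by broadcasting, which is equivalent to `Un @ np.diag(s)` without forming the diagonal matrix.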
Low-Rank Approximation
The rank-\(k\) truncation
\[ A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^\top = U_k \Sigma_k V_k^\top \]
is the best rank-\(k\) approximation to \(A\) in both the Frobenius norm and the spectral norm (Eckart and Young, 1936).
The approximation error is \(\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}\).
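Both the truncation and the error formula are easy to verify numerically (random matrix, illustrative): the Frobenius error equals the root-sum-of-squares of the discarded singular values, and the spectral-norm error equals \(\sigma_{k+1}\).

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] * s[:k] @ Vt[:k]   # rank-k truncation U_k Sigma_k V_k^T

# Frobenius error matches sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
err = np.linalg.norm(A - Ak, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))

# Spectral-norm error is sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
```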
Applications
Principal component analysis. For a centered data matrix \(X \in \mathbb{R}^{n \times d}\) (rows are observations), the SVD \(X = U \Sigma V^\top\) gives the principal directions as the columns of \(V\), and the singular values satisfy \(\sigma_i = \sqrt{n \lambda_i}\), where \(\lambda_i\) are the eigenvalues of the covariance matrix \(\frac{1}{n} X^\top X\).
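A sketch of this correspondence on synthetic data (the data matrix and the \(\frac{1}{n}\) covariance convention are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 4))
X -= X.mean(axis=0)                    # center the columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Covariance with the 1/n convention; its eigenvectors are the columns of V.
C = X.T @ X / X.shape[0]
eigvals = np.linalg.eigh(C)[0][::-1]   # descending order

# sigma_i = sqrt(n * lambda_i)
assert np.allclose(s, np.sqrt(X.shape[0] * eigvals))
```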
Pseudoinverse. The Moore-Penrose pseudoinverse is \(A^+ = V \Sigma^+ U^\top\) where \(\Sigma^+\) replaces each nonzero \(\sigma_i\) with \(1/\sigma_i\).
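Built from the thin SVD, this construction agrees with NumPy's `np.linalg.pinv` (random full-column-rank matrix, illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert the singular values (all nonzero here: A has full column rank).
A_pinv = Vt.T * (1.0 / s) @ U.T        # V Sigma^+ U^T

assert np.allclose(A_pinv, np.linalg.pinv(A))

# For full column rank, A^+ is a left inverse of A.
assert np.allclose(A_pinv @ A, np.eye(3))
```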
Condition number. For full-rank \(A\), \(\kappa(A) = \sigma_1 / \sigma_{\min(m,n)}\) measures the sensitivity of the solution of \(A\mathbf{x} = \mathbf{b}\) to perturbations in \(A\) and \(\mathbf{b}\).
Numerical rank. The number of singular values exceeding a tolerance \(\varepsilon\) gives a stable definition of effective rank in the presence of noise.
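A sketch of numerical rank detection (the rank-2 construction, noise level, and tolerance are assumptions of this example): a matrix of true rank 2 perturbed by tiny noise has all singular values nonzero, yet only 2 exceed a sensible tolerance.

```python
import numpy as np

rng = np.random.default_rng(8)

# Build a matrix of true rank 2, then add small noise.
B = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
A = B + 1e-10 * rng.standard_normal((8, 6))

s = np.linalg.svd(A, compute_uv=False)

# All 6 singular values are nonzero, but only 2 exceed the tolerance.
tol = 1e-8 * s[0]
assert np.all(s > 0)
assert np.sum(s > tol) == 2
assert np.linalg.matrix_rank(A, tol=tol) == 2
```

Scaling the tolerance by \(\sigma_1\) makes the rank decision invariant under rescaling of \(A\), which is the convention `np.linalg.matrix_rank` also follows for its default tolerance.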