Diagonalization
Motivation
A diagonal matrix is the simplest kind of linear transformation — it scales each coordinate axis independently and does nothing else. Most matrices are not diagonal, but many of them are equivalent to a diagonal matrix in the sense that they describe the same linear map in a different coordinate system. A matrix that admits such a coordinate change is called diagonalizable, and the procedure of finding the coordinate change is called diagonalization (Trefethen and Bau 1997).
Diagonalization tells us when a matrix’s behavior can be fully understood as independent stretches along axes — and when it cannot. The criterion is sharp, the obstruction is geometric, and the consequences run through everything from solving linear recurrences to deciding whether eigendecomposition is possible.
Definition
A square matrix \(A \in \mathbb{R}^{n \times n}\) is diagonalizable if there exists an invertible matrix \(P\) and a diagonal matrix \(D\) such that
\[ A = P D P^{-1}. \]
Equivalently: \(A\) is similar to a diagonal matrix. The columns of \(P\) are eigenvectors of \(A\) and the diagonal of \(D\) holds the corresponding eigenvalues.
The factorization itself is the eigendecomposition; “diagonalization” is the property — does it exist? — and the procedure for constructing it.
A non-diagonalizable square matrix is called defective.
A diagonalizable matrix
\[ A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}. \]
Eigenvalues \(\lambda_1 = 2, \lambda_2 = 3\) are distinct, so there are two linearly independent eigenvectors and \(A\) is diagonalizable.
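To make the coordinate change concrete, here is a minimal SymPy sketch (assuming SymPy is available) that constructs \(P\) and \(D\) for this matrix and verifies \(A = P D P^{-1}\):

```python
import sympy as sp

A = sp.Matrix([[2, 1],
               [0, 3]])
P, D = A.diagonalize()                    # raises an error if A were defective
print(P)                                  # columns are eigenvectors of A
print(D)                                  # diag(2, 3), matching the column order of P
print(sp.simplify(P * D * P.inv() - A))   # zero matrix: A = P D P^{-1}
```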
A defective matrix
\[ J = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \]
The only eigenvalue is \(\lambda = 2\) (with algebraic multiplicity \(2\)), but solving \((J - 2I)\mathbf{v} = \mathbf{0}\) gives only the one-dimensional eigenspace \(\{(t, 0)^\top : t \in \mathbb{R}\}\). There are not two linearly independent eigenvectors, so \(J\) is not diagonalizable.
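The same check in SymPy (again assumed available) reports the failure directly: there is only one eigenvector for an eigenvalue of algebraic multiplicity two.

```python
import sympy as sp

J = sp.Matrix([[2, 1],
               [0, 2]])
print(J.is_diagonalizable())   # False
print(J.eigenvects())          # [(2, 2, [Matrix([[1], [0]])])]: one eigenvector, multiplicity 2
```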
Criterion for Diagonalizability
A matrix \(A \in \mathbb{R}^{n \times n}\) is diagonalizable (over \(\mathbb{R}\)) if and only if it has \(n\) linearly independent eigenvectors.
This restates the existence of an invertible \(P\) in the definition. To check it in practice, decompose the criterion by eigenvalue:
- For each eigenvalue \(\lambda\), let \(m_\lambda\) be its algebraic multiplicity — the multiplicity of \(\lambda\) as a root of the characteristic polynomial.
- Let \(g_\lambda\) be its geometric multiplicity — the dimension of the eigenspace \(\ker(A - \lambda I)\), equivalently the number of linearly independent eigenvectors for \(\lambda\).
- Always \(1 \le g_\lambda \le m_\lambda\).
\(A\) is diagonalizable over \(\mathbb{R}\) if and only if two conditions hold:
- All eigenvalues are real (the characteristic polynomial splits over \(\mathbb{R}\)), and
- \(g_\lambda = m_\lambda\) for every eigenvalue \(\lambda\).
When the first condition fails, \(A\) is not diagonalizable over \(\mathbb{R}\) but may still be diagonalizable over \(\mathbb{C}\). When the second fails for some eigenvalue, \(A\) is defective over every extension field, \(\mathbb{C}\) included.
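A sketch of the first failure mode, using a \(90^\circ\) rotation (SymPy assumed available): its eigenvalues are \(\pm i\), so it is not diagonalizable over \(\mathbb{R}\), but it is over \(\mathbb{C}\).

```python
import sympy as sp

R = sp.Matrix([[0, -1],
               [1,  0]])        # rotation by 90 degrees
print(R.eigenvals())            # {I: 1, -I: 1} (order may vary): no real eigenvalues
P, D = R.diagonalize()          # succeeds over C, with complex P and D
print(D)
```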
Two convenient sufficient conditions
- Distinct eigenvalues. If \(A\) has \(n\) distinct eigenvalues, it is diagonalizable. Eigenvectors for distinct eigenvalues are automatically linearly independent, so \(n\) of them give the required basis. The example \(A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}\) above falls into this case.
- Real and symmetric. Every real symmetric matrix is diagonalizable, with an orthonormal basis of eigenvectors and real eigenvalues. This is the spectral theorem, and it is the version used in principal components analysis and to prove the singular value decomposition.
Neither condition is necessary: a \(2 \times 2\) matrix like \(\begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} = 3I\) has a repeated eigenvalue but is trivially diagonal — and hence diagonalizable.
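The symmetric case can be sketched numerically with NumPy (assumed available): eigh is the symmetric-specialized routine, and the eigenvector matrix it returns is orthonormal, so its transpose plays the role of \(P^{-1}\).

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                    # real symmetric, eigenvalues 1 and 3
w, Q = np.linalg.eigh(S)                      # routine specialized to symmetric input
print(w)                                      # [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))        # True: orthonormal eigenvectors
print(np.allclose(Q @ np.diag(w) @ Q.T, S))   # True: S = Q diag(w) Q^T
```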
Algebraic vs. Geometric Multiplicity
The single point on which diagonalizability turns is whether the algebraic and geometric multiplicities agree at every eigenvalue. The interpretations are:
- Algebraic multiplicity counts how many times \(\lambda\) appears as a root of \(\det(\lambda I - A)\). It is a polynomial-counting fact.
- Geometric multiplicity counts how many linearly independent eigenvectors share that eigenvalue. It is a dimension-of-eigenspace fact.
Geometric multiplicity can be smaller than algebraic multiplicity, but never larger. When it is strictly smaller, eigenvectors “run out” before reaching the algebraic count, and no basis of eigenvectors can be assembled.
The defective \(J\) revisited
\[ J = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \]
Characteristic polynomial: \((\lambda - 2)^2\), so \(m_2 = 2\). Eigenspace: \(\ker(J - 2I) = \ker\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \operatorname{span}((1, 0)^\top)\), so \(g_2 = 1\). Since \(g_2 < m_2\), the matrix is defective.
The geometric content is that \(J\) is not “pure scaling” along two directions: it scales by \(2\) and simultaneously shears, adding a multiple of the second coordinate to the first. Shearing is the canonical example of behavior that cannot be unraveled by a coordinate change into independent scalings.
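Both multiplicities can be read off directly in SymPy (assumed available): the characteristic polynomial gives \(m_2\), and the dimension of \(\ker(J - 2I)\) gives \(g_2\).

```python
import sympy as sp

J = sp.Matrix([[2, 1],
               [0, 2]])
lam = sp.symbols('lambda')
print(sp.factor(J.charpoly(lam).as_expr()))   # (lambda - 2)**2, so m_2 = 2
print(len((J - 2 * sp.eye(2)).nullspace()))   # 1, so g_2 = 1 < m_2
```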
Procedure
The standard procedure for diagonalization is the same as for eigendecomposition:
diagonalize(A):
    compute the characteristic polynomial det(λI - A)
    find its roots λ_1, ..., λ_k (with multiplicities)
    for each distinct eigenvalue λ:
        find a basis of the eigenspace ker(A - λI)
    collect all eigenvectors into the columns of P
    if total count < n:
        A is defective, no diagonalization exists
    otherwise:
        D = diag of the eigenvalues (in the same order)
        A = P D P^{-1}
The check “total count \(< n\)” is where the algebraic-vs-geometric multiplicity comparison shows up. If for any eigenvalue the eigenspace dimension is smaller than the algebraic multiplicity, the total count of eigenvectors falls short of \(n\).
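A runnable version of this procedure, sketched with SymPy for exact arithmetic (the helper name diagonalize_or_fail is ours, not a library routine; since SymPy computes eigenvalues over \(\mathbb{C}\), this decides diagonalizability over the complex numbers):

```python
import sympy as sp

def diagonalize_or_fail(A: sp.Matrix):
    """Return (P, D) with A = P*D*P**-1, or None if A is defective."""
    cols, diag_entries = [], []
    # eigenvects() yields (eigenvalue, algebraic multiplicity, eigenspace basis)
    for lam, alg_mult, basis in A.eigenvects():
        if len(basis) < alg_mult:      # geometric < algebraic multiplicity
            return None                # defective: eigenvectors run out
        for v in basis:
            cols.append(v)
            diag_entries.append(lam)
    P = sp.Matrix.hstack(*cols)        # eigenvectors as columns
    D = sp.diag(*diag_entries)         # eigenvalues in the same order
    return P, D

print(diagonalize_or_fail(sp.Matrix([[2, 1], [0, 3]])))   # (P, D) with D = diag(2, 3)
print(diagonalize_or_fail(sp.Matrix([[2, 1], [0, 2]])))   # None: defective
```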
Worked example
\[ A = \begin{pmatrix} 5 & -1 & 0 \\ 0 & 4 & 0 \\ 0 & 1 & 4 \end{pmatrix}. \]
Characteristic polynomial: \((5 - \lambda)(4 - \lambda)^2\). Eigenvalues \(\lambda_1 = 5\) (algebraic multiplicity \(1\)) and \(\lambda_2 = 4\) (algebraic multiplicity \(2\)).
For \(\lambda_1 = 5\):
\[ A - 5I = \begin{pmatrix} 0 & -1 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}. \]
The first two rows force \(v_2 = 0\), and the third row then forces \(v_3 = 0\), leaving \(v_1\) free. This gives \(\mathbf{p}_1 = (1, 0, 0)^\top\) and \(g_5 = 1 = m_5\).
For \(\lambda_2 = 4\):
\[ A - 4I = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \]
Row reducing: \(v_2 = 0\) from the third row, then \(v_1 = 0\) from the first, with \(v_3\) free, giving the single eigenvector \(\mathbf{p}_2 = (0, 0, 1)^\top\). So \(g_4 = 1 < 2 = m_4\), and \(A\) is defective — not diagonalizable.
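The same conclusion, checked with SymPy (assumed available):

```python
import sympy as sp

A = sp.Matrix([[5, -1, 0],
               [0,  4, 0],
               [0,  1, 4]])
lam = sp.symbols('lambda')
print(sp.factor(A.charpoly(lam).as_expr()))   # (lambda - 5)*(lambda - 4)**2
print(len((A - 4 * sp.eye(3)).nullspace()))   # 1: g_4 = 1 < m_4 = 2
print(A.is_diagonalizable())                  # False
```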
When Diagonalization Is Impossible
A defective matrix has no eigendecomposition, but it still has structure. Two standard substitutes:
- Jordan canonical form (Jordan 1870). Every square matrix over \(\mathbb{C}\) is similar to a block-diagonal matrix whose blocks are Jordan blocks \(J_k(\lambda)\): \(k \times k\) upper triangular matrices with \(\lambda\) on the diagonal and \(1\)s on the superdiagonal. For diagonalizable matrices, all Jordan blocks have size \(1\) and the form reduces to the eigendecomposition. The Jordan form is the algebraic completion of diagonalization.
- Singular value decomposition. \(A = U \Sigma V^\top\) exists for every matrix, diagonalizable or not. The singular values \(\sigma_i\) play the role of “size of the action along each direction” without requiring that the input and output directions coincide. See singular value decomposition.
In numerical work, the SVD is preferred to the Jordan form, which is unstable under perturbation: a defective matrix can become diagonalizable under an arbitrarily small change in entries, with the Jordan structure collapsing.
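For instance, the defective \(J\) from above has no eigendecomposition but a perfectly good SVD, as a NumPy sketch (NumPy assumed available) shows:

```python
import numpy as np

J = np.array([[2.0, 1.0],
              [0.0, 2.0]])
U, s, Vt = np.linalg.svd(J)
print(s)                                    # two positive singular values, about 2.56 and 1.56
print(np.allclose(U @ np.diag(s) @ Vt, J))  # True: J = U Sigma V^T
```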
Computational Notes
Whether \(A\) is diagonalizable is decided by checking eigenspace dimensions. In exact symbolic computation this is straightforward. In numerical work the binary “diagonalizable vs. defective” distinction is fragile:
- Near-defective matrices have well-defined eigenvalues but ill-conditioned eigenvectors, and the matrix \(P\) used in the decomposition has large \(\|P\| \, \|P^{-1}\|\). The decomposition \(A = PDP^{-1}\) exists but is numerically unreliable.
- The condition number of the eigenvector matrix, \(\kappa(P)\), controls how much \(\|PDP^{-1}\|\) amplifies perturbations of \(D\). For orthonormal \(P\) (e.g., symmetric \(A\)), \(\kappa(P) = 1\) and the decomposition is perfectly conditioned. For nearly-defective \(A\), \(\kappa(P)\) is huge.
Symmetric and other “normal” matrices (\(AA^\top = A^\top A\)) are the favorable case: they are always diagonalizable with orthonormal eigenvectors. Non-normal matrices are where defectiveness and ill-conditioning live.
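A small NumPy sketch of the conditioning issue (the perturbation size \(10^{-8}\) is an arbitrary choice for illustration): nudging the defective \(J\) makes it diagonalizable, but the eigenvector matrix is nearly singular.

```python
import numpy as np

eps = 1e-8
A_near = np.array([[2.0, 1.0],
                   [eps, 2.0]])   # tiny perturbation of the defective J
w, P = np.linalg.eig(A_near)
print(w)                          # eigenvalues approximately 2 +/- 1e-4
print(np.linalg.cond(P))          # large (order 1e4): eigenvectors nearly parallel
```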
Why Diagonalization Matters
Diagonalizable matrices are the matrices whose long-run behavior is decoupled — they can be analyzed one eigendirection at a time, and properties that hold component-wise (positivity, exponential growth, oscillation) transfer cleanly. This is why diagonalization is the gateway to:
- Closed-form matrix powers \(A^k = P D^k P^{-1}\), used in linear recurrences and Markov chain analysis (a short sketch follows this list).
- Matrix exponentials \(e^A = P e^D P^{-1}\), used to solve linear ODEs.
- Spectral analysis of symmetric operators, used in PCA, harmonic analysis, and quantum mechanics.
- Stability analysis of dynamical systems, where the eigenvalues of the Jacobian determine local behavior.
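As one concrete instance, the matrix-power formula for the \(2 \times 2\) diagonalizable example above, sketched with NumPy (assumed available):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
w, P = np.linalg.eig(A)                               # eigenvalues 2, 3 and eigenvector matrix P
k = 10
Ak = P @ np.diag(w**k) @ np.linalg.inv(P)             # A^k = P D^k P^{-1}
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```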
Defective matrices still have these structures available — via Jordan form or the SVD — but the analysis becomes more delicate, and “what diagonalization gives you cleanly” is the right baseline against which the cost of non-diagonalizability is measured.