Matrix Invertibility
Motivation
A square matrix \(A\) describes a linear map from \(\mathbb{R}^n\) to itself. Sometimes that map can be undone — every output corresponds to exactly one input, and we can recover the input from the output. Sometimes it cannot, because \(A\) collapses some directions to zero or fails to reach every output. A matrix that can be undone is called invertible, and the matrix that undoes it is its inverse \(A^{-1}\) (Strang 2016).
Invertibility is among the most common preconditions in linear algebra. It is what makes “solve \(A\mathbf{x} = \mathbf{b}\) for any \(\mathbf{b}\)” possible, what makes a change of basis reversible, what forces the determinant to be nonzero, and what links together a long list of equivalent conditions, the invertible matrix theorem, that recurs throughout linear algebra.
Definition
A square matrix \(A \in \mathbb{R}^{n \times n}\) is invertible (or nonsingular) if there exists a matrix \(A^{-1} \in \mathbb{R}^{n \times n}\) satisfying
\[ A \, A^{-1} = A^{-1} A = I. \]
When it exists, \(A^{-1}\) is unique: if \(AB = I\) and \(CA = I\), then \(C = C(AB) = (CA)B = B\), so any left inverse and any right inverse coincide.
A matrix that is not invertible is called singular.
A \(2 \times 2\) example
\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}. \]
Check: \(AA^{-1} = \begin{pmatrix} 2 - 1 & -2 + 2 \\ 1 - 1 & -1 + 2 \end{pmatrix} = I\).
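A quick numerical check of this example, sketched in NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.array([[1.0, -1.0],
                  [-1.0, 2.0]])

print(A @ A_inv)           # [[1. 0.], [0. 1.]]
print(np.linalg.inv(A))    # recovers A_inv up to round-off
```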
A singular matrix
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}. \]
The second column is twice the first, so \(A\) has rank \(1\) and collapses \(\mathbb{R}^2\) onto a single line. The vector \((2, -1)^\top\) satisfies \(A(2, -1)^\top = \mathbf{0}\), so \(A\) destroys information and cannot be undone.
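The same collapse is visible numerically; a NumPy sketch (np.linalg.inv raises LinAlgError on an exactly singular matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.matrix_rank(A))    # 1: the columns are dependent
print(A @ np.array([2.0, -1.0]))   # [0. 0.]: a nonzero kernel vector

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print("not invertible:", err)  # "Singular matrix"
```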
The Invertible Matrix Theorem
For a square matrix \(A \in \mathbb{R}^{n \times n}\), the following are equivalent:
- \(A\) is invertible.
- \(A\) has rank \(n\) (full rank).
- The columns of \(A\) are linearly independent.
- The rows of \(A\) are linearly independent.
- The columns of \(A\) span \(\mathbb{R}^n\).
- \(\ker(A) = \{\mathbf{0}\}\) — the only solution to \(A\mathbf{x} = \mathbf{0}\) is \(\mathbf{x} = \mathbf{0}\).
- \(A\mathbf{x} = \mathbf{b}\) has a unique solution for every \(\mathbf{b} \in \mathbb{R}^n\).
- \(\det(A) \neq 0\) (see determinants).
- Every eigenvalue of \(A\) is nonzero.
- \(A^\top\) is invertible.
Each statement reformulates “this linear map is a bijection.” In practice, deciding invertibility means picking whichever of these conditions is easiest to check.
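In floating point, several of these conditions can be tested directly, with tolerances in place of exact comparisons. A sketch using NumPy and the \(2 \times 2\) example from above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
n = A.shape[0]

# Three of the equivalent conditions, checked numerically.
full_rank    = np.linalg.matrix_rank(A) == n
nonzero_det  = not np.isclose(np.linalg.det(A), 0.0)
eigs_nonzero = not np.any(np.isclose(np.linalg.eigvals(A), 0.0))

print(full_rank, nonzero_det, eigs_nonzero)  # True True True
```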
Inverse Formula in \(2 \times 2\)
For \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) with \(ad - bc \neq 0\),
\[ A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
The denominator \(ad - bc\) is the determinant; its nonvanishing is exactly the condition \(\det(A) \neq 0\) from the invertible matrix theorem above.
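A direct transcription of the formula; inv2x2 is an illustrative name, and the near-zero check is a pragmatic guard rather than part of the mathematics:

```python
import numpy as np

def inv2x2(A):
    """Invert a 2x2 matrix via the closed-form adjugate formula."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if np.isclose(det, 0.0):
        raise ValueError("matrix is singular (or nearly so)")
    return np.array([[d, -b], [-c, a]]) / det

print(inv2x2(np.array([[2.0, 1.0], [1.0, 1.0]])))
# [[ 1. -1.]
#  [-1.  2.]]
```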
For larger matrices, explicit formulas exist, such as Cramer’s rule (Cramer 1750) and the adjugate formula, but they are not used in practice: they are arithmetically inefficient and numerically unstable. Gaussian elimination is used instead.
Computing the Inverse
The standard method augments \(A\) with the identity matrix and row-reduces:
\[ [\,A \mid I\,] \;\xrightarrow{\text{Gauss-Jordan}}\; [\,I \mid A^{-1}\,]. \]
The row operations that turn \(A\) into \(I\), applied to \(I\), produce \(A^{-1}\).
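The procedure translates almost line for line into code. A minimal sketch with partial pivoting; gauss_jordan_inverse is an illustrative name, not a library routine:

```python
import numpy as np

def gauss_jordan_inverse(A, tol=1e-12):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # [A | I]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if abs(M[pivot, col]) < tol:
            raise ValueError("matrix is singular to working precision")
        M[[col, pivot]] = M[[pivot, col]]         # swap rows
        M[col] /= M[col, col]                     # scale pivot row to 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]    # clear the column
    return M[:, n:]                               # right half is A^{-1}
```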
Whenever the goal is to solve \(A\mathbf{x} = \mathbf{b}\), computing \(A^{-1}\) first is almost always the wrong approach. Forming \(A^{-1}\) explicitly takes \(\Theta(n^3)\) work and accumulates round-off error. Solving \(A\mathbf{x} = \mathbf{b}\) directly via \(LU\) factorization or Gaussian elimination is the same asymptotic cost but with better constants and far better numerical behavior. The inverse appears in derivations and identities; the explicit matrix rarely needs to be formed.
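The difference is easy to see in code; a sketch comparing the two routes on a random system:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))   # random matrices are almost surely invertible
b = rng.standard_normal(500)

x_solve = np.linalg.solve(A, b)       # LU-based direct solve: preferred
x_inv   = np.linalg.inv(A) @ b        # forms A^{-1} explicitly: avoid

print(np.allclose(x_solve, x_inv))        # agree here, but solve() is
print(np.linalg.norm(A @ x_solve - b))    # cheaper and more accurate
```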
Properties
Product of invertibles is invertible. If \(A, B \in \mathbb{R}^{n \times n}\) are both invertible, then so is \(AB\), and
\[ (AB)^{-1} = B^{-1} A^{-1}. \]
The order reverses — a common source of bugs.
Transpose. If \(A\) is invertible, \(A^\top\) is invertible and \((A^\top)^{-1} = (A^{-1})^\top\).
Inverse of inverse. \((A^{-1})^{-1} = A\).
Scalar multiple. For \(c \neq 0\), \((cA)^{-1} = c^{-1} A^{-1}\).
Orthogonal matrices. If \(Q\) has orthonormal columns and is square, then \(Q^\top Q = I\), so \(Q^{-1} = Q^\top\). Transposition is the inverse for orthogonal matrices — no Gaussian elimination needed.
Diagonal matrices. \(\operatorname{diag}(d_1, \ldots, d_n)\) is invertible if and only if every \(d_i\) is nonzero, in which case the inverse is \(\operatorname{diag}(1/d_1, \ldots, 1/d_n)\).
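All of these properties can be spot-checked numerically. A sketch; the random \(4 \times 4\) matrices are invertible with probability one:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
inv = np.linalg.inv

print(np.allclose(inv(A @ B), inv(B) @ inv(A)))   # order reverses
print(np.allclose(inv(A.T), inv(A).T))            # transpose rule
print(np.allclose(inv(inv(A)), A))                # inverse of inverse

Q, _ = np.linalg.qr(A)                            # orthogonal factor
print(np.allclose(inv(Q), Q.T))                   # Q^{-1} = Q^T

d = np.array([2.0, -1.0, 3.0, 0.5])
print(np.allclose(inv(np.diag(d)), np.diag(1.0 / d)))  # diagonal case
```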
Geometric Interpretation
A linear map \(A : \mathbb{R}^n \to \mathbb{R}^n\) is invertible exactly when it preserves “fullness” — it stretches, rotates, reflects, or skews the unit cube but does not flatten it to lower dimension.
- Invertible \(\iff\) nonzero volume scaling. \(\det(A) \neq 0\) says \(A\) scales every \(n\)-dimensional volume by the nonzero factor \(|\det(A)|\).
- Singular \(\iff\) collapses to a hyperplane (or smaller). A singular matrix smashes the cube into a flat slab of dimension less than \(n\); no inverse can re-inflate it.
This is the geometric content of “an invertible map is a bijection of \(\mathbb{R}^n\).”
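The volume-scaling picture can be made concrete: map the unit square's corners and compare areas. A sketch reusing the invertible example from above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Corners of the unit square, as columns; A maps the square to a
# parallelogram whose area is |det(A)| times the original area.
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]).T
print(A @ corners)
print(abs(np.linalg.det(A)))   # 1.0: this particular A preserves area
```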
When Invertibility Fails
In data-driven problems, exact singularity is rare; near-singularity is common. A matrix \(A\) may be technically invertible but have an inverse that magnifies small input errors enormously. The condition number \(\kappa(A) = \sigma_1(A) / \sigma_n(A)\) — the ratio of the largest to the smallest singular value — quantifies this. Large \(\kappa\) signals that \(A^{-1}\) exists in principle but is numerically unreliable.
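A sketch of the diagnosis: perturbing the singular matrix from above by a tiny amount makes it invertible in principle but terribly conditioned:

```python
import numpy as np

eps = 1e-10
A = np.array([[1.0, 2.0],
              [2.0, 4.0 + eps]])           # nearly rank-1

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending
print(sigma[0] / sigma[-1])                 # kappa(A): astronomically large
print(np.linalg.cond(A))                    # same ratio, built in
```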
When invertibility fails outright or is too unstable to trust, two replacements are standard, both sketched in code below:
- Pseudoinverse \(A^+\) — defined for any matrix, agrees with \(A^{-1}\) when \(A\) is invertible, and gives the minimum-norm least-squares solution of \(A\mathbf{x} = \mathbf{b}\) otherwise. Computed via the singular value decomposition.
- Regularization — solve \((A^\top A + \lambda I)\mathbf{x} = A^\top \mathbf{b}\) instead of \(A\mathbf{x} = \mathbf{b}\). The added \(\lambda I\) guarantees invertibility and trades a small bias for a much smaller variance.
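Both replacements in a NumPy sketch, reusing the rank-1 matrix from above; lam is an illustrative regularization weight:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])           # the singular example again
b = np.array([1.0, 2.0])             # lies in the column space of A

# Pseudoinverse: minimum-norm least-squares solution, via the SVD.
x_pinv = np.linalg.pinv(A) @ b
print(x_pinv, np.linalg.norm(A @ x_pinv - b))   # tiny residual

# Regularization (ridge): A^T A + lam*I is invertible for any lam > 0.
lam = 1e-3
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)
print(x_ridge)                       # close to x_pinv for small lam
```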