Systems of Linear Equations
Motivation
Many problems reduce to finding unknown values that simultaneously satisfy several constraints, each of which is a linear equation. A robot localization problem might constrain coordinates from multiple sensor readings; a recommendation system might constrain user preferences from observed ratings; a circuit might constrain voltages from Kirchhoff’s laws. All of these are systems of linear equations (Strang 2016). Understanding when solutions exist, how many there are, and how to find them is foundational for machine learning and applied mathematics.
Definition
A system of \(m\) linear equations in \(n\) unknowns has the form
\[ \begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m. \end{aligned} \]
The \(a_{ij}\) are known coefficients, the \(b_i\) are known right-hand sides, and \(x_1, \ldots, x_n\) are the unknowns.
Matrix form. Stacking everything into a matrix \(A \in \mathbb{R}^{m \times n}\) and vectors \(\mathbf{x} \in \mathbb{R}^n\), \(\mathbf{b} \in \mathbb{R}^m\):
\[ A\mathbf{x} = \mathbf{b}. \]
This compact notation is not just convenient — it reveals that solving a linear system means finding a vector \(\mathbf{x}\) whose image under the linear map \(A\) is \(\mathbf{b}\).
Examples
One equation, one unknown
\[ 3x = 6 \implies x = 2. \]
Exactly one solution.
Two equations, two unknowns
\[ \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix}. \]
The first equation is \(x + y = 5\), the second is \(2x - y = 4\). Adding them: \(3x = 9\), so \(x = 3\), \(y = 2\). Unique solution.
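The hand computation above can be checked numerically; a minimal sketch with NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, -1.0]])
b = np.array([5.0, 4.0])

# np.linalg.solve performs Gaussian elimination (via LAPACK) internally
x = np.linalg.solve(A, b)
print(x)  # [3. 2.]
```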
Geometrically
Each equation in \(\mathbb{R}^2\) defines a line, and the solution set is the intersection of those lines: lines that cross meet in a single point (unique solution), parallel lines never meet (no solution), and coincident lines share every point (infinitely many solutions).
In \(\mathbb{R}^3\) each equation defines a plane; solutions are intersections of planes.
Three Possible Outcomes
For any system \(A\mathbf{x} = \mathbf{b}\), exactly one of three things is true:
- Unique solution. Exactly one \(\mathbf{x}\) satisfies all equations. (The typical case when \(m = n\) and the equations are independent.)
- No solution. The equations are contradictory — inconsistent. (E.g., \(x = 1\) and \(x = 2\) simultaneously.)
- Infinitely many solutions. At least one equation is redundant; the solution set is a line, a plane, or a higher-dimensional affine subspace.
Which case applies depends on the rank of \(A\) relative to the rank of the augmented matrix \([A \mid \mathbf{b}]\).
| Condition | Outcome |
|---|---|
| \(\operatorname{rank}(A) = \operatorname{rank}([A\mid\mathbf{b}]) = n\) | unique solution |
| \(\operatorname{rank}(A) = \operatorname{rank}([A\mid\mathbf{b}]) < n\) | infinitely many |
| \(\operatorname{rank}(A) < \operatorname{rank}([A\mid\mathbf{b}])\) | no solution |
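The rank test in the table can be applied directly. A minimal sketch using NumPy, where `classify` is an illustrative helper (not a standard function):

```python
import numpy as np

def classify(A, b):
    """Classify A x = b via the rank conditions on A and [A | b]."""
    n = A.shape[1]
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_aug:
        return "no solution"
    return "unique" if rank_A == n else "infinitely many"

print(classify(np.array([[1.0, 1.0], [2.0, -1.0]]), np.array([5.0, 4.0])))  # unique
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([1.0, 2.0])))  # no solution
print(classify(np.array([[1.0, 1.0], [2.0, 2.0]]), np.array([1.0, 2.0])))  # infinitely many
```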
The Homogeneous Case
When \(\mathbf{b} = \mathbf{0}\), the system \(A\mathbf{x} = \mathbf{0}\) is homogeneous. It always has at least the trivial solution \(\mathbf{x} = \mathbf{0}\). If \(A\) has more columns than rows (\(n > m\)), the system must have infinitely many solutions (the null space is non-trivial).
The solution set of \(A\mathbf{x} = \mathbf{0}\) is the null space of \(A\), a subspace of \(\mathbb{R}^n\).
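A basis for the null space can be extracted from the SVD: the right singular vectors whose singular values are (numerically) zero span \(\ker(A)\). A sketch, using an arbitrary rank-deficient matrix as the example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # rank 1, so ker(A) has dimension 3 - 1 = 2

_, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]  # tolerance for "zero" singular values
rank = int(np.sum(s > tol))
null_basis = Vt[rank:].T  # columns span the null space

print(null_basis.shape)                 # (3, 2)
print(np.allclose(A @ null_basis, 0))   # True: every column solves A x = 0
```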
The General Solution
If \(\mathbf{x}_p\) is any particular solution to \(A\mathbf{x} = \mathbf{b}\), and \(\mathbf{x}_h\) is any solution to the homogeneous system \(A\mathbf{x} = \mathbf{0}\), then \(\mathbf{x}_p + \mathbf{x}_h\) is also a solution to \(A\mathbf{x} = \mathbf{b}\). The complete solution set is
\[ \{\mathbf{x}_p + \mathbf{x}_h : \mathbf{x}_h \in \ker(A)\}. \]
This structure — particular solution plus null space — reappears in differential equations, physics, and optimization.
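The particular-plus-homogeneous decomposition can be verified numerically; a minimal sketch with a consistent underdetermined system (made-up values):

```python
import numpy as np

# One equation, two unknowns: x1 + x2 = 2. Solutions form a line.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

x_p = np.array([1.0, 1.0])   # a particular solution: A @ x_p == b
x_h = np.array([1.0, -1.0])  # spans ker(A): A @ x_h == 0

# Every x_p + t * x_h solves the system, for any scalar t
for t in (-2.0, 0.0, 3.5):
    assert np.allclose(A @ (x_p + t * x_h), b)
```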
Overdetermined and Underdetermined Systems
Overdetermined (\(m > n\)): more equations than unknowns. Typically no exact solution exists. The practical fix is to find the \(\mathbf{x}\) that minimizes \(\|A\mathbf{x} - \mathbf{b}\|^2\) — this is linear regression (ordinary least squares).
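For the overdetermined case, `np.linalg.lstsq` computes the least-squares minimizer; a minimal sketch with made-up data:

```python
import numpy as np

# 3 equations, 2 unknowns: inconsistent, so minimize ||A x - b||^2 instead
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x, res, rnk, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # [0.33333333 0.33333333], the unique least-squares solution
```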
Underdetermined (\(m < n\)): fewer equations than unknowns. Typically infinitely many solutions. The practical fix is to pick one solution by an extra criterion: take the minimum-norm solution (via the pseudoinverse), or add a regularization penalty on \(\|\mathbf{x}\|^2\), which is ridge regression.
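For the underdetermined case, the pseudoinverse selects the minimum-Euclidean-norm solution; a minimal sketch:

```python
import numpy as np

# 1 equation, 2 unknowns: x1 + x2 = 2 has infinitely many solutions
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

x_min = np.linalg.pinv(A) @ b  # minimum-norm solution on the solution line
print(x_min)  # [1. 1.]: the point on x1 + x2 = 2 closest to the origin
```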
Solving Systems
The standard algorithm for solving \(A\mathbf{x} = \mathbf{b}\) is Gaussian elimination, which converts \(A\) to row-echelon form through systematic row operations. For square, invertible \(A\), the solution is
\[ \mathbf{x} = A^{-1}\mathbf{b}, \]
but explicitly forming the inverse is both slower and less numerically accurate than elimination for large systems; in practice one calls a linear solver rather than computing \(A^{-1}\).
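A compact Gaussian elimination with partial pivoting, as a teaching sketch (not a replacement for `np.linalg.solve`, which calls optimized LAPACK routines):

```python
import numpy as np

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination to upper-triangular (row-echelon) form
    for k in range(n):
        p = k + np.argmax(np.abs(A[k:, k]))  # pivot: largest entry in column k
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]  # swap rows k and p
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[1.0, 1.0], [2.0, -1.0]])
b = np.array([5.0, 4.0])
print(gauss_solve(A, b))  # [3. 2.], matching np.linalg.solve(A, b)
```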