Linear Maps
Motivation
A function takes inputs and produces outputs. The functions most natural for vector spaces are the ones that respect the vector structure — they preserve addition and scalar multiplication. These are called linear maps (also linear transformations) (Axler 2015). Every matrix multiplication is a linear map, and every linear map between finite-dimensional spaces is a matrix multiplication once bases are chosen. This equivalence is what makes matrices so useful: to understand how a transformation works, study its matrix.
Definition
A function \(T : \mathbb{R}^n \to \mathbb{R}^m\) is a linear map if it satisfies two conditions for all \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\) and all scalars \(c \in \mathbb{R}\):
- Additivity: \(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\)
- Homogeneity: \(T(c\mathbf{v}) = c \, T(\mathbf{v})\)
Together these say that \(T\) preserves linear combinations:
\[ T(c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k) = c_1 T(\mathbf{v}_1) + c_2 T(\mathbf{v}_2) + \cdots + c_k T(\mathbf{v}_k). \]
This is the defining property. A linear map is completely determined by what it does to a basis: once you know the outputs on \(n\) basis vectors, you know the output on every vector.
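To make the two conditions concrete, here is a minimal numerical spot-check in NumPy. The helper `looks_linear`, the random test vectors, and the scaling map it tests are illustrative choices, not part of the text; passing the check is evidence of linearity, not a proof.

```python
import numpy as np

def looks_linear(T, n, trials=100, rng=np.random.default_rng(0)):
    """Spot-check T : R^n -> R^m on random inputs (evidence, not proof)."""
    for _ in range(trials):
        u, v = rng.standard_normal(n), rng.standard_normal(n)
        c = rng.standard_normal()
        if not np.allclose(T(u + v), T(u) + T(v)):   # additivity
            return False
        if not np.allclose(T(c * v), c * T(v)):      # homogeneity
            return False
    return True

scale_by_3 = lambda v: 3.0 * v          # scaling map, as in the example below
print(looks_linear(scale_by_3, n=4))    # True
```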
Examples of Linear Maps
Scaling
\(T(\mathbf{v}) = c\mathbf{v}\) scales every vector by \(c\). Check: \(T(\mathbf{u} + \mathbf{v}) = c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v})\). ✓
Rotation in \(\mathbb{R}^2\)
Rotating every vector by angle \(\theta\) counterclockwise is linear. Geometrically, rotating the parallelogram spanned by \(\mathbf{u}\) and \(\mathbf{v}\) rotates its diagonal, so \(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\); and rotating a scaled vector is the same as scaling the rotated vector, so \(T(c\mathbf{v}) = c\,T(\mathbf{v})\).
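A quick numerical confirmation, under the same caveat that a finite check is evidence rather than proof; the angle and test vectors below are arbitrary illustrative values.

```python
import numpy as np

def rotate(theta, v):
    """Rotate v in R^2 counterclockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

theta = 0.7
u, v = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.allclose(rotate(theta, u + v), rotate(theta, u) + rotate(theta, v)))  # True
print(np.allclose(rotate(theta, 2.5 * v), 2.5 * rotate(theta, v)))             # True
```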
Projection
Projecting onto a line or subspace is linear. For example, projecting \(\mathbb{R}^2\) onto the \(x\)-axis:
\[ T\!\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 0 \end{pmatrix}. \]
Differentiation
On the space of polynomials, taking the derivative is linear: \((f + g)' = f' + g'\) and \((cf)' = cf'\).
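In coordinates, a polynomial of degree at most \(d\) is a coefficient vector \((a_0, a_1, \ldots, a_d)\), and differentiation becomes a concrete map on those vectors. A small NumPy sketch, with illustrative polynomials:

```python
import numpy as np

def deriv(coeffs):
    """Map the coefficients (a_0, ..., a_d) to the coefficients of the derivative."""
    k = np.arange(1, len(coeffs))
    return k * coeffs[1:]

f = np.array([1.0, 0.0, 3.0, 2.0])   # 1 + 3x^2 + 2x^3
g = np.array([0.0, 5.0, -1.0, 0.0])  # 5x - x^2
print(np.allclose(deriv(f + g), deriv(f) + deriv(g)))  # additivity: (f+g)' = f' + g'
print(np.allclose(deriv(4.0 * f), 4.0 * deriv(f)))     # homogeneity: (4f)' = 4 f'
```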
Non-example: translation
\(T(\mathbf{v}) = \mathbf{v} + \mathbf{b}\) with \(\mathbf{b} \ne \mathbf{0}\) is not linear: \(T(\mathbf{0}) = \mathbf{b} \ne \mathbf{0}\), but every linear map must satisfy \(T(\mathbf{0}) = \mathbf{0}\).
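The failure is easy to see numerically; the shift \(\mathbf{b}\) and the test vectors here are illustrative.

```python
import numpy as np

b = np.array([1.0, -2.0])
translate = lambda v: v + b

u, v = np.array([3.0, 0.0]), np.array([0.0, 4.0])
print(translate(np.zeros(2)))                                      # [ 1. -2.], not the zero vector
print(np.allclose(translate(u + v), translate(u) + translate(v)))  # False: b gets counted twice
```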
The Matrix Representation
The central theorem connecting linear maps and matrices:
Every linear map \(T : \mathbb{R}^n \to \mathbb{R}^m\) is represented by a unique matrix \(A \in \mathbb{R}^{m \times n}\) such that \(T(\mathbf{v}) = A\mathbf{v}\) for all \(\mathbf{v}\).
Constructing \(A\): apply \(T\) to each standard basis vector \(\mathbf{e}_1, \ldots, \mathbf{e}_n\). The outputs are the columns of \(A\):
\[ A = \begin{pmatrix} | & | & & | \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ | & | & & | \end{pmatrix}. \]
Why this works: any \(\mathbf{v}\) can be written as \(v_1 \mathbf{e}_1 + \cdots + v_n \mathbf{e}_n\), so by linearity
\[ T(\mathbf{v}) = v_1 T(\mathbf{e}_1) + \cdots + v_n T(\mathbf{e}_n) = A\mathbf{v}. \]
Example: Rotation by \(90°\)
Under a \(90°\) counterclockwise rotation:
\[ T(\mathbf{e}_1) = T\!\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad T(\mathbf{e}_2) = T\!\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}. \]
So the rotation matrix is
\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]
Check: \(A \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} -2 \\ 3 \end{pmatrix}\), which is indeed \((3, 2)^\top\) rotated \(90°\). ✓
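The column-by-column construction translates directly into code. A minimal sketch in NumPy: the helper `matrix_of` is our own name for the procedure, and the map it is applied to is the \(90°\) rotation from the example.

```python
import numpy as np

def matrix_of(T, n):
    """Return the matrix whose j-th column is T(e_j)."""
    return np.column_stack([T(np.eye(n)[:, j]) for j in range(n)])

rot90 = lambda v: np.array([-v[1], v[0]])   # rotate (x, y) to (-y, x)

A = matrix_of(rot90, 2)
print(A)                         # [[ 0. -1.]
                                 #  [ 1.  0.]]
print(A @ np.array([3.0, 2.0]))  # [-2.  3.], matching the check above
```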
Composition and Matrix Multiplication
If \(S : \mathbb{R}^n \to \mathbb{R}^k\) and \(T : \mathbb{R}^k \to \mathbb{R}^m\) are linear maps with matrices \(B \in \mathbb{R}^{k \times n}\) and \(A \in \mathbb{R}^{m \times k}\) respectively, then the composition \(T \circ S : \mathbb{R}^n \to \mathbb{R}^m\) is linear with matrix \(AB\).
This is why matrix multiplication is defined the way it is: matrix multiplication is composition of linear maps.
\[ (T \circ S)(\mathbf{v}) = T(S(\mathbf{v})) = A(B\mathbf{v}) = (AB)\mathbf{v}. \]
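A quick check that applying the maps in sequence agrees with multiplying the matrices first. The particular choices here are illustrative: \(S\) scales by 2 (matrix \(B\)) and \(T\) is the \(90°\) rotation (matrix \(A\)).

```python
import numpy as np

B = 2.0 * np.eye(2)                       # matrix of S: scale by 2
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # matrix of T: rotate 90 degrees

v = np.array([3.0, 2.0])
print(A @ (B @ v))   # apply S, then T
print((A @ B) @ v)   # same result: the matrix of T ∘ S is AB
```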
Kernel and Image
Two subspaces characterize a linear map \(T\) with matrix \(A\):
Kernel (null space): \(\ker(T) = \{\mathbf{v} \in \mathbb{R}^n : T(\mathbf{v}) = \mathbf{0}\} = \{\mathbf{v} : A\mathbf{v} = \mathbf{0}\}\).
The kernel measures what gets “lost” by \(T\). If \(\ker(T) = \{\mathbf{0}\}\), then \(T\) is injective (one-to-one).
Image (column space): \(\operatorname{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in \mathbb{R}^n\}\).
The image is the set of all vectors \(T\) can produce — the span of \(A\)’s columns. If \(\operatorname{im}(T) = \mathbb{R}^m\), then \(T\) is surjective (onto).
Rank-nullity theorem:
\[ \dim(\ker T) + \dim(\operatorname{im} T) = n. \]
The dimensions of kernel and image sum to the input dimension. This is why a matrix with rank \(r\) has a null space of dimension \(n - r\).
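A numerical illustration of rank-nullity, assuming SciPy is available for `null_space`; the matrix below is an arbitrary rank-2 example with \(n = 3\).

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # a multiple of the first row, so rank drops
              [1.0, 0.0, 1.0]])

rank = np.linalg.matrix_rank(A)       # dim(im T), the column space dimension
nullity = null_space(A).shape[1]      # dim(ker T), the number of null-space basis vectors
print(rank, nullity, rank + nullity)  # 2 1 3  -> sums to n = 3
```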
Why Linearity Matters
Non-linear functions are hard to analyze globally. A linear map is completely determined by its behavior on \(n\) basis vectors — a finite amount of information. This finiteness is what allows the entire apparatus of linear algebra (eigenvectors, SVD, PCA) to work. Neural networks alternate affine maps (a linear map plus a shift, as in the translation non-example) with elementwise non-linearities; the linear parts are tractable, and the non-linearities are what give the network its expressive power.