Matrices
Motivation
A vector organizes \(n\) numbers into a column. A matrix organizes \(m \times n\) numbers into a rectangular grid (Strang 2016). Matrices are indispensable in machine learning and linear algebra because they do two distinct things at once: they store data (a dataset with \(m\) rows and \(n\) features is a matrix), and they act on vectors (multiplying a matrix by a vector transforms the vector, encoding a linear map). Almost every operation in this course — solving equations, computing PCA, applying neural network layers — reduces to matrix arithmetic.
Definition
An \(m \times n\) matrix \(A\) is a rectangular array of real numbers with \(m\) rows and \(n\) columns:
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]
The entry in row \(i\) and column \(j\) is written \(a_{ij}\) or \(A_{ij}\) or \((A)_{ij}\). We write \(A \in \mathbb{R}^{m \times n}\).
Examples
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \in \mathbb{R}^{2 \times 3}, \qquad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \in \mathbb{R}^{2 \times 2}. \]
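These examples are easy to reproduce in code. As a minimal sketch (assuming NumPy, which the text does not prescribe), the \(2 \times 3\) matrix \(A\) and the \(2 \times 2\) identity above look like:

```python
import numpy as np

# The 2x3 matrix A and the 2x2 identity I from the examples above.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
I = np.eye(2)

print(A.shape)  # (2, 3): m rows, n columns
print(I)
```

The `.shape` attribute returns the pair \((m, n)\), matching the \(\mathbb{R}^{m \times n}\) notation.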
Viewing a Matrix as Columns or Rows
Two complementary views are constantly useful.
Column view. Each column of \(A\) is a vector in \(\mathbb{R}^m\):
\[ A = \begin{pmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \\ | & | & & | \end{pmatrix}. \]
Row view. Each row of \(A\) is a row vector in \(\mathbb{R}^n\):
\[ A = \begin{pmatrix} \text{---} \; \mathbf{r}_1^\top \; \text{---} \\ \text{---} \; \mathbf{r}_2^\top \; \text{---} \\ \vdots \\ \text{---} \; \mathbf{r}_m^\top \; \text{---} \end{pmatrix}. \]
Which view to use depends on the operation. Matrix-vector multiplication has clean interpretations in both.
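Both views correspond directly to array slicing. A small sketch, again assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

col_2 = A[:, 1]  # column view: the 2nd column, a vector in R^m
row_1 = A[0, :]  # row view: the 1st row, a vector in R^n

print(col_2)  # [2 5]
print(row_1)  # [1 2 3]
```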
Special Matrices
| Name | Shape | Defining property |
|---|---|---|
| Square | \(n \times n\) | same number of rows and columns (\(m = n\)) |
| Identity \(I_n\) | \(n \times n\) | \(I_{ij} = 1\) if \(i = j\), else \(0\) |
| Zero | any | all entries \(0\) |
| Diagonal | \(n \times n\) | \(A_{ij} = 0\) for \(i \ne j\) |
| Symmetric | \(n \times n\) | \(A = A^\top\) |
The identity matrix plays the role of \(1\) in matrix arithmetic: \(AI = A\) and \(IA = A\) whenever the dimensions match.
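The "dimensions match" caveat is worth making concrete: for \(A \in \mathbb{R}^{m \times n}\), the product \(AI\) needs \(I_n\) while \(IA\) needs \(I_m\). A quick check (assuming NumPy for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2 x 3, so m = 2 and n = 3

# AI = A requires the n x n identity; IA = A requires the m x m identity.
assert np.array_equal(A @ np.eye(3), A)
assert np.array_equal(np.eye(2) @ A, A)
```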
Basic Operations
Scalar multiplication
\[ (cA)_{ij} = c \cdot a_{ij}. \]
Addition
Two matrices of the same shape add entry-wise:
\[ (A + B)_{ij} = a_{ij} + b_{ij}. \]
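Both operations are entry-wise, which is exactly how array libraries implement them. A brief sketch (assuming NumPy):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

print(2 * A)   # scalar multiplication: each entry doubled
print(A + B)   # addition: entry-wise, shapes must match
```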
Transpose
The transpose \(A^\top\) flips rows and columns: \((A^\top)_{ij} = a_{ji}\). If \(A \in \mathbb{R}^{m \times n}\) then \(A^\top \in \mathbb{R}^{n \times m}\).
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \implies A^\top = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}. \]
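The transpose example above can be verified entry by entry. A minimal sketch (assuming NumPy, where `.T` is the transpose):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.T)        # shape (3, 2): rows and columns flipped
print(A.T.shape)  # (3, 2)

# (A^T)_{ij} = a_{ji} for every entry
assert all(A.T[j, i] == A[i, j] for i in range(2) for j in range(3))
```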
Matrix-Vector Multiplication
If \(A \in \mathbb{R}^{m \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\), the product \(A\mathbf{x} \in \mathbb{R}^m\) is defined by
\[ (A\mathbf{x})_i = \sum_{j=1}^n a_{ij} x_j \quad \text{for each row } i. \]
Row view: the \(i\)-th entry of \(A\mathbf{x}\) is the dot product of the \(i\)-th row of \(A\) with \(\mathbf{x}\).
Column view: \(A\mathbf{x}\) is a linear combination of the columns of \(A\) with coefficients from \(\mathbf{x}\):
\[ A\mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n. \]
Example
\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ 6 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 3 \end{pmatrix} + 6\begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 + 12 \\ 15 + 24 \end{pmatrix} = \begin{pmatrix} 17 \\ 39 \end{pmatrix}. \]
The dimensions must match: \(A\) has \(n\) columns and \(\mathbf{x}\) has \(n\) entries. The output has \(m\) entries, one per row of \(A\).
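Both interpretations compute the same product, which is easy to confirm in code. A sketch assuming NumPy, where `@` is matrix multiplication:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

# Row view: the i-th entry is the dot product of row i with x.
row_view = np.array([A[i] @ x for i in range(A.shape[0])])

# Column view: a linear combination of the columns of A.
col_view = x[0] * A[:, 0] + x[1] * A[:, 1]

print(A @ x)  # [17 39], matching the worked example above
assert np.array_equal(A @ x, row_view)
assert np.array_equal(A @ x, col_view)
```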
Matrix-Matrix Multiplication
If \(A \in \mathbb{R}^{m \times k}\) and \(B \in \mathbb{R}^{k \times n}\), the product \(C = AB \in \mathbb{R}^{m \times n}\) is
\[ c_{ij} = \sum_{\ell=1}^k a_{i\ell} \, b_{\ell j}. \]
Equivalently, the \(j\)-th column of \(C\) is \(A\) times the \(j\)-th column of \(B\):
\[ AB = A \begin{pmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_n \\ | & & | \end{pmatrix} = \begin{pmatrix} | & & | \\ A\mathbf{b}_1 & \cdots & A\mathbf{b}_n \\ | & & | \end{pmatrix}. \]
Important: matrix multiplication is not commutative: in general \(AB \ne BA\), and \(BA\) may not even be defined unless the shapes are compatible.
Example
\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}. \]
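The worked product above, the column-by-column view, and the failure of commutativity can all be checked in a few lines (assuming NumPy):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C = A @ B
print(C)  # [[19 22] [43 50]], matching the worked example

# Column view: the j-th column of C is A times the j-th column of B.
assert np.array_equal(C[:, 0], A @ B[:, 0])
assert np.array_equal(C[:, 1], A @ B[:, 1])

# Non-commutativity: BA gives a different matrix here.
print(B @ A)
assert not np.array_equal(A @ B, B @ A)
```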
Matrices as Data
A dataset of \(m\) examples, each with \(n\) features, is naturally stored as a matrix \(X \in \mathbb{R}^{m \times n}\):
- Row \(i\) is the feature vector for example \(i\).
- Column \(j\) collects the values of feature \(j\) across all examples.
Operations like computing means, covariances, and linear model predictions all become matrix operations on \(X\).
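As an illustration of that claim, here is a sketch (assuming NumPy, with a made-up toy dataset) computing feature means and a sample covariance matrix from a data matrix \(X\):

```python
import numpy as np

# Toy dataset: m = 4 examples, n = 2 features (values chosen for illustration).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])

feature_means = X.mean(axis=0)       # mean of each column (feature)
Xc = X - feature_means               # center each feature at zero
cov = (Xc.T @ Xc) / (X.shape[0] - 1) # sample covariance matrix, n x n

print(feature_means)
print(cov)
```

Note how both computations are pure matrix arithmetic: a column-wise mean and a transpose-times-matrix product.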
What Comes Next
- Linear maps explains what it means for a matrix to represent a transformation of space.
- Systems of linear equations uses matrices to encode and solve \(A\mathbf{x} = \mathbf{b}\).
- Matrix rank and invertibility characterize when solutions exist and are unique.