Matrices

Motivation

A vector organizes \(n\) numbers into a column. A matrix organizes \(m \times n\) numbers into a rectangular grid (Strang 2016). Matrices are indispensable in machine learning and linear algebra because they do two distinct things at once: they store data (a dataset with \(m\) rows and \(n\) features is a matrix), and they act on vectors (multiplying a matrix by a vector transforms the vector, encoding a linear map). Almost every operation in this course — solving equations, computing PCA, applying neural network layers — reduces to matrix arithmetic.

Definition

An \(m \times n\) matrix \(A\) is a rectangular array of real numbers with \(m\) rows and \(n\) columns:

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]

The entry in row \(i\) and column \(j\) is written \(a_{ij}\) or \(A_{ij}\) or \((A)_{ij}\). We write \(A \in \mathbb{R}^{m \times n}\).

Examples

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \in \mathbb{R}^{2 \times 3}, \qquad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \in \mathbb{R}^{2 \times 2}. \]
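The shape and entry notation carries over directly to code. A minimal sketch using NumPy (note that NumPy indexes from 0, so the math entry \(a_{12}\) is `A[0, 1]` in code):

```python
import numpy as np

# The 2x3 example matrix from above
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)   # (2, 3): m rows, n columns
print(A[0, 1])   # 2, i.e. the entry a_{12} in 1-based math notation
```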

Viewing a Matrix as Columns or Rows

Two complementary views are constantly useful.

Column view. Each column of \(A\) is a vector in \(\mathbb{R}^m\):

\[ A = \begin{pmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \\ | & | & & | \end{pmatrix}. \]

Row view. Each row of \(A\) is a row vector in \(\mathbb{R}^n\):

\[ A = \begin{pmatrix} \text{---} \; \mathbf{r}_1^\top \; \text{---} \\ \text{---} \; \mathbf{r}_2^\top \; \text{---} \\ \vdots \\ \text{---} \; \mathbf{r}_m^\top \; \text{---} \end{pmatrix}. \]

Which view to use depends on the operation. Matrix-vector multiplication has clean interpretations in both.
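Both views correspond to simple slicing operations. A sketch, again assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

col_2 = A[:, 1]   # second column: a vector in R^2
row_1 = A[0, :]   # first row: a vector in R^3

print(col_2)      # [2 5]
print(row_1)      # [1 2 3]
```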

Special Matrices

  • Square: \(n \times n\); equal numbers of rows and columns.
  • Identity \(I_n\): \(n \times n\); \(I_{ij} = 1\) if \(i = j\), else \(0\).
  • Zero: any shape; all entries \(0\).
  • Diagonal: \(n \times n\); \(A_{ij} = 0\) for \(i \ne j\).
  • Symmetric: \(n \times n\); \(A = A^\top\).

The identity matrix plays the role of \(1\) in matrix arithmetic: \(AI = A\) and \(IA = A\) whenever the dimensions match.
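The dimension-matching caveat matters: for a non-square \(A \in \mathbb{R}^{2 \times 3}\), the identity on the left must be \(2 \times 2\) and the identity on the right must be \(3 \times 3\). A quick check in NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
I2 = np.eye(2)   # 2x2 identity
I3 = np.eye(3)   # 3x3 identity

# IA needs the 2x2 identity; AI needs the 3x3 identity.
print(np.array_equal(I2 @ A, A))   # True
print(np.array_equal(A @ I3, A))   # True
```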

Basic Operations

Scalar multiplication

\[ (cA)_{ij} = c \cdot a_{ij}. \]

Addition

Two matrices of the same shape add entry-wise:

\[ (A + B)_{ij} = a_{ij} + b_{ij}. \]

Transpose

The transpose \(A^\top\) flips rows and columns: \((A^\top)_{ij} = a_{ji}\). If \(A \in \mathbb{R}^{m \times n}\) then \(A^\top \in \mathbb{R}^{n \times m}\).

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \implies A^\top = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}. \]
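All three basic operations are entry-wise or index-swapping, so they map directly onto array operations. A sketch in NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

print(2 * A)    # scalar multiplication: each entry doubled
print(A + B)    # addition: entry-wise, same shape required
print(A.T)      # transpose: shape (3, 2), matching the worked example
```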

Matrix-Vector Multiplication

If \(A \in \mathbb{R}^{m \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\), the product \(A\mathbf{x} \in \mathbb{R}^m\) is defined by

\[ (A\mathbf{x})_i = \sum_{j=1}^n a_{ij} x_j \quad \text{for each row } i. \]

Row view: the \(i\)-th entry of \(A\mathbf{x}\) is the dot product of the \(i\)-th row of \(A\) with \(\mathbf{x}\).

Column view: \(A\mathbf{x}\) is a linear combination of the columns of \(A\) with coefficients from \(\mathbf{x}\):

\[ A\mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n. \]

Example

\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ 6 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 3 \end{pmatrix} + 6\begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 + 12 \\ 15 + 24 \end{pmatrix} = \begin{pmatrix} 17 \\ 39 \end{pmatrix}. \]

The dimensions must match: \(A\) has \(n\) columns and \(\mathbf{x}\) has \(n\) rows. The output has \(m\) rows, one per row of \(A\).
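Both interpretations can be checked numerically against the built-in product. A sketch, assuming NumPy's `@` operator:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

y = A @ x   # built-in matrix-vector product

# Row view: entry i is the dot product of row i with x
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: a linear combination of the columns of A
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]

print(y)    # [17 39], matching the worked example
print(np.array_equal(y, y_rows), np.array_equal(y, y_cols))  # True True
```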

Matrix-Matrix Multiplication

If \(A \in \mathbb{R}^{m \times k}\) and \(B \in \mathbb{R}^{k \times n}\), the product \(C = AB \in \mathbb{R}^{m \times n}\) is

\[ c_{ij} = \sum_{\ell=1}^k a_{i\ell} \, b_{\ell j}. \]

Equivalently, the \(j\)-th column of \(C\) is \(A\) times the \(j\)-th column of \(B\):

\[ AB = A \begin{pmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_n \\ | & & | \end{pmatrix} = \begin{pmatrix} | & & | \\ A\mathbf{b}_1 & \cdots & A\mathbf{b}_n \\ | & & | \end{pmatrix}. \]

Important: matrix multiplication is not commutative: in general \(AB \ne BA\), and \(BA\) may not even be defined unless the shapes are compatible.

Example

\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}. \]
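The worked example, plus a concrete check that swapping the order changes the answer, sketched in NumPy:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A @ B)   # [[19 22], [43 50]], matching the worked example
print(B @ A)   # [[23 34], [31 46]]: a different matrix, so AB != BA here
```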

Matrices as Data

A dataset of \(m\) examples, each with \(n\) features, is naturally stored as a matrix \(X \in \mathbb{R}^{m \times n}\):

  • Row \(i\) is the feature vector for example \(i\).
  • Column \(j\) is the values of feature \(j\) across all examples.

Operations like computing means, covariances, and linear model predictions all become matrix operations on \(X\).
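A small sketch of this idea with a hypothetical dataset: column-wise means give per-feature averages, and a linear model's predictions for every example at once are just \(X\mathbf{w}\). The weight vector `w` here is an arbitrary illustration, not a fitted model.

```python
import numpy as np

# Hypothetical tiny dataset: m = 3 examples, n = 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

feature_means = X.mean(axis=0)   # mean of each column (feature)
print(feature_means)             # [3. 4.]

w = np.array([0.5, -1.0])        # hypothetical weight vector
print(X @ w)                     # one prediction per row: [-1.5 -2.5 -3.5]
```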

What Comes Next

References

Strang, Gilbert. 2016. Introduction to Linear Algebra. 5th ed. Wellesley-Cambridge Press.