Vectors
Motivation
A scalar is a single number. But most real-world quantities — the position of a robot, the pixel colors in an image, the word frequencies in a document — are not single numbers; they are ordered lists of numbers that must be kept together and manipulated as a unit. A vector is the mathematical object for this (Strang 2016). Linear algebra is, at its core, the study of how vectors combine, transform, and relate — which is why vectors are the first concept in most machine learning curricula.
Definition
A vector in \(\mathbb{R}^n\) is an ordered list of \(n\) real numbers, written as a column:
\[ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. \]
The number \(v_i\) is the \(i\)-th component (or coordinate, or entry) of \(\mathbf{v}\). The integer \(n\) is the dimension of the vector. We write \(\mathbf{v} \in \mathbb{R}^n\) to say that \(\mathbf{v}\) is an \(n\)-dimensional real vector.
Vectors are usually written in bold lowercase (\(\mathbf{v}\), \(\mathbf{x}\), \(\mathbf{w}\)) to distinguish them from scalars (plain lowercase \(v\), \(x\), \(w\)).
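As a concrete sketch, a vector can be represented in Python as a NumPy array (one common choice; the component values below are arbitrary examples):

```python
import numpy as np

# A vector in R^3 represented as a NumPy array.
v = np.array([1.0, -2.0, 0.5])

print(v.shape)  # (3,): the dimension n is 3
print(v[0])     # 1.0: NumPy indexing is 0-based, so v[0] is the
                # mathematical component v_1
```

Note the indexing mismatch: mathematics numbers components from 1, while NumPy (like most programming languages) starts at 0.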
Geometric Interpretation
In two and three dimensions, a vector has a clean geometric picture: an arrow from the origin to a point whose coordinates are the vector's components.
In \(\mathbb{R}^2\), the components are the horizontal and vertical displacements. The vector
\[ \mathbf{v} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} \]
points 3 units to the right and 2 units up. Two vectors are equal if and only if all their corresponding components are equal.
Basic Operations
Scalar multiplication
Multiplying a vector by a scalar \(c \in \mathbb{R}\) scales each component:
\[ c \mathbf{v} = \begin{pmatrix} c v_1 \\ c v_2 \\ \vdots \\ c v_n \end{pmatrix}. \]
Geometrically, this stretches or shrinks the arrow by factor \(|c|\), reversing direction when \(c < 0\).
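A minimal sketch of scalar multiplication with NumPy arrays (illustrative values):

```python
import numpy as np

# Scalar multiplication scales each component.
v = np.array([3.0, 2.0])

print((2 * v).tolist())    # [6.0, 4.0]   stretched by factor 2
print((0.5 * v).tolist())  # [1.5, 1.0]   shrunk by factor 0.5
print((-1 * v).tolist())   # [-3.0, -2.0] same length, reversed direction
```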
Vector addition
Adding two vectors of the same dimension adds component-wise:
\[ \mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix}. \]
Geometrically this is the tip-to-tail rule: place the tail of \(\mathbf{v}\) at the tip of \(\mathbf{u}\); the sum is the arrow from the origin to the new tip.
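The component-wise formula can be checked directly (a sketch with NumPy; the vectors are illustrative):

```python
import numpy as np

# Vector addition is component-wise and requires equal dimensions.
u = np.array([1.0, 2.0])
v = np.array([3.0, 2.0])

print((u + v).tolist())  # [4.0, 4.0]
print((v + u).tolist())  # [4.0, 4.0]  addition is commutative
```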
Linear combination
A linear combination of vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_k\) with scalars \(c_1, \ldots, c_k\) is
\[ c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k. \]
Linear combinations are the fundamental operation of linear algebra: spans, linear independence, bases, and every matrix operation are all defined in terms of linear combinations.
Example
\[ 2 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 3 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}. \]
The standard basis vectors \(\mathbf{e}_1 = (1, 0)^\top\) and \(\mathbf{e}_2 = (0, 1)^\top\) are the building blocks: every 2D vector is a linear combination of them.
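The worked example above, and the decomposition of an arbitrary 2D vector into the standard basis, can be sketched as:

```python
import numpy as np

# The standard basis vectors in R^2
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# The linear combination 2*e1 + 3*e2 from the example above
combo = 2 * e1 + 3 * e2
print(combo.tolist())  # [2.0, 3.0]

# Every 2D vector decomposes as v = v_1*e1 + v_2*e2
v = np.array([-4.0, 7.0])
assert np.array_equal(v, v[0] * e1 + v[1] * e2)
```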
Length and Direction
The Euclidean norm (length) of a vector \(\mathbf{v} \in \mathbb{R}^n\) is
\[ \|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}. \]
This is Pythagoras’s theorem extended to \(n\) dimensions. A vector with \(\|\mathbf{v}\| = 1\) is called a unit vector.
To find the unit vector pointing in the same direction as a nonzero vector \(\mathbf{v}\), divide by its length:
\[ \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}. \]
Example
\[ \mathbf{v} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \|\mathbf{v}\| = \sqrt{9 + 16} = 5, \quad \hat{\mathbf{v}} = \begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix}. \]
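The same computation in NumPy, using the library's built-in Euclidean norm:

```python
import numpy as np

v = np.array([3.0, 4.0])

# Euclidean norm: sqrt(3^2 + 4^2) = 5
norm = np.linalg.norm(v)
print(norm)  # 5.0

# Normalize to a unit vector (valid only when v is nonzero)
v_hat = v / norm
print(v_hat.tolist())  # [0.6, 0.8]
print(np.isclose(np.linalg.norm(v_hat), 1.0))  # True
```

In floating point, the norm of a normalized vector may differ from 1 by rounding error, so comparisons use a tolerance rather than exact equality.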
The Zero Vector
The zero vector \(\mathbf{0} \in \mathbb{R}^n\) has all components equal to zero. It is the additive identity: \(\mathbf{v} + \mathbf{0} = \mathbf{v}\) for every \(\mathbf{v}\). Geometrically it is a point, not an arrow — it has no direction.
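A quick check of the additive-identity property (sketch; the vector values are arbitrary):

```python
import numpy as np

# The zero vector is the additive identity.
v = np.array([1.0, -2.0, 3.0])
zero = np.zeros(3)  # the zero vector in R^3

print((v + zero).tolist())  # [1.0, -2.0, 3.0], unchanged
```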
Vectors in Machine Learning
Vectors appear throughout machine learning:
- Feature vectors. A data point (house, image, email) is represented as a vector \(\mathbf{x} \in \mathbb{R}^d\) where each coordinate is a measured attribute.
- Weight vectors. A linear model is parameterized by a weight vector \(\mathbf{w} \in \mathbb{R}^d\); the prediction is \(\mathbf{w}^\top \mathbf{x}\).
- Word embeddings. Words are mapped to vectors, commonly in \(\mathbb{R}^{300}\) or \(\mathbb{R}^{768}\); semantic similarity corresponds to geometric proximity.
- Gradient. The gradient of a loss function with respect to model parameters is a vector pointing in the direction of steepest increase.
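The linear-model prediction \(\mathbf{w}^\top \mathbf{x}\) from the list above can be sketched with made-up feature and weight values (the numbers are purely illustrative):

```python
import numpy as np

# A toy linear model.
x = np.array([2.0, 1.0, 3.0])   # feature vector, d = 3
w = np.array([0.5, -1.0, 2.0])  # weight vector

# Prediction w^T x: 0.5*2 + (-1.0)*1 + 2.0*3 = 6.0
y_hat = w @ x
print(y_hat)  # 6.0
```

The `@` operator computes exactly the sum \(\sum_i w_i x_i\), which is the dot product previewed in the next section.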
What Comes Next
A single vector lives in isolation. The next questions are:
- How do you measure the angle between two vectors? — the dot product.
- What sets of vectors form useful collections? — spans and linear independence.
- How do you transform vectors systematically? — matrices and linear maps.