Vectors

Motivation

A scalar is a single number. But most real-world quantities — the position of a robot, the pixel colors in an image, the word frequencies in a document — are not single numbers; they are ordered lists of numbers that must be kept together and manipulated as a unit. A vector is the mathematical object for this (Strang 2016). Linear algebra is, at its core, the study of how vectors combine, transform, and relate — which is why vectors are the first concept in nearly every machine learning curriculum.

Definition

A vector in \(\mathbb{R}^n\) is an ordered list of \(n\) real numbers, written as a column:

\[ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. \]

The number \(v_i\) is the \(i\)-th component (or coordinate, or entry) of \(\mathbf{v}\). The integer \(n\) is the dimension of the vector. We write \(\mathbf{v} \in \mathbb{R}^n\) to say that \(\mathbf{v}\) is an \(n\)-dimensional real vector.

Vectors are usually written in bold lowercase (\(\mathbf{v}\), \(\mathbf{x}\), \(\mathbf{w}\)) to distinguish them from scalars (plain lowercase \(v\), \(x\), \(w\)).
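
To make the definition concrete, here is a minimal sketch in Python using NumPy — an implementation choice assumed for the examples in this section, not something the text prescribes. A vector is naturally stored as a one-dimensional array:

    import numpy as np

    # A 4-dimensional vector v = (3, 1, 4, 2)^T, stored as a 1-D array.
    v = np.array([3.0, 1.0, 4.0, 2.0])

    print(v.shape)  # (4,) -- the dimension n is 4
    print(v[0])     # 3.0  -- components are 1-indexed in math, 0-indexed in code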

Geometric Interpretation

In two and three dimensions, a vector has a clean geometric picture: an arrow from the origin to a point.

[Figure: a 2D vector \(\mathbf{v}\) drawn as an arrow from the origin, with its horizontal component \(v_1\) and vertical component \(v_2\) labeled along the x and y axes.]

[Figure: the components of a 4-dimensional vector, \(\mathbf{v} = (3, 1, 4, 2)^\top \in \mathbb{R}^4\).]

The components are the horizontal and vertical displacements. The vector

\[ \mathbf{v} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} \]

points 3 units to the right and 2 units up. Two vectors are equal if and only if all their corresponding components are equal.

Basic Operations

Scalar multiplication

Multiplying a vector by a scalar \(c \in \mathbb{R}\) scales each component:

\[ c \mathbf{v} = \begin{pmatrix} c v_1 \\ c v_2 \\ \vdots \\ c v_n \end{pmatrix}. \]

Geometrically, this stretches or shrinks the arrow by a factor of \(|c|\), reversing its direction when \(c < 0\).
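
Continuing the NumPy sketch (same assumption as above), scalar multiplication is written with the ordinary * operator and scales every component at once:

    import numpy as np

    v = np.array([3.0, 2.0])

    print(2.0 * v)   # [6. 4.]   -- stretched by a factor of 2
    print(0.5 * v)   # [1.5 1. ] -- shrunk by a factor of 2
    print(-1.0 * v)  # [-3. -2.] -- same length, direction reversed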

Vector addition

Adding two vectors of the same dimension adds component-wise:

\[ \mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix}. \]

Geometrically this is the tip-to-tail rule: place the tail of \(\mathbf{v}\) at the tip of \(\mathbf{u}\); the sum is the arrow from the origin to the new tip.
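
In the same illustrative NumPy setting, + on arrays of equal dimension performs exactly this component-wise addition:

    import numpy as np

    u = np.array([1.0, 2.0])
    v = np.array([3.0, 1.0])

    print(u + v)  # [4. 3.] -- each component is the sum of the corresponding components
    print(np.array_equal(u + v, v + u))  # True -- vector addition is commutative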

Linear combination

A linear combination of vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_k\) with scalars \(c_1, \ldots, c_k\) is

\[ c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k. \]

Linear combinations are the fundamental operation of linear algebra: spans, linear independence, bases, and every matrix operation are all defined in terms of linear combinations.

Example

\[ 2 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 3 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}. \]

The standard basis vectors \(\mathbf{e}_1 = (1, 0)^\top\) and \(\mathbf{e}_2 = (0, 1)^\top\) are the building blocks: every 2D vector is a linear combination of them.
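
The same computation in the running NumPy sketch, with hypothetical values x and y added to emphasize that every 2D vector arises this way:

    import numpy as np

    e1 = np.array([1.0, 0.0])  # standard basis vector e_1
    e2 = np.array([0.0, 1.0])  # standard basis vector e_2

    # The linear combination from the example: 2*e1 + 3*e2.
    print(2 * e1 + 3 * e2)  # [2. 3.]

    # Any 2D vector (x, y) is x*e1 + y*e2; x and y here are arbitrary.
    x, y = 7.0, -4.0
    print(x * e1 + y * e2)  # [ 7. -4.]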

Length and Direction

The Euclidean norm (length) of a vector \(\mathbf{v} \in \mathbb{R}^n\) is

\[ \|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}. \]

This is Pythagoras’s theorem extended to \(n\) dimensions. A vector with \(\|\mathbf{v}\| = 1\) is called a unit vector.

To find the unit vector pointing in the same direction as \(\mathbf{v}\), divide by its length:

\[ \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}. \]

Example

\[ \mathbf{v} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \|\mathbf{v}\| = \sqrt{9 + 16} = 5, \quad \hat{\mathbf{v}} = \begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix}. \]
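
In the NumPy sketch, the norm and the normalization step look like this (np.linalg.norm computes the Euclidean norm by default):

    import numpy as np

    v = np.array([3.0, 4.0])

    norm = np.linalg.norm(v)  # sqrt(3^2 + 4^2) = 5.0
    v_hat = v / norm          # the unit vector in the direction of v

    print(norm)                   # 5.0
    print(v_hat)                  # [0.6 0.8]
    print(np.linalg.norm(v_hat))  # 1.0 (up to floating-point rounding)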

The Zero Vector

The zero vector \(\mathbf{0} \in \mathbb{R}^n\) has all components equal to zero. It is the additive identity: \(\mathbf{v} + \mathbf{0} = \mathbf{v}\) for every \(\mathbf{v}\). Geometrically it is a point, not an arrow — it has no direction.
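
A one-line check in the NumPy sketch confirms the additive-identity property:

    import numpy as np

    v = np.array([3.0, 2.0])
    zero = np.zeros(2)  # the zero vector in R^2

    print(np.array_equal(v + zero, v))  # True -- adding the zero vector changes nothing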

Vectors in Machine Learning

Vectors appear throughout machine learning:

  • Feature vectors. A data point (house, image, email) is represented as a vector \(\mathbf{x} \in \mathbb{R}^d\) where each coordinate is a measured attribute.
  • Weight vectors. A linear model is parameterized by a weight vector \(\mathbf{w} \in \mathbb{R}^d\); the prediction is the dot product \(\mathbf{w}^\top \mathbf{x}\) (a code sketch follows this list).
  • Word embeddings. Words are mapped to vectors in \(\mathbb{R}^{300}\) or \(\mathbb{R}^{768}\); semantic similarity corresponds to geometric proximity.
  • Gradients. The gradient of a loss function with respect to the model parameters is a vector pointing in the direction of steepest increase.
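
As a sketch of the second bullet, here is a linear model's prediction in NumPy; the feature values and weights below are hypothetical, chosen only to illustrate the computation \(\mathbf{w}^\top \mathbf{x}\):

    import numpy as np

    # Hypothetical 3-feature data point for a house: [area in m^2, rooms, age in years].
    x = np.array([85.0, 3.0, 12.0])

    # Hypothetical learned weights of a linear model.
    w = np.array([2.5, 10.0, -0.8])

    # The prediction is the dot product w^T x.
    print(w @ x)  # ~232.9 = 2.5*85 + 10*3 - 0.8*12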

What Comes Next

A single vector lives in isolation. The next questions are: how do two vectors relate to each other (the dot product and the angle between them), and what can be built by combining many vectors (spans, linear independence, and bases)?

References

Strang, Gilbert. 2016. Introduction to Linear Algebra. 5th ed. Wellesley-Cambridge Press.