Vector Spaces

Motivation

The vectors that arise in machine learning, statistics, and physics do not always arrive as tuples of numbers. Polynomials of bounded degree, continuous functions, random variables, image patches, gradient fields — all of these can be added together and scaled, and all obey the same algebraic rules as \(\mathbb{R}^n\). A vector space is the abstraction that captures these rules in one definition (Axler 2015), so that theorems proved about “abstract vectors” apply uniformly to every concrete instance.

Working with vector spaces lets us speak about linear independence, span, bases, dimension, and linear maps without re-deriving them for each new setting. It is the framework in which essentially all of linear algebra is phrased.

Definition

A vector space over \(\mathbb{R}\) is a set \(V\) together with two operations,

  • addition \(+ : V \times V \to V\), and
  • scalar multiplication \(\cdot : \mathbb{R} \times V \to V\),

satisfying the following axioms for all \(\mathbf{u}, \mathbf{v}, \mathbf{w} \in V\) and \(\alpha, \beta \in \mathbb{R}\):

  • Associativity of \(+\): \((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})\)
  • Commutativity of \(+\): \(\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}\)
  • Additive identity: there exists \(\mathbf{0} \in V\) with \(\mathbf{v} + \mathbf{0} = \mathbf{v}\)
  • Additive inverse: for every \(\mathbf{v}\) there exists \(-\mathbf{v}\) with \(\mathbf{v} + (-\mathbf{v}) = \mathbf{0}\)
  • Compatibility of \(\cdot\): \(\alpha (\beta \mathbf{v}) = (\alpha \beta) \mathbf{v}\)
  • Scalar identity: \(1 \cdot \mathbf{v} = \mathbf{v}\)
  • Distributivity over \(+_V\): \(\alpha (\mathbf{u} + \mathbf{v}) = \alpha \mathbf{u} + \alpha \mathbf{v}\)
  • Distributivity over \(+_\mathbb{R}\): \((\alpha + \beta) \mathbf{v} = \alpha \mathbf{v} + \beta \mathbf{v}\)

Elements of \(V\) are called vectors; elements of \(\mathbb{R}\) are called scalars. Scalars can be drawn from \(\mathbb{C}\) or any other field, but in this course \(\mathbb{R}\) is the default.

The axioms encode familiar arithmetic. The crucial point is that they make no reference to coordinates, dimension, or geometry — once we know an object satisfies them, every general theorem of linear algebra applies.
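
As a quick sanity check rather than a proof, the sketch below spot-checks each axiom numerically in \(\mathbb{R}^n\) with randomly drawn vectors and scalars, assuming NumPy is available and tolerating floating-point round-off via np.allclose.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
u, v, w = rng.normal(size=(3, n))      # three random vectors in R^n
alpha, beta = rng.normal(size=2)       # two random scalars

# Associativity and commutativity of addition
assert np.allclose((u + v) + w, u + (v + w))
assert np.allclose(u + v, v + u)

# Additive identity and inverse
zero = np.zeros(n)
assert np.allclose(v + zero, v)
assert np.allclose(v + (-v), zero)

# Compatibility, scalar identity, and both distributive laws
assert np.allclose(alpha * (beta * v), (alpha * beta) * v)
assert np.allclose(1.0 * v, v)
assert np.allclose(alpha * (u + v), alpha * u + alpha * v)
assert np.allclose((alpha + beta) * v, alpha * v + beta * v)

print("all sampled axiom instances hold (up to floating-point tolerance)")
```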

Examples

Coordinate space \(\mathbb{R}^n\)

The prototype: \(n\)-tuples of real numbers, added componentwise, scaled by real factors. This is the setting most of linear algebra is first taught in.

Polynomials of degree at most \(d\)

Let \(P_d = \{a_0 + a_1 x + \cdots + a_d x^d : a_i \in \mathbb{R}\}\). Addition and scalar multiplication are the usual polynomial operations. The zero polynomial is the additive identity. \(P_d\) is a vector space of dimension \(d + 1\).
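
One concrete way to compute with \(P_d\) is to identify a polynomial with its coefficient vector \((a_0, \dots, a_d) \in \mathbb{R}^{d+1}\); polynomial addition and scaling then become ordinary vector operations. A minimal NumPy sketch (the evaluate helper and the specific polynomials are just for illustration):

```python
import numpy as np

d = 3  # work in P_3, polynomials of degree at most 3

# p(x) = 1 + 2x - x^3 and q(x) = 4x + 5x^2, stored as coefficient vectors
p = np.array([1.0, 2.0, 0.0, -1.0])   # (a_0, a_1, a_2, a_3)
q = np.array([0.0, 4.0, 5.0, 0.0])

def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_d x^d at x."""
    return sum(a * x**k for k, a in enumerate(coeffs))

x = 1.7
# Adding coefficient vectors matches adding the polynomials pointwise ...
assert np.isclose(evaluate(p + q, x), evaluate(p, x) + evaluate(q, x))
# ... and scaling coefficient vectors matches scaling the polynomial.
assert np.isclose(evaluate(2.5 * p, x), 2.5 * evaluate(p, x))

# The monomials 1, x, x^2, x^3 correspond to the standard basis of R^4,
# so dim P_3 = 4 = d + 1.
print("dim P_d =", d + 1)
```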

Continuous functions on \([0, 1]\)

Let \(C([0, 1])\) be the set of continuous real-valued functions on \([0, 1]\). Pointwise addition \((f + g)(x) = f(x) + g(x)\) and scaling \((\alpha f)(x) = \alpha f(x)\) make \(C([0, 1])\) a vector space — infinite-dimensional, but still a vector space.
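
A minimal sketch of the pointwise operations, with functions represented as plain Python callables; the add and scale helpers below are written just for this illustration, and comparing values on a sample grid only demonstrates the definitions rather than proving anything.

```python
import numpy as np

def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """Pointwise scaling: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)

f = np.sin                      # a continuous function on [0, 1]
g = lambda x: x**2              # another one

h = add(scale(3.0, f), g)       # the vector 3f + g in C([0, 1])

xs = np.linspace(0.0, 1.0, 101)              # sample grid in [0, 1]
assert np.allclose(h(xs), 3.0 * np.sin(xs) + xs**2)
```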

Solutions of a linear ODE

The set of solutions \(y(t)\) to \(y'' + y = 0\) is a vector space: if \(y_1\) and \(y_2\) both satisfy the equation, so does \(\alpha y_1 + \beta y_2\). This space turns out to be 2-dimensional, spanned by \(\sin t\) and \(\cos t\).
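
A quick numerical check of the closure claim: take \(y = \alpha \sin t + \beta \cos t\), approximate \(y''\) by central differences on a grid, and confirm that \(y'' + y \approx 0\). This is only a sketch under the stated discretization, not a proof.

```python
import numpy as np

alpha, beta = 2.0, -0.5
t = np.linspace(0.0, 10.0, 2001)
dt = t[1] - t[0]

y = alpha * np.sin(t) + beta * np.cos(t)      # a linear combination of solutions

# Central-difference approximation of y'' at the interior grid points
y_second = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dt**2

residual = y_second + y[1:-1]                 # should be ~0 if y'' + y = 0
print("max |y'' + y| on the grid:", np.abs(residual).max())
assert np.abs(residual).max() < 1e-3
```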

A non-example

The set \(\mathbb{R}_{\ge 0}\) of nonnegative reals is not a vector space under ordinary arithmetic: there is no additive inverse for any positive number. Closure under scalar multiplication fails too — multiplying any positive number by \(-1\) produces a result outside \(\mathbb{R}_{\ge 0}\).

Subspaces

A subspace of a vector space \(V\) is a subset \(W \subseteq V\) that is itself a vector space under the inherited operations. By the axioms, this reduces to three conditions:

  1. \(\mathbf{0} \in W\),
  2. \(W\) is closed under addition: \(\mathbf{u}, \mathbf{v} \in W \implies \mathbf{u} + \mathbf{v} \in W\),
  3. \(W\) is closed under scalar multiplication: \(\alpha \in \mathbb{R}, \mathbf{v} \in W \implies \alpha \mathbf{v} \in W\).

Equivalently: \(W\) contains \(\mathbf{0}\) and is closed under arbitrary linear combinations.
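
As an illustration, the sketch below takes \(W\) to be the kernel of one specific matrix \(A\) and spot-checks the three conditions numerically; the in_kernel membership test and the sample vectors are only for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # rank-1 matrix, so its kernel is 2-dimensional

def in_kernel(v, tol=1e-10):
    """Membership test for W = { x : A x = 0 }, up to numerical tolerance."""
    return np.linalg.norm(A @ v) < tol

# 1. The zero vector lies in W.
assert in_kernel(np.zeros(3))

# Two vectors that happen to lie in the kernel of A.
u = np.array([2.0, -1.0, 0.0])
v = np.array([3.0, 0.0, -1.0])
assert in_kernel(u) and in_kernel(v)

# 2. Closure under addition, and 3. closure under scalar multiplication.
assert in_kernel(u + v)
assert in_kernel(-4.2 * v)
```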

The span of any subset of \(V\) is the smallest subspace containing it. Conversely, every subspace of a finite-dimensional space is the span of some finite set.
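
To make the span concrete: stack a finite set of vectors as the columns of a matrix; the column space is their span, and its dimension is the matrix rank. A small NumPy sketch with vectors chosen for illustration:

```python
import numpy as np

# Three vectors in R^4; the third is a combination of the first two,
# so they span only a 2-dimensional subspace.
v1 = np.array([1.0, 0.0, 2.0, 0.0])
v2 = np.array([0.0, 1.0, 1.0, 3.0])
v3 = 2.0 * v1 - v2

S = np.column_stack([v1, v2, v3])               # span{v1, v2, v3} = column space of S
print("dim span =", np.linalg.matrix_rank(S))   # prints 2
```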

Examples of subspaces

  • Lines and planes through the origin in \(\mathbb{R}^3\).
  • The space of even polynomials inside \(P_d\).
  • The kernel \(\{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}\) of a matrix \(A\) — see matrix rank.
  • The column space and row space of a matrix — see spans.

A line not through the origin is not a subspace: it fails to contain \(\mathbf{0}\). This is the most common stumbling block.

Dimension

A vector space \(V\) is finite-dimensional if it has a finite spanning set. Every basis of a finite-dimensional space has the same size, and that size is the dimension \(\dim V\). Examples:

  • \(\dim \mathbb{R}^n = n\).
  • \(\dim P_d = d + 1\).
  • \(\dim C([0, 1]) = \infty\).

For any subspace \(W \subseteq V\) of a finite-dimensional space \(V\), \(\dim W \le \dim V\), with equality if and only if \(W = V\).

Why the Abstraction Matters

Phrasing results at the level of vector spaces — rather than re-proving them for each example — is what makes linear algebra reusable across domains.

  • A “linear regression” in \(\mathbb{R}^n\) and a “Fourier expansion” in \(C([0, 1])\) are both projections onto a subspace; the same projection theorem covers both.
  • The chain of theorems about bases, dimension, and linear maps is proved once and applies wherever the axioms hold.
  • When a new object — a probability distribution, a neural-network weight tensor, a graph signal — satisfies the axioms, every general theorem becomes available without reproof.

The cost of this abstraction is the conceptual leap from “vectors are arrows” to “vectors are anything obeying the rules.” The payoff is enormous: most of applied mathematics rests on it.
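
To make the first bullet in the list above concrete, the sketch below projects the same sampled function onto two different subspaces of \(\mathbb{R}^m\): the span of \(\{1, x\}\) (a linear regression fit) and the span of \(\{\sin 2\pi x, \cos 2\pi x\}\) (a crude Fourier expansion). Both use the identical least-squares call; the target function and basis choices are assumptions made purely for illustration.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 200)
y = np.exp(xs)                                  # some function sampled on a grid

def project(y, basis_columns):
    """Orthogonal projection of y onto the span of the given columns (via least squares)."""
    B = np.column_stack(basis_columns)
    coeffs, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B @ coeffs

# "Linear regression": project onto span{1, x}.
y_reg = project(y, [np.ones_like(xs), xs])

# "Fourier expansion": project onto span{sin 2*pi*x, cos 2*pi*x}.
y_fourier = project(y, [np.sin(2 * np.pi * xs), np.cos(2 * np.pi * xs)])

print("regression residual norm:", np.linalg.norm(y - y_reg))
print("Fourier residual norm:   ", np.linalg.norm(y - y_fourier))
```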

References

Axler, Sheldon. 2015. Linear Algebra Done Right. 3rd ed. Springer.