Simplex Method
Motivation
The simplex method (Dantzig 1963) is the most widely used algorithm for solving linear programs. It works by exploiting a key geometric fact: the optimal value of any LP, if finite, is achieved at a vertex of the feasible polytope. Simplex starts at one vertex and walks along edges to neighboring vertices, always improving the objective, until no improving neighbor exists.
Understanding simplex matters beyond just “how LP solvers work.” Warm starts, pivot rules, degeneracy, and formulation choices all affect practical solver performance. Branch-and-bound methods for integer programming rely on repeated LP solves that warm-start simplex from a previous basis, making simplex — not interior-point methods — the algorithm of choice inside most ILP solvers.
Problem
Given the standard-form linear program
\[ \max c^\top x \quad \text{s.t.} \quad Ax \le b, \quad x \ge 0, \]
find \(x\) achieving the maximum (if finite), or report unboundedness or infeasibility.
Key Ideas
Slack form: equalities instead of inequalities
Introduce one slack variable \(s_i \ge 0\) per constraint, converting each inequality to an equality:
\[ Ax + s = b, \quad x \ge 0, \quad s \ge 0. \]
The combined variable vector \((x, s) \in \mathbb{R}^{n+m}\) has \(n + m\) entries, and there are \(m\) equality constraints. The change is mechanical — slack form has the same feasible region as the original — but makes the linear-algebra steps cleaner.
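The conversion is a one-line matrix concatenation. A minimal numpy sketch, using the walkthrough LP from later in this article (all variable names are illustrative, not from any library):

```python
import numpy as np

# Walkthrough LP: maximize 3*x1 + 2*x2
# s.t. x1 + x2 <= 4,  x1 <= 3,  x2 <= 3,  x >= 0.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([4.0, 3.0, 3.0])

m, n = A.shape
# Slack form: append an m x m identity, one slack column per constraint,
# so that [A | I] @ (x, s) = b with (x, s) >= 0.
A_slack = np.hstack([A, np.eye(m)])

print(A_slack.shape)  # (m, n + m): 3 equalities in 5 variables
# When b >= 0, setting x = 0 and s = b is an obvious feasible point:
assert np.allclose(A_slack @ np.concatenate([np.zeros(n), b]), b)
```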
A basic feasible solution is a vertex
A basis \(B \subseteq \{1, \ldots, n+m\}\) is a set of \(m\) indices such that the corresponding columns of \([A \mid I]\) are linearly independent. The basic solution for \(B\) sets all non-basic variables (indices not in \(B\)) to zero and solves the resulting \(m \times m\) system for the \(m\) basic variables. If the resulting values are all non-negative, the basic solution is a basic feasible solution (BFS).
Every vertex of the feasible polytope corresponds to at least one BFS. Since the number of BFS is at most \(\binom{n+m}{m}\) (finite), any algorithm that moves between BFS without repeating will terminate. This is what turns the geometric “walk along edges” into a finite combinatorial procedure: each pivot exchanges one basic variable for one non-basic, and the algorithm needs to visit only finitely many bases.
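A short sketch of the basis-to-solution correspondence, using the walkthrough LP's slack form (numpy only; `basic_solution` is an illustrative helper, not a library function):

```python
import numpy as np

# Slack form of the walkthrough LP: [A | I] with b = (4, 3, 3).
A_slack = np.hstack([np.array([[1., 1.], [1., 0.], [0., 1.]]), np.eye(3)])
b = np.array([4., 3., 3.])

def basic_solution(basis):
    """Zero the non-basic variables, solve the m x m system for the basics."""
    B = A_slack[:, basis]             # basis matrix: chosen columns of [A | I]
    x = np.zeros(A_slack.shape[1])
    x[list(basis)] = np.linalg.solve(B, b)
    return x

# Basis {x1, x2, s2}: all values non-negative, so this is a BFS (vertex (1, 3)).
print(basic_solution([0, 1, 3]))
# Basis {x1, x2, s1}: s1 comes out negative, so a basic but infeasible solution.
print(basic_solution([0, 1, 2]))
```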
Reduced costs as the optimality certificate
For a basis \(B\) with basis matrix \(\mathbf{B}\) (the \(m\) columns of \([A \mid I]\) indexed by \(B\)), the reduced cost of non-basic variable \(j\) is
\[ \bar c_j = c_j - c_B^\top \mathbf{B}^{-1} a_j, \]
where \(a_j\) is column \(j\) of \([A \mid I]\) and \(c_B\) collects the objective coefficients of the basic variables. The reduced cost is the rate of change of the objective per unit increase in \(x_j\).
If \(\bar c_j \le 0\) for every non-basic \(j\), the current BFS is optimal — no improving direction exists at this vertex. Otherwise, some non-basic variable with positive reduced cost can enter the basis, and the corresponding edge of the polytope improves the objective.
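The formula can be checked numerically. This sketch computes all reduced costs at once via the simplex multipliers \(y^\top = c_B^\top \mathbf{B}^{-1}\), using the walkthrough LP and the basis it reaches after pivot 1 (names are illustrative):

```python
import numpy as np

A_slack = np.hstack([np.array([[1., 1.], [1., 0.], [0., 1.]]), np.eye(3)])
c = np.array([3., 2., 0., 0., 0.])   # objective coefficients incl. slacks
b = np.array([4., 3., 3.])

def reduced_costs(basis):
    """c̄_j = c_j - c_B^T B^{-1} a_j for every column j (basic ones come out 0)."""
    B = A_slack[:, basis]
    y = np.linalg.solve(B.T, c[basis])   # simplex multipliers c_B^T B^{-1}
    return c - y @ A_slack

# Basis after pivot 1 in the walkthrough: {s1, x1, s3} -> columns [2, 0, 4].
rc = reduced_costs([2, 0, 4])
print(rc)   # x2 has c̄ = 2 (still improving), s2 has c̄ = -3, matching the tableau
assert np.isclose(rc[1], 2.0) and np.isclose(rc[3], -3.0)
```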
Algorithm
```
find an initial basic feasible solution
while some non-basic variable j has c̄_j > 0:
    pick an entering variable j with c̄_j > 0
    compute the updated column ā_j = B⁻¹ a_j
    if ā_j ≤ 0 entry-wise: report the LP unbounded and stop
    minimum-ratio test: leaving variable i = argmin { x_i / ā_{ij} : ā_{ij} > 0 }
    pivot: swap j into the basis, i out
return the current BFS as optimal
```
Minimum-ratio test. Increasing non-basic variable \(j\) from \(0\) while holding all other non-basics at \(0\) decreases each basic variable \(x_i\) at rate \(\bar a_{ij}\). The first basic variable to hit zero is the \(\arg\min_i \{x_i / \bar a_{ij} : \bar a_{ij} > 0\}\); this variable leaves the basis.
Unboundedness. If \(\bar a_j \le 0\) for the entering variable \(j\), increasing \(j\) never drives any basic variable negative: the feasible region is unbounded in the direction of \(j\) and the LP has no finite optimum.
Initialization (Phase I). For LPs with \(b \ge 0\), the slack variables form an initial BFS: set \(x = 0\), \(s = b\). For general \(b\), Phase I finds a starting BFS by solving an auxiliary LP that minimizes the sum of artificial variables added to make the system feasible. If the Phase I optimum is zero, the artificials are all zero and we have a valid starting basis; otherwise the original LP is infeasible.
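Putting the pieces together, here is a minimal tableau implementation, assuming \(b \ge 0\) so the slack basis is a valid start and Phase I can be skipped. It uses Dantzig's entering rule; `simplex` is an illustrative sketch, not a production solver:

```python
import numpy as np

def simplex(c, A, b):
    """Tableau simplex for  max c^T x  s.t.  A x <= b, x >= 0,  assuming b >= 0.
    Returns (x, value); raises ValueError on an unbounded LP."""
    m, n = A.shape
    # Tableau rows: [A | I | b]; last row holds reduced costs and -(objective).
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[m, :n] = c
    basis = list(range(n, n + m))          # slacks form the starting basis
    while True:
        j = int(np.argmax(T[m, :-1]))      # entering: most positive reduced cost
        if T[m, j] <= 1e-9:
            break                          # all c̄_j <= 0: current BFS optimal
        col = T[:m, j]
        if (col <= 1e-9).all():
            raise ValueError("LP is unbounded")
        # Minimum-ratio test over rows with a positive pivot-column entry.
        ratios = np.where(col > 1e-9,
                          T[:m, -1] / np.where(col > 1e-9, col, 1.0), np.inf)
        r = int(np.argmin(ratios))         # leaving row
        T[r] /= T[r, j]                    # pivot: scale the pivot row,
        for i in range(m + 1):             # then eliminate column j elsewhere
            if i != r:
                T[i] -= T[i, j] * T[r]
        basis[r] = j
    x = np.zeros(n + m)
    x[basis] = T[:m, -1]
    return x[:n], -T[m, -1]

# Walkthrough LP: should reach x = (3, 1) with objective 11 in two pivots.
x, val = simplex(np.array([3., 2.]),
                 np.array([[1., 1.], [1., 0.], [0., 1.]]),
                 np.array([4., 3., 3.]))
print(x, val)
```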
Walkthrough
Use the LP from the linear programming geometry example (converted to maximization slack form):
\[ \text{maximize } 3x_1 + 2x_2, \quad x_1 + x_2 + s_1 = 4, \quad x_1 + s_2 = 3, \quad x_2 + s_3 = 3, \quad \text{all variables} \ge 0. \]
The diagram below shows the three vertices visited and the two pivot steps as a walk on the polytope.
Initial tableau
Initial basis \(B = \{s_1, s_2, s_3\}\), so \(x_1 = x_2 = 0\).
| basis | \(x_1\) | \(x_2\) | \(s_1\) | \(s_2\) | \(s_3\) | RHS |
|---|---|---|---|---|---|---|
| \(s_1\) | 1 | 1 | 1 | 0 | 0 | 4 |
| \(s_2\) | 1 | 0 | 0 | 1 | 0 | 3 |
| \(s_3\) | 0 | 1 | 0 | 0 | 1 | 3 |
| obj | 3 | 2 | 0 | 0 | 0 | 0 |
The objective row shows reduced costs \(\bar c_j\). Both \(x_1\) and \(x_2\) have positive reduced cost; the current BFS has objective \(0\).
Pivot 1: enter \(x_1\) (reduced cost 3), leave \(s_2\) (minimum ratio \(3/1 = 3\))
Ratios: \(s_1\) row gives \(4/1 = 4\); \(s_2\) row gives \(3/1 = 3\); \(s_3\) row has coefficient \(0\), so \(\infty\). Minimum ratio is \(3\), so \(s_2\) leaves.
Divide the \(s_2\) row by \(1\) (pivot element). Eliminate \(x_1\) from all other rows.
| basis | \(x_1\) | \(x_2\) | \(s_1\) | \(s_2\) | \(s_3\) | RHS |
|---|---|---|---|---|---|---|
| \(s_1\) | 0 | 1 | 1 | \(-1\) | 0 | 1 |
| \(x_1\) | 1 | 0 | 0 | 1 | 0 | 3 |
| \(s_3\) | 0 | 1 | 0 | 0 | 1 | 3 |
| obj | 0 | 2 | 0 | \(-3\) | 0 | 9 |
BFS: \(x_1 = 3\), \(x_2 = 0\), \(s_1 = 1\), \(s_3 = 3\). Objective \(= 9\).
Pivot 2: enter \(x_2\) (reduced cost 2), leave \(s_1\) (minimum ratio \(1/1 = 1\))
Ratios: \(s_1\) row gives \(1/1 = 1\); \(x_1\) row has coefficient \(0\), so \(\infty\); \(s_3\) row gives \(3/1 = 3\). Minimum is \(1\).
| basis | \(x_1\) | \(x_2\) | \(s_1\) | \(s_2\) | \(s_3\) | RHS |
|---|---|---|---|---|---|---|
| \(x_2\) | 0 | 1 | 1 | \(-1\) | 0 | 1 |
| \(x_1\) | 1 | 0 | 0 | 1 | 0 | 3 |
| \(s_3\) | 0 | 0 | \(-1\) | 1 | 1 | 2 |
| obj | 0 | 0 | \(-2\) | \(-1\) | 0 | 11 |
All reduced costs \(\le 0\). Optimal: \(x_1 = 3\), \(x_2 = 1\), objective \(= 11\). This matches vertex \(C\) from the geometry diagram.
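Because the number of bases is finite, the claimed optimum can be double-checked by brute force: enumerate every basis, keep the feasible ones, and take the best objective. A sketch for this small example only; real instances have far too many bases for this:

```python
import numpy as np
from itertools import combinations

A_slack = np.hstack([np.array([[1., 1.], [1., 0.], [0., 1.]]), np.eye(3)])
b = np.array([4., 3., 3.])
c = np.array([3., 2., 0., 0., 0.])

best = -np.inf
for basis in combinations(range(5), 3):   # all candidate bases
    B = A_slack[:, basis]
    if abs(np.linalg.det(B)) < 1e-9:
        continue                           # columns dependent: not a basis
    xB = np.linalg.solve(B, b)
    if (xB >= -1e-9).all():                # feasible -> a BFS (a vertex)
        x = np.zeros(5)
        x[list(basis)] = xB
        best = max(best, c @ x)

print(best)   # best objective over all vertices
assert np.isclose(best, 11.0)
```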
Correctness
Soundness. When the algorithm halts with \(\bar c_j \le 0\) for all non-basic \(j\), no edge improves the objective, so the current vertex is locally optimal. Convexity of the LP elevates “locally optimal at a vertex” to “globally optimal.”
Termination via Bland’s rule. A BFS is degenerate if some basic variable has value zero — that is, more than \(n\) constraints are tight at the same vertex. A pivot at a degenerate BFS may leave the objective unchanged (the entering variable enters at level \(0\)). In the worst case, a sequence of degenerate pivots can cycle back to a previously visited basis, looping forever.
Bland’s rule (Bland 1977): enter the non-basic variable with the smallest index among those with positive reduced cost; break minimum-ratio ties the same way. This eliminates cycling and guarantees finite termination. Most practical solvers handle degeneracy through perturbation (slightly randomizing the right-hand side \(b\)) or the lexicographic rule, both of which also guarantee finite termination.
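The entering-variable choice under Bland's rule is a one-liner. A minimal sketch (`blands_entering` is an illustrative name, not a library function):

```python
import numpy as np

def blands_entering(reduced, tol=1e-9):
    """Bland's rule: smallest index among variables with positive reduced cost.
    Returns None when no such index exists (the current BFS is optimal)."""
    candidates = np.flatnonzero(reduced > tol)
    return int(candidates[0]) if candidates.size else None

# Dantzig's rule would enter index 2 (largest reduced cost); Bland enters 1.
assert blands_entering(np.array([-1.0, 0.5, 3.0, 0.0])) == 1
# All reduced costs non-positive: optimal, nothing enters.
assert blands_entering(np.array([-1.0, 0.0, -2.0])) is None
```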
Complexity and Tradeoffs
The worst-case number of pivots is exponential. The Klee–Minty cube (Klee and Minty 1972) is a family of LPs in \(n\) variables where the most-positive-reduced-cost rule visits all \(2^n\) vertices.
In practice, simplex is fast, typically taking a number of pivots polynomial in \(n\) and \(m\) on real-world instances. Smoothed analysis (Spielman and Teng 2004) explains this: if the input is perturbed by tiny random noise, simplex runs in polynomial time with high probability, matching the observed behavior on practical data.
Pivot rules
The algorithm is correct for any choice of entering variable with positive reduced cost, but the pivot rule affects termination and speed.
- Dantzig’s rule. Enter the variable with the most positive reduced cost. Intuitive and fast in practice.
- Bland’s rule. Smallest-index entering variable. Slower in practice but guarantees termination — useful as a fallback when cycling is detected.
- Steepest edge. Enter the variable whose pivot direction has the largest normalized objective improvement. Expensive per step but often pays off in pivot count; the default in most production solvers.
When to Use It
| Situation | Approach |
|---|---|
| LP with warm start available (branch-and-bound, parametric resolves) | Simplex — exploits the previous basis. |
| Massive sparse LP, cold start | Interior-point methods are often faster; a simplex crossover step can then polish the solution to a vertex. |
| Need an exact vertex solution (e.g., for ILP) | Simplex — interior-point returns interior-of-face solutions. |
| Highly degenerate LP | Use a steepest-edge or Bland’s-rule simplex; consider an interior-point method. |
| Very large LP, dense | Interior-point methods scale better in \(n\). |
Variants
- Dual simplex. Maintains dual feasibility instead of primal feasibility, pivoting toward primal feasibility. The standard choice for warm-starting after a constraint is added (e.g., in branch-and-bound).
- Revised simplex. Stores \(\mathbf{B}^{-1}\) implicitly rather than the full tableau, reducing memory and per-pivot work on sparse problems.
- Parametric simplex. Solves a family of LPs parameterized by one parameter (e.g., one varying coefficient) in a single pass.