Minimax Value
Motivation
In a two-player zero-sum game, one player’s gain is exactly the other’s loss. With both players acting optimally, the natural notion of “the value of the game” must account for the fact that each player will exploit any predictable mistake by the other. The minimax value captures this: it is the payoff each player can guarantee against any strategy of the opponent (Russell and Norvig 2020).
Setup
Two players, MAX and MIN, alternate moves. Each terminal position \(z\) has a payoff \(u(z) \in \mathbb{R}\) that MAX wants to maximize and MIN wants to minimize. Let \(\text{Actions}(s)\) be the legal moves at non-terminal state \(s\) and \(\text{Result}(s, a)\) the resulting state.
Game-Tree Minimax Value
Define the minimax value \(V(s)\) recursively over the game tree:
\[ V(s) = \begin{cases} u(s) & \text{if } s \text{ is terminal,} \\ \max_{a \in \text{Actions}(s)} V(\text{Result}(s, a)) & \text{if MAX moves at } s, \\ \min_{a \in \text{Actions}(s)} V(\text{Result}(s, a)) & \text{if MIN moves at } s. \end{cases} \]
\(V(s)\) is the payoff that results when both players play optimally from \(s\) onward. The associated minimax-optimal policy chooses, at each MAX node, an action whose child achieves the max — and analogously for MIN.
This recursion works for any finite game of perfect information: tic-tac-toe, chess (with a rule cap on game length), Nim. For infinite or merely impractically large games, \(V(s)\) is well-defined but cannot be computed exactly; see minimax search for the depth-limited approximation.
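The recursion can be sketched directly in Python. Here is a minimal example on a toy game of the article's own list, single-pile Nim, under the assumptions of this sketch: a move removes 1 to 3 objects, whoever takes the last object wins, and payoffs are ±1 from MAX's perspective.

```python
from functools import lru_cache

# Toy game: single-pile Nim. A move removes 1-3 objects;
# whoever takes the last object wins. Payoff is +1 if MAX wins, -1 if MIN wins.

def actions(pile):
    return [k for k in (1, 2, 3) if k <= pile]

@lru_cache(maxsize=None)
def V(pile, max_to_move):
    if pile == 0:
        # The player who just moved took the last object and won.
        return -1 if max_to_move else +1
    children = [V(pile - k, not max_to_move) for k in actions(pile)]
    return max(children) if max_to_move else min(children)

# Classical Nim theory: the player to move loses iff pile % 4 == 0.
print([V(n, True) for n in range(1, 9)])  # -> [1, 1, 1, -1, 1, 1, 1, -1]
```

The memoization via `lru_cache` collapses the exponential tree into one evaluation per distinct state, which is the standard trick when states repeat.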
Guarantee
The minimax value is exactly what each player can guarantee:
- MAX has a strategy that secures payoff at least \(V(s)\), regardless of MIN’s play.
- MIN has a strategy that secures payoff at most \(V(s)\), regardless of MAX’s play.
Because the game is zero-sum and finite with perfect information, these two bounds meet at the same number — there is no “exploitation” gap. This is the perfect-information analogue of the minimax theorem, which extends the same principle to simultaneous-move games via mixed strategies.
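The first bullet can be checked empirically: fix MAX to the minimax policy and let MIN play arbitrarily badly or well; the realized payoff never drops below \(V\). A self-contained sketch, again using single-pile Nim (remove 1-3 objects, last to take wins, payoffs ±1; these rules are an assumption of the sketch) with a randomly playing MIN:

```python
import random
from functools import lru_cache

def actions(pile):
    return [k for k in (1, 2, 3) if k <= pile]

@lru_cache(maxsize=None)
def V(pile, max_to_move):
    if pile == 0:
        return -1 if max_to_move else +1
    vals = [V(pile - k, not max_to_move) for k in actions(pile)]
    return max(vals) if max_to_move else min(vals)

def play(pile, rng):
    """MAX follows the minimax policy; MIN plays uniformly at random."""
    max_to_move = True
    while pile > 0:
        if max_to_move:
            k = max(actions(pile), key=lambda k: V(pile - k, False))
        else:
            k = rng.choice(actions(pile))
        pile -= k
        max_to_move = not max_to_move
    return -1 if max_to_move else +1  # the player who just moved won

rng = random.Random(0)
start = 5  # V(5, True) == +1: MAX to move can force a win
payoffs = [play(start, rng) for _ in range(1000)]
assert min(payoffs) >= V(start, True)  # the guarantee holds in every playout
```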
Matrix-Game Minimax Value
For simultaneous-move zero-sum games, the players choose actions without observing each other’s choice. The game is described by a payoff matrix \(A \in \mathbb{R}^{m \times n}\) where \(A_{ij}\) is the payoff to MAX when MAX plays row \(i\) and MIN plays column \(j\).
Restricting both players to deterministic (“pure”) strategies gives two generally distinct quantities:
\[ \underline{V} = \max_{i} \min_{j} A_{ij}, \qquad \overline{V} = \min_{j} \max_{i} A_{ij}. \]
\(\underline{V}\) is what MAX can guarantee by committing to a row first; \(\overline{V}\) is what MIN can hold MAX to by committing to a column first. The committer is at a disadvantage, so \(\underline{V} \le \overline{V}\), often strictly.
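Matching pennies makes the gap concrete: with the convention (an assumption of this example) that MAX wins +1 when the two choices match and loses 1 otherwise, committing to any pure strategy first is fatal. A short check of the two formulas:

```python
# Matching pennies: MAX wins +1 when the choices match, -1 otherwise.
A = [[1, -1],
     [-1, 1]]

# Pure-strategy security levels.
lower = max(min(row) for row in A)               # max_i min_j A_ij
upper = min(max(A[i][j] for i in range(len(A)))  # min_j max_i A_ij
            for j in range(len(A[0])))

print(lower, upper)  # -> -1 1
```

Here \(\underline{V} = -1 < 1 = \overline{V}\): whichever row MAX commits to, MIN has a column that wins, and vice versa.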
Allowing mixed strategies — distributions \(x \in \Delta_m\) for MAX and \(y \in \Delta_n\) for MIN — closes the gap:
\[ V = \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y. \]
This common value \(V\) is the minimax value of the game, and it is achieved by some pair \((x^*, y^*)\), which is a Nash equilibrium of the game. The equality of the max-min and min-max is von Neumann's minimax theorem.
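In matching pennies (payoff +1 to MAX on a match, -1 otherwise, the usual convention) the optimal pair is uniform mixing for both players, and the value is 0. A quick numeric check that \((x^*, y^*)\) is a saddle point, i.e. neither player's pure deviations gain anything:

```python
# Matching pennies; mixing uniformly closes the pure-strategy gap.
A = [[1, -1],
     [-1, 1]]
x_star = [0.5, 0.5]  # MAX's mixed strategy over rows
y_star = [0.5, 0.5]  # MIN's mixed strategy over columns

def payoff(x, A, y):
    """Expected payoff x^T A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

# Against x*, every pure MIN reply yields 0; against y*, every pure MAX reply yields 0.
col_payoffs = [payoff(x_star, A, [1 - j, j]) for j in (0, 1)]
row_payoffs = [payoff([1 - i, i], A, y_star) for i in (0, 1)]
print(col_payoffs, row_payoffs)  # -> [0.0, 0.0] [0.0, 0.0]
```

Since no pure deviation beats 0 for either player, \(V = 0\) and the gap between \(\underline{V} = -1\) and \(\overline{V} = 1\) has closed.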
Why the Two Definitions Agree
A finite perfect-information game can be written in matrix form by taking each player's strategies to be complete contingency plans (one action for every state where that player moves). Backward induction on the game tree produces a pair of pure strategies that simultaneously achieve the max-min and the min-max, so no randomization is needed: the equilibrium is in pure strategies, and the recursive \(V(s)\) above coincides with the matrix-game \(V\).
Computing the Minimax Value
- Game trees: minimax search evaluates \(V(s)\) by recursive expansion, optionally with alpha-beta pruning to cut off provably irrelevant branches.
- Matrix games: the minimax value and an optimal mixed strategy can be found by solving a linear program, in time polynomial in \(m\) and \(n\).
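The linear program can be sketched as follows, using `scipy.optimize.linprog`: the variables are MAX's mixed strategy \(x\) together with a scalar \(v\), maximized subject to \((x^\top A)_j \ge v\) for every column \(j\). This is one standard formulation, shown here as a minimal sketch rather than a tuned solver.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Solve max_x min_j (x^T A)_j as an LP; return (x*, V)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables z = (x_1, ..., x_m, v); linprog minimizes, so minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # v - (x^T A)_j <= 0 for each column j.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must be a probability distribution: sum(x) = 1, x >= 0; v is free.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

x_star, value = matrix_game_value([[1, -1], [-1, 1]])  # matching pennies
print(x_star, value)  # uniform mixing, value 0
```

MIN's optimal strategy comes from the dual LP (or from solving the transposed problem with the roles swapped).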
Generalizations
- General-sum games lack a single scalar value; equilibria are defined by best-response conditions rather than max-min.
- Stochastic moves (chance nodes) introduce expectimax-style averaging into the recursion.
- Imperfect information requires reasoning over information sets and admits more sophisticated solution concepts (e.g., counterfactual regret minimization).
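The chance-node extension in the second bullet amounts to one extra case in the recursion: a chance node's value is the probability-weighted average of its children. A toy sketch over an abstract tree (the node encoding here is an assumption of the example, not a standard API):

```python
# Expectimax sketch: a node is (kind, payload) with kind in
# {'leaf', 'max', 'min', 'chance'}. Chance payloads are (probability, child) pairs.
def value(node):
    kind, payload = node
    if kind == 'leaf':
        return payload
    if kind == 'max':
        return max(value(c) for c in payload)
    if kind == 'min':
        return min(value(c) for c in payload)
    # chance node: average the children, weighted by probability
    return sum(p * value(c) for p, c in payload)

# MAX chooses between a sure payoff of 3 and a fair coin flip over {0, 10}.
tree = ('max', [('leaf', 3),
                ('chance', [(0.5, ('leaf', 0)), (0.5, ('leaf', 10))])])
print(value(tree))  # -> 5.0
```

Note that with chance nodes the result is a guarantee in expectation only; a single playout can do worse than \(V\).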