NP-Completeness
Motivation
Computer science has many problems for which no polynomial-time algorithm is known: the travelling-salesman tour, the smallest vertex cover, the optimal Boolean satisfying assignment. Decades of effort have failed to find polynomial algorithms for any of them. Are we missing something clever, or are these problems fundamentally hard?
NP-completeness is the framework that lets us answer this rigorously. A problem is NP-complete if it is in NP (a candidate solution can be verified in polynomial time) and is NP-hard (every NP problem can be reduced to it in polynomial time). NP-complete problems are the “hardest in NP” — a polynomial algorithm for any one would solve all of them, settling the famous P vs. NP question (Cook 1971; Karp 1972; Garey and Johnson 1979).
So when a researcher proves a new problem NP-complete, the practical message is: stop looking for an exact polynomial algorithm and start looking for approximation, restriction, or special-case methods.
Setup: decision problems and verification
A decision problem asks a yes/no question of an input string. Examples: “does this graph have a Hamiltonian cycle?”, “does this Boolean formula have a satisfying assignment?”. Every optimization problem (“find the smallest vertex cover”) has a corresponding decision problem (“is there a vertex cover of size \(\le k\)?”), and the two are polynomial-time equivalent under reasonable conditions.
The class P is the set of decision problems decidable by a polynomial-time algorithm.
The class NP is the set of decision problems for which a candidate certificate can be verified in polynomial time. Formally: \(L \in \mathsf{NP}\) if there exists a polynomial \(p\) and a polynomial-time algorithm \(V\) such that for every \(x\), \[ x \in L \iff \exists c \text{ with } |c| \le p(|x|) \text{ and } V(x, c) = \text{accept}. \] The certificate \(c\) is the “solution”; \(V\) checks that it actually solves the instance.
Every problem in P is in NP (the verifier ignores the certificate and recomputes the answer). The unresolved question: is \(\mathsf{P} = \mathsf{NP}\)?
Polynomial-time reductions
A polynomial-time reduction from problem \(A\) to problem \(B\) — written \(A \le_P B\) — is a polynomial-time function \(f\) such that \[ x \in A \iff f(x) \in B. \] If \(A \le_P B\) and \(B\) has a polynomial algorithm, so does \(A\) (compute \(f(x)\), then run \(B\)’s algorithm). Reductions are transitive: \(A \le_P B\) and \(B \le_P C\) imply \(A \le_P C\).
NP-hardness and NP-completeness
A problem \(B\) is NP-hard if \(A \le_P B\) for every \(A \in \mathsf{NP}\). A problem \(B\) is NP-complete if it is in NP and is NP-hard.
So NP-complete problems are universal hard problems within NP: solving any one of them in polynomial time would solve every NP problem in polynomial time, proving \(\mathsf{P} = \mathsf{NP}\).
The Cook-Levin theorem
The first NP-complete problem was identified by Cook (and independently Levin) in 1971: Boolean satisfiability (SAT), the question of whether a propositional formula has a satisfying assignment, is NP-complete (Cook 1971). The proof builds a Boolean formula that simulates the verifier of an arbitrary NP language on a given input.
Once one problem is shown NP-complete, others follow by reduction: to show \(B\) is NP-complete, exhibit a polynomial-time reduction from a known NP-complete problem (often SAT) to \(B\), and check that \(B \in \mathsf{NP}\). Karp (1972) showed 21 problems NP-complete this way (Karp 1972), and the catalogue now contains thousands.
Catalogue of common NP-complete problems
| Problem | Decision form |
|---|---|
| SAT | Is there a satisfying assignment? |
| 3-SAT | SAT restricted to clauses of length 3 |
| Vertex cover | Is there a vertex cover of size \(\le k\)? |
| Independent set | Is there an independent set of size \(\ge k\)? |
| Clique | Is there a clique of size \(\ge k\)? |
| Hamiltonian cycle / TSP | Does a Hamiltonian cycle exist? Does a tour of length \(\le L\) exist? |
| Subset sum | Is there a subset of integers summing to \(T\)? |
| Knapsack (decision) | Is there a value-\(\ge V\) feasible packing? |
| 3-coloring | Is the graph 3-colorable? |
| Integer linear programming | Does an integer feasible solution exist? (proof) |
Example: reduction from subset-sum to knapsack
Subset-sum instance. Numbers \(\{3, 1, 4, 1, 5\}\), target \(T = 9\). Does some subset sum to exactly \(9\)?
Map to knapsack decision. For each element \(a_i\), create an item with \(w_i = v_i = a_i\). Set capacity \(W = T = 9\) and value threshold \(V = T = 9\). Ask: is there a subset of items with total weight \(\le 9\) and total value \(\ge 9\)?
Because \(w_i = v_i\), any feasible packing (weight \(\le 9\)) that achieves value \(\ge 9\) must have weight exactly \(9\) — so it picks a subset summing to exactly \(T\).
| Element | \(w_i\) | \(v_i\) |
|---|---|---|
| 3 | 3 | 3 |
| 1 | 1 | 1 |
| 4 | 4 | 4 |
| 1 | 1 | 1 |
| 5 | 5 | 5 |
Yes instance. The subset \(\{3, 1, 5\}\) sums to \(9\); the corresponding packing has weight \(3+1+5=9 \le W\) and value \(9 \ge V\). So the knapsack answer is yes.
Why yes/no is preserved. If a subset \(S\) sums to \(T\), then including those items gives value \(= T = V\) with weight \(= T = W\): a feasible knapsack packing achieving the threshold. Conversely, any knapsack packing with value \(\ge V = T\) must have weight \(\ge T\) (since \(v_i = w_i\)), so the weight constraint forces weight \(= T\) exactly — that selection is a subset summing to \(T\).
The reduction is polynomial (one item per element, three parameters). Therefore knapsack (decision) is NP-hard: a polynomial knapsack algorithm would solve subset-sum, which is NP-complete.
What does NP-completeness rule out?
NP-completeness does not mean the problem cannot be attacked. It means any algorithm with provable worst-case polynomial running time would prove \(\mathsf{P} = \mathsf{NP}\). In practice, NP-complete problems are tackled by:
- Approximation algorithms: run in polynomial time, return a solution within a guaranteed factor of optimal.
- Heuristics: no guarantees, but empirically fast (e.g., DPLL for SAT).
- Restricted instances: many NP-complete problems are polynomial-time on trees, planar graphs, bounded-treewidth graphs, etc.
- Pseudo-polynomial algorithms: polynomial in the numerical value of inputs, not bit-length (e.g., the knapsack DP).
- Parameterized algorithms: polynomial in \(n\) for each fixed parameter \(k\).
- LP relaxation: solve the linear-programming relaxation and round.
Beyond NP-completeness
NP is one floor of a much taller complexity tower:
- PSPACE: problems decidable in polynomial space (includes NP and likely more — game-playing, model checking).
- EXPTIME: problems decidable in exponential time (includes PSPACE).
- #P: counting problems (e.g., counting satisfying assignments of a SAT formula).
- \(\Sigma_2, \Pi_2\), polynomial hierarchy: stratification of problems by alternation of \(\exists\) and \(\forall\) quantifiers.
For data-science applications, “NP-hard” is the common warning label that exact solutions are out of reach and approximation or heuristics are the next move.