Boolean Satisfiability
Overview
A Boolean formula combines Boolean variables — each taking value true or false — using the connectives AND (\(\land\)), OR (\(\lor\)), NOT (\(\lnot\)), and implication (\(\to\)). The Boolean satisfiability problem (SAT) asks: does there exist an assignment of truth values to the variables that makes the formula evaluate to true?
The problem’s significance extends far beyond its simple statement. SAT was the first problem proved NP-complete (Cook 1971), making it the canonical hard problem of computational complexity theory. It also sits at the center of automated reasoning, formal verification, AI planning, and constraint solving.
Boolean Formulas
A Boolean variable \(x\) takes a value in \(\{\text{true}, \text{false}\}\). A literal is a variable or its negation: \(x\) or \(\lnot x\). Propositional formulas are built inductively from literals using the connectives \(\land\), \(\lor\), \(\lnot\), and \(\to\).
An assignment maps each variable to a truth value. An assignment satisfies a formula if the formula evaluates to true under it. A formula is:
- satisfiable if some assignment satisfies it,
- unsatisfiable (a contradiction) if no assignment does,
- a tautology if every assignment satisfies it.
For example: \(x \lor \lnot x\) is a tautology; \(x \land \lnot x\) is unsatisfiable; \(x \lor y\) is satisfiable (set either variable true). The satisfiability and tautology questions are duals: \(\phi\) is a tautology if and only if \(\lnot \phi\) is unsatisfiable.
Motivation
SAT matters for two largely independent reasons. In computer science, it is the canonical NP-complete problem and the engine behind practical combinatorial search and formal verification. In artificial intelligence, it is the computational core of logical inference, planning, and constraint satisfaction — each of which reduces to a satisfiability query. Modern solvers serve both communities through the same algorithmic stack.
A Universal Language for Combinatorial Search
Many combinatorial problems encode naturally as SAT. To solve a problem:
- Introduce Boolean variables representing the choices (e.g., “is vertex \(v\) colored red?”).
- Write clauses enforcing the constraints (e.g., “adjacent vertices cannot both be red”).
- Hand the formula to a SAT solver; a satisfying assignment is a solution.
Graph coloring, scheduling, bin packing, and circuit synthesis all admit such encodings. This uniformity means that any advance in SAT solving technology immediately benefits all of these domains.
Formal Verification
Bounded model checking is the main industrial application of SAT (Biere et al. 1999). A hardware circuit or software system is modeled as a transition system; the question “can the system reach an error state within \(k\) steps?” is encoded as a SAT formula over \(k\) copies of the state variables. If satisfiable, the satisfying assignment is a concrete error trace. This approach displaced earlier symbolic methods for many practical verification tasks and is now standard in hardware design.
NP-Completeness
SAT’s most consequential role in computer science is as the first NP-complete problem. Cook (1971) proved in 1971 that:
- SAT is in NP: a proposed satisfying assignment can be verified in polynomial time by evaluating the formula.
- Every problem in NP can be reduced to SAT in polynomial time.
The reduction encodes the computation of a non-deterministic polynomial-time machine as a Boolean formula: variables represent the machine’s configuration at each step, and clauses enforce valid transitions, the initial state, and acceptance (proof).
This means SAT is at least as hard as every problem in NP. A polynomial-time SAT algorithm would show that P = NP, solving one of the central open problems in mathematics and computer science.
Karp (1972) immediately applied Cook’s result: by reducing SAT — and its restriction to three-variable disjunctive clauses, known as 3-SAT — to dozens of other problems, he established NP-completeness for graph coloring, Hamiltonian cycle, set cover, clique, vertex cover, and more. This reduction chain became the standard methodology for proving computational hardness. Today, to prove that a new problem is NP-hard, one typically reduces a known NP-complete problem (often 3-SAT) to it in polynomial time.
Knowledge Representation and Logical Inference
Early AI systems represented an agent’s knowledge as a set of propositional sentences. Checking whether a sentence \(\phi\) follows from a knowledge base \(\mathcal{K}\) is equivalent to checking whether \(\mathcal{K} \land \lnot\phi\) is unsatisfiable: if no assignment satisfies the knowledge base while falsifying \(\phi\), then \(\phi\) must be true (Russell and Norvig 2020). SAT is thus the computational core of logical inference.
Resolution-based theorem proving and the Davis-Putnam-Logemann-Loveland algorithm were developed specifically for this use case. Davis-Putnam-Logemann-Loveland remains the basis for every modern SAT solver and, through it, underpins any system that reasons using propositional logic.
Planning
Classical AI planning — finding a sequence of actions to reach a goal from an initial state — encodes naturally as SAT (Kautz and Selman 1992). The encoding introduces Boolean variables for which actions execute at each time step and writes clauses enforcing:
- Preconditions: an action can only be taken if its preconditions hold in the preceding state.
- Effects: taking an action changes specified state variables.
- Frame axioms: variables not mentioned by an action’s effects persist unchanged.
- Goal conditions: the goal holds at the final time step.
A satisfying assignment corresponds to a valid plan. This reduction showed that general-purpose SAT solvers could match or outperform specialized planners on large instances, establishing SAT-based planning as a practical approach.
Constraint Satisfaction
Many AI problems are posed as constraint satisfaction problems (CSPs): assign values to variables subject to relational constraints. A CSP over finite domains can be encoded as SAT: Boolean variables represent each (variable, value) pair, and clauses enforce the constraints. This encoding lets CSP solvers borrow SAT advances, while CSP techniques — arc consistency, domain propagation — have in turn migrated into SAT solvers. The two fields have developed in close dialogue, with backtracking search and conflict-driven learning appearing in both.
Diagnosis and Configuration
Model-based diagnosis asks: given a description of a system and a set of observations, which components are faulty? Checking whether a fault hypothesis is consistent with the observations is a satisfiability query. Configuration — choosing components and parameters to meet a specification — is similarly encodable. Both drove early industrial adoption of SAT solvers in AI.
Hardness and Practical Solvers
SAT is NP-complete, so no polynomial-time algorithm is known, and none is expected unless P = NP. Yet in practice, modern SAT solvers handle formulas with millions of variables on problems from hardware verification, software analysis, and planning. The gap between worst-case intractability and practical success is explained by the structure of real-world instances: circuit encodings, planning problems, and software constraints carry regularity — small conflict clauses, locality, symmetry — that clause learning and unit propagation exploit effectively.
The Davis-Putnam-Logemann-Loveland algorithm and its modern descendant conflict-driven clause learning (CDCL) are the computational engines making this possible. The practical success of SAT solving is one of the more surprising empirical facts in computer science: a problem known to be worst-case intractable is routinely solved at industrial scale.