Davis-Putnam-Logemann-Loveland

Motivation

The Boolean satisfiability problem (SAT) asks whether a given Boolean formula has a satisfying assignment. SAT was the first problem proved NP-complete, and it remains one of the most heavily engineered combinatorial problems: practical SAT solvers regularly handle formulas with millions of variables, well beyond what any worst-case bound predicts.

The Davis-Putnam-Logemann-Loveland (DPLL) algorithm is the foundation of modern SAT solvers (Davis et al. 1962; Russell and Norvig 2020). It is backtracking search with two pieces of essential constraint propagation — unit propagation and pure literal elimination — that together transform a hopeless brute-force search into something effective.

Problem

Decide whether a Boolean formula in conjunctive normal form (CNF) has a satisfying assignment, and return one if so.

A CNF formula is a conjunction of clauses, each clause a disjunction of literals (a variable or its negation). For example,

\[ (x_1 \lor \lnot x_2) \,\land\, (x_2 \lor x_3) \,\land\, (\lnot x_1 \lor \lnot x_3). \]

Any propositional formula has an equivalent CNF formula (with auxiliary variables, in linear size via the Tseitin transformation), so this is no real restriction.

A clause is satisfied if at least one of its literals is true. The whole formula is satisfied iff every clause is. A partial assignment binds some variables; clauses whose remaining literals are all currently false are conflicting.

Key Ideas

DPLL is plain backtracking search over Boolean variables, but two propagation rules — both forms of “this assignment is forced, not chosen” — are powerful enough to make the search practical.

Unit propagation

A clause whose remaining literals are all false except one is a unit clause. The remaining literal must be true for the clause to be satisfiable, so the algorithm is forced to assign it. Repeated unit propagation may cascade: each forced assignment can produce new unit clauses, sometimes resolving many variables in a single sweep.

Unit propagation does the heavy lifting in practice. On structured formulas — circuit verification, planning encodings, software model checking — most variables are not free choices but logical consequences of earlier ones. A typical SAT instance with millions of variables has only thousands of decisions; the rest are propagated. The exponential search tree is therefore tiny in practice even though the worst case is unchanged.

Pure literal elimination

A literal is pure if it appears with only one polarity in the remaining clauses (always positive or always negative). Assigning a pure literal to true cannot conflict — it only satisfies clauses, never falsifies them — so the algorithm can do it without branching.

Pure literal elimination is rarely critical in modern solvers and is sometimes omitted, because it requires global formula scans that interact poorly with incremental data structures.

Why these rules are different from decisions

Both rules are pure constraint propagation: they assign variables that must take a specific value given the current state, with no guessing required. Decisions, by contrast, pick freely among unforced variables and may need to be undone on backtrack. The work of a SAT solver is choosing decisions wisely so that propagation does as much of the work as possible.

Algorithm

function DPLL(F, assignment):
    F, assignment = UNIT-PROPAGATE(F, assignment)
    F, assignment = PURE-LITERAL-ELIMINATE(F, assignment)
    if F has an empty clause: return failure       # conflict
    if F has no clauses: return assignment         # all satisfied
    x = CHOOSE-VARIABLE(F)
    return DPLL(F ∪ {x}, assignment ∪ {x = true})
        or DPLL(F ∪ {¬x}, assignment ∪ {x = false})

The recursion picks an unassigned variable, tries true, and on failure tries false. Each branch may trigger more propagation, which may force many more assignments before the next branch point.

Walkthrough

One decision, three unit propagations

Consider the CNF formula

\[ F = (x_1 \lor x_2) \,\land\, (\lnot x_2 \lor x_3) \,\land\, (\lnot x_3 \lor \lnot x_1). \]

No clause is unit yet, and no literal is pure (each of \(x_1, x_2, x_3\) appears in both polarities). DPLL must branch — try \(x_1 = \text{true}\).

Step Action \(F\) becomes Forced
1 Decide \(x_1 = T\) \((\lnot x_2 \lor x_3) \land (\lnot x_3)\)
2 Unit propagate \((\lnot x_3)\)\(x_3 = F\) \((\lnot x_2 \lor x_3) \to (\lnot x_2)\) \(x_3 = F\)
3 Unit propagate \((\lnot x_2)\)\(x_2 = F\) all clauses satisfied \(x_2 = F\)

The formula is satisfied with \(x_1 = T\), \(x_2 = F\), \(x_3 = F\). No backtracking was needed because the very first decision triggered a cascade of unit propagations that resolved the remaining variables.

If the first decision had instead been \(x_1 = F\), the clause \((x_1 \lor x_2)\) would become \((x_2)\), unit propagating \(x_2 = T\), which would make \((\lnot x_2 \lor x_3)\) become \((x_3)\), unit propagating \(x_3 = T\) — also satisfiable. Both branches of \(x_1\) succeed for this formula.

Correctness

Soundness. Both propagation rules preserve satisfiability: unit propagation assigns the only literal that could satisfy a unit clause, and pure literal elimination assigns a literal whose polarity never falsifies any clause. The branching step is exhaustive: if a satisfying assignment exists, it gives \(x\) some value, and DPLL tries both. So DPLL never returns “unsatisfiable” for a satisfiable formula, and the assignment it returns is always satisfying (every clause is satisfied or has been removed).

Termination. Each recursive call binds at least one previously free variable, and a formula with all variables bound either has an empty clause (failure) or has no clauses (success). The recursion depth is at most the number of variables.

Complexity and Tradeoffs

DPLL is exponential in the worst case — SAT is NP-complete, so unless \(\text{P} = \text{NP}\), no polynomial-time algorithm exists. The empirical performance gap between worst case and typical case is huge: random 3-SAT instances near the satisfiability threshold are hard for current solvers at a few hundred variables, while industrial instances with millions of variables are routinely solved. Understanding which structural properties of a formula make it tractable for SAT solvers is an active research area connecting proof complexity, parameterized complexity, and graph theory.

Memory in pure DPLL is \(O(n + m)\) for \(n\) variables and \(m\) clause occurrences — only the current partial assignment and the formula need to be stored. CDCL (below) trades memory for speed by learning new clauses; modern solvers periodically delete learned clauses to keep memory bounded.

When to Use It

Situation Approach
Industrial SAT (verification, planning) CDCL (DPLL + clause learning + VSIDS + two-watched-literals). Don’t roll your own — use MiniSat, CaDiCaL, Kissat.
Random 3-SAT near satisfiability threshold Hard for both DPLL and CDCL; stochastic local search (WalkSAT) often competitive on satisfiable instances.
Counting solutions (#SAT) Use a model-counting variant — see Variants.
Optimization (MaxSAT) Use a MaxSAT solver — branch-and-bound on top of DPLL.
Non-Boolean reasoning (linear arithmetic, arrays) Use an SMT solver, which layers DPLL over theory-specific decision procedures.

Variants

  • CDCL (Conflict-Driven Clause Learning). Every modern SAT solver. Extends DPLL with three additions: conflict analysis identifies a small set of decisions that caused a conflict; clause learning encodes that conflict as a new clause added to the formula; non-chronological backjumping returns directly to the earliest involved decision. CDCL also relies on heuristic decision strategies (VSIDS — variable activity scores updated on conflicts), efficient unit propagation (the two-watched-literals scheme), and periodic restarts and clause deletion.
  • Stochastic local search (WalkSAT, GSAT). Start from a random assignment, flip variables to reduce conflicts. Incomplete (cannot prove unsatisfiability) but often fast on satisfiable instances.
  • #SAT and model counting. Modify DPLL to count satisfying assignments rather than find one. Used in probabilistic inference.
  • MaxSAT. Find an assignment maximizing the number of satisfied clauses. Branch-and-bound extensions of DPLL apply.
  • SMT solvers. Layer DPLL over theory-specific reasoners (linear arithmetic, arrays, bit-vectors) to decide formulas in richer logics.

References

Davis, Martin, George Logemann, and Donald Loveland. 1962. “A Machine Program for Theorem-Proving.” Communications of the ACM 5 (7): 394–97. https://doi.org/10.1145/368273.368557.
Russell, Stuart, and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach. 4th ed. Pearson. https://www.pearson.com/en-us/subject-catalog/p/artificial-intelligence-a-modern-approach/P200000003500/9780137505135.