Backward Chaining
Motivation
Backward chaining is a goal-directed inference strategy: start from the query and work backward through rules, reducing the goal to subgoals, until all subgoals are known facts (Russell and Norvig 2020). It focuses computation on rules relevant to the goal, avoiding the derivation of irrelevant facts that forward chaining would produce.
Problem
Given a definite Horn clause knowledge base (facts and rules \(p_1 \land \cdots \land p_n \Rightarrow q\)) and a query atom \(g\) that may contain variables, return a substitution \(\theta\) such that \(\theta(g)\) is entailed by the KB, or report failure if no such substitution exists.
Backward chaining uses unification (see resolution) to match goals against rule heads containing variables.
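As a concrete illustration, unification can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions (variables are `?`-prefixed strings, compound terms are tuples, and the occurs check is omitted, as in most Prolog systems), not a reference implementation.

```python
def is_var(t):
    """Variables are strings prefixed with '?'; everything else is a constant or compound."""
    return isinstance(t, str) and t.startswith("?")

def unify(a, b, subst):
    """Return a substitution extending `subst` that makes a and b equal, or None on failure."""
    if subst is None:
        return None
    if a == b:
        return subst
    if is_var(a):
        return unify_var(a, b, subst)
    if is_var(b):
        return unify_var(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):          # unify argument lists pairwise
            subst = unify(x, y, subst)
        return subst
    return None

def unify_var(v, t, subst):
    """Bind variable v to term t, following existing bindings first (no occurs check)."""
    if v in subst:
        return unify(subst[v], t, subst)
    if is_var(t) and t in subst:
        return unify(v, subst[t], subst)
    return {**subst, v: t}

# Matching a goal against a rule head:
unify(("ancestor", "?x", "?y"), ("ancestor", "alice", "carol"), {})
# → {'?x': 'alice', '?y': 'carol'}
```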
Key Ideas
Reduce a goal to subgoals
If the goal \(g\) matches a known fact, we are done. Otherwise, find a rule \(p_1 \land \cdots \land p_n \Rightarrow q\) whose head \(q\) unifies with \(g\). The proof of \(g\) now reduces to proving each premise \(p_i\) under the unifier — turning a single goal into \(n\) smaller goals that share variable bindings through the substitution.
This recursive structure is depth-first search on the proof tree. Each rule choice is a branching point; each premise becomes a recursive call.
Why work backward from the goal
Forward chaining derives every consequence of the KB, then checks whether the goal happens to appear. Backward chaining only ever follows rules whose head unifies with a current goal — so a rule that has nothing to do with \(g\) is never even examined. On large KBs with focused queries this prunes most of the work.
Depth-first means incomplete in general
Standard backward chaining explores rule choices in the order they appear, depth-first. When a branch fails, it backtracks and tries the next applicable rule. This is the execution model of Prolog, where the knowledge base is a Prolog program and queries are goals.
The trade-off: depth-first search can plunge into an infinite recursion on left-recursive rules and never return to try alternatives that would have succeeded. So backward chaining trades completeness for goal-focused efficiency.
Algorithm
To prove goal \(g\) with current substitution \(\theta\):
prove(g, θ):
    if θ(g) unifies with a known fact (unifier θ_f): return θ_f ∘ θ
    for each rule (p₁ ∧ … ∧ pₙ ⇒ q) in KB (with fresh variable names):
        θ₀ ← unify(θ(g), q)
        if θ₀ succeeds:
            θ₁ ← prove(p₁, θ₀ ∘ θ)
            θ₂ ← prove(p₂, θ₁)
            …
            θₙ ← prove(pₙ, θₙ₋₁)
            if every subgoal succeeds: return θₙ
            otherwise backtrack and try the next rule
    fail
The algorithm returns a substitution \(\theta\) such that \(\theta(g)\) follows from \(KB\), or fails if no proof exists. The order in which rules are tried affects efficiency and termination — Prolog tries rules in the order they are written.
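The pseudocode can be fleshed out into a runnable sketch. The following is an illustrative Python implementation under simplifying assumptions (variables are `?`-prefixed strings, rules are `(premises, head)` pairs, no occurs check, answers produced lazily by generators), not a production prover.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(a, b, subst):
    """Return a substitution extending `subst` that makes a and b equal, or None."""
    if subst is None:
        return None
    if a == b:
        return subst
    if is_var(a):
        return unify_var(a, b, subst)
    if is_var(b):
        return unify_var(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
        return subst
    return None

def unify_var(v, t, subst):
    if v in subst:
        return unify(subst[v], t, subst)
    if is_var(t) and t in subst:
        return unify(v, subst[t], subst)
    return {**subst, v: t}

def substitute(theta, t):
    """Apply theta to a term, following binding chains."""
    while is_var(t) and t in theta:
        t = theta[t]
    if isinstance(t, tuple):
        return tuple(substitute(theta, x) for x in t)
    return t

def rename(premises, head, fresh):
    """Standardize apart: give the rule's variables fresh names."""
    n = next(fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return t
    return [r(p) for p in premises], r(head)

def prove(goal, facts, rules, theta, fresh):
    """Yield every substitution under which `goal` follows from the KB."""
    goal = substitute(theta, goal)
    for fact in facts:                      # base case: goal matches a known fact
        s = unify(goal, fact, theta)
        if s is not None:
            yield s
    for premises, head in rules:            # reduce goal via each matching rule
        premises, head = rename(premises, head, fresh)
        s = unify(goal, head, theta)
        if s is not None:
            yield from prove_all(premises, facts, rules, s, fresh)

def prove_all(goals, facts, rules, theta, fresh):
    """Prove a conjunction of subgoals left to right, threading bindings."""
    if not goals:
        yield theta
    else:
        for s in prove(goals[0], facts, rules, theta, fresh):
            yield from prove_all(goals[1:], facts, rules, s, fresh)

# A small KB: prove that alice is an ancestor of carol.
facts = [("parent", "alice", "bob"), ("parent", "bob", "carol")]
rules = [
    ([("parent", "?x", "?y")], ("ancestor", "?x", "?y")),
    ([("parent", "?x", "?z"), ("ancestor", "?z", "?y")], ("ancestor", "?x", "?y")),
]
answer = next(prove(("ancestor", "alice", "carol"), facts, rules, {}, itertools.count()), None)
# answer is a substitution dict, i.e. the goal is provable
```

The generators make backtracking implicit: exhausting one rule's solutions automatically moves on to the next applicable rule, mirroring Prolog's rule-order search.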
Walkthrough
Proving a goal by recursive subgoal reduction
Facts: \(\text{parent}(\text{alice}, \text{bob})\), \(\text{parent}(\text{bob}, \text{carol})\).
Rules:
- \(\text{parent}(x, y) \Rightarrow \text{ancestor}(x, y)\)
- \(\text{parent}(x, z) \land \text{ancestor}(z, y) \Rightarrow \text{ancestor}(x, y)\)
Goal: \(\text{ancestor}(\text{alice}, \text{carol})\).
| Step | Current goal | Action | Substitution |
|---|---|---|---|
| 1 | \(\text{ancestor}(\text{alice}, \text{carol})\) | Try rule 1: unify head \(\text{ancestor}(x, y)\) → \(\{x = \text{alice}, y = \text{carol}\}\). Subgoal: \(\text{parent}(\text{alice}, \text{carol})\). | \(\{x=\text{alice}, y=\text{carol}\}\) |
| 2 | \(\text{parent}(\text{alice}, \text{carol})\) | Not a known fact, no rule with this head. Fail; backtrack. | — |
| 3 | \(\text{ancestor}(\text{alice}, \text{carol})\) | Try rule 2: unify head → \(\{x = \text{alice}, y = \text{carol}\}\), with \(z\) still unbound. Subgoals: \(\text{parent}(\text{alice}, z)\) and \(\text{ancestor}(z, \text{carol})\). | \(\{x=\text{alice}, y=\text{carol}\}\) |
| 4 | \(\text{parent}(\text{alice}, z)\) | Matches fact \(\text{parent}(\text{alice}, \text{bob})\). Bind \(z = \text{bob}\). | add \(\{z=\text{bob}\}\) |
| 5 | \(\text{ancestor}(\text{bob}, \text{carol})\) | Recurse. Try rule 1: subgoal \(\text{parent}(\text{bob}, \text{carol})\). | — |
| 6 | \(\text{parent}(\text{bob}, \text{carol})\) | Matches fact. Success. | — |
The proof completes: \(\text{ancestor}(\text{alice}, \text{carol})\) holds because \(\text{alice}\) is a parent of \(\text{bob}\), who is an ancestor of \(\text{carol}\). Notice rule 1 was tried first (step 1) and failed before rule 2 succeeded — the backtracking trace is part of normal operation.
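Backtracking also lets one query enumerate multiple answers: asking \(\text{ancestor}(\text{alice}, w)\) yields \(w = \text{bob}\) via rule 1, then \(w = \text{carol}\) via rule 2's recursion. The self-contained Python sketch below reproduces this (toy representation: variables as `?`-prefixed strings, rules as `(premises, head)` pairs, no occurs check).

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(a, b, subst):
    if subst is None:
        return None
    if a == b:
        return subst
    if is_var(a):
        return unify_var(a, b, subst)
    if is_var(b):
        return unify_var(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
        return subst
    return None

def unify_var(v, t, subst):
    if v in subst:
        return unify(subst[v], t, subst)
    if is_var(t) and t in subst:
        return unify(v, subst[t], subst)
    return {**subst, v: t}

def substitute(theta, t):
    while is_var(t) and t in theta:
        t = theta[t]
    if isinstance(t, tuple):
        return tuple(substitute(theta, x) for x in t)
    return t

def rename(premises, head, fresh):
    n = next(fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return t
    return [r(p) for p in premises], r(head)

def prove(goal, facts, rules, theta, fresh):
    goal = substitute(theta, goal)
    for fact in facts:
        s = unify(goal, fact, theta)
        if s is not None:
            yield s
    for premises, head in rules:
        premises, head = rename(premises, head, fresh)
        s = unify(goal, head, theta)
        if s is not None:
            yield from prove_all(premises, facts, rules, s, fresh)

def prove_all(goals, facts, rules, theta, fresh):
    if not goals:
        yield theta
    else:
        for s in prove(goals[0], facts, rules, theta, fresh):
            yield from prove_all(goals[1:], facts, rules, s, fresh)

facts = [("parent", "alice", "bob"), ("parent", "bob", "carol")]
rules = [
    ([("parent", "?x", "?y")], ("ancestor", "?x", "?y")),
    ([("parent", "?x", "?z"), ("ancestor", "?z", "?y")], ("ancestor", "?x", "?y")),
]

# Enumerate every binding for ?w in ancestor(alice, ?w):
answers = [substitute(s, "?w")
           for s in prove(("ancestor", "alice", "?w"), facts, rules, {}, itertools.count())]
# answers == ["bob", "carol"]
```

The answer order follows rule order, exactly as in the table: rule 1 produces `bob` directly, then backtracking into rule 2 produces `carol`.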
Correctness
Soundness. Each step of backward chaining either matches a known fact or unifies the goal with a rule head and recursively proves the resulting subgoals under a composed substitution. Both moves correspond to valid applications of Horn-clause inference, so the chain of reasoning is correct: every answer substitution \(\theta\) returned satisfies \(KB \models \theta(g)\).
Incompleteness. Unlike forward chaining, backward chaining is not guaranteed to terminate. Left-recursive rules such as \(\text{ancestor}(x, z) \land \text{parent}(z, y) \Rightarrow \text{ancestor}(x, y)\), ordered before the base case, can cause depth-first search to loop forever, missing answers that exist. Prolog’s depth-first strategy can be incomplete in this sense even when the knowledge base has a finite answer set. Checking for repeated subgoals on the current path, or switching to breadth-first or iterative-deepening search, restores completeness at the cost of extra work.
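The failure mode and one remedy can both be made concrete. In the self-contained sketch below (toy representation: `?`-prefixed variables, `(premises, head)` rules, no occurs check), the left-recursive rule is listed first, so plain depth-first search recurses without bound on \(\text{ancestor}(\text{alice}, z)\) (in Python, a `RecursionError`); a depth bound plus iterative deepening still finds the proof.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(a, b, subst):
    if subst is None:
        return None
    if a == b:
        return subst
    if is_var(a):
        return unify_var(a, b, subst)
    if is_var(b):
        return unify_var(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
        return subst
    return None

def unify_var(v, t, subst):
    if v in subst:
        return unify(subst[v], t, subst)
    if is_var(t) and t in subst:
        return unify(v, subst[t], subst)
    return {**subst, v: t}

def substitute(theta, t):
    while is_var(t) and t in theta:
        t = theta[t]
    if isinstance(t, tuple):
        return tuple(substitute(theta, x) for x in t)
    return t

def rename(premises, head, fresh):
    n = next(fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return t
    return [r(p) for p in premises], r(head)

def prove(goal, facts, rules, theta, fresh, depth):
    if depth < 0:                 # depth bound: cut off runaway recursion
        return
    goal = substitute(theta, goal)
    for fact in facts:
        s = unify(goal, fact, theta)
        if s is not None:
            yield s
    for premises, head in rules:  # each rule application costs one unit of depth
        premises, head = rename(premises, head, fresh)
        s = unify(goal, head, theta)
        if s is not None:
            yield from prove_all(premises, facts, rules, s, fresh, depth - 1)

def prove_all(goals, facts, rules, theta, fresh, depth):
    if not goals:
        yield theta
    else:
        for s in prove(goals[0], facts, rules, theta, fresh, depth):
            yield from prove_all(goals[1:], facts, rules, s, fresh, depth)

def iterative_deepening(goal, facts, rules, max_depth=8):
    """Retry with growing depth limits; complete for proofs up to max_depth."""
    for d in range(max_depth + 1):
        s = next(prove(goal, facts, rules, {}, itertools.count(), d), None)
        if s is not None:
            return s
    return None

facts = [("parent", "alice", "bob"), ("parent", "bob", "carol")]
rules = [
    # Left-recursive rule FIRST: unbounded depth-first search would loop on it.
    ([("ancestor", "?x", "?z"), ("parent", "?z", "?y")], ("ancestor", "?x", "?y")),
    ([("parent", "?x", "?y")], ("ancestor", "?x", "?y")),
]
proof = iterative_deepening(("ancestor", "alice", "carol"), facts, rules)
# proof is found at depth limit 2, despite the left recursion
```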
Complexity and Tradeoffs
In the worst case, backward chaining can explore an exponential proof tree (one branch per applicable rule, branching at every subgoal). On focused queries where few rules apply at each step, the search is far smaller — this is the key benefit over forward chaining, which always derives every consequence.
Backward chaining is the execution model of Prolog and the inference strategy of most rule-based expert systems and logic-programming languages. Memoization (tabling, in Prolog terms) avoids redundant work by caching subgoal results and detecting recursive calls, restoring completeness on a wide class of programs.
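Full SLG tabling is involved, but its two core moves, caching answered subgoals and cutting repeated ones, can be shown propositionally. The following toy sketch (atoms are plain strings, not the tabling machinery of any real Prolog system) caches only successes, since a failure under a loop check may still succeed along another proof path.

```python
def prove(goal, facts, rules, memo, visited=frozenset()):
    """Propositional backward chaining with memoization and a loop check."""
    if goal in memo:                 # tabled: reuse a previously proved goal
        return True
    if goal in facts:
        memo.add(goal)
        return True
    if goal in visited:              # goal already on this proof path: cut the loop
        return False
    for body, head in rules:
        if head == goal and all(
            prove(p, facts, rules, memo, visited | {goal}) for p in body
        ):
            memo.add(goal)           # cache the success for later queries
            return True
    return False

# 'a' and 'b' depend on each other, but 'a' also follows from the fact 'c'.
facts = {"c"}
rules = [(["b"], "a"), (["a"], "b"), (["c"], "a")]
memo = set()
a_holds = prove("a", facts, rules, memo)   # True: a ← c, despite the a/b cycle
b_holds = prove("b", facts, rules, memo)   # True: b ← a, answered from the memo
```

Without the loop check, the `a ← b` / `b ← a` cycle would recurse forever; with it, the cycle fails cleanly and the alternative rule `a ← c` is tried.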
When to Use It
| Situation | Approach |
|---|---|
| One-off goal queries on a large KB | Backward chaining — derives only goal-relevant facts. |
| Prolog programming | Backward chaining (depth-first with rule-order tiebreaking). |
| Repeated queries on a stable KB | Forward chaining — precomputes all consequences. |
| Left-recursive rules / completeness required | Use tabling, iterative deepening, or resolution. |
| Streaming data, monitoring | Forward chaining — react to new facts. |
Comparison with Forward Chaining
| | Forward chaining | Backward chaining |
|---|---|---|
| Direction | Data → goal | Goal → data |
| Completeness | Yes (Horn clauses) | No (may loop) |
| Focus | Derives all consequences | Derives only goal-relevant facts |
| Typical use | Monitoring, alerting, Datalog | Query answering, Prolog |
Forward chaining is preferable when many queries will be posed against a stable knowledge base, since all consequences are precomputed. Backward chaining is preferable for one-off queries where most of the knowledge base is irrelevant.