Neurosymbolic AI
Motivation
Two traditions have spent decades doing AI differently. The symbolic tradition — production systems, first-order logic, planners, knowledge bases — encodes domain knowledge explicitly and reasons over it with sound algorithms. The neural tradition — multilayer perceptrons, convolutional networks, transformers — learns implicit representations from data and computes by matrix multiplication. Their strengths and weaknesses are complementary:
| Symbolic | Neural |
|---|---|
| Compositional, generalizes to novel combinations | Robust to noise and partial input |
| Sound: conclusions provably follow from premises | Learns from examples; no rule writing required |
| Inspectable; rules can be audited and edited | Distributed; no clean inspection point |
| Brittle at the edges of the formalism | Fluent on the data, opaque on the reasoning |
Neurosymbolic AI (Garcez and Lamb 2023) is the program of combining the two — building systems where pattern recognition and structured reasoning operate on the same problem, ideally with the strengths of each compensating for the weaknesses of the other. The motivating bet is that the failure modes of LLMs — hallucination, miscalibration, brittle compositional generalization (see limits of reasoning) — are exactly where symbolic methods are strong, and vice versa.
The Core Tension
The challenge is that the two computational substrates are incompatible. Neural networks operate on dense, continuous embeddings; symbolic systems operate on discrete syntactic structures. A neurosymbolic architecture must answer:
- Where is the interface? Embeddings ↔︎ symbols ↔︎ embeddings, or all one substrate?
- Which direction is the gradient? Does the symbolic component need to be differentiable, or is it called only at inference time?
- Who is in charge? Does the neural component drive (proposing solutions, calling the symbolic system) or the symbolic component drive (committing to a logic, lifting facts from learned features)?
Different answers produce different architectures, but the patterns cluster.
Patterns of Integration
Five recurring patterns capture most of the design space.
1. Symbolic features feed into a neural network
Compute symbolic features — entity types from a knowledge graph, arithmetic results from a calculator, syntactic parses — and feed them as additional inputs to a neural model. The neural model gets a richer feature vector; the symbolic computation is upstream and not learned end-to-end.
Simple to implement, used widely in practice, but the neural model treats the symbolic features as just more numbers. There is no guarantee it respects their semantics.
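A minimal sketch of the pattern, assuming PyTorch; the class name, feature choices, and dimensions are illustrative rather than taken from any published system:

```python
import torch
import torch.nn as nn

class SymbolicAugmentedClassifier(nn.Module):
    """Neural classifier that takes precomputed symbolic features as extra inputs."""

    def __init__(self, vocab_size, embed_dim, n_symbolic_features, n_classes):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)
        # The head sees both substrates as one flat feature vector.
        self.head = nn.Linear(embed_dim + n_symbolic_features, n_classes)

    def forward(self, token_ids, symbolic_features):
        # token_ids: (batch, seq_len) integer ids for the neural side.
        # symbolic_features: (batch, n_symbolic_features) computed upstream,
        # e.g. entity-type indicators from a knowledge graph or parser flags;
        # they enter as plain numbers, which is exactly the caveat above.
        x = self.embed(token_ids)
        x = torch.cat([x, symbolic_features], dim=-1)
        return self.head(x)
```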
2. Neural networks supply candidates to a symbolic reasoner
The network is fast and approximate; the symbolic system is slow and exact. The network generates candidates, the symbolic system filters by soundness. This is neural proposal with symbolic verification; AlphaGeometry, FunSearch, and neural theorem-proving pipelines are the canonical instances.
The neural component never needs to be correct, only useful — wrong proposals are caught by the verifier. The symbolic component never needs to be creative, only sound.
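The division of labour fits a single generic loop. In this sketch `propose` and `verify` are hypothetical placeholders, standing in for, say, a language-model sampler and a proof checker or constraint solver:

```python
from typing import Callable, Iterable, Optional, TypeVar

Candidate = TypeVar("Candidate")

def propose_and_verify(
    propose: Callable[[], Iterable[Candidate]],  # fast, approximate, may be wrong
    verify: Callable[[Candidate], bool],         # slow, exact, sound
    max_rounds: int = 10,
) -> Optional[Candidate]:
    """Return the first candidate the symbolic verifier accepts, or None."""
    for _ in range(max_rounds):
        for candidate in propose():
            if verify(candidate):  # wrong proposals are simply discarded
                return candidate
    return None
```

Because the verifier is sound, generating more proposals only costs compute; it never costs correctness.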
3. Symbolic structure constrains a neural model’s outputs
Build the symbolic constraint into the model’s decoding. Examples:
- Grammar-constrained decoding. When generating code or JSON, mask tokens that would violate the grammar. The output is guaranteed to be well-formed without any post-hoc filtering.
- Logic-aware loss functions. Add a term that penalizes violations of known constraints (e.g., “the predicted set of facts must satisfy this ontology”). Used in semantic image segmentation and structured prediction.
- Differentiable logic frameworks (e.g., DeepProbLog, Neural Theorem Provers). Encode logical rules as differentiable computations so that gradients can flow through them. The neural and symbolic components share a substrate and are trained jointly.
These approaches make the symbolic structure a constraint on what the neural model can output, not an external check.
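As a toy illustration of the first item, grammar-constrained decoding: at each step, candidate tokens whose extension would leave the grammar are masked before selection. The vocabulary, grammar check, and scoring interface here are stand-ins for a real tokenizer, parser, and model:

```python
import math

VOCAB = ["{", "}", '"key"', ":", '"value"', "<eos>"]

def is_valid_prefix(tokens):
    # Crude stand-in for a real grammar check: brace depth never goes
    # negative, and <eos> is only legal once the braces are balanced.
    depth = 0
    for t in tokens:
        if t == "{":
            depth += 1
        elif t == "}":
            depth -= 1
        if depth < 0:
            return False
    if tokens and tokens[-1] == "<eos>":
        return depth == 0
    return True

def constrained_step(prefix, logits):
    """Pick the highest-scoring token whose extension stays inside the grammar."""
    masked = [
        score if is_valid_prefix(prefix + [tok]) else -math.inf
        for tok, score in zip(VOCAB, logits)
    ]
    return VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
```

Given the prefix `["{"]`, the `<eos>` token is masked out even if the model scores it highest, because the brace is still open: well-formedness is enforced during decoding rather than checked afterwards.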
4. Symbolic tools are called by a neural agent
The neural model decides when to invoke an external tool — a calculator, a SQL engine, a theorem prover, any wrapper of the kind described in LLMs as interfaces to symbolic tools — and treats its output as additional context. The two systems are loosely coupled and communicate through a textual interface.
The neural model retains its conversational fluency; the symbolic tool handles the precise computation. This is the design behind Toolformer (Schick et al. 2023), ReAct (Yao et al. 2023), and the tool-use pattern that has become standard in modern LLM applications.
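A minimal sketch of such a loop, assuming an `llm` callable and a `CALL:`/`ANSWER:` text convention invented here for illustration; production systems use structured function-calling APIs rather than string parsing:

```python
from typing import Callable, Dict

def agent_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)          # the model decides: call a tool or answer
        if reply.startswith("CALL:"):
            # Expected form: "CALL: <tool_name> <argument string>"
            _, name, arg = reply.split(maxsplit=2)
            result = tools[name](arg)    # the symbolic side does the exact work
            transcript += f"{reply}\nRESULT: {result}\n"
        elif reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        else:
            transcript += reply + "\n"   # free-form reasoning, kept as context
    return "No answer within the step budget."
```

Here `tools` might map `"calculator"` to a safe arithmetic evaluator or `"sql"` to a read-only query runner; the loop never needs to know how the tools work internally.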
5. Knowledge graphs supply structured grounding
Pair a knowledge graph with a neural model so that the model’s claims about the world can be checked against, or generated from, the graph. Retrieval-augmented generation (RAG) is the most common instantiation: retrieve graph entries (or text passages indexed by entities), condition generation on them, and reduce hallucination by grounding generation in retrieved content.
The graph is curated and updatable; the neural model is fluent and adaptable. The retrieval interface is what keeps them aligned.
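A toy sketch of the retrieval interface; the graph contents, the substring-matching retrieval rule, and the `llm` callable are all placeholders, where a real system would use entity linking and a dedicated graph store:

```python
from typing import Callable, Dict, List, Tuple

# Toy knowledge graph: (relation, object) pairs keyed by subject entity.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Kurt Gödel": [("born_in", "Brno"), ("field", "mathematical logic")],
}

def retrieve(question: str) -> List[str]:
    """Return triples whose subject entity is mentioned in the question."""
    facts = []
    for subject, relations in KG.items():
        if subject.lower() in question.lower():
            facts += [f"{subject} {rel} {obj}." for rel, obj in relations]
    return facts

def grounded_answer(llm: Callable[[str], str], question: str) -> str:
    context = "\n".join(retrieve(question)) or "No relevant facts found."
    prompt = (f"Facts:\n{context}\n\n"
              f"Answer using only the facts above.\nQ: {question}\nA:")
    return llm(prompt)
```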
The Three Waves
It is conventional (Garcez and Lamb 2023) to describe AI as proceeding in waves:
- Wave 1 (symbolic). GOFAI, expert systems, knowledge representation. Captured intuitions but did not scale; the “AI winter” of the late 1980s followed.
- Wave 2 (neural). Statistical learning, deep learning, transformers. Scales spectacularly but struggles with compositional generalization and cannot be audited.
- Wave 3 (neurosymbolic). Hybrid systems that combine learned representations with explicit knowledge. The bet is that the third wave inherits the scaling of the second and the structure of the first.
The wave metaphor is rhetorical, not literal — symbolic and neural research never stopped overlapping — but it captures the shift in attention.
Where the Examples Live
The lecture covers two end-of-spectrum patterns that bracket the design space:
- LLMs as interfaces to symbolic tools sits at pattern 4: the neural agent drives, and the symbolic tool is consulted on demand.
- Neural proposal and symbolic verification sits at pattern 2: the neural network supplies candidates that a symbolic system must validate.
Both treat the symbolic component as a black box that the neural component invokes; neither requires the symbolic system to be differentiable. That keeps the engineering practical: any existing solver, theorem prover, knowledge graph, or constraint engine can plug in without modification.
Open Questions
Three problems separate “promising research direction” from “default architecture”:
- How are the symbols chosen? Hand-engineered vocabularies do not scale; learned discrete representations are unstable. A neurosymbolic system is only as good as its symbol-grounding mechanism.
- How is failure attributed? When a hybrid system errs, was the neural proposal bad or the symbolic constraint wrong? Debugging requires interpretable interfaces, which the looser couplings (pattern 4) provide and the tighter ones (pattern 3) often hide.
- Does the integration help on the hardest tasks? On standard NLP benchmarks, large language models alone keep beating hybrid systems. The cleanest wins for neurosymbolic methods come on tasks with verifiable correctness — math, code, formal proofs — where the symbolic side has real teeth.
The expected trajectory is a stratification: agentic LLM systems for fluent open-ended work, neurosymbolic combinations for domains where verifiable correctness matters.