Neurosymbolic AI
Motivation
Two traditions have spent decades doing AI differently. The symbolic tradition — production systems, first-order logic, planners, knowledge bases — encodes domain knowledge explicitly and reasons over it with sound algorithms. The neural tradition — multilayer perceptrons, convolutional networks, transformers — learns implicit representations from data and computes by matrix multiplication. Their strengths and weaknesses are complementary:
| Symbolic | Neural |
|---|---|
| Compositional, generalizes to novel combinations | Robust to noise and partial input |
| Sound: conclusions provably follow from premises | Learns from examples; no rule writing required |
| Inspectable; rules can be audited and edited | Distributed; no clean inspection point |
| Brittle at the edges of the formalism | Fluent on the data, opaque on the reasoning |
Neurosymbolic AI (Garcez and Lamb 2023) is the program of combining the two — building systems where pattern recognition and structured reasoning operate on the same problem, ideally with the strengths of each compensating for the weaknesses of the other. The motivating bet is that the failure modes of LLMs — hallucination, miscalibration, brittle compositional generalization (see limits of reasoning) — are exactly where symbolic methods are strong, and vice versa.
The Core Tension
The challenge is that the two computational substrates are incompatible. Neural networks operate on dense, continuous embeddings; symbolic systems operate on discrete syntactic structures. A neurosymbolic architecture must answer:
- Where is the interface? Embeddings ↔︎ symbols ↔︎ embeddings, or all one substrate?
- Which direction is the gradient? Does the symbolic component need to be differentiable, or is it called only at inference time?
- Who is in charge? Does the neural component drive (proposing solutions, calling the symbolic system) or the symbolic component drive (committing to a logic, lifting facts from learned features)?
Different answers produce different architectures, but the patterns cluster.
Patterns of Integration
Five recurring patterns capture most of the design space.
1. Symbolic features feed into a neural network
Compute symbolic features — entity types from a knowledge graph, arithmetic results from a calculator, syntactic parses — and feed them as additional inputs to a neural model. The neural model gets a richer feature vector; the symbolic computation is upstream and not learned end-to-end.
Simple to implement, used widely in practice, but the neural model treats the symbolic features as just more numbers. There is no guarantee it respects their semantics.
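A minimal sketch of the pattern, assuming PyTorch; the class name, feature choices, and dimensions are illustrative rather than taken from any published system:

```python
import torch
import torch.nn as nn

class SymbolicAugmentedClassifier(nn.Module):
    """Neural classifier that takes precomputed symbolic features as extra inputs."""

    def __init__(self, vocab_size, embed_dim, n_symbolic_features, n_classes):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)
        # The head sees both substrates as one flat feature vector.
        self.head = nn.Linear(embed_dim + n_symbolic_features, n_classes)

    def forward(self, token_ids, symbolic_features):
        # token_ids: (batch, seq_len) integer ids for the neural side.
        # symbolic_features: (batch, n_symbolic_features) computed upstream,
        # e.g. entity-type indicators from a knowledge graph or parser flags;
        # they enter as plain numbers, which is exactly the caveat above.
        x = self.embed(token_ids)
        x = torch.cat([x, symbolic_features], dim=-1)
        return self.head(x)
```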
2. Neural networks supply candidates to a symbolic reasoner
The network is fast and approximate; the symbolic system is slow and exact. The network generates candidates, the symbolic system filters by soundness. This is neural proposal with symbolic verification; AlphaGeometry, FunSearch, and neural theorem-proving pipelines are the canonical instances.
The neural component never needs to be correct, only useful — wrong proposals are caught by the verifier. The symbolic component never needs to be creative, only sound.
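The division of labour fits a single generic loop. In this sketch `propose` and `verify` are hypothetical placeholders, standing in for, say, a language-model sampler and a proof checker or constraint solver:

```python
from typing import Callable, Iterable, Optional, TypeVar

Candidate = TypeVar("Candidate")

def propose_and_verify(
    propose: Callable[[], Iterable[Candidate]],  # fast, approximate, may be wrong
    verify: Callable[[Candidate], bool],         # slow, exact, sound
    max_rounds: int = 10,
) -> Optional[Candidate]:
    """Return the first candidate the symbolic verifier accepts, or None."""
    for _ in range(max_rounds):
        for candidate in propose():
            if verify(candidate):  # wrong proposals are simply discarded
                return candidate
    return None
```

Because the verifier is sound, generating more proposals only costs compute; it never costs correctness.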
3. Symbolic structure constrains a neural model’s outputs
Build the symbolic constraint into the model’s decoding. Examples:
- Grammar-constrained decoding. When generating code or JSON, mask tokens that would violate the grammar. The output is guaranteed to be well-formed without any post-hoc filtering.
- Logic-aware loss functions. Add a term that penalizes violations of known constraints (e.g., “the predicted set of facts must satisfy this ontology”). Used in semantic image segmentation and structured prediction.
- Differentiable logic frameworks (e.g., DeepProbLog, Neural Theorem Provers). Encode logical rules as differentiable computations so that gradients can flow through them. The neural and symbolic components share a substrate and are trained jointly.
These approaches make the symbolic structure a constraint on what the neural model can output, not an external check.
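As a toy illustration of the first item, grammar-constrained decoding: at each step, candidate tokens whose extension would leave the grammar are masked before selection. The vocabulary, grammar check, and scoring interface here are stand-ins for a real tokenizer, parser, and model:

```python
import math

VOCAB = ["{", "}", '"key"', ":", '"value"', "<eos>"]

def is_valid_prefix(tokens):
    # Crude stand-in for a real grammar check: brace depth never goes
    # negative, and <eos> is only legal once the braces are balanced.
    depth = 0
    for t in tokens:
        if t == "{":
            depth += 1
        elif t == "}":
            depth -= 1
        if depth < 0:
            return False
    if tokens and tokens[-1] == "<eos>":
        return depth == 0
    return True

def constrained_step(prefix, logits):
    """Pick the highest-scoring token whose extension stays inside the grammar."""
    masked = [
        score if is_valid_prefix(prefix + [tok]) else -math.inf
        for tok, score in zip(VOCAB, logits)
    ]
    return VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
```

Given the prefix `["{"]`, the `<eos>` token is masked out even if the model scores it highest, because the brace is still open: well-formedness is enforced during decoding rather than checked afterwards.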
4. Symbolic tools are called by a neural agent
The neural model decides when to invoke an external tool — a calculator, a SQL engine, a theorem prover, any wrapper of the kind described in LLMs as interfaces to symbolic tools — and treats its output as additional context. The two systems are loosely coupled and communicate through a textual interface.
The neural model retains its conversational fluency; the symbolic tool handles the precise computation. This is the design behind Toolformer (Schick et al. 2023), ReAct (Yao et al. 2023), and the tool-use pattern that has become standard in modern LLM applications.
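A minimal sketch of such a loop, assuming an `llm` callable and a `CALL:`/`ANSWER:` text convention invented here for illustration; production systems use structured function-calling APIs rather than string parsing:

```python
from typing import Callable, Dict

def agent_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)          # the model decides: call a tool or answer
        if reply.startswith("CALL:"):
            # Expected form: "CALL: <tool_name> <argument string>"
            _, name, arg = reply.split(maxsplit=2)
            result = tools[name](arg)    # the symbolic side does the exact work
            transcript += f"{reply}\nRESULT: {result}\n"
        elif reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        else:
            transcript += reply + "\n"   # free-form reasoning, kept as context
    return "No answer within the step budget."
```

Here `tools` might map `"calculator"` to a safe arithmetic evaluator or `"sql"` to a read-only query runner; the loop never needs to know how the tools work internally.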
5. Knowledge graphs supply structured grounding
Pair a knowledge graph with a neural model so that the model’s claims about the world can be checked against, or generated from, the graph. Retrieval-augmented generation (RAG) is the most common instantiation: retrieve graph entries (or text passages indexed by entities), condition generation on them, and reduce hallucination by grounding generation in retrieved content.
The graph is curated and updatable; the neural model is fluent and adaptable. The retrieval interface is what keeps them aligned.
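A toy sketch of the retrieval interface; the graph contents, the substring-matching retrieval rule, and the `llm` callable are all placeholders, where a real system would use entity linking and a dedicated graph store:

```python
from typing import Callable, Dict, List, Tuple

# Toy knowledge graph: (relation, object) pairs keyed by subject entity.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Kurt Gödel": [("born_in", "Brno"), ("field", "mathematical logic")],
}

def retrieve(question: str) -> List[str]:
    """Return triples whose subject entity is mentioned in the question."""
    facts = []
    for subject, relations in KG.items():
        if subject.lower() in question.lower():
            facts += [f"{subject} {rel} {obj}." for rel, obj in relations]
    return facts

def grounded_answer(llm: Callable[[str], str], question: str) -> str:
    context = "\n".join(retrieve(question)) or "No relevant facts found."
    prompt = (f"Facts:\n{context}\n\n"
              f"Answer using only the facts above.\nQ: {question}\nA:")
    return llm(prompt)
```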
The Three Waves
It is conventional (Garcez and Lamb 2023) to describe AI as proceeding in waves:
- Wave 1 (symbolic). GOFAI, expert systems, knowledge representation. Captured intuitions but did not scale; the “AI winter” of the late 1980s followed.
- Wave 2 (neural). Statistical learning, deep learning, transformers. Scales spectacularly but struggles with compositional generalization and cannot be audited.
- Wave 3 (neurosymbolic). Hybrid systems that combine learned representations with explicit knowledge. The bet is that the third wave inherits the scaling of the second and the structure of the first.
The wave metaphor is rhetorical, not literal — symbolic and neural research never stopped overlapping — but it captures the shift in attention.
Where the Examples Live
The lecture covers two end-of-spectrum patterns that bracket the design space:
- LLMs as interfaces to symbolic tools sits at pattern 4: the neural agent drives, and the symbolic tool is consulted on demand.
- Neural proposal and symbolic verification sits at pattern 2: the neural network supplies candidates that a symbolic system must validate.
Both treat the symbolic component as a black box that the neural component invokes; neither requires the symbolic system to be differentiable. That keeps the engineering practical: any existing solver, theorem prover, knowledge graph, or constraint engine can plug in without modification.
Open Questions
Three problems separate “promising research direction” from “default architecture”:
- How are the symbols chosen? Hand-engineered vocabularies do not scale; learned discrete representations are unstable. A neurosymbolic system is only as good as its symbol-grounding mechanism.
- How is failure attributed? When a hybrid system errs, was the neural proposal bad or the symbolic constraint wrong? Debugging requires interpretable interfaces, which the looser couplings (pattern 4) provide and the tighter ones (pattern 3) often hide.
- Does the integration help on the hardest tasks? On standard NLP benchmarks, large language models alone keep beating hybrid systems. The cleanest wins for neurosymbolic methods come on tasks with verifiable correctness — math, code, formal proofs — where the symbolic side has real teeth.
The expected trajectory is a stratification: agentic LLM systems for fluent open-ended work, neurosymbolic combinations for domains where verifiable correctness matters.