Ontologies
Motivation
An ontology is an explicit, formal specification of the concepts in a domain and the relationships among them (Gruber 1993). Where a knowledge graph records instances — Boston, Massachusetts, the mayor — an ontology records the schema on top: what counts as a City, what relations a City can participate in, and which configurations are consistent. The ontology is the part of the representation that says what the symbols mean and how they interact.
Two reasons to bother with an ontology beyond the graph:
- Inferred classifications. If `Mother` is defined as “Female with a child,” then any individual who happens to be Female and has a child is automatically a Mother; no triple needs to be asserted. A reasoner derives class memberships from definitions.
- Consistency checking. If `Person` and `Building` are declared disjoint, asserting that the same individual is both is a contradiction. Catching this protects downstream consumers from incoherent data.
Both capabilities sit on a formal substrate: an ontology is a theory in some logic — usually a description logic — and reasoning is logical entailment over that theory.
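Both capabilities can be imitated in a few lines of plain Python. This is a toy illustration under invented names (`is_mother`, the `types` and `triples` tables), not a real OWL reasoner:

```python
# Toy illustration (not a real OWL reasoner); all names here are invented.

# Asserted class memberships: individual -> set of classes
types = {
    "mary": {"Female"},
    "john": {"Person"},
    "tower": {"Building", "Person"},   # deliberately inconsistent
}
# Asserted triples: (subject, property, object)
triples = {("mary", "hasChild", "john")}

# Definition: Mother ≡ Female ⊓ ∃hasChild.Person -- membership is
# derived from the definition, never asserted.
def is_mother(ind):
    return "Female" in types.get(ind, set()) and any(
        s == ind and p == "hasChild" and "Person" in types.get(o, set())
        for s, p, o in triples
    )

# Disjointness axiom: Person ⊓ Building ⊑ ⊥
DISJOINT = [("Person", "Building")]

def inconsistencies():
    return [(ind, a, b) for ind, cs in types.items()
            for a, b in DISJOINT if a in cs and b in cs]

print(is_mother("mary"))    # True, though mary : Mother was never asserted
print(inconsistencies())    # [('tower', 'Person', 'Building')]
```

A real reasoner performs both derivations from the axioms alone; the point here is only that classification is computed, while the contradiction is detected.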
What an Ontology Specifies
A typical ontology contains:
| Component | What it says | Example |
|---|---|---|
| Classes (concepts) | Sets of individuals | Person, City, Country |
| Properties (roles) | Binary relations between individuals | locatedIn, hasChild, bornIn |
| Hierarchy | Subclass and subproperty relations | City ⊑ Place; bornIn ⊑ relatedTo |
| Domain and range | Which classes a property’s subject and object belong to | domain(bornIn) = Person, range(bornIn) = Place |
| Class definitions | Necessary-and-sufficient conditions for class membership | Mother ≡ Female ⊓ ∃hasChild.Person |
| Disjointness | Pairs of classes with no shared instances | Person ⊓ Building ⊑ ⊥ |
| Cardinality constraints | Bounds on the number of values a property can take | “A Person has at most two biological parents” |
| Individuals | Named instances and their assertions | Boston : City; (Boston, locatedIn, Mass.) |
This is the description-logic vocabulary. The OWL (Web Ontology Language) standard makes it concrete and ships with standardized reasoners.
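One way to see how these components fit together is to encode them as plain data. The `Ontology` container below is a hypothetical sketch for this chapter, not an OWL API:

```python
# Hypothetical container mirroring the table's components; illustrative only.
from dataclasses import dataclass, field

@dataclass
class Ontology:
    classes: set = field(default_factory=set)
    subclass_of: set = field(default_factory=set)   # (sub, super) pairs
    properties: dict = field(default_factory=dict)  # name -> (domain, range)
    disjoint: set = field(default_factory=set)      # frozensets of class pairs
    individuals: dict = field(default_factory=dict) # name -> set of classes

onto = Ontology()
onto.classes |= {"Place", "City", "Person", "Building"}
onto.subclass_of.add(("City", "Place"))              # City ⊑ Place
onto.properties["bornIn"] = ("Person", "Place")      # domain, range
onto.disjoint.add(frozenset({"Person", "Building"})) # Person ⊓ Building ⊑ ⊥
onto.individuals["Boston"] = {"City"}                # Boston : City

dom, rng = onto.properties["bornIn"]
print(dom, rng)   # Person Place
```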
A Worked Example
Suppose we are building an ontology of geography. A small fragment:
Class hierarchy:
Place
├── PopulatedPlace
│ ├── City
│ └── Country
└── Region
Properties:
locatedIn: domain Place, range Place (transitive)
capitalOf: domain City, range Country
governs: domain Person, range Country
Definitions:
Capital ≡ City ⊓ ∃capitalOf.Country
Country ⊑ PopulatedPlace ⊓ ∃governs⁻.Person
Disjointness:
City ⊓ Country ⊑ ⊥
Cardinality:
capitalOf is functional (a city is the capital of at most one country)
With these declarations and the facts `(Boston, locatedIn, Massachusetts)` and `(Massachusetts, locatedIn, US)`:
- `(Boston, locatedIn, US)` follows from `locatedIn` being transitive.
- If we additionally assert `(Boston, capitalOf, US)`, the reasoner concludes `Boston : Capital`. Note that this assertion is merely false in reality, not logically inconsistent; an inconsistency arises only if another axiom contradicts it, e.g., `(Washington, capitalOf, US)` together with `capitalOf` being inverse-functional (at most one capital per country).
- Asserting `Boston : Country` triggers a contradiction with `Boston : City` because of the disjointness axiom.
The reasoner’s job is to derive these consequences and flag contradictions — without rerunning a hand-written script every time a new fact arrives.
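The first and third derivations can be sketched as a small fixpoint computation in plain Python. This is illustrative only; it is not how HermiT or Pellet work internally, and the function names are invented:

```python
# Toy saturation for a transitive property, plus a disjointness check.

facts = {
    ("Boston", "locatedIn", "Massachusetts"),
    ("Massachusetts", "locatedIn", "US"),
}

def transitive_closure(facts, prop):
    """Saturate a transitive property until no new triples appear."""
    closed = set(facts)
    while True:
        new = {(s1, prop, o2)
               for s1, p1, o1 in closed if p1 == prop
               for s2, p2, o2 in closed if p2 == prop and s2 == o1}
        if new <= closed:
            return closed
        closed |= new

inferred = transitive_closure(facts, "locatedIn")
print(("Boston", "locatedIn", "US") in inferred)   # True

# Disjointness check: City ⊓ Country ⊑ ⊥
types = {"Boston": {"City", "Country"}}            # deliberately bad assertion
clash = {i for i, cs in types.items() if {"City", "Country"} <= cs}
print(clash)                                       # {'Boston'}
```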
Reasoning Tasks
Standard description-logic reasoners answer four kinds of queries:
- Subsumption. Is class \(A\) necessarily a subclass of class \(B\)? (`Mother ⊑ Parent`?)
- Classification. Compute the full subclass hierarchy of the ontology.
- Instance checking. Is individual \(i\) an instance of class \(A\)? (`Boston : Capital`?)
- Consistency. Does the ontology have any model at all?
All four reduce to logical entailment in the underlying description logic. Modern reasoners (HermiT, Pellet, ELK) decide them in worst-case exponential time for expressive dialects, or polynomial time for restricted ones like \(\mathcal{EL}\).
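Three of the four tasks can be sketched naively over a told subclass hierarchy (consistency checking is omitted). Real reasoners also derive subsumptions from class definitions, not just from declared axioms, so this is a deliberately weakened sketch:

```python
# Naive subsumption, classification, and instance checking over
# declared subclass axioms only; names are invented for the example.

subclass_of = {("Mother", "Parent"), ("Parent", "Person"), ("City", "Place")}

def subsumes(sup, sub):
    """Subsumption: does sub ⊑ sup follow from the declared hierarchy?"""
    if sub == sup:
        return True
    return any(a == sub and subsumes(sup, b) for a, b in subclass_of)

def classify():
    """Classification: every entailed (sub, super) pair."""
    names = {c for pair in subclass_of for c in pair}
    return {(a, b) for a in names for b in names if a != b and subsumes(b, a)}

types = {"mary": {"Mother"}}

def instance_of(ind, cls):
    """Instance checking: direct assertion or membership via subsumption."""
    return any(subsumes(cls, c) for c in types.get(ind, set()))

print(subsumes("Person", "Mother"))        # True: Mother ⊑ Person
print(instance_of("mary", "Parent"))       # True
print(("Mother", "Person") in classify())  # True
```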
The Description-Logic Spectrum
Description logics are designed as a Pareto frontier: each dialect picks an expressiveness/complexity trade-off.
| Dialect | Expressiveness | Complexity of subsumption |
|---|---|---|
| \(\mathcal{EL}\) | Existential restrictions and intersections only | PTime |
| \(\mathcal{ALC}\) | Adds full negation and universal restrictions | ExpTime-complete |
| \(\mathcal{SROIQ}\) (OWL 2 DL) | Property chains, inverse, cardinality, nominals | 2NExpTime-complete |
The biomedical community standardized on \(\mathcal{EL}\) for SNOMED-CT because PTime classification scales to hundreds of thousands of concepts. Less performance-sensitive ontologies use OWL 2 DL.
A common pattern is to write the ontology in the most expressive dialect it needs, but reason over a profile-conformant fragment so the reasoner stays fast.
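To make the PTime claim concrete, here is a heavily simplified rendering of the \(\mathcal{EL}\) completion-rule classification procedure: axioms are assumed pre-normalized, there is no \(\bot\) and no role hierarchy, and the tuple encoding of axioms is ad hoc for this sketch:

```python
# Simplified EL completion. Normalized axiom forms:
#   ("sub",  A, B)        A ⊑ B
#   ("and",  A1, A2, B)   A1 ⊓ A2 ⊑ B
#   ("ex-r", A, r, B)     A ⊑ ∃r.B
#   ("ex-l", r, A, B)     ∃r.A ⊑ B
def el_classify(axioms, names):
    S = {A: {A} for A in names}   # S[A]: derived superclasses of A
    R = {}                        # R[r]: derived (A, B) role pairs
    changed = True
    while changed:                # saturate to a fixpoint
        changed = False
        for ax in axioms:
            if ax[0] == "sub":
                _, A1, B = ax
                for A in names:
                    if A1 in S[A] and B not in S[A]:
                        S[A].add(B); changed = True
            elif ax[0] == "and":
                _, A1, A2, B = ax
                for A in names:
                    if A1 in S[A] and A2 in S[A] and B not in S[A]:
                        S[A].add(B); changed = True
            elif ax[0] == "ex-r":
                _, A1, r, B = ax
                for A in names:
                    if A1 in S[A] and (A, B) not in R.get(r, set()):
                        R.setdefault(r, set()).add((A, B)); changed = True
            elif ax[0] == "ex-l":
                _, r, A1, B = ax
                for A, B2 in list(R.get(r, set())):
                    if A1 in S.get(B2, set()) and B not in S[A]:
                        S[A].add(B); changed = True
    return S

# Mother ⊑ Female; Mother ⊑ ∃hasChild.Person; ∃hasChild.Person ⊑ Parent
axioms = [("sub", "Mother", "Female"),
          ("ex-r", "Mother", "hasChild", "Person"),
          ("ex-l", "hasChild", "Person", "Parent")]
S = el_classify(axioms, {"Mother", "Female", "Person", "Parent"})
print("Parent" in S["Mother"])   # True: Mother ⊑ Parent is entailed
```

Each rule only adds pairs from a polynomial-size space, so saturation terminates in polynomial time; that bound is what makes \(\mathcal{EL}\) viable for SNOMED-CT-scale ontologies.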
The Open-World Assumption
Description logics are open-world by default: if the ontology does not entail \(\phi\), \(\phi\) is unknown, not false. This is a deliberate design choice for the web setting — the graph almost certainly lacks some facts about any individual.
It also means common-sense closed-world inferences do not hold. If the ontology lists Boston’s mayor and nothing more, a reasoner does not conclude that Boston has only one mayor — there might be uncatalogued co-mayors. Cardinality axioms must be stated explicitly to license such conclusions.
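The mayor example can be phrased as a three-valued query. The names below are invented for illustration, and the `max_card` parameter stands in for an explicitly stated cardinality axiom:

```python
# Open-world answer to "does Boston have at most one mayor?".

triples = {("Boston", "hasMayor", "wu")}

def count_known(subj, prop):
    return sum(1 for s, p, _ in triples if s == subj and p == prop)

def at_most_one(subj, prop, max_card=None):
    """max_card: an explicitly stated cardinality axiom, if any."""
    if max_card is not None and max_card <= 1:
        return True                  # licensed by the explicit axiom
    if count_known(subj, prop) > 1:
        return False                 # refuted by already-known facts
    return "unknown"                 # OWA: uncatalogued values may exist

print(at_most_one("Boston", "hasMayor"))              # 'unknown'
print(at_most_one("Boston", "hasMayor", max_card=1))  # True
```

Under a closed-world database semantics the first call would return True; under OWA, only the explicit axiom licenses that conclusion.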
Why Ontologies Matter
A few payoffs:
- Data integration. When two organizations describe the same domain, an ontology lets you declare equivalences (`OrgA:Customer ≡ OrgB:Client`) and merge data automatically.
- Quality control. Disjointness and domain/range axioms turn structural errors into logical inconsistencies a tool can find.
- Inferred views. Class definitions yield virtual columns: “all Capitals” is computed by the reasoner rather than maintained by hand.
- Standardized vocabularies. Schema.org, FOAF, the Gene Ontology, and SNOMED-CT all encode shared vocabularies so that disparate systems can interoperate.
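The data-integration payoff can be made concrete with a toy merge. The class and namespace names below are invented; a real pipeline would declare `owl:equivalentClass` axioms rather than a Python dict:

```python
# Merging two sources after declaring OrgA:Customer ≡ OrgB:Client.

equivalent = {"OrgA:Customer": "canonical:Customer",
              "OrgB:Client":   "canonical:Customer"}

org_a = [("alice", "OrgA:Customer")]
org_b = [("bob", "OrgB:Client")]

merged = {}
for ind, cls in org_a + org_b:
    # Map each source class to its canonical class, then pool individuals.
    merged.setdefault(equivalent.get(cls, cls), set()).add(ind)

print(sorted(merged["canonical:Customer"]))   # ['alice', 'bob']
```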
When Ontologies Don’t Help
Three failure modes are worth naming:
- Ontology engineering is hard. Choosing the right classes and capturing the right axioms takes domain expertise. Bad ontologies impose constraints that real data violates; the reasoner then reports the ontology as inconsistent.
- Open-world surprises. Engineers used to relational databases assume the absence of a fact is a negation. Under OWA it is not, and queries return fewer answers than expected.
- Computational complexity bites. Even decidable description logics can be exponential in the size of the ontology. Profile choice and incremental reasoning are non-negotiable for large deployments.
This is why most modern systems use ontologies narrowly: a small T-box of class definitions and constraints over a large knowledge graph of instances. The ontology provides schema discipline; the graph carries the data. Together they realize the description-logic level of the knowledge representation hierarchy.