Ontologies

Motivation

An ontology is an explicit, formal specification of the concepts in a domain and the relationships among them (Gruber 1993). Where a knowledge graph records instances — Boston, Massachusetts, the mayor — an ontology records the schema on top: what counts as a City, what relations a City can participate in, and which configurations are consistent. The ontology is the part of the representation that says what the symbols mean and how they interact.

Two reasons to bother with an ontology beyond the graph:

  • Inferred classifications. If Mother is defined as “Female with a child,” then any individual who happens to be Female and has a child is automatically a Mother — no triple needs to be asserted. A reasoner derives class memberships from definitions.
  • Consistency checking. If Person and Building are declared disjoint, asserting that the same individual is both is a contradiction. Catching this protects downstream consumers from incoherent data.

Both capabilities sit on a formal substrate: an ontology is a theory in some logic — usually a description logic — and reasoning is logical entailment over that theory.
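The two services can be sketched with plain Python sets. The individuals (`ada`, `shed7`) and the dictionary-based encoding are made up for illustration; a real description-logic reasoner works over a far richer axiom language.

```python
# Minimal sketch of the two reasoning services: inferred classification
# and consistency checking, over hand-rolled set-based assertions.

types = {"ada": {"Female", "Person"}, "shed7": {"Building"}}
has_child = {("ada", "bob")}          # role assertions (subject, object)
disjoint = {("Person", "Building")}   # disjointness axioms

# Inferred classification: Mother ≡ Female ⊓ ∃hasChild.⊤
# Membership is derived from the definition, never asserted as a triple.
for ind, classes in types.items():
    if "Female" in classes and any(s == ind for s, _ in has_child):
        classes.add("Mother")

# Consistency checking: no individual may belong to two disjoint classes.
def inconsistent(types, disjoint):
    return any(a in cs and b in cs for cs in types.values() for a, b in disjoint)

print(types["ada"])                   # now includes the derived class Mother
print(inconsistent(types, disjoint))  # False; True if shed7 were also a Person
```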

What an Ontology Specifies

A typical ontology contains:

Component                 What it says                                        Example
Classes (concepts)        Sets of individuals                                 Person, City, Country
Properties (roles)        Binary relations between individuals                locatedIn, hasChild, bornIn
Hierarchy                 Subclass and subproperty relations                  City ⊑ Place; bornIn ⊑ relatedTo
Domain and range          Classes a property’s subject and object belong to   domain(bornIn) = Person, range(bornIn) = Place
Class definitions         Necessary-and-sufficient membership conditions      Mother ≡ Female ⊓ ∃hasChild.Person
Disjointness              Pairs of classes with no shared instances           Person ⊓ Building ⊑ ⊥
Cardinality constraints   Bounds on how many values a property can take       “A Person has at most two biological parents”
Individuals               Named instances and their assertions                Boston : City; (Boston, locatedIn, Massachusetts)

This is the description-logic vocabulary. The OWL (Web Ontology Language) standard makes it concrete, and off-the-shelf reasoners implement it.

A Worked Example

Suppose we are building an ontology of geography. A small fragment:

Class hierarchy:
  Place
    ├── PopulatedPlace
    │     ├── City
    │     └── Country
    └── Region

Properties:
  locatedIn      domain: Place           range: Place      (transitive)
  capitalOf      domain: City            range: Country
  governs        domain: Person          range: Country

Definitions:
  Capital ≡ City ⊓ ∃capitalOf.Country
  Country ⊑ PopulatedPlace ⊓ ∃governs⁻.Person

Disjointness:
  City ⊓ Country ⊑ ⊥

Cardinality:
  capitalOf is functional (a city is the capital of at most one country)

With these declarations and the facts (Boston, locatedIn, Massachusetts) and (Massachusetts, locatedIn, US):

  • (Boston, locatedIn, US) follows from locatedIn being transitive.
  • If we additionally assert (Boston, capitalOf, US), the reasoner concludes Boston : Capital. This is not, by itself, inconsistent — the ontology contains no axiom saying which city the real capital is. A contradiction would arise only from a conflicting assertion, e.g., (Washington, capitalOf, US) together with a declaration that capitalOf is inverse-functional (a country has at most one capital).
  • Asserting Boston : Country triggers a contradiction with Boston : City because of the disjointness axiom.

The reasoner’s job is to derive these consequences and flag contradictions — without rerunning a hand-written script every time a new fact arrives.
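A minimal sketch of two of these consequences — the transitive closure of locatedIn and the City/Country disjointness clash — using plain Python (the naive fixpoint loop stands in for a real reasoner's optimized algorithms):

```python
# Inference 1: (Boston, locatedIn, US) via transitivity of locatedIn.
located_in = {("Boston", "Massachusetts"), ("Massachusetts", "US")}

def transitive_closure(pairs):
    """Naive fixpoint: add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

inferred = transitive_closure(located_in)
print(("Boston", "US") in inferred)   # True: follows from transitivity

# Inference 2: asserting Boston : Country contradicts Boston : City,
# because City ⊓ Country ⊑ ⊥.
types = {"Boston": {"City"}}
types["Boston"].add("Country")
clash = {"City", "Country"} <= types["Boston"]
print(clash)                          # True: the reasoner reports inconsistency
```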

Reasoning Tasks

Standard description-logic reasoners answer four kinds of queries:

  • Subsumption. Is class \(A\) necessarily a subclass of class \(B\)? (Mother ⊑ Parent?)
  • Classification. Compute the full subclass hierarchy of the ontology.
  • Instance checking. Is individual \(i\) an instance of class \(A\)? (Boston : Capital?)
  • Consistency. Does the ontology have any model at all?

All four reduce to logical entailment in the underlying description logic. Modern reasoners (HermiT, Pellet, ELK) decide them in worst-case exponential time for expressive dialects, or polynomial time for restricted ones like \(\mathcal{EL}\).
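For atomic subclass axioms, classification reduces to transitive saturation, and subsumption and instance checking can then be answered by lookup. The sketch below does exactly that — class names are hypothetical, and real reasoners handle far richer axioms than plain A ⊑ B:

```python
# Classification over atomic subclass axioms by transitive saturation.
subclass = {("Mother", "Parent"), ("Parent", "Person"), ("Capital", "City")}

def classify(axioms):
    """Saturate: reflexive pairs plus transitive closure of ⊑."""
    closed = {(a, a) for pair in axioms for a in pair} | set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

hierarchy = classify(subclass)

def subsumes(sup, sub):
    """Subsumption: is sub ⊑ sup entailed by the saturated hierarchy?"""
    return (sub, sup) in hierarchy

def instance_of(ind_types, cls):
    """Instance checking: does any asserted type of the individual imply cls?"""
    return any(subsumes(cls, t) for t in ind_types)

print(subsumes("Person", "Mother"))       # True: Mother ⊑ Parent ⊑ Person
print(instance_of({"Mother"}, "Person"))  # True
```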

The Description-Logic Spectrum

Description logics form a spectrum of expressiveness/complexity trade-offs: each dialect picks a point on the frontier.

Dialect                         Expressiveness                                    Complexity of subsumption
\(\mathcal{EL}\)                Existential restrictions and intersections only   PTime
\(\mathcal{ALC}\)               Adds full negation and universal restrictions     ExpTime-complete
\(\mathcal{SROIQ}\) (OWL 2 DL)  Property chains, inverses, cardinality, nominals  NExpTime-complete

The biomedical community standardized on \(\mathcal{EL}\) for SNOMED-CT because PTime classification scales to hundreds of thousands of concepts. Less performance-sensitive ontologies use OWL 2 DL.

A common pattern is to write the ontology in the most expressive dialect it needs, but reason over a profile-conformant fragment so the reasoner stays fast.

The Open-World Assumption

Description logics are open-world by default: if the ontology does not entail \(\phi\), \(\phi\) is unknown, not false. This is a deliberate design choice for the web setting — the graph almost certainly lacks some facts about any individual.

It also means common-sense closed-world inferences do not hold. If the ontology lists Boston’s mayor and nothing more, a reasoner does not conclude that Boston has only one mayor — there might be uncatalogued co-mayors. Cardinality axioms must be stated explicitly to license such conclusions.
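The open-world reading can be sketched as a three-valued query: True when the fact is entailed, False when its negation is entailed, and None (unknown) otherwise. The triples below are illustrative, not real data:

```python
# Open-world query: absence of a fact is "unknown", not "false".
facts = {("Boston", "hasMayor", "personA")}
negated = set()                         # explicitly entailed negations

def holds(triple):
    if triple in facts:
        return True
    if triple in negated:
        return False
    return None                         # open world: simply unknown

print(holds(("Boston", "hasMayor", "personA")))  # True: asserted
print(holds(("Boston", "hasMayor", "personB")))  # None, not False
```

A closed-world system (a relational database, say) would return False for the second query; the open-world reasoner refuses to, because the graph may simply be missing a fact.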

Why Ontologies Matter

A few payoffs:

  • Data integration. When two organizations describe the same domain, an ontology lets you declare equivalences (OrgA:Customer ≡ OrgB:Client) and merge data automatically.
  • Quality control. Disjointness and domain/range axioms turn structural errors into logical inconsistencies a tool can find.
  • Inferred views. Class definitions yield virtual columns: “all Capitals” is computed by the reasoner rather than maintained by hand.
  • Standardized vocabularies. Schema.org, FOAF, the Gene Ontology, and SNOMED-CT all encode shared vocabularies so that disparate systems can interoperate.

When Ontologies Don’t Help

Three failure modes are worth naming:

  • Ontology engineering is hard. Choosing the right classes and capturing the right axioms takes domain expertise. Bad ontologies impose constraints that real data violates; the reasoner then reports the ontology as inconsistent.
  • Open-world surprises. Engineers used to relational databases assume the absence of a fact is a negation. Under OWA it is not, and queries return fewer answers than expected.
  • Computational complexity bites. Even decidable description logics can be exponential in the size of the ontology. Profile choice and incremental reasoning are non-negotiable for large deployments.

This is why most modern systems use ontologies narrowly: a small T-box of class definitions and constraints over a large knowledge graph of instances. The ontology provides schema discipline; the graph carries the data. Together they realize the description-logic level of the knowledge representation hierarchy.

References

Gruber, Thomas R. 1993. “A Translation Approach to Portable Ontology Specifications.” Knowledge Acquisition 5 (2): 199–220. https://doi.org/10.1006/knac.1993.1008.