Dependency Structure

Motivation

Syntactic structure represents how words in a sentence relate to each other. There are two main paradigms: constituency parsing groups words into nested phrases, and dependency parsing identifies binary directed relations between individual words. Dependency representations are compact, directly encode grammatical relationships, and work well across languages with flexible word order.

Dependency Trees

A dependency tree for a sentence of \(n\) words is a directed tree where:

  • Every word is a node.
  • Exactly one node is the root — the main predicate, typically the main verb.
  • Every non-root node has exactly one head (parent), connected by a directed arc from head to dependent.
  • Each arc carries a relation label describing the grammatical role of the dependent (e.g., nsubj for nominal subject, obj for direct object, amod for adjectival modifier).

For the sentence “The cat sat on the mat”:

sat → cat  (nsubj)
sat → mat  (obl)
mat → on   (case)
cat → The  (det)
mat → the  (det)

The root is “sat”; every other word depends on exactly one head.

Universal Dependencies

The Universal Dependencies (UD) project defines a cross-lingual standard for dependency annotation. UD uses a fixed inventory of around 40 relation types and annotation guidelines designed to be consistent across 100+ languages. UD treebanks enable multilingual parsing research and the development of language-agnostic models.
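UD treebanks are distributed in the CoNLL-U format: one word per line, ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). A minimal reader for the columns used in this section might look like the sketch below; real files also contain multiword-token ranges and empty nodes, which it simply skips:

```python
def read_conllu(text):
    """Extract (form, head, deprel) triples from basic CoNLL-U word lines."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue              # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue              # skip multiword tokens (1-2) and empty nodes (1.1)
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        rows.append((cols[1], int(cols[6]), cols[7]))
    return rows

sample = (
    "# text = The cat sat\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsat\tsit\tVERB\t_\t_\t0\troot\t_\t_\n"
)
print(read_conllu(sample))
# [('The', 2, 'det'), ('cat', 3, 'nsubj'), ('sat', 0, 'root')]
```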

Projectivity

A dependency tree is projective if, for every arc \((h, d)\), every word between \(h\) and \(d\) in linear sentence order is a descendant of \(h\). Intuitively, arcs do not cross when drawn as curves above the sentence. English is mostly projective; languages with scrambling (German, Czech, Finnish) have more non-projective structures.

Projectivity matters algorithmically: the fastest exact parsing algorithms assume projectivity and run in \(O(n^3)\) via dynamic programming. Handling non-projective structures requires more general but slower algorithms.
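The definition translates directly into a check over a head array (a sketch, assuming the 1-indexed head-array encoding with 0 for the root, and a valid tree as input):

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (1-indexed); 0 marks the root."""
    def descends_from(w, h):
        # Follow head links upward from w until we reach h or the root.
        while w != 0:
            if w == h:
                return True
            w = heads[w - 1]
        return False
    for d, h in enumerate(heads, start=1):
        if h == 0:
            continue
        # Every word strictly between h and d must be a descendant of h.
        for k in range(min(h, d) + 1, max(h, d)):
            if not descends_from(k, h):
                return False
    return True

print(is_projective([2, 3, 0, 6, 6, 3]))  # True: "The cat sat on the mat"
print(is_projective([0, 1, 1, 2]))        # False: arcs 1->3 and 2->4 cross
```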

Transition-Based Parsing

Transition-based dependency parsers (Nivre 2008) process the sentence left-to-right using a stack and a buffer:

  • Stack: words currently being processed.
  • Buffer: input words not yet processed.
  • Arc set: the dependency arcs built so far.

Three transitions (arc-standard system):

Transition        Effect
Shift             Move the front of the buffer onto the stack.
Left-Arc(\(r\))   Add arc (buffer front → stack top) with label \(r\); pop the stack top.
Right-Arc(\(r\))  Add arc (stack top → buffer front) with label \(r\); remove the buffer front and return the stack top to the front of the buffer.

A classifier chooses the transition at each step based on features of the current configuration. Transition-based parsers run in \(O(n)\) steps — a small constant number of transitions per word — making them fast in practice and suitable for processing large corpora.
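The machinery can be sketched in a few lines. The transition sequence below is hand-chosen for the example sentence under a UD-style analysis; in a real parser, a classifier selects each transition from the current configuration:

```python
# A minimal arc-standard machine (stack/buffer formulation).
# Words are 1-indexed; arcs are (head, label, dependent) triples.

def parse(n, transitions):
    stack, buffer, arcs = [], list(range(1, n + 1)), []
    for t, label in transitions:
        if t == "SH":                        # Shift: buffer front -> stack
            stack.append(buffer.pop(0))
        elif t == "LA":                      # Left-Arc: buffer front heads stack top
            arcs.append((buffer[0], label, stack.pop()))
        elif t == "RA":                      # Right-Arc: stack top heads buffer front
            head = stack.pop()
            arcs.append((head, label, buffer.pop(0)))
            buffer.insert(0, head)           # head returns for further attachment
    return arcs

# "The cat sat on the mat": a hand-chosen gold sequence.
seq = [("SH", None), ("LA", "det"), ("SH", None), ("LA", "nsubj"),
       ("SH", None), ("SH", None), ("SH", None), ("LA", "det"),
       ("LA", "case"), ("RA", "obl"), ("SH", None)]
print(parse(6, seq))
# [(2, 'det', 1), (3, 'nsubj', 2), (6, 'det', 5), (6, 'case', 4), (3, 'obl', 6)]
```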

Graph-Based Neural Parsers

Modern parsers use pretrained transformer encoders to produce contextual embeddings for every word, then apply a bilinear classifier to score every possible (head, dependent, label) triple. A maximum spanning tree algorithm such as Chu-Liu/Edmonds (for non-projective trees) or the Eisner algorithm (for projective trees) then finds the globally highest-scoring valid tree. These graph-based parsers achieve state-of-the-art accuracy and handle non-projective structures naturally, at the cost of an \(O(n^2)\) scoring step.
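The scoring step can be illustrated with a toy bilinear scorer. The two-dimensional "embeddings" and weight matrix below are made up for illustration — a real parser learns them — and the greedy head selection stands in for proper MST/Eisner decoding:

```python
# Toy bilinear arc scoring: s(h, d) = h_vec . W . d_vec.
# Index 0 is an artificial ROOT node that can only act as a head.

def score(h_vec, W, d_vec):
    return sum(h_vec[i] * W[i][j] * d_vec[j]
               for i in range(2) for j in range(2))

vecs = [[1.0, 0.0],  # 0: ROOT
        [0.2, 0.9],  # 1: The
        [0.8, 0.3],  # 2: cat
        [0.9, 0.9]]  # 3: sat
W = [[1.0, 0.2], [0.1, 0.5]]

n = 3
# Greedy head choice per dependent. This can produce cycles in general,
# which is why real parsers decode with Chu-Liu/Edmonds or Eisner instead.
heads = [max((h for h in range(n + 1) if h != d),
             key=lambda h: score(vecs[h], W, vecs[d]))
         for d in range(1, n + 1)]
print(heads)  # [3, 3, 0]: "The" and "cat" attach to "sat", "sat" to ROOT
```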

References

Nivre, Joakim. 2008. “Algorithms for Deterministic Incremental Dependency Parsing.” Computational Linguistics 34 (4): 513–53. https://aclanthology.org/J08-4003/.