Strongly Connected Components
Motivation
In a directed graph, reachability is one-way: \(u\) may reach \(v\) without \(v\) reaching \(u\). A strongly connected component (SCC) is a maximal set of vertices that are mutually reachable — every vertex in the set can reach every other along directed paths (Cormen et al. 2009; Kleinberg and Tardos 2005). Contracting each SCC to a single super-vertex produces the condensation, which is always a DAG. This makes SCCs the bridge between general directed graphs and the algorithmic toolkit for DAGs: once the SCC structure is known, problems like cycle analysis, dependency resolution with cycles, dataflow analysis, and 2-SAT reduce to working on the condensation.
Problem
Let \(G = (V, E)\) be a directed graph with \(n = |V|\) vertices and \(m = |E|\) edges. Define the relation \(u \sim v\) to mean “there is a directed path from \(u\) to \(v\) and a directed path from \(v\) to \(u\)” (with \(u \sim u\) by convention). This is an equivalence relation; its equivalence classes are the strongly connected components of \(G\). The goal is to compute the partition \(V = S_1 \cup S_2 \cup \dots \cup S_k\).
A seven-vertex digraph with three SCCs and its condensation. Within each SCC every vertex reaches every other; between SCCs only one direction holds.
Within SCC 1 the cycle \(A \to B \to C \to A\) makes every vertex mutually reachable; within SCC 2 the cycle \(D \to E \to F \to D\) does the same. The bridges \(C \to D\) and \(F \to G\) go between SCCs and are never matched by any back-path — adding either reverse edge would merge two SCCs into one.
Key Ideas
Two facts drive every linear-time SCC algorithm.
- The condensation is a DAG. If the condensation had a cycle \(C_1 \to C_2 \to \dots \to C_j \to C_1\), the vertices of \(C_1, \dots, C_j\) would all be mutually reachable, contradicting the maximality of each \(C_i\).
- DFS finish times respect the condensation. Run a complete DFS on \(G\) and record each vertex’s finishing time. For any two SCCs \(C, C'\) with a condensation edge \(C \to C'\), the maximum finish time inside \(C\) strictly exceeds the maximum finish time inside \(C'\). Equivalently, sorting SCCs by their largest finish time descending produces a topological order of the condensation.
The second fact is the surprise. It says DFS — which knows nothing about SCCs — already encodes the condensation’s topological order in its finish times. The Kosaraju–Sharir algorithm exploits this directly; Tarjan’s algorithm encodes the same insight more efficiently through low-link values computed during a single DFS.
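As a quick check of the second fact, the following sketch records DFS finish times on the seven-vertex graph used in the walkthrough below (the dict-of-lists representation is an illustrative choice) and confirms that the maximum finish time strictly decreases along the condensation edges SCC 1 → SCC 2 → SCC 3:

```python
# Edges match the worked example: A->B->C->A, C->D, D->E->F->D, F->G.
G = {
    "A": ["B"], "B": ["C"], "C": ["A", "D"],
    "D": ["E"], "E": ["F"], "F": ["D", "G"], "G": [],
}

finish = {}   # vertex -> finishing time (1 = finished first)
clock = 0

def dfs(u, visited):
    global clock
    visited.add(u)
    for v in G[u]:
        if v not in visited:
            dfs(v, visited)
    clock += 1
    finish[u] = clock   # post-order timestamp

visited = set()
for u in sorted(G):     # start at A, expand alphabetically
    if u not in visited:
        dfs(u, visited)

# Condensation order is {A,B,C} -> {D,E,F} -> {G}; max finish times decrease:
assert max(finish[v] for v in {"A", "B", "C"}) \
     > max(finish[v] for v in {"D", "E", "F"}) \
     > finish["G"]
```

Note that the fact is about the *maximum* finish time per SCC; individual vertices of a downstream SCC can finish before some vertices of an upstream one.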
Algorithm
Kosaraju–Sharir: two DFS passes
The cleanest algorithm is due to Kosaraju and Sharir (Sharir 1981).
- Run DFS on \(G\), pushing each vertex onto a stack as it finishes.
- Form the transpose \(G^T\) by reversing every edge.
- While the stack is non-empty, pop \(v\). If \(v\) is unvisited in the second pass, run DFS on \(G^T\) from \(v\); the vertices reached form one SCC.
# First pass: order vertices by finish time
finished = []
visited = set()

def dfs1(u):
    visited.add(u)
    for v in G[u]:
        if v not in visited:
            dfs1(v)
    finished.append(u)  # post-order

for u in V:
    if u not in visited:
        dfs1(u)

# Second pass: DFS in the transpose, taking vertices in reverse finish order
GT = transpose(G)  # reverse every edge
visited2 = set()
sccs = []

def dfs2(u, component):
    visited2.add(u)
    component.append(u)
    for v in GT[u]:
        if v not in visited2:
            dfs2(v, component)

for u in reversed(finished):
    if u not in visited2:
        component = []
        dfs2(u, component)
        sccs.append(component)
Why it works: the highest-finish vertex lies in a source SCC of the condensation, which becomes a sink SCC in \(G^T\). A DFS in \(G^T\) from a sink-SCC vertex cannot escape that SCC. Peeling it off and repeating processes one SCC at a time in topological order of the condensation.
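The two passes assemble into a short self-contained sketch (Python, recursive DFS; the function name `kosaraju` and the dict-of-lists graph encoding are illustrative choices, and the example graph is the one used in the walkthrough below):

```python
def kosaraju(G):
    """G: dict mapping each vertex to its list of successors.
    Returns SCCs (as sets) in topological order of the condensation."""
    # Pass 1: record vertices in post-order (finish-time order).
    finished, visited = [], set()

    def dfs1(u):
        visited.add(u)
        for v in G[u]:
            if v not in visited:
                dfs1(v)
        finished.append(u)

    for u in G:
        if u not in visited:
            dfs1(u)

    # Build the transpose: reverse every edge.
    GT = {u: [] for u in G}
    for u in G:
        for v in G[u]:
            GT[v].append(u)

    # Pass 2: DFS in GT, taking vertices in reverse finish order.
    visited2, sccs = set(), []

    def dfs2(u, comp):
        visited2.add(u)
        comp.add(u)
        for v in GT[u]:
            if v not in visited2:
                dfs2(v, comp)

    for u in reversed(finished):
        if u not in visited2:
            comp = set()
            dfs2(u, comp)
            sccs.append(comp)
    return sccs

G = {"A": ["B"], "B": ["C"], "C": ["A", "D"],
     "D": ["E"], "E": ["F"], "F": ["D", "G"], "G": []}
print(kosaraju(G))  # three SCCs, source of the condensation first
```

One practical caveat: recursive DFS can exceed Python’s default recursion limit on large graphs, so production implementations usually use an explicit stack.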
Tarjan’s algorithm: one DFS pass
Tarjan’s algorithm computes the same partition in a single DFS (Tarjan 1972). Each vertex receives a DFS index (the order it was entered) and a low-link value: the smallest index reachable from \(v\) via tree edges followed by at most one back-edge or cross-edge to a still-on-stack vertex. When DFS finishes \(v\) and \(\text{low}(v) = \text{index}(v)\), \(v\) is the root of an SCC; pop the stack down to \(v\) to emit that SCC.
index = 0
stack = []
on_stack = set()
index_of = {}
lowlink = {}
sccs = []

def strongconnect(v):
    global index
    index_of[v] = lowlink[v] = index
    index += 1
    stack.append(v)
    on_stack.add(v)
    for w in G[v]:
        if w not in index_of:        # tree edge: recurse
            strongconnect(w)
            lowlink[v] = min(lowlink[v], lowlink[w])
        elif w in on_stack:          # back/cross edge into the current stack
            lowlink[v] = min(lowlink[v], index_of[w])
    if lowlink[v] == index_of[v]:    # v is the root of an SCC
        component = []
        while True:
            w = stack.pop()
            on_stack.discard(w)
            component.append(w)
            if w == v:
                break
        sccs.append(component)

for v in V:
    if v not in index_of:
        strongconnect(v)
The two approaches compute the same partition; Tarjan’s avoids constructing \(G^T\) and visits each edge once.
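A self-contained runnable version of Tarjan’s algorithm makes the emission order visible (the function name `tarjan` and graph encoding are illustrative; the graph is the walkthrough’s example):

```python
def tarjan(G):
    """G: dict mapping each vertex to its list of successors.
    Returns SCCs (as sets) in REVERSE topological order: sinks first."""
    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs, counter = [], [0]   # counter boxed in a list to mutate in closure

    def strongconnect(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in G[v]:
            if w not in index_of:          # tree edge
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:            # edge into the current stack
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:      # v roots an SCC: pop down to v
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in G:
        if v not in index_of:
            strongconnect(v)
    return sccs

G = {"A": ["B"], "B": ["C"], "C": ["A", "D"],
     "D": ["E"], "E": ["F"], "F": ["D", "G"], "G": []}
print(tarjan(G))  # sinks first: the reverse of Kosaraju–Sharir's order
```

On this graph the SCCs come out as \(\{G\}\), then \(\{D, E, F\}\), then \(\{A, B, C\}\): each SCC is emitted when its root finishes, so sinks of the condensation appear first.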
Walkthrough
Use the digraph from the figure: edges \(A \to B\), \(B \to C\), \(C \to A\), \(C \to D\), \(D \to E\), \(E \to F\), \(F \to D\), \(F \to G\).
First DFS pass on \(G\)
Start at \(A\), expanding neighbors in alphabetical order.
| Step | Action | Recursion stack | Finished (in order) |
|---|---|---|---|
| 1 | enter A | A | |
| 2 | enter B (from A) | A, B | |
| 3 | enter C (from B) | A, B, C | |
| 4 | edge C→A: visited | A, B, C | |
| 5 | enter D (from C) | A, B, C, D | |
| 6 | enter E (from D) | A, B, C, D, E | |
| 7 | enter F (from E) | A, B, C, D, E, F | |
| 8 | edge F→D: visited | A, B, C, D, E, F | |
| 9 | enter G (from F) | A, B, C, D, E, F, G | |
| 10 | finish G | A, B, C, D, E, F | G |
| 11 | finish F | A, B, C, D, E | G, F |
| 12 | finish E | A, B, C, D | G, F, E |
| 13 | finish D | A, B, C | G, F, E, D |
| 14 | finish C | A, B | G, F, E, D, C |
| 15 | finish B | A | G, F, E, D, C, B |
| 16 | finish A | G, F, E, D, C, B, A |
Reverse finish order (highest first): \(A, B, C, D, E, F, G\).
Second DFS pass on \(G^T\)
Transpose edges: \(B \to A\), \(C \to B\), \(A \to C\), \(D \to C\), \(E \to D\), \(F \to E\), \(D \to F\), \(G \to F\).
Pop vertices in reverse finish order and DFS from each unvisited one:
| Pop | Already visited? | DFS in \(G^T\) reaches | SCC emitted |
|---|---|---|---|
| A | no | A → C → B (then stop) | {A, B, C} |
| B | yes | — | — |
| C | yes | — | — |
| D | no | D → C (visited), D → F → E → D (visited) | {D, E, F} |
| E | yes | — | — |
| F | yes | — | — |
| G | no | G → F (visited) | {G} |
The DFS rooted at \(A\) in \(G^T\) cannot reach \(D\), \(E\), \(F\), or \(G\) because the original bridges \(C \to D\) and \(F \to G\) flipped to \(D \to C\) and \(G \to F\) in \(G^T\) — these now lead into SCC 1, not out of it. That is exactly the “sink in \(G^T\)” property that confines the DFS to one SCC.
The three components \(\{A, B, C\}\), \(\{D, E, F\}\), \(\{G\}\) are emitted in topological order of the condensation: a source of the condensation is processed first, then the next, then the sink.
Complexity and Tradeoffs
Both algorithms run in \(\Theta(n + m)\) time and use \(\Theta(n + m)\) space (Kosaraju–Sharir builds an explicit transpose; Tarjan keeps a recursion stack plus the explicit vertex stack).
Important tradeoffs:
- Conceptual simplicity. Kosaraju–Sharir is easier to understand and prove correct — its correctness reduces to a single fact about DFS finish times. Tarjan’s relies on the more subtle low-link invariant.
- Constant factors. Tarjan’s makes only one DFS pass and avoids materializing \(G^T\), so it is typically about twice as fast in practice and uses less memory.
- Streaming the edges. When \(G\) is given as an edge list and not stored as an adjacency structure, building \(G^T\) costs an extra pass. Tarjan’s avoids this.
- Topological order of the condensation. Kosaraju–Sharir emits SCCs in topological order naturally. Tarjan’s emits them in reverse topological order (each SCC is finalized when its DFS root finishes — sinks first).
When to Use It
Compute SCCs whenever a problem on a directed graph either needs to identify cycle clusters or wants to reduce to a DAG problem.
- DAG-ifying a digraph: contract SCCs to obtain the condensation, then apply topological sort, DAG shortest paths, or dynamic programming on DAGs.
- 2-satisfiability: a 2-SAT formula on \(n\) variables is satisfiable iff no variable \(x\) shares an SCC with \(\neg x\) in the implication graph — an \(O(n + m)\) test.
- Dependency analysis with cycles: when build, module, or compilation dependencies legitimately contain cycles, SCCs identify the indivisible “groups” that must be processed together.
- Dataflow analysis and compiler optimization: loop bodies in control-flow graphs correspond to non-trivial SCCs; many fixed-point analyses iterate over the condensation.
- Model checking and verification: detecting reachable cycles in a state-transition graph reduces to finding non-trivial SCCs.
- Strong-connectivity test: \(G\) is strongly connected iff it has exactly one SCC.
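The “DAG-ifying” step above is a few lines once the SCC partition is known. Here is a sketch that contracts a partition into the condensation (the function name `condensation` is an illustrative choice; any SCC routine from the Algorithm section could supply `sccs`):

```python
def condensation(G, sccs):
    """Contract each SCC to one super-vertex.
    Returns (comp_of: vertex -> SCC index, dag: SCC index -> set of successors)."""
    comp_of = {}
    for i, comp in enumerate(sccs):
        for v in comp:
            comp_of[v] = i
    dag = {i: set() for i in range(len(sccs))}
    for u in G:
        for v in G[u]:
            if comp_of[u] != comp_of[v]:   # keep only inter-SCC edges
                dag[comp_of[u]].add(comp_of[v])
    return comp_of, dag

# Example: the seven-vertex graph from the walkthrough.
G = {"A": ["B"], "B": ["C"], "C": ["A", "D"],
     "D": ["E"], "E": ["F"], "F": ["D", "G"], "G": []}
sccs = [{"A", "B", "C"}, {"D", "E", "F"}, {"G"}]
comp_of, dag = condensation(G, sccs)
print(dag)  # {0: {1}, 1: {2}, 2: set()} — a path, matching the figure
```

Because the result is guaranteed to be a DAG, topological sort, DAG shortest paths, or any DAG dynamic program can be applied to `dag` directly.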