PDF of Eric’s handwritten notes are here.
DFS on Directed Graphs and Strongly Connected Components
In this lecture we’ll review the classic DFS (depth first search) algorithm, look at its application to directed graphs and then use it find strongly connected components of a general, directed graph.
We begin by reviewing the basic DFS algorithm for undirected graphs. The below pseudocode finds the connected components for an undirected graph that is given in adjacency list representation.
Undirected Connected Components
Below is the pseudocode. There is a global counter ccnum which is the number of the current connected component that is being explored, and each vertex is assigned a number cc[v] which is the connected component for that vertex.
DFS-cc(G):
- For all
- visited(v) = False
- ccnum = 0
- For all
:
- If not visited(v) then
- ccnum++
- Explore(v)
- If not visited(v) then
Explore(z):
- visited(z) = True
- cc(z) = ccnum
- for
- if not visited(w) then
- Explore(w)
- if not visited(w) then
The above algorithm runs in O(n+m) time where n=|V| and m=|E|.
Directed Graph
We want to extend the above algorithm to directed graphs. This will require more work and we’ll need to extend the basic DFS algorithm to keep track of further information. Recall that in a rooted tree we can label the vertices with numbering based on preorder, inorder, and postorder traversal of the tree. We will use preorder and postorder numbers. Thus we will have a global counter clock and when we first visit a node we assign a preorder number and increment the clock. When we finish exploring the neighbors of a node we assign it a postorder number and increment the clock.
Here is the pseudocode:
DFS(G):
- for all
- visited(v) = False
- clock = 0
- ccnum = 0
- for all
- if not visited(v) then
- ccnum++
- Explore(v)
- if not visited(v) then
Explore(z):
- visited(z) = True
- pre(z) = clock
- clock++
- cc(z) = ccnum
- for
- if not visited(w) then
- Explore(w)
- if not visited(w) then
- post(z) = clock
- clock++
There are 4 types of edges in the DFS tree (see Eric’s handwritten notes for an example):
- Tree edges: explored edges meaning that when considering this edge w is visited for the first time and thus we then run Explore(w).
- Back edges: if v is a descendant of w.
- Forward edges: if v is an ancestor of w (and it is not a tree edge).
- Cross edges: all remaining edges (in these cases v and w have no ancestor-descendant relationship).
It will be useful to consider the properties of the postorder numbers for the 4 types of edges. For edge :
- Tree edge: post(v) > post(w)
- Back edge: post(w) > post(v)
- Forward edge: post(v) > post(w)
- Cross edge: post(v) > post(w)
Notice that for back edges the postorder number of the tail w > postorder number of the head v, but it is the opposite for all other types of edges. Back edges are important for detecting cycles.
Theorem: A directed graph G has a cycle if and only if its DFS tree has a back edge.
Proof. (): Consider a cycle
. Some vertex of C is visited first by the DFS algorithm. Let
denote the first vertex of C that’s visited. From
we can reach every other vertex in C, thus the rest of the vertices of C are in the subtree rooted at
and therefore they are all descendants of
in the DFS tree. Therefore, the edge
will be a back edge.
(): Say the back edge is
. Since
is a descendant of
in the DFS tree there is a path
from
to
. Therefore,
is a cycle.
Topological Sorting
Consider a directed acyclic graph; this is a graph with no cycles which is known as a DAG. Our goal is topologically sort the graph. This is an ordering of the vertices so that all edges go from earlier to later in the ordering.
For a DAG we know that every DFS tree has no back edges. Therefore all edges satisfy:
. Hence we can topologically sort a DAG by decreasing postorder number.
Note that in the first vertex in the topological sorting is a source vertex meaning that it has no incoming edges, whereas the last vertex is a sink vertex, which is a vertex with no outgoing edges. There may be other source or sink vertices. Finally since the vertex with the highest postorder number is first in our topologically sorting algorithm we know that in a DAG the vertex with highest postorder number is guaranteed to be a source. Similarly, in a DAG the vertex with lowest postorder number is guaranteed to be a sink.
Strongly Connected Components
How do we solve connectivity in general directed graphs? First off we need to define the appropriate notion of connectivity. We say that vertices v and w are strongly connected if there exists a path from v to w and a path from w to v. Then a strongly connected component, denoted as a SCC, is a maximal set of strongly connected vertices.
Take a directed graph, partition the vertices into their SCC’s. Now consider the meta-graph where there is a meta-vertex for each SCC and an edge from SCC S to S’ if some v∈ S has an edge to some w∈ S’. This meta-graph is a DAG because if there is a cycle then the SCC’s on that cycle should be merged into one SCC. Thus the SCC’s can be topologically sorted. We will find the SCC’s and we will also sort these SCC’s in topological order. Moreover we will do this with just 2 runs of DFS. This algorithm is due to Rao Kosaraju (see the Wikipedia page for Kosaraju’s algorithm).
Our basic approach is to cleverly choose a start vertex v. We then run Explore(v) to discover the SCC containing v, but we only want to visit this SCC containing v and no other vertices. How should we choose this v? We want v to be in a sink SCC (meaning that the SCC containing v is a sink vertex in the meta-graph of SCC’s).
How do we get a vertex that is guaranteed to be in a sink SCC? Recall that for a DAG the vertex with lowest postorder number is guaranteed to be a sink vertex. Thus we might expect that for a general directed graph that the vertex with lowest postorder number is guaranteed to be in a sink SCC. This is not always the case (see the [DPV] textbook or Eric’s handwritten notes for a simple counterexample). But it turns out that the vertex with highest postorder number is guaranteed to be in a source SCC.
Key Lemma: For a directed graph G, for any DFS on G, the vertex with highest postorder number lies in a source SCC.
We’ll prove the lemma in a moment, but let’s first use it to finish up our SCC algorithm. We need a vertex in a sink SCC but we can find a vertex in a source SCC. What do we do? The key idea is to look at the reverse graph.
For a directed graph define its reverse graph by
where
(thus
and
have the same vertices and
reverses every edge of
).
Observe that S is a source SCC in iff S is a sink SCC in
. Similarly, S is a sink SCC in
iff S is a source SCC in
. Hence we have our algorithm.
SCC Algorithm
Take as input a directed graph in adjacency list representation.
- Build
.
- Run DFS on
(this is the DFS algorithm for directed graphs).
- Order
by decreasing postorder numbers from step (2).
- Run DFS-cc(
) with vertices ordered from step (3).
Complexity: .
Proof of Key Lemma.
It remains to prove the key lemma that the vertex with maximum postorder number lies in a source SCC. To prove the lemma we’ll use the following simpler claim.
Claim: For SCC’s and
, if there is an edge from some
to some
, then the maximum postorder number in S > maximum postorder number in S’.
Before we prove the claim, let’s use it to prove the lemma. Notice that by the claim we can topologically sort the SCC’s by the maximum postorder number in each SCC (the claim then implies that all edges go from earlier SCC’s to later SCC’s in this ordering and thus it’s a valid topologically sorting of the SCC’s). Now look at the first SCC in this ordering, call it S. SCC S contains the vertex with the maximum postorder number, and since S is first in the topological ordering it is a source SCC. Hence
lies in a source SCC and that proves the lemma. We simply need to prove the claim now.
Proof of claim: Let be the first vertex in
that’s visited.
If , then we visit and finish all of S’ before visiting any of S because S’ has no path to S (since S and S’ are distinct SCCs). Hence, post(
) post(
) for all
. This completes the proof of the claim.