$\newcommand{\V}[1]{\mathbf #1}$ $\newcommand{\M}[1] {#1}$ $\newcommand{\Reals} { \mathbb{R} }$


C.1. Data Structures

A fundamental concept in computer science is the graph (aka, network). This is a structure that defines a set of objects and connectivity relations between pairs of objects. The objects are known as vertices (or sometimes nodes), and the relations are known as edges (or sometimes arcs).


A graph $G = (V,E)$ consists of a set of vertices $V$ and a set of edges $E$, in which each edge is a pair $(u,v)$ with $u\in V$ and $v \in V$. For every such pair $(u,v)$, $v$ is called a neighbor of $u$. We can also define the neighbors of a vertex as $N(u) = \{ v \in V \quad |\quad (u,v)\in E\}$.

Graphs can either be directed in which the existence of an edge $(u,v)$ indicates that a path can move from $u$ to $v$, or undirected in which paths can run either direction along an edge. For the undirected case, we shall simply require that if $(u,v) \in E$ then $(v,u) \in E$ as well.

For each edge $(u,v)$ we say that $u$ is the tail of the edge and $v$ is its head. A sequence of connected edges $(v_0,v_1),(v_1,v_2),\ldots,(v_{k-1},v_k)$ with each $(v_{i-1},v_i)\in E$ is known as a path in the graph. If a path exists between two vertices in an undirected graph, we say the vertices are connected. A path that starts and ends at the same vertex ($v_k=v_0$) is called a cycle.



Figure 1. A tree is a graph with no cycles. (a) A directed tree is drawn with its root at the top, with a node's children drawn on the layer below. Leaf nodes are shaded. (b) A vertex's ancestors (shaded) are the vertices on the path from the root. (c) A vertex's descendants (shaded) forms a subtree of the original tree where $v$ is the root.

A tree is a special kind of undirected graph in which there are no cycles and in which all vertices are connected. A tree can be thought of a directed graph in which each vertex $v$ is the head of at most one edge in $E$. In fact, all vertices are the head of exactly one edge except one special vertex, called the root of the tree (Fig. 1.a).

In a directed tree, we define two concepts for each vertex $v$. The parent of $v$, $Parent(v)$ is defined as the unique vertex $u$ for which $(u,v)\in E$, or nil if $v$ is the root. The children of $v$ are defined as the heads of edges for which $v$ is a tail: $$Children(v) = \{ w \in V \quad |\quad (v,w) \in E\}.$$ If a vertex has no children, then it is called a leaf. Trees are usually drawn in downward fashion starting from the root, with all children of a node on the next level down.

With this definition, we can also define the ancestors of a vertex $v$ which is the set of nodes along the path from the root to $v$, inclusive (Fig. 1.b). This set includes $v$, $v$'s parent, the parent of $v$'s parent, and so on, until the root is reached. Similarly, the descendants of $v$ are the set of nodes $w$ for which either $v=w$ or there exists a directed path from $v$ to $w$. In other words, $v$'s descendants are $v$, $v$'s children, $v$'s children's children, and so on. The tree consisting of $v$'s descendants and edges between them is also known as the subtree at $v$. In the subtree. $v$ is the root (Fig. 1.c).


Representations of trees are simpler than general graphs, so we will describe them first. In most object-oriented programming languages, a tree can be represented as a simple hierarchical data structure. Define a structure TreeNode representing a vertex and containing its entire subtree. It has the following members:

  • data: application-specific data to be stored with the vertex.
  • children: array or list of TreeNodes.
  • parent: optional, a TreeNode reference.

Then the entire tree is stored as the root TreeNode. With this structure it is possible to search downward from the root for any desired node. If the parent member is present, then it is also possible to search upward from any node.

Graph data structures are a bit more complex. The most straightforward representation, which stores $V$ and $E$ as lists of vertices and pairs of vertices, respectively, is not computationally convenient for most purposes. For example, to determine a vertex's neighbors $N(v)$, one would need to loop through all of the edges in $E$ to find if $v$ is one of the edge's endpoints. This is an $O(|E|)$ operation, which is $O(|V|^2)$ in the worst case! Since this is a fundamental operation to most interesting operations on graphs (determining connectedness, graph search, etc) the adjacency list representation is often preferred.


Figure 2. A simple graph

The adjacency list representation stores for each vertex $v$ the list of its neighbors $N(v)$. Specifically, if we were to number the $n\equiv |V|$ vertices as $0,1,\ldots,n-1$ (assuming 0-based indexing in the programming language), the adjacency list representation stores a nested list edges$[0], \ldots,$ edges$[n-1]$ where each edges$[i]$ is a list of vertex indices in $\{ 0,\ldots,n-1 \}$. The code below illustrates this for a simple graph shown in Fig. 2.

data = ['A','B','C','D'] #vertex names
edges = [[1,2],#indices of neighbors of 'A'
         [3],  #indices of neighbors of 'B'
         [3],  #indices of neighbors of 'C'
         []]   #indices of neighbors of 'D'

Any application-specific data associated with vertices is stored as an auxiliary array data$[0], \ldots,$ data$[n-1]$. The main disadvantage of this representation is that identifying whether an edge exists requires iterating through the adjacency list of the edge's tail.

The alternative adjacency matrix representation is useful for a number of operations, such as determining whether an edge exists in the graph and various graph-theoretic operations. This represents the edges of a graph as an $n \times n$ array such that edges$[i,j]$ is 1 if there is an edge in $E$ from the vertex indexed by $i$ to the vertex indexed by $j$, and 0 otherwise. Again we are assuming 0-based indexing. the matrix $$ \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$ gives the adjacency list for the graph shown in Fig. 2. The main disadvantages of this representation are that 1) it takes $n^2$ space to store, and 2) determining neighbors of a vertex requires checking $n$ entries.