Ellipsoid Method

The simplex algorithm is quite fast in practice, but no variant of it is known to run in polynomial time. The Ellipsoid Algorithm was the first algorithm shown to solve linear programming in polynomial time. It solves the feasibility problem

find x

subject to Ax \leq b

i.e., find a feasible solution, or in other words, given a convex set P, written as Ax \leq b, find a point in P.

We start with a ball large enough to be guaranteed to contain P. In general, an ellipsoid is E(z, A) = \{ x: (x-z)^\top A^{-1}(x-z) \leq 1 \}; the initial ellipsoid is E_0 = E(0, R^2I), the ball of radius R centered at the origin, where R is sufficiently large.

Repeat:

  1. check whether the current center z is feasible; if so, stop
  2. if not, find a violated constraint a (so that P \subseteq \{x: a\cdot x \leq a\cdot z \}), and let E_{t+1} = E'(z', A') be the minimum volume ellipsoid containing E_t \cap \{x: a\cdot x \leq a\cdot z \}, where z' = z - \frac{1}{n+1} \frac{Aa}{\sqrt{a^\top Aa}}, A' = \frac{n^2}{n^2-1}(A - \frac{2}{n+1}\frac{Aaa^\top A^\top}{a^\top Aa}) (see the sketch below)
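
For concreteness, here is a minimal numpy sketch of a single update step using the formulas above; it is not from the original notes, and the names z, A, a simply mirror the notation.

```python
import numpy as np

def ellipsoid_step(z, A, a):
    """One ellipsoid-method update: given E(z, A) = {x : (x-z)^T A^{-1} (x-z) <= 1}
    and a violated constraint a (so the feasible set lies in {x : a.x <= a.z}),
    return the center and matrix of the minimum-volume ellipsoid containing
    E(z, A) intersected with that halfspace."""
    n = len(z)
    Aa = A @ a
    z_new = z - (1.0 / (n + 1)) * Aa / np.sqrt(a @ Aa)
    A_new = (n ** 2 / (n ** 2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(Aa, Aa) / (a @ Aa))
    return z_new, A_new
```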

Now let’s prove the correctness and polynomiality of the algorithm. We will use the following lemmas.

Lemma 1. \langle R \rangle \leq poly(n,\langle A \rangle, \langle b\rangle) where \langle \cdot \rangle denotes the number of bits required.

Lemma 2. vol(E_{t+1}) \leq e^\frac{-1}{2n+2} vol(E_t)

Lemma 3. Given E(z, A), the minimum volume ellipsoid containing E(z, A) \cap \{x : a\cdot x \leq a\cdot z\} has center z' and matrix A' given as

z' = z - \frac{1}{n+1} \frac{Aa}{\sqrt{a^\top Aa}}          A' = \frac{n^2}{n^2-1}(A - \frac{2}{n+1}\frac{Aaa^\top A^\top}{a^\top Aa})

Now the theorem we want to prove:

Theorem 4. The algorithm either finds a feasible solution x or we can go to a lower dimensional space in O(n^2 log R) iterations.

Proof of Lemmas 2 and 3:

Without loss of generality, assume a=(1,0,0,\dots,0)^\top and that E_t is the unit ball B_n centered at the origin (i.e., z = 0 and A = I).

[Figure: the half-ball E_t \cap \{x: a\cdot x \leq a\cdot z\} and the minimum-volume ellipsoid E' containing it.]

By symmetry, the center of E' moves only along e_1, by an amount we again denote z, and E' has axis lengths (1-z), r, r, \dots, r for some r. The point (0,1,0,\dots,0) is on the boundary of E',

i.e. (restricting to the e_1e_2-plane) \left( (0,1) - (z, 0) \right)\begin{pmatrix} \frac{1}{(1-z)^2} & 0 \\0 & \frac{1}{r^2}  \end{pmatrix} \left ( \begin{pmatrix} 0 \\ 1 \end{pmatrix} - \begin{pmatrix} z \\ 0 \end{pmatrix} \right ) = 1, or

\frac{z^2}{(1-z)^2}+\frac{1}{r^2}=1 \Longrightarrow r = \frac{1-z}{\sqrt{1-2z}}

vol(E') = vol(B_n) \cdot \prod_i a_i, where B_n is the unit n-ball and the a_i are the axis lengths of E'

\begin{aligned} vol(E') &= vol(B_n) \cdot \prod_i a_i \\ &= vol(B_n) \cdot (1-z)\cdot r^{n-1} \\ &=vol(B_n) \cdot \frac{(1-z)^n}{(1-2z)^\frac{n-1}{2}}\end{aligned}

Minimizing f(z) = \frac{(1-z)^n}{(1-2z)^\frac{n-1}{2}} we get z=\frac{1}{n+1} (by setting f'(z) = 0 and solving for z)

and therefore the axes lengths are:

1-z = \frac{n}{n+1} \qquad\qquad r = \frac{\frac{n}{n+1}}{\sqrt{\frac{n-1}{n+1}}} = \frac{n}{\sqrt{n^2-1}}

and the volume ratio \frac{vol(E')}{vol(E_t)} = \frac{vol(E')}{vol(B_n)} is:

\frac{n^n}{(n+1)(n^2-1)^\frac{n-1}{2}}

and we can derive the inequality:

\begin{aligned} \frac{n^n}{(n+1)(n^2-1)^\frac{n-1}{2}} &= \left (\frac{n}{n+1}\right )\cdot\left ( \frac{n^2}{n^2-1}\right )^\frac{n-1}{2} \\ &= \left( 1-\frac{1}{n+1}\right )\left ( 1+\frac{1}{n^2-1}\right )^\frac{n-1}{2} \\&\leq e^\frac{-1}{n+1} \cdot e^\frac{1}{2(n+1)} = e^{-\frac{1}{2n+2}}\end{aligned}.

Each iteration of Ellipsoid algorithm only needs a violated inequality for the current query point. It does not need any other information from the feasible set. So in fact, the algorithm can be used much more generally, for convex programming.

Separation Oracle: a procedure which, given a query point z, answers whether z \in K (YES or NO), and in the latter case outputs a separating hyperplane.
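
As a sketch (not part of the notes), a separation oracle for an explicitly given polytope \{x : Ax \leq b\} simply looks for a violated row:

```python
import numpy as np

def polytope_separation_oracle(A, b, z):
    """Return (True, None) if z satisfies A z <= b; otherwise (False, a), where a
    is the normal of a violated constraint, so K = {x : A x <= b} lies in
    {x : a.x <= a.z}."""
    violation = A @ z - b
    i = int(np.argmax(violation))
    if violation[i] <= 0:
        return True, None
    return False, A[i]
```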

Theorem 5. Given r, R s.t. \exists x_0 ~~ x_0+rB_n \subset K \subset RB_n and access to a separation oracle for K, the Ellipsoid Algorithm finds a point in K in at most O(n^2 log(\frac{R}{r})) queries.

Each query shrinks the volume of the current ellipsoid by a factor of e^{-\frac{1}{2n+2}}, so O(n) queries suffice to halve the volume. Shrinking the volume from that of RB_n down to that of rB_n, a ratio of (\frac{R}{r})^n, takes n \log\frac{R}{r} halvings, for a total of O(n^2 \log\frac{R}{r}) queries.


Linear Programming II

Linear Program Equivalence

Note that the standard form of a linear program takes the following form

objective:

\max\{ \vec{c} \cdot \vec{x} \}

constraints:

A \vec{x} \leq \vec{b} and \vec{x} \geq 0

where A \in \mathbb{R}^{m \times n}, \vec{b} \in \mathbb{R}^{m}, and \vec{c} \in \mathbb{R}^{n}. Yet linear program constraints may take many forms, some of which are described below

  • A \vec{x} = \vec{b} and \vec{x} \geq 0. Define A' \in \mathbb{R}^{2m \times n}, \vec{b}' \in \mathbb{R}^{2m} as follows

A' = \begin{bmatrix}A \\- A\end{bmatrix}

\vec{b}' = \begin{bmatrix}\vec{b} \\- \vec{b}\end{bmatrix}

If vector \vec{x} \geq 0 satisfies A' \vec{x} \leq \vec{b}', then

A \vec{x} \leq \vec{b} and -A \vec{x} \leq -\vec{b}

which implies A \vec{x} = \vec{b}. Thus, this linear program may be expressed in standard form.

  • A \vec{x} \leq \vec{b} and \vec{x} \in \mathbb{R}^{n} is unrestricted in sign (free). Define A' \in \mathbb{R}^{m \times 2n} as follows (see the sketch after this list)

A' = \begin{bmatrix}A, - A\end{bmatrix}

If vector \vec{x} = \big[\vec{y}, \vec{z}\big] \geq 0 satisfies A' \vec{x} \leq \vec{b}, then

A' \vec{x} = A\vec{y} - A \vec{z} = A (\vec{y} - \vec{z}) \leq \vec{b}

where \vec{y}, \vec{z} \geq 0. Thus, this linear program may be expressed in standard form.
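
A tiny numpy sketch of these two conversions (the function names are made up for illustration):

```python
import numpy as np

def equality_to_standard(A, b):
    """Rewrite A x = b, x >= 0 as A' x <= b' with A' = [A; -A] and b' = [b; -b]."""
    return np.vstack([A, -A]), np.concatenate([b, -b])

def free_to_standard(A):
    """Handle a sign-unrestricted x by writing x = y - z with y, z >= 0; the
    constraint matrix acting on the stacked variables (y, z) is A' = [A, -A]."""
    return np.hstack([A, -A])
```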

We will use these linear program equivalences when we discuss the simplex algorithm. But first, we return to Farkas’ lemma and provide an intuition for its proof, which you will formalize in your homework.

Farkas’ Lemma

For any matrix A \in \mathbb{R}^{m \times n}, vector \vec{b} \in \mathbb{R}^{m}, exactly one of the following holds

  • (1) There exists \vec{x} \in \mathbb{R}^{n} such that

\vec{x} \geq 0 and A \vec{x} = \vec{b}

  • (2) There exists \vec{y} \in \mathbb{R}^{m} such that

\vec{b} \cdot \vec{y} < 0 and A^{T} \vec{y} \geq 0

Here’s intuition for the proof. Let A = \big[\vec{a}_{1}, \vec{a}_{2}, \dots,  \vec{a}_{n}\big] where \vec{a}_{j} is the j^{\text{th}} column vector. For any vector \vec{x} \geq 0,

A \vec{x} = \sum_{j = 1}^{n} x_{j} \vec{a}_{j}

so the matrix-vector product above must lie in the conical hull H = \bigg\{A \vec{x} \big| \vec{x} \geq 0 \bigg\} of the column vectors \{\vec{a}_{1}, \vec{a}_{2}, \dots,  \vec{a}_{n}\}. There are two possibilities:

  1. \vec{b} \in H. Clearly, this implies there exists vector \vec{x} \in \mathbb{R}^{n} such that \vec{x} \geq 0 and A \vec{x} = \vec{b}
  2. \vec{b} \notin H. This implies there exists a hyperplane through the origin \vec{0} \in \mathbb{R}^{m} that separates the vectors in the conical hull H from the vector \vec{b}. (You will prove this implication in your homework.) As such, there exists \vec{y} \in \mathbb{R}^{m} such that \vec{a}_{j} \cdot \vec{y} \geq 0 for j from 1 to n and \vec{b} \cdot \vec{y} < 0. In other words, \vec{b} \cdot \vec{y} < 0 and A^{T} \vec{y} \geq 0.

This lemma equips us with the ability to prove part (4) of the duality theorem

Linear Programing Duality Theorem

Given primal program P, dual program D, there exist four possibilities

  1. P is infeasible and D is unbounded
  2. P is unbounded and D is infeasible
  3. P, D are both infeasible
  4. P, D are both feasible

A linear program is infeasible if no solution satisfies all constraints, and unbounded if the objective value can be made arbitrarily good (arbitrarily large for a maximization, arbitrarily small for a minimization). We shall demonstrate that in case (4), P, D have the same optimal value.

Let P be primal program that is feasible and bounded with objective \max\{ \vec{c} \cdot \vec{x} \} and constraints A \vec{x} = \vec{b} and \vec{x} \geq 0. Also, let \vec{x}^{*} \geq 0 be vector that satisfies constraints and achieves optimum z^{*} = \vec{c} \cdot \vec{x}^{*}.

Dual program D has objective \min\{ \vec{b} \cdot \vec{y} \} and constraints  A^{T} \vec{y} \geq \vec{c} where \vec{y} \in \mathbb{R}^{m} is unbounded.

Let z > z^{*}. Construct modified matrix A' \in \mathbb{R}^{(m + 1) \times n}, vector \vec{b}' \in \mathbb{R}^{m + 1} as follows

A' = \begin{bmatrix}A \\\vec{c}^{T}\end{bmatrix}

\vec{b}' = \begin{bmatrix}\vec{b} \\z\end{bmatrix}

Since \vec{c} \cdot \vec{x} \leq z^{*} < z for every feasible \vec{x} \geq 0 (i.e., every \vec{x} \geq 0 with A \vec{x} = \vec{b}), case (1) in Farkas’ lemma cannot hold for (A', \vec{b}'). As such, there exists vector \vec{w} = \big[\vec{y}, \alpha\big] \in \mathbb{R}^{m + 1} such that

\vec{b}' \cdot \vec{w} < 0 and A'^{T} \vec{w} \geq 0

\vec{b} \cdot \vec{y} + \alpha z < 0 and A^{T} \vec{y} + \alpha \vec{c} \geq 0

where \vec{y} \in \mathbb{R}^{m} and \alpha \in \mathbb{R}. Since \vec{x}^{*} \geq 0,

\vec{x}^{*} \cdot (A^{T} \vec{y}) + \alpha \vec{x}^{*} \cdot \vec{c} \geq \vec{x}^{*} \cdot 0

\vec{b} \cdot \vec{y} + \alpha z^{*} \geq 0 \qquad (\text{using } A \vec{x}^{*} = \vec{b} \text{ and } \vec{c} \cdot \vec{x}^{*} = z^{*})

Now, subtracting \vec{b} \cdot \vec{y} + \alpha z < 0 from \vec{b} \cdot \vec{y} + \alpha z^{*} \geq 0 gives \alpha (z - z^{*}) < 0; since z > z^{*}, we have \alpha < 0. Dividing both \vec{b} \cdot \vec{y} + \alpha z < 0 and A^{T} \vec{y} + \alpha \vec{c} \geq 0 by -\alpha yields

    \vec{b} \cdot \frac{\vec{y}}{-\alpha} < z and A^{T} \frac{\vec{y}}{-\alpha} \geq \vec{c}

Thus, z > \vec{b} \cdot \frac{\vec{y}}{-\alpha} \geq \min\{\vec{b} \cdot \vec{y}\}. By weak duality,

z > \min\{\vec{b} \cdot \vec{y}\} \geq \max\{\vec{c} \cdot \vec{x}\} = z^{*}

Therefore, setting z arbitrarily close to z^{*} implies that \min\{\vec{b} \cdot \vec{y}\} = z^{*} = \max\{\vec{c} \cdot \vec{x}\}

Solving Linear Programs

Let P be primal program that is feasible and bounded with objective \max\{ \vec{c} \cdot \vec{x} \} and constraints A \vec{x} \leq \vec{b} and \vec{x} \geq 0. Consider row vectors in A

A = \begin{bmatrix}\vec{a}_{1} \\\vec{a}_{2} \\\dots \\\vec{a}_{m}\end{bmatrix}

Note that \vec{a}_{j} \cdot \vec{x} \leq b_{j} is a half-space in \mathbb{R}^{n}, and the feasible region for P is the intersection of these half-spaces. Let F denote this feasible region, which is, by assumption, bounded and nonempty.

Clearly, \vec{a}_{j} \cdot \vec{x} = b_{j} is a hyperplane in \mathbb{R}^{n}. A vertex v \in \mathbb{R}^{n} is the intersection of n hyperplanes with linearly independent normal vectors \vec{a}_{j}. We will also note here that the feasible region may be expressed as the convex hull of these special points. Since the objective function is linear, some optimal solution must occur at one of these vertices.

We proceed with an example to demonstrate an approach to solving LPs known as the simplex method.

Simplex Method Example

objective:

\max\{ 5x_1 + 4x_2 + 3x_3 \}

constraints:

\begin{aligned}2x_1 + 3x_2 + x_3 &\leq 5\\4x_1 + x_2 + 2x_3 &\leq 11\\3x_1 + 4x_2 + 2x_3 &\leq 8\end{aligned}

Let us convert these inequality constraints to equalities using some ‘slack’ variables as follows

objective:

\max\{ 5x_1 + 4x_2 + 3x_3 \}

constraints:

\begin{aligned}2x_1 + 3x_2 + x_3 + x_4 &= 5\\4x_1 + x_2 + 2x_3 + x_5 &= 11\\3x_1 + 4x_2 + 2x_3 + x_6 &= 8\end{aligned}

We begin with the feasible solution (0, 0, 0, 5, 11, 8), and rewrite the nonzero (basic) variables x_4, x_5, x_6 in terms of the zero (nonbasic) variables x_1, x_2, x_3

  1. objective:

    z = 5x_1 + 4x_2 + 3x_3 = 0

    constraints:

    \begin{aligned}x_4 &= 5 - 2x_1 - 3x_2 - x_3 \\x_5 &= 11 - 4x_1 - x_2 - 2x_3\\x_6 &= 8 - 3x_1 - 4x_2 - 2x_3\end{aligned}

    Now we try to increase x_1. Note that x_1 \leq \min\{\frac{5}{2}, \frac{11}{4}, \frac{8}{3}\} = \frac{5}{2}. Setting x_1 = \frac{5}{2} produces solution (\frac{5}{2}, 0, 0, 0, 1, \frac{1}{2})

  2. objective:

    z = \frac{25}{2} -\frac{7}{2} x_2 + \frac{1}{2} x_3 - \frac{5}{2} x_4 = \frac{25}{2}

    constraints:

    \begin{aligned}x_1 &= \frac{5}{2} - \frac{3}{2} x_2 - \frac{1}{2} x_3 - \frac{1}{2} x_4  \\x_5 &= 1 + 5 x_2 + 2 x_4\\x_6 &= \frac{1}{2} + \frac{1}{2} x_2 - \frac{1}{2} x_3 + \frac{3}{2} x_4\end{aligned}

    Next we try to increase x_3 since it has a positive coefficient in the objective function. Note that x_3 \leq \min\{5, 1\} =1. Setting x_3 = 1 produces solution (2, 0, 1, 0, 1, 0)

  3. objective:

    z =13 - 3x_2 - x_4 - x_6

    constraints:

    \begin{aligned}x_1 &= 2 - 2x_2 - 2x_4 + x_6  \\x_3 &= 1 + x_2 + 3x_4 - 2x_6  \\x_5 &= 1 + 5 x_2 + 2 x_4\end{aligned}

    Note that no improvements can be made to objective function, and so solution (2, 0, 1, 0, 1, 0) is optimal vector with value 13.
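
As a sanity check (not part of the original notes), the same LP can be handed to an off-the-shelf solver; scipy's linprog minimizes, so we negate the objective:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([5, 4, 3])                  # maximize 5x_1 + 4x_2 + 3x_3
A = np.array([[2, 3, 1],
              [4, 1, 2],
              [3, 4, 2]])
b = np.array([5, 11, 8])

res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 3)
print(res.x, -res.fun)                   # approximately [2, 0, 1] and 13
```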


Sometimes the initial solution cannot be found by setting the original variables to zero. Consider the following constraints:

\begin{aligned}-x_1-x_2 &\le -1\\-2x_1+x_2 &\le -1\\x_1,x_2 &\ge 0\end{aligned}

After adding slack variables, they become

\begin{aligned}-x_1-x_2+x_3 &= -1\\-2x_1+x_2+x_4 &= -1\\x_1,x_2,x_3,x_4 &\ge 0\end{aligned}

However we cannot set x_1=x_2=0 because x_3, x_4 should be nonnegative. So we first multiply each equation by -1 so that the right-hand sides become nonnegative, and then add two artificial variables y_1,y_2 to construct another LP: \min y_1+y_2 s.t.

\begin{aligned}x_1+x_2-x_3+y_1 &= 1\\2x_1-x_2-x_4+y_2 &= 1\\x_1,x_2,x_3,x_4,y_1,y_2 &\ge 0\end{aligned}

Now x_1 = \dots = x_4 = 0, y_1 = y_2 = 1 is a feasible starting point for this auxiliary LP. If the original LP is feasible, then \min y_1+y_2=0 and the optimal solution of the auxiliary LP gives an initial feasible solution. If \min y_1+y_2>0, then the original LP is not feasible. Now we proceed to formalize the simplex method.

Simplex Method 

Given matrix A \in \mathbb{R}^{m \times n}, constraint vector \vec{b} \in \mathbb{R}^{m}, and objective vector \vec{c} \in \mathbb{R}^{n},

  1. Augment constraints by adding m slack variables to produce equalities, i.e. for constraint i from 1 to m

    \sum_{j = 1}^{n} a_{ij} x_j + x_{n+i} = b_i

    where x_j \geq 0 for j from 1 to n+m. Also, rewrite objective function as follows,

    z = \sum_{j = 1}^{n+m} c_{j} x_j

    where c_j = 0 for j > n

  2. Find initial feasible solution \vec{x}^{0}, using method specified above if necessary, and rewrite basic variables in terms of nonbasic variables
  3. While there exists some positive objective coefficient, select some variable x_j where c_j > 0
    1. Using the constraints, increase the selected (entering) variable until some basic variable becomes 0. This variable is known as the leaving variable
    2. Rewrite the constraints so that the basic variables are expressed in terms of the nonbasic variables, and substitute for the entering variable in the objective function using its defining constraint
  4. Output solution vector \vec{x}^{*}
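
Below is a minimal tableau implementation of the procedure just described, as a sketch: it assumes \vec{b} \geq 0 (so the slack variables give an initial feasible basis) and ignores degeneracy and anti-cycling. On the example of the previous section it returns (2, 0, 1) with value 13.

```python
import numpy as np

def simplex(A, b, c):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0."""
    m, n = A.shape
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n], T[:m, n:n + m], T[:m, -1] = A, np.eye(m), b
    T[-1, :n] = -c                          # objective row
    basis = list(range(n, n + m))           # slack variables form the initial basis

    while True:
        j = next((j for j in range(n + m) if T[-1, j] < -1e-9), None)
        if j is None:                        # no positive objective coefficient left
            break
        ratios = [T[i, -1] / T[i, j] if T[i, j] > 1e-9 else np.inf for i in range(m)]
        i = int(np.argmin(ratios))           # leaving variable via the ratio test
        if ratios[i] == np.inf:
            raise ValueError("LP is unbounded")
        T[i] /= T[i, j]                      # pivot: rewrite constraints
        for r in range(m + 1):
            if r != i:
                T[r] -= T[r, j] * T[i]
        basis[i] = j

    x = np.zeros(n + m)
    for i, j in enumerate(basis):
        x[j] = T[i, -1]
    return x[:n], T[-1, -1]                  # solution and optimal objective value
```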

Now we prove that once simplex terminates, \vec{x}^{*} is the vector that maximizes our objective function \vec{c} \cdot \vec{x}. Note that when simplex terminates objective function looks like

z = z^{*} + \sum_{j=1}^{n+m} \overline{c}_{j} x_{j} = \sum_{j=1}^{n} c_{j} x_{j}

where \overline{c}_{j} \leq 0. Consider dual program given as

objective:

\min\{ \vec{b} \cdot \vec{y} \}

constraints:

A^{T} \vec{y} \geq \vec{c} and \vec{y} \geq 0

Define vector \vec{y}^{*} \in \mathbb{R}^{m} with entries y_{i}^{*} = -\overline{c}_{n+i} for i from 1 to m. Clearly, \vec{y}^{*} \geq 0 since \overline{c}_{n+i} \leq 0. We still need to demonstrate A^{T} \vec{y}^{*} \geq \vec{c} and \vec{b} \cdot \vec{y}^{*} = z^{*}.

  1. \vec{b} \cdot \vec{y}^{*} = z^{*}. Note that

    \begin{aligned}z &= z^{*} + \sum_{j=1}^{n} \overline{c}_{j} x_{j} + \sum_{i=1}^{m} \overline{c}_{n+i} x_{n+i} \\&= z^{*} + \sum_{j=1}^{n} \overline{c}_{j} x_{j} - \sum_{i=1}^{m} y_{i}^{*} x_{n+i}\\&= z^{*} + \sum_{j=1}^{n} \overline{c}_{j} x_{j} - \sum_{i=1}^{m} y_{i}^{*} \bigg(b_i - \sum_{j=1}^{n} a_{ij}x_{j}\bigg)\\&= z^{*} - \sum_{i=1}^{m} y_{i}^{*} b_{i} + \sum_{j=1}^{n}\big(\overline{c}_{j} + \sum_{i=1}^{m} a_{ij} y_{i}^{*}\big) x_{j}\\\end{aligned}

    Setting x_{1}, \dots, x_{n} = 0 yields z = 0, and so z^{*} = \vec{b} \cdot \vec{y}^{*}

  2. A^{T} \vec{y}^{*} \geq \vec{c}. Since z^{*} = \vec{b} \cdot \vec{y}^{*},

    \begin{aligned}z &= \sum_{j=1}^{n} c_{j} x_{j} = \sum_{j=1}^{n}\big(\overline{c}_{j} + \sum_{i=1}^{m} a_{ij} y_{i}^{*}\big) x_{j}\\\end{aligned}

    Since c_{j} = \overline{c}_{j} + \sum_{i=1}^{m} a_{ij} y_{i}^{*} and \overline{c}_{j} \leq 0,

    \begin{aligned}c_{j} &= \overline{c}_{j} + \sum_{i=1}^{m} a_{ij} y_{i}^{*} \\c_{j} &\leq \sum_{i=1}^{m} a_{ij} y_{i}^{*} \\\end{aligned}

    And so, \vec{c} \leq A^{T} \vec{y}^{*}

Linear Programming I

Many optimization problems may be formulated as a linear program. Given matrix A \in \mathbb{R}^{m \times n} and vectors \vec{b} \in \mathbb{R}^{m}, \vec{c} \in \mathbb{R}^{n}, we seek some vector \vec{x} \in \mathbb{R}^{n} that satisfies

A \vec{x} \leq \vec{b}

and maximizes the objective

\vec{c} \cdot \vec{x}

where the elements of \vec{x} \geq 0 are nonnegative. We call \vec{c} \cdot \vec{x} the objective function, and A \vec{x} \leq \vec{b} are known as the constraints.

Flow Networks as Linear Programs

Pushing flow through a network reduces to linear programming!

objective:

\max \{ \sum_{(s, u) \in E} f_{su}\}

capacity constraints:

\forall_{e \in E} 0 \leq f_{e} \leq c_{e}

flow preservation:

\forall_{v \neq s, t} \sum_{(u, v) \in E} f_{uv} = \sum_{(v, w) \in E} f_{vw}
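
A toy instance solved as an LP (the graph and capacities below are made up for illustration), using scipy's linprog with the negated objective:

```python
from scipy.optimize import linprog

# Edges of a small network with source s and sink t, in this fixed order.
edges = [('s', 'a'), ('s', 'b'), ('a', 'b'), ('a', 't'), ('b', 't')]
cap   = [3, 2, 1, 2, 3]

c = [-1, -1, 0, 0, 0]            # maximize flow out of s  <=>  minimize its negation

# Flow preservation at the internal vertices a and b: inflow - outflow = 0.
A_eq = [[1, 0, -1, -1, 0],       # vertex a: f_sa - f_ab - f_at = 0
        [0, 1,  1,  0, -1]]      # vertex b: f_sb + f_ab - f_bt = 0
b_eq = [0, 0]

bounds = [(0, u) for u in cap]   # capacity constraints 0 <= f_e <= c_e

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(-res.fun)                  # maximum flow value; 5 for this toy network
```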

Decision Problem Formulation

Note that linear programs can be formulated as a decision problem. Given some t \in \mathbb{R}, does there exist some \vec{x} \in \mathbb{R}^{n} such that

\vec{c} \cdot \vec{x} \geq t    and   A \vec{x} \leq \vec{b}

If YES, then simply provide the vector \vec{x} to be verified. If NO, we need some polynomial certificate to verify there exists no such vector. This is where the notion of a ‘dual’ program comes in. But first, perhaps an example is appropriate.

 

Diet Problem

Suppose you are given the following meal choices: burger at $9, yogurt at $6, and ramen at $2. You also want to satisfy some dietary needs: you require at least 180 calories of protein, 100 calories of carbohydrates, and 20 calories of fat.

\begin{tabular}{ c c c c }& protein & carbohydrates & fat \\burger & 40 & 10 & 8 \\yogurt & 15 & 20 & 5 \\ramen & 5 & 30 & 15\end{tabular}

Given the above caloric breakdown of each meal, how can you satisfy your requirements while spending as little as possible? Let us formulate our problem as a linear program. Let x, y, z be the units of burgers, yogurts, and ramen we purchase. Note that we have the following constraints

40x + 15y + 5z \geq 180

10x + 20y + 30z \geq 100

8x + 5y + 15z \geq 20

where we want to minimize our objective 9x + 6y + 2z. Note that this is a minimization linear program, and to convert it to standard form, all we need to do is negate the objective and constraints, which yields the following linear program

objective:

\max \{ -9x - 6y - 2z\}

constraints:

-40x - 15y - 5z \leq -180

-10x - 20y - 30z \leq -100

-8x - 5y - 15z \leq -20
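
Equivalently, the original min/\geq formulation can be fed to a solver after exactly this negation; a hedged sketch with scipy's linprog:

```python
from scipy.optimize import linprog

cost = [9, 6, 2]                          # burger, yogurt, ramen prices
A_ge = [[40, 15, 5],                      # protein per unit of each meal
        [10, 20, 30],                     # carbohydrates
        [8, 5, 15]]                       # fat
b_ge = [180, 100, 20]

# linprog takes "<=" constraints, so negate the ">=" rows, as above.
A_ub = [[-a for a in row] for row in A_ge]
b_ub = [-b for b in b_ge]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, res.fun)                     # cheapest meal plan and its cost
```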

Another Problem

Consider the following linear program

objective:

\max \{ 4x_1 + x_2 + 5x_3 + 3x_4\}

constraints:

(1) x_1 - x_2 - x_3 + 3x_4 \leq 1

(2) 5x_1 + x_2 + 3x_3 + 8x_4 \leq 55

(3) -x_1 + 2x_2 + 3x_3 - 5x_4 \leq 3

Let us assign variable y_{j} to constraint inequality (j), ensuring y_{j} \geq 0. Note that

\begin{aligned}y_1 (x_1 - x_2 - x_3 + 3x_4) &\leq y_1\\ y_2 (5x_1 + x_2 + 3x_3 + 8x_4) &\leq 55y_2\\ y_3 (-x_1 + 2x_2 + 3x_3 - 5x_4) &\leq 3y_3\end{aligned}

Adding each inequality together and grouping by x_{i} yields

\begin{aligned}&x_1 (y_1 + 5y_2 - y_3) \\ &+ x_2 (-y_1 + y_2 + 2y_3) \\ &+ x_3 (-y_1 + 3y_2 + 3 y_3) \\ &+ x_4 (3y_1 + 8y_2 - 5y_3) \leq y_1 + 55 y_2 + 3 y_3\end{aligned}

For each x_i, constrain its coefficient above to be greater than or equal to c_i, yielding the following constraints

y_1 + 5y_2 - y_3 \geq 4

-y_1 + y_2 + 2y_3 \geq 1

-y_1 + 3y_2 + 3 y_3 \geq 5

3y_1 + 8y_2 - 5y_3 \geq 3

These constraints ensure that \max \{ 4x_1 + x_2 + 5x_3 + 3x_4 \} \leq \min \{ y_1 + 55 y_2 + 3 y_3 \}. As such, the dual program for the original linear program has objective y_1 + 55 y_2 + 3y_3, which we seek to minimize, and the constraints given above.

 

Dual Programs

Let A \in \mathbb{R}^{m \times n} be a matrix and \vec{b} \in \mathbb{R}^{m}, \vec{c} \in \mathbb{R}^{n} be vectors.

Given linear program with following objective and constraints

objective:

\max \{ c_1x_1 + c_2x_2 + \dots + c_n x_n\}

constraint i:

a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in} x_n \leq b_{i}

Dual program is given as follows

objective:

\min \{ b_1y_1 + b_2y_2 + \dots + b_m y_m\}

constraint j:

a_{1j}y_1 + a_{2j}y_2 + \dots + a_{mj} y_m \geq c_{j}

where i \in [1, m] and j \in [1, n]. Weak duality states

\max \{ \vec{c} \cdot \vec{x} | A \vec{x} \leq \vec{b}, \vec{x} \geq 0\} \leq \min \{ \vec{b} \cdot \vec{y} | A^{T} \vec{y} \geq \vec{c}, \vec{y} \geq 0\} 
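
In matrix form, passing from the primal to the dual is just a transpose plus swapping the roles of \vec{b} and \vec{c}; a trivial helper (illustrative only):

```python
import numpy as np

def dual_data(A, b, c):
    """Primal: max{c.x : A x <= b, x >= 0}.
    Dual:   min{b.y : A^T y >= c, y >= 0}.
    Returns the dual objective vector, constraint matrix, and right-hand side,
    with the dual constraints read as '>='."""
    A = np.asarray(A)
    return np.asarray(b), A.T, np.asarray(c)
```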

 

Linear Programing Duality Theorem

Given primal program P, dual program D, there exist four possibilities

  1. P is infeasible and D is unbounded
  2. P is unbounded and D is infeasible
  3. P, D are both infeasible
  4. P, D are both feasible

A linear program is infeasible if no solution satisfies all constraints, and unbounded if the objective value can be made arbitrarily good (arbitrarily large for a maximization, arbitrarily small for a minimization). We shall demonstrate that in case (4), P, D have the same optimal value. We will need the following lemma to do this

 

Farkas’ Lemma

For any matrix A \in \mathbb{R}^{m \times n}, vector \vec{b} \in \mathbb{R}^{m}, exactly one of the following holds

  1. There exists \vec{x} \in \mathbb{R}^{n} such that

\vec{x} \geq 0 and A \vec{x} = \vec{b}

  2. There exists \vec{y} \in \mathbb{R}^{m} such that

\vec{b} \cdot \vec{y} < 0 and A^{T} \vec{y} \geq 0

General Graph Matching

Tutte’s Theorem: a graph G = (V, E) has a perfect matching M if and only if for all subsets X \subseteq V,

|odd(G \setminus X)| \le |X|

where |odd(G \setminus X)| denotes the number of odd components (connected components with an odd number of vertices) in the subgraph G \setminus X.

To demonstrate the forward direction, assume towards contradiction that G has a perfect matching M and there exists a subset X \subseteq V such that |odd(G \setminus X)| > |X|. In a perfect matching, each odd component of G \setminus X must have at least one vertex matched to a vertex of X, since a component of odd size cannot be perfectly matched within itself. Moreover, no two distinct odd components can be matched to the same vertex in X, since the edges constitute a matching. Since |odd(G \setminus X)| > |X|, some odd component has no matching edge to a vertex in X, which yields a contradiction.

The other direction can be proved by first proving the correctness of Edmonds’ algorithm, which computes maximum matchings for general graphs.

Edmonds’ Algorithm:

For a matching M, a blossom is an odd cycle of length 2k+1 with k edges from M.

Lemma: Let B be a blossom of G such that B is disjoint from the rest of M. Let G' be the graph obtained from G by contracting the blossom B to a single vertex and let M' be the matching M restricted to G'. Then, G has an augmenting path iff G' has an augmenting path.

Proof: Clearly, if no augmenting path meets B, then the statement is trivial in both directions. Suppose we have an augmenting path in G' that ends at the blossom vertex (notice that the blossom vertex is unmatched in M'). Since the blossom is an odd cycle, the path can be extended inside the blossom, going around the cycle in the appropriate direction, to reach the unmatched vertex of the blossom; this gives an augmenting path in G. Similarly, if we have an augmenting path in G that meets the blossom at some vertex, then we get a corresponding augmenting path in G' that ends at the blossom vertex.

With this idea, it is possible to modify the augmenting path algorithm from bipartite graphs to work for general graphs. Let the vertices on the even levels be called outer vertices and the vertices on the odd levels be called inner vertices.

  • Let F be forest with roots initialized as all unmatched vertices in G
  • For every outer x \in F and y \notin F with (x,y) \in E, add (x,y) and (y,z) to F where (y,z) is a matching edge.
  • For outer x,y from different components of F, if (x,y) \in E, then we found an augmenting path, so output it and exit
  • For outer x,y from the same component and (x,y) \in E
    • Identify blossom (Least Common Ancestor of x,y)
    • Shrink the blossom, and swap the matching and non-matching edges from the blossom vertex to the root of its component.
  • Repeat until either an augmenting path is found or no more new vertices can be added.

Time Complexity: The running time is O(m n^2): the size of the matching increases by 1 every time we find an augmenting path, so there are at most \frac{n}{2} augmentations, and each search for an augmenting path takes O(mn) time, since a single graph search takes O(m) steps and may be restarted after each of at most \frac{n}{2} blossom contractions.
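
For experimentation (not part of the notes), networkx ships a blossom-based matching routine for general graphs; a hedged usage sketch on the three-edge path from the earlier matching example:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4)])   # a path with three edges

# maxcardinality=True asks for a maximum-cardinality matching (all weights default to 1).
M = nx.max_weight_matching(G, maxcardinality=True)
print(M)                                     # a matching of size 2, e.g. {(1, 2), (3, 4)}
```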

Correctness: 

We have proved (in a previous lecture) that a matching M is maximum if and only if there is no augmenting path with respect to M. Suppose that Edmonds’ algorithm does not find an augmenting path. Does this mean there are no augmenting paths? Is Edmonds’ algorithm guaranteed to find an augmenting path in a graph if it exists?

First, observe that there are no edges between outer vertices:

  1. If there is an edge between outer vertices in the same component of the forest, then this implies a blossom.
  2. If there is an edge between outer vertices in different components, then this implies an augmenting path.

Claim: If a subset X \subseteq V leaves k odd components in G \backslash X, then the number of unmatched vertices in any matching is at least k-|X|.

Proof: In each odd component there is at least one vertex that cannot be matched within its own component; such a vertex is either unmatched or matched to a vertex in X. Since each vertex of X can be matched to at most one such vertex, at least k - |X| of them remain unmatched.

Proof (Edmonds’ algorithm): Take X to be the set of all inner vertices. Since there are no edges between outer vertices, removing X leaves the outer vertices as isolated (odd) components, so the number of odd components is at least the number of outer vertices. Thus, by the claim, the number of unmatched vertices in any matching is at least #outer − #inner. Note that #outer − #inner is equal to the number of components in the forest found by Edmonds’ algorithm, which is also equal to the number of unmatched vertices in the current matching. Thus, if Edmonds’ algorithm does not find an augmenting path, then the matching is maximum!

Corollary (Tutte’s Theorem) G has a perfect matching iff for all X \subseteq V, |X| \ge |odd(G\backslash X)|.

 

Linear Equations I

Given an n \times n integer matrix A \in \mathbb{Z}^{n \times n} and an n-dimensional vector \vec{b} \in \mathbb{Z}^{n}, output a vector \vec{x} \in \mathbb{Q}^{n} which satisfies

A \vec{x} = \vec{b}

if such vector exists. Assume integer entries a_{ij} \in A, b_{k} \in \vec{b} may be represented using at most L bits for i, j, k from 1 to n. Note that this implies input size is O(n^2 L).

Gaussian Elimination

RowReduce(A):

h = 1, k = 1 (initialize pivot row, column)

while h \leq m and k \leq n

    i^{*} = argmax_{i \in [h, m]}(|A[i][k]|) (find k-th pivot)

    if A[i^{*}][k] = 0

        k \gets k + 1 (no pivot in this column, skip)
    else 
        
       swap rows h, i^{*}

        for i from h + 1 to m

            f = A[i][k] / A[h][k]

            A[i][k] \gets 0

            for j from k + 1 to n

                A[i][j] \gets A[i][j] - f \cdot A[h][j]

        h \gets h + 1, k \gets k + 1

Return A

 

GE(A, b):

    A' \gets [A, b] (augmented matrix)

    A' \gets RowReduce(A') (row-echelon form)

    Return \vec{x} \gets BackSub(A') (back substitution)

 

Consider following example

A=\begin{bmatrix}3&5&1\\ 1&2&4\\ -1&0&6\end{bmatrix}            \vec{b} =\begin{bmatrix}0\\ 1\\ 2\end{bmatrix}

Row reduction for augmented matrix given as

[A, b] = \begin{bmatrix}3&5&1&0\\ 1&2&4&1\\ -1&0&6&2\end{bmatrix} \sim \begin{bmatrix}1&\frac{5}{3}&\frac{1}{3}&0\\ 0&\frac{1}{3}&\frac{11}{3}&1\\ 0&\frac{5}{3}&\frac{19}{3}&2\end{bmatrix} \sim  \begin{bmatrix}1&\frac{5}{3}&\frac{1}{3}&0\\ 0&1&11&3\\ 0&0&-12&-3\end{bmatrix} \sim \begin{bmatrix}1&\frac{5}{3}&\frac{1}{3}&0\\ 0&1&11&3\\ 0&0&1&\frac{1}{4}\end{bmatrix}

Back substitution yields

[A, b] \sim \begin{bmatrix}1&\frac{5}{3}&\frac{1}{3}&0\\ 0&1&11&3\\ 0&0&1&\frac{1}{4}\end{bmatrix} \sim \begin{bmatrix}1&\frac{5}{3}&0&-\frac{1}{12}\\ 0&1&0&\frac{1}{4}\\ 0&0&1&\frac{1}{4}\end{bmatrix} \sim \begin{bmatrix}1&0&0&-\frac{1}{2}\\ 0&1&0&\frac{1}{4}\\ 0&0&1&\frac{1}{4}\end{bmatrix} 

Thus, \vec{x} = [-\frac{1}{2}, \frac{1}{4}, \frac{1}{4}]
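
The same system solved numerically (LU factorization with partial pivoting is essentially the row reduction above):

```python
import numpy as np

A = np.array([[3., 5., 1.],
              [1., 2., 4.],
              [-1., 0., 6.]])
b = np.array([0., 1., 2.])

x = np.linalg.solve(A, b)   # elimination + back substitution, as above
print(x)                    # approximately [-0.5, 0.25, 0.25]
```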

Gaussian Elimination Complexity

Note that row reduction on k-th iteration requires at most O(n^2) elementary operations, and so time complexity is O(n^3). Since back substitution  requires O(n^2) elementary steps, gaussian elimination takes O(n^3) operations. Yet, what of the bit complexity of the operands that undergo these elementary operations?

On first iteration, entries may have bit size 2L. On second, size 4L. As such, on k-th iteration, entries may have size 2^{k} L. Thus, space complexity for storing and reading such iterations during gaussian elimination becomes exponential in input size. This is problematic. How can we fix this?

 

Modified Gaussian Elimination

ModifiedRowReduce(A):

    k = 1 (initialize pivot column)

    A^{0} = A (initialize matrix)

    A^{0}[0][0] = 1 (sentinel entry, used as the first divisor below)

    while k \leq \min\{n, m\}

        i^{*} = argmax_{i \in [k, m]}(|A^{k-1}[i][k]|) (find k-th pivot)

        if A^{k-1}[i^{*}][k] = 0
            
            A^{k} \gets A^{k-1}
            k \gets k + 1 (no pivot in this column, skip)

        else

           swap rows k, i^{*}

            for i from 1 to k
                
               for j from 1 to n
                    
                   A^{k}[i][j] \gets A^{k-1}[i][j]

            
            for i from k + 1 to m

                A^{k}[i][k] \gets 0

                for j from k + 1 to n

                    A^{k}[i][j] \gets \frac{A^{k-1}[k][k] A^{k-1}[i][j] - A^{k-1}[i][k] A^{k-1}[k][j]}{A^{k-1}[k-1][k-1]}

            k \gets k + 1


Return A^{k}

 

As per previous example,

A^{0} =\begin{bmatrix}3&5&1\\ 1&2&4\\ -1&0&6\end{bmatrix}

A^{1} = \begin{bmatrix}3&5&1\\ 0&3 \cdot 2 - 1 \cdot 5&3 \cdot 4 - 1 \cdot 1\\ 0&3 \cdot 0 - (-1) \cdot 5 &3 \cdot 6 - (-1) \cdot 1\end{bmatrix} = \begin{bmatrix}3&5&1\\ 0&1&11\\ 0&5&19\end{bmatrix}

A^{2} = \begin{bmatrix}3&5&1\\ 0&1&11\\ 0&0&\frac{1 \cdot 19 - 5 \cdot 11}{3}\end{bmatrix} = \begin{bmatrix}3&5&1\\ 0&1&11\\ 0&0&-12\end{bmatrix}
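
This fraction-free update is the Bareiss scheme; a small sketch with exact integer arithmetic (row swapping is omitted for brevity, so it assumes the leading pivots are nonzero):

```python
def fraction_free_eliminate(A):
    """Modified (fraction-free) row reduction on an integer matrix given as a list
    of lists. All intermediate entries remain integers, and the division below is
    exact (see Lemma 1)."""
    M = [row[:] for row in A]
    rows, cols = len(M), len(M[0])
    prev = 1                           # plays the role of A^{k-1}[k-1][k-1], with a_0 = 1
    for k in range(min(rows, cols)):
        for i in range(k + 1, rows):
            for j in range(k + 1, cols):
                M[i][j] = (M[k][k] * M[i][j] - M[i][k] * M[k][j]) // prev
            M[i][k] = 0
        prev = M[k][k]
    return M

print(fraction_free_eliminate([[3, 5, 1], [1, 2, 4], [-1, 0, 6]]))
# [[3, 5, 1], [0, 1, 11], [0, 0, -12]]  -- matches A^2 above
```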

Lemma 1

For i \in [1, m], j \in [1, n], the entry a^{k}_{ij} of matrix A^{k} is the determinant of the submatrix of A with rows [1, \dots, k, i] and columns [1, \dots, k, j]

a^{k}_{ij}  = \begin{vmatrix}a_{11}&a_{12}&\dots&a_{1k}&a_{1j}\\ a_{21}&a_{22}&\dots&a_{2k}&a_{2j}\\ \dots&\dots&\dots&\dots&\dots\\ a_{k1}&a_{k2}&\dots&a_{kk}&a_{kj}\\ a_{i1}&a_{i2}&\dots&a_{ik}&a_{ij}\end{vmatrix}

Since integer matrices have integer determinants, A^{k} has integer entries given an integer input matrix A \in \mathbb{Z}^{m \times n}. Given that each entry of A uses at most L bits, the determinant a^{k}_{ij} is a sum of at most n! terms, each a product of at most n entries, which implies

\langle a^{k}_{ij} \rangle = \log|a^{k}_{ij}| \leq \log(2^{nL} \cdot n!) \leq nL + n \log(n)

Thus, the bit complexity for representing a^{k}_{ij} is O(n(L + \log(n))).

Lemma 0

Before we can prove lemma 1, we need some results on determinants. Suppose matrix B \in \mathbb{R}^{n \times n} results from applying an elementary row operation to matrix A \in \mathbb{R}^{n \times n} as specified below

  • Swapping two rows in A

\text{det}(B) = - \text{det}(A)

  • Multiplying row in A by nonzero number c

\text{det}(B) = c \text{det}(A)

  • Adding multiple of one row to another row in A

\text{det}(B) = \text{det}(A)

Each of the above identities follows from the multiplicativity of the determinant, \text{det}(C \cdot D) = \text{det}(C) \cdot \text{det}(D), applied with the corresponding elementary matrix.

Proof of Lemma 1:

Define matrix B^{k} as submatrix of A^{k} with rows [1, \dots, k, i_{1}, \dots, i_{l}] and columns [1, \dots, k, j_{1}, \dots, j_{l}]

B^{k} = \begin{bmatrix}a_{11}&\dots&a_{1k}&a_{1j_{1}}&\dots&a_{1j_{l}}\\ a_{21}&\dots&a_{2k}&a_{2j_{1}}&\dots&a_{2j_{l}}\\ \dots&\dots&\dots&\dots&\dots&\dots\\ a_{k1}&\dots&a_{kk}&a_{kj_{1}}&\dots&a_{kj_{l}}\\ a_{i_{1}1}&\dots&a_{i_{1}k}&a_{i_{1}j_{1}}&\dots&a_{i_{1}j_{l}}\\ \dots&\dots&\dots&\dots&\dots&\dots\\ a_{i_{l}1}&\dots&a_{i_{l}k}&a_{i_{l}j_{1}}&\dots&a_{i_{l}j_{l}}\\ \end{bmatrix}

Each row i > k of the submatrix is multiplied by \frac{a^{k}_{kk}}{a^{k-1}_{k-1,k-1}} and then \frac{a^{k-1}_{ik}}{a^{k-1}_{k-1,k-1}} times the k-th row is subtracted from it. By Lemma 0 (scaling a row scales the determinant, while adding a multiple of another row leaves it unchanged),

\text{det}(B^{k}) = (\frac{a^{k}_{kk}}{a^{k-1}_{k-1,k-1}})^{l} det(B^{k-1})

Now take l = 1, so that i_{1} = i and j_{1} = j,

B^{k} = \begin{bmatrix}a_{11}&a_{12}&\dots&a_{1k}&a_{1j}\\ a_{21}&a_{22}&\dots&a_{2k}&a_{2j}\\ \dots&\dots&\dots&\dots&\dots\\ a_{k1}&a_{k2}&\dots&a_{kk}&a_{kj}\\ a_{i1}&a_{i2}&\dots&a_{ik}&a_{ij}\end{bmatrix}

For notational simplicity, let a_{j} = a^{j}_{jj} (with a_{0} = a^{0}_{00} = 1). Note that

\text{det}(B^{k}) = (\frac{a_{k}}{a_{k-1}}) \text{det}(B^{k-1}) = (\frac{a_{k}}{a_{k-1}}) (\frac{a_{k-1}}{a_{k-2}})^{2} \cdot \cdot \cdot (\frac{a_{1}}{a_{0}})^{k} \text{det}(B^{0})

(at step m, the k - m + 1 rows m+1, \dots, k, i of the submatrix are each scaled by \frac{a_m}{a_{m-1}}, which gives the exponent k - m + 1), and the product telescopes to

\text{det}(B^{k}) = a_{k} a_{k-1} \cdot \cdot \cdot a_{1} \text{det}(B^{0})

Since B^{k} is upper triangular, \text{det}(B^{k}) = a_{1} \cdot \cdot \cdot a_{k-1} a_{k} \cdot a^{k}_{ij}. Thus,

\text{det}(B^{k}) = a_{k} a_{k-1} \cdot \cdot \cdot a_{1} \text{det}(B^{0}) = a_{1} \cdot \cdot \cdot a_{k-1} a_{k} a^{k}_{ij}

\text{det}(B^{0}) = a^{k}_{ij} \text{    } \square

By Lemma 1, bit complexity for representing a^{k}_{ij} is O(n(L + \log(n))). Note that gaussian elimination with modified row reduction still requires O(n^{3}) arithmetic operations on entries with bit size O(n(L + \log(n))), and so overall complexity is O(n^{4}(L + \log(n))).

It has been demonstrated that gaussian elimination can be done in time O(n^{\omega} L) where n^{\omega} is matrix multiplication time.

 

Greedy Algorithms

(Algorithms chapter 5)

Greedy strategies take a very natural approach to solve a problem. Given the choices that we have made so far, make the choice that provides the most benefit going forward. We can think of a greedy algorithm as taking the choice that looks locally optimal. However, generally our goal is to find a global optimum. When do these locally optimal moves produce a global optimum? We start by seeing some examples.

Def 1. (Matching) Given a graph G=(V, E), a subset of edges M \subseteq E is a matching if every two edges in M have disjoint endpoints.

Maximum Matching Problem

Given a graph G=(V, E), find a matching of maximum cardinality.

A greedy approach:  Pick any available edge that extends the current matching.

Does the above algorithm always output a maximum matching of the graph? No. Consider the example below which is a path of length three. If the green edge is chosen as the first edge by the algorithm, the algorithm stops there. The optimal solution though is a matching of size 2 (red edges).

[Figure: a path with three edges; the greedy algorithm may pick only the middle (green) edge, while the optimal matching consists of the two outer (red) edges.]

Shortest Path Problem

Given a graph G=(V, E) with weights on the edges,  and vertices (s,t), find a path between s, t with minimum total weight.

A greedy approach: Start with s and at each step choose the neighbour that gives the shortest path to t.

Does the above algorithm always find a shortest path? No. Consider the example below. At the first step, the algorithm chooses the neighbour c because the distance from c to t is 6, versus the distance from a to t, which is 8. Therefore the greedy algorithm outputs the path (s, c, d, t). However, the shortest path is (s, a, b, t).

[Figure: a weighted graph in which the greedy choice yields the path (s, c, d, t), while the shortest path is (s, a, b, t).]

Minimum Spanning Tree (MST) Problem

The next example is a problem (computing the minimum spanning tree of a graph) where a greedy algorithm finds the optimal solution. Before getting into the problem, we need to introduce some definitions of a graph G=(V,E).

Def 2. (Connected Component) A connected component is a maximal subset of vertices M \subseteq V such that there is a path between any two vertices in M.

Def 3. (Spanning Subgraph) A subgraph that includes every vertex in V.

Def 4. (Tree) A tree is a connected graph with no cycles.

Claim 5: A minimum-weight spanning connected subgraph F\subseteq E (with positive edge weights) is a spanning tree. That is,

  1. F has no cycles
  2. F has |V|-1=n-1 edges; i.e. |F|=n-1.

Proof: (1) Suppose F had a cycle. Then, we can remove any edge on the cycle while maintaining connectivity between every pair of vertices. Therefore, there can be no cycles if F is minimal.

(2) We prove by induction that |F|=n-1. It’s clearly true for the base cases n=1,2. Now, for n>2, remove any edge in F. Then, F \backslash \{e\} has 2 components (otherwise F was not minimal), and each component has n_1, n_2<n vertices with n_1 + n_2 = n. By induction,

|F| = (n_1-1) + (n_2-1) + 1 = n-1.

Minimum Spanning Tree (MST) Problem. Given a graph G=(V, E) and weights on the edges, find a spanning tree of G with minimum total weight.

A greedy approach: One possible approach is to simply select the lightest weight edge. Locally, this seems like the best edge to select. However, there is a potential problem: what if the lightest weight edge creates a cycle? Well, then we simply ignore that edge and proceed to the next lightest edge. We have essentially described Kruskal’s Algorithm:

Kruskal’s Algorithm for solving MST:

KRUSKAL(V,E,w):

  • Sort edges in increasing order of weight; so have an ordering e_1, \ldots, e_m such that w(e_i) \le w(e_{i+1}).
  • Set T = \{\}
  • for i=1 to m
    • Add e_i to T if it doesn’t create a cycle.
  • Return T.
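
A sketch of Kruskal's algorithm in Python, using union-find with union by size — the same 'relabel the smaller component' idea used in the running-time discussion below:

```python
def kruskal(n, edges):
    """Vertices are 0..n-1; edges is a list of (weight, u, v) triples.
    Returns the total weight and the edges of a minimum spanning forest."""
    parent = list(range(n))
    size = [1] * n

    def find(x):                       # representative (label) of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False               # same component: the edge would create a cycle
        if size[rx] < size[ry]:        # relabel the smaller component
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]
        return True

    total, tree = 0, []
    for w, u, v in sorted(edges):      # edges in increasing order of weight
        if union(u, v):
            total += w
            tree.append((u, v, w))
    return total, tree
```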


Theorem 6 (Kruskal): Kruskal’s algorithm produces a minimum weight spanning tree of G=(V,E).

Before proving this theorem, we need to first introduce some definitions and lemmas.

Def 7. (Cut) A cut (S,\bar S) is the set of edges between a subset of vertices S \subseteq V and its complement \bar S = V \backslash S. That is,

(S,\bar S) = \{(u_i, v_j) \in E | u_i \in S, v_j \in \bar S\}.

Lemma 8. Fix any edge e whose weight is the minimum in some cut (S,\bar S). Then, there exists an MST T containing e.

Proof: Suppose e is not in an MST. Consider an MST T and add e to T. Then, e induces a cycle C (since T \cup \{e\} has two paths between endpoints of e).

Consider the cut (S,\bar S) for which e is a minimum weight edge. Then C must cross (S,\bar S) in some other edge f with weight(e) \le weight(f).

Set T' = T \cup \{e\} \backslash \{f\}. Note that weight(T') \le weight(T), which means T' is an MST (since it connects every pair of vertices and is minimum weight).

Lemma 9 (cut property). Let X be a subset of the edges of some MST of a graph G. For any edge e such that e is a minimum-weight edge of a cut (S, \bar{S}) and X \cap (S, \bar{S}) = \emptyset, there is an MST T s.t. X \cup \{e \} \subseteq T.

Proof. Take some MST T that contains X. Let (S, \bar{S}) be a cut in which e is a minimum-weight edge and assume that e \notin T (otherwise we are done). Add e to T. This creates a cycle, and at least one other edge e' of the cycle must cross (S, \bar S). Since e is a minimum-weight edge of the cut, weight(e) \le weight(e'), so T' = T \cup \{e\}\setminus \{e'\} is a spanning tree of weight at most that of T, hence also an MST. Moreover e' \notin X (since X \cap (S, \bar S) = \emptyset), so X \cup \{e\} \subseteq T', as desired.

Proof (of Kruskal Theorem): Using Lemma 9, we prove by induction that the set X of edges added so far is always contained in some MST. When Kruskal’s algorithm adds the next edge e = (u, v), let S be the vertex set of the component of u in the current forest X. No edge of X crosses the cut (S, \bar S), and every other edge of this cut has not yet been processed, so its weight is at least w(e); hence e is a minimum-weight edge of (S, \bar S). By Lemma 9, X \cup \{e\} is contained in some MST. Repeating this argument, the final output T is contained in an MST, and since T is a spanning tree it is itself an MST.

Running time of Kruskal’s algorithm: O(m \log n).

Sorting the edges once up front takes O(m \log n).

However, we also have to keep track of whether an added edge creates a cycle, i.e. when an edge connects two vertices in the same connected component. We can do this via a simple data structure.

For each vertex, introduce a component number, which is initially just the index of that vertex. When Kruskal’s algorithm adds an edge which connects two vertices with the same component number, we skip that edge because it creates a cycle. If the component numbers of the two vertices are different, then we relabel the vertices in the smaller component to the label of the larger component. Since we relabel the smaller component, the size of a vertex’s component at least doubles each time it is relabeled. Therefore a vertex’s label changes at most O( \log n) times. Thus the total cost of this data structure is O(n \log n).

Prim’s Algorithm:

Another algorithm to find the MST of a graph is Prim’s algorithm.

  • Start with any vertex, and mark it as visited.
  • Add the lightest weight edge from some visited vertex to an unvisited vertex; repeat until all vertices are visited.

As with Kruskal’s algorithm, its correctness follows from the cut property given above.

Running time of Prim’s algorithm: O(m \log n).
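
A heap-based sketch of Prim's algorithm (the adjacency-list format below is an assumption made for illustration):

```python
import heapq

def prim(adj):
    """adj maps each vertex to a list of (weight, neighbor) pairs of an undirected,
    connected graph. Returns the total weight and the edges of an MST."""
    start = next(iter(adj))
    visited = {start}
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    total, tree = 0, []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)        # lightest edge leaving the visited set
        if v in visited:
            continue
        visited.add(v)
        total += w
        tree.append((u, v, w))
        for w2, x in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (w2, v, x))
    return total, tree
```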

Matroids

Can we define a more general class of problems that could be solved by the above greedy algorithm?

Def 10. (Matroid) A matroid consists of a universe of elements U and a collection I of independent subsets of U with the following properties

  • A \subseteq B, B\in I \Rightarrow A \in I
  • |A|<|B|, A,B \in I \Rightarrow \exists e \in B \setminus A s.t. A\cup \{e\} \in I

A Greedy approach. Sort the elements of U in decreasing order of weight. Scan the list and add each element that preserves independence of the current set.

Theorem 11. The above greedy algorithm finds the maximum weight independent set.

Proof. We prove by contradiction. Let x_1, x_2, x_3, \ldots be the elements output by the greedy algorithm in descending weight order and let y_1, y_2, \ldots be the elements of a maximum-weight independent set in descending weight order. Suppose the total weight of the optimal set is higher than the total weight of the greedy solution, and let k be the first index such that w(x_k) < w(y_k). Then |\{x_1, \ldots, x_{k-1}\}| < |\{y_1, \ldots, y_k\}| and both sets are independent. By the matroid exchange property, \exists y_j \in \{ y_1, \ldots , y_k \} such that \{ x_1, \ldots, x_{k-1} \} \cup \{y_j\} is an independent set. But w(y_j) \geq w(y_k) > w(x_k), and this contradicts the fact that the greedy algorithm chose x_k rather than y_j at that step.

Observation 12. Forests form a matroid (the graphic matroid). Let’s define U = E and I as the collection of subsets of edges with no cycles (forests). It is simple to see that this satisfies the two matroid properties. Given this, Theorem 11 states that the greedy algorithm stated above finds the maximum-weight spanning forest. This can be used to solve the MST problem by simply defining the weights M_{ij} = W - w_{ij} for a sufficiently large constant W. Then finding an MST on a graph with weights w is equivalent to finding the maximum-weight spanning tree on the same graph with weights M.
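
The greedy algorithm of Theorem 11 can be written generically against an independence oracle; a sketch, assuming nonnegative weights:

```python
def matroid_greedy(elements, weight, is_independent):
    """Generic matroid greedy: scan elements in decreasing weight order and keep
    each one that preserves independence. With elements = edges and
    is_independent = "contains no cycle" (the graphic matroid), this is Kruskal's
    algorithm for a maximum-weight spanning forest."""
    chosen = []
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(chosen + [e]):
            chosen.append(e)
    return chosen
```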

Dynamic Programming

Divide & Conquer works when a problem can be solved by solving smaller problems of the same type.

In Dynamic Programming (DP), we typically have to define and solve a more general problem.

Shortest Path

Problem: Given a graph G = (V, E) with edge lengths l(e), find the shortest path from vertex s to t.

  • If all edges were of unit-length a breadth first search could solve the problem in O(|V|+|E|).
  • The previous solution can be extended to the weighted version by breaking edges into smaller unit edges, though the complexity of such a solution would be exponential in the size of the input! (An edge of length L can be represented in O(\log L) bits.)

Observation: A shortest s-t path P is also a shortest s-u path, for all vertices u on P.

More General Problem: Find shortest s-u paths, \forall u.

Algorithm (Dijkstra):

Idea: Maintain tentative distances T(u) to each vertex u.

  • T(s) = 0
  • T(u) = \infty, \forall u \ne s
  • S = \{ \}
  • while S \ne V:
    • Let j be the vertex not in S with smallest T(j)
    • S \leftarrow S \cup \{ j \}
    • Update T(u) \leftarrow \min \{ T(u), T(j) + l(j,u) \} for every neighbour u of j
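
A standard heap-based sketch of the procedure, extracting the unvisited vertex with the smallest tentative distance from a priority queue:

```python
import heapq

def dijkstra(adj, s):
    """adj maps each vertex to a list of (length, neighbor) pairs with nonnegative
    lengths. Returns a dict of shortest-path distances from s (the T(u) values)."""
    dist = {s: 0}
    heap = [(0, s)]
    done = set()                        # the set S in the notes
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for l, v in adj.get(u, []):
            if v not in done and d + l < dist.get(v, float('inf')):
                dist[v] = d + l         # tentative distance update
                heapq.heappush(heap, (dist[v], v))
    return dist
```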

Theorem: Dijkstra’s algorithm correctly finds shortest path from s to all vertices.

Either of the following lemmas immediately implies the theorem.

Lemma: At every step, for every vertex u \in S, T(u) is the length of the shortest s-u path, and the members of S are the |S| vertices nearest to s.

Proof: We prove this by induction on |S|. Let S_k denote the set after k iterations. The basis trivially holds for |S_1| = 1. Induction step (k to k+1): Let u be the (k+1)-th nearest vertex to s. The vertex just before u on a shortest s-u path (call it v) is nearer to the source, so by the induction hypothesis v \in S_k and T(v) is the distance of v from the source. When v was added to S, the algorithm updated its neighbour u, so T(u) \leq T(v) + l(v,u). On the other hand, at any point of the algorithm and for every vertex x, T(x) is at least the distance of x from the source, because every update corresponds to an actual path of that length from the source to x. Hence T(u) = T(v) + l(v,u), which equals the distance of u from the source. Since every other vertex x \in V \setminus S_k \setminus \{u\} satisfies T(x) \geq d(s,x) \geq d(s,u) \geq T(u), the vertex added to S at this step is u, its T-value equals its distance from the source, and it is the (k+1)-th nearest vertex to s, as required for S_{k+1}.

Alternatively we can prove the theorem using the following lemma.

Lemma: At every step, for each vertex u \in S, T(u) is the shortest path distance from s to u, and there is an s-u path of length T(u) that falls wholly inside S.

Proof: Similarly, we use induction on |S|. The basis holds for S_1 = \{s\}. Induction step: assuming the lemma for S_k, let u be the vertex of V \setminus S_k nearest to the source. A shortest path from s to u starts in S_k and ends outside of it, and by the choice of u all vertices on the path before u must be in S_k (otherwise an earlier vertex outside S_k would contradict the choice of u). Let v be the vertex just before u on that path. When v was added to S, the algorithm updated its neighbour u, so T(u) \leq T(v) + l(v,u). On the other hand, at any point of the algorithm and for every vertex x, T(x) is at least the distance of x from the source, because every update corresponds to an actual path of that length from the source to x. Hence T(u) = T(v) + l(v,u), which equals the distance of u from the source. Since every other vertex x \in V \setminus S_k \setminus \{u\} satisfies T(x) \geq d(s,x) \geq d(s,u) \geq T(u), the vertex added to S at this step is u; its T-value equals its distance from the source, and there exists a shortest path to it using only vertices in S_{k+1}, completing the induction.

Note: Using a data structure such as a heap, the running time of the above procedure is O( (|V|+|E|) \log (|V|)).

Note: We found the length of the shortest path to each vertex. To retrieve the shortest path itself, we can walk back from the destination, at each step moving to a neighbour v with T(v) + l(v,u) = T(u), or we can store a predecessor pointer for each vertex whenever its T() value is updated.

Longest Increasing Subsequence

Problem: Given a sequence of n numbers a_1, \ldots, a_n, find (the length of) its longest increasing subsequence (LIS).

Example: LIS for 1, 4, -1, 0, 6, 2, 4, 5 is -1, 0, 2, 4, 5 that is of length 5.

Note that the divide-and-conquer technique does not directly work for this problem.

Let’s solve a more general problem: Finding (length of) LIS ending at element a(i) for all indices i, name it s(i).

Recursion: s(i) = \max \{ 1+s(j) : j < i, a(j) \le a(i) \}, with s(i) = 1 if no such j exists.

Using this recursion we can compute all the s(i), and hence solve the LIS problem, in time O( n^2 ).
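
The recursion translates directly into an O(n^2) dynamic program; a short sketch:

```python
def lis_length(a):
    """O(n^2) DP: s[i] is the length of the LIS ending at a[i]."""
    n = len(a)
    s = [1] * n
    for i in range(n):
        for j in range(i):
            if a[j] <= a[i]:
                s[i] = max(s[i], s[j] + 1)
    return max(s) if s else 0

print(lis_length([1, 4, -1, 0, 6, 2, 4, 5]))   # 5, as in the example above
```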

Note: This problem has an O(n \log n) solution, using monotonicity of smallest tail of increasing subsequences of fixed lengths.

Matrix Chain Multiplication

Problem: Given k matrices M_1, \dots, M_k, want to compute M_1 \times \dots \times M_k most efficiently by choosing the order in which we perform multiplications.

For simplicity say for each multiplication we use the naive method, taking time O(m n p) to multiply an m \times n and an n \times p matrix, and we only want to find the minimum cost of computing the final matrix.

Let the i‘th matrix be of size m_i \times m_{i+1}.

Example: If sizes of matrices be (2 \times 6), (6 \times 10), (10 \times 6), multiplying the left pair first results in total cost of 2 \cdot 6 \cdot 10 + 2 \cdot 10 \cdot 6 while the other scenario (right first) costs 6 \cdot 10 \cdot 6 + 2 \cdot 6 \cdot 6.

More general problem: For every i \leq j, what is the (minimum) cost of multiplications from the i‘th matrix to the j‘th one, i.e., M_i, \dots, M_j, name this value C[i,j].

Observation:

  • C[i, i] = 0 and C[i, i+1] = m_i m_{i+1} m_{i+2}
  • C[i, j] = \min_{i \le k < j} \{ C[i, k] + C[k+1, j] + m_i m_{k+1} m_{j+1} \}

Analysis: Storing C[\cdot,\cdot] in a two dimensional array, we need to compute the values above the diagonal: O(k^2) values, each taking O(k) time, summing up to O(k^3) for the total computation.
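
A sketch of the DP (0-indexed), filling C in increasing order of j - i as the note below describes:

```python
def matrix_chain_cost(m):
    """The i-th matrix has size m[i] x m[i+1], for i = 0..k-1.
    Returns the minimum number of scalar multiplications."""
    k = len(m) - 1                      # number of matrices
    C = [[0] * k for _ in range(k)]
    for length in range(1, k):          # length = j - i
        for i in range(k - length):
            j = i + length
            C[i][j] = min(C[i][l] + C[l + 1][j] + m[i] * m[l + 1] * m[j + 1]
                          for l in range(i, j))
    return C[0][k - 1]

print(matrix_chain_cost([2, 6, 10, 6]))   # 240: multiply the left pair first
```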

Note: In DP we usually use smaller instances of the general problem to solve larger ones. While we compute the answer for an instance using others it is important that referred instances be previously solved and known. In last example an acceptable order of finding values C[i,j] is in increasing order of j-i, i.e., diagonals of 2-d array C moving up, starting at the main diagonal.