# DP Part 1

PDF of Eric’s handwritten notes are here.

Dynamic Programming

Here we study dynamic programming.  We’ll discuss a few examples so you see the methodology for designing a dynamic programming algorithm.

Problem 1: Longest Increasing Subsequence (LIS)

Input: $n$ numbers $a_1, \ldots, a_n$.

Output: The length of the LIS.

Example: If the input is $7,11,-5,-2,15,1,16,6,7,11,8,9,0$, then the longest increasing subsequence is $-5,-2,1,6,7,8,9$.

The first step in designing a dynamic programming algorithm is to define the subproblem.  We then attempt to define a recursive solution to the subproblem in terms of smaller subproblems.  If we are unable to do so then we go back and modify our definition of the subproblems, hopefully armed with some greater insight into the recursive nature of the problem.

Our first attempt at defining a subproblem is typically to consider exactly the same problem formulation on a prefix of the input.  Thus we consider the following.

Attempt 1:

Subproblem definition: Let $T(i)=$ length of LIS in $a_1, \ldots, a_i$.

Recursive step:  We want to express $T(i)$ in terms of $T(1),\dots,T(i-1)$.  But given $T(i-1)$ how do we know if we can append $a_i$ onto the optimal subsequence for $a_1,\dots,a_{i-1}$?  We need to know what is the last number in the solution for $T(i-1)$.  And if $a_i$ cannot be added on we may want to extend a suboptimal solution as this suboptimal solution may eventually surpass $T(i)$.  Thus we modify the subproblem definition to keep track of the last number in the subsequence.

Attempt 2:

Subproblem definition: Let $T(i)=$ length of LIS in $a_1, \ldots, a_i$ which includes $a_i$.

Recursive step:  $T(i) = 1 + \max_{j} \{ T(j) : 1 \le j < i \text{ and } a_j < a_i\}$.

Output: Return $\max_i T(i)$.

Pseudocode:

$LIS(a_1, \ldots, a_n)$

• for $i=1\rightarrow n$
• $T(i) = 1$
• for $j=1\rightarrow i-1$
• if $a_j < a_i$ then $T(i) = \max\{T(i),1+T(j)\}$
• $max = 1$
• for $i=2\rightarrow n$
• if $T(i) > T(max)$ then $max = i$
• Return $T(max)$.

Running time: $O(n^2)$. We have $n$ subproblems, and each subproblem depends on all the subproblems before it.

Note: Consider the definition of the subproblem. There is a crucial property here that we are enforcing $T(i)$ to include element $a_i$; this allows us to know when we can “extend” this subproblem. Since we know the last element of the subsequence is $a_i$, we can add any element with value $>a_i$.

Problem 2: Longest Common Subsequence

Given two strings $X = x_1 x_2 \dots x_n$ and $Y = y_1 y_2 \dots y_m$, find the length of the longest string that is a subsequence of both $X$ and $Y$.

Example

$X = BCDBCDA$
$Y = ABECBA$

It is easy to verify that the length of the longest common subsequence is 4 and the string is $BCBA$.

Dynamic Programming Solution

As in the previous example, we will be using dynamic programming to solve this problem.

The first step is to define the subproblem. As before, let us attempt to use prefixes.

For $0 \le i \le n$, $0 \le j \le m$, let:
$L(i,j)$ denote length of the longest common subsequence for the strings $x_1, \dots, x_i$ and $y_1, \dots, y_j$.

Next, let us proceed with the base cases:

$L(i,0) = 0, 0 \le i \le n$
$L(0,j) = 0, 0 \le j \le m$

Now, let us figure out the recurrence:

If characters $x_i$ and $y_j$ are different, then we may choose to exclude $x_i$ or $y_j$. If we exclude $x_i$, we would be left with the strings $x_1 x_2 \dots x_{i-1}$ and $y_1 y_2 \dots y_j$. What would be the length of the LCS for the strings $x_1 x_2 \dots x_{i-1}$ and $y_1 y_2 \dots y_j$? This is precisely the definition of $L(i-1, j)$. Similarly, if we leave out $y_j$, the length LCS of the strings $x_1 x_2 \dots x_i$ and $y_1 y_2 \dots y_{j-1}$ will be $L(i,j-1)$. So the length of the LCS of the strings $x_1 x_2 \dots x_i$ and $y_1 y_2 \dots y_j$ will be the larger of the two options. Therefore we have the following recurrence in the case that the last characters are different.

If $x_i \ne y_j$ then:
$L(i,j) = max \{L(i-1,j), L(i,j-1)\}$

If characters $x_i$ and $y_j$ match, then we may choose to match $x_i$ with $y_j$, and would be left with the strings $x_1 x_2 \dots x_{i-1}$ and $y_1 y_2 \dots y_{j-1}$. In this case, the length of the LCS will be $1 + L(i-1,j-1)$. We may also choose to exclude $x_i$ or $y_j$ as before. So the length of the LCS of the strings $x_1 x_2 \dots x_i$ and $y_1 y_2 \dots y_j$ will be the larger of the three options.

If $x_i = y_j$ then:
$L(i,j) = max\{1 + L(i-1,j-1), L(i-1,j), L(i, j-1)\}$
(In fact one can argue that in this case, $L(i,j) = 1 +L(i-1,j-1)$ since we might as well match $x_i$ and $y_j$.)

Let us summarize the algorithm in the form of pseudocode:

$LCS(X,Y)$

• for $i=0$ to $n: L(i,0) = 0$
• for $j=0$ to $m: L(0,j) = 0$
• for $i=1$ to $n:$
• for $j=1$ to $m:$
• if $x_i = y_j$: $L(i,j) = max\{1 + L(i-1, j-1), L(i-1,j), L(i,j-1)\}$
• if $x_i \ne y_j$: $L(i,j) = max \{L(i-1,j), L(i,j-1)\}$
• return $L(n,m)$

Clearly, the running time of the algorithm is $O(nm)$.

Practice exercise: How do you find the LCS (not just the length, but output a string)?

Practice exercise: Longest common substring (instead of subsequence).