NP: Knapsack

PDF of Eric’s handwritten notes are here.

Recall that our DP (dynamic programming) algorithm for Knapsack takes $O(nB)$ time, where $n$ was the number of items and $B$ was the capacity of our bag. This takes exponential time in the size of the input. We will now show that Knapsack (search version) is NP-complete.

The key will be to show that the following problem, known as the Subset Sum problem, is NP-complete.  The knapsack problem is a generalization of Subset Sum so it’ll follow as an easy corollary that knapsack-search is NP-complete.

Subset-sum (SS)
Input: $n$ numbers $a_1,...,a_n$ and a parameter $t$
Output: A subset $S$ of objects $\{1,...,n\}$ where $\sum_{i\in S} a_i = t$ and NO if no such $S$ exists.

We assume that each $a_i$ is at most $t$ in magnitude. Then the input size is $O(n \log t)$ because each of the $n$ items takes $\log t$ bits.

Exercise: Find a DP algorithm for this problem that takes $O(nt)$ time.

Lemma: Subset-sum is NP-complete

Proof:
a) First show that subset-sum is in NP:
Given input $\{a_1,...,a_n, t\}$ and a solution $S$, it takes $O(n\log t)$ time to compute $\sum_{i\in S} a_i$ and check that this equals $t$.

b) Next, we reduce 3SAT to SS.
Take a 3SAT input $f$ with variables $x_1,...x_n$ and clauses $c_1,...,c_m$.

Our input for SS will $2n + 2m$ numbers and some $t$:
We make $2n$ numbers for the literals: $v_1,v_1',v_2,v_2',...,v_n,v_n'$
We make $2m$ numbers for the clauses: $s_1,s_1',...,s_m,s_m'$

We will write each of our numbers and $t$ as an $n + m$ digit number in base 10. Note that the magnitude of each number could be as large as $10^{n+m}$.

We want that $v_i \in S \iff x_1 = T$ (so $v_i \in S \iff x_1 = F$). Thus, we want $S$ to include exactly one of $\{v_i,v_i'\}$.

For each variable $x_i$, we will set digit $i$ of $v_i$ and $v_i'$ to 1.
For clause $C_j$, if $x_i \in C_j$, set digit $n+j$ of $v_i$ to 1, else if $\bar{x_i} \in C_j$, set digit $n+j$ of $v_i'$ to 1.

We set digits 1 to $n$ of $t$ to 1. We set digits $n+1$ to $m$ of $t$ to 3.

Now we set our $S_i, S_i'$ variables as “slacks” to complete the reduction. For each clause $C_j$, set digit $n + j$ of slack variables $S_j,S_j'$ to 1.

Since the first $n$ digits of $t$ are 1, then for each $i$, we can only choose at most one of $v_i, v_i'$ (because if we choose both, our numbers will sum to 2 in that column instead of 1).

Since the last $m$ digits of $t$ are 3, then for each clause $C_j$, we will need to choose at least one number corresponding to a literal in $C_j$. For instance, if $C_j$ is $x_2 \vee \bar{x_4} \vee x_7$, then the only numbers that are non-zero in the $(n+j)^{th}$ digit are $v_2,v_4',v_7,s_j,$ and $s_j'$. To get this digit to sum to 3, we need at least one of $v_2, v_4', v_7$ to be included in $S$.

Suppose that we have the following input to 3SAT:

$f = (\bar{x_1} \vee \bar{x_2} \vee \bar{x_3} ) \wedge (\bar{x_1} \vee \bar{x_2} \vee x_3 ) \wedge (x_1 \vee \bar{x_2} \vee x_3) \wedge (x_1 \vee x_2)$

Then this table shows how we create the numbers for our input to SS.

Lemma: SS has a solution $S$ $\iff$ 3SAT $f$ has a satisfying assignment

Proof:
$(\Rightarrow)$ Take solution $S$ for SS. We know that $S$ includes $v_i$ or $v_i'$ (but not both). If $v_i \in S$, then set $x_i = T$. If $v_i' \in S$ then set $x_i = F$. Thus, we have an assignment.

If we look at digit $n+j$, there will be at least 1 literal in $C_j$ being satisfied by construction.

$(\Leftarrow)$ Take a satisfying assignment for $f$. If $x_i = T$, then add $v_i$ to $S$, else if $x_i = F$, add $v_i'$ to $S$. Then digit $i$ of our numbers will sum to 1, as desired.

Look at digit $n+j$. The assignment must satisfy at least one literal of $C_j$. If exactly one such literal is satisfied, then add $s_j$ and $s_j'$ to $S$. If exactly two literals in $C_j$ are satisfied, then add $s_j$ to $S$. If exactly three literals in $C_j$ are satisfied, then do not add $s_j$ or $s_j'$ to $S$.

Next, we show that Knapsack-search is NP-complete.

Knapsack-search:

Input$n$ objects with integer weights $w_1,...,w_n$ and integer values $v_1,...,v_n$, total capacity $B$, and goal $V$.

Output: Subset $S$ of $\{1,..,n\}$ where $\sum_{i\in S} w_i \le B$ and $\sum_{i\in S} v_i \ge V$.

Lemma: Knapsack-search is NP-complete:

a) Knapsack-search is in NP. We showed this in a previous class.

b) SS $\rightarrow$ Knapsack-search. Take your input to SS: $a_1,...,a_n, t$. We define the following input to Knapsack-search. For all $i$, we set $w_i = v_i = a_i$, and we set $B = V = t$.

A solution $S$ to Knapsack-search must obey:
$\sum_{i\in S} w_i \le B \iff \sum_{i\in S} a_i \le t$
$\sum_{i\in S} v_i \ge V \iff \sum_{i\in S} a_i \ge t$

Thus, $\sum_{i\in S} a_i = t$, so $S$ is a solution to Knapsack-search iff $S$ is a solution to the SS instance.