# NP-Completeness: Intro

A PDF of Eric’s handwritten notes is available here.

#### Overview:

How can we show that a problem is intractable, i.e., computationally difficult in the sense that it can’t be solved efficiently?  To be precise, by efficient we mean that the running time is polynomial in the input size.  We are not able to show that a problem cannot be solved in polynomial time; instead we show that if we could solve this particular problem in polynomial time, then it would have dramatic implications.  In particular, we would be able to solve a wide range of other problems in polynomial time, including many noteworthy problems that have resisted the efforts of many people to crack them.

Following [Dasgupta-Papadimitriou-Vazirani]’s textbook, we will define the class NP in terms of search problems.  This is a somewhat more intuitive approach; the traditional approach uses decision problems instead.  The distinction is not important, but you should be aware of it in case you look at other textbooks or have seen these topics in an earlier course.

The class NP is the class of all problems for which we can efficiently verify solutions.  In retrospect, the class might have been denoted EV (to stand for efficient verification), but these concepts were devised in terms of automata, and NP stands for non-deterministic polynomial time; we’ll discuss this more in a bit.  Let’s first define the notion of a search problem.

#### Definition:  Search Problem:

• Input: Instance $I$
• Output: Solution $S$ if one exists; NO if no solution exists
• Requirements: Given an instance $I$ and a proposed solution $S$, we can check in time polynomial in $|I|$ whether $S$ is a solution to $I$

Roughly speaking, a search problem is one where we can efficiently verify solutions.  We don’t need to find or construct a solution; if we are given a candidate solution (who knows how it was found), we can efficiently check whether it is in fact a solution. To show that problem $A$ is a search problem, we exhibit an algorithm that takes an input $I$ and a proposed solution $S$ and checks whether $S$ is, in fact, a solution to $I$; the running time of this verification algorithm must be polynomial in the input size $|I|$.

We can now define the class NP as the class of all search problems.  Note this says nothing about constructing the solution, just that we can verify solutions efficiently.  If in addition we can construct solutions efficiently for all inputs then the problem is in the class P.

• NP = Class of all search problems
• P = All search problems that can be solved in polynomial time

Hence the question whether P =? NP is roughly asking whether constructing a solution is as hard/easy as verifying a solution.  An analogous question from mathematics: is finding a proof of a theorem as easy (within polynomial factors) as checking a given proof?

We saw the SAT problem earlier in the graph algorithms section.  Let’s review the definition since SAT plays a central role in the theory of NP-completeness.

#### Definition: SAT Problem:

• Input: Boolean formula $f$ in CNF with $n$ variables and $m$ clauses
• Output: Satisfying assignment if one exists; NO if no satisfying assignment exists

SAT $\in$ NP: Given $f$ and an assignment $\sigma$, for each clause we can check in $O(n)$ time that it contains $\geq 1$ satisfied literal. Thus it takes $O(nm)$ total time to check that $\sigma$ satisfies $f$.
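The verification procedure above can be sketched as follows; this is a minimal sketch assuming the formula is given as a list of clauses, with a literal encoded as $+i$ or $-i$ for variable $i$ (a DIMACS-style convention, not a representation fixed by the notes).

```python
def verify_sat(clauses, assignment):
    """Check that `assignment` satisfies every clause.

    clauses: list of clauses; each clause is a list of nonzero ints,
             where literal +i means variable i and -i means its negation.
    assignment: dict mapping variable index i -> True/False.
    O(nm) time overall: m clauses, each scanned for a satisfied literal.
    """
    for clause in clauses:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False  # this clause has no satisfied literal
    return True

# example: (x1 or not x2) and (x2 or x3)
f = [[1, -2], [2, 3]]
sigma = {1: True, 2: False, 3: True}
```

Here `verify_sat(f, sigma)` accepts, since each clause has a satisfied literal under $\sigma$.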

Here are a few additional problems that we will discuss in this lecture.

In the coloring problem we have $k$ colors and we want to color the vertices so that adjacent vertices get different colors (i.e., no monochromatic edges).

#### k-Coloring Problem:

• Input: $G=(V,E)$ and integer $k \geq 0$
• Output: k-coloring if one exists; NO if no such k-coloring

Note, k-Coloring $\in$ NP: Given $G$, $k$, and a coloring $\sigma$, it takes $O(n)$ time to check that $\sigma$ uses at most $k$ colors and $O(m)$ time to verify that no edge is monochromatic, i.e., that $\sigma$ is a proper coloring.
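A sketch of this verifier, assuming vertices are numbered $0..n-1$ and the coloring is given as a list `color` with `color[v]` in $\{0,\dots,k-1\}$ (this representation is our choice, not fixed by the notes):

```python
def verify_coloring(n, edges, k, color):
    """Check that `color` is a proper k-coloring of a graph on
    vertices 0..n-1, where color[v] is the color of vertex v.

    O(n) time to check that only colors 0..k-1 are used,
    O(m) time to check that no edge is monochromatic.
    """
    if any(not (0 <= color[v] < k) for v in range(n)):
        return False  # some vertex uses an illegal color
    return all(color[u] != color[v] for (u, v) in edges)

# a triangle: 3-colorable but not 2-colorable
triangle = [(0, 1), (1, 2), (0, 2)]
```

For the triangle, `verify_coloring(3, triangle, 3, [0, 1, 2])` accepts, while any assignment of only 2 colors is rejected.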

Here again is the knapsack problem that we saw in the dynamic programming section.

#### Knapsack Problem:

• Input: $n$ objects: integer weights $w_1, w_2, \dots, w_n$; values $v_1, v_2, \dots, v_n$; integer $B$ as weight limit
• Output: Subset $S$ of the objects whose total weight is at most $B$ and whose total value is maximum

Is Knapsack $\in$ NP? Given the input and a solution $S$, we need to verify that $S$ has maximum value.  But how can we do that? We can run the dynamic programming algorithm to find the maximum attainable value, but that algorithm takes $O(nB)$ time, while the input size is only poly($n, \log B$) since $B$ is encoded in $O(\log B)$ bits.  Hence we don’t know whether Knapsack is in NP; in fact, if it is in NP then P = NP.

The above formulation is the optimization version; the following is the search version, which adds an extra input parameter for the “goal”.

#### Knapsack-Search Problem:

• Input: $n$ objects: integer weights $w_1, w_2, \dots, w_n$; values $v_1, v_2, \dots, v_n$; integer weight limit $B$; and integer goal $g$
• Output: Solution $S$ of objects with weight  $\leq B$ and total value $\geq g$

Note, Knapsack-Search $\in$ NP: Given the input and $S$, we can verify $S$ in poly$(n, \log B)$ time by checking that the total weight is $\leq B$ and the total value is $\geq g$.
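This check is just a pair of sums and comparisons; a minimal sketch, assuming $S$ is given as a set of object indices:

```python
def verify_knapsack_search(weights, values, B, g, S):
    """Verify a proposed solution S (a set of object indices) for
    Knapsack-Search: accept iff total weight <= B and total value >= g.

    Only additions and comparisons of the input numbers are needed,
    so the check runs in time polynomial in n and log B.
    """
    return (sum(weights[i] for i in S) <= B
            and sum(values[i] for i in S) >= g)
```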

#### Knapsack (optimization) can be reduced to Knapsack-Search:

If we can solve the search version in polynomial time, then we can solve the original Knapsack (optimization) problem in polynomial time by using Knapsack-Search as a subroutine and binary searching over the goal $g$, which lies between $0$ and $\sum_i v_i$. Hence, we’ve shown that: Knapsack $\rightarrow$ Knapsack-Search.

Recall, the minimum spanning tree (MST) problem.

#### MST Problem:

• Input: $G=(V,E)$ with $w(e) > 0$  for every $e \in E$
• Output: Spanning tree $T$ of minimum weight

Note, MST $\in$ NP: Given $G$ and $T$, run DFS to check that $T$ is a spanning tree in $O(n)$ time. Then run Kruskal’s or Prim’s algorithm to get an MST $T'$ and check that $w(T) = w(T')$.
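A sketch of this verifier, assuming the graph is given as a list of `(w, u, v)` edge triples (a representation we choose here). In place of DFS, the tree check below uses a union–find cycle test, which serves the same purpose: $n-1$ edges with no cycle form a spanning tree.

```python
def verify_mst(n, edges, T):
    """Verify that T is a minimum spanning tree of the graph.

    n: number of vertices 0..n-1; edges and T are lists of (w, u, v)
    triples.  Check that T consists of n-1 graph edges forming no cycle
    (hence a spanning tree), then compare w(T) against the MST weight
    found by Kruskal's algorithm.
    """
    if len(T) != n - 1 or not all(e in edges for e in T):
        return False
    parent = list(range(n))
    def find(x):                   # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for w, u, v in T:              # n-1 edges and no cycle => spanning tree
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    # Kruskal's algorithm for the reference MST weight
    parent = list(range(n))
    mst_w = 0
    for w, u, v in sorted(edges):  # sorting (w, u, v) orders by weight
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst_w += w
    return sum(w for w, _, _ in T) == mst_w
```

A full implementation would also confirm the graph itself is connected; the sketch assumes it is, as the MST problem statement does.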

NP stands for non-deterministic polynomial time: these are the problems that can be solved in polynomial time on a non-deterministic machine. (Such machines are allowed to “guess” at each step; you can view such a machine as an automaton, and we are simply asking whether there exists a path to the accepting state.)

To summarize the status, we’ve shown:

• P contains: MST
• NP contains: MST, Knapsack-Search, SAT, k-Coloring, TSP

From the definitions of NP and P at the beginning of the lecture, it is easy to see that P $\subseteq$ NP. What would happen if P = NP?

• P = NP: If we can verify solutions efficiently, then we can also construct solutions efficiently. In other words, constructing a proof is no harder (within polynomial factors) than verifying a proof.
• P $\neq$ NP: There are search problems that cannot be solved in polynomial time.  In particular, the NP-complete problems, the hardest problems in NP in the sense that they are the most difficult search problems to solve, cannot be solved in polynomial time.

If P $\neq$ NP, then there is no polynomial-time algorithm for some problems in NP; these are the “hardest” problems in the class NP and hence are called NP-complete problems.  The problems Knapsack-Search, SAT, k-Coloring, and TSP are all NP-complete, and there are a ton of other NP-complete problems. If we can solve one of these NP-complete problems efficiently, then we can solve all of the rest efficiently.

To show SAT is NP-Complete, we need to prove:

1. SAT $\in$ NP (shown above)
2. For every problem A $\in$ NP, there is a reduction from A to SAT (A $\rightarrow$ SAT): given a poly-time algorithm for SAT, we can use it to get a poly-time algorithm for every problem A $\in$ NP.  Thus, SAT is at least as hard as every other problem in NP.

Here we formalize the notion of a reduction alluded to in the second part.

#### Definition: Reduction:

Consider a pair of problems A and B, e.g., A = MST, B = SAT or A = 2SAT, B = SCC.

$A \rightarrow B$ (or $A \leq B$) means A reduces to B:
If we can solve B in poly-time then we can use that algorithm for B as a black-box to solve A in poly-time.

Take an input $I$ for A:

1. Define the input $f(I)$ for B.
2. Run the black-box algorithm for B on $f(I)$.
3. Given a solution $S$ for $f(I)$, produce the solution $h(S)$ for $I$; given NO for $f(I)$, output NO for $I$.

Thus to reduce A to B, we need to:

1. define $f$ and $h$
2. prove that if $S$ is a solution for $f(I)$, then $h(S)$ is a solution for $I$
3. prove that if there is no solution for $f(I)$, then there is no solution for $I$
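The overall shape of a reduction can be sketched as plumbing: a concrete reduction must supply $f$, $h$, and the black box for B. The toy instantiation below (reducing minimum-finding to first-element-of-a-sorted-list) is our own illustration, not an example from the notes.

```python
def solve_A(I, solve_B, f, h):
    """Generic shape of a reduction A -> B (a sketch).

    f maps an instance of A to an instance of B; solve_B returns a
    solution for its input or None (meaning NO); h maps a solution of
    f(I) back to a solution of I.  If f and h run in polynomial time
    and solve_B does too, this is a poly-time algorithm for A.
    """
    S = solve_B(f(I))
    return None if S is None else h(S)

# toy illustration: "find a minimum element of a list" reduces to
# "report the first element of a sorted list"
f = lambda I: sorted(I)                  # instance map
solve_B = lambda J: J[0] if J else None  # black box for B
h = lambda S: S                          # solution map (identity here)
```

Here `solve_A([3, 1, 2], solve_B, f, h)` returns `1`, and an empty list yields `None` (NO), matching step 3 above.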

#### To prove NP-completeness:

Now we can use reductions to show the second part of the proof of NP-completeness for SAT.  That is, for any problem A in NP, we need to find a reduction from A to SAT.

Suppose we know SAT is NP-Complete somehow, which gives us for every problem A in NP, we have A $\rightarrow$ SAT.

And now suppose we want to show Colorings is NP-Complete. Since we already know Colorings is in NP, what remains is to show that for every problem A in NP, we have A $\rightarrow$ Colorings.  If we can show SAT $\rightarrow$ Colorings, then we have A $\rightarrow$ SAT $\rightarrow$ Colorings, and hence A $\rightarrow$ Colorings.

In general, if we want to prove a problem such as Colorings is NP-Complete, we need to show:

1. Colorings $\in$ NP
2. Find a known NP-Complete problem B and show that B $\rightarrow$ Colorings.