# SAT is NP-complete

NP-completeness-notes

NP$\{L : \exists NTM \mbox{ that accepts } L \mbox{ in poly time} \}$

NP is the class of languages such that there exists a nondeterministic TM that accepts that language in polynomial time.

An alternative definition is to say NP is the class of languages that can be efficiently certified. So, if $x \in L$, then there exists a polynomial sized, in $|x|$, certificate that can prove $x \in L$.

A few examples…

• Clique: Does there exist a clique of size $\ge k$ in a graph $G$?
• The certificate is the subset of $k$ vertices that are in the clique, and then the verifier will check if there is an edge between every pair of vertices.
• Independent Set: Does there exist an independent set of size $\ge k$ in a graph $G$?
• As with clique, verifier takes a set of vertices with size $\ge k$ and check if every pair of vertices have no edge.
• Hamiltonian Cycle: Does there exists a Hamiltonian cycle (a cycle of length $n$, using each vertex exactly once) in $G$?
• Vertex Cover: Does there exist a vertex cover of size $\le k$ in $G$?
• Recall a vertex cover is a set of vertices such that each edge has at least one of its endpoints in the set.
• TSP: Does there exist a tour in $G$ that visits every vertex and returns to the origin, and has length $\le L$?
• Integer Linear Programming (ILP): Given $A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m$, does there exist an $x \in \mathbb{Z}^n$ such that $A x \ge b$?
• The certificate is the vector $x$. We check if $A x \ge b$ and that $x$ is all integers.
• SAT: Given a satisfiability instance $F$, does there exist a boolean assignment $x$ such that $F(x)$ is true?

These are just a few of the problems that are in NP. All of the problems given above do not currently have a polynomial time algorithm. Can we come up with one? Is there anything that prevents us from coming up with a polynomial time algorithm?

Which problem seems the easiest to develop a poly-time algorithm? It turns out they are equally difficult–a polynomial time algorithm for any of them would give a polynomial time algorithm for all of them. To understand this, we introduce the notion of reduction.

Reduction

We say a decision problem $A$ is reducible to a decision problem $B$ if there exists an algorithm to solve $A$ using a procedure to solve $B$.

$A \hookrightarrow B$ is a polytime reduction, on input of size $n$, if the algorithm makes only a poly$(n)$ number of calls to $B$, each on inputs of size poly$(n)$, and uses poly$(n)$ additional time. We use $A \hookrightarrow_P B$ to say $A$ is polytime reducible to $B$.

CLIQUE $\hookrightarrow_P$ INDEPENDENT-SET? Given an instance of CLIQUE, we call the algorithm for INDEPENDENT-SET on the complement of this graph, $\overline{G}$. Then, an independent set in $\overline{G}$ of size $\ge k$ will be a clique of size $G$. There is a nearly equivalent reduction from INDEPENDENT-SET $\hookrightarrow_P$ CLIQUE.

VERTEX-COVER $\hookrightarrow_P$ INDEPENDENT-SET?

Given an instance of VERTEX-COVER, call INDEPENDENT-SET on this graph. Call the independent set it finds $\mathcal{I}$. Then, take the complement of this $S = V \backslash \mathcal{I}$, which will be a vertex cover. The following claim shows how to use the reduction to answer the decision question:

Claim: $G$ has a vertex cover of size $\le k$ iff $G$ has an independent set of size $\ge n-k$.

SAT $\hookrightarrow_P$ ILP?

Given an instance of SAT, write each clause as an equation. For example, a clause $x_1 \lor x_2 \lor \overline{x_3}$ become an equation $x_1 + x_2 + (1-x_3) \ge 1$. We then add bounds $0\le x_i\le 1$ to enforce each $x_i$ is either $0$ or $1$.

Definition A language $L$ is NP-complete if

1. $L \in$ NP.
2. For all $A \in$ NP, there exists a polytime reduction $A \hookrightarrow_P L$.

So if $L$ is NP-complete, $L$ is the “hardest” language in NP. Also if $L$ is NP-complete and $L \in P$, then $NP \subseteq P$ and $P = NP$.

But does there exist an NP-complete language? We show first that SAT is NP-complete.

Theorem (Cook-Levin). SAT is NP-complete.

Proof: We need to show that for all $L \in NP$, that we can decide $L$ given a procedure to decide SAT.

Suppose $L \in NP$. Then, there exists a NTM $M$ that accepts any $x \in L$ in $\le n^k$ steps for some fixed $k$. This also implies that there exists a sequence of $\le n^k$ configurations from a starting configuration to $q_{A}$ if $x \in L$. We will show how to check the existence of such an accepting path as a polynomial-sized SAT formula.

We encode the steps of computation of $M$ by transforming the state of the NTM into literals. Define

$X_{i,j,s} = \begin{cases} 1 \mbox{ if } i^{th} \mbox{ step has } j^{th} \mbox{ entry } =s. \\ 0 \mbox{ otherwise} \end{cases}$

Note that $s \in \Gamma \cup Q$ and that for all $i,j, X_{i,j,s}=1$ for exactly one value of $s$. We can encode the requirement of exactly one literal being true as a SAT instance. For two literals and a clause $(x_1 \lor x_2)$, the following requires that exactly one of the literals be true:

$(x_1 \lor x_2) \land (\overline{x}_1 \lor \overline{x}_2)$.

More generally, for $k$ literals and a clause $(x_1 \lor x_2 \lor \ldots \lor x_k)$, the following evaluates to true if and only if the original clause has exactly one literal set to true:

$(x_1 \lor x_2 \lor \ldots \lor x_k) \land \left(\land_{i \neq j} (\overline{x}_i \lor \overline{x}_j) \right)$.

We encode the existence of an accepting path in the NTM as the following SAT instance:

1. For each pair $i,j$ create a set of clauses
• $\left( \lor_s X_{i,j,s} \right) \land \left( \land_{s\neq t} \left( \overline{X}_{i,j,s} \lor \overline{X}_{i,j,t} \right) \right)$
2. Encode the input/initial configuration $c_1c_2c_3\ldots$ as $(X_{1,1,c_1} \land X_{1,2,c_2} \land \ldots )$.
3. Require that we reach an accept state somewhere: $\lor_{i,j} X_{i,j,q_{A}}$.
4. Ensure that every transition is valid by requiring that the current state leads to a valid future state. We consider windows of three symbols at both the current/next step, because a TM can read/write and move either L/R in a single transition. The we consider 6 literals at a time for a window and then ensure that every window is valid:

\begin{aligned} \land_{i,j} \lor_{a_1, \ldots, a_6 \text{ is valid window}}& (x_{i,j-1,a_1} \land x_{i,j,a_2} \land x_{i,j+1,a_3} \Rightarrow \\ & x_{i+1,j-1,a_4} \land x_{i+1,j,a_5} \land x_{i+1,j+1,a_6}) \end{aligned}

We then concatenate all the clauses from 1-4 above into a single function $F = (1) \land (2) \land (3) \land (4)$, where $F$ has a satisfying assignment iff the original input $x \in L$. The size of the formula is at most $n^k \cdot n^k \cdot (|Q| + |\Gamma|) \cdot C = O(n^{2k})$.

Therefore, $L \hookrightarrow_P SAT$.