SAT is NP-complete


NP\{L : \exists NTM \mbox{ that accepts } L \mbox{ in poly time} \}

NP is the class of languages such that there exists a nondeterministic TM that accepts that language in polynomial time.

An alternative definition is to say NP is the class of languages that can be efficiently certified. So, if x \in L, then there exists a polynomial sized, in |x|, certificate that can prove x \in L.

A few examples…

  • Clique: Does there exist a clique of size \ge k in a graph G?
    • The certificate is the subset of k vertices that are in the clique, and then the verifier will check if there is an edge between every pair of vertices.
  • Independent Set: Does there exist an independent set of size \ge k in a graph G?
    • As with clique, verifier takes a set of vertices with size \ge k and check if every pair of vertices have no edge.
  • Hamiltonian Cycle: Does there exists a Hamiltonian cycle (a cycle of length n, using each vertex exactly once) in G?
  • Vertex Cover: Does there exist a vertex cover of size \le k in G?
    • Recall a vertex cover is a set of vertices such that each edge has at least one of its endpoints in the set.
  • TSP: Does there exist a tour in G that visits every vertex and returns to the origin, and has length \le L?
  • Integer Linear Programming (ILP): Given A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m, does there exist an x \in \mathbb{Z}^n such that A x \ge b?
    • The certificate is the vector x. We check if A x \ge b and that x is all integers.
  • SAT: Given a satisfiability instance F, does there exist a boolean assignment x such that F(x) is true?

These are just a few of the problems that are in NP. All of the problems given above do not currently have a polynomial time algorithm. Can we come up with one? Is there anything that prevents us from coming up with a polynomial time algorithm?

Which problem seems the easiest to develop a poly-time algorithm? It turns out they are equally difficult–a polynomial time algorithm for any of them would give a polynomial time algorithm for all of them. To understand this, we introduce the notion of reduction.


We say a decision problem A is reducible to a decision problem B if there exists an algorithm to solve A using a procedure to solve B.

A \hookrightarrow B is a polytime reduction, on input of size n, if the algorithm makes only a poly(n) number of calls to B, each on inputs of size poly(n), and uses poly(n) additional time. We use A \hookrightarrow_P B to say A is polytime reducible to B.

CLIQUE \hookrightarrow_P INDEPENDENT-SET? Given an instance of CLIQUE, we call the algorithm for INDEPENDENT-SET on the complement of this graph, \overline{G}. Then, an independent set in \overline{G} of size \ge k will be a clique of size G. There is a nearly equivalent reduction from INDEPENDENT-SET \hookrightarrow_P CLIQUE.


Given an instance of VERTEX-COVER, call INDEPENDENT-SET on this graph. Call the independent set it finds \mathcal{I}. Then, take the complement of this S = V \backslash \mathcal{I}, which will be a vertex cover. The following claim shows how to use the reduction to answer the decision question:

Claim: G has a vertex cover of size \le k iff G has an independent set of size \ge n-k.

SAT \hookrightarrow_P ILP?

Given an instance of SAT, write each clause as an equation. For example, a clause x_1 \lor x_2 \lor \overline{x_3} become an equation x_1 + x_2 + (1-x_3) \ge 1. We then add bounds 0\le x_i\le 1 to enforce each x_i is either 0 or 1.

Definition A language L is NP-complete if

  1. L \in NP.
  2. For all A \in NP, there exists a polytime reduction A \hookrightarrow_P L.

So if L is NP-complete, L is the “hardest” language in NP. Also if L is NP-complete and L \in P, then NP \subseteq P and P = NP.

But does there exist an NP-complete language? We show first that SAT is NP-complete.

Theorem (Cook-Levin). SAT is NP-complete.

Proof: We need to show that for all L \in NP, that we can decide L given a procedure to decide SAT.

Suppose L \in NP. Then, there exists a NTM M that accepts any x \in L in \le n^k steps for some fixed k. This also implies that there exists a sequence of \le n^k configurations from a starting configuration to q_{A} if x \in L. We will show how to check the existence of such an accepting path as a polynomial-sized SAT formula.

We encode the steps of computation of M by transforming the state of the NTM into literals. Define

X_{i,j,s} = \begin{cases}    1 \mbox{ if } i^{th} \mbox{ step has } j^{th} \mbox{ entry } =s. \\    0 \mbox{ otherwise}    \end{cases}

Note that s \in \Gamma \cup Q and that for all i,j, X_{i,j,s}=1 for exactly one value of s. We can encode the requirement of exactly one literal being true as a SAT instance. For two literals and a clause (x_1 \lor x_2), the following requires that exactly one of the literals be true:

(x_1 \lor x_2) \land (\overline{x}_1 \lor \overline{x}_2).

More generally, for k literals and a clause (x_1 \lor x_2 \lor \ldots \lor x_k), the following evaluates to true if and only if the original clause has exactly one literal set to true:

(x_1 \lor x_2 \lor \ldots \lor x_k) \land \left(\land_{i \neq j} (\overline{x}_i \lor \overline{x}_j) \right).

We encode the existence of an accepting path in the NTM as the following SAT instance:

  1. For each pair i,j create a set of clauses
    • \left( \lor_s X_{i,j,s} \right) \land \left( \land_{s\neq t} \left( \overline{X}_{i,j,s} \lor \overline{X}_{i,j,t} \right) \right)
  2. Encode the input/initial configuration c_1c_2c_3\ldots as (X_{1,1,c_1} \land X_{1,2,c_2} \land \ldots ).
  3. Require that we reach an accept state somewhere: \lor_{i,j} X_{i,j,q_{A}}.
  4. Ensure that every transition is valid by requiring that the current state leads to a valid future state. We consider windows of three symbols at both the current/next step, because a TM can read/write and move either L/R in a single transition. The we consider 6 literals at a time for a window and then ensure that every window is valid:

\begin{aligned}    \land_{i,j} \lor_{a_1, \ldots, a_6 \text{ is valid window}}& (x_{i,j-1,a_1} \land x_{i,j,a_2} \land x_{i,j+1,a_3} \Rightarrow \\    & x_{i+1,j-1,a_4} \land x_{i+1,j,a_5} \land x_{i+1,j+1,a_6})    \end{aligned}

We then concatenate all the clauses from 1-4 above into a single function F = (1) \land (2) \land (3) \land (4), where F has a satisfying assignment iff the original input x \in L. The size of the formula is at most n^k \cdot n^k \cdot (|Q| + |\Gamma|) \cdot C = O(n^{2k}).

Therefore, L \hookrightarrow_P SAT.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s