# Max-SAT Approx Alg.

PDF of Eric’s handwritten notes are here.

We know that SAT is NP-complete so it is unlikely that we’ll find a polynomial-time algorithm to solve it.  Can we approximately solve it?  Let’s first look at the optimization version of SAT so that we are always outputting something.  Here is the definition of the Max-SAT problem:

Max SAT

Input: Boolean formula $f$ in CNF with $n$ variables $x_1, ... , x_n$ and $m$ clauses $C_1, ..., C_m$

Output: assignment maximizing the number of clauses satisfied.

Example:

$f = (x_1 \vee \bar{x_3} \vee x_4) \wedge(x_2 \vee x_3) \wedge (\bar{x_2}) \wedge (\bar{x_1} \vee \bar{x_3} \vee x_2 \vee x_4) \wedge (\bar{x_4})$

Setting $x_1 = F, x_2 = F, x_3 = T, x_4 = F$ satisfies 4 of the 5 clauses, and there is no assignment satisfying all 5.

Clearly, SAT $\rightarrow Max-SAT$ and hence Max-SAT is NP-hard.  Therefore we cannot hope to efficiently find an optimal assignment.  Can we find an approximately optimal solution?  In particular, for an input formula $f$ let $m^*$ denote the maximum number of satisfied clauses.  In the above example, $m=5$ and $m^*=4$.

Can we construct an algorithm A which finds an assignment satisfying $\ell$ clauses where $\ell \geq \frac{1}{2}m^*$?  And it achieves this guarantee of being within a factor $\frac{1}{2}$ of optimal for every input $f$.  We call A a $\frac{1}{2}$-approximation algorithm.

We will first see a very simple randomized algorithm which is a $\frac{1}{2}$-approximation algorithm for Max-SAT.  Then we will use linear programming to construct a $(1-\frac{1}{e})$-approximation algorithm.  Finally, by combining these two algorithms we’ll achieve a $\frac{3}{4}$-approximation algorithm.

Simple Algorithm

First we give a very simple algorithm for Max-SAT. The algorithm sets each variable to be True independently with probability 1/2 and False with probability 1/2.  I then outputs the resulting assignment.  Since this is a randomized algorithm we look at its “expected”

Let $W$ be the number of satisfied clauses. Also, for $1 \leq j \leq m$, let

$W_j = \begin{cases} 1 & \text{ if } C_j \text{ is satisfied,} \\ 0 & \text{otherwise.}\end{cases}$

It is easy to see that $W = \sum_{j = 1}^m W_j$. By linearity of expectation

$E[W] = E[\sum_{j = 1}^m W_j] =\sum_{j = 1}^m E[W_j].$

Consider a clause $C_j$ with $k$ variables,

$Pr( W_j = 1) = 1 - (1/2)^k \geq 1 - (1/2) = 1/2.$

Therefore,

$E[W] =\sum_{j = 1}^m E[W_j] \geq m/2.$

It follows that the simple algorithm satisfies half of the clauses in expectation (regardless of the max # of clauses satisfiable). Moreover, if there are exactly $k$ literals in every clause, then it satisfies $\geq (1-2^{-k})m$ clauses.

Integer Linear Programing

Integer Linear Programming (ILP) is a linear program where the variables are restricted to integer values.

First we show that ILP is NP-hard by reducing Max SAT -> ILP as follows.

Consider input $f$ for Max SAT with variables $x_1, ... , x_n$ & clauses $C_1, ..., C_m$.

For variable $x_i$ in SAT input $f$, create variable $y_i$ for our ILP instance. Restrict $y_i \in \{0,1\}$ where $y_i=1$ corresponds to $x_i=T$ and $y_i=0$ corresponds to $x_i=F$.

For clause $C_j$, create variable $z_j \in \{0,1\}$ where $z_j = 1$ corresponds to $C_j$ satisfied $z_j=0$ corresponds to $C_j$ unsatisfied

Further, let $C_j^+=$ variables in $C_j$ in positive form, and $C_j^-=$ variables in $C_j$ in negative form. For example, if $C_j = (x_5 \vee \bar{x_3} \vee x_7), C_j^+ = \lbrace x_5, x_7 \rbrace, C_j^{-} = \lbrace x_3 \rbrace$. Create a constraint

$\sum_{x_i \in C_j^+} y_i + \sum_{x_i \in C_j^-} (1 - y_i) \geq z_j$

for each clause $C_j$.

Finally, the ILP maximizes

$\sum_{j=1}^m z_j$.

Solving this ILP will give a solution to Max SAT.

To see this, note that for the constraint

$\sum_{x_i \in C_j^+} y_i + \sum_{x_i \in C_j^-} (1 - y_i) \geq z_j$

to get $z_j = 1$ we need at least 1 positive literal having $y_i = 1$ &/or at least 1 negative literal with $y_i = 0$. Also, since we maximize $\sum_j z_j$ we’ll set $z_j$ = 1 if possible.

LP Relaxation

Consider the following LP which changes constraints of the form $y_i \in \lbrace 0,1 \rbrace$ to $0 \leq y_i \leq 1$

max $\sum_{j=1}^m \hat{z_j}$ subject to:

$\forall i = 1, ..., n, 0 \leq \hat{y_i} \leq 1$

$\forall j = 1, ..., m, 0 \leq \hat{z_j} \leq 1$

$\forall j = 1, ..., m, \sum_{i \in C_j^+} \hat{y_i} + \sum_{i \in C_j^-} (1-\hat{y_i}) \geq \hat{z_j}$

This is a LP so we can solve it in poly-time.

This is a “relaxation” of the original ILP in the following sense: any solution to the ILP is also a solution to the LP

Hence, objective value for optimal of ILP is less than or equal to objective value for optimal of LP

$\sum_{j=1}^m z_j^* \leq \sum_{j=1}^m \hat{z_j^*},$

where $\hat{y}^* \& \hat{z}^*$ is optimal solution for the LP.

Randomized Rounding

We want to then create a solution $y,z$ for the ILP which is close to the optimal for the ILP. How do we measure close to the ILP optimal? By saying it’s close to the LP optimal.

Take $\hat{y}^* \& \hat{z}^*$. Set

$y_j = \begin{cases} 1 & \text{ with probability } \hat{y_i}^*, \\ 0 & \text{ with probability } 1 - \hat{y_i}^*. \end{cases}$

This method is called randomized rounding.

Let’s look at expected # of satisfied clauses.

For $k \geq 1$ let $B_k = 1 - (1-\frac{1}{k})^k$

Lemma: For clause $C_j$ with $k$ literals, $Pr(C_j$ is satisified) $\geq B_k \hat{z_j}$

Assuming the lemma, we show that the algorithm gives a constant approximation factor.

Note, $1-\frac{1}{k} \leq e^{-k}$ for $k \geq 1$ since $e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + - ...$

Hence, $1 - (1-\frac{1}{k})^k \geq 1 - \frac{1}{e}$ for $k \geq 1$.

Let $W$ = # of satisfied clauses.

$E[W] = \sum_{j=1}^m Pr(C_j$ is satisfied)

$\geq \sum_{j=1}^m B_{k(j)} \hat{z_j}$ where $C_j$ has $k(j)$ variables.

$\geq (1 - \frac{1}{e}) \sum_{j=1}^m \hat{z_j}$

Recall, $\sum_j \hat{Z_j} \geq \sum Z_j = m*=$ the most optimal # of satisfied clauses

Hence, $E[z] \geq (1 - \frac{1}{e})m*$

So in expectation we satisfy $\geq (1 - \frac{1}{e})$ times the maximum # of satisfied clauses.

This is a $(1-\frac{1}{e})$ – expected approximation algorithm, and we can find such an assignment using the method of conditional expectations.

Proof of lemma: Fix $C_j$. We may assume that all of the variables in $C_j$ are in positive form, so say $C_j = (x_1 \vee x_2 \vee .. \vee x_k)$

The LP constraint is then:

$\hat{y_1} + \hat{y_2} + ... + \hat{y_k} \geq \hat{z_j}$ (*)

Clause $C_j$ is unsatisfied if every $y_i$ for $i = 1, ..., k$ is rounded to 0.

This happens with probability $\prod_{i=1}^k (1-\hat{y_i}).$

Recall the arithmetic mean-geometric mean inequality AM – FM

$AM = \frac{1}{k} \sum_{i=1}^k w_i \geq (\prod_{i=1}^k w_i)^{\frac{1}{k}} = GM$

In our case $w_i = 1 - \hat{y_i}$

Then, $\prod_{i=1}^k \left(1-\hat{y_i}\right) \leq \left[\frac{1}{k} \sum_{i=1}^k \left(1-\hat{y_i}\right) \right]^k = \left[1 - \frac{\sum_{i=1}^k \hat{y_i}}{k} \right]^k$

Thus, $1 - \prod_{i=1}^k \left(1-\hat{y_i} \right) \geq 1 - \left(1 - \frac{\sum_{i=1}^k \hat{y_i}}{k} \right)^k\ \geq 1 - \left(1 - \frac{\hat{z_i}}{k} \right)^k$

since $\hat{y_1} + \hat{y_2} + ... + \hat{y_k} \geq \hat{z_j}$ from (*) on last page.

Let $f(a) = 1-\left(1-\frac{a}{k} \right)^k$

$f''(a) < 0$ so $f(a)$ is concave

To show $f(a) \geq B_k a$ for all $a \in [0,1]$ we just need to check at $a = 0$ and $a= 1$.

• for $a = 0: f(a) = 1 - (1-\frac{0}{k})^k = 0 = B_k a$
• for $a = 1: f(a) = 1 - (1-\frac{1}{k})^k = B_k a$

Hence, $f(\hat{z_j}) \geq B_k \hat{z_j}$

Therefore,

$Pr(C_j$ is satisfied) = $1 - \prod_{i=1}^k (1-\hat{y_i}) \geq 1 - (1 - \frac{\hat{Z_j}}{k})^k$ by AM – GM
$\geq Bk \hat{Z_j}$ since $f(a) \geq B_k a$

Recall, our old algorithm achieves $1-2^{-k}$ on clauses of size $k$ & this new algolrithm achieves $B_k= 1- (1-\frac{1}{k})^k$

The new algorithm is better for ${k \leq 2}$ and the old algorithm is better for ${k \geq 2}$

Combine Two Algorithms:

Run best both algorithms and take better of the 2 solutions.

Let ${m_1}$ = expected number of clauses satisfied by old algorithm and ${m_2}$ = expected number for new algorithm

Theorem:

${max \lbrace m_1, m_2 \rbrace \geq \frac{3}{4} \sum_{j=1}^m \hat{z_j} \geq \frac{3}{4} m *}$

In other words, we have a ${\frac{3}{4}}$ – approximation algorithm.

(can use method of conditional expectation to derandomize)

Proof:

${max (m_1, m_2) \geq \frac{m_1 + m_2}{2} = average(m_1, m_2)}$

So it suffices to show ${\frac{m_1 + m_2}{2} \geq \frac{3}{4}}$

Let $S_k$ = set of clauses with $k$ literals.

$m_1 = {\sum_k \sum_{C_j\in S_k} (1-2^{-k}) \geq \sum_k \sum_{C_j\in S_k} (1-1^{-k}) \hat{z_j}}$

Since ${\hat{z_j} \leq 1}$

${m_2 \geq \sum_k \sum_{C_j\in S_k} \left(1-\left(1-(\frac{1}{2})^{-k}\right)\right) \hat{z_j}}$

Thus,

${\frac{m_1 + m_2}{2} \geq \sum_k \sum_{C_j\in S_k} \left(\frac{(1-2^{-k})+(1-(1-\frac{1}{k})^k)}{2} \right) \hat{z_j} }$

Need to show: ${a_k \geq \frac{3}{4}}$ for all $k \geq 1$.

• $k = 1: {a_k = \frac{\frac{1}{2} + 1}{2} = \frac{3}{4}}$
• $k = 2: {a_k = \frac{\frac{3}{4} + \frac{3}{4}}{2} = \frac{3}{4}}$
• ${k \geq 3: a_k = \frac{\frac{7}{8} + (1-\frac{1}{e})}{2} = \frac{1}{2}(\frac{15}{8} - \frac{1}{e}) \geq \frac{1}{2}(\frac{12}{8}) = \frac{3}{4} }$  since  ${\frac{1}{e} \leq \frac{3}{8} }$