

U.C. Berkeley — CS278: Computational Complexity — Handout N8


Professor Luca Trevisan 9/27/2002

Notes for Lecture 8

In this lecture we define the probabilistic complexity classes BPP, RP, and ZPP, and
we see how they relate to each other and to deterministic and circuit complexity
classes. Then we present a randomized reduction showing that the SAT problem remains
as hard as in the general case even when restricted to inputs that have either zero or
one satisfying assignment.

1 Probabilistic complexity classes


First we are going to describe the probabilistic model of computation. In this model an
algorithm A gets as input a sequence of random bits r and the "real" input x of the problem.
The output of the algorithm is the correct answer for the input x with some probability.
Definition 1 An algorithm A is called a polynomial time probabilistic algorithm if the size
of the random sequence |r| is polynomial in the input |x| and A() runs in time polynomial
in |x|.
If we want to talk about the correctness of the algorithm, then informally we could say that
for every input x we need Pr[A(x, r) = correct answer for x] ≥ 2/3. That is, for every input,
the probability of a correct answer over the choice of the random sequence must be some
constant bounded away from 1/2. Let us now define the class BPP.
Definition 2 A decision problem L is in BPP if there is a polynomial time algorithm A
and a polynomial p() such that :
∀x ∈ L : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≥ 2/3
∀x ∉ L : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≤ 1/3
We can see that in this setting we have an algorithm with two inputs and some con-
straints on the probabilities of the outcome. In the same way we can also define the class
P as:
Definition 3 A decision problem L is in P if there is a polynomial time algorithm A and
a polynomial p() such that :
∀x ∈ L : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 1
∀x ∉ L : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 0
Similarly, we define the classes RP and ZPP.
Definition 4 A decision problem L is in RP if there is a polynomial time algorithm A and
a polynomial p() such that:
∀x ∈ L : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≥ 1/2
∀x ∉ L : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 0

Definition 5 A decision problem L is in ZPP if there is a polynomial time algorithm A,
whose output can be 0, 1, or ?, and a polynomial p() such that:

∀x : Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = ?] ≤ 1/2

∀x, ∀r such that A(x, r) ≠ ? : A(x, r) = 1 if and only if x ∈ L

2 Relations between complexity classes


After defining these probabilistic complexity classes, let us see how they are related to other
complexity classes and with each other.

Theorem 1 RP⊆NP.

Proof: Suppose we have an RP algorithm for a language L. Then this algorithm can
be seen as a "verifier" showing that L is in NP. If x ∈ L then there is a random sequence
r for which the algorithm answers yes, and we think of such sequences r as witnesses that
x ∈ L. If x ∉ L then there is no witness. □
We can also show that the class ZPP is no larger than RP.

Theorem 2 ZPP⊆RP.

Proof: We are going to convert a ZPP algorithm into an RP algorithm. The construction
consists of running the ZPP algorithm and, any time it outputs ?, having the new algorithm
answer 0. In this way, if the right answer is 0, then the algorithm answers 0 with
probability 1. On the other hand, when the right answer is 1, the algorithm gives
the wrong answer with probability at most 1/2, since the probability that the ZPP algorithm
outputs ? is at most 1/2. □
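The conversion in this proof is mechanical enough to write down. Below is a minimal Python sketch; `zpp_alg` is a hypothetical callable standing in for the ZPP algorithm, taking the input x and a random string r and returning 0, 1, or '?'.

```python
def zpp_to_rp(zpp_alg, x, r):
    # Run the ZPP algorithm; whenever it refuses to commit ('?'),
    # answer 0.  A 0 answer is then never wrong when x is not in L,
    # and when x is in L we err only if the ZPP algorithm output '?',
    # which happens with probability at most 1/2.
    ans = zpp_alg(x, r)
    return 0 if ans == '?' else ans
```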
Another interesting property of the class ZPP is that it’s equivalent to the class of
languages for which there is an average polynomial time algorithm that always gives the
right answer. More formally,

Theorem 3 A language L is in the class ZPP if and only if L has an average polynomial
time algorithm that always gives the right answer.

Proof: First let us clarify what we mean by average time. For each input x we take the
average running time of A(x, r) over all random sequences r; then, for each size n, we take
the worst case over all inputs x with |x| = n. To construct an algorithm that always
gives the right answer, we run the ZPP algorithm, and if it outputs ?, we run it again with
fresh randomness. Each run outputs ? with probability at most 1/2, so the i-th run is the
last one needed with probability at most 2^{−i}. If the running time of the ZPP algorithm
is T, the average running time of the new algorithm is therefore:

T_avg ≤ (1/2) · T + (1/4) · 2T + · · · + (1/2^k) · kT + · · · = O(T)
Now we want to prove that if a language L has an algorithm that runs in average
polynomial time t(|x|), then L is in ZPP. We run the algorithm for time 2t(|x|) and output
? if the algorithm has not yet stopped. It is straightforward to see that this defines a ZPP
algorithm. First of all, the worst-case running time is polynomial, in fact 2t(|x|). Moreover,
the probability that our algorithm outputs ? is at most 1/2: since the original algorithm has
average running time t(|x|), by Markov's inequality it must stop within time 2t(|x|) at least
half of the time. □
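The first direction of the proof, rerunning the ZPP algorithm until it commits, can be sketched as follows; `random_strings` is a hypothetical iterable supplying one fresh random string per attempt.

```python
def rerun_until_definite(zpp_alg, x, random_strings):
    # Keep running the ZPP algorithm with fresh randomness until it
    # outputs something other than '?'.  Each attempt fails with
    # probability at most 1/2, so the expected number of runs is O(1)
    # and the average running time stays O(T).
    for r in random_strings:
        ans = zpp_alg(x, r)
        if ans != '?':
            return ans
```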
Let us now prove the fact that RP is contained in BPP.

Theorem 4 RP⊆BPP

Proof: We will convert an RP algorithm into a BPP algorithm. If the input x does not
belong to the language, then the RP algorithm always gives the right answer, so
it certainly satisfies the BPP requirement of giving the right answer with probability at
least 2/3. If the input x does belong to the language, then we need to improve
the probability of a correct answer from at least 1/2 to at least 2/3.
Let A be an RP algorithm for a decision problem L. We fix some number k and define
the following algorithm:

Algorithm A^(k):
input: x
pick r_1, r_2, . . . , r_k
if A(x, r_1) = A(x, r_2) = · · · = A(x, r_k) = 0 then return 0
else return 1

Let us now consider the correctness of the algorithm. In case the correct answer is 0, the
output is always right. In the case where the right answer is 1, the output is right except
when all A(x, r_i) = 0.

if x ∉ L : Pr_{r_1,...,r_k}[A^(k)(x, r_1, . . . , r_k) = 1] = 0

if x ∈ L : Pr_{r_1,...,r_k}[A^(k)(x, r_1, . . . , r_k) = 1] ≥ 1 − (1/2)^k

It is easy to see that by choosing an appropriate k the second probability can be made
arbitrarily close to 1. In particular, choosing k = 2 suffices to obtain a probability larger
than 2/3, which is what the definition of BPP requires. In fact, by choosing k to be a
polynomial in |x|, we can make the probability exponentially close to 1. This means that
the definition of RP given above is equivalent to one in which, instead of the bound of 1/2
on the probability of a correct answer when the input is in the language L, we require a
bound of 1 − (1/2)^{q(|x|)}, for a fixed polynomial q. □
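As a concrete illustration, the algorithm A^(k) can be sketched in Python as follows; `rp_alg` is a hypothetical one-sided-error algorithm, and `rs` stands for the list of k independently chosen random strings.

```python
def amplified_rp(rp_alg, x, rs):
    # One-sided amplification: answer 1 iff at least one of the k
    # independent runs answers 1.  An RP algorithm never answers 1
    # on inputs outside L, so a 1 here is always trustworthy; a 0
    # is wrong with probability at most (1/2)^k when x is in L.
    return 1 if any(rp_alg(x, r) == 1 for r in rs) else 0
```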
Now let A be a BPP algorithm for a decision problem L. Again we fix k and define
the following algorithm:

Algorithm A^(k):
input: x
pick r_1, r_2, . . . , r_k
c = Σ_{i=1}^{k} A(x, r_i)
if c ≥ k/2 then return 1
else return 0
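In Python, this majority-vote version for a two-sided-error algorithm looks like the sketch below (again `bpp_alg` and the list `rs` of random strings are hypothetical stand-ins).

```python
def majority_vote(bpp_alg, x, rs):
    # Two-sided amplification: run the algorithm once per random
    # string and output the majority answer (ties go to 1,
    # matching the "c >= k/2" rule above).
    c = sum(bpp_alg(x, r) for r in rs)
    return 1 if c >= len(rs) / 2 else 0
```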

In a BPP algorithm we expect the right answer to come up with probability greater than
1/2. So, by running the algorithm many times, we make sure that this advantage over 1/2
actually shows up in the results. More formally, let us state the Chernoff bounds.
Theorem 5 (Chernoff Bound)
Suppose X_1, . . . , X_k are independent random variables with values in {0, 1}, and for every i,
Pr[X_i = 1] = p. Then, for every ε > 0:

Pr[ (1/k) Σ_{i=1}^{k} X_i − p > ε ] < e^{−ε²k / (2p(1−p))}

Pr[ (1/k) Σ_{i=1}^{k} X_i − p < −ε ] < e^{−ε²k / (2p(1−p))}

The Chernoff bounds will enable us to bound the probability that our result is far from the
expected value. Indeed, these bounds say that this probability is exponentially small with
respect to k.
Let us now consider how the Chernoff bounds apply to the algorithm we described
previously. We fix the input x and let p = Pr_r[A(x, r) = 1] over all possible random
sequences. We also define the independent random variables X_1, . . . , X_k by X_i = A(x, r_i).

First, suppose x ∈ L. Then the algorithm A^(k)(x, r_1, . . . , r_k) outputs the right answer
1 when (1/k) Σ_i X_i ≥ 1/2. So the algorithm makes a mistake when (1/k) Σ_i X_i < 1/2.
We now apply the Chernoff bounds to bound this probability.

Pr[A^(k) outputs the wrong answer on x] = Pr[ (1/k) Σ_i X_i < 1/2 ]

≤ Pr[ (1/k) Σ_i X_i − p ≤ −1/6 ]   (since p ≥ 2/3)

≤ e^{−k / (72p(1−p))} = 2^{−Ω(k)}

The probability of a mistake is thus exponentially small in k. The same reasoning applies
in the case x ∉ L. Further, it is easy to see that by choosing k to be a polynomial in |x|
instead of a constant, we can change the definition of a BPP algorithm so that, instead of
the bound of 1/3 on the probability of a wrong answer, we have a bound of 2^{−q(|x|)}, for a
fixed polynomial q.
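The exponential decay can also be checked numerically. The sketch below computes the exact error probability of the k-fold majority vote when each run is independently correct with probability p = 2/3; this is a binomial tail, which the Chernoff bound approximates from above.

```python
from math import comb

def majority_error(k, p):
    # Exact probability that fewer than half of k independent runs
    # are correct, when each run is correct with probability p.
    # This is the binomial tail Pr[Bin(k, p) < k/2].
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k + 1) if i < k / 2)
```

For a single run (k = 1) the error is exactly 1/3, and it drops quickly as k grows, in line with the 2^{−Ω(k)} bound above.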
Next, we are going to see how the probabilistic complexity classes relate to circuit
complexity classes and specifically prove that the class BPP has polynomial size circuits.

Theorem 6 (Adleman) BPP ⊆ SIZE(n^{O(1)})

Proof: Let L be in the class BPP. Then, by definition and by the amplification argument
above, there is a polynomial time algorithm A and a polynomial p such that for every input
x of length n

Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = wrong answer for x] ≤ 2^{−(n+1)}

This follows from our previous conclusion that the error bound 1/3 can be replaced by
2^{−q(|x|)}. We now fix n and construct a circuit C_n that solves L on inputs of length n.

Claim 7 There is a random sequence r ∈ {0, 1}^{p(n)} such that for every x ∈ {0, 1}^n, A(x, r)
is correct.

Proof: Informally, for each input x the fraction of random sequences r that give the
wrong answer is exponentially small. Therefore, even if these bad sequences are different
for every input x, their total number is still less than the total number of random
sequences. Formally, let us consider the probability, over the choice of r, that the
algorithm gives the right answer for every input. If this probability is greater than 0, then
the claim is proved.

Pr_r[for every x, A(x, r) is correct] = 1 − Pr_r[∃x : A(x, r) is wrong]

The second probability concerns a union of 2^n events, one for each x, and is bounded by
the sum of their probabilities:

Pr_r[for every x, A(x, r) is correct] ≥ 1 − Σ_{x∈{0,1}^n} Pr_r[A(x, r) is wrong]
≥ 1 − 2^n · 2^{−(n+1)}
= 1/2  □
So we have proved that at least half of the random sequences give the correct answer for all
possible inputs x. Therefore, it is straightforward to see that we can simulate the algorithm
A(·, ·), where the first input has length n and the second has length p(n), by a circuit of size
polynomial in n. All we have to do is find a random sequence that is always correct and
hard-wire it into the circuit. Hence, our circuit takes as input only the input x and simulates
A on input x and r, for this fixed r. Of course, this is only an existential proof, since we do
not know how to find this sequence efficiently. □
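The existential argument can be made concrete on a toy example. In the sketch below, everything (the two-bit language and the artificial algorithm) is invented for illustration: each input has at most 1 of 16 random strings that fools the algorithm, so by the union bound a brute-force search must find a string that is correct for all four inputs simultaneously.

```python
def correct(x):
    # Toy language: two-bit strings with even parity.
    return 1 if (x[0] ^ x[1]) == 0 else 0

def toy_alg(x, r):
    # Artificial randomized algorithm: errs exactly when the 4-bit
    # random string r equals the index of x, so its error probability
    # is 1/16 < 2^-(n+1) = 1/8 for every input.
    idx = 2 * x[0] + x[1]
    ans = correct(x)
    return 1 - ans if r == idx else ans

inputs = [(a, b) for a in (0, 1) for b in (0, 1)]
# Brute-force search for a random string that is correct everywhere,
# mirroring the (non-constructive) step of Adleman's proof.
good_rs = [r for r in range(16)
           if all(toy_alg(x, r) == correct(x) for x in inputs)]
```

Here at most 4 of the 16 random strings are bad, so at least 12 "universally good" strings survive, and any one of them could be hard-wired into a circuit.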

In conclusion, let us briefly describe some other relations between complexity classes.
Whether BPP ⊆ NP or not is still an open question. What we know is that it’s unlikely
that NP is contained in BPP, since then by the previous result NP would have polynomial
size circuits and hence by the result of Karp and Lipton the polynomial hierarchy would
collapse.

3 BPP ⊆ Σ2
This result was first shown by Sipser and Gács. Lautemann later gave a much simpler
proof, which we present below.

Lemma 8 If L is in BPP then there is an algorithm A such that for every x,

Pr_r[A(x, r) = right answer] ≥ 1 − 1/(3m),

where the number of random bits |r| = m = |x|^{O(1)} and A runs in time |x|^{O(1)}.

Proof: Let Â be a BPP algorithm for L. Then for every x, Pr_r[Â(x, r) = wrong answer] ≤
1/3, and Â uses m̂(n) random bits, where n = |x|.

Do k(n) repetitions of Â and accept if and only if at least k(n)/2 executions of Â accept.
Call the new algorithm A. Then A uses k(n) · m̂(n) random bits and Pr_r[A(x, r) =
wrong answer] ≤ 2^{−c·k(n)}. We can then find k(n) with k(n) = Θ(log m̂(n)) such that

1/2^{c·k(n)} ≤ 1/(3 · k(n) · m̂(n))  □

Theorem 9 BPP ⊆ Σ2.

Proof: Let L be in BPP and let A be as in Lemma 8. We want to show that

x ∈ L ⇔ ∃y_1, . . . , y_m ∈ {0,1}^m ∀z ∈ {0,1}^m : ⋁_{i=1}^{m} A(x, y_i ⊕ z) = 1

where m is the number of random bits used by A on input x.

Suppose x ∈ L. For any fixed z, the strings y_i ⊕ z are independent and uniformly
distributed over the choice of the y_i, so each event A(x, y_i ⊕ z) = 0 independently has
probability at most 1/(3m). Then

Pr_{y_1,...,y_m}[∃z : A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0]
≤ Σ_{z∈{0,1}^m} Pr_{y_1,...,y_m}[A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0]
≤ 2^m · (1/(3m))^m
< 1.

So

Pr_{y_1,...,y_m}[∀z : ⋁_i A(x, y_i ⊕ z) = 1]
= 1 − Pr_{y_1,...,y_m}[∃z : A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0]
> 0.

So (y_1, . . . , y_m) exists.
Conversely, suppose x ∉ L. Then, for every fixed choice of y_1, . . . , y_m,

Pr_z[⋁_i A(x, y_i ⊕ z) = 1] ≤ Σ_{i=1}^{m} Pr_z[A(x, y_i ⊕ z) = 1]
≤ m · 1/(3m)
= 1/3.

So

Pr_z[A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0] = 1 − Pr_z[⋁_i A(x, y_i ⊕ z) = 1]
≥ 2/3
> 0.

So for every y_1, . . . , y_m ∈ {0,1}^m there is a z such that ⋁_i A(x, y_i ⊕ z) = 0. □

4 The Valiant-Vazirani Reduction


In this section we show the following: suppose there is an algorithm for the satisfiability
problem that always finds a satisfying assignment for formulae that have exactly one
satisfying assignment (and behaves arbitrarily on other instances); then we can get an RP
algorithm for the general satisfiability problem, and so NP = RP.

We prove the result by presenting a randomized reduction that, given in input a CNF
formula φ, outputs a polynomial number of formulae ψ_0, . . . , ψ_n. If φ is satisfiable,
then (with high probability) at least one of the ψ_i is satisfiable and has exactly one satisfying
assignment; if φ is not satisfiable, then (with probability one) all the ψ_i are unsatisfiable.

The idea for the reduction is the following. Suppose φ is a satisfiable formula with n
variables that has about 2^k satisfying assignments, and let h : {0,1}^n → {0,1}^k be a hash
function picked from a family of pairwise independent hash functions: then the average
number of assignments x such that φ(x) is true and h(x) = (0, . . . , 0) is about one. Indeed,
we can prove formally that with constant probability there is exactly one such assignment,¹
and that there is a CNF formula ψ (easily constructed from φ and h) that is satisfied precisely
by that assignment. By doing the above construction for values of k ranging from 0 to n,
we obtain the desired reduction. Details follow.
Definition 6 Let H be a family of functions of the form h : {0,1}^n → {0,1}^m. We say
that H is a family of pairwise independent hash functions if for every two different inputs
x, y ∈ {0,1}^n and for every two possible outputs a, b ∈ {0,1}^m we have

Pr_{h∈H}[h(x) = a ∧ h(y) = b] = 1/2^{2m}

¹For technical reasons, it will be easier to prove that this is the case when picking a hash function
h : {0,1}^n → {0,1}^{k+2}.

Another way to look at the definition is that for every x ≠ y, when we pick h at random,
the random variables h(x) and h(y) are independent and uniformly distributed. In
particular, for every x ≠ y and for every a, b we have Pr_h[h(x) = a | h(y) = b] = Pr_h[h(x) = a].

For m vectors a_1, . . . , a_m ∈ {0,1}^n and m bits b_1, . . . , b_m, define h_{a_1,...,a_m,b_1,...,b_m} :
{0,1}^n → {0,1}^m as h_{a,b}(x) = (a_1 · x + b_1, . . . , a_m · x + b_m), where all operations are
mod 2, and let H_AFF be the family of functions defined this way. Then it is not hard to see
that H_AFF is a family of pairwise independent hash functions.
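Pairwise independence of H_AFF is easy to verify exhaustively for small parameters. The sketch below (illustrative, not part of the construction) enumerates the whole family for n = 2, m = 1 and checks that every pair of output values, for two fixed distinct inputs, occurs for exactly a 1/2^{2m} fraction of the functions.

```python
from itertools import product

def h_affine(rows, bits, x):
    # h(x) = (a_1 . x + b_1, ..., a_m . x + b_m), arithmetic mod 2.
    return tuple((sum(a * xi for a, xi in zip(row, x)) + b) % 2
                 for row, b in zip(rows, bits))

n, m = 2, 1
# The family H_AFF: all choices of m row vectors and m bits.
family = [(rows, bits)
          for rows in product(product((0, 1), repeat=n), repeat=m)
          for bits in product((0, 1), repeat=m)]

x, y = (0, 1), (1, 0)          # two distinct inputs
counts = {}
for rows, bits in family:
    key = (h_affine(rows, bits, x), h_affine(rows, bits, y))
    counts[key] = counts.get(key, 0) + 1
```

Each of the 2^{2m} = 4 output pairs should be hit by exactly |H_AFF|/4 of the 8 functions.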

Lemma 10 Let T ⊆ {0,1}^n be a set such that 2^k ≤ |T| < 2^{k+1} and let H be a family of
pairwise independent hash functions of the form h : {0,1}^n → {0,1}^{k+2}. Then, if we pick
h at random from H, there is a constant probability that there is a unique element x ∈ T
such that h(x) = 0. Precisely,

Pr_{h∈H}[ |{x ∈ T : h(x) = 0}| = 1 ] ≥ 1/8

Proof: Let us fix an element x ∈ T. We want to compute the probability that x is the
unique element of T mapped into 0 by h. Clearly,

Pr_h[h(x) = 0 ∧ ∀y ∈ T − {x}. h(y) ≠ 0] = Pr_h[h(x) = 0] · Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0]

and we know that

Pr_h[h(x) = 0] = 1/2^{k+2}
The difficult part is to estimate the other probability. First, we write

Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0] = 1 − Pr_h[∃y ∈ T − {x}. h(y) = 0 | h(x) = 0]

and then observe that

Pr_h[∃y ∈ T − {x}. h(y) = 0 | h(x) = 0]
≤ Σ_{y∈T−{x}} Pr_h[h(y) = 0 | h(x) = 0]
= Σ_{y∈T−{x}} Pr_h[h(y) = 0]
= (|T| − 1)/2^{k+2}
≤ 1/2

Notice how we used the fact that the value of h(y) is independent of the value of h(x) when
x ≠ y.
Putting everything together, we have

Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0] ≥ 1/2

and so

Pr_h[h(x) = 0 ∧ ∀y ∈ T − {x}. h(y) ≠ 0] ≥ 1/2^{k+3}

To conclude the argument, we observe that the probability that there is a unique element
of T mapped into 0 is given by the sum, over x ∈ T, of the probability that x is the unique
element mapped into 0 (all these events are disjoint, so the probability of their union is the
sum of the probabilities). The probability that a unique element is mapped into 0 is then
at least |T|/2^{k+3} ≥ 1/8. □
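Lemma 10 can also be checked exhaustively for small parameters, using the affine family H_AFF defined in the text. The sketch below takes n = 4, k = 1, and an arbitrary illustrative set T of size 3 (so 2^k ≤ |T| < 2^{k+1}), enumerates every h : {0,1}^4 → {0,1}^3 in the family, and measures how often exactly one element of T is mapped to 0.

```python
from itertools import product

def h_affine(rows, bits, x):
    # h(x) = (a_1 . x + b_1, ..., a_m . x + b_m), arithmetic mod 2.
    return tuple((sum(a * xi for a, xi in zip(row, x)) + b) % 2
                 for row, b in zip(rows, bits))

n, k = 4, 1
m = k + 2
T = [(0, 0, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1)]  # 2^k <= |T| < 2^(k+1)
zero = (0,) * m

vecs = list(product((0, 1), repeat=n))
total = unique = 0
for rows in product(vecs, repeat=m):        # all (2^n)^m row choices
    for bits in product((0, 1), repeat=m):  # all 2^m bit choices
        total += 1
        hits = sum(1 for x in T if h_affine(rows, bits, x) == zero)
        unique += (hits == 1)

frac = unique / total   # Lemma 10 promises frac >= 1/8
```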

Lemma 11 There is a probabilistic polynomial time algorithm that, on input a CNF formula
φ and an integer k, outputs a formula ψ such that
• If φ is unsatisfiable then ψ is unsatisfiable.

• If φ has at least 2^k and less than 2^{k+1} satisfying assignments, then with probability
at least 1/8 the formula ψ has exactly one satisfying assignment.

Proof: Say that φ is a formula over n variables. The algorithm picks at random vectors
a_1, . . . , a_{k+2} ∈ {0,1}^n and bits b_1, . . . , b_{k+2} and produces a formula ψ that is equivalent to
the expression φ(x) ∧ (a_1 · x + b_1 = 0) ∧ · · · ∧ (a_{k+2} · x + b_{k+2} = 0). Admittedly, there is
no compact CNF expression to compute a · x if a has many ones, but we can proceed as
follows: for each i we add auxiliary variables y^i_1, . . . , y^i_n and write a CNF condition
equivalent to (y^i_1 = x_1 ∧ a_i[1]) ∧ (y^i_2 = y^i_1 ⊕ (x_2 ∧ a_i[2])) ∧ · · · ∧ (y^i_n = y^i_{n−1} ⊕ (x_n ∧ a_i[n]) ⊕ b_i),
so that y^i_n computes a_i · x + b_i, together with the clause y^i_n = 0. Then ψ is the AND of the
clauses of φ plus all the above expressions for i = 1, 2, . . . , k + 2.

By construction, the number of satisfying assignments of ψ equals the number of
satisfying assignments x of φ such that h_{a_1,...,a_{k+2},b_1,...,b_{k+2}}(x) = 0. If φ is unsatisfiable then,
for every possible choice of the a_i and b_i, ψ is also unsatisfiable.
If φ has between 2^k and 2^{k+1} satisfying assignments, then Lemma 10 implies that with
probability at least 1/8 there is exactly one satisfying assignment for ψ. □

Theorem 12 (Valiant-Vazirani) Suppose there is a polynomial time algorithm that, on
input a CNF formula having exactly one satisfying assignment, finds that assignment. (We
make no assumption on the behaviour of the algorithm on other inputs.) Then NP = RP.

Proof: It is enough to show that, under the assumption of the theorem, 3SAT has an
RP algorithm.
On input a formula φ, we construct formulae ψ_0, . . . , ψ_n by using the algorithm of Lemma
11 with parameters k = 0, . . . , n. We submit all the formulae ψ_0, . . . , ψ_n to the algorithm in
the assumption of the theorem, and accept if the algorithm finds a satisfying assignment for
at least one of them. If φ is unsatisfiable, then all the formulae are always unsatisfiable, and
so the algorithm accepts with probability zero. If φ is satisfiable, then for some k it has
between 2^k and 2^{k+1} satisfying assignments, and there is a probability of at least 1/8 that
ψ_k has exactly one satisfying assignment and that the algorithm accepts. If we repeat the
above procedure t times, and accept if at least one iteration accepts, then if φ is
unsatisfiable we still accept with probability zero; otherwise we accept with probability at
least 1 − (7/8)^t, which is more than 1/2 already for t = 6. □
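To make the reduction concrete, here is a toy Python sketch. Formulas are represented as arbitrary predicates over n-bit assignments (encoded as integers), and `unique_sat_solver` is a brute-force stand-in for the algorithm assumed in the theorem: it reports an assignment only when it is the unique satisfying one. All names here are illustrative, not a real SAT interface.

```python
import random

def unique_sat_solver(phi, n):
    # Brute-force stand-in for the assumed solver: returns the
    # satisfying assignment when it is unique, otherwise None.
    sols = [x for x in range(2 ** n) if phi(x)]
    return sols[0] if len(sols) == 1 else None

def vv_round(phi, n, rng):
    # One round of the reduction: for each k = 0..n, conjoin phi
    # with the constraint h(x) = 0 for a random affine hash
    # h: {0,1}^n -> {0,1}^(k+2); a_i . x is popcount(a_i & x) mod 2.
    for k in range(n + 1):
        a = [rng.getrandbits(n) for _ in range(k + 2)]
        b = [rng.getrandbits(1) for _ in range(k + 2)]
        psi = lambda x: phi(x) and all(
            (bin(ai & x).count('1') + bi) % 2 == 0
            for ai, bi in zip(a, b))
        x = unique_sat_solver(psi, n)
        if x is not None and phi(x):   # accept only verified answers
            return True
    return False
```

Note that the wrapper checks any returned assignment against φ, so a "yes" answer is never wrong; this is exactly the one-sided error that RP requires.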

5 References
Probabilistic complexity classes were defined in [Gil77]. Adleman's proof that BPP ⊆
SIZE(n^{O(1)}) appears in [Adl78]. Sipser's proof that BPP ⊆ Σ2 appears in [Sip83], and
Lautemann's proof is in [Lau83]. The Valiant-Vazirani result is from [VV86].

References
[Adl78] Leonard Adleman. Two theorems on random polynomial time. In Proceedings
of the 19th IEEE Symposium on Foundations of Computer Science, pages 75–83,
1978.

[Gil77] J. Gill. Computational complexity of probabilistic Turing machines. SIAM Journal
on Computing, 6:675–695, 1977.

[Lau83] C. Lautemann. BPP and the polynomial hierarchy. Information Processing Letters,
17:215–217, 1983.

[Sip83] M. Sipser. A complexity theoretic approach to randomness. In Proceedings of the
15th ACM Symposium on Theory of Computing, pages 330–335, 1983.

[VV86] L.G. Valiant and V.V. Vazirani. NP is as easy as detecting unique solutions.
Theoretical Computer Science, 47:85–93, 1986.

Exercises
1. Prove that ZPP = RP ∩ coRP.

2. Show that if NP ⊆ BPP then NP = RP.

3. Prove that SPACE(O(n^{log n})) ⊈ BPP.

4. Change the assumption of Theorem 12 to having a probabilistic polynomial time
algorithm that, on input a formula with exactly one satisfying assignment, finds that
assignment with probability at least 1/2. Prove that it still follows that NP = RP.
