On Yao's XOR-Lemma
An early version of this survey appeared as TR95-050 of ECCC, and was revised
several times (with the latest revision posted in January 1999). Since the first
publication of this survey, Yao's XOR Lemma has been the subject of intensive
research. The current revision contains a short review of this research (see
Section 7), but the main text (i.e., Sections 1-6) is not updated according to
these subsequent discoveries. The current version also includes a new appendix
(Appendix B), which discusses a variant of the XOR Lemma, called the Selective
XOR Lemma.
1 Introduction
Yao stated the XOR Lemma in the context of one-way functions, where the
predicate f is the composition of an easy-to-compute Boolean predicate and the
inverse of the one-way function (i.e., f(x) = b(g^{-1}(x)), where g is a 1-1 one-way
function and b is an easy-to-compute predicate). Clearly, this is a special
case of the setting described above. Yet, the XOR Lemma is sometimes used
within the more general setting (under the false assumption that proofs for this
setting have appeared in the literature). Furthermore, contrary to common
belief, the lemma itself did not appear in Yao's original paper "Theory and
Applications of Trapdoor Functions" [17], but rather in oral presentations of his
work.
A proof of Yao's XOR Lemma first appeared in Levin's paper [12]. Levin's
proof is for the context of one-way functions and is carried through in a uniform
model of complexity. The presentation of this proof in [12] is very succinct and
does not decouple the basic approach from the difficulties arising from the uniform-complexity
model. In Section 3, we show that Levin's basic approach suffices
for the general case (mentioned above), provided it is stated in terms of non-uniform
complexity. The proof also extends to a uniform-complexity setting,
provided that some sampling condition (which is satisfied in the context of one-way
functions) holds. We do not know whether the XOR Lemma holds in the
uniform-complexity model in case this sampling condition is not satisfied.
Recently, Impagliazzo has shown that, in the non-uniform model, any weakly-unpredictable
predicate has a hard-core (see Footnote 1) on which it is almost unpredictable [7].
Using this result, Impagliazzo has presented an alternative proof for the general
case of the XOR-Lemma (within the non-uniform model). We present this proof
in Section 4.
A third proof for the general case of the XOR-Lemma is presented in Sec-
tion 5. This proof proceeds by first proving that a function constructed by con-
catenating the values of the predicate on several independent instances is much
more unpredictable, with respect to specified complexity bounds, than the original
predicate. Loosely speaking, it is hard to predict the value of the function
with probability substantially higher than p^t, where p is a bound on the probability
of predicting the predicate and t is the number of instances concatenated.
Not surprisingly, this statement turns out to be easier to prove than the
XOR-Lemma. Using a result of Goldreich and Levin [5] and some elementary
observation, we derive the XOR-Lemma.
We remark that Levin's proof yields a stronger quantitative statement of
the XOR Lemma than the other two proofs. In fact, the quantitative statement
provided by Levin's proof is almost optimal. Both Levin's proof and our proof
can be transformed to the uniform-complexity setting, provided some natural sampling
condition holds. We do not know how to transform Impagliazzo's proof to the
uniform-complexity setting, even under this condition.
Footnote 1: Here the term "hard-core" means a subset of the predicate's domain. This meaning
is certainly different from the usage of the term "hard-core" in [5], where it means a
strongly-unpredictable predicate associated with a one-way function.
2 Formal Setting
where the expectation is taken over the random variable Xn (and the process P).
We say that a complexity class (i.e., a set of circuit families) has correlation at
most c(·) with P over X if, for every circuit family C in this class, the correlation
of C with P over X is bounded by c(·).
The foregoing definition may be used to discuss both uniform and non-uniform
complexity classes. In the next subsection we relate Definition 1 to the standard
treatment of unpredictability within the context of one-way functions.
where x1, ..., x_{t(n)} ∈ {0,1}^n, and let X^{(t)} def= {Xn^{(t)}} be a probability ensemble
such that Xn^{(t)} consists of t(n) independent copies of Xn.
(hypothesis) Let s : N → N be a size function, and ε : N → [−1,+1] be a function
that is bounded-away-from-1 (i.e., |ε(n)| < 1 − 1/p(n), for some polynomial
p and all sufficiently large n's). Suppose that ε is an upper bound on the
correlation of families of s(·)-size circuits with P over X.
(conclusion) Then, there exists a bounded-away-from-1 function ε̄ : N → [−1,+1]
and a polynomial p such that, for every function t : N → N and every function
δ : N → [0,1], the function

    ε^{(t)}(n) def= p(n) · ε̄(n)^{t(n)} + δ(n)
Footnote 4: This assertion refers to what was known at the time this survey was written. As
noted in Section 7, the situation regarding this issue has changed recently.
Steven Rudich has observed that the Dream Version does not hold in a relativized
world. Specifically, his argument proceeds as follows. Fix ε as in the Dream
Version and set t such that ε^{(t)}(n) < 2^{−n}. Consider an oracle that, for every
(x1, ..., x_{t(n)}) ∈ ({0,1}^n)^{t(n)} and for a 2^{−n} fraction of the r's in {0,1}^n, answers
the query (x1, ..., x_{t(n)}, r) with (P(x1), ..., P(x_{t(n)})); otherwise the oracle answers
with a special symbol. These r's may be selected at random (thus constructing
a random oracle). The hypothesis of the lemma may hold relative to this oracle,
but the conclusion cannot possibly hold. Put differently, one can argue that
there is no (polynomial-time) black-box reduction of the task of correlating P
(by at least ε) to the task of correlating P^{(t)} (by at least ε^{(t)}). The reason being
that the polynomial-time machine (effecting this reduction) cannot distinguish a
black-box of negligible correlation (i.e., correlation 2^{−n}) from a black-box of zero
correlation.
as well. These statements relate to the time required to construct the circuits
in the hypothesis and those in the conclusion. For example, one may refer to
circuit families, {Cn}, for which, given n, the circuit Cn can be constructed in
poly(|Cn|)-time. In addition, all functions referred to in the statement of the
lemma (i.e., s, t : N → N, ε : N → [−1,+1] and ε̄ : N → [−1,+1]) need to be
computable within corresponding time bounds. Such analogues of the first two
versions can be proven, provided that one can construct random examples of the
distribution (Xn, P(Xn)) within the stated (uniform) complexity bounds (and
in particular in polynomial-time). See Section 2.3 as well as comments in the
subsequent sections.
3 Levin's Proof
The key ingredient in Levins proof is the following lemma, which provides an
accurate account of the decrease of the computational correlation in the case
that two predicates are xor-ed together. It should be stressed that the statement
of the lemma is intentionally asymmetric with respect to the two predicates.
Assume, towards the contradiction, that a circuit family C = {Cn} (of size s(·)) has
correlation greater than ε(·) with P over X. Thus, denoting by Yl (resp., Zm)
the projection of Xn on the first l def= l(n) bits (resp., last m def= n − l(n) bits),
we get

    ε(n) < E[Cn(Xn) · P(Xn)] = E[ E[Cn(Yl, Zm) · P1(Yl) · P2(Zm)] ],

where, in the last expression, the outer expectation is over Yl and the inner one
is over Zm. For every fixed y ∈ {0,1}^l, let

    T(y) def= E[Cn(y, Zm) · P2(Zm)].    (1)

Then, the foregoing can be rewritten as

    E[P1(Yl) · T(Yl)] > ε(n).    (2)

We shall see that Eq. (2) either contradicts the hypothesis concerning P2 (see
Claim 4.1) or contradicts the hypothesis concerning P1 (by a slightly more involved
argument).
Claim 4.1: For all but finitely many n's and every y ∈ {0,1}^l,

    |T(y)| ≤ 2 · ε2(m).
Proof: Otherwise, fixing a y that contradicts the claim, we get a circuit C'm(z) def=
Cn(y, z) of size s(n) + l < s2(m), having greater correlation with P2 than that
allowed by the lemma's hypothesis.
By Claim 4.1, the value T(y)/2ε2(m) lies in the interval [−1,+1]; while, on the
other hand (by Eq. (2)), it (i.e., T(·)/2ε2(m)) has good correlation with P1. In
the rest of the argument we transform the function T into a circuit which
contradicts the hypothesis concerning P1. Suppose, for a moment, that one could
compute T(y) on input y. Then, one would get an algorithm with output in
[−1,+1] that has correlation at least ε(n)/2ε2(m) > ε1(l) with P1 over Yl, which
is almost in contradiction to the hypothesis of the lemma (see Footnote 6). The same holds if
one can approximate T(y) well enough using circuits of size s1(l). Indeed, the
lemma follows by observing that such an approximation is possible. Namely:
Claim 4.2: For every n, l = l(n), m = n − l, q = poly(n/ε(n)) and y ∈ {0,1}^l, let

    T̃(y) def= (1/q) · Σ_{i=1}^{q} Cn(y, zi) · σi,

where (z1, σ1), ..., (zq, σq) is a sequence of q independent samples from the distribution
(Zm, P2(Zm)). Then,
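The estimator of Claim 4.2 is a plain Monte Carlo average over random labeled examples. The following toy sketch (with a hypothetical ±1 predicate P2 and a hypothetical "circuit" C over 8-bit strings, neither taken from the survey) illustrates how the empirical average T̃(y) approaches the exact correlation T(y):

```python
import random

random.seed(0)

m = 8  # toy input length for the "second half" z (hypothetical choice)

def P2(z):
    # a toy +/-1 predicate: +1 iff z has even parity
    return 1 if bin(z).count("1") % 2 == 0 else -1

def C(y, z):
    # a toy +/-1 "circuit", agreeing with P2 on 3/4 of the z's
    return P2(z) if (z % 4 != 0) else -P2(z)

def exact_T(y):
    # T(y) = E[C(y, Z) * P2(Z)] over uniform Z in {0,1}^m
    return sum(C(y, z) * P2(z) for z in range(2 ** m)) / 2 ** m

def estimate_T(y, q):
    # T~(y) = (1/q) * sum_i C(y, z_i) * sigma_i, with (z_i, sigma_i)
    # drawn as q independent labeled examples of (Z, P2(Z))
    total = 0
    for _ in range(q):
        z = random.randrange(2 ** m)
        sigma = P2(z)
        total += C(y, z) * sigma
    return total / q

y = 3
print(abs(estimate_T(y, q=20000) - exact_T(y)) < 0.05)
```

With q = poly(n/ε(n)) samples, the additive error is small with overwhelming probability, which is exactly what the claim needs.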
4 Impagliazzo's Proof
    Prob[Cn(Xn) = f(Xn)] ≤ 1/2 + ε(n),

where Xn is a random variable uniformly distributed on Sn.
We say that f has a hard-core (region) of density ρ(·) with respect to s(·)-size
circuit families and advantage ε(·) if there exists a sequence of sets
S = {Sn ⊆ {0,1}^n} such that S is a hard-core of f with respect to the above
and |Sn| ≥ ρ(n) · 2^n.
We stress that the usage of the term hard-core in the above definition (and in
the rest of this section) is different from the usage of this term in [5]. Observe
that every strongly-unpredictable predicate has a hard-core of density 1 (i.e.,
the entire domain itself). Impagliazzo proves that also weakly-unpredictable
predicates have hard-core sets whose density is related to the amount of unpredictability.
Namely:
[0,1] be a noticeable function (i.e., δ(n) > 1/poly(n)), such that for every n and
every circuit Cn of size at most s(n) it holds that

    Prob[Cn(Un) = f(Un)] ≤ 1 − δ(n),

where Un is a random variable uniformly distributed on {0,1}^n. Then, for every
function ε : N → [0,1], the function f has a hard-core of density ρ(·) with respect
to s'(·)-size circuit families and advantage ε(·), where ρ(n) def= (1 − o(1)) · δ(n)
and s'(n) def= s(n)/poly(n/ε(n)).
    ρ(n) = (1 − o(1)) · (1 − ε(n))/2 > (1 − ε(n))/3.
Now, suppose that, in contradiction to the XOR Lemma, the predicate F^{(t)} defined
as F^{(t)}(x1, ..., xt) def= ⊕_i f(xi) can be correlated by small circuits with
correlation greater than c(n) def= 2 · ((2 + ε(n))/3)^t + δ(n). In other words, such circuits
can guess F^{(t)} with success probability at least 1/2 + c(n)/2. However, the
probability that none of the t arguments to F^{(t)} falls in the hard-core is at most
(1 − ρ(n))^t. Thus, conditioned on the event that at least one argument falls in
the hard-core S, the circuit guesses F^{(t)} correctly with probability at least

    1/2 + c(n)/2 − (1 − ρ(n))^t > 1/2 + δ(n)/2.

Note, however, that this does not seem to yield an immediate contradiction to
the definition of a hard-core of f, yet we shall see that such a contradiction can
be derived.
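The displayed inequality is a simple calculation: since ρ(n) > (1 − ε(n))/3, we have (1 − ρ(n))^t < ((2 + ε(n))/3)^t, and the first term of c(n)/2 swallows this loss. A quick numeric sanity check with illustrative sample values (not taken from the survey):

```python
# Numeric check of the key inequality: with rho > (1 - eps)/3 and
# c(n) = 2*((2 + eps)/3)**t + delta, conditioning on at least one argument
# falling in the hard-core leaves success probability above 1/2 + delta/2.
eps, delta, t = 0.9, 0.01, 100     # illustrative sample values only
rho = (1 - eps) / 2                # density promised by the hard-core lemma
assert rho > (1 - eps) / 3

c = 2 * ((2 + eps) / 3) ** t + delta
lhs = 0.5 + c / 2 - (1 - rho) ** t  # conditional success probability bound
print(lhs > 0.5 + delta / 2)
```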
For every non-empty I ⊆ {1, ..., t}, we consider the event, denoted EI, that
represents the case that the arguments to F^{(t)} that fall in the hard-core of f are
exactly those with index in I. We have just shown that, conditioned on the union
of these events, the circuit guesses the predicate F^{(t)} correctly with probability
at least 1/2 + δ(n)/2. Thus, there exists a (non-empty) I such that, conditioned
on EI, the circuit guesses F^{(t)} correctly with probability at least 1/2 + δ(n)/2. Let
i ∈ I be arbitrary. By another averaging argument, we fix all inputs to the circuit
except the i-th input and obtain a circuit that guesses f correctly with probability
at least 1/2 + δ(n)/2. (For these fixed xj's, j ≠ i, the circuit incorporates also the
value of ⊕_{j≠i} f(xj).) This contradicts the hypothesis that S is a hard-core.
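The last step, fixing all inputs except the i-th and incorporating the known parity ⊕_{j≠i} f(xj) into the circuit, can be sketched concretely. The predicate f and the guesser for F^{(t)} below are hypothetical stand-ins (the guesser is taken to be perfect purely for illustration):

```python
from functools import reduce

t = 5
i = 2                                 # coordinate left free (0-indexed)

def f(x):
    # toy Boolean predicate (hypothetical stand-in)
    return (x * 2654435761) >> 31 & 1

def F_t(xs):
    # F^{(t)} = XOR of f over all t coordinates
    return reduce(lambda a, b: a ^ b, (f(x) for x in xs))

def guess_F(xs):
    # assumed guesser for F^{(t)}; here perfect, for illustration
    return F_t(xs)

fixed = [11, 22, None, 44, 55]        # x_j fixed for every j != i
known_parity = reduce(lambda a, b: a ^ b,
                      (f(x) for j, x in enumerate(fixed) if j != i))

def guess_f(x):
    # plug x into the free coordinate, then XOR out the known parity
    xs = list(fixed)
    xs[i] = x
    return guess_F(xs) ^ known_parity

print(all(guess_f(x) == f(x) for x in range(200)))
```

The derived guesser inherits the success probability of the original one on the conditional space, which is what the averaging argument exploits.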
Generalization. We have just established the validity of Lemma 1 for the case
of the uniform probability ensemble and parameters p(n) = 2 and ε̄(n) = (2 + ε(n))/3.
The bound for ε̄ can be improved to ε̄(n) = (1 + ε(n))/2 + o(1 − ε(n)). The argument
extends to arbitrary probability ensembles. To this end one needs to properly
generalize Definition 2 and prove a generalization of Lemma 6; for details, the
interested reader is referred to Appendix A.
    Prob[C(Xn) = P(Xn)] ≤ p(n) def= (1 + ε(n))/2.

(conclusion) Then, for every function δ : N → [0,+1], for every n and for every
poly(δ(n)/n) · s(n)-size circuit C', it holds that
Remark. Nisan et al. [14] have used the XOR-Lemma in order to derive the
Concatenation Lemma. Our feeling is that the Concatenation Lemma is more
basic than the XOR Lemma, and thus that their strategy is not very natural (see Footnote 8).
In fact, this feeling was our motivation for trying to find a direct proof for
Footnote 8: This assertion is supported by a recent work of Viola and Wigderson, which provides
a very simple proof that, in the general setting, the XOR Lemma implies the
Concatenation Lemma [16, Prop. 1.4].
the Concatenation Lemma. Extrapolating from the situation regarding the two
original lemmata of Yao (i.e., the XOR Lemma and the Concatenation Lemma
w.r.t. one-way functions; see Footnote 9), we believed that such a proof (for the Concatenation
Lemma) should be easy to find. Indeed, we consider the following proof of the Concatenation
Lemma much simpler than the proofs of the XOR Lemma (given in
previous sections).
for F1(x)) and likewise C2(x, y) is C's guess for F2(y). It is instructive to write
the success probability of C as follows:
The basic idea is that the hypothesis regarding F2 allows us to bound the
first factor by p2(m), whereas the hypothesis regarding F1 allows us to bound the
second factor by approximately p1(l). The basic idea for the latter step is that a
sufficiently large sample of (Z, F2(Z)), which may be hard-wired into the circuit,
allows us to use the conditional probability space (in such a circuit), provided the
condition holds with noticeable probability. The last caveat motivates a separate
treatment for y's with noticeable Prob[C2(y, Z) = F2(Z)] and for the rest.
We call y good if Prob[C2(y, Z) = F2(Z)] ≥ ε/2 and bad otherwise. Let G be
the set of good y's. Then, using Prob[C(y, Z) = F(y, Z)] < ε/2 for every bad y,
we upper bound the success probability of C as follows
We proceed according to the foregoing outline. We first show that Prob[C2(Y, Z) = F2(Z)]
cannot be too large, as otherwise the hypothesis concerning F2 is violated. Actually,
we prove the following.
Claim 8.1: For every y, it holds that

    Prob[C2(y, Z) = F2(Z)] ≤ p2(m).

Proof: Otherwise, using any y ∈ {0,1}^l such that Prob[C2(y, Z) = F2(Z)] >
p2(m), we get a circuit C'(z) def= C2(y, z) that contradicts the lemma's hypothesis
concerning F2.
Next, we use Claim 8.1 in order to relate the success probability of C to the
success probability of small circuits for F1.
Claim 8.2: There exists a circuit C' of size s1(l) such that
and let C'(y) def= C1(y, z), where (z, σ) is uniformly selected among the elements
of S for which C2(y, z) = σ holds. Details follow.
Let S be a sequence of t def= poly(n/ε) pairs, generated by taking t independent
samples from the distribution (Z, F2(Z)). We stress that we do not
assume here that such a sample can be produced by an efficient (uniform) algorithm
(but, jumping ahead, we remark that such a sequence can be fixed
non-uniformly). For each y ∈ G ⊆ {0,1}^l, we denote by Sy the set of pairs
(z, σ) ∈ S for which C2(y, z) = σ. Note that Sy is a random sample of the residual
probability space defined by (Z, F2(Z)) conditioned on C2(y, Z) = F2(Z).
Also, with overwhelmingly high probability, |Sy| = Ω(l/ε²) (since y ∈ G implies
Prob[C2(y, Z) = F2(Z)] ≥ ε/2). Thus, with overwhelming probability (i.e., probability
greater than 1 − 2^{−l}), taken over the choices of S, the sample Sy provides
a good approximation to the conditional probability space, and in particular

    |{(z, σ) ∈ Sy : C1(y, z) = F1(y)}| / |Sy| ≥ Prob[C1(y, Z) = F1(y) | C2(y, Z) = F2(Z)] − ε/2.    (4)
Thus, with positive probability, Eq. (4) holds for all y ∈ G ⊆ {0,1}^l. The circuit
C' guessing F1 is now defined as follows. A set S = {(zi, σi)} satisfying Eq. (4) for
all good y's is hard-wired into the circuit C'. (In particular, Sy is not empty
for any good y.) On input y, the circuit C' first determines the set Sy, by running
C2 for t times and checking, for each i = 1, ..., t, whether C2(y, zi) = σi. In case
Sy is empty, the circuit returns an arbitrary value. Otherwise, the circuit selects
uniformly a pair (z, σ) ∈ Sy and outputs C1(y, z). (This latter random choice
can be eliminated by a standard averaging argument.) Using the definition of C'
and Eq. (4), we get
    Prob[C'(Y) = F1(Y)] ≥ Σ_{y∈G} Prob[Y = y] · Prob[C'(y) = F1(y)]
        = Σ_{y∈G} Prob[Y = y] · |{(z, σ) ∈ Sy : C1(y, z) = F1(y)}| / |Sy|
        ≥ Σ_{y∈G} Prob[Y = y] · ( Prob[C1(y, Z) = F1(y) | C2(y, Z) = F2(Z)] − ε/2 )
        ≥ Σ_{y∈G} Prob[Y = y] · Prob[C(y, Z) = F(y, Z)] / Prob[C2(y, Z) = F2(Z)] − ε/2.
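The conditional-sampling idea behind Claim 8.2 can be simulated: hard-wire a sample S of labeled pairs (z, F2(z)), restrict it to the pairs on which C2(y, ·) agrees with the label, and read off the conditional success probability of C1(y, ·) from that residual sample. The predicates and "circuits" below are hypothetical toy stand-ins:

```python
import random

random.seed(1)

m = 10
def F2(z): return z & 1                    # toy predicate on z (hypothetical)
def F1(y): return y & 1                    # toy predicate on y (hypothetical)
def C2(y, z):                              # imperfect guesser for F2
    return F2(z) if (z % 3) else 1 - F2(z)
def C1(y, z):                              # imperfect guesser for F1
    return F1(y) if ((y ^ z) % 5) else 1 - F1(y)

# Hard-wired sample S of t labeled examples (z, F2(z)).
t = 5000
S = [(z, F2(z)) for z in (random.randrange(2 ** m) for _ in range(t))]

y = 7
# Residual sample S_y: the pairs on which C2(y, .) agrees with the label,
# i.e. a sample of the space conditioned on C2(y, Z) = F2(Z).
Sy = [(z, s) for (z, s) in S if C2(y, z) == s]

# Empirical conditional success of C1(y, .), read off from S_y ...
emp = sum(C1(y, z) == F1(y) for (z, _) in Sy) / len(Sy)

# ... versus the true conditional probability over the whole domain.
both = sum(C1(y, z) == F1(y) and C2(y, z) == F2(z) for z in range(2 ** m))
cond = sum(C2(y, z) == F2(z) for z in range(2 ** m))
print(abs(emp - both / cond) < 0.05)
```

This is exactly the sense in which Eq. (4) says the hard-wired sample "provides a good approximation to the conditional probability space".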
This is almost what we need, but not quite (what we need is a statement concerning
S = {1, ..., t(n)}). The gap is easily bridged by some standard padding
trick. For example, by using a sequence of fixed pairs (zi, σi), such that σi =
P(zi), we reduce the computation of ⊕_{i∈S} P(xi) to the computation of ⊕_{i=1}^{t(n)} P(yi)
by setting yi = xi if i ∈ S and yi = zi otherwise. (See Appendix B for more
details.) Thus, Lemma 1 follows (with the stated parameters).
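The padding trick is mechanical, and a short sketch makes it concrete. Here f is a hypothetical stand-in for P, and the solver for the full parity is taken to be perfect purely for illustration:

```python
from functools import reduce

def xor(bits):
    return reduce(lambda a, b: a ^ b, bits, 0)

def f(x):
    # toy predicate, a hypothetical stand-in for P
    return (x * 40503) >> 7 & 1

t = 8
# Fixed pairs (z_i, sigma_i) with sigma_i = P(z_i), hard-wired non-uniformly.
zs = [101, 202, 303, 404, 505, 606, 707, 808]
sigmas = [f(z) for z in zs]

def selective_parity(xs, S):
    # target: XOR_{i in S} P(x_i)
    return xor(f(xs[i]) for i in S)

def full_parity_solver(ys):
    # assumed solver for XOR_{i=1}^{t} P(y_i) (perfect, for illustration)
    return xor(f(y) for y in ys)

def reduce_to_full(xs, S):
    # y_i = x_i if i in S, else y_i = z_i; then XOR out the known sigma_i's
    ys = [xs[i] if i in S else zs[i] for i in range(t)]
    correction = xor(sigmas[i] for i in range(t) if i not in S)
    return full_parity_solver(ys) ^ correction

xs = [3, 14, 15, 92, 65, 35, 89, 79]
S = {0, 2, 5}
print(reduce_to_full(xs, S) == selective_parity(xs, S))
```

Whenever the full-parity guess is correct, the corrected output equals ⊕_{i∈S} P(xi), so the correlation is preserved.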
This holds because otherwise one may sample Y with the aim of finding a y such
that Prob[C2(y, Z) = F2(Z)] > p2(m) holds, and then use this y to construct
(uniformly!) a circuit that contradicts the hypothesis concerning F2. Next, we
prove a weaker version of Claim 8.2 by observing that, for a uniformly selected
pair sequence S, with overwhelmingly high probability (and not only with positive
probability), Eq. (4) holds for all good y ∈ {0,1}^l. Thus, if we generate
S by taking random samples from the distribution (Zm, F2(Zm)), then with
overwhelmingly high probability we end up with a circuit as required by the
modified claim. (The modified claim has p2(m) + ε/8 in the denominator (rather
than p2(m)) as well as an extra additive term of ε/8.) Using the hypothesis
concerning F1, we are done as in the non-uniform case.
We consider all possible settings of all variables, except X^{(i)}, and bound the
conditional entropy under this setting (which does not affect X^{(i)}). The fixed
values X^{(j)} = xj can be eliminated from the entropy condition and incorporated
into the circuit. However, fixing some of the inputs in the circuit C yields a
circuit also in C, and so we can apply the proposition's hypothesis and get
7 Subsequent Work
Since the first publication of this survey, Yao's XOR Lemma has been the subject
of intensive research. Here we only outline three themes that were pursued, while
referring the interested reader to [10] and the references therein.
Derandomization. A central motivation for Impagliazzo's work [7, 8] has been the
desire to present derandomized versions of the XOR Lemma; that is, predicates
that use their input in order to define a sequence of related instances,
and take the XOR of the original predicate on these instances (see Footnote 12). The potential
benefit in such a construction is that the hardness of the resulting predicate is
related to shorter inputs (i.e., the seed of a generator of a t-long sequence of n-bit
long strings, rather than the t·n-bit long sequence itself). Indeed, Impagliazzo's
work [7, 8] presented such a construction (based on a pairwise independent generator),
and left the question of providing a full derandomization (that uses a
seed of length O(n) to generate t instances) to subsequent work. This goal was
achieved by Impagliazzo and Wigderson [11] by using a generator that combines
Impagliazzo's generator [7, 8] with a new generator, which in turn combines an
expander-walk generator with the Nisan-Wigderson generator [15].
Avoiding the use of random examples. As pointed out in Section 2.3, all proofs
presented in this survey make essential use of random examples. For more
than a decade, this feature stood in the way of a general uniform version of
the XOR Lemma (i.e., all uniform proofs assumed access to such random examples).
This barrier was lifted by Impagliazzo, Jaiswal, and Kabanets [9], whose work
culminated in the comprehensive treatment of [10]. The latter work provides
simplified, optimized, and derandomized versions of the XOR and Concatenation
Lemmas (see Footnote 13). The key idea is to use the hypothetical solver of the concatenated
problem in order to obtain a sequence of random examples that are all good
with noticeable probability. An instance of the original problem is then solved
by hiding it in a random sequence that has a fair intersection with the initial
Footnote 12: That is, the predicate consists of an instance generator and multiple applications
of the original predicate, P. Specifically, on input an s-bit long seed, denoted y, the
generator produces a t-long sequence of n-bit long strings (i.e., (x1, ..., xt) ← G(y)),
and the value of the new predicate is defined as the XOR of the values of P on these
t strings (i.e., ⊕_{i=1}^{t} P(xi)).
Footnote 13: The focus of [10] is actually on the Concatenation Lemma, which is currently called
the Direct Product Theorem. See next paragraph regarding the relation to the XOR
Lemma.
Acknowledgement
We wish to thank Mike Saks for useful discussions regarding Levin's proof of the
XOR Lemma. We also thank Salil Vadhan and Ronen Shaltiel for pointing out
errors in previous versions, and for suggesting ways to fix these errors.
References
1. O. Goldreich. Foundations of Cryptography – Class Notes. Spring 1989, Computer
Science Department, Technion, Haifa, Israel.
2. O. Goldreich. Foundations of Cryptography – Fragments of a Book. February
1995. Available from ECCC.
3. O. Goldreich. Foundations of Cryptography: Basic Tools. Cambridge University
Press, 2001.
4. O. Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge
University Press, 2008.
5. O. Goldreich and L.A. Levin. A Hard-Core Predicate for all One-Way Functions.
In 21st STOC, pages 25–32, 1989.
6. J. Håstad, R. Impagliazzo, L.A. Levin and M. Luby. A Pseudorandom Generator
from any One-way Function. SICOMP, Volume 28, Number 4, pages 1364–1396,
1999. Combines papers of Impagliazzo et al. (21st STOC, 1989) and Håstad
(22nd STOC, 1990).
7. R. Impagliazzo, manuscript 1994. See [8], which appeared after our first posting.
8. R. Impagliazzo. Hard-core Distributions for Somewhat Hard Problems. In 36th
FOCS, pages 538–545, 1995. This is a later version of [7].
9. R. Impagliazzo, R. Jaiswal, and V. Kabanets. Approximately List-Decoding
Direct Product Codes and Uniform Hardness Amplification. In 47th FOCS,
pages 187–196, 2006.
10. R. Impagliazzo, R. Jaiswal, V. Kabanets, and A. Wigderson. Uniform Direct
Product Theorems: Simplified, Optimized, and Derandomized. SIAM J. Comput.,
Vol. 39 (4), pages 1637–1665, 2010. Preliminary version in 40th STOC,
2008.
Then, for every function ε : N → [0,1], the function f has a hard-core of density
ρ(·) relative to X with respect to s'(·)-size circuit families and advantage ε(·),
where ρ(n) def= (1 − o(1)) · δ(n) and s'(n) def= s(n)/poly(n/ε(n)).
In this case we also say that Y is dominated by X. We say that Y is critically dominated
by X if for every string α either Prob[Yn = α] = (1/ρ(|α|)) · Prob[Xn = α]
or Prob[Yn = α] = 0. (Actually, to avoid trivial difficulties, we allow at most one
string α ∈ {0,1}^n such that 0 < Prob[Yn = α] < (1/ρ(|α|)) · Prob[Xn = α].)
The notions of domination and critical domination play central roles in the
following proof, which consists of two parts. In the first part (cf., Claim 12.1),
we prove the existence of an ensemble dominated by X such that f is strongly
unpredictable under this ensemble. In the second part (cf., Claims 12.2 and 12.3),
we essentially prove that the existence of such a dominated ensemble implies the
existence of an ensemble that is critically dominated by X such that f is strongly
unpredictable under the latter ensemble. However, such a critically dominated
ensemble defines a hard-core of f relative to X, and the lemma follows. Before
starting, we make the following simplifying assumptions (used in Claim 12.3).
Simplifying assumptions: Without loss of generality, the following two conditions
hold:
1. log₂ s(n) ≤ n.
(Otherwise the hypothesis of the lemma cannot hold.)
2. Prob[Xn = x] < poly(n)/s(n), for all x's.
(This assumption is justified since x's violating this condition cannot contribute
to the hardness of f with respect to Xn, because one can incorporate
all these s(n)/poly(n) many violating x's, with their corresponding f(x)'s,
into the circuit.)
Claim 12.1: Under the hypothesis of the lemma, there exists a probability
ensemble Y = {Yn} such that Y is dominated by X and, for every
s'(n)-size circuit Cn, it holds that

    Prob[Cn(Yn) = f(Yn)] ≤ 1/2 + ε(n)/2.    (5)
Proof (see Footnote 14): We start by assuming, towards the contradiction, that for every distribution
Yn that is dominated by Xn there exists an s'(n)-size circuit Cn such
that Prob[Cn(Yn) = f(Yn)] > 0.5 + ε'(n), where ε'(n) def= ε(n)/2. One key observation
is that there is a correspondence between the set of all distributions that
are each dominated by Xn and the set of all convex combinations of critically
dominated (by Xn) distributions; that is, each dominated distribution is a
convex combination of critically dominated distributions and vice versa. Thus,
considering an enumeration Yn^{(1)}, ..., Yn^{(t)} of all the critically dominated (by Xn)
distributions, we conclude that, for every distribution (or convex combination)
π on [t], there exists an s'(n)-size circuit Cn such that

    Σ_{i=1}^{t} π(i) · Prob[Cn(Yn^{(i)}) = f(Yn^{(i)})] > 0.5 + ε'(n).    (6)

Footnote 14: The current text was revised following the revision in [4, Sec. 7.2.2.1].
Now, consider a finite game between two players, where the first player selects
a critically dominated (by Xn) distribution, and the second player selects an
s'(n)-size circuit and obtains a payoff as determined by the corresponding success
probability; that is, if the first player selects the i-th critically dominated
distribution and the second player selects the circuit C, then the payoff equals
Prob[C(Yn^{(i)}) = f(Yn^{(i)})]. Taking this perspective, Eq. (6) means that, for any
randomized strategy for the first player, there exists a deterministic strategy for
the second player yielding average payoff greater than 0.5 + ε'(n). The Min-Max
Principle asserts that, in such a case, there exists a randomized strategy for
the second player that yields average payoff greater than 0.5 + ε'(n) no matter
what strategy is employed by the first player. This means that there exists a
distribution, denoted Dn, on s'(n)-size circuits such that for every i it holds
that

    Prob[Dn(Yn^{(i)}) = f(Yn^{(i)})] > 0.5 + ε'(n),    (7)

where the probability refers both to the choice of the circuit Dn and to the
random variable Yn^{(i)}. Let Bn def= {x : Prob[Dn(x) = f(x)] ≤ 0.5 + ε'(n)}. Then,
Prob[Xn ∈ Bn] < δ(n), because otherwise we reach a contradiction to Eq. (7) by
defining Yn such that Prob[Yn = x] = Prob[Xn = x]/Prob[Xn ∈ Bn] if x ∈ Bn and
Prob[Yn = x] = 0 otherwise (see Footnote 15). By employing standard amplification to Dn, we
obtain a distribution D'n over poly(n/ε'(n)) · s'(n)-size circuits such that for every
x ∈ {0,1}^n \ Bn it holds that Prob[D'n(x) = f(x)] > 1 − 2^{−n}. It follows that there
exists an s(n)-sized circuit Cn such that Cn(x) = f(x) for every x ∈ {0,1}^n \ Bn,
which implies that Prob[Cn(Xn) = f(Xn)] ≥ Prob[Xn ∈ {0,1}^n \ Bn] > 1 − δ(n),
in contradiction to the theorem's hypothesis. The claim follows.
independently of the choices made for all other strings. Note that the inequality
holds because X dominates Y.
First we show that, with overwhelmingly high probability over the choice of
Rn, it holds that Prob[Xn ∈ Rn] ≈ ρ(n).
Claim 12.2: Let γ > 0 and suppose that Prob[Xn = x] ≤ γ² · ρ(n)/poly(n), for
every x. Then, for all but at most a 2^{−poly(n)} measure of the choices of Rn, it
holds that

    |Prob[Xn ∈ Rn] − ρ(n)| < γ · ρ(n).
Proof: For every x ∈ {0,1}^n, let wx def= Prob[Xn = x]. We define random variables
ζx = ζx(Rn), over the probability space defined by the random choices of Rn,
such that ζx indicates whether x ∈ Rn; that is, the ζx's are independent of one
another, and Prob[ζx = 1] = p(x) (and ζx = 0 otherwise). Thus, for every possible
choice of Rn, it holds that

    Prob[Xn ∈ Rn] = Σ_x ζx(Rn) · wx,

and consequently we are interested in the behaviour of the sum Σ_x wx · ζx as a
random variable (over the probability space of all possible choices of Rn). Taking
expectation (over the possible choices of Rn), we get
    E[Σ_x wx · ζx] = Σ_x p(x) · wx
        = Σ_x (ρ(n) · Prob[Yn = x] / Prob[Xn = x]) · Prob[Xn = x]
        = ρ(n).

Finally, applying a (multiplicative) Chernoff bound and using the claim's hypothesis
wx ≤ γ² · ρ(n)/poly(n) (for all x's), the probability that Σ_x wx · ζx deviates from
ρ(n) by more than γ · ρ(n) is bounded by exp(−poly(n)), and the claim follows.
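The concentration phenomenon behind Claim 12.2 is easy to observe empirically: when each x enters Rn independently and all weights wx are small, the weighted sum Σ_x wx·ζx stays very close to its mean ρ(n). A small simulation with illustrative parameters (uniform Xn, so wx = 1/N, and Yn = Xn, so every inclusion probability is ρ):

```python
import random

random.seed(2)

# Simulate Prob[X_n in R_n] = sum_x w_x * zeta_x, where each x enters R_n
# independently with probability p(x); small weights give sharp concentration.
N = 4000                 # toy domain size (hypothetical)
rho = 0.25
w = [1.0 / N] * N        # uniform X_n, so w_x = 1/N for every x
p = [rho] * N            # here Y_n = X_n, so p(x) = rho for every x

def sample_mass():
    # one random choice of R_n, returning Prob[X_n in R_n]
    return sum(wx for wx, px in zip(w, p) if random.random() < px)

trials = [sample_mass() for _ in range(200)]
max_dev = max(abs(mass - rho) for mass in trials)
print(max_dev < 0.2 * rho)
```

The observed deviation is on the order of sqrt(max_x wx), matching the exp(−γ²ρ/max_x wx)-type Chernoff bound used in the proof.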
Claim 12.3 (see Footnote 16): For all but at most a 2^{−poly(n)} measure of the choices of Rn, it
holds that every circuit Cn of size s'(n) satisfies

    Prob[Cn(Xn) = f(Xn) | Xn ∈ Rn] < 1/2 + ε(n).

Proof: We define the same random variables ζx = ζx(Rn) as in the proof of
Claim 12.2; that is, ζx(Rn) = 1 if x ∈ Rn and ζx(Rn) = 0 otherwise. Also, as
before, wx def= Prob[Xn = x], for every x ∈ {0,1}^n. Fixing any circuit Cn, let C
be the set of inputs on which Cn correctly computes f; namely,

    C def= {x : Cn(x) = f(x)}.    (9)

Then,

    Prob[Xn ∈ C | Xn ∈ Rn] = Prob[Xn ∈ C ∧ Xn ∈ Rn] / Prob[Xn ∈ Rn].    (10)
We first determine the expected value of the numerator of Eq. (10), where the
expectation is taken over the possible choices of Rn. We rewrite the numerator
as Σ_{x∈C} ζx(Rn) · wx, and upper bound it as follows:

    E[Σ_{x∈C} ζx · wx] = Σ_{x∈C} p(x) · wx
        = Σ_{x∈C} (ρ(n) · Prob[Yn = x] / Prob[Xn = x]) · Prob[Xn = x]
        = ρ(n) · Prob[Yn ∈ C]
        ≤ ρ(n) · (1/2 + ε(n)/2),

where the last inequality is due to the hypothesis regarding Yn.
Next, we use a
(multiplicative) Chernoff bound, and get

    Prob[Σ_{x∈C} wx · ζx > (1/2 + 2ε(n)/3) · ρ(n)] < exp(−Ω(ε(n)²) · ρ(n) / (3 · max_x{wx}))
        < exp(−ε(n)² · s(n) · log₂ s(n) / poly(n)),

where the last inequality uses the simplifying assumptions regarding the wx's
and s(n) (i.e., wx < poly(n)/s(n) and log₂ s(n) ≤ n). Thus, for all but at most
an exp(−poly(n) · s'(n) · log₂ s'(n)) measure of the Rn's, the numerator of Eq. (10)
is at most (1/2 + 2ε(n)/3) · ρ(n). This holds for each possible circuit of size s'(n).
Footnote 16: The current statement and its proof were somewhat revised.
Applying the union bound to the set of all 2^{s'(n)·(O(1)+2·log₂ s'(n))} possible circuits
of size s'(n), we conclude that the probability that for some of these circuits the
numerator of Eq. (10) is greater than (1/2 + 2ε(n)/3) · ρ(n) is at most exp(−poly(n)),
where the probability is taken over the choice of Rn. Using Claim 12.2, we
conclude that, for a similar measure of Rn's, the denominator of Eq. (10) is at
least (1 − ε(n)/3) · ρ(n). The claim follows.
Conclusion. The lemma now follows by combining the foregoing three claims.
Claim 12.1 provides us with a suitable Y for which we apply the probabilistic
construction, whereas Claims 12.2 and 12.3 establish the existence of a set Rn
such that both

    Prob[Xn ∈ Rn] > (1 − o(1)) · ρ(n)

and

    Prob[Cn(Xn) = f(Xn) | Xn ∈ Rn] < 1/2 + ε(n)

hold for all possible circuits, Cn, of size s'(n). The lemma follows.
where x1, ..., x_{t(n)} ∈ {0,1}^n and S ⊆ {1, ..., t(n)}. Let Y^{(t)} def= {(Xn^{(t)}, U_{t(n)})},
where Xn^{(t)} is as in Lemma 1 and U_{t(n)} is a random variable that is independently
and uniformly distributed over {0,1}^{t(n)}.
(hypothesis) As in Lemma 1; that is, suppose that for some function s : N → N
and some bounded-away-from-1 function ε : N → [−1,+1], it holds that ε is
an upper bound on the correlation of families of s(·)-size circuits with P over
X.
(conclusion) Analogously to Lemma 1, there exists a bounded-away-from-1 function
ε̄ : N → [−1,+1] and a polynomial p such that, for every function
t : N → N and every function δ : N → [0,1], the function

    ε^{(t)}(n) def= p(n) · ε̄(n)^{t(n)} + δ(n)
In this appendix we discuss the relation of the Selective XOR Lemma to the
XOR Lemma and to the Concatenation Lemma.
The Selective XOR Lemma implies the XOR Lemma. This implication was
sketched in Section 5.2, and we provide more details next. We show how to
use an algorithm that correlates P^{(t)} in order to correlate Q^{(t)}. We shall use t
random examples, denoted (z1, P(z1)), ..., (zt, P(zt)). On input (x1, ..., xt, S), we
set x'i = xi if i ∈ S and x'i = zi otherwise, obtain (from the algorithm) a guess
b for P^{(t)}(x'1, ..., x't), and output b ⊕ ⊕_{i∈[t]\S} P(zi). Thus, our answer is correct
if and only if b = P^{(t)}(x'1, ..., x't), because P^{(t)}(x'1, ..., x't) equals Q^{(t)}(x1, ..., xt, S) ⊕
⊕_{i∈[t]\S} P(zi).
The XOR Lemma implies the Selective XOR Lemma. Following [16, Prop. 1.4],
we show how to use an algorithm that correlates Q^{(3t)} in order to correlate
P^{(t)}. Here we shall use 3t random examples, denoted (z1, P(z1)), ..., (z_{3t}, P(z_{3t})).
On input (x1, ..., xt), we select at random a subset S ⊆ {1, ..., 3t}, and let
i1, ..., it be arbitrary t distinct elements of S (assuming that |S| ≥ t). Next,
we set x'_{ij} = xj for every j = 1, ..., t, and set x'_i = zi for every i ∈ S', where
S' def= {1, ..., 3t} \ {ij : j = 1, ..., t}. We obtain (from the algorithm) a guess b
for Q^{(3t)}(x'1, ..., x'_{3t}, S), and output b ⊕ ⊕_{i∈S∩S'} P(zi). Thus, our answer is correct
if and only if b = Q^{(3t)}(x'1, ..., x'_{3t}, S), because Q^{(3t)}(x'1, ..., x'_{3t}, S) equals
P^{(t)}(x1, ..., xt) ⊕ ⊕_{i∈S∩S'} P(zi). Note that this works assuming that |S| ≥ t,
which holds with probability 1 − 2^{−Ω(t)}. Thus, our correlation with P^{(t)} is lower
bounded by p − 2^{−Ω(t)}, where p is the correlation of the given algorithm with
Q^{(3t)}.
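This embedding can be sketched directly. The predicate P below is a hypothetical stand-in, the solver for Q^{(3t)} is taken to be perfect for illustration, and the "random examples" are fixed deterministically for reproducibility:

```python
import random
from functools import reduce

random.seed(3)

def xor(bits):
    return reduce(lambda a, b: a ^ b, bits, 0)

def P(x):
    # toy predicate (hypothetical stand-in)
    return (x * 2246822519) >> 13 & 1

t = 4
examples = [(z, P(z)) for z in range(100, 100 + 3 * t)]  # 3t known examples

def Q_solver(xs, S):
    # assumed solver for Q^{(3t)}(x_1..x_{3t}, S) (perfect, for illustration)
    return xor(P(xs[i]) for i in S)

def solve_P_t(xs):
    # Embed the t real inputs at t distinct positions of a random S, fill
    # the remaining positions with known examples, and XOR out known values.
    S = {i for i in range(3 * t) if random.random() < 0.5}
    if len(S) < t:
        return None                  # rare failure case (prob 2^{-Omega(t)})
    chosen = sorted(S)[:t]           # i_1, ..., i_t: t distinct elements of S
    ys = [examples[i][0] for i in range(3 * t)]
    for j, i in enumerate(chosen):
        ys[i] = xs[j]
    b = Q_solver(ys, S)
    correction = xor(examples[i][1] for i in S if i not in chosen)
    return b ^ correction

xs = [17, 42, 99, 123]
target = xor(P(x) for x in xs)
results = [solve_P_t(xs) for _ in range(50)]
print(all(r in (None, target) for r in results) and target in results)
```

Every run in which |S| ≥ t recovers ⊕_i P(xi) exactly, mirroring the claim that the correlation loss is only the 2^{−Ω(t)} failure probability.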