Lecture 3: Linearity Testing
In today’s lecture we’ll go over some preliminaries to the coding theory we will need
throughout the course. In particular, we will go over the Walsh-Hadamard code and the
Blum-Luby-Rubinfeld linearity test. In the following lecture we will use these tools to show
that NP ⊆ PCP[poly, O(1)] (recall that our goal is ultimately to bring the polynomial
amount of randomness down to a logarithmic amount). A common feature throughout the
course is that we will use codes to construct PCPs and, vice versa, construct useful codes
from PCPs.
A central combinatorial question about codes is the best trade-off between the rate and the
distance that can be achieved. The basic algorithmic questions are the following:

1. Encoding: given x ∈ {0,1}^k, compute the codeword C(x).

2. Error detection: given r ∈ {0,1}^n, decide whether r is a codeword, i.e., whether r = C(x) for some x ∈ {0,1}^k.

3. Error correction: given r ∈ {0,1}^n, find x ∈ {0,1}^k minimizing ∆(r, C(x)).
In this course, we will be interested in these questions in the context of sublinear time
algorithms. We need to be specific about what we mean by sublinear time. Note that a sublinear
time algorithm A computing a function f : {0,1}^k → {0,1}^n doesn't have enough time to
read its input or write its output. We get around this by accessing the input and writing the
output via oracles. That is, an oracle machine A is a sublinear time algorithm for f if
A^x(j) = f(x)_j, where f(x)_j is the j-th bit of f(x). Note that j is only log n bits long, so
it can be read entirely in sublinear time. More formally,
• The input is represented implicitly by an oracle. Whenever the sublinear time algorithm wants to access the j-th bit of the input string x (for some j ∈ [k]), it queries the input x-oracle for the j-th bit and obtains x_j.

• The output is not explicitly written by the algorithm; instead, it is only implicitly given by the algorithm. Formally, on being queried for index i of the output string f(x) (for some i ∈ [n]), the algorithm outputs the bit f(x)_i. Thus, the algorithm itself behaves as an oracle for the string f(x), which in turn has oracle access to the input oracle x.
• Since the algorithm does not read the entire input x, we cannot expect it to compute the output f(x) exactly. We instead relax our guarantee on the output as follows: on input x ∈ {0,1}^k, the algorithm must compute f(x′) exactly for some x′ ∈ {0,1}^k that is ε-close to the actual input x. In other words, the algorithm computes the function on some approximation to the input instead of the input itself (see the sketch following this list).
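To make this oracle convention concrete, here is a minimal Python sketch. It is illustrative only: the names and the toy bit-flipping function are our own, not from the lecture.

    from typing import Callable

    # A bit oracle maps an index j to the j-th bit of an implicit string.
    BitOracle = Callable[[int], int]

    def make_input_oracle(x: str) -> BitOracle:
        """Wrap an explicit bit string as an input oracle."""
        return lambda j: int(x[j])

    # Toy example: f flips every bit of x, so f(x)_i depends on x_i alone
    # and each output query costs O(1) input queries -- "sublinear time".
    def flip_algorithm(x_oracle: BitOracle) -> BitOracle:
        return lambda i: 1 - x_oracle(i)

    out = flip_algorithm(make_input_oracle("01101"))
    print([out(i) for i in range(5)])  # [1, 0, 0, 1, 0]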
Property testing, for those familiar with it, is typically done in this framework. Figure 1
gives a pictorial description of a sublinear time algorithm with the above-mentioned relaxations.
Now we’ll consider the algorithmic questions of coding theory in the sublinear time
framework. Let’s consider the questions above one by one:
1. For a code to have good error-correcting properties, most bits of the codeword need
to depend on most message bits. Taking this into account, it does not seem reasonable
to expect a “good” code to have sublinear time encoding. (However, there has been
some recent work in the area of locally encodable codes, which relax some of the
error-correction properties of the code.)
2. Error detection: in this context, error detection is known as local testing. That is,
given r ∈ {0,1}^n, test whether r is a codeword or far from every codeword (similar
to the gap problems we saw earlier, and also to property testing).
3. Error correction is known as local decoding in this context. Given j, the decoder
queries some bits of a noisy codeword C(x) + η and outputs the j-th bit of x. More
formally, we say a code is locally decodable if there exists a local decoder Dec such
that for all x and r with ∆(r, C(x)) < ε, the decoder satisfies

    ∀i ∈ {1, . . . , k},  Pr[Dec^r(i) = x_i] ≥ 3/4.

If there is such a decoder for a given code C and it makes fewer than q queries, then
we say C is (q, ε)-locally decodable.
Obviously, sublinear time algorithms in general, and local decoding and local testing
algorithms in particular, will be randomized algorithms.
Note that local decoding is only required to work when the input is sufficiently close to a
codeword, whereas local testability determines whether a given string is close to a codeword
or far from any codeword. Thus the local decodability of a code says nothing about its local
testability.
Since any two distinct linear functions disagree on exactly half the set of inputs (i.e.,
Pr_a[ℓ_x(a) ≠ ℓ_y(a)] = 1/2 for x ≠ y), the fractional distance of the Walsh-Hadamard code
is 1/2. Thus the Walsh-Hadamard code is a [k, 2^k, 2^{k−1}]-code. It has very good distance,
but poor rate.
It is useful to note that there are two dual views of the Walsh-Hadamard code, based on
the fact that WH(x)_a = ℓ_x(a) = ℓ_a(x). Thus, WH(x) is both the evaluation of the linear
function ℓ_x at every point as well as the evaluation of every linear function at the point x.
Note that the WH code has the special property that the input bits x_i are in fact a subset
of the codeword bits, since x_i = ℓ_x(e_i). Codes with this property are called systematic codes.
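As an illustration, here is a brute-force Python sketch of the WH encoder. It writes out the full 2^k-bit codeword, so it is for intuition only, not a sublinear-time algorithm.

    from itertools import product

    def wh_encode(x: tuple) -> list:
        """Walsh-Hadamard encoding: one bit WH(x)_a = <x, a> mod 2
        for every a in {0,1}^k; the codeword has length 2^k."""
        return [sum(xi * ai for xi, ai in zip(x, a)) % 2
                for a in product((0, 1), repeat=len(x))]

    # The code is systematic: x_i sits at the position indexed by e_i.
    x = (1, 0, 1)
    cw = wh_encode(x)
    assert cw[0b100] == x[0] and cw[0b010] == x[1] and cw[0b001] == x[2]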
Recall the local decoder for the Walsh-Hadamard code: on input z, Dec^f picks a uniformly
random r ∈ {0,1}^k, queries f at z + r and at r, and outputs f(z + r) + f(r).

Proof. Since f is δ-close to ℓ_x, we have that for a random r, the probability that f(z + r) =
ℓ_x(z + r) is at least 1 − δ, and so is the probability that f(r) = ℓ_x(r). If both of these
conditions hold, then Dec^f(z) = f(z + r) + f(r) = ℓ_x(z + r) + ℓ_x(r) = ℓ_x(z), by the
linearity of ℓ_x. Thus Pr_r[Dec^f(z) = ℓ_x(z)] ≥ 1 − 2δ.

Since the decoder only makes two queries, the Walsh-Hadamard code is 2-locally decodable.
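A sketch of this two-query decoder in Python. The noiseless oracle below is just for the usage check; in general, f may differ from WH(x) on a δ fraction of points.

    import random

    def wh_local_decode(f, z: tuple, k: int) -> int:
        """Two-query local decoder: pick random r, output f(z+r) + f(r) mod 2.
        If f is delta-close to l_x, this equals l_x(z) w.p. >= 1 - 2*delta."""
        r = tuple(random.randint(0, 1) for _ in range(k))
        z_plus_r = tuple((zi + ri) % 2 for zi, ri in zip(z, r))
        return (f(z_plus_r) + f(r)) % 2

    # Recover x_3 = l_x(e_3) from an oracle for WH(x), x = (1, 0, 1):
    x = (1, 0, 1)
    f = lambda a: sum(xi * ai for xi, ai in zip(x, a)) % 2
    assert wh_local_decode(f, (0, 0, 1), 3) == x[2]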
The BLR linearity test works as follows: given oracle access to f : {0,1}^k → {0,1}, pick
x, y ∈ {0,1}^k uniformly at random, query f(x), f(y) and f(x + y), and accept iff
f(x) + f(y) = f(x + y). Obviously, if f is linear, this test will pass with probability 1. The
question is: can there be functions that are far from linear but still pass this test with high
probability? No, as shown in the following theorem.
Theorem 3.2. If f is δ-far from linear, then

    Pr[BLR-Test^f accepts] ≤ 1 − δ.
The above theorem is tight since a random function is 1/2-far from linear, and passes
the BLR-Test with probability exactly 1/2.
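For concreteness, here is a Python sketch of the test over F_2. A single run of the basic test uses three queries; we repeat it here only to estimate the acceptance probability empirically.

    import random

    def blr_accepts_once(f, k: int) -> bool:
        """One run of the BLR test: accept iff f(x) + f(y) = f(x + y) mod 2."""
        x = tuple(random.randint(0, 1) for _ in range(k))
        y = tuple(random.randint(0, 1) for _ in range(k))
        s = tuple((a + b) % 2 for a, b in zip(x, y))
        return (f(x) + f(y)) % 2 == f(s)

    def acceptance_rate(f, k: int, trials: int = 10000) -> float:
        return sum(blr_accepts_once(f, k) for _ in range(trials)) / trials

    linear = lambda a: (a[0] + a[2]) % 2
    print(acceptance_rate(linear, 8))    # 1.0: linear functions always pass
    table = {}
    rand_f = lambda a: table.setdefault(a, random.randint(0, 1))
    print(acceptance_rate(rand_f, 8))    # close to 1/2 for a random function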
The original proof of Blum, Luby and Rubinfeld [BLR93] is a combinatorial proof of
a weaker version of the above theorem, but we will give an algebraic proof, as similar
techniques will arise later in the course. This algebraic proof is due to Bellare, Coppersmith,
Håstad, Kiwi and Sudan [BCH+96]. Before proceeding to the proof, we will first need to
equip ourselves with some basics of Boolean Fourier analysis.
Consider the space F of real-valued functions on {0,1}^k, equipped with the inner product
⟨f, g⟩ = Exp_x[f(x)g(x)], and for each a ∈ {0,1}^k the function χ_a(x) = (−1)^{ℓ_a(x)} =
(−1)^{Σ_i a_i x_i}. (It is an easy exercise to check that this is in fact an inner product.) Note that
there are already 2^k = dim F functions of this form, so all we need to do is show that they are
linearly independent.
We begin by examining a few basic properties of the functions χ_a. First, note that

Property 1. χ_a(x + y) = χ_a(x)χ_a(y),

i.e., χ_a is linear, as mentioned previously. Second,

Property 2. χ_{a+b}(x) = χ_{a−b}(x) = χ_a(x)χ_b(x),

i.e., χ_a(x) is also linear in a; this should come as no surprise, because of the duality of
ℓ_a(x) as both a linear function of a and of x.
The first property we will need that is not entirely obvious is that

Property 3.

    Exp_x[χ_a(x)] = 1 if a = 0, and 0 otherwise.
Proof. If a = 0, this clearly holds. If a ≠ 0, then by permuting the indices we may assume
that a_1 ≠ 0 without loss of generality. Then we have

    2^k · Exp_x[χ_a(x)] = Σ_x (−1)^{Σ_i a_i x_i}
                        = Σ_{x : x_1 = 1} (−1)^{Σ_i a_i x_i} + Σ_{x : x_1 = 0} (−1)^{Σ_i a_i x_i}.

Then, since (−1)^{ℓ_a(0y)} = −(−1)^{ℓ_a(1y)} (writing by for the string with first bit b followed
by y), these two sums exactly cancel out.
We then have,

Property 4. ⟨χ_a, χ_b⟩ = 1 if a = b, and 0 otherwise.

This follows from the above via ⟨χ_a, χ_b⟩ = Exp_x[χ_a(x)χ_b(x)] = Exp_x[χ_{a−b}(x)], and then
applying the above fact.
Thus, the χ_a form not only a basis for F, but also an orthonormal basis for F. Since
the χ_a form an orthonormal basis, for any f ∈ F we may write f = Σ_a f̂_a χ_a, where
f̂_a = ⟨f, χ_a⟩ ∈ R. These f̂_a are called the Fourier coefficients of f.
Observe that if the normalized distance δ(f, χ_a) = ε, then f̂_a = 1·(1 − ε) + (−1)·ε = 1 − 2ε,
so the Fourier coefficients capture the normalized distance from linear functions.
Now we come to one of the most basic useful facts in Fourier analysis:
Property 5 (Parseval's identity). ⟨f, f⟩ = Σ_a f̂_a².
Proof. Writing f in terms of the basis χ_a, we get:

    ⟨f, f⟩ = ⟨Σ_a f̂_a χ_a, Σ_b f̂_b χ_b⟩
           = Σ_{a,b} f̂_a f̂_b ⟨χ_a, χ_b⟩    (by linearity of ⟨·, ·⟩)
           = Σ_a f̂_a²,

where the last line follows from the previous because the χ_a form an orthonormal basis.
Corollary 3.3. In particular, if f is a Boolean function, i.e. ±1-valued, then Σ_a f̂_a² = 1.
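The following brute-force Python sketch (exponential in k, for intuition only) computes the Fourier coefficients of a ±1-valued function and checks both the distance relation f̂_a = 1 − 2δ(f, χ_a) and Parseval's identity on a small example of our own devising.

    from itertools import product

    def chi(a: tuple, x: tuple) -> int:
        """chi_a(x) = (-1)^(sum_i a_i x_i)."""
        return (-1) ** sum(ai * xi for ai, xi in zip(a, x))

    def fourier_coefficients(f, k: int) -> dict:
        """f_hat(a) = E_x[f(x) chi_a(x)], by summing over all of {0,1}^k."""
        pts = list(product((0, 1), repeat=k))
        return {a: sum(f(x) * chi(a, x) for x in pts) / 2 ** k for a in pts}

    # f agrees with chi_(1,0,1) everywhere except at one of the 8 points,
    # so delta(f, chi_(1,0,1)) = 1/8 and f_hat = 1 - 2/8 = 0.75.
    f = lambda x: -chi((1, 0, 1), x) if x == (1, 1, 1) else chi((1, 0, 1), x)
    coeffs = fourier_coefficients(f, 3)
    print(coeffs[(1, 0, 1)])                     # 0.75
    print(sum(c ** 2 for c in coeffs.values()))  # 1.0 (Parseval, f is +-1-valued)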
Proof of Theorem 3.2. Suppose f is δ-far from any linear function. Note that we can rewrite
the linearity condition f(x)f(y) = f(x + y) as f(x)f(y)f(x + y) = 1, since f is ±1-valued.
Note that for any random variable Z with values in {+1, −1}, Pr[Z = 1] = Exp[(1 + Z)/2],
since if Z = 1, then (1 + Z)/2 = 1, and if Z = −1, then (1 + Z)/2 = 0, so (1 + Z)/2 is an
indicator variable for the event Z = 1. Thus we have
    Pr_{x,y}[BLR-Test accepts f] = Exp_{x,y}[(1 + f(x)f(y)f(x + y))/2]
                                 = 1/2 + (1/2) · Exp_{x,y}[f(x)f(y)f(x + y)].

Then, we expand each occurrence of f in the Fourier basis and apply linearity of expectation,
the independence of x and y, and the fact that χ_c(x + y) = χ_c(x)χ_c(y), to get:

    Pr_{x,y}[BLR accepts f] = 1/2 + (1/2) Σ_{a,b,c} f̂_a f̂_b f̂_c · Exp_x[χ_a(x)χ_c(x)] · Exp_y[χ_b(y)χ_c(y)]
                            = 1/2 + (1/2) Σ_a f̂_a³           (by orthonormality, only the terms with a = b = c survive)
                            ≤ 1/2 + (1/2) max_a(f̂_a) · Σ_a f̂_a²
                            = 1/2 + (1/2) max_a(f̂_a)         (by Parseval's identity)
                            ≤ 1 − δ,

where the last line follows from the fact that f is δ-far from linear, so its largest Fourier
coefficient can be at most 1 − 2δ, as noted previously.
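As a sanity check, this small Python sketch verifies the key identity Pr_{x,y}[BLR accepts f] = 1/2 + (1/2) Σ_a f̂_a³ by brute force for a random ±1-valued function (exponential in k, illustration only).

    import random
    from itertools import product

    def chi(a, x):
        return (-1) ** sum(ai * xi for ai, xi in zip(a, x))

    def check_blr_identity(f, k: int):
        pts = list(product((0, 1), repeat=k))
        # Left side: exact acceptance probability of the test.
        lhs = sum(f(x) * f(y) == f(tuple((i + j) % 2 for i, j in zip(x, y)))
                  for x in pts for y in pts) / len(pts) ** 2
        # Right side: 1/2 + (1/2) * sum of cubed Fourier coefficients.
        fhat = [sum(f(x) * chi(a, x) for x in pts) / 2 ** k for a in pts]
        rhs = 0.5 + 0.5 * sum(c ** 3 for c in fhat)
        return lhs, rhs

    table = {}
    f = lambda x: table.setdefault(x, random.choice((-1, 1)))
    print(check_blr_identity(f, 3))  # the two values agree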
References

[BCH+96] Mihir Bellare, Don Coppersmith, Johan Håstad, Marcos A. Kiwi, and Madhu Sudan. Linearity testing in characteristic two. IEEE Transactions on Information Theory, 42(6):1781–1795, November 1996. (Preliminary version in 36th FOCS, 1995). doi:10.1109/18.556674.

[BLR93] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences, 47(3):549–595, 1993. (Preliminary version in 22nd STOC, 1990). doi:10.1016/0022-0000(93)90044-W.