Multiscale Analysis of The Primes: Terence Tao (UCLA)


Multiscale analysis of the primes

Terence Tao (UCLA)

www.math.ucla.edu/∼tao/preprints

1
Patterns in the primes

• Let N be a large integer, let P1 be the integers between N/2 and N, and let P be the set of prime numbers in P1. In additive number theory, one can ask the following two general questions:
• Q1: How can one describe (or model) the arithmetic
structure of P inside P1 for large N ?
• Q2: What kind of arithmetic patterns does P have
for large N ?
• The two problems are related; the strategy has been
to make progress on Q2 type problems by first mak-
ing progress on Q1.

2
More specific examples of Q2:

• If N is sufficiently large, then:
• (Twin prime conjecture) P − P contains 2.
• (Vinogradov’s theorem) P +P +P covers all the odd
numbers between 1.9N and 2.1N (say).
• (Goldbach Conjecture) P + P covers all the even
numbers between 1.4N to 1.6N .
• (Van der Corput theorem) P contains at least one
arithmetic progression of length 3.
• (Green-Tao) P contains at least one arithmetic pro-
gression of length k (if N ≥ C(k)).

3
Prime number theorem

• The most basic result regarding Q1 (i.e. the structure of P) is the Prime Number Theorem:
|P| = (1 + o(1)) |P1|/log N,
i.e. P has density approximately 1/log N in P1. The proof is roughly as follows:
• Unique Factorization: For any integer n, we have
n = ∏_{p prime} ∏_{j ≥ 1 : p^j | n} p.

• Define the von Mangoldt function Λ(n) to be log p if n is a power of a prime p, and zero otherwise. Then taking logs of the above formula we obtain
log n = ∑_{d|n} Λ(d).
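This identity is easy to check numerically. The following is a small illustrative sketch (the function names are my own), verifying log n = ∑_{d|n} Λ(d) directly for small n:

```python
import math

def mangoldt(n):
    # Lambda(n) = log p if n = p^j for a prime p and some j >= 1, else 0
    for p in range(2, n + 1):
        if n % p == 0:
            m = n
            while m % p == 0:  # strip out the smallest prime factor p
                m //= p
            return math.log(p) if m == 1 else 0.0  # prime power iff nothing is left
    return 0.0  # n = 1

def log_via_divisors(n):
    # the right-hand side: sum of Lambda(d) over the divisors d of n
    return sum(mangoldt(d) for d in range(1, n + 1) if n % d == 0)

# the identity log n = sum_{d|n} Lambda(d), checked for 2 <= n < 200
max_err = max(abs(math.log(n) - log_via_divisors(n)) for n in range(2, 200))
```

The identity holds exactly; max_err measures only floating-point rounding.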

• Multiply this by n^{−s} and sum to obtain
∑_{n=1}^∞ (log n)/n^s = (∑_{n=1}^∞ 1/n^s) × (∑_{n=1}^∞ Λ(n)/n^s).

4
• Define the Riemann zeta function ζ(s) := ∑_{n=1}^∞ 1/n^s; thus
−ζ′(s) = ζ(s) ∑_{n=1}^∞ Λ(n)/n^s.
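One can sanity-check this Dirichlet series identity numerically at, say, s = 2, by truncating all three sums at a common cutoff (a rough sketch; the cutoff M and the tolerance are arbitrary choices of mine):

```python
import math

M, s = 20000, 2.0

# smallest-prime-factor sieve up to M, to evaluate the von Mangoldt function quickly
spf = list(range(M + 1))
for p in range(2, int(M ** 0.5) + 1):
    if spf[p] == p:
        for q in range(p * p, M + 1, p):
            if spf[q] == q:
                spf[q] = p

def mangoldt(n):
    # Lambda(n) = log p if n is a power of the prime p, else 0
    if n < 2:
        return 0.0
    p, m = spf[n], n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0

lhs = sum(math.log(n) / n ** s for n in range(2, M + 1))        # ~ -zeta'(s)
zeta = sum(1.0 / n ** s for n in range(1, M + 1))               # ~ zeta(s)
dirichlet = sum(mangoldt(n) / n ** s for n in range(2, M + 1))  # ~ -zeta'(s)/zeta(s)
rel_err = abs(lhs - zeta * dirichlet) / lhs
```

The truncation errors decay like (log M)/M, so the two sides agree to well under a percent at this cutoff.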

• Now obtain good control on ζ(s) and ζ′(s), and then solve for Λ(n) as best as one can. It is particularly important to know where the zeroes of ζ(s) are. Eventually one gets
∑_{n∈P1} Λ(n) = (1 + o(1))|P1|,
which then quickly yields the prime number theorem.


• The famous Riemann hypothesis gives much more information about ζ, and in turn gives much more precise estimates on such sums as ∑_{n=N/2}^{N} Λ(n), and thus on the cardinality of P. However, it reveals very little about the finer structure of P, and does not seem to be the right tool for solving Q2 type problems (for instance, we are no closer to solving the twin prime or Goldbach conjectures even assuming the generalized Riemann hypothesis).

5
A probabilistic model of the primes

• Given that P has density 1/log N in P1, one may be led to the following random model for P:
P behaves like a random subset of P1 of density 1/log N.
• By “behaves like”, we mean that any statistic used to measure P would give roughly the same answer (as N gets large) as a random set of the same density. Another way of stating this is that the von Mangoldt function Λ on P1 can be approximated in some weak sense by its average value or expectation
E(Λ) := E(Λ(n)|n ∈ P1) := (1/|P1|) ∑_{n∈P1} Λ(n) = 1 + o(1).
• If such a model were true, then it would easily yield answers to many Q2 questions. For instance, for p picked at random in P1, we would have that p is prime with probability 1/log N, and p + 2 is prime with probability 1/log N, hence (p, p + 2) would both be prime with probability 1/log² N. Thus one expects roughly N/(2 log² N) twin primes in P (and thus infinitely many twin primes overall). A similar argument predicts roughly N²/(8(k − 1) log^k N) progressions of length k in P.
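This naive model can be simulated directly and its twin count compared against the heuristic N/(2 log² N). A Monte Carlo sketch (N and the random seed are arbitrary choices of mine):

```python
import math, random

random.seed(1)
N = 10 ** 6
density = 1.0 / math.log(N)

# naive model: each n in [N/2, N] is "prime" independently with probability 1/log N
in_set = [random.random() < density for _ in range(N // 2 + 1)]

# count pairs (n, n+2) with both elements in the random set
twins = sum(1 for i in range(len(in_set) - 2) if in_set[i] and in_set[i + 2])
predicted = (N / 2) / math.log(N) ** 2  # the heuristic N / (2 log^2 N)
```

With these parameters the simulated count fluctuates by only a few percent around the prediction.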
• Another way of stating the above assertions would be that
E(Λ(n)Λ(n + 2)|n ∈ P1) ≈ E(E(Λ)(n) E(Λ)(n + 2)|n ∈ P1)
and
E(Λ(n) . . . Λ(n + (k − 1)r)|n ∈ P1, r ∈ [1, N/2])
≈ E(E(Λ)(n) . . . E(Λ)(n + (k − 1)r)|n ∈ P1, r ∈ [1, N/2]).
• Unfortunately, this simple model is too naive. For instance, it would also predict that there are roughly N/(2 log² N) pairs of the form p, p + 1 in P, which is clearly wrong!
• The problem is that we have to incorporate more
Q1 type information than the prime number theo-
rem, which only gives density bounds on P1. In this
particular case, the additional structure we need to
incorporate is
All elements of P are odd.
• To exploit this, let P2 denote those elements of P1
which are odd. Then |P2| ≈ |P1|/2, and so P has
density roughly 2/ log N inside P2. Thus we could
conjecture a new probabilistic model for the primes,
P behaves like a random subset of P2 of density 2/ log N .

7
• A fancier way of stating this model is the follow-
ing. Let B2 be the σ-algebra of P1 generated by the
residue classes modulo 2. Then the above model as-
serts that Λ can be approximated in a weak sense,
not by the expectation E(Λ), which is roughly 1, but
by the conditional expectation, E(Λ|B2), which is
roughly 2 on the odd numbers and 0 on the even
numbers.
• This model now gives the correct count (namely zero) for pairs p, p + 1 in P, and gives a revised prediction of N/log² N for the number of pairs p, p + 2 in P, and similarly for other Q2 type problems. However, it gives incorrect counts for other problems, such as counting triples of the form p, p + 2, p + 4 in P (it predicts 2N/log³ N such triples, when in fact there are 0). This is because we have not yet incorporated further Q1 type information about the primes, namely
All elements of P are equal to 1 or 2 mod 3.
• In fact, the elements of P are equally divided into the two residue classes. This is a special case of the
• Dirichlet-Siegel-Walfisz theorem: The density of P in the residue class {n ∈ P1 : n = a mod k} is equal to 0 if a is not coprime to k, and equal to (1 + o(1)) k/(ϕ(k) log N) when a is coprime to k, if N is large enough depending on k. Here ϕ(k) is the number of residue classes mod k that are coprime to k.
• This leads to another model for the primes. Let P3 be those elements of P2 which are coprime to 3; then P3 has density 1/3 in P1, and we could conjecture
P behaves like a random subset of P3 of density 3/log N.
• Equivalently, we can define B3 to be the σ-algebra formed by adding the residue classes modulo 3 to B2, and we then model Λ by E(Λ|B3) (i.e. the function which equals 3 when n = 1, 5 mod 6 and zero otherwise).
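The model function E(Λ|B3) can be written down explicitly and its predictions checked mechanically. In this sketch (ν is my name for the model function), the mean comes out ≈ 1, pairs n, n + 2 get a positive density, and the triple n, n + 2, n + 4 is correctly predicted never to occur, since one of the three residues mod 6 always falls outside {1, 5}:

```python
def nu(n):
    # the level-3 model E(Lambda|B3): equals 3 when n = 1, 5 mod 6, zero otherwise
    return 3.0 if n % 6 in (1, 5) else 0.0

N = 10 ** 6
lo = N // 2
count = N + 1 - lo

mean = sum(nu(n) for n in range(lo, N + 1)) / count
pair = sum(nu(n) * nu(n + 2) for n in range(lo, N + 1)) / count
triple = sum(nu(n) * nu(n + 2) * nu(n + 4) for n in range(lo, N + 1)) / count
```

Here mean ≈ 1 and pair ≈ 1.5, while triple vanishes identically.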

9
Hardy-Littlewood prime tuple conjecture

• One can obviously continue this procedure (the sieve of Eratosthenes). For any level 1 ≤ R ≤ N, we define BR to be the σ-algebra generated by the residue classes with modulus at most R, and define the almost primes PR of level R to be those numbers, all of whose prime factors exceed R. We then approximate Λ by E(Λ|BR); or what is much the same thing, we view P as a random subset of PR of density |P|/|PR| ≈ log R/log N. If R is as large as √N, then this model becomes exact. But it is conjectured that the model becomes accurate far sooner:
• Hardy-Littlewood prime tuple conjecture
(informal) The random model of the primes at
level R gives accurate predictions (up to multiplica-
tive errors of 1 + o(1)) to question Q2 if R → ∞
slowly with N (e.g. R = log log N ).
• This conjecture, if true, would give precise asymp-
totics for all the Q2 problems listed earlier, e.g.
• The number of twin primes in P would be (1 + o(1)) B2 N/(2 log² N), where B2 is the twin prime constant
B2 := ∏_{p prime} ω(p)/(1 − 1/p)² = 1.32032 . . . ,
where ω(p) is the probability that a random pair n, n + 2 in Z/pZ are both coprime to p.
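The infinite product for B2 converges quickly and can be approximated by truncating over primes up to some bound (a sketch; the truncation point is my own choice). Note ω(2) = 1/2 and ω(p) = (p − 2)/p for odd p, so each odd factor is p(p − 2)/(p − 1)²:

```python
# truncated product for B2 = prod_p omega(p) / (1 - 1/p)^2
LIMIT = 10 ** 5
is_prime = [True] * (LIMIT + 1)
is_prime[0] = is_prime[1] = False
for p in range(2, int(LIMIT ** 0.5) + 1):
    if is_prime[p]:
        for q in range(p * p, LIMIT + 1, p):
            is_prime[q] = False

B2 = 2.0  # the p = 2 factor: (1/2) / (1 - 1/2)^2 = 2
for p in range(3, LIMIT + 1):
    if is_prime[p]:
        B2 *= p * (p - 2) / (p - 1) ** 2  # omega(p)/(1 - 1/p)^2 for odd p
```

The product converges to 2Π₂ ≈ 1.3203, twice the classical twin prime constant Π₂.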
• The number of progressions of length k in P would be (1 + o(1)) Ck N²/(8(k − 1) log^k N), where
Ck := ∏_{p prime} γk(p)/(1 − 1/p)^k,
and γk(p) is the probability that a random arithmetic progression of length k in Z/pZ is entirely coprime to p. (This is known to be true when k = 3 but is open for higher k.)
• Note that the random prime model at level R would
give the above predictions, but with the product over
primes truncated to only run over primes less than
R.
• With a small handful of exceptions, the Hardy-Littlewood prime tuple conjecture is completely open. The problem is that we have very few tools for understanding how P sits inside PR. Even the generalized Riemann hypothesis does not give fine structure information on P.
• On the other hand, we do understand PR quite well for small values of R. For instance, P2 is simply the odd numbers from N/2 to N. Sieve theory can in fact give quite good bounds for PR for R up to a small power of N - but alas, not all the way up to √N. A typical result here is Chen’s theorem: there are at least N/(100 log² N) pairs p, p + 2, where p lies in P and p + 2 lies in P_{N^{3/11}}. Similarly one can count arithmetic progressions of length k in PR for R smaller than N^{1/2^k} or so. As one nears √N, the error terms begin to swamp the main terms and we do not know how to proceed. A major obstacle here is the parity problem: it is very hard to distinguish the numbers which have an odd number of prime factors from numbers which have an even number of prime factors (those two sets of numbers have almost identical statistics).
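The parity obstruction is easy to see in data: split [N/2, N] according to the parity of Ω(n), the number of prime factors counted with multiplicity (equivalently, by the sign of the Liouville function). The two classes come out numerically almost indistinguishable. A sketch with an arbitrary choice of N:

```python
N = 10 ** 5
Omega = [0] * (N + 1)  # number of prime factors of n, counted with multiplicity
for p in range(2, N + 1):
    if Omega[p] == 0:  # no smaller prime factor was recorded, so p is prime
        q = p
        while q <= N:
            for m in range(q, N + 1, q):  # each power p^j dividing n adds one factor
                Omega[m] += 1
            q *= p

lo = N // 2
even = sum(1 for n in range(lo, N + 1) if Omega[n] % 2 == 0)
odd = (N + 1 - lo) - even
imbalance = abs(even - odd) / (N + 1 - lo)
```

The imbalance between the two parity classes is tiny compared to their common size.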

12
Vinogradov’s method

• Aside from zeta function methods (giving cardinality control on P) and sieve theory (giving structural control on PR), we have a third set of techniques based on harmonic analysis, which can give Fourier type control on P, the idea being that harmonic analysis (and ergodic theory, also) show that P cannot have structure in too many different ways. A typical example of this is Vinogradov’s method.
• Very roughly speaking, this method allows one to get control on Fourier coefficients of Λ, e.g.
f(θ) := ∑_{n∈P1} Λ(n) e^{2πiθn}.

When θ is close to a rational with small denominator one has very good estimates here (basically, here it is safe to replace the primes by one of the random models mentioned above). When θ is highly irrational, one can proceed by starting with the unique factorization identity
log n = ∑_{d|n} Λ(d)
and applying Möbius inversion to obtain the formula
Λ(n) = ∑_{d|n} µ(d) log(n/d),
where µ is the Möbius function (equal to +1 when d is squarefree with an even number of prime factors, −1 when d is squarefree with an odd number of prime factors, and 0 otherwise). We can then write
f(θ) = ∑_{d,m : dm∈P1} µ(d) log(m) e^{2πiθdm}.

One can view this as a weighted sum of several Fourier coefficients of, say, µ(d), at the frequency θm. The point is that if θ is irrational, then θm must vary, and while µ could have a few exceptional Fourier coefficients, their average behaviour is easier to control. (We are skipping many important details here.) Thus we are using the arithmetic structure of the primes to “disperse” one Fourier coefficient into an average of many Fourier coefficients.
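The major-arc/minor-arc contrast described above can be seen numerically: |f(θ)| is huge at θ = 1/3 (a rational with small denominator, where the primes correlate with a residue class pattern) and far smaller at a generic irrational θ. A sketch, with N, the choice θ = √2 − 1, and the comparison all my own:

```python
import cmath, math

N = 10 ** 5
lo = N // 2

# sieve, then the von Mangoldt function on [1, N]
is_prime = [True] * (N + 1)
is_prime[0] = is_prime[1] = False
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        for q in range(p * p, N + 1, p):
            is_prime[q] = False
Lam = [0.0] * (N + 1)
for p in range(2, N + 1):
    if is_prime[p]:
        q = p
        while q <= N:
            Lam[q] = math.log(p)
            q *= p

def f_abs(theta):
    # |f(theta)| = |sum over n in [N/2, N] of Lambda(n) e^{2 pi i theta n}|
    return abs(sum(Lam[n] * cmath.exp(2j * math.pi * theta * n)
                   for n in range(lo, N + 1)))

major = f_abs(1.0 / 3.0)         # near a rational with small denominator
minor = f_abs(math.sqrt(2) - 1)  # a "highly irrational" frequency
```

Here major is of size roughly |P1|/2 (primes split evenly between the classes 1 and 2 mod 3, and e^{2πi/3} + e^{4πi/3} = −1), while minor is much smaller.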
• The upshot of Vinogradov type methods can be sum-
marized as:
The Fourier coefficients of P behave
in magnitude like those of a random subset of PR ,

14
i.e. the magnitudes of Fourier coefficients are con-
sistent with the Hardy-Littlewood conjecture. This
allows us in turn to solve Q2 type questions which
only require magnitude control on the Fourier coef-
ficients. For instance, the number of ways to write
M as the sum of three primes in P can be written
in terms of the Fourier coefficients as
∫₀¹ e^{−2πiMθ} f(θ)³ dθ,
and this can be computed using the above techniques
to prove Vinogradov’s theorem (every large odd num-
ber is the sum of three primes). It is crucial here that
the exponent is 3, which is larger than the Plancherel
exponent of 2; this means that large Fourier coef-
ficients dominate, and one can estimate the small
Fourier coefficients in absolute values. A similar ar-
gument gives van der Corput’s theorem (infinitely
many prime arithmetic progressions of length 3).
• Unfortunately, many Q2 problems are known to de-
pend on the phase of Fourier coefficients, as well as
their magnitude. A typical example is the problem
of locating progressions of length 4. Let ε be a small number and consider two subsets A and B of P1. A will be a randomly chosen set of cardinality εN. B will be the set of all numbers n such that αn² is within ε/2 of an integer, where α is a fixed irrational (e.g. α := √2). Both sets have roughly the same size, and roughly the same Fourier coefficients. However, B has significantly more progressions n, n + r, n + 2r, n + 3r of length 4 than A (it has roughly ε³N², while A has roughly ε⁴N²). The reason is because of the quadratic identity
αn² − 3α(n + r)² + 3α(n + 2r)² − α(n + 3r)² = 0,
which means that if three elements of a progression lie
in B, the fourth is very likely to also. This quadratic
correlation is not easily detected using Fourier meth-
ods (although Tim Gowers has managed to do so by
developing the foundations of a “quadratic Fourier
analysis”; we will not discuss this fascinating topic
here).
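This quadratic-bias example can be reproduced numerically. The sketch below (all parameter values are my own arbitrary choices) builds B from α = √2, builds a random A of the same density, and counts 4-term progressions in each; B should come out well ahead even though the two sets have the same density ε:

```python
import math, random

random.seed(0)
N, eps = 2000, 0.1
alpha = math.sqrt(2)

def near_int(x, half_width):
    # is x within half_width of an integer?
    t = x % 1.0
    return t < half_width or t > 1.0 - half_width

B = [near_int(alpha * n * n, eps / 2) for n in range(N + 1)]  # the quadratic set
A = [random.random() < eps for _ in range(N + 1)]             # random, same density

def count_4aps(S):
    # number of progressions n, n+r, n+2r, n+3r lying entirely inside S
    total = 0
    for r in range(1, N // 3 + 1):
        for n in range(0, N - 3 * r + 1):
            if S[n] and S[n + r] and S[n + 2 * r] and S[n + 3 * r]:
                total += 1
    return total

aps_B = count_4aps(B)
aps_A = count_4aps(A)
```

The identity αn² − 3α(n+r)² + 3α(n+2r)² − α(n+3r)² = 0 is what boosts aps_B: once three of the four phases are near integers, so is the fourth, up to a bounded multiple of ε.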

Density-based methods

• To summarize so far: we know the primes P are a somewhat sparse subset of the interval P1 (with density 1/log N), but can also be viewed as a denser subset of the almost primes PR (with density roughly log R/log N). We understand PR very well for R small (e.g. a small power of N), but as for how P sits inside PR, we only have a conjecture, and some Fourier-type information. This blocks us from solving a number of Q2 type problems.
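The density claim |P|/|PR| ≈ log R/log N can be checked empirically. In the sketch below (N and R are arbitrary choices of mine), the computed ratio agrees with log R/log N up to a bounded constant; at this crude level an e^γ-type factor from Mertens' theorem enters, since R here is a fixed small number rather than a power of N:

```python
import math

N, R = 10 ** 6, 20
lo = N // 2

is_prime = [True] * (N + 1)
is_prime[0] = is_prime[1] = False
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        for q in range(p * p, N + 1, p):
            is_prime[q] = False

small_primes = [p for p in range(2, R + 1) if is_prime[p]]

def rough(n):
    # n lies in P_R iff no prime factor of n is <= R
    return all(n % p for p in small_primes)

P_count = sum(1 for n in range(lo, N + 1) if is_prime[n])
PR_count = sum(1 for n in range(lo, N + 1) if rough(n))
ratio = P_count / PR_count
model = math.log(R) / math.log(N)
```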
• However, for certain special types of Q2 problems it is
possible to establish patterns in P without knowing
the specific way in which P lies inside PR - instead,
relying purely on the density of P inside PR . The
prototype for such a result is
• Szemerédi’s theorem: Let A be any subset of P1 with density δ > 0. Then, if N is large enough depending on δ and k, A must contain arithmetic progressions of length k (in fact, it contains at least c(k, δ)N² of them).
• The intuition behind this theorem is the following.
If A is randomly distributed, then it is easy to see
why A should have lots of progressions. Conversely,
if A is very structured (e.g. periodic), then it is also
easy to see why A should have progressions (whereas
it need not have any twins, for instance). The (very)
hard part of the proof is to show that every set A
can be somehow “split” into a random part and a
structured part.

17
• While Szemerédi’s theorem is very deep, its reach has been limited by the fact that N must be large depending on the density δ. Despite much progress on this issue, particularly by Gowers, the best bound known for N for large k is
N ≥ 2 ↑ 2 ↑ (1/δ) ↑ 2 ↑ 2 ↑ (k + 9),
where ↑ denotes exponentiation. Since the primes P have density 1/log N, this bound is nowhere near good enough to obtain long arithmetic progressions in the primes. (There is a conjectured bound of 2^{c_k/δ}, essentially due to Erdős and Turán, which would give progressions in the primes, but this conjecture is still well out of reach.)
• Szemerédi’s theorem asserts, roughly speaking, that
dense subsets of integers contain long arithmetic pro-
gressions. The reach of this theorem has recently
been significantly extended to assert that further-
more, dense subsets of sufficiently pseudorandom
sets of integers also contain long arithmetic progres-
sions. The definition of “sufficiently pseudorandom”
is technical, but let us give a principal consequence:
• Theorem [Green, T.]: Let A be any subset of PR with density δ > 0, where R := N^{1/(16k×2^k)}. Then, if N is large enough depending on δ and k, A must contain arithmetic progressions of length k (in fact, it contains at least c(k, δ)N²/log^k N of them).
• Since P is itself a dense subset of PR (with density roughly 1/(16k × 2^k)), we obtain as a corollary that the primes contain arbitrarily long arithmetic progressions. The key here is that while the density of P in P1 depends on N, the density of P in PR does not.
• A technical note: the almost primes PR are not them-
selves pseudorandom, for instance they are all odd.
However, they are pseudorandomly distributed in-
side PW for some very slowly growing W (e.g. W =
log log N ), which suffices to conclude the above the-
orem.
• The above theorem “transfers” Szemerédi’s theorem
(which it uses as a “black box”) from dense sub-
sets of integers to dense subsets of pseudorandom
sets. There are other theorems of Szemerédi type for
dense subsets of integers that are known, and it is
likely that those can now also be transferred to the
pseudorandom setting. For instance, it seems feasible to establish infinitely many quadruples of the form p, p + n, p + n², p + n³ in the primes. However,
problems such as the twin prime problem still remain
well out of reach, since even quite dense subsets of
the integers need not have any twins.

20
Szemerédi’s theorem
• Szemerédi’s theorem has four quite different, quite
difficult, and quite important proofs: the original
graph theory-combinatorial proof of Szemerédi, the
ergodic theory proof of Furstenberg and later authors, the Fourier-combinatorial proof of Roth (in k = 3) and Gowers (in k = 4 and k > 4), and the hypergraph-combinatorial proof of Frankl-Rödl
and Gowers. The connections between these proofs
are only just now being understood; the proof of the
transference theorem required ingredients (and inspi-
ration) from all four proofs.
• All proofs revolve around the dichotomy between
randomness and structure, and around some no-
tion of relative randomness and relative structure.
Schematically, all the proofs look like the following.
• Step 0. Start with no structure identified on A.
• Step 1. Is A pseudorandom relative to the existing
structure? If so, we model A by a random subset
of the existing structure, and then locate arithmetic
progressions in that structure.
• Step 2. If A is not pseudorandom, then there must be an additional structure which correlates with A. Add this to the list of identified structures and return to Step 1.
• The various proofs of Szemerédi’s theorem then differ on the details of implementing this scheme, for instance in defining what structure or pseudorandomness is (or even what “A” is), how to justify the random model, how to find arithmetic progressions in structures, how to show that non-randomness implies correlation with structure, and how to ensure the algorithm terminates.
• This type of proof should be contrasted with the models of the primes built up earlier. There, we identified in advance explicit structures (in this case, small residue classes) to model the primes. In the proofs of Szemerédi’s theorem, the relevant structures are not specified in advance, but are extracted from the set A itself.

22
Arbitrarily long progressions in P .

• To count progressions of length k in P, one controls averages of the quantity
Λ(n)Λ(n + r) . . . Λ(n + (k − 1)r)
as n and r vary. If Λ were bounded by O(1), this could be handled by Szemerédi’s theorem; but Λ can be as large as log N. To deal with this problem, we view Λ as bounded by ΛR, which is basically a slightly larger version of Λ but extended to the almost primes PR (the precise definition is a little technical).
We rely on some estimates of Goldston and Yildirim which assert some pseudorandomness of ΛR (roughly speaking, they assert that the very “spiky” function ΛR obeys the same statistics as a much more bounded function). These estimates use the same machinery as that used to prove the prime number theorem (i.e. ζ functions).
• In addition to bounding Λ by ΛR , we need a few more
concepts.
• Gowers’ uniformity norms. The work of Gowers on Szemerédi’s theorem identifies the correct notion of “randomness” or “uniformity” appropriate to the task of locating progressions of length k. A function f(x) is linearly uniform if the quantity
f(x)f(x + a)f(x + b)f(x + a + b)
has a very small mean when averaged over a, b, x; this is roughly like saying that the second derivative of the phase of f is unbiased in sign. Functions which are linearly uniform are essentially negligible for the purposes of counting progressions of length 3 (i.e. one can subtract f from Λ and not significantly affect the progression count). Higher order analogues of this notion correspond to counting longer progressions. (Technical note: to rigorously prove the negligibility of uniform functions requires knowing that these functions are dominated by a pseudorandom function such as ΛR.)
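The linear-uniformity average is simple to compute on Z/NZ. In this sketch (N and the two test functions are my own choices), random ±1 signs give a tiny average, while the structured "linear phase" (−1)^x gives average exactly 1, since the exponent x + (x+a) + (x+b) + (x+a+b) is always even:

```python
import random

random.seed(2)
N = 60  # work on Z/NZ with N even, so the parity pattern survives wraparound

def u2_average(f):
    # average of f(x) f(x+a) f(x+b) f(x+a+b) over x, a, b (indices mod N)
    total = 0.0
    for x in range(N):
        for a in range(N):
            for b in range(N):
                total += f[x] * f[(x + a) % N] * f[(x + b) % N] * f[(x + a + b) % N]
    return total / N ** 3

structured = [1 if x % 2 == 0 else -1 for x in range(N)]  # the "phase" (-1)^x
noise = [random.choice((-1, 1)) for _ in range(N)]        # random signs

u2_structured = u2_average(structured)
u2_noise = u2_average(noise)
```

The noise average is of size O(1/N), so the structured function stands out sharply.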
• Dual functions If a function f is not uniform, then
there exists a dual function Df which correlates
with f and which has some strong regularity proper-
ties (such as “almost periodicity”). For instance, if f
is not linearly uniform, the dual function Df (x) can
be created by averaging f (x+a)f (x+b)f (x + a + b)
over a and b. A key fact is that even if f is un-
bounded, the dual function Df will still be bounded,
24
if f is bounded by a suitably pseudorandom function.
• Sigma algebras The role of “structure” in the
transference argument is played by σ-algebras B of
P1, or equivalently that of partitions of P1. Given
any structure B, one can decompose a function f into
the component E(f |B) which lies in that structure,
and the component f − E(f |B) which is orthogonal
to the structure. Every dual function generates its own σ-algebra (formed using the level sets of that dual function). A crucial combinatorial computation shows
that pseudorandom functions such as ΛR are always
very uniformly distributed with respect to these σ-
algebras, and so when one projects a function such
as Λ onto such sigma algebras, one automatically ob-
tains a bounded function. One can think of a single
σ-algebra as a scale in a multiscale analysis.
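The boundedness phenomenon in the last point can be seen in a toy case: project Λ (which spikes as high as log N) onto the finite σ-algebra generated by the residue classes mod 6. The conditional expectation is bounded by a constant, roughly 3 on the classes coprime to 6 and near 0 elsewhere. A sketch with an arbitrary N:

```python
import math

N = 10 ** 5
lo = N // 2

is_prime = [True] * (N + 1)
is_prime[0] = is_prime[1] = False
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        for q in range(p * p, N + 1, p):
            is_prime[q] = False
Lam = [0.0] * (N + 1)
for p in range(2, N + 1):
    if is_prime[p]:
        q = p
        while q <= N:
            Lam[q] = math.log(p)
            q *= p

# E(Lambda | B) for the sigma-algebra B generated by the residue classes mod 6
cond = {}
for a in range(6):
    cls = [n for n in range(lo, N + 1) if n % 6 == a]
    cond[a] = sum(Lam[n] for n in cls) / len(cls)

spike = max(Lam[lo:N + 1])     # Lambda itself reaches about log N
proj_max = max(cond.values())  # but the projection stays bounded
```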
• From the above tools one can now fairly quickly con-
clude a
• Structure theorem. There exists a sigma alge-
bra B, such that Λ can be decomposed into a “struc-
tured” component E(Λ|B), which is bounded, and a
“uniform” component Λ − E(Λ|B), which is Gowers
uniform and hence negligible.

25
• Sketch of proof Start with B being the trivial σ-
algebra. If Λ − E(Λ|B) is already Gowers-uniform,
then we are done. (Note: this is essentially what is
claimed by the Hardy-Littlewood prime tuple conjec-
ture). If not, use dual functions to locate a σ-algebra
which correlates with Λ − E(Λ|B), and add it to B
(this will increase the L2 norm or energy of E(Λ|B)).
Repeat this process, extracting more and more struc-
ture from Λ, until what is left is Gowers-uniform; the
energy of the bounded function E(Λ|B) is bounded
from above, so this procedure has to terminate in a
bounded number of steps. ∎
• To conclude the proof of long arithmetic progressions
in the primes, one now simply discards the Gowers
uniform error Λ−E(Λ|B), thus replacing Λ by a “ho-
mogenized” variant E(Λ|B). This variant is bounded
and now one can use the ordinary Szemerédi theorem
to obtain arbitrarily long progressions.
• In other words, the Structure theorem tells us that
while the Hardy-Littlewood prime tuple conjecture
could be false (so we don’t necessarily know that P
behaves like a random subset of PR ), we do know that
P behaves like a random subset of something reason-
ably dense - and since Szemerédi’s theorem tells us
26
that any dense set contains arithmetic progressions,
so must this random subset.

27
