SCIENTIST AT WORK: Steven Weinberg Publications


Steven Weinberg (born May 3, 1933, New York, U.S.) is an American theoretical physicist and Nobel laureate in Physics for his contributions with Abdus Salam and Sheldon Glashow to the formulation of the electroweak theory, which explains the unity of electromagnetism with the weak nuclear force.

Contents

• Effective Field Theory, Past and Future
• The Making of the Standard Model
• A Designer Universe?
• Conceptual foundations of the unified theory of weak and electromagnetic interactions
• Living in the Multiverse
• Four golden lessons
• Phenomenological Lagrangians
• The Trouble with Quantum Mechanics
• The quantum theory of fields III - Supersymmetry
• The cosmological constant problem
• Why The Renormalization Group Is A Good Thing
• Non-Gaussian Correlations Outside the Horizon II: The General Case
• Living With Infinities
• Asymptotically Safe Inflation
• Six-dimensional Methods for Four-dimensional Conformal Field Theories
• Pions in Large-N Quantum Chromodynamics
• Ultraviolet Divergences in Cosmological Correlations
• Collapse of the State Vector
• Six-dimensional Methods for Four-dimensional Conformal Field Theories II: Irreducible Fields
• Minimal Fields of Canonical Dimensionality are Free
• Tetraquark Mesons in Large-N Quantum Chromodynamics
• Goldstone Bosons as Fractional Cosmic Neutrinos
• Quantum Mechanics Without State Vectors
• What Happens in a Measurement?
• Lindblad Decoherence in Atomic Clocks
• Gravitational Waves in Cold Dark Matter
• Soft Bremsstrahlung
• Absorption of Gravitational Waves from Distant Sources
• Effective Field Theory for Inflation
• A Tree Theorem for Inflation
• Non-Gaussian Correlations Outside the Horizon
• A Priori Probability Distribution of the Cosmological Constant
• Curvature Dependence of Peaks in the Cosmic Microwave Background Distribution
• Fluctuations in the Cosmic Microwave Background I: Form Factors and their Calculation in Synchronous Gauge
• Fluctuations in the Cosmic Microwave Background II: Cℓ at Large and Small ℓ
• Conference Summary 20th Texas Symposium on Relativistic Astrophysics
• Cosmological Fluctuations Of Small Wavelength
• Adiabatic Modes in Cosmology
• Damping of Tensor Modes in Cosmology
• Can Non-Adiabatic Perturbations Arise After Single-Field Inflation?
• Must Cosmological Perturbations Remain Non-Adiabatic After Multi-Field Inflation?
• Quantum Contributions to Cosmological Correlations
• Living in the Multiverse
• Quantum Contributions to Cosmological Correlations II: Can These Corrections Become Large?
• A No-Truncation Approach to Cosmic Microwave Background Anisotropies
• Tensor Microwave Background Fluctuations for Large Multipole Order
• Non-Renormalization Theorems in Non-Renormalizable Theories
• Effective Field Theories in the Large N Limit
• Three-Body Interactions Among Nucleons and Pions
• Flavor Changing Scalar Interactions
• Effective Action and Renormalization Group Flow of Anisotropic Superconductors
• General Effective Actions
• Strong Interactions at Low Energies
• Are Nonrenormalizable Gauge Theories Renormalizable?
• Theories of the Cosmological Constant
• Likely Values of the Cosmological Constant
• What is Quantum Field Theory, and What Did We Think It Is?

UTTG-09-09
TCC-028-09
arXiv:0908.1964v3 [hep-th] 26 Sep 2009

Effective Field Theory, Past and Future

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

This is a written version of the opening talk at the 6th International
Workshop on Chiral Dynamics, at the University of Bern, Switzerland, July 6,
2009, to be published in the proceedings of the Workshop. In it, I
reminisce about the early development of effective field theories of the
strong interactions, comment briefly on some other applications of effective
field theories, and then take up the idea that the Standard Model and General
Relativity are the leading terms in an effective field theory. Finally, I cite
recent calculations that suggest that the effective field theory of gravitation
and matter is asymptotically safe.


∗ Electronic address: weinberg@physics.utexas.edu

I have been asked by the organizers of this meeting to “celebrate 30
years” of a paper1 on effective field theories that I wrote in 1979. I am
quoting this request at the outset, because in the first half of this talk I will
be reminiscing about my own work on effective field theories, leading up to
this 1979 paper. I think it is important to understand how confusing these
things seemed back then, and no one knows better than I do how confused I
was. But I am sure that many in this audience know more than I do about
the applications of effective field theory to the strong interactions since 1979,
so I will mention only some early applications to strong interactions and a
few applications to other areas of physics. I will then describe how we have
come to think that our most fundamental theories, the Standard Model and
General Relativity, are the leading terms in an effective field theory. Finally,
I will report on recent work of others that lends support to a suggestion that
this effective field theory may actually be a fundamental theory, valid at all
energies.
It all started with current algebra. As everyone knows, in 1960 Yoichiro
Nambu had the idea that the axial vector current of beta decay could be con-
sidered to be conserved in the same limit that the pion, the lightest hadron,
could be considered massless.2 This assumption would follow if the ax-
ial vector current was associated with a spontaneously broken approximate
symmetry, with the pion playing the role of a Goldstone boson.3 Nambu
used this idea to explain the success of the Goldberger-Treiman formula4
for the pion decay amplitude, and with his collaborators he was able to de-
rive formulas for the rate of emission of a single low energy pion in various
collisions.5 In this work it was not necessary to assume anything about the
nature of the broken symmetry – only that there was some approximate
symmetry responsible for the approximate conservation of the axial vector
current and the approximate masslessness of the pion. But to deal with
processes involving more than one pion, it was necessary to use not only the
approximate conservation of the current but also the current commutation
relations, which of course do depend on the underlying broken symmetry.
1 S. Weinberg, Physica A96, 327 (1979).
2 Y. Nambu, Phys. Rev. Lett. 4, 380 (1960).
3 J. Goldstone, Nuovo Cimento 9, 154 (1961); Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122, 345 (1961); J. Goldstone, A. Salam, and S. Weinberg, Phys. Rev. 127, 965 (1962).
4 M. L. Goldberger and S. B. Treiman, Phys. Rev. 111, 354 (1956).
5 Y. Nambu and D. Lurie, Phys. Rev. 125, 1429 (1962); Y. Nambu and E. Shrauner, Phys. Rev. 128, 862 (1962).

The technology of using these properties of the currents, in which one does
not use any specific Lagrangian for the strong interactions, became known
as current algebra.6 It scored a dramatic success in the derivation of the
Adler-Weisberger sum rule7 for the axial vector beta decay coupling con-
stant gA , which showed that the current commutation relations are those of
SU (2) × SU (2).
When I started in the mid-1960s to work on current algebra, I had the
feeling that, despite the success of the Goldberger-Treiman relation and the
Adler-Weisberger sum rule, there was then rather too much emphasis on the
role that the axial vector current plays in weak interactions. After all, even if
there were no weak interactions, the fact that the strong interactions have an
approximate but spontaneously broken SU (2) × SU (2) symmetry would be
a pretty important piece of information about the strong interactions.8 To
demonstrate the point, I was able to use current algebra to derive successful
formulas for the pion-pion and pion-nucleon scattering lengths.9 When com-
bined with a well-known dispersion relation10 and the Goldberger-Treiman
relation, these formulas for the pion–nucleon scattering lengths turned out
to imply the Adler-Weisberger sum rule.
In 1966 I turned to the problem of calculating the rate of processes in
which arbitrary numbers of low energy massless pions are emitted in the
collision of other hadrons. This was not a problem that urgently needed
to be solved. I was interested in it because a year earlier I had worked
out simple formulas for the rate of emission of arbitrary numbers of soft
gravitons or photons in any collision,11 and I was curious whether anything
equally simple could be said about soft pions. Calculating the amplitude
for emission of several soft pions by use of the technique of current algebra
turned out to be fearsomely complicated; the non-vanishing commutators
of the currents associated with the soft pions prevented the derivation of
anything as simple as the results for soft photons or gravitons, except in the
6 The name may be due to Murray Gell-Mann. The current commutation relations were given in M. Gell-Mann, Physics 1, 63 (1964).
7 S. L. Adler, Phys. Rev. Lett. 14, 1051 (1965); Phys. Rev. 140, B736 (1965); W. I. Weisberger, Phys. Rev. Lett. 14, 1047 (1965).
8 I emphasized this point in my rapporteur’s talk on current algebra at the 1968 “Rochester” conference; see Proceedings of the 14th International Conference on High-Energy Physics, p. 253.
9 S. Weinberg, Phys. Rev. Lett. 17, 616 (1966). The pion-nucleon scattering lengths were calculated independently by Y. Tomozawa, Nuovo Cimento 46A, 707 (1966).
10 M. L. Goldberger, Y. Miyazawa, and R. Oehme, Phys. Rev. 99, 986 (1955).
11 S. Weinberg, Phys. Rev. 140, B516 (1965).

special case in which all pions have the same charge.12
Then some time late in 1966 I was sitting at the counter of a café in Har-
vard Square, scribbling on a napkin the amplitudes I had found for emitting
two or three soft pions in nucleon collisions, and it suddenly occurred to
me that these results looked very much like what would be given by lowest
order Feynman diagrams in a quantum field theory in which pion lines are
emitted from the external nucleon lines, with a Lagrangian in which the
nucleon interacts with one, two, and more pion fields. Why should this be?
Remember, this was a time when theorists had pretty well given up the
idea of applying specific quantum field theories to the strong interactions,
because there was no reason to trust the lowest order of perturbation the-
ory, and no way to sum the perturbation series. What was popular was to
exploit tools such as current algebra and dispersion relations that did not
rely on assumptions about particular Lagrangians.
The best explanation that I could give then for the field-theoretic
appearance of the current algebra results was that these results for the
emission of n soft pions in nucleon collisions are of the minimum order,
$G_\pi^n$, in the pion-nucleon coupling constant $G_\pi$, except that one had
to use the exact values for the collision amplitudes without soft pion
emission, and divide by factors of the axial vector coupling constant
$g_A \simeq 1.2$ in appropriate places.
Therefore any Lagrangian that satisfied the axioms of current algebra would
have to give the same answer as current algebra in lowest order perturbation
theory, except that it would have to be a field theory in which soft pions
were emitted only from external lines of the diagram for the nucleon colli-
sions, for only then would one know how to put in the correct factors of gA
and the correct nucleon collision amplitude.
The time-honored renormalizable theory of nucleons and pions with con-
served currents that satisfied the assumptions of current algebra was the
“linear σ-model,”13 with Lagrangian (in the limit of exact current conser-
vation):
$$
\mathcal{L} = -\frac{1}{2}\left[\partial_\mu\vec{\pi}\cdot\partial^\mu\vec{\pi} + \partial_\mu\sigma\,\partial^\mu\sigma\right]
- \frac{m^2}{2}\left(\sigma^2+\vec{\pi}^2\right)
- \frac{\lambda}{4}\left(\sigma^2+\vec{\pi}^2\right)^2
- \bar{N}\gamma^\mu\partial_\mu N
- G_\pi \bar{N}\left(\sigma + 2i\gamma_5\,\vec{\pi}\cdot\vec{t}\,\right)N , \tag{1}
$$

12 S. Weinberg, Phys. Rev. Lett. 16, 879 (1966).
13 J. Bernstein, S. Fubini, M. Gell-Mann, and W. Thirring, Nuovo Cimento 17, 757 (1960); M. Gell-Mann and M. Lévy, Nuovo Cimento 16, 705 (1960); K. C. Chou, Soviet Physics JETP 12, 492 (1961). This theory, with the inclusion of a symmetry-breaking term proportional to the σ field, was intended to provide an illustration of a “partially conserved axial current,” that is, one whose divergence is proportional to the pion field.

where $N$, $\vec{\pi}$, and $\sigma$ are the fields of the nucleon doublet,
pion triplet, and a scalar singlet, and $\vec{t}$ is the nucleon isospin
matrix (with $\vec{t}^{\,2} = 3/4$). This Lagrangian has an SU(2) × SU(2)
symmetry (equivalent as far as current commutation relations are concerned
to an SO(4) symmetry), that is spontaneously broken for $m^2 < 0$ by the
expectation value of the σ field, given in lowest order by
$\langle\sigma\rangle = F/2 \equiv \sqrt{-m^2/\lambda}$, which also gives the
nucleon a lowest order mass $G_\pi F/2$. But with a Lagrangian of this form
soft pions could be emitted from internal as well as external lines of the
graphs for the nucleon collision itself, and there would be no way to
evaluate the pion emission amplitude without having to sum over the infinite
number of graphs for the nucleon collision amplitude.
To get around this obstacle, I used the chiral SO(4) symmetry to rotate
the chiral four-vector into the fourth direction

$$
\left(\vec{\pi}, \sigma\right) \mapsto \left(0, \sigma'\right), \qquad \sigma' = \sqrt{\sigma^2 + \vec{\pi}^2} , \tag{2}
$$

with a corresponding chiral transformation $N \mapsto N'$ of the nucleon doublet.


The chiral symmetry of the Lagrangian would result in the pion disappearing
from the Lagrangian, except that the matrix of the rotation (2) necessar-
ily, like the fields, depends on spacetime position, while the theory is only
invariant under spacetime-independent chiral rotations. The pion field thus
reappears as a parameter in the SO(4) rotation (2), which could conveniently
be taken as
~π ′ ≡ F ~π /[σ + σ ′ ] . (3)
But the rotation parameter ~π ′ would not appear in the transformed La-
grangian if it were independent of the spacetime coordinates, so wherever it
appears it must be accompanied with at least one derivative. This derivative
produces a factor of pion four-momentum in the pion emission amplitude,
which would suppress the amplitude for emitting soft pions, if this factor
were not compensated by the pole in the nucleon propagator of an external
nucleon line to which the pion emission vertex is attached. Thus, with the
Lagrangian in this form, pions of small momenta can only be emitted from
external lines of a nucleon collision amplitude. This is what I needed.
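
As a rough numerical illustration of the change of variables in Eqs. (2) and (3), the sketch below maps a sample four-vector (π, σ) to (σ′, π′) and back. The function names and sample numbers are hypothetical, and the inverse formulas are not quoted in the text: they are obtained here by solving Eqs. (2) and (3) for π and σ, so they should be read as an assumption to be checked rather than as part of the original derivation.

```python
import math

def rotate_to_fourth_direction(pi, sigma, F):
    """Eqs. (2)-(3): return (sigma_prime, pi_prime) for a four-vector (pi, sigma).

    Assumes sigma + sigma_prime != 0, i.e. the point is not at the antipode.
    """
    sigma_prime = math.sqrt(sigma**2 + sum(p * p for p in pi))
    pi_prime = [F * p / (sigma + sigma_prime) for p in pi]
    return sigma_prime, pi_prime

def undo_rotation(pi_prime, sigma_prime, F):
    """Inverse map (derived here from Eqs. (2)-(3), not quoted from the text)."""
    x = sum(p * p for p in pi_prime) / F**2        # pi'^2 / F^2
    scale = 2.0 * sigma_prime / (F * (1.0 + x))    # equals (sigma + sigma') / F
    pi = [scale * p for p in pi_prime]
    sigma = sigma_prime * (1.0 - x) / (1.0 + x)
    return pi, sigma

# Hypothetical sample values (same units for pi, sigma and F).
F = 184.0
pi, sigma = [30.0, -40.0, 0.0], 80.0
sigma_prime, pi_prime = rotate_to_fourth_direction(pi, sigma, F)
pi_back, sigma_back = undo_rotation(pi_prime, sigma_prime, F)
print(sigma_prime, pi_prime)
print(pi_back, sigma_back)
```

In this parametrization the original σ is σ′ times (1 − π′²/F²)/(1 + π′²/F²), and the original pion field is proportional to π′/(1 + π′²/F²), so the factors of 1 + π²/F² in the transformed Lagrangian come directly from the change of variables.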
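Since $\sigma'$ is chiral-invariant, it plays no role in maintaining the
chiral invariance of the theory, and could therefore be replaced with its
lowest-order expectation value F/2. The transformed Lagrangian (now dropping
primes) is then

$$
\mathcal{L} = -\frac{1}{2}\left[1+\frac{\vec{\pi}^2}{F^2}\right]^{-2}\partial_\mu\vec{\pi}\cdot\partial^\mu\vec{\pi}
- \bar{N}\left[\gamma^\mu\partial_\mu + G_\pi F/2
+ i\gamma^\mu\left[1+\frac{\vec{\pi}^2}{F^2}\right]^{-1}\left(\frac{2}{F}\,\gamma_5\,\vec{t}\cdot\partial_\mu\vec{\pi} + \frac{2}{F^2}\,\vec{t}\cdot(\vec{\pi}\times\partial_\mu\vec{\pi})\right)\right]N . \tag{4}
$$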

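In order to reproduce the results of current algebra, it is only necessary to
identify F as the pion decay amplitude $F_\pi \simeq 184$ MeV, replace the
term $G_\pi F/2$ in the nucleon bilinear with the actual nucleon mass $m_N$
(given by the Goldberger–Treiman relation as $G_\pi F_\pi/2g_A$), and replace
the pseudovector pion-nucleon coupling 1/F with its actual value
$G_\pi/2m_N = g_A/F_\pi$. This gives an effective Lagrangian

$$
\mathcal{L}_{\rm eff} = -\frac{1}{2}\left[1+\frac{\vec{\pi}^2}{F_\pi^2}\right]^{-2}\partial_\mu\vec{\pi}\cdot\partial^\mu\vec{\pi}
- \bar{N}\left[\gamma^\mu\partial_\mu + m_N
+ i\gamma^\mu\left[1+\frac{\vec{\pi}^2}{F_\pi^2}\right]^{-1}\left(\frac{G_\pi}{m_N}\,\gamma_5\,\vec{t}\cdot\partial_\mu\vec{\pi} + \frac{2}{F_\pi^2}\,\vec{t}\cdot(\vec{\pi}\times\partial_\mu\vec{\pi})\right)\right]N . \tag{5}
$$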

To take account of the finite pion mass, the linear sigma model also includes
a chiral-symmetry breaking perturbation proportional to σ. Making the
chiral rotation (2), replacing $\sigma'$ with the constant F/2, and adjusting
the coefficient of this term to give the physical pion mass $m_\pi$ gives a
chiral symmetry breaking term

$$
\Delta\mathcal{L}_{\rm eff} = -\frac{1}{2}\left[1+\frac{\vec{\pi}^2}{F_\pi^2}\right]^{-1} m_\pi^2\,\vec{\pi}^2 . \tag{6}
$$

Using Leff + ∆Leff in lowest order perturbation theory, I found the same
results for low-energy pion-pion and pion-nucleon scattering that I had ob-
tained earlier with much greater difficulty by the methods of current algebra.
A few months after this work, Julian Schwinger remarked to me that it
should be possible to skip this complicated derivation, forget all about the

linear σ-model, and instead infer the structure of the Lagrangian directly
from the non-linear chiral transformation properties of the pion field appear-
ing in (5).14 It was a good idea. I spent the summer of 1967 working out
these transformation properties, and what they imply for the structure of
the Lagrangian.15 It turns out that if we require that the pion field has the
usual linear transformation under SO(3) isospin rotations (because isospin
symmetry is supposed to be not spontaneously broken), then there is a
unique SO(4) chiral transformation that takes the pion field into a function
of itself — unique, that is, up to possible redefinition of the field. For an
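infinitesimal SO(4) rotation by an angle $\epsilon$ in the a4 plane (where
a = 1, 2, 3), the pion field $\pi_b$ (labelled with a prime in Eq. (3))
changes by an amount

$$
\delta_a \pi_b = -i\epsilon F_\pi\left[\frac{1}{2}\left(1-\frac{\vec{\pi}^2}{F_\pi^2}\right)\delta_{ab} + \frac{\pi_a\pi_b}{F_\pi^2}\right] . \tag{7}
$$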

Any other field ψ, on which isospin rotations act with a matrix $\vec{t}$, is
changed by an infinitesimal chiral rotation in the a4 plane by an amount

$$
\delta_a \psi = \frac{\epsilon}{F_\pi}\left(\vec{t}\times\vec{\pi}\right)_a \psi . \tag{8}
$$

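This is just an ordinary, though position-dependent, isospin rotation, so a
non-derivative isospin-invariant term in the Lagrangian that does not
involve pions, like the nucleon mass term $-m_N \bar{N}N$, is automatically
chiral-invariant. The terms in Eq. (5):

$$
-\bar{N}\left[\gamma^\mu\partial_\mu + \frac{2i}{F_\pi^2}\,\gamma^\mu\left[1+\frac{\vec{\pi}^2}{F_\pi^2}\right]^{-1}\vec{t}\cdot(\vec{\pi}\times\partial_\mu\vec{\pi})\right]N , \tag{9}
$$

and

$$
-i\,\frac{G_\pi}{m_N}\left[1+\frac{\vec{\pi}^2}{F_\pi^2}\right]^{-1}\bar{N}\gamma^\mu\gamma_5\,\vec{t}\cdot\partial_\mu\vec{\pi}\,N , \tag{10}
$$
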
are simply proportional to the most general chiral-invariant nucleon–pion
interactions with a single spacetime derivative. The coefficient of the term
14 For Schwinger’s own development of this idea, see J. Schwinger, Phys. Lett. 24B, 473 (1967). It is interesting that in deriving the effective field theory of goldstinos in supergravity theories, it is much more transparent to start with a theory with linearly realized supersymmetry and impose constraints analogous to setting $\sigma' = F/2$, than to work from the beginning with supersymmetry realized non-linearly, in analogy to Eq. (7); see Z. Komargodski and N. Seiberg, to be published.
15 S. Weinberg, Phys. Rev. 166, 1568 (1968).

(9) is fixed by the condition that N should be canonically normalized, while
the coefficient of (10) is chosen to agree with the conventional definition
of the pion-nucleon coupling Gπ , and is not directly constrained by chiral
symmetry. The term

$$
-\frac{1}{2}\left[1+\frac{\vec{\pi}^2}{F^2}\right]^{-2}\partial_\mu\vec{\pi}\cdot\partial^\mu\vec{\pi} \tag{11}
$$

is proportional to the most general chiral invariant quantity involving the
pion field and no more than two spacetime derivatives, with a coefficient
fixed by the condition that $\vec{\pi}$ should be canonically normalized. The
chiral symmetry breaking term (6) is the most general function of the pion
field without derivatives that transforms as the fourth component of a chiral
four-vector. None of this relies on the methods of current algebra, though
one can use the Lagrangian (5) to calculate the Noether current corresponding
to chiral transformations, and recover the Goldberger-Treiman relation in
the form $g_A = G_\pi F_\pi/2m_N$.
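
To get a feel for the numbers in the Goldberger-Treiman relation, here is a minimal sketch that solves it for the pion-nucleon coupling. The values g_A ≃ 1.2 and F_π ≃ 184 MeV are the ones quoted in the text; the nucleon mass of 939 MeV and the function name are assumptions added for illustration.

```python
def pion_nucleon_coupling(g_A, m_N, F_pi):
    """Solve the Goldberger-Treiman relation g_A = G_pi * F_pi / (2 * m_N) for G_pi."""
    return 2.0 * m_N * g_A / F_pi

g_A = 1.2       # axial vector coupling, as quoted in the text
F_pi = 184.0    # pion decay amplitude in MeV, as quoted in the text
m_N = 939.0     # nucleon mass in MeV (assumed value, not from the text)

G_pi = pion_nucleon_coupling(g_A, m_N, F_pi)
pseudovector_coupling = G_pi / (2.0 * m_N)   # equals g_A / F_pi by construction

print(f"G_pi = {G_pi:.2f}")
print(f"G_pi / (2 m_N) = {pseudovector_coupling:.5f} per MeV")
```

The second printed quantity is the pseudovector coupling G_π/2m_N = g_A/F_π that replaces 1/F in passing from Eq. (4) to Eq. (5).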
This sort of direct analysis was subsequently extended by Callan, Cole-
man, Wess, and Zumino to the transformation and interactions of the Gold-
stone boson fields associated with the spontaneous breakdown of any Lie
group G to any subgroup H.16 Here, too, the transformation of the Gold-
stone boson fields is unique, up to a redefinition of the fields, and the trans-
formation of other fields under G is uniquely determined by their trans-
formation under the unbroken subgroup H. It is straightforward to work
out the rules for using these ingredients to construct effective Lagrangians
that are invariant under G as well as H.17 Once again, the key point is
that the invariance of the Lagrangian under G would eliminate all presence
of the Goldstone boson field in the Lagrangian if the field were spacetime-
independent, so wherever functions of this field appear in the Lagrangian
they are always accompanied with at least one spacetime derivative.
16 S. Coleman, J. Wess, and B. Zumino, Phys. Rev. 177, 2239 (1969); C. G. Callan, S. Coleman, J. Wess, and B. Zumino, Phys. Rev. 177, 2247 (1969).
17 There is a complication. In some cases, such as SU(3) × SU(3) spontaneously broken to SU(3), fermion loops produce G-invariant terms in the action that are not the integrals of G-invariant terms in the Lagrangian density; see J. Wess and B. Zumino, Phys. Lett. 37B, 95 (1971); E. Witten, Nucl. Phys. B223, 422 (1983). The most general such terms in the action, whether or not produced by fermion loops, have been cataloged by E. D’Hoker and S. Weinberg, Phys. Rev. D50, R6050 (1994). It turns out that for SU(N) × SU(N) spontaneously broken to the diagonal SU(N), there is just one such term for N ≥ 3, and none for N = 1 or 2. For N = 3, this term is the one found by Wess and Zumino.

In the following years, effective Lagrangians with spontaneously broken
SU (2)×SU (2) or SU (3)×SU (3) symmetry were widely used in lowest-order
perturbation theory to make predictions about low energy pion and kaon
interactions.18 But during this period, from the late 1960s to the late 1970s,
like many other particle physicists I was chiefly concerned with developing
and testing the Standard Model of elementary particles. As it happened, the
Standard Model did much to clarify the basis for chiral symmetry. Quan-
tum chromodynamics with N light quarks is automatically invariant under
a SU (N ) × SU (N ) chiral symmetry,19 broken in the Lagrangian only by
quark masses, and the electroweak theory tells us that the currents of this
symmetry (along with the quark number currents) are just those to which
the $W^\pm$, $Z^0$, and photon are coupled.
During this whole period, effective field theories appeared as only a de-
vice for more easily reproducing the results of current algebra. It was dif-
ficult to take them seriously as dynamical theories, because the derivative
couplings that made them useful in the lowest order of perturbation theory
also made them nonrenormalizable, thus apparently closing off the possibil-
ity of using these theories in higher order.
My thinking about this began to change in 1976. I was invited to give a
series of lectures at Erice that summer, and took the opportunity to learn
the theory of critical phenomena by giving lectures about it.20 In preparing
these lectures, I was struck by Kenneth Wilson’s device of “integrating out”
short-distance degrees of freedom by introducing a variable ultraviolet cut-
off, with the bare couplings given a cutoff dependence that guaranteed that
physical quantities are cutoff independent. Even if the underlying theory
is renormalizable, once a finite cutoff is introduced it becomes necessary to
introduce every possible interaction, renormalizable or not, to keep physics
18 For reviews, see S. Weinberg, in Lectures on Elementary Particles and Quantum Field Theory: 1970 Brandeis University Summer Institute in Theoretical Physics, Vol. 1, ed. S. Deser, M. Grisaru, and H. Pendleton (The M.I.T. Press, Cambridge, MA, 1970); B. W. Lee, Chiral Dynamics (Gordon and Breach, New York, 1972).
19 For a while it was not clear why there was not also a chiral U(1) symmetry, that would also be broken in the Lagrangian only by the quark masses, and would either lead to a parity doubling of observed hadrons, or to a new light pseudoscalar neutral meson, both of which possibilities were experimentally ruled out. It was not until 1976 that ’t Hooft pointed out that the effect of triangle anomalies in the presence of instantons produced an intrinsic violation of this unwanted chiral U(1) symmetry; see G. ’t Hooft, Phys. Rev. D14, 3432 (1976).
20 S. Weinberg, “Critical Phenomena for Field Theorists,” in Understanding the Fundamental Constituents of Matter, ed. A. Zichichi (Plenum Press, New York, 1977).

strictly cutoff independent. From this point of view, it doesn’t make much
difference whether the underlying theory is renormalizable or not. Indeed,
I realized that even without a cutoff, as long as every term allowed by sym-
metries is included in the Lagrangian, there will always be a counterterm
available to absorb every possible ultraviolet divergence by renormalization
of the corresponding coupling constant. Non-renormalizable theories, I re-
alized, are just as renormalizable as renormalizable theories.
This opened the door to the consideration of a Lagrangian containing
terms like (5) as the basis for a legitimate dynamical theory, not limited to
the tree approximation, provided one adds every one of the infinite number
of other, higher-derivative, terms allowed by chiral symmetry.21 But for this
to be useful, it is necessary that in some sort of perturbative expansion,
only a finite number of terms in the Lagrangian can appear in each order of
perturbation theory.
In chiral dynamics, this perturbation theory is provided by an expansion
in powers of small momenta and pion masses. At momenta of order mπ , the
number ν of factors of momenta or mπ contributed by a diagram with L
loops, EN external nucleon lines, and Vi vertices of type i, for any reaction
among pions and/or nucleons, is

$$
\nu = \sum_i V_i\left(d_i + \frac{n_i}{2} + m_i - 2\right) + 2L + 2 - \frac{E_N}{2} , \tag{12}
$$

where $d_i$, $n_i$, and $m_i$ are respectively the numbers of derivatives,
factors of nucleon fields, and factors of pion mass (or more precisely, twice
the number of factors of u and d quark masses, since $m_\pi^2$ is linear in
the quark masses) associated with vertices of type i. As a consequence of
chiral symmetry, the minimum possible value of $d_i + n_i/2 + m_i$ is 2, so
the leading diagrams for small momenta are those with L = 0 and any number of
interactions with $d_i + n_i/2 + m_i = 2$, which are the ones given in
Eqs. (5) and (6). To next order in momenta, we may include tree graphs with
any number of vertices with $d_i + n_i/2 + m_i = 2$ and just one vertex with
$d_i + n_i/2 + m_i = 3$ (such as the so-called σ-term). To next order, we
include any number of vertices with $d_i + n_i/2 + m_i = 2$, plus either a
single loop, or a single vertex with $d_i + n_i/2 + m_i = 4$ which provides a
counterterm for the infinity in the loop graph, or two vertices with
$d_i + n_i/2 + m_i = 3$. And so on. Thus one can generate a power series in
momenta and $m_\pi$, in which only a few new constants need to be introduced
at each new order. As an
21 I thought it appropriate to publish this in a festschrift for Julian Schwinger; see footnote 1.

explicit example of this procedure, I calculated the one-loop corrections to
pion–pion scattering in the limit of zero pion mass, and of course I found
the sort of corrections required to this order by unitarity.22
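
The power counting of Eq. (12) is simple enough to mechanize. The sketch below is only an illustration: the `Vertex` class and `chiral_order` function are hypothetical names, and each vertex is described just by the three integers d_i, n_i, m_i that enter the formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    d: int  # number of derivatives
    n: int  # number of nucleon fields
    m: int  # number of factors of pion mass

    @property
    def index(self):
        """The combination d_i + n_i/2 + m_i that appears in Eq. (12)."""
        return self.d + self.n / 2 + self.m

def chiral_order(vertices, loops, external_nucleons):
    """Eq. (12): nu = sum_i V_i (d_i + n_i/2 + m_i - 2) + 2L + 2 - E_N/2.

    `vertices` is a list of (Vertex, V_i) pairs, V_i being the multiplicity.
    """
    vertex_sum = sum(count * (v.index - 2) for v, count in vertices)
    return vertex_sum + 2 * loops + 2 - external_nucleons / 2

pion_kinetic = Vertex(d=2, n=0, m=0)      # two-derivative pion term, index 2
pion_mass = Vertex(d=0, n=0, m=2)         # pion mass term of Eq. (6), index 2
pion_nucleon = Vertex(d=1, n=2, m=0)      # one-derivative pion-nucleon term, index 2
four_derivative = Vertex(d=4, n=0, m=0)   # a counterterm vertex with index 4

tree = chiral_order([(pion_kinetic, 1)], loops=0, external_nucleons=0)
one_loop = chiral_order([(pion_kinetic, 2)], loops=1, external_nucleons=0)
counterterm = chiral_order([(four_derivative, 1)], loops=0, external_nucleons=0)
print(tree, one_loop, counterterm)
```

With these inputs the one-loop pion-pion graph and the tree graph with a single index-4 vertex come out at the same order, two powers of momentum above the leading tree graph, which is the pattern described in the text.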
But even if this procedure gives well-defined finite results, how do we
know they are true? It would be extraordinarily difficult to justify any cal-
culation involving loop graphs using current algebra. For me in 1979, the
answer involved a radical reconsideration of the nature of quantum field the-
ory. From its beginning in the late 1920s, quantum field theory had been
regarded as the application of quantum mechanics to fields that are among
the fundamental constituents of the universe — first the electromagnetic
field, and later the electron field and fields for other known “elementary”
particles. In fact, this became a working definition of an elementary particle
— it is a particle with its own field. But for years in teaching courses on
quantum field theory I had emphasized that the description of nature by
quantum field theories is inevitable, at least in theories with a finite number
of particle types, once one assumes the principles of relativity and quantum
mechanics, plus the cluster decomposition principle, which requires that
distant experiments have uncorrelated results. So I began to think that al-
though specific quantum field theories may have a content that goes beyond
these general principles, quantum field theory itself does not. I offered this
in my 1979 paper as what Arthur Wightman would call a folk theorem: “if
one writes down the most general possible Lagrangian, including all terms
consistent with assumed symmetry principles, and then calculates matrix
elements with this Lagrangian to any given order of perturbation theory,
the result will simply be the most general possible S-matrix consistent with
perturbative unitarity, analyticity, cluster decomposition, and the assumed
symmetry properties.” So current algebra wasn’t needed.
There was an interesting irony in this. I had been at Berkeley from
1959 to 1966, when Geoffrey Chew and his collaborators were elaborating a
program for calculating S-matrix elements for strong interaction processes
by the use of unitarity, analyticity, and Lorentz invariance, without reference
to quantum field theory. I found it an attractive philosophy, because it relied
only on a minimum of principles, all well established. Unfortunately, the S-
matrix theorists were never able to develop a reliable method of calculation,
so I worked instead on other things, including current algebra. Now in 1979
22 Unitarity corrections to soft-pion results of current algebra had been considered earlier by H. Schnitzer, Phys. Rev. Lett. 24, 1384 (1970); Phys. Rev. D2, 1621 (1970); L.-F. Li and H. Pagels, Phys. Rev. Lett. 26, 1204 (1971); Phys. Rev. D5, 1509 (1972); P. Langacker and H. Pagels, Phys. Rev. D8, 4595 (1973).

I realized that the assumptions of S-matrix theory, supplemented by chiral
invariance, were indeed all that are needed at low energy, but the most
convenient way of implementing these assumptions in actual calculations
was by good old quantum field theory, which the S-matrix theorists had
hoped to supplant.
After 1979, effective field theories were applied to strong interactions in
work by Gasser, Leutwyler, Meissner, and many other theorists. My own
contributions to this work were limited to two areas — isospin violation,
and nuclear forces.
At first in the development of chiral dynamics there had been a tacit
assumption that isotopic spin symmetry was a better approximate symme-
try than chiral SU (2) × SU (2), and that the Gell-Mann–Ne’eman SU (3)
symmetry was a better approximate symmetry than chiral SU (3) × SU (3).
This assumption became untenable with the calculation of quark mass ratios
from the measured pseudoscalar meson masses.23 It turns out that the d
quark mass is almost twice the u quark mass, and the s quark mass is very
much larger than either. As a consequence of the inequality of d and u quark
masses, chiral SU (2) × SU (2) is broken in the Lagrangian of quantum chro-
modynamics not only by the fourth component of a chiral four-vector, as in
(6), but also by the third component of a different chiral four-vector pro-
portional to mu − md (whose fourth component is a pseudoscalar). There
is no function of the pion field alone, without derivatives, with the latter
transformation property, which is why pion–pion scattering and the pion
masses are described by (6) and the first term in (5) in leading order, with
no isospin breaking aside of course from that due to electromagnetism. But
there are non-derivative corrections to pion–nucleon interactions,24 which
at momenta of order mπ are suppressed relative to the derivative coupling
terms in (5) by just one factor of $m_\pi$ or momenta:

$$
\Delta'\mathcal{L}_{\rm eff} = -\frac{A}{2}\left(\frac{1-\vec{\pi}^2/F_\pi^2}{1+\vec{\pi}^2/F_\pi^2}\right)\bar{N}N
- B\left[\bar{N}t_3 N - \frac{2}{F_\pi^2}\left(\frac{\pi_3}{1+\vec{\pi}^2/F_\pi^2}\right)\bar{N}\,\vec{t}\cdot\vec{\pi}\,N\right]
- \frac{iC}{1+\vec{\pi}^2/F_\pi^2}\,\bar{N}\gamma_5\,\vec{\pi}\cdot\vec{t}\,N
- \frac{iD\pi_3}{1+\vec{\pi}^2/F_\pi^2}\,\bar{N}\gamma_5 N , \tag{13}
$$

23 S. Weinberg, contribution to a festschrift for I. I. Rabi, Trans. N. Y. Acad. Sci. 38, 185 (1977).
24 S. Weinberg, in Chiral Dynamics: Theory and Experiment, Proceedings of the Workshop Held at MIT, July 1994 (Springer-Verlag, Berlin, 1995). The terms in Eq. (13) that are odd in the pion field are given in Section 19.5 of S. Weinberg, The Quantum Theory of Fields, Vol. II (Cambridge University Press, 1996).


where A and C are proportional to $m_u + m_d$, and B and D are proportional
to $m_u - m_d$, with $B \simeq -2.5$ MeV. The A and B terms contribute
isospin-conserving and isospin-violating terms to the so-called σ-term in
pion–nucleon scattering.
My work on nuclear forces began one day in 1990 while I was lecturing
to a graduate class at Texas. I derived Eq. (12) for the class, and showed
how the interactions in the leading tree graphs with di + ni /2 + mi = 2 were
just those given here in Eqs. (5) and (6). Then, while I was standing at the
blackboard, it suddenly occurred to me that there was one other term with
di + ni /2 + mi = 2 that I had never previously considered: an interaction
with no factors of pion mass and no derivatives (and hence, according to chi-
ral symmetry, no pions), but four nucleon fields — that is, a sum of Fermi
interactions (N̄ ΓN )(N̄ Γ′ N ), with any matrices Γ and Γ′ allowed by Lorentz
invariance, parity conservation, and isospin conservation. This is just the
sort of “hard core” nucleon–nucleon interaction that nuclear theorists had
long known has to be added to the pion-exchange term in theories of nuclear
force. But there is a complication — in graphs for nucleon–nucleon scatter-
ing at low energy, two-nucleon intermediate states make a large contribution
that invalidates the sort of power-counting that justifies the use of the effec-
tive Lagrangian (5), (6) in processes involving only pions, or one low-energy
nucleon plus pions. So it is necessary to apply the effective Lagrangian,
including the terms (N̄ ΓN )(N̄ Γ′ N ) along with the terms (5) and (6), to
the two-nucleon irreducible nucleon–nucleon potential, rather than directly
to the scattering amplitude.25 This program was initially carried further
by Ordoñez, van Kolck, Friar, and their collaborators,26 and eventually by
several others.
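The counting rule used above can be illustrated with a short Python sketch that enumerates the vertex types with a given value of d + n/2 + m. It assumes, following the description in the text, that d counts derivatives, n nucleon fields, and m factors of pion mass. It further assumes that m is even, since quark masses enter as the square of the pion mass. The function name is hypothetical, and the enumeration checks only the index, not whether a given combination is allowed by Lorentz invariance or the other symmetries.

```python
from itertools import product


def vertices_with_index(index):
    """Return all (d, n, m) with d + n/2 + m == index.

    d: number of derivatives
    n: number of nucleon fields (even)
    m: number of factors of pion mass (assumed even in this sketch)
    """
    found = []
    for d, n, m in product(range(index + 1),
                           range(0, 2 * index + 1, 2),
                           range(0, index + 1, 2)):
        # Compare 2*(d + n/2 + m) with 2*index to stay in integer arithmetic.
        if 2 * d + n + 2 * m == 2 * index:
            found.append((d, n, m))
    return found


leading = vertices_with_index(2)
for d, n, m in leading:
    print(f"derivatives={d}, nucleon fields={n}, pion-mass factors={m}")
```

In this enumeration the entry (0, 4, 0) is the four-nucleon contact interaction described in the text. The combination (0, 2, 2) first appears at index 3, which matches the statement that non-derivative pion–nucleon terms such as those in Eq. (13) are suppressed by one extra factor.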
The advent of effective field theories generated changes in point of view
and suggested new techniques of calculation that propagated out to numer-
25
S. Weinberg, Phys. Lett. B251, 288 (1990); Nucl. Phys. B363, 3 (1991); Phys. Lett.
B295, 114 (1992).
26
C. Ordoñez and U. van Kolck, Phys. Lett. B291, 459 (1992); C. Ordoñez, L. Ray,
and U. van Kolck, Phys. Rev. Lett. 72, 1982 (1994); U. van Kolck, Phys. Rev. C49,
2932 (1994); U. van Kolck, J. Friar, and T. Goldman, Phys. Lett. B 371, 169 (1996); C.
Ordoñez, L. Ray, and U. van Kolck, Phys. Rev. C 53, 2086 (1996); C. J. Friar, Few-Body
Systems Suppl. 99, 1 (1996).

ous areas of physics, some quite far removed from particle physics. Notable
here is the use of the power-counting arguments of effective field theory to
justify the approximations made in the BCS theory of superconductivity.27
Instead of counting powers of small momenta, one must count powers of
the departures of momenta from the Fermi surface. Also, general features
of theories of inflation have been clarified by re-casting these theories as
effective field theories of the inflaton and gravitational fields.28
Perhaps the most important lesson from chiral dynamics was that we
should keep an open mind about renormalizability. The renormalizable
Standard Model of elementary particles may itself be just the first term
in an effective field theory that contains every possible interaction allowed
by Lorentz invariance and the SU (3) × SU (2) × U (1) gauge symmetry, only
with the non-renormalizable terms suppressed by negative powers of some
very large mass M , just as the terms in chiral dynamics with more deriva-
tives than in Eq. (5) are suppressed by negative powers of 2πFπ ≈ mN . One
indication that there is a large mass scale in some theory underlying the
Standard Model is the well-known fact that the three (suitably normalized)
running gauge couplings of SU (3)×SU (2)×U (1) become equal at an energy
of the order of $10^{15}$ GeV (or, if supersymmetry is assumed, $2 \times 10^{16}$ GeV,
with better convergence of the couplings).
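The scale quoted above can be made plausible with a one-loop sketch in Python. Everything numerical here is an assumption brought in for illustration and not taken from the text: the approximate inputs at the Z mass, the conventional factor 5/3 in the normalization of the U(1) coupling, and the one-loop Standard Model coefficients. The function names are hypothetical.

```python
import math
from itertools import combinations

M_Z = 91.19  # GeV; approximate Z mass (assumed input)

# Approximate inputs at the Z mass (assumptions, not taken from the text).
ALPHA_EM_INV = 127.9
SIN2_THETA_W = 0.2312
ALPHA_S = 0.118

# Inverse couplings for U(1), SU(2), SU(3), with the U(1) coupling
# normalized by the conventional grand-unified factor 5/3.
ALPHA_INV_MZ = (
    (3.0 / 5.0) * ALPHA_EM_INV * (1.0 - SIN2_THETA_W),
    ALPHA_EM_INV * SIN2_THETA_W,
    1.0 / ALPHA_S,
)

# One-loop coefficients b_i in d(1/alpha_i)/d ln(mu) = -b_i / (2 pi),
# for the Standard Model without supersymmetry.
B_SM = (41.0 / 10.0, -19.0 / 6.0, -7.0)


def alpha_inv(i, mu):
    """Inverse coupling i (0, 1, 2) at scale mu in GeV, at one loop."""
    return ALPHA_INV_MZ[i] - B_SM[i] / (2.0 * math.pi) * math.log(mu / M_Z)


def crossing_scale(i, j):
    """Scale in GeV at which couplings i and j become equal at one loop."""
    t = 2.0 * math.pi * (ALPHA_INV_MZ[i] - ALPHA_INV_MZ[j]) / (B_SM[i] - B_SM[j])
    return M_Z * math.exp(t)


for i, j in combinations(range(3), 2):
    print(f"couplings {i + 1} and {j + 1} cross near {crossing_scale(i, j):.1e} GeV")
```

In this rough approximation the three pairwise crossings do not coincide at a single point but are spread over a few orders of magnitude around the scale mentioned in the text, which fits the remark that the convergence is better when supersymmetry is assumed.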
In 1979 papers by Frank Wilczek and Tony Zee29 and me30 independently
pointed out that, while the renormalizable terms of the Standard Model
cannot violate baryon or lepton conservation,31 this is not true of the higher
27
G. Benfatto and G. Gallavotti, J. Stat. Phys. 59, 541 (1990); Phys. Rev. 42, 9967
(1990); J. Feldman and E. Trubowitz, Helv. Phys. Acta 63, 157 (1990); 64, 213 (1991);
65, 679 (1992); R. Shankar, Physica A177, 530 (1991); Rev. Mod. Phys. 66, 129 (1993);
J. Polchinski, in Recent Developments in Particle Theory, Proceedings of the 1992 TASI,
eds. J. Harvey and J. Polchinski (World Scientific, Singapore, 1993); S. Weinberg, Nucl.
Phys. B413, 567 (1994).
28
C. Cheung, P. Creminelli, A. L. Fitzpatrick, J. Kaplan, and L. Senatore, J. High
Energy Physics 0803, 014 (2008); S. Weinberg, Phys. Rev. D 77, 123541 (2008).
29
F. Wilczek and A. Zee, Phys. Rev. Lett. 43, 1571 (1979).
30
S. Weinberg, Phys. Rev. Lett. 43, 1566 (1979).
31
This is not true if the effective theory contains fields for the squarks and sleptons of
supersymmetry. However, there are no renormalizable baryon or lepton violating terms
in “split supersymmetry” theories, in which the squarks and sleptons are superheavy,
and only the gauginos and perhaps higgsinos survive to ordinary energies; see N. Arkani-
Hamed and S. Dimopoulos, JHEP 0506, 073 (2005); G. F. Giudice and A. Romanino,
Nucl. Phys. B 699, 65 (2004); N. Arkani-Hamed, S. Dimopoulos, G. F. Giudice, and A.
Romanino, Nucl. Phys. B 709, 3 (2005); A. Delgado and G. F. Giudice, Phys. Lett.
B627, 155 (2005).

non-renormalizable terms. In particular, four-fermion terms can generate a
proton decay into antileptons, though not into leptons, with an amplitude
suppressed on dimensional grounds by a factor $M^{-2}$. The conservation of
baryon and lepton number in observed physical processes thus may be an
accident, an artifact of the necessary simplicity of the leading renormalizable
SU (3) × SU (2) × U (1)-invariant interactions. I also noted at the same time
that interactions between a pair of lepton doublets and a pair of scalar
doublets can generate a neutrino mass, which is suppressed only by a factor
$M^{-1}$, and that therefore with a reasonable estimate of M could produce
observable neutrino oscillations. The subsequent confirmation of neutrino
oscillations lends support to the view of the Standard Model as an effective
field theory, with M somewhere in the neighborhood of $10^{16}$ GeV.
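As a rough check of this scale, suppose the neutrino mass is of order $\langle\phi\rangle^2/M$ with a coefficient of order one, where $\langle\phi\rangle \approx 174$ GeV is the vacuum expectation value of the scalar doublet. Both the unit coefficient and this input value are assumptions made for illustration; neither is given in the text.

```latex
\[
m_\nu \sim \frac{\langle\phi\rangle^2}{M}
      \approx \frac{(174\ \mathrm{GeV})^2}{10^{16}\ \mathrm{GeV}}
      \approx 3\times 10^{-12}\ \mathrm{GeV}
      = 3\times 10^{-3}\ \mathrm{eV} .
\]
```

A smaller value of M, or a coefficient larger than one, raises this estimate in proportion.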
Of course, these non-renormalizable terms can be (and in fact, had been)
generated in various renormalizable grand-unified theories by integrating
out the heavy particles in these theories. Some calculations in the result-
ing theories can be assisted by treating them as effective field theories.32
But the important point is that the existence of suppressed baryon- and
lepton-nonconserving terms, and some of their detailed properties, should
be expected on much more general grounds, even if the underlying theory
is not a quantum field theory at all. Indeed, from the 1980s on, it has been
increasingly popular to suppose that the theory underlying the Standard
Model as well as general relativity is a string theory.
Which brings me to gravitation. Just as we have learned to live with the
fact that there is no renormalizable theory of pion fields that is invariant un-
der the chiral transformation (7), so also we should not despair of applying
quantum field theory to gravitation just because there is no renormaliz-
able theory of the metric tensor that is invariant under general coordinate
transformations. It increasingly seems apparent that the Einstein–Hilbert
Lagrangian $\sqrt{g}R$ is just the least suppressed term in the Lagrangian of an
effective field theory containing every possible generally covariant function
of the metric and its derivatives. The application of this point of view to
long range properties of gravitation has been most thoroughly developed
32
The effective field theories derived by integrating out heavy particles had been con-
sidered by T. Appelquist and J. Carrazone, Phys. Rev. D11, 2856 (1975). In 1980, in a
paper titled “Effective Gauge Theories,” I used the techniques of effective field theory to
evaluate the effects of integrating out the heavy gauge bosons in grand unified theories on
the initial conditions for the running of the gauge couplings down to accessible energies:
S. Weinberg, Phys. Lett. 91B, 51 (1980).

by John Donoghue and his collaborators.33 One consequence of viewing
the Einstein–Hilbert Lagrangian as one term in an effective field theory is
that any theorem based on conventional general relativity, which declares
that under certain initial conditions future singularities are inevitable, must
be reinterpreted to mean that under these conditions higher terms in the
effective action become important.
Of course, there is a problem — the effective theory of gravitation cannot
be used at very high energies, say of the order of the Planck mass, no more
than chiral dynamics can be used above a momentum of order 2πFπ ≈ 1
GeV. For purposes of the subsequent discussion, it is useful to express this
problem in terms of the Wilsonian renormalization group. The effective
action for gravitation takes the form
\[
I_{\rm eff} = -\int d^4x\,\sqrt{-{\rm Det}\,g}\,\Big[ f_0(\Lambda) + f_1(\Lambda)R
 + f_{2a}(\Lambda)R^2 + f_{2b}(\Lambda)R^{\mu\nu}R_{\mu\nu}
 + f_{3a}(\Lambda)R^3 + \dots \Big] , \tag{14}
\]

where here Λ is the ultraviolet cutoff, and the fn (Λ) are coupling parame-
ters with a cutoff dependence chosen so that physical quantities are cutoff-
independent. We can replace these coupling parameters with dimensionless
parameters gn (Λ):

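\[
g_0 \equiv \Lambda^{-4} f_0 ; \quad g_1 \equiv \Lambda^{-2} f_1 ; \quad g_{2a} \equiv f_{2a} ; \quad
g_{2b} \equiv f_{2b} ; \quad g_{3a} \equiv \Lambda^{2} f_{3a} ; \ \dots . \tag{15}
\]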

Because they are dimensionless, these parameters must satisfy a renormalization
group equation of the form

\[ \Lambda \frac{d}{d\Lambda}\, g_n(\Lambda) = \beta_n\big(g(\Lambda)\big) . \tag{16} \]
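To make the structure of Eq. (16) concrete, each definition in Eq. (15) can be written as $g_n = \Lambda^{-d_n} f_n$, where $d_n$ is the mass dimension of $f_n$: $d_0 = 4$, $d_1 = 2$, $d_{2a} = d_{2b} = 0$, $d_{3a} = -2$. The label $d_n$ is introduced here for illustration and does not appear in the original. Differentiating then gives a sketch of how each beta function splits into a classical scaling term and a term from the cutoff dependence of $f_n$:

```latex
\[
\Lambda \frac{d g_n}{d\Lambda}
  = -d_n\, g_n + \Lambda^{-d_n}\,\Lambda \frac{d f_n}{d\Lambda} ,
\qquad \text{for example} \qquad
\Lambda \frac{d g_1}{d\Lambda}
  = -2\, g_1 + \Lambda^{-2}\,\Lambda \frac{d f_1}{d\Lambda} .
\]
```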

In perturbation theory, all but a finite number of the gn (Λ) go to infinity
as Λ → ∞, which if true would rule out the use of this theory to calculate
33
J. F. Donoghue, Phys. Rev. D50, 3874 (1994); Phys. Rev. Lett. 72, 2996 (1994); lectures
presented at the Advanced School on Effective Field Theories (Almunecar, Spain, June
1995), gr-qc/9512024; J. F. Donoghue, B. R. Holstein, B. Garbrecht, and T. Konstandin,
Phys. Lett. B529, 132 (2002); N. E. J. Bjerrum-Bohr, J. F. Donoghue, and B. R. Holstein,
Phys. Rev. D68, 084005 (2003).

anything at very high energy. There are even examples, like the Landau pole
in quantum electrodynamics and the phenomenon of “triviality” in scalar
field theories, in which the couplings blow up at a finite value of Λ.
It is usually assumed that this explosion of the dimensionless couplings
at high energy is irrelevant in the theory of gravitation, just as it is irrelevant
in chiral dynamics. In chiral dynamics, it is understood that at energies of
order 2πFπ ≈ mN , the appropriate degrees of freedom are no longer pion
and nucleon fields, but rather quark and gluon fields. In the same way, it is
usually assumed that in the quantum theory of gravitation, when Λ reaches
some very high energy, of the order of $10^{15}$ to $10^{18}$ GeV, the appropriate
degrees of freedom are no longer the metric and the Standard Model fields,
but something very different, perhaps strings.
But maybe not. It is just possible that the appropriate degrees of free-
dom at all energies are the metric and matter fields, including those of the
Standard Model. The dimensionless couplings can be protected from blowing
up if they are attracted to a finite value $g_{n*}$. This is known as asymptotic
safety.34
Quantum chromodynamics provides an example of asymptotic safety,
but one in which the theory at high energies is not only safe from exploding
couplings, but also free. In the more general case of asymptotic safety, the
high energy limit $g_{n*}$ is finite, but not commonly zero.
For asymptotic safety to be possible, it is necessary that all the beta
functions should vanish at $g_{n*}$:

\[ \beta_n(g_*) = 0 . \tag{17} \]

It is also necessary that the physical couplings should be on a trajectory
that is attracted to $g_{n*}$. The number of independent parameters in such
a theory equals the dimensionality of the surface, known as the ultraviolet
critical surface, formed by all the trajectories that are attracted to the fixed
point. This dimensionality had better be finite, if the theory is to have
any predictive power at high energy. For an asymptotically safe theory
with a finite-dimensional ultraviolet critical surface, the requirement that
couplings lie on this surface plays much the same role as the requirement
of renormalizability in quantum chromodynamics — it provides a rational
basis for limiting the complexity of the theory.
This dimensionality of the ultraviolet critical surface can be expressed
in terms of the behavior of βn (g) for g near the fixed point g∗ . Barring
34
This was first proposed in my 1976 Erice lectures; see footnote 20.

unexpected singularities, in this case we have

\[
\beta_n(g) \to \sum_m B_{nm}\,(g_m - g_{m*}) , \qquad
B_{nm} \equiv \left(\frac{\partial \beta_n(g)}{\partial g_m}\right)_{*} . \tag{18}
\]

The solution of Eq. (16) for g near g∗ is then

\[ g_n(\Lambda) \to g_{n*} + \sum_i u_{in}\,\Lambda^{\lambda_i} , \tag{19} \]

where $\lambda_i$ and $u_{in}$ are the eigenvalues and suitably normalized eigenvectors
of $B_{nm}$:

\[ \sum_m B_{nm}\,u_{im} = \lambda_i\,u_{in} . \tag{20} \]

Because Bnm is real but not symmetric, the eigenvalues are either real, or
come in pairs of complex conjugates. The dimensionality of the ultraviolet
critical surface is therefore equal to the number of eigenvalues of Bnm with
negative real part. The condition that the couplings lie on this surface can be
regarded as a generalization of the condition that quantum chromodynamics,
if it were a fundamental and not merely an effective field theory, would have
to involve only renormalizable couplings.
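The eigenvalue criterion above can be sketched in Python for the simplest case of two couplings. The two stability matrices below are hypothetical, chosen only to show one case with a complex-conjugate pair and one with a single attractive direction. The function names are likewise assumptions made for this illustration.

```python
import cmath


def eigenvalues_2x2(B):
    """Eigenvalues of a real 2x2 matrix [[a, b], [c, d]] from its characteristic polynomial."""
    (a, b), (c, d) = B
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + root) / 2.0, (trace - root) / 2.0)


def uv_critical_dimension(eigenvalues):
    """Number of eigenvalues with negative real part (ultraviolet-attractive directions)."""
    return sum(1 for lam in eigenvalues if lam.real < 0)


def mode_size(lam, cutoff):
    """Magnitude of cutoff**lam, one term in the sum of Eq. (19) with unit amplitude."""
    return abs(cutoff ** lam)


# Hypothetical stability matrices, chosen only for illustration.
B_spiral = [[-1.0, 2.0], [-2.0, -1.0]]  # eigenvalues -1 + 2i and -1 - 2i
B_saddle = [[-2.0, 0.0], [0.0, 3.0]]    # eigenvalues 3 and -2

for name, B in (("spiral", B_spiral), ("saddle", B_saddle)):
    eigs = eigenvalues_2x2(B)
    print(name, eigs, "UV critical dimension:", uv_critical_dimension(eigs))
```

A term in Eq. (19) with negative real part of its eigenvalue shrinks as the cutoff grows, while a term with positive real part grows, so only the former directions can lie on a trajectory that reaches the fixed point.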
It may seem unlikely that an infinite matrix like Bnm should have only
a finite number of eigenvalues with negative real part, but in fact examples
of this are quite common. As we learned from the Wilson–Fisher theory
of critical phenomena, when a substance undergoes a second-order phase
transition, its parameters are subject to a renormalization group equation
that has a fixed point, with a single infrared-repulsive direction, so that
adjustment of a single parameter such as the temperature or the pressure
can put the parameters of the theory on an infrared attractive surface of
co-dimension one, leading to long-range correlations. The single infrared-
repulsive direction is at the same time a unique ultraviolet-attractive direc-
tion, so the ultraviolet critical surface in such a theory is a one-dimensional
curve. Of course, the parameters of the substance on this curve do not re-
ally approach a fixed point at very short distances, because at a distance of
the order of the interparticle spacing the effective field theory describing the
phase transition breaks down.
What about gravitation? There are indications that here too there is
a fixed point, with an ultraviolet critical surface of finite dimensionality.
Fixed points have been found (of course with $g_{n*} \neq 0$) using dimensional
continuation from $2+\epsilon$ to 4 spacetime dimensions,35 by a 1/N approximation
(where N is the number of added matter fields),36 by lattice methods,37 and
by use of the truncated exact renormalization group equation,38 initiated
in 1998 by Martin Reuter. In the last method, which had earlier been
introduced in condensed matter physics39 and then carried over to particle
theory,40 one derives an exact renormalization group equation for the total
vacuum amplitude Γ[g, Λ] in the presence of a background metric gµν with
an infrared cutoff Λ. This is the action to be used in calculations of the
true vacuum amplitude in calculations of graphs with an ultraviolet cutoff
Λ. To have equations that can be solved, it is necessary to truncate these
renormalization group equations, writing Γ[g, Λ] as a sum of just a finite
number of terms like those shown explicitly in Eq. (14), and ignoring the
fact that the beta function inevitably does not vanish for the couplings of
other terms in Γ[g, Λ] that in the given truncation are assumed to vanish.
Initially only two terms were included in the truncation of Γ[g, Λ] (a cosmological
constant and the Einstein–Hilbert term $\sqrt{g}R$), and a fixed point
was found with two eigenvalues λi , a pair of complex conjugates with nega-
35
S. Weinberg, in General Relativity, ed. S. W. Hawking and W. Israel (Cambridge
University Press, 1979): 700; H. Kawai, Y. Kitazawa, & M. Ninomiya, Nucl. Phys. B
404, 684 (1993); Nucl. Phys. B 467, 313 (1996); T. Aida & Y. Kitazawa, Nucl. Phys. B
401, 427 (1997); M. Niedermaier, Nucl. Phys. B 673, 131 (2003) .
36
L. Smolin, Nucl. Phys. B208, 439 (1982); R. Percacci, Phys. Rev. D 73, 041501
(2006).
37
J. Ambjørn, J. Jurkiewicz, & R. Loll, Phys. Rev. Lett. 93, 131301 (2004); Phys. Rev.
Lett. 95, 171301 (2005); Phys. Rev. D72, 064014 (2005); Phys. Rev. D78, 063544 (2008);
and in Approaches to Quantum Gravity, ed. D. Oriti (Cambridge University Press).
38
M. Reuter, Phys. Rev. D 57, 971 (1998); D. Dou & R. Percacci, Class. Quant. Grav.
15, 3449 (1998); W. Souma, Prog. Theor. Phys. 102, 181 (1999); O. Lauscher & M.
Reuter, Phys. Rev. D 65, 025013 (2001); Class. Quant. Grav. 19, 483 (2002); M. Reuter
& F. Saueressig, Phys Rev. D 65, 065016 (2002); O. Lauscher & M. Reuter, Int. J. Mod.
Phys. A 17, 993 (2002); Phys. Rev. D 66, 025026 (2002); M. Reuter and F. Saueressig,
Phys Rev. D 66, 125001 (2002); R. Percacci & D. Perini, Phys. Rev. D 67, 081503 (2002);
Phys. Rev. D 68, 044018 (2003); D. Perini, Nucl. Phys. Proc. Suppl. C 127, 185 (2004);
D. F. Litim, Phys. Rev. Lett. 92, 201301 (2004); A. Codello & R. Percacci, Phys. Rev.
Lett. 97, 221301 (2006); A. Codello, R. Percacci, & C. Rahmede, Int. J. Mod. Phys.
A23, 143 (2008); M. Reuter & F. Saueressig, 0708.1317; P. F. Machado and F. Saueressig,
Phys. Rev. D77, 124045 (2008); A. Codello, R. Percacci, & C. Rahmede, Ann. Phys. 324,
414 (2009); A. Codello & R. Percacci, 0810.0715; D. F. Litim 0810.3675; H. Gies & M. M.
Scherer, 0901.2459; D. Benedetti, P. F. Machado, & F. Saueressig, 0901.2984, 0902.4630;
M. Reuter & H. Weyer, 0903.2971.
39
F. J. Wegner and A. Houghton, Phys. Rev. A8, 401 (1973).
40
J. Polchinski, Nucl. Phys. B231, 269 (1984); C. Wetterich, Phys. Lett. B 301, 90
(1993).

tive real part. Then a third operator (Rµν Rµν or the equivalent) was added,
and a third eigenvalue was found, with λi real and negative. This was not
encouraging. If each time that new terms were included in the truncation,
new eigenvalues appeared with negative real part, then the ultraviolet crit-
ical surface would be infinite dimensional, and the theory, though free of
couplings that exploded at high energy, would lose all predictive value at
high energy.
In just the last few years calculations have been done that allow more
optimism. Codello, Percacci, and Rahmede41 have considered a Lagrangian
containing all terms $\sqrt{g}R^n$ with n running from zero to a maximum value
nmax, and find that the ultraviolet critical surface has dimensionality 3 even
when nmax exceeds 2, up to the highest value nmax = 6 that they considered,
for which the space of coupling constants is 7-dimensional. Furthermore,
the three eigenvalues they find with negative real part seem to converge
as nmax increases, as shown in the following table of ultraviolet-attractive
eigenvalues:
nmax =2: −1.38 ± 2.32i −26.8
nmax =3: −2.71 ± 2.27i −2.07
nmax =4: −2.86 ± 2.45i −1.55
nmax =5: −2.53 ± 2.69i −1.78
nmax =6: −2.41 ± 2.42i −1.50

In a subsequent paper42 they added matter fields, and again found just three
ultraviolet-attractive eigenvalues. Further, this year Benedetti, Machado,
and Saueressig43 considered a truncation with a different set of four terms: terms
proportional to $\sqrt{g}R^n$ with n = 0, 1 and 2, and also $\sqrt{g}\,C_{\mu\nu\rho\sigma}C^{\mu\nu\rho\sigma}$ (where
$C_{\mu\nu\rho\sigma}$ is the Weyl tensor), and they too find just three ultraviolet-attractive
eigenvalues, also when matter is added. If this pattern of eigenvalues con-
tinues to hold in future calculations, it will begin to look as if there is a
quantum field theory of gravitation that is well-defined at all energies, and
that has just three free parameters.
The natural arena for application of these ideas is in the physics of
gravitation at small distance scales and high energy — specifically, in the
early universe. A start in this direction has been made by Reuter and his
collaborators,44 but much remains to be done.
I am grateful for correspondence about recent work on asymptotic safety
with D. Benedetti, D. Litim, R. Percacci, and M. Reuter, and to G. Colangelo
and J. Gasser for inviting me to give this talk. This material is based
in part on work supported by the National Science Foundation under Grant
No. PHY-0455649 and with support from The Robert A. Welch Foundation,
Grant No. F-0014.

41
A. Codello, R. Percacci, & C. Rahmede, Int. J. Mod. Phys. A23, 143 (2008).
42
A. Codello, R. Percacci, & C. Rahmede, Ann. Phys. 324, 414 (2009).
43
D. Benedetti, P. F. Machado, & F. Saueressig, 0901.2984, 0902.4630.
44
A. Bonanno and M. Reuter, Phys. Rev. D 65, 043508 (2002); Phys. Lett. B527, 9
(2002); M. Reuter and F. Saueressig, J. Cosm. and Astropart. Phys. 09, 012 (2005).

The Making of the Standard Model

Steven Weinberg∗
Theory Group, Physics Department, University of Texas,
Austin, TX, 78712

This is the edited text of a talk given at CERN on September 16, 2003, as part of
a celebration of the 30th anniversary of the discovery of neutral currents and the 20th
anniversary of the discovery of the W and Z particles.

I have been asked to review the history of the formation of the Standard
Model. It is natural to tell this story as a sequence of brilliant ideas and
experiments, but here I will also talk about some of the misunderstandings
and false starts that went along with this progress, and why some steps were
not taken until long after they became possible. The study of what was not
understood by scientists, or was understood wrongly, seems to me often the
most interesting part of the history of science. Anyway, it is an aspect of
the Standard Model with which I am very familiar, for as you will see in this
talk, I shared in many of these misunderstandings.
I’ll begin by taking you back before the Standard Model to the 1950s.
It was a time of frustration and confusion. The success of quantum elec-
trodynamics in the late 1940s had produced a boom in elementary particle
theory, and then the market crashed. It was realized that the four-fermion
theory of weak interactions had infinities that could not be eliminated by
the technique of renormalization, which had worked so brilliantly in elec-
trodynamics. The four-fermion theory was perfectly good as a lowest-order
approximation, but when you tried to push it to the next order of pertur-
bation theory you encountered unremovable infinities. The theory of strong
interactions had a different problem; there was no difficulty in constructing
renormalizable theories of the strong interactions like the original Yukawa
theory but, because the strong interactions are strong, perturbation theory
was useless, and one could do no practical calculations with these theories.
A deeper problem with our understanding of both the weak and the strong
interactions was that there was no rationale for any of these theories. The
weak interaction theory was simply cobbled together to fit what experimental
data was available, and there was no evidence at all for any particular
theory of strong interactions.

∗ weinberg@physics.utexas.edu

There began a period of disillusionment with quantum field theory. The
community of theoretical physicists tended to split into what at the time
were sometimes called, by analogy with atomic wave functions, radial and
azimuthal physicists. Radial physicists were concerned with dynamics, par-
ticularly the dynamics of the strong interactions. They had little to say
about the weak interactions. Some of them tried to proceed just on the basis
of general principles, using dispersion relations and Regge pole expansions,
and they hoped ultimately for a pure S-matrix theory of the strong inter-
actions, completely divorced from quantum field theory. Weak interactions
would somehow take care of themselves later. Azimuthal physicists were
more modest. They took it as a working rule that there was no point in
trying to understand strong interaction dynamics, and instead they studied
the one sort of thing that could be used to make predictions without such
understanding — principles of symmetry.
But there was a great obstacle in the understanding of symmetry princi-
ples. Many symmetry principles were known, and a large fraction of them
were only approximate. That was certainly true of isotopic spin symmetry,
which goes back to 1936 [1]. Strangeness conservation was known from the
beginning to be violated by the weak interactions [2]. Then in 1956 even the
sacred symmetries of space and time, P and PT conservation, were found
to be violated by the weak interactions [3], and CP conservation was found
in 1964 to be only approximate [4]. The SU(3) symmetry of the “eightfold
way” discovered in the early 1960s [5] was at best only a fair approximation
even for the strong interactions. This left us with a fundamental question.
Many azimuthal physicists had thought that symmetry principles were an
expression of the simplicity of nature at its deepest level. So what are you to
make of an approximate symmetry principle? The approximate simplicity of
nature?
During this time of confusion and frustration in the 1950s and 1960s
there emerged three good ideas. These ideas took a long time to mature,
but have become fundamental to today’s elementary particle physics. I am
emphasizing here that it took a long time before we realized what these ideas
were good for partly because I want to encourage today’s string theorists,
who I think also have good ideas that are taking a long time to mature.
The first of the good ideas that I’ll mention is the quark model, proposed

in 1964 independently by Gell-Mann and Zweig [6]. The idea that hadrons
are made of quarks and antiquarks, used in a naive way, allowed one to make
some sense of the growing menu of hadrons. Also, the naive quark model
seemed to get experimental support from an experiment done at SLAC in
1968 under the leadership of Friedman, Kendall, and Taylor [7], which was
analogous to the experiment done by Geiger and Marsden in Rutherford’s
laboratory in 1911. Geiger and Marsden had found that alpha particles were
sometimes scattered by gold atoms at large angles, and Rutherford inferred
from this that the mass of the atoms was concentrated in something like a
point particle, which became known as the nucleus of the atom. In the same
way, the SLAC experiment found that electrons were sometimes scattered
from nucleons at large angles, and this was interpreted by Feynman and
Bjorken [8] as indicating that the neutron and proton consisted of point
particles. It was natural to identify these “partons” with Gell-Mann and
Zweig’s quarks. But of course the mystery about all this was why no one
ever saw quarks. Why, for example, did oil drop experiments never reveal
third integer charges? I remember Dalitz and Lipkin at various conferences
showing all the successful predictions of the naive quark model for hadron
systematics, while I sat there remaining stubbornly unconvinced, because
everyone knew that quarks had been looked for and not found.
The second of the good ideas that were extant in the 1950s and 1960s was
the idea of gauge (or local) symmetry. (Of course electrodynamics was much
older, and could have been regarded as based on a U(1) gauge symmetry,
but that wasn’t the point of view of the theorists who developed quantum
electrodynamics in the 1930s.) Yang and Mills [9] in 1954 constructed a
gauge theory based not on the simple one-dimensional group U(1) of elec-
trodynamics, but on a three-dimensional group, the group SU(2) of isotopic
spin conservation, in the hope that this would become a theory of the strong
interactions. This was a beautiful theory because the symmetry dictated
the form of the interactions. In particular, because the gauge group was
non-Abelian (the “charges” do not commute with each other) there was a
self-interaction of the gauge bosons, like the self-interactions of gravitons in
general relativity. This was just the sort of thing that brings joy to the heart
of an elementary particle theorist.
The quantization of non-Abelian gauge theories was studied by a number
of other theorists [10], generally without any idea of applying these theories
immediately to known interactions. Some of these theorists developed the

theory of the quantization of Yang–Mills theories as a warm-up exercise for
the problem they really wanted to solve, the quantization of general relativity.
It took a few years before physicists began to apply the Yang–Mills idea to
the weak interactions. This was in part because in 1954, as you may recall,
the beta decay interactions were known to be a mixture of scalar, tensor,
and perhaps pseudoscalar four-fermion interactions. This was the result of a
series of wrong experiments, each one of which as soon as it was discovered to
be wrong was replaced by another wrong experiment. It wasn’t until 1957–
58 that it became generally realized that the weak interactions are in fact a
mixture of vector and axial vector interactions [11], of the sort that would
be produced by intermediate vector bosons.
Theories of intermediate vector bosons were then developed by several
authors [12], but generally, except for the papers by Bludman in 1958 and by
Salam and Ward in 1964, without reference to non-Abelian local symmetries.
(For instance, with the exceptions noted, these papers did not include the
quadrilinear interactions among vector bosons characteristic of theories with
non-Abelian local symmetries.) I will have more to say about some of these
papers later.
From the beginning, the chief obstacle to the application of the Yang–
Mills approach to theories of either the weak or the strong interactions was
the problem of mass. Gauge symmetry forbids the gauge bosons from having
any mass, and it was supposed that any massless gauge bosons would surely
have been detected. In all the papers of ref. 12 a mass was put in by hand,
but this would destroy the rationale for a gauge theory; the local symmetry
principle that motivates such theories would be violated by the insertion of
a mass. Obviously also the arbitrary insertion of mass terms makes theories
less predictive. Finally, through the work of several authors [13] in the 1960s,
it was realized that non-Abelian gauge theories with mass terms inserted by
hand are non-renormalizable, and therefore in this respect do not represent
an advance over the original four-fermion weak interaction.
The third of the good ideas that I wished to mention was the idea of spon-
taneously broken symmetry: there might be symmetries of the Lagrangian
that are not symmetries of the vacuum. Physicists came to this idea through
two rather different routes.
The first route was founded on a fundamental misunderstanding. Re-
member that for some time there had been a problem of understanding the
known approximate symmetries. Many of us, including myself, were at first

under the illusion that if you had an exact symmetry of the field equations
of nature which was spontaneously broken then it would appear experimen-
tally as an approximate symmetry. This is quite wrong, but that’s what we
thought. (Heisenberg continued to believe this as late as 1975 [14].) At first
this seemed to offer a great hope of understanding the many approximate
symmetries, like isotopic spin, the 8-fold way, and so on. Thus it was re-
garded as a terrible setback in 1961 when Goldstone announced a theorem
[15], proved by Goldstone, Salam and myself [16] the following year, that
for every spontaneously broken symmetry there must be a massless spinless
particle. We knew that there were no such massless Goldstone bosons in
strong-interaction physics — they would have been obvious many years be-
fore — so this seemed to close off the opportunities provided by spontaneous
symmetry breaking. Higgs [17] in 1964 was motivated by this disappointment
to try to find a way out of the Goldstone theorem. He recognized that the
Goldstone theorem would not apply if the original symmetry was not just
a global symmetry like isotopic spin conservation, but a gauge symmetry
like the local isotopic spin symmetry of the original Yang–Mills theory. The
Goldstone boson then remains in the theory, but it turns into the helicity-zero
part of a gauge boson, which thereby gets a mass. At about the same time
Englert and Brout [18] independently discovered the same phenomenon, but
with a different motivation: they hoped to go back to the idea of using the
Yang–Mills theory to construct a theory of the strong interactions mediated
by massive vector bosons. This phenomenon had also been noted earlier by
Anderson [19], in a non-relativistic context.
The second of the routes to broken symmetry was the study of the currents
of the semi-leptonic weak interactions, the vector and axial-vector currents.
In 1958 Goldberger and Treiman [20] gave a derivation of a relation between
the pion decay constant, the axial vector coupling constant of beta decay,
and the strong coupling constant. The relation worked better than would be
expected from the rather implausible approximations used. It was in order to
explain the success of the Goldberger–Treiman relation that several theorists
[21] in the following years developed the idea of a partially conserved axial-
vector current, that is, an axial-vector current whose divergence was not zero
but was proportional to the pion field. Taken literally, this was a meaningless
proposition, because any field operator that has the right quantum numbers,
such as the divergence of the axial-vector current, can be called the pion
field. Nature does not single out specific operators as the field of this or that
particle. This idea was greatly clarified by Nambu [22] in 1960. He pointed
out that in an ideal world, where the axial-vector current was not partially
conserved but exactly conserved, the existence of a non-vanishing nucleon
mass and axial vector coupling would require the pion to be a particle of
zero mass. At sufficiently small momentum transfer this massless pion would
dominate the pseudoscalar part of the one-nucleon matrix element of the axial
vector current, which leads to the same Goldberger–Treiman result that had
previously motivated the notion of partial current conservation. Nambu and
Jona-Lasinio [23] worked out a dynamical model in which the axial-vector
current would be exactly conserved, and showed that the spectrum of bound
states did indeed include a massless pion.
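The Goldberger–Treiman relation mentioned above can be checked with a few lines of arithmetic. This is only a minimal sketch: it assumes the common normalization in which the pion decay constant is about 92 MeV (other conventions differ by factors of 2 or √2), and the input numbers are rounded present-day values, not figures taken from this text. The function name is hypothetical.

```python
# Goldberger–Treiman relation in the assumed normalization (f_pi ≈ 92 MeV):
#     g_piNN ≈ g_A * m_N / f_pi
# All numerical inputs below are rounded illustrative values (assumptions).

def goldberger_treiman_coupling(g_a: float, m_nucleon_mev: float, f_pi_mev: float) -> float:
    """Pion-nucleon coupling predicted from beta-decay and pion-decay data."""
    return g_a * m_nucleon_mev / f_pi_mev

G_A = 1.27             # axial-vector coupling constant of beta decay
M_NUCLEON_MEV = 938.9  # average nucleon mass in MeV
F_PI_MEV = 92.3        # pion decay constant in MeV (assumed convention)

if __name__ == "__main__":
    g = goldberger_treiman_coupling(G_A, M_NUCLEON_MEV, F_PI_MEV)
    print(f"predicted pion-nucleon coupling: {g:.1f}")
```

With these inputs the predicted coupling comes out close to 13, within a few percent of the value extracted from pion-nucleon scattering, which is the sense in which the relation "worked better than would be expected".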
In this work there was little discussion of spontaneously broken symme-
try. In particular, because the work of Nambu and his collaborators [24] on
soft-pion interactions only involved a single soft pion, it was not necessary to
identify a particular broken symmetry group. In much of their work it was
taken to be a simple U(1) symmetry group. Nambu et al., like Gell-Mann
et al. [21], emphasized the properties of the currents of beta decay rather
than broken symmetry. Nambu, especially in the paper with Jona-Lasinio,
described what he was doing as an analog to the successful theory of super-
conductivity of Bardeen, Cooper and Schrieffer [25]. A superconductor is
nothing but a place where electromagnetic gauge invariance is spontaneously
broken, but you will not find that statement or any mention of spontaneously
broken symmetry anywhere in the classic BCS paper. Anderson [19] did re-
alize the importance of spontaneous symmetry breaking in the theory of
superconductivity, but he was almost the only condensed matter physicist
who did.
The currents of the semi-leptonic weak interactions remained the preoc-
cupation of Gell-Mann and others, who proposed working with them the way
Heisenberg had worked with atomic electric dipole transition matrix elements
in his famous 1925 paper on quantum mechanics, that is, by deriving com-
mutation relations for the currents and then saturating them by inserting
sums over suitable intermediate states [26]. This was the so-called current
algebra program. Among other things, this approach was used by Adler and
Weisberger to derive their celebrated formula for the axial-vector coupling
constant of beta decay [27].
Sometime around 1965 we began to understand all these developments
and how they were related to each other in a more modern way. It was realized
that the strong interactions must have a broken symmetry, SU(2) ×
SU(2), consisting of ordinary isotopic spin transformations plus chiral iso-
topic spin transformations acting oppositely on the left and right-handed
parts of nucleon fields. Contrary to what I and others had thought at first,
such a broken symmetry does not look in the laboratory like an ordinary ap-
proximate symmetry. If it is an exact symmetry, but spontaneously broken,
the symmetry implications are found in precise predictions for the low-energy
interactions of the massless Goldstone bosons, which for SU(2)×SU(2) would
be the pions. Among these “soft pion” formulas is the Goldberger–Treiman
relation, which should be read as a formula for the pion-nucleon coupling
at zero pion momentum. Of course SU(2) × SU(2) is only an approximate
symmetry of the strong interactions, so the pion is not a massless particle,
but is what (over Goldstone’s objections) I later called a pseudo-Goldstone
boson, with an exceptionally small mass.
From this point of view one can calculate things having nothing to do with
the electro-weak interactions, nothing to do with the semi-leptonic vector and
axial vector currents, but that refer solely to the strong interactions. Starting
in 1965, the pion-nucleon scattering lengths were calculated independently by
Tomozawa and myself [28], and I calculated the pion-pion scattering lengths
[29]. Because these processes involve more than one soft pion, the results of
these calculations depended critically on the SU(2) × SU(2) symmetry. This
work had a twofold impact. One is that it tended to kill off the S-matrix
approach to the strong interactions, because although there was nothing
wrong with the S-matrix philosophy, its practical implementation relied on
the pion-pion interaction being rather strong at low energy, while these new
results showed that the interaction is in fact quite weak at low energy. This
work also tended for a while to reduce interest in what Higgs and Brout and
Englert had done, for we no longer wanted to get rid of the nasty Goldstone
bosons (as had been hoped particularly by Higgs), because now the pion was
recognized as a Goldstone boson, or very nearly.
This brings me to the electroweak theory, as developed by myself [30], and
independently by Salam [31]. Unfortunately Salam is not with us to describe
the chain of reasoning that led him to this theory, so I can only speak about
my own work. My starting point in 1967 was the old aim, going back to
Yang and Mills, of developing a gauge theory of the strong interactions,
but now based on the symmetry group that underlies the successful soft-
pion predictions, the symmetry group SU(2) × SU(2) [32]. I supposed that
the vector gauge boson of this theory would be the ρ-meson, which was
an old idea, while the axial-vector gauge boson would be the a1 meson,
an enhancement in the π–ρ channel which was known to be needed to
saturate certain spectral function sum rules, which I had developed a little
earlier that year [33]. Taking the SU(2) × SU(2) symmetry to be exact
but spontaneously broken, I encountered the same result found earlier by
Higgs and Brout and Englert; the Goldstone bosons disappeared and the
a1 meson became massive. But with the isotopic spin subgroup unbroken,
then (in accordance with a general result of Kibble [34]) the ρ-meson would
remain massless. I could of course put in a common mass for the a1 and ρ by
hand, which at first gave encouraging results. The pion now reappeared as a
Goldstone boson, and the spontaneous breaking of the symmetry made the
a1 mass larger than the ρ mass by a factor of the square root of two, which
was just the ratio that had come out of the spectral function sum rules. For
a while I was encouraged, but the theory was really too ugly. It was the same
old problem: putting in a ρ-meson mass or any gauge boson mass by hand
destroyed the rationale for the theory and made the theory less predictive,
and it also made the theory not renormalizable. So I was very discouraged.
Then it suddenly occurred to me that this was a perfectly good sort of
theory, but I was applying it to the wrong kind of interaction. The right
place to apply these ideas was not to the strong interactions, but to the weak
and electromagnetic interactions. There would be a spontaneously broken
gauge symmetry (probably not SU(2) × SU(2)) leading to massive gauge
bosons that would have nothing to do with the a1 meson but could rather be
identified with the intermediate vector bosons of the weak interactions. There
might be some generator of the gauge group that was not spontaneously
broken, and the corresponding massless gauge boson would not be the ρ
meson, but the photon. The gauge symmetry would be exact; there would
be no masses put in by hand.
I needed a concrete model to illustrate these general ideas. At that time
I didn’t have any faith in the existence of quarks, and so I decided only
to look at the leptons, and somewhat arbitrarily I decided to consider only
symmetries that acted on just one generation of leptons, separately from
antileptons — just the left-handed electron and electron-type neutrino, and
the right-handed electron. With those ingredients, the largest gauge group
you could possibly have would be SU(2) × U(1) × U(1). One of the U(1)s
could be taken to be the gauge group of lepton conservation. Now, I knew
that lepton number was conserved to a high degree of accuracy, so this U(1)
symmetry was presumably not spontaneously broken, but I also knew that
there was no massless gauge boson associated with lepton number, because
according to an old argument of Lee and Yang [35] it would produce a force
that would compete with gravitation. So I decided to exclude this part of the
gauge group, leaving just SU(2) × U(1) gauge symmetry. The gauge bosons
were then the charged massive particle (and its antiparticle) that had tradi-
tionally been called the W; a neutral massive vector particle that I called the
Z; and the massless photon. The interactions of these gauge bosons with the
leptons and with each other were fixed by the gauge symmetry. Afterwards I
looked back at the literature on intermediate vector boson theories from the
late 1950s and early 1960s, and I found that the global SU(2) × U(1) group
structure had already been proposed in 1961 by Glashow [12]. I only learned
later of the independent 1964 work of Salam and Ward [12]. I think the
reason that the four of us had independently come to the same SU(2) × U(1)
group structure is simply that with these fermionic ingredients, just one
generation of leptons, there is no other group you can be led to. But now the
theory was based on an exact though spontaneously broken gauge symmetry.
The spontaneous breakdown of this symmetry had not only to give mass
to the intermediate vector bosons of the weak interactions, it also had to give
mass to the electron (and also, in another lepton doublet, to the muon). The
only scalar particles whose vacuum expectation values could give mass to
the electron and the muon would have to form SU(2) × U(1) doublets with
charges +e and zero. For simplicity, I assumed that these would be the only
kind of scalar fields in the theory. That made the theory quite predictive. It
allowed the masses of the W and the Z as well as their couplings to be calcu-
lated in terms of a single unknown angle θ. Whatever the value of θ, the W
and Z masses were predicted to be quite large, large enough to have escaped
detection. The same results apply with several scalar doublets. (These pre-
dictions by the way could also have been obtained in a “technicolor” theory
in which the electroweak gauge symmetry is spontaneously broken by strong
forces, as realized twelve years later by Susskind and myself [36]. This is still
a possibility, but such technicolor theories have problems, and I’m betting
on the original scalar doublet or doublets.)
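The statement that the W and Z masses follow from a single angle θ, and are large whatever its value, can be made concrete with the standard lowest-order formulas m_W² = πα / (√2 G_F sin²θ) and m_Z = m_W / cos θ. The sketch below is illustrative only: it ignores radiative corrections, uses rounded present-day values of α and G_F as assumed inputs, and the function name is hypothetical.

```python
import math
from typing import Tuple

ALPHA = 1.0 / 137.036    # fine structure constant, low-energy value (assumed input)
G_FERMI = 1.1663787e-5   # Fermi coupling constant in GeV^-2 (assumed input)

def tree_level_masses(sin2_theta: float) -> Tuple[float, float]:
    """Return lowest-order (m_W, m_Z) in GeV for a given sin^2(theta)."""
    # Overall scale sqrt(pi * alpha / (sqrt(2) * G_F)), roughly 37.3 GeV.
    scale = math.sqrt(math.pi * ALPHA / (math.sqrt(2.0) * G_FERMI))
    m_w = scale / math.sqrt(sin2_theta)
    m_z = m_w / math.sqrt(1.0 - sin2_theta)
    return m_w, m_z

if __name__ == "__main__":
    for s2 in (0.10, 0.23, 0.25, 0.50):
        m_w, m_z = tree_level_masses(s2)
        print(f"sin^2(theta) = {s2:.2f}:  m_W ≈ {m_w:.1f} GeV,  m_Z ≈ {m_z:.1f} GeV")
```

Because sin θ ≤ 1 and sin θ cos θ ≤ 1/2, these formulas give m_W above about 37 GeV and m_Z above about 75 GeV for any θ, far beyond the reach of accelerators in 1967. The measured masses are somewhat higher than the lowest-order numbers because of radiative corrections that this sketch leaves out.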
In addition to predicting the masses and interactions of the W and Z
in terms of a single angle, the electroweak theory made another striking
prediction which could not be verified at the time, and still has not been. A
single scalar doublet of complex scalar fields can be written in terms of four
real fields. Three of the gauge symmetries of SU(2)×U(1) are spontaneously
broken, which eliminates the three Goldstone bosons associated with these
fields. This leaves over one massive neutral scalar particle, as a real particle
that can be observed in the laboratory. This particle, which first made
its appearance in the physics literature in 1967 [30], has so far not made
its appearance in the laboratory. Its couplings were already predicted in
this paper, but its mass is still unknown. To distinguish this particle from
the Goldstone bosons it has come to be called the Higgs boson, and it is
now a major target of experimental effort. With several doublets (as in
supersymmetry theories) there would be several of these particles, some of
them charged.
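The counting in this paragraph can be written out as simple bookkeeping of independent states before and after symmetry breaking. This is a sketch under the usual assumptions (two helicity states for a massless vector boson, three for a massive one); the dictionary labels are illustrative names, not notation from the text.

```python
# Degrees of freedom in the one-doublet SU(2) x U(1) model.
# Massless vector boson: 2 helicity states. Massive vector boson: 3.

before_breaking = {
    "real scalar fields in one complex doublet": 4 * 1,
    "massless gauge bosons of SU(2) x U(1)": 4 * 2,
}

after_breaking = {
    "massive W+, W-, Z": 3 * 3,           # each absorbs one Goldstone boson
    "massless photon": 1 * 2,
    "physical neutral scalar particle": 1 * 1,
}

total_before = sum(before_breaking.values())
total_after = sum(after_breaking.values())

if __name__ == "__main__":
    print("before:", total_before, "after:", total_after)
```

The totals agree: the three Goldstone bosons have not vanished but supply the helicity-zero states of the W+, W- and Z, leaving exactly one scalar degree of freedom over.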
Both Salam and I guessed that the electroweak theory is renormalizable,
because we had started with a theory that was manifestly renormalizable.
But the theory with spontaneous symmetry breaking had a new perturba-
tive expansion, and the question was whether or not renormalizability was
preserved in the new perturbation theory. We both said that we thought
that it was, but didn’t prove it. I can’t answer for Salam, but I can tell you
why I didn’t prove it. It was because at that time I disliked the only method
by which it could be proved — the method of path integration. There are
two alternative approaches to quantization: the old operator method that
goes back to the 1920s, and Feynman path-integration [37]. When I learned
the path-integration approach in graduate school and subsequent reading,
it seemed to me to be no more powerful than the operator formalism, but
with a lot more hand-waving. I tried to prove the renormalizability of the
electroweak theory using the most convenient gauge that can be introduced
in the operator formalism, called unitarity gauge, but I couldn’t do it [38].
I suggested the problem to a student [39], but he couldn’t do it either, and
to this day no one has done it using this gauge. What I didn’t realize was
that the path-integral formalism allows the use of gauges that cannot be
introduced as a condition on the operators in a quantum field theory, so it
gives you a much larger armamentarium of possible gauges in which gauge
invariant theories can be formulated.
Although I didn’t understand the potentialities of path integration, Veltman
and his student ’t Hooft did. In 1971 ’t Hooft used path integration
to define a gauge in which it was obvious that spontaneously broken non-
Abelian gauge theories with only the simplest interactions had a property
that is essential to renormalizability, that in all orders of perturbation the-
ory there are only a finite number of infinities [40]. This did not quite prove
that the theory was renormalizable, because the Lagrangian is constrained
by a spontaneously broken but exact gauge symmetry. In the ’t Hooft gauge
it was obvious that there were only a finite number of infinities, but how
could one be sure that they exactly match the parameters of the original
theory as constrained by gauge invariance, so that these infinities can be
absorbed into a redefinition of the parameters? This was initially proved in
1972 by Lee and Zinn-Justin [41] and by ’t Hooft and Veltman [42], and
later in an elegant formalism by Becchi, Rouet, and Stora, and by Tyutin
[43]. But I must say that after ’t Hooft’s original 1971 paper (and, for me,
a subsequent related paper by Ben Lee [44]), most theorists were pretty well
convinced that the theory was renormalizable, and at least among theorists
there was a tremendous upsurge of interest in this kind of theory.
From today’s perspective, it may seem odd that so much attention was
focused on the issue of renormalizability. Like general relativity, the old the-
ory of weak interactions based on four-fermion interactions could have been
regarded as an effective quantum field theory [45], which works perfectly well
at sufficiently low energy, and with the introduction of a few additional free
parameters even allows the calculation of quantum corrections. The expan-
sion parameter in such theories is the energy divided by some characteristic
mass, and as long as you work to a given order in the energy you will only
need a finite number of coupling types, so that the coupling parameters can
absorb all of the infinities. But such theories inevitably lose all predictive
power at energies above the characteristic mass. For the four-fermion theory
of weak interactions it was clear that the characteristic mass was no greater
than about 300 GeV, and as we now know, it is actually of the order of the
W mass. The importance of the renormalizability of the electroweak theory
was not so much that infinities could be removed by renormalization, but
rather that the theory had the potentiality of describing weak and electro-
magnetic interactions at energies much greater than 300 GeV, and perhaps
all the way up to the Planck scale. The search for a renormalizable theory
of weak interactions was the right strategy but, as it turned out, not for the
reasons we originally thought.
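One common way to see where a scale of about 300 GeV comes from is dimensional analysis: the four-fermion coupling G_F has dimensions of inverse mass squared, so the only mass it defines is G_F^(-1/2). The sketch below just evaluates that number, using a rounded present-day value of G_F as an assumed input; it is not a reconstruction of the argument behind the figure quoted in the text.

```python
import math

G_FERMI = 1.1663787e-5  # Fermi coupling constant in GeV^-2 (assumed input)

def four_fermion_scale_gev(g_fermi: float = G_FERMI) -> float:
    """Mass scale G_F^(-1/2), in GeV, set by the four-fermion coupling."""
    return 1.0 / math.sqrt(g_fermi)

if __name__ == "__main__":
    print(f"G_F^(-1/2) ≈ {four_fermion_scale_gev():.0f} GeV")
```

The result is a little under 300 GeV. In the electroweak theory the true characteristic mass is lower, of the order of the W mass, because G_F is proportional to the small gauge coupling squared divided by the W mass squared.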
These attractive features of the electroweak theory did not mean that the
theory was true — that was a matter for experiment. After the demonstration
that the electroweak theory is renormalizable, its experimental consequences
began to be taken seriously. The theory predicted the existence of neutral
currents, but this was an old story. Suggestions of neutral weak currents can
be traced back to 1937 papers of Gamow and Teller, Kemmer, and Wentzel
[46]. Neutral currents had appeared in the 1958 paper by Bludman and in all
the subsequent papers in ref. 12, including of course those of Glashow and
of Salam and Ward. But now there was some idea about their strength. In
1972 I looked at the question of how easy it would be to find semi-leptonic
neutral current processes, and I found that although in the electroweak theory
they are somewhat weak compared to the ordinary charged-current weak
interactions, they were not too weak to be seen [47]. In particular, I pointed
out that the ratio of elastic neutrino-proton scattering to the corresponding
inelastic charged-current reaction would have a value between 0.15 and 0.25,
depending on the value of the unknown angle θ. A 1970 experiment [48] had
given a value of 0.12 ± 0.06 for this ratio, but the experimenters
didn’t believe that they were actually seeing neutral currents, so they didn’t
claim to have observed a neutral current reaction at a level of roughly 12%
of the charged current reaction, and instead quoted this result as an upper
bound. The minimum theoretical value 0.15 of this ratio applies for sin² θ =
0.25, which is not far from what we now know is the correct value. I suspect
that this 1970 experiment had actually observed neutral currents, but you
get credit for making discoveries only when you claim that you have made
the discovery.
Neutral currents were discovered in 1973 at CERN [49]. I suspect that
this will be mentioned later today, so I won’t go into it here. At first the
data on neutral current reactions looked like it exactly fit the electroweak
theory, but then a series of other experiments gave contrary results. The
most severe challenge came in 1976 from two atomic physics experiments
[50] that seemed to show that there was no parity violation in the bismuth
atom at the level that would be expected to be produced by neutral current
electron-nucleon interactions in the electroweak theory. For most theorists
these experiments did not challenge the basic idea that weak interactions arise
from a spontaneously broken gauge symmetry, but they threw serious doubt
on the specific SU(2) × U(1) implementation of the idea. Many other models
were tried during this period, all sharing the property of being terribly ugly.
Finally, parity violation in the neutral currents was discovered at the expected
level in electron–nucleon scattering at SLAC in 1978 [51], and after that
most physicists took it for granted that the electroweak theory is essentially
correct.
The other half of the Standard Model is quantum chromodynamics. By
the early 1970s the success of the electroweak theory had restored interest in
Yang–Mills theory. In 1973 Gross and Wilczek and Politzer independently
discovered that non-Abelian gauge theories have the remarkable property of
asymptotic freedom [52]. They used renormalization group methods due
to Gell-Mann and Low [53], which had been revived in 1970 by Callan,
Symanzik, Coleman and Jackiw [54], to define an effective gauge coupling
constant as a function of energy, and they showed that in Yang–Mills theo-
ries with not too many fermions this coupling goes to zero as the energy goes
to infinity. (’t Hooft had found this result and announced it at a conference
in 1972, but he waited to publish this result and work out its implications
while he was doing other things, so his result did not attract much atten-
tion.) It was already known both from baryon systematics and from the rate
of neutral pion decay into two photons that quarks of each flavor u, d, s, etc.
must come in three colors [55], so it was natural to take the gauge symmetry
of the strong interactions as an SU(3) gauge group acting on the three-valued
color quantum number of the quarks. Subsequent work [56] by Gross and
Wilczek and by Georgi and Politzer using the Wilson operator product ex-
pansion [57] showed that the decrease of the strong coupling constant with
increasing energy in this theory explained why “partons” had appeared to
be weakly coupled in the 1968 Friedman–Kendall–Taylor experiment [7].
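The behavior described above, a coupling that falls toward zero with increasing energy provided there are not too many fermions, can be illustrated with the standard one-loop formula for an SU(N) gauge theory with n_f quark flavors. This is a rough sketch only: it keeps the number of flavors fixed, ignores quark-mass thresholds and higher-loop terms, and the reference value of the coupling at 91.19 GeV is a rounded assumed input. The function names are hypothetical.

```python
import math

def beta0(n_colors: int, n_flavors: int) -> float:
    """One-loop coefficient; positive means the coupling decreases with energy."""
    return 11.0 * n_colors / 3.0 - 2.0 * n_flavors / 3.0

def alpha_one_loop(q_gev: float, alpha_ref: float, mu_gev: float,
                   n_colors: int = 3, n_flavors: int = 5) -> float:
    """One-loop coupling at energy q_gev, given its value alpha_ref at mu_gev."""
    b0 = beta0(n_colors, n_flavors)
    log_term = math.log(q_gev ** 2 / mu_gev ** 2)
    return alpha_ref / (1.0 + alpha_ref * b0 / (4.0 * math.pi) * log_term)

if __name__ == "__main__":
    ALPHA_REF, MU_GEV = 0.118, 91.19  # assumed reference point
    for q in (10.0, 91.19, 1000.0, 1.0e6):
        print(f"Q = {q:>10.1f} GeV   alpha_s ≈ {alpha_one_loop(q, ALPHA_REF, MU_GEV):.4f}")
```

For SU(3) the coefficient 11 - 2 n_f / 3 stays positive up to 16 flavors and changes sign at 17, which is the quantitative content of "not too many fermions".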
But a big problem remained: what is one to do with the massless SU(3)
gauge bosons, the gluons? The original papers [52] of Politzer and Gross and
Wilczek suggested that the reason why massless gluons are not observed is
that the gauge symmetry is spontaneously broken, just as in the electroweak
theory. The gluons could then be assumed to be too heavy to observe. Very
soon afterwards a number of authors independently suggested an alternative,
that the gauge symmetry is not broken at all, the gluons are in fact massless,
but we don’t see them for the same reason that we don’t see the quarks, which
is that, as a result of the peculiar infrared properties of non-Abelian gauge
theories, color is trapped; color particles like quarks and gluons can never
be isolated [58]. This has never been proved. There is now a million dollar
prize offered by the Clay Mathematics Institute to anyone who succeeds in proving it
rigorously, but since it is true I for one am happy to leave the proof to the
mathematicians.
One of the great things that came out of this period of the development
of the electroweak and the strong interaction theories is an understanding
at long last of the old approximate symmetries. It was now understood
that these symmetries were approximate because they weren’t fundamental
symmetries at all; they were just accidents. Renormalizable quantum chro-
modynamics must respect strangeness conservation and charge conjugation
invariance, and, aside from a non-perturbative effect that I don’t have time
to go into, it must also respect parity and time reversal invariance. You
cannot introduce any renormalizable interaction into the theory that would
violate those symmetries. This would not be true if scalar fields participated
in the strong interactions, as in the old Yukawa theory. This result was not
only aesthetically pleasing, but crucial, because if there were possible renor-
malizable interactions that violated, say, strangeness conservation, or parity,
then even if you didn’t put such interactions in the theory, higher order weak
interactions would generate them at first order in the fine structure constant
[59]. There would then be violations of parity and strangeness conservation
in the strong interactions at a level of a percent or so, which certainly is not
the case.
If one makes the additional assumption that the up, down and strange
quark masses are small, then without having to assume anything about their
ratios it follows that the theory has an approximate SU(3)×SU(3) symmetry,
including not only the eightfold way but also the spontaneously broken chiral
SU(2) × SU(2) symmetry that had been used to derive theorems for low-
energy pions back in the mid 1960s. Furthermore, with an intrinsic SU(3) ×
SU(3) symmetry breaking due to small up, down and strange quark masses,
this symmetry gives rise to the Gell-Mann–Okubo mass formula [60] and
justifies the symmetry-breaking assumptions made in the 1965 derivation
of the pion-pion scattering lengths [29]. Finally, it is automatic in such
theories that the semi-leptonic currents of the weak interactions must be
symmetry currents associated with this SU(3) × SU(3) symmetry. This
was a really joyous moment for theorists. Suddenly, after all those years of
dealing with approximate symmetries, it all fell into place. They are not
fundamental symmetries of nature at all; they are just accidents dictated by
the renormalizability of quantum chromodynamics and the gauge origin of
the electroweak interactions.
Before closing, I must also say something about two other topics: the
problem of strangeness nonconservation in the weak interactions, and the
discoveries of the third generation of quarks and leptons and of the W and
Z.
The charge exchange semileptonic interactions were long known to vio-
late strangeness conservation, so any charged W boson would have to have
couplings in which strangeness changes by one unit. It follows that the ex-
change of pairs of Ws could produce processes like K–K̄ conversion in which
strangeness changes by two units. With an ultraviolet cut-off of the order
of the W mass, the amplitude for such processes would be suppressed by
only two factors of the inverse W mass, like a first-order weak interaction, in
contradiction with the known magnitude of the mass difference of the K1 and
K2. A way out of this difficulty was discovered in 1970 by Glashow, Iliopoulos
and Maiani [61]. They found that these strangeness-violating first-order
weak interactions would disappear if there were two full doublets of quarks,
entering in the same way in the weak interactions. This required a fourth
quark, called the charm quark. They also showed that with the fourth quark
in the theory, in an SU(2) gauge theory the neutral currents would not vi-
olate strangeness conservation. In 1972 I showed that the GIM mechanism
also works for the Z exchange of the SU(2) × U(1) electroweak theory [62].
The introduction of the fourth quark also had the happy consequence, as
shown independently by Bouchiat, Iliopoulos, and Meyer and by myself [63],
that the triangle anomalies that would otherwise make the theory not really
gauge invariant all cancelled. The K1–K2 mass difference was calculated
as a function of the charm quark mass by Gaillard and Lee [64], who used
the experimental value of this mass difference to estimate that the mass of
the charm quark would be about 1.5 GeV. Further, using the new insight
from quantum chromodynamics that the strong coupling is not so strong at
energies of this order, Appelquist and Politzer in 1974 (just before the discovery
of the J/ψ) predicted that the charm-anticharm bound state would
be rather narrow [65]. This narrow bound state was discovered in 1974 [66],
and immediately not only provided evidence for the existence of a fourth
quark, but also gave vivid testimony that quarks are real.
The only thing remaining in the completion of the Standard Model was
the discovery of the third generation: the τ lepton [67] (and the correspond-
ing neutrino) and the bottom [68] and top [69] quarks. This provided a
new mechanism for CP violation, the complex phase factor in the Cabibbo–
Kobayashi–Maskawa matrix [70] appearing in the semi-leptonic weak inter-
actions. The fact that the third generation of quarks is only slightly mixed in
this matrix with the first and second generations even makes it natural that
the CP violation produced in this way should be rather weak. Unfortunately,
the explanation of the masses and mixing angles in the Cabibbo–Kobayashi–
Maskawa matrix continues to elude us.
These developments were crowned in 1983 with the discovery [71] of the
W and the Z intermediate vector bosons. It has proved possible to measure
their masses with great precision, which has allowed a stringent comparison
of the electroweak theory with experiment. This comparison has even begun
to give hints of the properties of the as yet undiscovered scalar particle or
particles.
Well, those were great days. The 1960s and 1970s were a time when
experimentalists and theorists were really interested in what each other had
to say, and made great discoveries through their mutual interchange. We
have not seen such great days in elementary particle physics since that time,
but I expect that we will see good times return again in a few years, with
the beginning of a new generation of experiments at this laboratory.

References

1. G. Breit, E. U. Condon, and R. D. Present, Phys. Rev. 50, 825 (1936);


B. Cassen and E. U. Condon, Phys. Rev. 50, 846 (1936); G. Breit
and E. Feenberg, Phys. Rev. 50, 850 (1936). This symmetry was
suggested by the discovery of the equality of proton-proton and proton-
neutron forces by M. A. Tuve, N. Heydenberg, and L. R. Hafstad, Phys.
Rev. 50, 806 (1936). Heisenberg had earlier used an isotopic spin
formalism, but without introducing any symmetry beyond invariance
under interchange of protons and neutrons.

2. M. Gell-Mann, Phys. Rev. 92, 833 (1953); T. Nakano and K. Nishijima,


Prog. Theor. Phys. 10, 581 (1955).

3. T. Lee and C. N. Yang, Phys. Rev. 104, 254 (1956); C. S. Wu et


al. Phys. Rev. 105, 1413 (1957); R. Garwin, M. Lederman, and M.
Weinrich, Phys. Rev. 105, 1415 (1957); J. I. Friedman and V. L.
Telegdi, Phys. Rev. 105, 1681.

4. J. H. Christensen, J. W. Cronin, V. L. Fitch, and R. Turlay, Phys.


Rev. Lett. 13, 138 (1964).

16
5. M. Gell-Mann, Cal. Tech. Synchotron Lab Report CTSL-20 (1961);
Y. Ne’eman, Nucl. Phys. 26, 222 (1961).

6. M. Gell-Mann, Phys. Lett. 8, 214 (1964); G. Zweig, CERN preprint


TH401 (1964). Earlier, it had been suggested that baryon number
should be included in the hadron symmetry group by expanding SU(3)
to U(3) rather than SU(3) × U(1), with each lower or upper index in a
tensor representation of U(3) carrying a baryon number 1/3 or −1/3,
respectively, by H. Goldberg and Y. Ne’eman, Nuovo Cimento 27, 1
(1963).

7. E. D. Bloom et al., Phys. Rev. Lett. 23, 930 (1969); M. Briedenbach


et al., Phys. Rev. Lett. 23, 935 (1969); J. L. Friedman and H. W.
Kendall, Annual Reviews of Nuclear Science 22, 203 (1972).

8. J. D. Bjorken, Phys. Rev. 179, 1547 (1969); R. P. Feynman, Phys.


Rev. Lett. 23, 1415 (1969).

9. C. N. Yang and R. L. Mills, Phys. Rev. 96, 191 (1954).

10. B. de Witt, Phys. Rev. Lett. 12, 742 (1964); Phys. Rev. 162,
1195 (1967); L. D. Faddeev and V. N. Popov, Phys. Lett. B 25, 29
(1967); also see R. P. Feynman, Acta Phys. Pol. 24, 697 (1963); S.
Mandelstam, Phys. Rev. 175, 1580, 1604 (1968).

11. E. C. G. Sudarshan and R. E. Marshak, in Proceedings of the Padua–
Venice Conference on Mesons and Recently Discovered Particles, p.
v-14 (1957); Phys. Rev. 109, 1860 (1958); R. P. Feynman and M.
Gell-Mann, Phys. Rev. 109, 193 (1958).

12. J. Schwinger, Ann. Phys. 2, 407 (1957); T. D. Lee and C. N. Yang,
Phys. Rev. 108, 1611 (1957); 119, 1410 (1960); S. Bludman, Nuovo
Cimento 9, 433 (1958); J. Leite-Lopes, Nucl. Phys. 8, 234 (1958); S.
L. Glashow, Nucl. Phys. 22, 519 (1961); A. Salam and J. C. Ward,
Phys. Lett. 13, 168 (1964).

13. A. Komar and A. Salam, Nucl. Phys. 21, 624 (1960); H. Umezawa
and S. Kamefuchi, Nucl. Phys. 23, 399 (1961); S. Kamefuchi, L.
O’Raifeartaigh, and A. Salam, Nucl. Phys. 28, 529 (1961); A. Salam,
Phys. Rev. 127, 331 (1962); M. Veltman, Nucl. Phys. B 7, 637 (1968);
Nucl. Phys. B 21, 288 (1970); D. Boulware, Ann. Phys. 56, 140 (1970).
14. W. Heisenberg, lecture “What is an Elementary Particle?” to the German
Physical Society on March 5, 1975, reprinted in English translation
in Encounters with Einstein And Other Essays of People, Places, and
Particles (Princeton University Press, 1983).
15. J. Goldstone, Nuovo Cimento 19, 154 (1961).
16. J. Goldstone, A. Salam, and S. Weinberg, Phys. Rev. 127, 965 (1962).
17. P. W. Higgs, Phys. Lett. 12, 132 (1964); Phys. Lett. 13, 508 (1964);
Phys. Rev. 145, 1156 (1966). Also see G. S. Guralnik, C. Hagen, and
T. W. B. Kibble, Phys. Rev. Lett. 13, 585 (1964).
18. F. Englert and R. Brout, Phys. Rev. Lett. 13, 321 (1964).
19. P. W. Anderson, Phys. Rev. 130, 439 (1963).
20. M. L. Goldberger and S. B. Treiman, Phys. Rev. 111, 354 (1958).
21. M. Gell-Mann and M. Lévy, Nuovo Cimento 16, 705 (1960); J.
Bernstein, S. Fubini, M. Gell-Mann, and W. Thirring, Nuovo Cimento 17,
757 (1960); K-C. Chou, Soviet Physics JETP 12, 492 (1961).
22. Y. Nambu, Phys. Rev. Lett. 4, 380 (1960).
23. Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122, 345 (1961).
24. Y. Nambu and D. Lurie, Phys. Rev. 125, 1429 (1962); Y. Nambu
and E. Shrauner, Phys. Rev. 128, 862 (1962). These predictions were
generalized by S. Weinberg, Phys. Rev. Lett. 16, 879 (1966).
25. J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 108, 1175
(1957).
26. M. Gell-Mann, Physics 1, 63 (1964).
27. S. L. Adler, Phys. Rev. Lett. 14, 1051 (1965); Phys. Rev. 140, B736
(1965); W. I. Weisberger, Phys. Rev. Lett. 14, 1047 (1965); Phys.
Rev. 143, 1302 (1965).

28. S. Weinberg, Phys. Rev. Lett. 17, 616 (1966); Y. Tomozawa, Nuovo
Cimento 46A, 707 (1966).

29. S. Weinberg, ref. 28.

30. S. Weinberg, Phys. Rev. Lett. 19, 1264 (1967).

31. A. Salam, in Elementary Particle Physics, N. Svartholm, ed. (Nobel
Symposium No. 8, Almqvist & Wiksell, Stockholm, 1968), p. 367.

32. This work was briefly reported in ref. 33, footnote 7.

33. S. Weinberg, Phys. Rev. Lett. 18, 507 (1967).

34. T. W. B. Kibble, Phys. Rev. 155, 1554 (1967).

35. T. D. Lee and C. N. Yang, Phys. Rev. 98, 101 (1955).

36. S. Weinberg, Phys. Rev. D19, 1277 (1979); L. Susskind, Phys. Rev.
D19, 2619 (1979).

37. R. P. Feynman, “The Principle of Least Action in Quantum Mechanics”
(Princeton University Ph. D. thesis, 1942; University Microfilms
Publication No. 2948, Ann Arbor). This work was in the context of
non-relativistic quantum mechanics. Feynman later applied this
formalism to the Dirac theory of electrons, but its application to a
full-fledged quantum field theory was the work of other authors, including
some of those in ref. 10.

38. I reported this work later in Phys. Rev. Lett. 27, 1688 (1971) and
described it in more detail in Phys. Rev. D 7, 1068 (1973).

39. See L. Stuller, M.I.T. Ph.D. thesis (1971).

40. G. ’t Hooft, Nucl. Phys. B 35, 167 (1971).

41. B. W. Lee and J. Zinn-Justin, Phys. Rev. D 5, 3121, 3137, 3155 (1972).

42. G. ’t Hooft and M. Veltman, Nucl. Phys. B 44, 189 (1972); Nucl.
Phys. B 50, 318 (1972).

43. C. Becchi, A. Rouet, and R. Stora, Commun. Math. Phys. 42, 127
(1975); Ann. Phys. 98, 287 (1976); I. V. Tyutin, Lebedev Institute
preprint N39 (1975).

44. B. W. Lee, Phys. Rev. D 5, 823 (1972).

45. S. Weinberg, Physica 96A, 327 (1979).

46. G. Gamow and E. Teller, Phys. Rev. 51, 289L (1937); N. Kemmer,
Phys. Rev. 52, 906 (1937); G. Wentzel, Helv. Phys. Acta 10, 108
(1937).

47. S. Weinberg, Phys. Rev. D 5, 1412 (1972).

48. D. C. Cundy et al., Phys. Lett. B 31, 478 (1970).

49. F. J. Hasert et al., Phys. Lett. B 46, 121, 138 (1973); P. Musset et al.,
J. Phys. (Paris) 11/12, T34 (1973).

50. L. L. Lewis et al., Phys. Rev. Lett. 39, 795 (1977); P. E. G. Baird et
al., Phys. Rev. Lett. 39, 798 (1977).

51. C. Y. Prescott et al., Phys. Lett. 77B, 347 (1978).

52. D. J. Gross and F. Wilczek, Phys. Rev. Lett. 30, 1343 (1973); H. D.
Politzer, Phys. Rev. Lett. 30, 1346 (1973).

53. M. Gell-Mann and F. E. Low, Phys. Rev. 95, 1300 (1954).

54. C. G. Callan, Phys. Rev. D2, 1541 (1970); K. Symanzik, Commun.
Math. Phys. 18, 227 (1970); C. G. Callan, S. Coleman, and R. Jackiw,
Ann. of Phys. (New York) 47, 773 (1973).

55. O. W. Greenberg, Phys. Rev. Lett. 13, 598 (1964); M. Y. Han
and Y. Nambu, Phys. Rev. B 139, 1006 (1965); W. A. Bardeen,
H. Fritzsch, and M. Gell-Mann, in Scale and Conformal Symmetry in
Hadron Physics, R. Gatto, ed. (Wiley, New York, 1973), p. 139.

56. H. Georgi and H. D. Politzer, Phys. Rev. D 9, 416 (1974); D. J. Gross
and F. Wilczek, Phys. Rev. D 9, 980 (1974).

57. K. Wilson, Phys. Rev. 179, 1499 (1969).

58. S. Weinberg, Phys. Rev. Lett. 31, 494 (1973); D. J. Gross and F.
Wilczek, Phys. Rev. D 8, 3633 (1973); H. Fritzsch, M. Gell-Mann, and
H. Leutwyler, Phys. Lett. B 47, 365 (1973).

59. S. Weinberg, ref. 58.

60. M. Gell-Mann, ref. 5; S. Okubo, Prog. Theor. Phys. 27, 949 (1962).

61. S. Glashow, J. Iliopoulos, and L. Maiani, Phys. Rev. D 2, 1285 (1970).

62. S. Weinberg, ref. 47.

63. C. Bouchiat, J. Iliopoulos, and P. Meyer, Phys. Lett. 38B, 519 (1972);
S. Weinberg, in Fundamental Interactions in Physics and Astrophysics,
eds. G. Iverson et al. (Plenum Press, New York, 1973), p. 157.

64. M. Gaillard and B. W. Lee, Phys. Rev. D 10, 897 (1974).

65. T. Appelquist and H. D. Politzer, Phys. Rev. Lett. 34, 43 (1975).

66. J. J. Aubert et al., Phys. Rev. Lett. 33, 1404 (1974); J. E. Augustin
et al., Phys. Rev. Lett. 33, 1406 (1974).

67. M. Perl et al., Phys. Rev. Lett. 35, 195, 1489 (1975); Phys. Lett. 63B,
466 (1976).

68. S. W. Herb et al., Phys. Rev. Lett. 39, 252 (1977).

69. F. Abe et al., Phys. Rev. Lett. 74, 2626 (1995); S. Abachi et al., Phys.
Rev. Lett. 74, 2632 (1995).

70. N. Cabibbo, Phys. Rev. Lett. 10, 531 (1963); M. Kobayashi and K.
Maskawa, Prog. Theor. Phys. 49, 282 (1972).

71. G. Arnison et al., Phys. Lett. 122B, 103 (1983); 126B, 398 (1983);
129B, 273 (1983); 134B, 469 (1984); 147B, 241 (1984).

A Designer Universe?
by Steven Weinberg

Professor of Physics, University of Texas at Austin
Winner of the 1979 Nobel Prize in Physics.

I have been asked to comment on whether the universe shows signs of having
been designed.1 I don't see how it's possible to talk about this without having at
least some vague idea of what a designer would be like. Any possible universe
could be explained as the work of some sort of designer. Even a universe that is
completely chaotic, without any laws or regularities at all, could be supposed to
have been designed by an idiot.

The question that seems to me to be worth answering, and perhaps not
impossible to answer, is whether the universe shows signs of having been
designed by a deity more or less like those of traditional monotheistic religions—
not necessarily a figure from the ceiling of the Sistine Chapel, but at least some
sort of personality, some intelligence, who created the universe and has some
special concern with life, in particular with human life. I expect that this is not the
idea of a designer held by many here. You may tell me that you are thinking of
something much more abstract, some cosmic spirit of order and harmony, as
Einstein did. You are certainly free to think that way, but then I don't know why
you use words like 'designer' or 'God,' except perhaps as a form of protective
coloration.

It used to be obvious that the world was designed by some sort of intelligence.
What else could account for fire and rain and lightning and earthquakes? Above
all, the wonderful abilities of living things seemed to point to a creator who had a
special interest in life. Today we understand most of these things in terms of
physical forces acting under impersonal laws. We don't yet know the most
fundamental laws, and we can't work out all the consequences of the laws we do
know. The human mind remains extraordinarily difficult to understand, but so is
the weather. We can't predict whether it will rain one month from today, but we
do know the rules that govern the rain, even though we can't always calculate
their consequences. I see nothing about the human mind any more than about
the weather that stands out as beyond the hope of understanding as a
consequence of impersonal laws acting over billions of years.

There do not seem to be any exceptions to this natural order, any miracles. I have
the impression that these days most theologians are embarrassed by talk of
miracles, but the great monotheistic faiths are founded on miracle stories—the
burning bush, the empty tomb, an angel dictating the Koran to Mohammed—and
some of these faiths teach that miracles continue at the present day. The
evidence for all these miracles seems to me to be considerably weaker than the
evidence for cold fusion, and I don't believe in cold fusion. Above all, today we
understand that even human beings are the result of natural selection acting over
millions of years of breeding and eating.

I'd guess that if we were to see the hand of the designer anywhere, it would be in
the fundamental principles, the final laws of nature, the book of rules that govern
all natural phenomena. We don't know the final laws yet, but as far as we have
been able to see, they are utterly impersonal and quite without any special role
for life. There is no life force. As Richard Feynman has said, when you look at the
universe and understand its laws, 'the theory that it is all arranged as a stage for
God to watch man's struggle for good and evil seems inadequate.'

True, when quantum mechanics was new, some physicists thought that it put
humans back into the picture, because the principles of quantum mechanics tell
us how to calculate the probabilities of various results that might be found by a
human observer. But, starting with the work of Hugh Everett forty years ago, the
tendency of physicists who think deeply about these things has been to
reformulate quantum mechanics in an entirely objective way, with observers
treated just like everything else. I don't know if this program has been completely
successful yet, but I think it will be.

I have to admit that, even when physicists will have gone as far as they can go,
when we have a final theory, we will not have a completely satisfying picture of
the world, because we will still be left with the question 'why?' Why this theory,
rather than some other theory? For example, why is the world described by
quantum mechanics? Quantum mechanics is the one part of our present physics
that is likely to survive intact in any future theory, but there is nothing logically
inevitable about quantum mechanics; I can imagine a universe governed by
Newtonian mechanics instead. So there seems to be an irreducible mystery that
science will not eliminate.

But religious theories of design have the same problem. Either you mean
something definite by a God, a designer, or you don't. If you don't, then what are
we talking about? If you do mean something definite by 'God' or 'design,' if for
instance you believe in a God who is jealous, or loving, or intelligent, or whimsical,
then you still must confront the question 'why?' A religion may assert that the
universe is governed by that sort of God, rather than some other sort of God, and
it may offer evidence for this belief, but it cannot explain why this should be so.

In this respect, it seems to me that physics is in a better position to give us a
partly satisfying explanation of the world than religion can ever be, because
although physicists won't be able to explain why the laws of nature are what they
are and not something completely different, at least we may be able to explain
why they are not slightly different. For instance, no one has been able to think of
a logically consistent alternative to quantum mechanics that is only slightly
different. Once you start trying to make small changes in quantum mechanics,
you get into theories with negative probabilities or other logical absurdities.
When you combine quantum mechanics with relativity you increase its logical
fragility. You find that unless you arrange the theory in just the right way you get
nonsense, like effects preceding causes, or infinite probabilities. Religious
theories, on the other hand, seem to be infinitely flexible, with nothing to prevent
the invention of deities of any conceivable sort.

Now, it doesn't settle the matter for me to say that we cannot see the hand of a
designer in what we know about the fundamental principles of science. It might
be that, although these principles do not refer explicitly to life, much less human
life, they are nevertheless craftily designed to bring it about.

Some physicists have argued that certain constants of nature have values that
seem to have been mysteriously fine-tuned to just the values that allow for the
possibility of life, in a way that could only be explained by the intervention of a
designer with some special concern for life. I am not impressed with these
supposed instances of fine-tuning. For instance, one of the most frequently
quoted examples of fine-tuning has to do with a property of the nucleus of the
carbon atom. The matter left over from the first few minutes of the universe was
almost entirely hydrogen and helium, with virtually none of the heavier elements
like carbon, nitrogen, and oxygen that seem to be necessary for life. The heavy
elements that we find on earth were built up hundreds of millions of years later in
a first generation of stars, and then spewed out into the interstellar gas out of
which our solar system eventually formed.

The first step in the sequence of nuclear reactions that created the heavy
elements in early stars is usually the formation of a carbon nucleus out of three
helium nuclei. There is a negligible chance of producing a carbon nucleus in its
normal state (the state of lowest energy) in collisions of three helium nuclei, but it
would be possible to produce appreciable amounts of carbon in stars if the
carbon nucleus could exist in a radioactive state with an energy roughly 7 million
electron volts (MeV) above the energy of the normal state, matching the energy
of three helium nuclei, but (for reasons I'll come to presently) not more than 7.7
MeV above the normal state.

This radioactive state of a carbon nucleus could be easily formed in stars from
three helium nuclei. After that, there would be no problem in producing ordinary
carbon; the carbon nucleus in its radioactive state would spontaneously emit light
and turn into carbon in its normal nonradioactive state, the state found on earth.
The critical point in producing carbon is the existence of a radioactive state that
can be produced in collisions of three helium nuclei.

In fact, the carbon nucleus is known experimentally to have just such a
radioactive state, with an energy 7.65 MeV above the normal state. At first sight
this may seem like a pretty close call; the energy of this radioactive state of
carbon misses being too high to allow the formation of carbon (and hence of us)
by only 0.05 MeV, which is less than one percent of 7.65 MeV. It may appear that
the constants of nature on which the properties of all nuclei depend have been
carefully fine-tuned to make life possible.

Looked at more closely, the fine-tuning of the constants of nature here does not
seem so fine. We have to consider the reason why the formation of carbon in
stars requires the existence of a radioactive state of carbon with an energy not
more than 7.7 MeV above the energy of the normal state. The reason is that the
carbon nuclei in this state are actually formed in a two-step process: first, two
helium nuclei combine to form the unstable nucleus of a beryllium isotope,
beryllium 8, which occasionally, before it falls apart, captures another helium
nucleus, forming a carbon nucleus in its radioactive state, which then decays into
normal carbon. The total energy of the beryllium 8 nucleus and a helium nucleus
at rest is 7.4 MeV above the energy of the normal state of the carbon nucleus; so
if the energy of the radioactive state of carbon were more than 7.7 MeV it could
only be formed in a collision of a helium nucleus and a beryllium 8 nucleus if the
energy of motion of these two nuclei were at least 0.3 MeV—an energy which is
extremely unlikely at the temperatures found in stars.

Thus the crucial thing that affects the production of carbon in stars is not the 7.65
MeV energy of the radioactive state of carbon above its normal state, but the
0.25 MeV energy of the radioactive state, an unstable composite of a beryllium 8
nucleus and a helium nucleus, above the energy of those nuclei at rest.2 This
energy misses being too high for the production of carbon by a fractional amount
of 0.05 MeV/0.25 MeV, or 20 percent, which is not such a close call after all.
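
The arithmetic behind this comparison can be checked directly. What follows is only a minimal sketch in Python, using the three energies quoted in the text (7.65, 7.7, and 7.4 MeV), written in keV so that the subtraction is exact. The variable names are illustrative assumptions and do not come from the talk.

```python
# Energies quoted in the text, in keV above the normal state of carbon.
# All names here are illustrative assumptions, not from the original talk.
EXCITED_STATE_KEV = 7650   # radioactive state of carbon: 7.65 MeV
MAX_ALLOWED_KEV = 7700     # ceiling above which little carbon forms: 7.7 MeV
BE8_PLUS_HE_KEV = 7400     # beryllium 8 plus a helium nucleus at rest: 7.4 MeV

# Margin by which the radioactive state misses being too high (0.05 MeV).
margin_kev = MAX_ALLOWED_KEV - EXCITED_STATE_KEV

# First-sight comparison: the margin as a fraction of the full 7.65 MeV.
naive_fraction = margin_kev / EXCITED_STATE_KEV

# Closer comparison: the margin as a fraction of the energy by which the
# state lies above beryllium 8 plus helium at rest (0.25 MeV).
excess_kev = EXCITED_STATE_KEV - BE8_PLUS_HE_KEV
relevant_fraction = margin_kev / excess_kev

print(f"margin: {margin_kev / 1000} MeV")
print(f"fraction of 7.65 MeV: {naive_fraction:.2%}")
print(f"fraction of 0.25 MeV: {relevant_fraction:.0%}")
```

The first ratio comes out below one percent and the second at 20 percent, which is the contrast the argument turns on: the same 0.05 MeV margin looks fine-tuned or not depending on which energy it is measured against.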

This conclusion about the lessons to be learned from carbon synthesis is
somewhat controversial. In any case, there is one constant whose value does
seem remarkably well adjusted in our favor. It is the energy density of empty
space, also known as the cosmological constant. It could have any value, but from
first principles one would guess that this constant should be very large, and could
be positive or negative. If large and positive, the cosmological constant would act
as a repulsive force that increases with distance, a force that would prevent
matter from clumping together in the early universe, the process that was the
first step in forming galaxies and stars and planets and people. If large and
negative the cosmological constant would act as an attractive force increasing
with distance, a force that would almost immediately reverse the expansion of
the universe and cause it to recollapse, leaving no time for the evolution of life. In
fact, astronomical observations show that the cosmological constant is quite
small, very much smaller than would have been guessed from first principles.

It is still too early to tell whether there is some fundamental principle that can
explain why the cosmological constant must be this small. But even if there is no
such principle, recent developments in cosmology offer the possibility of an
explanation of why the measured values of the cosmological constant and other
physical constants are favorable for the appearance of intelligent life. According
to the 'chaotic inflation' theories of Andrei Linde and others, the expanding cloud
of billions of galaxies that we call the big bang may be just one fragment of a
much larger universe in which big bangs go off all the time, each one with
different values for the fundamental constants.

In any such picture, in which the universe contains many parts with different
values for what we call the constants of nature, there would be no difficulty in
understanding why these constants take values favorable to intelligent life. There
would be a vast number of big bangs in which the constants of nature take values
unfavorable for life, and many fewer where life is possible. You don't have to
invoke a benevolent designer to explain why we are in one of the parts of the
universe where life is possible: in all the other parts of the universe there is no
one to raise the question.3 If any theory of this general type turns out to be
correct, then to conclude that the constants of nature have been fine-tuned by a
benevolent designer would be like saying, 'Isn't it wonderful that God put us here
on earth, where there's water and air and the surface gravity and temperature
are so comfortable, rather than some horrid place, like Mercury or Pluto?' Where
else in the solar system other than on earth could we have evolved?

Reasoning like this is called 'anthropic.' Sometimes it just amounts to an assertion
that the laws of nature are what they are so that we can exist, without further
explanation. This seems to me to be little more than mystical mumbo jumbo. On
the other hand, if there really is a large number of worlds in which some
constants take different values, then the anthropic explanation of why in our
world they take values favorable for life is just common sense, like explaining why
we live on the earth rather than Mercury or Pluto. The actual value of the
cosmological constant, recently measured by observations of the motion of
distant supernovas, is about what you would expect from this sort of argument: it
is just about small enough so that it does not interfere much with the formation
of galaxies. But we don't yet know enough about physics to tell whether there are
different parts of the universe in which what are usually called the constants of
physics really do take different values. This is not a hopeless question; we will be
able to answer it when we know more about the quantum theory of gravitation
than we do now.

It would be evidence for a benevolent designer if life were better than could be
expected on other grounds. To judge this, we should keep in mind that a certain
capacity for pleasure would readily have evolved through natural selection, as an
incentive to animals who need to eat and breed in order to pass on their genes. It
may not be likely that natural selection on any one planet would produce animals
who are fortunate enough to have the leisure and the ability to do science and
think abstractly, but our sample of what is produced by evolution is very biased,
by the fact that it is only in these fortunate cases that there is anyone thinking
about cosmic design. Astronomers call this a selection effect.

The universe is very large, and perhaps infinite, so it should be no surprise that,
among the enormous number of planets that may support only unintelligent life
and the still vaster number that cannot support life at all, there is some tiny
fraction on which there are living beings who are capable of thinking about the
universe, as we are doing here. A journalist who has been assigned to interview
lottery winners may come to feel that some special providence has been at work
on their behalf, but he should keep in mind the much larger number of lottery
players whom he is not interviewing because they haven't won anything. Thus, to
judge whether our lives show evidence for a benevolent designer, we have not
only to ask whether life is better than would be expected in any case from what
we know about natural selection, but we need also to take into account the bias
introduced by the fact that it is we who are thinking about the problem.

This is a question that you all will have to answer for yourselves. Being a physicist
is no help with questions like this, so I have to speak from my own experience. My
life has been remarkably happy, perhaps in the upper 99.99 percentile of human
happiness, but even so, I have seen a mother die painfully of cancer, a father's
personality destroyed by Alzheimer's disease, and scores of second and third
cousins murdered in the Holocaust. Signs of a benevolent designer are pretty well
hidden.

The prevalence of evil and misery has always bothered those who believe in a
benevolent and omnipotent God. Sometimes God is excused by pointing to the
need for free will. Milton gives God this argument in Paradise Lost:

I formed them free, and free they must remain
Till they enthral themselves: I else must change
Their nature, and revoke the high decree
Unchangeable, eternal, which ordained
Their freedom; they themselves ordained their fall.

It seems a bit unfair to my relatives to be murdered in order to provide an
opportunity for free will for Germans, but even putting that aside, how does free
will account for cancer? Is it an opportunity of free will for tumors?

I don't need to argue here that the evil in the world proves that the universe is
not designed, but only that there are no signs of benevolence that might have
shown the hand of a designer. But in fact the perception that God cannot be
benevolent is very old. Plays by Aeschylus and Euripides make a quite explicit
statement that the gods are selfish and cruel, though they expect better behavior
from humans. God in the Old Testament tells us to bash the heads of infidels and
demands of us that we be willing to sacrifice our children's lives at His orders, and
the God of traditional Christianity and Islam damns us for eternity if we do not
worship him in the right manner. Is this a nice way to behave? I know, I know, we
are not supposed to judge God according to human standards, but you see the
problem here: If we are not yet convinced of His existence, and are looking for
signs of His benevolence, then what other standards can we use?

The issues that I have been asked to address here will seem to many to be terribly
old-fashioned. The 'argument from design' made by the English theologian
William Paley is not on most people's minds these days. The prestige of religion
seems today to derive from what people take to be its moral influence, rather
than from what they may think has been its success in accounting for what we see
in nature. Conversely, I have to admit that, although I really don't believe in a
cosmic designer, the reason that I am taking the trouble to argue about it is that I
think that on balance the moral influence of religion has been awful.

This is much too big a question to be settled here. On one side, I could point out
endless examples of the harm done by religious enthusiasm, through a long
history of pogroms, crusades, and jihads. In our own century it was a Muslim
zealot who killed Sadat, a Jewish zealot who killed Rabin, and a Hindu zealot who
killed Gandhi. No one would say that Hitler was a Christian zealot, but it is hard to
imagine Nazism taking the form it did without the foundation provided by
centuries of Christian anti-Semitism. On the other side, many admirers of religion
would set countless examples of the good done by religion. For instance, in his
recent book Imagined Worlds, the distinguished physicist Freeman Dyson has
emphasized the role of religious belief in the suppression of slavery. I'd like to
comment briefly on this point, not to try to prove anything with one example but
just to illustrate what I think about the moral influence of religion.

It is certainly true that the campaign against slavery and the slave trade was
greatly strengthened by devout Christians, including the Evangelical layman
William Wilberforce in England and the Unitarian minister William Ellery Channing
in America. But Christianity, like other great world religions, lived comfortably
with slavery for many centuries, and slavery was endorsed in the New Testament.
So what was different for anti-slavery Christians like Wilberforce and Channing?
There had been no discovery of new sacred scriptures, and neither Wilberforce
nor Channing claimed to have received any supernatural revelations. Rather, the
eighteenth century had seen a widespread increase in rationality and
humanitarianism that led others—for instance, Adam Smith, Jeremy Bentham,
and Richard Brinsley Sheridan—also to oppose slavery, on grounds having nothing
to do with religion. Lord Mansfield, the author of the decision in Somersett's Case,
which ended slavery in England (though not its colonies), was no more than
conventionally religious, and his decision did not mention religious arguments.
Although Wilberforce was the instigator of the campaign against the slave trade
in the 1790s, this movement had essential support from many in Parliament like
Fox and Pitt, who were not known for their piety. As far as I can tell, the moral
tone of religion benefited more from the spirit of the times than the spirit of the
times benefited from religion.

Where religion did make a difference, it was more in support of slavery than in
opposition to it. Arguments from scripture were used in Parliament to defend the
slave trade. Frederick Douglass told in his Narrative how his condition as a slave
became worse when his master underwent a religious conversion that allowed
him to justify slavery as the punishment of the children of Ham. Mark Twain
described his mother as a genuinely good person, whose soft heart pitied even
Satan, but who had no doubt about the legitimacy of slavery, because in years of
living in antebellum Missouri she had never heard any sermon opposing slavery,
but only countless sermons preaching that slavery was God's will. With or without
religion, good people can behave well and bad people can do evil; but for good
people to do evil—that takes religion.

In an e-mail message from the American Association for the Advancement of
Science I learned that the aim of this conference is to have a constructive dialogue
between science and religion. I am all in favor of a dialogue between science and
religion, but not a constructive dialogue. One of the great achievements of
science has been, if not to make it impossible for intelligent people to be
religious, then at least to make it possible for them not to be religious. We should
not retreat from this accomplishment.

1 This article is based on a talk given in April 1999 at the Conference on Cosmic
Design of the American Association for the Advancement of Science in
Washington, D.C.

2 This was pointed out in a 1989 paper by M. Livio, D. Hollowell, A. Weiss, and
J.W. Truran ('The anthropic significance of the existence of an excited state of
12C,' Nature, Vol. 340, No. 6231, July 27, 1989). They did the calculation quoted
here of the 7.7 MeV maximum energy of the radioactive state of carbon, above
which little carbon is formed in stars.

3 The same conclusion may be reached in a more subtle way when quantum
mechanics is applied to the whole universe. Through a reinterpretation of earlier
work by Stephen Hawking, Sidney Coleman has shown how quantum mechanical
effects can lead to a split of the history of the universe (more precisely, in what is
called the wave function of the universe) into a huge number of separate
possibilities, each one corresponding to a different set of fundamental constants.
See Sidney Coleman, 'Black Holes as Red Herrings: Topological fluctuations and
the loss of quantum coherence,' Nuclear Physics, Vol. B307 (1988), p. 867.

Biography

Steven Weinberg was educated at Cornell, Copenhagen, and
Princeton, and taught at Columbia, Berkeley, M.I.T., and
Harvard, where from 1973 to 1982 he was Higgins Professor of
Physics. In 1982 he moved to The University of Texas at Austin
and founded its Theory Group. At Texas he holds the Josey
Regental Chair of Science and is a member of the Physics and Astronomy
Departments. His research has spanned a broad range of topics in quantum field
theory, elementary particle physics, and cosmology, and has been honored with
numerous awards, including the Nobel Prize in Physics, the National Medal of
Science, the Heineman Prize in Mathematical Physics, the Cresson Medal of the
Franklin Institute, the Madison Medal of Princeton University, and the
Oppenheimer Prize. He also holds honorary doctoral degrees from a dozen
universities. He is a member of the National Academy of Sciences, the Royal
Society of London, the American Academy of Arts and Sciences, the International
Astronomical Union, and the American Philosophical Society. In addition to the
well-known treatise, Gravitation and Cosmology, he has written several books for
general readers, including the prize-winning The First Three Minutes (now
translated into 22 foreign languages), The Discovery of Subatomic Particles, and
most recently Dreams of a Final Theory. He has also written a textbook, The Quantum
Theory of Fields, Vol. I and Vol. II.

Sincere thanks to Prof. Steven Weinberg for giving PhysLink.com the permission to
publish this talk.
CONCEPTUAL FOUNDATIONS OF THE UNIFIED
THEORY OF WEAK AND ELECTROMAGNETIC
INTERACTIONS
Nobel Lecture, December 8, 1979
by STEVEN WEINBERG
Lyman Laboratory of Physics, Harvard University, and Harvard-Smithsonian
Center for Astrophysics, Cambridge, Mass., USA.

Our job in physics is to see things simply, to understand a great many
complicated phenomena in a unified way, in terms of a few simple principles.
At times, our efforts are illuminated by a brilliant experiment, such as
the 1973 discovery of neutral current neutrino reactions. But even in the
dark times between experimental breakthroughs, there always continues a
steady evolution of theoretical ideas, leading almost imperceptibly to
changes in previous beliefs. In this talk, I want to discuss the development
of two lines of thought in theoretical physics. One of them is the slow
growth in our understanding of symmetry, and in particular, broken or
hidden symmetry. The other is the old struggle to come to terms with the
infinities in quantum field theories. To a remarkable degree, our present
detailed theories of elementary particle interactions can be understood
deductively, as consequences of symmetry principles and of a principle of
renormalizability which is invoked to deal with the infinities. I will also
briefly describe how the convergence of these lines of thought led to my
own work on the unification of weak and electromagnetic interactions. For
the most part, my talk will center on my own gradual education in these
matters, because that is one subject on which I can speak with some
confidence. With rather less confidence, I will also try to look ahead, and
suggest what role these lines of thought may play in the physics of the
future.
Symmetry principles made their appearance in twentieth century phys-
ics in 1905 with Einstein’s identification of the invariance group of space
and time. With this as a precedent, symmetries took on a character in
physicists’ minds as a priori principles of universal validity, expressions of
the simplicity of nature at its deepest level. So it was painfully difficult in
the 1930’s to realize that there are internal symmetries, such as isospin
conservation, [1] having nothing to do with space and time, symmetries
which are far from self-evident, and that only govern what are now called
the strong interactions. The 1950’s saw the discovery of another internal
symmetry - the conservation of strangeness [2] - which is not obeyed by
the weak interactions, and even one of the supposedly sacred symmetries
of space-time - parity - was also found to be violated by weak interactions.
[3] Instead of moving toward unity, physicists were learning that different
544 Physics 1979

interactions are apparently governed by quite different symmetries. Matters
became yet more confusing with the recognition in the early 1960’s of
a symmetry group - the “eightfold way” - which is not even an exact
symmetry of the strong interactions. [4]
These are all “global” symmetries, for which the symmetry transforma-
tions do not depend on position in space and time. It had been recognized
[5] in the 1920’s that quantum electrodynamics has another symmetry of a
far more powerful kind, a “local” symmetry under transformations in
which the electron field suffers a phase change that can vary freely from
point to point in space-time, and the electromagnetic vector potential
undergoes a corresponding gauge transformation. Today this would be
called a U(1) gauge symmetry, because a simple phase change can be
thought of as multiplication by a 1 x 1 unitary matrix. The extension to
more complicated groups was made by Yang and Mills [6] in 1954 in a
seminal paper in which they showed how to construct an SU(2) gauge
theory of strong interactions. (The name “SU(2)” means that the group of
symmetry transformations consists of 2 x 2 unitary matrices that are
“special,” in that they have determinant unity). But here again it seemed
that the symmetry if real at all would have to be approximate, because at
least on a naive level gauge invariance requires that vector bosons like the
photon would have to be massless, and it seemed obvious that the strong
interactions are not mediated by massless particles. The old question
remained: if symmetry principles are an expression of the simplicity of
nature at its deepest level, then how can there be such a thing as an
approximate symmetry? Is nature only approximately simple?
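The definitions of U(1) and SU(2) given above can be checked numerically. The following is only an illustrative sketch: the parametrization of the 2 x 2 matrix and the helper names (su2_matrix, dagger, matmul, det) are assumptions introduced here, not anything taken from the lecture.

```python
import cmath
import math


def su2_matrix(alpha, beta, gamma):
    """Return a 2 x 2 matrix [[a, b], [-b*, a*]] with |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * alpha) * math.cos(gamma)
    b = cmath.exp(1j * beta) * math.sin(gamma)
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def dagger(m):
    """Conjugate transpose of a 2 x 2 matrix stored as nested lists."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(x, y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


u = su2_matrix(0.3, 1.1, 0.7)      # arbitrary illustrative parameters
product = matmul(dagger(u), u)     # equals the identity if u is unitary
determinant = det(u)               # equals 1 if u is "special"

# A U(1) element is a 1 x 1 unitary matrix, i.e. a pure phase of modulus 1.
phase = cmath.exp(1j * 0.42)
```

Here product comes out as the identity matrix and determinant as 1, up to rounding, which is what "unitary" and "special" mean; the U(1) phase has modulus 1.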
Some time in 1960 or early 1961, I learned of an idea which had
originated earlier in solid state physics and had been brought into particle
physics by those like Heisenberg, Nambu, and Goldstone, who had worked
in both areas. It was the idea of “broken symmetry,” that the Hamiltonian
and commutation relations of a quantum theory could possess an exact
symmetry, and that the physical states might nevertheless not provide neat
representations of the symmetry. In particular, a symmetry of the Hamil-
tonian might turn out to be not a symmetry of the vacuum.
As theorists sometimes do, I fell in love with this idea. But as often
happens with love affairs, at first I was rather confused about its implications.
I thought (as it turned out, wrongly) that the approximate symmetries
- parity, isospin, strangeness, the eight-fold way - might really be exact a
priori symmetry principles, and that the observed violations of these sym-
metries might somehow be brought about by spontaneous symmetry
breaking. It was therefore rather disturbing for me to hear of a result of
Goldstone, [7] that in at least one simple case the spontaneous breakdown
of a continuous symmetry like isospin would necessarily entail the exis-
tence of a massless spin zero particle - what would today be called a
“Goldstone boson.” It seemed obvious that there could not exist any new
type of massless particle of this sort which would not already have been
discovered.
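The appearance of a massless spin zero particle can be illustrated with the simplest textbook example, which is not necessarily the case Goldstone treated: two real scalar fields with a rotationally symmetric potential whose minimum lies away from the origin. The potential, the parameter values MU2 and LAM, and the finite-difference helper below are all assumptions made for this sketch.

```python
MU2 = 1.0   # mu^2 > 0 (hypothetical value)
LAM = 0.5   # quartic coupling lambda > 0 (hypothetical value)


def potential(p1, p2):
    """V = -(mu^2/2)(p1^2 + p2^2) + (lambda/4)(p1^2 + p2^2)^2."""
    r2 = p1 * p1 + p2 * p2
    return -0.5 * MU2 * r2 + 0.25 * LAM * r2 * r2


def second_derivative(f, x, y, axis, h=1e-4):
    """Central-difference second derivative of f along one field direction."""
    if axis == 0:
        return (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    return (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2


# The symmetric point (0, 0) is a maximum; the vacuum sits on the circle
# p1^2 + p2^2 = v^2 with v^2 = mu^2 / lambda. Pick the vacuum (v, 0).
v = (MU2 / LAM) ** 0.5

# Squared masses are the curvatures of V at the chosen vacuum.
radial_mass_sq = second_derivative(potential, v, 0.0, axis=0)
angular_mass_sq = second_derivative(potential, v, 0.0, axis=1)
```

The curvature along the radial direction is 2 mu^2, while the curvature along the circle of degenerate vacua vanishes (up to finite-difference error). That flat direction is the massless mode.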
S. Weinberg 545

I had long discussions of this problem with Goldstone at Madison in the
summer of 1961, and then with Salam while I was his guest at Imperial
College in 1961-62. The three of us soon were able to show that Goldstone
bosons must in fact occur whenever a symmetry like isospin or
strangeness is spontaneously broken, and that their masses then remain
zero to all orders of perturbation theory. I remember being so discouraged
by these zero masses that when we wrote our joint paper on the subject, [8]
I added an epigraph to the paper to underscore the futility of supposing
that anything could be explained in terms of a non-invariant vacuum state:
it was Lear’s retort to Cordelia, “Nothing will come of nothing: speak
again.” Of course, The Physical Review protected the purity of the physics
literature, and removed the quote. Considering the future of the non-
invariant vacuum in theoretical physics, it was just as well.
There was actually an exception to this proof, pointed out soon after-
wards by Higgs, Kibble, and others. [9] They showed that if the broken
symmetry is a local, gauge symmetry, like electromagnetic gauge in-
variance, then although the Goldstone bosons exist formally, and are in
some sense real, they can be eliminated by a gauge transformation, so that
they do not appear as physical particles. The missing Goldstone bosons
appear instead as helicity zero states of the vector particles, which thereby
acquire a mass.
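The bookkeeping behind this statement can be sketched by counting spin states before and after symmetry breaking. The example below assumes the simplest case, a single U(1) gauge field coupled to one complex scalar, which is chosen here for illustration and is not a model discussed in the lecture; the function name spin_states is likewise hypothetical.

```python
def spin_states(spin, massive):
    """Number of independent spin (helicity) states of one real field."""
    if spin == 0:
        return 1
    # A massive particle of integer spin s has 2s + 1 states; a massless one
    # with spin has only two helicities.
    return 2 * spin + 1 if massive else 2


# Before symmetry breaking: one massless vector plus one complex scalar
# (two real spin-zero fields, one of which is the would-be Goldstone boson).
before = spin_states(1, massive=False) + 2 * spin_states(0, massive=True)

# After: one massive vector (helicities +1, 0, -1) plus one real scalar.
after = spin_states(1, massive=True) + spin_states(0, massive=True)
```

Both counts equal four: the Goldstone degree of freedom has not vanished but has become the helicity zero state of the massive vector particle.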
I think that at the time physicists who heard about this exception gener-
ally regarded it as a technicality. This may have been because of a new
development in theoretical physics which suddenly seemed to change the
role of Goldstone bosons from that of unwanted intruders to that of
welcome friends.
In 1964 Adler and Weisberger [10] independently derived sum rules
which gave the ratio g_A/g_V of axial-vector to vector coupling constants in
beta decay in terms of pion-nucleon cross sections. One way of looking at
their calculation (perhaps the most common way at the time) was as an
analogue to the old dipole sum rule in atomic physics: a complete set of
hadronic states is inserted in the commutation relations of the axial vector
currents. This is the approach memorialized in the name of “current
algebra.” [11] But there was another way of looking at the Adler-Weis-
berger sum rule. One could suppose that the strong interactions have an
approximate symmetry, based on the group SU(2) x SU(2), and that this
symmetry is spontaneously broken, giving rise among other things to the
nucleon masses. The pion is then identified as (approximately) a Gold-
stone boson, with small non-zero mass, an idea that goes back to Nambu.
[12] Although the SU(2) X SU(2) symmetry is spontaneously broken, it still
has a great deal of predictive power, but its predictions take the form of
approximate formulas, which give the matrix elements for low energy
pionic reactions. In this approach, the Adler-Weisberger sum rule is ob-
tained by using the predicted pion nucleon scattering lengths in conjunc-
tion with a well-known sum rule [13], which years earlier had been derived
from the dispersion relations for pion-nucleon scattering.
In these calculations one is really using not only the fact that the strong
interactions have a spontaneously broken approximate SU(2) X SU(2) sym-
metry, but also that the currents of this symmetry group are, up to an
overall constant, to be identified with the vector and axial vector currents
of beta decay. (With this assumption the ratio g_A/g_V gets into the picture
through the Goldberger-Treiman relation, [14] which gives g_A/g_V in terms of
the pion decay constant and the pion nucleon coupling.) Here, in this relation
between the currents of the symmetries of the strong interactions and the
physical currents of beta decay, there was a tantalizing hint of a deep
connection between the weak interactions and the strong interactions. But
this connection was not really understood for almost a decade.
I spent the years 1965-67 happily developing the implications of spon-
taneous symmetry breaking for the strong interactions. [15] It was this
work that led to my 1967 paper on weak and electromagnetic unification.
But before I come to that I have to go back in history and pick up one
other line of thought, having to do with the problem of infinities in
quantum field theory.
I believe that it was Oppenheimer and Waller in 1930 [16] who indepen-
dently first noted that quantum field theory when pushed beyond the
lowest approximation yields ultraviolet divergent results for radiative self
energies. Professor Waller told me last night that when he described this
result to Pauli, Pauli did not believe it. It must have seemed that these
infinities would be a disaster for the quantum field theory that had just
been developed by Heisenberg and Pauli in 1929-30. And indeed, these
infinities did lead to a sense of discouragement about quantum field theory,
and many attempts were made in the 1930’s and early 1940’s to find
alternatives. The problem was solved (at least for quantum electrodynam-
ics) after the war, by Feynman, Schwinger, and Tomonaga [17] and Dyson
[19]. It was found that all infinities disappear if one identifies the observed
finite values of the electron mass and charge, not with the parameters m
and e appearing in the Lagrangian, but with the electron mass and charge
that are calculated from m and e, when one takes into account the fact that
the electron and photon are always surrounded with clouds of virtual
photons and electron-positron pairs [18]. Suddenly all sorts of calculations
became possible, and gave results in spectacular agreement with experi-
ment.
But even after this success, opinions differed as to the significance of the
ultraviolet divergences in quantum field theory. Many thought - and some
still do think - that what had been done was just to sweep the real problems
under the rug. And it soon became clear that there was only a limited class
of so-called “renormalizable” theories in which the infinities could be
eliminated by absorbing them into a redefinition, or a “renormalization,”
of a finite number of physical parameters. (Roughly speaking, in renorma-
lizable theories no coupling constants can have the dimensions of negative
powers of mass. But every time we add a field or a space-time derivative to
an interaction, we reduce the dimensionality of the associated coupling
constant. So only a few simple types of interaction can be renormalizable.)
In particular, the existing Fermi theory of weak interactions clearly was
not renormalizable. (The Fermi coupling constant has the dimensions of
[mass]^-2.) The sense of discouragement about quantum field theory persisted
into the 1950’s and 1960’s.
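The dimensional rule of thumb quoted above can be turned into a small calculation. The sketch below assumes four space-time dimensions and natural units, with the standard mass dimensions of 1 for scalar and vector fields and 3/2 for fermion fields. The function name coupling_dimension and the example interactions are introduced here for illustration only.

```python
from fractions import Fraction

# Canonical mass dimensions of fields in four space-time dimensions.
FIELD_DIMENSION = {
    "scalar": Fraction(1),
    "fermion": Fraction(3, 2),
    "vector": Fraction(1),
}


def coupling_dimension(fields, derivatives=0):
    """Mass dimension of the coupling multiplying a product of fields.

    The Lagrangian density has dimension 4, so the coupling must supply
    whatever the fields and derivatives (dimension 1 each) do not.
    """
    operator_dimension = sum(FIELD_DIMENSION[f] for f in fields) + derivatives
    return 4 - operator_dimension


fermi = coupling_dimension(["fermion"] * 4)                 # four-fermion
qed = coupling_dimension(["fermion", "fermion", "vector"])  # electron-photon
quartic = coupling_dimension(["scalar"] * 4)                # scalar self-coupling
```

The four-fermion coupling comes out with dimension -2, matching the [mass]^-2 of the Fermi constant, while the electron-photon and quartic scalar couplings are dimensionless. Adding a field or a derivative lowers the result, as the text says.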
I learned about renormalization theory as a graduate student, mostly by
reading Dyson’s papers. [19] From the beginning it seemed to me to be a
wonderful thing that very few quantum field theories are renormalizable.
Limitations of this sort are, after all, what we most want: not mathematical
methods which can make sense of an infinite variety of physically irrele-
vant theories, but methods which carry constraints, because these con-
straints may point the way toward the one true theory. In particular, I was
impressed by the fact that quantum electrodynamics could in a sense be
derived from symmetry principles and the constraints of renormalizability;
the only Lorentz invariant and gauge invariant renormalizable Lagrangian
for photons and electrons is precisely the original Dirac Lagrangian of
QED. Of course, that is not the way Dirac came to his theory. He had the
benefit of the information gleaned in centuries of experimentation on
electromagnetism, and in order to fix the final form of his theory he relied
on ideas of simplicity (specifically, on what is sometimes called minimal
electromagnetic coupling). But we have to look ahead, to try to make
theories of phenomena which have not been so well studied experimental-
ly, and we may not be able to trust purely formal ideas of simplicity. I
thought that renormalizability might be the key criterion, which also in a
more general context would impose a precise kind of simplicity on our
theories and help us to pick out the one true physical theory out of the
infinite variety of conceivable quantum field theories. As I will explain
later, I would say this a bit differently today, but I am more convinced than
ever that the use of renormalizability as a constraint on our theories of the
observed interactions is a good strategy. Filled with enthusiasm for renor-
malization theory, I wrote my Ph.D. thesis under Sam Treiman in 1957 on
the use of a limited version of renormalizability to set constraints on the
weak interactions, [20] and a little later I worked out a rather tough little
theorem [21] which completed the proof by Dyson [19] and Salam [22] that
ultraviolet divergences really do cancel out to all orders in nominally
renormalizable theories. But none of this seemed to help with the impor-
tant problem, of how to make a renormalizable theory of weak interac-
tions.
Now, back to 1967. I had been considering the implications of the
broken SU(2) x SU(2) symmetry of the strong interactions, and I thought
of trying out the idea that perhaps the SU(2) x SU(2) symmetry was a
“local,” not merely a “global,” symmetry. That is, the strong interactions
might be described by something like a Yang-Mills theory, but in addition
to the vector ρ mesons of the Yang-Mills theory, there would also be axial
vector A1 mesons. To give the ρ meson a mass, it was necessary to insert a
common ρ and A1 mass term in the Lagrangian, and the spontaneous
breakdown of the SU(2) x SU(2) symmetry would then split the ρ and A1
by something like the Higgs mechanism, but since the theory would not be
gauge invariant the pions would remain as physical Goldstone bosons.
This theory gave an intriguing result, that the A1/ρ mass ratio should be
√2, and in trying to understand this result without relying on perturbation
theory, I discovered certain sum rules, the “spectral function sum rules,”
[23] which turned out to have a variety of other uses. But the SU(2) x SU(2)
theory was not gauge invariant, and hence it could not be renormalizable,
[24] so I was not too enthusiastic about it. [25] Of course, if I did not insert
the ρ-A1 mass term in the Lagrangian, then the theory would be gauge
invariant and renormalizable, and the A1 would be massive. But then
there would be no pions and the ρ mesons would be massless, in obvious
contradiction (to say the least) with observation.
At some point in the fall of 1967, I think while driving to my office at
M.I.T., it occurred to me that I had been applying the right ideas to the
wrong problem. It is not the ρ meson that is massless: it is the photon.
And its partner is not the A1, but the massive intermediate boson, which
since the time of Yukawa had been suspected to be the mediator of the
weak interactions. The weak and electromagnetic interactions could then
be described [26] in a unified way in terms of an exact but spontaneously
broken gauge symmetry. [Of course, not necessarily SU(2) X SU(2)]. And
this theory would be renormalizable like quantum electrodynamics be-
cause it is gauge invariant like quantum electrodynamics.
It was not difficult to develop a concrete model which embodied these
ideas. I had little confidence then in my understanding of strong interactions,
so I decided to concentrate on leptons. There are two left-handed
electron-type leptons, the ν_eL and e_L, and one right-handed electron-type
lepton, the e_R, so I started with the group U(2) x U(1): all unitary 2 x 2
matrices acting on the left-handed e-type leptons, together with all unitary
1 x 1 matrices acting on the right-handed e-type lepton. Breaking up U(2)
into unimodular transformations and phase transformations, one could
say that the group was SU(2) x U(1) x U(1). But then one of the U(1)’s
could be identified with ordinary lepton number, and since lepton number
appears to be conserved and there is no massless vector particle coupled to
it, I decided to exclude it from the group. This left the four-parameter
group SU(2) x U(1). The spontaneous breakdown of SU(2) x U(1) to the
U(1) of ordinary electromagnetic gauge invariance would give masses to
three of the four vector gauge bosons: the charged bosons W±, and a
neutral boson that I called the Z0. The fourth boson would automatically
remain massless, and could be identified as the photon. Knowing the
strength of the ordinary charged current weak interactions like beta decay
which are mediated by W±, the mass of the W± was then determined as
about 40 GeV/sin θ, where θ is the γ-Z0 mixing angle.
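The parameter counting in this paragraph can be written out as simple arithmetic, using the standard facts that U(n) has n^2 real parameters and SU(n) has n^2 - 1. The helper names below are hypothetical and serve only as a sketch.

```python
def dim_u(n):
    """Number of real parameters of the unitary group U(n)."""
    return n * n


def dim_su(n):
    """Number of real parameters of SU(n): U(n) minus the overall phase."""
    return n * n - 1


# U(2) on the left-handed doublet times U(1) on the right-handed singlet.
start = dim_u(2) + dim_u(1)

# The same group rewritten as SU(2) x U(1) x U(1).
split = dim_su(2) + dim_u(1) + dim_u(1)

# Excluding the U(1) identified with lepton number leaves SU(2) x U(1).
gauge_parameters = split - dim_u(1)

# One gauge boson per parameter; breaking to the U(1) of electromagnetism
# leaves one boson massless and gives mass to the rest.
massless_bosons = dim_u(1)
massive_bosons = gauge_parameters - massless_bosons
```

This reproduces the four-parameter group, with three massive vector bosons and one massless one.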
To go further, one had to make some hypothesis about the mechanism
for the breakdown of SU(2) x U(1). The only kind of field in a renormalizable
SU(2) x U(1) theory whose vacuum expectation values could give the
electron a mass is a spin zero SU(2) doublet (φ+, φ0), so for simplicity I
assumed that these were the only scalar fields in the theory. The mass of
the Z0 was then determined as about 80 GeV/sin 2θ. This fixed the
strength of the neutral current weak interactions. Indeed, just as in QED,
once one decides on the menu of fields in the theory all details of the
theory are completely determined by symmetry principles and renormal-
izability, with just a few free parameters: the lepton charge and masses, the
Fermi coupling constant of beta decay, the mixing angle θ, and the mass of
the scalar particle. (It was of crucial importance to impose the constraint of
renormalizability; otherwise weak interactions would receive contributions
from SU(2) x U(1)-invariant four-fermion couplings as well as from vector
boson exchange, and the theory would lose most of its predictive power.)
The naturalness of the whole theory is well demonstrated by the fact that
much the same theory was independently developed [27] by Salam in
1968.
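The rough mass estimates for the charged and neutral vector bosons can be reproduced from the lowest-order relations of the model as they are usually written: M_W = (pi alpha / (sqrt(2) G_F))^(1/2) / sin(theta) and M_Z = M_W / cos(theta), with theta the mixing angle. The sketch below assumes these tree-level formulas and modern reference values for the fine-structure constant and the Fermi constant. The function names and the sample value of sin^2(theta) are illustrative assumptions, and radiative corrections are ignored.

```python
import math

ALPHA = 1.0 / 137.036   # fine-structure constant (assumed reference value)
G_F = 1.1664e-5         # Fermi constant in GeV^-2 (assumed reference value)

# Common prefactor in GeV; this is the "about 40 GeV" of the rough estimate.
prefactor = math.sqrt(math.pi * ALPHA / (math.sqrt(2.0) * G_F))


def w_mass(sin2_theta):
    """Lowest-order W mass in GeV for a given sin^2(theta)."""
    return prefactor / math.sqrt(sin2_theta)


def z_mass(sin2_theta):
    """Lowest-order Z mass in GeV; equals 2 * prefactor / sin(2 theta)."""
    return w_mass(sin2_theta) / math.sqrt(1.0 - sin2_theta)


sin2 = 0.23  # illustrative value of sin^2(theta), not taken from the lecture
m_w = w_mass(sin2)
m_z = z_mass(sin2)
```

With these inputs the prefactor is a little over 37 GeV, so the W mass is roughly 40 GeV divided by sin(theta) and the Z mass roughly 80 GeV divided by sin(2 theta). The Z is always heavier than the W.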
The next question now was renormalizability. The Feynman rules for
Yang-Mills theories with unbroken gauge symmetries had been worked
out [28] by DeWitt, Faddeev and Popov and others, and it was known that
such theories are renormalizable. But in 1967 I did not know how to prove
that this renormalizability was not spoiled by the spontaneous symmetry
breaking. I worked on the problem on and off for several years, partly in
collaboration with students, [29] but I made little progress. With hindsight,
my main difficulty was that in quantizing the vector fields I adopted a
gauge now known as the unitarity gauge [30]: this gauge has several
wonderful advantages, it exhibits the true particle spectrum of the theory,
but it has the disadvantage of making renormalizability totally obscure.
Finally, in 1971 ’t Hooft [31] showed in a beautiful paper how the
problem could be solved. He invented a gauge, like the “Feynman gauge”
in QED, in which the Feynman rules manifestly lead to only a finite
number of types of ultraviolet divergence. It was also necessary to show
that these infinities satisfied essentially the same constraints as the Lagran-
gian itself, so that they could be absorbed into a redefinition of the
parameters of the theory. (This was plausible, but not easy to prove,
because a gauge invariant theory can be quantized only after one has
picked a specific gauge, so it is not obvious that the ultraviolet divergences
satisfy the same gauge invariance constraints as the Lagrangian itself.) The
proof was subsequently completed [32] by Lee and Zinn-Justin and by ’t
Hooft and Veltman. More recently, Becchi, Rouet and Stora [33] have
invented an ingenious method for carrying out this sort of proof, by using
a global supersymmetry of gauge theories which is preserved even when
we choose a specific gauge.
I have to admit that when I first saw ’t Hooft’s paper in 1971, I was not
convinced that he had found the way to prove renormalizability. The
trouble was not with ’t Hooft, but with me: I was simply not familiar
enough with the path integral formalism on which ’t Hooft’s work was
based, and I wanted to see a derivation of the Feynman rules in ’t Hooft’s
gauge from canonical quantization. That was soon supplied (for a limited
class of gauge theories) by a paper of Ben Lee, [34] and after Lee’s paper I
was ready to regard the renormalizability of the unified theory as essential-
ly proved.
By this time, many theoretical physicists were becoming convinced of the
general approach that Salam and I had adopted: that is, the weak and
electromagnetic interactions are governed by some group of exact local
gauge symmetries; this group is spontaneously broken to U(1), giving mass
to all the vector bosons except the photon; and the theory is renormaliza-
ble. What was not so clear was that our specific simple model was the one
chosen by nature. That, of course, was a matter for experiment to decide.
It was obvious even back in 1967 that the best way to test the theory
would be by searching for neutral current weak interactions, mediated by
the neutral intermediate vector boson, the Z0. Of course, the possibility of
neutral currents was nothing new. There had been speculations [35] about
possible neutral currents as far back as 1937 by Gamow and Teller,
Kemmer, and Wentzel, and again in 1958 by Bludman and Leite-Lopes.
Attempts at a unified weak and electromagnetic theory had been made
[36] by Glashow and Salam and Ward in the early 1960’s, and these had
neutral currents with many of the features that Salam and I encountered
in developing the 1967-68 theory. But since one of the predictions of our
theory was a value for the mass of the Z0, it made a definite prediction of
the strength of the neutral currents. More important, now we had a
comprehensive quantum field theory of the weak and electromagnetic
interactions that was physically and mathematically satisfactory in the same
sense as was quantum electrodynamics - a theory that treated photons and
intermediate vector bosons on the same footing, that was based on an exact
symmetry principle, and that allowed one to carry calculations to any
desired degree of accuracy. To test this theory, it had now become urgent
to settle the question of the existence of the neutral currents.
Late in 1971, I carried out a study of the experimental possibilities. [37]
The results were striking. Previous experiments had set upper bounds on
the rates of neutral current processes which were rather low, and many
people had received the impression that neutral currents were pretty well
ruled out, but I found that in fact the 1967-68 theory predicted quite low
rates, low enough in fact to have escaped clear detection up to that time.
For instance, experiments [38] a few years earlier had found an upper
bound of 0.12 ± 0.06 on the ratio of a neutral current process, the elastic
scattering of muon neutrinos by protons, to the corresponding charged
current process, in which a muon is produced. I found a predicted ratio of
0.15 to 0.25, depending on the value of the γ-Z0 mixing angle θ. So there
was every reason to look a little harder.
As everyone knows, neutral currents were finally discovered [39] in
1973. There followed years of careful experimental study on the detailed
properties of the neutral currents. It would take me too far from my
subject to survey these experiments, [40] so I will just say that they have
confirmed the 1967-68 theory with steadily improving precision for neu-
trino-nucleon and neutrino-electron neutral current reactions, and since
the remarkable SLAC-Yale experiment [41] last year, for the electron-
nucleon neutral current as well.
This is all very nice. But I must say that I would not have been too
disturbed if it had turned out that the correct theory was based on some
other spontaneously broken gauge group, with very different neutral
currents. One possibility was a clever SU(2) theory proposed in 1972 by
Georgi and Glashow, [42] which has no neutral currents at all. The impor-
tant thing to me was the idea of an exact spontaneously broken gauge
symmetry, which connects the weak and electromagnetic interactions, and
allows these interactions to be renormalizable. Of this I was convinced, if
only because it fitted my conception of the way that nature ought to be.
There were two other relevant theoretical developments in the early
1970’s, before the discovery of neutral currents, that I must mention here.
One is the important work of Glashow, Iliopoulos, and Maiani on the
charmed quark. [43] Their work provided a solution to what otherwise
would have been a serious problem, that of neutral strangeness changing
currents. I leave this topic for Professor Glashow’s talk. The other theoreti-
cal development has to do specifically with the strong interactions, but it
will take us back to one of the themes of my talk, the theme of symmetry.
In 1973, Politzer and Gross and Wilczek discovered [44] a remarkable
property of Yang-Mills theories which they called “asymptotic freedom”
- the effective coupling constant [45] decreases to zero as the characteris-
tic energy of a process goes to infinity. It seemed that this might explain
the experimental fact that the nucleon behaves in high energy deep inelas-
tic electron scattering as if it consists of essentially free quarks. [46] But
there was a problem. In order to give masses to the vector bosons in a
gauge theory of strong interactions one would want to include strongly
interacting scalar fields, and these would generally destroy asymptotic
freedom. Another difficulty, one that particularly bothered me, was that in
a unified theory of weak and electromagnetic interactions the fundamen-
tal weak coupling is of the same order as the electronic charge, e, so the
effects of virtual intermediate vector bosons would introduce much too
large violations of parity and strangeness conservation, of order 1/137,
into the strong interactions of the scalars with each other and with the
quarks. [47] At some point in the spring of 1973 it occurred to me (and
independently to Gross and Wilczek) that one could do away with strongly
interacting scalar fields altogether, allowing the strong interaction gauge
symmetry to remain unbroken so that the vector bosons, or “gluons”, are
massless, and relying on the increase of the strong forces with increasing
distance to explain why quarks as well as the massless gluons are not seen
in the laboratory. [48] Assuming no strongly interacting scalars, three
“colors” of quarks (as indicated by earlier work of several authors [49]),
and an SU(3) gauge group, one then had a specific theory of strong
interactions, the theory now generally known as quantum chromodyna-
mics.
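The statement that the effective coupling decreases to zero at high energy can be illustrated with the standard one-loop formula for the running strong coupling, alpha_s(Q) = alpha_s(mu) / (1 + alpha_s(mu) (b0 / 4 pi) ln(Q^2 / mu^2)), with b0 = 11 - 2 n_f / 3 for an SU(3) gauge theory with n_f quark flavors. The sketch below makes several assumptions: one-loop accuracy only, a fixed number of flavors, and a modern reference value for the coupling that is not taken from the lecture.

```python
import math


def alpha_s(q, alpha_ref=0.118, mu=91.19, n_f=5):
    """One-loop running coupling at energy q (GeV).

    alpha_ref is the assumed coupling at the reference scale mu (GeV);
    n_f is the number of quark flavors, held fixed for simplicity.
    """
    b0 = 11.0 - 2.0 * n_f / 3.0   # positive as long as n_f < 17
    log_term = math.log(q * q / (mu * mu))
    return alpha_ref / (1.0 + alpha_ref * b0 / (4.0 * math.pi) * log_term)


# The coupling shrinks as the energy grows and grows toward low energy.
low = alpha_s(10.0)
mid = alpha_s(100.0)
high = alpha_s(1000.0)
very_high = alpha_s(1.0e15)
```

Because b0 is positive, the denominator grows logarithmically with energy and the coupling falls slowly toward zero. The same formula shows the coupling growing at low energies, which is the behavior invoked above to explain why quarks and gluons are not seen in isolation.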
Experiments since then have increasingly confirmed QCD as the correct
theory of strong interactions. What concerns me here, though, is its impact
on our understanding of symmetry principles. Once again, the constraints
of gauge invariance and renormalizability proved enormously powerful.
These constraints force the Lagrangian to be so simple, that the strong
interactions in QCD must conserve strangeness, charge conjugation, and
(apart from problems [50] having to do with instantons) parity. One does
not have to assume these symmetries as a priori principles; there is simply
no way that the Lagrangian can be complicated enough to violate them.
With one additional assumption, that the u and d quarks have relatively
small masses, the strong interactions must also satisfy the approximate
SU(2) X SU(2) symmetry of current algebra, which when spontaneously
broken leaves us with isospin. If the s quark mass is also not too large, then
one gets the whole eight-fold way as an approximate symmetry of the
strong interactions. And the breaking of the SU(3) x SU(3) symmetry by
quark masses has just the (3, 3̄) + (3̄, 3) form required to account for the
pion-pion scattering lengths [15] and Gell-Mann-Okubo mass formulas.
Furthermore, with weak and electromagnetic interactions also described
by a gauge theory, the weak currents are necessarily just the
currents associated with these strong interaction symmetries. In other
words, pretty much the whole pattern of approximate symmetries of
strong, weak, and electromagnetic interactions that puzzled us so much in
the 1950’s and 1960’s now stands explained as a simple consequence of
strong, weak, and electromagnetic gauge invariance, plus renormalizabi-
lity. Internal symmetry is now at the point where space-time symmetry was
in Einstein’s day. All the approximate internal symmetries are explained
dynamically. On a fundamental level, there are no approximate or partial
symmetries; there are only exact symmetries which govern all interactions.
I now want to look ahead a bit, and comment on the possible future
development of the ideas of symmetry and renormalizability.
We are still confronted with the question whether the scalar particles
that are responsible for the spontaneous breakdown of the electroweak
gauge symmetry SU(2) X U(1) are really elementary. If they are, then spin
zero semi-weakly decaying “Higgs bosons” should be found at energies
comparable with those needed to produce the intermediate vector bosons.
On the other hand, it may be that the scalars are composites. [51] The
Higgs bosons would then be indistinct broad states at very high mass,
analogous to the possible s-wave enhancement in pion-pion scattering. There
would probably also exist lighter, more slowly decaying, scalar particles of
a rather different type, known as pseudo-Goldstone bosons. [52] And
there would have to exist a new class of “extra strong” interactions [53] to
provide the binding force, extra strong in the sense that asymptotic free-
dom sets in not at a few hundred MeV, as in QCD, but at a few hundred
GeV. This “extra strong” force would be felt by new families of fermions,
and would give these fermions masses of the order of several hundred
GeV. We shall see.

Of the four (now three) types of interactions, only gravity has resisted
incorporation into a renormalizable quantum field theory. This may just
mean that we are not being clever enough in our mathematical treatment
of general relativity. But there is another possibility that seems to me quite
plausible. The constant of gravity defines a unit of energy known as the
Planck energy, about 10^19 GeV. This is the energy at which gravitation
becomes effectively a strong interaction, so that at this energy one can no
longer ignore its ultraviolet divergences. It may be that there is a whole
world of new physics with unsuspected degrees of freedom at these enor-
mous energies, and that general relativity does not provide an adequate
framework for understanding the physics of these superhigh energy de-
grees of freedom. When we explore gravitation or other ordinary phe-
nomena, with particle masses and energies no greater than a TeV or so, we
may be learning only about an “effective” field theory; that is, one in which
superheavy degrees of freedom do not explicitly appear, but the coupling
parameters implicitly represent sums over these hidden degrees of free-
dom.
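As a rough numerical check on the figure quoted above, the Planck energy can be computed as the square root of ħc^5/G. The following is only a sketch: the SI values of the constants and the function name are assumptions introduced here for illustration, not part of the lecture.

```python
import math

# Assumed SI values of the constants (not given in the lecture).
HBAR = 1.054571817e-34           # reduced Planck constant, J s
C = 2.99792458e8                 # speed of light, m/s
G_NEWTON = 6.67430e-11           # Newton's constant, m^3 kg^-1 s^-2
JOULE_PER_GEV = 1.602176634e-10  # conversion factor, J per GeV


def planck_energy_gev():
    """Return sqrt(hbar * c**5 / G), converted from joules to GeV."""
    energy_joule = math.sqrt(HBAR * C**5 / G_NEWTON)
    return energy_joule / JOULE_PER_GEV


if __name__ == "__main__":
    print(f"Planck energy ~ {planck_energy_gev():.2e} GeV")
```

With these inputs the result is of order 10^19 GeV, in line with the value used in the text.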
To see if this makes sense, let us suppose it is true, and ask what kinds of
interactions we would expect on this basis to find at ordinary energy. By
“integrating out” the superhigh energy degrees of freedom in a funda-
mental theory, we generally encounter a very complicated effective field
theory - so complicated, in fact, that it contains all interactions allowed by
symmetry principles. But where dimensional analysis tells us that a cou-
pling constant is a certain power of some mass, that mass is likely to be a
typical superheavy mass, such as 10^19 GeV. The infinite variety of non-renormalizable
interactions in the effective theory have coupling constants
with the dimensionality of negative powers of mass, so their effects are
suppressed at ordinary energies by powers of energy divided by super-
heavy masses. Thus the only interactions that we can detect at ordinary
energies are those that are renormalizable in the usual sense, plus any non-
renormalizable interactions that produce effects which, although tiny, are
somehow exotic enough to be seen.
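The dimensional-analysis argument above can be made concrete with a small sketch. It assumes the simplest possible estimate, that an interaction whose operator has mass dimension d contributes at relative order (E/M)^(d-4); the function name and the sample energies are illustrative choices, not taken from the lecture.

```python
def suppression(energy_gev, heavy_mass_gev, operator_dimension):
    """Naive size of an operator's effect at energy E: (E / M) ** (d - 4).

    d = 4 is a renormalizable interaction (no suppression); d > 4 is a
    non-renormalizable one, suppressed by powers of E / M.
    """
    return (energy_gev / heavy_mass_gev) ** (operator_dimension - 4)


if __name__ == "__main__":
    energy = 1.0e3  # about a TeV, the "ordinary" energy scale in the text
    for heavy_mass in (1.0e16, 1.0e19):  # illustrative superheavy masses
        for dimension in (4, 5, 6):
            factor = suppression(energy, heavy_mass, dimension)
            print(f"M = {heavy_mass:.0e} GeV, d = {dimension}: {factor:.1e}")
```

On this estimate only the d = 4 terms are unsuppressed, which is why the non-renormalizable terms can show up only through effects that are exotic in some other way.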
One way that a very weak interaction could be detected is for it to be
coherent and of long range, so that it can add up and have macroscopic
effects. It has been shown [54] that the only particles whose exchange
could produce such forces are massless particles of spin 0, 1, or 2. And
furthermore, Lorentz invariance alone is enough to show that the long-
range interactions produced by any particle of mass zero and spin 2 must
be governed by general relativity. [55] Thus from this point of view we
should not be too surprised that gravitation is the only interaction discov-
ered so far that does not seem to be described by a renormalizable field
theory - it is almost the only superweak interaction that could have been
detected. And we should not be surprised to find that gravity is well
described by general relativity at macroscopic scales, even if we do not
think that general relativity applies at 10^19 GeV.

Non-renormalizable effective interactions may also be detected if they
violate otherwise exact conservation laws. The leading candidates for viola-
tion are baryon and lepton conservation. It is a remarkable consequence of
the SU(3) and SU(2) x U(1) gauge symmetries of strong, weak, and electromagnetic
interactions, that all renormalizable interactions among known
particles automatically conserve baryon and lepton number. Thus, the fact
that ordinary matter seems pretty stable, that proton decay has not been
seen, should not lead us to the conclusion that baryon and lepton conserva-
tion are fundamental conservation laws. To the accuracy with which they
have been verified, baryon and lepton conservation can be explained as
dynamical consequences of other symmetries, in the same way that strangeness
conservation has been explained within QCD. But superheavy particles
may exist, and these particles may have unusual SU(3) or SU(2) x U(1)
transformation properties, and in this case, there is no reason why
their interactions should conserve baryon or lepton number. I doubt that
they would. Indeed, the fact that the universe seems to contain an excess of
baryons over antibaryons should lead us to suspect that baryon non-
conserving processes have actually occurred. If effects of a tiny nonconser-
vation of baryon or lepton number such as proton decay or neutrino
masses are discovered experimentally, we will then be left with gauge
symmetries as the only true internal symmetries of nature, a conclusion
that I would regard as most satisfactory.
The idea of a new scale of superheavy masses has arisen in another way.
[56] If any sort of “grand unification” of strong and electroweak gauge
couplings is to be possible, then one would expect all of the SU(3) and
SU(2) x U(1) gauge coupling constants to be of comparable magnitude. (In
particular, if SU(3) and SU(2) x U(1) are subgroups of a larger simple
group, then the ratios of the squared couplings are fixed as rational
numbers of order unity.[57]) But this appears in contradiction with the
obvious fact that the strong interactions are stronger than the weak and
electromagnetic interactions. In 1974 Georgi, Quinn and I suggested that
the grand unification scale, at which the couplings are comparable, is at an
enormous energy, and that the reason that the strong coupling is so much
larger than the electroweak couplings at ordinary energies is that QCD is
asymptotically free, so that its effective coupling constant rises slowly as the
energy drops from the grand unification scale to ordinary values. The
change of the strong couplings is very slow (like 1/√(ln E)), so the grand
unification scale must be enormous. We found that for a fairly large class
of theories the grand unification scale comes out to be in the neighborhood
of 10^16 GeV, an energy not all that different from the Planck energy
of 10^19 GeV. The nucleon lifetime is very difficult to estimate accurately,
but we gave a representative value of 10^32 years, which may be accessible
experimentally in a few years. (These estimates have been improved in
more detailed calculations by several authors.) [58] We also calculated a
value for the mixing parameter sin²θ of about 0.2, not far from the present
experimental value of 0.23 ± 0.01. It will be an important task for future
experiments on neutral currents to improve the precision with which sin²θ
is known, to see if it really agrees with this prediction.
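The link between slow logarithmic running and an enormous unification scale can be illustrated with a one-loop sketch. Everything numerical here is an assumption made for illustration: the inverse couplings at the reference scale M_Z, the one-loop coefficients for the known particle content, and the neglect of thresholds and higher-order terms. It is not the calculation described in the text.

```python
import math

M_Z = 91.19  # GeV, reference scale (assumed)

# Assumed inverse couplings at M_Z; U(1) is in the grand-unified normalization.
INV_ALPHA_MZ = {"U1": 59.0, "SU2": 29.6, "SU3": 8.5}

# Assumed one-loop coefficients for the known particle content, in the
# convention d(1/alpha)/d ln(mu) = -b / (2 pi).
B_COEFF = {"U1": 41.0 / 10.0, "SU2": -19.0 / 6.0, "SU3": -7.0}


def inv_alpha(group, mu_gev):
    """One-loop inverse coupling of `group` at the scale mu (in GeV)."""
    t = math.log(mu_gev / M_Z)
    return INV_ALPHA_MZ[group] - B_COEFF[group] * t / (2.0 * math.pi)


def crossing_scale(group_a, group_b):
    """Scale (GeV) at which two one-loop couplings become equal."""
    delta_inv = INV_ALPHA_MZ[group_a] - INV_ALPHA_MZ[group_b]
    delta_b = B_COEFF[group_a] - B_COEFF[group_b]
    t = 2.0 * math.pi * delta_inv / delta_b
    return M_Z * math.exp(t)


if __name__ == "__main__":
    for pair in (("U1", "SU2"), ("SU2", "SU3")):
        print(pair, f"{crossing_scale(*pair):.1e} GeV")
```

With these rough inputs the pairwise crossings fall some eleven to fifteen orders of magnitude above M_Z. The couplings do not meet at a single point in this crude version, but the exponential dependence on the coupling differences shows why the scale has to be so large.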
In a grand unified theory, in order for elementary scalar particles to be
available to produce the spontaneous breakdown of the electroweak gauge
symmetry at a few hundred GeV, it is necessary for such particles to escape
getting superlarge masses from the spontaneous breakdown of the grand
unified gauge group. There is nothing impossible in this, but I have not
been able to think of any reason why it should happen. (The problem may
be related to the old mystery of why quantum corrections do not produce
an enormous cosmological constant; in both cases, one is concerned with
an anomalously small “super-renormalizable” term in the effective Lagran-
gian which has to be adjusted to be zero. In the case of the cosmological
constant, the adjustment must be precise to some fifty decimal places.)
With elementary scalars of small or zero bare mass, enormous ratios of
symmetry breaking scales can arise quite naturally [59]. On the other
hand, if there are no elementary scalars which escape getting superlarge
masses from the breakdown of the grand unified gauge group, then as I
have already mentioned, there must be extra strong forces to bind the
composite Goldstone and Higgs bosons that are associated with the sponta-
neous breakdown of SU(2) x U(1). Such forces can occur rather naturally
in grand unified theories. To take one example, suppose that the grand
gauge group breaks, not into SU(3) x SU(2) x U(1), but into SU(4) x SU(3)
x SU(2) x U(1). Since SU(4) is a bigger group than SU(3), its coupling
constant rises with decreasing energy more rapidly than the QCD cou-
pling, so the SU(4) force becomes strong at a much higher energy than the
few hundred MeV at which the QCD force becomes strong. Ordinary
quarks and leptons would be neutral under SU(4), so they would not feel
this force, but other fermions might carry SU(4) quantum numbers, and so
get rather large masses. One can even imagine a sequence of increasingly
large subgroups of the grand gauge group, which would fill in the vast
energy range up to 10^15 or 10^19 GeV with particle masses that are produced
by these successively stronger interactions.
If there are elementary scalars whose vacuum expectation values are
responsible for the masses of ordinary quarks and leptons, then these
masses can be affected in order α by radiative corrections involving the
superheavy vector bosons of the grand gauge group, and it will probably
be impossible to explain the value of quantities like m_e/m_μ without a
complete grand unified theory. On the other hand, if there are no such
elementary scalars, then almost all the details of the grand unified theory
are forgotten by the effective field theory that describes physics at ordi-
nary energies, and it ought to be possible to calculate quark and lepton
masses purely in terms of processes at accessible energies. Unfortunately,
no one so far has been able to see how in this way anything resembling the
observed pattern of masses could arise. [60]
Putting aside all these uncertainties, suppose that there is a truly fundamental
theory, characterized by an energy scale of order 10^16 to 10^19 GeV,
at which strong, electroweak, and gravitational interactions are all united.
It might be a conventional renormalizable quantum field theory, but at the
moment, if we include gravity, we do not see how this is possible. (I leave
the topic of supersymmetry and supergravity for Professor Salam’s talk.)
But if it is not renormalizable, what then determines the infinite set of
coupling constants that are needed to absorb all the ultraviolet divergences
of the theory?
I think the answer must lie in the fact that the quantum field theory,
which was born just fifty years ago from the marriage of quantum mechan-
ics with relativity, is a beautiful but not very robust child. As Landau and
Källén recognized long ago, quantum field theory at superhigh energies is
susceptible to all sorts of diseases - tachyons, ghosts, etc. - and it needs
special medicine to survive. One way that a quantum field theory can avoid
these diseases is to be renormalizable and asymptotically free, but there are
other possibilities. For instance, even an infinite set of coupling constants
may approach a non-zero fixed point as the energy at which they are
measured goes to infinity. However, to require this behavior generally
imposes so many constraints on the couplings that there are only a finite
number of free parameters left [61] - just as for theories that are renormalizable
in the usual sense. Thus, one way or another, I think that quantum
field theory is going to go on being very stubborn, refusing to allow us to
describe all but a small number of possible worlds, among which, we hope,
is ours.
I suppose that I tend to be optimistic about the future of physics. And
nothing makes me more optimistic than the discovery of broken symme-
tries. In the seventh book of the Republic, Plato describes prisoners who are
chained in a cave and can see only shadows that things outside cast on the
cave wall. When released from the cave at first their eyes hurt, and for a
while they think that the shadows they saw in the cave are more real than
the objects they now see. But eventually their vision clears, and they can
understand how beautiful the real world is. We are in such a cave, impris-
oned by the limitations on the sorts of experiments we can do. In particu-
lar, we can study matter only at relatively low temperatures, where symme-
tries are likely to be spontaneously broken, so that nature does not appear
very simple or unified. We have not been able to get out of this cave, but by
looking long and hard at the shadows on the cave wall, we can at least make
out the shapes of symmetries, which though broken, are exact principles
governing all phenomena, expressions of the beauty of the world outside.

It has only been possible here to give references to a very small part of
the literature on the subjects discussed in this talk. Additional references
can be found in the following reviews:

Abers, E.S. and Lee, B.W., Gauge Theories (Physics Reports 9C, No. 1,
1973).
Marciano, W. and Pagels, H., Quantum Chromodynamics (Physics Reports
36C, No. 3, 1978).
Taylor, J.C., Gauge Theories of Weak Interactions (Cambridge Univ. Press,
1976).

REFERENCES
1. Tuve, M. A., Heydenberg, N. and Hafstad, L. R. Phys. Rev. 50, 806 (1936); Breit, G.,
Condon, E. V. and Present, R. D. Phys. Rev. 50, 825 (1936); Breit, G. and Feenberg, E.
Phys. Rev. 50, 850 (1936).
2. Gell-Mann, M. Phys. Rev. 92, 833 (1953); Nakano, T. and Nishijima, K. Prog. Theor.
Phys. 10, 581 (1955).
3. Lee, T. D. and Yang, C. N. Phys. Rev. 104, 254 (1956); Wu, C. S. et al. Phys. Rev. 105,
1413 (1957); Garwin, R., Lederman, L. and Weinrich, M. Phys. Rev. 105, 1415 (1957);
Friedman, J. I. and Telegdi, V. L. Phys. Rev. 105, 1681 (1957).
4. Gell-Mann, M. Cal. Tech. Synchrotron Laboratory Report CTSL-20 (1961), unpublished;
Ne’eman, Y. Nucl. Phys. 26, 222 (1961).
5. Fock, V. Z. f. Physik 39, 226 (1927); Weyl, H. Z. f. Physik 56, 330 (1929). The name
“gauge invariance” is based on an analogy with the earlier speculations of Weyl, H. in
Raum, Zeit, Materie, 3rd edn, (Springer, 1920). Also see London, F. Z. f. Physik 42, 375
(1927). (This history has been reviewed by Yang, C. N. in a talk at City College, (1977).)
6. Yang, C. N. and Mills, R. L. Phys. Rev. 96, 191 (1954).
7. Goldstone, J. Nuovo Cimento 19, 154 (1961).
8. Goldstone, J., Salam, A. and Weinberg, S. Phys. Rev. 127, 965 (1962).
9. Higgs, P. W. Phys. Lett. 12, 132 (1964); 13, 508 (1964); Phys. Rev. 145, 1156 (1966);
Kibble, T. W. B. Phys. Rev. 155, 1554 (1967); Guralnik, G. S., Hagen, C. R. and Kibble,
T. W. B. Phys. Rev. Lett. 13, 585 (1964); Englert, F. and Brout, R. Phys. Rev. Lett. 13,
321 (1964); Also see Anderson, P. W. Phys. Rev. 130, 439 (1963).
10. Adler, S. L. Phys. Rev. Lett. 14, 1051 (1965); Phys. Rev. 140, B736 (1965); Weisberger, W.
I. Phys. Rev. Lett. 14, 1047 (1965); Phys. Rev. 143, 1302 (1966).
11. Gell-Mann, M. Physics 1, 63 (1964).
12. Nambu, Y. and Jona-Lasinio, G. Phys. Rev. 122, 345 (1961); 124, 246 (1961); Nambu, Y.
and Lurie, D. Phys. Rev. 125, 1429 (1962); Nambu, Y. and Shrauner, E. Phys. Rev. 128,
862 (1962); Also see Gell-Mann, M. and Levy, M., Nuovo Cimento 16, 705 (1960).
13. Goldberger, M. L., Miyazawa, H. and Oehme, R. Phys. Rev. 99, 986 (1955).
14. Goldberger, M. L., and Treiman, S. B. Phys. Rev. 111, 354 (1958).
15. Weinberg, S. Phys. Rev. Lett. 16, 879 (1966); 17, 336 (1966); 17, 616 (1966); 18, 188
(1967); Phys. Rev. 166, 1568 (1967).
16. Oppenheimer, J. R. Phys. Rev. 35, 461 (1930); Waller, I. Z. Phys. 59, 168 (1930); ibid.,
62, 673 (1930).
17. Feynman, R. P. Rev. Mod. Phys. 20, 367 (1948); Phys. Rev. 74, 939, 1430 (1948); 76, 749,
769 (1949); 80, 440 (1950); Schwinger, J. Phys. Rev. 73, 146 (1948); 74, 1439 (1948); 75,
651 (1949); 76, 790 (1949); 82, 664, 914 (1951); 91, 713 (1953); Proc. Nat. Acad. Sci. 37,
452 (1951); Tomonaga, S. Progr. Theor. Phys. (Japan) 1, 27 (1946); Koba, Z., Tati, T.
and Tomonaga, S. ibid. 2, 101 (1947); Kanazawa, S. and Tomonaga, S. ibid. 3, 276 (1948);
Koba, Z. and Tomonaga, S. ibid. 3, 290 (1948).
18. There had been earlier suggestions that infinities could be eliminated from quantum
field theories in this way, by Weisskopf, V. F. Kong. Dansk. Vid. Sel. Mat.-Fys. Medd. 15
(6) 1936, especially p. 34 and pp. 5-6; Kramers, H. (unpublished).
19. Dyson, F. J. Phys. Rev. 75, 486, 1736 (1949).
20. Weinberg, S. Phys. Rev. 106, 1301 (1957).
21. Weinberg, S. Phys. Rev. 118, 838 (1960).
22. Salam, A. Phys. Rev. 82, 217 (1951); 84, 426 (1951).
23. Weinberg, S. Phys. Rev. Lett. 18, 507 (1967).
24. For the non-renormalizability of theories with intrinsically broken gauge symmetries, see
Komar, A. and Salam, A. Nucl. Phys. 21, 624 (1960); Umezawa, H. and Kamefuchi, S.
Nucl. Phys. 23, 399 (1961); Kamefuchi, S., O’Raifeartaigh, L. and Salam, A. Nucl. Phys.
28, 529 (1961); Salam, A. Phys. Rev. 127, 331 (1962); Veltman, M. Nucl. Phys. B7, 637
(1968); B21, 288 (1970); Boulware, D. Ann. Phys. (N.Y.) 56, 140 (1970).
25. This work was briefly reported in reference 23, footnote 7.
26. Weinberg, S. Phys. Rev. Lett. 19, 1264 (1967).
27. Salam, A. In Elementary Particle Physics (Nobel Symposium No. 8), ed. by Svartholm, N.
(Almqvist and Wiksell, Stockholm, 1968), p. 367.
28. deWitt, B. Phys. Rev. Lett. 12, 742 (1964); Phys. Rev. 162, 1195 (1967); Faddeev L. D.,
and Popov, V. N. Phys. Lett. B25, 29 (1967); Also see Feynman, R. P. Acta. Phys. Pol. 24,
697 (1963); Mandelstam, S. Phys. Rev. 175, 1580, 1604 (1968).
29. See Stuller, L., M.I.T. Ph.D. Thesis (1971), unpublished.
30. My work with the unitarity gauge was reported in Weinberg, S. Phys. Rev. Lett. 27, 1688
(1971), and described in more detail in Weinberg, S. Phys. Rev. D7, 1068 (1973).
31. ’t Hooft, G. Nucl. Phys. B35, 167 (1971).
32. Lee, B. W. and Zinn-Justin, J. Phys. Rev. D5, 3121, 3137, 3155 (1972); ’t Hooft, G. and
Veltman, M. Nucl. Phys. B44, 189 (1972), B50, 318 (1972). There still remained the
problem of possible Adler-Bell-Jackiw anomalies, but these nicely cancelled; see D. J.
Gross and R. Jackiw, Phys. Rev. D6, 477 (1972) and C. Bouchiat, J. Iliopoulos, and Ph.
Meyer, Phys. Lett. 38B, 519 (1972).
33. Becchi, C., Rouet, A. and Stora, R. Comm. Math. Phys. 42, 127 (1975).
34. Lee, B. W. Phys. Rev. D5, 823 (1972).
35. Gamow, G. and Teller, E. Phys. Rev. 51, 288 (1937); Kemmer, N. Phys. Rev. 52, 906
(1937); Wentzel, G. Helv. Phys. Acta. 10, 108 (1937); Bludman, S. Nuovo Cimento 9, 433
(1958); Leite-Lopes, J. Nucl. Phys. 8, 234 (1958).
36. Glashow, S. L. Nucl. Phys. 22, 519 (1961); Salam, A. and Ward, J. C. Phys. Lett. 13, 168
(1964).
37. Weinberg, S. Phys. Rev. D5, 1412 (1972).
38. Cundy, D. C. et al., Phys. Lett. 31B, 478 (1970).
39. The first published discovery of neutral currents was at the Gargamelle Bubble Chamber
at CERN: Hasert, F. J. et al., Phys. Lett. 46B, 121, 138 (1973). Also see Musset, P. Jour.
de Physique 11/12 T34 (1973). Muonless events were seen at about the same time by the
HPWF group at Fermilab, but when publication of their paper was delayed, they took the
opportunity to rebuild their detector, and then did not at first find the same neutral
current signal. The HPWF group published evidence for neutral currents in Benvenuti,
A. et al., Phys. Rev. Lett. 32, 800 (1974).
40. For a survey of the data see Baltay, C. Proceedings of the 19th International Conference on
High Energy Physics, Tokyo, 1978. For theoretical analyses, see Abbott, L. F. and Barnett,
R. M. Phys. Rev. D19, 3230 (1979); Langacker, P., Kim, J. E., Levine, M., Williams, H. H.
and Sidhu, D. P. Neutrino Conference ‘79; and earlier references cited therein.
41. Prescott, C. Y. et al., Phys. Lett. 77B, 347 (1978).
42. Glashow, S. L. and Georgi, H. L. Phys. Rev. Lett. 28, 1494 (1972). Also see Schwinger, J.
Annals of Physics (N. Y.)2, 407 (1957).
43. Glashow, S. L., Iliopoulos, J. and Maiani, L. Phys. Rev. D2, 1285 (1970). This paper was
cited in ref. 37 as providing a possible solution to the problem of strangeness changing
neutral currents. However, at that time I was skeptical about the quark model, so in the
calculations of ref. 37 baryons were incorporated in the theory by taking the protons and
neutrons to form an SU(2) doublet, with strange particles simply ignored.
44. Politzer, H. D. Phys. Rev. Lett. 30, 1346 (1973); Gross, D. J. and Wilczek, F. Phys. Rev.
Lett. 30, 1343 (1973).
45. Energy dependent effective coupling constants were introduced by Gell-Mann, M. and
Low, F. E. Phys. Rev. 95, 1300 (1954).
46. Bloom, E. D. et al., Phys. Rev. Lett. 23, 930 (1969); Breidenbach, M. et al., Phys. Rev.
Lett. 23, 935 (1969).
47. Weinberg, S. Phys. Rev. D8, 605 (1973).
48. Gross, D. J. and Wilczek, F. Phys. Rev. D8, 3633 (1973); Weinberg, S. Phys. Rev. Lett. 31,
494 (1973). A similar idea had been proposed before the discovery of asymptotic freedom
by Fritzsch, H., Gell-Mann, M. and Leutwyler, H. Phys. Lett. 47B, 365 (1973).
49. Greenberg, O. W. Phys. Rev. Lett. 13, 598 (1964); Han, M. Y. and Nambu, Y. Phys. Rev.
139, B1006 (1965); Bardeen, W. A., Fritzsch, H. and Gell-Mann, M. in Scale and Conformal
Symmetry in Hadron Physics, ed. by Gatto, R. (Wiley, 1973), p. 139; etc.
50. ’t Hooft, G. Phys. Rev. Lett. 37, 8 (1976).
51. Such “dynamical” mechanisms for spontaneous symmetry breaking were first discussed
by Nambu, Y. and Jona-Lasinio, G. Phys. Rev. 122, 345 (1961); Schwinger, J. Phys. Rev.
125, 397 (1962); 128, 2425 (1962); and in the context of modern gauge theories by
Jackiw, R. and Johnson, K. Phys. Rev. D8, 2386 (1973); Cornwall, J. M. and Norton, R. E.
Phys. Rev. D8, 3338 (1973). The implications of dynamical symmetry breaking have been
considered by Weinberg, S. Phys. Rev. D13, 974 (1976); D19, 1277 (1979); Susskind, L.
Phys. Rev. D20, 2619 (1979).
52. Weinberg, S. ref. 51. The possibility of pseudo-Goldstone bosons was originally noted in a
different context by Weinberg, S. Phys. Rev. Lett. 29, 1698 (1972).
53. Weinberg, S. ref. 51. Models involving such interactions have also been discussed by
Susskind, L. ref. 51.
54. Weinberg, S. Phys. Rev. 135, B1049 (1964).
55. Weinberg, S. Phys. Lett. 9, 357 (1964); Phys. Rev. B138, 988 (1965); Lectures in Particles
and Field Theory, ed. by Deser, S. and Ford, K. (Prentice-Hall, 1965), p. 988; and ref. 54.
The program of deriving general relativity from quantum mechanics and special relativ-
ity was completed by Boulware, D. and Deser, S. Ann. Phys. 89, 173 (1975). I understand
that similar ideas were developed by Feynman, R. in unpublished lectures at Cal. Tech.
56. Georgi, H., Quinn, H. and Weinberg, S. Phys. Rev. Lett. 33, 451 (1974).
57. An example of a simple gauge group for weak and electromagnetic interactions (for
which sin²θ = 1/4) was given by S. Weinberg, Phys. Rev. D5, 1962 (1972). There are a
number of specific models of weak, electromagnetic, and strong interactions based on
simple gauge groups, including those of Pati, J. C. and Salam, A. Phys. Rev. D10, 275
(1974); Georgi, H. and Glashow, S. L. Phys. Rev. Lett. 32, 438 (1974); Georgi, H. in
Particles and Fields (American Institute of Physics, 1975); Fritzsch, H. and Minkowski, P.
Ann. Phys. 93, 193 (1975); Georgi, H. and Nanopoulos, D. V. Phys. Lett. 82B, 392
(1979); Gürsey, F. Ramond, P. and Sikivie, P. Phys. Lett. B60, 177 (1975); Gürsey, F. and
Sikivie, P. Phys. Rev. Lett. 36, 775 (1976); Ramond, P. Nucl. Phys. B110, 214 (1976); etc.;
all these violate baryon and lepton conservation, because they have quarks and leptons in
the same multiplet; see Pati, J. C. and Salam, A. Phys. Rev. Lett. 31, 661 (1973); Phys.
Rev. D8, 1240 (1973).
58. Buras, A., Ellis, J., Gaillard, M. K. and Nanopoulos, D. V. Nucl. Phys. B135, 66 (1978);
Ross, D. Nucl. Phys. B140, 1 (1978); Marciano, W. J. Phys. Rev. D20, 274 (1979);
Goldman, T. and Ross, D. CALT 68-704, to be published; Jarlskog, C. and Yndurain, F.
J. CERN preprint, to be published; Machacek, M. Harvard preprint HUTP-79/AO21, to
be published in Nuclear Physics; Weinberg, S. paper in preparation. The phenomenology
of nucleon decay has been discussed in general terms by Weinberg, S. Phys. Rev.
Lett. 43, 1566 (1979); Wilczek, F. and Zee, A. Phys. Rev. Lett. 43, 1571 (1979).
59. Gildener, E. and Weinberg, S. Phys. Rev. D13, 3333 (1976); Weinberg, S. Phys. Letters
82B, 387 (1979). In general there should exist at least one scalar particle with physical
mass of order 10 GeV. The spontaneous symmetry breaking in models with zero bare
scalar mass was first considered by Coleman, S. and Weinberg, E., Phys. Rev. D7, 1888
(1973).
60. This problem has been studied recently by Dimopoulos, S. and Susskind, L. Nucl. Phys.
B155, 237 (1979); Eichten, E. and Lane, K. Physics Letters, to be published; Weinberg, S.
unpublished.
61. Weinberg, S. in General Relativity - An Einstein Centenary Survey, ed. by Hawking, S. W.
and Israel, W. (Cambridge Univ. Press, 1979), Chapter 16.
UTTG-12-05

Living in the Multiverse

Opening Talk at the Symposium “Expectations of a Final Theory” at
Trinity College, Cambridge, September 2, 2005; to be published in
Universe or Multiverse?, ed. B. Carr (Cambridge University Press).
arXiv:hep-th/0511037v1 3 Nov 2005

Steven Weinberg
Physics Department, University of Texas at Austin

Most advances in the history of science have been marked by discoveries
about nature, but at certain turning points we have made discoveries about
science itself. These discoveries lead to changes in how we score our work,
in what we consider to be an acceptable theory.
For an example look back to a discovery made just one hundred years
ago. As you recall, before 1905 there had been numerous unsuccessful ef-
forts to detect changes in the speed of light due to the motion of the earth
through the ether. Attempts were made by Fitzgerald, Lorentz, and others
to construct a mathematical model of the electron (which was then con-
ceived to be the chief constituent of all matter), that would explain how
rulers contract when moving through the ether in just the right way to keep
the apparent speed of light unchanged. Einstein instead offered a symmetry
principle, which stated that not just the speed of light but all the laws of
nature are unaffected by a transformation to a frame of reference in uniform
motion. Lorentz grumbled that Einstein was simply assuming what he and
others had been trying to prove. But history was on Einstein’s side. The
1905 Special Theory of Relativity was the beginning of a general acceptance
of symmetry principles as a valid basis for physical theories.
This was how Special Relativity made a change in science itself. From
one point of view, Special Relativity was no big thing — it just amounted
to the replacement of one 10 parameter spacetime symmetry group, the
Galileo group, with another 10 parameter group, the Lorentz group. But
never before had a symmetry principle been taken as a legitimate hypothesis
on which to base a physical theory.
As usually happens with this sort of revolution, Einstein’s advance came
with a retreat in another direction: The effort to construct a model of
the electron was suspended for decades. Instead, symmetry principles in-
creasingly became the dominant foundation for physical theories. This ten-
dency was accelerated after the advent of quantum mechanics in the 1920s,
because the survival of symmetry principles in quantum theories imposes
highly restrictive consistency conditions (existence of antiparticles, connec-
tion between spin and statistics, cancelation of infinities and anomalies) on
physically acceptable theories. Our present Standard Model of elementary
particle interactions can be regarded as simply the consequence of certain
gauge symmetries and the associated quantum mechanical consistency con-
ditions.
The development of the Standard Model did not involve any changes
in our conception of what was acceptable as a basis for physical theories.
Indeed, the Standard Model can be regarded as just quantum electrodynam-
ics writ large. Similarly, when the effort to extend the Standard Model to
include gravity led to widespread interest in string theory, we expected to
score the success or failure of this theory in the same way as for the Stan-
dard Model: String theory would be a success if its symmetry principles and
consistency conditions led to a successful prediction of the free parameters
of the Standard Model.
Now we may be at a new turning point, a radical change in what we
accept as a legitimate foundation for a physical theory. The current excitement
is of course a consequence of the discovery of a vast number
of solutions of string theory, beginning in 2000 with the work of Bousso
and Polchinski [1]. The compactified six dimensions in Type II string theories
typically have a large number (tens or hundreds) of topological fixtures (3-
cycles), each of which can be threaded by a variety of fluxes. The logarithm
of the number of allowed sets of values of these fluxes is proportional to the
number of topological fixtures. Further, for each set of fluxes one obtains a
different effective field theory for the modular parameters that describe the
compactified 6-manifold, and for each effective field theory the number of
local minima of the potential for these parameters is again proportional to
the number of topological fixtures. Each local minimum corresponds to the
vacuum of a possible stable or metastable universe.
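The statement that the logarithm of the number of flux choices grows in proportion to the number of topological fixtures can be seen in a toy count. This is only a sketch under a strong simplifying assumption, namely that each cycle independently carries one of the same fixed number of flux values; the function names and the sample numbers are illustrative and are not taken from the string theory literature.

```python
import math


def count_flux_choices(values_per_flux, num_cycles):
    """Ways to assign one of `values_per_flux` values to each of `num_cycles` cycles."""
    return values_per_flux ** num_cycles


def log10_flux_choices(values_per_flux, num_cycles):
    """Base-10 logarithm of the count; linear in the number of cycles."""
    return num_cycles * math.log10(values_per_flux)


if __name__ == "__main__":
    for cycles in (100, 200, 500):
        exponent = log10_flux_choices(10, cycles)
        print(f"{cycles} cycles, 10 values each: about 10^{exponent:.0f} choices")
```

In this toy picture doubling the number of cycles doubles the logarithm of the count, so a few hundred cycles with a handful of flux values each already give numbers far beyond anything that could be enumerated.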
Subsequent work by Giddings, Kachru, Kallosh, Linde, Maloney, Polchinski,
Silverstein, Strominger, and Trivedi (in various combinations [2]) established
the existence of a large number of vacua with positive energy densities.
Ashok and Douglas [3] estimated the number of these vacua to be of order 10^100
to 10^500. Susskind [4] gave the name “string landscape” to this multiplicity
of vacua, taking the term from biochemistry, where the possible choices of
orientation of each chemical bond in large molecules lead to a vast number
of possible configurations. Unless one can find a reason to reject all but a
few of the string theory vacua, we will have to accept that much of what we
had hoped to calculate are environmental parameters, like the distance of
the earth from the sun, whose values we will never be able to deduce from
first principles.

[1] R. Bousso and J. Polchinski, JHEP 0006, 006 (2000).
[2] S. B. Giddings, S. Kachru, and J. Polchinski, Phys. Rev. D66, 106006 (2002); A.
Maloney, E. Silverstein, and A. Strominger, hep-th/0205316; S. Kachru, R. Kallosh, A.
D. Linde, and S. P. Trivedi, Phys. Rev. D68, 046005 (2003).

We lose some, and win some. The larger the number of possible values
of physical parameters provided by the string landscape, the more string
theory legitimates anthropic reasoning as a new basis for physical theories:
Any scientists who study nature must live in a part of the landscape where
physical parameters take values suitable for the appearance of life and its
evolution into scientists.
An apparently successful example of anthropic reasoning was already at
hand by the time the string landscape was discovered. For decades there
seemed to be something peculiar about the value of the vacuum energy ρV .
Quantum fluctuations in known fields at well-understood energies (say, less
than 100 GeV) give a value of ρV larger than observationally allowed by a
factor 10^56. This contribution to the vacuum energy might be canceled by
quantum fluctuations of higher energy, or by simply including a suitable cos-
mological constant term in the Einstein field equations, but the cancelation
would have to be exact to 56 decimal places. No symmetry argument or ad-
justment mechanism could be found that would explain such a cancelation.
Even if such an explanation could be found, there would be no reason to
suppose that the remaining net vacuum energy would be comparable to the
present value of the matter density, and since it is certainly not very much
larger, it was natural to suppose that it is very much less, too small to be
detected.
On the other hand, if ρV takes a broad range of values in the multiverse,
then it is natural for scientists to find themselves in a subuniverse in which
ρV takes a value suitable for the appearance of scientists. I pointed out in
1987 that this value for ρV can’t be too large and positive, because then
3
S. K. Ashok and M. Douglas, JHEP 0401, 060 (2004).
4
L. Susskind, hep-th/0302219

galaxies and stars would not form.5 Roughly, this limit is that ρV should
be less than the mass density of the universe at the time when galaxies first
condense. Since this was in the past, when the mass density was larger than
at present, the anthropic upper limit on the vacuum energy density is larger
than the present mass density, but not many orders of magnitude greater.
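As a rough sketch of this bound, assume that the mass density is dominated by non-relativistic matter, so that it dilutes as the inverse cube of the scale factor. Write $\rho_{M0}$ for the present mass density and $z_{\rm gal}$ for the redshift at which galaxies first condense (the symbol $z_{\rm gal}$ is introduced here for illustration and is not used in the text):

\[
\rho_M(z) = \rho_{M0}\,(1+z)^3 , \qquad
\rho_V \lesssim \rho_M(z_{\rm gal}) = \rho_{M0}\,(1+z_{\rm gal})^3 .
\]

For illustrative redshifts between about 4 and 10 (values assumed here, not taken from the text), the factor $(1+z_{\rm gal})^3$ lies between roughly $10^2$ and $10^3$, which is the sense in which the limit is larger than the present mass density but not by many orders of magnitude.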
But anthropic arguments provide not just a bound on ρV ; they give
us some idea of the value to be expected: ρV should be not very different
from the mean of the values suitable for life. This is what Vilenkin6 calls
the “principle of mediocrity.” This mean is positive, because if ρV were
negative it would have to be less in absolute value than the mass density
of the universe during the whole time that life evolves, since otherwise the
universe would collapse before any astronomers come on the scene,7 while
if positive ρV only has to be less than the mass density of the universe at
the time when most galaxies form, giving a much broader range of possible
positive than negative values. In 1997-8 Martel, Shapiro, and I8 carried out
a detailed calculation of the probability distribution of values of ρV seen
by astronomers throughout the multiverse, under the assumption that the
a priori probability distribution is flat in the relatively very narrow range
that is anthropically allowed. At that time the value of the primordial rms
fractional density fluctuation σ was not well known, since the value inferred
from observations of the cosmic microwave background depended on what
one assumed for ρV . It was therefore not possible to calculate a mean
expected value of ρV , but for any assumed value of ρV we could estimate σ
and use the result to calculate the fraction of astronomers that would observe
a value of ρV as small as the assumed value. In this way we concluded
that if ΩΛ turned out to be much less than 0.6, anthropic reasoning could
not explain why it was so small. The editor of the Astrophysical Journal
objected to publishing papers about anthropic calculations, and we had to
sell our article by pointing out that we had provided a strong argument for
abandoning an anthropic explanation of a small value of ρV , if it turned out
to be too small.
Of course, it turned out that ρV is not too small. Soon after this work,
5
S. Weinberg, Phys. Rev. Lett. 59, 2607 (1987).
6
A. Vilenkin, Phys. Rev. Lett. 74, 846 (1995)
7
J. D. Barrow and F. J. Tipler, The Anthropic Cosmological Principle (Clarendon,
Oxford, 1986).
8
H. Martel, P. Shapiro, and S. Weinberg, Astrophys. J. 492, 29 (1998). For earlier
calculations, see G. Efstathiou, Mon. Not. Roy. Astron. Soc. 274, L73 (1995); S.
Weinberg, in Critical Dialogues in Cosmology, ed. N. Turok (World Scientific, 1997).

observations of type Ia supernovae revealed that the expansion of the uni-
verse is accelerating,9 and gave the result that ΩV ≃ 0.7. In other words the
ratio of the vacuum energy density to the present mass density ρM0 in our
subuniverse (which I use just as a convenient measure of density) is about
2.3, a conclusion subsequently confirmed by observations of the microwave
background.10
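The step from the density parameter to this ratio can be spelled out under one assumption that is not stated explicitly here: that the universe is spatially flat and that matter and vacuum energy dominate today, so that $\Omega_M + \Omega_V \simeq 1$. Then

\[
\frac{\rho_V}{\rho_{M0}} = \frac{\Omega_V}{\Omega_M} \simeq \frac{\Omega_V}{1-\Omega_V} \simeq \frac{0.7}{0.3} \simeq 2.3 .
\]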
This is still a bit low. Martel, Shapiro, and I had found that the probabil-
ity of a vacuum energy density this small was 12%. I have now recalculated
the probability distribution, using WMAP data and a better transfer func-
tion, with the result that the probability of a random astronomer seeing a
value as small as 2.3ρM0 is increased to 15.6%. Now that we know σ, we
can also calculate that the median vacuum energy density is 13.3ρM0.
I should mention a complication in these calculations. The average of
the product of density fluctuations at different points becomes infinite as
these points approach each other, so the rms fractional density fluctuation
σ is actually infinite. Fortunately, it is not σ itself that is really needed
in these calculations, but the rms fractional density fluctuation averaged
over a sphere of co-moving radius R taken large enough so that the density
fluctuation is able to hold on efficiently to the heavy elements produced in the
first generation of stars. The results mentioned above were calculated for R
(projected to the present) equal to 2 Mpc. These results are rather sensitive
to the value of R; for R = 1 Mpc, the probability of finding a vacuum energy
as small as 2.3ρM0 is only 7.2%. The estimate of the required value of R
involves complicated astrophysics, and needs to be better understood.
Now I want to take up four problems we have to face in working out the
anthropic implications of the string landscape.

I. What is the shape of the string landscape?


Douglas11 and Dine12 and their co-workers have taken the first steps in
finding the statistical rules governing different string vacua. I can’t comment
usefully on this, except to say that it wouldn’t hurt in this work if we knew
what string theory is.
9
A. G. Riess et al., Astron. J. 116, 1009 (1998); S. Perlmutter et al., Astrophys. J.
517, 565 (1999).
10
WMAP collaboration, Astrophys. J. Suppl. 148 (2003).
11
M. R. Douglas, hep-ph/0401004; Compt. Rend. Phys. 5, 965 (2004).
12
M. Dine, D. O’Neil and Z. Sun, JHEP 0507, 014 (2005); M. Dine and Z. Sun, hep-
th/0506246.

II. What constants scan?
Anthropic reasoning makes sense for a given constant if the range over
which the constant varies in the landscape is large compared with the an-
thropically allowed range of values of the constant, for then it is reasonable
to assume that the a priori probability distribution is flat in the anthrop-
ically allowed range. We need to know what constants actually “scan” in
this sense. Physicists would like to be able to calculate as much as possible,
so we hope that not too many constants scan.
The most optimistic hypothesis is that the only constants that scan are
the few whose dimensionality is a positive power of mass: the vacuum energy,
and whatever scalar mass or masses set the scale of electroweak symmetry
breaking. With all other parameters of the Standard Model fixed, the scale
of electroweak symmetry breaking is bounded by about 1.4 to 2.7 times its
value in our subuniverse, by the condition that the pion mass should be small
enough to make the nuclear force strong enough to keep the deuteron stable
against fission.13 (The condition that the deuteron be stable against beta
decay, which yields a tighter bound, does not seem to me to be necessary.
Even a beta-unstable deuteron would live long enough to allow cosmological
helium synthesis; helium would be burned to heavy elements in the first
generation of very massive stars; and then subsequent generations could have
long lifetimes burning hydrogen through the carbon cycle.) But the mere
fact that the electroweak symmetry breaking scale is only a few orders of
magnitude larger than the QCD scale should not in itself lead us to conclude
that it must be anthropically fixed. There is always the possibility that the
electroweak symmetry breaking scale is determined by the energy at which
some gauge coupling constant becomes strong, and if that coupling happens
to grow with decreasing energy a little faster than the QCD coupling then
the electroweak breaking scale will naturally be a few orders of magnitude
larger than the QCD scale.
If the electroweak symmetry breaking scale is anthropically fixed, then
we can give up the decades long search for a natural solution of the hierarchy
problem. This is a very attractive prospect, because none of the “natural”
solutions that have been proposed, such as technicolor or low energy su-
persymmetry, were ever free of difficulties. In particular, giving up low
energy supersymmetry can restore some of the most attractive features of
the non-supersymmetric standard model: automatic conservation of baryon
and lepton number in interactions up to dimension 5 and 4, respectively;
13
V.Agrawal, S. M. Barr, J. F. Donoghue, and D. Seckel, Phys. Rev. D 57, 5480 (1998).

natural conservation of flavors in neutral currents; and a small neutron elec-
tric dipole moment. Arkani-Hamed and Dimopoulos14 have even shown how
it is possible to keep the good features of supersymmetry, such as a more
accurate convergence of the SU (3) × SU (2) × U (1) couplings to a single
value, and the presence of candidates for dark matter WIMPs. The idea
of this “split supersymmetry” is that, although supersymmetry is broken
at some very high energy, the gauginos and higgsinos are kept light by a
chiral symmetry. [An additional discrete symmetry is needed to prevent
lepton-number violation in higgsino-lepton mixing, and to keep the lightest
supersymmetric particle stable.] One of the nice things about split super-
symmetry is that, unlike many of the things we talk about these days, it
makes predictions that can be checked when the LHC starts operation. One
expects a single neutral Higgs with a mass in the range 120 to 165 GeV,
possible winos and binos but no squarks or sleptons, and a long-lived gluino.
(Incidentally, a Stanford group15 has recently used considerations of big bang
nucleosynthesis to argue that a 1 TeV gluino must have a lifetime less than
100 seconds, indicating a supersymmetry breaking scale less than $10^{10}$ GeV,
which might create problems for proton stability. But I wonder whether,
even if the gluino has a longer lifetime and decays after nucleosynthesis, the
universe might not thereby be reheated above the temperature of helium
dissociation, giving big bang nucleosynthesis a second chance to produce
the observed helium abundance.)
What about the dimensionless Yukawa couplings of the Standard Model?
Hogan16 has analyzed the anthropic constraints on these couplings, with the
electroweak symmetry breaking scale and the sum of the u and d Yukawa
couplings held fixed, to avoid complications due to the dependence of nuclear
forces on the pion mass. He imposes the conditions that (1) $m_d - m_u - m_e > 1.2$ MeV,
so that the early universe doesn’t become all neutrons; (2)
$m_d - m_u + m_e < 3.4$ MeV, so that the pp reaction is exothermic, and (3)
$m_e > 0$. With three conditions on the two parameters $m_u - m_d$ and $m_e$, he
naturally finds these parameters are limited to a finite region, which turns
out to be quite small. At first sight, this gives the impression that the quark
and lepton Yukawa couplings are subject to stringent anthropic constraints,
14
N. Arkani-Hamed and S. Dimopoulos, JHEP 0506, 073 (2005). Also see G. F. Giudice
and A. Romanino, Nucl. Phys. B 699, 65 (2004); N. Arkani-Hamed, S. Dimopoulos, G. F.
Giudice, and A. Romanino, Nucl. Phys. B 709, 3 (2005); A. Delgado and G. F. Giudice,
hep-ph/0506217.
15
A. Arvanitaki, C. Davis, P. W. Graham, A. Pierce, and J. G. Wacker, hep-ph/0504210.
16
C. Hogan, Rev. Mod. Phys. 72, 1149 (2000); and astro-ph/0407086.

in which case we might infer that the Yukawa couplings probably scan.
I have two reservations about this conclusion. The first reservation is
that the pp reaction is not necessary for life. For one thing, the pep reaction
$p + p + e^- \to d + \nu$ can keep stars burning hydrogen for a long time. For this,
we do not need $m_d - m_u + m_e < 3.4$ MeV, but only the weaker condition
$m_d - m_u - m_e < 3.4$ MeV. The three conditions then do not constrain
$m_d - m_u$ and $m_e$ separately to any finite region, but only constrain the
single parameter $m_d - m_u - m_e$ to lie between 1.2 MeV and 3.4 MeV, not
a very tight anthropic constraint. (In fact, He$^4$ will be stable as long as
$m_d - m_u - m_e$ is less than about 13 MeV, so stellar nucleosynthesis can
begin with helium burning in the heavy stars of Population III, followed
by hydrogen burning in later generations of stars.) My second reservation
is that the anthropic constraints on the Yukawa couplings are alleviated if
we suppose (as discussed above) that the electroweak symmetry breaking
scale is not fixed, but free to take whatever value is anthropically necessary.
For instance, according to the results of reference 13, the deuteron binding
energy could be made as large as about 3.5 MeV by taking the electroweak
breaking scale much less than it is in our universe, in which case even the
condition that the pp reaction be exothermic becomes much looser.
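The difference between the two sets of mass conditions discussed above can be made concrete with a small check. The sketch below encodes the inequalities exactly as quoted in the text (masses in MeV); the function names and the sample points are hypothetical, chosen only to illustrate that the weaker pep condition leaves an unbounded strip in the plane of the quark mass difference and the electron mass, where the pp condition leaves a bounded region.

```python
def hogan_allowed(md_minus_mu, me):
    """Three conditions quoted in the text, requiring an exothermic pp reaction.

    md_minus_mu: down-minus-up quark mass difference in MeV.
    me: electron mass in MeV.
    """
    return (md_minus_mu - me > 1.2) and (md_minus_mu + me < 3.4) and (me > 0)


def pep_allowed(md_minus_mu, me):
    """Weaker conditions when only the pep reaction is required.

    Only the combination md_minus_mu - me is bounded (plus me > 0).
    """
    return (1.2 < md_minus_mu - me < 3.4) and (me > 0)


if __name__ == "__main__":
    # Hypothetical sample points (MeV), not measured values.
    for point in [(2.5, 0.5), (6.0, 3.5), (1.0, 0.5)]:
        print(point, hogan_allowed(*point), pep_allowed(*point))
```

A point such as (6.0, 3.5) fails the pp conditions but passes the pep conditions, because both masses can grow together while their difference stays inside the allowed window.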
Incidentally, I don’t set much store by the famous “coincidence” emphasized
by Hoyle, that there is an excited state of C$^{12}$ with just the right
energy to allow carbon production via α–Be$^8$ reactions in stars. We know
that even–even nuclei have states that are well described as composites of
α-particles. One such state is the ground state of Be$^8$, which is unstable
against fission into two alpha particles. The same α-α potential that
produces that sort of unstable state in Be$^8$ could naturally be expected to
produce an unstable state in C$^{12}$ that is essentially a composite of three α
particles, and that therefore appears as a low-energy resonance in α–Be$^8$
reactions. So the existence of this state doesn’t seem to me to provide any
evidence of fine tuning.
What else scans? Tegmark and Rees17 have raised the question whether
the rms density fluctuation σ may itself scan. If it does, then the anthropic
constraint on the vacuum energy becomes weaker, resuscitating to some
extent the problem of why ρV is so small. But Garriga and Vilenkin18 have
pointed out that it is really $\rho_V/\sigma^3$ that is constrained anthropically, so that
even if σ does scan the anthropic prediction of this ratio remains robust.
17
M. Tegmark and M. J. Rees, Astrophys. J. 499, 526 (1998).
18
J. Garriga and A. Vilenkin, hep-th/0508005.

Arkani-Hamed, Dimopoulos, and Kachru19 have offered a possible rea-
son to suppose that most constants do not scan. If there are a large number
N of decoupled modular fields, each taking a few possible values, then the
probability distribution of quantities that depend on all these fields will be
sharply peaked, with a width proportional to $1/\sqrt{N}$. According to Distler
and Varadarajan,20 it is not really necessary here to make arbitrary assump-
tions about the decoupling of the various scalar fields; it is enough to adopt
the most general polynomial superpotential that is stable, in the sense that
radiative corrections do not change the effective couplings for large N by
amounts larger than the couplings themselves. Distler and Varadarajan em-
phasize cubic superpotentials, because polynomial superpotentials of order
higher than cubic presumably make no physical sense. But it is not clear
that even cubic superpotentials can be plausible approximations, or that
peaks will occur at reasonable values in the distribution of dimensionless
couplings rather than of some combinations of these couplings.21 It also
is not clear that the multiplicity of vacua in this kind of effective scalar
field theory can properly represent the multiplicity of flux values in string
theories,22 but even if not, it presumably can represent the variety of minima
of the potential for a given set of flux vacua.
If most constants do not effectively scan, then why should anthropic ar-
guments work for the vacuum energy and the electroweak breaking scale?
ADK point out that, even if some constant has a relatively narrow distribu-
tion, anthropic arguments will still apply if the anthropically allowed range is
even narrower and near a point around which the distribution is symmetric.
(ADK suppose that this point would be at zero, but this is not necessary.)
This is the case, for instance, for the vacuum energy if the superpotential $W$
is the sum of the superpotentials $W_n$ for a large number of decoupled scalar
fields, for each of which there is a separate broken R symmetry, so that the
possible values of each $W_n$ are equal and opposite. The probability distribution
of the total superpotential $W = \sum_{n=1}^{N} W_n$ will then be a Gaussian
peaked at $W = 0$ with a width proportional to $1/\sqrt{N}$, and the probability
distribution of the supersymmetric vacuum energy $-8\pi G|W|^2$ will extend
over a correspondingly narrow range of negative values, with a maximum
at zero. When supersymmetry breaking is taken into account, the proba-
19
N. Arkani-Hamed, S. Dimopoulos, and S. Kachru, hep-th/0501082, referred to below
as ADK.
20
J. Distler and U. Varadarajan, hep-th/0507090.
21
M. Douglas, private communication.
22
T. Banks, hep-th/0011255.

bility distribution widens to include positive values of the vacuum energy,
extending out to a positive value depending on the scale of supersymmetry
breaking. For any reasonable supersymmetry breaking scale, this probability
distribution, though narrow compared with the Planck scale, will be very
wide compared with the very narrow anthropically allowed range around
ρV = 0, so within this range the probability distribution can be expected
to be flat, and anthropic arguments should work. Similar remarks apply to
the µ-term of the supersymmetric Standard Model, which sets the scale of
electroweak symmetry breaking.
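The narrowing with the number of fields invoked above is the familiar central-limit behavior, and it can be checked numerically. The following is a minimal sketch under simplifying assumptions made here for illustration: each of the decoupled fields contributes one of two equal and opposite values, +1 or -1 in arbitrary units, with equal probability, and the width is measured relative to the largest possible magnitude of the sum. The function name and parameters are hypothetical.

```python
import random
import statistics


def relative_width(n_fields, n_samples=2000, seed=0):
    """Estimate the spread of a sum of n_fields random +/-1 contributions.

    The sum is divided by n_fields (its maximum possible magnitude), so the
    returned standard deviation should be close to 1/sqrt(n_fields).
    """
    rng = random.Random(seed)
    normalized_sums = []
    for _ in range(n_samples):
        total = sum(rng.choice((-1.0, 1.0)) for _ in range(n_fields))
        normalized_sums.append(total / n_fields)
    return statistics.pstdev(normalized_sums)


if __name__ == "__main__":
    for n in (25, 100, 400):
        # Compare the sampled width with the expected 1/sqrt(N) scaling.
        print(n, relative_width(n), 1 / n ** 0.5)
```

Quadrupling the number of fields should roughly halve the printed width, which is the scaling the argument relies on. Nothing in this toy model addresses whether the fields in a realistic landscape are in fact decoupled.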

III. How should we calculate anthropically conditioned probabilities?
We would expect the anthropically conditioned probability distribution
for a given value of any constant that scans to be proportional to the num-
ber of scientific civilizations that observe that value. In the calculations
described above, Martel, Shapiro, and I took this number to be propor-
tional to the fraction of baryons that find themselves in galaxies, but what
if the total number of baryons itself scans? What if it is infinite?
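One way to write the prescription used in those calculations, in notation introduced here for illustration ($\mathcal{P}_{\rm prior}$ for the a priori distribution and $F(\rho_V)$ for the fraction of baryons that find themselves in galaxies), is

\[
d\mathcal{P}(\rho_V) \;\propto\; \mathcal{P}_{\rm prior}(\rho_V)\, F(\rho_V)\, d\rho_V
\;\approx\; \text{const} \times F(\rho_V)\, d\rho_V ,
\]

where the second form assumes that the a priori distribution is flat across the anthropically allowed range. The proportionality hides the total number of baryons as a common factor. If that number itself depends on the constant that scans, or is infinite, the weight is no longer simply $F(\rho_V)$, which is the difficulty raised by the questions above.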

IV. How is the landscape populated?


There are at least four ways in which we might imagine the different
“universes” described by the string landscape actually to exist:

1. The various subuniverses may be simply different regions of space.
This is most simply realized in the chaotic inflation theory.23 The
scalar fields in different inflating patches may take different values,
giving rise to different values for various effective coupling constants.
Indeed, Linde speculated about the application of the anthropic prin-
ciple to cosmology soon after the proposal of chaotic inflation.24

2. The subuniverses may be different eras of time in a single big bang. For
instance, what appear to be constants of nature might actually depend
on scalar fields that change very slowly as the universe expands.25
23
A. D. Linde, Phys. Lett. 129B, 177 (1983); A. Vilenkin, Phys. Rev. D 27, 2848
(1983); A. D. Linde, Phys. Lett. B 175, 305 (1986); Phys. Scripta T15, 100 (1987);
Phys. Lett. B202, 194 (1988).
24
A. D. Linde, in The Very Early Universe, ed. G. W. Gibbons, S. W. Hawking, and
S. Siklos (Cambridge University Press, 1983); Rept. Progr. Phys. 47, 925 (1984).
25
T. Banks, Nucl. Phys. B 249, 332 (1985).

3. The subuniverses may be different regions of spacetime. This can
happen if, instead of changing smoothly with time, various scalar fields
on which the “constants” of nature depend change in a sequence of
first-order phase transitions.26 In these transitions metastable bubbles
form within a region of higher vacuum energy; then within each bubble
there form further bubbles of even lower vacuum energy; and so on.
In recent years this idea has been revived in the context of the string
landscape.27

4. The subuniverses could be different parts of quantum mechanical Hilbert
space. In a reinterpretation of Hawking’s earlier work on the wave
function of the universe,28 Coleman29 showed that certain topological
fixtures known as wormholes in the path integral for the Euclidean
wave function of the universe would lead to a superposition of wave
functions in which any coupling constant not constrained by symmetry
principles would take any possible value. Ooguri, Vafa, and Verlinde30
have argued for a particular wave function of the universe, but it es-
capes me how anyone can tell whether this or any other proposed wave
function is the wave function of the universe.

These alternatives are by no means mutually exclusive. In particular, it
seems to me that, whatever one concludes about alternatives 1, 2, and 3,
26
L. Abbott, Phys. Lett. B150, 427 (1985); J. D. Brown and C. Teitelboim, Phys.
Lett. B 195, 177 (1987); Nucl. Phys. B 297, 787 (1987).
27
R. Bousso and J. Polchinski, op. cit.; J. L. Feng, J. March-Russell, S. Sethi, and F.
Wilczek, Nucl. Phys. B 602, 307 (2001); H. Firouzjahi, S. Sarangi, and S.-H. Henry
Tye, JHEP 0409, 060 (2004); B. Freivogel, M. Kleban, M. R. Martinez, and L. Susskind,
hep-th/0505232.
28
S. W. Hawking, Nucl. Phys. B 239, 257 (1984); and in Relativity, Groups, and
Topology II, NATO Advanced Study Institute Session XL, Les Houches, 1983, ed. B.S.
DeWitt and R. Stora (Elsevier, Amsterdam, 1984): p. 336. Some of this work is based
on an initial condition for the origin of the universe proposed by J. Hartle and S. W.
Hawking, Phys. Rev. D 28, 2960 (1983).
29
S. Coleman, Nucl. Phys. B 307, 867 (1988). It has been argued that the wave
function of the universe is sharply peaked at values of the constants that yield a zero
vacuum energy at late times, by S. W. Hawking, in Shelter Island II — Proceedings of the
1983 Shelter Island Conference on Quantum Field Theory and the Fundamental Problems
of Physics, ed. R. Jackiw et al. (MIT Press, Cambridge, 1985); Phys. Lett. B 134, 403
(1984); E. Baum, Phys. Lett. B 133, 185 (1984); S. Coleman, Nucl. Phys. B 310, 643
(1985). This view has been challenged; see W. Fischler, I. Klebanov, J. Polchinski, and
L. Susskind, Nucl. Phys. B 237, 157 (1989). I am assuming here that there are no such
peaks.
30
H. Ooguri, C. Vafa, and E. Verlinde, hep-th/0502211.

we will still have the possibility that the wave function of the universe is
a superposition of different terms representing different ways of populating
the landscape in space and/or time.

In closing, I would like to comment about the impact of anthropic reasoning
within and beyond the physics community. Some physicists have
expressed a strong distaste for anthropic arguments. (I have heard David
Gross say “I hate it.”) This is understandable. Theories based on anthropic
calculation certainly represent a retreat from what we had hoped for: the
calculation of all fundamental parameters from first principles. It is too soon
to give up on this hope, but without loving it we may just have to resign
ourselves to a retreat, just as Newton had to give up Kepler’s hope of a
calculation of the relative sizes of planetary orbits from first principles.
There is also a less creditable reason for hostility to the idea of a multi-
verse, based on the fact that we will never be able to observe any subuni-
verses except our own. Livio and Rees31 and Tegmark32 have given thor-
ough discussions of various other ingredients of accepted theories that we
will never be able to observe, without our being led to reject these theories.
The test of a physical theory is not that everything in it should be observable
and every prediction it makes should be testable, but rather that enough
is observable and enough predictions are testable to give us confidence that
the theory is right.
Finally, I have heard the objection that, in trying to explain why the
laws of nature are so well suited for the appearance and evolution of life,
anthropic arguments take on some of the flavor of religion. I think that
just the opposite is the case. Just as Darwin and Wallace explained how
the wonderful adaptations of living forms could arise without supernatural
intervention, so the string landscape may explain how the constants of nature
that we observe can take values suitable for life without being fine-tuned by
a benevolent creator. I found this parallel well understood in a surprising
place, a New York Times op-ed article by Christoph Schönborn, Cardinal
Archbishop of Vienna.33 His article concludes as follows:

“Now, at the beginning of the 21st century, faced with scientific
claims like neo-Darwinism and the multiverse hypothesis in
cosmology invented to avoid the overwhelming evidence for pur-
31
M. Livio and M. J. Rees, Science 309, 1022 (12 August, 2003).
32
M. Tegmark, Ann. Phys. 270, 1 (1998).
33
C. Schönborn, N. Y. Times, 7 July 2005, p. A23.

pose and design found in modern science, the Catholic Church
will again defend human nature by proclaiming that the imma-
nent design evident in nature is real. Scientific theories that try
to explain away the appearance of design as the result of ‘chance
and necessity’ are not scientific at all, but, as John Paul put it,
an abdication of human intelligence.”

It’s nice to see work in cosmology get some of the attention given these days
to evolution, but of course it is not religious preconceptions like these that
can decide any issues in science.
It must be acknowledged that there is a big difference in the degree
of confidence we can have in neo-Darwinism and in the multiverse. It is
settled, as well as anything in science is ever settled, that the adaptations
of living things on earth have come into being through natural selection
acting on random undirected inheritable variations. About the multiverse,
it is appropriate to keep an open mind, and opinions among scientists differ
widely. In the Austin airport on the way to this meeting I noticed for sale
the October issue of a magazine called Astronomy, having on the cover the
headline “Why You Live in Multiple Universes.” Inside I found a report of
a discussion at a conference at Stanford, at which Martin Rees said that
he was sufficiently confident about the multiverse to bet his dog’s life on
it, while Andrei Linde said he would bet his own life. As for me, I have
just enough confidence about the multiverse to bet the lives of both Andrei
Linde and Martin Rees’s dog.


This material is based upon work supported by the National Science
Foundation under Grants Nos. PHY-0071512 and PHY-0455649 and with
support from The Robert A. Welch Foundation, Grant No. F-0014, and
also grant support from the US Navy, Office of Naval Research, Grant Nos.
N00014-03-1-0639 and N00014-04-1-0336, Quantum Optics Initiative.


concepts

Scientist

Four golden lessons

Advice to students at the start of their scientific careers.

Steven Weinberg

When I received my undergraduate degree — about a hundred years ago — the
physics literature seemed to me a vast, unexplored ocean, every part of
which I had to chart before beginning any research of my own. How could I
do anything without knowing everything that had already been done?
Fortunately, in my first year of graduate school, I had the good luck to
fall into the hands of senior physicists who insisted, over my anxious
objections, that I must start doing research, and pick up what I needed to
know as I went along. It was sink or swim. To my surprise, I found that
this works. I managed to get a quick PhD — though when I got it I knew
almost nothing about physics. But I did learn one big thing: that no one
knows everything, and you don’t have to.

Another lesson to be learned, to continue using my oceanographic metaphor,
is that while you are swimming and not sinking you should aim for rough
water. When I was teaching at the Massachusetts Institute of Technology in
the late 1960s, a student told me that he wanted to go into general
relativity rather than the area I was working on, elementary particle
physics, because the principles of the former were well known, while the
latter seemed like a mess to him. It struck me that he had just given a
perfectly good reason for doing the opposite. Particle physics was an area
where creative work could still be done. It really was a mess in the 1960s,
but since that time the work of many theoretical and experimental
physicists has been able to sort it out, and put everything (well, almost
everything) together in a beautiful theory known as the standard model. My
advice is to go for the messes — that’s where the action is.

My third piece of advice is probably the hardest to take. It is to forgive
yourself for wasting time. Students are only asked to solve problems that
their professors (unless unusually cruel) know to be solvable. In addition,
it doesn’t matter if the problems are scientifically important — they have
to be solved to pass the course. But in the real world, it’s very hard to
know which problems are important, and you never know whether at a given
moment in history a problem is solvable. At the beginning of the twentieth
century, several leading physicists, including Lorentz and Abraham, were
trying to work out a theory of the electron. This was partly in order to
understand why all attempts to detect effects of Earth’s motion through the
ether had failed. We now know that they were working on the wrong problem.
At that time, no one could have developed a successful theory of the
electron, because quantum mechanics had not yet been discovered. It took
the genius of Albert Einstein in 1905 to realize that the right problem on
which to work was the effect of motion on measurements of space and time.
This led him to the special theory of relativity. As you will never be sure
which are the right problems to work on, most of the time that you spend in
the laboratory or at your desk will be wasted. If you want to be creative,
then you will have to get used to spending most of your time not being
creative, to being becalmed on the ocean of scientific knowledge.

Finally, learn something about the history of science, or at a minimum the
history of your own branch of science. The least important reason for this
is that the history may actually be of some use to you in your own
scientific work. For instance, now and then scientists are hampered by
believing one of the over-simplified models of science that have been
proposed by philosophers from Francis Bacon to Thomas Kuhn and Karl Popper.
The best antidote to the philosophy of science is a knowledge of the
history of science.

More importantly, the history of science can make your work seem more
worthwhile to you. As a scientist, you’re probably not going to get rich.
Your friends and relatives probably won’t understand what you’re doing. And
if you work in a field like elementary particle physics, you won’t even
have the satisfaction of doing something that is immediately useful. But
you can get great satisfaction by recognizing that your work in science is
a part of history.

Look back 100 years, to 1903. How important is it now who was Prime
Minister of Great Britain in 1903, or President of the United States? What
stands out as really important is that at McGill University, Ernest
Rutherford and Frederick Soddy were working out the nature of
radioactivity. This work (of course!) had practical applications, but much
more important were its cultural implications. The understanding of
radioactivity allowed physicists to explain how the Sun and Earth’s cores
could still be hot after millions of years. In this way, it removed the
last scientific objection to what many geologists and paleontologists
thought was the great age of the Earth and the Sun. After this, Christians
and Jews either had to give up belief in the literal truth of the Bible or
resign themselves to intellectual irrelevance. This was just one step in a
sequence of steps from Galileo through Newton and Darwin to the present
that, time after time, has weakened the hold of religious dogmatism.
Reading any newspaper nowadays is enough to show you that this work is not
yet complete. But it is civilizing work, of which scientists are able to
feel proud. ■

[Figure: Dive right in: exploring the unclear, uncharted areas of science
can lead to creative work. Credit: USA/NETWORKS/KOBAL COLLECTION/R. FOREMAN]

Steven Weinberg is in the Department of Physics, the University of Texas at
Austin, Texas 78712, USA. This essay is based on a commencement talk given
by the author at the Science Convocation at McGill University in June 2003.

NATURE | VOL 426 | 27 NOVEMBER 2003 | www.nature.com/nature 389


©2003 Nature Publishing Group
Physica 96A (1979) 327-340 © North-Holland Publishing Co.

PHENOMENOLOGICAL LAGRANGIANS*

STEVEN WEINBERG
Lyman Laboratory of Physics, Harvard University

and

Harvard-Smithsonian Center for Astrophysics, Cambridge, Massachusetts 02138, USA

1. Introduction: A reminiscence

Julian Schwinger's ideas have strongly influenced my understanding of
phenomenological Lagrangians since 1966, when I made a visit to Harvard.
At that time, I was trying to construct a phenomenological Lagrangian
which would allow one to obtain the predictions of current algebra for soft
pion matrix elements with less work, and with more insight into possible
corrections. It was necessary to arrange that the pion couplings in the
Lagrangian would all be derivative interactions, to suppress the incalculable
graphs in which soft pions would be emitted from internal lines of a
hard-particle process. The mathematical approach I followed 1) at first was quite
clumsy; I started with the old $\sigma$-model 2), in which the pion is in a chiral
quartet with a $0^+$ isoscalar $\sigma$; then performed a space-time dependent chiral
rotation which transformed $\{\pi, \sigma\}$ everywhere into $\{0, \sigma'\}$ with
$\sigma' = (\sigma^2 + \pi^2)^{1/2}$; and then re-introduced the pion field as the
chiral rotation "angle". The Lagrangian obtained in this way had a complicated
and unfamiliar non-linear structure, but it did have the desired property of
derivative coupling, because any space-time independent part of the rotation
"angle" would correspond to a symmetry of the theory, and so would not contribute
to the Lagrangian.
Schwinger suggested to me that one might be able to construct a suitable
phenomenological Lagrangian directly, by introducing a pion field which from
the beginning would have the non-linear transformation property of chiral
rotation angles, and then just obeying the dictates of chiral symmetry for such
a pion field 3). Following this suggestion, I worked out a general theory 4) of
non-linear realizations of chiral SU(2) × SU(2), which was soon after
generalized to arbitrary groups in elegant papers of Callan, Coleman, Wess, and
Zumino 5), and has since been applied by many authors 6). The importance of
the approach suggested by Schwinger has been not only that it saves the work
involved in the transition from an ordinary linear representation like
$\{\pi, \sigma\}$ to a non-linear realization, but more important, that it makes
clear that the interactions of other hadrons with soft pions do not in any way
depend on the chiral transformation properties of whatever fields are associated
with these hadrons, but only on their isospin.

* Research supported in part by the National Science Foundation under Grant No.
PHY77-22864.
In the decade since 1967, Schwinger's ideas have evolved into what he calls
"source theory" 7). I have been pretty much out of touch with this work,
mostly because of an involvement with other lines of research, but perhaps
also because I found Schwinger's conceptual framework unfamiliar. Recently,
several problems have led me to think again about the use of phenomenological
Lagrangians, and I find that my ideas have shifted somewhat, to a point
of view that seems to me to be now not too different from the point of view of
source theory.
To summarize: section 2 presents an argument that phenomenological
Lagrangians can be used not only to reproduce the soft pion results of current
algebra, but also to justify these results, without any use of operator algebra.
Section 3 shows how phenomenological Lagrangians can be used to calculate
corrections to the leading soft pion results to any desired order in external
momenta. In section 4, the renormalization group is used to elucidate the
structure of these corrections. Corrections due to the finite mass of the pion
are treated in section 5. Section 6 offers speculations about a possible other
application of phenomenological Lagrangians.
This article is intended as a review; I doubt that any of the material
presented here is entirely new. In particular, although I have not tried here to
judge the extent to which the ideas described below overlap those of source
theory, I would not be surprised to find that these are points which long ago
appeared in Schwinger's work. In that case, I hope that he will take this paper
as a little work of translation into the Vulgate, offered as a birthday present to
an old friend.

2. Current algebra without current algebra

It is well known that matrix elements for soft-pion interactions can be
obtained by "current algebra", that is, by a direct use of the commutation and
conservation relations of the vector and axial-vector currents of chiral
SU(2) × SU(2). It is also well known that the same matrix elements may also
be calculated (usually more easily) from the tree graphs in an SU(2) × SU(2)-
invariant phenomenological Lagrangian. However, it has been widely
supposed 8) that the ultimate justification of the results obtained from a
phenomenological Lagrangian rests on the foundation of current algebra.
According to this method of derivation, one must first use current algebra
to show that soft-pion matrix elements are uniquely determined by the
properties of the currents plus certain "smoothness" properties of the matrix
elements. One then reflects that any chiral-invariant Lagrangian will have
currents with the assumed properties, and that the tree graphs in such a
theory will have the assumed smoothness properties. It follows that the
matrix elements computed from these tree graphs must automatically
reproduce the results of current algebra.
I would like to show in this section that the use of current algebra in the
above line of reasoning is actually unnecessary. That is, the phenomenological
Lagrangians themselves can be used to justify the calculation of soft-pion
matrix elements from the tree graphs, without any use of operator algebra.
This remark is based on a "theorem", which as far as I know has never
been proven, but which I cannot imagine could be wrong. The "theorem" says
that although individual quantum field theories have of course a good deal of
content, quantum field theory itself has no content beyond analyticity, uni-
tarity, cluster decomposition, and symmetry. This can be put more precisely
in the context of perturbation theory: if one writes down the most general
possible Lagrangian, including all terms consistent with assumed symmetry
principles, and then calculates matrix elements with this Lagrangian to any
given order of perturbation theory, the result will simply be the most general
possible S-matrix consistent with analyticity, perturbative unitarity, cluster
decomposition and the assumed symmetry principles. As I said, this has not
been proved, but any counterexamples would be of great interest, and I do
not know of any.
With this "theorem", one can obtain and justify the results of current
algebra simply by writing down the most general Lagrangian consistent with
the assumed symmetry principles, and then deriving low energy theorems by
a direct study of the Feynman graphs, without operator algebra. However, in
order for this to be a derivation and not merely a mnemonic, it is necessary to
include all possible terms in the Lagrangian, and take account of graphs of all
orders in perturbation theory.
To illustrate this procedure, let us consider a theory of massless pions,
governed by a chiral SU(2) × SU(2) symmetry, for which the pions serve as
Goldstone bosons. For simplicity, we will "integrate out" whatever other
degrees of freedom may be present (nucleons, $\rho$ mesons, $\sigma$ mesons, etc.)
and consider only the pions. The Lagrangian will be SU(2) × SU(2)-invariant
provided it conserves isospin, and is constructed only from a chiral-covariant
derivative of the pion field, which by a suitable definition of the pion field may
be taken in the form 4)

$$D_\mu \pi = \frac{\partial_\mu \pi}{1 + \pi^2}. \qquad (1)$$


The most general such Lagrangian is an infinite series of operators of higher
and higher dimensionality 9)

$$\mathcal{L} = -\tfrac{1}{2} g_2\, D_\mu\pi \cdot D^\mu\pi - \tfrac{1}{4} g_4^{(1)} (D_\mu\pi \cdot D^\mu\pi)^2 - \tfrac{1}{4} g_4^{(2)} (D_\mu\pi \cdot D_\nu\pi)(D^\mu\pi \cdot D^\nu\pi) + \cdots, \qquad (2)$$

where $g_d^{(n)}$ are constants of dimensionality [mass]$^{4-d}$. Since the field
$\pi$ is dimensionless, d is just equal to the number of derivatives in the
interaction. As is well known, the constant $g_2$ is related to the pion decay
amplitude $F_\pi \simeq 190$ MeV by

$$g_2 = F_\pi^2, \qquad (3)$$

but it will be more convenient here to treat $g_2$ in parallel with the other
couplings.
According to the "theorem" quoted at the beginning of this section, such a
general Lagrangian has no specific dynamical content beyond the general
principles of analyticity, unitarity, cluster decomposition, Lorentz invariance,
and chirality, so that when it is used to calculate pionic S-matrix elements, it
yields the most general matrix elements consistent with these general
principles, provided that all terms of all orders in all couplings $g_2$,
$g_4^{(1)}$, $g_4^{(2)}$, etc., are included. One does not need the methods of
current algebra for justification here; the Lagrangian (2) is so general that
the only conclusions that can be drawn from it are just those which follow from
the general principles with which we started.
All this becomes of practical value in the calculation of matrix elements for
pions of low energy. Consider the matrix element for a process involving $N_e$
external pion lines, carrying energies proportional to some energy scale E.
Such a matrix element will have dimensionality [mass]$^{D_1}$, where

$$D_1 = 4 - N_e. \qquad (4)$$

The coupling constants contributing to a given term in such a matrix element will
altogether have dimensionality [mass]$^{D_2}$, where

$$D_2 = \sum_d N_d (4 - d) - 2N_i - N_e. \qquad (5)$$

Here $N_d$ is the number of vertices formed from interactions with d derivatives,
and $N_i$ is the total number of internal pion lines. (The terms $-2N_i$ and
$-N_e$ appear here because the pion field $\pi$ has an unconventional
normalization, with a propagator proportional to $1/g_2$ and external line "wave
functions" proportional to $1/\sqrt{g_2}$. We could have used a conventionally
normalized pion field $\sqrt{g_2}\,\pi$, in which case the propagators and
external lines would not contribute factors involving $g_2$, but such factors
would instead be contributed by the pion fields in the interactions. The final
answer is of course the same.) Ultraviolet divergences are to be absorbed into a
renormalization of the infinite number of coupling parameters, defined at
renormalization points with momenta proportional to some common renormalization
energy scale $\mu$. The only quantities with non-vanishing dimensionality are the
common energy scale E, the common renormalization scale $\mu$, and the coupling
constants themselves, so each term in the matrix element must take the form

$$M = E^{D} f(E/\mu) \qquad (6)$$

with

$$D = D_1 - D_2 = 4 + \sum_d N_d (d - 4) + 2N_i. \qquad (7)$$

This can be conveniently re-written by using the well-known formula for the
number of loops in a graph

$$N_L = N_i - \sum_d N_d + 1. \qquad (8)$$

We find then

$$D = 2 + \sum_d N_d (d - 2) + 2N_L. \qquad (9)$$
Now suppose that the characteristic pion energy E is very small, and take
the renormalization scale $\mu$ to be of order E. From (6), we see that the
dominant graphs will then be those with the smallest values for the exponent
D. According to eq. (9), these are just the tree graphs (i.e., $N_L = 0$) formed
purely from the term in the Lagrangian with the lowest possible number d of
derivatives, the d = 2 term

$$\mathcal{L}_2 = -\tfrac{1}{2} g_2\, D_\mu\pi \cdot D^\mu\pi. \qquad (10)$$

Thus without using the methods of current algebra, we arrive at the same
conclusion, that matrix elements for soft pion processes may be calculated
from the effective Lagrangian (10), keeping only tree graphs.
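
The counting rule of eqs. (7)-(9) is easy to check on small examples. The following is a minimal sketch in Python, not part of the original paper; the function names and the dictionary encoding of a graph (number of derivatives d mapped to the number of vertices N_d) are assumptions made here for illustration.

```python
from typing import Dict


def loop_count(vertices: Dict[int, int], internal_lines: int) -> int:
    """Eq. (8): N_L = N_i - sum_d N_d + 1 for a connected graph.

    `vertices` maps the number of derivatives d to the number N_d of
    vertices with that many derivatives (a hypothetical encoding).
    """
    return internal_lines - sum(vertices.values()) + 1


def energy_power_from_lines(vertices: Dict[int, int], internal_lines: int) -> int:
    """Eq. (7): D = 4 + sum_d N_d (d - 4) + 2 N_i."""
    return 4 + sum(n * (d - 4) for d, n in vertices.items()) + 2 * internal_lines


def energy_power(vertices: Dict[int, int], loops: int) -> int:
    """Eq. (9): D = 2 + sum_d N_d (d - 2) + 2 N_L."""
    return 2 + sum(n * (d - 2) for d, n in vertices.items()) + 2 * loops


# Pion-pion scattering examples.
graphs = {
    "tree, one d=2 vertex": ({2: 1}, 0),
    "one loop, two d=2 vertices": ({2: 2}, 2),
    "tree, one d=4 vertex": ({4: 1}, 0),
}
for name, (verts, n_internal) in graphs.items():
    n_loops = loop_count(verts, n_internal)
    # Eqs. (7) and (9) must agree once eq. (8) is used.
    assert energy_power(verts, n_loops) == energy_power_from_lines(verts, n_internal)
    print(name, "-> D =", energy_power(verts, n_loops))
```

In this sketch the tree graph built from the d = 2 term has the smallest exponent, while the one-loop graph and the tree graph with one d = 4 vertex enter together at the next order, which is the pattern used in section 3.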

3. Corrections to soft-pion results

The real virtue of the phenomenological Lagrangian approach described in
the preceding section is not that it provides an alternative derivation of a
known result, but that it allows us in a systematic way to calculate corrections
to this result.
For example, suppose that we want to calculate the matrix element for
pion-pion scattering with Mandelstam variables s, t, u all of the same order of
magnitude, and all very small. As we have seen, eq. (9) tells us that the
leading term (which here is of order s) can be calculated using the d = 2 term
(10) in the tree approximation. This gives the known matrix element 10)

$$M^{(1)}_{abcd} = 4 g_2^{-1} \left[ \delta_{ab}\delta_{cd}\, s + \delta_{ac}\delta_{bd}\, t + \delta_{ad}\delta_{bc}\, u \right]. \qquad (11)$$

In the notation used here, a, b, c, d are isovector indices associated with pion
lines having four-momenta $p_A$, $p_B$, $p_C$, $p_D$ respectively and
$s = -(p_A + p_B)^2$, $t = -(p_A - p_C)^2$, $u = -(p_A - p_D)^2$. We here set the
pion masses equal to zero, so s + t + u vanishes. Also, the pion-pion scattering
matrix element M is normalized so that the S-matrix element is

$$S = i(2\pi)^4 \delta^4(p_A + p_B - p_C - p_D)\, M\, (2\pi)^{-6} (16 E_A E_B E_C E_D)^{-1/2}.$$


Now, suppose that we want also to calculate the corrections of order
$s^2 \approx E^4$. Equation (9) tells us that these will arise from graphs in which
there are any number of couplings (10) with d = 2, and either one vertex with
d = 4 or one loop. A straightforward calculation gives the order-$s^2$ corrections
to M as:

$$M^{(2)}_{abcd} = \frac{\delta_{ab}\delta_{cd}}{g_2^2} \Big[ -\frac{1}{2\pi^2}\, s^2 \ln(-s) - \frac{1}{12\pi^2}\,(u^2 - s^2 + 3t^2) \ln(-t) - \frac{1}{12\pi^2}\,(t^2 - s^2 + 3u^2) \ln(-u) + \frac{1}{\pi^2}\left(\tfrac{1}{3}s^2 + \tfrac{1}{3}t^2 + \tfrac{1}{3}u^2\right) \ln\Lambda^2 - \tfrac{1}{2}\, g_4^{(1)} s^2 - \tfrac{1}{4}\, g_4^{(2)} (t^2 + u^2) \Big] + \text{crossed terms}. \qquad (12)$$

Here $\Lambda$ is an ultraviolet cut-off, and "crossed terms" denotes terms given by
the interchanges $b \leftrightarrow c$ and $s \leftrightarrow t$, or
$b \leftrightarrow d$ and $s \leftrightarrow u$. The divergence may be
eliminated by defining renormalized coupling constants

$$g_4^{(1)}(\mu) \equiv g_4^{(1)} - \frac{2}{3\pi^2} \ln\left(\frac{\Lambda^2}{\mu^2}\right), \qquad (13)$$

$$g_4^{(2)}(\mu) \equiv g_4^{(2)} - \frac{4}{3\pi^2} \ln\left(\frac{\Lambda^2}{\mu^2}\right), \qquad (14)$$

so that (12) becomes

$$M^{(2)}_{abcd} = \frac{\delta_{ab}\delta_{cd}}{g_2^2} \Big[ -\frac{1}{2\pi^2}\, s^2 \ln\left(\frac{-s}{\mu^2}\right) - \frac{1}{12\pi^2}\,(u^2 - s^2 + 3t^2) \ln\left(\frac{-t}{\mu^2}\right) - \frac{1}{12\pi^2}\,(t^2 - s^2 + 3u^2) \ln\left(\frac{-u}{\mu^2}\right) - \tfrac{1}{2}\, g_4^{(1)}(\mu)\, s^2 - \tfrac{1}{4}\, g_4^{(2)}(\mu)\,(t^2 + u^2) \Big] + \text{crossed terms}. \qquad (15)$$

It is true that (15) has a polynomial part with unknown coefficients, but it is
far from an empty formula: the logarithmic terms have coefficients given by
eq. (15) as definite functions of g2. It is not surprising that chiral symmetry
should have consequences of this sort, for the logarithmic branch points arise
from intermediate states consisting of soft pion pairs 16), and the matrix elements
for producing and absorbing these soft pions are determined by chiral
symmetry. What is noteworthy is that the coefficients of the logarithmic terms
can be calculated in detail so easily, by a one-loop calculation using a suitable
phenomenological Lagrangian.
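
As a cross-check on the logarithmic terms, the polynomial prefactors can be simplified with the massless-pion condition s + t + u = 0. The short worked calculation below is a sketch added for illustration; it assumes the logarithmic coefficients $1/2\pi^2$, $1/12\pi^2$ and $1/12\pi^2$ for the s-, t- and u-channel logarithms, and is not taken from the original text.

```latex
% Assumption: s + t + u = 0 (massless pions), so s = -(t + u).
\begin{align*}
u^2 - s^2 + 3t^2 &= u^2 - (t+u)^2 + 3t^2 = 2t\,(t - u), \\
t^2 - s^2 + 3u^2 &= t^2 - (t+u)^2 + 3u^2 = 2u\,(u - t), \\
% Sum of the coefficients multiplying the three logarithms (in units of 1/\pi^2):
\tfrac{1}{2}s^2 + \tfrac{1}{12}(u^2 - s^2 + 3t^2) + \tfrac{1}{12}(t^2 - s^2 + 3u^2)
  &= \tfrac{1}{3}\left(s^2 + t^2 + u^2\right).
\end{align*}
```

The last line is the combination that must multiply the logarithm of the cut-off (or of the renormalization scale) on dimensional grounds, since the arguments of the logarithms in s, t and u carry dimensions of mass squared.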

4. Application of the renormalization group

Gell-Mann and Low showed long ago 11) how renormalization group techniques
could be used to gain information about the perturbation series for
quantum electrodynamics. Without having to calculate Feynman graphs, they
were able to show that the photon propagator $\Delta(s)$ is linear in log s in second
order, linear in log s in fourth order, and quadratic in log s in sixth order, with
the coefficient of $\log^2 s$ determined by the product of the coefficients of log s
in second and fourth order. In much the same way, we can use renormalization
group techniques to get detailed information about the perturbation
series generated by a non-renormalizable phenomenological Lagrangian.
For illustration, let us consider the matrix element for an arbitrary reaction
among soft pions, but now fix all scattering angles and isospin indices, so that
the scattering amplitude can be written as a function of the center-of-mass
energy $E \equiv \sqrt{s}$ alone. According to eq. (9), terms of order $E^2$ arise
from tree graphs involving only $g_2$; terms of order $E^4$ arise from one-loop
graphs involving only $g_2$ plus tree graphs linear in $g_4^{(1)}(\mu)$ or
$g_4^{(2)}(\mu)$; terms of order $E^6$ arise from two-loop graphs involving only
$g_2$, plus one-loop graphs linear in $g_4^{(1)}$ or $g_4^{(2)}$, plus tree graphs
linear in $g_6^{(n)}$ or quadratic in the $g_4^{(n)}$; and so on.
Hence the matrix element takes the form

$$M = g_2^{1 - N_e/2} \Big\{ E^2 c_2 + \frac{E^4}{g_2} \Big[ F(E/\mu) + \sum_n c_4^{(n)} g_4^{(n)}(\mu) \Big] + \frac{E^6}{g_2^2} \Big[ H(E/\mu) + \sum_n J^{(n)}(E/\mu)\, g_4^{(n)}(\mu) + g_2 \sum_n c_6^{(n)} g_6^{(n)}(\mu) + \sum_{nm} c_6^{(nm)} g_4^{(n)}(\mu)\, g_4^{(m)}(\mu) \Big] + \cdots \Big\}. \qquad (16)$$

Here the c's are dimensionless functions of angle and isospin variables. The
dimensionless functions F, H, J, etc., arise from loop graphs, and depend only
on angle and isospin variables and on the dimensionless ratio $E/\mu$.
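The essential idea of the renormalization group method is to exploit the fact
that the matrix element must be independent of the arbitrary renormalization
scale $\mu$. As applied to eq. (16), this yields the conditions

$$\mu \frac{\partial}{\partial\mu} F(E/\mu) + \sum_n c_4^{(n)}\, \mu \frac{d}{d\mu} g_4^{(n)}(\mu) = 0, \qquad (17)$$

$$\mu \frac{\partial}{\partial\mu} H(E/\mu) + \sum_n \Big[ \mu \frac{\partial}{\partial\mu} J^{(n)}(E/\mu) \Big] g_4^{(n)}(\mu) + \sum_n J^{(n)}(E/\mu)\, \mu \frac{d}{d\mu} g_4^{(n)}(\mu) + g_2 \sum_n c_6^{(n)}\, \mu \frac{d}{d\mu} g_6^{(n)}(\mu) + 2 \sum_{nm} c_6^{(nm)} g_4^{(n)}(\mu)\, \mu \frac{d}{d\mu} g_4^{(m)}(\mu) = 0. \qquad (18)$$

Since (17) must hold for arbitrary E and $\mu$, both terms must be constants. For
this to be true for all angles and isospins, we must then have

$$\mu \frac{d}{d\mu} g_4^{(n)}(\mu) = b_4^{(n)}, \qquad (19)$$

$$\mu \frac{\partial}{\partial\mu} F(E/\mu) = -\sum_n c_4^{(n)} b_4^{(n)}, \qquad (20)$$

with $b_4^{(n)}$ constants that are independent of all angle or isospin variables.
Thus the terms of order $E^4$ can at most contain a single logarithm

$$F(E/\mu) = \sum_n c_4^{(n)} b_4^{(n)} \ln(E/\mu) + \text{constant}. \qquad (21)$$

To determine the $b_4^{(n)}$ we can compare our previous result (13), (14) with the
solution of eq. (19)

$$g_4^{(n)}(\mu) = b_4^{(n)} \ln\left(\frac{\mu}{\mu_n}\right) \qquad (22)$$

and find

$$b_4^{(1)} = \frac{4}{3\pi^2}, \qquad b_4^{(2)} = \frac{8}{3\pi^2}. \qquad (23)$$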

Of course, eq. (21) holds with the same values of $b_4^{(n)}$ for all processes, not
just $\pi$-$\pi$ scattering.
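
The statement that the order-$E^4$ amplitude does not depend on the renormalization scale can be checked numerically. The sketch below is illustrative only: it assumes the logarithmic coefficients $1/2\pi^2$ and $1/12\pi^2$ and the values $b_4^{(1)} = 4/3\pi^2$, $b_4^{(2)} = 8/3\pi^2$, writes the running couplings relative to a hypothetical reference scale `mu_ref`, and evaluates only the bracket multiplying $\delta_{ab}\delta_{cd}/g_2^2$. The function names and the numerical inputs are made up for the example.

```python
import cmath
import math

# Assumed one-loop coefficients b_4^(1) and b_4^(2).
B41 = 4.0 / (3.0 * math.pi ** 2)
B42 = 8.0 / (3.0 * math.pi ** 2)


def running_coupling(g_ref: float, b: float, mu: float, mu_ref: float) -> float:
    """Solution of mu dg/dmu = b, fixed by its value g_ref at mu_ref."""
    return g_ref + b * math.log(mu / mu_ref)


def bracket(s: float, t: float, u: float, mu: float,
            g41_ref: float, g42_ref: float, mu_ref: float = 1.0) -> complex:
    """Bracket of the order-E^4 amplitude, evaluated at scale mu."""
    g41 = running_coupling(g41_ref, B41, mu, mu_ref)
    g42 = running_coupling(g42_ref, B42, mu, mu_ref)
    pi2 = math.pi ** 2
    # cmath.log handles the negative arguments that occur above threshold.
    logs = (-(s ** 2) / (2 * pi2) * cmath.log(-s / mu ** 2)
            - (u ** 2 - s ** 2 + 3 * t ** 2) / (12 * pi2) * cmath.log(-t / mu ** 2)
            - (t ** 2 - s ** 2 + 3 * u ** 2) / (12 * pi2) * cmath.log(-u / mu ** 2))
    return logs - 0.5 * g41 * s ** 2 - 0.25 * g42 * (t ** 2 + u ** 2)


# Arbitrary illustrative kinematics with s + t + u = 0.
s, t, u = 1.0, -0.3, -0.7
low = bracket(s, t, u, mu=0.5, g41_ref=0.1, g42_ref=0.2)
high = bracket(s, t, u, mu=3.0, g41_ref=0.1, g42_ref=0.2)
assert abs(low - high) < 1e-12  # the explicit mu-dependence cancels
print(low, high)
```

The change in the explicit logarithms when `mu` is varied is compensated by the change in the two running couplings, which is the content of the condition that the terms of order $E^4$ be independent of the renormalization scale.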
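Returning now to eq. (18), we can insert (22), and find

$$0 = -\frac{E}{\mu} \frac{\partial H}{\partial (E/\mu)} - \sum_n b_4^{(n)} \ln\left(\frac{\mu}{\mu_n}\right) \frac{E}{\mu} \frac{\partial J^{(n)}}{\partial (E/\mu)} + \sum_n J^{(n)}(E/\mu)\, b_4^{(n)} + g_2 \sum_n c_6^{(n)}\, \mu \frac{d}{d\mu} g_6^{(n)}(\mu) + 2 \sum_{nm} c_6^{(nm)} b_4^{(n)} g_4^{(m)}(\mu).$$

Differentiating this with respect to E then yields

$$\frac{E}{\mu} \frac{\partial}{\partial (E/\mu)} \Big[ \frac{E}{\mu} \frac{\partial H}{\partial (E/\mu)} \Big] = -\sum_n b_4^{(n)} \ln\left(\frac{\mu}{\mu_n}\right) \frac{E}{\mu} \frac{\partial}{\partial (E/\mu)} \Big[ \frac{E}{\mu} \frac{\partial J^{(n)}}{\partial (E/\mu)} \Big] + \sum_n b_4^{(n)} \frac{E}{\mu} \frac{\partial J^{(n)}}{\partial (E/\mu)}.$$

Since this must hold for all $\mu$, we can immediately conclude that
$J^{(n)}(E/\mu)$ is linear in $\ln(E/\mu)$ while $H(E/\mu)$ is quadratic in
$\ln(E/\mu)$:

$$H(E/\mu) = h_0 + h_1 \ln(E/\mu) + h_2 \ln^2(E/\mu), \qquad (24)$$

$$J^{(n)}(E/\mu) = j_0^{(n)} + j_1^{(n)} \ln(E/\mu), \qquad (25)$$

with coefficients of leading logarithms related by

$$h_2 = \tfrac{1}{2} \sum_n b_4^{(n)} j_1^{(n)}. \qquad (26)$$

It is truly a pleasure to be able to deduce such detailed information about
multi-loop graphs, without ever having to calculate any of them.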
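
The step from the differentiated condition to the logarithmic forms of H and $J^{(n)}$ can be spelled out as a short worked calculation. This is a sketch, not taken from the original text: the shorthand $L \equiv \ln(E/\mu)$, the integration constants $\mu_n$ in the running couplings, and the treatment of the two couplings as independent are assumptions made here.

```latex
% Shorthand (assumption of this sketch): L = ln(E/mu), so (E/mu) d/d(E/mu) = d/dL.
\begin{align*}
\frac{d^2 H}{dL^2}
  &= -\sum_n b_4^{(n)} \ln\!\left(\frac{\mu}{\mu_n}\right) \frac{d^2 J^{(n)}}{dL^2}
     + \sum_n b_4^{(n)} \frac{dJ^{(n)}}{dL}. \\
% At fixed L the left-hand side does not depend on mu, so the terms
% multiplying ln(mu/mu_n) must vanish (treating the couplings as independent):
\frac{d^2 J^{(n)}}{dL^2} &= 0
  \quad\Longrightarrow\quad J^{(n)} = j_0^{(n)} + j_1^{(n)} L. \\
% What is left is a constant second derivative for H:
\frac{d^2 H}{dL^2} &= \sum_n b_4^{(n)} j_1^{(n)}
  \quad\Longrightarrow\quad
  H = h_0 + h_1 L + h_2 L^2, \qquad
  h_2 = \tfrac{1}{2} \sum_n b_4^{(n)} j_1^{(n)}.
\end{align*}
```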

5. Symmetry breaking

In the real world, chiral symmetry is broken: rather weakly for SU(2) × SU(2),
fairly strongly for SU(3) × SU(3). As a result, the "soft $\pi$" and "soft
K" results of current algebra, which would be precise theorems in the limit of
exact chiral symmetry, become somewhat fuzzy, depending for their interpretation
on a good deal of unsystematic guesswork about the smoothness of
extrapolations off the mass shell. In this section, I wish to show that
phenomenological Lagrangians can serve as the basis of an approach to chiral
symmetry breaking, which has at least the virtue of being entirely systematic.
Quantum chromodynamics tells us that in a world with only light u and d
quark fields, the Lagrangian of the strong interactions takes the form
$\mathcal{L}_0 + \mathcal{L}_1$, where $\mathcal{L}_0$ is invariant under global
SU(2) × SU(2) transformations on the quark fields, and $\mathcal{L}_1$ is in some
sense a small perturbation

$$\mathcal{L}_1 = m_u \bar{u}u + m_d \bar{d}d. \qquad (27)$$

We may write $\mathcal{L}_1$ as the sum of third and fourth components of chiral
four-vectors 15)

$$\mathcal{L}_1 = V_3 + V_4, \qquad (28)$$

$$V_3 = \tfrac{1}{2}(m_u - m_d)(\bar{u}u - \bar{d}d), \qquad (29)$$

$$V_4 = \tfrac{1}{2}(m_u + m_d)(\bar{u}u + \bar{d}d). \qquad (30)$$

Thus the S-matrix takes the form of a sum of terms of kth order in $m_u - m_d$
and lth order in $m_u + m_d$, each term having the chiral transformation property
of a traceless symmetric tensor of rank k + l with k indices equal to 3 and l
indices equal to 4

$$S = \sum_{kl} S^{(kl)}, \qquad S^{(kl)} \propto (m_u - m_d)^k (m_u + m_d)^l. \qquad (31)$$

(Strictly speaking, this is true only with a chiral-invariant infrared cut-off.)
Now, the most general phenomenological Lagrangian which gives an S-matrix
of this form is itself such a sum

$$\mathcal{L}_{\rm EFF} = \sum_{k,l \geq 0} \mathcal{L}^{(kl)}, \qquad (32)$$

$$\mathcal{L}^{(kl)} \propto (m_u - m_d)^k (m_u + m_d)^l. \qquad (33)$$

The S-matrix is therefore to be calculated with such a phenomenological
Lagrangian, with $\mathcal{L}^{(kl)}$ taken as the most general function of hadronic
fields and their derivatives having the chiral transformation property of a
component of a traceless symmetric tensor of rank k + l with k 3-indices and l
4-indices.
To see how this works in detail, let us again restrict ourselves to purely
pionic processes. As is well known 12) (and shown below) the square of the
pion mass is proportional to $m_u + m_d$, so $S^{(kl)}$ and $\mathcal{L}^{(kl)}$ are
of order $m_\pi^{2(k+l)}$. If we calculate a pionic process near threshold, then
all of the characteristic energies E are also of order $m_\pi$. Hence (9) may be
modified to give the number of powers of E and/or $m_\pi$ contributed by any
given graph

$$\mathcal{D} = 2 + \sum_{d,k,l} N_{dkl}\,(d - 2 + 2k + 2l) + 2N_L, \qquad (34)$$

where $N_{dkl}$ is the number of vertices with d derivatives, k powers of
$m_u - m_d$, and l powers of $m_u + m_d$, and $N_L$ is again the number of loops.
If we regard $E \approx m_\pi$ as a small parameter, then the leading graphs are
those with the smallest values of $\mathcal{D}$.
Now, there is no way to make a chiral scalar out of the pion field alone,
with no derivatives, so there is no term in (34) with d = k = l = 0. There is a
single chiral four-vector that can be formed with no derivatives, with
components

$$V_i = (1 + \pi^2)^{-1} \pi_i, \qquad V_4 = -\tfrac{1}{2} + \frac{\pi^2}{1 + \pi^2}. \qquad (35)$$

The third component is a pseudoscalar, and therefore cannot appear in the
strong interaction Lagrangian. However, the fourth component is a scalar, so
it yields an interaction with d = k = 0, l = 1

$$\mathcal{L}^{(01)} = -\tfrac{1}{2} m_\pi^2\, g_2\, (1 + \pi^2)^{-1} \pi^2. \qquad (36)$$

The coupling constant is fixed here by the condition that the square of the
canonically normalized field $\pi' \equiv g_2^{1/2} \pi$ should have coefficient
$-m_\pi^2/2$, and a constant term has been discarded. Comparison of (36) with (33)
shows as already mentioned that $m_\pi^2 \propto m_u + m_d$.
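
To see how the normalization condition works and where the quartic pion vertex from the symmetry-breaking term comes from, it helps to expand this interaction in powers of the pion field. The expansion below is an illustrative sketch added here, assuming the form $-\tfrac{1}{2} m_\pi^2 g_2 (1+\pi^2)^{-1}\pi^2$ for the interaction and the rescaled field $\pi' = g_2^{1/2}\pi$.

```latex
% Assumed starting point: L^(01) = -(1/2) m_pi^2 g_2 pi^2 / (1 + pi^2).
\begin{align*}
\mathcal{L}^{(01)}
  &= -\tfrac{1}{2} m_\pi^2\, g_2 \left[ \pi^2 - (\pi^2)^2 + (\pi^2)^3 - \cdots \right] \\
  &= -\tfrac{1}{2} m_\pi^2\, \pi'^2
     + \frac{m_\pi^2}{2 g_2}\, (\pi'^2)^2 - \cdots ,
     \qquad \pi' \equiv g_2^{1/2}\, \pi .
\end{align*}
```

The first term is an ordinary mass term for the canonically normalized field, and the second is the quartic vertex, suppressed by one power of $g_2$, that contributes alongside the quartic part of the d = 2 term.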
According to eq. (34), the leading terms in the S-matrix are given by the
sum of all tree graphs ($N_L = 0$) constructed from any number of vertices (10)
with k = l = 0, d = 2 and any number of vertices (36) with k = d = 0, l = 1. For
pion-pion scattering, the tree graphs consist of single vertices, formed from
the terms in (10) or (36) that are quartic in the pion field. This yields precisely
the formulas for the $\pi$-$\pi$ scattering lengths previously derived by operator
methods 10).
As before, phenomenological Lagrangians really come into their own in
calculating corrections to the leading terms. Equation (34) shows that the
leading corrections arise from one-loop graphs constructed from any number
of vertices (10), (36), plus tree graphs constructed from any number of
vertices (10), (36) and one vertex with d + 2k + 2l = 4. These latter vertices
may be formed from functions of the pion field and its derivatives, of three
different kinds:
(a) d = 4, k + l = 0. These are just the chiral scalars with four derivatives
appearing in eq. (2).
(b) d = 2, k + l = 1. This is the chiral four-vector formed by multiplying the
four-vector (35) with the scalar (10)

$$U_i = (1 + \pi^2)^{-3}\, \pi_i\, \partial_\mu \pi_j\, \partial^\mu \pi_j, \qquad (37)$$

$$U_4 = -\tfrac{1}{2} (1 + \pi^2)^{-3} (1 - \pi^2)\, \partial_\mu \pi_j\, \partial^\mu \pi_j. \qquad (38)$$

(c) d = 0, k + l = 2. This is the chiral tensor formed from the direct product
of the four-vector (35) with itself

$$T_{ij} = (1 + \pi^2)^{-2}\, \pi_i \pi_j, \qquad (39)$$

$$T_{i4} = -\tfrac{1}{2} (1 + \pi^2)^{-2} (1 - \pi^2)\, \pi_i, \qquad (40)$$

$$T_{44} = \tfrac{1}{4} (1 + \pi^2)^{-2} (1 - \pi^2)^2. \qquad (41)$$

The only operators here of positive parity are the chiral scalars (a), plus the
components $U_4$, $T_{ij}$, and $T_{44}$. Hence the new terms in the effective
Lagrangian which are needed to calculate the leading corrections to the tree
approximation results for soft pion processes are

$$\alpha\, m_\pi^2 (1 + \pi^2)^{-3} (1 - \pi^2)\, \partial_\mu \pi \cdot \partial^\mu \pi + \beta\, m_\pi^4 (1 + \pi^2)^{-2}\, \pi_3^2 + \gamma\, m_\pi^4 (1 + \pi^2)^{-2} (1 - \pi^2)^2, \qquad (42)$$

where $\alpha$, $\beta$, and $\gamma$ are dimensionless constants of order unity, with

$$\beta/\gamma = O\left( (m_u - m_d)^2 / (m_u + m_d)^2 \right). \qquad (43)$$

Current algebra calculations 12) of the $K^+$-$K^0$ mass difference yield the quark
mass ratio $m_d/m_u = 1.8$, so the right-hand side of (43) is 0.08. It is interesting
that the non-degeneracy of d and u introduces a rather large violation of
isotopic spin conservation in the leading corrections to the usual soft pion
results.
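
The quoted size of the isospin-violating ratio follows from simple arithmetic. The worked line below is added for illustration and assumes only the quoted mass ratio $m_d/m_u = 1.8$; the symbol r is introduced here for convenience.

```latex
% Assumption: r = m_d / m_u = 1.8.
\[
\frac{(m_u - m_d)^2}{(m_u + m_d)^2}
  = \left( \frac{r - 1}{r + 1} \right)^2
  = \left( \frac{0.8}{2.8} \right)^2
  \approx 0.082 .
\]
```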
In calculating terms of a given order in E and/or $m_\pi$, we must include
graphs with arbitrary numbers of vertices formed from the interactions (10)
and (36). For the most part, the number of vertices that can actually occur is
limited by the topology of the graphs; for instance for $\pi$-$\pi$ scattering the
tree graphs have just a single quartic vertex, the one-loop graphs have two quartic
vertices, and so on. However, there is never any limit on the number of times
the quadratic part $-\tfrac{1}{2} m_\pi^2 g_2 \pi^2$ of eq. (36) can appear. To put
this another and more convenient way, we must calculate all tree and loop graphs
using a pion propagator $g_2^{-1} (q^2 + m_\pi^2)^{-1}$, and not expand these
propagators in powers of $m_\pi$. Thus in addition to an over-all power of $m_\pi$,
the matrix elements we calculate will have singularities in $m_\pi$ as well as E.
These singularities are of course just what is required by perturbative
unitarity 13).
There is not so much need today for refinements in the theory of pion-pion
scattering. On the other hand, there is a wide variety of experimentally
interesting processes where corrections to soft $\pi$ or soft K theorems may be
important, including $\pi N \to 2\pi N$, $K \to 2\pi$, $K \to 3\pi$,
$K \to \pi\mu\nu$, $\eta \to 3\pi$, etc. The approach outlined in this section
may serve as a basis for a systematic treatment of all these processes.

6. Phenomenological Lagrangians and QCD

Handy as they are, the phenomenological Lagrangians that we have been
using are only phenomenological. This is brought home to us as we calculate
graphs to higher and higher order in the pion energy: in each successive
order, we encounter more and more unknown parameters. Beneath the
phenomenological Lagrangian of the soft pions there must lie a more nearly
fundamental quantum field theory of strong interactions, which fixes all the
free parameters of our phenomenological Lagrangians.
It now appears increasingly likely that this underlying theory is the
renormalizable gauge theory known as quantum chromodynamics. By virtue of its
asymptotic freedom, QCD predicts that the strong interactions should
become weak at high energies in a certain definite way. The weakness of the
interactions allows one to carry out perturbative calculations of certain
quantities at high energy, and the results so far are in agreement with
experiment.
However, as the energy becomes smaller, the strong interactions in QCD
become stronger, and perturbation theory becomes no longer applicable. This
of course is just what we want: the richness of the hadron spectrum and the
absence of free quarks or gluons show clearly that perturbation theory had
better not work at all energies. But then how do we do calculations of strong
interactions at low energies?
In non-relativistic potential theory, there are well-known methods of
solving the problem of calculating scattering amplitudes for potentials that are
too strong to allow the use of perturbation theory. One of these methods of
solution is the "quasiparticle" approach 14). In this method, one introduces
fictitious elementary particles into the theory, in rough correspondence with
the bound states (or more precisely, the eigenvalues of the scattering kernel)
of the theory. In order not to change the physics, one must at the same time
change the potential. Since the bound states of the original theory are now
introduced as elementary particles, the modified potential must not produce
them also as bound states. Hence the modified potential is weaker, and can in
fact be weak enough to allow the use of perturbation theory.
Following this lead, one might imagine weakening the forces of QCD by
introducing some sort of infrared cut-off $\Lambda$, and preserving the physical
content of the theory by introducing the bound states of the theory as
fictitious elementary particles. These bound states are just the ordinary
hadrons, and they must be described by a chiral-invariant phenomenological
Lagrangian. The parameters of the phenomenological Lagrangian would have
to be functions of $\Lambda$, defined by differential equations which guarantee the
$\Lambda$-independence of the S-matrix, with initial condition set by the requirement
that the theory goes over to pure QCD in the limit $\Lambda \to 0$, where there is no
infrared cut-off. The hope would be that at low energy, one could continue the
solution of these equations to a value of $\Lambda$ large enough to allow the use of
perturbation theory.
It remains to be seen whether this program can be successfully carried
through.

References

1) S. Weinberg, Phys. Rev. Lett. 18 (1967) 188.
2) J. Schwinger, Ann. Phys. (N.Y.) 2 (1957) 407.
M. Gell-Mann and M. Levy, Nuovo Cimento 16 (1960) 705.
3) This approach was followed by J. Schwinger, Phys. Lett. 24B (1967) 473.
4) S. Weinberg, Phys. Rev. 166 (1968) 1568.
5) S. Coleman, J. Wess and B. Zumino, Phys. Rev. 177 (1968) 2239.
C. Callan, S. Coleman, J. Wess and B. Zumino, Phys. Rev. 177 (1968) 2247.
6) For reviews, see B.W. Lee, Chiral Dynamics (Gordon and Breach, New York 1972).
S. Weinberg, Lectures on Elementary Particles and Quantum Field Theory, S. Deser, M.
Grisaru and H. Pendleton, eds. (M.I.T. Press, Cambridge, MA, 1970).
7) J. Schwinger, Particles, Sources, and Fields (Addison-Wesley, Reading, MA, 1973).
8) This was my own point of view in ref. 1.
9) Interactions like $(D_\mu D^\mu \pi)^2$ are omitted here, because they can be eliminated by a suitable
redefinition of the pion field, and hence are not needed in the construction of the most
general on-mass-shell matrix elements.
10) S. Weinberg, Phys. Rev. Lett. 17 (1966) 616.
11) M. Gell-Mann and F.E. Low, Phys. Rev. 95 (1954) 1300.
12) See e.g. S. Weinberg, in A Festschrift for I.I. Rabi (New York Academy of Sciences, 1977),
p. 185, and references quoted therein.
13) The presence and importance of terms in the S-matrix which are not analytic in symmetry
breaking parameters like $m_u$, $m_d$ was pointed out by L.-F. Li and H. Pagels, Phys. Rev. Lett.
26 (1971) 1204; 27 (1971) 1089; Phys. Rev. D5 (1972) 1509; and P. Langacker and H. Pagels,
Phys. Rev. D8 (1973) 4595.
14) S. Weinberg, Phys. Rev. 130 (1963) 776; 131 (1963) 440.
M. Scadron and S. Weinberg, Phys. Rev. 133 (1964) B1589.
M. Scadron, S. Weinberg and J. Wright, Phys. Rev. 135 (1964) B202.
15) In referring to terms in the Lagrangian as chiral four-vectors or tensors, I am of course
making use of the familiar isomorphism of SU(2) × SU(2) with the four-dimensional rotation
group.
16) Li and Pagels remarked in ref. 13 that the presence of massless Goldstone bosons in intermediate
states makes the matrix element non-analytic in energy-momentum variables. A calculation of
the coefficients of the logarithms in pion-pion scattering that uses unitarity instead of the
phenomenological Lagrangian has been given by H. Lehmann, Phys. Lett. 41B (1972) 529; H.
Lehmann and H. Trute, Nucl. Phys. B52 (1973) 280.
The Trouble with Quantum Mechanics
Steven Weinberg JANUARY 19, 2017 ISSUE

The development of quantum mechanics in the first decades of the twentieth century came as a shock to many
physicists. Today, despite the great successes of quantum mechanics, arguments continue about its meaning, and its
future.

1.
The first shock came as a challenge to the clear categories
to which physicists by 1900 had become accustomed.
There were particles—atoms, and then electrons and
atomic nuclei—and there were fields—conditions of space
that pervade regions in which electric, magnetic, and
gravitational forces are exerted. Light waves were clearly
recognized as self-sustaining oscillations of electric and
magnetic fields. But in order to understand the light
emitted by heated bodies, Albert Einstein in 1905 found it
necessary to describe light waves as streams of massless
particles, later called photons.

Then in the 1920s, according to theories of Louis de Broglie and Erwin Schrödinger, it appeared that electrons,
which had always been recognized as particles, under some circumstances behaved as waves. In order to account
for the energies of the stable states of atoms, physicists had to give up the notion that electrons in atoms are little
Newtonian planets in orbit around the atomic nucleus. Electrons in atoms are better described as waves, fitting
around the nucleus like sound waves fitting into an organ pipe.1 The world’s categories had become all muddled.

[Image: The physicist Eric J. Heller’s Transport XIII (2003), inspired by electron flow experiments conducted at
Harvard. According to Heller, the image ‘shows two kinds of chaos: a random quantum wave on the surface of a sphere,
and chaotic classical electron paths in a semiconductor launched over a range of angles from a particular point. Even
though one is quantum mechanical and the other classical, they are related: the chaotic classical paths cause random
quantum waves to appear when the classical system is solved quantum mechanically.’ Credit: Eric J. Heller]

Worse yet, the electron waves are not waves of electronic matter, in the way that ocean waves are waves of water.
Rather, as Max Born came to realize, the electron waves are waves of probability. That is, when a free electron
collides with an atom, we cannot in principle say in what direction it
will bounce off. The electron wave, after encountering the atom, spreads out in all directions, like an ocean wave after
striking a reef. As Born recognized, this does not mean that the electron itself spreads out. Instead, the undivided electron
goes in some one direction, but not a precisely predictable direction. It is more likely to go in a direction where the wave is
more intense, but any direction is possible.

Probability was not unfamiliar to the physicists of the 1920s, but it had generally been thought to reflect an imperfect
knowledge of whatever was under study, not an indeterminism in the underlying physical laws. Newton’s theories of
motion and gravitation had set the standard of deterministic laws. When we have reasonably precise knowledge of the
location and velocity of each body in the solar system at a given moment, Newton’s laws tell us with good accuracy where
they will all be for a long time in the future. Probability enters Newtonian physics only when our knowledge is imperfect,
as for example when we do not have precise knowledge of how a pair of dice is thrown. But with the new quantum
mechanics, the moment-to-moment determinism of the laws of physics themselves seemed to be lost.

All very strange. In a 1926 letter to Born, Einstein complained:

Quantum mechanics is very impressive. But an inner voice tells me that it is not yet the real thing. The theory
produces a good deal but hardly brings us closer to the secret of the Old One. I am at all events convinced that He
does not play dice.2

As late as 1964, in his Messenger lectures at Cornell, Richard Feynman lamented, “I think I can safely say that no one
understands quantum mechanics.”3 With quantum mechanics, the break with the past was so sharp that all earlier physical
theories became known as “classical.”

The weirdness of quantum mechanics did not matter for most purposes. Physicists learned how to use it to do increasingly
precise calculations of the energy levels of atoms, and of the probabilities that particles will scatter in one direction or
another when they collide. Lawrence Krauss has labeled the quantum mechanical calculation of one effect in the spectrum
of hydrogen “the best, most accurate prediction in all of science.”4 Beyond atomic physics, early applications of quantum
mechanics listed by the physicist Gino Segrè included the binding of atoms in molecules, the radioactive decay of atomic
nuclei, electrical conduction, magnetism, and electromagnetic radiation.5 Later applications spanned theories of
semiconductivity and superconductivity, white dwarf stars and neutron stars, nuclear forces, and elementary particles. Even
the most adventurous modern speculations, such as string theory, are based on the principles of quantum mechanics.

Many physicists came to think that the reaction of Einstein and Feynman and others to the unfamiliar aspects of quantum
mechanics had been overblown. This used to be my view. After all, Newton’s theories too had been unpalatable to many of
his contemporaries. Newton had introduced what his critics saw as an occult force, gravity, which was unrelated to any sort
of tangible pushing and pulling, and which could not be explained on the basis of philosophy or pure mathematics. Also,
his theories had renounced a chief aim of Ptolemy and Kepler, to calculate the sizes of planetary orbits from first
principles. But in the end the opposition to Newtonianism faded away. Newton and his followers succeeded in accounting
not only for the motions of planets and falling apples, but also for the movements of comets and moons and the shape of
the earth and the change in direction of its axis of rotation. By the end of the eighteenth century this success had
established Newton’s theories of motion and gravitation as correct, or at least as a marvelously accurate approximation.
Evidently it is a mistake to demand too strictly that new physical theories should fit some preconceived philosophical
standard.

In quantum mechanics the state of a system is not described by giving the position and velocity of every particle and the
values and rates of change of various fields, as in classical physics. Instead, the state of any system at any moment is
described by a wave function, essentially a list of numbers, one number for every possible configuration of the system.6 If
the system is a single particle, then there is a number for every possible position in space that the particle may occupy. This
is something like the description of a sound wave in classical physics, except that for a sound wave a number for each
position in space gives the pressure of the air at that point, while for a particle in quantum mechanics the wave function’s
number for a given position reflects the probability that the particle is at that position. What is so terrible about that?
Certainly, it was a tragic mistake for Einstein and Schrödinger to step away from using quantum mechanics, isolating
themselves in their later lives from the exciting progress made by others.

2.
Even so, I’m not as sure as I once was about the future of quantum mechanics. It is a bad sign that those physicists today
who are most comfortable with quantum mechanics do not agree with one another about what it all means. The dispute
arises chiefly regarding the nature of measurement in quantum mechanics. This issue can be illustrated by considering a
simple example, measurement of the spin of an electron. (A particle’s spin in any direction is a measure of the amount of
rotation of matter around a line pointing in that direction.)

All theories agree, and experiment confirms, that when one measures the amount of spin of an electron in any arbitrarily
chosen direction there are only two possible results. One possible result will be equal to a positive number, a universal
constant of nature. (This is the constant that Max Planck originally introduced in his 1900 theory of heat radiation, denoted
h, divided by 4π.) The other possible result is its opposite, the negative of the first. These positive or negative values of the
spin correspond to an electron that is spinning either clockwise or counter-clockwise in the chosen direction.

But it is only when a measurement is made that these are the sole two possibilities. An electron spin that has not been
measured is like a musical chord, formed from a superposition of two notes that correspond to positive or negative spins,
each note with its own amplitude. Just as a chord creates a sound distinct from each of its constituent notes, the state of an
electron spin that has not yet been measured is a superposition of the two possible states of definite spin, the superposition
differing qualitatively from either state. In this musical analogy, the act of measuring the spin somehow shifts all the
intensity of the chord to one of the notes, which we then hear on its own.

This can be put in terms of the wave function. If we disregard everything about an electron but its spin, there is not much
that is wavelike about its wave function. It is just a pair of numbers, one number for each sign of the spin in some chosen
direction, analogous to the amplitudes of each of the two notes in a chord.7 The wave function of an electron whose spin
has not been measured generally has nonzero values for spins of both signs.

There is a rule of quantum mechanics, known as the Born rule, that tells us how to use the wave function to calculate the
probabilities of getting various possible results in experiments. For example, the Born rule tells us that the probabilities of
finding either a positive or a negative result when the spin in some chosen direction is measured are proportional to the
squares of the numbers in the wave function for those two states of the spin.8
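
As a minimal sketch of the rule just described (not code from the essay), the two numbers in a spin wave function can be turned into probabilities by squaring their absolute values and dividing by the sum, so that the two probabilities add up to one. The function name and the example amplitudes are hypothetical, chosen only for illustration.

```python
def born_probabilities(up, down):
    """Return (P_positive, P_negative) for a two-component spin wave function.

    `up` and `down` are the complex numbers attached to positive and
    negative spin along the chosen direction (illustrative names).
    """
    weight_up = abs(up) ** 2      # for a + ib this is a^2 + b^2
    weight_down = abs(down) ** 2
    total = weight_up + weight_down
    if total == 0:
        raise ValueError("the wave function needs at least one nonzero component")
    # Dividing by the total makes the probabilities sum to 1.
    return weight_up / total, weight_down / total


# Hypothetical amplitudes: 3 for positive spin, 4i for negative spin.
p_up, p_down = born_probabilities(3 + 0j, 4j)
print(p_up, p_down)
```

The normalization step is what "proportional to the squares" means in practice: only the ratio of the squared absolute values matters.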

The introduction of probability into the principles of physics was disturbing to past physicists, but the trouble with
quantum mechanics is not that it involves probabilities. We can live with that. The trouble is that in quantum mechanics the
way that wave functions change with time is governed by an equation, the Schrödinger equation, that does not involve
probabilities. It is just as deterministic as Newton’s equations of motion and gravitation. That is, given the wave function at
any moment, the Schrödinger equation will tell you precisely what the wave function will be at any future time. There is
not even the possibility of chaos, the extreme sensitivity to initial conditions that is possible in Newtonian mechanics. So if
we regard the whole process of measurement as being governed by the equations of quantum mechanics, and these
equations are perfectly deterministic, how do probabilities get into quantum mechanics?

One common answer is that, in a measurement, the spin (or whatever else is measured) is put in an interaction with a
macroscopic environment that jitters in an unpredictable way. For example, the environment might be the shower of
photons in a beam of light that is used to observe the system, as unpredictable in practice as a shower of raindrops. Such an
environment causes the superposition of different states in the wave function to break down, leading to an unpredictable
result of the measurement. (This is called decoherence.) It is as if a noisy background somehow unpredictably left only one
of the notes of a chord audible. But this begs the question. If the deterministic Schrödinger equation governs the changes
through time not only of the spin but also of the measuring apparatus and the physicist using it, then the results of
measurement should not in principle be unpredictable. So we still have to ask, how do probabilities get into quantum
mechanics?

One response to this puzzle was given in the 1920s by Niels Bohr, in what came to be called the Copenhagen interpretation
of quantum mechanics. According to Bohr, in a measurement the state of a system such as a spin collapses to one result or
another in a way that cannot itself be described by quantum mechanics, and is truly unpredictable. This answer is now
widely felt to be unacceptable. There seems no way to locate the boundary between the realms in which, according to Bohr,
quantum mechanics does or does not apply. As it happens, I was a graduate student at Bohr’s institute in Copenhagen, but
he was very great and I was very young, and I never had a chance to ask him about this.

Today there are two widely followed approaches to quantum mechanics, the “realist” and “instrumentalist” approaches,
which view the origin of probability in measurement in two very different ways.9 For reasons I will explain, neither
approach seems to me quite satisfactory.10

3.
The instrumentalist approach is a descendant of the Copenhagen interpretation, but instead of imagining a boundary
beyond which reality is not described by quantum mechanics, it rejects quantum mechanics altogether as a description of
reality. There is still a wave function, but it is not real like a particle or a field. Instead it is merely an instrument that
provides predictions of the probabilities of various outcomes when measurements are made.

It seems to me that the trouble with this approach is not only that it gives up on an ancient aim of science: to say what is
really going on out there. It is a surrender of a particularly unfortunate kind. In the instrumentalist approach, we have to
assume, as fundamental laws of nature, the rules (such as the Born rule I mentioned earlier) for using the wave function to
calculate the probabilities of various results when humans make measurements. Thus humans are brought into the laws of
nature at the most fundamental level. According to Eugene Wigner, a pioneer of quantum mechanics, “it was not possible
to formulate the laws of quantum mechanics in a fully consistent way without reference to the consciousness.”11

Thus the instrumentalist approach turns its back on a vision that became possible after Darwin, of a world governed by
impersonal physical laws that control human behavior along with everything else. It is not that we object to thinking about
humans. Rather, we want to understand the relation of humans to nature, not just assuming the character of this relation by
incorporating it in what we suppose are nature’s fundamental laws, but rather by deduction from laws that make no explicit
reference to humans. We may in the end have to give up this goal, but I think not yet.

Some physicists who adopt an instrumentalist approach argue that the probabilities we infer from the wave function are
objective probabilities, independent of whether humans are making a measurement. I don’t find this tenable. In quantum
mechanics these probabilities do not exist until people choose what to measure, such as the spin in one or another direction.
Unlike the case of classical physics, a choice must be made, because in quantum mechanics not everything can be
simultaneously measured. As Werner Heisenberg realized, a particle cannot have, at the same time, both a definite position
and a definite velocity. The measuring of one precludes the measuring of the other. Likewise, if we know the wave function
that describes the spin of an electron we can calculate the probability that the electron would have a positive spin in the
north direction if that were measured, or the probability that the electron would have a positive spin in the east direction if
that were measured, but we cannot ask about the probability of the spins being found positive in both directions because
there is no state in which an electron has a definite spin in two different directions.

4.
These problems are partly avoided in the realist—as opposed to the instrumentalist—approach to quantum mechanics.
Here one takes the wave function and its deterministic evolution seriously as a description of reality. But this raises other
problems.

The realist approach has a very strange implication, first worked out in the 1957
Princeton Ph.D. thesis of the late Hugh Everett. When a physicist measures the spin of
an electron, say in the north direction, the wave function of the electron and the
measuring apparatus and the physicist are supposed, in the realist approach, to evolve
deterministically, as dictated by the Schrödinger equation; but in consequence of their
interaction during the measurement, the wave function becomes a superposition of two
terms, in one of which the electron spin is positive and everyone in the world who looks
into it thinks it is positive, and in the other the spin is negative and everyone thinks it is
negative. Since in each term of the wave function everyone shares a belief that the spin
has one definite sign, the existence of the superposition is undetectable. In effect the
history of the world has split into two streams, uncorrelated with each other.

This is strange enough, but the fission of history would not only occur when someone
measures a spin. In the realist approach the history of the world is endlessly splitting; it
does so every time a macroscopic body becomes tied in with a choice of quantum
states. This inconceivably huge variety of histories has provided material for science
fiction,12 and it offers a rationale for a multiverse, in which the particular cosmic history in which we find ourselves is
constrained by the requirement that it must be one of the histories in which conditions are sufficiently benign to allow
conscious beings to exist. But the vista of all these parallel histories is deeply unsettling, and like many other physicists I
would prefer a single history.

[Image: Erwin Schrödinger; drawing by David Levine]

There is another thing that is unsatisfactory about the realist approach, beyond our parochial preferences. In this approach
the wave function of the multiverse evolves deterministically. We can still talk of probabilities as the fractions of the time
that various possible results are found when measurements are performed many times in any one history; but the rules that
govern what probabilities are observed would have to follow from the deterministic evolution of the whole multiverse. If
this were not the case, to predict probabilities we would need to make some additional assumption about what happens
when humans make measurements, and we would be back with the shortcomings of the instrumentalist approach. Several
attempts following the realist approach have come close to deducing rules like the Born rule that we know work well
experimentally, but I think without final success.

The realist approach to quantum mechanics had already run into a different sort of trouble long before Everett wrote about
multiple histories. It was emphasized in a 1935 paper by Einstein with his coworkers Boris Podolsky and Nathan Rosen,
and arises in connection with the phenomenon of “entanglement.”13

We naturally tend to think that reality can be described locally. I can say what is happening in my laboratory, and you can
say what is happening in yours, but we don’t have to talk about both at the same time. But in quantum mechanics it is
possible for a system to be in an entangled state that involves correlations between parts of the system that are arbitrarily
far apart, like the two ends of a very long rigid stick.

For instance, suppose we have a pair of electrons whose total spin in any direction is zero. In such a state, the wave
function (ignoring everything but spin) is a sum of two terms: in one term, electron A has positive spin and electron B has
negative spin in, say, the north direction, while in the other term in the wave function the positive and negative signs are
reversed. The electron spins are said to be entangled. If nothing is done to interfere with these spins, this entangled state
will persist even if the electrons fly apart to a great distance. However far apart they are, we can only talk about the wave
function of the two electrons, not of each separately. Entanglement contributed to Einstein’s distrust of quantum mechanics
as much or more than the appearance of probabilities.
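
The two-electron state described above can be written down explicitly. The sketch below is an illustration under stated assumptions, not the essay's own notation: it takes the standard "singlet" combination, in which the two terms enter with opposite signs, and the dictionary layout and function names are hypothetical. It checks that the spins are always found opposite when both are measured along north, and also when both are measured along east.

```python
import math

S = 1 / math.sqrt(2)

# Amplitudes indexed by (sign of spin A, sign of spin B) along north.
# Assumption: the relative minus sign of the standard singlet state.
singlet_north = {(+1, -1): S, (-1, +1): -S, (+1, +1): 0.0, (-1, -1): 0.0}

# Components of the east-direction spin states along the north states:
# |east +> = (|+> + |->)/sqrt(2),  |east -> = (|+> - |->)/sqrt(2)
EAST = {+1: {+1: S, -1: S}, -1: {+1: S, -1: -S}}


def to_east_basis(state):
    """Re-express a two-spin state using east-direction states for both electrons."""
    out = {}
    for a in (+1, -1):
        for b in (+1, -1):
            # Overlap of the state with |east a>|east b> (all components are real).
            out[(a, b)] = sum(EAST[a][i] * EAST[b][j] * amp
                              for (i, j), amp in state.items())
    return out


def probabilities(state):
    """Born rule for an already normalized state."""
    return {outcome: abs(amp) ** 2 for outcome, amp in state.items()}


p_north = probabilities(singlet_north)
p_east = probabilities(to_east_basis(singlet_north))
print(p_north)
print(p_east)
```

Neither electron has a wave function of its own here: the four amplitudes cannot be factored into one pair of numbers for A times one pair for B, which is what "entangled" means.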

Strange as it is, the entanglement entailed by quantum mechanics is actually observed experimentally. But how can
something so nonlocal represent reality?

5.
What then must be done about the shortcomings of quantum mechanics? One reasonable response is contained in the
legendary advice to inquiring students: “Shut up and calculate!” There is no argument about how to use quantum
mechanics, only how to describe what it means, so perhaps the problem is merely one of words.

On the other hand, the problems of understanding measurement in the present form of quantum mechanics may be warning
us that the theory needs modification. Quantum mechanics works so well for atoms that any new theory would have to be
nearly indistinguishable from quantum mechanics when applied to such small things. But a new theory might be designed
so that the superpositions of states of large things like physicists and their apparatus even in isolation suffer an actual rapid
spontaneous collapse, in which probabilities evolve to give the results expected in quantum mechanics. The many histories
of Everett would naturally collapse to a single history. The goal in inventing a new theory is to make this happen not by
giving measurement any special status in the laws of physics, but as part of what in the post-quantum theory would be the
ordinary processes of physics.

One difficulty in developing such a new theory is that we get no direction from experiment—all data so far agree with
ordinary quantum mechanics. We do get some help, however, from some general principles, which turn out to provide
surprisingly strict constraints on any new theory.

Obviously, probabilities must all be positive numbers, and add up to 100 percent. There is another requirement, satisfied in
ordinary quantum mechanics, that in entangled states the evolution of probabilities during measurements cannot be used to
send instantaneous signals, which would violate the theory of relativity. Special relativity requires that no signal can travel
faster than the speed of light. When these requirements are put together, it turns out that the most general evolution of
probabilities satisfies an equation of a class known as Lindblad equations.14 The class of Lindblad equations contains the
Schrödinger equation of ordinary quantum mechanics as a special case, but in general these equations involve a variety of
new quantities that represent a departure from quantum mechanics. These are quantities whose details of course we now
don’t know. Though it has been scarcely noticed outside the theoretical community, there already is a line of interesting
papers, going back to an influential 1986 article by Gian Carlo Ghirardi, Alberto Rimini, and Tullio Weber at Trieste, that
use the Lindblad equations to generalize quantum mechanics in various ways.
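
For readers who want to see the shape of such an equation, the block below gives the form in which Lindblad equations are usually written in the open-quantum-systems literature. The notation is an assumption, since none of it appears in the essay: $\rho$ is the density matrix that encodes the probabilities, $H$ is the ordinary Hamiltonian, and the operators $L_k$ stand for the new, unknown quantities.

```latex
\frac{d\rho}{dt}
  = -\frac{i}{\hbar}\,[H,\rho]
  + \sum_k \left( L_k\,\rho\,L_k^\dagger
  - \tfrac{1}{2}\left\{ L_k^\dagger L_k,\ \rho \right\} \right)
```

Setting every $L_k$ to zero leaves only the first term, which is the density-matrix version of the Schrödinger equation. This is the sense in which ordinary quantum mechanics is a special case.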

Lately I have been thinking about a possible experimental search for signs of departure from ordinary quantum mechanics
in atomic clocks. At the heart of any atomic clock is a device invented by the late Norman Ramsey for tuning the frequency
of microwave or visible radiation to the known natural frequency at which the wave function of an atom oscillates when it
is in a superposition of two states of different energy. This natural frequency equals the difference in the energies of the two
atomic states used in the clock, divided by Planck’s constant. It is the same under all external conditions, and therefore
serves as a fixed reference for frequency, in the way that a platinum-iridium cylinder at Sèvres serves as a fixed reference
for mass.

Tuning the frequency of an electromagnetic wave to this reference frequency works a little like tuning the frequency of a
metronome to match another metronome. If you start the two metronomes together and the beats still match after a
thousand beats, you know that their frequencies are equal at least to about one part in a thousand. Quantum mechanical
calculations show that in some atomic clocks the tuning should be precise to one part in a hundred million billion, and this
precision is indeed realized. But if the corrections to quantum mechanics represented by the new terms in the Lindblad
equations (expressed as energies) were as large as one part in a hundred million billion of the energy difference of the
atomic states used in the clock, this precision would have been quite lost. The new terms must therefore be even smaller
than this.
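
The arithmetic behind the last three paragraphs can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the caesium hyperfine frequency is used as input purely because it is a familiar, exactly defined number, not because it is the transition used in the most precise clocks.

```python
PLANCK_H = 6.62607015e-34  # Planck's constant in joule-seconds (exact in the SI)


def transition_frequency(energy_difference_joules):
    """Natural frequency of a superposition: energy difference divided by h."""
    return energy_difference_joules / PLANCK_H


def tuning_precision(matching_beats):
    """Metronome estimate: beats still matching after N beats gives about 1 part in N."""
    return 1.0 / matching_beats


def max_correction_energy(frequency_hz, fractional_precision):
    """Largest energy-like correction compatible with the observed precision."""
    return fractional_precision * PLANCK_H * frequency_hz


# Illustrative input (an assumption): the caesium frequency that defines the second.
CAESIUM_HZ = 9_192_631_770
PRECISION = 1e-17  # "one part in a hundred million billion"

bound_joules = max_correction_energy(CAESIUM_HZ, PRECISION)
print(tuning_precision(1000))   # the thousand-beat metronome example
print(bound_joules)             # upper limit on the new terms, in joules
```

The bound scales with the clock frequency, so a clock built on a higher-frequency transition with the same fractional precision gives a proportionally larger limit in joules.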

How significant is this limit? Unfortunately, these ideas about modifications of quantum mechanics are not only
speculative but also vague, and we have no idea how big we should expect the corrections to quantum mechanics to be.
Regarding not only this issue, but more generally the future of quantum mechanics, I have to echo Viola in Twelfth Night:
“O time, thou must untangle this, not I.”

1 Conditions on sound waves at the closed or open ends of an organ pipe require that either an odd number of quarter wave lengths or an even or an odd number of half wave lengths must just fit
into the pipe, which limits the possible notes that can be produced by the pipe. In an atom the wave function must satisfy conditions of continuity and finiteness close to and far from the
nucleus, which similarly limit the possible energies of atomic states.

2 Quoted by Abraham Pais in ‘Subtle Is the Lord’: The Science and the Life of Albert Einstein (Oxford University Press, 1982), p. 443.

3 Richard Feynman, The Character of Physical Law (MIT Press, 1967), p. 129.

4 Lawrence M. Krauss, A Universe from Nothing (Free Press, 2012), p. 138.

5 Gino Segrè, Ordinary Geniuses (Viking, 2011).

6 These are complex numbers, that is, quantities of the general form a+ib, where a and b are ordinary real numbers and i is the square root of minus one.

7 Simple as it is, such a wave function incorporates much more information than just a choice between positive and negative spin. It is this extra information that makes quantum computers,
which store information in this sort of wave function, so much more powerful than ordinary digital computers.

8 To be precise, these “squares” are squares of the absolute values of the complex numbers in the wave function. For a complex number of the form a+ib, the square of the absolute value is the
square of a plus the square of b.

9 The opposition between these two approaches is nicely described by Sean Carroll in The Big Picture (Dutton, 2016).

10 I go into this in mathematical detail in Section 3.7 of Lectures on Quantum Mechanics, second edition (Cambridge University Press, 2015).

11 Quoted by Marcelo Gleiser, The Island of Knowledge (Basic Books, 2014), p. 222.

12 For instance, Northern Lights by Philip Pullman (Scholastic, 1995), and the early “Mirror, Mirror” episode of Star Trek.

13 Entanglement was recently discussed by Jim Holt in these pages, November 10, 2016.

14 This equation is named for Göran Lindblad, but it was also independently discovered by Vittorio Gorini, Andrzej Kossakowski, and George Sudarshan.


© 1963-2017 NYREV, Inc. All rights reserved.


BULLETIN (New Series) OF THE
AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 3, Pages 433–439
S 0273-0979(02)00944-8
Article electronically published on April 12, 2002

The quantum theory of fields. III. Supersymmetry, by Steven Weinberg, Cambridge
Univ. Press, Cambridge, 2000, xxii + 419 pp., $49.95, ISBN 0-521-66000-9

Supersymmetry is an idea that has played a critical role in many of the recent
developments in theoretical physics of interest to mathematicians. The third volume
of The quantum theory of fields by Steven Weinberg [1] is an introduction to
supersymmetric field theory and supergravity. The first two volumes of the series
treat the essentials of quantum field theory. In this third volume, Weinberg has created
the most complete introduction to supersymmetry to date. Although the text
is aimed squarely at physicists, it should prove useful to mathematicians interested
in learning about supersymmetry in its natural physical setting. As a supplement,
to help bridge the cultural and language differences between physics and mathematics,
I would suggest the short lecture series by Freed [2]. My goal, in the course
of this review, is to convey both the essential idea behind supersymmetry and some
of the background needed to peruse the physics literature.
What is supersymmetry? The basic notion behind supersymmetry (SUSY) can
be described in the setting of quantum mechanics. As a starting point, let us
consider a particle in one dimension with position $x$ moving in a potential well,
$V(x)$. The time evolution of this system is governed by a Hamiltonian, $H$, which
takes the form
(1) $H = \frac{1}{2} p^2 + V(x)$.
The momentum, $p$, is a vector field that satisfies the commutation relation
(2) $[x, p] = xp - px = i$.
To avoid unimportant technical issues, let us assume that $V \to \infty$ as $|x| \to \infty$.
The states of this theory describe the possible quantum mechanical configurations
for the particle. Each state is described by a square normalizable function of $x$.
We also note that our Hamiltonian is Hermitian with respect to the standard inner
product on the Hilbert space of states.
To make this system supersymmetric, we need to add additional degrees of freedom
of a quite different kind. Particles in nature come in two distinct flavors:
there are bosons and there are fermions. While bosons are described by conventional
commuting variables, fermions are described by variables that take values
in a Grassmann algebra. A collection of $M$ real fermions in quantum mechanics
satisfies the quantization conditions:
(3) $\{\psi_a, \psi_b\} = \psi_a \psi_b + \psi_b \psi_a = \delta_{ab}$, $a = 1, \ldots, M$.
By contrast, additional bosonic degrees of freedom, labelled say $y_1, y_2, \ldots$, would
satisfy commutation relations:
(4) $[x, y_i] = 0$, $[y_i, y_j] = 0$.
That fermions satisfy anti-commutation rather than commutation relations is their
hallmark. In a system with bosons and fermions, we can define a conserved $Z_2$
charge measured by the operator $(-1)^F$,
(5) $(-1)^F = \prod_a \psi_a$.

2000 Mathematics Subject Classification. Primary 81T60.
©2002 American Mathematical Society

A purely bosonic operator $O_B$ satisfies
(6) $[(-1)^F, O_B] = 0$,
while a purely fermionic operator $O_F$ satisfies
(7) $\{(-1)^F, O_F\} = 0$.
That $(-1)^F$ is conserved is the statement that it commutes with $H$, so $H$ is a purely
bosonic operator. The Hilbert space of the theory can therefore be organized into
states with definite eigenvalue under $(-1)^F$. Bosonic states have eigenvalue $+1$,
while fermionic states have eigenvalue $-1$.
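
The relations (3), (6) and (7) can be checked in the smallest example, $M = 2$, where the two fermions can be represented by 2x2 matrices. The sketch below is an illustration, not part of the review: the choice $\psi_1 = \sigma_x/\sqrt{2}$, $\psi_2 = \sigma_y/\sqrt{2}$ (Pauli matrices) is one convenient representation, and the factor $-2i$ multiplying the product $\psi_1 \psi_2$ is a normalization assumed here so that the eigenvalues come out as $\pm 1$.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def combine(a, b, sign):
    """Return ab + sign*ba: anticommutator for sign=+1, commutator for sign=-1."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + sign * ba[i][j] for j in range(2)] for i in range(2)]


def close(a, b, tol=1e-12):
    """Entry-by-entry comparison within a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


R = 1 / math.sqrt(2)
IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# Assumed representation: Pauli matrices divided by sqrt(2).
psi1 = [[0, R], [R, 0]]
psi2 = [[0, -1j * R], [1j * R, 0]]

# (-1)^F as the product of the fermions, times an assumed normalization -2i.
product = matmul(psi1, psi2)
parity = [[-2j * product[i][j] for j in range(2)] for i in range(2)]

checks = {
    "{psi1, psi1} = 1": close(combine(psi1, psi1, +1), IDENTITY),
    "{psi2, psi2} = 1": close(combine(psi2, psi2, +1), IDENTITY),
    "{psi1, psi2} = 0": close(combine(psi1, psi2, +1), ZERO),
    "{(-1)^F, psi1} = 0": close(combine(parity, psi1, +1), ZERO),
    "{(-1)^F, psi2} = 0": close(combine(parity, psi2, +1), ZERO),
    "[(-1)^F, psi1 psi2] = 0": close(combine(parity, product, -1), ZERO),
    "(-1)^F has eigenvalues +1, -1": close(parity, [[1, 0], [0, -1]]),
}
for name, ok in checks.items():
    print(name, ok)
```

In this representation the single fermions $\psi_1$ and $\psi_2$ play the role of $O_F$, while the even product $\psi_1 \psi_2$ plays the role of $O_B$.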
We might now imagine modifying $H$ by adding additional interactions which
preserve the condition that $H$ be Hermitian. These interactions are operators
which are constructed from the fermions, $\psi_a$, and the boson, $x$. The Hamiltonian
must continue to commute with $(-1)^F$, so these interactions can involve only even
numbers of fermions. Beyond this constraint, these interactions can be essentially
whatever we choose. For supersymmetry, however, we demand that the resulting
Hamiltonian satisfy the algebraic relations
(8) $\{Q_a, Q_b\} = H \delta_{ab}$, $a, b = 1, \ldots, N$.
Our supercharges, $Q_a$, are also Hermitian operators, and the parameter $N$ determines
the degree of supersymmetry. A system with $N = 1$ has simple supersymmetry.
Systems with $N > 1$ have extended supersymmetry. Typically, our control
over a theory increases with the number of supersymmetries.
To supersymmetrize the Hamiltonian of equation (1), we can add an interaction
coupling two fermions to the boson, x, giving a new Hamiltonian,
(9) $H_{\rm susy} = \frac{1}{2} p^2 + V(x) - i \frac{\partial}{\partial x}\sqrt{2V(x)}\, \psi_1 \psi_2 .$
This Hamiltonian obeys the algebra
(10) Hsusy = Q2 ,
with supercharge
(11) $Q = \psi_1 p + \psi_2 \sqrt{2V(x)} .$
Note that Q is fermionic. It is, in essence, a ‘square-root’ of H. We can now
begin to see why supersymmetry is so powerful. From the algebra given in (8),
we see that the spectrum of any supersymmetric Hamiltonian, Hsusy , is bounded
from below (by zero). Further, supersymmetry requires that eigenstates of H with
non-zero energy eigenvalue appear in degenerate pairs. Given any state |E⟩ with
energy eigenvalue E, we can construct a degenerate state, (1/√E) Q|E⟩. Since Q itself
is fermionic, one of these states is bosonic while the other is fermionic. In this
way, supersymmetry maps bosons to fermions and vice-versa. Although arrived at
in a simplified setting, these essential features of supersymmetry generalize from
quantum mechanics to higher-dimensional field theories.
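
As a rough numerical illustration of these statements (not part of the original review), the sketch below puts the supercharge (11) on a lattice and checks that H = Q² has a non-negative, pairwise degenerate spectrum. Everything specific here is an assumption made for the example: the potential V(x) = x²/2 with the smooth branch √(2V) → x, the representation ψ1 = σ1/√2, ψ2 = σ2/√2 (which satisfies (3)), the normalization (−1)^F = −2iψ1ψ2 = σ3, the central-difference momentum operator, and the function name `build_susy_qm`.

```python
import numpy as np


def build_susy_qm(n_grid=201, x_max=6.0):
    """Lattice sketch of Q = psi1 p + psi2 W(x), with W^2 = 2V (assumed V = x^2/2)."""
    x = np.linspace(-x_max, x_max, n_grid)
    dx = x[1] - x[0]

    # p = -i d/dx using an antisymmetric central difference, so p is Hermitian.
    deriv = (np.eye(n_grid, k=1) - np.eye(n_grid, k=-1)) / (2.0 * dx)
    p = -1j * deriv

    # Smooth branch of sqrt(2V) for the assumed potential V = x^2/2.
    w = np.diag(x).astype(complex)

    # Two real fermions with {psi_a, psi_b} = delta_ab, built from Pauli matrices.
    s1 = np.array([[0, 1], [1, 0]], dtype=complex)
    s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
    psi1, psi2 = s1 / np.sqrt(2), s2 / np.sqrt(2)

    q = np.kron(psi1, p) + np.kron(psi2, w)   # supercharge, eq. (11)
    h = q @ q                                  # Hamiltonian, eq. (10)
    # (-1)^F normalized so that it squares to one: -2i psi1 psi2 = sigma_3.
    parity = np.kron(-2j * psi1 @ psi2, np.eye(n_grid))
    return q, h, parity


q, h, parity = build_susy_qm()
energies = np.linalg.eigvalsh(h)
print("lowest levels of H = Q^2:", np.round(energies[:6], 4))
print("Q anticommutes with (-1)^F:", np.allclose(parity @ q + q @ parity, 0))
```

Because Q is Hermitian, Q² cannot have negative eigenvalues, and because (−1)^F anticommutes with Q, each eigenvector of Q with eigenvalue q ≠ 0 has a partner of opposite fermion parity with eigenvalue −q. In this finite truncation every level comes out paired; the behaviour of true zero-energy states is not captured reliably by such a crude lattice.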
Weinberg begins his text not with supersymmetric quantum mechanics, but with
a short historical note. Supersymmetry has its roots in the early development of
string theory. It was soon realized, however, that supersymmetry could be im-
plemented in conventional four-dimensional quantum field theories. The Standard
Model of particle physics constitutes the most important example of a theory that
can be supersymmetrized. The resulting theory, known as the ‘minimal supersym-
metric standard model’ (MSSM), remains one of the more promising candidates for
describing particle physics beyond the current Standard Model. There is a common
terminology associated to the pairing of bosons and fermions under supersymmetry.
In most cases, to an observed fermion of a supersymmetric theory, we associate a
‘sparticle’ superpartner. For example, electrons are leptons observed in the world
around us. In the context of a supersymmetric theory of leptons, the superpartner
of an electron is a ‘selectron’. Likewise, the superpartner of a quark is a ‘squark’.
For observed bosons, the appellation for the superpartner is different. We append
‘ino’ to the name of the boson; for example, the superpartner of a graviton – the
particle that we believe mediates the gravitational interaction – is a gravitino, while
the superpartner of a photon is a photino.
Clearly, supersymmetry is not an exact symmetry of the world around us. If
this were the case, we would have already observed the superpartners of light or
massless particles like the photon. Nevertheless, many physicists believe that su-
persymmetry will be observed as a fundamental symmetry of particle interactions
as we probe higher energy scales. It is also worth noting that our only consistent
theory of quantum gravity – namely, string theory – requires supersymmetry in its
formulation.
The initial discovery of supersymmetry was particularly surprising because it
avoids a famous ‘no-go’ theorem by Coleman and Mandula [3]. Quantum field the-
ories in D + 1 space-time dimensions consist of states that transform irreducibly
under the Poincaré group of rotations, boosts, and translations.1 Under reason-
able conditions, Coleman and Mandula argued that the only possible symmetries
of quantum field theory consist of the Poincaré group together with possible inter-
nal symmetries that commute with Poincaré. However, the theorem assumes that
all symmetries preserve the Z2 grading by fermion number. By contrast, super-
symmetry relates bosons to fermions, and hence evades the theorem. Weinberg’s
discussion of these points is remarkably detailed. He gives complete arguments
which are often simpler than those given in the original papers. The thoroughness
with which Weinberg develops his arguments is perhaps my favorite feature of this
text.
Supersymmetry algebras are the next topic of discussion. The N = 1 supersym-
metry algebra (10) of quantum mechanics generalizes to four space-time dimensions
(D = 3) in the following way:
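(12) $\{Q_\alpha , Q^*_\beta\} = 2\sigma^\mu_{\alpha\beta} P_\mu , \quad \alpha, \beta = 1, 2,$
     $\{Q_\alpha , Q_\beta\} = 0.$
A word on notation is in order: the σ^µ are the Pauli matrices with σ^0 = −1. In a
convenient basis,
(13) $\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$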
1 It is unreasonable for me to attempt an explanation of quantum field theory here.
Fortunately, there has been a substantial effort devoted to making quantum field theory
accessible to mathematicians. The proceeds of this effort appear in [4].

The Pµ are space-time momenta, while the Qα are complex two component space-
time spinors. The space-time metric is the Minkowski metric with diagonal entries,
{−1, 1, 1, 1}. One of the annoying features of the literature on supersymmetry is
the bewildering array of notations and conventions. Whether one likes or dislikes
his choices, Weinberg is considerate enough to clearly state his conventions, which
are consistent with his first two volumes on quantum field theory.
In quantum field theory, there is a rather important correlation between the
statistics of a particle (Bose versus Fermi) and its spin. A particle described by
a quantum field theory is characterized by its mass and spin. The notion of spin
is an important one and merits an explanation. These two characteristics, mass
and spin, come naturally from representation theory in the following way: the
single particle states of the quantum field theory essentially describe a particle
moving with momentum P . These states form an irreducible representation of the
Poincaré group. Irreducible representations of Poincaré are labelled by two Casimir
invariants. The first Casimir is P · P = −M^2, where M is the mass of the particle.
The second invariant can be described as follows: while P · P is invariant under
the action of the Lorentz group, any particular momentum P is only left invariant
by a subgroup of the Lorentz group, Spin(3, 1). This subgroup is called the little
group of Spin(3, 1). In determining the spin of a particle, we need to consider two
distinct cases: in the case of massive particles where M ≠ 0, it is not hard to see
that the little group is SU (2). The single particle states with a fixed momentum
P transform irreducibly under the little group. Let the dimension of this SU (2)
representation be 2j + 1. The quantum number j is the spin of the particle, and
−M^2 j(j + 1) is the second Casimir of the Poincaré group. If we boost to a frame
where the particle is stationary so only P^0 = M is non-zero, we see that spin indeed
describes how the particle rotates in three space.
For massless particles where P · P = 0, the situation is different. The little group
which leaves any particular P invariant has three generators which we label J3 , B1
and B2 . In a convenient frame where the particle is moving along the third axis, J3
generates rotations around this axis, while B1 and B2 generate particular boosts.
These generators satisfy the Lie algebra relations
(14) [B1 , B2 ] = 0, [J3 , B1 ] = iB2 , [J3 , B2 ] = −iB1 .
The representations of this algebra which correspond to physical particles are quite
restricted: B1 and B2 must act trivially on allowed representations. Further, the
eigenvalue of J3 , known as the helicity of the particle, is quantized. The helicity
can be either integral or half-integral.
There is, however, a standard abuse of notation under which a massless particle
is assigned spin, as if it were massive. For example, a massless photon is a spin 1
particle even though it consists only of helicity ±1 states. Likewise, a graviton is
a spin 2 particle. With this caveat in mind, we can now state the spin-statistics
theorem which follows from quite general properties of quantum field theory: bosons
must have integral spin, while fermions must have half-integral spin.
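
The little-group algebra (14) can be checked with explicit 4×4 matrices. The sketch below is an illustration rather than anything taken from Weinberg's text: it assumes the vector representation with (J_i)_{jk} = −iε_{ijk} and K_i = i(E_{0i} + E_{i0}), and it assumes the particular combinations B1 = K1 − J2, B2 = K2 + J1, which are one choice consistent with (14) for a momentum along the third axis. The helper names `unit`, `lorentz_generators` and `comm` are hypothetical.

```python
import numpy as np


def unit(i, j):
    """4x4 matrix with a single 1 in row i, column j."""
    m = np.zeros((4, 4), dtype=complex)
    m[i, j] = 1.0
    return m


def lorentz_generators():
    """Rotation and boost generators acting on contravariant vectors (t, x, y, z)."""
    j1 = -1j * (unit(2, 3) - unit(3, 2))
    j2 = -1j * (unit(3, 1) - unit(1, 3))
    j3 = -1j * (unit(1, 2) - unit(2, 1))
    boosts = tuple(1j * (unit(0, i) + unit(i, 0)) for i in (1, 2, 3))
    return (j1, j2, j3), boosts


def comm(a, b):
    return a @ b - b @ a


(j1, j2, j3), (k1, k2, k3) = lorentz_generators()

# Assumed choice of the two "translation-like" little-group generators.
b1 = k1 - j2
b2 = k2 + j1

# Massless momentum along the third axis, P = (E, 0, 0, E) with E = 1.
momentum = np.array([1.0, 0.0, 0.0, 1.0])

print("[B1, B2] = 0    :", np.allclose(comm(b1, b2), 0))
print("[J3, B1] = i B2 :", np.allclose(comm(j3, b1), 1j * b2))
print("[J3, B2] = -i B1:", np.allclose(comm(j3, b2), -1j * b1))
print("P left invariant:", all(np.allclose(g @ momentum, 0) for g in (j3, b1, b2)))
```

All three generators annihilate P, so the group they generate leaves this momentum fixed, which is the defining property of the little group.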
As we see from (12), minimal supersymmetry in four dimensions requires four real
supercharges. Our next step, in tandem with Weinberg, is to study representations
of the minimal supersymmetry algebra. From the representations, we can determine
the combinations of particles needed to build supersymmetric field theories. There
are again two distinct cases. If we consider a multiplet of massive particles with
mass M , we can always boost to a frame where the particles are stationary. The
only non-vanishing component of P^µ is again P^0 = M. In this convenient frame,
it is easy to construct representations of the resulting algebra,
(15) $\{Q_\alpha , Q^*_\beta\} = 2\delta_{\alpha\beta} M.$

Take a Fock vacuum |0⟩ satisfying Q_α|0⟩ = 0. By acting with Q^*, we build a
representation with states
$|0\rangle, \quad Q^*_\alpha |0\rangle, \quad \epsilon^{\alpha\beta} Q^*_\alpha Q^*_\beta |0\rangle.$
Note that the Fock vacuum need not be invariant under the Lorentz group. This
representation of the supersymmetry algebra is reducible under the Lorentz group.
On decomposition to irreducible representations under Lorentz, we discover the
particle content of this supermultiplet. For example, if the Fock vacuum |0⟩ is
invariant under the Lorentz group, we obtain the following fields: two scalar fields
with spin 0 corresponding to the states |0⟩ and ε^{αβ} Q^*_α Q^*_β |0⟩, and one fermion with
spin 1/2, Q^*_α |0⟩.
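
The state counting above can be made concrete with a small matrix model. This is only a sketch under stated assumptions: the two supercharges are represented as √(2M) times a pair of fermionic lowering operators in Jordan-Wigner form, the vacuum is taken to be bosonic and Lorentz invariant, and the function name `massive_multiplet` is hypothetical.

```python
import numpy as np


def massive_multiplet(mass=1.0):
    """Return ([Q_1, Q_2], parity) on the 4-dimensional Fock space built on |0>."""
    lower = np.array([[0, 1], [0, 0]], dtype=complex)   # annihilates the state (1, 0)
    s3 = np.diag([1.0, -1.0]).astype(complex)           # (-1)^(occupation number)
    one = np.eye(2, dtype=complex)

    # Jordan-Wigner construction of two mutually anticommuting oscillators.
    a1 = np.kron(lower, one)
    a2 = np.kron(s3, lower)

    charges = [np.sqrt(2.0 * mass) * a1, np.sqrt(2.0 * mass) * a2]
    parity = np.kron(s3, s3)                             # fermion parity (-1)^F
    return charges, parity


def anticomm(a, b):
    return a @ b + b @ a


charges, parity = massive_multiplet(mass=1.0)
vacuum = np.array([1, 0, 0, 0], dtype=complex)

print("Q_alpha |0> = 0:", all(np.allclose(q @ vacuum, 0) for q in charges))
print("bosonic states :", int(np.sum(np.isclose(np.diag(parity).real, 1.0))))
print("fermionic states:", int(np.sum(np.isclose(np.diag(parity).real, -1.0))))
```

The four basis states are |0⟩, the two states Q^*_α|0⟩, and the doubly excited state, so the multiplet splits evenly into bosonic and fermionic states, in line with the two scalars and one spin-1/2 fermion described in the text.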
Because this supermultiplet contains massive particles, there are no particularly
strong physical constraints on the permitted representations. For sufficiently large
masses, the spins of the constituent particles can be arbitrarily high without a
physical inconsistency appearing in the low-energy observed theory. The situation
is quite different for massless particles. We can still boost to a special frame where
P^µ = (E, 0, 0, E). The algebra of (12) becomes
(16) $\{Q_\alpha , Q^*_\beta\} = 4E \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}_{\alpha\beta} .$

We see that half the supersymmetry generators are represented trivially. The size
of our representations is therefore reduced in comparison with the case of a massive
particle. On massless particles, there are strong physical constraints. Theories of
massless particles with spins greater than 2 are not believed to be consistent. We
should note that a theory with a spin 2 massless particle is necessarily a theory of
gravity. The spin 2 particle is the graviton.
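
The two special frames can be checked directly from (12). The short sketch below evaluates 2σ^µ P_µ using the conventions stated in the review (σ^0 = −1, metric with diagonal entries {−1, 1, 1, 1}); the function name `susy_rhs` and the sample values of M and E are assumptions made for the illustration.

```python
import numpy as np

# Conventions quoted in the review: metric diag(-1, 1, 1, 1) and sigma^0 = -1.
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])
SIGMA = [
    -np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def susy_rhs(p_upper):
    """Right-hand side of (12), 2 sigma^mu P_mu, for contravariant momentum P^mu."""
    p_lower = ETA @ np.asarray(p_upper, dtype=float)
    return 2.0 * sum(p_lower[mu] * SIGMA[mu] for mu in range(4))


mass, energy = 3.0, 5.0
print("massive rest frame:\n", susy_rhs([mass, 0, 0, 0]).real)        # compare with (15)
print("massless frame:\n", susy_rhs([energy, 0, 0, energy]).real)     # compare with (16)
```

In the massless frame one diagonal entry vanishes, which is the statement that half of the supercharges act trivially on the multiplet.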
We therefore impose the constraint that the spins of massless particles in a super-
multiplet not exceed this bound. However, if we increase the number of supersym-
metries beyond the minimal four supercharges, the maximum spin of particles in a
supermultiplet increases. As Weinberg describes in more detail, we can have an ex-
tended supersymmetry algebra with at most 32 real supercharges. This maximally
supersymmetric theory is quite unique. It has a single permitted representation.
Among the particles of this supermultiplet is a spin 2 particle, so this maximally
supersymmetric theory includes gravity. A theory of supersymmetry and gravity is
known as a theory of supergravity (SUGRA).
In four dimensions, theories with sixteen or fewer real supercharges need not
contain gravity. There is a profound difference between supersymmetric theories
with and without gravity. Without gravity, supersymmetry is a global symmetry:
we must perform the same supersymmetry transformation at each point in space-
time. With gravity, the situation is quite different. Supersymmetry becomes a local,
or gauge symmetry, in which the parameters of our supersymmetry transformation
are allowed to vary over space-time. The construction of supergravity theories is
quite involved, but Weinberg’s discussion in chapter 31 is a reasonable place to
start.
To complete this brief introduction to supersymmetry, let us turn momentarily


to the Lagrangian formulation of supersymmetric theories. This is, by far, the most
common formalism for discussing field theory. Our quantum mechanical Hamilton-
ian of equation (9) will serve as an example. This Hamiltonian can be obtained
from the action
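(17) $S = \int dt\, L(x, \psi) = \int dt \left\{ \frac{1}{2}\left(\frac{dx}{dt}\right)^2 + \frac{i}{2} \sum_i \psi_i \frac{d\psi_i}{dt} - V(x) + i \frac{\partial}{\partial x}\sqrt{2V(x)}\, \psi_1 \psi_2 \right\},$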
by a Legendre transformation. In the Lagrangian formalism, where L is the La-
grangian, we view x = x(t) and ψi = ψi (t) as fields rather than operators. As fields
in the Lagrangian, fermions obey the anti-commutation relations
(18) {ψa , ψb } = 0.
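Supersymmetry transformations are parametrized by a Grassmann variable ε and
act on the fields in the following way:
(19) $\delta_\epsilon x = -i\epsilon\psi_1 , \quad \delta_\epsilon \psi_1 = \epsilon \frac{dx}{dt} , \quad \delta_\epsilon \psi_2 = \epsilon \sqrt{2V(x)} .$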
Under the action of any symmetry, the Lagrangian must vary into a total derivative.
It is not hard to check that the variations given in (19) define a symmetry of the
action (17) with ε time-independent. Closure of the supersymmetry algebra implies
that
(20) $[\delta_\epsilon , \delta_{\epsilon'}] = 2i\epsilon\epsilon' \frac{d}{dt} ,$
when acting on any of the fields. It is a quite general feature of supersymmetry
that satisfying (20) on all the fields typically requires the use of the equations of
motion. For a theory defined by a Lagrangian, the equations of motion are the
Euler-Lagrange equations. In such situations, we say that the symmetry algebra
closes only on-shell, i.e., for fields satisfying their equations of motion. This is true
even for theories of free particles.
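
The on-shell closure can be seen explicitly in this example. The following worked computation is a sketch, not taken from Weinberg's text; it uses the shorthand W(x) ≡ √(2V(x)) (introduced here for convenience), applies the transformations of (19) twice, and uses the fact that the Grassmann parameters anticommute with each other and with the fermions.

```latex
% Shorthand (an assumption of this sketch): W(x) \equiv \sqrt{2V(x)}, W' \equiv dW/dx.
\begin{align*}
[\delta_\epsilon , \delta_{\epsilon'}]\, x
  &= \delta_\epsilon(-i\epsilon'\psi_1) - \delta_{\epsilon'}(-i\epsilon\psi_1)
   = -i\epsilon'\epsilon\,\dot{x} + i\epsilon\epsilon'\,\dot{x}
   = 2i\epsilon\epsilon'\,\dot{x}, \\
[\delta_\epsilon , \delta_{\epsilon'}]\, \psi_1
  &= \epsilon'\frac{d}{dt}(-i\epsilon\psi_1) - \epsilon\frac{d}{dt}(-i\epsilon'\psi_1)
   = 2i\epsilon\epsilon'\,\dot{\psi}_1, \\
[\delta_\epsilon , \delta_{\epsilon'}]\, \psi_2
  &= \epsilon' W'(x)\,(-i\epsilon\psi_1) - \epsilon W'(x)\,(-i\epsilon'\psi_1)
   = 2i\epsilon\epsilon'\, W'(x)\,\psi_1 .
\end{align*}
% Varying \psi_2 in the action (17) gives, up to a total derivative,
%   \delta L = i\,\delta\psi_2\,(\dot{\psi}_2 - W'(x)\psi_1),
% so the Euler-Lagrange equation for \psi_2 is
\begin{equation*}
\dot{\psi}_2 = W'(x)\,\psi_1 .
\end{equation*}
```

On x and ψ1 the commutator already has the form (20). On ψ2 it agrees with (20) only after substituting the equation of motion for ψ2, which is the sense in which the algebra closes on-shell.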
At this point, we should systematize the procedure for constructing supersym-
metric Lagrangians and extend it to higher-dimensional field theories. One way to
do this is by introducing the notion of superspace. Unfortunately, fascinating top-
ics like superspace, supersymmetry breaking, and applications like Seiberg-Witten
theory, are beyond the scope of this review. Fortunately, these topics and many
more can be found in Weinberg’s quite comprehensive text. These modern topics,
particularly the derivation of the Seiberg-Witten solution [5] and the discussion
of non-perturbative physics, largely differentiate Weinberg’s text from other classic
texts on supersymmetry and supergravity, namely, the book by Wess and Bagger [6]
and the book by West [7]. I hope the reader is sufficiently enticed to further explore
this beautiful subject.

References
[1] S. Weinberg, “The quantum theory of fields. Vol. 3: Supersymmetry”, Cambridge, UK:
Univ. Pr. (2000) 419 pp. MR 2001a:81258
[2] D. S. Freed, “Five lectures on supersymmetry”, Providence, USA: AMS (1999) 119 pp. MR
2000k:58015
[3] S. R. Coleman and J. Mandula, “All Possible Symmetries of the S Matrix”, Phys. Rev. 159,
1251 (1967).
[4] P. Deligne et al., “Quantum fields and strings: A course for mathematicians. Vol. 1, 2”,
Providence, USA: AMS (1999) 1-1501. MR 2000e:81010
[5] N. Seiberg and E. Witten, “Electric - magnetic duality, monopole condensation, and confine-
ment in N=2 supersymmetric Yang-Mills theory”, Nucl. Phys. B 426, 19 (1994) [Erratum-
ibid. B 430, 485 (1994)] [arXiv:hep-th/9407087]. MR 95m:81202a, MR 95m:81202b
[6] J. Wess and J. Bagger, “Supersymmetry and Supergravity”, 2nd ed., Princeton, USA: Univ.
Pr. (1992) 259 pp. MR 93a:81003
[7] P. West, “Introduction To Supersymmetry and Supergravity”, 2nd ed., Teaneck, NJ: World
Scientific (1990) 425 pp. MR 92f:81004

Savdeep Sethi
University of Chicago
E-mail address: sethi@theory.uchicago.edu
The cosmological constant problem
Steven Weinberg
Theory Group, Department of Physics, University of Texas, Austin, Texas 78712
Astronomical observations indicate that the cosmological constant is many orders of magnitude smaller
than estimated in modern theories of elementary particles. After a brief review of the history of this prob-
lem, five different approaches to its solution are described.

CONTENTS

I. Introduction
II. Early History
III. The Problem
IV. Supersymmetry, Supergravity, Superstrings
V. Anthropic Considerations
   A. Mass density
   B. Ages
   C. Number counts
VI. Adjustment Mechanisms
VII. Changing Gravity
VIII. Quantum Cosmology
IX. Outlook
Acknowledgments
References

As I was going up the stair,
I met a man who wasn't there.
He wasn't there again today,
I wish, I wish he'd stay away.
Hughes Mearns

I. INTRODUCTION

Physics thrives on crisis. We all recall the great progress made while finding a way out
of various crises of the past: the failure to detect a motion of the Earth through the
ether, the discovery of the continuous spectrum of beta decay, the τ-θ problem, the
ultraviolet divergences in electromagnetic and then weak interactions, and so on.
Unfortunately, we have run short of crises lately. The "standard model" of electroweak
and strong interactions currently faces neither internal inconsistencies nor conflicts
with experiment. It has plenty of loose ends; we know no reason why the quarks and
leptons should have the masses they have, but then we know no reason why they should not.

Perhaps it is for want of other crises to worry about that interest is increasingly
centered on one veritable crisis: theoretical expectations for the cosmological constant
exceed observational limits by some 120 orders of magnitude.¹ In these lectures I will
first review the history of this problem and then survey the various attempts that have
been made at a solution.

II. EARLY HISTORY

After completing his formulation of general relativity in 1915-1916, Einstein (1917)
attempted to apply his new theory to the whole universe. His guiding principle was that
the universe is static: "The most important fact that we draw from experience is that
the relative velocities of the stars are very small as compared with the velocity of
light." No such static solution of his original equations could be found (any more than
for Newtonian gravitation), so he modified them by adding a new term involving a free
parameter λ, the cosmological constant:²

$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R - \lambda g_{\mu\nu} = -8\pi G T_{\mu\nu} .$ (2.1)

Now, for λ > 0, there was a static solution for a universe filled with dust of zero
pressure and mass density

$\rho = \frac{\lambda}{8\pi G} .$ (2.2)

Its geometry was that of a sphere S^3, with proper circumference 2πr, where

$r = 1/\sqrt{8\pi\rho G} ,$ (2.3)

so the mass of the universe was

$M = 2\pi^2 r^3 \rho = \frac{\pi}{4} \lambda^{-1/2} G^{-1} .$ (2.4)

In some popular history accounts, it was Hubble's discovery of the expansion of the
universe that led Einstein to retract his proposal of a cosmological constant. The real
story is more complicated, and more interesting.

One disappointment came almost immediately. Einstein had been pleased at the connection
in his model between the mass density of the universe and its geometry, because,
following Mach's lead, he expected that the mass distribution of the universe should set
inertial frames. It was therefore unpleasant when his friend de Sitter, with whom
Einstein remained in touch during the war, in 1917 proposed another apparently static
cosmological model with no matter at all. (See de Sitter, 1917.) Its line element (using
the same coordinate system as de Sitter, but in a different notation) was

$d\tau^2 = \frac{1}{\cosh^2 Hr} \left[ dt^2 - dr^2 - H^{-2} \tanh^2 Hr \, (d\theta^2 + \sin^2\theta \, d\varphi^2) \right] ,$ (2.5)

*Morris Loeb Lectures in Physics, Harvard University, May 2, 3, 5, and 10, 1988.
¹For a good nonmathematical description of the cosmological constant problem, see
Abbott (1988).
²The notation used here for metrics, curvatures, etc., is the same as in Weinberg (1972).

Reviews of Modern Physics, Vol. 61, No. 1, January 1989 Copyright 1988 The American Physical Society
with H related to the cosmological constant by

$H = \sqrt{\lambda/3}$ (2.6)

and ρ = p = 0. Clearly matter was not needed to produce inertia.

At about this time, the redshift of distant objects was being discovered by Slipher.
Over the period from 1910 to the mid-1920s, Slipher (1924) observed that galaxies (or, as
then known, spiral nebulae) have redshifts z = Δλ/λ ranging up to 6%, and only a few have
blueshifts. Weyl pointed out in 1923 that de Sitter's model would exhibit such a
redshift, increasing with distance, because although the metric in de Sitter's coordinate
system is time independent, test bodies are not at rest; there is a nonvanishing
component of the affine connection

$\Gamma^r_{tt} = -H \sinh Hr \, \tanh Hr$ (2.7)

giving a redshift proportional to distance

$z = Hr \quad \text{for} \quad Hr \ll 1 .$ (2.8)

In his influential textbook, Eddington (1924) interpreted Slipher's redshifts in terms of
de Sitter's "static" universe.

But of course, although the cosmological constant was needed for a static universe, it
was not needed for an expanding one. Already in 1922, Friedmann (1924) had described a
class of cosmological models, with line element (in modern notation)

$d\tau^2 = dt^2 - R^2(t) \left[ \frac{dr^2}{1 - kr^2} + r^2 (d\theta^2 + \sin^2\theta \, d\varphi^2) \right] .$ (2.9)

These are comoving coordinates; the universe expands or contracts as R(t) increases or
decreases, but the galaxies keep fixed coordinates r, θ, φ. The motion of the cosmic
scale factor is governed by an energy-conservation equation

$\left( \frac{dR}{dt} \right)^2 = -k + \frac{1}{3} R^2 (8\pi G \rho + \lambda) .$ (2.10)

The de Sitter model is just the special case with k = 0 and ρ = 0; in order to put the
line element (2.5) in the more general form (2.9), it is necessary to introduce new
coordinates,

$t' = t - H^{-1} \ln \cosh Hr , \qquad r' = H^{-1} \exp(-Ht) \sinh Hr ,$ (2.11)

and then drop the primes. However, we can also easily find expanding solutions with
λ = 0 and ρ > 0. Pais (1982) quotes a 1923 letter of Einstein to Weyl, giving his
reaction to the discovery of the expansion of the universe: "If there is no quasi-static
world, then away with the cosmological term!"

III. THE PROBLEM

Unfortunately, it was not so easy simply to drop the cosmological constant, because
anything that contributes to the energy density of the vacuum acts just like a
cosmological constant. Lorentz invariance tells us that in the vacuum the
energy-momentum tensor must take the form

$\langle T_{\mu\nu} \rangle = -\langle \rho \rangle g_{\mu\nu} .$ (3.1)

(A minus sign appears here because we use a metric which for flat space-time has
g_00 = −1.) Inspection of Eq. (2.1) shows that this has the same effect as adding a term
8πG⟨ρ⟩ to the effective cosmological constant

$\lambda_{\rm eff} = \lambda + 8\pi G \langle \rho \rangle .$ (3.2)

Equivalently we can say that the Einstein cosmological constant contributes a term
λ/8πG to the total effective vacuum energy

$\rho_V = \langle \rho \rangle + \lambda / 8\pi G = \lambda_{\rm eff} / 8\pi G .$ (3.3)

A crude experimental upper bound on λ_eff or ρ_V is provided by measurements of
cosmological redshifts as a function of distance, the program begun by Hubble in the late
1920s. The present expansion rate is today estimated as

$\left( \frac{1}{R} \frac{dR}{dt} \right)_{\rm now} \equiv H_0 = 50\text{-}100 \ {\rm km/sec\ Mpc} = (\tfrac{1}{2}\text{-}1) \times 10^{-10}/{\rm yr} .$

Furthermore, we do not see gross effects of the curvature of the universe, so very roughly

$|k| / R^2_{\rm now} \lesssim H_0^2 .$

Finally, the ordinary nonvacuum mass density of the universe is not much greater than its
critical value

$|\rho - \langle \rho \rangle| \lesssim 3 H_0^2 / 8\pi G .$

Hence (2.10) shows that

$|\lambda_{\rm eff}| \lesssim H_0^2$

or, in physicists' units,

$|\rho_V| \lesssim 10^{-29} \ {\rm g/cm^3} \approx 10^{-47} \ {\rm GeV^4} .$ (3.4)

A more precise observational bound will be discussed in Sec. V, but this one will be good
enough for our present purposes.

As everyone knows, the trouble with this is that the energy density ⟨ρ⟩ of empty space is
likely to be enormously larger than 10^{-47} GeV^4. For one thing, summing the zero-point
energies of all normal modes of some field of mass m up to a wave number cutoff Λ ≫ m
yields a vacuum energy density (with ħ = c = 1)

$\langle \rho \rangle = \int_0^\Lambda \frac{4\pi k^2 \, dk}{(2\pi)^3} \, \frac{1}{2} \sqrt{k^2 + m^2} \simeq \frac{\Lambda^4}{16\pi^2} .$ (3.5)

Rev. Mod. Phys. , Vol. 61, No. 1, January 1989


Steven Weinberg: The cosmological constant problem

If we believe general relativity


then we might take A=(SmG) ', up to the Planck scale,
which would give
of quantum Auctuations. As we have seen, the zero-point
energies themselves gave far too large a value for & p ), so
Zeldovich assumed that these were canceled by A, /Sn. G,
&p) =2-"~-'G-'=2X 10" GeV'. (3.6) leaving only higher-order effects: in particular, the gravi-
But we saw that & p) +1,/ SAG~ is less than about
~
tational force between the particles in the vacuum Quc-
10 GeV, so the two terms here must cancel to better tuations. (In Feynman diagram terms, this corresponds
than 118 decimal places. Even if we only worry about to throwing away the one-loop vacuum graphs, but keep-
zero-point energies in quantum chromo dynamics, we ing those with two loops. ) Taking A particles of energy
would expect &p) to be of order AocD/16m, or 10 A per unit volume gives the gravitational self-energy den-
GeV, requiring I, /SmG to cancel this term to about 41 sity of order
decimal places. &p)=(GA /A ')A =GA (3.7)
Perhaps surprisingly, it was a long time before particle
physicists began seriously to worry about this problem, For no clear reason, Zeldovich took the cutoff A as 1
despite the demonstration in the Casimir effect of the GeV, which yields a density p ) = 10
& GeV, much
reality of zero-point energies. Since the cosmological smaller than from zero-point energies themselves, but
upper bound on & p ) + A, /Sm G was vastly less than any
~ ~
still larger than the observational bound (3.4) on
value expected from particle theory, most particle theor- ~&p)+A, /Sm. G~ by some 9 orders of magnitude. Neither
ists simply assumed that for some unknown reason this Zeldovich nor anyone else felt encouraged to pursue
quantity was zero. But cosmologists generally continued these ideas.
to keep an open mind, analyzing cosmological data in The real beginning of serious worry about the vacuum
terms of models with a possibly nonvanishing cosmologi- energy seems to date from the success of the idea of spon-
cal constant. taneous symmetry breaking in the electroweak theory.
In fact, as far as I know, the first published discussion In this theory, the scalar field potential takes the form
of the contribution of quantum Auctuations to the (with p &0, g &0)
effective cosmological constant was triggered by astro-
V= Vo pY0+g—(A)' . (3.8)
nomical observations. In the late 1960s it seemed that an
excessively large number of quasars were being observed At its minimum this takes the value
with redshifts clustered about z =1.95. Since 1+z is the
=V,„=V,—"
4
ratio of the cosmic scale factor R(t) at. present to its &p) (3.9)
value at the time the light now observed was emitted, this
could be explained if the universe loitered for a while at a Apparently some theorists felt that V should vanish at
value of R (t) equal to 1/2. 95 times the present value. A $ =0, which would give Vo = 0, so that & p ) would be
number of authors [Petrosian, Salpeter, and Szekeres negative definite. In the electroweak theory this would
(1967); Shklovsky (1967); Rowan-Robinson (1968)j pro- give & p) = — g(300 GeV), which even for g as small as
posed that such a loitering could be accounted for in a a would yield & p ) = 10 GeV, larger than the bound
~ ~

model proposed by Lemaitre (1927, 1931). In this model on p~ by a factor 10 . Of course we know of no reason
there is a positive cosmological constant X,z and positive why Vo or A, must vanish, and it is entirely possible that
curvature k =+1, just as in the static Einstein model, Vo or A, cancels the term
—p /4g (and higher-order
while the mass of the universe is taken close to the Ein- corrections), but this example shows vividly how un-
stein value (2.4). The scale factor R (t) starts at R =0 natural it is to get a reasonably small effective cosmologi-
and then increases; however, when the mass density cal constant. Moreover, at early times the effective
drops to near the Einstein value (2.2), the universe temperature-dependent potential has a positive coefficient
behaves for a while like a static Einstein universe, until for P P, so the minimum then is at /=0, where
the instability of this model takes over and the universe V(P)= Vo. Thus, in order to get a zero cosmological
starts expanding again. In order for this idea to explain a constant today, we have to put up with an enormous
preponderance of redshifts at z =-1.95, the vacuum ener- cosmological constant at times before the electroweak
gy density pv would have to be (2. 95) times the present phase transition. [This is not in conflict with experiment;
nonvacuum mass density po. in fact, the phase transition occurs at a temperature T of
These considerations led Zeldovich (1967) to attempt order p/&g, so the black-body radiation present at that
to account for a nonzero vacuum energy density in terms
4Veltman (1975) attributes this view to Linde (1974), himself
(quoted as "to be published" ), and Dreitlein (1974). However,
Casimir (1948}showed that quantum fluctuations in the space Linde's paper does not seem to me to take this position.
between two Aat conducting plates with separation d would pro- Dreitlein's paper proposed that Eq. (3.9) could give an accept-
duce a force per unit area equal to Ac+ /240d, or 1.30X 10 ably small value of &p), with p/i/g fixed by the Fermi cou-
dyn cm /d . This was measured by Sparnaay (1957), who found pling constant of weak interactions, if p is very small, of order
a force per area of (1 —4)X10 ' dyncm /d, when d was 10 MeV. Veltman's paper gives experimental arguments
varied between 2 and 10 pm. against this possibility.

Rev. Mod. Phys. , Vol. 61, No. 1, January 1989


Steven Weinberg: The cosmological constant problem

time has an energy density of order p /g, larger than with c independent of g„. With this X, there are no
the vacuum energy by a factor 1/g (Bludman and Ruder- solutions of Eq. (3.11), unless for some reason the
man, 1977).] At even earlier times there were other tran- coefficient c vanishes when (3. 10) is satisfied.
sitions, implying an even larger early value for the Now that the problem has been posed, we turn to its
effective cosmological constant. This is currently regard- possible solution. The next five sections will describe five
ed as a good thing; the large early cosmological constant directions that have been taken in trying to solve the
would drive cosmic inAation, solving several of the long- problem of the cosmological constant.
standing problems of cosmological theory (Guth, 1981;
Albrecht and Steinhardt, 1982; Linde, 1982). We want to
IV. SUPERSYMMETRY, SUPERGRAVITY,
explain why the effective cosmological constant is small SUPERSTR INGS
now, not why it was always small.
Before closing this section, I want to take up a peculiar Shortly after the development of four-dimensional glo-
aspect of the problem of the cosmological constant. The appearance of an effective cosmological constant makes it impossible to find any solutions of the Einstein field equations in which $g_{\mu\nu}$ is the constant Minkowski term $\eta_{\mu\nu}$. That is, the original symmetry of general covariance, which is always broken by the appearance of any given metric $g_{\mu\nu}$, cannot, without fine-tuning, be broken in such a way as to preserve the subgroup of space-time translations.

This situation is unusual. Usually if a theory is invariant under some group $G$, we would not expect to have to fine-tune the parameters of the theory in order to find vacuum solutions that preserve any given subgroup $H \subset G$. For instance, in the electroweak theory, there is a finite range of parameters in which any number of doublet scalars will get vacuum expectation values that preserve a U(1) subgroup of SU(2)×U(1). So why will this not work for the translational subgroup of the group of general coordinate transformations? Suppose we look for a solution of the field equations that preserves translational invariance. With all fields constant, the field equations for matter and gravity are

$$\frac{\partial \mathcal{L}}{\partial \psi_i} = 0 , \tag{3.10}$$

$$\frac{\partial \mathcal{L}}{\partial g_{\mu\nu}} = 0 . \tag{3.11}$$

With $N$ $\psi$'s, these are $N+6$ equations for $N+6$ unknowns, so one might expect a solution without fine-tuning. The problem is that when (3.10) is satisfied, the dependence of $\mathcal{L}$ on $g_{\mu\nu}$ is too simple to allow a solution of (3.11). There is a GL(4) symmetry that survives as a vestige of general covariance even when we constrain the fields to be constants: under the GL(4) transformation

$$g_{\mu\nu} \to A^\rho{}_\mu A^\sigma{}_\nu\, g_{\rho\sigma} , \tag{3.12}$$

$$\psi_i \to D_{ij}(A)\,\psi_j , \tag{3.13}$$

the Lagrangian transforms as a density,

$$\mathcal{L} \to \mathrm{Det}A\;\mathcal{L} . \tag{3.14}$$

When Eq. (3.10) is satisfied, this implies that $\mathcal{L}$ transforms as in (3.14) under (3.12) alone. This has the unique solution

$$\mathcal{L} = c\,(\mathrm{Det}\,g)^{1/2} \tag{3.15}$$

bally supersymmetric field theories, Zumino (1975) pointed out that supersymmetry in these theories would, if unbroken, imply a vanishing vacuum energy. The argument is very simple: the supersymmetry generators $Q_\alpha$ satisfy an anticommutation relation

$$\{Q_\alpha, Q_\beta^\dagger\} = (\sigma_\mu)_{\alpha\beta}\,P^\mu , \tag{4.1}$$

where $\alpha$ and $\beta$ are two-component spin indices; $\sigma_1$, $\sigma_2$, and $\sigma_3$ are the Pauli matrices; $\sigma_0 = 1$; and $P^\mu$ is the energy-momentum 4-vector operator. If supersymmetry is unbroken, then the vacuum state $|0\rangle$ satisfies

$$Q_\alpha |0\rangle = Q_\alpha^\dagger |0\rangle = 0 , \tag{4.2}$$

and from (4.1) and (4.2) we infer that the vacuum has vanishing energy and momentum

$$\langle 0 | P^\mu | 0 \rangle = 0 .$$

This result can also be obtained by considering the potential $V(\phi,\phi^*)$ for the chiral scalar fields $\phi^i$ of a globally supersymmetric theory:

$$V(\phi,\phi^*) = \sum_i \left| \frac{\partial W(\phi)}{\partial \phi^i} \right|^2 , \tag{4.3}$$

where $W(\phi)$ is the so-called superpotential. (Gauge degrees of freedom are ignored here, but they would not change the argument.) The condition for unbroken supersymmetry is that $W$ be stationary in $\phi$, which would imply that $V$ take its minimum value,

$$V_{\min} = 0 . \tag{4.4}$$

Quantum effects do not change this conclusion, because with boson-fermion symmetry, the fermion loops cancel the boson ones.

The trouble with this result is that supersymmetry is broken in the real world, and in this case either (4.1) or (4.3) shows that the vacuum energy is positive-definite. If this vacuum energy were the sole contribution to the effective cosmological constant, then the effect of supersymmetry would be to convert the problem of the cosmological constant from a crisis into a disaster.

Fortunately this is not the whole story. It is not possible to decide the value of the effective cosmological constant unless we explicitly introduce gravitation into the theory. Any globally supersymmetric theory that in-

Rev. Mod. Phys., Vol. 61, No. 1, January 1989


Steven Weinberg: The cosmological constant problem

volves gravity is inevitably a locally supersymmetric supergravity theory. In such a theory the effective cosmological constant is given by the expectation value of the potential, but the potential is now given by (Cremmer et al., 1978, 1979; Barbieri et al., 1982; Witten and Bagger, 1982)

$$V(\phi,\phi^*) = \exp(8\pi G K)\left[ D_i W\,(g^{-1})^i{}_j\,(D_j W)^* - 24\pi G\,|W|^2 \right] , \tag{4.5}$$

where $K(\phi,\phi^*)$ is a real function of both $\phi$ and $\phi^*$ known as the Kahler potential, $D_i W$ is a sort of covariant derivative

$$D_i W \equiv \frac{\partial W}{\partial \phi^i} + 8\pi G\,\frac{\partial K}{\partial \phi^i}\,W , \tag{4.6}$$

and $(g^{-1})^i{}_j$ is the inverse of a metric

$$g^i{}_j \equiv \frac{\partial^2 K}{\partial \phi^i\,\partial \phi^{j*}} . \tag{4.7}$$

The condition for unbroken supersymmetry is now $D_i W = 0$. This again yields a stationary point of the potential, but now it is one at which $V$ is generally negative. In fact, even if we fine-tuned $W$ so that there were a supersymmetric stationary point at which $W = 0$ and hence $V = 0$, such a solution would not, in general, be the state of lowest energy, though it would be stable [Coleman and de Luccia (1980), Weinberg (1982)]. It should, however, be mentioned that if there is a set of field values at which $W = 0$ and $D_i W = 0$ for all $i$ in lowest order of perturbation theory, then the theory has a supersymmetric equilibrium configuration with $V = 0$ to all orders of perturbation theory, though not necessarily beyond perturbation theory (Grisaru, Siegel, and Rocek, 1979). The same is believed to be true in superstring perturbation theory (Dine and Seiberg, 1986; Friedan, Martinec, and Shenker, 1986; Martinec, 1986; Atick, Moore, and Sen, 1987; Morozov and Perelomov, 1987).

Without fine-tuning, we can generally find a nonsupersymmetric set of scalar field values at which $V = 0$ and $D_i W \neq 0$, but this would not normally be a stationary point of $V$. Thus in supergravity the problem of the cosmological constant is no more a disaster, but just as much a crisis, as in nonsupersymmetric theories.

On the other hand, supergravity theories offer opportunities for changing the context of the cosmological constant problem, if not yet for solving it. Cremmer et al. (1983) have noted that there is a class of Kahler potentials and superpotentials that, for a broad range of most parameters, automatically yield an equilibrium scalar field configuration in which $V = 0$, even though supersymmetry is broken. Here is a somewhat generalized version: the Kahler potential is

$$K = -\frac{3\ln\left|T + T^* - h(C^a, C^{a*})\right|}{8\pi G} + \tilde{K}(S^n, S^{n*}) , \tag{4.8}$$

while the superpotential is

$$W = W_1(C^a) + W_2(S^n) , \tag{4.9}$$

and $T$, $C^a$, $S^n$ are all chiral scalar fields. No constraints are placed on the functions $h(C^a, C^{a*})$, $\tilde{K}(S^n, S^{n*})$, $W_1(C^a)$, or $W_2(S^n)$, except that $h$ and $\tilde{K}$ are real, and the functions all depend only on the fields indicated; in particular, the superpotential must be independent of the single chiral scalar $T$.

With these conditions the potential (4.5) takes the form

$$V = \exp(8\pi G \tilde{K})\left[ \frac{8\pi G}{3\,|T+T^*-h|^2}\,\frac{\partial W}{\partial C^a}\,(N^{-1})^a{}_b \left(\frac{\partial W}{\partial C^b}\right)^* + \frac{1}{|T+T^*-h|^3}\,(D_n W)\,(g^{-1})^n{}_m\,(D_m W)^* \right] , \tag{4.10}$$

where $(N^{-1})^a{}_b$ is the reciprocal of the matrix

$$N^a{}_b \equiv \frac{\partial^2 h}{\partial C^a\,\partial C^{b*}} . \tag{4.11}$$

The matrices $N^a{}_b$ and $g^n{}_m$ are necessarily positive-definite, because of their role in the kinetic part of the scalar Lagrangian

$$-\frac{\mathcal{L}_{\rm kin}}{\sqrt{g}} = \frac{3}{8\pi G\,|T+T^*-h|^2} \left( \frac{\partial T}{\partial x^\mu} - \frac{\partial h}{\partial C^a}\frac{\partial C^a}{\partial x^\mu} \right) \left( \frac{\partial T}{\partial x_\mu} - \frac{\partial h}{\partial C^b}\frac{\partial C^b}{\partial x_\mu} \right)^* + \frac{3}{8\pi G\,|T+T^*-h|}\,N^a{}_b\,\frac{\partial C^a}{\partial x^\mu}\frac{\partial C^{b*}}{\partial x_\mu} + g^n{}_m\,\frac{\partial S^n}{\partial x^\mu}\frac{\partial S^{m*}}{\partial x_\mu} . \tag{4.12}$$

Hence Eq. (4.10) is positive and therefore, without further fine-tuning, may be expected to have a stationary point with $V = 0$, specified by the conditions

$$\frac{\partial W}{\partial C^a} = D_n W = 0 . \tag{4.13}$$

But this is not necessarily a supersymmetric configuration, because here

$$D_a W \equiv \frac{\partial W}{\partial C^a} + 8\pi G\,\frac{\partial K}{\partial C^a}\,W = \frac{3}{|T+T^*-h|}\,\frac{\partial h}{\partial C^a}\,W , \tag{4.14}$$

and this does not necessarily vanish. (However, to have supersymmetry broken, it is essential that the superpotential actually depend on all of the chiral scalars $S^n$, because otherwise the conditions $D_n W = 0$ would require $W = 0$ and hence $D_a W = 0$.)

The superpotential $W$ depends on $C^a$ and $S^n$, but not on $T$, so the conditions (4.13) will generally fix the values of $C^a$ and $S^n$ at the minimum of $V$, while leaving $T$ undetermined. The field $T$ enters the potential only in the overall scale of the part that depends on the $C^a$, so such theories are called "no-scale" models. An intensive phenomenological study of these models was carried out at CERN for several years following 1983 (Ellis, Lahanas, et al., 1984; Ellis, Kounnas, et al., 1984; Barbieri et al., 1985).

Of course, these models do not solve the cosmological constant problem, because neither Eq. (4.8) nor Eq. (4.9) is dictated by any known physical principle. In particular, in order to cancel the second term in Eq. (4.5), it is essential that the coefficient of the logarithm in the first term in (4.8) be given the apparently arbitrary value $-3/8\pi G$.

It was therefore exciting when, in some of the first work on the physical implications of superstring theory, it was found that compactification of six of the ten original dimensions yielded a four-dimensional supergravity theory with Kahler potential and superpotential of the form (4.8) and (4.9). Specifically, Witten (1985) found a Kahler potential of the form (4.8), with $h$ quadratic in the $C$'s and $\tilde{K} = -\ln(S+S^*)/8\pi G$, but with a superpotential that depended solely on the $C$'s. By including nonperturbative gaugino condensation effects, Dine et al. (1985) were able to give the superpotential a dependence on $S$ (though they did not treat the dependence of the Kahler potential or superpotential on the $C^a$ fields). In this work, the $S$ field is a complex function (now often called $Y$) of four-dimensional dilaton and axion fields, while the $T$ field represents the scale of the compactified six-dimensional manifold. The factor 3 in Eq. (4.8) arises in these models because one compactifies on a complex manifold with (10 - 4)/2 = 3 complex dimensions (Chang et al., 1988).

Intriguing as these results are, they have not been taken seriously (even by the original authors) as a solution of the cosmological constant problem. The trouble is that no one expects the simple structures (4.8) and (4.9) to survive beyond the lowest order of perturbation theory, because they are not protected by any symmetry that survives down to accessible energies.

Recently Moore (1987a, 1987b) has attempted a more specifically "stringy" attack on the problem. Early work by Rohm (1984) and Polchinski (1986) had shown that in the calculation of the vacuum energy density, the sum over zero-point energies can be converted into an integral over a complex "modular parameter" $\tau$. (In string theories, two-dimensional conformal symmetry makes the tree-level vacuum energy vanish.) Last year Moore pointed out that for some special compactifications there is a discrete symmetry of modular space, known as Atkin-Lehner symmetry, that makes the integral over $\tau$ vanish despite the absence of space-time supersymmetry. So far, the only examples where this occurs entail a compactification to two rather than four space-time dimensions, but it does not seem unlikely that four-dimensional examples could be found. A more serious obstacle is that the Atkin-Lehner symmetry seems irretrievably tied to one-loop order.

Indeed, it is very hard to see how any property of supergravity or superstring theory could make the effective cosmological constant sufficiently small. It is not enough that the vacuum energy density cancel in lowest order, or to all finite orders of perturbation theory; even nonperturbative effects like ordinary QCD instantons would give far too large a contribution to the effective cosmological constant if not canceled by something else. According to our modern theories, properties of elementary particles, like approximate baryon and lepton conservation, are dictated by gauge symmetries of the standard model, which survive down to accessible energies. We know of no such symmetry (aside from the unrealistic example of unbroken supersymmetry) that could keep the effective cosmological constant sufficiently small. It is conceivable that in supergravity the property of having zero effective cosmological constant does survive to low energies without any symmetry to guard it, but this would run counter to all our experience in physics.

V. ANTHROPIC CONSIDERATIONS

I now turn to a very different approach to the cosmological constant, based on what Carter (1974) has called the anthropic principle.⁵ Briefly stated, the anthropic principle has it that the world is the way it is, at least in part, because otherwise there would be no one to ask why it is the way it is. There are a number of different versions of this principle, ranging from those that are so weak as to be trivial to those that are so strong as to be absurd. Three of these versions seem worth distinguishing here.

(i) In one very weak version, the anthropic principle amounts simply to the use of the fact that we are here as one more experimental datum. For instance, recall M. Goldhaber's joke that "we know in our bones" that the lifetime of the proton must be greater than about $10^{16}$ yr, because otherwise we would not survive the ionizing particles produced by proton decay in our own bodies. No one can argue with this version, but it does not help us to explain anything, such as why the proton lives so long. Nor does it give very useful experimental information; certainly experimental physicists (including Goldhaber) have provided us with better limits on the proton lifetime.

⁵Recent discussions of the anthropic principle are given in the books by Davies (1982) and Barrow and Tipler (1986); and in articles by Carter (1983), Page (1987), and Rees (1987).


(ii) In one rather strong version, the anthropic principle states that the laws of nature, which are otherwise incomplete, are completed by the requirement that conditions must allow intelligent life to arise, the reason being that science (and quantum mechanics in particular) is meaningless without observers. I do not know how to reach a decision about such matters and will simply state my own view, that although science is clearly impossible without scientists, it is not clear that the universe is impossible without science.

(iii) A moderate version of the anthropic principle, sometimes known as the "weak anthropic principle," amounts to an explanation of which of the various possible eras or parts of the universe we inhabit, by calculating which eras or parts of the universe we could inhabit. An example is provided by what I think is the first use of anthropic arguments in modern physics, by Dicke (1961), in response to a problem posed by Dirac (1937). In effect, Dirac had noted that a combination of fundamental constants with the dimensions of a time turns out to be roughly of the order of the present age of the universe:

$$\hbar^2 / G c\, m_\pi^3 = 4.5 \times 10^{10}\ \mathrm{yr} . \tag{5.1}$$

[There are various other ways of writing this relation, such as replacing $m_\pi$ with various combinations of particle masses and introducing powers of $e^2/\hbar c$. Dirac's original "large-number" coincidence is equivalent to using Eq. (5.1) as a formula for the age of the universe, with $m_\pi$ replaced by $(137\, m_p m_e^2)^{1/3}$ = 183 MeV. In fact, there are so many different possibilities that one may doubt whether there is any coincidence that needs explaining.] Dirac reasoned that if this connection were a real one, then, since the age of the universe increases (linearly) with time, some of the constants on the left side of (5.1) must change with time. He guessed that it is $G$ that changes, like $1/t$. [Zee (1985) has applied similar arguments to the cosmological constant itself.] In response to Dirac, Dicke pointed out that the question of the age of the universe could only arise when the conditions are right for the existence of life. Specifically, the universe must be old enough so that some stars will have completed their time on the main sequence to produce the heavy elements necessary for life, and it must be young enough so that some stars would still be providing energy through nuclear reactions. Both the upper and lower bounds on the ages of the universe at which life can exist turn out to be roughly (very roughly) given by just the quantity (5.1). Hence there is no need to suppose that any of the fundamental constants vary with time to account for the rough agreement of the quantity (5.1) with the present age of the universe.

It is this "weak anthropic principle" that will be applied here. Its relevance arises from the fact that, in some modern cosmological models, the universe does have parts or eras in which the effective cosmological constant takes a wide variety of values. Here are some examples.

(1) The vacuum energy may depend on a scalar field vacuum expectation value that changes slowly as the universe expands, as in a model of Banks (1985).

(2) In a model of Linde (1986, 1987, 1988b), fluctuations in scalar fields produce exponentially expanding regions of the universe, within which further fluctuations produce further subuniverses, and so on. Since these subuniverses arise from fluctuations in the fields, they have differing values of various "constants" of nature.

(3) The universe may go through a very large number of first-order phase transitions in which bubbles of smaller vacuum energy form; within these bubbles there form further bubbles of even smaller vacuum energy, and so on. This can happen if the potential for some scalar field has a large number of small bumps, as in a model of Abbott (1985). Alternatively, the bubble walls may be elementary membranes coupled to a 3-form gauge potential $A_{\mu\nu\lambda}$, as in the work of Brown and Teitelboim (1987a, 1987b).

(4) The universe may start in a quantum state in which the cosmological constant does not have a precise value. Any "measurement" of the properties of the universe yields a variety of possible values for the cosmological constant, with a priori probabilities determined by the initial state (Hawking, 1987a). We will see examples of this in Secs. VII and VIII.

In models of these types, it is perfectly sensible to apply anthropic considerations to decide which era or part of the universe we could inhabit, and hence which values of the cosmological constant we could observe.

A large cosmological constant would interfere with the appearance of life in different ways, depending on the sign of $\lambda_{\rm eff}$. For a large positive $\lambda_{\rm eff}$, the universe very early enters an exponentially expanding de Sitter phase, which then lasts forever. The exponential expansion interferes with the formation of gravitational condensations, but once a clump of matter becomes gravitationally bound, its subsequent evolution is unaffected by the cosmological constant. Now, we do not know what weird forms life may take, but it is hard to imagine that it could develop at all without gravitational condensations out of an initially smooth universe. Therefore the anthropic principle makes a rather crisp prediction: $\lambda_{\rm eff}$ must be small enough to allow the formation of sufficiently large gravitational condensations (Weinberg, 1987).

This has been worked out quantitatively, but we can easily understand the main result without detailed calculations. We know that in our universe gravitational condensation had already begun at a redshift $z_c \ge 4$. At this time, the energy density was greater than the present mass density $\rho_{M0}$ by a factor $(1+z_c)^3 \ge 125$. A cosmological constant has little effect as long as the nonvacuum energy density is larger than $\rho_V$, so one can conclude that a vacuum energy density $\rho_V$ no larger than, say, $100\rho_{M0}$ would not be large enough to prevent gravitational condensations. [The quantitative analysis of Weinberg (1987) shows that for $k = 0$, a vacuum energy density no greater than $\pi^2 (1+z_c)^3 \rho_{M0}/3$ would not prevent gravitational condensation at a redshift $z_c$; this is $410\rho_{M0}$ for $z_c = 4$.]

This result suggests strongly that if it is the anthropic principle that accounts for the smallness of the cosmological constant, then we would expect a vacuum energy density $\rho_V \sim (10\text{-}100)\rho_{M0}$, because there is no anthropic reason for it to be any smaller.

Is such a large vacuum energy density observationally allowed? There are a number of different types of astronomical data that indicate differing answers to this question.

A. Mass density

If, as often assumed, the universe now has negligible spatial curvature, then

$$\Omega_V + \Omega_{M0} = 1 , \tag{5.2}$$

where $\Omega_V$ and $\Omega_{M0}$ are the ratios of the vacuum energy density and the present mass density to the critical density

$$\Omega_V \equiv \frac{8\pi G \rho_V}{3H_0^2} , \qquad \Omega_{M0} \equiv \frac{8\pi G \rho_{M0}}{3H_0^2} . \tag{5.3}$$

The dynamics of clusters of galaxies seems to indicate that $\Omega_{M0}$ is in the range 0.1-0.2 (Knapp and Kormendy, 1987), which with these assumptions would indicate a value for $\rho_V/\rho_{M0}$ in the range 4-9. If we discount the evidence from the dynamics of clusters of galaxies, then $\Omega_{M0}$ could be as small as 0.02 (Knapp and Kormendy, 1987), corresponding to a value of $\rho_V/\rho_{M0} \sim 50$. [See also Bahcall et al. (1987).]

B. Ages

In a dust-dominated universe with $k = 0$ and $\rho_V = 0$, the age of the universe is $t_0 = 2/3H_0$. For $H_0 = 100$ km/sec Mpc, this is $7 \times 10^9$ yr, considerably less than the ages usually estimated for globular clusters (Renzini, 1986). On the other hand, for a dust-dominated universe with $k = 0$ and $\rho_V > 0$, the present age of an object that formed at a redshift $z_c$ is

$$t_0(z_c) = \frac{2}{3}\left(1 + \frac{\rho_{M0}}{\rho_V}\right)^{1/2} H_0^{-1} \left[ \sinh^{-1}\left(\frac{\rho_V}{\rho_{M0}}\right)^{1/2} - \sinh^{-1}\left( \left(\frac{\rho_V}{\rho_{M0}}\right)^{1/2} (1+z_c)^{-3/2} \right) \right] . \tag{5.4}$$

For instance, for $z_c = 4$ and $\rho_V/\rho_{M0} = 9$ (i.e., $\Omega_{M0} = 0.1$), this gives an age $1.1H_0^{-1}$ in place of $\tfrac{2}{3}H_0^{-1}$. This is not in conflict with globular cluster ages even for Hubble constants near 100 km/sec Mpc.

These considerations of cosmic age and density have led a number of astronomers to suggest a fairly large positive cosmological constant, with $\rho_V \gtrsim \rho_{M0}$ [de Vaucouleurs (1982, 1983); Peebles (1984, 1987a, 1987b); Turner, Steigman, and Krauss (1984)]. However, there recently has appeared a strong argument against this view, which we shall now consider.
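
The figures quoted in Secs. V.A and V.B can be checked with a few lines of arithmetic. The following Python sketch is an illustration added here, not part of the original analysis; the function names and the unit conversions (kilometers per megaparsec, seconds per year) are assumptions introduced for this purpose. It evaluates the condensation factor $(1+z_c)^3$, the bound $\pi^2(1+z_c)^3/3$, the flat-universe ratio $\rho_V/\rho_{M0}$ implied by Eq. (5.2), and the age formula (5.4).

```python
import math

# Assumed unit conversions (not taken from the article).
KM_PER_MPC = 3.0857e19
SECONDS_PER_YEAR = 3.156e7


def hubble_time_yr(h0_km_s_mpc):
    """Hubble time 1/H0 in years, for H0 given in km/sec/Mpc."""
    return KM_PER_MPC / h0_km_s_mpc / SECONDS_PER_YEAR


def condensation_bound(z_c):
    """Upper bound on rho_V / rho_M0 quoted from Weinberg (1987): pi^2 (1+z_c)^3 / 3."""
    return math.pi ** 2 * (1.0 + z_c) ** 3 / 3.0


def density_ratio(omega_m0):
    """rho_V / rho_M0 in a spatially flat universe, using Omega_V + Omega_M0 = 1."""
    return (1.0 - omega_m0) / omega_m0


def age_in_hubble_times(ratio, z_c):
    """Eq. (5.4): present age, in units of 1/H0, of an object formed at redshift z_c.

    `ratio` is rho_V / rho_M0 and must be positive.
    """
    root = math.sqrt(ratio)
    prefactor = (2.0 / 3.0) * math.sqrt(1.0 + 1.0 / ratio)
    return prefactor * (math.asinh(root) - math.asinh(root * (1.0 + z_c) ** -1.5))


if __name__ == "__main__":
    z_c = 4
    print("(1 + z_c)^3            =", (1 + z_c) ** 3)
    print("pi^2 (1 + z_c)^3 / 3   =", round(condensation_bound(z_c), 1))
    for omega in (0.2, 0.1, 0.02):
        print(f"Omega_M0 = {omega}: rho_V/rho_M0 =", round(density_ratio(omega), 1))
    print("2/(3 H0) for H0 = 100  =", f"{(2.0 / 3.0) * hubble_time_yr(100.0):.2e}", "yr")
    print("t0 H0 for ratio 9, z_c 4 =", round(age_in_hubble_times(9.0, z_c), 2))
```

With these inputs the sketch gives a factor of 125, a bound of about 411, density ratios of 4, 9, and 49, an age of roughly $6.5 \times 10^9$ yr for $\rho_V = 0$, and about $1.09H_0^{-1}$ from Eq. (5.4), in line with the rounded values in the text.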

C. Number counts

Loh and Spillar (1986) have carried out a survey of numbers of galaxies as a function of redshift, subsequently analyzed by Loh (1986). For a uniformly distributed class of objects that are all bright enough to be detectable at redshifts $z \le z_{\max}$, the number of objects observed at redshift less than $z \le z_{\max}$ in a dust-dominated universe with $k = 0$ is

$$N(<z) \propto \int_{(1+z)^{-1}}^{1} ds\; s^{-1/2} \left(1 + \rho_V s^3/\rho_{M0}\right)^{-1/2} \left[ \int_s^1 ds'\; s'^{-1/2} \left(1 + \rho_V s'^3/\rho_{M0}\right)^{-1/2} \right]^2 . \tag{5.5}$$
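
To see how a number count of this kind depends on $\rho_V/\rho_{M0}$, one can evaluate it numerically. The Python sketch below is an illustration added here under stated assumptions: it takes the integrand to be $f(s) = s^{-1/2}(1 + \rho_V s^3/\rho_{M0})^{-1/2}$ with $s = 1/(1+z)$, which is the coordinate-distance integrand for a flat dust-plus-vacuum universe, and it uses the identity that the nested integral $\int f\,[\int f]^2$ equals $r^3/3$, where $r$ is the inner integral taken over the full range. The function names, the midpoint rule, and the step count are choices made for this example. Because the overall normalization depends on $\rho_{M0}$, only ratios of counts at two redshifts are compared.

```python
def coordinate_distance(z, ratio, steps=2000):
    """Integral of s^(-1/2) (1 + ratio * s^3)^(-1/2) from 1/(1+z) to 1.

    `ratio` is rho_V / rho_M0. Midpoint rule; the result is dimensionless,
    with the overall normalization left out.
    """
    lower = 1.0 / (1.0 + z)
    width = (1.0 - lower) / steps
    total = 0.0
    for i in range(steps):
        s = lower + (i + 0.5) * width
        total += s ** -0.5 * (1.0 + ratio * s ** 3) ** -0.5
    return total * width


def number_count(z, ratio):
    """N(<z) up to a constant: the nested integral reduces to r^3 / 3."""
    return coordinate_distance(z, ratio) ** 3 / 3.0


def count_shape(ratio, z_low=0.25, z_high=0.75):
    """Ratio N(<z_high) / N(<z_low), which is independent of normalization."""
    return number_count(z_high, ratio) / number_count(z_low, ratio)


if __name__ == "__main__":
    for ratio in (0.0, 1.0, 9.0):
        print(f"rho_V/rho_M0 = {ratio}: N(<0.75)/N(<0.25) = {count_shape(ratio):.2f}")
```

For $\rho_V = 0$ the integral can be done in closed form, $r = 2[1 - (1+z)^{-1/2}]$, which gives a quick check of the numerical routine. A larger vacuum energy makes the counts rise more steeply with redshift, which is the effect a survey of this type is sensitive to.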

Of course, in the real world there are always some objects too dim to be seen. Loh's analysis allowed for an unknown luminosity distribution, assuming only that its shape does not evolve with time. Under these assumptions, he found that the vacuum energy must be quite small: specifically,

$$\rho_V/\rho_{M0} = 0.1^{+0.2}_{-0.4} .$$

This is more than 3 orders of magnitude below the anthropic upper bound discussed earlier. If the effective cosmological constant is really this small, then we would have to conclude that the anthropic principle does not explain why it is so small. [However, there are reasons to be cautious in reaching this conclusion. Bahcall and Tremaine (1988) have recently reanalyzed the data of Loh and Spillar, using a plausible model of galaxy evolution in which the shape of the luminosity distribution does change with time. They considered only the case $\rho_V = 0$, leaving $\Omega_{M0}$ undetermined, and found that evolution in this model could increase or decrease the inferred value of $\Omega_{M0}$ by as much as unity. Presumably it would also have a similarly large effect on the inferred value of $\rho_V/\rho_{M0}$ when $\Omega_{M0} + \Omega_V$ is constrained to be unity. In addition, the redshifts of Loh and Spillar are photometric and therefore less certain than those obtained from shifts of individual spectral lines.]

Now let us consider a cosmological constant of the other sign, $\lambda_{\rm eff} < 0$. Here the cosmological constant does not interfere with the formation of gravitational condensations. Instead (for $k = 0$ or $k = +1$), the whole universe collapses to a singularity in a finite time $T$. The anthropic constraint here is simply that the universe last long enough for the appearance of life (Barrow and Tipler, 1986), say, $T \gtrsim 0.5H_0^{-1}$, where $H_0^{-1}$ is the Hubble time in our universe. For a dust-dominated universe with $k = 0$, we have

$$T = \pi\left(6\pi G |\rho_V|\right)^{-1/2} , \qquad H_0 = \left(8\pi G \rho_{M0}/3\right)^{1/2} , \tag{5.6}$$

so the anthropic constraint here is just

$$|\rho_V| \lesssim \rho_{M0} . \tag{5.7}$$

In this case the anthropic principle can explain why the cosmological constant is as small as found by Loh (1986), but not much smaller. On the other hand, a negative cosmological constant would not help with the cosmic mass and age problems.

Before closing this section, let me take up one possibility that may confront us in a few years. Suppose it really is confirmed that, as suggested by cosmic ages and densities, there is a cosmological constant with $\rho_V$ of order $\rho_{M0}$. Would we then have any alternative to an anthropic explanation for this value of $\rho_V$? The mass density $\rho_M$ changes with time, so without anthropic considerations it is very hard to explain why a constant $\rho_V$ should equal the value that $\rho_M$ happens to have at present. But perhaps $\rho_V$ really is not constant. For instance, Peebles and Ratra (1988) and Ratra and Peebles (1988) have considered a model in which the vacuum energy depends on a scalar field that changes as the universe expands. In order to qualify as a vacuum energy, it is only necessary for $\rho_V$ to be accompanied with a pressure $p_V = -\rho_V$; the value of $\rho_V$ can change if the vacuum exchanges energy with matter and radiation. The conservation of energy then relates the change of $\rho_V$ to the change in the densities of matter (with $p_M = 0$) and radiation (with $p_R = \rho_R/3$):

$$\frac{d\rho_V}{dt} + R^{-3}\frac{d}{dt}\left(R^3 \rho_M\right) + R^{-4}\frac{d}{dt}\left(R^4 \rho_R\right) = 0 . \tag{5.8}$$

Freese et al. (1987) have considered the possibility that energy is exchanged only between the vacuum and matter, or the vacuum and radiation, in such a way that either $\rho_V/\rho_M$ or $\rho_V/\rho_R$ remain constant, respectively (see also Reuter and Wetterich, 1987). In order for the vacuum to transfer energy to ordinary matter in such a way that $\rho_V/\rho_M$ remains fixed, and if baryon number is conserved, then it would be necessary to create baryon-antibaryon pairs at a sufficient rate to produce a troublesome $\gamma$-ray background. Alternatively, if the vacuum transfers energy to radiation in such a way that $\rho_V/\rho_R$ remains constant, and if $\rho_V$ is comparable with the present mass density $\rho_{M0}$, then $\rho_V/\rho_R$ must be rather large, completely changing the results of cosmological nucleosynthesis.

One more possibility that was not considered by Freese et al. is that the vacuum transfers energy to radiation, avoiding the problems of baryon-antibaryon annihilation, but in such a way as to keep a fixed ratio $\rho_V/\rho_M$ rather than $\rho_V/\rho_R$. However, this also does not work. With $\rho_V = c\rho_M$ and $R^3\rho_M$ constant, Eq. (5.8) yields

$$\rho_R = \rho_{R0}\left(\frac{R_0}{R}\right)^4 + 3c\,\rho_{M0}\left[\left(\frac{R_0}{R}\right)^3 - \left(\frac{R_0}{R}\right)^4\right] \simeq \left(\rho_{R0} - 3c\,\rho_{M0}\right)\left(\frac{R_0}{R}\right)^4 \quad (R \ll R_0) . \tag{5.9}$$

So that there is no interference with calculations of cosmological nucleosynthesis, we need

$$\rho_R \simeq \rho_{R0}\left(\frac{R_0}{R}\right)^4$$

and therefore

$$|\rho_V|/\rho_M = |c| \lesssim \frac{\rho_{R0}}{3\rho_{M0}} \ll 1 . \tag{5.10}$$

Thus, even if we are willing to suppose that the vacuum energy changes with time, a vacuum energy density comparable with the present mass density seems very difficult to explain on other than anthropic grounds.

VI. ADJUSTMENT MECHANISMS

I now turn to an idea that has been tried by virtually everyone who has worried about the cosmological constant [see, e.g., Dolgov (1982); Wilczek and Zee (1983); Wilczek (1984, 1985); Peccei, Sola, and Wetterich (1987); Barr and Hochberg (1988)]. Suppose there is some scalar $\phi$ whose source is proportional to the trace of the energy-momentum tensor

$$\Box\phi \propto T^\mu{}_\mu \propto R . \tag{6.1}$$

(Here $T^{\mu\nu}$ is the total energy-momentum tensor that includes a possible cosmological constant term $\lambda g^{\mu\nu}/8\pi G$.) Suppose also that $T^\mu{}_\mu$ depends on $\phi$ and vanishes at some field value $\phi_0$. Then $\phi$ will evolve until it reaches an equilibrium value $\phi_0$, where $T^\mu{}_\mu = 0$, and the


Einstein field equations have a flat-space solution.

Of course, we do not observe such a scalar field, but for these purposes it can couple as weakly as we like; a weak coupling simply implies that the equilibrium value $\phi_0$ is very large. In this respect the scalar $\phi$ is analogous to the axion, especially in its later "invisible" version [Kim (1979); Dine, Fischler, and Srednicki (1981)].

Even very weakly coupled, it is possible that the $\phi$ field could have interesting effects, because it must have very small mass. If it has any nonzero mass $m_\phi$, then at energies below $m_\phi$ we can work with an effective Lagrangian in which $\phi$ has been "integrated out," and so does not appear explicitly. But massless fields like the gravitational and electromagnetic field will still appear in this effective Lagrangian, and their vacuum fluctuations will contribute to the effective cosmological constant. In order to keep $\rho_V < 10^{-48}\ \mathrm{GeV}^4$, we need the scalar field adjustment to cancel the effect of gravitational and electromagnetic field fluctuations down to frequencies $10^{-12}$ GeV; for this purpose we must have $m_\phi < 10^{-12}$ GeV. A field this light will have a macroscopic range: $\hbar/m_\phi c \gtrsim 0.01$ cm.⁶

⁶This remark is due to Polchinski (1987).

Unfortunately it seems to be impossible to construct a theory with one or more scalar fields having the assumed properties. This can be seen in very general terms. What we want is to find an equilibrium solution of the field equations in which $g_{\mu\nu}$ and all matter fields $\psi_n$ (perhaps tensors as well as scalars) are constant in space-time. For such constant fields the Euler-Lagrange equations are simply

$$\frac{\partial \mathcal{L}}{\partial g_{\mu\nu}} = 0 , \tag{6.2}$$

$$\frac{\partial \mathcal{L}}{\partial \psi_n} = 0 . \tag{6.3}$$

As we saw in Sec. III, the problem is in satisfying the trace of the gravitational field equation. To make a solution natural, we would like this trace to be a linear combination of the $\psi_n$ field equations; that is, we want

$$2 g_{\mu\nu}\frac{\partial \mathcal{L}(g,\psi)}{\partial g_{\mu\nu}} = \sum_n f_n(\psi)\,\frac{\partial \mathcal{L}(g,\psi)}{\partial \psi_n} \tag{6.4}$$

for all constant $g_{\mu\nu}$ and $\psi_n$. This can be restated as a symmetry condition: for constant fields the Lagrangian must be invariant under the transformation

$$\delta g_{\mu\nu} = 2\epsilon\, g_{\mu\nu} , \qquad \delta\psi_n = -\epsilon f_n(\psi) . \tag{6.5}$$

With this condition, if we find a solution $\psi^{(0)}$ of the Euler-Lagrange equations for $\psi_n$,

$$\frac{\partial \mathcal{L}}{\partial \psi_n} = 0 \quad \text{at} \quad \psi_n = \psi_n^{(0)} , \tag{6.6}$$

then the trace of the field equation for $g_{\mu\nu}$ is automatically satisfied.

The problem is that under these assumptions, it is impossible (without fine-tuning $\mathcal{L}$) to find a solution to the field equations (6.3) for the $\psi_n$. To see this, we replace the $N$ fields $\psi_n$ with $N-1$ fields $\sigma_a$ (not necessarily scalars) and one scalar $\phi$, in such a way that the symmetry transformation (6.5) takes the form

$$\delta g_{\mu\nu} = 2\epsilon\, g_{\mu\nu} , \qquad \delta\sigma_a = 0 , \qquad \delta\phi = -\epsilon . \tag{6.7}$$

[To do this, we first define a "transverse" surface $S$ in field space by an equation $T(\psi) = 0$, where $T(\psi)$ is any function for which $\sum_n (\partial T/\partial\psi_n) f_n(\psi)$ does not vanish. We take $\sigma_a$ as any set of coordinates on this $(N-1)$-dimensional surface, and define $\psi_n(\sigma;\phi)$ as the solution of the ordinary differential equation $d\psi_n/d\phi = f_n(\psi)$ subject to the condition that at $\phi = 0$, $\psi_n$ is at the point on $S$ with coordinates $\sigma$. The condition that $S$ be a transverse surface ensures that, at least within a finite region of field space, any point $\psi_n$ is on just one of these trajectories.] This symmetry simply ensures that for constant fields the Lagrangian can depend on $g_{\mu\nu}$ and $\phi$ only in the combination $e^{2\phi} g_{\mu\nu}$. The general arguments of Sec. III then show that when the field equations for $\sigma$ are satisfied, the Lagrangian must take the form

$$\mathcal{L} = e^{4\phi}\,(\mathrm{Det}\,g)^{1/2}\,\mathcal{L}_0(\sigma) . \tag{6.8}$$

We see that the source of $\phi$ is the trace of the energy-momentum tensor

$$\frac{\partial \mathcal{L}}{\partial \phi} = T^\mu{}_\mu\,(\mathrm{Det}\,g)^{1/2} , \tag{6.9}$$

$$T^{\mu\nu} = g^{\mu\nu}\, e^{4\phi}\,\mathcal{L}_0(\sigma) . \tag{6.10}$$

It is true that if there were a value of $\phi$ where $\mathcal{L}$ is stationary in $\phi$, then the trace of the Einstein field equations would automatically be satisfied at this point, but clearly there is no such stationary field value (unless, of course, we fine-tune $\mathcal{L}_0$ so that it vanishes at its stationary point). To put this another way, since $\mathcal{L}$ depends on $\phi$ and $g_{\mu\nu}$ only in the combination $\hat{g}_{\mu\nu} = e^{2\phi} g_{\mu\nu}$ (and derivatives of $\phi$ and $g_{\mu\nu}$), we might as well redefine the metric as $\hat{g}_{\mu\nu}$ instead of $g_{\mu\nu}$. Then $\phi$ is just a scalar with only derivative couplings and clearly cannot help with our problem.

As one example of many failed attempts along this line, let us consider a proposal of Peccei, Sola, and Wetterich (1987).⁷ They observed that the symmetry (6.5) or (6.7) may be broken by conformal anomalies, such as those that produce the $\beta$ function of quantum chromodynamics, in such a way that the effective Lagrangian becomes

$$\mathcal{L}_{\rm eff} = (\mathrm{Det}\,g)^{1/2}\left[ e^{4\phi}\,\mathcal{L}_0(\sigma) + \phi\,\Theta^\mu{}_\mu \right] , \tag{6.11}$$

where $\Theta^\mu{}_\mu$ represents the effect of the conformal anomaly. The source of the $\phi$ field is now

$$\frac{\partial \mathcal{L}_{\rm eff}}{\partial \phi} = \left(T^\mu{}_\mu + \Theta^\mu{}_\mu\right)(\mathrm{Det}\,g)^{1/2} , \tag{6.12}$$

with $T^{\mu\nu}$ the previous energy-momentum tensor (6.10). Now we can find an equilibrium solution for the $\phi$ field, at a value $\phi_0$ such that

$$4 e^{4\phi_0}\,\mathcal{L}_0 + \Theta^\mu{}_\mu = 0 . \tag{6.13}$$

The trouble is that this is not the condition for a flat-space solution; the Einstein equation for a constant metric is

$$0 = \frac{\partial \mathcal{L}_{\rm eff}}{\partial g_{\mu\nu}} \propto e^{4\phi}\,\mathcal{L}_0 + \phi\,\Theta^\mu{}_\mu , \tag{6.14}$$

which is not the same as (6.13). The point is that just calling the anomalous term in (6.11) $\Theta^\mu{}_\mu$ does not make it a term in the trace of the energy-momentum tensor to which $g_{\mu\nu}$ is coupled. This result is not surprising, since (6.11) does not obey the symmetry (6.7). One cannot have it both ways: either we preserve the symmetry, in which case there is no equilibrium solution for $\phi$, or we break the symmetry, in which case such an equilibrium solution does not imply a solution of the field equations for a constant metric. (Also see Coughlan et al., 1988; Wetterich, 1988.)

⁷An equation essentially equivalent to (6.11) appeared in the preprint version of the paper by Peccei, Sola, and Wetterich (1987). In the published version this equation was removed, and it was acknowledged that fine-tuning is still needed to make the cosmological constant vanish. However, this equation was quoted in the meantime in a paper by Ellis, Tsamis, and Voloshin (1987), which mostly deals with the observable consequences of the light scalar particle in this model.

In a slightly different version of this general class of models, we can try coupling a scalar field so that it is the curvature scalar $R$ rather than the trace of the energy-momentum tensor that directly serves as the source of the scalar field. [See, e.g., Dolgov (1982); Barr (1987); Ford (1987).] For instance, we might take the Lagrangian as

cal assumptions that later turn out to have exceptions of great physical interest. (A famous example is the Coleman-Mandula theorem.) More discouraging than any theorem is the fact that many theorists have tried to invent adjustment mechanisms to cancel the cosmological constant, but without any success so far.

VII. CHANGING GRAVITY

A number of authors have suggested changing the rules of classical general relativity in such a way that the cosmological constant appears as a constant of integration, unrelated to any parameters in the action [Van der Bij et al. (1982); Weinberg (1983); Wilczek and Zee (1983); Buchmüller and Dragon (1988a, 1988b)]. This does not solve the cosmological constant problem, but it does change it in a suggestive way.

I will describe one version of this idea, in which one maintains general covariance, but reinterprets the formalism so that the determinant of the metric is not a dynamical field. Any theory can be written in a way that is formally generally covariant, so by the usual arguments we can take the action for gravity and matter as

$$I[\psi, g] = -\frac{1}{16\pi G}\int d^4x\,\sqrt{g}\,R + I_M[\psi, g] , \tag{7.1}$$

where $\psi$ are a set of matter fields appearing in the matter action $I_M$. ($I_M$ includes a possible cosmological constant term $-\lambda\int\sqrt{g}\,d^4x/8\pi G$.) The variational derivative of Eq. (7.1) with respect to the metric is

$$\frac{\delta I}{\delta g_{\mu\nu}} = \frac{1}{8\pi G}\left(R^{\mu\nu} - \tfrac{1}{2} g^{\mu\nu} R\right) + T^{\mu\nu} , \tag{7.2}$$

where, as usual, $T^{\mu\nu}$ is the variational derivative of $I_M$ with respect to $g_{\mu\nu}$. In ordinary general relativity all components of the metric are dynamical fields, so Eq. (7.2) vanishes for all $\mu$, $\nu$, yielding the usual Einstein field
r=&g —-'a Pya~y— 8m. G
2
1
R —U(P)R equations. However, just because we use a generally co-
variant formalism does not mean that we are committed
to treating all components of the metric as dynamical
(6. 15)
fields. For instance, we all learn in childhood how to
write the equations of Newtonian mechanics in general
This has a fiat-space solution with g„,=rl„and P =go (a curvilinear spatial coordinate systems, without supposing
constant), provided that the 3-metric has to obey any field equations at all.
In particular, if the determinant g is not dynamical,
U($0)= oo . (6. 16) then the action only has to be stationary with respect to
variations in the metric that keep the determinant fixed,
However, as the above authors observed, the effective
gravitational coupling in this theory is given by

6 =0. (6. 17) 8For instance, -we assumed that in the solution for Hat space all
1+ 16m. G U ( $0) fields are constant, but it might be that this solution preserves
only some combination of translation and gauge invariance, in
This is not much progress; we always knew that a which case some gauge-noninvariant fields might vary with
nonzero vacuum energy does not prevent a Hat-space space-time position. (This is the case for the 3-form gauge field
solution if the gravitational constant is zero. model discussed at the end of Sec. VII and in Sec. VIII.) Fur-
The "no-go" theorem proved in this section should not thermore, it is possible that the foliation of field space, which al-
be regarded as closing off all hope in this direction. No- lows us to replace the g„with cr, and P, does not work
go theorems have a way of relying on apparently techni- throughout the whole of field space.


i.e., for which $g^{\mu\nu}\delta g_{\mu\nu} = 0$; hence only the traceless part of (7.2) needs to vanish, yielding the field equation

$$R^{\mu\nu} - \tfrac14 g^{\mu\nu}R = -8\pi G\left(T^{\mu\nu} - \tfrac14 g^{\mu\nu}T^{\lambda}{}_{\lambda}\right). \tag{7.3}$$

This is just the traceless part of the Einstein field equations; these equations evidently contain less information than Einstein's, but as we shall see, not much less. Because the whole formalism is generally covariant, the energy-momentum tensor satisfies the usual conservation law

$$T^{\mu\nu}{}_{;\mu} = 0 \tag{7.4}$$

and of course the Bianchi identities still hold,

$$\left(R^{\mu\nu} - \tfrac12 g^{\mu\nu}R\right)_{;\mu} = 0. \tag{7.5}$$

The full Einstein field equations are automatically consistent with (7.4) and (7.5), but for the traceless part we get a nontrivial consistency condition. Taking the covariant derivative of Eq. (7.3) with respect to $x^\mu$ yields

$$\tfrac14\,\partial_\mu R = 8\pi G\,\tfrac14\,\partial_\mu T^{\lambda}{}_{\lambda},$$

or, in other words, $R - 8\pi G\,T^{\lambda}{}_{\lambda}$ is a constant, which we will call $-4\Lambda$:

$$R - 8\pi G\,T^{\lambda}{}_{\lambda} = -4\Lambda \quad\text{(constant)}. \tag{7.6}$$

From (7.3) and (7.6), we obtain

$$R^{\mu\nu} - \tfrac12 g^{\mu\nu}R - \Lambda g^{\mu\nu} = -8\pi G\,T^{\mu\nu}. \tag{7.7}$$

Thus we recover the Einstein field equations, but with a cosmological constant that has nothing to do with any terms in the action or vacuum fluctuations, arising, instead, as a mere integration constant. To put this another way, Eq. (7.3) does not involve a cosmological constant; the contribution of vacuum fluctuations automatically cancels on the right-hand side of Eq. (7.3), so this equation does have flat-space solutions in the absence of matter and radiation. The remaining problem in this formulation is: why should we choose the flat-space solutions?

Before proceeding with this theory, I should pause to mention that it is closely related to a proposal made long ago by Einstein (1919).⁹ After his formulation of general relativity and its application to cosmology, Einstein turned to the old problem of a field theory of matter. In a paper titled "Do Gravitational Fields Play an Essential Part in the Structure of the Elementary Particles of Matter?" he proposed to replace the original gravitational field equation with the equation

$$R_{\mu\nu} - \tfrac14 g_{\mu\nu}R = -8\pi G\,t_{\mu\nu}. \tag{7.8}$$

This is consistent only if $t_{\mu\nu}$ is traceless; however, Einstein took for $t_{\mu\nu}$ not the full energy-momentum tensor of matter and radiation, but just the traceless tensor of radiation alone. This is, of course, conserved only outside matter. In such regions there is no difference between Eqs. (7.8) and (7.3), so by the same calculation as shown here, Einstein was able to recover Eq. (7.7), with $\Lambda$ a constant of integration. However, inside matter, Eq. (7.8) is different from (7.3), the difference being that the right-hand side of Eq. (7.3) includes the traceless part of the energy-momentum tensor of matter. A consequence of this difference is that in charged matter $R$ is an undetermined function, except that it is constant along world lines.

⁹ This was pointed out to me by someone in the audience of the lectures at Harvard. I thank my informant for this interesting historical reference.

I will also take the opportunity of this pause to comment on the connection between the formulation described here and that of Zee (1985) and Buchmüller and Dragon (1988a, 1988b). These authors take as their starting point the assumption that the action is invariant not under the group of all coordinate transformations, but only under the subgroup of transformations $x^\mu \to x'^\mu$ with ${\rm Det}(\partial x'^\mu/\partial x^\nu) = 1$. This is not really in conflict with the formulation presented here; the general covariance of Eq. (7.1) is achieved at the cost of introducing a metric that is partly nondynamical (just as we can make Newtonian mechanics formally Lorentz invariant by introducing a nondynamical quantity, the velocity of the reference frame). However, in giving up general covariance, one may be led to a theory with unnecessary elements. Under transformations with ${\rm Det}(\partial x'/\partial x) = 1$, the determinant of the metric $g$ behaves just like any scalar field, so one can introduce arbitrary functions of $g$ here and there in the action. There is nothing wrong with this, but it is not necessary, no different from inserting a new scalar field into the theory.

Now let us return to the theory described by the field equations (7.3). In my view, the key question in deciding whether this is a plausible classical theory of gravitation is whether it can be obtained as the classical limit of any physically satisfactory quantum theory of gravitation. To help in answering this, and also to illuminate the points raised in the previous paragraph, let us look at a simple model (Teitelboim, 1982) that shares several features with the theory of gravitation studied here.

Consider a free relativistic particle, with space-time trajectory $x^\mu(s)$ parametrized by a variable $s$. In order for the action to be invariant under arbitrary reparametrizations $s \to s'(s)$, we must introduce an "einbein" $g(s)$, with transformation rule

$$g(s) \to g'(s') = g(s)\,\frac{ds}{ds'}. \tag{7.9}$$

The action may then be taken as

$$I[x, g] = \tfrac12\int ds\,g^{-1}(s)\,\frac{dx^\mu(s)}{ds}\frac{dx_\mu(s)}{ds} - \frac{m^2}{2}\int ds\,g(s). \tag{7.10}$$
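As a hedged sketch (added for illustration; the dot notation $\dot x^\mu \equiv dx^\mu/ds$ and the assumption that $\delta x^\mu$ vanishes at the endpoints are ours, not the paper's), the variation of the einbein action (7.10) and its reparametrization invariance under (7.9) can be worked out as follows:

```latex
% Variation of I[x,g] = (1/2) \int ds\, g^{-1} \dot x^\mu \dot x_\mu - (m^2/2) \int ds\, g
% Assumption: \delta x^\mu vanishes at the endpoints, so the boundary term drops.
\begin{align*}
\delta_x I &= \int ds\; g^{-1}\dot x_\mu\,\frac{d\,\delta x^\mu}{ds}
            = -\int ds\;\frac{d}{ds}\!\left(g^{-1}\dot x_\mu\right)\delta x^\mu
  &&\Longrightarrow\quad \frac{d}{ds}\!\left(g^{-1}\dot x_\mu\right) = 0, \\[4pt]
\delta_g I &= -\tfrac12\int ds\;\left(g^{-2}\,\dot x^\mu\dot x_\mu + m^2\right)\delta g
  &&\Longrightarrow\quad g^{-2}\,\dot x^\mu\dot x_\mu = -m^2 .
\end{align*}
% Reparametrization check: with g'(s') = g(s)\,ds/ds' as in (7.9),
\begin{align*}
ds'\,g'(s') &= ds\,g(s), \\
ds'\,g'^{-1}(s')\,\frac{dx^\mu}{ds'}\frac{dx_\mu}{ds'}
  &= ds'\,\frac{ds'}{ds}\,g^{-1}(s)\left(\frac{ds}{ds'}\right)^{2}\frac{dx^\mu}{ds}\frac{dx_\mu}{ds}
   = ds\,g^{-1}(s)\,\frac{dx^\mu}{ds}\frac{dx_\mu}{ds},
\end{align*}
% so both terms of the action are separately invariant.
```

Written in terms of the combination $g^{-1}\dot x_\mu$, the two stationarity conditions say that this quantity is constant along the trajectory and has fixed norm $-m^2$.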


The conditions that $I$ be stationary with respect to variations in $x^\mu(s)$ and $g(s)$ are, respectively,

$$\frac{dp_\mu}{ds} = 0, \tag{7.11}$$

$$p^\mu p_\mu = -m^2, \tag{7.12}$$

where $p_\mu$ is the canonical conjugate to $x^\mu$:

$$p_\mu(s) = g^{-1}(s)\,\frac{dx_\mu(s)}{ds}. \tag{7.13}$$

However, just because we choose to write the action in a reparametrization-invariant way does not necessarily mean that we must treat the einbein $g(s)$ as a dynamical quantity. If we treat $x^\mu(s)$, but not $g(s)$, as dynamical variables, then we obtain Eq. (7.11), but not (7.12). Of course, Eq. (7.11) implies that $p_\mu p^\mu$ is a constant [just as Eq. (7.3) implies that $R - 8\pi G\,T^{\lambda}{}_{\lambda}$ is constant]. If we like, we can call this constant $-m^2$, but this is now a mere integration constant, unrelated to anything in the original action.

Now to quantization. The Hamiltonian here is

$$H(s) = p_\mu\frac{dx^\mu}{ds} - L = \tfrac12\,g\left(p^\mu p_\mu + m^2\right), \tag{7.14}$$

so in quantum mechanics we calculate amplitudes by the functional integral

$$A = \int [dx][dp][dg]\,\exp\left\{i\int ds\left[p_\mu(s)\frac{dx^\mu(s)}{ds} - H(s)\right]\right\}. \tag{7.15}$$

The einbein $g(s)$ has no canonical conjugate, and so appears here only as a Lagrange multiplier, whose integral yields a factor

$$\prod_s \delta\left(p^\mu p_\mu + m^2\right). \tag{7.16}$$

Presumably the classical theory in which $g$ is not dynamical would be obtained as the classical limit of a quantum theory in which we do not do a functional integral over $g(s)$, and hence do not get the factor (7.16). But then there would be nothing to keep $p^\mu$ timelike. This is such a trivial theory that it is hard to say that anything goes wrong physically; but we may anticipate that in less trivial theories, we need a field to serve as a Lagrange multiplier for every negative norm degree of freedom like $p^0$. This is the case, for instance, in string theories, where the integration over the world-sheet metric is needed to enforce the Virasoro conditions on physical states.

The quantum theory of gravitation can be put in similar terms. Using the Arnowitt-Deser-Misner (1962) formalism, we calculate amplitudes as functional integrals,

$$Z = \int [dh_{ij}][d\pi^{ij}][dN][dN^i]\,\exp\left\{i\int\left[\pi^{ij}\frac{\partial h_{ij}}{\partial t} - (\mathcal{H} - 2\lambda)N - \mathcal{H}_i N^i\right]d^4x\right\}. \tag{7.17}$$

Here $h_{ij}$, $N$, and $N^i$ parametrize the 4-metric, with line element given by

$$d\tau^2 = \left(N^2 h^{-1} - h_{ij}N^iN^j\right)dt^2 - 2h_{ij}N^i\,dx^j\,dt - h_{ij}\,dx^i\,dx^j, \tag{7.18}$$

$$h = {\rm Det}(h_{ij}). \tag{7.19}$$

Furthermore, $\pi^{ij}$ is the canonical conjugate to $h_{ij}$, and $\mathcal{H}$ and $\mathcal{H}_i$ are functions of $h_{ij}$ and $\pi^{ij}$ and their space derivatives, given by

$$\mathcal{H} = G_{ij,kl}\,\pi^{ij}\pi^{kl}\;\cdots\;{}^{(3)}R, \tag{7.20}$$

$$\mathcal{H}_i = -2h_{ij}\nabla_k\pi^{jk}, \tag{7.21}$$

where ${}^{(3)}R$ is the scalar curvature and $\nabla_k$ is the covariant derivative, both calculated using the 3-metric $h_{ij}$, and

$$G_{ij,kl} = h_{ik}h_{jl} + h_{il}h_{jk} - h_{ij}h_{kl}. \tag{7.22}$$

We see that $N$ and $N^i$ just act as Lagrange multipliers for $\mathcal{H}$ and $\mathcal{H}_i$, respectively. Moreover, from (7.18), we see that $N$ is just the quantity whose status is under question here, the determinant of the 4-metric:¹⁰

$$N = ({\rm Det}\,g_{\mu\nu})^{1/2}. \tag{7.23}$$

Thus, just as the integral over the einbein $g(s)$ enforced the constraint $p^\mu p_\mu = -m^2$, the integral over ${\rm Det}\,g$ enforces the constraint

$$\mathcal{H} = 2\lambda. \tag{7.24}$$

The two conditions are quite similar. Just as $g_{\mu\nu}$ has signature $+,+,+,-$, the quantity (7.22), viewed as a $6\times6$ matrix, has signature $+,+,+,+,+,-$. Hence the integration over ${\rm Det}\,g_{\mu\nu}$ has the effect of eliminating one negative norm degree of freedom for each $x$, $\pi^{ij} \propto (h^{-1})^{ij}$, just as the integral over the einbein $g(s)$ allows one to eliminate the variable $p^0$. However, for gravity there is a "potential" term in $\mathcal{H}$, proportional to the 3-curvature, and it is not entirely clear to me that it really is necessary to constrain $\mathcal{H}$ to take a fixed value. For the present, the question of whether it is necessary to integrate over ${\rm Det}\,g_{\mu\nu}$ must be left open. [Recent work by Henneaux and Teitelboim (1988) shows that there is a sensible generally covariant quantum version of the classical theory described by Eq. (7.3).]

¹⁰ In order to obtain this result, I have defined $\mathcal{H}$ and $N$ differently from the usual $\mathcal{H}$ and $N$, by moving a factor $h^{-1/2}$ from $N$ to $\mathcal{H}$.

Before closing this section, I should note that several authors have made a rather different suggestion, which also has the effect of converting the cosmological constant from a function of parameters in the action into a constant of the motion (Aurilia et al., 1980; Witten, 1983; Henneaux and Teitelboim, 1984). They proposed adding to the action a term


$$I_F = -\frac{1}{48}\int d^4x\,\sqrt{g}\,F_{\mu\nu\rho\sigma}F^{\mu\nu\rho\sigma}, \tag{7.25}$$

where $F_{\mu\nu\rho\sigma}$ is the exterior derivative of a 3-form gauge field $A_{\nu\rho\sigma}$,

$$F_{\mu\nu\rho\sigma} = \partial_{[\mu}A_{\nu\rho\sigma]}, \tag{7.26}$$

and $g \equiv -{\rm Det}\,g_{\mu\nu}$. Since $F^{\mu\nu\rho\sigma}$ is totally antisymmetric, it can be expressed as

$$F^{\mu\nu\rho\sigma} = c\,\epsilon^{\mu\nu\rho\sigma}/\sqrt{g}, \tag{7.27}$$

where $\epsilon^{\mu\nu\rho\sigma}$ is the Levi-Civita tensor density, with $\epsilon^{0123} = 1$, and $c$ is a scalar field. The field equation for $A$ is

$$F^{\mu\nu\rho\sigma}{}_{;\mu} = 0, \tag{7.28}$$

so, using (7.27),

$$\partial_\mu c = 0. \tag{7.29}$$

But the action (7.25) then takes the form

$$I_F = +\tfrac12\,c^2\int d^4x\,\sqrt{g}. \tag{7.30}$$

In other words, whatever else contributes to the cosmological constant, there is one term that depends on the integration constant $c$,

$$\lambda_F = -4\pi G\,c^2. \tag{7.31}$$

Again, this does not solve the cosmological constant problem, but it does change the way it arises.

If $\lambda$ is a constant of integration, then in a quantum theory we expect the state vector of the universe to be a superposition of states with different values of $\lambda$, in which case the anthropic considerations of Sec. V would set a bound on the effective cosmological constant.

VIII. QUANTUM COSMOLOGY

The last approach to the cosmological constant problem that I shall describe here is based on the application of quantum mechanics to the whole universe. In 1984 Hawking (1984b) described how in quantum cosmology there could arise a distribution of values for the effective cosmological constant, with an enormous peak at $\lambda_{\rm eff} = 0$. Very recently, this approach has been revived in an exciting paper by Coleman (1988b), using a new mechanism for producing a distribution of values for the cosmological constant (that rests in part on other work of Hawking and Coleman) and finding an even sharper peak. Related ideas have also been recently discussed by Banks (1988). Before describing the work of Coleman and Hawking, I will have to say something about quantum cosmology in general.

Most treatments of quantum cosmology are based on the "wave function of the universe," a function $\Psi[h, \phi]$ of the 3-metric and matter fields on a spacelike surface. [The 3-metric $h_{ij}$ can be conveniently defined by adapting the space-time coordinate system so that the spacelike surface has constant $t$, and then decomposing the 4-metric $g_{\mu\nu}$ as in Eq. (7.19).] This wave function satisfies a sort of Schrödinger equation, known as the Wheeler-DeWitt equation [DeWitt (1967); Wheeler (1968)]:

$$\left[\;\cdots\,\frac{\delta}{\delta h_{ij}}\,\cdots\,\frac{\delta}{\delta h_{kl}}\,\cdots + 8\pi G\,T_{00}\right]\Psi = 0, \tag{8.1}$$

with notation explained in Sec. VII (except that we now include a matter energy density $T_{00}$, in which the canonical conjugate of a matter field $\phi$ is replaced with $\delta/\delta\phi$).

It will be very important in what follows that we express the solution as a Euclidean path integral

$$\Psi \propto \int [dg][d\Phi]\,\exp(-S[g, \Phi]), \tag{8.2}$$

where we integrate over all Euclidean-signature 4-metrics $g_{\mu\nu}$ and matter fields $\Phi$ defined on a 4-manifold $M_4$, that have the 3-manifold $M_3[h, \phi]$ with 3-metric $h_{ij}$ and matter fields $\phi$ as a boundary. (The Wheeler-DeWitt equation is the constraint obtained from integrating the Lagrange multiplier $N$ as discussed in Sec. VII.) Here $S$ is the Euclidean action¹¹

$$S = \frac{1}{16\pi G}\int\sqrt{g}\,(R + 2\lambda) + \text{matter terms} + \text{surface terms}. \tag{8.3}$$

Since Eq. (8.1) is a differential equation in an infinite-dimensional space [the set of $h_{ij}(x)$ and $\phi(x)$ for all $x$], it has an infinite variety of solutions, which can be specified by giving the 4-manifold in Eq. (8.2) other boundaries, besides the $M_3[h, \phi]$ on which the 3-metric and matter fields are specified. Hartle and Hawking (1983) proposed as a cosmological initial condition that the manifold $M_4$ should have no boundaries other than $M_3(h, \phi)$. We will see that Coleman's (1988b) approach does not depend critically on the choice of initial conditions.

¹¹ The Euclidean action $S$ is opposite in sign to what we would get if we replaced the metric $g_{\mu\nu}$ in the action in Eq. (7.1) with one of signature $+,+,+,+$. This sign of $S$ is chosen so that ordinary matter makes a positive contribution to $S$.

¹² The insertion of factors $h^{-1/2}$ and $h^{1/2}$ in Eq. (8.1) represents one choice of operator ordering, which is made in order to allow the derivation of the conservation equation (8.8).

There are technical problems associated with this formalism. One is an operator-ordering ambiguity: there are various ways of ordering¹² the $h_{ij}$ fields and $\delta/\delta h_{ij}$ operators in (8.1), all of which have (8.2) as solution, but


with different ways of calculating the measure $[dg][d\Phi]$ (Hawking and Page, 1986). Another problem, potentially more worrisome, is that for gravity the Euclidean action (8.3) is not bounded below. Gibbons, Hawking, and Perry (1978) have proposed rotating the contour of integration for the overall scale of the 4-metric so that it runs parallel to the imaginary axis. We will not need to go into these technicalities here, because it will turn out that we only need to deal with the effective action at its equilibrium point.

A problem that is more relevant to us here has to do with the probabilistic interpretation of the wave function $\Psi$ and of Euclidean path integrals like (8.2). Hawking has proposed (1984a, 1984c) that $\exp(-S[g, \Phi])$ should be regarded as proportional to the probability of a particular metric and matter field history. It is not immediately clear what is meant by this: even supposing that we had the godlike ability to measure the gravitational and matter fields throughout space-time, it would be in a space-time of Lorentzian rather than Euclidean signature. However, since we can (sometimes) go from one signature to another by a complex coordinate transformation, it may be that a Euclidean history $g_{\mu\nu}(x)$, $\Phi(x)$ can be interpreted in terms of correlations of scalar quantities, just as if the space-time were Lorentzian. In much of Hawking's work (e.g., Hawking, 1979), these questions are avoided by using the formalism only to calculate the probability that, in the space-time history of the universe, there is a spacelike 3-surface with a given 3-metric $h_{ij}(x)$ and matter fields $\phi(x)$. For instance, with Hartle-Hawking (1983) initial conditions, we would integrate over all closed 4-manifolds that contain such a 3-surface. If this surface bisects the 4-manifold, then it can be regarded as the boundary of the two halves of the 4-manifold, and so the integral is (with some qualifications) just the square of the wave function (8.2). But questions still arise concerning the probabilistic interpretation of $|\Psi|^2$, particularly with regard to normalization. If $|\Psi[h, \phi]|^2$ is the probability density that there exists some 3-surface on which the 3-metric is $h_{ij}(x)$ and the matter fields are $\phi(x)$, then we would not simply want to set the functional integral of $|\Psi[h, \phi]|^2$ over $h_{ij}(x)$ and $\phi(x)$ equal to unity, because in this functional integral we are summing up possibilities that are not exclusive; if the universe has some $h_{ij}(x)$ and $\phi(x)$ on one 3-surface, then it may also have some other $h'_{ij}(x)$ and $\phi'(x)$ on some other 3-surface. After all, you would not expect that the probabilities that you ever in your life have flipped a coin and gotten heads, and that you ever in your life have flipped a coin and gotten tails, should add up to unity.

I would like to offer an interpretation of what is meant by treating $|\Psi[h, \phi]|^2$ as a probability density, which seems to me implicit in Hawking's writings (and may already be stated explicitly somewhere in the literature). As everyone has recognized, the problem has to do with the role of time in quantum gravity. [See, e.g., Hartle (1987).] The problems raised here do not arise in asymptotically flat cosmologies, because in such theories there is a natural definition of time, and we generally ask for the probabilities that the fields have certain values at a definite time. However, here time is a coordinate with no objective significance, and this coordinate time is even imaginary. As Augustine (398) warned, "I must not allow my mind to insist that time is something objective."¹³ Heeding this warning, suppose we choose some "time-keeping" field $a(x, t)$, for instance, the trace of the energy-momentum tensor, and use its value to define a local time $\alpha$. Each value of $\alpha$ defines a 3-surface, on which the coordinate time $t$ is a function $t(x, \alpha)$ defined implicitly by

$$a(x, t(x, \alpha)) = \alpha. \tag{8.4}$$

We are then interested in the probability that the tangential components of the metric and all matter fields other than $a(x, t)$ have specified values on this surface. Calling these quantities $b_n(x, t)$, we see that the probability density for the $b_n(x, t)$ to have the values $\beta_n(x)$ at local time $\alpha$ is

$$P_\alpha[\beta] = N_\alpha\int [dg][d\Phi]\,\exp(-S[g, \Phi])\prod_{n,x}\delta\big(b_n(x, t(x, \alpha)) - \beta_n(x)\big), \tag{8.5}$$

with $N_\alpha$ a normalization factor, determined by the condition that the total probability of finding any value for the $b_n(x)$ at local time $\alpha$ should be unity:

$$1 = \int P_\alpha[\beta]\,[d\beta] = N_\alpha\int [dg][d\Phi]\,\exp(-S[g, \Phi]). \tag{8.6}$$

[This usually makes $N_\alpha$ a function of $\alpha$, because in (8.5) and (8.6) we integrate only over matter and metric histories for which Eq. (8.4) is satisfied on some 3-surface. With some boundary conditions, this condition is automatically satisfied, and then $N$ is $\alpha$ independent. For instance, if $M_4$ has two boundaries, on which $a(x)$ is required to take values $\alpha_1$ and $\alpha_2$, then there are 3-surfaces on which (8.4) is satisfied for all $\alpha$ in the range $\alpha_1 < \alpha < \alpha_2$.] Where the surface of constant $a$ bisects the 4-space, $P_\alpha[\beta]$ can be written as proportional to the square of the wave function $\Psi[a, \beta]$, but with $a$ constant in 3-space.

¹³ This quote is not merely a display of useless erudition. Book XI of Augustine's Confessions contains a famous discussion of the nature of time, and it seems to have become a tradition to quote from this chapter in writing about quantum cosmology. Thus Hawking (1979) quotes "What did God do before He made Heaven and Earth? I do not answer as one did merrily: He was preparing a Hell for those that ask such questions. For at no time had God not made anything, for time itself was made by God." Coleman (1988a) quotes "The past is present memory." To this, I can add one more very relevant quote: "I confess to you, Lord, that I still do not know what time is. Yet I confess too that I do know that I am saying this in time, that I have been talking about time for a long time, . . . ."
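The coin-flip remark in the main text above can be made concrete with a small sketch. This is an illustration added here, not part of the paper's argument: the function names `prob_ever` and `total_nonexclusive` are hypothetical, and the coin is assumed fair with independent flips. It shows that the probabilities of two non-exclusive "ever happened" events need not sum to unity.

```python
from fractions import Fraction


def prob_ever(n_flips: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability that an outcome with per-flip probability p
    occurs at least once in n_flips independent flips."""
    return 1 - (1 - p) ** n_flips


def total_nonexclusive(n_flips: int) -> Fraction:
    """P(ever heads) + P(ever tails) for a fair coin.

    The two events are not mutually exclusive once n_flips > 1,
    so this sum can exceed 1.
    """
    return prob_ever(n_flips) + prob_ever(n_flips)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, total_nonexclusive(n))
```

With a single flip the two events are exclusive and the sum is exactly 1; for more flips the sum grows toward 2, which is the sense in which summing over all 3-surfaces counts possibilities that are not exclusive.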


Coleman (1988b) short-circuits many of the problems that arise in giving a probabilistic interpretation to Euclidean path integrals by using such integrals only to calculate expectation values: the expectation value of an arbitrary scalar field $A(x)$, which may depend on the metric and matter fields and their derivatives, is taken as

$$\langle A\rangle = \frac{\int [dg][d\Phi]\,A(x)\,\exp(-S[g, \Phi])}{\int [dg][d\Phi]\,\exp(-S[g, \Phi])}. \tag{8.7}$$

The general covariance of the theory makes $\langle A\rangle$ independent of $x$. In fact, it should be emphasized that this sort of expectation value includes an average over the time in the history of the universe that $A$ is measured. On the other hand, the probability $P_\alpha[\beta]$ discussed above is the expectation value of a nonlocal operator, the delta function in (8.5), and refers to a specific local time $\alpha$.

(I should mention here that there is a very different and apparently unrelated approach to the problem of giving a probabilistic interpretation to the wave function $\Psi$. The Wheeler-DeWitt equation (8.1) is somewhat like the Klein-Gordon equation for a particle in a scalar potential and leads immediately to a somewhat similar conservation law (now given for pure gravity):

$$0 = \frac{\delta}{\delta h_{ij}(x)}\left\{h^{1/2}(x)\,G_{ij,kl}(x)\,{\rm Im}\left(\Psi^*[h]\,\frac{\delta}{\delta h_{kl}(x)}\Psi[h]\right)\right\}. \tag{8.8}$$

Since the beginning, it was hoped that such a conservation law could be used to construct a suitable probability density (DeWitt, 1967). Usually (8.8) is stated in a minisuperspace context, where $h_{ij}(x)$ is constrained to depend on only a finite number of parameters. Since $G_{ij,kl}h^{kl} = -h_{ij}$, it is natural to treat the overall scale of $h_{ij}$ as a sort of global time coordinate, and take as a probability density the corresponding component of the conserved "current" in (8.8). I wish to point out here that such a construction is not limited to any particular minisuperspace formulation, but can be carried out in the general case. Take $\Psi$ to depend on a "global time"

$$T[h] = \cdots\int d^3x\,h^{1/2}(x)\,\cdots \tag{8.9}$$

and an arbitrary (in fact, infinite) number of other parameters $\zeta_n[h]$, all $\zeta_n$ independent of the overall scale of $h_{ij}(x)$:

$$0 = \int d^3x\,\cdots\,h_{kl}(x)\,\frac{\delta\zeta_n[h]}{\delta h_{kl}(x)}. \tag{8.10}$$

We also introduce a Jacobian $J(\zeta, T)$ and write the functional measure as

$$[dh] = J\,[d\zeta]\,dT. \tag{8.11}$$

Multiplying Eq. (8.8) with $\delta(T[h] - T)$ and doing an integral over $x$ and a functional integral over $h_{ij}(x)$, we easily find a constancy condition

$$0 = \cdots\int J\,[d\zeta]\,{\rm Im}\,\cdots \tag{8.12}$$

The trouble here is, of course, the same as that encountered in giving a probabilistic interpretation to the Klein-Gordon equation: the integrand in (8.12) is not, in general, positive. Banks, Fischler, and Susskind (1985) and Vilenkin (1986, 1988a) have considered minisuperspace models in which $\Psi$ is complex, with increasing phase, for which the integrand of Eq. (8.12) is positive-definite; however, this is not the case in general, and, in particular, not for Hartle-Hawking boundary conditions. For a recent more general discussion, see Vilenkin (1988b).)

¹⁴ The usual proof, for the case without a delta function in the integrand, proceeds by adding a term $\int J\Omega$ to the action, where $\Omega$ denotes the various fields, and $J$ is a set of corresponding currents. The path integral is then $\exp[-W(J)] \equiv \int d\Omega\,\exp(-S - \int J\Omega)$. The effective action is defined by the Legendre transformation $\Gamma(\Omega) = W(J_\Omega) - \int J_\Omega\Omega$, where $J_\Omega$ is the current that produces a given expectation value $\Omega = \delta W/\delta J$. The condition for zero current is that $\Gamma(\Omega)$ be stationary with respect to $\Omega$, and at this point $\Gamma(\Omega) = W(0)$. The delta function in (8.13) can be dealt with by writing it as an integral $\int d\omega\,\exp\{i\omega[c(x_1) - c]\}$. One can then use the above theorem to evaluate the functional integral before integrating over $\omega$, now with no restriction on $c(x)$, and then doing the $\omega$ integral.

I now want to give a simplified description of Hawking's (1984b) proposed solution of the cosmological constant problem, using for this purpose parts of Coleman's (1988b) analysis. In order to make the cosmological constant into a dynamical variable, Hawking introduces a 3-form gauge field $A_{\mu\nu\lambda}$ of the sort described at the end of Sec. VII. According to the general ideas of Euclidean quantum cosmology, the probability distribution for the scalar $c(x)$ defined by Eq. (7.27) at any one point $x = x_1$ is

$$P(c) = \big\langle\delta(c(x_1) - c)\big\rangle \propto \int [dA][dg][d\Phi]\,\delta(c(x_1) - c)\,\exp(-S[A, g, \Phi]). \tag{8.13}$$

It is well known that such functional integrals can be expressed as exponentials of the effective action at its stationary point.¹⁴ In the present case, we have

$$P(c) \propto \exp(-\Gamma[A_c, g_c, \Phi_c]), \tag{8.14}$$

where $\Gamma[A, g, \Phi]$ is the total action (the sum of one-particle irreducible graphs with external lines replaced with fields $A$, $g$, $\Phi$) and the subscript $c$ indicates that this quantity is to be evaluated at a point where $\Gamma$ is stationary with respect to any variations in $A_{\mu\nu\lambda}(x)$, $g_{\mu\nu}(x)$, or $\Phi(x)$ that leave $c(x_1) = c$ fixed. Now, among all the possible stationary points of $\Gamma$, there is one that can be found knowing only the effective action relevant to large


4-manifolds. In this case, it is convenient to set all fields where co is the value of c (assuming there is one), for
except A„,i and g„equal to their ( A- and g-dependent) which A. (c) =0.
stationary values, in which case the effective action can It is important that the quantity A, (c) is the true
be expanded in inverse powers of the size of the mani- effective cosmological constant, previously called
fold' that would be measured in gravitational phenomena at
long ranges. ' The constant A, in Eq. (8. 15) includes all
I,QA, g]= f v'gd~x+ f v'gR d~x e6'ects of fields other than g„and A„&, including all
quantum fluctuations. Hence the result (8.21), if valid,
+—'
48
d xt/gF pvA, p
F" I'+ 7 (8. 15) really does solve the cosmological constant problem.
We can check that this result is not invalidated by the
the omitted terms involving more than two derivatives of
terms neglected in Eq. (8. 16). For a large radius r, the ex-
g and/or A. As we saw in Sec. VII, the condition that hibited terms in (8. 16) are of order A, r /G and r /G, re-
this be stationary in A„ i [for variations that keep c(x, )
spectively, while a term with D 4 derivatives would
fixed] is that F„ i have vanishing covariant divergence,
from which it follows that c in Eq. (7.27) is constant;
yield a contribution to I,
hence

$$I_{\rm eff} = \frac{\lambda(c)}{8\pi G}\int \sqrt{g}\,d^4x + \frac{1}{16\pi G}\int \sqrt{g}\,R\,d^4x + \cdots, \qquad (8.16)$$

where $\lambda(c)$ is the sum of $\lambda$ and a term proportional to $c^2$. (8.17)

The condition that this be stationary in $g_{\mu\nu}$ is, of course, that $g_{\mu\nu}$ satisfy the Einstein field equations with cosmological constant $\lambda(c)$. For any such solution, $R = -4\lambda(c)$, so at the stationary point

$$I_{\rm eff} = -\frac{\lambda(c)}{8\pi G}\int \sqrt{g}\,d^4x. \qquad (8.18)$$

With Hartle-Hawking boundary conditions, the solution of the Einstein equations for $\lambda(c) > 0$ is a 4-sphere of proper circumference $2\pi r$, where

$$r = \sqrt{3/\lambda(c)}, \qquad (8.19)$$

yielding a probability density proportional to

$$\exp(-I_{\rm eff}) = \exp[3\pi/G\lambda(c)]. \qquad (8.20)$$

On the other hand, for $\lambda(c) < 0$ the solutions can be made compact by imposing periodicity conditions, but they all have $I_{\rm eff} \geq 0$. Hawking's conclusion is that the probability density has an infinite peak for $\lambda(c) \to 0+$; hence, after normalizing $P$,

$$P(c) = \delta(c - c_0), \qquad (8.21)$$

… $I_{\rm eff}$ of order $(mr)^{4-D}$, where $m$ is some combination of the Planck mass and elementary-particle masses. For $\lambda(c) < m^2$, this shifts the size of the manifold by

$$\delta r/r = G\lambda(c)\,[\lambda(c)/m^2]^{(D-4)/2} \ll 1.$$

The change in the stationary value of the action is then

$$\delta I_{\rm eff} = [\lambda(c)/m^2]^{(D-4)/2} < 1,$$

so these higher-derivative terms have no effect on the singularity (8.20).

Coleman (1988b) does not need to introduce a 3-form gauge field $A_{\mu\nu\lambda}$; rather, in order to make the cosmological constant into a dynamical variable, he considers the effect of topological fixtures known as wormholes.¹⁷ An explicit example of a wormhole is provided by the metric (Hawking, 1987b, 1988)

$$ds^2 = (1 + b^2/x^\mu x^\mu)^2\,dx^\mu dx^\mu. \qquad (8.22)$$

This appears to have a singularity at $x^\mu = 0$, but the line element is invariant under the transformation

$$x^\mu \to x'^\mu = x^\mu b^2/x^\nu x^\nu, \qquad (8.23)$$

so the region $x^\mu x^\mu < b^2$ actually has the same geometry as that with $x^\mu x^\mu > b^2$. The space described by Eq. (8.22) therefore consists of two asymptotically flat 4-spaces, joined together at the 3-surface with $x^\mu x^\mu = b^2$, a 3-sphere known as a "baby universe." This 4-metric is not a solution of the classical Einstein equations (though it does have $R = 0$), but this is not very relevant; the action is

$$S = 3\pi b^2/G, \qquad (8.24)$$

so the factor $\exp(-S)$ suppresses the effects of all

¹⁵Such an effective action may be used as the input for calculations in which we include quantum effects only from virtual massless particles with $|q|$ less than some cutoff $\Lambda$. Such effects are, of course, finite, and their $\Lambda$ dependence is to be canceled by giving the coefficients in $I_{\rm eff}$ a suitable $\Lambda$ dependence. (This point of view is described by Weinberg, 1979b.) In order to prevent these quantum effects from generating an unacceptable cosmological constant, the cutoff $\Lambda$ must be taken very small.

¹⁶This property is shared by an imaginative solution to the cosmological constant problem proposed by Linde (1988a).

¹⁷The importance of quantum fluctuations in space-time topology at small scales has been emphasized for many years by Wheeler (e.g., 1964), and more recently by Hawking (1978) and Strominger (1984). Such "space-time foam" was considered as a mechanism for canceling a cosmological constant by Hawking (1983).
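
As a quick consistency check on Eqs. (8.18)-(8.20) and (8.22)-(8.23), the stationary action and the wormhole inversion symmetry can be worked out explicitly. This is only a sketch and not part of the original text: it assumes the standard 4-sphere volume $V_4 = \tfrac{8}{3}\pi^2 r^4$, and the symbols $V_4$ and $x^2 \equiv x^\mu x^\mu$ are introduced here for convenience.

```latex
\begin{align*}
% 4-sphere of radius r = \sqrt{3/\lambda(c)}, Eq. (8.19)
V_4 &= \tfrac{8}{3}\pi^2 r^4 = \frac{24\pi^2}{\lambda(c)^2}, \\
I_{\rm eff} &= -\frac{\lambda(c)}{8\pi G}\,V_4 = -\frac{3\pi}{G\lambda(c)}
\quad\Rightarrow\quad
\exp(-I_{\rm eff}) = \exp\!\left[\frac{3\pi}{G\lambda(c)}\right], \\
% inversion (8.23): x'^\mu = x^\mu b^2 / x^2
x'^2 &= \frac{b^4}{x^2}, \qquad
dx'^\mu dx'^\mu = \frac{b^4}{x^4}\,dx^\mu dx^\mu, \\
\left(1+\frac{b^2}{x'^2}\right)^{2} dx'^\mu dx'^\mu
&= \left(1+\frac{x^2}{b^2}\right)^{2}\frac{b^4}{x^4}\,dx^\mu dx^\mu
 = \left(1+\frac{b^2}{x^2}\right)^{2} dx^\mu dx^\mu .
\end{align*}
```

At the fixed surface $x^2 = b^2$ the conformal factor is $(1+1)^2 = 4$, so the joining 3-sphere has proper radius $2b$ and proper circumference $4\pi b$.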

Rev. Mod. Phys., Vol. 61, No. 1, January 1989


18 Steven Weinberg: The cosmological constant problem

wormholes except those of Planck dimensions or less, for which quantum effects are surely important. [A model with classical wormhole solutions, based on a 2-form axion, has been presented by Giddings and Strominger (1988a).]

If Planck-sized wormholes can connect asymptotically flat 4-spaces, then they can connect any 4-spaces that are large compared to the Planck scale. We are therefore led to consider contributions to the Euclidean path integral from large 4-spaces [like the 4-sphere in Hawking's (1984b) theory] connected to themselves and each other with Planck-sized wormholes. Each wormhole can be regarded as the creation and subsequent destruction of a baby universe [like the 3-sphere of proper circumference $4\pi b$ in Hawking's (1987b, 1988) wormhole model], and such baby universes may also appear as part of the boundary of the 4-manifold.

What are the effects of these wormholes and baby universes? At scales large compared with the scale of the baby universe, the creation or destruction of a baby universe can only show up through the insertion of a local operator in the path integral. The various types of baby universes can be classified according to the form of these local operators. The effect of creating and destroying arbitrary numbers of baby universes of all types can thus be expressed by adding a suitable term in the action

$$\tilde{S} = S + \sum_i (a_i + a_i^\dagger)\int d^4x\,O_i(x), \qquad (8.25)$$

where $a_i$ and $a_i^\dagger$ are the annihilation and creation operators for a baby universe of type $i$, and $O_i(x)$ is the corresponding local operator. [This was first stated by Hawking (1987b). Creation and annihilation operators for baby universes were earlier used by Strominger (1984). For a proof of Eq. (8.25), see Coleman (1988a) and Giddings and Strominger (1988b).] The path integral over all 4-manifolds with given boundary conditions is to be calculated as

$$\int [dg][d\Phi]\,e^{-S} = \int_{\rm No} [dg][d\Phi]\,\langle B|e^{-\tilde{S}}|B\rangle, \qquad (8.26)$$

where No means that wormholes and baby universes are excluded, and $|B\rangle$ is a normalized baby-universe state depending on the boundary conditions. For instance, with Hartle-Hawking boundary conditions, $|B\rangle$ is the empty state

$$a_i|B\rangle = 0. \qquad (8.27)$$

These baby universes have an important effect even if none of them appear as part of the boundary of the 4-manifold, as would be the case for Hartle-Hawking boundary conditions. Hawking (1987b, 1988) has suggested that since the baby universes are unobservable, their effect is an effective loss of quantum coherence. [See also Hawking (1982); Teitelboim (1982); Strominger (1984); Lavrelashvili, Rubakov, and Tinyakov (1987, 1988); Giddings and Strominger (1988b). A contrary view was taken by Gross (1984).] Recently Coleman (1988a) has argued (convincingly, in my view) for a different interpretation [see also Giddings and Strominger (1988b)]. The state $|B\rangle$ in Eq. (8.26) may always be expanded in eigenstates of the operators $a_i + a_i^\dagger$:

$$|B\rangle = \int f_B(\alpha)\,\prod_i d\alpha_i\,|\alpha\rangle, \qquad (8.28)$$

$$(a_i + a_i^\dagger)|\alpha\rangle = \alpha_i|\alpha\rangle, \qquad (8.29)$$

$$\langle\alpha'|\alpha\rangle = \prod_i \delta(\alpha_i' - \alpha_i), \qquad (8.30)$$

the function $f_B(\alpha)$ depending on the boundary conditions. For instance, for Hartle-Hawking conditions, $|B\rangle$ satisfies Eq. (8.27), and so

$$f_B(\alpha) = \prod_i \pi^{-1/4}\exp(-\alpha_i^2/2). \qquad (8.31)$$

(With $n$ baby universes on the boundary of the 4-space, this would be multiplied with a Hermite polynomial of order $n$.) In the state $|\alpha\rangle$, the effect of the creation and annihilation of baby universes is to change the action $S$ to

$$S_\alpha = S + \sum_i \alpha_i \int O_i(x)\,d^4x. \qquad (8.32)$$

That is, the coupling constant multiplying each possible local term $\int O_i\,d^4x$ is changed by an amount $\alpha_i$. As soon as we start to make any sort of measurements, the state of the universe breaks up into an incoherent superposition of these $|\alpha\rangle$'s, each appearing with a priori probability $|f_B(\alpha)|^2$; but for each term we have an ordinary wormhole-free quantum theory, with $\alpha$-dependent action (8.32).

If all we want is to explain why the cosmological constant is not enormous, then our work is essentially done. The effective cosmological constant is a function of the $\alpha_i$, because among the $O_i$ there is a simple operator $O_1 = \sqrt{g}$, whose coefficient contributes a term $8\pi G\alpha_1$ to $\lambda$, and also because the vacuum energy $\langle\rho\rangle$ depends on the couplings of all interactions, each of which has a term proportional to one of the $\alpha_i$. Now, generic baby-universe states $|B\rangle$ will have components $|\alpha\rangle$ for which $\lambda_{\rm eff}(\alpha)$ is very small, as well as others for which it is enormous. The anthropic considerations of Sec. V tell us that any scientist who asks about the value of the cosmological constant can only be living in components $|\alpha\rangle$ for which $\lambda_{\rm eff}$ is quite small, for otherwise galaxies and stars could never have formed (for $\lambda_{\rm eff} > 0$), or else there would not be time for life to evolve (for $\lambda_{\rm eff} < 0$).

However, it is of great interest to ask whether the effective cosmological constant is really zero, or just small enough to satisfy anthropic bounds, in which case it should show up observationally. The probability of getting any particular value of the $\alpha_i$, and hence of finding a value $\lambda_{\rm eff}(\alpha)$, is not just given by the function $|f_B(\alpha)|^2$ arising from the boundary conditions, but is also affected by the functional integral itself.

In calculating this effect, Coleman (1988b) observed that although we are to integrate only over connected 4-manifolds, on a scale much larger than the wormhole


scale those manifolds that appear disconnected will really be connected by wormholes. Hence any sort of probability density or expectation value will contain as a factor a sum over disconnected manifolds consisting of arbitrary numbers of closed connected wormhole-free components.¹⁸ Just as for Feynman diagrams, this sum is the exponential of the path integral for a single closed connected wormhole-free manifold

$$F(\alpha) = \exp\left(\int_{CC} [dg]\,\exp(-S_\alpha[g])\right), \qquad (8.33)$$

where CC indicates that we include only closed connected wormhole-free manifolds, and $S_\alpha[g]$ is the action (8.32) with all fields other than $g_{\mu\nu}(x)$ integrated out.

The path integral in (8.33) can be evaluated by precisely the same methods as described above in connection with Hawking's (1984b) model [and used for this purpose by Coleman (1988b)]. The result is that the probability density for $\lambda_{\rm eff}$ contains a factor (for $\lambda_{\rm eff} > 0$)

$$F = \exp\left\{\exp\left[\frac{3\pi}{G\lambda_{\rm eff}} + O(1)\right]\right\}. \qquad (8.34)$$

The fact that this is now an exponential of an exponential, instead of a mere exponential, is not essential in solving the cosmological constant problem (though it is important in fixing other constants, as described at the end of this section). Either way, the probability distribution has an infinite peak at $\lambda_{\rm eff} \to 0+$, which, after normalizing so that the total probability is unity, means that $P(\alpha)$ has a factor

$$P(\alpha) \propto \delta(\lambda_{\rm eff}(\alpha)). \qquad (8.35)$$

In addition, as in Hawking's case, from the way that $F$ has been calculated it is clear that this $\lambda_{\rm eff}$ is the constant that appears in the effective action for pure gravity with all high-energy fluctuations integrated out; hence it is the cosmological constant relevant to astronomical observation.

Has the cosmological constant problem been solved? Perhaps so, but there are still some things to worry about in Coleman's approach, as also in the earlier work of Hawking. Here is a short list of qualms.

(1) Does Euclidean quantum cosmology have anything to do with the real world? It is essential to both Coleman and Hawking that the path integral be given by a stationary point of the Euclideanized action; the conclusion would be completely wiped out if in place of $\exp(3\pi/G\lambda_{\rm eff})$ we had found $\exp(3\pi i/G\lambda_{\rm eff})$. Some of the technical and conceptual difficulties of Euclidean quantum cosmology were discussed at the beginning of this section.

(2) What are the boundary conditions? It is always possible that the essential singularity in $\exp\{\exp[3\pi/G\lambda(\alpha)]\}$ is canceled by an essential zero in the a priori probability $|f_B(\alpha)|^2$. However, this is not the case for Hartle-Hawking boundary conditions, where $|f_B(\alpha)|^2$ is a simple Gaussian. Moreover, Coleman (1988b) has shown that in his theory such an essential zero would be destroyed by almost any perturbation of the boundary conditions; instead of its being unnatural to have zero cosmological constant, it would be highly unnatural not to. Still, the problem of boundary conditions is disturbing, because it reminds us that quantum cosmology is an incomplete theory.

(3) Are wormholes real? Coleman's calculation depends on there being a clear separation between the very large 4-manifolds, for which the long-range effective action is stationary (and large and negative), and very small wormholes, whose contribution to the action is of order unity (and generally positive). Furthermore, the wormholes have been assumed to be so well separated that we can ignore their interactions (the "dilute gas" approximation). It may be possible to construct a theory in which the wormhole scale [like $b$ in Eq. (8.22)] is somewhat larger than the Planck scale, large enough to allow the wormhole metric to be calculated classically, but we would still have to ask whether this is actually the case. Hawking (1984b) does not need to worry about wormholes, but how do we know that the 3-form gauge field is real? A related question for both authors: even granting the existence of the stationary point of the action at which $I_{\rm eff} = -3\pi/\lambda G$, how do we know that this is the dominant stationary point?

(4) What about the other terms in the effective action? For instance, suppose we include the 6-derivative term¹⁹

$$I_{\rm eff} = \int d^4x\,\sqrt{g}\left[\frac{\lambda(\alpha)}{8\pi G(\alpha)} + \frac{R}{16\pi G(\alpha)} + g(\alpha)G(\alpha)\,R_{\mu\nu}{}^{\lambda\rho}R_{\lambda\rho}{}^{\sigma\tau}R_{\sigma\tau}{}^{\mu\nu}\right], \qquad (8.36)$$

with $g(\alpha)$ a dimensionless coefficient that, like $\lambda$ and $G$, depends on the baby-universe parameters $\alpha_i$. Hawking and Coleman found a stationary point of this action for which $I_{\rm eff} \to -\infty$ when $\lambda(\alpha)G(\alpha) \to 0$, but for this purpose it is essential that $g(\alpha)$ remain bounded in this limit. (We recall that in our previous discussion of the higher-derivative terms in $I_{\rm eff}$, we assumed that the coefficient $m^{4-D}$ of terms with $D > 4$ derivatives remains less than $\lambda^{(4-D)/2}$ as $\lambda \to 0$.) But if we can let $1/\lambda G$ go to infinity, then why not let $g$ go to infinity also? In particular, why not use a dimensional factor $1/\lambda(\alpha)$ in place of $G(\alpha)$ in

¹⁸This sum actually includes manifolds that are truly not connected by wormholes or anything else, but their contribution is a harmless multiplicative factor, which will cancel out anyway in normalizing $P(\alpha)$.

¹⁹Terms involving the Ricci tensor $R_{\mu\nu}$ or its trace $R$ are not included here, because they represent merely a redefinition of the metric; see, e.g., Weinberg (1979a). The 4-derivative term $R_{\mu\nu\lambda\rho}R^{\mu\nu\lambda\rho}$ is not included, because it can be combined with terms involving $R_{\mu\nu}$ or $R$ to make a topological invariant and is therefore physically unmeasurable for fixed large-scale topology.
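
The step from the sum over disconnected manifolds to the exponential in Eq. (8.33), and from there to the double exponential in Eq. (8.34), can be sketched as follows. This is an illustration only, under the dilute, non-interacting assumption stated in the text; the symbol $Z_\alpha$ for the single-component path integral is introduced here and is not the author's notation, and the $O(1)$ corrections of Eq. (8.34) are dropped.

```latex
\begin{align*}
% path integral over one closed connected wormhole-free component
Z_\alpha &\equiv \int_{CC}[dg]\,\exp(-S_\alpha[g]), \\
% n identical disconnected components contribute Z^n / n!
F(\alpha) &= \sum_{n=0}^{\infty}\frac{Z_\alpha^{\,n}}{n!} = \exp(Z_\alpha), \\
% stationary-point estimate on the 4-sphere, as in Eq. (8.20)
Z_\alpha &\approx \exp\!\left(\frac{3\pi}{G\lambda_{\rm eff}}\right)
\;\Rightarrow\;
F \approx \exp\!\left[\exp\!\left(\frac{3\pi}{G\lambda_{\rm eff}}\right)\right].
\end{align*}
```

The factor $1/n!$ accounts for the $n$ components being indistinguishable, just as for disconnected vacuum diagrams in ordinary perturbation theory.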


the last term of Eq. (8.36)? This would completely invalidate the analysis of the singularity in the probability density $P(\alpha)$, and could well wipe it out.

The last of these four qualms suggests some interesting possibilities. Suppose we do assume that for some reason constants like $g(\alpha)$ in Eq. (8.36) are bounded. Then the effect of wormholes is not only to fix $\lambda(\alpha)$ at zero, but also to fix these other constants at their lower or upper bounds. [I think this is the correct interpretation of what Coleman (1988b) calls "the big fix."] For instance, for $g(\alpha)$ bounded and $|\lambda(\alpha)G(\alpha)| \ll 1$, the action (8.36) is stationary for a sphere of proper circumference $2\pi r$, where

$$r^2 \simeq \frac{3}{\lambda} - \frac{64 g\pi G^2\lambda}{3}, \qquad (8.37)$$

for which the effective action takes the value

$$I_{\rm eff} \simeq -\frac{3\pi}{G\lambda} - \frac{128 g\pi^2 G\lambda}{3}. \qquad (8.38)$$

Thus the probability distribution $\exp[\exp(-I_{\rm eff})]$ not only has an infinite peak at $\lambda(\alpha) = 0$, but also contains a factor

$$\exp\left[\frac{128 g\pi^2 G\lambda}{3}\,\exp\left(\frac{3\pi}{G\lambda}\right)\right]. \qquad (8.39)$$

For $G\lambda \to 0$, the quantity $G\lambda\exp(3\pi/G\lambda)$ becomes infinite, so the normalized probability will have a delta function at the upper bound of $g(\alpha)$. All constants in the effective action for gravitation, including terms with any numbers of derivatives, can be calculated in this way, but they all have to be bounded as $\lambda(\alpha)G(\alpha) \to 0$ for any of this to make sense.

It may be that the bounds (if any) on parameters like $g(\alpha)$ arise from the details of wormhole physics, in which case these remarks are not going to be useful numerically for some time. However, there is another more exciting possibility, that there are just unitarity bounds, which could be calculated working only with low-energy effective theory itself. Of course, we are not likely to be able to measure parameters like $g(\alpha)$, but it would still be nice to be able to calculate them, because up to now the only really unsatisfactory feature of the quantum theory of gravitation has been the apparent arbitrariness of this infinite set of parameters.

IX. OUTLOOK

All of the five approaches to the cosmological constant problem described in Secs. IV-VIII remain interesting. At present, the fifth, based on quantum cosmology, appears the most promising. However, if wormholes (or 3-form gauge fields) do produce a distribution of values for the cosmological constant, but without an infinite peak at $\lambda_{\rm eff} = 0$, then we will have to fall back on the anthropic principle to explain why $\lambda_{\rm eff}$ is not enormously larger than allowed by observation. Alternatively, it may be some change in the theory of gravity, like that described here in Sec. VII, that produces the distribution in values for $\lambda_{\rm eff}$. The approaches based on supersymmetry and adjustment mechanisms described in Secs. IV and VI seem least promising at present, but this may change.

All five approaches have one other thing in common: They show that any solution of the cosmological constant problem is likely to have a much wider impact on other areas of physics or astronomy. One does not need to explain the potential importance of supergravity and superstrings. A light scalar like that needed for adjustment mechanisms could show up macroscopically, as a "fifth force." Changing gravity by making ${\rm Det}\,g_{\mu\nu}$ not dynamical would make us rethink our quantum theories of gravitation, and wormholes might force all the constants in these theories to their outer bounds. Finally, and of greatest interest to astronomy, if it is only anthropic constraints that keep the effective cosmological constant within empirical limits, then this constant should be rather large, large enough to show up before long in astronomical observations.

Note added in proof. As might have been expected, in the time since this report was submitted for publication there have appeared a large number of preprints that follow up on various aspects of the work of Coleman (1988b) and Banks (1988). Here is a partial list: Accetta et al. (1988); Adler (1988); Fischler and Susskind (1988); Giddings and Strominger (1988c); Gilbert (1988); Grinstein and Wise (1988); Gupta and Wise (1988); Hosoya (1988); Klebanov, Susskind, and Banks (1988); Myers and Periwal (1988); Polchinski (1988); Rubakov (1988). I am not able to review all of these papers here. However, I do want to mention two further qualms, regarding Coleman's proposed solution of the cosmological constant problem, that are raised by some of these papers. First, Fischler and Susskind (1988), partly on the basis of conversations with V. Kaplunovsky, have pointed out that the exponential damping of large wormholes may be overcome by Coleman's double exponential. If this were the case, we would be confronted with closely packed wormholes of macroscopic as well as Planck scales. This would be a disaster for Coleman's proposed solution of

²⁰To the extent that it will become possible to calculate functions like $\lambda(\alpha)$, $G(\alpha)$, $g(\alpha)$, etc., in terms of the parameters in an underlying fundamental theory, such as a string theory, the location of the delta functions in $P$ may allow us to infer something about the values of the $\alpha_i$ and of the parameters in the underlying theory. However, without such an underlying theory, it is impossible to use calculations of $\lambda$, $G$, $g$, etc., to infer anything about the observed parameters of some intermediate theory like the standard model. This is because, in addition to charges, masses, etc., the standard model implicitly also involves parameters $\lambda_0$, $G_0$, $g_0$, ... appearing in the effective action for gravitation. When we integrate out the quarks, leptons, and gauge and Higgs bosons, we obtain new values for $\lambda$, $G$, $g$, etc.; but these new values depend on an equal number of unknowns $\lambda_0$, $G_0$, $g_0$, etc., as well as on charges and masses.
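
As a hedged cross-check of Eqs. (8.37) and (8.38), the action (8.36) can be evaluated on a 4-sphere of radius $r$ and made stationary in $r$. The sketch below is not from the original text. It assumes the curvature sign convention implied by $R = -4\lambda$ in Eq. (8.18), so that $R = -12/r^2$ on the sphere, and it assumes that the cubic term in (8.36) is the cyclic contraction $R_{\mu\nu}{}^{\lambda\rho}R_{\lambda\rho}{}^{\sigma\tau}R_{\sigma\tau}{}^{\mu\nu}$. The symbols $V_4$ and $K$ are introduced here for illustration, and only terms of first order in $g$ are kept.

```latex
\begin{align*}
% maximally symmetric 4-sphere of radius r
V_4 &= \tfrac{8}{3}\pi^2 r^4, \qquad
R_{\mu\nu}{}^{\lambda\rho} = K\left(\delta_\mu^\lambda\delta_\nu^\rho
 - \delta_\mu^\rho\delta_\nu^\lambda\right), \qquad K = -\frac{1}{r^2}, \\
R &= 12K = -\frac{12}{r^2}, \qquad
R_{\mu\nu}{}^{\lambda\rho}R_{\lambda\rho}{}^{\sigma\tau}R_{\sigma\tau}{}^{\mu\nu}
 = 48K^3 = -\frac{48}{r^6}, \\
% action (8.36) as a function of r
I_{\rm eff}(r) &= \frac{\pi\lambda}{3G}\,r^4 - \frac{2\pi}{G}\,r^2
 - \frac{128\pi^2 gG}{r^2}, \\
% stationarity in r, multiplied by G/(4\pi r)
\frac{dI_{\rm eff}}{dr} = 0 \;&\Rightarrow\;
\frac{\lambda r^2}{3} - 1 + \frac{64\pi gG^2}{r^4} = 0
\;\Rightarrow\;
r^2 \simeq \frac{3}{\lambda} - \frac{64 g\pi G^2\lambda}{3}, \\
% value at the stationary point, to first order in g
I_{\rm eff} &\simeq -\frac{3\pi}{G\lambda} - \frac{128 g\pi^2 G\lambda}{3}.
\end{align*}
```

With these conventions $-I_{\rm eff}$ increases with $g$, which is why the double exponential favors the upper bound of $g(\alpha)$.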


the cosmological constant problem, and would also indicate that we do not fully understand how to use Euclidean path integrals in quantum cosmology. Next, Polchinski (1988) has found that the Euclidean path integral over closed, connected, wormhole-free manifolds inside the exponential in (8.33) has a phase that might eliminate the peak in the probability distribution at zero cosmological constant. As pointed out here in footnote 15, when we use an effective action $I_{\rm eff}$ to evaluate such path integrals, the effective action must be taken as an input to calculations in which we include quantum fluctuations in massless particle fields with momenta up to some ultraviolet cutoff $\Lambda$. This cutoff must be taken as the same as the infrared cutoff that was used in calculating $I_{\rm eff}$, so that all fluctuations are taken into account. It was remarked in footnote 15 that $\Lambda$ must be taken very small, to avoid reintroducing a cosmological constant, but as Polchinski now remarks, no matter how small we take $\Lambda$, the integral over fluctuations in the gravitational field with momenta less than $\Lambda$ produces a phase in the integral. Since this phase appears inside the exponential in Eq. (8.33), if its real part is not positive definite there would be no exponential peak at zero cosmological constant. On the other hand, in the absence of wormholes this phase would appear as an overall factor in front of a single exponential, so it would not affect the peaking at zero cosmological constant found by Hawking (1984b).

ACKNOWLEDGMENTS

I have been greatly helped in preparing this review by conversations with many colleagues. Here is a list of a few of those to whom my thanks are especially due. Section II: G. Holton; Sec. III: E. Witten; Sec. IV: S. de Alwis, J. Polchinski, E. Witten; Sec. V: P. J. E. Peebles, P. Shapiro, E. Vishniac; Sec. VI: J. Polchinski; Sec. VII: C. Teitelboim, F. Wilczek; Sec. VIII: L. Abbott, S. Coleman, B. DeWitt, W. Fischler, S. Giddings, L. Susskind, C. Teitelboim, F. Wilczek. Of course, they take no responsibility for anything that I may have gotten wrong. Research was supported in part by the Robert A. Welch Foundation and NSF Grant No. PHY 8605978.

REFERENCES

Abbott, L., 1985, Phys. Lett. B 150, 427.
Abbott, L., 1988, Sci. Am. 258 (No. 5), 106.
Accetta, F. S., A. Chodos, F. Cooper, and B. Shao, 1988, "Fun with the wormhole calculus," Yale University Preprint No. YCTP-P20-88.
Adler, S. L., 1988, "On the Banks-Coleman-Hawking argument for the vanishing of the cosmological constant," Institute for Advanced Study Preprint No. IASSNS-HEP-88/35.
Albrecht, A., and P. J. Steinhardt, 1982, Phys. Rev. Lett. 48, 1220.
Arnowitt, R., S. Deser, and C. W. Misner, 1962, in Gravitation: An Introduction to Current Research, edited by L. Witten (Wiley, New York), p. 227.
Attick, J., G. Moore, and A. Sen, 1987, Institute for Advanced Studies preprint.
Augustine, 398, Confessions, translated by R. S. Pine-Coffin (Penguin Books, Harmondsworth, Middlesex, 1961), Book XI.
Bahcall, J., T. Piran, and S. Weinberg, 1987, Eds., Dark Matter in the Universe: 4th Jerusalem Winter School for Theoretical Physics (World Scientific, Singapore).
Bahcall, S. R., and S. Tremaine, 1988, Astrophys. J. 326, L1.
Banks, T., 1985, Nucl. Phys. B 249, 332.
Banks, T., 1988, University of California, Santa Cruz, Preprint No. SCIPP 88/09.
Banks, T., W. Fischler, and L. Susskind, 1985, Nucl. Phys. B 262, 159.
Barbieri, R., E. Cremmer, and S. Ferrara, 1985, Phys. Lett. B 163, 143.
Barbieri, R., S. Ferrara, D. V. Nanopoulos, and K. S. Stelle, 1982, Phys. Lett. B 113, 219.
Barr, S. M., 1987, Phys. Rev. D 36, 1691.
Barr, S. M., and D. Hochberg, 1988, Phys. Lett. B 211, 49.
Barrow, J. D., and F. J. Tipler, 1986, The Anthropic Cosmological Principle (Clarendon, Oxford).
Bernstein, J., and G. Feinberg, 1986, Eds., Cosmological Constants (Columbia University, New York).
Bludman, S. A., and M. Ruderman, 1977, Phys. Rev. Lett. 38, 255.
Brown, J. D., and C. Teitelboim, 1987a, Phys. Lett. B 195, 177.
Brown, J. D., and C. Teitelboim, 1987b, Nucl. Phys. B 297, 787.
Buchmüller, W., and N. Dragon, 1988a, University of Hannover Preprint No. ITP-UH 1/88.
Buchmüller, W., and N. Dragon, 1988b, Phys. Lett. B 207, 292.
Carter, B., 1974, in International Astronomical Union Symposium 63: Confrontation of Cosmological Theories with Observational Data, edited by M. S. Longair (Reidel, Dordrecht), p. 291.
Carter, B., 1983, in The Constants of Physics, Proceedings of a Royal Society Discussion Meeting, 1983, edited by W. H. McCrea and M. J. Rees (printed for The Royal Society, London, at the University Press, Cambridge), p. 137.
Casimir, H. B. G., 1948, Proc. K. Ned. Akad. Wet. 51, 635.
Chang, N.-P., D.-X. Li, and J. Perez-Mercader, 1988, Phys. Rev. Lett. 60, 882.
Coleman, S., 1988a, Nucl. Phys. B 307, 867.
Coleman, S., 1988b, "Why there is nothing rather than something: A theory of the cosmological constant," Harvard University Preprint No. HUTP-88/A022.
Coleman, S., and F. de Luccia, 1980, Phys. Rev. D 21, 3305.
Coughlan, G. D., I. Kani, G. G. Ross, and G. Segre, 1988, CERN Preprint No. TH.5014/88.
Cremmer, E., S. Ferrara, C. Kounnas, and D. V. Nanopoulos, 1983, Phys. Lett. B 133, 61.
Cremmer, E., B. Julia, J. Scherk, S. Ferrara, L. Girardello, and P. van Nieuwenhuizen, 1978, Phys. Lett. B 79, 231.
Cremmer, E., B. Julia, J. Scherk, S. Ferrara, L. Girardello, and P. van Nieuwenhuizen, 1979, Nucl. Phys. B 147, 105.
Davies, P. C. W., 1982, The Accidental Universe (Cambridge University, Cambridge).
de Sitter, W., 1917, Mon. Not. R. Astron. Soc. 78, 3 (reprinted in Bernstein and Feinberg, 1986).
de Vaucouleurs, G., 1982, Nature (London) 299, 303.
de Vaucouleurs, G., 1983, Astrophys. J. 268, 468, Appendix B.
DeWitt, B., 1967, Phys. Rev. 160, 1113.
Dicke, R. H., 1961, Nature (London) 192, 440.
Dine, M., W. Fischler, and M. Srednicki, 1981, Phys. Lett. B 104, 199.


Dine, M., R. Rohm, N. Seiberg, and E. Witten, 1985, Phys. Lett. B 156, 55.
Dine, M., and N. Seiberg, 1986, Phys. Rev. Lett. 57, 2625.
Dirac, P. A. M., 1937, Nature (London) 139, 323.
Dolgov, A. D., 1982, in The Very Early Universe: Proceedings of the 1982 Nuffield Workshop at Cambridge, edited by G. W. Gibbons, S. W. Hawking, and S. T. C. Siklos (Cambridge University, Cambridge), p. 449.
Dreitlein, J., 1974, Phys. Rev. Lett. 34, 777.
Eddington, A. S., 1924, The Mathematical Theory of Relativity, 2nd Ed. (Cambridge University, London).
Einstein, A., 1917, Sitzungsber. Preuss. Akad. Wiss. Phys.-Math. Kl. 142 [English translation in The Principle of Relativity (Methuen, 1923, reprinted by Dover Publications), p. 177; and in Bernstein and Feinberg, 1986].
Einstein, A., 1919, Sitzungsber. Preuss. Akad. Wiss., Phys.-Math. Kl. [English translation in The Principle of Relativity (Methuen, 1923, reprinted by Dover Publications), p. 191].
Ellis, J., C. Kounnas, and D. V. Nanopoulos, 1984, Nucl. Phys. B 247, 373.
Ellis, J., A. B. Lahanas, D. V. Nanopoulos, and K. Tamvakis, 1984, Phys. Lett. B 134, 429.
Ellis, J., N. C. Tsamis, and M. Voloshin, 1987, Phys. Lett. B 194, 291.
Fischler, W., and L. Susskind, 1988, "A wormhole catastrophe," Texas Preprint No. UTTG-26-88.
Ford, L. H., 1987, Phys. Rev. D 35, 2339.
Freese, K., F. C. Adams, J. A. Frieman, and E. Mottola, 1987, Nucl. Phys. B 287, 797.
Friedan, D., E. Martinec, and S. Shenker, 1986, Nucl. Phys. B 271, 93.
Friedmann, A., 1924, Z. Phys. 21, 326 [English translation in Bernstein and Feinberg, 1986, Eds., Cosmological Constants (Columbia University, New York)].
Gibbons, G. W., S. W. Hawking, and M. J. Perry, 1978, Nucl. Phys. B 138, 141.
Giddings, S. B., and A. Strominger, 1988a, Nucl. Phys. B 306, 890.
Giddings, S. B., and A. Strominger, 1988b, Nucl. Phys. B 307, 854.
Giddings, S. B., and A. Strominger, 1988c, "Baby universes, third quantization, and the cosmological constant," Harvard Preprint No. HUTP-88/A036.
Gilbert, G., 1988, "Wormhole induced proton decay," Caltech Preprint No. CALT-68-1524.
Grinstein, B., and M. B. Wise, 1988, "Light scalars in quantum gravity," Caltech Preprint No. CALT-68-1505.
Grisaru, M. T., W. Siegel, and M. Rocek, 1979, Nucl. Phys. B 159, 429.
Gross, D. J., 1984, Nucl. Phys. B 236, 349.
Gupta, A. K., and M. B. Wise, 1988, "Comment on wormhole correlations," Caltech Preprint No. CALT-68-1520.
Guth, A. H., 1981, Phys. Rev. D 23, 347.
Hartle, J. B., 1987, in Gravitation in Astrophysics: Cargese 1986, edited by B. Carter and J. B. Hartle (Plenum, New York), p. 329.
Hartle, J. B., and S. W. Hawking, 1983, Phys. Rev. D 28, 2960.
Hawking, S. W., 1978, Nucl. Phys. B 144, 349.
Hawking, S. W., 1979, in Three Hundred Years of Gravitation, edited by S. W. Hawking and W. Israel (Cambridge University, Cambridge).
Hawking, S. W., 1982, Commun. Math. Phys. 87, 395.
Hawking, S. W., 1983, Philos. Trans. R. Soc. London, Ser. A 310, 303.
Hawking, S. W., 1984a, Nucl. Phys. B 239, 257.
Hawking, S. W., 1984b, Phys. Lett. B 134, 403.
Hawking, S. W., 1984c, in Relativity, Groups and Topology II, NATO Advanced Study Institute Session XL... Les Houches 1983, edited by B. S. DeWitt and Raymond Stora (Elsevier, Amsterdam), p. 336.
Hawking, S. W., 1987a, remarks quoted by M. Gell-Mann, Phys. Scr. T15, 202 (1987).
Hawking, S. W., 1987b, Phys. Lett. B 195, 337.
Hawking, S. W., 1988, Phys. Rev. D 37, 904.
Hawking, S., and D. Page, 1986, Nucl. Phys. B 264, 185.
Henneaux, M., and C. Teitelboim, 1984, Phys. Lett. B 143, 415.
Henneaux, M., and C. Teitelboim, 1988, "The cosmological constant and general covariance," University of Texas preprint.
Hosoya, A., 1988, "A diagrammatic derivation of Coleman's vanishing cosmological constant," Hiroshima Preprint No. RRK-88-28.
Kim, J., 1979, Phys. Rev. Lett. 43, 103.
Klebanov, I., L. Susskind, and T. Banks, 1988, "Wormholes and cosmological constant," SLAC Preprint No. SLAC-Pub-4705.
Knapp, G. R., and J. Kormendy, 1987, Eds., Dark Matter in the Universe: I.A.U. Symposium No. 117 (Reidel, Dordrecht).
Lavrelashvili, G. V., V. A. Rubakov, and P. G. Tinyakov, 1987, Pis'ma Zh. Eksp. Teor. Fiz. 46, 134 [JETP Lett. 46, 167 (1987)].
Lavrelashvili, G. V., V. A. Rubakov, and P. G. Tinyakov, 1988, Nucl. Phys. B 299, 757.
Lemaitre, G., 1927, Ann. Soc. Sci. Bruxelles, Ser. 1 47, 49.
Lemaitre, G., 1931, Mon. Not. R. Astron. Soc. 91, 483.
Linde, A. D., 1974, Pis'ma Zh. Eksp. Teor. Fiz. 19, 320 [JETP Lett. 19, 183 (1974)].
Linde, A. D., 1982, Phys. Lett. B 129, 389.
Linde, A. D., 1986, Phys. Lett. B 175, 395.
Linde, A. D., 1987, Phys. Scr. T15, 169.
Linde, A. D., 1988a, Phys. Lett. B 200, 272.
Linde, A. D., 1988b, Phys. Lett. B 202, 194.
Loh, E. D., 1986, Phys. Rev. Lett. 57, 2865.
Loh, E. D., and E. J. Spillar, 1986, Astrophys. J. 303, 154.
Martinec, E., 1986, Phys. Lett. B 171, 189.
Moore, G., 1987a, Nucl. Phys. B 293, 139.
Moore, G., 1987b, Institute for Advanced Study Preprint No. IASSNS-HEP-87/59, to be published in the proceedings of the Cargese School on Nonperturbative Quantum Field Theory.
Morozov, A., and A. Perelomov, 1987, Phys. Lett. B 199, 209.
Myers, R. C., and V. Periwal, 1988, "Constants and correlations in the Coleman calculus," Santa Barbara Preprint No. NSF-ITP-88-151.
Page, D., 1987, in The World and I (in press).
Pais, A., 1982, 'Subtle is the Lord...': The Science and the Life of Albert Einstein (Oxford University, New York).
Peccei, R. D., J. Sola, and C. Wetterich, 1987, Phys. Lett. B 195, 183.
Peebles, P. J. E., 1984, Astrophys. J. 284, 439.
Peebles, P. J. E., 1987a, in Proceedings of the Summer Study on the Physics of the Superconducting Super Collider, edited by R. Donaldson and J. Marx (Division of Particles and Fields of the APS, New York).
Peebles, P. J. E., 1987b, Publ. Astron. Soc. Pac., in press.
Peebles, P. J. E., and B. Ratra, 1988, Astrophys. J. Lett. 325, L17.
Petrosian, V., E. E. Salpeter, and P. Szekeres, 1967, Astrophys. J. 147, 1222.
Polchinski, J., 1986, Commun. Math. Phys. 104, 37.


Polchinski, J., 1987, private communication.
Polchinski, J., 1988, in preparation.
Ratra, B., and P. J. E. Peebles, 1988, Phys. Rev. D 37, 3406.
Rees, M. J., 1987, New Sci. August 6, 1987, p. 43.
Renzini, A., 1986, in Galaxy Distances and Deviations from Universal Expansion, edited by B. F. Madore and R. B. Tully (Reidel, Dordrecht), p. 177.
Reuter, M., and C. Wetterich, 1987, Phys. Lett. 188, 38.
Rohm, R., 1984, Nucl. Phys. B 237, 553.
Rowan-Robinson, M., 1968, Mon. Not. R. Astron. Soc. 141, 445.
Rubakov, V. A., 1988, "On the third quantization and the cosmological constant," DESY preprint.
Shklovsky, I., 1967, Astrophys. J. 150, L1.
Slipher, V. M., 1924, table in Eddington (1924), The Mathematical Theory of Relativity, 2nd Ed. (Cambridge University, London), p. 162.
Sparnaay, M. J., 1957, Nature (London) 180, 334.
Strominger, A., 1984, Phys. Rev. Lett. 52, 1733.
Teitelboim, C., 1982, Phys. Rev. D 25, 3159.
Turner, M. S., G. Steigman, and L. M. Krauss, 1984, Phys. Rev. Lett. 52, 2090.
Van der Bij, J. J., H. Van Dam, and Y. J. Ng, 1982, Physica 116A, 307.
Veltman, M., 1975, Phys. Rev. Lett. 34, 777.
Vilenkin, A., 1986, Phys. Rev. D 33, 3560.
Vilenkin, A., 1988a, Phys. Rev. D 37, 888.
Vilenkin, A., 1988b, Tufts Preprint No. TUTP-88-3.
Weinberg, S., 1972, Gravitation and Cosmology (Wiley, New York).
Weinberg, S., 1979a, in General Relativity: An Einstein Centenary Survey, edited by S. W. Hawking and W. Israel (Cambridge University, Cambridge), p. 800.
Weinberg, S., 1979b, Physica 96A, 327.
Weinberg, S., 1982, Phys. Rev. Lett. 48, 1776.
Weinberg, S., 1983, unpublished remarks at the workshop on "Problems in Unification and Supergravity," La Jolla Institute, 1983.
Weinberg, S., 1987, Phys. Rev. Lett. 59, 2607.
Wetterich, C., 1988, Nucl. Phys. B 302, 668.
Wheeler, J. A., 1964, in Relativity, Groups and Topology, Lectures Delivered at Les Houches, 1963..., edited by B. DeWitt and C. DeWitt (Gordon and Breach, New York), p. 317.
Wheeler, J. A., 1968, in Battelle Rencontres, edited by C. DeWitt and J. A. Wheeler (Benjamin, New York).
Wilczek, F., 1984, Phys. Rep. 104, 143.
Wilczek, F., 1985, in How Far Are We from the Gauge Forces: Proceedings of the 1983 Erice Conference, edited by A. Zichichi (Plenum, New York), p. 208.
Wilczek, F., and A. Zee, 1983, unpublished work quoted by Zee (1985), in High Energy Physics: Proceedings of the Annual Orbis Scientiae, edited by S. L. Mintz and A. Perlmutter (Plenum, New York).
Witten, E., 1983, in Proceedings of the 1983 Shelter Island Conference on Quantum Field Theory and the Fundamental Problems of Physics, edited by R. Jackiw, N. N. Khuri, S. Weinberg, and E. Witten (MIT, Cambridge, Massachusetts), p. 273.
Witten, E., 1985, Phys. Lett. B 155, 151.
Witten, E., and J. Bagger, 1982, Phys. Lett. B 115, 202.
Zee, A., 1985, in High Energy Physics: Proceedings of the 20th Annual Orbis Scientiae, 1983, edited by S. L. Mintz and A. Perlmutter (Plenum, New York).
Zeldovich, Ya. B., 1967, Pis'ma Zh. Eksp. Teor. Fiz. 6, 883 [JETP Lett. 6, 316 (1967)].
Zumino, B., 1975, Nucl. Phys. B 89, 535.



UTTG-08-08
arXiv:0810.2831v1 [hep-ph] 16 Oct 2008

Non-Gaussian Correlations Outside the Horizon II: The General Case

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

The results of a recent paper [0808.2909] are generalized. A more detailed proof is presented that under essentially all conditions, the non-linear classical equations governing matter and gravitation in cosmology have “adiabatic” solutions in which, far outside the horizon, in a suitable gauge, the reduced spatial metric $g_{ij}(\mathbf{x},t)/a^2(t)$ becomes a time-independent function $G_{ij}(\mathbf{x})$, and all perturbations to the other metric components and to all matter variables vanish. The corrections are of order $a^{-2}$, and their $\mathbf{x}$-dependence is now explicitly given in terms of $G_{ij}(\mathbf{x})$ and its derivatives. The previous results for the time-dependence of the corrections to $g_{ij}(\mathbf{x},t)/a^2(t)$ in the case of multi-scalar field theories are now shown to apply for any theory whose anisotropic inertia vanishes to order $a^{-2}$. Further, it is shown that the adiabatic solutions are attractive as $a$ becomes large for the case of single field inflation and now also for thermal equilibrium with no non-zero conserved quantities, and the $O(a^{-2})$ corrections to the other dynamical variables are explicitly calculated in both cases.


∗ Electronic address: weinberg@physics.utexas.edu

I. Introduction

In a recent paper [1], I addressed the problem of the evolution of non-Gaussian correlations during the long period when the observationally interesting wave lengths were outside the horizon, and the universe passed through transitions such as reheating and baryonsynthesis that are not well understood. It was shown on general grounds that, apart from quantum corrections, in a suitable coordinate system, the exact non-linear field equations always have an adiabatic solution for which $g_{ij}/a^2$ becomes time independent outside the horizon, and the perturbations to $g_{00}$ and $g_{i0}$ and to all matter variables become negligible. This, however, is only part of the problem. In order to conclude in some class of theories that cosmological perturbations were really described by the adiabatic solution throughout the time when they were outside the horizon, it is necessary to show that for these theories the adiabatic solution is attractive in the limit of large $a(t)$. In [1] the corrections to this solution for the metric were calculated explicitly for a theory with any number of scalar fields and an arbitrary potential, and it was shown that for all such theories this solution is attractive as far as the spatial part of the metric is concerned, but this could be shown for the scalar field perturbations only in the case of single field inflation [2]. The present paper clarifies these general arguments in Section II, and shows in Section III that the explicit solution for the metric obtained in [1] applies to any theory that (like scalar field theories) has vanishing anisotropic inertia to lowest order in perturbations. Sections IV–VI present an improved argument that this solution is attractive for all metric and matter variables, not only in the case of single field inflation, but also for a sufficiently long period of local thermal equilibrium with no non-zero conserved quantities.

II. The General Adiabatic Solution

In this section we will give a general broken-symmetry argument that shows the existence of certain adiabatic solutions of the non-linear field equations for the metric and matter variables. The discussion will follow along the same lines as in Section II of [1], but we will here obtain more detailed results that are of some interest in themselves and will be used in the next section.

Before proceeding to this task, it should be emphasized that the universal existence of adiabatic solutions of the field equations does not in itself mean that these are the solutions that apply to the fluctuations in matter and gravitation actually present in our universe. We will be able to show in Sections IV–VI that these solutions are attractive in two cases: single field inflation, and local thermal equilibrium with no non-zero conserved quantities. Based on our experience with the linearized equations, it is plausible that fluctuations in these cases are in the basin of attraction of the adiabatic solutions, but we will not try to prove this for the non-linear equations.
Turning to the existence of the adiabatic solutions, our discussion in this
section is based on two very general assumptions:

1. We assume that the dynamical equations governing matter and gravitation are generally covariant. This of course is just a way of implementing the Principle of Equivalence.

2. We assume that whatever the dynamical equations governing the metric and matter (including radiation) variables may be, these equations have a solution in which the metric takes the flat-space Robertson–Walker form, with $g_{00} = -1$, $g_{0i} = 0$, and $\tilde g_{ij} \equiv g_{ij}/a^2 = \delta_{ij}$, and in which all matter variables take their unperturbed form; that is, all densities and pressures and scalar fields are functions only of time, and all velocities and other 3-vectors vanish.

We will show that under these conditions, for a suitable choice of space-
time coordinates, these equations always also have a family of solutions that
for large a(t) have the following properties:

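1. The metric for any of these solutions has components with

$$
\begin{aligned}
g_{ij}(\mathbf{x},t) &= a^2(t)\Big[G_{ij}(\mathbf{x}) + f_1(t)R_{ij}(\mathbf{x}) + f_2(t)G_{ij}(\mathbf{x})R(\mathbf{x}) + \dots\Big], \\
g_{i0}(\mathbf{x},t) &= f_3(t)\,\partial_i R(\mathbf{x}) + \dots, \\
g_{00}(\mathbf{x},t) &= -1 + f_4(t)R(\mathbf{x}) + \dots.
\end{aligned}
\tag{1}
$$

2. Whether or not the energy and momentum of any particular constituent of the universe is separately conserved, for this solution its energy-momentum tensor has the form

$$
\begin{aligned}
T_{ij}(\mathbf{x},t) &= a^2(t)\Big[G_{ij}(\mathbf{x})\bar p(t) + g_1(t)R_{ij}(\mathbf{x}) + g_2(t)G_{ij}(\mathbf{x})R(\mathbf{x}) + \dots\Big], \\
T_{i0}(\mathbf{x},t) &= g_3(t)\,\partial_i R(\mathbf{x}) + \dots, \\
T_{00}(\mathbf{x},t) &= \bar\rho(t) + g_4(t)R(\mathbf{x}) + \dots.
\end{aligned}
\tag{2}
$$

3. Four-scalars $s(\mathbf{x},t)$ such as temperatures, number densities, or scalar fields, have the form

$$
s(\mathbf{x},t) = \bar s(t) + h(t)R(\mathbf{x}) + \dots. \tag{3}
$$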

Here $G_{ij}(\mathbf{x})$ is an arbitrary positive matrix function only of the spatial coordinates, which can conveniently be taken as the value of $g_{ij}(\mathbf{x},t)/a^2(t)$ at some time $t = T$; $R_{ij}(\mathbf{x})$ and $R(\mathbf{x})$ are the Ricci tensor and curvature scalar for the metric $G_{ij}(\mathbf{x})$; the functions $f_n(t)$, $g_n(t)$ and $h(t)$ are of order $a^{-2}(t)$; dots denote terms of higher order in $1/a(t)$; and a bar over any quantity indicates its unperturbed value. Although these solutions exist for any $G_{ij}(\mathbf{x})$, quantum fluctuations in inflation produce a particular stochastic $G_{ij}(\mathbf{x})$, whose properties will not concern us here. Aside from quantum loop effects, a term of order $a^{-n}$ makes a contribution to correlation functions with characteristic wave number $k$ that is expected to be suppressed outside the horizon by factors of order $(k/aH)^n$, where as usual $H \equiv \dot a/a$.
To illustrate what is meant by functions of order $a^{-2}(t)$, we note here that, as we will show in the next section, in a suitable coordinate system, as long as there is no anisotropic inertia to order $a^{-2}$, the functions $f_1(t)$ and $f_2(t)$ are in general given by

$$
f_1(t) = 2\int_T^t \frac{dt'}{a^3(t')}\int_T^{t'} a(t'')\,dt'' \tag{4}
$$

$$
f_2(t) = -\frac{1}{2}\int_T^t \frac{dt'}{a^3(t')}\int_T^{t'} a(t'')\,dt'' + \frac{1}{2}\int_T^t \frac{H^2(t')\,dt'}{\dot H(t')\,a^3(t')}\int_T^{t'} a(t'')\,dt'' - \frac{1}{2}\int_T^t \frac{H(t')\,dt'}{a^2(t')\,\dot H(t')}, \tag{5}
$$

while the coordinate system is defined so that $f_3(t) = 0$. Here $T$ is an arbitrary time, at which we choose $g_{ij}/a^2$ to equal $G_{ij}$. For instance, during the radiation-dominated era when $a \propto t^{1/2}$, at a time $t \gg T$, these give

$$
f_1(t) \to \frac{1}{3a^2(t)H^2(t)} \tag{6}
$$

$$
f_2(t) \to 0. \tag{7}
$$

The space derivatives in $R_{ij}(\mathbf{x})$ contribute two factors of a characteristic wave number $k$, so that these $O(a^{-2})$ terms do indeed make contributions of order $(k/aH)^2$.

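The radiation-era limits (6) and (7) can be checked numerically from the integrals (4) and (5). The following is a minimal sketch, not part of the original paper: the function name `adiabatic_f1_f2`, the trapezoid quadrature, and the choice of units ($a = t^{1/2}$, $T = 1$) are assumptions made for illustration. In this example $f_1$ grows linearly in $t$ while $f_2$ stays bounded, so the sketch reads Eq. (7) as a statement at leading order in $t/T$, that is, $f_2/f_1 \to 0$.

```python
import math

def adiabatic_f1_f2(a, H, Hdot, T, t, n=20000):
    """Evaluate Eqs. (4) and (5) with the trapezoid rule on a uniform time grid.

    a, H, Hdot are callables giving the scale factor, the expansion rate and
    its time derivative (hypothetical helper, not from the paper).
    """
    h = (t - T) / n
    ts = [T + i * h for i in range(n + 1)]
    # inner[i] approximates the inner integral of a(t'') from T to ts[i]
    inner = [0.0]
    for i in range(1, n + 1):
        inner.append(inner[-1] + 0.5 * h * (a(ts[i - 1]) + a(ts[i])))

    def outer(integrand):
        # trapezoid rule for the outer integral over t' from T to t
        vals = [integrand(tp, I) for tp, I in zip(ts, inner)]
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    f1 = 2.0 * outer(lambda tp, I: I / a(tp) ** 3)
    f2 = outer(lambda tp, I: -0.5 * I / a(tp) ** 3
               + 0.5 * H(tp) ** 2 * I / (Hdot(tp) * a(tp) ** 3)
               - 0.5 * H(tp) / (a(tp) ** 2 * Hdot(tp)))
    return f1, f2

# Radiation domination in illustrative units: a = t^(1/2), H = 1/(2t)
a = lambda t: math.sqrt(t)
H = lambda t: 0.5 / t
Hdot = lambda t: -0.5 / t ** 2

T, t = 1.0, 1.0e4
f1, f2 = adiabatic_f1_f2(a, H, Hdot, T, t)
ratio_f1 = f1 * 3.0 * a(t) ** 2 * H(t) ** 2  # compares f1 with 1/(3 a^2 H^2), Eq. (6)
ratio_f2 = f2 / f1                            # size of f2 relative to f1, Eq. (7)
```

With $t/T = 10^4$, `ratio_f1` should be close to one and `ratio_f2` much smaller than one; other backgrounds can be tried by passing different callables.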
On the other hand, the function $f_4(t)$ depends on the nature of the constituents of the universe. We show in Section V that during single field inflation

$$
f_4(t) = -\frac{1}{2\dot H(t)}\left[-\frac{1}{a^2(t)} + \frac{H(t)}{a^3(t)}\int_T^t a(t')\,dt'\right], \tag{8}
$$

while in Section VI we show that in a period of thermal equilibrium with no non-zero conserved quantities,

$$
\begin{aligned}
f_4(t) ={}& \frac{\dot H(t)}{H^2(t)}\int_T^t \left(\frac{H(t')}{\dot H(t')} + \frac{\ddot H(t')}{6\dot H^2(t')}\right)\left(-\frac{1}{a^2(t')} + \frac{H(t')}{a^3(t')}\int_T^{t'} a(t'')\,dt''\right)dt' \\
& - \frac{1}{2\dot H(t)}\left[-\frac{1}{a^2(t)} + \frac{H(t)}{a^3(t)}\int_T^t a(t')\,dt'\right].
\end{aligned}
\tag{9}
$$
The first step in proving the existence of these solutions is to consider instead a trial configuration, in which $g_{ij}(\mathbf{x},t) = a^2(t)G_{ij}(\mathbf{x})$, where $G_{ij}(\mathbf{x})$ is an arbitrary positive matrix function of position, and all other variables are the same as for the Robertson–Walker solution; that is, $g_{00} = -1$, $g_{i0} = 0$; all densities and pressures take their unperturbed values; and all velocities vanish. Note that this is not a small perturbation to the Robertson–Walker solution, for no assumption will be made that $G_{ij}(\mathbf{x})$ is close to $\delta_{ij}$. If $G_{ij}(\mathbf{x})$ were constant then this would be an exact solution, since it could be obtained from the Robertson–Walker solution by a coordinate transformation $x^i \to A^i{}_j x^j$, where $A^k{}_i A^k{}_j = G_{ij}$. Thus we expect the trial configuration to fail to satisfy the field equations only by terms involving derivatives of $G_{ij}(\mathbf{x})$. These terms must be accompanied with factors of $1/a$, as required by the condition that these field equations must be invariant under the scale transformations

$$
x^i \to \lambda x^i, \qquad a(t) \to a(t)/\lambda, \tag{10}
$$

under which $g_{ij}/a^2$, $g_{i0}/a$, $g_{00}$, $T_{ij}/a^2$, $T_{i0}/a$, $T_{00}$ (for each constituent of the universe), and all pressures, densities, temperature, etc. are invariant. (This is just a different way of expressing invariance under the coordinate transformations $x^i \to \lambda x^i$, under which $g_{\mu\nu}$ and $T_{\mu\nu}$ transform as tensors, and $a(t)$ is invariant.) Hence for sufficiently large $a(t)$, it is a reasonable ansatz to seek a solution in which deviations of all scale-invariant metric and matter variables from the trial configuration are small perturbations.
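The power counting behind Eq. (10) can be spelled out in one line. This is a sketch added for illustration rather than a step taken from the paper: it assumes that $G_{ij}$, being the scale-invariant combination $g_{ij}/a^2$ at $t = T$, keeps its values under the rescaling, so that only the spatial derivatives and the factors of $a$ pick up powers of $\lambda$.

```latex
% Under x^i -> \lambda x^i, a -> a/\lambda, with G_{ij} held fixed:
\partial_i \to \lambda^{-1}\partial_i , \qquad
R_{ij} \to \lambda^{-2} R_{ij} , \qquad
\partial_i R \to \lambda^{-3}\,\partial_i R ,
% so the combinations with matching powers of 1/a are invariant:
\qquad\Longrightarrow\qquad
a^{-2} R_{ij} \to a^{-2} R_{ij} , \qquad
a^{-3}\,\partial_i R \to a^{-3}\,\partial_i R .
```

On this counting, a term built from $n$ spatial derivatives of $G_{ij}$ can enter a scale-invariant equation only with a factor $a^{-n}$, which is why derivative terms are suppressed for large $a(t)$.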
The field equations for the complete set of scale-invariant perturbations then take a form that can be symbolized as $Dq = S$, where $q$ is a column of scale-invariant perturbations, $D$ is a matrix formed from a linear combination of first and higher time derivatives whose coefficients are scale-invariant $\mathbf{x}$-independent functions of time, and $S$ is a column of source terms, given by a linear combination of appropriate $\mathbf{x}$-dependent 3-tensors formed from derivatives of $G_{ij}(\mathbf{x})$, with factors of $a(t)$ as required to make the source terms scale-invariant. Concrete examples will be given in Sections III, V, and VI. But even without going into the details of these equations, we can see the general features of a class of solutions. If we make the ansatz that the solution for each perturbation is a linear combination of the same $\mathbf{x}$-dependent 3-tensors that appear in the source terms for this perturbation, with coefficients given by functions only of time, then the dynamical equations reduce to a set of coupled ordinary inhomogeneous differential equations for these coefficient functions, with source terms having whatever powers of $1/a(t)$ were required by scale invariance in the original equations. Such equations will always have an adiabatic solution, in which the functions of time accompanying each 3-tensor are of an order in $1/a(t)$ that is the same as the number of powers of $1/a(t)$ in the corresponding source term, plus some number of solutions of the homogeneous equations $Dq = 0$ that may in general increase or decrease as $a(t)$ increases. It is the time-dependence of the solutions of the homogeneous equation that tells us whether the adiabatic solution is attractive or not, the question addressed in Sections IV through VI.
For $g_{ij}/a^2$ the source term must be a linear combination of the scale-invariant tensors $a^{-2}(t)R_{ij}(\mathbf{x})$ and $a^{-2}(t)G_{ij}(\mathbf{x})R(\mathbf{x})$, plus terms with more than two derivatives that are suppressed by more than two powers of $1/a(t)$. For $g_{i0}/a$ the source term must be a term proportional to the scale-invariant 3-vector $a^{-3}(t)\partial_i R(\mathbf{x})$, plus terms with more than three derivatives that are suppressed by more than three powers of $1/a(t)$. For $g_{00}$ the source term must be a term proportional to the scale-invariant scalar $a^{-2}(t)R(\mathbf{x})$, plus terms with more than two derivatives that are suppressed by more than two powers of $1/a(t)$. Thus we expect these equations to have solutions of the form (1). The same reasoning applies to whatever matter variables (densities, pressures, velocities, and if necessary anisotropic inertia) enter in the energy-momentum tensors for each constituent of the universe, giving energy-momentum tensors of the form (2) and scalars of the form (3).

It should be noted that Eq. (2) tells us that $\delta T_{i0}$ is a gradient to order $a^{-2}$, so that to this order there is no vorticity. On the other hand, Eq. (2) allows terms in $\delta T_{ij}$ that are not proportional to $\delta_{ij}$, so there is no general argument against the presence of anisotropic inertia to order $a^{-2}$. The results of [1] show that there is no anisotropic inertia to this order in a theory of multiple scalar fields with an arbitrary potential, but solution of the collisionless Boltzmann equation [3] shows that anisotropic inertia of order $a^{-2}$ can be produced in the linear approximation by gravitational perturbations of this order in a gas of non-interacting relativistic particles.

III. Explicit Solution for the Metric

We will now verify by detailed calculation that, under the assumptions of the previous section, for large $a(t)$ there is indeed a solution of the gravitational field equations for which, in a suitable spacetime coordinate system, the metric takes the form (1), and we will calculate the functions $f_n(t)$ appearing in this solution. For the metric, we use the ADM parameterization [4]:

$$
\begin{aligned}
& g_{00} = -N^2 + g_{ij}N^iN^j, \qquad g_{0i} = g_{ij}N^j \equiv N_i, \\
& g^{00} = -N^{-2}, \qquad g^{i0} = N^i/N^2, \qquad g^{ij} = {}^{(3)}g^{ij} - N^iN^j/N^2,
\end{aligned}
\tag{11}
$$

where ${}^{(3)}g^{ij}$ is the reciprocal of the $3\times 3$ matrix $g_{ij}$. It will be convenient also to write

$$
g_{ij}(\mathbf{x},t) = a^2(t)\,\tilde g_{ij}(\mathbf{x},t), \tag{12}
$$

where $a(t)$ is the Robertson–Walker scale factor appearing in the unperturbed solution. In this notation, and in units with $8\pi G \equiv 1$, the gravitational field equations are

$$
\tilde\nabla_i\Big[C^i{}_j - \delta^i{}_j C^k{}_k\Big] = -N\,T^0{}_j, \tag{13}
$$

$$
-a^{-2}\tilde g^{ij}\tilde R_{ij} - C^i{}_j C^j{}_i + \big(C^i{}_i\big)^2 = 2N^2 T^{00}, \tag{14}
$$

$$
\begin{aligned}
\tilde R_{ij} - C^k{}_k C_{ij} + 2C_{ik}C^k{}_j + N^{-1}\Big[&-\dot C_{ij} + C^k{}_i\tilde\nabla_j N_k + C^k{}_j\tilde\nabla_i N_k \\
& + N^k\tilde\nabla_k C_{ij} + \tilde\nabla_i\tilde\nabla_j N\Big] = -T_{ij} + \frac{1}{2}a^2\tilde g_{ij}T^\lambda{}_\lambda,
\end{aligned}
\tag{15}
$$

where $\tilde g^{ij}(\mathbf{x},t)$ is the reciprocal of the matrix $\tilde g_{ij}(\mathbf{x},t)$; $\tilde R_{ij}(\mathbf{x},t)$ is the three-dimensional Ricci tensor for the metric $\tilde g_{ij}(\mathbf{x},t)$; and $C^i{}_j(\mathbf{x},t)$ is the extrinsic curvature of the surfaces of fixed time

$$
C^i{}_j \equiv a^{-2}\tilde g^{ik}C_{kj}, \qquad C_{ki} \equiv \frac{1}{2N}\Big[2a\dot a\,\tilde g_{ki} + a^2\dot{\tilde g}_{ki} - \tilde\nabla_k N_i - \tilde\nabla_i N_k\Big], \tag{16}
$$

where $\tilde\nabla_i$ is the three-dimensional covariant derivative calculated with the three-metric $\tilde g_{ij}$.

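In accordance with the remarks of the previous section, we tentatively assume a solution in which $\dot{\tilde g}_{ij}$ and $g_{00} + 1$ are small perturbations for large $a(t)$, of order $1/a^2(t)$. We also adopt a definition of space coordinates for which $N^i = 0$, so that $g_{i0} = 0$. We then have

$$
C^i{}_j = H\delta^i{}_j + \xi^i{}_j, \tag{17}
$$

where $\xi^i{}_j$ is, like $\dot{\tilde g}_{ij}$ and $\delta N$, a quantity whose leading term is of order $a^{-2}$:

$$
\xi^i{}_j = \frac{1}{2}\Big[\tilde g^{ik}\dot{\tilde g}_{kj} - 2H\,\delta N\,\delta^i{}_j\Big] + O(a^{-4}). \tag{18}
$$

Also in accordance with what we found in the previous section, the total energy momentum tensor for this solution takes the form

$$
T_{ij} = a^2\bar p\,\tilde g_{ij} + \delta T_{ij}, \qquad T_{i0} = \delta T_{i0}, \qquad T_{00} = \bar\rho + \delta T_{00}, \tag{19}
$$

where $\bar\rho = 3H^2$, $\bar p = -2\dot H - 3H^2$, and $\delta T_{ij}/a^2$, $\delta T_{i0}$, and $\delta T_{00}$ are all of order $a^{-2}$. The gravitational field equations now read

$$
\tilde\nabla_i\Big[\xi^i{}_j - \delta^i{}_j\xi^k{}_k\Big] = \delta T_{0j} + O(a^{-4}), \tag{20}
$$

$$
-12H^2\,\delta N + 2\,\delta T_{00} = -a^{-2}\tilde g^{ij}\tilde R_{ij} + 4H\xi^k{}_k + O(a^{-4}), \tag{21}
$$

$$
\begin{aligned}
\dot\xi^i{}_j + 3H\xi^i{}_j + H\delta^i{}_j\xi^k{}_k + (3H^2 - \dot H)\,\delta N\,\delta^i{}_j ={}& a^{-2}\tilde g^{ik}\tilde R_{kj} + a^{-2}\tilde g^{ik}\delta T_{kj} \\
& - \frac{1}{2}a^{-2}\delta^i{}_j\,\tilde g^{kl}\delta T_{kl} + \frac{1}{2}\delta^i{}_j\,\delta T_{00} + O(a^{-4}).
\end{aligned}
\tag{22}
$$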
Now, as remarked in Section II, to order $a^{-2}$ these solutions have no vorticity, so we can introduce a momentum potential $U$, of order $a^{-2}$, such that

$$
\delta T_{i0} = \partial_i U. \tag{23}
$$

The equation of momentum conservation then reads

$$
a^{-2}\tilde g^{ik}\tilde\nabla_i\,\delta T_{kj} = 2\dot H\,\partial_j\delta N + \partial_j\big(\dot U + 3HU\big) + O(a^{-4}). \tag{24}
$$

We can always also write

$$
\delta T_{ij} = a^2\Big[\tilde g_{ij}\,\delta p + \Pi_{ij}\Big], \tag{25}
$$

where $\Pi_{ij}$ is a 3-tensor of order $a^{-2}$ representing anisotropic inertia, with $\tilde g^{ij}\Pi_{ij} = 0$. Eq. (2) for the total energy momentum tensor together with the Bianchi identity for $R_{ij}$ tells us that $\tilde g^{ik}\tilde\nabla_i\,\delta T_{kj}$ is a gradient, so we can introduce another potential $\Pi$ such that

$$
\tilde g^{ik}\tilde\nabla_i\Pi_{kj} = \partial_j\Pi. \tag{26}
$$

The momentum-conservation equation (24) then reads

$$
\Pi = -\delta p + 2\dot H\,\delta N + \dot U + 3HU. \tag{27}
$$

Using Eqs. (21), (25), and (27), the field equation (22) now takes the form

$$
\dot\Xi^i{}_j + 3H\Xi^i{}_j = \frac{1}{a^2}\left[\tilde g^{ik}\tilde R_{kj} - \frac{1}{4}\delta^i{}_j\,\tilde g^{kl}\tilde R_{kl}\right] + \tilde g^{ik}\Pi_{kj} + \frac{1}{2}\delta^i{}_j\Pi + O(a^{-4}), \tag{28}
$$

where

$$
\Xi^i{}_j \equiv \xi^i{}_j + \frac{1}{2}\delta^i{}_j U. \tag{29}
$$

As a check, we note that the field equation (20) may be put in the form

$$
\tilde\nabla_i\Big[\Xi^i{}_j - \delta^i{}_j\Xi^k{}_k\Big] + O(a^{-4}) = 0, \tag{30}
$$

which is automatically consistent with Eq. (28) as a consequence of Eq. (26) and the Bianchi identity obeyed by $\tilde R_{ij}$.
Up to this point we have kept open the possibility of anisotropic inertia terms of order $a^{-2}$ in order to emphasize the very great simplicities offered if anisotropic inertia can be neglected to this order. If we now assume that $\Pi_{ij} = 0$ to order $a^{-2}$, so that also $\Pi = 0$ to this order, then Eq. (28) reads

$$
\dot\Xi^i{}_j + 3H\Xi^i{}_j = \frac{1}{a^2}\left[\tilde g^{ik}\tilde R_{kj} - \frac{1}{4}\delta^i{}_j\,\tilde g^{kl}\tilde R_{kl}\right] + O(a^{-4}). \tag{31}
$$

The general solution of Eq. (31) is of the form

$$
\Xi^i{}_j(\mathbf{x},t) = \left[G^{ik}(\mathbf{x})R_{kj}(\mathbf{x}) - \frac{1}{4}\delta^i{}_j\,G^{kl}(\mathbf{x})R_{kl}(\mathbf{x})\right]\frac{1}{a^3(t)}\int_T^t a(t')\,dt' + \frac{B^i{}_j(\mathbf{x})}{a^3(t)} + O(a^{-4}), \tag{32}
$$

where $T$ is any fixed time, $G_{ij}(\mathbf{x})$ is the value of $\tilde g_{ij}(\mathbf{x},t)$ at that time, $R_{ij}(\mathbf{x})$ is the Ricci tensor calculated from the 3-metric $G_{ij}(\mathbf{x})$; and $B^i{}_j(\mathbf{x})$ is some function of $\mathbf{x}$ (and $T$), appearing in the solution of the homogeneous equation corresponding to Eq. (31). It is striking that once we assume an absence of vorticity and anisotropic inertia, all other terms that depend on the details of the energy-momentum tensor cancel in this solution.
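Equation (32) follows from Eq. (31) by multiplying through by the integrating factor $a^3$, since $d(a^3\Xi)/dt = a^3(\dot\Xi + 3H\Xi)$. As a numerical cross-check, the sketch below integrates a single component of Eq. (31) with a fourth-order Runge–Kutta step and compares it with the closed form. Everything specific here is an assumption made for illustration: the power-law background $a = t^p$, the constant `S` standing for one component of the $\mathbf{x}$-dependent bracket at a fixed point, the constant `B` for the matching component of $B^i{}_j$, and the helper names.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate_xi(p, S, B, T, t_end, n=20000):
    """Integrate dXi/dt + 3 H Xi = S / a^2 for a = t^p, with Xi(T) = B / a(T)^3."""
    a = lambda t: t ** p
    H = lambda t: p / t
    rhs = lambda t, xi: -3.0 * H(t) * xi + S / a(t) ** 2
    h = (t_end - T) / n
    t, xi = T, B / a(T) ** 3
    for _ in range(n):
        xi = rk4_step(rhs, t, xi, h)
        t += h
    return xi

def closed_form_xi(p, S, B, T, t):
    """Eq. (32) for a = t^p: Xi = (S * int_T^t a dt' + B) / a^3."""
    integral = (t ** (p + 1) - T ** (p + 1)) / (p + 1)
    return (S * integral + B) / t ** (3 * p)

# Illustrative parameters (assumed): radiation-like expansion, arbitrary S and B
p, S, B, T, t_end = 0.5, 1.0, 0.3, 1.0, 50.0
numeric = integrate_xi(p, S, B, T, t_end)
exact = closed_form_xi(p, S, B, T, t_end)
```

The homogeneous piece `B / a**3` always decays as $a$ grows, which is the sense in which this part of the solution is attractive.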
To solve for the metric, we need to complete our choice of gauge. By using Eqs. (18), (21), and (29), we have

$$
\dot{\tilde g}_{ij} = 2\tilde g_{ik}\Xi^k{}_j + \frac{2H^2}{\dot H}\,\tilde g_{ij}\,\Xi^k{}_k - \frac{H}{2a^2\dot H}\,\tilde g_{ij}\,\tilde g^{kl}\tilde R_{kl} + \tilde g_{ij}X + O(a^{-4}), \tag{33}
$$

where $X = O(1/a^2)$ depends on the matter perturbations

$$
X \equiv -\frac{H}{\dot H}\Big[\delta T_{00} - 2(3H^2 + \dot H)\,\delta N\Big] - U\left(1 + \frac{3H^2}{\dot H}\right). \tag{34}
$$

(There are other ways of writing Eq. (33), with apparently simpler definitions of $X$, but as we will see the definition used here provides special advantages.) Under a shift $t \to t + \epsilon(\mathbf{x},t)$ in the time coordinate, with $\epsilon$ of order $1/a^2$ (and a corresponding transformation $x^i \to x^i + G^{ij}\int dt\,a^{-2}\,\partial\epsilon/\partial x^j$ to keep $N_i = 0$), the quantity $X$ undergoes the transformation

$$
X \to X + 2\frac{\partial}{\partial t}\big(\epsilon H\big) + \dots, \tag{35}
$$

so we can evidently choose $\epsilon$ to make $X = 0$. This choice is not unique; after choosing space and time coordinates so that $g_{i0} = 0$ and $X = 0$, we can preserve both conditions by making a further gauge transformation with

$$
t \to t + \tau(\mathbf{x})/H(t), \qquad x^i \to x^i + G^{ij}(\mathbf{x})\frac{\partial\tau(\mathbf{x})}{\partial x^j}\int\frac{dt}{a^2(t)H(t)}, \tag{36}
$$

where $\tau$ is an arbitrary function only of $\mathbf{x}$. This residual gauge freedom will be important in Sections V and VI.
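That the transformation (36) leaves $X = 0$ intact can be read off directly from Eq. (35). The short check below is an added illustration, using only Eqs. (35) and (36) as written above:

```latex
% Take \epsilon(\mathbf{x},t) = \tau(\mathbf{x})/H(t) in Eq. (35):
\epsilon H = \tau(\mathbf{x})
\quad\Longrightarrow\quad
\frac{\partial}{\partial t}\big(\epsilon H\big) = 0
\quad\Longrightarrow\quad
X \to X + \dots ,
% and the accompanying spatial shift that keeps N_i = 0 becomes
x^i \to x^i + G^{ij}\int dt\, a^{-2}\,\frac{\partial\epsilon}{\partial x^j}
     = x^i + G^{ij}\,\frac{\partial\tau}{\partial x^j}\int\frac{dt}{a^2 H} .
```

So the time shift is fixed only up to a time-independent multiple of $1/H(t)$, which is the residual freedom parameterized by $\tau(\mathbf{x})$.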
In a theory of multiple scalar fields $X$ is a linear combination of terms proportional to the scalar field perturbations and their time derivatives, so the choice $X = 0$ is a generalization of the choice of gauge commonly used in studying single field inflation, in which the scalar field is unperturbed. In general, the gauge choice $X = 0$ provides the great advantage that we can solve Eq. (33) for $\tilde g_{ij}$ without worrying about the matter variables to which the metric is coupled:

$$
\begin{aligned}
\tilde g_{ij}(\mathbf{x},t) ={}& G_{ij}(\mathbf{x}) + 2\left[R_{ij}(\mathbf{x}) - \frac{1}{4}G_{ij}(\mathbf{x})G^{kl}(\mathbf{x})R_{kl}(\mathbf{x})\right]\int_T^t\frac{dt'}{a^3(t')}\int_T^{t'}a(t'')\,dt'' \\
& + \frac{1}{2}G_{ij}(\mathbf{x})G^{kl}(\mathbf{x})R_{kl}(\mathbf{x})\int_T^t\frac{H^2(t')\,dt'}{\dot H(t')\,a^3(t')}\int_T^{t'}a(t'')\,dt'' \\
& - \frac{1}{2}G_{ij}(\mathbf{x})G^{kl}(\mathbf{x})R_{kl}(\mathbf{x})\int_T^t\frac{H(t')\,dt'}{a^2(t')\,\dot H(t')} \\
& + 2G_{ik}(\mathbf{x})B^k{}_j(\mathbf{x})\int_T^t\frac{dt'}{a^3(t')} + 2G_{ij}(\mathbf{x})B^k{}_k(\mathbf{x})\int_T^t\frac{H^2(t')\,dt'}{a^3(t')\,\dot H(t')} + O\big(a^{-4}(t)\big),
\end{aligned}
\tag{37}
$$

where $T$ is again any fixed time, and $G_{ij}(\mathbf{x})$ and $R_{ij}(\mathbf{x})$ are the values of $\tilde g_{ij}$ and the associated Ricci tensor at that time. This is the same solution found in [1] for the special case of multiple scalar fields with an arbitrary potential. In the language of Eq. (1), this solution tells us that

$$
f_1(t) = 2\int_T^t\frac{dt'}{a^3(t')}\int_T^{t'}a(t'')\,dt'' \tag{38}
$$

$$
f_2(t) = -\frac{1}{2}\int_T^t\frac{dt'}{a^3(t')}\int_T^{t'}a(t'')\,dt'' + \frac{1}{2}\int_T^t\frac{H^2(t')\,dt'}{\dot H(t')\,a^3(t')}\int_T^{t'}a(t'')\,dt'' - \frac{1}{2}\int_T^t\frac{H(t')\,dt'}{a^2(t')\,\dot H(t')}. \tag{39}
$$

Of course, having chosen space coordinates so that $g_{i0} = 0$, we have $f_3(t) = 0$. The calculation of $f_4(t)$ is taken up in Sections V and VI.

IV. Other Variables

We have seen in Eq. (34) that the general solution of the dynamical equations for matter and metric variables that differs from the “trial configuration” (for which $g_{ij}(\mathbf{x},t)/a^2(t) = G_{ij}(\mathbf{x})$, $g_{i0} = 0$, $g_{00} = -1$, and all matter variables are unperturbed) by small terms, of order $a^{-2}$, has a spatial metric that approaches $G_{ij}(\mathbf{x})$ for large $a(t)$. But in order to show that the trial configuration is truly attractive, we also need to consider the other metric component $g_{00} = -N^2$ (the coordinate system having been chosen so that $g_{i0} = 0$), and the various matter variables. We will do this in the next two sections for two cosmological models: single field inflation and local thermal equilibrium with no non-zero conserved quantities. Both of these models have vanishing anisotropic inertia, and therefore have a spatial metric described by the results of the previous section. From the perfect-fluid form of the energy-momentum tensor, we have in both cases

$$
\delta T_{00} = \delta\rho + 6H^2\,\delta N + O(a^{-4}) \tag{40}
$$

so Eqs. (21) and (29) give

$$
2\,\delta\rho = -a^{-2}\tilde g^{ij}\tilde R_{ij} + 4H\Xi^k{}_k - 6HU + O(a^{-4}), \tag{41}
$$

while Eqs. (40) and (34) show that the gauge condition $X = 0$ gives

$$
0 = -\frac{H}{\dot H}\Big[\delta\rho - 2\dot H\,\delta N\Big] - U\left(1 + \frac{3H^2}{\dot H}\right). \tag{42}
$$

Finally, for zero anisotropic inertia the equation (27) of momentum conservation gives

$$
0 = -\delta p + 2\dot H\,\delta N + 3HU + \dot U + O(a^{-4}). \tag{43}
$$

This gives three relations among the four quantities $\delta N$, $\delta\rho$, $\delta p$, and $U$, so with one additional relation we can calculate all four.

V. Single Field Inflation

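In general, for a single scalar field $\bar\varphi(t) + \delta\varphi(\mathbf{x},t)$, with a conventional kinematic term, the quantity $X$ defined by Eq. (34) is given by [1]:

$$
\begin{aligned}
X &= -\dot{\bar\varphi}\,\delta\varphi + \frac{H}{\dot H}\Big[\ddot{\bar\varphi}\,\delta\varphi - \dot{\bar\varphi}\,\delta\dot\varphi\Big] + O(a^{-4}) \\
&= -\sqrt{-2\dot H}\,\delta\varphi + \frac{1}{\sqrt{-2\dot H}}\,\frac{H}{\dot H}\Big[-\ddot H\,\delta\varphi + 2\dot H\,\delta\dot\varphi\Big] + O(a^{-4}).
\end{aligned}
\tag{44}
$$

The gauge choice that $X = 0$ to order $a^{-2}$ tells us then that

$$
\delta\varphi(\mathbf{x},t) = f(\mathbf{x})\,\frac{\sqrt{-2\dot H(t)}}{H(t)}, \tag{45}
$$

where $f(\mathbf{x})$ is some function only of $\mathbf{x}$. Under a residual gauge transformation of the form (36) (which preserves the conditions $X = 0$ and $g_{i0} = 0$) the scalar field perturbation is shifted by

$$
\Delta\delta\varphi(\mathbf{x},t) = -\dot{\bar\varphi}(t)\,\tau(\mathbf{x})/H(t) = -\sqrt{-2\dot H(t)}\,\tau(\mathbf{x})/H(t),
$$

so by choosing $\tau(\mathbf{x}) = f(\mathbf{x})$, we can make $\delta\varphi = 0$, completing our choice of gauge. The momentum potential defined by Eq. (23) is here

$$
U = \dot{\bar\varphi}\,\delta\varphi, \tag{46}
$$

so the gauge choice $\delta\varphi = 0$ also entails $U = 0$. The density and pressure perturbations here are

$$
\delta\rho = \delta p = \frac{1}{2}\,\delta g_{00}\,\dot{\bar\varphi}^2 = 2\dot H\,\delta N, \tag{47}
$$

so Eqs. (42) and (43) are satisfied, and the only variable that needs to be examined to check the attractive nature of the adiabatic solution is $\delta g_{00} = -2\,\delta N$. It is given by Eqs. (47) and (41) as

$$
\begin{aligned}
\delta N &= \frac{1}{4\dot H}\Big[-a^{-2}\tilde g^{ij}\tilde R_{ij} + 4H\Xi^k{}_k\Big] + O(a^{-4}) \\
&= \frac{G^{ij}R_{ij}}{4\dot H}\left[-\frac{1}{a^2} + \frac{H}{a^3}\int_T^t a(t')\,dt'\right] + O(a^{-3}).
\end{aligned}
\tag{48}
$$

This is $O(a^{-2})$, and hence vanishes for large $a$, so the adiabatic solution is indeed an attractor in the case of single field inflation. In the language of Eq. (1), Eq. (48) gives

$$
f_4(t) = -\frac{1}{2\dot H(t)}\left[-\frac{1}{a^2(t)} + \frac{H(t)}{a^3(t)}\int_T^t a(t')\,dt'\right]. \tag{49}
$$

In the gauge adopted here there are no matter perturbations during single field inflation, and so nothing else to check.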
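To see Eq. (49) at work in a concrete background, the sketch below evaluates $f_4$ for power-law expansion $a \propto t^p$. The power-law background with $p = 2$, the helper name `f4_single_field`, and the trapezoid quadrature are assumptions chosen for illustration and are not taken from the paper. For this background a direct evaluation of Eq. (49) at $t \gg T$ gives $f_4\,a^2H^2 \to -p/[2(p+1)]$, so $f_4$ falls off like $1/(aH)^2$.

```python
def f4_single_field(a, H, Hdot, T, t, n=20000):
    """Evaluate Eq. (49), using a trapezoid estimate of the integral of a(t') from T to t."""
    h = (t - T) / n
    vals = [a(T + i * h) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    bracket = -1.0 / a(t) ** 2 + H(t) * integral / a(t) ** 3
    return -bracket / (2.0 * Hdot(t))

# Power-law expansion a = t^p (assumed example, accelerating for p > 1)
p = 2.0
a = lambda t: t ** p
H = lambda t: p / t
Hdot = lambda t: -p / t ** 2
T = 1.0

times = (1.0e1, 1.0e2, 1.0e3)
f4_values = [f4_single_field(a, H, Hdot, T, t) for t in times]
# Rescale by (aH)^2 to expose the approach to a constant
scaled = [f4 * (a(t) * H(t)) ** 2 for f4, t in zip(f4_values, times)]
expected_limit = -p / (2.0 * (p + 1.0))
```

The entries of `f4_values` shrink in magnitude as `t` increases, while `scaled` settles toward `expected_limit`, consistent with $f_4$ being of order $a^{-2}$ in the sense used in the text.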

VI. Local Thermal Equilibrium

In thermal equilibrium with no non-zero conserved quantum numbers the pressure and energy density in any gauge are functions only of the temperature, so

$$
\delta p = \big(\dot{\bar p}/\dot{\bar\rho}\big)\,\delta\rho = \left(-1 - \frac{\ddot H}{3H\dot H}\right)\delta\rho. \tag{50}
$$

Using this and Eqs. (41) and (42) in Eq. (43) gives a differential equation for the momentum potential $U$:

$$
\begin{aligned}
\dot U + U\left(\frac{\dot H}{H} - \frac{\ddot H}{\dot H}\right) &= -\left(2 + \frac{\ddot H}{3H\dot H}\right)\left[-\frac{1}{2a^2}\,\tilde g^{ij}\tilde R_{ij} + 2H\Xi^i{}_i\right] \\
&= -\left(1 + \frac{\ddot H}{6H\dot H}\right)\left[-\frac{1}{a^2} + \frac{H}{a^3}\int_T^t a(t')\,dt'\right]G^{ij}R_{ij} + O(a^{-3}),
\end{aligned}
\tag{51}
$$

where $T$ may be taken as any time during the period of thermal equilibrium, most conveniently at its beginning, and $G_{ij}$ and $R_{ij}$ are the reduced metric $g_{ij}/a^2$ and associated Ricci tensor at that time. The solution is

$$
\begin{aligned}
U(\mathbf{x},t) ={}& f(\mathbf{x})\,\frac{\dot H(t)}{H(t)} \\
& - G^{ij}(\mathbf{x})R_{ij}(\mathbf{x})\,\frac{\dot H(t)}{H(t)}\int_T^t dt'\,\frac{H(t')}{\dot H(t')}\left(1 + \frac{\ddot H(t')}{6H(t')\dot H(t')}\right) \\
& \qquad\times\left(-\frac{1}{a^2(t')} + \frac{H(t')}{a^3(t')}\int_T^{t'}a(t'')\,dt''\right) + O(a^{-3}),
\end{aligned}
\tag{52}
$$

where $f(\mathbf{x})$ is an arbitrary function of position. At first sight, this does not appear very attractive, in either the mathematical or the colloquial sense. The first term, representing a solution of the homogeneous equation corresponding to (51), is of zeroth order in $1/a$, and so does not become small for large $a$. But this term is a gauge artifact. From the definition (23) of the momentum potential, we can easily see that under the residual gauge transformation (36) (which preserves the conditions $X = 0$ and $g_{i0} = 0$), $U$ transforms as

$$
U(\mathbf{x},t) \to U(\mathbf{x},t) + \left(\frac{2\dot H(t)}{H(t)}\right)\tau(\mathbf{x}) \tag{53}
$$

so by choosing the arbitrary function $\tau(\mathbf{x})$ to have the value $-f(\mathbf{x})/2$, we can cancel the first term in Eq. (52). The remainder is of order $a^{-2}$, and hence vanishes for large $a$.
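The statement that $f(\mathbf{x})\dot H/H$ solves the homogeneous equation corresponding to (51), and that it has the same time dependence as the gauge shift (53), can be verified in two lines. This check is added for illustration; the label $U_h$ for the homogeneous piece is introduced here and does not appear in the paper.

```latex
% Homogeneous piece U_h = f(\mathbf{x})\,\dot H/H:
\dot U_h = f\left(\frac{\ddot H}{H} - \frac{\dot H^2}{H^2}\right)
         = U_h\left(\frac{\ddot H}{\dot H} - \frac{\dot H}{H}\right)
\quad\Longrightarrow\quad
\dot U_h + U_h\left(\frac{\dot H}{H} - \frac{\ddot H}{\dot H}\right) = 0 ,
% and under Eq. (53) with \tau = -f/2:
U_h \to f\,\frac{\dot H}{H} + \frac{2\dot H}{H}\left(-\frac{f}{2}\right) = 0 .
```

Because the shift in Eq. (53) has exactly the time dependence $\dot H/H$ of the homogeneous solution, a single choice of $\tau(\mathbf{x})$ removes that term at all times.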
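The remaining perturbations $\delta g_{00} = -2\,\delta N$ and $\delta\rho$ are algebraically related to $U$ by Eqs. (41) and (42), which give

$$
\delta\rho(\mathbf{x},t) = -3H(t)U(\mathbf{x},t) + \frac{1}{2}G^{ij}(\mathbf{x})R_{ij}(\mathbf{x})\left[-\frac{1}{a^2(t)} + \frac{H(t)}{a^3(t)}\int_T^t dt'\,a(t')\right] \tag{54}
$$

and

$$
2\dot H(t)\,\delta N(\mathbf{x},t) = \left(\frac{\dot H(t)}{H(t)}\right)U(\mathbf{x},t) + \frac{1}{2}G^{ij}(\mathbf{x})R_{ij}(\mathbf{x})\left[-\frac{1}{a^2(t)} + \frac{H(t)}{a^3(t)}\int_T^t dt'\,a(t')\right]. \tag{55}
$$

Both are of order $a^{-2}$, so in this case, too, the adiabatic solution is completely attractive for large $a$. In the language of Eq. (1), Eqs. (55) and (52) give:

$$
\begin{aligned}
f_4(t) ={}& \frac{\dot H(t)}{H^2(t)}\int_T^t\left(\frac{H(t')}{\dot H(t')} + \frac{\ddot H(t')}{6\dot H^2(t')}\right)\left(-\frac{1}{a^2(t')} + \frac{H(t')}{a^3(t')}\int_T^{t'}a(t'')\,dt''\right)dt' \\
& - \frac{1}{2\dot H(t)}\left[-\frac{1}{a^2(t)} + \frac{H(t)}{a^3(t)}\int_T^t a(t')\,dt'\right].
\end{aligned}
\tag{56}
$$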

I am grateful for helpful conversations with Raphael Flauger and Eiichiro Komatsu. This material is based upon work supported by the National Science Foundation under Grant No. PHY-0455649 and with support from The Robert A. Welch Foundation, Grant No. F-0014.

References

1. S. Weinberg, arXiv:0808.2909.

2. Some aspects of these results have been obtained by D. H. Lyth, K. A. Malik, and M. Sasaki, J. Cosm. Astropart. Phys. 05, 004 (2005). The $O(a^{-2})$ and $O(a^{-3})$ terms in the metric have also been found by Y. Tanaka and M. Sasaki, Prog. Theor. Phys. 118, 455 (2007) in the special case of single-field inflation. Their solution is different from that presented in Section III, presumably because they use a different gauge.

3. For tensor modes, see S. Weinberg, Phys. Rev. D 69, 023503 (2004). For scalar modes, see Eq. (6.1.62) of S. Weinberg, Cosmology (Oxford University Press, 2008).

4. R. S. Arnowitt, S. Deser, and C. W. Misner, in Gravitation: An Introduction to Current Research, ed. L. Witten (Wiley, New York, 1962): 227, now also available as gr-qc/0405109.

UTTG-01-09
TCC-013-09
arXiv:0903.0568v2 [hep-th] 21 Apr 2009

Living With Infinities

Steven Weinberg∗
Theory Group, Department of Physics, and Texas Cosmology Center,
University of Texas, Austin, TX, 78712

Abstract

This is the written version of a talk given in memory of Gunnar Källén, at the Departments of Theoretical Physics, Physics, and Astronomy of Lund University on February 13, 2009. It will be published in a collection of the papers of Gunnar Källén, edited by C. Jarlskog and A. C. T. Wu. I discuss some of Källén’s work, especially regarding the problem of infinities in quantum field theory, and recount my own interactions with him. In addition, I describe for non-specialists the current status of the problem, and present my personal view on how it may be resolved in the future.


∗ Electronic address: weinberg@physics.utexas.edu

I owe a great debt of gratitude to Gunnar Källén. In the summer of
1954, having just finished my undergraduate studies at Cornell, I arrived
at the Bohr Institute in Copenhagen, where Källén was a member of the
Theoretical Study Division of CERN, which had not yet moved to Geneva.
Richard Dalitz had advised me to go to Copenhagen partly because of the
presence there of CERN. But my real reason for coming to Copenhagen
with my wife was that we had just married, and thought that we could
have a romantic year abroad before we returned to the U.S. for me to enter
graduate school. I brought with me a bag of physics books to read, but I
did not imagine that I could start original research. You see, I had the idea
that before I started research on any topic, I first had to know everything
that had been done in that area, and I knew that I was far from knowing
everything about anything.
It wasn’t long before people at the Institute let me know that everyone
there was expected to be working on some sort of research. David Frisch, a
visiting American nuclear physicist, kindly suggested that I do something on
nuclear alpha decay, but nothing came of it.
Early in 1955 I heard that a young theorist named Källén was doing
interesting things in quantum field theory, so I knocked on his office door,
and asked him to suggest a research problem. As it happened, Källén did
have a problem to suggest. A year earlier, Tsung-Dao Lee at Columbia had
invented a clever field-theoretic model that could be solved exactly.¹ The
model had some peculiarities, which I’ll come back to. These problems did
not at first seem fatal to Lee, but Källén joined with the great Wolfgang Pauli
to show that scattering processes in the Lee model violate the principle of
unitarity — that is, the sum of the probabilities for all the things that can
happen when two particles collide did not always add up to 100%.² Now
Källén wanted me to see if there were other things wrong with the Lee model.
With a great deal of patient help from Källén, I was able to show that
there were states in the Lee model whose energies were complex — that is,
not ordinary real numbers. I finished the work on the Danish freighter that
took my wife and me back to the U.S., and soon after I started graduate
school at Princeton I had published the work as my first research paper.³
This was a pretty unimportant paper (I recently checked, and found that it
has been cited just nine times in 53 years), but it was a big thing for me —
I started to feel like a physicist, not a student.

¹ T.D. Lee, Phys. Rev. 95, 1329 (1954).
² G. Källén and W. Pauli, Dan. Mat. Fys. Medd. 30, no. 7 (1955).
³ S. Weinberg, Phys. Rev. 102, 285 (1956).

Incidentally, Källén’s kindness to me went beyond starting me in research.
He and his wife had my wife and me to their house for dinner, and going to the
bathroom there, I learned something about Källén that probably most of you
don’t know — he had hand towels embroidered with the Dirac equation. Mrs.
Källén told me that they were a present from Pauli. Källén also introduced
me to Pauli, but I didn’t get any towels.
Even though I had benefited so much from Källén’s suggestion of a re-
search problem, I felt that there was something odd about it. Lee was then
not a well-known theorist — his great work with Yang on parity violation
and weak interactions was a few years in the future. Also, the Lee model was
not intended to be a serious model of real particles. So why did Källén take
the trouble to shoot it down, even to the extent of enlisting the collaboration
of his friend Pauli? The explanation, which I understood only much later,
has to do with a long-standing controversy about the future of quantum field
theory, in which Källén was playing an important part.
The controversy concerned the significance of infinities in quantum field
theory. The problem of infinities was anticipated in the first papers on quan-
tum field theory by Heisenberg and Pauli,4 and then in 1930 infinite energy
shifts were found in calculations of the effects of emitting and reabsorbing
photons by free or bound electrons, by Waller5 and Oppenheimer.6 In both
cases you have to integrate over the momenta of the photons, and the inte-
grals diverge. During the 1930s it was widely believed that these infinities
signified a breakdown of quantum electrodynamics at energies above a few
MeV. This changed after the war, when new techniques of calculation were
developed that manifestly preserved the principles of special relativity at ev-
ery step, and it was recognized that the infinities could be absorbed into a
redefinition, called a renormalization, of physical constants like the charge
and mass of the electron.7 Dyson was able to show (with some technicalities
4
W. Heisenberg and W. Pauli, Z. f. Physik 56, 1 (1929); 59, 168 (1930).
5
I. Waller, Z. f. Physik 59, 168 (1930); 61, 721, 837 (1930); 62, 673 (1930)
6
J. R. Oppenheimer, Phys. Rev. 35, 461 (1930).
7
See articles by Bethe, Dyson, Feynman, Kramers, Lamb & Retherford, Schwinger,
Tomonaga, and Weisskopf reprinted in Quantum Electrodynamics, ed. J. Schwinger (Dover
Publications, Inc., New York, 1958).

cleared up later by Salam8 and me9 ) that in quantum electrodynamics and
a limited class of other theories, the renormalization of a finite number of
physical parameters would actually remove infinities in every order of pertur-
bation theory — that is, in every term when we write any physical observable
as an expansion in powers of the charge of the electron, or powers of similar
parameters in other theories. Theories in which infinities are removed in this
way are known as renormalizable. They can be recognized by the property
that in renormalizable theories, in natural units in which Planck’s constant
and the speed of light are unity, all of the constants multiplying terms in the
Lagrangian are just pure numbers, like the charge of the electron, or have
the units of positive powers of energy, like particle masses, but not negative
powers of energy.10
The great success of calculations in quantum electrodynamics using the
renormalization idea generated a new enthusiasm for quantum electrodynam-
ics. After this change of mood, probably most theorists simply didn’t worry
about having to deal with infinite renormalizations. Some theorists thought
that these infinities were just a consequence of having expanded in powers
of the electric charge of the electron, and that not only observables but even
quantities like the “bare” electron charge (the charge appearing in the field
equations of quantum electrodynamics) would be found to be finite when
we learned how to calculate without perturbation theory. But at least two
leading theorists had their doubts about this, and thought that the appear-
ance of infinite renormalizations in perturbation theory was a symptom of a
deeper problem, a problem not with perturbation theory but with quantum
field theory itself. They were Lev Landau, and Gunnar Källén.
Källén’s first step in exploring this problem was in an important 1952
8
A. Salam, Phys. Rev. 82, 217 (1951).
9
S. Weinberg, Phys. Rev. 118, 838 (1959).
10
The units of these constants of course depend on the units we assign to the field
operators. In using this criterion for renormalizability, it is essential to use units for any
field operator related to the asymptotic behaviour of its propagator; if the propagator goes
like $k^n$ for large four-momentum $k$, then the field must be assigned the units of energy to
the power $n/2 + 2$. In particular, because of $k^\mu k^\nu/(k^2 + m^2)$ terms in the propagator of
a massive vector field, for these purposes the field must be given the unconventional units
of energy to the power +2, and any interaction of the field would be non-renormalizable,
unless the field is coupled only to conserved currents for which the terms in the propagator
proportional to $k^\mu k^\nu$ may be dropped.

paper,11 in which he showed how to define quantities like the bare charge
of the electron without the use of perturbation theory. To avoid the com-
plications that arise from the vector nature of the electromagnetic field, I’ll
describe the essential points here using the easier example of a real scalar
field ϕ(x), studied a little later by Lehmann.12 The quantity $-i\Delta'(p)$ known
as the propagator, that in perturbation theory would be given by the sum
of all Feynman diagrams with two external lines, carrying four-momenta $p^\mu$
and $-p^\mu$, can be defined without the use of perturbation theory by
$$ \langle 0 | T\{\varphi(x), \varphi(0)\} | 0 \rangle = -i \int \frac{d^4p}{(2\pi)^4}\, \Delta'(p)\, e^{ip\cdot x} , \qquad (1) $$
where $|0\rangle$ is the physical vacuum state, and T denotes a time-ordered product,
with ϕ(x) to the left or right of ϕ(0) according as the time $x^0$ is positive or
negative. By inserting a complete set of states between the fields in the time-
ordered product, one finds what has come to be called the Källén–Lehmann
representation
$$ \Delta'(p) = \frac{|N|^2}{p^2 + m^2} + \int \frac{\sigma(\mu^2)\, d\mu^2}{p^2 + \mu^2} , \qquad (2) $$
where $\sigma(\mu^2) \ge 0$ is given by a sum over multiparticle states with total energy-momentum
vector $P^\lambda$ satisfying $-P^2 = \mu^2$, and N is defined by the matrix
element of ϕ(x) between the vacuum and a one-particle state of physical mass
m and three-momentum k:
$$ \langle 0 | \varphi(x) | \mathbf{k} \rangle = \frac{N e^{ik\cdot x}}{(2\pi)^{3/2} \sqrt{2k^0}} , \qquad (3) $$
with $k^0 \equiv \sqrt{\mathbf{k}^2 + m^2}$. If ϕ(x) is the “unrenormalized” field that appears
in the quadratic part of the Lagrangian without any extra factors, then it
satisfies the canonical commutation relation
$$ [\dot\varphi(\mathbf{x}, t), \varphi(\mathbf{y}, t)] = -i\,\delta^3(\mathbf{x} - \mathbf{y}) . \qquad (4) $$
By taking the time derivative of Eq. (1) and then setting the time $x^0$ equal
to zero and using the commutation relation (4), one obtains the sum rule
$$ 1 = |N|^2 + \int \sigma(\mu^2)\, d\mu^2 . \qquad (5) $$
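The step from Eqs. (1)–(4) to the sum rule (5) can be sketched as follows. This is a reconstruction under the conventions used here, not a quotation of Källén's or Lehmann's derivation. It works with the vacuum expectation value of the commutator, which is what the time derivative of the time-ordered product picks out at $x^0 = 0$; the free commutator function $\Delta(x;\mu^2)$ is notation introduced only for this illustration.

```latex
% Insert a complete set of states into the vacuum expectation value of the commutator.
% One-particle states contribute through Eq. (3); multiparticle states with -P^2 = mu^2
% contribute through sigma(mu^2):
\langle 0|[\varphi(x),\varphi(0)]|0\rangle
  = |N|^2\,\Delta(x;m^2) + \int d\mu^2\,\sigma(\mu^2)\,\Delta(x;\mu^2) ,
\qquad
\Delta(x;\mu^2) \equiv \int \frac{d^3k}{(2\pi)^3\,2k^0}
  \left[e^{ik\cdot x}-e^{-ik\cdot x}\right] ,
\quad k^0=\sqrt{\mathbf{k}^2+\mu^2} .

% With k.x = \mathbf{k}\cdot\mathbf{x} - k^0 t, the equal-time derivative does not depend on mu:
\left.\frac{\partial}{\partial t}\,\Delta(\mathbf{x},t;\mu^2)\right|_{t=0}
  = -i\,\delta^3(\mathbf{x}) .

% Differentiate the first line at t = 0 and use Eq. (4) on the left-hand side:
-i\,\delta^3(\mathbf{x})
  = \Big[\,|N|^2 + \int \sigma(\mu^2)\,d\mu^2\Big]\,(-i)\,\delta^3(\mathbf{x})
\quad\Longrightarrow\quad
1 = |N|^2 + \int \sigma(\mu^2)\,d\mu^2 .
```

The sum rule arises because every mass component contributes the same equal-time delta function, so the canonical normalization fixes only the total weight.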
11
G. Källén, Helv. Phys. Acta 25, 417 (1952).
12
H. Lehmann, Nuovo Cimento XI, 342 (1954).

One immediate consequence is that, since $|N|^2$ is necessarily positive,
Eq. (5) gives an upper limit on the coupling of the field ϕ to multiparticle
states
$$ \int \sigma(\mu^2)\, d\mu^2 \le 1 . \qquad (6) $$
I’ll mention in passing that this upper limit is reached in the case N = 0,
which only applies if ϕ(x) does not appear in the Lagrangian at all — that
is, if the particle in question is not elementary. Thus, in a sense, composite
particles are coupled to their constituents more strongly than any possible
elementary particle.
This kind of sum rule has proved very valuable in theoretical physics.
For instance, if instead of a pair of scalar fields in Eq. (1) we consider pairs
of conserved symmetry currents, then by using methods similar to Källén’s,
one gets what are called spectral function sum rules,13 which have had
useful applications, for instance in calculating the decays of vector mesons
into electron–positron pairs.
What chiefly concerned Källén was the application of these methods to
quantum electrodynamics. In his 1952 paper, Källén derived a sum rule like
(5) for the electromagnetic field, with $Z_3 \equiv |N_\gamma|^2$ in place of $|N|^2$, where
$N_\gamma$ is the renormalization constant for the electromagnetic field. As in the
scalar field theory, this sum rule (and the definition of $Z_3$ as an absolute
value squared) shows that
$$ 0 \le Z_3 < 1 . \qquad (7) $$
This is especially important in electrodynamics, because $Z_3$ appears in the
relation between the bare electronic charge $e_B$ that appears in the field equations,
and the physical charge e of the electron:
$$ e^2 = Z_3\, e_B^2 . \qquad (8) $$
The fact that $e^2$ is less than $e_B^2$ has a well-known interpretation: it is due to
the shielding of the bare charge by virtual positrons, which are pulled out of
the vacuum along with virtual electrons, and unlike the virtual electrons are
attracted to the real electron whose charge is being measured.
Now, in lowest order perturbation theory, we have
$$ Z_3 = 1 - \frac{e^2}{6\pi^2} \ln\left(\frac{\Lambda}{m_e}\right) , \qquad (9) $$
13
S. Weinberg, Phys. Rev. Lett. 18, 507 (1967).

where Λ is an ultraviolet cut-off, put in as a limit on the energies of the virtual
photons. This is all very well if we take Λ as a reasonable multiple of the
electron mass $m_e$, but if the cut-off is taken greater than $m_e \exp(6\pi^2/e^2) \approx 10^{280}\, m_e$
(which is more than the total mass of the observable universe) then
we are in trouble: In this case Eq. (9) gives $Z_3$ negative, contradicting the
inequality (7). As Landau pointed out,14 this ridiculously large energy becomes
much smaller if we take into account the fact that there are several
species of charged elementary particles; for instance, if there are ν species of
spin one-half particles with the same charge as the electron, then the factor
$10^{280}$ is replaced with $10^{280/\nu}$. So if ν is, say, 10 or 20, the problem with
the sign of $Z_3$ would set in at energies much closer to those with which we
usually have to deal. But this is just lowest order perturbation theory — to
see if there is really any problem, it is necessary to go beyond perturbation
theory.
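As a rough numerical check of these estimates, the following minimal sketch evaluates the cut-off at which the lowest-order expression (9) for $Z_3$ changes sign. It assumes rationalized natural units in which $e^2 = 4\pi\alpha$ with $\alpha \approx 1/137.036$, and it assumes, as the simplification in the text does, that all ν species enter with the electron mass in the logarithm. The function names are hypothetical, chosen only for this illustration.

```python
import math

# Assumption: rationalized natural units, so e^2 = 4*pi*alpha.
ALPHA = 1 / 137.035999


def log10_critical_cutoff(n_species: int = 1, alpha: float = ALPHA) -> float:
    """Return log10(Lambda/m_e) at which the lowest-order Z_3 reaches zero.

    With n_species charged spin-1/2 species of unit charge, the logarithmic
    term in Eq. (9) is multiplied by n_species, so Z_3 vanishes at
    Lambda = m_e * exp(6 pi^2 / (n_species * e^2)).
    """
    e_squared = 4 * math.pi * alpha
    return 6 * math.pi**2 / (n_species * e_squared) / math.log(10)


def z3_lowest_order(log_cutoff_ratio: float, n_species: int = 1,
                    alpha: float = ALPHA) -> float:
    """Eq. (9), generalized to n_species, as a function of ln(Lambda/m_e)."""
    e_squared = 4 * math.pi * alpha
    return 1 - n_species * e_squared / (6 * math.pi**2) * log_cutoff_ratio


if __name__ == "__main__":
    for nu in (1, 10, 20):
        exponent = log10_critical_cutoff(nu)
        print(f"nu = {nu:2d}: Z_3 changes sign near Lambda/m_e = 10^{exponent:.1f}")
```

For a single species the exponent comes out close to 280, and it scales as 1/ν, which is the replacement of $10^{280}$ by $10^{280/\nu}$ described above.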
To explore this issue, Källén set out to see if the integral appearing in
$1 - Z_3$, and not just its expansion in powers of $e^2$, actually diverges in the
absence of a cut-off. Of course, he could not evaluate the integral exactly,
but since every kind of multiparticle state makes a positive contribution
to the integrand, he could concentrate on the contribution of the simplest
states, consisting of just an electron and a positron — if the integral of this
contribution diverges, then the whole integral diverges. In evaluating this
contribution, he had to assume that all renormalizations including the renor-
malization of the electron mass and field were finite. With this assumption,
and some tricky interchanges of integrations, he found that the integral for
$1 - Z_3$ does diverge. In this way, he reached his famous conclusion that at
least one of the renormalization constants in quantum electrodynamics has
to be infinite.15
Not everyone was convinced. To quote the Källén memorial statement of
Paul Urban in 1969,16 “Indeed, other authors are in doubt about his famous
proof that at least one of the renormalization constants has to be infinite, but
so far no definite answer to this question has been found.” It should be noted
that at the end of his 1953 paper, Källén had explicitly disavowed any claim
to mathematical rigor. As far as I know, this issue has never been settled. Of
14
L. Landau, in Niels Bohr and the Development of Physics (Pergamon Press, New
York, 1955): p. 52.
15
G. Källén, Dan. Mat. Fys. Medd. 27, no. 12 (1953).
16
P. Urban, Acta Physica Austriaca, Suppl. 6 (1969).

course, the important question was not whether some of the renormalization
constants are infinite for infinite cut-off, but whether something happens
at very high energies, such as $10^{280}\, m_e$, to prevent the cut-off in quantum
electrodynamics from being taken to infinity. I don’t know if Källén ever
expressed an opinion about it, but I suspect that he thought that quantum
electrodynamics does break down at very high energies, and that he wanted
to be the one who proved it.
Which brings me back to the Lee model. This is a model with two heavy
particles, V and N, and a lighter particle θ, all with zero spin. The only
interactions in the theory are ones in which V converts to N + θ, or vice
versa. No antiparticles are included, and the recoil energies of the V and N
are neglected, so the model is non-relativistic, though the energy ω of a θ of
momentum p is given by the relativistic formula $\omega = \sqrt{p^2 + m_\theta^2}$. The model
is exactly soluble in sectors with just one or two particles. For instance, to
find the complete amplitude for V → N + θ, one can sum the graphs for
$$ V \to N + \theta \to V \to N + \theta \to V \to \cdots \to N + \theta , $$
which is just a geometric series. One finds that, if the physical and bare
V-particle states are normalized so that
$$ \langle V, {\rm phys} | V, {\rm phys} \rangle = \langle V, {\rm bare} | V, {\rm bare} \rangle = 1 , \qquad (10) $$
then we have an exact sum rule resembling (5):
$$ 1 = |N|^2 + \frac{|g|^2}{4\pi^2} \int_0^\Lambda \frac{k^2\, dk}{\omega^3} , \qquad (11) $$
where
$$ N \equiv \langle V, {\rm bare} | V, {\rm phys} \rangle . \qquad (12) $$
Here Λ is again an ultraviolet cut-off, and g is the renormalized coupling for
this vertex, related to the bare coupling $g_B$ by the exact formula $g = N g_B$.
For $\Lambda \gg m_\theta$, the integral in Eq. (11) grows as ln Λ, so if $g \ne 0$ then Λ cannot
be arbitrarily large without violating the condition that $|N|^2 \ge 0$. This is
just like the problem encountered in lowest-order quantum electrodynamics,
except that here there is no use of perturbation theory, and hence no hope
that the difficulty will go away when perturbation theory is dispensed with.
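The logarithmic growth of the integral in Eq. (11), and the resulting bound on the cut-off, can be checked with a short sketch. It assumes real g and uses the closed form of the integral with $\omega = \sqrt{k^2 + m_\theta^2}$, namely $\sinh^{-1}(\Lambda/m_\theta) - \Lambda/\sqrt{\Lambda^2 + m_\theta^2}$, which behaves as $\ln(2\Lambda/m_\theta) - 1$ for large Λ. The function names and the sample coupling g = 2 are hypothetical, chosen only for illustration.

```python
import math


def lee_integral(cutoff: float, m_theta: float = 1.0) -> float:
    """Closed form of int_0^cutoff k^2 dk / (k^2 + m_theta^2)^(3/2)."""
    x = cutoff / m_theta
    return math.asinh(x) - x / math.sqrt(1 + x * x)


def lee_integral_numeric(cutoff: float, m_theta: float = 1.0,
                         steps: int = 200_000) -> float:
    """Midpoint-rule evaluation of the same integral, as a cross-check."""
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * h
        total += k * k / (k * k + m_theta * m_theta) ** 1.5
    return total * h


def n_squared(g: float, cutoff: float, m_theta: float = 1.0) -> float:
    """|N|^2 as fixed by the sum rule (11) for real renormalized coupling g."""
    return 1 - g * g / (4 * math.pi**2) * lee_integral(cutoff, m_theta)


def log_max_cutoff(g: float) -> float:
    """ln(Lambda_max/m_theta) at which |N|^2 = 0, using the large-cutoff
    form ln(2 Lambda/m_theta) - 1 of the integral."""
    return 1 - math.log(2) + 4 * math.pi**2 / (g * g)


if __name__ == "__main__":
    g = 2.0  # hypothetical sample value of the renormalized coupling
    print("closed form vs numeric:", lee_integral(50.0), lee_integral_numeric(50.0))
    print("|N|^2 at Lambda = 10 m_theta:", n_squared(g, 10.0))
    print("ln(Lambda_max/m_theta):", log_max_cutoff(g))
```

Beyond the value returned by `log_max_cutoff`, the sum rule can only be satisfied with negative $|N|^2$, which is the difficulty described in the text.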
Despite this difficulty, Lee found that his model with Λ → ∞ gave sensible
results for some simple problems, like the calculation of the energy of the V

particle. In their 1955 paper, Källén and Pauli confronted the difficulty that
$|N|^2$ then comes out negative, and recognized that for very large Λ this was
necessarily a theory with an indefinite metric — that is, it is necessary to take
all states with odd numbers of bare V particles with negative norm, while
all other states with definite numbers of bare particles have positive norm.
In particular, in place of (10), we must take $\langle V, {\rm bare} | V, {\rm bare} \rangle = -1$, while
calculations show that the physical V state has positive norm, so that we
can still normalize it so that $\langle V, {\rm phys} | V, {\rm phys} \rangle = +1$. (There is also another
discrete energy eigenstate formed as a superposition of bare V and N + θ
states, that has negative norm.) Then in place of (11), we have
$$ 1 = -|N|^2 + \frac{g^2}{4\pi^2} \int_0^\Lambda \frac{k^2\, dk}{\omega^3} , \qquad (13) $$
which gives no problem for large Λ. The device of an indefinite metric had
already been introduced by Dirac,17 for reasons having nothing to do with
infinities (Dirac was trying to find a physical interpretation of the negative
energy solutions of the relativistic wave equations for bosons), and Pauli18
had noticed that if we can introduce suitable negative signs into sums over
states, it should be possible to avoid infinities altogether. I think that what
Källén and Pauli in 1955 disliked about the indefinite metric was not that it
solved the problem of infinities, but that it did so too easily, without having
to worry about what really happens at very high energies and short distances,
and this is why they took the trouble to show that it did lead to unphysical
results in the Lee model.
Experience has justified Källén and Pauli’s distrust of the indefinite met-
ric. This device continues to appear in theoretical physics, but only where
there is some symmetry principle that cancels the negative probability for
producing states with negative norm by the positive probability for producing
other unphysical states, so that the total probability of producing physical
states still adds up to 100%. Thus, in the Lorentz-invariant quantization
of the electromagnetic field by Gupta and Bleuler,19 the state of a timelike
photon has negative norm, but gauge invariance ensures that the negative
17
P. A. M. Dirac, Proc. Roy. Soc. A180, 1 (1942).
18
W. Pauli, Rev. Mod. Phys. 15, 175 (1943).
19
S. N. Gupta, Proc. Phys. Soc. 58, 681 (1950); K. Bleuler, Helv. Phys. Acta 28, 567
(1950).

probability for the production of these unphysical photons with timelike po-
larization is canceled by the positive probability for the production of other
unphysical photons, with longitudinal polarization. A similar cancelation
occurs in the Lorentz invariant quantization of string theories, where the
symmetry is conformal symmetry on the two-dimensional worldsheet of the
string. But it seems that without any such symmetry, as in the Lee model,
the indefinite metric does not work.20
I should say a word about where we stand today regarding the survival
of quantum electrodynamics and other field theories in the limit of very
high cut-off. The appropriate formalism for addressing this question is the
renormalization group formalism presented by Wilson21 in 1971. When we
calculate the logarithmic derivative of the bare electron charge $e_{B\Lambda}$ with
respect to the cut-off Λ at a fixed renormalized charge, then the result for
$\Lambda \gg m_e$ can only depend on $e_{B\Lambda}$, since there is no relevant quantity with
the units of energy with which Λ can be compared. That is, $e_{B\Lambda}$ satisfies a
differential equation of the form
$$ \Lambda \frac{d e_{B\Lambda}}{d\Lambda} = \beta(e_{B\Lambda}) . \qquad (14) $$
The whole question then reduces to the behavior of the function β(e). If it
is positive and increases fast enough so that $\int^\infty de/\beta(e)$ converges, then the
cut-off in quantum electrodynamics cannot be extended to a value greater
than a finite energy $E_\infty$, given by
$$ E_\infty = \mu \exp\left( \int_{e_{B\mu}}^\infty \frac{de}{\beta(e)} \right) , \qquad (15) $$
with µ arbitrary. On the basis of an approximation in which in each order of
perturbation theory one keeps only terms with the maximum number of large
logarithms, Landau concluded in ref. 14 that quantum electrodynamics does
break down at very high energy. In effect, he was arguing on the basis of the
lowest-order term, $\beta(e) \simeq e^3/12\pi^2$, for which $\int^\infty de/\beta(e)$ does converge.

20
It has been argued that the PT symmetry of the Lee model allows the definition of a
scalar product for which the theory is unitary; see C. M. Bender, S. F. Brandt, J-H Chen,
and Q. Wang, Phys. Rev. D 71, 025014 (2005); C. M. Bender and P. D. Mannheim, Phys.
Rev. D 78, 025022 (2008).
21
K. G. Wilson, Phys. Rev. B4, 3174, 3184 (1971); Rev. Mod. Phys. 47, 773 (1975).

No one today knows whether this is the case. It is equally possible that
higher-order effects will make β(e) increase more slowly or even decrease for
very large e, in which case $\int^\infty de/\beta(e)$ will diverge and $e_{B\Lambda}$ will just continue
to grow smoothly with Λ. One might imagine that β(e) could instead drop
to zero at some finite value $e_*$, in which case $e_{B\Lambda}$ would approach $e_*$ as
Λ → ∞, though there are arguments against this.22 Lattice calculations
(in which spacetime is replaced by a lattice of separate points, providing an
ultraviolet cut-off equal to the inverse lattice spacing) indicate that the beta
function for a scalar field theory with interaction $g_B \varphi^4$ increases for large
$g_B$ fast enough so that $\int^\infty dg_B/\beta(g_B)$ converges and the theory therefore does
not have a continuum limit for zero lattice spacing.23 And in the Lee model
without an indefinite metric, Eq. (11) together with the relation $g_B = g/N$
gives
$$ \beta(g_{B\Lambda}) \equiv \Lambda \frac{d g_{B\Lambda}}{d\Lambda} = \frac{g_{B\Lambda}^3}{8\pi^2} $$
for $\Lambda \gg m_\theta$, so $\int^\infty dg/\beta(g)$ converges, and as we have seen, the cut-off cannot
be taken to infinity.
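The Lee-model beta function and the corresponding limiting energy can be worked out in a few lines. This is a sketch that takes N and g real and uses only the large-Λ behavior of the integral in Eq. (11); the last line applies the same integration to the lowest-order electrodynamic beta function for comparison.

```latex
% For Lambda >> m_theta the integral in Eq. (11) grows like ln(Lambda), so at fixed g
\Lambda \frac{d\,|N|^2}{d\Lambda} = -\frac{g^2}{4\pi^2} .

% With g_B = g/N, i.e. g_{B\Lambda}^2 = g^2/|N|^2:
\Lambda \frac{d\,g_{B\Lambda}^2}{d\Lambda}
  = -\frac{g^2}{|N|^4}\,\Lambda \frac{d\,|N|^2}{d\Lambda}
  = \frac{g^4}{4\pi^2\,|N|^4}
  = \frac{g_{B\Lambda}^4}{4\pi^2}
\quad\Longrightarrow\quad
\beta(g_{B\Lambda}) = \Lambda \frac{d\,g_{B\Lambda}}{d\Lambda}
  = \frac{g_{B\Lambda}^3}{8\pi^2} .

% Inserting this in Eq. (15):
\int_{g_{B\mu}}^{\infty} \frac{8\pi^2\,dg}{g^3} = \frac{4\pi^2}{g_{B\mu}^2}
\quad\Longrightarrow\quad
E_\infty = \mu\,\exp\!\left(\frac{4\pi^2}{g_{B\mu}^2}\right) .

% The same steps with the lowest-order QED term beta(e) = e^3/(12 pi^2) give
E_\infty = \mu\,\exp\!\left(\frac{6\pi^2}{e_{B\mu}^2}\right) ,
```

Taking µ of order $m_e$, where the bare and physical charges nearly coincide, the last expression reproduces the estimate $m_e \exp(6\pi^2/e^2) \approx 10^{280}\, m_e$ obtained from Eq. (9).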
If limited to quantum electrodynamics, the problem of high energy be-
havior has become academic, since electromagnetism merges with the weak
interactions at energies above 100 GeV, and we really should be asking about
the high energy behavior of the SU(2) and U(1) couplings of the electroweak
theory. Even that is somewhat academic, because gravitation becomes important
at an energy of order $10^{19}$ GeV, well below the energy at which the
SU(2) and U(1) couplings would become infinite. And there is no theory of
gravitation that is renormalizable in the Dyson sense — the Newton constant
appearing in General Relativity has the units of an energy to the power −2.
Källén’s concern with the problems of quantum field theory at very high
energy did not keep him from appreciating the great success of quantum elec-
trodynamics. In a contribution to the 1953 Kamerlingh Onnes Conference,24
22
S. L. Adler, C. G. Callan, D. J. Gross, and R. Jackiw, Phys. Rev. D6, 2982 (1972);
M. Baker and K. Johnson, Physica 96A, 120 (1979); P. C. Argyres, M. Ronen, N. Seiberg,
and E. Witten, Nucl. Phys. B461, 71 (1996).
23
For a discussion and references, see J. Glimm and A. Jaffe, Quantum Physics – A
Functional Integral Point of View, 2nd ed. (Springer-Verlag, New York, 1987), Sec. 21.6;
R. Fernandez, J. Fröhlich, and A. D. Sokal, Random Walks, Critical Phenomena, and
Triviality in Quantum Field Theory (Springer-Verlag, Berlin, 1992), Chapter 15.
24
G. Källén, Physica XIX, 850 (1953).

he remarked that “there is little doubt that the mathematical framework of
quantum electrodynamics contains something which corresponds closely to
physical reality.” He did practical calculations using perturbation theory in
quantum electrodynamics, on problems such as the vacuum polarization in
fourth order25 and the radiative corrections to decay processes.26 He wrote a
book about quantum electrodynamics,27 leaving for the very end of the book
his concern about the infinite value of renormalization constants.
Källén’s interests were not limited to quantum electrodynamics. In 1954
he showed that the renormalizable meson theory with pseudoscalar coupling
could not be used to account for both pion scattering and pion photopro-
duction, because different values of the pion-nucleon coupling constant are
needed in the two cases.28 Again, this result relied on lowest-order pertur-
bation theory, so Källén acknowledged that it did not conclusively kill this
meson theory. He remarked that “It would certainly be felt as a great re-
lief by many theoretical physicists — among them the present author —
if a definite argument against meson theory in its present form or a defi-
nite mathematical inconsistency in it could be found. This feeling together
with wishful thinking must not tempt us to accept as conclusive evidence an
argument that is still somewhat incomplete.”
Of course, Källén was right in his dislike of this particular meson theory.
A decade or so later the development of chiral Lagrangians showed that low
energy pions are in fact well described by a theory with pseudovector coupling
of single pions to nucleons, plus terms with two or more pions interacting
with a nucleon at a single vertex, as dictated by a symmetry principle, chiral
symmetry.29 This theory is not renormalizable in the Dyson sense, but we
have learned how to live with that. It is an effective field theory, which can
be used to generate a series expansion for soft pion scattering amplitudes
in powers of the pion energy. The Lagrangian for the theory contains every
possible interaction that is allowed by the symmetries of the theory, but
the non-renormalizable interactions whose coupling constants are negative
25
G. Källén and A. Sabry, Dan. Mat. Fys. Medd. 29, no. 7 (1955).
26
G. Källén, Nucl. Phys. B 1, 225 (1967).
27
G. Källén, Quantum Electrodynamics, transl. C. K. Iddings and M. Mizushima
(Springer-Verlag, 1972).
28
G. Källén, Nuovo Cimento XII, 217 (1954).
29
For a discussion with references to the original literature, see Sec. 19.5 of S. Weinberg,
The Quantum Theory of Fields, Vol. II (Cambridge Univ. Press, 1996).

powers of some characteristic energy (which is about 1 GeV in this theory)
make a small contribution for pion energies that are much less than the
characteristic energy. To any given order in pion energy, all infinities can be
absorbed in the renormalization of a finite number of coupling parameters,
but we need more and more of these parameters to absorb infinities as we go
to higher and higher powers of pion energy.
My own view is that all of the successful field theories of which we are
so proud — electrodynamics, the electroweak theory, quantum chromody-
namics, and even General Relativity — are in truth effective field theories,
only with a much larger characteristic energy, something like the Planck energy,
$10^{19}$ GeV. It is somewhat of an accident that the simplest versions
of electrodynamics, the electroweak theory, and quantum chromodynamics
are renormalizable in the Dyson sense, though it is very important from a
practical point of view, because the renormalizable interactions dominate
at ordinary accessible energies. An effect of one of the non-renormalizable
terms has recently been detected: An interaction involving two lepton dou-
blets and two scalar field doublets generates neutrino masses when the scalar
fields acquire expectation values.30
None of the renormalizable versions of these theories really describes na-
ture at very high energy, where the non-renormalizable terms in the theory
are not suppressed. From this point of view, the fact that General Relativity
is not renormalizable in the Dyson sense is no more (or less) of a fundamental
problem than the fact that non-renormalizable terms are present along with
the usual renormalizable terms of the Standard Model. All of these theo-
ries lose their predictive power at a sufficiently high energy. The challenge
for the future is to find the final underlying theory, to which the effective
field theories of the Standard Model and General Relativity are low-energy
approximations.
It is possible and perhaps likely that the ingredients of the underlying
theory are not the quark and lepton and gauge boson fields of the Standard
Model, but something quite different, such as strings. After all, as it has
turned out, the ingredients of our modern theory of strong interactions are
not the nucleon and pion fields of Källén’s time, but quark and gluon fields,
with an effective field theory of nucleon and pion fields useful only as a low-
energy approximation.
30
S. Weinberg, Phys. Rev. Lett. 43, 1566 (1979).

But there is another possibility. The underlying theory may be an or-
dinary quantum field theory, including fields for gravitation and the ingre-
dients of the Standard Model. Of course, it could not be renormalizable in
the Dyson sense, so to deal with infinities every possible interaction allowed
by symmetry principles would have to be present, just as in effective field
theories like the chiral theory of pions and nucleons. But it need not lose its
predictive power at high energies, if the bare coupling constants $g_n(\Lambda)$ for
an ultraviolet cut-off Λ (multiplied by whatever positive or negative powers
of Λ are needed to make the $g_n$ dimensionless) approach a fixed point $g_{n*}$
as Λ → ∞.31 This is what happens in quantum chromodynamics, where
$g_* = 0$, and in that case is known as asymptotic freedom.32 In theories involving
gravitation it is not possible for all the $g_{n*}$ to vanish. In this more
general case where $g_{n*}$ is not necessarily zero, the approach to a fixed point
is known as “asymptotic safety,” because the theory is safe from the danger
that dimensionless couplings like $g_{\rm grav} = G\Lambda^2$ (where G is Newton’s constant)
might run off to infinity as Λ goes to infinity.
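The difference between a coupling that runs off to infinity and one that approaches a fixed point can be illustrated with a toy flow. The beta function used below, $\Lambda\, dg/d\Lambda = 2g - b g^2$, is purely hypothetical and is not taken from any of the calculations cited here: the term 2g is the scaling of $G\Lambda^2$ at fixed G, while the quadratic term with an arbitrary coefficient b stands in for quantum corrections. All names and parameter values are assumptions for illustration.

```python
import math


def beta_toy(g: float, b: float = 1.0) -> float:
    """Hypothetical beta function Lambda dg/dLambda = 2 g - b g^2.

    For b = 0 this is just the running of g = G Lambda^2 at fixed G.
    """
    return 2.0 * g - b * g * g


def flow(g0: float, t_max: float, b: float = 1.0, steps: int = 10_000) -> float:
    """Integrate dg/dt = beta_toy(g) with fourth-order Runge-Kutta,
    where t = ln(Lambda/Lambda_0), and return g at t = t_max."""
    h = t_max / steps
    g = g0
    for _ in range(steps):
        k1 = beta_toy(g, b)
        k2 = beta_toy(g + 0.5 * h * k1, b)
        k3 = beta_toy(g + 0.5 * h * k2, b)
        k4 = beta_toy(g + h * k3, b)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return g


def flow_exact(g0: float, t: float, b: float = 1.0) -> float:
    """Closed-form (logistic) solution of the toy flow for b > 0 and g0 > 0."""
    g_star = 2.0 / b  # fixed point where beta_toy vanishes
    return g_star / (1.0 + (g_star / g0 - 1.0) * math.exp(-2.0 * t))


if __name__ == "__main__":
    for g0 in (0.1, 5.0):
        print(f"g0 = {g0}: g at large cut-off = {flow(g0, 10.0):.6f}")
    print("without the quadratic term:", flow(0.1, 5.0, b=0.0))
```

In this toy model every positive starting value is drawn to $g_* = 2/b$ as Λ grows, whereas with b = 0 the coupling grows without limit.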
For asymptotic safety to be possible, it is necessary that $\beta_n(g_*) = 0$, where
$\beta_n(g(\Lambda)) \equiv \Lambda\, dg_n(\Lambda)/d\Lambda$. It is also necessary that the coupling constants
$g_n(\Lambda)$ at any finite cut-off lie on a trajectory in coupling constant space that
is attracted rather than repelled by this fixed point. There are reasons to
expect that, even with an infinite number of coupling parameters, the surfaces
spanned by such trajectories have finite dimensionality, so such a theory
would involve just a finite number of free parameters, just as for ordinary
renormalizable theories. The trouble, of course, is that there is no reason
to expect the gn∗ to be small, so that ordinary perturbation theory can’t be
relied on for calculations in asymptotically safe theories. Other techniques
such as dimensional continuation,33 1/N expansions,34 lattice quantization,35
31
S. Weinberg, in Understanding the Fundamental Constituents of Matter – 1976 Erice
Lectures, ed. A. Zichichi (Plenum Press); and in General Relativity, ed. S. W. Hawking
and W. Israel (Cambridge University Press, 1979) 790.
32
D. J. Gross and F. Wilczek, Phys. Rev. Lett. 30, 1343 (1973); H. D. Politzer, Phys.
Rev. Lett. 30, 1346 (1973).
33
S. Weinberg, ref. 31 (1979); H. Kawai, Y. Kitazawa, & M. Ninomiya, Nucl. Phys. B
404, 684 (1993); Nucl. Phys. B 467, 313 (1996); T. Aida & Y. Kitazawa, Nucl. Phys. B
401, 427 (1997); M. Niedermaier, Nucl. Phys. B 673, 131 (2003) .
34
L. Smolin, Nucl. Phys. B208, 439 (1982); R. Percacci, Phys. Rev. D 73, 041501
(2006).
35
J. Ambjørn, J. Jurkiewicz, & R. Loll, Phys. Rev. Lett. 93, 131301 (2004); Phys. Rev.

and the truncated “exact” renormalization group equations,36 have provided
increasing evidence that gravitation may be part of an asymptotically safe
theory. 37 So it is just possible that we may be closer to the final underlying
theory than is usually thought.
Källén continued his interest in general elementary particle physics, and
wrote a book about it, published in 1964.38 Arthur Wightman quoted a
typical remark about this book: “That is the book on elementary particles
that experimentalists find really helpful.” But Källén’s timing was unlucky
– the development not only of chiral dynamics but also of the electroweak
theory were then just a few years in the future, and they were to put many
of the problems he worried about in a new perspective.
It was a tragic loss not only to his friends and family but also to all the-
oretical physics that Källén died in an airplane accident just 40 years ago.
For me, this was specially poignant, because he had been so kind to me in
Copenhagen, and yet we had become estranged. Some time in 1957, just
before I finished my graduate work, Källén visited Princeton, and left a note
in my mail box. Apparently he had seen a draft of my Ph. D. thesis, which
was about the use of renormalization theory to deal with strong interaction
effects in weak decay processes. His note seemed angry, and said that my
Lett. 95, 171301 (2005); Phys. Rev. D72, 064014 (2005); Phys. Rev. D78, 063544 (2008);
and in Approaches to Quantum Gravity, ed. D. Oriti (Cambridge University Press).
36
M. Reuter, Phys. Rev. D 57, 971 (1998); D. Dou & R. Percacci, Class. Quant. Grav.
15, 3449 (1998); W. Souma, Prog. Theor. Phys. 102, 181 (1999); O. Lauscher & M.
Reuter, Phys. Rev. D 65, 025013 (2001); Class. Quant. Grav. 19, 483 (2002); M. Reuter
& F. Saueressig, Phys Rev. D 65, 065016 (2002); O. Lauscher & M. Reuter, Int. J. Mod.
Phys. A 17, 993 (2002); Phys. Rev. D 66, 025026 (2002); M. Reuter and F. Saueressig,
Phys Rev. D 66, 125001 (2002); R. Percacci & D. Perini, Phys. Rev. D 67, 081503 (2002);
Phys. Rev. D 68, 044018 (2003); D. Perini, Nucl. Phys. Proc. Suppl. C 127, 185 (2004);
D. F. Litim, Phys. Rev. Lett. 92, 201301 (2004); A. Codello & R. Percacci, Phys. Rev.
Lett. 97, 221301 (2006); A. Codello, R. Percacci, & C. Rahmede, Int. J. Mod. Phys. A23,
143 (2008); M. Reuter & F. Saueressig, 0708.1317; P. F. Machado and F. Saueressig, Phys.
Rev. D77, 124045 (2008); A. Codello, R. Percacci, & C. Rahmede, 0805.2909; A. Codello
& R. Percacci, 0810.0715; D. F. Litim 0810.3675; H. Gies & M. M. Scherer, 0901.2459; D.
Benedetti, P. F. Machado, & F. Saueressig, 0901.2984, 0902.4630; M. Reuter & H. Weyer,
0903.2971.
37
For reviews see M. Niedermaier & M. Reuter, Living Rev. Relativity 9, 5 (2006);
M. Niedermaier, Class. Quant. Grav. 24, R171 (2007); M. Reuter and F. Saueressig,
0708.1317; R. Percacci, in Approaches to Quantum Gravity, ed. D. Oriti (Cambridge
University Press).
38
G. Källén, Elementary Particle Physics (Addison-Wesley, Reading, MA. 1964).

work showed all the misconceptions about quantum field theory that were
then common. Well, my thesis was no great accomplishment, but I didn’t
see why he was angry about it. Maybe he was annoyed that I was following
the common practice, of not worrying about the fact that the renormaliza-
tion constants I encountered were infinite. Torsten Gustafson39 has said of
Källén that “Like Pauli he often expressed his opinion in a provocative fash-
ion — especially to well-known physicists.” I certainly was not a well-known
physicist, but maybe Källén was paying me a compliment by treating me like
one.
I did not meet Källén again after this, and I never replied to his note. I
regret that very much, because I think that if I had replied we could have un-
derstood each other, and been friends again. Perhaps this talk can substitute
for the reply to Källén that I should have made half a century ago.

I am grateful to C. Jarlskog and the Källén Lecture Committee for inviting
me to Lund to give this talk, and to the Gunnar and Gunnel Källén Memorial
Fund of the Royal Physiographic Society for sponsoring it. This material
is based in part upon work supported by the National Science Foundation
under Grant No. PHY-0455649 and with support from The Robert A. Welch
Foundation, Grant No. F-0014.

39
T. Gustafson, Nucl. Phys. A140, 1 (1970).

UTTG-11-09
arXiv:0911.3165v2 [hep-th] 30 Mar 2010

Asymptotically Safe Inflation

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

Inflation is studied in the context of asymptotically safe theories of
gravitation. Conditions are explored under which it is possible to have a
long period of nearly exponential expansion that eventually comes to an end.


Electronic address: weinberg@physics.utexas.edu

I. Introduction

Decades ago it was suggested that the effective quantum field theory of
gravitation and matter might be asymptotically safe,1 and hence ultraviolet-
complete. That is, the renormalization group flows might have a fixed point,
with a finite dimensional ultraviolet critical surface of trajectories attracted
to the fixed point at short distances. Evidence for a fixed point in the quan-
tum theory of gravitation with or without matter has gradually accumulated
through the use of dimensional continuation,2 the large N approximation3
(where N is the number of matter fields), lattice methods,4 the truncated
exact renormalization group,5 and a version of perturbation theory.6 Re-
cently there has also been evidence that the ultraviolet critical surface is
finite-dimensional; it has been found that even in truncations of the exact
renormalization group equations with more than three (and up to nine) in-
dependent coupling parameters, the ultraviolet critical surface is just three-
dimensional.7 The condition that physical parameters lie on the ultraviolet
1
S. Weinberg, in Understanding the Fundamental Constituents of Matter, ed. A.
Zichichi (Plenum Press, New York, 1977).
2
S. Weinberg, in General Relativity, ed. S. W. Hawking and W. Israel (Cambridge
University Press, 1979): 700; H. Kawai, Y. Kitazawa, & M. Ninomiya, Nucl. Phys. B
404, 684 (1993); Nucl. Phys. B 467, 313 (1996); T. Aida & Y. Kitazawa, Nucl. Phys. B
401, 427 (1997); M. Niedermaier, Nucl. Phys. B 673, 131 (2003) .
3
L. Smolin, Nucl. Phys. B208, 439 (1982); R. Percacci, Phys. Rev. D 73, 041501
(2006).
4
J. Ambjørn, J. Jurkiewicz, & R. Loll, Phys. Rev. Lett. 93, 131301 (2004); Phys. Rev.
Lett. 95, 171301 (2005); Phys. Rev. D72, 064014 (2005); Phys. Rev. D78, 063544 (2008);
and in Approaches to Quantum Gravity, ed. D. Orı́ti (Cambridge University Press).
5
M. Reuter, Phys. Rev. D 57, 971 (1998); M. Reuter, hep-th/9605030; D. Dou & R.
Percacci, Class. Quant. Grav. 15, 3449 (1998); W. Souma, Prog. Theor. Phys. 102, 181
(1999); O. Lauscher & M. Reuter, Phys. Rev. D 65, 025013 (2001); Class. Quant. Grav.
19, 483 (2002); M. Reuter & F. Saueressig, Phys. Rev. D 65, 065016 (2002); O. Lauscher
& M. Reuter, Int. J. Mod. Phys. A 17, 993 (2002); Phys. Rev. D 66, 025026 (2002);
M. Reuter and F. Saueressig, Phys Rev. D 66, 125001 (2002); R. Percacci & D. Perini,
Phys. Rev. D 67, 081503 (2002); Phys. Rev. D 68, 044018 (2003); D. Perini, Nucl. Phys.
Proc. Suppl. C 127, 185 (2004); D. F. Litim, Phys. Rev. Lett. 92, 201301 (2004); A.
Codello & R. Percacci, Phys. Rev. Lett. 97, 221301 (2006); A. Codello, R. Percacci, & C.
Rahmede, Int. J. Mod. Phys. A23, 143 (2008); M. Reuter & F. Saueressig, 0708.1317; P.
F. Machado and F. Saueressig, Phys. Rev. D77, 124045 (2008); A. Codello, R. Percacci,
& C. Rahmede, Ann. Phys. 324, 414 (2009); A. Codello & R. Percacci, 0810.0715; D. F.
Litim 0810.3675; H. Gies & M. M. Scherer, 0901.2459; D. Benedetti, P. F. Machado, &
F. Saueressig, 0901.2984, 0902.4630; M. Reuter & H. Weyer, 0903.2971. For a review, see
M. Reuter and F. Saueressig, to be published [0708.1317].
6
M. R. Niedermaier, Phys. Rev. Lett. 103, 101303 (2009).
7
A. Codello, R. Percacci, & C. Rahmede, Int. J. Mod. Phys. A23, 143 (2008); Ann.

critical surface is analogous to the condition of renormalizability in the Stan-
dard Model, and like that condition yields a theory with a finite number of
free parameters.
The natural arena for applications of the idea of asymptotic safety is
the physics of very short distances, and in particular the early universe.8
In Section II we show how to formulate the differential equations for the
scale factor in a Robertson–Walker solution of the classical field equations
in a completely general generally covariant theory of gravitation. In Section
III we apply this result to calculate the expansion rate H for a de Sitter
solution of the classical field equations. We are interested here in solutions
for which H is of the same order as the scale at which the couplings are
beginning to approach their fixed point, or larger. In this case, H turns
out in the tree approximation to depend strongly on the ultraviolet cutoff,
indicating a breakdown of the classical approximation. We deal with this by
choosing an optimal cutoff, which minimizes the quantum corrections to the
classical field equations. Section IV considers more general time-dependent
Robertson–Walker solutions of the classical field equations with an optimal
cutoff, and explores the circumstances under which it is possible to have an
exponential expansion that persists for a long time but eventually comes to
an end. An illustrative example is worked out in Section V.
We will work with a completely general generally covariant theory of
gravitation. (For simplicity matter will be ignored here.) The effective
Phys. 324, 414 (2009); D. Benedetti, P. F. Machado, & F. Saueressig, 0901.2984, 0902.4630
8
The implications of asymptotic safety for cosmology have been considered by A. Bo-
nanno and M. Reuter, Phys. Rev. D 65, 043508 (2002); Phys. Lett. B527, 9 (2002);
M. Reuter and F. Saueressig, J. Cosm. and Astropart. Phys. 09, 012 (2005). This work
differs from that presented here, in that they consider a severe truncation of the gravita-
tional action, including only the cosmological constant and Einstein–Hilbert terms; they
include matter as a perfect fluid with a constant equation of state parameter w; and they
employ a time-dependent cutoff Λ. For more recent similar work that is somewhat closer
in spirit to the present paper, see A. Bonanno and M. Reuter, J. Cosm. and Astropart.
Phys. 0708, 024 (2007); J. Phys. Conf. Ser. 140, 012008 (2008).

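action with an ultraviolet cutoff Λ takes the form9
$$
I_\Lambda[g] = -\int d^4x\, \sqrt{-{\rm Det}\,g}\, \Big[ \Lambda^4 g_0(\Lambda) + \Lambda^2 g_1(\Lambda) R + g_{2a}(\Lambda) R^2 + g_{2b}(\Lambda) R^{\mu\nu}R_{\mu\nu}
 + \Lambda^{-2} g_{3a}(\Lambda) R^3 + \Lambda^{-2} g_{3b}(\Lambda) R\, R^{\mu\nu}R_{\mu\nu} + \dots \Big] . \tag{1}
$$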

Here we have extracted powers of Λ from the conventional coupling
constants, to make the coupling parameters gn (Λ) dimensionless. Because they
are dimensionless, these running couplings satisfy renormalization group
equations of the form
$$
\Lambda \frac{d}{d\Lambda}\, g_n(\Lambda) = \beta_n\big(g(\Lambda)\big) . \tag{2}
$$

The condition for a fixed point at gn = gn∗ is that βn (g∗ ) = 0 for all n. As is
well known, the condition for the couplings to be attracted to a fixed point
gn∗ as Λ → ∞ can be seen by considering the behavior of gn (Λ) when it is
near gn∗ . In the case where βn (g) is analytic in a neighborhood of gn∗ , near
this fixed point we have
$$
\beta_n(g) \to \sum_m B_{nm}\,(g_m - g_{*m}) , \qquad B_{nm} \equiv \left(\frac{\partial \beta_n(g)}{\partial g_m}\right)_* . \tag{3}
$$

The solution of Eq. (2) in this neighborhood is


$$
g_n(\Lambda) \to g_{*n} + \sum_N u^N_n \left(\frac{\Lambda}{M}\right)^{\lambda_N} , \tag{4}
$$
9
Higher derivative theories of this sort if used in the tree approximation have long
been known to be plagued by “ghosts”; that is, poles in propagators with residues of
the wrong sign for unitarity. This is only if the series of operators in (1) is truncated;
otherwise propagator denominators are not polynomials in the squared momentum, and
there may be just one pole, or any number of poles. Even with a truncated action, because
of the running of the couplings, there is no one Lagrangian that can be used to find the
propagator in the tree approximation over the whole range of momenta where the various
poles occur, and it is not ruled out that all the poles have the residues of the right sign.
For instance, ref. 6 shows that, in a theory with only the couplings g1 , g2a , and g2b , the
residue of the pole in the spin 2 propagator at high mass, which had usually been supposed
to have the wrong sign (as for instance in the work of K. S. Stelle, Phys. Rev. D 16, 953
(1977)), in fact has a sign consistent with unitarity. More generally, Benedetti et al. in ref.
7 point out that for any truncation or no truncation, when we look for a pole at a four-
momentum p, we must take the cut-off Λ to be proportional to $\sqrt{-p^2}$, so the denominator
of any propagator takes the form $p^2 + m^2(-p^2)$. The function $m^2(-p^2)$ is a constant at
sufficiently low |p2 |, and of the form cp2 for momenta so large that the couplings are near
their fixed point, where c is a constant, so the equation p2 + m2 (−p2 ) = 0 for the pole
position has no solution if −c > 1.

where uN and λN are eigenvectors and corresponding eigenvalues of the
matrix Bnm :
$$
\sum_m B_{nm}\, u^N_m = \lambda_N\, u^N_n . \tag{5}
$$
m

It is a physical requirement that the only eigenvectors that are allowed
to appear in the sum in Eq. (4) are those for which the real parts of the
corresponding eigenvalues are negative, so that the couplings actually do
approach the fixed point. The normalizations of the eigenvectors that do
appear in Eq. (4) are free physical parameters, the only free parameters of
the theory, except that we can adjust the over-all normalization of all the
eigenvectors as we like by a suitable choice of the arbitrary mass scale M.
If we choose M to make the largest of the $u^N_n$ of order unity, then M is the
cut-off scale at which couplings are just beginning to approach their fixed
point.
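As a purely illustrative sketch of Eqs. (3)–(5), the following Python snippet
uses a made-up 2×2 matrix B. The matrix entries, the fixed-point values, the
normalizations, and the helper names are all hypothetical, and are not taken
from any gravitational calculation. It shows that a mode with a negative
eigenvalue dies away as Λ/M grows, while a mode with a positive eigenvalue
drives the couplings away from the fixed point, and so must be absent on the
ultraviolet critical surface.

```python
# Toy linearized flow near a fixed point, following Eq. (4):
#   g_n(Lambda) = g*_n + sum_N u^N_n (Lambda/M)**lambda_N
# All numbers below are hypothetical and chosen only for illustration.

B = [[-2.0, 1.0],
     [0.0, 3.0]]          # made-up stability matrix B_nm
G_STAR = [0.5, 0.1]       # made-up fixed-point couplings g*_n

# For an upper-triangular 2x2 matrix [[a, b], [0, d]] the eigenpairs are
# (a, (1, 0)) and (d, (b, d - a)); the prefactors are the free normalizations.
ATTRACTIVE = (-2.0, [0.3 * 1.0, 0.3 * 0.0])
REPULSIVE = (3.0, [1e-6 * 1.0, 1e-6 * 5.0])


def is_eigenpair(matrix, eigenvalue, vector, tol=1e-12):
    """Check Eq. (5): sum_m B_nm u_m = lambda u_n."""
    for n, row in enumerate(matrix):
        lhs = sum(b * u for b, u in zip(row, vector))
        if abs(lhs - eigenvalue * vector[n]) > tol:
            return False
    return True


def couplings(lam_over_m, modes):
    """Evaluate Eq. (4) at a given Lambda/M for the listed (eigenvalue, vector) modes."""
    return [
        g + sum(vec[n] * lam_over_m ** lam for lam, vec in modes)
        for n, g in enumerate(G_STAR)
    ]


if __name__ == "__main__":
    for x in (1.0, 10.0, 100.0):
        print(x, couplings(x, [ATTRACTIVE]), couplings(x, [ATTRACTIVE, REPULSIVE]))
```

With only the attractive mode present the couplings settle onto the fixed
point at large Λ/M; including even a tiny admixture of the repulsive mode
eventually dominates, which is why its normalization is not a free parameter.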
Aside from the illustrative example considered in Section V, we will
not carry our discussion in this paper to the point of performing numerical
calculations, which of course would require some truncation of the series of
terms in the action (1). Our purpose here is to lay out the general outlines of
such a calculation, for which purpose we do not need to adopt any specific
truncation. Our results are worked out in detail for the terms explicitly
shown in Eq. (1), but this is only for the purposes of illustration; nothing
in this paper assumes the neglect of higher terms. For our purposes here,
it makes no difference whether Λ is regarded as a sharp ultraviolet cutoff
on loop diagrams to be calculated using the action (1), or as a momentum
parameter (usually called k) in a regulator term added to the action, or a
sliding renormalization scale.

II. Robertson–Walker Solutions

In this section we consider how to find a solution of the classical
gravitational field equations for the general action (1), of the flat-space
Robertson–Walker form
$$
d\tau^2 = dt^2 - a^2(t)\, d\vec{x}^{\,2} . \tag{6}
$$
It would be very complicated to derive the ten classical field equations for
a general metric that follow from an action like (1), and then specialize
to the case of a Robertson–Walker metric. Instead, we can much more
easily exploit the symmetries of this metric to derive a single differential
equation for the Hubble rate H(t) ≡ ȧ(t)/a(t). In showing how to derive

this differential equation, we will be quite general, not making any use in
this section of the assumption of asymptotic safety.
We can use the rotational and translational symmetries of the line ele-
ment (6) to write the components of the variational derivatives δIΛ /δgµν in
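the form
$$
\left[\frac{\delta I_\Lambda[g]}{\delta g_{ij}(x)}\right]_{RW} = \frac{\Lambda^4}{6}\, \delta_{ij}\, a^{-2}(t)\, M_\Lambda(t) , \tag{7}
$$
$$
\left[\frac{\delta I_\Lambda[g]}{\delta g_{i0}(x)}\right]_{RW} = 0 , \tag{8}
$$
$$
\left[\frac{\delta I_\Lambda[g]}{\delta g_{00}(x)}\right]_{RW} = -\frac{\Lambda^4}{2}\, N_\Lambda(t) , \tag{9}
$$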

the subscript RW indicating that, after taking the variational derivative,
the metric is to be set equal to the Robertson–Walker metric defined by (6).
(The factors $\Lambda^4/6a^2$ and $\Lambda^4/2$ are inserted in the definitions of $M_\Lambda$ and $N_\Lambda$
for future convenience.) Also, the general covariance of the action yields the
generalized Bianchi identity
$$
0 = \left[\frac{\delta I_\Lambda[g]}{\delta g_{\mu\nu}(x)}\right]_{;\nu} . \tag{10}
$$
By using Eqs. (7)–(9) for the Robertson–Walker metric, Eq. (10) is reduced
to the condition:
$$
a^2 \dot a\, M_\Lambda = \frac{d}{dt}\left(a^3 N_\Lambda\right) . \tag{11}
$$
Therefore the gravitational field equations reduce here to a single differential
equation:
NΛ (t) = 0 , (12)
which we see ensures the vanishing of all variational derivatives δIΛ [g]/δgµν .
This result (which holds also in the presence of spatial curvature and matter)
is the generalization of the familiar Friedmann equation, which would apply
if only the Einstein–Hilbert term $-\sqrt{g}\,R/16\pi G$ and a vacuum energy term
were included in the gravitational action.
We can express MΛ and then NΛ in terms of variational derivatives of
the action for the Robertson-Walker metric with respect to the scale factor
a(t). Because a(t) appears in the Robertson–Walker metric only as a factor

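$a^2(t)$ in $g_{ij}(x,t)$, we have
$$
\frac{\delta I_\Lambda[g_{RW}]}{\delta a(t)} = \int d^3x\; 2a(t)\,\delta_{ij} \times a^3(t) \left[\frac{\delta I[g]}{\delta g_{ij}(x,t)}\right]_{RW} = V \Lambda^4 M_\Lambda(t)\, a^2(t) , \tag{13}
$$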
where V is the coordinate space volume (which can be made finite by im-
posing periodic boundary conditions.) For the flat-space Robertson-Walker
metric (gRW )µν , the action takes the general form
$$
I_\Lambda[g_{RW}] = V \Lambda^4 \int dt\; a^3(t)\, I_\Lambda\big(H(t), \dot H(t), \dots\big) , \tag{14}
$$

where as usual $H(t) \equiv \dot a(t)/a(t)$. Here and in Eqs. (15)–(17) below, the
ellipsis . . . indicates a possible dependence of IΛ on second and higher
derivatives of H(t). (Second and higher time derivatives do not occur in IΛ if
the integrand of the action is $\sqrt{-{\rm Det}\,g}$ times an arbitrary scalar function
of the Riemann-Christoffel curvature tensor Rµνρσ , including of course an
arbitrary dependence on the curvature scalar and the Ricci tensor, but we
do not assume that this is the case.) Comparing Eq. (13) with the result of
a straightforward calculation of the variational derivative of the action (14)
with respect to a(t) gives
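$$
M_\Lambda = 3 I_\Lambda - 3H \frac{\partial I_\Lambda}{\partial H} + (3\dot H + 9H^2)\frac{\partial I_\Lambda}{\partial \dot H}
 - \frac{d}{dt}\left(\frac{\partial I_\Lambda}{\partial H}\right) + 6H \frac{d}{dt}\left(\frac{\partial I_\Lambda}{\partial \dot H}\right) + \frac{d^2}{dt^2}\left(\frac{\partial I_\Lambda}{\partial \dot H}\right) + \dots . \tag{15}
$$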
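We note that $a^2 \dot a M_\Lambda$ is a time-derivative
$$
a^2 \dot a\, M_\Lambda = \frac{d}{dt}\left\{ a^3 \left[ I_\Lambda - H \frac{\partial I_\Lambda}{\partial H} + (-\dot H + 3H^2)\frac{\partial I_\Lambda}{\partial \dot H}
 + H \frac{d}{dt}\left(\frac{\partial I_\Lambda}{\partial \dot H}\right) + \dots \right] \right\} . \tag{16}
$$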

Comparing with Eq. (11), we see that NΛ equals the term in square brackets
in (16), up to a possible term equal to a constant divided by a3 (t). But the
term in square brackets is independent of the scale of a(t), as is NΛ (t), so
there can be no term in their difference proportional to 1/a3 (t), and thus
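$$
N_\Lambda = I_\Lambda - H \frac{\partial I_\Lambda}{\partial H} + (-\dot H + 3H^2)\frac{\partial I_\Lambda}{\partial \dot H} + H \frac{d}{dt}\left(\frac{\partial I_\Lambda}{\partial \dot H}\right) + \dots \tag{17}
$$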

The ten classical field equations reduce for the flat-space Robertson–Walker
metric to the single requirement that this vanishes.
To evaluate the terms in the action for the Robertson–Walker metric
with no spatial curvature that are explicitly shown in Eq. (1), we note that
for this metric $R = -12H^2 - 6\dot H$ and $R^{\mu\nu}R_{\mu\nu} = 36H^4 + 36H^2\dot H + 12\dot H^2$.
Using these in Eq. (1) and comparing with Eq. (14) gives

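$$
\begin{aligned}
I_\Lambda = {} & -g_0(\Lambda) + \Lambda^{-2} g_1(\Lambda)\,(12H^2 + 6\dot H) - \Lambda^{-4} g_{2a}(\Lambda)\,(12H^2 + 6\dot H)^2 \\
& - \Lambda^{-4} g_{2b}(\Lambda)\left(36H^4 + 36H^2\dot H + 12\dot H^2\right) + \Lambda^{-6} g_{3a}(\Lambda)\,(12H^2 + 6\dot H)^3 \\
& + \Lambda^{-6} g_{3b}(\Lambda)\,(12H^2 + 6\dot H)\left(36H^4 + 36H^2\dot H + 12\dot H^2\right) + \dots ,
\end{aligned} \tag{18}
$$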

where now the dots . . . denote contributions from terms not shown in (1),
some of which involve second and higher derivatives of H. From Eq. (17),
we then have

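$$
\begin{aligned}
N_\Lambda(H, \dot H, \ddot H, \dots) = {} & -g_0(\Lambda) + 6\Lambda^{-2} g_1(\Lambda) H^2 \\
& - \Lambda^{-4} g_{2a}(\Lambda)\left(216 H^2\dot H - 36\dot H^2 + 72 H\ddot H\right) \\
& - \Lambda^{-4} g_{2b}(\Lambda)\left(72 H^2\dot H - 12\dot H^2 + 24 H\ddot H\right) \\
& + \Lambda^{-6} g_{3a}(\Lambda)\left(-864 H^6 + 7776 H^4\dot H + 3240 H^2\dot H^2 - 432\dot H^3 + 216 H\ddot H\,(12H^2 + 6\dot H)\right) \\
& + \Lambda^{-6} g_{3b}(\Lambda)\left(-216 H^6 + 2160 H^4\dot H + 1008 H^2\dot H^2 - 144\dot H^3 + H\ddot H\,(720H^2 + 432\dot H)\right) + \dots .
\end{aligned} \tag{19}
$$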

This is the quantity that must be set equal to zero in finding a flat-space
Robertson–Walker solution of the classical gravitational field equations.
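As a quick consistency check on Eq. (19), one can work out a single term by
hand. The following sketch keeps only the g1 term of Eq. (18), treats Λ and
g1(Λ) as fixed, and inserts it into Eq. (17); it is meant only as a worked
illustration, not as a replacement for the full calculation.

```latex
% g_1 term of Eq. (18):  I_\Lambda \supset \Lambda^{-2} g_1 (12H^2 + 6\dot H)
\begin{aligned}
\frac{\partial I_\Lambda}{\partial H} &= 24\,\Lambda^{-2} g_1 H , \qquad
\frac{\partial I_\Lambda}{\partial \dot H} = 6\,\Lambda^{-2} g_1 , \qquad
\frac{d}{dt}\left(\frac{\partial I_\Lambda}{\partial \dot H}\right) = 0 , \\
N_\Lambda &\supset \Lambda^{-2} g_1 \left[ (12H^2 + 6\dot H) - 24H^2 + 6\,(-\dot H + 3H^2) \right]
 = 6\,\Lambda^{-2} g_1 H^2 .
\end{aligned}
```

The terms in $\dot H$ cancel, reproducing the g1 term in Eq. (19), as expected
for the Einstein–Hilbert contribution to the Friedmann equation.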

III. De Sitter Solutions and Optimal Cutoff

We can now easily find the condition for a de Sitter solution of the
classical field equations, with

$$
a(t) \propto e^{Ht} , \tag{20}
$$

where H is constant. Setting the quantity (19) equal to zero for H(t) = H

gives our condition on H:10

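$$
\begin{aligned}
0 = N_\Lambda(H) & \equiv N_\Lambda(H, 0, 0, \dots) \\
& = -g_0(\Lambda) + 6\, g_1(\Lambda)\,(H/\Lambda)^2 - 864\, g_{3a}(\Lambda)\,(H/\Lambda)^6 - 216\, g_{3b}(\Lambda)\,(H/\Lambda)^6 + \dots
\end{aligned} \tag{21}
$$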

It is easy to find solutions of Eq. (21) that have small values of H, very
much smaller than the scale M at which the couplings begin to approach
their fixed points. For sufficiently small H, we can take Λ to be much larger
than H, and yet small enough so that the couplings appearing as coefficients
in (1) become independent of Λ, and in particular

$$
\Lambda^4 g_0(\Lambda) \to \rho_V , \qquad \Lambda^2 g_1(\Lambda) \to 1/16\pi G_N ,
$$
where ρV and GN are the conventional, Λ-independent, vacuum energy and
Newton constant. Then (21) has the familiar Λ-independent solution
$$
H^2 = \frac{8\pi G_N\, \rho_V}{3} .
$$
Because of the still mysterious fact that ρV is observed to be much less than
$G^{-2}$, this value of H is much less than $G^{-1/2}$, and so radiative corrections
and higher terms in (21) can be neglected.
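For readers who want the intermediate step, the low-H solution follows from
keeping only the first two terms of Eq. (21) and then inserting the limiting
values of the couplings quoted above. A short sketch:

```latex
0 = -g_0(\Lambda) + 6\, g_1(\Lambda)\,\frac{H^2}{\Lambda^2}
\quad\Longrightarrow\quad
H^2 = \frac{\Lambda^4 g_0(\Lambda)}{6\,\Lambda^2 g_1(\Lambda)}
\;\to\; \frac{\rho_V}{6/(16\pi G_N)} = \frac{8\pi G_N\, \rho_V}{3} .
```

The explicit powers of Λ cancel against those extracted from the couplings,
which is why the result is independent of the cutoff in this regime.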
We will instead be interested here in looking for solutions for which H is
roughly of the order of the scale M at which the couplings begin to approach
their fixed points, or larger. In this case, we face a difficult choice: How
should we choose Λ? On one hand, if we choose Λ ≪ H, then we can expect
radiative corrections to the classical result (21) to be unimportant, because
H provides a natural infrared cutoff in loop diagrams constructed using the
action (1). But for Λ ≪ H, the sum (21) receives increasing contributions as
we include higher and higher terms, and whether or not the series actually
converges, it is not useful. On the other hand, if we choose Λ ≫ H, then
it is reasonable to suppose that the series (21) is dominated by its lowest
terms, but for Λ ≫ H there is no reason to suppose that we can neglect
radiative corrections to the field equations. Indeed, we can see that radiative
10
Note that this is not the result that would be obtained by setting the derivative of
IΛ (H, 0, 0, . . .) with respect to H equal to zero. For a de Sitter metric with a(t) = exp(Ht),
the integral over t in the action IΛ [g] diverges at t = ∞. If we integrate only from
t = −∞ to t = 0, the integral $\int dt\, a^3(t)$ gives a factor 1/3H, but the derivative of
IΛ (H, 0, 0, . . .)/3H with respect to H is not zero; it equals a surface term (∂IΛ /∂ Ḣ)H ,
which again gives Eq. (21).

corrections to the field equations are important here, because where Eq. (21)
is dominated by its lowest terms, it gives H a strong dependence on Λ. (This
is clearest in the case where Λ is so large that the couplings are near their
fixed points, in which case (21) gives H proportional to Λ.) The whole
point of the renormalization group equations (2) is that physical quantities
like H should be independent of the cutoff, but in general this is true only
when radiative corrections are included, and since Eq. (21) gives H a strong
dependence on Λ when Λ ≫ H, radiative corrections evidently can not be
neglected.
Ideally, we should leave Λ undetermined, and calculate enough of the
radiative corrections to the field equations so that H comes out at least
approximately independent of Λ. This would not be easy. Instead, we can
try to make a judicious choice of Λ to minimize the radiative corrections. We
can guess that the optimal Λ is roughly of the order of H, where radiative
corrections are just beginning to be important, and the higher terms in (21)
are just beginning to be less important. This sort of guess works quite well
in quantum chromodynamics. The radiative corrections to a process like
$e^+$–$e^-$ annihilation into jets of hadrons at an energy E are accompanied
by powers of ln(E/Λ), and to avoid large radiative corrections it is only
necessary to take Λ ≈ E. In this way, we can use the tree approximation to
calculate the annihilation into, say, three jets, with the renormalization scale
of the QCD coupling taken of order E. But in our case, radiative corrections
are more sensitive to Λ, and we have to make a more careful choice of Λ.
To find an optimal cutoff, we note that in principle we should find H by
solving the full quantum corrected field equations, which give a result that
can be schematically written as

$$
H^{\rm true} = H(\Lambda) + \Delta H(\Lambda) , \tag{22}
$$

where H(Λ) is defined as the solution of Eq. (21), and ∆H(Λ) represents
the effect of radiative corrections. Instead of calculating loop graphs, we
can get some idea of the results of such a calculation by using the tree-
approximation field equations (21), but with Λ chosen at a local minimum
of the radiative corrections to H. For such an optimal Λ, we have11

$$
\frac{\partial}{\partial\Lambda}\, \Delta H(\Lambda) = 0 . \tag{23}
$$
11
This is the weakest point in our discussion. For one thing, we do not know whether
the condition (23) gives a local minimum or maximum of the radiative corrections. Worse,
even if the radiative corrections are minimized, we do not know that they are small.

As already mentioned, physical quantities, including the true expansion rate
$H^{\rm true}$, must be independent of Λ, so Eq. (23) tells us also that the expansion
rate calculated from the classical field equations is stationary at the optimal
cut-off
$$
0 = \Lambda \frac{\partial}{\partial\Lambda}\, H(\Lambda) . \tag{24}
$$
By definition, for any Λ we have $N_\Lambda\big(H(\Lambda)\big) = 0$, and by differentiating
this with respect to Λ and using Eq. (24) we find that the condition for an
optimal cutoff may be put in the form
$$
0 = \left[\Lambda \frac{\partial}{\partial\Lambda}\, N_\Lambda(H)\right]_{H=H(\Lambda)} = A_\Lambda\big(H(\Lambda)\big) + B_\Lambda\big(H(\Lambda)\big) , \tag{25}
$$

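where AΛ arises from the explicit dependence of NΛ (H) on H/Λ:
$$
\begin{aligned}
A_\Lambda(H) & \equiv -H \frac{\partial}{\partial H}\, N_\Lambda(H) \\
& = -12\, g_1(\Lambda)\left(\frac{H}{\Lambda}\right)^2 + 5184\, g_{3a}(\Lambda)\left(\frac{H}{\Lambda}\right)^6 + 1296\, g_{3b}(\Lambda)\left(\frac{H}{\Lambda}\right)^6 + \dots ,
\end{aligned} \tag{26}
$$
and BΛ comes from the running of the couplings in NΛ :
$$
B_\Lambda(H) \equiv -\beta_0\big(g(\Lambda)\big) + 6\, \beta_1\big(g(\Lambda)\big)\,(H/\Lambda)^2 - 864\, \beta_{3a}\big(g(\Lambda)\big)\,(H/\Lambda)^6 - 216\, \beta_{3b}\big(g(\Lambda)\big)\,(H/\Lambda)^6 + \dots . \tag{27}
$$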

We now have two equations, (21) and (25), for the two quantities H and Λ,
so it is not unreasonable to expect there to be one or more solutions, with
both Λ and H roughly of order M , the only mass parameter in the theory.
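The step from Eq. (24) to Eqs. (25)–(27) is just the chain rule. The sketch
below spells it out, under the assumption (implicit in Eq. (21)) that NΛ (H)
depends on Λ only through the ratio H/Λ and through the couplings gn (Λ).

```latex
\begin{aligned}
0 &= \Lambda \frac{d}{d\Lambda}\, N_\Lambda\big(H(\Lambda)\big)
   = \left[\Lambda \frac{\partial}{\partial\Lambda} N_\Lambda(H)\right]_{H=H(\Lambda)}
   + \Lambda \frac{dH(\Lambda)}{d\Lambda} \left[\frac{\partial N_\Lambda(H)}{\partial H}\right]_{H=H(\Lambda)} , \\
\Lambda \frac{\partial}{\partial\Lambda} \left(\frac{H}{\Lambda}\right)^{k}
  &= -k \left(\frac{H}{\Lambda}\right)^{k}
   = -H \frac{\partial}{\partial H} \left(\frac{H}{\Lambda}\right)^{k} , \qquad
\Lambda \frac{\partial}{\partial\Lambda}\, g_n(\Lambda) = \beta_n\big(g(\Lambda)\big) .
\end{aligned}
```

The second term in the first line vanishes at the optimal cutoff by Eq. (24).
Acting on the powers of H/Λ then gives AΛ of Eq. (26), and acting on the
couplings replaces each gn by βn , giving BΛ of Eq. (27).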

IV. Time Dependence

The de Sitter solution found in Section III describes a universe that
inflates eternally. For a more realistic picture of inflation, we need a solution
that remains close to the de Sitter solution with expansion rate near H for
a time much longer than 1/H, but that gradually evolves away from the de
Sitter solution, so that inflation can come to an end. (We have nothing to

say here about the metric before the universe enters into its de Sitter phase.)
To find such a solution, we will consider first-order perturbations of the de
Sitter solution, of the Robertson–Walker form (6). The expansion rate will
take the form
H(t) = H + δH(t) , (28)
with |δH(t)| ≪ H. Keeping only terms in (19) of first order in δH(t), the
field equation NΛ = 0 becomes

$$
c_0(H,\Lambda)\,\frac{\delta H}{H} + c_1(H,\Lambda)\,\frac{\delta \dot H}{H^2} + c_2(H,\Lambda)\,\frac{\delta \ddot H}{H^3} + \dots = 0 , \tag{29}
$$
where
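$$
c_0(H,\Lambda) \equiv H \left(\frac{\partial N_\Lambda}{\partial H}\right)_H = -A_\Lambda(H) , \tag{30}
$$
with AΛ given by Eq. (26), and
$$
\begin{aligned}
c_1(H,\Lambda) & \equiv H^2 \left(\frac{\partial N_\Lambda}{\partial \dot H}\right)_H \\
& = -216\, g_{2a}(\Lambda)\left(\frac{H}{\Lambda}\right)^4 - 72\, g_{2b}(\Lambda)\left(\frac{H}{\Lambda}\right)^4 + 7776\, g_{3a}(\Lambda)\left(\frac{H}{\Lambda}\right)^6 + 2160\, g_{3b}(\Lambda)\left(\frac{H}{\Lambda}\right)^6 + \dots ,
\end{aligned} \tag{31}
$$
$$
\begin{aligned}
c_2(H,\Lambda) & \equiv H^3 \left(\frac{\partial N_\Lambda}{\partial \ddot H}\right)_H \\
& = -72\, g_{2a}(\Lambda)\left(\frac{H}{\Lambda}\right)^4 - 24\, g_{2b}(\Lambda)\left(\frac{H}{\Lambda}\right)^4 + 2592\, g_{3a}(\Lambda)\left(\frac{H}{\Lambda}\right)^6 + 720\, g_{3b}(\Lambda)\left(\frac{H}{\Lambda}\right)^6 + \dots ,
\end{aligned} \tag{32}
$$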

and so on, with the subscript H on partial derivatives meaning that after
taking the derivatives we set H(t) = H. Eq. (29) has an obvious solution of
the form
δH ∝ exp(ξHt) , (33)
where ξ is any root of the equation

$$
c_0(H,\Lambda) + c_1(H,\Lambda)\,\xi + c_2(H,\Lambda)\,\xi^2 + \dots = 0 . \tag{34}
$$

(This is a quadratic equation in the special case in which the integrand of the
action is $\sqrt{-{\rm Det}\,g}$ times an arbitrary function of the curvature tensor.) For
positive Re ξ, Eq. (33) represents an instability, and the number of e-foldings
before this instability ends the exponential expansion is ≈ 1/Re ξ.
We would generally expect the coefficients in Eq. (34) to be of the same
order, in which case typical solutions for ξ would be of order unity, and
inflation would either end almost immediately (if Re ξ > 0) or go on forever
(if Re ξ ≤ 0). But there are various circumstances under which we expect
ξ to be much smaller, giving a large number of e-foldings before the end of
inflation.12

1. If |c0 | is much less than all the other |cn |, then Eq. (34) will have a
solution ξ ≃ −c0 /c1 , and so much less than unity. In particular, if
we now choose Λ to be the optimal cutoff described in the previous
section, then we can use the condition (25) and Eq. (30) to write

$$
c_0(H,\Lambda) = B_\Lambda(H) . \tag{35}
$$
According to Eq. (27), BΛ (H) vanishes if the couplings are at their
fixed point, so we can conclude that it is possible to have a long but
not eternal period of inflation if the optimal Λ is large enough so that
the couplings gn (Λ) are not far from their fixed point. But there is a
limit to how close the couplings at the optimum cutoff can be to their
fixed point. At the fixed point, the quantities (21) and (25) are both
functions of the single parameter H/Λ, and it is not likely that these
two functions would vanish at the same value of this parameter.

2. If the couplings are not very near their fixed point, they are sensitive
to the free parameters of the theory that characterize the particular
trajectory in coupling-constant space on which the couplings lie, and
it is easy to choose these couplings to make |c0 | as small as we like.
For instance, where (4) applies, all the couplings are linear in the
normalization of the eigenvectors $u^N_n$, the only free parameters of the
theory. In a theory of chaotic inflation, the value of these parameters
in any big bang containing observers may be conditioned by the re-
quirement that c0 should be small enough (and have the right sign)
to allow the bang to become big. To be specific, in order for spatial
12
We are concentrating here on only one mode. In all cases Eq. (34) will have more
than one solution, and we are assuming that all modes other than the one (or several)
with Re ξ small and positive either have Re ξ ≤ 0 or for some reason are not excited.

curvature not to interfere with the formation of galaxies it is neces-
sary that the universe should expand enough during inflation so that
whatever curvature was present at the beginning of inflation would be
decreased enough so that the curvature term in the Friedmann equa-
tion should not dominate over the matter term when galaxies form.13
As is well known, the fact that spatial curvature does not dominate
at present requires about 60 to 70 e-foldings of inflation,14 and the
anthropic requirement that curvature does not interfere with galaxy
formation is almost as restrictive. But the combination of data from
the microwave background, baryon acoustic oscillations, and type Ia
supernovae distance–redshift relations has shown15 that (within two
standard deviations) the fractional curvature contribution $\Omega_K$ to $H_0^2$
is in the range of −0.0178 to +0.0066. It is hard to see any anthropic
reason for a number of e-foldings large enough to reduce the curvature
this much.

3. Instead of c0 being anomalously small, it is possible for some or all
of the other cn to be anomalously large, in which case again ξ will be
small and the number of e-foldings will be large. For instance, we note
that c0 unlike the other cn does not involve the couplings g2a and g2b , so
if these couplings are anomalously large, as in ref. 6, then c1 , c2 , etc.,
will be much larger than c0 , and again we will have |ξ| ≃ |c0 /c1 | ≪ 1.
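The estimate ξ ≃ −c0 /c1 used in the cases above can be checked numerically
for the quadratic truncation of Eq. (34). The following Python sketch uses
made-up coefficients (they are hypothetical, chosen only so that |c0| is much
smaller than |c1| and |c2|), and the function names are ours, not part of any
library.

```python
import cmath


def growth_exponents(c0, c1, c2):
    """Roots xi of c0 + c1*xi + c2*xi**2 = 0, the quadratic case of Eq. (34)."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2 * c0)
    return ((-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2))


def efoldings(xi):
    """Rough number of e-foldings, 1/Re(xi), before an unstable mode ends inflation."""
    return 1.0 / xi.real if xi.real > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical coefficients with |c0| << |c1|, |c2|.
    c0, c1, c2 = 0.05, -3.0, -1.0
    roots = growth_exponents(c0, c1, c2)
    slow = max(roots, key=lambda r: r.real)
    print("roots:", roots)
    print("slow root:", slow.real, "approximation -c0/c1:", -c0 / c1)
    print("e-foldings:", efoldings(slow))
```

For these illustrative numbers the slowly growing root agrees with −c0 /c1 to
better than one percent, and the other root is of order unity and negative, so
only the slow mode controls how long the exponential expansion lasts.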

V. An Example

We will now apply the above results to a classic example of higher deriva-
tive theories of gravitation, with action limited to terms with no more than
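four spacetime derivatives:
$$
I_\Lambda[g] = -\int d^4x\, \sqrt{-{\rm Det}\,g}\, \Big[ \Lambda^4 g_0(\Lambda) + \Lambda^2 g_1(\Lambda) R + g_{2a}(\Lambda) R^2 + g_{2b}(\Lambda) R^{\mu\nu}R_{\mu\nu} \Big] . \tag{36}
$$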

This theory was studied by Stelle16 as a possible renormalizable quantum


13
B. Freivogel, M, Kleban, M. R. Martinez, and L. Susskind, J. High Energy Phys.
0603, 039 (2006).
14
A. Guth, Phys. Rev. D23, 347 (1981).
15
E. Komatsu et al., Astrophys. J. Suppl. Ser. 180, 330 (2009).
16
K. S. Stelle, ref. 9.

theory of gravitation, and has been considered recently by Niedermaier17
and by Benedetti et al.18 in connection with asymptotic safety. As is well
known, it is possible by using the Gauss–Bonnet identity to put this action
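in the form used in refs. 17 and 18:
$$
I_\Lambda[g] = -\int d^4x\, \sqrt{-{\rm Det}\,g}\, \Big[ \Lambda^4 g_0(\Lambda) + \Lambda^2 g_1(\Lambda) R + f_a(\Lambda) R^2 + f_b(\Lambda) C^{\mu\nu\rho\sigma} R_{\mu\nu\rho\sigma} \Big] , \tag{37}
$$
where Cµνρσ is the Weyl tensor, and
$$
f_a = g_{2a} + \frac{g_{2b}}{3} , \qquad f_b = \frac{g_{2b}}{2} . \tag{38}
$$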
For this action, Eq. (21) gives the expansion rate for a de Sitter solution
of the field equations as
$$
H = \Lambda \sqrt{g_0(\Lambda)/6 g_1(\Lambda)} . \tag{39}
$$
Instead of trying to find an optimal value of Λ, which minimizes radiative
corrections to Eq. (39), here we will simply assume that Λ is large enough
so that the couplings gn (Λ) are near their fixed point gn∗ , and use Eq. (39)
to express Λ in terms of H:
$$
\Lambda = H \sqrt{6 g_{1*}/g_{0*}} , \tag{40}
$$
with H left undetermined.


The critical question for this sort of theory is whether the de Sitter
solution has an instability that ends the exponential expansion after a finite
but large number of e-foldings. As we have seen, for any small perturbation
of the de Sitter solution, ȧ/a is a sum of terms with the time dependence
exp(ξHt), with ξ running over the roots of Eq. (34). We are now considering
an action whose integrand is $\sqrt{-{\rm Det}\,g}$ times a scalar function of the
metric and the Riemann–Christoffel curvature tensor, so as remarked in the
previous section, this equation is quadratic:
$$
c_0 + c_1\, \xi + c_2\, \xi^2 = 0 . \tag{41}
$$
17
M. R. Niedermaier, ref. 6.
18
D. Benedetti, P. F. Machado, and F. Saueressig, ref. 7.

For the particular action (36), the coefficients are given by

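$$
c_0 = 12\, g_{1*}\, (H/\Lambda)^2 = 2 g_{0*} \tag{42}
$$
$$
c_1 = 3 c_2 = \left(-216\, g_{2a*} - 72\, g_{2b*}\right)(H/\Lambda)^4 = \left(-6\, g_{2a*} - 2\, g_{2b*}\right) g_{0*}^2/g_{1*}^2 , \tag{43}
$$
so Eq. (41) reads
$$
\xi^2 + 3\xi = A , \tag{44}
$$
where
$$
A = -\frac{c_0}{c_2} = \frac{3\, g_{1*}^2}{g_{0*}\,(3\, g_{2a*} + g_{2b*})} . \tag{45}
$$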
We get a realistic picture of inflation if it turns out that A is small and
positive. In this case Eq. (44) has a root with ξ ≃ −3, corresponding to a
perturbation to ȧ/a that decays as exp(−3Ht), and a root with ξ ≃ A/3,
corresponding to a slowly growing perturbation, that ends the exponential
phase after about 3/A e-foldings.
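The algebra behind Eqs. (42)–(45) and the two approximate roots can be laid
out in a few lines. The sketch below uses only Eq. (40), in the form
(H/Λ)² = g0∗ /6g1∗ , together with Eqs. (30)–(32) with the g3 couplings set
to zero.

```latex
\begin{aligned}
c_0 &= 12\, g_{1*} \left(\frac{H}{\Lambda}\right)^2 = 2\, g_{0*} , \qquad
c_2 = \frac{c_1}{3} = -\left(72\, g_{2a*} + 24\, g_{2b*}\right) \frac{g_{0*}^2}{36\, g_{1*}^2}
    = -\frac{2}{3}\left(3\, g_{2a*} + g_{2b*}\right) \frac{g_{0*}^2}{g_{1*}^2} , \\
A &= -\frac{c_0}{c_2} = \frac{3\, g_{1*}^2}{g_{0*}\left(3\, g_{2a*} + g_{2b*}\right)} , \qquad
\xi = \frac{-3 \pm \sqrt{9 + 4A}}{2} \simeq \frac{A}{3} \ \ \text{or} \ \ -3 - \frac{A}{3}
\quad (|A| \ll 1) .
\end{aligned}
```

The growing mode δH ∝ exp(AHt/3) takes a time of order 3/(AH) to become
appreciable, which is the origin of the estimate of 3/A e-foldings.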
Unfortunately, the numerical results obtained in ref. 17 and 18 are not
encouraging. The calculations of ref. 17 are expressed in terms of coupling
constants λ, gN , ω, and s, related to the couplings in Eq. (36) by

g0 = 2λ/gN , g1 = 1/gN
g2a = −(1 + ω)/3s , g2b = 1/s . (46)

Using a version of perturbation theory, ref. 17 found that for Λ → ∞ the
parameters ω, λ and gN approach the fixed point values
$$
\omega_* = -0.0228 , \qquad \lambda_* = 12.69 , \qquad g_{N*}/(4\pi)^2 = 0.4227 , \tag{47}
$$

while s(Λ) vanishes as

s(Λ) → 11.88/ ln(Λ/M ) , (48)

where M is some unknown large mass. Then Eq. (45) gives


$$
A = -\frac{3s}{2\,\omega \lambda\, g_N} \to \frac{0.92}{\ln(\Lambda/M)} , \tag{49}
$$
so A is positive, but Λ/M would have to be about $10^8$ to give 60 e-foldings
before inflation ends.
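The numbers in Eq. (49) can be reproduced with a few lines of Python. This
is only a sketch: it assumes that the two numerical values 12.69 and 0.4227
in Eq. (47) are the fixed-point values of λ and gN /(4π)² (only their product
enters A, so the result does not depend on which value belongs to which
coupling), and the function names are ours.

```python
import math

# Values quoted in Eqs. (47) and (48); the assignment of 12.69 and 0.4227 to
# lambda and g_N/(4 pi)^2 is an assumption, but only their product matters here.
OMEGA_STAR = -0.0228
LAMBDA_STAR = 12.69
G_N_STAR = 0.4227 * (4.0 * math.pi) ** 2
S_COEFF = 11.88  # s(Lambda) -> S_COEFF / ln(Lambda/M), Eq. (48)


def a_times_log():
    """Coefficient K in A -> K / ln(Lambda/M), from A = -3 s / (2 omega lambda g_N)."""
    return -3.0 * S_COEFF / (2.0 * OMEGA_STAR * LAMBDA_STAR * G_N_STAR)


def cutoff_ratio_for_efoldings(n_efold):
    """Lambda/M for which the estimate 3/A equals n_efold e-foldings."""
    log_ratio = a_times_log() * n_efold / 3.0
    return math.exp(log_ratio)


if __name__ == "__main__":
    print("K =", a_times_log())
    print("Lambda/M for 60 e-foldings =", cutoff_ratio_for_efoldings(60))
```

The coefficient comes out close to 0.92, and the cutoff ratio required for 60
e-foldings is of order $10^8$, in line with the statements above.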

In ref. 18, by using the truncated exact renormalization group equations,
a fixed point is found with (in our notation)

g0∗ = −0.0042 , g1∗ = −0.0101 , g2a∗ = −0.0109 , g2b∗ = 0.01 . (50)

Using these results in Eq. (45) gives A = 3.05. This is positive, but unfortunately
not at all small. The two roots of Eq. (44) are ξ = −3.80,
corresponding to a rapidly decaying mode, and ξ = 0.80, corresponding to
an instability that ends inflation after only a few e-foldings.
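
The two roots quoted here, and the small-A behavior described after Eq. (45), follow from the quadratic formula applied to Eq. (44). The Python sketch below is only an illustration; the function name `roots` is hypothetical.

```python
import math

def roots(a):
    """Roots of xi^2 + 3 xi = a, returned as (decaying root, growing root)."""
    disc = math.sqrt(9.0 + 4.0 * a)
    return (-3.0 - disc) / 2.0, (-3.0 + disc) / 2.0

# The value A = 3.05 quoted in the text.
xi_decay, xi_grow = roots(3.05)
print(round(xi_decay, 2), round(xi_grow, 2))

# For small positive A the roots approach -3 and A/3.
a_small = 0.05
xi_decay_small, xi_grow_small = roots(a_small)
print(xi_decay_small, xi_grow_small, a_small / 3)
```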

I am grateful for discussions with D. Benedetti, W. Fischler, E. Komatsu,
M. Niedermaier, and M. Reuter. This material is based in part on
work supported by the National Science Foundation under Grant No. PHY-0455649
and with support from The Robert A. Welch Foundation, Grant
No. F-0014.

UTTG-04-10

Six-dimensional Methods for Four-dimensional Conformal Field Theories

arXiv:1006.3480v2 [hep-th] 2 Aug 2010

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

The calculation of both spinor and tensor Green’s functions in four-dimensional
conformally invariant field theories can be greatly simplified by six-dimensional
methods. For this purpose, four-dimensional fields are constructed as projections
of fields on the hypercone in six-dimensional projective space, satisfying
certain transversality conditions. In this way some Green’s functions
in conformal field theories are shown to have structures more general than
those commonly found by use of the inversion operator. These methods fit
in well with the assumption of AdS/CFT duality. In particular, it is transparent
that if fields on $AdS_5$ approach finite limits on the boundary of $AdS_5$,
then in the conformal field theory on this boundary these limits transform
with conformal dimensionality zero if they are tensors (of any rank), but
with conformal dimension 1/2 if they are spinors or spinor-tensors.


∗ Electronic address: weinberg@physics.utexas.edu

I. INTRODUCTION

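Let’s first review some well-known fundamentals. The action of conformal
transformations in four spacetime dimensions on a general field $\psi^n(x)$
is given by its commutators with the generators $J^{\mu\nu}$ of Lorentz transformations,
$P^\mu$ of translations, $K^\mu$ of special conformal transformations, and $S$ of
dilatations:

$$ i\,[J^{\rho\sigma}, \psi^n(x)] = \left(x^\sigma\frac{\partial}{\partial x_\rho} - x^\rho\frac{\partial}{\partial x_\sigma}\right)\psi^n(x) - i\,(j^{\rho\sigma})^n{}_m\,\psi^m(x) , \tag{1} $$

$$ i\,[P^\rho, \psi^n(x)] = -\frac{\partial}{\partial x_\rho}\psi^n(x) , \tag{2} $$

$$ i\,[K^\rho, \psi^n(x)] = \left(2x^\rho x^\lambda\frac{\partial}{\partial x^\lambda} - x^2\frac{\partial}{\partial x_\rho}\right)\psi^n(x) - 2i\,x_\lambda\,(j^{\lambda\rho})^n{}_m\,\psi^m(x) + 2d\,x^\rho\,\psi^n(x) , \tag{3} $$

$$ i\,[S, \psi^n(x)] = \left(x^\lambda\frac{\partial}{\partial x^\lambda} + d\right)\psi^n(x) , \tag{4} $$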
where d is the conformal dimensionality of the field, and $j^{\rho\sigma}$ is the appropriate
matrix representation of the Lie algebra of the Lorentz group, satisfying
the commutation relations

$$ i\,[j^{\mu\nu}, j^{\rho\sigma}] = \eta^{\nu\rho}j^{\mu\sigma} - \eta^{\mu\rho}j^{\nu\sigma} - \eta^{\nu\sigma}j^{\mu\rho} + \eta^{\mu\sigma}j^{\nu\rho} . \tag{5} $$

We can work out the consequences of conformal symmetry for Green’s functions
of general fields by direct use of these commutation relations, but
this is complicated, especially for non-scalar fields, for which $j^{\rho\sigma} \neq 0$, and
for three-point and higher Green’s functions. A widely practiced alternative[1]
is to make use of invariance under a single action of the inversion
$x^\mu \mapsto -x^\mu/x^2$, but this is also complicated, and not necessarily valid. The
inversion is not an element of the connected part of the conformal group,
but only an outer automorphism, so that it is possible for the commutation
relations (1) through (4) to be satisfied without invariance under the inversion.
This makes no difference for two-point functions, or for some more
complicated Green’s functions involving only scalar fields, but in Section V
we will see examples of Green’s functions for spinor fields that are not invariant
under the inversion, even when the commutation relations (1) through
(4) are satisfied. (These comments do not apply if one acts with the inversion
an even number of times, but this gets complicated, and it is not what
is usually done in deriving the structure of Green’s functions.) Here we are
going to offer a different method for the calculation of Green’s functions in
four-dimensional conformal field theories, based on very elementary calculations
in six dimensions.¹ Though no dynamical assumptions are made here,
to achieve conformal invariance in four dimensions it is found necessary to
specify certain relations between the fields in four dimensions and in six
dimensions and to impose constraints on the six-dimensional fields, both of
which may prove useful in dynamical theories.
It is well known that the connected part of the conformal group in
four space-time dimensions forms the group SO(4, 2), which can be realized
as linear transformations in a six-dimensional projective space. This
six-dimensional space is a hypercone,

$$ \eta_{KL}\,X^K X^L = 0 , \tag{6} $$

where K, L, etc. run over the values 1, 2, 3, 0, 5, 6, and $\eta_{KL}$ is the metric of
the six-dimensional space, a diagonal matrix with non-zero elements

$$ \eta_{11} = \eta_{22} = \eta_{33} = \eta_{55} = +1 , \quad \eta_{00} = \eta_{66} = -1 . \tag{7} $$

It is a projective space, in the sense that $\lambda X^K$ is identified with $X^K$ for any
non-zero λ. The connection between six and four dimensions is provided by
the formula for the spacetime coordinates $x^\mu$,

$$ x^\mu = \frac{X^\mu}{X^5 + X^6} , \tag{8} $$

where as usual µ, ν, etc. run over the values 1, 2, 3, 0. The conformal group
consists of transformations

$$ X^K \mapsto \Lambda^K{}_L\,X^L , \quad \eta_{KL}\,\Lambda^K{}_M\,\Lambda^L{}_N = \eta_{MN} , \quad {\rm Det}\,\Lambda = 1 \tag{9} $$
¹ This work was done in preparing a course on quantum field theory given in Spring 2010.
Since the original version of this paper was posted on the hep-th archive, I have learned
of previous work in which dynamical equations are assumed for fields in six dimensions,
and then used to derive physical field equations in four dimensions. The literature on this
goes back to Dirac[2], where electromagnetic fields and free spinor fields were considered.
Among the first following Dirac to use this approach were Mack and Salam[3]. Other
early references are given in a historical review by Kastrup[4]. Extensive work has been
done on six dimensional field equations (including constraints on six-dimensional fields
found here) corresponding to realistic theories in four dimensions, by Bars[5]. Related
work was done by Ferrara, Grillo, and Gatto[6] for the case of symmetric tensors, and
extended to superconformal theories by Ferrara[7]. Of course, much work on the AdS/CFT
correspondence deals with related problems[8]. In contrast to all this previous work, the
aim of the present paper is the modest one of using six dimensional field theories to
derive only those properties of Green’s functions in four dimensions that follow solely
from conformal invariance, with no dynamical assumptions.

which generate the group of conformal transformations on the $x^\mu$ given by
Eq. (8). The generators $J^{KL} = -J^{LK}$ of these transformations satisfy the
commutation relations

$$ i\,[J^{KL}, J^{MN}] = \eta^{LM}J^{KN} - \eta^{KM}J^{LN} - \eta^{LN}J^{KM} + \eta^{KN}J^{LM} , \tag{10} $$

with the generators of translations, special conformal transformations, and
dilatations identified as

$$ P^\mu = J^{5\mu} + J^{6\mu} , \quad K^\mu = -J^{5\mu} + J^{6\mu} , \quad S = J^{65} . \tag{11} $$

The inversion operation $x^\mu \mapsto -x^\mu/x^2$ is simply the reflection that changes
the sign of $X^6$, leaving all other $X^K$ unchanged. It violates the condition
Det Λ = +1, and hence belongs to O(4, 2) but not to SO(4, 2).
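
The statement that the inversion is the reflection $X^6 \mapsto -X^6$ can be checked numerically from Eqs. (6) through (8). The Python sketch below is an illustration only. The helper names `lift`, `project`, and `cone` are hypothetical, and the choice of overall scale $X^5 + X^6$ in `lift` is arbitrary, since the space is projective.

```python
def lift(x, scale=1.0):
    """Point on the hypercone (6) that projects to x = (x1, x2, x3, x0),
    normalized so that X^5 + X^6 = scale (an arbitrary choice)."""
    x_sq = x[0] ** 2 + x[1] ** 2 + x[2] ** 2 - x[3] ** 2    # x^mu x_mu
    plus, minus = scale, -x_sq * scale                      # X^5 + X^6 and X^5 - X^6
    return [c * scale for c in x] + [(plus + minus) / 2, (plus - minus) / 2]

def project(X):
    """Eq. (8): x^mu = X^mu / (X^5 + X^6)."""
    return [c / (X[4] + X[5]) for c in X[:4]]

def cone(X):
    """Left-hand side of Eq. (6) with the metric (7), index order 1, 2, 3, 0, 5, 6."""
    return X[0] ** 2 + X[1] ** 2 + X[2] ** 2 - X[3] ** 2 + X[4] ** 2 - X[5] ** 2

x = [0.3, -1.2, 0.5, 0.4]
X = lift(x, scale=2.5)
X_reflected = X[:5] + [-X[5]]                 # change the sign of X^6 only

x_sq = x[0] ** 2 + x[1] ** 2 + x[2] ** 2 - x[3] ** 2
inverted = [-c / x_sq for c in x]             # x^mu -> -x^mu / x^2

print(cone(X), cone(X_reflected))             # both vanish up to rounding
print(project(X_reflected))
print(inverted)
```

The reflected point stays on the hypercone, and its projection agrees with $-x^\mu/x^2$ for any choice of `scale`.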
Because of the simplicity of the conformal transformation rule (9), it
is very easy to work out the consequences of conformal invariance for the
Green’s functions of fields in the six dimensional projective space. We can
tell by inspection whether a Green’s function of fields in six dimensions
is SO(4, 2)-invariant, in much the same way that we can tell at a glance
whether a Green’s function in four spacetime dimensions is Lorentz invari-
ant. The question, then, is how can we convert information about six-
dimensional Green’s functions into information about the Green’s functions
of fields in four-dimensional spacetime? Fields in six dimensions of course
have more components than the corresponding fields in four dimensions; for
instance, a six-dimensional tensor of rank r has $6^r$ components, rather than
the $4^r$ components in four dimensions, and a spinor field in six dimensions
has eight rather than four components. In order to construct suitable four-
dimensional fields from fields in six dimensions, we need both to impose
constraints on the fields in six dimensions, and write the four-dimensional
fields as suitable projections of the six-dimensional fields.
We show how to do this for tensor fields in Section II. In Section III we
apply these methods to derive the structure of various Green’s functions of
tensor fields in four-dimensions. Section IV deals with spinor fields, and in
Section V we find some new results for spinor Green’s functions.
Although the methods of this paper described in Sections II through V do
not in any way depend on assumptions about holography, they were in fact
inspired by AdS/CFT duality[9], especially as explained by Witten[10]. The
six-dimensional methods introduced here are applied to AdS/CFT duality
in Section VI, and used to find the conformal dimensionality d of fields
in four-dimensional conformal field theories that arise from fields in five-dimensional
anti-de Sitter space that approach finite limits on the boundary
of the space. For general tensors, it has the familiar value d = 0, but for
spinor or spinor-tensor fields it is d = 1/2.

II. TENSOR FIELDS

A tensor field T K1 K2 ...Kr (X) of rank r in six dimensions has the conformal
transformation rule

$$ T^{K_1\dots K_r}(X) \mapsto \Lambda_{L_1}{}^{K_1}\cdots\Lambda_{L_r}{}^{K_r}\;T^{L_1\dots L_r}(\Lambda X) , \tag{12} $$

with Λ satisfying Eq. (9). (Indices K, L, etc. are lowered and raised with
ηKL and its inverse η KL .) For infinitesimal SO(4, 2) transformations, this
can be expressed as formulas for the commutators of T K1K2 ...Kr with the
generators J KL of these transformations:
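$$ i\,[J^{MN}, T^{K_1\dots K_r}(X)] = \left(X^N\frac{\partial}{\partial X_M} - X^M\frac{\partial}{\partial X_N}\right)T^{K_1\dots K_r}(X) - i\,(J^{MN})^{K_1\dots K_r}{}_{L_1\dots L_r}\,T^{L_1\dots L_r}(X) , \tag{13} $$

where $J^{MN}$ is the tensor representation of the SO(4, 2) algebra:

$$ i\,(J^{MN})^{K_1\dots K_r}{}_{L_1\dots L_r} = \left(\eta^{MK_1}\delta^N_{L_1} - \eta^{NK_1}\delta^M_{L_1}\right)\delta^{K_2}_{L_2}\cdots\delta^{K_r}_{L_r} + \dots + \left(\eta^{MK_r}\delta^N_{L_r} - \eta^{NK_r}\delta^M_{L_r}\right)\delta^{K_1}_{L_1}\cdots\delta^{K_{r-1}}_{L_{r-1}} . \tag{14} $$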

Because we identify $X^K$ with $\lambda X^K$, $T^{K_1\dots K_r}(X)$ must satisfy a scaling relation

$$ T^{K_1\dots K_r}(\lambda X) = \lambda^{-d}\,T^{K_1\dots K_r}(X) , \tag{15} $$

where for the present d is just some unknown number. For reasons that will
become clear, we also require that the hypercone condition (6) must not be
affected by any of the differential operators

$$ T^{K_1\dots K_r}(X)\,\frac{\partial}{\partial X^{K_1}} , \;\cdots , \; T^{K_1\dots K_r}(X)\,\frac{\partial}{\partial X^{K_r}} , $$

so that $T^{K_1\dots K_r}(X)$ must be transverse on each index

$$ X_{K_1}T^{K_1\dots K_r}(X) = 0 , \;\cdots , \; X_{K_r}T^{K_1\dots K_r}(X) = 0 . \tag{16} $$

Now, consider the four-dimensional field

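$$ t^{\mu_1\dots\mu_r}(x) \equiv (X^5 + X^6)^d\;e^{\mu_1}_{K_1}(x)\cdots e^{\mu_r}_{K_r}(x)\;T^{K_1\dots K_r}(X) , \tag{17} $$

with

$$ e^\mu_\nu(x) \equiv \delta^\mu_\nu , \quad e^\mu_5(x) \equiv e^\mu_6(x) \equiv -x^\mu . \tag{18} $$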
Because of the scaling condition (15), the field (17) is only a function of
the ratios of the X K , so that when we eliminate X 5 − X 6 by imposing the
hypercone condition (6), the field (17) can indeed be regarded as a function
only of the spacetime coordinate xµ given by Eq. (8).
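It is straightforward though tedious to use Eqs. (6), (8), (11), (13),
(15), and (16) to show directly that the four-dimensional tensor field given
by Eqs. (17) and (18) does satisfy the conformal transformation rules (1)
through (4), with $(j^{\rho\sigma})^{\mu_1\dots\mu_r}{}_{\nu_1\dots\nu_r}$ here given by the tensor representation of the
Lorentz group:

$$ i\,(j^{\rho\sigma})^{\mu_1\dots\mu_r}{}_{\nu_1\dots\nu_r} = \left(\eta^{\rho\mu_1}\delta^\sigma_{\nu_1} - \eta^{\sigma\mu_1}\delta^\rho_{\nu_1}\right)\delta^{\mu_2}_{\nu_2}\cdots\delta^{\mu_r}_{\nu_r} + \dots + \left(\eta^{\rho\mu_r}\delta^\sigma_{\nu_r} - \eta^{\sigma\mu_r}\delta^\rho_{\nu_r}\right)\delta^{\mu_1}_{\nu_1}\cdots\delta^{\mu_{r-1}}_{\nu_{r-1}} . \tag{19} $$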

In this paper we will instead show this by a less direct but more illuminating
method.
It is shown in the Appendix that the usual conformal transformation
rules of tensor fields just amount to the statement that under general con-
formal transformations a tensor of rank r and conformal dimensionality d
transforms as a tensor density of weight

w = −(d + r)/4 . (20)

So this is the condition that must be satisfied by the field (17). To show
that this condition is satisfied, we note by differentiating Eq. (8) that

$$ \frac{\partial x^\mu(X)}{\partial X^K} = (X^5 + X^6)^{-1}\,e^\mu_K(x) , $$

so that the field (17) can be written

$$ t^{\mu_1\dots\mu_r}(x) \equiv (X^5 + X^6)^{d+r}\;\frac{\partial x^{\mu_1}(X)}{\partial X^{K_1}}\cdots\frac{\partial x^{\mu_r}(X)}{\partial X^{K_r}}\;T^{K_1\dots K_r}(X) . $$

Hence, under a coordinate transformation $X^K \mapsto X'^K = \Lambda^K{}_L X^L$, we have

$$ t^{\mu_1\dots\mu_r}(x) \mapsto (X'^5 + X'^6)^{d+r}\;\frac{\partial x^{\mu_1}(X')}{\partial X^{K_1}}\cdots\frac{\partial x^{\mu_r}(X')}{\partial X^{K_r}}\;\Lambda_{L_1}{}^{K_1}\cdots\Lambda_{L_r}{}^{K_r}\;T^{L_1\dots L_r}(X') . $$

Now, for any displacement dX on the hypercone (6), we have

$$ \frac{\partial x^\mu(X')}{\partial X'^L}\,dX'^L = \frac{\partial x^\mu(X')}{\partial X^K}\,dX^K = \frac{\partial x^\mu(X')}{\partial X^K}\,\Lambda_L{}^K\,dX'^L . $$

But this is only for $dX'^L$ on the hypercone, i.e. for $X'_L\,dX'^L = 0$, so

$$ \frac{\partial x^\mu(X')}{\partial X'^L} - \frac{\partial x^\mu(X')}{\partial X^K}\,\Lambda_L{}^K \propto X'_L . $$

Under the transversality condition (16) the term proportional to $X'_L$ makes
no contribution, so we see that
$\frac{\partial x^{\mu_1}(X)}{\partial X^{K_1}}\cdots\frac{\partial x^{\mu_r}(X)}{\partial X^{K_r}}\,T^{K_1\dots K_r}(X)$ transforms
as a tensor under general conformal transformations. Furthermore, it is
straightforward to show that under general conformal transformations $x \mapsto x'$,
the quantity $X^5 + X^6$ transforms as a scalar density of weight −1/4:

$$ \frac{X'^5 + X'^6}{X^5 + X^6} = \left|\frac{\partial x'}{\partial x}\right|^{-1/4} . $$

Hence tµ1 ...µr (x) does indeed transform under general conformal transfor-
mations as a tensor density of weight given by Eq. (20), the condition for
conformal invariance.
It may be noted that $e^\mu_K(x)X^K = 0$, so $t^{\mu_1\dots\mu_r}(x)$ is unchanged if we shift
$T^{K_1\dots K_r}(X)$ by an amount proportional to any of $X^{K_1}$ or $X^{K_2}$ etc. This
lowers the number of physically relevant components of $T^{K_1\dots K_r}(X)$ from
$6^r$ to $5^r$, and the transversality conditions (16) lower it further to $4^r$, the
appropriate number for a four-dimensional tensor of rank r.
It may also be noted, as a consequence of Eq. (16), that traces of the four-
dimensional tensor tµ1 ...µr (x) are proportional to the corresponding traces
of the six-dimensional tensor T K1 ...Kr (X). For instance,

$$ \eta_{\mu_1\mu_2}\,t^{\mu_1\mu_2\dots\mu_r}(x) = (X^5 + X^6)^d\;e^{\mu_3}_{K_3}(x)\,e^{\mu_4}_{K_4}(x)\cdots\eta_{K_1K_2}\,T^{K_1K_2\dots K_r}(X) . $$

In particular, the condition of being traceless carries over from a six-dimensional
tensor $T^{K_1\dots K_r}(X)$ to the corresponding four-dimensional tensor $t^{\mu_1\dots\mu_r}(x)$.
The same is obviously also true for conditions of symmetry or antisymme-
try. Hence six-dimensional tensors belonging to irreducible representations
of SO(4, 2) yield four-dimensional tensors belonging to the corresponding
irreducible representations of SO(3, 1).

III. TENSOR APPLICATIONS

We will first apply the method described in the previous section to a few
familiar simple examples, and then turn to more complicated applications.

A. Scalar Fields

First, consider the Green’s function $\langle\varphi_1(x)\varphi_2(y)\rangle_0$ for a pair of scalar
fields $\varphi_1(x)$ and $\varphi_2(y)$ of conformal dimensionality $d_1$ and $d_2$, with x − y
spacelike. According to the scaling condition (15), the Green’s function for
the corresponding six-dimensional fields $\Phi_1(X)$ and $\Phi_2(Y)$ must be of order
$-d_1$ in X and $-d_2$ in Y, but it can only depend on the scalar X · Y, so there
must be an equal number of factors of X and Y, and therefore $d_1 = d_2 \equiv d$.
As is well known, this is the one thing beyond scale invariance that we learn
in this case from conformal symmetry. To check that the Green’s function
in four dimensions has the familiar form dictated by Poincaré and scale
invariance, we note by using Eq. (8) that the scalar here is

$$ \begin{aligned} X\cdot Y &= X_\mu Y^\mu + \frac{1}{2}(X^5 + X^6)(Y^5 - Y^6) + \frac{1}{2}(X^5 - X^6)(Y^5 + Y^6) \\ &= (X^5 + X^6)(Y^5 + Y^6)\left(x\cdot y - \frac{x^2}{2} - \frac{y^2}{2}\right) = -\frac{1}{2}(X^5 + X^6)(Y^5 + Y^6)(x - y)^2 , \end{aligned} \tag{21} $$

so the six dimensional Green’s function is proportional to

$$ (X\cdot Y)^{-d} = \left(-\frac{1}{2}(X^5 + X^6)(Y^5 + Y^6)(x - y)^2\right)^{-d} . $$

But according to Eq. (17), the four-dimensional scalars are related to the
six-dimensional scalars by

$$ \varphi_1(x) = (X^5 + X^6)^d\,\Phi_1(X) , \quad \varphi_2(y) = (Y^5 + Y^6)^d\,\Phi_2(Y) , \tag{22} $$

so the factors $X^5 + X^6$ and $Y^5 + Y^6$ cancel in the four-dimensional Green’s
function, which we see is proportional to $[(x - y)^2]^{-d}$, the well-known result
of Poincaré and scale invariance.
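
Eq. (21) is easy to confirm numerically. The Python sketch below is only an illustration: the helper names are hypothetical, and the scales `sx` and `sy` stand for the arbitrary values of $X^5 + X^6$ and $Y^5 + Y^6$.

```python
def lift(x, scale):
    """Point on the hypercone (6) projecting to x = (x1, x2, x3, x0), with X^5 + X^6 = scale."""
    x_sq = x[0] ** 2 + x[1] ** 2 + x[2] ** 2 - x[3] ** 2
    plus, minus = scale, -x_sq * scale
    return [c * scale for c in x] + [(plus + minus) / 2, (plus - minus) / 2]

def dot6(X, Y):
    """Six-dimensional scalar product with the metric (7), index order 1, 2, 3, 0, 5, 6."""
    return (X[0] * Y[0] + X[1] * Y[1] + X[2] * Y[2] - X[3] * Y[3]
            + X[4] * Y[4] - X[5] * Y[5])

def interval(x, y):
    """(x - y)^2 with the four-dimensional metric (+, +, +, -)."""
    r = [a - b for a, b in zip(x, y)]
    return r[0] ** 2 + r[1] ** 2 + r[2] ** 2 - r[3] ** 2

x, y = [0.3, -1.2, 0.5, 0.4], [1.1, 0.2, -0.7, -0.1]
sx, sy = 2.5, 0.8

lhs = dot6(lift(x, sx), lift(y, sy))
rhs = -0.5 * sx * sy * interval(x, y)
print(lhs, rhs)

# Dividing out the scales leaves a function of x and y alone.
other = dot6(lift(x, 0.3), lift(y, 7.0)) / (0.3 * 7.0)
print(lhs / (sx * sy), other)
```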
It is almost as easy to deal with the three-point function $\langle\varphi_1(x)\varphi_2(y)\varphi_3(z)\rangle_0$.
According to the scaling condition (15), the corresponding six-dimensional
three-point function for $\Phi_1(X)$, $\Phi_2(Y)$, and $\Phi_3(Z)$ must be of order $-d_1$ in
X, $-d_2$ in Y, and $-d_3$ in Z, so it must be proportional to

$$ (X\cdot Y)^{-a}\,(Y\cdot Z)^{-b}\,(Z\cdot X)^{-c} , $$

where $a + c = d_1$, $a + b = d_2$, and $b + c = d_3$, and thus must be proportional
to

$$ \begin{aligned} &(X\cdot Y)^{(d_3-d_1-d_2)/2}\,(Y\cdot Z)^{(d_1-d_2-d_3)/2}\,(Z\cdot X)^{(d_2-d_1-d_3)/2} \\ &\quad\propto (X^5 + X^6)^{-d_1}(Y^5 + Y^6)^{-d_2}(Z^5 + Z^6)^{-d_3} \\ &\qquad\times ((x-y)^2)^{(d_3-d_1-d_2)/2}\,((y-z)^2)^{(d_1-d_2-d_3)/2}\,((z-x)^2)^{(d_2-d_1-d_3)/2} . \end{aligned} $$

The factors $(X^5 + X^6)^{-d_1}$, $(Y^5 + Y^6)^{-d_2}$, and $(Z^5 + Z^6)^{-d_3}$ are canceled by
similar factors in the relation (22) between the ϕs and Φs, leaving us with
a three-point function $\langle\varphi_1(x)\varphi_2(y)\varphi_3(z)\rangle_0$ proportional to

$$ ((x-y)^2)^{(d_3-d_1-d_2)/2}\,((y-z)^2)^{(d_1-d_2-d_3)/2}\,((z-x)^2)^{(d_2-d_1-d_3)/2} , \tag{23} $$

another known result.
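
The exponents in Eq. (23) come from solving the three linear conditions on a, b, and c. A minimal Python sketch of that step follows; the function name is hypothetical, and exact rational arithmetic is used so that the check is free of rounding.

```python
from fractions import Fraction

def three_point_exponents(d1, d2, d3):
    """Solve a + c = d1, a + b = d2, b + c = d3 for (a, b, c).

    The six-dimensional three-point function is then proportional to
    (X.Y)^(-a) (Y.Z)^(-b) (Z.X)^(-c).
    """
    d1, d2, d3 = Fraction(d1), Fraction(d2), Fraction(d3)
    a = (d1 + d2 - d3) / 2
    b = (d2 + d3 - d1) / 2
    c = (d1 + d3 - d2) / 2
    return a, b, c

a, b, c = three_point_exponents(1, 2, 3)
print(a, b, c)

# For three equal dimensions d, each exponent is d/2.
print(three_point_exponents(3, 3, 3))
```

The exponent of $(x-y)^2$ in Eq. (23) is −a, that of $(y-z)^2$ is −b, and that of $(z-x)^2$ is −c.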

B. Vector Fields

We next turn to vector fields. The two-point function of the six-vector
fields $V_1^K(X)$ and $V_2^L(Y)$ must be a linear combination of the two tensors
that vanish when contracted with either $X_K$ or $Y_L$:

$$ \eta^{KL} - \frac{Y^K X^L}{X\cdot Y} , \qquad X^K Y^L , $$

with coefficients that are functions only of X · Y. Because $X^K e^\mu_K(x) = 0$,
the second of these makes no contribution to the four-dimensional Green’s
function, and can be ignored. Each term in the first transverse tensor contains
zero net factors of X and Y, while the scaling condition (15) requires
that the two-point function be of order $-d_1$ in X and of order $-d_2$ in Y,
so we see again that the two-point function vanishes unless $d_1 = d_2 \equiv d$, in
which case it is proportional to

$$ (X\cdot Y)^{-d}\left(\eta^{KL} - \frac{Y^K X^L}{X\cdot Y}\right) , $$

with a constant coefficient. Using Eq. (17), we see that the four-dimensional
Green’s function $\langle v^\mu(x)v^\nu(y)\rangle_0$ is proportional to

$$ (X^5 + X^6)^d\,(Y^5 + Y^6)^d\,(X\cdot Y)^{-d}\;e^\mu_K(x)\,e^\nu_L(y)\left(\eta^{KL} - \frac{Y^K X^L}{X\cdot Y}\right) . $$

Now, we note that

$$ e^\mu_K(x)\,e^\nu_L(y)\,\eta^{KL} = \eta^{\mu\nu} , \tag{24} $$

and

$$ Y^K e^\mu_K(x) = Y^\mu - x^\mu(Y^5 + Y^6) = (Y^5 + Y^6)(y^\mu - x^\mu) \tag{25} $$

and likewise $X^K e^\mu_K(y) = (X^5 + X^6)(x^\mu - y^\mu)$. Eq. (21) then shows that the
factors $(X^5 + X^6)$ and $(Y^5 + Y^6)$ all cancel, leaving us with the result that
$\langle v^\mu(x)v^\nu(y)\rangle_0$ is proportional to

$$ ((x-y)^2)^{-d}\left(\eta^{\mu\nu} - 2\,\frac{(x-y)^\mu (x-y)^\nu}{(x-y)^2}\right) . \tag{26} $$

Here the conformal dimensionality d is arbitrary, but if we now impose the
further condition that these vectors are conserved currents, we find that d
must have the canonical value d = 3.
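
The value d = 3 can be checked by differentiating Eq. (26) numerically. The Python sketch below uses central differences at one spacelike separation. It is an illustration under stated assumptions, not the original argument: the function names and the sample point are arbitrary choices, and the comparison value for d = 2 uses the divergence $(2d-6)\,(r^2)^{-d-1}\,r^\nu$ obtained by differentiating Eq. (26) by hand.

```python
ETA = [1.0, 1.0, 1.0, -1.0]     # diagonal metric, index order (1, 2, 3, 0)

def two_point(r, d):
    """Eq. (26) as a 4 x 4 list: (r^2)^(-d) (eta^{mu nu} - 2 r^mu r^nu / r^2)."""
    r2 = sum(ETA[m] * r[m] ** 2 for m in range(4))
    return [[r2 ** (-d) * ((ETA[m] if m == n else 0.0) - 2.0 * r[m] * r[n] / r2)
             for n in range(4)] for m in range(4)]

def divergence(r, d, h=1e-5):
    """Central-difference estimate of d/dr^mu of the (mu, nu) component, for each nu."""
    div = [0.0] * 4
    for m in range(4):
        r_plus, r_minus = list(r), list(r)
        r_plus[m] += h
        r_minus[m] -= h
        g_plus, g_minus = two_point(r_plus, d), two_point(r_minus, d)
        for n in range(4):
            div[n] += (g_plus[m][n] - g_minus[m][n]) / (2 * h)
    return div

r = [1.0, 0.5, -0.3, 0.2]        # spacelike separation, r^2 = 1.30
print(divergence(r, 3))          # vanishes up to discretization error
print(divergence(r, 2))          # does not vanish
```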

C. Symmetric Second-Rank Tensor Fields

The two-point function of two symmetric six-tensors $T_1^{KL}(X)$ and $T_2^{MN}(Y)$
is required by SO(4, 2) invariance and the transversality condition (16) to
be a linear combination of the transverse tensors

$$ \left(\eta^{KM} - \frac{Y^K X^M}{X\cdot Y}\right)\left(\eta^{LN} - \frac{Y^L X^N}{X\cdot Y}\right) + \left(\eta^{LM} - \frac{Y^L X^M}{X\cdot Y}\right)\left(\eta^{KN} - \frac{Y^K X^N}{X\cdot Y}\right) , $$

$$ \left(\eta^{KL} - \frac{Y^K X^L + Y^L X^K}{X\cdot Y}\right)\left(\eta^{MN} - \frac{X^M Y^N + Y^M X^N}{X\cdot Y}\right) , $$

and

$$ X^K X^L Y^M Y^N , $$

in all three cases with coefficients that are functions only of the scalar X · Y.
Each term in these three tensors (including their coefficients) has equal
numbers of factors of X and Y, while the scaling condition (15) requires
the number of factors of X and Y to equal $-d_1$ and $-d_2$, respectively,
so we must have $d_1 = d_2 \equiv d$, just as for scalars and vectors. Because
$X^K e^\mu_K(x) = Y^M e^\mu_M(y) = 0$, the third of these tensors makes no contribution
to the four-dimensional two-point function, and will therefore be ignored.
So the six-dimensional Green’s function must be a linear combination of the
first two tensors, with coefficients proportional to $(X\cdot Y)^{-d}$. Using Eqs. (21),
(24) and (25), the two-point function of the four-dimensional tensors defined
by Eq. (17) is then

$$ \begin{aligned} \langle t^{\mu\nu}(x)\,t^{\rho\sigma}(y)\rangle_0 = {} & A\,[r^2]^{-d}\left[\eta^{\mu\rho}\eta^{\nu\sigma} + \eta^{\mu\sigma}\eta^{\nu\rho} - 2\,\frac{r^\rho r^\nu\eta^{\mu\sigma} + r^\rho r^\mu\eta^{\nu\sigma} + r^\sigma r^\nu\eta^{\mu\rho} + r^\sigma r^\mu\eta^{\nu\rho}}{r^2} + 8\,\frac{r^\rho r^\sigma r^\mu r^\nu}{(r^2)^2}\right] \\ & + B\,[r^2]^{-d}\,\eta^{\mu\nu}\eta^{\rho\sigma} , \end{aligned} \tag{27} $$

where r ≡ x − y, and A and B are constants.
So far, d like A and B is an arbitrary number, but all these constants
become tightly constrained if we require that the tensor is conserved. Operating
on Eq. (27) with $\partial/\partial x^\mu$ gives a quantity proportional to

$$ A\,(2d - 8)\,(r^\rho\eta^{\sigma\nu} + r^\sigma\eta^{\rho\nu}) - (4A + 2dB)\,r^\nu\eta^{\rho\sigma} + A\,(32 - 8d)\,\frac{r^\rho r^\sigma r^\nu}{r^2} , $$

so the conservation condition tells us that d = 4 and A = −2B. These
are just the properties we expect for the energy-momentum tensor in a
conformally invariant theory: its canonical dimension is d = 4, while the
condition A = −2B tells us that the tensor is traceless.
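
Both conclusions (conservation at d = 4 with A = −2B, and tracelessness) can be checked directly on Eq. (27) without using the intermediate divergence formula. The Python sketch below evaluates the two-point function at one spacelike separation and differentiates it by central differences. The function names and the sample point are illustrative assumptions.

```python
ETA = [1.0, 1.0, 1.0, -1.0]      # diagonal metric, index order (1, 2, 3, 0)

def eta(m, n):
    return ETA[m] if m == n else 0.0

def two_point(r, d, A, B, m, n, p, s):
    """Component (mu, nu, rho, sigma) = (m, n, p, s) of Eq. (27) at separation r^mu."""
    r2 = sum(ETA[k] * r[k] ** 2 for k in range(4))
    a_part = (eta(m, p) * eta(n, s) + eta(m, s) * eta(n, p)
              - 2.0 * (r[p] * r[n] * eta(m, s) + r[p] * r[m] * eta(n, s)
                       + r[s] * r[n] * eta(m, p) + r[s] * r[m] * eta(n, p)) / r2
              + 8.0 * r[p] * r[s] * r[m] * r[n] / r2 ** 2)
    return r2 ** (-d) * (A * a_part + B * eta(m, n) * eta(p, s))

def max_divergence(r, d, A, B, h=1e-5):
    """Largest |d/dr^mu of Eq. (27)| over nu, rho, sigma, by central differences."""
    worst = 0.0
    for n in range(4):
        for p in range(4):
            for s in range(4):
                total = 0.0
                for m in range(4):
                    r_plus, r_minus = list(r), list(r)
                    r_plus[m] += h
                    r_minus[m] -= h
                    total += (two_point(r_plus, d, A, B, m, n, p, s)
                              - two_point(r_minus, d, A, B, m, n, p, s)) / (2 * h)
                worst = max(worst, abs(total))
    return worst

def max_trace(r, d, A, B):
    """Largest |eta_{mu nu} <t^{mu nu} t^{rho sigma}>| over rho, sigma."""
    return max(abs(sum(ETA[m] * two_point(r, d, A, B, m, m, p, s) for m in range(4)))
               for p in range(4) for s in range(4))

r = [1.0, 0.5, -0.3, 0.2]        # spacelike separation
print(max_divergence(r, 4, 1.0, -0.5), max_trace(r, 4, 1.0, -0.5))   # both ~ 0
print(max_divergence(r, 4, 1.0, 0.0), max_divergence(r, 3, 1.0, -0.5))
```

The first line of output should be zero up to discretization and rounding error, while the second line shows that the divergence does not vanish once either condition is dropped.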

IV. SPINOR FIELDS

We now consider how to convert information about the Green’s functions
of spinor fields on the hypercone in six-dimensional projective space
into information about the Green’s functions of spinors in four-dimensional
spacetime. Let’s first recall some well-known facts about spinors in six dimensions.
The Clifford algebra for SO(4, 2) has a $2^{6/2} = 8$-dimensional irreducible
representation:

$$ \Gamma^\mu = \begin{pmatrix} 0 & i\gamma_5\gamma^\mu \\ i\gamma_5\gamma^\mu & 0 \end{pmatrix} , \quad \Gamma^5 = \begin{pmatrix} 0 & \gamma_5 \\ \gamma_5 & 0 \end{pmatrix} , \quad \Gamma^6 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \tag{28} $$

which obeys the anticommutation relations

$$ \{\Gamma^K, \Gamma^L\} = 2\eta^{KL} . \tag{29} $$

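(Here $\gamma^\mu$ is the usual 4 × 4 Dirac matrix,² and $\gamma_5 \equiv -i\gamma^0\gamma^1\gamma^2\gamma^3$.) From
these matrices, we can construct the 8-component Dirac representation of
the SO(4, 2) Lie algebra

$$ J^{KL} = -\frac{i}{4}\,[\Gamma^K, \Gamma^L] \tag{30} $$

for which

$$ i\,[J^{KL}, \Gamma^M] = \Gamma^K\eta^{LM} - \Gamma^L\eta^{KM} , \tag{31} $$

and so

$$ i\,[J^{KL}, J^{MN}] = \eta^{LM}J^{KN} - \eta^{KM}J^{LN} - \eta^{LN}J^{KM} + \eta^{KN}J^{LM} . \tag{32} $$

Explicitly,

$$ J^{\mu\nu} = \begin{pmatrix} j^{\mu\nu} & 0 \\ 0 & j^{\mu\nu} \end{pmatrix} , \quad J^{5\mu} = \frac{1}{2}\begin{pmatrix} \gamma^\mu & 0 \\ 0 & \gamma^\mu \end{pmatrix} , \quad J^{6\mu} = \frac{1}{2}\begin{pmatrix} \gamma_5\gamma^\mu & 0 \\ 0 & -\gamma_5\gamma^\mu \end{pmatrix} , \quad J^{56} = \frac{i}{2}\begin{pmatrix} \gamma_5 & 0 \\ 0 & -\gamma_5 \end{pmatrix} , \tag{33} $$

where $j^{\mu\nu}$ is the Dirac representation of the Lorentz group Lie algebra:

$$ j^{\mu\nu} = -\frac{i}{4}\,[\gamma^\mu, \gamma^\nu] . \tag{34} $$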
The block diagonal form of the matrices (33) indicates that this representa-
tion of the Lie algebra of SO(4, 2) is reducible, the top and bottom blocks
furnishing the two different irreducible four-component spinor representa-
tions of the Lie algebra of SO(4, 2).
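
Relations (29) and (32), and the block-diagonal form (33), can be verified by brute force with explicit matrices. The Python sketch below is an illustration, not part of the original text. It assumes one particular representation of the Dirac matrices satisfying $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$; the text fixes conventions only by reference to [11], and any representation with this property would serve. All helper names are hypothetical.

```python
def mm(a, b):
    """Matrix product of two square matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def comb(*terms):
    """Linear combination sum_i c_i M_i, given pairs (c_i, M_i)."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)] for i in range(n)]

def blocks(a, b, c, d):
    """Assemble the block matrix [[a, b], [c, d]]."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zero(n):
    return [[0] * n for _ in range(n)]

def dist(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Assumed Dirac matrices with {gamma^mu, gamma^nu} = 2 eta^{mu nu}, eta = diag(+, +, +, -).
pauli = {1: [[0, 1], [1, 0]], 2: [[0, -1j], [1j, 0]], 3: [[1, 0], [0, -1]]}
gamma = {i: comb((-1j, blocks(zero(2), s, comb((-1, s)), zero(2)))) for i, s in pauli.items()}
gamma[0] = comb((-1j, blocks(zero(2), eye(2), eye(2), zero(2))))
gamma5 = comb((-1j, mm(mm(gamma[0], gamma[1]), mm(gamma[2], gamma[3]))))

# Eq. (28): the 8 x 8 matrices Gamma^K, with K = 1, 2, 3, 0, 5, 6.
ETA = {1: 1, 2: 1, 3: 1, 0: -1, 5: 1, 6: -1}
Gam = {mu: blocks(zero(4), comb((1j, mm(gamma5, gamma[mu]))),
                  comb((1j, mm(gamma5, gamma[mu]))), zero(4)) for mu in (1, 2, 3, 0)}
Gam[5] = blocks(zero(4), gamma5, gamma5, zero(4))
Gam[6] = blocks(zero(4), eye(4), comb((-1, eye(4))), zero(4))

def eta(K, L):
    return ETA[K] if K == L else 0

# Eq. (29): anticommutators of the Gamma^K.
clifford_err = max(dist(comb((1, mm(Gam[K], Gam[L])), (1, mm(Gam[L], Gam[K]))),
                        comb((2 * eta(K, L), eye(8)))) for K in ETA for L in ETA)

# Eq. (30): generators, and the size of their off-diagonal 4 x 4 blocks.
J = {(K, L): comb((-0.25j, mm(Gam[K], Gam[L])), (0.25j, mm(Gam[L], Gam[K])))
     for K in ETA for L in ETA}
offdiag = max(abs(J[KL][i][j]) + abs(J[KL][j][i])
              for KL in J for i in range(4) for j in range(4, 8))

# Eq. (32): commutators of the generators, for all index combinations.
algebra_err = 0.0
for K in ETA:
    for L in ETA:
        for M in ETA:
            for N in ETA:
                lhs = comb((1j, mm(J[K, L], J[M, N])), (-1j, mm(J[M, N], J[K, L])))
                rhs = comb((eta(L, M), J[K, N]), (-eta(K, M), J[L, N]),
                           (-eta(L, N), J[K, M]), (eta(K, N), J[L, M]))
                algebra_err = max(algebra_err, dist(lhs, rhs))

print(clifford_err, offdiag, algebra_err)
```

All three printed numbers should vanish, confirming the index and sign structure of Eq. (32) and the reducibility noted above.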
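The 8-component spinor fields in six dimensions have an SO(4, 2) transformation
given by the commutation relations

$$ i\,[J^{KL}, \Psi^n(X)] = \left(X^L\frac{\partial}{\partial X_K} - X^K\frac{\partial}{\partial X_L}\right)\Psi^n(X) - i\,(J^{KL})^n{}_m\,\Psi^m(X) . \tag{35} $$

We note that the matrices $\Gamma^K$ and $J^{KL}$ obey reality conditions

$$ (\Gamma^K)^\dagger = -b\,\Gamma^K\,b , \quad (J^{KL})^\dagger = b\,J^{KL}\,b , \quad b \equiv \begin{pmatrix} \gamma^0\gamma_5 & 0 \\ 0 & \gamma^0\gamma_5 \end{pmatrix} = b^{-1} , \tag{36} $$

so the adjoint of Eq. (35) gives

$$ i\,[J^{KL}, \bar\Psi(X)] = \left(X^L\frac{\partial}{\partial X_K} - X^K\frac{\partial}{\partial X_L}\right)\bar\Psi(X) + i\,\bar\Psi(X)\,J^{KL} , \tag{37} $$

where

$$ \bar\Psi(X) \equiv \Psi^\dagger(X)\,b . \tag{38} $$

We can therefore form six-tensors from bilinears in Ψ: For any 8 × 8 matrix
M, we have

$$ i\left[J^{KL}, \left(\bar\Psi(X)M\Psi(X)\right)\right] = \left(X^L\frac{\partial}{\partial X_K} - X^K\frac{\partial}{\partial X_L}\right)\left(\bar\Psi(X)M\Psi(X)\right) + i\left(\bar\Psi(X)\,[J^{KL}, M]\,\Psi(X)\right) , \tag{39} $$

so for instance $\left(\bar\Psi(X)\Gamma^K\Psi(X)\right)$ is a vector field, $\left(\bar\Psi(X)J^{KL}\Psi(X)\right)$ is an
antisymmetric tensor, etc.

² Our notation for Dirac matrices is the same as used in [11].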
As in the case of tensor fields, we assume that Ψ(X) obeys a scaling law,

$$ \Psi(\lambda X) = \lambda^{-d+1/2}\,\Psi(X) \tag{40} $$

so that $(X^5 + X^6)^{d-1/2}\,\Psi(X)$ is a function only of ratios of the $X^K$. So far,
d − 1/2 is just some unknown number; the reason for writing it in this form
will become apparent soon. With $X^5 - X^6$ eliminated in favor of $X^5 + X^6$
and $X^\mu X_\mu$ by use of Eq. (6), we can regard $(X^5 + X^6)^{d-1/2}\,\Psi(X)$ as a
function only of the coordinate $x^\mu$ given by Eq. (8):

$$ (X^5 + X^6)^{d-1/2}\,\Psi(X) \equiv \zeta(x) \tag{41} $$

It will be convenient to separate Ψ(X) and ζ(x) into four-component segments

$$ \Psi(X) = \begin{pmatrix} \Psi_+(X) \\ \Psi_-(X) \end{pmatrix} , \quad \zeta_\pm(x) = (X^5 + X^6)^{d-1/2}\,\Psi_\pm(X) . \tag{42} $$

Eq. (33) shows that the $\Psi_\pm$ transform according to the two fundamental
spinor irreducible representations of SO(4, 2). Although the $\zeta_\pm(x)$ are functions
only of $x^\mu$, neither of these four-component fields has the right conformal
(or even translation) transformation properties (1)–(4) to serve as
conventional four-dimensional spinor fields, but they will be ingredients in
the construction of such fields.

Using Eqs. (35) and (33), we can work out the commutators of the $\zeta_\pm$
fields with the generators $J^{\mu\nu}$ of Lorentz transformations, the generators
$P^\mu = J^{5\mu} + J^{6\mu}$ of translations, the generators $K^\mu = J^{6\mu} - J^{5\mu}$ of special
conformal transformations, and the generator $S = -J^{56}$ of scale transformations:

$$ i\,[J^{\mu\nu}, \zeta_\pm(x)] = \left(x^\nu\frac{\partial}{\partial x_\mu} - x^\mu\frac{\partial}{\partial x_\nu}\right)\zeta_\pm(x) - i\,j^{\mu\nu}\,\zeta_\pm(x) , \tag{43} $$

$$ i\,[P^\mu, \zeta_\pm(x)] = -\frac{\partial}{\partial x_\mu}\zeta_\pm(x) - \frac{i}{2}(1 \pm \gamma_5)\gamma^\mu\,\zeta_\pm(x) , \tag{44} $$

$$ i\,[K^\mu, \zeta_\pm(x)] = \left(2x^\mu x^\lambda\frac{\partial}{\partial x^\lambda} - x^2\frac{\partial}{\partial x_\mu} + (2d-1)\,x^\mu\right)\zeta_\pm(x) + \frac{i}{2}(1 \mp \gamma_5)\gamma^\mu\,\zeta_\pm(x) , \tag{45} $$

$$ i\,[S, \zeta_\pm(x)] = \left(x^\lambda\frac{\partial}{\partial x^\lambda} + d - \frac{1}{2}\right)\zeta_\pm(x) \mp \frac{1}{2}\gamma_5\,\zeta_\pm(x) . \tag{46} $$

The second terms in Eqs. (44) through (46) are very different from the
matrix terms in the commutation relations (1)–(4) of general fields in four
dimensions. In particular, the presence of a matrix term in the commutation
relation (44) shows that $\zeta_\pm(x)$ does not have the usual transformation rule
under translations. In order to construct suitable four-dimensional spinor
fields, we must impose a condition on Ψ(X) analogous to the transversality
condition imposed on tensors in Section II, and we must apply a projection
matrix to $\zeta_\pm(x)$, analogous to the quantities $e^\mu_K(x)$ in Eq. (17).
First, to eliminate the matrix term in Eq. (44), we define a pair of chiral
fields

$$ \psi_\pm(x) \equiv \frac{1}{2}(1 \mp \gamma_5)\,\zeta_\pm(x) . \tag{47} $$

Because $\gamma_5$ commutes with $j^{\mu\nu}$, multiplying Eq. (43) with $(1 \mp \gamma_5)/2$ gives
the same Lorentz transformation rule:

$$ i\,[J^{\mu\nu}, \psi_\pm(x)] = \left(x^\nu\frac{\partial}{\partial x_\mu} - x^\mu\frac{\partial}{\partial x_\nu}\right)\psi_\pm(x) - i\,j^{\mu\nu}\,\psi_\pm(x) , \tag{48} $$

while multiplying Eq. (44) with $(1 \mp \gamma_5)/2$ gives what is now a conventional
transformation under spacetime translations:

$$ i\,[P^\mu, \psi_\pm(x)] = -\frac{\partial}{\partial x_\mu}\psi_\pm(x) . \tag{49} $$

When we multiply Eq. (46) with $(1 \mp \gamma_5)/2$, the second term becomes just
$\psi_\pm(x)/2$, canceling the −1/2 in the first term:

$$ i\,[S, \psi_\pm(x)] = \left(x^\lambda\frac{\partial}{\partial x^\lambda} + d\right)\psi_\pm(x) . \tag{50} $$

This is why we wrote the scaling relation for fermions in the form (40);
Eq. (50) shows that with this form of the scaling relation, d is the conformal
dimension of the spinor fields. Finally, multiplying the commutation relation
(45) with $(1 \mp \gamma_5)/2$ gives

$$ i\,[K^\mu, \psi_\pm(x)] = \left(2x^\mu x^\lambda\frac{\partial}{\partial x^\lambda} - x^2\frac{\partial}{\partial x_\mu} + (2d-1)\,x^\mu\right)\psi_\pm(x) + i\gamma^\mu\,\chi_\pm(x) , \tag{51} $$

where $\chi_\pm$ is the opposite-chirality part of $\zeta_\pm$:

$$ \chi_\pm(x) \equiv \frac{1}{2}(1 \pm \gamma_5)\,\zeta_\pm(x) . \tag{52} $$

This is still very different from the desired transformation rule under special
conformal transformations.
To proceed, we must impose a transversality condition on the spinor
fields Ψ(X) in six dimensions. The natural such condition is

$$ X_K\,\Gamma^K\,\Psi(X) = 0 . \tag{53} $$

This manifestly respects SO(4, 2) invariance, and it is consistent with the
fact that $(X\cdot\Gamma)^2 = (X\cdot X) = 0$, so that zero is the sole eigenvalue of
X · Γ. Eq. (53) has the immediate consequence that the vector field $(\bar\Psi\Gamma^K\Psi)$
obeys the same transversality condition $X_K(\bar\Psi\Gamma^K\Psi) = 0$ that we imposed
on vector fields in Section II. The same transversality holds for the other
vector field $(\bar\Psi\Gamma_7\Gamma^K\Psi)$, where

$$ \Gamma_7 \equiv -i\,\Gamma^0\Gamma^1\Gamma^2\Gamma^3\Gamma^5\Gamma^6 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} , \tag{54} $$

and also for the antisymmetric tensors $(\bar\Psi[\Gamma^K, \Gamma^L]\Psi)$ and $(\bar\Psi\Gamma_7[\Gamma^K, \Gamma^L]\Psi)$.
The only other six-dimensional tensors that can be formed from bilinears in
Ψ(X) are the totally antisymmetric tensors of third rank

$$ (\bar\Psi\,\Gamma^{[K}\Gamma^L\Gamma^{M]}\,\Psi) , \quad (\bar\Psi\,\Gamma_7\Gamma^{[K}\Gamma^L\Gamma^{M]}\,\Psi) , $$

the square brackets indicating antisymmetrization. These are not strictly
transverse; instead, Eq. (53) gives

$$ X_K\,(\bar\Psi\,\Gamma^{[K}\Gamma^L\Gamma^{M]}\,\Psi) = X^L\,(\bar\Psi\Gamma^M\Psi) - X^M\,(\bar\Psi\Gamma^L\Psi) , $$

and similarly for $X_K\,(\bar\Psi\,\Gamma_7\Gamma^{[K}\Gamma^L\Gamma^{M]}\,\Psi)$. If we think of these tensors as three-forms

$$ (\bar\Psi\,\Gamma^{[K}\Gamma^L\Gamma^{M]}\,\Psi)\,dX_K\,dX_L\,dX_M , \quad (\bar\Psi\,\Gamma_7\Gamma^{[K}\Gamma^L\Gamma^{M]}\,\Psi)\,dX_K\,dX_L\,dX_M $$

with anticommuting differentials $dX^K$ tangent to the hypercone (6), so that
$X^K dX_K = 0$, then these 3-forms are transverse, in the sense that they
vanish if we replace any $dX^K$ with $X^K$. But the real justification for the
transversality condition (53) is that, as we shall now see, it gives the results
we need in four dimensions.
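By multiplying the transversality condition Eq. (53) with the matrix

$$ \begin{pmatrix} 1 - \gamma_5 & 0 \\ 0 & 1 + \gamma_5 \end{pmatrix} $$

we find a simple formula for $\chi_\pm$ in terms of $\psi_\pm$:

$$ \chi_\pm = -i\,x_\nu\gamma^\nu\,\psi_\pm . \tag{55} $$

Thus the last term in Eq. (51) is

$$ i\gamma^\mu\chi_\pm = \gamma^\mu\gamma^\nu x_\nu\,\psi_\pm = \left(x^\mu + 2i\,j^{\mu\nu}x_\nu\right)\psi_\pm . $$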
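
The matrix identity used in the last step, $\gamma^\mu\gamma^\nu x_\nu = x^\mu + 2i\,j^{\mu\nu}x_\nu$ with $j^{\mu\nu}$ given by Eq. (34), can be confirmed with explicit matrices. The Python sketch below is an illustration only. The representation of the Dirac matrices is an assumed choice satisfying $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$, and the helper names and sample point are hypothetical.

```python
def mm(a, b):
    """Matrix product of two square matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def comb(*terms):
    """Linear combination sum_i c_i M_i, given pairs (c_i, M_i)."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)] for i in range(n)]

def blocks(a, b, c, d):
    """Assemble the block matrix [[a, b], [c, d]]."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zero(n):
    return [[0] * n for _ in range(n)]

def dist(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Assumed Dirac matrices with {gamma^mu, gamma^nu} = 2 eta^{mu nu}, eta = diag(+, +, +, -).
ETA = {1: 1.0, 2: 1.0, 3: 1.0, 0: -1.0}
pauli = {1: [[0, 1], [1, 0]], 2: [[0, -1j], [1j, 0]], 3: [[1, 0], [0, -1]]}
gamma = {i: comb((-1j, blocks(zero(2), s, comb((-1, s)), zero(2)))) for i, s in pauli.items()}
gamma[0] = comb((-1j, blocks(zero(2), eye(2), eye(2), zero(2))))

def j(mu, nu):
    """Eq. (34): j^{mu nu} = -(i/4) [gamma^mu, gamma^nu]."""
    return comb((-0.25j, mm(gamma[mu], gamma[nu])), (0.25j, mm(gamma[nu], gamma[mu])))

x_up = {1: 0.3, 2: -1.2, 3: 0.5, 0: 0.4}                 # x^mu (arbitrary sample point)
x_dn = {mu: ETA[mu] * x_up[mu] for mu in ETA}            # x_mu

worst = 0.0
for mu in ETA:
    lhs = comb(*[(x_dn[nu], mm(gamma[mu], gamma[nu])) for nu in ETA])
    rhs = comb((x_up[mu], eye(4)), *[(2j * x_dn[nu], j(mu, nu)) for nu in ETA])
    worst = max(worst, dist(lhs, rhs))

print(worst)    # vanishes up to rounding
```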

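The special conformal transformation rule (51) thus reads

$$ i\,[K^\mu, \psi_\pm(x)] = \left(2x^\mu x^\lambda\frac{\partial}{\partial x^\lambda} - x^2\frac{\partial}{\partial x_\mu} + 2d\,x^\mu\right)\psi_\pm(x) + 2i\,j^{\mu\nu}x_\nu\,\psi_\pm(x) . \tag{56} $$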

Eqs. (48)–(50) and (56) show that the fields $\psi_\pm(x)$ are conventional four-dimensional
Dirac fields, satisfying the commutation relations (1)–(4) with
the generators of the conformal group, and with conformal dimension d.
The other fields $\chi_\pm$ have no obvious physical interpretation. Of course, we
can assemble the chiral fields $\psi_\pm$ into a four-component Dirac field

$$ \psi(x) = \psi_+(x) + \psi_-(x) = (X^5 + X^6)^{d-1/2}\left[\left(\frac{1 - \gamma_5}{2}\right)\Psi_+(X) + \left(\frac{1 + \gamma_5}{2}\right)\Psi_-(X)\right] . \tag{57} $$

It is this form of the spinor field that will be used to work out the conse-
quences of conformal symmetry for Green’s functions involving spinor fields.
By combining the methods of this section and of Section II, we can see
that a field $\Psi^{K_1\cdots K_r}(X)$ with tensor indices as well as an 8-component spinor
index, if subjected to the transversality conditions

$$ X_{K_1}\Psi^{K_1\cdots K_r}(X) = \dots = X_{K_r}\Psi^{K_1\cdots K_r}(X) = (X\cdot\Gamma)\,\Psi^{K_1\cdots K_r}(X) = 0 , $$

yields a spinor-tensor in four dimensions

$$ \psi^{\mu_1\cdots\mu_r}(x) = (X^5 + X^6)^{d-1/2}\;e^{\mu_1}_{K_1}(x)\cdots e^{\mu_r}_{K_r}(x)\left[\frac{(1 - \gamma_5)}{2}\,\Psi_+^{K_1\cdots K_r}(X) + \frac{(1 + \gamma_5)}{2}\,\Psi_-^{K_1\cdots K_r}(X)\right] , $$

(where $\Psi_+$ and $\Psi_-$ are the upper and lower four components of Ψ, with
$\Gamma_7 = +1$ and $\Gamma_7 = -1$, respectively), which transforms under conformal
transformations according to Eqs. (1)–(4), with conformal dimensionality d.

V. SPINOR APPLICATIONS
First let’s consider the Green’s function $\langle\psi_1(x)\,\bar\psi_2(y)\rangle$, where $\bar\psi \equiv \psi^\dagger\gamma^0\gamma_5$.
Invariance under SO(4, 2) tells us that the corresponding two point function
of $\Psi_1(X)$ and $\bar\Psi_2(Y)$ in six dimensions must be a linear combination

$$ A + B\,(X\cdot\Gamma) + C\,(Y\cdot\Gamma) + D\,[X\cdot\Gamma, Y\cdot\Gamma] , $$

with A, B, C, and D all functions only of the scalar X · Y. (Here we
are ignoring the possibility of including terms involving the matrix $\Gamma_7$. We
will consider such terms presently.) The transversality condition that
$(X\cdot\Gamma)\Psi_1(X) = 0$ tells us that C = 0 and A = 2D X · Y, while the condition
that $\bar\Psi_2(Y)(Y\cdot\Gamma) = 0$ tells us B = 0 and, again, A = 2D X · Y. So the
six-dimensional Green’s function must have the form

$$ A\left(1 + \frac{[X\cdot\Gamma, Y\cdot\Gamma]}{2X\cdot Y}\right) = \frac{A\,(X\cdot\Gamma)\,(Y\cdot\Gamma)}{(X\cdot Y)} . $$

Every term here has equal numbers of factors of $X^K$ and $Y^K$ (including those
in A), while the scaling condition (40) tells us that the Green’s function must
be of order $-d_1 + 1/2$ in $X^K$ and of order $-d_2 + 1/2$ in $Y^K$, so we must
have $d_1 = d_2 \equiv d$, and the whole Green’s function must be proportional to

$$ (X\cdot Y)^{1/2-d}\left(1 + \frac{[X\cdot\Gamma, Y\cdot\Gamma]}{2X\cdot Y}\right) , \tag{58} $$

with a constant proportionality coefficient.
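
The matrix structure in Eq. (58) can be tested numerically: on the hypercone it should be annihilated by X · Γ from the left and by Y · Γ from the right, and it should equal (X · Γ)(Y · Γ)/(X · Y). The Python sketch below is an illustration under assumptions. It builds the matrices of Eq. (28) from an assumed representation of the Dirac matrices with $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$, and the helper names and sample points are hypothetical.

```python
def mm(a, b):
    """Matrix product of two square matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def comb(*terms):
    """Linear combination sum_i c_i M_i, given pairs (c_i, M_i)."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)] for i in range(n)]

def blocks(a, b, c, d):
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zero(n):
    return [[0] * n for _ in range(n)]

def norm(a):
    return max(abs(x) for row in a for x in row)

# Assumed Dirac matrices, then the Gamma^K of Eq. (28), K = 1, 2, 3, 0, 5, 6.
ETA = {1: 1.0, 2: 1.0, 3: 1.0, 0: -1.0, 5: 1.0, 6: -1.0}
pauli = {1: [[0, 1], [1, 0]], 2: [[0, -1j], [1j, 0]], 3: [[1, 0], [0, -1]]}
gamma = {i: comb((-1j, blocks(zero(2), s, comb((-1, s)), zero(2)))) for i, s in pauli.items()}
gamma[0] = comb((-1j, blocks(zero(2), eye(2), eye(2), zero(2))))
gamma5 = comb((-1j, mm(mm(gamma[0], gamma[1]), mm(gamma[2], gamma[3]))))
Gam = {mu: blocks(zero(4), comb((1j, mm(gamma5, gamma[mu]))),
                  comb((1j, mm(gamma5, gamma[mu]))), zero(4)) for mu in (1, 2, 3, 0)}
Gam[5] = blocks(zero(4), gamma5, gamma5, zero(4))
Gam[6] = blocks(zero(4), eye(4), comb((-1, eye(4))), zero(4))

def lift(x, scale):
    """Point X^K on the hypercone (6) projecting to x = (x1, x2, x3, x0)."""
    x_sq = x[0] ** 2 + x[1] ** 2 + x[2] ** 2 - x[3] ** 2
    X = dict(zip((1, 2, 3, 0), (c * scale for c in x)))
    X[5], X[6] = (scale - x_sq * scale) / 2, (scale + x_sq * scale) / 2
    return X

def slash(X):
    """X . Gamma = X_K Gamma^K, with the index lowered by the metric (7)."""
    return comb(*[(ETA[K] * X[K], Gam[K]) for K in ETA])

def dot(X, Y):
    return sum(ETA[K] * X[K] * Y[K] for K in ETA)

X, Y = lift([0.3, -1.2, 0.5, 0.4], 2.5), lift([1.1, 0.2, -0.7, -0.1], 0.8)
xs, ys, xy = slash(X), slash(Y), dot(X, Y)

# Matrix factor of Eq. (58): 1 + [X.Gamma, Y.Gamma] / (2 X.Y).
G = comb((1, eye(8)), (0.5 / xy, mm(xs, ys)), (-0.5 / xy, mm(ys, xs)))

checks = {
    "(X.Gamma)^2": norm(mm(xs, xs)),
    "(X.Gamma) G": norm(mm(xs, G)),
    "G (Y.Gamma)": norm(mm(G, ys)),
    "G - (X.Gamma)(Y.Gamma)/(X.Y)": norm(comb((1, G), (-1 / xy, mm(xs, ys)))),
}
for name, value in checks.items():
    print(name, value)
```

Each printed value should vanish up to rounding error.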
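From Eqs. (30) and (33), we find

$$ [X\cdot\Gamma, Y\cdot\Gamma] = 4i\,X_K Y_L\,J^{KL} = 4i\begin{pmatrix} M_+ & 0 \\ 0 & M_- \end{pmatrix} , $$

where

$$ M_\pm = j^{\mu\nu}X_\mu Y_\nu + \frac{1}{2}\gamma^\mu\,(X_5 Y_\mu - Y_5 X_\mu) \pm \frac{1}{2}\gamma_5\gamma^\mu\,(X_6 Y_\mu - Y_6 X_\mu) \pm \frac{i}{2}\gamma_5\,(X_5 Y_6 - Y_5 X_6) . $$

From Eq. (57), we then have

$$ \langle\psi_1(x)\,\bar\psi_2(y)\rangle \propto (X^5 + X^6)^{d-1/2}\,(Y^5 + Y^6)^{d-1/2}\,(X\cdot Y)^{-d-1/2}\;\sum_\pm\left(\frac{1 \mp \gamma_5}{2}\right)M_\pm\left(\frac{1 \pm \gamma_5}{2}\right) . $$

Only the vector and axial vector terms in $M_\pm$ survive, so this simplifies to

$$ \langle\psi_1(x)\,\bar\psi_2(y)\rangle \propto (X^5 + X^6)^{d-1/2}\,(Y^5 + Y^6)^{d-1/2}\,(X\cdot Y)^{-d-1/2}\;\gamma^\mu\left[(X^5 + X^6)\,Y_\mu - (Y^5 + Y^6)\,X_\mu\right] . $$

From (8) and (21), we have then

$$ \langle\psi_1(x)\,\bar\psi_2(y)\rangle \propto \left((x-y)^2\right)^{-d-1/2}\gamma^\mu\,(x_\mu - y_\mu) . \tag{59} $$

This is of course just what we should expect in a Poincaré invariant and
scale invariant theory with spinor fields of equal dimensionality d.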
Now let us return to the possibility of including the matrix $\Gamma_7$ defined
by Eq. (54) in the six-dimensional Green’s function. That is, we consider
the possibility of multiplying Eq. (58) with a factor $(1 + \alpha\Gamma_7)$, with some
arbitrary α, so that the Green’s function in six dimensions is proportional
to

$$ (1 + \alpha\Gamma_7)\,(X\cdot Y)^{1/2-d}\left(1 + \frac{[X\cdot\Gamma, Y\cdot\Gamma]}{2X\cdot Y}\right) . \tag{60} $$

The effect is to multiply the terms $M_\pm$ with (1 ± α), so that the Green’s
function (59) becomes

$$ \langle\psi_1(x)\,\bar\psi_2(y)\rangle \propto \left((x-y)^2\right)^{-d-1/2}(1 - \alpha\gamma_5)\,\gamma^\mu\,(x_\mu - y_\mu) . \tag{61} $$

This is allowed by SO(4, 2) invariance, since $\Gamma_7$ commutes with all the generators
$J^{KL}$, but it is not allowed in a theory that is invariant under O(4, 2),
since $\Gamma_7$ changes sign under transformations (9) with Det Λ = −1. In particular,
$\Gamma_7$ terms seem to be ruled out if we impose invariance under the
inversion $x^\mu \mapsto -x^\mu/x^2$, which just amounts to the reflection that changes
the sign of $X^6$ and leaves all other $X^K$ unchanged.
The presence of a Γ7 term in the six-dimensional Green’s function (60)
or a γ5 term in the corresponding four-dimensional Green’s function (61)
does not in itself violate invariance under O(4, 2), because we can eliminate
these terms by a redefinition of the fermion fields. It is only necessary to
replace Ψ with
$$ \Psi' = \left[(1+\alpha)^{-1/2}\left(\frac{1+\Gamma_7}{2}\right) + (1-\alpha)^{-1/2}\left(\frac{1-\Gamma_7}{2}\right)\right]\Psi \tag{62} $$

so that instead of Eq. (56) we have

$$ \psi(x) = (X^5+X^6)^{d-1/2}\left[(1+\alpha)^{-1/2}\left(\frac{1-\gamma_5}{2}\right)\Psi_+(X) + (1-\alpha)^{-1/2}\left(\frac{1+\gamma_5}{2}\right)\Psi_-(X)\right]. \tag{63} $$
The real sign of a breakdown of O(4, 2) to SO(4, 2) is the presence, in one
or more Green’s functions, of O(4, 2)-breaking Γ7 terms that cannot all be
eliminated by redefinition of the fermion fields.
Here is an example. Consider the Green’s function ⟨ψ_1(x) ψ̄_2(y) ϕ(z)⟩_0
of two fermion and one scalar field, of dimensionality d1 , d2 , and d3 , re-
spectively. Invariance under O(4, 2) would require the corresponding six-
dimensional Green’s function to take the form

$$ A + B(X\cdot\Gamma) + C(Y\cdot\Gamma) + D(Z\cdot\Gamma) + E[X\cdot\Gamma,\,Y\cdot\Gamma] + F[Y\cdot\Gamma,\,Z\cdot\Gamma] + G[Z\cdot\Gamma,\,X\cdot\Gamma] + H(X\cdot\Gamma)(Z\cdot\Gamma)(Y\cdot\Gamma), $$

with A, B, etc. functions of the scalars X · Y , Y · Z, and Z · X. (Any
other ordering of the Γ-matrices in the last term would differ only by terms
of the same form as those already included.) This must vanish when we
multiply with X · Γ on the left; the vanishing of the terms proportional to
[X · Γ, Y · Γ], [X · Γ, Z · Γ], and X_K Y_L Z_M Γ^{[K} Γ^L Γ^{M]} gives C = 0, D = 0,
and F = 0, while the vanishing of the terms proportional to X · Γ gives
A = 2EX · Y . It must also vanish when we multiply on the right with Y · Γ;
the vanishing of the terms proportional to [X · Γ, Y · Γ], [Y · Γ, Z · Γ], and
X_K Y_L Z_M Γ^{[K} Γ^L Γ^{M]} gives B = 0, D = 0, and G = 0, while the vanishing of
the terms proportional to Y · Γ again gives A = 2EX · Y . In both cases the
vanishing of terms proportional to the unit matrix gives nothing new. So
we conclude that the Green’s function in six dimensions is of the form
$$ A\left(1+\frac{[X\cdot\Gamma,\,Y\cdot\Gamma]}{2X\cdot Y}\right) + H\,(X\cdot\Gamma)(Z\cdot\Gamma)(Y\cdot\Gamma). $$
Now, according to the scaling properties of the fields, the total number of
factors of X, Y , and Z must be respectively −d1 + 1/2, −d2 + 1/2, −d3 , so

$$ A \propto (X\cdot Y)^{-a}\,(Y\cdot Z)^{-b}\,(Z\cdot X)^{-c}, $$

$$ H \propto (X\cdot Y)^{-a-1/2}\,(Y\cdot Z)^{-b-1/2}\,(Z\cdot X)^{-c-1/2}, $$


where a + c = d1 − 1/2, a + b = d2 − 1/2, b + c = d3 . The Green’s function
for two spinors and a scalar in six dimensions thus takes the form

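$$ (X\cdot Y)^{(d_3-d_1-d_2+1)/2}\,(Y\cdot Z)^{(d_1-d_2-d_3)/2}\,(Z\cdot X)^{(d_2-d_3-d_1)/2} \times\left[a\left(1+\frac{[X\cdot\Gamma,\,Y\cdot\Gamma]}{2X\cdot Y}\right) + h\,\frac{(X\cdot\Gamma)(Z\cdot\Gamma)(Y\cdot\Gamma)}{\sqrt{(X\cdot Y)(Y\cdot Z)(Z\cdot X)}}\right], \tag{64} $$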

where a and h are constants.
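
The exponents in Eq. (64) follow from solving the three linear conditions on a, b, and c stated above. As a small sketch of that arithmetic (the helper name `solve_exponents` and the sample dimensions are illustrative assumptions, not taken from the paper), exact rational arithmetic confirms the solution:

```python
from fractions import Fraction

def solve_exponents(d1, d2, d3):
    """Solve a + c = d1 - 1/2, a + b = d2 - 1/2, b + c = d3 for (a, b, c)."""
    d1, d2, d3 = Fraction(d1), Fraction(d2), Fraction(d3)
    half = Fraction(1, 2)
    a = (d1 + d2 - d3 - 1) / 2
    b = (d2 + d3 - d1) / 2
    c = (d1 + d3 - d2) / 2
    # Confirm the three scaling conditions quoted in the text.
    assert a + c == d1 - half
    assert a + b == d2 - half
    assert b + c == d3
    return a, b, c

# Assumed sample values: two spinors of dimension 3/2 and a scalar of dimension 1.
a, b, c = solve_exponents(Fraction(3, 2), Fraction(3, 2), 1)
# Powers of X.Y, Y.Z, Z.X multiplying the first term of Eq. (64).
print(-a, -b, -c)
```

The returned values reproduce the powers (d3 − d1 − d2 + 1)/2, (d1 − d2 − d3)/2, and (d2 − d3 − d1)/2 appearing in Eq. (64).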


The contribution of the second term to the four-dimensional Green’s
function is complicated, and is not needed for the point I wish to make, so I
will take h = 0 in what follows. Then, following the same arguments as for
the two-spinor Green’s function, we have
$$ \langle \psi_1(x)\,\bar\psi_2(y)\,\varphi(z)\rangle_0 \propto \left[(x-y)^2\right]^{(d_3-d_1-d_2-1)/2}\left[(y-z)^2\right]^{(d_1-d_2-d_3)/2}\left[(z-x)^2\right]^{(d_2-d_3-d_1)/2}\,\gamma^\mu (x-y)_\mu. \tag{65} $$

But in a theory that is invariant under SO(4, 2) but not O(4, 2), we are free
to include a factor 1 + βΓ7 multiplying the first term in Eq. (64), so that
(for h = 0) in place of Eq. (65) we have
$$ \langle \psi_1(x)\,\bar\psi_2(y)\,\varphi(z)\rangle_0 \propto \left[(x-y)^2\right]^{(d_3-d_1-d_2-1)/2}\left[(y-z)^2\right]^{(d_1-d_2-d_3)/2}\left[(z-x)^2\right]^{(d_2-d_3-d_1)/2}\,(1-\beta\gamma_5)\,\gamma^\mu (x-y)_\mu. \tag{66} $$

Now, by redefining the fermion fields we can eliminate the 1 + αΓ7 factor in
the two point function, which eliminates the γ5 term in Eq. (61), or we can
eliminate the 1+βΓ7 factor in the three-point function, which eliminates the
γ5 term in Eq. (66), but unless β = α we cannot do both. We see then that
it makes a difference whether we assume invariance under O(4, 2), which
includes the inversion x^μ ↦ −x^μ/x^2, or only invariance under SO(4, 2),
which does not include the inversion.
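
The obstruction can be made concrete by tracking only the coefficients of the two chirality blocks. The sketch below is a bookkeeping illustration, not part of the paper: it assumes that the redefinition (62) divides each ± block of any fermion bilinear by (1 ± α), and that |α| ≠ 1. The function names are hypothetical.

```python
from fractions import Fraction

def residual_factors(alpha, beta):
    """Coefficients of the +/- chirality blocks after the redefinition (62).

    Assumption: each block of a fermion bilinear is divided by (1 +/- alpha),
    so the two-point factor (1 +/- alpha) becomes 1 and the three-point
    factor (1 +/- beta) becomes (1 +/- beta)/(1 +/- alpha).
    """
    alpha, beta = Fraction(alpha), Fraction(beta)
    two_point = tuple((1 + s * alpha) / (1 + s * alpha) for s in (+1, -1))
    three_point = tuple((1 + s * beta) / (1 + s * alpha) for s in (+1, -1))
    return two_point, three_point

def gamma5_survives(alpha, beta):
    """True if the three-point function still has unequal +/- blocks,
    i.e. a gamma_5 term that the redefinition did not remove."""
    _, (plus, minus) = residual_factors(alpha, beta)
    return plus != minus

print(gamma5_survives(Fraction(1, 3), Fraction(1, 3)))  # beta equal to alpha
print(gamma5_survives(Fraction(1, 3), Fraction(1, 5)))  # beta different from alpha
```

The two blocks of the three-point function agree only when (1 + β)(1 − α) = (1 − β)(1 + α), which reduces to β = α, in line with the statement above.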

VI. AdS/CFT

In the preceding sections the six-tensors T^{K_1···K_r}(X) and eight-component
spinors Ψ(X) were fictions, merely means to the end of calculating Green’s
functions for fields in four spacetime dimensions. But T^{K_1···K_r}(X) and Ψ(X)
may also be regarded as actual fields on five-dimensional anti-de Sitter space
(AdS5 ). This space is the surface of the hypersphere in six dimensions

$$ \eta_{KL}\,X^K X^L = R^2 \tag{67} $$

with the same metric ηKL as in Sections I through V, and arbitrary R > 0.
It is manifestly maximally symmetric, with isometry group SO(4, 2) consisting
of the transformations (9). Tensors T^{K_1···K_r}(X) on AdS5 transform
as in Eq. (12), and without upsetting the isometry can be subject to the
transversality condition (16). We can also introduce 8-component spinor
fields Ψ(X) on AdS5 , with the same SO(4, 2) transformation properties as
in Section IV, but we cannot here adopt the transversality condition (53),
which requires that (X · Γ) Ψ = 0, because on the hypersphere we have

$$ (X\cdot\Gamma)^2 = X\cdot X = R^2, $$

and so the only eigenvalues of X · Γ are R and −R. But we can instead
adopt the SO(4, 2)-invariant condition

X · Γ Ψ(X) = RΨ(X) . (68)

There is no loss of generality in taking the coefficient of Ψ(X) on the right-hand
side to be R rather than −R, because if Ψ(X) satisfies Eq. (68), then
Γ7 Ψ(X) satisfies the same constraint with R replaced with −R.
Of course, X K and λX K here can not both be on the hypersphere (67)
except for λ = ±1, so we can not impose a scale invariance condition like
(15) here. But in the limit that some components X^K become much larger
than R, with the ratios of all components held fixed, the hypersphere (67)
effectively becomes the hypercone (6), and the constraint (68) on spinor fields
effectively becomes the transversality condition (53). The AdS/CFT conjecture
deals with fields on AdS5 that approach c-number values T_∞^{K_1···K_r}(X)
or Ψ_∞(X) in this limit, satisfying scaling conditions of the form

$$ T_\infty^{K_1\cdots K_r}(\lambda X) = \lambda^a\, T_\infty^{K_1\cdots K_r}(X), \qquad \Psi_\infty(\lambda X) = \lambda^a\, \Psi_\infty(X). \tag{69} $$

Of particular interest are massless degrees of freedom, represented by fields
with a = 0; massive degrees of freedom generally have a < 0.
We know from the work of Sections II and IV that, from such asymptotic
fields T_∞^{K_1···K_r}(X) and Ψ_∞(X), we can form tensor fields (17) and
spinor fields (57) in four dimensions that transform as usual under the
four-dimensional conformal group, with conformal dimensions d = −a for tensors
of any rank and d − 1/2 = −a for spinors, or spinor-tensors of any
rank. In particular, in the important case a = 0 for which fields approach
finite limits on the boundary X → ∞ of AdS5, as is well known, a tensor
current on the boundary must have conformal dimension d = 0, and the
four-dimensional tensor field with which it interacts must therefore have
dimensionality d = 4, the expected dimensionality for the energy-momentum
tensor in conformally-invariant theories. On the other hand, a spinor or
spinor-tensor field, which arises from a spinor or spinor-tensor field on AdS5
that approaches a finite value on the boundary, has d = 1/2, so the
four-dimensional spinor or spinor-tensor fields with which these fields interact
must then have dimensionality 7/2, the correct expected dimensionality for
the supersymmetry current in conformally invariant supersymmetric theories.
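
The dimension counting in this paragraph can be written out as a short sketch. The rule that a source of dimension d couples to an operator of dimension 4 − d is inferred here from the two cases quoted above (0 ↔ 4 and 1/2 ↔ 7/2); the function names are illustrative and not from the paper.

```python
from fractions import Fraction

def boundary_dimension(a, spinor=False):
    """Conformal dimension d of the four-dimensional field built from an
    asymptotic AdS5 field of scaling degree a.

    Tensors: d = -a.  Spinors and spinor-tensors: d - 1/2 = -a.
    """
    a = Fraction(a)
    return -a + Fraction(1, 2) if spinor else -a

def coupled_field_dimension(d):
    """Dimension 4 - d of the field coupled to a dimension-d boundary field
    (assumed rule: the coupling integrated over d^4 x is scale invariant)."""
    return 4 - Fraction(d)

# Massless case a = 0 discussed in the text.
print(coupled_field_dimension(boundary_dimension(0)))               # tensor case
print(coupled_field_dimension(boundary_dimension(0, spinor=True)))  # spinor case
```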

ACKNOWLEDGMENTS

I am grateful for discussions with J. Distler and J. Meyers, and for corre-
spondence with I. Bars, A. Chodos, H. Kastrup, T. Okuda, S. Ferrara, and
A. Waldron. This material is based upon work supported by the National
Science Foundation under Grant No. PHY-0455649 and with support from
The Robert A. Welch Foundation, Grant No. F-0014.

APPENDIX

This Appendix will justify the claim made in Section II, that the usual
conformal transformation rules of tensor fields just amount to the statement
that under general conformal transformations a tensor of rank r and conformal
dimensionality d transforms as a tensor density of weight given by
Eq. (17):

$$ t^{\mu_1\mu_2\cdots\mu_r}(x) \mapsto \left|\frac{\partial x}{\partial x'}\right|^{-(r+d)/4} \frac{\partial x^{\mu_1}}{\partial x'^{\nu_1}}\,\frac{\partial x^{\mu_2}}{\partial x'^{\nu_2}}\cdots\frac{\partial x^{\mu_r}}{\partial x'^{\nu_r}}\; t^{\nu_1\nu_2\cdots\nu_r}(x'), \tag{A.1} $$

where |∂x/∂x′| is the determinant of the matrix ∂x^μ/∂x′^ν. This is trivial
for Lorentz transformations and translations. For the scale transformation
x′^μ = (1 + b)x^μ, Eq. (A.1) gives

$$ t^{\mu_1\mu_2\cdots\mu_r}(x) \mapsto (1+b)^d\; t^{\mu_1\mu_2\cdots\mu_r}\big((1+b)x\big) \tag{A.2} $$

which for infinitesimal b is the same as the scale transformation rule (4).
Similarly, for an infinitesimal special conformal transformation

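$$ x^\mu \mapsto x'^\mu = x^\mu + 2(x\cdot c)\,x^\mu - c^\mu x^2, $$

we have

$$ \frac{\partial x^\mu}{\partial x'^\nu} = \delta^\mu_\nu - 2(x\cdot c)\,\delta^\mu_\nu - 2\left(x^\mu c_\nu - c^\mu x_\nu\right), \qquad \left|\frac{\partial x}{\partial x'}\right| = 1 - 8(x\cdot c), $$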

so here Eq. (A.1) reads

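$$ t^{\mu_1\mu_2\cdots\mu_r}(x) \mapsto t^{\mu_1\mu_2\cdots\mu_r}(x) + 2d\,(x\cdot c)\,t^{\mu_1\mu_2\cdots\mu_r}(x) - (2x^{\mu_1}c_\nu - 2c^{\mu_1}x_\nu)\,t^{\nu\mu_2\cdots\mu_r}(x) + \cdots - (2x^{\mu_r}c_\nu - 2c^{\mu_r}x_\nu)\,t^{\mu_1\mu_2\cdots\nu}(x) + \left(2(x\cdot c)x^\mu - c^\mu x^2\right)\partial_\mu t^{\mu_1\mu_2\cdots\mu_r}(x). \tag{A.3} $$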

This is the same as the transformation rule (3) (contracted with cν ), with
Lorentz transformation matrix j ρσ given by Eq. (19).
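
The first-order Jacobian |∂x/∂x′| = 1 − 8(x · c) used above can be checked numerically. The following is a minimal sketch under stated assumptions: a diagonal metric with signature (−,+,+,+), sample values of x and c chosen only for illustration, and helper names (`dot`, `jacobian`, `det`) that are not from the paper.

```python
import itertools

ETA = (-1.0, 1.0, 1.0, 1.0)   # assumed diagonal Minkowski metric

def dot(u, v):
    """Lorentz-invariant product eta_{mu nu} u^mu v^nu."""
    return sum(e * a * b for e, a, b in zip(ETA, u, v))

def jacobian(x, c):
    """Matrix d x'^mu / d x^nu for x' = x + 2 (x.c) x - c x^2."""
    xc = dot(x, c)
    c_low = [e * a for e, a in zip(ETA, c)]
    x_low = [e * a for e, a in zip(ETA, x)]
    return [[(1 + 2 * xc) * (mu == nu)
             + 2 * x[mu] * c_low[nu] - 2 * c[mu] * x_low[nu]
             for nu in range(4)] for mu in range(4)]

def det(m):
    """Determinant by the Leibniz permutation sum (fine for a 4x4 matrix)."""
    n = len(m)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = float(sign)
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

x = [0.3, 1.0, -0.5, 2.0]
c = [1e-4, -2e-4, 3e-4, 0.5e-4]          # small parameter: work to first order in c
inverse_det = 1.0 / det(jacobian(x, c))  # this is |dx/dx'|
first_order = 1.0 - 8.0 * dot(x, c)
print(inverse_det, first_order)
```

The two printed numbers differ only at second order in c, as expected for an infinitesimal transformation.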

REFERENCES

1. See, e.g., E. J. Schreier, Phys. Rev. D 3, 980 (1971). For a review,
see E. S. Fradkin and M. Ya. Nalchik, Phys. Rept. 44, 249 (1978).

2. P. A. M. Dirac, Ann. Math. 37, 429 (1936).

3. G. Mack and A. Salam, Ann. Phys. (New York) 53, 174 (1969).

4. H. A. Kastrup, Ann. Phys. (Berlin) 17, 631 (2008).

5. I. Bars, Phys. Rev. D 62, 046007 (2000); Phys. Rev. D 64, 045004
(2001); Phys. Rev. D 74, 085019 (2006); Phys. Rev. D 77, 125027
(2008); Phys. Rev. D 79, 085021 (2009).

6. S. Ferrara, A. F. Grillo, and R. Gatto, Ann. Phys. (New York) 76,
161 (1973).

7. S. Ferrara, Nucl. Phys. B77, 413 (1974).

8. See for instance L. Cornalba, M. S. Costa, J. Penedones, and R. Schiappa,
J. High Energy Phys. 0708, 019 (2007) (which however considers
only scalar fields).

9. J. Maldacena, Adv. Theor. Math. Phys. 2, 231 (1998).

10. E. Witten, Adv. Theor. Math. Phys. 2, 253 (1998).

11. S. Weinberg, The Quantum Theory of Fields (Cambridge University
Press, Cambridge, 1995), Sec. 5.4.

UTTG-10-10
arXiv:1009.1537v2 [hep-ph] 14 Dec 2010

Pions in Large-N Quantum Chromodynamics

Steven Weinberg*
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

An effective field theory of quarks, gluons, and pions, with the number N
of colors treated as large, is proposed as a basis for calculations of hadronic
phenomena at moderate energies. The qualitative consequences of the large
N limit are similar though not identical to those in pure quantum chro-
modynamics, but because constituent quark masses appear in the effective
Lagrangian, the 't Hooft coupling in the effective theory need not be strong
at moderate energies. To leading order in 1/N the effective theory is renor-
malizable, with only a finite number of terms in the Lagrangian.

* Electronic address: weinberg@physics.utexas.edu

The success of quantum chromodynamics (QCD) in accounting for pro-
cesses like electron–positron annihilation into hadrons at high energy shows
that it is the correct theory of strong interactions, but it has been diffi-
cult to use QCD to account for the wide variety of hadronic phenomena at
moderate energies.
On one hand, the suggestion[1] to consider QCD in the limit of a large
number N of colors, with the gauge coupling g vanishing in this limit as
1/√N, has had remarkable success in reproducing qualitative features of
strong interaction phenomena. But it has not led to much quantitatively,
presumably because the 't Hooft coupling g̃ ≡ g√N is not small at moderate
energies. Indeed, with the u and d quark masses negligible, the N-independent
masses of mesons like the ρ can only be of the order of the
integration constant ΛQCD in the renormalization group equation for g̃, so
inevitably g̃ cannot be small at these meson masses.
Alternatively, it is possible to introduce constituent quark masses into
QCD by taking chiral symmetry breaking into account in an effective field theory
of quarks, gluons, and pions, so that hadrons (other than the pion) can
get their mass mostly from the constituent quark masses. In consequence,
gluon couplings in the effective theory need not be strong at moderate en-
ergies. This fits in well with the observed pattern of hadron masses, such as
the fact that the average of the masses of the nucleon and ∆(1238) is not
very different from 3/2 the ρ and ω masses. Such an effective field theory
was briefly mentioned in [2], and proposed and developed in some detail
(for the three-flavor case) by Manohar and Georgi[3]. But here there is a
different problem: In effective field theories we generally must include every
one of the infinite number of interactions satisfying relevant symmetries, all
of them presumably important at moderate energies, so that the theory can
only be used at low energies.
I suggest that by combining these two approaches the difficulties of each
can be avoided. To leading order in 1/N , the effective field theory of quarks,
gluons, and pions is effectively renormalizable, with only a finite number of
terms in the Lagrangian needed to absorb all infinities. Such an effective
field theory, with a small value of the 't Hooft coupling at moderate energies,
may explain why the naive quark model works so well. Since the pion is
already in the Lagrangian, it is not even necessary for the QCD coupling to
be strong at relatively low energies, though it must still be strong at very
large distances to keep color trapped.

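The effective Lagrangian is taken as[footnote 1]

$$ \mathcal{L}_{\rm eff} = -\frac{1}{4g^2}\,{\rm Tr}\left\{F_{\mu\nu}F^{\mu\nu}\right\} - \frac{1}{g^2}\,\bar\psi\,(D^\mu\gamma_\mu + m)\,\psi - \frac{F_\pi^2}{2}\,D_\mu\vec\pi\cdot D^\mu\vec\pi - \frac{2ig_A}{g^2}\left(\bar\psi\gamma_5\gamma^\mu\,\vec t\,\psi\right)\cdot D_\mu\vec\pi - c_1\left(D_\mu\vec\pi\cdot D^\mu\vec\pi\right)^2 - c_2\left(D_\mu\vec\pi\cdot D_\nu\vec\pi\right)\left(D^\mu\vec\pi\cdot D^\nu\vec\pi\right). \tag{1} $$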

Here ψ and π⃗ are the quark isodoublet and the pion isovector, re-scaled by
multiplying the canonically normalized fields by factors g and 1/F_π, respectively.
Both g and 1/F_π are taken to go as 1/√N for large N if we hold
Λ_QCD fixed. Also, F_{μν} is the re-scaled gluon field strength tensor, an N × N
matrix:

$$ F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu - i[A_\mu, A_\nu], $$

where A_μ is g times the canonically normalized gluon vector potential matrix.
Also, t⃗ is the quark isospin matrix; D_μψ is the gauge- and chiral-covariant
derivative of the quark field

$$ D_\mu\psi \equiv \left(\partial_\mu - iA_\mu + 2i\,\vec t\cdot\frac{\vec\pi\times\partial_\mu\vec\pi}{1+\vec\pi^{\,2}}\right)\psi; $$

and D_μπ⃗ is the chiral-covariant derivative of the pion field

$$ D_\mu\vec\pi \equiv \frac{\partial_\mu\vec\pi}{1+\vec\pi^{\,2}}. $$
The constituent quark mass matrix m and the axial coupling gA are uncon-
strained by chiral symmetry, and assumed to be N -independent. (Sum rules
have been used in the large N limit to show that quarks have axial coupling
gA ≃ 1[5].) The parameters c1 and c2 are coefficients of order N .
With or without the last two terms in (1), this Lagrangian will clearly
reproduce the usual soft-pion theorems of chiral symmetry at low energy,
and as we will see, it also gives most of the usual qualitative results[1] of the
1/N approximation at moderate energies. The last two terms in (1) will be
needed to cancel ultraviolet divergences to leading order in N.

Footnote 1: Unfortunately, the anomaly that breaks the chiral U(1) symmetry of the
Lagrangian with m_u = m_d = 0 disappears in the large N limit[4], leading to the
presence of a light pseudoscalar particle ϕ, a mixture of the η and an SU(3) singlet η′.
The ϕ mass is shown in [4] to vanish as 1/N for N → ∞, so for the consistency of the
large N approximation, the field of this light pseudoscalar presumably should in
principle be included in the effective field theory. It is a failing of the large N
approximation that in the real world, with N = 3, the η and η′ are not particularly
light. Possibly there is some reason why the ϕ mass, though of order 1/N, is not small.
Here we will not include the ϕ in the effective field theory, but all of the
combinatoric arguments here regarding pions would apply also to ϕ mesons, if they were
included in the effective Lagrangian.

Figure 1 (panels A, B, C): Some diagrams of leading order in 1/N for processes
involving pions and other mesons. Plain lines indicate quarks; wavy lines indicate
gluons; dashed lines indicate pions; and crosses indicate insertions of quark
bilinears.

Let us consider a process involving some mesons and perhaps also glue-
balls. As is well known[1], if we ignore the pions and keep only the terms
for quarks and gluons in (1), the leading connected diagrams will consist
of a single quark loop surrounding a planar mesh of gluon lines, with in-
sertions of operators in the quark and gluon lines representing the emission
and absorption of mesons (other than pions) and glueballs. (See Figure
1A.) Such diagrams make a contribution of order N . (This assumes that
the operators representing mesons and glueballs are constructed as bilinears
in the un-rescaled quark and F_{μν} fields. For the moment we are ignoring
the N -dependent factors needed in these insertions to give the initial and
final states created by these operators the proper normalization, because
such factors depend only on the process considered, and hence do not affect
the relative contributions of different diagrams for a given process. These
insertions also must include form-factors, taken from the solution of the
quark-antiquark bound state problem.)
Now, suppose we include virtual pions, increasing the total number of
internal pion and quark lines by ∆I and increasing the total number of
vertices by ∆V . Since pions have no color, this does not change the number
of index loops, so the change ∆χ in the number χ of factors of N contributed
to a connected graph will be ∆χ = ∆V − ∆I. But the total number of
loops in a connected diagram is L = I − V + 1, so ∆L = ∆I − ∆V , and
thus ∆χ = −∆L. The dominant connected diagrams will thus be those
with ∆L = 0. In other words, these are diagrams with a single quark
loop surrounding a planar mesh of gluon lines, in which the pions add no
additional loops, and therefore can only form trees attached to the quark
loop. (See Figure 1B.) Here ∆χ = 0, and such diagrams therefore make
contributions of order N , just as without pions.
There is, however, a complication here: The addition of pion lines to
an unconnected diagram with C > 1 separate connected parts may yield a
connected diagram — that is, one with a single connected component. In
this case, we have ∆L = ∆I − ∆V + ∆C, where ∆C ≤ 0 is the change
in the number of connected components produced by adding the pion lines,
which if the graph with pions added is to have a single connected component
must be ∆C = 1 − C. Thus the change in the number of factors of N is
∆χ = ∆V − ∆I = −∆L + 1 − C. The leading diagrams for a given C
will again be those with ∆L = 0, and now will have ∆χ = 1 − C. But if
the leading graphs without pion lines have C connected components, they
are of order N^C; that is, they have χ = C. Hence the leading connected
graphs with pions added will have χ + ∆χ = 1, and so will again be of
order N . In contrast to the usual version of large N QCD, in the effective
field theory the leading connected graphs can have any number of quark
loops, each surrounding a planar mesh of gluon lines, but connected in a
tree by single pion lines, not gluon lines. (See Figure 1C.) This allows some
“Zweig-rule-forbidden” processes in leading order. One can have transitions
between ūu and d̄d mesons by having one meson destroyed at one quark
loop and the other created at another quark loop, but only if these mesons
have the quantum numbers of the pion. In leading order there are no pion
lines connecting quark lines within a single quark loop, so pion exchange
has no effect on the spectrum of mesons, other than those with the quantum
numbers of pions. Also, to leading order the renormalization group equation
for the 't Hooft coupling in the effective theory is the same as in QCD, but
the integration constant Λ in the solution of this equation may be smaller,
giving a smaller coupling at any given energy.
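
The counting rules used in the last two paragraphs, ∆χ = ∆V − ∆I and ∆L = ∆I − ∆V + ∆C, can be tried on simple cases. The sketch below is illustrative only; the function name and the two example graphs are assumptions chosen to match the situations described in the text.

```python
def delta_chi(delta_vertices, delta_internal_lines, delta_components=0):
    """Change in the power of N when colorless pion lines are added.

    Uses delta_chi = delta_V - delta_I and
    delta_L = delta_I - delta_V + delta_C, as in the text.
    Returns (delta_chi, delta_loops).
    """
    d_chi = delta_vertices - delta_internal_lines
    d_loops = delta_internal_lines - delta_vertices + delta_components
    return d_chi, d_loops

# One pion exchanged between two points on the same quark loop:
# two new vertices, one pion line plus two extra quark segments.
print(delta_chi(2, 3))        # suppressed: the pion line adds a loop

# One pion line joining two separate quark loops (C = 2 -> 1):
# same vertex and line counts, but the number of components drops by one.
d_chi, d_loops = delta_chi(2, 3, delta_components=-1)
print(d_chi, d_loops, 2 + d_chi)   # starts at chi = C = 2, ends at order N**1
```

In the second example no loop is added, so the connected graph is of order N, which is the tree-of-quark-loops structure of Figure 1C.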
The same analysis applies to the case C = 0. That is, a tree graph
consisting solely of pion lines with vertices given by the purely pionic terms
in Eq. (1) makes a contribution to purely pionic processes of order N , just
like diagrams for the same processes with one or more quark loops.
Unlike the usual experience with effective field theory[2,3], for large N
we have no additional ultraviolet divergences due to loops including pion
fields. The attachment of vertices proportional to Dµ~π to a quark loop
does introduce new ultraviolet divergences, but in such diagrams the pion
field acts just as a classical external field, so the ultraviolet divergences
are limited. There are logarithmic divergences from graphs with four new
vertices in the quark loop, whose form is constrained by chiral symmetry so
that they can be canceled[footnote 2] by the terms in (1) proportional to c1 and c2 .
There are quadratic divergences from graphs with two new vertices in the
quark loop, that can be canceled by renormalization of Fπ . These graphs also
produce logarithmic divergences, that as remarked in [2] can be canceled by
redefinition of the pion field. Finally, gluon corrections to a vertex inserted
in a quark line produce logarithmic divergences that can be canceled by
renormalization of gA . Thus in the large N limit the Lagrangian (1) describes
what is in effect a renormalizable theory. Terms in the Lagrangian with
more quark or Fµν field factors and/or more derivatives are not needed for
renormalization in leading order in N , so such terms may be taken to have
coefficients with sufficient powers of 1/N so that they do not contribute in
leading order.
The dominant graphs remain of order N (or N^2, for reactions involving
only glueballs) if the insertions in quark and gluon lines that we make
to represent the emission and absorption of mesons (other than pions) and
glueballs are bilinear in un-rescaled fields, and hence of order N when expressed
in terms of re-scaled quark and gluon fields, like the terms in (1).
In particular, propagators of these operators are of order N for mesons (as
also for the re-scaled pion field) and of order N^2 for glueballs. But an operator
that is properly normalized to produce physical states must have a
propagator whose residues at one-particle poles are N-independent, so to
form properly normalized operators for creating and destroying pions and
other mesons we must include an additional factor proportional to 1/√N,
while the properly normalized operators for glueballs must include an additional
factor 1/N. The amplitude for reactions whose initial and final states
contain respectively M ≥ 1 and M′ ≥ 1 pions or other mesons and G ≥ 0
and G′ ≥ 0 glueballs is then of order N^{1−M/2−M′/2−G−G′}, as in the usual
case without pions[1]. Since pions count here the same as other mesons,
the usual arguments show that the singularities of scattering amplitudes for
mesons and glueballs consist solely of meson poles in various channels.

Footnote 2: If we did include the field ϕ of an isoscalar pseudoscalar pseudo-Goldstone
boson in the theory, we would also have to include counterterms

$$ -c_3\left(\partial_\mu\varphi\,\partial^\mu\varphi\right)^2 - c_4\left(\partial_\mu\varphi\,\partial^\mu\varphi\right)\left(D_\nu\vec\pi\cdot D^\nu\vec\pi\right) - c_5\left(\partial_\mu\varphi\,\partial_\nu\varphi\right)\left(D^\mu\vec\pi\cdot D^\nu\vec\pi\right), $$

in addition to kinematic and axial vector coupling terms for ϕ.

Figure 2 (panels A, B): Some diagrams of leading order in 1/N for baryon structure.
Notation same as Fig. 1.
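
For orientation, the power of N quoted above can be tabulated for a few processes. This is only a sketch of the stated formula; the function name is an assumption.

```python
from fractions import Fraction

def amplitude_power(m_in, m_out, g_in=0, g_out=0):
    """Exponent p such that the amplitude scales as N**p.

    Encodes N**(1 - M/2 - M'/2 - G - G'): order N for the leading graphs,
    a factor N**(-1/2) per normalized meson (pions included) and
    N**(-1) per normalized glueball.
    """
    if m_in < 1 or m_out < 1:
        raise ValueError("the formula is quoted for M >= 1 and M' >= 1")
    return 1 - Fraction(m_in + m_out, 2) - g_in - g_out

print(amplitude_power(1, 2))            # one meson decaying to two mesons
print(amplitude_power(2, 2))            # two-to-two meson scattering
print(amplitude_power(1, 1, g_out=1))   # meson to meson plus one glueball
```

Each extra meson costs a factor N^{-1/2} and each glueball a factor 1/N, so all of these amplitudes vanish as N grows.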
Finally, let us consider the leading connected terms in the interaction of
the N quarks making up a baryon. Witten[6] has shown that in pure QCD,
without pions, the leading contributions to a connected graph involving n
quarks is of order N^{1−n}, but the number of ways of selecting these n quarks
from the N quarks in the baryon is N!/[(N − n)! n!], which for N ≫ n goes
as N^n, so that the sum of these connected graphs is of order N. In the
effective theory including pions, we have to take into account the possibility
of forming a connected graph from a disconnected diagram with C separate
connected parts, linking them together with pion lines. First consider the
N -dependence of such a disconnected diagram before we add the pion lines.
(See Figure 2A.) If the rth connected part involves n_r quark lines, then the
total contribution of such parts is of order

$$ \prod_{r=1}^{C} N^{1-n_r} \times \frac{N!}{C!\,\left(N-\sum_r n_r\right)!\,\prod_r n_r!} \;\to\; \frac{N^C}{C!\,\prod_r n_r!}, $$

where the first factor is the contribution of the C connected parts, and
the second factor is the number of ways of selecting the quarks in these C
connected parts. Just as we saw in the meson case, the addition of pion
lines to give a connected diagram supplies an additional factor N^{1−C−∆L},
where ∆L is the increase in the number of loops, so the leading graphs are
those in which the addition of pions does not increase the number of loops,
and these graphs are of order N , as in pure QCD. (See Figure 2B.)
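
The large-N behavior of the selection factor used in this argument is easy to check numerically. The sketch below (function name assumed) compares the exact count N!/[(N − n)! n!] with its leading form N^n/n!.

```python
from math import comb, factorial

def selection_ratio(N, n):
    """Ratio of the exact count N!/((N-n)! n!) to its large-N form N**n / n!."""
    return comb(N, n) * factorial(n) / N**n

for N in (3, 30, 300, 10**6):
    # The ratio approaches 1 as N grows with n fixed.
    print(N, selection_ratio(N, 3))
```

For N = 3 and n = 3 the leading form is a poor approximation, which is a reminder that the counting is an asymptotic statement for N ≫ n.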
This picture raises issues of double counting of baryons.[footnote 3] The purely
pionic part of the Lagrangian (1) may have skyrmion solutions, with masses
of order N [6], in addition to the N -quark states described above. Indeed, the
last two terms in (1) are just the sort needed to stabilize the skyrmion. But
although in this theory purely pionic interactions are correctly described at
low energy in the tree approximation by the purely pionic terms in (1), this
is not true at the moderate energies of order ΛQCD probed in the structure
of skyrmions. At such energies, to leading order in N we must also take into
account quark loops, each surrounding a planar mesh of gluon lines, which
can take the place of vertices in a tree of pion lines. Nothing is known about
the existence of skyrmion solutions when such quark loops are taken into
account.
This work leaves open several questions: Can (1) be derived from QCD
by some process of “integrating out” degrees of freedom? If so, what is the
relation between the integration constants Λ for the 't Hooft couplings in
QCD and the effective theory? And will this effective field theory provide a
basis for practical calculations of hadronic phenomena at moderate energies?
I am grateful for valuable conversations with J. Distler, W. Fischler, V.
Kaplunovsky, and J. Meyers. This material is based upon work supported by
the National Science Foundation under Grant Numbers PHY-0969020 and
PHY-0455649 and with support from The Robert A. Welch Foundation,
Grant No. F-0014.

Footnote 3: Ref. [3] has already shown a way that binding a second pion can be avoided.

----------

1. G. 't Hooft, Nucl. Phys. B75, 461 (1974).

2. S. Weinberg, Physica 96A, 327 (1979).

3. A. Manohar and H. Georgi, Nucl. Phys. B234, 189 (1984).

4. E. Witten, Nucl. Phys. B149, 285 (1979); B156, 269 (1979); G.
Veneziano, Nucl. Phys. B159, 213 (1979).

5. S. Weinberg, Phys. Rev. Lett. 65, 1181 (1990).

6. E. Witten, Nucl. Phys. B160, 57 (1979).

UTTG-04-10

Ultraviolet Divergences in Cosmological Correlations


arXiv:1011.1630v2 [hep-th] 19 Dec 2010

Steven Weinberg*
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

A method is developed for dealing with ultraviolet divergences in calculations
of cosmological correlations, which does not depend on dimensional
regularization. An extended version of the WKB approximation is used to
analyze the divergences in these calculations, and these divergences are con-
trolled by the introduction of Pauli–Villars regulator fields. This approach
is illustrated in the theory of a scalar field with arbitrary self-interactions
in a fixed flat-space Robertson–Walker metric with arbitrary scale factor
a(t). Explicit formulas are given for the counterterms needed to cancel all
dependence on the regulator properties, and an explicit prescription is given
for calculating finite regulator-independent correlation functions. The pos-
sibility of infrared divergences in this theory is briefly considered.

* Electronic address: weinberg@physics.utexas.edu

I. INTRODUCTION

Much effort has been expended in recent years in the calculation of quan-
tum effects on cosmological correlations produced during inflation. These
calculations are complicated by the occurrence of ultraviolet divergences,
which have typically been treated by the method of dimensional regular-
ization. Unfortunately, this method has several drawbacks. It is difficult
or impossible to employ dimensional regularization unless the analytic form
of the integrand as a function of wave number is explicitly known, so cal-
culations have generally relied on an assumption of slow roll inflation, or
even strictly exponential inflation. Also, even where an analytic form of the
integrand is known, dimensional regularization can be tricky. Senatore and
Zaldarriaga[1] have shown that there are terms in correlation functions that
were omitted in work by other authors[2],[3].
This article will describe a method of dealing with ultraviolet diver-
gences in cosmological correlations, without dimensional regularization. For
the purposes of regularization of infinities, we employ a generally covariant
version of Pauli–Villars regularization[4]. In order to calculate the coun-
terterms that are needed to cancel infinities when the regulator masses go
to infinity, we introduce an extended version of the WKB approximation
(keeping not only terms of leading order in wavelength), which works well
even when the wave number dependence of the integrand is not explicitly
known, and can therefore be applied for an arbitrary history of expansion
during inflation.
This method is described here in a classic model, the fluctuations of a
real scalar field in a fixed general Robertson–Walker metric. This is simple
enough to illustrate the use of the method without the general idea being
lost in the complications of quantum gravity, and yet sufficiently general
so that we can see how to deal with an arbitrary expansion history. As
we shall see, these methods yield a prescription for calculating correlation
functions that are not only free of ultraviolet divergences, but independent
of the properties of the regulator fields.

II. THE MODEL

We consider the theory of a single real scalar field ϕ(x) in a fixed metric
gµν (x), with Lagrangian density

$$ \mathcal{L} = \sqrt{-{\rm Det}\,g}\,\left[-\frac{1}{2}\,g^{\mu\nu}\,\partial_\mu\varphi\,\partial_\nu\varphi - V(\varphi)\right], \tag{1} $$

where V (ϕ) is a general potential. The modifications in this Lagrangian
needed to introduce counterterms and regulator fields will be discussed in
Sections III and IV, respectively.
This theory will be studied in the case of a general flat-space Robertson–
Walker metric:

$$ g_{00} = -1, \qquad g_{0i} = 0, \qquad g_{ij} = a^2(t)\,\delta_{ij}, \tag{2} $$

with a(t) a fixed function (unrelated to V (ϕ)), which is arbitrary except
that we assume that a(t) increases monotonically from a value that vanishes
for t → −∞. The field equation is then

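$$ \ddot\varphi + 3H\dot\varphi - a^{-2}\nabla^2\varphi + V'(\varphi) = 0, \tag{3} $$

where as usual H ≡ ȧ/a is the expansion rate. We define a fluctuation δϕ
by writing

$$ \varphi(\mathbf{x}, t) = \bar\varphi(t) + \delta\varphi(\mathbf{x}, t), \tag{4} $$

where ϕ̄(t) is a position-independent c-number solution of the field equation:

$$ \ddot{\bar\varphi} + 3H\dot{\bar\varphi} + V'(\bar\varphi) = 0. \tag{5} $$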

Our calculations will be done using an interaction picture, in which the
time-dependence of δϕ is governed by the part of the Hamiltonian quadratic
in δϕ, so that δϕ satisfies a linear differential equation

$$ \delta\ddot\varphi + 3H\,\delta\dot\varphi - a^{-2}\nabla^2\delta\varphi + V''(\bar\varphi)\,\delta\varphi = 0. \tag{6} $$

The commutation relations of δϕ are

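$$ [\delta\varphi(\mathbf{x}, t),\, \delta\dot\varphi(\mathbf{y}, t)] = i\,a^{-3}(t)\,\delta^3(\mathbf{x} - \mathbf{y}), \tag{7} $$

$$ [\delta\varphi(\mathbf{x}, t),\, \delta\varphi(\mathbf{y}, t)] = [\delta\dot\varphi(\mathbf{x}, t),\, \delta\dot\varphi(\mathbf{y}, t)] = 0. \tag{8} $$

The fluctuation can therefore be expressed as

$$ \delta\varphi(\mathbf{x}, t) = \int d^3q\,\left[e^{i\mathbf{q}\cdot\mathbf{x}}\,\alpha(\mathbf{q})\,u_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}}\,\alpha^\dagger(\mathbf{q})\,u_q^*(t)\right], \tag{9} $$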

where α(q) is an operator satisfying the familiar commutation relations

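$$ [\alpha(\mathbf{q}),\, \alpha^\dagger(\mathbf{q}')] = \delta^3(\mathbf{q} - \mathbf{q}'), \qquad [\alpha(\mathbf{q}),\, \alpha(\mathbf{q}')] = 0, \tag{10} $$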

and uq (t) satisfies the differential equation

$$ \ddot u_q + 3H\dot u_q + a^{-2}q^2 u_q + V''(\bar\varphi)\,u_q = 0 \tag{11} $$

and the initial condition, that for t → −∞,

$$ u_q(t) \to \frac{1}{(2\pi)^{3/2}\,a(t)\,\sqrt{2q}}\,\exp\left[iq\int_t^T \frac{dt'}{a(t')}\right], \tag{12} $$

where T is an arbitrary fixed time. (The commutation relations (10) and
the initial condition (12) ensure that the commutation relations (7) and
(8) are satisfied for t → −∞. The three commutators in these commutation
relations satisfy coupled first-order differential equations in time, which with
this initial condition imply that the commutation relations are satisfied for
all times.)
According to the “in–in” formalism[5], the vacuum expectation value of
a product O_H(t) of Heisenberg picture fields and their derivatives, all at
time t, is given by[footnote 1]

$$ \langle O_H(t)\rangle_{\rm VAC} = \left\langle \left[\bar T \exp\left(i\int_{-\infty}^{t} H_I'(t')\,dt'\right)\right] O_I(t) \left[T \exp\left(-i\int_{-\infty}^{t} H_I'(t')\,dt'\right)\right]\right\rangle_0 \tag{13} $$

where ⟨· · ·⟩_0 denotes the expectation value in a bare vacuum state annihilated
by α(q); T and T̄ denote time-ordered and anti-time-ordered products;
O_I(t) is the operator O(t) expressed in terms of interaction picture
fluctuations; and H_I′ is the interaction Hamiltonian, the sum of terms in
the Hamiltonian of third and higher order in the fluctuations, expressed in
terms of the interaction-picture fluctuation δϕ:

$$ H_I' \equiv a^3\int d^3x\,\left[\frac{1}{6}\,V'''(\bar\varphi)\,\delta\varphi^3 + \frac{1}{24}\,V''''(\bar\varphi)\,\delta\varphi^4 + \dots\right] \tag{14} $$

We will evaluate Eq. (13) as an expansion in the number of loops. If
we like, we can introduce a loop-counting parameter g by writing V(ϕ) =
g^{-2} F(gϕ), with F(z) a g-independent function of z, so that the number of
factors of g in a diagram with L loops and E external scalar lines is

\# = 2L - 2 + E . \qquad (15)

Thus an expansion in the number of loops is the same as a series in powers
of g^2.
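The counting behind Eq. (15) can be sketched as follows. With V(ϕ) =
g^{-2} F(gϕ), a vertex joining n lines carries a factor g^{n−2}, and the
propagators carry no factors of g. The helper below (its name and interface
are hypothetical, introduced only for this illustration) counts internal
lines and loops for a connected diagram and returns the resulting power of g,
which can then be compared with 2L − 2 + E.

```python
def power_of_g(vertex_valences, n_external):
    """Return (power of g, number of loops) for a connected diagram.

    vertex_valences lists the number of lines meeting at each vertex;
    a vertex with n lines is assumed to carry g**(n - 2).
    """
    total_ends = sum(vertex_valences)
    if total_ends < n_external or (total_ends - n_external) % 2:
        raise ValueError("line ends cannot be paired into internal lines")
    n_internal = (total_ends - n_external) // 2        # each uses two ends
    n_loops = n_internal - len(vertex_valences) + 1    # connected diagram
    g_power = sum(n - 2 for n in vertex_valences)
    return g_power, n_loops


# Two cubic vertices with two external lines (as in diagrams I and III),
# one quartic vertex with two external lines (diagram II), and a tree-level
# three-point vertex.
for valences, externals in [([3, 3], 2), ([4], 2), ([3], 3)]:
    g_power, loops = power_of_g(valences, externals)
    print(valences, externals, g_power, 2 * loops - 2 + externals)
```

In each case the last two printed numbers coincide, as Eq. (15) requires.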
¹ It will be implicitly understood that the contours of integration over time are distorted
at very early times to provide exponential convergence factors, as described in ref. [3].

III. ONE-LOOP COUNTERTERMS

Infinities are encountered when calculating loop contributions to (13) in
this model. As in flat space, they can be canceled by introducing suitable
counterterms into the Lagrangian. (When regulator fields are introduced,
the counterterms instead cancel dependence on the regulator properties.)
But the Lagrangian cannot know what metric will be adopted, or the
classical field ϕ̄ around which the field ϕ is to be expanded, so neither can the
counterterms. Thus we must return to the generally covariant form (1) of
the Lagrangian in analyzing the possible counterterms that may be needed
and employed.
The general one-loop one-particle-irreducible diagram consists of a loop
into which are inserted a number of vertices, to each of which is attached any
number of external lines. An insertion with N external lines is given by the
(N + 2)th derivative of V(ϕ) with respect to ϕ at ϕ = ϕ̄, so the counterterm
in the Lagrangian can only be a function of V′′(ϕ), and of g_{µν} and its
derivatives. Furthermore, the operators appearing in a counterterm needed
to cancel infinities can only be of dimensionality (in powers of energy) four
or less. But V′′(ϕ) has dimensionality two, so the only generally covariant
counterterm satisfying these conditions is of the form²

\mathcal{L}^{\rm 1\,loop}_{\infty} = \sqrt{-{\rm Det}\,g}\, \left[ A\,V''(\varphi) + B\,[V''(\varphi)]^2 + C\,R\,V''(\varphi) \right] , \qquad (16)

where R is the usual scalar curvature, and A, B, and C are constants that
depend on the cutoff (that is, on the regulator masses), but not on the
potential. Dimensional analysis tells us that in the absence of regulator fields
A is quadratically divergent, while B and C are logarithmically divergent.
If we now specialize to the Robertson–Walker metric (2), and write the
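scalar field as in (4), this counterterm becomes (aside from a c-number term)

\mathcal{L}^{\rm 1\,loop}_{\infty} = a^3 \Big[ A \Big( V'''(\bar\varphi)\,\delta\varphi + \frac{1}{2} V''''(\bar\varphi)\,\delta\varphi^2 + \dots \Big)
    + B \Big( 2 V''(\bar\varphi) V'''(\bar\varphi)\,\delta\varphi + [V'''^2(\bar\varphi) + V''(\bar\varphi) V''''(\bar\varphi)]\,\delta\varphi^2 + \dots \Big)
    - (6\dot{H} + 12H^2)\, C \Big( V'''(\bar\varphi)\,\delta\varphi + \frac{1}{2} V''''(\bar\varphi)\,\delta\varphi^2 + \dots \Big) \Big] . \qquad (17)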
² This argument does not rule out an additional term proportional to
√(−Det g) V′′(ϕ) g^{µν} ∂_µϕ ∂_νϕ, but one-loop diagrams do not generate ultraviolet divergent
terms with spacetime derivatives acting on external line wave functions.

These terms are of one-loop order, and hence to that order are to be used
only in the tree approximation, with a new term in the interaction
Hamiltonian given by

\Delta H_I = - \int d^3x\, \mathcal{L}^{\rm 1\,loop}_{\infty} . \qquad (18)

The terms shown explicitly in Eq. (17) are the only counterterms in Eq. (18)
that contribute in one-loop order to the one-point and two-point functions.

IV. REGULATORS

The counterterm (17) is certainly not the most general counterterm that
would be consistent with the symmetries of the Robertson–Walker metric.
For instance, if we didn’t know anything about general covariance, we would
have no reason to expect that Ḣ and H 2 should occur in the linear com-
bination R = −6Ḣ − 12H 2 . In order to be sure that the divergences we
encounter will be of a form that can be canceled by the counterterm (17),
although we do our calculations for the Robertson–Walker metric (2), we
shall adopt a regulator scheme derived from a generally covariant theory.
The usual approach to this problem is to use dimensional regularization,
which we wish to avoid for reasons given in Section I. There are other meth-
ods of regularization that have been extensively applied to the evaluation of
expectation values of operators like the energy-momentum tensor in curved
spacetimes[6] but not as far as I know to the calculation of cosmological
correlations.
One such method is covariant point-splitting[7]. This method is well
suited to the calculation of expectation values of bilinear operators, where
the ultraviolet divergence arises from the confluence of the arguments of the
two operators. Because it is a covariant method, it can be implemented by a
renormalization of the bilinear operator that respects its transformation and
convergence properties. It seems difficult to apply covariant point-splitting
to the calculation of cosmological correlations, where one integrates over the
separation of the spacetime arguments of the interaction Hamiltonian.
There is another widely used method known as adiabatic regulariza-
tion[8]. In this method, one subtracts from the integrand its asymptotic form
for large wave numbers, as determined by an extended version of the WKB
method. Experience has shown that though not covariant, this method
yields the same results for expectation values of bilinear operators as covari-
ant point-splitting[9]. But adiabatic regularization affects the contribution
of small as well as large internal wave numbers, so it seems unlikely that
it can be applied to the calculation of cosmological correlations, where for
some diagrams the contribution of small virtual wave numbers to
correlation functions depends in a complicated way on external wave numbers, so
that adiabatic regularization cannot be implemented by the introduction of
generally covariant counterterms in the Lagrangian.
We will instead here employ a generally covariant version of Pauli–Villars
regularization[4], which like covariant point splitting and adiabatic regular-
ization has previously been applied to the calculation of expectation values.
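For the theory studied here, the Lagrangian (1) is modified to read

\mathcal{L} = \sqrt{-{\rm Det}\,g}\, \Big[ -\frac{1}{2} g^{\mu\nu} \partial_\mu\varphi\, \partial_\nu\varphi
    - \frac{1}{2} \sum_n Z_n \Big( g^{\mu\nu} \partial_\mu\chi_n\, \partial_\nu\chi_n + M_n^2 \chi_n^2 \Big)
    - V\Big( \varphi + \sum_n \chi_n \Big) \Big] , \qquad (19)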

where χ_n are regulator fields, and Z_n and M_n are real non-zero parameters.
In order to eliminate ultraviolet divergences up to some even order D, we
must take the Z_n and regulator masses M_n to satisfy

\sum_n Z_n^{-1} = -1 , \quad \sum_n Z_n^{-1} M_n^2 = 0 , \quad \sum_n Z_n^{-1} M_n^4 = 0 , \quad \dots , \quad \sum_n Z_n^{-1} M_n^D = 0 . \qquad (20)

For instance, if there were only logarithmic divergences then D = 0, and we
would only need one regulator field, with Z1 = −1. In one-loop calculations
the maximum degree of divergence is quadratic, i.e. D = 2, and to satisfy
the conditions (20) we need at least two regulator fields. In our calculations
we will not need to make a specific choice of the number of regulator fields,
but only assume that there are enough to satisfy Eq. (20).
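To make the conditions (20) concrete, the sketch below solves them exactly
for the weights c_n = 1/Z_n, given a set of distinct squared regulator
masses. It then evaluates a toy "summed propagator" Σ_K Z_K^{-1}/(q² + M_K²),
including the physical field with Z_0 = 1 and M_0 = 0. This flat-space
expression, the function names, and the sample masses are illustrative
assumptions, not formulas from the text; they only show how the conditions
force the leading large-q terms to cancel.

```python
from fractions import Fraction


def regulator_weights(masses_squared):
    """Solve conditions (20) for c_n = 1/Z_n, given distinct M_n**2.

    Uses as many conditions as there are regulators:
    sum c_n = -1 and sum c_n * (M_n**2)**k = 0 for k = 1 .. len - 1.
    """
    m2 = [Fraction(x) for x in masses_squared]
    n = len(m2)
    # Augmented Vandermonde system, one row per power of M**2.
    rows = [[x ** k for x in m2] + [Fraction(-1 if k == 0 else 0)]
            for k in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def summed_propagator(q2, masses_squared, weights):
    """Toy sum over K of c_K / (q^2 + M_K^2), with c_0 = 1 and M_0 = 0."""
    total = Fraction(1) / q2
    for m2, c in zip(masses_squared, weights):
        total += c / (q2 + Fraction(m2))
    return total


weights = regulator_weights([1, 2])      # two regulators, as needed for D = 2
print(weights)                            # the c_n = 1/Z_n
print(summed_propagator(Fraction(10), [1, 2], weights))
```

For squared masses 1 and 2 the toy sum reduces to 2/[q²(q² + 1)(q² + 2)],
which falls off as q^{-6} rather than q^{-2}; this is the sense in which two
regulator fields suffice for quadratic divergences.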
The coefficients A, B, and C in the one-loop counterterm (17) will be
given values depending on the Zn and Mn , such that all expectation values
(13) approach finite limits independent of the Zn and Mn , as the Mn become
infinite. As we will see, this condition not only fixes the terms in A, B, and
C that are proportional to logarithms of regulator masses and the term in
A that is proportional to squares of regulator masses, but also the terms
in A, B, and C that depend on regulator masses only through their ratios,
and hence that remain fixed as the regulator mass scale goes to infinity.
The only terms in A, B, and C that will not be fixed by this condition are
finite terms independent of regulator properties, which of course represent
the freedom we have to change the parameters in the potential or to add a
non-minimal coupling of the scalar field to the curvature.
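The regulator fields χ_n like the physical field ϕ are written as classical
fields plus fluctuations

\chi_n(\mathbf{x},t) = \bar\chi_n(t) + \delta\chi_n(\mathbf{x},t) . \qquad (21)

The classical fields satisfy the coupled field equations

\ddot{\bar\varphi} + 3H\dot{\bar\varphi} + V'\Big( \bar\varphi + \sum_n \bar\chi_n \Big) = 0 \qquad (22)

\ddot{\bar\chi}_n + 3H\dot{\bar\chi}_n + Z_n^{-1} V'\Big( \bar\varphi + \sum_m \bar\chi_m \Big) + M_n^2 \bar\chi_n = 0 . \qquad (23)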

We assume throughout that the regulator masses M_n are all much larger
than H(t′) and |V′′(ϕ̄(t′))|^{1/2} over the whole range from t′ → −∞ to the
time t′ = t at which the correlations are measured. In consequence, the
classical field equations (22) and (23) have a solution in which all the χ̄_n
are less than ϕ̄ by factors of order H²/M_n² and |V′′(ϕ̄)|/M_n², and so may be
neglected. We adopt this solution for the classical fields. In particular, the
field ϕ̄ then satisfies the original classical field equation (5).
In dealing with internal lines, it is convenient to lump together the phys-
ical field fluctuation δϕ and the fluctuations δχn in the regulator fields, by
introducing an index N (and likewise M , etc.) such that δχN is the phys-
ical field fluctuation δϕ for N = 0 and is a regulator field fluctuation for
N = n ≥ 1, both in the interaction picture. The general field fluctuations
satisfy the coupled field equations

\delta\ddot{\chi}_N + 3H\,\delta\dot{\chi}_N - a^{-2}\nabla^2 \delta\chi_N + M_N^2\, \delta\chi_N + Z_N^{-1} V''(\bar\varphi) \sum_M \delta\chi_M = 0 , \qquad (24)

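where Z_0 = 1 and M_0 = 0. The commutation relations of the δχ are

[\delta\chi_N(\mathbf{x},t), \delta\dot{\chi}_M(\mathbf{y},t)] = i\,a^{-3}(t)\,\delta^3(\mathbf{x}-\mathbf{y})\, Z_N^{-1}\, \delta_{NM} , \qquad (25)

[\delta\chi_N(\mathbf{x},t), \delta\chi_M(\mathbf{y},t)] = [\delta\dot{\chi}_N(\mathbf{x},t), \delta\dot{\chi}_M(\mathbf{y},t)] = 0 . \qquad (26)

The general fluctuation can therefore be expressed as

\delta\chi_N(\mathbf{x},t) = \sum_M \int d^3q \left[ e^{i\mathbf{q}\cdot\mathbf{x}}\, \alpha_M(\mathbf{q})\, u^M_{Nq}(t) + e^{-i\mathbf{q}\cdot\mathbf{x}}\, \alpha_M^\dagger(\mathbf{q})\, u^{M*}_{Nq}(t) \right] , \qquad (27)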

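where α_N(q) satisfy the commutation relations

[\alpha_N(\mathbf{q}), \alpha_M^\dagger(\mathbf{q}')] = \delta^3(\mathbf{q}-\mathbf{q}')\, Z_N^{-1}\, \delta_{NM} , \quad [\alpha_N(\mathbf{q}), \alpha_M(\mathbf{q}')] = 0 , \qquad (28)

and the u^M_{Nq}(t) are solutions of Eq. (24):

\ddot{u}^M_{Nq} + 3H\dot{u}^M_{Nq} + a^{-2} q^2 u^M_{Nq} + M_N^2 u^M_{Nq} + Z_N^{-1} V''(\bar\varphi) \sum_L u^M_{Lq} = 0 \qquad (29)

distinguished by the initial condition, that for t → −∞,

u^M_{Nq}(t) \to \frac{\delta^M_N}{(2\pi)^{3/2}\, a^{3/2}(t)\, \sqrt{2\kappa_{Nq}(t)}} \exp\left( -i \int_T^t \kappa_{Nq}(t')\, dt' \right) , \qquad (30)

where

\kappa_{Nq}(t') \equiv \left( \frac{q^2}{a^2(t')} + M_N^2 \right)^{1/2} . \qquad (31)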
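The α_N(q) are all taken to annihilate the vacuum. The two-point functions
appearing in propagators are then given by

\langle \delta\chi_N(\mathbf{x}_1,t_1)\, \delta\chi_M(\mathbf{x}_2,t_2) \rangle_0 = \sum_K \int d^3q\, e^{i\mathbf{q}\cdot(\mathbf{x}_1-\mathbf{x}_2)}\, Z_K^{-1}\, u^K_{Nq}(t_1)\, u^{K*}_{Mq}(t_2) . \qquad (32)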
In calculating one-loop graphs, we must integrate over one or more times
ti associated with vertices, and over a single co-moving wave number q.
There are two ranges of q ≡ |q| where the integrand is greatly simplified.
In the first range, q/a(t) (and hence all q/a(t_i)) is much greater than
H(t′) and |V′′(ϕ̄(t′))|^{1/2} for all t′ ≤ t, as well as much greater than the
physical wave numbers associated with external lines, though q/a(t) is not
necessarily greater than the regulator masses. In this range, we can reli-
ably evaluate the integrand in an extended version of the WKB approxi-
mation, described in an Appendix. Any term that would be convergent in
the absence of cancelations among the physical and regulator fields makes a
negligible contribution to the integral over this range.
In the second range, q/a(t) is much less than the regulator masses,
though it is not necessarily less than H(t′) or |V′′(ϕ̄(t′))|^{1/2} or the physical
wave numbers associated with external lines. In this range, it is safe to ig-
nore the regulator fields. (We do not have to worry about the contribution
of times t′ so much earlier than t that q/a(t′ ) is of the order of the regulator
masses, because this contribution is exponentially suppressed by the rapid
oscillation of the integrand at these early times.)
It is crucially important to our method of calculation that, because we
assume that the regulator masses are much larger than H(t′) and |V′′(ϕ̄(t′))|^{1/2}
and the physical wave numbers associated with external lines, these ranges
of wave number overlap. We can therefore separate the range of
integration of co-moving wave number by introducing a quantity Q in the overlap
region, so that Q/a(t) is much less than all regulator masses, and much
greater than H(t′) and |V′′(ϕ̄(t′))|^{1/2} and the physical wave numbers
associated with external lines. We can evaluate the integral over q ≤ Q ignoring
the regulators, and over q ≥ Q by using the WKB approximation. No errors
are introduced by this procedure in the final result, because we are taking
the regulator masses to be arbitrarily large compared with Q/a(t), which
is taken to be arbitrarily large compared with H(t′) or |V′′(ϕ̄(t′))|^{1/2} for
t′ ≤ t or the physical wave numbers associated with external lines, so terms
proportional to quantities like Q/(M_n a(t)) or H a(t)/Q are entirely negligible.
It should be emphasized that Q is neither an infrared nor an ultraviolet
cutoff, but simply a more-or-less arbitrary point at which we choose to split
the range of integration. As long as Q is chosen in the overlap of the two
regions defined in the previous paragraphs, the sum of the integrals over
q ≤ Q and q ≥ Q will automatically be independent of Q.
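The logic of splitting the integral at Q can be illustrated with a toy
one-dimensional integral. This is only an analogy chosen here because it can
be done in closed form; it is not the integrand of any diagram in this paper.
A light scale m plays the role of H, |V′′|^{1/2} and the external wave
numbers, and a single heavy mass M plays the role of the regulator masses.
Below Q the regulator term is dropped, above Q the light scale is dropped,
and the sum is compared with the exact regulated result ln(M/m).

```python
import math


def exact(m, M):
    """int_0^inf dq q^2 [(q^2 + m^2)^(-3/2) - (q^2 + M^2)^(-3/2)] = ln(M/m)."""
    return math.log(M / m)


def below_Q(m, Q):
    """q < Q piece with the regulator ignored:
    int_0^Q dq q^2 (q^2 + m^2)^(-3/2)."""
    return math.asinh(Q / m) - Q / math.hypot(Q, m)


def above_Q(M, Q):
    """q > Q piece with the light scale ignored:
    int_Q^inf dq q^2 [q^(-3) - (q^2 + M^2)^(-3/2)]."""
    at_infinity = math.log(M / 2.0) + 1.0
    at_Q = math.log(Q) - math.asinh(Q / M) + Q / math.hypot(Q, M)
    return at_infinity - at_Q


m, M = 1.0, 1.0e8
for Q in (1.0e3, 1.0e4, 1.0e5):
    print(Q, below_Q(m, Q) + above_Q(M, Q) - exact(m, M))
```

The printed differences are of order m²/Q² or Q³/M³, so any Q in the overlap
region m ≪ Q ≪ M gives the same answer up to negligible corrections, even
though each piece separately depends on ln Q.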

V. THE TWO-POINT FUNCTION

To demonstrate the use of the methods described in the previous section,
and to evaluate the coefficients A, B, and C in the counterterm (16), we
will now calculate the one-loop corrections to the vacuum expectation value
of the product δϕ_H(y, t) δϕ_H(z, t) of Heisenberg picture fields. This can be
written in terms of a Green’s function G_p(t), as

\langle \delta\varphi_H(\mathbf{y},t)\, \delta\varphi_H(\mathbf{z},t) \rangle_{\rm VAC} = \int d^3p\, \exp\big( i\mathbf{p}\cdot(\mathbf{y}-\mathbf{z}) \big)\, G_p(t) . \qquad (33)

Leaving aside vacuum fluctuations and counterterms, there are three one-
loop diagrams, shown in Figure 1. In this section we will consider only
the one-particle-irreducible diagrams, I and II; these will suffice to allow us
in Section VI to fix the coefficients A, B, and C in the counterterm (16).
Diagram III will be dealt with in Section VII.

[Figure 1: Diagrams for the two-point function (diagrams I, II, and III).]

Diagram I
By the usual rules of the “in–in” formalism, after integrating over spatial
coordinates, the contribution of diagram I to the two-point function is

G^{I}_p(t) = -2(2\pi)^6\, {\rm Re} \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t} dt_2\, a^3(t_2)\, V'''(\bar\varphi(t_2))
    \times \int d^3q \sum_{KLMNM'N'} Z_K^{-1} Z_L^{-1}
    \times \Big[ \theta(t_1 - t_2)\, u_p^2(t)\, u_p^*(t_1)\, u_p^*(t_2)\, u^K_{Mq}(t_1)\, u^{K*}_{M'q}(t_2)\, u^L_{Nq'}(t_1)\, u^{L*}_{N'q'}(t_2)
    - \frac{1}{2} |u_p(t)|^2\, u_p^*(t_1)\, u_p(t_2)\, u^{K*}_{Mq}(t_1)\, u^K_{M'q}(t_2)\, u^{L*}_{Nq'}(t_1)\, u^L_{N'q'}(t_2) \Big] , \qquad (34)

where q ≡ |q| and q ′ ≡ |q − p|. The first term in the square brackets arises
from diagrams in which the vertices come either both from the time-ordered
product or both from the anti-time-ordered product in Eq. (13), while the
second term arises from diagrams in which one vertex comes from the time-
ordered product and the other from the anti-time-ordered product.
As described at the end of the previous section, to calculate G^I_p(t) we
divide the region of integration over q ≡ |q| into the ranges q < Q and
q > Q, where Q is chosen so that Q/a(t) is much less than all regulator
masses but much greater than p/a(t) and H(t′) and |V′′(ϕ̄(t′))|^{1/2} for all
t′ ≤ t.
For q < Q, we can ignore the regulators, and set K, L, M, N, M′, N′
all equal to zero, with u^0_{0q} just equal to the wave function u_q in the absence
of regulators. This contribution takes the form

G^{I,<Q}_p(t) = -2(2\pi)^6\, {\rm Re} \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t} dt_2\, a^3(t_2)\, V'''(\bar\varphi(t_2))
    \times \Big[ \theta(t_1 - t_2)\, u_p^2(t)\, u_p^*(t_1)\, u_p^*(t_2) \int_{q<Q} d^3q\, u_q(t_1)\, u_q^*(t_2)\, u_{q'}(t_1)\, u_{q'}^*(t_2)
    - \frac{1}{2} |u_p(t)|^2\, u_p^*(t_1)\, u_p(t_2) \int d^3q\, u_q^*(t_1)\, u_q(t_2)\, u_{q'}^*(t_1)\, u_{q'}(t_2) \Big] . \qquad (35)

No limit has been put on the second integral over q, because the oscillat-
ing exponentials in uq and uq′ make this integral converge[3], so that the
contribution of wave numbers with q > Q is exponentially small.
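For q ≥ Q, we can use the WKB approximation (30). This contribution
then takes the form

G^{I,>Q}_p(t) = -2\, {\rm Re} \int_{-\infty}^{t} dt_1\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t_1} dt_2\, V'''(\bar\varphi(t_2))\, u_p^2(t)\, u_p^*(t_1)\, u_p^*(t_2)
    \times \sum_{KL} Z_K^{-1} Z_L^{-1} \int_{q>Q} \frac{d^3q}{4 \sqrt{\kappa_{Kq}(t_1)\, \kappa_{Kq}(t_2)\, \kappa_{Lq}(t_1)\, \kappa_{Lq}(t_2)}}
    \times \exp\left( -i \int_{t_2}^{t_1} [\kappa_{Kq}(t') + \kappa_{Lq}(t')]\, dt' \right) . \qquad (36)
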
Note that we have dropped the distinction between q ′ and q, because p is
negligible compared with q for q > Q. We have also dropped the contribution
of the second term in Eq. (34), because this term converges for each K and
L, and so makes a negligible contribution to the integral over values q > Q.
The contribution of values of t2 at any fixed time less than t1 is also
negligible, because of the rapid oscillation of the final factor. But there
is an important contribution from values of t2 that are so close to t1 that
(t1 − t2 )Q/a(t1 ) is not large. This contribution can be evaluated by setting
t2 = t1 everywhere except in the range of integration in the exponential, so
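that

G^{I,>Q}_p(t) = -2\, {\rm Re} \int_{-\infty}^{t} dt_1\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t_1} dt_2\, V'''(\bar\varphi(t_1))\, u_p^2(t)\, u_p^{*2}(t_1)
    \times \sum_{KL} Z_K^{-1} Z_L^{-1} \int_{q>Q} \frac{d^3q}{4\, \kappa_{Kq}(t_1)\, \kappa_{Lq}(t_1)}
    \times \exp\big( -i (t_1 - t_2) [\kappa_{Kq}(t_1) + \kappa_{Lq}(t_1)] \big)
  = - \int_{-\infty}^{t} dt_1\, \big[ V'''(\bar\varphi(t_1)) \big]^2\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big]
    \times \sum_{KL} Z_K^{-1} Z_L^{-1} \int_{q>Q} \frac{d^3q}{2\, \kappa_{Kq}(t_1)\, \kappa_{Lq}(t_1)\, [\kappa_{Kq}(t_1) + \kappa_{Lq}(t_1)]} . \qquad (37)

The integral over q converges because Σ_K Z_K^{-1} = 0. This integral receives
contributions from terms where χ_K and χ_L are both regulator fields χ_m
and χ_n, or are a regulator field χ_n and a physical field χ_0 = ϕ, or are two
physical fields. Adding these contributions gives

G^{I,>Q}_p(t) = \pi \int_{-\infty}^{t} dt_1\, a^3(t_1)\, \big[ V'''(\bar\varphi(t_1)) \big]^2\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big]
    \times \Big[ \sum_{mn} Z_m^{-1} Z_n^{-1}\, \frac{M_n^2 \ln M_n - M_m^2 \ln M_m}{M_n^2 - M_m^2} + 2 \sum_n Z_n^{-1} \ln M_n + \ln\Big( \frac{Q}{a(t_1)} \Big) \Big] . \qquad (38)
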
Note that, because Σ_n Z_n^{-1} = −1, this is independent of the units used to
measure Q and the regulator masses, as long as the same units are used in
all logarithms.
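The units-independence of the square bracket in Eq. (38) can be checked
directly. In the sketch below, the function name, the sample masses, and the
sample weights c_n = 1/Z_n are all illustrative assumptions, and the diagonal
terms m = n of the double sum are taken (also as an assumption) to mean the
limit M_m → M_n, which is ln M_n + 1/2. Rescaling Q and all masses by a common
factor λ shifts the bracket by (1 + Σ_n c_n)² ln λ, which vanishes only when
Σ_n c_n = −1.

```python
import math


def bracket_38(masses, weights, q_over_a):
    """Square bracket of Eq. (38); weights[n] stands for 1/Z_n."""
    total = 0.0
    for m_m, c_m in zip(masses, weights):
        for m_n, c_n in zip(masses, weights):
            if m_m == m_n:
                # Limit M_m -> M_n of the ratio in the double sum.
                ratio = math.log(m_n) + 0.5
            else:
                ratio = ((m_n ** 2 * math.log(m_n) - m_m ** 2 * math.log(m_m))
                         / (m_n ** 2 - m_m ** 2))
            total += c_m * c_n * ratio
    total += 2.0 * sum(c * math.log(m) for m, c in zip(masses, weights))
    return total + math.log(q_over_a)


masses = [1.0e3, 1.5e3]   # sample regulator masses, arbitrary units
lam = 7.0                 # change of units by a common factor
for weights in ([-2.0, 1.0], [-1.0, 1.0]):   # sums are -1 and 0
    shift = (bracket_38([lam * m for m in masses], weights, lam * 10.0)
             - bracket_38(masses, weights, 10.0))
    print(sum(weights), shift)
```

Only the first set of weights, whose sum is −1, gives a shift consistent with
zero; for the second set the shift equals ln λ.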

Diagram II
By the usual rules of the “in–in” formalism, after integrating over spatial
coordinates, the contribution of diagram II to the two-point function (33) is
given by

G^{II}_p(t) = (2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big]
    \times \int d^3q \sum_{KNN'} Z_K^{-1}\, u^K_{Nq}(t_1)\, u^{K*}_{N'q}(t_1) . \qquad (39)


We again divide the range of integration over q ≡ |q| into the ranges q < Q
and q ≥ Q, where Q is chosen so that Q/a(t) is much less than all regulator
masses but much greater than p/a(t) and H(t′) and |V′′(ϕ̄(t′))|^{1/2} for all
t′ ≤ t_1.
For q < Q we can ignore the regulators, and set K, N, and N′ all equal to
zero, with u^0_{0q} just equal to the wave function u_q in the absence of regulators.
This contribution takes the form

G^{II,<Q}_p(t) = (2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big] \int_{q<Q} d^3q\, |u_q(t_1)|^2 . \qquad (40)

For q > Q the individual terms in Eq. (39) are quadratically divergent,
so here we need an extended version of the WKB approximation (30), in
which we keep terms in u of order κ−3/2 and κ−5/2 as well as κ−1/2 . This is
complicated by the presence of the potential term in Eq. (29), which couples
wave functions with different κs. We will deal with this by considering the
potential term in Eq. (29) as a perturbation. Of course, V ′′ (ϕ) is not a per-
turbation; it is of zeroth order in the loop-counting parameter g introduced
at the end of Section II. However, each insertion of V ′′ (ϕ) in the loop in
Diagram II lowers its degree of divergence by two units, so the only terms
we need to consider are those of zeroth and first order in V ′′ (ϕ), which in
the absence of cancelations are quadratically and logarithmically divergent,
respectively. Terms of higher order in V ′′ (ϕ) are convergent even in the
absence of cancelations, and are therefore negligible.
To evaluate the terms in G^{II,>Q}_p(t) of zeroth order in V′′(ϕ̄), we note
that in the absence of the potential, u^K_{Nq}(t_1) is proportional to δ_{KN}:

u^K_{Nq}(t_1) = \delta_{NK}\, u_{Nq}(t_1) , \qquad (41)

where

\ddot{u}_{Nq} + 3H\dot{u}_{Nq} + (q^2/a^2)\, u_{Nq} + M_N^2\, u_{Nq} = 0 . \qquad (42)

This contribution is

G^{II,>Q,0}_p(t) = (2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big]
    \times \sum_N Z_N^{-1} \int d^3q\, |u_{Nq}(t_1)|^2 . \qquad (43)

The integrand is given by an asymptotic expansion derived in the Appendix.
For both q²/a²(t_1) and M_N² much greater than both H²(t_1) and Ḣ(t_1), we
have

|u_{Nq}|^2 \to \frac{1}{2\kappa_{Nq}\, a^3\, (2\pi)^3} \left[ 1 + \frac{\dot{H} + 2H^2}{2\kappa_{Nq}^2} + \frac{(\dot{H} + 3H^2)\, M_N^2}{4\kappa_{Nq}^4} - \frac{5 H^2 M_N^4}{8\kappa_{Nq}^6} \right] , \qquad (44)

where, as before, κ²_{Nq}(t_1) = (q/a(t_1))² + M_N². The integral over q converges
because Σ_N Z_N^{-1} = Σ_N Z_N^{-1} M_N² = 0. The sum over N receives
contributions from terms where χ_N is a regulator field χ_n or the physical field
χ_0 = ϕ. Adding these contributions gives

G^{II,>Q,0}_p(t) = \pi \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big]
    \times \Big[ \sum_n Z_n^{-1} M_n^2 \ln M_n + \big( \dot{H}(t_1) + 2H^2(t_1) \big) \Big( \frac{5}{6} - \sum_n Z_n^{-1} \ln M_n \Big)
    - \frac{Q^2}{a^2(t_1)} - \big( \dot{H}(t_1) + 2H^2(t_1) \big) \ln\Big( \frac{Q}{a(t_1)} \Big) \Big] . \qquad (45)

The regulator-dependent terms arising from diagram II that are of first
order in V′′(ϕ̄) can be calculated by applying the rules of the “in–in”
formalism to a diagram like diagram II, but with a V′′ insertion in the
loop. This gives

G^{II,>Q,1}_p(t) = -(2\pi)^6 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1)) \int_{-\infty}^{t} dt_2\, a^3(t_2)\, V''(\bar\varphi(t_2))
    \times \sum_{KLMNM'N'} Z_K^{-1} Z_L^{-1} \int_{q>Q} d^3q\, {\rm Re} \Big\{ u_p^2(t)\, u_p^*(t_1)\, u_p^*(t_1)
    \times \Big[ \theta(t_1 - t_2)\, u^K_{Mq}(t_1)\, u^{K*}_{M'q}(t_2)\, u^L_{Nq}(t_1)\, u^{L*}_{N'q}(t_2) + 1 \leftrightarrow 2 \Big] \Big\} . \qquad (46)

(This contribution is produced only by terms in which both interactions
come from the time-ordered product in Eq. (13), or both from the
anti-time-ordered product. As in the case of diagram I, the other terms make a
negligible contribution to the part of the integral with q > Q.) The
individual terms in Eq. (46) are only logarithmically divergent, so we can evaluate
this using the leading term (30) in the WKB approximation. Following the
same limiting procedure as for diagram I, we find

G^{II,>Q,1}_p(t) = \pi \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, V''(\bar\varphi(t_1))\, {\rm Im}\big[ u_p^2(t)\, u_p^{*2}(t_1) \big]
    \times \Big[ \sum_{mn} Z_m^{-1} Z_n^{-1}\, \frac{M_n^2 \ln M_n - M_m^2 \ln M_m}{M_n^2 - M_m^2} + 2 \sum_n Z_n^{-1} \ln M_n + \ln\Big( \frac{Q}{a(t_1)} \Big) \Big] . \qquad (47)

Total 1PI Amplitude


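The complete contribution of the two one-particle irreducible diagrams
is given by the sum of the terms (35), (38), (40), (45), and (47):

G^{1PI}_p(t) = -2(2\pi)^6 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t_1} dt_2\, a^3(t_2)\, V'''(\bar\varphi(t_2))
    \times {\rm Re} \Big\{ u_p^2(t)\, u_p^*(t_1)\, u_p^*(t_2) \int_{q<Q} d^3q\, u_q(t_1)\, u_q^*(t_2)\, u_{q'}(t_1)\, u_{q'}^*(t_2) \Big\}
  + (2\pi)^6 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t} dt_2\, a^3(t_2)\, V'''(\bar\varphi(t_2))
    \times |u_p(t)|^2\, {\rm Re} \Big\{ u_p^*(t_1)\, u_p(t_2) \int d^3q\, u_q^*(t_1)\, u_{q'}^*(t_1)\, u_q(t_2)\, u_{q'}(t_2) \Big\}
  + (2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\big\{ u_p^2(t)\, u_p^{*2}(t_1) \big\} \int_{q<Q} d^3q\, |u_q(t_1)|^2
  + \pi \int_{-\infty}^{t} dt_1\, a^3(t_1) \Big( \big[ V'''(\bar\varphi(t_1)) \big]^2 + V''''(\bar\varphi(t_1))\, V''(\bar\varphi(t_1)) \Big)\, {\rm Im}\big\{ u_p^2(t)\, u_p^{*2}(t_1) \big\}
    \times \Big[ \sum_{mn} Z_n^{-1} Z_m^{-1}\, \frac{M_n^2 \ln M_n - M_m^2 \ln M_m}{M_n^2 - M_m^2} + 2 \sum_n Z_n^{-1} \ln M_n + \ln\Big( \frac{Q}{a(t_1)} \Big) \Big]
  + \pi \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\big\{ u_p^2(t)\, u_p^{*2}(t_1) \big\}
    \times \Big[ \sum_n Z_n^{-1} M_n^2 \ln M_n - \frac{Q^2}{a^2(t_1)}
    + \big( \dot{H}(t_1) + 2H^2(t_1) \big) \Big( \frac{5}{6} - \sum_n Z_n^{-1} \ln M_n - \ln\Big( \frac{Q}{a(t_1)} \Big) \Big) \Big] . \qquad (48)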

To repeat, q′ ≡ |q − p|, and Q is any wave number for which Q²/a²(t)
is much larger than H² and |V′′(ϕ̄)| and p²/a²(t) and much less than all
squared regulator masses. In this range, the Q-dependence of the first and third
terms is canceled by the explicit Q-dependence of the fourth and fifth terms.

VI. CANCELING THE REGULATORS

The terms in the counterterm (17) that are quadratic in the fluctuation
make a contribution to the interaction-picture Hamiltonian of the form

\Delta H_I^{\rm quad}(t) = \frac{1}{2}\, \mathcal{G}(t) \int d^3x\, \delta\varphi^2(\mathbf{x},t) , \qquad (49)

where

\mathcal{G} = -a^3 \Big[ A\, V''''(\bar\varphi) + 2B\, [V'''^2(\bar\varphi) + V''(\bar\varphi) V''''(\bar\varphi)] - 6C (\dot{H} + 2H^2)\, V''''(\bar\varphi) \Big] . \qquad (50)

According to the rules of the “in–in” formalism, this makes a contribution
to the two-point function (33) given by

\Delta G^{1PI}_p(t) = 2(2\pi)^3 \int_{-\infty}^{t} dt_1\, \mathcal{G}(t_1)\, {\rm Im}\{ u_p^2(t)\, u_p^{*2}(t_1) \} . \qquad (51)


Comparing Eqs. (50) and (51) with (48), we see that in order to cancel
the dependence of the one-particle irreducible two-point function on the
regulator properties, we need

A = \frac{1}{16\pi^2} \Big[ \sum_n Z_n^{-1} M_n^2 \ln M_n + \mu_A^2 \Big] \qquad (52)

B = \frac{1}{32\pi^2} \Big[ \sum_{nm} Z_n^{-1} Z_m^{-1}\, \frac{M_n^2 \ln(M_n/\mu_B) - M_m^2 \ln(M_m/\mu_B)}{M_n^2 - M_m^2} + 2 \sum_n Z_n^{-1} \ln(M_n/\mu_B) \Big] \qquad (53)

C = -\frac{1}{96\pi^2} \Big( \frac{5}{6} - \sum_n Z_n^{-1} \ln\Big( \frac{M_n}{\mu_C} \Big) \Big) . \qquad (54)

(The first term in Eq. (52) does not depend on the units used for regulator
masses in the logarithm, because Σ_n Z_n^{-1} M_n² = 0.) Here µ_A, µ_B, and µ_C
are unknown mass parameters. The presence of these parameters should
not be seen as a drawback of this method; they reflect the real freedom we
have to add finite regulator-independent terms to the original Lagrangian
proportional to V′′(ϕ) or V′′²(ϕ) or R V′′(ϕ).
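Adding Eqs. (48) and (51) gives our final answer for the one-particle-
irreducible part of the two-point function

G^{1PI}_p(t) + \Delta G^{1PI}_p(t) = \Big[ -2(2\pi)^6 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t_1} dt_2\, a^3(t_2)\, V'''(\bar\varphi(t_2))
    \times {\rm Re} \Big\{ u_p^2(t)\, u_p^*(t_1)\, u_p^*(t_2) \int_{q<Q} d^3q\, u_q(t_1)\, u_q^*(t_2)\, u_{q'}(t_1)\, u_{q'}^*(t_2) \Big\}
    + \pi \int_{-\infty}^{t} dt_1\, a^3(t_1)\, \big[ V'''(\bar\varphi(t_1)) \big]^2\, {\rm Im}\{ u_p^2(t)\, u_p^{*2}(t_1) \}\, \ln\Big( \frac{Q}{a(t_1)\mu_B} \Big) \Big]
  + (2\pi)^6 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1)) \int_{-\infty}^{t} dt_2\, a^3(t_2)\, V'''(\bar\varphi(t_2))
    \times |u_p(t)|^2\, {\rm Re} \Big\{ u_p^*(t_1)\, u_p(t_2) \int d^3q\, u_q^*(t_1)\, u_{q'}^*(t_1)\, u_q(t_2)\, u_{q'}(t_2) \Big\}
  + \Big[ (2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\{ u_p^2(t)\, u_p^{*2}(t_1) \} \int_{q<Q} d^3q\, |u_q(t_1)|^2
    + \pi \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V''''(\bar\varphi(t_1))\, {\rm Im}\{ u_p^2(t)\, u_p^{*2}(t_1) \} \Big\{ -\frac{Q^2}{a^2(t_1)}
    + V''(\bar\varphi(t_1)) \ln\Big( \frac{Q}{a(t_1)\mu_B} \Big) - \big( \dot{H}(t_1) + 2H^2(t_1) \big) \ln\Big( \frac{Q}{a(t_1)\mu_C} \Big) + \mu_A^2 \Big\} \Big] . \qquad (55)
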

For Q²/a²(t) much larger than H², Ḣ, |V′′(ϕ̄)|, and p²/a²(t), all Q
dependence cancels separately within each of the two sets of outer square
brackets in Eq. (55). In this form, the two-point function
(including also the one-particle-reducible contribution discussed in the
following section) can be calculated even if all we have for the wave functions
u_q(t′) is a numerical approximation.

VII. ONE-PARTICLE-REDUCIBLE DIAGRAMS

We now turn to the one-particle-reducible diagram III. In this diagram
the two external lines come together in a three-field vertex, with the third
line terminating either in a three-field vertex to which is attached a scalar
loop or a one-field vertex arising from the part of the one-loop counterterm
(17) that is linear in δϕ. This part of the counterterm is

\Delta H_I^{\rm lin}(t) = F(t) \int d^3x\, \delta\varphi(\mathbf{x},t) , \qquad (56)

with F(t) given by

F = -a^3 \Big[ A\, V'''(\bar\varphi) + 2B\, V''(\bar\varphi) V'''(\bar\varphi) - C (6\dot{H} + 12H^2)\, V'''(\bar\varphi) \Big] . \qquad (57)

This diagram requires special treatment, because the line connecting the two
vertices carries zero three-momentum. For this reason, here we will delay
integrating over the difference x of the spatial coordinate of the two vertices.
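The full one-particle-reducible contribution to the two-point function (33)
is then

G^{1PR}_p(t) = 2(2\pi)^3\, {\rm Re} \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1))\, u_p^2(t)\, u_p^{*2}(t_1)
    \times \int_{-\infty}^{t} dt_2\, I(t_2) \int d^3x\, \Big[ -\langle T\{ \delta\varphi(0,t_1)\, \delta\varphi(\mathbf{x},t_2) \} \rangle_0 + \langle \delta\varphi(\mathbf{x},t_2)\, \delta\varphi(0,t_1) \rangle_0 \Big] , \qquad (58)
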
where

I(t_2) \equiv \frac{1}{2}\, a^3(t_2)\, V'''(\bar\varphi(t_2)) \int d^3q \sum_{KNN'} u^K_{Nq}(t_2)\, u^{K*}_{N'q}(t_2) + F(t_2) . \qquad (59)

In the first term in the square brackets in Eq. (58), both vertices come
from the time-ordered product in Eq. (13), while in the second term,
vertex 1 comes from the time-ordered product and vertex 2 from the
anti-time-ordered product; in the complex conjugate, the time-ordered and
anti-time-ordered products are interchanged.
There is no problem here with ultraviolet divergences coming from the
integral over q. Following the same procedure as in our treatment of diagram
II in the preceding two sections, we have

I(t_2) = \frac{1}{2}\, a^3(t_2)\, V'''(\bar\varphi(t_2)) \Big[ \int_{q<Q} d^3q\, |u_q(t_2)|^2 + \frac{1}{8\pi^2} \Big( -\frac{Q^2}{a^2(t_2)} + V''(\bar\varphi(t_2)) \ln\Big( \frac{Q}{a(t_2)\mu_B} \Big)
    - \big( \dot{H}(t_2) + 2H^2(t_2) \big) \ln\Big( \frac{Q}{a(t_2)\mu_C} \Big) + \mu_A^2 \Big) \Big] , \qquad (60)

where Q is any wave number with Q²/a²(t) much larger than Ḣ(t′) and
H²(t′) and V′′(ϕ̄(t′)) for all t′ ≤ t. All dependence on Q cancels in this
limit.
But there is an apparent problem with infrared effects. Eq. (58) involves
the integrals

\int d^3x\, \langle T\{ \delta\varphi(0,t_1)\, \delta\varphi(\mathbf{x},t_2) \} \rangle_0 \quad {\rm and} \quad \int d^3x\, \langle \delta\varphi(\mathbf{x},t_2)\, \delta\varphi(0,t_1) \rangle_0 .

When we use Eq. (9) for the interaction-picture fields, the integrals over x
pick out the value zero for the wave number q. But the wave function u_q(t) is
not defined in the case q = 0, because in this case there is of course no time
early enough so that q²/a²(t) is much larger than H²(t) and |V′′(ϕ̄(t))|. For
the same reason, the argument for the Bunch–Davies condition α(q)Φ_0 = 0
breaks down for q = 0.
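Fortunately, we need the integrals over x only in the combination

\int d^3x\, \Big[ -\langle T\{ \delta\varphi(0,t_1)\, \delta\varphi(\mathbf{x},t_2) \} \rangle_0 + \langle \delta\varphi(\mathbf{x},t_2)\, \delta\varphi(0,t_1) \rangle_0 \Big] = i\,\theta(t_1 - t_2)\, G(t_1,t_2) \qquad (61)

where

G(t_1,t_2) \equiv i \int d^3x\, \Big\langle \big[ \delta\varphi(0,t_1) , \delta\varphi(\mathbf{x},t_2) \big] \Big\rangle_0 . \qquad (62)
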
Despite the ambiguity in u_0(t) and the inapplicability of the Bunch–Davies
condition for q = 0, the function G(t_1, t_2) is perfectly well-defined. It is the
solution of the second-order differential equation

\left[ \frac{d^2}{dt_1^2} + 3H(t_1) \frac{d}{dt_1} + V''(\bar\varphi(t_1)) \right] G(t_1,t_2) = 0 , \qquad (63)

subject to initial conditions dictated by the commutation relations (7) and
(8):

G(t_2,t_2) = 0 , \qquad (64)

\left[ \frac{d}{dt_1} G(t_1,t_2) \right]_{t_1 = t_2} = a^{-3}(t_2) . \qquad (65)

The only property of the vacuum state used here is that it has zero
momentum and unit norm. The general solution is

G(t_1,t_2) = u(t_1)\, u(t_2) \int_{t_2}^{t_1} \frac{dt}{a^3(t)\, u^2(t)} , \qquad (66)

where u(t) is any solution of the q = 0 wave equation

\ddot{u} + 3H\dot{u} + V''(\bar\varphi)\, u = 0 , \qquad (67)

that does not vanish between t_1 and t_2. (For instance, for a general
potential and a de Sitter metric, we can take u = dϕ̄/dt, which does not vanish in
typical inflationary models.) Putting this together, we have the
one-particle-reducible contribution to the two-point function (33):

G^{1PR}_p(t) = -2(2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1))\, {\rm Im}\{ u_p^2(t)\, u_p^{*2}(t_1) \}
    \times \int_{-\infty}^{t_1} dt_2\, G(t_1,t_2)\, I(t_2) . \qquad (68)
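The general solution (66) can be checked numerically in a simple special
case. The sketch below assumes a de Sitter metric a(t) = exp(Ht) with H = 1
and a constant V′′ = 2 (both illustrative choices made here, as are the
function names), for which u(t) = exp(−t) is a non-vanishing solution of
Eq. (67). It evaluates Eq. (66) by Simpson quadrature, compares the result
with the closed form available in this special case, and tests the initial
conditions (64) and (65).

```python
import math

H = 1.0     # assumed de Sitter expansion rate
LAM = -1.0  # root of lam**2 + 3*H*lam + V'' = 0 for the assumed V'' = 2


def a(t):
    """Scale factor a(t) = exp(H t)."""
    return math.exp(H * t)


def u(t):
    """A solution of the q = 0 equation (67) that never vanishes."""
    return math.exp(LAM * t)


def G(t1, t2, steps=2000):
    """Eq. (66), with the integral done by composite Simpson quadrature."""
    h = (t1 - t2) / steps  # steps must be even

    def f(t):
        return 1.0 / (a(t) ** 3 * u(t) ** 2)

    s = f(t2) + f(t1)
    s += 4 * sum(f(t2 + (2 * k - 1) * h) for k in range(1, steps // 2 + 1))
    s += 2 * sum(f(t2 + 2 * k * h) for k in range(1, steps // 2))
    return u(t1) * u(t2) * s * h / 3


def G_exact(t1, t2):
    """Closed form of Eq. (66) for this special case only."""
    return math.exp(-t1 - 2 * t2) - math.exp(-2 * t1 - t2)


t1, t2, eps = 1.5, 0.5, 1e-5
print(G(t1, t2), G_exact(t1, t2))
print(G(t2, t2))                                        # condition (64)
print((G(t2 + eps, t2) - G(t2 - eps, t2)) / (2 * eps),  # condition (65)
      a(t2) ** -3)
```

The closed form is a combination of exp(−t_1) and exp(−2t_1), the two
solutions of Eq. (63) in this special case, so Eq. (63) is satisfied as well.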

VIII. THE ONE-POINT FUNCTION

In Section II we defined δϕ as the departure of the field ϕ from its
classical value ϕ̄, not from its mean value, so we must expect δϕ to have a
non-vanishing expectation value. As we will see, this is closely related to
quantities calculated in the previous section.
According to the general diagrammatic rules, the vacuum expectation
value of the Heisenberg picture scalar field fluctuation in one-loop order is

\langle \delta\varphi_H(\mathbf{y},t) \rangle^{\rm one\ loop}_{\rm VAC} = -i \int d^3x_1 \int_{-\infty}^{t} dt_1\, \langle \delta\varphi(\mathbf{y},t)\, \delta\varphi(\mathbf{x}_1,t_1) \rangle_0\, I(t_1) + {\rm c.c.} , \qquad (69)

with I given by Eq. (60) representing the insertion of a loop or a counterterm
at the end of the single incoming line. In the term shown in Eq. (69) the
single vertex comes from the time-ordered product in Eq. (13); in its complex
conjugate, the vertex comes from the anti-time-ordered product. The two
terms together involve the commutator of the field perturbations, so the
one-point function may be written in terms of the function G defined by
Eq. (62):

\langle \delta\varphi_H(\mathbf{y},t) \rangle^{\rm one\ loop}_{\rm VAC} = -\int_{-\infty}^{t} dt_1\, G(t,t_1)\, I(t_1) . \qquad (70)

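We see now that the contribution (68) of the one-particle-reducible diagrams
to the two-point function may be simply expressed in terms of the mean
fluctuation:

G^{1PR}_p(t) = 2(2\pi)^3 \int_{-\infty}^{t} dt_1\, a^3(t_1)\, V'''(\bar\varphi(t_1))\, \langle \delta\varphi_H(0,t_1) \rangle^{\rm one\ loop}_{\rm VAC}\, {\rm Im}\{ u_p^2(t)\, u_p^{*2}(t_1) \} . \qquad (71)

(The factor 2(2π)^3 follows from inserting Eq. (70) into Eq. (68).) This is
the same as would be given by adding an interaction obtained by
shifting δϕ by its expectation value:

\Delta H_I(t) = \frac{1}{2}\, a^3(t)\, V'''(\bar\varphi(t))\, \langle \delta\varphi_H(0,t) \rangle^{\rm one\ loop}_{\rm VAC} \int d^3x\, \delta\varphi^2(\mathbf{x},t) . \qquad (72)
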
IX. INFRARED DIVERGENCES?

Although the model treated in this paper is intended to provide an
illustration of a method of dealing with ultraviolet divergences, it may be of
some interest to look into the possible presence of infrared divergences in
this model. For any fixed co-moving wave number q, the evolution of the
wave function u_q(t) defined by Eqs. (11) and (12) becomes q-independent
once q/a(t) drops below H(t), so the behavior of the wave function for fixed
t and q → 0 is determined by the behavior of V′′(ϕ̄(t′)) and H(t′) for t′ → 0.
We can distinguish two cases in which this problem is greatly simplified.

Expansion-dominated:
If V′′(ϕ(t′)) ≪ H²(t′) for t′ → 0, then as long as this inequality is satisfied,
we can drop the potential term in Eq. (11), which then becomes the same
as the differential equation for tensor fluctuations. It is well known[10]
in this case that if Ḣ(t′) → −εH²(t′) as t′ → 0, then the wave function
u_q(t1) at a fixed time t1 goes as q^{−3/2−ε} for q/a(t1) ≪ H(t1). This
q-dependence is unaffected even if H²(t) drops below V′′(ϕ(t)) at some time
after q/a drops below H, since the evolution of the wave function at such
times is q-independent. So (taking ε < 1) the integral over q of the product
u_q(t1)u∗_q(t2) in the propagator will be infrared divergent if and only if ε ≥ 0.
(We have been assuming that as time passes fluctuations leave the horizon
rather than entering it, so this discussion is limited to the case ε < 1. For
the case ε ≥ 1, see ref. [11].) There is no infrared divergence in the unlikely
event that the expansion rate increases at very early times.
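
The counting behind this criterion can be written out in one line. The following is only a sketch: it keeps the small-q scaling of the wave function quoted above, sets the plane-wave factor in the propagator to unity for q → 0, and cuts the integral off at an arbitrary upper limit q_0, a symbol introduced here purely for illustration.

```latex
\int_{q<q_0} d^3q\; u_q(t_1)\,u_q^*(t_2)
  \;\propto\; \int_0^{q_0} q^2\,dq\; q^{-3-2\epsilon}
  \;=\; \int_0^{q_0} \frac{dq}{q^{1+2\epsilon}}
  \;=\;
  \begin{cases}
    \text{finite}, & \epsilon < 0,\\
    \text{logarithmically divergent}, & \epsilon = 0,\\
    \text{power divergent}, & 0 < \epsilon < 1 .
  \end{cases}
```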

Potential-dominated:
If V′′(ϕ(t′)) ≫ H² for t′ → 0, then as long as this inequality is satisfied,
Eqs. (11) and (12) have a WKB solution
$$u_q(t') \simeq \frac{1}{(2\pi)^{3/2}\, a^{3/2}(t')\, \sqrt{2\omega(t')}}\, \exp\left(i\int_{t'}^{T} \omega(t'')\, dt''\right), \tag{73}$$

where T is arbitrary, and


$$\omega(t') \equiv \sqrt{\left(\frac{q}{a(t')}\right)^2 + V''\big(\varphi(t')\big)}\,. \tag{74}$$
Once q/a(t′) falls below |V′′(ϕ(t′))|, the wave function u_q(t′) becomes
independent of q, aside from a q-dependent phase that is independent of t′. Later,
H²(t′) may or may not become comparable to or greater than V′′(ϕ(t′)),
but this cannot affect the q-dependence of the wave function. Therefore
when the potential dominates at very early times, the product uq (t1 )u∗q (t2 )
in the propagator at fixed times t1 and t2 becomes q-independent for q → 0,
and there is no infrared divergence when we integrate the propagator over
q.

X. FURTHER ISSUES

The method described here can of course be applied in this model to all
one-loop correlation functions. The same counterterms, given by Eqs. (16)
or (17) and (52)–(54) will remove dependence on the regulator properties,
because the only ultraviolet divergences in one-loop one-particle-irreducible
diagrams occur in the one-point and two-point functions, which we have
already discussed in Sections V through VIII. The only ultraviolet diver-
gences in higher correlation functions arise in diagrams in which trees are
attached to loops at either one or two vertices, and the divergences in these
loops are just those with which we have dealt. Multi-loop graphs are more
challenging.

Beyond the simple model discussed here, of a scalar field in a fixed met-
ric, there is the more realistic problem of scalar and tensor fluctuations in
a theory of coupled scalar and gravitational fields. This is more compli-
cated, because even in one-loop order there are quartic as well as quadratic
and logarithmic ultraviolet divergences. That alone should not prevent the
method described here from being applicable to realistic theories, at least
for one-loop graphs, since divergences of any order can be eliminated by
including enough regulator fields.
A more serious problem is the difficulty of introducing regulator fields for
the graviton propagator. (This problem is of course avoided in theories with
large numbers of matter fields, where matter loops dominate over graviton
loops.) If the only vertices that involve gravitons have a single graviton line
attached to matter lines, then we can introduce regulators for the gravi-
ton propagator by coupling heavy tensor fields with suitable Z-factors to
the energy-momentum tensor. But it is not clear how to deal with graphs
containing vertices to which are attached two or more graviton lines.
This raises the question whether Pauli–Villars regularization is really
necessary. The final results (55) and (68) for the one-particle irreducible
and reducible parts of the two-point function could almost have been guessed
without introducing regulator fields. It would only be necessary to intro-
duce an ultraviolet cut-off at a sufficiently large co-moving wave number
Q, calculate the Q-dependence of the resulting two-point function by using
the WKB methods described in this paper, and then introduce a countert-
erm of form (16), with A, B, and C chosen as functions of Q to cancel the
Q-dependence found in this way. (This is not the adiabatic regularization
procedure mentioned in Section IV, even though both procedures use WKB
methods, because with a cut-off at Q only the part of the integrand for
internal wave numbers larger than Q is affected.) Of course, this procedure
leaves finite terms in A, B, and C undetermined, but they are undetermined
anyway, since they represent the real possibility of changing the original La-
grangian by adding corrections to the potential and adding a coupling of
the scalar field to the spacetime curvature. The cut-off introduced in this
way would not respect general covariance, but apparently one would get the
correct results (55) and (68) anyway.
There is something mysterious about this. The actual calculations in
this paper were done for a fixed Robertson–Walker metric, Eq. (2). They
would have been done in the same way by someone who had never heard
of general covariance. Yet the infinities turned out to depend on H and
Ḣ only in the combination Ḣ + 2H 2 , proportional to the scalar spacetime
curvature. We can understand this for a generally covariant regularization

procedure, like Pauli–Villars regularization, because in that case general
covariance is broken only by the background, which presumably does not
affect ultraviolet divergences. But how do these calculations know that they
are supposed to give infinities that can be canceled by counterterms that
are generally covariant, when we use a non-covariant cutoff on the internal
wave number instead of introducing regulator fields?
ACKNOWLEDGMENTS
I am grateful for discussions with Joel Meyers, Emil Mottola, and Richard
Woodard. This material is based upon work supported by the National Sci-
ence Foundation under Grant Numbers PHY-0969020 and PHY-0455649
and with support from The Robert A. Welch Foundation, Grant No. F-
0014.
APPENDIX: THE EXTENDED WKB APPROXIMATION
We wish to find an asymptotic expression for the solution u_q(t) of the
differential equation
$$\ddot u_q(t) + 3H(t)\,\dot u_q(t) + \frac{q^2}{a^2(t)}\, u_q(t) + M^2 u_q(t) = 0 \tag{A.1}$$
subject to the initial condition that, for t → 0,
$$u_q(t) \to \frac{1}{(2\pi)^{3/2}\, a(t)\, \sqrt{2q}}\, \exp\left(iq\int_t^T \frac{dt'}{a(t')}\right). \tag{A.2}$$

(The effects of the potential are treated separately in Section V.) We are
interested in the behavior of uq (t) at a fixed time t, when q/a(t) is much
larger than H(t), but not necessarily greater than M .
As an ansatz, we take
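$$u_q(t) \to \frac{1}{(2\pi)^{3/2}\, a^{3/2}(t)\, \sqrt{2\kappa(t)}}\, \exp\left(i\int_t^T \kappa(t')\,dt'\right) \left[1 + \frac{f(t)}{\kappa(t)} + \frac{g(t)}{\kappa^2(t)} + O(\kappa^{-3})\right] \tag{A.3}$$
with f, g, etc. of zeroth order in q and M, and
$$\kappa(t) \equiv \sqrt{q^2/a^2(t) + M^2}\,. \tag{A.4}$$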
This clearly satisfies the initial condition (A.2). The differential equation
(A.1) is satisfied by (A.3) to order κ^{3/2} and κ^{1/2}, while the terms in (A.1)
of order κ^{−1/2} (counting M as being the same order as κ) give
$$\frac{d}{dt}\left(\frac{f}{\kappa}\right) = \frac{i}{2\kappa}\left[\dot H + 2H^2 + \frac{3H^2M^2}{2\kappa^2} - \frac{5M^4H^2}{4\kappa^4} + \frac{\dot H M^2}{2\kappa^2}\right]. \tag{A.5}$$
The terms in (A.1) of order κ^{−3/2} are more complicated, but fortunately
we only need these terms in |uq |2 , and for this purpose we can avoid having
to work out these terms by using the time-dependence of the Wronskian:
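$$u_q^*\,\dot u_q - u_q\,\dot u_q^* \propto \frac{1}{a^3}\,. \tag{A.6}$$
Using (A.3) gives
$$2(2\pi)^3 a^3 \left(u_q^*\,\dot u_q - u_q\,\dot u_q^*\right) = -2i - \frac{4i\,{\rm Re}\,f}{\kappa} + \frac{2i}{\kappa}\frac{d}{dt}\left(\frac{{\rm Im}\,f}{\kappa}\right) - \frac{2i\,|f|^2}{\kappa^2} - \frac{4i\,{\rm Re}\,g}{\kappa^2} + O(\kappa^{-3})\,. \tag{A.7}$$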
Now, Eq. (A.5) shows that d/dt(f /κ) is imaginary, so since f (t)/κ(t)
vanishes for t → 0, f (t)/κ(t) and hence f (t) is imaginary for all t. The
first term on the right-hand side of Eq. (A.7) is constant, and the second
term vanishes, so the constancy of this quantity requires the vanishing of
the terms of order κ^{−2}:
$$|f|^2 + 2\,{\rm Re}\,g = \kappa\,\frac{d}{dt}\left(\frac{{\rm Im}\,f}{\kappa}\right). \tag{A.8}$$
But this is just what we need, for Eq. (A.3) (with f imaginary) gives
$$|u_q(t)|^2 \to \frac{1}{2\kappa(t)\,(2\pi)^3 a^3(t)}\left[1 + \frac{|f(t)|^2 + 2\,{\rm Re}\,g(t)}{\kappa^2(t)}\right]. \tag{A.9}$$

Together with Eqs. (A.5) and (A.8), this gives the result used in evaluating
diagram II in Section V:
$$|u_q|^2 \to \frac{1}{2\kappa a^3 (2\pi)^3}\left[1 + \frac{\dot H + 2H^2}{2\kappa^2} + \frac{(\dot H + 3H^2)M^2}{4\kappa^4} - \frac{5H^2M^4}{8\kappa^6}\right]. \tag{A.10}$$
———-

1. L. Senatore and M. Zaldarriaga, [0912.2734].

2. S. Weinberg, Phys. Rev. D 74, 023508 (2006) [hep-th/0605244]; K.
Chaicherdsakul, Phys. Rev. D 75, 063522 (2007) [hep-th/0611352];
D. Seery, JCAP 0711, 025 (2007) [0707.3377]; E. Dimastrogiovanni
and N. Bartolo, JCAP 0811, 016 (2008) [0807.2709]; P. Adshead, R.
Easther, and E. A. Lim, Phys. Rev. D 79, 063504 (2009) [0809.4008];
X. Gao and F. Xu, JCAP 0907, 042 (2009) [0905.0405]; D. Campo,
[0908.3642].

3. P. Adshead, R. Easther, and E. A. Lim, [0904.4207].

4. W. Pauli and F. Villars, Rev. Mod. Phys. 75, 434 (1949). Pauli-
Villars regularization has been applied to the problem of calculating
the expectation value of the energy-momentum tensor in a curved
spacetime, by C. Bernard and A. Duncan, Ann. Phys. 107, 201 (1977)
and A. Vilenkin, Nuovo Cimento 44 A, 441 (1977), but not as far as
I know in the more complicated problem of calculating cosmological
correlations. The present work was done as a result of preparing a
course on quantum field theory given in Spring 2010.

5. J. Schwinger, Proc. Nat. Acad. Sci. US 46, 1401 (1960); J. Math.
Phys. 2, 407 (1961); K. T. Mahanthappa, Phys. Rev. 126, 329
(1962); P. M. Bakshi and K. T. Mahanthappa, J. Math. Phys. 4,
1, 12 (1963); L. V. Keldysh, Soviet Physics JETP 20, 1018 (1965);
D. Boyanovsky and H. J. de Vega, Ann. Phys. 307, 335 (2003); B.
DeWitt, The Global Approach to Quantum Field Theory (Clarendon
Press, Oxford, 2003): Sec. 31. For a review, with applications to
cosmological correlations, see S. Weinberg, Phys. Rev. D 72, 043514
(2005) [hep-th/0506236].

6. For a general survey, see N. D. Birrell and P. C. W. Davies, Quantum
Fields in Curved Space (Cambridge University Press, 1982).

7. S. M. Christensen, Phys. Rev. D 14, 2490 (1976).

8. L. Parker and S. A. Fulling, Phys. Rev. D 341 (1974); N. D. Birrell,
Proc. Roy. Soc. (London) B361, 513 (1978); T. S. Bunch, J. Phys. A
13, 1297 (1980).

9. P. R. Anderson and L. Parker, Phys. Rev. D 36, 2963 (1987); S. Habib,
C. Molina-Paris, and E. Mottola, Phys. Rev. D 61, 024010 (1999); P.
Anderson, W. Eaker, S. Habib, C. Molina-Paris, and E. Mottola, Phys.
Rev. D 62, 124019 (2000).

10. For a textbook treatment, see S. Weinberg, Cosmology (Oxford
University Press, 2008), Sec. 10.3.

11. T. M. Janssen, S. P. Miao, T. Prokopec, and R. P. Woodard, Class.
Quant. Grav. 25, 245013 (2008) [0808.2449].

UTTG-18-11

Collapse of the State Vector


arXiv:1109.6462v4 [quant-ph] 3 Jul 2012

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

Modifications of quantum mechanics are considered, in which the state vector
of any system, large or small, undergoes a stochastic evolution. The
general class of theories is described, in which the probability distribution
of the state vector collapses to a sum of delta functions, one for each possible
final state, with coefficients given by the Born rule.


∗ Electronic address: weinberg@physics.utexas.edu

I. INTRODUCTION

There is now in my opinion no entirely satisfactory interpretation of
quantum mechanics[1]. The Copenhagen interpretation[2] assumes a myste-
rious division between the microscopic world governed by quantum mechan-
ics and a macroscopic world of apparatus and observers that obeys classical
physics. During measurement the state vector of the microscopic system
collapses in a probabilistic way to one of a number of classical states, in a
way that is unexplained, and cannot be described by the time-dependent
Schrödinger equation. The many-worlds interpretation[3] assumes that the
state vector of the whole of any isolated system does not collapse, but evolves
deterministically according to the time-dependent Schrödinger equation. In
such a deterministic theory it is hard to see how probabilities can arise. Also,
the branching of the world into vast numbers of histories is disturbing, to say
the least. The decoherent histories approach[4] like the Copenhagen inter-
pretation gives up on the idea that it is possible to completely characterize
the state of an isolated system at any time by a vector in Hilbert space,
or by anything else, and instead provides only a set of rules for calculating
the probabilities of certain kinds of history. This avoids inconsistencies, but
without any objective characterization of the state of a system, one wonders
where the rules come from.
Faced with these perplexities, one is led to consider the possibility that
quantum mechanics needs correction. There may be a Hilbert space vector
that completely characterizes the state of a system, but that suffers an in-
herently probabilistic physical collapse, not limited as in the Copenhagen
interpretation to measurement by a macroscopic apparatus, but occurring
at all scales, though presumably much faster for large systems. From time
to time specific models for this sort of collapse have been proposed[5]. In the
present article we will consider the properties of theories of the stochastic
evolution of the state vector in a more general formalism. We assume that
this evolution depends only on the state vector, with no hidden variables. In
contrast to earlier work, we concentrate on the linear first-order differential
equation that in general describes the evolution of the probability distribu-
tion of the state vector in Hilbert space. We find conditions on this evolution
so that it leads to final states with probabilities given by the Born rule of
ordinary quantum mechanics. This general formalism is also applied to the
special case of a state vector that evolves through quantum jumps. Theories
of the evolution of the density matrix are examined as an important special
case of the more general formalism.

II. EVOLUTION OF THE STATE VECTOR’S PROBABILITY
DENSITY

We consider a general isolated system, which may or may not include
a macroscopic measuring apparatus and/or an observer. We assume as in
ordinary quantum mechanics that the state of the system is entirely de-
scribed by a vector in Hilbert space. The state vector here is taken in a
sort of Heisenberg picture, in which operators A(t) have a time dependence
dictated by the Hamiltonian H as exp(iHt)A(0) exp(−iHt). But the state
vector in this sort of theory is not time-independent; it undergoes a stochas-
tic evolution, slow for microscopic systems but rapid for larger systems, so
that at any time t there is a probability P (ψ, t)dψ for the wave function
to be in a small volume dψ around any value ψ. Here we are adopting a
basis that is so far arbitrary, labeled by a discrete index i, so that ψ is an
abbreviation for the whole set of components ψ_i and ψ_i∗, constrained by the
normalization condition Σ_i |ψ_i|² = 1, and dψ is defined as
$$d\psi \equiv \delta\Big(1 - \sum_i |\psi_i|^2\Big) \prod_i d|\psi_i|^2\; d\,{\rm Arg}\,\psi_i\,, \tag{1}$$
a measure invariant under unitary transformations of the ψ_i. The continuum
case will be considered later, in Sec. IV.
We assume time-translation invariance, so that if the wave function at
time t has a definite value ψ, then at a later time t′ the probability density
at ψ ′ will be some function Π(ψ ′ , ψ, t′ − t) of ψ ′ , of ψ, and of the elapsed
time t′ − t, but not separately of t or t′ . It follows then from the rules
of probability that if at time t the wave function has a probability density
P (ψ, t), then at time t′ the probability density will be
$$P(\psi',t') = \int d\psi\; \Pi(\psi',\psi,t'-t)\, P(\psi,t)\,. \tag{2}$$
Differentiating with respect to t′ and then setting t′ = t gives our fundamental
differential equation for the evolution of the probability density:
$$\frac{d}{dt}P(\psi',t) = \int d\psi\; K_{\psi',\psi}\, P(\psi,t)\,, \tag{3}$$
where K is the kernel
$$K_{\psi',\psi} \equiv \left[\frac{d}{d\tau}\,\Pi(\psi',\psi,\tau)\right]_{\tau=0}\,, \tag{4}$$
which depends on the details of the system under study, including any mea-
suring apparatus that the system may contain. Eq. (3) resembles the time-
dependent Schrödinger equation with K in place of −iH, because both follow
from time-translation invariance, but Eq. (3) describes the evolution of the
probability density in Hilbert space rather than of the state vector, and so
K is real rather than anti-Hermitian. Like the time-dependent Schrödinger
equation, Eq. (3) neither violates nor guarantees Lorentz invariance. Presumably,
in a Lorentz-invariant theory, K would be accompanied by other
kernels that describe how probabilities change with the position of the ob-
server.
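
The step from Eq. (2) to Eq. (3) can be spelled out as follows. This is a sketch that assumes only what is stated above, together with the natural boundary condition that the transition density reduces to the unit kernel 1_{ψ′,ψ} of Eq. (11) at zero elapsed time and is normalized in ψ′ for every elapsed time:

```latex
\frac{\partial}{\partial t'}P(\psi',t')
  = \int d\psi\;
    \left.\frac{\partial\,\Pi(\psi',\psi,\tau)}{\partial\tau}\right|_{\tau=t'-t}
    P(\psi,t)
  \;\xrightarrow{\;t'\to t\;}\;
  \int d\psi\; K_{\psi',\psi}\,P(\psi,t) ,
\qquad
\Pi(\psi',\psi,0) = 1_{\psi',\psi} ,
\qquad
\int d\psi'\,\Pi(\psi',\psi,\tau) = 1
  \;\Longrightarrow\;
\int d\psi'\,K_{\psi',\psi} = 0 .
```

The last implication is the probability-conservation condition that reappears as Eq. (20).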
The solution of Eq. (3) is of course
$$P(\psi',t) = \int d\psi\; \left[e^{Kt}\right]_{\psi',\psi} P(\psi,0)\,, \tag{5}$$
with the exponential of Kt defined as usual by its power series expansion.
To evaluate this exponential, we let fN (ψ) be the linearly independent right-
eigenfunctions of K:
$$\int d\psi\; K_{\psi',\psi}\, f_N(\psi) = -\lambda_N f_N(\psi')\,, \tag{6}$$
with eigenvalues −λN. Because there is no need for K to be Hermitian,
some of the eigenfunctions and eigenvalues may be complex, but because K
is real, any complex eigenfunctions and eigenvalues must come in complex
conjugate pairs.
We will assume that the fN (ψ) form a complete set. This is the generic
case; other cases can be handled by letting some eigenvalues and eigenfunc-
tions of K merge with each other. Where the fN form a complete set we
may write the kernel as

$$K_{\psi',\psi} = -\sum_N \lambda_N\, f_N(\psi')\, g_N(\psi)\,, \tag{7}$$

where gN (ψ) are some coefficient functions, not related in any simple way
to fN (ψ). The eigenvalue condition (6) requires that
$$\int d\psi\; g_M(\psi)\, f_N(\psi) = \delta_{NM}\,. \tag{8}$$
Then gN will be a left-eigenfunction of K, also with eigenvalue −λN:
$$\int d\psi'\; g_N(\psi')\, K_{\psi',\psi} = -\lambda_N\, g_N(\psi)\,. \tag{9}$$
(Eq. (7) does not define gN in the case λN = 0; in this case the definition
is provided by Eqs. (8) and (9).) The completeness relation for the fN can
then be expressed as
$$1_{\psi',\psi} = \sum_N f_N(\psi')\, g_N(\psi) \tag{10}$$
where 1ψ′,ψ is defined so that, for any smooth function F(ψ),
$$\int d\psi\; 1_{\psi',\psi}\, F(\psi) = F(\psi')\,. \tag{11}$$

It is elementary then to use the power series expansion for the exponential
to calculate that
$$\left[e^{Kt}\right]_{\psi',\psi} = \sum_N e^{-\lambda_N t}\, f_N(\psi')\, g_N(\psi)\,. \tag{12}$$
The probability distribution for the wave function is therefore
$$P(\psi,t) = \sum_N e^{-\lambda_N t}\, f_N(\psi) \int d\psi'\; g_N(\psi')\, P(\psi',0)\,. \tag{13}$$

(Where the fN miss being a complete set by a finite number of terms, the
exponentials are in general accompanied with polynomial functions of time.)
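
To make Eqs. (6)–(13) concrete, here is a minimal sketch in which the space of wave functions is replaced by just two configurations, so that K becomes a 2×2 rate matrix. This finite stand-in, the rates a and b, and all function names are assumptions made for illustration only; they are not part of the formalism above. The sketch builds the right- and left-eigenvectors by hand, checks the biorthogonality condition (8), and compares the mode sum (12) with the power-series definition of the exponential.

```python
import math

def two_state_modes(a, b):
    """Eigen-data of K = [[-a, b], [a, -b]] acting on column vectors (P_0, P_1).

    Returns a list of (lam, f, g) with K f = -lam f, g K = -lam g,
    and g_M . f_N = delta_MN, mirroring Eqs. (6), (8), and (9).
    """
    s = a + b
    zero_mode = (0.0, [b / s, a / s], [1.0, 1.0])
    decaying_mode = (s, [1.0, -1.0], [a / s, -b / s])
    return [zero_mode, decaying_mode]

def exp_from_modes(modes, t):
    """[e^{Kt}]_{ij} = sum_N exp(-lam_N t) f_N[i] g_N[j], as in Eq. (12)."""
    return [[sum(math.exp(-lam * t) * f[i] * g[j] for lam, f, g in modes)
             for j in range(2)] for i in range(2)]

def exp_from_series(K, t, terms=60):
    """Power-series definition of e^{Kt}, used here as an independent check."""
    n = len(K)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (K t)^k / k!
        term = [[sum(term[i][m] * K[m][j] for m in range(n)) * t / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

if __name__ == "__main__":
    a, b, t = 0.3, 0.7, 2.0
    K = [[-a, b], [a, -b]]
    print(exp_from_modes(two_state_modes(a, b), t))
    print(exp_from_series(K, t))
```

In this toy case the zero mode f = (b, a)/(a + b) is the late-time distribution, and its partner g = (1, 1) expresses conservation of total probability.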

III. LIMIT OF EVOLUTION

It is clear that in order for the probability distribution to approach any
sort of limit for t → ∞, all the eigenvalues −λN must have non-positive real
parts; that is, ReλN ≥ 0. If we assume that the non-zero values of ReλN are
bounded below by some positive minimum, then the probability distribution
becomes dominated by the zero modes: for t → ∞
$$P(\psi,t) \to \sum_n f_n(\psi) \int d\psi'\; g_n(\psi')\, P(\psi',0)\,, \tag{14}$$

where n runs over the values of N for which λN = 0. (The contribution of
eigenmodes with ReλN = 0 but ImλN ≠ 0 presumably oscillates so rapidly
as t → ∞ as to be unobservable.) The fn (ψ) can be regarded as fixed points
of the differential equation (3). The magnitude of the non-zero eigenvalues
depends on the nature of the system in question. Presumably where a system
is large, as in measurement by a macroscopic apparatus, the values of the

non-zero eigenvalues are large, in which case the approach to the limit (14)
is exponentially fast.
Although the limit of the probability distribution for t → ∞ depends
only on the zero-modes fn and gn , in general to calculate the evolution of
the probability distribution for finite times we need to know all the eigen-
functions fN and gN . But the whole time dependence of the probability
distribution can be calculated in terms of the zero modes in the special case
in which all non-zero λN are equal, say to λ. Then Eq. (12) gives
$$\left[e^{Kt}\right]_{\psi',\psi} = \sum_n f_n(\psi')\, g_n(\psi) + e^{-\lambda t} \sum_\nu f_\nu(\psi')\, g_\nu(\psi)\,,$$
where ν runs over the values of N for which λN ≠ 0. The completeness
relation (10) gives
$$\sum_\nu f_\nu(\psi')\, g_\nu(\psi) = 1_{\psi',\psi} - \sum_n f_n(\psi')\, g_n(\psi)$$

so
$$\left[e^{Kt}\right]_{\psi',\psi} = \left[1 - e^{-\lambda t}\right] \sum_n f_n(\psi')\, g_n(\psi) + e^{-\lambda t}\, [1]_{\psi',\psi}\,,$$
and the probability distribution is
$$P(\psi,t) = P(\psi,0)\, e^{-\lambda t} + \left[1 - e^{-\lambda t}\right] \sum_n f_n(\psi) \int d\psi'\; g_n(\psi')\, P(\psi',0)\,, \tag{15}$$
in which we can see explicitly how the probability distribution approaches
the limit (14) for t → ∞.
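
Eq. (15) can be checked numerically in a toy setting. The sketch below assumes a space of only four configurations, two zero modes with made-up left-eigenvector weights, and a single common decay rate; every number and name in it is hypothetical. It builds K = −λ(1 − Σ_n f_n g_n), integrates dP/dt = KP by a Runge-Kutta method, and compares the result with the closed form (15).

```python
import math

LAM = 2.0  # common decay rate lambda (illustrative value)
# Zero modes on a toy space of four configurations (all values hypothetical).
F = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]   # right zero modes f_n
G = [[1.0, 0.0, 0.7, 0.2], [0.0, 1.0, 0.3, 0.8]]   # left zero modes g_n

def kernel():
    """K = -lambda * (1 - sum_n f_n g_n): every non-zero eigenvalue is -lambda."""
    return [[-LAM * (float(i == j) - sum(F[m][i] * G[m][j] for m in range(2)))
             for j in range(4)] for i in range(4)]

def apply(K, P):
    """Matrix-vector product K P."""
    return [sum(K[i][j] * P[j] for j in range(len(P))) for i in range(len(P))]

def evolve(P0, t, steps=2000):
    """Integrate dP/dt = K P with classical fourth-order Runge-Kutta steps."""
    K, P, h = kernel(), P0[:], t / steps
    for _ in range(steps):
        k1 = apply(K, P)
        k2 = apply(K, [p + 0.5 * h * k for p, k in zip(P, k1)])
        k3 = apply(K, [p + 0.5 * h * k for p, k in zip(P, k2)])
        k4 = apply(K, [p + h * k for p, k in zip(P, k3)])
        P = [p + h * (a + 2 * b + 2 * c + d) / 6
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P

def closed_form(P0, t):
    """Right-hand side of Eq. (15)."""
    weights = [sum(g * p for g, p in zip(G[n], P0)) for n in range(2)]
    decay = math.exp(-LAM * t)
    return [decay * P0[i]
            + (1 - decay) * sum(F[n][i] * weights[n] for n in range(2))
            for i in range(4)]

if __name__ == "__main__":
    P0 = [0.1, 0.2, 0.4, 0.3]
    print(evolve(P0, 1.5))
    print(closed_form(P0, 1.5))
```

Because the two rows of G sum to one in every column, total probability is conserved, and the overlaps g_n · P stay fixed in time while P relaxes onto the zero modes.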
The kernel K (including the zero modes fn and gn along with the non-
zero eigenvalues −λν ) depends on the details of the system in question, as
well as depending on the as yet mysterious dynamics of the collapse process.
Consider a system containing a subsystem with a complete set of commuting
observables whose eigenvalues are labeled by an index n, and a measuring
apparatus that through a unitary evolution of the whole system becomes
entangled with the subsystem in such a way that, when the subsystem is in
the nth eigenstate of the observables, the apparatus is in a unique state. It
is convenient to perform a unitary transformation to a new basis, in which
ϕn is the component of the state vector along such a joint state of the whole
system. In order to reproduce the results of the Copenhagen interpretation
the probability distribution at late times must relax to a sum over n of terms
proportional to Π_{m≠n} δ(|ϕm|²), so that only ϕn is allowed to be non-zero in
the nth term. To reproduce the Born rule, the coefficient of the nth term
must be proportional to the initial value of |ϕn|². Comparing with Eq. (14),
we see that the zero modes here can be labeled with the same index n, with
$$f_n(\varphi) = F_n({\rm Arg}\,\varphi_n) \prod_{m\neq n} \delta(|\varphi_m|^2)\,, \qquad g_n(\varphi) = |\varphi_n|^2\,, \tag{16}$$
where Fn(θ) is an unknown function satisfying ∫_0^{2π} Fn(θ) dθ = 2π. The
normalization of these zero modes has been chosen to be consistent with Eq. (8),
which requires that ∫dϕ fn(ϕ) gm(ϕ) = δnm. Also, since it is only fn gn that
enter in this requirement, we have made an arbitrary choice of a convenient
normalization for fn, thus fixing the normalization of gn. According
to Eq. (14), the probability density at late times becomes
$$P(\varphi,t) \to \sum_n F_n({\rm Arg}\,\varphi_n) \left[\prod_{m\neq n} \delta(|\varphi_m|^2)\right] \int d\varphi'\; |\varphi'_n|^2\, P(\varphi',0)\,. \tag{17}$$

Note that here Eqs. (9) and (16) give, for each n and ϕ
$$\int d\varphi'\; |\varphi'_n|^2\, K_{\varphi',\varphi} = 0\,. \tag{18}$$
This implies the time-independence of the quantity
$$P_n \equiv \int d\varphi\; |\varphi_n|^2\, P(\varphi,t)\,. \tag{19}$$

This makes sense, because Pn according to the Born rule is the probability
that, when the collapse is finished, the state of the system will be found
in the basis state n, and this of course must be independent of t. Since
Σ_n |ϕ′n|² = 1, the sum of Eq. (18) over n yields
$$\int d\varphi'\; K_{\varphi',\varphi} = 0\,, \tag{20}$$
which is the condition that Eq. (3) respects the conservation of the total
probability ∫dϕ P(ϕ, t).
In usual measurements, the measuring apparatus does not evolve into a
unique state when the subsystem is in the nth eigenstate of a set of observ-
ables, but into any one of a number of apparatus states, labeled with another
index r. It is convenient again to choose a corresponding basis, so that the
components of the wave function are labeled ϕnr, with Σ_nr |ϕnr|² = 1. In
this case, assuming all apparatus states r for a given subsystem state n are
equally probable, consistency with the results of the Copenhagen interpretation
and the condition ∫dϕ fn gm = δnm requires that
$$f_n(\varphi) = F_n \prod_{r,\,m\neq n} \delta(|\varphi_{mr}|^2)\,, \qquad g_n(\varphi) = \sum_r |\varphi_{nr}|^2\,, \tag{21}$$

where Fn is an unknown function of the phases of all ϕnr, whose average over
phases is unity, and the individual normalization of fn and gn has again been
chosen for convenience. The individual probabilities ∫gn(ϕ)P(ϕ)dϕ and the
total probability ∫P(ϕ)dϕ are conserved here for the same reason as before.
From the point of view adopted here, there is nothing special about
measurement. Measurement is just a process in which the state vector of a
system (typically microscopic) becomes entangled with the state vector of
a relatively large system, which then undergoes a collapse to an eigenstate
of some operators determined by the characteristics of that system. So we
expect that the state vector of any system undergoes a similar collapse, but
one that is much faster for large systems. But collapse to what? Without
attempting a precise general prescription, we have in mind that these are the
sorts of states familiar in classical physics. For instance, in a Stern-Gerlach
experiment, they would be states in which a macroscopic detector registers
that an atom has a definite trajectory, not a superposition of trajectories.
In Schrödinger’s macabre thought experiment[6], they are states in which
the cat is alive, or dead, but not a superposition of alive and dead. These
states are like the “pointer states” of Zurek[7], but here these basis states
are determined by the physics of the assumed collapse of the state vector,
rather than by the decoherence produced by interaction with small external
perturbations.

IV. CONTINUUM STATES

It is straightforward to adapt this formalism to the continuum case,
where the wave functions depend on a continuous variable x rather than
a discrete label i. In the continuum case, we take ψ as an abbreviation
for the functions ψ(x) and ψ∗(x), normalized so that ∫dx |ψ(x)|² = 1; the
probability distribution P[ψ, t] and the kernel Kψ,ψ′ are functionals of these
functions; and ∫dψ is a functional integral, with a normalization that can
be chosen as convenience dictates. There is no reason here to expect a
gap between the zero and non-zero eigenvalues of K, and in the example
discussed in Sec. VII there is no such gap, so we will not here bother
to separate the zero-modes from the eigenfunctionals of K with non-zero
eigenvalue. The kernel can be expressed as
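$$K_{\psi',\psi} = -\int dN\; \lambda_N\, f_N[\psi']\, g_N[\psi]\,, \tag{22}$$
where
$$\int d\psi\; K_{\psi',\psi}\, f_N[\psi] = -\lambda_N f_N[\psi']\,, \qquad \int d\psi\; g_{N'}[\psi]\, f_N[\psi] = \delta(N'-N)\,. \tag{23}$$
Using the completeness relation
$$1_{\psi',\psi} = \int dN\; f_N[\psi']\, g_N[\psi]\,, \tag{24}$$
we have
$$\left[e^{Kt}\right]_{\psi',\psi} = \int dN\; f_N[\psi']\, g_N[\psi]\, e^{-\lambda_N t} \tag{25}$$
and the probability distribution at time t is
$$P[\psi,t] = \int dN\; f_N[\psi]\, e^{-\lambda_N t} \int d\psi'\; g_N[\psi']\, P[\psi',0]\,. \tag{26}$$
As before, to avoid runaway solutions we need to assume that ReλN ≥ 0 for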
all eigenvalues, in which case with increasing time Eq. (26) is increasingly
dominated by the eigenmodes with smallest λN . But without a gap between
zero and non-zero eigenvalues, the probability distribution may not approach
any specific limit exponentially as t → ∞.

V. QUANTUM JUMPS

Our discussion so far has been very general, not dependent on any specific
picture of the evolution of the state vector. We can be a little more concrete,
by assuming that the wave function undergoes a series of quantum jumps,
from ψ to Jψ, where J is a non-linear operator depending on one or more
random parameters. If the rate of jumps is Γ, and the wave function at some
time t is ψ′, then at a slightly later time t + dt the probability distribution at
ψ is (1 − Γdt) 1ψ,ψ′ + Γdt ⟨1ψ,Jψ′⟩, where brackets indicate an average over the
random parameters on which the operator J depends. Hence the evolution
of the probability distribution is given by Eq. (3), with kernel
$$K_{\psi,\psi'} = -\Gamma \left[1_{\psi,\psi'} - \langle 1_{\psi,J\psi'}\rangle\right]. \tag{27}$$
We note in particular that, for any function (or functional) g(ψ) of the wave
function, we have
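$$\int d\psi\; g(\psi)\, K_{\psi,\psi'} = -\Gamma \left[g(\psi') - \langle g(J\psi')\rangle\right], \tag{28}$$
so the condition for g(ψ) to be a left eigenfunction of the kernel is that
$$\langle g(J\psi)\rangle = \Lambda\, g(\psi)\,, \tag{29}$$
in which case the corresponding eigenvalue is
$$\lambda = -\Gamma\,(1 - \Lambda)\,. \tag{30}$$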

The left-eigenfunctions of the kernel with zero eigenvalue are those functions
g(ψ) that on average are unaffected by quantum jumps.
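
A simple illustration of such a zero mode: suppose, purely as an assumption for this sketch, that the jump operator projects a discrete-basis wave function onto basis state m with probability |ψ_m|². Then g(ψ) = |ψ_n|² satisfies Eq. (29) with Λ = 1, so its eigenvalue (30) vanishes, and the collapse frequencies follow the Born rule. The jump rule, function names, and sample state below are hypothetical and are not taken from any specific collapse model discussed in the text.

```python
import random

def jump(psi, rng):
    """Hypothetical jump operator J: project psi onto basis state m, chosen
    with probability |psi_m|^2 (the random parameter on which J depends)."""
    weights = [abs(c) ** 2 for c in psi]
    m = rng.choices(range(len(psi)), weights=weights)[0]
    return [1.0 + 0j if i == m else 0j for i in range(len(psi))]

def mean_after_jump(psi, n):
    """Exact <g(J psi)> for g(psi) = |psi_n|^2, averaging over the outcome m."""
    return sum(abs(c) ** 2 * (1.0 if m == n else 0.0)
               for m, c in enumerate(psi))

def collapse_frequencies(psi, trials, seed=0):
    """Fraction of trials in which a single jump lands on each basis state."""
    rng = random.Random(seed)
    counts = [0] * len(psi)
    for _ in range(trials):
        collapsed = jump(psi, rng)
        counts[max(range(len(psi)), key=lambda i: abs(collapsed[i]))] += 1
    return [c / trials for c in counts]

if __name__ == "__main__":
    psi = [0.6 + 0j, 0.8j]                  # |psi_0|^2 = 0.36, |psi_1|^2 = 0.64
    print(mean_after_jump(psi, 0))          # equals g(psi), so Lambda = 1
    print(collapse_frequencies(psi, 20000))
```

With this J the collapse is complete after a single jump; models with gradual localization would use a different J, but the eigenvalue condition (29) is tested in the same way.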

VI. DENSITY MATRIX

The class of theories presented here is more general than any based
on an assumed differential equation for the density matrix, as there is much
more information contained in the probability distribution P(ψ) than in
the density matrix. (For instance, for a system with two discrete states,
the density matrix is specified by only three real parameters, while the
probability distribution is an unknown real function of one modulus and
two phases.) The density matrix is defined in a general discrete basis by
$$\rho_{ij}(t) \equiv \int d\psi\; P(\psi,t)\, \psi_i \psi_j^*\,. \tag{31}$$
In particular, Eq. (3) gives the rate of change of the density matrix
$$\frac{d}{dt}\rho_{ij}(t) = \int d\psi \int d\psi'\; K_{\psi,\psi'}\, P(\psi',t)\, \psi_i \psi_j^*\,. \tag{32}$$
In order for the right-hand side to be expressible in terms of ρ, we would
need the space of bilinear functions of ψ to be invariant under the left action
of the kernel K:
$$\int d\psi\; K_{\psi,\psi'}\, \psi_i \psi_j^* = \sum_{i'j'} \kappa_{ij,i'j'}\, \psi'_{i'} \psi'^{*}_{j'}\,. \tag{33}$$
This condition is preserved if we make a change of basis by a unitary
transformation of the wave function, of course with a transformed κ matrix. Not
all conceivable kernels satisfy a condition like Eq. (33). Where this condition
holds, the density matrix obeys the differential equation
$$\frac{d}{dt}\rho_{ij}(t) = \sum_{i'j'} \kappa_{ij,i'j'}\, \rho_{i'j'}(t)\,. \tag{34}$$

There are reasons to suppose that this must be the case. It is a familiar
feature of quantum mechanics that different statistical ensembles of indi-
vidual states can yield the same density matrix. Gisin[8] has shown that
for any two such ensembles of states of a given physical system that have
the same density matrix ρ, it is always possible to invent a second isolated
physical system that can be entangled with the first, in such a way that
measurements in the second system can drive the first system to one or the
other of the two ensembles with density matrix ρ. This does not lead to any
possibility of communication between the two systems, provided the den-
sity matrix contains all information concerning any possible observation of
the first system, and provided that the subsequent evolution of the density
matrix depends only on the density matrix, not on the particular statistical
ensemble it represents. But if Eq. (33) were not satisfied, then the evolution
of the density matrix would depend on the specific statistical ensemble of
state vectors, not just on the density matrix, and instantaneous communi-
cation between isolated systems would be possible.
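
The statement that different ensembles can share one density matrix is easy to verify for a two-state system. The sketch below applies Eq. (31) to a probability distribution concentrated on a finite set of state vectors; the two ensembles and the function name are illustrative choices made here, not examples from the text.

```python
import math

def density_matrix(ensemble):
    """rho_ij = sum over the ensemble of p * psi_i * conj(psi_j): Eq. (31) for a
    probability distribution concentrated on a finite set of state vectors."""
    dim = len(ensemble[0][1])
    rho = [[0j] * dim for _ in range(dim)]
    for p, psi in ensemble:
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * psi[i] * psi[j].conjugate()
    return rho

s = 1 / math.sqrt(2)
# Equal mixture of the two basis states
ensemble_z = [(0.5, [1 + 0j, 0j]), (0.5, [0j, 1 + 0j])]
# Equal mixture of their symmetric and antisymmetric superpositions
ensemble_x = [(0.5, [s + 0j, s + 0j]), (0.5, [s + 0j, -s + 0j])]

if __name__ == "__main__":
    print(density_matrix(ensemble_z))
    print(density_matrix(ensemble_x))
```

Both ensembles give half the unit matrix, although their probability distributions P(ψ) are supported on entirely different state vectors. A kernel violating Eq. (33) could evolve them into different density matrices, which is the situation the argument above excludes.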
Where the probability distribution approaches the limit (17) at late time,
in the basis ϕn described in Section III, the density matrix becomes diagonal
$$\rho_{nm} \equiv \int d\varphi\; P(\varphi)\, \varphi_n \varphi_m^* \to P_n\, \delta_{nm} \tag{35}$$
where the Pn are constants given by Eq. (19). Of course, by a unitary
transformation the density matrix can be put in a diagonal form at any
time, but with diagonal elements and in a basis that change with time.
Eq. (35) tells us that the density matrix approaches a diagonal form
in a fixed basis and with fixed diagonal elements, equal to the expectation
values (19) of the density matrix at any time.

VII. THE GRW CASE

Finally, it is interesting to examine how the theory proposed in the well-known
paper of Ghirardi, Rimini, and Weber[5] (henceforth GRW) appears
in the more general formalism presented here. GRW suggested a stochastic
evolution of the state vector, leading to its localization, and expressed their

model in a differential equation for the density matrix for a single particle
(written here using the Heisenberg picture described above, and in a notation
slightly different from that of GRW)

$$\frac{d}{dt}\rho_{x',x}(t) = -\omega \left[1 - e^{-\alpha(x'-x)^2/2}\right] \rho_{x',x}(t)\,, \tag{36}$$
with ω > 0 and α > 0. (Here x is the eigenvalue of the one-dimensional
Heisenberg-picture position operator x̂(t) = x̂(0) + p̂t/m.) Thus the condi-
tion (33) here reads
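$$\int d\psi\; K_{\psi,\psi'}\, \psi(x')\psi^*(x) = -\omega \left[1 - e^{-\alpha(x'-x)^2/2}\right] \psi'(x')\psi'^*(x)\,, \tag{37}$$
so the kernel has eigenvalues
$$-\lambda_{xx'} = -\omega \left[1 - e^{-\alpha(x'-x)^2/2}\right] \le 0\,, \tag{38}$$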

with left-eigenfunctionals

gxx′ [ψ] = ψ(x′ )ψ ∗ (x) . (39)

But this does not determine the kernel, since without changing Eq. (37) we
can change K by adding any kernel K^{(0)} for which
$$\int d\psi\; K^{(0)}_{\psi,\psi'}\, \psi(x)\psi^*(x') = 0\,. \tag{40}$$
The zero modes (among others that would depend on K^{(0)}) are the gxx′[ψ]
given by Eq. (39), with x = x′ . This is a case where there is no gap between
the negative eigenvalues and zero, and the probability distribution does not
approach any definite limit, though the density matrix becomes increasingly
diagonal as t → ∞.
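
As a small numerical illustration (a sketch, not part of the original analysis): Eq. (36) is diagonal in the labels x′, x, so each matrix element evolves independently as ρ(t) = ρ(0) exp(−λ t) with the rate λ of Eq. (38). The function names and the parameter values ω = 1, α = 2 below are arbitrary choices made only for this example.

```python
import math

def grw_rate(x_prime, x, omega, alpha):
    """Decay rate lambda_{xx'} = omega * (1 - exp(-alpha (x'-x)^2 / 2)), Eq. (38)."""
    return omega * (1.0 - math.exp(-alpha * (x_prime - x) ** 2 / 2.0))

def grw_element(rho0, x_prime, x, t, omega=1.0, alpha=2.0):
    """Solution of Eq. (36) for a single matrix element with initial value rho0."""
    return rho0 * math.exp(-grw_rate(x_prime, x, omega, alpha) * t)

if __name__ == "__main__":
    for t in (0.0, 1.0, 10.0):
        diag = grw_element(1.0, 0.3, 0.3, t)   # x' = x: rate is exactly zero
        off = grw_element(1.0, 0.3, 1.3, t)    # x' != x: exponential decay
        print(f"t={t:5.1f}  diagonal={diag:.6f}  off-diagonal={off:.6f}")
```

The diagonal elements are untouched, while the rate for nearly coincident points tends continuously to zero; this is the absence of a gap noted above.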
Bell[5] subsequently gave formulas for a jump operator J and for the
probability distribution for the random parameter in J that would yield the
GRW equation (36) for the evolution of the density matrix. (Bell’s formulas
do not follow uniquely from the GRW equation (36) for the evolution of the
density matrix, but they do follow from other assumptions in the GRW paper.)
In the one-particle one-dimensional case Bell’s results (in a somewhat
different notation) give
\[ [J_\xi \psi](x) = j(x-\xi)\,\psi(x)/R(\psi,\xi) , \tag{41} \]
where j(x) = (2α/π)^{1/4} exp(−αx²); ξ is a random parameter with
probability density R²(ψ, ξ), and R(ψ, ξ) is determined by the normalization
condition on Jψ:
\[ R^2(\psi,\xi) = \int d^3x\, |j(x-\xi)\,\psi(x)|^2 . \tag{42} \]

Using Eq. (27), and setting the jump frequency Γ equal to ω, we can use
Eq. (41) to find the kernel K. Because of the ψ-dependence of R(ψ, ξ),
it is not so easy here to find general solutions of the eigenvalue condition
(29). But it is easy to see that Eq. (29) is satisfied for the functionals
gx,x′[ψ] ≡ ψ(x′)ψ∗(x) for arbitrary x and x′. In these cases the factors 1/R
in [Jξ ψ](x′) and [Jξ ψ]∗(x) are cancelled by the probability distribution R²
for ξ, and we find Λ = exp(−α(x − x′)²/2). Using Eq. (30) then shows that
ψ(x′ )ψ ∗ (x) are left-eigenfunctionals of K with eigenvalues (38), so Eq. (37)
is satisfied, and this yields the GRW equation (36).
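
The value of Λ quoted above can be checked numerically. The sketch below assumes, as our reading of the cancellation of the 1/R factors, that Λ reduces to the overlap integral of j(x − ξ) and j(x′ − ξ) over ξ, with j the Gaussian given after Eq. (41). The function names, the integration window, and the grid size are illustrative choices.

```python
import math

def j(x, alpha):
    """Localization function j(x) = (2 alpha / pi)^(1/4) exp(-alpha x^2)."""
    return (2.0 * alpha / math.pi) ** 0.25 * math.exp(-alpha * x * x)

def overlap(x, x_prime, alpha, half_width=20.0, n=40001):
    """Trapezoidal estimate of the integral over xi of j(x - xi) j(x' - xi)."""
    centre = 0.5 * (x + x_prime)
    start = centre - half_width
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        xi = start + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * j(x - xi, alpha) * j(x_prime - xi, alpha)
    return total * h

def expected(x, x_prime, alpha):
    """Closed form exp(-alpha (x - x')^2 / 2) quoted in the text."""
    return math.exp(-alpha * (x - x_prime) ** 2 / 2.0)

if __name__ == "__main__":
    print(overlap(0.2, 1.1, 1.5), expected(0.2, 1.1, 1.5))
```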
I am grateful to James Hartle for several helpful conversations about this
work and about the interpretation of quantum mechanics, and to Jacques
Distler for valuable comments. I also thank Angelo Bassi, Gian Carlo Ghi-
rardi and Roderich Tumulka for helpful correspondence about an earlier
version of this paper. This material is based upon work supported by the
National Science Foundation under Grant Number PHY-0969020 and with
support from The Robert A. Welch Foundation, Grant No. F-0014.

———

1. This point involves too many issues to be treated adequately here.
The author’s views on the present state of quantum mechanics are
spelled out in detail in Section 3.7 of Lectures on Quantum Mechanics,
(Cambridge University Press, 2012), to be published.

2. N. Bohr, Nature 121, 580 (1928), reprinted in Quantum Theory and
Measurement, eds. J. A. Wheeler and W. H. Zurek (Princeton University
Press, Princeton, NJ, 1983); Essays 1958–1962 on Atomic Physics
and Human Knowledge (Interscience, New York, 1963).

3. The published version is H. Everett, Rev. Mod. Phys. 29, 454 (1957).

4. R. B. Griffiths, J. Stat. Phys. 36, 219 (1984); R. Omnès, Rev. Mod.
Phys. 64, 339 (1992); M. Gell–Mann and J. B. Hartle, in Complexity,
Entropy, and the Physics of Information, ed. W. Zurek (Addison–
Wesley, Reading, MA, 1990); in Proceedings of the Third International
Symposium on the Foundations of Quantum Mechanics in the Light of
New Technology, ed. S. Kobayashi, H. Ezawa, Y. Murayama, and S.
Nomura (Physical Society of Japan, 1990); in Proceedings of the 25th
International Conference on High Energy Physics, Singapore, August
2–8, 1990, ed. K. K. Phua and Y Yamaguchi (World Scientific, Sin-
gapore, 1990); J. B. Hartle, Directions in Relativity, Vol. 1, ed. B.-L.
Hu, M. P. Ryan, and C. V. Vishveshwara (Cambridge University Press,
Cambridge, 1993). For a survey and more recent references, see P.
Hohenberg, Rev. Mod. Phys. 82, 2835 (2010).

5. See, e.g., D. Bohm and J. Bub, Rev. Mod. Phys. 38, 453 (1966);
P. Pearle, Phys. Rev. D 13, 857 (1976); G. C. Ghirardi, A. Rimini,
and T. Weber, Phys. Rev. D 34, 470 (1986); J. S. Bell, in Speakable
and Unspeakable in Quantum Mechanics (Cambridge University Press,
Cambridge, 1987), pp. 201–212; L. Diosi, J. Phys. A21, 2885 (1988);
P. Pearle, Phys. Rev. A 39, 2277 (1989); G. C. Ghirardi, P. Pearle,
and A. Rimini, Phys. Rev. A 43, 78 (1990); R. Penrose, in Physics
Meets Philosophy at the Planck Scale, ed. C. Callender (Cambridge
University Press, Cambridge, 2001), p. 290; S. Adler, D. C. Brody,
T. A. Brun, and L. P. Hughston, J. Phys. A 34, 8795 (2001). For a
review, see A. Bassi and G. C. Ghirardi, Phys. Rept. 379, 257 (2003).

6. E. Schrödinger, Naturwiss. 23, 807 (1935).

7. W. Zurek, Phys. Rev. D 24, 1516 (1981).

8. N. Gisin, Helv. Phys. Acta 62, 363 (1989). The possibility of instan-
taneous communication between separated systems is discussed in a
wider context by J. Polchinski, Phys. Rev. Lett. 66, 397 (1991).

UTTG-13-12
arXiv:1209.4659v2 [hep-th] 29 Sep 2012

Six-dimensional Methods for Four-dimensional Conformal
Field Theories II: Irreducible Fields

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

This note supplements an earlier paper on conformal field theories. There
it was shown how to construct tensor, spinor, and spinor-tensor primary
fields in four dimensions from their counterparts in six dimensions, where
conformal transformations act simply as SO(4, 2) Lorentz transformations.
Here we show how to constrain fields in six dimensions so that the
corresponding primary fields in four dimensions transform according to irreducible
representations of the four-dimensional Lorentz group, even when the
irreducibility conditions on these representations involve the four-component
Levi-Civita tensor εµνρσ.


Electronic address: weinberg@physics.utexas.edu

The consequences of conformal symmetry for fields in four spacetime di-
mensions can conveniently be worked out from the manifest consequences
of O(4, 2) invariance in six dimensions. A recent article [1] gave prescrip-
tions for the construction of tensor, spinor, and tensor-spinor fields with
the usual (“primary”) conformal transformation properties in four dimen-
sions from corresponding six-dimensional fields, but these did not all belong
to irreducible representations of the Lorentz group. Where irreducible ten-
sors in four dimensions are entirely characterized by their tracelessness and
their behavior under permutation of their indices, the corresponding six di-
mensional tensor must simply have the same tracelessness and permutation
properties, but how do we impose conditions of irreducibility where these
conditions involve the Levi-Civita tensor εµνρσ, when no such four-index
constant antisymmetric tensor exists in six dimensions? The purpose of the
present brief note is to fill this gap.
Let us first consider the paradigmatic example of fields that belong to
the (1, 0) and (0, 1) representations of the Lorentz group, such as the fields
describing left- or right-handed polarized light. These are antisymmetric
tensors tµν , subject to either of the conditions
\[ t^{\mu\nu}(x) = \pm\frac{i}{2}\,\epsilon^{\mu\nu\rho\sigma}\, t_{\rho\sigma}(x) . \tag{1} \]
As explained in [1], we can form a second rank primary tensor field tµν (x)
in four spacetime dimensions, with conformal dimensionality d, from a six-
dimensional tensor field T KL (X) satisfying the conditions

\[ T^{KL}(\lambda X) = \lambda^{-d}\, T^{KL}(X) , \qquad X_K T^{KL}(X) = X_L T^{KL}(X) = 0 , \tag{2} \]

by the construction

\[ t^{\mu\nu}(x) = (X^5 + X^6)^d\, e^\mu{}_K(x)\, e^\nu{}_L(x)\, T^{KL}(X) , \tag{3} \]

where
\[ e^\mu{}_\nu(x) = \delta^\mu_\nu , \qquad e^\mu{}_5(x) = e^\mu{}_6(x) = -x^\mu . \tag{4} \]
Here indices µ, ν, etc. run over the values 1, 2, 3, 0, while indices K, L,
etc. run over the values 1, 2, 3, 0, 5, 6 and are raised and lowered with the
metric
\[ \eta^{KL} = \eta_{KL} = \begin{cases} +1 & K = L = 1, 2, 3, 5 \\ -1 & K = L = 0, 6 \\ \;\;\,0 & K \neq L \end{cases} \tag{5} \]
The x^µ are related to the X^K by
\[ x^\mu = \frac{X^\mu}{X^5 + X^6} . \tag{6} \]
Obviously, if we impose on T KL (X) the condition of antisymmetry, T KL (X) =
−T LK (X), then tµν (x) will be antisymmetric: tµν (x) = −tνµ (x). But with-
out further constraints, tµν (x) will belong to the reducible representation
(1, 0) ⊕ (0, 1) of the Lorentz group. How do we impose on T KL (X) some
SO(4, 2)-invariant condition that makes tµν (x) satisfy one or the other of
the irreducibility conditions (1)?
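
Before turning to the six-dimensional answer, it may help to see why conditions of the form (1) are consistent projections. The sketch below (an illustration, not taken from the original) checks numerically that the map t → (i/2) ε t squares to the identity on antisymmetric tensors in Minkowski signature, so that (1 ± map)/2 project onto tensors obeying Eq. (1) with either sign. The index ordering 0, 1, 2, 3, the normalization ε^{0123} = +1, and all function names are assumptions of this example; the conclusion does not depend on the sign convention for ε.

```python
import itertools

N = 4
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal metric, index order 0, 1, 2, 3

def parity(perm):
    """Sign of a permutation, from its inversion count."""
    inv = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
              if perm[a] > perm[b])
    return -1 if inv % 2 else 1

# Non-zero components of eps^{mu nu rho sigma}, with eps^{0123} = +1 (assumed).
EPS = {perm: parity(perm) for perm in itertools.permutations(range(N))}

def dual(t):
    """Return (i/2) eps^{mu nu rho sigma} t_{rho sigma} for upper-index t."""
    out = [[0j] * N for _ in range(N)]
    for (m, n, r, s), sign in EPS.items():
        out[m][n] += 0.5j * sign * ETA[r] * ETA[s] * t[r][s]  # lower r, s
    return out

def project(t, sign):
    """(1 + sign * dual) / 2 applied to t."""
    d = dual(t)
    return [[0.5 * (t[m][n] + sign * d[m][n]) for n in range(N)] for m in range(N)]

def close(a, b, tol=1e-12):
    return all(abs(a[m][n] - b[m][n]) < tol for m in range(N) for n in range(N))

def antisymmetric_example():
    """An arbitrary antisymmetric tensor used only for the check."""
    t = [[0j] * N for _ in range(N)]
    vals = iter([1.0, -2.0, 0.5, 3.0, 1.5, -0.7])
    for m in range(N):
        for n in range(m + 1, N):
            v = next(vals)
            t[m][n], t[n][m] = v, -v
    return t

if __name__ == "__main__":
    t = antisymmetric_example()
    print(close(dual(dual(t)), t))
```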
In six dimensions the Levi-Civita tensor is of sixth rank, so no condition
analogous to (1) can be imposed directly on a second-rank tensor T KL (X).
But we can instead impose an irreducibility condition:
\[ A^{KLM}(X) = \mp\frac{i}{6}\,\epsilon^{KLMK'L'M'} A_{K'L'M'}(X) , \tag{7} \]
on the third-rank totally antisymmetric tensor
\[ A^{KLM}(X) \equiv X^K T^{LM}(X) + X^L T^{MK}(X) + X^M T^{KL}(X) . \tag{8} \]
(Here ε^{KLMK′L′M′} is the totally antisymmetric tensor with ε^{012356} = +1.)
According to Eqs. (3) and (4), the tensor in four dimensions corresponding
to T KL (X) is
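\[ t^{\mu\nu}(x) = (X^5 + X^6)^d \left[ T^{\mu\nu} - x^\mu\left(T^{5\nu} + T^{6\nu}\right) + x^\nu\left(T^{5\mu} + T^{6\mu}\right) \right] \tag{9} \]

and using Eq. (6), this is
\[ t^{\mu\nu}(x) = (X^5 + X^6)^{d-1} \left[ A^{\mu\nu 5}(X) + A^{\mu\nu 6}(X) \right] . \tag{10} \]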

To find the constraint on tµν (x) imposed by Eq. (7), we set K and L in this
condition equal to four-dimensional indices µ and ν, while M is set equal to
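5 or 6. Using ε^{µν5ρσ6} = ε^{µνρσ}, this gives the conditions
\[ A^{\mu\nu 5}(X) = \pm\frac{i}{2}\,\epsilon^{\mu\nu\rho\sigma} A_{\rho\sigma}{}^{6} , \tag{11} \]
\[ A^{\mu\nu 6}(X) = \pm\frac{i}{2}\,\epsilon^{\mu\nu\rho\sigma} A_{\rho\sigma}{}^{5} . \tag{12} \]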
The sum of Eqs. (11) and (12) then gives the desired irreducibility condition
(1).
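
The algebra linking the construction (3)–(4) to the combination of A^{µν5} and A^{µν6} in Eq. (10) can be checked numerically. The following sketch is an illustration only: it takes an arbitrary antisymmetric T^{KL} and an arbitrary X with X^5 + X^6 ≠ 0, omits the common factor (X^5 + X^6)^d, and uses array positions 0–3 for µ = 1, 2, 3, 0 and positions 4, 5 for K = 5, 6. All names are hypothetical.

```python
DIM6 = 6  # positions 0..3 hold mu = 1,2,3,0; positions 4 and 5 hold K = 5 and 6

def antisymmetric_T():
    """An arbitrary antisymmetric 6x6 array standing in for T^{KL}(X)."""
    T = [[0.0] * DIM6 for _ in range(DIM6)]
    v = 1.0
    for K in range(DIM6):
        for L in range(K + 1, DIM6):
            T[K][L], T[L][K] = v, -v
            v = -1.3 * v + 0.4
    return T

def t_from_eq3(T, X):
    """e^mu_K e^nu_L T^{KL} with e from Eq. (4) and x^mu = X^mu / (X^5 + X^6)."""
    s = X[4] + X[5]
    x = [X[m] / s for m in range(4)]
    e = [[(1.0 if K == m else 0.0) if K < 4 else -x[m] for K in range(DIM6)]
         for m in range(4)]
    return [[sum(e[m][K] * e[n][L] * T[K][L]
                 for K in range(DIM6) for L in range(DIM6))
             for n in range(4)] for m in range(4)]

def A(T, X, K, L, M):
    """A^{KLM} = X^K T^{LM} + X^L T^{MK} + X^M T^{KL}, Eq. (8)."""
    return X[K] * T[L][M] + X[L] * T[M][K] + X[M] * T[K][L]

def t_from_eq10(T, X):
    """(A^{mu nu 5} + A^{mu nu 6}) / (X^5 + X^6), as in Eq. (10)."""
    s = X[4] + X[5]
    return [[(A(T, X, m, n, 4) + A(T, X, m, n, 5)) / s for n in range(4)]
            for m in range(4)]

if __name__ == "__main__":
    T = antisymmetric_T()
    X = [0.3, -1.2, 0.7, 2.0, 1.1, 0.9]
    a, b = t_from_eq3(T, X), t_from_eq10(T, X)
    print(max(abs(a[m][n] - b[m][n]) for m in range(4) for n in range(4)))
```

Only the antisymmetry of T is used here; the transversality and scaling conditions (2) play no role in this particular identity.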

There was no loss of information in adding Eqs. (11) and (12), because
these two equations are algebraically equivalent. Likewise, there is no ad-
ditional information to be gained by setting K, L, and M in Eq. (7) equal
to 5, 6, and a spacetime index, or all to spacetime indices, because these
constraints can be derived by applying an SO(4, 2) transformation to (11)
or (12).
It is now easy to see how to construct tensors belonging to the irreducible
(ℓ, 0) or (0, ℓ) representations of the Lorentz group (with ℓ an integer) from
tensors in six dimensions. These representations are the symmetrized direct
products of ℓ (1, 0) or of ℓ (0, 1) representations. The four-dimensional
tensors belonging to the (ℓ, 0) or (0, ℓ) representations:
\[ t^{\mu_1\nu_1,\mu_2\nu_2,\cdots\mu_\ell\nu_\ell}(x) , \]
are therefore constrained to be antisymmetric in each pair of indices,
symmetric between index pairs, and for each pair to satisfy an irreducibility
condition like Eq. (1):
\[ t^{\mu_1\nu_1,\mu_2\nu_2,\cdots\mu_\ell\nu_\ell}(x) = \pm\frac{i}{2}\,\epsilon^{\mu_1\nu_1\rho\sigma}\, t_{\rho\sigma}{}^{\mu_2\nu_2,\cdots\mu_\ell\nu_\ell}(x) . \tag{13} \]
Such a primary tensor field can be obtained from a corresponding tensor in
six dimensions by the prescription

\[ t^{\mu_1\nu_1,\cdots\mu_\ell\nu_\ell}(x) = (X^5 + X^6)^d\, e^{\mu_1}{}_{K_1}(x)\, e^{\nu_1}{}_{L_1}(x) \cdots e^{\mu_\ell}{}_{K_\ell}(x)\, e^{\nu_\ell}{}_{L_\ell}(x)\; T^{K_1L_1\cdots K_\ell L_\ell}(X) , \tag{14} \]

where T^{K1L1···KℓLℓ}(X) is antisymmetric within each index pair and symmetric
between index pairs; satisfies the scaling condition
\[ T^{K_1L_1\cdots K_\ell L_\ell}(\lambda X) = \lambda^{-d}\, T^{K_1L_1\cdots K_\ell L_\ell}(X) , \tag{15} \]

and a transversality condition on each index:
\[ X_{K_1} T^{K_1L_1\cdots K_\ell L_\ell}(X) = 0 ; \tag{16} \]

and finally is subject to an irreducibility condition like Eq. (7): for each
index pair we require
\[ A^{K_1L_1M,K_2L_2,\cdots K_\ell L_\ell}(X) = \mp\frac{i}{6}\,\epsilon^{K_1L_1M}{}_{K_1'L_1'M'}\, A^{K_1'L_1'M',K_2L_2,\cdots K_\ell L_\ell}(X) , \tag{17} \]

where
\[ A^{K_1L_1M,\cdots K_\ell L_\ell}(X) \equiv X^{K_1} T^{L_1M,\cdots K_\ell L_\ell}(X) + X^{L_1} T^{MK_1,\cdots K_\ell L_\ell}(X) + X^{M} T^{K_1L_1,\cdots K_\ell L_\ell}(X) . \tag{18} \]

Note that if we did not impose the condition (17) then tµ1 ν1 ,µ2 ν2 ,···µℓ νℓ (x)
would transform as the direct product of ℓ reducible (1, 0) ⊕ (0, 1) represen-
tations, and hence as a sum of various representations of the Lorentz group,
not just (ℓ, 0) and (0, ℓ).
The construction of spinor fields belonging to the irreducible (1/2, 0) and
(0, 1/2) representations has already been described in [1]. In six dimensions,
we introduce an eight-component spinor
\[ \Psi(X) = \begin{pmatrix} \Psi_+(X) \\ \Psi_-(X) \end{pmatrix} , \tag{19} \]

where Ψ± are the two irreducible four-component fundamental spinor
representations of SO(4, 2), subject to a scaling condition
\[ \Psi(\lambda X) = \lambda^{-d+1/2}\, \Psi(X) . \tag{20} \]

These irreducible representations are related by a transversality condition

\[ X^K \Gamma_K \Psi(X) = 0 , \tag{21} \]

where the ΓK form the irreducible 8×8 representation of the Clifford algebra
for SO(4, 2):
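\[ \Gamma^\mu = \begin{pmatrix} 0 & i\gamma_5\gamma^\mu \\ i\gamma_5\gamma^\mu & 0 \end{pmatrix} , \qquad \Gamma^5 = \begin{pmatrix} 0 & \gamma_5 \\ \gamma_5 & 0 \end{pmatrix} , \qquad \Gamma^6 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \tag{22} \]

in a notation for which
\[ \Gamma_7 \equiv -i\,\Gamma^0\Gamma^1\Gamma^2\Gamma^3\Gamma^5\Gamma^6 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \tag{23} \]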

From Ψ(X), we can form spinor fields ψ±(x) of conformal dimensionality
d that transform according to the usual (primary) representation of the
conformal group in four dimensions, and that transform according to the
two irreducible fundamental spinor representations of the Lorentz group:
\[ \psi_\pm(x) = (X^5 + X^6)^{d-1/2} \left(\frac{1 \mp \gamma_5}{2}\right) \Psi_\pm(X) . \tag{24} \]

The above prescriptions for tensors and spinors can be combined into a
prescription for spinor-tensor fields. In six dimensions, we introduce a field
with an eight-valued spinor index and 2ℓ six-vector indices
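\[ \Psi^{K_1L_1\cdots K_\ell L_\ell}(X) = \begin{pmatrix} \Psi_+^{K_1L_1\cdots K_\ell L_\ell}(X) \\ \Psi_-^{K_1L_1\cdots K_\ell L_\ell}(X) \end{pmatrix} , \tag{25} \]

which is again antisymmetric in each vector index pair, symmetric between
vector index pairs, and for each index pair satisfies
\[ \Omega_\pm^{K_1L_1M,K_2L_2,\cdots K_\ell L_\ell}(X) = \mp\frac{i}{6}\,\epsilon^{K_1L_1M}{}_{K_1'L_1'M'}\, \Omega_\pm^{K_1'L_1'M',K_2L_2,\cdots K_\ell L_\ell}(X) , \tag{26} \]
where
\[ \Omega_\pm^{K_1L_1M,\cdots K_\ell L_\ell}(X) \equiv X^{K_1} \Psi_\pm^{L_1M,\cdots K_\ell L_\ell}(X) + X^{L_1} \Psi_\pm^{MK_1,\cdots K_\ell L_\ell}(X) + X^{M} \Psi_\pm^{K_1L_1,\cdots K_\ell L_\ell}(X) . \tag{27} \]

Like Ψ, the spinor-tensor field is subject to a scaling condition
\[ \Psi^{K_1L_1\cdots K_\ell L_\ell}(\lambda X) = \lambda^{-d+1/2}\, \Psi^{K_1L_1\cdots K_\ell L_\ell}(X) , \tag{28} \]

and a transversality condition:
\[ X^K \Gamma_K \Psi^{K_1L_1\cdots K_\ell L_\ell}(X) = 0 , \tag{29} \]

and like tensor fields, it is also transverse in the sense that:
\[ X_{K_1} \Psi^{K_1L_1\cdots K_\ell L_\ell}(X) = 0 . \tag{30} \]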

As noted in [1], from these six-dimensional fields we can construct fields
in four dimensions that transform under conformal transformations as
primary fields of conformal dimensionality d:
\[ \psi_\pm^{\mu_1\nu_1,\mu_2\nu_2,\cdots\mu_\ell\nu_\ell}(x) = (X^5 + X^6)^{d-1/2}\, e^{\mu_1}{}_{K_1}(x)\, e^{\nu_1}{}_{L_1}(x) \cdots e^{\mu_\ell}{}_{K_\ell}(x)\, e^{\nu_\ell}{}_{L_\ell}(x) \left(\frac{1 \pm \gamma_5}{2}\right) \Psi_\pm^{K_1L_1\cdots K_\ell L_\ell}(X) . \tag{31} \]
But these fields transform according to the reducible (1/2, 0) ⊗ (ℓ, 0) and
(0, 1/2) ⊗ (0, ℓ) representations of the Lorentz group, which consist
respectively of (ℓ−1/2, 0) and (0, ℓ−1/2) representations as well as (ℓ+1/2, 0) and
(0, ℓ + 1/2) representations. In order to isolate the irreducible (ℓ + 1/2, 0)
and (0, ℓ + 1/2) representations, we must impose a further Lorentz-invariant
irreducibility condition: For each index pair
\[ \gamma_{\mu_1}\gamma_{\nu_1}\, \psi_\pm^{\mu_1\nu_1,\mu_2\nu_2,\cdots\mu_\ell\nu_\ell}(x) = 0 . \tag{32} \]

The left-hand side has ℓ − 1 index pairs, so it contains the (ℓ − 1/2, 0) or
(0, ℓ − 1/2) representations, which are thus eliminated by this condition. It
is fairly obvious that this condition in four dimensions is implemented in six
dimensions by the SO(4, 2)-invariant constraint
\[ \Gamma_{K_1}\Gamma_{L_1}\, \Psi^{K_1L_1\cdots K_\ell L_\ell}(X) = 0 . \tag{33} \]

It is straightforward using the constraints (29) and (30) to show that the
condition (33) in six dimensions does imply the condition (32) in four di-
mensions.
This material is based upon work supported by the National Science
Foundation under Grant Number PHY-0969020 and with support from The
Robert A. Welch Foundation, Grant No. F-0014.

Reference

1. S. Weinberg, Phys. Rev. D 82, 045031 (2010). This article contains
references to earlier works on the six-dimensional approach to conformal
symmetry in four dimensions; also see A. R. Gover, A. Shaukat,
and A. Waldron, Phys. Lett. B 675, 93 (2009).

Note added in proof:

After this work was completed, I learned that these results can be obtained
from a twistor formalism described by W. Siegel, arXiv:1204.5679 and in
earlier works cited therein.

UTTG-16-12

Minimal Fields of Canonical Dimensionality are Free


arXiv:1210.3864v1 [hep-th] 15 Oct 2012

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is shown that in a scale-invariant relativistic field theory, any field ψn
belonging to the (j, 0) or (0, j) representations of the Lorentz group and
with dimensionality d = j + 1 is a free field. For other field types there is no
value of the dimensionality that guarantees that the field is free. Conformal
invariance is not used in the proof of these results, but it gives them a special
interest; as already known and as shown here in an appendix, the only fields
in a conformal field theory that can describe massless particles belong to the
(j, 0) or (0, j) representations of the Lorentz group and have dimensionality
d = j + 1. Hence in conformal field theories massless particles are free.


Electronic address: weinberg@physics.utexas.edu

This note will show that in a scale-invariant relativistic field theory, any
fields that belong to the minimal 2j + 1-component (j, 0) or (0, j) represen-
tations of the Lorentz group (where j is an integer or half-integer) and have
canonical dimensionality d = j +1 are necessarily free fields. This conclusion
is already known for j = 0 [1]; here it is extended to all spins. Although
conformal invariance is not used here, this result gains interest from the fact
[2] that in conformal field theories the only fields that can describe massless
particles belong to the (j, 0) and (0, j) representations of the Lorentz group
and have canonical dimensionality. An elementary proof of this theorem is
given in an appendix. It follows that, according to the main result of the
present paper, massless particles in a conformally invariant field theory must
be free particles.
To begin, consider a field ψn (x) belonging to any representation of the
Lorentz group. Poincaré invariance tells us that
\[ L_{\rho\sigma}\, G_{nm}(z) = -\sum_l [J_{\rho\sigma}]_{nl}\, G_{lm}(z) + \sum_l G_{nl}(z)\, [J^\dagger_{\rho\sigma}]_{lm} , \tag{1} \]

where G is the vacuum expectation value
\[ G_{nm}(x - y) \equiv \left\langle 0 \left| \psi_n(x)\, \psi_m^\dagger(y) \right| 0 \right\rangle , \tag{2} \]

Lρσ are the differential operators
\[ L_{\rho\sigma} \equiv -i z_\rho \frac{\partial}{\partial z^\sigma} + i z_\sigma \frac{\partial}{\partial z^\rho} , \tag{3} \]

and [Jρσ]nm are the matrices representing the generators of the Lorentz
group in the representation furnished by the field ψ(x). Iteration of Eq. (1)
gives (suppressing matrix indices)
\[ L^{\rho\sigma} L_{\rho\sigma}\, G(z) = J^{\rho\sigma} J_{\rho\sigma}\, G + G\, J^{\dagger\rho\sigma} J^\dagger_{\rho\sigma} - 2 J^{\rho\sigma}\, G\, J^\dagger_{\rho\sigma} . \tag{4} \]

The point of this exercise is that by elementary commutations of
derivatives and coordinates, one can derive the identity
\[ L^{\rho\sigma} L_{\rho\sigma} = -2 z^2\, \Box + 2 S^2 - 4 S , \tag{5} \]

where □ ≡ ∂²/∂z^ρ ∂z_ρ is the usual d’Alembertian, and S is the scale
transformation operator
\[ S \equiv -z^\rho \frac{\partial}{\partial z^\rho} . \tag{6} \]

(This is analogous to the identity in three dimensions that can be used to
show that the Laplacian of the spherical polynomial r^ℓ Y_ℓ^m(θ, φ) vanishes.)
We will use Eqs. (4)–(6) to show that if ψ belongs to the (j, 0) or (0, j)
representations of the Lorentz group and has canonical dimensionality then
✷ψ = 0.
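
The operator identity (5) can be verified mechanically on polynomials. The sketch below is an illustration under stated assumptions: it takes the metric diag(−1, 1, 1, 1) with index order 0, 1, 2, 3, reads the operators of Eq. (3) as L_{ρσ} = −i(z_ρ ∂_σ − z_σ ∂_ρ) with ∂_σ = ∂/∂z^σ, and represents a polynomial in the four z^µ as a dictionary from exponent tuples to integer coefficients. All function names are hypothetical.

```python
from collections import defaultdict

ETA = (-1, 1, 1, 1)  # eta_{mu mu}, index order 0, 1, 2, 3

def clean(p):
    return {k: c for k, c in p.items() if c != 0}

def add(p, q, cq=1):
    """Return p + cq * q."""
    out = defaultdict(int, p)
    for k, c in q.items():
        out[k] += cq * c
    return clean(out)

def scale(p, c):
    return clean({k: c * v for k, v in p.items()})

def d(p, mu):
    """Partial derivative with respect to z^mu."""
    out = defaultdict(int)
    for k, c in p.items():
        if k[mu] > 0:
            e = list(k)
            e[mu] -= 1
            out[tuple(e)] += c * k[mu]
    return clean(out)

def z(p, mu):
    """Multiplication by z^mu."""
    out = {}
    for k, c in p.items():
        e = list(k)
        e[mu] += 1
        out[tuple(e)] = c
    return out

def M(p, r, s):
    """(z_r d_s - z_s d_r) p with z_r = eta_rr z^r, so that L_{rs} = -i M_{rs}."""
    return add(scale(z(d(p, s), r), ETA[r]), scale(z(d(p, r), s), ETA[s]), -1)

def lhs(p):
    """L^{rs} L_{rs} p = - sum_{r,s} eta_rr eta_ss M_{rs} M_{rs} p."""
    out = {}
    for r in range(4):
        for s in range(4):
            out = add(out, scale(M(M(p, r, s), r, s), -ETA[r] * ETA[s]))
    return out

def S(p):
    """Scale operator S = - z^rho d/dz^rho, Eq. (6)."""
    out = {}
    for mu in range(4):
        out = add(out, z(d(p, mu), mu), -1)
    return out

def box(p):
    out = {}
    for mu in range(4):
        out = add(out, scale(d(d(p, mu), mu), ETA[mu]))
    return out

def z_squared(p):
    out = {}
    for mu in range(4):
        out = add(out, scale(z(z(p, mu), mu), ETA[mu]))
    return out

def rhs(p):
    """(-2 z^2 box + 2 S^2 - 4 S) p, the right-hand side of Eq. (5)."""
    out = scale(z_squared(box(p)), -2)
    out = add(out, scale(S(S(p)), 2))
    return add(out, scale(S(p), -4))

if __name__ == "__main__":
    f = {(2, 1, 0, 0): 3, (0, 0, 1, 3): -2, (1, 1, 1, 1): 5, (0, 0, 0, 0): 7}
    print(lhs(f) == rhs(f))
```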
If ψ(x) belongs to the (j, 0) representation of the Lorentz group, then

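\[ J_{ij} = \epsilon_{ijk} J_k , \qquad J_{i0} = -i J_i , \tag{7} \]

where Ji are the Hermitian matrices representing the generators of the
rotation group in its spin j representation. It follows that
\[ \begin{aligned} \tfrac{1}{2}\, J^{\rho\sigma} J_{\rho\sigma}\, G &= 2 J_i J_i\, G = 2j(j+1)\, G , \\ \tfrac{1}{2}\, G\, J^{\dagger\rho\sigma} J^\dagger_{\rho\sigma} &= 2 G\, J_i^\dagger J_i^\dagger = 2j(j+1)\, G , \\ J^{\rho\sigma}\, G\, J^\dagger_{\rho\sigma} &= 0 . \end{aligned} \tag{8} \]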

Also, if ψ has dimensionality d (counting powers of momentum) then in a
scale-invariant theory
\[ S\, G(z) = 2d\, G(z) . \tag{9} \]
So for these fields, Eq. (4) reads
\[ -2 z^2\, \Box G(z) + 8 d^2\, G(z) - 8 d\, G(z) = 8 j(j+1)\, G(z) , \tag{10} \]

and in particular, for d = j + 1,
\[ \Box G(z) = 0 . \tag{11} \]

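Operating again with a d’Alembertian, it follows trivially that
\[ 0 = \left\langle 0 \left| \Box_x \psi_n(x)\, [\Box_y \psi_m(y)]^\dagger \right| 0 \right\rangle , \tag{12} \]

so
\[ [\Box_y \psi_m(y)]^\dagger\, |0\rangle = 0 . \tag{13} \]

But any local operator that annihilates the vacuum must vanish[3], so
\[ \Box_y \psi_m(y) = 0 , \tag{14} \]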

and the field is therefore free. The proof for (0, j) fields is identical, except
for an inconsequential difference of sign of Ji0. This does not say that the
theory for which □y ψm(y) = 0 is a free-field theory, but only that the field
ψn is free; there may be other fields in the same theory, which transform
according to other representations of the Lorentz group and/or have other
dimensionalities, that are not free.
It is only fields belonging to the (j, 0) or (0, j) representations of the
Lorentz group that can be shown in this way to be free. Indeed, for fields
χr belonging to other irreducible representations of the Lorentz group, there
is no value of dimensionality d for which it is guaranteed that ✷χr = 0. If
χr transforms according to the (j, j ′ ) representation, then χr χ†s in general
transforms reducibly, as a sum of the representations (A, B) with both A
and B running by unit steps from |j − j ′ | to j + j ′ . The vacuum expectation
value F ≡ ⟨0|χr χ†s|0⟩ has a corresponding decomposition into terms F^(A,B)
belonging to the same representations, for which in place of Eq. (10) we have
\[ -2 z^2\, \Box F^{(A,B)}(z) + 8 d^2\, F^{(A,B)}(z) - 8 d\, F^{(A,B)}(z) = 4\,[A(A+1) + B(B+1)]\, F^{(A,B)}(z) . \tag{15} \]
If F^(A,B) itself is non-zero, then the only way that □F^(A,B) can vanish is if
d takes a value for which
\[ 2d(d-1) = A(A+1) + B(B+1) . \tag{16} \]

There is obviously no value of d for which this is satisfied for all values of
A and B between |j − j ′ | and j + j ′ unless either j ′ = 0 or j = 0, in which
case both A and B take the unique value j or j ′ .
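
The counting behind this statement is easy to tabulate. The sketch below (an illustration with hypothetical function names) lists the values of A(A + 1) + B(B + 1) for A and B running in unit steps from |j − j′| to j + j′, and reports the common value only when it is unique, which is the only situation in which a single d can satisfy Eq. (16).

```python
from fractions import Fraction

def casimir_values(j, j_prime):
    """Distinct values of A(A+1) + B(B+1) for A, B in |j-j'|, ..., j+j'."""
    lo, hi = abs(j - j_prime), j + j_prime
    labels = [lo + k for k in range(int(hi - lo) + 1)]
    return sorted({a * (a + 1) + b * (b + 1) for a in labels for b in labels})

def single_value(j, j_prime):
    """Return the unique right-hand side of Eq. (16), or None if it is not unique."""
    values = casimir_values(j, j_prime)
    return values[0] if len(values) == 1 else None

if __name__ == "__main__":
    half = Fraction(1, 2)
    for j, jp in [(3 * half, 0), (half, half), (1, half)]:
        print(j, jp, casimir_values(j, jp), single_value(j, jp))
```

With j′ = 0 the single value is 2j(j + 1), and d = j + 1 then solves Eq. (16).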

ACKNOWLEDGMENTS

I am grateful for helpful conversations and correspondence with J. Distler,
H. Osborn, E. Sezgin, and E. Witten. This material is based upon work
supported by the National Science Foundation under Grant Number PHY-
0969020 and with support from The Robert A. Welch Foundation, Grant
No. F-0014.

APPENDIX: MASSLESS PARTICLE FIELDS IN
CONFORMAL THEORIES

This appendix offers an elementary demonstration of Mack’s result [2],
that the only fields in a conformal field theory that can describe a massless
particle of helicity j or −j (in the sense that the field has a non-vanishing
matrix element between the particle state and the vacuum) are (0, j) or
(j, 0) fields of canonical dimensionality d = j + 1. Together with the main
result of the present work, this shows that massless particles in conformal
field theories are free.
It is necessary first to say how massless particle states transform un-
der infinitesimal conformal transformations. This is already known[4] (as I
learned after working out the transformation rules), but it is worth present-
ing a detailed derivation here to show that these transformation rules are
unique. To define the massless particle states, we first introduce a standard
three-momentum κẑ of magnitude κ in the +3-direction, and define a state
|κẑ, σ⟩ with this momentum and with helicity σ, in the sense that this is an
eigenstate of the generator J12 of rotations in the 1−2 plane with eigenvalue
σ:
\[ J_{12}\, |\kappa\hat{z}, \sigma\rangle = \sigma\, |\kappa\hat{z}, \sigma\rangle . \tag{A.1} \]
In order to avoid introducing new continuous degrees of freedom, it is also
necessary to assume that these states are annihilated by the generators of
the invariant Abelian subgroup of the little group (the group of Lorentz
transformations that leave the standard three-momentum invariant):
\[ [J_{10} + J_{13}]\, |\kappa\hat{z}, \sigma\rangle = [J_{20} + J_{23}]\, |\kappa\hat{z}, \sigma\rangle = 0 . \tag{A.2} \]
We then take
\[ |\mathbf{p}, \sigma\rangle \equiv U\big(L(\mathbf{p})\big)\, |\kappa\hat{z}, \sigma\rangle , \tag{A.3} \]
where U(L(p)) is the unitary operator representing a standard Lorentz
transformation L^µ_ν(p) that takes the standard three-momentum κẑ to p.
For instance, we can take L^µ_ν(p) as a boost along the 3-axis that takes κẑ
to |p|ẑ, followed by a rotation in the ẑ − p̂ plane that takes |p|ẑ to p. These
states are here normalized to have the Lorentz-invariant scalar product
\[ \langle \mathbf{p}, \sigma | \mathbf{p}', \sigma' \rangle = \delta_{\sigma\sigma'}\, |\mathbf{p}|\, \delta^3(\mathbf{p} - \mathbf{p}') , \tag{A.4} \]
rather than the conventional scalar product which does not contain the
factor |p|. As is well known[5], acting on such a state, the unitary operator
U(Λ) representing a Lorentz transformation Λ^µ_ν gives∗∗
\[ U(\Lambda)\, |\mathbf{p}, \sigma\rangle = \exp\big(i\sigma\phi(\mathbf{p}, \Lambda)\big)\, |\Lambda\mathbf{p}, \sigma\rangle \tag{A.5} \]
∗∗ We take i, j, k, · · · to run over the spatial coordinate indices 1, 2, 3, while µ, ν, · · · run
over the spacetime indices 0, 1, 2, 3. Repeated indices are summed, and the spacetime
metric ηµν has non-zero components η11 = η22 = η33 = 1, η00 = −1.

where (Λp)^i ≡ Λ^i_j p^j + Λ^i_0 |p|, and φ is the real angle appearing in the Wigner
rotation for massless particles:
\[ U\big(L^{-1}(\Lambda\mathbf{p})\, \Lambda\, L(\mathbf{p})\big)\, |\kappa\hat{z}, \sigma\rangle = \exp\big(i\sigma\phi(\mathbf{p}, \Lambda)\big)\, |\kappa\hat{z}, \sigma\rangle . \tag{A.6} \]

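For infinitesimal Lorentz transformations Λ^µ_ν = δ^µ_ν + ω^µ_ν (with ω^µν ≡
ω^µ_λ η^νλ infinitesimal and antisymmetric) we have
\[ U(1 + \omega) = 1 + \frac{i}{2}\, \omega^{\mu\nu} J_{\mu\nu} \tag{A.7} \]
with Jµν Hermitian and antisymmetric in µ and ν, and satisfying the
commutation relations
\[ i[J_{\mu\nu}, J_{\kappa\lambda}] = \eta_{\nu\kappa} J_{\mu\lambda} - \eta_{\mu\kappa} J_{\nu\lambda} - \eta_{\nu\lambda} J_{\mu\kappa} + \eta_{\mu\lambda} J_{\nu\kappa} . \tag{A.8} \]

In this case, Eq. (A.5) reads
\[ J_{ij}\, |\mathbf{p}, \sigma\rangle = \left[ i\left( p_i \frac{\partial}{\partial p_j} - p_j \frac{\partial}{\partial p_i} \right) + \sigma\phi_{ij}(\mathbf{p}) \right] |\mathbf{p}, \sigma\rangle , \tag{A.9} \]
\[ J_{i0}\, |\mathbf{p}, \sigma\rangle = \left[ i|\mathbf{p}| \frac{\partial}{\partial p_i} + \sigma\phi_{i0}(\mathbf{p}) \right] |\mathbf{p}, \sigma\rangle , \tag{A.10} \]
where the antisymmetric real coefficients φµν (which will play a large role
in what follows) are defined by
\[ \phi(\mathbf{p}, 1 + \omega) = \frac{1}{2}\, \omega^{\mu\nu} \phi_{\mu\nu}(\mathbf{p}) . \tag{A.11} \]
Of course, also
\[ P^\mu\, |\mathbf{p}, \sigma\rangle = p^\mu\, |\mathbf{p}, \sigma\rangle , \tag{A.12} \]
with p^0 = |p|. It is straightforward to check that the operators (A.9), (A.10),
and (A.12) are Hermitian within the norm (A.4).
The functions φµν can be calculated from Eq. (A.6) in any convenient
representation of the Lorentz group, such as the two-component fundamental
spinor representation. The result is
\[ \phi_{ij}(\mathbf{p}) = \frac{\epsilon_{ijk}\, (\hat{p} + \hat{z})_k}{1 + \hat{p}\cdot\hat{z}} , \qquad \phi_{i0}(\mathbf{p}) = -\frac{(\hat{p} \times \hat{z})_i}{1 + \hat{p}\cdot\hat{z}} . \tag{A.13} \]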
These formulas depend on a particular prescription for the standard Lorentz
transformation L(p) that takes κẑ to p. Suppose we change this prescription
by introducing some other Lorentz transformation L′(p) that takes κẑ to p.
The Lorentz transformation L⁻¹(p)L′(p) is an element of the little group,
and therefore merely multiplies |κẑ, σ⟩ with a phase factor exp(iζ(p)), so if
we use L′(p) in place of L(p) in Eq. (A.3) the one-particle state is changed to
\[ |\mathbf{p}, \sigma\rangle' \equiv U\big(L'(\mathbf{p})\big)\, |\kappa\hat{z}, \sigma\rangle = U\big(L(\mathbf{p})\big)\, U\big(L^{-1}(\mathbf{p}) L'(\mathbf{p})\big)\, |\kappa\hat{z}, \sigma\rangle = \exp\big(i\zeta(\mathbf{p})\big)\, |\mathbf{p}, \sigma\rangle . \tag{A.14} \]

Thus the formulas (A.13) for φij (p) and φi0 (p) represent a particular con-
vention for the phase of these states, supplementing the convention (A.4)
we have adopted for the normalization of the states.
The generators of the conformal symmetry group in four spacetime di-
mensions comprise the generator S of scale transformations and the genera-
tors Kµ of special conformal transformations, together with the generators
Jµν and Pµ of Poincaré transformations. They satisfy the well-known com-
mutation relations

i [Kµ , Jρλ ] = ηµρ Kλ − ηµλ Kρ , (A.15)


[Kµ , Kν ] = 0 , (A.16)
i[Pµ , Kν ] = 2Jµν + 2ηµν S , (A.17)
i[S, Pµ ] = Pµ , (A.18)
i[S, Kµ ] = −Kµ , (A.19)
[S, Jµν ] = 0 , (A.20)

as well as the familiar commutation relations of the Poincaré group:

i [Jµν , Jρλ ] = ηνρ Jµλ − ηµρ Jνλ − ηλµ Jρν + ηλν Jρµ , (A.21)
i [Pµ , Jρλ ] = ηµρ Pλ − ηµλ Pρ , (A.22)
[Pµ , Pρ ] = 0 . (A.23)

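It is straightforward though quite tedious to show that these commutation
relations are satisfied by the following operators on one-particle states:
\[ K_0\, |\mathbf{p}, \sigma\rangle = \left[ |\mathbf{p}| \frac{\partial^2}{\partial p_k \partial p_k} - 2i\sigma\phi_{k0}(\mathbf{p}) \frac{\partial}{\partial p_k} - \frac{\sigma^2}{|\mathbf{p}|} \big( \phi_{k0}(\mathbf{p}) \phi_{k0}(\mathbf{p}) + 1 \big) \right] |\mathbf{p}, \sigma\rangle \tag{A.24} \]
\[ K_i\, |\mathbf{p}, \sigma\rangle = \left[ 2 p_k \frac{\partial^2}{\partial p_k \partial p_i} - p_i \frac{\partial^2}{\partial p_k \partial p_k} + 2 \frac{\partial}{\partial p_i} + 2i\sigma\phi_{ik}(\mathbf{p}) \frac{\partial}{\partial p_k} + \frac{\sigma^2}{|\mathbf{p}|} \Big( 2\phi_{ik}(\mathbf{p}) \phi_{k0}(\mathbf{p}) - \hat{p}_i\, [\phi_{k0}(\mathbf{p}) \phi_{k0}(\mathbf{p}) + 1] \Big) \right] |\mathbf{p}, \sigma\rangle , \tag{A.25} \]
\[ S\, |\mathbf{p}, \sigma\rangle = i\left( p_k \frac{\partial}{\partial p_k} + 1 \right) |\mathbf{p}, \sigma\rangle , \tag{A.26} \]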
together with the Poincaré transformation operators (A.9), (A.10), and
(A.12). (These results agree with those of [4] when we use the formulas
(A.13) for φµν (p), but in what follows it will be convenient to leave the
transformation rules in the form (A.24)–(A.26).) What is less straightforward
is to show that, given massless particle states |p, σ⟩ satisfying the
Poincaré transformation rules (A.9), (A.10), and (A.12), the only confor-
mal transformation properties consistent with the commutation relations
are those given in Eqs. (A.24)–(A.26). To show this, we will outline the
steps by which Eqs. (A.24)–(A.26) are derived.

(a) The commutation relation (A.17) gives [K0, Pi] = 2iJi0. This uniquely
fixes the derivative terms in Eq. (A.24). That is,
\[ K_0\, |\mathbf{p}, \sigma\rangle = \left[ |\mathbf{p}| \frac{\partial^2}{\partial p_k \partial p_k} - 2i\sigma\phi_{k0}(\mathbf{p}) \frac{\partial}{\partial p_k} + \alpha(\mathbf{p}) \right] |\mathbf{p}, \sigma\rangle , \tag{A.27} \]
with α(p) some c-number function that remains to be calculated. Using
Eqs. (A.4) and (A.13), we easily see that for K0 to be Hermitian, α(p) must
be real.

(b) The commutation relation (A.17) also gives [K0 , P0 ] = −2iS. Using
Eqs. (A.12) and (A.27), we obtain Eq. (A.26) for the action of the dilation
generator S on one-particle states.

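(c) Using Eq. (A.13) together with Eqs. (A.27) and (A.9), we can calculate
that
\[ [J_{ij}, K_0]\, |\mathbf{p}, \sigma\rangle = -i\, |\mathbf{p}, \sigma\rangle \left\{ p_i \frac{\partial}{\partial p_j} - p_j \frac{\partial}{\partial p_i} \right\} \Big[ \alpha(\mathbf{p}) + \sigma^2 \phi_{k0}(\mathbf{p}) \phi_{k0}(\mathbf{p}) / |\mathbf{p}| \Big] . \]
Since this must vanish, α(p) + σ²φk0(p)φk0(p)/|p| must be a function only
of the modulus |p|. (The factor σ² is required for consistency with Eqs. (A.24)
and (A.29).) Further, Eqs. (A.19) and (A.26) tell us that this function
scales as 1/|p|, and hence must take the form a/|p|, with a real and
p-independent. Hence Eq. (A.27) reads
\[ K_0\, |\mathbf{p}, \sigma\rangle = \left[ |\mathbf{p}| \frac{\partial^2}{\partial p_k \partial p_k} - 2i\sigma\phi_{k0}(\mathbf{p}) \frac{\partial}{\partial p_k} - \frac{\sigma^2 \phi_{k0}(\mathbf{p}) \phi_{k0}(\mathbf{p}) - a}{|\mathbf{p}|} \right] |\mathbf{p}, \sigma\rangle . \tag{A.28} \]

(d) From Eq. (A.15), we have Ki = −i[Ji0, K0]. Using the formulas (A.10)
and (A.28) for the operators Ji0 and K0, we find
\[ K_i\, |\mathbf{p}, \sigma\rangle = \left[ 2 p_k \frac{\partial^2}{\partial p_i \partial p_k} - p_i \frac{\partial^2}{\partial p_k \partial p_k} + 2 \frac{\partial}{\partial p_i} + 2i\sigma\phi_{ik}(\mathbf{p}) \frac{\partial}{\partial p_k} - \frac{2\sigma^2}{|\mathbf{p}|} \phi_{k0}(\mathbf{p}) \phi_{ki}(\mathbf{p}) - \hat{p}_i \frac{\sigma^2 \phi_{k0}(\mathbf{p}) \phi_{k0}(\mathbf{p}) - a}{|\mathbf{p}|} \right] |\mathbf{p}, \sigma\rangle . \tag{A.29} \]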

(e) It only remains to find the constant a. We can do this by requiring
that Kµ be a four-vector. Since we have constructed K0 to be a rotational
scalar and Ki to be equal to −i[Ji0 , K0 ], the remaining requirement provided
by Eq. (A.15) is that [Ji0 , Kj ] = iδij K0 . Now using the formulas (A.13),
equating the coefficients of δij in the non-derivative terms on both sides of
this commutation relation, we find a = −σ 2 . Eqs. (A.28) and (A.29) are
then the desired results (A.24) and (A.25).

Now let us turn to the transformation of field operators. We will consider
here only fields that transform linearly and homogeneously under Poincaré
transformations:
\[ i[J^{\mu\nu}, \psi_n(x)] = -i \sum_m \mathcal{J}^{\mu\nu}_{nm}\, \psi_m(x) + (x^\nu \partial^\mu - x^\mu \partial^\nu)\, \psi_n(x) , \tag{A.30} \]
\[ i[P_\mu, \psi_n(x)] = -\partial_\mu \psi_n(x) , \tag{A.31} \]
where 𝒥µν is a set of spin matrices that satisfy the same commutation
relation (A.21) as the generators Jµν. This excludes gauge fields, whose Lorentz transformation
properties in an operator formalism (rather than a path-integral formalism)
include a gauge transformation in addition to the transformation (A.30).
A primary field may be defined as one with the familiar conformal
transformation properties:
\[ i[K_\nu, \psi_n(x)] = -2i \sum_m [\mathcal{J}_{\mu\nu}]_{nm}\, x^\mu \psi_m(x) + 2 d_n x_\nu \psi_n(x) + (2 x_\nu x^\rho \partial_\rho - x^2 \partial_\nu)\, \psi_n(x) , \tag{A.32} \]
\[ i[S, \psi_n(x)] = d_n \psi_n(x) + x^\mu \partial_\mu \psi_n(x) , \tag{A.33} \]
where dn is a real number, known as the conformal dimensionality of the
field. Since neither Lorentz nor conformal transformations mix different
irreducible representations of the Lorentz group, we will assume without
loss of generality that the matrices Jµν furnish an irreducible representation
of the algebra of the Lorentz group.
Our aim in this appendix is to find what kinds of primary fields can
describe a massless particle of a given helicity σ. By a field “describing” a
particle, we mean that the field has non-vanishing matrix elements between
the particle state and the vacuum. The propagator of such a field will
have a zero mass pole whose residue is proportional to the product of this
matrix element and its complex conjugate, so that S-matrix elements for this
particle can be found from the residues of poles in the vacuum expectation
value of time-ordered products of the field.
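Let’s first take up the simple case of dilations. Assuming that the vacuum
is invariant under these transformations, Eq. (A.33) gives
\[ -i\, \langle 0 | \psi_n(0)\, S\, | \mathbf{p}, \sigma \rangle = d_n\, \langle 0 | \psi_n(0) | \mathbf{p}, \sigma \rangle . \tag{A.34} \]
With Eq. (A.26), this becomes
\[ d_n\, \langle 0 | \psi_n(0) | \mathbf{p}, \sigma \rangle = \left( p_k \frac{\partial}{\partial p_k} + 1 \right) \langle 0 | \psi_n(0) | \mathbf{p}, \sigma \rangle . \tag{A.35} \]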
The way that the matrix element ⟨0|ψn(0)|p, σ⟩ scales with momentum
depends on the Lorentz transformation properties of the field ψn. Recall
that the general irreducible representations of the Lorentz group are labeled
(A, B), where A and B are non-negative integers or half-integers. These
representations are defined by writing the matrices representing the generators
of the Lorentz group in terms of two Hermitian matrix 3-vectors defined by
\[ A_i \equiv \frac{1}{2} \mathcal{J}_i + \frac{i}{2} \mathcal{J}_{i0} , \qquad B_i \equiv \frac{1}{2} \mathcal{J}_i - \frac{i}{2} \mathcal{J}_{i0} , \tag{A.36} \]
where as usual 𝒥i ≡ ½ εijk 𝒥jk. The commutation relations of the 𝒥µν tell us
that
\[ [A_i, A_j] = i\epsilon_{ijk} A_k , \qquad [B_i, B_j] = i\epsilon_{ijk} B_k , \qquad [A_i, B_j] = 0 . \tag{A.37} \]
In the (A, B) representation of the Lorentz group, these are A×A and B×B
matrices, such that
\[ \mathbf{A}^2 = A(A+1) , \qquad \mathbf{B}^2 = B(B+1) . \tag{A.38} \]

It is an old result [6] that the only free fields that can describe a massless
particle of helicity σ have
σ =B−A (A.39)
and have matrix elements between the vacuum and these one-particle states
that scale as $p^{A+B}$; that is
$$ p_k \frac{\partial}{\partial p_k} \langle 0|\psi_n(0)|p,\sigma\rangle = (A + B) \langle 0|\psi_n(0)|p,\sigma\rangle . \qquad (A.40) $$

It is easy to show using Lorentz invariance that Eqs. (A.39) and (A.40) also
hold for general interacting fields. Combining Eq. (A.40) with Eq. (A.35)
gives the conformal dimensionality

d=A+B+1. (A.41)

in which we drop the subscript n on $d_n$ since this is the same for all
components of a field belonging to an irreducible representation of the Lorentz
algebra. Thus fields of Lorentz type (A, B) that describe a massless
particle (in the sense explained above) can only have conformal dimensionality
A + B + 1.
Now let’s consider special conformal transformations. We will compare
what we have learned about the conformal transformation properties of one-
particle states with the consequences of the conformal transformation prop-
erties of a primary field that can describe such a particle. By taking the
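matrix element of Eq. (A.32) for ν = 0 between a one-particle state $|p,\sigma\rangle$
and the vacuum, and assuming that the vacuum is conformal-invariant, we
find
$$
\begin{aligned}
\langle 0|\psi_n(x) K_0|p,\sigma\rangle &= 2\sum_m [J_{i0}]_{nm}\, x^i \langle 0|\psi_m(x)|p,\sigma\rangle \\
&\quad + 2id\, x^0 \langle 0|\psi_n(x)|p,\sigma\rangle - (2x^0 x^\rho p_\rho + x^2)\,|p|\,\langle 0|\psi_n(x)|p,\sigma\rangle . \qquad (A.42)
\end{aligned}
$$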

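We need to rearrange the right-hand side of Eq. (A.42) so that it takes the
form $\langle 0|\psi_n(x) K|p,\sigma\rangle$, where K is some $x^\rho$-independent matrix function of
momentum and momentum derivatives, and then compare $K|p,\sigma\rangle$ with what
Eq. (A.24) gives for $K_0|p,\sigma\rangle$. For this purpose, we first re-write Eq. (A.42)
so that it reads
$$
\begin{aligned}
\langle 0|\psi_n(x) K_0|p,\sigma\rangle
&= 2\sum_m [J_{i0}]_{nm}\, x^i \langle 0|\psi_m(x)|p,\sigma\rangle
 + 2i(d-1)\, x^0 \langle 0|\psi_n(x)|p,\sigma\rangle
 + \langle 0|\psi_n(0)|p,\sigma\rangle\, |p|\, \frac{\partial^2 e^{ip\cdot x}}{\partial p_k \partial p_k} \\
&= 2\sum_m [J_{i0}]_{nm}\, x^i \langle 0|\psi_m(x)|p,\sigma\rangle
 + 2i x^0 e^{ip\cdot x} \left( d - 1 - p_k \frac{\partial}{\partial p_k} \right) \langle 0|\psi_n(0)|p,\sigma\rangle \\
&\quad - 2i|p|\, e^{ip\cdot x} x_k \frac{\partial}{\partial p_k} \langle 0|\psi_n(0)|p,\sigma\rangle
 - |p|\, e^{ip\cdot x} \frac{\partial^2}{\partial p_k \partial p_k} \langle 0|\psi_n(0)|p,\sigma\rangle \\
&\quad + \langle 0|\psi_n(x)\, |p|\, \frac{\partial^2}{\partial p_k \partial p_k} |p,\sigma\rangle . \qquad (A.43)
\end{aligned}
$$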
Eq. (A.35) tells us that the term in the final expression proportional to x0
vanishes, as it must.
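To calculate the derivatives of $\langle 0|\psi_n(0)|p,\sigma\rangle$ with respect to momentum,
we use Lorentz invariance. Combining Eq. (A.30) for x = 0 with Eq. (A.10),
we find
$$ i|p| \frac{\partial}{\partial p_k} \langle 0|\psi_n(0)|p,\sigma\rangle = \sum_m [J_{k0}]_{nm} \langle 0|\psi_m(0)|p,\sigma\rangle - \sigma\phi_{k0} \langle 0|\psi_n(0)|p,\sigma\rangle . $$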

Using this formula, and its derivative with respect to pk , together with
Eq. (A.13), we put Eq. (A.43) in the form
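$$
\begin{aligned}
\langle 0|\psi_n(x) K_0|p,\sigma\rangle
&= \frac{1}{|p|} \sum_m [J_{k0} J_{k0}]_{nm} \langle 0|\psi_m(x)|p,\sigma\rangle
 - \frac{2\sigma\phi_{k0}}{|p|} \sum_m [J_{k0}]_{nm} \langle 0|\psi_m(x)|p,\sigma\rangle \\
&\quad + \frac{\sigma^2}{|p|} \phi_{k0}\phi_{k0} \langle 0|\psi_n(x)|p,\sigma\rangle
 + 2\sigma\phi_{k0}\, x_k \langle 0|\psi_n(x)|p,\sigma\rangle \\
&\quad + \langle 0|\psi_n(x)\, |p|\, \frac{\partial^2}{\partial p_k \partial p_k} |p,\sigma\rangle
 + e^{ip\cdot x}\, \hat{p}_k \frac{\partial}{\partial p_k} \langle 0|\psi_n(0)|p,\sigma\rangle . \qquad (A.44)
\end{aligned}
$$
Using the formula (A.13) for $\phi_{k0}$, we can write
$$ \phi_{k0}\, x_k\, e^{ip\cdot x} = \phi_{k0} [x_k + \hat{p}_k x^0]\, e^{ip\cdot x} = -i\phi_{k0} \frac{\partial}{\partial p_k} e^{ip\cdot x} $$
and put the fourth term in Eq. (A.44) in the form
$$
\begin{aligned}
2\sigma\phi_{k0}\, x_k \langle 0|\psi_n(x)|p,\sigma\rangle
&= -2i\sigma\phi_{k0} \langle 0|\psi_n(x) \frac{\partial}{\partial p_k} |p,\sigma\rangle \\
&\quad + \frac{2\sigma}{|p|} \phi_{k0} \sum_m [J_{k0}]_{nm} \langle 0|\psi_m(x)|p,\sigma\rangle
 - \frac{2\sigma^2}{|p|} \phi_{k0}\phi_{k0} \langle 0|\psi_n(x)|p,\sigma\rangle .
\end{aligned}
$$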

The next-to-last term here cancels the second term in Eq. (A.44). Using
Eq. (A.40) again, Eq. (A.44) now takes almost the desired form:
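$$
\begin{aligned}
\langle 0|\psi_n(x) K_0|p,\sigma\rangle
&= \left\langle 0 \left| \psi_n(x) \left[ |p| \frac{\partial^2}{\partial p_k \partial p_k} - 2i\sigma\phi_{k0} \frac{\partial}{\partial p_k}
 - \frac{\sigma^2}{|p|} \phi_{k0}\phi_{k0} + \frac{A + B}{|p|} \right] \right| p,\sigma \right\rangle \\
&\quad + \frac{1}{|p|} \sum_m [J_{k0} J_{k0}]_{nm} \langle 0|\psi_m(x)|p,\sigma\rangle . \qquad (A.45)
\end{aligned}
$$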
|p| m

Comparing this with Eq. (A.24), we see that for a field ψ to describe a
massless particle of helicity σ, it is necessary that
$$ [J_{k0} J_{k0}]_{nm} = -\delta_{nm} \left[ \sigma^2 + A + B \right] = -\delta_{nm} \left[ (B - A)^2 + A + B \right] . \qquad (A.46) $$

But by using Eqs. (A.36) and (A.38), we see that

$$ [J_{k0} J_{k0}]_{nm} = [J_i J_i]_{nm} - 2\delta_{nm} [A(A + 1) + B(B + 1)] , \qquad (A.47) $$

so the requirement (A.46) is that

$$ [\mathbf{J}^2]_{nm} = \delta_{nm} \left[ -(B - A)^2 - A - B + 2A(A + 1) + 2B(B + 1) \right] . \qquad (A.48) $$

This rules out most irreducible representations of the Lorentz group, for
which $\mathbf{J}^2$ takes different values for different components. For instance,
in the (1/2, 1/2) four-vector representation, $\mathbf{J}^2$ takes the value 0 for the
time-component and the value 1(1 + 1) for the space components. The
only irreducible representations for which $\mathbf{J}^2$ takes the same value for all
components are the (2j + 1)-dimensional representations (j, 0) and (0, j), with
j a positive integer or half-integer. For all these representations Eq. (A.48)
is satisfied, since with either A = j and B = 0 or A = 0 and B = j, we have
$\mathbf{J}^2 = j(j + 1)$ and $-(B - A)^2 - A - B + 2A(A + 1) + 2B(B + 1) = j(j + 1)$.
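
The selection rule implied by Eq. (A.48) can be checked by brute force. The following is a minimal sketch, assuming only the standard fact that in the (A, B) representation the rotation generator is the sum of the two commuting angular momenta, so that j runs from |A − B| to A + B in integer steps; the function names and the cutoff at A, B ≤ 2 are illustrative choices, not part of the original argument.

```python
from fractions import Fraction

def j_squared_values(A, B):
    """Set of eigenvalues j(j+1) of J^2 in the (A, B) representation.

    J = A + B (sum of two commuting angular momenta), so j takes the
    values |A - B|, |A - B| + 1, ..., A + B.
    """
    j = abs(A - B)
    values = set()
    while j <= A + B:
        values.add(j * (j + 1))
        j += 1
    return values

def rhs_a48(A, B):
    """Right-hand side of Eq. (A.48) for given A and B."""
    return -(B - A) ** 2 - A - B + 2 * A * (A + 1) + 2 * B * (B + 1)

def allowed(A, B):
    """True if J^2 takes only the single value demanded by Eq. (A.48)."""
    return j_squared_values(A, B) == {rhs_a48(A, B)}

# Scan A, B = 0, 1/2, 1, 3/2, 2 (an arbitrary illustrative cutoff).
half_integers = [Fraction(n, 2) for n in range(5)]
allowed_reps = [(A, B) for A in half_integers for B in half_integers
                if allowed(A, B)]

for A, B in allowed_reps:
    print(f"({A}, {B})")
```

Within this range the scan leaves only representations with A = 0 or B = 0, including the scalar (0, 0), and it rejects the four-vector (1/2, 1/2), for which J² takes both values 0 and 2.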
We conclude then that the only primary field that in a conformally in-
variant theory can describe a massless particle of helicity σ is the (j, 0)
representation if σ = −j or the (0, j) representation if σ = j. The con-
formal dimensionalities of these fields are simply d = j + 1. Other fields
of type (A, B) with B − A = σ can describe a massless particle of helicity
σ = ±j, but these are spacetime derivatives of fields of type (j, 0) or (0, j),
and cannot have the conformal transformation properties (A.30)–(A.33) of
a primary field.

REFERENCES

1. The earliest reference seems to be D. Buchholz and K. Fredenhagen,
J. Math. Phys. 18, 1107 (1977). A much simpler and more transparent
proof for the case j = 0 has been given by E. Witten, private
communication. Witten’s proof is based on the known explicit form of
the two-point function for scalar fields in scale-invariant theories. The
proof presented here is a generalization of Witten’s proof to j ≠ 0, but
avoids having to work out the explicit form of the two-point function.

2. G. Mack, Commun. Math. Phys. 55, 1 (1977). Mack’s construction
of these representations is based on states with simple transformation
properties under the compact subgroup SO(4) × SO(2) of SO(4, 2),
rather than the states of definite momentum considered here in the
appendix, though Mack does identify the mass and helicity of states
belonging to various representations of SO(4) × SO(2). The massless
particle states considered in this paper correspond to item (5) in the
table of representations given in Section 1 of Mack’s paper. This
representation of the conformal group is sometimes called the “doubleton”
representation; see, e.g., E. Sezgin and P. Sundell, JHEP 0109, 036
(2001) [hep-th/0105001], Section 2.

3. This argument is spelled out by P. C. Argyres, M. R. Plesser, N.
Seiberg, and E. Witten, Nucl. Phys. B 461, 71 (1996).

4. F. Chan and H. F. Jones, Phys. Rev. D 19, 1321 (1974).

5. E. P. Wigner, in Theoretical Physics (International Atomic Energy
Agency, Vienna, 1963). For a textbook treatment, see S. Weinberg,
The Quantum Theory of Fields, Vol. I (Cambridge University Press,
Cambridge, UK, 1995), Sec. 2.5.

6. S. Weinberg, Phys. Rev. 134, B882 (1964). For a textbook treatment,
see S. Weinberg, The Quantum Theory of Fields, Vol. I (Cambridge
University Press, Cambridge, UK, 1995), Sec. 5.9.

UTTG-06-13
arXiv:1303.0342v1 [hep-ph] 2 Mar 2013

Tetraquark Mesons in Large-N Quantum Chromodynamics

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is argued that exotic mesons consisting of two quarks and two antiquarks
are not ruled out in quantum chromodynamics with a large number N of
colors, as generally thought. They can come in two varieties: short-lived
tetraquarks with decay rates proportional to N , which would be unobserv-
able if N were sufficiently large, and long-lived tetraquarks with decay rates
proportional to 1/N . The f0 (500) and f0 (980) may be examples of these
two varieties of exotic mesons.


∗ Electronic address: weinberg@physics.utexas.edu

The suggestion[1] to consider quantum chromodynamics in the limit of a
large number N of colors, with the gauge coupling g vanishing in this limit
as $1/\sqrt{N}$, has had some impressive success in reproducing qualitative
features of strong interaction phenomena. In his classic Erice lectures[2]
describing these results, Coleman concluded that, for large N , quantum
chromodynamics should not admit tetraquark mesons — exotic mesons that
are formed from a pair of quarks and a pair of antiquarks — a result that
has been widely accepted[3]. This note will complete Coleman’s argument
and then point out exceptions to his conclusion. The large N approximation
not only does not rule out tetraquark mesons, it helps to understand their
properties.
Coleman’s reasoning was as follows. By Fierz rearrangements of fermion
fields, any color-neutral operator formed from two quark and two antiquark
fields can be put in the form
$$ Q(x) = \sum_{ij} C_{ij}\, B_i(x) B_j(x) , \qquad (1) $$

where the $B_i(x)$ are various color-neutral quark bilinears:

$$ B_i(x) = \sum_a \overline{q^a}(x)\, \Gamma_i\, q^a(x) . \qquad (2) $$

Here $q^a$ is a column of canonically normalized quark fields,∗∗ with a an
N-component SU(N) color index and with spin and flavor indices suppressed;
the $\Gamma_i$ are various N-independent spin and flavor matrices; and the $C_{ij}$ are
some symmetric numerical coefficients, which we will take as N-independent.
Coleman considered the vacuum expectation value of two of these fields,
given by a decomposition into disconnected and connected parts
$$ \langle Q(x) Q(y) \rangle_0 = \sum_{ijkl} C_{ij} C_{kl} \Big[ \langle B_i(x) B_k(y) \rangle_0\, \langle B_j(x) B_l(y) \rangle_0 + \langle B_i(x) B_j(x) B_k(y) B_l(y) \rangle_{0,{\rm conn}} \Big] . \qquad (3) $$

(We drop disconnected terms that are coordinate-independent.) A
one-tetraquark pole can only appear in the final, connected, term, but according
to the usual rules for counting powers of N, the first term is of order $N^2$,
while the final term is only of order N, and so any one-tetraquark pole would
make a contribution in (3) that is relatively suppressed by a factor 1/N.

∗∗ In his article Coleman used bilinears $B_i'(x)$ defined to contain an extra factor of
$g^2 N^{1/2} \propto N^{-1/2}$. This makes no difference to results for observables.
So far, so good, but what does this really show? Coleman concluded “In
the large-N limit, quadrilinears make meson pairs and nothing else.” But is
this justified? If there is a tetraquark meson pole in the connected part of
the propagator, what difference does it make if its residue is small compared
with the disconnected part? To take an analogy, the amplitude for ordinary
meson-meson scattering is proportional to the connected part of a four-point
function involving four quark-antiquark bilinear operators, which is of order
N, while the disconnected parts of the same four-point function are of order
$N^2$. Does this mean that ordinary mesons do not scatter in the large N
limit?
The real question is the decay rate of a supposed tetraquark meson.
If the width of the tetraquark grows as some power of N , while its mass
is independent of N , then for very large N it may not be observable as a
distinct particle. Although Coleman did not address this issue, his discussion
does suggest that the rate for a tetraquark meson to decay into two ordinary
mesons does grow with N . As we will now see, this is correct, but with an
important exception.
To calculate decay rates, we need to represent particle states with
operators that are properly normalized to be used as LSZ interpolating fields.
The propagator for a quark bilinear operator $B_n(x)$ representing an ordinary
meson is proportional to N, but the residue of the pole in the propagator of a
properly normalized operator should be N-independent, so as noted by
Coleman, the properly normalized operators for creating and destroying ordinary
mesons are $N^{-1/2} B_n(x)$. Similarly, if there is a one-tetraquark pole in the
connected term in (3), then since the connected term in Eq. (3) is of order
N, the correctly normalized operator for creating or destroying a tetraquark
meson is $N^{-1/2} Q(x)$. The amplitude for the decay of a tetraquark meson
into ordinary mesons of type n and m is then proportional to a suitable
Fourier transform of the three-point function

$$
\begin{aligned}
N^{-3/2} \langle T\{Q(x) B_n(y) B_m(z)\} \rangle_0
&= N^{-3/2} \sum_{ij} C_{ij}\, \langle T\{B_i(x) B_n(y)\} \rangle_0\, \langle T\{B_j(x) B_m(z)\} \rangle_0 \\
&\quad + N^{-3/2} \langle T\{Q(x) B_n(y) B_m(z)\} \rangle_{0,{\rm conn}} . \qquad (4)
\end{aligned}
$$

The connected second term on the right is of order $N^{-3/2} N = N^{-1/2}$, but
the first term is larger, of order $N^{-3/2} N^2 = N^{1/2}$, giving a decay rate
proportional to N. In this case a tetraquark meson would become unobservable
for N → ∞, though one may still wonder about the relevance of this result.
The physical value N = 3 may or may not be taken as large, but it can’t be
regarded as infinite.
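
The power counting behind the two kinds of decay rate can be tabulated mechanically. This is only a bookkeeping sketch: the exponents are the leading-order values quoted in the text around Eq. (4), and the variable and function names are illustrative.

```python
from fractions import Fraction

# Exponents of N, taken from the counting described in the text.
NORMALIZATION = Fraction(-3, 2)  # N^{-1/2} each for Q, B_n and B_m
DISCONNECTED = 2                 # product of two bilinear two-point functions, each ~ N
CONNECTED = 1                    # connected three-point function ~ N

def rate_exponent(correlator_exponent):
    """Power of N in the decay rate, which goes as the squared amplitude."""
    amplitude_exponent = NORMALIZATION + correlator_exponent
    return 2 * amplitude_exponent

short_lived = rate_exponent(DISCONNECTED)  # first term in Eq. (4)
long_lived = rate_exponent(CONNECTED)      # second term in Eq. (4)

print(f"rate from disconnected term ~ N^({short_lived})")
print(f"rate from connected term    ~ N^({long_lived})")
```

The two exponents come out as +1 and −1, matching the rates proportional to N and to 1/N quoted in the abstract.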
In any case, there is an exception to the rule that tetraquark mesons
become increasingly unstable for increasing N . It may be that the bilinears
$B_i(x)$ in Eq. (1) have quantum numbers that do not match those of any
meson light enough to appear in the decay of the tetraquark meson repre-
sented by Q(x). In that case the tetraquark decay amplitude would arise
entirely from the second term in Eq. (4), which would give a decay rate of
the tetraquark into two light ordinary mesons proportional to 1/N , just as
in the decay of ordinary mesons.
We can find examples of both kinds. For an example of a short-lived
tetraquark, consider a $J^{PC} = 0^{++}$ isoscalar tetraquark meson, represented
by the operator
$$ Q(x) = \sum_{ab} \left( \overline{q^a}(x)\, \gamma_5\, \vec{t}\, q^a(x) \right) \cdot \left( \overline{q^b}(x)\, \gamma_5\, \vec{t}\, q^b(x) \right) , \qquad (5) $$

where $\vec{t}$ is an isospin matrix. In this case the decay into two pions can
proceed through the first term in Eq. (4), and the decay rate is of order N.
This may be the case for a plausible tetraquark[4], the very broad $f_0(500)$,
which has a width for two-pion decay of 400 to 700 MeV.
For an example of a long-lived tetraquark, consider the case of a different
$J^{PC} = 0^{++}$ isoscalar tetraquark meson represented by the operator
$$ Q(x) = \left( \sum_a \overline{u^a}(x)\, \gamma_5\, u^a(x) + \sum_a \overline{d^a}(x)\, \gamma_5\, d^a(x) \right)^2 , \qquad (6) $$

or
$$ Q(x) = \left| \sum_a \overline{s^a}(x)\, \gamma_5\, u^a(x) \right|^2 + \left| \sum_a \overline{s^a}(x)\, \gamma_5\, d^a(x) \right|^2 . \qquad (7) $$

The lightest mesons with the quantum numbers of these choices of B(x) are
the η(548) and the K(495). If a $0^{++}$ tetraquark meson represented by (6)
or (7) is lighter than $2m_\eta$ or $2m_K$, respectively, its decay would receive
no contribution from the first term in Eq. (4). Its decay amplitude would
arise entirely from the second term in Eq. (4), which would give a decay
rate (for instance, into two pions) proportional to 1/N . This may be the
case for instance for the $f_0(980)$, which is plausibly identified as a tetraquark
meson[4], and has a width of only 40 to 100 MeV. The large N approximation
not only does not rule out such exotic mesons, it can explain why they
are narrow.
The large N approximation gives an objective meaning to a statement
that a tetraquark represented by a product $B_1(x) B_2(x)$ of quark bilinears is
a composite of the ordinary mesons represented by $B_1(x)$ and $B_2(x)$, even
where the tetraquark meson is much lighter or much heavier than the sum of
these ordinary meson masses. It is not only that the two-meson intermediate
state dominates the propagator of the tetraquark operator Q(x), as shown
by Coleman. More relevant to experiment, the contribution of a two-meson
state to meson–meson scattering is proportional to $[(N^{-1/2})^4 N]^2 = 1/N^2$,
while since the amplitude for the tetraquark to go into two ordinary mesons
is proportional to $N^{1/2}$, the contribution of a tetraquark pole (if there is
one) is proportional to $[N^{1/2}]^2 = N$. Hence, whatever its mass, for large N
the one-tetraquark intermediate state dominates the scattering of these two
ordinary mesons in the partial wave with the same quantum numbers as the
tetraquark.
It would be interesting to apply this analysis to a wider variety of
tetraquarks, with quantum numbers other than $0^{++}$, T = 0, and also taking
flavor SU (3) symmetry into account.

I am grateful to Frank Close and Philip Page for helpful correspondence,
and to Tamar Friedmann for a seminar talk that spurred my interest in
tetraquarks. This material is based upon work supported by the National
Science Foundation under Grant Number PHY-0969020 and with support
from The Robert A. Welch Foundation, Grant No. F-0014.

———-

1. G. ’t Hooft, Nucl. Phys. B75, 461 (1974).

2. S. Coleman, Aspects of Symmetry (Cambridge University Press,
Cambridge, UK, 1985), pp. 377-378.

3. For instance, see P. R. Page, in Intersections of Particle and Nuclear
Physics: 8th Conference, ed. Z. Parsa (American Institute of Physics,
2003), p. 513.

4. R. L. Jaffe, Phys. Rev. D 15, 267 (1977); F. E. Close and N. A.
Törnqvist, J. Phys. G 28, R249 (2002) [hep-ph/0204205]; T. Friedmann,
to appear in Eur. Phys. J. C [arXiv:0910.2229].

UTTG-09-13
TCC-0006-13
arXiv:1305.1971v1 [astro-ph.CO] 8 May 2013

Goldstone Bosons as Fractional Cosmic Neutrinos

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is suggested that Goldstone bosons may be masquerading as fractional
cosmic neutrinos, contributing about 0.39 to what is reported as the effective
number of neutrino types in the era before recombination. The broken
symmetry associated with these Goldstone bosons is further speculated to
be the conservation of the particles of dark matter.


∗ Electronic address: weinberg@physics.utexas.edu

The correlations of temperature fluctuations in the cosmic microwave
background depend on the effective number Neff of neutrino species present
in the era before recombination. Although observations are certainly con-
sistent with the expected value Neff = 3, there have been persistent hints
in the data that the effective number may be somewhat greater. WMAP9
together with ground-based observations (WMAP9 +eCMB)[1] gave Neff =
3.89 ± 0.67, while Planck together with the WMAP9 polarization data and
ground-based observations (Planck+WP+highL)[2] gives Neff = 3.36 ± 0.34,
both at the 68% confidence level. Is it possible that some nearly massless
weakly interacting particle is masquerading as a fractional cosmic neutrino?
As a candidate for an imposter fractional neutrino, one naturally thinks
of Goldstone bosons, associated with the spontaneous breakdown of some
exact or nearly exact global continuous symmetry. They would of course
be massless or nearly massless, and the characteristic derivative coupling of
Goldstone bosons would make them weakly interacting at sufficiently low
temperatures.
Since Fermi statistics reduces the energy density of neutrinos relative
to massless bosons by a factor 7/8, and Neff lumps antineutrinos with neu-
trinos, a neutral Goldstone boson might look like (1/2)/(7/8) = 4/7 of a
neutrino. But for this to be true, there is an important qualification: the
Goldstone bosons must remain in thermal equilibrium with ordinary parti-
cles until after the era of muon annihilation, so that the temperature of the
Goldstone bosons matches the neutrino temperature. If Goldstone bosons
went out of equilibrium much earlier, then neutrinos but not Goldstone
bosons would have been heated by the annihilation of the various species of
particles of the Standard Model, and the contribution of Goldstone bosons
to Neff would be much less than 4/7. As we shall see, there is a plausible
intermediate possibility, that the contribution of Goldstone bosons to Neff
would be $(4/7)(43/57)^{4/3} = 0.39$. To judge when the Goldstone bosons went
out of thermal equilibrium, we need a specific theory[3].
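
The two numbers quoted in this paragraph follow from short arithmetic. The sketch below only reproduces them from the factors stated in the text; the variable names are illustrative, and the ratio 43/57 is taken as given here.

```python
from fractions import Fraction

# One bosonic state (a neutral Goldstone boson) compared with the two
# fermionic states (neutrino plus antineutrino) that count as one unit of N_eff.
fermi_factor = Fraction(7, 8)   # fermion energy density relative to a boson
boson_states = 1
neutrino_states = 2
equilibrium_share = Fraction(boson_states, neutrino_states) / fermi_factor

# If the Goldstone bosons decouple before muon annihilation, their temperature
# is lower than the neutrino temperature by (43/57)^(1/3); energy density ~ T^4.
temperature_ratio = (43 / 57) ** (1 / 3)
contribution = float(equilibrium_share) * temperature_ratio ** 4

print(equilibrium_share)        # the fraction 4/7
print(round(contribution, 2))   # the value quoted as 0.39
```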
We will consider the simplest possible broken continuous symmetry, a
U (1) symmetry associated with the conservation of some quantum number
W . All fields of the Standard Model are supposed to have W = 0. To allow
in the simplest way for the breaking of this symmetry, we introduce a single
complex scalar field χ(x), neutral under SU (3)⊗SU (2)⊗U (1), which carries
a non-vanishing value of W . With this field added to the Standard Model,
the most general renormalizable Lagrangian is
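$$ \mathcal{L} = -\frac{1}{2} \partial_\mu \chi^\dagger \partial^\mu \chi + \frac{1}{2} \mu^2 \chi^\dagger \chi - \frac{1}{4} \lambda \left( \chi^\dagger \chi \right)^2 - \frac{g}{4} \left( \chi^\dagger \chi \right) \left( \varphi^\dagger \varphi \right) + \mathcal{L}_{\rm SM} , \qquad (1) $$
where $\mu^2$, g, and λ are real constants; $\mathcal{L}_{\rm SM}$ is the usual Lagrangian of the
Standard Model; and $\varphi = (\varphi^0, \varphi^-)$ is the Standard Model’s scalar doublet.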
Experience with the linear σ-model shows that with a Lagrangian like (1),
there are several diagrams in each order of perturbation theory that must
be added up in order to give matrix elements that agree with theorems
governing soft Goldstone bosons. To avoid this, it is better to separate a
massless Goldstone boson field α(x) and a massive “radial” field r(x) by
defining
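$$ \chi(x) = r(x)\, e^{2i\alpha(x)} , \qquad (2) $$
where r(x) and α(x) are real, with the phase of χ(x) adjusted to make
$\langle \alpha(x) \rangle = 0$. (The 2 in the exponent is for future convenience.) The
Lagrangian (1) then takes the form
$$ \mathcal{L} = -\frac{1}{2} \partial_\mu r\, \partial^\mu r + \frac{1}{2} \mu^2 r^2 - \frac{1}{4} \lambda r^4 - 2 r^2 \partial_\mu \alpha\, \partial^\mu \alpha - \frac{g}{4} r^2 \left( \varphi^\dagger \varphi \right) + \mathcal{L}_{\rm SM} . \qquad (3) $$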
4
The SU(2) ⊗ U(1) symmetry of the Standard Model is of course broken
by a non-vanishing vacuum expectation value of the field $\varphi^0$, with a real
zeroth-order value $\langle\varphi\rangle \simeq 247$ GeV. The U(1) symmetry of W conservation
is also broken if $(\mu^2 - g\langle\varphi\rangle^2)/\lambda$ is positive, in which case r gets a real vacuum
expectation value, given in zeroth order by
$$ \langle r \rangle = \sqrt{m_r^2 / 2\lambda} , \qquad m_r^2 \equiv \mu^2 - g\langle\varphi\rangle^2 / 2 . \qquad (4) $$
In this formalism, the interaction of Goldstone bosons with the particles
of the Standard Model arises entirely from a mixing of the radial boson with
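the Higgs boson. There is a term $-g\langle\varphi\rangle\langle r\rangle \varphi' r'$ in the Lagrangian (3), where
$r' \equiv r - \langle r \rangle$ and $\varphi' \equiv {\rm Re}\,\varphi^0 - \langle\varphi\rangle$, so that the fields describing neutral
spinless particles of definite non-zero mass are not precisely $\varphi'$ and $r'$, but
instead $\cos\theta\, \varphi' + \sin\theta\, r'$ and $-\sin\theta\, \varphi' + \cos\theta\, r'$, with the mixing angle given
by
$$ \tan 2\theta = \frac{g\langle\varphi\rangle\langle r\rangle}{m_\varphi^2 - m_r^2} . \qquad (5) $$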
Since only one Higgs boson has been discovered at CERN[4], with what ap-
pear to be the production rate and decay properties expected in the Stan-
dard Model, this mixing must be weak. We will make the assumption that
| tan 2θ| ≪ 1, and return soon to the question whether this is plausible.

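This ϕ−r mixing allows the Higgs boson to decay into a pair of Goldstone
bosons. The fourth term in (3) contains an interaction $(1/2\langle r\rangle)\, r' \partial_\mu \alpha' \partial^\mu \alpha'$,
where $\alpha' \equiv 2\langle r\rangle \alpha$ is the canonically normalized Goldstone boson field.
Together with one vertex of the mixing term $-g\langle\varphi\rangle\langle r\rangle \varphi' r'$, this gives a partial
width
$$ \Gamma_{\varphi \to 2\alpha} = \frac{g^2 \langle\varphi\rangle^2 m_\varphi^3}{16\pi (m_\varphi^2 - m_r^2)^2} . \qquad (6) $$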
Taking $\langle\varphi\rangle = 247$ GeV, $m_\varphi = 125$ GeV, and assuming $m_\varphi \gg m_r$, this
partial width is $9.7\, g^2$ GeV. The Goldstone bosons interact very weakly
with particles of the Standard Model, so these decays would be unobserved.
But under the assumption that the production and decays of the Higgs
boson are correctly described by the Standard Model aside perhaps from
decay into some new unobserved particles, the branching ratio for decay
into new unobserved particles is known to be less than about 19% [5], so
with a Higgs width of about 4 MeV, the partial width (6) must be less than
0.8 MeV, and therefore |g| < 0.009. With g this small, and again assuming
that $m_\varphi \gg m_r$, the mixing parameter (5) is indeed much less than one,
provided that $\langle r \rangle$ is much less than 7 TeV, which seems not implausible.
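
The numbers in this paragraph can be reproduced directly from Eq. (6). The following is a rough numerical sketch using only the inputs quoted in the text (247 GeV, 125 GeV, a 0.8 MeV ceiling on the partial width); the function and variable names are illustrative, and the last line assumes that the mixing parameter is judged by the combination g⟨ϕ⟩⟨r⟩/m_ϕ² reaching one.

```python
import math

V = 247.0      # <phi> in GeV, as quoted in the text
M_PHI = 125.0  # Higgs mass in GeV, as quoted in the text

def width_to_goldstones(g, m_r=0.0):
    """Partial width of Eq. (6) for phi -> 2 alpha, in GeV."""
    return g**2 * V**2 * M_PHI**3 / (16 * math.pi * (M_PHI**2 - m_r**2) ** 2)

# Coefficient of g^2 for m_r << m_phi; the text quotes 9.7 GeV.
coefficient = width_to_goldstones(1.0)

# Largest |g| allowed if the partial width must stay below 0.8 MeV.
max_width = 0.8e-3  # GeV
g_max = math.sqrt(max_width / coefficient)

# Value of <r> (in GeV) at which g <phi> <r> / m_phi^2 would reach one.
r_max = M_PHI**2 / (g_max * V)

print(f"width = {coefficient:.2f} g^2 GeV")
print(f"|g| < {g_max:.4f}")
print(f"<r> scale = {r_max / 1000:.1f} TeV")
```

With these inputs the sketch returns a coefficient close to 9.7 GeV, a bound on |g| close to 0.009, and a scale for ⟨r⟩ close to 7 TeV.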
Now, back to the problem of when the Goldstone bosons cease being in
thermal equilibrium with the particles of the Standard Model. The joint
action of the previously discussed terms $-(1/2\langle r\rangle)\, r' \partial_\mu \alpha' \partial^\mu \alpha'$ and $-g\langle\varphi\rangle\langle r\rangle \varphi' r'$
in the Lagrangian (3) produces an effective interaction between low-energy
Goldstone bosons and any fermion F of the Standard Model:
$$ -\frac{g\, m_F}{2 m_r^2 m_\varphi^2}\, \partial_\mu \alpha'\, \partial^\mu \alpha'\, \bar{F} F . \qquad (7) $$

At a temperature T, the derivatives in Eq. (7) yield factors of order kT,
and the number density of any particle with mass of order kT or less is of
order $(kT)^3$, so the rate of collisions of Goldstone bosons with any species
of fermion F with mass $m_F$ at or below kT is of order $g^2 m_F^2 (kT)^7 / m_r^4 m_\varphi^4$.
The expansion rate of the universe is of order $(kT)^2 / m_{\rm PL}$ where $m_{\rm PL}$ is the
Planck mass, so the ratio of these two rates is

$$ \frac{\rm collision}{\rm expansion} \approx \frac{g^2 m_F^2 (kT)^5 m_{\rm PL}}{m_r^4 m_\varphi^4} . \qquad (8) $$

This is a crude estimate, but the ratio decreases so rapidly with temperature
that it gives a fair idea of when the Goldstone bosons go out of equilibrium.

As mentioned earlier, if Goldstone bosons go out of equilibrium before
kT falls below the mass of most of the particles of the Standard Model,
then the neutrinos (which are in thermal equilibrium at these temperatures)
will be heated by the annihilation of Standard Model particles while the
Goldstone bosons will not, and the contribution of Goldstone bosons to
Neff will be much less than 4/7. But suppose that Goldstone bosons go
out of equilibrium while kT is still above the mass of muons and electrons
but below the mass of all other particles of the Standard Model, a time
when neutrinos are still in thermal equilibrium. The cosmic entropy density
just before the annihilation of muons, taking account of photons, muons,
electrons, and three species of neutrinos, is
$$ s = \frac{4 a_B T^3}{3} \left( 1 + \frac{7}{4} + \frac{7}{4} + \frac{21}{8} \right) , $$
while after muon annihilation it is
$$ s = \frac{4 a_B T^3}{3} \left( 1 + \frac{7}{4} + \frac{21}{8} \right) , $$
where $a_B$ is the radiation energy constant. The constancy of the entropy
per co-moving volume $s a^3$ tells us that for particles like neutrinos that are
in thermal equilibrium, T a must increase by a factor $(57/43)^{1/3}$, while for
free Goldstone bosons T a is constant, so that Goldstone bosons make a
contribution to the measured Neff equal to $(4/7)(43/57)^{4/3} = 0.39$, which at
least for the present seems in good agreement with observation. For this to
be the case, the ratio (8) must equal unity when $m_F = m_\mu$ and $kT \approx m_\mu$,
so that
$$ \frac{g^2 m_\mu^7 m_{\rm PL}}{m_r^4 m_\varphi^4} \approx 1 . \qquad (9) $$
For instance, with g = 0.005 and $m_\varphi = 125$ GeV, this tells us that $m_r \approx 500$
MeV. (In order for the Goldstone bosons to go out of equilibrium when the
only massive Standard Model particles left are electrons and positrons, in
which case they make a contribution to Neff equal to 4/7, the value of $m_r$
would have to be less than given by Eq. (9) by a factor between $(m_e/m_\mu)^{1/2}$
and $(m_e/m_\mu)^{7/4}$.)
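
The entropy bookkeeping and the estimate from Eq. (9) can be sketched numerically. The statistical weights below are the standard ones (a fermionic state counts 7/8 of a bosonic state, normalized to the two photon polarizations); the muon mass of 0.1057 GeV and the Planck mass of 1.22 × 10^19 GeV are assumptions of this sketch, since the text does not state which values were used, and the names are illustrative.

```python
from fractions import Fraction

# Entropy weights relative to photons: bosons g/2, fermions (7/8) g/2.
photons = Fraction(1)
electrons = Fraction(7, 8) * 4 / 2   # e- and e+, two spin states each
muons = Fraction(7, 8) * 4 / 2       # mu- and mu+, two spin states each
neutrinos = Fraction(7, 8) * 6 / 2   # three species, nu and anti-nu, one helicity each

before = photons + muons + electrons + neutrinos  # just before muon annihilation
after = photons + electrons + neutrinos           # just after muon annihilation
heating = before / after                          # growth factor of (T a)^3

def radial_mass(g, m_phi=125.0, m_mu=0.1057, m_planck=1.22e19):
    """Solve Eq. (9) for m_r, with all masses in GeV.

    m_mu and m_planck are assumed values (see the lead-in); Eq. (9) is an
    order-of-magnitude condition, so only the scale of the result matters.
    """
    return (g**2 * m_mu**7 * m_planck) ** 0.25 / m_phi

print(before, after, heating)
print(f"m_r ~ {radial_mass(0.005):.2f} GeV")
```

The weights sum to 57/8 and 43/8, which gives the factor 57/43 used in the text. With the assumed inputs Eq. (9) yields roughly 0.65 GeV for g = 0.005, the same order as the quoted 500 MeV; the difference is well within the accuracy of such an estimate.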
Another consequence of the term $-(1/2\langle r\rangle)\, r' \partial_\mu \alpha' \partial^\mu \alpha'$ in the Lagrangian
is that the massive r bosons decay rapidly into Goldstone boson pairs. Even
for $\langle r \rangle$ as large as 7 TeV, and taking $m_r = 500$ MeV, the radial boson lifetime
would be at most of order $10^{-16}$ seconds, so they would be long gone at any
era with which we are concerned here.

We can further speculate about the physical significance of the assumed
broken U (1) symmetry. There is no room for a new broken global symmetry
in the Standard Model, so it is natural to think of a symmetry associated with
particles not described by the Standard Model, but known to be abundant
in the universe — that is, with dark matter. We will now assume that
the conserved quantum number W associated with the global U (1) symme-
try introduced above is WIMP number, the number of weakly interacting
massive particles minus the number of their antiparticles. We introduce a
single complex Dirac WIMP field ψ(x), carrying WIMP quantum number
W = +1, and give the scalar field χ(x) WIMP quantum number W = +2, so
that its expectation value leaves an unbroken reflection symmetry ψ → −ψ.
All the fields of the Standard Model are again assumed to have W = 0.
The most general renormalizable term involving the WIMP field that can
be added to the Lagrangian (1) is

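$$ \mathcal{L}_\psi = -\bar{\psi} \gamma^\mu \partial_\mu \psi - m_\psi \bar{\psi} \psi - \frac{f}{2}\, \overline{\psi^c}\, \psi\, \chi^\dagger - \frac{f^*}{2}\, \bar{\psi}\, \psi^c\, \chi , \qquad (10) $$
where $\psi^c$ is the charge-conjugate field∗∗; $m_\psi$ and f are constants; and by a
choice of phase of ψ we can make f as well as $m_\psi$ real. If together with the
definition (2), we define a field $\psi'(x)$ by

$$ \psi(x) = \psi'(x)\, e^{i\alpha(x)} , \qquad (11) $$

the WIMP Lagrangian (10) then becomes

$$ \mathcal{L}_\psi = -\bar{\psi}' \gamma^\mu \partial_\mu \psi' - m_\psi \bar{\psi}' \psi' - i \bar{\psi}' \gamma^\mu \psi'\, \partial_\mu \alpha - \frac{f}{2}\, \overline{\psi'^c}\, \psi'\, r - \frac{f}{2}\, \bar{\psi}'\, \psi'^c\, r . \qquad (12) $$
Because r has a non-zero vacuum expectation value $\langle r \rangle$, the WIMP fields
with definite mass are a pair of self-charge-conjugate fields
$$ \psi_\pm(x) = \frac{1}{\sqrt{2}} \left( \psi'(x) \pm \psi'^c(x) \right) , \qquad (13) $$
with masses
$$ m_\pm = m_\psi \pm \langle r \rangle f . \qquad (14) $$

∗∗ That is, $\psi^c$ is the complex conjugate of ψ, multiplied by a matrix $C^{-1}\beta$ (in the notation
of ref. 6) that gives $\psi^c$ the same Lorentz transformation properties as ψ.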

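The part of the Lagrangian that involves the WIMP fields can then be put
in the form
$$
\begin{aligned}
\mathcal{L}_\psi &= -\frac{1}{2} \sum_\pm \left[ \bar{\psi}_\pm \gamma^\mu \partial_\mu \psi_\pm + m_\pm \bar{\psi}_\pm \psi_\pm \right]
 - \frac{i}{2} \left[ \bar{\psi}_+ \gamma^\mu \psi_- + \bar{\psi}_- \gamma^\mu \psi_+ \right] \partial_\mu \alpha \\
&\quad - \frac{f}{2}\, r' \left[ \bar{\psi}_+ \psi_+ + \bar{\psi}_- \psi_- \right] , \qquad (15)
\end{aligned}
$$
where again, $r' \equiv r - \langle r \rangle$.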
We see that instead of one Dirac WIMP, there are two Majorana WIMPs
of different mass. But the heavier WIMP will decay into the lighter one
by emitting a Goldstone boson, while the lighter one is kept stable by an
unbroken reflection symmetry, so in this theory we can expect that the
present universe will contain only one kind of Majorana WIMP, the lighter
one w, with mass $m_w$ equal to the smaller of $m_\pm$.
The r−ϕ mixing allows the Higgs boson to decay into pairs of the lighter
WIMPs, if they are lighter than $m_\varphi/2$. In this case, the partial width for
this decay is
$$ \Gamma_{\varphi \to 2w} = \frac{1}{32\pi} \left( \frac{f g \langle r \rangle \langle\varphi\rangle}{m_\varphi^2 - m_r^2} \right)^2 \sqrt{m_\varphi^2 - 4 m_w^2} . \qquad (16) $$
As we have seen, observations require this to be less than about 0.8 MeV.
Taking $m_r$ and $2m_w$ much less than $m_\varphi$, this condition tells us that the
WIMP mass splitting $\Delta m \equiv |m_+ - m_-| = 2|\langle r \rangle f|$ satisfies $|g|\Delta m < 3.2$
GeV, a constraint that will be useful in what follows.
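
The bound on |g|Δm follows from Eq. (16) in the limit stated in the text. The sketch below is a numerical check under that limit, using the quoted inputs (247 GeV, 125 GeV, 0.8 MeV) and replacing |⟨r⟩f| by Δm/2 as in Eq. (14); the function name and argument layout are illustrative.

```python
import math

V = 247.0      # <phi> in GeV
M_PHI = 125.0  # Higgs mass in GeV

def width_to_wimps(g_delta_m, m_w=0.0, m_r=0.0):
    """Partial width of Eq. (16) in GeV, with |<r> f| written as delta_m / 2.

    g_delta_m is the product |g| * delta_m in GeV.
    """
    coupling = 0.5 * g_delta_m * V / (M_PHI**2 - m_r**2)
    return coupling**2 * math.sqrt(M_PHI**2 - 4 * m_w**2) / (32 * math.pi)

# Largest |g| * delta_m compatible with a partial width below 0.8 MeV,
# taking m_r and 2 m_w much less than m_phi.
max_width = 0.8e-3  # GeV
bound = math.sqrt(max_width / width_to_wimps(1.0))

print(f"|g| * delta_m < {bound:.1f} GeV")
```

The result agrees with the 3.2 GeV quoted in the text to the precision given there.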
The surviving WIMPs can annihilate in pairs through their interaction
with Goldstone bosons and with the field $r'$, which mediates interactions
both with Goldstone and radial bosons and with the particles of the
Standard Model. It is well known that in order for annihilation of WIMPs to give
a dark matter density like that observed, it is necessary for the annihilation
cross-section to satisfy[7]
$$ m_w \left( \frac{2\pi \sum \langle \sigma v \rangle}{G_{\rm wk}^2\, m_w^2} \right)^{0.51} \simeq 3.7\ {\rm GeV} \times (2\Omega_D h^2)^{-0.54} \simeq 9\ {\rm GeV} , \qquad (17) $$

where $\Omega_D h^2 \simeq 0.105$ is the usual dark matter density parameter; the sum
is taken over all annihilation channels; and $G_{\rm wk} \simeq 10^{-5}\ {\rm GeV}^{-2}$ is the weak
coupling constant. In what follows we will simplify our estimates by
replacing the exponent 0.51 with 1/2.
One possibility is annihilation into a quark q and its antiparticle. The
combination of the interactions $f \bar{\psi}_\pm \psi_\pm r'$, the mixing term $-g\langle\varphi\rangle\langle r\rangle \varphi' r'$
and the Standard Model interaction $(m_q/\langle\varphi\rangle)\, \bar{q} q \varphi'$ gives an effective cross
section for annihilation of cold WIMP pairs into a relativistic quark q and
its antiquark:
$$ \sum \langle \sigma v \rangle = \frac{3}{2\pi} \left( \frac{g\, m_q\, m_w\, \Delta m}{2 (4 m_w^2 - m_r^2)(4 m_w^2 - m_\varphi^2)} \right)^2 , \qquad (18) $$

in which we have used Eq. (14) to express $|\langle r \rangle f|$ as $\Delta m / 2$.


For heavy WIMPs, with $m_w$ much larger than the mass $m_t$ of the top
quark, the quark produced in WIMP annihilation would be the top quark, in
which case Eq. (17) (with $2m_w$ much larger than $m_r$ and $m_\varphi$) requires that
$\Delta m / m_w = (32\, m_w^2 G_{\rm wk} / \sqrt{3}\, |g|\, m_t) \times 9\ {\rm GeV} \gg 32$, requiring $|m_\psi|$ and $|\langle r \rangle f|$ to
differ by much less than 6%.
The fine tuning problem is worse for $m_t > m_w \gg m_\varphi/2$. In this case
the quark produced in WIMP annihilation would be the bottom quark, and
Eq. (17) requires that $\Delta m / m_w = (32\, m_w^2 G_{\rm wk} / \sqrt{3}\, |g|\, m_b) \times 9\ {\rm GeV} \gg 160$, which
would require $|m_\psi|$ and $|\langle r \rangle f|$ to differ by much less than 1%.
The case $m_\varphi \gg 2m_w \gg m_r$ is even less promising. In this case, Eq. (17)
gives $m_w \simeq \sqrt{3}\, m_q |g| \Delta m / [8 m_\varphi^2 G_{\rm wk} (9\ {\rm GeV})]$. With the previously derived
upper bound $|g|\Delta m < 3.2$ GeV, this requires that $m_w < 0.49\, m_q$, which is
clearly impossible if cold w pairs are to annihilate into $q + \bar{q}$.
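
The three numerical statements in these paragraphs can be reproduced from the formulas as written. This is a sketch, not a replacement for the estimates in the text: it inserts the upper limit |g| = 0.009 and evaluates each expression at the lower edge of the stated mass range, and the quark masses used (173 GeV for the top, 4.5 GeV for the bottom) are assumptions, since the text does not quote them. All names are illustrative.

```python
import math

G_WK = 1.0e-5   # weak coupling constant in GeV^-2, as quoted in the text
TARGET = 9.0    # right-hand side of Eq. (17) in GeV
G_MAX = 0.009   # upper limit on |g| derived earlier in the text
M_PHI = 125.0   # Higgs mass in GeV

def splitting_ratio(m_w, m_q, g=G_MAX):
    """Delta m / m_w required by Eq. (17) when 2 m_w >> m_r and m_phi."""
    return 32 * m_w**2 * G_WK * TARGET / (math.sqrt(3) * g * m_q)

# Heavy WIMPs annihilating into top quarks, evaluated at m_w = m_t (assumed 173 GeV).
top_case = splitting_ratio(m_w=173.0, m_q=173.0)

# Intermediate WIMPs annihilating into bottom quarks, evaluated at
# m_w = m_phi / 2 with an assumed bottom mass of 4.5 GeV.
bottom_case = splitting_ratio(m_w=M_PHI / 2, m_q=4.5)

# Light WIMPs: upper limit on m_w / m_q using |g| * delta_m < 3.2 GeV.
light_case = math.sqrt(3) * 3.2 / (8 * M_PHI**2 * G_WK * TARGET)

# Since delta_m = 2 |<r> f|, the relative difference between |m_psi| and
# |<r> f| is m_w / |<r> f| = 2 m_w / delta_m.
tuning_top = 2 / top_case
tuning_bottom = 2 / bottom_case

print(f"top:    delta_m/m_w >> {top_case:.0f}, tuning << {tuning_top:.1%}")
print(f"bottom: delta_m/m_w >> {bottom_case:.0f}, tuning << {tuning_bottom:.1%}")
print(f"light:  m_w < {light_case:.2f} m_q")
```

With these assumed quark masses the ratios come out close to 32 and 160, the corresponding tunings close to 6% and 1%, and the light-WIMP limit close to 0.49 m_q.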
It appears that if ∆m and mw are of comparable magnitude, then the
annihilation of these WIMPs into quarks may not be sufficiently fast to bring
the dark matter density down to the observed value. Inclusion of annihila-
tion into leptons and gauge bosons helps this problem, but apparently not
enough. Of course, we could make the annihilation cross-section as large
as we like by taking $2m_w$ sufficiently close to $m_\varphi$ (or $m_r$). Otherwise, the
dominant annihilation could be into pairs of Goldstone bosons (and perhaps
radial bosons). The cross-section here is of order $f^4/m_w^2$, so condition (17)
would require that $m_w \approx 10^4 f^2$ GeV.
Unfortunately there are too many free parameters here to allow a definite
conclusion whether the density of WIMPs in this theory does or does not
match the observed density of dark matter.

I am grateful for a helpful correspondence with Eiichiro Komatsu, for
a valuable suggestion by Jacques Distler, and for information provided by
Can Kilic and Matthew McCullough. This material is based upon work
supported by the National Science Foundation under Grant Number
PHY-0969020 and with support from The Robert A. Welch Foundation, Grant
No. F-0014.

———-

1. G. Hinshaw et al. [WMAP collaboration], arXiv:1212.5226; S. Das et
al. [ACT collaboration], Astrophys. J. 729, 62 (2011); R. Keisler et
al. [SPT collaboration], Astrophys. J. 743, 28 (2011).

2. P. A. R. Ade et al. [Planck collaboration], arXiv:1303.5076; S. Das et
al. [ACT collaboration], arXiv:1301.1037; C. L. Reichardt et al. [SPT
collaboration], Astrophys. J. 755, 70 (2012).

3. The possibility that axion-like Goldstone bosons contribute to $N_{\rm eff}$ was
mentioned along with other possible contributions by K. Nakayama,
F. Takahashi, and T. T. Yanagida, Phys. Lett. B 697, 275 (2011),
without addressing the question of thermal equilibrium between these
Goldstone bosons and Standard Model particles.

4. G. Aad et al. [ATLAS Collaboration], Phys. Lett. B 716, 1 (2012);
S. Chatrchyan et al. [CMS Collaboration], Phys. Lett. B 716, 30
(2012).

5. P. P. Giardino, K. Kannike, I. Masina, M. Raidal, and A. Strumia,
arXiv:1302.3570.

6. S. Weinberg, The Quantum Theory of Fields - Volume I (Cambridge
University Press, Cambridge, UK, 1995), Sec. 5.4.

7. B. W. Lee and S. Weinberg, Phys. Rev. Lett. 39, 165 (1977); D.
D. Dicus, E. W. Kolb, and V. L. Teplitz, Phys. Rev. Lett. 39, 168
(1977); E. W. Kolb and K. A. Olive, Phys. Rev. D 33, 1202 (1986).
Eq. (17) is adapted from Eq. (3.4.14) of S. Weinberg, Cosmology (Oxford
University Press, Oxford, 2008). An additional factor 2 has been
inserted in front of $\Omega_D h^2$, because in this theory the present dark
matter density consists of a single Majorana WIMP species, rather
than distinct particles and antiparticles as assumed in the derivation
of Eq. (3.4.14).

UTTG-21-13
arXiv:1405.3483v1 [quant-ph] 14 May 2014

Quantum Mechanics Without State Vectors

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is proposed to give up the description of physical states in terms of
ensembles of state vectors with various probabilities, relying instead solely
on the density matrix as the description of reality. With this definition of a
physical state, even in entangled states nothing that is done in one isolated
system can instantaneously affect the physical state of a distant isolated
system. This change in the description of physical states opens up a large
variety of new ways that the density matrix may transform under various
symmetries, different from the unitary transformations of ordinary quantum
mechanics. Such new transformation properties have been explored before,
but so far only for the symmetry of time translations into the future, treated
as a semi-group. Here new transformation properties are studied for general
symmetry transformations forming groups, rather than semi-groups. Ar-
guments are given that such symmetries should act on the density matrix
as in ordinary quantum mechanics, but loopholes are found for all of these
arguments.


∗ Electronic address: weinberg@physics.utexas.edu

I. A MODEST PROPOSAL

Two unsatisfactory features of quantum mechanics have bothered physicists
for decades. The first is the difficulty of dealing with measurement.
The unitary deterministic evolution of the state vector in quantum mechan-
ics cannot convert a definite initial state vector to an ensemble of eigenvec-
tors of the measured quantity with various probabilities. Here we seem to
be faced with nothing but bad choices. The Copenhagen interpretation[1]
assumes a mysterious division between the microscopic world governed by
quantum mechanics and a macroscopic world of apparatus and observers
that obeys classical physics. If instead we take the wave function or state
vector seriously as a description of reality, and suppose that it evolves uni-
tarily according to the deterministic time-dependent Schrödinger equation,
we are inevitably led to a many-worlds interpretation[2], in which all possible
results of any measurement are realized. To avoid both the absurd dualism
of the Copenhagen interpretation and the endless creation of inconceivably
many branches of history of the many-worlds approach, some physicists
adopt an instrumentalist position, giving up on any realistic interpretation
of the wave function, and regarding it as only a source of predictions of
probabilities, as in the decoherent histories approach[3].
The other problem with quantum mechanics arises from entanglement[4].
In an entangled state in ordinary quantum mechanics an intervention in the
state vector affecting one part of a system can instantaneously affect the
state vector describing a distant isolated part of the system. It is true
that in ordinary quantum mechanics no measurement in one subsystem can
reveal what measurement was done in a different isolated subsystem, but
the susceptibility of the state vector to instantaneous change from a distance
casts doubts on its physical significance.
Entanglement is much more of a problem in some modifications of quan-
tum mechanics that are intended to resolve the problem of measurement,
such as the general nonlinear stochastic evolution studied in [5]. It is diffi-
cult in these theories even to formulate what we mean by isolated subsys-
tems, much less to prevent instantaneous communication between them[6,7].
Polchinski[7] has shown that unless nonlinearities are constrained to depend
only on the density matrix, such modified versions of quantum mechanics
even allow communication between the different worlds of the many-worlds
description of quantum mechanics.
The problem of instantaneous communication between distant isolated
systems has been nicely summarized in a theorem of Gisin[6]. It states
that in a system consisting of two isolated subsystems I and II, with a
prescribed density matrix $\rho^I$ for subsystem I, it is always possible in a
suitable entangled state of the two subsystems to make measurements on
subsystem II that put subsystem I in any set of states $\Psi^I_r$ (not necessarily
orthogonal) with probabilities $P_r$, provided only that $\sum_r P_r \Lambda^I_r = \rho^I$, where
$\Lambda^I_r$ is the projection operator on the state $\Psi^I_r$.

Since any statement that a system is in an ensemble of states with def-
inite probabilities can thus be changed instantaneously by a measurement
at an arbitrary distance, keeping only the density matrix fixed, it seems
reasonable to infer that such statements are meaningless, and that only the
density matrix has meaning. That is, it seems worth considering yet another
interpretation of quantum mechanics: The density matrix rather than the
state vector or wave function is to be taken as a description of reality.
Taking the density matrix as the description of reality is very different
from giving the same status to an ensemble of state vectors with various
probabilities, because the density matrix contains much less information. If
we know that a system is in any one of a number of states $\Psi_r$, with probabilities
$P_r$, then we know that the density matrix is $\rho = \sum_r P_r \Lambda_r$, where
$\Lambda_r$ is the projection operator on state $\Psi_r$, but this does not work in reverse.
As is well known, for a given density matrix ρ there are any number of
ensembles of not necessarily orthogonal or even independent state vectors
and their probabilities that give the same density matrix. (An exception is
discussed in Section II.) The density matrix is of course a Hermitian oper-
ator on Hilbert space, a vector space. In speaking of “quantum mechanics
without state vectors” I mean only that a statement that a system is in any
one of various state vectors with various probabilities is to be regarded as
having no meaning, except for what it tells us about the density matrix.
For example, suppose the density matrix of a spin 1/2 particle, in a basis
provided by states with the north-component of spin equal to +1/2 or −1/2,
takes the form
$$\rho = \begin{pmatrix} 0.69 & 0.17 \\ 0.17 & 0.31 \end{pmatrix} .$$
By diagonalizing this matrix, we might conclude that this particle has a
75% probability of being in a pure state with spin pointing northeast and
a 25% probability of being in an orthogonal pure state with spin pointing
southwest. But we get the same density matrix if the particle has a 50%
probability of being in a pure state with spin pointing north, a 15% probability
of being in a pure state with spin pointing south, and a 35% probability
of being in a pure state with spin pointing east. These ensembles sound
different, but in fact they are indistinguishable. Indeed, they had better be
indistinguishable, because otherwise we could communicate instantaneously
at an arbitrary distance by acting on a distant isolated system with which
this particle’s state vector is entangled so as to change the spin states from
the first to the second ensemble. It is better just to specify the density
matrix, and give up its description in terms of an ensemble of state vectors
with various probabilities.
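
The claim that different ensembles can share one density matrix is easy to check numerically. The following is a minimal sketch in Python with NumPy, not part of the original argument; the helper `projector` and the convention that north is the +z axis and east the +x axis are assumptions made for illustration. It builds the density matrix of the north/south/east ensemble, diagonalizes it, and rebuilds the same matrix from the two-state ensemble of its eigenvectors.

```python
import numpy as np

def projector(theta):
    """Projector on the spin-1/2 state pointing at angle theta from north
    toward east (assumed convention: north = +z, east = +x)."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return np.outer(psi, psi)

# Ensemble A: 50% north, 15% south, 35% east.
rho = (0.50 * projector(0.0)
       + 0.15 * projector(np.pi)
       + 0.35 * projector(np.pi / 2))

# Ensemble B: the two orthogonal eigenstates of rho, weighted by its eigenvalues.
probs, vecs = np.linalg.eigh(rho)          # eigenvalues in ascending order
rho_b = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, vecs.T))

# Direction of the dominant eigenstate, read off from the Bloch vector of rho.
bloch_east, bloch_north = 2 * rho[0, 1], rho[0, 0] - rho[1, 1]
angle_deg = np.degrees(np.arctan2(bloch_east, bloch_north))

print(np.round(rho, 3))         # density matrix of ensemble A
print(np.round(probs, 3))       # weights of ensemble B
print(round(angle_deg, 1))      # 45 degrees from north means "northeast"
print(np.allclose(rho, rho_b))  # the two ensembles give the same matrix
```

With these inputs the weights of ensemble B come out close to 25% and 75%, and the dominant eigenstate points northeast. The matrix elements obtained this way (0.675, 0.175 and 0.325) are close to, but not digit-for-digit identical with, the entries printed above.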
If the density matrix is not to be defined in terms of ensembles of state
vectors, then what is it? We may define it by postulating a physical interpretation:
The average value $\overline{A}$ of any physical quantity represented by an
Hermitian operator A is Tr(Aρ), which since it applies also to powers of A
allows us to find from the density matrix the probability distribution for
values of A. (These may be regarded as objective probabilities, independent
of whether or not anything is actually being measured.) This postulate leads
to all the properties of the density matrix that are usually derived from its
interpretation in terms of an ensemble of states with various probabilities.
The density matrix must be Hermitian in order that Tr(Aρ) should be real
for an arbitrary Hermitian operator A. The density matrix must have unit
trace in order that Tr(αρ) = α for any c-number α. The density matrix
must be positive in order that Tr(Aρ) should be positive for any positive
Hermitian operator A. Also, a physical quantity represented by a Hermitian
operator A will have a definite value α (in the sense that the mean value of
$A^n$ is $\alpha^n$ for all integers n) if and only if Aρ = αρ.
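
As a small illustration of this postulate, the sketch below (Python with NumPy; the function name `moment` and the choice of the north-component of spin as the observable are assumptions for the example) computes mean values from Tr(Aρ) and tests the condition Aρ = αρ for a definite value.

```python
import numpy as np

s_z = np.diag([0.5, -0.5])                       # north-component of spin
rho_mixed = np.array([[0.69, 0.17], [0.17, 0.31]])
rho_up = np.diag([1.0, 0.0])                     # spin definitely north

def moment(A, rho, n):
    """Mean value of A^n, computed as Tr(A^n rho)."""
    return np.trace(np.linalg.matrix_power(A, n) @ rho).real

# For a quantity with only the two values +1/2 and -1/2, the first moment
# already fixes the probability distribution: P(+1/2) = 1/2 + Tr(A rho).
p_up = 0.5 + moment(s_z, rho_mixed, 1)

# Definite value alpha = 1/2: A rho = alpha rho and all moments are alpha^n.
definite_up = np.allclose(s_z @ rho_up, 0.5 * rho_up)
definite_mixed = np.allclose(s_z @ rho_mixed, 0.5 * rho_mixed)
moments_up = [moment(s_z, rho_up, n) for n in range(1, 5)]

print(p_up, definite_up, definite_mixed, moments_up)
```

For the mixed matrix the probability of spin north equals the upper-left matrix element, as expected, and the condition Aρ = αρ fails.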
It may seem like a mere matter of language to say that it is the density
matrix rather than an ensemble of state vectors with various probabilities
that should be taken as the description of a physical system. Already many
studies of the interpretation of quantum mechanics and of quantum infor-
mation theory are based on the density matrix rather than the state vector,
without needing a new interpretation of quantum mechanics. What differ-
ence does it make?
There is one big difference, that is our chief concern in this paper. Giv-
ing up the definition of the density matrix in terms of state vectors opens
up a much larger variety of ways that the density matrix might respond
to various symmetry transformations. In ordinary quantum mechanics, a
symmetry transformation takes a density matrix ρ into $U\rho U^\dagger$, where U is a
unitary (or, for time-reversal, antiunitary) operator belonging to one of the
representations of the symmetry group. This is certainly not the only way
that an Hermitian matrix could transform. For instance, we may consider
a system with an SU(3) symmetry and a Hilbert space of three dimensions,
in which the density matrix transforms under SU(3) as the reducible representation
$3 + \bar{3} + 1 + 1 + 1$. In a suitable basis, we would have
$$\rho = \begin{pmatrix} a_1 & b_3 & b_2^* \\ b_3^* & a_2 & b_1 \\ b_2 & b_1^* & a_3 \end{pmatrix} ,$$
where under SU(3) the real diagonal elements $a_n$ transform as singlets, with
$a_1 + a_2 + a_3 = 1$, and the triplet $(b_1, b_2, b_3)$ transforms as the representation
3. This SU(3) transformation of ρ cannot be put in the form $\rho \mapsto U\rho U^\dagger$
required in ordinary quantum mechanics, because if the 3 × 3 matrix U
belonged to the representations 3 or $\bar{3}$ then ρ would transform as $3 \times \bar{3} = 1 + 8$,
not $3 + \bar{3} + 1 + 1 + 1$. (The other possibility is that U belongs to
the representation 1 + 1 + 1, in which case ρ would transform as a sum
of singlets, again not including $3 + \bar{3}$.) The question of the positivity of a
density matrix transforming in this way is discussed in Section VII.
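
A complementary numerical check, separate from the representation-theoretic argument above, uses the fact that a transformation of the form UρU† never changes the eigenvalues of ρ. The sketch below (Python with NumPy; the helper `rho_from`, the particular rotation `U`, and the sample values of the singlets and triplet are all choices made for illustration) transforms the triplet by an SU(3) element while holding the singlets fixed, and shows that the eigenvalues of ρ change.

```python
import numpy as np

def rho_from(a, b):
    """Assemble the 3x3 matrix from the singlets a and the triplet b."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return np.array([[a1,          b3,          np.conj(b2)],
                     [np.conj(b3), a2,          b1],
                     [b2,          np.conj(b1), a3]], dtype=complex)

# A real rotation with orthonormal columns and determinant +1,
# which is in particular an element of SU(3).
U = np.column_stack([np.array([1, 1, 1]) / np.sqrt(3),
                     np.array([1, -1, 0]) / np.sqrt(2),
                     np.array([1, 1, -2]) / np.sqrt(6)])

a = np.array([1, 1, 1]) / 3          # singlets: left unchanged
b = np.array([0.1, 0.0, 0.0])        # triplet: b -> U b

rho_before = rho_from(a, b)
rho_after = rho_from(a, U @ b)

eig_before = np.linalg.eigvalsh(rho_before)
eig_after = np.linalg.eigvalsh(rho_after)
print(np.round(eig_before, 4))
print(np.round(eig_after, 4))
```

Hermiticity and unit trace are preserved by construction, and in this particular example both sets of eigenvalues are positive, but the two spectra differ, so no unitary U could produce this change by conjugation.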
The possibility of an unusual transformation of the density matrix has
been widely considered, but up to now I believe only for the symmetry of
time-translation. In this case it is known that the evolution of the density
matrix with time is in general governed by a first-order linear differential
equation, such as the Lindblad equation[8] (given here in Section VIII), dif-
ferent from what is found in ordinary quantum mechanics. The Lindblad
equation is commonly used to study open systems in ordinary quantum
mechanics, with the effects of the environment integrated out, but it has
also been used to deal with the problem of measurement[9] in closed sys-
tems. A stochastic evolution of the state vector can be arranged to yield a
Lindblad equation for the density matrix, and with a suitable choice of the
details of this differential equation, its solutions can reproduce the results
of measurement according to the Copenhagen interpretation, but through
a smooth spontaneous localization of the density matrix[9] rather than a
sudden intrusion of classical physics. (These matters are discussed in a sep-
arate paper[10].) These theories share the well-known feature of ordinary
quantum mechanics, that entanglement does not lead to communication at
a distance between isolated systems. This is because (as explained below,
in a more general context) nothing that is done in one system can instan-
taneously affect the density matrix of another system, though it can affect
the state vector; also, all predictions can be derived from the density ma-
trix, without knowing anything about state vectors, and the evolution of
the density matrix depends only on the density matrix, not on the state
vector. But from the point of view explored in the present work, the study
of the stochastic evolution of the state vector is unnecessary; it is only the
differential equation for the density matrix that matters.
The time-translation symmetry transformations used in deriving the
Lindblad equation take us only into the future, not the past, and hence
form a semi-group, not a group. If we are willing to consider new ways
that the density matrix might transform under time-translation, then we
ought to do the same for general symmetry groups, not just semi-groups.
This proposal runs into potential difficulties, each of which can be escaped
through a narrow loophole. As shown in Section II, in order to allow for
any new group transformation rules, we would need to restrict the class
of Hermitian operators that represent physical quantities. For continuous
symmetries, it would also be necessary to restrict the class of physically re-
alizable density matrices, as described in Section VII. Finally, in general it
would also be necessary to suppose that the usual arguments for complete
positivity do not apply to real physical systems, for reasons given in Section
VIII. If further study shows that these loopholes are not actually open, then
on the basis of the arguments of this paper, we could conclude that, even in
a quantum mechanics without state vectors, the density matrix transforms
under symmetry groups just as in ordinary quantum mechanics.

II. PHYSICAL QUANTITIES AND UNUSUAL SYMMETRIES

To explore unusual possibilities for symmetry transformations, we need
first to say what we mean by a symmetry transformation. We will take
a symmetry transformation to be a linear mapping $\rho \mapsto g(\rho)$ of density
matrices, which preserve their Hermiticity, positivity, and unit trace. For
any such transformation g, we further assume that there is a corresponding
linear transformation $A \mapsto g(A)$ of any operator A representing a physical
quantity, which preserves its Hermiticity, such that the mean value of the
physical quantity is left invariant:
$$\mathrm{Tr}\Big(g(A)\,g(\rho)\Big) = \mathrm{Tr}\Big(A\,\rho\Big) . \qquad (1)$$

With this definition of symmetry transformations, it is important to
decide just what operators can represent physical quantities. Certainly we
want to include familiar quantities like momentum, angular momentum, etc.
and functions of these quantities. In particular, the projection operator on
a non-degenerate eigenstate of such a physical quantity, as for instance the
projection operator $(1 \pm 2s_z)/2$ on a state of a spin 1/2 with $s_z = \pm 1/2$,
represents a physical quantity. In ordinary quantum mechanics, any Hermitian
operator is assumed to represent a physical quantity, but if state vectors are
not to be taken as a representation of reality, we can doubt whether operators
$\Lambda_\Psi$ that are defined as the projection operators on arbitrary state-vectors Ψ
necessarily represent physical quantities. If they did, then according to our
interpretive postulate $\mathrm{Tr}(\Lambda_\Psi \rho)$ would be the probability that a system with
density matrix ρ is in a state Ψ, which is just the sort of statement that we
are taking as generally meaningless. In any case, it is hard to see how one
could ever tell that Schrödinger’s cat is in a state $|{\rm alive}\rangle + |{\rm dead}\rangle$ rather
than, say, $|{\rm alive}\rangle - |{\rm dead}\rangle$.
This point is important for us, because if all Hermitian operators in-
cluding projection operators represent physical quantities, then with our
assumptions it can be shown that density matrices transform under any
symmetry transformation g with an inverse just as in ordinary quantum
mechanics[11]:
$$g(\rho) = U \rho\, U^\dagger , \qquad (2)$$
with U unitary or antiunitary.
The first step in the proof is to show that if all projection operators are
physical quantities in the above sense, then any symmetry transformation
g with an inverse $g^{-1}$ takes any projection operator Λ (defined here as a
Hermitian operator with $\Lambda^2 = \Lambda$ and Tr Λ = 1) into another projection
operator. According to our definition of symmetries, any density matrix ρ
is mapped into a Hermitian positive matrix g(ρ) with unit trace, which can
therefore be expressed as
$$g(\rho) = \sum_n P_n \Lambda_n ,$$
where $\Lambda_n$ are projection operators, satisfying $\Lambda_n \Lambda_m = \delta_{nm} \Lambda_n$ and $\mathrm{Tr}\,\Lambda_n = 1$,
and the $P_n$ are positive real numbers with $\sum_n P_n = 1$. We then have
$$\rho = \sum_n P_n\, g^{-1}(\Lambda_n) .$$
If all projection operators represent physical quantities, then we can use
Eq. (1) with any $\Lambda_n$ in place of A, ρ taken as any $\Lambda_m$, and g replaced with
$g^{-1}$, so that
$$\mathrm{Tr}\Big(g^{-1}(\Lambda_n)\, g^{-1}(\Lambda_m)\Big) = \mathrm{Tr}\Big(\Lambda_n \Lambda_m\Big) = \delta_{nm} .$$
Hence
$$\mathrm{Tr}\Big(\rho^2\Big) = \sum_n P_n^2 .$$
Now, if ρ is a projection operator, then $\rho^2 = \rho$, so $\mathrm{Tr}(\rho^2) = \mathrm{Tr}\,\rho = 1$, and
therefore $\sum_n P_n^2 = 1$. But the only way that this can be satisfied by a set of
real positive numbers $P_n$ with $\sum_n P_n = 1$ is to have all $P_n$ vanish except for
one, say $P_1$, with the value $P_1 = 1$. Then g(ρ) is itself a projection operator,
namely $\Lambda_1$.
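
The last step can be spelled out in one line. The following short derivation is added for illustration and uses only $\sum_n P_n = 1$, $\sum_n P_n^2 = 1$ and $0 \le P_n \le 1$:
$$\sum_n P_n - \sum_n P_n^2 = \sum_n P_n\,(1 - P_n) = 0 .$$
Every term in the final sum is non-negative, so each must vanish separately, which gives $P_n = 0$ or $P_n = 1$ for every n. The normalization $\sum_n P_n = 1$ then allows exactly one of the $P_n$ to equal 1.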
The rest of the proof is completed quickly. Any projection operator Λ
can be expressed as a dyad, $\Lambda_\Psi = \Psi\Psi^\dagger$, where Ψ is a normalized state
vector, unique up to a phase. (This is the exception mentioned in Section I
to the rule that density matrices may be expressed in various different ways
as linear combinations of projection operators; the only such representation
of a projection operator is as a unique dyad.) Since as we have seen any
symmetry transformation g with an inverse takes projection operators into
projection operators, we must have $g(\Lambda_\Psi) = \Lambda_{g(\Psi)}$, where the state vector
g(Ψ) is unique up to a phase. Again, if these projection operators are to
be taken as representing physical quantities, then in Eq. (1) we can take
$A = \Lambda_\Psi$ and $\rho = \Lambda_\Phi$ for any two state vectors Ψ and Φ, and find
$$\mathrm{Tr}\Big(\Lambda_{g(\Psi)}\, \Lambda_{g(\Phi)}\Big) = \mathrm{Tr}\Big(\Lambda_\Psi \Lambda_\Phi\Big)$$
and therefore
$$\Big|\big(g(\Psi), g(\Phi)\big)\Big|^2 = \Big|\big(\Psi, \Phi\big)\Big|^2 .$$
According to Wigner’s theorem[12], if this condition is satisfied for all
normalized state vectors Φ and Ψ, then it must be possible to choose the phases
of all g(Ψ) so that,
$$g(\Psi) = U_g \Psi$$
where $U_g$ is a unitary (or antiunitary) operator, the same for all Ψ. In this
case, we have $g(\Lambda_\Psi) = U_g \Lambda_\Psi U_g^\dagger$. Since any density matrix can be expressed
(though not uniquely) as a linear combination of projection operators, they
also transform as in Eq. (2), as was to be proved.
We thus have a choice. We can assume that the invariance condition (1)
holds for all density matrices ρ and all Hermitian operators A, in which case
density matrices can only have the same transformation properties (2) as
in ordinary quantum mechanics. Or we can limit the validity of Eq. (1) to
a class of physical quantities that does not include projection operators on
every state vector, in which case density matrices may have a much wider
variety of symmetry transformation properties. In this paper we will explore
the consequences of the latter choice.
The behavior of the density matrix under general symmetry transforma-
tions is outlined here in Section III. In Section IV these general results are
applied and further explored for the case of continuous symmetries. The
group multiplication law is found to impose severe constraints on the trans-
formation of the density matrix. Section V presents an example of a class of
continuous symmetries whose action on the density matrix explicitly satisfies
these constraints, but is different from the transformation found in ordinary
quantum mechanics. Section VI describes special features of the action of
compact groups on the density matrix. Section VII takes up the important
but difficult question of deciding what conditions should be imposed on the
transformation of the density matrix under general symmetry operations so
that these transformations will preserve the positivity of the density ma-
trix. Section VIII shows that assuming the positivity of the eigenvalues
of the transformation kernel rules out the possibility that the density ma-
trix transforms differently than in ordinary quantum mechanics, and gives
reasons why these eigenvalues need not be positive.

III. GENERAL SYMMETRIES

We suppose that a general element g of the symmetry group of a system
induces on the density matrix a linear transformation $\rho \mapsto g(\rho)$, with:
$$g(\rho)_{M'N'} = \sum_{MN} K_{M'M,N'N}[g]\, \rho_{MN} , \qquad (3)$$
where K[g] is some c-number kernel independent of ρ. We will take the
indices M, N, etc. to run here over a finite number d of values, but will
assume that the formalism can be extended to Hilbert spaces of infinite
dimensionality, on which the matrices considered here become well-behaved
operators. (No attempt will be made here to apply this formalism to
relativistic theories[13].) Our reason for concentrating on linear transformations
is explained later in this section.
In order for g(ρ) to be Hermitian for an arbitrary Hermitian ρ, it is
necessary and sufficient that K be Hermitian, in the sense that
$$K_{M'M,N'N}[g]^* = K_{N'N,M'M}[g] . \qquad (4)$$
(This is why it proves convenient to put the subscripts on K[g] in what may
otherwise look like a peculiar order.) Also, in order for g(ρ) to have unit
trace for an arbitrary ρ with unit trace, it is necessary and sufficient that
$$\sum_{M'} K_{M'M,M'N}[g] = \delta_{MN} . \qquad (5)$$
The difficult thing is to know what additional conditions should be imposed
on K[g] (or on ρ) so that g(ρ) will be positive. This will be discussed in
Sections VII and VIII.
The great physical advantage of basing quantum mechanics on the den-
sity matrix, with linear symmetry transformation properties, is that the
transformation properties of the density matrix for an isolated subsystem
do not depend on the properties of any other distant isolated subsystem,
even in the case of entanglement, where the density matrix does not factor-
ize into density matrices for the individual subsystems. Suppose that the
system consists of two isolated parts, subsystems I and II, and replace the
indices M , N , etc. with compound indices ma, nb, etc., with the first letter
labeling the states of subsystem I and the second the states of subsystem II.
The possibility of entanglement does not in general allow the density matrix
to factor into a product $\rho^{(I)}_{mn}\,\rho^{(II)}_{ab}$ of density matrices for the two subsystems,
but if the subsystems are isolated they transform independently, in the sense
that the kernel in Eq. (3) does factorize:
$$K_{m'a'\,ma,\;n'b'\,nb}[g] = K^{(I)}_{m'm,n'n}[g]\; K^{(II)}_{a'a,b'b}[g] , \qquad (6)$$
where $K^{(I)}[g]$ and $K^{(II)}[g]$ are the kernels that would describe the
transformation of the density matrix in subsystems I and II if the other subsystem
did not exist. (For a nonlinear transformation it would be difficult to see
what could take the place of Eq. (6) as a statement of what we mean by
isolated subsystems.) Since both $K^{(I)}[g]$ and $K^{(II)}[g]$ are possible physical
kernels, they each satisfy the analog of Eq. (5):
$$\sum_{m'} K^{(I)}_{m'm,m'n}[g] = \delta_{mn} , \qquad \sum_{a'} K^{(II)}_{a'a,a'b}[g] = \delta_{ab} . \qquad (7)$$
The density matrix of subsystem I is related to the density matrix of the
whole system by
$$\rho^{(I)}_{mn} = \sum_a \rho_{ma,na} . \qquad (8)$$
(This follows from the requirement that the mean value Tr(ρA) of any physical
quantity represented by an operator of the form $A_{ma,nb} = A^{(I)}_{mn}\,\delta_{ab}$, which
acts non-trivially only on subsystem I, should be equal to $\mathrm{Tr}(\rho^{(I)} A^{(I)})$.)
According to Eqs. (3), (6) and (8), its transformation is given by
$$\rho^{(I)}_{m'n'} \mapsto g^{(I)}(\rho)_{m'n'} = \sum_{a'} \sum_{mnab} K^{(I)}_{m'm,n'n}[g]\; K^{(II)}_{a'a,a'b}[g]\; \rho_{ma,nb} .$$
Using Eq. (7) for $K^{(II)}$ and Eq. (8) again, this is
$$g^{(I)}(\rho)_{m'n'} = \sum_{mn} K^{(I)}_{m'm,n'n}[g]\; \rho^{(I)}_{mn} , \qquad (9)$$
so the transformation of $\rho^{(I)}$ is independent of $\rho^{(II)}$. This applies in particular
to the symmetry of time-translation, so as well known even in entangled
states the evolution of the density matrix for subsystem I is unaffected by
whatever happens in subsystem II.
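
The content of Eqs. (6) through (9) can be illustrated numerically. The sketch below (Python with NumPy) is only an example under stated assumptions: the helper names `kernel` and `reduce_to_I`, the two-level subsystems, and the particular kernels (a rotation on subsystem I, a mixture of two unitary maps on subsystem II) are all chosen for illustration. It applies a factorized kernel to a maximally entangled state and confirms that the reduced density matrix of subsystem I transforms as in Eq. (9), regardless of what is done to subsystem II.

```python
import numpy as np

def kernel(unitaries, weights):
    """Illustrative kernel K[m',m,n',n] = sum_k w_k U_k[m',m] conj(U_k[n',n])."""
    return sum(w * np.einsum('am,bn->ambn', U, U.conj())
               for U, w in zip(unitaries, weights))

def reduce_to_I(rho):
    """Eq. (8): sum over the index of subsystem II."""
    return np.einsum('mana->mn', rho)

# Entangled state (|00> + |11>)/sqrt(2), stored as rho[m, a, n, b]
# with m, n labeling subsystem I and a, b labeling subsystem II.
psi = np.array([[1, 0], [0, 1]]) / np.sqrt(2)
rho = np.einsum('ma,nb->manb', psi, psi.conj())

c, s = np.cos(0.3), np.sin(0.3)
K_I = kernel([np.array([[c, -s], [s, c]])], [1.0])    # acts on subsystem I
sigma_x = np.array([[0, 1], [1, 0]])
K_II = kernel([np.eye(2), sigma_x], [0.6, 0.4])       # acts on subsystem II

# Eq. (6): the kernel for the whole system is the product of the two kernels.
rho_out = np.einsum('xmyn,paqb,manb->xpyq', K_I, K_II, rho)

lhs = reduce_to_I(rho_out)                             # transform, then reduce
rhs = np.einsum('xmyn,mn->xy', K_I, reduce_to_I(rho))  # Eq. (9)
print(np.allclose(lhs, rhs))
```

Replacing `K_II` by any other kernel satisfying Eq. (7) leaves `lhs` unchanged, which is the statement that nothing done in subsystem II affects the density matrix of subsystem I.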
Now let us return to the general case, and our former notation. Because
the kernel K[g] is Hermitian in the sense of Eq. (4), it can be expanded as
$$K_{M'M,N'N}[g] = \sum_i \eta^{(i)}[g]\; u^{(i)}_{M'M}[g]\; u^{(i)*}_{N'N}[g] , \qquad (10)$$
where the $u^{(i)}_{M'M}[g]$ and $\eta^{(i)}[g]$ are a complete set of normalized eigenmatrices
and eigenvalues of the kernel $K_{M'M,N'N}[g]$, in the sense that
$$\sum_{N'N} K_{M'M,N'N}[g]\; u^{(i)}_{N'N}[g] = \eta^{(i)}[g]\; u^{(i)}_{M'M}[g] , \qquad (11)$$
$$\mathrm{Tr}\Big(u^{(i)\dagger}[g]\; u^{(j)}[g]\Big) = \delta_{ij} . \qquad (12)$$
(Note that Eq. (11) does not say that the map (3) takes $u^{(i)}[g]$ into $\eta^{(i)}[g]\, u^{(i)}[g]$.)
The transformed density matrix (3) can then be written more compactly as
a sum of matrix products:
$$g(\rho) = \sum_i \eta^{(i)}[g]\; u^{(i)}[g]\; \rho\; u^{(i)\dagger}[g] . \qquad (13)$$
The trace condition (5) here reads
$$\sum_i \eta^{(i)}[g]\; u^{(i)\dagger}[g]\; u^{(i)}[g] = 1 , \qquad (14)$$
where 1 is the unit matrix. If there were only one eigenvector $u^{(1)}[g]$ with
a non-zero eigenvalue $\eta^{(1)}[g]$, then Eq. (14) would require $\eta^{(1)}[g] > 0$, and
the transformation rule (13) could be written $g(\rho) = U[g]\, \rho\, U^\dagger[g]$, where
according to Eq. (14) the matrix $U[g] \equiv \sqrt{\eta^{(1)}[g]}\; u^{(1)}[g]$ appearing in this
transformation rule is unitary. But in the general case, where the kernel has
several independent eigenmatrices with non-zero eigenvalues, Eqs. (13) and
(14) represent a non-trivial generalization of the unitary transformations of
ordinary quantum mechanics.
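
The expansion in eigenmatrices can be tried out on a simple kernel. In the sketch below (Python with NumPy) the kernel is assumed, purely for illustration, to be a 70/30 mixture of two unitary maps on a two-dimensional space; the names `unitary_kernel`, `U` and `V` are not from the text. The code checks conditions (4) and (5), diagonalizes the kernel as in Eq. (11), and confirms Eqs. (13) and (14).

```python
import numpy as np

d = 2
c, s = np.cos(0.4), np.sin(0.4)
U = np.array([[c, -s], [s, c]], dtype=complex)
V = np.diag([1.0, 1j])

def unitary_kernel(W):
    """K[M',M,N',N] = W[M',M] conj(W[N',N]), the kernel of rho -> W rho W^dagger."""
    return np.einsum('am,bn->ambn', W, W.conj())

# Assumed example kernel: a mixture of two unitary maps.
K = 0.7 * unitary_kernel(U) + 0.3 * unitary_kernel(V)

hermitian_ok = np.allclose(K.conj(), K.transpose(2, 3, 0, 1))   # Eq. (4)
trace_ok = np.allclose(np.einsum('aman->mn', K), np.eye(d))     # Eq. (5)

# Eq. (11): treat (M'M) as a row index and (N'N) as a column index.
eta, vecs = np.linalg.eigh(K.reshape(d * d, d * d))
u = [vecs[:, i].reshape(d, d) for i in range(d * d)]            # eigenmatrices

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

g_rho_kernel = np.einsum('ambn,mn->ab', K, rho)                       # Eq. (3)
g_rho_sum = sum(e * ui @ rho @ ui.conj().T for e, ui in zip(eta, u))  # Eq. (13)
trace_cond = sum(e * ui.conj().T @ ui for e, ui in zip(eta, u))       # Eq. (14)

print(hermitian_ok, trace_ok)
print(np.round(eta, 6))
print(np.allclose(g_rho_kernel, g_rho_sum), np.allclose(trace_cond, np.eye(d)))
```

In this example two eigenvalues are non-zero, so the map is not of the single-unitary form, although all the eigenvalues here are non-negative. Kernels with eigenvalues of either sign are the case of interest later in the paper and are not covered by this sketch.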
We also need to impose on K[g] the group property, that for any two
symmetry transformations g and $\bar g$, we have
$$\sum_{M'N'} K_{M''M',N''N'}[\bar g]\; K_{M'M,N'N}[g] = K_{M''M,N''N}[\bar g g] . \qquad (15)$$
(This condition is not usually mentioned in connection with time translation,
because as we shall see it does not constrain the differential equation that
governs the temporal evolution of the density matrix, but it does need to
be imposed even for time translation when that symmetry is combined with
other symmetries.) Using the representation (10), the group property (15)
may be written
$$\sum_{M'N'} \sum_{ij} \eta^{(i)}[\bar g]\, \eta^{(j)}[g]\; u^{(i)}_{M''M'}[\bar g]\; u^{(j)}_{M'M}[g]\; u^{(i)\dagger}_{N'N''}[\bar g]\; u^{(j)\dagger}_{NN'}[g] = \sum_k \eta^{(k)}[\bar g g]\; u^{(k)}_{M''M}[\bar g g]\; u^{(k)\dagger}_{NN''}[\bar g g] , \qquad (16)$$
or, in an abbreviated notation,
$$\sum_{ij} \eta^{(i)}[\bar g]\, \eta^{(j)}[g]\; u^{(i)}[\bar g]\, u^{(j)}[g] \otimes u^{(j)\dagger}[g]\, u^{(i)\dagger}[\bar g] = \sum_k \eta^{(k)}[\bar g g]\; u^{(k)}[\bar g g] \otimes u^{(k)\dagger}[\bar g g] , \qquad (17)$$
it being understood that for any two d × d matrices A and B,
$$[A \otimes B]_{M'M,N'N} \equiv A_{M'M}\, B_{NN'} . \qquad (18)$$
In the next section we will explore the implications of Eq. (17) for continuous
symmetries.
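
For the familiar unitary case the group property (15) can be verified directly. The sketch below (Python with NumPy) is an illustration only; the helper names and the two sample group elements, called `g` and `g_bar` here, are assumptions. It contracts the intermediate indices M′ and N′ of two kernels and compares the result with the kernel of the product.

```python
import numpy as np

def unitary_kernel(W):
    """K[M',M,N',N] = W[M',M] conj(W[N',N])."""
    return np.einsum('am,bn->ambn', W, W.conj())

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]], dtype=complex)

g = rotation(0.3)                              # transformation applied first
g_bar = np.diag([1.0, 1j]) @ rotation(1.1)     # transformation applied second

# Left side of Eq. (15): sum over the intermediate indices M' (a) and N' (b).
lhs = np.einsum('xayb,ambn->xmyn', unitary_kernel(g_bar), unitary_kernel(g))
rhs = unitary_kernel(g_bar @ g)                # kernel of the product
print(np.allclose(lhs, rhs))
```

For kernels with several eigenmatrices the same contraction applies, but the condition is no longer automatic; that is the constraint explored in the next section.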

IV. CONTINUOUS SYMMETRIES

We now consider a group of transformations that includes elements
arbitrarily close to the identity I. For the identity, we have of course
$$K_{M'M,N'N}[I] = \delta_{M'M}\, \delta_{N'N} . \qquad (19)$$
This has one eigenmatrix $u^{(1)}[I]$ with non-zero eigenvalue
$$u^{(1)}_{N'N}[I] = \delta_{N'N}/\sqrt{d} , \qquad \eta^{(1)}[I] = d , \qquad (20)$$
and $d^2 - 1$ eigenmatrices $u^{(\alpha)}[I]$, a complete set of traceless matrices, all with
eigenvalues zero:
$$\mathrm{Tr}\, u^{(\alpha)}[I] = 0 , \qquad \eta^{(\alpha)}[I] = 0 . \qquad (21)$$
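
These statements follow in one step from Eqs. (11) and (19). As a short check added for illustration, for any d × d matrix u,
$$\sum_{N'N} K_{M'M,N'N}[I]\; u_{N'N} = \delta_{M'M} \sum_{N} u_{NN} = \delta_{M'M}\, \mathrm{Tr}\, u .$$
A traceless u is therefore an eigenmatrix with eigenvalue zero, while $u_{N'N} \propto \delta_{N'N}$ is an eigenmatrix with eigenvalue $\mathrm{Tr}\,1 = d$. The normalization (12), $\mathrm{Tr}(u^\dagger u) = 1$, fixes the factor $1/\sqrt{d}$.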
Now let’s consider group elements $g(\epsilon n)$, with g(0) = I, where ε is
infinitesimal, and $n^r$ is a real vector specifying a fixed direction in the space
of group parameters near the origin. The kernel $K[g(\epsilon n)]$ may be supposed
to be analytic in εn for εn near zero, but because the eigenvalues $\eta^{(\alpha)}[g(\epsilon n)]$
are degenerate for ε = 0, according to the familiar rules of first order
perturbation theory the corresponding unperturbed eigenmatrices must be chosen
to diagonalize the first-order perturbation to $K[g(\epsilon n)]$, and therefore may
remain functions of the direction (but not of the magnitude) of n even for
ε → 0. That is, in the limit ε → 0, the $u^{(\alpha)}[g(\epsilon n)]$ approach $u^{(\alpha)}(n)$, where:
$$\sum_{M'MN'N} u^{(\alpha)*}_{M'M}(n) \left[\frac{\partial K_{M'M,N'N}[g(\epsilon n)]}{\partial \epsilon}\right]_{\epsilon=0} u^{(\beta)}_{N'N}(n) = \delta_{\alpha\beta}\, \Delta^{(\alpha)}(n) , \qquad (22)$$
where $\Delta^{(\alpha)}(n)$ scales as $\Delta^{(\alpha)}(cn) = c\,\Delta^{(\alpha)}(n)$, but like $u^{(\alpha)}(n)$ is not in
general analytic in n at n = 0. To first order in ε, the corresponding eigenvalues
are
$$\eta^{(\alpha)}[g(\epsilon n)] \to \epsilon\, \Delta^{(\alpha)}(n) . \qquad (23)$$
On the other hand, the eigenvalue $\eta^{(1)}[g(\epsilon n)]$ is not degenerate and does
not vanish for ε = 0, so the quantity $\sqrt{\eta^{(1)}[g(\epsilon n)]}\; u^{(1)}[g(\epsilon n)]$, which appears
in the terms in Eq. (10) with i or j or k equal to 1, may be supposed to be
given by a power series in εn:
$$\sqrt{\eta^{(1)}[g(\epsilon n)]}\; u^{(1)}[g(\epsilon n)] \to 1 - i\epsilon\, n\cdot\tau + O(\epsilon^2) , \qquad (24)$$
with $n\cdot\tau \equiv \sum_r n^r \tau_r$, where $[\tau_r]_{N'N}$ are constant matrices (not necessarily
Hermitian), independent of ε and n.

The trace condition (14) tells us the anti-Hermitian parts of the matrices
$\tau_r$:
$$-i\, n\cdot\tau + i\, n\cdot\tau^\dagger + \sum_\alpha \Delta^{(\alpha)}(n)\; u^{(\alpha)\dagger}(n)\, u^{(\alpha)}(n) = 0 . \qquad (25)$$
Now, $\Delta^{(\alpha)}(n)$ and $u^{(\alpha)\dagger}(n)$ are complicated functions of the vector n that
defines an infinitesimal group element, but we can easily see that the sum in
Eq. (25) is simply linear in the components of n. By using Eqs. (10), (23),
(24), and (25), we can calculate the kernel for this group element to first
order in ε. In the notation of Eq. (18),
$$K[g(\epsilon n)] \to 1 \otimes 1 + \epsilon \left[ -i\, n\cdot\tau \otimes 1 + 1 \otimes i\, n\cdot\tau^\dagger + \sum_\alpha \Delta^{(\alpha)}(n)\; u^{(\alpha)}(n) \otimes u^{(\alpha)\dagger}(n) \right] . \qquad (26)$$
We are assuming that the kernel itself is analytic in the group parameters
near the origin, so that the first-order term in $K[g(\epsilon n)]$ must be of the form
$\epsilon \sum_r n^r K_r$, with $K_r$ independent of n, and therefore also
$$\sum_\alpha \Delta^{(\alpha)}(n)\; u^{(\alpha)}(n) \otimes u^{(\alpha)\dagger}(n) = \sum_r n^r L_r \qquad (27)$$
with $L_r$ independent of n. As a corollary, if we use Eq. (18) to put indices
back in Eq. (27), and contract the indices N′ and M′, we learn that
$$\sum_\alpha \Delta^{(\alpha)}(n)\; u^{(\alpha)\dagger}(n)\, u^{(\alpha)}(n) = \sum_r n^r \theta_r \qquad (28)$$
where the d × d Hermitian matrix $\theta_r$ is given by $[\theta_r]_{NM} = \sum_{M'} [L_r]_{M'M,M'N}$,
and is therefore independent of n. We may therefore write
$$\tau_r = T_r - \frac{i}{2}\, \theta_r , \qquad (29)$$
where $T_r$ are Hermitian matrices, which like $\tau_r$ and $\theta_r$ are independent of n.
Using these results in Eq. (13), we find the first-order change in the
density matrix due to the transformation $g(\epsilon n)$:
$$\delta_{\epsilon n}\rho = i\epsilon\, [n\cdot T, \rho] + \epsilon \sum_\alpha \Delta^{(\alpha)}(n) \left[ u^{(\alpha)}(n)\, \rho\, u^{(\alpha)\dagger}(n) - \frac{1}{2}\, u^{(\alpha)\dagger}(n)\, u^{(\alpha)}(n)\, \rho - \frac{1}{2}\, \rho\, u^{(\alpha)\dagger}(n)\, u^{(\alpha)}(n) \right] . \qquad (30)$$
It is the set of matrices $T_r$ that here play a role like the usual Hermitian
matrix representation of the Lie algebra, though as we shall see it is only
in special cases that they can be shown to satisfy the same commutation
relations.
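
Two properties of a first-order change of the form (30) can be checked directly: it has zero trace, so unit trace is preserved, and it is Hermitian whenever ρ is. The sketch below (Python with NumPy) uses hypothetical ingredients generated at random, namely a Hermitian matrix `T` standing for n·T, real numbers `deltas` of either sign, and traceless matrices `us`. For simplicity the `us` are not orthonormalized as in Eq. (12), which does not affect the two properties tested.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

def random_complex(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Hypothetical ingredients: Hermitian T, real Delta^(alpha), traceless u^(alpha).
A = random_complex((d, d))
T = (A + A.conj().T) / 2
deltas = rng.normal(size=2)                       # real, of either sign
us = []
for _ in range(2):
    u = random_complex((d, d))
    us.append(u - np.trace(u) / d * np.eye(d))    # remove the trace

B = random_complex((d, d))
rho = B @ B.conj().T
rho /= np.trace(rho).real                         # Hermitian, positive, unit trace

def delta_rho(rho, eps=1e-3):
    """First-order change of rho, with signs as printed in Eq. (30)."""
    out = 1j * eps * (T @ rho - rho @ T)
    for D, u in zip(deltas, us):
        udu = u.conj().T @ u
        out += eps * D * (u @ rho @ u.conj().T - 0.5 * udu @ rho - 0.5 * rho @ udu)
    return out

change = delta_rho(rho)
print(abs(np.trace(change)))                      # numerically zero
print(np.allclose(change, change.conj().T))       # Hermiticity preserved
```

Neither property depends on the sign of the `deltas`. Positivity of the transformed density matrix is a separate question, and this sketch does not test it.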
It may be noted that in itself the transformation rule (30) does not
uniquely fix the matrices $T_r$ and $u^{(\alpha)}(n)$. Without changing $\delta_{\epsilon n}\rho$, we may
shift these matrices by
$$\Delta u^{(\alpha)}(n) = 1\; \mathrm{Tr}\Big(C\, u^{(\alpha)}(n)\Big) ,$$
$$[\Delta T_r]_{M'M} = \frac{i}{2} \sum_{N'N} [L_r]_{M'M,NN'}\, C_{N'N} - \frac{i}{2} \sum_{N'N} [L_r]^*_{MM',NN'}\, C^*_{N'N} ,$$
where C is an arbitrary complex matrix. This allows us to make the trace of
$u^{(\alpha)}(n)$ anything we like, for if we take $C = \sum_\beta c_\beta\, u^{(\beta)\dagger}(n)$ with $c_\beta$ arbitrary,
then (using Eq. (12)) we have $\mathrm{Tr}[\Delta u^{(\alpha)}(n)] = c_\alpha d$. However in this paper
we will stick to the original definitions of $T_r$ and $u^{(\alpha)}(n)$, characterized by
the tracelessness of $u^{(\alpha)}(n)$.
So far, this section has closely followed the usual treatment of the symme-
try of time-translation, especially as in ref. [14]. This symmetry yields the
Lindblad equation[8] for the time dependence of the density matrix (given
here in Section VIII), which applies in some extended versions of quantum
mechanics[9]. In that case there is just one matrix T , which can be identified
with minus the Hamiltonian of the system.
We will now see what can be learned from the multiplication rule (17)
for general continuous groups, when $g = g(\epsilon n)$ and $\bar g = g(\epsilon\bar n)$ are both
near the identity. Eq. (17) is automatically satisfied if either $g = I$ or
$\bar g = I$, so the lowest-order non-trivial terms in Eq. (17) are of order $\epsilon^2$. The
resulting condition is a terrible mess, involving many coefficients that only
reflect how group elements are parameterized. To focus only on physically
interesting quantities, we will ignore everything but the part of Eq. (17) that
is antisymmetric in $n$ and $\bar n$, which must be satisfied separately from the
rest. Eq. (17) would be symmetric in $n$ and $\bar n$ if it were not for the
non-vanishing commutators of the matrices $u^{(i)}$ on the left side of the equation
and of the group elements themselves on the right side. To calculate the
latter terms, we may write
$$g(\epsilon n)\,g(\epsilon\bar n) = g\left(\epsilon n + \epsilon\bar n + \epsilon^2 f(n,\bar n) + \dots\right), \qquad f^r = \frac{1}{2}\sum_{st} C^r_{st}\,n^s\,\bar n^t , \qquad (31)$$
where $C^r_{st} = -C^r_{ts}$ are the structure constants of the group's Lie algebra,
and the dots in Eq. (31) denote second-order terms that are symmetric in $n$
and $\bar n$, as well as terms of higher order in $\epsilon$. The antisymmetric part of the
terms in Eq. (17) of order $\epsilon^2$ now gives

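$$\begin{aligned}
&[n\cdot\tau,\bar n\cdot\tau]\otimes 1 + 1\otimes[n\cdot\tau,\bar n\cdot\tau]^\dagger \\
&\quad + i\sum_\alpha \Delta^{(\alpha)}(n)\,[\tau\cdot\bar n,u^{(\alpha)}(n)]\otimes u^{(\alpha)\dagger}(n) - i\sum_\alpha \Delta^{(\alpha)}(n)\,u^{(\alpha)}(n)\otimes[\tau\cdot\bar n,u^{(\alpha)}(n)]^\dagger \\
&\quad - i\sum_\alpha \Delta^{(\alpha)}(\bar n)\,[\tau\cdot n,u^{(\alpha)}(\bar n)]\otimes u^{(\alpha)\dagger}(\bar n) + i\sum_\alpha \Delta^{(\alpha)}(\bar n)\,u^{(\alpha)}(\bar n)\otimes[\tau\cdot n,u^{(\alpha)}(\bar n)]^\dagger \\
&\quad - \frac{1}{2}\sum_{\alpha\beta} \Delta^{(\alpha)}(n)\,\Delta^{(\beta)}(\bar n)\,[u^{(\alpha)}(n),u^{(\beta)}(\bar n)]\otimes\{u^{(\alpha)}(n),u^{(\beta)}(\bar n)\}^\dagger \\
&\quad - \frac{1}{2}\sum_{\alpha\beta} \Delta^{(\alpha)}(n)\,\Delta^{(\beta)}(\bar n)\,\{u^{(\alpha)}(n),u^{(\beta)}(\bar n)\}\otimes[u^{(\alpha)}(n),u^{(\beta)}(\bar n)]^\dagger \\
&= i\sum_{rst}\tau_r\,C^r_{st}\,n^s\bar n^t\otimes 1 - i\,1\otimes\sum_{rst}\tau_r^\dagger\,C^r_{st}\,n^s\bar n^t \\
&\quad - \sum_{rst}\left[\frac{\partial}{\partial(n+\bar n)^r}\sum_\alpha \Delta^{(\alpha)}(n+\bar n)\,u^{(\alpha)}(n+\bar n)\otimes u^{(\alpha)\dagger}(n+\bar n)\right]_{n+\bar n=0} C^r_{st}\,n^s\bar n^t \qquad (32)
\end{aligned}$$
where curly brackets denote an anticommutator.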


Inspection of Eq. (30) shows that if all $\Delta^{(\alpha)}$ vanish then $\delta_{\epsilon n}\rho = i\epsilon[n\cdot T,\rho]$.
Further, Eqs. (28) and (29) show in this case that $\tau_r = T_r$. Eq. (32)
then shows also that in this case the Hermitian matrices $T_r$ satisfy the
commutation relations $[T_s,T_t] = i\sum_r C^r_{st}\,T_r$ of the symmetry group's Lie
algebra.
But these familiar results do not hold if the density matrix transforms
in a more general way, with non-vanishing values for some ∆(α) , in which
case the terms in Eqs. (28), (30), and (32) with non-zero ∆(α) represent a
departure from ordinary quantum mechanics. In ordinary quantum mechanics
the structure of the Hamiltonian and other operators representing symmetry
generators is largely fixed by the condition that they satisfy the Lie algebra
of the symmetry group, as for instance the form of the kinetic energy terms
in the non-relativistic Hamiltonian is fixed by the commutators of the gen-
erator of time-translation with the other generators of the Galilean group.
In the generalization of quantum mechanics considered here, it is Eqs. (32)
and (29) that must be used to constrain the operators Tr and u(α) (n) that
define the transformation of the density matrix.

V. AN EXAMPLE

The condition (32) sets constraints on the sorts of matrices $T_r$ and $u^{(\alpha)}(n)$
that can enter in the transformation (30) of the density matrix for a given
set of structure constants $C^t_{rs}$. This section will give an explicit example
showing how these constraints can be satisfied, in a way different from that
of ordinary quantum mechanics.
We will consider a group containing (perhaps among other things) two
commuting symmetry operations, characterized by vectors $n^r$ and $\bar n^r$ in
the space of group parameters, for which $\sum_{rs} n^r \bar n^s C^t_{rs} = 0$. (One of these
symmetry operations may be time translation.) To satisfy the constraints,
let us try the assumption that the matrices $n\cdot T$, $\bar n\cdot T$, and the relevant
$u^{(\alpha)}(n)$, $u^{(\alpha)\dagger}(n)$, $u^{(\beta)}(\bar n)$, and $u^{(\beta)\dagger}(\bar n)$ all commute with one another (where
by relevant we mean that $\Delta^{(\alpha)}(n)$ and $\Delta^{(\beta)}(\bar n)$ are not all zero). Then the
definition (28), (29) shows that the matrices $n\cdot\tau$ and $\bar n\cdot\tau$ commute with
each other and with the relevant $u^{(\alpha)}(n)$, $u^{(\alpha)\dagger}(n)$, $u^{(\beta)}(\bar n)$, and $u^{(\beta)\dagger}(\bar n)$.
The constraint (32) is then satisfied, as every term in this constraint simply
vanishes.
As simple as this example is, it represents a non-trivial generalization of
ordinary quantum mechanics. Since the Hermitian n · T and the relevant
u(α) (n) and u(α)† (n) all commute with one another, we can choose a basis
in which they are all diagonal, with

[u(α) (n)]M N = δM N uαM (n) , [n · T ]M N = δM N n · TM .

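The transformation (30) then reads
$$\delta_{\epsilon n}\rho_{MN} = \epsilon\left[i\,n\cdot(T_M - T_N) + \sum_\alpha \Delta^{(\alpha)}(n)\left(u_{\alpha M}(n)\,u_{\alpha N}(n)^* - \frac{1}{2}|u_{\alpha M}(n)|^2 - \frac{1}{2}|u_{\alpha N}(n)|^2\right)\right]\rho_{MN} ,$$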

so that ρM N is not simply multiplied by a difference f (M ) − f (N ) for some
function f , as in ordinary quantum mechanics. In the absence of any other
symmetries the parameters ∆(α) (n), n · TM , and uαM (n) would be uncon-
strained, except that ∆(α) (n) and TM are real.
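As a numerical illustration of this example, the sketch below (assuming NumPy, a single index $\alpha$, and arbitrary made-up values for $n\cdot T_M$, $u_{\alpha M}(n)$ and $\Delta^{(\alpha)}(n)$) checks that the element-wise rule agrees with the general expression (30) for diagonal matrices. It also checks the elementary identity that the real part of the factor multiplying $\rho_{MN}$ equals $-\frac{1}{2}\Delta^{(\alpha)}|u_{\alpha M}-u_{\alpha N}|^2$, which is what prevents the factor from being a pure phase difference of the form $f(M)-f(N)$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps, delta = 3, 1e-3, 0.8

# Illustrative diagonal entries: real n.T_M and complex, traceless u_{alpha M}(n)
T = rng.normal(size=d)
u = rng.normal(size=d) + 1j * rng.normal(size=d)
u -= u.mean()

# Factor multiplying rho_MN in the diagonal-basis formula
F = eps * (
    1j * (T[:, None] - T[None, :])
    + delta * (u[:, None] * u[None, :].conj()
               - 0.5 * np.abs(u[:, None]) ** 2
               - 0.5 * np.abs(u[None, :]) ** 2)
)

# Random density matrix
b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = b @ b.conj().T
rho /= np.trace(rho).real

# General expression (30) evaluated with diagonal matrices
U, nT = np.diag(u), np.diag(T).astype(complex)
UdU = U.conj().T @ U
general = (1j * eps * (nT @ rho - rho @ nT)
           + eps * delta * (U @ rho @ U.conj().T - 0.5 * UdU @ rho - 0.5 * rho @ UdU))
elementwise = F * rho

print(np.allclose(general, elementwise))
```

With positive $\Delta^{(\alpha)}$ and positive $\epsilon$ the off-diagonal elements are damped at a rate set by $|u_{\alpha M}-u_{\alpha N}|^2$, while the diagonal elements are unchanged.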

VI. COMPACT GROUPS

A well-known theorem tells us that with a suitable choice of basis,
all finite-dimensional representations of compact groups are unitary. The
density matrix furnishes a $d^2$-dimensional representation of any symmetry
group, so for compact groups it should transform unitarily. As we will now
see, this does not mean that it undergoes the transformation $\rho \mapsto U\rho U^\dagger$ of
ordinary quantum mechanics, but it does constrain its transformation
properties in interesting ways, one of which will be important in dealing with the
issue of positivity.
The unitarity of the transformation (3) requires that
$$\sum_{M''N''} K^*_{MM'',NN''}[g]\,K_{M'M'',N'N''}[g] = \sum_{M''N''} K^*_{M''M',N''N'}[g]\,K_{M''M,N''N}[g] = \delta_{M'M}\,\delta_{N'N} . \qquad (33)$$
One immediate consequence that we will need in Section VII is that the
density matrix ρ = 1/d is invariant. We can see this by contracting the first
equation (33) with δM N and using the trace condition Eq. (5). This gives
$$\sum_{M''} K_{M'M'',N'M''}[g] = \delta_{M'N'} .$$

Taking the complex conjugate and dividing by d then gives the statement
of invariance:
g(1/d) = 1/d (34)
In the notation (18), Eq. (33) reads

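$$\begin{aligned}
1\otimes 1 &= \sum_{ij}\eta^{(i)}[g]\,\eta^{(j)}[g]\;u^{(i)}[g]\,u^{(j)\dagger}[g]\otimes u^{(j)}[g]\,u^{(i)\dagger}[g] \\
&= \sum_{ij}\eta^{(i)}[g]\,\eta^{(j)}[g]\;u^{(i)\dagger}[g]\,u^{(j)}[g]\otimes u^{(j)\dagger}[g]\,u^{(i)}[g] . \qquad (35)
\end{aligned}$$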

For elements of continuous groups with parameters ǫnr near the origin, we
can either use Eqs. (24) and (29) in Eq. (35), or use Eq. (26) directly in
Eq. (33), and in either way find that

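$$n\cdot\theta\otimes 1 + 1\otimes n\cdot\theta = \sum_\alpha \Delta^{(\alpha)}[n]\,u^{(\alpha)}(n)\otimes u^{(\alpha)\dagger}(n) + \sum_\alpha \Delta^{(\alpha)}[n]\,u^{(\alpha)\dagger}(n)\otimes u^{(\alpha)}(n) . \qquad (36)$$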
Taking the trace of the matrices on the right of the direct products then
gives
d n · θ + 1 Tr(n · θ) = 0 .
The trace of this equation gives 2dTr(n · θ) = 0, so n · θ = 0, and therefore
for compact groups
τr = Tr . (37)

VII. POSITIVITY

Finally, we come to the issue of positivity. It is clearly necessary that
the linear mapping (3) corresponding to a symmetry transformation should
take the density matrices of physical states into other density matrices that
are positive as well as being Hermitian and having unit trace. A linear
mapping is itself called positive if it takes all positive Hermitian matrices into
positive Hermitian matrices with the same trace. It would simplify matters
if we could just assume that all symmetry mappings are positive, but this
is not indispensable, because the density matrices of physical states may
be limited in some way that ensures that they are mapped into positive
matrices, even if some other positive matrices are not mapped into positive
matrices. This is particularly plausible for compact symmetry groups, for
which g(ρ) for any ρ varies only over a compact manifold. For instance, in
the SU (3) example of Section I, the density matrix is positive if (though not
only if) it is subject to the inequality
1
|b1 |2 + |b2 |2 + |b3 |2 ≤ a1 a2 a3 .
4
This condition is SU (3)-invariant, so under SU (3) transformations any den-
sity matrix satisfying this condition will be transformed into another density
matrix satisfying the same condition, and will therefore also be positive.
This is an important point, because we can show that if the mapping
associated with any invertible continuous symmetry acts on all density ma-
trices as a positive mapping, then this mapping must take the same form
(2) as in ordinary quantum mechanics. For the purposes of this theorem, we
only need to show that for any invertible mapping that is not of the form
(2) there is some positive density matrix ρ that is transformed into a non-
positive matrix, so we are free to choose ρ here pretty much as we like. We
will choose the density matrix ρ to have one eigenvector v with eigenvalue
zero:
ρv = v † ρ = 0 . (38)
When we take the expectation value of Eq. (30) in the “state” v, as a
consequence of Eq. (38) we find that only the first term in the sum over α
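makes a non-zero contribution:
$$v^\dagger[\rho + \delta_{\epsilon n}\rho]\,v = v^\dagger\left(\delta_{\epsilon n}\rho\right)v = \epsilon\sum_\alpha \Delta^{(\alpha)}(n)\left(v^\dagger u^{(\alpha)}(n)\,\rho\,u^{(\alpha)\dagger}(n)\,v\right) . \qquad (39)$$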
It is immediately obvious that if the coefficient of ǫ is non-zero, then for
some sign of ǫ the expectation value (39) will be negative, so that ρ + δǫn ρ
cannot be positive.
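This sign argument is easy to see in the smallest case. The sketch below is an illustration under assumed values, not the paper's construction: it takes $d=2$, $\rho = \mathrm{diag}(1,0)$, the null vector $v=(0,1)$, a single traceless matrix $u = \sigma_x/\sqrt{2}$ and $\Delta = 1$, and evaluates the expectation value (39) for both signs of $\epsilon$.

```python
import numpy as np

rho = np.diag([1.0, 0.0]).astype(complex)      # positive, with rho v = 0
v = np.array([0.0, 1.0], dtype=complex)
u = np.array([[0, 1], [1, 0]], dtype=complex) / np.sqrt(2)  # traceless, Tr(u^dagger u) = 1
delta = 1.0

def transformed(eps):
    """rho + delta_rho from the non-Hamiltonian part of Eq. (30)."""
    udu = u.conj().T @ u
    drho = eps * delta * (u @ rho @ u.conj().T - 0.5 * udu @ rho - 0.5 * rho @ udu)
    return rho + drho

def expectation(eps):
    return (v.conj() @ transformed(eps) @ v).real

print(expectation(0.01), expectation(-0.01))
print(np.linalg.eigvalsh(transformed(-0.01)).min())
```

For this choice the expectation value is $\epsilon/2$, so it is negative for $\epsilon<0$ and the transformed matrix then has a negative eigenvalue.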
It only remains to show that unless all ∆(α) (n) vanish, for some vector v
there will be some positive Hermitian matrix ρ satisfying Eq. (38) for which
the coefficient of ǫ in Eq. (39) is non-zero. (This is obvious if all ∆(α) (n)
have the same sign, but we want also to consider the case where some are
positive and some are negative.) Let us suppose the contrary; that is, for
all $v$ we have
$$\sum_\alpha \Delta^{(\alpha)}(n)\left(v^\dagger u^{(\alpha)}(n)\,\rho\,u^{(\alpha)\dagger}(n)\,v\right) = 0 \qquad (40)$$

for all positive Hermitian matrices ρ satisfying Eq. (38). We will show that
in this case, we must have ∆(α) (n) = 0 for all α.
We are free to take ρ to have no eigenvectors with eigenvalue zero other
than v. In this case, the condition that ρ is positive puts no constraints on
infinitesimal variations of ρ, so the assumption that Eq. (40) holds for all
positive Hermitian ρ satisfying Eq. (38) requires that

$$\sum_\alpha \Delta^{(\alpha)}(n)\,[v^\dagger u^{(\alpha)}(n)]_M\,[u^{(\alpha)\dagger}(n)\,v]_N = c^*_M\,v_N + v^*_M\,c_N , \qquad (41)$$

for all N and M , and for some vector cN which may depend on n and v.
In fact, cN must depend on v, because Eq. (41) is supposed to hold for all
v, so there have to be the same numbers of vs and v ∗ s on both sides of
the equation. Specifically, we must have $c_N = \sum_L C_{NL}\,v_L$, where $C_{NL}$ is
independent of $v$, and the coefficient of $v^*_P v_Q$ in Eq. (41) must vanish:
$$\sum_\alpha \Delta^{(\alpha)}(n)\,u^{(\alpha)}_{PM}(n)\,u^{(\alpha)\dagger}_{NQ}(n) = C^*_{MP}\,\delta_{NQ} + \delta_{MP}\,C_{NQ} . \qquad (42)$$

Now we can use some of the properties of the $u^{(\alpha)}(n)$ obtained in Section IV.
Consider any $\beta$, and contract Eq. (42) with $u^{(\beta)\dagger}_{MP}(n)\,u^{(\beta)}_{QN}(n)$. Because the
$u^{(\beta)}(n)$ are traceless, the right-hand side of the contracted equation vanishes,
and because they satisfy the orthonormality condition $\mathrm{Tr}[u^{(\beta)\dagger}u^{(\alpha)}] = \delta_{\beta\alpha}$,
the left-hand side of the contracted equation is $\Delta^{(\beta)}(n)$, so $\Delta^{(\beta)}(n) = 0$ for
all $\beta$, as was to be shown.
This theorem leaves open the possibility of limiting the density matrix
to a special class of positive Hermitian matrices, for which any symmetry
transformation takes any member of this class into another positive member
of the same class, as in the SU (3) example given above. But does such a
special class always exist? Apparently it does, at least for compact groups.
We saw in the previous section that in a suitable basis, such symmetries leave
invariant the positive density matrix 1/d. Suppose we shift this density
matrix by some traceless Hermitian matrix η. The new density matrix
1/d+η will generally not be invariant, but as long as η is sufficiently small the
symmetry transformations belonging to a compact group will take 1/d + η
into density matrices whose eigenvalues are sufficiently close to the original
eigenvalues 1/d so that they are still all positive. Thus, acting on a density
matrix 1/d + η with all symmetries g belonging to a compact group, and
with η running over all traceless Hermitian matrices that are sufficiently
small so that all g(1/d + η) are positive, provides the sort of special class
of density matrices we need, which can transform in a way that is different
from the transformation (2) of ordinary quantum mechanics without raising
problems with positivity.

VIII. MAPPINGS WITH POSITIVE EIGENVALUES

There is a class of mappings that are obviously positive, in the sense of
taking all positive Hermitian matrices into positive Hermitian matrices.
Inspection of Eq. (13) shows immediately that a mapping is positive if (though
not only if) all its eigenvalues $\eta^{(i)}$ are positive. In this case, we can write
the general mapping (13) in the Kraus form[15]:
$$g(\rho) = \sum_i A^{(i)}[g]\,\rho\,A^{(i)\dagger}[g] \qquad (43)$$
where $A^{(i)}[g] \equiv \eta^{(i)}[g]^{1/2}\,u^{(i)}[g]$ and $\sum_i A^{(i)\dagger}[g]A^{(i)}[g] = 1$. The unitary
transformations $\rho \mapsto U[g]\,\rho\,U[g]^\dagger$ of ordinary quantum mechanics are a
special case of the more general transformations (43).
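A mapping of the Kraus form (43) can be tried out numerically. The sketch below assumes NumPy and builds a hypothetical set of operators $A^{(i)}$ from random matrices, normalized so that $\sum_i A^{(i)\dagger}A^{(i)} = 1$; these are illustrative stand-ins, not operators derived from any particular symmetry. It then checks that the map preserves trace, Hermiticity and positivity.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# Random matrices B_i, then A_i = B_i S^{-1/2} with S = sum_i B_i^dagger B_i,
# so that sum_i A_i^dagger A_i = 1 by construction.
Bs = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(4)]
S = sum(B.conj().T @ B for B in Bs)
w, V = np.linalg.eigh(S)
S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.conj().T
As = [B @ S_inv_sqrt for B in Bs]

def kraus_map(rho):
    """Apply rho -> sum_i A_i rho A_i^dagger, the form of Eq. (43)."""
    return sum(A @ rho @ A.conj().T for A in As)

# Random positive density matrix
b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = b @ b.conj().T
rho /= np.trace(rho).real

out = kraus_map(rho)
completeness = sum(A.conj().T @ A for A in As)
print(np.trace(out).real, np.linalg.eigvalsh(out).min())
```

Each term $A\rho A^\dagger$ is individually positive, which is why positivity of all the $\eta^{(i)}$ is sufficient, though not necessary, for a positive mapping.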
The assumption that all eigenvalues of the kernel are positive would be
an effective way of guaranteeing that positive density matrices are mapped
into positive density matrices, but it would have the consequence that the
transformation of the density matrix under any symmetry group, whether
continuous or discrete, would reduce to the same unitary transformation
rule (2) as in ordinary quantum mechanics.
We can see this immediately for continuous groups. If for some contin-
uous symmetry g we were to require the positivity of the eigenvalues (23)
whatever the sign of ǫ, we would have to assume that ∆(α) [g] = 0 for all
α. As already mentioned in Section IV, any continuous symmetry for which
∆(α) [g] vanishes for all α is necessarily realized by the unitary transforma-
tion (2) of ordinary quantum mechanics. Of course, we already knew this.
With all eigenvalues positive, mappings preserve the positivity of any den-
sity matrix, and as shown in the previous section any continuous symmetry
with an inverse that preserves the positivity of all density matrices must act
as in Eq. (2).
As already mentioned in Section I, in various extended versions of quan-
tum mechanics[9] the assumption of positive mapping of the density matrix
is commonly made for the continuous symmetry of time-translation, but
only prospectively, not retrospectively. It is usually assumed in these theo-
ries that the kernel for time-translation by an amount τ has positive eigen-
values if τ > 0, but this is not assumed (and in fact is not generally true)
when τ < 0. For this reason, the positive time-translation mappings usu-
ally considered in these theories form a semi-group, not a group. With this
weaker assumption, Eq. (23) simply requires that $\Delta^{(\alpha)}(n_T) \ge 0$, where $n_T$
denotes the direction in the space of group parameters for time-translation.
The transformation rule (30) then immediately yields the Lindblad equation
$$\frac{d}{dt}\rho = -i[H,\rho] + \sum_\alpha\left[L_\alpha\,\rho\,L_\alpha^\dagger - \frac{1}{2}L_\alpha^\dagger L_\alpha\,\rho - \frac{1}{2}\rho\,L_\alpha^\dagger L_\alpha\right] ,$$
where $H \equiv -n_T\cdot T$ and $L_\alpha \equiv \Delta^{(\alpha)1/2}[n_T]\,u^{(\alpha)}[n_T]$. If ρ(t) is positive for
some initial time t then this equation gives a positive ρ(t′ ) for all t′ > t but,
as illustrated in ref. [10], in these theories one can usually find an earlier
time t′ < t for which ρ(t′ ) is not positive. As I understand it, this is tol-
erated in these extended versions of quantum mechanics because unless we
tackle the description of the whole universe the differential equation for the
time-dependence of the density matrix is only supposed to apply for closed
systems, which become closed at some initial time t, so one does not have to
worry about what happens for times t′ < t. One can argue about whether
this is satisfactory for time-translation, but we would certainly not want
to assume that the eigenvalues of K[g] are all positive when the symmetry
transformation g is a spatial translation to the north but not to the south, or
is a rotation that is clockwise around the vertical but not counter-clockwise,
or is a boost that increases the velocity to the eastward but not to the west-
ward. Such symmetry transformations must be assumed to form a group,
not merely a semi-group.
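The semi-group character of Lindblad evolution shows up already for a single qubit. The sketch below is an assumed toy model, not one from the paper: $H=0$ and a single operator $L=\sqrt{\gamma}\,\sigma_z$ (pure dephasing), integrated with a hand-written fourth-order Runge-Kutta step. Starting from the pure state with all matrix elements equal to $1/2$, forward evolution keeps $\rho$ positive, while running the same equation backward in time produces a negative eigenvalue.

```python
import numpy as np

gamma = 1.0
sz = np.diag([1.0, -1.0]).astype(complex)
L = np.sqrt(gamma) * sz
H = np.zeros((2, 2), dtype=complex)

def rhs(rho):
    """Right-hand side of the Lindblad equation for one operator L."""
    LdL = L.conj().T @ L
    return (-1j * (H @ rho - rho @ H)
            + L @ rho @ L.conj().T - 0.5 * LdL @ rho - 0.5 * rho @ LdL)

def evolve(rho, t, steps=2000):
    """Integrate from 0 to t (t may be negative) with RK4."""
    h = t / steps
    for _ in range(steps):
        k1 = rhs(rho)
        k2 = rhs(rho + 0.5 * h * k1)
        k3 = rhs(rho + 0.5 * h * k2)
        k4 = rhs(rho + h * k3)
        rho = rho + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

rho0 = 0.5 * np.ones((2, 2), dtype=complex)   # pure state, positive
forward = evolve(rho0, 0.5)
backward = evolve(rho0, -0.5)
min_fwd = np.linalg.eigvalsh(forward).min()
min_bwd = np.linalg.eigvalsh(backward).min()
print(min_fwd, min_bwd)
```

In this model the off-diagonal element evolves as $\rho_{01}(t) = \frac{1}{2}e^{-2\gamma t}$, so for $t<0$ it exceeds $1/2$ in magnitude and one eigenvalue $\frac{1}{2}-|\rho_{01}|$ becomes negative.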
With a little more effort, we can show that for all symmetry groups,
discrete as well as continuous, the assumption of positive eigenvalues rules out
the possibility that the density matrix will have an unusual transformation
rule, one different from Eq. (2). To prove this, we use one of the defining
properties of groups, that for every element $g$ of a group there is an inverse
$g^{-1}$. From Eq. (16) and Eqs. (19)–(21), with $\bar g$ taken as $g^{-1}$, we have
$$\sum_{ij}\eta^{(i)}[g]\,\eta^{(j)}[g^{-1}]\;u^{(i)}[g]\,u^{(j)}[g^{-1}]\,\rho\,u^{(j)\dagger}[g^{-1}]\,u^{(i)\dagger}[g] = \rho , \qquad (44)$$

for any matrix $\rho$. In particular, for an Hermitian matrix $\rho$, we can find a
unitary matrix $\Omega$ such that $\rho^{(D)} = \Omega\rho\Omega^{-1}$ is diagonal, $[\rho^{(D)}]_{MN} = P_M\,\delta_{MN}$.
Eq. (44) then applies if we replace $\rho$ with $\rho^{(D)}$ and replace all $u^{(i)}[g]$ and
$u^{(j)}[g^{-1}]$ with $u^{(iD)}[g] \equiv \Omega u^{(i)}[g]\Omega^{-1}$ and $u^{(jD)}[g^{-1}] \equiv \Omega u^{(j)}[g^{-1}]\Omega^{-1}$. This
gives
$$\sum_{ij}\eta^{(i)}[g]\,\eta^{(j)}[g^{-1}]\sum_L\left[u^{(iD)}[g]\,u^{(jD)}[g^{-1}]\right]_{ML} P_L \left[u^{(iD)}[g]\,u^{(jD)}[g^{-1}]\right]^*_{NL} = P_M\,\delta_{MN} . \qquad (45)$$

This must hold for all real numbers $P_N$, so it follows that, for all $L$, $M$, and
$N$:
$$\sum_{ij}\eta^{(i)}[g]\,\eta^{(j)}[g^{-1}]\left[u^{(iD)}[g]\,u^{(jD)}[g^{-1}]\right]_{ML}\left[u^{(iD)}[g]\,u^{(jD)}[g^{-1}]\right]^*_{NL} = \delta_{ML}\,\delta_{NL} . \qquad (46)$$
In particular, if $M = N \neq L$, then
$$\sum_{ij}\eta^{(i)}[g]\,\eta^{(j)}[g^{-1}]\left|\left[u^{(iD)}[g]\,u^{(jD)}[g^{-1}]\right]_{ML}\right|^2 = 0 . \qquad (47)$$

Here is where the positivity of the eigenvalues becomes important. If all the
eigenvalues $\eta^{(i)}[g]$ and $\eta^{(j)}[g^{-1}]$ are positive, then it follows from Eq. (47)
that for all relevant $i$ and $j$ (that is, for all $i$ and $j$ for which $\eta^{(i)}[g]$ and
$\eta^{(j)}[g^{-1}]$ respectively do not vanish) we have
$$\left[u^{(iD)}[g]\,u^{(jD)}[g^{-1}]\right]_{ML} = 0 , \qquad (48)$$

for any unequal indices $M$ and $L$. Since the matrix $u^{(iD)}[g]\,u^{(jD)}[g^{-1}]$ is
thus diagonal, it commutes with the diagonal matrix $\rho^{(D)}$. But then also
$u^{(i)}[g]\,u^{(j)}[g^{-1}]$ commutes with the arbitrary Hermitian matrix $\rho$. The only
matrices that commute with all Hermitian matrices are proportional to the
unit matrix, so we can conclude that for all relevant $i$ and $j$

u(i) [g] u(j) [g −1 ] = cij [g]1 . (49)

for some complex numerical coefficients $c_{ij}[g]$. Taking the determinant of
Eq. (49) gives $(c_{ij}[g])^d = \mathrm{Det}\,u^{(i)}[g]\;\mathrm{Det}\,u^{(j)}[g^{-1}]$. Now, there must be at least
one relevant $j$ for which $\mathrm{Det}\,u^{(j)}[g^{-1}] \neq 0$, since otherwise we would have
$u^{(i)}[g]\,u^{(j)}[g^{-1}] = 0$ for all relevant $i$ and $j$, contradicting Eq. (44). Taking
$j$ to have any value for which $\mathrm{Det}\,u^{(j)}[g^{-1}] \neq 0$, Eq. (49) then tells us that
all relevant $u^{(i)}[g]$ are proportional to a single $u[g]$; specifically,
$$u^{(i)}[g] = \left(\mathrm{Det}\,u^{(i)}[g]\right)^{1/d} \times u[g] , \qquad (50)$$
where $u[g] = u^{(j)-1}[g^{-1}] \times \left(\mathrm{Det}\,u^{(j)}[g^{-1}]\right)^{1/d}$. The trace condition (14) then
reads
$$\sum_i \eta^{(i)}[g]\,\left|\mathrm{Det}\,u^{(i)}[g]\right|^{2/d}\,u^\dagger[g]\,u[g] = 1 ,$$
so the matrix $U[g] \equiv \left(\sum_i \eta^{(i)}[g]\,\left|\mathrm{Det}\,u^{(i)}[g]\right|^{2/d}\right)^{1/2} u[g]$ is unitary, and the

transformation rule (13) takes the familiar form g(ρ) = U [g] ρ U † [g] of
ordinary quantum mechanics, as was to be proved.
But do we need to require that the kernels for general symmetries have
only positive eigenvalues? There are well-known examples of positive map-
pings that have some negative eigenvalues. The standard example is the
transposition map
KM ′ M,N ′ N = δM ′ N δN ′ M .
This has two eigenvalues, one positive and one negative. (Any symmetric or
antisymmetric matrix is an eigenmatrix with eigenvalue +1 or −1.) Never-
theless, K is positive, because g(ρ) = ρT , which is positive if ρ is positive.
There is a widely cited argument for the requirement that all eigenvalues
of the kernel K must be positive, based on the possibility of entanglement.
Consider an arbitrary system S (I) , and an arbitrary linear mapping K (I)
of the density matrix of this system, which preserves its Hermiticity, unit
trace, and positivity. We can imagine adding an isolated system S (II) of
finite dimensionality dII , and extending K (I) to a kernel K that acts as
K (I) on S (I) , and acts trivially on S (II) . That is, if we label the basis
vectors of S (I) with indices m, n, etc. and the basis vectors of S (II) with
indices a, b, etc., the kernel of the mapping (in the notation of Eqs. (3) and
(6)) on the combined system is
$$K_{m'a'ma,\,n'b'nb} = K^{(I)}_{m'm,\,n'n}\,\delta_{a'a}\,\delta_{b'b} . \qquad (51)$$
The original mapping K (I) is said to be completely positive[16] if K is
positive (in the sense of mapping all positive density matrices for the combined
system into positive density matrices) for all finite dimensionalities dII . A
theorem due to Choi[17] states that if K (I) is completely positive in this
sense, then all its eigenvalues are positive. (As usually stated, the theorem
says that any completely positive mapping takes the Kraus form (43), but
as we have seen this form follows from the positivity of the eigenvalues, and
it is obvious that any kernel that induces a transformation of this form has
only positive eigenvalues.)
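The transposition map gives a concrete picture of the gap between positivity and complete positivity. The sketch below assumes NumPy and two-dimensional systems; the Bell state used for the combined system is a standard choice introduced here for illustration. It builds the kernel $K_{M'M,N'N} = \delta_{M'N}\delta_{N'M}$ as a matrix with row index $(M',M)$ and column index $(N',N)$, finds its eigenvalues, and then applies transposition to only the first of two entangled systems, as in the extension (51).

```python
import numpy as np

d = 2

# Kernel of the transposition map, row index (M', M), column index (N', N)
K = np.zeros((d * d, d * d))
for a in range(d):          # a = M' = N
    for b in range(d):      # b = M  = N'
        K[a * d + b, b * d + a] = 1.0
kernel_eigs = np.linalg.eigvalsh(K)

# Entangled state (|00> + |11>)/sqrt(2) of S(I) and S(II)
phi = np.zeros(d * d)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho = np.outer(phi, phi)

# Transpose only the S(I) indices: rho_{ma,nb} -> rho_{na,mb}
partial = rho.reshape(d, d, d, d).transpose(2, 1, 0, 3).reshape(d * d, d * d)

full_eigs = np.linalg.eigvalsh(rho.T)       # transposition of the whole matrix
partial_eigs = np.linalg.eigvalsh(partial)  # transposition acting on S(I) alone
print(kernel_eigs, full_eigs.min(), partial_eigs.min())
```

Transposing the whole matrix leaves its eigenvalues unchanged, but the partially transposed Bell state has an eigenvalue $-1/2$, so transposition is positive without being completely positive, in line with its kernel having an eigenvalue $-1$.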
Though there is no doubt of the mathematical correctness of the Choi
theorem, it is not clear that it is relevant physically. In particular, the
vacuum is the only physical system that is invariant under Galilean (or
Lorentz) transformations and time translations. Since the dimensionality of
the Hilbert space of the vacuum is unity, this does not fulfill the conditions of
the Choi theorem, that there should be isolated systems S (II) with arbitrary
finite dimensionality on which the symmetry acts trivially.
There seems to be a widespread impression that this does not matter, at
least for the only symmetry that has been previously studied in this context,
the symmetry of time translation. It is supposed that, even if a symmetry
transformation K acts non-trivially on S (II) , we may be able to undo it by
inventing a transformation L that acts on S (II) as the inverse of K, and
leaves S (I) unchanged, so that LK does have the form (51). (I have not
been able to find a published reference to this argument.) But in general,
except in the uninteresting case in which S (I) is the vacuum, L will not be
a symmetry transformation, so neither will be LK. Or if we take L as a
symmetry transformation that acts non-trivially on S (I) , then the action of
LK on the Hilbert space of S (I) is a completely positive mapping, but it is
not the same mapping as K (I) .
There are some continuous symmetry transformations, such as rotations,
for which there are invariant physical systems with Hilbert spaces of arbi-
trary dimensionality which therefore might be taken as the isolated system
S (II) in the assumptions of the Choi theorem. Even so, in the real world
there are no disembodied spins, only particles with spins. The Hilbert space
of any physical system other than the vacuum has infinite dimensionality,
and it is not clear that the Choi theorem can be extended to realistic cases.
Despite these skeptical comments, it may turn out to be physically nec-
essary for the kernels K[g] for all elements g of symmetry transformation
groups to have only positive eigenvalues, even for symmetries like Galilean
invariance. In that case, the main point of this paper would be the
proof that such symmetry transformations act on the density matrix as in
ordinary quantum mechanics. It would be much more interesting if for some
symmetry groups it turns out to be unnecessary for all eigenvalues to be
positive, in which case the much richer variety of symmetry transformations
discussed in Sections II and III would be physically possible.

Added note: This work was originally presented on February 28, 2014, at a
conference in honor of Joseph Polchinski at the Kavli Institute for Theoret-
ical Physics, Santa Barbara, CA. I subsequently was sent two papers of J.
A. Barandes and D. Kagan that also propose to regard the density matrix
rather than state vectors as the representation of reality. Distribution of the
present paper has been delayed in order to take account of comments of G.
Moore on the talk given in Santa Barbara.

Special thanks are due to Gregory Moore for his incisive comments on the
original version of this work. I am also grateful to Nicolas Gisin and Philip
Pearle for helpful recent correspondence, and to Jacques Distler, Angelo
Bassi, Gian Carlo Ghirardi, James Hartle, and Roderich Tumulka for inter-
esting earlier communications on the interpretation of quantum mechanics.
This material is based upon work supported by the National Science Foun-
dation under Grant Numbers PHY-1316033 and PHY-0969020 and with
support from The Robert A. Welch Foundation, Grant No. F-0014.

———

1. N. Bohr, Nature 121, 580 (1928).

2. The published version is H. Everett, Rev. Mod. Phys. 29, 454 (1957).

3. R. B. Griffiths, J. Stat. Phys. 36, 219 (1984); R. Omnès, Rev. Mod.
Phys. 64, 339 (1992); M. Gell-Mann and J. B. Hartle, in Complexity,
Entropy, and the Physics of Information, ed. W. Zurek (Addison-Wesley,
Reading, MA, 1990); in Proceedings of the Third International
Symposium on the Foundations of Quantum Mechanics in the Light of
New Technology, ed. S. Kobayashi, H. Ezawa, Y. Murayama, and S.
Nomura (Physical Society of Japan, 1990); in Proceedings of the 25th
International Conference on High Energy Physics, Singapore, August
2-8, 1990, ed. K. K. Phua and Y. Yamaguchi (World Scientific,
Singapore, 1990); J. B. Hartle, Directions in Relativity, Vol. 1, ed. B.-L.
Hu, M. P. Ryan, and C. V. Vishveshwara (Cambridge University Press,
Cambridge, 1993). For a survey and more recent references, see P.
Hohenberg, Rev. Mod. Phys. 82, 2835 (2010).

4. A. Einstein, B. Podolsky, and N. Rosen, Phys. Rev. 47, 777 (1935); D.
Bohm, Quantum Theory (Prentice-Hall, Inc., New York, 1951),
Chapter XXII; D. Bohm and Y. Aharonov, Phys. Rev. 108, 1070 (1957).

5. S. Weinberg, Phys. Rev. A 85, 062116 (2012).

6. N. Gisin, Helv. Phys. Acta 62, 363 (1989); Phys. Lett. A 143, 1
(1990).

7. J. Polchinski, Phys. Rev. Lett. 66, 397 (1991).

8. G. Lindblad, Commun. Math. Phys. 48, 119 (1976); V. Gorini, A.
Kossakowski and E. C. G. Sudarshan, J. Math. Phys. 17, 821 (1976).
The Lindblad equation can be derived as a straightforward application
of an earlier result of A. Kossakowski, Reports on Math. Phys. 3,
247 (1972), Eq. (77). The equation was independently derived by T.
Banks, L. Susskind, and M. E. Peskin, Nucl. Phys. B 244, 125 (1984).

9. G. C. Ghirardi, A. Rimini and T. Weber, Phys. Rev. D 34, 470
(1986); P. Pearle, Phys. Rev. A 39, 2277 (1989); G. C. Ghirardi,
P. Pearle, and A. Rimini, Phys. Rev. A 42, 78 (1990); P. Pearle,
in Quantum Theory: A Two-Time Success Story (Yakir Aharonov
Festschrift), eds. D. C. Struppa and J. M. Tollaksen (Springer, 2013),
Chapter 9 [arXiv:1209.5082]. For a review, see A. Bassi and G. C.
Ghirardi, Physics Reports 379, 257 (2003).

10. S. Weinberg, paper in preparation.

11. This theorem was suggested to me by G. Moore, private communication.

12. E. P. Wigner, Ann. Math 40, 149 (1939).

13. On this, see Banks, Susskind, and Peskin, ref. 8; G. C. Ghirardi, R.
Grassi, and P. Pearle, Found. Phys. 20, 1271 (1990).

14. P. Pearle, Eur. J. Phys. 33, 805 (2012) [arXiv: 1204.2016].

15. K. Kraus, States, Effects, and Operations: Fundamental Notions of
Quantum Mechanics, Lecture Notes in Physics 190 (Springer-Verlag,
Berlin, 1983), Chapter 3.

16. W. F. Stinespring, Proc. Am. Math. Soc. 6, 211 (1955); M. D. Choi,
Can. J. Math. 24, 520 (1972). For a review, see F. Benatti and R.
Floreanini, Int. J. Mod. Phys. B 19, 3063 (2005) [arXiv:quant-ph/0507271].

17. M. D. Choi, Linear Algebra and its Applications 10, 285 (1975).

UTTG-01-16
arXiv:1603.06008v1 [quant-ph] 18 Mar 2016

What Happens in a Measurement?

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is assumed that in a measurement the system under study interacts with a
macroscopic measuring apparatus, in such a way that the density matrix of
the measured system evolves according to the Lindblad equation. Under an
assumption of non-decreasing von Neumann entropy, conditions on the op-
erators appearing in this equation are given that are necessary and sufficient
for the late-time limit of the density matrix to take the form appropriate for
a measurement. Where these conditions are satisfied, the Lindblad equa-
tion can be solved explicitly. The probabilities appearing in the late-time
limit of this general solution are found to agree with the Born rule, and are
independent of the details of the operators in the Lindblad equation.


Electronic address: weinberg@physics.utexas.edu

I. INTRODUCTION

According to the Copenhagen interpretation of quantum mechanics,
during a complete measurement the initial density matrix $\rho_{\rm initial}$ undergoes a
collapse
$$\rho_{\rm initial} \mapsto \rho_{\rm final} = \sum_\alpha p_\alpha\,\Lambda_\alpha \qquad (1)$$
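where $\Lambda_\alpha = |\alpha\rangle\langle\alpha|$ are projection operators onto the complete set of
orthonormal eigenvectors $|\alpha\rangle$ of whatever is being measured, satisfying the
usual conditions
$$\Lambda_\alpha\Lambda_\beta = \delta_{\alpha\beta}\,\Lambda_\alpha , \qquad \sum_\alpha \Lambda_\alpha = 1 , \qquad \mathrm{Tr}\,\Lambda_\alpha = 1 , \qquad \Lambda_\alpha^\dagger = \Lambda_\alpha , \qquad (2)$$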

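and $p_\alpha$ are probabilities, given by the Born rule
$$p_\alpha = \langle\alpha|\rho_{\rm initial}|\alpha\rangle = \mathrm{Tr}\left(\Lambda_\alpha\,\rho_{\rm initial}\right) . \qquad (3)$$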

In a closed system in ordinary quantum mechanics the state vector evolves
unitarily and deterministically, so, as is well known, the collapse (1) cannot
cur if the initial density matrix ρinitial describes a pure state (or an ensemble
of fewer pure states than the number of terms in ρfinal ). In the original for-
mulation[1] of the Copenhagen interpretation it was simply accepted that
the change in a system during measurement in principle departs from quan-
tum mechanics. We will instead adopt the popular modern view that the
Copenhagen interpretation refers to open systems in which the transition
(1) is driven by the interaction of the microscopic system under study with
a suitable environment, a macroscopic external measuring apparatus (which
may include an observer) chosen to bring this transition about.
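The transition (1) with Born-rule probabilities (3) is straightforward to write out for a finite-dimensional system. The sketch below assumes NumPy, a three-dimensional Hilbert space, and a randomly chosen orthonormal basis and pure initial state; all of these are illustrative choices and say nothing about the dynamics that brings the transition about.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

# Random orthonormal basis |alpha> from a QR decomposition, and its projectors
q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
projectors = [np.outer(q[:, a], q[:, a].conj()) for a in range(d)]

# Pure initial state
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho_initial = np.outer(psi, psi.conj())

# Born-rule probabilities, Eq. (3), and the collapsed density matrix, Eq. (1)
p = np.array([np.trace(P @ rho_initial).real for P in projectors])
rho_final = sum(pa * P for pa, P in zip(p, projectors))

purity_initial = np.trace(rho_initial @ rho_initial).real
purity_final = np.trace(rho_final @ rho_final).real
print(p, p.sum(), purity_initial, purity_final)
```

The probabilities sum to one, and the purity $\mathrm{Tr}\,\rho^2$ drops from 1 to $\sum_\alpha p_\alpha^2$, which is the sense in which a pure state cannot reach $\rho_{\rm final}$ by unitary evolution.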
Of course, this view of the Copenhagen interpretation just pushes the
hard problems of interpreting quantum mechanics to a larger scale. We make
no attempt to address these problems in the present paper, beyond noting
the conjecture[2] that the unitary evolution of microscopic systems is merely
a very good approximation, while the density matrix of combined systems
with macroscopic parts in general evolves rapidly and non-unitarily, and
in particular undergoes the collapse (1) during a measurement. Although
in this paper we are focusing on ordinary quantum mechanics in an open
system, most of our analysis applies equally to closed systems in modified
versions of quantum mechanics.
Whether in open systems in ordinary quantum mechanics or in closed
systems in some modified version of quantum mechanics, in order to avoid
instantaneous communication at a distance in entangled states, it is
important to require that the density matrix at one time depends on the density
matrix at any earlier time, but not otherwise on the state vector at the earlier
time[3]. This evolution can be linear but non-unitary in ordinary quantum
mechanics if the system under study interacts with an environment that
fluctuates randomly more rapidly than the rate at which the density matrix
evolves (set by the interaction strength) if we average over these fluctua-
tions. We do not need to go into details regarding this interaction with the
environment, because it is known that in the most general linear evolution
that preserves the unit trace and Hermiticity of the density matrix and sat-
isfies the condition of complete positivity[4], the density matrix satisfies the
Lindblad equation[5]:

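$$\frac{d\rho(t)}{dt} = -i\left[H,\rho(t)\right] + \sum_n\left[L_n\,\rho(t)\,L_n^\dagger - \frac{1}{2}L_n^\dagger L_n\,\rho(t) - \frac{1}{2}\rho(t)\,L_n^\dagger L_n\right] , \qquad (4)$$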

with constant matrices∗∗ Ln and H. We then face three questions:

1. What are the necessary conditions on the operators Ln and H for
the density matrix to approach a time-independent linear combination
such as (1) of specific projection operators Λα at late times?

2. Are these conditions sufficient?

3. For such Ln and H, are the coefficients pα of the Λα in this linear
combination given by the Born rule (3)?

The answer to the first question is given in Section II, under the assumption
that the Ln satisfy the necessary and sufficient condition[6] that the von
Neumann entropy −Tr(ρ ln ρ) should never decrease:

$$ \sum_n L_n^\dagger L_n = \sum_n L_n L_n^\dagger . \tag{5} $$

The second and third questions are answered in Section III, where we give
a general solution of the Lindblad equation, under the conditions found in
Section II.

∗∗ We limit the considerations of this paper to a Hilbert space of finite
dimensionality d. Presumably they can be extended to infinite-dimensional
spaces, on which H and the Ln act as suitably defined operators.

II. NECESSARY CONDITIONS FOR A MEASUREMENT

First, let us consider some general aspects of the late-time behavior of
the solutions of the Lindblad equation (4), without yet specializing to Ln
satisfying Eq. (5). Because Eq. (4) is linear with time-independent coeffi-
cients, it has solutions that are generically of the form
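$$ \rho(t) = \sum_k v_k \exp\big(\lambda_k t\big) , \tag{6} $$
where vk and λk are the eigenmatrices and eigenvalues of the operator ℒ
defined by the right-hand side of Eq. (4):
$$ \mathcal{L} v_k = \lambda_k v_k , \tag{7} $$
$$ \mathcal{L} v \equiv -i\big[H, v\big] + \sum_n \Big( L_n\,v\,L_n^\dagger - \frac{1}{2} L_n^\dagger L_n\,v - \frac{1}{2}\,v\,L_n^\dagger L_n \Big) , \tag{8} $$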
with the normalization of each vk in Eq. (6) of course depending on initial
conditions. (It is only for the non-degenerate case that the solution of Eq. (4)
necessarily takes the form (6); if an eigenvalue λk has an N-fold degeneracy,
then exp(λk t) may be accompanied by a polynomial in t of order up to
N − 1.) Because the operator ℒ of Eq. (8) is in general not Hermitian the
eigenvalues may be complex, and the individual vk need not be Hermitian or
positive, though the sum (6) must be both Hermitian and positive.
Even if there are eigenvalues λk with positive-definite real parts, such
terms cannot contribute to the sum (6). If they did contribute then the sum
of such terms would dominate ρ(t) at late times. But Trρ(t) must remain
constant, so the sum of terms with Reλk > 0 would have to be traceless.
Also, ρ(t) must remain Hermitian and positive, so the sum of terms with
Reλk > 0 would have to be Hermitian and positive. But then the eigenvalues
of this sum would have to be real and positive and add up to zero, which is
impossible unless all the eigenvalues vanish, in which case the sum vanishes.
The same argument rules out any contribution of powers of time for any
eigenvalues with Reλk = 0. So we conclude that the asymptotic behavior
of ρ(t) is dominated by the sum of vk exp(λk t) over all eigenmatrices with
Reλk = 0, if there are any.
In fact, as required by the constancy of the trace, there always is at least
one eigenmatrix with λk = 0. We can think of ℒ as a d² × d² matrix, acting
on the space of d × d matrices. Because Eq. (4) preserves the trace of ρ,
the unit d × d matrix 1 is a left eigenvector of ℒ with eigenvalue zero, so
Det ℒ = 0, and therefore ℒ also has a right eigenvector (not necessarily the
unit matrix) with eigenvalue zero. But in general there may be several vk
with λk = 0.
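
The statement that the unit matrix is a left eigenvector with eigenvalue zero can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a row-major vectorization of d × d matrices, and the function name `lindblad_superoperator`, the dimension d = 3 and the random H and Ln are illustrative choices only.

```python
import numpy as np

def lindblad_superoperator(H, Ls):
    """d^2 x d^2 matrix of v -> -i[H,v] + sum_n (L v L^+ - L^+L v/2 - v L^+L/2),
    acting on the row-major vectorization vec(v)."""
    d = H.shape[0]
    I = np.eye(d)
    # Row-major vectorization: vec(A v B) = kron(A, B.T) @ vec(v)
    S = -1j * (np.kron(H, I) - np.kron(I, H.T))
    for L in Ls:
        LdL = L.conj().T @ L
        S = S + np.kron(L, L.conj()) - 0.5 * np.kron(LdL, I) - 0.5 * np.kron(I, LdL.T)
    return S

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2                      # random Hermitian H (illustrative)
Ls = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(2)]

S = lindblad_superoperator(H, Ls)
unit_row = np.eye(d).reshape(-1)              # vec(1), so that Tr v = unit_row @ vec(v)
left_residual = unit_row @ S                  # vanishes because the trace is conserved
eigenvalues = np.linalg.eigvals(S)
smallest = np.min(np.abs(eigenvalues))        # one eigenvalue should be numerically zero
```

Here `left_residual` should vanish and `smallest` should be zero to rounding error for any choice of H and Ln, since only trace preservation is used.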

In order to separate the real and imaginary parts of general eigenvalues,
let us consider the quantity
$$ \mathrm{Tr}\big(v_k^\dagger v_k\big)\,\lambda_k = \mathrm{Tr}\big(v_k^\dagger\,\mathcal{L} v_k\big) . \tag{9} $$

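A straightforward calculation gives
$$ \mathrm{Tr}\big(v_k^\dagger v_k\big)\,\mathrm{Re}\,\lambda_k = -\frac{1}{2}\,\mathrm{Tr}\Big(\sum_n [v_k, L_n^\dagger]^\dagger\,[v_k, L_n^\dagger]\Big) - \frac{1}{2}\,\mathrm{Tr}\Big(v_k v_k^\dagger \sum_n \big(L_n^\dagger L_n - L_n L_n^\dagger\big)\Big) , \tag{10} $$
$$ \mathrm{Tr}\big(v_k^\dagger v_k\big)\,\mathrm{Im}\,\lambda_k = -\mathrm{Tr}\big(v_k^\dagger\,[H, v_k]\big) + \mathrm{Im}\,\mathrm{Tr}\Big(\sum_n L_n\,v_k^\dagger\,[v_k, L_n^\dagger]\Big) . \tag{11} $$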

(See Appendix A.) It is difficult to make further progress without invoking
some assumption that limits the nature of the Ln . As mentioned in Sec. I,
we shall assume that the Ln satisfy the necessary and sufficient condition
(5) for non-decreasing entropy. In this case, Eq. (10) simplifies to
$$ \mathrm{Tr}\big(v_k^\dagger v_k\big)\,\mathrm{Re}\,\lambda_k = -\frac{1}{2}\,\mathrm{Tr}\Big(\sum_n [v_k, L_n^\dagger]^\dagger\,[v_k, L_n^\dagger]\Big) . \tag{12} $$

We see immediately that the real parts of all λk are negative or zero. The
behavior of ρ(t) for t → ∞ is then dominated by the modes vk for which
Reλk = 0, for which according to Eq. (12) vk must commute with all L†n ,
and hence with all Ln . (By taking the adjoint of Eq. (7) we see that if vk is
an eigenmatrix of ℒ then so is vk† , which must appear in (6) along with vk
to keep ρ Hermitian. The adjoint of the condition that vk† commutes with
L†n tells us that vk must commute with Ln .)
Also, for such modes Eq. (11) gives
$$ \mathrm{Tr}\big(v_k^\dagger v_k\big)\,\mathrm{Im}\,\lambda_k = -\mathrm{Tr}\big(v_k^\dagger\,[H, v_k]\big) . \tag{13} $$

But it must not be thought that a vk that commutes with all Ln is necessarily
an eigenmatrix of ℒ with the real part of the eigenvalue zero and
its imaginary part of first order in H. With Ln subject to Eq. (5) and vk
commuting with all Ln , the eigenvalue equation (7) becomes

λk vk = −i[H, vk ] ,

but this is impossible if the space of matrices that commute with all Ln is
not invariant under commutation with H. In that case the commutator with
H in Eq. (7) will mix the eigenmatrices vk that commute with all Ln with
other matrices that do not commute with some Ln , giving an eigenvalue
with negative-definite real part, whose contribution vanishes for t → ∞.
Here is an example. Take d = 2, with a single Ln given by L = ℓσ3 (which
trivially satisfies Eq. (5)), and H = hσ1 , with h real. This L commutes with
the projection operators (1 ± σ3 )/2, as required in a measurement of σ3 ,
but for h ≠ 0 the commutator of H with these projection operators does
not commute with them, so the measurement doesn’t work. We can see
this in the late-time behavior of the solutions of the Lindblad equation. In
general the eigenmatrices of the operator ℒ of Eq. (8) are v0 ∝ 1, with
eigenvalue zero; v1 ∝ σ1 , with eigenvalue −2|ℓ|² < 0; and two mixtures of
σ2 and σ3 , with eigenvalues −|ℓ|² ± (|ℓ|⁴ − 4h²)^(1/2). For h = 0 there are
two eigenmatrices with eigenvalue zero, which can be taken as 1 and σ3 , or
equivalently as the projection matrices (1 + σ3 )/2 and (1 − σ3 )/2 as needed
in a measurement of σ3 ; the other eigenmatrices both have eigenvalues −2|ℓ|²,
corresponding to modes that disappear for t → ∞. On the other hand the
late-time behavior of the density matrix is entirely different if h is non-zero,
though arbitrarily small. In this case all eigenvalues have negative-definite
real part, except the eigenvalue λ0 = 0 associated with v0 ∝ 1, and for t → ∞
the density matrix approaches the maximum entropy matrix 1/2, for which
all probabilities are the same.
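
The eigenvalues quoted in this example can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than part of the original paper: the values ℓ = 1 and h = 0.3, the initial state `rho0`, and the helper names `superoperator`, `spectrum` and `evolve` are chosen only for the demonstration, and matrices are vectorized row by row.

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)    # sigma_1
s3 = np.array([[1, 0], [0, -1]], dtype=complex)   # sigma_3
I2 = np.eye(2, dtype=complex)

def superoperator(H, L):
    """4 x 4 matrix of Eq. (8) for a single L_n, acting on row-major vec(v)."""
    LdL = L.conj().T @ L
    return (-1j * (np.kron(H, I2) - np.kron(I2, H.T))
            + np.kron(L, L.conj())
            - 0.5 * np.kron(LdL, I2) - 0.5 * np.kron(I2, LdL.T))

def spectrum(ell, h):
    """Eigenvalues for L = ell * sigma_3 and H = h * sigma_1."""
    return np.linalg.eigvals(superoperator(h * s1, ell * s3))

def evolve(rho0, ell, h, t):
    """rho(t), obtained by expanding vec(rho0) in the eigenmatrices."""
    w, V = np.linalg.eig(superoperator(h * s1, ell * s3))
    c = np.linalg.solve(V, rho0.reshape(-1))
    return (V @ (np.exp(w * t) * c)).reshape(2, 2)

# Pure initial state with sigma_3 probabilities 0.8 and 0.2 (illustrative)
rho0 = np.array([[0.8, 0.4], [0.4, 0.2]], dtype=complex)

eigs_with_h = spectrum(1.0, 0.3)           # 0, -2, -1 + 0.8, -1 - 0.8
eigs_without_h = spectrum(1.0, 0.0)        # 0, 0, -2, -2
late_without_h = evolve(rho0, 1.0, 0.0, 200.0)   # keeps the sigma_3 probabilities
late_with_h = evolve(rho0, 1.0, 0.3, 200.0)      # tends to the matrix 1/2
```

With h = 0 the late-time matrix retains the diagonal elements 0.8 and 0.2, while for h = 0.3 it tends to 1/2, in line with the discussion above.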
With this background, let us now consider what happens in a measure-
ment. We suppose that the microscopic system under study interacts with
a macroscopic measuring apparatus, in such a way that the density matrix
of the microscopic system evolves according to the Lindblad equation (4),
with the measuring apparatus chosen so that the matrices Ln and H have
whatever properties are needed so that ρ(t) at late times approaches a linear
combination of projection operators Λα on the eigenstates |α⟩ of whatever
is being measured. As we have seen in Eq. (12), in order for this to be the
case without putting any constraints on the initial conditions that determine
the coefficients in this linear combination, it is necessary that the matrices
Ln should commute with any linear combination of the Λα , and hence with
each Λα :
[Ln , Λα ] = 0 , (14)
from which it follows immediately that each Ln must itself be a linear
combination of the Λα :
$$ L_n = \sum_\alpha \ell_{n\alpha}\,\Lambda_\alpha , \tag{15} $$

with coefficients ℓnα that are in general complex numbers. (From Eq. (14) it
follows that the eigenstates |α⟩ satisfying Λβ |α⟩ = δαβ |α⟩ must be eigenstates
of the Ln :
Ln |α⟩ = Ln Λα |α⟩ = Λα Ln |α⟩ = |α⟩⟨α|Ln |α⟩ .
Then Ln has the same action on any |α⟩ as does the sum (15) with ℓnα =
⟨α|Ln |α⟩, and since the |α⟩ form a complete set, Ln must equal the sum
(15).) From Eq. (15) the condition (5) for non-decreasing entropy follows
trivially.
This leaves us with the matrix H. As remarked earlier, in order that the
limiting behavior of ρ(t) for general initial conditions should be a linear com-
bination of the Λα , it is necessary that the space of such linear combinations
should be invariant under commutation with H:
$$ [H, \Lambda_\alpha] = \sum_\beta h_{\alpha\beta}\,\Lambda_\beta . $$

By multiplying this commutator on both the left and right with any Λβ ,
we see that 0 = hαβ , and therefore H must commute with all Λα . By the
same argument used above for the Ln , we see then that H must be a linear
combination of the Λα :
$$ H = \sum_\alpha h_\alpha\,\Lambda_\alpha , \tag{16} $$

with real coefficients hα .


This is a good place to bring up a complication. The late-time behavior
(1) is expected only for a complete measurement. It is more common for
measurements to be incomplete, in the sense that they do not lead to definite
states |α⟩ with definite probabilities, but to equivalence classes of states
that are not distinguished by the measurement. For instance, in a system
consisting of two spins 1/2, we might measure only the first spin, leaving the
other undisturbed. The states then fall into two classes, labeled by the
z-components of the two spins: one class consists of |1/2, 1/2⟩ and |1/2, −1/2⟩,
and the other consists of |−1/2, 1/2⟩ and |−1/2, −1/2⟩. In incomplete
measurements, instead of (1), the expected late-time limit of the density
matrix is
$$ \rho_{\rm initial} \mapsto \rho_{\rm final} = \sum_C \Lambda_C\,\rho_{\rm initial}\,\Lambda_C , \tag{17} $$
where
$$ \Lambda_C = \sum_{\alpha\in C} \Lambda_\alpha . \tag{18} $$
As far as the states within a single class are concerned, ΛC acts just like a
unit matrix, so Eq. (17) says that the measurement does nothing to what is
not being measured. For a complete measurement, where each state belongs
to a different class, Eq. (17) reduces to Eqs. (1) and (3).
Eq. (12) shows that in order for ρ(t) to have some given asymptotic limit
ρfinal , it is necessary for all Ln to commute with this limit, and since this
must be true for all ρinitial , the Ln here must in particular commute with
$\sum_C \Lambda_C \Lambda_\alpha \Lambda_C = \Lambda_\alpha$. The same argument as given above for complete
measurements then shows that, here too, each Ln must be a linear combination
(15) of the Λα . Only now there is a constraint on the coefficients. The
commutator of the sum (15) with the limit (17) is
$$ \Big[\sum_\alpha \ell_{n\alpha}\Lambda_\alpha \,,\; \sum_C \Lambda_C\,\rho_{\rm initial}\,\Lambda_C\Big] = \sum_C \sum_{\beta,\gamma\in C} \big[\ell_{n\beta} - \ell_{n\gamma}\big]\,\Lambda_\beta\,\rho_{\rm initial}\,\Lambda_\gamma $$

which vanishes for all initial density matrices if ℓnβ = ℓnγ for all β and γ in
the same class. The same argument shows that hβ = hγ if β and γ are in
the same class. This is reasonable, because for an incomplete measurement
the Lindblad equation must not distinguish between different states in the
same class. We will see in the next section that in this case the late-time
limit of the density matrix does have the form (17).
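
The two-spin example can be made concrete with a short numerical sketch. It is an illustration only, assuming the basis ordering (+,+), (+,−), (−,+), (−,−) for |s1, s2⟩; the helper names `measure` and `reduced_second_spin` and the random initial state are hypothetical choices, not taken from the paper.

```python
import numpy as np

up = np.diag([1.0, 0.0])       # projector on s_z = +1/2 for one spin
down = np.diag([0.0, 1.0])     # projector on s_z = -1/2 for one spin
one = np.eye(2)

# Lambda_C of Eq. (18): only the first spin is measured
class_projectors = [np.kron(up, one), np.kron(down, one)]
# Lambda_alpha for a complete measurement of both spins
state_projectors = [np.kron(a, b) for a in (up, down) for b in (up, down)]

def measure(rho, projectors):
    """Late-time limit sum_P P rho P for the given set of projectors."""
    return sum(P @ rho @ P for P in projectors)

def reduced_second_spin(rho):
    """Partial trace over the first spin."""
    return np.einsum('abac->bc', rho.reshape(2, 2, 2, 2))

rng = np.random.default_rng(2)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho_initial = np.outer(psi, psi.conj())                  # random entangled pure state

rho_incomplete = measure(rho_initial, class_projectors)  # Eq. (17)
rho_complete = measure(rho_initial, state_projectors)    # Eq. (1)
```

In this sketch the reduced density matrix of the second spin is the same before and after the incomplete measurement, and coherences within a class survive, whereas the complete measurement removes every off-diagonal element.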

III. COLLAPSE OF THE DENSITY MATRIX

First let us give the solution of the Lindblad equation under the condition
that the matrices Ln and H in this equation are linear combinations (15),
(16) of projection operators Λα satisfying Eq. (2):
$$ L_n = \sum_\alpha \ell_{n\alpha}\,\Lambda_\alpha , \qquad H = \sum_\alpha h_\alpha\,\Lambda_\alpha . $$

It is straightforward to check that Eq. (4) is then satisfied by


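$$ \rho(t) = \sum_{\alpha\beta} \Lambda_\alpha\,M\,\Lambda_\beta\,\exp\big(\lambda_{\alpha\beta}\,t\big) , \tag{19} $$
where
$$ \lambda_{\alpha\beta} = -\frac{1}{2}\sum_n \big|\ell_{n\alpha} - \ell_{n\beta}\big|^2 + i\,\mathrm{Im}\sum_n \ell_{n\alpha}\,\ell^*_{n\beta} - i\big(h_\alpha - h_\beta\big) , \tag{20} $$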
and M is an arbitrary matrix, independent of α, β, and time. [See Appendix
B.] To relate M to the initial value of ρ(t) at t = 0, set t = 0 in Eq. (19) and
use the completeness condition $\sum_\alpha \Lambda_\alpha = 1$. We see that ρ(0) = M , and so
$$ \rho(t) = \sum_{\alpha\beta} \Lambda_\alpha\,\rho(0)\,\Lambda_\beta\,\exp\big(\lambda_{\alpha\beta}\,t\big) . \tag{21} $$

This is our general solution.[7]


Now consider the behavior of this solution at late times. The only terms
in the sum (21) that do not decay exponentially are those with ℓnα = ℓnβ
for all n. If for the moment we rule out degeneracy, so that ℓnα can equal
ℓnβ for all n only for α = β, then all λαβ have negative-definite real part
except those with α = β, for which the imaginary as well as the real parts of
λαα vanish. These terms then dominate the asymptotic behavior[8] of the
density matrix for t → ∞:
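$$ \rho(t) \to \sum_\alpha \Lambda_\alpha\,\rho(0)\,\Lambda_\alpha = \sum_\alpha \Lambda_\alpha\,\langle\alpha|\rho(0)|\alpha\rangle . \tag{22} $$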

This is just the behavior (1) called for by the Copenhagen interpretation,
with probabilities pα given by the Born rule (3).
The case of degeneracy arises in an incomplete measurement, in which
we only measure whether the system is in some state or other in a class of
states that are not distinguished by the measurement. As indicated at the
end of the previous section, in this case we expect ℓnα = ℓnβ for all n
and hα = hβ if (and only if) |α⟩ and |β⟩ are in the same class. Then Eq. (21)
has the expected late-time behavior (17).
It is striking that although the detailed time-dependence of the density
matrix depends on the coefficients ℓnα and hα appearing in the matrices in
the Lindblad equation, the asymptotic limit for t → ∞ for both complete
and incomplete measurements does not depend on these details, depending
only on the initial condition ρ(0) and on what it is that is being measured.
This, of course, is just what we require of a measurement.
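
As a numerical illustration of the solution (21), one can work in the basis in which the Λα are diagonal, so that Ln and H are diagonal matrices and Eq. (21) multiplies each element ραβ(0) by exp(λαβ t). The sketch below makes that assumption explicit; the dimension, the random coefficients `ell` and `h`, and the function names are illustrative choices and not part of the original paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_ops = 3, 2
ell = rng.normal(size=(n_ops, d)) + 1j * rng.normal(size=(n_ops, d))  # ell[n, alpha]
h = rng.normal(size=d)                                                # h[alpha]

def rates(ell, h):
    """lambda[alpha, beta] of Eq. (20)."""
    diff = ell[:, :, None] - ell[:, None, :]
    cross = np.einsum('na,nb->ab', ell, ell.conj())
    return (-0.5 * np.sum(np.abs(diff) ** 2, axis=0)
            + 1j * cross.imag
            - 1j * (h[:, None] - h[None, :]))

def lindblad_rhs(rho, ell, h):
    """Right-hand side of Eq. (4) with L_n = diag(ell[n]) and H = diag(h)."""
    H = np.diag(h).astype(complex)
    out = -1j * (H @ rho - rho @ H)
    for row in ell:
        L = np.diag(row)
        LdL = L.conj().T @ L
        out += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return out

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho0 = A @ A.conj().T
rho0 /= np.trace(rho0).real                      # random initial density matrix

lam = rates(ell, h)
t = 0.7
rho_t = rho0 * np.exp(lam * t)                   # Eq. (21), element by element
residual = lam * rho_t - lindblad_rhs(rho_t, ell, h)   # d(rho)/dt minus the right-hand side

off_diag = ~np.eye(d, dtype=bool)
t_late = 50.0 / np.min(-lam.real[off_diag])      # long compared with the slowest decay
rho_late = rho0 * np.exp(lam * t_late)
born_probabilities = np.diag(rho0).real          # <alpha| rho(0) |alpha>
```

The residual vanishes to rounding error, and at late times only the diagonal elements ⟨α|ρ(0)|α⟩ survive, whatever the values chosen for `ell` and `h` (provided no two states are degenerate).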

Acknowledgments

I am grateful for correspondence with P. Pearle, and regarding the condition
for non-decreasing entropy, with H. Narnhofer, D. Reeb, and R. Werner.
This material is based upon work supported by the National Science Founda-
tion under Grant Number PHY-1316033 and with support from The Robert
A. Welch Foundation, Grant No. F-0014.

APPENDIX A: Derivation of Eqs. (10) and (11)


We start with the desired result, and work back to the problem it solves.
For a general matrix v, consider the quantity
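$$ \begin{aligned} R \equiv {} & -\frac{1}{2}\,\mathrm{Tr}\Big(\sum_n [v, L_n^\dagger]^\dagger\,[v, L_n^\dagger]\Big) - \frac{1}{2}\,\mathrm{Tr}\Big(v v^\dagger \sum_n \big(L_n^\dagger L_n - L_n L_n^\dagger\big)\Big) \\ & + i\,\mathrm{Im}\,\mathrm{Tr}\Big(\sum_n L_n\,v^\dagger\,[v, L_n^\dagger]\Big) - i\,\mathrm{Tr}\big(v^\dagger\,[H, v]\big) . \end{aligned} \tag{A.1} $$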

Expanding each term, this is


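$$ \begin{aligned} R = {} & -\frac{1}{2}\sum_n \mathrm{Tr}\,L_n v^\dagger v L_n^\dagger + \frac{1}{2}\sum_n \mathrm{Tr}\,v^\dagger L_n v L_n^\dagger + \frac{1}{2}\sum_n \mathrm{Tr}\,L_n v^\dagger L_n^\dagger v - \frac{1}{2}\sum_n \mathrm{Tr}\,v^\dagger L_n L_n^\dagger v \\ & - \frac{1}{2}\sum_n \mathrm{Tr}\,v v^\dagger L_n^\dagger L_n + \frac{1}{2}\sum_n \mathrm{Tr}\,v v^\dagger L_n L_n^\dagger \\ & + \frac{1}{2}\sum_n \mathrm{Tr}\,v^\dagger L_n v L_n^\dagger - \frac{1}{2}\sum_n \mathrm{Tr}\,L_n v^\dagger L_n^\dagger v \\ & - i\,\mathrm{Tr}\big(v^\dagger\,[H, v]\big) . \end{aligned} \tag{A.2} $$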
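The third and eighth terms cancel; the fourth and sixth terms cancel; the second
and seventh terms add to give the term $\mathrm{Tr}\,v^\dagger \sum_n L_n v L_n^\dagger$ in $\mathrm{Tr}\,v^\dagger \mathcal{L} v$; the
first and fifth terms give the terms $-\mathrm{Tr}\,v^\dagger v \sum_n L_n^\dagger L_n/2$ and $-\mathrm{Tr}\,v^\dagger \sum_n L_n^\dagger L_n v/2$
in $\mathrm{Tr}\,v^\dagger \mathcal{L} v$; and the last term gives the Hamiltonian term in $\mathrm{Tr}\,v^\dagger \mathcal{L} v$. We
conclude that
$$ R = \mathrm{Tr}\big(v^\dagger\,\mathcal{L} v\big) . \tag{A.3} $$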
The first two terms in (A.1) are real, while the last two are imaginary, so
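$$ \mathrm{Re}\,\mathrm{Tr}\big(v^\dagger \mathcal{L} v\big) = -\frac{1}{2}\,\mathrm{Tr}\Big(\sum_n [v, L_n^\dagger]^\dagger\,[v, L_n^\dagger]\Big) - \frac{1}{2}\,\mathrm{Tr}\Big(v v^\dagger \sum_n \big(L_n^\dagger L_n - L_n L_n^\dagger\big)\Big) , \tag{A.4} $$
$$ \mathrm{Im}\,\mathrm{Tr}\big(v^\dagger \mathcal{L} v\big) = \mathrm{Im}\,\mathrm{Tr}\Big(\sum_n L_n\,v^\dagger\,[v, L_n^\dagger]\Big) - \mathrm{Tr}\big(v^\dagger\,[H, v]\big) . \tag{A.5} $$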
Taking v to be one of the eigenmatrices vk of L, with Lvk = λk vk then gives
Eqs. (10) and (11).
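
The identities (A.4) and (A.5) hold for arbitrary v, Ln and Hermitian H, so they can be spot-checked with random matrices. The following sketch is such a check and nothing more; the dimension, the number of Ln and the helper names `dag`, `comm` and `superop` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

def random_matrix():
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

def dag(M):
    return M.conj().T

def comm(A, B):
    return A @ B - B @ A

A = random_matrix()
H = (A + dag(A)) / 2                 # Hermitian H
Ls = [random_matrix() for _ in range(2)]
v = random_matrix()                  # a general (non-Hermitian) matrix v

def superop(v):
    """The operator of Eq. (8) applied to v."""
    out = -1j * comm(H, v)
    for L in Ls:
        out += L @ v @ dag(L) - 0.5 * dag(L) @ L @ v - 0.5 * v @ dag(L) @ L
    return out

lhs = np.trace(dag(v) @ superop(v))

# Right-hand side of (A.4)
real_part = (-0.5 * sum(np.trace(dag(comm(v, dag(L))) @ comm(v, dag(L))) for L in Ls)
             - 0.5 * sum(np.trace(v @ dag(v) @ (dag(L) @ L - L @ dag(L))) for L in Ls)).real
# Right-hand side of (A.5); the Hamiltonian trace is real up to rounding
imag_part = (sum(np.trace(L @ dag(v) @ comm(v, dag(L))) for L in Ls).imag
             - np.trace(dag(v) @ comm(H, v)).real)
```

Agreement of `lhs.real` with `real_part` and of `lhs.imag` with `imag_part` for random inputs is a consistency check on the algebra, not a substitute for the derivation above.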

APPENDIX B: Derivation of Eqs. (19) and (20)

We try a solution of the Lindblad equation


$$ \rho(t) = \sum_{\alpha\beta} \Lambda_\alpha\,M\,\Lambda_\beta\,f_{\alpha\beta}(t) . \tag{B.1} $$

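With Ln and H given by Eqs. (15) and (16), the Lindblad equation (4)
becomes
$$ \sum_{\alpha\beta} \Lambda_\alpha\,M\,\Lambda_\beta\,\frac{d}{dt} f_{\alpha\beta}(t) = \sum_{\alpha\beta} \lambda_{\alpha\beta}\,\Lambda_\alpha\,M\,\Lambda_\beta\,f_{\alpha\beta}(t) , \tag{B.2} $$
where
$$ \lambda_{\alpha\beta} = C_{\alpha\beta} - \frac{1}{2} C_{\alpha\alpha} - \frac{1}{2} C_{\beta\beta} - i\big(h_\alpha - h_\beta\big) \tag{B.3} $$
and
$$ C_{\alpha\beta} = \sum_n \ell_{n\alpha}\,\ell^*_{n\beta} . \tag{B.4} $$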
This has an obvious solution of the same form as (19):
$$ f_{\alpha\beta}(t) = \exp\big(\lambda_{\alpha\beta}\,t\big)\,f_{\alpha\beta}(0) . \tag{B.5} $$

To get a more useful expression for λαβ , we note that

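$$ -\frac{1}{2}\sum_n \big|\ell_{n\alpha} - \ell_{n\beta}\big|^2 = -\frac{1}{2} C_{\alpha\alpha} - \frac{1}{2} C_{\beta\beta} + \mathrm{Re}\sum_n \ell_{n\alpha}\,\ell^*_{n\beta} = C_{\alpha\beta} - \frac{1}{2} C_{\alpha\alpha} - \frac{1}{2} C_{\beta\beta} - i\,\mathrm{Im}\sum_n \ell_{n\alpha}\,\ell^*_{n\beta} , $$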

so Eq. (B.3) is the same as Eq. (20).

———

1. N. Bohr, Nature 121, 580 (1928).

2. G. C. Ghirardi, A. Rimini, and T. Weber, Phys. Rev. D 34, 470
(1986); P. Pearle, Phys. Rev. A 39, 2277 (1989), and in Quantum
Theory: A Two-Time Success Story (Yakir Aharonov Festschrift),
eds. D. C. Struppa & J. M. Tollaksen (Springer, 2013), Chapter 9
[arXiv:1209.5082].

3. N. Gisin, Helv. Phys. Acta 62, 363 (1989); Phys. Lett. A 143, 1
(1990). This is discussed in a wider context by J. Polchinski, Phys.
Rev. Lett. 66, 397 (1991).

4. If any entangled density matrix for a compound system S ⊗ S consisting
of two isolated copies of a system S remains positive for a range
of future times if it is positive at an initial time, then the linear mapping
ρ(t) → ρ(t′) of the density matrix of S for t′ > t in this range
is completely positive, as shown by F. Benatti, R. Floreanini, and R.
Romano, J. Phys. A Math. Gen. 35, L551 (2002). For complete positivity
see W. F. Stinespring, Proc. Am. Math. Soc. 6, 211 (1955);
M. D. Choi, Canad. J. Math. 24, 520 (1972). For its implications,
see M. D. Choi, Linear Algebra and its Applications 10, 285 (1975).

5. G. Lindblad, Commun. Math. Phys. 48, 119 (1976); V. Gorini, A.
Kossakowski and E. C. G. Sudarshan, J. Math. Phys. 17, 821 (1976).
For a straightforward derivation, see P. Pearle, Eur. J. Phys. 33, 805
(2012).

6. F. Benatti and R. Narnhofer, Lett. Math. Phys. 15, 325 (1988).
(Their result, which applies for infinite as well as finite Hilbert spaces,
takes the form of an inequality. When limited to finite Hilbert spaces,
it is equivalent to the equality (5).) It was earlier shown by T. Banks,
M. Peskin, and L. Susskind, Nuclear Phys. B 244, 125 (1984), that a
sufficient (though not necessary) condition for non-decreasing entropy
is that the Ln are Hermitian. Of course, if the Ln are Hermitian then
Eq. (5) is automatically satisfied.

7. This solution was given in the second edition of S. Weinberg, Lectures
on Quantum Mechanics (Cambridge University Press, Cambridge, UK,
2015), Section 6.9, for the special case where all Ln are Hermitian.

8. This behavior is seen in several of the examples presented by Pearle
in ref. [5].

UTTG-10-16
arXiv:1610.02537v1 [quant-ph] 8 Oct 2016

Lindblad Decoherence in Atomic Clocks

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is shown how possible corrections to ordinary quantum mechanics described
by the Lindblad equation might be detected by exploiting the great
precision of atomic clocks.


∗ Electronic address: weinberg@physics.utexas.edu

In searching for an interpretation of quantum mechanics we seem to be
faced with nothing but bad choices[1]. To avoid both the dualism of the
Copenhagen interpretation and the endless creation of inconceivably many
branches of history of the many-worlds approach, while at the same time
holding on to a realist description of the evolution of physical states from
moment to moment, we may try to modify quantum mechanics so that dur-
ing measurement the density matrix of even an isolated system undergoes a
collapse of the sort called for by the Copenhagen interpretation. The idea is
that this collapse is rapid in systems containing macroscopic elements, such
as apparatus or physicists, while the corrections to quantum mechanics are
very small in purely microscopic systems such as atoms. A collapse of the
density matrix has already been described theoretically in interesting modi-
fications of quantum mechanics[2]. Here we wish to explore the possibility of
observing small departures from quantum mechanics by exploiting the great
precision of atomic clocks. Apart from this aim, the formalism developed
here may prove useful in describing limits on the precision of atomic clocks in
ordinary quantum mechanics due to their interaction with the environment.
First, a reminder of the time-dependence to be expected both in modified
versions of quantum mechanics and in open systems. To avoid instantaneous
communication at a distance[3], the density matrix at time t′ is assumed to
depend only on the density matrix at any earlier time t, but not otherwise
on the state vector at earlier times. Following the rules for composition
of probabilities, we take this to be a linear relation. We require that this
relation preserves the trace and the Hermiticity of the density matrix, and
satisfies a condition of complete positivity[4]. It is well-known that under
these assumptions the time-dependence of the density matrix is given by a
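first-order differential equation, the Lindblad equation[5]:
$$ \dot\rho(t) = -i\big[H, \rho(t)\big] + \sum_\alpha \Big[ L_\alpha\,\rho(t)\,L_\alpha^\dagger - \frac{1}{2} L_\alpha^\dagger L_\alpha\,\rho(t) - \frac{1}{2}\,\rho(t)\,L_\alpha^\dagger L_\alpha \Big] . \tag{1} $$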

We are here considering only a Hilbert space of finite dimensionality d, which
is adequate for the application we have in mind. (We ignore the translational
degree of freedom of atoms.) In Eq. (1), H is a d × d Hermitian matrix that
can be identified with the Hamiltonian of ordinary quantum mechanics, and
the sum runs over not more than d² − 1 matrices Lα , which represent the
departure from ordinary quantum mechanics. We use units with ħ = 1.
The form of Eq. (1) also assumes time-translation invariance, in which
case the relation between ρ(t) and ρ(t′) depends only on t′ − t, and the
matrices H and Lα are time-independent. This is of course not the case
throughout the history of atoms in an atomic clock, which are intermittently
exposed to external electromagnetic radiation. But atomic clocks rely on a
“Ramsey trick”[6], in which atoms are exposed to electromagnetic radiation
only in two relatively short bursts, separated by a much longer interval in
which they are free of external fields. It is this long time interval between
bursts that gives the atomic wave function a chance to get out of phase with
the electromagnetic wave, and so leads to the high precision with which
the frequency of the wave can be tuned to that of the atomic transition,
and it is also during this field-free period that the small effects due to the
corrections to ordinary quantum mechanics have a chance to build up. So
to deal with atomic clocks we shall first consider the field-free case, with
time-dependence prescribed by the time-independent equation (1), and then
return to the clocks.
We will simplify our task here by assuming (in agreement with observation)
that the states |m⟩ with which we have to deal are stable, aside
from radiative transitions that are slow enough to be ignored. We shall also
assume that Eq. (1) does not allow a decrease in the von Neumann entropy
−Tr(ρ ln ρ) for any ρ. It follows then that the stable states are eigenstates
of Lα , L†α , and H, with eigenvalues that we shall call ℓαm , ℓ∗αm , and Em .
Here is the proof[7]. If |m⟩ is stable then the right-hand side of Eq. (1)
must vanish if we take ρ to be the projection operator Λm = |m⟩⟨m| on such
a state. Multiplying this equation on the left with Λm and taking the trace,
with a little rearrangement we have
$$ 0 = \mathrm{Tr}\Big\{ \sum_\alpha [L_\alpha, \Lambda_m]^\dagger\,[L_\alpha, \Lambda_m] \Big\} + \mathrm{Tr}\Big\{ \Lambda_m \sum_\alpha \big( L_\alpha^\dagger L_\alpha - L_\alpha L_\alpha^\dagger \big) \Big\} . \tag{2} $$

The necessary and sufficient condition for the non-decrease of entropy is
the vanishing of the sum over α in the second term[8], leaving us here with
the vanishing of the first term, and hence with the vanishing of [Lα , Λm ]
for all α. The adjoint shows that also [L†α , Λm ] vanishes for all α. Then
the vanishing of the right-hand side of Eq. (1) where ρ = Λm requires also
that [H, Λm ] = 0. Letting these vanishing commutators act on |m⟩ shows
immediately that |m⟩ is an eigenstate of Lα , L†α , and H, as was to be proved.
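
The "little rearrangement" leading to Eq. (2) rests on an identity that holds for any Lα and any rank-one projector Λm: the two traces in Eq. (2) add up to −2 times the trace of Λm multiplied into the right-hand side of Eq. (1) evaluated at ρ = Λm. The sketch below checks this with random matrices; the dimension, the number of Lα and the helper names are illustrative assumptions, and the state m used here is deliberately not stable, so neither side vanishes.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4

def dag(M):
    return M.conj().T

def comm(A, B):
    return A @ B - B @ A

Ls = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(3)]
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + dag(A)) / 2                          # Hermitian H

m = rng.normal(size=d) + 1j * rng.normal(size=d)
m /= np.linalg.norm(m)
Lam = np.outer(m, m.conj())                   # projector |m><m|

# Right-hand side of Eq. (1) evaluated at rho = Lambda_m
rhs = -1j * comm(H, Lam)
for L in Ls:
    rhs += L @ Lam @ dag(L) - 0.5 * (dag(L) @ L @ Lam + Lam @ dag(L) @ L)

projected = np.trace(Lam @ rhs)               # multiply on the left by Lambda_m, take the trace
first = sum(np.trace(dag(comm(L, Lam)) @ comm(L, Lam)) for L in Ls)
second = sum(np.trace(Lam @ (dag(L) @ L - L @ dag(L))) for L in Ls)
```

For a stable state `projected` vanishes, which gives Eq. (2); since `first` is a sum of non-negative terms, it can then vanish only if every commutator [Lα, Λm] does.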
Note that we have not had to assume that the discrete stable states form
a complete set. Indeed, we only need to assume stability for the two states
involved in the clock transition. To digress a bit, if we had assumed that
the stable states form a complete set, then we could have concluded from
the above that these states would form a basis in which Lα , L†α , and H are
all diagonal, so that they would all commute with each other, and so energy
would be conserved. This is not a surprising conclusion, though it could not
have been reached here without our further assumption of non-decreasing entropy.
The conservation of energy by the Lindblad equation might raise problems
with locality and Lorentz invariance[9], though this is uncertain[10].
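In accordance with this theorem, Eq. (1) gives the density matrix the
time-dependence
$$ \rho_{mn}(t) \propto \exp\Big[ -i\big(E_m - E_n\big)t - \lambda_{mn}\,t \Big] , \tag{3} $$
where |m⟩ and |n⟩ are any two stable states, and
$$ \begin{aligned} \lambda_{mn} &= \sum_\alpha \Big[ \frac{1}{2}\big|\ell_{\alpha m}\big|^2 + \frac{1}{2}\big|\ell_{\alpha n}\big|^2 - \ell_{\alpha m}\,\ell^*_{\alpha n} \Big] \\ &= \sum_\alpha \Big[ -i\,\mathrm{Im}\big(\ell_{\alpha m}\,\ell^*_{\alpha n}\big) + \frac{1}{2}\big|\ell_{\alpha m} - \ell_{\alpha n}\big|^2 \Big] . \end{aligned} \tag{4} $$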

We note that Reλmn ≥ 0, so all elements of the density matrix decay except
for those with Reλmn = 0. Also, λmm = 0, so the diagonal elements ρmm (t),
and typically only the diagonal elements, are time-independent.
Now let us see what this implies for the tuning of the frequency of an
electromagnetic wave to the transition frequency Ee − Eg between stable
states |g⟩ and |e⟩ in an atomic clock. (The labels e and g are conventional,
standing for “excited state” and “ground state,” though g and e can be any
two stable states of the atom.) Each atom is exposed twice for periods each
lasting a relatively short time τ to an oscillating external electromagnetic
field, which adds to the Hamiltonian a term H′ exp(−iωt) + H′† exp(iωt),
and can drive the transition g → e when the real frequency ω is tuned to
a value near Ee − Eg . We will work with an “interaction picture” density
matrix $\rho^I_{mn}(t) \equiv \exp\big(i(E_m - E_n)t\big)\,\rho_{mn}(t)$. We assume that the exposure
period τ is short enough so that τ|λmn| ≪ 1, and hence during this period
changes in $\rho^I$ arise only from the oscillating external field. We make the
usual assumptions that τ|Ee − Em| ≫ 1 for m ≠ e, τ|Eg − Em| ≫ 1 for
m ≠ g and τ|ω| ≫ 1, which allows us to drop rapidly oscillating terms
in the equation for $\dot\rho^I$ and keep only those terms with time-dependence
proportional to exp(±i∆ωt), where ∆ω ≡ ω − Ee + Eg . We also suppose
that as usual in atomic clocks the frequency of the external field has been
tuned so that $|\Delta\omega| \ll |H'_{eg}|$, and hence the frequency of Rabi oscillations is
$\Omega/2 = |H'_{eg}|$. Under these assumptions, the density matrices at times t and
t + τ are related by

$$ \rho^I(t+\tau) = U(t+\tau, t)\,\rho^I(t)\,U^\dagger(t+\tau, t) , \tag{5} $$
where
$$ \begin{aligned} U_{ee}(t+\tau, t) &= U_{gg}(t+\tau, t) = \cos(\Omega\tau/2) , \\ U_{eg}(t+\tau, t) &= U^*_{ge}(t+\tau, t) = -i\,e^{i\Delta\omega\,t}\,\sin(\Omega\tau/2) . \end{aligned} \tag{6} $$
(We are choosing the relative phase of the states e and g so that $H'_{eg}$ is real
and positive, and hence equal to Ω/2.)


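If an atom starts at t = 0 in the pure state g, then at time t = τ its
density matrix $\rho^I(\tau) = U(\tau, 0)\,\rho^I(0)\,U^\dagger(\tau, 0)$ will have components
$$ \begin{aligned} \rho^I_{ee}(\tau) &= \sin^2(\Omega\tau/2) , \qquad \rho^I_{gg}(\tau) = \cos^2(\Omega\tau/2) , \\ \rho^I_{eg}(\tau) &= \rho^{I*}_{ge}(\tau) = i\,e^{-i\Delta\omega\,\tau}\,\cos(\Omega\tau/2)\,\sin(\Omega\tau/2) , \end{aligned} \tag{7} $$
which of course still represents a pure state.
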
Then for a Ramsey time T ≫ τ the atom travels through field-free space,
so the only time-dependence of the density matrix ρI in this period arises
from the Lindblad term in Eq. (1). In accordance with Eq. (3), the density
matrix at the end of this period is

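$$ \begin{aligned} \rho^I_{ee}(\tau+T) &= \sin^2(\Omega\tau/2) , \qquad \rho^I_{gg}(\tau+T) = \cos^2(\Omega\tau/2) , \\ \rho^I_{eg}(\tau+T) &= \rho^{I*}_{ge}(\tau+T) = i\,e^{-i\Delta\omega\,\tau}\,e^{-\lambda_{eg}T}\,\cos(\Omega\tau/2)\,\sin(\Omega\tau/2) . \end{aligned} \tag{8} $$
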
Then in a second period of duration τ the atom is again exposed to the
same external electromagnetic field, and the density matrix is changed to
$$ \rho^I(2\tau+T) = U(2\tau+T, \tau+T)\,\rho^I(\tau+T)\,U^\dagger(2\tau+T, \tau+T) . \tag{9} $$

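A straightforward calculation gives the probability Pe that the atom will
wind up in the excited state:
$$ P_e = \rho^I_{ee}(2\tau+T) = \frac{1}{2}\,\sin^2(\Omega\tau)\,\Big[ 1 + e^{-\Gamma T}\cos\Big( \big(\omega - E_e + E_g - \mathcal{E}\big)T \Big) \Big] , \tag{10} $$
where we write λeg = Γ − iℰ with Γ and ℰ real, and hence according to
Eq. (4),
$$ \Gamma = \frac{1}{2}\sum_\alpha \big|\ell_{\alpha g} - \ell_{\alpha e}\big|^2 , \tag{11} $$
$$ \mathcal{E} = -\sum_\alpha \mathrm{Im}\big\{ \ell_{\alpha g}\,\ell^*_{\alpha e} \big\} . \tag{12} $$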
In using atomic clocks the excitation probability Pe is measured as a function
of frequency ω by repeating the observation of the fraction of atoms excited
for various chosen frequencies ω. Then ω is tuned to maximize Pe , so that
ω will then normally be expected to equal the reference frequency Ee − Eg
within an uncertainty of order 1/T .
If there were corrections to ordinary quantum mechanics in Eq. (10) with
Γ of order 1/T or greater, the shape of the curve of Pe versus ω would be
grossly altered. For instance, for ΓT = 1, the ratio of the minimum value
of Pe to its maximum value would be 0.46 instead of zero, and the ratio
of the value of Pe where it is most rapidly varying with frequency to its
maximum value would be 0.73 instead of 0.5. Seeing such a departure from
expectations would be a good sign of a departure from ordinary quantum
mechanics. A change in the form of Pe versus ω this drastic would generally
have been seen in atomic clocks and has not been seen[11], so it seems safe to
conclude that Γ is less than the values of 1/T encountered in atomic clocks.
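
The ratios quoted above for ΓT = 1 follow directly from Eq. (10): the minimum-to-maximum ratio is (1 − e^(−ΓT))/(1 + e^(−ΓT)), and at the point of steepest slope the cosine vanishes, giving 1/(1 + e^(−ΓT)) of the maximum. The sketch below evaluates them on a grid; the function names, the grid size and the choice Ωτ = π/2 are illustrative assumptions.

```python
import numpy as np

def excitation_probability(detuning, T, gamma, rabi_angle=np.pi / 2):
    """Eq. (10); detuning = omega - (E_e - E_g) - shift, rabi_angle = Omega * tau."""
    return 0.5 * np.sin(rabi_angle) ** 2 * (1.0 + np.exp(-gamma * T) * np.cos(detuning * T))

def fringe_ratios(gamma_T, T=1.0, points=20001):
    """Return (minimum / maximum, value at steepest slope / maximum) over one fringe."""
    detuning = np.linspace(-np.pi / T, np.pi / T, points)
    P = excitation_probability(detuning, T, gamma_T / T)
    steepest = np.argmax(np.abs(np.gradient(P, detuning)))
    return P.min() / P.max(), P[steepest] / P.max()

ratios_standard = fringe_ratios(0.0)   # ordinary quantum mechanics: about (0, 0.5)
ratios_modified = fringe_ratios(1.0)   # Gamma * T = 1: about (0.46, 0.73)
```

Both ratios depend only on the product ΓT, so the same function can be used to see how quickly the fringe contrast degrades as ΓT grows.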
Unfortunately we have no idea what target value of Γ we should
aim at, or even how Γ might vary from one transition to another. We can
distinguish two extreme cases.
If Γ has similar values for all transitions, then we should look at clocks
for which the Ramsey time T is as long as possible. Modern atomic clocks
typically have T of the order of seconds, but a clock[12] using a
microwave-frequency transition in trapped ¹⁷¹Yb⁺ ions has operated with T > 600
seconds. Hence we can conclude that in this transition Γ < 10⁻¹⁸ eV. This
upper limit shows that environmental effects make it hopeless to look for
departures from quantum mechanics on macroscopic scales, where the energy
of interaction with the environment is presumably always much greater
than 10⁻¹⁸ eV. On the other hand, this upper bound is enormous compared
with the difference between energies of discrete states of macroscopic objects
that are free from all external influences. For instance, according to
quantum mechanics, the spacing between successive energy eigenstates of a
pointer of mass one gram and length one centimeter that swivels freely in
two dimensions is about 10⁻⁴² eV. Thus departures from ordinary quantum
mechanics with Γ less than the limit 10⁻¹⁸ eV derived from atomic clocks
might still have a powerful effect on the quantum states of macroscopic
systems if they could somehow be isolated from their environment.
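
The two orders of magnitude quoted here can be recovered with a few lines of arithmetic. The sketch below restores ħ and converts 1/T to an energy; for the pointer it assumes, purely for illustration, a thin rod pivoted at one end (moment of inertia mL²/3) and takes ħ²/2I as the scale of the level spacing of a planar rotor. The function names are hypothetical.

```python
HBAR_EV_S = 6.582119569e-16    # hbar in eV s (CODATA 2018)
HBAR_J_S = 1.054571817e-34     # hbar in J s (CODATA 2018)
EV_IN_J = 1.602176634e-19      # 1 eV in joules (exact in the SI)

def gamma_bound_ev(ramsey_time_s):
    """The rate 1/T expressed as an energy hbar/T in eV."""
    return HBAR_EV_S / ramsey_time_s

def planar_rotor_spacing_ev(mass_kg, length_m):
    """Scale hbar^2 / (2 I) of the level spacing of a planar rotor.
    Assumption: thin rod pivoted at one end, I = m L^2 / 3."""
    moment = mass_kg * length_m ** 2 / 3.0
    return HBAR_J_S ** 2 / (2.0 * moment) / EV_IN_J

bound = gamma_bound_ev(600.0)                    # T = 600 s
spacing = planar_rotor_spacing_ev(1e-3, 1e-2)    # one gram, one centimeter
```

A different assumed geometry changes the moment of inertia by a factor of order one, which does not affect the comparison between the two scales.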
If instead Γ somehow scaled with the transition frequency Ee − Eg , then
we would want to set a limit on Γ/(Ee − Eg ), rather than on Γ itself. For this
purpose it would be more useful to look at clocks for which the fractional
imprecision 1/[T (Ee − Eg )] is as small as possible. For optical clocks with T
of the order of a second this is 10⁻¹⁵, but a clock using ²⁷Al⁺ ions achieved
a value about 3 × 10⁻¹⁷ [13], so we can conclude that at least for these
transitions, Γ/(Ee − Eg ) < 3 × 10⁻¹⁷.
In addition to a change in the shape of the curve of Pe versus ω, Eq. (10)
also entails a shift in the frequency at the maximum value of Pe , from
Ee − Eg to Ee − Eg + ℰ. Detecting this frequency shift is impossible in
a two-state system if we do not have independent information about the
uncorrected frequency Ee − Eg . The prospects are brighter if it is possible to
drive transitions among three different energy levels, because actual energy
differences trivially obey the relation
(E1 − E2 ) + (E2 − E3 ) + (E3 − E1 ) = 0 ,
while there is no reason to expect the frequency shifts ℰij to obey the
corresponding relation
ℰ12 + ℰ23 + ℰ31 = 0 .
It remains to be seen if there is a three-level system suitable for this purpose.
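
A small numerical example shows why the shifts need not close. In the sketch below the shift for a pair of levels (i, j) is taken as the sum over α of Im(ℓαi ℓ∗αj), which is the form suggested by Eqs. (4) and (12) up to an overall sign convention that does not affect the closure test. The eigenvalues in `ell_complex` and `ell_real`, the energies, and the function names are made-up illustrative values.

```python
import numpy as np

def shift(ell, i, j):
    """Frequency shift for the pair (i, j): sum over alpha of Im(ell[alpha, i] * conj(ell[alpha, j]))."""
    return float(np.sum((ell[:, i] * ell[:, j].conj()).imag))

def closure(ell):
    """Sum of the shifts around the cycle 1 -> 2 -> 3 -> 1 (levels indexed 0, 1, 2)."""
    return shift(ell, 0, 1) + shift(ell, 1, 2) + shift(ell, 2, 0)

energies = np.array([0.0, 1.3, 2.9])             # arbitrary illustrative levels
energy_closure = ((energies[0] - energies[1]) + (energies[1] - energies[2])
                  + (energies[2] - energies[0]))

ell_complex = np.array([[1.0, 1.0j, -1.0]])      # one L_alpha with complex eigenvalues
ell_real = np.array([[0.4, -1.1, 2.0]])          # Hermitian L_alpha: real eigenvalues

closure_complex = closure(ell_complex)           # non-zero in general
closure_real = closure(ell_real)                 # each shift vanishes separately
```

For Hermitian Lα every shift is zero, so a non-closing triple of measured frequencies would signal Lα with complex eigenvalues on the stable states.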

I am grateful for helpful conversations about atomic clocks with Mark
Raizen and David Wineland. This material is based upon work supported
by the National Science Foundation under Grant Number PHY-1620610 and
with support from The Robert A. Welch Foundation, Grant No. F-0014.

———

1. The author’s views on this issue are set out in detail in Section 3.7 of
S. Weinberg, Lectures on Quantum Mechanics, 2nd ed. (Cambridge
University Press, Cambridge, U. K., 2015).
2. G. C. Ghirardi, A. Rimini, and T. Weber, Phys. Rev. D 34, 470
(1986); P. Pearle, Phys. Rev. A 39, 2277 (1989), and in Quan-
tum Theory: A Two-Time Success Story (Yakir Aharonov Festschrift),
eds. D. C. Struppa & J. M. Tollaksen (Springer, 2013), Chapter 9.
[arXiv:1209.5082]; G. J. Milburn, Phys. Rev. A 44, 5401 (1991); I. C.
Percival, Proc. Roy. Soc. London A, 447, 189 (1994). For a review,
see A. Bassi and G. C. Ghirardi, Physics Reports 379, 257 (2003).

3. N. Gisin, Helv. Phys. Acta 62, 363 (1989); Phys. Lett. A 143, 1
(1990). This is discussed in a wider context by J. Polchinski, Phys.
Rev. Lett. 66, 397 (1991).

4. W. F. Stinespring, Proc. Am. Math. Soc. 6, 211 (1955). For a review,
see F. Benatti and R. Floreanini, Int. J. Mod. Phys. B 19, 3063 (2005)
[arXiv:quant-ph/0507271].

5. G. Lindblad, Commun. Math. Phys. 48, 119 (1976); V. Gorini, A.
Kossakowski and E. C. G. Sudarshan, J. Math. Phys. 17, 821 (1976).
For a lucid derivation of the Lindblad equation see P. Pearle,
arXiv:1204.2016.

6. N. F. Ramsey, Phys. Rev. 76, 996 (1949).

7. The proof follows along the same lines as in S. Weinberg, Phys. Rev.
A 93, 032124 (2016).

8. F. Benatti and R. Narnhofer, Lett. Math. Phys. 15, 325 (1988).

9. T. Banks, M. E. Peskin, and L. Susskind, Nucl. Phys. B244, 125
(1984); M. Srednicki, Nucl. Phys. B410, 143 (1993).

10. W. G. Unruh and R. M. Wald, Phys. Rev. D 52, 2176 (1995).

11. D. J. Wineland, private communication.

12. P. T. H. Fink, M. J. Sellars, M. A. Lawn, C. Coles, A. G. Mann, and
D. G. Blair, IEEE Transactions on Instrumentation and Measurement
44, 113 (1995).

13. C. W. Chou, D. B. Hume, M. J. Thorpe, D. J. Wineland, and T.
Rosenband, Phys. Rev. Lett. 106, 160801 (2011).

UTTG-27-16
arXiv:1801.00386v1 [astro-ph.CO] 1 Jan 2018

Gravitational Waves in Cold Dark Matter

Raphael Flauger*
Department of Physics, University of California, San Diego
La Jolla, CA, 92093

Steven Weinberg**
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

We study the effects of cold dark matter on the propagation of gravitational waves of astrophysi-
cal and primordial origin. We show that the dominant effect of cold dark matter on gravitational
waves from astrophysical sources is a small frequency dependent modification of the propagation
speed of gravitational waves. However, the magnitude of the effect is too small to be detected
in the near future. We furthermore show that the spectrum of primordial gravitational waves
in principle contains detailed information about the properties of dark matter. However, de-
pending on the wavelength, the effects are either suppressed because the dark matter is highly
non-relativistic or because it contributes a small fraction of the energy density of the universe.
As a consequence, the effects of cold dark matter on primordial gravitational waves in practice
also appear too small to be detectable.

* Electronic address: flauger@physics.ucsd.edu
** Electronic address: weinberg@physics.utexas.edu

I. Introduction

The direct observation [1] of gravitational waves from distant sources immediately heightened
interest in the propagation of these waves from source to detector. Calabrese, Battaglia, and
Spergel [2] considered the future use of gravitational wave source counts as a probe of grav-
itational wave propagation. They did not assume any specific model for intervening matter,
supposing instead that by some mechanism the wave intensity falls off as a power of distance.
In contrast, Goswami, Mohanty, and Prasanna [3] considered the intervening matter to be an
imperfect fluid, using an old result of Hawking [4], that the intensity of a gravitational wave
falls off in an imperfect fluid at a rate 16πGη, where η is the viscosity. They set an upper limit
on η by adopting the estimate of Ref. [1], that the source is at a distance of 410 Mpc. This limit
would be valid if the source distance really were 410 Mpc, but the source distance was estimated
in [1] from the observed signal strength, under the assumption that the gravitational wave is
not damped. The observations in [1] do not rule out a viscosity greater than the upper bound
given in Ref. [3]; if the viscosity were greater, it would just mean that the distance to the source
is less than 410 Mpc. In order to use the observed intensity of detected gravitational waves to
set an upper limit on the viscosity, we would need an independent measure of the distance of
the source, other than the intensity of the gravitational wave.
But even so, a fundamental question would remain: Is it reasonable to calculate the effect of
cosmic matter on the propagation of gravitational waves by treating this matter as an imperfect
fluid? It is clear that the treatment of a gas as a fluid, perfect or imperfect, must break down
at some sufficiently small collision frequency. The coefficients of viscosity and heat conduction
in the theory of imperfect fluids are proportional to the mean free path, and so would become
infinite for zero collision frequency, which is absurd. The issue whether a particular medium can
be treated as an imperfect fluid, characterized by coefficients of viscosity and heat conduction,
depends on the scales of distance and time of the process under study. As argued briefly in
Section III, in the propagation of a gravitational wave through some medium, collisions are
effective only if the mean free path in the medium is smaller than the wavelength. This is
certainly not the case for observed gravitational waves. The observed wavelengths are in the
range of 300 to 15000 km, and there is nothing in interstellar space with free paths that short.
(For hydrogen atoms in our galaxy, with cross sections of the order of a square Angstrom and
a density of the order of $1\ \mathrm{cm}^{-3}$, the mean free path is of order $10^{11}$ km. The mean free path
of warm ionized gas is somewhat shorter, about $5 \times 10^{7}$ km, but still much longer than the
observed wavelengths. Mean free paths are of course longer outside galaxies, and longer for
WIMPs everywhere.) The wavelength of observed gravitational waves is so much smaller than
interstellar and intergalactic mean free paths that it is more appropriate to treat cosmic matter

as collisionless than as a fluid, perfect or imperfect. For this reason, and also with an eye to
possible cosmological applications, this paper will explore the effect on a gravitational wave of
its passage through cold dark matter.
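
The mean-free-path estimate quoted above for hydrogen can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes the elementary kinetic-theory relation l = 1/(n σ), which the text does not spell out, and the function and variable names are hypothetical.

```python
# Order-of-magnitude check of the hydrogen mean free path quoted in the text.
# Assumption (not stated explicitly in the paper): l = 1 / (n * sigma).

CM_PER_KM = 1.0e5
ANGSTROM_IN_CM = 1.0e-8


def mean_free_path_km(number_density_per_cm3, cross_section_cm2):
    """Mean free path l = 1 / (n * sigma), converted from cm to km."""
    return 1.0 / (number_density_per_cm3 * cross_section_cm2) / CM_PER_KM


# Values from the text: sigma ~ 1 square Angstrom, n ~ 1 per cubic cm.
hydrogen_path_km = mean_free_path_km(1.0, ANGSTROM_IN_CM ** 2)

# Longest observed wavelength quoted in the text.
longest_wavelength_km = 15000.0
ratio = hydrogen_path_km / longest_wavelength_km

print(f"mean free path ~ {hydrogen_path_km:.1e} km")
print(f"mean free path / wavelength ~ {ratio:.1e}")
```

With these inputs the mean free path exceeds the longest observed wavelength by more than six orders of magnitude, which is the sense in which the medium is treated as collisionless.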
The general formalism for calculating the effect of collisionless neutrinos on gravitational
waves has already been laid out in [5]. The perturbation of the neutrinos due to the gravitational
wave was calculated using the collisionless Boltzmann equation; the result of this calculation
was then used to evaluate the effect of the perturbation back on the wave. This formalism
was applied in [5] to cosmological gravitational waves in the radiation-dominated era, in which
case the effects were found to be substantial. Here we are instead concerned with the effects of
massive particles. Our calculations will follow the same track as in Ref. [5], but the presence of
non-zero mass will make them somewhat more complicated.
In Sections II through V we develop the general formalism for calculating those aspects of
the effects of massive collisionless particles on gravitational radiation that are relevant to both
astrophysical and cosmological sources. After stating our assumptions in Section II, a general
result for the anisotropic inertia in the presence of massive collisionless matter is given in Section
III for a general Robertson-Walker scale factor a(t). In Section IV we apply these results to the
case of non-relativistic matter, and give the gravitational wave equation in this case. Section V
deals with the special case, relevant to both astrophysical and cosmological sources, of a wave
frequency much larger than the rate of cosmic expansion.
We then consider specific applications. In Section VI we evaluate the effect of intervening
dark matter on the gravitational waves whose detection was reported in [1]. It will be a surprise
to no one that the effect turns out to be much too small to be observed. In Section VII we turn
to the calculation of the effects of cold dark matter on primordial gravitational waves. Because
primordial gravitational waves with wavelengths accessible at interferometers enter the horizon
before kinetic decoupling of the dark matter or even when the dark matter is still relativistic,
in this section we extend our discussion to include the effects of collisions. We show that the
spectrum of primordial gravitational waves in principle contains valuable information about the
dark matter like the temperature of kinetic decoupling and the nature of the interactions of dark
matter particles. Unfortunately, the effects appear too small to be detectable in the foreseeable
future. We summarize our findings in Section VIII.
Added note: After our paper was nearly finished, we encountered a recent paper [6] that
covers much the same ground as ours regarding gravitational waves from astrophysical sources,
finding as we have that damping of these waves is negligible. In addition to damping, our
discussion pays close attention to the modification of the propagation speed of these waves
in cold dark matter, and includes a detailed treatment of the effects of cold dark matter on
primordial gravitational waves, which is not considered in [6].

II. Assumptions

We consider gravitational waves in transverse-traceless gauge in a spatially flat Robertson–Walker
background, so that the spacetime line element takes the form*

$$d\tau^2 = dt^2 - g_{ij}(\mathbf{x},t)\,dx^i\,dx^j , \tag{1}$$

with

$$g_{ij}(\mathbf{x},t) = a^2(t)\left[\delta_{ij} + h_{ij}(\mathbf{x},t)\right] , \tag{2}$$

where $|h_{ij}| \ll 1$ and

$$h_{ii} = 0 , \qquad \frac{\partial h_{ij}}{\partial x^i} = 0 . \tag{3}$$

Since the background Robertson-Walker metric is invariant under time-independent coordinate-space
translations, we can restrict our attention to superpositions of plane waves with space-dependence

$$h_{ij}(\mathbf{x},t) \propto e^{i\mathbf{q}\cdot\mathbf{x}} , \tag{4}$$

where $\mathbf{q}$ is a time-independent co-moving wave number.


As is well known, the propagation of the wave represented by $h_{ij}$ is governed by the wave
equation

$$\ddot h_{ij} + \frac{3\dot a}{a}\,\dot h_{ij} + \frac{q^2}{a^2}\,h_{ij} = 16\pi G\,\pi_{ij} , \tag{5}$$

where $q^2 \equiv q_i q_i$, and $\pi_{ij}$ is the anisotropic part of the spatial components of the
energy-momentum tensor $T^\mu{}_\nu$:

$$T^i{}_j(\mathbf{x},t) = \pi_{ij}(\mathbf{x},t) + \delta_{ij}\ \text{terms} , \qquad \pi_{ii}(\mathbf{x},t) = 0 . \tag{6}$$

We assume that the wave passes through a medium consisting of collisionless particles of mass
$m \neq 0$, with an isotropic unperturbed coordinate-space density $4\pi p^2\,dp\,n(p)$ of particles with
$\sqrt{p_i p_i}$ between $p$ and $p + dp$. In particular, our treatment will not include the more familiar
effect of gravitational lensing of the gravitational waves by intrinsic density perturbations in the
dark matter distribution. Our first task is then to calculate πij (x, t). The general result for
collisionless dark matter found in the following section is given below in Eq. (22). Collisions are
included in Section VII.
* We take i, j, k, etc. to run over the spatial coordinate indices 1, 2, 3; repeated indices are summed; and we
set the speed of light equal to unity.

III. Calculation of $\pi_{ij}$

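For a line element of the general form (1) the four-momentum of a particle of rest-mass $m$ is

$$p^\mu = m\,\frac{dx^\mu}{d\tau} , \tag{7}$$

so

$$\frac{dx^i}{dt} = p^i/p^0 , \tag{8}$$

and

$$p^0 = \sqrt{m^2 + g_{ij}\,p^i p^j} . \tag{9}$$

It turns out that the covariant components $p_i$ satisfy a simpler equation of motion than the
contravariant components:

$$\frac{dp_i}{dt} = \frac{d}{dt}\left(g_{ij}\,p^j\right) = \frac{\partial g_{ij}}{\partial t}\,p^j + \frac{\partial g_{ij}}{\partial x^k}\,\frac{p^k p^j}{p^0} - g_{ij}\,\Gamma^j_{\mu\nu}\,\frac{p^\mu p^\nu}{p^0} ,$$

and therefore for any metric of form (1)

$$\frac{dp_i}{dt} = \frac{1}{2}\,\frac{\partial g_{kl}}{\partial x^i}\,\frac{p^k p^l}{p^0} . \tag{10}$$

With the spatial components of the metric of the form (2), this is

$$\frac{dp_i}{dt} = \frac{a^2}{2}\,\frac{\partial h_{kl}}{\partial x^i}\,\frac{p^k p^l}{p^0} = \frac{i a^2 q_i}{2}\,h_{kl}\,\frac{p^k p^l}{p^0} , \tag{11}$$

so the changes in the covariant components are of first order in the perturbation $h_{ij}$.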
For a gas of such particles with $n(\mathbf{p},\mathbf{x},t)\prod_i dp_i \prod_i dx^i$ particles in a momentum-space volume
$\prod_i dp_i$ around $\mathbf{p}$ and in a coordinate-space volume $\prod_i dx^i$ around $\mathbf{x}$, the space-components of
the energy-momentum tensor are

$$T^i{}_j(\mathbf{x},t) = \frac{1}{\sqrt{\mathrm{Det}\,g(\mathbf{x},t)}} \int d^3p\; n(\mathbf{p},\mathbf{x},t)\,\frac{p^i(\mathbf{p},\mathbf{x},t)\,p_j}{p^0(\mathbf{p},\mathbf{x},t)} , \tag{12}$$

where $d^3p \equiv \prod_i dp_i$. The phase space density $n$ is subject to the collisionless Boltzmann equation,
which according to Eqs. (8) and (11) takes the form

$$0 = \frac{\partial n}{\partial t} + \frac{p^i}{p^0}\,\frac{\partial n}{\partial x^i} + \frac{i a^2 q_i}{2}\,h_{kl}\,\frac{p^k p^l}{p^0}\,\frac{\partial n}{\partial p_i} . \tag{13}$$

We assume that in the absence of the gravitational wave represented by $h_{ij}$ the density $n$ is some
function $n\big(\sqrt{p_i p_i}\big)$, which is a trivial solution of Eq. (13) for $h_{ij} = 0$. As an initial condition,

we suppose that at some initial time $t_1$ the density in the presence of $h_{ij}$ is the same in locally
Cartesian spatial coordinate frames:

$$n(\mathbf{p},\mathbf{x},t_1) = n\!\left(a(t_1)\sqrt{g^{ij}(\mathbf{x},t_1)\,p_i p_j}\right) . \tag{14}$$

To first order in $h_{ij}$, this is

$$n(\mathbf{p},\mathbf{x},t_1) = n(p) - \frac{1}{2}\,n'(p)\,h_{ij}(\mathbf{x},t_1)\,p_i p_j/p , \tag{15}$$

where again $p \equiv \sqrt{p_i p_i}$. At any later time $t$ there is a dynamical correction $\delta n$ induced by the
gravitational wave, so that

$$n(\mathbf{p},\mathbf{x},t) = n(p) - \frac{1}{2}\,n'(p)\,h_{ij}(\mathbf{x},t)\,p_i p_j/p + \delta n(\mathbf{p},\mathbf{x},t) , \tag{16}$$

with initial value $\delta n(\mathbf{p},\mathbf{x},t_1) = 0$. Since $\partial n/\partial x^i$ is already of first order in $h_{ij}$, in Eq. (13) we
can use the zeroth order expressions for $p^i$ and $p^0$:

$$p^i = a^{-2}\,p_i , \qquad p^0 = \sqrt{m^2 + p^2/a^2} .$$

Like all other first-order perturbations, $\delta n$ has a space-dependence $\delta n \propto \exp(i q_i x^i)$. The first-order
terms in Eq. (13) then give

$$\frac{\partial\,\delta n(\mathbf{p},\mathbf{x},t)}{\partial t} + \frac{i q_i p_i}{a^2(t)\sqrt{m^2 + p^2/a^2(t)}}\,\delta n(\mathbf{p},\mathbf{x},t) = \frac{p_k p_l\,n'(p)}{2p}\,\dot h_{kl}(\mathbf{x},t) . \tag{17}$$
We return to this in detail in Section VII, but let us pause at this point and consider the effect
of collisions. In general, collisions will drive the phase-space distribution back to the equilibrium
form (14), for which $\delta n = 0$, so their effect can be simulated in Eq. (17) by adding a term $-\Gamma\,\delta n$
to the right-hand side, where $\Gamma$ is the decay rate of departures from equilibrium in the absence
of field perturbations. Collisions can be ignored if this term is much less than the transport term
on the left-hand side of Eq. (17), that is, if $\Gamma \ll v/\lambda$, where $v = p/\big(a\sqrt{m^2 + p^2/a^2}\big)$ is a typical
proper velocity and $\lambda \approx a/q$ is the proper wavelength. The decay rate $\Gamma$ varies inversely as the
mean free path $\ell$, so on dimensional grounds we expect that $\Gamma \approx v/\ell$. Hence the condition for
neglecting collisions is that $\ell \gg \lambda$. As remarked in Section I, this condition is well satisfied for
detected gravitational waves.
Returning now to the collisionless Boltzmann equation (17), the solution is

$$\delta n(\mathbf{p},\mathbf{x},t) = \frac{p_k p_l\,n'(p)}{2p} \int_{t_1}^{t} dt'\, \exp\left[-\int_{t'}^{t} dt''\,\frac{i q_i p_i}{a^2(t'')\sqrt{m^2 + p^2/a^2(t'')}}\right] \dot h_{kl}(\mathbf{x},t') . \tag{18}$$

In calculating the space components (12) of the energy-momentum tensor, we use the first-order
expressions

$$p^i = a^{-2}\left[p_i - h_{ik}\,p_k\right] , \qquad p^0 = \sqrt{m^2 + p^2/a^2} - \frac{h_{kl}\,p_k p_l}{2a^2\sqrt{m^2 + p^2/a^2}} , \tag{19}$$

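and Eqs. (16) and (18). To first order in $h_{ij}$ the spatial components of the energy-momentum
tensor are then

$$\begin{aligned}
T^i{}_j(\mathbf{x},t) ={}& \frac{1}{a^5(t)} \int d^3p\; n(p) \left[ \frac{p_i p_j}{\sqrt{m^2 + p^2/a^2(t)}} - \frac{h_{ik}(\mathbf{x},t)\,p_k p_j}{\sqrt{m^2 + p^2/a^2(t)}} + \frac{p_i p_j p_k p_l\,h_{kl}(\mathbf{x},t)}{2a^2(t)\left(m^2 + p^2/a^2(t)\right)^{3/2}} \right] \\
&- \frac{1}{2a^5(t)} \int d^3p\; n'(p)\,\frac{p_i p_j p_k p_l\,h_{kl}(\mathbf{x},t)}{p\,\sqrt{m^2 + p^2/a^2(t)}} \\
&+ \frac{1}{a^5(t)} \int d^3p\; n'(p)\,\frac{p_i p_j p_k p_l}{2p\,\sqrt{m^2 + p^2/a^2(t)}} \int_{t_1}^{t} dt'\,\dot h_{kl}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q_i p_i}{a^2(t'')\sqrt{m^2 + p^2/a^2(t'')}}\right] .
\end{aligned} \tag{20}$$

The next-to-last term of Eq. (20) can be calculated by setting $n'(p)\,p_i/p = \partial n(p)/\partial p_i$ and
integrating by parts in momentum space. In this way we find that all the terms in Eq. (20)
cancel, except for a term proportional to $\delta_{ij}$ and the last term in Eq. (20):

$$\begin{aligned}
T^i{}_j(\mathbf{x},t) ={}& \frac{1}{a^5(t)} \int d^3p\; n'(p)\,\frac{p_i p_j p_k p_l}{2p\,\sqrt{m^2 + p^2/a^2(t)}} \int_{t_1}^{t} dt'\,\dot h_{kl}(\mathbf{x},t')\,\exp\left[-i\int_{t'}^{t} dt''\,\frac{q_i p_i}{a^2(t'')\sqrt{m^2 + p^2/a^2(t'')}}\right] \\
&+ \delta_{ij}\ \text{terms} .
\end{aligned} \tag{21}$$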

The momentum space volume element in Eq. (21) may be written as $d^3p = p^2\,dp\,dz\,d\varphi$, where
$z = q_i p_i/qp$ is the cosine of the angle between the wave vector $\mathbf{q}$ and the momentum $\mathbf{p}$, and $\varphi$
is the azimuthal angle of the momentum around the wave vector. The integral of $p_i p_j p_k p_l/p^4$
over $\varphi$ must take the form of a linear combination of symmetric terms formed from Kronecker
deltas and $\hat q \equiv \mathbf{q}/q$, with coefficients that depend only on $z$:

$$\begin{aligned}
\int_0^{2\pi} d\varphi\; p_i p_j p_k p_l/p^4 ={}& A(z)\,\hat q_i \hat q_j \hat q_k \hat q_l \\
&+ B(z)\left[\hat q_i \hat q_j \delta_{kl} + \hat q_i \hat q_k \delta_{jl} + \hat q_i \hat q_l \delta_{jk} + \hat q_j \hat q_k \delta_{il} + \hat q_j \hat q_l \delta_{ik} + \hat q_k \hat q_l \delta_{ij}\right] \\
&+ C(z)\left[\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}\right] .
\end{aligned}$$

Because $h_{kl}$ is transverse and traceless, terms proportional to $\hat q_k$ or $\hat q_l$ or $\delta_{kl}$ do not contribute in
Eq. (21), so all we need is $C(z)$, which by taking various contractions is easily calculated to be
$C(z) = \pi(1 - z^2)^2/4$. Discarding terms proportional to $\delta_{ij}$, Eq. (21) finally gives the anisotropic

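stress tensor for collisionless particles

$$\begin{aligned}
\pi_{ij}(\mathbf{x},t) ={}& \frac{\pi}{4a^5(t)} \int_0^\infty p^5\,dp\;\frac{n'(p)}{\sqrt{m^2 + p^2/a^2(t)}} \int_{-1}^{+1} (1 - z^2)^2\,dz \\
&\times \int_{t_1}^{t} dt'\,\dot h_{ij}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q p z}{a^2(t'')\sqrt{m^2 + p^2/a^2(t'')}}\right] .
\end{aligned} \tag{22}$$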

This is traceless and transverse because hij is.
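
The value $C(z) = \pi(1 - z^2)^2/4$ used in Eq. (22) can be checked numerically by doing the azimuthal integral directly. The sketch below is an illustration, not part of the paper: it assumes a frame with the unit wave vector along the third axis, so that the A(z) and B(z) terms drop out of components built only from the first two axes, and all function names are hypothetical.

```python
import math


def azimuthal_integral(i, j, k, l, z, n_phi=64):
    """Integrate p_i p_j p_k p_l / p^4 over the azimuth phi in [0, 2*pi).

    Assumed frame: q-hat along axis index 2, so that
    p/|p| = (s cos(phi), s sin(phi), z) with s = sqrt(1 - z^2).
    A uniform midpoint sum is exact for these low-order trigonometric polynomials.
    """
    s = math.sqrt(1.0 - z * z)
    total = 0.0
    for m in range(n_phi):
        phi = 2.0 * math.pi * (m + 0.5) / n_phi
        p_hat = (s * math.cos(phi), s * math.sin(phi), z)
        total += p_hat[i] * p_hat[j] * p_hat[k] * p_hat[l]
    return total * 2.0 * math.pi / n_phi


def c_expected(z):
    """C(z) = pi (1 - z^2)^2 / 4, as quoted in the text."""
    return math.pi * (1.0 - z * z) ** 2 / 4.0


for z in (-0.9, 0.0, 0.3, 0.8):
    # Component (1,1,2,2): only delta_ij delta_kl survives, giving C(z).
    c_1122 = azimuthal_integral(0, 0, 1, 1, z)
    # Component (1,1,1,1): all three delta products contribute, giving 3 C(z).
    c_1111 = azimuthal_integral(0, 0, 0, 0, z)
    print(z, c_1122, c_expected(z), c_1111 / 3.0)
```

Both components reproduce the same C(z), which is the consistency one expects from the symmetric tensor decomposition.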


As a check on Eq. (22), let’s briefly consider the special case of massless collisionless particles
such as neutrinos, or at any rate particles that have $p/a(t) \gg m$ during the period of interest.
Here Eq. (22) becomes

$$\pi_{ij}(\mathbf{x},t) = \frac{\pi}{4a^4(t)} \int_0^\infty p^4\,dp\;n'(p) \int_{-1}^{+1} (1 - z^2)^2\,dz \int_{t_1}^{t} dt'\,\dot h_{ij}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q z}{a(t'')}\right] .$$

The argument of the exponential does not depend on $p$, so if we integrate over $p$ by parts we
have

$$\pi_{ij}(\mathbf{x},t) = -\frac{\pi}{a^4(t)} \int_0^\infty p^3\,dp\;n(p) \int_{-1}^{+1} (1 - z^2)^2\,dz \int_{t_1}^{t} dt'\,\dot h_{ij}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q z}{a(t'')}\right] .$$

To zeroth order in $h_{ij}$, the proper volume of a coordinate space volume $d^3x$ is $a^3\,d^3x$, and the
energy of a massless particle is given by Eq. (9) as $p^0 = a^{-1}\sqrt{p_i p_i} = a^{-1}p$, so the total energy
per proper volume is

$$\rho(t) = \int d^3p\; n(p)\,p/a^4(t) = 4\pi \int_0^\infty p^3\,dp\;n(p)/a^4(t) .$$

For $m = 0$ Eq. (22) therefore gives:

$$\pi_{ij}(\mathbf{x},t) = -\frac{\rho(t)}{4} \int_{-1}^{+1} (1 - z^2)^2\,dz \int_{t_1}^{t} dt'\,\dot h_{ij}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q z}{a(t'')}\right] ,$$

which is the same result as given for neutrinos by Eqs. (16) and (17) of reference [5].

IV. Non-relativistic matter

For a general non-zero particle mass m, our result (22) for πij is much more complicated than
for m = 0. We can regain some of the simplicity of the zero mass case by specializing to the
opposite limit, of non-relativistic matter. We will now assume (as is likely for dark matter) that
the matter through which the gravitational wave passes is non-relativistic, in the sense that n(p)
is non-negligible only for p small enough so that

$$p/a(t') \ll m , \tag{23}$$

over the whole time $t'$ from emission of the gravitational wave at $t' = t_1$ to direct or indirect
detection of the gravitational wave at $t' = t$. Then Eq. (22) becomes

$$\begin{aligned}
\pi_{ij}(\mathbf{x},t) ={}& \frac{\pi}{4a^5(t)\,m} \int_0^\infty p^5\,dp\;n'(p) \int_{-1}^{+1} (1 - z^2)^2\,dz \\
&\times \int_{t_1}^{t} dt'\,\dot h_{ij}(\mathbf{x},t')\,\exp\left[-i\,(p/m)\,z \int_{t'}^{t} \frac{q\,dt''}{a^2(t'')}\right] .
\end{aligned} \tag{24}$$

If the dark matter particles move less than the wavelength of the mode between $t'' = t_1$ and
$t'' = t$, the argument of the exponential in Eq. (24) is small. The integral over $t'$ is then trivial;
the integral of $(1 - z^2)^2$ over $z$ just gives a factor $16/15$; and the integral over $p$ can be done by
parts, so that

$$\pi_{ij}(\mathbf{x},t) = -\frac{2E}{3a^5(t)}\Big[h_{ij}(\mathbf{x},t) - h_{ij}(\mathbf{x},t_1)\Big] , \tag{25}$$

where

$$E \equiv \int_0^\infty 4\pi p^2\,n(p)\,dp \times \frac{p^2}{2m} . \tag{26}$$

(Note that $E/a^5(t)$ is the proper kinetic energy density at time $t$.) The wave equation (5) can
thus be written as

$$\ddot h_{ij}(\mathbf{x},t) + 3\,\frac{\dot a(t)}{a(t)}\,\dot h_{ij}(\mathbf{x},t) + \omega^2(t)\,h_{ij}(\mathbf{x},t) = \frac{32\pi G E}{3a^5(t)}\,h_{ij}(\mathbf{x},t_1) , \tag{27}$$

where (see footnote 1)

$$\omega^2(t) \equiv \frac{q^2}{a^2(t)} + \frac{32\pi G E}{3a^5(t)} . \tag{28}$$
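
The numerical factors leading from Eq. (24) to Eq. (25) can be tracked with exact rational arithmetic. The following is a bookkeeping sketch only: it assumes that the boundary terms in the integration by parts vanish, and it measures both the coefficient of the bracket in Eq. (25) and E in units of π J/m, where J stands for the integral of p⁴ n(p) over p (a shorthand introduced here, not in the paper).

```python
from fractions import Fraction


def integral_of_one_minus_z2_squared():
    """Exact integral of (1 - z^2)^2 = 1 - 2 z^2 + z^4 over z in [-1, 1]."""
    monomials = {0: 1, 2: -2, 4: 1}  # power of z -> coefficient
    # The integral of z^n over [-1, 1] is 2 / (n + 1) for even n.
    return sum(Fraction(c) * Fraction(2, power + 1) for power, c in monomials.items())


z_factor = integral_of_one_minus_z2_squared()

# Integration by parts (boundary terms assumed to vanish):
#   int p^5 n'(p) dp = -5 int p^4 n(p) dp = -5 J
parts_factor = Fraction(-5)

# Coefficient of [h(t) - h(t1)] from Eq. (24), in units of pi * J / (m a^5).
pi_ij_coefficient = Fraction(1, 4) * z_factor * parts_factor

# E from Eq. (26): E = (4 pi / 2m) J = 2 pi J / m, in the same units.
E_coefficient = Fraction(2)

ratio = pi_ij_coefficient / E_coefficient
print(z_factor, ratio)
```

The ratio comes out as -2/3, matching the prefactor of Eq. (25), and multiplying by 16πG reproduces the 32πGE/3 appearing in Eqs. (27) and (28).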
In general, matters are more complicated. The non-relativistic assumption (23) does not
automatically allow us to set the argument of the exponential in Eq. (24) equal to zero. Even
non-relativistic particles will travel a distance large compared to the wavelength if given enough
time, making the argument of the exponential in Eq. (24) much larger than unity. We will see
in Section VI that this is likely the case for the gravitational waves reported in [1]. However,
under the non-relativistic assumption the rate of oscillation of the exponential in Eq. (24) is much
smaller than the rate of oscillation of $h_{ij}$, which is of order $q/a$. So we can take the $t'$-derivative in
Eq. (24) to act on the whole integrand of the integral over $t'$:

$$\dot h_{ij}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q p z}{m\,a^2(t'')}\right] \simeq \frac{\partial}{\partial t'}\left( h_{ij}(\mathbf{x},t')\,\exp\left[-\int_{t'}^{t} dt''\,\frac{i q p z}{m\,a^2(t'')}\right] \right) . \tag{29}$$
1. Notice that the modification of the dispersion relation comes with definite sign, and that the phase velocity
is greater than the speed of light so that there can be no gravitational Cherenkov radiation.

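The integral over $t'$ is then trivial, and we find

$$\begin{aligned}
\pi_{ij}(\mathbf{x},t) \simeq{}& \frac{\pi}{4a^5(t)\,m} \int_0^\infty p^5\,dp\;n'(p) \int_{-1}^{+1} (1 - z^2)^2\,dz \\
&\times \left[ h_{ij}(\mathbf{x},t) - h_{ij}(\mathbf{x},t_1)\,\exp\left(-i \int_{t_1}^{t} dt''\,\frac{q p z}{m\,a^2(t'')}\right) \right] .
\end{aligned} \tag{30}$$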

To see what sort of error is introduced in this approximation, consider for a moment a case in
which the original $t'$ integral can be done explicitly for general mass without the approximation
(29). Suppose that $a(t)$ is a constant, which can be taken as $a(t) = 1$, and suppose that the
gravitational wave has a simple-harmonic time-dependence

$$h_{ij}(\mathbf{x},t) = C_{ij}\,\exp\big(i\mathbf{q}\cdot\mathbf{x}\big)\,\exp\big(\pm i\omega(t - t_1)\big) ,$$

with $C_{ij}$ constant, and $\omega$ a constant frequency, of order $q$. The integral over $t'$ in Eq. (24) is
then straightforward:

$$\begin{aligned}
\pi_{ij}(\mathbf{x},t) ={}& \frac{\pi}{4} \int_0^\infty p^4\,v\,dp\;n'(p) \int_{-1}^{+1} (1 - z^2)^2\,dz \\
&\times C_{ij}\,\exp\big(i\mathbf{q}\cdot\mathbf{x}\big)\,\frac{\omega}{\omega \mp vzq}\,\Big[ \exp\big(\pm i\omega(t - t_1)\big) - \exp\big(-iqvz(t - t_1)\big) \Big] ,
\end{aligned}$$

where $v \equiv p/m$. Comparison with Eq. (30) shows that in this case, the approximation (29) just
amounts to supposing that $v$ is small enough to allow us to replace the factor $\omega/(\omega \mp vzq)$ with
unity.
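Coming back to Eq. (30), the wave equation (5) may now be written

$$\ddot h_{ij}(\mathbf{x},t) + 3\,\frac{\dot a(t)}{a(t)}\,\dot h_{ij}(\mathbf{x},t) + \omega^2(t)\,h_{ij}(\mathbf{x},t) = S_{ij}(\mathbf{x},t) , \tag{31}$$

where again

$$\omega^2(t) = \frac{q^2}{a^2(t)} + \frac{32\pi G E}{3a^5(t)} , \qquad E \equiv \int_0^\infty 4\pi p^2\,n(p)\,dp \times \frac{p^2}{2m} , \tag{32}$$

and $S_{ij}$ is $16\pi G$ times the second term in $\pi_{ij}$:

$$S_{ij}(\mathbf{x},t) \equiv -h_{ij}(\mathbf{x},t_1)\,\frac{4\pi^2 G}{a^5(t)\,m} \int_0^\infty p^5\,dp\;n'(p) \int_{-1}^{+1} dz\,(1 - z^2)^2\,\exp\left[-i \int_{t_1}^{t} dt''\,\frac{q p z}{m\,a^2(t'')}\right] . \tag{33}$$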

We write the wave equation in this form because the right-hand side $S_{ij}$ is a transient that goes to
zero exponentially with increasing $t$ after the dark matter particles have traveled a distance larger
than the wavelength of the mode. More concretely, if for some $t_2$ we have $(qp/m)\int_{t_1}^{t_2} dt''/a^2(t'') \gg 1$,
then for any smooth density function $n(p)$ of $p$, $S_{ij}$ becomes exponentially small for $t > t_2$.

To illustrate this, let us take $n(p)$ to have the Maxwell-Boltzmann form

$$n(p) = N \exp(-p^2/2P^2) , \tag{34}$$

with $N$ and $P$ any positive constants. The $z$ and $p$ integrals are then straightforward, and we
find that the wave equation (31) takes the form

$$\ddot h_{ij}(\mathbf{x},t) + 3\,\frac{\dot a(t)}{a(t)}\,\dot h_{ij}(\mathbf{x},t) + \omega^2(t)\,h_{ij}(\mathbf{x},t) = h_{ij}(\mathbf{x},t_1)\,\frac{32\pi G E}{3a^5(t)}\,\exp\left[-\frac{v^2}{2}\left(\int_{t_1}^{t} \frac{q\,dt'}{a^2(t')}\right)^{2}\right] , \tag{35}$$

where $E$ is again given by Eq. (26), and $v^2 = P^2/m^2$ is the mean square coordinate velocity for
the distribution (34). Our assumption that $v^2/a^2(t'') \ll 1$ makes the argument of the exponential
in Eq. (35) negligible in the case of few oscillations, so that in this case the wave equation (35)
agrees with our earlier result (27), and we can take Eq. (27) as a fair approximation to the
wave equation for all times. But $S_{ij}(\mathbf{x},t)$ is exponentially small for late times when the dark
matter particles have traveled far compared to the wavelength of the mode and the number of
oscillations becomes so large that

$$\sqrt{v^2} \int_{t_1}^{t} \frac{q\,dt'}{a^2(t')} \gg 1 .$$

At these late times, the memory of the gravitational field at the time of emission in the
distribution of momenta is erased, and the wave equation (35) simply becomes

$$\ddot h_{ij}(\mathbf{x},t) + 3\,\frac{\dot a(t)}{a(t)}\,\dot h_{ij}(\mathbf{x},t) + \omega^2(t)\,h_{ij}(\mathbf{x},t) = 0 . \tag{36}$$

But to find the coefficients of the two independent solutions of the homogeneous equation (36)
we need to use the inhomogeneous wave equation, Eq. (35).
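
The Gaussian decay of the source term for the Maxwell-Boltzmann distribution can be checked by brute-force quadrature. The sketch below is illustrative and rests on stated assumptions: it works with the dimensionless double integral over p and z from Eq. (33), writes X for the combination (q/m) ∫ dt''/a² (a shorthand introduced here), keeps only the cosine part of the exponential because the sine part is odd in z, and compares with the closed form implied by Eq. (35), namely -(16/3) J exp(-P²X²/2) with J the integral of p⁴ n(p). The Simpson helper and all names are hypothetical.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def transient_integral(X, P=1.0, N=1.0):
    """I(X) = int_0^inf p^5 n'(p) dp int_{-1}^{1} (1 - z^2)^2 cos(p X z) dz
    for n(p) = N exp(-p^2 / 2 P^2); the upper p limit is truncated at 12 P."""
    def n_prime(p):
        return -N * p / P ** 2 * math.exp(-p * p / (2.0 * P * P))

    def z_integral(p):
        return simpson(lambda z: (1.0 - z * z) ** 2 * math.cos(p * X * z), -1.0, 1.0, 200)

    return simpson(lambda p: p ** 5 * n_prime(p) * z_integral(p), 0.0, 12.0 * P, 600)


def closed_form(X, P=1.0, N=1.0):
    """-(16/3) J exp(-P^2 X^2 / 2), with J = int_0^inf p^4 n(p) dp = 3 sqrt(pi/2) N P^5."""
    J = N * 3.0 * math.sqrt(math.pi / 2.0) * P ** 5
    return -(16.0 / 3.0) * J * math.exp(-((P * X) ** 2) / 2.0)


for X in (0.0, 0.5, 1.0, 2.0):
    print(X, transient_integral(X), closed_form(X))
```

Since v = P/m, the exponent P²X²/2 is the same as the exponent in Eq. (35), so agreement between the two columns is a direct check of the transient's Gaussian form.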

V. Short Wavelengths

It is not possible to find analytic solutions of either Eq. (35) or Eq. (36) for an arbitrary time-
dependence of the Robertson–Walker scale factor a(t). But we can find solutions when the
frequency ω(t) is much larger than the fractional time-dependence H(t) = ȧ(t)/a(t) of the scale
factor, and hence also much larger than the fractional time-dependence of ω(t) itself. This of
course includes the case of constant a(t), which is a good approximation for the gravitational
waves reported in [1], and to which we shall return in Section VI.
In the short-wavelength case, the familiar WKB approximation (neglecting second time
derivatives of the coefficients of the cosine or sine) yields approximate solutions of the
homogeneous equation (36), with time-dependence

$$a^{-3/2}(t)\,\omega^{-1/2}(t) \times \begin{cases} \cos\left[\int^{t} \omega(t')\,dt'\right] \\[4pt] \sin\left[\int^{t} \omega(t')\,dt'\right] \end{cases} .$$

Knowing these homogeneous solutions, it is easy to construct a Green’s function that allows us
to solve the inhomogeneous equation (35),

$$G(t,t') \equiv \frac{a^{3/2}(t')\,\omega^{-1/2}(t')}{a^{3/2}(t)\,\omega^{1/2}(t)}\,\sin\left[\int_{t'}^{t} \omega(t'')\,dt''\right] \theta(t - t') ,$$

for which, within the WKB approximation,

$$\left[\frac{d^2}{dt^2} + 3\,\frac{\dot a(t)}{a(t)}\,\frac{d}{dt} + \omega^2(t)\right] G(t,t') = \delta(t - t') .$$
The general solution of Eq. (35) is therefore

$$\begin{aligned}
h_{ij}(\mathbf{x},t) ={}& h^{(0)}_{ij}(\mathbf{x},t) \\
&+ \frac{32\pi G E}{3}\,h_{ij}(\mathbf{x},t_1) \int_{t_\star}^{t} \frac{dt'}{a^5(t')\,\omega(t')}\,\frac{a^{3/2}(t')\,\omega^{1/2}(t')}{a^{3/2}(t)\,\omega^{1/2}(t)}\,\sin\left[\int_{t'}^{t} \omega(t'')\,dt''\right] \\
&\qquad \times \exp\left[-\frac{v^2}{2}\left(\int_{t_1}^{t'} \frac{q\,dt''}{a^2(t'')}\right)^{2}\right] ,
\end{aligned} \tag{37}$$

where $h^{(0)}_{ij}(\mathbf{x},t)$ is some solution of the homogeneous equation (36). The lower bound $t_\star$ on the
integral over $t'$ is arbitrary, because the difference in the integral between two possible choices
of $t_\star$ is a solution of the homogeneous equation (36), and so far $h^{(0)}$ is an arbitrary solution of
the homogeneous equation. The one condition that must be satisfied by $t_\star$ is that the WKB
approximation must be valid from $t_\star$ to $t$. This may or may not allow us to choose $t_\star = t_1$,
depending on the context. Whatever we choose for $t_\star$, the inhomogeneous term in Eq. (37) and
its first time-derivative both vanish for $t = t_\star$, so the homogeneous term by itself must satisfy
the initial conditions at $t = t_\star$, and therefore takes the form

$$h^{(0)}_{ij}(\mathbf{x},t) = \frac{a^{3/2}(t_\star)\,\omega^{1/2}(t_\star)}{a^{3/2}(t)\,\omega^{1/2}(t)} \left[ h_{ij}(\mathbf{x},t_\star)\,\cos\left(\int_{t_\star}^{t} \omega(t')\,dt'\right) + \dot h_{ij}(\mathbf{x},t_\star)\,\omega^{-1}(t_\star)\,\sin\left(\int_{t_\star}^{t} \omega(t')\,dt'\right) \right] . \tag{38}$$

We are now in a position to evaluate the coefficients of the solutions of the homogeneous
equation after many oscillations. We write the argument of the sine in Eq. (37) as

$$\int_{t'}^{t} \omega(t'')\,dt'' = \int_{t_\star}^{t} \omega(t'')\,dt'' - \int_{t_\star}^{t'} \omega(t'')\,dt'' .$$

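Then Eqs. (37) and (38) become

$$\begin{aligned}
h_{ij}(\mathbf{x},t) = \frac{a^{3/2}(t_\star)\,\omega^{1/2}(t_\star)}{a^{3/2}(t)\,\omega^{1/2}(t)} \Bigg[ & \cos\left(\int_{t_\star}^{t} \omega(t'')\,dt''\right) \Big( h_{ij}(\mathbf{x},t_\star) + A(t)\,h_{ij}(\mathbf{x},t_1) \Big) \\
&+ \sin\left(\int_{t_\star}^{t} \omega(t'')\,dt''\right) \Big( \omega^{-1}(t_\star)\,\dot h_{ij}(\mathbf{x},t_\star) + B(t)\,h_{ij}(\mathbf{x},t_1) \Big) \Bigg] ,
\end{aligned} \tag{39}$$

where

$$\begin{aligned}
A(t) &= -\frac{32\pi G E}{3} \int_{t_\star}^{t} \frac{dt'}{a^5(t')}\,\frac{a^{3/2}(t')\,\omega^{-1/2}(t')}{a^{3/2}(t_\star)\,\omega^{1/2}(t_\star)}\,\sin\left[\int_{t_\star}^{t'} \omega(t'')\,dt''\right] \exp\left[-\frac{v^2}{2}\left(\int_{t_1}^{t'} \frac{q\,dt''}{a^2(t'')}\right)^{2}\right] , \\
B(t) &= \frac{32\pi G E}{3} \int_{t_\star}^{t} \frac{dt'}{a^5(t')}\,\frac{a^{3/2}(t')\,\omega^{-1/2}(t')}{a^{3/2}(t_\star)\,\omega^{1/2}(t_\star)}\,\cos\left[\int_{t_\star}^{t'} \omega(t'')\,dt''\right] \exp\left[-\frac{v^2}{2}\left(\int_{t_1}^{t'} \frac{q\,dt''}{a^2(t'')}\right)^{2}\right] .
\end{aligned} \tag{40}$$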

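If at some time $t' = t_2$ the argument of the exponentials in Eq. (40) becomes much larger than
unity, the integrals over $t'$ are effectively cut off for $t' > t_2$, and $A(t)$ and $B(t)$ approach finite
$t$-independent values for $t > t_2$. The solution (39) then becomes a linear combination of solutions
of the homogeneous equation:

$$\begin{aligned}
h_{ij}(\mathbf{x},t) \to \frac{a^{3/2}(t_\star)\,\omega^{1/2}(t_\star)}{a^{3/2}(t)\,\omega^{1/2}(t)} \Bigg[ & \cos\left(\int_{t_\star}^{t} \omega(t'')\,dt''\right) \Big( h_{ij}(\mathbf{x},t_\star) + A(\infty)\,h_{ij}(\mathbf{x},t_1) \Big) \\
&+ \sin\left(\int_{t_\star}^{t} \omega(t'')\,dt''\right) \Big( \omega^{-1}(t_\star)\,\dot h_{ij}(\mathbf{x},t_\star) + B(\infty)\,h_{ij}(\mathbf{x},t_1) \Big) \Bigg] .
\end{aligned} \tag{41}$$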

VI. Observed Gravitational Waves

As a first application of our results for $m \neq 0$, let us consider the effect of intervening dark
matter on observed gravitational waves [1], believed to be produced by coalescing black holes.
Since the source of these waves is at a fairly small redshift z < 0.1, we can greatly simplify our
calculations by taking the Robertson–Walker scale factor a(t) to be constant during the time
elapsed from production to detection of the waves. Without loss of generality we can normalize
our spatial coordinates so that a(t) = 1.

For $a(t) = 1$, the gravitational wave equation (35) in the presence of collisionless
non-relativistic matter here takes the form

$$\ddot h_{ij}(\mathbf{x},t) + \omega^2\,h_{ij}(\mathbf{x},t) = \frac{32\pi G E}{3}\,h_{ij}(\mathbf{x},t_1)\,\exp\left[-\frac{v^2 q^2 (t - t_1)^2}{2}\right] , \tag{42}$$

where now the frequency (28) is a constant

$$\omega^2 = q^2 + \Omega^2 , \qquad \Omega^2 = \frac{32\pi G E}{3} , \tag{43}$$

and $E$ is the proper density of kinetic energy.
With $a(t)$ constant we can use the results of the previous section, with no need for the WKB
approximation. Since we are not relying here on the WKB approximation, there is no obstacle
to taking the lower bound $t_\star$ in Eqs. (39) and (40) to be equal to the emission time $t_1$. The
solution (39) of Eq. (42) is now exact, and takes the form

$$\begin{aligned}
h_{ij}(\mathbf{x},t) ={}& \cos\big(\omega(t - t_1)\big)\,\big(1 + A(t)\big)\,h_{ij}(\mathbf{x},t_1) \\
&+ \sin\big(\omega(t - t_1)\big)\,\Big( \omega^{-1}\,\dot h_{ij}(\mathbf{x},t_1) + B(t)\,h_{ij}(\mathbf{x},t_1) \Big) ,
\end{aligned} \tag{44}$$

where

$$A(t) = -\frac{32\pi G E}{3\omega} \int_{t_1}^{t} dt'\,\sin\big(\omega(t' - t_1)\big)\,\exp\left[-\frac{v^2 q^2 (t' - t_1)^2}{2}\right] , \tag{45}$$

$$B(t) = \frac{32\pi G E}{3\omega} \int_{t_1}^{t} dt'\,\cos\big(\omega(t' - t_1)\big)\,\exp\left[-\frac{v^2 q^2 (t' - t_1)^2}{2}\right] . \tag{46}$$
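
As a sanity check on the constant-a solution, one can integrate Eq. (42) numerically and compare with Eqs. (44) to (46). The sketch below is an illustration under stated assumptions: time is measured from the emission time (τ = t - t₁), `Omega2` stands for 32πGE/3, `s` stands for the product vq, the parameter values are arbitrary and deliberately exaggerated, and a hand-written fourth-order Runge-Kutta integrator is used (a choice made here, not something from the paper).

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def h_formula(tau, omega, Omega2, s, h1, hdot1):
    """Eq. (44) with A and B from Eqs. (45), (46); tau = t - t1, s = v q."""
    damp = lambda u: math.exp(-0.5 * (s * u) ** 2)
    A = -(Omega2 / omega) * simpson(lambda u: math.sin(omega * u) * damp(u), 0.0, tau)
    B = (Omega2 / omega) * simpson(lambda u: math.cos(omega * u) * damp(u), 0.0, tau)
    return (math.cos(omega * tau) * (1.0 + A) * h1
            + math.sin(omega * tau) * (hdot1 / omega + B * h1))


def h_rk4(tau, omega, Omega2, s, h1, hdot1, steps=20000):
    """Integrate h'' + omega^2 h = Omega2 h1 exp(-s^2 tau^2 / 2) with classical RK4."""
    def rhs(t, h, hd):
        return hd, -omega ** 2 * h + Omega2 * h1 * math.exp(-0.5 * (s * t) ** 2)

    dt = tau / steps
    t, h, hd = 0.0, h1, hdot1
    for _ in range(steps):
        k1h, k1v = rhs(t, h, hd)
        k2h, k2v = rhs(t + dt / 2, h + dt / 2 * k1h, hd + dt / 2 * k1v)
        k3h, k3v = rhs(t + dt / 2, h + dt / 2 * k2h, hd + dt / 2 * k2v)
        k4h, k4v = rhs(t + dt, h + dt * k3h, hd + dt * k3v)
        h += dt / 6 * (k1h + 2 * k2h + 2 * k3h + k4h)
        hd += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return h


# Arbitrary illustrative parameters (not physical values).
params = dict(omega=2.0, Omega2=0.5, s=0.3, h1=1.0, hdot1=0.4)
print(h_formula(10.0, **params), h_rk4(10.0, **params))
```

The two numbers should agree to the accuracy of the integrators, which confirms that Eq. (44) satisfies both the differential equation and the initial conditions at the emission time.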

The gravitational waves with the lowest observed frequencies have wavelength about 15000 km,
so if their source is at a distance 410 Mpc (see footnote 2), the quantity $q(t - t_1)$ is of order $5 \times 10^{18}$. Hence
the argument of the exponentials in Eqs. (45) and (46) is already much larger than unity even
for $t'$ much less than $t$, provided that the rms velocity of the dark matter is much larger than
$2 \times 10^{-19}\,c$, which we shall assume to be the case. In this case the dark matter particles travel
a distance long compared to the wavelength of the gravitational wave, and the exponentials in
Eqs. (45) and (46) therefore cut off the integrals already for $t'$ much less than $t$, and we can take
$t = \infty$ in $A(t)$ and $B(t)$. The integral for $B(\infty)$ is easy:

$$B(\infty) = \frac{32\pi G E}{3\omega q}\,\sqrt{\frac{\pi}{2v^2}}\,\exp\left(-\frac{\omega^2}{2v^2 q^2}\right) . \tag{47}$$
2. The values here correspond to those in reference [1] because much of the paper was written shortly after the
discovery of gravitational waves. The conclusions remain the same for the more recent observations of gravitational
wave events.

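The integral for $A(\infty)$ is more complicated. It can be expressed in terms of a confluent
hypergeometric function of the first kind

$$A(\infty) = -\frac{32\pi G E}{3 q^2 v^2}\,\exp\left(-\frac{\omega^2}{2v^2 q^2}\right)\,{}_1F_1\!\left(\frac{1}{2}, \frac{3}{2}, \frac{\omega^2}{2v^2 q^2}\right) , \tag{48}$$

with [7]

$${}_1F_1\!\left(\frac{1}{2}, \frac{3}{2}, z\right) = 2^{-3/2} \int_{-1}^{1} (1 + t)^{-1/2}\,\exp\big(z(1 + t)/2\big)\,dt . \tag{49}$$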
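
The closed forms (47) and (48) can be compared against direct quadrature of Eqs. (45) and (46). The sketch below is illustrative, with several assumptions labelled in the comments: `Omega2` stands for 32πGE/3 and is set to one, `s` stands for vq, the infinite upper limit is truncated at 12/s, the hypergeometric function is summed from its power series (using the fact that the ratio of Pochhammer symbols (1/2)_k/(3/2)_k equals 1/(2k+1)), and the substitution 1 + t = 2u² is used to rewrite the integral in Eq. (49) as the integral of exp(z u²) over u from 0 to 1. All function names are hypothetical.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def hyp1f1_half_three_halves(z, tol=1e-16, max_terms=10000):
    """1F1(1/2; 3/2; z) = sum_k z^k / ((2k + 1) k!), summed term by term."""
    term = 1.0   # z^k / k!
    total = 1.0  # k = 0 contribution
    for k in range(1, max_terms):
        term *= z / k
        contribution = term / (2 * k + 1)
        total += contribution
        if abs(contribution) < tol * abs(total):
            break
    return total


def hyp1f1_from_eq49(z, n=2000):
    """Eq. (49) after the substitution 1 + t = 2 u^2: int_0^1 exp(z u^2) du."""
    return simpson(lambda u: math.exp(z * u * u), 0.0, 1.0, n)


def A_inf_closed(omega, s, Omega2=1.0):
    z = omega ** 2 / (2.0 * s ** 2)
    return -(Omega2 / s ** 2) * math.exp(-z) * hyp1f1_half_three_halves(z)


def B_inf_closed(omega, s, Omega2=1.0):
    z = omega ** 2 / (2.0 * s ** 2)
    return (Omega2 / (omega * s)) * math.sqrt(math.pi / 2.0) * math.exp(-z)


def A_inf_numeric(omega, s, Omega2=1.0, n=20000):
    f = lambda u: math.sin(omega * u) * math.exp(-0.5 * (s * u) ** 2)
    return -(Omega2 / omega) * simpson(f, 0.0, 12.0 / s, n)


def B_inf_numeric(omega, s, Omega2=1.0, n=20000):
    f = lambda u: math.cos(omega * u) * math.exp(-0.5 * (s * u) ** 2)
    return (Omega2 / omega) * simpson(f, 0.0, 12.0 / s, n)


print(A_inf_closed(1.0, 0.7), A_inf_numeric(1.0, 0.7))
print(B_inf_closed(1.0, 0.7), B_inf_numeric(1.0, 0.7))
# Small-velocity limit: A approaches -Omega2/omega^2 and B becomes negligible.
print(A_inf_closed(1.0, 0.05), B_inf_closed(1.0, 0.05))
```

The last line illustrates numerically the small-velocity behaviour of the two coefficients.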

Of particular interest is the limit $v^2 \to 0$, with $\omega/q$ of order unity. In this limit $B(\infty)$ is
exponentially small, while $A(\infty) \to -\Omega^2/\omega^2$, a result that can be obtained more simply by
writing $\sin\omega(t' - t_1)$ in Eq. (45) as $-(1/\omega)(d/dt')\cos\omega(t' - t_1)$ and integrating by parts. In this
limit Eq. (44) becomes

$$h_{ij}(\mathbf{x},t) = \cos\big(\omega(t - t_1)\big)\left(1 - \frac{\Omega^2}{\omega^2}\right) h_{ij}(\mathbf{x},t_1) + \omega^{-1}\,\sin\big(\omega(t - t_1)\big)\,\dot h_{ij}(\mathbf{x},t_1) . \tag{50}$$

One effect of the modified relation (43) between $q$ and $\omega$ is a frequency-dependence of the
group velocity

$$v_g = \frac{\partial\omega}{\partial q} = \sqrt{1 - \Omega^2/\omega^2} .$$
After the gravitational wave has traveled for a distance D, two components of the wave of
different frequency will arrive at times separated by ∆t = D∆(1/vg ). In addition to the shift
in frequency, the presence of the correction term proportional to Ω2 /ω 2 in the relation (50)
between the observed gravitational wave and the initial conditions leads to some distortion of
the gravitational waveform.
But if dark matter is composed of WIMPs, these effects are extremely small. Even if we
were to suppose that dark matter particles have moderate velocities, and dominate the cosmic
energy density $\rho_0$, the quantity $\Omega$ would be no greater than $H_0 = \sqrt{8\pi G\rho_0/3}$, which of course
is tiny compared with $\omega$ for observed gravitational waves, so $\Omega^2/\omega^2$ is negligible. The correction
to the group velocity has a larger effect, but one that is still very small. After the gravitational
wave has traveled for a distance $D$, two components of the wave with frequency differing by $\Delta\omega$
will arrive at times separated by

$$\Delta t = \frac{D\,\Omega^2}{2}\,\Delta\!\left(\frac{1}{\omega^2}\right) ,$$

which even for D of order 1/H0 and ∆ω of order ω is less than the period 2π/ω of the oscil-
lation by a factor of order H0 /ω. It appears that WIMPs can have no detectable effect on the
gravitational waves observed from sources at moderate redshift.
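
The orders of magnitude quoted in this section can be reproduced with rough numbers. The sketch below is an estimate only: the wavelength and distance are the values given in the text, while the Hubble constant of 70 km/s/Mpc, the conversion factors, and the choice to set Ω equal to its upper bound H₀ are assumptions made here for illustration.

```python
import math

# Assumed constants (illustrative values, not taken from the paper).
C_KM_PER_S = 299792.458
KM_PER_MPC = 3.0857e19
H0_KM_S_MPC = 70.0

# Values quoted in the text.
wavelength_km = 15000.0
distance_mpc = 410.0

# Phase accumulated between emission and detection: q (t - t1) = 2 pi D / lambda.
q_times_D = 2.0 * math.pi * distance_mpc * KM_PER_MPC / wavelength_km

# Velocity (in units of c) above which the exponentials in Eqs. (45), (46) cut off.
v_threshold = 1.0 / q_times_D

# Angular frequency of the wave and the Hubble rate, both in 1/s.
omega = 2.0 * math.pi * C_KM_PER_S / wavelength_km
H0 = H0_KM_S_MPC / KM_PER_MPC

# Upper bound on Omega^2 / omega^2, taking Omega = H0.
omega_ratio_sq = (H0 / omega) ** 2

# Arrival-time spread for D ~ 1/H0 and Delta(1/omega^2) ~ 1/omega^2,
# expressed in units of the wave period 2 pi / omega.
delta_t = 0.5 * (1.0 / H0) * H0 ** 2 / omega ** 2
delta_t_over_period = delta_t / (2.0 * math.pi / omega)

print(f"q (t - t1)         ~ {q_times_D:.1e}")
print(f"velocity threshold ~ {v_threshold:.1e} c")
print(f"Omega^2 / omega^2  < {omega_ratio_sq:.1e}")
print(f"Delta t / period   ~ {delta_t_over_period:.1e}")
```

Under these assumptions the arrival-time spread is smaller than one wave period by a factor of order H₀/ω, in line with the statement in the text.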

VII. Primordial Gravitational Waves

As a second application, we consider the effect of cold dark matter on primordial gravitational
waves. In much of what follows we will consider WIMP dark matter for concreteness, but the
discussion generalizes to more general models of dark matter. Let us begin by summarizing
the key events during cosmic history that are important for our treatment of the effects of
WIMP dark matter on primordial gravitational waves. At early times WIMPs are relativistic
and are in thermal equilibrium with the particles of the standard model. As the universe
cools, the dark matter particles become non-relativistic. Shortly after this time, when the
temperature of the medium has dropped to ≈ 1/30 of the WIMP mass, inelastic processes
are no longer efficient enough to keep the dark matter particles in chemical equilibrium and
the comoving number density of dark matter particles becomes constant. However, elastic
scattering still occurs rapidly and keeps the WIMPs in kinematic equilibrium with the standard
model. As the universe cools further, elastic scattering between the dark matter particles and
standard model particles becomes inefficient as well, and the WIMPs kinetically decouple and become
free-streaming. Astrophysical sources emit gravitational waves long after kinetic decoupling
when the dark matter is already free-streaming. In contrast, depending on their frequency,
primordial gravitational waves may propagate during earlier epochs when the dark matter was
still in kinetic equilibrium or even relativistic.
We will refer to gravitational waves that enter the horizon after kinetic decoupling as long
modes. For typical WIMPs, these have frequencies of at most a few times $10^{-12}$ Hz today,
and can only be accessed through measurements of the polarization of the cosmic microwave
background. We call modes that enter the horizon before kinetic decoupling but after the dark
matter has become non-relativistic intermediate modes. These modes have frequencies between
$10^{-12}$ and $\sim 10^{-5}$ Hz, and fall into the frequency range observable with pulsar timing arrays.
Modes accessible with DECIGO [8] or BBO [9] enter the horizon when the dark matter particles
are still relativistic, and we refer to them as short modes.

Long modes
We first discuss effects on modes with wavelengths that can be accessed through measurements of
the polarization of the cosmic microwave background. In linear perturbation theory primordial
gravitational waves generate B-mode polarization whereas density perturbations do not. So the
search for B-mode polarization of the CMB is an indirect search for gravitational waves. Lensing
of the CMB by large scale structure between us and the surface of last scattering also generates
B-mode polarization and in practice limits the multipoles for which we can extract information
about primordial gravitational waves to less than a few hundred.

The contribution to the CMB anisotropies at multipole $\ell$ is dominated by gravitational waves
with wave number $k = a_L\ell/d_L$, where $a_L$ is the value of the scale factor at last scattering, and
$d_L$ is the angular diameter distance to the surface of last scattering. For a flat geometry
\[
d_L = \frac{1}{H_0(1+z_L)}\int_{1/(1+z_L)}^{1}\frac{dx}{\sqrt{\Omega_r+\Omega_m x+\Omega_\Lambda x^4}} \approx 13\ \mathrm{Mpc}. \tag{51}
\]

So the CMB allows us to access gravitational waves with comoving wave numbers $k \lesssim 0.03\ \mathrm{Mpc}^{-1}$.
These modes entered the horizon at a redshift of $z \lesssim 10^4$, long after kinetic decoupling of the
dark matter. The anisotropic stress for the modes of interest is then well approximated by
equation (24). Furthermore, by this time these modes have at most undergone a few oscillations
so that the anisotropic stress for the modes accessible in the CMB further simplifies to (25)
and (26).
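
As a rough numerical cross-check of equation (51), the following sketch evaluates the integral with Simpson's rule. The cosmological parameters (`h0_km_s_mpc = 67.7`, `omega_m = 0.31`, `omega_r = 9.2e-5`, `z_last = 1090`) and the multipole $\ell = 400$ are assumed, illustrative values that do not come from the text.

```python
import math


def angular_diameter_distance(h0_km_s_mpc, omega_m, omega_r, z_last, steps=20000):
    """Evaluate equation (51) for a flat universe with Simpson's rule; result in Mpc."""
    c_km_s = 299792.458
    omega_lambda = 1.0 - omega_m - omega_r           # flatness fixes the vacuum term
    lower, upper = 1.0 / (1.0 + z_last), 1.0

    def integrand(x):
        return 1.0 / math.sqrt(omega_r + omega_m * x + omega_lambda * x ** 4)

    width = (upper - lower) / steps                  # steps must be even
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * width)
    integral = total * width / 3.0
    return c_km_s / h0_km_s_mpc * integral / (1.0 + z_last)


# Assumed, illustrative parameter values.
d_L = angular_diameter_distance(67.7, 0.31, 9.2e-5, 1090.0)
k_max = 400.0 / ((1.0 + 1090.0) * d_L)   # k = a_L * ell / d_L evaluated at ell = 400
print(d_L, k_max)
```

With these inputs the distance comes out near 13 Mpc and the wave number at $\ell$ of a few hundred is of order $0.03\ \mathrm{Mpc}^{-1}$.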
In sections V and VI we found analytic solutions to the field equations in the presence of
non-relativistic collisionless matter for wave frequencies much greater than the Hubble expansion
rate, either using the WKB approximation to deal with general expansion rates, or in the special
case of constant a(t), where this approximation is unnecessary. We are now concerned with
gravitational wave frequencies comparable to the expansion rate. Unfortunately there is no way
to find analytic solutions of the field equations for Robertson-Walker scale factors a(t) with
arbitrary time-dependence. However, we can find solutions during the matter and radiation
dominated eras most relevant to the CMB.
To treat the time evolution during the matter and radiation dominated eras, it is convenient
to introduce the independent variable $y = a/a_{eq}$, where $a_{eq}$ is the scale factor at matter-radiation
equality, and write equation (27) as$^3$
\[
(1+y)\frac{d^2}{dy^2}h_{ij}(x,t)+\left(\frac{2}{y}+\frac{5}{2}\right)\frac{d}{dy}h_{ij}(x,t)+\kappa^2h_{ij}(x,t) = -\frac{4\epsilon}{y^3}\left(h_{ij}(x,t)-h_{ij}(x,t_1)\right), \tag{52}
\]
with $\epsilon = E/(a_{eq}^5\rho_{m\,eq})$ the fraction of the energy density of the dark matter particles stored in
kinetic energy at matter-radiation equality, and $\kappa = \sqrt{2}\,q/(a_{eq}H_{eq})$. The solution to this equation
cannot be written in closed form, but we can find solutions for $\kappa \ll 1$ and $\kappa \gg 1$.
Let us first consider modes that enter the horizon after matter-radiation equality for which
$\kappa \ll 1$. For modes outside the horizon at last scattering $h_{ij}(x,t) \approx h_{ij}(x,t_1)$ and the anisotropic
stress vanishes. So we expect the evolution of the gravitational waves to be unaffected by the
presence of cold dark matter. To be more quantitative, we can treat both the gradients and the
anisotropic stress as a perturbation. Introducing the mode expansion
\[
h_{ij}(x,t) = \sum_{\lambda=\pm2}\int d^3q\,\beta(q,\lambda)\,e_{ij}(\hat q,\lambda)\,h_q(t)\,e^{iq\cdot x}, \tag{53}
\]
the general solution to the homogeneous equation is given by a linear combination of
\[
h_q^1(y) = 1 \quad\text{and}\quad h_q^2(y) = \frac12\ln\frac{\sqrt{1+y}+1}{\sqrt{1+y}-1}-\frac{\sqrt{1+y}}{y}. \tag{54}
\]

$^3$ This equation is valid after electrons and positrons have frozen out.

The second solution diverges like $1/y$ for small $y$, and it is the first solution that is of interest
in the context of primordial gravitational waves. With the help of the Green's function
\[
G(y,z) = \frac{z}{2y\sqrt{1+z}}\left[-2z\sqrt{1+y}+2y\sqrt{1+z}+yz\left(\ln\frac{\sqrt{1+y}+1}{\sqrt{1+y}-1}-\ln\frac{\sqrt{1+z}+1}{\sqrt{1+z}-1}\right)\right]\theta(y-z), \tag{55}
\]
we can write the solution at leading order in $\kappa^2$ as

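\[
h_q^{(0)}(y) = h_q^o\left[1+\frac{2\kappa^2}{15y}\left(8-8\sqrt{1+y}-3y^2+4y\left(1+\ln\frac{y}{4}+\ln\frac{\sqrt{1+y}+1}{\sqrt{1+y}-1}\right)\right)\right]. \tag{56}
\]

The leading contribution from anisotropic stress also arises at order $\kappa^2$ and is given by
\[
h_q^{(1)}(y) = h_q^{(1)}(y_\star)-4\epsilon\int_{y_\star}^{y}dz\,G(y,z)\,\frac{h_q^{(0)}(z)-h_q^o}{z^3}, \tag{57}
\]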

where $y_\star$ is late enough for collisions to be negligible but early enough so the mode is far outside
the horizon, and $h_q^{(1)}(y_\star)$ is the contribution generated up to this point. We will compute it
in section VII; for now we simply give the result
\[
h_q^{(1)}(y_\star) = h_q^o\left[1+\frac{2\epsilon\kappa^2y_\star}{3}+C_\omega\right], \tag{58}
\]
where $C_\omega$ is negative and describes a small amount of damping generated by collisions around
the time of kinetic decoupling. It is of order $\epsilon\kappa^2a_{kd}/a_{eq}$ and is suppressed relative to the terms
of interest by $a_{kd}/a_{eq} \ll 1$, where $a_{kd}$ is the scale factor at kinetic decoupling, and we can safely
neglect it.
The result cannot be written in closed form for general $y$ but becomes simple in the radiation
and matter dominated epochs
\[
h_q(y) \to h_q^o\left[1-\frac16\kappa^2y^2+\frac{2\epsilon\kappa^2y}{3}\right] \quad\text{for } y \ll 1, \tag{59}
\]
\[
h_q(y) \to h_q^o\left[1-\frac25\kappa^2y+\frac{4\epsilon\kappa^2(8\zeta(3)-7)}{15}\right] \quad\text{for } y \gg 1. \tag{60}
\]

Since $8\zeta(3)-7 \approx 2.6 > 0$, we see that modes outside the horizon during last scattering receive
a small scale-dependent boost. Since last scattering occurs for $y \approx 3$, this simple limiting form
does not capture the effect on the CMB accurately, but we can expand the result to higher
orders, and find that the solution is given by $h_q(y) = h_q^{(0)}(y)+h_q^{(1)}(y)$ with $h_q^{(0)}(y)$ given by
equation (56) and the leading effect due to collisionless matter given by
\[
h_q^{(1)}(y) = 4\epsilon\kappa^2h_q^o\left[\frac{8\zeta(3)-7}{15}-\frac{4}{5y}+\frac{8(15+2\pi^2)}{135y^{3/2}}+\frac{4(7+2\ln(y/4))}{15y^2}-\frac{4(15+2\pi^2)}{225y^{5/2}}-\frac{32(2+\ln(y/4))}{135y^3}+O(y^{-7/2})\right]. \tag{61}
\]

Figure 1: The effect of collisionless matter on the time evolution of a mode with $\kappa = 1/10$. We
show the limiting form given in equation (60) (green), the approximation given in equation (61)
(orange), the full expression based on equation (57) (dashed red), and the difference between
the numerical solutions of the equation of motion with and without anisotropic stress (black).
(Axes: $\epsilon^{-1}h^{(1)}$ against $y$, for $0 \le y \le 50$.)

The limiting form (60), the approximation (61), and the result at order $\kappa^2$ and linear in $\epsilon$ based
on equation (57) are compared to the numerical result in Figure 1 for $\kappa = 1/10$. The difference
between the numerical result and our approximation for large $y$ arises because the mode is about
to enter the horizon.
We see that the effect is highly suppressed and unobservably small for any upcoming or
planned CMB experiment both because the fraction of the energy density stored in kinetic
energy density of the dark matter is very small and because for these modes $\kappa \ll 1$.
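
The closed forms quoted for the $\kappa \ll 1$ modes can be spot-checked numerically. The sketch below assumes the homogeneous equation $(1+y)h'' + (2/y + 5/2)h' = 0$ read off from equation (52) with the $\kappa^2$ and $\epsilon$ terms dropped, checks by finite differences that the second solution in (54) satisfies it, and confirms that the $\kappa^2$ coefficient in (56) tends to $-y^2/6$ for small $y$ and to $-2y/5$ for large $y$. The function names and the sample points are illustrative choices.

```python
import math


def log_term(y):
    """ln[(sqrt(1+y) + 1) / (sqrt(1+y) - 1)], the logarithm appearing in (54) to (56)."""
    s = math.sqrt(1.0 + y)
    return math.log((s + 1.0) / (s - 1.0))


def h2(y):
    """Second homogeneous solution of equation (54)."""
    return 0.5 * log_term(y) - math.sqrt(1.0 + y) / y


def homogeneous_residual(f, y, step=1e-4):
    """Central-difference value of (1+y) f'' + (2/y + 5/2) f' at the point y."""
    d1 = (f(y + step) - f(y - step)) / (2.0 * step)
    d2 = (f(y + step) - 2.0 * f(y) + f(y - step)) / step ** 2
    return (1.0 + y) * d2 + (2.0 / y + 2.5) * d1


def h0_correction(y):
    """Coefficient of kappa^2 in equation (56): h0 / h_o = 1 + kappa^2 * h0_correction(y)."""
    s = math.sqrt(1.0 + y)
    bracket = (8.0 - 8.0 * s - 3.0 * y ** 2
               + 4.0 * y * (1.0 + math.log(y / 4.0) + log_term(y)))
    return 2.0 * bracket / (15.0 * y)


residuals = [homogeneous_residual(h2, y) for y in (0.5, 3.0, 20.0)]
small_y_ratio = h0_correction(1e-3) / (-(1e-3) ** 2 / 6.0)   # compare with (59)
large_y_ratio = h0_correction(1e4) / (-0.4 * 1e4)            # compare with (60)
print(residuals, small_y_ratio, large_y_ratio)
```

The residuals vanish to finite-difference accuracy, and both ratios approach one, with the large-$y$ ratio approaching it only logarithmically slowly.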
Let us now turn to modes with $\kappa \gg 1$. These modes enter the horizon at a time when the
energy density of the universe is dominated by radiation. To find their time evolution, we will
first find the solution during radiation domination and then match it onto the WKB solution (39)
to extend it to late times.
In the radiation dominated period, $y \ll 1$, the equation of motion for gravitational waves (52)
simplifies and the mode functions will only depend on $y$ through $u = \kappa y$. It is then convenient
to write the equation of motion as
\[
\frac{d^2}{du^2}h_q(u)+\frac{2}{u}\frac{d}{du}h_q(u)+h_q(u) = -\frac{4\epsilon\kappa}{u^3}\left(h_q(u)-h_q(u_1)\right). \tag{62}
\]
The general solution of the homogeneous differential equation is a superposition of the solutions
\[
h_q^1(u) = \frac{\sin(u)}{u}, \tag{63}
\]
\[
h_q^2(u) = \frac{\cos(u)}{u}. \tag{64}
\]
The second solution diverges for small $u$ and consequently decays outside the horizon so that the
first solution is relevant for primordial gravitational waves. It is normalized so that $h_q^1(0) = 1$.
In this case we can write the Green's function as
\[
G(u,v) = \frac{v\sin(u-v)}{u}\,\theta(u-v) = v^2\left[h_q^1(u)h_q^2(v)-h_q^2(u)h_q^1(v)\right]\theta(u-v), \tag{65}
\]
and we can formally write the solution to the inhomogeneous equation as
\[
h_q(u) = h_q^{(0)}(u)-4\epsilon\kappa\int_{u_\star}^{u}dv\,G(u,v)\,\frac{h_q(v)-h_q(v_1)}{v^3}. \tag{66}
\]
The integral and its derivative vanish at $u_\star$ so the homogeneous solution must be chosen to
satisfy the desired initial conditions. We can write it as
\[
h_q^{(0)}(u) = A\,h_q^1(u)+B\,h_q^2(u), \tag{67}
\]
with
\[
A = h_q(u_\star)\left(\cos(u_\star)+u_\star\sin(u_\star)\right)+h_q'(u_\star)\,u_\star\cos(u_\star), \tag{68}
\]
\[
B = h_q(u_\star)\left(u_\star\cos(u_\star)-\sin(u_\star)\right)-h_q'(u_\star)\,u_\star\sin(u_\star). \tag{69}
\]
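
The relations in equations (63) to (69) are simple enough to verify directly. The following sketch checks the two forms of the Green's function in (65) against each other and confirms that the matching coefficients of (68) and (69) recover a known superposition of the two homogeneous solutions. The test values `A0`, `B0` and `u_star` are arbitrary illustrative choices.

```python
import math


def h1(u):
    """First homogeneous solution, equation (63)."""
    return math.sin(u) / u


def h2(u):
    """Second homogeneous solution, equation (64)."""
    return math.cos(u) / u


def green(u, v):
    """Retarded Green's function, first form of equation (65)."""
    return v * math.sin(u - v) / u if u > v else 0.0


def matching_coefficients(h_star, dh_star, u_star):
    """A and B of equations (68) and (69) from h and dh/du evaluated at u_star."""
    c, s = math.cos(u_star), math.sin(u_star)
    a = h_star * (c + u_star * s) + dh_star * u_star * c
    b = h_star * (u_star * c - s) - dh_star * u_star * s
    return a, b


# Build a homogeneous solution with known coefficients and recover them.
A0, B0, u_star = 0.7, -0.2, 1.3          # arbitrary test values
h_star = A0 * h1(u_star) + B0 * h2(u_star)
dh_star = (A0 * (math.cos(u_star) / u_star - math.sin(u_star) / u_star ** 2)
           + B0 * (-math.sin(u_star) / u_star - math.cos(u_star) / u_star ** 2))
A, B = matching_coefficients(h_star, dh_star, u_star)

# Second form of (65): v^2 [h1(u) h2(v) - h2(u) h1(v)] for u > v.
second_form = 0.5 ** 2 * (h1(2.0) * h2(0.5) - h2(2.0) * h1(0.5))
print(A, B, green(2.0, 0.5), second_form)
```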

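To first order in $\epsilon\kappa$ we can write the solution as a superposition of the two solutions of the
homogeneous equation, albeit with time dependent coefficients
\[
h_q(u) = A\left[(1+C(u))h_q^1(u)+D(u)h_q^2(u)\right]+B\left[E(u)h_q^1(u)+(1+F(u))h_q^2(u)\right], \tag{70}
\]
with
\[
C(u) = -4\epsilon\kappa\int_{u_\star}^{u}\frac{dv}{v}\,h_q^2(v)\left(h_q^1(v)-h_q^1(v_1)\right), \tag{71}
\]
\[
D(u) = 4\epsilon\kappa\int_{u_\star}^{u}\frac{dv}{v}\,h_q^1(v)\left(h_q^1(v)-h_q^1(v_1)\right), \tag{72}
\]
\[
E(u) = -4\epsilon\kappa\int_{u_\star}^{u}\frac{dv}{v}\,h_q^2(v)\left(h_q^2(v)-h_q^2(v_1)\right), \tag{73}
\]
\[
F(u) = 4\epsilon\kappa\int_{u_\star}^{u}\frac{dv}{v}\,h_q^1(v)\left(h_q^2(v)-h_q^2(v_1)\right). \tag{74}
\]
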
These integrals can all be expressed in terms of trigonometric functions and sine and cosine integrals,
but we will not give the general formulae and work in various limits. For primordial gravitational
waves we expect $h_q^{(0)}(u) = h_q^o h_q^1(u)$ so that
\[
h_q(u) = h_q^o(1+C(u))h_q^1(u)+h_q^oD(u)h_q^2(u), \tag{75}
\]
or
\[
h_q^{(1)}(u) = h_q^oC(u)h_q^1(u)+h_q^oD(u)h_q^2(u), \tag{76}
\]
and we only need the behavior of $C(u)$ and $D(u)$. This is not entirely accurate
because a small departure from $A = 1$ and $B = 0$ is generated around the time of kinetic
decoupling, and as we will see
\[
A = 1+\epsilon\kappa\frac{2u_\star}{3}+C_\omega \quad\text{and}\quad B = -\epsilon\kappa\frac{u_\star^2}{3}. \tag{77}
\]
The amount of damping generated around kinetic decoupling, $C_\omega$, is calculated below. For
now, it suffices to know that it is of order $\epsilon\kappa^2a_{kd}/a_{eq}$, where $a_{kd}$ is the scale factor at kinetic
decoupling (defined more precisely below). Since $C_\omega$ is suppressed not only by $\epsilon$ but also by
$a_{kd}/a_{eq}$ we can safely neglect it in our discussion here. This implies that we have

\[
h_q^{(1)}(u) = h_q^o\left[C(u)+\epsilon\kappa\frac{2u_\star}{3}\right]h_q^1(u)+h_q^o\left[D(u)-\epsilon\kappa\frac{u_\star^2}{3}\right]h_q^2(u). \tag{78}
\]
For small $u$ it is easy to see that we can drop the additional terms provided we set $u_\star = 0$ in
equations (71) and (72), and we will do so in what follows. For modes that are far outside the
horizon when the particles become non-relativistic $v_1 \ll 1$. The leading correction is quadratic
in $v_1$, and we will take $v_1 \to 0$. We will need the limiting forms for $u \ll 1$ and $u \gg 1$. For small
arguments we find
\[
C(u) \to \epsilon\kappa\frac{2u}{3}+O(u^3), \tag{79}
\]
\[
D(u) \to -\epsilon\kappa\frac{u^2}{3}+O(u^4), \tag{80}
\]
whereas for large arguments
\[
C(u) \to 4\epsilon\kappa\frac{\sin(u)}{u^2}+O(1/u^3), \tag{81}
\]
\[
D(u) \to 2\epsilon\kappa\left[1-2\ln2+\frac{2\cos(u)-1/2}{u^2}\right]+O(1/u^3). \tag{82}
\]
This leads to a solution for the mode function far outside the horizon of
\[
h_q(y) = h_q^o\left[1-\frac16\kappa^2y^2+\frac{2\epsilon\kappa^2y}{3}\right]+O(y^3), \tag{83}
\]
in agreement with equation (59). Once the mode is deep inside the horizon, it approaches
\[
h_q(y) = h_q^o\left[\frac{\sin(\kappa y)}{\kappa y}-\frac{2\epsilon\kappa\cos(\kappa y)(2\ln2-1)}{\kappa y}\right]+O(\kappa^{-3}y^{-3}). \tag{84}
\]

We see that the dark matter has no effect on the amplitude (besides the small effect generated
around kinetic decoupling we neglected) but introduces a small phase shift. Since we will need
it later, let us also record its derivative
\[
h_q'(y) = \kappa h_q^o\left[\frac{\cos(\kappa y)}{\kappa y}+\frac{2\epsilon\kappa\sin(\kappa y)(2\ln2-1)}{\kappa y}\right]+O(\kappa^{-2}y^{-3}). \tag{85}
\]

The behavior of the functions $C(u)$ and $D(u)$ and the comparison to the limiting forms (81), (82)
are shown in Figure 2.

Figure 2: $C(u)$ and $D(u)$ as defined in equations (71), (72). We show the exact results for $C(u)$
(orange) and $D(u)$ (red), and the limiting forms (81), (82) valid for $u \gg 1$ for $C(u)$ (dashed
blue) and $D(u)$ (dashed green). (Horizontal axis: $u$, for $0 \le u \le 20$.)
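
The limiting forms of $C(u)$ and $D(u)$ can also be checked by direct numerical integration. The sketch below evaluates the integrals (71) and (72) with $u_\star = 0$ and $v_1 \to 0$ using a simple midpoint rule, in units of $\epsilon\kappa$, and compares them with the small-argument forms (79), (80) and the large-argument forms (81), (82). The step count and the sample points $u = 0.1$ and $u = 200$ are assumptions chosen for illustration.

```python
import math


def sinc_minus_one(v):
    """sin(v)/v - 1, with a short series for small v to avoid cancellation."""
    if v < 1e-2:
        return -v ** 2 / 6.0 + v ** 4 / 120.0
    return math.sin(v) / v - 1.0


def c_over_eps_kappa(u, steps=200000):
    """C(u)/(eps*kappa) from (71) with u_star = 0 and v_1 -> 0 (midpoint rule)."""
    width = u / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * width
        total += -4.0 * (math.cos(v) / v) * sinc_minus_one(v) / v
    return total * width


def d_over_eps_kappa(u, steps=200000):
    """D(u)/(eps*kappa) from (72) with u_star = 0 and v_1 -> 0 (midpoint rule)."""
    width = u / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * width
        total += 4.0 * (math.sin(v) / v) * sinc_minus_one(v) / v
    return total * width


u_small, u_large = 0.1, 200.0
c_small, d_small = c_over_eps_kappa(u_small), d_over_eps_kappa(u_small)
c_large, d_large = c_over_eps_kappa(u_large), d_over_eps_kappa(u_large)

# Limiting forms (81) and (82), in units of eps*kappa.
c_limit = 4.0 * math.sin(u_large) / u_large ** 2
d_limit = 2.0 * (1.0 - 2.0 * math.log(2.0)
                 + (2.0 * math.cos(u_large) - 0.5) / u_large ** 2)
print(c_small, d_small, c_large, c_limit, d_large, d_limit)
```

In particular $D(u)$ settles at the constant $2\epsilon\kappa(1 - 2\ln 2)$, which is the origin of the phase shift in equation (84).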
This solution is valid deep inside the horizon and during the radiation dominated era. To
find the solution at later times, we can match it to the WKB approximation we derived in
section V. Equation (39) becomes

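\[
h_q(y) = h_q^1(y)\left[h_q(y_\star)+\epsilon h_q(y_1)A(y)\right]+h_q^2(y)\left[\varpi(y_\star)^{-1}h_q'(y_\star)+\epsilon h_q(y_1)B(y)\right], \tag{86}
\]
with
\[
\varpi(y) = \frac{\sqrt{\kappa^2+4\epsilon/y^3}}{\sqrt{1+y}}, \tag{87}
\]
the functions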
y?  p    p 
h1q (y) =
p p
cos 2κ 1 + y − 1 + y? − sin 2κ 1 + y − 1 + y? , (88)
y κyy?
y?  p    p 
h2q (y) =
p p
sin 2κ 1 + y − 1 + y? + cos 2κ 1 + y − 1 + y? , (89)
y κyy?
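and to leading order in $\epsilon$
\[
A(y) = \frac{4(1+y)\cos\left(2\kappa\left(\sqrt{1+y}-\sqrt{1+y_\star}\right)\right)}{\kappa^2y^2}-\frac{4(1+y_\star)}{\kappa^2y_\star^2}, \tag{90}
\]
\[
B(y) = \frac{4(1+y)\sin\left(2\kappa\left(\sqrt{1+y}-\sqrt{1+y_\star}\right)\right)}{\kappa^2y^2}. \tag{91}
\]
So to first order in $\epsilon$ and deep inside the horizon, we obtain the solution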

o 1 sin(κy? ) 2 cos(κy? )(1 − 2 ln 2) 4(1 + y? )
hq (y) = hq hq (y) + −
κy? y? κ2y2
√ √ ? 
4(1 + y) cos(2κ( 1 + y − 1 + y? ))
+
κ2y2

cos(κy? ) 2 sin(κy? )(1 − 2 ln 2)
+ hoq h2q (y) −
κy? y?
√ √ 
4(1 + y? ) sin(2κ( 1 + y − 1 + y? ))
+ . (92)
κ2y2

Working to leading order in $\epsilon$, the dependence on $y_\star$ disappears as it had to and the evolution
inside the horizon valid during both radiation and matter dominated eras is given by
\[
h_q(y) = h_q^o\left[\frac{\sin\left(2\kappa\left(\sqrt{1+y}-1\right)\right)}{\kappa y}-\frac{2\epsilon\kappa(2\ln2-1)\cos\left(2\kappa\left(\sqrt{1+y}-1\right)\right)}{\kappa y}\right]. \tag{93}
\]

We see that the gravitational waves acquire a small phase shift $\delta\varphi = -2\epsilon\kappa(2\ln2-1)$. The
analytic solution is compared to a numerical calculation in Figure 3 for $\kappa = 100$. We see that
the effect on modes that enter the horizon during the radiation dominated period is larger than
the effect on modes that enter at later times, but since the fraction of the density in the kinetic
energy of the dark matter is rather small, its effect on the degree scale polarization of the
cosmic microwave background is also too small to be observed with upcoming or planned CMB
experiments.

Intermediate modes
We now turn to modes that enter the horizon when the dark matter is still in kinetic equilibrium
but has already become non-relativistic. For a typical WIMP this corresponds to a gravitational
wave frequency today below $\sim 10^{-5}$ Hz.

Figure 3: The effect of collisionless matter on the time evolution of a mode with $\kappa = 100$. We
show the term of order $\epsilon$ of the approximation to the mode function given in equation (93)
(dashed orange) and the difference between the numerical solutions of the equation of motion
with and without anisotropic stress (black). (Vertical axis: $\epsilon^{-1}h^{(1)}$.)

As we briefly discussed after equation (17), we expect collisions to be negligible if the collision
term in the Boltzmann equation is much less than the transport term. The wavelength of the
primordial gravitational waves, $\lambda$, redshifts like one power of the scale factor, while the velocity of the
dark matter particles, $v$, redshifts like $a^{-1}$ after and $a^{-1/2}$ before kinetic decoupling. The rate $\omega_r$
at which energy is exchanged between standard model particles and the dark matter redshifts
at least like $a^{-3}$, like the number density of standard model particles. So at late times when
$\omega_r \ll v/\lambda$ collisions are negligible, but they become important at early times. As a consequence
the anisotropic stress is no longer given by (24) and we will have to revisit the
derivation in the presence of collisions.
If the standard model particles interacting with the dark matter are much lighter than
the dark matter particles, are relativistic and are in local thermal equilibrium, the Boltzmann
equation becomes
\[
\frac{\partial n(p,x,t)}{\partial t}+\frac{p^i}{p^0}\frac{\partial}{\partial x^i}n(p,x,t)+\frac12\frac{\partial g_{kl}}{\partial x^i}\frac{p^kp^l}{p^0}\frac{\partial}{\partial p_i}n(p,x,t) = -2\langle\sigma v\rangle\Big[n(p,x,t)\,n(x,t)-n_{eq}(p,x,t)\,n_{eq}(x,t)\Big]+\omega_r(t)\frac{\partial}{\partial p_i}\left[p_i\,n(p,x,t)+g_{ij}(x,t)\,mT\frac{\partial}{\partial p_j}n(p,x,t)\right], \tag{94}
\]
where $T$ is the temperature of the standard model degrees of freedom, $\langle\sigma v\rangle$ is the thermally
averaged dark matter annihilation cross section, $\omega_r$ is the rate at which the standard model
particles and dark matter particles exchange energies of order $kT$, and as before
\[
p^i = g^{ij}(x,t)\,p_j, \qquad p^0 = \sqrt{m^2+g^{ij}(x,t)\,p_ip_j} \approx m+\frac{g^{ij}(x,t)\,p_ip_j}{2m}, \tag{95}
\]
and
\[
n(x,t) = \frac{1}{\sqrt{\det g(x,t)}}\int d^3p\,n(p,x,t). \tag{96}
\]
In general, we expect the temperature to be a function of position and expect a small position
dependent velocity of the medium, but because we are interested in tensor perturbations we will
not need to include this.
In writing equation (94), we have assumed that the dark matter only participates in interac-
tions with the standard model particles, both in the form of the inelastic processes responsible
for setting the freeze-out abundance, and in the form of the elastic processes required by crossing
symmetry, but have neglected self-interactions. Of course, we only have very weak constraints
on dark matter self-interactions, and these interactions may, in fact, well be significantly
stronger than the interactions with the standard model that are included here, at least for some
range of temperatures. However, we will see that our treatment of the effects of the minimal
interactions that must be present for any WIMP included here will also allow us to understand
the effects of self-interacting dark matter on gravitational waves.
Close to local thermal equilibrium the scattering rate is much higher than the rate of change
in the temperature or the metric. We can thus neglect time derivatives acting on the metric or
the temperature and see that the equilibrium distribution is
\[
n_{eq}(p,x,t) = n_{eq}\left(\frac{1}{2\pi mT}\right)^{3/2}\exp\left(-\frac{g^{ij}(x,t)\,p_ip_j}{2mT}\right). \tag{97}
\]

Away from thermal equilibrium we should in general consider an Ansatz in which the temper-
ature of the dark matter particles depends on position, but because we are interested in tensor
perturbations we can consider an Ansatz in which it is only a function of time
\[
n(p,x,t) = n(t)\left(\frac{1}{2\pi mT_{dm}(t)}\right)^{3/2}\exp\left(-\frac{g^{ij}(x,t)\,p_ip_j}{2mT_{dm}(t)}\right)+\delta n(p,x,t). \tag{98}
\]

The first term on the right hand side is a solution to the Boltzmann equation in the absence of
tensor perturbations provided the dark matter temperature and density obey
\[
\frac{1}{a^2}\frac{d}{dt}\left(a^2T_{dm}\right) = 2\omega_r(t)\left(T-T_{dm}\right), \tag{99}
\]
\[
\frac{1}{a^3}\frac{d}{dt}\left(a^3n\right) = -2\langle\sigma v\rangle\left(n^2-n_{eq}^2\right). \tag{100}
\]
So as expected $\delta n(p,x,t)$ is of first order in the metric perturbation, and as before we will write
\[
n(p,x,t) = n(p)-\frac12h_{ij}(x,t)\,p_i\frac{\partial}{\partial p_j}n(p)+\delta n(p,x,t), \tag{101}
\]

with $n(p,t)$ given by
\[
n(p,t) = a^3n(t)\left(\frac{1}{2\pi ma^2T_{dm}}\right)^{3/2}\exp\left(-\frac{p^2}{2ma^2T_{dm}}\right). \tag{102}
\]

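The equation for a plane wave, $\delta n(p,x,t) \propto \exp(iq\cdot x)$, with wave vector $q$ then becomes
\[
\frac{\partial\delta n(p,x,t)}{\partial t}+\frac{ip\cdot q}{a^2m}\delta n(p,x,t)-\frac12\dot h_{ij}(x,t)\,\hat p_i\hat p_j\,p\frac{\partial}{\partial p}n(p,t) = -2\omega_a(t)\,\delta n(p,x,t)+\omega_r(t)\frac{\partial}{\partial p_i}\left[p_i\,\delta n(p,x,t)+a^2mT\frac{\partial}{\partial p_i}\delta n(p,x,t)\right], \tag{103}
\]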

where we denoted the annihilation rate by
\[
\omega_a(t) = \langle\sigma v\rangle\,n(t), \tag{104}
\]
and we have used
\[
\int d^3p\,\delta n(p,x,t) = 0, \tag{105}
\]
because gravitational waves do not generate fluctuations in the number density.$^4$


Before we consider the general case, let us consider wavelengths for which the medium
behaves like a viscous fluid. At leading non-trivial order in the derivative expansion, taking
$H \ll \omega_r$, $q/a \ll \omega_r$ and using $\omega_a \ll \omega_r$, the perturbation to the phase space density must satisfy
\[
\frac{\partial}{\partial p_i}\left[p_i\,\delta n(p,x,t)+a^2mT\frac{\partial}{\partial p_i}\delta n(p,x,t)\right] = -\frac{1}{2\omega_r}\dot h_{ij}(x,t)\,p_i\frac{\partial}{\partial p_j}n(p,t). \tag{106}
\]

Because $h_{ij}$ is traceless, we can commute $p_i$ with the derivative and the first integration is trivial.
Remembering that the perturbation must vanish as the gravitational wave amplitude is taken
to zero, we have
\[
p_i\,\delta n(p,x,t)+a^2mT\frac{\partial}{\partial p_i}\delta n(p,x,t) = -\frac{1}{2\omega_r}\dot h_{ij}(x,t)\,p_j\,n(p,t). \tag{107}
\]

Since $\delta n(p,x,t)$ is a scalar that vanishes as the gravitational wave is taken to zero, and the
metric perturbation is transverse and traceless we consider an Ansatz of the form
\[
\delta n(p,x,t) = \dot h_{ij}(x,t)\,\hat p_i\hat p_j\,\tilde\Delta(q,p,t). \tag{108}
\]
$^4$ We will see a more rigorous justification for this below.

Introducing the shorthand notation $\dot h = \dot h_{kl}(x,t)\,\hat p_k\hat p_l$, the resulting equation is
\[
\hat p_i\,\dot h\,\tilde\Delta(q,p,t)+\frac{a^2mT}{p^2}\left[2\dot h_{ij}\hat p_j\,\tilde\Delta(q,p,t)+\hat p_i\,\dot h\left(-2\tilde\Delta(q,p,t)+p\frac{\partial}{\partial p}\tilde\Delta(q,p,t)\right)\right] = -\frac{1}{2\omega_r}\dot h_{ij}\hat p_j\,n(p,t). \tag{109}
\]

The coefficients of $\hat p_i$ and $\dot h_{ij}\hat p_j$ must vanish independently and from the term proportional to
$\dot h_{ij}\hat p_j$ we can read off
\[
\tilde\Delta(q,p,t) = -\frac{p^2}{4\omega_ra^2mT}\,n(p,t). \tag{110}
\]
Equation (99) leads to
\[
T_{dm} = T\left(1-\frac{H}{2\omega_r}\right), \tag{111}
\]
so that for $H \ll \omega_r$ the dark matter temperature is well approximated by that of the standard
model particles, $T_{dm} \approx T$, and we see that the terms proportional to $\hat p_i$ also vanish for $\tilde\Delta(q,p,t)$
given by (110). The perturbation to the phase space density in this approximation is then
\[
\delta n(p,x,t) = -\frac{p^2\,n(p,t)}{4\omega_ra^2mT}\,\dot h_{ij}(x,t)\,\hat p_i\hat p_j+O\!\left(\frac{q^2}{a^2\omega_r^2}\right). \tag{112}
\]

Substituting back into the Boltzmann equation (103), we see that the terms we are neglecting are
indeed of order $q/a\omega_r$ and $H/\omega_r$ relative to the terms we are keeping. To compute the anisotropic
stress, recall that the space-space components of the stress tensor are given by equation (12).
For the Ansatz (101), the contribution linear in the metric perturbation simplifies to
\[
\delta T^i{}_j(x,t) = \frac{1}{a^5}\int d^3p\,\delta n(p,x,t)\,\frac{p_ip_j}{m}, \tag{113}
\]
and the anisotropic stress is simply the transverse traceless part of this expression. We can
perform the angular integrals with the identity
\[
\int\frac{d^2\hat p}{4\pi}\,\hat p_i\hat p_j\hat p_k\hat p_l = \frac{1}{15}\Big[\delta_{ij}\delta_{kl}+\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}\Big], \tag{114}
\]
and the integral over the magnitude by recalling
\[
\frac{\partial}{\partial p}n(p,t) = -\frac{p}{a^2mT_{dm}}\,n(p,t), \tag{115}
\]
integrating by parts and using the definition of comoving kinetic energy density of the dark
matter particles
\[
E(t) = \int d^3p\,\frac{p^2}{2m}\,n(p,t) = \frac32a^5nT_{dm} \approx \frac32a^5nT. \tag{116}
\]

This leads us to the anisotropic stress
\[
\pi_{ij}(x,t) = -\frac{E(t)}{3a^5\omega_r(t)}\,\dot h_{ij}(x,t) = -\frac{nT}{2\omega_r(t)}\,\dot h_{ij}(x,t), \tag{117}
\]
with the number density $n$ set by the usual freeze-out calculation. The equation of motion for
gravitational waves before kinetic decoupling then simply becomes
\[
\ddot h_q(t)+\left(3H(t)+\Gamma\right)\dot h_q(t)+\frac{q^2}{a^2(t)}h_q(t) = 0 \quad\text{with}\quad \Gamma = 8\pi G\frac{nT}{\omega_r}, \tag{118}
\]
so that the presence of the dark matter leads to some amount of damping of the gravitational
waves. Repeating the above computation for a velocity gradient, we find that the shear viscosity
of the medium is given by $\eta = nT/2\omega_r$, so that the damping rate is given by $\Gamma = 16\pi G\eta$, consistent
with [4]. However, because
\[
\frac{E(t)}{3a^5\omega_r(t)M_p^2} \ll \frac{H}{\omega_r}H \ll H, \tag{119}
\]
the Hubble rate during this epoch is orders of magnitude larger than $\Gamma$. The effect is highly sup-
pressed both because the energy density in dark matter particles is a subdominant contribution
to the total energy density during radiation domination, and because $H \ll \omega_r$ before kinetic
decoupling.
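
For readers who want to see the intermediate steps, the coefficient in equation (117) can be recovered as follows. This is a sketch of the algebra implied by equations (112) to (116), not a reproduction of the authors' own derivation; it sets $T_{dm} \approx T$ at the end, and uses the tracelessness of $\dot h_{kl}$ when contracting with the identity (114).

```latex
\begin{align*}
\pi_{ij}
  &= \frac{1}{a^5 m}\int d^3p\; p_i p_j\,\delta n
   = -\frac{\dot h_{kl}}{4\omega_r a^7 m^2 T}
     \int_0^\infty dp\,p^6\,n(p,t)\int d^2\hat p\;
     \hat p_i\hat p_j\hat p_k\hat p_l \\
  &= -\frac{\dot h_{ij}}{4\omega_r a^7 m^2 T}\cdot\frac{2}{15}\cdot
     4\pi\int_0^\infty dp\,p^6\,n(p,t)
     && \text{by (114), } \dot h_{kk}=0 \\
  &= -\frac{\dot h_{ij}}{4\omega_r a^7 m^2 T}\cdot\frac{2}{15}\cdot
     5\,a^2 m T_{dm}\cdot 4\pi\int_0^\infty dp\,p^4\,n(p,t)
     && \text{by (115) and integration by parts} \\
  &= -\frac{\dot h_{ij}}{4\omega_r a^7 m^2 T}\cdot\frac{2}{15}\cdot
     10\,a^2 m^2 T_{dm}\,E(t)
     && \text{by (116)} \\
  &= -\frac{E(t)}{3a^5\omega_r}\,\frac{T_{dm}}{T}\,\dot h_{ij}
   \;\approx\; -\frac{E(t)}{3a^5\omega_r}\,\dot h_{ij}.
\end{align*}
```

Inserting $E = \tfrac32 a^5 nT$ then gives the second form, $-nT\,\dot h_{ij}/2\omega_r$.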
We know that $\omega_r \approx H$ during kinetic decoupling so that the approximation does not allow
us to follow modes through kinetic decoupling, and we can only use it to study the behavior
of modes before kinetic decoupling while $q/a\omega_r \ll 1$ and $H/\omega_r \ll 1$. To follow modes through
decoupling, we return to equation (103) and rewrite it as a hierarchy of coupled ordinary differ-
ential equations. Recalling the mode expansion (53), we see that the equation only depends on
the direction of the momentum of the dark matter particles through $\mu = \hat p\cdot\hat q$ and $e_{ij}(\hat q,\lambda)\hat p_i\hat p_j$.
In general, additional directional dependence could arise from the initial conditions, but we are
interested in isotropic initial conditions so that the perturbation to the phase space density must
be of the form
\[
\delta n(p,x,t) = \sum_{\lambda=\pm2}\int d^3q\,\beta(q,\lambda)\,e_{kl}(\hat q,\lambda)\,\hat p_k\hat p_l\,\tilde\Delta(q,p,\mu,t)\,e^{iq\cdot x}. \tag{120}
\]

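Given that the polarization tensor is transverse and traceless, we see that this Ansatz justifies
equation (105). As we show in Appendix A, expanding the perturbation to the phase space
density in terms of orthonormal polynomials
\[
\delta n(p,x,t) = \sum_{\lambda=\pm2}\int d^3q\,\beta(q,\lambda)\,e_{ij}(\hat q,\lambda)\,\hat p_i\hat p_j\,e^{iq\cdot x}\times\sum_{\ell=2}^{\infty}\sum_{n=0}^{\infty}(-i)^\ell(2\ell+1)\,\Delta_{n\ell}(q,t)\,\mathcal{L}_{n\ell}(z)\,\mathcal{P}_\ell(\mu)\,p\frac{\partial}{\partial p}n(p,t), \tag{121}
\]
where
\[
\mathcal{P}_\ell(\mu) = \frac{P_\ell^2(\mu)}{1-\mu^2} \quad\text{and}\quad \mathcal{L}_{n\ell}(z) = z^{\ell/2-1}L_n^{\ell+1/2}(z) \quad\text{where}\quad z = \frac{p^2}{2a^2mT_{dm}}, \tag{122}
\]
$L_n^k$ are generalized Laguerre polynomials and $P_\ell^m$ are associated Legendre polynomials, allows
us to diagonalize the collision term and leads us to the Boltzmann hierarchy
\[
\dot\Delta_{n\ell}(q,t)+\frac{q}{(2\ell+1)a}\left(\frac{2T_{dm}}{m}\right)^{1/2}\left[(\ell+2)\left(n+\ell+\frac32\right)\Delta_{n\,\ell+1}(q,t)-n(\ell+2)\Delta_{n-1\,\ell+1}(q,t)+(\ell-2)\Delta_{n+1\,\ell-1}(q,t)-(\ell-2)\Delta_{n\,\ell-1}(q,t)\right] = -\frac{1}{30}\dot h_q(t)\,\delta_{\ell2}\delta_{n0}-(2n+\ell)\,\omega_r(t)\frac{T}{T_{dm}}\Delta_{n\ell}(q,t)-2\omega_a(t)\frac{n_{eq}^2}{n^2}\Delta_{n\ell}(q,t), \tag{123}
\]
and the anisotropic stress
\[
\pi_q(t) = 30\,n(t)\,T_{dm}(t)\,\Delta_{02}(q,t). \tag{124}
\]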
We see that for non-relativistic dark matter particles the collision term is dominated by
the elastic scattering processes as expected. Dark matter self-interactions introduce another
source of damping on the right hand side. Assuming they are generated by an operator with
comparable coefficient to that responsible for the interactions between the dark matter and the
standard model, their effect would be suppressed just like that of annihilations because the dark
matter is non-relativistic and its number density is small compared to that of light standard
model degrees of freedom.
To find the initial conditions for equation (123), let us consider the system at a time when
scattering is efficient and $q/a\omega_r \ll 1$ and $H/\omega_r \ll 1$. We see that in this limit all modes but the
mode with $n = 0$ and $\ell = 2$ are rapidly driven to zero. Recalling that in this limit $T_{dm} \approx T$, we
find
\[
\Delta_{02}(q,t) \to -\frac{\dot h_q(t)}{60\,\omega_r(t)}, \tag{125}
\]
\[
\Delta_{n\ell}(q,t) \to 0 \quad\text{for all others.} \tag{126}
\]

The expansion (121) together with $\mathcal{L}_{02}(z) = 1$ and $\mathcal{P}_2(\mu) = 3$ then implies
\[
\delta n(p,x,t) = -\frac{p^2\,n(p,t)}{4\omega_ra^2mT}\,\dot h_{ij}(x,t)\,\hat p_i\hat p_j, \tag{127}
\]
in agreement with our earlier result (112). As a further consistency check consider gravitational
wave emission at some time $t_1$ long after decoupling. Provided we are interested in the anisotropic
stress at a time $t$ that is not too long after emission so that we still have
\[
\int_{t_1}^{t}dt'\,\frac{q}{a(t')}\left(\frac{2T_{dm}(t')}{m}\right)^{3/2} \ll 1, \tag{128}
\]
all couplings between modes are negligible and we simply have
\[
\Delta_{02}(q,t) = -\frac{1}{30}\left(h_q(t)-h_q(t_1)\right), \tag{129}
\]
so that
\[
\pi_{ij}(x,t) = -n(t)\,T_{dm}(t)\left(h_{ij}(x,t)-h_{ij}(x,t_1)\right), \tag{130}
\]

consistent with equation (25) in section IV since the comoving kinetic energy density is given
by $E = 3a^5nT_{dm}/2$.
As long as the particles move a distance that is short compared to the wavelength of the
gravitational wave on the time scale on which the dark matter and the standard model exchange
energy, we have $(q/a)v \ll \omega_r$ so that the higher multipole moments are driven to zero and the
hierarchy reduces to
\[
\dot\Delta_{02}(q,t)+2\omega_r(t)\frac{T(t)}{T_{dm}(t)}\Delta_{02}(q,t) = -\frac{1}{30}\dot h_q(t). \tag{131}
\]

All that remains is to find the initial conditions, but provided $q/a\omega_r \ll 1$ around the time of
freeze-out when $\omega_r \gg H$, we know that the initial conditions are given by equation (125), and
the solution is
\[
\Delta_{02}(q,t) = -\frac{\dot h_q(t_1)}{60\,\omega_r(t_1)}\exp\left[-2\int_{t_1}^{t}dt'\,\omega_r(t')\frac{T(t')}{T_{dm}(t')}\right]-\frac{1}{30}\int_{t_1}^{t}dt'\,\dot h_q(t')\exp\left[-2\int_{t'}^{t}dt''\,\omega_r(t'')\frac{T(t'')}{T_{dm}(t'')}\right]. \tag{132}
\]

Intermediate modes enter the horizon when the dark matter is non-relativistic, and we can take
$t_1$ early enough so the mode is outside the horizon. In this case we can neglect the first term on
the right hand side so that the time evolution for gravitational waves is governed by
\[
\ddot h_q(t)+3H\dot h_q(t)+\frac{q^2}{a^2}h_q(t) = -16\pi G\,nT_{dm}\int_{t_1}^{t}dt'\,\dot h_q(t')\exp\left[-2\int_{t'}^{t}dt''\,\omega_r(t'')\frac{T(t'')}{T_{dm}(t'')}\right]. \tag{133}
\]

For modes that enter the horizon after kinetic decoupling the argument of the exponential
is small and as expected the equation reduces to that studied in section IV.
As an additional check, let us also consider modes that enter the horizon before kinetic
decoupling when $\omega_r \gg H$. For modes whose wave numbers satisfy $q/a \ll \omega_r$, we see that the
integral is dominated by times $t'$ that differ from $t$ by $\sim 1/\omega_r$. Since the mode function varies
on much longer time scales set by $q/a$ and $H$, we can approximate its argument by $t' \approx t$ and
recover an anisotropic stress consistent with equation (124) with $\Delta_{02}$ given by equation (125).
As we saw, this leads to an additional friction term, but the effect is much too small to be
observable.

As the universe expands, the rate $\omega_r$ eventually drops below $q/a$. For modes that entered
significantly before kinetic decoupling this happens while $\omega_r \gg H$ so that $q/a \gg \omega_r \gg H$. At
this time $T_{dm} \approx T$ and we can write the anisotropic stress as
\[
\pi_q(t) = -nT\int_{t_1}^{t}dt'\,\dot h_q(t')\exp\left[-2\int_{t'}^{t}dt''\,\omega_r(t'')\right]. \tag{134}
\]

We can break up the integral into a contribution from the initial time $t_1$ to some time $t_\star$ when
$q/a \gg \omega_r \gg H$ and a contribution from $t_\star$ to the time of interest $t$
\[
\pi_q(t) = -nT\int_{t_1}^{t_\star}dt'\,\dot h_q(t')\exp\left[-2\int_{t'}^{t_\star}dt''\,\omega_r(t'')\right]\exp\left[-2\int_{t_\star}^{t}dt''\,\omega_r(t'')\right]-nT\int_{t_\star}^{t}dt'\,\dot h_q(t')\exp\left[-2\int_{t'}^{t}dt''\,\omega_r(t'')\right]. \tag{135}
\]

The first term on the right hand side is then exponentially suppressed by the last factor provided
$t$ is at least a few $1/\omega_r$ after $t_\star$, and we can use the same trick as in equation (29) to perform
the integral on the second line because $q/a \gg \omega_r \gg H$ for all $t'$. The equation of motion of the
gravitational waves is then
\[
\ddot h_q(t)+3H\dot h_q(t)+\frac{q^2}{a^2}h_q(t) = -16\pi G\,nT\left[h_q(t)-h_q(t_\star)\exp\left(-2\int_{t_\star}^{t}dt''\,\omega_r(t'')\right)\right]. \tag{136}
\]

As long as $\omega_r \gg H$, collisions rapidly erase the second term on the right hand side and the
equation simplifies to the homogeneous equation
\[
\ddot h_q(t)+3H\dot h_q(t)+\omega^2h_q(t) = 0 \quad\text{with}\quad \omega^2 = \frac{q^2}{a^2}+\frac{32\pi GE}{3}, \tag{137}
\]
where $E$ is the proper density of kinetic energy $E = 3nT/2$. So once $q/a \gg \omega_r$, the only
effect is the modified dispersion relation. We can then compute the phase shift caused by this
modification throughout cosmic history as
\[
\Delta\varphi = \int_{t_{kd}}^{t_0}dt\,\frac{16\pi GE}{3q/a(t)} \ll \frac{a_0H_0}{q} \ll 1, \tag{138}
\]

where $t_0$ denotes the present time. We see that even for primordial gravitational waves that
entered the horizon before kinetic decoupling the modification to the dispersion relation has no
observable effect.
From this discussion, we see that modes are not significantly affected either at early times
when $q/a \ll \omega_r$ or once $q/a \gg \omega_r$. What remains is to compute the effect of collisions around the
time when $q/a \approx \omega_r$. For this purpose it is convenient to introduce the independent variable $x =
a/a_{kd}$ and to define the Hubble rate at kinetic decoupling such that $H_{kd} \equiv H(t_{kd}) = 2\omega_r(t_{kd})$.
In this case equation (133) becomes
\[
h_q''(x)+\frac{2}{x}h_q'(x)+\kappa^2h_q(x) = -\frac{6nT_{dm}x^2}{\rho_{kd}}\int_{x_1}^{x}dz\,h_q'(z)\exp\left[-\int_{z}^{x}dz'\,z'\,\hat\omega(z')\right], \tag{139}
\]

where $\hat\omega(y(t)) = \omega_r(t)/\omega_r(t_{kd})$, $\kappa = q/a_{kd}H_{kd}$ and $\rho_{kd}$ is the energy density when $H(t_{kd}) =
2\omega_r(t_{kd})$. This equation neglects the effect introduced by the change in the number of relativistic
degrees of freedom on the expansion rate studied in [10] because we are interested in small
corrections introduced to the standard calculation of the gravitational wave spectrum by the
velocity dispersion of the dark matter particles. We have set $T = T_{dm}$ in the exponential
because as we will see the effect of collisions on modes that enter before kinetic decoupling is
most significant around the time when the wave number of the gravitational wave is comparable
to $\omega_r$, which occurs before kinetic decoupling when $T \approx T_{dm}$. The integral on the right hand
side receives negligible contributions at early times when the modes are frozen and we can set
$x_1 = 0$.
We will keep ω̂(y) general for now, but it may be helpful to know what behavior we expect.
If the interactions between the dark matter particles and the standard model are controlled
by a single operator, the dark matter is non-relativistic and the standard model particles are
relativistic, the rate scales like ωr ∝ T^{4+β}. The value of β is determined by the form of the
interactions between dark matter and the standard model. An interaction between a non-
relativistic scalar or fermionic dark matter particle and a relativistic scalar through a dimension
four and five operator, respectively, would correspond to β = 0, β = 2 would describe a non-
relativistic, fermionic dark matter particle interacting with a relativistic fermion through a
dimension six operator, etc.
The anisotropic stress is proportional to the fraction of the energy density stored in kinetic
energy of the dark matter particles, which is small both because the dark matter particles are
non-relativistic at the time of interest and because the universe is radiation dominated, justifying
a perturbative treatment. Using the mode functions (63), (64), and the Green’s function (65),
the leading order solution is given by

\[
h_q(x) = h_q^o \left(1 + C(x)\right) h_q^1(x) + h_q^o\, D(x)\, h_q^2(x) , \tag{140}
\]

with the functions


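\[
C(x) = -\int_{x_1}^{x} dy\, \frac{6 n T_{\rm dm}\, y^4}{\rho_{\rm kd}}\, \kappa\, h_q^2(y) \int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] , \tag{141}
\]
\[
D(x) = \int_{x_1}^{x} dy\, \frac{6 n T_{\rm dm}\, y^4}{\rho_{\rm kd}}\, \kappa\, h_q^1(y) \int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] . \tag{142}
\]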
Introducing the dark matter kinetic energy density at kinetic decoupling Ekd and recalling
that the temperature of the dark matter particles obeys equation (99), we find

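\[
C(x) = -\frac{4E_{\rm kd}\,\kappa}{\rho_{\rm kd}} \int_{x_1}^{x} dy\, y\, \tau_{\rm dm}(y)\, h_q^2(y) \int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] , \tag{143}
\]
\[
D(x) = \frac{4E_{\rm kd}\,\kappa}{\rho_{\rm kd}} \int_{x_1}^{x} dy\, y\, \tau_{\rm dm}(y)\, h_q^1(y) \int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] , \tag{144}
\]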

where τdm = Tdm/Tkd is the solution of the differential equation

\[
\frac{1}{y^3} \frac{d}{dy}\left( y^2\, \tau_{\rm dm}(y) \right) = \hat\omega \left( \frac{1}{y} - \tau_{\rm dm} \right) , \tag{145}
\]

that approaches τdm(y) → y^{-1} before kinetic decoupling when y ≪ 1. After kinetic decoupling
the right hand side of the equation is negligible and the temperature of the dark matter particles
redshifts like y^{-2}. Notice that here Tkd is the temperature of the standard model particles at
kinetic decoupling so that Ekd ≡ n(tkd)Tkd differs from the kinetic energy density in the dark
matter particles at decoupling by a factor τdm(1).
We can think of C(x) as a change to the amplitude of the mode caused by collisions whereas
D(x) corresponds to the phase shift generated by them. Writing Ekd = 3ρm,kd vkd^2/2, we see that
the effect is suppressed both because the velocity at decoupling for cold dark matter is of order
10^{-2} − 10^{-3} and because decoupling typically happens deep in the radiation dominated era so
that ρm,kd ≪ ρkd.
For modes that enter the horizon long before kinetic decoupling κ ≫ 1, and we already know
from our earlier discussion that C and D do not receive significant contributions from very early
or late times. We are interested in their behavior when (q/a) ≈ ωr or y ω̂ ≈ κ, when y ω̂ ≫ 1
and y ≪ 1. In this case the integral over z is dominated by z ≈ y. Provided (y ω̂)′ ≪ (y ω̂)^2 we
can change variables to z = y + u and approximate the integral by expanding the argument of
the exponential to leading order in u
\[
\int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] \approx \int_{-\infty}^{0} du\, h_q^{1\prime}(y+u) \exp\left[u\, y\, \hat\omega(y)\right] . \tag{146}
\]

Expanding everywhere but in the trigonometric functions in h_q^{1′}(z) to leading order in u, this
leads to the following expression for κ ≫ 1

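\[
\int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] \approx \frac{\kappa\left(1 + y^2 \hat\omega\right)\cos(\kappa y) + y\left(\kappa^2 - \hat\omega\right)\sin(\kappa y)}{\kappa\, y^2 \left(\kappa^2 + y^2 \hat\omega^2\right)} . \tag{147}
\]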
For large enough y an additional constant contribution arises from a saddle point. However,
this contribution decays rapidly for large κ and in any case does not contribute once integrated
against the oscillatory mode functions. So we will ignore it and work with (147).

[Figure 4 panels. Left: C(x) ρkd/(4Ekd) plotted against x. Right: D(x) ρkd/(4Ekd) plotted against x.]

Figure 4: Left: Comparison of equation (148) (red) with the results of a numerical calculation
(orange). Right: Comparison of equation (149) (blue) with the results of a numerical calculation
(green). The small oscillatory contributions were neglected in the analytic calculation because
we were interested in the asymptotic behavior.

Given equation (147) we can easily find the dominant contributions to C(x) and D(x).
Neglecting the suppressed oscillatory contributions, before kinetic decoupling when τdm ≈ y^{-1},
we find
\[
C(x) \approx -\frac{4E_{\rm kd}}{\rho_{\rm kd}} \int_0^x dy\, \frac{1 + y^2 \hat\omega}{2 y^3 \left(\kappa^2 + y^2 \hat\omega^2\right)} , \tag{148}
\]
\[
D(x) \approx \frac{4E_{\rm kd}}{\rho_{\rm kd}} \int_0^x dy\, \frac{\kappa^2 - \hat\omega}{2 \kappa\, y^2 \left(\kappa^2 + y^2 \hat\omega^2\right)} . \tag{149}
\]
As expected, the dominant contribution to the integrals arises when κ ≈ y ω̂ or equivalently
q/a ≈ ωr.
Let us first consider the phase shift. Provided ω̂ decays more rapidly than y^{-1}, the phase
shift at late times, when κy ≫ 1, behaves like
\[
D(x) \approx \frac{4E_{\rm kd}}{\rho_{\rm kd}} \int^x dy\, \frac{1}{2 \kappa\, y^2} , \tag{150}
\]
independent of the detailed behavior of ω̂ and consistent with the definition of the phase shift
in equation (138) valid for q/a ≫ ωr.
Turning to the effect on the amplitude, the sign of C(x) is negative so that gravitational
waves are damped around the time when q/a ≈ ωr as expected. We show a comparison of a
numerical computation with these results in Figure 4 for a representative wave number of κ = 40
and for a rate that scales like a power law ω̂(y) = y^{-(4+β)} with β = 2.

[Figure 5 panels. Top left: Cω ρkd/(4Ekd) plotted against κ. Top right: Dω ρkd/(4Ekd) plotted against κ. Bottom: Cω ρkd/(4Ekd) plotted against κ on logarithmic axes.]

Figure 5: Top: Numerical calculation of the damping (left) and phase shift (right) acquired by
gravitational waves around the time of kinetic decoupling of the dark matter particles from the
standard model. Bottom: Comparison of the numerical computation (orange) with the analytic
results described in the text for κ  1 (dashed green) and for κ  1 (dashed red).

Continuing with ω̂(y) = y^{-(4+β)} for concreteness, we see that the amount of damping
experienced around the time when q/a ≈ ωr scales like κ^{-(2+β)/(3+β)}. For β = 2, for example,
we find that gravitational waves with κ ≫ 1 are damped by an amount

\[
C_\omega \approx -\frac{2\pi E_{\rm kd}}{5^{5/4}\, \varphi^{1/2}\, \rho_{\rm kd}\, \kappa^{4/5}} \left[ 1 + \frac{\varphi}{\kappa^{4/5}} \right] \quad \text{with} \quad \varphi = \frac{1 + \sqrt{5}}{2} . \tag{151}
\]
This result is compared with a numerical calculation in Figure 5. We see that the power spec-
trum of primordial gravitational waves carries information both about when kinetic decoupling
occurs and about the type of interactions of the dark matter with the standard model.
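The closed form (151) can be cross-checked against the integral (148). The following is a minimal sketch, not the authors' code: it assumes ω̂(y) = y^{-6} (β = 2), takes τdm ≈ 1/y over the whole integration range, and extends the upper limit x to infinity, which is reasonable only for κ ≫ 1. The function names and the Simpson-rule discretization are illustrative choices.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio appearing in (151)


def c_omega_numeric(kappa, n_steps=20000):
    """C_omega * rho_kd / E_kd from (148) with omega_hat = y**-6 and x -> infinity."""

    def integrand(t):
        # Map y in [0, inf) onto t in [0, 1) via y = t / (1 - t).
        if t >= 1.0:
            return 0.0  # the mapped integrand vanishes as t -> 1
        y = t / (1.0 - t)
        # (1 + y^2 w) / (2 y^3 (kappa^2 + y^2 w^2)) with w = y^-6,
        # multiplied through by y^10 to avoid negative powers.
        f = (y**7 + y**3) / (2.0 * (1.0 + kappa**2 * y**10))
        return f / (1.0 - t) ** 2  # Jacobian dy/dt

    h = 1.0 / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return -4.0 * total * h / 3.0


def c_omega_closed_form(kappa):
    """Right hand side of (151), in units of E_kd / rho_kd."""
    lead = 2.0 * math.pi / (5.0**1.25 * math.sqrt(PHI) * kappa**0.8)
    return -lead * (1.0 + PHI / kappa**0.8)


if __name__ == "__main__":
    for kappa in (10.0, 40.0, 200.0):
        print(kappa, c_omega_numeric(kappa), c_omega_closed_form(kappa))
```

Under these assumptions the two functions should agree to the accuracy of the quadrature, and the leading κ^{-4/5} behavior is the β = 2 case of the κ^{-(2+β)/(3+β)} scaling quoted in the text.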
Our discussion did not crucially rely on the assumption that the collisions are between dark
matter particles and standard model particles and readily extends to models of interacting dark
matter. In the presence of dark matter self-interactions, ω̂ in the exponentials of equations (143)
and (144) should be replaced by the total rate at which collisions transfer energy between dark
matter particles, either by collisions with the standard model particles or by self-interactions.
Elastic self-interactions do not affect the temperature evolution, and the rate in equation (145)
that controls the dark matter temperature evolution remains the rate associated with elastic
interactions with the standard model unless there are number changing interactions in the dark
sector, such as 3 → 2 processes, or the dark sector contains several degrees of freedom.
Self-interactions lead to additional collisions which will isotropize the distribution function
of dark matter particles more rapidly. This reduces the anisotropic stress and the effect of dark
matter on gravitational waves. Besides this general expectation, any discussion of dark matter
self-interactions is highly model-dependent, and we will not attempt to classify all possible
models. Instead, we content ourselves with a simple concrete example to illustrate that self-
interactions may also leave imprints on the gravitational wave spectrum, and imagine a scenario
in which the dark matter undergoes elastic self-interactions. The thermally averaged cross-
section for elastic scattering of non-relativistic particles is constant, leading to a contribution
to the relaxation rate that redshifts like the density of dark matter particles, y −3 . As we saw
earlier, the contribution to the relaxation rate from interactions with standard model particles
redshifts faster by at least one power of y. For example, if interactions between the dark
matter and the standard model are controlled by a four-fermion interaction, they lead to a
contribution to the relaxation rate that redshifts like y −6 . Here three powers of the scale factor
arise because the density of standard model particles redshifts like y −3 , two powers arise from
the thermally averaged cross section, and the last power of the scale factor arises because it
takes m/T collisions to transfer energies of order T in collisions of the non-relativistic dark
matter with the relativistic standard model particles. After the dark matter has frozen out, the
number density of standard model particles is exponentially larger than the number density of
dark matter particles so that the relaxation rate would presumably initially be dominated by
scattering of the dark matter particles with standard model particles. However, because the
contribution to the relaxation rate from collisions with the standard model particles redshifts
more rapidly as the universe expands, the contributions from dark matter self-interactions would
dominate below a certain temperature. The power spectrum of primordial gravitational waves
would then contain information about the interactions with the standard model particles or the
self-interactions depending on whether q/a ≈ ω occurs when the interactions with the standard model
particles or the self-interactions dominate the relaxation rate. In this example the evolution of
the dark matter temperature remains unchanged, and our results such as (148), (149) directly
apply. As long as ω̂ is a superposition of power laws, away from the transition region even the
scaling of Cω with κ we derived for a single power law can be used. In models that modify the
evolution of the dark matter temperature some additional work is required, but this is in principle
straightforward as well. We see that gravitational waves carry a great deal of information about
the properties of dark matter. The only problem is that the effects are hopelessly small.
We are now also in a position to justify the statement we made in our discussion of long modes
for which κ ≪ 1, namely that the change in amplitude and phase acquired around the time of
kinetic decoupling are much smaller than the contributions acquired after kinetic decoupling. To
see this we consider the behavior of the amplitude and phase at a time after kinetic decoupling,
but early enough so that the modes are still outside the horizon because this is when we started
the computation for the long modes. At this time the arguments of the trigonometric functions
for long modes are small and equations (143) and (144) become

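\[
C(x) \approx \frac{4E_{\rm kd}\,\kappa^2}{3\rho_{\rm kd}} \int_{x_1}^{x} dy\, \tau_{\rm dm}(y) \int_0^y dz\, z \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] , \tag{152}
\]
\[
D(x) \approx -\frac{4E_{\rm kd}\,\kappa^3}{3\rho_{\rm kd}} \int_{x_1}^{x} dy\, y\, \tau_{\rm dm}(y) \int_0^y dz\, z \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] . \tag{153}
\]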

To make contact with our discussion of long modes, we need C(x) and D(x) sufficiently long
after decoupling but before horizon entry. We can write them as

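\[
C(x) = C_\omega + \frac{2E_{\rm kd}\,\tau_{\rm kd}\,\kappa^2}{3\rho_{\rm kd}}\, x \quad \text{and} \quad D(x) = D_\omega(x) - \frac{E_{\rm kd}\,\tau_{\rm kd}\,\kappa^3}{3\rho_{\rm kd}}\, x^2 , \tag{154}
\]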

where Cω and Dω (x) are given by

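\[
C_\omega = \frac{4E_{\rm kd}\,\kappa^2}{3\rho_{\rm kd}} \int_{x_1}^{x} dy\, \tau_{\rm dm}(y) \int_0^y dz\, z \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] - \frac{2E_{\rm kd}\,\tau_{\rm kd}\,\kappa^2}{3\rho_{\rm kd}}\, x , \tag{155}
\]
\[
D_\omega(x) = -\frac{4E_{\rm kd}\,\kappa^3}{3\rho_{\rm kd}} \int_{x_1}^{x} dy\, y\, \tau_{\rm dm}(y) \int_0^y dz\, z \exp\left[-\int_z^y dz'\, z'\, \hat\omega(z')\right] + \frac{E_{\rm kd}\,\tau_{\rm kd}\,\kappa^3}{3\rho_{\rm kd}}\, x^2 . \tag{156}
\]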

Here τkd is defined through the behavior of the dark matter temperature at late times, which
according to equation (145) is
\[
\tau_{\rm dm}(y) \to \frac{\tau_{\rm kd}}{y^2} \quad \text{for } y \gg 1 . \tag{157}
\]
To see that Cω is indeed independent of x, note that after kinetic decoupling the exponential
approaches unity, so that the terms linear in x cancel and the remainder is
finite. As we mentioned in our discussion of long modes, the term in C(x) linear in x ensures
that there is no dependence on the time at which we match onto the collisionless description.
Unlike for intermediate modes for which the dominant contribution to Cω arises when q/a ∼ ωr,
the dominant contribution here arises around kinetic decoupling, and we see that Cω universally
scales like κ^2.
The additional factor of y in the integral for Dω(x) introduces a logarithmic dependence on
x that is absent in the collisionless description. As a consequence, unlike Cω, the phase receives
contributions until horizon crossing. Equation (156) implies that the contribution from the time
around kinetic decoupling universally scales like κ^3. The presence of two powers of κ in the
denominators of the mode functions in (144) implies that the contribution from horizon entry
scales like κ and dominates.
In the model with ω̂ = y^{-(4+β)}, the solution to equation (145) can be found explicitly in
terms of incomplete Γ-functions, and by taking the late time limit we see that the constant in
equation (157) is given by
\[
\tau_{\rm kd} = (2+\beta)^{-\frac{1}{2+\beta}}\; \Gamma\!\left( \frac{1+\beta}{2+\beta} \right) . \tag{158}
\]
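As a rough numerical check of (158), equation (145) can be integrated directly for the power-law rate ω̂ = y^{-(4+β)}. The sketch below is illustrative rather than the authors' method: it evolves u = y^2 τdm with a fixed-step fourth-order Runge-Kutta scheme, starts from the tightly coupled solution τdm = 1/y at an assumed starting point y_start, and reads off u at an assumed large y_end as an estimate of τkd. The default step size and end points are tuned for β = 2 only; the equation is stiff at small y, so other values of β would need different choices.

```python
import math


def tau_kd_numeric(beta=2.0, y_start=0.25, y_end=50.0, h=5e-4):
    """Estimate tau_kd = lim y^2 tau_dm(y) by integrating (145) for omega_hat = y^-(4+beta)."""

    def rhs(y, u):
        # With u = y^2 tau_dm, (145) reads du/dy = y^3 omega_hat (1/y - u / y^2).
        return y ** (-(2.0 + beta)) - y ** (-(3.0 + beta)) * u

    y, u = y_start, y_start  # tight coupling: tau_dm = 1/y, so u = y
    n_steps = int(round((y_end - y_start) / h))
    for _ in range(n_steps):
        k1 = rhs(y, u)
        k2 = rhs(y + 0.5 * h, u + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h, u + 0.5 * h * k2)
        k4 = rhs(y + h, u + h * k3)
        u += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        y += h
    return u


def tau_kd_closed_form(beta=2.0):
    """Right hand side of (158)."""
    return (2.0 + beta) ** (-1.0 / (2.0 + beta)) * math.gamma((1.0 + beta) / (2.0 + beta))


if __name__ == "__main__":
    print(tau_kd_numeric(2.0), tau_kd_closed_form(2.0))
```

For β = 2 the residual difference is dominated by stopping at finite y_end, since y^2 τdm approaches its limit like y^{-(1+β)}.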
Approximating the integrand of the y-integral in Cω by its asymptotic forms
2
 
− 2+β

β
1 (2 + β) Γ 2+β
y 3+β for y ≤ yc , and τkd 1 −  for y > yc , (159)
2 y2

with τ  1
kd 3+β
yc = , (160)
2
we find
"  1 #
2Ekd τkd κ2 1  
3 + β  τkd  3+β − 2 2 3+β β
Cω = − + (2 + β) 3+β Γ . (161)
3ρkd 4+β 2 τkd 2+β
(162)

The phase Dω (x) can be evaluated in the same way, but as we discussed the contribution from
kinetic decoupling is suppressed by two powers of κ compared to the dominant contribution
arising at horizon crossing and we will not give it here.
The variables used here and in the discussion of the long modes are related according to
Ekd τkd κ
κ = . (163)
ρkd
For u? = κx?  1 the constants A and B in equation (67) can then be written as

2u? u2?
A ≈ 1 + κ + Cω and B ≈ −κ . (164)
3 3
For modes that obey (q/a)v/H ≪ 1 around the time of kinetic decoupling so that κ ≲ 1/vkd,
equations (143) and (144) and our discussion here are valid throughout. For a typical WIMP
this corresponds to frequencies below ∼ 10^{-9} Hz today. For modes with shorter wavelengths
we must understand whether higher multipoles may become excited. To gain some intuition we
will make the simplifying assumption that the relaxation rates for all n and ℓ are identical to
those for n = 0 and ℓ = 2. This is equivalent to working in the relaxation time approximation. In
this case the derivation from section III goes through essentially unchanged and the anisotropic
stress is given by
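\[
\begin{aligned}
\pi_q(t) = {} & \frac{\pi}{4 m a^5(t)} \int_0^\infty p^5\, dp\; n'(p) \int_{-1}^{+1} (1-\mu^2)^2\, d\mu \\
& \times \int_{t_1}^{t} dt'\, \dot h_q(t') \exp\left[-\int_{t'}^{t} dt''\, \frac{i q p \mu}{a^2(t'')\, m}\right] \exp\left[-2\int_{t'}^{t} dt''\, \omega_r(t'')\, \frac{T(t'')}{T_{\rm dm}(t'')}\right] .
\end{aligned} \tag{165}
\]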

As before, the equation of motion at late times when q/a ≫ ωr, H is given by equation (137),
and we only have to follow the evolution of the mode until q/a ≫ ωr ≫ H to find the appropriate
initial conditions for this equation.
To find the expression for the anisotropic stress valid from horizon entry until q/a ≫ ωr ≫ H,
we can proceed as before and approximate the second line of equation (165) as
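\[
\int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\frac{i\kappa p \mu}{m a_{\rm kd}} \ln\frac{y}{z} - \int_z^y dz'\, z'\, \hat\omega(z')\right] \approx \int_{-\infty}^{0} du\, h_q^{1\prime}(y+u) \exp\left[ i\, \frac{\kappa p \mu u}{m a} \right] \exp\left[u\, y\, \hat\omega(y)\right] . \tag{166}
\]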
The integral on the right hand side only receives significant contributions for |u| < 1/κ so that
the argument of the first exponential is of order the dark matter velocity v around the time when
q/a ≈ ωr . Furthermore, because of the integration over µ in equation (165) only even powers in
µ contribute so that the leading correction occurs at second order in the dark matter velocity,
implying that the damping of the amplitude and the phase shift for all intermediate modes are
well approximated by equations (148) and (149). Furthermore, for all modes that enter after
the dark matter particles have become non-relativistic, q/a ≈ ωr occurs after freeze-out so that
annihilations can be neglected around this time.

Short modes
We now turn to modes that enter the horizon when the dark matter is still relativistic. While
detailed modeling of the collision terms describing the scattering of relativistic dark matter
particles with the standard model is possible, it is significantly more tedious than in the non-
relativistic limit, and we continue with the simplifying assumption that relaxation rates for all
n and ℓ are equivalent to those for n = 0 and ℓ = 2. In this case the anisotropic stress is
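\[
\begin{aligned}
\pi_q(t) = {} & \frac{\pi}{4 a^5(t)} \int_0^\infty p^5\, dp\; \frac{n'(p)}{\sqrt{m^2 + p^2/a^2(t)}} \int_{-1}^{+1} (1-\mu^2)^2\, d\mu \\
& \times \int_{t_1}^{t} dt'\, \dot h_q(t') \exp\left[-\int_{t'}^{t} dt''\, \frac{i q p \mu}{a^2(t'') \sqrt{m^2 + p^2/a^2(t'')}}\right] \exp\left[-2\int_{t'}^{t} dt''\, \omega(t'')\right] ,
\end{aligned} \tag{167}
\]
with ω(t) now the collision rate including both elastic and inelastic processes.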
Short modes naturally subdivide into two classes, one for which the dark matter is still
relativistic and one for which it is non-relativistic when q/a ≈ ω. For a typical WIMP, the
boundary between these classes corresponds to modes with a frequency of 10^4 Hz today, so that
for all planned interferometer experiments it is sufficient to focus on modes for which the dark
matter is already non-relativistic when q/a ≈ ω. As we will see, the dominant contributions for
these modes arise during two periods, the first around the time when the dark matter becomes
non-relativistic, and the second when q/a ≈ ω. Scattering is very rapid during both periods and
we expect (167) to provide a very good approximation.
From the discussion of intermediate modes, we know that the equation of motion for
gravitational waves when q/a ≫ ω, qv/a is given by equation (137). What remains is to find the
initial conditions for this equation or equivalently the amplitude and phase shift. As before we
will make use of the fact that the dark matter distribution approaches its equilibrium value on
time scales short compared to the expansion of the universe and the integral over t′ receives its
dominant contribution near the upper limit. Using the same notation as for the intermediate
modes, we can approximate
\[
\int_0^y dz\, h_q^{1\prime}(z) \exp\left[-\int_z^y dz'\, \frac{i\kappa p \mu}{z' a_{\rm kd} \sqrt{m^2 + p^2/(z'^2 a_{\rm kd}^2)}} - \int_z^y dz'\, z'\, \hat\omega(z')\right] \tag{168}
\]
\[
\approx \int_{-\infty}^{0} du\, h_q^{1\prime}(y+u) \exp\left[ i\, \frac{\kappa p \mu u}{a \sqrt{m^2 + p^2/a^2}} \right] \exp\left[u\, y\, \hat\omega(y)\right] . \tag{169}
\]

The integral over u only receives significant contributions for |u| of order 1/(y ω̂(y)), which is of
order 1/κ when q/a ≈ ω. This implies that the argument of the first exponential is
of order the dark matter velocity at this time and hence small for the modes of interest. The
integration over µ implies that the leading contribution arises at second order in the velocities
and we will ignore these corrections. At earlier times y ω̂(y) ≫ κ so that the argument is further
suppressed then, and we can approximate the anisotropic stress by
\[
\pi_q(t) = \frac{4\pi}{15\, a^5(t)} \int_0^\infty p^5\, dp\; \frac{n'(p,t)}{\sqrt{m^2 + p^2/a^2(t)}} \int_{-\infty}^{0} du\, h_q^{1\prime}(y(t)+u) \exp\left[u\, y(t)\, \hat\omega(y(t))\right] . \tag{170}
\]

As long as y ω̂ ≫ κ, which is the case for the short modes of interest until the dark matter has
become non-relativistic, we can neglect u in h_q^{1′} and the equation of motion for gravitational
waves becomes
\[
h_q''(x) + \left( \frac{2}{x} + \gamma(x) \right) h_q'(x) + \kappa^2 h_q(x) = 0 , \tag{171}
\]
with
d3 p p2 (4E 2 + m2 )
Z
2
γ(x) = n(p, t) , (172)
5ρ(x)x3 ω̂(x)
(2π)3 (xakd )5 E 3
p
where ρ(x) is the total energy density and E = m2 + p2 /a2 . Either treating the additional
damping term as a perturbation and using the Green’s function (65) or using the WKB approx-
imation, we find that the damping of the amplitude is independent of wave number and is given
by
x
p2 (4E 2 + m2 )
Z Z
4 fdm (y) 1
C(x) = − dy with fdm (y) = d3 p n(p, t) . (173)
5 0 y 3 ω̂(y) 4ρ(y) (yakd )5 E 3

At early times when the dark matter is relativistic, fdm is time-independent and corresponds to
the fraction of the energy density stored in dark matter. As the temperature drops below the
mass of the dark matter particles, fdm (y) decreases rapidly and cuts off the integral. In general,
n(p, t) follows from the freeze-out calculation based on equation (100). For scattering rates that
do not drop too rapidly, we can approximate n(p, t) by its equilibrium abundance and write

gd 30 ∞ dz z 4 (5s2 /4 + z 2 )
Z
1 m m
fdm (y) = 2 2 2 2 3/2

2 +z 2
with s= = y , (174)
g? (y) π 0 2π (s + z ) e s ±1 T Tkd

with gd counting the number of degrees of freedom in the dark matter, and g⋆(y) the usual
effective number of degrees of freedom.
If the interactions between the dark matter particles and the standard model are controlled
by a single operator, we expect ω̂(y) = (m/Tkd) y^{-(3+β)}. In this case the amount of damping
experienced around the time when the dark matter becomes non-relativistic is given by
\[
C_{\rm nr} = -\frac{4}{5}\, \frac{g_d}{g_{\star,m}} \left( \frac{T_{\rm kd}}{m} \right)^{2+\beta} F(\beta) , \tag{175}
\]

where g⋆,m is the effective number of relativistic degrees of freedom around the time when the
dark matter particles become non-relativistic, and F(β) is a function that only depends on β
and can readily be evaluated numerically. In our example of β = 2, it takes the value F(2) ≈ 12.
For larger values of β, n(p, t) should be obtained using equation (100). Since Tkd/m is the square
of the dark matter velocity at kinetic decoupling, we see that the effect is rather small.
We now know the mode functions for short modes at a time when q/a is still small compared
to ω but the dark matter has already become non-relativistic. We can proceed just like for
the intermediate modes to evolve the modes until q/a ≫ ω and equation (137) describes their
evolution. The only difference is that for intermediate modes the lower limit of the integral in
equation (147) was zero whereas it is now non-zero. However, the integral is dominated by the
contribution near the upper limit so that this difference is negligible and the damping and phase
shift acquired around the time when q/a ≈ ωr are still given by equations (148) and (149).
As long as the two events are separated, the total amount of damping is simply given by
Cnr + Cω. For high frequencies the first term dominates, for low frequencies it is the second. Up
to order one factors, the transition between the regimes occurs at
\[
\kappa \sim \left( \frac{g_{\star,\rm eq}}{g_d} \right)^{\frac{3+\beta}{2+\beta}} \left( \frac{g_{\star,m}}{g_{\star,\rm kd}} \right)^{\frac{3+\beta}{2+\beta}} \left( \frac{T_{\rm eq}}{m} \right)^{\frac{3+\beta}{2+\beta}} \left( \frac{m}{T_{\rm kd}} \right)^{3+\beta} , \tag{176}
\]

with frequency independent damping above this wave number and an amount of damping that
scales like κ^{-(2+β)/(3+β)} for smaller wave numbers.

VIII. Conclusions

We have analyzed the effects of cold dark matter on the propagation of gravitational waves of
astrophysical and primordial origin. Our analysis does not suggest any way of detecting the effect
of cold dark matter on the propagation of gravitational waves from astrophysical sources
in the near future.
Primordial gravitational waves in principle contain a wealth of information about dark matter
and its interactions such as coupling strengths and the nature of the interactions. However, in
practice the effects of cold dark matter on primordial gravitational waves also appear too small to
be detectable. For the longest modes that enter after matter radiation equality, the anisotropic
stress is small because the cold dark matter is highly non-relativistic by the time of horizon entry.
The effects are largest for intermediate modes that enter the horizon around the time of kinetic
decoupling, but even then the effects are highly suppressed because the cold dark matter is non-
relativistic at this time and because the contribution to the energy density from dark matter is
small compared to that in radiation at the time of kinetic decoupling. For shorter modes, the
effects are suppressed because collisions rapidly drive the system toward local equilibrium.
Unlike cold dark matter, particles that decouple when they are relativistic have a signif-
icant effect on primordial gravitational waves. Modes that enter after kinetic decoupling are
damped [5]. The spectrum of primordial gravitational waves on scales that enter the horizon
around the time of kinetic decoupling contains information about the interactions. However, for
neutrinos, the only particles known to decouple relativistically, kinetic decoupling is imprinted
on modes with frequencies that are too high to be accessible in the CMB and too low for pulsar
timing arrays.

Acknowledgments

We are grateful for helpful conversations with Richard Matzner and Paul Shapiro. R.F. was
supported in part by the Alfred P. Sloan Foundation, the Department of Energy under Grant
No. DE-SC0009919, and a grant from the Simons Foundation/SFARI 560536. S.W. was supported
by the National Science Foundation under Grant Number PHY-1620610 and by The Robert A.
Welch Foundation, Grant No. F-0014.

APPENDIX A: Boltzmann Hierarchy

In this appendix we provide the derivation of the Boltzmann hierarchy (123) from the linearized
Boltzmann equation (103). As we explained in section VII, the form of the mode expansion
for the gravitational field given in equation (53) implies that equation (103) only depends on
the direction of the momentum of the dark matter particles through µ = p̂ · q̂ and eij (q̂, λ)p̂i p̂j .
For isotropic initial conditions, the perturbation to the phase space density of the dark matter
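particles introduced by the gravitational wave must then be of the form (120). For this Ansatz
equation (103) becomes a differential equation for Δ̃(q, p, µ, t)

\[
\begin{aligned}
\dot{\tilde\Delta}(q,p,\mu,t) & + \frac{i p q \mu}{a^2 m}\, \tilde\Delta(q,p,\mu,t) - \frac{1}{2}\, \dot h_q(t)\, p\, \frac{\partial}{\partial p} n(p,t) = -2\omega_a(t)\, \tilde\Delta(q,p,\mu,t) \\
& + \omega_r(t) \left[ 3\tilde\Delta(q,p,\mu,t) + p\, \frac{\partial}{\partial p} \tilde\Delta(q,p,\mu,t) + a^2 m T \left( \frac{\partial^2}{\partial p^2} + \frac{2}{p} \frac{\partial}{\partial p} - \frac{1}{p^2} D^2 \right) \tilde\Delta(q,p,\mu,t) \right] ,
\end{aligned} \tag{177}
\]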

with the operator D2 given by

\[
D^2 = -(1-\mu^2)\, \frac{\partial^2}{\partial\mu^2} + 6\mu\, \frac{\partial}{\partial\mu} + 6 . \tag{178}
\]

We will eventually expand in terms of eigenfunctions of D^2 and the differential operator
appearing on the right hand side. Since it involves T instead of Tdm, one would have to keep a large
number of the eigenfunctions when Tdm ≪ T. In an attempt to ameliorate this, we will work
with the fractional perturbation ∆(q, p, µ, t) defined by

\[
\tilde\Delta(q,p,\mu,t) = \Delta(q,p,\mu,t)\, p\, \frac{\partial}{\partial p} n(p,t) . \tag{179}
\]

For simplicity, let us drop the first term on the right hand side because ωa ≪ ωr when the dark
matter particles are non-relativistic. We will restore it later. In this case the equation becomes

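\[
\begin{aligned}
\dot\Delta(q,p,\mu,t) & + \frac{i p q \mu}{a^2 m}\, \Delta(q,p,\mu,t) - \frac{1}{2}\, \dot h_q(t) = \\
& \omega_r(t) \left[ -\frac{2T}{T_{\rm dm}}\, \Delta(q,p,\mu,t) - \left( \frac{2T}{T_{\rm dm}} - 1 \right) p\, \frac{\partial}{\partial p} \Delta(q,p,\mu,t) \right. \\
& \left. + a^2 m T \left( \frac{\partial^2}{\partial p^2} + \frac{6}{p} \frac{\partial}{\partial p} + \frac{6}{p^2} - \frac{1}{p^2} D^2 \right) \Delta(q,p,\mu,t) \right] .
\end{aligned} \tag{180}
\]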

Our goal will be to turn this partial differential equation into a hierarchy of coupled ordinary
differential equations by constructing the eigenfunctions of the differential operator on the right
hand side and rely on the orthogonality of eigenfunctions with different eigenvalues. The
eigenfunctions of the operator D^2 with appropriate boundary conditions are

\[
\mathcal{P}_\ell(\mu) = \frac{P_\ell^2(\mu)}{1 - \mu^2} , \tag{181}
\]

where P_ℓ^m(µ) are associated Legendre polynomials. These functions are eigenfunctions of D^2
with eigenvalue ℓ(ℓ + 1)
\[
D^2\, \mathcal{P}_\ell(\mu) = \ell(\ell+1)\, \mathcal{P}_\ell(\mu) , \tag{182}
\]

and obey the orthogonality relation

\[
\int_{-1}^{1} d\mu\, (1-\mu^2)^2\, \mathcal{P}_\ell(\mu)\, \mathcal{P}_{\ell'}(\mu) = \frac{2(\ell+2)!}{(2\ell+1)(\ell-2)!}\, \delta_{\ell\ell'} . \tag{183}
\]

For ℓ = 2 we simply have

\[
\mathcal{P}_2(\mu) = 3 , \tag{184}
\]

so that the orthogonality relation also implies

\[
\int_{-1}^{1} d\mu\, (1-\mu^2)^2\, \mathcal{P}_\ell(\mu) = \frac{16}{5}\, \delta_{\ell 2} . \tag{185}
\]
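The relations (183) to (185) can be checked with exact rational arithmetic. The sketch below is illustrative and assumes the convention P_ℓ^2(µ) = (1 − µ^2) d^2P_ℓ/dµ^2 for the associated Legendre polynomials, so that the functions in (181) are simply second derivatives of ordinary Legendre polynomials; this choice reproduces (184). All function names are hypothetical helpers, and polynomials are stored as coefficient lists with the lowest power first.

```python
from fractions import Fraction
from math import factorial


def legendre(ell):
    """Coefficients of the Legendre polynomial P_ell from Bonnet's recurrence."""
    p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]
    if ell == 0:
        return p_prev
    for n in range(1, ell):
        # (n + 1) P_{n+1} = (2n + 1) mu P_n - n P_{n-1}
        shifted = [Fraction(0)] + p
        padded = p_prev + [Fraction(0)] * (len(shifted) - len(p_prev))
        p_next = [((2 * n + 1) * a - n * b) / (n + 1) for a, b in zip(shifted, padded)]
        p_prev, p = p, p_next
    return p


def deriv(c):
    """Derivative of a polynomial given by its coefficient list."""
    return [k * c[k] for k in range(1, len(c))]


def mul(a, b):
    """Product of two polynomials."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def integrate_sym(c):
    """Exact integral of the polynomial over [-1, 1]; odd powers drop out."""
    return sum(2 * ck / (k + 1) for k, ck in enumerate(c) if k % 2 == 0)


def script_p(ell):
    """P_ell^2(mu) / (1 - mu^2) for ell >= 2 under the assumed convention."""
    return deriv(deriv(legendre(ell)))


WEIGHT = [Fraction(1), Fraction(0), Fraction(-2), Fraction(0), Fraction(1)]  # (1 - mu^2)^2


def overlap(l1, l2):
    """Left hand side of (183)."""
    return integrate_sym(mul(WEIGHT, mul(script_p(l1), script_p(l2))))


def norm_183(ell):
    """Right hand side of (183) for equal indices."""
    return Fraction(2 * factorial(ell + 2), (2 * ell + 1) * factorial(ell - 2))


if __name__ == "__main__":
    print(script_p(2), overlap(2, 2), norm_183(2))
```

Because the arithmetic is exact, equality tests on the returned fractions are meaningful for any ℓ ≥ 2.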

Furthermore, they obey the recurrence relation


`+1 `−1
µP` (µ) = P`−1 (µ) + P`+1 (µ) . (186)
2` + 1 2` + 1
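Expanding
\[
\Delta(q,p,\mu,t) = \sum_\ell (-i)^\ell (2\ell+1)\, \Delta_\ell(q,p,t)\, \mathcal{P}_\ell(\mu) , \tag{187}
\]
and using the recurrence relation (186), the orthogonality relations (183) and (185),
equation (180) becomes

\[
\begin{aligned}
\dot\Delta_\ell(q,p,t) & + \frac{p q}{(2\ell+1)\, a^2 m} \left[ (\ell+2)\, \Delta_{\ell+1}(q,p,t) - (\ell-2)\, \Delta_{\ell-1}(q,p,t) \right] + \frac{1}{30}\, \dot h_q(t)\, \delta_{\ell,2} = \\
& \omega_r(t) \left[ -\frac{2T}{T_{\rm dm}}\, \Delta_\ell(q,p,t) - \left( \frac{2T}{T_{\rm dm}} - 1 \right) p\, \frac{\partial}{\partial p} \Delta_\ell(q,p,t) \right. \\
& \left. + a^2 m T \left( \frac{\partial^2}{\partial p^2} + \frac{6}{p} \frac{\partial}{\partial p} - \frac{\ell(\ell+1) - 6}{p^2} \right) \Delta_\ell(q,p,t) \right] .
\end{aligned} \tag{188}
\]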

It would seem natural to work with the eigenfunctions of the operator on the right hand side.
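However, it turns out to be convenient to instead work with the eigenfunctions
\[
\left[ -\frac{T}{T_{\rm dm}}\, p\, \frac{\partial}{\partial p} + a^2 m T \left( \frac{\partial^2}{\partial p^2} + \frac{6}{p} \frac{\partial}{\partial p} - \frac{\ell(\ell+1) - 6}{p^2} \right) \right] \mathcal{L}_{n\ell}(z) = -(2n + \ell - 2)\, \frac{T}{T_{\rm dm}}\, \mathcal{L}_{n\ell}(z) , \tag{189}
\]
with n = 0 . . . ∞ and ℓ = 2 . . . ∞, which are given in terms of generalized Laguerre polynomials
L_n^k by
\[
\mathcal{L}_{n\ell}(z) = z^{\ell/2 - 1}\, L_n^{\ell+1/2}(z) \quad \text{with} \quad z = \frac{p^2}{2 a^2 m T_{\rm dm}} . \tag{190}
\]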
As we will see, the advantage of this basis is that z is also the argument of the exponential in
n(p, t). These functions obey the orthogonality relation
\[
\int_0^\infty dz\, z^{5/2}\, e^{-z}\, \mathcal{L}_{n\ell}(z)\, \mathcal{L}_{n'\ell}(z) = \frac{\Gamma(n + \ell + 3/2)}{n!}\, \delta_{nn'} , \tag{191}
\]

which contains the special case

\[
\int_0^\infty dz\, z^{5/2}\, e^{-z}\, \mathcal{L}_{n2}(z) = \frac{15\sqrt{\pi}}{8}\, \delta_{n0} . \tag{192}
\]
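A quick numerical check of (191) and (192) is sketched below. It is an illustration under stated assumptions, not part of the original derivation: the generalized Laguerre polynomials are built from the standard three-term recurrence, the integral is truncated at an assumed cutoff z_max, and a composite Simpson rule is used. The helper names are hypothetical.

```python
import math


def gen_laguerre(n, alpha, z):
    """Generalized Laguerre polynomial L_n^alpha(z) from the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 1.0 + alpha - z
    for k in range(1, n):
        # (k + 1) L_{k+1} = (2k + 1 + alpha - z) L_k - (k + alpha) L_{k-1}
        prev, cur = cur, ((2 * k + 1 + alpha - z) * cur - (k + alpha) * prev) / (k + 1)
    return cur


def basis(n, ell, z):
    """The function defined in (190): z^(ell/2 - 1) L_n^(ell + 1/2)(z)."""
    return z ** (ell / 2.0 - 1.0) * gen_laguerre(n, ell + 0.5, z)


def overlap(n1, n2, ell, z_max=60.0, n_steps=20000):
    """Simpson estimate of the left hand side of (191), truncated at z_max."""

    def f(z):
        return z**2.5 * math.exp(-z) * basis(n1, ell, z) * basis(n2, ell, z)

    h = z_max / n_steps  # n_steps must be even
    total = f(0.0) + f(z_max)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0


def norm_191(n, ell):
    """Right hand side of (191) for n = n'."""
    return math.gamma(n + ell + 1.5) / math.factorial(n)


if __name__ == "__main__":
    print(overlap(0, 0, 2), 15.0 * math.sqrt(math.pi) / 8.0)
```

Since the ℓ = 2, n = 0 basis function equals 1, the special case (192) is the same integral as the overlap of a general n with n′ = 0 at ℓ = 2.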

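To make use of the orthogonality relation (191) when deriving the hierarchy, we will have to use
the relations

\[
\mathcal{L}_{0\,\ell-1}(z) = z^{-1/2}\, \mathcal{L}_{0\ell}(z) , \tag{193}
\]
\[
\frac{d}{dz} \mathcal{L}_{0\ell}(z) = \frac{\ell - 2}{2z}\, \mathcal{L}_{0\ell}(z) , \tag{194}
\]
\[
\mathcal{L}_{n\,\ell+1}(z) = \left( n + \ell + \frac{3}{2} \right) z^{-1/2}\, \mathcal{L}_{n\ell}(z) - (n+1)\, z^{-1/2}\, \mathcal{L}_{n+1\,\ell}(z) , \tag{195}
\]
\[
\mathcal{L}_{n\,\ell-1}(z) = z^{-1/2}\, \mathcal{L}_{n\ell}(z) - z^{-1/2}\, \mathcal{L}_{n-1\,\ell}(z) \quad \text{for } n \geq 1 , \tag{196}
\]
\[
\frac{d}{dz} \mathcal{L}_{n\ell}(z) = \frac{2n + \ell - 2}{2z}\, \mathcal{L}_{n\ell}(z) - \frac{n + \ell + \frac{1}{2}}{z}\, \mathcal{L}_{n-1\,\ell}(z) \quad \text{for } n \geq 1 , \tag{197}
\]

which directly follow from the relations for associated Laguerre polynomials
\[
L_0^{\ell+1/2}(z) = L_0^{\ell+3/2}(z) , \tag{198}
\]
\[
\frac{d}{dz} L_0^{\ell+1/2}(z) = 0 , \tag{199}
\]
\[
z\, L_n^{\ell+3/2}(z) = \left( n + \ell + \frac{3}{2} \right) L_n^{\ell+1/2}(z) - (n+1)\, L_{n+1}^{\ell+1/2}(z) , \tag{200}
\]
\[
L_n^{\ell+1/2}(z) = L_n^{\ell+3/2}(z) - L_{n-1}^{\ell+3/2}(z) \quad \text{for } n \geq 1 , \tag{201}
\]
\[
\frac{d}{dz} L_n^{\ell+1/2}(z) = -L_{n-1}^{\ell+3/2}(z) \quad \text{for } n \geq 1 . \tag{202}
\]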
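Expanding ∆ℓ(q, p, t) in terms of these eigenfunctions
\[
\Delta_\ell(q,p,t) = \sum_n \Delta_{n\ell}(q,t)\, \mathcal{L}_{n\ell}(z) , \tag{203}
\]

substituting the expansion into equation (188), using the orthogonality relation and identities
in the appendix, as well as equation (99) in the form
\[
\frac{\dot z}{z} = -2\omega_r(t) \left( \frac{T}{T_{\rm dm}} - 1 \right) , \tag{204}
\]
we arrive at the following hierarchy of equations
\[
\begin{aligned}
\dot\Delta_{n\ell}(q,t) & + \frac{q}{(2\ell+1)\, a} \left( \frac{2T_{\rm dm}}{m} \right)^{1/2} \left[ (\ell+2) \left( n + \ell + \frac{3}{2} \right) \Delta_{n\,\ell+1}(q,t) \right. \\
& \left. - n(\ell+2)\, \Delta_{n-1\,\ell+1}(q,t) + (\ell-2)\, \Delta_{n+1\,\ell-1}(q,t) - (\ell-2)\, \Delta_{n\,\ell-1}(q,t) \right] = \\
& -\frac{1}{30}\, \dot h_q(t)\, \delta_{\ell 2}\, \delta_{n0} - (2n+\ell)\, \omega_r(t)\, \frac{T}{T_{\rm dm}}\, \Delta_{n\ell}(q,t) .
\end{aligned} \tag{205}
\]
The derivation in the presence of annihilations proceeds in the same way, and keeping them one
arrives at
\[
\begin{aligned}
\dot\Delta_{n\ell}(q,t) & + \frac{q}{(2\ell+1)\, a} \left( \frac{2T_{\rm dm}}{m} \right)^{1/2} \left[ (\ell+2) \left( n + \ell + \frac{3}{2} \right) \Delta_{n\,\ell+1}(q,t) \right. \\
& \left. - n(\ell+2)\, \Delta_{n-1\,\ell+1}(q,t) + (\ell-2)\, \Delta_{n+1\,\ell-1}(q,t) - (\ell-2)\, \Delta_{n\,\ell-1}(q,t) \right] = \\
& -\frac{1}{30}\, \dot h_q(t)\, \delta_{\ell 2}\, \delta_{n0} - (2n+\ell)\, \omega_r(t)\, \frac{T}{T_{\rm dm}}\, \Delta_{n\ell}(q,t) - 2\omega_a(t)\, \frac{n_{\rm eq}^2}{n^2}\, \Delta_{n\ell}(q,t) .
\end{aligned} \tag{206}
\]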
———

References

[1] B. P. Abbott et al. [LIGO Scientific and Virgo Collaborations], Phys. Rev. Lett. 116, no.
6, 061102 (2016) [arXiv:1602.03837 [gr-qc]].

[2] E. Calabrese, N. Battaglia and D. N. Spergel, Class. Quant. Grav. 33, no. 16, 165004 (2016)
[arXiv:1602.03883 [gr-qc]].

[3] G. Goswami, G. K. Chakravarty, S. Mohanty and A. R. Prasanna, Phys. Rev. D 95, no.
10, 103509 (2017) [arXiv:1603.02635 [hep-ph]].

[4] S. W. Hawking, Astrophys. J. 145, 544 (1966).

[5] S. Weinberg, Phys. Rev. D 69, 023503 (2004) [astro-ph/0306304].

[6] G. Baym, S. P. Patil and C. J. Pethick, Phys. Rev. D 96, no. 8, 084033 (2017)
[arXiv:1707.05192 [gr-qc]].

[7] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, trans. and ed.
A. Jeffrey (Academic Press, New York, 1980), 3.896-3 and 9.211-1.

[8] S. Kawamura et al., J. Phys. Conf. Ser. 122, 012006 (2008).

[9] S. Phinney et al., NASA Mission Concept Study (2004).

[10] Y. Watanabe and E. Komatsu, Phys. Rev. D 73, 123515 (2006) [astro-ph/0604176].

UTTG-20-18
arXiv:1903.11168v1 [astro-ph.GA] 26 Mar 2019

Soft Bremsstrahlung

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

Simple analytic formulas are considered for the energy radiated in low
frequency bremsstrahlung from fully ionized gases. A formula that has been
frequently cited over many years turns out to have only a limited range of
validity, more narrow than for a formula derived using the Born approxima-
tion. In an attempt to find a more widely valid simple formula, a soft photon
theorem is employed, which in this context implies that the differential rate
of photon emission in an electron-ion collision with definite initial and final
electron momenta is correctly given for sufficiently soft photons by the Born
approximation, to all orders in the Coulomb potential. Corrections to the
Born approximation arise because the upper limit on photon energy for this
theorem to apply to a given collision becomes increasingly stringent as the
scattering approaches the forward direction. A general formula is suggested
that takes this into account.


∗ Electronic address: weinberg@physics.utexas.edu

I INTRODUCTION

The emission of radio waves from hot ionized interstellar gas is largely
due to soft bremsstrahlung, the radiation of a low energy photon in the de-
flection of a free electron with much larger kinetic energy by the Coulomb
field of an atomic nucleus. It is conventional to express the rate j(ν, v) of
energy emission per time, per photon solid angle, and per photon frequency
interval at frequency ν from an electron of velocity v with |v| = v due to
bremsstrahlung in a fully ionized gas as the approximate classical electrody-
namics result given in 1923 by Kramers[1], times a “free-free Gaunt factor”
gff (ν, v) that incorporates quantum and other corrections:

$$ j(\nu, v) \equiv \frac{8\pi Z^2 e^6 n_I}{3\sqrt{3}\, c^3 m_e^2 v}\; g_{ff}(\nu, v) , \tag{1} $$

where nI is the number density of ions, Ze is the ionic charge (with e
everywhere in unrationalized electrostatic units), and me is the electron
mass. For an ionized gas in kinetic equilibrium at temperature T , this gives
the emissivity, the rate of radiation energy emitted per time, per volume,
per photon solid angle, and per photon frequency interval:

$$ j_\nu(T) = \int_{m_e v^2/2 > h\nu} d^3v \; n_e(\mathbf{v}, T)\, j(\nu, |\mathbf{v}|) \tag{2} $$

where ne (v, T )d3 v is the number density of free electrons with velocity in
a range d3 v at v. Using the Maxwell-Boltzmann distribution for ne (v, T ),
this gives the emissivity
$$ j_\nu(T) = \frac{8 Z^2 e^6 n_I n_e}{3 c^3 (k_B T)^{1/2} m_e^{3/2}} \left(\frac{2\pi}{3}\right)^{1/2} \bar{g}_{ff}(\nu, T) , \tag{3} $$

where ne is the total number density of free electrons; kB is the Boltzmann
constant; and $\bar{g}_{ff}(\nu, T)$ is the thermally averaged free-free Gaunt
factor (briefly, the thermal Gaunt factor):

$$ \bar{g}_{ff}(\nu, T) = \frac{m_e}{k_B T} \int_{\sqrt{2h\nu/m_e}}^{\infty} g_{ff}(\nu, v)\, \exp\left(-\frac{m_e v^2}{2 k_B T}\right) v\, dv . \tag{4} $$

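Astrophysicists today chiefly rely on various numerical calculations (e.g.,
refs. [2], [3], [4], [5]) of the Gaunt factor, based on a set of quite
complicated formulas:

$$ g_{ff}(\nu, v) = \frac{2\sqrt{3}}{\pi \xi \xi'} \left[ (\xi^2 + \xi'^2 + 2\xi^2 \xi'^2)\, I_0 - 2 \xi \xi' (1 + \xi^2)^{1/2} (1 + \xi'^2)^{1/2}\, I_1 \right] I_0 , \tag{5} $$

where

$$ I_\ell = \frac{1}{4} \left[ \frac{4\xi\xi'}{(\xi' - \xi)^2} \right]^{\ell+1} e^{\pi(\xi + \xi')/2}\, \frac{|\Gamma(\ell + 1 + i\xi)\,\Gamma(\ell + 1 + i\xi')|}{\Gamma(2\ell + 1)} $$
$$ \qquad \times \left( \frac{\xi + \xi'}{\xi' - \xi} \right)^{-i\xi - i\xi'} {}_2F_1\left( \ell + 1 - i\xi,\; \ell + 1 - i\xi';\; 2\ell + 2;\; -\frac{4\xi\xi'}{(\xi' - \xi)^2} \right) . $$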

Here ξ ≡ Ze2 /h̄v and ξ ′ ≡ Ze2 /h̄v ′ , with v ′ the magnitude of the final
electron velocity, given in terms of ν and v by the condition of energy con-
servation
me v ′2 /2 = me v 2 /2 − hν . (6)
Also, 2 F1 is the Gauss hypergeometric function, with power series expansion

$$ {}_2F_1(a, b; c; x) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n}\, \frac{x^n}{n!} , $$

where for any complex z

$$ (z)_n \equiv z(z + 1) \cdots (z + n - 1) \ \ {\rm for}\ n = 1, 2, 3, \ldots ; \qquad (z)_0 \equiv 1 . $$
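
The series definition can be made concrete with a short numerical sketch. It is only an illustration of the power series and the symbol $(z)_n$ defined above, not the method used in refs. [2]-[5]; the function names are hypothetical. Note that the series converges only for |x| < 1, whereas the argument $-4\xi\xi'/(\xi'-\xi)^2$ appearing in $I_\ell$ grows without bound as ξ′ → ξ, so a direct summation like this one would not suffice in the soft photon regime.

```python
def pochhammer(z, n):
    """Rising factorial (z)_n = z (z+1) ... (z+n-1), with (z)_0 = 1."""
    result = 1.0 + 0.0j
    for k in range(n):
        result *= (z + k)
    return result


def hyp2f1_series(a, b, c, x, tol=1e-15, max_terms=10_000):
    """Sum the power series of 2F1(a, b; c; x) for complex a, b, c.

    The series converges only for |x| < 1, so larger |x| is rejected.
    """
    if abs(x) >= 1:
        raise ValueError("the power series converges only for |x| < 1")
    term = 1.0 + 0.0j   # the n = 0 term
    total = term
    for n in range(max_terms):
        # Ratio of consecutive terms: (a+n)(b+n) x / ((c+n)(n+1)).
        term *= (a + n) * (b + n) * x / ((c + n) * (n + 1))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total
```

A convenient sanity check is the elementary special case 2F1(1, 1; 2; x) = −ln(1 − x)/x, which the series reproduces term by term.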

These are derived[2] from the summation of a partial wave expansion[6] of
results originally given by Sommerfeld[7]. Still, it would be useful to have
a widely valid simple analytic formula for the Gaunt factor, in order easily
to see trends in how it varies with various parameters, and in order easily
to calculate the thermal Gaunt factor (4). Above all, from an independent
derivation of a simple analytic formula we can gain a more detailed physical
understanding of what is going on in the bremsstrahlung process.
A simple formula for the thermal Gaunt factor has been often given,
without providing a derivation, in treatises on the interstellar medium (for
example, [8], [9], [10], [11]):
$$ \bar{g}_{ff}(\nu, T) = \frac{\sqrt{3}}{\pi} \left[ \ln\left( \frac{(2 k_B T)^{3/2}}{\pi Z e^2 \nu\, m_e^{1/2}} \right) - \frac{5\gamma}{2} \right] , \tag{7} $$

where γ is the Euler constant, γ = 0.577 . . .. Some of these references in-
dicate that the formula holds for photons that are soft, in the sense that
hν ≪ kB T . (They also note that it is necessary to assume that the pho-
ton frequency is much larger than the plasma frequency νP , so that Debye
screening can be ignored. This is not a stringent condition, and will be
taken for granted throughout.) But none suggest that there are more strin-
gent conditions on the frequency and temperature for the formula to be a
valid approximation.
For this formula Spitzer[9] cited an article by Scheuer[12], who found
Eq. (7) by a purely classical calculation of the emissivity per electron, which
gave a result

$$ g_{ff}(\nu, v) = \frac{\sqrt{3}}{\pi} \left[ \ln\left( \frac{m_e v^3}{\pi Z e^2 \nu} \right) - \gamma \right] . \tag{8} $$

As Scheuer found, using this in Eq. (4) gives the widely quoted thermal
Gaunt factor (7) for hν ≪ kB T .
It is not possible that Eq. (8) could be a good approximation for general
photon frequencies and electron velocities with hν ≪ me v 2 /2 . Contrary
to Eq. (8), if ξ ≡ Ze2 /h̄v is much less than unity (so that the Coulomb
potential at an electron-ion separation equal to the electron de Broglie wave
length is much less than the electron kinetic energy) one would expect j(ν, v)
to have a finite limit, given by the Born approximation — that is, keeping
in the matrix element only terms of first order in the Coulomb potential:
$$ g_{ff}^{\rm Born}(\nu, v) = \frac{\sqrt{3}}{\pi} \ln\left( \frac{2 m_e v^2}{h\nu} \right) . \tag{9} $$
(The derivation of Eq. (9) for ξ ≪ 1 is given in the next section.) Since
Eq. (8) does not reduce to Eq. (9) for ξ ≪ 1, where Eq. (9) applies, it
cannot be correct for small ξ. It also cannot be correct when ξ is much
larger than the ratio of electron to photon energies, because there it gives a
negative Gaunt factor. Accordingly, the formula (7) for the thermal Gaunt
factor derived from Eq. (8) cannot be expected to hold for general photon
frequencies and electron temperatures with hν ≪ kB T .
Recently Albalat and Zimmerman[13] have shown that Eq. (8) follows
from the “exact” formula (5) used in numerical calculations, for ξ in the
range
$$ 1 \ll \xi \ll m_e v^2 / h\nu . \tag{10} $$
(They subsequently found that, though it seems to have been largely forgotten,
in 1962 a review article[14] had obtained the same result.) For instance,
for $2h\nu/m_e v^2 = 10^{-3}$, Eq. (8) is within a few percent of numerical
results[5] for ξ between 1 and 10. On the other hand, for
$2h\nu/m_e v^2 = 10^{-2}$ the range in which (8) agrees with numerical results
is vanishingly narrow.
In contrast, the Born approximation (9) agrees very well with numerical
calculations where ξ < 1, and the Scheuer Gaunt factor (8) does not. For
instance, for $h\nu = 10^{-3}\, m_e v^2/2$, Eq. (9) gives
$g_{ff}^{\rm Born}(\nu, v) = 4.573$, while numerical calculations[5] give
$g_{ff}(\nu, v)$ equal to 4.5730 for $\xi = 10^{-3}$ and for $\xi = 10^{-2}$,
dropping only to 4.5672 for ξ = 0.1 and to 4.2093 for ξ = 1. (An
electron has ξ < 1 if its kinetic energy is larger than the binding energy of a
1s atomic electron.) In contrast, for the same ratio of photon and electron
energies, Eq. (8) gives a Gaunt factor that is 76% too large for $\xi = 10^{-3}$,
and still 21% too large for ξ = 0.1.
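
The figures just quoted can be checked with a few lines of arithmetic. The sketch below evaluates Eqs. (8) and (9) in terms of two dimensionless inputs: the ratio u = 2hν/me v² and ξ = Ze²/h̄v. The function names and the variable u are illustrative choices, and the rewriting of Eq. (8) uses h = 2πh̄, so that me v³/(πZe²ν) = 4/(ξu).

```python
import math

EULER_GAMMA = 0.5772156649015329


def gaunt_born(u):
    """Eq. (9), with u = 2 h nu / (m_e v^2) the photon-to-electron energy ratio."""
    # 2 m_e v^2 / (h nu) = 4 / u
    return math.sqrt(3) / math.pi * math.log(4.0 / u)


def gaunt_scheuer(u, xi):
    """Eq. (8), rewritten with xi = Z e^2 / (hbar v) and h = 2 pi hbar."""
    # m_e v^3 / (pi Z e^2 nu) = 2 m_e v^2 / (xi h nu) = 4 / (xi u)
    return math.sqrt(3) / math.pi * (math.log(4.0 / (xi * u)) - EULER_GAMMA)


if __name__ == "__main__":
    u = 1e-3
    print("Born:", gaunt_born(u))
    # Numerical reference values 4.5730 and 4.5672 are those quoted in the text.
    for xi, numerical in [(1e-3, 4.5730), (0.1, 4.5672)]:
        excess = gaunt_scheuer(u, xi) / numerical - 1.0
        print("xi =", xi, "Scheuer excess over numerical value:", excess)
```

With u = 10⁻³ the Born value agrees with the 4.573 quoted above, and the excess of Eq. (8) over the quoted numerical values comes out near 76% and 21%.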
This leaves us with the task of finding an approximate analytic formula
for the Gaunt factor that is generally valid for ξ > 1. Section II derives what
I think is a new formula for the matrix element for soft bremsstrahlung, valid
for any ξ to all orders in the Coulomb potential. This formula leads imme-
diately to the Born approximation (9) in the case ξ ≪ 1. To deal with more
general values of ξ, a general soft-photon theorem is used in Section III
to show that the differential rate of photon emission in an electron-ion
collision with definite initial and final electron momenta is correctly given
for sufficiently soft photons by the Born approximation, to all orders in the
Coulomb potential. Nevertheless, as explained in Section IV, integration
over the final electron direction introduces corrections to the Born approxi-
mation for the Gaunt factor for ξ > 1. Properties of the general formula for
the bremsstrahlung matrix element derived in Section II suggest a frame-
work for a more general formula.

II A GENERAL FORMULA

To derive the Born approximation result (9) for ξ ≪ 1, and to understand
the decrease of the Gaunt factor below the Born approximation value for ξ
of order unity and greater, it will be useful first to provide what I think is
a new formula for the matrix element for bremsstrahlung that is valid to all
orders in the Coulomb potential.
Taking electrons to be non-relativistic, which also entails the electric
dipole approximation for the interaction of electrons with the quantized elec-
tromagnetic field, the term in the matrix element (that is, the coefficient of
the energy conservation delta function in the S-matrix) for bremsstrahlung
of first order in this interaction and to all orders in the Coulomb
interaction between an electron and an ion is given in the “distorted wave
Born approximation”[15] as

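$$ M = \frac{-2\pi i}{\sqrt{2qc}\,(2\pi\hbar)^{3/2}} \times \frac{-\sqrt{4\pi}\, e \hbar^2}{m_e} \int d^3r\; \psi'^*(\mathbf{r})\, \mathbf{e}^*(\hat{q}, \lambda) \cdot \nabla \psi(\mathbf{r}) \tag{11} $$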

where e(q̂, λ) is the polarization vector (with e∗ · e = 1) for a photon with
momentum q and helicity λ, and ψ and ψ ′ are respectively “in” and “out”
normalized solutions of the Schrödinger equation for the initial and final
electrons. If we multiply M with $qc = m_e v^2/2 - m_e v'^2/2$ and use the
Schrödinger equations for $m_e v^2 \psi/2$ and $m_e v'^2 \psi'/2$, we find by an
integration by parts that the kinetic energy terms in the Schrödinger
equations cancel, while the potential terms cancel except where the gradient
in Eq. (11) acts on the electron-ion interaction potential V , so

$$ M = \frac{-i e \sqrt{\hbar}}{(qc)^{3/2}\, m_e} \int d^3r\; \psi'^*(\mathbf{r}) \left[ \mathbf{e}^*(\hat{q}, \lambda) \cdot \nabla V(\mathbf{r}) \right] \psi(\mathbf{r}) . \tag{12} $$

Using the general rules for calculating rates in quantum mechanics, and
setting qc = hν, the rate of emission of radiation energy per time, per photon
solid angle, per photon frequency interval, and per final electron solid angle
when an electron is scattered from initial velocity v to final velocity v′ is
then
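$$ j(\nu, \mathbf{v} \to \mathbf{v}') = h\nu \times \frac{h^5 \nu^2 n_I m_e^3}{4\pi c^3} \int d^2\hat{q} \int_0^\infty v'^2\, dv' \sum_\lambda |M|^2\; \delta\left( \frac{m_e v^2}{2} - \frac{m_e v'^2}{2} - h\nu \right) . \tag{13} $$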
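Eqs. (12) and (13) apply for a general potential V . For a Coulomb
potential $V(r) = -Ze^2/r$, the exact wave functions are well known:

$$ \psi(\mathbf{r}) = \frac{\Gamma(1 + i\xi)\, e^{-\xi\pi/2}}{(2\pi\hbar)^{3/2}}\; e^{i\mathbf{k}\cdot\mathbf{r}}\; {}_1F_1\left( -i\xi;\; 1;\; (ik|\mathbf{r}| - i\mathbf{k}\cdot\mathbf{r}) \right) , $$

$$ \psi'(\mathbf{r}) = \frac{\Gamma(1 - i\xi')\, e^{-\xi'\pi/2}}{(2\pi\hbar)^{3/2}}\; e^{i\mathbf{k}'\cdot\mathbf{r}}\; {}_1F_1\left( i\xi';\; 1;\; (-ik'|\mathbf{r}| - i\mathbf{k}'\cdot\mathbf{r}) \right) , \tag{14} $$

where k ≡ vme /h̄ and k′ ≡ v′ me /h̄; v and v′ are the initial and final
electron velocities; $\xi \equiv Ze^2/\hbar v$; $\xi' \equiv Ze^2/\hbar v'$; and
1 F1 is the confluent hypergeometric function.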
So far, this is valid for non-relativistic electrons to all orders in the
Coulomb potential V (r). If we now take ξ ≪ 1 (so that for soft photons
also ξ ′ ≪ 1) then the functions 1 F1 in the wave functions (14) take the form
1 + O(ξ) and 1 + O(ξ ′ ) and the matrix element (12) is then given by just
the term of first order in V [16]:

$$ M = \frac{4\pi Z e^3 \hbar^{3/2}}{(h\nu)^{3/2} (2\pi\hbar)^3 m_e^2}\; \frac{(\mathbf{v} - \mathbf{v}') \cdot \mathbf{e}^*(\hat{q}, \lambda)}{(\mathbf{v} - \mathbf{v}')^2} \tag{15} $$

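Using this in Eq. (13), we find the emission rate per electron:

$$ j^{\rm Born}(\nu, \mathbf{v} \to \mathbf{v}') = \frac{4 Z^2 e^6 n_I v'}{3\pi c^3 m_e^2}\; \frac{1}{|\mathbf{v} - \mathbf{v}'|^2} \tag{16} $$

(with v ′ given by the energy conservation condition
$m_e v'^2/2 = m_e v^2/2 - h\nu$). Integrating over the final electron direction
gives, for soft photons,

$$ j^{\rm Born}(\nu, v) = \int d^2\hat{v}'\; j^{\rm Born}(\nu, \mathbf{v} \to \mathbf{v}') = \frac{8 Z^2 e^6 n_I}{3 c^3 m_e^2 v} \ln\left( \frac{v + v'}{v - v'} \right) \to \frac{8 Z^2 e^6 n_I}{3 c^3 m_e^2 v} \ln\left( \frac{2 m_e v^2}{h\nu} \right) \tag{17} $$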

corresponding to the Gaunt factor (9), in disagreement with Scheuer’s
formula (8). It has recently been shown [13] that Eq. (9) also follows in the
limit ξ → 0 from the formula (5) used in refs. [2]-[5].
Using Eq. (9) in Eq. (4) gives a thermal Gaunt factor

$$ \bar{g}_{ff}(\nu, T) = \frac{\sqrt{3}}{\pi} \left[ \ln\left( 4 k_B T / h\nu \right) - \gamma \right] , \tag{18} $$

which we expect to be valid if $2h\nu/m_e v_T^2 \ll 1$ and $Ze^2/\hbar v_T \ll 1$,
where $v_T \equiv \sqrt{2 k_B T/m_e}$ is a typical thermal velocity. There is a
possible problem in this
thermal averaging: no matter how small 2hν/me vT2 and Ze2 /h̄vT may be,
there will always be some electrons with v ≪ vT for which 2hν/me v 2 and/or
Ze2 /h̄v are not small. But the effect of these slow electrons is suppressed
by d3 v/v = 4πv dv in the velocity integration (4). In any case, Scheuer’s
Eq. (8) already disagrees with the Born approximation result (9) before
thermal averaging, under circumstances in which the Born approximation is
valid.
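
The step from Eq. (9) to Eq. (18) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the derivation: it changes variables to x = me v²/2kB T, so that Eq. (4) becomes the integral of gff e⁻ˣ dx from x = u to infinity with u = hν/kB T, inserts the Born Gaunt factor (9) in the form (√3/π) ln(4x/u), and integrates with a composite Simpson rule. The function names, the step count, and the cutoff x_max are arbitrary choices.

```python
import math

EULER_GAMMA = 0.5772156649015329
PREFACTOR = math.sqrt(3) / math.pi


def thermal_gaunt_born_numeric(u, steps=20_000, x_max=60.0):
    """Eq. (4) with the Born Gaunt factor (9); u = h nu / (k_B T).

    Integrates in t = ln x with the composite Simpson rule
    (steps must be even); the tail beyond x_max is negligible.
    """
    t_lo, t_hi = math.log(u), math.log(x_max)
    h = (t_hi - t_lo) / steps

    def integrand(t):
        x = math.exp(t)
        # g_ff(x) * exp(-x) * dx/dt, with dx = x dt
        return PREFACTOR * math.log(4.0 * x / u) * math.exp(-x) * x

    total = integrand(t_lo) + integrand(t_hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(t_lo + i * h)
    return total * h / 3.0


def thermal_gaunt_born_analytic(u):
    """Eq. (18), expected to hold for u << 1."""
    return PREFACTOR * (math.log(4.0 / u) - EULER_GAMMA)


if __name__ == "__main__":
    for u in (1e-2, 1e-3, 1e-4):
        print(u, thermal_gaunt_born_numeric(u), thermal_gaunt_born_analytic(u))
```

The two values differ only by the contribution of the excluded range 0 < x < u, which is of order u and so vanishes in the soft photon limit.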

III THE SOFT PHOTON THEOREM

We need to go further, and understand the changes in the Gaunt factor
when ξ ≡ Ze2 /h̄v is of order unity or greater. At first sight, it might
seem that the Born approximation should continue to apply for soft photons
whatever the value of ξ. This is because the emission per electron j(ν, v →
v′ ) for fixed electron directions is correctly given in the soft photon limit
v ′ → v by the Born approximation result (16), to all orders in the Coulomb
potential, whether or not ξ is small.
This conclusion is based on a very general low energy theorem[17] of
quantum electrodynamics. According to this theorem, which is valid to
all orders in perturbation theory, the differential rate for a general process
α → β with emission of any number of soft photons with total energy less
than some amount E is given in the soft photon limit E → 0 by

$$ d\Gamma_{\alpha\to\beta}(< E) \to (E/\Lambda)^A\, b(A)\, d\Gamma^0_{\alpha\to\beta} . \tag{19} $$

Here

$$ A = -\frac{1}{8\pi^2 \hbar c} \sum_{n,m} \frac{4\pi\, \eta_n \eta_m e_n e_m}{\beta_{nm}} \ln\left( \frac{1 + \beta_{nm}}{1 - \beta_{nm}} \right) , \tag{20} $$
where the sums run over all particles participating in the reaction α → β;
en is the charge of the nth particle; ηn equals +1 or −1 for particles in the
initial state α or final state β; and cβnm is the velocity of either of particles
n or m in the rest frame of the other particle:

$$ \beta_{nm} \equiv \left[ 1 - \frac{m_n^2 m_m^2 c^4}{(p_n \cdot p_m)^2} \right]^{1/2} . \tag{21} $$

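Also, b(A) is the function

$$ b(A) \equiv \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{\sin\sigma}{\sigma}\, d\sigma\; \exp\left[ A \int_0^1 \frac{d\omega}{\omega} \left( e^{i\omega\sigma} - 1 \right) \right] = 1 - \frac{\pi^2 A^2}{12} + \ldots , \tag{22} $$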

and dΓ0α→β is the differential rate for the same process without soft pho-
ton emission and without radiative corrections from virtual infrared pho-
tons, where Λ is a more-or-less arbitrary upper limit on virtual photon four-
momenta that is used to define what we mean by “infrared.” (As we shall
see, Λ will not appear in the non-relativistic limit relevant to this paper.)
The differential rates dΓα→β (< E) and dΓ0α→β are rates for producing the
particles in the final state β in some infinitesimal element of their
momentum spaces, the same for both rates. (The formula given in ref. [17] has
been modified here by inserting a factor 4π in Eq. (20) to account for the
use here of unrationalized units for electric charge, and inserting a factor
1/h̄c to make A dimensionless in cgs units.)
This is no place to re-derive this old result, but it may be useful here to
remark that it applies because the insertion of a soft-photon external line of
momentum q in any external line for a charged particle in the process α → β
produces an internal line connecting this vertex to the rest of the diagram;
the propagator for this line contributes a 1/q singularity for q → 0, whose
residue is proportional to the matrix element for the process without the
soft photon. This accounts for a factor 1/q in Eq. (12), which multiplies the
kinematic factor $1/\sqrt{q}$ already present in Eq. (11). (For photon
absorption, the corresponding $1/q^{3/2}$ factor in the matrix element accounts
for a factor $(k_B T)^{-3}$ in the Kramers opacity for free-free transitions.)
These diagrams
dominate the matrix element for q → 0, because insertion of the soft photon
line in an internal line of the process α → β does not produce this pole.
Formula (19) for the soft photon emission rate applies for relativistic or
non-relativistic processes involving particles of arbitrary spin, whatever the
interactions may be that produce the reaction α → β. It is considerably
simplified if we specialize to the non-relativistic case, for which in some
reference frame all velocities of the particles in the states α and β are much
less than c. In this case all βnm are much less than one, and we can use the
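expansion

$$ \frac{1}{\beta_{nm}} \ln\left( \frac{1 + \beta_{nm}}{1 - \beta_{nm}} \right) = 2 + \frac{2}{3} \beta_{nm}^2 + \ldots . \tag{23} $$

The first term does not contribute in Eq. (20), because the conservation
of electric charge gives $\sum_n \eta_n e_n = 0$. Hence in the non-relativistic
case, Eq. (20) becomes

$$ A = -\frac{1}{3\pi\hbar c} \sum_{n,m} \eta_n \eta_m e_n e_m\, \beta_{nm}^2 . \tag{24} $$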
This is at most of order v 2 /c2 , so in the non-relativistic limit b(A) = 1, and
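Eq. (19) becomes

$$ d\Gamma_{\alpha\to\beta}(< E) \to \left[ 1 + A \ln\left( \frac{E}{\Lambda} \right) \right] d\Gamma^0_{\alpha\to\beta} . \tag{25} $$

The rate of energy radiation per frequency interval is then simply

$$ h\nu\, \frac{d}{d\nu}\, d\Gamma_{\alpha\to\beta}(< h\nu) \to h A\, d\Gamma^0_{\alpha\to\beta} . \tag{26} $$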
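
The cancellation that takes Eq. (20) to Eq. (24) can be seen in a small numerical sketch. It evaluates the double sum of Eq. (20) for an electron scattered by a heavy ion at rest and compares the result with the non-relativistic form. Several things here are assumptions made for illustration: charges are measured in units of e so that e²/h̄c is the fine structure constant, the ion is treated as unrecoiling, βnm is approximated by the non-relativistic relative speed in units of c, and the function names are hypothetical.

```python
import math

ALPHA = 1.0 / 137.035999  # e^2 / (hbar c) in unrationalized units


def log_factor(beta):
    """(1/beta) ln[(1+beta)/(1-beta)], with its beta -> 0 limit of 2."""
    if beta < 1e-6:
        return 2.0 + (2.0 / 3.0) * beta**2
    return math.log((1.0 + beta) / (1.0 - beta)) / beta


def soft_photon_A(particles, alpha=ALPHA):
    """Double sum of Eq. (20).

    Each particle is (eta, charge in units of e, velocity vector in units of c).
    ASSUMPTION: beta_nm is taken as the non-relativistic relative speed.
    """
    total = 0.0
    for eta_n, e_n, v_n in particles:
        for eta_m, e_m, v_m in particles:
            beta = math.dist(v_n, v_m)
            total += eta_n * eta_m * e_n * e_m * log_factor(beta)
    # -(1 / 8 pi^2 hbar c) * 4 pi * e^2 = -alpha / (2 pi)
    return -alpha / (2.0 * math.pi) * total


def nonrelativistic_A(delta_beta, alpha=ALPHA):
    """Eq. (24) keeping only the initial/final electron terms."""
    return 2.0 * alpha * delta_beta**2 / (3.0 * math.pi)


if __name__ == "__main__":
    v_in, v_out, rest = (0.01, 0.0, 0.0), (0.0, 0.01, 0.0), (0.0, 0.0, 0.0)
    Z = 1
    particles = [(+1, -1, v_in), (+1, Z, rest), (-1, -1, v_out), (-1, Z, rest)]
    print(soft_photon_A(particles), nonrelativistic_A(math.dist(v_in, v_out)))
```

In this example the constant term 2 in the expansion (23) drops out because the signed charges sum to zero, and the electron-ion terms cancel between the initial and final states when the electron speed is unchanged, leaving only the terms that pair the initial and final electron.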

If we now specialize to the case of a non-relativistic electron of velocity v
scattered by a Coulomb potential into a final velocity v′ , the sum in Eq. (24)
will be dominated by terms in which particles n and m are respectively the
initial and final electron, or vice versa. This gives

$$ A = \frac{2 e^2}{3\pi\hbar c^3}\, |\mathbf{v} - \mathbf{v}'|^2 . \tag{27} $$
If needed we could use this in Eq. (26) with any assumption about the
differential rate dΓ0v→v′ for electron scattering without photon emission,
taking account of complications like screening or finite ion size in scattering
by atoms, but the calculation of dΓ0v→v′ beyond low orders of perturbation
theory would then be complicated. Fortunately, as is well known, if we limit
ourselves to the scattering of electrons by the Coulomb field of an unscreened
heavy point ion of charge Ze, then the differential Coulomb scattering rate
per electron dΓ0v→v′ is correctly given to all orders in the Coulomb potential
by the Born approximation result:

$$ d\Gamma^0_{\mathbf{v}\to\mathbf{v}'} = \frac{4 Z^2 e^4 n_I v}{m_e^2\, |\mathbf{v} - \mathbf{v}'|^4}\; d^2\hat{v}' , \tag{28} $$

where v ≡ |v|, and d2 v̂ ′ is the solid angle into which the electron is scattered.
Using Eqs. (27) and (28) in Eq. (26), the differential rate of energy radiation
in soft bremsstrahlung per photon frequency interval, per photon solid angle,
and per electron is

$$ j(\nu, \mathbf{v} \to \mathbf{v}')\, d^2\hat{v}' \equiv \frac{h\nu}{4\pi}\, \frac{d}{d\nu}\, d\Gamma_{\mathbf{v}\to\mathbf{v}'}(< h\nu) \to \frac{4 Z^2 e^6 n_I v}{3\pi c^3 m_e^2\, |\mathbf{v} - \mathbf{v}'|^2}\; d^2\hat{v}' , \tag{29} $$

just as in the Born approximation result (16). Unfortunately, as we shall see,
although the soft photon theorem tells us that Eq. (29) holds for any ξ and
any fixed initial and final electron velocities and sufficiently small photon
frequency, for ξ > 1 the upper bound on the photon frequency for its validity
becomes increasingly stringent as the final electron direction approaches the
initial direction.
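
The agreement between the soft photon theorem and the Born approximation is a matter of algebra, and can be confirmed with a short numerical sketch. It multiplies the factor A of Eq. (27) by the Coulomb rate of Eq. (28), applies the factor h/4π from Eqs. (26) and (29), and compares with Eq. (16) in the limit v′ → v. The CGS constants are rounded, the sample velocity, angle, and ion density are arbitrary, and the function names are illustrative.

```python
import math

# Rounded CGS constants; the comparison below is an algebraic identity
# and does not depend on their precise values.
E_CHARGE = 4.803e-10   # esu
M_E = 9.109e-28        # g
C = 2.998e10           # cm/s
H = 6.626e-27          # erg s
HBAR = H / (2 * math.pi)


def soft_factor_A(dv):
    """Eq. (27): A for an electron whose velocity changes by |v - v'| = dv."""
    return 2 * E_CHARGE**2 * dv**2 / (3 * math.pi * HBAR * C**3)


def coulomb_rate(v, dv, Z, n_ion):
    """Eq. (28): Coulomb scattering rate per electron per final solid angle."""
    return 4 * Z**2 * E_CHARGE**4 * n_ion * v / (M_E**2 * dv**4)


def emission_from_theorem(v, dv, Z, n_ion):
    """Eqs. (26) and (29): (h A / 4 pi) times the elastic scattering rate."""
    return H * soft_factor_A(dv) * coulomb_rate(v, dv, Z, n_ion) / (4 * math.pi)


def emission_born(v_final, dv, Z, n_ion):
    """Eq. (16): the Born approximation emission rate."""
    return (4 * Z**2 * E_CHARGE**6 * n_ion * v_final
            / (3 * math.pi * C**3 * M_E**2 * dv**2))


if __name__ == "__main__":
    v = 1.0e9                       # cm/s, sample electron speed
    theta = 0.3                     # rad, sample scattering angle
    dv = 2 * v * math.sin(theta / 2)  # |v - v'| for equal speeds
    print(emission_from_theorem(v, dv, 1, 1e-2), emission_born(v, dv, 1, 1e-2))
```

The two printed numbers agree to rounding error, since h/h̄ = 2π turns the product of Eqs. (27) and (28) into the coefficient of Eq. (16).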

IV DEPARTURES FROM THE BORN APPROXIMATION

Departures from the Born approximation for soft photons arise because,
to calculate the emission per electron we need to integrate over the final
electron direction. In the strict soft photon limit, in which ν = 0 and
v = v ′ , there is a logarithmic divergence in the integral, arising from the
configuration in which v̂ ′ is parallel to v̂. In the Born approximation, for ν
small but non-zero the integral is cut off by the inequality of v and v ′ , yielding
Eq. (17), from which the Gaunt factor (9) follows as before. But beyond the
Born approximation, does Eq. (15) correctly describe the behavior of the
emission rate when v ′ ≠ v, where the soft photon theorem does not apply?
To answer this, we note that the singularity in the photon emission rate
when v′ → v arises entirely from the slow decrease of the Coulomb potential
at large r. To evaluate this singularity, we use the well-known asymptotic
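behavior of the Coulomb wave functions: for r → ∞

$$ \psi(\mathbf{r}) \to (2\pi\hbar)^{-3/2}\, e^{i\mathbf{k}\cdot\mathbf{r}}\, |kr - \mathbf{k}\cdot\mathbf{r}|^{i\xi} , \qquad \psi'(\mathbf{r}) \to (2\pi\hbar)^{-3/2}\, e^{i\mathbf{k}'\cdot\mathbf{r}}\, |k'r + \mathbf{k}'\cdot\mathbf{r}|^{-i\xi'} , \tag{30} $$

where k = vme /h̄ and k′ = v′ me /h̄. Using these asymptotic forms in
Eq. (12), we see that when $|v' - v|/v \simeq h\nu/m_e v^2$ and the angle θ
between v and v′ are both very small, the singularity in the matrix element is
of the form

$$ M \to \frac{-i Z e^3 \sqrt{\hbar}\, (m_e/\hbar)^{i\xi + i\xi'}}{(h\nu)^{3/2}\, m_e\, (2\pi\hbar)^3} \int d^3r\; \frac{\mathbf{e}^* \cdot \mathbf{r}}{r^3}\, \exp\left( i\, \mathbf{r} \cdot (\mathbf{v} - \mathbf{v}')(m_e/\hbar) \right) |vr - \mathbf{v}\cdot\mathbf{r}|^{i\xi}\, |v'r + \mathbf{v}'\cdot\mathbf{r}|^{i\xi'} . \tag{31} $$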
This can be straightforwardly calculated to leading order in hν/me v 2 and θ
in two limiting cases:
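First, if $h\nu/m_e v^2 \ll \theta \ll 1$ we encounter a singularity:

$$ M \to \frac{4\pi Z e^3 \hbar^{3/2}\, v^{2i\xi}\, (\mathbf{v} - \mathbf{v}') \cdot \mathbf{e}^*}{(h\nu)^{3/2} (2\pi\hbar)^3 m_e^2\, |\mathbf{v} - \mathbf{v}'|^{2 + 2i\xi}} \times \frac{\Gamma(1 + 2i\xi)\, \Gamma\left(\frac{1}{2} - i\xi\right) \cosh \xi\pi}{\sqrt{\pi}\; \Gamma(1 - i\xi)} . \tag{32} $$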
This is not the same as in the Born approximation matrix element (15),
which is no surprise, because the Coulomb scattering amplitude itself is
affected by higher orders in the Coulomb potential. But here as in Coulomb
scattering the higher-order corrections in Eq. (32) are just phases, which do
not appear in |M |2 , so the singularity in the integrand of the emission per
electron is of the form
$$ |M|^2 \to |M_{\rm Born}|^2 . \tag{33} $$
The soft photon theorem tells us that this is also true for
$h\nu/m_e v^2 \ll 1$ even where the angle θ between electron directions is not
small and the integral (12) is not dominated by large r.
Second, moving away from the case covered by the soft photon theorem,
if $\theta \ll h\nu/m_e v^2 \ll 1$, we find a singularity

$$ M \to \frac{4\pi Z e^3 \hbar^{3/2}\, v^{2i\xi}\, (\mathbf{v} - \mathbf{v}') \cdot \mathbf{e}^*}{(h\nu)^{3/2} (2\pi\hbar)^3 m_e^2\, |\mathbf{v} - \mathbf{v}'|^{2 + 2i\xi}}\; \Gamma(1 + 2i\xi)\, \Gamma\left(\frac{1}{2} - i\xi\right) \Gamma(1 + i\xi)\, \frac{\cosh \xi\pi}{\sqrt{\pi}} . \tag{34} $$
Here the corrections to the Born approximation are not just phases. Instead,

$$ |M|^2 \to |M_{\rm Born}|^2 \times \frac{\pi^2 \xi^2}{\sinh^2 \pi\xi} . \tag{35} $$
For a rough approximation to the Gaunt factor when ξ is not small, we
introduce a critical angle θc , and tentatively suppose that Eq. (33) holds for
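θ > θc and Eq. (35) holds for θ < θc . Then the emission rate per electron is

$$ j(\nu, v) = \frac{4 Z^2 e^6 n_I v'}{3\pi c^3 m_e^2} \left[ \int_{\theta > \theta_c} \frac{d^2\hat{v}'}{|\mathbf{v} - \mathbf{v}'|^2} + \frac{\pi^2 \xi^2}{\sinh^2 \pi\xi} \int_{\theta < \theta_c} \frac{d^2\hat{v}'}{|\mathbf{v} - \mathbf{v}'|^2} \right] , \tag{36} $$

corresponding to a Gaunt factor

$$ g_{ff}(\nu, v) = \frac{\sqrt{3}}{\pi} \left[ \ln\left( \frac{2 m_e v^2}{h\nu\, \zeta} \right) + \frac{\pi^2 \xi^2}{\sinh^2 \pi\xi} \ln \zeta \right] , \tag{37} $$

where

$$ \zeta \equiv \left( 1 + \frac{2\theta_c^2}{(2h\nu/m_e v^2)^2} \right)^{1/2} . \tag{38} $$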
For ξ ≪ 1 the factor π 2 ξ 2 / sinh2 πξ is close to unity, the dependence of
the Gaunt factor on the unknown function ζ drops out, and we recover the
Born approximation (9). For ξ > 1 the factor π 2 ξ 2 / sinh2 πξ is exponentially
small, and this Gaunt factor reduces to the form (9) of the Born approxi-
mation, except for the factor 1/ζ in the argument of the logarithm. Since
ζ > 1, the Gaunt factor for ξ > 1 is always less than the Born approximation
value. In the limited range (10) where Eq. (8) is valid, we have $\zeta \simeq \xi e^\gamma \gg 1$,
and so here the critical angle is $\theta_c \simeq \xi e^\gamma \sqrt{2}\, h\nu/m_e v^2$. More generally, the
decrease in the Gaunt factor found in numerical calculations for ξ > 1 is ev-
idently due to a depletion of photon radiation from nearly forward electron
scattering.
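
The limiting behavior of Eq. (37) described above can be illustrated numerically. In the sketch below ζ is supplied as an input, since the text treats it as an unknown function; the variable u = 2hν/me v² and the function names are illustrative choices. The sketch checks that the formula returns the Born value (9) when ξ is small, whatever ζ is, and that it returns the Scheuer value (8) when ζ = ξe^γ with ξ well above unity.

```python
import math

EULER_GAMMA = 0.5772156649015329
PREFACTOR = math.sqrt(3) / math.pi


def suppression(xi):
    """pi^2 xi^2 / sinh^2(pi xi), the factor in Eq. (35)."""
    x = math.pi * xi
    if x < 1e-8:
        return 1.0      # limit as xi -> 0
    if x > 700.0:
        return 0.0      # avoids overflow in sinh; the factor is negligible
    return (x / math.sinh(x)) ** 2


def gaunt_two_region(u, xi, zeta):
    """Eq. (37) with u = 2 h nu / (m_e v^2); zeta >= 1 must be supplied."""
    # 2 m_e v^2 / (h nu zeta) = 4 / (u zeta)
    return PREFACTOR * (math.log(4.0 / (u * zeta))
                        + suppression(xi) * math.log(zeta))


if __name__ == "__main__":
    u = 1e-3
    born = PREFACTOR * math.log(4.0 / u)
    print("small xi:", gaunt_two_region(u, 1e-4, 5.0), "Born:", born)
    xi = 5.0
    zeta = xi * math.exp(EULER_GAMMA)
    scheuer = PREFACTOR * (math.log(4.0 / (xi * u)) - EULER_GAMMA)
    print("xi = 5:", gaunt_two_region(u, xi, zeta), "Scheuer:", scheuer)
```

For ξ = 5 the suppression factor is already of order 10⁻¹¹, so the second logarithm in Eq. (37) is negligible and the result lies below the Born value.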
I am grateful to Paul Shapiro for helpful conversations about bremsstrahlung
in astrophysics, and to Aaron Zimmerman and Sergi Albalat for informative
discussions of numerical calculations. This article is based on work
supported by the National Science Foundation under Grant Number
PHY-1620610, and with support from the Robert A. Welch Foundation, Grant
No. F-0014.

REFERENCES

1. H. Kramers, Phil. Mag. 46, 836 (1923).

2. W. J. Karzas and R. Latter, Astrophys. J. Suppl. 6, 167 (1961).

3. D. G. Hummer, Astrophys. J. 327, 472 (1988).

4. R. S. Sutherland, Mon. Not. Roy. Astron. Soc. 300, 321 (1998).

5. P. A. M. van Hoof et al., Mon. Not. Roy. Astron. Soc. 444, 420 (2014).

6. L. C. Biedenharn, Phys. Rev. 102, 262 (1955).

7. A. J. Sommerfeld, Atombau und Spektrallinien, Vol. II, Chapter 7,
Section 5 (Vieweg & Sohn, Braunschweig, 1939).

8. D. E. Osterbrock, Astrophysics of Gaseous Nebulae and Active Galactic
Nuclei (University Science Books, Mill Valley, CA, 1989).

9. L. Spitzer, Jr., Physical Processes in the Interstellar Medium (John
Wiley & Sons, Inc., New York, 1998).

10. B. T. Draine, Physics of the Interstellar and Intergalactic Medium
(Princeton University Press, Princeton, NJ, 2011).

11. W. J. Maciel, Astrophysics of the Interstellar Medium, transl. M.
Serote Roos (Springer Sciences, New York, 2013).

12. F. A. G. Scheuer, Mon. Not. Roy. Astron. Soc. 120, 231 (1960).
This formula was cited as a result of classical theory by L. Oster, Rev.
Mod. Phys. 33, 525 (1961).

13. S. N. Albalat and A. Zimmerman, paper in preparation.

14. P. J. Brussaard and H. C. van de Hulst, Rev. Mod. Phys. 34, 507
(1962).

15. For textbook discussions of the distorted wave Born approximation,
photon emission interactions, and Coulomb wave functions, see for
instance S. Weinberg, Lectures on Quantum Mechanics, 2nd ed. (Cambridge
University Press, Cambridge, UK, 2015), sections 8.6, 11.7, and 7.9.

16. Essentially the same Born approximation matrix element was obtained
by a direct use of old-fashioned second order perturbation theory in
a calculation of the inverse bremsstrahlung contribution to opacity by
H-Y. Chiu, Stellar Physics (Blaisdell Publishing Co., Waltham, MA,
1968), Sec. 5.11.

17. S. Weinberg, Phys. Rev. 140, B516 (1965).

UTTG-03-19

Absorption of Gravitational Waves from Distant Sources

Raphael Flauger∗
Department of Physics, University of California, San Diego
La Jolla, CA, 92093
arXiv:1906.04853v1 [hep-th] 11 Jun 2019

Steven Weinberg∗∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

The rate of gravitational wave absorption by inverse bremsstrahlung is calculated. It increases
with decreasing frequency ν as $\nu^{-3}$. Nevertheless, because of the near cancellation of absorption
by stimulated emission, the ionized gas in galaxy clusters does not block gravitational waves at the
nanohertz frequencies that may be detected by the use of pulsar timing observations.


∗ Electronic address: rflauger@ucsd.edu
∗∗ Electronic address: weinberg@physics.utexas.edu
I INTRODUCTION

The exciting discovery of gravitational waves from coalescing black holes [1] and neutron stars
[2] has naturally increased interest [3] in possible effects that intervening matter may have on these
waves. There is an old result of Hawking [4], that gives the rate of absorption as Γabs = 16πGη/c2 ,
where η is the viscosity of the matter, but this only applies if the collision frequency in the matter
is much greater than the frequency of the gravitational wave, which it typically is not. We have
recently studied the opposite extreme case of gravitational wave propagation through collisionless
matter [5], but we found no observable effects, except perhaps for cosmological sources. This
paper will consider a case different from either of these: the quantum mechanical absorption of low
frequency gravitational waves during collisions occurring in intervening matter.
Though we will keep our treatment of this effect as general as possible, in making approximations
we shall have in mind the natural application to clusters of galaxies. They are big, and contain
hot ionized gas with temperatures of a few keV and electron number densities of $10^{-3}$ cm$^{-3}$ to
$10^{-2}$ cm$^{-3}$. We will see that the absorption rate increases sharply with decreasing frequency,
so that it might be thought that clusters of galaxies could effectively block gravitational waves
with frequency less than a few hundred nanohertz, just the waves from binaries of supermassive
black holes in galaxy clusters that might otherwise be detected by pulsar timing observations [6].
Fortunately, it turns out that this absorption is almost entirely cancelled by stimulated emission,
and does not present an obstacle to this use of pulsar timing observations.

II ABSORPTION AND EMISSION

We begin with a reminder of the relation between the rates of emission and absorption of a soft
graviton or photon, which will allow us to use an old formula [7] for the rate of gravitational wave
emission in collisions to calculate the rate of gravitational wave absorption in the same collisions.
Suppose we write the rate of emission of photons or gravitons with momenta in a range d3 q around
q in collision processes in a volume V as

$$ d\Gamma_{\rm em} = V\, |M|^2\, d^3q , \tag{1} $$

where |M|2 is proportional to the thermal average of a sum over helicity of a squared matrix element
and products of densities of colliding particles. Crossing symmetry dictates that the matrix elements
for absorption and emission of a very soft neutral massless particle are the same apart from phases,
but because in absorption there is one more particle in the initial state the absorption rate has an
additional factor (2πh̄)3 /V , and of course it does not have the factor d3 q for the final photon or
graviton that appears in the emission rate. Hence the absorption rate is
$$ \Gamma_{\rm abs} = \frac{1}{2} (2\pi\hbar)^3 |M|^2 , \tag{2} $$
with the factor 1/2 appearing here because in calculating the absorption rate we average rather than
sum over helicity. The absorption rate can therefore conveniently be expressed in terms of a quantity
familiar in astrophysics, the emissivity jν , defined as the energy emitted per time, per volume, per
photon or graviton solid angle dΩ, and per frequency interval dν for frequencies in the range ν to
ν + dν. Since $d^3q = q^2\, dq\, d\Omega = (2\pi\hbar/c)^3 \nu^2\, d\nu\, d\Omega$, the emissivity is
$j_\nu = 2\pi\hbar\nu \times (2\pi\hbar/c)^3 \nu^2 |M|^2$,
and so the absorption rate is related to the emissivity by

$$ \Gamma_{\rm abs}(\nu) = \left( \frac{c^3}{4\pi\hbar\, \nu^3} \right) j_\nu . \tag{3} $$

This result is derived from quantum mechanics alone, without considerations of thermal equi-
librium. It is therefore limited to very low temperature, with kT ≪ 2πh̄ν, and in particular does
not take into account the effect of stimulated emission. To deal with the more general case it is
easiest to adopt the assumption (not entirely obvious for gravitons) that it is possible to bring the
radiation into thermal equilibrium with the medium at temperature T . In this case, the balance
between emission and net absorption (subtracting stimulated emission) of photons or gravitons
requires [8] that $j_\nu = \Gamma_{\rm net\ abs}(\nu)\, B(\nu)/4\pi$, where B(ν) is the black-body energy density per frequency
interval

$$ B(\nu) = \frac{16\pi^2 \hbar \nu^3}{c^3} \left[ e^{2\pi\hbar\nu/kT} - 1 \right]^{-1} . $$

In place of Eq. (3), we have then for general temperature

$$ \Gamma_{\rm net\ abs}(\nu) = \left( \frac{c^3}{4\pi\hbar\, \nu^3} \right) j_\nu \left[ e^{2\pi\hbar\nu/kT} - 1 \right] . $$

This result is derived in an appendix, under the assumption that the particles with which the
radiation interacts are in thermal equilibrium with one another, without needing to assume that
the radiation itself can be brought into equilibrium with these particles.
The emissivity contains a factor $e^{-2\pi\hbar\nu/kT}$, which combined with the factor $e^{2\pi\hbar\nu/kT} - 1$ in
the formula above for general temperature yields a factor $1 - e^{-2\pi\hbar\nu/kT}$, with the first and second terms
representing the effects of absorption and stimulated emission. Here we are concerned with the case of
very low frequency, for which 2πh̄ν ≪ kT , so whether or not we take account of the factor
$e^{-2\pi\hbar\nu/kT}$ in the emissivity, we have simply

$$ \Gamma_{\rm net\ abs}(\nu) = \left( \frac{c^3}{4\pi\hbar\, \nu^3} \right) j_\nu \left( \frac{2\pi\hbar\nu}{kT} \right) . \tag{4} $$
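
To give a sense of scale for the factor 2πh̄ν/kT, the sketch below evaluates it for sample values in the range mentioned in the introduction. The particular choices of 100 nHz and 5 keV are illustrative assumptions, as are the function names; the constants are the standard CGS values of the Planck constant and the keV.

```python
import math

H_PLANCK = 6.62607015e-27       # erg s (h = 2 pi hbar)
KEV_IN_ERG = 1.602176634e-9     # erg per keV


def photon_to_thermal_ratio(nu_hz, kT_kev):
    """x = 2 pi hbar nu / kT = h nu / kT, dimensionless."""
    return H_PLANCK * nu_hz / (kT_kev * KEV_IN_ERG)


def net_absorption_factor(x):
    """1 - exp(-x): absorption minus stimulated emission.

    math.expm1 keeps full precision when x is tiny, where the naive
    expression 1 - exp(-x) would round to zero.
    """
    return -math.expm1(-x)


if __name__ == "__main__":
    x = photon_to_thermal_ratio(nu_hz=1e-7, kT_kev=5.0)  # 100 nHz, 5 keV
    print("x =", x)
    print("1 - exp(-x) =", net_absorption_factor(x))
    print("naive evaluation:", 1.0 - math.exp(-x))
```

For these sample values x is of order 10⁻²⁵, so the net absorption is smaller than the pure absorption rate of Eq. (3) by that factor.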

III GRAVITON ABSORPTION

The rate for the production of graviton energy ≤ E in a single collision of some type α → β is
given [7] for small E by
$$ d\Gamma_{\alpha\to\beta}(\le E) \to (E/\Lambda)^B\, b(B)\, d\Gamma^0_{\alpha\to\beta} . \tag{5} $$
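Here

$$ B = \frac{G}{2\pi\hbar c} \sum_{n,m} \eta_n \eta_m m_n m_m\, \frac{1 + \beta_{nm}^2}{\beta_{nm} (1 - \beta_{nm}^2)^{1/2}} \ln\left( \frac{1 + \beta_{nm}}{1 - \beta_{nm}} \right) , \tag{6} $$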
where the sums run over all particles participating in the reaction α → β; mn is the rest mass of
the nth particle; ηn equals +1 or −1 for particles in the initial state α or final state β; and cβnm
is the velocity of either of particles n or m in the rest frame of the other particle:

$$ \beta_{nm} \equiv \left[ 1 - \frac{m_n^2 m_m^2 c^4}{(p_n \cdot p_m)^2} \right]^{1/2} . \tag{7} $$

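Also, b(B) is the function

$$ b(B) \equiv \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{\sin\sigma}{\sigma}\, d\sigma\; \exp\left[ B \int_0^1 \frac{d\omega}{\omega} \left( e^{i\omega\sigma} - 1 \right) \right] = 1 - \frac{\pi^2 B^2}{12} + \ldots , \tag{8} $$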
and dΓ0α→β is the differential rate for the same process without soft graviton emission and without
radiative corrections from virtual infrared gravitons, where Λ is a more-or-less arbitrary upper limit
on virtual graviton four-momenta that is used to define what we mean by “infrared.” (Λ will not
appear in our final results.) The differential rates dΓα→β (≤ E) and dΓ0α→β are rates for producing
the particles in the final state β in some infinitesimal element of their momentum spaces, the same
for both rates. (The formula given in natural units in ref. [7] has been modified here by inserting
a factor 1/h̄c to make B dimensionless in cgs units.) Since generally B ≪ 1, we can approximate
b(B) = 1 and write Eq. (5) as

$$ d\Gamma_{\alpha\to\beta}(\le E) \to \left[ 1 + B \ln(E/\Lambda) \right] d\Gamma^0_{\alpha\to\beta} , \tag{9} $$

and the rate of energy radiation per collision and per frequency interval at low frequency is then

$$ 2\pi\hbar\nu\, \frac{d}{d\nu}\, d\Gamma_{\alpha\to\beta}(\le 2\pi\hbar\nu) \to 2\pi\hbar B\, d\Gamma^0_{\alpha\to\beta} . \tag{10} $$

Formula (10) applies for relativistic or non-relativistic processes involving any number of par-
ticles of arbitrary spin, whatever the interactions may be that produce the reaction α → β. If we
now specialize to the case where α → β is non-relativistic elastic two-body scattering, and take
into account the conservation of energy and momentum, we have [7]
$$B = \frac{8G}{5\pi\hbar c^5}\,\mu^2 v^4\sin^2\theta_c , \tag{11}$$
where µ is the reduced mass, v ≡ |v1 − v2 | is the relative speed, and θc is the scattering angle in
the center-of-mass system. Eqs. (10) and (11) then give the emissivity at low frequency as

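$$j_\nu \to \frac{2\pi\hbar}{4\pi}\,\frac{8G\mu^2}{5\pi\hbar c^5}\; n_1 n_2\,\overline{v^5\sigma_D} , \tag{12}$$
where $n_1$ and $n_2$ are the number densities of the two colliding particles, $\sigma_D$ is a deflection cross
section
$$\sigma_D \equiv \int\frac{d\sigma}{d\Omega'}\,\sin^2\theta_c\,d\Omega' , \tag{13}$$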
and the bar indicates an average over incident velocities.
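To give a feeling for how small B is in practice, the following sketch evaluates Eq. (11) for one illustrative case. The choice of an electron of speed c/30 scattering through 90° (so that μ is close to the electron mass), the rounded CGS constants, and the function name `soft_graviton_B` are all assumptions made here for illustration.

```python
import math

# CGS constants (rounded values, assumed here for illustration)
G = 6.674e-8        # cm^3 g^-1 s^-2
HBAR = 1.0546e-27   # erg s
C = 2.998e10        # cm/s
M_E = 9.109e-28     # g

def soft_graviton_B(mu, v, theta_c):
    """Eq. (11): B = 8 G mu^2 v^4 sin^2(theta_c) / (5 pi hbar c^5)."""
    return 8.0 * G * mu**2 * v**4 * math.sin(theta_c)**2 / (5.0 * math.pi * HBAR * C**5)

if __name__ == "__main__":
    B = soft_graviton_B(M_E, C / 30.0, math.pi / 2)
    print(f"B = {B:.2e}")
```

For any such input B is many orders of magnitude below unity, which is what justifies setting b(B) = 1 in Eq. (9).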
Using Eq. (4) now gives a general formula for the net rate at which gravitational waves of low
frequency ν with 2πh̄ν ≪ kT are absorbed in non-relativistic two-body collisions
$$\Gamma_{\rm net\ abs}(\nu) \to \frac{G\mu^2}{5\pi^2\hbar c^2\nu^3}\; n_1 n_2\,\overline{v^5\sigma_D}\left(\frac{2\pi\hbar\nu}{kT}\right) , \tag{14}$$
the final factor 2πh̄ν/kT arising from the near cancellation of absorption by stimulated emission.
We now specialize further, and consider the Coulomb scattering of electrons by protons in fully
ionized hydrogen. Here µ is close to the electron mass me ; v is close to the initial electron velocity;
n1 = n2 is the electron number density ne , and the cross-section (13) is

$$\sigma_D = \frac{2\pi e^4}{m_e^2v^4}\int_{-1}^{+1}\frac{\sin^2\theta\;d\cos\theta}{\left[1-\cos\theta+\hbar^2/2m_e^2v^2\ell^2\right]^2} , \tag{15}$$
where e is the electron charge in unrationalized electrostatic units, and $\ell \equiv \sqrt{kT/4\pi n_e e^2}$ is the
Debye screening length. Eq. (15) is derived using the Born approximation, which applies in intra-
cluster gas because typical electron kinetic energies are much larger than a Rydberg, 13.6 eV. Also,
in Eq. (15) we are neglecting the difference 2πh̄ν between initial and final electron energies, which
is a good approximation because we are concerned with gravitational wave frequencies much less
than the plasma frequency, which in intracluster gas is a few hundred Hz.
As we shall see, in the cases that interest us here me vℓ/h̄ is very large for typical values of v,
and Eq. (15) therefore gives
$$\sigma_D = \frac{4\pi e^4}{m_e^2v^4}\left[-2+\ln\left(\frac{4m_e^2v^2\ell^2}{\hbar^2}\right)\right] . \tag{16}$$
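The passage from Eq. (15) to Eq. (16) can be checked numerically. The sketch below is an illustration under stated assumptions: δ stands for the screening term ħ²/2mₑ²v²ℓ², the closed form was obtained here by elementary integration in w = 1 − cos θ, and the function names are hypothetical.

```python
import math

def angular_integral_exact(delta):
    """Closed form of the integral of sin^2(t) dcos(t) / (1 - cos(t) + delta)^2 over [-1, 1]."""
    return -4.0 + 2.0 * (1.0 + delta) * math.log((2.0 + delta) / delta)

def angular_integral_simpson(delta, n=20000):
    """Same integral by composite Simpson's rule in w = 1 - cos(theta), w in [0, 2]."""
    def f(w):
        return (2.0 * w - w * w) / (w + delta) ** 2
    h = 2.0 / n  # n must be even for Simpson's rule
    total = f(0.0) + f(2.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def angular_integral_asymptotic(delta):
    """Leading small-delta form behind Eq. (16): 2 * (-2 + ln(2/delta))."""
    return 2.0 * (-2.0 + math.log(2.0 / delta))

if __name__ == "__main__":
    print(angular_integral_exact(0.01), angular_integral_simpson(0.01))
    print(angular_integral_exact(1e-12), angular_integral_asymptotic(1e-12))
```

With 2/δ = 4mₑ²v²ℓ²/ħ², multiplying the asymptotic form by 2πe⁴/mₑ²v⁴ reproduces Eq. (16).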

The thermal average in Eq. (14) is then, for $2\pi\hbar\nu \ll kT$,
$$\overline{v^5\sigma_D} = \frac{8e^4(2\pi kT)^{1/2}}{m_e^{5/2}}\left[\ln\left(\frac{8m_e\ell^2kT}{\hbar^2}\right)-1-\gamma\right] , \tag{17}$$
where γ = 0.577 . . . is the Euler constant.
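The thermal average in Eq. (17) follows from Eq. (16) in a few lines. The sketch below assumes a Maxwell–Boltzmann distribution of electron velocities and writes $x \equiv m_ev^2/2kT$; $h(x)$ is a generic function introduced here only for the intermediate step.

```latex
\begin{aligned}
v^5\sigma_D &= \frac{4\pi e^4}{m_e^2}\,v\left[-2+\ln\!\left(\frac{4m_e^2v^2\ell^2}{\hbar^2}\right)\right]
             = \frac{4\pi e^4}{m_e^2}\,v\left[-2+\ln\!\left(\frac{8m_e\ell^2kT}{\hbar^2}\right)+\ln x\right],\\[4pt]
\overline{v\,h(x)} &= \left(\frac{8kT}{\pi m_e}\right)^{1/2}\int_0^\infty x\,e^{-x}\,h(x)\,dx ,\qquad
\int_0^\infty x\,e^{-x}\,dx = 1,\qquad \int_0^\infty x\,e^{-x}\ln x\,dx = 1-\gamma ,\\[4pt]
\overline{v^5\sigma_D} &= \frac{4\pi e^4}{m_e^2}\left(\frac{8kT}{\pi m_e}\right)^{1/2}
\left[\ln\!\left(\frac{8m_e\ell^2kT}{\hbar^2}\right)-1-\gamma\right]
= \frac{8e^4(2\pi kT)^{1/2}}{m_e^{5/2}}\left[\ln\!\left(\frac{8m_e\ell^2kT}{\hbar^2}\right)-1-\gamma\right].
\end{aligned}
```

The combination $-2 + (1-\gamma)$ is the origin of the $-1-\gamma$ in the square brackets.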


Before drawing consequences from Eqs. (14) and (17), we need to check that the absorption
occurs in independent collisions, rather than in an imperfect fluid as was assumed in ref. [4]. So we
need to ask, what is the rate νC of relevant collisions? The absorption of gravitons in a collision of an
electron with a proton is unaffected if at the same time the electron experiences forward scattering
by the Coulomb field of some distant other proton, so the cross section to use in estimating νC is
not the total cross section, but something like the deflection cross section σD . Also, we are not
interested in collisions of very slow electrons which, because the factor $v^4$ in Eq. (11) cancels the
factor $v^{-4}$ in Eq. (15), contribute little to gravitational wave emission or absorption. Therefore
instead of taking $\nu_C$ as the thermal average of $n_ev\sigma_D$, we will take it as the average weighted with
an additional factor $v^4$:
$$\nu_C = n_e\,\overline{v^5\sigma_D}\,\big/\,\overline{v^4} , \tag{18}$$
where the bar again indicates an ordinary thermal average. (It would make little difference numerically
if we weighted the average over velocity with any power $v^n$ with n ≥ 1 instead of $v^4$.) The
argument of the logarithm is
$$\frac{8m_e\ell^2kT}{\hbar^2} = \frac{2m_e(kT)^2}{\pi e^2n_e\hbar^2} = 6\times10^{30}\,\Big[kT({\rm keV})\Big]^2\Big[n_e(10^{-3}\,{\rm cm}^{-3})\Big]^{-1} . \tag{19}$$
This is so large that changes of a few orders of magnitude in kT or ne make little difference
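in the logarithm, so we shall fix the quantity in square brackets in Eq. (17) to have the value
$\ln(6\times10^{30}) - 1 - \gamma = 69.3$. The effective collision frequency (18) is then
$$\begin{aligned}
\nu_C &= \frac{4}{15}\left(\frac{m_e}{2kT}\right)^2\times\frac{8n_ee^4(2\pi kT)^{1/2}}{m_e^{5/2}}\left[\ln\left(\frac{8m_e\ell^2kT}{\hbar^2}\right)-1-\gamma\right]\\
&= 2.5\times10^{-12}\ {\rm sec}^{-1}\,\Big[n_e(10^{-3}\,{\rm cm}^{-3})\Big]\Big[kT({\rm keV})\Big]^{-3/2} .
\end{aligned}\tag{20}$$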
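The numbers in Eqs. (19) and (20) can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: rounded CGS constants, a Maxwell–Boltzmann average $\overline{v^4} = 15(kT/m_e)^2$ (which matches the prefactor in Eq. (20)), and hypothetical function names. Unlike the text, it evaluates the logarithm at the given temperature and density rather than fixing the bracket at 69.3.

```python
import math

# CGS constants (rounded values, assumed for illustration)
HBAR = 1.0546e-27     # erg s
M_E = 9.109e-28       # g
E_CHARGE = 4.803e-10  # esu
KEV = 1.602e-9        # erg per keV
EULER_GAMMA = 0.5772156649

def log_argument(kT_keV, ne_milli):
    """Eq. (19): 2 m_e (kT)^2 / (pi e^2 n_e hbar^2); n_e in units of 1e-3 cm^-3."""
    kT = kT_keV * KEV
    ne = ne_milli * 1e-3
    return 2.0 * M_E * kT**2 / (math.pi * E_CHARGE**2 * ne * HBAR**2)

def collision_frequency(kT_keV, ne_milli):
    """Eq. (20): nu_C = n_e * avg(v^5 sigma_D) / avg(v^4), in s^-1."""
    kT = kT_keV * KEV
    ne = ne_milli * 1e-3
    bracket = math.log(log_argument(kT_keV, ne_milli)) - 1.0 - EULER_GAMMA
    v5_sigma = 8.0 * E_CHARGE**4 * math.sqrt(2.0 * math.pi * kT) / M_E**2.5 * bracket
    v4 = 15.0 * (kT / M_E) ** 2  # Maxwell-Boltzmann average of v^4
    return ne * v5_sigma / v4

if __name__ == "__main__":
    print(f"log argument = {log_argument(1.0, 1.0):.2e}")
    print(f"nu_C         = {collision_frequency(1.0, 1.0):.2e} s^-1")
```

Because the dependence on kT and nₑ inside the logarithm is so weak, the computed bracket stays close to the fixed value 69.3 over a wide range of conditions.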

There is still a question, whether the appropriate condition that allows us to treat the collisions
in which gravitons are absorbed as independent is that electrons experience many cycles of the
gravitational wave between collisions, which requires that ν ≫ νC , or that electrons pass through
many gravitational wavelengths between collisions, which requires that the mean free path v/νC
be much longer than the wavelength c/ν for typical electron velocities v, or in other words, that
ν ≫ (c/v)νC . Since v < c the second condition is always more stringent. For kT ≃ 1 keV electrons
typically have v/c ≃ 1/30, so for relevant temperatures and densities (c/v)νC is sufficiently less
than the gravitational wave frequencies we will consider so that we can use Eq. (14) for the graviton
absorption rate.
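Eqs. (14) and (17) now give the net absorption rate for $2\pi\hbar\nu \ll kT$:
$$\begin{aligned}
\Gamma_{\rm net\ abs}(\nu) &= \frac{Gm_e^2}{5\pi^2\hbar c^2\nu^3}\,n_e^2\times\frac{8e^4(2\pi kT)^{1/2}}{m_e^{5/2}}\left[\ln\left(\frac{8m_e\ell^2kT}{\hbar^2}\right)-1-\gamma\right]\Big(2\pi\hbar\nu/kT\Big)\\
&= \frac{1.4\times10^{-34}\ {\rm sec}^{-1}\,\Big[n_e(10^{-3}\,{\rm cm}^{-3})\Big]^2\Big[kT({\rm keV})\Big]^{1/2}}{\Big[\nu({\rm sec}^{-1})\Big]^3}\,\Big(2\pi\hbar\nu/kT\Big) .
\end{aligned}\tag{21}$$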
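The numerical coefficient in Eq. (21) can be checked directly from the first line of that equation. The sketch below uses rounded CGS constants and the fixed bracket value 69.3 from the text; the function names are hypothetical.

```python
import math

# CGS constants (rounded values, assumed for illustration)
G = 6.674e-8          # cm^3 g^-1 s^-2
HBAR = 1.0546e-27     # erg s
C = 2.998e10          # cm/s
M_E = 9.109e-28       # g
E_CHARGE = 4.803e-10  # esu
KEV = 1.602e-9        # erg per keV

def gross_absorption_rate(nu_hz, kT_keV=1.0, ne_milli=1.0, bracket=69.3):
    """Eq. (21) without the final factor 2*pi*hbar*nu/kT, in s^-1."""
    kT = kT_keV * KEV
    ne = ne_milli * 1e-3
    v5_sigma = 8.0 * E_CHARGE**4 * math.sqrt(2.0 * math.pi * kT) / M_E**2.5 * bracket
    prefactor = G * M_E**2 / (5.0 * math.pi**2 * HBAR * C**2 * nu_hz**3)
    return prefactor * ne**2 * v5_sigma

def net_absorption_rate(nu_hz, kT_keV=1.0, ne_milli=1.0, bracket=69.3):
    """Full Eq. (21), including the stimulated-emission suppression factor."""
    x = 2.0 * math.pi * HBAR * nu_hz / (kT_keV * KEV)
    return gross_absorption_rate(nu_hz, kT_keV, ne_milli, bracket) * x

if __name__ == "__main__":
    print(f"coefficient at nu = 1 Hz: {gross_absorption_rate(1.0):.2e} s^-1")
```

Evaluating `gross_absorption_rate(1.0)` at the fiducial density and temperature gives the coefficient quoted in the second line of Eq. (21), to the precision of the rounded constants.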

If it were not for the cancellation of absorption by stimulated emission, represented by the final
factor $2\pi\hbar\nu/kT$, the mean distance $c/\Gamma_{\rm abs}$ for graviton absorption in fully ionized hydrogen with
density $n_e \simeq 10^{-3}\,{\rm cm}^{-3}$ and temperature kT ≃ 1 keV would be less than 1 Mpc at frequencies
less than 240 nanohertz. This covers the range of frequencies of gravitational waves that might be
detected by observation of pulsar timing [6]. Fortunately, for kT ≈ 1 keV and ν ≈ 200 nanohertz
the net absorption is suppressed by the factor $2\pi\hbar\nu/kT \approx 10^{-24}$, and has no relevant effect on
gravitational wave propagation. Indeed, if nanohertz gravitational waves are observed coming from
galaxy clusters, it will show that gravitons like photons are produced by stimulated emission.
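The two figures quoted in this paragraph, the 240 nanohertz threshold and the suppression factor of order 10⁻²⁴, follow from the coefficient in Eq. (21). The sketch below takes that coefficient as given for nₑ = 10⁻³ cm⁻³ and kT = 1 keV; the rounded constants, the megaparsec conversion, and the function names are assumptions made for illustration.

```python
import math

C = 2.998e10        # cm/s
HBAR = 1.0546e-27   # erg s
KEV = 1.602e-9      # erg per keV
MPC = 3.086e24      # cm per megaparsec
COEFF = 1.4e-34     # s^-1, coefficient of Eq. (21) at n_e = 1e-3 cm^-3, kT = 1 keV

def gross_mean_free_path_mpc(nu_hz):
    """Mean distance c / Gamma_abs, ignoring stimulated emission, in Mpc."""
    return C / (COEFF / nu_hz**3) / MPC

def threshold_frequency_hz(distance_mpc=1.0):
    """Frequency below which the gross mean free path is shorter than distance_mpc."""
    return (COEFF * distance_mpc * MPC / C) ** (1.0 / 3.0)

def suppression(nu_hz, kT_keV=1.0):
    """The factor 2*pi*hbar*nu/kT from the near cancellation by stimulated emission."""
    return 2.0 * math.pi * HBAR * nu_hz / (kT_keV * KEV)

if __name__ == "__main__":
    print(f"threshold frequency: {threshold_frequency_hz() * 1e9:.0f} nHz")
    print(f"suppression at 200 nHz: {suppression(2e-7):.1e}")
```

Dividing the gross mean free path by the suppression factor gives the net absorption length, which is enormously larger than any cosmological distance.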
Since gravitational interactions are universal, gravitational waves may also be absorbed in in-
tergalactic space in collisions other than the electron-proton collisions considered here, such as
collisions of possible dark matter particles that interact strongly with one another. The net ab-
sorption rate in any non-relativistic elastic two-body collisions may be calculated using Eq. (14),
or in more general collisions by using Eqs. (4), (10), and (6).

Appendix

In this appendix we will derive a general formula for the rate of change of the occupation number
n(q, λ) of gravitons or photons interacting with a hot gas, that will exhibit the effects of stimulated
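emission as well as absorption and spontaneous emission. The occupation number is defined so
that $n(\mathbf q,\lambda)\,d^3q/(2\pi\hbar)^3$ is the number density of gravitons or photons of helicity λ in a volume $d^3q$
of momentum space around momentum q. Its rate of change due to absorption and spontaneous
and stimulated emission in collisions of particles 1, 2, . . . is
$$\begin{aligned}
\frac{dn(\mathbf q,\lambda)}{dt} = {}& -\frac{n(\mathbf q,\lambda)}{2\pi\hbar}\int d^3p_1\,d^3p_2\cdots d^3p'_1\,d^3p'_2\cdots\; n_1(\mathbf p_1)n_2(\mathbf p_2)\cdots\\
&\times\delta^3(\mathbf q+\mathbf p_1+\mathbf p_2+\dots-\mathbf p'_1-\mathbf p'_2-\dots)\,\delta(|\mathbf q|c+E_1+E_2+\dots-E'_1-E'_2-\dots)\\
&\times\left|M(\lambda,\mathbf q,\mathbf p_1,\mathbf p_2\ldots\to\mathbf p'_1,\mathbf p'_2\ldots)\right|^2\\
&+\frac{1+n(\mathbf q,\lambda)}{2\pi\hbar}\int d^3p_1\,d^3p_2\cdots d^3p'_1\,d^3p'_2\cdots\; n_1(\mathbf p_1)n_2(\mathbf p_2)\cdots\\
&\times\delta^3(\mathbf p_1+\mathbf p_2+\dots-\mathbf q-\mathbf p'_1-\mathbf p'_2-\dots)\,\delta(E_1+E_2+\dots-|\mathbf q|c-E'_1-E'_2-\dots)\\
&\times\left|M(\mathbf p_1,\mathbf p_2\ldots\to\lambda,\mathbf q,\mathbf p'_1,\mathbf p'_2\ldots)\right|^2 .
\end{aligned}\tag{22}$$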

Here M is the coefficient of the energy and momentum conservation delta functions in the S-matrix
element for the indicated process, and n1 , n2 , etc. are the occupation numbers for the colliding
particles. We assume that n1 , n2 , etc. are all much less than unity, so that we do not need to
take account of the Pauli exclusion principle where the colliding particles are fermions, or of the
stimulated emission of the colliding particles if they are bosons. As usual, the factor 1 + n(q)
arises from the commutator of the photon or graviton creation operator with the n + 1 annihilation
operators in the adjoint of the final state. The matrix element M and the occupation numbers
of the colliding particles depend on spin indices, which are suppressed; they are understood to be
summed along with integrations over momenta.
We now interchange the labels of the momenta (and spins) of the colliding particles in the
first term of Eq. (22), and make use of the unitarity of the S-matrix, which implies that for any
multiparticle transition
$$\int d\beta\;\delta^3(\mathbf p_\beta-\mathbf p_\alpha)\,\delta(E_\beta-E_\alpha)\left|M(\beta\to\alpha)\right|^2 = \int d\beta\;\delta^3(\mathbf p_\beta-\mathbf p_\alpha)\,\delta(E_\beta-E_\alpha)\left|M(\alpha\to\beta)\right|^2 , \tag{23}$$
where $\int d\beta$ is understood to include a sum over the spins of all particles in the state β as well as
an integration over all 3-momenta of these particles. We then have

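$$\begin{aligned}
\frac{dn(\mathbf q,\lambda)}{dt} = {}& \frac{1}{2\pi\hbar}\int d^3p_1\,d^3p_2\cdots d^3p'_1\,d^3p'_2\cdots\\
&\times\delta^3(\mathbf p_1+\mathbf p_2+\dots-\mathbf q-\mathbf p'_1-\mathbf p'_2-\dots)\,\delta(E_1+E_2+\dots-|\mathbf q|c-E'_1-E'_2-\dots)\\
&\times\left|M(\mathbf p_1,\mathbf p_2\ldots\to\lambda,\mathbf q,\mathbf p'_1,\mathbf p'_2\ldots)\right|^2\\
&\times\Big[-n(\mathbf q,\lambda)\,n_1(\mathbf p'_1)n_2(\mathbf p'_2)\cdots+\big(1+n(\mathbf q,\lambda)\big)\,n_1(\mathbf p_1)n_2(\mathbf p_2)\cdots\Big] .
\end{aligned}\tag{24}$$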

We now assume that the colliding particles are in thermal equilibrium with each other, though
not necessarily with the photons or gravitons. Since we are assuming that their occupation numbers
are small, for E1 + E2 + . . . = |q|c + E1′ + E2′ + . . . we have

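$$\frac{n_1(\mathbf p_1)n_2(\mathbf p_2)\cdots}{n_1(\mathbf p'_1)n_2(\mathbf p'_2)\cdots} = \exp\big(-|\mathbf q|c/kT\big) = \exp\big(-2\pi\hbar\nu/kT\big) \tag{25}$$
so
$$\begin{aligned}
\frac{dn(\mathbf q,\lambda)}{dt} = {}& \frac{1}{2\pi\hbar}\int d^3p_1\,d^3p_2\cdots d^3p'_1\,d^3p'_2\cdots\\
&\times\delta^3(\mathbf p_1+\mathbf p_2+\dots-\mathbf q-\mathbf p'_1-\mathbf p'_2-\dots)\,\delta(E_1+E_2+\dots-|\mathbf q|c-E'_1-E'_2-\dots)\\
&\times\left|M(\mathbf p_1,\mathbf p_2\ldots\to\lambda,\mathbf q,\mathbf p'_1,\mathbf p'_2\ldots)\right|^2 n_1(\mathbf p_1)n_2(\mathbf p_2)\cdots\\
&\times\Big[1-n(\mathbf q,\lambda)\Big[\exp\big(2\pi\hbar\nu/kT\big)-1\Big]\Big] .
\end{aligned}\tag{26}$$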

The first term +1 in the square brackets on the last line arises from spontaneous emission, while
the terms $-\exp\big(2\pi\hbar\nu/kT\big)$ and +1 multiplying n(q, λ) arise respectively from absorption and
stimulated emission. The important point for this paper is that the ratio of stimulated emission to
absorption is $\exp\big(-2\pi\hbar\nu/kT\big)$.
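As a consistency check on Eq. (26), offered here as a brief worked step rather than part of the original derivation, setting the square bracket to zero gives the stationary occupation number:

```latex
1 - n(\mathbf q,\lambda)\Big[\exp\big(2\pi\hbar\nu/kT\big) - 1\Big] = 0
\quad\Longrightarrow\quad
n(\mathbf q,\lambda) = \frac{1}{\exp\big(2\pi\hbar\nu/kT\big) - 1} ,
```

which is the Planck distribution at the temperature of the colliding particles. For occupation numbers far below this value the +1 (spontaneous emission) dominates, while for larger occupation numbers net absorption wins, reduced by the factor $1 - \exp(-2\pi\hbar\nu/kT)$ relative to absorption alone.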

We are grateful to Aaron Zimmerman for helpful conversations about pulsar timing observations
and other matters. This article is based on work of R. F. supported in part by the Alfred
P. Sloan Foundation, the Department of Energy under grant DE-SC0009919, and the Simons
Foundation/SFARI 560536, and of S. W. supported by the National Science Foundation under Grant
Number PHY-1620610, and with support from the Robert A. Welch Foundation, Grant No. F-0014.

REFERENCES

1. B. P. Abbott et al. [LIGO and Virgo collaborations], Phys. Rev. Lett. 116, 061102 (2016)
[arXiv:1602.03837].

2. B. P. Abbott et al. (LIGO Scientific Collaboration, Virgo Collaboration, and other collaborations),
Astrophys. J. 848, L12 (2017).

3. For instance, see E. Calabrese, N. Battaglia, and D. N. Spergel, Class. Quant. Grav. 33,
165004 (2016).

4. S. W. Hawking, Ap. J. 145, 544 (1966).

5. R. Flauger and S. Weinberg, Phys. Rev. D 75, 123505 (2007).

6. S. Detweiler, Ap. J. 234, 1100 (1979). For a current review, see S. Burke-Spolaor et al.,
arXiv:1811.08826.

7. S. Weinberg, Phys. Rev. 140, B516 (1965).

8. For instance, see F. H. Shu, The Physics of Astrophysics, Vol. 1 (University Science Books,
Mill Valley, CA, 1991).

UTTG-01-08
arXiv:0804.4291v2 [hep-th] 15 May 2008

Effective Field Theory for Inflation

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

The methods of effective field theory are used to study generic theories of
inflation with a single inflaton field. For scalar modes, the leading corrections
to the R correlation function are found to be purely of the k-inflation type.
For tensor modes the leading corrections to the correlation function arise
from terms in the action that are quadratic in the curvature, including a
parity-violating term that makes the propagation of these modes depend on
their helicity. These methods are also briefly applied to non-generic theories
of inflation with an extra shift symmetry, as in so-called ghost inflation.


Electronic address: weinberg@physics.utexas.edu

I. Generic Theories of Inflation

Observations of the cosmic microwave background and large scale structure
are consistent with a simple theory of inflation[1] with a single canonically
normalized inflaton field $\varphi_c(x)$, described by a Lagrangian
$$\mathcal L_0 = \sqrt g\left[-\frac{M_P^2}{2}R - \frac12 g^{\mu\nu}\partial_\mu\varphi_c\partial_\nu\varphi_c - V(\varphi_c)\right] , \tag{1}$$

where $g \equiv -{\rm Det}\,g_{\mu\nu}$, $M_P \equiv 1/\sqrt{8\pi G}$ is the reduced Planck mass, and $V(\varphi_c)$
is a potential down which the scalar field rolls more-or-less slowly. With this
theory, the strength of observed fluctuations in the microwave background
matter density indicates that the cosmic expansion rate $H \equiv \dot a/a$ and the
physical wave number k/a at horizon exit, when these are equal, have the
value[2] $H = k/a \approx \sqrt{\epsilon}\times2\times10^{14}$ GeV, where ε is the value of $-\dot H/H^2$
at this time, and a is the Robertson–Walker scale factor. Hence H and
k/a at horizon exit are likely to be much less than $M_P \simeq 2.4\times10^{18}$ GeV,
and even considerably less than a plausible grand unification scale ≈ $10^{16}$
GeV. This provides a justification after the fact for using a Lagrangian (1)
with a minimum number of spacetime derivatives. (As is well known, (1)
is the most general Lagrangian density for gravitation and a single scalar
field with no more than two spacetime derivatives. An arbitrary function
of ϕ multiplying the first term could be eliminated by a redefinition of the
metric, and an arbitrary function of ϕ multiplying the second term could be
eliminated by a redefinition of ϕ.)
But H and k/a at horizon exit are not entirely negligible compared with
whatever fundamental scale characterizes the theory underlying inflation,
and at earlier times k/a is exponentially larger than at horizon exit, so it
is worth considering the next corrections to (1). We assume that (1) is
just the first term in a generic effective field theory, in which terms with
higher derivatives are suppressed by negative powers of some large mass
M , characterizing whatever fundamental theory underlies this effective field
theory. Rather than committing ourselves to any particular underlying the-
ory, we will simply assume that all constants in the higher derivative terms
of the effective Lagrangian take values that are powers of M indicated by
dimensional analysis, with coefficients roughly of order unity. Because H
and k/a are so large during inflation, observations of fluctuations produced
during inflation provide a unique opportunity for detecting effects of higher
derivative terms in the gravitational action.

To get some idea of the value of M, we note that the unperturbed canonically
normalized scalar field $\bar\varphi_c$ described by the Lagrangian (1) has a time
derivative $\dot{\bar\varphi}_c = \sqrt{2\epsilon}\,M_PH$, so the change in $\bar\varphi_c$ during a Hubble time 1/H at
around the time of horizon exit is of order $\dot{\bar\varphi}_c/H = \sqrt{2\epsilon}\,M_P$. If we are to use
effective field theory to study fluctuations at about the time of horizon exit
in generic theories in which the dependence of the action on $\varphi_c$ is unconstrained
by symmetry principles or by other consequences of an underlying
theory, and if (1) is at least a fair first approximation to the full theory, then
the mass M that is characteristic of the effective field theory of inflation cannot
be much smaller than $\sqrt{2\epsilon}\,M_P$, for if it were then there would be no limit
on the size of higher-derivative terms containing many powers of $\varphi_c/M$. It
follows that the expansion parameter H/M in this class of theories is no
greater than $H/\sqrt{2\epsilon}\,M_P \simeq 6\times10^{-5}$, whatever the value of ε.
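The statement that the bound on H/M does not depend on ε can be made concrete with a short sketch. It assumes the horizon-exit value H ≈ √ε × 2 × 10¹⁴ GeV and the reduced Planck mass 2.4 × 10¹⁸ GeV, as read from the text above; the names below are hypothetical.

```python
import math

M_PLANCK_GEV = 2.4e18          # reduced Planck mass, as quoted in the text
H_OVER_SQRT_EPS_GEV = 2.0e14   # H / sqrt(epsilon) at horizon exit (assumed reading)

def hubble_rate_gev(eps):
    """Expansion rate at horizon exit for slow-roll parameter eps, in GeV."""
    return math.sqrt(eps) * H_OVER_SQRT_EPS_GEV

def expansion_parameter_bound(eps):
    """Upper bound H / (sqrt(2 eps) M_P) on the expansion parameter H/M."""
    return hubble_rate_gev(eps) / (math.sqrt(2.0 * eps) * M_PLANCK_GEV)

if __name__ == "__main__":
    for eps in (0.02, 1e-3, 1e-6):
        print(f"eps={eps:g}  H={hubble_rate_gev(eps):.2e} GeV  bound={expansion_parameter_bound(eps):.2e}")
```

The factor √ε cancels between H and √(2ε) M_P, so the bound is the same for every ε.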
We will tentatively assume here that M is of order $\sqrt{2\epsilon}\,M_P$, in which case
the coefficients of the higher-derivative terms in the effective Lagrangian
have to be taken as arbitrary functions of $\varphi_c/M$. This is likely to be the
case if ε is not too small, say of order 0.02, since then there is not much
difference between $\sqrt{2\epsilon}\,M_P$ and $M_P$, and M is unlikely to be much larger
than $M_P$. (The considerations presented below would still be valid if M
were instead much larger than $\sqrt{2\epsilon}\,M_P$, as for instance if $M \approx M_P$ and ε
is very small, but then we would have to count powers of $\varphi_c/M$ as well as
numbers of derivatives in judging how much the various higher-derivative
terms are suppressed, and some of the coefficient functions in the effective
Lagrangian derived below would be negligible, and all others very simple.)
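From now on we will work with a dimensionless scalar field $\varphi \equiv \varphi_c/M$, and
write (1) as
$$\mathcal L_0 = \sqrt g\left[-\frac{M_P^2}{2}R - \frac{M^2}{2}g^{\mu\nu}\partial_\mu\varphi\partial_\nu\varphi - M_P^2U(\varphi)\right] , \tag{2}$$
where $U(\varphi) \equiv V(M\varphi)/M_P^2$. Note that the unperturbed value of U is $(3-\epsilon)H^2$,
so we can think of U as well as $\partial_\mu\varphi\partial^\mu\varphi$ as both being of order $H^2$ at
horizon exit.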
The leading correction to (2) will consist of a sum of all generally covari-
ant terms with four spacetime derivatives and coefficients of order unity[3].
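By a judicious weeding out of total derivatives, the most general such
correction term can be put in the form[4]
$$\begin{aligned}
\Delta\mathcal L = \sqrt g\,\Big[\,& f_1(\varphi)\left(g^{\mu\nu}\varphi_{,\mu}\varphi_{,\nu}\right)^2 + f_2(\varphi)\,g^{\rho\sigma}\varphi_{,\rho}\varphi_{,\sigma}\Box\varphi
+ f_3(\varphi)\left(\Box\varphi\right)^2\\
& + f_4(\varphi)\,R^{\mu\nu}\varphi_{,\mu}\varphi_{,\nu} + f_5(\varphi)\,R\,g^{\mu\nu}\varphi_{,\mu}\varphi_{,\nu}
+ f_6(\varphi)\,R\,\Box\varphi + f_7(\varphi)\,R^2\\
& + f_8(\varphi)\,R^{\mu\nu}R_{\mu\nu} + f_9(\varphi)\,C^{\mu\nu\rho\sigma}C_{\mu\nu\rho\sigma}\Big]
+ f_{10}(\varphi)\,\epsilon^{\mu\nu\rho\sigma}C_{\mu\nu}{}^{\kappa\lambda}C_{\rho\sigma\kappa\lambda} ,
\end{aligned}\tag{3}$$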


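where as usual commas denote ordinary derivatives and semicolons denote
covariant derivatives; $\Box\varphi \equiv g^{\mu\nu}\varphi_{,\mu;\nu}$ is the invariant d’Alembertian of ϕ;
$\epsilon^{\mu\nu\rho\sigma}$ is the totally antisymmetric tensor density with $\epsilon^{1230} \equiv +1$; and the
$f_n(\varphi)$ are dimensionless functions, treated here as of order unity. In the last
two terms, instead of the Riemann–Christoffel tensor $R_{\mu\nu\rho\sigma}$, we have used
the Weyl tensor
$$C_{\mu\nu\rho\sigma} \equiv R_{\mu\nu\rho\sigma} - \frac12\Big(g_{\mu\rho}R_{\nu\sigma} - g_{\mu\sigma}R_{\nu\rho} - g_{\nu\rho}R_{\mu\sigma} + g_{\nu\sigma}R_{\mu\rho}\Big) + \frac{R}{6}\Big(g_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}g_{\mu\sigma}\Big) . \tag{4}$$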
Writing the last two terms in Eq. (3) as bilinears in Cµνρσ rather than Rµνρσ
has no effect in the last term, and in the penultimate term of course just
amounts to a different definition of f7 and f8 . (Similarly, instead of writing
the penultimate term as a bilinear in Cµνρσ or Rµνρσ , we could have written
it as the linear combination of curvature bilinears that appears in the Gauss–
Bonnet identity; even though this linear combination is a total derivative, it
would affect the field equations because its coefficient f9 (ϕ) is not constant.)
Our reason for choosing to use the Weyl tensor in the last two terms will
become apparent soon.
The correction term (3) involves second time derivatives, as well as fields
and their first time derivatives. If we took such a theory literally, we would
find more than just the usual two adiabatic modes for single-field inflation,
and the commutation relations (as given by the Ostrogradski formalism[5])
would be bizarre, with ϕ commuting with ϕ̇. (For instance, Kallosh, Kang,
Linde, and Mukhanov[6] encounter such additional modes when the Ostro-
gradski formalism is applied to a scalar field Lagrangian involving second
time derivatives.) Similarly, there are metric components (such as g00 and
g0i in the ADM formalism[7]) whose time derivatives do not appear in L0 ,
but that do appear in ∆L. If we were to take L0 +∆L as the full Lagrangian,
then the correction term ∆L would cause these auxiliary fields to become
dynamical, with a further expansion of the modes of the system.
Instead, we should remember that from the point of view of effective field
theory, Eqs. (2) and (3) represent just the lowest two terms in an expansion

in inverse powers of M , so we must rule out any modes that cannot be
expanded in this way[8]. This means in particular that we must eliminate
all second time derivatives and time derivatives of auxiliary fields in the first
correction terms in the effective action by using the field equations derived
from the leading terms in the action.1 In the present case, we must eliminate
second time derivatives and time derivatives of auxiliary fields in (3) by using
the zero-th order field equations derived from (2):

$$M^2\,\Box\varphi = M_P^2\,U'(\varphi) , \qquad R_{\mu\nu} = -(M^2/M_P^2)\,\varphi_{,\mu}\varphi_{,\nu} - U(\varphi)\,g_{\mu\nu} . \tag{5}$$

Using these field equations in Eq. (3) allows us, with some redefini-
tions, to eliminate all of the terms in (3) except the first one and the last
two. Specifically, the second term in Eq. (3) just provides a field-dependent
¹ This is equivalent to what is generally done in deriving Feynman rules in effective flat-space
quantum field theories. Consider for instance the very simple effective Lagrangian
$$\mathcal L = -\frac12\left[\partial_\mu\varphi\,\partial^\mu\varphi + m^2\varphi^2 + M^{-2}(\Box\varphi)^2\right] + J\varphi$$
where M ≫ m is some very large mass, and J is a c-number external current. We can
easily find the connected part Γ of the vacuum persistence amplitude:
$$\Gamma = i\int d^4k\,\frac{|J(k)|^2}{k^2+m^2+k^4/M^2} .$$
If we took this result seriously, then we would conclude that in addition to the usual
particle with mass $m + O(m^3/M^2)$, the theory contains an unphysical one-particle state
with mass $M + O(m^2/M)$. But if we regard $\mathcal L$ as just the first two terms in a power series
in $1/M^2$, then we must treat the term $M^{-2}(\Box\varphi)^2$ as a first-order perturbation, so that
the vacuum persistence amplitude is
$$\Gamma = i\int d^4k\,|J(k)|^2\left[\frac{1}{k^2+m^2} - \frac{k^4}{M^2(k^2+m^2)^2} + \dots\right] ,$$
and the only pole is at $k^2 = -m^2$. This is just the same result for Γ that we would find
if we were to eliminate the second time derivatives in the $O(M^{-2})$ term in $\mathcal L$ by using the
field equation derived from the leading term in the Lagrangian
$$\Box\varphi = m^2\varphi - J .$$
In this case the effective Lagrangian becomes
$$\mathcal L = -\frac12\left[\partial_\mu\varphi\,\partial^\mu\varphi + m^2\varphi^2 + m^4M^{-2}\varphi^2\right] + (1+m^2/M^2)J\varphi - J^2/2M^2 .$$
Taking into account all J-dependent terms, it is straightforward to see that with this
Lagrangian we get the same vacuum persistence amplitude as found above for the
original Lagrangian, when $M^{-2}(\Box\varphi)^2$ is treated as a first-order perturbation.
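The pole structure described in footnote 1 can be illustrated numerically. The sketch below is not from the paper: the function names and the sample values m = 1, M = 100 are hypothetical. It solves k² + m² + k⁴/M² = 0 for the two masses and compares the full propagator with its first-order expansion in 1/M².

```python
import math

def pole_masses(m, M):
    """Masses (sqrt of -k^2) at the two roots of k^2 + m^2 + k^4/M^2 = 0."""
    disc = math.sqrt(1.0 - 4.0 * m * m / (M * M))
    light_sq = 0.5 * M * M * (1.0 - disc)   # close to m^2
    heavy_sq = 0.5 * M * M * (1.0 + disc)   # close to M^2
    return math.sqrt(light_sq), math.sqrt(heavy_sq)

def propagator_full(k2, m, M):
    """1 / (k^2 + m^2 + k^4/M^2), taken literally."""
    return 1.0 / (k2 + m * m + k2 * k2 / (M * M))

def propagator_first_order(k2, m, M):
    """Expansion to first order in 1/M^2; its only pole is at k^2 = -m^2."""
    return 1.0 / (k2 + m * m) - k2 * k2 / (M * M * (k2 + m * m) ** 2)

if __name__ == "__main__":
    light, heavy = pole_masses(1.0, 100.0)
    print(f"light mass = {light:.8f}, heavy mass = {heavy:.6f}")
    print(propagator_full(4.0, 1.0, 100.0), propagator_first_order(4.0, 1.0, 100.0))
```

The light mass differs from m by a term of order m³/M² and the heavy one from M by a term of order m²/M, while at momenta well below M the two propagators agree up to terms of second order in 1/M².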

correction to the kinematic term in (2), which can be eliminated by a redefinition
of the inflaton field; the third term just provides a correction
$f_3U'^2M_P^4/M^4$ to the potential in (2), which can be absorbed into a redefinition
of U(ϕ); the fourth and fifth terms supply corrections to both $f_1(\varphi)$
and the kinematic term in (2); the sixth term provides corrections to the
kinematic term and the potential in (2); and the seventh and eighth terms
provide corrections to the kinematic term and potential in (2) and to $f_1(\varphi)$.
That is, with suitable redefinitions of ϕ, U(ϕ), and $f_1(\varphi)$, and with various
total derivatives dropped, the Lagrangian is the sum of (2) and a correction
term of the form
$$\Delta\mathcal L = \sqrt g\,f_1(\varphi)\left(g^{\mu\nu}\varphi_{,\mu}\varphi_{,\nu}\right)^2 + \sqrt g\,f_9(\varphi)\,C^{\mu\nu\rho\sigma}C_{\mu\nu\rho\sigma} + f_{10}(\varphi)\,\epsilon^{\mu\nu\rho\sigma}C_{\mu\nu}{}^{\kappa\lambda}C_{\rho\sigma\kappa\lambda} . \tag{6}$$
The first term is of the type encountered in theories of “k-inflation”[9]. This
term must be included in the Lagrangian, as a counterterm to ultraviolet
divergences encountered when the leading terms in (2) are used in one-loop
order. The second term (or an equivalent Gauss–Bonnet term) has been
considered in connection with inflation and the evolution of dark energy[10].
For a general function $f_{10}(\varphi)$ the final term in Eq. (6) violates parity
conservation[11]. That is, although the action is invariant under coordinate
transformations $x^\mu \to x'^\mu$ that are “small,” in the sense that
${\rm Det}(\partial x'/\partial x) > 0$, it is not invariant under inversions, that is, under
coordinate transformations with ${\rm Det}(\partial x'/\partial x) < 0$. It is only invariance under
“small” coordinate transformations that is needed to ensure the conserva-
tion of the energy-momentum tensor, and no sequence of “small” coordinate
transformations can ever add up to an inversion, so there is no a priori rea-
son to impose invariance under inversions, including space inversion. The
fact that parity has always been observed to be conserved in gravitational
interactions is sufficiently explained by the fact that terms in the effective
action for gravity and scalars with no more than two spacetime derivatives
that are invariant under “small” coordinate transformations cannot be com-
plicated enough to violate invariance under inversions.
From now on we shall work in perturbation theory, writing
$$g_{\mu\nu}(\mathbf x,t) = \bar g_{\mu\nu}(t) + h_{\mu\nu}(\mathbf x,t) , \qquad \varphi(\mathbf x,t) = \bar\varphi(t) + \delta\varphi(\mathbf x,t) , \tag{7}$$
where $\bar g_{\mu\nu}(t)$ is the flat-space Robertson–Walker metric with $\bar g_{00} = -1$, $\bar g_{0i} = 0$,
and $\bar g_{ij} = a^2(t)\delta_{ij}$; $\bar\varphi(t)$ is the unperturbed scalar field; and $h_{\mu\nu}$ and δϕ
are first-order perturbations. In this paper we will mostly be concerned with

the terms in the Lagrangian that are quadratic in perturbations, which are
needed for the calculation of Gaussian correlations. Terms of higher order
in perturbations that are needed for the calculation of non-Gaussian effects
will be considered only briefly.
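Because the spatially flat Robertson–Walker metric is also conformally
flat it has a vanishing Weyl tensor, and so the Weyl tensor starts with a
term of first order in perturbations. This saves us from having to calculate
the Weyl tensor to second order in perturbations; we have simply
$$\begin{aligned}
\left[\Delta\mathcal L\right]^{(2)} = {}& \left[\sqrt g\,f_1(\varphi)\left(g^{\mu\nu}\varphi_{,\mu}\varphi_{,\nu}\right)^2\right]^{(2)}
+ a^3f_9(\bar\varphi)\,\bar g^{\mu\kappa}\bar g^{\nu\lambda}\bar g^{\rho\eta}\bar g^{\sigma\zeta}\,C^{(1)}_{\kappa\lambda\eta\zeta}C^{(1)}_{\mu\nu\rho\sigma}\\
& + f_{10}(\bar\varphi)\,\epsilon^{\mu\nu\rho\sigma}\,\bar g^{\kappa\eta}\bar g^{\lambda\zeta}\,C^{(1)}_{\mu\nu\eta\zeta}C^{(1)}_{\rho\sigma\kappa\lambda} ,
\end{aligned}\tag{8}$$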

where the superscripts (1) and (2) denote terms of first and second order
in perturbations, respectively. Furthermore, the Weyl tensor is traceless,
which to first order in perturbations gives
$$C^{(1)}_{i0k0} = a^{-2}C^{(1)}_{ijkj} , \qquad C^{(1)}_{ijj0} = C^{(1)}_{i0i0} = C^{(1)}_{ijij} = 0 . \tag{9}$$

Since scalar and tensor fluctuations do not interfere in Gaussian correlations,


they will be considered separately.

II. Scalar Fluctuations

Here we are interested in terms in the Lagrangian that, after eliminating
auxiliary fields, are quadratic in $\mathcal R$, the familiar gauge-invariant quantity
that is conserved outside the horizon[1]
$$\mathcal R \equiv \frac{A}{2} - \frac{H\,\delta\varphi}{\dot{\bar\varphi}} , \tag{10}$$
with A defined by writing the spatial part of the metric perturbation for
scalar perturbations in a general gauge as
$$h_{ij}(\mathbf x,t) = a^2(t)\left[\delta_{ij}A(\mathbf x,t) + \frac{\partial^2B(\mathbf x,t)}{\partial x^i\partial x^j}\right] . \tag{11}$$
Let us consider in turn the contribution of the three terms in Eq. (8) to the
quadratic part of the Lagrangian for $\mathcal R$.
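First, terms in the effective Lagrangian like the first term in Eq. (8)
that depend only on ϕ and $\partial_\mu\varphi\,\partial^\mu\varphi$ are known to enter into the part of the
Lagrangian quadratic in scalar fluctuations only through their effect on the
sound speed $c_s(t)$[12]. That is, after eliminating auxiliary fields,
$$\left[-\frac{\sqrt g\,M_P^2}{2}R + \sqrt g\,P\Big(-\partial_\mu\varphi\,\partial^\mu\varphi/2,\ \varphi\Big)\right]^{(2)} = -\frac{M_P^2\dot H}{H^2}\,a^3\left[\frac{1}{c_s^2}\dot{\mathcal R}^2 - \frac{1}{a^2}\big(\vec\nabla\mathcal R\big)^2\right] , \tag{12}$$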
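where
$$c_s^{-2} = 1 + 2\left[X\,\frac{\partial^2P(X,\bar\varphi)}{\partial X^2}\Big/\frac{\partial P(X,\bar\varphi)}{\partial X}\right]_{X=\dot{\bar\varphi}^2/2} . \tag{13}$$
In particular, the first term in Eq. (8) shifts the squared speed of sound by
$$\Delta c_s^2 = \frac{16\dot HM_P^2f_1(\bar\varphi)}{M^4} , \tag{14}$$
corresponding to a second-order perturbation
$$\left[\sqrt g\,f_1(\varphi)\left(g^{\mu\nu}\varphi_{,\mu}\varphi_{,\nu}\right)^2\right]^{(2)} = \frac{16M_P^4a^3\dot H^2f_1(\bar\varphi)}{M^4H^2}\,\dot{\mathcal R}^2 . \tag{15}$$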
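The shift (14) can be checked against Eq. (13) with a short sketch. The assumptions, which are a reading of the text rather than something stated in it, are that the kinetic part of the Lagrangian is P = M²X + 4f₁X² with X = −g^{μν}∂_μϕ∂_νϕ/2 (from Eq. (2) plus the f₁ term), and that the background obeys the zeroth-order relation X = −M_P²Ḣ/M². Function names and sample numbers are hypothetical.

```python
def sound_speed_sq_exact(f1, Hdot, M, M_P):
    """c_s^2 from Eq. (13) for P = M^2 X + 4 f1 X^2 (assumed form)."""
    X = -M_P**2 * Hdot / M**2      # zeroth-order background value of phidot^2 / 2
    P_X = M**2 + 8.0 * f1 * X      # dP/dX
    P_XX = 8.0 * f1                # d^2P/dX^2
    return 1.0 / (1.0 + 2.0 * X * P_XX / P_X)

def sound_speed_sq_linear(f1, Hdot, M, M_P):
    """1 + Delta c_s^2 with Delta c_s^2 taken from Eq. (14)."""
    return 1.0 + 16.0 * Hdot * M_P**2 * f1 / M**4

if __name__ == "__main__":
    args = dict(f1=1.0, Hdot=-1e-6, M=1.0, M_P=1.0)  # arbitrary units, Hdot < 0
    print(sound_speed_sq_exact(**args), sound_speed_sq_linear(**args))
```

The two expressions agree to first order in f₁; since Ḣ is negative during inflation, a positive f₁ lowers the sound speed below unity.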
The other terms in Eq. (8) are greatly simplified by noting that, for scalar
modes, $C_{ijk0}$ must take the form of $\delta_{ik}\partial_j - \delta_{jk}\partial_i$ acting on some scalar, so
the above trace condition $C^{(1)}_{ijj0} = 0$ implies that $C^{(1)}_{ijk0} = 0$. Hence all we
need to evaluate the second term in (8) are the purely spatial components of
the Weyl tensor. Using the field equations (5) to eliminate auxiliary fields
and second time derivatives, we find after a straightforward though tedious
calculation that
$$\begin{aligned}
a^3f_9(\bar\varphi)\,\bar g^{\mu\kappa}\bar g^{\nu\lambda}\bar g^{\rho\eta}\bar g^{\sigma\zeta}\,C^{(1)}_{\kappa\lambda\eta\zeta}C^{(1)}_{\mu\nu\rho\sigma}
&= a^{-5}f_9(\bar\varphi)\left[C^{(1)}_{ijkl}C^{(1)}_{ijkl} + 4\,C^{(1)}_{ijkj}C^{(1)}_{ilkl}\right]\\
&= \frac{16\dot H^2}{3H^2}\,a^3f_9(\bar\varphi)\,\dot{\mathcal R}^2 .
\end{aligned}\tag{16}$$
Comparing this with Eq. (15), we see that the effect of the second term in
Eq. (8) is the same as a change in the coefficient $f_1$ of the first term by an
amount
$$\Delta f_1(\varphi) = \frac{M^4}{3M_P^4}\,f_9(\varphi) . \tag{17}$$
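Finally, because $C^{(1)}_{ijk0}$ vanishes for scalar modes, the last term in Eq. (8)
is
$$f_{10}(\bar\varphi)\,\epsilon^{\mu\nu\rho\sigma}\,\bar g^{\kappa\eta}\bar g^{\lambda\zeta}\,C^{(1)}_{\mu\nu\eta\zeta}C^{(1)}_{\rho\sigma\kappa\lambda}
= f_{10}(\bar\varphi)\,\epsilon^{ijk0}\left[4a^{-4}\,C^{(1)}_{ijlm}C^{(1)}_{k0lm} - 8\,C^{(1)}_{ijl0}C^{(1)}_{k0l0}\right] = 0 . \tag{18}$$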

We conclude that the leading corrections to the Gaussian correlations of
R are solely of the k-inflation type. This justifies the calculation of the
effective Lagrangian for Gaussian scalar correlations in slow roll inflation
in Section 3 of reference 3 even for generic theories of inflation. In such
theories the terms in Eq. (3) that are left out in reference 3 can indeed
be omitted in calculating the part of the effective Lagrangian quadratic in
scalar fluctuations, not because they are small, but because as we have seen
for scalar Gaussian fluctuations they yield nothing new. But this is not the
case for Gaussian tensor fluctuations, and does not seem to be the case when
non-Gaussian correlations are considered.
We have so far only considered the terms in the effective action of sec-
ond order in R, which are needed to calculate Gaussian correlations, but
for actions of the k-inflation type, which only involve first derivatives of
fields, it is not difficult also to calculate terms in the action of higher or-
der in R, which generate non-Gaussian correlations. For this purpose it
is convenient to adopt a gauge in which there are no scalar perturbations
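to $g_{ij}$; that is, in which $g_{ij} = a^2(t)[\exp(D(\mathbf x,t))]_{ij}$, where $D_{ij}$ is a gravitational
wave amplitude with $D_{ii} = 0$ and $\partial_iD_{ij} = 0$. In this gauge, Eq. (10)
gives $\mathcal R = -H\delta\varphi/\dot{\bar\varphi}$. If we tentatively ignore the interaction of the inflaton
with gravitational perturbations, and assume that H, $f_1$, and $\dot{\bar\varphi}$ are varying
slowly, then it is trivial, by simply setting ϕ equal to $\bar\varphi + \delta\varphi$ in $\mathcal L$, and using
$\dot H = -\dot{\bar\varphi}^2(M^2 + 4f_1\dot{\bar\varphi}^2)/2M_P^2$, to write a Lagrangian for $\pi \equiv -\mathcal R/H = \delta\varphi/\dot{\bar\varphi}$:
$$\begin{aligned}
&\sqrt g\left[-\frac{M^2}{2}g^{\mu\nu}\partial_\mu\varphi\,\partial_\nu\varphi - M_P^2U(\varphi) + f_1(\varphi)\left(g^{\mu\nu}\partial_\mu\varphi\,\partial_\nu\varphi\right)^2\right]\\
&\quad = \bar{\mathcal L} + a^3M_P^2\dot H\left(-\dot\pi^2 + a^{-2}\big(\vec\nabla\pi\big)^2\right)\\
&\qquad + \frac{16a^3M_P^4\dot H^2f_1(\bar\varphi)}{M^4}\left(\dot\pi^2 + \dot\pi^3 - \frac{\dot\pi\big(\vec\nabla\pi\big)^2}{a^2} + \frac{\dot\pi^4}{4} - \frac{\dot\pi^2\big(\vec\nabla\pi\big)^2}{2a^2} + \frac{\big(\vec\nabla\pi\big)^4}{4a^4}\right) .
\end{aligned}\tag{19}$$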

This agrees with the result obtained in Eq. (28) of [3], except that here we
include terms quartic in π. In [3] the neglect of interactions of the inflaton
with gravitational perturbations is justified on the basis of a “high energy”
approximation, which amounts to the usual slow-roll approximation that
ε ≪ 1, plus the assumption that, in our terms, $M^2 \gg \epsilon HM_P$, which is
much weaker than the assumption $M \gg \sqrt{2\epsilon}\,M_P$ that we found necessary
to treat generic theories of inflation by the methods of effective field theory.
We see that the non-quadratic terms in $\mathcal L_0$ that can generate non-Gaussian
correlations are suppressed in the slow roll approximation, as found by
Maldacena[13], but in $\Delta\mathcal L$ the coefficients of the quadratic and higher order
terms are of the same order of magnitude.
III. Tensor Fluctuations
Tensor fluctuations appear solely in the perturbation to the purely spatial
metric:
$$h_{ij}(\mathbf x,t) = a^2(t)\Big[\exp D(\mathbf x,t)\Big]_{ij} , \qquad D_{ii} = 0 , \qquad \partial_iD_{ij} = 0 , \tag{20}$$

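with δϕ = 0. The first term in Eq. (8) involves only the metric components
$g_{00}$ and ${\rm Det}\,g_{\mu\nu}$, so it gets no contribution from tensor fluctuations. On
the other hand, here the second and third terms in (8) make a non-trivial
contribution to the Lagrangian for $D_{ij}$. Another straightforward calculation
(dropping total derivatives) gives these terms as
$$\begin{aligned}
&a^3f_9(\bar\varphi)\,\bar g^{\mu\kappa}\bar g^{\nu\lambda}\bar g^{\rho\eta}\bar g^{\sigma\zeta}\,C^{(1)}_{\kappa\lambda\eta\zeta}C^{(1)}_{\mu\nu\rho\sigma}\\
&\quad = f_9(\bar\varphi)\left[a^{-5}C^{(1)}_{ijkl}C^{(1)}_{ijkl} - 4a^{-3}C^{(1)}_{ijk0}C^{(1)}_{ijk0} + 4a^{-1}C^{(1)}_{i0k0}C^{(1)}_{i0k0}\right]\\
&\quad = a^3f_9(\bar\varphi)\left\{\dot D_{ik}\left[2H^2 + 2\nabla^2/a^2\right]\dot D_{ik} - 4H\dot D_{ik}\,(\nabla^2/a^2)\,D_{ik} + 2D_{ik}\,(\nabla^4/a^4)\,D_{ik}\right\} ,
\end{aligned}\tag{21}$$
and
$$\begin{aligned}
f_{10}(\bar\varphi)\,\epsilon^{\mu\nu\rho\sigma}\,\bar g^{\kappa\eta}\bar g^{\lambda\zeta}\,C^{(1)}_{\mu\nu\eta\zeta}C^{(1)}_{\rho\sigma\kappa\lambda}
&= f_{10}(\bar\varphi)\,\epsilon^{ijk0}\left[4a^{-4}\,C^{(1)}_{lmij}C^{(1)}_{lmk0} - 8a^{-2}\,C^{(1)}_{l0ij}C^{(1)}_{l0k0}\right]\\
&= 4f_{10}(\bar\varphi)\,\epsilon^{ijk0}\,\frac{\partial}{\partial t}\Big[D_{il}\,\partial_j\nabla^2D_{kl}\Big] .
\end{aligned}\tag{22}$$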
The field equation for the tensor mode (with the term proportional to $f_9$
dropped for simplicity) is then
$$\ddot D_{il} + 3H\dot D_{il} - (\nabla^2/a^2)D_{il} = -64\pi G\,\dot f_{10}\,a^{-3}\Big(\epsilon_{ijk0}\,\partial_j\nabla^2D_{kl} + \epsilon_{ljk0}\,\partial_j\nabla^2D_{ki}\Big) . \tag{23}$$
For a plane wave with co-moving wave number $\vec k$ in the 3-direction, the only
non-vanishing tensor amplitudes are $D_{11} = -D_{22}$ and $D_{12} = D_{21}$. They
satisfy the field equations
$$\ddot D_\pm + 3H\dot D_\pm + (k^2/a^2)D_\pm = \mp128\pi G\,(k/a)^3\dot f_{10}\,D_\pm \tag{24}$$
where $D_\pm \equiv D_{11} \mp iD_{12}$ are the amplitudes with helicity ±2. As found in
ref. [11], the wave equation depends on helicity because parity is violated.

IV. A Non-Generic Example: Ghost Inflation

Up to now, we have been concerned with generic theories of inflation, in
which the dependence of the action on the inflaton field is unconstrained,
and in consequence the characteristic mass M cannot be taken to be much
less than εMP . For an example of a different sort, we might impose on the
action a shift symmetry, under a transformation ϕ → ϕ+constant, which
requires that the Lagrangian density involve only spacetime derivatives of
ϕ rather than ϕ itself. This possibility was discussed briefly in [9], and in
more detail under the name “ghost inflation” in [14]. We will take ϕ to be
normalized so that ∂µ ϕ is dimensionless, and has an unperturbed value
at horizon exit of order unity. The term in the Lagrangian density that
depends only on ∂µ ϕ is then

$$
L_0 = M^4\sqrt{g}\,P(-\partial_\mu\varphi\,\partial^\mu\varphi), \tag{25}
$$

where P (X) is a power series in X, with coefficients assumed to be of order
unity, and M is the characteristic mass of the theory. Since powers of ϕ are
excluded by the shift symmetry, M here can be much smaller than in generic
theories of inflation, and in particular we will assume that M is much less
than the Planck mass MP . Any additional derivatives acting on ∂µ ϕ or on
the metric yield factors of order $H \sim M^2/M_P \ll M$, so Eq. (25) along with
the Einstein term can be taken as the leading term in L, with any correction
terms suppressed by factors of H/M .
Let us first consider a theory in which (25) is the whole Lagrangian
density for the scalar field, with no higher-derivative corrections. The field
equation for the unperturbed scalar field $\bar\varphi(t)$ in this theory is
$$
\frac{d}{dt}\Big[a^3 P'(\dot{\bar\varphi}^2)\,\dot{\bar\varphi}\Big] = 0. \tag{26}
$$
As noted in [9], in the limit of late time when $a \to \infty$, either $\dot{\bar\varphi} \to 0$, or
$\dot{\bar\varphi} \to v$, where $v$ is a quantity of order unity satisfying $P'(v^2) = 0$. We will
consider only the latter case. In ref. [14] the limit $\bar\varphi = vt$ is supposed to
be already reached, in which case interesting fluctuations occur only when
higher-derivative correction terms are added to (25). But if we take $\bar\varphi(t)$ to
be only close to $vt$, but not yet there, then we find a non-trivial spectrum of
propagating fluctuations even when no correction terms are added to (25).
In this case the solution of Eq. (26) (with an appropriate normalization of
$a(t)$) has $\dot{\bar\varphi} \to v + a^{-3}$; the speed of sound is $c_s \to 1/\sqrt{v a^3}$; the expansion rate
approaches a limit $H_\infty = (M^2/M_P)\sqrt{P(v^2)/3}$; and the Fourier transform
of $R$ is
$$
R_k \propto a^{-3/2}\,H^{(1)}_{3/5}\!\left(\frac{2k}{5H_\infty v^{1/2}a^{5/2}}\right), \tag{27}
$$
with a k-independent constant of proportionality. At late times, when the
perturbation wave length is outside the acoustic horizon, this approaches a
time-independent quantity $R^o_k$, with
$$
|R^o_k|^2 \propto k^{-6/5}, \tag{28}
$$

corresponding to a conventional scalar slope index $n_S = 14/5$, which of
course is empirically ruled out. Thus to have a realistic theory of this sort,
we must consider corrections to the leading term (25).
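The power counting behind Eqs. (27) and (28) can be retraced with exact fractions. The sketch below is only an illustration under stated assumptions: it takes the speed of sound to fall as $a^{-3/2}$ with constant $H$ (as the argument of the Hankel function in Eq. (27) implies), uses the standard small-argument behaviour $|H^{(1)}_\nu(x)| \sim x^{-\nu}$, and adopts the convention $|R_k|^2 \propto k^{n_S-4}$. The variable names are ours, not the paper's.

```python
from fractions import Fraction as F

# Assumed late-time scalings: c_s ~ a^(-3/2), H -> H_infinity (constant).
cs_power = F(-3, 2)

# Acoustic-horizon integral: int c_s dt / a = int c_s da / (a^2 H) ~ int a^(cs_power - 2) da,
# which scales as a^(cs_power - 1); the Hankel argument is k times this.
arg_power_a = cs_power - 1

nu = F(3, 5)                     # order of the Hankel function in Eq. (27)
prefactor_power_a = F(-3, 2)     # the explicit a^(-3/2) in Eq. (27)

# Small-argument limit |H_nu(x)| ~ x^(-nu) with x ~ k * a^(arg_power_a).
late_a_power = prefactor_power_a + (-nu) * arg_power_a   # power of a in R_k at late times
late_k_power = -nu                                       # power of k in R_k at late times

spectrum_k_power = 2 * late_k_power   # power of k in |R_k|^2
n_s = spectrum_k_power + 4            # convention |R_k|^2 ~ k^(n_S - 4)

print("Hankel argument ~ a^", arg_power_a)
print("late-time R_k ~ a^", late_a_power, " k^", late_k_power)
print("|R_k|^2 ~ k^", spectrum_k_power, " so n_S =", n_s)
```

The vanishing power of $a$ reproduces the statement that $R_k$ becomes time-independent outside the acoustic horizon, and the power of $k$ gives the slope quoted in the text.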
The first correction to this Lagrangian density contains just one factor
of a second derivative of ϕ, and in general is of the form
$$
\Delta L = M^3\sqrt{g}\,Q(-\partial_\mu\varphi\,\partial^\mu\varphi)\,\Box\varphi, \tag{29}
$$
and is therefore suppressed relative to (25) by factors of order $H/M \approx M/M_P$.
(A term proportional to $g^{\rho\kappa}g^{\sigma\lambda}\varphi_{,\kappa}\varphi_{,\lambda}\varphi_{,\rho;\sigma}$ can be put in the form
(29) by adding suitable total derivatives. In ref. [14] these terms were
excluded by imposing an additional symmetry under the reflection ϕ → −ϕ,
in which case the first correction is quadratic rather than linear in second
derivatives of ϕ.) Once again, we must eliminate the second time derivatives
in ∆L by setting ϕ̈ equal to the same quantity as given by the field equation
derived from the leading part of the Lagrangian
$$
P'(-\partial_\nu\varphi\,\partial^\nu\varphi)\,\Box\varphi - 2P''(-\partial_\nu\varphi\,\partial^\nu\varphi)\,(\partial_\nu\varphi)_{;\mu}\,\partial^\mu\varphi\,\partial^\nu\varphi = 0. \tag{30}
$$

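This is pretty complicated, so for simplicity let us consider the case of a
metric fixed in the flat-space Robertson–Walker form. Then after using (30)
to eliminate second time derivatives in (29), the correction term is
$$
\begin{aligned}
\Delta L = M^3 a^3\,\frac{2QP''}{P' + 2P''\dot\varphi^2}\Big(&-2a^{-2}\dot\varphi\,\partial_i\varphi\,\partial_i\dot\varphi + a^{-2}H\dot\varphi\,\partial_i\varphi\,\partial_i\varphi \\
&+ a^{-4}\,\partial_i\varphi\,\partial_j\varphi\,\partial_i\partial_j\varphi + 3H\dot\varphi^3 - a^{-2}\dot\varphi^2\nabla^2\varphi\Big),
\end{aligned}
\tag{31}
$$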

where $Q$, $P'$ and $P''$ all have arguments $-\partial_\mu\varphi\,\partial^\mu\varphi = \dot\varphi^2 - a^{-2}\partial_i\varphi\,\partial_i\varphi$. This is
the correction that has to be added to L0 in order to find the commutation
relations of the field as well as the field equations by canonical quantization.

I am grateful for discussions with N. Arkani-Hamed, J. Distler, J. Meyers,
S. Odintsov, S. Paban, D. Robbins, and L. Senatore. This material is based
upon work supported by the National Science Foundation under Grant No.
PHY-0455649.

References

1. For reviews with references to the original literature, see V. Mukhanov,
Physical Foundations of Cosmology (Cambridge University Press, 2005);
S. Weinberg, Cosmology (Oxford University Press, 2008).

2. This is based on third year WMAP results; D. Spergel et al.,
Astrophys. J. Suppl. 170, 288 (2007).

3. This is different from the approach followed in an interesting recent
paper on the effective field theory of inflation by C. Cheung, P. Creminelli,
A. L. Fitzpatrick, J. Kaplan, and L. Senatore, arXiv:0709.0293,
whose calculations do not include any of the terms in Eq. (3) follow-
ing the first term. Their paper does not spell out the rules governing
which terms are to be included in the corrections of leading order, but
on the basis of a private communication with Senatore, I gather that
in judging how much various correction terms are suppressed, Cheung
et al. do not count spacetime derivatives acting on the background
scalar or metric fields in a co-moving (or “unitary”) gauge in which ϕ
equals its unperturbed value ϕ̄, but only count derivatives acting on
fluctuations in this gauge. In co-moving gauge the first term in (3)
is $\sqrt{g}\,f_1(\bar\varphi)(g^{00}\dot{\bar\varphi}^2)^2$, and the factors of $\dot{\bar\varphi}$ are not counted as suppressing
this term, because $|\dot{\bar\varphi}_c|$ is much larger than $H^2$. But as we have
seen, in generic theories of inflation the characteristic mass scale M of
the effective field theory must be at least as large as εMP , and the
quantity $|\dot{\bar\varphi}_c|/M$ is then no larger than H, so that the first term in (3)
is no less suppressed than the other terms. (Cheung et al. do at first
include terms involving the extrinsic curvature of the spacelike surface
with ϕ constant, but later drop these terms. The extrinsic curvature
is not included here in Eq. (3), because in a general gauge it does not
give a local term in the action. But Cheung et al. stick to co-moving
gauge, in which the extrinsic curvature can be expanded in a series of
local functions, and these do yield some though not all of the terms in
Eq. (3).) The approach of Cheung et al. is justified in theories with a
much smaller value of M than considered here, which is possible if the

dependence of the action on the inflaton field is limited by symmetry
principles or other consequences of an underlying theory. This case is
discussed briefly at the end of the present paper.

4. The same list of terms with just four spacetime derivatives (aside
from the term that violates parity conservation) has been given by
E. Elizalde, A. Jacksenaev, S. D. Odintsov, and I. L. Shapiro, Phys.
Lett. B 328, 297 (1994); Class. Quant. Grav. 12, 1385 (1995), but
not in the context of effective field theory. (In the original preprint of
the present paper, written before I had seen the work of Elizalde et
al., this list contained an additional term proportional to $\sqrt{g}\,R^{\mu\nu}\varphi_{,\mu;\nu}$.
This term is redundant, because by using the Bianchi identity and dis-
carding total derivatives it may be expressed as a linear combination
of the fourth, fifth, and sixth terms listed here in Eq. (3).)

5. M. Ostrogradski, Mem. Act. St. Petersbourg VI 4, 385 (1850). For a
modern account, see F. J. de Urries and J. Julve, J. Phys. A31, 6949
(1998).

6. R. Kallosh, J. U. Kang, A. Linde, and V. Mukhanov, arXiv:0712.2040.

7. R. S. Arnowitt, S. Deser, and C. W. Misner, in Gravitation: An
Introduction to Current Research, edited by L. Witten (Wiley, New York,
1962).

8. See, e. g., J. Z. Simon, Phys. Rev. D 41, 3729 (1990).

9. C. Armendáriz-Picón, T. Damour, and V. F. Mukhanov, Phys. Lett.
B 458, 209 (1999).

10. S. Kawai, M. Sakagami, and J. Soda, Phys. Lett. B 437, 284 (1998);
S. Kawai and J. Soda, Phys. Lett. B 460, 41 (1999); S. Nojiri, S.
D. Odintsov, and M. Sasaki, Phys. Rev. D 71, 123509 (2005); G.
Calcagni, S. Tsujikawa, and M. Sami, Class. Quant. Grav. 22, 3977
(2005); G. Calcagni, B. de Carlos, A. De Felice, Nucl. Phys. B
752, 404 (2006); I. P. Neupane and B. M. N. Carter, J. Cosm. &
Astropart. Phys. 06, 004 (2006); G. Cognola, E. Elizalde, S. Nojiri,
S. D. Odintsov, and S. Zerbini, Phys. Rev. D73, 084007 (2006); B.
Leith and I. P. Neupane, J. Cosm. & Astropart. Phys. 05, 019 (2007);
S. Tsujikawa and M. Sami, J. Cosm. & Astropart. Phys. 07, 006
(2007); K. Bamba, Z-K Guo, and N. Ohta, arxiv:0707.4334. Some

of these articles deal with instabilities produced by a Gauss–Bonnet
term, but no instability arises if this term is treated as a correction
term in an effective field theory. The first term in Eq. (6) along with
a Gauss–Bonnet term equivalent to the second term in Eq. (6) were
encountered in a low-energy limit of string theory by Z. K. Guo, N.
Ohta, and S. Tsujikawa, Phys. Rev. D 75, 023520 (2007).

11. There is a large literature on a parity violating term of this form. See,
for instance, A. Lue, L. Wang, and M. Kamionkowski, Phys. Rev.
Lett. 83, 156 (1999); S-Y. Pi and R. Jackiw, Phys. Rev. D68,
104012 (2003); M. Satoh, S. Kanno, and J. Soda, Phys. Rev. D77,
023526 (2008). Leptogenesis due to this term was considered by S. H.
S. Alexander, M. E. Peskin, and M. M. Sheikh-Jabbari, Phys. Rev.
Lett. 96, 081301 (2006).

12. J. Garriga and V. F. Mukhanov, Phys. Lett. B 458, 219 (1999).

13. J. Maldacena, JHEP 0305, 013 (2003).

14. N. Arkani-Hamed, P. Creminelli, S. Mukohyama, and M. Zaldarriaga,
JCAP 04, 001 (2004); N. Arkani-Hamed, H-C Cheng, M. A. Luty, and
S. Mukohyama, JHEP 05, 074 (2004); S. Mukohyama, JCAP 0610,
011 (2006).

UTTG-04-08
arXiv:0805.3781v3 [hep-th] 13 Aug 2008

A Tree Theorem for Inflation

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is shown that the generating function for tree graphs in the “in-in” formal-
ism may be calculated by solving the classical equations of motion subject
to certain constraints. This theorem is illustrated by application to the
evolution of a single inflaton field in a Robertson–Walker background.


∗ Electronic address: weinberg@physics.utexas.edu

I. Introduction

Quantum field theory is used differently in cosmology than in elementary
particle physics. In particle physics we need to know matrix elements be-
tween “in” states and “out” states, defined respectively by their appearance
at times long before and long after a collision. In contrast, in cosmology we
are interested in the expectation value of various operators in an “in” state,
usually defined as a state that looks like the vacuum at very early times,
long before perturbations leave the horizon during inflation. To calculate
such expectation values, we must use the “in–in” formalism of Schwinger et
al.[1]
The “in–in” formalism provides graphical rules for calculating expecta-
tion values, similar to but more complicated than the Feynman rules used
in particle physics. A special role in these calculations is played by tree
graphs, which in many theories give much larger contributions to expecta-
tion values than graphs with loops. For instance, in general relativity the
coupling constant G appears only in a factor 1/G multiplying the whole
Einstein–Hilbert action, so each graviton propagator yields a factor G and
each interaction comes with a factor 1/G. A connected graph with I internal
lines and V vertices thus yields a factor $G^{I-V} = G^{L-1}$, where $L = I - V + 1$
is the number of loops, so at a characteristic frequency H, graphs with L
loops are suppressed relative to tree graphs by factors of order $(GH^2)^L$. The
strength of fluctuations in the cosmic microwave background suggests that
$GH^2 \approx 10^{-10}$. This suppression can partly be compensated by the appear-
ance of powers of ln a[2] (where a is the Robertson–Walker scale factor), but
for this to make loops competitive with trees the universe would have to
expand after horizon exit by something like $10^{10}$ e-foldings. For the same
reason, tree graphs will dominate in any theory with a very small coupling
constant g that appears only as a factor 1/g in the action.
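The counting in this paragraph is simple enough to spell out in a few lines. The sketch below is only an illustration: the function names are ours, and the default value of $GH^2$ is the rough estimate quoted above.

```python
def loop_count(internal_lines: int, vertices: int) -> int:
    """Number of loops L = I - V + 1 of a connected graph."""
    return internal_lines - vertices + 1

def coupling_power(internal_lines: int, vertices: int) -> int:
    """Power of G carried by the graph: one G per propagator, 1/G per vertex."""
    return internal_lines - vertices

def loop_suppression(internal_lines: int, vertices: int, gh2: float = 1e-10) -> float:
    """Size of an L-loop graph relative to a tree graph, (G H^2)^L."""
    return gh2 ** loop_count(internal_lines, vertices)

# A tree graph: 3 vertices joined by 2 internal lines.
print(loop_count(2, 3), coupling_power(2, 3), loop_suppression(2, 3))
# A one-loop graph: 3 vertices joined by 3 internal lines.
print(loop_count(3, 3), coupling_power(3, 3), loop_suppression(3, 3))
```

With a single power of ln a per loop, the logarithm would have to reach about $1/(GH^2)$ to offset one factor of $GH^2$, which is the origin of the e-folding estimate in the text.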
In the present state of cosmology it is fortunate that tree graphs make
a much larger contribution to correlation functions than graphs with loops.
During the long period from when perturbations left the horizon during
inflation to when they re-entered the horizon in the radiation- or matter-
dominated era, there were various events, such as reheating, lepton and
baryon synthesis, and dark matter decoupling, about which we know es-
sentially nothing. The only reason that we are able to relate observations
of the microwave background anisotropies or large scale structure to what
happened in inflation is that certain quantities such as the curvature per-
turbation ζ and the gravitational wave amplitude are believed to be
time-independent when perturbations are outside the horizon[3], that is, when
the physical wave number k/a is much less than the expansion rate H. For
tree graphs, when the physical wave numbers associated with external lines
are all much less than H then the same is true of the wave numbers asso-
ciated with all internal lines but, as shown by the ln a factors mentioned
above, this is not true of graphs with loops, and so the theorems that state
the constancy of ζ or of the gravitational wave amplitude outside the horizon
apply only to tree graphs.
It is well known in elementary particle physics that the tree graph con-
tributions to matrix elements between “in” and “out” states can be calcu-
lated by solving the classical field equations in the presence of an external
c-number current.[4] In the Appendix to a recent paper[5] I remarked that
the same is true in the “in-in” formalism, but the prescription stated there
was not correct. Section II of the present paper states and proves a general
theorem (hopefully correct this time) for the evaluation of tree contributions
to expectation values by the use of classical field equations. In Sections III
and IV this theorem is applied to inflation. The derivation of the tree the-
orem in Section II relies on an analytic continuation of trajectories in the
path-integral formalism, and since this is not treated rigorously, we check in
Section V that this theorem does give the correct results in the first few or-
ders of perturbation theory for the inflationary model discussed in Sections
III and IV. The methods of this paper are not really needed in the calculation
of tree contributions to specific expectation values, since the rules for such
calculations are already well known, but it is hoped that our results may
prove useful in proving general theorems about non-Gaussian correlations
in cosmology.

II. Tree Theorem for the “In-In” Formalism

We will consider a general Hamiltonian system, with Hermitian operators
Qa (t) and their canonical conjugates Pa (t), satisfying the usual commutation
relations
$$
[Q_a(t), P_b(t)] = i\delta_{ab}, \qquad [Q_a(t), Q_b(t)] = [P_a(t), P_b(t)] = 0. \tag{1}
$$

In field theories a is a compound index, including a spatial coordinate as
well as discrete indices labeling the nature of the various operators, and
δab includes a delta function in the spatial coordinates as well as Kronecker
deltas for the discrete indices. These are Heisenberg-picture operators, with
a time-dependence generated by a Hamiltonian H[Q(t), P (t), t]:
$$
\dot Q_a(t) = -i\Big[Q_a(t),\, H[Q(t),P(t),t]\Big], \qquad \dot P_a(t) = -i\Big[P_a(t),\, H[Q(t),P(t),t]\Big]. \tag{2}
$$
We are allowing the Hamiltonian here to have an explicit dependence on time
for a reason discussed in [5]: When Qa (t) and Pa (t) are the fluctuations of
canonical variables around time-dependent background values, the Hamil-
tonian that generates their time-dependence is not the time-independent
Hamiltonian for the total canonical variables, but the sum of the terms in
this Hamiltonian of second and higher order in the fluctuations, so that a
time-dependence is introduced by dropping the terms of zeroth and first or-
der in fluctuations. The differential equations (2) show that in H[Q(t), P (t), t]
we can set the time argument of Qa (t) and Pa (t) equal to any fixed time t∗ ,
but the explicit time dependence of H[Q(t∗ ), P (t∗ ), t] still remains.
Instead of giving a formula for expectation values of products of the
$Q_a(t_1)$ at a fixed time $t_1$ in an “in” state $|0,{\rm in}\rangle$ defined by its appearance
at a time $t_0 < t_1$, we will give a formula for a generating function $W[J,t_1]$
of a c-number current $J_a$, from which all such expectation values may be
obtained. The generating function is defined by
$$
e^{W[J,t_1]} \equiv \Big\langle 0,{\rm in}\Big|\exp\Big[\sum_a Q_a(t_1)J_a\Big]\Big|0,{\rm in}\Big\rangle, \tag{3}
$$
and from its derivatives we can calculate the expectation value of a product
of n of the Q’s:
$$
\langle 0,{\rm in}|\,Q_a(t_1)\,Q_b(t_1)\cdots|0,{\rm in}\rangle = \left[\frac{\partial^n}{\partial J_a\,\partial J_b\cdots}\exp\big(W[J,t_1]\big)\right]_{J=0}. \tag{4}
$$

(In field theories, sums over a like that in (3) include an integral over a
spatial coordinate as well as sums over discrete indices, and the derivatives
in (4) are functional derivatives.) We need not take Ja to be real, but if we
do then the generating function is real also.
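As a small numerical illustration of how Eq. (4) is used, the sketch below assumes a purely quadratic generating function, $W = \frac12\sum_{ab}J_aC_{ab}J_b$, which is what one would expect for a Gaussian state with two-point function $C_{ab}$. The matrix `C`, the step size `h` and the function names are hypothetical choices made for the example; the second derivative of $\exp W$ at $J=0$ is estimated by central finite differences.

```python
import math

def W(J, C):
    """Quadratic generating function W = (1/2) sum_ab J_a C_ab J_b (assumed form)."""
    n = len(J)
    return 0.5 * sum(J[a] * C[a][b] * J[b] for a in range(n) for b in range(n))

def two_point(C, a, b, h=1e-3):
    """Finite-difference estimate of d^2 exp(W) / dJ_a dJ_b at J = 0, as in Eq. (4)."""
    n = len(C)

    def Z(da, db):
        J = [0.0] * n
        J[a] += da
        J[b] += db
        return math.exp(W(J, C))

    return (Z(h, h) - Z(h, -h) - Z(-h, h) + Z(-h, -h)) / (4 * h * h)

# Hypothetical symmetric two-point matrix for two degrees of freedom.
C = [[2.0, 0.5],
     [0.5, 1.0]]
print(two_point(C, 0, 1), two_point(C, 0, 0))
```

The estimates agree with the entries of `C` up to the finite-difference error, which is of order `h**2`.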
To calculate W [J, t1 ] in the tree approximation we introduce a pair of
complex c-number J-dependent functions of time, qLa (t) and qRa (t), which
are defined by three conditions:
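(A) Both $q_{La}(t)$ and $q_{Ra}(t)$ satisfy the Lagrangian equations of motion
$$
\frac{d}{dt}\left[\frac{\partial L[q_L(t),\dot q_L(t),t]}{\partial\dot q_{La}(t)}\right] = \frac{\partial L[q_L(t),\dot q_L(t),t]}{\partial q_{La}(t)}, \tag{5}
$$
and
$$
\frac{d}{dt}\left[\frac{\partial L[q_R(t),\dot q_R(t),t]}{\partial\dot q_{Ra}(t)}\right] = \frac{\partial L[q_R(t),\dot q_R(t),t]}{\partial q_{Ra}(t)}, \tag{6}
$$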
where $L[q(t),\dot q(t),t]$ is obtained from the classical Hamiltonian H by using
the expression
$$
L[q(t),\dot q(t),t] = \sum_a p_a(t)\,\dot q_a(t) - H[q(t),p(t),t], \tag{7}
$$
with $p_a(t)$ eliminated from the right-hand side by using the classical formula
$$
\dot q_a(t) = \frac{\partial H[q(t),p(t),t]}{\partial p_a(t)}. \tag{8}
$$

(B) The $q_{La}(t)$ and $q_{Ra}(t)$ and their time derivatives satisfy constraints at
the time $t_1$:
$$
q_{La}(t_1) = q_{Ra}(t_1), \tag{9}
$$
and
$$
\frac{\partial L[q_L(t_1),\dot q_L(t_1),t_1]}{\partial\dot q_{La}(t_1)} - \frac{\partial L[q_R(t_1),\dot q_R(t_1),t_1]}{\partial\dot q_{Ra}(t_1)} = -iJ_a. \tag{10}
$$

(C) The $q_{La}(t)$ and $q_{Ra}(t)$ and their time derivatives also satisfy constraints
at the time $t_0$ that is used to define the state $|0,{\rm in}\rangle$, constraints that depend
on the nature of this state. In particular, if $|0,{\rm in}\rangle$ is a state that looks like
the Bunch–Davies vacuum at a time $t_0 = -\infty$, then $q_{La}(t)$ and $q_{Ra}(t)$ satisfy
“positive frequency” and “negative frequency” conditions, respectively; that
is, they must for $t \to -\infty$ be superpositions of terms with time-dependence
proportional to $e^{-i\omega t}$ and $e^{+i\omega t}$, respectively, where the ω's are various
positive frequencies.
With $q_{La}(t)$ and $q_{Ra}(t)$ calculated in terms of the current using these
three conditions, the generating function $W[J,t_1]$ is given in the tree
approximation by
$$
W[J,t_1]_{\rm tree} = i\int_{t_0}^{t_1}dt\,\Big\{L[q_R(t),\dot q_R(t),t] - L[q_L(t),\dot q_L(t),t]\Big\} + \sum_a q_{Ra}(t_1)\,J_a. \tag{11}
$$
(There is an easy generalization of this theorem: Instead of the linear
function $\sum_a Q_a(t_1)J_a$ in the exponential on the right-hand side of the
definition (3) of W, we could insert an arbitrary function $\mathcal{J}[Q(t_1)]$ of all the $Q_a$
at the same time $t_1$. Then in the tree approximation W would be given by
$$
W_{\rm tree} = i\int_{t_0}^{t_1}dt\,\Big\{L[q_R(t),\dot q_R(t),t] - L[q_L(t),\dot q_L(t),t]\Big\} + \mathcal{J}[q_R(t_1)]
$$
with $J_a$ on the right-hand side of the constraint (10) replaced with $\partial\mathcal{J}(q)/\partial q_a$,
in which we set $q_a = q_{Ra}(t_1)$.)
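As a quick sanity check of conditions (A), (B), (C) and Eq. (11), one can work through a single free harmonic oscillator. This example is not part of the paper: it assumes $L = \frac12\dot q^2 - \frac12\omega^2q^2$, takes the “in” state to be the ground state at $t_0 \to -\infty$, and drops the rapidly oscillating contribution from the lower limit of the time integral. Since the theory is free, the tree result should equal the exact Gaussian answer $\frac12J^2\langle Q^2\rangle$ with $\langle Q^2\rangle = 1/2\omega$.

```latex
% Illustrative check under the assumptions stated above (single oscillator, ground state).
\begin{align*}
\text{(A), (C):}\quad & q_L(t) = A\,e^{-i\omega(t-t_1)}, \qquad q_R(t) = B\,e^{+i\omega(t-t_1)},\\
\text{(B):}\quad & A = B, \qquad \dot q_L(t_1)-\dot q_R(t_1) = -i\omega(A+B) = -iJ
   \;\Rightarrow\; A = B = \frac{J}{2\omega},\\
& i\int_{-\infty}^{t_1}\!dt\,\big(L_R - L_L\big)
   = \frac{i}{2}\Big[q_R\dot q_R - q_L\dot q_L\Big]_{t=t_1}
   = \frac{i}{2}\,\big(2i\omega A^2\big) = -\frac{J^2}{4\omega},\\
& W_{\rm tree} = -\frac{J^2}{4\omega} + q_R(t_1)\,J
   = -\frac{J^2}{4\omega} + \frac{J^2}{2\omega} = \frac{J^2}{4\omega}
   = \tfrac12 J^2\,\langle Q^2\rangle .
\end{align*}
```

The third line uses the fact that, for a quadratic Lagrangian evaluated on a solution of the equations of motion, the time integral of $L$ reduces to the boundary term $\frac12 q\dot q$.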
The proof of the tree theorem relies on the path-integral formulation
of the “in-in” formalism, so we begin with a brief derivation of the path-
integral formula for $\exp(W[J,t_1])$. At any given time t, we introduce a
complete set of states $|q,t\rangle$, defined as normalized eigenstates of the $Q_a(t)$
with eigenvalues $q_a$:
$$
Q_a(t)\,|q,t\rangle = q_a\,|q,t\rangle, \qquad \langle q,t|q',t\rangle = \prod_a\delta(q_a - q'_a). \tag{12}
$$

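The expectation value (3) may then be written
$$
\begin{aligned}
e^{W[J,t_1]} = \int\Big(\prod_a dq_a\,dq'_a\,dq''_a\Big)\,&\Psi_0^*(q')\,\langle q,t_1|q',t_0\rangle^* \\
&\times \exp\Big(\sum_a q_aJ_a\Big)\,\langle q,t_1|q'',t_0\rangle\,\Psi_0(q''),
\end{aligned}
\tag{13}
$$
where $\Psi_0(q)$ is the wave function of the state in which the expectation value
is taken
$$
\Psi_0(q) \equiv \langle q,t_0|0,{\rm in}\rangle. \tag{14}
$$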
The matrix elements between eigenstates of the Q’s at times $t_1$ and $t_0$ with
$t_0 < t_1$ are given by an integral over real functions $q_a(t)$ that interpolate
between the eigenvalues at $t_0$ and $t_1$ together with independent unconstrained
real functions $p_a(t)$:
$$
\begin{aligned}
\langle q,t_1|q',t_0\rangle = \int\Big[\prod_{a,t}dq_a(t)\,dp_a(t)\Big]\,&\Big(\prod_a\delta\big(q_a(t_0)-q'_a\big)\Big)\Big(\prod_a\delta\big(q_a(t_1)-q_a\big)\Big) \\
&\times \exp\Big\{i\int_{t_0}^{t_1}dt\,\Big[\sum_a p_a(t)\,\dot q_a(t) - H[q(t),p(t),t]\Big]\Big\}.
\end{aligned}
\tag{15}
$$

(Eq. (15) and its derivation are the same as encountered in the familiar
derivation of the path-integral formula for the S-matrix.) Using this in
Eq. (13), we must introduce separate integration variables $q_{La}(t)$, $p_{La}(t)$ and
$q_{Ra}(t)$, $p_{Ra}(t)$ in the path-integral formulas for $\langle q,t_1|q',t_0\rangle$ and $\langle q,t_1|q'',t_0\rangle$,
respectively. This gives
$$
\begin{aligned}
e^{W[J,t_1]} ={}& \int\Big[\prod_{a,t}dq_{La}(t)\,dp_{La}(t)\Big]\int\Big[\prod_{a,t}dq_{Ra}(t)\,dp_{Ra}(t)\Big] \\
&\times \Psi_0^*\big(q_L(t_0)\big)\,\Psi_0\big(q_R(t_0)\big)\Big(\prod_a\delta\big(q_{La}(t_1)-q_{Ra}(t_1)\big)\Big)\exp\Big(\sum_a q_{Ra}(t_1)J_a\Big) \\
&\times \exp\Big(-i\int_{t_0}^{t_1}dt\,\Big\{\sum_a p_{La}(t)\,\dot q_{La}(t) - H[q_L(t),p_L(t),t]\Big\}\Big) \\
&\times \exp\Big(i\int_{t_0}^{t_1}dt\,\Big\{\sum_a p_{Ra}(t)\,\dot q_{Ra}(t) - H[q_R(t),p_R(t),t]\Big\}\Big).
\end{aligned}
\tag{16}
$$

There is no special reason why we chose qRa (t1 ) rather than qLa (t1 ) to multi-
ply the current in the first exponential; the delta function makes this choice
inconsequential.
Eq. (16) leads to well-known graphical rules: Writing H as the sum of
a quadratic part and an interaction, we expand in powers of the interaction
and J, and evaluate the resulting Gaussian integral as a sum of ways of
pairing the q's and p's in this expansion. Because it enters in the exponential
on the left-hand side of Eq. (16), $W[J,t_1]$ is given by a sum of connected
graphs. To isolate the connected tree graphs, we use the same trick as is
used in proving a tree theorem for the S-matrix[4]: We introduce a fictitious
coupling constant g, defining a generating function $W[J,t_1,g]$ by
$$
\begin{aligned}
e^{W[J,t_1,g]/g} \equiv{}& \int\Big[\prod_{a,t}dq_{La}(t)\,dp_{La}(t)\Big]\int\Big[\prod_{a,t}dq_{Ra}(t)\,dp_{Ra}(t)\Big] \\
&\times \Psi_0^*\big(q_L(t_0)\big)\,\Psi_0\big(q_R(t_0)\big)\Big(\prod_a\delta\big(q_{La}(t_1)-q_{Ra}(t_1)\big)\Big) \\
&\times \exp\Big(\frac1g\sum_a q_{Ra}(t_1)J_a\Big)\exp\Big(-\frac ig\int_{t_0}^{t_1}dt\,\Big\{\sum_a p_{La}(t)\,\dot q_{La}(t) - H[q_L(t),p_L(t),t]\Big\}\Big) \\
&\times \exp\Big(\frac ig\int_{t_0}^{t_1}dt\,\Big\{\sum_a p_{Ra}(t)\,\dot q_{Ra}(t) - H[q_R(t),p_R(t),t]\Big\}\Big).
\end{aligned}
\tag{17}
$$

By the same argument as in Section I, a connected graph with L loops makes
a contribution to $W[J,t_1,g]/g$ proportional to $g^{L-1}$, so $W[J,t_1,g]$ for $g \to 0$
approaches a g-independent limit given by the sum of tree graphs for any
value of g, and in particular for the physical value g = 1:
$$
W[J,t_1,g] \to W[J,t_1]_{\rm tree} \quad\text{for}\quad g \to 0. \tag{18}
$$

Now, in the limit $g \to 0$, the path integrals in Eq. (17) are dominated
by complex J-dependent trajectories of $q_{La}(t)$, $p_{La}(t)$ and $q_{Ra}(t)$, $p_{Ra}(t)$ at
which the coefficient of 1/g in the combined exponential is stationary, so
$$
\begin{aligned}
W[J,t_1]_{\rm tree} = \Bigg[\sum_a q_{Ra}(t_1)J_a &- i\int_{t_0}^{t_1}dt\,\Big\{\sum_a p_{La}(t)\,\dot q_{La}(t) - H[q_L(t),p_L(t),t]\Big\} \\
&+ i\int_{t_0}^{t_1}dt\,\Big\{\sum_a p_{Ra}(t)\,\dot q_{Ra}(t) - H[q_R(t),p_R(t),t]\Big\}\Bigg]_{\rm staty},
\end{aligned}
\tag{19}
$$

with the condition that the trajectories be stationary (indicated by the
subscript “staty”) taken subject to the constraint (9) imposed by the factor
$\prod_a\delta(q_{La}(t_1) - q_{Ra}(t_1))$ in Eq. (17), and also subject to constraints imposed
by the factors $\Psi_0^*(q_L(t_0))$ and $\Psi_0(q_R(t_0))$, about which more later.
The functions $p_{La}(t)$ and $p_{Ra}(t)$ are unconstrained, so in Eq. (19) we
may take them to be given by the classical conditions (8) applied to both
functions. Then Eq. (19) becomes
$$
W[J,t_1]_{\rm tree} = \Bigg[\sum_a q_{Ra}(t_1)J_a - i\int_{t_0}^{t_1}dt\,L[q_L(t),\dot q_L(t),t] + i\int_{t_0}^{t_1}dt\,L[q_R(t),\dot q_R(t),t]\Bigg]_{\rm staty}, \tag{20}
$$

with L[q, q̇, t] given by using (8) to eliminate the ps in Eq. (7). This is the
same as the desired result (11), provided we can show that the trajectories
for which this quantity is stationary are those described by the three above
conditions (A), (B), and (C).
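To implement the condition that this is stationary with respect to
variations in $q_{La}(t)$ and $q_{Ra}(t)$, we note that
$$
\begin{aligned}
\delta\Bigg[\sum_a &q_{Ra}(t_1)J_a - i\int_{t_0}^{t_1}dt\,L[q_L(t),\dot q_L(t),t] + i\int_{t_0}^{t_1}dt\,L[q_R(t),\dot q_R(t),t]\Bigg] \\
={}& \sum_a\delta q_{Ra}(t_1)\,J_a \\
&- i\int_{t_0}^{t_1}dt\sum_a\left[\frac{\partial L[q_L(t),\dot q_L(t),t]}{\partial q_{La}(t)}\,\delta q_{La}(t) + \frac{\partial L[q_L(t),\dot q_L(t),t]}{\partial\dot q_{La}(t)}\,\delta\dot q_{La}(t)\right] \\
&+ i\int_{t_0}^{t_1}dt\sum_a\left[\frac{\partial L[q_R(t),\dot q_R(t),t]}{\partial q_{Ra}(t)}\,\delta q_{Ra}(t) + \frac{\partial L[q_R(t),\dot q_R(t),t]}{\partial\dot q_{Ra}(t)}\,\delta\dot q_{Ra}(t)\right] \\
={}& \sum_a\delta q_{Ra}(t_1)\left[J_a + i\,\frac{\partial L[q_R(t_1),\dot q_R(t_1),t_1]}{\partial\dot q_{Ra}(t_1)}\right] - i\sum_a\frac{\partial L[q_L(t_1),\dot q_L(t_1),t_1]}{\partial\dot q_{La}(t_1)}\,\delta q_{La}(t_1) \\
&- i\int_{t_0}^{t_1}dt\sum_a\left[\frac{\partial L[q_L(t),\dot q_L(t),t]}{\partial q_{La}(t)} - \frac{d}{dt}\frac{\partial L[q_L(t),\dot q_L(t),t]}{\partial\dot q_{La}(t)}\right]\delta q_{La}(t) \\
&+ i\int_{t_0}^{t_1}dt\sum_a\left[\frac{\partial L[q_R(t),\dot q_R(t),t]}{\partial q_{Ra}(t)} - \frac{d}{dt}\frac{\partial L[q_R(t),\dot q_R(t),t]}{\partial\dot q_{Ra}(t)}\right]\delta q_{Ra}(t).
\end{aligned}
\tag{21}
$$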
The vanishing of the coefficients of δqLa (t) and δqRa (t) for t0 < t < t1
yields the Lagrangian equations of motion (5) and (6). The vanishing of the
coefficients of δqLa (t1 ) and δqRa (t1 ), subject to the condition that δqLa (t1 ) =
δqRa (t1 ), yields Eq. (10).
Finally, we must return to the constraint on the solutions of the
Lagrangian equations of motion (5) and (6) imposed by the presence in (17)
of the wave functions $\Psi_0(q_R(t_0))$ and $\Psi_0^*(q_L(t_0))$. As is familiar from the
path-integral calculation of the S-matrix, where $t_0 \to -\infty$ and $|0,{\rm in}\rangle$ is the
“in” vacuum, the factor $\Psi_0(q_R(t_0))$ has the effect of putting a $-i\epsilon$ in the
denominator of the Fourier integral for the propagator, so when Eq. (6)
is solved using this propagator as a Green’s function we get “negative
frequency” solutions $q_{Ra}(t)$, that is, solutions that behave for $t \to -\infty$ as
a superposition of terms with time dependence $e^{i\omega t}$, where $\omega > 0$. As
remarked in [5], because $L[q_L(t_1),\dot q_L(t_1),t_1]$ enters in the argument of the
exponential in (17) with an opposite sign to $L[q_R(t_1),\dot q_R(t_1),t_1]$, the effect
of the wave function $\Psi_0^*(q_L(t_0))$ is to constrain $q_{La}(t)$ to have “positive
frequency”, that is, to behave for $t \to -\infty$ as a superposition of terms with
time dependence $e^{-i\omega t}$, again with $\omega > 0$.
This completes the proof that the functions qLa (t) and qRa (t) that should
be used in Eq. (11) to calculate the tree contribution to W [J, t1 ] are those
satisfying the above conditions (A), (B), and (C). In the case where Ja is real,
these three conditions are consistent with the result that qLa (t)∗ = qRa (t),
which makes the tree approximation (11) to the generating function W [J, t1 ]
real, as expected.
III. Application to Inflation
To illustrate the use of the tree theorem of Section II in cosmology, we will
consider a simple semi-realistic model, in which a single scalar field evolves in
an unperturbed Robertson–Walker metric gµν of zero spatial curvature. We
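write the scalar field $\phi(\mathbf{x},t)$ as an unperturbed term $\bar\phi(t)$ plus a fluctuation
$\varphi(\mathbf{x},t)$ (called $\delta\varphi(\mathbf{x},t)$ in [5]). The Lagrangian for the fluctuation is
$$
\begin{aligned}
L &= \sqrt{g}\int d^3x\,\Big[-\tfrac12\,g^{\mu\nu}\partial_\mu\varphi\,\partial_\nu\varphi - U(\varphi,t)\Big] \\
&= a^3\int d^3x\,\Big[\tfrac12\,\dot\varphi^2 - \frac{1}{2a^2}(\nabla\varphi)^2 - U(\varphi,t)\Big],
\end{aligned}
\tag{22}
$$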

where a(t) is the Robertson–Walker scale factor, and in accordance with the
prescription for constructing the Hamiltonian for fluctuations mentioned in
Section II, the time-dependent potential U (ϕ, t) for the fluctuations is a
function containing only second and higher powers of ϕ(x, t), and given in
terms of the time-independent potential $V(\phi)$ for the total scalar field by
$$
U(\varphi,t) \equiv V\big(\bar\phi(t)+\varphi\big) - V\big(\bar\phi(t)\big) - V'\big(\bar\phi(t)\big)\,\varphi. \tag{23}
$$
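The statement that U contains only second and higher powers of the fluctuation can be checked directly for a polynomial potential. The sketch below is an illustration only: it assumes V is given by a list of hypothetical coefficients `v`, with $V(\phi) = \sum_n v_n\phi^n$, and expands $V(\bar\phi + \varphi)$ with the binomial theorem before subtracting the zeroth and first order terms as in Eq. (23).

```python
from math import comb

def shifted_coeffs(v, phibar):
    """Coefficients of V(phibar + x) in powers of x, for V(y) = sum_n v[n] * y**n."""
    out = [0.0] * len(v)
    for n, vn in enumerate(v):
        for m in range(n + 1):
            out[m] += vn * comb(n, m) * phibar ** (n - m)
    return out

def fluctuation_potential(v, phibar):
    """Coefficients of U(x) = V(phibar + x) - V(phibar) - V'(phibar) * x, as in Eq. (23)."""
    u = shifted_coeffs(v, phibar)
    # Subtract V(phibar) from the zeroth-order coefficient.
    u[0] -= sum(vn * phibar ** n for n, vn in enumerate(v))
    # Subtract V'(phibar) from the first-order coefficient.
    if len(u) > 1:
        u[1] -= sum(n * vn * phibar ** (n - 1) for n, vn in enumerate(v) if n >= 1)
    return u

# Hypothetical example: V = 0.5 * phi^2 + 0.1 * phi^4, background value phibar = 2.
v = [0.0, 0.0, 0.5, 0.0, 0.1]
u = fluctuation_potential(v, 2.0)
print(u)
```

In this example the quadratic coefficient of U equals $\frac12V''(\bar\phi)$, so it changes with time whenever the background value does, which is how U acquires its explicit time dependence.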

It is straightforward to apply the results of the previous section to this
theory. Here the index a is the spatial coordinate x; the variable $q_a(t)$ is
$\varphi(\mathbf{x},t)$; the current $J_a$ is $J(\mathbf{x})$; and derivatives with respect to $q_a(t)$ or $\dot q_a(t)$
are functional derivatives. The Lagrangian equations (5) and (6) are here
the Euler–Lagrange equations
$$
\ddot\varphi_L + 3H\dot\varphi_L - a^{-2}\nabla^2\varphi_L + U'(\varphi_L,t) = 0, \tag{24}
$$
$$
\ddot\varphi_R + 3H\dot\varphi_R - a^{-2}\nabla^2\varphi_R + U'(\varphi_R,t) = 0, \tag{25}
$$
where $H(t) \equiv \dot a(t)/a(t)$, the prime on U indicates a derivative with respect
to its field argument, and the constraints (9) and (10) read
$$
\varphi_L(\mathbf{x},t_1) = \varphi_R(\mathbf{x},t_1), \tag{26}
$$
$$
\dot\varphi_L(\mathbf{x},t_1) - \dot\varphi_R(\mathbf{x},t_1) = -i\,a^{-3}(t_1)\,J(\mathbf{x}). \tag{27}
$$


With t0 = −∞ and expectation values calculated for the Bunch–Davies vac-
uum, the “positive frequency” and “negative frequency” constraints require
in this model that, for t → −∞, ϕL (x, t) and ϕR (x, t) approach superposi-
tions of exp(ik · x − ikη) and exp(ik · x + ikη), respectively, where η is the
conformal time, with η̇ > 0.
After solving Eqs. (24) and (25) subject to these constraints, the
generating functional $W[J,t_1]$ can be calculated in the tree approximation from
Eq. (11), which for this model reads
$$
W[J,t_1]_{\rm tree} = i\int_{-\infty}^{t_1}dt\,\Big\{L[\varphi_R(t),\dot\varphi_R(t),t] - L[\varphi_L(t),\dot\varphi_L(t),t]\Big\} + \int d^3x\,\varphi_R(\mathbf{x},t_1)\,J(\mathbf{x}) \tag{28}
$$
with the Lagrangian L given by Eq. (22). The integral of the Lagrangian
can be simplified by integrating by parts and then using the field equation.
Note that
$$
\begin{aligned}
\int_{-\infty}^{t_1}dt\int d^3x\,a^3(t)\Big[\tfrac12\,\dot\varphi^2 - \frac{1}{2a^2}(\nabla\varphi)^2 - U(\varphi,t)\Big]
={}& \tfrac12\,a^3(t_1)\int d^3x\,\varphi(t_1)\,\dot\varphi(t_1) \\
&- \int_{-\infty}^{t_1}dt\int d^3x\,a^3(t)\Big\{\tfrac12\,\varphi\Big[\ddot\varphi + 3H\dot\varphi - \frac{1}{a^2}\nabla^2\varphi\Big] + U(\varphi,t)\Big\}.
\end{aligned}
$$
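(There is no contribution from the lower limit of the integral, because the
integrand oscillates increasingly rapidly for $t \to -\infty$.) Hence by using the
field equations (24) and (25), Eq. (28) becomes
$$
\begin{aligned}
W[J,t_1]_{\rm tree} ={}& \frac i2\,a^3(t_1)\int d^3x\,\Big\{\varphi_R(\mathbf{x},t_1)\,\dot\varphi_R(\mathbf{x},t_1) - \varphi_L(\mathbf{x},t_1)\,\dot\varphi_L(\mathbf{x},t_1)\Big\} \\
&+ i\int_{-\infty}^{t_1}dt\,a^3(t)\int d^3x\,\Big\{\tfrac12\,\varphi_R(\mathbf{x},t)\,U'\big(\varphi_R(\mathbf{x},t),t\big) - U\big(\varphi_R(\mathbf{x},t),t\big) \\
&\qquad\qquad - \tfrac12\,\varphi_L(\mathbf{x},t)\,U'\big(\varphi_L(\mathbf{x},t),t\big) + U\big(\varphi_L(\mathbf{x},t),t\big)\Big\} \\
&+ \int d^3x\,\varphi_R(\mathbf{x},t_1)\,J(\mathbf{x}).
\end{aligned}
$$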

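Using the constraints (26) and (27), we see that the first term is −1/2 the
last term, so
$$
\begin{aligned}
W[J,t_1]_{\rm tree} ={}& i\int_{-\infty}^{t_1}dt\,a^3(t)\int d^3x\,\Big\{\tfrac12\,\varphi_R(\mathbf{x},t)\,U'\big(\varphi_R(\mathbf{x},t),t\big) - U\big(\varphi_R(\mathbf{x},t),t\big) \\
&\qquad\qquad - \tfrac12\,\varphi_L(\mathbf{x},t)\,U'\big(\varphi_L(\mathbf{x},t),t\big) + U\big(\varphi_L(\mathbf{x},t),t\big)\Big\} \\
&+ \frac12\int d^3x\,\varphi_R(\mathbf{x},t_1)\,J(\mathbf{x}).
\end{aligned}
\tag{29}
$$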
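The cancellation used to reach Eq. (29) can be spelled out in three lines. The block below only restates the boundary term in the notation of Eqs. (26) and (27); all fields are evaluated at $t = t_1$.

```latex
\begin{align*}
\frac{i}{2}\,a^3(t_1)\int d^3x\,\big\{\varphi_R\dot\varphi_R - \varphi_L\dot\varphi_L\big\}
 &= \frac{i}{2}\,a^3(t_1)\int d^3x\,\varphi_R(\mathbf{x},t_1)\,\big[\dot\varphi_R - \dot\varphi_L\big]
 && \text{by (26)}\\
 &= \frac{i}{2}\,a^3(t_1)\int d^3x\,\varphi_R(\mathbf{x},t_1)\,\big[\,i\,a^{-3}(t_1)\,J(\mathbf{x})\big]
 && \text{by (27)}\\
 &= -\frac12\int d^3x\,\varphi_R(\mathbf{x},t_1)\,J(\mathbf{x}) .
\end{align*}
```

Adding this to the source term $\int d^3x\,\varphi_R J$ leaves the coefficient 1/2 that appears in the last line of Eq. (29).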
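This is the form we will use in what follows. With W calculated in this
way, the tree approximation to the expectation value of a product of n ϕ's
is given by Eq. (4) as
$$
\langle 0,{\rm in}|\,\varphi(\mathbf{x},t_1)\,\varphi(\mathbf{y},t_1)\cdots|0,{\rm in}\rangle_{\rm tree} = \left[\frac{\delta^n}{\delta J(\mathbf{x})\,\delta J(\mathbf{y})\cdots}\exp\big(W[J,t_1]_{\rm tree}\big)\right]_{J=0}. \tag{30}
$$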

IV. Integral Equation Formulation

We will now re-write the Euler–Lagrange equations as integral equations
that incorporate the constraints on the behavior of ϕL (x, t) and ϕR (x, t) for
t → −∞ as well as the constraints (26) and (27). It is these integral equa-
tions that will be used in the following section to check that this formalism
generates the usual results of perturbation theory.
We first separate the potential U into a term quadratic in ϕ and an
interaction term Γ containing only cubic and higher terms:
$$
U(\varphi,t) \equiv \tfrac12\,\varphi^2\,U''(0,t) + \Gamma(\varphi,t). \tag{31}
$$
We introduce functions $u_k(t)$ for which $u_k(t)\exp(i\mathbf{k}\cdot\mathbf{x})$ are “positive
frequency” solutions of the linearized Euler-Lagrange equations, that is,
$$
\ddot u_k(t) + 3H(t)\,\dot u_k(t) + \big(k/a(t)\big)^2u_k(t) + U''(0,t)\,u_k(t) = 0, \tag{32}
$$
with “positive frequency” interpreted to mean that the WKB solution for
$t \to -\infty$ is proportional to
$$
a(t)^{-1}\exp\Big(-ik\int^t dt'/a(t')\Big),
$$
rather than to its complex conjugate. It will be convenient to normalize
these functions so that for $t \to -\infty$ the WKB solution takes the form
$$
u_k(t) \to \frac{1}{(2\pi)^{3/2}\sqrt{2k}\,a(t)}\exp\Big(-ik\int_{t_*}^{t}\frac{dt'}{a(t')}\Big), \tag{33}
$$
with $t_*$ any fixed time. The Wronskian of $u_k$ and $u_k^*$ is proportional for all
times to $1/a^3$, so the normalization (33) gives it the value
$$
u_k(t)\,\dot u_k^*(t) - \dot u_k(t)\,u_k^*(t) = \frac{i}{(2\pi)^3a^3(t)}. \tag{34}
$$
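The Wronskian normalization (34) can be checked numerically in a case where the mode function is known in closed form. The sketch below is an illustration under assumptions that are not in the text: it takes exact de Sitter expansion, $a(t) = e^{Ht}$, and a massless field with $U''(0,t) = 0$, for which the positive-frequency solution of Eq. (32) normalized as in Eq. (33) is $u_k = N(-\eta + i/k)\,e^{-ik\eta}$, with $\eta = -1/(aH)$ and $N = H/[(2\pi)^{3/2}\sqrt{2k}]$. The function names are ours.

```python
import cmath
import math

def mode_and_derivative(k, t, H=1.0):
    """u_k(t) and du_k/dt for a massless field in de Sitter space, a(t) = exp(H t)."""
    a = math.exp(H * t)
    eta = -1.0 / (a * H)                      # conformal time, d(eta)/dt = 1/a > 0
    N = H / ((2 * math.pi) ** 1.5 * math.sqrt(2 * k))
    phase = cmath.exp(-1j * k * eta)
    u = N * (-eta + 1j / k) * phase           # reduces to the WKB form (33) for eta -> -infinity
    du_deta = 1j * N * k * eta * phase        # derivative with respect to conformal time
    return u, du_deta / a, a                  # d/dt = (1/a) d/d(eta)

def wronskian(k, t, H=1.0):
    """Left-hand side of Eq. (34), together with the scale factor at time t."""
    u, udot, a = mode_and_derivative(k, t, H)
    return u * udot.conjugate() - udot * u.conjugate(), a

w, a = wronskian(k=0.7, t=-1.3)
expected = 1j / ((2 * math.pi) ** 3 * a ** 3)
print(w, expected)
```

The two printed numbers agree to rounding error for any choice of k and t, as expected from the fact that $a^3$ times the Wronskian is conserved by Eq. (32).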

Using these functions, we can construct a Green’s function:
$$
G(\mathbf{x}-\mathbf{x}',t,t') \equiv \theta(t-t')\Big[G_0(\mathbf{x}-\mathbf{x}',t,t') + G_0^*(\mathbf{x}-\mathbf{x}',t,t')\Big], \tag{35}
$$
where θ is the usual step function, and
$$
G_0(\mathbf{x}-\mathbf{x}',t,t') \equiv -i\int d^3k\,\exp\big(i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')\big)\,u_k(t)\,u_k^*(t'). \tag{36}
$$

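Both $G_0$ and its complex conjugate satisfy the homogeneous wave equation
$$
\left(\frac{\partial^2}{\partial t^2} + 3H(t)\frac{\partial}{\partial t} - a^{-2}(t)\nabla^2 + U''(0,t)\right)G_0(\mathbf{x}-\mathbf{x}',t,t') = 0 \tag{37}
$$
and the Wronskian formula (34) then tells us that G satisfies the
corresponding inhomogeneous equation:
$$
\left(\frac{\partial^2}{\partial t^2} + 3H(t)\frac{\partial}{\partial t} - a^{-2}(t)\nabla^2 + U''(0,t)\right)G(\mathbf{x}-\mathbf{x}',t,t') = -a^{-3}(t)\,\delta(t-t')\,\delta^3(\mathbf{x}-\mathbf{x}'). \tag{38}
$$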
Solutions of Eqs. (24) and (25) that satisfy the constraints (26) and (27)
and the positive and negative frequency conditions, respectively, are given
by the coupled integral equations
$$
\phi_L(x,t) = \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G(x-x',t,t')\,a^3(t')\,\Gamma'\big(\phi_L(x',t'),t'\big)
+ G_0(x-x',t,t')\,a^3(t')\Big[\Gamma'\big(\phi_R(x',t'),t'\big) - \Gamma'\big(\phi_L(x',t'),t'\big)\Big] \Big\}
+ i\int d^3x'\, G_0(x-x',t,t_1)\,J(x') , \tag{39}
$$

$$
\phi_R(x,t) = \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G(x-x',t,t')\,a^3(t')\,\Gamma'\big(\phi_R(x',t'),t'\big)
- G_0^*(x-x',t,t')\,a^3(t')\Big[\Gamma'\big(\phi_R(x',t'),t'\big) - \Gamma'\big(\phi_L(x',t'),t'\big)\Big] \Big\}
- i\int d^3x'\, G_0^*(x-x',t,t_1)\,J(x') . \tag{40}
$$

Functions ϕL (x, t) and ϕR (x, t) that satisfy these integral equations will
satisfy the field equations (24) and (25) because of the properties (37) and
(38) of the Green’s functions, and they will satisfy the positive and negative
frequency conditions because G0 and G∗0 respectively satisfy these condi-
tions, while G(x − x′ , t, t′ ) → 0 for t → −∞. To check the constraint (26),
and for future reference, we note that since θ(t1 − t′ ) = 1,

$$
\phi_L(x,t_1) = \phi_R(x,t_1)
= \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G_0(x-x',t_1,t')\,a^3(t')\,\Gamma'\big(\phi_R(x',t'),t'\big)
+ G_0^*(x-x',t_1,t')\,a^3(t')\,\Gamma'\big(\phi_L(x',t'),t'\big) \Big\}
+ \int d^3x'\, D(x-x',t_1)\,J(x') , \tag{41}
$$

where D is the real function

$$
D(x-x',t_1) = iG_0(x-x',t_1,t_1) = -iG_0^*(x-x',t_1,t_1)
= \int d^3k\, |u_k(t_1)|^2 \exp\big(ik\cdot(x-x')\big) . \tag{42}
$$

To check the constraint (27), we note that G0 (x − x′ , t, t) is imaginary, so


 
$$ \dot G(x-x',t,t') = \theta(t-t')\,\big[\dot G_0(x-x',t,t') + \dot G_0^*(x-x',t,t')\big] , \tag{43} $$

the dot indicating a derivative with respect to the first time argument. It
follows then from Eqs. (39) and (40) that
$$ \dot\phi_L(x,t_1) - \dot\phi_R(x,t_1) = i\int d^3x'\, \big[\dot G_0(x-x',t_1,t_1) + \dot G_0^*(x-x',t_1,t_1)\big]\, J(x') . \tag{44} $$
Using the Wronskian formula (34) again gives

Ġ0 (x − x′ , t1 , t1 ) + Ġ∗0 (x − x′ , t1 , t1 ) = −a−3 (t1 ) δ3 (x − x′ ) , (45)

so Eq. (44) shows that the constraint (27) also is satisfied.


For a real current J(x), and a potential U (ϕ) that satisfies the real-
ity condition U ∗ (ϕ) = U (ϕ∗ ), the iterative solution of Eqs. (39) and (40)
obviously satisfies the condition ϕL (x, t)∗ = ϕR (x, t). For t = t1 , when
ϕL (x, t1 ) = ϕR (x, t1 ), this implies that ϕR (x, t1 ) is real. It then follows that
the formula (29) gives W [J, t1 ] real in the tree approximation.

V. Perturbation Theory

The derivation of the tree theorem in Section II was something less than
mathematically rigorous. In particular, in evaluating the path integral in the
limit g → 0 by setting the argument of the exponential equal to its value
where stationary, we distorted the contour of integration of the variables
qa (t) and pa (t) away from the real axis, without proving the validity of this
analytic continuation. As a check on the validity of this theorem, in this
section we will verify that when we expand the integral equations derived in

the previous section in powers of the interaction Γ, we obtain the first few
orders of perturbation theory given by the usual path-integral formulation
of the “in-in” formalism.
Let us first recall the diagrammatic rules given by perturbation theory
for the expectation value

$$ \langle 0,{\rm in}|\,\phi(x,t_1)\,\phi(y,t_1)\cdots|0,{\rm in}\rangle , \tag{46} $$

(as derived for instance in [5]), in the special case of the model discussed
in Sections III and IV. We sum contributions from diagrams containing any
number of vertices, each of which can be either of L or R type, and is labeled
with a spatial coordinate and a time t ≤ t1 . For each vertex of L or R type
we include a factor +ia3 (t) or −ia3 (t), respectively, together with a time-
dependent coupling constant equal to the n-th derivative with respect to
ϕ of Γ(ϕ, t) at ϕ = 0 for a vertex to which is attached n internal and/or
external lines. Attached to the vertices are external lines, corresponding to
fields in the expectation value (46), and/or internal lines connecting pairs
of vertices. The contribution of an external line corresponding to a field
ϕ(x, t1 ) attached to a L or R vertex labeled with spacetime coordinates y,
t, is a function
$$ L:\quad \langle 0|\,\phi^I(y,t)\,\phi^I(x,t_1)\,|0\rangle \tag{47} $$
or
$$ R:\quad \langle 0|\,\phi^I(x,t_1)\,\phi^I(y,t)\,|0\rangle , \tag{48} $$
respectively, where ϕI (x, t) is the interaction picture field
$$ \phi^I(x,t) \equiv \int d^3k\, \Big[ e^{ik\cdot x}\,u_k(t)\,\alpha(k) + e^{-ik\cdot x}\,u_k^*(t)\,\alpha^*(k) \Big] \tag{49} $$

with α(k) and α∗ (k) the usual annihilation and creation operators, and |0⟩
is the bare vacuum, annihilated by α(k). The contribution of an internal
line connecting two vertices of L and/or R type labeled with spacetime
coordinates x, t and x′ , t′ is

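$$ LL:\quad \langle 0|\,\bar T\{\phi^I(x,t)\,\phi^I(x',t')\}\,|0\rangle , \tag{50} $$
$$ RR:\quad \langle 0|\,T\{\phi^I(x,t)\,\phi^I(x',t')\}\,|0\rangle , \tag{51} $$
$$ LR:\quad \langle 0|\,\phi^I(x,t)\,\phi^I(x',t')\,|0\rangle , \tag{52} $$
$$ RL:\quad \langle 0|\,\phi^I(x',t')\,\phi^I(x,t)\,|0\rangle , \tag{53} $$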

where T and T̄ denote time-ordered and anti-time-ordered products, respectively.
In addition, there may be lines passing through the diagram without
interaction, corresponding to pairings of fields in the product whose
expectation value is being calculated. Each such line, corresponding to fields
ϕ(x, t1 ) and ϕ(y, t1 ), makes a contribution

$$ \langle 0|\,\phi^I(x,t_1)\,\phi^I(y,t_1)\,|0\rangle , \tag{54} $$

the order of the fields here being unimportant, since fields at the same time
commute. Finally, we are to integrate the spatial coordinates associated with
each vertex over all space, and integrate the corresponding time coordinates
from −∞ to t1 .
As a first step in checking the agreement of our theorem with these
graphical rules, we note that the quadratic term in U (ϕ) drops out in the
combination ½ ϕU ′ (ϕ, t) − U (ϕ, t), so Eq. (29) may be written in terms of
the interaction Γ defined by Eq. (31):
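$$
W[J,t_1]_{\rm tree} = i\int_{-\infty}^{t_1} dt \int d^3x\, \Big\{ \frac{1}{2}\,\phi_R(x,t)\,\Gamma'\big(\phi_R(x,t),t\big) - \Gamma\big(\phi_R(x,t),t\big)
- \frac{1}{2}\,\phi_L(x,t)\,\Gamma'\big(\phi_L(x,t),t\big) + \Gamma\big(\phi_L(x,t),t\big) \Big\}\, a^3(t)
+ \frac{1}{2}\int d^3x\, \phi_R(x,t_1)\,J(x) . \tag{55}
$$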
We can now expand in powers of Γ.
Zeroth Order
The zeroth order approximation to ϕR (x, t1 ) is given by dropping the
terms involving Γ′ in Eq. (41):
$$ \phi_R^{(0)}(x,t_1) = \int d^3x'\, D(x-x',t_1)\,J(x') . \tag{56} $$

Using this in Eq. (55) gives the zeroth order perturbation to the generating
function:
$$ W^{(0)}[J,t_1]_{\rm tree} = \frac{1}{2}\int d^3x \int d^3x'\, D(x-x',t_1)\,J(x)\,J(x') . \tag{57} $$
The result that this is quadratic in J corresponds to the elementary obser-
vation that in the absence of interactions all fluctuations are Gaussian. The
two-point correlation is given in the tree approximation by Eqs. (30) and
(57) as
$$ \langle 0,{\rm in}|\,\phi(x,t_1)\,\phi(y,t_1)\,|0,{\rm in}\rangle_{\rm tree} = D(x-y,t_1) . \tag{58} $$

This agrees with the above perturbative rules, because the two-point func-
tion (54) is just D(x − y, t1 ).
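
To make the zeroth-order result concrete: by Eq. (42), the Fourier integrand of D(x − y, t1 ) is |uk (t1 )|2 . The sketch below evaluates it for an assumed massless mode in exact de Sitter space, where |uk |2 = H 2 [1 + (k/aH)2 ]/[2(2π)3 k3 ]. The background, the value H = 1, and the function names are illustrative assumptions, not part of the text.

```python
import math

H = 1.0  # assumed constant expansion rate (exact de Sitter), illustrative only


def mode_squared(k, t1):
    """|u_k(t1)|^2 for the assumed massless de Sitter mode function."""
    a = math.exp(H * t1)
    x = k / (a * H)  # k/(aH): small when the mode is far outside the horizon
    return H ** 2 / (2.0 * (2.0 * math.pi) ** 3 * k ** 3) * (1.0 + x ** 2)


def late_time_limit(k):
    """Time-independent value approached as k/(aH) -> 0."""
    return H ** 2 / (2.0 * (2.0 * math.pi) ** 3 * k ** 3)


if __name__ == "__main__":
    k = 1.0
    for t1 in (0.0, 2.0, 5.0):
        ratio = mode_squared(k, t1) / late_time_limit(k)
        # The fractional deviation equals (k/aH)^2 and decays as exp(-2 H t1).
        print(t1, ratio - 1.0)
```

In this toy case the two-point function freezes to a constant once the mode is outside the horizon, with a fractional correction of order (k/aH)2 .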
First Order
To first order in Γ, Eq. (55) gives the generating function as
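$$
W^{(1)}[J,t_1]_{\rm tree} = i\int_{-\infty}^{t_1} dt \int d^3x\, \Big\{ \frac{1}{2}\,\phi_R^{(0)}(x,t)\,\Gamma'\big(\phi_R^{(0)}(x,t),t\big) - \Gamma\big(\phi_R^{(0)}(x,t),t\big)
- \frac{1}{2}\,\phi_L^{(0)}(x,t)\,\Gamma'\big(\phi_L^{(0)}(x,t),t\big) + \Gamma\big(\phi_L^{(0)}(x,t),t\big) \Big\}\, a^3(t)
+ \frac{1}{2}\int d^3x\, \phi_R^{(1)}(x,t_1)\,J(x) \tag{59}
$$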
with superscripts (0) and (1) indicating terms of zeroth and first order in Γ.
Eqs. (39) and (40) give the zeroth order fields for general time as
$$ \phi_L^{(0)}(x,t) = i\int d^3x'\, G_0(x-x',t,t_1)\,J(x') , \tag{60} $$
$$ \phi_R^{(0)}(x,t) = -i\int d^3x'\, G_0^*(x-x',t,t_1)\,J(x') , \tag{61} $$

while from Eq. (41) we find the first-order fields at time t1 :


$$
\phi_L^{(1)}(x,t_1) = \phi_R^{(1)}(x,t_1)
= \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G_0(x-x',t_1,t')\,a^3(t')\,\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big)
+ G_0^*(x-x',t_1,t')\,a^3(t')\,\Gamma'\big(\phi_L^{(0)}(x',t'),t'\big) \Big\} . \tag{62}
$$

From the definition (36), we see that G0 (x − x′ , t, t′ ) = −G∗0 (x − x′ , t′ , t), so
Eqs. (60)–(62) give the final term in Eq. (59) as

$$
\frac{1}{2}\int d^3x\, \phi_R^{(1)}(x,t_1)\,J(x) = -\frac{1}{2}\int d^3x \int d^3x' \int_{-\infty}^{t_1} dt'\, J(x)
\Big\{ G_0^*(x-x',t',t_1)\,a^3(t')\,\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big)
+ G_0(x-x',t',t_1)\,a^3(t')\,\Gamma'\big(\phi_L^{(0)}(x',t'),t'\big) \Big\}
$$
$$
= -\frac{i}{2}\int d^3x' \int_{-\infty}^{t_1} dt'\, \Big\{ \phi_R^{(0)}(x',t')\,\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big)
- \phi_L^{(0)}(x',t')\,\Gamma'\big(\phi_L^{(0)}(x',t'),t'\big) \Big\}\, a^3(t') .
$$

This cancels the Γ′ terms in the first two lines of Eq. (59), leaving us with
the simple first-order result
$$ W^{(1)}[J,t_1]_{\rm tree} = i\int_{-\infty}^{t_1} dt \int d^3x\, \Big\{ -\Gamma\big(\phi_R^{(0)}(x,t),t\big) + \Gamma\big(\phi_L^{(0)}(x,t),t\big) \Big\}\, a^3(t) . \tag{63} $$
According to Eqs. (60) and (61), when we take the functional derivative of
ϕL(0) (y, t) or ϕR(0) (y, t) with respect to J(x) we get a factor iG0 (y − x, t, t1 )
or −iG∗0 (y − x, t, t1 ), respectively, while the definitions (36) and (49) give
$$ iG_0(y-x,t,t_1) = \int d^3k\, \exp\big(ik\cdot(y-x)\big)\, u_k(t)\,u_k^*(t_1) = \langle 0|\,\phi^I(y,t)\,\phi^I(x,t_1)\,|0\rangle , \tag{64} $$
$$ -iG_0^*(y-x,t,t_1) = \int d^3k\, \exp\big(ik\cdot(x-y)\big)\, u_k(t_1)\,u_k^*(t) = \langle 0|\,\phi^I(x,t_1)\,\phi^I(y,t)\,|0\rangle , \tag{65} $$
which are the same as the factors (47) and (48) associated according to the
usual graphical rules with external lines. Thus, Eq. (63) is just what is given
by the “in-in” formalism for a diagram with a single vertex, which can be
L or R type, and any number n ≥ 3 of external lines for which there is a
term in Γ with n field factors. For instance, in the case n = 3, the third
derivative of Eq. (63) gives
$$
\langle 0,{\rm in}|\,\phi(x,t_1)\,\phi(x',t_1)\,\phi(x'',t_1)\,|0,{\rm in}\rangle = i\int_{-\infty}^{t_1} dt \int d^3y\, \Gamma'''(0,t)\,a^3(t)
\Big\{ G_0(y-x,t,t_1)\,G_0(y-x',t,t_1)\,G_0(y-x'',t,t_1)
+ G_0^*(y-x,t,t_1)\,G_0^*(y-x',t,t_1)\,G_0^*(y-x'',t,t_1) \Big\} ,
$$

which is the same as given by a graph with a single vertex to which are
attached three external lines. This three-point function has been calculated
in a different way, by a perturbative solution for the Heisenberg picture
fields, by Seery, Malik, and Lyth[6]. It is the analog of the three-point
function calculated in a more realistic model by Maldacena[3].

Second Order
So far, we have confirmed in first order that we get the right contribu-
tions from external lines, but to check that we get the right propagators for
internal lines, we have to go to second order in Γ. According to Eq. (55),
the contribution to W [J, t1 ]tree of second order in Γ is
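$$
W^{(2)}[J,t_1]_{\rm tree} = i\int_{-\infty}^{t_1} dt\, a^3(t) \int d^3x\, \Big\{ -\frac{1}{2}\,\phi_R^{(1)}(x,t)\,\Gamma'\big(\phi_R^{(0)}(x,t),t\big)
+ \frac{1}{2}\,\phi_R^{(0)}(x,t)\,\phi_R^{(1)}(x,t)\,\Gamma''\big(\phi_R^{(0)}(x,t),t\big)
+ \frac{1}{2}\,\phi_L^{(1)}(x,t)\,\Gamma'\big(\phi_L^{(0)}(x,t),t\big) - \frac{1}{2}\,\phi_L^{(0)}(x,t)\,\phi_L^{(1)}(x,t)\,\Gamma''\big(\phi_L^{(0)}(x,t),t\big) \Big\}
+ \frac{1}{2}\int d^3x\, \phi_R^{(2)}(x,t_1)\,J(x) . \tag{66}
$$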
The second-order contribution to ϕR (x, t1 ) is given by Eq. (41) as
$$
\phi_R^{(2)}(x,t_1) = \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G_0(x-x',t_1,t')\,a^3(t')\,\Gamma''\big(\phi_R^{(0)}(x',t'),t'\big)\,\phi_R^{(1)}(x',t')
+ G_0^*(x-x',t_1,t')\,a^3(t')\,\Gamma''\big(\phi_L^{(0)}(x',t'),t'\big)\,\phi_L^{(1)}(x',t') \Big\} . \tag{67}
$$

By again using the relation G0 (x−x′ , t, t′ ) = −G∗0 (x−x′ , t′ , t) and Eqs. (60)–
(62), now together with Eq. (67), we see that the final term in Eq. (66) is

$$
\frac{1}{2}\int d^3x\, \phi_R^{(2)}(x,t_1)\,J(x) = -\frac{1}{2}\int d^3x \int d^3x' \int_{-\infty}^{t_1} dt'\, J(x)
\Big\{ G_0^*(x-x',t',t_1)\,a^3(t')\,\Gamma''\big(\phi_R^{(0)}(x',t'),t'\big)\,\phi_R^{(1)}(x',t')
+ G_0(x-x',t',t_1)\,a^3(t')\,\Gamma''\big(\phi_L^{(0)}(x',t'),t'\big)\,\phi_L^{(1)}(x',t') \Big\}
$$
$$
= -\frac{i}{2}\int d^3x' \int_{-\infty}^{t_1} dt'\, \Big\{ \phi_R^{(0)}(x',t')\,\Gamma''\big(\phi_R^{(0)}(x',t'),t'\big)\,\phi_R^{(1)}(x',t')
- \phi_L^{(0)}(x',t')\,\Gamma''\big(\phi_L^{(0)}(x',t'),t'\big)\,\phi_L^{(1)}(x',t') \Big\}\, a^3(t') .
$$

This cancels the Γ′′ terms in the first three lines of Eq. (66), leaving us with
the much simpler relation
$$
W^{(2)}[J,t_1]_{\rm tree} = i\int_{-\infty}^{t_1} dt\, a^3(t) \int d^3x\, \Big\{ -\frac{1}{2}\,\phi_R^{(1)}(x,t)\,\Gamma'\big(\phi_R^{(0)}(x,t),t\big)
+ \frac{1}{2}\,\phi_L^{(1)}(x,t)\,\Gamma'\big(\phi_L^{(0)}(x,t),t\big) \Big\} . \tag{68}
$$

(A similar cancelation occurs in each order of perturbation theory.) Eqs. (39)
and (40) give the first-order contributions to ϕL and ϕR for general times:
$$
\phi_L^{(1)}(x,t) = \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G(x-x',t,t')\,a^3(t')\,\Gamma'\big(\phi_L^{(0)}(x',t'),t'\big)
+ G_0(x-x',t,t')\,a^3(t')\Big[\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big) - \Gamma'\big(\phi_L^{(0)}(x',t'),t'\big)\Big] \Big\} , \tag{69}
$$

$$
\phi_R^{(1)}(x,t) = \int d^3x' \int_{-\infty}^{t_1} dt' \Big\{ G(x-x',t,t')\,a^3(t')\,\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big)
- G_0^*(x-x',t,t')\,a^3(t')\Big[\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big) - \Gamma'\big(\phi_L^{(0)}(x',t'),t'\big)\Big] \Big\} . \tag{70}
$$

Using Eqs. (69) and (70) in Eq. (68) gives


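$$
W^{(2)}[J,t_1]_{\rm tree} = \frac{1}{2}\int_{-\infty}^{t_1} dt\, a^3(t) \int d^3x \int_{-\infty}^{t_1} dt'\, a^3(t') \int d^3x'
\Big\{ -\Gamma'\big(\phi_R^{(0)}(x,t),t\big)\,\Delta_{RR}(x-x',t,t')\,\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big)
+ \Gamma'\big(\phi_R^{(0)}(x,t),t\big)\,\Delta_{RL}(x-x',t,t')\,\Gamma'\big(\phi_L^{(0)}(x',t'),t'\big)
+ \Gamma'\big(\phi_L^{(0)}(x,t),t\big)\,\Delta_{LR}(x-x',t,t')\,\Gamma'\big(\phi_R^{(0)}(x',t'),t'\big)
- \Gamma'\big(\phi_L^{(0)}(x,t),t\big)\,\Delta_{LL}(x-x',t,t')\,\Gamma'\big(\phi_L^{(0)}(x',t'),t'\big) \Big\} , \tag{71}
$$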

where
$$ \Delta_{LL}(x-x',t,t') = -iG(x-x',t,t') + iG_0(x-x',t,t') , \tag{72} $$
$$ \Delta_{RR}(x-x',t,t') = iG(x-x',t,t') - iG_0^*(x-x',t,t') , \tag{73} $$
$$ \Delta_{LR}(x-x',t,t') = iG_0(x-x',t,t') , \tag{74} $$
$$ \Delta_{RL}(x-x',t,t') = -iG_0^*(x-x',t,t') . \tag{75} $$

Referring back to the definitions (35) and (36) of the Green’s functions,
and recalling the formula (49) for the interaction picture field, we see that
Eqs. (72)–(75) give

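$$ \Delta_{LL}(x-x',t,t') = \langle 0|\,\bar T\{\phi(x,t)\,\phi(x',t')\}\,|0\rangle , \tag{76} $$
$$ \Delta_{RR}(x-x',t,t') = \langle 0|\,T\{\phi(x,t)\,\phi(x',t')\}\,|0\rangle , \tag{77} $$
$$ \Delta_{LR}(x-x',t,t') = \langle 0|\,\phi(x,t)\,\phi(x',t')\,|0\rangle , \tag{78} $$
$$ \Delta_{RL}(x-x',t,t') = \langle 0|\,\phi(x',t')\,\phi(x,t)\,|0\rangle , \tag{79} $$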

in agreement with the rules (50)–(53) for propagators in the “in-in” formalism.
The signs of the four terms in brackets in Eq. (71) are just those
expected from the rule of associating factors +i and −i with L and R
vertices, respectively.
We have recovered enough of the results of perturbation theory to assure
us of the validity of the tree theorem for the “in-in” formalism.
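
The identification of Eqs. (72)–(75) with the ordered products (76)–(79) can be checked mode by mode. The sketch below does this for a single wave number, using a flat-space toy mode function u(t) = exp(−iωt)/√(2ω); the toy mode, the value of ω, and the function names are assumptions for illustration only. For an isotropic mode function the substitution k → −k turns the k-space integrand of G∗0 into the complex conjugate of the integrand of G0 , which is what the code uses.

```python
import cmath
import math

OMEGA = 1.3  # assumed frequency of a flat-space toy mode, illustrative only


def u(t):
    """Toy positive-frequency mode function (flat space, a = 1)."""
    return cmath.exp(-1j * OMEGA * t) / math.sqrt(2.0 * OMEGA)


def theta(x):
    return 1.0 if x > 0 else 0.0


def g0(t, tp):
    """Single-mode analog of Eq. (36): -i u(t) u*(t')."""
    return -1j * u(t) * u(tp).conjugate()


def g(t, tp):
    """Single-mode analog of Eq. (35)."""
    return theta(t - tp) * (g0(t, tp) + g0(t, tp).conjugate())


def wightman(t, tp):
    """Single-mode analog of <0| phi(t) phi(t') |0>."""
    return u(t) * u(tp).conjugate()


def time_ordered(t, tp):
    return wightman(t, tp) if t > tp else wightman(tp, t)


def anti_time_ordered(t, tp):
    return wightman(tp, t) if t > tp else wightman(t, tp)


def delta_LL(t, tp):  # Eq. (72)
    return -1j * g(t, tp) + 1j * g0(t, tp)


def delta_RR(t, tp):  # Eq. (73)
    return 1j * g(t, tp) - 1j * g0(t, tp).conjugate()


def delta_LR(t, tp):  # Eq. (74)
    return 1j * g0(t, tp)


def delta_RL(t, tp):  # Eq. (75)
    return -1j * g0(t, tp).conjugate()


if __name__ == "__main__":
    for t, tp in [(0.3, -1.1), (-2.0, 0.7)]:
        print(abs(delta_LL(t, tp) - anti_time_ordered(t, tp)),
              abs(delta_RR(t, tp) - time_ordered(t, tp)),
              abs(delta_LR(t, tp) - wightman(t, tp)),
              abs(delta_RL(t, tp) - wightman(tp, t)))
```

All four differences should vanish to rounding error for either time ordering. The check is purely algebraic, so it does not depend on the particular mode function chosen.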

I am grateful for conversations with P. Greene, E. Komatsu, J. Maldacena,
and M. Musso. This material is based upon work supported by the
National Science Foundation under Grant No. PHY-0455649.

References

1. J. Schwinger, Proc. Nat. Acad. Sci. US 46, 1401 (1961). Also see K.
T. Mahanthappa, Phys. Rev. 126, 329 (1962); P. M. Bakshi and K. T.
Mahanthappa, J. Math. Phys. 4, 1, 12 (1963); L. V. Keldysh, Soviet
Physics JETP 20, 1018 (1965); P. Danielewicz, Ann. Phys. 152, 239
(1984); K Chou, Z. Su, B. Hao, and L. Yu, Phys. Rept. 118, 1 (1985);
R. D. Jordan, Phys. Rev. D 33, 444 (1986); B. DeWitt, The Global
Approach to Quantum Field Theory (Clarendon Press, Oxford, 2003):
Sec. 31. For applications to cosmology, see E. Calzetta and B. L. Hu,
Phys. Rev. D 35, 495 (1987); M. Morikawa, Prog. Theor. Phys. 93,
685 (1995); N. C. Tsamis and R. Woodard, Ann. Phys. 238, 1 (1995);
253, 1 (1997); N. C. Tsamis and R. Woodard, Phys. Lett. B426, 21
(1998); V. K. Onemli and R. P. Woodard, Class. Quant. Grav. 19,
4607 (2002); T. Prokopec, O. Tornkvist, and R. P. Woodard, Ann.
Phys. 303, 251 (2003); T. Prokopec and R. P. Woodard, JHEP 0310,
059 (2003); V. K. Onemli and R. P. Woodard, Phys. Rev. D 70,
107301 (2004); T. Brunier, V.K. Onemli, and R. P. Woodard, Class.
Quant. Grav. 22, 59 (2005); S. Weinberg, Phys. Rev. D 72, 043514

(2005); Phys. Rev. D 74, 023508 (2006); M. van der Meulen and J.
Smit, J. Cosm. Astropart. Phys. 11, 023 (2007).

2. It is shown by S. Weinberg, Phys. Rev. D 74, 023508 (2006) that in
a broad class of theories there are no contributions to the correlation
functions involving positive powers of the Robertson-Walker scale factor
a, though powers of ln a are possible. The possibility of powers of
ln a, arising when the effective cut off provided by renormalization is
at virtual physical wave numbers of order H, was pointed out by S.
Weinberg, Phys. Rev. D 72, 043514 (2005). This was found to occur
by M. van der Meulen and J. Smit, ref. [1].

3. For a non-linear treatment, see J. M. Maldacena, J. High Energy Phys.
05, 013 (2003) for the case of single field inflation, and D. H. Lyth, K.
A. Malik, and M. Sasaki, J. Cosm. Astropart. Phys. 05, 004 (2005)
for single-field inflation and its aftermath.

4. S. Coleman, in Aspects of Symmetry (Cambridge University Press,
Cambridge, 1985): pp 139–142.

5. S. Weinberg, Phys. Rev. D 72, 043514 (2005).

6. D. Seery, K. A. Malik, and D. H. Lyth, J. Cosm. Astropart. Phys.
03, 014 (2008).

UTTG-06-08

Non-Gaussian Correlations Outside the Horizon


arXiv:0808.2909v3 [hep-th] 18 Oct 2008

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is shown that under essentially all conditions, the non-linear classical equa-
tions governing gravitation and matter in cosmology have a solution in which
far outside the horizon in a suitable gauge the reduced spatial metric (the
spatial metric divided by the square of the Robertson–Walker scale factor
a) is time-independent, though with an arbitrary dependence on co-moving
coordinates, and all perturbations to the other metric components and to all
matter variables vanish, to leading order in 1/a. The corrections are of order
1/a2 , and are explicitly given for the reduced metric in a multifield model
with a general potential. Further, this is the solution that describes the
metric and matter produced by single-field inflation. These results justify
the use of observed non-Gaussian correlations (or their absence) as a test
of theories of single-field inflation, despite our ignorance of the constituents
of the universe while fluctuations are outside the horizon after inflation, as
long as graphs with loops can be neglected.


∗ Electronic address: weinberg@physics.utexas.edu

I. Introduction

Non-Gaussian cosmological correlations are attracting increasing interest
as an observational test of detailed theories of inflation[1]. But there is a
problem in calculations of observable non-Gaussian correlations. Given any
specific Lagrangian for the scalar fields that play a role in inflation, we know
in principle how to calculate the correlation functions for these fields and
gravitation up to the end of inflation. And given any set of correlation
functions for gravitational and matter and radiation perturbations at some
time in the relatively recent era when the temperature is well below the
QCD scale, we know enough about the contents of the universe to calculate
the subsequent evolution of these correlations. But in the intervening era
the universe went through a sequence of transformations about which we
know almost nothing, including reheating, lepton and baryon synthesis, and
cold matter decoupling. So how can we use assumptions about inflation to
calculate observable correlations?
The only thing that gives us any hope in understanding cosmological
correlations is that the wavelengths at which these correlations are observed
were outside the horizon during the whole period from a time during in-
flation until a relatively recent time when the contents of the universe are
reasonably well understood. But in order to take advantage of this fact,
we need to identify a set of variables whose correlation functions are time-
independent for wavelengths outside the horizon. In the linear approxima-
tion, it is known that quantum fluctuations in single-field inflation produce
adiabatic fluctuations, in which the curvature perturbation ζ as well as the
amplitude of gravitational waves become constant outside the horizon[2].
This is enough to show that Gaussian correlations of these quantities re-
main time-independent after inflation, as long as the wavelength is outside
the horizon, at least when quantum effects are neglected. But for non-
Gaussian correlations we must work with the full non-linear field equations.
It is known that the classical non-linear field equations for single-field
inflation have an “adiabatic” solution for which ζ and the gravitational wave
amplitude become time-independent at late times during inflation[3], and it
has further been shown[4] that these quantities become time-independent
outside the horizon both during and after single-field inflation, but this
is only part of the story. To provide initial conditions for a calculation of
fluctuations through horizon re-entry and until the present, we need to know
not only ζ and the other metric components but also the matter (including
radiation) perturbations before horizon re-entry. It is sometimes taken as

part of the definition of the adiabatic solution that (in a suitable gauge)
these matter perturbations vanish, but it needs to be shown that such a
solution exists, and that in some circumstances the universe is described by
this solution.
Section II of this paper shows that whatever the constituents of the uni-
verse and the classical equations governing them may be, these equations
have a solution for which in a suitable gauge, as long as all relevant wave-
lengths are sufficiently far outside the horizon, all components of the reduced
metric g̃ij ≡ gij /a2 become time-independent functions of position; g00 be-
comes −1; gi0 vanishes; and all matter densities, pressures, and velocities
become equal to their unperturbed values; in all cases with corrections of
order (k/aH)2 . (As usual, a(t) is the Robertson–Walker scale factor and
H(t) ≡ ȧ(t)/a(t), while k is the largest relevant wave number.) The ar-
gument for these results is based on considerations of broken symmetry,
similar to those used to derive the form of the chiral Lagrangian for soft
pions[5]. It relies only on the general covariance of the underlying equa-
tions, and the usual assumption that these equations have a solution of the
Robertson–Walker form. This argument may be regarded as a substitute for
a “separate universe” assumption[6], but it gives more detailed information,
and some may find it more convincing.
In Section III we verify these results in a fairly general model of matter
fields, in a special gauge that allows the calculation of metric perturbations
outside the horizon without at the same time having to solve the equations
for matter perturbations. In particular we confirm that in this model there is
always an adiabatic solution in which the corrections to the leading terms for
both the metric and the matter variables are of order (k/aH)2 . This much is
already known for the metric components[4], but here we also obtain explicit
results∗∗ for the form of the O(1/a2 ) and O(1/a3 ) terms in g̃ij [7]. In general
this is only one of many possible solutions; whether or not it is the solution
that describes the real world depends on the details of inflation. In Section
IV we show that single field inflation in this model leads to matter and
metric fields that are described by the adiabatic solution after inflation, as
long as all wavelengths are outside the horizon.
∗∗ Added note: In a paper now in preparation, I show that these explicit results are not
limited to scalar field theories, but are quite general for theories with anisotropic inertia
and vorticity that vanish to order a−2 .

There is a problem with all such classical field calculations. Even if we
could show that under all circumstances the full non-linear classical field
equations have a solution for which ζ and gravitational wave amplitudes
become constant outside the horizon, and that the initial conditions provided
by single-field inflation (or a state of thermal equilibrium after inflation) pro-
duce perturbations that are described by this solution, we still would not
know that Heisenberg picture quantum operators for ζ and gravitational
wave amplitudes become constant at late times, and so we would not know
that the correlation functions become constant outside the horizon. “Out-
side the horizon” means that for all relevant co-moving wave numbers k,
we have k/a ≪ H. For quantum operators there can be no clear meaning
to this, because whatever the wave numbers at which the correlation func-
tions are observed, quantum fluctuations can carry arbitrarily large virtual
wave numbers. That is, there is no limit on the wave numbers circulating in
the loops in general graphs, however small we make the wave numbers for
the external lines of these graphs. For inflation with a single inflaton field,
the loop contributions to the correlation function of ζ are integrals over
virtual wave numbers p with integrands that are time-independent when
the virtual as well as the external wavelengths are outside the horizon, but
virtual wavelengths can not be constrained to remain outside the horizon,
because the integrals over virtual wave numbers are ultraviolet-divergent.
(For examples, see [8].) True, we can assume that the ultraviolet divergence
is canceled by counterterms arising from √(−Det g) Rµν Rµν and √(−Det g) R2
terms in the Lagrangian density[9], but if this cancelation results in an
effective cut-off at p/a of order H, then the correlation functions will involve
powers of ln a[8]. Detailed calculations[10] confirm the presence of such
time-dependent corrections in loop contributions to correlation functions.
Fortunately, in many theories the tree graphs make much larger contri-
butions to the correlation functions than loop graphs. For a tree graph the
wave number associated with any internal line is just a sum of wave num-
bers of several external lines, so all internal wavelengths can be assumed
to be outside the horizon if the external wavelengths are. Thus at least in
some theories, one can treat the non-linear field equations as if all relevant
wavelengths were outside the horizon, and hope that quantum effects do not
introduce large corrections. Alternatively, one can limit oneself to the tree
approximation from the beginning, and assume that tree graphs give a good
approximation to the correlation functions.
A recent paper[11] showed how to calculate the sum of tree graphs for
the generating function for general correlation functions by solving the clas-
sical equations of motion subject to certain constraints that depend on the
current appearing in the generating function. This is reviewed here in an
appendix, using a simplified notation and adding some necessary comments.
In order to conclude in this formalism that correlation functions become
time-independent outside the horizon, it is not enough to show that the so-
lution of the non-linear classical field equations becomes time-independent
at late times during inflation. As reviewed here in the appendix, one must
also show that the effect of the constraints that are imposed at the time at
which the correlation functions are measured becomes independent of this
time when all wavelengths are outside the horizon, and also that a certain
integral converges. In Section V we show that these conditions are all sat-
isfied during and after single-field inflation for the quantity g̃ij ≡ gij /a2 .
(The same argument applies to any function of g̃ij , such as the quantity
ζ ≡ ln √(Det g̃) studied in [3].) Thus in order for parametric amplification
during reheating[12] to produce significant changes in the correlation
functions, such effects would have to amplify perturbations by a factor of order
e^{120} to e^{140} .
In summary, these results provide a practical program for calculating
observable correlation functions from theories of single-field inflation.
(i) First calculate the correlation functions of g̃ij ≡ gij /a2 (or any functions
of g̃ij ) sufficiently late after horizon exit during inflation so that they are
time-independent, using a definition of the time coordinate for which the
inflaton field is unperturbed. (This would presumably be done by direct
calculation of tree graphs, as already done in [3] for the bispectrum, rather
than by using the methods of Section V, which are intended only to provide a
proof of the time-independence outside the horizon of the sum of tree graphs
for any correlation function of g̃ij .) If we like we can separate a curvature
perturbation ζ by following [3] and writing

$$ \tilde g_{ij} = e^{2\zeta}\,[e^{\gamma}]_{ij} , \qquad {\rm Tr}\,\gamma = 0 , $$

or alternatively by writing

g̃ij = δij + 2ζ δij + γij , Tr γ = 0 .

These different definitions of course give different non-Gaussian correlation
functions for ζ, but with either definition the correlation functions are
constant outside the horizon.
(ii) At a time which is sufficiently early so that all wavelengths are still
outside the horizon, but late enough so that the contents of the universe are
well understood, take the correlation functions of g̃ij or of functions of g̃ij

to be given by the results of (i), and take all correlation functions involving
gi0 and/or g00 + 1 and/or matter or radiation perturbations to vanish.
(iii) Use the results of (ii) as initial conditions for calculation of the subse-
quent evolution of the correlation functions for gravitational and matter and
radiation perturbations when the wavelengths re-enter the horizon. This can
be done by using the classical field equations to derive coupled differential
equations for the various correlation functions, but such calculations are
outside the scope of this paper.
The above program appears to be more or less what is done in recent work
on non-Gaussian correlations[13]. The aim of this paper is to clarify the
justification for these calculations.
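
The remark in step (i) that the two definitions of ζ give different non-Gaussian correlations can be illustrated in the simplest case. Setting γ = 0 (an assumption made here purely for illustration), the same reduced metric is written either as exp(2ζ) δij or as (1 + 2ζ ′ ) δij , so that ζ ′ = [exp(2ζ) − 1]/2 = ζ + ζ 2 + · · · . The symbol ζ ′ and the function name below are introduced for this sketch only.

```python
import math


def zeta_linear_from_exponential(zeta_exp):
    """Given zeta in g~ = exp(2 zeta) * delta (with gamma = 0), return the zeta'
    for which the same metric reads g~ = (1 + 2 zeta') * delta."""
    return 0.5 * (math.exp(2.0 * zeta_exp) - 1.0)


if __name__ == "__main__":
    for z in (1e-3, 1e-2, 1e-1):
        zp = zeta_linear_from_exponential(z)
        # The last column tends to 1 for small z: the two definitions
        # agree at linear order and first differ by a term zeta**2.
        print(z, zp, (zp - z) / z ** 2)
```

Because the two variables agree at linear order, their Gaussian correlations coincide, while the quadratic difference feeds into the bispectrum. Both are time-independent outside the horizon whenever g̃ij is.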

II. The General Adiabatic Solution

We assume, as usual in cosmology, that whatever the dynamical equations
governing the metric and matter (including radiation) variables may
be, these equations have a solution in which the metric takes the Robertson–
Walker form, with g00 = −1, g0i = 0, and g̃ij ≡ gij /a2 = δij , and in which all
matter variables take their unperturbed form; that is, all densities and pres-
sures and scalar fields are functions only of time, and all velocities and other
3-vectors vanish. This section will give a very general argument that, what-
ever the constituents of the universe and the generally covariant equations
governing them and the metric may be, for a suitable choice of spacetime
coordinates, these equations always also have a family of solutions that we
will call adiabatic, which for large a(t) have the following properties:

1. The metric for any of these solutions has components with
$$
g_{00}(x,t) = -1 + O\big(a^{-2}(t)\big) , \qquad g_{i0}(x,t) = O\big(a^{-2}(t)\big) , \qquad
g_{ij}(x,t) = a^2(t)\Big[ G_{ij}(x) + O\big(a^{-2}(t)\big) \Big] , \tag{1}
$$

where Gij (x) is an arbitrary function only of the spatial coordinates.
(Different choices of this function characterize the different members
of this family of solutions.)

2. Whether or not the energy and momentum of any particular constituent
of the universe is separately conserved, its energy-momentum
tensor has the form
$$
T_{00}(x,t) = \bar\rho(t) + O\big(a^{-2}(t)\big) , \qquad T_{i0}(x,t) = O\big(a^{-2}(t)\big) , \qquad
T_{ij}(x,t) = a^2(t)\Big[ G_{ij}(x)\,\bar p(t) + O\big(a^{-2}(t)\big) \Big] . \tag{2}
$$

(Here and below, a bar over any quantity indicates its unperturbed
value.)

3. Any four-scalar s(x, t), such as a temperature, number density, or
scalar field, has the form
$$ s(x,t) = \bar s(t) + O\big(a^{-2}(t)\big) . \tag{3} $$

We are not assuming a de Sitter expansion, but in counting powers of 1/a,
we shall take H and its time derivatives to be of zeroth order in a, so that
quantities like ȧ(t) and ∫ a(t) dt are counted as being of first order in a. It
should be understood that since the scale of a has a physical significance
only when a multiplies a co-moving coordinate, it follows that when we
calculate correlation functions with a typical co-moving wave number k,
a factor a−1 will always be accompanied with a factor k. Since k/a has
the same dimensions as H ≡ ȧ/a, we can anticipate that the dimensionless
parameter that characterizes the smallness of a term of order a−n is (k/aH)n .
Thus this theorem gives good approximations to the adiabatic solutions both
after horizon exit during inflation, when k/aH is decreasing, and before
horizon re-entry after inflation, when k/aH is increasing, as long as k/aH
is sufficiently small.
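
The behavior of the expansion parameter k/aH in the two regimes just mentioned can be seen in a short numerical sketch. It assumes a(t) = exp(Ht) as a stand-in for inflation and a(t) = √t, H = 1/(2t) for a radiation-dominated era; these backgrounds, the units, and the function names are illustrative assumptions rather than anything fixed by the argument above.

```python
import math


def k_over_aH_de_sitter(k, t, H=1.0):
    """k/(aH) for a(t) = exp(H t), an assumed stand-in for inflation."""
    return k / (math.exp(H * t) * H)


def k_over_aH_radiation(k, t):
    """k/(aH) for a(t) = sqrt(t), H = 1/(2t): an assumed radiation era."""
    a = math.sqrt(t)
    hubble = 1.0 / (2.0 * t)
    return k / (a * hubble)


if __name__ == "__main__":
    k = 1e-3  # co-moving wave number in arbitrary units
    for t in (1.0, 5.0, 10.0):
        inflation = k_over_aH_de_sitter(k, t)
        radiation = k_over_aH_radiation(k, t)
        # The leading corrections in Eqs. (1)-(3) scale as the square of these.
        print(t, inflation, inflation ** 2, radiation, radiation ** 2)
```

In the first case k/aH falls exponentially after horizon exit; in the second it grows as √t until horizon re-entry. The O(a−2 ) corrections stay negligible throughout the interval in which k/aH ≪ 1.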
Of course, these solutions are in general far from unique, and the state-
ment that these adiabatic solutions exist does not tell us that one of these
solutions actually describes the metric and matter of the universe. As we
will see in section IV, if we start with single-field inflation then the universe
will thereafter be described by an adiabatic solution. Also, even when the
universe is described by an adiabatic solution, we need a detailed model of
inflation to calculate the function Gij (x) in Eq. (1).
To prove the existence of the adiabatic solutions, we will make use of an
argument based on the broken symmetry of general covariance. As already
mentioned, we are assuming that the dynamical equations have a solution
in which the metric takes the Robertson–Walker form, and in which all mat-
ter variables take their unperturbed form, with pressures and densities only
functions of time, and vanishing co-moving velocities. Now, whatever they
are, the dynamical equations will be invariant under all coordinate
transformations, but this solution is not. In particular, if we subject the
space coordinates to a matrix transformation $x^i \to x'^i = A^i{}_j x^j$, with
$A^i{}_j$ an arbitrary constant real matrix, then we get another exact solution,
with $g_{00} = -1$, $g_{i0} = 0$, but now with $\tilde g_{ij} \equiv g_{ij}/a^2$ equal to the
arbitrary constant positive real matrix $(A^T A)_{ij}$. The energy-momentum
tensor of any constituent of the universe (whether or not separately
conserved) will in the new coordinate system still have the perfect fluid
form, $T_{\mu\nu} = g_{\mu\nu}\bar p + \bar u_\mu \bar u_\nu(\bar p + \bar\rho)$, with the same density $\bar\rho(t)$,
pressure $\bar p(t)$, and velocity $\bar u_i = 0$, $\bar u_0 = -1$, but now with the new metric.
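As a small check of the statement that a constant real matrix transformation yields a constant positive reduced metric, the sketch below builds $A^TA$ for one hypothetical invertible matrix A (chosen arbitrarily for illustration) and tests symmetry and positivity with Sylvester's criterion.

```python
def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def leading_minors(m):
    """Determinants of the 1x1, 2x2 and 3x3 upper-left blocks of a 3x3 matrix."""
    d1 = m[0][0]
    d2 = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    d3 = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return d1, d2, d3

# Hypothetical invertible matrix, used only as an example (det A = -5).
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0],
     [3.0, 0.0, 1.0]]

G = matmul(transpose(A), A)  # candidate constant reduced metric
is_symmetric = all(abs(G[i][j] - G[j][i]) < 1e-12
                   for i in range(3) for j in range(3))
# Sylvester's criterion: a symmetric matrix is positive definite
# if and only if all leading principal minors are positive.
is_positive = all(d > 0 for d in leading_minors(G))
print(G, is_symmetric, is_positive)
```

Positivity requires A to be invertible; a singular A would give a degenerate metric.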
Instead of this exact solution, now consider what we will call a “trial
configuration” in which g00 = −1, gi0 = 0 and all densities, pressures, and
velocities are unperturbed, but with g̃ij an arbitrary time-independent pos-
itive matrix function Gij (x) of the co-moving space coordinates xi , not nec-
essarily close to δij . This trial configuration is of course not a solution of the
field equations, but since it would be a solution if g̃ij were constant, it fails
to be a solution only because there are terms in the field equations in which
space derivatives act on g̃ij . (Up to this point, this is just like the argument
used to derive the effective chiral Lagrangian for soft pions[5].) The spatial
derivatives of Gij (x) thus act as forcing terms, that drive the actual solution
away from the trial configuration. That is, making the tentative assump-
tion that the differences between metric or matter variables and their values
in the trial configuration are small perturbations when a(t) is sufficiently
large, these perturbations satisfy a set of coupled inhomogeneous linear dif-
ferential equations, with left-hand sides that are linear combinations of time
derivatives of these perturbations, and right-hand sides that involve spatial
derivatives of Gij (x). We will see concrete examples of such equations in the
next section.
Now, as a special case of general covariance, the field equations must be
invariant under the substitution $x^i \to \lambda x^i$ (with λ an arbitrary constant) if
we also subject other quantities to appropriate transformations: 3-tensors
such as $g_{ij}$ and $T_{ij}$ transform as $g_{ij} \to \lambda^{-2} g_{ij}$ and $T_{ij} \to \lambda^{-2} T_{ij}$, while
3-vectors such as $g_{i0}$ and $T_{i0}$ transform as $g_{i0} \to \lambda^{-1} g_{i0}$ and $T_{i0} \to \lambda^{-1} T_{i0}$.
(Here $T_{\mu\nu}$ may be the energy-momentum tensor of any one constituent of
the universe, even if not separately conserved.) It is convenient to express
this as invariance under a scale transformation:
\[ x^i \to \lambda x^i , \qquad a(t) \to \lambda^{-1} a(t) , \tag{4} \]
that leaves invariant various reduced quantities
\[ \tilde g_{ij}(\mathbf{x},t) \equiv g_{ij}(\mathbf{x},t)/a^2(t) , \qquad \tilde g_{i0}(\mathbf{x},t) \equiv g_{i0}(\mathbf{x},t)/a(t) , \]
\[ \tilde T_{ij}(\mathbf{x},t) \equiv T_{ij}(\mathbf{x},t)/a^2(t) , \qquad \tilde T_{i0}(\mathbf{x},t) \equiv T_{i0}(\mathbf{x},t)/a(t) , \]

as well as all 3-scalars such as temperature, densities, scalar fields, and also
g00 and T00 . The forcing term for any scale-invariant perturbation must be
scale-invariant, and three-dimensional coordinate invariance requires it to
have the same transformation under purely spatial coordinate transforma-
tions as the perturbation, so the perturbation of any scale-invariant quan-
tity away from its value in the trial configuration will be proportional to as
many powers of 1/a as appear in the scale-invariant quantity formed from
1/a and derivatives of Gij that has the same transformation property under
three-dimensional coordinate transformations as the perturbation in ques-
tion. For perturbations to the scale-invariant quantities gij /a2 or Tij /a2 ,
the scale-invariant forcing terms with the minimum number of factors of
1/a are proportional to the 3-tensors Rij /a2 or Gij R/a2 , where Rij is the 3-
dimensional Ricci tensor for the 3-metric Gij , and R = G kl Rkl , with G ij the
reciprocal of Gij . Likewise, for perturbations to the scale-invariant quantities
gi0 /a or Ti0 /a, the scale-invariant forcing terms with the minimum number
of factors of 1/a are proportional to the scale-invariant 3-vectors ∂i R/a3 ,
and for perturbations to the scale-invariant quantities g00 or T00 , the forcing
terms with the minimum number of factors of 1/a are proportional to the
scale-invariant 3-scalar R/a2 , and likewise for perturbations to any other
scale-invariant 3-scalar. Thus the difference between the values of the quan-
tities gij /a2 , Tij /a2 , gi0 , Ti0 , g00 , T00 , and 3-scalars like temperatures or
scalar fields and the values of the corresponding quantities in the trial con-
figuration are all of order 1/a2 , as was to be proved. In particular, all these
perturbations are small for sufficiently large a(t), as tentatively assumed in
proving the existence of these solutions.
The general solution for the perturbations to the trial configuration con-
sists of a sum of the solution of the inhomogeneous differential equations,
with derivatives of Gij (x) as forcing terms, plus solutions of the correspond-
ing homogeneous equations. In a completely general theory of inflation the
solutions of the homogeneous equation could have any magnitude. However,
we will see in Section IV that in single field inflation they are also of order
a−2 .
The form (1), (2), for the adiabatic solutions is not valid for all choices
of spacetime coordinates, but it is easy to impose gauge-fixing conditions
on the coordinates that are consistent with this form. We can choose the
time-coordinate so that any one three-scalar, such as a scalar field or the
temperature, is unperturbed. (A generalized version of this choice of gauge
is adopted in Section III.) To choose the space coordinates, we note that
under a time-dependent transformation $x^i \to x'^i(\mathbf{x},t)$ that leaves the time
invariant, the contravariant metric component $g^{i0}$ undergoes the transformation
\[ g^{i0} \to g'^{i0} = \frac{\partial x'^i}{\partial x^j}\, g^{j0} + \frac{\partial x'^i}{\partial t}\, g^{00} . \]
We can evidently choose the time-dependence of $x'^i(\mathbf{x},t)$ so that $g'^{i0} = 0$,
by solving the differential equation
\[ \frac{\partial x'^i}{\partial t} = -\frac{\partial x'^i}{\partial x^j}\, g^{j0}/g^{00} \]
for any arbitrary choice of $x'^i(\mathbf{x},t_0)$ at an initial time $t_0$. In this case, also
$g'_{i0} = 0$. Though not unique, this choice of space and time coordinates is
clearly consistent with (1)–(3). It still leaves us free to make purely spatial
time-independent coordinate transformations, a freedom we have used in
the arguments above.
These adiabatic solutions to the non-linear field equations far outside
the horizon may look unfamiliar to readers who are familiar with the form
of the adiabatic solution for scalar modes in the linear approximation in
Newtonian gauge, for which gi0 = 0 and gij ∝ δij . In [2] it is shown in the
linear approximation that in the adiabatic mode in Newtonian gauge there
are perturbations to the matter fields χn (x) that do not vanish for large
a(t), and a perturbation to g̃ij that does not become time-independent in
this limit:
\[ \delta\chi_n(\mathbf{x},t) = -\frac{\zeta(\mathbf{x})\,\dot{\bar\chi}_n(t)}{a(t)} \int_T^t a(t')\,dt' , \qquad \delta\tilde g_{ij} = 2\delta_{ij}\,\zeta(\mathbf{x}) \left[ 1 - \frac{H(t)}{a(t)} \int_T^t a(t')\,dt' \right] , \]
with ζ(x) an infinitesimal function only of position, and T arbitrary. But it
is easy to see that by a re-definition of the space and time coordinates, we
can make all $\delta\chi_n$ vanish, keep $g_{i0}$ equal to zero, and make $\delta\tilde g_{ij}$ equal to
\[ \delta\tilde g_{ij} = 2\delta_{ij}\,\zeta(\mathbf{x}) + 2\,\frac{\partial^2 \zeta(\mathbf{x})}{\partial x^i\,\partial x^j} \int_T^t \frac{dt'}{a^3(t')} \int_T^{t'} a(t'')\,dt'' , \]
which in the limit of large a(t) approaches the time-independent function
$2\delta_{ij}\zeta(\mathbf{x})$, with a correction of order $a^{-2}$. In the non-linear case there is
no advantage to using something like Newtonian gauge (for instance, by
choosing space coordinates so that $\tilde g_{ij} = e^{2\zeta}[e^{\gamma}]_{ij}$ where $\partial_i\gamma_{ij} = 0$ as well
as $\gamma_{ii} = 0$, as in [3]), and such a choice has the disadvantage of spoiling
three-dimensional coordinate invariance.
This analysis allows us to make a rough estimate of the expected cor-
rections to the constancy of the metric correlations and to the vanishing
of the correlation functions for matter perturbations following single-field
inflation. At horizon exit the rate of change of the correlation functions of
$\tilde g_{ij}$ is of order H, and from then until the end of inflation, the factor $1/a^2$
decreases by a factor roughly of order $e^{-120}$ to $e^{-140}$ [14], so at the end of
inflation we expect the correlation functions of $\tilde g_{ij}$ to be changing at a rate
of order $e^{-120}H$ to $e^{-140}H$. Similarly, the correlation functions of $\tilde g_{ij}$ at the
end of inflation are of the same order as at horizon exit, so with the decrease
in $1/a^2$ we expect correlation functions for matter perturbations after
inflation to be less than the correlation functions of $\tilde g_{ij}$ by a factor of order
$e^{-120}$ to $e^{-140}$. This is the suppression factor that has to be overcome in order
for physical processes during reheating like parametric amplification [12] to
produce significant changes in observable correlation functions.
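The arithmetic behind these factors can be sketched as follows. The e-fold counts of 60 and 70 are an assumption inferred from the quoted range (a grows by $e^N$, so $1/a^2$ falls by $e^{-2N}$); they are not stated in the text, and the function name is hypothetical.

```python
import math

def suppression_of_inverse_a_squared(efolds):
    """Factor by which 1/a^2 decreases when a grows by exp(efolds)."""
    return math.exp(-2.0 * efolds)

# ASSUMED e-fold counts between horizon exit and the end of inflation.
for n in (60, 70):
    factor = suppression_of_inverse_a_squared(n)
    print(n, factor, math.log(factor))
```

The logarithms printed are the exponents -120 and -140 appearing in the estimate above.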

III. Explicit Solutions for Multiscalar Theories

The arguments of the previous section were rather abstract, so to see in
more detail how they work out in practice, let us consider a more concrete
but still fairly general model. To represent the matter fields, we suppose that
there is a set of scalar fields $\chi_n$, with a conventional kinematic Lagrangian
and a completely arbitrary real potential V(χ). The unperturbed values of
these fields are functions $\bar\chi_n(t)$ of time that satisfy the field equations
\[ \ddot{\bar\chi}_n + 3H\dot{\bar\chi}_n + \frac{\partial V(\bar\chi)}{\partial \bar\chi_n} = 0 , \tag{5} \]
where (in units with 8πG = 1)
\[ 3H^2 = \frac{1}{2}\sum_n \dot{\bar\chi}_n^2 + V(\bar\chi) . \tag{6} \]

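For the metric, we use the ADM parameterization [15]
\[ g_{00} = -N^2 + g_{ij}N^iN^j , \qquad g_{0i} = g_{ij}N^j \equiv N_i , \]
\[ g^{00} = -N^{-2} , \qquad g^{i0} = N^i/N^2 , \qquad g^{ij} = {}^{(3)}g^{ij} - N^iN^j/N^2 , \tag{7} \]
where ${}^{(3)}g^{ij}$ is the reciprocal of the 3 × 3 matrix $g_{ij}$. It will be convenient
also to write
\[ g_{ij}(\mathbf{x},t) = a^2(t)\,\tilde g_{ij}(\mathbf{x},t) , \tag{8} \]
where a(t) is the Robertson–Walker scale factor, satisfying ȧ/a = H, with
H given by Eq. (6). The quantities N and $N^i$ are auxiliary fields, whose
time derivatives do not appear in the Lagrangian. The Lagrangian for this
theory is
\[ \begin{aligned} L &= \frac{1}{2}\int d^3x\, \sqrt{-{\rm Det}\,g}\, \left\{ -R^{(4)} - g^{\mu\nu}\sum_n \partial_\mu\chi_n\,\partial_\nu\chi_n - 2V(\chi) \right\} \\ &= \frac{a^3}{2}\int d^3x\, N\sqrt{\tilde g}\, \left\{ -a^{-2}\tilde g^{ij}\tilde R_{ij} + C^i{}_j C^j{}_i - (C^i{}_i)^2 + N^{-2}\sum_n \left( \dot\chi_n - N^i\partial_i\chi_n \right)^2 - a^{-2}\tilde g^{ij}\sum_n \partial_i\chi_n\,\partial_j\chi_n - 2V(\chi) \right\} , \end{aligned} \tag{9} \]
where $\tilde g^{ij}(\mathbf{x},t)$ is the reciprocal of the matrix $\tilde g_{ij}(\mathbf{x},t)$; $\tilde g(\mathbf{x},t)$ is the
determinant of $\tilde g_{ij}(\mathbf{x},t)$; $\tilde R_{ij}(\mathbf{x},t)$ is the three-dimensional Ricci tensor (with the
sign convention of [2]) for the metric $\tilde g_{ij}(\mathbf{x},t)$; and $C^i{}_j(\mathbf{x},t)$ is the extrinsic
curvature of the surfaces of fixed time
\[ C^i{}_j \equiv a^{-2}\tilde g^{ik}C_{kj} , \qquad C_{ij} \equiv \frac{1}{2N}\left[ 2a\dot a\,\tilde g_{ij} + a^2\dot{\tilde g}_{ij} - \tilde\nabla_i N_j - \tilde\nabla_j N_i \right] , \tag{10} \]
where $\tilde\nabla_i$ is the three-dimensional covariant derivative calculated with the
three-metric $\tilde g_{ij}$. For future use, we also note the well-known relations (for
8πG = 1):
\[ \sum_n \dot{\bar\chi}_n^2 = -2\dot H , \qquad V(\bar\chi) = 3H^2 + \dot H . \tag{11} \]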
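The relations (11) follow from differentiating Eq. (6) in time and eliminating the second derivatives of the fields with Eq. (5). The sketch below checks this at one arbitrary point in field space; the two-field quadratic potential, the masses, and the field values are hypothetical choices made only for illustration.

```python
import math

def hubble(chi_dot, potential):
    """Eq. (6) with 8*pi*G = 1: 3 H^2 = (1/2) sum chi_dot^2 + V."""
    return math.sqrt((0.5 * sum(v * v for v in chi_dot) + potential) / 3.0)

def hubble_dot(chi_dot, dV, potential):
    """dH/dt from the time derivative of Eq. (6).

    6 H H_dot = sum_n chi_dot_n * (chi_ddot_n + dV/dchi_n),
    with chi_ddot_n taken from the background field equation (5).
    """
    H = hubble(chi_dot, potential)
    chi_ddot = [-3.0 * H * v - g for v, g in zip(chi_dot, dV)]
    return sum(v * (a + g) for v, a, g in zip(chi_dot, chi_ddot, dV)) / (6.0 * H)

# Hypothetical example: V = (1/2) m1^2 chi1^2 + (1/2) m2^2 chi2^2
masses = (0.3, 0.7)
chi = (2.0, -1.0)
chi_dot = (0.1, 0.25)
V = sum(0.5 * m * m * c * c for m, c in zip(masses, chi))
dV = [m * m * c for m, c in zip(masses, chi)]

H = hubble(chi_dot, V)
H_dot = hubble_dot(chi_dot, dV, V)
kinetic = sum(v * v for v in chi_dot)
print(kinetic + 2.0 * H_dot, V - (3.0 * H * H + H_dot))
```

Both printed differences vanish up to rounding, which is the content of Eq. (11).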
Models of this sort can be used both as fairly realistic theories of inflation,
and also as surrogates for a theory of the matter and radiation after inflation.
Because we are allowing any number of scalar fields, this model will in
general have solutions in which neither the perturbations to matter fields
nor the rate of change of g̃ij vanish at late times, so it is not trivial to see
that there is also an adiabatic solution in which they do go to zero at late
time, and that this is the solution that is excited if during inflation there is
only one non-negligible scalar field.
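The gravitational field equations derived from this Lagrangian are
\[ \tilde\nabla_i\left( C^i{}_j - \delta^i{}_j C^k{}_k \right) = \frac{1}{N}\sum_n \partial_j\chi_n \left( \dot\chi_n - N^i\partial_i\chi_n \right) , \tag{12} \]
\[ N^2\left[ -a^{-2}\tilde g^{ij}\tilde R_{ij} - C^i{}_jC^j{}_i + (C^i{}_i)^2 - 2V(\chi) \right] = \sum_n \left( \dot\chi_n - N^i\partial_i\chi_n \right)^2 + N^2a^{-2}\tilde g^{ij}\sum_n \partial_i\chi_n\,\partial_j\chi_n , \tag{13} \]
\[ \tilde R_{ij} - C^k{}_k C_{ij} + 2C_{ik}C^k{}_j + N^{-1}\left( -\dot C_{ij} + C^k{}_i\tilde\nabla_jN_k + C^k{}_j\tilde\nabla_iN_k + N^k\tilde\nabla_kC_{ij} + \tilde\nabla_i\tilde\nabla_jN \right) = -a^2\tilde g_{ij}V(\chi) - \sum_n \partial_i\chi_n\,\partial_j\chi_n , \tag{14} \]
and the scalar field equations are
\[ \begin{aligned} &\frac{\partial}{\partial t}\left[ \frac{\sqrt{\tilde g}}{N}\left( \dot\chi_n - N^i\partial_i\chi_n \right) \right] + \frac{3H\sqrt{\tilde g}}{N}\left( \dot\chi_n - N^i\partial_i\chi_n \right) \\ &\quad = \frac{1}{a^2}\frac{\partial}{\partial x^i}\left[ \sqrt{\tilde g}\,N\,\tilde g^{ij}\partial_j\chi_n \right] + \frac{\partial}{\partial x^j}\left[ \frac{\sqrt{\tilde g}\,N^j}{N}\left( \dot\chi_n - N^i\partial_i\chi_n \right) \right] - \sqrt{\tilde g}\,N\,\frac{\partial V(\chi)}{\partial\chi_n} . \end{aligned} \tag{15} \]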
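In line with the remarks of the previous section, we look for a solution in
which $\delta\chi_n(\mathbf{x},t) \equiv \chi_n(\mathbf{x},t) - \bar\chi_n(t)$ as well as $\dot{\tilde g}_{ij}$ and $\delta N \equiv N - 1$ are all of
order $a^{-2}(t)$ at late time. For convenience, in accordance with remarks at the
end of the previous section, we also adopt a definition of space coordinates
for which $N^i = 0$. We can then write
\[ C^i{}_j = H\delta^i{}_j + \xi^i{}_j , \tag{16} \]
where $\xi^i{}_j$ is, like $\dot{\tilde g}_{ij}$ and δN, a quantity whose leading term is of order $a^{-2}$:
\[ \xi^i{}_j = \frac{1}{2}\,\tilde g^{ik}\left[ \dot{\tilde g}_{kj} - 2H\,\delta N\,\tilde g_{kj} \right] + O(1/a^4) . \tag{17} \]
Then the gravitational field equations (12)–(14) become
\[ \tilde\nabla_i\left( \xi^i{}_j - \delta^i{}_j\xi^k{}_k \right) = \partial_j \sum_n \delta\chi_n\,\dot{\bar\chi}_n + O(a^{-4}) , \tag{18} \]
\[ 4\dot H\,\delta N = -a^{-2}\tilde g^{ij}\tilde R_{ij} + 4H\xi^k{}_k - 2\sum_n \dot{\bar\chi}_n\,\delta\dot\chi_n - 2\sum_n \frac{\partial V(\bar\chi)}{\partial\bar\chi_n}\,\delta\chi_n + O(a^{-4}) , \tag{19} \]
\[ \dot\xi^i{}_j + 3H\xi^i{}_j + H\delta^i{}_j\xi^k{}_k - \dot H\,\delta N\,\delta^i{}_j = a^{-2}\tilde g^{ik}\tilde R_{kj} + \delta^i{}_j \sum_n \frac{\partial V(\bar\chi)}{\partial\bar\chi_n}\,\delta\chi_n + O(a^{-4}) . \tag{20} \]
Using Eqs. (19) and (5), we can rewrite Eq. (20) in the form:
\[ \dot\Xi^i{}_j + 3H\Xi^i{}_j = \frac{1}{a^2}\left[ \tilde g^{ik}\tilde R_{kj} - \frac{1}{4}\delta^i{}_j\,\tilde g^{kl}\tilde R_{kl} \right] + O(a^{-4}) , \tag{21} \]
where
\[ \Xi^i{}_j \equiv \xi^i{}_j + \frac{1}{2}\delta^i{}_j \sum_n \dot{\bar\chi}_n\,\delta\chi_n . \tag{22} \]
Also, Eq. (18) now reads simply
\[ \tilde\nabla_i\left( \Xi^i{}_j - \delta^i{}_j\Xi^k{}_k \right) = O(a^{-4}) . \tag{23} \]
Eq. (21) has a solution:
\[ \Xi^i{}_j(\mathbf{x},t) = \left[ G^{ik}(\mathbf{x})R_{kj}(\mathbf{x}) - \frac{1}{4}\delta^i{}_j\,G^{kl}(\mathbf{x})R_{kl}(\mathbf{x}) \right] \frac{1}{a^3(t)}\int_T^t a(t')\,dt' + \frac{B^i{}_j(\mathbf{x})}{a^3(t)} + O(a^{-4}) , \tag{24} \]
where T is any fixed time, $G_{ij}(\mathbf{x})$ is the value of $\tilde g_{ij}(\mathbf{x},t)$ at that time, $R_{ij}(\mathbf{x})$
is the Ricci tensor calculated from the 3-metric $G_{ij}(\mathbf{x})$, and $B^i{}_j(\mathbf{x})$ is a
time-independent function of co-moving position. It is convenient to choose T at
around the end of inflation, where k/aH is smallest, in which case all terms
in Eq. (24) are very small from soon after horizon exit to just before horizon
re-entry. The leading term in (24) automatically satisfies Eq. (23) because
of the Bianchi identity satisfied by $R_{ij}$.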
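That the form (24) solves Eq. (21) can be checked numerically for a single component. The sketch below assumes, for illustration only, a power-law scale factor a(t) = t^(2/3) and arbitrary constants S and B standing in for the curvature source and the integration function at one point in space; all names and values are hypothetical.

```python
def scale_factor(t):
    """ASSUMED power-law expansion a(t) = t^(2/3), for illustration only."""
    return t ** (2.0 / 3.0)

def hubble(t):
    """H = a_dot / a for a(t) = t^(2/3)."""
    return 2.0 / (3.0 * t)

def integral_of_a(t, T):
    """Closed form of the integral of a(t') dt' from T to t."""
    return 0.6 * (t ** (5.0 / 3.0) - T ** (5.0 / 3.0))

def xi_solution(t, S, B, T):
    """One component of Eq. (24): S * a^-3 * int_T^t a dt' + B * a^-3."""
    return (S * integral_of_a(t, T) + B) / scale_factor(t) ** 3

def residual(t, S, B, T, h=1e-5):
    """Xi_dot + 3 H Xi - S / a^2, with Xi_dot from a central difference."""
    xi_dot = (xi_solution(t + h, S, B, T) - xi_solution(t - h, S, B, T)) / (2.0 * h)
    return xi_dot + 3.0 * hubble(t) * xi_solution(t, S, B, T) - S / scale_factor(t) ** 2

print(residual(5.0, S=0.8, B=-0.3, T=1.0))
```

The residual vanishes up to finite-difference error because $a^{-3}\,d(a^3\Xi)/dt = \dot\Xi + 3H\Xi$, and $d(a^3\Xi)/dt = S\,a$ for this form.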
To solve for the metric, we need to complete our choice of gauge. By
using Eqs. (22), (17), (19), and (5), we have
\[ \dot{\tilde g}_{ij} = 2\tilde g_{ik}\Xi^k{}_j + \frac{2H^2}{\dot H}\,\tilde g_{ij}\Xi^k{}_k - \frac{H}{2a^2\dot H}\,\tilde g_{ij}\,\tilde g^{kl}\tilde R_{kl} - \tilde g_{ij}X + O(a^{-4}) , \tag{25} \]
where $X = O(1/a^2)$ arises from the perturbation to the scalar fields
\[ X \equiv \sum_n \dot{\bar\chi}_n\,\delta\chi_n + \frac{H}{\dot H}\sum_n \left( \dot{\bar\chi}_n\,\delta\dot\chi_n - \ddot{\bar\chi}_n\,\delta\chi_n \right) . \tag{26} \]
Under a shift $t \to t + \epsilon(\mathbf{x},t)$ in the time coordinate, with ε of order $1/a^2$ (and
a corresponding transformation $x^i \to x^i + G^{ij}\int dt\, a^{-2}\,\partial\epsilon/\partial x^j$ to keep $N_i = 0$),
the perturbations to the scalar fields undergo the gauge transformation
$\delta\chi_n(\mathbf{x},t) \to \delta\chi_n(\mathbf{x},t) - \epsilon(\mathbf{x},t)\,\dot{\bar\chi}_n(t)$ to order $1/a^2$. Hence, using Eq. (11), to
this order
\[ X \to X + 2\frac{\partial}{\partial t}\big( \epsilon H \big) , \tag{27} \]
so we can evidently choose ε to make X = 0. This choice provides the great
advantage that we can solve Eq. (25) for $\tilde g_{ij}$ without first solving the field
equations for the matter fields:
\[ \begin{aligned} \tilde g_{ij}(\mathbf{x},t) ={}& G_{ij}(\mathbf{x}) + 2\left[ R_{ij}(\mathbf{x}) - \frac{1}{4}G_{ij}(\mathbf{x})G^{kl}(\mathbf{x})R_{kl}(\mathbf{x}) \right] \int_T^t \frac{dt'}{a^3(t')}\int_T^{t'} a(t'')\,dt'' \\ &+ \frac{1}{2}G_{ij}(\mathbf{x})G^{kl}(\mathbf{x})R_{kl}(\mathbf{x}) \int_T^t \frac{H^2(t')\,dt'}{\dot H(t')\,a^3(t')} \int_T^{t'} a(t'')\,dt'' \\ &- \frac{1}{2}G_{ij}(\mathbf{x})G^{kl}(\mathbf{x})R_{kl}(\mathbf{x}) \int_T^t \frac{H(t')\,dt'}{a^2(t')\,\dot H(t')} \\ &+ 2G_{ik}(\mathbf{x})B^k{}_j(\mathbf{x}) \int_T^t \frac{dt'}{a^3(t')} + 2G_{ij}(\mathbf{x})B^k{}_k(\mathbf{x}) \int_T^t \frac{H^2(t')\,dt'}{a^3(t')\,\dot H(t')} \\ &+ O\big(a^{-4}(t)\big) , \end{aligned} \tag{28} \]
where T is again some fixed time, conveniently chosen as the time at the
end of inflation, and $G_{ij}(\mathbf{x})$ and $R_{ij}(\mathbf{x})$ are the values of $\tilde g_{ij}$ and the
associated Ricci tensor at that time. This confirms that while far outside the
horizon, the time-dependent part of $\tilde g_{ij}$ is of order $a^{-2}$. But in the radiation
or matter-dominated era the second, third, and fourth terms in Eq. (28)
increase like $1/a^2H^2$, which produces the breakdown in these approximations
when physical wavelengths re-enter the horizon.
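It remains to consider the scalar fields. By using Eq. (5) again, we can
put the field equation (15) in the form
\[ \delta\ddot\chi_n + 3H\delta\dot\chi_n + \sum_m \frac{\partial^2V(\bar\chi)}{\partial\bar\chi_n\,\partial\bar\chi_m}\,\delta\chi_m = \left( -\frac{1}{2}\tilde g^{ij}\dot{\tilde g}_{ij} + \delta\dot N + 6H\delta N \right)\dot{\bar\chi}_n + 2\delta N\,\ddot{\bar\chi}_n + O(1/a^4) . \tag{29} \]
With X = 0, we now have
\[ \delta N = -\frac{1}{4\dot H a^2}\,\tilde g^{ij}\tilde R_{ij} + \frac{H}{\dot H}\,\Xi^k{}_k + \frac{1}{2H}\sum_n \dot{\bar\chi}_n\,\delta\chi_n + O(1/a^4) . \tag{30} \]
Using Eqs. (21), (25), and (30), we can put Eq. (29) in the form
\[ \begin{aligned} &\delta\ddot\chi_n + 3H\delta\dot\chi_n - \frac{1}{2H}\sum_m \dot{\bar\chi}_n\dot{\bar\chi}_m\,\delta\dot\chi_m \\ &\quad + \sum_m \left[ \frac{\partial^2V(\bar\chi)}{\partial\bar\chi_n\,\partial\bar\chi_m} - \left( 3 - \frac{\dot H}{2H^2} \right)\dot{\bar\chi}_n\dot{\bar\chi}_m - \frac{1}{2H}\left( \dot{\bar\chi}_n\ddot{\bar\chi}_m + 2\ddot{\bar\chi}_n\dot{\bar\chi}_m \right) \right]\delta\chi_m \\ &= \frac{1}{\dot H^2}\left( \ddot H\,\dot{\bar\chi}_n - 2\dot H\,\ddot{\bar\chi}_n \right)\left[ \frac{1}{4a^2}\,G^{ij}R_{ij} - H\,\Xi^k{}_k \right] + O(a^{-4}) , \end{aligned} \tag{31} \]
or, using Eq. (24) for Ξ:
\[ \begin{aligned} &\delta\ddot\chi_n + 3H\delta\dot\chi_n - \frac{1}{2H}\sum_m \dot{\bar\chi}_n\dot{\bar\chi}_m\,\delta\dot\chi_m \\ &\quad + \sum_m \left[ \frac{\partial^2V(\bar\chi)}{\partial\bar\chi_n\,\partial\bar\chi_m} - \left( 3 - \frac{\dot H}{2H^2} \right)\dot{\bar\chi}_n\dot{\bar\chi}_m - \frac{1}{2H}\left( \dot{\bar\chi}_n\ddot{\bar\chi}_m + 2\ddot{\bar\chi}_n\dot{\bar\chi}_m \right) \right]\delta\chi_m \\ &= \frac{1}{\dot H^2}\left( \ddot H\,\dot{\bar\chi}_n - 2\dot H\,\ddot{\bar\chi}_n \right)\left[ R(\mathbf{x})\left( \frac{1}{4a^2} - \frac{H}{4a^3}\int_T^t a(t')\,dt' \right) - \frac{H\,B^i{}_i(\mathbf{x})}{a^3} \right] + O(a^{-4}) . \end{aligned} \tag{32} \]
An inhomogeneous differential equation of this form will have a solution
in which a non-zero curvature scalar R will generate perturbations of order
$1/a^2$ in the various scalar fields, as anticipated in the previous section. The
field equations also have isocurvature solutions in which R = 0 and there
are small perturbations to the scalar field, not necessarily of order $1/a^2$, for
which
\[ \begin{aligned} 0 ={}& \delta\ddot\chi_n + 3H\delta\dot\chi_n - \frac{1}{2H}\sum_m \dot{\bar\chi}_n\dot{\bar\chi}_m\,\delta\dot\chi_m \\ &+ \sum_m \left[ \frac{\partial^2V(\bar\chi)}{\partial\bar\chi_n\,\partial\bar\chi_m} - \left( 3 - \frac{\dot H}{2H^2} \right)\dot{\bar\chi}_n\dot{\bar\chi}_m - \frac{1}{2H}\left( \dot{\bar\chi}_n\ddot{\bar\chi}_m + 2\ddot{\bar\chi}_n\dot{\bar\chi}_m \right) \right]\delta\chi_m , \end{aligned} \tag{33} \]
and X = 0. To tell what solutions actually describe the metric and matter
of the universe, we need a specific model of inflation, such as single field
inflation, to which we now turn.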

IV. Single-field Inflation, and its Aftermath

During single-field inflation there is by assumption only one non-zero
$\chi_n$, say $\chi_1$, so Eq. (11) gives $\dot H = -\dot{\bar\chi}_1^2/2$ and $\ddot H = -\dot{\bar\chi}_1\ddot{\bar\chi}_1$, and we see that
in this case the right-hand side of Eq. (32) vanishes. Thus during single-
field inflation Eq. (32) is a homogeneous differential equation for δχ1 , and
therefore allows a solution δχ1 = 0, which of course it must, since we can
arrange that δχ1 = 0 by a choice of gauge consistent with the gauge choice
X = 0 used to derive Eq. (32).
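The cancellation on the right-hand side of Eq. (32) can be made explicit. The sketch below evaluates the prefactor $\ddot H\,\dot{\bar\chi}_n - 2\dot H\,\ddot{\bar\chi}_n$ using Eq. (11) and its time derivative; the numerical field velocities and accelerations are arbitrary illustrative values, and the function name is hypothetical.

```python
def source_prefactor(chi_dot, chi_ddot):
    """H_ddot * chi_dot_n - 2 * H_dot * chi_ddot_n for each field n.

    Uses Eq. (11), sum chi_dot^2 = -2 H_dot, and its time derivative,
    H_ddot = -sum chi_dot * chi_ddot.
    """
    H_dot = -0.5 * sum(v * v for v in chi_dot)
    H_ddot = -sum(v * a for v, a in zip(chi_dot, chi_ddot))
    return [H_ddot * v - 2.0 * H_dot * a for v, a in zip(chi_dot, chi_ddot)]

# One field: the prefactor cancels identically.
single = source_prefactor([0.37], [-0.12])
# Two fields (arbitrary values): the prefactor is generally non-zero.
double = source_prefactor([0.37, 0.2], [-0.12, 0.05])
print(single, double)
```

With a single field the two terms are equal and opposite, while with two fields they generally are not, which is why the forcing term reappears once other scalar fields are turned on.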
This shows that the non-linear field equations for single-field inflation
have a solution in which $\delta\chi_1 = 0$, and in which for late times $\tilde g_{ij}(\mathbf{x},t)$ is
attracted to a time-independent metric $G_{ij}(\mathbf{x})$, with corrections of order
$a^{-2}$ and $a^{-3}$ given by Eq. (28). We know by explicit calculation that in
the linear approximation all solutions are in the basin of attraction for this
asymptotic solution[16], but it is difficult to show that the relevant solution
of the full non-linear equations is in this basin of attraction, and we shall
simply assume that this is the case.
Then at the end of inflation the transfer of energy from the inflaton turns
on other scalar fields, and the right-hand side of Eq. (32) becomes non-zero.
As we have seen, the general solution for the scalar field perturbations is a
forced term of order $a^{-2}$, plus a solution of the homogeneous equation (33). In
general the solution of the homogeneous equation could be of any order in a,
but by definition during single field inflation in our gauge all δχn and δχ̇n are
negligible, and with these initial conditions the solution of the homogeneous
equation must be of order a−2 to cancel the O(a−2 ) terms in the solution of
the inhomogeneous equation immediately after single field inflation. Thus
as expected, for this solution all perturbations to the matter fields become
of order 1/a2 outside the horizon, and we have a pure adiabatic solution,
with negligible corrections.

V. Tree-Approximation Correlation Functions

If the results we have obtained so far really applied to the metric and
matter perturbations in the Heisenberg picture, we could conclude that with
a suitable definition of coordinates, all correlation functions involving only
g̃ij (or functions of g̃ij ) become time-independent outside the horizon, and
that all correlation functions involving perturbations to g00 , g0i , and mat-
ter variables become negligible outside the horizon. But as mentioned in
the Introduction, the presence of quantum fluctuations of arbitrarily small
wave lengths invalidates the expansions in powers of 1/a as applied to the
Heisenberg picture interacting fields. To avoid this problem we must limit
our consideration to tree graphs for correlation functions, on the assumption
that the contributions of graphs with loops are much smaller. We can as
usual apply the results of Sections II – IV to the Heisenberg picture quantum
fields, but calculate correlation functions only to lowest order in interactions
to avoid loop graphs, hoping that this is a good approximation. Here we
want to consider an alternative approach, in which one explicitly considers
only tree graphs.
In the appendix we review the general tree theorem of [11], which shows
how to calculate the sum of tree graphs for correlation functions by a
solution of the classical field equations, subject to certain constraints. To
illustrate its use, in this section we will apply the theorem to the
correlation functions of the reduced metric $\tilde g_{ij} \equiv g_{ij}/a^2$ during
single-field inflation, adopting space and time coordinates for which there is
no perturbation to the inflaton field, and for which gi0 = 0.
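The generating function $W[J,t_1]$ for correlation functions of $\tilde g_{ij}$ at a time
$t_1$ is defined by Eq. (A.1), which for this case takes the form
\[ \exp\big\{ W[J,t_1] \big\} \equiv \Big\langle 0,{\rm in} \Big| \exp\Big[ \int d^3x\, \tilde g^H_{ij}(\mathbf{x},t_1)\,J^{ij}(\mathbf{x}) \Big] \Big| 0,{\rm in} \Big\rangle , \tag{34} \]
where $\tilde g^H_{ij}(\mathbf{x},t)$ is the Heisenberg-picture quantum mechanical operator
corresponding to $\tilde g_{ij}(\mathbf{x},t)$. Correlation functions for $\tilde g_{ij}(\mathbf{x},t_1)$ are calculated
according to Eq. (A.2), which here reads
\[ \langle 0,{\rm in}|\, \tilde g^H_{ij}(\mathbf{x},t_1)\,\tilde g^H_{kl}(\mathbf{y},t_1)\cdots |0,{\rm in}\rangle = \left[ \frac{\partial^n}{\partial J^{ij}(\mathbf{x})\,\partial J^{kl}(\mathbf{y})\cdots} \exp\big\{ W[J,t_1] \big\} \right]_{J=0} . \tag{35} \]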
We want to evaluate W [J, t1 ] for late times during inflation, at which the
Robertson–Walker scale factor a(t1 ) becomes exponentially large, from which
we can calculate the late-time expectation value of products of the operators
g̃ij at various space coordinates or wave numbers.
As described in the appendix, to calculate W in the tree approxima-
tion we construct complex c-number metric fields g̃ij (x, t) together with a
complex auxiliary field N (x, t), satisfying the constraints:
(A) The fields satisfy the Euler–Lagrange equations. In our case, they
are Eqs. (12)–(14) with no scalar field perturbations.
(B) The fields $\tilde g_{ij}$ satisfy constraints at time $t_1$:
\[ {\rm Im}\,\tilde g_{ij}(\mathbf{x},t_1) = 0 , \tag{36} \]
\[ {\rm Im}\left\{ \frac{\delta L[\tilde g,\dot{\tilde g},t_1]}{\delta\dot{\tilde g}_{ij}(\mathbf{x},t_1)} \right\} = -J^{ij}(\mathbf{x}) . \tag{37} \]
(C) $\tilde g_{ij}$ satisfies a positive frequency constraint at time t → −∞, that it
behaves as a superposition of terms proportional to exp(−iωt), with ω
various positive frequencies.
These constraints give the functions $\tilde g_{ij}(\mathbf{x},t)$ an implicit dependence on both
the current J and on the time $t_1$ at which correlations are to be measured.
With the functions $\tilde g_{ij}(\mathbf{x},t)$ and $N(\mathbf{x},t)$ constructed in this way, the
generating function is given by Eq. (A.6), which here reads
\[ W[J,t_1]_{\rm tree} = \int_{-\infty}^{t_1} {\rm Im}\,L[\tilde g(t),\dot{\tilde g}(t),t]\,dt + \int d^3x\, J^{ij}(\mathbf{x})\,\tilde g_{ij}(\mathbf{x},t_1) . \tag{38} \]

We showed in Section III that the non-linear field equations have a solu-
tion for g̃ij (x, t) that for late times is attracted to a time-independent metric
Gij (x). But as remarked in the appendix, this is not enough to conclude that
the correlation functions for g̃ij (x, t) become time-independent at late time.
We must also show that the constraints (36) and (37) do not give g̃ij (x, t)
any dependence on the time t1 at which the constraints are imposed, pro-
vided a(t1 ) is sufficiently large, and we must consider the convergence of the
time integral in Eq. (38) as a(t1 ) at the upper limit t1 becomes large.
For large a(t1 ), the constraint (36) simply provides the t1 -independent
condition that the leading term Gij (x) in Eq. (28) is real for all x. It follows
then that the Ricci tensor Rij (x) for the metric Gij (x) is real, so the terms
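in (28) of order $1/a^2$ are also real. The leading terms in ${\rm Im}\,\tilde g_{ij}(\mathbf{x},t_1)$ are
then of order $1/a^3$:
\[ {\rm Im}\,\tilde g_{ij}(\mathbf{x},t) = 2G_{ik}(\mathbf{x})\,{\rm Im}B^k{}_j(\mathbf{x})\int_T^t \frac{dt'}{a^3(t')} + 2G_{ij}(\mathbf{x})\,{\rm Im}B^k{}_k(\mathbf{x})\int_T^t \frac{H^2(t')\,dt'}{a^3(t')\,\dot H(t')} + O\big(a^{-4}(t)\big) . \tag{39} \]
The functional derivative appearing in the constraint (37) is
\[ \frac{\delta L[\tilde g(t),\dot{\tilde g}(t),t]}{\delta\dot{\tilde g}_{ij}(\mathbf{x},t)} = \frac{a^3(t)\sqrt{\tilde g(\mathbf{x},t)}}{2}\,\tilde g^{ik}(\mathbf{x},t)\left[ -2H(t)\,\delta^j{}_k + \xi^j{}_k(\mathbf{x},t) - \delta^j{}_k\,\xi^l{}_l(\mathbf{x},t) \right] . \tag{40} \]
The metric is constrained by Eq. (36) to be real at $t = t_1$, so the term
$-2H\delta^j{}_k$ in square brackets makes a contribution to this functional derivative
that is also real at $t = t_1$, but the tensor $\xi^i{}_j$ in the other two terms has
an imaginary part given by the $O(a^{-3})$ term in Eq. (24) (with $\Xi^i{}_j$ replaced
with $\xi^i{}_j$, which in the absence of scalar field perturbations is the same):
\[ {\rm Im}\,\xi^i{}_j(\mathbf{x},t) = a^{-3}(t)\,{\rm Im}\,B^i{}_j(\mathbf{x}) + O\big(a^{-4}(t)\big) , \tag{41} \]
so
\[ {\rm Im}\,\frac{\delta L[\tilde g(t),\dot{\tilde g}(t)]}{\delta\dot{\tilde g}_{ij}(\mathbf{x},t)} = \frac{\sqrt{G(\mathbf{x})}}{2}\,G^{ik}(\mathbf{x})\,{\rm Im}\left[ B^j{}_k(\mathbf{x}) - \delta^j{}_k\,B^l{}_l(\mathbf{x}) \right] + O\big(a^{-1}(t)\big) . \tag{42} \]
Thus the constraint (37) does become independent of $t_1$ for large $t_1$.
This is not just a happy accident. We can understand the asymptotic
constancy of the left-hand side of Eq. (37) by recalling the Euler–Lagrange
equations
\[ \frac{\partial}{\partial t}\,\frac{\delta L[\tilde g(t),\dot{\tilde g}(t)]}{\delta\dot{\tilde g}_{ij}(\mathbf{x},t)} = \frac{\delta L[\tilde g(t),\dot{\tilde g}(t)]}{\delta\tilde g_{ij}(\mathbf{x},t)} . \]
The imaginary part of the right-hand side decreases as $1/a^2$, so the left-hand
side of Eq. (37) becomes constant for large a.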
This leaves the question of the convergence of the integral over time in
Eq. (38) for large a(t1 ). Let’s first consider the terms in the gravitational
part of the Lagrangian (9) that contain either 0 or 1 space or time derivative.
Since N is fixed by the condition that the Lagrangian be stationary in N , to
first order in δN we can set N = 1. It is then straightforward to calculate
that the terms in the gravitational part of L of zeroth or first order in
derivatives add up to
\[ L_1[\tilde g,\dot{\tilde g},t] = \frac{a^3}{2}\int d^3x\,\sqrt{\tilde g}\,\left[ -12H^2 - 4\dot H - 2H\,\tilde g^{ij}\dot{\tilde g}_{ij} - 8H\,\tilde g^{ij}\tilde\nabla_iN_j \right] . \tag{43} \]
The final term in square brackets integrates to zero (and in any case vanishes
for the choice we have made of spatial coordinates), leaving us (as already
noted in [3]) with a total time derivative
\[ L_1[\tilde g,\dot{\tilde g},t] = -2\,\frac{d}{dt}\left[ a^3H\int d^3x\,\sqrt{\tilde g} \right] . \tag{44} \]
As remarked in the appendix, such a total time derivative in the Lagrangian
has no effect on the correlation functions. This leaves the terms in the
gravitational part of L that are of second order in ξ or that involve the
space curvature. According to Eq. (39) the imaginary part of $N\sqrt{\tilde g}\,\tilde g^{ij}\tilde R_{ij}$
is of order $a(t)^{-3}$ at late time, which cancels the over-all factor $a^3$ in the
Lagrangian, so this term makes a contribution of order $a^{-2}$. According to
Eq. (41), the imaginary part of any second-order function of the ξ is of
order $a^{-2} \times a^{-3}$, so again such terms make contributions to ${\rm Im}(L - L_1)$
that at late times are of order $a^{-2}$. The time integral in Eq. (38) therefore
converges to a finite limit for large $a(t_1)$ exponentially fast, as $\int^{t_1} a(t)^{-2}\,dt$.

This concludes the proof that in single field inflation the generating function
W [J, t1 ] converges to a t1 -independent function for large t1 , and therefore
so do the correlation functions of g̃ij .
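The exponential convergence of the time integral can be illustrated with a short numerical sketch. It assumes, for illustration only, an exactly exponential scale factor a(t) = exp(Ht) with constant H; the function names and the choice H = 1 are hypothetical.

```python
import math

def tail_of_integral(t1, H=1.0):
    """Integral of a(t)^-2 from t1 to infinity, ASSUMING a(t) = exp(H t)."""
    return math.exp(-2.0 * H * t1) / (2.0 * H)

def partial_integral(t0, t1, H=1.0, steps=2000):
    """Trapezoidal estimate of the integral of exp(-2 H t) from t0 to t1."""
    h = (t1 - t0) / steps
    total = 0.5 * (math.exp(-2.0 * H * t0) + math.exp(-2.0 * H * t1))
    for i in range(1, steps):
        total += math.exp(-2.0 * H * (t0 + i * h))
    return total * h

# Contribution to the integral from the interval between t = 5 and t = 10.
late_piece = partial_integral(5.0, 10.0)
print(late_piece, tail_of_integral(5.0), tail_of_integral(10.0))
```

Under this assumption, everything the integral accumulates after time $t_1$ is bounded by $a(t_1)^{-2}/2H$, so moving the upper limit to later times changes the result by an exponentially small amount.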

This demonstration, that the correlation functions for the metric con-
verge to t1 -independent functions for large t1 , does not imply that these
limits are uniform in the coordinates appearing as arguments of the metric
components. In fact, if we set coordinates equal, the correlation functions
do not converge to finite limits. For instance, if to avoid ultraviolet
divergences we define $\zeta_r(t)$ as the average of the curvature perturbation ζ(x) over
a very small co-moving volume $r^3$ whose physical radius a(t)r over the times
of interest remains much less than the horizon size 1/H(t), then it can be
shown that in slow roll inflation the tree-approximation vacuum expectation
value of $\zeta_r^2(t)$ increases like $a(t)^{n_S-1}$ for $n_S > 1$ (where $n_S$ is the usual
scalar mode slope parameter) and like ln a(t) for $n_S = 1$, though it does
approach a constant for $n_S < 1$. Because of the way that a(t) and the
co-moving coordinates $x^i$ enter in the flat-space Robertson–Walker metric, they
have no physical significance in themselves; it is only a(t) times differences
of co-moving coordinates that have a significance, as physical separations.
Thus we expect the metric correlation functions to approach constant limits
only when all such physical separations become large compared with the
horizon size 1/H. Of course, in practice we are chiefly interested in the
Fourier transforms of the correlation functions, in which case the physical
wave numbers are the co-moving wave numbers divided by a(t), and we ex-
pect these Fourier transforms to approach finite limits only when all physical
wave numbers become much less than H(t).

I am grateful for helpful conversations with Raphael Flauger, Eiichiro
Komatsu, David Lyth, Juan Maldacena, Misao Sasaki, and Richard Woodard.
This material is based upon work supported by the National Science
Foundation under Grant No. PHY-0455649 and with support from The Robert
A. Welch Foundation, Grant No. F-0014.

Appendix: The Tree Theorem

In this appendix we will review the general tree theorem of [11], in a
somewhat simplified notation, and add a remark that is needed in Section
V. We consider a general Lagrangian system, with Hermitian
Heisenberg-picture canonical operators $q^H_a(t)$, and Lagrangian (not Lagrangian density)
$L[q^H(t),\dot q^H(t),t]$, possibly with an intrinsic time dependence. In field
theories the index a incorporates a space coordinate x as well as discrete indices
labeling the various field components; a sum over a includes an integral over
x as well as sums over discrete indices; and derivatives with respect to $q_a(t)$
are interpreted as functional derivatives. We wish to calculate the
generating function $W[J,t_1]$, a real function of a set of real c-number currents $J_a$,
which is defined by
\[ \exp\big\{ W[J,t_1] \big\} \equiv \Big\langle 0,{\rm in} \Big| \exp\Big[ \sum_a q^H_a(t_1)\,J_a \Big] \Big| 0,{\rm in} \Big\rangle , \tag{A.1} \]
where $|0,{\rm in}\rangle$ is a state defined to look like the vacuum state at an early
time, which in this paper we take as t = −∞. From W we can calculate
expectation values in this state of products of any number n of $q^H$s at the
time $t_1$:
\[ \langle 0,{\rm in}|\, q^H_a(t_1)\,q^H_b(t_1)\cdots |0,{\rm in}\rangle = \left[ \frac{\partial^n}{\partial J_a\,\partial J_b\cdots} \exp\big\{ W[J,t_1] \big\} \right]_{J=0} . \tag{A.2} \]
To calculate $W[J,t_1]$ in the tree approximation, we construct complex
c-number functions $q_a(t)$, subject to three conditions:
(A) The $q_a(t)$ satisfy the Euler–Lagrange equations
\[ \frac{\partial}{\partial t}\left[ \frac{\partial L[q(t),\dot q(t),t]}{\partial\dot q_a(t)} \right] = \frac{\partial L[q(t),\dot q(t),t]}{\partial q_a(t)} . \tag{A.3} \]
(In extending the Lagrangian to complex variables, we take it as a real
function, in the sense that $L^*[q(t),\dot q(t),t] = L[q^*(t),\dot q^*(t),t]$.)
(B) The $q_a(t)$ satisfy constraints at time $t_1$:
\[ {\rm Im}\,q_a(t_1) = 0 , \tag{A.4} \]
\[ {\rm Im}\,\frac{\partial L[q(t_1),\dot q(t_1),t_1]}{\partial\dot q_a(t_1)} = -J_a . \tag{A.5} \]
(C) The $q_a(t)$ also satisfy a positive frequency constraint at time t → −∞,
that they behave as superpositions of terms with time-dependence
exp(−iωt), with ω various positive frequencies.
(In [11] the functions $q_a(t)$ were denoted $q_{La}(t)$; we are here taking advantage
of the fact that for real currents, the other functions $q_{Ra}(t)$ introduced in [11]
are just $q^*_{La}(t)$.) The constraint (B) gives the $q_a(t)$ an implicit dependence
on $t_1$ as well as on the $J_a$. With $q_a(t)$ calculated subject to these three
constraints, the contribution of connected tree graphs to the generating
function is given by
\[ W[J,t_1]_{\rm tree} = \int_{-\infty}^{t_1} {\rm Im}\,L[q(t),\dot q(t),t]\,dt + \sum_a J_a\,q_a(t_1) . \tag{A.6} \]

We are concerned in this paper with the limit of W for large t1 (or more
precisely, for large a(t1 )).
From the foregoing, we can see that, in order to conclude that the gen-
erating function becomes independent of t1 when t1 is sufficiently large, it
is not enough to show that the quantities qa (t1 ) approach finite limits for
large t1 . We must also show that the integral in Eq. (A.6) converges in
this limit. (There is no problem with the convergence at very early times,
where the integrand oscillates increasingly rapidly.) Further, because the
constraints (A.4) and (A.5) are applied at time t1 , we must show that the
quantities Im{∂L[q(t1 ), q̇(t1 ), t1 ]/∂ q̇a (t1 )} as well as Im qa (t1 ) approach finite
t1 -independent limits for large t1 .
In order to evaluate the late time behavior of the correlation function in
Section V, we need to supplement this general review with a remark about
the effect of adding to the Lagrangian a derivative term:
$$ \Delta L(t) = \frac{d}{dt} F[q(t),t] , \tag{A.7} $$
with F [q(t), t] an arbitrary function of t and of the qa (t), which is real in
the same sense as L; that is, F ∗ [q(t), t] = F [q ∗ (t), t]. It is familiar that
such derivative terms do not matter in calculating the S-matrix, because
there the Lagrangian enters in integrals over all time, but in calculating the
generating function here we need to integrate the Lagrangian only up to time
t1 , and the Lagrangian also enters in the constraint (A.5). Nevertheless, we
can easily see that in calculating the generating function, as in S-matrix
calculations, the change (A.7) has no effect. First, adding a derivative term
(A.7) obviously has no effect on the Euler-Lagrange equations (A.3). The
only other place where the Lagrangian enters in constructing the functions
qa (t) is in the constraint (A.5), but adding the derivative term (A.7) changes
the left-hand side of Eq. (A.5) by
$$ \Delta\, {\rm Im}\, \frac{\partial L[q(t_1),\dot q(t_1),t_1]}{\partial \dot q_a(t_1)} = {\rm Im}\, \frac{\partial F[q(t_1),t_1]}{\partial q_a(t_1)} , \tag{A.8} $$
and this vanishes because the constraint (A.4) requires that qa (t1 ) be real.
Hence the change (A.7) has no effect on the functions qa(t). The only effect
on the generating function (A.6) is then to change it by an amount
$$ \Delta W[J,t_1]_{\rm tree} = {\rm Im} \int_{-\infty}^{t_1} \Delta L[q(t),\dot q(t),t]\, dt = {\rm Im}\, F[q(t_1),t_1] \tag{A.9} $$

and this vanishes because again the constraint (A.4) requires that qa (t1 ) be
real.
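
The claim that the derivative term (A.7) drops out of the Euler–Lagrange equations (A.3) can be spelled out with the chain rule. The following is only a sketch of the standard argument; the explicit sum over the index b is notation introduced here for illustration.

```latex
\Delta L = \frac{d}{dt}F[q(t),t]
         = \sum_b \frac{\partial F}{\partial q_b}\,\dot q_b + \frac{\partial F}{\partial t}
\quad\Longrightarrow\quad
\frac{\partial\,\Delta L}{\partial \dot q_a} = \frac{\partial F}{\partial q_a} ,
\qquad
\frac{\partial\,\Delta L}{\partial q_a}
   = \sum_b \frac{\partial^2 F}{\partial q_a\,\partial q_b}\,\dot q_b
     + \frac{\partial^2 F}{\partial q_a\,\partial t}
   = \frac{d}{dt}\,\frac{\partial F}{\partial q_a} ,
\qquad\text{so}\qquad
\frac{d}{dt}\,\frac{\partial\,\Delta L}{\partial \dot q_a}
   - \frac{\partial\,\Delta L}{\partial q_a} = 0 .
```

The first of these relations is the one used in Eq. (A.8): the change in the canonical momentum is just ∂F/∂qa, which is real wherever the qa are real.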

References

1. For a review, see N. Bartolo, E. Komatsu, S. Matarrese, and A. Riotto,
Phys. Rep. 402, 103 (2004).

2. For a review with references to the original literature, see S. Weinberg,
Cosmology (Oxford University Press, 2008), Sec. 5.4.

3. J. M. Maldacena, J. High Energy Phys. 05, 013 (2003).

4. D. H. Lyth, K. A. Malik, and M. Sasaki, J. Cosm. Astropart. Phys.
05, 004 (2005).

5. For a review and references to the original literature, see S. Weinberg,
The Quantum Theory of Fields, Sec. 19.5 (Cambridge University
Press, 1996).

6. For diverse discussions of this assumption in the context of the linear
approximation see M. Sasaki and T. Tanaka, Prog. Theor. Phys. 99,
763 (1998); D. Wands, K. A. Malik, D. H. Lyth, and A. R. Liddle,
Phys. Rev. D 62, 043527 (2000); A. R. Liddle and D. H. Lyth,
Cosmological Inflation and Large-Scale Structure (Cambridge University
Press, 2000); D. H. Lyth and D. Wands, Phys. Rev. D 68, 103516
(2003). It has been extended beyond the linear approximation in [4].

7. After this work was complete, I learned that the O(a^{−2}) and O(a^{−3})
terms in the metric in the special case of single-field inflation have also
been found by Y. Tanaka and M. Sasaki, Prog. Theor. Phys. 181,
455 (2007). Their solution is different from that presented in Section
III, presumably because they use a different gauge: In their gauge,
the time is not defined to give the scalar field its unperturbed value,
so that they find O(a^{−2}) and O(a^{−3}) perturbations to the scalar field,
and the space coordinates are not defined to make N^i = 0.

8. S. Weinberg, Phys. Rev. D 72, 043514 (2005).

9. G. 't Hooft and M. J. G. Veltman, Ann. Poincare Phys. Theor. A20,
69 (1974); J. F. Donoghue, Phys. Rev. D 50, 3874 (1994).

10. M. van der Meulen and J. Smit, J. Cosm. Astropart. Phys. 11, 023
(2007).

11. S. Weinberg, Phys. Rev. D 78, 063534 (2008).

12. L. Kofman, A. Linde, and A. A. Starobinsky, Phys. Rev. D 56, 3258
(1997). Non-Gaussianity due to parametric amplification is studied by
K. Enqvist, A. Jokinen, A. Mazumdar, T. Multamaki, and A. Vaihkonen,
J. Cosm. Astropart. Phys. 03, 010 (2005); A. Jokinen and A.
Mazumdar, J. Cosm. Astropart. Phys. 04, 003 (2006); A. Chambers
and A. Rajantie, Phys. Rev. Lett. 100, 041302 (2008).

13. For example, D. Seery and J. E. Lidsey, J. Cosm. Astropart. Phys.
06, 0506 (2005); X. Chen, M-x. Huang, S. Kachru, and G. Shiu, J.
Cosm. Astropart. Phys. 01, 002 (2007); X. Chen, R. Easther, and E.
A. Lim, J. Cosm. Astropart. Phys. 06, 023 (2007) and 0801.3295.

14. A. R. Liddle and S. M. Leach, Phys. Rev. D 68, 103503 (2003).

15. R. S. Arnowitt, S. Deser, and C. W. Misner, in Gravitation: An
Introduction to Current Research, ed. L. Witten (Wiley, New York, 1962):
227, now also available as gr-qc/0405109.

16. S. Weinberg, ref. [8], Eq. (24).

UTTG-01-00

A Priori Probability Distribution of the Cosmological Constant

arXiv:astro-ph/0002387v1 21 Feb 2000

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract
In calculations of the probability distribution for the cosmological con-
stant, it has been previously assumed that the a priori probability distri-
bution is essentially constant in the very narrow range that is anthropically
allowed. This assumption has recently been challenged. Here we identify
large classes of theories in which this assumption is justified.


∗ Electronic address: weinberg@physics.utexas.edu
I. INTRODUCTION

In some theories of inflation [1] and of quantum cosmology [2] the observed big


bang is just one of an ensemble of expanding regions in which the cosmological

constant takes various different values. In such theories there is a probability


distribution for the cosmological constant: the probability dP(ρV ) that a
scientific society in any of the expanding regions will observe a vacuum energy
between ρV and ρV + dρV is given by [3,4,5]

dP(ρV ) = P∗ (ρV )N (ρV ) dρV , (1)

where P∗ (ρV )dρV is the a priori probability that an expanding region will
have a vacuum energy between ρV and ρV + dρV (to be precise, weighted

with the number of baryons in such regions), and N (ρV ) is proportional to


the fraction of baryons that wind up in galaxies. (The constant of propor-
tionality in N (ρV ) is independent of ρV , because once a galaxy is formed the
subsequent evolution of its stars, planets, and life is essentially unaffected by

the vacuum energy.)


The factor N (ρV ) vanishes except for values of ρV that are very small by
the standards of elementary particle physics, because for ρV large and positive
there is a repulsive force that prevents the formation of galaxies [6] and hence

of stars, while for ρV large and negative the universe recollapses too fast for
galaxies or stars to form [7]. The fraction of baryons that form galaxies has
been calculated [5] for ρV > 0 under reasonable astrophysical assumptions. On

the other hand, we know little about the a priori probability distribution
P∗ (ρV ). However, the range of values of ρV in which N (ρV ) ≠ 0 is so
narrow compared with the scales of energy density typical of particle physics
that it had seemed reasonable in earlier work [4,5] to assume that P∗ (ρV ) is
constant within this range, so that dP(ρV ) can be calculated as proportional
to N (ρV )dρV . In an interesting recent article [8], Garriga and Vilenkin have
argued that this assumption (which they call “Weinberg’s conjecture”) is

generally not valid. This raises the problem of characterizing those theories
in which this assumption is valid and those in which it is not.
It is shown in Section II that this assumption is in fact valid for a broad
range of theories, in which the different regions are characterized by differ-

ent values of a scalar field that couples only to itself and gravitation. The
deciding factor is how we impose the flatness conditions on the scalar field
potential that are needed to ensure that the vacuum energy is now nearly

time-independent. If the potential is flat because the scalar field renormal-


ization constant is very large, then the a priori probability distribution of
the vacuum energy is essentially constant within the anthropically allowed
range, for scalar potentials of generic form. It is also essentially constant

for a large class of other potentials. Section III is a digression, showing that
the same flatness conditions ensure that the vacuum energy has been roughly
constant since the end of inflation. Section IV takes up the sharp peaks in
the a priori probability found in theories of quantum cosmology and eternal

inflation.

II. SLOWLY ROLLING SCALAR FIELD

One of the possibilities considered by Garriga and Vilenkin is a vacuum


energy that depends on a homogeneous scalar field φ(t) whose present value
is governed by some smooth probability distribution. The vacuum energy is

$$ \rho_V = V(\phi) + \frac{1}{2}\dot\phi^2 , \tag{2} $$

and the scalar field time-dependence is given by

φ̈ + 3H φ̇ = −V ′ (φ) , (3)

where H(t) is the Hubble fractional expansion rate, V (φ) is the scalar field
potential, dots denote derivatives with respect to time, and primes denote
derivatives with respect to φ. Following Garriga and Vilenkin [8], we assume

that at present the scalar field energy appears like a cosmological constant
because the field φ is now nearly constant in time, and that this scalar field
energy now dominates the cosmic energy density. For this to make sense it is
necessary for the potential V (φ) to satisfy certain flatness conditions. In the

usual treatment of a slowly rolling scalar, one neglects the inertial term φ̈ in
Eq. (3) as well as the kinetic energy term φ̇²/2 in Eq. (2). With the inertial
term neglected, the condition that V (φ) should change little in a Hubble time
1/H is that [9]

$$ V'^2(\phi) \ll 3H^2\, |V(\phi)| . \tag{4} $$

With the scalar field energy dominating the total cosmic energy density, the
Friedmann equation gives

$$ |V(\phi)| \simeq \rho_V \simeq 3H^2/8\pi G , \tag{5} $$

so Eq. (4) requires

$$ |V'(\phi)| \ll \sqrt{8\pi G}\; \rho_V . \tag{6} $$

(The kinetic energy term φ̇²/2 in Eq. (2) can be neglected under the slightly
weaker condition

$$ |V'(\phi)| \ll \sqrt{18 H^2\, |V(\phi)|} \simeq \sqrt{48\pi G}\; \rho_V , $$

which is the flatness condition given by Garriga and Vilenkin.) There is also

a bound on the second derivative of the potential, needed in order for the
inertial term to be neglected. With the scalar field energy dominating the
total cosmic energy density, this condition requires that [9]

|V ′′ (φ)| ≪ 8πGρV . (7)

As Garriga and Vilenkin correctly pointed out, the smallness of the slope
of V (φ) means that φ may vary appreciably even when ρV ≃ V (φ) is re-
stricted to the very narrow anthropically allowed range of values in which

galaxy formation is non-negligible. They concluded that it would be possible


for the a priori probability P∗ (ρV ) to vary appreciably in this range. In par-
ticular, Garriga and Vilenkin assumed an a priori probability distribution

for φ that is constant in the anthropically allowed range, in which case the a
priori probability distribution for ρV is

P∗ (ρV ) ∝ 1/|V ′ (φ)| (8)

which they said could vary appreciably in the anthropically allowed range.
Though possible, this rapid variation is by no means the generic case. As
already mentioned, the second as well as the first derivative of the potential

must be small, so that the a priori probability density (8) may change little in
the anthropically allowed range. It all depends on how the flatness conditions
are satisfied. There are two obvious ways that one might try to make the
potential sufficiently flat. Potentials of the first type are of the general form

V (φ) = V1 f (λφ) , (9)

where V1 is some large energy density, in the range of m_W^4 to G^{−2}; the
constant λ is very small; and f (x) is some dimensionless function involving
no very large or very small parameters. Potentials of the second type are of
the general form
V (φ) = V1 [1 − ǫ g(λφ)] , (10)

where V1 is again some large energy density; λ is here some fixed inverse

mass, perhaps of order √G; now it is ǫ instead of λ that is very small; and
g(x) is some other dimensionless function involving no very large or very
small parameters.

For potentials (9) of the first type, it is always possible to meet all obser-
vational conditions by taking λ sufficiently small, provided that the function
f (x) has a simple zero at a point x = a of order unity, with derivatives at a

of order unity. Because V1 is so large, the present value of λφ must be very


close to the assumed zero a of f (x). With f ′ (a) and f ′′ (a) of order unity, the
flatness conditions (6) and (7) are both satisfied if
$$ |\lambda| \ll \sqrt{8\pi G}\; \frac{\rho_V}{V_1} . \tag{11} $$

Galaxy formation is only possible for |V (φ)| less than an upper bound Vmax
of the order of the mass density of the universe at the earliest time of galaxy
formation [6], which in the absence of fine tuning of the cosmological constant

is very much less than V1 . The anthropically allowed range of φ is therefore


given by
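$$ \Delta\phi \equiv |\phi - a/\lambda|_{\rm max} = \frac{V_{\rm max}}{|\lambda f'(a) V_1|} . \tag{12} $$
The fractional change in the a priori probability density 1/|V ′ (φ)| in this
range is then

$$ \left| \frac{V''(\phi)\, \Delta\phi}{V'(\phi)} \right| = \left| \frac{V_{\rm max}\, f''(a)}{V_1\, f'^2(a)} \right| , \tag{13} $$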

with no dependence on λ. As long as the factor f ′′ (a)/f ′2 (a) is roughly of


order unity the fractional variation (13) in the a priori probability will be

very small, as was assumed in references 4 and 5.


This reasoning applies to potentials of the form

V (φ) = V1 [1 − (λφ)n ] ,

which, as already noted by Garriga and Vilenkin, lead to an a priori probability
distribution that is nearly constant in the anthropically allowed range. (In
this case a = 1 and f ′′ (a)/f ′2 (a) = (1−n)/n.) But this reasoning also applies

to the “washboard potential” that was taken as a counterexample by Garriga


and Vilenkin, which with no loss of generality can be put in the form:

V (φ) = V1 [1 + αλφ + β sin(λφ)] .

The zero point a is here determined by the condition

1 + αa + β sin a = 0 ,

and the factor f ′′ (a)/f ′2 (a) in Eq. (13) is

$$ \frac{f''(a)}{f'^2(a)} = \frac{-\beta \sin a}{(\alpha + \beta \cos a)^2} . $$

If the flatness condition is satisfied by taking λ small, with α and β of

order unity, as is assumed for potentials of the first kind, then the factor
f ′′ (a)/f ′2 (a) in Eq. (13) is of order unity unless α and β happen to be chosen
so that
$$ 1 + \alpha \cos^{-1}\!\left( \frac{-\alpha}{\beta} \right) + \beta \sqrt{1 - \frac{\alpha^2}{\beta^2}} \ll 1 . $$
Of course it would be possible to impose this condition on α and β, but this is

the kind of fine-tuning that would be upset by adding a constant of order V1


to the potential. Aside from this exception, for all α and β of order unity the
factor f ′′ (a)/f ′2 (a) is of order unity, so the washboard potential also yields

an a priori probability distribution for the vacuum energy that is flat in the
anthropically allowed range.
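
The size of the factor f″(a)/f′²(a) in Eq. (13) is easy to check numerically for the two examples above. The following is a minimal sketch, not part of the original analysis: the function names, the bisection bracket, the sample values α = 1, β = 0.5, and the ratio Vmax/V1 = 1e-50 are all assumptions chosen only for illustration.

```python
import math


def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Locate a zero of f in [lo, hi] by bisection (f must change sign)."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def power_law_factor(n):
    """f''(a)/f'(a)**2 at the zero a = 1 of f(x) = 1 - x**n."""
    f1 = -n              # f'(1)
    f2 = -n * (n - 1)    # f''(1)
    return f2 / f1**2    # equals (1 - n)/n


def washboard_factor(alpha, beta, lo=-10.0, hi=10.0):
    """Zero a of f(x) = 1 + alpha*x + beta*sin(x) and the factor f''(a)/f'(a)**2."""
    a = bisect_root(lambda x: 1 + alpha * x + beta * math.sin(x), lo, hi)
    factor = -beta * math.sin(a) / (alpha + beta * math.cos(a)) ** 2
    return a, factor


def fractional_variation(vmax_over_v1, factor):
    """Magnitude of the right-hand side of Eq. (13)."""
    return abs(vmax_over_v1 * factor)


if __name__ == "__main__":
    a, factor = washboard_factor(1.0, 0.5)   # illustrative alpha, beta of order unity
    print("power law n=3:", power_law_factor(3))
    print("washboard zero a, factor:", a, factor)
    # Vmax/V1 = 1e-50 is a purely hypothetical ratio used to show the scaling.
    print("fractional variation:", fractional_variation(1e-50, factor))
```

For α and β of order unity the factor comes out of order unity, so the variation of the a priori probability across the allowed range is set by the tiny ratio Vmax/V1, as the text argues.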
In contrast, for potentials (10) of the second kind the flatness conditions

are not necessarily satisfied no matter how small we take ǫ. Because the
present vacuum energy is much less than V1 , the present value of φ must be
very close to a value φǫ , satisfying

g(λφǫ) = 1/ǫ . (14)

This requires λφǫ to be near a singularity of the function g(x), perhaps at


infinity, so it is not clear in general that such a potential would have small

derivatives at λφǫ for any value of ǫ. For instance, for an exponential g(x) =
exp(x) we have φǫ = − ln ǫ/λ, and V ′ (φǫ ) approaches an ǫ-independent value
proportional to λ, which is not small unless we take λ very small, in which
case we have a potential of the first kind, for which as we have seen the a priori

probability density (8) is flat in the anthropically allowed range. The flatness
conditions are satisfied for small ǫ if g(x) approaches a power x^n for x → ∞.
In this case φǫ goes as ǫ^{−1/n}, so V ′ (φǫ ) goes as ǫ^{1/n} and V ′′ (φǫ ) goes as ǫ^{2/n},
both of which can be made as small as we like by taking ǫ sufficiently small.

In particular, if the singularity in g(x) at x → ∞ consists only of poles


in 1/x of various orders up to n (as is the case for a polynomial of order n)
then the anthropically allowed range of φ is

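$$ \left| \phi - \phi_\epsilon \right|_{\rm max} \approx \frac{V_m}{V_1\, \epsilon\, |g'(\phi_\epsilon)|} \approx \frac{V_m}{V_1}\, \epsilon^{-1/n} . \tag{15} $$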

The flatness conditions make this range much greater than the Planck mass,
but the fractional change in the a priori probability density (8) in this range
is still very small

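$$ \left| \frac{V''(\phi_\epsilon)}{V'(\phi_\epsilon)} \right| \left| \phi - \phi_\epsilon \right|_{\rm max} \approx \frac{V_m}{V_1} \ll 1 . \tag{16} $$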

To have a large fractional change in the a priori probability distribution in


the anthropically allowed range for potentials of the second type that satisfy
the flatness conditions, we need a function g(x) that goes like a power as

x → ∞, but has a more complicated singularity at x = ∞ than just poles in


1/x. An example is provided by the washboard potential with α and β very
small and λ fixed, the case considered by Garriga and Vilenkin, for which
g(x) has an essential singularity at x = ∞.

In summary, the a priori probability is flat in the anthropically allowed


range for several large classes of potentials, while it seems to be not flat only
in exceptional cases.
It remains to consider whether the small parameters λ or ǫ in potentials

respectively of the first or second kind could arise naturally. Garriga and
Vilenkin argued that a term in a potential of what we have called the second
kind with an over-all factor ǫ ≪ 1 could be naturally produced by instanton
effects. On the other hand, for potentials of type 1 a small parameter λ

could be naturally produced by the running of a field-renormalization factor.


The field φ has a conventional “canonical” normalization, as shown by the
fact that the term φ̇²/2 in the vacuum energy (2) and the inertial term
φ̈ in the field equation (3) have coefficients unity. Factors dependent on
the ultraviolet cutoff will therefore be associated with external φ-lines. In
order for the potential V (φ) to be expressed in a cut-off independent way in

terms of coupling parameters gµ renormalized at a wave-number scale µ, the


field φ must be accompanied with a field-renormalization factor Z_µ^{−1/2}, which
satisfies a differential equation of the form

$$ \mu \frac{dZ_\mu}{d\mu} = \gamma(g_\mu)\, Z_\mu . \tag{17} $$

At very large distances, the field φ will therefore be accompanied with a
factor
$$ \lambda = Z_0^{-1/2} = \exp\left\{ \frac{1}{2} \int_0^{\mu} \gamma(g_{\mu'})\, \frac{d\mu'}{\mu'} \right\} Z_\mu^{-1/2} . \tag{18} $$
The integral here only has to be reasonably large and negative in order for
λ to be extremely small.

III. SLOW ROLLING IN THE EARLY UNIVERSE

When the cosmic energy density is dominated by the vacuum energy,

the flatness conditions (6) and (7) insure that the vacuum energy changes
little in a Hubble time. But if the vacuum energy density is nearly time-
independent, then from the end of inflation until nearly the present it must
have been much smaller than the energy density of matter and radiation,

and under these conditions we are not able to neglect the inertial term φ̈ in
Eq. (3). A separate argument is needed to show that the vacuum energy is

nearly constant at these early times. This is important because, although
there is no observational reason to require V (φ) to be constant at early
times, it must have been less than the energy of radiation at the time of

nucleosynthesis in order not to interfere with the successful prediction of


light element abundances, and therefore at this time must have been very
much less than V1 , which we have supposed to be at least of order m_W^4. For
potentials (9) of the first kind, this means that φ must have been very close

to its present value at the time of helium synthesis. Also, if φ at the end of
inflation were not the same as φ at the time of galaxy formation, then a flat
a priori distribution for the first would not in general imply a flat a priori
distribution for the second.

At times between the end of inflation and the recent past the expansion
rate behaved as H = η/t, where η = 2/3 or η = 1/2 during the eras of matter
or radiation dominance, respectively. During this period, Eq. (3) takes the

form

$$ \ddot\phi + \frac{3\eta}{t}\, \dot\phi = -V'(\phi) , \tag{19} $$
If we tentatively assume that φ is nearly constant, then Eq. (19) gives for its
rate of change
$$ \dot\phi \simeq -\frac{t\, V'(\phi)}{1 + 3\eta} . \tag{20} $$
The change in the vacuum energy from the end of inflation to the present
time t0 is therefore
$$ \Delta V \simeq \int_0^{t_0} V'(\phi)\, \dot\phi\, dt \simeq -\frac{V'^2(\phi)\, t_0^2}{2(1 + 3\eta)} . \tag{21} $$
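The present time is roughly given by t0 ≈ η √(3/8πGρV0), so the fractional
change in the vacuum energy density since the end of inflation is
$$ \left| \frac{\Delta V}{\rho_{V0}} \right| \approx \left( \frac{3\eta^2}{2(1 + 3\eta)} \right) \left( \frac{V'^2(\phi)}{8\pi G \rho_{V0}^2} \right) , \tag{22} $$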

a subscript zero as usual denoting the present instant. The factor 3η²/2(1 + 3η)
is of order unity, so the inequality (6) tells us that the change in the
vacuum energy during the time since inflation has indeed been much less
than its present value.
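
The two numerical ingredients of this argument can be checked directly: that the trial solution (20) satisfies Eq. (19) when V′ is treated as constant, and that the coefficient 3η²/2(1 + 3η) in Eq. (22) is of order unity in both eras. The following is a small sketch using exact rational arithmetic; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction


def drift_factor(eta):
    """Coefficient 3*eta**2 / (2*(1 + 3*eta)) appearing in Eq. (22)."""
    eta = Fraction(eta)
    return 3 * eta**2 / (2 * (1 + 3 * eta))


def slow_roll_residual(eta, t, v_prime=1):
    """Left side minus right side of Eq. (19) for phi_dot = -t*V'/(1 + 3*eta).

    V' is treated as a constant, as in the text's tentative assumption
    that phi is nearly constant.
    """
    eta, t = Fraction(eta), Fraction(t)
    phi_dot = -t * v_prime / (1 + 3 * eta)
    phi_ddot = -Fraction(v_prime) / (1 + 3 * eta)
    return phi_ddot + (3 * eta / t) * phi_dot + v_prime


if __name__ == "__main__":
    for label, eta in (("matter", Fraction(2, 3)), ("radiation", Fraction(1, 2))):
        print(label, "eta =", eta,
              "factor =", drift_factor(eta),
              "residual =", slow_roll_residual(eta, 5))
```

With η = 2/3 and η = 1/2 the coefficient is 2/9 and 3/20 respectively, so the smallness of the fractional change comes entirely from the flatness condition (6).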

IV. QUANTUM COSMOLOGY

In some theories of quantum cosmology the wave function of the uni-


verse is a superposition of terms, corresponding to universes with different
(but time-independent) values for the vacuum energy ρV . It has been argued

by Baum [2], Hawking [2] and Coleman [10] that these terms are weighted with a
ρV -dependent factor, that gives an a priori probability distribution with an
infinite peak at ρV = 0, but this claim has been challenged [11]. As already
acknowledged in references 4 and 5, if this peak at ρV = 0 is really present, then

anthropic considerations are both inapplicable and unnecessary in solving the


problem of the cosmological constant.

Garriga and Vilenkin [8] have proposed a different sort of infinite peak,
arising from a ρV -dependent rate of nucleation of sub-universes operating
over an infinite time. Even granting the existence of such a peak, it is not

clear that it really leaves a vanishing normalized probability distribution at


all other values of ρV . For instance, the nucleation rate might depend on the
population of sub-universes already present, in such a way that the peaks in
the probability distribution are kept to a finite size. If P∗ (ρV ) = 0 except at

the peak, then anthropic considerations are irrelevant and the cosmological
constant problem is as bad as ever, since there is no known reason why
the peak should occur in the very narrow range of ρV that is anthropically
allowed. On the other hand, if there is a smooth background in addition to a

peak outside the anthropically allowed range of ρV then the peak is irrelevant,
because no observers would ever measure such values of ρV . In this case the
probability distribution of the cosmological constant can be calculated using

the methods of references 4 and 5.

ACKNOWLEDGEMENTS

I am grateful for a useful correspondence with Alex Vilenkin. This re-


search was supported in part by the Robert A. Welch Foundation and NSF
Grant PHY-9511632.

REFERENCES

1. A. Vilenkin, Phys. Rev. D27, 2848 (1983); A. D. Linde, Phys. Lett.
B175, 395 (1986).

2. E. Baum, Phys. Lett. B133, 185 (1984); S. W. Hawking, in Shelter


Island II - Proceedings of the 1983 Shelter Island Conference on Quan-
tum Field Theory and the Fundamental Problems of Physics, ed. R.

Jackiw et al. (MIT Press, Cambridge, 1995); Phys. Lett. B134, 403
(1984); S. Coleman, Nucl. Phys. B 307, 867 (1988).

3. An equation of this type was given by A. Vilenkin, Phys. Rev. Lett.

74, 846 (1995); and in Cosmological Constant and the Evolution of the
Universe, K. Sato, et al., ed. (Universal Academy Press, Tokyo, 1996)
(gr-qc/9512031), but it was not used in a calculation of the mean value
or probability distribution of ρV .

4. S. Weinberg, in Critical Dialogs in Cosmology, ed. by N. Turok (World


Scientific, Singapore, 1997).

5. H. Martel, P. Shapiro, and S. Weinberg, Ap. J. 492, 29 (1998).

6. S. Weinberg, Phys. Rev. Lett. 59, 2607 (1987).

7. J. D. Barrow and F. J. Tipler, The Anthropic Cosmological Principle


(Clarendon Press, Oxford, 1986).

8. J. Garriga and A. Vilenkin, Tufts University preprint astro-ph/9908115,

to be published.

9. P. J. Steinhardt and M. S. Turner, Phys. Rev. D29, 2162 (1984).

10. S. Coleman, Nucl. Phys. B 310, 643 (1988).

11. W. Fischler, I. Klebanov, J. Polchinski, and L. Susskind, Nucl. Phys.


B327, 157 (1989).

UTTG-08-00

Curvature Dependence of Peaks in the Cosmic Microwave Background Distribution

arXiv:astro-ph/0006276v1 20 Jun 2000

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract
The widely cited formula ℓ1 ≃ 200 Ω0^{−1/2} for the multipole number of the
first Doppler peak is not even a crude approximation in the case of greatest
current interest, in which the cosmic mass density is less than the vacuum
energy density. For instance, with ΩM fixed at 0.3, the position of any
Doppler peak varies as Ω0^{−1.58} near Ω0 = 1.


∗ Electronic address: weinberg@physics.utexas.edu
The precise measurement [1] of the multipole number ℓ1 = 197 ± 6 at the

first ‘Doppler’ peak has provided an invaluable constraint on cosmological

parameters. In a 1994 numerical calculation, Kamionkowski, Spergel and

Sugiyama [2] presented a formula giving ℓ1 as a function essentially of the

curvature alone:
$$ \ell_1 \sim \frac{200}{\sqrt{\Omega_0}} , \tag{1} $$

where Ω0 ≡ ΩM + ΩΛ , in which ΩM and ΩΛ are the present ratios of the

cosmic mass density and the vacuum energy (associated, e. g., with a cos-

mological constant) to the critical density. This calculation was done before

supernova studies [3] indicated the likely presence of a relatively large

cosmological constant, and therefore assumed that ΩΛ = 0. They also explained


the Ω0^{−1/2} behavior by noting that ℓ1 is approximately inversely proportional

to the angle subtended at the earth by the horizon at the time of last

scattering, which was known [4] to be proportional to Ω0^{1/2} for ΩΛ = 0. The

same Ω0 -dependence was derived on the same grounds by Frampton et al. [5],

explicitly for the case ΩΛ = 0.

Unfortunately, despite the fact that it was derived only for the case ΩΛ =

0, Eq. (1) continues to be quoted [1,6,7,8,9] as if it were generally applicable also

when ΩΛ is appreciable. As far as I know, this formula has not been used by

observational groups in analysis of their data, but in view of the great current

interest in these matters, it seems worth warning that in fact, Eq. (1) is not

valid for parameters in the range suggested by supernova observations, for

which ΩΛ > ΩM . Although it is true that when Ω0 is near unity, ℓ1 depends

less sensitively on other parameters than on Ω0 , the dependence of ℓ1 on

Ω0 bears no resemblance whatever to Eq. (1), except for the case ΩΛ ≪ 1.

Instead, we shall see that the dependence of ℓ1 on Ω0 near Ω0 = 1 with

ΩM fixed at values less than 0.4 is much stronger than given by Eq. (1) (for

instance, ℓ1 ∝ Ω0^{−1.58} for ΩM = 0.3), and it depends sensitively on ΩM .

To calculate the full dependence of ℓ1 on Ω0 , ΩM , Ωbaryon , Ωradiation ,

etc. is a complicated task, requiring the consideration of the evolution of

the acoustic velocity and of the ratio of radiation and matter energies, and

the consideration of Doppler shifts as well as temperature fluctuations. We

can avoid all these complications by considering the dependence of ℓ1 on Ω0

when only ΩΛ is allowed to vary, with ΩM and all other parameters held

fixed. If it were really true (as Eq. (1) says) that ℓ1 depends only on Ω0 ,

then this would be all we need to calculate the full Ω0 -dependence.

The advantage of letting only ΩΛ vary is that the vacuum energy density

is negligible compared with the densities of matter and radiation at and be-

fore the redshift zL ≃ 1100 of last scattering, so the only effect of variations

in ΩΛ on the multipole number ℓn of the nth Doppler peak is to change

the paths followed by light rays since the time of last scattering. The angle

subtended at the earth by any feature of the cosmic microwave background

of proper length d is

θ = d/dA , (2)

where dA is the angular diameter distance of the surface of last scattering [10]:
$$ d_A = \frac{1}{\Omega_k^{1/2} H_0 (1 + z_L)} \sinh\left[ \Omega_k^{1/2} \int_{1/(1+z_L)}^{1} \frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_k x^2 + \Omega_M x}} \right] , \tag{3} $$

and Ωk is a measure of curvature

Ωk ≡ 1 − ΩΛ − ΩM = 1 − Ω0 . (4)

It follows that the ΩΛ -dependence of ℓn is given by

ℓn ∝ dA . (5)

Furthermore, although the relation between the present Hubble constant H0

and the proper scales of phenomena at the time of last scattering depends

on ΩM and Ωradiation , it does not depend on ΩΛ . (For instance, if we neglect

radiation, then the acoustic horizon at the redshift of last scattering is

2(1 + zL)^{−3/2}/(√(3ΩM) H0).) Therefore, with ΩM fixed, the dependence of ℓn on ΩΛ

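is given by
$$ \ell_n \propto F(\Omega_\Lambda) \equiv \frac{1}{\Omega_k^{1/2}} \sinh\left[ \Omega_k^{1/2} \int_0^1 \frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_k x^2 + \Omega_M x}} \right] , \tag{6} $$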

with Ωk given in terms of ΩΛ by Eq. (4). (The lower limit on the integral

has here been set equal to zero because zL ≫ 1.) Of course, all the detailed

physics of the acoustic oscillations responsible for the Doppler peaks

is contained in the constant of proportionality; all we need to know here is

that it does not involve ΩΛ .
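
The function F(ΩΛ) of Eq. (6) is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: it uses a composite Simpson rule, removes the integrable 1/√x behaviour at x = 0 with the substitution x = u², and for Ωk < 0 uses the usual analytic continuation of sinh to sin. The names `simpson` and `peak_scaling` are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def peak_scaling(omega_lambda, omega_m):
    """F(Omega_Lambda) of Eq. (6) at fixed Omega_M."""
    omega_k = 1.0 - omega_lambda - omega_m
    # With x = u**2, dx / sqrt(OL x^4 + Ok x^2 + OM x)
    # becomes 2 du / sqrt(OL u^6 + Ok u^2 + OM), which is smooth at u = 0.
    integral = simpson(
        lambda u: 2.0 / math.sqrt(omega_lambda * u**6 + omega_k * u**2 + omega_m),
        0.0, 1.0)
    if abs(omega_k) < 1e-12:          # flat case: the sinh factor reduces to the integral
        return integral
    root = math.sqrt(abs(omega_k))
    if omega_k > 0:                   # open case, as written in Eq. (6)
        return math.sinh(root * integral) / root
    return math.sin(root * integral) / root   # closed case, by analytic continuation


if __name__ == "__main__":
    for omega_lambda in (0.6, 0.7, 0.8):
        print(omega_lambda, peak_scaling(omega_lambda, 0.3))
```

Since ℓn ∝ F, printing F for a few values of ΩΛ at fixed ΩM shows directly how the peak positions move as the curvature is varied through the vacuum energy alone.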

Now let us consider the variation of the quantity (6) as we make small

changes in Ω0 near Ω0 = 1 with ΩM fixed. An elementary calculation gives

$$ \ell_n \propto \Omega_0^{-\nu} , \tag{7} $$

where
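$$ \nu \equiv -\left( \frac{\partial \ln F}{\partial \Omega_\Lambda} \right)_{\Omega_\Lambda = 1 - \Omega_M} = \frac{I_1^2}{6} - \frac{I_2}{2 I_1} , \tag{8} $$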

with

$$ I_1 \equiv \int_0^1 \frac{dx}{[(1 - \Omega_M) x^4 + \Omega_M x]^{1/2}} , \qquad I_2 \equiv \int_0^1 \frac{(x^2 - x^4)\, dx}{[(1 - \Omega_M) x^4 + \Omega_M x]^{3/2}} . \tag{9} $$

The table below gives values of these integrals, and of the resulting exponent

ν in Eq. (7).

ΩM     I1      I2      ν
0.2    3.891   2.546   2.196
0.3    3.305   1.601   1.578
0.4    2.938   1.145   1.244
1.0    2       8/21    4/7

The only approximation made in deriving these results is that the universe

becomes transparent suddenly at a redshift zL ≫ 1, and has been dominated

since then by non-relativistic matter and vacuum energy. Also, we are ne-

glecting the effect of changing gravitational potentials at redshifts z ≪ zL ,

which introduce an additional Λ-dependence [11] that is quite small at the

wavelengths of the Doppler peaks. Otherwise, these results are exact.
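
The entries of the table can be reproduced with a few lines of numerical integration. This is a sketch only: the Simpson rule, the substitution x = u² (which makes both integrands of Eq. (9) smooth on [0, 1]), and the function names are choices made here for illustration.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrals(omega_m):
    """I1 and I2 of Eq. (9), evaluated after the substitution x = u**2."""
    w = 1.0 - omega_m
    # dx / [w x^4 + OM x]^(1/2)  ->  2 du / [w u^6 + OM]^(1/2)
    i1 = simpson(lambda u: 2.0 / math.sqrt(w * u**6 + omega_m), 0.0, 1.0)
    # (x^2 - x^4) dx / [w x^4 + OM x]^(3/2)  ->  2 (u^2 - u^6) du / [w u^6 + OM]^(3/2)
    i2 = simpson(lambda u: 2.0 * (u**2 - u**6) / (w * u**6 + omega_m) ** 1.5, 0.0, 1.0)
    return i1, i2


def exponent(omega_m):
    """Exponent nu of Eq. (7), computed as I1**2/6 - I2/(2*I1)."""
    i1, i2 = integrals(omega_m)
    return i1**2 / 6.0 - i2 / (2.0 * i1)


if __name__ == "__main__":
    print("Omega_M   I1      I2      nu")
    for omega_m in (0.2, 0.3, 0.4, 1.0):
        i1, i2 = integrals(omega_m)
        print(f"{omega_m:<9.1f} {i1:<7.3f} {i2:<7.3f} {exponent(omega_m):.3f}")
```

For ΩM = 1 the integrals are elementary, I1 = 2 and I2 = 8/21, which gives ν = 4/7 and provides an exact check on the numerical routine.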

The behavior ℓ1 ∝ Ω0^{−4/7} near Ω0 = 1 for ΩM fixed at unity is close to the

behaviour ℓ1 ∝ Ω0^{−1/2} near Ω0 = 1 found [2,5] for ΩΛ fixed at zero, confirming

that ℓ1 is approximately a function of Ω0 alone for ΩΛ = 0 and ΩM near

unity. The fact that ν depends strongly on ΩM for smaller values of ΩM

shows that for observationally favored parameters ℓ1 is not approximately

a function of Ω0 alone. Indeed, there is no physical reason why ℓ1 should

be even approximately a function of Ω0 alone. For fixed values of ΩM less

than 0.4 the ℓn fall off with increasing Ω0 much more rapidly than would be

expected from Eq. (1), so the measurement of the positions of the Doppler

peaks provides a more stringent constraint on Ω0 than would be the case if

Eq. (1) were correct.

Acknowledgements

I am grateful for conversations with M. Kamionkowski, M. Roos, M.

Turner, and M. White. This research was supported in part by the Robert

A. Welch Foundation and NSF Grant PHY-9511632.

References

1. P. de Bernardis et al., Nature 404, 955 (2000).

2. M. Kamionkowski, D. N. Spergel, and N. Sugiyama, Ap. J. 426, L57

(1994).

3. S. Perlmutter et al., astro-ph/9812133, 9812473; B. P. Schmidt et al.,

Ap. J. 507, 46 (1998); A. G. Riess et al., astro-ph/9805200.

4. See, e. g., S. Weinberg, Gravitation and Cosmology (Wiley, New York,

1972), Eq. (15.5.39); E. W. Kolb and M. S. Turner, The Early Universe

(Addison-Wesley, Redwood City, CA, 1990), p. 505.

5. P. Frampton , Y. J. Ng, and R. Rohm, Mod. Phys. Lett. A13, 2541

(1998). There are aspects of this paper with which I disagree, but they

are not relevant to the present work.

6. N. A. Bahcall, J. P. Ostriker, S. Perlmutter, and P. J. Steinhardt,

Science 284, 1481 (1999).

7. M. S. Turner, in Cosmo-98: Second International Workshop on Parti-

cle Physics and the Early Universe, AIP Conference Proceedings 478

(American Institute of Physics, Woodbury, NY 1999), ed. by D. O.

Caldwell, p. 113.

8. M. Roos and S. M. Harun-or-Rashid, astro-ph/0005541 (2000).

9. Bahcall et al. [6] cited reference 2, while Turner [7] cited no reference for

Eq. (1). De Bernardis et al. [1] cited no references for Eq. (1), but relied

on references 2, 5, and 6. Roos and Harun-or-Rashid [8] also cited no

references, but took this formula from reference 1.

10. This formula is given, e. g., in reference 5.

11. M. White, D. Scott, and J. Silk, Ann. Rev. Astron. Astrophys. 32,

319 (1994): Appendix B; W. Hu and M. White, Astron. Astrophys.

315, 33 (1996).

UTTG-03-01
arXiv:astro-ph/0103279v2 16 Aug 2001

Fluctuations in the Cosmic Microwave Background I: Form Factors and their Calculation in Synchronous Gauge

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is shown that the fluctuation in the temperature of the cosmic microwave


background in any direction may be evaluated as an integral involving scalar
and dipole form factors, which incorporate all relevant information about
acoustic oscillations before the time of last scattering. A companion paper
gives asymptotic expressions for the multipole coefficient Cℓ in terms of these
form factors. Explicit expressions are given here for the form factors in a
simplified hydrodynamic model for the evolution of perturbations.


∗ Electronic address: weinberg@physics.utexas.edu
I. INTRODUCTION

The purpose of this paper is, first, to exhibit a general formalism, ex-
pressing the observed fluctuations in the cosmic microwave background tem-
perature in terms of a pair of form factors, and then to carry out an illustra-
tive approximate analytic calculation of these form factors. A companion
paper[1] gives general asymptotic formulas for the coefficient Cℓ of the ℓth
multipole term in the temperature correlation function for arbitrary form
factors, and also uses these formulas to calculate Cℓ for the form factors
found in the present paper.
In Section II we show that under very general assumptions the fractional
variation from the mean of the cosmic microwave background temperature
observed in a direction n̂ takes the form
$$\frac{\Delta T(\hat{n})}{T} = \int d^3k\; \epsilon_{\mathbf{k}}\, e^{i d_A \hat{n}\cdot\mathbf{k}} \left[ F(k) + i\,\hat{n}\cdot\hat{k}\; G(k) \right]. \tag{1}$$
Here dA is the angular diameter distance of the surface of last scattering,∗∗
and k2 ǫk is proportional (with a k-independent proportionality coefficient)
to the Fourier transform of the fractional perturbation in the total energy
density at early times. (There are additional terms in ∆T /T that arise from
times near the present, and chiefly affect the multipole coefficients Cℓ for
small ℓ, especially ℓ = 0 and ℓ = 1. These will be discussed in Section IV
and in the Appendix. Effects from a changing gravitational field soon after
the time of last scattering are included in Eq. (1).)
One advantage of this formalism is that it provides a nice separation
between the three different kinds of effect that influence the observed tem-
perature fluctuation, that arise in three different eras: (1) at very early
times, (2) during the era of acoustic oscillations, and (3) from the time of
last scattering to the present:

1. The k-dependence of the unprocessed fluctuation amplitude ǫk reflects
the space-dependence of fluctuations in the energy density at very early
times. The average of the product of two ǫs is assumed to satisfy the
conditions of statistical homogeneity and isotropy:
$$\langle \epsilon_{\mathbf{k}}\, \epsilon_{\mathbf{k}'} \rangle = \delta^3(\mathbf{k} + \mathbf{k}')\, \mathcal{P}(k) \tag{2}$$
∗∗ Note that in speaking of a surface of last scattering, we are not necessarily assuming
that the transition from opacity to transparency takes place instantaneously. The physical
wave number vector k varies with time as 1/a(t) (where a(t) is the Robertson–Walker scale
factor), while for large redshifts dA varies as a(t), so the product dA k is nearly independent
of what we choose as a nominal time of last scattering.

with $k \equiv |\mathbf{k}|$. Since the reality of the fluctuations in the energy density
requires that $\epsilon^*_{\mathbf{k}} = \epsilon_{-\mathbf{k}}$, the power spectral function $\mathcal{P}(k)$ is real and
positive. It is common to assume a “straight” spectrum
$$\mathcal{P}(k) \propto k^{n-4} \tag{3}$$
For instance, the ‘scale-invariant’ n = 1 form[2] suggested by theories
of new inflation[3] is:
$$\mathcal{P}(k) = B\, k^{-3}, \tag{4}$$
with B a constant that must be taken from observations of the cosmic
microwave background or condensed object mass distributions, or from
detailed theories of inflation.

2. The form factors F (k) and G(k) characterize acoustic oscillations, with
F (k) arising from the Sachs–Wolfe effect and intrinsic temperature
fluctuations, and G(k) arising from the Doppler effect.

3. Taking account of a constant vacuum energy density as well as cold
matter in the time after last scattering, it is easy to calculate the
angular diameter distance of the surface of last scattering:
$$d_A = \frac{1}{\Omega_C^{1/2} H_0 (1+z_L)} \sinh\left[ \Omega_C^{1/2} \int_{1/(1+z_L)}^{1} \frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_C x^2 + \Omega_M x}} \right], \tag{5}$$
where zL ≃ 1100 is the redshift of last scattering, ΩC ≡ 1 − ΩΛ − ΩM ,
and ΩΛ and ΩM are as usual the present ratios of the energy densities
of the vacuum and matter to the critical density 3H02 /8πG.
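
As an illustration of item 3, Eq. (5) can be evaluated numerically. The following is a minimal sketch, not part of the original calculation: the function name, the Simpson integrator, the substitution x = y² (used to tame the 1/√x behaviour of the integrand near the lower limit), and the choice of units c/H0 = 1 are all assumptions made here. For ΩC < 0 the sinh is continued analytically to a sine, and for ΩC = 0 it reduces to the integral itself.

```python
import math

def angular_diameter_distance(omega_m, omega_lambda, z_l=1100.0, n=2000):
    """Eq. (5) in units of c/H0 (matter, vacuum and curvature only).

    Illustrative helper; name and numerical method are assumptions.
    n must be even (Simpson's rule).
    """
    omega_c = 1.0 - omega_lambda - omega_m

    # With x = y**2 the integrand dx / sqrt(OL x^4 + OC x^2 + OM x)
    # becomes 2 dy / sqrt(OL y^6 + OC y^2 + OM), which is smooth.
    def f(y):
        return 2.0 / math.sqrt(omega_lambda * y**6 + omega_c * y**2 + omega_m)

    a, b = (1.0 + z_l) ** -0.5, 1.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    integral = total * h / 3.0

    if abs(omega_c) < 1e-12:          # flat limit of sinh(sqrt(OC) I)/sqrt(OC)
        r = integral
    elif omega_c > 0.0:               # open model, as written in Eq. (5)
        root = math.sqrt(omega_c)
        r = math.sinh(root * integral) / root
    else:                             # closed model: sinh continues to sin
        root = math.sqrt(-omega_c)
        r = math.sin(root * integral) / root
    return r / (1.0 + z_l)
```

For ΩM = 1, ΩΛ = 0 the integral is elementary, which gives a simple check of the routine against 2[1 − (1 + zL)^(−1/2)]/(1 + zL).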

We note in particular that F (k) and G(k) depend on ΩM h2 and on the


baryon density parameter ΩB h2 (where h is the Hubble constant in units of
100 km/sec/Mpc), but since the curvature and vacuum energy were negligi-
ble at and before last scattering, F (k) and G(k) are essentially independent
of the present curvature and of ΩΛ . The exponent n in P(k) is expected to
be independent of all these parameters. On the other hand, dA is affected by
whatever governed the paths of light rays since the time of last scattering,
so it depends on ΩM , ΩΛ , and the curvature parameter ΩC , but it is essen-
tially independent of quantities like the baryon density parameter ΩB that
affect acoustic oscillations before the time of last scattering. In quintessence
theories dA would be given by a different formula, but P(k) and the form

factors would be essentially unchanged as long as the quintessence energy
density is a small part of the total energy density at and before the time of
last scattering.
Another advantage of this formalism is that, although Cℓ must be cal-
culated by a numerical integration, it is possible to give approximate ana-
lytic expressions for the form factors in terms of elementary functions. The
detailed confrontation of observation and theory must necessarily be done
using computer codes that take into account all relevant astrophysical and
observational effects.[4] Nevertheless, there is some value in also having an
analytic treatment that, though not as accurate as possible, is as simple as
possible while still capturing the main features of what is going on. The
point is not to compete with the computer codes, but rather to gain some
feeling for what is going on, in order to help us judge how predictions for
the cosmic microwave background fluctuations may change with alterations
in the underlying assumptions.
Analytic treatments of fluctuations in the cosmic microwave background
already exist in the literature[5]. Our main purpose in going over the same
ground here is not to give a more accurate or comprehensive treatment of
acoustic oscillations, but to obtain simple expressions for the form factors
as examples to which to apply the asymptotic formulas for Cℓ derived in
reference [1]. To derive analytic expressions for the temperature fluctuation
it is necessary to neglect the contribution of radiation and neutrinos to
the gravitational field, which should be a fair approximation near the first
Doppler peak but not much beyond that. We employ a purely hydrodynamic
treatment, relying on the Boltzmann equation only implicitly in the values
used for the shear viscosity and heat conduction coefficients; the effects of
viscosity and heat conduction are included from the beginning, not just by
inserting damping factors; and ‘Landau’ damping due to the finite duration
of the era of last scattering is included along with ‘Silk’ damping due to
shear viscosity and heat conduction. As far as I know, this is the first work
to obtain explicit analytic expressions for the temperature fluctuations that
are correct within these approximations.
Section III presents an analytic calculation of the evolution of pertur-
bations in synchronous gauge up to the time of last scattering, which is
then used in Section IV to calculate the form factors. For very small wave
numbers the form factors are found to be

$$F(k) \to 1 - 3k^2 t_L^2/2 - 3\left[ -\xi^{-1} + \xi^{-2}\ln(1+\xi) \right] k^4 t_L^4/4 + \dots, \tag{6}$$

$$G(k) \to 3kt_L - 3k^3 t_L^3/2(1+\xi) + \dots, \tag{7}$$

while for wave numbers large enough to allow the use of the WKB
approximation, i.e., $k t_L \ge \xi$, the form factors are[6]
$$F(k) = (1 + 2\xi/k^2 t_L^2)^{-1} \left[ -3\xi + 2\xi/k^2 t_L^2 + (1+\xi)^{-1/4}\, e^{-k^2 d_D^2} \cos(k d_H) \right], \tag{8}$$
and
$$G(k) = \sqrt{3}\,(1+\xi)^{-3/4} (1 + 2\xi/k^2 t_L^2)^{-1}\, e^{-k^2 d_D^2} \sin(k d_H). \tag{9}$$
Here $t_L$ is the time of last scattering; $\xi$ is 3/4 the ratio of the baryon to
photon energy densities at this time:
$$\xi = \left( \frac{3\rho_B}{4\rho_\gamma} \right)_{t=t_L} \simeq 27\, \Omega_B h^2; \tag{10}$$

dH is the acoustic horizon size at this time, given by Eq. (75), and dD is a
damping length, given by Eq. (89).
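
The limiting forms (6)–(9) are simple enough to tabulate directly. The sketch below is only an illustration: the function names are invented here, and dH and dD are passed in as plain numbers rather than computed from the expressions the text refers to.

```python
import math

def form_factors_small_k(k_tl, xi):
    """Eqs. (6) and (7): leading terms for k*t_L << 1 (argument is k*t_L)."""
    x = k_tl
    big_f = (1.0 - 1.5 * x**2
             - 0.75 * (-1.0 / xi + math.log(1.0 + xi) / xi**2) * x**4)
    big_g = 3.0 * x - 1.5 * x**3 / (1.0 + xi)
    return big_f, big_g

def form_factors_wkb(k, t_l, xi, d_h, d_d):
    """Eqs. (8) and (9): WKB regime, k*t_L >= xi.

    d_h (acoustic horizon) and d_d (damping length) are inputs here;
    they are not derived inside this sketch.
    """
    x2 = (k * t_l) ** 2
    prefactor = 1.0 / (1.0 + 2.0 * xi / x2)
    damping = math.exp(-(k * d_d) ** 2)
    big_f = prefactor * (-3.0 * xi + 2.0 * xi / x2
                         + (1.0 + xi) ** -0.25 * damping * math.cos(k * d_h))
    big_g = (math.sqrt(3.0) * (1.0 + xi) ** -0.75
             * prefactor * damping * math.sin(k * d_h))
    return big_f, big_g
```

Keeping the two regimes as separate functions mirrors the text, which gives no interpolation between them.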

II. FORM FACTORS

We first justify Eq. (1) under very general assumptions, not limited to
those of Section III. At and before the time of last scattering the spatial
curvature was negligible, so small perturbations in the cosmic metric and
in all particle distributions at these times may conveniently be expressed
as Fourier transforms of functions of a co-moving wave number vector q
and the time t. Effects like pressure forces that involve spatial gradients
are important for a given q only when the physical wave number q/a(t) is
at least as large as the cosmic expansion rate, which is of order 1/t. Since
a(t) vanishes for t → 0 no more rapidly than √t, the ratio qt/a(t) vanishes
as t → 0, so whatever the value of q, there will always be some time early
enough so that pressure forces and other effects of spatial gradients are
negligible. At such early times, perturbations grow or decay with powers of time.
Generically there is one most rapidly growing mode, and this is the one that
eventually grows into the perturbations seen at the time of last scattering.
Since the equations for the time dependence of the perturbations are linear,
the Fourier transforms of all perturbations to the metric and particle dis-
tributions during the era of last scattering will then be proportional to the
Fourier transform eq of any one of these perturbations at any sufficiently
early time. For definiteness, we can take eq to be q −2 times the Fourier
transform of the fractional perturbation to the total energy density at some
very early time, a choice that will prove to be convenient in Section IV.

Since the fractional change in the observed microwave background tem-
perature seen in a direction n̂ is linear in the perturbations to the metric
and photon and matter distributions at various times during the era of last
scattering, it can be written as
$$\frac{\Delta T(\hat{n})}{T} = \int dt \int d^3q\; e^{i\mathbf{q}\cdot\hat{n}\, r(t)}\, e_{\mathbf{q}}\, \mathcal{J}(q, \hat{q}\cdot\hat{n}, t), \tag{11}$$
where r(t) is the co-moving radial coordinate of a source scattering light at
time t that would be received at the present. Note that the quantity J can
depend on q only through the scalars q and q̂ · n̂, because the differential
equations governing the growth of perturbations are rotationally invariant,
even though the initial fluctuation amplitude eq is not.
We can make a great simplification in Eq. (11) by taking advantage of
the fact that the Robertson–Walker radial coordinate r(t) is nearly constant
during the era of last scattering. Using equilibrium statistical mechanics to
calculate the hydrogen ionization, and simple Thomson scattering to calcu-
late scattering probabilities, one finds that for ΩB /ΩM = 0.2, the probability
that a photon will never again be scattered rises from 2% at 3360◦ K to 98%
at 2780◦ K. (This depends on the assumed value of ΩB /ΩM , but very weakly;
for instance, for ΩB /ΩM = 0.12, the probability of no future scattering rises
from 2% to 98% as the temperature drops from 3400◦ K to 2810◦ K.) For
definiteness, we will round off these temperatures, taking the era of last
scattering to extend from a temperature of 3400◦ K down to 2800◦ K, corre-
sponding to a drop in redshift z from 1220 to 1010. The radial coordinate
can be expressed in terms of z by the well-known formula
$$r(t_z) = \frac{1}{\Omega_C^{1/2} H_0\, a(t_0)} \sinh\left[ \Omega_C^{1/2} \int_{1/(1+z)}^{1} \frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_C x^2 + \Omega_M x}} \right], \tag{12}$$

where ΩC ≡ 1 − ΩΛ − ΩM , a(t) is the Robertson–Walker scale factor, tz is


the time corresponding to redshift z, and t0 is the present. This approaches
a constant limit for z → ∞, and therefore varies very little in the range
from z = 1010 to z = 1220 (or, for that matter, even in the range from
z = 1010 to z → ∞.) For instance, if we take the popular values ΩM = 0.3
and ΩΛ = 0.7, then the fractional change in r(tz ) as z drops from 1220 to
1010 is 0.0034. We can therefore take the exponential in Eq. (11) outside
the time integral, so that
$$\frac{\Delta T(\hat{n})}{T} = \int d^3q\; e^{i\mathbf{q}\cdot\hat{n}\, r(t_L)}\, e_{\mathbf{q}} \int dt\; \mathcal{J}(q, \hat{q}\cdot\hat{n}, t), \tag{13}$$
where tL is any conveniently chosen time during the era of last scattering,
say at a redshift zL = 1100.
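
The near-constancy of r(tz) over the era of last scattering is easy to check numerically. The sketch below assumes the flat example quoted in the text (ΩM = 0.3, vacuum energy 0.7, so ΩC = 0 and r is proportional to the integral in Eq. (12)); the function names and the Simpson integration after the substitution x = y² are choices made here, not taken from the paper. The result should come out at a few parts in a thousand, in line with the value quoted in the text.

```python
import math

def comoving_integral(z, omega_m=0.3, omega_lambda=0.7, n=2000):
    """Integral in Eq. (12) for a flat model (Omega_C = 0).

    For Omega_C = 0 the sinh drops out, so r(t_z) is proportional
    to this integral. n must be even (Simpson's rule).
    """
    def f(y):  # integrand after substituting x = y**2
        return 2.0 / math.sqrt(omega_lambda * y**6 + omega_m)

    a, b = (1.0 + z) ** -0.5, 1.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def fractional_change(z_hi=1220.0, z_lo=1010.0):
    """Fractional change in r(t_z) between the two ends of the era."""
    r_hi = comoving_integral(z_hi)
    r_lo = comoving_integral(z_lo)
    return (r_hi - r_lo) / r_lo
```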
It is convenient to replace the co-moving wave number vector q with the
physical wave number at last scattering:

k ≡ q/a(tL ) . (14)

Eq. (13) may then be written

$$\frac{\Delta T(\hat{n})}{T} = \int d^3k\; e^{i\mathbf{k}\cdot\hat{n}\, d_A}\, \epsilon_{\mathbf{k}} \int dt\; J(k, \hat{k}\cdot\hat{n}, t), \tag{15}$$

where $J(k, \hat{k}\cdot\hat{n}, t) \equiv a(t_L)^3\, \mathcal{J}(q, \hat{q}\cdot\hat{n}, t)$, $\epsilon_{\mathbf{k}} \equiv e_{\mathbf{q}}$, and $d_A \equiv r(t_L)\,a(t_L)$ is the
angular diameter distance of the surface of last scattering.
Part of the observed temperature fluctuation arises from perturbations
in scalar quantities, like the gravitational potential and the intrinsic
temperature, and therefore makes a contribution to J that is independent of
n̂. Another part arises from fluctuations in a vector, the velocity of the
baryon–electron plasma, and therefore makes a contribution that is linear
in n̂. Leaving aside other effects like gravitational radiation, the function J
therefore takes the form

$$J(k, \hat{k}\cdot\hat{n}, t) = \mathcal{F}(k, t) + i\,\hat{k}\cdot\hat{n}\; \mathcal{G}(k, t), \tag{16}$$

with $\mathcal{F}(k, t)$ arising from the Sachs–Wolfe effect and intrinsic temperature
fluctuations, and $\mathcal{G}(k, t)$ arising from the Doppler effect. Using this in
Eq. (15) then gives

$$\frac{\Delta T(\hat{n})}{T} = \int d^3k\; e^{i\mathbf{k}\cdot\hat{n}\, d_A} \left[ F(k) + i\,\hat{k}\cdot\hat{n}\; G(k) \right] \epsilon_{\mathbf{k}}, \tag{17}$$
which is the same as Eq. (1), with the form factors identified as time integrals
$$F(k) \equiv \int dt\; \mathcal{F}(k, t), \qquad G(k) \equiv \int dt\; \mathcal{G}(k, t). \tag{18}$$

This time integration introduces a damping of the oscillatory part of the form
factors, but this will be less important than the effects of heat conduction
and viscosity in the time interval between recombination and last scattering.

III. EVOLUTION OF PERTURBATIONS IN SYNCHRONOUS GAUGE

We now turn to the approximate analytic calculation of the form factors.

A. General Approximations

We make two assumptions that will allow great simplifications in this


calculation:
1. The contents of the universe up to the time of last scattering are taken
to consist of collisionless cold dark matter, collisionless neutrinos, a
baryon–electron plasma treated as a perfect fluid, and a photon gas
coupled to the plasma by Thomson scattering, with a short but non-
negligible photon mean free time. The finite duration of the era of
last scattering, when the mean free time becomes too large to allow
a hydrodynamic treatment, will be taken into account by the time
integrals in Eq. (18).

2. It is assumed that only the cold dark matter contributes to the ex-
pansion rate of the universe before the time of last scattering and to
perturbations in the metric. This is not a very good approximation,
but it is the price that has to be paid to get analytic expressions for
the observed temperature fluctuation. To minimize errors introduced
by the incorrect treatment of acoustic oscillations before the cross-over
time tC when the photon energy density equaled the dark matter en-
ergy density, it is necessary to restrict the wave number to be less than
an upper bound given in Section V.

B. Gravitational Field

We begin by reminding the reader of the equations that govern
perturbations in the metric and fluid properties before the time of last scattering.
The perturbed metric is taken as

$$g_{\mu\nu\,\rm total}(\mathbf{x}, t) = g_{\mu\nu}(t) + h_{\mu\nu}(\mathbf{x}, t), \tag{19}$$

where $g_{\mu\nu}$ is the Robertson–Walker metric in co-moving coordinates $\mathbf{x}$ with
spatial curvature neglected:

$$g_{00} = -1, \qquad g_{0i} = 0, \qquad g_{ij} = a^2(t)\,\delta_{ij}, \tag{20}$$

and $h_{\mu\nu}(\mathbf{x}, t)$ is a small perturbation. We work in synchronous gauge, defined
by the conditions
$$h_{0i} = h_{00} = 0, \tag{21}$$
and by the requirement that the cold dark matter particles have
time-independent spatial coordinates. These conditions leave an unbroken
residual gauge invariance, under the transformation
$$h_{ij} \to h_{ij} + a^2 \left( \frac{\partial e_i}{\partial x^j} + \frac{\partial e_j}{\partial x^i} \right), \tag{22}$$
with $e_i$ an arbitrary function of $\mathbf{x}$ but independent of t. As we will see, in
synchronous gauge the evolution of the compressional modes that concern
us here depends on the gravitational field only through a quantity that is
invariant under these transformations:
$$\psi \equiv \frac{\partial}{\partial t} \left( \frac{h_{kk}}{2a^2} \right). \tag{23}$$
The spatial curvature is negligible at and before the time of last scattering,
so it will be convenient to express ψ(x, t) as a Fourier transform:
$$\psi(\mathbf{x}, t) = \int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\, \psi_q(t). \tag{24}$$

Likewise, the total proper energy density of each of the constituents of the
universe (labelled f = D, B, γ for dark matter, the baryon–electron plasma,
and photons, respectively) is written
$$\rho_{f\,\rm total}(\mathbf{x}, t) = \rho_f(t) + \delta\rho_f(\mathbf{x}, t), \qquad \delta\rho_f(\mathbf{x}, t) = \int d^3q\; \rho_{fq}(t)\, e^{i\mathbf{q}\cdot\mathbf{x}}, \tag{25}$$
with quantities carrying a subscript q denoting small perturbations. Under
our second assumption, the gravitational field equation in synchronous gauge
reads[7]
$$\frac{d}{dt}\left( a^2 \psi_q \right) = -4\pi G a^2 \rho_{Dq}. \tag{26}$$
C. Dark Matter Perturbations

The dark matter particles are assumed to ride on the expanding
coordinate mesh, with negligible peculiar velocities. (This is not affected by
perturbations to the gravitational field, because in synchronous gauge these
perturbations leave $\Gamma^i_{00}$ zero.) Hence their energy-momentum tensor has
only a 00-component, $T_D^{00} = \rho_{D\,\rm total}$. For the metric (19)–(21), the energy
conservation equation $T_D^{0\mu}{}_{;\mu} = 0$ then reads

$$\frac{d\rho_{Dq}}{dt} + \frac{3\dot{a}}{a}\,\rho_{Dq} + \psi_q\,\rho_D = 0$$

or in other words
$$\frac{d\delta_{Dq}}{dt} = -\psi_q, \tag{27}$$
where $\delta_{Dq}$ is the fractional dark matter density perturbation:

$$\delta_{Dq} \equiv \rho_{Dq}/\rho_D. \tag{28}$$

Combining Eqs. (26) and (27) gives

$$\frac{d}{dt}\left( a^2 \frac{d\delta_{Dq}}{dt} \right) = 4\pi G a^2 \rho_D\, \delta_{Dq}. \tag{29}$$

During the dark matter dominated era, $a \propto t^{2/3}$ and $4\pi G\rho_D = 2/(3t^2)$, so
Eq. (29) can be written
$$\frac{d}{dt}\left( t^{4/3} \frac{d\delta_{Dq}}{dt} \right) = \frac{2}{3}\, t^{-2/3}\, \delta_{Dq}. \tag{30}$$

As is well known, the two solutions go as $t^{-1}$ and $t^{2/3}$. If these two modes
have comparable strengths for very small t, then the relevant solution is the
one that is most rapidly growing, which we shall write as
$$\delta_{Dq} = N_q\, t^{2/3}, \qquad \psi_q = -\frac{2}{3} N_q\, t^{-1/3}. \tag{31}$$
(The normalization constant Nq will play a role in this section similar to
that of the constant eq in Section II.)
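
The two power-law modes quoted above follow from inserting δ ∝ t^p into Eq. (30): the left side becomes p(p + 1/3) t^(p−2/3), so p must satisfy p² + p/3 − 2/3 = 0. The short sketch below (function names are illustrative, introduced only here) solves this quadratic and evaluates the residual of Eq. (30) for a trial power law.

```python
import math

def growth_exponents():
    """Roots p of p**2 + p/3 - 2/3 = 0, from delta ~ t**p in Eq. (30)."""
    a, b, c = 1.0, 1.0 / 3.0, -2.0 / 3.0
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)

def residual_eq30(p, t):
    """LHS minus RHS of Eq. (30) for delta = t**p, using exact derivatives.

    d/dt(t^(4/3) * p * t^(p-1)) = p * (p + 1/3) * t^(p - 2/3).
    """
    lhs = p * (p + 1.0 / 3.0) * t ** (p - 2.0 / 3.0)
    rhs = (2.0 / 3.0) * t ** (-2.0 / 3.0) * t ** p
    return lhs - rhs
```

The growing root reproduces the exponent in Eq. (31), and Eq. (27) then fixes ψq as minus the time derivative of that mode.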

D. Plasma and Photon Perturbations

Next, let us consider the imperfect fluid formed by the baryon–electron
plasma and the photons. It has a total velocity four-vector of the form
$$U^\mu_{\rm total}(\mathbf{x}, t) = U^\mu + \int d^3q\; U^\mu_q(t)\, e^{i\mathbf{q}\cdot\mathbf{x}} \tag{32}$$

where $U^\mu$ is the unperturbed velocity four-vector

$$U^0 = 1, \qquad U^i = 0, \tag{33}$$

and $U^\mu_q(t)$ is a small perturbation. The normalization condition
$g_{\mu\nu\,\rm total}\, U^\mu_{\rm total}\, U^\nu_{\rm total} = -1$ tells us that the first-order perturbations are purely spatial:

$$U^0_q(t) = 0. \tag{34}$$

We will be considering only compressional modes, so we will assume that
the spatial part of $U^\mu_{\rm total}$ is the gradient of a velocity potential u:
$$U^i_q(t) = i\, q^i\, u_q(t). \tag{35}$$
We will write the conservation laws for this fluid in terms of fractional
perturbations to the baryon–electron plasma mass density and the photon energy
density:
$$\delta_{Bq} \equiv \rho_{Bq}/\rho_B, \qquad \delta_{\gamma q} \equiv \rho_{\gamma q}/\rho_\gamma. \tag{36}$$
The particle conservation equation[8] for the baryon–electron plasma mass
density is then
$$\frac{d\delta_{Bq}}{dt} = q^2 u_q - \psi_q. \tag{37}$$
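The energy conservation equation[9] for the baryon–electron–photon fluid is
$$\frac{d}{dt}\left( \rho_B \delta_{Bq} + \rho_\gamma \delta_{\gamma q} \right) + \frac{3\dot{a}}{a}\left( \rho_B \delta_{Bq} + \frac{4}{3}\rho_\gamma \delta_{\gamma q} \right) = -\left( \rho_B + \frac{4}{3}\rho_\gamma \right)\left( \psi_q - q^2 u_q \right) - \chi q^2 \left( \dot{T} u_q + \frac{T}{a^2}\frac{d(a^2 u_q)}{dt} + \frac{T \delta_{\gamma q}}{4a^2} \right), \tag{38}$$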
where T is the unperturbed photon temperature and χ is the coefficient of
heat conduction caused by photon energy transport. Finally, the momentum
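conservation equation[10] is
$$\left( \frac{d}{dt} + 16\pi G\eta \right)\left[ -a^5\left( \rho_B + \frac{4}{3}\rho_\gamma - \chi\dot{T} \right) u_q + \chi T a^3 \left( \frac{\delta_{\gamma q}}{4} + \frac{d}{dt}\left( a^2 u_q \right) \right) \right] = \frac{1}{3}\, a^3 \rho_\gamma \delta_{\gamma q} - \frac{4\eta a^3}{3}\left[ -q^2 u_q + \psi_q \right], \tag{39}$$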
where η is the coefficient of viscosity due to photon momentum transport.
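By using Eq. (37) and recalling that $\rho_B \propto a^{-3}$, $\rho_\gamma \propto a^{-4}$, and $T \propto a^{-1}$, we
can simplify Eq. (38) to read
$$\frac{d}{dt}\left( \delta_{\gamma q} - \frac{4}{3}\delta_{Bq} \right) = \frac{\chi T}{\rho_\gamma}\left[ -\frac{1}{a}\frac{\partial}{\partial t}\Big( a\,(\dot{\delta}_{Bq} + \psi_q) \Big) - \frac{q^2 \delta_{\gamma q}}{4a^2} \right]. \tag{40}$$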

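Also, using Eq. (37) lets us write Eq. (39) as
$$\left( \frac{d}{dt} + 16\pi G\eta \right)\left[ a^5\left( \rho_B + \frac{4}{3}\rho_\gamma - \chi\dot{T} \right)\left( \dot{\delta}_{Bq} + \psi_q \right) - \chi T a^3 \left( \frac{q^2 \delta_{\gamma q}}{4} + \frac{d}{dt}\Big( a^2 (\dot{\delta}_{Bq} + \psi_q) \Big) \right) \right] = -\frac{q^2 a^3}{3}\left[ \rho_\gamma \delta_{\gamma q} + 4\eta\, \dot{\delta}_{Bq} \right]. \tag{41}$$
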
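Now, $\eta/\rho_\gamma$ and $\chi T/\rho_\gamma$ are of the order of the photon mean free time,
which as long as hydrodynamics is applicable must be short compared with
the cosmic age. Therefore we can neglect η and χ everywhere, except where
they are accompanied by a maximum number of space and/or time derivatives
of $\delta_{Bq}$ or $\delta_{\gamma q}$, in which case powers of a high wave number can
compensate for the smallness of χ or η. Then Eqs. (40) and (41) simplify further
to
$$\frac{d}{dt}\left( \delta_{\gamma q} - \frac{4}{3}\delta_{Bq} \right) = \frac{\chi T}{\rho_\gamma}\left[ -\ddot{\delta}_{Bq} - \frac{q^2 \delta_{\gamma q}}{4a^2} \right], \tag{42}$$
$$\frac{d}{dt}\left[ a^5\left( \rho_B + \frac{4}{3}\rho_\gamma \right)\left( \dot{\delta}_{Bq} + \psi_q \right) \right] - \chi T a^3 \left( \frac{q^2 \dot{\delta}_{\gamma q}}{4} + a^2 \frac{d^3\delta_{Bq}}{dt^3} \right) = -\frac{q^2 a^3}{3}\rho_\gamma \delta_{\gamma q} - \frac{4q^2 \eta a^3}{3}\dot{\delta}_{Bq}. \tag{43}$$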
3 3
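We also neglect terms of second order in χ and/or η, so we can set $\dot{\delta}_{Bq}$
equal to $3\dot{\delta}_{\gamma q}/4$ in the dissipative terms in Eqs. (42) and (43). Then using
Eq. (42) to eliminate $\delta_{Bq}$ in Eq. (43) gives our differential equation for $\delta_{\gamma q}$:
$$\frac{d}{dt}\left[ a^5\left( \rho_B + \frac{4}{3}\rho_\gamma \right)\left( \frac{3}{4}\frac{d\delta_{\gamma q}}{dt} + \psi_q \right) \right] + \frac{3a^5 \chi T \rho_B}{4\rho_\gamma}\left( \frac{3}{4}\frac{d^3\delta_{\gamma q}}{dt^3} + \frac{q^2}{4a^2}\frac{d\delta_{\gamma q}}{dt} \right) = -\frac{q^2 a^3}{3}\rho_\gamma \delta_{\gamma q} - \eta q^2 a^3 \frac{d\delta_{\gamma q}}{dt}. \tag{44}$$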
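It will be convenient to multiply with $t^{4/3}/a^5\rho_\gamma$. Recalling that $\rho_\gamma \propto a^{-4}$,
$\rho_B \propto a^{-3}$ and $a \propto t^{2/3}$, this gives finally
$$t^{2/3}\frac{d}{dt}\left[ (1+R)\, t^{2/3}\frac{d\delta_{\gamma q}}{dt} \right] + \frac{k^2 t_L^{4/3}}{3}\delta_{\gamma q} + \frac{\eta k^2 t_L^{4/3}}{\rho_\gamma}\frac{d\delta_{\gamma q}}{dt} + \frac{R\chi T}{4\rho_\gamma}\left( 3t^{4/3}\frac{d^3\delta_{\gamma q}}{dt^3} + k^2 t_L^{4/3}\frac{d\delta_{\gamma q}}{dt} \right) = -\frac{4t^{2/3}}{3}\frac{d}{dt}\left[ (1+R)\, t^{2/3}\psi_q \right] = \frac{8N_q}{27}(1+3R), \tag{45}$$
where (as before) $t_L$ is some typical time of last scattering, $k \equiv q/a(t_L) = q\,t^{2/3}/t_L^{2/3}a(t)$, and
$$R \equiv 3\rho_B/4\rho_\gamma \propto a. \tag{46}$$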
We next turn to two different ranges of wave number in which it is
possible to find an analytic solution of this equation.

E. Solution for large k

We consider first wave numbers that are large enough to allow the use
of the WKB approximation. For this purpose, we introduce a new variable

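$$\zeta \equiv (t/t_L)^{1/3}. \tag{47}$$

(This is the usual conformal time η, but with a different normalization.)
Multiplying Eq. (45) with $9t_L^{2/3}$ then gives

$$\frac{d}{d\zeta}\left[ (1+\xi\zeta^2)\frac{d\delta_{\gamma q}}{d\zeta} \right] + 3k^2 t_L^2\, \delta_{\gamma q} + \frac{3\eta k^2 t_L}{\rho_\gamma \zeta^2}\frac{d\delta_{\gamma q}}{d\zeta} + \frac{\xi\chi T}{4\rho_\gamma}\left( \frac{1}{t_L}\frac{d^3\delta_{\gamma q}}{d\zeta^3} + 3k^2 t_L \frac{d\delta_{\gamma q}}{d\zeta} \right) = \frac{8N_q t_L^{2/3}}{3}\left( 1 + 3\xi\zeta^2 \right). \tag{48}$$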

Here we have again assumed that dissipative terms are negligible except
where a maximum number of derivatives (i.e., factors of k and/or ζ-derivatives)
acts on δγq . We have also used the fact that R ∝ a to set R = ξζ 2 , where ξ
is the ratio (46) at time tL .
In the absence of dissipation, Eq. (48) would have the exact solution
$$\delta_{\gamma q} = \frac{8N_q t_L^{2/3}\,(1 + 3\xi\zeta^2)}{9(k^2 t_L^2 + 2\xi)}. \tag{49}$$

(This is actually independent of our choice of $t_L$, because ξ and $k^2 t_L^2$ both
scale as $t_L^{2/3}$.) The neglect of dissipation is justified in this solution, because
the rate of change of this expression does not yield a factor of the large wave
number k that could compensate for the smallness of χ and η.
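
That Eq. (49) solves the dissipation-free form of Eq. (48) can be confirmed by direct substitution. The sketch below does this numerically with central differences; it is only an illustration, and the function names as well as the shorthand amp, standing for the combination 8 Nq tL^(2/3)/3, are introduced here.

```python
def particular_solution(zeta, kt, xi, amp=1.0):
    """Eq. (49); amp stands for 8 N_q t_L^(2/3) / 3 and kt for k * t_L."""
    return amp * (1.0 + 3.0 * xi * zeta**2) / (3.0 * (kt**2 + 2.0 * xi))

def residual(zeta, kt, xi, amp=1.0, h=1e-3):
    """LHS minus RHS of Eq. (48) with eta = chi = 0, by central differences."""
    def flux(z):
        # (1 + xi z^2) * d(delta)/dz, the quantity inside the outer derivative
        slope = (particular_solution(z + h, kt, xi, amp)
                 - particular_solution(z - h, kt, xi, amp)) / (2.0 * h)
        return (1.0 + xi * z**2) * slope

    lhs = ((flux(zeta + h) - flux(zeta - h)) / (2.0 * h)
           + 3.0 * kt**2 * particular_solution(zeta, kt, xi, amp))
    rhs = amp * (1.0 + 3.0 * xi * zeta**2)
    return lhs - rhs
```

The residual vanishes up to discretization error for any ζ, which reflects the fact that the ζ-dependence of Eq. (49) is the same factor 1 + 3ξζ² that appears on the right of Eq. (48).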
To this particular solution, we must add a suitable solution of the
corresponding homogeneous equation. In the absence of damping we can find
exact solutions of the form $P_\nu(i\sqrt{\xi}\,\zeta)$, where $P_\nu(z)$ is the usual Legendre
function, and ν is either of the roots of the quadratic equation $\nu(\nu+1) = -3k^2 t_L^2/\xi$.
But this will not be useful in calculating the Cℓ in our companion
paper [1]. To get a more useful result, we must use the WKB approximation.
Under the assumption that

$$k t_L \ge \xi \tag{50}$$

we can find a pair of approximate solutions of the homogeneous equation

$$\delta_{\gamma q} \propto \exp(\pm i\varphi) \tag{51}$$

with
$$\varphi = \sqrt{3}\, k t_L \int_0^\zeta \frac{d\zeta}{\sqrt{1 + \xi\zeta^2}} \tag{52}$$
provided we neglect dissipative terms. Note that if ξ as well as η and χ were
zero, then these homogeneous solutions would be exact. More generally,
inspection of Eq. (48) shows that these are approximate solutions if the
fractional rate of change of $1 + \xi\zeta^2$ is small compared with the rate of change
of the phase φ:
$$\frac{2\xi\zeta}{1 + \xi\zeta^2} \le \frac{\sqrt{3}\, k t_L}{\sqrt{1 + \xi\zeta^2}},$$
which is true at all times if and only if it is satisfied at ζ = 1, i.e.,
$$k t_L \ge \frac{2\xi}{\sqrt{3(1+\xi)}}.$$
For plausible values of ξ this condition is actually somewhat weaker than
Eq. (50), but we will need the greater strength of Eq. (50) later, when we
calculate the plasma velocity potential.
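We can do better than Eq. (51), and include the effects of viscosity and
heat conduction, by seeking solutions of the form $\delta_{\gamma q} = A \exp(\pm i\varphi)$, with
A a slowly varying real amplitude. By calculating the rate of change of the
Wronskian of these two solutions (and replacing $d^3\delta_{\gamma q}/d\zeta^3$ with
$-3k^2 t_L^2 (1+\xi\zeta^2)^{-1}\, d\delta_{\gamma q}/d\zeta$ in the dissipative term), we easily find the WKB solutions of
the homogeneous equation:
$$\delta_{\gamma q} \propto (1 + \xi\zeta^2)^{-1/4} \exp\left[ \pm i\varphi - k^2 D^2 \right], \tag{53}$$

where
$$D^2 = 3t_L \int_0^\zeta \left[ \frac{\eta}{2\rho_\gamma (1+\xi\zeta^2)} + \frac{\chi T\, \xi^2 \zeta^4}{8\rho_\gamma (1+\xi\zeta^2)^2} \right] \frac{d\zeta}{\zeta^2}. \tag{54}$$
The viscosity and heat conduction coefficients are given by[11]
$$\eta = \frac{16}{45}\,\rho_\gamma \tau_\gamma, \qquad \chi T = \frac{4}{3}\,\rho_\gamma \tau_\gamma. \tag{55}$$
Here $\tau_\gamma$ is the photon mean free time
$$\tau_\gamma = \frac{1}{\sigma_T n_e c} = \frac{t_L\, (6\pi G m_p)^{1/2}\, (\Omega_M/\Omega_B)^{1/2}\, \zeta^{9/2}}{\sigma_T c\, (2\pi m_e k_B T_0/h^2)^{3/4}\, (1+z_L)^{3/4}\, (1 - Y/2)^{1/2}} \exp\left( \frac{\zeta^2 \Delta}{2k_B T_L} \right), \tag{56}$$
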
where σT is the Thomson scattering cross section, h is (only here) the
original Planck constant, T0 = 2.738◦ K is the present microwave background
temperature, TL = 3100◦ K is the temperature at last scattering, kB is
Boltzmann’s constant, Y ≃ 0.23 is the primordial helium abundance, and ∆ = 13.6
eV is the hydrogen ionization energy.
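
The damping integral of Eq. (54) is straightforward to evaluate once the mean free time is specified. The sketch below is illustrative only. It takes the coefficients of Eq. (55) and reads the integration measure in Eq. (54) as dζ/ζ², which is what substituting the WKB ansatz into Eq. (48) yields. The mean free time is passed in as a user-supplied function tau(ζ), and the power law used in the usage note is a toy assumption, not Eq. (56).

```python
def damping_length_squared(zeta, t_l, xi, tau, n=2000):
    """D^2 of Eq. (54) with eta and chi*T taken from Eq. (55).

    tau is a callable giving the photon mean free time at a given zeta
    (a hypothetical input; Eq. (56) is not implemented here).
    The midpoint rule avoids evaluating the integrand at zeta = 0.
    """
    def integrand(z):
        r = xi * z * z                      # R = xi * zeta^2
        t = tau(z)
        viscous = (16.0 / 45.0) * t / (2.0 * (1.0 + r))
        conductive = (4.0 / 3.0) * t * r * r / (8.0 * (1.0 + r) ** 2)
        return (viscous + conductive) / (z * z)

    h = zeta / n
    return 3.0 * t_l * h * sum(integrand((i + 0.5) * h) for i in range(n))
```

As a usage example, a toy law tau(ζ) = τL ζ⁶ with ξ = 0 leaves only the viscous term, and the integral can then be done by hand, giving D² = (8/75) tL τL at ζ = 1.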
The relevant solution is again the one that increases most rapidly at
early times, which we can find by requiring that δγq → 0 as ζ → 0. In the
limit ζ → 0 the phase ϕ vanishes as O(ζ), while D 2 vanishes more rapidly
because the mean free time of photons is very small at early times. Hence
the linear combination of the particular inhomogeneous solution (49) and
the homogeneous solutions (53) that grows most rapidly at early times is
$$\delta_{\gamma q} = \frac{8N_q t_L^{2/3}}{9(k^2 t_L^2 + 2\xi)}\left[ 1 + 3\xi\zeta^2 - (1+\xi\zeta^2)^{-1/4}\, e^{-k^2 D^2} \cos\varphi \right]. \tag{57}$$

(We would be able to neglect the term 2ξ in the denominator only under
the condition $k t_L \ge \sqrt{2\xi}$, which for plausible values of ξ is stronger than
our assumption (50).)
To calculate the velocity potential of the plasma–photon fluid for large
wave numbers, we will also need the rate of change of δBq . At times of order
tL , the time derivatives of ξζ 2 , ϕ, and D 2 are of the orders of ξ/tL , k, and
τL , respectively, where τL is the photon mean free time τ ≈ η/ργ ≈ χT /ργ
at time tL . We are assuming that ktL ≥ ξ, so the time derivative of ϕ is
larger than the time derivative of ξζ 2 . Eq. (56) shows that damping becomes
important if $k^2 t_L \tau_L \ge 1$, but even for such large values of k we can still
limit ourselves to the case
$$k\tau_L \le 1, \tag{58}$$
in which case the time derivative of ϕ is also larger than the time derivative
of $k^2 D^2$. Hence for wave numbers k in the range defined by Eqs. (50) and
(58), we have
$$\frac{d\delta_{\gamma q}}{dt} \simeq \frac{8N_q t_L^{2/3}\, k\, e^{-k^2 D^2} \sin\varphi}{9\sqrt{3}\,(1+\xi\zeta^2)^{3/4}\,(k^2 t_L^2 + 2\xi)\,\zeta^2}. \tag{59}$$
The dissipative terms in Eq. (42) are smaller than this by a factor kτ, so
here we can take $\dot{\delta}_{Bq} \simeq 3\dot{\delta}_{\gamma q}/4$, and Eqs. (37), (31) and (59) then give the
velocity potential
$$u_q = \frac{2N_q}{3k^2 a^2(t_L)\, t_L^{1/3}\, \zeta}\left[ -1 + \frac{k t_L\, e^{-k^2 D^2} \sin\varphi}{\sqrt{3}\,(1+\xi\zeta^2)^{3/4}\,(k^2 t_L^2 + 2\xi)\,\zeta} \right] \tag{60}$$

F. Solution for small k

Here we can neglect viscosity and heat conduction. For k = 0, Eq. (48)
has an obvious solution
$$\delta_{\gamma q} = 4N_q t_L^{2/3} \zeta^2/3 = 4\delta_{Dq}/3. \tag{61}$$

To this we can add any linear combination of the two solutions of the
corresponding homogeneous equation, for which $\delta_{\gamma q}$ is respectively
time-independent or proportional to
$$\int_0^\zeta \frac{d\zeta}{1 + \xi\zeta^2},$$
which near the beginning of the dark-matter dominated era goes as ζ. As
ζ increases these homogeneous solutions become negligible compared with
the inhomogeneous solution (61), so at later times the solution for k = 0 is
given by Eq. (61).
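To get the term in $\delta_{\gamma q}$ of first order in $k^2$, we can use the solution (61)
in the terms in Eq. (48) proportional to $k^2$, so that

$$\frac{d}{d\zeta}\left[ (1 + \xi\zeta^2)\frac{d}{d\zeta}\left( \frac{3}{4}\delta_{\gamma q} - \delta_{Dq} \right) \right] = -3k^2 t_L^2\, \delta_{Dq}. \tag{62}$$
Discarding a homogeneous term for the same reason as before, we have

$$\frac{d}{d\zeta}\left( \frac{3}{4}\delta_{\gamma q} - \delta_{Dq} \right) = -\frac{k^2 t_L^2\, \zeta\, \delta_{Dq}}{1 + \xi\zeta^2} \tag{63}$$
which gives
$$\delta_{\gamma q} = \frac{4N_q t_L^{2/3}}{3}\,\zeta^2\left[ 1 - \frac{k^2 t_L^2}{2}\left( \frac{1}{\xi} - \frac{1}{\xi^2\zeta^2}\ln(1 + \xi\zeta^2) \right) + \dots \right] \tag{64}$$

Also, Eqs. (37), (27), (63) and (31) give the plasma velocity potential for
k → 0 as
$$u_q = \frac{1}{q^2}\frac{d}{dt}\left( \delta_{Bq} - \delta_{Dq} \right) \to -\frac{N_q t_L^{5/3}\, \zeta}{3a^2(t_L)\,(1 + \xi\zeta^2)}. \tag{65}$$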
As we will see, this provides a small correction to the Doppler shift, which for
small k will turn out to be mostly due to perturbations in the gravitational
field.

IV. OBSERVED TEMPERATURE FLUCTUATIONS

There are three separate sources of the observed temperature fluctuation
in the cosmic microwave background: the Sachs–Wolfe effect due to
perturbations in the gravitational potential, the Doppler effect due to plasma
peculiar velocities, and the intrinsic temperature fluctuations themselves. We
will consider each of these in turn, and then put the results together. In
calculating the Sachs–Wolfe and Doppler contributions, we will use a
non-relativistic approach, taking the effect of the gravitational field perturbations
on the observed photon temperature to consist entirely of the time dilation
caused by a Newtonian gravitational potential plus the Doppler shift caused
by the gravitational acceleration of the source and receiver. This approach
has the virtue of getting useful results quickly, but the results obtained in
this way need to be justified by a thoroughly relativistic treatment of the
Sachs–Wolfe and Doppler effects, which will be given in the Appendix.

A. Sachs–Wolfe Effect

We can define a Newtonian gravitational potential φ as the solution of
the Poisson equation

$$a^{-2}(t)\, \nabla^2 \phi(\mathbf{x}, t) = 4\pi G\, \delta\rho_D(\mathbf{x}, t) \tag{66}$$

with the factor $a^{-2}$ inserted to take account of the difference between the
Robertson–Walker co-moving coordinate vector x used here and the
coordinate vector a(t)x that measures proper distances at time t. Using Eqs. (31)
and (25), this gives
$$\phi(\mathbf{x}, t) = -4\pi G \rho_D(t)\, t^{2/3} a^2(t) \int d^3q\; q^{-2} e^{i\mathbf{q}\cdot\mathbf{x}} N_q = -\frac{2a^2(t)}{3t^{4/3}} \int d^3q\; q^{-2} e^{i\mathbf{q}\cdot\mathbf{x}} N_q. \tag{67}$$
It is important to note that this is time-independent during the dark matter
era, when $a(t) \propto t^{2/3}$.
This potential makes two separate contributions to the Sachs–Wolfe ef-
fect. There is a gravitational redshift, yielding a fractional fluctuation in
the observed temperature in a direction n̂ equal to φ(rL n̂) − φ(0), where rL
is the Robertson–Walker radial coordinate of the surface of last scattering.
There is also a time-delay; if the unperturbed cosmic temperature reaches
the value TL ≃ 3000◦ K of last scattering at a time tL , then the gravitational
potential causes the cosmic temperature in a direction n̂ to reach the value
TL at a time [1 + φ(rL n̂) − φ(0)]tL , so that the redshifted temperature seen
now is changed by a fractional amount[12]:
$$-\left[ t_L\, \dot{a}(t_L)/a(t_L) \right]\left[ \phi(r_L \hat{n}) - \phi(0) \right] = -\frac{2}{3}\left[ \phi(r_L \hat{n}) - \phi(0) \right].$$
(This argument is valid only because φ is time-independent; otherwise we
would have to consider the complete gravitationally delayed time-history
of the cosmic temperature, as done in the Appendix.) Combining the two
effects, the net fractional change in observed temperature is
$$\left( \frac{\Delta T(\hat{n})}{T} \right)_{\rm Sachs-Wolfe} = \frac{1}{3}\left[ \phi(r_L \hat{n}) - \phi(0) \right]. \tag{68}$$

As we shall see in the Appendix, this formula can be derived using the
formalism of general relativity, which in synchronous gauge gives the famous
factor of 1/3 directly, without having to consider separately the gravitational
red shift and time delay.
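It will be convenient to rewrite Eq. (67) in terms of the physical wave
number at the time of last scattering, $k \equiv q/a(t_L)$, so that Eq. (68) gives

$$\left( \frac{\Delta T(\hat{n})}{T} \right)_{\rm Sachs-Wolfe} = \int d^3k \left[ e^{i\mathbf{k}\cdot\hat{n}\, d_A} - 1 \right] \epsilon_{\mathbf{k}}, \tag{69}$$

where $\epsilon_{\mathbf{k}}$ is an amplitude for fluctuations not processed by acoustic
oscillations, defined by
$$\epsilon_{\mathbf{k}}\, d^3k \equiv -\frac{2N_q\, a^2(t_L)}{9q^2\, t_L^{4/3}}\, d^3q, \tag{70}$$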
and dA = rL a(tL ) is the angular diameter distance of the surface of last
scattering.

B. Doppler Shifts

The plasma velocity potential $u_q$ calculated in Section III yields a
pressure-induced plasma velocity perturbation
$$
\mathbf{v}_{\rm pressure}(\mathbf{x},t) = a(t)\,\nabla\int d^3q\;e^{i\mathbf{q}\cdot\mathbf{x}}\,u_q(t). \tag{71}
$$
(The factor $a(t)$ enters because it is the velocity in co-moving coordinates
that is given by the co-moving gradient of the velocity potential.) This yields
a Doppler shift of the temperature of the cosmic microwave background seen
in a direction $\hat n$:
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm pressure\ Doppler}
= -i\,a(t_L)\int d^3q\;\hat n\cdot\mathbf{q}\;u_q(t_L)\,e^{i\mathbf{q}\cdot\hat n\,r_L}
= -i\int d^3k\;\hat n\cdot\hat k\;\epsilon_k\,g(k)\,e^{i\mathbf{k}\cdot\hat n\,d_A}, \tag{72}
$$

with the form factor $g(k)$ given by Eq. (65) for small $k$ as
$$
g(k) = \frac{3k^3t_L^3}{2(1+\xi)} \tag{73}
$$
and for large $k$ by Eq. (60) as
$$
g(k) = 3kt_L - \sqrt{3}\,(1+\xi)^{-3/4}\left(1+2\xi/k^2t_L^2\right)^{-1}e^{-k^2D_L^2}\sin kd_H \tag{74}
$$
where $\xi$ as before is 3/4 the ratio of baryon and photon energy densities at
the time of last scattering, $D_L$ is the damping length $D$ given in Eq. (56),
evaluated at $\zeta = 1$ (actually, as discussed in the next section, at $\zeta$ a little less
than unity), and $d_H$ is the acoustic horizon at the time of last scattering:
$$
d_H = \sqrt{3}\,t_L\int_0^1\frac{d\zeta}{\sqrt{1+\xi\zeta^2}}
= \frac{\sqrt{3}\,t_L}{\sqrt{\xi}}\,\ln\left(\sqrt{\xi}+\sqrt{1+\xi}\right). \tag{75}
$$
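The closed form in Eq. (75) is easy to check numerically. The following is a minimal sketch added for illustration, not part of the original derivation: the function names are hypothetical, lengths are measured in units of $t_L$, and the integrand $1/\sqrt{1+\xi\zeta^2}$ is the one read off from Eq. (75).

```python
import math

def acoustic_horizon_closed_form(xi, t_L=1.0):
    """d_H from the logarithmic closed form of Eq. (75)."""
    return (math.sqrt(3.0) * t_L / math.sqrt(xi)
            * math.log(math.sqrt(xi) + math.sqrt(1.0 + xi)))

def acoustic_horizon_numeric(xi, t_L=1.0, n=2000):
    """d_H from Simpson's rule applied to the integral in Eq. (75); n must be even."""
    h = 1.0 / n
    f = lambda z: 1.0 / math.sqrt(1.0 + xi * z * z)
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return math.sqrt(3.0) * t_L * total * h / 3.0

# xi = 0.54 and 0.81 are the two values used as examples in Section V.
for xi in (0.54, 0.81):
    print(xi, acoustic_horizon_closed_form(xi), acoustic_horizon_numeric(xi))
```

For $\xi = 0.54$ both routes give a value close to the $1.61\,t_L$ quoted in Section V.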
In the non-relativistic approach used here, there is also an additional ve-
locity perturbation induced by the gravitational potential φ(x). The proper
peculiar velocity vgrav produced in this way is given by the equation of
motion[13]
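$$
\frac{\partial}{\partial t}\mathbf{v}_{\rm grav}(\mathbf{x},t) + \frac{\dot a(t)}{a(t)}\,\mathbf{v}_{\rm grav}(\mathbf{x},t) = -\frac{1}{a(t)}\nabla\phi(\mathbf{x}) \tag{76}
$$
Because the gravitational potential $\phi$ is time-independent, this has the
simple solution
$$
\mathbf{v}_{\rm grav}(\mathbf{x},t) = -a^{-1}(t)\,t\,\nabla\phi(\mathbf{x})
= \frac{2i\,a(t)}{3\,t^{1/3}}\int d^3q\;q^{-2}\,\mathbf{q}\,e^{i\mathbf{q}\cdot\mathbf{x}}N_q. \tag{77}
$$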
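As a quick check, added here for illustration and assuming the matter-era scaling $a \propto t^{2/3}$ (so that $t\dot a/a = 2/3$), one can substitute $\mathbf{v} = -(t/a)\nabla\phi$ into the left-hand side of Eq. (76) with $\phi$ time-independent:

```latex
\begin{align*}
\frac{\partial}{\partial t}\left(-\frac{t}{a}\nabla\phi\right)
  &= -\frac{1}{a}\left(1-\frac{t\dot a}{a}\right)\nabla\phi
   = -\frac{1}{3a}\,\nabla\phi, \\
\frac{\dot a}{a}\left(-\frac{t}{a}\nabla\phi\right)
  &= -\frac{2}{3a}\,\nabla\phi, \\
\text{sum}
  &= -\frac{1}{a}\,\nabla\phi .
\end{align*}
```

The two terms add up to the right-hand side of Eq. (76), as required.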
This contributes a fractional temperature shift seen in a direction n̂:
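$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm gravity\ Doppler}
= -\hat n\cdot\left[\mathbf{v}_{\rm grav}(\hat n r_L,t_L)-\mathbf{v}_{\rm grav}(0,t_0)\right]
= 3i\int d^3k\;(\hat k\cdot\hat n)\,kt_L\,\epsilon_k\left[e^{i\mathbf{k}\cdot\hat n\,d_A}-\frac{t_L^{1/3}a(t_0)}{t_0^{1/3}a(t_L)}\right], \tag{78}
$$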
where t0 is the present time. (A general relativistic derivation of this result
in synchronous gauge is given in the Appendix.)

C. Intrinsic Temperature Fluctuations

The fractional change in the photon temperature is one-fourth the
fractional change in the photon energy density. The contribution of intrinsic
density fluctuations at the time $t_L$ of last scattering to the fractional change
of temperature seen coming from a direction $\hat n$ is therefore†
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm intrinsic}
= \frac{\delta\rho_\gamma(\hat n r_L,t_L)}{4\rho_\gamma(t_L)}
= \frac{1}{4}\int d^3q\;e^{i\mathbf{q}\cdot\hat n\,r_L}\,\delta_{\gamma q}(t_L). \tag{79}
$$
Eqs. (57), (64) and (70) then give
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm intrinsic} = \int d^3k\;\epsilon_k\,f(k)\,e^{i\mathbf{k}\cdot\hat n\,d_A}, \tag{80}
$$
with the partial form factor $f$ given by
$$
f(k) = \begin{cases}
-3k^2t_L^2/2 + 3\left[\xi^{-1}-\xi^{-2}\ln(1+\xi)\right]k^4t_L^4/4 + \dots & k\to 0 \\[4pt]
\left(1+2\xi/k^2t_L^2\right)^{-1}\left[-1-3\xi+(1+\xi)^{-1/4}e^{-k^2D_L^2}\cos(kd_H)\right] & k\ \text{large}.
\end{cases} \tag{81}
$$

D. Total Temperature Fluctuations

We now put together the fractional temperature fluctuations given by Eqs. (69),
(72), (78), and (80), and obtain the total fractional temperature fluctuation
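$$
\frac{\Delta T(\hat n)}{T} = \int d^3k\;\epsilon_k\left\{\left[F(k)+i\hat k\cdot\hat n\,G(k)\right]e^{i\mathbf{k}\cdot\hat n\,d_A} - 1 - 3i\,\mathbf{k}\cdot\hat n\,\frac{t_L^{4/3}a(t_0)}{t_0^{1/3}a(t_L)}\right\}, \tag{82}
$$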

† There is a subtlety here. To the extent that the opacity drops sharply from 100%
to zero, last scattering occurs at a fixed value of the perturbed temperature T + δT , near
3000◦ K, rather than at a fixed value of the unperturbed temperature or the time. The
effect of the intrinsic temperature fluctuation δT (t) is thus to change the time of last
scattering, in such a way as to produce a change −δT in the value of the unperturbed
temperature T (t) at this time. Since T (t) ∝ 1/a(t), we then have δa/a = +δT /T at the
time of last scattering, so that the observed temperature is shifted by the change in the
cosmological redshift by a fractional amount ∆T /T = δa/a = +δT /T .

where F (k) is the total scalar form factor, given by Eqs. (69), (80), and (81)
as

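$$
F(k) = 1 + f(k) = \begin{cases}
1 - 3k^2t_L^2/2 - 3\left[-\xi^{-1}+\xi^{-2}\ln(1+\xi)\right]k^4t_L^4/4 + \dots & k\to 0 \\[4pt]
\left(1+2\xi/k^2t_L^2\right)^{-1}\left[-3\xi+2\xi/k^2t_L^2+(1+\xi)^{-1/4}e^{-k^2D_L^2}\cos(kd_H)\right] & k\ \text{large}.
\end{cases} \tag{83}
$$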

and $G(k)$ is the total dipole form factor, given by Eqs. (72), (73), (74), and
(78) as
$$
G(k) = 3kt_L - g(k) = \begin{cases}
3kt_L - 3k^3t_L^3/2(1+\xi) + \dots & k\to 0 \\[4pt]
\sqrt{3}\,(1+\xi)^{-3/4}\left(1+2\xi/k^2t_L^2\right)^{-1}e^{-k^2D_L^2}\sin(kd_H) & k\ \text{large}.
\end{cases} \tag{84}
$$
The last two terms in the curly brackets in Eq. (82) contribute only to the
multipole coefficients Cℓ for ℓ = 0 and ℓ = 1[15], and may therefore be
dropped (as they are in Eq. (1)) in considering the higher multipoles.
We see that the WKB solution for large $k$ gives a poor picture of what
happens for $k \to 0$, except in the case $\xi \ll 1$, where $d_H = \sqrt{3}\,t_L$, in which
case the above pairs of expressions for $F(k)$ and $G(k)$ agree for small $k$.
As discussed in Section II, it still remains to average over the time of last
scattering. The effect of this averaging on the damping factor $\exp(-k^2D^2)$
is small[15]. Otherwise, the averaging over $t$ chiefly affects the $\sin kd_H$ and
$\cos kd_H$ factors in Eqs. (83) and (84), which oscillate rapidly with the time
of last scattering when $k$ is large. We will approximate the probability
distribution of the actual time of last scattering $t$ as a Gaussian of the form
$(1/\sqrt{\pi}\,\Delta t)\exp(-(t-t_L)^2/\Delta t^2)$, where $t_L$ is a nominal time of last scattering.
Replacing $t_L$ in the sines and cosines in Eqs. (83) and (84) with $t$, multiplying
with this probability distribution, and integrating over $t$ then gives the same
result for the form factors for large $k$, but with an additional term now added
to $D_L^2$:
$$
\Delta D_L^2 = d_H^2\left(\frac{\Delta t}{2t_L}\right)^2. \tag{85}
$$
This is a sort of ‘Landau damping,’ except that the damping arises from
a spread in the time at which the temperature of the medium is observed
rather than from a spread in wave numbers. As we will see in the next
section, this term makes a smaller but not insignificant contribution to the
total damping.
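The averaging step that leads to Eq. (85) can be illustrated numerically. This is a sketch under stated assumptions rather than the calculation actually performed in the paper: it takes the phase $kd_H$ to scale linearly with the time of last scattering, uses a normalized Gaussian weight $\exp(-(t-t_L)^2/\Delta t^2)$, and all names are hypothetical.

```python
import math

def gaussian_average_cos(phase_L, rel_width, n=4001, span=8.0):
    """Average cos(phase_L * t / t_L) over a normalized Gaussian in t.

    phase_L   : k * d_H evaluated at the nominal time t_L
    rel_width : Delta t / t_L
    The integral is done in x = (t - t_L) / Delta t by the trapezoid rule.
    """
    h = 2.0 * span / (n - 1)
    total = 0.0
    for i in range(n):
        x = -span + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-x * x) * math.cos(phase_L * (1.0 + rel_width * x))
    return total * h / math.sqrt(math.pi)

phase_L = 3.0 * math.pi      # illustrative value of k * d_H
rel_width = 0.10             # illustrative value of Delta t / t_L

numeric = gaussian_average_cos(phase_L, rel_width)
# Eq. (85): the average equals cos(k d_H) times exp(-k^2 d_H^2 (Delta t / 2 t_L)^2).
predicted = math.exp(-(phase_L * rel_width / 2.0) ** 2) * math.cos(phase_L)
print(numeric, predicted)
```

The two numbers agree to numerical precision, which is the statement that the time average simply adds $d_H^2(\Delta t/2t_L)^2$ to the squared damping length.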

V. DISCUSSION
In a companion paper[1] we show how to use the formula (82) for the
total temperature fluctuation to derive expressions for the coefficient Cℓ of
the term of multipole number ℓ in the temperature fluctuation correlation
function for general form factors F (k) and G(k). As we will see there, the
contribution of the scalar form factor F (k) to Cℓ arises mostly from wave
numbers of order ℓ/dA (where dA is the angular diameter distance of the
surface of last scattering), while this approximation is much worse for the
contribution of the dipole form factor G(k).
For the present, we will content ourselves with noting that if we
tentatively use the WKB approximation, neglect damping effects, and drop the
terms in the second line of Eq. (83) proportional to $\xi/k^2t_L^2$, then for $\xi$ less
than 0.311 (that is, for $3\xi < (1+\xi)^{-1/4}$) the squared scalar form factor
$F^2(k)$ has peaks at the wave numbers
$$
k_n = n\pi/d_H, \tag{86}
$$
(with n = 1, 2, . . .), with higher peaks for odd n (where the two terms in
F (k) have the same sign) than for even n. The minima are at the zeroes
of F (k). For ξ > .311 the only peaks are those for n odd, and the minima
are at n even. This suggests that there should be peaks in Cℓ near ℓn =
(2n − 1)πdA /dH and either lower peaks or dips near 2nπdA /dH , depending
on the value of ξ. These peaks are known as the Doppler peaks (though
Eq. (84) shows that the contribution of the Doppler shift is very small at
all the wave numbers kn .) These results depend critically on the negative
sign of the term −3ξ in the second line of Eq. (83); if this term had turned
out to be positive then for ξ > .311 the positions of the peaks and dips
would be interchanged. Despite what is sometimes said[16], there is no way
without detailed calculations to see that the first Doppler peak should be at
ℓ ≃ πdA /dH , rather than at a multipole number twice as large.
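The critical value $\xi \simeq 0.311$ and the alternation of peak heights can be reproduced in a few lines. The sketch below is illustrative only: it uses the simplified large-$k$ form of Eq. (83) with damping and the $\xi/k^2t_L^2$ terms dropped, as in the paragraph above, and the function names are hypothetical.

```python
import math

def threshold_gap(xi):
    """3*xi - (1 + xi)^(-1/4); its zero is where the even-n peaks disappear."""
    return 3.0 * xi - (1.0 + xi) ** -0.25

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection for a root of f bracketed by [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def scalar_form_factor(x, xi):
    """Simplified F as a function of x = k * d_H (no damping, no xi/k^2 t_L^2 terms)."""
    return -3.0 * xi + (1.0 + xi) ** -0.25 * math.cos(x)

xi_crit = bisect(threshold_gap, 0.0, 1.0)
print("critical xi:", xi_crit)

# Peak heights of F^2 at k = n*pi/d_H for an illustrative xi below the threshold.
xi = 0.2
for n in (1, 2, 3, 4):
    print(n, scalar_form_factor(n * math.pi, xi) ** 2)
```

For odd $n$ the two terms in $F$ have the same sign, so those peaks in $F^2$ come out higher than the even-$n$ ones.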
We can now check whether the WKB approximation used in subsection
3E is valid at the first Doppler peak. According to Eqs. (86) and (75), we
have
$$
k_1t_L = \frac{\pi\sqrt{\xi}}{\sqrt{3}\,\ln\left(\sqrt{\xi}+\sqrt{1+\xi}\right)}, \tag{87}
$$
so the ratio of the wave number at the first Doppler peak to the minimum
wave number $k_{\rm min}$ allowed by the inequality (50) is
$$
\frac{k_1}{k_{\rm min}} = \frac{\pi}{\sqrt{3\xi}\,\ln\left(\sqrt{\xi}+\sqrt{1+\xi}\right)}. \tag{88}
$$
The WKB approximation is valid at wave numbers down to the first Doppler
peak if this ratio is sufficiently larger than unity. For instance, for ΩB h2 =
0.03 we have ξ = 0.81, so Eq. (88) gives k1 /kmin = 2.5, making the WKB ap-
proximation fairly good at the first Doppler peak. The WKB approximation
is somewhat better at k1 for smaller values of ΩB h2 , though it still breaks
down at smaller wave numbers unless ΩB h2 = 0. For all plausible values of
ξ the WKB approximation is excellent at the higher Doppler peaks.
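The numbers quoted above follow directly from Eqs. (87) and (88). A minimal sketch for evaluating them (function names are hypothetical):

```python
import math

def log_term(xi):
    """ln(sqrt(xi) + sqrt(1 + xi)), the factor shared by Eqs. (75), (87) and (88)."""
    return math.log(math.sqrt(xi) + math.sqrt(1.0 + xi))

def k1_times_tL(xi):
    """First-peak wave number times t_L, Eq. (87)."""
    return math.pi * math.sqrt(xi) / (math.sqrt(3.0) * log_term(xi))

def k1_over_kmin(xi):
    """Ratio of the first-peak wave number to the WKB lower bound, Eq. (88)."""
    return math.pi / (math.sqrt(3.0 * xi) * log_term(xi))

for xi in (0.54, 0.81):
    print(xi, k1_times_tL(xi), k1_over_kmin(xi))
```

For $\xi = 0.81$ the ratio comes out close to 2.5, and it is larger for the smaller value $\xi = 0.54$, in line with the remark that the approximation improves for smaller $\Omega_B h^2$.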
Next, let us consider the importance of damping. It might seem that we
should calculate the damping length $D_L$ by integrating in Eq. (54) up to the
time of last scattering, corresponding to $\zeta = 1$. But at the nominal time
of last scattering (defined so that the probability of any future scattering is
50%), the photon collision rate $1/\tau_\gamma$ given by Eq. (56) is $0.2\sqrt{\Omega_B/\Omega_M}/t_L$,
which is already considerably smaller than the expansion rate $2/3t_L$, so that
we cannot trust the hydrodynamic calculations used to obtain Eq. (54). We
will instead integrate in Eq. (54) only up to a value $\zeta_{\rm max}$ of $\zeta$ at which the
photon collision rate becomes equal to the expansion rate, and set
$$
\tau_\gamma \simeq \frac{3t_L}{2}\left(\frac{\zeta}{\zeta_{\rm max}}\right)^{9/2}\exp\left[-\frac{\Delta}{2k_BT_L}\left(\zeta_{\rm max}^2-\zeta^2\right)\right].
$$

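The exponential factor (with $\Delta/2k_BT_L = 25.5$) is so sharply peaked at
$\zeta = \zeta_{\rm max}$ that we can approximate $\zeta_{\rm max}^2 - \zeta^2 \simeq 2\zeta_{\rm max}(\zeta_{\rm max} - \zeta)$ in the
exponent and set $\zeta$ equal to $\zeta_{\rm max}$ everywhere else in the integral, giving
$$
D_L^2 \simeq \frac{3t_L^2}{2\zeta_{\rm max}^3}\left(\frac{8}{15(1+\xi\zeta_{\rm max}^2)}+\frac{\xi^2\zeta_{\rm max}^4}{2(1+\xi\zeta_{\rm max}^2)^2}\right)\frac{k_BT_L}{\Delta}
$$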

Furthermore, $\zeta_{\rm max}$ is very close to unity. (For instance, for $\Omega_M/\Omega_B = 7.5$,
we have $\zeta_{\rm max} = 0.96$. That is, we carry the damping integral down to a
temperature $T_L/\zeta_{\rm max}^2 \simeq 3360^\circ$ K instead of $3100^\circ$ K.) Hence in this result we
may as well replace $\zeta_{\rm max}$ with unity, so that
$$
D_L^2 \simeq \frac{3t_L^2}{2}\left(\frac{8}{15(1+\xi)}+\frac{\xi^2}{2(1+\xi)^2}\right)\frac{k_BT_L}{\Delta}
$$
This approximation leads to the additional simplification that the damping
length is independent of most of the parameters appearing in Eq. (56),
including the ratio $\Omega_B/\Omega_M$.
There is a smaller additional contribution from the averaging over
oscillatory terms, given by Eq. (85). To evaluate this Landau damping term, we
will need the ratio $\Delta t/t_L$. We noted in Section II that the probability that a
photon will not be scattered again rises from 2% at about $3400^\circ$ K to 98% at
about $2800^\circ$ K, with very little dependence on any cosmological parameters.
Matching this to the probabilities calculated from the approximation that
the probability of scattering in a time interval from $t$ to $t + dt$ is a Gaussian
$(dt/\sqrt{\pi}\,\Delta t)\exp(-(t-t_L)^2/\Delta t^2)$, and using the relation $T \propto t^{-2/3}$, we find
$\Delta t/t_L = 0.10$, so the contribution to $D_L^2$ in Eq. (85) has a value $0.0025\,d_H^2$.
Adding this to the quantity we have calculated from the integral (54) gives
the total squared damping length
$$
d_D^2 \equiv D_L^2 + \Delta D_L^2 \simeq 0.029\,t_L^2\left(\frac{8}{15(1+\xi)}+\frac{\xi^2}{2(1+\xi)^2}\right) + 0.0025\,d_H^2. \tag{89}
$$
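The matching that gives $\Delta t/t_L = 0.10$ can be sketched as follows. This is an illustration under assumptions, not the author's own computation: it takes the probability of no further scattering after time $t$ to be the cumulative Gaussian $\frac{1}{2}[1+{\rm erf}((t-t_L)/\Delta t)]$, places the 2% and 98% points symmetrically about $t_L$, and uses $T \propto t^{-2/3}$ to convert the two temperatures into a ratio of times. All names are hypothetical.

```python
import math

def erf_inverse(y, lo=0.0, hi=6.0, tol=1e-12):
    """Invert math.erf on [lo, hi] by bisection (erf is increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# P = 0.98 corresponds to erf(x) = 0.96; by symmetry P = 0.02 sits at -x.
x98 = erf_inverse(0.96)

# T ~ t^(-2/3) turns the temperatures 3400 K and 2800 K into a ratio of times.
time_ratio = (3400.0 / 2800.0) ** 1.5

# Solve (1 + x98 * w) / (1 - x98 * w) = time_ratio for w = Delta t / t_L.
w = (time_ratio - 1.0) / (x98 * (time_ratio + 1.0))

print("Delta t / t_L      :", w)
print("(Delta t / 2 t_L)^2:", (w / 2.0) ** 2)
```

Under these assumptions the width comes out close to 0.10, and the coefficient of $d_H^2$ in Eq. (85) close to 0.0025.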

For instance, for $\Omega_B h^2 = 0.02$ (so that $\xi = 0.54$) Eq. (75) gives $d_H =
1.61\,t_L$, so $d_D^2 = 0.0071\,d_H^2$. Hence at the first Doppler peak the argument
of the damping exponential is $d_D^2k_1^2 \simeq 0.07$. (This depends very little on $\xi$.)
We see that damping is not important at the first Doppler peak, in agreement
with more accurate computer calculations[17], but is quite significant at the
second Doppler peak. One effect of damping is to shift the second and higher
Doppler peaks to lower values of $k$ and $\ell$.
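These numbers can be reproduced from Eqs. (75), (86) and (89). A short sketch, with hypothetical function names and lengths in units of $t_L$:

```python
import math

def d_H_over_tL(xi):
    """Acoustic horizon in units of t_L, closed form of Eq. (75)."""
    return (math.sqrt(3.0) / math.sqrt(xi)
            * math.log(math.sqrt(xi) + math.sqrt(1.0 + xi)))

def dD2_over_dH2(xi):
    """Total squared damping length of Eq. (89), in units of d_H^2."""
    bracket = 8.0 / (15.0 * (1.0 + xi)) + xi ** 2 / (2.0 * (1.0 + xi) ** 2)
    return 0.029 * bracket / d_H_over_tL(xi) ** 2 + 0.0025

xi = 0.54
ratio = dD2_over_dH2(xi)

# At the n-th peak k_n = n*pi/d_H, so the damping exponent is ratio * (n*pi)^2.
for n in (1, 2, 3):
    print(n, ratio * (n * math.pi) ** 2)
```

Because the exponent grows as $n^2$, it is four times larger at the second peak than at the first.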
In deriving the wave numbers (86) of the Doppler peaks we also neglected
the terms proportional to $\xi/k^2t_L^2$ in the second line of Eq. (83). At the first
Doppler peak this quantity is given by Eq. (75) as
$$
\frac{\xi}{k_1^2t_L^2} = \frac{3}{\pi^2}\left[\ln\left(\sqrt{\xi}+\sqrt{1+\xi}\right)\right]^2.
$$
This is 0.20 for $\Omega_B h^2 = 0.03$, for which $\xi = 0.81$, and less for smaller values
of $\Omega_B h^2$. This approximation is thus fair at the first Doppler peak, and
becomes excellent at the higher Doppler peaks.
Finally, we must ask what values of $k$ are small enough so that we can
ignore acoustic oscillations during the era when the photon energy density
exceeded the dark matter plus baryon density, during which our analysis
does not apply. During this era the Robertson–Walker scale factor $a(t)$ went
as $t^{1/2}$, and the speed of sound was $1/\sqrt{3}$, so the phase change of acoustic
oscillations up to the time $t_C$ of the crossover from radiation dominance to
matter dominance was
$$
\Delta\varphi = q\int_0^{t_C}\frac{dt}{\sqrt{3}\,a(t)} = \frac{2qt_C}{\sqrt{3}\,a(t_C)} = \frac{2kt_L}{\sqrt{3}}\left(\frac{t_C\,a(t_L)}{t_L\,a(t_C)}\right).
$$

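The redshift $z_C$ at the crossover is given by $1+z_C = \Omega_M/\Omega_\gamma = 4\times10^4\,\Omega_M h^2$.
During the period from this crossover to the present the scale factor $a(t)$ went
as $t^{2/3}$, so the ratio in parentheses is
$$
\frac{t_C\,a(t_L)}{t_L\,a(t_C)} = \sqrt{\frac{1+z_L}{1+z_C}} = \frac{1}{6.0\sqrt{\Omega_M h^2}}.
$$
Using this and Eq. (75) gives
$$
\Delta\varphi \simeq \frac{0.35}{\sqrt{\Omega_M h^2}}\left(\frac{k}{k_1}\right)\frac{\sqrt{\xi}}{\ln\left(\sqrt{\xi}+\sqrt{1+\xi}\right)}. \tag{90}
$$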

For instance, if we take $\Omega_M h^2 = 0.15$ and $\Omega_B h^2 = 0.03$, then $\Delta\varphi \simeq 1$ at the
first Doppler peak, indicating that oscillations in the radiation-dominated
era are becoming important at the first Doppler peak. This is not to say
that we are making an error of order unity in the argument $\varphi$ of the sines
and cosines in Eqs. (83) and (84), but rather that the evolution of the
perturbations during this much of their oscillations has not been reliably
calculated. This source of error is mitigated in reference [1] by including the
effects of photon and neutrino energies on $a(t)$ in calculating the horizon
distance.
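The estimate $\Delta\varphi \simeq 1$ can be checked directly from Eq. (90). The sketch below is illustrative; the function names are hypothetical, and the second function simply recombines the numerical factors $2/\sqrt{3}$, $\pi/\sqrt{3}$ from Eq. (87), and $1/6.0$ from the crossover ratio to show where the coefficient 0.35 comes from.

```python
import math

def log_term(xi):
    """ln(sqrt(xi) + sqrt(1 + xi)), as in Eq. (75)."""
    return math.log(math.sqrt(xi) + math.sqrt(1.0 + xi))

def phase_change(k_over_k1, xi, omega_m_h2):
    """Phase accumulated in the radiation-dominated era, Eq. (90)."""
    return (0.35 / math.sqrt(omega_m_h2) * k_over_k1
            * math.sqrt(xi) / log_term(xi))

def coefficient_from_chain():
    """(2/sqrt 3) * (pi/sqrt 3) / 6.0, the prefactor before rounding to 0.35."""
    return 2.0 * math.pi / (3.0 * 6.0)

# Omega_M h^2 = 0.15 and Omega_B h^2 = 0.03 (xi = 0.81), at the first peak.
delta_phi = phase_change(1.0, 0.81, 0.15)
print(delta_phi, coefficient_from_chain())
```

The phase scales linearly with $k/k_1$, so under this estimate it is proportionally larger at the higher peaks.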
Our formula (84) for the dipole form factor G(k) raises the possibility
of a maximum in G(k) at kdH = π/2, yielding a “zeroth Doppler peak,”
produced (as the first Doppler peak is not) by the Doppler effect. For
ΩB h2 = 0.03 the wave number at this supposed peak is too small for us
to trust the WKB approximation used to derive Eq. (84) at this peak, but
the peak in G(k) at kdH = π/2 would definitely be there for much smaller
values of ΩB h2 . In particular, the calculations of reference [1] show such a
zeroth Doppler peak in Cℓ at ℓ ≃ 0.45dA /dH for ΩB = 0.

APPENDIX: RELATIVISTIC CALCULATION OF THE SACHS–WOLFE AND DOPPLER EFFECTS

In Section IV we gave a derivation of the Sachs–Wolfe and Doppler
effects, using heuristic arguments to supplement relativistic results. For
completeness, this Appendix will present a thoroughly relativistic derivation in
synchronous gauge, taking into account the possible presence of a vacuum
energy, which may or may not be constant. This goes over familiar ground,
first considered by Sachs and Wolfe[18], but as far as I know there is no
published treatment of the ‘integrated Sachs–Wolfe effect’ in synchronous
gauge that goes explicitly and analytically into the details presented here,
including the possibility of a varying vacuum energy.
A light ray travelling toward the center of the Robertson–Walker
coordinate system from the direction $\hat n$ will have a co-moving radial coordinate
$r$ related to $t$ by
$$
0 = g_{\mu\nu}^{\rm total}\,dx^\mu dx^\nu = -dt^2 + \left[a^2(t)+h_{rr}(r\hat n,t)\right]dr^2, \tag{91}
$$
or in other words
$$
\frac{dr}{dt} = -\left(a^2+h_{rr}\right)^{-1/2} \simeq -\frac{1}{a}+\frac{h_{rr}}{2a^3}. \tag{92}
$$
The first-order solution is
$$
r(t) = s(t) + \frac{1}{2}\int_{t_L}^{t}\frac{dt'}{a^3(t')}\,h_{rr}\left(s(t')\hat n,t'\right), \tag{93}
$$
where $s(t)$ is the zero-th order solution for the radial coordinate which has
the value $r_L$ at $t = t_L$:
$$
s(t) = r_L - \int_{t_L}^{t}\frac{dt'}{a(t')}. \tag{94}
$$
In particular, if the ray reaches $r = 0$ at a time $t_0$, then
$$
0 = s(t_0) + \frac{1}{2}\int_{t_L}^{t_0}\frac{dt}{a^3(t)}\,h_{rr}\left(s(t)\hat n,t\right). \tag{95}
$$
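A time interval $\delta t_L$ between successive light wave crests at the time $t_L$
of last scattering produces a time interval $\delta t_0$ at $t_0$ given by the variation of
Eq. (95):
$$
0 = \delta t_L\left[\frac{1}{a(t_L)} - \frac{1}{2}\frac{h_{rr}(r_L\hat n,t_L)}{a^3(t_L)} + \frac{1}{2a(t_L)}\int_{t_L}^{t_0}\frac{dt}{a^3(t)}\left(\frac{\partial h_{rr}(r\hat n,t)}{\partial r}\right)_{r=s(t)}\right]
+ \delta t_L\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L}
+ \delta t_0\left[-\frac{1}{a(t_0)} + \frac{1}{2}\frac{h_{rr}(0,t_0)}{a^3(t_0)}\right]. \tag{96}
$$
(The velocity potential term on the right-hand side arises from the
pressure-induced change with time of the radial coordinate $r_L$ of the light source in
Eq. (95).) The total rate of change of the quantity $h_{rr}(s(t)\hat n,t)/a^2(t)$ in
Eq. (96) is
$$
\frac{d}{dt}\left[\frac{h_{rr}(s(t)\hat n,t)}{a^2(t)}\right]
= \left(\frac{\partial}{\partial t}\left[\frac{h_{rr}(r\hat n,t)}{a^2(t)}\right]\right)_{r=s(t)} - \frac{1}{a^3(t)}\left(\frac{\partial h_{rr}(r\hat n,t)}{\partial r}\right)_{r=s(t)},
$$
so Eq. (96) may be written
$$
0 = \delta t_L\left[\frac{1}{a(t_L)} - \frac{1}{2}\frac{h_{rr}(0,t_0)}{a^2(t_0)a(t_L)} + \frac{1}{2a(t_L)}\int_{t_L}^{t_0}dt\left(\frac{\partial}{\partial t}\left[\frac{h_{rr}(r\hat n,t)}{a^2(t)}\right]\right)_{r=s(t)}\right]
+ \delta t_L\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L}
+ \delta t_0\left[-\frac{1}{a(t_0)} + \frac{1}{2}\frac{h_{rr}(0,t_0)}{a^3(t_0)}\right]. \tag{97}
$$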

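Hence to first order the ratio of the received and emitted frequencies is
$$
\frac{\nu_0}{\nu_L} = \frac{\delta t_L}{\delta t_0} = \frac{a(t_L)}{a(t_0)}\left[1 - \frac{1}{2}\int_{t_L}^{t_0}dt\left(\frac{\partial}{\partial t}\left[\frac{h_{rr}(r\hat n,t)}{a^2(t)}\right]\right)_{r=s(t)} - a(t_L)\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L}\right]. \tag{98}
$$
This gives a fractional shift in the radiation temperature observed at time $t_0$
coming from direction $\hat n$, from its unperturbed value $T_0 = T_La(t_L)/a(t_0)$:
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm SW,\,Dop} = \frac{\nu_0}{\nu_L\,a(t_L)/a(t_0)} - 1
= -\int_{t_L}^{t_0}dt\left(\frac{\partial}{\partial t}\left[\frac{h_{rr}(r\hat n,t)}{2a^2(t)}\right]\right)_{r=s(t)} - a(t_L)\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L}. \tag{99}
$$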

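Now we have to think about how to relate the $rr$ component of the
metric perturbation to the field $\psi$ appearing in Section III. In general, the
metric perturbation may be written as
$$
h_{ij} = A\,\delta_{ij} + \frac{\partial^2B}{\partial x^i\partial x^j}. \tag{100}
$$
The quantity entering into the integrand in Eq. (99) is then
$$
\frac{\partial}{\partial t}\left[\frac{h_{rr}(r\hat n,t)}{2a^2(t)}\right] = \alpha(r\hat n,t) + \frac{\partial^2\beta(r\hat n,t)}{\partial r^2}, \tag{101}
$$
where
$$
\alpha \equiv \frac{\partial}{\partial t}\left(\frac{A}{2a^2}\right), \qquad \beta \equiv \frac{\partial}{\partial t}\left(\frac{B}{2a^2}\right). \tag{102}
$$
The field $\psi$ defined by Eq. (23) is given by
$$
\psi = 3\alpha + \nabla^2\beta. \tag{103}
$$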

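We also need a relation between $\alpha$ and $\beta$, which can be taken from the field
equation for the full metric perturbation[19]:
$$
\nabla^2h_{ij} - \frac{\partial^2h_{ik}}{\partial x^j\partial x^k} - \frac{\partial^2h_{jk}}{\partial x^i\partial x^k} + \frac{\partial^2h_{kk}}{\partial x^i\partial x^j}
- a^2\ddot h_{ij} + a\dot a\left(\dot h_{ij} - \delta_{ij}\dot h_{kk}\right) + 2\dot a^2\delta_{ij}h_{kk} + 2a\ddot a\,h_{ij}
= -8\pi G\left(\delta\rho - \delta p\right)a^4\delta_{ij}. \tag{104}
$$
(For simplicity we are here taking the universe to be spatially flat, which is
certainly a good approximation at high redshifts, and seems to be a good
approximation even at present.) The $\partial^2/\partial x^i\partial x^j$ terms in Eq. (104) give
$$
A = a^2\ddot B - a\dot a\dot B - 2a\ddot a B = a\,\frac{\partial}{\partial t}\left(a^3\frac{\partial}{\partial t}\left(a^{-2}B\right)\right). \tag{105}
$$
In terms of the quantities defined by Eq. (102), this is
$$
\alpha = \frac{\partial}{\partial t}\left(\frac{1}{a}\frac{\partial}{\partial t}\left(a^3\beta\right)\right). \tag{106}
$$
Hence for a given gravitational potential $\psi$, we can calculate $\beta$ by solving
Eq. (103):
$$
3\,\frac{\partial}{\partial t}\left(\frac{1}{a}\frac{\partial}{\partial t}\left(a^3\beta\right)\right) + \nabla^2\beta = \psi \tag{107}
$$
and then use Eq. (106) to find $\alpha$.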
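Now we return to the fractional temperature shift (99). Using Eqs. (101)
and (106) lets us write this as
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm SW,\,Dop}
= -\int_{t_L}^{t_0}dt\left(\frac{\partial^2\beta(r\hat n,t)}{\partial r^2}\right)_{r=s(t)} - a(t_L)\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L}
- \int_{t_L}^{t_0}dt\left[\frac{\partial}{\partial t}\left(\frac{1}{a}\frac{\partial}{\partial t}\left(a^3(t)\,\beta(r\hat n,t)\right)\right)\right]_{r=s(t)}. \tag{108}
$$
To do the first integral here we note that
$$
\left(\frac{\partial^2\beta(r\hat n,t)}{\partial r^2}\right)_{r=s(t)}
= -\frac{d}{dt}\left[a^2(t)\frac{\partial\beta(r\hat n,t)}{\partial t} + a(t)\dot a(t)\beta(r\hat n,t) + a(t)\frac{\partial\beta(r\hat n,t)}{\partial r}\right]_{r=s(t)}
+ \left(a^2(t)\frac{\partial^2\beta(r\hat n,t)}{\partial t^2} + 3a(t)\dot a(t)\frac{\partial\beta(r\hat n,t)}{\partial t} + \left[a(t)\ddot a(t)+\dot a^2(t)\right]\beta(r\hat n,t)\right)_{r=s(t)} \tag{109}
$$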
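Eq. (109) can be verified term by term. In the sketch below, added for illustration, subscripts on $\beta$ denote partial derivatives (a shorthand not used in the text), and the total derivative along the unperturbed ray is $d/dt = \partial_t - a^{-1}\partial_r$, since $ds/dt = -1/a$ by Eq. (94):

```latex
\begin{align*}
\frac{d}{dt}\left[a\,\beta_r\right]
  &= \dot a\,\beta_r + a\,\beta_{rt} - \beta_{rr},\\
\frac{d}{dt}\left[a^2\beta_t\right]
  &= 2a\dot a\,\beta_t + a^2\beta_{tt} - a\,\beta_{tr},\\
\frac{d}{dt}\left[a\dot a\,\beta\right]
  &= \left(a\ddot a+\dot a^2\right)\beta + a\dot a\,\beta_t - \dot a\,\beta_r .
\end{align*}
% Adding the three lines, the beta_r and mixed-derivative terms cancel:
\[
\frac{d}{dt}\left[a^2\beta_t + a\dot a\,\beta + a\,\beta_r\right]
  = a^2\beta_{tt} + 3a\dot a\,\beta_t + \left(a\ddot a+\dot a^2\right)\beta - \beta_{rr}.
\]
```

Solving the last line for $\beta_{rr}$ reproduces Eq. (109).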

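The fractional temperature fluctuation (108) may therefore be written
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm SW,\,Dop} = \left(\frac{\Delta T(\hat n)}{T}\right)_{\rm early} + \left(\frac{\Delta T(\hat n)}{T}\right)_{\rm late} + \left(\frac{\Delta T(\hat n)}{T}\right)_{\rm integrated}, \tag{110}
$$
where
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm early}
= -a^2(t_L)\left(\frac{\partial\beta(r_L\hat n,t)}{\partial t}\right)_{t=t_L} - a(t_L)\dot a(t_L)\,\beta(r_L\hat n,t_L)
- a(t_L)\left(\frac{\partial\beta(r\hat n,t_L)}{\partial r}\right)_{r=r_L} - a(t_L)\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L} \tag{111}
$$
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm late}
= a^2(t_0)\left(\frac{\partial\beta(0,t)}{\partial t}\right)_{t=t_0} + a(t_0)\dot a(t_0)\,\beta(0,t_0) + a(t_0)\left(\frac{\partial\beta(r\hat n,t_0)}{\partial r}\right)_{r=0} \tag{112}
$$
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm integrated}
= -2\int_{t_L}^{t_0}dt\left(a^2(t)\frac{\partial^2\beta(r\hat n,t)}{\partial t^2} + 4a(t)\dot a(t)\frac{\partial\beta(r\hat n,t)}{\partial t} + 2\left[a(t)\ddot a(t)+\dot a^2(t)\right]\beta(r\hat n,t)\right)_{r=s(t)}. \tag{113}
$$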

In evaluating these three contributions to the temperature fluctuation, it
is helpful to note a relation between $\beta$ and the conventionally defined
Newtonian potential $\phi$ that applies not only for a gravitational field dominated
by cold dark matter, but also in the presence of a constant vacuum energy.
Combining Eqs. (26) and (27) gives
$$
\frac{\partial}{\partial t}\left(\frac{1}{4\pi Ga^2\rho_D}\frac{\partial}{\partial t}\left(a^2\psi\right)\right) = \psi. \tag{114}
$$
Taking into account the relation $\rho_D \propto a^{-3}$, an elementary manipulation
then gives
$$
\frac{\partial}{\partial t}\left(\frac{1}{a}\frac{\partial}{\partial t}\left(a^3\psi\right)\right) = \left[4\pi G\rho_D + \frac{d}{dt}\left(\frac{\dot a}{a}\right)\right]a^2\psi \tag{115}
$$
The equations of the Friedmann model give
$$
\frac{d}{dt}\left(\frac{\dot a}{a}\right) = -4\pi G(\rho+p). \tag{116}
$$
A constant vacuum energy density $\rho_V$ is associated with a pressure $p_V =
-\rho_V$, while cold dark matter by definition has zero pressure, so as long as the
gravitational field is dominated by cold dark matter and a constant vacuum
energy, the right-hand side of Eq. (116) is $-4\pi G\rho_D$, and Eq. (115) then gives
$$
\frac{\partial}{\partial t}\left(\frac{1}{a}\frac{\partial}{\partial t}\left(a^3\psi\right)\right) = 0. \tag{117}
$$
Comparing Eq. (106) with Eq. (117), we now see that Eq. (107) has the
solution
$$
\nabla^2\beta = \psi \tag{118}
$$
More specifically, if we define a Newtonian gravitational potential $\phi$ by
Poisson’s equation
$$
a^{-2}\nabla^2\phi = 4\pi G\,\delta\rho_D, \tag{119}
$$
then Eqs. (26) and (118) show that the Newtonian potential is
$$
\phi = -\frac{\partial}{\partial t}\left(a^2\beta\right). \tag{120}
$$
This result is not applicable if the gravitational field receives significant
contributions from a varying vacuum energy, but even in quintessence
theories it is reasonable to assume that a vacuum energy density of any sort
is negligible at and near the time of last scattering. (It certainly must be
much less than the radiation energy density at the time of cosmological
nucleosynthesis, in order to avoid the production of too much helium.) We
have also been relying here on the approximation that the radiation energy
density is much less than the dark matter density at around the time of last
scattering. Therefore the early-time contribution (111) to the temperature
fluctuation can be calculated using the relation (118) and $\psi \propto t^{-1/3}$, which
give $\beta \propto t^{-1/3}$. Since here $a \propto t^{2/3}$, Eq. (120) then gives $\beta = -t\phi/a^2$, with
$\phi$ time-independent. The early-time contribution (111) to the temperature
fluctuation may therefore be expressed as
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm early} = \frac{1}{3}\phi(r_L\hat n) + \frac{t_L}{a(t_L)}\left(\frac{\partial\phi(r\hat n)}{\partial r}\right)_{r=r_L} - a(t_L)\left(\frac{\partial u(r\hat n,t_L)}{\partial r}\right)_{r=r_L}. \tag{121}
$$
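The step from Eq. (111) to Eq. (121) can be spelled out as follows. This is a worked sketch added for illustration, using only $\beta = -t\phi/a^2$ with $\phi$ time-independent and $t\dot a/a = 2/3$ for $a \propto t^{2/3}$:

```latex
\begin{align*}
-a^2\,\frac{\partial\beta}{\partial t}
  &= a^2\,\frac{\partial}{\partial t}\left(\frac{t}{a^2}\right)\phi
   = \left(1-\frac{2t\dot a}{a}\right)\phi
   = -\frac{1}{3}\,\phi, \\
-a\dot a\,\beta
  &= \frac{t\dot a}{a}\,\phi
   = \frac{2}{3}\,\phi, \\
-a\,\frac{\partial\beta}{\partial r}
  &= \frac{t}{a}\,\frac{\partial\phi}{\partial r}.
\end{align*}
```

The first two lines add up to $\phi/3$, the Sachs–Wolfe term, and the third is the gravitationally-induced Doppler term; the velocity-potential term of Eq. (111) carries over unchanged.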
This yields the Sachs–Wolfe temperature shift (68) and the
gravitationally-induced Doppler shift (77) (aside from the terms arising from $r = 0$, about
which more later), as well as the pressure-induced Doppler shift (72). The
famous factor 1/3 in the first term on the right-hand side arises in
‘Newtonian gauge’ as the sum of a gravitational redshift equal to $\phi$, and a term
in the intrinsic temperature fluctuation equal to $-2\phi/3$, while in the
synchronous gauge used here this term is due entirely to the metric
perturbation. It is a curious feature of synchronous gauge that what we have
called the gravitationally-induced Doppler shift also arises from the metric
perturbation.
It is not appropriate to neglect the vacuum energy at $t = t_0$, so it cannot
be ignored in the late-time contribution (112) to the temperature
fluctuation. Therefore in general this contribution is not the same as the $r = 0$
terms in Eqs. (68) and (77). Nevertheless, the terms in the late-time
contribution to the temperature fluctuation are only of zeroth and first order
in $\hat n$ (like the $r = 0$ terms in Eqs. (68) and (77)) so these terms can only
affect the multipole coefficients for $\ell = 0$ and $\ell = 1$.
This leaves the integrated term (113) as the only correction to the results
of Section IV for $\ell \geq 2$. The integrand vanishes if we ignore the vacuum
energy and radiation energy, in which case $a \propto t^{2/3}$ and $\beta \propto t^{-1/3}$, so the
integral receives a contribution only for $t$ near $t_0$, and is therefore expected
to be a small correction[20]. Furthermore, although this integral is fairly
complicated, it has a simple dependence on $\hat n$. In the presence of a vacuum
energy, $\psi(\mathbf{x},t)$ can have a fairly complicated dependence on time, but,
without pressure forces acting on the dark matter, its $\mathbf{x}$ dependence is the same
as we found in the absence of vacuum energy, given by Eqs. (31) and (70)
as
$$
\psi(\mathbf{x},t) = f(t)\int d^3k\;e^{i\mathbf{k}\cdot\mathbf{x}}\,k^2\,\epsilon_k \tag{122}
$$
with $f(t)$ not proportional to $t^{-1/3}$ where the vacuum energy is appreciable.
For a constant vacuum energy, $\beta$ is then given by Eq. (118) as
$$
\beta(\mathbf{x},t) = -f(t)\int d^3k\;e^{i\mathbf{k}\cdot\mathbf{x}}\,\epsilon_k \tag{123}
$$
The “integrated” contribution (113) to the temperature fluctuation then
takes the form
$$
\left(\frac{\Delta T(\hat n)}{T}\right)_{\rm integrated} = 2\int d^3k\int_{t_L}^{t_0}dt\;e^{i\mathbf{k}\cdot\hat n\,s(t)}\,\epsilon_k
\times\left[a^2(t)\ddot f(t) + 4a(t)\dot a(t)\dot f(t) + 2\left(a(t)\ddot a(t)+\dot a^2(t)\right)f(t)\right]. \tag{124}
$$
It can be shown that this makes an additive contribution to $\ell(\ell+1)C_\ell$ that
for large $\ell$ goes as $1/\ell$, with no interference between this contribution to the
temperature fluctuation and the other contributions.[21] For a time-varying
(but spatially constant) vacuum energy the function $\beta(\mathbf{x},t)$ does not satisfy
the relations (118) and (123), but Eq. (107) shows that its spatial Fourier
transform is nevertheless just proportional to $\epsilon_k$ for large $k$, so the integrated
term still makes a contribution to $\ell(\ell+1)C_\ell$ that is proportional to $1/\ell$ for
large $\ell$.

ACKNOWLEDGMENTS

I am grateful for helpful correspondence with E. Bertschinger, J. R.
Bond, L. P. Grishchuk, and M. White. This research was supported in part
by the Robert A. Welch Foundation and NSF Grants PHY-0071512 and
PHY-9511632.

REFERENCES

1. S. Weinberg, “Fluctuations in the Cosmic Microwave Background II:
Cℓ for large and small ℓ,” UTTG-04-01, astro-ph/0103281.

2. E. R. Harrison, Phys. Rev. D1, 2726 (1970); P. J. E. Peebles and J.
T. Yu, Ap. J. 162, 815 (1970); Ya. B. Zel’dovich, Astron. Ap. 5, 84
(1970).

3. S. Hawking, Phys. Lett. 115B, 295 (1982); A. A. Starobinsky, Phys.
Lett. 117B, 175 (1982); A. Guth and S.-Y. Pi, Phys. Rev. Lett. 49,
1110 (1982); J. M. Bardeen, P. J. Steinhardt, and M. S. Turner, Phys.
Rev. D28, 679 (1983); W. Fischler, B. Ratra, and L. Susskind, Nucl.
Phys. B259, 730 (1985).

4. W. Hu, U. Seljak, M. White, and M. Zaldarriaga, Phys. Rev. D57,
3290 (1998), and earlier references cited therein. For analysis of recent
observations, see J. R. Bond et al., astro-ph/0011378 (2000).

5. P. J. E. Peebles and J. T. Yu, ref. 2; J. R. Bond and G. Efstathiou,
Ap. J. Lett. 285, L45 (1984); Mon. Not. Roy. Astron. Soc. 226, 655
(1987); A. G. Doroshkevich, Sov. Astron. Lett. 14, 125 (1988); F.
Atrio-Barandela, A. G. Doroshkevich, and A. A. Klypin, Astrophys. J.
378, 1 (1991); P. Naselsky and I. Novikov, Ap. J. 413, 14 (1993); H. E.
Jørgensen, E. Kotok, P. Naselsky, and I. Novikov, Astron. Astrophys.
294, 639 (1995); C-P. Ma and E. Bertschinger, Ap. J. 455, 7 (1995).
For comments on some of these articles, see reference [6]. The most
up-to-date and comprehensive calculation is that of W. Hu and N.
Sugiyama, Ap. J. 444, 489 (1995); 471, 542 (1996), but they do not
collect their results into a single formula for the temperature shift, so
it is not easy to compare their results with those of the present paper.

6. Several authors have given approximate formulas for the temperature
fluctuation that fit the general form of Eq. (1). In particular, the
results given in Eqs. (8) and (9) can be obtained from Eq. (1) of Naselsky
and Novikov, ref. 5, by applying some corrections: the factor $(1 + \nu)$
should be omitted in their definition of $\xi$ (in their notation, $\epsilon$); the
factor $(1 + z)^{-1}$ should be omitted in their definition of their parameter
$\omega$, so that their $\omega$ equals $\xi$; and a factor $\omega$ should be included in the
last term of the numerator in the argument of the logarithm of their
Eq. (2); and the damping factor $\exp(-k^2d_D^2)$ and terms proportional
to $\xi/k^2t_L^2$ should be inserted. This paper did not give the derivation
of their Eq. (1), but quoted Doroshkevich, ref. 5, despite the fact that
their result was quite different from that of Doroshkevich. There was
an obvious misprint in the formula given by Doroshkevich, but this
was corrected in the later paper by Atrio-Barandela, Doroshkevich,
and Klypin, ref. 5. The formula in this paper includes the terms
proportional to $\xi/k^2t_L^2$, which had been omitted by Naselsky and Novikov,
but omitted the factors of $(1 + \xi)$ that had been included by Naselsky
and Novikov, and also included a spurious term in the analog
of $G(k)$ (the 1 in the numerator of the second line of their Eq. (4)).
This paper gave differential equations for the time development of the
perturbations, but did not explain how they were used to calculate
the temperature fluctuation. A few years later the formula given by
Naselsky and Novikov was repeated by Jørgensen, Kotok, Naselsky,
and Novikov, ref. 5, and the differential equations on which the
formula was based were given. However, once again the derivation of the
formula from these equations was not explained, and an overly
restrictive lower bound on $k$ was given for the validity of the formula, that $k$
must be larger than the inverse conformal time. If this condition were
really necessary, then the formula would not be applicable at the first
Doppler peak. We will see in Section III that the lower bound on $k$ is
actually less restrictive, and in particular disappears for $\xi \ll 1$.

7. S. Weinberg, Gravitation and Cosmology (Wiley, New York, 1972),
Eq. (15.10.13). In conformity with common present notation, the
symbol R(t) used in this book for the Robertson–Walker scale factor has
been replaced in the present paper with a(t).

8. S. Weinberg, ref. 7, Eq. (15.10.52).

9. S. Weinberg, ref. 7, Eq. (15.10.51).

10. S. Weinberg, ref. 7, Eq. (15.10.53). A factor T was missing in the
second term in the curly brackets in Eq. (15.10.53), and has been
supplied here.

11. These formulas are obtained by comparing the acoustic damping rate
calculated by N. Kaiser, Mon. Not. Roy. Astr. Soc. 202, 1169 (1983)
with the damping rate calculated for arbitrary values of η and χ by S.
Weinberg, Ap. J. 168, 175 (1971), Eq. (4.15). The latter article also
gives values for χ and η, repeated in ref. 7: it gives the same value of
χ and a value for η that is 3/4 the value quoted in Eq. (55), but these
results were based on calculations of L. H. Thomas, Quart. J. Math.
(Oxford) 1, 239 (1930), that assumed isotropic scattering and ignored
photon polarization. (The same value for η had been given by C.
Misner, Ap. J. 151, 431 (1968).) Kaiser’s results are calculated using
the correct differential cross section for Thomson scattering and take
photon polarization into account, and therefore supersede the earlier
value quoted for η. As late as 1995 the wrong value of the damping
rate was still being used, for instance by Hu and Sugiyama, ref. 5, but
the correct rate was used by Hu and White, Ap. J. 479, 568 (1997).

12. J. A. Peacock, Cosmological Physics (Cambridge University Press,
Cambridge, UK 1999), p. 591.

13. The derivation is given for instance in S. Weinberg, ref. 7, Eq. (15.9.13).
The presence of the second term on the left-hand side has as a con-
sequence the well known decay ∝ 1/a(t) of the peculiar velocities of
non-relativistic free particles. The factor 1/a(t) multiplying the gra-
dient of the potential enters again to convert a derivative with respect
to co-moving coordinates into a derivative with respect to coordinates
that measure proper distances.

14. Hu and Sugiyama, ref. 5; Hu and White, ref. 11.

15. A. Dimitropoulos and L. P. Grishchuk, gr-qc/001087.

16. See, e. g., P. H. Frampton, Y. J. Ng, and R. Rohm, Mod. Phys. Lett.
13, 2541 (1998).

17. W. Hu and M. White, ref. 11.

18. R. K. Sachs and A. M. Wolfe, Ap. J. 147, 73 (1967).

19. S. Weinberg, ref. 7, Eqs. (15.10.29) and (15.1.19). (A misprint has
been corrected here: the equals sign in the first line of Eq. (15.10.29)
has been changed to a minus sign.)

20. L. A. Kofman and A. A. Starobinskii, Sov. Astron. Lett. 11, 271
(1985).

21. The 1/ℓ dependence was found in reference [20], but without consid-
eration of a possible interference between this effect and the Doppler
shift and intrinsic temperature shift.

UTTG-04-01
arXiv:astro-ph/0103281v2 16 Aug 2001

Fluctuations in the Cosmic Microwave Background II:
Cℓ at Large and Small ℓ

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

General asymptotic formulas are given for the coefficient Cℓ of the term of
multipole number ℓ in the temperature correlation function of the cosmic mi-
crowave background, in terms of scalar and dipole form factors introduced in
a companion paper. The formulas apply in two overlapping limits: for ℓ ≫ 1
and for ℓd/dA ≪ 1 (where dA is the angular diameter distance of the surface
of last scattering, and d is a length, of the order of the acoustic horizon at
the time of last scattering, that characterizes acoustic oscillations before this
time.) The frequently used approximation that Cℓ receives its main con-
tribution from wave numbers of order ℓ/dA is found to be less accurate for
the contribution of the Doppler effect than for the Sachs–Wolfe effect and
intrinsic temperature fluctuations. For ℓd/dA ≪ 1 and ℓ ≥ 2, the growth of
Cℓ with ℓ is shown to be affected by acoustic oscillation wave numbers of all
scales. The asymptotic formulas are applied to a model of acoustic oscilla-
tions before the time of last scattering, with results in reasonable agreement
with more elaborate computer calculations.


∗ Electronic address: weinberg@physics.utexas.edu

I. INTRODUCTION

A companion paper[1] has shown how to express the temperature fluctuation
in the cosmic microwave background in any direction as an integral
involving scalar and dipole form factors F (k) and G(k), which characterize
acoustic oscillations before the time of last scattering. In the present paper
we derive asymptotic formulas for the strength Cℓ of fluctuations at multipole
number ℓ for form factors of arbitrary functional form. After outlining
our assumptions and reviewing some generalities in Section II, our general
result in the limit of ℓ ≫ 1 [Eq. (26)] is derived in Section III. In this limit
ℓ(ℓ + 1)Cℓ depends on ℓ and the angular diameter distance dA at the time of
last scattering only through the ratio ℓ/dA . (This is why the heights of the
Doppler peaks do not depend on parameters like the cosmological constant
that affect dA but not the form factors.) Our result in the limit ℓd/dA ≪ 1
[Eq. (43)] is derived in Section IV. (Here d is some length characterizing
acoustic oscillations, such as the acoustic horizon distance dH at the time of
last scattering). These ranges of ℓ overlap because dA ≫ d.
Even without a detailed calculation of the form factors, these results have
a moral for the physical interpretation of measurements of Cℓ . It is common
to interpret these measurements by supposing that Cℓ arises mostly from
fluctuations of wave number k ≃ ℓ/dA . Eq. (27) shows that this is a fair
approximation for the contribution of the scalar form factor F (k), which
represents the Sachs–Wolfe effect and intrinsic temperature fluctuations; Cℓ
receives no contribution from F (k) with k < ℓ/dA , and the contribution from
k ≫ ℓ/dA is suppressed by a factor $\beta^{-2}(\beta^2-1)^{-1/2}$, where β ≡ kdA/ℓ. In
particular, a peak in the magnitude of the scalar form factor F (k) at some
wave number k1 (like the peak found in the simple model studied in reference
[1] at k = π/dH ) will show up in ℓ(ℓ + 1)Cℓ at a value of ℓ less than but close
to k1 dA . For instance, we will see in Section V that the peak in |F (k)| at
k = π/dH produces a peak in ℓ(ℓ + 1)Cℓ at ℓ ≃ 2.6dA /dH rather than at
πdA /dH . But Eq. (27) also shows that this interpretation of Cℓ is much less
useful for the contribution of the vector form factor G(k), which arises from
the Doppler effect; Cℓ also receives no contribution from G(k) with k < ℓ/dA ,
but instead of the contribution from k ≃ ℓ/dA being enhanced by a factor
$(\beta^2-1)^{-1/2}$, it is suppressed by a factor $(\beta^2-1)^{1/2}$. Indeed, we will see
in Section V that for sufficiently small baryon number the peak in G(k) at
k = π/2dH found in the simple model of reference [1] does show up as a peak
in ℓ(ℓ+1)Cℓ, but at ℓ ≃ 0.45dA/dH, much less than (π/2)dA/dH. Furthermore,
the behavior of ℓ(ℓ + 1)Cℓ for ℓd/dA near zero depends on the values of F (k)
and G(k) for all k. This points up the value of observations that can measure
the correlation function of temperature fluctuations directly, as a supplement
to measurements of Cℓ .
The results obtained in Sections III and IV are used in Section V to
calculate Cℓ for the approximate form factors calculated in reference 1. In
agreement with what is found in more accurate computer calculations, the
position ℓ1 of the first Doppler peak is not a sensitive function of the baryon
density parameter ΩB h². On the other hand, we find that the ratio of the
value of ℓ(ℓ + 1)Cℓ at the first Doppler peak to its value at ℓ ≪ dA/dH is a
sensitive indicator of the value of ΩB h².

II. GENERALITIES

The companion paper[1] shows that, in very general models (but assum-
ing only compressional normal modes, with no gravitational radiation), the
fractional variation from the mean of the cosmic microwave background tem-
perature observed in a direction n̂ takes the general form

$$\frac{\Delta T(\hat n)}{T} = \int d^3k\; \epsilon_{\mathbf k}\, e^{i d_A \hat n\cdot\mathbf k}\left[F(k) + i\,\hat n\cdot\hat k\; G(k)\right], \tag{1}$$
aside from effects arising from late times, which chiefly affect the coefficients
Cℓ for relatively small ℓ. Here dA is the angular diameter distance of the
surface of last scattering
$$d_A = \frac{1}{\Omega_C^{1/2} H_0 (1+z_L)}\,\sinh\left[\Omega_C^{1/2}\int_{\frac{1}{1+z_L}}^{1}\frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_C x^2 + \Omega_M x}}\right], \tag{2}$$

where ΩC ≡ 1−ΩΛ−ΩM, and ΩΛ and ΩM are the present ratios of the energy
densities of the vacuum and matter to the critical density $3H_0^2/8\pi G$. (If the
vacuum energy were to change with time, as in theories of quintessence, then
the formula for dA would need modification, but there would be essentially
no change in the other ingredients in Eq. (1), as long as the quintessence
energy density makes a negligible contribution to the total energy density
at and before the time of last scattering.) Also, $k^2\epsilon_{\mathbf k}$ is proportional (with
a k-independent proportionality coefficient) to the Fourier transform of the
fractional perturbation in the energy density early in the radiation-dominated
era. The average∗∗ of the product of two ε's is assumed to satisfy the conditions
of statistical homogeneity and isotropy:

$$\langle \epsilon_{\mathbf k}\,\epsilon_{\mathbf k'}\rangle = \delta^3(\mathbf k + \mathbf k')\,P(k) \tag{3}$$

with k ≡ |k|. The power spectral function P(k) is real and positive. Where
a specific expression for P(k) is needed, we will use the ‘scale-invariant’ (or
n = 1) Harrison–Zel’dovich form suggested by theories of new inflation:

$$P(k) = B\,k^{-3}, \tag{4}$$

with B a constant that must be taken from observations of the cosmic mi-
crowave background or condensed object mass distributions, or from detailed
theories of inflation.
The form factors F (k) and G(k) characterize acoustic oscillations, with
F (k) arising from the Sachs–Wolfe effect and intrinsic temperature fluctu-
ations, and G(k) arising from the Doppler effect. For instance, they are
calculated in reference 1 in the approximation that perturbations in the
gravitational field at and before the time of last scattering arise entirely
from perturbations in the density of cold dark matter. For very small wave
numbers the form factors are

$$F(k) \to 1 - 3k^2t_L^2/2 - 3\left[-\xi^{-1} + \xi^{-2}\ln(1+\xi)\right]k^4t_L^4/4 + \dots, \tag{5}$$

$$G(k) \to 3kt_L - 3k^3t_L^3/2(1+\xi) + \dots, \tag{6}$$

while for wave numbers large enough to allow the use of the WKB approxi-
mation, i. e.,
$$kt_L > \xi \tag{7}$$
the form factors are
$$F(k) = \left(1 - 2\xi/k^2t_L^2\right)^{-1}\left[-3\xi + 2\xi/k^2t_L^2 + (1+\xi)^{-1/4}\,e^{-k^2d_D^2}\cos(kd_H)\right], \tag{8}$$
and
$$G(k) = \sqrt{3}\,\left(1 - 2\xi/k^2t_L^2\right)^{-1}(1+\xi)^{-3/4}\,e^{-k^2d_D^2}\sin(kd_H). \tag{9}$$
∗∗ The average here is over an ensemble of possible fluctuations. Using Eq. (3) to analyze
the particular element of this sample observed in our universe relies on ergodic arguments,
which are not exact except in the limit ℓ → ∞. However, corrections are manageable[2]
even for small ℓ.

Here tL is the time of last scattering; ξ is 3/4 the ratio of the baryon to
photon energy densities at this time:
$$\xi = \left(\frac{3\rho_B}{4\rho_\gamma}\right)_{t=t_L} = 27\,\Omega_B h^2; \tag{10}$$
dH is the acoustic horizon size at this time, and dD is a damping length,
given by Eq. (48). These formulas for the form factors are mentioned at
this point only for illustration; we will be working here with general form
factors F (k) and G(k), and will not make use of the specific formulas (5)–
(10) until Section V. But we will assume throughout that any lengths d that
(like dH and dD in Eqs. (8) and (9)) characterize the k-dependence of the
form factors are much smaller than the angular diameter distance dA of last
scattering. This is a good approximation: for instance, if the ratios of matter
and vacuum energy densities to the critical density have the present values
ΩM = 0.3 and ΩΛ = 0.7, then dA /dH runs from 91.7 to 79.7 for values of
ΩB h² running from zero to 0.03, and dD is smaller than dH, independent of
the value of H0 .
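
As an illustration of Eqs. (8)–(10), the following minimal Python sketch evaluates the WKB form factors. The function names and the argument conventions (all lengths and times in the same arbitrary units) are assumptions made here for illustration; they are not taken from the companion paper.

```python
import math

def xi_from_baryons(omega_b_h2):
    """Eq. (10): xi = 27 * Omega_B h^2 at the time of last scattering."""
    return 27.0 * omega_b_h2

def wkb_form_factors(k, t_L, d_H, d_D, xi):
    """Return (F, G) from Eqs. (8) and (9).

    Only meaningful in the WKB regime k * t_L > xi of Eq. (7).
    """
    r = 2.0 * xi / (k * t_L) ** 2           # the ratio 2 xi / (k^2 t_L^2)
    damping = math.exp(-(k * d_D) ** 2)      # exp(-k^2 d_D^2)
    prefactor = 1.0 / (1.0 - r)
    F = prefactor * (-3.0 * xi + r
                     + (1.0 + xi) ** -0.25 * damping * math.cos(k * d_H))
    G = (math.sqrt(3.0) * prefactor
         * (1.0 + xi) ** -0.75 * damping * math.sin(k * d_H))
    return F, G
```

For ξ = 0 and dD = 0 the functions reduce to F = cos(k dH) and G = √3 sin(k dH), which is a convenient check of the implementation.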
It is usual to employ the well-known expansion of a plane wave in Legendre
polynomials, and write Eq. (1) as

$$\frac{\Delta T(\hat n)}{T} = \sum_{\ell=0}^{\infty}(2\ell+1)\,i^\ell\int d^3k\;\epsilon_{\mathbf k}\,P_\ell(\hat n\cdot\hat k)\left[j_\ell(kd_A)\,F(k) + j_\ell'(kd_A)\,G(k)\right]. \tag{11}$$
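
For reference, the step from Eq. (1) to Eq. (11) rests on the Rayleigh plane-wave expansion; the dipole term follows by differentiating that expansion with respect to kdA. Written out (a standard identity, stated here for convenience rather than quoted from the paper):

```latex
e^{i d_A \hat n\cdot\mathbf k}
  = \sum_{\ell=0}^{\infty} (2\ell+1)\, i^\ell\, j_\ell(k d_A)\, P_\ell(\hat n\cdot\hat k),
\qquad
i\,\hat n\cdot\hat k\; e^{i d_A \hat n\cdot\mathbf k}
  = \frac{\partial}{\partial (k d_A)}\, e^{i d_A \hat n\cdot\mathbf k}
  = \sum_{\ell=0}^{\infty} (2\ell+1)\, i^\ell\, j_\ell'(k d_A)\, P_\ell(\hat n\cdot\hat k).
```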

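Using Eq. (3) and the orthogonality property of Legendre polynomials

$$\int d\Omega_{\hat k}\; P_\ell(\hat n\cdot\hat k)\,P_{\ell'}(\hat n'\cdot\hat k) = \frac{4\pi}{2\ell+1}\,\delta_{\ell\ell'}\,P_\ell(\hat n\cdot\hat n'), \tag{12}$$
one finds that

$$\left\langle\frac{\Delta T(\hat n)}{T}\,\frac{\Delta T(\hat n')}{T}\right\rangle = \sum_{\ell=0}^{\infty}\left(\frac{2\ell+1}{4\pi}\right)C_\ell\,P_\ell(\hat n\cdot\hat n'), \tag{13}$$

with the conventional coefficient Cℓ taking the value

$$C_\ell = 16\pi^2\int_0^\infty P(k)\,k^2\,dk\,\left[j_\ell(kd_A)\,F(k) + j_\ell'(kd_A)\,G(k)\right]^2. \tag{14}$$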
This familiar formula is adequate for numerical calculation of Cℓ , but it hides
the essential qualitative aspects of the dependence of Cℓ on ℓ: that Cℓ for
ℓ ≫ 1 depends on the ratio ℓ/dA , and that ℓ(ℓ + 1)Cℓ approaches a constant
for sufficiently small values of this ratio, whether ℓ itself is large or small. To
obtain these results, we must now distinguish between the two cases ℓ ≫ 1
and ℓ ≪ dA /d (but ℓ ≥ 2), where d is a typical length characteristic of the
form-factors F (k) and G(k). These two cases overlap because, as remarked
above, dA is much larger than d.

III. LARGE ℓ

The usual way of obtaining the contribution of the scalar form factor to
Cℓ for large ℓ is to note that the integral (14) receives its largest contribution
when the argument of the spherical Bessel function is of order ℓ, in which
case we can use the approximation that, for ℓ → ∞,
$$j_\ell(z) \to \begin{cases} 0 & z<\nu \\ z^{-1/2}(z^2-\nu^2)^{-1/4}\cos\left(\sqrt{z^2-\nu^2} - \nu\arccos(\nu/z) - \frac{\pi}{4}\right) & z>\nu, \end{cases} \tag{15}$$
where z/ν is held fixed at a value ≠ 1, with ν ≡ ℓ + 1/2. The procedure is
straightforward for the F² terms in Eq. (14), but for the FG and G² terms
involving the Doppler effect we run into a difficulty: differentiating the factor
$(z^2-\nu^2)^{-1/4}$ in Eq. (15) yields larger negative powers of $z^2-\nu^2$ that introduce
divergences from the part of the integral in Eq. (14) near the lower bound
k = ν/dA . These infrared divergences are spurious, because the asymptotic
formula (15) breaks down if we let z and ν go to infinity in such a way
that z/ν → 1. This problem can be dealt with by switching to a different
asymptotic limit[3] for k near ν/dA . Here we will use a different method[4]
which avoids the delicate problem of the asymptotic behavior of jℓ (z) and
jℓ′ (z) for z near ν.
We return to Eq. (1), and use Eq. (3) to put the correlation function of
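observed temperature fluctuations in the form
$$\left\langle\frac{\Delta T(\hat n)}{T}\,\frac{\Delta T(\hat n')}{T}\right\rangle = \int d^3k\;P(k)\,\exp\left(id_A\mathbf k\cdot(\hat n-\hat n')\right)\Big[F^2(k) + i\hat k\cdot(\hat n-\hat n')\,F(k)\,G(k) + (\hat k\cdot\hat n)(\hat k\cdot\hat n')\,G^2(k)\Big]. \tag{16}$$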
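The integral over the direction of k is easy, and gives the correlation function
$$\left\langle\frac{\Delta T(\hat n)}{T}\,\frac{\Delta T(\hat n')}{T}\right\rangle = 4\pi\int_0^\infty k^2\,dk\;P(k)\left[F^2(k) + F(k)\,G(k)\,\frac{\partial}{\partial(d_Ak)} + G^2(k)\left(\frac12\left(1-\frac{\theta^2}{4}\right) + \left(\frac{1}{2\theta^2}+\frac18\right)\frac{\partial^2}{\partial(d_Ak)^2}\right)\right]\frac{\sin(d_Ak\theta)}{d_Ak\theta}, \tag{17}$$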
where θ ≡ |n̂ − n̂′ |. (This formula may prove useful in analyzing observations
that give the correlation function directly, rather than in terms of Cℓ .) The
amplitude Cℓ is defined as the integral
$$C_\ell = 2\pi\int_{-1}^{+1}P_\ell(\mu)\,d\mu\,\left\langle\frac{\Delta T(\hat n)}{T}\,\frac{\Delta T(\hat n')}{T}\right\rangle, \tag{18}$$
where µ ≡ n̂ · n̂′ = 1 − θ²/2. For large ℓ the Legendre polynomial Pℓ(µ)
oscillates rapidly for θ ≫ 1/ℓ, so the integral is dominated by values of θ
of order 1/ℓ, in which case we can use the well-known limiting expression
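Pℓ(µ) → J0(ℓθ), and write
$$C_\ell \to 8\pi^2\int_0^\infty k^2\,dk\;P(k)\int_0^2 J_0(\ell\theta)\,\theta\,d\theta\left[F^2(k) + F(k)\,G(k)\,\frac{\partial}{\partial(d_Ak)} + G^2(k)\left(\frac12\left(1-\frac{\theta^2}{4}\right) + \left(\frac{1}{2\theta^2}+\frac18\right)\frac{\partial^2}{\partial(d_Ak)^2}\right)\right]\frac{\sin(d_Ak\theta)}{d_Ak\theta}. \tag{19}$$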
The integral over k is dominated by values for which kdA θ is of order unity,
so the derivative ∂/∂(dA k) is effectively of order θ ≈ 1/ℓ. Thus to leading
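order in 1/ℓ, Eq. (19) may be simplified to
$$C_\ell \to 8\pi^2\int_0^\infty k^2\,dk\;P(k)\int_0^2 J_0(\ell\theta)\,\theta\,d\theta\;\left[F^2(k) + \frac12\,G^2(k)\left(1+\frac{1}{\theta^2}\frac{\partial^2}{\partial(d_Ak)^2}\right)\right]\frac{\sin(d_Ak\theta)}{d_Ak\theta}. \tag{20}$$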
Introducing a new variable s ≡ ℓθ and changing the upper limit on the
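s-integral from 2ℓ to infinity, we may write this as
$$C_\ell \to \frac{8\pi^2}{\ell^2}\int_0^\infty k^2\,dk\;P(k)\int_0^\infty J_0(s)\,s\,ds\;\left[F^2(k) + \frac12\,G^2(k)\left(1+\frac{\partial^2}{\partial(d_Aks/\ell)^2}\right)\right]\frac{\sin(d_Aks/\ell)}{(d_Aks/\ell)}. \tag{21}$$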
The integral over s is easy for the F² term; we need only use the formula[5]:
$$\int_0^\infty J_0(s)\,\sin(\beta s)\,ds = \begin{cases} 0 & \beta<1 \\ (\beta^2-1)^{-1/2} & \beta>1, \end{cases} \tag{22}$$

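where here β = dA k/ℓ. The integral of the G² term takes a little more work.
We use the formula (1 + d²/dx²) sin x/x = −(2/x) d/dx (sin x/x) and do the
remaining integral by parts, so that
$$\int_0^\infty J_0(s)\,s\left[1+\frac{\partial^2}{\partial(\beta s)^2}\right]\frac{\sin(\beta s)}{\beta s}\,ds = -\frac{2}{\beta^3}\int_0^\infty J_0(s)\,\frac{\partial}{\partial s}\frac{\sin(\beta s)}{s}\,ds = \frac{2}{\beta^2} - \frac{1}{\beta^3}\int_0^\infty\Big[J_2(s)+J_0(s)\Big]\sin(\beta s)\,ds. \tag{23}$$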
Here we also need the formula[5]
$$\int_0^\infty J_2(s)\,\sin(\beta s)\,ds = \begin{cases} 2\beta & \beta<1 \\ -(\beta^2-1)^{-1/2}\left(\beta+\sqrt{\beta^2-1}\right)^{-2} & \beta>1, \end{cases} \tag{24}$$
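so that
$$\int_0^\infty J_0(s)\,s\left[1+\frac{\partial^2}{\partial(\beta s)^2}\right]\frac{\sin(\beta s)}{\beta s}\,ds = \begin{cases} 0 & \beta<1 \\ 2\beta^{-3}\sqrt{\beta^2-1} & \beta>1. \end{cases} \tag{25}$$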
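
The algebra behind Eq. (25) can be spelled out as follows. This is a worked check, assuming the β > 1 value of the J2 integral is $-(\beta^2-1)^{-1/2}(\beta+\sqrt{\beta^2-1})^{-2}$, the tabulated form for a Bessel function of order 2; the shorthand r is introduced here only for this check.

```latex
% beta > 1, with r \equiv \sqrt{\beta^2-1}, so that (\beta + r)^{-1} = \beta - r:
\int_0^\infty \left[J_2(s) + J_0(s)\right]\sin(\beta s)\,ds
  = \frac{1 - (\beta - r)^2}{r}
  = \frac{2r(\beta - r)}{r}
  = 2(\beta - r),
\qquad
\frac{2}{\beta^2} - \frac{2(\beta - r)}{\beta^3}
  = \frac{2\sqrt{\beta^2-1}}{\beta^3}.
% beta < 1:
\frac{2}{\beta^2} - \frac{1}{\beta^3}\left(2\beta + 0\right) = 0 .
```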

Using Eqs. (22) and (25) in Eq. (21) then gives our final general formula for
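Cℓ at large ℓ:
$$C_\ell \to \frac{8\pi^2\ell}{d_A^3}\int_1^\infty d\beta\;P(\ell\beta/d_A)\left[\frac{\beta\,F^2(\ell\beta/d_A)}{\sqrt{\beta^2-1}} + \frac{\sqrt{\beta^2-1}\;G^2(\ell\beta/d_A)}{\beta}\right]. \tag{26}$$

Note that ℓ²Cℓ depends on ℓ and dA only through its dependence on the ratio
ℓ/dA.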
For instance, if we take the power spectral function to have the scale-invariant
form $P(k) = Bk^{-3}$, then for ℓ ≫ 1
$$\ell(\ell+1)C_\ell \to 8\pi^2B\int_1^\infty d\beta\left[\frac{F^2(\ell\beta/d_A)}{\beta^2\sqrt{\beta^2-1}} + \frac{\sqrt{\beta^2-1}\;G^2(\ell\beta/d_A)}{\beta^4}\right]. \tag{27}$$

(We have taken advantage of the fact that here we are considering ℓ ≫ 1
to change a factor ℓ² to ℓ(ℓ + 1), in order to facilitate comparison with the
results of the next section.) The rapid fall-off of the coefficient of F² for
β > 1 suggests that the contribution of the scalar form factor F to Cℓ is
dominated by wave numbers close to ℓ/dA, as is usually assumed. On the
other hand, the contribution of the dipole form factor G(k) for wave numbers
immediately above ℓ/dA is actually suppressed by the factor $\sqrt{\beta^2-1}$ in the
second term of Eq. (27).
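
A minimal numerical sketch of Eq. (27) is given below. It assumes the substitution β = 1/cos t, which is a choice made here (not in the paper) to remove the square-root singularity at β = 1; the function name and the midpoint-rule grid are likewise illustrative assumptions.

```python
import math

def large_ell_spectrum(F, G, ell_over_dA, n=2000):
    """Approximate l(l+1) C_l / (8 pi^2 B) from Eq. (27).

    With beta = 1/cos(t), Eq. (27) becomes
        integral_0^{pi/2} cos(t) * [F(k)^2 + sin(t)^2 * G(k)^2] dt,
    with k = ell_over_dA / cos(t).  A midpoint rule is used, so the
    endpoint t = pi/2 (k -> infinity) is never evaluated.
    """
    h = 0.5 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        c, s = math.cos(t), math.sin(t)
        k = ell_over_dA / c
        total += c * (F(k) ** 2 + (s * G(k)) ** 2)
    return total * h
```

With constant form factors the integral can be done by hand: F = 1, G = 0 gives 1, in line with the zeroth-order result of Section IV, and F = 0, G = 1 gives 1/3. For oscillating form factors such as those of Eqs. (8) and (9), the region near t = π/2 oscillates rapidly; the damping factor suppresses it, but a finer grid may still be needed.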

IV. SMALL ℓ d/dA

Here we will adopt the ‘n = 1’ scale-invariant spectrum $P(k) \simeq Bk^{-3}$
from the beginning, so that the general formula Eq. (14) becomes
$$C_\ell = 16\pi^2B\int_0^\infty\left[j_\ell(s)\,F\left(\frac{s}{d_A}\right) + j_\ell'(s)\,G\left(\frac{s}{d_A}\right)\right]^2\frac{ds}{s}. \tag{28}$$
To generate a series for ℓ(ℓ + 1)Cℓ in powers of ℓ/dA we expand the form
factors in power series:

$$F(k) = F_0 + F_2k^2 + \cdots, \qquad G(k) = G_1k + G_3k^3 + \cdots. \tag{29}$$

(The power series for F and G must be respectively even and odd in k, in
order that the integrand in the temperature fluctuation (1) should be analytic
in the three-vector k at k = 0.) The leading term in Cℓ is well known; using
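a standard formula[6]:
$$\int_0^\infty j_\ell^2(s)\,s^{m-1}\,ds = \frac{2^{m-3}\,\pi\,\Gamma(2-m)\,\Gamma\left(\ell+\frac m2\right)}{\Gamma^2\left(\frac{3-m}{2}\right)\Gamma\left(\ell+2-\frac m2\right)}, \tag{30}$$

we find the term in Eq. (28) of zeroth order in 1/dA:

$$C_\ell^{(0)} = \frac{8\pi^2BF_0^2}{\ell(\ell+1)}. \tag{31}$$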
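
The step from Eq. (30) to Eq. (31) can be checked numerically with the standard-library gamma function. This is a small sketch; the function names are assumptions made here for illustration.

```python
import math

def bessel_sq_moment(ell, m):
    """Right-hand side of Eq. (30): integral of j_ell(s)^2 s^(m-1) ds over (0, inf)."""
    return (2.0 ** (m - 3) * math.pi * math.gamma(2 - m) * math.gamma(ell + m / 2)
            / (math.gamma((3 - m) / 2) ** 2 * math.gamma(ell + 2 - m / 2)))

def c_ell_zeroth_order(ell, B, F0):
    """Eq. (31): Eq. (28) with F -> F0 and G -> 0, using Eq. (30) at m = 0."""
    return 16.0 * math.pi ** 2 * B * F0 ** 2 * bessel_sq_moment(ell, 0)
```

At m = 0 the gamma functions collapse to 1/[2ℓ(ℓ + 1)], so ℓ(ℓ + 1) times `c_ell_zeroth_order(ell, B, F0)` equals 8π²B F0² for every ℓ ≥ 1.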

There is no difficulty in also calculating the term in Eq. (28) of first order in
1/dA:

$$C_\ell^{(1)} = \left(\frac{32\pi^2BF_0G_1}{d_A}\right)\int_0^\infty j_\ell(s)\,j_\ell'(s)\,ds = \left(\frac{16\pi^2BF_0G_1}{d_A}\right)\Big[j_\ell^2(s)\Big]_0^\infty = 0. \tag{32}$$

But we run into trouble in calculating the term of second order in 1/dA . The
second derivative of Cℓ with respect to 1/dA is

$$\frac{d^2C_\ell}{d(1/d_A)^2} = 16\pi^2B\int_0^\infty\Big\{j_\ell^2(s)\,\left[F^2\right]''(s/d_A) + j_\ell'^2(s)\,\left[G^2\right]''(s/d_A) + 2j_\ell(s)\,j_\ell'(s)\,\Big[F(s/d_A)\,G(s/d_A)\Big]''\Big\}\,s\,ds. \tag{33}$$

The jℓ jℓ′ term doesn’t contribute to the part of Cℓ of second order in 1/dA,
because F(k)G(k) contains only odd powers of k. To calculate the contribution
of the jℓ′² term, we need to supplement Eq. (30) with the additional
formula:†
$$\int_0^\infty j_\ell'^2(s)\,s^{m-1}\,ds = \frac{2^{m-3}\,\pi\,\Gamma(2-m)\,\Gamma\left(\ell+\frac m2\right)}{\Gamma^2\left(\frac{3-m}{2}\right)\Gamma\left(\ell+2-\frac m2\right)}\times\left[1 + \frac{(m-3)(m-2)\Big[(m-2)(m-3)-2\ell(\ell+1)\Big]}{2(3-m)^2\left(\ell+\frac m2-1\right)\left(\ell-\frac m2+2\right)}\right]. \tag{34}$$
The second derivative (33) is divergent at 1/dA = 0, as shown by the factors
Γ(2 − m) in Eqs. (30) and (34), which become infinite for m = 2. Of course,
there is no infinity in Cℓ ; it is simply not analytic in 1/dA at 1/dA = 0.
We can deal with this problem by a method similar to the dimensional
regularization technique used in quantum field theory[7]. We treat m as a
complex variable that approaches m = 2. In this limit, Eqs. (30) and (34)
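give

$$\int_0^\infty j_\ell^2(s)\,s^{m-1}\,ds \to -\frac12\left[\frac{1}{m-2} + \sum_{r=1}^{\ell}\frac1r - C + \ln 2 - D\right], \tag{35}$$

$$\int_0^\infty j_\ell'^2(s)\,s^{m-1}\,ds \to -\frac12\left[\frac{1}{m-2} + \sum_{r=1}^{\ell}\frac1r - C + \ln 2 - D + 1\right], \tag{36}$$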
where C is the Euler constant C ≡ −Γ′(1) = 0.57722, and D ≡ −Γ′(1/2)/Γ(1/2) =
1.96351. The important point here is that the parts of the integrals (35) and
(36) that are divergent at m = 2 are independent of ℓ, and so also is the part
of Cℓ that is non-analytic in 1/dA at 1/dA = 0. Using Eqs. (29), (35) and
(36) in Eq. (33) thus gives the part of Cℓ that is of second order in 1/dA as

$$C_\ell^{(2)} = -8\pi^2B\,d_A^{-2}\left(2F_0F_2 + G_1^2\right)\sum_{r=1}^{\ell}\frac1r + \ell\text{-independent terms}. \tag{37}$$

† This formula was obtained by using the Bessel differential equation to show that
$j_\ell'^2(z) = (1-\ell(\ell+1)/z^2)\,j_\ell^2(z) + (z\,j_\ell^2(z))''/2z$, and then using Eq. (30) with two integrations
by parts.

We can check the consistency of these results and calculate the ℓ-independent
terms here by using our previous result (27) in the case where ℓ is large and
ℓd/dA is small, where d is whatever length characterizes the k-dependence of
the form factors. The term in Eq. (27) of zeroth order in ℓd/dA is
$$\ell(\ell+1)C_\ell \to 8\pi^2BF_0^2\int_1^\infty\frac{d\beta}{\beta^2\sqrt{\beta^2-1}} = 8\pi^2BF_0^2, \tag{38}$$
in agreement with Eq. (31). Also, Eq. (27) has no terms of first order in 1/dA ,
in agreement with Eq. (32). To calculate the terms in Eq. (27) of second order
in 1/dA, we express F²(k) and G²(k) in terms of cosine transforms
$$F^2(k) = F_0^2 + \int_0^\infty da\,f(a)\Big[1-\cos(ka)\Big], \qquad G^2(k) = \int_0^\infty da\,g(a)\Big[1-\cos(ka)\Big]. \tag{39}$$
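Then for ℓ ≫ 1 and ℓd/dA ≪ 1, Eq. (27) gives
$$C_\ell^{(2)} \to -\frac{8\pi^2B}{d_A^2}\left[\left(2F_0F_2 + G_1^2\right)\left(\ln\left(\frac{\ell\bar d}{2d_A}\right) + C - \frac32\right) + G_1^2\right], \tag{40}$$
where d̄ is a typical value of the variable a in the cosine transforms (39):
$$\ln\bar d \equiv \frac{\int_0^\infty\left[f(a)+g(a)\right]a^2\ln a\;da}{\int_0^\infty\left[f(a)+g(a)\right]a^2\,da}. \tag{41}$$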

Eq. (40) agrees with the limit of Eq. (37) for large ℓ, because in this limit
$\sum_{r=1}^{\ell}1/r \to \ln\ell + C$, and now fixes the ℓ-independent terms in Eq. (37) so
that, for any ℓ with ℓd/dA ≪ 1,
$$C_\ell^{(2)} = -\frac{8\pi^2B}{d_A^2}\left[\left(2F_0F_2 + G_1^2\right)\left(\ln\left(\frac{\bar d}{2d_A}\right) + \sum_{r=1}^{\ell}\frac1r - \frac32\right) + G_1^2\right]. \tag{42}$$
Putting together Eqs. (31), (32), and (42) gives our final formula for Cℓ
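in the case ℓd/dA ≪ 1 and ℓ ≥ 2:
$$\ell(\ell+1)C_\ell = 8\pi^2BF_0^2\left\{1 - \frac{\ell(\ell+1)}{d_A^2}\left[d^2\left(\ln\left(\frac{\bar d}{2d_A}\right) + \sum_{r=1}^{\ell}\frac1r\right) - d'^2\right] + \dots\right\}, \tag{43}$$
where now we introduce a pair of characteristic lengths:
$$d^2 \equiv \frac{2F_0F_2 + G_1^2}{F_0^2}, \qquad d'^2 \equiv \frac{3F_0F_2 + \frac12G_1^2}{F_0^2}. \tag{44}$$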
The logarithm in Eq. (43) is large and negative, so ℓ(ℓ + 1)Cℓ will increase
or decrease with ℓ for sufficiently small ℓ according as d² > 0 or d² < 0.
(Taken literally, Eq. (43) would suggest that this behavior is reversed when
the sum over r becomes large enough to cancel the logarithm, but this is at
$\ell \simeq 2e^{-C}d_A/\bar d$, which is large enough to invalidate the approximations that
led to Eq. (43).) Note that, while d and d′ depend only on the behavior of the
form factors near zero wave number, the length d̄ given by Eq. (41) depends
on the behavior of the form factors at all wave numbers. Consequently,
although the value of Cℓ at low ℓ depends only on the form factors at k = 0,
somewhat surprisingly the growth of Cℓ for small ℓ depends on the form
factors at all wave numbers.

V. APPLICATION

To illustrate the use of the asymptotic formulas obtained here, we will
now apply them to the simplified model described in reference 1: the universe
before last scattering consisting of pressureless cold dark matter and
a photon-nucleon-electron plasma; no gravitational radiation; and negligible
contributions of the plasma and neutrinos to the gravitational field. In this
case, the comparison of Eqs. (5) and (6) for the long wavelength limit of the
form factors with Eq. (29) gives

$$F_0 = 1, \qquad F_2 = -3t_L^2/2, \qquad G_1 = 3t_L, \tag{45}$$

so the lengths (44) are here

$$d^2 = 6t_L^2, \qquad d'^2 = 0. \tag{46}$$
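
Written out, the substitution of the long-wavelength coefficients F0 = 1, F2 = −3tL²/2, G1 = 3tL into the definitions of the two characteristic lengths is the following short piece of arithmetic:

```latex
d^2 = \frac{2F_0F_2 + G_1^2}{F_0^2}
    = 2\left(-\tfrac{3}{2}\,t_L^2\right) + (3t_L)^2
    = 6\,t_L^2,
\qquad
d'^2 = \frac{3F_0F_2 + \tfrac12 G_1^2}{F_0^2}
     = -\tfrac{9}{2}\,t_L^2 + \tfrac{9}{2}\,t_L^2
     = 0 .
```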

Hence Eq. (43) then gives the behaviour of Cℓ for ℓd/dA ≪ 1 and ℓ ≥ 2 as

$$\ell(\ell+1)C_\ell = 8\pi^2B\left\{1 - \frac{6\ell(\ell+1)\,t_L^2}{d_A^2}\left[\ln\left(\frac{\bar d}{2d_A}\right) + \sum_{r=1}^{\ell}\frac1r\right] + \dots\right\}. \tag{47}$$

Aside from its weak dependence on d̄, the behaviour of Cℓ for ℓd/dA ≪ 1
is independent of the baryon density, in agreement with more accurate
computer calculations[8]. We can’t calculate the length d̄ without a model
that would give the form factors at all wave numbers, but d̄ is expected to be
roughly of order dH, and since dA/dH is large the logarithm is not sensitive
to the precise value of d̄. If for instance we take d̄ = √3 tL = dA/58.5 (the
acoustic horizon at last scattering for ΩM = 0.4, ΩV = 0.6, and ΩB = 0) then
the quantity ℓ(ℓ + 1)Cℓ/8π²B rises from unity when extrapolated to ℓ = 0 to
1.044 at ℓ = 5, and to 1.118 at ℓ = 10, which is probably the highest value
of ℓ for which the approximations leading to Eq. (47) are reliable.
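
The numbers quoted above can be checked with a few lines of Python. This is a sketch that truncates Eq. (47) after the second-order term; the function and argument names are illustrative, and all lengths are expressed in units of dA.

```python
import math

def low_ell_ratio(ell, dbar_over_dA, tL_over_dA):
    """l(l+1) C_l / (8 pi^2 B) from Eq. (47), truncated after the second-order term."""
    harmonic = sum(1.0 / r for r in range(1, ell + 1))
    bracket = math.log(0.5 * dbar_over_dA) + harmonic
    return 1.0 - 6.0 * ell * (ell + 1) * tL_over_dA ** 2 * bracket

# Example from the text: dbar = sqrt(3) * t_L = d_A / 58.5
dbar = 1.0 / 58.5
t_L = dbar / math.sqrt(3.0)
for ell in (5, 10):
    print(ell, low_ell_ratio(ell, dbar, t_L))
```

The two printed values agree with the 1.044 and 1.118 quoted in the text to within about one part in a thousand.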
For ℓ of the order of dA /dH the coefficients Cℓ can be calculated under
the simplifying assumptions of this section by using the form factors given
by Eqs. (8) and (9) in Eq. (27). The damping length is given in reference 1
as
$$d_D^2 \equiv D_L^2 + \Delta D_L^2 \simeq 0.029\,t_L^2\left(\frac{8}{15(1+\xi)} + \frac{\xi^2}{2(1+\xi)^2}\right) + 0.0025\,d_H^2. \tag{48}$$
Our results for Cℓ at and below the first Doppler peak are not sensitive to
dD. We will simplify our calculations here by dropping the terms in Eqs. (8)
and (9) that are proportional to the ratio $\xi/k^2t_L^2$, on the grounds that these
terms are not very different from corrections to the WKB approximation that
are not included either. (At the first Doppler peak $\xi/k^2t_L^2$ increases with ξ
and hence with ΩB h², and for ΩB h² = 0.03 it has the value 0.20. But to
be honest, the real reason for dropping these terms is that they spoil the
agreement of our results for the height of the first Doppler peak with more
accurate numerical calculations.) The results obtained now depend critically
on the baryon density parameter ξ ≃ 27 ΩB h², and are shown in Figure 1 for
values of ΩB h² ranging from zero to 0.03.
For ΩB = 0 (in which case the WKB approximation is not needed, so
that Eq. (27) should give Cℓ down to values of ℓ of order two) the behavior
of Cℓ is nothing like what is observed: ℓ(ℓ + 1)Cℓ/8π²B rises from unity to
1.1 at a ‘zeroth Doppler peak’ at ℓdH /dA ≃ 0.45 (due to the maximum in the
Doppler form factor G(k) at kdH = π/2), then dips to 0.7 at ℓdH /dA ≃ 1.6,
and then rises again to a first Doppler peak at ℓdH /dA ≃ 2.83.
For ΩB h² ≥ 0.01 the behavior of Cℓ within the range of validity of the
WKB approximation is much more like what is observed: ℓ(ℓ + 1)Cℓ rises
monotonically to a first Doppler peak at ℓdH/dA very roughly of order π
(though actually around 2.6). There is another clear peak at ℓ ≃ 8.7 dA/dH,
presumably arising from the peak in F(k) at k = 3π/dH. The weaker peaks
in ℓ(ℓ + 1)Cℓ arising from peaks in F(k) near even values of kdH/π are absent
here, presumably because of our neglect of the contribution of radiation and
neutrinos to the gravitational field. Another difference between the curves of
Figure 1 and more accurate computer calculations is that, again because we
neglect the contribution of radiation and neutrinos to the gravitational field,
our results do not show the fall-off of ℓ(ℓ + 1)Cℓ at large ℓ associated with
the fall-off of the familiar transfer function T(k) at large k.

Figure 1: Plots of the ratio of the multipole strength parameter ℓ(ℓ + 1)Cℓ to
its value at small ℓ, versus ℓdH/dA, where dH is the horizon size at the time of
last scattering and dA is the angular diameter distance of the surface of last
scattering. The curves are for ΩB h² ranging (from top to bottom) over the
values 0.03, 0.02, 0.01, and 0, corresponding to ξ taking the values 0.81, 0.54,
0.27, and 0. The solid curves are calculated using the WKB approximation;
dashed lines indicate an extrapolation to the known value at small ℓdH/dA.
These results are independent of the parameters H0, ΩΛ, and ΩM.

The values of the position ℓ1 dH /dA of the first peak and the ratio of its
height ℓ1(ℓ1 + 1)Cℓ1 to the value 8π²B ≃ 6C2 for small ℓ are given for various
baryon densities in Table 1. These results are independent of other param-
eters. In the last two columns of Table 1 we also give values of dA /dH for
ΩM = 0.3 and ΩΛ = 0.7, and the corresponding results for the multipole num-
ber ℓ1 of the first Doppler peak. In calculating the horizon at last scattering
dH we have now (somewhat inconsistently) taken into account the effect of
photons and three flavors of neutrinos and antineutrinos on the expansion
rate, which gives

$$d_H \equiv \frac{a(t_L)}{\sqrt3}\int_0^{t_L}\frac{dt}{a(t)\sqrt{1+R(t)}} = \frac{2}{H_0(1+z_L)^{3/2}\sqrt{3\xi\Omega_M}}\,\ln\left[\frac{\sqrt{1+\xi}+\sqrt{\xi(1+\lambda)}}{1+\sqrt{\xi\lambda}}\right], \tag{49}$$

where λ = 0.047/ΩM h² is the ratio of photon and neutrino energy density
to dark matter energy density at the time of last scattering, and dA is given
by Eq. (2). In calculating the values of dA/dH in the table we have taken
ΩM h² = 0.15.
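
The two distances entering the ratio dA/dH can be evaluated as in the sketch below, which returns H0·dA from Eq. (2) and H0·dH from Eq. (49). The redshift of last scattering z_L is not specified in this section, so it is left as an input; the function names, the substitution x = u², and the use of Simpson's rule are choices made here for illustration.

```python
import math

def d_A_times_H0(omega_m, omega_lambda, z_L, n=2000):
    """H0 * d_A from Eq. (2), by Simpson's rule after substituting x = u^2."""
    omega_c = 1.0 - omega_lambda - omega_m
    a = math.sqrt(1.0 / (1.0 + z_L))           # lower limit in the variable u
    h = (1.0 - a) / n                           # n must be even

    def g(u):
        # dx / sqrt(OL x^4 + OC x^2 + OM x) with x = u^2 becomes g(u) du
        return 2.0 / math.sqrt(omega_lambda * u ** 6 + omega_c * u ** 2 + omega_m)

    s = g(a) + g(1.0)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(a + i * h)
    chi = s * h / 3.0
    if abs(omega_c) < 1e-12:                    # flat limit of sinh(sqrt(OC) chi)/sqrt(OC)
        curved = chi
    elif omega_c > 0.0:
        curved = math.sinh(math.sqrt(omega_c) * chi) / math.sqrt(omega_c)
    else:                                       # continuation to OC < 0 (sinh -> sin)
        curved = math.sin(math.sqrt(-omega_c) * chi) / math.sqrt(-omega_c)
    return curved / (1.0 + z_L)

def d_H_times_H0(omega_m, xi, lam, z_L):
    """H0 * d_H from Eq. (49); requires xi > 0."""
    log_term = math.log((math.sqrt(1.0 + xi) + math.sqrt(xi * (1.0 + lam)))
                        / (1.0 + math.sqrt(xi * lam)))
    return 2.0 * log_term / ((1.0 + z_L) ** 1.5 * math.sqrt(3.0 * xi * omega_m))
```

The ratio `d_A_times_H0(...) / d_H_times_H0(...)` is independent of H0. For ΩM = 1, ΩΛ = 0 the integral in Eq. (2) is elementary, which gives a simple check of the quadrature.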
We see from Table 1 that the position of the first Doppler peak does not
depend strongly on ΩB h², while its height is a sensitive function of ΩB h².
For ΩB h² between 0.02 and 0.03 the height and position are in fair agreement
with what is observed, though of course the serious comparison of theory with
observation relies on more accurate computer calculations. The qualitative
results obtained here suggest that if one were to rely on a single feature of
the plot of ℓ(ℓ + 1)Cℓ versus ℓ to measure ΩB h², then the ratio of the
height of the first Doppler peak to the value for lower ℓ values studied by
the COBE satellite would be more useful than the ratio of the heights of the
first and second Doppler peaks, which relies on less precise data, depends on
complicated damping effects, and is more sensitive to other parameters, such
as ΩM h² and the rate of change, if any, of the vacuum energy. Of course, for
high precision one must use the whole plot of ℓ(ℓ + 1)Cℓ versus ℓ to measure
all these parameters together.

Table 1: Location ℓ1 dH/dA of the first Doppler peak and height of the peak
in ℓ(ℓ + 1)Cℓ relative to its value 8π²B ≃ 6C2 for ℓ extrapolated to zero,
for various values of the baryon density parameter. These results, and the
curves in Figure 1, are independent of the values of H0, ΩΛ, and (within our
approximations) ΩM. The last two columns give the values of dA/dH
and ℓ1 for ΩM = 0.3, ΩΛ = 0.7, with dH calculated taking into account
the contribution of photons and neutrinos to the expansion rate, and using
ΩM h² = 0.15.

ΩB h²    ξ       ℓ1 dH/dA    ℓ1(ℓ1 + 1)Cℓ1/6C2    dA/dH    ℓ1
0        0       2.83        0.863                91.7     260
0.01     0.27    2.65        2.34                 87.1     231
0.02     0.54    2.60        5.09                 83.6     217
0.03     0.81    2.58        9.115                79.7     206
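
The last column of Table 1 is the product of the third and fifth columns, and ξ follows from Eq. (10). A short consistency check, with the table values typed in by hand:

```python
# (Omega_B h^2, xi, l1 * d_H / d_A, d_A / d_H, l1) as listed in Table 1
rows = [
    (0.00, 0.00, 2.83, 91.7, 260),
    (0.01, 0.27, 2.65, 87.1, 231),
    (0.02, 0.54, 2.60, 83.6, 217),
    (0.03, 0.81, 2.58, 79.7, 206),
]

# l1 = (l1 d_H / d_A) * (d_A / d_H), rounded to the nearest integer
predicted_l1 = [round(x * ratio) for _, _, x, ratio, _ in rows]

# xi = 27 * Omega_B h^2, Eq. (10)
predicted_xi = [round(27.0 * obh2, 2) for obh2, _, _, _, _ in rows]
```

Both derived columns reproduce the tabulated entries, which confirms that the flattened rows have been read in the right column order.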

ACKNOWLEDGMENTS

I am grateful for helpful correspondence with E. Bertschinger, J. R. Bond,
L. P. Grishchuk, and M. White. This research was supported in part by
the Robert A. Welch Foundation and NSF Grants PHY-0071512 and PHY-9511632.

REFERENCES

1. S. Weinberg, “Fluctuations in the Cosmic Microwave Background I:
Form Factors and their Calculation in Synchronous Gauge,” UTTG-03-01,
astro-ph/0103279.

2. L. P. Grishchuk and J. Martin, Phys. Rev. D 56, 1924 (1997).

3. J. R. Bond, “Theory and Observations of the Cosmic Background Radiation,”
in Cosmology and Large Scale Structure, eds. R. Schaeffer,
J. Silk, M. Spiro and J. Zinn-Justin (Elsevier, 1996), Section 5.1.3.
Our result (26) can also be derived with somewhat more trouble by
using Bond’s results on the asymptotic behavior of averaged products
of spherical Bessel functions.

4. This method had been used previously to obtain Eq. (21) with only
a scalar form-factor F (k), as for instance by J. R. Bond and G. Efs-
tathiou, Mon. Not. R. Astr. Soc. 226, 655 (1987), Eq. (4.19), but not
as far as I know with the inclusion of the dipole form-factor G(k).

5. I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products
(Academic Press, New York, 1980), #6.671.2.

6. I. S. Gradshteyn and I. M. Ryzhik, ibid., #6.574.2.

7. G. ’t Hooft and M. Veltman, Nucl. Phys. B44, 189 (1972).

8. E. F. Bunn and M. White, Ap. J. 480, 6 (1997).

UTTG-05-01

Conference Summary
20th Texas Symposium on Relativistic Astrophysics
Steven Weinberg
arXiv:astro-ph/0104482v3 4 May 2001

Department of Physics, University of Texas at Austin


weinberg@physics.utexas.edu

Abstract. This is the written version of the summary talk given at the 20th Texas
Symposium on Relativistic Astrophysics in Austin, Texas, on December 15, 2000. Af-
ter a brief summary of some of the highlights at the conference, comments are offered
on three special topics: theories with large additional spatial dimensions, the cosmo-
logical constant problems, and the analysis of fluctuations in the cosmic microwave
background.

I OVERVIEW
Speaking as a particle physicist, an outsider, I have to say that my chief reaction
after a week of listening to talks at this meeting is one of envy. You astrophysi-
cists are blessed with enlightening data in an abundance that particle physicists
haven’t seen since the 1970s. And although you still face many mysteries, theory
is increasingly converging with observation.
For instance, as discussed by Shri Kulkarni, it now seems clear that gamma ray
bursters are at cosmological distances, producing over 10⁵⁰ ergs in particle kinetic
energies alone in a minute or so, making them the most spectacular objects in the
sky. Tsvi Piran described a fireball model for the gamma ray bursters, in which
gamma rays are produced by relativistic particles accelerated by shocks within
material that is ejected ultra-relativistically from a central source. One can think
of various mechanisms for the hidden central source, but even without a specific
model for the source, the fireball model does a good job of accounting for what is
observed.
According to this fireball model, gamma rays from the bursters are strongly
beamed. Peter Höflich presented evidence that core collapse supernovae are also
highly aspherical. Both conclusions may be good news for gravitational wave as-
tronomers — only aspherical explosions can generate gravitational waves.
Spectacular things seem to be turning up all over. Amy Barger told us how X-
ray observations are revealing many active galactic nuclei in what had previously
seemed like ordinary galaxies, and John Kormendy reported evidence that the
events that produce galactic bulges or elliptical galaxies are the same as those that
produce black holes in galactic centers.

Astrophysics is currently the beneficiary of massive surveys that are providing or
will soon provide a flood of important data. We heard from John Peacock about
the 2dF Galaxy Redshift Survey, from Bruce Margon about the Sloan Digital Sky
Survey, and from George Ricker about the HETE x-ray and γ-ray satellite mission.
Together with cosmic microwave background observations, about which more later,
there seems to be a general consistency with the big bang cosmology, with about
30% of the critical mass furnished by cold dark matter, and about 70% furnished
by negative-pressure vacuum energy.

This is not to say that there are no puzzles. Alan Watson reported on the long-
standing puzzles of understanding how the highest energy cosmic rays are generated
and how they manage to get to earth through the cosmic microwave background.
There are also persistent problems in matching the cold dark matter model to
observations of the mass distribution in galaxies. Ben Moore cast some doubt on
whether cold dark matter really leads to the missing “cuspy cores” of galaxy haloes,
and he concentrated instead on a different problem: cold dark matter models give
much more matter in satellites of galaxies than is observed. He suggests that
the missing satellites may really be there, and that they have not been observed
because they have not formed stars. The reionization processes discussed here by
Paul Shapiro may be responsible for the failure of star formation.

Our knowledge of the dark matter mass distribution within galaxies is receiving
important contributions from observations of the lensing of quasar images by inter-
vening galaxies, discussed by Genevieve Soucail. There have been hopes of using
surveys of gravitational lenses to distinguish among cosmological models, but I have
the impression that the study of galactic lenses will turn out to be more important
in learning about the lensing galaxies themselves. Andrew Gould reported that
microlensing observations have ruled out the dark matter being massive compact
halo objects with masses in the range $10^{-7}\,M_\odot$ to $10^{-3}\,M_\odot$.

It would of course be a great advance if cold dark matter particles could be
directly detected. We heard a lively debate about whether weakly interacting
massive dark matter particles have already been detected, between Rita Bernabei
(pro) and Blas Cabrera (con). It would be foolhardy for a theorist to try to judge
this issue, but at least one gathers that, if the dark matter is composed of WIMPs,
then they can be detected.

I have now completed my 10 minute general summary of the conference. There
were other excellent plenary talks, and I have not mentioned any of the parallel
talks, but what can you do in 10 minutes? In the remaining 35 minutes, I want to
take up some special topics, on which I will have a few comments of my own.
II LARGE EXTRA DIMENSIONS
It is an old idea that the four spacetime dimensions in which we live are embedded
in a higher dimensional spacetime, with the extra dimensions rolled up in some
sort of compact manifold with radius R. This would have profound cosmological
consequences: the compactification of the extra dimensions could be the most
important event in the history of the universe, and such theories would contain
vast numbers of new types of particle.
In the original version of this theory any field would have normal modes that
would be observed in four dimensions as an infinite tower of ‘Kaluza–Klein recur-
rences,’ particles carrying the quantum numbers of the fields, with masses given
by multiples of 1/R. It had generally been supposed that R would be of the order
of the Planck length, or perhaps 10 to 100 times larger, of the order of the inverse
energy M at which the strong and electroweak coupling constants are unified. Even
setting this preconception aside, it had seemed that in any case R would have to be
smaller than $10^{-16}$ cm $\approx (100\ \mathrm{GeV})^{-1}$, in order that the Kaluza–Klein recurrences
of the particles of the standard model would be heavy enough to have escaped
detection.
The possibilities for higher dimensional theories became much richer with the
increasing attention given to the idea that the spacetime in which we live does
not merely appear four-dimensional — our three-space may be a truly three-
dimensional surface that is embedded in a higher dimensional space. (This is the
picture of higher dimensions that was vividly described in Edwin Abbott’s 1884
novel Flatland, and has more recently become an important part of string theory,
starting with Polchinski’s work on D-branes[1].) This idea opens up the possibility
that some fields may depend only on position on the four-dimensional spacetime
surface, while others ‘live in the bulk’ — that is, they depend on position in the
full higher-dimensional space. Only the fields that live in the bulk would have
Kaluza–Klein recurrences.
Craig Hogan here discussed the recently proposed idea that the compactification
scale $R$ may actually be much larger than $10^{-16}$ cm, with no Kaluza–Klein recurrences
for the particles of the standard model because the standard model fields
depend only on position in the four-dimensional spacetime in which we live[2]. Ac-
cording to this idea, it is only the gravitational field that depends on position in the
higher dimensional space, and so it is only the graviton that has Kaluza–Klein
recurrences, which at ordinary energies would interact too weakly to have been
observed. The long range forces produced by exchange of these massive gravitons
would be small enough to have escaped detection in measurements of gravitational
forces between laboratory masses as long as R < 1 mm. (There are stronger astro-
physical and cosmological bounds on R, arising from limits on the production of
graviton recurrences in supernovas[3] and in the early universe[4].)
In any such theory with large compactification radius R the Planck mass scale
of the higher dimensional theory of gravitation would be very much less than the
Planck mass scale in our four dimensional spacetime. In a world with $4+N$ spacetime
dimensions the gravitational constant $G_{4+N}$ (the reciprocal of the coefficient
of the term $\int d^{4+N}x\,\sqrt{g}\,g^{\mu\nu}R_{\mu\nu}$ in the action) has dimensionality
$[\mathrm{mass}]^{-2-N}$, so we would expect it to be given in terms of some fundamental
higher dimensional Planck mass scale $M_*$ by $G_{4+N} \approx M_*^{-2-N}$. Dimensional
analysis then tells us that the gravitational constant $G$ in four spacetime dimensions
must be given by

$$G \approx M_*^{-2-N} R^{-N}. \tag{1}$$

The usual assumption in theories with extra dimensions has been that $R \approx M_*^{-1}$,
in which case $G \approx M_*^{-2}$, and $M_*$ would have to be about $10^{19}$ GeV. But if we take
$N = 1$ and $R \approx 1$ mm, then $1/R \approx 10^{-13}$ GeV, and $M_* \approx 10^{8}$ GeV. With $N = 2$
and $R \approx 1$ mm, $M_* \approx 300$ GeV. This is the most attractive aspect of theories with
large extra dimensions: they can reduce or eliminate what had seemed like a huge
gap between the characteristic energy scale of electroweak symmetry breaking and
the fundamental energy scale at which gravitation becomes a strong interaction.
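The dimensional estimate in Eq. (1) is easy to reproduce numerically. The following is a minimal sketch, assuming natural units, a four-dimensional Planck mass of $1.22 \times 10^{19}$ GeV, and $\hbar c \approx 1.973 \times 10^{-14}$ GeV cm; the function names are illustrative and do not come from the original text. Because Eq. (1) ignores factors of order unity, the results should be expected to agree with the quoted figures only in order of magnitude (for $N = 2$ the sketch gives a value of order a TeV rather than 300 GeV).

```python
# Order-of-magnitude estimate of the higher-dimensional Planck scale M_*
# from Eq. (1), G ~ M_*^(-2-N) R^(-N), in natural units (hbar = c = 1).
# All numerical constants below are assumed reference values.

M_PLANCK_GEV = 1.22e19     # four-dimensional Planck mass, with G = 1/M_Planck^2
HBARC_GEV_CM = 1.973e-14   # hbar*c in GeV cm, converts a length to an inverse energy


def inverse_radius_gev(radius_cm):
    """Return 1/R in GeV for a compactification radius given in cm."""
    return HBARC_GEV_CM / radius_cm


def m_star_gev(n_extra, inv_radius_gev):
    """Solve M_Planck^(-2) = M_*^(-2-N) * (1/R)^N for M_* (in GeV)."""
    return (M_PLANCK_GEV**2 * inv_radius_gev**n_extra) ** (1.0 / (2 + n_extra))


if __name__ == "__main__":
    inv_r = inverse_radius_gev(0.1)  # R = 1 mm = 0.1 cm
    print(f"1/R for R = 1 mm: {inv_r:.1e} GeV")
    for n in (1, 2):
        print(f"N = {n}: M_* ~ {m_star_gev(n, inv_r):.1e} GeV")
```

The same helper can be called with any other value of $1/R$ in GeV to see how quickly $M_*$ rises as the extra dimensions shrink.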
Theories with large extra dimensions are very ingenious, and they may even be
correct, but I am not enthusiastic about them, for they give up the one solid ac-
complishment of previous theories that attempt to go beyond the standard model:
the renormalization group equations of the original standard model showed that
there is an energy, around $10^{15}$ GeV, where the three independent gauge coupling
constants become nearly equal[5]. In the supersymmetric version of the standard
model the convergence of the couplings with each other becomes more precise[6],
and the energy scale $M_U$ of this unification moves up to about $2 \times 10^{16}$ GeV[7],
which is less than would be expected in string theories of gravitation by a factor
of only about 20. (This is also a plausible energy scale for the violation of lepton
number conservation that may be showing up in the neutrino oscillation experi-
ments discussed here by Masayuki Nakahata.) The Kaluza–Klein tower of graviton
recurrences does nothing to change the running of the strong and electroweak cou-
pling constants, and since the higher dimensional Planck mass M∗ is very much
less than $10^{15}$ GeV in theories with large extra dimensions (this, after all, is the
point of these theories), it appears that in these theories the standard model gauge
couplings are not unified at the fundamental mass scale M∗ . Of course, they might
be unified at some higher energy, but we have no way to calculate what happens
in these theories at any energy higher than M∗ .
In his talk here Hogan mentioned that Dienes, Dudas, and Gherghetta[8] have
proposed a way out of this problem. I looked up their papers, and found that they
modify the renormalization group equations for the gauge couplings of the standard
model by allowing the gauge and Higgs fields (and perhaps some fermion fields) to
depend on position in the higher dimensional space, along with the gravitational
field. Of course, then they have to avoid conflict with experiment by taking 1/R
greater than 100 GeV. The Kaluza-Klein recurrences of the gauge bosons greatly
increase the rate at which the coupling constants of the standard model run, but
with little change in their unification. To put this quantitatively, Dienes et al. find
the bare (Wilsonian) couplings evaluated with a cut-off Λ are
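
$$\frac{4\pi}{g_i^2(\Lambda)} = \frac{4\pi}{g_i^2(m_Z)} - \frac{b_i}{2\pi}\ln\frac{\Lambda}{m_Z} + \frac{\bar b_i}{2\pi}\ln \Lambda R - \frac{\bar b_i X_N}{2\pi N}\left[(\Lambda R)^N - 1\right], \tag{2}$$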

where $g_1$ and $g_2$ are defined as usual in terms of the electron charge $e$ and the
electroweak mixing angle $\theta$ by $g_1^2 = e^2/\sin^2\theta$ and $g_2^2 = 5e^2/3\cos^2\theta$; $g_3$ is the coupling
constant of quantum chromodynamics; and $X_N$ is a number of order unity. The
constants (b1 , b2 , b3 ) are the factors (33/5, 1, −3) appearing in the renormalization
group equation of the supersymmetric standard model with two Higgs doublets,
while the constants (b̄1 , b̄2 , b̄3 ) are the corresponding factors (3/5, −3, −6) in the
renormalization group equations for Λ above the compactification scale 1/R (with
a possible constant added to each of the b̄i , proportional to the number of chiral
fermions that live in the bulk). Dienes et al. remark that the standard model
couplings still come close to converging to a common value, because the ratios of
the differences of the b̄i are not very different from the ratios of the differences of
the $b_i$. I would like to put this more quantitatively, by asking what value of $\sin^2\theta$
is needed in order for the couplings to become exactly equal at some value of $\Lambda$. In
the supersymmetric standard model, this is

$$\sin^2\theta = \frac{3(b_3 - b_2) + 5(b_2 - b_1)\,e^2/g_3^2}{8b_3 - 3b_2 - 5b_1} = \frac{1}{5} + \frac{7}{15}\,\frac{e^2}{g_3^2} = 0.231, \tag{3}$$

in excellent agreement with the measured value $0.23117 \pm 0.00016$. (Here $e$ and $g_3$
are taken as measured at $m_Z$, in which case $e^2/4\pi = 1/128$ and $g_3^2/4\pi = 0.118$.) If
all the running of the couplings were at scales greater than $1/R$, then $\sin^2\theta$ would
be given by Eq. (3), but with $b_i$ replaced with $\bar b_i$:

$$\sin^2\theta = \frac{3(\bar b_3 - \bar b_2) + 5(\bar b_2 - \bar b_1)\,e^2/g_3^2}{8\bar b_3 - 3\bar b_2 - 5\bar b_1} = \frac{3}{14} + \frac{3}{7}\,\frac{e^2}{g_3^2} = 0.243. \tag{4}$$

This is not bad, but nevertheless outside experimental bounds. (It would be neces-
sary to consider higher-order contributions in the renormalization group equations
and threshold effects to be sure that there is really a discrepancy here.) In order
not to spoil the prediction for $\sin^2\theta$, $1/R$ would have to be considerably larger
than 1 TeV, so that much of the running of the coupling constants would occur
at scales below 1/R, where the renormalization group equations are those of the
supersymmetric standard model.
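The arithmetic behind Eqs. (3) and (4) can be checked with a short script. This is a minimal sketch, assuming only the one-loop coefficients and the inputs $e^2/4\pi = 1/128$ and $g_3^2/4\pi = 0.118$ quoted above; the function and variable names are illustrative, and higher-order and threshold corrections are not included.

```python
from fractions import Fraction

# One-loop coefficients quoted in the text (the Python names are illustrative).
B_SUSY = (Fraction(33, 5), Fraction(1), Fraction(-3))   # (b1, b2, b3)
B_BULK = (Fraction(3, 5), Fraction(-3), Fraction(-6))   # (b1bar, b2bar, b3bar)

ALPHA_EM = 1.0 / 128.0   # e^2 / 4 pi at m_Z
ALPHA_S = 0.118          # g3^2 / 4 pi at m_Z


def unification_coefficients(b):
    """Return exact (A, B) such that sin^2(theta) = A + B * e^2/g3^2."""
    b1, b2, b3 = b
    denominator = 8 * b3 - 3 * b2 - 5 * b1
    return 3 * (b3 - b2) / denominator, 5 * (b2 - b1) / denominator


def sin2_theta(b, e2_over_g3sq=ALPHA_EM / ALPHA_S):
    """sin^2(theta) needed for the three couplings to meet exactly at one loop."""
    constant, slope = unification_coefficients(b)
    return float(constant) + float(slope) * e2_over_g3sq


if __name__ == "__main__":
    for label, b in (("supersymmetric standard model", B_SUSY),
                     ("running entirely above 1/R", B_BULK)):
        constant, slope = unification_coefficients(b)
        print(f"{label}: sin^2(theta) = {constant} + {slope} e^2/g3^2"
              f" = {sin2_theta(b):.3f}")
```

Keeping the coefficients as exact fractions makes the rational prefactors in Eqs. (3) and (4) directly visible before the measured couplings are inserted.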
In any case, the running of the couplings is so rapid above the compactification
scale 1/R that the couplings become equal (to the extent that they do become
equal) at an energy not far above $1/R$. The $4+N$ dimensional Planck scale $M_*$
given by Eq. (1) is very much greater than this. Taking $1/R$ greater than 1 TeV,
Eq. (1) would give $M_*$ greater than $10^{13}$ GeV for $N = 1$. Even for $N = 7$, we would
have $M_*$ greater than $10^{6}$ GeV. Thus theories of this sort save the unification of
couplings at the cost of reintroducing a large gap between the higher-dimensional
Planck scale M∗ and the electroweak scale.
III VACUUM ENERGY
There are now two problems surrounding the energy of empty space[9]. The first
is the old problem, why the vacuum energy density is so much smaller than any one
of a number of individual contributions. For instance, it is smaller than the energy
density in quantum fluctuations of the gravitational field at wavelengths above the
Planck length by a factor of about $10^{-122}$, and it is smaller than the latent heat
associated with the breakdown of chiral symmetry in the strong interactions by
a factor of about $10^{-50}$. All these contributions can be cancelled by just adding an
appropriate cosmological constant in the gravitational field equations; the problem
is why there should be such a fantastically well-adjusted cancellation. The second,
newer, problem is why the vacuum energy density that seems to be showing up in
supernova studies of the redshift-distance relation (reviewed in a parallel session by
Nick Suntzeff and Saul Perlmutter) is of the same order of magnitude (apparently
larger by a factor about 2) as the matter density at the present time. There are
five broad classes of attempts to solve one or both of these problems:
1) Cancellation Mechanisms
It has occurred to many theorists that the gravitational effect of vacuum energy
might be wiped out by the dynamics of a scalar field, which automatically adjusts
itself to minimize the spacetime curvature. So far, this has never worked. Some
recent attempts were described by Andrei Linde in a parallel session.
2) Deep Symmetries
There are several symmetries that could account for a vanishing vacuum energy,
if they were not broken. One is scale invariance; another is supersymmetry. The
problem is to see how to preserve the vanishing of the vacuum energy despite the
breakdown of the symmetry. No one knows how to do this.
3) Quintessence
It is increasingly popular to consider the possibility that the vacuum energy is
not constant, but evolves with the universe[10]. For instance, a real scalar field $\phi$
with Lagrangian density $-\partial_\mu\phi\,\partial^\mu\phi/2 - V(\phi)$, if spatially homogeneous, contributes
a vacuum energy density and a pressure

$$\rho = \tfrac{1}{2}\dot\phi^2 + V(\phi), \qquad p = \tfrac{1}{2}\dot\phi^2 - V(\phi), \tag{5}$$

so the condition $\rho + 3p < 0$ for an accelerating expansion is satisfied if the field $\phi$
is evolving sufficiently slowly so that $\dot\phi^2 < V(\phi)$.
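The step from Eq. (5) to the slow-roll condition is a one-line computation, $\rho + 3p = 2\dot\phi^2 - 2V(\phi)$, and can be illustrated as follows. This is only a sketch with hypothetical function names, working in arbitrary units in which the field velocity and the potential are plain numbers.

```python
def density_and_pressure(phi_dot, potential):
    """Energy density and pressure of a homogeneous scalar field, as in Eq. (5)."""
    kinetic = 0.5 * phi_dot**2
    return kinetic + potential, kinetic - potential


def expansion_accelerates(phi_dot, potential):
    """True when rho + 3p < 0, the condition for an accelerating expansion."""
    rho, p = density_and_pressure(phi_dot, potential)
    return rho + 3.0 * p < 0.0


if __name__ == "__main__":
    # rho + 3p = 2*phi_dot**2 - 2*V, so acceleration requires phi_dot**2 < V.
    for phi_dot in (0.1, 0.9, 1.5):
        print(phi_dot, expansion_accelerates(phi_dot, potential=1.0))
```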
It must be said from the outset that, in themselves, quintessence theories do
not help with the first problem mentioned above — they do not explain why V (φ)
does not contain an additive constant of the order of $(10^{19}\ \mathrm{GeV})^4$. It is true that
superstring theories naturally lead to “modular” scalar fields φ for which V (φ) does
vanish as φ → ∞, in which case the vacuum becomes supersymmetric. It might
be hoped that the vacuum energy is small now, because the scalar field is well
on its way toward this limit. The trouble is that the vacuum now is nowhere near
supersymmetric, so that in these theories we would expect a present vacuum energy
of the order of the fourth power of the supersymmetry-breaking scale, or at least
$(1\ \mathrm{TeV})^4$.
On the other hand, such theories may help with the second problem, if the
quintessence energy is somehow related to the energy in matter and radiation,
because the present moment is not so many e-foldings of cosmic expansion (about
10, in fact) from the turning point in cosmic history when the radiation energy
density (including neutrinos) fell below the matter energy density. Paul Steinhardt
here described a model in which the quintessence energy density was less than the
radiation energy density by a constant factor r, as long as radiation dominated
over matter[11]. (It is necessary that r be considerably less than unity, in order
that quintessence should not appreciably increase the expansion rate during the
era of nucleosynthesis, increasing the present helium abundance above the observed
value.) Then when the radiation energy density fell below the matter energy density
at a cross-over redshift $z_C \approx 3000$ the quintessence energy dropped sharply by
a factor of order $r^2$, and has remained roughly constant since then. Since the
cross-over between radiation and matter dominance the matter energy density has
decreased by a factor $z_C^{-3}$, so the ratio of the quintessence energy density and
the matter energy density now should be of order $r^2 \times r \times z_C^3 = (z_C r)^3$. For
the quintessence and matter energies to be about equal now, $r$ must be equal to
about $1/z_C \approx 3 \times 10^{-4}$. Steinhardt tells me that when these calculations are done
carefully, the required ratio $r$ of quintessence to radiation energy density at early
times is about $10^{-2}$, rather than $3 \times 10^{-4}$. But whatever the value of $r$ that makes
the quintessence energy comparable to the matter energy density now, it requires
some fairly fine tuning: changing $r$ by a factor 10 would change the ratio of the
present values of the quintessence energy density and the matter energy density by
a factor $10^3$.
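The sensitivity to $r$ follows directly from the cubic scaling. A minimal sketch of the crude estimate above, with an illustrative function name and the cross-over redshift $z_C \approx 3000$ taken from the text (it does not attempt the more careful calculation that yields $r \approx 10^{-2}$):

```python
def quintessence_to_matter_ratio(r, z_c=3000.0):
    """Crude present-day ratio of quintessence to matter energy density.

    At cross-over the quintessence density is r times the radiation (and hence
    matter) density; it then drops by a further factor r**2, while the matter
    density has since fallen by z_c**-3.  The product is (z_c * r)**3.
    """
    return (r * r) * r * z_c**3


if __name__ == "__main__":
    r_equal = 1.0 / 3000.0  # value of r giving comparable densities today
    print(f"r = {r_equal:.1e}: ratio = {quintessence_to_matter_ratio(r_equal):.2f}")
    print(f"r = {10 * r_equal:.1e}: ratio = "
          f"{quintessence_to_matter_ratio(10 * r_equal):.0f}")
```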

4) Brane Solutions
Several authors have found solutions of brane theories of the Randall–Sundrum
kind[2] in which our four-dimensional spacetime is flat, despite the presence of a
large cosmological constant in the higher dimensional gravitational Lagrangian[12].
These solutions contained an unacceptable essential singularity off the brane, but
there are models in which this can be avoided[13]. I don’t believe that there is
anything unique in these solutions, so that instead of having to fine tune parameters
in the Lagrangian one has to fine tune initial conditions. Also, it is not clear
why the effective cosmological constant has to be zero now, rather than before the
spontaneous breakdown of the chiral symmetry of quantum chromodynamics, when
the latent heat associated with this phase transition would have given the vacuum
an energy density $(1\ \mathrm{GeV})^4$.

5) Anthropic Principle
Why is the temperature on earth in the narrow range where water is liquid? One
answer is that otherwise we wouldn’t be here. This answer makes sense only because
there are many planets in the universe, with a wide range of surface temperatures.
Because there are so many planets, it is natural that some of them should have
liquid water, and of course it is just these planets on which there would be anyone to
wonder about the temperature. In the same way, if our big bang is just one of many
big bangs, with a wide range of vacuum energies, then it is natural that some of
these big bangs should have a vacuum energy in the narrow range where galaxies can
form, and of course it is just these big bangs in which there could be astronomers
and physicists wondering about the vacuum energy. To be specific, a constant
vacuum energy if negative would have to be greater than about $-10^{-120}\,m_{\rm Planck}^4$, in
order for the universe not to collapse before life has had time to develop[14], and if
positive it would have to be less than about $+10^{-118}\,m_{\rm Planck}^4$, in order for galaxies
to have had a chance to form before the matter energy density fell too far below
the vacuum energy density[15]. As far as I know, this is at present the only way of
understanding the small value of the vacuum energy. But of course it makes sense
only if the big bang in which we live is one of an ensemble of many big bangs with
a wide range of values of the cosmological constant. There are various ways that
this might be realized:
(a) Wormholes or other quantum gravitational effects may cause the wave function
of the universe to break up into different incoherent terms, corresponding to
various possible universes with different values for what are usually called the
constants of nature, perhaps including the cosmological constant[16].
(b) Various versions of “new” inflation lead to a continual production of big
bangs[17], perhaps with different values of the vacuum energy. For instance,
if there is a scalar field that takes different initial values in the different big
bangs, and if it has a sufficiently flat potential, then its energy appears like a
cosmological constant, which takes different values in different big bangs[18].
(c) As the universe evolves the vacuum energy may drop discontinuously to lower
and lower discrete values. One way for this to happen is for the vacuum en-
ergy to be a function of a scalar field, with many local minima, so that as the
universe evolves the vacuum energy keeps dropping discontinuously to lower
and lower local minima[19]. Another possibility[20] with similar consequences
is based on the introduction of an antisymmetric gauge potential $A_{\mu\nu\lambda}$, which
enters in the Lagrangian density in a term proportional to $F^{\mu\nu\lambda\kappa}F_{\mu\nu\lambda\kappa}$, where
$F_{\mu\nu\lambda\kappa}$ is $\partial_\kappa A_{\mu\nu\lambda}$ with antisymmetrized indices. Instead of a scalar field
tunneling from one minimum of a potential to another, the vacuum energy evolves
through the formation of membranes, across which there is a discontinuity in
the value of Lorentz-invariant gauge fields $F_{\mu\nu\lambda\kappa} = F\epsilon_{\mu\nu\lambda\kappa}$. To allow an
anthropic explanation of the smallness of the vacuum energy, it is essential that
the metastable values of the vacuum energy be very close together. Several
models of this sort have been proposed recently[21].
Under any of these alternatives, we have not only an upper bound[15] on the
vacuum energy density, given by the matter energy density at the time of forma-
tion of the earliest galaxies, but also a plausible expectation, which Vilenkin calls
the principle of mediocrity[22], that the vacuum energy density found by typical
astronomers will be comparable to the mass density at the time when most galax-
ies condense, since any larger vacuum energy density would reduce the number
of galaxies formed, and there is no reason why the vacuum energy density should
be much smaller. The observed vacuum energy density is somewhat smaller than
this, but not very much smaller. This can be put quantitatively[23]: under the
assumption[24] that the a priori probability distribution of the vacuum energy is
approximately constant within the narrow range within which galaxies can form,
the probability that an astronomer in any of the big bangs would find a value of ΩΛ
as small as 0.7 ranges from 5% to 12%, depending on various assumptions about
the initial fluctuations. In this calculation the fractional fluctuation in the cosmic
mass density at recombination is assumed to take the value observed in our big
bang, since the vacuum energy would have a negligible effect on physical processes
at and before recombination. There are also interesting calculations along these
lines in which the rms value of density fluctuations at recombination is allowed to
vary independently of the vacuum energy[25].

IV COSMIC MICROWAVE BACKGROUND ANISOTROPIES
Perhaps the most remarkable improvement in cosmological knowledge over the
past decade has been in studies of the cosmic microwave background. Since COBE,
there is for the first time a cosmological parameter — the radiation temperature —
that is known to three significant figures. More recently, since the BOOMERANG
and MAXIMA experiments reviewed here by Paolo de Bernardis, our knowledge
of small angular scale anisotropies has become good enough to set useful limits on
other cosmological parameters, such as the present spatial curvature.
Unfortunately, this has produced a frustrating situation for those of us who
are not specialists in the theory of the cosmic microwave background. We see
papers in which experimental results for the strengths Cℓ of the ℓth multipole in
the temperature correlation function are compared with computer generated plots
of Cℓ versus ℓ for various values of the cosmological parameters, without the non-
specialist reader being able to understand why the theoretical plots of Cℓ versus
ℓ look the way they do, or why they depend on cosmological parameters the way
they do. I want to take the opportunity here to advertise a formalism[26] that I
think helps in understanding the main features of the observed anisotropies, and
how they depend on various cosmological assumptions.
One can show under very general assumptions that the fractional variation from
the mean of the cosmic microwave background temperature observed in a direction
$\hat n$ takes the form

$$\frac{\Delta T(\hat n)}{T} = \int d^3k\,\epsilon_{\mathbf k}\,e^{i d_A \mathbf k\cdot\hat n}\left[F(k) + i\,\hat n\cdot\hat k\,G(k)\right], \tag{6}$$

where $d_A$ is the angular diameter distance of the surface of last scattering

$$d_A = \frac{1}{\Omega_C^{1/2} H_0 (1+z_L)}\,\sinh\left[\Omega_C^{1/2}\int_{1/(1+z_L)}^{1}\frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_C x^2 + \Omega_M x}}\right], \tag{7}$$

(with $z_L \simeq 1100$ and $\Omega_C \equiv 1 - \Omega_M - \Omega_\Lambda$); $k^2\epsilon_{\mathbf k}$ is proportional to the Fourier
transform of the fluctuation in the energy density at early times (with $\mathbf k$ the physical
wave number vector at the nominal moment of last scattering, so that dA k in
the argument of the exponential is essentially independent of how this moment is
defined); and F (k) and G(k) are a pair of form factors that incorporate all relevant
information about acoustic oscillations up to the time of last scattering, with F (k)
arising from intrinsic temperature fluctuations and the Sachs–Wolfe effect, and
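$G(k)$ arising from the Doppler effect. Given the form factors, one can find the
coefficients $C_\ell$ for $\ell \gg 1$ by a single integration

$$\ell(\ell+1)C_\ell \to \frac{8\pi^2\ell^3}{d_A^3}\int_1^\infty d\beta\,\mathcal{P}(\ell\beta/d_A)\left[\frac{\beta F^2(\ell\beta/d_A)}{\sqrt{\beta^2-1}} + \frac{\sqrt{\beta^2-1}\,G^2(\ell\beta/d_A)}{\beta}\right], \tag{8}$$

where $\mathcal{P}(k)$ is the power spectral function, defined by

$$\langle \epsilon_{\mathbf k}\,\epsilon_{\mathbf k'}\rangle = \delta^3(\mathbf k + \mathbf k')\,\mathcal{P}(k). \tag{9}$$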

(The first term in the square brackets in Eq. (8) appeared in a calculation by Bond
and Efstathiou[27]; I think the second is new.)
As you can see from the $F^2(k)$ term in Eq. (8), for $\ell \gg 1$ the main contribution
to $C_\ell$ of the Sachs–Wolfe effect and intrinsic temperature fluctuations comes from
wave numbers close to $\ell/d_A$, but this well-known result is not a good approximation
for the Doppler effect form factor $G(k)$. Since it is the form factors rather than $C_\ell$
that really reflect what was going on before recombination, it is important to try to
measure them more directly, as for instance through interferometric measurements
of the temperature correlation function, of the sort described in a parallel session
by K. Y. Lo et al. and B. S. Mason et al.
The Harrison–Zel’dovich spectrum suggested by theories of new inflation[28] is
$\mathcal{P}(k) = Bk^{-3}$, with $B$ a constant. In this case Eq. (8) gives a formula for $C_\ell$ that
is valid for $\ell \gg 1$ and $\ell \ll d_A/d_H$ (where $d_H \ll d_A$ is the horizon distance at the
time of last scattering):

$$\ell(\ell+1)C_\ell \to 8\pi^2 B F_0^2\left\{1 - \frac{\ell^2}{d_A^2}\left[d^2\left(\ln\left(\frac{\bar d\,\ell}{2d_A}\right) - C\right) - d'^2\right] + \dots\right\}, \tag{10}$$

where $C$ is the Euler constant $C \equiv -\Gamma'(1) = 0.57722$, and $d$ and $d'$ are a pair of
characteristic lengths of order $d_H$:

$$d^2 \equiv \frac{2F_0F_2 + G_1^2}{F_0^2}, \qquad d'^2 \equiv \frac{3F_0F_2 + \frac{1}{2}G_1^2}{F_0^2}, \tag{11}$$

expressed in terms of coefficients in a power series expansion of the form factors:

$$F(k) = F_0 + F_2k^2 + \cdots, \qquad G(k) = G_1k + G_3k^3 + \cdots. \tag{12}$$

(This formula applies even when $\ell$ is not much larger than unity, except for $\ell = 0$
and $\ell = 1$[29], provided we replace $\ell^2$ with $\ell(\ell+1)$ and $\ln\ell$ with $\sum_{r=1}^{\ell} 1/r + C$.)
The quantity $\bar d$ in the logarithm is another length of order $d_H$, this one given by a
much more complicated expression involving the form factors at all wave numbers,
but since $d_H \ll d_A$ the precise value of $\ln(\bar d/2d_A)$ does not depend sensitively on
the precise value of $\bar d$.
One advantage of this formalism is that it provides a nice separation between the
three different kinds of effect that influence the observed temperature fluctuation,
that arise in three different eras: the power spectral function P(k) characterizes
the origin of the fluctuations, perhaps in the era of inflation; the form factors F (k)
and G(k) characterize acoustic fluctuations up to the time of last scattering; and
the angular diameter distance dA depends on the propagation of light since then.
This allows us to see easily what depends on what parameters. The form factors
$F(k)$ and $G(k)$ depend strongly on $\Omega_Bh^2$ (through the effect of baryons on the
sound speed) and more weakly on $\Omega_Mh^2$ (through the effect of radiation on the
expansion rate before the time of last scattering), but since the curvature and
vacuum energy were negligible at and before last scattering, F (k) and G(k) are
essentially independent of the present curvature and of ΩΛ . The power spectral
function P(k) is expected to be independent of all these parameters. On the other
hand, dA is affected by whatever governed the paths of light rays since the time of
last scattering, so it depends strongly on ΩM , ΩΛ , and the spatial curvature, but
it is essentially independent of ΩB . In quintessence theories dA would be given by
a formula different from (7), but P(k) and the form factors would be essentially
unchanged as long as the quintessence energy density was a small part of the total
energy density at and before the time of last scattering. In particular, Eq. (8)
shows that ℓ(ℓ + 1)Cℓ for ℓ ≫ 1 depends on ℓ and dA only through the ratio ℓ/dA ,
so changes in ΩΛ or the introduction of quintessence would lead to a re-scaling of
all the ℓ-values of the peaks in the plots of ℓ(ℓ + 1)Cℓ versus ℓ, but would have little
effect on their height.
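Since the peak positions scale with $d_A$, it can be useful to evaluate Eq. (7) directly. The following is a minimal sketch under stated assumptions: the distance is returned in units of $1/H_0$; the substitution $x = u^2$ (a choice made here, not taken from the text) removes the steep behaviour of the integrand near the lower limit; the integral is done with a composite Simpson rule; and for $\Omega_C < 0$ the hyperbolic sine is continued to an ordinary sine. Function names and default parameters are illustrative.

```python
import math


def eq7_integral(omega_m, omega_lambda, z_l=1100.0, steps=2000):
    """Integral in Eq. (7), evaluated after the substitution x = u**2.

    With x = u**2 the integrand dx / sqrt(O_L x^4 + O_C x^2 + O_M x) becomes
    2 du / sqrt(O_L u^6 + O_C u^2 + O_M), which is smooth on the whole range.
    Composite Simpson rule; `steps` must be even.
    """
    omega_c = 1.0 - omega_m - omega_lambda
    u_min = (1.0 + z_l) ** -0.5

    def integrand(u):
        return 2.0 / math.sqrt(omega_lambda * u**6 + omega_c * u**2 + omega_m)

    h = (1.0 - u_min) / steps
    total = integrand(u_min) + integrand(1.0)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(u_min + i * h)
    return total * h / 3.0


def angular_diameter_distance(omega_m, omega_lambda, z_l=1100.0):
    """d_A of Eq. (7) in units of 1/H_0."""
    omega_c = 1.0 - omega_m - omega_lambda
    integral = eq7_integral(omega_m, omega_lambda, z_l)
    if abs(omega_c) < 1e-12:          # flat limit: sinh(s*I)/s -> I
        transverse = integral
    elif omega_c > 0.0:               # negatively curved space
        s = math.sqrt(omega_c)
        transverse = math.sinh(s * integral) / s
    else:                             # positively curved: sinh continues to sin
        s = math.sqrt(-omega_c)
        transverse = math.sin(s * integral) / s
    return transverse / (1.0 + z_l)


if __name__ == "__main__":
    flat_matter = angular_diameter_distance(1.0, 0.0)
    flat_lambda = angular_diameter_distance(0.3, 0.7)
    # The ratio is the factor by which all peak positions in l are rescaled.
    print(f"d_A(0.3, 0.7) / d_A(1, 0) = {flat_lambda / flat_matter:.3f}")
```

For a flat, matter-only universe the integral can be done by hand, giving $d_A H_0 = 2[1 - (1+z_L)^{-1/2}]/(1+z_L)$, which provides a simple check of the numerical routine.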
Another advantage of this formalism is that, although Cℓ must be calculated by
a numerical integration, it is possible to give approximate analytic expressions for
the form factors in terms of elementary functions, at least in the approximation
that the dark matter dominates the gravitational field for a significant length of
time before last scattering. (There have been numerous earlier analytic calculations
of the temperature fluctuations[30], and their results may all be put in the form
(6), but my point here is that this form is general, not depending on the particular
approximations used.) In this approximation the form factors for very small wave
numbers are

$$F(k) \to 1 - 3k^2t_L^2/2 - 3\left[-\xi^{-1} + \xi^{-2}\ln(1+\xi)\right]k^4t_L^4/4 + \dots, \tag{13}$$

$$G(k) \to 3kt_L - 3k^3t_L^3/2(1+\xi) + \dots, \tag{14}$$

while for wave numbers large enough to allow the use of the WKB approximation
the form factors are

$$F(k) = (1 + 2\xi/k^2t_L^2)^{-1}\left[-3\xi + 2\xi/k^2t_L^2 + (1+\xi)^{-1/4}\,e^{-k^2d_\Delta^2}\cos(kd_H)\right], \tag{15}$$

and

$$G(k) = \sqrt{3}\,(1 + 2\xi/k^2t_L^2)^{-1}(1+\xi)^{-3/4}\,e^{-k^2d_\Delta^2}\sin(kd_H). \tag{16}$$

Here $t_L$ is the time of last scattering; $\xi = 27\,\Omega_Bh^2$ is 3/4 the ratio of the baryon
to photon energy densities at this time; $d_H$ is the acoustic horizon size at this
time; and $d_\Delta$ is a damping length, typically less than $d_H$. Using these results in
Eq. (8) gives the curves for $\ell(\ell+1)C_\ell/6C_2$ versus $\ell d_H/d_A$ shown in Figure 1, in
the approximation that damping and the term $2\xi/k^2t_L^2$ may be neglected near the
peak. In this approximation the scalar form factor $F(k)$ has a peak at $k_1 = \pi/d_H$
for any value of $\Omega_Bh^2$, but the peak in $\ell(\ell+1)C_\ell$ does not appear (as is often
said) at $\ell = k_1d_A = \pi d_A/d_H$; instead, $\ell d_H/d_A$ at the peak ranges from 3.0 to 2.6,
depending on the value of $\Omega_Bh^2$.

FIGURE 1. Plots of the ratio of the multipole strength parameter $\ell(\ell+1)C_\ell$ to its value at
small $\ell$, versus $\ell d_H/d_A$, where $d_H$ is the horizon size at the time of last scattering and $d_A$ is the
angular diameter distance of the surface of last scattering. The curves are for $\Omega_Bh^2$ ranging (from
top to bottom) over the values 0.03, 0.02, 0.01, and 0, corresponding to $\xi$ taking the values 0.81,
0.54, 0.27, and 0. The solid curves are calculated using the WKB approximation; dashed lines
indicate an extrapolation to the known value at small $\ell d_H/d_A$.

We see even from these crude calculations how sensitive is the height of the first
peak in $\ell(\ell+1)C_\ell/6C_2$ to the baryon density parameter $\Omega_Bh^2$. (The experimental
value[31] for the height of this peak is about 6.) Right now, there is some worry
about the fact that the value of $\Omega_Bh^2$ inferred from the ratio of the heights of the
second and first peaks is larger than that inferred from considerations of cosmological
nucleosynthesis. Perhaps it would be worth trying to estimate $\Omega_Bh^2$ by
comparing theory and experiment for the ratio of $\ell(\ell+1)C_\ell$ at the first peak to its
value for small $\ell$, discarding the data at the second peak where the statistics are
worse and complicated damping effects make the theory more complicated.¹

¹ At the meeting someone in the audience said that this has been done, but that was in the
early days, not I think with the more detailed information now available.

ACKNOWLEDGMENTS

I am grateful to Willy Fischler, Hugo Martel, Paul Shapiro, and Craig Wheeler
for their help in preparing this report. This research was supported in part by a
grant from the Welch Foundation and by National Science Foundation Grants PHY
9511632 and PHY 0071512.

REFERENCES

1. For a review, see J. Polchinski, in Fields, Strings, and Duality – TASI 1996,
eds. C. Efthimiou and B. Greene (World Scientific, Singapore, 1996): 293.
2. This was first discussed in the context of string theory by I. Antoniadis, Phys.
Lett. B246, 377 (1990); I. Antoniadis, C. Muñoz, and M. Quirós, Nucl. Phys.
B397, 515 (1993); I. Antoniadis, K. Benakli, and M. Quirós, Phys. Lett.
B331, 313 (1994); J. Lykken, Phys. Rev. D54, 3693 (1996); E. Witten, Nucl.
Phys. B471, 135 (1996); and then developed in more general terms by N.
Arkani-Hamed, S. Dimopoulos, and G. Dvali, Phys. Lett. B429, 263 (1998);
I. Antoniadis, N. Arkani-Hamed, S. Dimopoulos, and G. Dvali, Phys. Lett.
B436, 257 (1998). A different approach has been pursued by L. Randall and R.
Sundrum, Phys. Rev. Lett. 83, 3370 (1999).
3. N. Arkani-Hamed, S. Dimopoulos, and G. Dvali, Phys. Rev. D59, 086004
(1999); S. Hannestad and G. G. Raffelt, astro-ph/0103201.
4. S. Hannestad, astro-ph/0102290.
5. H. Georgi, H. Quinn, and S. Weinberg, Phys. Rev. Lett. 33, 451 (1974).
6. S. Dimopoulos and H. Georgi, Nucl. Phys. B193, 150 (1981); J. Ellis, S.
Kelley, and D. V. Nanopoulos, Phys. Lett. B260, 131 (1991); U. Amaldi, W.
de Boer, and H. Fürstenau, Phys. Lett. B260, 447 (1991); C. Giunti, C. W.
Kim and U. W. Lee, Mod. Phys. Lett. A6, 1745 (1991); P. Langacker and
M.-X. Luo, Phys. Rev. D44, 817 (1991). For other references and more recent
analyses of the data, see P. Langacker and N. Polonsky, Phys. Rev. D47, 4028
(1993); D49, 1454 (1994); L. J. Hall and U. Sarid, Phys. Rev. Lett. 70, 2673
(1993).
7. S. Dimopoulos, S. Raby, and F. Wilczek, Phys. Rev. D24, 1681 (1981).
8. K. R. Dienes, E. Dudas, and T. Gherghetta, hep-ph/9806292, 9807522.
9. For recent detailed reviews, see S. Weinberg, in Sources and Detection of Dark
Matter and Dark Energy in the Universe — Fourth International Symposium,
D. B. Cline, ed. (Springer, Berlin, 2001), p. 18; E. Witten, ibid., p. 27; and
J. Garriga and A. Vilenkin, hep-th/0011262.
10. K. Freese, F. C. Adams, J. A. Frieman, and E. Mottola, Nucl. Phys. B287,
797 (1987); P. J. E. Peebles and B. Ratra, Astrophys. J. 325, L17 (1988);
B. Ratra and P. J. E. Peebles, Phys. Rev. D 37, 3406 (1988); C. Wetterich,
Nucl. Phys. B302, 668 (1988).
11. C. Armendariz-Picon, V. Mukhanov, and P. J. Steinhardt, astro-ph/0004134.
12. N. Arkani-Hamed, S. Dimopoulos, N. Kaloper, and R. Sundrum, Phys. Lett.
B 480, 193 (2000); S. Kachru, M. Schulz, and E. Silverstein, Phys. Rev. D62,
045021 (2000).
13. J. E. Kim, B. Kyae, and H. M. Lee, hep-th/0011118.
14. J. D. Barrow and F. J. Tipler, The Anthropic Cosmological Principle (Claren-
don Press, Oxford, 1986).
15. S. Weinberg, Phys. Rev. Lett. 59, 2607 (1987).
16. E. Baum, Phys. Lett. B133, 185 (1984); S. W. Hawking, in Shelter Island II
– Proceedings of the 1983 Shelter Island Conference on Quantum Field Theory
and the Fundamental Problems of Physics, ed. by R. Jackiw et al. (MIT Press,
Cambridge, 1985); Phys. Lett. B134, 403 (1984); S. Coleman, Nucl. Phys. B
307, 867 (1988).
17. A. Vilenkin, Phys. Rev. D 27, 2848 (1983); A. D. Linde, Phys. Lett. B175,
395 (1986).
18. J. Garriga and A. Vilenkin, astro-ph/9908115.
19. L. Abbott, Phys. Lett. B195, 177 (1987).
20. J. D. Brown and C. Teitelboim, Nucl. Phys. 279, 787 (1988).
21. R. Bousso and J. Polchinski, JHEP 0006:006 (2000); J. L. Feng, J.
March-Russell, S. Sethi, and F. Wilczek, hep-th/0005276.
22. A. Vilenkin: Phys. Rev. Lett. 74, 846 (1995); in Cosmological Constant and
the Evolution of the Universe, ed. by K. Sato et al. (Universal Academy Press,
Tokyo, 1996).
23. H. Martel, P. Shapiro, and S. Weinberg, Ap. J. 492, 29 (1998).
24. S. Weinberg, in Critical Dialogs in Cosmology, ed. by N. Turok (World Sci-
entific, Singapore, 1997). Counterexamples in theories of type (b) are pointed
out in reference [18], and the issue is further discussed in reference [9].
25. G. Efstathiou, Mon. Not. Roy. Astron. Soc. 274, L73 (1995); M. Tegmark
and M. J. Rees, Astrophys. J. 499, 526 (1998); J. Garriga, M. Livio, and
A. Vilenkin, Phys. Rev. D61, 023503 (2000); S. Bludman, Nucl. Phys.
A663-664, 865 (2000).
26. S. Weinberg, astro-ph/0103279 and 0103281.
27. J. R. Bond and G. Efstathiou, Mon. Not. R. Astr. Soc. 226, 655 (1987),
Eq. (4.19).
28. S. Hawking, Phys. Lett. 115B, 295 (1982); A. A. Starobinsky, Phys. Lett.
117B, 175 (1982); A. Guth and S.-Y. Pi, Phys. Rev. Lett. 49, 1110 (1982); J.
M. Bardeen, P. J. Steinhardt, and M. S. Turner, Phys. Rev. D28, 679 (1983);
W. Fischler, B. Ratra, and L. Susskind, Nucl. Phys. B259, 730 (1985).
29. In Eq. (6) terms are neglected that only affect C0 and C1 ; for these terms, see
A. Dimitropoulos and L.P. Grishchuk, gr-qc/0010087.
30. P. J. E. Peebles and J. T. Yu, Ap. J. 162, 815 (1970); J. R. Bond and G.
Efstathiou, Ap. J. Lett. 285, L45 (1984); Mon. Not. Roy. Astron. Soc. 226,
655 (1987); C-P. Ma and E. Bertschinger, Ap. J. 455, 7 (1995); W. Hu and
N. Sugiyama, Ap. J. 444, 489 (1995); 471, 542 (1996).
31. A. H. Jaffe et al., astro-ph/0007333.
UTTG–06–02

Cosmological Fluctuations Of Small Wavelength

Steven Weinberg

Theory Group, Department of Physics, University of Texas, Austin, TX, 78712


arXiv:astro-ph/0207375v1 18 Jul 2002

weinberg@physics.utexas.edu

ABSTRACT

This paper presents a completely analytic treatment of cosmological fluctuations
whose wavelength is small enough to come within the horizon well before
the energy densities of matter and radiation become equal. This analysis yields a
simple formula for the conventional transfer function T (k) at large wave number
k, which agrees very well with computer calculations of T (k). It also yields an
explicit formula for the microwave background multipole coefficient Cℓ at very
large ℓ.

Subject headings: cosmic microwave background — dark matter — early universe

1. Introduction

The transfer function gives the wave length dependence of the growth of perturbations
in the cold dark matter density from early times to near the present. As such, it plays a
central role in theoretical studies of cosmological structure formation, and it also enters in
the calculation of the microwave background anisotropies of large multipole number. For
general wave length the transfer function can only be calculated numerically. This paper will
present a purely analytic solution of the equations governing the evolution of perturbations
in the early universe in the case of small wave length,1 which yields a simple formula for the
transfer function in this case, including the numerical parameters appearing in this formula.

1
In speaking of small wavelengths, it is nevertheless assumed that the wavelength is large enough so that
the fluctuations are far outside the horizon during the era of electron–positron annihilation, and large enough
so that viscosity and heat conduction are negligible until close to the time of recombination, as is the case
for all fluctuations of physical interest.

The most closely related previous work seems to be that of Hu & Sugiyama (1996).
In contrast with their work, the present paper provides the justification for a crucial step
in calculating fluctuations in the dark matter density (see footnote 4 below); it is entirely
analytic, even in following perturbations through the era of horizon crossing and in analyzing
the case of infinite wavelength (which is needed to normalize the transfer function); and
explicit formulas are given for the numerical parameters in the transfer function at small
wavelength and for the cosmic microwave background multipole coefficient Cℓ at large ℓ.

2. Generalities

We consider the contents of the universe to consist of radiation plus cold dark mat-
ter plus baryons (electrons and nuclei). We include neutrinos in the radiation, neglect-
ing the anisotropic part of their energy-momentum tensor, which makes possible a purely
analytic treatment. As usual, the cold dark matter is taken to have zero pressure and
only gravitational interactions. For simplicity at first we will assume local thermal equi-
librium, so that the fractional changes in the baryon and radiation densities are related by
δρB /ρB = 3δρR /4ρR ≡ δR , which is a good approximation until late in the matter-dominated
era, and we will ignore the effects of curvature and a cosmological constant, which are neg-
ligible until near the present. Later these effects and departures from equilibrium will be
taken into account where they are relevant.
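The evolution of compressional cosmological perturbations under these assumptions is
governed by the equations:2
$$\frac{d}{dt}\left(a^2\psi\right) = -4\pi G\,a^2\left[\rho_D\,\delta_D + \left(\tfrac{8}{3}\rho_R + \rho_B\right)\delta_R\right], \tag{1}$$
$$\dot\delta_D = -\psi\,, \qquad \dot\delta_R = -\psi + q^2 U_R\,, \tag{2}$$
$$\frac{d}{dt}\left[a^5\left(\tfrac{4}{3}\rho_R + \rho_B\right)U_R\right] = -\tfrac{4}{9}\,a^3\rho_R\,\delta_R\,. \tag{3}$$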

Here q is the co-moving wave number; a(t) is the Robertson–Walker scale factor; UR(t)
is the radiation velocity potential; δD is the fractional change δρD/ρD in the dark matter
density ρD; and dots indicate ordinary time derivatives. We are using a synchronous gauge,
with vanishing time-time and time-space components of the metric perturbation δgµν, and
with the remaining gauge freedom removed by requiring that the cold dark matter velocity
vanishes. In this gauge, all effects of gravitational perturbations for compressional modes
are contained in the field ψ(t) ≡ d(δg_kk(t)/2a²(t))/dt.

2
These equations are a simple extension of Eqs. (15.10.50), (15.10.51), and (15.10.53) of Weinberg (1972)
to the multi-fluid system considered here.

For general wave numbers these equations are too complicated to be solved analytically.
However, for large q we can divide the evolution of the fluctuations into two overlapping
eras, in each of which there are approximations available that allow an analytic solution.

3. Radiation Dominated Era

First, consider an era so early that ρD and ρB are much less than ρR, though the
wavelength may be inside or outside the horizon. Here a ∝ √t and t² = 3/32πGρR, and by
eliminating UR we obtain a pair of coupled equations for δR and ψ:
$$\frac{d}{dt}\left(t\,\psi\right) = -\frac{1}{t}\,\delta_R\,, \qquad \frac{d}{dt}\left(\sqrt{t}\,\frac{d\delta_R}{dt}\right) + \frac{q^2\sqrt{t}}{3a^2}\,\delta_R = -\frac{d}{dt}\left(\sqrt{t}\,\psi\right). \tag{4}$$
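The linear combination of the three independent solutions that grows most rapidly for small
time is3
$$\delta_R = \frac{2N}{C^2}\left[\frac{2}{\theta}\sin\theta - \left(1 - \frac{2}{\theta^2}\right)\cos\theta - \frac{2}{\theta^2}\right], \tag{5}$$
$$\psi = N\left[\frac{4}{\theta^3}\sin\theta + \frac{4}{\theta^4}\cos\theta - \frac{4}{\theta^4} - \frac{2}{\theta^2}\right], \tag{6}$$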
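where N is an unknown function of q that is presumably fixed during the era of inflation;
θ ≡ C√t; and C is the constant C ≡ [2q√t/(√3 a)] evaluated for t → 0. Also, Eq. (2) gives
$$\delta_D = -\frac{2N}{C^2}\int_0^{C\sqrt{t}}\left[\frac{4}{\theta^3}\sin\theta + \frac{4}{\theta^4}\cos\theta - \frac{4}{\theta^4} - \frac{2}{\theta^2}\right]\theta\,d\theta\,. \tag{7}$$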

Note that the fractional perturbations δD and δR are both of order ψ/q 2 , justifying the
neglect of the matter term in Eq. (1) when ρR ≫ ρD .
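
The solution (5)–(6) can be checked numerically against the first of Eqs. (4). The following is only a sketch, assuming the arbitrary normalization N = C = 1 (so that θ = √t); the function names are illustrative and do not come from the paper. It evaluates d(tψ)/dt + δR/t by a central difference, which should vanish, and confirms the small-time behavior δR → Nt/2.

```python
import math

def delta_R(theta, N=1.0, C=1.0):
    """Radiation density perturbation of Eq. (5)."""
    return (2.0 * N / C**2) * (
        (2.0 / theta) * math.sin(theta)
        - (1.0 - 2.0 / theta**2) * math.cos(theta)
        - 2.0 / theta**2
    )

def psi(theta, N=1.0):
    """Gravitational field perturbation of Eq. (6)."""
    return N * (
        4.0 * math.sin(theta) / theta**3
        + 4.0 * math.cos(theta) / theta**4
        - 4.0 / theta**4
        - 2.0 / theta**2
    )

def residual_eq4(t, N=1.0, C=1.0, h=1e-5):
    """d(t*psi)/dt + delta_R/t, using a central difference in t (should be ~0)."""
    def t_psi(s):
        return s * psi(C * math.sqrt(s), N)
    derivative = (t_psi(t + h) - t_psi(t - h)) / (2.0 * h)
    return derivative + delta_R(C * math.sqrt(t), N, C) / t

if __name__ == "__main__":
    for t in (0.5, 2.0, 10.0, 50.0):
        print(t, residual_eq4(t))
    theta = 0.01  # small time: delta_R / t should approach N/2
    print(delta_R(theta) / theta**2)
```

The residual is limited only by finite-difference and rounding error, so the printed values are small but not exactly zero.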
For convenience later, it is useful to normalize the Robertson–Walker scale factor so
that a = 1 at the time tEQ when the matter density ρM ≡ ρD + ρB and the radiation density
ρR have a common value ρEQ. Then at early times we have a → (32πGρEQ/3)^{1/4} √t, and
so C = (q/√3)(2πGρEQ/3)^{−1/4}. Also, q is now defined as the physical wave number q/a at
t = tEQ.

3
Aside from normalization, this solution is equivalent to that given for the Newtonian potential in a
different gauge in Eq. (48) of Bashinsky & Bertschinger (2002).

4. Deep Inside the Horizon

Following this is an era in which the dark matter density may not be negligible, but
the wavelength is well within the horizon. With the physical wave number q/a much greater
than the expansion rate, there are two kinds of normal mode, that can be calculated using
two different methods of approximation.
The first are the “fast” modes, for which d/dt acting on perturbations gives factors
of order q/a. Inspection of Eqs. (1)–(3) shows that there is a solution with δR = O(qψ),
UR = O(ψ), and δD = O(ψ/q), so that we can neglect the term ψ on the right-hand side of
Eq. (2), and even for ρD > ρR we can neglect the dark matter term on the right-hand side
of Eq. (1). Eliminating UR then gives an equation for δR alone:
$$\frac{d}{dt}\left[(1+R)\,a\,\frac{d\delta_R}{dt}\right] + \frac{q^2}{3a}\,\delta_R = 0\,, \tag{8}$$
where R ≡ 3ρB/4ργ. This has the well-known WKB solutions (Peebles & Yu, 1970)
$$\delta_{R\pm} = (1+R)^{-1/4}\exp\left(\pm i\int\frac{q\,dt}{\sqrt{3(1+R)}\,a}\right), \tag{9}$$
which would be exact for vanishing R.
Then there are “slow” modes, for which d/dt acting on perturbations gives factors
of order 1/t. Inspection of Eqs. (2)–(3) shows that in this case there is a solution with
δD = O(ψ), ψ ≃ q²UR, and δR = O(UR) = O(ψ/q²). It follows that even for ρD < ρR we
can neglect the radiation term on the right-hand side of Eq. (1), so that after eliminating
the field ψ we have
$$\frac{d}{dt}\left(a^2\,\frac{d\delta_D}{dt}\right) = 4\pi G\,a^2\rho_D\,\delta_D\,. \tag{10}$$
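It is convenient to convert the independent variable from t to a, using the Friedmann equation
$$\frac{\dot a^2}{a^2} = \frac{8\pi G}{3}\left(\rho_M + \rho_R\right) = \frac{8\pi G\rho_{EQ}}{3}\left(a^{-3} + a^{-4}\right) \tag{11}$$
so that Eq. (10) reads4
$$a(1+a)\,\frac{d^2\delta_D}{da^2} + \left(1 + \frac{3a}{2}\right)\frac{d\delta_D}{da} - \frac{3}{2}(1-\beta)\,\delta_D = 0\,, \tag{12}$$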

where β ≡ ρB/ρM = ΩB/ΩM. The independent solutions of Eq. (12) for β = 0 were given
by Mészáros (1974) and Groth & Peebles (1975):
$$f_1 = 1 + \frac{3a}{2}\,, \qquad f_2 = \left(1 + \frac{3a}{2}\right)\ln\left(\frac{\sqrt{1+a}+1}{\sqrt{1+a}-1}\right) - 3\sqrt{1+a}\,. \tag{13}$$
Hu and Sugiyama (1996) have given the solutions for general β in terms of hypergeometric
functions, but the necessity of matching these solutions to those that apply after
recombination leads to an extremely complicated formula for the transfer function, which obscures the
dependence of the result on the baryon density. Here we will assume that β is small, though
not entirely negligible, and work with solutions valid only to first order in β. The first-order
solutions of Eq. (12) with the same behavior for a ≪ 1 as the zeroth order solutions f1 and
f2 are:
$$\delta_D^{(1,2)}(a) = f_{1,2}(a) - \frac{3\beta}{2}\int_0^a\left[f_1(a)f_2(b) - f_2(a)f_1(b)\right]\frac{f_{1,2}(b)\,db}{\sqrt{1+b}}\,. \tag{14}$$

By applying Eqs. (2) and (3), we can find the fast mode solutions for UR± and δD± from
Eq. (9) and the slow mode solutions for UR^(1,2) and δR^(1,2) from Eq. (14). These four modes
form a complete set of solutions of the fourth-order system of equations (1)–(3) up to the time
of recombination for q/a ≫ ȧ/a. The physical solution is a linear combination of these four
modes, to be found by matching their behavior for a ≪ 1 to that found in Section 3.

4
Hu and Sugiyama (1996) pointed out that this equation leads to a transfer function with the asymptotic
form ln k/k², but it has not been clear why it is legitimate in deriving Eq. (12) to neglect fluctuations in
the radiation energy density as a contribution to the source of the gravitational field during the
radiation-dominated era. Eq. (12) was first derived by Mészáros (1974), who simply ignored fluctuations in the
radiation density. Groth and Peebles (1975) neglected fluctuations in the radiation density on the grounds
that the wavelength is much less than the Jeans length, which for radiation is the horizon, but the relevance
of the Jeans length in an expanding universe containing both radiation and dark matter is not clear. In
their Appendix B, Hu and Sugiyama (1996) neglected the contributions of perturbations in the dark matter
density to the gravitational field at early times, and showed that then the contribution of perturbations in the
radiation density to the gravitational field is also negligible. But this does not justify the use of equation
(12). For this, it is necessary to show that perturbations in the radiation density make negligible contributions
to the gravitational field when the contributions of the dark matter perturbation are not negligible, as is the
case late in the radiation dominated era and during the cross-over from radiation to matter dominance. Liddle
and Lyth (2000) on p. 107 attempted to explain the neglect of perturbations in the radiation energy density
in Eq. (12) by claiming that Silk damping makes these perturbations decay away. This is incorrect. For
wavelengths of physical interest Silk damping is negligible during the radiation-dominated era and through
the time of radiation-matter equality. (This has been acknowledged by Liddle and Lyth in an erratum:
star-www.cpes.sussex.ac.uk/ andrewl/infbook/errata.html.) The neglect of perturbations in the radiation
density in Eq. (12) is explained by counting powers of 1/q as done here, and it applies only to the slow
mode part of the solution; in the fast mode it is the perturbations in the dark matter density that become
negligible for small wavelength.

5. Matching

Fortunately, for small wavelength there is an overlap in the two eras in which we have
found solutions for δD, etc., satisfying both conditions q/a ≫ ȧ/a and ρM ≪ ρR. In this
period C√t ≫ 1, and Eq. (5) gives the oscillating part of the fractional perturbation in the
radiation density as δR = −(2N/C²) cos C√t, which for ρM ≪ ρR fits smoothly with the
linear combination of the fast solutions (9) for q/a ≫ ȧ/a:
$$\delta_R^{\rm fast} = -\frac{2N}{(1+R)^{1/4}C^2}\cos\left(\int_0^t\frac{q\,dt}{\sqrt{3(1+R)}\,a}\right) = -\frac{2N\sqrt{6\pi G\rho_{EQ}}}{(1+R)^{1/4}\,q^2}\cos\left(\int_0^t\frac{q\,dt}{\sqrt{3(1+R)}\,a}\right), \tag{15}$$
from which we also find, to leading order in 1/q,
$$\psi^{\rm fast} = \frac{3N a\,(1+R)^{1/4}(2+R)\sqrt{2\pi G\rho_{EQ}}}{q^3 t^2}\,\sin\left(\int_0^t\frac{q\,dt}{\sqrt{3(1+R)}\,a}\right), \tag{16}$$
$$\delta_D^{\rm fast} = \frac{3N a^2(1+R)^{3/4}(2+R)\sqrt{6\pi G\rho_{EQ}}}{q^4 t^2}\,\cos\left(\int_0^t\frac{q\,dt}{\sqrt{3(1+R)}\,a}\right), \tag{17}$$
and
$$U_R^{\rm fast} = \frac{2N\sqrt{2\pi G\rho_{EQ}}}{q^4\,a\,(1+R)^{3/4}}\,\sin\left(\int_0^t\frac{q\,dt}{\sqrt{3(1+R)}\,a}\right). \tag{18}$$

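To find the coefficients in the slow modes, we note that the limit of Eq. (7) for C√t ≫ 1
is
$$\delta_D \to \frac{4N}{C^2}\left[-\frac{1}{2} + \gamma + \ln\left(C\sqrt{t}\right)\right] = \frac{4N\sqrt{6\pi G\rho_{EQ}}}{q^2}\left[-\frac{1}{2} + \gamma + \ln\left(\frac{aq}{\sqrt{8\pi G\rho_{EQ}}}\right)\right], \tag{19}$$
where γ = 0.5772... is the Euler constant. For a ≪ 1, the solutions (14) become
$$\delta_D^{(1)} \to 1\,, \qquad \delta_D^{(2)} \to -\ln(a/4) - 3\,. \tag{20}$$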

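The linear combination of these solutions that fits smoothly with Eq. (19) is then
$$\delta_D^{\rm slow} = \frac{4N}{C^2}\left\{\left[-\frac{7}{2} + \gamma + \ln\left(\frac{2\sqrt{3}\,C^2}{q}\right)\right]\delta_D^{(1)} - \delta_D^{(2)}\right\} = \frac{4N\sqrt{6\pi G\rho_{EQ}}}{q^2}\left\{\left[-\frac{7}{2} + \gamma + \ln\left(\frac{2q}{\sqrt{2\pi G\rho_{EQ}}}\right)\right]\delta_D^{(1)} - \delta_D^{(2)}\right\}. \tag{21}$$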
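
The large-C√t limit quoted in Eq. (19) can be checked by integrating Eq. (7) numerically. The sketch below is illustrative only: it works with the dimensionless combination C²δD/4N, uses Simpson's rule, and replaces the integrand by its leading series terms near θ = 0 to avoid cancellation. The function names, the step count and the series cut-off at θ = 0.05 are assumptions made here, not part of the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329

def integrand(theta):
    """theta * psi / N, with psi taken from Eq. (6)."""
    if theta < 0.05:
        # Leading terms of the small-theta series: psi/N = -1/2 + theta^2/36 + ...
        return -theta / 2.0 + theta**3 / 36.0
    return (4.0 * math.sin(theta) / theta**2
            + 4.0 * math.cos(theta) / theta**3
            - 4.0 / theta**3
            - 2.0 / theta)

def delta_D_scaled(theta_max, steps=200_000):
    """C^2 * delta_D / (4 N) from Eq. (7), by Simpson's rule (steps must be even)."""
    h = theta_max / steps
    total = integrand(0.0) + integrand(theta_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return -0.5 * total * h / 3.0

def asymptote(theta_max):
    """Bracket of Eq. (19): -1/2 + gamma + ln(C sqrt(t))."""
    return -0.5 + EULER_GAMMA + math.log(theta_max)

if __name__ == "__main__":
    for theta_max in (20.0, 200.0):
        print(theta_max, delta_D_scaled(theta_max), asymptote(theta_max))
```

The two columns approach each other as C√t grows, which is the matching condition used to fix the coefficients in Eq. (21).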

The slow part of the velocity potential and radiation density are given by Eqs. (2), (3), and
(21) as
$$U_R^{\rm slow} = \psi^{\rm slow}/q^2 = -\dot\delta_D^{\rm slow}/q^2\,, \tag{22}$$
$$\delta_R^{\rm slow} = -3a^2\,\frac{d}{dt}\left[a(1+R)\,U_R^{\rm slow}\right]. \tag{23}$$
The full solution up to the time of recombination is given by δD = δD^fast + δD^slow and likewise
for UR and δR.
Eq. (17) shows that the fast part of δD is smaller than the slow part (21) by a factor of
order 1/q 2 t2 , so that for small wavelengths the full perturbation to the dark matter density is
given by Eq. (21) from the time that q/a becomes much greater than ȧ/a, and even after the
energy densities of matter and radiation become comparable, up to the time of recombination.
But this is not true of the radiation perturbations. Comparison of Eqs. (22) and (23) with
(18) and (15) shows that for large q the perturbations to the radiation velocity potential and
density are dominated by the fast mode, by one and two factors of q, respectively.

6. The Transfer Function

The transfer function T (k) is properly defined as the growth of the total matter density
perturbation for a given present physical wave number k ≡ q/a(t0 ) = q(1 + zEQ ), from early
in the radiation-dominated era to late in the matter dominated era, relative to the growth
that occurs in the same time interval for zero wave number. We must therefore now project
the solution we have found for the density perturbations forward into the era following the
time of recombination. In this era the baryonic perturbation is no longer suppressed by
radiation pressure, and so it follows the same equation as the dark matter perturbation:

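$$a(1+a)\,\frac{d^2\delta_B}{da^2} + \left(1 + \frac{3a}{2}\right)\frac{d\delta_B}{da} = a(1+a)\,\frac{d^2\delta_D}{da^2} + \left(1 + \frac{3a}{2}\right)\frac{d\delta_D}{da} = \frac{3}{2}\,\delta_M\,, \tag{24}$$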

where δM ≡ δρM /ρM = (1 − β)δD + βδB . This does not mean that δB and δD are equal, for
they satisfy different initial conditions at recombination. But from a linear combination of
these equations for δD and δB we find that

$$a(1+a)\,\frac{d^2\delta_M}{da^2} + \left(1 + \frac{3a}{2}\right)\frac{d\delta_M}{da} - \frac{3}{2}\,\delta_M = 0\,. \tag{25}$$

This has the solutions (13). To find the correct linear combination of these solutions, we
note that δB and δ̇B vanish to leading order in 1/q at recombination, so δM and its
first derivative at recombination must respectively equal (1 − β)δD and its first derivative.
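
That the functions f1 and f2 of Eq. (13) solve Eq. (25) can be confirmed with a finite-difference check. This is a minimal sketch: the step size and sample points are arbitrary choices made here, and the residual is limited by finite-difference accuracy rather than being exactly zero. It also checks the small-a behavior f2 → −ln(a/4) − 3 used in Eq. (20).

```python
import math

def f1(a):
    """Growing solution of Eq. (13)."""
    return 1.0 + 1.5 * a

def f2(a):
    """Second solution of Eq. (13)."""
    s = math.sqrt(1.0 + a)
    return (1.0 + 1.5 * a) * math.log((s + 1.0) / (s - 1.0)) - 3.0 * s

def residual_eq25(f, a, h=1e-4):
    """Left-hand side of Eq. (25) with central-difference derivatives."""
    first = (f(a + h) - f(a - h)) / (2.0 * h)
    second = (f(a + h) - 2.0 * f(a) + f(a - h)) / h**2
    return a * (1.0 + a) * second + (1.0 + 1.5 * a) * first - 1.5 * f(a)

if __name__ == "__main__":
    for a in (0.5, 1.0, 4.38):
        print(a, residual_eq25(f1, a), residual_eq25(f2, a))
    a_small = 1e-6
    print(f2(a_small), -math.log(a_small / 4.0) - 3.0)
```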

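The total matter density perturbation after recombination is the linear combination of the
solutions (13) that matches in this way with the solution (21):
$$\delta_M(a) = \frac{4N\sqrt{6\pi G\rho_{EQ}}}{q^2}\left[A_1 f_1(a) + A_2 f_2(a)\right], \tag{26}$$
where
$$A_1(q) = \left[-\frac{7}{2} + \gamma + \ln\left(\frac{2q}{\sqrt{2\pi G\rho_{EQ}}}\right)\right]\left(1 - \beta - \beta I_{12}\right) - \beta I_{22}\,, \tag{27}$$
$$A_2(q) = -1 + \beta - \beta I_{12} - \beta I_{11}\left[-\frac{7}{2} + \gamma + \ln\left(\frac{2q}{\sqrt{2\pi G\rho_{EQ}}}\right)\right], \tag{28}$$
$$I_{ij} \equiv \frac{3}{2}\int_0^{a_R}\frac{f_i(a)\,f_j(a)\,da}{\sqrt{1+a}}\,. \tag{29}$$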
Near the present, where a ≫ 1, the matter density fluctuation goes to
$$\delta_M \to \frac{18 A_1 N a}{q^2}\sqrt{\frac{2\pi G\rho_{EQ}}{3}}\,. \tag{30}$$

We can find the behavior of δM early in the matter-dominated era by taking C√t ≪ 1 in
Eqs. (5) and (7):
$$\delta_M \to N t/2\,. \tag{31}$$

Eqs. (30) and (31) must be compared with the growth of δM for q = 0. In this case
Eq. (2) gives δ̇R = δ̇D = −ψ, so Eq. (1) becomes
$$\frac{d}{dt}\left(a^2\,\frac{d\delta_M}{dt}\right) = 4\pi G\,a^2\rho_{EQ}\left(\frac{1}{a^3} + \frac{8}{3a^4}\right)\delta_M\,. \tag{32}$$
The solution of Eqs. (32) and (11) that has the same behavior for a → 0 as Eq. (31) is
$$\delta_M = \frac{N}{5a^2}\sqrt{\frac{3}{2\pi G\rho_{EQ}}}\left[16 + 8a - 2a^2 + a^3 - 16\sqrt{1+a}\right]. \tag{33}$$

This has an asymptotic behavior for a ≫ 1:
$$\delta_M \to \frac{N a}{5}\sqrt{\frac{3}{2\pi G\rho_{EQ}}}\,. \tag{34}$$

The transfer function T then has an asymptotic behavior for large wave number given by
the ratio of Eqs. (30) and (34):
$$T \to \frac{60\pi G\rho_{EQ}}{q^2}\,A_1(q)\,, \tag{35}$$

with A1 (q) given by Eq. (27). At late times the growth of δM may be affected by a cosmo-
logical constant or spatial curvature, but these effects are independent of wave number, and
therefore cancel in the transfer function.
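
The limits of the zero-wave-number solution (33) and the prefactor in Eq. (35) can be checked with a few lines of arithmetic. In this sketch the shorthand K = 2πGρEQ and the choice K = N = 1 are assumptions made purely for illustration. The early-time relation between t and a is the one quoted in Section 3, a → (32πGρEQ/3)^{1/4} √t.

```python
import math

def delta_M_q0(a, N=1.0, K=1.0):
    """Eq. (33), with K standing for 2*pi*G*rho_EQ."""
    bracket = 16.0 + 8.0 * a - 2.0 * a**2 + a**3 - 16.0 * math.sqrt(1.0 + a)
    return N / (5.0 * a**2) * math.sqrt(3.0 / K) * bracket

def early_time(a, K=1.0):
    """Invert a = (32 pi G rho_EQ / 3)^(1/4) sqrt(t) = (16 K / 3)^(1/4) sqrt(t)."""
    return a**2 / math.sqrt(16.0 * K / 3.0)

def late_time_limit(a, N=1.0, K=1.0):
    """Eq. (34)."""
    return N * a / 5.0 * math.sqrt(3.0 / K)

def transfer_prefactor(K=1.0):
    """Coefficient of A1/q^2 in the ratio of Eq. (30) to Eq. (34); equals 30 K."""
    return 18.0 * math.sqrt(K / 3.0) / (math.sqrt(3.0 / K) / 5.0)

if __name__ == "__main__":
    a = 0.01
    print(delta_M_q0(a) / (0.5 * early_time(a)))   # approaches 1 as a -> 0 (Eq. 31)
    a = 1.0e4
    print(delta_M_q0(a) / late_time_limit(a))      # approaches 1 as a -> infinity
    print(transfer_prefactor())                    # 30 K = 60 pi G rho_EQ
```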
For ρB ≪ ρD we can neglect β, so Eqs. (35) and (27) give a transfer function
$$T \to \frac{60\pi G\rho_{EQ}}{q^2}\left[-\frac{7}{2} + \gamma + \ln\left(\frac{2q}{\sqrt{2\pi G\rho_{EQ}}}\right)\right]. \tag{36}$$

This can be put in more familiar terms by using the relations ρEQ = (3H0²ΩM/8πG)(1 + zEQ)³,
q = k(1 + zEQ), and 1 + zEQ = ΩM/ΩR, which give the transfer function in terms of the
present wave number k:
$$T(k) \to \frac{45\,\Omega_M^2 H_0^2}{2\,\Omega_R\,k^2}\left[-\frac{7}{2} + \gamma + \ln\left(\frac{4k\sqrt{\Omega_R}}{\sqrt{3}\,\Omega_M H_0}\right)\right] = \frac{\ln(2.40\,Q)}{(4.07\,Q)^2}\,, \tag{37}$$

where Q ≡ k(Mpc⁻¹)/ΩM h², and in the final expression we use ΩR h² = 4.15 × 10⁻⁵. This may
be compared with the BBKS numerical fit (Bardeen et al. 1986) to computer calculations
of the transfer function:
$$T(k) \simeq \frac{\ln(1 + 2.34Q)}{2.34Q}\left[1 + 3.89Q + (16.1Q)^2 + (5.46Q)^3 + (6.71Q)^4\right]^{-1/4}. \tag{38}$$

This goes to ln(2.34Q)/(3.96 Q)2 for large Q, in very good agreement with Eq. (37). Our
simple calculation thus accounts not only for the form of the transfer function for large wave
numbers, but also for its numerical parameters.
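
The numerical coefficients in Eq. (37) follow from ΩR h² = 4.15 × 10⁻⁵ once H0 is expressed in Mpc⁻¹. The sketch below assumes H0 = 100 h km s⁻¹ Mpc⁻¹ and the standard value of the speed of light to make that conversion; the function names are illustrative. It also compares the asymptotic form with the BBKS fit (38) at large Q.

```python
import math

EULER_GAMMA = 0.5772156649015329
SPEED_OF_LIGHT_KM_S = 299792.458
H0_OVER_H = 100.0 / SPEED_OF_LIGHT_KM_S   # H0/h in Mpc^-1 (with c = 1)
OMEGA_R_H2 = 4.15e-5                      # value used in the text

def log_coefficient():
    """Coefficient c1 such that the bracket in Eq. (37) equals ln(c1 * Q)."""
    return (math.exp(-3.5 + EULER_GAMMA) * 4.0 * math.sqrt(OMEGA_R_H2)
            / (math.sqrt(3.0) * H0_OVER_H))

def denominator_coefficient():
    """Coefficient c2 such that 45 Omega_M^2 H0^2 / (2 Omega_R k^2) = 1/(c2 * Q)^2."""
    return math.sqrt(2.0 * OMEGA_R_H2 / 45.0) / H0_OVER_H

def transfer_asymptotic(Q):
    """Right-hand side of Eq. (37)."""
    return math.log(2.40 * Q) / (4.07 * Q) ** 2

def transfer_bbks(Q):
    """BBKS fit, Eq. (38)."""
    poly = 1.0 + 3.89 * Q + (16.1 * Q) ** 2 + (5.46 * Q) ** 3 + (6.71 * Q) ** 4
    return math.log(1.0 + 2.34 * Q) / (2.34 * Q) * poly ** -0.25

if __name__ == "__main__":
    print(log_coefficient(), denominator_coefficient())
    for Q in (10.0, 100.0, 1000.0):
        print(Q, transfer_asymptotic(Q) / transfer_bbks(Q))
```

The two coefficients come out close to 2.40 and 4.07, and the ratio of Eq. (37) to the BBKS fit stays within several percent of unity at large Q.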
(Though it is not relevant to the present work, it may be noted that the BBKS formula
cannot be taken seriously for small values of Q, for it has unphysical terms that are linear
in Q at Q → 0. Analyticity in the three-vector k requires that in this limit T (k) should be
a power series in k 2 , or equivalently in Q2 .)
To assess the effect of a non-zero baryon number, we note that, to first order in β =
ΩB/ΩM, the general formula (35) may be put in the form
$$T \to \frac{\ln\left[2.40\,Q\,(1 + \beta I_{22})\right]}{\left[4.07\,Q\,\big(1 + \beta(1 + I_{12})/2\big)\right]^2}\,. \tag{39}$$

We need values for the integrals Iij defined by Eq. (29). The upper limit on the integrals
(29) is aR = (1 + zEQ)/(1 + zR). The redshift zR at recombination has only a very weak
dependence on cosmological parameters, and will be taken here to have the fixed value
zR = 1100. The redshift zEQ at matter-radiation equality is given by 1 + zEQ = ΩM/ΩR, so
taking ΩR h² = 4.15 × 10⁻⁵, we have aR = 21.9 ΩM h². The integral I12 is given by
$$I_{12}(a_R) = \frac{3}{20}\left[-22a_R - 18a_R^2 + 4\ln\left(\frac{a_R}{4}\right) + \sqrt{1+a_R}\left(4 + 8a_R + 9a_R^2\right)\ln\left(\frac{\sqrt{1+a_R}+1}{\sqrt{1+a_R}-1}\right)\right].$$

The integral I22 is given by a lengthy expression involving Spence functions, but it converges
so rapidly for likely values of aR that for practical purposes we can use the value for aR
infinite:
I22 (∞) = 2π 2 /5 − 3 = 0.947842 .
For instance, for aR = 4.38 (corresponding to ΩM h2 = 0.2), we have I22 = 0.9470.
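
The values of I12 and I22 quoted here can be reproduced by direct quadrature of Eq. (29). The following sketch uses a midpoint rule after the substitution a = u³, which keeps the integrand finite at the lower limit where f2 diverges logarithmically. The substitution, the number of points and the function names are choices made for this illustration only.

```python
import math

def log_ratio(a):
    """ln[(sqrt(1+a)+1)/(sqrt(1+a)-1)], rewritten as ln[(sqrt(1+a)+1)^2 / a]."""
    s = math.sqrt(1.0 + a)
    return math.log((s + 1.0) ** 2 / a)

def f1(a):
    return 1.0 + 1.5 * a

def f2(a):
    return (1.0 + 1.5 * a) * log_ratio(a) - 3.0 * math.sqrt(1.0 + a)

def I_quad(fi, fj, a_R, n=20000):
    """Eq. (29) by the midpoint rule in u, where a = u**3 and da = 3 u^2 du."""
    u_max = a_R ** (1.0 / 3.0)
    h = u_max / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        a = u**3
        total += fi(a) * fj(a) / math.sqrt(1.0 + a) * 3.0 * u * u
    return 1.5 * total * h

def I12_closed(a_R):
    """Closed form for I12 quoted in the text."""
    s = math.sqrt(1.0 + a_R)
    return 0.15 * (-22.0 * a_R - 18.0 * a_R**2 + 4.0 * math.log(a_R / 4.0)
                   + s * (4.0 + 8.0 * a_R + 9.0 * a_R**2) * log_ratio(a_R))

if __name__ == "__main__":
    a_R = 4.38  # corresponds to Omega_M h^2 = 0.2
    print(I_quad(f1, f2, a_R), I12_closed(a_R))
    print(I_quad(f2, f2, a_R), 2.0 * math.pi**2 / 5.0 - 3.0)
```

For aR = 4.38 the quadrature and the closed form for I12 agree, and I22 is already within about 10⁻³ of its aR → ∞ value.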
Eq. (39) agrees very well for large k with the numerical results of Holtzman (1989). For
ΩM h² = 0.2 (the most plausible of the values considered by Holtzman) Eq. (39) becomes
$$T \to \frac{\ln\left[12.0\,k\,(1 + 0.947\,\beta)\right]}{\left[20.35\,k\,(1 + 1.377\,\beta)\right]^2}\,,$$

with k in Mpc−1 . Table 1 compares the results given by this formula with the numerical
results given by Holtzman (1989) for β ≡ ΩB /ΩM equal to 0.01 and 0.1, and for various values
of k. As can be seen, for these parameters the asymptotic formula (39) gives a pretty good
approximation to the numerically calculated results for k > 0.5 Mpc−1 , and the numerically
calculated results converge rapidly to Eq. (39) for larger values of k. But Holtzman warns
that his result should not be used for k > 3.09 Mpc−1 , while the results obtained from
Eq. (39) presumably become increasingly accurate for larger values of k.
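
The entries of Table 1 attributed to Eq. (39) can be regenerated as follows. This is a sketch under stated assumptions: I22 is replaced by its aR → ∞ value 2π²/5 − 3, I12 is taken from the closed form given above, zR = 1100 and ΩR h² = 4.15 × 10⁻⁵ are the values used in the text, and the function names are illustrative.

```python
import math

def I12_closed(a_R):
    """Closed form for I12 quoted in the text."""
    s = math.sqrt(1.0 + a_R)
    log_ratio = math.log((s + 1.0) ** 2 / a_R)   # = ln[(s+1)/(s-1)]
    return 0.15 * (-22.0 * a_R - 18.0 * a_R**2 + 4.0 * math.log(a_R / 4.0)
                   + s * (4.0 + 8.0 * a_R + 9.0 * a_R**2) * log_ratio)

I22_INFINITY = 2.0 * math.pi**2 / 5.0 - 3.0      # assumption: a_R -> infinity limit

def transfer_eq39(k, beta, omega_m_h2=0.2, z_R=1100.0, omega_r_h2=4.15e-5):
    """Asymptotic transfer function of Eq. (39); k in Mpc^-1, beta = Omega_B/Omega_M."""
    Q = k / omega_m_h2
    a_R = (omega_m_h2 / omega_r_h2) / (1.0 + z_R)
    i12 = I12_closed(a_R)
    numerator = math.log(2.40 * Q * (1.0 + beta * I22_INFINITY))
    denominator = (4.07 * Q * (1.0 + beta * (1.0 + i12) / 2.0)) ** 2
    return numerator / denominator

if __name__ == "__main__":
    for k in (0.1, 0.3, 0.5, 1.0, 2.0, 3.0):
        print(k, transfer_eq39(k, 0.01), transfer_eq39(k, 0.1))
```

Comparing the printed values with the Holtzman (1989) column of Table 1 shows the same trend described in the text: the asymptotic formula is poor at k = 0.1 Mpc⁻¹ and improves steadily toward larger k.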
We see that for any plausible value of ΩM h2 , the effect of a small baryon density on
the argument of the logarithm is to replace the parameter ΩM h2 in the definition of Q with
ΩM h2 (1 − .95β), while for ΩM h2 in the range of 0.12 to 0.2, the effect of a small baryon
density on the denominator of T (k) is to replace the parameter ΩM h2 in the definition of
Q with ΩM h²(1 − ζβ), with ζ ≡ (1 + I12)/2 in the range of 1.24 to 1.38. These results
throw some light on a series of attempts to correct the transfer function for the effects of
baryon density by re-scaling the definition of Q (usually called q) in the BBKS formula
(38). Various authors attempted to correct for the baryon density by replacing ΩM h² in
the denominator of Q with a factor ΩM h² exp(−2ΩB) (Peacock & Dodds 1994), or with
ΩM h² exp(−ΩB − √(2h) ΩB/ΩM) (Sugiyama 1995) or with ΩM h² exp(−ΩB − ΩB/ΩM) (Liddle
et al. 1996). (For a more detailed study of the effects of a finite baryon-to-dark matter ratio
on the transfer function, see Eisenstein & Hu (1998).) Of course there is no reason why
the baryon density should enter only in the definition of Q, and Eq. (39) shows that it does

not. But even without any detailed calculations, it is evident that these correction factors
are physically impossible. The transfer function is defined with no reference to the present
moment, except that it is conventionally written as a function of the present wave number k.
It depends on ΩM h2 and ΩR h2 , which enter in the formulas for k/q and ρEQ , and it can (and
does) have an additional dependence on the constant ratio of the energy densities of baryons
and all matter, which is equal to ΩB /ΩM , but there is no way that it can depend separately
on ΩB or ΩM or h. What we have found here is that for large wave number, the effect of a
small baryon density can be crudely taken into account by replacing the parameter ΩM h2 in
the definition of Q with ΩM h2 (1 − ζΩB /ΩM ) ≃ ΩM h2 exp(−ζΩB /ΩM ), with ζ roughly equal
to unity for likely values of ΩM h2 .

7. Microwave Background Anisotropies

The fractional temperature fluctuation in a direction n̂ takes the general form (apart
from late-time effects):
$$\frac{\Delta T(\hat n, z)}{T} = \int p(z)\,dz\int d^3k\;e^{i(1+z)d_A(z)\,\hat n\cdot\mathbf{k}}\;\epsilon_{\mathbf{k}}\left[F(k,z) + i(\hat n\cdot\hat k)\,G(k,z)\right], \tag{40}$$
where p(z) dz is the probability that last scattering will occur between redshifts z and z + dz;
dA (z) is the angular diameter distance to redshift z; and ǫk is a primordial fluctuation
amplitude, defined as proportional to N(k), with a coefficient to be chosen below. In the
synchronous gauge and hydrodynamic approximation used here, and now making the further
approximation that dark matter dominates the energy density at last scattering, the form
factors F and G in Eq. (40) are given by (Weinberg 2001):
$$\epsilon F = \frac{1}{3}\,\phi + \frac{1}{3}\,\delta_R\,, \tag{41}$$
$$\epsilon G = -a q U_R + q t \phi/a\,, \tag{42}$$
where φ = −4πGρR δR a²/q² is the Newtonian potential produced by dark matter density
fluctuations. The first and second terms in F arise from the Sachs–Wolfe effect and intrinsic
temperature fluctuations, respectively. The form factor G arises from the Doppler effect, with
its first and second terms contributed by velocities produced by pressure and gravitational
forces, respectively.
The conventional multipole coefficient Cℓ is given in general by the familiar formula
$$C_\ell = 16\pi^2\int_0^\infty \mathcal{P}(k)\,k^2\,dk\,\left|\int dz\,p(z)\left[j_\ell\big(kr(z)/H_0\big)F(k,z) + j_\ell'\big(kr(z)/H_0\big)G(k,z)\right]\right|^2, \tag{43}$$

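where P(k) is the power spectral function, defined by
$$\langle\epsilon_{\mathbf{k}}\,\epsilon_{\mathbf{k}'}\rangle = \mathcal{P}(k)\,\delta^3(\mathbf{k} + \mathbf{k}')\,, \tag{44}$$
and
$$r(z) \equiv (1+z)\,d_A(z)\,H_0 = \int_{1/(1+z)}^{1}\frac{dx}{\sqrt{\Omega_\Lambda x^4 + \Omega_M x}}\,. \tag{45}$$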
To fix the normalization of εk, we note that for small values of ℓ (say, ℓ < 10) the large value
of dA makes the spherical Bessel functions in Eq. (43) oscillate rapidly except for small values
of k, so Cℓ is dominated for small ℓ by the Sachs–Wolfe term in F(k), for which Eqs. (34)
and (41) give the z-independent small-k behavior
$$\epsilon F(k,z) \to -\frac{4\pi G\rho_R\,a^3\,N(q)}{5q^2\sqrt{6\pi G\rho_{EQ}}}\,. \tag{46}$$
We therefore define εk by
$$\epsilon_{\mathbf{k}} = -\frac{4\pi G\rho_R\,a^3}{5q^2\sqrt{6\pi G\rho_{EQ}}}\,N(q)\,, \tag{47}$$
so that F(0, z) = 1. With this normalization, a Harrison–Zel’dovich power spectral function
P(k) = Bk⁻³ gives Cℓ = 8π²B/ℓ(ℓ + 1) for small ℓ.
For large ℓ, the integral over k in Eq. (43) is dominated by large wave numbers. In
this case, the Sachs–Wolfe term in Eq. (41) receives a contribution of order 1/k 4 from the
slow mode part (21) of δD and of order 1/k 6 from the fast mode part (17). The intrinsic
fluctuation term in Eq. (41) receives a contribution of order 1/k 2 from the fast mode term
(15) in δR , and of order 1/k 4 from the slow mode term (23). The slow mode parts of the two
terms in the Doppler form factor (42) cancel, leaving the contribution of the fast mode term
(18) in UR , which is of order 1/k 3 . We conclude from this that in the absence of dissipative
effects, the temperature fluctuation is dominated for large k by the fast-mode part of the
intrinsic temperature fluctuation.
But for very large k the rapidly oscillating fast mode is killed by Silk damping (i.e.,
photon viscosity and heat conduction) and Landau damping (cancelations due to large
changes in the phase of the fast modes over the range of redshifts at which last scattering
may occur). As pointed out by Hu and Sugiyama (1996), for ℓ greater than about 4,000
the dominant contribution to Cℓ arises from the non-oscillatory terms in the perturbations.
These terms, which are contributed by both the Sachs–Wolfe effect and the intrinsic
temperature fluctuations, can be taken from Eqs. (83) and (84) of Weinberg (2001), with the
damped terms neglected and an extra factor T(k) supplied, because here we are dealing with
wavelengths that come into the horizon during the radiation dominated era. This gives
$$F(k,z) \to -3R(z)\,T(k)\,, \qquad G(k,z) \to 0\,, \tag{48}$$

so Eq. (43) becomes
$$C_\ell = 144\pi^2\int_0^\infty \mathcal{P}(k)\,T^2(k)\,k^2\,dk\,\left|\int dz\,p(z)\,R(z)\,j_\ell\big(kr(z)/H_0\big)\right|^2. \tag{49}$$

To do the double integral over z and k, we use an approximation of Hu and White (1996).
The last-scattering probability distribution p(z) is sharply peaked around a mean value zL ≃
1,100, while for sufficiently large ℓ the spherical Bessel function is even more sharply peaked
at a value ℓ + 1/2 of its argument. We therefore set z at a value where kr(z)/H0 = ℓ + 1/2
everywhere but in the argument of jℓ , and integrate over the argument of jℓ with k fixed,
after which we set k = (ℓ + 1/2)H0/r(zL ) everywhere but in the argument of p(z), and
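integrate over that argument:
$$C_\ell \to 144\pi^2 H_0^3\;\mathcal{P}\!\left(\frac{(\ell+1/2)H_0}{r(z_L)}\right)T^2\!\left(\frac{(\ell+1/2)H_0}{r(z_L)}\right)R^2(z_L)\,\frac{\ell+1/2}{r^2(z_L)\,r'(z_L)}\times\int p^2(z)\,dz\,\left|\int_0^\infty j_\ell(s)\,ds\right|^2$$
$$\to \frac{36\pi^{5/2}H_0^3\,(1+z_L)^{3/2}\,\Omega_M^{1/2}}{r^2(z_L)\,\sigma}\;\mathcal{P}\!\left(\frac{\ell H_0}{r(z_L)}\right)T^2\!\left(\frac{\ell H_0}{r(z_L)}\right)R^2(z_L)\,, \tag{50}$$
where σ is defined by
$$\int p^2(z)\,dz \equiv \frac{1}{2\sqrt{\pi}\,\sigma}\,, \tag{51}$$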
so that σ is the standard deviation if p(z) is Gaussian. For instance, for a straight spectrum
with P(k) ∝ k^(−2−ns), Eq. (50) gives Cℓ ∝ ℓ^(−6−ns) ln² ℓ. Unfortunately the interposition of
foreground objects makes it unlikely that this can be measured.
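
The step between the two lines of Eq. (50) can be illustrated numerically. The sketch below relies on two ingredients that are not spelled out in the text and are assumptions of this illustration: the standard closed form ∫₀^∞ jℓ(s) ds = (√π/2) Γ((ℓ+1)/2)/Γ(ℓ/2+1), and the large-redshift form r′(zL) ≃ (1 + zL)^(−3/2) ΩM^(−1/2) obtained from Eq. (45) when the ΩΛ term is negligible at last scattering. It checks that the ℓ-dependent numerical factor in the first line reduces to 36π^(5/2) for large ℓ.

```python
import math

def bessel_integral(ell):
    """Integral of j_ell(s) from 0 to infinity: (sqrt(pi)/2) Gamma((ell+1)/2)/Gamma(ell/2+1)."""
    log_ratio = math.lgamma((ell + 1) / 2.0) - math.lgamma(ell / 2.0 + 1.0)
    return 0.5 * math.sqrt(math.pi) * math.exp(log_ratio)

def prefactor_ratio(ell):
    """Numerical factor of the first line of Eq. (50) divided by 36 pi^(5/2).

    The common factors H0^3 P T^2 R^2 / (r^2 r' sigma) are omitted; the
    integral of p^2 contributes 1/(2 sqrt(pi)) once sigma is factored out (Eq. 51).
    """
    first_line = (144.0 * math.pi**2 * (ell + 0.5)
                  * bessel_integral(ell) ** 2 / (2.0 * math.sqrt(math.pi)))
    second_line = 36.0 * math.pi**2.5
    return first_line / second_line

if __name__ == "__main__":
    print(bessel_integral(0), bessel_integral(1))
    for ell in (10, 100, 4000):
        print(ell, prefactor_ratio(ell))
```

Because the factor of ℓ + 1/2 cancels against the square of the Bessel integral, the only remaining ℓ dependence in Eq. (50) comes from the power spectral function and the transfer function.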
I am grateful for valuable discussions with S. Bashinsky, E. Bertschinger, R. Bond, W.
Hu, A. R. Liddle, D. H. Lyth, H. Martel, and P. Shapiro, and for help with integrals by M.
Trott. This article is based on work supported by the National Science Foundation under
Grant No. 0071512, and also supported by The Robert A. Welch Foundation.

REFERENCES

Bardeen, J. M., Bond, J. R., Kaiser, N., & Szalay, A. S. 1986, ApJ 304, 15
Bashinsky, S. & Bertschinger, E. 2002, Phys. Rev. D 65, 123008
Eisenstein, D. J. & Hu, W. 1998, ApJ 496, 605
Groth, E. J. & Peebles, P. J. E. 1975, A&A 41, 143
Holtzman, J. A. 1989, ApJS 71, 1
Hu, W. & Sugiyama, N. 1996, ApJ 471, 542
Hu, W. & White, M. 1996, A&A 315, 33
Liddle, A. R. & Lyth, D. H. 2000, Cosmological Inflation and Large Scale
Structure (Cambridge, UK: Cambridge University Press)
Liddle, A. R., Lyth, D. H., Viana, P. T. & White, M. 1996, MNRAS 282, 281
Mészáros, P. 1974, A&A 37, 225
Peacock, J. A. & Dodds, S. J. 1994, MNRAS 267, 1020
Peebles, P. J. E. & Yu, J. T. 1970, ApJ 162, 815
Sugiyama, N. 1995, ApJS 100, 281
Weinberg, S. 1972, Gravitation and Cosmology (New York: Wiley)
Weinberg, S. 2001, Phys. Rev. D 64, 123511


Table 1. Values of the transfer function T(k) for ΩM h² = 0.2 and ΩB/ΩM = 0.01 or 0.1.

k (Mpc⁻¹)   ΩB/ΩM = 0.01 (a)   ΩB/ΩM = 0.01 (b)   ΩB/ΩM = 0.1 (a)   ΩB/ΩM = 0.1 (b)
0.1         0.161              0.0451             0.138             0.0509
0.3         0.0398             0.0337             0.0328            0.0284
0.5         0.0189             0.0169             0.0154            0.0140
1           0.00640            0.00586            0.00517           0.00480
2           0.00202            0.00187            0.00162           0.00152
3           0.000997           0.000938           0.000797          0.000762

(a) From Holtzman (1989)
(b) From Eq. (39)
UTTG-12-02

Adiabatic Modes in Cosmology


arXiv:astro-ph/0302326v1 17 Feb 2003

Steven Weinberg¹
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

We show that the field equations for cosmological perturbations in Newtonian
gauge always have an adiabatic solution, for which a quantity R is non-zero
and constant in all eras in the limit of large wavelength, so that it can be
used to connect observed cosmological fluctuations in this mode with those at
very early times. There is also a second adiabatic mode, for which R vanishes
for large wavelength, and in general there may be non-adiabatic modes as
well. These conclusions apply in all eras and whatever the constituents of
the universe, under only a mild technical assumption about the wavelength
dependence of the field equations for large wavelength. In the absence of
anisotropic inertia, the perturbations in the adiabatic modes are given for
large wavelength by universal formulas in terms of the Robertson–Walker
scale factor. We discuss an apparent discrepancy between these results and
what appears to be a conservation law in all modes found for large wavelength
in synchronous gauge: it turns out that, although equivalent, synchronous
and Newtonian gauges suggest inequivalent assumptions about the behavior
of the perturbations for large wavelength.

¹ Electronic address: weinberg@physics.utexas.edu
I. INTRODUCTION

If observations are to be used to tell us something about inflation, then
we need some way of connecting the properties of the cosmological fluctuations
produced during inflation to the properties of fluctuations much closer
to the present. Inconveniently, in inflationary cosmologies the era of inflation
was followed by a period when the energy in scalar fields was converted into
matter and radiation, and about this process we know essentially nothing.
Subsequently there may have been other periods about which we are equally
ignorant, such as the often-hypothesized era with temperatures between 10^13
GeV and 10^11 GeV, when supersymmetry may have become broken by unknown
strong forces. These mysterious eras occurred when fluctuations of
cosmological interest were far outside the horizon, but this does not rule
out some effect on the strength or even the wave-length dependence of these
fluctuations.² Therefore, in relating the cosmological fluctuations produced
during inflation with those observed in the cosmic microwave background or
in large-scale cosmic structures, it is essential to employ some sort of conser-
vation law that is valid at large wavelengths whatever the details of cosmic
evolution.
In much work on fluctuations in cosmology, the conserved quantity is
taken to be a quantity R related to the spatial curvature on co-moving spatial
surfaces[1], given in Newtonian gauge by³

$$ R = -\Psi + H\,\delta u . \tag{1} $$

The rate of change of R is given by a general formula[2]:

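$$
\dot R = X + \left(\frac{q^2}{a^2}\right)
\left[\left(\frac{\ddot H + 3H\dot H}{3\dot H^2}\right)\Psi
- \left(\frac{H}{\dot H}\right) 4\pi G\,\delta\sigma\right] . \tag{2}
$$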

² By a mode being “beyond the horizon” we only mean that the physical wave number
is much less than the expansion rate. This does not necessarily have anything to do with
causality; indeed, the point of inflation is to make the true particle horizon radius much
larger than the inverse expansion rate.
³ Here H = ȧ/a is the expansion rate, with dots denoting ordinary time derivatives. In
Newtonian gauge the perturbations to the gravitational field are taken to be δg_00 = −2Φ
and δg_ij = −2a²Ψδ_ij. Also δρ, δp, and δu are the perturbations to the total energy density,
pressure, and velocity potential in Newtonian gauge, while we use a bar to denote
unperturbed quantities like the unperturbed energy density ρ̄ and pressure p̄. For simplicity we
assume a vanishing unperturbed spatial curvature.

Here q is the co-moving wave number, δσ is a measure of the anisotropic
stress,⁴ and
$$ X \equiv \frac{\dot{\bar\rho}\,\delta p - \dot{\bar p}\,\delta\rho}{3(\bar\rho+\bar p)^2} . \tag{3} $$
Thus R is conserved in the limit of small wave number if and only if X = 0
in this limit.
The limit of small q is of some interest in itself, but its importance lies
chiefly in the circumstance that those factors of q that arise from the field
equations (rather than from the initial conditions) are always accompanied
by factors of 1/a(t), because it is only q/a(t) that is independent of the
units chosen for the co-moving spatial coordinates x^i. It is usually a good
guess that terms in the perturbations proportional to such factors of q will be
negligible if q/a(t)H(t) ≪ 1. Hence, although here we will study the behavior
of the perturbations in the limit of small q, it is expected that this provides
insight into the behavior of perturbations as a(t) increases. In particular, we
expect that if X → 0 for q → 0, and if the coefficient of q² in the second
term in Eq. (2) remains finite in this limit, then at any given epoch Ṙ will
be small if q/aH is sufficiently small. In any mode for which R is non-zero
in this limit, the fractional rate of change of R will then also be small in this
limit.
Now, the quantity X vanishes in all modes and for all q when the per-
turbed pressure p̄ + δp is a function only of the perturbed energy density
ρ̄ + δρ, as is the case in a universe dominated either by pure radiation or by
pure cold matter, but not when both radiation and cold matter are impor-
tant, and also not during inflation or in the curvaton model[4]. The quantity
X does vanish for all modes in the limit q → 0 in the case of inflation with
a single scalar field, but this is not true with several scalar fields. Section
II of this paper will show that in general, whatever the contents of the uni-
verse, with only a mild technical assumption about the dependence on wave
number of the field equations for cosmological perturbations in Newtonian
gauge, these equations always have a physical solution for which X → 0 and
R approaches a non-zero constant in the limit q → 0, though there may also
be other modes for which R is not constant. In fact, there are always at least
two solutions with X = 0 and R constant in the limit q → 0: one solution
for which R ≠ 0, and another with R = 0. These solutions will be illustrated
in Section III for the case of inflation with any number of interacting scalar
fields. The existence of such solutions is well known in special cases, but I
do not know of a previous general proof of their existence.

⁴ The quantity δσ is defined by writing T_ij for scalar perturbations as g_ij p + ∂_i ∂_j δσ.
In this formalism, p includes effects of bulk viscosity, while iqδu is the velocity of energy
transport, and so includes effects of heat conduction. For this formalism, see ref. [3].

Solutions of this sort are usually called adiabatic, even in contexts where
thermodynamics has no relevance.⁵ As we will also see in Section II, in
theories in which the energy-momentum tensor is the sum of a number of
tensors T_f^µν for a set of fluids labeled f, we have the stronger result that for
q = 0 in the adiabatic modes, the perturbations in each fluid satisfy
$$
\frac{\delta\rho_f}{\dot{\bar\rho}_f} = \frac{\delta p_f}{\dot{\bar p}_f}
= \frac{\delta\rho}{\dot{\bar\rho}} = \frac{\delta p}{\dot{\bar p}} . \tag{4}
$$

In the special case where the unperturbed energy-momentum tensors are
separately conserved, we also have ρ̄˙_f = −3H(ρ̄_f + p̄_f), in which case Eq. (4)
implies that the ratios δρ_f/(ρ̄_f + p̄_f) are equal:

$$ \frac{\delta\rho_f}{\bar\rho_f + \bar p_f} = \frac{\delta\rho}{\bar\rho + \bar p} , \tag{5} $$

which is often taken to be what is meant by an adiabatic perturbation.
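
As a concrete illustration of Eq. (5) (a standard textbook case, not worked out in this paper), consider photons with p̄_γ = ρ̄_γ/3 and cold matter with p̄_m = 0, and assume, as in the special case above, that the two unperturbed components are separately conserved. The fluid labels γ and m are introduced here only for this example.

```latex
\frac{\delta\rho_\gamma}{\bar\rho_\gamma + \bar p_\gamma}
  = \frac{\delta\rho_m}{\bar\rho_m + \bar p_m}
\quad\Longrightarrow\quad
\frac{3}{4}\,\frac{\delta\rho_\gamma}{\bar\rho_\gamma}
  = \frac{\delta\rho_m}{\bar\rho_m} .
```

Under these assumptions the fractional matter density perturbation in an adiabatic mode is 3/4 of the fractional radiation density perturbation.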


Things appear very different in synchronous gauge. As shown in Section
IV, when we take the limit of vanishing wave number in the field equations of
synchronous gauge, we find that these equations respect the conservation of a
quantity A in all modes whatever the contents of the universe, provided only
that none of the perturbations in synchronous gauge blow up in the limit q →
0. At first sight, this presents an apparent paradox. All gauges are equivalent,
so how can there be a quantity that is conserved for zero wave number in
all modes during all eras in synchronous gauge and no equally universal
conservation law for zero wave number in Newtonian gauge? We will find the
answer to be that when we speak of the limit q = 0 we mean different things
in different gauges. Though mathematically equivalent, synchronous gauge
and Newtonian gauge suggest different hypotheses about how perturbations
behave in this limit, leading to different conditions for the validity of the
conservation law for q = 0.

⁵ Sometimes the non-adiabatic solutions of the field equations are called isocurvature
perturbations. This is a misnomer, because even for q = 0 it is only possible for R to
have the constant value zero if X = 0. Nevertheless, as we will see in Section IV, in
synchronous gauge there is a sense in which the solutions that do not correspond to the
adiabatic solutions of Newtonian gauge can indeed be regarded as isocurvature modes.

In some work on cosmological fluctuations, a quantity ζ, related to the
spatial curvature on spacelike surfaces of constant energy density, is used in
place of R. It is defined in Newtonian gauge by[5]

$$ \zeta \equiv -\Psi - \frac{H\,\delta\rho}{\dot{\bar\rho}} = -\Psi + \frac{\delta\rho}{3(\bar\rho+\bar p)} . \tag{6} $$

By taking a suitable linear combination of the time-time and time-space
components of the Einstein field equation and the part of the space-space
component proportional to δ_ij, one can derive a general constraint[6],

$$ a^3\,\delta\rho - 3Ha^3(\bar\rho+\bar p)\,\delta u + \left(\frac{q^2 a}{4\pi G}\right)\Psi = 0 , \tag{7} $$

so that
$$ R - \zeta = \left(\frac{q^2}{3a^2\dot H}\right)\Psi \tag{8} $$
and therefore in all modes ζ → R in the limit q → 0.

II. ADIABATIC MODES IN NEWTONIAN GAUGE

We consider a general cosmological model, based on the Einstein field
equations, supplemented with whatever other equations are needed to give
the energy density, pressure, and velocity potential perturbations in terms of
independent dynamical variables, and the field equations satisfied by those
variables. We will demonstrate the general existence of a pair of adiabatic
solutions of these field equations in Newtonian gauge: one with R ≠ 0 and
constant in the limit q → 0, and the other with R = 0 in this limit. In order
to draw these conclusions without specifying the formulas for δρ, δp, and δu,
we use a trick, based on the fact that although there is no remaining gauge
symmetry in Newtonian gauge for q ≠ 0, for q = 0 there is a remnant gauge
symmetry of the field equations in Newtonian gauge, which makes it easy to
find exact general solutions of the field equations for q = 0. Not all of these
solutions are physical. To be physical, such a solution must be the limit as
q → 0 of a solution of the field equations for q ≠ 0 in at least a neighborhood
of q = 0. For this to be the case, it is necessary for the q = 0 solution
to satisfy certain conditions imposed by the Einstein field equations, which
limit the physical solutions to a linear combination of just two independent
adiabatic modes. We will then make a mild technical assumption about
the dependence of the field equations on q, which guarantees that these two
solutions are the limits as q → 0 of physical solutions for q ≠ 0.
Whatever the constituents of the universe, for q = 0 the field equations
in Newtonian gauge for scalar (i.e., compressional) modes will always be
invariant under the gauge transformation induced by a redefinition of the
time coordinate
$$ t \to t + \epsilon(t) , \tag{9} $$
and a re-scaling of the space coordinate

$$ x^i \to x^i (1 - \lambda) , \tag{10} $$

with ε(t) an arbitrary infinitesimal function of time and λ an arbitrary
infinitesimal constant. For scalar modes, δg_i0 is proportional to q_i, and the
part of δg_ij not proportional to δ_ij is proportional to q_i q_j, so both
automatically vanish for q → 0, and we therefore do not need to impose any
conditions on ε(t) and λ to remain in Newtonian gauge for q = 0. Eqs. (9)
and (10) provide the most general spacetime transformations of purely scalar
perturbations that preserve the condition q = 0.
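Under this gauge transformation, the q = 0 perturbations undergo the
transformation
$$ \Psi \to \Psi + H\epsilon - \lambda , \qquad \Phi \to \Phi - \dot\epsilon , \tag{11} $$
$$ \delta\rho \to \delta\rho - \dot{\bar\rho}\,\epsilon , \qquad \delta p \to \delta p - \dot{\bar p}\,\epsilon . \tag{12} $$
Likewise, in the case of separate fluids with energy density and pressure ρ_f
and p_f or separate scalar fields ϕ_f,

$$ \delta\rho_f \to \delta\rho_f - \dot{\bar\rho}_f\,\epsilon , \qquad \delta p_f \to \delta p_f - \dot{\bar p}_f\,\epsilon , \tag{13} $$

or
$$ \delta\varphi_f \to \delta\varphi_f - \dot{\bar\varphi}_f\,\epsilon . \tag{14} $$
It follows that there is always a solution of the Newtonian gauge field
equations for q = 0, in which:

$$ \Psi = H\epsilon - \lambda , \qquad \Phi = -\dot\epsilon , \tag{15} $$
$$ \delta\rho = -\dot{\bar\rho}\,\epsilon , \qquad \delta p = -\dot{\bar p}\,\epsilon , \tag{16} $$
and for several fluids

$$ \delta\rho_f = -\dot{\bar\rho}_f\,\epsilon , \qquad \delta p_f = -\dot{\bar p}_f\,\epsilon , \tag{17} $$

or several scalar fields

$$ \delta\varphi_f = -\dot{\bar\varphi}_f\,\epsilon , \tag{18} $$
where ε(t) is an arbitrary function of time and λ is an arbitrary constant. (It
is not necessary for this that the energy-momentum tensors of the individual
fluids or scalar fields be separately conserved; all we need is that they are
tensors.)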
Of course, for general ε(t) and λ this is just a gauge mode. For it to
have any physical significance, it must satisfy certain conditions that allow
it to be extended to the case of non-zero wave number. In particular, the
part of the space-space component of the Einstein field equations that is not
proportional to δ_ij takes the form (with δσ the anisotropic stress):

$$ q_i q_j (\Phi - \Psi) = -8\pi G\, q_i q_j\, \delta\sigma , \tag{19} $$

so this equation disappears for q = 0. In order for the solution (15)–(18) of
the field equations to be extended to q ≠ 0, we must have

$$ \Phi = \Psi - 8\pi G\,\delta\sigma , \tag{20} $$

and therefore ε(t) and λ must satisfy the condition

$$ \dot\epsilon + H\epsilon = \lambda - 8\pi G\,\delta\sigma , \tag{21} $$

which can always be satisfied by a suitable choice of ε(t).


There is another equation that disappears for q = 0: for scalar modes the
space-time component of the Einstein field equations reads

$$ q_i\left(-2\dot\Psi - 2H\Phi\right) = 8\pi G(\bar\rho+\bar p)\, q_i\,\delta u = -2\dot H\, q_i\,\delta u \tag{22} $$

where δu is the velocity potential in Newtonian gauge, which does not appear
in the equations for q = 0. Hence in order for the solution we have found to
be extended to q ≠ 0, this solution must also have a velocity potential given
by
$$ \dot H\,\delta u = \dot\Psi + H\Phi , \tag{23} $$
or, using the result (15),
$$ \delta u = \epsilon . \tag{24} $$
This agrees with what would be found from the gauge transformation induced
by the coordinate transformation (9). Likewise, with several fluids,

$$ \delta u_f = \epsilon . \tag{25} $$

From Eqs. (16) and (24) it follows that δρ = 3H(ρ̄ + p̄)δu, so the constraint
equation (7) is automatically satisfied for q = 0 by this solution. By inserting
Eqs. (15), (16), and (24) in equations (1) and (6), we now find that for q = 0,
this solution has
$$ R = \zeta = \lambda , \tag{26} $$
so R and ζ are indeed constant and equal for this solution in the limit q → 0,
as was to be shown. They are also non-zero, as long as we take λ ≠ 0.
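
The substitution leading to Eq. (26) can be written out explicitly. The lines below use only Eqs. (1), (6), (15), (16), and (24), all evaluated at q = 0:

```latex
\begin{aligned}
R &= -\Psi + H\,\delta u
   = -(H\epsilon - \lambda) + H\epsilon
   = \lambda , \\
\zeta &= -\Psi - \frac{H\,\delta\rho}{\dot{\bar\rho}}
       = -(H\epsilon - \lambda) - \frac{H\,(-\dot{\bar\rho}\,\epsilon)}{\dot{\bar\rho}}
       = \lambda .
\end{aligned}
```

The function ε(t) drops out of both combinations, which is why the result holds for any ε(t) before the conditions of Eqs. (20) and (23) are imposed.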
Now we have to ask what additional conditions are needed to ensure that
this solution is the limit as q → 0 of a solution with q ≠ 0. In general, any
closed set of linear homogeneous ordinary differential equations for a finite
set of dependent variables can be put in the first-order form
$$ \dot y_n(t) + \sum_m C_{nm}(t)\, y_m(t) = 0 . \tag{27} $$

(If some of the original set of field equations involve derivatives of higher
than first order, we can still write the equations in the form (27) by including
some derivatives of the field variables among the y_m(t).) This has the general
solution
$$ y_n(t) = \sum_m \left[ T \exp\left( -\int_{t_0}^{t} C(t')\,dt' \right) \right]_{nm} y_m(t_0) \tag{28} $$

with t_0 arbitrary, and with T denoting a time-ordered product defined by a
power series expansion of the exponential, which for finite matrices is always
convergent. The initial conditions may be subject to constraints like Eq. (7),
which can be written
$$ \sum_n c_n\, y_n(t_0) = 0 . \tag{29} $$

(Eq. (7) is such a constraint, because the equations of energy and momentum
conservation and the gravitational field equation Ψ̇ + HΦ = Ḣδu imply that
the left-hand side of Eq. (7) is time-independent.)

Our “mild technical” assumption is that, as long as the Einstein equations
(19) and (22) are written instead in the stronger form (20) and (23),
the matrix elements C_nm(t) and the constraint coefficients c_n are continuous
functions of q in at least a neighborhood of q = 0. In this case the y_n(t_0)
that satisfy Eq. (29) and the matrix in Eq. (28) will also be continuous in
q, so that any solution of Eq. (27) for q = 0 can be extended to a solution
for q ≠ 0 in a neighborhood of q = 0 by using Eq. (28) and (29) with the
values of C(t) and c for q ≠ 0. The next section shows the validity of this
assumption in one illustrative example: inflation with several scalar fields
and an arbitrary potential. Because q generally enters the field equations
and constraint equations in such a simple manner, we expect that this
assumption will always be satisfied, and in any case it is easy to check for any
specific model. For q ≠ 0, there is no remaining gauge freedom in Newtonian
gauge (as there is for q = 0), so the adiabatic solution found in this way will
be the limit as q → 0 of a physical solution, not a mere gauge mode.
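
The time-ordered exponential in Eq. (28) can be approximated as an ordered product of short-step propagators, with later times standing to the left. The following is a minimal sketch of that idea, not a method taken from the paper: the 2×2 matrix `example_c` is a hypothetical choice, picked only because the resulting system has the known exact solutions y₁ = t and y₁ = t^(−p), and the step count is an arbitrary illustrative setting.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def step_propagator(c, dt):
    """exp(-C dt) truncated at second order: 1 - C dt + (C dt)^2 / 2."""
    n = len(c)
    c2 = matmul(c, c)
    return [[(1.0 if i == j else 0.0) - c[i][j] * dt + 0.5 * c2[i][j] * dt * dt
             for j in range(n)] for i in range(n)]


def time_ordered_exponential(c_of_t, t0, t, steps=4000):
    """Approximate T exp(-int_{t0}^{t} C dt') by an ordered product of short steps."""
    n = len(c_of_t(t0))
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    dt = (t - t0) / steps
    for k in range(steps):
        t_mid = t0 + (k + 0.5) * dt
        # Later times multiply from the left: this is the time ordering.
        result = matmul(step_propagator(c_of_t(t_mid), dt), result)
    return result


def apply(matrix, vector):
    """Matrix-vector product, giving y(t) from y(t0) as in Eq. (28)."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def example_c(t, p=2.0 / 3.0):
    """Hypothetical C(t) in Eq. (27): y1' = y2, y2' = (p/t^2) y1 - (p/t) y2."""
    return [[0.0, -1.0], [-p / t ** 2, p / t]]


propagator = time_ordered_exponential(example_c, 1.0, 2.0)
growing = apply(propagator, [1.0, 1.0])           # initial data of y1 = t
decaying = apply(propagator, [1.0, -2.0 / 3.0])   # initial data of y1 = t**(-p)
print(growing, decaying)
```

Because the entries of C(t) would enter such a product continuously, a propagator built this way varies continuously with any parameter (such as q) on which C depends continuously, which is the property the assumption above relies on.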
We expect the anisotropic stress coefficient δσ for a wide class of theories
to be some linear combination of δu, δρ, and δp, so that for the solution
(15)–(17), (24)–(25), δσ may be written as δσ = εΣ, with Σ(t) depending
only on unperturbed quantities. (For instance, a non-zero shear viscosity η
in an imperfect fluid gives δσ = −2η δu, so here Σ = −2η.) In all such cases,
Eq. (21) has the general solution
$$ \epsilon(t) = \frac{\lambda}{\alpha(t)} \int^{t} \alpha(t')\,dt' \tag{30} $$
where
$$ \alpha(t) \equiv a(t) \exp\left( 8\pi G \int^{t} \Sigma(t')\,dt' \right) , \tag{31} $$
and the lower limit on the integral in Eq. (30) is arbitrary.
There is also a second mode, corresponding to the possibility of shifting
the lower limit of the integral in Eq. (30), for which ε(t) goes as ε(t) ∝ 1/α(t).
Since shifting the lower bound on the integral in Eq. (30) has no effect on
the value (26) of R and ζ, this solution has R = ζ = 0.
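
One way to see that Eqs. (30) and (31) solve Eq. (21) with δσ = εΣ is to evaluate the residual numerically. The sketch below makes several assumptions that are not in the text: units with 8πG = 1, a matter-like scale factor a(t) = t^(2/3), a constant Σ (here the hypothetical value −0.1), and an arbitrary lower limit t_low = 1 for the integral.

```python
import math

EIGHT_PI_G = 1.0  # assumption: units in which 8*pi*G = 1


def scale_factor(t):
    """Illustrative choice a(t) = t^(2/3)."""
    return t ** (2.0 / 3.0)


def hubble(t):
    """H = a_dot / a for the scale factor above."""
    return 2.0 / (3.0 * t)


def alpha(t, sigma_const):
    """Eq. (31) for constant Sigma; the lower limit only rescales alpha."""
    return scale_factor(t) * math.exp(EIGHT_PI_G * sigma_const * t)


def epsilon(t, lam, sigma_const, t_low=1.0, steps=2000):
    """Eq. (30), with the integral done by Simpson's rule from t_low to t."""
    h = (t - t_low) / steps
    total = alpha(t_low, sigma_const) + alpha(t, sigma_const)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * alpha(t_low + i * h, sigma_const)
    return lam * (total * h / 3.0) / alpha(t, sigma_const)


def residual_eq21(t, lam, sigma_const, dt=1e-4):
    """eps_dot + H eps - lam + 8 pi G Sigma eps, which Eq. (21) requires to vanish."""
    eps_dot = (epsilon(t + dt, lam, sigma_const)
               - epsilon(t - dt, lam, sigma_const)) / (2.0 * dt)
    eps = epsilon(t, lam, sigma_const)
    return eps_dot + hubble(t) * eps - lam + EIGHT_PI_G * sigma_const * eps


print(residual_eq21(2.0, lam=1.0, sigma_const=-0.1))
```

Changing `t_low` adds a multiple of 1/α(t) to ε(t), which is the second mode described above; the residual is unaffected because 1/α(t) solves the homogeneous equation.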
In the special case of vanishing anisotropic stress we have δσ = 0, so here
Φ = Ψ, and α(t) is just the Robertson–Walker scale factor a(t). The general
solution of Eq. (21) is then
$$ \epsilon(t) = \frac{\lambda}{a(t)} \int^{t} a(t')\,dt' , \tag{32} $$
with an arbitrary lower limit. This eventually increases in absolute value as t
for Robertson–Walker scale factors that grow as any power of t, while in the
other mode ε(t) ∝ 1/a(t) decreases with time. Inserting the result (32) in
Eqs. (15)–(18) gives explicit results for the perturbations in the gravitational
field and various pressures and energy densities as functions of time.
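
For a power-law scale factor the integral in Eq. (32) is elementary, which gives a quick check of the formulas above. The sketch below is illustrative only: it assumes a(t) = t^p with p > −1, takes the lower limit of the integral to be 0, and sets δσ = 0; the function name and the returned dictionary keys are hypothetical.

```python
def adiabatic_mode(t, p, lam):
    """q = 0 adiabatic mode for a(t) = t**p, integral in Eq. (32) taken from 0."""
    hubble = p / t
    h_dot = -p / t ** 2
    eps = lam * t / (p + 1.0)      # Eq. (32): (lam / a) * integral of a
    eps_dot = lam / (p + 1.0)
    psi = hubble * eps - lam       # Eq. (15)
    psi_dot = 0.0                  # Psi is time-independent for a pure power law
    phi = -eps_dot                 # Eq. (15)
    delta_u = eps                  # Eq. (24)
    return {
        "Psi": psi,
        "Phi": phi,
        "R": -psi + hubble * delta_u,                                 # Eq. (1)
        "eq23_residual": h_dot * delta_u - (psi_dot + hubble * phi),  # Eq. (23)
    }


matter = adiabatic_mode(t=3.0, p=2.0 / 3.0, lam=1.0)     # a ~ t^(2/3)
radiation = adiabatic_mode(t=3.0, p=0.5, lam=1.0)        # a ~ t^(1/2)
print(matter, radiation)
```

In this special case Ψ = Φ = −λ/(p + 1) is constant, so under the stated assumptions the potential is −3λ/5 for p = 2/3 and −2λ/3 for p = 1/2, while R = λ in both.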
The results presented in this section can be interpreted in terms of what
Liddle and Lyth in ref. [1] call a “separate universe” picture, which in one
form or another has been used since the beginning of inflationary theory to
deal with cosmological fluctuations in the case of a single scalar field. For
instance, Bardeen, Steinhardt, and Turner in ref. [5] gave what they called a
‘heuristic argument’ that in this case any portion of the universe that is larger
than the horizon 1/H but smaller than the physical perturbation wavelength
a/q would have to look like a separate unperturbed universe, with ϕ + δϕ
following the unique evolutionary path of the scalar field, and with all of
these separate universes therefore the same except for a variation in the time
at which the scalar field satisfies some specific initial condition a few Hubble
times after horizon exit. As pointed out by Bardeen et al., it follows then
that δρ/ρ̄˙ = δp/p̄˙ , and hence X = 0, for q/a ≪ H.
There is a potential problem with this sort of argument, that there are
two fields involved, the inflaton and the gravitational field, so that different
separate universes might have different ratios of these fields. The argument
of Bardeen et al. was formulated in a gauge in which it is unnecessary to
consider fluctuations in the gravitational field, but it applies also to Newto-
nian gauge, because in this gauge the constraint (7) allows the gravitational
potential to be expressed in terms of fluctuations in the scalar field. But as
we have seen in this section, in Newtonian gauge it is necessary not only to
allow shifts in the time at which the scalar field reaches some given value
after horizon crossing, but also to re-scale the co-moving coordinates used
in each separate universe. In synchronous gauge there is no constraint like
Eq. (7) that allows us to express the gravitational field in terms of the scalar
field, and so, as we will see in Section IV, the solutions even for inflation with
a single scalar field do not satisfy X = 0 in the limit q → 0.
There is another potential problem, that the equation of motion of the
scalar field is a second-order differential equation, so that there are two in-
dependent solutions whose relative coefficients may vary from one separate
universe to another. Bardeen et al. and other authors avoid this problem by
assuming that the scalar field experiences a period of “slow roll” inflation,
in which the differential equation satisfied by the scalar field is of first order,
to a good approximation. We have not had to make this assumption, for a
reason already pointed out by Guth and Pi[7]: the Wronskian of these two
solutions decays rapidly after horizon crossing, so that it is as if there were
only one independent solution. (Guth and Pi considered the case of H con-
stant, but even with a time-dependent H the Wronskian still decays, though
not precisely exponentially.)
In any case, it has always been clear that such “separate universe” argu-
ments do not rule out non-adiabatic solutions in the case of several scalar
fields, where ratios of the scalar fields may vary from one “separate universe”
to another. The results of this section may be interpreted as the statement
that in this and all other cases it is always possible to find an adiabatic solu-
tion of the field equations in Newtonian gauge in which the separate universes
appear the same, except for a shift in the time coordinate and a re-scaling of
the co-moving space coordinates.

III. AN EXAMPLE: MULTIFIELD INFLATION

For illustration, and to confirm the reasoning of the theorem of the pre-
vious section in a case where X does not vanish for all modes, let us consider
the case of inflation with an arbitrary number of scalar fields ϕf , and with a
general potential V that may include interactions among the various scalars.
The energy-momentum tensor of the scalar fields has the perfect-fluid form,
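so here δσ = 0, and Φ = Ψ. The field equations in Newtonian gauge are

$$ \dot\Psi + H\Psi = 4\pi G \sum_f \dot{\bar\varphi}_f\,\delta\varphi_f , \tag{33} $$

$$
\delta\ddot\varphi_f + 3H\,\delta\dot\varphi_f
+ \sum_{f'} \frac{\partial^2 V(\bar\varphi)}{\partial\bar\varphi_f\,\partial\bar\varphi_{f'}}\,\delta\varphi_{f'}
+ \left(\frac{q^2}{a^2}\right)\delta\varphi_f
= -2\Psi\,\frac{\partial V(\bar\varphi)}{\partial\bar\varphi_f} + 4\dot\Psi\,\dot{\bar\varphi}_f , \tag{34}
$$
and the constraint (7) is here

$$
\left(\dot H + \frac{q^2}{a^2}\right)\Psi
= 4\pi G \sum_f \left( -\dot{\bar\varphi}_f\,\delta\dot\varphi_f + \ddot{\bar\varphi}_f\,\delta\varphi_f \right) . \tag{35}
$$

We can write Eqs. (33) and (34) in the form (27) by taking the y_n to run
over Ψ and all δϕ_f and δϕ̇_f, in which case the constraint (35) is of the form
(29). Here obviously C_nm(t) and c_n are continuous in q in a neighborhood
of q = 0; in fact, they are just linear functions of q². Hence any solution of
Eqs. (33)–(35) that we find for q = 0 can be extended to a solution for q ≠ 0.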
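Let us try for a solution for q = 0 in which all of the individual velocity
potentials −δϕ_f/ϕ̄˙_f are equal, so that

$$ \delta\varphi_f = -\dot{\bar\varphi}_f\,\delta u , \tag{36} $$

with the common value satisfying

$$ \delta\dot u = -\Psi . \tag{37} $$

Using the time-derivative of the unperturbed scalar field equation

$$ \ddot{\bar\varphi}_f + 3H\dot{\bar\varphi}_f + \frac{\partial V(\bar\varphi)}{\partial\bar\varphi_f} = 0 , \tag{38} $$
we can put Eq. (34) for q = 0 in the form

$$ \dot H\,\delta u + H\,\delta\dot u + \delta\ddot u = 0 . \tag{39} $$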

Also, the gravitational field equation (33) now reads Ψ̇ + HΨ = Ḣδu, which
Eq. (39) guarantees is automatically satisfied by the Ψ given by Eq. (37).
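
The step from Eq. (34) to Eq. (39) is compressed in the text. One way to fill it in, assuming only Eqs. (34) and (36)–(38) at q = 0, is sketched below; the shorthand V_f and V_{ff'} for the first and second derivatives of V is introduced here for brevity and is not the paper's notation.

```latex
% Shorthand (not in the original): V_f = \partial V/\partial\bar\varphi_f,
% V_{ff'} = \partial^2 V/\partial\bar\varphi_f\,\partial\bar\varphi_{f'}.
\begin{aligned}
&\text{From (36):}\quad
\delta\dot\varphi_f = -\ddot{\bar\varphi}_f\,\delta u - \dot{\bar\varphi}_f\,\delta\dot u ,\qquad
\delta\ddot\varphi_f = -\dddot{\bar\varphi}_f\,\delta u
   - 2\ddot{\bar\varphi}_f\,\delta\dot u - \dot{\bar\varphi}_f\,\delta\ddot u . \\
&\text{Time derivative of (38):}\quad
\dddot{\bar\varphi}_f + 3H\ddot{\bar\varphi}_f
   + \sum_{f'} V_{ff'}\,\dot{\bar\varphi}_{f'} = -3\dot H\,\dot{\bar\varphi}_f . \\
&\text{Left side of (34) at } q = 0:\quad
3\dot H\,\dot{\bar\varphi}_f\,\delta u - 2\ddot{\bar\varphi}_f\,\delta\dot u
   - 3H\dot{\bar\varphi}_f\,\delta\dot u - \dot{\bar\varphi}_f\,\delta\ddot u . \\
&\text{Right side of (34), using (37) and (38):}\quad
-2\Psi V_f + 4\dot\Psi\,\dot{\bar\varphi}_f
   = -2\ddot{\bar\varphi}_f\,\delta\dot u - 6H\dot{\bar\varphi}_f\,\delta\dot u
     - 4\dot{\bar\varphi}_f\,\delta\ddot u . \\
&\text{Difference:}\quad
3\dot{\bar\varphi}_f \left( \dot H\,\delta u + H\,\delta\dot u + \delta\ddot u \right) = 0 .
\end{aligned}
```

Since the bracket does not depend on f, the same condition (39) results for every field with ϕ̄˙_f ≠ 0, which is what allows a common δu.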
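The general solution is
$$ \delta u = \frac{\lambda}{a} \int a\,dt , \qquad \Psi = H\,\delta u - \lambda , \tag{40} $$
(with λ an arbitrary constant), just as we found above in Eqs. (15), (24),
and (32). The perturbations to the energy density and pressure of the f-th
field here are
$$
\delta\rho_f = -\Psi\,\dot{\bar\varphi}_f^2 + \dot{\bar\varphi}_f\,\delta\dot\varphi_f
+ \frac{\partial V}{\partial\bar\varphi_f}\,\delta\varphi_f
= -\left(\Psi + \delta\dot u\right)\dot{\bar\varphi}_f^2 - \dot{\bar\rho}_f\,\delta u
= -\dot{\bar\rho}_f\,\delta u , \tag{41}
$$
and
$$
\delta p_f = -\Psi\,\dot{\bar\varphi}_f^2 + \dot{\bar\varphi}_f\,\delta\dot\varphi_f
- \frac{\partial V}{\partial\bar\varphi_f}\,\delta\varphi_f
= -\left(\Psi + \delta\dot u\right)\dot{\bar\varphi}_f^2 - \dot{\bar p}_f\,\delta u
= -\dot{\bar p}_f\,\delta u , \tag{42}
$$
so this mode is adiabatic, in the sense that X → 0 for q → 0. Inserting
Eq. (40) in Eq. (1) gives again R = λ.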

Once again, because of the freedom to shift the lower limit of the integral
in Eq. (40), there are two adiabatic modes here, the second with δu ∝ 1/a
and R = 0.
For a single scalar field, Eqs. (33) and (34) are a third-order set of dif-
ferential equations, and therefore have a third independent solution. The
third solution can also be found explicitly, and turns out to have Ṙ ∝ 1/a³Ḣ
for q = 0, so this solution is not adiabatic. However, this third solution is
eliminated by the constraint Eq. (35), which as we have seen in the previ-
ous section is automatically satisfied by any adiabatic solution, but is not
satisfied by the non-adiabatic solution of Eqs. (33) and (34). For N scalar
fields Eqs. (33) and (34) have 2N + 1 independent solutions, of which two are
adiabatic, and one is eliminated by Eq. (35), leaving 2N − 2 non-adiabatic
solutions.

IV. SYNCHRONOUS GAUGE

We now turn to synchronous gauge. With zero unperturbed spatial curvature,
the perturbed metric has components

$$ g_{ij}(\mathbf{x},t) = a^2(t)\,\delta_{ij} + h_{ij}(\mathbf{x},t) , \qquad g_{00} = -1 , \qquad g_{i0} = 0 , \tag{43} $$

with h_ij a small perturbation. We now assume for simplicity that the
perturbed energy-momentum tensor takes the perfect-fluid form

$$ T_{\mu\nu} = p\, g_{\mu\nu} + (p + \rho)\, u_\mu u_\nu . \tag{44} $$

The unperturbed quantities p̄ and ρ̄ depend only on time, and the unperturbed
velocity four-vector has components ū^0 = 1, ū^i = 0. The normalization
condition u^µ u_µ = −1 then requires that the velocity perturbation δu^µ(S)
is purely spatial. (A superscript (S) is used to denote perturbed quantities
in synchronous gauge.) We consider only compressional modes, for which
δu_i^(S) = ∂δu^(S)/∂x^i. Then the relevant field equations for a Fourier component
with wave number q are[8]
$$ \frac{d}{dt}\left( a^2 \psi \right) = -4\pi G a^2 \left( \delta\rho^{(S)} + 3\,\delta p^{(S)} \right) , \tag{45} $$
and
$$ \delta\dot\rho^{(S)} + 3H\left( \delta\rho^{(S)} + \delta p^{(S)} \right) = -\left( \bar\rho + \bar p \right)\left( \psi - a^{-2} q^2\, \delta u^{(S)} \right) . \tag{46} $$

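Here ψ is a field employed in recent work using synchronous gauge[9]
$$ \psi \equiv \frac{d}{dt}\left( \frac{h_{ii}}{2a^2} \right) . \tag{47} $$

There is also an Euler equation that will be needed later in this section:
$$ \frac{d}{dt}\left[ a^3 \left( \bar\rho + \bar p \right) \delta u^{(S)} \right] = -a^3\, \delta p^{(S)} . \tag{48} $$
From equations (45) and (46) together with the relation 4πG(ρ̄ + p̄) = −Ḣ
it follows that
$$ \dot A = -q^2 H\, \delta\dot u^{(S)} \tag{49} $$
where
$$ A \equiv a^2 H \psi - 4\pi G a^2\, \delta\rho^{(S)} - q^2 H\, \delta u^{(S)} . \tag{50} $$
Here is the proof: Eq. (46) can be written
$$ \delta\dot\rho^{(S)} + 3H\left( \delta\rho^{(S)} + \delta p^{(S)} \right) = \frac{\dot H}{4\pi G}\left( \psi - a^{-2} q^2\, \delta u^{(S)} \right) , $$
and it follows immediately from Eq. (45) that

$$ \frac{d(a^2 H \psi)}{dt} = -4\pi G a^2 H \left( \delta\rho^{(S)} + 3\,\delta p^{(S)} \right) + a^2 \dot H \psi . $$
Eliminating Ḣψ from these two equations gives

$$
\frac{d(a^2 H \psi)}{dt} = -4\pi G a^2 H \left( \delta\rho^{(S)} + 3\,\delta p^{(S)} \right)
+ 4\pi G a^2 \left[ \delta\dot\rho^{(S)} + 3H\left( \delta\rho^{(S)} + \delta p^{(S)} \right) \right]
+ q^2 \dot H\, \delta u^{(S)}
$$
or in other words
$$ \frac{d}{dt}\left[ a^2 H \psi - 4\pi G a^2\, \delta\rho^{(S)} \right] = q^2 \dot H\, \delta u^{(S)} . $$
The quantity in square brackets on the left is not invariant under the gauge
transformations that preserve the condition (43) for synchronous gauge, so
instead we work with the related gauge-invariant quantity (50), for which
Eq. (49) follows immediately.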

As long as the velocity potential remains finite in the limit q → 0, Eq. (49)
yields a conservation law

$$ \dot A = 0 \quad \text{for } q = 0 . \tag{51} $$

This is true for all modes in all cases, including inflation with several scalar
fields and for the transition from radiation to matter dominance. The
conservation of A in the limit q = 0 can also be derived by simply perturbing
a(t), ρ(t), and the curvature constant K in the Friedmann equation, which
gives δK = −2A/3.
By taking suitable linear combinations of solutions, it is always possible
to arrange that for q = 0 just one of a complete set has A ≠ 0, while all
the other solutions have A = 0. Examples are given in an appendix to
this paper. Because of the connection of A with the spatial curvature, it
is legitimate to call the solutions with A = 0 isocurvature modes. When
q is small but non-zero the isocurvature solutions usually have both A and
Ȧ of order q², so that Eq. (49) does not keep A for these solutions from
undergoing large fractional changes. This does not vitiate the usefulness of
the conservation law for initial conditions that give a physical perturbation
in which all solutions make contributions with comparable coefficients. In
this case, the contribution of the isocurvature modes to A may be rapidly
varying, but at any given time they will be small as long as q is sufficiently
small. The physical solution will have a rapid fractional variation in A only
if the coefficient of the mode with A ≠ 0 for q = 0 is suppressed, or if the
coefficients of the isocurvature modes are enhanced.
There is a simple relation between the quantity A introduced in this section
and the more familiar quantity R discussed in Sections I–III. Given
perturbations Ψ, δρ and δu in Newtonian gauge, we can find the perturbations
ψ, δρ^(S), and δu^(S) in synchronous gauge from the transformation
equations:
$$
\psi = -3\dot\Psi - 3\frac{d}{dt}\left( H\epsilon \right) + \left( \frac{q}{a} \right)^2 \epsilon , \qquad
\delta\rho^{(S)} = \delta\rho - \epsilon\,\dot{\bar\rho} , \qquad
\delta u^{(S)} = \delta u + \epsilon , \tag{52}
$$
where
$$ \dot\epsilon = \Psi . \tag{53} $$
(The possibility of shifting ε by a constant term corresponds to the possibility
of making gauge transformations that preserve the conditions for synchronous
gauge.) By applying these equations to the quantity (50), it is elementary
to show that A is related to the quantity R defined in Eq. (1) by

$$ A = -q^2 R . \tag{54} $$
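
The relation (54) can be checked in a few lines. The sketch below assumes the perfect-fluid case of this section (so Φ = Ψ), and uses Eq. (7), Eq. (23) in the form Ψ̇ + HΨ = Ḣδu, and the unperturbed relations ρ̄˙ = −3H(ρ̄ + p̄) and 4πG(ρ̄ + p̄) = −Ḣ:

```latex
\begin{aligned}
A &= a^2 H\psi - 4\pi G a^2\,\delta\rho^{(S)} - q^2 H\,\delta u^{(S)} \\
  &= a^2 H\left[ -3\dot\Psi - 3\dot H\epsilon - 3H\Psi + \frac{q^2}{a^2}\,\epsilon \right]
     - 4\pi G a^2 \left( \delta\rho - \epsilon\,\dot{\bar\rho} \right)
     - q^2 H \left( \delta u + \epsilon \right) \\
  &= -3a^2 H \left( \dot\Psi + H\Psi \right) - 4\pi G a^2\,\delta\rho - q^2 H\,\delta u
     \qquad \text{(the $\epsilon$ terms cancel, since } 4\pi G\,\dot{\bar\rho} = 3H\dot H \text{)} \\
  &= -3a^2 H\dot H\,\delta u + \left( 3a^2 H\dot H\,\delta u + q^2 \Psi \right) - q^2 H\,\delta u
     \qquad \text{(using Eq. (7))} \\
  &= -q^2 \left( -\Psi + H\,\delta u \right) = -q^2 R .
\end{aligned}
```

The second line uses d(Hε)/dt = Ḣε + HΨ, which follows from Eq. (53).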

Thus for any finite q the fractional rate of change in R will be the same as the
fractional rate of change in A. In some treatments of multi-field inflation[10]
and in discussions of the curvaton model[4], it is simply assumed that the
mode with R ≠ 0 and hence A ≠ 0 is somehow suppressed, which is enough
to explain why these authors find a significant fractional change in R. But
why more generally does the condition X = 0 play an important role in
establishing the conservation of R for q → 0 in Newtonian gauge, while
there seems to be no similar condition needed for the conservation of A in
synchronous gauge?
As a first step toward resolving this apparent paradox, we note from
Eq. (54) that the limit as q → 0 of the perturbed quantities δρ^(S) and
ψ in synchronous gauge in the mode for which A ≠ 0 in this limit is not
obtained by applying a gauge transformation to the perturbed quantities in
the corresponding mode in Newtonian gauge for q = 0, since that would give
A = 0 for q = 0. We can go further, and show in general that for q = 0,
the synchronous gauge solution corresponding to any adiabatic solution of
the Newtonian gauge field equations (normalized to not diverge as q → 0)
has vanishing values not only for A, but also (up to a choice of a particular
synchronous gauge) for ψ and the total density fluctuation δρ^(S) and velocity
potential δu^(S).
The reasoning here is essentially the reverse of that used to prove the
theorem of Section II. We will use the space-time component of the Einstein
field equations in Newtonian gauge

$$ \dot\Psi + H\Psi = -4\pi G \left( \bar\rho + \bar p \right) \delta u = \dot H\,\delta u . \tag{55} $$

We work in the limit q = 0, assuming that the solution is normalized so that
in Newtonian gauge all fluctuations remain finite in this limit. (As we shall
see, this assumption is less innocent than it may seem.) Then for modes that
for q = 0 are adiabatic in the sense that X = 0, Eqs. (1) and (2) give

$$ \dot\Psi = \frac{d}{dt}\left( H\,\delta u \right) \tag{56} $$

for q = 0. Combining this with Eq. (55) gives

$$ \Psi = -\delta\dot u . \tag{57} $$

Thus according to Eq. (53) we can adopt a particular synchronous gauge
such that the transformation parameter ε in Eq. (52) is

$$ \epsilon = -\delta u . \tag{58} $$

Using Eqs. (56) and (58) in Eq. (52) shows immediately that, for q = 0,

$$ \psi = 0 . \tag{59} $$

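Furthermore, Eq. (55) together with the Newtonian gauge Euler equation
supplies a general constraint equivalent to Eq. (7) for q = 0:

$$ -4\pi G\,\delta\rho = 3H\dot H\,\delta u . \tag{60} $$

Eqs. (52) and (60) give the synchronous gauge density fluctuation

$$
\begin{aligned}
-4\pi G\,\delta\rho^{(S)} &= -4\pi G \left[ \delta\rho - \epsilon\,\dot{\bar\rho} \right] \\
&= 3H\dot H\,\delta u + \left[ -4\pi G \right] \delta u \left[ -3H(\bar\rho + \bar p) \right] \\
&= 0 .
\end{aligned} \tag{61}
$$

Finally, the velocity potential in this synchronous gauge is

$$ \delta u^{(S)} = \delta u + \epsilon = 0 . \tag{62} $$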

Thus no synchronous gauge perturbation with non-vanishing values of ψ


or δρ(S) or δu(S) (apart from those that can be eliminated by a transformation
to a different synchronous gauge), such as modes 1, 2, and 3 of the radiation
plus cold dark matter model of the appendix, can be the gauge transforma-
tion of one of the q = 0 adiabatic Newtonian gauge solutions. Rather, the
synchronous gauge solutions for q = 0 with A a non-zero constant must be
the gauge transformations of the terms of order q² in the adiabatic Newtonian
gauge solution with R ≠ 0, re-normalized by dividing by a factor q².⁶ With
this re-normalization of the synchronous gauge modes, as in the appendix,
the conserved quantity A is not necessarily of order q², as would be expected
from Eq. (54), but can have a finite limit for q → 0, as we will find it does
in the appendix.

6 This is why it is possible for the quantity X not to vanish in any mode for q = 0 in
synchronous gauge, as we find in the appendix in the case of inflation, while there are two
modes in Newtonian gauge in which X → 0 for q → 0, despite the fact that X is gauge
invariant. It is not that X is different in the two gauges, but rather that the limit q → 0
means different things in synchronous and Newtonian gauge.

Now at last we come to the point. Working in Newtonian gauge, it is
most natural to assume that, with an over-all normalization factor chosen
so that R is finite and non-zero in the limit q = 0, all density fluctuations
and velocity potentials as well as Ψ are non-zero in this limit. Under this
assumption, if the contribution of non-adiabatic modes is comparable to that
of the adiabatic modes, R will undergo significant changes with time.
Transforming this sort of solution to synchronous gauge, we have found above that
the density fluctuations and the total velocity potential receive contributions
of order q² (relative to the Newtonian gauge perturbations) from the
adiabatic modes but of order unity from the non-adiabatic modes, so that Ȧ is of
order q², while A is also of order q², and so A does suffer significant changes
with time. Or we can re-normalize the synchronous gauge fluctuations by
an over-all factor of order 1/q², in which case A and the density
fluctuations and velocity potentials receive contributions of order unity for q = 0
from the adiabatic modes, as in the appendix, while the contribution of the
non-adiabatic modes to the total velocity potential if present is enhanced by
a peculiar looking factor of 1/q², giving both A and Ȧ non-zero limits for
q → 0.
On the other hand, working in synchronous gauge, it is most natural
to assume that, with an over-all normalization factor chosen so that A is
finite and non-zero in the limit q = 0, all density fluctuations and velocity
potentials as well as ψ are finite in this limit. Under this assumption, it makes
no difference whether the contribution of non-adiabatic modes is comparable
to that of the adiabatic modes; even if it is, A will undergo no significant
changes with time. Transforming this sort of solution to Newtonian gauge,
one finds that the density fluctuations and the total velocity potential receive
contributions of order 1/q² from the adiabatic modes and of order unity from
the non-adiabatic modes, so R is of order 1/q² while its rate of change is only
of order unity. Or we can re-normalize the Newtonian gauge fluctuations by
an over-all factor of order q², in which case R and the density fluctuations and
velocity potentials receive contributions of order unity for q = 0, while the
contribution of any non-adiabatic modes is suppressed by a peculiar looking
factor of q², giving R a zero rate of change for q = 0.

So which is right? The issue is not the over-all normalization of the total
perturbations, but the relative magnitude of its adiabatic and non-adiabatic
terms in the limit q → 0. There is nothing about either gauge that makes it
a more reliable guide to our intuition about this than the other.
It is generally expected that in inflation with several scalar fields the
general solution does not have R approaching a constant for increasing a(t), in
agreement with what would be expected from the behavior for q → 0 sug-
gested by Newtonian gauge but not synchronous gauge. But there are cases
of multi-field inflation in which A and hence R do approach constants as a(t)
increases, as would be expected from the behavior for q → 0 suggested by
synchronous gauge but not Newtonian gauge. One case is a potential given
by a sum of exponentials[11].
\[ V = \sum_n g_n \exp(-\lambda_n\varphi_n) \qquad (63) \]

Another is a potential of the form


\[ V = F\left(\sum_n \varphi_n^2\right) , \qquad (64) \]

with F an arbitrary function. It would be interesting to characterize the


general class of potentials for multi-field inflation for which A and R approach
constants as a(t) increases.

APPENDIX: LONG-WAVELENGTH SOLUTIONS IN SYNCHRONOUS GAUGE

In this appendix we will study several examples of calculations for zero
wave number in synchronous gauge, to exhibit both solutions with A ≠ 0
and those with A = 0. All quantities here will be in synchronous gauge, so
we will drop the label (S).
As a first example, consider inflation with just a single real scalar field
ϕ = ϕ̄(t) + δϕ(x, t), and potential V (ϕ). As is well known, the unperturbed
pressure and energy density are
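\[ \bar\rho = \tfrac12\dot{\bar\varphi}^2 + V(\bar\varphi) , \qquad \bar p = \tfrac12\dot{\bar\varphi}^2 - V(\bar\varphi) , \qquad (65) \]

from which we find the equation of motion of the unperturbed scalar field

\[ \ddot{\bar\varphi} + 3H\dot{\bar\varphi} + V'(\bar\varphi) = 0 . \qquad (66) \]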

The perturbations to the energy density and pressure are

δρ = ϕ̄˙ δ ϕ̇ + V ′ (ϕ̄)δϕ , δp = ϕ̄˙ δ ϕ̇ − V ′ (ϕ̄)δϕ . (67)

Also, the perturbed velocity potential is

δu = −δϕ/ϕ̄˙ . (68)

The field equations (45) and (46) for the Fourier component of the pertur-
bations with wave number q here take the form
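\[ \frac{d}{dt}\left(a^2\psi\right) = -4\pi G a^2\left[4\dot{\bar\varphi}\,\delta\dot\varphi - 2V'(\bar\varphi)\,\delta\varphi\right] , \qquad (69) \]
\[ \delta\ddot\varphi + 3H\,\delta\dot\varphi + V''(\bar\varphi)\,\delta\varphi + a^{-2}q^2\,\delta\varphi = -\dot{\bar\varphi}\,\psi , \qquad (70) \]
where
\[ H \equiv \frac{\dot a}{a} = \sqrt{\frac{8\pi G}{3}\left(\frac{\dot{\bar\varphi}^2}{2} + V(\bar\varphi)\right)} . \qquad (71) \]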
The Euler equation (48) gives no new information here.


There is a gauge mode, with δϕ = τ ϕ̄˙ and ψ = τ (3Ḣ − q²/a²), where τ
is an arbitrary time-independent function of q. Knowing this solution allows
us to reduce the order of the field equations (69) and (70) from three to two, in
agreement with the number of physical solutions found in Newtonian gauge
in Section III. We introduce time-dependent functions f and g by writing
 
δϕ = f ϕ̄˙ , ψ = (f + g) 3Ḣ − q 2 /a2 . (72)

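Equations (69) and (70) then become a second-order set of equations for the
gauge-invariant quantities ḟ and g:

\[ \ddot f + 3H\dot f + (\ddot H/\dot H)\,\dot f = -\left(3\dot H - q^2/a^2\right) g , \qquad (73) \]

\[ \left(3\dot H - q^2/a^2\right)\dot g + \left(6H\dot H + 3\ddot H\right) g = \left(\dot H + q^2/a^2\right)\dot f , \qquad (74) \]
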
in which the gauge mode appears in the possibility of adding a constant to f .
These equations can be solved exactly for q = 0 and H(t) arbitrary. There
are two physical solutions:
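Mode 1:
\[ g^{(1)}(t) = \frac{1}{3a^3(t)\dot H(t)}\int_0^t a(t')\,dt' , \qquad \dot f^{(1)}(t) = \frac{1}{a^2(t)\dot H(t)}\left[1 - \frac{H(t)}{a(t)}\int_0^t a(t')\,dt'\right] . \qquad (75) \]
Mode 2:
\[ g^{(2)}(t) = \frac{1}{3a^3(t)\dot H(t)} , \qquad \dot f^{(2)}(t) = -\frac{H(t)}{a^3(t)\dot H(t)} . \qquad (76) \]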

(The lower limit 0 on the integral over t′ is arbitrary; changing it just amounts
to adding some of mode 2 to mode 1.)
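
As a cross-check, the two q = 0 modes can be substituted numerically into Eqs. (73) and (74). The sketch below assumes a power-law expansion a(t) = t^p purely for illustration (the text allows arbitrary H(t)); it reads the q = 0 equations as f̈ + 3Hḟ + (Ḧ/Ḣ)ḟ = −3Ḣg and 3Ḣġ + (6HḢ + 3Ḧ)g = Ḣḟ, and all function names are hypothetical.

```python
import math

def background(p):
    """Power-law expansion a(t) = t**p (an illustrative assumption, not from the text)."""
    a = lambda t: t**p
    H = lambda t: p / t
    Hdot = lambda t: -p / t**2
    Hddot = lambda t: 2.0 * p / t**3
    int_a = lambda t: t**(p + 1) / (p + 1)   # integral of a(t') from 0 to t
    return a, H, Hdot, Hddot, int_a

def modes(p):
    """Return {mode: (g, fdot)} following Eqs. (75) and (76)."""
    a, H, Hdot, _, int_a = background(p)
    g2 = lambda t: 1.0 / (3.0 * a(t)**3 * Hdot(t))
    fdot2 = lambda t: -H(t) / (a(t)**3 * Hdot(t))
    g1 = lambda t: int_a(t) * g2(t)
    fdot1 = lambda t: (1.0 - H(t) * int_a(t) / a(t)) / (a(t)**2 * Hdot(t))
    return {1: (g1, fdot1), 2: (g2, fdot2)}

def ddt(func, t, eps=1e-5):
    """Central finite difference in t."""
    return (func(t + eps) - func(t - eps)) / (2.0 * eps)

def relative_residuals(p, mode, t):
    """Relative mismatch between the two sides of Eqs. (73) and (74) at q = 0."""
    _, H, Hdot, Hddot, _ = background(p)
    g, fdot = modes(p)[mode]
    lhs73 = ddt(fdot, t) + 3.0 * H(t) * fdot(t) + Hddot(t) / Hdot(t) * fdot(t)
    rhs73 = -3.0 * Hdot(t) * g(t)
    lhs74 = 3.0 * Hdot(t) * ddt(g, t) + (6.0 * H(t) * Hdot(t) + 3.0 * Hddot(t)) * g(t)
    rhs74 = Hdot(t) * fdot(t)
    rel = lambda l, r: abs(l - r) / (abs(l) + abs(r))
    return rel(lhs73, rhs73), rel(lhs74, rhs74)

if __name__ == "__main__":
    for p in (0.5, 2.0):
        for mode in (1, 2):
            print(p, mode, relative_residuals(p, mode, t=2.0))
```

The residuals should be at the level of the finite-difference error. A power law is chosen only because every background quantity is then available in closed form.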
Equation (50) gives the values of A for q = 0 in these two modes as the
constants
A1 = 1 , A2 = 0 , (77)
even though neither of these solutions satisfies the adiabatic condition X = 0.
A general mixture of modes with coefficients c1 and c2 will have A = c1 for
q = 0, provided c2 does not blow up in this limit. With this proviso, the
conservation of A allows the value of c1 that is calculated for a given inflaton
potential to be used to find the strength of the non-isocurvature mode at later
times. However, if the value of c2 for the physical solution found after horizon
crossing went as c/q² for q → 0, while c1 remained finite, then Eqs. (49), (66),
(70), and (74) would give

\[ \dot A \to c_2\dot A_2 \to \frac{-c\,H^2(t)}{a^3(t)\dot H(t)} \quad \text{for } q \to 0 , \]
and there would be no useful conservation law even for q = 0. As discussed
in Section IV, this is just what we would expect if we assumed that, with
an over-all normalization factor chosen so that R is finite and non-zero in
the limit q = 0, all density fluctuations and velocity potentials in Newtonian
gauge as well as Ψ are finite and non-zero in this limit.
For another example, we consider a later epoch, when the dominant con-
stituents of the universe were radiation and cold dark matter. (For simplicity,

we are neglecting the baryon density compared with the density of cold dark
matter, but supposing that there are still enough baryons to keep the radia-
tion in thermal equilibrium, and we are ignoring the effects of free-streaming
neutrinos.) We adopt a particular synchronous gauge in which the cold dark
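matter is at rest. The field equations then are Eqs. (46) and (48) for the
radiation energy density perturbation δρR and velocity potential δuR; Eq. (46)
for the cold dark matter density perturbation δρD; and Eq. (45) for ψ, with the total energy
density and pressure appearing on the right hand side:
\[ \delta\dot\rho_R + 4H\,\delta\rho_R = -(4/3)\,\bar\rho_R\left(\psi - a^{-2}q^2\,\delta u_R\right) , \qquad (78) \]
\[ \frac{d}{dt}\left[4a^3\bar\rho_R\,\delta u_R\right] = -a^3\,\delta\rho_R , \qquad (79) \]
\[ \delta\dot\rho_D + 3H\,\delta\rho_D = -\bar\rho_D\,\psi , \qquad (80) \]
\[ \frac{d}{dt}\left(a^2\psi\right) = -4\pi G a^2\left(2\,\delta\rho_R + \delta\rho_D\right) . \qquad (81) \]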
The unperturbed radiation and dark matter densities go as a⁻⁴ and a⁻³,
respectively. It is convenient here to normalize a so that a = 1 when ρ̄R = ρ̄D,
so that
ρ̄R = ρEQ a⁻⁴ , ρ̄D = ρEQ a⁻³ , (82)
where ρEQ is constant.
Equations (78)–(81) are a fourth-order system of differential equations,
so there are four modes, all of which are physical because the gauge has been
fixed by choosing δuD = 0. For q = 0, they take the form
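Mode 1:
\[ \delta\rho_R^{(1)} = \frac{1}{\pi G a^6}\left(16 + 8a - 2a^2 + a^3\right) , \]
\[ \delta u_R^{(1)} = -\frac{a}{4\rho_{EQ}}\sqrt{\frac{3}{8\pi G\rho_{EQ}}}\int^a \frac{a^4\,da}{\sqrt{1+a}}\,\delta\rho_R^{(1)} , \qquad (83) \]
\[ \delta\rho_D^{(1)} = \frac{3}{4\pi G a^5}\left(16 + 8a - 2a^2 + a^3\right) , \]
\[ \psi^{(1)} = 2\sqrt{\frac{3}{8\pi G\rho_{EQ}}}\;\frac{\sqrt{1+a}}{a^4}\left(32 + 8a - a^3\right) \]

Mode 2:
\[ \delta\rho_R^{(2)} = \frac{1}{\pi G a^6}\sqrt{1+a} , \]
\[ \delta u_R^{(2)} = -\frac{a}{4\rho_{EQ}}\sqrt{\frac{3}{8\pi G\rho_{EQ}}}\int^a \frac{a^4\,da}{\sqrt{1+a}}\,\delta\rho_R^{(2)} , \qquad (84) \]
\[ \delta\rho_D^{(2)} = \frac{3}{4\pi G a^5}\sqrt{1+a} , \]
\[ \psi^{(2)} = \sqrt{\frac{3}{8\pi G\rho_{EQ}}}\;\frac{4 + 3a}{a^4} \]

Mode 3:
\[ \delta\rho_R^{(3)} = \delta\rho_D^{(3)} = \psi^{(3)} = 0 , \qquad \delta u_R^{(3)} \propto a . \qquad (85) \]

Mode 4:
\[ \delta\rho_R^{(4)} = \frac{1}{\pi G a^6}\left(8 + 4a - a^2 - 8\sqrt{1+a}\right) , \]
\[ \delta u_R^{(4)} = -\frac{a}{4\rho_{EQ}}\sqrt{\frac{3}{8\pi G\rho_{EQ}}}\int^a \frac{a^4\,da}{\sqrt{1+a}}\,\delta\rho_R^{(4)} , \qquad (86) \]
\[ \delta\rho_D^{(4)} = \frac{3}{4\pi G a^5}\left(8 + 4a - 8\sqrt{1+a}\right) , \]
\[ \psi^{(4)} = \frac{8}{a^4}\sqrt{\frac{3}{8\pi G\rho_{EQ}}}\left((4 + a)\sqrt{1+a} - 4 - 3a\right) \]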
The lower limits on the integrals in the formulas for δuR in modes 1, 2, and 4
are arbitrary; changing this lower limit in any of these integrals just amounts
to adding some of mode 3 to that mode.
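
The q = 0 modes can be checked by direct substitution into Eqs. (78), (80), and (81), using d/dt = aH d/da with H = (8πGρEQ/3)^{1/2} (1 + a)^{1/2}/a². The sketch below does this with finite differences. It assumes units G = ρEQ = 1 and takes ψ⁽²⁾ ∝ (4 + 3a)/a⁴ and the prefactor 3/(4πGa⁵) for δρD⁽⁴⁾, the forms for which the field equations close; the function names are hypothetical.

```python
import math

G = 1.0        # Newton's constant, arbitrary units (assumption)
RHO_EQ = 1.0   # density at a = 1, arbitrary units (assumption)
C = math.sqrt(3.0 / (8.0 * math.pi * G * RHO_EQ))

def hubble(a):
    """H(a) for radiation plus cold dark matter, normalized as in Eq. (82)."""
    return math.sqrt(1.0 + a) / (C * a**2)

def mode(k, a):
    """Return (delta rho_R, delta rho_D, psi) for mode k = 1, 2 or 4 at q = 0."""
    s = math.sqrt(1.0 + a)
    if k == 1:
        poly = 16 + 8*a - 2*a**2 + a**3
        return (poly / (math.pi*G*a**6),
                3*poly / (4*math.pi*G*a**5),
                2*C*s*(32 + 8*a - a**3) / a**4)
    if k == 2:
        return (s / (math.pi*G*a**6),
                3*s / (4*math.pi*G*a**5),
                C*(4 + 3*a) / a**4)
    if k == 4:
        return ((8 + 4*a - a**2 - 8*s) / (math.pi*G*a**6),
                3*(8 + 4*a - 8*s) / (4*math.pi*G*a**5),
                8*C*((4 + a)*s - 4 - 3*a) / a**4)
    raise ValueError("mode must be 1, 2 or 4")

def ddt(func, a, eps=1e-5):
    """Time derivative via d/dt = a H d/da, with a central difference in a."""
    return a * hubble(a) * (func(a + eps) - func(a - eps)) / (2.0 * eps)

def relative_residuals(k, a):
    """Relative mismatch of Eqs. (78), (80), (81) at q = 0 for mode k."""
    rho_r, rho_d, psi = mode(k, a)
    rel = lambda l, r: abs(l - r) / (abs(l) + abs(r))
    # Eq. (78): a^-4 d/dt(a^4 delta rho_R) = -(4/3) rho_R_bar psi
    r78 = rel(ddt(lambda x: x**4 * mode(k, x)[0], a) / a**4,
              -(4.0/3.0) * RHO_EQ * a**-4 * psi)
    # Eq. (80): a^-3 d/dt(a^3 delta rho_D) = -rho_D_bar psi
    r80 = rel(ddt(lambda x: x**3 * mode(k, x)[1], a) / a**3,
              -RHO_EQ * a**-3 * psi)
    # Eq. (81): d/dt(a^2 psi) = -4 pi G a^2 (2 delta rho_R + delta rho_D)
    r81 = rel(ddt(lambda x: x**2 * mode(k, x)[2], a),
              -4.0 * math.pi * G * a**2 * (2.0*rho_r + rho_d))
    return r78, r80, r81

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, [relative_residuals(k, a) for a in (0.5, 1.0, 3.0)])
```

Mode 3 is omitted because its density and metric perturbations vanish identically, so only Eq. (79) constrains it.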
Note that modes 1, 2, and (trivially) 3 are adiabatic, in the sense that
\[ \frac{\delta\rho_D}{\bar\rho_D} = \frac{\delta\rho_R}{\bar\rho_R + \bar p_R} , \qquad (87) \]
(and so X = 0) while mode 4 is not adiabatic in this sense. The values of A
for the four modes are
A1 = 1, A2 = A3 = A4 = 0 . (88)

Thus modes 2 and 3 are both adiabatic and isocurvature. An arbitrary
mixture of modes will have A constant unless the coefficients of modes 2, 3,
or 4 blow up as 1/q² in the limit q → 0, which will be the case if the fluctuations in
the non-adiabatic modes in Newtonian gauge have non-zero limits for q → 0.

ACKNOWLEDGMENTS

I am grateful for helpful correspondence with E. Bertschinger, D. Lyth, S.
Mukhanov, and N. Turok. This research was supported in part by the Robert
A. Welch Foundation and NSF Grants PHY-0071512 and PHY-9511632.

REFERENCES

1. J. M. Bardeen, Phys. Rev. D 22, 1882 (1980); D. H. Lyth, Phys. Rev. D 31, 1792 (1985). For reviews, see J. Bardeen, in Cosmology and Particle Physics, eds. Li-zhi Fang and A. Zee (Gordon & Breach, New York, 1988); A. R. Liddle and D. H. Lyth, Cosmological Inflation and Large Scale Structure (Cambridge University Press, Cambridge, UK, 2000).

2. J. M. Bardeen, Phys. Rev. D 22, 1882 (1980), Eq. (5.21).

3. E. Bertschinger, in Cosmology and Large Scale Structure: Proceedings Session LX of the Les Houches Summer School, ed. R. Schaeffer, J. Silk, M. Spiro, and J. Zinn-Justin (Amsterdam: Elsevier Science, 1996).

4. S. Mollerach, Phys. Rev. D 42, 313 (1990); A. D. Linde and V. Mukhanov, Phys. Rev. D 56, 535 (1997); D. H. Lyth and D. Wands, Phys. Lett. B 524, 5 (2002); T. Moroi and T. Takahashi, Phys. Lett. B 522, 215 (2001); Phys. Rev. D 66, 063501 (2002); D. H. Lyth, C. Ungarelli, and D. Wands, astro-ph/0208055; K. Dimopoulos and D. H. Lyth, astro-ph/0209180.

5. J. M. Bardeen, P. J. Steinhardt, and M. S. Turner, Phys. Rev. D 28, 679 (1983). This quantity was re-introduced by D. Wands, K. A. Malik, D. H. Lyth, and A. R. Liddle, Phys. Rev. D 62, 043527 (2000).

6. See, e.g., C. Gordon, D. Wands, B. A. Bassett, and R. Maartens, Phys. Rev. D 63, 023506 (2000), Eq. (14).

7. A. Guth and S-Y. Pi, Phys. Rev. Lett. 49, 1110 (1982).

8. These are taken from Eqs. (15.10.50), (15.10.51), and (15.10.53) of S. Weinberg, Gravitation and Cosmology: Principles and Applications of the General Theory of Relativity (Wiley, New York, 1972). It should be noted that the velocity vector U₁ used in this reference has components U₁ᵢ = a⁻² δuᵢ = a⁻² i qᵢ δu(S).

9. S. Weinberg, Phys. Rev. D 64, 123511 (2001); Phys. Rev. D 64, 123512 (2001); Astrophys. J. 581, 810 (2002).

10. See, e.g., V. S. Mukhanov, H. A. Feldman, and R. H. Brandenberger, Physics Reports 215, 203–333 (1992).

11. K. A. Malik and D. Wands, Phys. Rev. D 59, 123501 (1999).

UTTG-02-03

Damping of Tensor Modes in Cosmology


arXiv:astro-ph/0306304v2 5 Oct 2003

Steven Weinberg
Theory Group, Physics Department, University of Texas,
Austin, TX, 78712

An analytic formula is given for the traceless transverse part of the anisotropic stress
tensor due to free streaming neutrinos, and used to derive an integro-differential equa-
tion for the propagation of cosmological gravitational waves. The solution shows that
anisotropic stress reduces the squared amplitude by 35.6% for wavelengths that enter the
horizon during the radiation-dominated phase, independent of any cosmological param-
eters. This decreases the tensor temperature and polarization correlation functions for
these wavelengths by the same amount. The effect is less for wavelengths that enter the
horizon at later times. At the longest wavelengths the decrease in the tensor correlation
functions due to neutrino free streaming ranges from 10.7% for ΩM h² = 0.1 to 9.0% for
ΩM h² = 0.15. An Appendix gives a general proof that tensor as well as scalar modes satisfy
a conservation law for perturbations outside the horizon, even when the anisotropic
stress tensor is not negligible.

I. Introduction

It is widely expected that the observation of cosmological tensor fluctua-


tions through measurements of the polarization of the microwave background
may provide a uniquely valuable check on the validity of simple inflationary
cosmologies. For instance, for a large class of inflationary theories with single
scalar fields satisfying the “slow roll” approximation, the wave-number dependence
P_S ∝ k^(n_S − 1) and P_T ∝ k^(n_T) of the scalar and tensor power spectral
functions and the ratio of these spectral functions after horizon exit during
inflation are related by[1]

PT /PS = −nT /2 . (1)

But in order to use observations to check such relations, we need to know


what happens to the fluctuations between the time of inflation and the
present. There is a very large literature on the scalar modes, but ever since
the first calculations[2] of the production of tensor modes in inflation, with
only one exception[3] known to me, the interaction of these modes with mat-
ter and radiation has simply been assumed to be negligible in studies of the
cosmic microwave background[4]. It is not included in the widely used com-
puter program of Seljak and Zaldarriaga[5]. As we shall see, the effect is not
negligible even at the relatively low values of ℓ where the B-type polarization
multipole coefficients C^B_ℓ are likely to be first measured, and becomes quite
significant for larger values of ℓ.

II. Damping Effects in the Wave Equation

The interaction of tensor modes with matter and radiation vanishes in the
case of perfect fluids, but not in the presence of traceless transverse terms in
the anisotropic stress tensor. In general, the tensor fluctuation satisfies
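\[ \ddot h_{ij} + \frac{3\dot a}{a}\,\dot h_{ij} - \frac{\nabla^2}{a^2}\,h_{ij} = 16\pi G\,\pi_{ij} , \qquad (2) \]
where dots indicate ordinary time derivatives. Here the components of the
perturbed metric are
\[ g_{00} = -1 , \qquad g_{i0} = 0 , \qquad g_{ij}(\mathbf x,t) = a^2(t)\left[\delta_{ij} + h_{ij}(\mathbf x,t)\right] \qquad (3) \]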

where hij (x, t) is treated as a small perturbation; and πij (x, t) is the anisotropic
part of the stress tensor, defined by writing the spatial part of the perturbed
energy-momentum tensor as Tij = p̄ gij + a2 πij , or equivalently
T i j = p̄ δij + πij , (4)
where p̄ is the unperturbed pressure. In these formulas we are considering
only tensor perturbations, so that
hii = 0 , ∂i hij = 0 , πii = 0 , ∂i πij = 0 . (5)
For a perfect fluid πij = 0, but this is not true in general. For instance, in
any imperfect fluid with shear viscosity η, we have[6] πij = −η ḣij. Nevertheless,
as we shall show in the Appendix, even where hydrodynamic approximations
are inapplicable, hij becomes time-independent as the wavelength
of a mode leaves the horizon, and remains time-independent until horizon
re-entry. All modes of cosmological interest are still far outside the horizon
at the temperature ≈ 10¹⁰ °K where neutrinos are going out of equilibrium
with electrons and photons, so hij can be affected by anisotropic inertia only
later, when neutrinos are freely streaming.
We can calculate the contribution of freely streaming neutrinos to πij
exactly[7]. We define a density n(x, p, t) as
\[ n(\mathbf x,\mathbf p,t) \equiv \sum_r\left(\prod_{i=1}^3\delta\big(x^i - x^i_r(t)\big)\right)\left(\prod_{i=1}^3\delta\big(p_i - p_{ri}(t)\big)\right) , \qquad (6) \]

with r labeling individual neutrino and antineutrino trajectories. The rela-


tivistic equations of motion in phase space for any metric with g00 = −1 and
gi0 = 0 are
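\[ \dot x^i_r = \frac{p^i_r}{p^0_r} , \qquad \dot p_{ri} = \frac{p^j_r p^k_r}{2p^0_r}\left(\frac{\partial g_{jk}}{\partial x^i}\right)_{x=x_r} . \qquad (7) \]
It follows then that n satisfies a Boltzmann equation
\[ \frac{\partial n}{\partial t} + \frac{\partial n}{\partial x^i}\frac{p^i}{p^0} + \frac{\partial n}{\partial p_i}\frac{p^j p^k}{2p^0}\frac{\partial g_{jk}}{\partial x^i} = 0 , \qquad (8) \]
it being understood that p^i and p^0 are expressed in terms of the independent
variable p_i by p^i = g^{ij} p_j and p^0 = (g^{ij} p_i p_j)^{1/2}. At a time t1 soon after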
neutrinos started free streaming, n had the ideal gas form (assuming zero
chemical potentials)
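\[ n(\mathbf x,\mathbf p,t_1) = \frac{N}{(2\pi)^3}\left[\exp\left(\sqrt{g^{ij}(\mathbf x,t_1)\,p_i p_j}\,/k_B T_1\right) + 1\right]^{-1} \equiv n_1(\mathbf x,\mathbf p) , \qquad (9) \]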
where N is the number of types of neutrinos, counting antineutrinos sepa-
rately, and kB is Boltzmann’s constant. We therefore write
n(x, p, t) = n1 (x, p) + δn(x, p, t) (10)
so that δn vanishes for t = t1 .
In the absence of metric perturbations, Eq. (8) and the initial condition
(9) have the solution n(p) = n̄(p), where n̄(p) is the zeroth-order part of n1 :
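\[ \bar n(p) = \frac{N}{(2\pi)^3}\left[\exp\left(p/k_B T_1 a_1\right) + 1\right]^{-1} , \qquad (11) \]
and p ≡ √(p_i p_i). To first order in metric perturbations, Eq. (8) gives
\[ \frac{\partial\,\delta n(\mathbf x,\mathbf p,t)}{\partial t} + \frac{\hat p_i}{a(t)}\frac{\partial\,\delta n(\mathbf x,\mathbf p,t)}{\partial x^i} = -\frac{p}{2a(t)}\,\bar n'(p)\,\hat p_i\hat p_j\hat p_k\frac{\partial}{\partial x^k}\left(h_{ij}(\mathbf x,t) - h_{ij}(\mathbf x,t_1)\right) , \qquad (12) \]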
where hats denote unit vectors. (In putting the Boltzmann equation in this
form, we use the fact that n1 depends on x and p_i only through the combination
g^{ij}(x, t1) p_i p_j, so that to first order ∂_k n1(x, p) = −p n̄′(p) p̂_i p̂_j ∂_k h_ij(x, t1).)
We now suppose that the x-dependence of hij (x, t) is contained in a factor
exp(ik · x), where k is a co-moving wave number.1 Eq. (12) and the initial
condition that δn = 0 at t = t1 then have the solution
\[ \delta n(\mathbf p,u) = -\frac{i}{2}\,p\,\bar n'(p)\,\hat p\cdot\hat k\;\hat p_i\hat p_j\int_0^u du'\,e^{i\hat p\cdot\hat k\,(u'-u)}\left(h_{ij}(u') - h_{ij}(0)\right) \qquad (13) \]

where we now drop the position argument, and write δn and hij as functions
of a variable u instead of t, with u defined as the wave number times the
conformal time
\[ u \equiv k\int_{t_1}^{t}\frac{dt'}{a(t')} . \qquad (14) \]

The space part of the neutrino energy-momentum tensor is given by


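\[ T_\nu{}^i{}_j = \frac{1}{\sqrt{\mathrm{Det}\,g}}\sum_r \frac{p^i_r\,p_{rj}}{p^0_r}\,\delta^3(\mathbf x - \mathbf x_r) = \frac{1}{\sqrt{\mathrm{Det}\,g}}\int\left(\prod_{k=1}^3 dp_k\right)\frac{n\,p^i p_j}{p^0} \qquad (15) \]
This yields terms of first order in h_ij(u) from p^i = g^{ij} p_j and p^0 = √(g^{ij} p_i p_j), a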
term of first order in hij (0) from the term n1 in n, and a term of first order
in hij (u) − hij (0) from δn. Collecting all these terms and using Eq. (5) yields
a surprisingly simple formula for πij :
\[ \pi_{ij}(u) = -4\bar\rho_\nu(u)\int_0^u K(u-U)\,h'_{ij}(U)\,dU \qquad (16) \]

where primes now indicate derivatives with respect to U or u; K is the kernel
\[ K(s) \equiv \frac{1}{16}\int_{-1}^{+1} dx\,(1-x^2)^2\,e^{ixs} = -\frac{\sin s}{s^3} - \frac{3\cos s}{s^4} + \frac{3\sin s}{s^5} , \qquad (17) \]
and ρ̄ν = a⁻⁴ ∫ d³p p n̄(p) is the unperturbed neutrino energy density.

1 Conventionally the co-moving coordinate x and wave number k are normalized by
defining a(t) so that a = 1 at present. Here we will leave this normalization arbitrary.
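
The closed form of the kernel in Eq. (17) can be compared with a direct quadrature of its defining integral. The sketch below uses composite Simpson's rule on the real part (the imaginary part vanishes by symmetry) and a short Taylor series for small s, where the closed form suffers from cancellation. The series coefficients follow from expanding the integrand; the function names are hypothetical.

```python
import math

def kernel_closed(s):
    """K(s) from the closed form of Eq. (17), with a series near s = 0."""
    if abs(s) < 0.2:
        # K(s) = 1/15 - s^2/210 + s^4/7560 - ...
        return 1.0/15.0 - s**2/210.0 + s**4/7560.0
    return -math.sin(s)/s**3 - 3.0*math.cos(s)/s**4 + 3.0*math.sin(s)/s**5

def kernel_quadrature(s, n=2000):
    """(1/16) * integral over [-1, 1] of (1 - x^2)^2 cos(x s), by Simpson's rule."""
    h = 2.0 / n                      # n must be even
    total = 0.0
    for i in range(n + 1):
        x = -1.0 + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * (1.0 - x*x)**2 * math.cos(x * s)
    return total * h / 3.0 / 16.0

if __name__ == "__main__":
    for s in (0.0, 0.1, 1.0, 5.0, 20.0):
        print(s, kernel_closed(s), kernel_quadrature(s))
```

The value at s = 0 is K(0) = 1/15, the constant used for the kernel in the long-wavelength limit.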

To continue, we use Eq. (16) in Eq. (2) and express time-derivatives


in terms of u-derivatives. This gives an integro-differential equation for
hij (u)[8]:
\[ h''_{ij}(u) + \frac{2a'(u)}{a(u)}\,h'_{ij}(u) + h_{ij}(u) = -24 f_\nu(u)\left(\frac{a'(u)}{a(u)}\right)^2\int_0^u K(u-U)\,h'_{ij}(U)\,dU , \qquad (18) \]
where fν ≡ ρ̄ν /ρ̄.
We took the initial time t1 to be soon after neutrinos started free stream-
ing, so interesting perturbations are outside the horizon then, and for some
time after. As we show in the Appendix, hij rapidly became time indepen-
dent after horizon exit, and remained so until horizon re-entry. In terms of
u, we then have the initial condition

h′ij (0) = 0 . (19)

The solution of Eq. (18) can therefore be put in the general form

hij (u) = hij (0)χ(u) (20)

where χ(u) satisfies the same integro-differential equation as hij (u)


\[ \chi''(u) + \frac{2a'(u)}{a(u)}\,\chi'(u) + \chi(u) = -24 f_\nu(u)\left(\frac{a'(u)}{a(u)}\right)^2\int_0^u K(u-U)\,\chi'(U)\,dU , \qquad (21) \]
and the initial conditions

χ(0) = 1 , χ′ (0) = 0 . (22)

III. Short Wavelengths

We will first consider wavelengths short enough to have re-entered the


horizon during the radiation-dominated era (though long after neutrino de-
coupling), and then turn to the general case in the next section. We can
take the initial time t1 to be early enough so that it can be approximated as
t1 ≃ 0, with the zero of time defined so that during the radiation-dominated
era we have a ∝ √t. Then in Eq. (21) we can set a′/a = 1/u, while for 3
neutrino flavors fν takes the constant value fν(0) = 0.40523. Then Eq. (21)
becomes
\[ \chi''(u) + \frac{2}{u}\,\chi'(u) + \chi(u) = -\frac{24 f_\nu(0)}{u^2}\int_0^u K(u-U)\,\chi'(U)\,dU . \qquad (23) \]

Because of the decrease of the factor 1/u², the right-hand side of Eq. (23) becomes
negligible for u ≫ 1, so deep inside the horizon the solution of Eqs. (22) and
(23) approaches a homogeneous solution

χ(u) → A sin(u + δ)/u (24)

as compared with the solution sin(u)/u for fν = 0. A numerical solution


of Eqs. (22) and (23) shows that χ(u) follows the fν = 0 solution pretty
accurately until u ≈ 1, when the perturbation enters the horizon, and there-
after rapidly approaches the asymptotic form (24), with A = 0.8026 and δ
very small. This asymptotic form provides the initial condition for the later
period when the matter energy density becomes first comparable and then
greater than that of radiation, so the effect of neutrino damping at these
later times is still only to reduce the tensor amplitude by the same factor
A = 0.8026. Hence, for wavelengths that enter the horizon after electron–
positron annihilation and well before radiation-matter equality, all quadratic
effects of the tensor modes in the cosmic microwave background, such as
the tensor contribution to the temperature multipole coefficients C_ℓ and the
whole of the “B-B” polarization multipole coefficients C^B_ℓ, are 35.6% less
than they would be without the damping due to free-streaming neutrinos.
(Photons also contribute to πij , but this effect is much smaller because at
last scattering photons contribute much less than 40% of the total energy.)
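
A numerical solution of Eqs. (22) and (23) can be sketched in a few lines. The version below is one possible scheme, not necessarily the one used for the results quoted above: Heun's predictor-corrector method on a uniform grid in u, with the memory integral evaluated by the trapezoid rule. Near u = 0 it starts from the series χ ≃ 1 − c u² with c = 1/(6 + 8fν(0)/5), which follows from Eq. (23) with K(0) = 1/15. The step size, the cutoff u_max, and the function names are assumptions.

```python
import math

F_NU = 0.40523   # f_nu(0) for three neutrino flavors, as quoted in the text

def kernel(s):
    """K(s) of Eq. (17); the series avoids cancellation at small s."""
    if s < 0.2:
        return 1.0/15.0 - s**2/210.0 + s**4/7560.0
    return -math.sin(s)/s**3 - 3.0*math.cos(s)/s**4 + 3.0*math.sin(s)/s**5

def asymptotic_amplitude(f_nu, u_max=40.0, h=0.02):
    """Integrate Eq. (23) from u = 0 and estimate A in chi -> A sin(u + delta)/u."""
    n_steps = int(round(u_max / h))
    K = [kernel(m * h) for m in range(n_steps + 1)]
    c = 1.0 / (6.0 + 1.6 * f_nu)      # chi = 1 - c u^2 + ... near u = 0
    chi, v = [1.0], [0.0]             # chi(0) = 1, chi'(0) = 0, Eq. (22)
    acc = -2.0 * c                    # chi''(0) from the series
    for n in range(n_steps):
        u_next = (n + 1) * h
        # Interior points of the trapezoid rule for the memory integral at u_next.
        if f_nu:
            interior = sum(K[n + 1 - j] * v[j] for j in range(1, n + 1))
        else:
            interior = 0.0

        def accel(chi_val, v_val):
            # The endpoint at U = 0 drops out because chi'(0) = 0.
            memory = h * (interior + 0.5 * K[0] * v_val)
            return (-2.0 * v_val / u_next - chi_val
                    - 24.0 * f_nu * memory / u_next**2)

        chi_pred = chi[n] + h * v[n]                  # predictor (Euler)
        v_pred = v[n] + h * acc
        acc_pred = accel(chi_pred, v_pred)
        chi.append(chi[n] + 0.5 * h * (v[n] + v_pred))   # corrector (Heun)
        v.append(v[n] + 0.5 * h * (acc + acc_pred))
        acc = accel(chi[-1], v[-1])
    u = n_steps * h
    # For chi = A sin(u + delta)/u: (u chi)^2 + (d(u chi)/du)^2 = A^2.
    return math.hypot(u * chi[-1], chi[-1] + u * v[-1])

if __name__ == "__main__":
    print("undamped:", asymptotic_amplitude(0.0))
    print("damped:  ", asymptotic_amplitude(F_NU))
```

With fν = 0 the amplitude should come out close to 1, the sin(u)/u solution. The damped value can be compared with A = 0.8026, for which 1 − A² ≈ 0.356. The cost grows as the square of the number of steps because of the memory integral, so the grid is kept modest.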

IV. General Wavelengths

To deal with perturbations that may enter the horizon after the matter
energy density has become important, let us switch the independent variable
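from u to y ≡ a(t)/aEQ, where aEQ is a(t) at the time tEQ of radiation-matter
equality. To see how they are related, note that
\[ \frac{dy}{du} = \frac{\dot a}{a_{EQ}k/a} = \frac{a^2}{a_{EQ}k}\,H_0\sqrt{\Omega_M\left(\frac{a_0}{a}\right)^3 + (\Omega_\gamma+\Omega_\nu)\left(\frac{a_0}{a}\right)^4} \qquad (25) \]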
The redshift of matter-radiation equality is given by 1 + zEQ = a0 /aEQ =
ΩM /(Ωγ + Ων ), so Eq. (25) can be simplified to read
\[ \frac{du}{dy} = \frac{Q}{\sqrt{1+y}} \qquad (26) \]
where
\[ Q \equiv \frac{k}{a_0 H_0\sqrt{\Omega_M(1+z_{EQ})}} . \qquad (27) \]
Since u → 0 for y → 0, the solution of Eq. (26) is
\[ u = 2Q\left(\sqrt{1+y} - 1\right) . \qquad (28) \]
The Hubble constant at matter-radiation equality has the value HEQ =
H0 √(2ΩM (1 + zEQ)³), so Eq. (27) can be written
\[ Q = \sqrt2\,k/k_{EQ} , \qquad (29) \]

where kEQ ≡ aEQ HEQ is the wave number of perturbations that just enter
the horizon at the time of radiation-matter equality. (Hence in particular
the results of the previous section apply for Q ≫ 1.)
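
The integration leading to Eq. (28) and the rewriting of Q in Eq. (29) take one line each. The block below spells them out using only the definitions already given (kEQ ≡ aEQ HEQ and a0/aEQ = 1 + zEQ); the final remark that u ≃ Qy at small y is an added observation.

```latex
\begin{aligned}
u(y) &= \int_0^y \frac{Q\,dy'}{\sqrt{1+y'}} = 2Q\left(\sqrt{1+y}-1\right)
  \;\simeq\; Q\,y \quad (y \ll 1) , \\
k_{EQ} &= a_{EQ}H_{EQ}
  = \frac{a_0}{1+z_{EQ}}\,H_0\sqrt{2\Omega_M(1+z_{EQ})^3}
  = a_0H_0\sqrt{2\Omega_M(1+z_{EQ})} , \\
Q &= \frac{k}{a_0H_0\sqrt{\Omega_M(1+z_{EQ})}} = \frac{\sqrt2\,k}{k_{EQ}} .
\end{aligned}
```

Since u ≃ Qy for small y, a mode with Q ≫ 1 reaches u ≈ 1 at y ≈ 1/Q ≪ 1, that is, well inside the radiation-dominated era.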
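The fraction of the total energy density in neutrinos is well known:
\[ f_\nu(y) = \frac{\Omega_\nu(a_0/a)^4}{\Omega_M(a_0/a)^3 + (\Omega_\gamma+\Omega_\nu)(a_0/a)^4} = \frac{f_\nu(0)}{1+y} \qquad (30) \]
where
\[ f_\nu(0) = \frac{\Omega_\nu}{\Omega_\nu+\Omega_\gamma} = 0.40523 . \qquad (31) \]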
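
The value in Eq. (31) can be reproduced from the standard relation between the neutrino and photon energy densities after electron-positron annihilation, ρν/ργ = N (7/8)(4/11)^{4/3} for N massless flavors. That relation is not stated in the text and is brought in here as an assumption; the function names are hypothetical.

```python
def neutrino_fraction_early(n_flavors=3):
    """f_nu(0) = rho_nu / (rho_nu + rho_gamma) deep in the radiation era.

    Assumes the standard ratio rho_nu / rho_gamma = N (7/8) (4/11)^(4/3),
    which is not given in the text.
    """
    ratio = n_flavors * (7.0 / 8.0) * (4.0 / 11.0) ** (4.0 / 3.0)
    return ratio / (1.0 + ratio)

def neutrino_fraction(y, n_flavors=3):
    """f_nu(y) = f_nu(0) / (1 + y), Eq. (30), with y = a / a_EQ."""
    return neutrino_fraction_early(n_flavors) / (1.0 + y)

if __name__ == "__main__":
    print(neutrino_fraction_early())   # to be compared with 0.40523 in Eq. (31)
    print(neutrino_fraction(1.0))      # half of f_nu(0) at equality
```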
A little algebra then lets us put Eq. (21) in the form

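\[ (1+y)\frac{d^2\chi(y)}{dy^2} + \left(\frac{2(1+y)}{y} + \frac12\right)\frac{d\chi(y)}{dy} + Q^2\chi(y) = -\frac{24 f_\nu(0)}{y^2}\int_0^y K(y,y')\,\frac{d\chi(y')}{dy'}\,dy' , \qquad (32) \]
where K(y, y′) is the same as the K(s) given by Eq. (17), but with s now
given by
\[ s \equiv u - u' = 2Q\left(\sqrt{1+y} - \sqrt{1+y'}\right) . \qquad (33) \]
The initial conditions (22) now read
\[ \chi(0) = 1 , \qquad \left.\frac{d\chi(y)}{dy}\right|_{y=0} = 0 . \qquad (34) \]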

We now have to face the complication that for general Q the value of y at
last scattering is not in an asymptotic region where the effect of anisotropic
inertia is simply to damp χ(t) by some constant factor. We therefore now
have to consider what feature of χ(t) is related to observations of the cosmic
microwave background. It is χ̇ that enters into the Boltzmann equation
for perturbations to the temperature and Stokes parameters[9], so in the
approximation of a sudden transition from opacity to transparency, we expect
the B-B and other multipole coefficients to depend on χ(y) only through a
factor |χ′ (yL )|2 , where yL = (1 + zEQ )/(1 + zL ) is the value of y at last
scattering. Hence we will be primarily interested in calculating the value of
|χ′ (yL )|2 for various values of Q, and comparing these values with what they
would be in the absence of anisotropic inertia.
For Tγ0 = 2.725 °K, we have Ωγ + Ων = 4.15 × 10⁻⁵ h⁻², so, taking 1 + zL =
1090, the parameter yL is

yL = 22.1 ΩM h² .

It will be useful also to have an idea of the value of ℓ for which the multipole
coefficients in various correlation functions are dominated by perturbations
with a given Q. The dominant contribution to a multipole coefficient of
order ℓ comes from wave numbers k ≃ aL ℓ/dL , where aL is a(t) at the time
of last scattering, and dL is the angular diameter distance of the surface of
last scattering, which for flat geometries is:
\[ d_L = \frac{1}{H_0(1+z_L)}\int_{1/(1+z_L)}^{1}\frac{dx}{\sqrt{\Omega_M x + (1-\Omega_M)x^4}} , \]

where zL is the redshift of last scattering. Thus the multipole order that
receives its main contribution from wave lengths that are just coming into

the horizon at matter-radiation equality is
\[ \ell_{EQ} \equiv \frac{d_L k_{EQ}}{a_L} = \sqrt{2\Omega_M(1+z_{EQ})}\int_{1/(1+z_L)}^{1}\frac{dx}{\sqrt{\Omega_M x + (1-\Omega_M)x^4}} , \qquad (35) \]
where zEQ is the redshift of matter-radiation equality. For present radiation
temperature Tγ0 = 2.725 °K and ΩM h² = 0.15 this redshift is zEQ = 3613.
If also ΩM = 0.3 and 1 + zL = 1090 then the integral in Eq. (35) has the
value 3.195, and so Eq. (35) gives ℓEQ = 149. Hence for these cosmological
parameters, Eq. (29) gives
\[ Q = \sqrt2\,\frac{\ell}{\ell_{EQ}} \simeq \frac{\ell}{105} . \]

When referring below to specific values of ℓ, it will always be understood


that the conversion from Q to ℓ has been made using these cosmological
parameters, but it should be kept in mind that the dependence of the function
χ(y) on y and Q is independent of cosmological parameters, and that the
value of y at last scattering depends only on Tγ0, 1 + zL, and ΩM h², not on
ΩM or Ωvac .
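
The numbers quoted above (zEQ, yL, the integral in Eq. (35), and ℓEQ) can be reproduced with a short script. The sketch below takes Ωγ + Ων = 4.15 × 10⁻⁵ h⁻² and 1 + zL = 1090 from the text, and evaluates the integral with Simpson's rule after the substitution x = w², which removes the 1/√x behavior of the integrand at small x. Function names and the number of quadrature points are assumptions.

```python
import math

OMEGA_RAD_H2 = 4.15e-5   # (Omega_gamma + Omega_nu) h^2 for T = 2.725 K, from the text

def one_plus_z_eq(omega_m_h2):
    """1 + z_EQ = Omega_M / (Omega_gamma + Omega_nu)."""
    return omega_m_h2 / OMEGA_RAD_H2

def y_last_scattering(omega_m_h2, one_plus_z_l=1090.0):
    """y_L = (1 + z_EQ) / (1 + z_L)."""
    return one_plus_z_eq(omega_m_h2) / one_plus_z_l

def distance_integral(omega_m, one_plus_z_l=1090.0, n=2000):
    """Integral of dx / sqrt(Omega_M x + (1 - Omega_M) x^4) from 1/(1+z_L) to 1.

    With x = w^2 the integrand becomes 2 / sqrt(Omega_M + (1 - Omega_M) w^6).
    """
    lo = 1.0 / math.sqrt(one_plus_z_l)
    h = (1.0 - lo) / n               # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        w = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * 2.0 / math.sqrt(omega_m + (1.0 - omega_m) * w**6)
    return total * h / 3.0

def ell_eq(omega_m, omega_m_h2, one_plus_z_l=1090.0):
    """Multipole order of Eq. (35)."""
    prefactor = math.sqrt(2.0 * omega_m * one_plus_z_eq(omega_m_h2))
    return prefactor * distance_integral(omega_m, one_plus_z_l)

if __name__ == "__main__":
    print("z_EQ     =", one_plus_z_eq(0.15) - 1.0)
    print("y_L      =", y_last_scattering(0.15))
    print("integral =", distance_integral(0.3))
    print("ell_EQ   =", ell_eq(0.3, 0.15))
    print("ell / Q  =", ell_eq(0.3, 0.15) / math.sqrt(2.0))
```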
Let us first consider the case Q ≪ 1, which for the above cosmological
parameters corresponds to ℓ ≪ 100. Here the kernel K(y, y ′) has the constant
value 1/15, and Eqs. (32) and (34) have a solution of the form

χ(y) → 1 − Q2 g(y) for Q → 0 (36)

where g(y) is independent of Q, and satisfies the inhomogeneous differential
equation

\[ (1+y)\frac{d^2g(y)}{dy^2} + \left(\frac{2(1+y)}{y} + \frac12\right)\frac{dg(y)}{dy} + \frac{8f_\nu(0)}{5y^2}\,g(y) = 1 \qquad (37) \]

and the initial conditions

g(0) = g ′ (0) = 0 . (38)

According to the above discussion, the streaming of free neutrinos damps


the various tensor correlation functions of the cosmic microwave background
by a factor |χ′(yL)/χ′0(yL)|², which for Q ≪ 1 becomes |g′(yL)/g′0(yL)|², the
subscript 0 denoting quantities calculated ignoring this damping, i.e., for
fν = 0, and yL again equal to the ratio of a(t) at last scattering to that
at matter-radiation equality. Numerical solutions of Eqs. (37) and (38) for
fν (0) = 0.40523 and for fν = 0 show that the damping factor |g ′ (yL )/g0′ (yL )|2
is very close to a linear function of yL and hence of ΩM h2 for observationally
favored values of ΩM h2 , increasing from 0.893 at ΩM h2 = 0.10 to 0.910 for
ΩM h2 = 0.15.
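
A numerical solution of Eqs. (37) and (38) can be sketched with a classical fourth-order Runge-Kutta integrator. Because Eq. (37) is singular at y = 0, the sketch starts at a small y from the leading series g ≃ c y² with c = 1/(6 + 8fν(0)/5), which solves Eq. (37) at lowest order in y. The starting point, step count, and function names are assumptions, and this is not necessarily the method behind the values quoted above.

```python
F_NU0 = 0.40523   # f_nu(0) from Eq. (31)

def g_prime(y_end, f_nu0, y_start=0.01, n_steps=4000):
    """Integrate Eq. (37) from y_start to y_end with RK4 and return g'(y_end)."""
    c = 1.0 / (6.0 + 1.6 * f_nu0)          # g = c y^2 + ... near y = 0
    y, g, gp = y_start, c * y_start**2, 2.0 * c * y_start
    h = (y_end - y_start) / n_steps

    def gpp(y, g, gp):
        # Eq. (37) solved for the second derivative.
        return (1.0 - (2.0 * (1.0 + y) / y + 0.5) * gp
                - 1.6 * f_nu0 * g / y**2) / (1.0 + y)

    for _ in range(n_steps):
        k1g, k1p = gp, gpp(y, g, gp)
        k2g = gp + 0.5 * h * k1p
        k2p = gpp(y + 0.5 * h, g + 0.5 * h * k1g, gp + 0.5 * h * k1p)
        k3g = gp + 0.5 * h * k2p
        k3p = gpp(y + 0.5 * h, g + 0.5 * h * k2g, gp + 0.5 * h * k2p)
        k4g = gp + h * k3p
        k4p = gpp(y + h, g + h * k3g, gp + h * k3p)
        g += h * (k1g + 2.0 * k2g + 2.0 * k3g + k4g) / 6.0
        gp += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        y += h
    return gp

def damping_factor(omega_m_h2):
    """|g'(y_L) / g0'(y_L)|^2 with y_L = 22.1 Omega_M h^2."""
    y_l = 22.1 * omega_m_h2
    return (g_prime(y_l, F_NU0) / g_prime(y_l, 0.0)) ** 2

if __name__ == "__main__":
    for omega_m_h2 in (0.10, 0.15):
        print(omega_m_h2, damping_factor(omega_m_h2))
```

The two printed values can be compared with the damping factors 0.893 and 0.910 quoted above. Errors in the starting values decay as a power of y_start/y, so the result is insensitive to the exact choice of y_start.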
This damping is relatively insensitive to Q for small Q. For instance,
numerical integration of Eqs. (32) and (34) shows that for ΩM h2 = 0.15, the
damping has only decreased from 9% to 8% for Q = 0.55 (ℓ ≃ 58), and to 7%
for Q = 0.8 (ℓ ≃ 84). Matters are more complicated for larger values of Q and
ℓ, because the damping factor |χ′ (yL )/χ′0 (yL )|2 is the ratio of two oscillating
functions with slightly different phases, so that the plot of |χ′ (yL )/χ′0 (yL )|2
vs. Q shows narrow spikes: this ratio becomes infinite at values of Q for
which χ′0 (yL ) vanishes and then almost immediately drops to zero at the
slightly larger value of Q for which χ′ (yL ) vanishes. (Even if we average
over the small range of y values over which last scattering occurs, the plot of
h|χ′ (yL )/χ′0 (yL )|2 i vs. Q still shows finite though high narrow spikes at the
zeroes of χ′0 (yL ).) These spikes are not particularly interesting, because they
occur at values of Q where χ′ (yL ) is particularly small, so that the multipole
coefficients in the various tensor temperature and polarization correlation
functions will be very difficult to measure for the corresponding values of
ℓ. The values of |χ′(yL)/χ′0(yL)|² in the relatively flat regions between the
spikes steadily decrease from the value ≃ 0.9 for Q ≪ 1 to a value, for Q ≃ 10,
close to the result 0.644 found in the previous section.
The effects considered in this paper will doubtless eventually be taken
into account in the computer programs used to analyze data from PLANCK
and other future facilities. In the meanwhile, the planning of future observa-
tions should take into account that the damping of cosmological gravitational
waves is not negligible.

ACKNOWLEDGMENTS

I am grateful for valuable conversations with Richard Bond, Lev Kofman,


Eiichiro Komatsu, Richard Matzner and Matias Zaldarriaga. Thanks are due
to Michael Trott for advice regarding the numerical solution of Eq. (18), and

to Matthew Anderson for checking the numerical results. This research was
supported in part by the Robert A. Welch Foundation, by NSF Grant PHY-0071512,
and by the US Navy Grant No. N00014-03-1-0639, “Quantum
Optics Initiative.”

APPENDIX: SUPERHORIZON CONSERVATION LAWS

This Appendix will prove a result quoted in Section II, that in all cases
there is a tensor mode whose amplitude remains constant outside the hori-
zon, even where some particles may have mean free times comparable to
the Hubble time. The argument is similar to one used previously to show
the existence under very general conditions of two scalar modes for which a
quantity R is constant outside the horizon.[10] It is based on the observa-
tion that for zero wave number the Newtonian gauge field equations and the
dynamical equations for matter and radiation as well as the condition k = 0
are invariant under coordinate transformations that are not symmetries of
the unperturbed metric.2 The most general such transformations are
\[ x^0 \to x^0 + \epsilon(t) , \qquad x^i \to \left(\delta_{ij} - \tfrac12\omega_{ij}\right)x^j , \qquad (A1) \]
where H ≡ ȧ/a, ǫ(t) is an arbitrary function of time, and ωij = ωji is
an arbitrary constant matrix. Under these conditions we have something
like a Goldstone theorem: since the metric satisfies the field equations both
before and after the transformation, the change in the metric under these
transformations must also satisfy the field equations. This change is simply
\[ \delta g_{00} = \dot\epsilon(t) , \qquad \delta g_{i0} = 0 , \qquad \delta g_{ij} = a^2(t)\left[ -H(t)\,\epsilon(t)\,\delta_{ij} + \omega_{ij} \right] . \tag{A2} \]
This means that for zero wave number we always have a solution with scalar
modes
\[ \Psi = H\epsilon - \omega_{ii}/3 , \qquad \Phi = -\dot\epsilon \tag{A3} \]
and a tensor mode
\[ h_{ij} = \omega_{ij} - \tfrac{1}{3}\,\delta_{ij}\,\omega_{kk} . \tag{A4} \]
2 In this respect, the theorem proved here is similar to the Goldstone theorem [11] of
quantum field theory. The modes for which R or hij are constant outside the horizon take
the place here of the Goldstone bosons that become free particles for long wavelength.

(The notation for Φ and Ψ is standard, and the same as in Ref. [10].) These
are just gauge modes for zero wave number, but if they can be extended to
non-zero wave number they become physical modes, since the transforma-
tions (A1) are not symmetries of the field equations except for zero wave
number. For the scalar modes there are field equations that disappear in the
limit of zero wave number, so that the conditions Φ = Ψ − 8πGπS and δu = ǫ
(where πS is the scalar part of the anisotropic inertia, called σ in Ref. [10],
and δu is the perturbation to the velocity potential) must be imposed on the
solutions (A3) for them to have an extension to non-zero wave number. It
follows then that the zero wave number scalar modes that become physical
for non-zero wave number satisfy

ǫ̇ = −Hǫ + ωkk /3 − 8πGπS , δu = ǫ . (A5)

Then for zero wave number the quantity R ≡ −Ψ + Hδu has the time-
independent value
\[ R = \omega_{kk}/3 . \tag{A6} \]
For tensor modes there are no field equations that disappear for zero wave
number, so the solution hij =constant automatically has an extension to a
physical mode for non-zero wave number.
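As a quick check of Eq. (A6), the following worked step substitutes the zero-wave-number solution into the definition of R. It is only a sketch, and it uses nothing beyond Eqs. (A3) and (A5) as printed above.

```latex
\begin{align*}
R &\equiv -\Psi + H\,\delta u \\
  &= -\left(H\epsilon - \tfrac{1}{3}\,\omega_{kk}\right) + H\epsilon
     && \text{using (A3) and } \delta u = \epsilon \text{ from (A5)} \\
  &= \tfrac{1}{3}\,\omega_{kk} .
\end{align*}
```

The ε-dependent terms cancel identically, so R is time-independent whatever the function ε(t) is; only the constant ω_kk survives.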
As examples, we note that both the anisotropic stress tensor πij = −η ḣij
for an imperfect fluid with shear viscosity η and the tensor (16) for freely
streaming neutrinos vanish for ḣij = 0, so in the limit of zero wave numbers
Eq. (2) has a solution with ḣij = 0. The above theorem shows that this result
applies even when some particle’s mean free time is comparable with the
Hubble time, in which case neither the hydrodynamic nor the free-streaming
approximations are applicable.
The solution with ḣij = 0 for zero wave number is not the only solu-
tion, but the other solutions decay rapidly after horizon exit. There is no
anisotropic inertia in scalar field theories, and in the absence of anisotropic
inertia, Eq. (2) for zero wave number has two solutions, one with hij constant,
and the other with ḣij ∝ a^{−3}, for which hij rapidly becomes constant.
The energy-momentum tensor of the universe departs from the perfect fluid
form later, during neutrino decoupling, and perhaps also during reheating
or periods of baryon or lepton nonconservation, but during all these epochs
cosmologically interesting tensor fluctuations are far outside the horizon, and
hence remain constant.

References
1. A. Starobinsky, Sov. Astron. Lett. 11, 133 (1985); E. D. Stewart and
D. H. Lyth, Phys. Lett. 302B, 171 (1993).
2. V. A. Rubakov, M. Sazhin, and A. Veryaskin, Phys. Lett. 115B, 189
(1982); R. Fabbri and M.D. Pollock, Phys. Lett. 125B, 445 (1983); L.
F. Abbott and M. B. Wise, Nuclear Physics B244, 541 (1984). A. A.
Starobinskii, Sov. Astron. Lett. 11, 133 (1985).
3. The effects of anisotropic inertia due to both neutrinos and photons
were included in a large program of numerical calculations reported
by J. R. Bond, in Cosmology and Large Scale Structure, Les Houches
Session LX, eds. R. Schaeffer, J. Silk, and J. Zinn-Justin (Elsevier
Science Press, Amsterdam, 1996). Bond concluded from the numerical
results that there is an average ‘∼ 20%’ reduction of the squared tensor
amplitude for multipole order ℓ larger than about 100, and that this
would not be observable in measurements of Cℓ because according to
Eq. (1) tensor modes already make a much smaller contribution to Cℓ
than scalar modes. It is the prospect of cosmic microwave background
polarization measurements that makes the effect of anisotropic inertia
on the tensor amplitude important.
4. See, e.g., V. F. Mukhanov, H.A. Feldman, and R. H. Brandenberger,
Physics Reports 215, 203 (1992); M. S. Turner, M. White, and J. E.
Lidsey, Phys. Rev. D 48, 4613 (1993); M. S. Turner, Phys Rev. D.
55, 435 (1997); D. J. Schwarz, astro-ph/0303574.
5. U. Seljak and M. Zaldarriaga, Astrophys. J. 469, 437 (1996).
6. S. Weinberg, Gravitation and Cosmology (Wiley, New York, 1972),
Eq. (15.10.39). (It should be noted that hij as defined in this refer-
ence is a2 times the hij used in the present work.) For k/a ≫ ȧ/a, this
formula for π gives the damping of gravitational waves that had been
calculated by S. W. Hawking, Astrophys. J. 145, 544 (1966).
7. Differential equations for both the scalar and tensor parts of the anisotropic
stress tensor were given by J. R. Bond and A. S. Szalay, Astrophys. J.
274, 443 (1983), using an orthonormal basis instead of the coordinate
basis used here, but the result was applied only for the scalar modes.

8. For perturbations outside the horizon, where z ≪ 1, we can replace
K(z − y) with K(0) = 1/15, and the integral in Eq. (8) becomes
just (hij (z) − hij (0))/15. Aside from the term hij (0), this equation
in the radiation-dominated case is then equivalent to Eq. (4.3) of C.
W. Misner, Astrophys. J. 151, 431 (1968), which was derived to study
a phenomenon different from that considered here: the approach to
isotropy of a homogeneous anisotropic cosmology. (This equation was
generalized to the case of finite mean free times by C. Misner and R.
Matzner, Astrophys. J. 171, 415 (1972).) Misner took hij (0) = 0
(but h′ij (0) ≠ 0), on the ground that a constant term in hij could be
made to vanish by a coordinate transformation, and found a decaying
solution. But a constant term in hij is only a gauge mode when k is
strictly zero. As remarked in the Appendix, the existence of this gauge
mode means that there is a physical mode with k ≠ 0 for which hij
becomes constant outside the horizon, where k is negligible, but which
becomes time-dependent when the wavelength re-enters the horizon.
[After the preprint of this work was circulated, I learned of an article
by A. K. Rebhan and D. J. Schwarz, Phys. Rev. D 50, 2541 (1994),
which obtained an integro-differential equation like Eq. (18), but with
extra terms representing more general initial conditions. No attempt
was made to identify the initial conditions that would actually apply
cosmologically, or to calculate the damping effect relevant to the cosmic
microwave background.]

9. See, e. g., M. Zaldarriaga and U. Seljak, Phys. Rev. D55, 1830 (1997).

10. S. Weinberg, Phys. Rev. D67, 123504 (2003).

11. J. Goldstone, Nuovo Cimento 19, 154 (1961); J. Goldstone, A. Salam,
and S. Weinberg, Phys. Rev. 127, 965 (1962).

UTTG-08-03
arXiv:astro-ph/0401313v3 18 May 2004

Can Non-Adiabatic Perturbations Arise After Single-Field Inflation?

Steven Weinberg1
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

It is shown that non-adiabatic cosmological perturbations cannot appear
during the period of reheating following inflation with a single scalar inflaton
field.

1 Electronic address: weinberg@physics.utexas.edu
According to a widely adopted picture [1], the perturbations to the Robertson–
Walker cosmology arose from the quantum fluctuations in a slowly rolling
scalar “inflaton” field during a period of inflation, then became classical as
their wavelength was stretched beyond the horizon, and subsequently were
imprinted on the decay products of the inflaton during a period of “reheat-
ing.” One of the attractions of this picture (and in particular the assumption
that there is just one inflaton field) is that it has generally been thought to
lead only to adiabatic perturbations, in agreement with current observations
[2].
A recent preprint [3] has raised the question, whether it is possible for
non-adiabatic cosmological perturbations to arise during reheating even af-
ter inflation with a single inflaton field. This would be very important if
true, for then observational limits on non-adiabatic fluctuations in the cos-
mic microwave background would provide some constraints on the otherwise
mysterious era of reheating, and indeed on the whole history of the universe
between inflation and the present.
However, there are very general grounds for expecting that single-field in-
flation can only produce adiabatic fluctuations, whatever happens in reheat-
ing or subsequently. It has been shown [4] that, whatever the constituents
of the universe, the differential equations for cosmological perturbations in
Newtonian gauge always have an adiabatic solution for wavelengths outside
the horizon (that is, for physical wave numbers that are much less than the
cosmological expansion rate). For this solution, there is a conserved quantity
ζ [5], defined by
\[ \zeta \equiv -\Psi + \frac{\delta\rho}{3(\bar\rho + \bar p)} , \tag{1} \]
where δρ is the perturbation to the total energy density in Newtonian gauge;
bars denote unperturbed quantities; the perturbed metric is given by

\[ ds^2 = -(1 + 2\Phi)\,dt^2 + a^2 (1 - 2\Psi)\, d\mathbf{x}^2 ; \tag{2} \]

and as usual H ≡ ȧ/a. (Reference [4] dealt mostly with a quantity [6]

\[ R \equiv -\Psi + H\,\delta u , \tag{3} \]

where δu is the perturbation to the total velocity potential, but outside
the horizon R and ζ are the same.) Also, for this adiabatic mode, the
perturbations to the metric and the total energy density and pressure are
given outside the horizon by
\[ \Phi = \Psi = \zeta \left[ -1 + \frac{H(t)}{a(t)} \int_{t_1}^{t} a(t')\,dt' \right] , \tag{4} \]
\[ \delta\rho = -\frac{\zeta\,\dot{\bar\rho}(t)}{a(t)} \int_{t_1}^{t} a(t')\,dt' , \qquad \delta p = -\frac{\zeta\,\dot{\bar p}(t)}{a(t)} \int_{t_1}^{t} a(t')\,dt' . \tag{5} \]
This is a solution for any value of t1 , so the difference of adiabatic solutions
with different values of t1 is also a solution, with ζ = 0 and Φ = Ψ ∝
H(t)/a(t), $\delta\rho/\dot{\bar\rho} \propto 1/a(t)$, etc. We will adjust t1 so that the total perturbation
takes the form (4), (5), in which case t1 will be at some early time
during inflation to avoid perturbations that are very large at early times.
Furthermore, if the total energy-momentum tensor $T^{\mu\nu}$ is given by a sum of
tensors $T_\alpha^{\mu\nu}$ for fluids labeled α (not necessarily individually conserved) then
Eq. (5) holds for each of the individual perturbations
\[ \delta\rho_\alpha = -\frac{\zeta\,\dot{\bar\rho}_\alpha(t)}{a(t)} \int_{t_1}^{t} a(t')\,dt' , \qquad \delta p_\alpha = -\frac{\zeta\,\dot{\bar p}_\alpha(t)}{a(t)} \int_{t_1}^{t} a(t')\,dt' , \tag{6} \]

and similarly for any other four-dimensional scalar, such as the inflaton field
\[ \delta\varphi = -\frac{\zeta\,\dot{\bar\varphi}(t)}{a(t)} \int_{t_1}^{t} a(t')\,dt' . \tag{7} \]
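The following worked check is a sketch, assuming only the unperturbed energy conservation law for the total fluid (standard, though not written out at this point in the text). It verifies that Eqs. (4) and (5) reproduce the definition (1) of ζ, and it spells out the form of the difference of two adiabatic solutions with different lower limits. The abbreviation J and the second lower limit t_2 are labels introduced here for illustration.

```latex
% Abbreviation (introduced here): J(t) \equiv \frac{1}{a(t)} \int_{t_1}^{t} a(t')\,dt'
\begin{align*}
-\Psi + \frac{\delta\rho}{3(\bar\rho+\bar p)}
  &= \zeta\left[1 - H J\right] - \frac{\zeta\,\dot{\bar\rho}\,J}{3(\bar\rho+\bar p)} \\
  &= \zeta\left[1 - H J\right] + \zeta H J = \zeta ,
  && \text{using } \dot{\bar\rho} = -3H(\bar\rho+\bar p) .
\end{align*}
% Difference of the solutions with lower limits t_1 and t_2:
\begin{align*}
\Delta\Phi = \Delta\Psi &= \zeta\,\frac{H(t)}{a(t)} \int_{t_1}^{t_2} a(t')\,dt' \;\propto\; \frac{H(t)}{a(t)} , \\
\frac{\Delta\delta\rho}{\dot{\bar\rho}} &= -\frac{\zeta}{a(t)} \int_{t_1}^{t_2} a(t')\,dt' \;\propto\; \frac{1}{a(t)} .
\end{align*}
```

The constant term in Eq. (4) cancels in the difference, which is why the difference mode carries no contribution to the conserved quantity of Eq. (1) and decays as the universe expands.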

In general there may be other solutions, for which ζ may or may not
be constant and the perturbations are not given by Eqs. (4)–(7). (Mislead-
ingly, these others are often called isocurvature solutions.) But if only one
inflaton field makes a contribution to the energy-momentum tensor during
inflation then the perturbations to this field and the metric in Newtonian
gauge are governed by a third-order set of differential equations with a single
constraint, so only two independent solutions contribute to cosmological per-
turbations, and since we have found two explicit adiabatic solutions outside
the horizon, these are the only solutions for the coupled system of inflaton
and gravitational fields from horizon exit to the end of inflation. During
reheating and the subsequent evolution of the universe other fields and flu-
ids become important, but the adiabatic mode is a true solution outside the
horizon whatever the contents of the universe, so with the universe in the
adiabatic mode at the beginning of reheating it remains in this mode as long
as the wavelength remains outside the horizon.

So how did non-adiabatic modes arise in reference [3]? The theorem of
reference [4] assumes that the field equations hold at every moment, which
requires that the perturbations are differentiable functions of time. As
Armendáriz-Picón himself suggested might be the case, the reason that he found
a non-adiabatic mode during reheating in reference [3] was that, although it was
assumed that initially during inflation only the inflaton field was present and
there was no energy transfer to other fields, the energy transfer rate was
supposed to rise discontinuously to a non-zero value at the
beginning of reheating. (The model considered in reference [3] actually gave
a constant value for ζ, but the perturbations had unequal values of δρ/ρ̄˙ for
the inflaton and its decay products, so as recognized by Armendáriz-Picón,
this perturbation was not adiabatic.) Of course, a discontinuous change in
the energy transfer rate is unphysical.
There is one weak point in the above general argument that non-adiabatic
perturbations do not arise during reheating. It is the assumption that there
was nothing but the inflaton and gravitation before reheating. Of course,
in order for reheating to occur at all, there must be other fields or fluids
besides the inflaton, and these do not suddenly come into existence during
reheating. The reactions that produce matter during reheating can’t be com-
pletely absent beforehand, so the remaining question is whether the transfer
of energy from the inflaton to other fields during inflation before reheating
excites these other modes.
This question is answered by a further theorem, that if the matter en-
ergy density during inflation is small, then even if the perturbations to the
matter energy density are initially not at all adiabatic, the departures from
adiabaticity would decay exponentially fast as soon as the energy transfer
rate becomes appreciable.
Here is the proof. The co-moving rate per proper volume of energy trans-
fer from the inflaton field ϕ to “matter” fields (possibly including radiation)
is in general some four-scalar function X of all these fields and perhaps their
first and higher time derivatives:
\[ -u_\mu\, T_M^{\mu\nu}{}_{;\nu} = X(\varphi, \ldots) , \tag{8} \]

where $u^\mu$ is the four-vector velocity of the total energy-momentum tensor,
normalized so that $u_\mu u^\mu = -1$. To zeroth order in all perturbations to the
metric and other fields, this reads
\[ \dot{\bar\rho}_M + 3H(\bar\rho_M + \bar p_M) = \bar X , \tag{9} \]
where bars denote unperturbed values, taken to depend only on time. (We
choose signs so that $\bar u^0 = +1$.) To first order in perturbations, Eq. (8) gives
in Newtonian gauge
\[ \delta\dot\rho_M + 3H(\delta\rho_M + \delta p_M) - 3(\bar\rho_M + \bar p_M)\dot\Psi = \delta X + \Phi \bar X , \tag{10} \]
the last term on the right arising from the perturbation to $u^0$. We have
dropped terms involving spatial gradients, which become negligible outside
the horizon.
Now suppose that at some early time during inflation the density and
pressure of matter are small. (This is plausible, because the energy density
of fermions and gauge fields produced by quantum fluctuations would be
quadratic in the fluctuations.) Then the inflaton will provide the chief source
of the gravitational field. As mentioned above, under these conditions the
perturbations to the inflaton and gravitational potentials will be described
by the adiabatic solution
\[ \delta\varphi = -\dot{\bar\varphi}\, I , \qquad \Phi = \Psi = -\zeta + H I , \tag{11} \]
in which for convenience we have introduced the notation
\[ I(t) \equiv \frac{\zeta}{a(t)} \int_{t_1}^{t} a(t')\,dt' . \tag{12} \]
Also, under these conditions the energy transfer rate will depend only on the
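inflaton and gravitational fields and their time derivatives, so
\[ \delta X = -\dot{\bar X}\, I . \tag{13} \]
(For instance, if X depends only on ϕ, then
\[ \delta X = \frac{\partial \bar X}{\partial \bar\varphi}\,\delta\varphi = -\frac{\partial \bar X}{\partial\bar\varphi}\,\dot{\bar\varphi}\, I . \]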
Eq. (13) also holds if X depends also on ϕ̇, ϕ̈, etc., provided that for each
pair of time derivatives there is a factor of $g^{00}$ to keep X a scalar.) Putting
Eq. (13) together with Eq. (11), the right-hand side of Eq. (10) is
\[ \delta X + \Phi \bar X = -\frac{\partial}{\partial t}\left[ \bar X I \right] , \tag{14} \]

and the difference between Eq. (10) and the time-derivative of Eq. (9) gives

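\[ \frac{\partial}{\partial t}\left[ \delta\rho_M + \dot{\bar\rho}_M I \right] = -3H(\delta\rho_M + \delta p_M) - 3(\bar\rho_M + \bar p_M)\ddot I - 3\frac{\partial}{\partial t}\left[ (\bar\rho_M + \bar p_M) H I \right] . \tag{15} \]
This can be simplified by noting that I satisfies the differential equation
\[ \ddot I + \frac{\partial}{\partial t}(H I) = 0 , \tag{16} \]
so
\[ \frac{\partial}{\partial t}\left[ \delta\rho_M + \dot{\bar\rho}_M I \right] = -3H(\delta\rho_M + \delta p_M + \dot{\bar\rho}_M I + \dot{\bar p}_M I) . \tag{17} \]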
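The steps behind Eqs. (14), (16) and (17) can be sketched as follows. This is a worked check rather than part of the original argument, and it uses only the definition (12) of I together with Eqs. (11) and (13).

```latex
\begin{align*}
\dot I &= \frac{d}{dt}\left[\frac{\zeta}{a(t)}\int_{t_1}^{t} a(t')\,dt'\right]
        = -H I + \zeta
  && \Rightarrow\quad \Phi = \Psi = -\zeta + H I = -\dot I , \\
\ddot I &= -\frac{\partial}{\partial t}(H I)
  && \text{since } \zeta \text{ is constant; this is Eq.~(16)} , \\
\delta X + \Phi \bar X &= -\dot{\bar X}\, I - \dot I\,\bar X
        = -\frac{\partial}{\partial t}\left[\bar X I\right]
  && \text{using Eqs.~(11) and (13); this is Eq.~(14)} .
\end{align*}
% Passing from Eq. (15) to Eq. (17), with \ddot I = -\partial_t (H I):
\begin{align*}
-3(\bar\rho_M+\bar p_M)\,\ddot I - 3\frac{\partial}{\partial t}\left[(\bar\rho_M+\bar p_M) H I\right]
  &= -3H\left(\dot{\bar\rho}_M + \dot{\bar p}_M\right) I .
\end{align*}
```

The last line shows that the two terms in Eq. (15) involving derivatives of I combine into a single term proportional to I, which is what allows the right-hand side of Eq. (17) to be written in terms of the same combination that appears on the left.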
To continue, let us assume that the matter pressure is a function pM (ρM )
only of the matter energy density ρM , as in the case of pure radiation or
pure dust. (We are not assuming this for the combined system of matter and
inflaton.) This is plausible, because the decay of a single real inflaton field
would not generally produce any chemical potentials. Then $\dot{\bar p}_M = p_M'(\bar\rho_M)\,\dot{\bar\rho}_M$
and $\delta p_M = p_M'(\bar\rho_M)\,\delta\rho_M$. Eq. (17) now reads


\[ \frac{\partial}{\partial t}\left[ \delta\rho_M + \dot{\bar\rho}_M I \right] = -3H(1 + c_M^2)(\delta\rho_M + \dot{\bar\rho}_M I) , \tag{18} \]
where $c_M^2 \equiv dp_M/d\rho_M$ is the squared sound speed. This shows that
$|\delta\rho_M + \dot{\bar\rho}_M I|$ decreases monotonically and faster than $a^{-3}$, but that is not good
enough, because we have to make sure that this is not just because $|\delta\rho_M|$
and $|\dot{\bar\rho}_M I|$ are both decreasing. For this purpose, we use Eq. (9) again, and
re-write Eq. (18) as
\[ \frac{\partial}{\partial t} \ln N = -\bar X\, \frac{1 + c_M^2}{\bar\rho_M + \bar p_M} , \tag{19} \]
where N is a dimensionless measure of the departure of the matter energy
density perturbation from its adiabatic value

\[ N \equiv \left| \frac{\delta\rho_M + \dot{\bar\rho}_M I}{\bar\rho_M + \bar p_M} \right| . \tag{20} \]
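The passage from Eq. (18) to Eq. (19) can be made explicit. The following is a sketch of that step under the assumption already stated in the text, that the matter pressure depends only on the matter energy density.

```latex
\begin{align*}
\frac{\partial}{\partial t}\ln N
  &= \frac{\partial}{\partial t}\ln\left|\delta\rho_M + \dot{\bar\rho}_M I\right|
     - \frac{\dot{\bar\rho}_M + \dot{\bar p}_M}{\bar\rho_M + \bar p_M} \\
  &= -3H(1+c_M^2) - (1+c_M^2)\,\frac{\dot{\bar\rho}_M}{\bar\rho_M+\bar p_M}
     && \text{Eq.~(18) and } \dot{\bar p}_M = c_M^2\,\dot{\bar\rho}_M \\
  &= -(1+c_M^2)\,\frac{\dot{\bar\rho}_M + 3H(\bar\rho_M+\bar p_M)}{\bar\rho_M+\bar p_M}
   = -\bar X\,\frac{1+c_M^2}{\bar\rho_M+\bar p_M}
     && \text{Eq.~(9)} .
\end{align*}
```

The expansion rate H drops out: in the absence of energy transfer the numerator and denominator of N redshift in exactly the same way, so any change in N is driven entirely by the energy transfer rate.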

Since energy is flowing from the inflaton field to matter, X is positive, and
Eq. (19) shows that N decreases monotonically. Early in inflation the matter
perturbation may be nowhere near adiabatic, with N of the same order of
magnitude as the fractional density perturbation. As time passes during
inflation, the transfer of energy from the inflaton to matter may make both
δρM and ρ̄˙ M large, but the quantity N continues to decrease. Eventually,
after the energy transfer rate X becomes large for a sufficiently long time
during reheating, the matter density perturbation can no longer be ignored
in calculating X and the gravitational field perturbations, but by that time N
will have decayed exponentially. With the density and pressure perturbations
of the matter as well as the inflaton satisfying Eq. (6), we will still have
$\delta X = -\dot{\bar X}\, I$ and Φ = Ψ = −İ, and the above analysis will remain valid.
Note that asymptotically, after the transfer of energy from the inflaton to
matter ceases, the quantity $-\dot{\bar\rho}_M I/(\bar\rho_M + \bar p_M)$ approaches $3\zeta/(1 - \dot H/H^2)$, so
as long as $\bar p_M < \bar\rho_M$, $-\dot{\bar\rho}_M I/(\bar\rho_M + \bar p_M)$ is bounded below in absolute value by
3|ζ|/4. Thus the reason that N approaches zero is not that $\delta\rho_M/(\bar\rho_M + \bar p_M)$
and $-\dot{\bar\rho}_M I/(\bar\rho_M + \bar p_M)$ are both approaching zero. Rather, the ratio of these
quantities, $-\delta\rho_M/\dot{\bar\rho}_M I$, approaches unity.
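The asymptotic value quoted above can be illustrated with a simple example. The sketch below assumes, purely for illustration, a power-law expansion with constant exponent q (a symbol introduced here, not used in the original) at times long after t_1; it is not a general proof.

```latex
% Assumption (for illustration only): a(t) \propto t^{q} with constant q > 0, and t \gg t_1.
\begin{align*}
I &= \frac{\zeta}{a(t)}\int_{t_1}^{t} a(t')\,dt' \;\to\; \frac{\zeta\,t}{q+1} ,
  \qquad H = \frac{q}{t} , \qquad -\frac{\dot H}{H^2} = \frac{1}{q} , \\
-\frac{\dot{\bar\rho}_M I}{\bar\rho_M + \bar p_M} &= 3 H I
  \;\to\; \frac{3\zeta\,q}{q+1} = \frac{3\zeta}{1 - \dot H/H^2}
  && \text{using Eq.~(9) with } \bar X = 0 .
\end{align*}
% Examples: radiation domination (q = 1/2) gives \zeta; matter domination (q = 2/3) gives 6\zeta/5.
% The bound 3|\zeta|/4 corresponds to q = 1/3, the expansion law for \bar p = \bar\rho.
```

Since q/(q+1) increases with q, any expansion law with pressure below the energy density (q > 1/3) gives a magnitude above 3|ζ|/4, in line with the bound stated in the text.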
With the matter pressure a function only of the matter density, the matter
pressure perturbation will then also satisfy Eq. (6). Thus the perturbations
become adiabatic, in the sense of Eqs. (6) and (7). Also, with the energy
density and pressure perturbations of matter as well as of the inflaton all
satisfying Eq. (6), the total energy density and pressure perturbations will
obviously satisfy Eq. (5). It follows that the perturbations also satisfy the
adiabatic condition that ζ is constant, because in general

\[ \dot\zeta = \frac{\dot{\bar\rho}\,\delta p - \dot{\bar p}\,\delta\rho}{3(\bar\rho + \bar p)^2} . \tag{21} \]
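To see explicitly why Eq. (21) gives a constant ζ here, one can insert the adiabatic forms of the total perturbations, written with the abbreviation I of Eq. (12). This is a one-line check rather than a new result.

```latex
% Eq. (5) in terms of I: \delta\rho = -\dot{\bar\rho}\, I , \quad \delta p = -\dot{\bar p}\, I
\begin{align*}
\dot{\bar\rho}\,\delta p - \dot{\bar p}\,\delta\rho
  &= \dot{\bar\rho}\left(-\dot{\bar p}\,I\right) - \dot{\bar p}\left(-\dot{\bar\rho}\,I\right) = 0
  && \Rightarrow\quad \dot\zeta = 0 .
\end{align*}
```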

The same analysis also obviously applies if the inflaton energy goes into
several species of matter, with a pressure for each species given by a function
of the energy density in that species.
In conclusion, even if the decay of the inflaton during inflation produces a
small matter density whose perturbations are not at all adiabatic, the depar-
ture from adiabaticity will decay rather than grow as inflation proceeds, and
the departures of the perturbations from their adiabatic values will become
exponentially small when the matter density becomes large during reheating.
I was greatly helped in my thinking about this question by a correspon-
dence some months ago with Alan Guth, and more recently by discussions
with Christian Armendáriz-Picón and Eiichiro Komatsu. This research was
supported by the National Science Foundation under Grant No. 0071512 and
by the Robert A. Welch Foundation and also the US Navy Office of Naval
Research, Grant No. N00014-03-1-0639, Quantum Optics Initiative.

REFERENCES

1. A. Guth and S.-Y. Pi, Phys. Rev. Lett. 49, 1110 (1982); S. Hawking,
Phys. Lett. 115B, 295 (1982); A. Starobinsky, Phys. Lett. 117B, 175
(1982); J. Bardeen, P. Steinhardt, and M. Turner, Phys. Rev. D28,
679 (1983). A model in which cosmological perturbations arise during
inflation from quantum fluctuations in a scalar field derived from the
spacetime curvature had been considered earlier by V. Mukhanov and
G. Chibisov, JETP Lett. 33, 532 (1981).

2. H. V. Peiris et al., Astrophys. J. Suppl. 148, 213 (2003); P. Crotty,
J. Garcia-Bellido, J. Lesgourgues, and A. Riazuelo, Phys. Rev. Lett.
91, 171301 (2003).

3. C. Armendáriz-Picón, astro-ph/0312389.

4. S. Weinberg, Phys. Rev. D67, 123504 (2003). A somewhat improved
discussion is given in the Appendix to S. Weinberg, Phys. Rev. D 69,
023503 (2004). Also see S. Bashinsky and U. Seljak, astro-ph/0310198,
especially Appendix B.

5. J. M. Bardeen, P. J. Steinhardt, and M. S. Turner, Phys. Rev. D28,
679 (1983). This quantity was re-introduced by D. Wands, K. A. Malik,
D. H. Lyth, and A. R. Liddle, Phys. Rev. D62, 043527 (2000). Also
see D. H. Lyth, Phys. Rev. D 31, 1792 (1985); K. A. Malik, D. Wands,
and C. Ungarelli, Phys. Rev. D 67, 063516 (2003).

6. J. M. Bardeen, Phys. Rev. D22, 1882 (1980); D. H. Lyth, Phys.
Rev. D31, 1792 (1985). For reviews, see J. Bardeen, in Cosmology and
Particle Physics, eds. Li-zhi Fang and A. Zee (Gordon & Breach, New
York, 1988); A. R. Liddle and D. H. Lyth, Cosmological Inflation and
Large Scale Structure (Cambridge University Press, Cambridge, UK,
2000).

UTTG-04-04
arXiv:astro-ph/0405397v2 20 Jul 2004

Must Cosmological Perturbations Remain Non-Adiabatic After Multi-Field Inflation?

Steven Weinberg1
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

Even if non-adiabatic perturbations are generated in multi-field inflation,
the perturbations will become adiabatic if the universe after inflation enters
an era of local thermal equilibrium, with no non-zero conserved quantities,
and will remain adiabatic as long as the wavelength is outside the hori-
zon, even when local thermal equilibrium no longer applies. Small initial
non-adiabatic perturbations associated with imperfect local thermal equilib-
rium remain small when baryons are created from out-of-equilibrium decay
of massive particles, or when dark matter particles go out of local thermal
equilibrium.

1 Electronic address: weinberg@physics.utexas.edu
I. INTRODUCTION

A recent paper [1] has shown, contrary to what is sometimes thought [2],
that if the cosmological perturbations produced during inflation are in the
adiabatic mode after they leave the horizon, then they stay in that mode
through reheating and until horizon re-entry. This paper considers a comple-
mentary question: if the cosmological perturbations produced during multi-
field inflation are not in the adiabatic mode after they leave the horizon, then
must they remain non-adiabatic thereafter?
As in [1], our approach to this problem is based on the theorem that
in all cosmological models, among the various solutions of the differential
equations that govern perturbations in Newtonian gauge, there are always
two independent solutions that are adiabatic throughout the whole period
when the perturbations are outside the horizon.[3] By “adiabatic” is usually
meant that the perturbation δS in any four-scalar S (including temperature,
proper number densities, scalar fields, etc.) is proportional to $\dot{\bar S}$, where bars
denote unperturbed quantities, and dots denote ordinary time derivatives.
In particular, for the energy densities and pressures of all the individual
constituents (labeled α) of the universe, we have[4]
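\[ \frac{\delta\rho_\alpha}{\dot{\bar\rho}_\alpha} = \frac{\delta p_\alpha}{\dot{\bar p}_\alpha} \quad \text{is independent of } \alpha , \tag{1} \]

from which comes the term “adiabatic.” The theorem of ref. [3] provides the
further information that for all four-scalars S, the adiabatic solution has
\[ \frac{\delta S}{\dot{\bar S}} = -I , \tag{2} \]
and in particular
\[ \frac{\delta\rho_\alpha}{\dot{\bar\rho}_\alpha} = \frac{\delta p_\alpha}{\dot{\bar p}_\alpha} = -I , \tag{3} \]
where
\[ I(t) \equiv \frac{\zeta}{a(t)} \int_{T}^{t} a(t')\,dt' . \tag{4} \]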

Here a(t) is the Robertson–Walker scale factor, and ζ and T are constants,
the same for all four-scalars. (The freedom to choose T as well as ζ means
that these are two independent solutions.) Also, according to this theorem,
the three-scalar gravitational potentials (defined by the Newtonian gauge line
element $ds^2 = -(1 + 2\Phi)\,dt^2 + a^2 (1 - 2\Psi)\,d\mathbf{x}^2$) are [5]

\[ \Phi = \Psi = -\zeta + H I , \tag{5} \]

(where H(t) ≡ ȧ(t)/a(t)), and in this mode there is no anisotropic inertia.
(Eq. (5) can be written as the formula by which ζ is usually defined [6],
$\zeta = -\Psi + \delta\rho/[3(\bar\rho + \bar p)]$.) Observation of the cosmic microwave background
can tell us whether the cosmological perturbations are actually in this mode
just before they re-enter the horizon, so it is of great interest to ask, in
what cosmological models we would expect the perturbations to be purely
adiabatic outside the horizon.
Before addressing this question, let’s recall how current observations are
used to check the adiabatic nature of the perturbations. It appears that during
the whole period when the cosmic temperature is in the range $10^9$ °K ≫
T ≫ 3000 °K, the constituents of the universe consist of a non-relativistic
nucleon-electron plasma, plus photons, plus highly relativistic free neutri-
nos, plus non-relativistic cold dark matter. To a good approximation, energy
is separately conserved in each of these fluids, so each satisfies the energy
conservation conditions

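\[ \dot{\bar\rho}_\alpha + 3H(\bar\rho_\alpha + \bar p_\alpha) = 0 \tag{6} \]

and, outside the horizon,

\[ \delta\dot\rho_\alpha + 3H(\delta\rho_\alpha + \delta p_\alpha) = 3(\bar\rho_\alpha + \bar p_\alpha)\dot\Psi , \tag{7} \]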

where α runs over the four fluids — cold dark matter, neutrinos, photons,
and the electron-nucleon plasma. For each of these fluids the pressure pα
is either negligible or a function only of the energy density ρα , so it follows
from Eqs. (6) and (7) that[7]
\[ \frac{\delta\rho_\alpha}{\dot{\bar\rho}_\alpha} = -\frac{\Psi + \zeta_\alpha}{H} , \tag{8} \]
where the ζα are various constants. (There are actually three separate con-
stants for the three neutrino flavors.) According to Eq. (1), an adiabatic
mode would have all ζα equal. In fact, according to Eqs. (3) and (5), they
would all be equal to ζ. The constants ζα affect the further evolution of the
constituents of the universe after horizon re-entry and up to the time of last
scattering, and so can be measured by observations of anisotropies in the cos-
mic microwave background. So far, there is no evidence for unequal ζα , and
some indications that they are equal for cold dark matter and photons.[8]
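The step from Eqs. (6) and (7) to Eq. (8) can be sketched as follows, assuming (as stated in the text) that each pressure is either negligible or a function only of the corresponding energy density. The symbols q_α and c_α² are shorthand introduced here and do not appear in the original.

```latex
% Notation introduced here: q_\alpha \equiv \delta\rho_\alpha/\dot{\bar\rho}_\alpha , \quad c_\alpha^2 \equiv dp_\alpha/d\rho_\alpha .
\begin{align*}
\delta\rho_\alpha &= q_\alpha\,\dot{\bar\rho}_\alpha , \qquad
\delta p_\alpha = c_\alpha^2\, q_\alpha\,\dot{\bar\rho}_\alpha , \\
\text{Eq.~(7):}\quad
\dot q_\alpha\,\dot{\bar\rho}_\alpha
  + q_\alpha\left[\ddot{\bar\rho}_\alpha + 3H(1+c_\alpha^2)\,\dot{\bar\rho}_\alpha\right]
  &= 3(\bar\rho_\alpha+\bar p_\alpha)\,\dot\Psi , \\
\text{time derivative of Eq.~(6):}\quad
\ddot{\bar\rho}_\alpha + 3H(1+c_\alpha^2)\,\dot{\bar\rho}_\alpha
  &= -3\dot H\,(\bar\rho_\alpha+\bar p_\alpha) , \\
\text{so, with } \dot{\bar\rho}_\alpha = -3H(\bar\rho_\alpha+\bar p_\alpha):\quad
-3(\bar\rho_\alpha+\bar p_\alpha)\left[H\dot q_\alpha + \dot H q_\alpha + \dot\Psi\right] &= 0
\quad\Rightarrow\quad
\frac{d}{dt}\left(H q_\alpha + \Psi\right) = 0 .
\end{align*}
% Calling the constant of integration -\zeta_\alpha gives q_\alpha = -(\Psi + \zeta_\alpha)/H , which is Eq. (8).
```

Each separately conserved fluid therefore carries its own constant of integration, and the adiabatic mode is the special case in which all of these constants coincide.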
The most commonly considered case in which we would expect purely adiabatic
perturbations is inflation with a single scalar “inflaton” field. It is well
known that in this case the cosmic perturbations are purely adiabatic during
inflation after horizon exit (whether or not the “slow roll” conditions are
satisfied), and reference [1] shows that they then remain adiabatic through
the period of reheating and as long as the perturbations remain outside the
horizon. In this case, all ζα would be equal. With several scalar fields the
perturbations are not in general purely adiabatic during inflation, and it
seems to be widely expected that this would show up in unequal values for
the constants ζα .
But there is a plausible scenario in which non-adiabatic fluctuations pro-
duced in multi-field inflation become adiabatic after reheating, and then re-
main adiabatic until horizon re-entry. Suppose that reheating leaves the
universe in a state of local thermal (including chemical) equilibrium, with no
non-zero conserved quantities. This is what is generally assumed in conven-
tional theories of cosmological baryogenesis. In this case the perturbations
are described by just three degrees of freedom: the temperature fluctuation
δT , the scalar potential Φ = Ψ, and the velocity potential fluctuation δu.
As described in [3], these variables are governed by three first-order differ-
ential equations, so they must have three independent solutions, but there
is also a constraint relating δT , Ψ, and δu, leaving just two independent
solutions. Since there are always two independent adiabatic solutions, the
solutions here are all adiabatic. (This argument would not apply if one or
more scalars remained decoupled from the matter in thermal equilibrium [9],
as in curvaton models [10], theories of scalar field baryogenesis [11], and in
some theories of cosmological moduli [12] or axions [13].)
Later the universe develops a non-zero baryon number; neutrinos and cold
dark matter decouple from other particles; and local thermal equilibrium is
lost. The arguments of [3] then no longer apply, which is why the field
equations in the era from e+ e− annihilation to last scattering have solutions
(8) with different ζα . Nevertheless, as long as the wavelength remains outside
the horizon the adiabatic solution (2)–(5) is always a solution, and since
the cosmological perturbations were described by this solution during the
assumed early era of local thermal equilibrium, they remain purely adiabatic.
Just as in [1], there is a weak point in this argument. Local thermal equi-
librium is never perfect, so there are always other degrees of freedom besides
δT , Ψ, and δu, and hence other solutions of the field equations besides the
adiabatic solutions. We shall show in Section II that any non-adiabatic con-
tributions to cosmological fluctuations will decay as the universe approaches
a state of local thermal equilibrium with zero chemical potentials. But even
if these non-adiabatic solutions make only a tiny contribution to cosmological
perturbations as this state is approached, how do we know that they do not
grow rapidly again as local thermal equilibrium is subsequently lost, as for
instance by the decoupling of dark matter particles, or as non-zero chemical
potentials appear in cosmological baryosynthesis?
It seems that this question must be addressed on a case by case basis.
Here we will consider it in two familiar contexts: the survival of cold dark
matter particles as they go out of local thermal equilibrium, treated in Section
III, and the appearance of a non-zero chemical potential through the
out-of-equilibrium decay of a massive exotic particle, considered in Section IV.

II. APPROACH TO ADIABATICITY

Suppose that after inflation the dominant constituent of the universe is
a heat bath in local thermal equilibrium with no chemical potentials, but
that, as a remnant of multi-field inflation, there is also present a species of
particles with number density n that is not even approximately described
by the adiabatic condition (2). The rate of change of this number density
in co-moving inertial frames will be some function Y (n, T ) of n and of the
temperature T of the heat bath²

\[ (n\,u^\mu)_{;\mu} = Y . \tag{9} \]
2 Here $u^\mu$ is the velocity four-vector, normalized so that $g_{\mu\nu} u^\mu u^\nu = -1$, with
unperturbed components $\bar u^0 = +1$ and $\bar u^i = 0$. Because outside the horizon spatial gradients
can be neglected, it is unnecessary to state whether $u^\mu$ is the velocity four-vector of the
particles with number density n, or of the heat bath energy-momentum tensor, or of the
total energy-momentum tensor: all we need to know about $\delta u^\mu$ is that $\delta u^0 = \delta u_0 = -\Phi$,
which follows from the normalization condition on $u^\mu$.

To zeroth order in perturbations this is

\[ \dot{\bar n} + 3H\bar n = \bar Y \tag{10} \]

while to first order

\[ \delta\dot n + 3H\,\delta n - 3\bar n\,\dot\Psi = \delta Y + \Phi\,\bar Y . \tag{11} \]

For simplicity, we will assume that there were so many particle species in
equilibrium that n can be ignored in the evolution of Ψ and T . According to
the argument quoted in Section I, it follows that δT and Ψ are given by the
adiabatic conditions:
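\[ \frac{\delta T}{\dot{\bar T}} = -I , \qquad \Phi = \Psi = H I - \zeta . \tag{12} \]
It follows that the perturbation to the rate of change of n takes the form
\[ \delta Y + \dot{\bar Y} I = \frac{\partial \bar Y}{\partial \bar n}\left( \delta n + \dot{\bar n} I \right) , \tag{13} \]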
the terms involving ∂ Ȳ /∂ T̄ canceling. Adding the time derivative of I times
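Eq. (10) to Eq. (11) and using Eqs. (12) and (13) gives
\[ \frac{d}{dt}\left( \delta n + I \dot{\bar n} \right) = \left( -3H + \frac{\partial\bar Y}{\partial\bar n} \right)\left( \delta n + I\dot{\bar n} \right) . \tag{14} \]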

This must be compared with the rate of decrease of n̄ itself, given by Eq. (10).
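From Eqs. (10) and (14) we have
\[ \frac{d}{dt} \ln\left| \frac{\delta n + \dot{\bar n} I}{\bar n} \right| = \frac{\partial\bar Y}{\partial\bar n} - \frac{\bar Y}{\bar n} . \tag{15} \]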
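The two steps that are stated without detail above, the cancellation of the temperature-derivative terms in Eq. (13) and the passage from Eqs. (10) and (14) to Eq. (15), can be sketched as follows. This is a worked check using only Eqs. (10), (12) and (14).

```latex
\begin{align*}
\delta Y + \dot{\bar Y} I
  &= \frac{\partial\bar Y}{\partial\bar n}\,\delta n + \frac{\partial\bar Y}{\partial\bar T}\,\delta T
   + \left(\frac{\partial\bar Y}{\partial\bar n}\,\dot{\bar n} + \frac{\partial\bar Y}{\partial\bar T}\,\dot{\bar T}\right) I
   = \frac{\partial\bar Y}{\partial\bar n}\left(\delta n + \dot{\bar n} I\right)
  && \text{since } \delta T = -\dot{\bar T} I , \\
\frac{d}{dt}\ln\left|\frac{\delta n + \dot{\bar n} I}{\bar n}\right|
  &= \left(-3H + \frac{\partial\bar Y}{\partial\bar n}\right) - \left(-3H + \frac{\bar Y}{\bar n}\right)
   = \frac{\partial\bar Y}{\partial\bar n} - \frac{\bar Y}{\bar n}
  && \text{Eqs.~(14) and (10)} .
\end{align*}
```

As in the single-field case, the dilution term 3H cancels between numerator and denominator, so the evolution of the ratio is controlled by the interaction rate alone.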

The right-hand side may have any sign, but we know that when n̄ is near the
value nEQ (T ) that it would have in local thermal equilibrium at a temperature
T , the rate of change of n̄ due to all causes apart from the expansion of the
universe (decay, annihilation, etc.) must take the form
\[ Y \to -\Lambda(T)\left( n - n_{EQ}(T) \right) , \tag{16} \]
where Λ is some rate that must be positive in order that n̄ in Eq. (10) should
be attracted toward nEQ when the expansion of the universe can be neglected.
For Λ(T̄ ) ≫ H the expansion of the universe can be neglected, and so as the
density n̄ approaches the value it would have in thermal equilibrium, the
right-hand side of Eq. (15) approaches the negative value −Λ(T̄), and the
ratio $(\delta n + \dot{\bar n}\, I)/\bar n$, which can be taken as a dimensionless measure of the
departure from adiabaticity, decays exponentially with rate Λ(T̄ ). We will
see in Sections III and IV that this ratio is the quantity that sets the scale
for possible inequalities in the observable quantities ζα .
We have seen that the non-adiabaticities associated with multi-field in-
flation become small if inflation is followed by a period of local thermal equi-
librium with no non-zero chemical potentials. Now we have to see if they can
be revived when the universe subsequently goes out of thermal equilibrium.
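
The approach of the right-hand side of Eq. (15) to −Λ can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the near-equilibrium form (16) for Y at a fixed temperature, and the function name `log_ratio_rate` and the numerical values of Λ and n_EQ are hypothetical choices, not taken from the paper.

```python
def log_ratio_rate(n, n_eq, lam):
    """Right-hand side of Eq. (15), dY/dn - Y/n, assuming the
    near-equilibrium form Y = -lam * (n - n_eq) of Eq. (16)."""
    y = -lam * (n - n_eq)      # Eq. (16)
    dy_dn = -lam               # partial derivative of Y with respect to n
    return dy_dn - y / n

# Illustrative (hypothetical) values: rate lam = 5, equilibrium density n_eq = 1.
lam, n_eq = 5.0, 1.0
for n in (2.0, 1.1, 1.001):
    # The rate tends to -lam as n approaches n_eq.
    print(n, log_ratio_rate(n, n_eq, lam))
```

For this form of Y the rate equals −Λ n_EQ/n, so it is negative for any positive density and reaches −Λ at equilibrium.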

III. DARK MATTER

Next we consider the possible non-adiabatic contributions to cosmological
perturbations that might be caused by the departure from local thermal
equilibrium that occurs when the annihilation rate of heavy cold dark matter
particles can no longer keep up with the rate at which these particles would
disappear in thermal equilibrium.[14]
We assume here that all matter and radiation except the cold dark matter
particles are in local thermal equilibrium with each other, and that this
thermal bath dominates the energy density of the universe. Then the density
n of the cold dark matter particles will again be governed by Eq. (15). As
we have seen in Section II, if there is a sufficiently long period when the
rate Y characterizing interactions of the cold dark matter particles with the
thermal bath is much larger than the expansion rate, then by some time
$t_1$ the ratio $|\delta n + \dot{\bar n}\, I|/\bar n$ will have become exponentially small. Eventually,
as the temperature falls below the dark matter particle mass, the fractional
rate of decrease of nEQ will become greater than Y , and n will consequently
become greater than nEQ , eventually leaving over a remnant of dark matter
that has survived to the present. The question is whether the small non-
adiabatic perturbation at time t1 is amplified while the dark matter goes out
of equilibrium.

Even when the cold dark matter density n begins to differ appreciably
from its equilibrium value nEQ , Eq. (15) will still apply, but Y will no longer
be given by Eq. (16). If we assume that dark matter particles are annihilated
only in binary collisions, and created only in pairs, then Y will be of the form

$$ Y(n, T) = -R(T)\left[n^2 - n_{\rm EQ}^2(T)\right] , \tag{17} $$

where R is a positive rate constant, equal to the average over the dark matter
velocity distribution of the product of the dark matter particle velocity and
the annihilation cross section. Then Eq. (15) becomes

$$ \frac{d}{dt}\ln\left(\frac{\delta n + \dot{\bar n}\, I}{\bar n}\right) = -R(\bar T)\left(\frac{\bar n^2 + n_{\rm EQ}^2(\bar T)}{\bar n}\right) . \tag{18} $$

At late times n̄ ≫ nEQ (T̄ ), so the right-hand side approaches −R(T̄ )n̄. The
rate constant R(T̄ ) approaches a constant at low temperature, and n̄ asymp-
totically goes as a−3 , so the time-integral of the right-hand side of Eq. (18)
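converges for large t. Therefore at late times

$$ \left[\frac{\delta n + \dot{\bar n}\, I}{\bar n}\right] \to \left[\frac{\delta n + \dot{\bar n}\, I}{\bar n}\right]_1 \exp\left[-\int_{t_1}^{\infty} R(\bar T)\left(\frac{\bar n^2 + n_{\rm EQ}^2(\bar T)}{\bar n}\right) dt\right] . \tag{19} $$
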
We see that the departure from adiabaticity at late times is less than at
the end of the period when the cold dark matter particles were in thermal
equilibrium with everything else, though only by a finite factor.
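
The convergence of the late-time integral in Eq. (19) can be checked in a toy case. The sketch below is an illustration under assumptions that are not in the paper: radiation domination with a ∝ t^{1/2}, a constant R, n_EQ neglected, and n̄ = n₁ (t₁/t)^{3/2}. The function name and parameter values are hypothetical. Under these assumptions the exponent tends to the finite value 2 R n₁ t₁.

```python
import math

def late_time_exponent(R, n1, t1, t_max, steps=20000):
    """Trapezoidal estimate of the integral of R * n(t) from t1 to t_max,
    with n(t) = n1 * (t1 / t)**1.5 (assumed: a ~ t**0.5, n ~ a**-3, n_EQ = 0).
    The integration variable is ln t, so the integrand carries an extra factor t."""
    log_lo, log_hi = math.log(t1), math.log(t_max)
    h = (log_hi - log_lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = math.exp(log_lo + k * h)
        integrand = R * n1 * (t1 / t) ** 1.5 * t
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * integrand * h
    return total

# Hypothetical illustrative values.
R, n1, t1 = 0.3, 1.0, 1.0
numeric = late_time_exponent(R, n1, t1, t_max=1e8)
analytic_limit = 2.0 * R * n1 * t1
print(numeric, analytic_limit)
print("suppression factor:", math.exp(-numeric))
```

The suppression factor exp(−2 R n₁ t₁) lies strictly between 0 and 1, which is the sense in which the departure from adiabaticity is reduced only by a finite factor.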
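At late times $\dot{\bar n} \to -3H\bar n$ and $HI = \zeta + \Psi$, so Eq. (19) may be written
as in Eq. (8):

$$ \frac{\delta\rho_D}{\dot{\bar\rho}_D} = \frac{\delta n}{\dot{\bar n}} \to -\frac{\zeta_D + \Psi}{H} , \tag{20} $$

with $\zeta_D$ now given by

$$ \zeta_D = \zeta + \frac{1}{3}\left[\frac{\delta n + \dot{\bar n}\, I}{\bar n}\right]_1 \exp\left[-\int_{t_1}^{\infty} R(\bar T)\left(\frac{\bar n^2 + n_{\rm EQ}^2(\bar T)}{\bar n}\right) dt\right] . \tag{21} $$

As we have seen, the quantity $[(\delta n + \dot{\bar n}\, I)/\bar n]_1$ is exponentially small if there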
is a sufficiently long period when the rate Y characterizing interactions of
the cold dark matter particles with the thermal bath is much larger than the
expansion rate, and Eq. (21) shows that then the departure of the observ-
able quantity ζD from the value ζ that it would have for a purely adiabatic
perturbation is even smaller.

IV. BARYOGENESIS

Here we will consider the simple original model of cosmological
baryogenesis[15], with the later modification[16], that the out-of-equilibrium decay
of some heavy exotic particle X produces a non-zero value of the quantity
B −L, which is subsequently processed by B −L-conserving non-perturbative
effects of the electroweak interactions[17] into some definite proportions of
B and L, depending only on the numbers of generations and scalar dou-
blets[18]. It is assumed that at some time t1 after inflation a nearly perfect
local thermal equilibrium has been reached, with no chemical potentials, but
that subsequently the actual rate of disappearance of the X particle (and its
antiparticle, if distinct) does not keep up with the rapid decrease that would
occur in thermal equilibrium as the temperature falls below the X mass. In
the original model [15] of direct baryosynthesis the X particles were assumed
to have distinct antiparticles, but with equal number densities nX . In the
case [16] of leptogenesis the X particles are usually assumed to be identical
with their antiparticles, and their density will again be denoted nX . We
will assume that there are so many particle species in thermal equilibrium
that the density of the X particles makes a negligible contribution to the
gravitational field and to the evolution of the temperature of the particles in
equilibrium, so that the fluctuations in the temperature and metric are given
by Eq. (12). Then the degree of non-adiabaticity of this number density will
again be given by an equation like Eq. (15), in which Y is now the rate of
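change of $n_X$ in co-moving inertial frames. As before, this gives

$$ \frac{\delta n_X + \dot{\bar n}_X I}{\bar n_X} = \left(\frac{\delta n_X + \dot{\bar n}_X I}{\bar n_X}\right)_1 \exp\left[\int_{t_1}^{t}\left(\frac{\partial \bar Y}{\partial \bar n_X} - \frac{\bar Y}{\bar n_X}\right) dt\right] , \tag{22} $$
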
a subscript 1 denoting the time t1 , when the decay of the X particles has not
yet begun to produce any appreciable net density of B − L. But once the
temperature falls below the X-particle mass and the disappearance rate of
the X-particles falls below the expansion rate, the density nX will be much
larger than its equilibrium value, and Eq. (16) will not apply.
Suppose that the disappearance of a single X particle (together with its
antiparticle, if distinct) produces on average a net value b of B − L. The rate
of change of the density nB−L of B − L in co-moving inertial frames is then
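
$$ (n_{B-L}\, u^\mu)_{;\mu} = -b\,(n_X u^\mu)_{;\mu} = -b\, Y(n_X, T) . \tag{23} $$

To zeroth order in perturbations, this gives

$$ \dot{\bar n}_{B-L} + 3H\bar n_{B-L} = -b\,\bar Y , \tag{24} $$

while to first order

$$ \delta\dot n_{B-L} + 3H\,\delta n_{B-L} - 3\bar n_{B-L}\dot\Psi = -b\left(\delta Y + \Phi\bar Y\right) . \tag{25} $$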

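The sum of Eq. (25) and the time derivative of I times Eq. (24) is

$$ \frac{d}{dt}\left(\delta n_{B-L} + \dot{\bar n}_{B-L} I\right) + 3H\left(\delta n_{B-L} + \dot{\bar n}_{B-L} I\right) = -b\,\frac{\partial \bar Y}{\partial \bar n_X}\left(\delta n_X + I\,\dot{\bar n}_X\right) . \tag{26} $$

The solution is

$$ \delta n_{B-L} + \dot{\bar n}_{B-L} I = b\left(\frac{a_1}{a}\right)^3\left[1 - \exp\left(\int_{t_1}^{t}\frac{\partial \bar Y}{\partial \bar n_X}\, dt\right)\right]\left[\delta n_X + \dot{\bar n}_X I\right]_1 , \tag{27} $$

a subscript 1 again denoting quantities evaluated at time $t_1$. Also, the
solution of Eq. (24) is

$$ \bar n_{B-L} = -b\left[\bar n_X - \left(\frac{a_1}{a}\right)^3 \bar n_{X1}\right] . \tag{28} $$

Thus the departure from adiabaticity in the B − L density is

$$ \frac{\delta n_{B-L} + I\,\dot{\bar n}_{B-L}}{\bar n_{B-L}} = \left(\frac{(\delta n_X + I\,\dot{\bar n}_X)_1}{\bar n_{X1} - (a/a_1)^3\,\bar n_X}\right)\left[1 - \exp\left(\int_{t_1}^{t}\frac{\partial \bar Y}{\partial \bar n_X}\, dt\right)\right] . \tag{29} $$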

Asymptotically only decay contributes to the disappearance rate Y, so
$\bar Y \to -\Gamma_X \bar n_X$ (where $\Gamma_X$ is the decay rate of the X-particle), and therefore the
exponential in Eq. (29) becomes negligible compared with one and $(a/a_1)^3\,\bar n_X$
becomes negligible compared with $\bar n_{X1}$. Also, after the temperature drops
below the electroweak scale ≈ 300 GeV, the values of the zeroth and first-
order terms in the baryon density $\bar n_B$ and $\delta n_B$ are proportional respectively
to $\bar n_{B-L}$ and $\delta n_{B-L}$ with the same constant factor, so

$$ \frac{\delta n_B + \dot{\bar n}_B I}{\bar n_B} \to \frac{\delta n_{B-L} + I\,\dot{\bar n}_{B-L}}{\bar n_{B-L}} \to \left(\frac{\delta n_X + I\,\dot{\bar n}_X}{\bar n_X}\right)_1 . \tag{30} $$

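Equivalently, since at late times $\bar n_B \propto a^{-3}$ and $HI = \zeta + \Psi$,

$$ \frac{\delta n_B}{\dot{\bar n}_B} \to -\frac{\Psi + \zeta_B}{H} , \tag{31} $$

where $\zeta_B$ is the constant

$$ \zeta_B = \zeta + \frac{1}{3}\left[\frac{\delta n_X + \dot{\bar n}_X I}{\bar n_X}\right]_1 . \tag{32} $$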

As we saw in Section II, if there is a sufficiently long period ending at time
$t_1$ during which the X particles are decaying and being recreated by the
heat bath of other particles at a rate much greater than H, then the second
term on the right-hand side of Eq. (32) will be exponentially small. Hence,
on these assumptions, the observed constant $\zeta_B$ will be very close to the
common value ζ of the other ζα .
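
As a consistency check on Eqs. (24) and (28), the zeroth-order B − L density can be integrated numerically in a toy case and compared with the closed form. The sketch below makes assumptions that are not in the paper: pure decay, Ȳ = −Γ_X n̄_X, at all times after t₁; radiation domination with a ∝ t^{1/2}, so H = 1/(2t); and n̄_{B−L}(t₁) = 0. All function names and numerical values are hypothetical.

```python
import math

def n_x(t, n_x1, t1, gamma):
    """Unperturbed X density for pure decay: n_X ~ a**-3 * exp(-gamma * t),
    with a ~ t**0.5 (assumed radiation domination)."""
    return n_x1 * (t1 / t) ** 1.5 * math.exp(-gamma * (t - t1))

def integrate_n_bl(b, n_x1, t1, gamma, t_end, steps=20000):
    """Fourth-order Runge-Kutta integration of Eq. (24),
    d(n_BL)/dt = -3 H n_BL - b * Y, starting from n_BL(t1) = 0."""
    def rhs(t, n_bl):
        hubble = 0.5 / t
        y_bar = -gamma * n_x(t, n_x1, t1, gamma)   # assumed pure-decay rate
        return -3.0 * hubble * n_bl - b * y_bar
    h = (t_end - t1) / steps
    t, n_bl = t1, 0.0
    for _ in range(steps):
        k1 = rhs(t, n_bl)
        k2 = rhs(t + 0.5 * h, n_bl + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, n_bl + 0.5 * h * k2)
        k4 = rhs(t + h, n_bl + h * k3)
        n_bl += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += h
    return n_bl

def n_bl_eq28(b, n_x1, t1, gamma, t):
    """Closed form of Eq. (28) with (a1/a)**3 = (t1/t)**1.5."""
    return -b * (n_x(t, n_x1, t1, gamma) - (t1 / t) ** 1.5 * n_x1)

# Hypothetical illustrative values.
b, n_x1, t1, gamma, t_end = 0.01, 1.0, 1.0, 0.5, 10.0
numeric = integrate_n_bl(b, n_x1, t1, gamma, t_end)
closed = n_bl_eq28(b, n_x1, t1, gamma, t_end)
print(numeric, closed)
```

The two values should agree to within the integration error, and the closed form shows directly that n̄_{B−L} approaches b n̄_{X1}(a₁/a)³ once the X particles have decayed.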

V. CONCLUSION

If we assume that the results of Sections III and IV are typical of the
general case, then the existence of a post-inflation period of nearly perfect
local thermal equilibrium would rule out any appreciable present departures
from a purely adiabatic perturbation. Thus the observation of a departure
from a purely adiabatic perturbation, such as a measurement of different
values for the ζα in Eq. (8), would be evidence not only that non-adiabatic
perturbations are generated in inflation, but also that there was no era of per-
fect or nearly perfect local thermal equilibrium when all conserved quantum
numbers vanished.
I am grateful for helpful conversations with E. Komatsu and M. Tegmark.
I also thank K. Chaicherdsakul for pointing out typographical errors in an
earlier version of this paper. This material is based upon work supported by
the National Science Foundation under Grant No. 0071512 and with support
from the Robert A. Welch Foundation, Grant Nos. F-0014 and F-1099, and
also grant support from the US Navy, Office of Naval Research, Grant No.
N00014-03-1-0639, Quantum Optics Initiative.

REFERENCES

1. S. Weinberg, astro-ph/0401313.
2. See, e.g., C. Armendáriz-Picón, astro-ph/0312389.
3. S. Weinberg, Phys. Rev. D 67, 123504 (2003); S. Bashinsky and U.
Seljak, astro-ph/0310198. Also see the appendix to S. Weinberg, astro-
ph/0306374.

4. The existence of solutions with $\delta S/\dot{\bar S}$ equal for all energy densities,
pressures, etc. (but not the detailed solution (2)–(5)) seems to have
been generally accepted for a long time. An intuitive “separate uni-
verse” argument for the existence of a solution satisfying Eq. (1) has
been given by D. H. Lyth and D. Wands, Phys. Rev. D 68, 103516
(2003); also see D. Wands, K. A. Malik, D. H. Lyth, and A. R. Liddle,
Phys. Rev. D 62, 043527 (2000); A. R. Liddle and D. H. Lyth, Cosmological
Inflation and Large-Scale Structure (Cambridge University
Press, 2000). But this sort of argument only shows that there is a so-
lution satisfying Eq. (1) for zero wave number. There are indeed many
such solutions for zero wave number, most of which have no physical
significance because they cannot be extended to finite wave number.
In ref. [3] it was shown that the requirement that the solution can be
extended to finite wave number yields just two solutions, described by
Eqs. (2)–(5). (It is this requirement that requires that the infinitesimal
re-definition of the time coordinate used in the “separate universe” ar-
gument to generate the solutions for zero wave number be accompanied
with an infinitesimal re-scaling of the space coordinate.)
5. D. Polarski and A. A. Starobinsky, Nucl. Phys. B 385, 623 (1992),
found the solution (1)–(5) for the field equations of two scalar fields
interacting with each other only through gravity, but did not extend
this result to general systems of particles and/or fields. This explicit
solution was not found by the “separate universe” arguments of Ref.
4.
6. J. M. Bardeen, P. J. Steinhardt, and M. S. Turner, Phys. Rev. D28,
679 (1983). This quantity was re-introduced by D. Wands, K. A. Malik,
D. H. Lyth, and A. R. Liddle, Phys. Rev. D62, 043527 (2000). Also
see D. H. Lyth, Phys. Rev. D 31, 1792 (1985); K. A. Malik, D. Wands,
and C. Ungarelli, Phys. Rev. D 67, 063516 (2003).

7. S. Bashinsky and U. Seljak, astro-ph/0310198.

8. H. V. Peiris et al., Astrophys. J. Suppl. 148, 213 (2003); P. Crotty, J.
Garcia-Bellido, J. Lesgourgues, and A. Riazuelo, Phys. Rev. Lett. 91,
171301 (2003).

9. D. Polarski and A. A. Starobinsky, Phys. Rev. D 50, 6123 (1994).

10. D. H. Lyth and D. Wands, Phys. Lett. B 524, 5 (2002); D. H. Lyth,
C. Ungarelli, and D. Wands, Phys. Rev. D 67, 023503 (2003); D. H.
Lyth and D. Wands, Phys. Rev. D 68, 103516 (2003).

11. T. Moroi and M. Murayama, Phys. Lett. B 553, 126 (2003).

12. T. Moroi and T. Takahashi, Phys. Lett. B 522, 215 (2001).

13. D. Seckel and M. S. Turner, Phys. Rev. D 32, 3178 (1985).

14. B. W. Lee and S. Weinberg, Phys. Rev. Lett. 39, 165 (1977).

15. S. Weinberg, Phys. Rev. Lett. 42, 850 (1979). For earlier discussions
of cosmological baryosynthesis, see A. D. Sakharov, JETP Lett. 6, 24
(1967); M. Yoshimura, Phys. Rev. Lett. 41, 281 (1978); 42, 746(E)
(1979); S. Dimopoulos and L. Susskind, Phys. Rev. D 18, 4500 (1978);
Phys. Lett. 81B, 416 (1979); A. Yu. Ignatiev, N. V. Krasnikov, V.
A. Kuzmin, and A. N. Tavkhelidze, Phys. Lett. 76B, 436 (1978); B.
Toussaint, S. B. Treiman, F. Wilczek, and A. Zee, Phys. Rev. D 19,
1036 (1979); J. Ellis, M. K. Gaillard, and D. V. Nanopoulos, Phys.
Lett. 80B, 360 (1979); 82B, 464(E) (1979).

16. M. Fukugita and T. Yanagida, Phys. Lett. B 174, 45 (1986).

17. G. ’t Hooft, Phys. Rev. Lett. 37, 8 (1976); V. A. Kuzmin, V. A.
Rubakov, and M. E. Shaposhnikov, Phys. Lett. 155B, 36 (1985).

18. J. A. Harvey and M. S. Turner, Phys. Rev. D 42, 3344 (1990).

UTTG-01-05

Quantum Contributions to Cosmological Correlations


arXiv:hep-th/0506236v1 28 Jun 2005

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

The “in-in” formalism is reviewed and extended, and applied to the calcula-
tion of higher-order Gaussian and non-Gaussian correlations in cosmology.
Previous calculations of these correlations amounted to the evaluation of
tree graphs in the in-in formalism; here we also consider loop graphs. It
turns out that for some though not all theories, the contributions of loop
graphs as well as tree graphs depend only on the behavior of the inflaton
potential near the time of horizon exit. A sample one-loop calculation is
presented.


∗ Electronic address: weinberg@physics.utexas.edu

I. INTRODUCTION

The departures from cosmological homogeneity and isotropy observed
in the cosmic microwave background and large scale structure are small,
so it is natural that they should be dominated by a Gaussian probability
distribution, with bilinear averages given by the terms in the Lagrangian
that are quadratic in perturbations. Nevertheless, there is growing interest
in the possibility of observing non-Gaussian terms in various correlation
functions,¹ such as an expectation value of a product of three temperature
fluctuations. It is also important to understand the higher-order corrections
to bilinear correlation functions, which appear in Gaussian correlations.
Until now, higher-order cosmological correlations have been calculated
by solving the classical field equations beyond the linear approximation.
As will be shown in the Appendix, this is equivalent to calculating sums
of tree graphs, though in a formalism different from the familiar Feynman
graph formalism. For instance, Maldacena² has calculated the non-Gaussian
average of a product of three scalar and/or gravitational fields to first order
in their interactions, which amounts to calculating a tree graph consisting
of a single vertex with 3 attached gravitational and/or scalar field lines.
This paper will discuss how calculations of cosmological correlations can
be carried to arbitrary orders of perturbation theory, including the quantum
effects represented by loop graphs. So far, loop corrections to correlation
functions appear to be much too small ever to be observed. The present
work is motivated by the opinion that we ought to understand what our
theories entail, even where in practice their predictions cannot be verified
experimentally, just as field theorists in the 1940s and 1950s took pains to
understand quantum electrodynamics to all orders of perturbation theory,
even though it was only possible to verify results in the first few orders.
There is a particular question that will concern us. In the familiar calcu-
lations of lowest-order Gaussian correlations, and also in Maldacena’s tree-
graph calculation of non-Gaussian correlations, the results depended only
on the behavior of the unperturbed inflaton field near the time of horizon
exit. Is the same true for loop graphs? If so, it will be possible to calculate
the loop contributions with some confidence, but we can learn little
new from such calculations. On the other hand, if the contribution of loop
graphs depends on the whole history of the unperturbed inflaton field, then
calculations become much more difficult, but potentially more revealing. In
this case, it might even be that the loop contributions are much larger than
otherwise expected.

The appropriate formalism for dealing with this sort of problem is the
“in-in” formalism originally due to Schwinger.³ Schwinger’s presentation is
somewhat opaque, so this formalism is outlined (and extended) in an Ap-
pendix. In section II we summarize those aspects of this formalism that
are needed for our present purposes. Section III introduces a class of the-
ories to serve as a basis of discussion, with a single inflaton field, plus any
number of additional massless scalar fields with only gravitational interac-
tions and vanishing expectation values. In Section IV we prove a general
theorem about the late time behavior of cosmological correlations at fixed
internal as well as external wave numbers. Section V introduces a class
of unrealistic theories to illustrate the problems raised by the integration
over internal wave numbers, and how these problems may be circumvented.
In Section VI we return to the theories introduced in Section III, and we
show that the conditions of the theorem proved in Section IV are satisfied
for these theories. This means that, to all orders of perturbation theory, if
ultraviolet divergences cancel in the integrals over internal wave numbers,
then cosmological correlations do indeed depend only on the behavior of the
unperturbed inflaton field near the time of horizon exit in the cases studied.
We can also find other theories in which this result does not apply, as for
instance by giving the additional scalar fields a self-interaction. Section VII
presents a sample one-loop calculation of a cosmological correlation.

II. THE “IN-IN” FORMALISM IN COSMOLOGY

The problem of calculating cosmological correlation functions differs
from the more familiar problems encountered in quantum field theory in
at least three respects:
• We are not interested here in the calculation of S-matrix elements, but
rather in evaluating expectation values of products of fields at a fixed
time.
• Conditions are not imposed on the fields at both very early and very
late times, as in the calculation of S-matrix elements, but only at
very early times, when the wavelength is deep inside the horizon and
according to the Equivalence Principle the interaction picture fields
should have the same form (when expressed in terms of metric rather
than co-moving coordinates) as in Minkowski spacetime.
• Although the Hamiltonian H that generates the time dependence of
the various quantum fields is constant in time, the time dependence
of the fluctuations in these fields is governed by a fluctuation Hamiltonian
H̃ with an explicit time dependence, which as shown in the
Appendix is constructed by expanding H around the unperturbed so-
lution of the field equation, and discarding the terms of first order in
the perturbations to the fields and their canonical conjugates.

Given a fluctuation Hamiltonian H̃, we want to use it to calculate
expectation values of some product Q(t) of field operators, all at the same time t
but generally with different space arguments. As discussed in the Appendix,
the prescription of the “in-in” formalism is that

$$ \langle Q(t)\rangle = \left\langle \left[\bar T \exp\left(i\int_{-\infty}^{t} H_I(t)\, dt\right)\right] Q^I(t) \left[T \exp\left(-i\int_{-\infty}^{t} H_I(t)\, dt\right)\right] \right\rangle . \tag{1} $$

Here T denotes a time-ordered product; T̄ is an anti-time-ordered product;
QI is the product Q in the interaction picture (with time-dependence gen-
erated by the part of H̃ that is quadratic in fluctuations); and HI is the
interaction part of H̃ in the interaction picture. (This result is different
from that originally given by Maldacena² and other authors,⁴ who left out
the time-ordering and anti-time-ordering, perhaps through a typographical
error. However, this makes no difference to first order in the interaction,
which is the approximation used by these authors in their calculations.) We
are here taking the time t0 at which the fluctuations are supposed to behave
like free fields as t0 = −∞, which is appropriate for cosmology because at
very early times the fluctuation wavelengths are deep inside the horizon.
Eq. (1) leads to a fairly complicated diagrammatic formalism, described
in the Appendix. Unfortunately this formalism obscures crucial cancella-
tions that occur between different diagrams. For our present purposes, it is
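more convenient to use a formula equivalent to Eq. (1):

$$ \langle Q(t)\rangle = \sum_{N=0}^{\infty} i^N \int_{-\infty}^{t} dt_N \int_{-\infty}^{t_N} dt_{N-1} \cdots \int_{-\infty}^{t_2} dt_1 \left\langle \Big[ H_I(t_1), \Big[ H_I(t_2), \cdots \Big[ H_I(t_N), Q^I(t) \Big] \cdots \Big] \Big] \right\rangle \tag{2} $$

(with the N = 0 term understood to be just $\langle Q^I(t)\rangle$). This can easily be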
derived from Eq. (1) by mathematical induction. Obviously Eqs. (1) and
(2) give the same results to zeroth and first order in HI . If we assume that
the right-hand sides of Eqs. (1) and (2) are equal for arbitrary operators Q
up to order N − 1 in HI , then by differentiating these equations we easily
see that the time derivatives of the right-hand sides are equal up to order
N . Eqs. (1) and (2) also give the same results for t → −∞ to all orders, so
they give the same results for arbitrary t to order N .
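
The equivalence of Eqs. (1) and (2) can be illustrated in a finite-dimensional toy model. The sketch below rests on assumptions that go beyond the text: a two-state system, a constant interaction H_I = g σ_x switched on at time 0 rather than at −∞, and Q = σ_z. For a constant H_I the nested time integrals in Eq. (2) reduce to t^N/N!, so the series becomes Σ (it)^N/N! times the N-fold commutator. All names and values are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b, scale=1.0):
    """Return a + scale * b for 2x2 matrices."""
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]

def commutator(a, b):
    return add(matmul(a, b), matmul(b, a), -1.0)

def expm(a, terms=40):
    """Taylor series for the matrix exponential exp(a)."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    power = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        power = [[x / n for x in row] for row in matmul(power, a)]  # a**n / n!
        result = add(result, power)
    return result

def eq1_constant_h(h, q, t):
    """exp(+iHt) Q exp(-iHt): Eq. (1) for a constant H_I acting from time 0 to t."""
    forward = expm([[-1j * t * x for x in row] for row in h])
    backward = expm([[1j * t * x for x in row] for row in h])
    return matmul(matmul(backward, q), forward)

def eq2_series(h, q, t, order):
    """Eq. (2) truncated at the given order, for the same constant H_I."""
    total = [[0j, 0j], [0j, 0j]]
    nested = q                      # N-fold nested commutator, starting at N = 0
    coeff = 1 + 0j                  # (i t)**N / N!
    for n in range(order + 1):
        total = add(total, nested, coeff)
        nested = commutator(h, nested)
        coeff = coeff * 1j * t / (n + 1)
    return total

g, t = 0.3, 1.0
H_I = [[0j, g + 0j], [g + 0j, 0j]]      # g * sigma_x
Q = [[1 + 0j, 0j], [0j, -1 + 0j]]       # sigma_z
exact = eq1_constant_h(H_I, Q, t)[0][0]  # expectation value in the state (1, 0)
for order in (0, 2, 4, 20):
    print(order, eq2_series(H_I, Q, t, order)[0][0].real, exact.real)
print("closed form:", math.cos(2 * g * t))
```

In this toy model the expectation value is cos(2gt), and the truncated commutator series reproduces its Taylor expansion order by order.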

III. THEORIES OF INFLATION

To make our discussion concrete, in this section we will take up a particular
class of theories of inflation. The reader who prefers to avoid details of
specific theories can skip this section, and go on immediately to the general
analysis of late-time behavior in the following section.
In this section we will consider theories of inflation with two kinds of
matter fields: a real scalar field ϕ(x, t) with a non-zero homogeneous
expectation value ϕ̄(t) that rolls down a potential V (ϕ), and any number
of real massless scalar fields σn (x, t), which have only minimal gravitational
interactions, and are prevented by unbroken symmetries from acquiring vac-
uum expectation values. The real field ϕ serves as an inflaton whose energy
density drives inflation, while the σn are a stand-in for the large number of
species of matter fields that will dominate the effects of loop graphs on the
correlations of the inflaton field.∗∗
We follow Maldacena,² adopting a gauge in which there are no fluctuations
in the inflaton field, so that ϕ(x, t) = ϕ̄(t), and in which the spatial
part of the metric takes the form∗∗∗

$$ g_{ij} = a^2 e^{2\zeta} [\exp\gamma]_{ij} , \qquad \gamma_{ii} = 0 , \qquad \partial_i\gamma_{ij} = 0 , \tag{3} $$


where a(t) is the Robertson–Walker scale factor, $\gamma_{ij}(x, t)$ is a gravitational
wave amplitude, and ζ(x, t) is a scalar whose characteristic feature is that
it is conserved outside the horizon,⁵ that is, for physical wave numbers that
are small compared with the expansion rate. The same is true of $\gamma_{ij}$.

∗∗ Standard counting arguments show that in these theories the number of factors of 8πG
in any graph equals the number of loops of any kind, plus a fixed number that depends
only on which correlation function is being calculated. Matter loops are numerically
more important than loops containing graviton or inflaton lines, because they carry an
additional factor equal to the number of types of matter fields.

∗∗∗ I am adopting Maldacena’s notation, but the quantity he calls ζ is more usually called
R. To first order in fields, the quantity usually called ζ is defined as $-\Psi - H\delta\rho/\dot{\bar\rho}$, while
the quantity usually called R is defined as $-\Psi + H\delta u$. (Here the contribution of scalar
modes to $g_{ij}$ is written in general gauges as $-2a^2(\Psi\delta_{ij} + \partial^2\Psi'/\partial x^i\partial x^j)$, while δρ and ρ̄
are the perturbation to the total energy density and its unperturbed value, while δu is
the perturbed velocity potential, which for a single inflaton field is $\delta u = -\delta\varphi/\dot{\bar\varphi}$.) In the
gauge used by Maldacena and in the present paper $\delta u = \Psi' = 0$, so since ζ is defined here
as −Ψ to first order in fields, it corresponds to the quantity usually called R. Outside the
horizon R and ζ are the same.

The other components of the metric are given in the Arnowitt–Deser–Misner
(ADM) formalism⁶ by

$$ g_{00} = -N^2 + g_{ij} N^i N^j , \qquad g_{i0} = g_{ij} N^j , \tag{4} $$

where N and $N^i$ are auxiliary fields, whose time-derivatives do not appear
in the action. The Lagrangian density in this gauge (with 8πG ≡ 1) is

$$ \mathcal{L} = \frac{a^3}{2} e^{3\zeta} \Big[ N R^{(3)} - 2N V(\bar\varphi) + N^{-1}\left( E^j{}_i E^i{}_j - (E^i{}_i)^2 \right) + N^{-1}\dot{\bar\varphi}^2 + N^{-1}\sum_n \left(\dot\sigma_n - N^i\partial_i\sigma_n\right)^2 - N a^{-2} e^{-2\zeta} [\exp(-\gamma)]^{ij} \sum_n \partial_i\sigma_n\,\partial_j\sigma_n \Big] , \tag{5} $$

where

$$ E_{ij} \equiv \frac{1}{2}\left( \dot g_{ij} - \nabla_i N_j - \nabla_j N_i \right) , \tag{6} $$

and bars denote unperturbed quantities. All spatial indices i, j, etc. are
lowered and raised with the matrix $g_{ij}$ and its reciprocal; $\nabla_i$ is the three-
dimensional covariant derivative calculated with this three-metric; and $R^{(3)}$
is the curvature scalar calculated with this three-metric:

$$ R^{(3)} = a^{-2} e^{-2\zeta} \left[e^{-\gamma}\right]^{ij} R^{(3)}_{ij} . $$

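The auxiliary fields N and $N^i$ are to be found by requiring that the action
is stationary in these variables. This gives the constraint equations:

$$ \nabla_i\left[ N^{-1}\left( E^i{}_j - \delta^i{}_j E^k{}_k \right) \right] = N^{-1} \sum_n \partial_j\sigma_n \left( \dot\sigma_n - N^i\partial_i\sigma_n \right) , \tag{7} $$

$$ N^2 \left[ R^{(3)} - 2V - a^{-2} e^{-2\zeta} [\exp(-\gamma)]^{ij} \sum_n \partial_i\sigma_n\,\partial_j\sigma_n \right] = E^i{}_j E^j{}_i - \left(E^i{}_i\right)^2 + \dot{\bar\varphi}^2 + \sum_n \left( \dot\sigma_n - N^i\partial_i\sigma_n \right)^2 . \tag{8} $$

For instance, to first order in fields (including field derivatives) the auxiliary
fields are the same as in the case of no additional matter fields²

$$ N = 1 + \dot\zeta/H , \qquad N^i = -\frac{1}{a^2 H}\,\partial_i\zeta + \epsilon\,\partial_i\nabla^{-2}\dot\zeta , \tag{9} $$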

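where

$$ \epsilon \equiv -\frac{\dot H}{H^2} = \frac{\dot{\bar\varphi}^2}{2H^2} , \qquad H \equiv \frac{\dot a}{a} . \tag{10} $$

The fields in the interaction picture satisfy free-field equations. For ζ we
have the Mukhanov equation:⁷

$$ \frac{\partial^2\zeta}{\partial t^2} + \left[ \frac{d\ln(a^3\epsilon)}{dt} \right] \frac{\partial\zeta}{\partial t} - a^{-2}\nabla^2\zeta = 0 . \tag{11} $$

The field equation for gravitational waves is

$$ \frac{\partial^2\gamma_{ij}}{\partial t^2} + 3H\frac{\partial\gamma_{ij}}{\partial t} - a^{-2}\nabla^2\gamma_{ij} = 0 , \tag{12} $$

and for the matter fields

$$ \frac{\partial^2\sigma_n}{\partial t^2} + 3H\frac{\partial\sigma_n}{\partial t} - a^{-2}\nabla^2\sigma_n = 0 . \tag{13} $$
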
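The fields in the interaction picture are then

$$ \zeta(\mathbf{x}, t) = \int d^3q \left[ e^{i\mathbf{q}\cdot\mathbf{x}} \alpha(\mathbf{q}) \zeta_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}} \alpha^*(\mathbf{q}) \zeta_q^*(t) \right] , \tag{14} $$

$$ \gamma_{ij}(\mathbf{x}, t) = \int d^3q \sum_\lambda \left[ e^{i\mathbf{q}\cdot\mathbf{x}} e_{ij}(\hat q, \lambda) \alpha(\mathbf{q}, \lambda) \gamma_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}} e^*_{ij}(\hat q, \lambda) \alpha^*(\mathbf{q}, \lambda) \gamma_q^*(t) \right] , \tag{15} $$

$$ \sigma_n(\mathbf{x}, t) = \int d^3q \left[ e^{i\mathbf{q}\cdot\mathbf{x}} \alpha(\mathbf{q}, n) \sigma_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}} \alpha^*(\mathbf{q}, n) \sigma_q^*(t) \right] , \tag{16} $$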

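where λ = ±2 is a helicity index and $e_{ij}(\hat q, \lambda)$ is a polarization tensor, while
$\alpha(\mathbf{q})$, $\alpha(\mathbf{q}, \lambda)$, and $\alpha(\mathbf{q}, n)$ are conventionally normalized annihilation
operators, satisfying the usual commutation relations

$$ \left[ \alpha(\mathbf{q}) , \alpha^*(\mathbf{q}') \right] = \delta^3(\mathbf{q} - \mathbf{q}') , \qquad \left[ \alpha(\mathbf{q}) , \alpha(\mathbf{q}') \right] = 0 , \tag{17} $$

$$ \left[ \alpha(\mathbf{q}, \lambda) , \alpha^*(\mathbf{q}', \lambda') \right] = \delta_{\lambda\lambda'}\, \delta^3(\mathbf{q} - \mathbf{q}') , \qquad \left[ \alpha(\mathbf{q}, \lambda) , \alpha(\mathbf{q}', \lambda') \right] = 0 , \tag{18} $$

and

$$ \left[ \alpha(\mathbf{q}, n) , \alpha^*(\mathbf{q}', n') \right] = \delta_{nn'}\, \delta^3(\mathbf{q} - \mathbf{q}') , \qquad \left[ \alpha(\mathbf{q}, n) , \alpha(\mathbf{q}', n') \right] = 0 . \tag{19} $$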

Also, $\zeta_q(t)$, $\gamma_q(t)$, and $\sigma_q(t)$ are suitably normalized positive-frequency
solutions of Eqs. (11)–(13), with $\nabla^2$ replaced with $-q^2$. They satisfy initial
conditions, designed to make $-\zeta\dot{\bar\varphi}/H$, $\gamma_{ij}/\sqrt{16\pi G}$, and $\sigma_n$ behave like
conventionally normalized free fields at t → −∞:†

$$ -\frac{\dot{\bar\varphi}(t)\,\zeta_q(t)}{H(t)} \to \frac{\gamma_q(t)}{\sqrt{16\pi G}} \to \sigma_q(t) \to \frac{1}{(2\pi)^{3/2}\sqrt{2q}\; a(t)} \exp\left( i q \int_t^{\infty} \frac{dt'}{a(t')} \right) . \tag{20} $$

IV. LATE TIME BEHAVIOR

The question to be addressed in this section is whether the time integrals
in Eqs. (1) and (2) are dominated by times near horizon exit for general
graphs. This question is more complicated for loop graphs than for tree
graphs, such as that considered by Maldacena, because for loops there are
two different kinds of wave number: the fixed wave numbers q associated
with external lines, and the internal wave numbers p circulating in loops,
over which we must integrate. It is only if the integrals over internal wave
numbers p are dominated by values of order p ≈ q that we can speak of
a definite time of horizon exit, when q/a ≈ p/a ≈ H. In this section we
will integrate first over the time arguments in Eq. (2), holding the internal
wave numbers at fixed values, and return at the end of this section to the
problems raised by the necessity of then integrating over the internal wave numbers p.
There is never any problem with the convergence of the time integrals
at very early times; all fluctuations oscillate very rapidly for q/a ≫ H and
p/a ≫ H, suppressing the contribution of early times to the time integrals in
Eq. (2). To see what happens for late times, when q/a ≪ H and p/a ≪ H,
we need to count the powers of a in the contribution of late times in general
loop as well as tree graphs.
For this purpose, we need to consider the behavior of the coefficient
functions appearing in the Fourier decompositions (14)–(16) of the fields in
the interaction picture. In order to implement dimensional regularization,
we will consider these coefficient functions in 2ν space dimensions, returning
later to the limit 2ν → 3. The coefficient functions then obey differential
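equations obtained by replacing the space dimensionality 3 in Eqs. (11)–(13)
with 2ν, as well as replacing the Laplacian with $-q^2$:

$$ \frac{d^2\zeta_q(t)}{dt^2} + \left[ \frac{d\ln\left( a^{2\nu}(t)\,\epsilon(t) \right)}{dt} \right] \frac{d\zeta_q(t)}{dt} + \frac{q^2}{a^2(t)}\,\zeta_q(t) = 0 , \tag{21} $$

$$ \frac{d^2\gamma_q(t)}{dt^2} + 2\nu H(t)\frac{d\gamma_q(t)}{dt} + \frac{q^2}{a^2(t)}\,\gamma_q(t) = 0 , \tag{22} $$

$$ \frac{d^2\sigma_q(t)}{dt^2} + 2\nu H(t)\frac{d\sigma_q(t)}{dt} + \frac{q^2}{a^2(t)}\,\sigma_q(t) = 0 . \tag{23} $$

† In Newtonian gauge the quantity $-\zeta(\mathbf{x}, t)\dot{\bar\varphi}(t)/H(t)$ approaches the inflaton field
fluctuation δϕ(t) for t → −∞.
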
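At late times, when q/a ≪ H, the solutions can be written as asymptotic
expansions in inverse powers of a:††

$$ \zeta_q(t) \to \zeta_q^o \left[ 1 + \int_t^{\infty} \frac{q^2\, dt'}{a^{2\nu}(t')\epsilon(t')} \int_{-\infty}^{t'} a^{2\nu-2}(t'')\,\epsilon(t'')\, dt'' + \dots \right] + C_q \left[ \int_t^{\infty} \frac{dt'}{a^{2\nu}(t')\epsilon(t')} + q^2 \int_t^{\infty} \frac{dt'}{a^{2\nu}(t')\epsilon(t')} \int_{-\infty}^{t'} a^{2\nu-2}(t'')\,\epsilon(t'')\, dt'' \int_{t''}^{\infty} \frac{dt'''}{a^{2\nu}(t''')\epsilon(t''')} + \dots \right] , \tag{24} $$

$$ \gamma_q(t) \to \gamma_q^o \left[ 1 + \int_t^{\infty} \frac{q^2\, dt'}{a^{2\nu}(t')} \int_{-\infty}^{t'} a^{2\nu-2}(t'')\, dt'' + \dots \right] + D_q \left[ \int_t^{\infty} \frac{dt'}{a^{2\nu}(t')} + q^2 \int_t^{\infty} \frac{dt'}{a^{2\nu}(t')} \int_{-\infty}^{t'} a^{2\nu-2}(t'')\, dt'' \int_{t''}^{\infty} \frac{dt'''}{a^{2\nu}(t''')} + \dots \right] , \tag{25} $$

$$ \sigma_q(t) \to \sigma_q^o \left[ 1 + \int_t^{\infty} \frac{q^2\, dt'}{a^{2\nu}(t')} \int_{-\infty}^{t'} a^{2\nu-2}(t'')\, dt'' + \dots \right] + E_q \left[ \int_t^{\infty} \frac{dt'}{a^{2\nu}(t')} + q^2 \int_t^{\infty} \frac{dt'}{a^{2\nu}(t')} \int_{-\infty}^{t'} a^{2\nu-2}(t'')\, dt'' \int_{t''}^{\infty} \frac{dt'''}{a^{2\nu}(t''')} + \dots \right] , \tag{26} $$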

where $\zeta_q^o$, $\gamma_q^o$, and $\sigma_q^o$ are the limiting values of $\zeta_q(t)$, $\gamma_q(t)$, and $\sigma_q(t)$ (the
“o” superscript stands for “outside the horizon”) and $C_q$, $D_q$, and $E_q$ are
additional constants. In any kind of inflation with sufficient expansion, the
Robertson–Walker scale factor a will grow much faster than H or ε can
change, and Eqs. (24)–(26) thus show that (at least for 2ν ≥ 2) the time
derivatives of $\zeta_q$, $\gamma_q$, and $\sigma_q$ all vanish for q/a ≪ H like $1/a^2$.

†† By t = ∞ in the limits of these integrals and elsewhere in this paper, we mean a time
still during inflation, but sufficiently late so that a(t) is many e-foldings larger than its
value when q/a falls below H.

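
The $1/a^2$ falloff of the time derivatives can be checked in a special case. The sketch below assumes exact de Sitter expansion, a = e^{Ht} with constant H, and 2ν = 3; neither assumption is made in the text. In that case the massless mode equation (13) has the closed-form solution σ_q ∝ (1 + iqτ) e^{−iqτ}, with conformal time τ = −1/(aH). The mode is normalized here so that σ_q^o = 1, not as in Eq. (20), and the function names are hypothetical.

```python
import cmath
import math

def sigma_mode(q, hubble, t):
    """De Sitter solution of Eq. (13) with the Laplacian replaced by -q**2,
    normalized so that the mode tends to 1 at late times (an assumed normalization)."""
    a = math.exp(hubble * t)
    tau = -1.0 / (a * hubble)          # conformal time in de Sitter space
    return (1 + 1j * q * tau) * cmath.exp(-1j * q * tau)

def sigma_dot(q, hubble, t, h=1e-4):
    """Central-difference estimate of the derivative with respect to t."""
    return (sigma_mode(q, hubble, t + h) - sigma_mode(q, hubble, t - h)) / (2 * h)

def scaled_derivative(q, hubble, t):
    """a**2 * |d sigma / dt|, which should stay constant if the derivative falls as 1/a**2."""
    a = math.exp(hubble * t)
    return a * a * abs(sigma_dot(q, hubble, t))

q, hubble = 1.0, 1.0
for t in (2.0, 3.0, 5.0):
    print(t, scaled_derivative(q, hubble, t), q * q / hubble)
```

For this solution the product a²|σ̇_q| equals q²/H at all times, consistent with the leading q² term in Eq. (26).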
If an interaction involves enough factors of ζ̇, γ̇ij , and/or σ̇n so that
these 1/a2 factors and any 1/a2 factors from the contraction of space indices
more than compensate for the a2ν factor in the interaction from the square
root of the metric determinant, then the integral over the associated time
coordinate will converge exponentially fast at late times as well as at early
times, and therefore may be expected to be dominated by the era in which
the wavelength leaves the horizon. For instance, the extension of Eq. (5) to
2ν space dimensions gives the interaction between a ζ field and a pair of σ
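fields

$$ \mathcal{L}_{\zeta\sigma\sigma} = -\frac{a^{2\nu-2}}{2}\,\zeta \sum_n \partial_i\sigma_n\,\partial_i\sigma_n - \frac{a^{2\nu-2}}{2H}\,\dot\zeta \sum_n \partial_i\sigma_n\,\partial_i\sigma_n + a^{2\nu-2}\,\partial_i\left( \frac{\zeta}{H} - \epsilon a^2 \nabla^{-2}\dot\zeta \right) \sum_n \dot\sigma_n\,\partial_i\sigma_n - \frac{a^{2\nu}}{2H}\,\dot\zeta \sum_n \dot\sigma_n^2 + \frac{3a^{2\nu}}{2}\,\zeta \sum_n \dot\sigma_n^2 . \tag{27} $$

(The ζσσ interaction Hamiltonian given by canonical quantization is just
$-\int d^{2\nu}x\, \mathcal{L}_{\zeta\sigma\sigma}$, but this simple relation does not always apply.) Counting a
factor $a^{-2}$ for each $\dot\zeta$ or $\dot\sigma_n$, the terms in this interaction go as $a^{2\nu-2}$, $a^{2\nu-4}$,
$a^{2\nu-4}$, $a^{2\nu-6}$, and $a^{2\nu-4}$, respectively. All these terms are safe for 2ν < 4,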
except for the first, which for 2ν > 2 grows exponentially at late times.
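
The counting just described can be reproduced mechanically. The following is a minimal sketch, not part of the original derivation: the labels and the tuple layout are illustrative choices, and the third term of Eq. (27) is split into its two pieces. Each entry records the explicit power of $a$ relative to $a^{2\nu}$ and the number of time-differentiated fields, each of which is counted as $a^{-2}$.

```python
# Power counting for the terms of Eq. (27).
# Each piece is (label, offset, n_dot): the explicit factor is a**(2*nu + offset)
# and n_dot is the number of time-differentiated fields, each counted as a**-2.
# Labels are illustrative only; the third term of Eq. (27) appears as two pieces.
TERMS = [
    ("zeta (d sigma)^2", -2, 0),
    ("zeta_dot (d sigma)^2 / H", -2, 1),
    ("d(zeta/H) sigma_dot d sigma", -2, 1),
    ("d(eps a^2 lap^-1 zeta_dot) sigma_dot d sigma", 0, 2),
    ("zeta_dot sigma_dot^2 / H", 0, 3),
    ("zeta sigma_dot^2", 0, 2),
]


def net_offset(offset, n_dot):
    """Return k such that the term scales as a**(2*nu + k)."""
    return offset - 2 * n_dot


def decays(two_nu, offset, n_dot):
    """True if the term carries a negative net power of a for this value of 2*nu."""
    return two_nu + net_offset(offset, n_dot) < 0


if __name__ == "__main__":
    for label, offset, n_dot in TERMS:
        k = net_offset(offset, n_dot)
        print(f"{label:46s} a^(2nu{k:+d})  decays for 2nu=3: {decays(3, offset, n_dot)}")
```

In three space dimensions only the first entry fails to decay, which is the term singled out in the text.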
Because of the commutators in Eq. (2), the condition for a safe interac-
tion is actually less stringent than that it should decay exponentially with
time, and even a growing term that only involves fields rather than their time
derivatives, like the first term in Eq. (27), may not destroy the convergence
of the time integrals. We will now prove the following:

Theorem. The integrals over the time coordinates of interactions converge
exponentially for $t \to \infty$, essentially as $\int^\infty dt/a^n(t)$ with $n > 0$, provided
that in $2\nu$ space dimensions, all interactions are of one or the other of two
types:

• Safe interactions, that contain a number of factors of $a(t)$ (including
$-2$ factors of $a$ for each time derivative and the $2\nu$ factors of $a$ from
$\sqrt{-\mathrm{Det}\,g}$) strictly less than $2\nu - 2$, and

• Dangerous interactions, which grow at late times no faster than $a^{2\nu-2}$,
and contain only fields, not time derivatives of fields.

These conditions are evidently met by the interaction (27), irrespective of
the value of $\nu$, and, as we shall see in Section VI, they are satisfied by all
other interactions in the theories of Section III, but not in all theories.
Before proceeding to the proof, it should be noted that just as in Eq. (27),
the space dimensionality $2\nu$ enters in the interaction only in a factor
$\sqrt{-\mathrm{Det}\,g} \propto a^{2\nu}$, so the question of whether or not a given theory satisfies the conditions
of this theorem does not depend on the value of $2\nu$. Thus this theorem has
the corollary:

Corollary. The integrals over the time coordinates of interactions converge
exponentially for $t \to \infty$, essentially as $\int^\infty dt/a^n(t)$ with $n > 0$, provided
that in 3 space dimensions all interactions are of one or the other of two
types:

• Safe interactions, that contain a number of factors of $a(t)$ (including
$-2$ factors of $a$ for each time derivative and the 3 factors of $a$ from
$\sqrt{-\mathrm{Det}\,g}$) strictly less than $+1$, and

• Dangerous interactions, which grow at late times no faster than $a$, and
contain only fields, not time derivatives of fields.

Here is the proof. As already mentioned, the reason that dangerous
interactions are not necessarily fatal has to do with how they enter into
commutators in Eq. (2). Because of the time-ordering in Eq. (2), any failure
of convergence of the time integrals for $t \to +\infty$ in $N$th-order perturbation
theory must come from a part of the multi-time region of integration
in which, for some $r$, the time arguments $t_r, t_{r+1}, \dots, t_N$ all go to
infinity together. We will therefore have to count the number of factors of
$a(t_r), a(t_{r+1}), \dots, a(t_N)$, treating them all as being of the same order of magnitude.
(This does not take proper account of factors of $\log a$, but as long as
the integral over $t_r, t_{r+1}, \dots, t_N$ involves a negative total number of factors
of $a$, it converges exponentially fast no matter how many factors of $\log a$ arise
from subintegrations.) Now, at least one of the fields or field time derivatives
in each term in $H(t_s)$ with $r \le s \le N$ must appear in a commutator with one
of the fields in some other $H_I(t_{s'})$ with $s < s' \le N$. So we need to consider
the commutators of fields at times which may be unequal, but are both late.
In the sense described above, treating all $a(t_r), a(t_{r+1}), \dots, a(t_N)$ as being
of the same order of magnitude, if $a(t)$ increases more-or-less exponentially,
then the commutator of two fields or a field and a field time-derivative goes
as $a^{-2\nu}$, while the commutator of two field time-derivatives goes as $a^{-2\nu-2}$.
For instance, the unequal-time commutators of the interaction-picture
fields (14)–(16) are
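$$
\left[\zeta(\mathbf{x},t),\,\zeta(\mathbf{x}',t')\right] = \int d^{2\nu}p\; e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{x}')}\left(\zeta_p(t)\zeta_p^*(t')-\zeta_p(t')\zeta_p^*(t)\right), \tag{28}
$$
$$
\left[\gamma_{ij}(\mathbf{x},t),\,\gamma_{kl}(\mathbf{x}',t')\right] = \int d^{2\nu}p\; e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{x}')}\,\Pi_{ijkl}(\hat p)\left(\gamma_p(t)\gamma_p^*(t')-\gamma_p(t')\gamma_p^*(t)\right), \tag{29}
$$
$$
\left[\sigma_n(\mathbf{x},t),\,\sigma_m(\mathbf{x}',t')\right] = \delta_{nm}\int d^{2\nu}p\; e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{x}')}\left(\sigma_p(t)\sigma_p^*(t')-\sigma_p(t')\sigma_p^*(t)\right), \tag{30}
$$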
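where $\Pi_{ijkl}(\hat p) \equiv \sum_\lambda e_{ij}(\hat p,\lambda)\,e_{kl}(\hat p,\lambda)$. The two asymptotic expansions given
in Eqs. (21)–(23) for each of the fields are both real aside from over-all factors,
so neither by itself contributes to the commutators. On the other hand, the
constants $C_p\zeta_p^{o*}$, $D_p\gamma_p^{o*}$, and $E_p\sigma_p^{o*}$ are in general complex. (For instance,
in inflation with a strictly exponential expansion, the phase of $C_p\zeta_p^{o*}$ is given by
a factor $-e^{-i\nu\pi}$.) The asymptotic expansions of the commutators at late
times are therefore
$$
\left[\zeta(\mathbf{x}_1,t_1),\,\zeta(\mathbf{x}_2,t_2)\right] \to 2i\int d^{2\nu}p\;\mathrm{Im}\!\left[C_p\zeta_p^{o*}\right]e^{i\mathbf{p}\cdot(\mathbf{x}_1-\mathbf{x}_2)}
\Bigg[\int_{t_1}^{t_2}\frac{dt'}{a^{2\nu}(t')\epsilon(t')}
+p^2\int_{t_1}^{t_2}\frac{dt'}{a^{2\nu}(t')\epsilon(t')}\int_{-\infty}^{t'}a^{2\nu-2}(t'')\epsilon(t'')\,dt''\int_{t''}^{\infty}\frac{dt'''}{a^{2\nu}(t''')\epsilon(t''')}
+p^2\int_{t_1}^{\infty}\frac{dt_1'}{a^{2\nu}(t_1')\epsilon(t_1')}\int_{t_2}^{\infty}\frac{dt_2'}{a^{2\nu}(t_2')\epsilon(t_2')}\int_{t_1'}^{t_2'}a^{2\nu-2}(t'')\epsilon(t'')\,dt''+\dots\Bigg], \tag{31}
$$
$$
\left[\gamma_{ij}(\mathbf{x}_1,t_1),\,\gamma_{kl}(\mathbf{x}_2,t_2)\right] \to 2i\int d^{2\nu}p\;\Pi_{ijkl}(\hat p)\,\mathrm{Im}\!\left[D_p\gamma_p^{o*}\right]e^{i\mathbf{p}\cdot(\mathbf{x}_1-\mathbf{x}_2)}
\Bigg[\int_{t_1}^{t_2}\frac{dt'}{a^{2\nu}(t')}
+p^2\int_{t_1}^{t_2}\frac{dt'}{a^{2\nu}(t')}\int_{-\infty}^{t'}a^{2\nu-2}(t'')\,dt''\int_{t''}^{\infty}\frac{dt'''}{a^{2\nu}(t''')}
+p^2\int_{t_1}^{\infty}\frac{dt_1'}{a^{2\nu}(t_1')}\int_{t_2}^{\infty}\frac{dt_2'}{a^{2\nu}(t_2')}\int_{t_1'}^{t_2'}a^{2\nu-2}(t'')\,dt''+\dots\Bigg], \tag{32}
$$
$$
\left[\sigma_n(\mathbf{x}_1,t_1),\,\sigma_m(\mathbf{x}_2,t_2)\right] \to 2i\,\delta_{nm}\int d^{2\nu}p\;\mathrm{Im}\!\left[E_p\sigma_p^{o*}\right]e^{i\mathbf{p}\cdot(\mathbf{x}_1-\mathbf{x}_2)}
\Bigg[\int_{t_1}^{t_2}\frac{dt'}{a^{2\nu}(t')}
+p^2\int_{t_1}^{t_2}\frac{dt'}{a^{2\nu}(t')}\int_{-\infty}^{t'}a^{2\nu-2}(t'')\,dt''\int_{t''}^{\infty}\frac{dt'''}{a^{2\nu}(t''')}
+p^2\int_{t_1}^{\infty}\frac{dt_1'}{a^{2\nu}(t_1')}\int_{t_2}^{\infty}\frac{dt_2'}{a^{2\nu}(t_2')}\int_{t_1'}^{t_2'}a^{2\nu-2}(t'')\,dt''+\dots\Bigg]. \tag{33}
$$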

We see that the commutator of two fields vanishes essentially as $a^{-2\nu}$ for
late times, and the same is true for the commutator of a field and its time
derivative, but the commutators of two time derivatives arise only from the
third terms in the expansions (31)–(33), and therefore go as $a^{-2\nu-2}$. That
is,
$$
\left[\dot\zeta(\mathbf{x}_1,t_1),\,\dot\zeta(\mathbf{x}_2,t_2)\right] \to 2i\int d^{2\nu}p\;\mathrm{Im}\!\left[C_p\zeta_p^{o*}\right]e^{i\mathbf{p}\cdot(\mathbf{x}_1-\mathbf{x}_2)}
\times\left[\frac{p^2}{a^{2\nu}(t_1)\,\epsilon(t_1)\,a^{2\nu}(t_2)\,\epsilon(t_2)}\int_{t_1}^{t_2}a^{2\nu-2}(t')\,\epsilon(t')\,dt'+\dots\right],
$$

and likewise for $\gamma_{ij}$ and $\sigma_n$.


Let’s now add up the total number of factors of $a(t_r), a(t_{r+1}), \dots$ and $a(t_N)$
in the integrand of Eq. (2), for some selection of terms in the interactions
$H(t_s)$ with $r \le s \le N$. Suppose that the selected term in $H(t_s)$ contains an
explicit factor $a(t_s)^{A_s}$, and $B_s$ factors of field time derivatives. Suppose also
that in the inner $N - r + 1$ commutators in Eq. (2) there appear $C$ commutators
of fields with each other, $C'$ commutators of fields with field time
derivatives, and $C''$ commutators of field time derivatives with each other.
The number of field time derivatives that are not in these commutators is
$\sum_s B_s - C' - 2C''$, and these contribute a total $-2\sum_s B_s + 2C' + 4C''$ factors
of $a$. (All sums over $s$ here run from $r$ to $N$.) In addition, there are $\sum_s A_s$
factors of $a$ that appear explicitly in the interactions, and as we have seen,
the commutators contribute $-2\nu C - 2\nu C' - (2\nu + 2)C''$ factors of $a$. Hence
the total number of factors of $a(t_r), a(t_{r+1}), \dots$ and $a(t_N)$ in the integrand
of Eq. (2) is
$$
\# = \sum_s (A_s - 2B_s) - 2\nu C - (2\nu - 2)(C' + C'') = \sum_s (A_s - 2B_s - 2\nu + 2) - 2C , \tag{34}
$$
in which we have used the fact that the total number $C + C' + C''$ of commutators
of the interactions $H(t_r), H(t_{r+1}), \dots$ and $H(t_N)$ with each other
and with the field product $Q$ equals the number of these interactions. Under
the assumptions of this theorem, all interactions have $A_s - 2B_s \le 2\nu - 2$.
If any of them are safe in the sense that $A_s - 2B_s < 2\nu - 2$, then $\# < 0$,
and the integral over time converges exponentially fast. On the other hand,
if all of them have $A_s - 2B_s = 2\nu - 2$, then under the assumptions of this
theorem they all involve only fields, not field time derivatives, so the same
is true of the commutators of these interactions. In this case $C > 0$ and
$\# = -2C < 0$, so again the integral over time converges exponentially fast.
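
As a cross-check of the bookkeeping leading to Eq. (34), here is a minimal sketch that tallies the factors of $a$ contribution by contribution and compares the result with the closed form. The function names and the representation of an interaction as a pair `(A_s, B_s)` are illustrative assumptions, not notation from the text.

```python
def total_power(interactions, two_nu, c_ff, c_fd, c_dd):
    """Net number of factors of a in the integrand, tallied piece by piece.

    interactions: list of (A_s, B_s) pairs, one per H(t_s) with r <= s <= N.
    c_ff, c_fd, c_dd: numbers of field-field, field-derivative and
    derivative-derivative commutators (C, C', C'' in the text).
    """
    if c_ff + c_fd + c_dd != len(interactions):
        raise ValueError("need exactly one commutator per interaction")
    explicit = sum(a_s for a_s, _ in interactions)
    n_dots = sum(b_s for _, b_s in interactions)
    # Time derivatives that are not absorbed into commutators count as a**-2 each.
    free_dots = n_dots - c_fd - 2 * c_dd
    from_commutators = -two_nu * (c_ff + c_fd) - (two_nu + 2) * c_dd
    return explicit - 2 * free_dots + from_commutators


def total_power_eq34(interactions, two_nu, c_ff):
    """Right-hand side of Eq. (34)."""
    return sum(a_s - 2 * b_s - two_nu + 2 for a_s, b_s in interactions) - 2 * c_ff


if __name__ == "__main__":
    # Two dangerous interactions in 3 dimensions (A = 1, B = 0), both commutators field-field.
    print(total_power([(1, 0), (1, 0)], 3, 2, 0, 0))
    # One safe interaction (A = 3, B = 2) together with one dangerous interaction.
    print(total_power([(3, 2), (1, 0)], 3, 1, 1, 0))
```

Both routes give the same negative total in these two cases, as the argument above requires.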
In counting powers of $a$, we have held the wave numbers $p$ associated
with internal lines fixed, like the external wave numbers, because we are
integrating over time coordinates before we integrate over the internal wave
numbers. The integrals over time receive little contribution from values of
the conformal time $\tau \equiv -\int_t^\infty dt/a$ satisfying $-p\tau \gg 1$ and $-q\tau \gg 1$, because
of the rapid oscillation of the integrand, and for theories satisfying the
conditions of our theorem they also receive little contribution from values of
$\tau$ with $-p\tau \ll 1$ and $-q\tau \ll 1$, because of the damping provided by negative
powers of $a$. (Note that when $a(t)$ increases more or less exponentially, $\tau$ is
of the order of $-1/aH$.) Thus for these theories, we expect the integrals to
be dominated by times for which $-1/\tau$ is in the range from the $q$'s to the $p$'s.
The question then is whether the integrals over the internal wave numbers
$p$ are dominated by values of the order of the external wave numbers $q$. If
they are, then the results depend only on the history of inflation around the
time of horizon exit, $-q\tau \approx 1$, or in other words, $q/a \approx H$.
Any integral over the internal wave numbers will in general take the form
of a polynomial in the external wave numbers, with coefficients that may be
divergent, plus a finite term given by a convergent integral dominated by
internal wave numbers of the same order of magnitude as the fixed external
wave numbers. An example of this decomposition is given in Section VII. In
particular, the integral over the wave number associated with an internal line
that begins and ends at the same vertex does not involve the external wave
numbers, so its contribution is purely a polynomial in the wave numbers of
the other lines attached to the same vertex.
Just as in dealing with ultraviolet divergences in flat space quantum
field theory, renormalization removes some of these ultraviolet divergent
polynomial terms, and others are removed by appropriate redefinitions of
the field operators. (Some examples are given in the next section.) Where
redefinition of the field operators is necessary, it is only products of the
redefined “renormalized” field operators whose expectation values may be
expected to give results that converge at late times. If, after all such renor-
malizations and redefinitions, there remained ultraviolet divergences in the
integrals over internal wave numbers, we could conclude that the approximation
of extending the time integrals to +∞ is not valid, and that these
integrals can be taken only to some time t late in inflation. The decrease
of the integrand at wave numbers p much greater than −1/τ (t) would then
provide the ultraviolet cut off that is still needed, but the correlation func-
tions would exhibit the sort of time dependence that has been found in other
contexts by Woodard and his collaborators,3 and we would not be able to
draw conclusions about correlations actually measured at times much closer
to the present. The possible presence of such ultraviolet divergences that
are not removed by renormalization and field redefinition is an important
issue, which merits further study.††† But even if such ultraviolet divergences
are present, it would still be possible to calculate the non-polynomial part
of the integrals over internal momenta which is not ultraviolet divergent (at
least in one loop order) even when the time t is taken to infinity. Such a
calculation will be presented in Section VII.

V. AN EXAMPLE: EXPONENTIAL EXPANSION

To clarify the issues discussed at the end of the previous section, we
will examine a simple unphysical model, along with a revealing class of
generalizations.
First, consider a single real scalar field $\varphi(\mathbf{x},t)$ in a fixed de Sitter metric.‡
In order to implement dimensional regularization, we work in $2\nu$ space dimensions,
letting $\nu \to 3/2$ at the end of our calculation. The Lagrangian
density is taken as
$$
L = -\frac{1}{2}\sqrt{-\mathrm{Det}\,g}\;g^{\mu\nu}\,(1+\lambda\varphi^2)\,\partial_\mu\varphi\,\partial_\nu\varphi
= (1+\lambda\varphi^2)\left[\frac{a^{2\nu}}{2}\,\dot\varphi^2-\frac{a^{2\nu-2}}{2}\,(\nabla\varphi)^2\right], \tag{35}
$$
††† Many theories are afflicted with infrared divergences, even when $t$ is held fixed. The
infrared divergences are attributed to the imposition of the unrealistic initial condition,
that at early times all of infinite space is occupied by a Bunch–Davies vacuum. The
infrared divergence can be eliminated either by taking space to be finite⁸ or by changing
the vacuum.⁹ In any case, it is the appearance of uncancelled ultraviolet rather than
infrared divergences when we integrate over internal wave numbers after taking the limit
$t \to \infty$ that shows the impropriety of this interchange of limit and integral, because factors
of $1/a$ in the integrand are typically accompanied with factors of internal wave numbers,
so that the $1/a$ factors do not suppress the integrand for large values of $a$ if the integral
receives contributions from arbitrarily large values of the internal wave number.

‡ This model, and much of the analysis, was suggested to me by R. Woodard, private
communication.

where $a \propto e^{Ht}$ with $H$ constant. (This of course can be rewritten as a free
field theory, but it is instructive nonetheless, and will be generalized later
in this section to interacting theories.) We follow the usual procedure of
defining a canonical conjugate field $\pi = \partial L/\partial\dot\varphi$, constructing the Hamiltonian
density $\mathcal{H} = \pi\dot\varphi - L$ with $\dot\varphi$ expressed in terms of $\pi$, dividing $\mathcal{H}$ into
a quadratic part $\mathcal{H}_0$ and interaction part $\mathcal{H}_I$, and then replacing $\pi$ in $\mathcal{H}_I$
with the interaction-picture $\pi_I$ given by $\dot\varphi = [\partial\mathcal{H}_0/\partial\pi]_{\pi=\pi_I}$. This gives an
interaction
$$
H_I = \frac{\lambda}{2}\int d^{2\nu}x\left[-\frac{a^{2\nu}}{2}\left\{\frac{\varphi^2}{1+\lambda\varphi^2}\,,\;\dot\varphi^2\right\}+a^{2\nu-2}\,(\nabla\varphi)^2\,\varphi^2\right]. \tag{36}
$$

(An anticommutator is needed in the first term to satisfy the requirement
that $H_I$ be Hermitian.) This interaction satisfies the conditions of the theorem
proved in the previous section for any value of the space dimensionality
$2\nu$: the first term in the square brackets contains $2\nu - 4$ factors of $a$ (counting
a factor $a^{-2}$ for each time derivative), so it is safe, while the second term
contains $2\nu - 2$ factors of $a$, and is therefore dangerous, but it only involves
fields (including space derivatives), not their time derivatives, so though
dangerous it still satisfies the conditions of our theorem.
To first order in $\lambda$, the expectation value $\langle\varphi(\mathbf{x},t)\,\varphi(\mathbf{x}',t)\rangle$ is given by a
one-loop diagram, in which a scalar field line is emitted and absorbed at the
same vertex, with the two external lines also attached to this vertex. This
expectation value receives contributions of three kinds:

(i) Terms in which no time derivatives act on the internal lines. This contribution
is the same as would be obtained by adding effective interactions
proportional to $a^{2\nu-2}(\nabla\varphi)^2$, $a^{2\nu-2}\varphi^2$, or $a^{2\nu}\dot\varphi^2$, all of which satisfy
the conditions of the theorem of the previous section. Thus it cannot
affect the conclusion that the integral over the time argument of
$H_I(t_1)$ converges exponentially at $t_1 = +\infty$, so that $\langle\varphi(\mathbf{x},t)\,\varphi(\mathbf{x}',t)\rangle$
approaches a finite limit for $t \to \infty$.

(ii) Terms in which time derivatives act on both ends of the internal line. This
produces an effective interaction proportional to $a^{2\nu}\varphi^2$, which violates
the conditions of our theorem, but it can be removed by adding an
$R\varphi^2\sqrt{-\mathrm{Det}\,g}$ counterterm in the Lagrangian. (This cancellation is not
automatic, because the condition of minimal coupling is not enforced
by any symmetry.)

(iii) Terms in which a time derivative acts on just one end of the internal
line. This produces an effective interaction proportional to $a^{2\nu}\varphi\dot\varphi$,
which violates the conditions of our theorem, and cannot be removed
by adding a generally covariant counterterm to the Lagrangian.

To see in detail what trouble is caused by the third type of contribution,
note that the interaction picture scalar field is given by a Fourier decomposition
like Eq. (16), with coefficient functions‡‡
$$
\varphi_q(t) = \frac{e^{i\pi(2\nu+1)/4}\,H^{\nu-1/2}}{4\pi\sqrt{2}\,q^{\nu}}\;H_\nu^{(1)}(-q\tau)\,(-q\tau)^\nu , \tag{37}
$$
where $\tau$ is the conformal time
$$
\tau \equiv -\int_t^\infty\frac{dt'}{a(t')} = -\frac{1}{a(t)H} . \tag{38}
$$
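
For $\nu = 3/2$ the mode function (37) can be evaluated in closed form, which gives a quick numerical sanity check of the normalization. The sketch below assumes the standard identity $z^{3/2}H^{(1)}_{3/2}(z) = -\sqrt{2/\pi}\,e^{iz}(z+i)$ and the exactly exponential scale factor $a = e^{Ht}$; the function names are illustrative, not taken from the text.

```python
import cmath
import math


def conformal_time(t, hubble):
    """Eq. (38) for a(t) = exp(H t): tau = -1 / (a H)."""
    return -1.0 / (math.exp(hubble * t) * hubble)


def mode_function(q, tau, hubble):
    """Eq. (37) at nu = 3/2, using z**1.5 * H1_{3/2}(z) = -sqrt(2/pi) * exp(i z) * (z + i)."""
    nu = 1.5
    z = -q * tau
    hankel_times_power = -math.sqrt(2.0 / math.pi) * cmath.exp(1j * z) * (z + 1j)
    prefactor = cmath.exp(1j * math.pi * (2 * nu + 1) / 4) * hubble ** (nu - 0.5)
    prefactor /= 4 * math.pi * math.sqrt(2.0) * q ** nu
    return prefactor * hankel_times_power


def late_time_power(q, hubble, nu=1.5):
    """Late-time value 2^(2 nu) H^(2 nu - 1) Gamma(nu)^2 / (32 pi^4 q^(2 nu))."""
    return 4 ** nu * hubble ** (2 * nu - 1) * math.gamma(nu) ** 2 / (32 * math.pi ** 4 * q ** (2 * nu))


if __name__ == "__main__":
    q, hubble = 2.0, 0.5                      # arbitrary illustrative values
    tau = conformal_time(40.0, hubble)        # late time, so -q*tau is tiny
    print(abs(mode_function(q, tau, hubble)) ** 2, late_time_power(q, hubble))
```

At late times the squared modulus settles to $H^2/2(2\pi)^3q^3$ for $\nu = 3/2$, which is the value the general-$\nu$ expression reduces to.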

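The contribution of the third kind to the expectation value then has the
Fourier transform
$$
\int d^{2\nu}x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\,\langle\varphi(\mathbf{x},t)\,\varphi(\mathbf{x}',t)\rangle_{iii}
= \left(\frac{H^{2\nu-1}}{32\pi^2}\right)^{3}\left(\frac{2\pi}{q}\right)^{4\nu}
\times 4\pi\int_0^\infty\frac{dp}{p}\int_{-\infty}^{t}dt_1\,a^{2\nu}(t_1)\,\frac{d}{dt_1}\left|(-p\tau_1)^\nu H_\nu^{(1)}(-p\tau_1)\right|^2
\times\mathrm{Im}\left\{\frac{d}{dt_1}\left[(-q\tau_1)^\nu H_\nu^{(1)}(-q\tau_1)\right]^2\left[(-q\tau)^\nu H_\nu^{(1)}(-q\tau)\right]^{*2}\right\} \tag{39}
$$
Let’s see what happens if we evaluate this by integrating first over $p$ and
then over $t_1$ from $-\infty$ to late times, or vice versa.
To integrate first over $p$, we can change the variable of integration from
$p$ to $z \equiv -p\tau_1$, in which case the first derivative with respect to $t_1$ can be
replaced with $d/dt_1 = (z/a_1\tau_1)(d/dz) = -Hz\,(d/dz)$, while $dp/p = dz/z$.
Dimensional regularization (with $2\nu < 1$) makes the function $z^\nu H_\nu^{(1)}(z)$
vanish at $z \to \infty$, while for $\nu > 0$ its modulus takes the value $2^\nu\Gamma(\nu)/\pi$ for $z \to 0$, so
$$
\int_0^\infty dz\,\frac{d}{dz}\left|z^\nu H_\nu^{(1)}(z)\right|^2 = -\left(\frac{2^\nu\,\Gamma(\nu)}{\pi}\right)^2 ,
$$
and therefore
$$
\int d^{2\nu}x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\,\langle\varphi(\mathbf{x},t)\,\varphi(\mathbf{x}',t)\rangle_{iii}
= -4\pi H\left(\frac{2^\nu\,\Gamma(\nu)}{\pi}\right)^2\left(\frac{H^{2\nu-1}}{32\pi^2}\right)^{3}\left(\frac{2\pi}{q}\right)^{4\nu}
\times\int_{-\infty}^{t}dt_1\,a^{2\nu}(t_1)\,\mathrm{Im}\left\{\frac{d}{dt_1}\left[(-q\tau_1)^\nu H_\nu^{(1)}(-q\tau_1)\right]^2\left[(-q\tau)^\nu H_\nu^{(1)}(-q\tau)\right]^{*2}\right\} \tag{40}
$$

‡‡ Here and below we will not be careful to extend factors like $4\pi$ to $2\nu$ space dimensions.
This only affects constant terms that accompany any $(2\nu - 3)^{-1}$ poles.
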
For $t_1 \to +\infty$ and $t \to +\infty$ (that is, $\tau \to 0$ and $\tau_1 \to 0$), the integrand of
the integral over $t_1$ on the second line has the constant limit
$$
a^{2\nu}(t_1)\,\mathrm{Im}\left\{\frac{d}{dt_1}\left[(-q\tau_1)^\nu H_\nu^{(1)}(-q\tau_1)\right]^2\left[(-q\tau)^\nu H_\nu^{(1)}(-q\tau)\right]^{*2}\right\} \to -\frac{4\,\Gamma(\nu)^2\,q^{2\nu}}{\pi^3\,H^{2\nu-1}} . \tag{41}
$$
Thus for $t \to \infty$, the correlation function (39) does not approach a constant,
but instead goes as
$$
\int d^{2\nu}x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\,\langle\varphi(\mathbf{x},t)\,\varphi(\mathbf{x}',t)\rangle_{iii} \to \frac{H^{4\nu-1}\,\Gamma(\nu)^4\,t}{2\,(2\pi)^{10-4\nu}\,q^{2\nu}} . \tag{42}
$$
There is no pole here that prevents continuation to space dimensionality
$2\nu = 3$. From this point of view, integrating first over $p$, the failure of the
correlation function to approach a finite limit at late times is due to the fact
already noted, that the integral over $p$ produces an effective interaction that
does not satisfy the conditions of our theorem.
But suppose we first integrate over $t_1$ from $-\infty$ to $+\infty$. Now there is no
problem with convergence at late times, because the original interaction does
satisfy the conditions of our theorem, but instead we now have a problem
with the convergence of the integral over $p$. It will be helpful to divide the
integral over $p$ into an integral from 0 to $\Lambda q$, where $\Lambda \gg 1$, and an integral
from $\Lambda q$ to infinity. The first integral obviously has no ultraviolet divergence,
and the vanishing of the first time derivative in Eq. (39) for $p \to 0$ prevents
any infrared divergence. In the second integral $p$ and $-1/\tau$ are the only
magnitudes in the problem with which $q$ can be compared, so for $t \to +\infty$
and hence $\tau \to 0$ we can evaluate the correlation function by letting $q \to 0$
and keeping only the leading term in $q$. Here again we can use the limiting
formula (41), now for $q \to 0$ instead of $\tau \to 0$ and $\tau_1 \to 0$. The integral over
$t_1$ is then trivial, and we find that for $q \ll 1/\tau$ the correlation function is
$$
\int d^{2\nu}x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\,\langle\varphi(\mathbf{x},t)\,\varphi(\mathbf{x}',t)\rangle_{iii} \to \frac{H^{4\nu-2}\,\Gamma(\nu)^4}{2\,(2\pi)^{10-2\nu}\,q^{2\nu}}\int_{\Lambda q}^{\infty}\frac{dp}{p} + \text{finite} . \tag{43}
$$
The ultraviolet divergent integral over $p$ is the price we pay for the naughtiness
of taking the limit $t \to \infty$ before we integrate over $p$.
In this model it is clear how to remedy the difficulty of calculating correlation
functions at late times. As already mentioned, the original Lagrangian
density (35) actually describes a free field theory. This is made manifest by
defining a new scalar field
$$
\tilde\varphi \equiv \int\sqrt{1+\lambda\varphi^2}\;d\varphi , \tag{44}
$$
for which the Lagrangian density takes the form
$$
L = -\frac{1}{2}\sqrt{-\mathrm{Det}\,g}\;g^{\mu\nu}\,\partial_\mu\tilde\varphi\,\partial_\nu\tilde\varphi . \tag{45}
$$
There is no problem in taking the late-time limit of the correlation function
$\int d^{2\nu}x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\,\langle\tilde\varphi(\mathbf{x},t)\,\tilde\varphi(\mathbf{x}',t)\rangle$; it is just $2^{2\nu}H^{2\nu-1}\Gamma(\nu)^2/32\pi^4q^{2\nu}$. From
this point of view, the growth of the correlation function (42) at late times is
a result of our perversity in calculating the correlation function of $\varphi$ instead
of $\tilde\varphi$.
Can we find fields whose correlation functions have a constant limit at
late times in theories that satisfy the conditions of our theorem but are not
equivalent to free field theories? The general answer is not known, but here
is a class of interacting field theories for which such “renormalized” fields
can be found. This time we consider an arbitrary number of real scalar fields
$\varphi_n(\mathbf{x},t)$ in a fixed de Sitter metric. The Lagrangian density is taken to have
the form of a non-linear $\sigma$-model:
$$
L = -\frac{1}{2}\sum_{nm}\sqrt{-\mathrm{Det}\,g}\;g^{\mu\nu}\left(\delta_{nm}+\lambda K_{nm}(\varphi)\right)\partial_\mu\varphi_n\,\partial_\nu\varphi_m , \tag{46}
$$
where $K_{nm}(\varphi)$ is an arbitrary real symmetric matrix function of the $\varphi_n$; $\lambda$
is a coupling constant; and again $a \propto e^{Ht}$ with $H$ constant. The Hamiltonian
derived from this Lagrangian density does satisfy the conditions of the
theorem of Section IV, whatever the function $K_{nm}(\varphi)$.
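To first order in $\lambda$, the same problem discussed earlier in this section
arises from graphs in which an internal line of the field $\varphi_n$ is emitted and
absorbed from the same vertex, with a time derivative acting on just one
end of this line. Depending on what correlation function is being calculated,
the contribution of such graphs is proportional to various contractions of
partial derivatives of the function
$$
A_m(\varphi) \equiv \sum_n\frac{\partial K_{nm}(\varphi)}{\partial\varphi_n} . \tag{47}
$$
Suppose we make a redefinition of the fields of first order in $\lambda$:
$$
\tilde\varphi_n \equiv \varphi_n - \lambda\,\Delta_n(\varphi) . \tag{48}
$$
This changes the matrix $K$ to
$$
\tilde K_{nm}(\varphi) = K_{nm}(\varphi) + \frac{\partial\Delta_n(\varphi)}{\partial\varphi_m} + \frac{\partial\Delta_m(\varphi)}{\partial\varphi_n} , \tag{49}
$$
and so
$$
\tilde A_m(\varphi) \equiv \sum_n\frac{\partial\tilde K_{nm}(\varphi)}{\partial\varphi_n} = A_m(\varphi) + \sum_n\frac{\partial^2\Delta_n(\varphi)}{\partial\varphi_n\,\partial\varphi_m} + \sum_n\frac{\partial^2\Delta_m(\varphi)}{\partial\varphi_n\,\partial\varphi_n} . \tag{50}
$$
Thus the fields $\tilde\varphi_n$ are renormalized, in the sense that to first order in $\lambda$
correlation functions have finite limits at late times, provided that
$$
\sum_n\frac{\partial^2\Delta_n(\varphi)}{\partial\varphi_n\,\partial\varphi_m} + \sum_n\frac{\partial^2\Delta_m(\varphi)}{\partial\varphi_n\,\partial\varphi_n} = -A_m(\varphi) . \tag{51}
$$
This can be solved by first solving the Poisson equation
$$
\sum_n\frac{\partial^2 B(\varphi)}{\partial\varphi_n\,\partial\varphi_n} = -\frac{1}{2}\sum_n\frac{\partial A_n(\varphi)}{\partial\varphi_n} \tag{52}
$$
and then solving a second Poisson equation
$$
\sum_n\frac{\partial^2\Delta_m(\varphi)}{\partial\varphi_n\,\partial\varphi_n} = -A_m(\varphi) - \frac{\partial B(\varphi)}{\partial\varphi_m} . \tag{53}
$$
Thus, at least to first order in $\lambda$, in this class of theories it is always possible
to find a suitable set of renormalized fields.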
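
A single-field specialization makes the construction concrete and ties it back to Eq. (44). With one field and $K = \varphi^2$, Eq. (51) reduces to $2\Delta'' = -K'$, so $\Delta' = -K/2$ and $\Delta = -\varphi^3/6$ (taking the integration constants to vanish, which is an assumption of this sketch). Eq. (48) then gives $\tilde\varphi = \varphi + \lambda\varphi^3/6$, which should agree with the exact redefinition (44) to first order in $\lambda$. The code below checks this numerically; the function names are illustrative.

```python
import math


def phi_tilde_exact(phi, lam, steps=2000):
    """Eq. (44): integrate sqrt(1 + lam*x^2) from 0 to phi with the midpoint rule."""
    h = phi / steps
    return sum(math.sqrt(1.0 + lam * ((k + 0.5) * h) ** 2) for k in range(steps)) * h


def delta_single_field(phi):
    """Solution of 2*Delta'' = -K' for K = phi^2 with vanishing integration constants."""
    return -phi ** 3 / 6.0


def k_tilde_single_field(phi, step=1e-4):
    """Eq. (49) for one field: K + 2*Delta', with Delta' from a central difference."""
    d_delta = (delta_single_field(phi + step) - delta_single_field(phi - step)) / (2 * step)
    return phi ** 2 + 2.0 * d_delta


def phi_tilde_first_order(phi, lam):
    """Eq. (48): phi_tilde = phi - lam * Delta(phi)."""
    return phi - lam * delta_single_field(phi)


if __name__ == "__main__":
    phi, lam = 1.0, 1e-3                     # arbitrary illustrative values
    print(phi_tilde_exact(phi, lam), phi_tilde_first_order(phi, lam))
    print(k_tilde_single_field(phi))         # should be close to zero
```

The two field definitions differ only at order $\lambda^2$, and the redefined $\tilde K$ vanishes, consistent with the single-field model being a free theory in disguise.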
Because we can take the limit t → ∞ only for the correlation functions
of suitably defined fields (such as ϕ̃n in our example), the question naturally
arises, whether these are the fields whose correlation functions we want to
calculate. The answer is conditioned by the fact that astronomical observa-
tions of the cosmic microwave background or large scale structure are made
following a long era that has intervened since the end of inflation, during
which things happened about which we know almost nothing, such as re-
heating, baryon and lepton synthesis, and dark matter decoupling. The only
thing that allows us to use observations to learn about inflation is that some
quantities were conserved during this era, while fluctuation wave lengths
were outside the horizon. These are the only quantities whose correlation
functions at the end of inflation can be interpreted in terms of current ob-
servations. In the classical limit, the quantities that are conserved outside
the horizon are ζ and γij , but we don’t know whether this will be true when
quantum effects are taken into account. Still, we can expect that quantities
are conserved only when there is some symmetry principle that makes them
conserved, and whatever symmetry principle keeps some quantity conserved
from the end of inflation to the time of horizon re-entry is likely also to keep
it conserved from the time of horizon exit to the end of inflation. So we may
guess that the quantities whose correlation functions we will need to know
are just those whose correlation functions approach constant limits at the
end of inflation.

VI. DANGEROUS INTERACTIONS IN INFLATIONARY THEORIES

We now return to the semi-realistic theories described in Section III. We
will show in this section that all interactions are of the type called for in the
theorem of Section IV; that is, they are all safe interactions that (in three
space dimensions) do not grow exponentially at late times (and in fact are
suppressed at late times at least by a factor $a^{-1}$), or dangerous interactions
containing only fields and not their time derivatives, which grow no faster
than $a$ at late times. Fortunately, as noticed by Maldacena² in a different
context, for this purpose it is not necessary to solve the constraint equations
(7) and (8), which are quite complicated especially when the $\sigma_n$ fields are
included. Inspection of these equations shows that when we count $\dot\zeta$, $\dot\gamma_{ij}$,
and $\dot\sigma_n$ as of order $a^{-2}$, the auxiliary fields $N - 1$ and $N^i$ are both also of
order $a^{-2}$.‡‡‡ This is apparent in the first-order solution (9) of the constraint
equations, but it holds to all orders in the fields. To calculate the quantity
$E^j{}_i E^i{}_j - (E^i{}_i)^2$ in Eq. (5), we note that
$$
E^i{}_j = H\delta^i{}_j + \dot\zeta\,\delta^i{}_j + \frac{1}{2}\left[e^{-\gamma}\frac{\partial}{\partial t}e^{\gamma}\right]^i{}_j - \frac{1}{2}\left(\nabla^i N_j + \nabla_j N^i\right) . \tag{54}
$$
The first term $H\delta^i{}_j$ is of order zero in $a$, while all other terms are of order
$a^{-2}$, so
$$
E^j{}_i E^i{}_j - (E^i{}_i)^2 = -6H^2 - 12H\dot\zeta - 4H\nabla_k N^k + O(a^{-4}) \tag{55}
$$
‡‡‡ In counting powers of $a$, note that the three-dimensional affine connection and Ricci
tensor are independent of $a$, so the curvature scalar $R^{(3)}$ goes as $a^{-2}$. For instance, for
$\gamma_{ij} = 0$, we have $R^{(3)} = -a^{-2}e^{-2\zeta}\left(4\nabla^2\zeta + 2(\nabla\zeta)^2\right)$.

(In deriving this result, we note that $\left[e^{-\gamma}\frac{\partial}{\partial t}e^{\gamma}\right]_{ii} = \dot\gamma_{ii} = 0$.) The terms
in (5) of first order in $N - 1$ all cancel as a consequence of the constraint
equation (8), while terms of second order in $N - 1$ in Eq. (5) (and in particular
in $a^3e^{3\zeta}\dot{\bar\varphi}^2/2N$ and $-3H^3a^3e^{3\zeta}/2N$) are suppressed by at least a factor
$a^3(a^{-2})^2$, and are therefore safe. Therefore we can isolate all terms that are
potentially dangerous by setting $N = 1$, and find
$$
L = \frac{a^3}{2}\,e^{3\zeta}\left[R^{(3)} - 2V(\bar\varphi) - 6H^2 - 12H\dot\zeta - 4H\nabla_k N^k + \dot{\bar\varphi}^2 - a^{-2}e^{-2\zeta}\left[\exp(-\gamma)\right]_{ij}\sum_n\partial_i\sigma_n\,\partial_j\sigma_n\right] + O(a^{-1}) , \tag{56}
$$

We note that $e^{3\zeta}\nabla_k N^k = \partial_k(e^{3\zeta}N^k)$, so this term vanishes when integrated
over three-space, and therefore makes no contribution to the action. The
term proportional to $\dot\zeta$ can be written
$$
-6a^3e^{3\zeta}H\dot\zeta = -2\,\frac{\partial}{\partial t}\left(a^3He^{3\zeta}\right) + a^3e^{3\zeta}\left(6H^2 + 2\dot H\right) .
$$
The first term vanishes when integrated over time, so it gives no contribution
to the action. To evaluate the remaining terms we use the unperturbed
inflaton field equation, which (with $8\pi G \equiv 1$) gives $\dot H = -\dot{\bar\varphi}^2/2$, and the
Friedmann equation, which gives $6H^2 = 2V + \dot{\bar\varphi}^2$. We then find a cancellation
$$
-V - 3H^2 + \frac{1}{2}\dot{\bar\varphi}^2 + 6H^2 + 2\dot H = 0 .
$$
Aside from terms that make no contribution to the action, the Lagrangian
density is then
$$
L = \frac{a^3}{2}\,e^{3\zeta}\left[R^{(3)} - a^{-2}e^{-2\zeta}\left[\exp(-\gamma)\right]_{ij}\sum_n\partial_i\sigma_n\,\partial_j\sigma_n\right] + O(a^{-1}) . \tag{57}
$$

We see that, at least in this class of theories, the dangerous terms that are
not suppressed by a factor $a^{-1}$ grow at most like $a$ at late times, and involve
only fields, not their time derivatives, as assumed in the theorem of Section
IV.
It remains to be seen if in these theories, after integrating over times and
taking the limit $t \to \infty$, the remaining integrals over internal wave numbers
are made convergent by the same counterterms that eliminate ultraviolet divergences
in flat spacetime, and if not, whether they can be made convergent
by suitable redefinitions of the fields $\zeta$ and $\gamma_{ij}$ appearing in the correlation
functions. This is left as a problem for further work.
Not all theories satisfy the conditions of the theorem of Section IV. For
instance, a non-derivative interaction $\sqrt{-\mathrm{Det}\,g}\,F(\sigma)$ of the $\sigma$ fields would
have $+3$ factors of $a$, and hence would violate the condition that the total
number of factors of $a$ (counting each time derivative as $-2$ factors) must
be no greater than $+1$. The $\sigma$ fields must be the Goldstone bosons of some
broken global symmetry in order to satisfy the conditions of our theorem in
a natural way.

VII. A SAMPLE CALCULATION

As an application of the formalism described in this paper, we will now
calculate the one-loop contribution to the correlation function of two $\zeta$ fields,
which is measured in the spectrum of anisotropies of the cosmic microwave
background. As already mentioned, in the class of theories described in
Section III, this two-point function is dominated by a matter loop, because
there are many types of matter field and only one gravitational field. We
first consider the contribution of second order in the interaction (27). It
saves a great deal of work if we use the interaction-picture field equations
(11) and (13) to put this interaction in the form
$$
H_{\zeta\sigma\sigma}(t) = -\int d^3x\; L_{\zeta\sigma\sigma}(\mathbf{x},t) = A(t) + \dot B(t) \tag{58}
$$
where
$$
A = -2\epsilon Ha^5\sum_n\int d^3x\;\dot\sigma_n^2\,\nabla^{-2}\dot\zeta \tag{59}
$$
$$
B = \sum_n\int d^3x\left(\frac{a\zeta}{H} - \epsilon a^3\nabla^{-2}\dot\zeta\right)\left(\frac{1}{2}(\nabla\sigma_n)^2 + \frac{1}{2}a^2\dot\sigma_n^2\right) . \tag{60}
$$

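In general, for an interaction Hamiltonian of the form (58), Eq. (2) can be
put in the form
$$
\langle Q(t)\rangle = \sum_{N=0}^{\infty} i^N\int_{-\infty}^{t}dt_N\int_{-\infty}^{t_N}dt_{N-1}\cdots\int_{-\infty}^{t_2}dt_1
\times\left\langle\left[\tilde H_I(t_1),\left[\tilde H_I(t_2),\cdots\left[\tilde H_I(t_N),\tilde Q_I(t)\right]\cdots\right]\right]\right\rangle , \tag{61}
$$
where
$$
\tilde H_I(t) = e^{iB(t)}\left[A(t) + \dot B(t) + i\,e^{-iB(t)}\frac{d}{dt}e^{iB(t)}\right]e^{-iB(t)} = A(t) + i[B(t),A(t)] + \frac{i}{2}[B(t),\dot B(t)] + \dots \tag{62}
$$
$$
\tilde Q_I(t) = e^{iB(t)}\,Q_I(t)\,e^{-iB(t)} = Q_I(t) + i[B(t),Q_I(t)] - \frac{1}{2}[B(t),[B(t),Q_I(t)]] + \dots . \tag{63}
$$
To second order in an interaction of the form (58), the expectation value is
then
$$
\langle Q(t)\rangle_2 = -\int_{-\infty}^{t}dt_2\int_{-\infty}^{t_2}dt_1\left\langle\left[A(t_1),\left[A(t_2),Q_I(t)\right]\right]\right\rangle
-\int_{-\infty}^{t}dt_1\left\langle\left[\left[B(t_1),\,A(t_1)+\dot B(t_1)/2\right],\,Q_I(t)\right]\right\rangle
-\left\langle\left[B(t),\left[B(t),Q_I(t)\right]\right]\right\rangle , \tag{64}
$$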

The Fourier transform of the second-order term in the expectation value of
a product of two $\zeta$s is then
$$
\int d^3x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\left\langle\mathrm{vac,in}\right|\zeta(\mathbf{x},t)\,\zeta(\mathbf{x}',t)\left|\mathrm{vac,in}\right\rangle_2
= -\frac{32(2\pi)^9}{q^4}\,\mathrm{Re}\int_{-\infty}^{t}a^5(t_2)\,\epsilon(t_2)\,H(t_2)\,dt_2\int_{-\infty}^{t_2}a^5(t_1)\,\epsilon(t_1)\,H(t_1)\,dt_1
\times\dot\zeta_q(t_1)\,\zeta_q^*(t)\left(\dot\zeta_q(t_2)\,\zeta_q^*(t) - \zeta_q(t)\,\dot\zeta_q^*(t_2)\right)
\times N\int d^3p\int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\;\dot\sigma_p(t_1)\,\dot\sigma_p^*(t_2)\,\dot\sigma_{p'}(t_1)\,\dot\sigma_{p'}^*(t_2)
+\frac{(2\pi)^3}{4q^4}\,N\int d^3p\int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\;(\mathbf{p}\cdot\mathbf{p}')^2\,|\sigma_p(t)|^2\,|\sigma_{p'}(t)|^2 + \dots \tag{65}
$$
where $N$ is the number of $\sigma$ fields. We have shown here explicitly the
contribution of the first and third lines on the right-hand side of Eq. (64).
The dots represent one-loop contributions of the second line, in which $[B, A + \dot B/2]$
plays the role of a $\zeta\zeta\sigma\sigma$ “seagull” interaction, as well as one-loop terms
of first order in the $\zeta\zeta\sigma\sigma$ terms in Eq. (5), in both of which the integral
over internal wave number is $q$-independent, plus counterterms arising in
first order from interactions that cancel ultraviolet divergences in flat space,
including $\sqrt{-\mathrm{Det}\,g}\,R_{\mu\nu}R^{\mu\nu}$ and $\sqrt{-\mathrm{Det}\,g}\,R^2$ terms in the Lagrangian density
that are not included in Eq. (5).
Though it has not been made explicit in this section, we use dimensional
regularization to remove infinities in the integrals over $p$ and $p'$ at intermediate
stages in the calculation, and we now assume that the singularity as
the number of space dimensions approaches three is cancelled by the terms
in Eq. (65) represented by dots, leaving it to future work to show that this
is the case. Then these integrals are dominated by $p \approx p' \approx q$. As we have
seen, the integrals over time are then dominated by the time $t_q$ of horizon
exit, when $q/a(t_q) \simeq H(t_q)$. For simplicity, we will assume (for the first time
in this paper) that the unperturbed inflaton field $\bar\varphi(t)$ is rolling very slowly
down the potential at time $t_q$, so that the expansion near this time can be
approximated as strictly exponential, $a(t) \propto e^{Ht}$. Then the wave functions
are
$$
\sigma_q(t) \simeq \sigma_q^o\,e^{-iq\tau}\left(1 + iq\tau\right) , \qquad \zeta_q(t) \simeq \zeta_q^o\,e^{-iq\tau}\left(1 + iq\tau\right) ,
$$
where $\tau$ is the conformal time
$$
\tau \equiv -\int_t^\infty\frac{dt}{a(t)} ,
$$
and the wave functions outside the horizon have modulus
$$
|\sigma_q^o|^2 = \frac{H^2(t_q)}{2(2\pi)^3q^3} , \qquad |\zeta_q^o|^2 = \frac{H^2(t_q)}{2(2\pi)^3\,\epsilon(t_q)\,q^3} .
$$
Using these wave functions in Eq. (65) gives
$$
\int d^3x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}\left\langle\mathrm{vac,in}\right|\zeta(\mathbf{x},t)\,\zeta(\mathbf{x}',t)\left|\mathrm{vac,in}\right\rangle_2
= \frac{\left(8\pi GH^2(t_q)\right)^2N}{(2\pi)^3}\int d^3p\int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})
\times\left[\frac{p\,p'}{q^7\,(p+p'+q)} + \frac{(\mathbf{p}\cdot\mathbf{p}')^2}{16\,q^4\,p^3\,p'^3}\right] + \dots \tag{66}
$$
with the dots having the same meaning as in Eq. (65).

Simple dimensional analysis tells us that when the integral over internal
wave numbers of the first term in square brackets is made finite by dimensional
regularization, it is converted to

$$\int d^3p \int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\,\frac{p\,p'}{p+p'+q} \;\Rightarrow\; q^{4+\delta}\,F(\delta)\,, \tag{67}$$

where δ is a measure of the difference between the space dimensionality
and three. The ultraviolet divergence in this integral for δ = 0 gives the
function F(δ) a singularity as δ → 0:

$$F(\delta) \to \frac{F_0}{\delta} + F_1\,, \tag{68}$$

so that in the limit δ = 0

$$\int d^3p \int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\,\frac{p\,p'}{p+p'+q} = q^4\Big[F_0\ln q + L\Big]\,, \tag{69}$$
where L is a divergent constant. We can easily calculate the coefficient F0
of the logarithm. For this purpose, we note that, in general,
$$\int d^3p \int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\,f(p,p',q) = \frac{2\pi}{q}\int_0^\infty p\,dp \int_{|p-q|}^{p+q} p'\,dp'\;f(p,p',q) \tag{70}$$
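
The reduction of the delta-function integral to radial integrals can be sketched as follows. This is a reconstruction under the assumption that $f$ depends only on the magnitudes $p$, $p'$, $q$; the symbol $\mu$ (the cosine of the angle between $\mathbf{p}$ and $\mathbf{q}$) is introduced here and does not appear in the original.

```latex
% Sketch: use the delta function to fix p' = |p + q|, then trade the angle for p'.
\begin{align*}
\int d^3p \int d^3p'\,\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\,f(p,p',q)
  &= \int d^3p\; f\big(p,|\mathbf{p}+\mathbf{q}|,q\big) \\
  &= 2\pi\int_0^\infty p^2\,dp\int_{-1}^{1}d\mu\;
     f\big(p,\sqrt{p^2+q^2+2pq\mu},\,q\big) , \\
p'^2 = p^2+q^2+2pq\mu
  &\;\Rightarrow\; p'\,dp' = pq\,d\mu , \qquad |p-q|\le p'\le p+q , \\
\int d^3p \int d^3p'\,\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\,f(p,p',q)
  &= \frac{2\pi}{q}\int_0^\infty p\,dp\int_{|p-q|}^{p+q}p'\,dp'\;f(p,p',q) .
\end{align*}
```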
To eliminate the divergence in the integral over p and p′ , we multiply by q
and differentiate six times with respect to q. A tedious but straightforward
calculation gives

$$\frac{d^6}{dq^6}\left[\,q\int d^3p \int d^3p'\;\delta^3(\mathbf{p}+\mathbf{p}'+\mathbf{q})\,\frac{p\,p'}{p+p'+q}\right] = -\frac{8\pi}{q}$$

Comparing this with the result of applying the same operation to Eq. (69)
then gives F0 = −π/15.
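
The comparison can be made explicit. The following is a short worked step (a sketch of the arithmetic only): multiplying the right-hand side of Eq. (69) by $q$ and differentiating six times removes the divergent constant $L$ and isolates $F_0$.

```latex
% Apply q * (...) followed by d^6/dq^6 to the right-hand side of Eq. (69).
% Uses d^5/dq^5 (q^5 ln q) = 120 ln q + 274.
\begin{align*}
\frac{d^6}{dq^6}\Big[q\cdot q^4\big(F_0\ln q + L\big)\Big]
  &= F_0\,\frac{d^6}{dq^6}\big(q^5\ln q\big) + L\,\frac{d^6}{dq^6}\,q^5 \\
  &= F_0\,\frac{d}{dq}\Big(120\ln q + 274\Big) + 0 = \frac{120\,F_0}{q} , \\
\frac{120\,F_0}{q} = -\frac{8\pi}{q}
  &\;\Rightarrow\; F_0 = -\frac{\pi}{15} .
\end{align*}
```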
In contrast, the integral of the second term in square brackets in Eq. (66)
is a sum of powers of q with divergent coefficients, but with no logarithmic
singularity in q. (This term would be eliminated if we calculated the ex-
pectation value of a product of fields ζ̃ ≡ exp(−iB)ζ exp(iB) instead of ζ.)
The terms represented by dots in Eq. (65) make contributions that are also
just a sum of powers of q with divergent coefficients. We are assuming that
all ultraviolet divergences cancel, but we cannot find the resulting finite power
terms without knowing the renormalized coefficients of the $\sqrt{-{\rm Det}g}\,R_{\mu\nu}R^{\mu\nu}$
and $\sqrt{-{\rm Det}g}\,R^2$ terms in the Lagrangian density. So we are left with the
result (now restoring a suitable power of 8πG) that
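$$\int d^3x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}
\left\langle {\rm vac,in}\left|\,\zeta(\mathbf{x},t)\,\zeta(\mathbf{x}',t)\,\right|{\rm vac,in}\right\rangle_2
= -\frac{\pi\,\big(8\pi G H^2(t_q)\big)^2 N}{15\,(2\pi)^3\,q^3}\,\Big[\ln q + C\Big] \tag{71}$$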

with C an unknown constant. This may be compared with the classical (and
classic) result, that in slow roll inflation this correlation function takes the
form
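$$\int d^3x\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')}
\left\langle {\rm vac,in}\left|\,\zeta(\mathbf{x},t)\,\zeta(\mathbf{x}',t)\,\right|{\rm vac,in}\right\rangle_0
= \frac{8\pi G H^2(t_q)}{4\,(2\pi)^3\,|\epsilon(t_q)|\,q^3} \tag{72}$$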

The one-loop correction (71) is smaller by a factor of order $8\pi G H^2 N |\epsilon(t_q)|$,
so even if N is $10^2$ or $10^3$ this correction is likely to remain unobservable.
Still, it is interesting that even in the extreme slow roll limit, where $H(t_q)$
and $\epsilon(t_q)$ are nearly constant, the factor ln q gives it a different dependence
on the wave number q.
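
The quoted suppression factor can be read off by dividing Eq. (71) by Eq. (72). The following worked ratio is a sketch of that arithmetic only:

```latex
% Ratio of the one-loop result (71) to the classical result (72).
\begin{align*}
\frac{\text{Eq.~(71)}}{\text{Eq.~(72)}}
 &= -\frac{\pi\,\big(8\pi G H^2(t_q)\big)^2 N}{15\,(2\pi)^3 q^3}\,\big[\ln q + C\big]
    \times \frac{4\,(2\pi)^3\,|\epsilon(t_q)|\,q^3}{8\pi G H^2(t_q)} \\
 &= -\frac{4\pi}{15}\;8\pi G H^2(t_q)\,N\,|\epsilon(t_q)|\;\big[\ln q + C\big] .
\end{align*}
```

Apart from the numerical coefficient and the logarithm, this is the factor $8\pi G H^2 N |\epsilon(t_q)|$ quoted in the text.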

ACKNOWLEDGMENTS

For helpful conversations I am grateful to K. Chaicherdsakul, S. Deser,
W. Fischler, E. Komatsu, J. Maldacena, A. Vilenkin, and R. Woodard. This
material is based upon work supported by the National Science Foundation
under Grants Nos. PHY-0071512 and PHY-0455649 and with support from
The Robert A. Welch Foundation, Grant No. F-0014, and also grant support
from the US Navy, Office of Naval Research, Grant Nos. N00014-03-1-0639
and N00014-04-1-0336, Quantum Optics Initiative.

APPENDIX: THE IN-IN FORMALISM
1. Time Dependence
First, it is necessary to be precise about the origin of the time-dependence
of the fluctuation Hamiltonian in applications such as those encountered in
cosmology. Consider a general Hamiltonian system, with canonical variables
φa (x, t) and conjugates πa (x, t) satisfying the commutation relations
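$$\big[\phi_a(\mathbf{x},t),\pi_b(\mathbf{y},t)\big] = i\delta_{ab}\,\delta^3(\mathbf{x}-\mathbf{y})\,, \qquad \big[\phi_a(\mathbf{x},t),\phi_b(\mathbf{y},t)\big] = \big[\pi_a(\mathbf{x},t),\pi_b(\mathbf{y},t)\big] = 0\,, \tag{A.1}$$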
and the equations of motion
$$\dot\phi_a(\mathbf{x},t) = i\Big[H[\phi(t),\pi(t)],\,\phi_a(\mathbf{x},t)\Big]\,, \qquad \dot\pi_a(\mathbf{x},t) = i\Big[H[\phi(t),\pi(t)],\,\pi_a(\mathbf{x},t)\Big]\,. \tag{A.2}$$
Here a is a compound index labeling particular fields and their spin com-
ponents. The Hamiltonian H is a functional of the φa (x, t) and πa (x, t) at
fixed time t, which according to Eq. (A.2) is of course independent of the
time at which these variables are evaluated.
We assume the existence of a time-dependent c-number solution φ̄a (x, t),
π̄a (x, t), satisfying the classical equations of motion:
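$$\dot{\bar\phi}_a(\mathbf{x},t) = \frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\pi_a(\mathbf{x},t)}\,, \qquad \dot{\bar\pi}_a(\mathbf{x},t) = -\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\phi_a(\mathbf{x},t)}\,, \tag{A.3}$$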
and we expand around this solution, writing
φa (x, t) = φ̄a (x, t) + δφa (x, t) , πa (x, t) = π̄a (x, t) + δπa (x, t) . (A.4)
(In cosmology, φ̄a would describe the Robertson–Walker metric and the
expectation values of various scalar fields.) Of course, since c-numbers com-
mute with everything, the fluctuations satisfy the same commutation rules
(A.1) as the total variables:
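$$\big[\delta\phi_a(\mathbf{x},t),\delta\pi_b(\mathbf{y},t)\big] = i\delta_{ab}\,\delta^3(\mathbf{x}-\mathbf{y})\,, \qquad \big[\delta\phi_a(\mathbf{x},t),\delta\phi_b(\mathbf{y},t)\big] = \big[\delta\pi_a(\mathbf{x},t),\delta\pi_b(\mathbf{y},t)\big] = 0\,. \tag{A.5}$$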
When the Hamiltonian is expanded in powers of the perturbations δφa (x, t)
and δπa (x, t) at some definite time t, we encounter terms of zeroth and first
order in the perturbations, as well as time-dependent terms of second and
higher order:
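$$
\begin{aligned}
H[\phi(t),\pi(t)] = {} & H[\bar\phi(t),\bar\pi(t)]
+ \sum_a\int d^3x\,\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\phi_a(\mathbf{x},t)}\,\delta\phi_a(\mathbf{x},t)
+ \sum_a\int d^3x\,\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\pi_a(\mathbf{x},t)}\,\delta\pi_a(\mathbf{x},t) \\
& + \tilde H[\delta\phi(t),\delta\pi(t);t]\,,
\end{aligned}
\tag{A.6}
$$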
where H̃[δφ(t), δπ(t); t] is the sum of all terms in H[φ̄(t)+δφ(t), π̄(t)+δπ(t)]
of second and higher order in the δφ(x, t) and/or δπ(x, t).
Now, although H generates the time-dependence of φa (x, t) and πa (x, t),
it is H̃ rather than H that generates the time dependence of δφa (x, t) and
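δπa (x, t). That is, Eq. (A.2) gives

$$\dot{\bar\phi}_a(\mathbf{x},t) + \delta\dot\phi_a(\mathbf{x},t) = i\Big[H[\phi(t),\pi(t)],\,\delta\phi_a(\mathbf{x},t)\Big]\,,$$

$$\dot{\bar\pi}_a(\mathbf{x},t) + \delta\dot\pi_a(\mathbf{x},t) = i\Big[H[\phi(t),\pi(t)],\,\delta\pi_a(\mathbf{x},t)\Big]\,,$$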

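while Eqs. (A.5) and (A.3) give

$$i\left[\sum_b\int d^3y\,\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\phi_b(\mathbf{y},t)}\,\delta\phi_b(\mathbf{y},t) + \sum_b\int d^3y\,\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\pi_b(\mathbf{y},t)}\,\delta\pi_b(\mathbf{y},t)\,,\;\delta\phi_a(\mathbf{x},t)\right] = \dot{\bar\phi}_a(\mathbf{x},t)\,,$$

$$i\left[\sum_b\int d^3y\,\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\phi_b(\mathbf{y},t)}\,\delta\phi_b(\mathbf{y},t) + \sum_b\int d^3y\,\frac{\delta H[\bar\phi(t),\bar\pi(t)]}{\delta\bar\pi_b(\mathbf{y},t)}\,\delta\pi_b(\mathbf{y},t)\,,\;\delta\pi_a(\mathbf{x},t)\right] = \dot{\bar\pi}_a(\mathbf{x},t)\,.$$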
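Subtracting, we find

$$\delta\dot\phi_a(\mathbf{x},t) = i\Big[\tilde H[\delta\phi(t),\delta\pi(t);t],\,\delta\phi_a(\mathbf{x},t)\Big]\,, \qquad \delta\dot\pi_a(\mathbf{x},t) = i\Big[\tilde H[\delta\phi(t),\delta\pi(t);t],\,\delta\pi_a(\mathbf{x},t)\Big]\,. \tag{A.7}$$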
This then is our prescription for constructing the time-dependent Hamilto-
nian H̃ that governs the time-dependence of the fluctuations: expand the
original Hamiltonian H in powers of fluctuations δφ and δπ, and throw
away the terms of zeroth and first order in these fluctuations. It is this
construction that gives H̃ an explicit dependence on time.
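
A one-variable illustration may help. The toy Hamiltonian $H = \tfrac12\pi^2 + V(\phi)$ below is an assumption made for this sketch (it is not taken from the paper); it shows how discarding the zeroth- and first-order terms leaves a fluctuation Hamiltonian whose time dependence enters only through the classical solution $\bar\phi(t)$.

```latex
% Toy example (single degree of freedom, hypothetical potential V):
% expand H around a classical solution and keep terms of second and higher order.
\begin{align*}
H &= \tfrac12\pi^2 + V(\phi), \qquad
\dot{\bar\phi} = \bar\pi, \quad \dot{\bar\pi} = -V'(\bar\phi), \\
H[\bar\phi+\delta\phi,\bar\pi+\delta\pi]
  &= \underbrace{\tfrac12\bar\pi^2 + V(\bar\phi)}_{\text{zeroth order}}
   + \underbrace{\bar\pi\,\delta\pi + V'(\bar\phi)\,\delta\phi}_{\text{first order}}
   + \tilde H[\delta\phi,\delta\pi;t], \\
\tilde H[\delta\phi,\delta\pi;t]
  &= \tfrac12\delta\pi^2 + \tfrac12 V''\big(\bar\phi(t)\big)\,\delta\phi^2
   + \tfrac16 V'''\big(\bar\phi(t)\big)\,\delta\phi^3 + \cdots .
\end{align*}
```

In this example the quadratic terms play the role of a free Hamiltonian with a time-dependent frequency, and the cubic and higher terms are the interaction.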

2. Operator Formalism for Expectation Values


We consider a general Hamiltonian system, of the sort described in the
previous subsection. It follows from Eq. (A.7) that the fluctuations at time
t can be expressed in terms of the same operators at some very early time
t0 through a unitary transformation

$$\delta\phi_a(t) = U^{-1}(t,t_0)\,\delta\phi_a(t_0)\,U(t,t_0)\,, \qquad \delta\pi_a(t) = U^{-1}(t,t_0)\,\delta\pi_a(t_0)\,U(t,t_0)\,, \tag{A.8}$$

where U (t, t0 ) is defined by the differential equation

$$\frac{d}{dt}U(t,t_0) = -i\,\tilde H[\delta\phi(t_0),\delta\pi(t_0);t]\;U(t,t_0) \tag{A.9}$$

and the initial condition

$$U(t_0,t_0) = 1\,. \tag{A.10}$$

In the application that concerns us in cosmology, we can take t0 = −∞, by
which we mean any time early enough so that the wavelengths of interest
are deep inside the horizon.
To calculate U (t, t0 ), we now further decompose H̃ into a kinematic term
H0 that is quadratic in the fluctuations, and an interaction term HI :

H̃[δφ(t), δπ(t); t] = H0 [δφ(t), δπ(t); t] + HI [δφ(t), δπ(t); t] , (A.11)

and we seek to calculate U as a power series in $H_I$. To this end, we introduce
an “interaction picture”: we define fluctuation operators $\delta\phi^I_a(t)$ and
$\delta\pi^I_a(t)$ whose time dependence is generated by the quadratic part of the
Hamiltonian:

$$\delta\dot\phi^I_a(t) = i\Big[H_0[\delta\phi^I(t),\delta\pi^I(t);t],\,\delta\phi^I_a(t)\Big]\,, \qquad \delta\dot\pi^I_a(t) = i\Big[H_0[\delta\phi^I(t),\delta\pi^I(t);t],\,\delta\pi^I_a(t)\Big]\,, \tag{A.12}$$

and the initial conditions

$$\delta\phi^I_a(t_0) = \delta\phi_a(t_0)\,, \qquad \delta\pi^I_a(t_0) = \delta\pi_a(t_0)\,. \tag{A.13}$$

Because $H_0$ is quadratic, the interaction picture operators are free fields,
satisfying linear wave equations.
It follows from Eq. (A.12) that in evaluating $H_0[\delta\phi^I,\delta\pi^I;t]$ we can take
the time argument of $\delta\phi^I$ and $\delta\pi^I$ to have any value, and in particular we
can take it as $t_0$, so that

$$H_0[\delta\phi^I(t),\delta\pi^I(t);t] = H_0[\delta\phi(t_0),\delta\pi(t_0);t]\,, \tag{A.14}$$

but the intrinsic time-dependence of $H_0$ still remains. The solution of
Eq. (A.12) can again be written as a unitary transformation:

$$\delta\phi^I_a(t) = U_0^{-1}(t,t_0)\,\delta\phi_a(t_0)\,U_0(t,t_0)\,, \qquad \delta\pi^I_a(t) = U_0^{-1}(t,t_0)\,\delta\pi_a(t_0)\,U_0(t,t_0)\,, \tag{A.15}$$

with $U_0$ defined by the differential equation

$$\frac{d}{dt}U_0(t,t_0) = -i\,H_0[\delta\phi(t_0),\delta\pi(t_0);t]\;U_0(t,t_0) \tag{A.16}$$

and the initial condition

$$U_0(t_0,t_0) = 1\,. \tag{A.17}$$

Then from Eqs. (A.9) and (A.16) we have

$$\frac{d}{dt}\Big[U_0^{-1}(t,t_0)\,U(t,t_0)\Big] = -i\,U_0^{-1}(t,t_0)\,H_I[\delta\phi(t_0),\delta\pi(t_0);t]\;U(t,t_0)\,.$$

Using Eq. (A.15), this gives

$$U(t,t_0) = U_0(t,t_0)\,F(t,t_0)\,, \tag{A.18}$$

where

$$\frac{d}{dt}F(t,t_0) = -i\,H_I(t)\,F(t,t_0)\,, \qquad F(t_0,t_0) = 1\,, \tag{A.19}$$

and $H_I(t)$ is the interaction Hamiltonian in the interaction picture:

$$H_I(t) \equiv U_0^{-1}(t,t_0)\,H_I[\delta\phi(t_0),\delta\pi(t_0);t]\,U_0(t,t_0) = H_I[\delta\phi^I(t),\delta\pi^I(t);t]\,. \tag{A.20}$$

The solution of equations like (A.19) is well known:

$$F(t,t_0) = T\exp\left(-i\int_{t_0}^{t}H_I(t)\,dt\right) \tag{A.21}$$

where T indicates that the products of $H_I$s in the power series expansion of
the exponential are to be time-ordered; that is, they are to be written from
left to right in the decreasing order of time arguments. The solution for the
fluctuations in terms of the free fields of the interaction picture is then given
by Eqs. (A.8) and (A.15) as

$$
\begin{aligned}
Q(t) &= F^{-1}(t,t_0)\,Q^I(t)\,F(t,t_0) \\
     &= \left[\bar T\exp\left(i\int_{t_0}^{t}H_I(t)\,dt\right)\right] Q^I(t) \left[T\exp\left(-i\int_{t_0}^{t}H_I(t)\,dt\right)\right]\,,
\end{aligned}
\tag{A.22}
$$

where Q(t) is any δφ(x, t) or δπ(x, t) or any product of the δφs and/or δπs,
all at the same time t but in general with different space coordinates, and
$Q^I(t)$ is the same product of $\delta\phi^I(\mathbf{x},t)$ and/or $\delta\pi^I(\mathbf{x},t)$. Also, T̄ denotes
anti-time-ordering: products of HI s in the power series expansion of the
exponential are to be written from left to right in the increasing order of
time arguments.
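
To see what Eq. (A.22) means in practice, it can be expanded in powers of the interaction. The following is a sketch of the first two orders, obtained by expanding both exponentials and collecting terms; the integration labels $t_1$, $t_2$ are introduced here.

```latex
% Expansion of Eq. (A.22) through second order in H_I.
% The nested integrals run over t_0 < t_1 < t_2 < t.
\begin{align*}
Q(t) &= Q^I(t) + i\int_{t_0}^{t}dt_1\,\big[H_I(t_1),\,Q^I(t)\big] \\
     &\quad + i^2\int_{t_0}^{t}dt_2\int_{t_0}^{t_2}dt_1\,
        \Big[H_I(t_1),\big[H_I(t_2),\,Q^I(t)\big]\Big] + \cdots
\end{align*}
```

At first order the anti-time-ordered factor contributes $+iH_I Q^I$ and the time-ordered factor contributes $-iQ^I H_I$, which combine into the single commutator; the second-order term combines the two same-side double integrals with the cross term $H_I Q^I H_I$ in the same way.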

3. Diagrammatic Formalism for Expectation Values


We want to use Eq. (A.22) to calculate the expectation value ⟨Q(t)⟩ of
the product Q(t) in a “Bunch–Davies” vacuum, annihilated by the
positive-frequency part of the interaction picture fluctuations $\delta\phi^I$ and $\delta\pi^I$. We can
use the familiar Wick theorem to express the vacuum expectation value of
the right-hand side of Eq. (A.22) as a sum over pairings of the $\delta\phi^I$ and $\delta\pi^I$
with each other. (This of course is the same as supposing the interaction-
picture fields in HI (t) and QI (t) to be governed by a Gaussian probability

distribution, except that the order of operators in bilinear averages has to
be the same as the order in which they appear in Eq. (A.22).) Expand-
ing Eq. (A.22) as a sum of products of bilinear products leads to a set of
diagrammatic rules, but one that is rather complicated.
In calculating the term in ⟨Q⟩ of Nth order in the interaction, we draw
all diagrams with N vertices. Just as for ordinary Feynman diagrams, each
vertex is labeled with a space and time coordinate, and has lines attached
corresponding to the fields in the interaction. There are also external lines,
one for each field operator in the product Q, labeled with the different
space coordinates and the common time t in the arguments of these fields.
All external lines are connected to vertices or other external lines, and all
remaining lines attached to vertices are attached to other vertices. But there
are significant differences between the rules following from Eq. (A.22) and
the usual Feynman rules:
• We have to distinguish between “right” and “left” vertices, arising
respectively from the time-ordered product and the anti-time-ordered
product. A diagram with N vertices contributes a sum over all $2^N$
ways of choosing each vertex to be a left vertex or a right vertex.
Each right or left vertex contributes a factor −i or +i, respectively, as
well as whatever coupling parameters appear in the interaction.

• A line connecting two right vertices or a right vertex and an external
line, in which it is associated with field operators A(x, t′) and B(y, t′′),
contributes a conventional Feynman propagator ⟨T{A(x, t′)B(y, t′′)}⟩.
(It will be understood here and below, that in calculating propagators
all fields A, B, etc. are taken in the interaction picture, and can
be $\delta\phi^I$s and/or $\delta\pi^I$s.) As a special case, if B is associated with an
external line then t′′ = t, and since t′ ≤ t, this is ⟨B(y, t)A(x, t′)⟩.

• A line connecting two left vertices, associated with field operators
A(x, t′) and B(y, t′′), contributes a propagator ⟨T̄{A(x, t′)B(y, t′′)}⟩.
As a special case, if B is associated with an external line then t′′ = t,
and this is ⟨A(x, t′)B(y, t)⟩.

• A line connecting a left vertex, in which it is associated with a field
operator A(x, t′), to a right vertex, in which it is associated with a
field operator B(y, t′′), contributes a propagator ⟨A(x, t′)B(y, t′′)⟩.

• We must integrate over all the times t′, t′′, . . ., associated with the
vertices from t0 to t, as well as over all space coordinates associated
with the vertices.
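
The left/right bookkeeping in the first rule can be sketched in a few lines of Python. This is only an illustration: the function name `vertex_assignments` and the labels "R" and "L" are assumptions made here, and coupling parameters and propagators are omitted, so the code enumerates only the $2^N$ labellings and the product of the factors −i (right vertex) and +i (left vertex).

```python
from itertools import product


def vertex_assignments(n_vertices):
    """Yield (labels, factor) for every left/right labelling of the vertices.

    Each right vertex ("R") contributes -i and each left vertex ("L")
    contributes +i; coupling constants and propagators are not included.
    """
    for labels in product("RL", repeat=n_vertices):
        factor = 1 + 0j
        for label in labels:
            factor *= -1j if label == "R" else 1j
        yield "".join(labels), factor


if __name__ == "__main__":
    # For N = 2 there are 2**2 = 4 labellings:
    # RR -> -1, RL -> +1, LR -> +1, LL -> -1
    for labels, factor in vertex_assignments(2):
        print(labels, factor)
```

Each labelling must then be multiplied by the propagators appropriate to its lines (time-ordered, anti-time-ordered, or unordered) before the sum over labellings is taken.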
We must say a word about the disconnected parts of diagrams. A vac-
uum fluctuation subdiagram is one in which each vertex is connected only to
other vertices, not to external lines. Just as in ordinary quantum field the-
ories, the sum of all vacuum fluctuation diagrams contributes a numerical
factor multiplying the contribution of diagrams in which vacuum fluctua-
tions are excluded. But unlike the case of ordinary quantum field theory,
this numerical factor is not a phase factor, but is simply

$$\left[\bar T\exp\left(i\int_{t_0}^{t}H_I(t)\,dt\right)\right]\left[T\exp\left(-i\int_{t_0}^{t}H_I(t)\,dt\right)\right] = 1\,. \tag{A.23}$$

Hence in the “in-in” formalism all vacuum fluctuation diagrams automatically
cancel. Even so, a diagram may contain disconnected parts which do
not cancel, such as external lines passing through the diagram without
interacting. Ignoring all disconnected parts gives what in the theory of noise
is known as the cumulants of expectation values,$^{10}$ from which the full
expectation values can easily be calculated as a sum of products of cumulants.

4. Path Integral Derivation of the Diagrammatic Rules.


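It is often preferable to use path integration instead of the operator
formalism, in order to derive the Feynman rules directly from the Lagrangian
rather than from the Hamiltonian, or to make available a larger range of
gauge choices, or to go beyond perturbation theory. Going back to Eq. (1),
and following the same reasoning$^{11}$ that leads from the operator formalism
to the path-integral formalism in the calculation of S-matrix elements, we
see that the vacuum expectation value of any product Q(t) of δφs and δπs
at the same time t (now taking t0 = −∞) is

$$
\begin{aligned}
\langle Q(t)\rangle = {} & \int \prod_{\mathbf{x},t',a} d\delta\phi_{La}(\mathbf{x},t') \prod_{\mathbf{x},t',a}\frac{d\delta\pi_{La}(\mathbf{x},t')}{2\pi} \prod_{\mathbf{x},t',a} d\delta\phi_{Ra}(\mathbf{x},t') \prod_{\mathbf{x},t',a}\frac{d\delta\pi_{Ra}(\mathbf{x},t')}{2\pi} \\
& \times \exp\left\{-i\int_{-\infty}^{t}dt'\left[\sum_a\int d^3x\;\delta\dot\phi_{La}(\mathbf{x},t')\,\delta\pi_{La}(\mathbf{x},t') - \tilde H[\delta\phi_L(t'),\delta\pi_L(t');t']\right]\right\} \\
& \times \exp\left\{i\int_{-\infty}^{t}dt'\left[\sum_a\int d^3x\;\delta\dot\phi_{Ra}(\mathbf{x},t')\,\delta\pi_{Ra}(\mathbf{x},t') - \tilde H[\delta\phi_R(t'),\delta\pi_R(t');t']\right]\right\} \\
& \times \prod_{\mathbf{x},a}\delta\Big(\delta\phi_{La}(\mathbf{x},t) - \delta\phi_{Ra}(\mathbf{x},t)\Big)\,\delta\Big(\delta\pi_{La}(\mathbf{x},t) - \delta\pi_{Ra}(\mathbf{x},t)\Big)\;Q\big[\delta\phi_L(t),\delta\pi_L(t)\big] \\
& \times \Psi_0^*\big[\delta\phi_L(-\infty)\big]\;\Psi_0\big[\delta\phi_R(-\infty)\big]\,.
\end{aligned}
\tag{A.24}
$$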

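Here the functional $\Psi_0[\delta\phi]$ is the wave function of the vacuum,$^{12}$

$$
\begin{aligned}
\Psi_0[\delta\phi(-\infty)] &\propto \exp\left(-\frac12\sum_{a,b}\int d^3x\int d^3y\;E_{ab}(\mathbf{x},\mathbf{y})\,\delta\phi_a(\mathbf{x},-\infty)\,\delta\phi_b(\mathbf{y},-\infty)\right) \\
&= \exp\left(-\frac{\epsilon}{2}\int_{-\infty}^{t}dt'\,e^{\epsilon t'}\sum_{a,b}\int d^3x\int d^3y\;E_{ab}(\mathbf{x},\mathbf{y})\,\delta\phi_a(\mathbf{x},t')\,\delta\phi_b(\mathbf{y},t')\right)\,,
\end{aligned}
\tag{A.25}
$$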

where $E_{ab}$ is a positive-definite kernel. For instance, for a real scalar field of
mass m,

$$E(\mathbf{x},\mathbf{y}) \equiv \frac{1}{(2\pi)^3}\int d^3p\;e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})}\,\sqrt{p^2+m^2}\,. \tag{A.26}$$

As is well known, if the Hamiltonian is quadratic in the canonical
conjugates $\delta\pi_a$ with a field-independent coefficient in the term of second order,
then we can integrate over the $\delta\pi_a$ by simply setting $\delta\dot\phi_a = \partial\tilde H/\partial\delta\pi_a$, and
the quantity $\sum_a\delta\dot\phi_a(t')\,\delta\pi_a(t') - \tilde H[\delta\phi(t'),\delta\pi(t');t']$ in Eq. (A.24) then
becomes the original Lagrangian. We will not pursue this here, but will rather
take up a puzzle that at first sight seems to throw doubt on the equivalence
of the path integral formula (A.24), when we do not integrate out the πs,
with the operator formalism.
The puzzle is that, although the propagators for lines connecting left
vertices to each other or right vertices to each other or left or right vertices
to external lines are Green's functions of the sort that familiarly emerge
from path integrals, what are we to make of the propagators arising from
Eq. (A.22) for lines connecting left vertices with right vertices? These are not
Green's functions; that is, they are solutions of homogeneous wave equations,
not of inhomogeneous wave equations with a delta function source. As
we shall see, the source of these propagators lies in the delta functions in
Eq. (A.24). It is these delta functions that tie together the integrals over
the L variables and over the R variables, so that the expression (A.24) does
not factor into a product of these integrals.
In analyzing the consequences of Eq. (A.24), it is convenient to condense
our notation yet further, and let a variable ξn (t) stand for all the δφa (x, t)
and δπa (x, t), so that n runs over positions in space and whatever discrete
indices are used to distinguish different fields, plus a two-valued index that
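distinguishes δφ from δπ. With this understanding, Eq. (A.24) reads

$$
\begin{aligned}
\langle Q(t)\rangle = {} & \int\left(\prod_{t',n}\frac{d\xi_{Ln}(t')}{\sqrt{2\pi}}\right)\left(\prod_{t',n}\frac{d\xi_{Rn}(t')}{\sqrt{2\pi}}\right) \\
& \times \exp\left\{-i\int_{-\infty}^{t}dt'\,\tilde L\big[\xi_L(t'),\dot\xi_L(t');t'\big]\right\}\exp\left\{i\int_{-\infty}^{t}dt'\,\tilde L\big[\xi_R(t'),\dot\xi_R(t');t'\big]\right\} \\
& \times \left(\prod_n\delta\big(\xi_{Ln}(t)-\xi_{Rn}(t)\big)\right) Q\big(\xi_L(t)\big)\;\Psi_0^*\big(\xi_L(-\infty)\big)\,\Psi_0\big(\xi_R(-\infty)\big)\,,
\end{aligned}
\tag{A.27}
$$

where

$$\tilde L[\xi(t'),\dot\xi(t');t'] \equiv \sum_a\int d^3x\;\delta\pi_a(\mathbf{x},t')\,\delta\dot\phi_a(\mathbf{x},t') - \tilde H\big[\delta\phi(t'),\delta\pi(t');t'\big]\,. \tag{A.28}$$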
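To expand in powers of the interaction, we split $\tilde L$ into a term $\tilde L_0$ that is
quadratic in the fluctuations, plus an interaction term $-\tilde H_I$:

$$\tilde L = \tilde L_0 - \tilde H_I\,, \tag{A.29}$$

where

$$\tilde L_0[\xi(t'),\dot\xi(t');t'] = \sum_a\int d^3x\;\delta\dot\phi_a(\mathbf{x},t')\,\delta\pi_a(\mathbf{x},t') - \tilde H_0\big[\delta\phi(t'),\delta\pi(t');t'\big]\,. \tag{A.30}$$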
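As in calculations of the S-matrix, we will include the argument of the
exponential in the vacuum wave functions along with the quadratic part of
the Lagrangian, writing

$$
\begin{aligned}
& \int_{-\infty}^{t}dt'\left\{\tilde L_0\big[\xi_R(t'),\dot\xi_R(t');t'\big] + \frac{i\epsilon}{2}\sum_{ab}\int d^3x\int d^3y\;E_{ab}(\mathbf{x},\mathbf{y})\,\delta\phi_{Ra}(\mathbf{x},t')\,\delta\phi_{Rb}(\mathbf{y},t')\right\} \\
& \qquad \equiv \frac12\sum_{nn'}\sum_{t',t''}D^R_{nt',n't''}\,\xi_{Rn}(t')\,\xi_{Rn'}(t'')\,,
\end{aligned}
\tag{A.31}
$$

$$
\begin{aligned}
& \int_{-\infty}^{t}dt'\left\{\tilde L_0\big[\xi_L(t'),\dot\xi_L(t');t'\big] - \frac{i\epsilon}{2}\sum_{ab}\int d^3x\int d^3y\;E_{ab}(\mathbf{x},\mathbf{y})\,\delta\phi_{La}(\mathbf{x},t')\,\delta\phi_{Lb}(\mathbf{y},t')\right\} \\
& \qquad \equiv \frac12\sum_{nn'}\sum_{t',t''}D^L_{nt',n't''}\,\xi_{Ln}(t')\,\xi_{Ln'}(t'')\,.
\end{aligned}
\tag{A.32}
$$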

The vacuum wave function is the same for $\xi_L$ and $\xi_R$, but it is combined here
with an exponential $\exp(-i\int\tilde L_0)$ for the $\xi_{Ln}$ and an exponential $\exp(+i\int\tilde L_0)$
for the $\xi_{Rn}$, which accounts for the different signs of the iε terms in Eqs. (A.31)
and (A.32). (The factor $e^{\epsilon t'}$ in Eq. (A.25) is effectively equal to one for any
finite t′, and has therefore been dropped.) We also express the product of
delta functions in Eq. (A.27) as a Gaussian:

$$
\begin{aligned}
\prod_n\delta\big(\xi_{Ln}(t)-\xi_{Rn}(t)\big) &\propto \exp\left(-\frac{1}{\epsilon'}\sum_n\big(\xi_{Ln}(t)-\xi_{Rn}(t)\big)^2\right) \\
&= \exp\left(-\sum_{nn'}\sum_{t't''}C_{nt',n't''}\,\big(\xi_{Ln}(t')-\xi_{Rn}(t')\big)\big(\xi_{Ln'}(t'')-\xi_{Rn'}(t'')\big)\right)\,,
\end{aligned}
\tag{A.33}
$$

where

$$C_{nt',n't''} \equiv \frac{1}{\epsilon'}\,\delta_{nn'}\,\delta(t'-t)\,\delta(t''-t)\,, \tag{A.34}$$
and ε′ is another positive infinitesimal.
Following the usual rules for integrating a Gaussian times a polynomial,
the integral is given by a sum over diagrams as described above, but with
a line that connects right vertices with each other (or with external lines)
contributing a factor $-i\Delta^{RR}_{nt',n't''}$, a line that connects left vertices with each
other (or with external lines) contributing a factor $i\Delta^{LL}_{nt',n't''}$, and a line
that connects a right vertex where it is associated with $\xi_n(t')$ with a left
vertex associated with $\xi_{n'}(t'')$ contributing a factor $i\Delta^{RL}_{nt',n't''}$, with the Δs
determined by the condition

$$\begin{pmatrix} iD^R - C & C \\ C & -iD^L - C \end{pmatrix}\begin{pmatrix} -i\Delta^{RR} & i\Delta^{RL} \\ i(\Delta^{RL})^T & i\Delta^{LL} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\,. \tag{A.35}$$

This must hold whatever tiny value we give to ε′, and so

$$D^R\Delta^{RR} = 1\,, \qquad D^L\Delta^{LL} = 1\,, \tag{A.36}$$

$$D^R\Delta^{RL} = 0\,, \qquad D^L\big(\Delta^{RL}\big)^T = 0\,, \tag{A.37}$$

$$C\Delta^{LL} = C\Delta^{RL}\,, \qquad C\Delta^{RR} = -C\big(\Delta^{RL}\big)^T\,. \tag{A.38}$$
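
The step from Eq. (A.35) to Eqs. (A.36)–(A.38) can be spelled out by multiplying the blocks. The following is a sketch of that algebra; the labels (1,1), (1,2), etc. refer to the block entries of the product and are introduced here.

```latex
% Block-by-block product of the two matrices in Eq. (A.35),
% set equal to the corresponding entry of the unit matrix.
\begin{align*}
\text{(1,1):}\quad & D^R\Delta^{RR} + iC\big[\Delta^{RR} + (\Delta^{RL})^T\big] = 1 , \\
\text{(1,2):}\quad & -D^R\Delta^{RL} + iC\big[\Delta^{LL} - \Delta^{RL}\big] = 0 , \\
\text{(2,1):}\quad & D^L(\Delta^{RL})^T - iC\big[\Delta^{RR} + (\Delta^{RL})^T\big] = 0 , \\
\text{(2,2):}\quad & D^L\Delta^{LL} - iC\big[\Delta^{LL} - \Delta^{RL}\big] = 1 .
\end{align*}
```

Because C is proportional to 1/ε′, the terms multiplied by C must vanish separately from the terms that are independent of ε′. The ε′-independent parts give Eqs. (A.36) and (A.37), and the parts proportional to C give Eqs. (A.38).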


The first Eq. (A.36) is the usual inhomogeneous wave equation for the
propagator, whose solution, as is well known, is

$$-i\Delta^{RR}_{nt',n't''} = \langle T\{\xi_n(t')\,\xi_{n'}(t'')\}\rangle\,, \tag{A.39}$$

with the time-ordering dictated by the +iε in Eq. (A.31). The second
Eq. (A.36) is the complex conjugate of the first wave equation, whose
solution is the complex conjugate of Eq. (A.39):

$$i\Delta^{LL}_{nt',n't''} = \langle \bar T\{\xi_n(t')\,\xi_{n'}(t'')\}\rangle\,. \tag{A.40}$$

Eqs. (A.39) and (A.40) thus give the same propagators for lines connecting
right vertices with each other or with external lines, and for lines connecting
left vertices with each other or with external lines, as we encountered
in the operator formalism. Equations (A.37) tell us that $\Delta^{RL}$ and $(\Delta^{RL})^T$
satisfy the homogeneous versions of the wave equations satisfied by $\Delta^{RR}$ and
$\Delta^{LL}$, but to find $\Delta^{RL}$ we also need an initial condition. This is provided by
the first of Eqs. (A.38), which in more detail reads

$$i\Delta^{RL}_{nm}(t,t_2) = i\Delta^{LL}_{nm}(t,t_2) = \langle \bar T\{\xi_n(t)\,\xi_m(t_2)\}\rangle = \langle \xi_m(t_2)\,\xi_n(t)\rangle\,, \tag{A.41}$$

in which we have used the fact that $t > t_2$. This, together with the first of
Eqs. (A.37), tells us that

$$i\Delta^{RL}_{nm}(t_1,t_2) = \langle \xi_m(t_2)\,\xi_n(t_1)\rangle\,, \tag{A.42}$$

which is the same propagator for internal lines connecting right vertices with
left vertices that we found in the operator formalism.

5. Tree Graphs and Classical Solutions.


We will now verify the remark made in Section I, that the usual approach
to the calculation of non-Gaussian correlations, of solving the classical field
equations beyond the linear approximation, simply corresponds to the
calculation of tree diagrams in the “in-in” formalism. This is a well-known
result$^{13}$ in the usual applications of quantum field theory, but some
modifications in the usual argument are needed in the “in-in” formalism, in which
the vacuum persistence functional is always unity whether or not we add a
current term to the Lagrangian.
We begin by introducing a generating functional W[J, t, g] for correlation
functions of fields at a fixed time t:

$$e^{W[J,t,g]/g} \equiv \left\langle {\rm vac,in}\left|\,\exp\left(\frac{1}{g}\sum_a\int d^3x\;\delta\phi_a(\mathbf{x},t)\,J_a(\mathbf{x})\right)\right|{\rm vac,in}\right\rangle_g\,, \tag{A.43}$$

where $J_a$ is an arbitrary current, and g a real parameter, with the
subscript g indicating that the expectation value is to be calculated using a
Lagrangian density multiplied by a factor 1/g. (This is different from the
usual definition of the effective action, because here we are not introducing
the current into the Lagrangian.) The quantity of physical interest is of
course W[J, t, 1], from which expectation values of all products of fields can
be found by expanding in powers of the current.
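Using Eq. (A.27), we can calculate W as the path integral

$$
\begin{aligned}
e^{W[J,t,g]/g} = {} & \int\prod d\delta\phi_L\int\prod d\delta\pi_L\int\prod d\delta\phi_R\int\prod d\delta\pi_R \\
& \times \exp\left\{-i\int_{-\infty}^{t}dt'\,\frac{1}{g}\,\tilde L[\delta\phi_L,\delta\pi_L;t']\right\} \\
& \times \exp\left\{+i\int_{-\infty}^{t}dt'\,\frac{1}{g}\,\tilde L[\delta\phi_R,\delta\pi_R;t']\right\} \\
& \times \prod\delta\big[\delta\phi_L(t)-\delta\phi_R(t)\big]\;\prod\delta\big[\delta\pi_L(t)-\delta\pi_R(t)\big] \\
& \times \exp\left(\frac{1}{g}\sum_a\int d^3x\;\delta\phi_a(\mathbf{x},t)\,J_a(\mathbf{x})\right)\cdots \\
& \times \Psi_{\rm vac}\big[\delta\phi_L(-\infty)\big]\;\Psi_{\rm vac}\big[\delta\phi_R(-\infty)\big]
\end{aligned}
\tag{A.44}
$$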

The usual power-counting arguments$^{13}$ show that the L loop contribution
to W[J, t, g] has a g-dependence given by a factor $g^{L}$. For g → 0, W is
thus given by the sum of all tree graphs. The integrals over $\delta\phi_L$, $\delta\pi_L$, $\delta\phi_R$,
$\delta\pi_R$ are dominated in the limit g → 0 by fields where $\tilde L$ is stationary, i.e.,
where

$$\delta\phi_L = \delta\phi_R = \delta\phi^{\rm classical}\,, \qquad \delta\pi_L = \delta\pi_R = \delta\pi^{\rm classical}\,,$$

with $\delta\phi^{\rm classical}$ and $\delta\pi^{\rm classical}$ the solutions of the classical field equations
with the initial conditions that the fields go to free fields such as (14)–(16)
satisfying the initial conditions (20) at t → −∞. Since the L and R fields
take the same values at this stationary point, the action integrals cancel,
and we conclude that

$$\Big[W[J,t,1]\Big]_{\rm zero\ loops} = \sum_a\int d^3x\;\delta\phi_a^{\rm classical}(\mathbf{x},t)\,J_a(\mathbf{x})\,. \tag{A.45}$$

Expanding in powers of the current, this shows that in the tree approxima-
tion the expectation value of any product of fields is to be calculated by
taking the product of the fields obtained by solving the non-linear classical
field equations with suitable free-field initial conditions, as was to be proved.

REFERENCES

1. For a review, see N. Bartolo, E. Komatsu, S. Matarrese, and A. Riotto,
astro-ph/0406398.
2. J. Maldacena, JHEP 0305, 013 (2003) (astro-ph/0210603). For other
work on this problem, see A. Gangui, F. Lucchin, S. Matarrese, and
S. Mollerach, Astrophys. J. 430, 447 (1994) (astro-ph/9312033); P.
Creminelli, astro-ph/0306122; P. Creminelli and M. Zaldarriaga, astro-
ph/0407059; G. I. Rigopoulos, E.P.S. Shellard, and B.J.W. van Tent,
astro-ph/0410486; and ref. 3.
3. J. Schwinger, Proc. Nat. Acad. Sci. US 46, 1401 (1961). Also see
L. V. Keldysh, Soviet Physics JETP 20, 1018 (1965); B. DeWitt, The
Global Approach to Quantum Field Theory (Clarendon Press, Oxford,
2003): Sec. 31. This formalism has been applied to cosmology by E.
Calzetta and B. L. Hu, Phys. Rev. D 35, 495 (1987); M. Morikawa,
Prog. Theor. Phys. 93, 685 (1995); N. C. Tsamis and R. Woodard,
Ann. Phys. 238, 1 (1995); 253, 1 (1997); N. C. Tsamis and R.
Woodard, Phys. Lett. B426, 21 (1998); V. K. Onemli and R. P.
Woodard, Class. Quant. Grav. 19, 407 (2002); T. Prokopec, O.
Tornkvist, and R. P. Woodard, Ann. Phys. 303, 251 (2003); T.
Prokopec and R. P. Woodard, JHEP 0310, 059 (2003); T. Brunier,
V.K. Onemli, and R. P. Woodard, Class. Quant. Grav. 22, 59 (2005),
but not (as far as I know) to the problem of calculating cosmological
correlation functions.
4. F. Bernardeau, T. Brunier, and J-P. Uzan, Phys. Rev. D 69, 063520
(2004).
5. The constancy of this quantity outside the horizon has been used
in various special cases by J. M. Bardeen, Phys. Rev. D22, 1882
(1980); D. H. Lyth, Phys. Rev. D31, 1792 (1985). For reviews, see
J. Bardeen, in Cosmology and Particle Physics, eds. Li-zhi Fang and
A. Zee (Gordon & Breach, New York, 1988); A. R. Liddle and D. H.
Lyth, Cosmological Inflation and Large Scale Structure (Cambridge
University Press, Cambridge, UK, 2000).
6. R. S. Arnowitt, S. Deser, and C. W. Misner, in Gravitation: An Intro-
duction to Current Research, ed. L. Witten (Wiley, New York, 1962):
227. This classic article is now available as gr-qc/0405109.

7. V. S. Mukhanov, H. A. Feldman, and R. H. Brandenberger, Physics
Reports 215, 203 (1992); E. D. Stewart and D. H. Lyth, Phys. Lett.
B 302, 171 (1993).

8. N. C. Tsamis and R. Woodard, ref. 3.

9. A. Vilenkin and L. H. Ford, Phys. Rev. D 26, 1231 (1982); A.
Vilenkin, Nucl. Phys. B226, 527 (1983).

10. R. Kubo, J. Math. Phys. 4, 174 (1963).

11. See, e.g., S. Weinberg, The Quantum Theory of Fields – Volume I
(Cambridge, 1995): Sec. 9.1.

12. ibid., Sec. 9.2.

13. S. Coleman, in Aspects of Symmetry (Cambridge University Press,
Cambridge, 1985): pp 139–142.

UTTG-12-05

Living in the Multiverse

Opening Talk at the Symposium “Expectations of a Final Theory” at
Trinity College, Cambridge, September 2, 2005; to be published in
Universe or Multiverse?, ed. B. Carr (Cambridge University Press).

arXiv:hep-th/0511037v1 3 Nov 2005

Steven Weinberg
Physics Department, University of Texas at Austin

Most advances in the history of science have been marked by discoveries
about nature, but at certain turning points we have made discoveries about
science itself. These discoveries lead to changes in how we score our work,
in what we consider to be an acceptable theory.
For an example look back to a discovery made just one hundred years
ago. As you recall, before 1905 there had been numerous unsuccessful ef-
forts to detect changes in the speed of light due to the motion of the earth
through the ether. Attempts were made by Fitzgerald, Lorentz, and others
to construct a mathematical model of the electron (which was then con-
ceived to be the chief constituent of all matter), that would explain how
rulers contract when moving through the ether in just the right way to keep
the apparent speed of light unchanged. Einstein instead offered a symmetry
principle, which stated that not just the speed of light but all the laws of
nature are unaffected by a transformation to a frame of reference in uniform
motion. Lorentz grumbled that Einstein was simply assuming what he and
others had been trying to prove. But history was on Einstein’s side. The
1905 Special Theory of Relativity was the beginning of a general acceptance
of symmetry principles as a valid basis for physical theories.
This was how Special Relativity made a change in science itself. From
one point of view, Special Relativity was no big thing — it just amounted
to the replacement of one 10 parameter spacetime symmetry group, the
Galileo group, with another 10 parameter group, the Lorentz group. But
never before had a symmetry principle been taken as a legitimate hypothesis
on which to base a physical theory.
As usually happens with this sort of revolution, Einstein’s advance came
with a retreat in another direction: The effort to construct a model of

the electron was suspended for decades. Instead, symmetry principles in-
creasingly became the dominant foundation for physical theories. This ten-
dency was accelerated after the advent of quantum mechanics in the 1920s,
because the survival of symmetry principles in quantum theories imposes
highly restrictive consistency conditions (existence of antiparticles, connec-
tion between spin and statistics, cancelation of infinities and anomalies) on
physically acceptable theories. Our present Standard Model of elementary
particle interactions can be regarded as simply the consequence of certain
gauge symmetries and the associated quantum mechanical consistency con-
ditions.
The development of the Standard Model did not involve any changes
in our conception of what was acceptable as a basis for physical theories.
Indeed, the Standard Model can be regarded as just quantum electrodynam-
ics writ large. Similarly, when the effort to extend the Standard Model to
include gravity led to widespread interest in string theory, we expected to
score the success or failure of this theory in the same way as for the Stan-
dard Model: String theory would be a success if its symmetry principles and
consistency conditions led to a successful prediction of the free parameters
of the Standard Model.
Now we may be at a new turning point, a radical change in what we
accept as a legitimate foundation for a physical theory. The current
excitement is of course a consequence of the discovery of a vast number
of solutions of string theory, beginning in 2000 with the work of Bousso
and Polchinski.$^{1}$ The compactified six dimensions in Type II string theories
typically have a large number (tens or hundreds) of topological fixtures (3-
cycles), each of which can be threaded by a variety of fluxes. The logarithm
of the number of allowed sets of values of these fluxes is proportional to the
number of topological fixtures. Further, for each set of fluxes one obtains a
different effective field theory for the modular parameters that describe the
compactified 6-manifold, and for each effective field theory the number of
local minima of the potential for these parameters is again proportional to
the number of topological fixtures. Each local minimum corresponds to the
vacuum of a possible stable or metastable universe.
1 R. Bousso and J. Polchinski, JHEP 0006, 006 (2000).
2 S. B. Giddings, S. Kachru, and J. Polchinski, Phys. Rev. D66, 106006 (2002); A.
Maloney, E. Silverstein, and A. Strominger, hep-th/0205316; S. Kachru, R. Kallosh, A.
D. Linde, and S. P. Trivedi, Phys. Rev. D68, 046005 (2003).

Subsequent work by Giddings, Kachru, Kallosh, Linde, Maloney, Polchinski,
Silverstein, Strominger, and Trivedi (in various combinations$^{2}$)
established the existence of a large number of vacua with positive energy densities.
Ashok and Douglas$^{3}$ estimated the number of these vacua to be of order $10^{100}$
to $10^{500}$. Susskind$^{4}$ gave the name “string landscape” to this multiplicity
of vacua, taking the term from biochemistry, where the possible choices of
orientation of each chemical bond in large molecules leads to a vast number
of possible configurations. Unless one can find a reason to reject all but a
few of the string theory vacua, we will have to accept that much of what we
had hoped to calculate are environmental parameters, like the distance of
the earth from the sun, whose values we will never be able to deduce from
first principles.
We lose some, and win some. The larger the number of possible values
of physical parameters provided by the string landscape, the more string
theory legitimates anthropic reasoning as a new basis for physical theories:
Any scientists who study nature must live in a part of the landscape where
physical parameters take values suitable for the appearance of life and its
evolution into scientists.
An apparently successful example of anthropic reasoning was already at
hand by the time the string landscape was discovered. For decades there
seemed to be something peculiar about the value of the vacuum energy ρV.
Quantum fluctuations in known fields at well-understood energies (say, less
than 100 GeV) give a value of ρV larger than observationally allowed by a
factor $10^{56}$. This contribution to the vacuum energy might be canceled by
quantum fluctuations of higher energy, or by simply including a suitable cos-
mological constant term in the Einstein field equations, but the cancelation
would have to be exact to 56 decimal places. No symmetry argument or ad-
justment mechanism could be found that would explain such a cancelation.
Even if such an explanation could be found, there would be no reason to
suppose that the remaining net vacuum energy would be comparable to the
present value of the matter density, and since it is certainly not very much
larger, it was natural to suppose that it is very much less, too small to be
detected.
3
S. K. Ashok and M. Douglas, JHEP 0401, 060 (2004).
4
L. Susskind, hep-th/0302219.

On the other hand, if ρV takes a broad range of values in the multiverse,
then it is natural for scientists to find themselves in a subuniverse in which
ρV takes a value suitable for the appearance of scientists. I pointed out in
1987 that this value for ρV can’t be too large and positive, because then
galaxies and stars would not form.5 Roughly, this limit is that ρV should
be less than the mass density of the universe at the time when galaxies first
condense. Since this was in the past, when the mass density was larger than
at present, the anthropic upper limit on the vacuum energy density is larger
than the present mass density, but not many orders of magnitude greater.
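The size of this bound can be made concrete with a rough estimate. As a sketch, assume that the matter density scales as the inverse cube of the scale factor and, purely for illustration, that galaxies first condense at a redshift $z_g$ of about 4. Here $\rho_{M0}$ denotes the present mass density; the label $z_g$ and its value are assumptions of this example.

```latex
\rho_V \lesssim \rho_M(z_g) = \rho_{M0}\,(1+z_g)^3 ,
\qquad
z_g \approx 4 \;\Rightarrow\; \rho_V \lesssim 125\,\rho_{M0} .
```

This is only an order-of-magnitude indication: the bound is a couple of orders of magnitude above the present mass density, not many.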
But anthropic arguments provide not just a bound on ρV ; they give
us some idea of the value to be expected: ρV should be not very different
from the mean of the values suitable for life. This is what Vilenkin6 calls
the “principle of mediocrity.” This mean is positive, because if ρV were
negative it would have to be less in absolute value than the mass density
of the universe during the whole time that life evolves, since otherwise the
universe would collapse before any astronomers come on the scene,7 while
if positive ρV only has to be less than the mass density of the universe at
the time when most galaxies form, giving a much broader range of possible
positive than negative values. In 1997-8 Martel, Shapiro, and I8 carried out
a detailed calculation of the probability distribution of values of ρV seen
by astronomers throughout the multiverse, under the assumption that the
a priori probability distribution is flat in the relatively very narrow range
that is anthropically allowed. At that time the value of the primordial rms
fractional density fluctuation σ was not well known, since the value inferred
from observations of the cosmic microwave background depended on what
one assumed for ρV . It was therefore not possible to calculate a mean
expected value of ρV , but for any assumed value of ρV we could estimate σ
and use the result to calculate the fraction of astronomers that would observe
a value of ρV as small as the assumed value. In this way we concluded
that if ΩΛ turned out to be much less than 0.6, anthropic reasoning could
not explain why it was so small. The editor of the Astrophysical Journal
objected to publishing papers about anthropic calculations, and we had to
sell our article by pointing out that we had provided a strong argument for
abandoning an anthropic explanation of a small value of ρV , if it turned out
to be too small.
5
S. Weinberg, Phys. Rev. Lett. 59, 2607 (1987).
6
A. Vilenkin, Phys. Rev. Lett. 74, 846 (1995).
7
J. D. Barrow and F. J. Tipler, The Anthropic Cosmological Principle (Clarendon,
Oxford, 1986).
8
H. Martel, P. Shapiro, and S. Weinberg, Astrophys. J. 492, 29 (1998). For earlier
calculations, see G. Efstathiou, Mon. Not. Roy. Astron. Soc. 274, L73 (1995); S.
Weinberg, in Critical Dialogues in Cosmology, ed. N. Turok (World Scientific, 1997).

Of course, it turned out that ρV is not too small. Soon after this work,
observations of type Ia supernovae revealed that the expansion of the universe
is accelerating,9 and gave the result that ΩV ≃ 0.7. In other words the
ratio of the vacuum energy density to the present mass density ρM0 in our
subuniverse (which I use just as a convenient measure of density) is about
2.3, a conclusion subsequently confirmed by observations of the microwave
background.10
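The figure of 2.3 follows from simple arithmetic. As a check, assume a spatially flat universe in which matter and vacuum energy dominate today, so that $\Omega_M \approx 1 - \Omega_V$ (the symbol $\Omega_M$, the present matter density fraction, is introduced here for the illustration):

```latex
\frac{\rho_V}{\rho_{M0}} = \frac{\Omega_V}{\Omega_M}
\approx \frac{0.7}{1 - 0.7} \approx 2.3 .
```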
This is still a bit low. Martel, Shapiro, and I had found that the probability
of a vacuum energy density this small was 12%. I have now recalculated
the probability distribution, using WMAP data and a better transfer function,
with the result that the probability of a random astronomer seeing a
value as small as 2.3ρM0 is increased to 15.6%. Now that we know σ, we
can also calculate that the median vacuum energy density is 13.3ρM0.
I should mention a complication in these calculations. The average of
the product of density fluctuations at different points becomes infinite as
these points approach each other, so the rms fractional density fluctuation
σ is actually infinite. Fortunately, it is not σ itself that is really needed
in these calculations, but the rms fractional density fluctuation averaged
over a sphere of co-moving radius R taken large enough so that the density
fluctuation is able to hold on efficiently to the heavy elements produced in the
first generation of stars. The results mentioned above were calculated for R
(projected to the present) equal to 2 Mpc. These results are rather sensitive
to the value of R; for R = 1 Mpc, the probability of finding a vacuum energy
as small as 2.3ρM0 is only 7.2%. The estimate of the required value of R
involves complicated astrophysics, and needs to be better understood.
Now I want to take up four problems we have to face in working out the
anthropic implications of the string landscape.

I. What is the shape of the string landscape?


Douglas11 and Dine12 and their co-workers have taken the first steps in
finding the statistical rules governing different string vacua. I can’t comment
usefully on this, except to say that it wouldn’t hurt in this work if we knew
what string theory is.
9
A. G. Riess et al., Astron. J. 116, 1009 (1998); S. Perlmutter et al., Astrophys. J.
517, 565 (1999).
10
WMAP collaboration, Astrophys. J. Suppl. 148 (2003).
11
M. R. Douglas, hep-ph/0401004; Compt. Rend. Phys. 5, 965 (2004).
12
M. Dine, D. O’Neil and Z. Sun, JHEP 0507, 014 (2005); M. Dine and Z. Sun, hep-
th/0506246.

II. What constants scan?
Anthropic reasoning makes sense for a given constant if the range over
which the constant varies in the landscape is large compared with the an-
thropically allowed range of values of the constant, for then it is reasonable
to assume that the a priori probability distribution is flat in the anthrop-
ically allowed range. We need to know what constants actually “scan” in
this sense. Physicists would like to be able to calculate as much as possible,
so we hope that not too many constants scan.
The most optimistic hypothesis is that the only constants that scan are
the few whose dimensionality is a positive power of mass: the vacuum energy,
and whatever scalar mass or masses set the scale of electroweak symmetry
breaking. With all other parameters of the Standard Model fixed, the scale
of electroweak symmetry breaking is bounded by about 1.4 to 2.7 times its
value in our subuniverse, by the condition that the pion mass should be small
enough to make the nuclear force strong enough to keep the deuteron stable
against fission.13 (The condition that the deuteron be stable against beta
decay, which yields a tighter bound, does not seem to me to be necessary.
Even a beta-unstable deuteron would live long enough to allow cosmological
helium synthesis; helium would be burned to heavy elements in the first
generation of very massive stars; and then subsequent generations could have
long lifetimes burning hydrogen through the carbon cycle.) But the mere
fact that the electroweak symmetry breaking scale is only a few orders of
magnitude larger than the QCD scale should not in itself lead us to conclude
that it must be anthropically fixed. There is always the possibility that the
electroweak symmetry breaking scale is determined by the energy at which
some gauge coupling constant becomes strong, and if that coupling happens
to grow with decreasing energy a little faster than the QCD coupling then
the electroweak breaking scale will naturally be a few orders of magnitude
larger than the QCD scale.
13
V. Agrawal, S. M. Barr, J. F. Donoghue, and D. Seckel, Phys. Rev. D 57, 5480 (1998).

If the electroweak symmetry breaking scale is anthropically fixed, then
we can give up the decades-long search for a natural solution of the hierarchy
problem. This is a very attractive prospect, because none of the “natural”
solutions that have been proposed, such as technicolor or low energy
supersymmetry, were ever free of difficulties. In particular, giving up low
energy supersymmetry can restore some of the most attractive features of
the non-supersymmetric standard model: automatic conservation of baryon
and lepton number in interactions up to dimension 5 and 4, respectively;
natural conservation of flavors in neutral currents; and a small neutron electric
dipole moment. Arkani-Hamed and Dimopoulos14 have even shown how
it is possible to keep the good features of supersymmetry, such as a more
accurate convergence of the SU (3) × SU (2) × U (1) couplings to a single
value, and the presence of candidates for dark matter WIMPs. The idea
of this “split supersymmetry” is that, although supersymmetry is broken
at some very high energy, the gauginos and higgsinos are kept light by a
chiral symmetry. [An additional discrete symmetry is needed to prevent
lepton-number violation in higgsino-lepton mixing, and to keep the lightest
supersymmetric particle stable.] One of the nice things about split super-
symmetry is that, unlike many of the things we talk about these days, it
makes predictions that can be checked when the LHC starts operation. One
expects a single neutral Higgs with a mass in the range 120 to 165 GeV,
possible winos and binos but no squarks or sleptons, and a long-lived gluino.
(Incidentally, a Stanford group15 has recently used considerations of big bang
nucleosynthesis to argue that a 1 TeV gluino must have a lifetime less than
100 seconds, indicating a supersymmetry breaking scale less than $10^{10}$ GeV,
which might create problems for proton stability. But I wonder whether,
even if the gluino has a longer lifetime and decays after nucleosynthesis, the
universe might not thereby be reheated above the temperature of helium
dissociation, giving big bang nucleosynthesis a second chance to produce
the observed helium abundance.)
14
N. Arkani-Hamed and S. Dimopoulos, JHEP 0506, 073 (2005). Also see G. F. Giudice
and A. Romanino, Nucl. Phys. B 699, 65 (2004); N. Arkani-Hamed, S. Dimopoulos, G. F.
Giudice, and A. Romanino, Nucl. Phys. B 709, 3 (2005); A. Delgado and G. F. Giudice,
hep-ph/0506217.
15
A. Arvanitaki, C. Davis, P. W. Graham, A. Pierce, and J. G. Wacker, hep-ph/0504210.
16
C. Hogan, Rev. Mod. Phys. 72, 1149 (2000); and astro-ph/0407086.

What about the dimensionless Yukawa couplings of the Standard Model?
Hogan16 has analyzed the anthropic constraints on these couplings, with the
electroweak symmetry breaking scale and the sum of the u and d Yukawa
couplings held fixed, to avoid complications due to the dependence of nuclear
forces on the pion mass. He imposes the conditions that (1) md − mu −
me > 1.2 MeV, so that the early universe doesn’t become all neutrons; (2)
md − mu + me < 3.4 MeV, so that the pp reaction is exothermic; and (3)
me > 0. With three conditions on the two parameters mu − md and me, he
naturally finds these parameters are limited to a finite region, which turns
out to be quite small. At first sight, this gives the impression that the quark
and lepton Yukawa couplings are subject to stringent anthropic constraints,
in which case we might infer that the Yukawa couplings probably scan.
I have two reservations about this conclusion. The first reservation is
that the pp reaction is not necessary for life. For one thing, the pep reaction
p + p + e− → d + ν can keep stars burning hydrogen for a long time. For this,
we do not need md − mu + me < 3.4 MeV, but only the weaker condition
md − mu − me < 3.4 MeV. The three conditions then do not constrain
md − mu and me separately to any finite region, but only constrain the
single parameter md − mu − me to lie between 1.2 MeV and 3.4 MeV, not
a very tight anthropic constraint. (In fact, He4 will be stable as long as
md − mu − me is less than about 13 MeV, so stellar nucleosynthesis can
begin with helium burning in the heavy stars of Population III, followed
by hydrogen burning in later generations of stars.) My second reservation
is that the anthropic constraints on the Yukawa couplings are alleviated if
we suppose (as discussed above) that the electroweak symmetry breaking
scale is not fixed, but free to take whatever value is anthropically necessary.
For instance, according to the results of reference 13, the deuteron binding
energy could be made as large as about 3.5 MeV by taking the electroweak
breaking scale much less than it is in our universe, in which case even the
condition that the pp reaction be exothermic becomes much looser.
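The difference between requiring the pp reaction and settling for the pep reaction can be checked numerically. The following is a minimal sketch, not taken from the cited papers: the function names are hypothetical, and the sample value md − mu = 2.5 MeV is an assumed round number used only for illustration. It evaluates conditions (1) to (3) as quoted above, and then the weaker set in which condition (2) is replaced by md − mu − me < 3.4 MeV.

```python
# Illustrative check of the mass conditions quoted in the text (masses in MeV).
# Function names and the sample quark mass difference are assumptions of this
# sketch; they do not come from the cited papers.

ELECTRON_MASS_MEV = 0.511  # measured electron mass, rounded


def pp_conditions(md_minus_mu: float, me: float) -> dict:
    """Conditions (1)-(3), with the pp reaction required to be exothermic."""
    return {
        "not_all_neutrons": md_minus_mu - me > 1.2,  # condition (1)
        "pp_exothermic": md_minus_mu + me < 3.4,     # condition (2)
        "positive_me": me > 0.0,                     # condition (3)
    }


def pep_conditions(md_minus_mu: float, me: float) -> dict:
    """Same check, with condition (2) replaced by the weaker pep requirement."""
    return {
        "not_all_neutrons": md_minus_mu - me > 1.2,
        "pep_allowed": md_minus_mu - me < 3.4,
        "positive_me": me > 0.0,
    }


def allowed(conditions: dict) -> bool:
    """A point in parameter space is allowed only if every condition holds."""
    return all(conditions.values())


if __name__ == "__main__":
    # Assumed illustrative point near the values in our subuniverse.
    print(allowed(pp_conditions(2.5, ELECTRON_MASS_MEV)))
    # A point that fails the pp condition but passes the pep one.
    print(allowed(pp_conditions(3.2, 0.5)), allowed(pep_conditions(3.2, 0.5)))
    # With pep, only md - mu - me matters: large masses can still pass.
    print(allowed(pep_conditions(10.0, 8.0)))
```

The last case shows why the weaker set no longer confines md − mu and me separately to a finite region: both can be large as long as their difference stays between 1.2 MeV and 3.4 MeV.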
Incidentally, I don’t set much store by the famous “coincidence” em-
phasized by Hoyle, that there is an excited state of C12 with just the right
energy to allow carbon production via α–Be8 reactions in stars. We know
that even–even nuclei have states that are well described as composites of
α-particles. One such state is the ground state of Be8 , which is unsta-
ble against fission into two alpha particles. The same α-α potential that
produces that sort of unstable state in Be8 could naturally be expected to
produce an unstable state in C12 that is essentially a composite of three α
particles, and that therefore appears as a low-energy resonance in α–Be8
reactions. So the existence of this state doesn’t seem to me to provide any
evidence of fine tuning.
What else scans? Tegmark and Rees17 have raised the question whether
the rms density fluctuation σ may itself scan. If it does, then the anthropic
constraint on the vacuum energy becomes weaker, resuscitating to some
extent the problem of why ρV is so small. But Garriga and Vilenkin18 have
pointed out that it is really $\rho_V/\sigma^3$ that is constrained anthropically, so that
even if σ does scan the anthropic prediction of this ratio remains robust.
17
M. Tegmark and M. J. Rees, Astrophys. J. 499, 526 (1998).
18
J. Garriga and A. Vilenkin, hep-th/0508005.

Arkani-Hamed, Dimopoulos, and Kachru19 have offered a possible rea-
son to suppose that most constants do not scan. If there are a large number
N of decoupled modular fields, each taking a few possible values, then the
probability distribution of quantities that depend on all these fields will be
sharply peaked, with a width proportional to $1/\sqrt{N}$. According to Distler
and Varadarajan,20 it is not really necessary here to make arbitrary assump-
tions about the decoupling of the various scalar fields; it is enough to adopt
the most general polynomial superpotential that is stable, in the sense that
radiative corrections do not change the effective couplings for large N by
amounts larger than the couplings themselves. Distler and Varadarajan em-
phasize cubic superpotentials, because polynomial superpotentials of order
higher than cubic presumably make no physical sense. But it is not clear
that even cubic superpotentials can be plausible approximations, or that
peaks will occur at reasonable values in the distribution of dimensionless
couplings rather than of some combinations of these couplings.21 It also
is not clear that the multiplicity of vacua in this kind of effective scalar
field theory can properly represent the multiplicity of flux values in string
theories,22 but even if not, it presumably can represent the variety of minima
of the potential for a given set of flux vacua.
19
N. Arkani-Hamed, S. Dimopoulos, and S. Kachru, hep-th/0501082, referred to below
as ADK.
20
J. Distler and U. Varadarajan, hep-th/0507090.
21
M. Douglas, private communication.
22
T. Banks, hep-th/0011255.

If most constants do not effectively scan, then why should anthropic
arguments work for the vacuum energy and the electroweak breaking scale?
ADK point out that, even if some constant has a relatively narrow distribution,
anthropic arguments will still apply if the anthropically allowed range is
even narrower and near a point around which the distribution is symmetric.
(ADK suppose that this point would be at zero, but this is not necessary.)
This is the case, for instance, for the vacuum energy if the superpotential W
is the sum of the superpotentials Wn for a large number of decoupled scalar
fields, for each of which there is a separate broken R symmetry, so that the
possible values of each Wn are equal and opposite. The probability distribution
of the total superpotential $W = \sum_{n=1}^{N} W_n$ will then be a Gaussian
peaked at $W = 0$ with a width proportional to $1/\sqrt{N}$, and the probability
distribution of the supersymmetric vacuum energy $-8\pi G|W|^2$ will extend
over a correspondingly narrow range of negative values, with a maximum
at zero. When supersymmetry breaking is taken into account, the probability
distribution widens to include positive values of the vacuum energy,
extending out to a positive value depending on the scale of supersymmetry
breaking. For any reasonable supersymmetry breaking scale, this probability
distribution, though narrow compared with the Planck scale, will be very
wide compared with the very narrow anthropically allowed range around
ρV = 0, so within this range the probability distribution can be expected
to be flat, and anthropic arguments should work. Similar remarks apply to
the µ-term of the supersymmetric Standard Model, which sets the scale of
electroweak symmetry breaking.
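The narrowing of a distribution built from many decoupled contributions can be checked with a small exact calculation. The following is a minimal sketch under simplifying assumptions that are not part of the ADK analysis: each of N fields contributes +1 or −1 with equal probability (values "equal and opposite"), and the quantity of interest is normalized as the average of the N contributions. The function name is hypothetical.

```python
# Exact width of the distribution of an average of N two-valued contributions.
# Assumptions of this sketch: each of N decoupled fields contributes +1 or -1
# with equal probability, and the observable is W = (1/N) * sum of the W_n.
from math import comb, sqrt


def width_of_average(n_fields: int) -> float:
    """Standard deviation of the average of n_fields independent +/-1 terms."""
    total = 2 ** n_fields
    variance = 0.0
    for k in range(n_fields + 1):  # k counts the +1 contributions
        value = (2 * k - n_fields) / n_fields
        probability = comb(n_fields, k) / total
        variance += probability * value ** 2  # the mean vanishes by symmetry
    return sqrt(variance)


if __name__ == "__main__":
    # Compare the exact width with 1/sqrt(N) for a few values of N.
    for n in (4, 16, 64, 256):
        print(n, width_of_average(n), 1 / sqrt(n))
```

In this toy model the distribution is symmetric about zero and its width falls as $1/\sqrt{N}$, so an anthropically allowed window much narrower than that width, and centered near zero, would still see an essentially flat distribution.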

III. How should we calculate anthropically conditioned probabilities?
We would expect the anthropically conditioned probability distribution
for a given value of any constant that scans to be proportional to the num-
ber of scientific civilizations that observe that value. In the calculations
described above, Martel, Shapiro, and I took this number to be propor-
tional to the fraction of baryons that find themselves in galaxies, but what
if the total number of baryons itself scans? What if it is infinite?

IV. How is the landscape populated?


There are at least four ways in which we might imagine the different
“universes” described by the string landscape actually to exist:

1. The various subuniverses may be simply different regions of space.
This is most simply realized in the chaotic inflation theory.23 The
scalar fields in different inflating patches may take different values,
giving rise to different values for various effective coupling constants.
Indeed, Linde speculated about the application of the anthropic prin-
ciple to cosmology soon after the proposal of chaotic inflation.24

2. The subuniverses may be different eras of time in a single big bang. For
instance, what appear to be constants of nature might actually depend
on scalar fields that change very slowly as the universe expands.25
23
A. D. Linde, Phys. Lett. 129B, 177 (1983); A. Vilenkin, Phys. Rev. D 27, 2848
(1983); A. D. Linde, Phys. Lett. B 175, 305 (1986); Phys. Scripta T15, 100 (1987);
Phys. Lett. B202, 194 (1988).
24
A. D. Linde, in The Very Early Universe, ed. G. W. Gibbons, S. W. Hawking, and
S. Siklos (Cambridge University Press, 1983); Rept. Progr. Phys. 47, 925 (1984).
25
T. Banks, Nucl. Phys. B 249, 332 (1985).

3. The subuniverses may be different regions of spacetime. This can
happen if, instead of changing smoothly with time, various scalar fields
on which the “constants” of nature depend change in a sequence of
first-order phase transitions.26 In these transitions metastable bubbles
form within a region of higher vacuum energy; then within each bubble
there form further bubbles of even lower vacuum energy; and so on.
In recent years this idea has been revived in the context of the string
landscape.27

4. The subuniverses could be different parts of quantum mechanical Hilbert
space. In a reinterpretation of Hawking’s earlier work on the wave
function of the universe,28 Coleman29 showed that certain topological
fixtures known as wormholes in the path integral for the Euclidean
wave function of the universe would lead to a superposition of wave
functions in which any coupling constant not constrained by symmetry
principles would take any possible value. Ooguri, Vafa, and Verlinde30
have argued for a particular wave function of the universe, but it es-
capes me how anyone can tell whether this or any other proposed wave
function is the wave function of the universe.

26
L. Abbott, Phys. Lett. B150, 427 (1985); J. D. Brown and C. Teitelboim, Phys.
Lett. B 195, 177 (1987); Nucl. Phys. B 297, 787 (1987).
27
R. Bousso and J. Polchinski, op. cit.; J. L. Feng, J. March-Russell, S. Sethi, and F.
Wilczek, Nucl. Phys. B 602, 307 (2001); H. Firouzjahi, S. Sarangi, and S.-H. Henry
Tye, JHEP 0409, 060 (2004); B. Freivogel, M. Kleban, M. R. Martinez, and L. Susskind,
hep-th/0505232.
28
S. W. Hawking, Nucl. Phys. B 239, 257 (1984); and in Relativity, Groups, and
Topology II, NATO Advanced Study Institute Session XL, Les Houches, 1983, ed. B. S.
DeWitt and R. Stora (Elsevier, Amsterdam, 1984): p. 336. Some of this work is based
on an initial condition for the origin of the universe proposed by J. Hartle and S. W.
Hawking, Phys. Rev. D 28, 2960 (1983).
29
S. Coleman, Nucl. Phys. B 307, 867 (1988). It has been argued that the wave
function of the universe is sharply peaked at values of the constants that yield a zero
vacuum energy at late times, by S. W. Hawking, in Shelter Island II: Proceedings of the
1983 Shelter Island Conference on Quantum Field Theory and the Fundamental Problems
of Physics, ed. R. Jackiw et al. (MIT Press, Cambridge, 1985); Phys. Lett. B 134, 403
(1984); E. Baum, Phys. Lett. B 133, 185 (1984); S. Coleman, Nucl. Phys. B 310, 643
(1988). This view has been challenged; see W. Fischler, I. Klebanov, J. Polchinski, and
L. Susskind, Nucl. Phys. B 327, 157 (1989). I am assuming here that there are no such
peaks.
30
H. Ooguri, C. Vafa, and E. Verlinde, hep-th/0502211.

These alternatives are by no means mutually exclusive. In particular, it
seems to me that, whatever one concludes about alternatives 1, 2, and 3,
we will still have the possibility that the wave function of the universe is
a superposition of different terms representing different ways of populating
the landscape in space and/or time.

In closing, I would like to comment about the impact of anthropic reasoning
within and beyond the physics community. Some physicists have
expressed a strong distaste for anthropic arguments. (I have heard David
Gross say “I hate it.”) This is understandable. Theories based on anthropic
calculation certainly represent a retreat from what we had hoped for: the
calculation of all fundamental parameters from first principles. It is too soon
to give up on this hope, but without loving it we may just have to resign
ourselves to a retreat, just as Newton had to give up Kepler’s hope of a
calculation of the relative sizes of planetary orbits from first principles.
There is also a less creditable reason for hostility to the idea of a multi-
verse, based on the fact that we will never be able to observe any subuni-
verses except our own. Livio and Rees31 and Tegmark32 have given thor-
ough discussions of various other ingredients of accepted theories that we
will never be able to observe, without our being led to reject these theories.
The test of a physical theory is not that everything in it should be observable
and every prediction it makes should be testable, but rather that enough
is observable and enough predictions are testable to give us confidence that
the theory is right.
Finally, I have heard the objection that, in trying to explain why the
laws of nature are so well suited for the appearance and evolution of life,
anthropic arguments take on some of the flavor of religion. I think that
just the opposite is the case. Just as Darwin and Wallace explained how
the wonderful adaptations of living forms could arise without supernatural
intervention, so the string landscape may explain how the constants of nature
that we observe can take values suitable for life without being fine-tuned by
a benevolent creator. I found this parallel well understood in a surprising
place, a New York Times op-ed article by Christoph Schönborn, Cardinal
Archbishop of Vienna.33 His article concludes as follows:

“Now, at the beginning of the 21st century, faced with scientific
claims like neo-Darwinism and the multiverse hypothesis in
cosmology invented to avoid the overwhelming evidence for purpose
and design found in modern science, the Catholic Church
will again defend human nature by proclaiming that the immanent
design evident in nature is real. Scientific theories that try
to explain away the appearance of design as the result of ‘chance
and necessity’ are not scientific at all, but, as John Paul put it,
an abdication of human intelligence.”

31
M. Livio and M. J. Rees, Science 309, 1022 (12 August, 2005).
32
M. Tegmark, Ann. Phys. 270, 1 (1998).
33
C. Schönborn, N. Y. Times, 7 July 2005, p. A23.

It’s nice to see work in cosmology get some of the attention given these days
to evolution, but of course it is not religious preconceptions like these that
can decide any issues in science.
It must be acknowledged that there is a big difference in the degree
of confidence we can have in neo-Darwinism and in the multiverse. It is
settled, as well as anything in science is ever settled, that the adaptations
of living things on earth have come into being through natural selection
acting on random undirected inheritable variations. About the multiverse,
it is appropriate to keep an open mind, and opinions among scientists differ
widely. In the Austin airport on the way to this meeting I noticed for sale
the October issue of a magazine called Astronomy, having on the cover the
headline “Why You Live in Multiple Universes.” Inside I found a report of
a discussion at a conference at Stanford, at which Martin Rees said that
he was sufficiently confident about the multiverse to bet his dog’s life on
it, while Andrei Linde said he would bet his own life. As for me, I have
just enough confidence about the multiverse to bet the lives of both Andrei
Linde and Martin Rees’s dog.


This material is based upon work supported by the National Science
Foundation under Grants Nos. PHY-0071512 and PHY-0455649 and with
support from The Robert A. Welch Foundation, Grant No. F-0014, and
also grant support from the US Navy, Office of Naval Research, Grant Nos.
N00014-03-1-0639 and N00014-04-1-0336, Quantum Optics Initiative.

UTTG-03-06

Quantum Contributions to Cosmological Correlations II:
Can These Corrections Become Large?

arXiv:hep-th/0605244v2 22 Jun 2006

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

This is a sequel to a previous detailed study of quantum corrections to
cosmological correlations. It was found there that except in special cases these
corrections depend on the whole history of inflation, not just on the behav-
ior of fields at horizon exit. It is shown here that at least in perturbation
theory these corrections can nevertheless not be proportional to positive
powers of the Robertson–Walker scale factor, but only at most to powers of
its logarithm, and are therefore never large.


∗ Electronic address: weinberg@physics.utexas.edu
I. INTRODUCTION

Calculations of non-Gaussian corrections to cosmological correlations in
the classical approximation have shown that these higher-order corrections
are generally suppressed by powers of $GH^2$, where $G$ is Newton’s constant
and $H$ is the cosmological expansion rate at the time of horizon exit[1].
From the magnitude of the observed Gaussian correlations of fluctuations
in the cosmic microwave background, it is known that $GH^2 \approx 10^{-12}$ for
the fluctuation wavelengths studied in the microwave background, which
makes the expected non-Gaussian corrections quite small. Quantum effects
involve additional powers of $G$, and are therefore usually supposed to be
too small to be detected. But a recent paper[2] has shown that in many
theories these quantum corrections depend on the whole history of inflation,
not just on the behavior of fields at horizon exit, raising the possibility that
they may be much larger than usually thought. In particular, if quantum
corrections were to involve positive powers of the Robertson–Walker scale
factor $a(t)$ at the end of inflation, then they might be large enough to be
detected. In this paper we will extend the results of reference [2] to a very
large class of theories, and show that (at least in perturbation theory) this
never happens; quantum corrections depend at most on powers of $\ln a(t)$,
and therefore (without $\approx 10^{12}$ e-foldings after horizon exit) never become
large.
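A rough estimate shows why logarithmic growth is harmless. As a sketch, assume that the leading quantum correction, relative to the classical result, is of order $GH^2$ times a single power of $\ln a$, and write $\mathcal{N}_e = \ln[a(t)/a_{\rm exit}]$ for the number of e-foldings since horizon exit. The symbols $\delta_{\rm quantum}$ and $\mathcal{N}_e$, the single power of the logarithm, and the figure of 60 e-foldings are assumptions made here for illustration.

```latex
\delta_{\rm quantum} \sim G H^2\, \mathcal{N}_e \approx 10^{-12}\, \mathcal{N}_e ,
\qquad
\mathcal{N}_e \approx 60 \;\Rightarrow\; \delta_{\rm quantum} \sim 6 \times 10^{-11} ,
\qquad
\delta_{\rm quantum} \sim 1 \;\Rightarrow\; \mathcal{N}_e \sim 10^{12} .
```

A correction growing as a positive power of $a = e^{\mathcal{N}_e}$ would, by contrast, overcome the factor $10^{-12}$ after only a few tens of e-foldings.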

II. SCALARS AND GRAVITATION

We will first consider a theory of multiple scalar fields $\varphi_n(x)$ and
gravitation, slightly more general than that considered in reference [2]. The scalar
Lagrangian is assumed to consist of a conventional minimal kinematic term,
plus a term with arbitrary potential $V(\varphi)$. In the Arnowitt-Deser-Misner
formalism[3], the components of the metric are
$$ g_{ij} = a^2 e^{2\zeta} [\exp\gamma]_{ij} , \qquad \gamma_{ii} = 0 , \tag{1} $$
$$ g_{00} = -N^2 + g_{ij} N^i N^j , \qquad g_{i0} = g_{ij} N^j , \tag{2} $$
where $a(t)$ is the Robertson–Walker scale factor, $\gamma_{ij}(x, t)$ is a gravitational
wave amplitude, $\zeta(x, t)$ is a scalar, and $N$ and $N^i$ are auxiliary fields, whose
time-derivatives do not appear in the action. The Lagrangian density (with
$8\pi G \equiv 1$) is
$$ \mathcal{L} = \frac{a^3}{2} e^{3\zeta} \Big[ -2NV(\varphi) + N^{-1}\big( E^j{}_i E^i{}_j - (E^i{}_i)^2 \big) + N^{-1} \sum_n \big( \dot\varphi_n - N^i \partial_i \varphi_n \big)^2 - N a^{-2} e^{-2\zeta} [\exp(-\gamma)]^{ij} \sum_n \partial_i \varphi_n \partial_j \varphi_n \Big] + \frac{a}{2} N e^{\zeta} [\exp(-\gamma)]^{ij} R^{(3)}_{ij} , \tag{3} $$
where
$$ E_{ij} \equiv \frac{1}{2} \big[ \dot g_{ij} - \nabla_i N_j - \nabla_j N_i \big] , $$
$$ E^i{}_j = a^{-2} e^{-2\zeta} [\exp(-\gamma)]^{ik} E_{kj} , \qquad N^i = a^{-2} e^{-2\zeta} [\exp(-\gamma)]^{ik} N_k , \tag{4} $$
where $\nabla_i$ is the three-dimensional covariant derivative calculated with the
three-metric $e^{2\zeta}\gamma_{ij}$; and $R^{(3)}_{ij}$ is the curvature tensor calculated with this
three-metric. The auxiliary fields $N$ and $N^i$ are to be found by requiring
that the action is stationary in these variables. This gives the constraint
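equations:
$$ \nabla_i \Big[ N^{-1} \big( E^i{}_j - \delta^i{}_j E^k{}_k \big) \Big] = N^{-1} \sum_n \partial_j \varphi_n \big( \dot\varphi_n - N^i \partial_i \varphi_n \big) , \tag{5} $$
$$ N^2 \Big[ R^{(3)} - 2V - a^{-2} e^{-2\zeta} [\exp(-\gamma)]^{ij} \sum_n \partial_i \varphi_n \partial_j \varphi_n \Big] = E^i{}_j E^j{}_i - \big( E^i{}_i \big)^2 + \sum_n \big( \dot\varphi_n - N^i \partial_i \varphi_n \big)^2 . \tag{6} $$
For our present purposes, all we need to know about $R^{(3)}_{ij}$, $E^i{}_j$, and the
solutions for $N$ and $N^i$ is that none of them contain terms with positive
powers of $a$. We can impose gauge conditions, for instance by setting any
one of the scalar fields equal to its unperturbed value and requiring that
$\partial_i \gamma_{ij} = 0$.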
The possibility of positive powers of $a(t)$ in correlation functions of the $\varphi_n$
and/or $\zeta$ arises from the explicit factors $a^3$ and $a$ in the terms in Eq. (3). But
these factors can be compensated by negative powers of a(t) in various field
time derivatives and in various commutators, which arise from the structure
of the perturbative expansion for correlation functions. The expectation
value of any product Q(t) of field operators at various space points (but all
at the same time t) is [2]
∞ Z t Z tN Z t2
iN
X
hQ(t)i = dtN dtN −1 · · · dt1
N =0 −∞ −∞ −∞
Dh h h i iiE
× HI (t1 ), HI (t2 ), · · · HI (tN ), QI (t) · · · , (7)

2
(with the N = 0 term understood to be just hQI (t)i). Here QI is the
product Q in the interaction picture (with time-dependence generated by
the part of the Hamiltonian that is quadratic in fluctuations); and HI is
the interaction part of the Hamiltonian (the part that is of third or higher
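order in fluctuations) in the interaction picture. The fields in the interaction picture are
$$ \zeta(\mathbf{x},t) = \int d^3q \left[ e^{i\mathbf{q}\cdot\mathbf{x}}\,\alpha(\mathbf{q})\,\zeta_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}}\,\alpha^*(\mathbf{q})\,\zeta_q^*(t) \right], \tag{8} $$
$$ \gamma_{ij}(\mathbf{x},t) = \int d^3q \sum_\lambda \left[ e^{i\mathbf{q}\cdot\mathbf{x}}\,e_{ij}(\hat q,\lambda)\,\alpha(\mathbf{q},\lambda)\,\gamma_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}}\,e^*_{ij}(\hat q,\lambda)\,\alpha^*(\mathbf{q},\lambda)\,\gamma_q^*(t) \right], \tag{9} $$
$$ \varphi_n(\mathbf{x},t) = \int d^3q \left[ e^{i\mathbf{q}\cdot\mathbf{x}}\,\alpha(\mathbf{q},n)\,\varphi_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}}\,\alpha^*(\mathbf{q},n)\,\varphi_q^*(t) \right], \tag{10} $$
where $\lambda = \pm 2$ is a helicity index and $e_{ij}(\hat q,\lambda)$ is a polarization tensor, while $\alpha(\mathbf{q})$, $\alpha(\mathbf{q},\lambda)$, and $\alpha(\mathbf{q},n)$ are conventionally normalized annihilation operators, satisfying the usual commutation relations
$$ \left[\alpha(\mathbf{q}), \alpha^*(\mathbf{q}')\right] = \delta^3(\mathbf{q}-\mathbf{q}')\,, \qquad \left[\alpha(\mathbf{q}), \alpha(\mathbf{q}')\right] = 0\,, \tag{11} $$
$$ \left[\alpha(\mathbf{q},\lambda), \alpha^*(\mathbf{q}',\lambda')\right] = \delta_{\lambda\lambda'}\,\delta^3(\mathbf{q}-\mathbf{q}')\,, \qquad \left[\alpha(\mathbf{q},\lambda), \alpha(\mathbf{q}',\lambda')\right] = 0\,, \tag{12} $$
and
$$ \left[\alpha(\mathbf{q},n), \alpha^*(\mathbf{q}',n')\right] = \delta_{nn'}\,\delta^3(\mathbf{q}-\mathbf{q}')\,, \qquad \left[\alpha(\mathbf{q},n), \alpha(\mathbf{q}',n')\right] = 0\,. \tag{13} $$
The expectation value in Eq. (7) is assumed to be taken in a “Bunch–Davies” vacuum annihilated by these annihilation operators. Also, $\zeta_q(t)$, $\gamma_q(t)$, and $\varphi_q(t)$ are suitably normalized positive-frequency solutions of the wave equations
$$ \frac{d^2\zeta_q}{dt^2} + \left[3H + \frac{d\ln\epsilon}{dt}\right]\frac{d\zeta_q}{dt} + (q/a)^2\,\zeta_q = 0\,, \tag{14} $$
$$ \frac{d^2\gamma_q}{dt^2} + 3H\,\frac{d\gamma_q}{dt} + (q/a)^2\,\gamma_q = 0\,, \tag{15} $$
$$ \frac{d^2\varphi_q}{dt^2} + 3H\,\frac{d\varphi_q}{dt} + (q/a)^2\,\varphi_q = 0\,, \tag{16} $$
where $H \equiv \dot a/a$ and $\epsilon \equiv -\dot H/H^2$.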
The functions $\zeta_q(t)$, $\gamma_q(t)$, and $\varphi_q(t)$ approach time-independent limits $\zeta_q^o$, $\gamma_q^o$, and $\varphi_q^o$ at late times during inflation, when the perturbations are far outside the horizon, with the remainders $\zeta_q(t) - \zeta_q^o$, $\gamma_q(t) - \gamma_q^o$, and $\varphi_q(t) - \varphi_q^o$ all vanishing essentially (apart from slowly varying quantities like $H$ and $\epsilon$) as $a^{-2}(t)$. In consequence, $\dot\zeta_q(t)$, $\dot\gamma_q(t)$, and $\dot\varphi_q(t)$ all vanish at late times like $a^{-2}(t)$. Also, as shown in reference [2], the commutator of any two interaction-picture fields at times $t_1$, $t_2$ during inflation but long after horizon exit goes essentially as $a^{-3}(t)$, with $t$ either $t_1$ or $t_2$ or some weighted average of the times between $t_1$ and $t_2$. The same is true for the commutator of a field and a field time derivative, while the commutator of two field time derivatives goes as $a^{-5}(t)$ (see footnote 1).
These asymptotic properties of the commutators can be seen by noting that
$$ [\zeta(\mathbf{x},t), \zeta(\mathbf{y},t')] = \int d^3q\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{y})}\,\mathrm{Im}\left[\zeta_q(t)\,\zeta_q^*(t')\right], \tag{17} $$
$$ [\gamma(\mathbf{x},t), \gamma(\mathbf{y},t')] = \int d^3q\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{y})}\,\mathrm{Im}\left[\gamma_q(t)\,\gamma_q^*(t')\right], \tag{18} $$
$$ [\varphi(\mathbf{x},t), \varphi(\mathbf{y},t')] = \int d^3q\; e^{i\mathbf{q}\cdot(\mathbf{x}-\mathbf{y})}\,\mathrm{Im}\left[\varphi_q(t)\,\varphi_q^*(t')\right]. \tag{19} $$
The general solutions of Eqs. (14)–(16) are each linear combinations with complex coefficients of two independent real solutions, one of which goes at late times as a constant plus terms of order $a^{-2}$, while the other goes as $a^{-3}$, so the imaginary parts in Eqs. (17)–(19) arise only from the interference of the two independent real solutions, which goes as $a^{-3}$. Likewise the derivatives of these imaginary parts with respect to either $t$ or $t'$ also go essentially as $a^{-3}$, because the derivative may act on the solution that already goes as $a^{-3}$, but the derivative with respect to both $t$ and $t'$ goes as $a^{-5}$, because both of the independent real solutions are differentiated.
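The split into a solution that is constant up to $a^{-2}$ corrections and one that falls as $a^{-3}$ can be illustrated in a special case. The sketch below assumes exactly constant $H$ (pure de Sitter expansion) and uses the standard massless mode function, proportional to $(1+ix)e^{-ix}$ with $x = -q/(aH)$, whose real and imaginary parts play the role of the two real solutions of Eq. (16). The explicit mode function, the normalization to unity at late times, and the function names are assumptions made for illustration; they are not taken from the text.

```python
import cmath

def mode_function(q_over_aH):
    """Massless de Sitter mode function, normalized to 1 at late times.

    Assumption for illustration: exactly constant H, with x = -q/(aH),
    so that the mode is (1 + i x) exp(-i x).
    """
    x = -q_over_aH
    return (1 + 1j * x) * cmath.exp(-1j * x)

def late_time_parts(q_over_aH):
    """Return (Re f - 1, Im f): the pieces expected to fall as a^-2 and a^-3."""
    f = mode_function(q_over_aH)
    return f.real - 1.0, f.imag

if __name__ == "__main__":
    # Halving q/(aH) corresponds to doubling the scale factor a.
    for k in (1e-1, 5e-2, 2.5e-2):
        re_part, im_part = late_time_parts(k)
        print(f"q/aH={k:.3g}  Re f - 1 = {re_part:.3e}  Im f = {im_part:.3e}")
```

Each doubling of $a$ should reduce the first printed column by about a factor of 4 and the second by about a factor of 8, in line with the $a^{-2}$ and $a^{-3}$ counting used above.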
We are interested in the behavior of the correlation function (7) at late times $t$, when the perturbations are far outside the horizon. Inspection of the Lagrangian density (3) shows that no term has more than 3 factors of $a(t)$. According to Eq. (7), there are just as many commutators as interactions, and each commutator provides at least 3 factors of $1/a(t)$ at late times, so the total number of factors of $a(t)$ at late times in the integrals over time or in any subintegration is at most zero. With zero factors of $a(t)$ the integrand can still grow like a power of $t$, which is more or less the same as a power of $\ln a(t)$ [4], but it cannot grow like a power of $a(t)$, and therefore (without $\approx 10^{12}$ e-foldings) it cannot become large at late times.

Footnote 1: In this counting of powers of $a(t)$, we are tacitly assuming that the time dependence can be evaluated before integrating over momenta, and will not be altered when the momentum integrals are done. This is based on the expectation that the counterterms introduced to eliminate ultraviolet divergences in flat space will suppress the contributions of large internal momenta even in an inflating spacetime. As discussed in reference [2], this expectation is not fulfilled for arbitrary choices of the operators whose correlation functions are to be calculated. It is necessary to consider only correlation functions of “renormalized” operators, for which large internal momenta are suppressed. More work needs to be done to see how to construct appropriate renormalized operators.

Indeed, since time derivatives of fields go like $a^{-2}(t)$, and commutators of time derivatives of fields with each other go like $a^{-5}(t)$, the integrand will go like a negative power of $a(t)$ if any interaction has fewer than 3 explicit factors of $a(t)$, or if the time derivative of a field in any interaction does not appear in a commutator, or appears in a commutator with another time derivative. It is therefore only a very limited set of terms in the perturbation series that can contribute to a logarithmic growth of the integrand at late times.
These conclusions would not be altered by the inclusion of higher-derivative terms in the action. Each pair of space derivatives is accompanied by a factor $g^{ij} \propto a^{-2}$, while field time derivatives of any order vanish at late times at least as fast as $a^{-2}$.
It remains to consider the inclusion of other kinds of fields, but first we
must say a word about the effect of scalar field masses.
III. MASSES
In the foregoing section we have treated the scalar fields (aside from a
single inflaton field whose fluctuations can be eliminated by a gauge choice)
as if they were all massless, with any possible scalar mass terms in the
Lagrangian implicitly included as just additional possible terms in the po-
tential V (ϕ). As we have seen, when treated perturbatively such a term can
at most introduce powers of ln a in the late-time behavior of the integrand
for cosmological correlation functions. But a mass m cannot be treated as
a perturbation over time intervals t for which mt ≫ 1, and in this case the
powers of ln a can add up to effects that materially change the late-time
behavior of the integrand, requiring a separate treatment of mass effects.
If a scalar mass m is sufficiently large compared with the expansion
rate H, then it produces oscillations in the integrand at late times, which
suppresses the contribution of any times later than 1/m. For m ≫ H, the
correlation function is therefore dominated by times in the era of horizon
exit. But for m < H, a more detailed analysis is required.

We can get a good idea of what happens in these two cases by considering the simple example of a purely exponential expansion, $a \propto e^{Ht}$, with $H$ constant. The wave equation for any one scalar field of mass $m$ is
$$ \frac{d^2\varphi_q}{dt^2} + 3H\,\frac{d\varphi_q}{dt} + \left[m^2 + (q/a)^2\right]\varphi_q = 0\,. \tag{20} $$
For $H$ constant, the solutions for $q/aH \ll 1$ are
$$ \varphi_q \to C_q\,a^{\lambda_+}\left[1 + O\!\left(\left(\frac{q}{aH}\right)^{2}\right)\right] + D_q\,a^{\lambda_-}\left[1 + O\!\left(\left(\frac{q}{aH}\right)^{2}\right)\right], \tag{21} $$
where
$$ \lambda_\pm = -\frac32 \pm \sqrt{\frac94 - \frac{m^2}{H^2}}\,, \tag{22} $$
and $C_q$ and $D_q$ are complex constants determined by matching this solution to solutions before horizon exit. For $m > 3H/2$ the exponents $\lambda_\pm$ are complex conjugates, so as mentioned above, the oscillations of the wave functions suppress the contribution of late times.
For $m < 3H/2$, the $\lambda_\pm$ are real, with
$$ -3/2 < \lambda_+ < 0\,, \qquad -3 < \lambda_- < -3/2\,. $$

Each scalar field factor in the Lagrangian thus contributes a factor of $a^{\lambda_+}(t)$ at late times, and as long as $q/aH \ll \sqrt{|\lambda_+|}$, the time derivative of a scalar field will contribute the same factor. On the other hand, commutators of scalar fields and/or scalar time derivatives contribute factors $a(t)^{\lambda_+ + \lambda_-} = a^{-3}(t)$, since the commutators can arise only from an interference between the two terms in Eq. (21). Once again, with no more than 3 powers of $a(t)$ in each interaction, and with just as many commutators as there are interactions, the total number of factors of $a(t)$ in the integrands for correlation functions cannot be greater than zero. Furthermore, except for trivial diagrams in which every vertex has just two lines attached, since each commutator involves just two fields, there must be fields that are not in commutators. These contribute additional factors of $a(t)^{\lambda_+}$ to the integrand, and since $\lambda_+ < 0$, the integrand will be exponentially damped at late times, and the correlation functions will depend only on the behavior of the fields near horizon exit.
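The two regimes of Eq. (22) are easy to tabulate. The following minimal sketch evaluates $\lambda_\pm$ as a function of the ratio $m/H$ and reports whether the exponents are real or complex conjugates; the function names are hypothetical and chosen only for this illustration.

```python
import cmath

def late_time_exponents(m_over_H):
    """Return (lambda_plus, lambda_minus) of Eq. (22) as complex numbers."""
    root = cmath.sqrt(9.0 / 4.0 - m_over_H ** 2)
    return -1.5 + root, -1.5 - root

def is_oscillatory(m_over_H):
    """True when the exponents are complex conjugates, i.e. m > 3H/2."""
    lam_plus, _ = late_time_exponents(m_over_H)
    return abs(lam_plus.imag) > 0.0

if __name__ == "__main__":
    for ratio in (0.0, 0.5, 1.0, 1.5, 2.0):
        lp, lm = late_time_exponents(ratio)
        print(f"m/H={ratio}: lambda+={lp:.4f}, lambda-={lm:.4f}, "
              f"sum={(lp + lm).real:.1f}, oscillatory={is_oscillatory(ratio)}")
```

For every mass the two exponents add up to $-3$, which is the origin of the $a^{-3}(t)$ behavior of the commutators noted above.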

IV. VECTOR FIELDS

Next consider a massless vector field, given (in temporal gauge) in the interaction picture by
$$ A_i(\mathbf{x},t) = \sum_\lambda \int d^3q \left[ e^{i\mathbf{q}\cdot\mathbf{x}}\,\epsilon_i(\hat q,\lambda)\,\alpha(\mathbf{q},\lambda)\,u_q(t) + e^{-i\mathbf{q}\cdot\mathbf{x}}\,\epsilon_i^*(\hat q,\lambda)\,\alpha^*(\mathbf{q},\lambda)\,u_q^*(t) \right], \tag{23} $$
where here $\epsilon_i(\hat q,\lambda)$ and $\alpha(\mathbf{q},\lambda)$ are the polarization vectors and annihilation operators for massless particles of helicity $\lambda = \pm 1$, and $u_q(t)$ is a suitably normalized solution of the wave equation
$$ \frac{d}{dt}\left(a(t)\,\frac{d}{dt}u_q(t)\right) + \frac{q^2}{a(t)}\,u_q(t) = 0\,. \tag{24} $$
The commutator of two vector fields at unequal times is then
$$ [A_i(\mathbf{x},t), A_j(\mathbf{x}',t')] = \int d^3q \left(\delta_{ij} - \hat q_i \hat q_j\right) \exp\left(i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')\right) \left( u_q(t)\,u_q^*(t') - u_q(t')\,u_q^*(t) \right). \tag{25} $$
Now, the general solution of the wave equation (24) here takes the simple form
$$ u_q(t) = C_q \cos q\tau + D_q \sin q\tau\,, \tag{26} $$
with $C_q$ and $D_q$ complex constants, and as usual
$$ \tau \equiv \int_t^\infty \frac{dt'}{a(t')}\,. \tag{27} $$
We see that at late times, where $\tau \to 0$, $u_q(t)$ approaches a constant, while $\dot u_q(t)$ goes essentially as $1/a(t)$. Also,
$$ u_q(t)\,u_q^*(t') - u_q(t')\,u_q^*(t) = 2i\,\mathrm{Im}\left(C_q D_q^*\right) \sin q(\tau - \tau')\,, \tag{28} $$
so at late times $u_q(t)u_q^*(t') - u_q(t')u_q^*(t)$ and $u_q(t)\dot u_q^*(t') - \dot u_q(t')u_q^*(t)$ go essentially as $1/a$, while $\dot u_q(t)\dot u_q^*(t') - \dot u_q(t')\dot u_q^*(t)$ goes to zero even faster, as $1/a^3$. There are just as many commutators as there are interaction vertices, so if a term in the integrand involves a set of interactions $H_s$ with $A_s$ explicit factors of $a$, the integrand will contain altogether a number of factors of $a$ bounded by
$$ \# \le \sum_s \left[A_s - 1\right]. \tag{29} $$
Because of the vector nature of the field, the maximum number of explicit factors of $a(t)$ in any interaction is $3 - 2 = 1$. For instance, in temporal gauge the electromagnetic interaction of a charged scalar field $\varphi$ is
$$ a^3 a^{-2} \left[ i e A_i \left( \varphi^* \partial_i \varphi - \varphi\,\partial_i \varphi^* \right) - e^2 A_i A_i\,\varphi^* \varphi \right], \tag{30} $$
with the factor $a^3$ coming from the metric determinant and the factor $a^{-2}$ coming from $g^{ij}$. So again the maximum number of factors of $a$ in the integrand is zero, giving an integrand that grows at most like a power of $\ln a$. Derivative interactions of the vector field behave even better, because time-derivatives of vector fields give extra factors of $1/a$, while pairs of space-derivatives are accompanied by factors $g^{ij} \propto a^{-2}$. For non-Abelian gauge fields $A^\alpha_\mu$, there are self-interactions
$$ -a^3 a^{-4} \left[ C_{\alpha\beta\gamma}\,\partial_i A_{\alpha j}\,A_{\beta i} A_{\gamma j} + \frac14\,C_{\alpha\beta\gamma} C_{\alpha\delta\epsilon}\,A_{i\beta} A_{j\gamma} A_{i\delta} A_{j\epsilon} \right], \tag{31} $$
where $C_{\alpha\beta\gamma}$ is a structure constant. The four factors of $1/a$ appear here because the interaction involves two contractions of space indices. Each such interaction contributes a factor $a^{-2}$ to the integrand, suppressing the contribution of late times.
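The power counting of Eq. (29), and its scalar analogue from Section II, amounts to simple bookkeeping. The sketch below is only that bookkeeping: it sums the explicit powers of $a(t)$ at each vertex and subtracts the suppression supplied by one commutator per vertex. The function name and argument names are hypothetical, and the sketch ignores the additional suppression from time derivatives and from fields outside commutators.

```python
def max_power_of_a(explicit_powers, commutator_suppression):
    """Upper bound on the net power of a(t) in the late-time integrand.

    explicit_powers: explicit power of a(t) carried by each interaction vertex.
    commutator_suppression: power of 1/a(t) supplied by each commutator
        (3 for the scalar and graviton fields of Section II, 1 for vectors).
    There is one commutator per interaction, as in Eq. (7).
    """
    return sum(power - commutator_suppression for power in explicit_powers)

if __name__ == "__main__":
    # Two scalar vertices carrying the maximal a^3 each.
    print(max_power_of_a([3, 3], commutator_suppression=3))
    # Two electromagnetic vertices of Eq. (30): a^3 a^-2 = a^1 each.
    print(max_power_of_a([1, 1], commutator_suppression=1))
    # One non-Abelian self-interaction of Eq. (31): a^3 a^-4 = a^-1.
    print(max_power_of_a([-1], commutator_suppression=1))
```

A bound of zero allows at most powers of $\ln a$, while a negative bound means that late times are suppressed.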

V. DIRAC FIELDS

A Dirac field of mass $m$ in the interaction picture involves a wave function $\psi_q(t)$ that satisfies the wave equation
$$ \frac{d}{dt}\psi_q + \frac{3H}{2}\,\psi_q + i a^{-1} \gamma_0 \gamma_i q_i\,\psi_q + \gamma_0 m\,\psi_q = 0\,. \tag{32} $$
Hence for wave numbers far outside the horizon, the Dirac wave function has the asymptotic limit
$$ \psi_q(t) \propto e^{-\gamma_0 m t}\,a^{-3/2}(t)\,. \tag{33} $$
The matrix $\gamma_0$ has eigenvalues $\pm i$, so the factor $e^{-\gamma_0 m t}$ produces an oscillation, which does not affect bilinears like $\bar\psi\psi$ or $\bar\psi\gamma_0\psi$, but does produce an oscillation in bilinears like $\bar\psi\gamma_i\psi$, which suppresses the late-time contribution of interactions containing such bilinears. Even apart from this factor (as for instance for $m = 0$), every bilinear combination of $\psi$ and $\bar\psi$ is suppressed by a factor $a^{-3}$ produced by the factor $a^{-3/2}$ in Eq. (33). This in itself cancels the $a^3$ factor from the metric determinant, so that no positive powers of $a(t)$ can be produced by any interaction involving Dirac fields.

VI. AFTERTHOUGHT

In generic theories the $N$ integrals over time in $N$-th order perturbation theory will yield correlation functions at time $t$ that grow as $(\ln a(t))^N$. Such a power series in $\ln a(t)$ can easily add up to a time dependence that grows like a power of $a(t)$, or even more dramatically. As everyone knows, the series of powers of the logarithm of energy encountered in various flat-space theories such as quantum chromodynamics can be summed by the method of the renormalization group. It will be interesting to see if the power series in $\ln a(t)$ encountered in calculating cosmological correlation functions at time $t$, though arising here in a very different way, can be summed by similar methods.

ACKNOWLEDGMENTS

For helpful conversations I am grateful to K. Chaicherdsakul. This material is based upon work supported by the National Science Foundation under Grants Nos. PHY-0071512 and PHY-0455649 and with support from The Robert A. Welch Foundation, Grant No. F-0014, and also grant support from the US Navy, Office of Naval Research, Grant Nos. N00014-03-1-0639 and N00014-04-1-0336, Quantum Optics Initiative.

REFERENCES

1. J. Maldacena, J. High Energy Phys. 0305, 013 (2003) (astro-ph/0210603). For other work on this problem, see A. Gangui, F. Lucchin, S. Matarrese, and S. Mollerach, Astrophys. J. 430, 447 (1994) (astro-ph/9312033); P. Creminelli, J. Cosm. Astropart. Phys. 0310, 003 (2003) (astro-ph/0306122); P. Creminelli and M. Zaldarriaga, J. Cosm. Astropart. Phys. 0410, 006 (2004) (astro-ph/0407059); G. I. Rigopoulos, E. P. S. Shellard, and B. J. W. van Tent, Phys. Rev. D 72, 083507 (2005) (astro-ph/0410486); F. Bernardeau, T. Brunier, and J-P. Uzan, Phys. Rev. D 69, 063520 (2004). For a review, see N. Bartolo, E. Komatsu, S. Matarrese, and A. Riotto, Phys. Rept. 402, 103 (2004) (astro-ph/0406398).
2. S. Weinberg, Phys. Rev. D 72, 043514 (2005) (hep-th/0506236).
3. R. S. Arnowitt, S. Deser, and C. W. Misner, in Gravitation: An Introduction to Current Research, ed. L. Witten (Wiley, New York, 1962): 227; now available as gr-qc/0405109.
4. A log a(t) dependence has been found in different contexts by Woodard and his collaborators; see e.g. N. C. Tsamis and R. Woodard, Ann. Phys. 238, 1 (1995); 253, 1 (1997); N. C. Tsamis and R. Woodard, Phys. Lett. B426, 21 (1998); V. K. Onemli and R. P. Woodard, Class. Quant. Grav. 19, 4607 (2002); T. Prokopec, O. Tornkvist, and R. P. Woodard, Ann. Phys. 303, 251 (2003); T. Prokopec and R. P. Woodard, JHEP 0310, 059 (2003); V. K. Onemli and R. P. Woodard, Phys. Rev. D 70, 107301 (2004); T. Brunier, V. K. Onemli, and R. P. Woodard, Class. Quant. Grav. 22, 59 (2005).

UTTG-09-06

A No-Truncation Approach to Cosmic Microwave Background Anisotropies
arXiv:astro-ph/0607076v2 12 Sep 2006

Steven Weinberg*
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

We offer a method of calculating the source term in the line-of-sight integral for cosmic
microwave background anisotropies without using a truncated partial-wave expansion in the
Boltzmann hierarchy.

*
Electronic address: weinberg@physics.utexas.edu

I. Introduction

Originally the Boltzmann equation for the photon distribution in cosmology[1] was solved
numerically by expanding the components of the photon density matrix in a series of Legen-
dre polynomials[2], but to get results that could be compared with observation of the cosmic
microwave background this method requires the inclusion of hundreds or even thousands of
partial waves, requiring hours or even days of computer time for each theoretical model. A
great improvement was introduced with the suggestion to use instead a formal solution of
the Boltzmann equation, in the form of a “line of sight” integral[3]. But this is still only a
formal solution, in the sense that we still need to calculate source terms appearing in the
integrand. These terms involve partial waves for the photon distribution up to ℓ = 2 for
the scalar modes and ℓ = 4 for the tensor modes, and these of course are coupled to higher
partial waves. In the original proposal of the “line of sight” method, and in the computer
programs CMBfast and CAMB based on this method, these source terms are calculated
numerically, by first finding an approximate solution of the Boltzmann equations for partial
wave amplitudes. An accurate solution for the partial waves appearing in the source terms
can be found by truncating the partial wave expansion at a sufficiently high value of ℓ. In the
latest version of CMBfast, the integrand for scalar modes is calculated using partial waves up
to ℓ = 12, in which case one has to solve at least 26 coupled ordinary differential equations
for the evolution of the partial wave amplitudes, not counting the equations needed to follow
the evolution of the baryonic plasma, cold dark matter, neutrinos, and gravitational field
components. For tensor modes, the source terms are calculated by solving the 22 differential
equations for partial waves with up to ℓ = 10 for photons. The results for ℓ ≤ 2 or ℓ ≤ 4
are used in this method to calculate the integrand of the line-of-sight integral, which then
is used to calculate all the higher partial wave amplitudes measured in observations of the
cosmic microwave background, up to values of ℓ over 1000.
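The equation counts quoted above are consistent with one amplitude per multipole for each of two quantities. The short sketch below makes that arithmetic explicit; the interpretation of the factor of two as temperature plus polarization, and the function name, are assumptions made here for illustration.

```python
def photon_hierarchy_size(l_max, components=2):
    """Number of coupled ODEs for photon partial waves l = 0..l_max.

    Assumption: one amplitude per multipole for each of `components`
    quantities (taken here to be temperature and polarization).
    """
    return components * (l_max + 1)

if __name__ == "__main__":
    print(photon_hierarchy_size(12))  # scalar modes, truncated at l = 12
    print(photon_hierarchy_size(10))  # tensor modes, truncated at l = 10
```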
This article will present an alternative approach, which does not use partial wave ex-
pansions to calculate the source terms, and hence obviates the need for any truncation of
this expansion. Instead of a large number of coupled differential equations for the partial
waves, we have a single integral equation for the tensor modes, and a trio of coupled integral
equations for the scalar modes (including one for the plasma velocity). Of course, integral
equations are generally harder to solve numerically than differential equations (no routine
for solving them is supplied by Mathematica), but in the case at hand the integral equations
can be solved numerically by simple iteration. In this method, the calculation itself provides
an immediate way of judging its own reliability: if the nth iteration agrees with the (n−1)th
iteration to a satisfactory degree of accuracy, one has a solution. In a sample calculation of
the source term for the tensor modes, the results converge rapidly in just a few iterations.
This paper concentrates in the next section on the calculation of the photon distribution,
but a truncated partial wave expansion is also unnecessary for neutrinos. Indeed, it is
already known[4] that the momentum distribution of massless neutrinos for a given metric
perturbation can be calculated in terms of a simple line-of-sight integral, with no need to solve
integral equations. CMBfast does not use this line-of-sight method for neutrinos, presumably
because no one is interested in very high partial waves in the neutrino distribution, but to
get good accuracy for the neutrino contribution to the energy-momentum tensor it carries
the partial wave expansion to ℓmax = 25. In the last section of this paper the approach
of reference [4], which dispenses with partial wave expansions, is extended to the case of
massive as well as massless neutrinos, and to scalar as well as tensor modes.

II. Photons

First, some reminders about the Boltzmann equation for photons in cosmology. We will adopt a coordinate system in which the metric takes the form
$$ g_{00}(\mathbf{x},t) = -1\,, \qquad g_{0i}(\mathbf{x},t) = 0\,, \qquad g_{ij}(\mathbf{x},t) = a^2(t)\left[\delta_{ij} + h_{ij}(\mathbf{x},t)\right], \tag{1} $$
where $h_{ij}$ is a first-order perturbation. Weakly perturbed metrics will automatically be of this form in tensor modes, and can be put in this form for scalar modes by adopting a synchronous gauge.
For our purposes, it is important to write the Boltzmann equation for the photon distribution in a matrix form, rather than in the partial wave formalism in which it is usually presented. The photon distribution is described by a polarization density matrix $n^{ij}(\mathbf{x},\mathbf{p},t)$, defined so that if we measure whether photons have polarization in a direction $e^i$ rather than in an orthogonal direction, then the number of photons with polarization $e^i$ in a volume $\prod_i dp_i\,dx^i$ of phase space at time $t$ will be found to be $g_{ik}\,g_{jl}\,e^k e^l\,n^{ij}(\mathbf{x},\mathbf{p},t)\prod_m dp_m\,dx^m$, with $p_i n^{ij} = 0$. (The polarization of a photon with 3-momentum $p_i$ is described by a polarization vector $e^i$, satisfying $p_i e^i = 0$ and $g_{ij} e^i e^j = 1$.) For small perturbations, this matrix can be put in the form
$$ n^{ij}(\mathbf{x},\mathbf{p},t) = \frac12\,\bar n_\gamma\!\left(a(t)\sqrt{g^{kl}(\mathbf{x},t)\,p_k p_l}\right)\left[ g^{ij}(\mathbf{x},t) - \frac{g^{ik}(\mathbf{x},t)\,g^{jl}(\mathbf{x},t)\,p_k p_l}{g^{kl}(\mathbf{x},t)\,p_k p_l} \right] + \delta n^{ij}(\mathbf{x},\mathbf{p},t)\,. \tag{2} $$

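Here $\bar n_\gamma(p)$ is the equilibrium phase space number density
$$ \bar n_\gamma(p) \equiv \frac{1}{(2\pi)^3}\left[\exp\left(p/k_B\,a(t)T(t)\right) - 1\right]^{-1}, \tag{3} $$
(which is a time-independent function of its argument because in the era of interest $T(t) \propto 1/a(t)$), and $\delta n^{ij}$ is a small perturbation. This perturbation satisfies a linearized Boltzmann equation:
$$ \begin{aligned} &\frac{\partial\,\delta n^{ij}(\mathbf{x},\mathbf{p},t)}{\partial t} + \frac{\hat p_k}{a(t)}\,\frac{\partial\,\delta n^{ij}(\mathbf{x},\mathbf{p},t)}{\partial x^k} + \frac{2\dot a(t)}{a(t)}\,\delta n^{ij}(\mathbf{x},\mathbf{p},t) - \frac{1}{4a^2(t)}\,p\,\bar n'_\gamma(p)\,\hat p_k \hat p_l\,\dot h_{kl}(\mathbf{x},t)\left(\delta_{ij} - \hat p_i \hat p_j\right) \\ &\quad = -\omega_c(t)\,\delta n^{ij}(\mathbf{x},\mathbf{p},t) + \frac{3\omega_c(t)}{8\pi}\int d^2\hat p_1 \Big[ \delta n^{ij}(\mathbf{x},p\hat p_1,t) - \hat p_i \hat p_k\,\delta n^{kj}(\mathbf{x},p\hat p_1,t) - \hat p_j \hat p_k\,\delta n^{ik}(\mathbf{x},p\hat p_1,t) + \hat p_i \hat p_j \hat p_k \hat p_l\,\delta n^{kl}(\mathbf{x},p\hat p_1,t) \Big] \\ &\qquad - \frac{\omega_c(t)}{2a^2(t)}\,p_k\,\delta u_k(\mathbf{x},t)\,\bar n'_\gamma(p)\left[\delta_{ij} - \hat p_i \hat p_j\right], \end{aligned} \tag{4} $$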

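where $p \equiv \sqrt{p_i p_i}$, $\hat p_k \equiv p_k/p$, $\delta u_k$ is the streaming velocity of the baryonic plasma, and $\omega_c$ is the frequency with which a photon collides with electrons in the plasma. Instead of $\delta n^{ij}$, it is sufficient to consider the intensity matrix perturbation $J_{ij}(\mathbf{x},\hat p,t)$, defined by
$$ a^4(t)\,\bar\rho_\gamma(t)\,J_{ij}(\mathbf{x},\hat p,t) \equiv a^2(t)\int_0^\infty \delta n^{ij}(\mathbf{x},p\hat p,t)\,4\pi p^3\,dp\,, \tag{5} $$
where $\bar\rho_\gamma(t) \equiv a^{-4}(t)\int 4\pi p^3\,\bar n_\gamma(p)\,dp$ is the mean photon energy density. (This is all we need to calculate the photon contributions to the perturbations in the energy-momentum tensor.) To derive the Boltzmann equation for $J_{ij}(\mathbf{x},\hat p,t)$ we multiply Eq. (4) by $4\pi p^3$ and integrate over $p \equiv \sqrt{p_i p_i}$, and find
$$ \begin{aligned} &\frac{\partial J_{ij}(\mathbf{x},\hat p,t)}{\partial t} + \frac{\hat p_k}{a(t)}\,\frac{\partial J_{ij}(\mathbf{x},\hat p,t)}{\partial x^k} + \hat p_k \hat p_l\,\dot h_{kl}(\mathbf{x},t)\left(\delta_{ij} - \hat p_i \hat p_j\right) \\ &\quad = -\omega_c(t)\,J_{ij}(\mathbf{x},\hat p,t) + \frac{3\omega_c(t)}{8\pi}\int d^2\hat p_1 \Big[ J_{ij}(\mathbf{x},\hat p_1,t) - \hat p_i \hat p_k\,J_{kj}(\mathbf{x},\hat p_1,t) - \hat p_j \hat p_k\,J_{ik}(\mathbf{x},\hat p_1,t) + \hat p_i \hat p_j \hat p_k \hat p_l\,J_{kl}(\mathbf{x},\hat p_1,t) \Big] \\ &\qquad + 2\omega_c(t)\left[\delta_{ij} - \hat p_i \hat p_j\right]\hat p_k\,\delta u_k(\mathbf{x},t)\,. \end{aligned} \tag{6} $$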

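We can also calculate the equation of motion of the plasma, using the perturbed momentum-conservation equation:
$$ \frac{\bar\rho_B(t)}{a(t)}\,\frac{\partial}{\partial t}\Big[a(t)\,\delta u_k(\mathbf{x},t)\Big] = -\frac43\,\omega_c(t)\,\bar\rho_\gamma(t)\,\delta u_k(\mathbf{x},t) + \omega_c(t)\,\bar\rho_\gamma(t)\int \frac{d^2\hat p}{4\pi}\,J_{ii}(\mathbf{x},\hat p,t)\,\hat p_k\,, \tag{7} $$
where $\bar\rho_B$ is the unperturbed density of the baryonic plasma. These partial differential equations can be converted to ordinary differential equations by writing the perturbed metric as a Fourier integral
$$ h_{ij}(\mathbf{x},t) = \int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,h_{ij}(\mathbf{q},t) \tag{8} $$
and looking for solutions in the form
$$ J_{ij}(\mathbf{x},\hat p,t) = \int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,J_{ij}(\mathbf{q},\hat p,t)\,, \qquad \delta u_i(\mathbf{x},t) = \int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\delta u_i(\mathbf{q},t)\,. \tag{9} $$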

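Then Eqs. (6) and (7) become ordinary differential equations, with $\partial/\partial x^i$ replaced with $iq_i$. They have a formal solution in the form of a “line of sight” integral[3], which in the general case may be written
$$ \begin{aligned} J_{ij}(\mathbf{q},\hat p,t) = {} & \int_{t_1}^{t} dt'\,\exp\left( -i\mathbf{q}\cdot\hat p \int_{t'}^{t}\frac{dt''}{a(t'')} - \int_{t'}^{t} dt''\,\omega_c(t'') \right) \\ & \times \Bigg[ -\hat p_k \hat p_l \left(\delta_{ij} - \hat p_i \hat p_j\right)\dot h_{kl}(\mathbf{q},t') + \frac{3\omega_c(t')}{2}\Big( J_{ij}(\mathbf{q},t') - \hat p_i \hat p_k\,J_{kj}(\mathbf{q},t') - \hat p_j \hat p_k\,J_{ik}(\mathbf{q},t') + \hat p_i \hat p_j \hat p_k \hat p_l\,J_{kl}(\mathbf{q},t') \Big) \\ & \qquad + 2\omega_c(t')\left[\delta_{ij} - \hat p_i \hat p_j\right]\hat p_k\,\delta u_k(\mathbf{q},t') \Bigg] + J_{ij}(\mathbf{q},\hat p,t_1)\,, \end{aligned} \tag{10} $$
$$ \delta u_i(\mathbf{q},t) = \frac{3}{4a(t)}\int_{t_1}^{t} dt'\,\exp\left( -\int_{t'}^{t} dt''\,\frac{\omega_c(t'')}{R(t'')} \right)\frac{\omega_c(t')\,a(t')}{R(t')}\,I_i(\mathbf{q},t') + \delta u_i(\mathbf{q},t_1)\,. \tag{11} $$
We have here introduced the convenient abbreviations
$$ J_{ij}(\mathbf{q},t) \equiv \int \frac{d^2\hat p}{4\pi}\,J_{ij}(\mathbf{q},\hat p,t)\,, \tag{12} $$
$$ I_i(\mathbf{q},t) \equiv \int \frac{d^2\hat p}{4\pi}\,J_{kk}(\mathbf{q},\hat p,t)\,\hat p_i\,, \tag{13} $$
and, as usual,
$$ R(t) \equiv \frac{3\bar\rho_B(t)}{4\bar\rho_\gamma(t)}\,. $$
If we take the initial time $t_1$ early enough so that photons are in local thermal equilibrium with the plasma at that time, then
$$ J_{ij}(\mathbf{q},\hat p,t_1) = 2\left(\delta_{ij} - \hat p_i \hat p_j\right)\left[ \frac{\delta T(\mathbf{q},t_1)}{\bar T(t_1)} + \hat p_k\,\delta u_k(\mathbf{q},t_1) \right]. \tag{14} $$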

It is the calculation of the source terms Jij (q, t) and Ii (q, t) that concerns us in this
paper. For this purpose, we now need to distinguish between tensor and scalar modes. We
first consider tensor modes, which are computationally simpler.

Tensor Modes

For tensor modes the metric perturbation takes the form
$$ h_{ij}(\mathbf{q},t) = \sum_{\lambda=\pm 2} \beta(\mathbf{q},\lambda)\,e_{ij}(\hat q,\lambda)\,D_q(t)\,, \tag{15} $$
where $\beta(\mathbf{q},\lambda)$ is a stochastic parameter; $e_{ij}(\hat q,\lambda)$ is a polarization tensor for helicity $\lambda$, with $\hat q_i e_{ij}(\hat q,\lambda) = 0$ and $e_{kk}(\hat q,\lambda) = 0$; and $D_q(t)$ is the solution of the wave equation
$$ \ddot D_q(t) + 3\,\frac{\dot a}{a}\,\dot D_q(t) + \frac{q^2}{a^2}\,D_q(t) = 16\pi G\,\pi^T_q(t)\,, \tag{16} $$
that does not decay while outside the horizon. Here $\pi^T_q(t)$ is the coefficient of $\sum_\lambda e_{ij}(\hat q,\lambda)\,\beta(\mathbf{q},\lambda)$ in the Fourier transform of the tensor part of the anisotropic inertia tensor. (We are considering times sufficiently late so that the decaying solution makes a negligible contribution to $\delta g_{ij}$.) The quantity $J_{ij}(\mathbf{q},\hat p,t)$ will then take the form of a corresponding sum over graviton helicities
$$ J_{ij}(\mathbf{q},\hat p,t) = \sum_{\lambda=\pm 2} \beta(\mathbf{q},\lambda)\,J_{ij}(q,\hat p,t,\lambda)\,, \tag{17} $$
with $J_{ij}(q,\hat p,t,\lambda)$ ordinary c-number functions, not stochastic fields, satisfying the Boltzmann equation
$$ \begin{aligned} &\frac{\partial J_{ij}(q,\hat p,t,\lambda)}{\partial t} + i\,\frac{\mathbf{q}\cdot\hat p}{a(t)}\,J_{ij}(q,\hat p,t,\lambda) + \hat p_k \hat p_l\,e_{kl}(\hat q,\lambda)\,\dot D_q(t)\left(\delta_{ij} - \hat p_i \hat p_j\right) \\ &\quad = -\omega_c(t)\,J_{ij}(q,\hat p,t,\lambda) + \frac{3\omega_c(t)}{2}\Big[ J_{ij}(q,t,\lambda) - \hat p_i \hat p_k\,J_{kj}(q,t,\lambda) - \hat p_j \hat p_k\,J_{ik}(q,t,\lambda) + \hat p_i \hat p_j \hat p_k \hat p_l\,J_{kl}(q,t,\lambda) \Big], \end{aligned} \tag{18} $$
where
$$ J_{ij}(q,t,\lambda) \equiv \int \frac{d^2\hat p}{4\pi}\,J_{ij}(q,\hat p,t,\lambda)\,. \tag{19} $$
(The velocity perturbation $\delta u_i$ is absent in tensor modes.) Furthermore, because $J_{ij}(q,t,\lambda)$ for a given helicity $\lambda$ must be a linear combination of the polarization tensor components $e_{kl}(\hat q,\lambda)$ with the same $\lambda$, while $\hat q_k e_{kl}(\hat q,\lambda)$ and $e_{kk}(\hat q,\lambda)$ both vanish, the only possible form of $J_{ij}(q,t,\lambda)$ allowed by rotational invariance is just $e_{ij}(\hat q,\lambda)$ times some function of $q \equiv |\mathbf{q}|$ and $t$. This relation is conventionally written
$$ J_{ij}(q,t,\lambda) = -\frac23\,e_{ij}(\hat q,\lambda)\,\Psi(q,t)\,. \tag{20} $$
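To make contact with the notation used in the usual calculation of the source term $\Psi(q,t)$, we note that the intensity matrix perturbation may be written in the form
$$ \begin{aligned} J_{ij}(q,\hat p,t,\lambda) = {} & \frac12\left(\delta_{ij} - \hat p_i \hat p_j\right)\hat p_k \hat p_l\,e_{kl}(\hat q,\lambda)\left[ \Delta_T^{(T)}(q,\hat p\cdot\hat q,t) + \Delta_P^{(T)}(q,\hat p\cdot\hat q,t) \right] \\ & + \Big( e_{ij}(\hat q,\lambda) - \hat p_i \hat p_k\,e_{kj}(\hat q,\lambda) - \hat p_j \hat p_k\,e_{ik}(\hat q,\lambda) + \hat p_i \hat p_j \hat p_k \hat p_l\,e_{kl}(\hat q,\lambda) \Big)\,\Delta_P^{(T)}(q,\hat p\cdot\hat q,t)\,. \end{aligned} \tag{21} $$
(Here the superscript $T$ stands for ‘tensor,’ while the subscript $T$ stands for ‘temperature.’ The coefficients are chosen so that $J_{ii}$ is proportional to $\Delta_T$, and the polarization is proportional to $\Delta_P$.) A third term proportional to $(\hat q_i - \hat p_i(\hat p\cdot\hat q))(\hat q_j - \hat p_j(\hat p\cdot\hat q))\,\hat p_k \hat p_l\,e_{kl}$ would be allowed by symmetry principles, but is not generated by the Boltzmann equation. Using Eq. (21) in Eq. (18) yields separate Boltzmann equations for $\Delta_T^{(T)}$ and $\Delta_P^{(T)}$:
$$ \frac{\partial}{\partial t}\Delta_T^{(T)}(q,\hat p\cdot\hat q,t) + i\,a^{-1}(t)\,\mathbf{q}\cdot\hat p\;\Delta_T^{(T)}(q,\hat p\cdot\hat q,t) = -2\dot D_q(t) - \omega_c(t)\,\Delta_T^{(T)}(q,\hat p\cdot\hat q,t) + \omega_c(t)\,\Psi(q,t)\,, \tag{22} $$
$$ \frac{\partial}{\partial t}\Delta_P^{(T)}(q,\hat p\cdot\hat q,t) + i\,a^{-1}(t)\,\mathbf{q}\cdot\hat p\;\Delta_P^{(T)}(q,\hat p\cdot\hat q,t) = -\omega_c(t)\,\Delta_P^{(T)}(q,\hat p\cdot\hat q,t) - \omega_c(t)\,\Psi(q,t)\,. \tag{23} $$
To calculate the source term $\Psi(q,t)$ in terms of partial waves, one first integrates Eq. (21) over $\hat p$, and finds
$$ \Psi(q,t) = -\frac32\int\frac{d^2\hat p}{4\pi}\left[ -\frac18\left(1 - (\hat p\cdot\hat q)^2\right)^2 \Delta_T^{(T)}(q,\hat p\cdot\hat q,t) + \left( (\hat p\cdot\hat q)^2 + \frac18\left(1 - (\hat p\cdot\hat q)^2\right)^2 \right)\Delta_P^{(T)}(q,\hat p\cdot\hat q,t) \right]. \tag{24} $$
The functions $\Delta_T^{(T)}$ and $\Delta_P^{(T)}$ may be expanded in Legendre polynomials
$$ \Delta_T^{(T)}(q,\hat p\cdot\hat q,t) = \sum_{\ell=0}^{\infty} i^{-\ell}\,(2\ell+1)\,P_\ell(\hat q\cdot\hat p)\,\Delta_{T,\ell}^{(T)}(q,t)\,, \tag{25} $$
$$ \Delta_P^{(T)}(q,\hat p\cdot\hat q,t) = \sum_{\ell=0}^{\infty} i^{-\ell}\,(2\ell+1)\,P_\ell(\hat q\cdot\hat p)\,\Delta_{P,\ell}^{(T)}(q,t)\,, \tag{26} $$
and then Eq. (24) reads[5]:
$$ \Psi(q,t) = \frac{1}{10}\,\Delta_{T,0}^{(T)}(q,t) + \frac17\,\Delta_{T,2}^{(T)}(q,t) + \frac{3}{70}\,\Delta_{T,4}^{(T)}(q,t) - \frac35\,\Delta_{P,0}^{(T)}(q,t) + \frac67\,\Delta_{P,2}^{(T)}(q,t) - \frac{3}{70}\,\Delta_{P,4}^{(T)}(q,t)\,. \tag{27} $$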
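The coefficients in Eq. (27) follow from Eq. (24) and the expansions (25) and (26) by elementary integrals over $\mu = \hat p\cdot\hat q$. As a cross-check, the sketch below evaluates those integrals with exact rational arithmetic. The helper names and the representation of polynomials as coefficient lists are choices made for this illustration only.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add two polynomials of possibly different degree."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def legendre(l):
    """Coefficients of P_l(mu) from Bonnet's recurrence."""
    p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]
    if l == 0:
        return p_prev
    for n in range(1, l):
        # (n+1) P_{n+1} = (2n+1) mu P_n - n P_{n-1}
        mu_p = poly_mul([Fraction(0), Fraction(1)], p)
        term1 = [Fraction(2 * n + 1, n + 1) * c for c in mu_p]
        term2 = [Fraction(-n, n + 1) * c for c in p_prev]
        p_prev, p = p, poly_add(term1, term2)
    return p

def angular_average(poly):
    """(1/2) * integral over mu in [-1, 1]; odd powers integrate to zero."""
    return sum(c / (k + 1) for k, c in enumerate(poly) if k % 2 == 0)

def source_coefficients(weight, l_max=8):
    """Coefficient of each partial wave when `weight` multiplies Eq. (25) or (26)."""
    coeffs = {}
    for l in range(l_max + 1):
        avg = angular_average(poly_mul(weight, legendre(l)))
        if avg == 0:
            continue  # includes every odd l, by parity
        phase = (-1) ** (l // 2)  # i^(-l) for even l
        coeffs[l] = phase * (2 * l + 1) * avg
    return coeffs

# Weights multiplying Delta_T and Delta_P in Eq. (24), including the -3/2 prefactor.
ONE_MINUS_MU2_SQ = [Fraction(1), Fraction(0), Fraction(-2), Fraction(0), Fraction(1)]
MU2 = [Fraction(0), Fraction(0), Fraction(1)]
WEIGHT_T = [Fraction(3, 16) * c for c in ONE_MINUS_MU2_SQ]
WEIGHT_P = [Fraction(-3, 2) * c
            for c in poly_add(MU2, [Fraction(1, 8) * c for c in ONE_MINUS_MU2_SQ])]

if __name__ == "__main__":
    print("temperature:", source_coefficients(WEIGHT_T))
    print("polarization:", source_coefficients(WEIGHT_P))
```

Only $\ell = 0$, 2, and 4 survive because the weights are even polynomials of degree four in $\mu$, which is why the source term involves partial waves only up to $\ell = 4$.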
As already mentioned, experience shows that to accurately calculate the partial wave amplitudes up to $\ell = 4$, which appear in Eq. (27), one needs to solve the Boltzmann equations for the partial wave amplitudes up to larger values of $\ell$, up to $\ell = 10$. Once $\Psi$ is calculated in this way, we can calculate $J_{ij}(q,\hat p,t,\lambda)$ for very much higher values of $\ell$ by using the “line of sight” integral (10), which for the tensor modes gives
$$ \begin{aligned} J_{ij}(q,\hat p,t,\lambda) = {} & \int_{t_1}^{t} dt'\,\exp\left( -i\mathbf{q}\cdot\hat p \int_{t'}^{t}\frac{dt''}{a(t'')} - \int_{t'}^{t} dt''\,\omega_c(t'') \right) \\ & \times \Big[ -\hat p_k \hat p_l \left(\delta_{ij} - \hat p_i \hat p_j\right) e_{kl}(\hat q,\lambda)\,\dot D_q(t') \\ & \qquad - \omega_c(t')\,\Psi(q,t')\Big( e_{ij}(\hat q,\lambda) - \hat p_i \hat p_k\,e_{kj}(\hat q,\lambda) - \hat p_j \hat p_k\,e_{ik}(\hat q,\lambda) + \hat p_i \hat p_j \hat p_k \hat p_l\,e_{kl}(\hat q,\lambda) \Big) \Big]. \end{aligned} \tag{28} $$

(In tensor modes there is no perturbation to either the temperature or the velocity of the baryonic plasma, so Eq. (14) gives the initial value $J_{ij}(\mathbf{q},\hat p,t_1) = 0$ for an initial time $t_1$ taken sufficiently early so that photons are in local thermal equilibrium with the plasma.)
Here we suggest the alternative, of deriving an integral equation for $\Psi(q,t)$ by simply analytically integrating Eq. (28) over $\hat p$. Equating the coefficients of $e_{ij}$ on both sides gives the integral equation
$$ \Psi(q,t) = \frac32\int_{t_1}^{t} dt'\,\exp\left[ -\int_{t'}^{t}\omega_c(t'')\,dt'' \right] \left[ -2\dot D_q(t')\,K\!\left( q\int_{t'}^{t}\frac{dt''}{a(t'')} \right) + \omega_c(t')\,F\!\left( q\int_{t'}^{t}\frac{dt''}{a(t'')} \right)\Psi(q,t') \right]. \tag{29} $$

Here $K(v)$ and $F(v)$ are the functions
$$ K(v) \equiv j_2(v)/v^2\,, \qquad F(v) \equiv j_0(v) - 2j_1(v)/v + 2j_2(v)/v^2\,, \tag{30} $$
where the $j_\ell(v)$ are spherical Bessel functions.

Integral equations of this sort are harder to solve numerically than differential equations, because in a step-by-step calculation it is necessary to keep track of $\Psi(t')$ for all $t' < t$ in order to calculate $\Psi(t)$. On the other hand, such equations can also be solved by simple iteration, in which the numerical work is reduced to doing a few integrals. We calculate the $n$th approximation $\Psi^{(n)}$ to $\Psi$ by using the previous approximation $\Psi^{(n-1)}$ in the second term in square brackets in Eq. (29), and start this calculation by taking the lowest approximation $\Psi^{(0)}$ as just Eq. (29) with the second term in square brackets dropped:
$$ \Psi^{(0)}(q,t) = -3\int_{t_1}^{t} dt'\,\exp\left[ -\int_{t'}^{t}\omega_c(t'')\,dt'' \right]\dot D_q(t')\,K\!\left( q\int_{t'}^{t}\frac{dt''}{a(t'')} \right). \tag{31} $$

Using $\Psi^{(0)}$ for $\Psi$ in Eq. (28) would give precisely the same photon distribution function that we would find if we assumed that photons remained unpolarized until the last scattering; subsequent iterations take account of the polarization produced in multiple scattering. When after $n$ iterations we find that $\Psi^{(n)} = \Psi^{(n-1)}$ to an adequate degree of accuracy, we have a solution of Eq. (29). This is in contrast with the usual truncation method, in which the accuracy of the calculation for a given maximum multipole order can only be judged by comparing with the results for a higher maximum multipole order.
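The iteration scheme and its built-in convergence test can be sketched on a toy problem. The code below solves a generic equation of the form $\psi(t) = s(t) + \int_0^t k(t,t')\,\psi(t')\,dt'$ by repeated substitution, using the trapezoidal rule and stopping when successive iterates agree. The toy source and kernel (both equal to one, for which the exact solution is $e^t$), the grid, the tolerance, and the function names are all assumptions for illustration; this is not the physical kernel of Eq. (29).

```python
import math

def solve_by_iteration(source, kernel, t_grid, tol=1e-10, max_iter=100):
    """Solve psi(t) = source(t) + int_0^t kernel(t, s) psi(s) ds by iteration.

    Returns (psi, n_iter, last_change), where last_change is the largest
    difference between the final two iterates.
    """
    n = len(t_grid)
    src = [source(t) for t in t_grid]
    psi = src[:]  # zeroth approximation: integral term dropped
    change = float("inf")
    for iteration in range(1, max_iter + 1):
        new = []
        for i in range(n):
            # Trapezoidal rule over s in [t_0, t_i], using the previous iterate.
            total = 0.0
            for j in range(1, i + 1):
                h = t_grid[j] - t_grid[j - 1]
                f_left = kernel(t_grid[i], t_grid[j - 1]) * psi[j - 1]
                f_right = kernel(t_grid[i], t_grid[j]) * psi[j]
                total += 0.5 * h * (f_left + f_right)
            new.append(src[i] + total)
        change = max(abs(a - b) for a, b in zip(new, psi))
        psi = new
        if change < tol:
            return psi, iteration, change
    return psi, max_iter, change

if __name__ == "__main__":
    grid = [i / 100.0 for i in range(101)]
    psi, n_iter, change = solve_by_iteration(lambda t: 1.0, lambda t, s: 1.0, grid)
    error = max(abs(p - math.exp(t)) for p, t in zip(psi, grid))
    print(f"iterations: {n_iter}, last change: {change:.2e}, error vs exp(t): {error:.2e}")
```

As in the text, the size of the last change between iterates serves as the accuracy check, with no need to compare against a separate, larger calculation.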
To test the convergence of this iteration procedure, we adopt the usual ΛCDM model,
with cosmological parameters

$$ \Omega_B h^2 = 0.0223\,, \qquad \Omega_M h^2 = 0.1262\,, \qquad h = 0.732\,. $$

We calculate the collision rate ωc by solving the kinetic equations for hydrogen recombination,
with an un-ionized helium fraction Y = 0.26. The numerical calculations are done with q
chosen so that the physical wave number q/a comes within the horizon at just the time of
matter-radiation equality. The gravitational wave amplitude Dq (t) is calculated ignoring the
effect of anisotropic inertia. Over the interesting time interval when the matter/radiation
density ratio y increases from 2 to 4, which includes the time of recombination, the first
iteration Ψ(1) (0)
q (t) has roughly the same time-dependence as Ψq (t), but is about 13% to 33

% larger. On the other hand, after five iterations we get a result Ψ(5) (t) that at worst (at
y ≃ 3) is within 0.3% of the previous iteration Ψ(4) , and is much closer at other values of y.
As a further illustration of the convergence of the iteration procedure, Figure 1 shows the
first few iterations for several wave numbers and a broader range of values of y.

We will not go on here to use this source function in Eq. (28) to calculate the intensity
matrix perturbation, because we are not proposing anything new in that part of the calcu-
lation of the microwave background anisotropies, but only in the calculation of the source
terms appearing in the integrand of the line of sight integral. The sample calculation de-
scribed here shows that this is a practical method of calculating the source terms, as well as
having the conceptual advantage of avoiding a more-or-less arbitrary truncation of a partial
wave expansion.

Scalar Modes

For scalar modes, the metric perturbation in synchronous gauge takes the form

hij (q, t) = δij A(q, t) − qi qj B(q, t) . (32)

Assuming the perturbation to be dominated by a single mode (presumably the adiabatic
mode that does not decay while outside the horizon), the dependence of A(q, t) and B(q, t)
on the direction of q is entirely contained in a stochastic factor β(q) for this mode:

A(q, t) = β(q) Aq (t) B(q, t) = β(q) Bq (t) . (33)

In this case, the source terms (12) and (13) appearing in the integrand of the line-of-sight
integral must take the form

$$
J_{ij}(q,t) = \beta(\mathbf{q})\left[\delta_{ij}\,\Phi(q,t) + \frac{1}{2}\,\hat{q}_i\hat{q}_j\,\Pi(q,t)\right], \tag{34}
$$

Ii (q, t) = iβ(q) q̂i Ω(q, t) (35)

Then, using Eq. (11) to evaluate the plasma velocity, the line-of-sight formula (10) becomes

dt′′
!
Z t Z t Z t
′ ′′ ′′
Jij (q, p̂, t) = β(q) dt exp −iq · p̂ − dt ωc (t )
t1 t′ a(t′′ ) t′
"
  
× − δij − p̂i p̂j Ȧq (t′ ) − (p̂ · q)2 Ḃq (t′ )

3ωc (t′ ) 1
      
+ Φ(q, t′ ) δij − p̂i p̂j + Π(q, t′ ) q̂i − p̂i (p̂ · q̂) q̂j − p̂j (p̂ · q̂)
2 2

′′ ′′ ′′′
!
3i  t′ t′
′′ ωc (t )a(t ) ′′′ ωc (t )
 Z Z
′′
+ δij − p̂ i p̂ j dt
(p̂ · q̂) Ω(q, t ) exp − dt
2a(t′ ) t1 R(t′′ ) t′′ R(t′′′ )
# " #
 δT (q, t )
1
  

+2ωc (t ) δij − p̂i p̂j p̂k δuk (q, t1 ) + 2 δij − p̂i p̂j + p̂k δuk (q, t1 ) .
T̄ (t1 )
(36)

Conventionally, the intensity matrix perturbation for scalar modes is written


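$$
J_{ij}(\mathbf{q},\hat{p},t) = \beta(\mathbf{q})\left\{\frac{1}{2}\left[\Delta_T^{(S)}(q,\hat{q}\cdot\hat{p},t) - \Delta_P^{(S)}(q,\hat{q}\cdot\hat{p},t)\right]\left(\delta_{ij}-\hat{p}_i\hat{p}_j\right)
+ \Delta_P^{(S)}(q,\hat{q}\cdot\hat{p},t)\,\frac{\left[\hat{q}_i-(\hat{q}\cdot\hat{p})\hat{p}_i\right]\left[\hat{q}_j-(\hat{q}\cdot\hat{p})\hat{p}_j\right]}{1-(\hat{p}\cdot\hat{q})^2}\right\}, \tag{37}
$$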

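with $\Delta_T^{(S)}$ and $\Delta_P^{(S)}$ satisfying the Boltzmann equations

$$
\dot{\Delta}_P^{(S)}(q,\mu,t) + i\,\frac{q\mu}{a(t)}\,\Delta_P^{(S)}(q,\mu,t) = -\omega_c(t)\,\Delta_P^{(S)}(q,\mu,t) + \frac{3}{4}\,\omega_c(t)\,(1-\mu^2)\,\Pi(q,t), \tag{38}
$$

$$
\dot{\Delta}_T^{(S)}(q,\mu,t) + i\,\frac{q\mu}{a(t)}\,\Delta_T^{(S)}(q,\mu,t) = -\omega_c(t)\,\Delta_T^{(S)}(q,\mu,t) - 2\dot{A}(q,t) + 2q^2\mu^2\dot{B}(q,t)
+ 3\,\omega_c(t)\,\Phi(q,t) + \frac{3}{4}\,\omega_c(t)\,(1-\mu^2)\,\Pi(q,t) + 4\,\omega_c(t)\,\hat{p}_i\,\delta u_i . \tag{39}
$$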
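By expanding $\Delta_T^{(S)}$ and $\Delta_P^{(S)}$ in partial wave amplitudes

$$
\Delta_T^{(S)}(q,\mu,t) = \sum_{\ell=0}^{\infty} i^{-\ell}\,(2\ell+1)\,P_\ell(\mu)\,\Delta_{T\ell}^{(S)}(q,t), \tag{40}
$$

$$
\Delta_P^{(S)}(q,\mu,t) = \sum_{\ell=0}^{\infty} i^{-\ell}\,(2\ell+1)\,P_\ell(\mu)\,\Delta_{P\ell}^{(S)}(q,t), \tag{41}
$$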

and integrating Eq. (37) over the directions of p̂, one obtains well known expressions for the
source functions in terms of the partial wave amplitudes with ℓ ≤ 2:

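$$
\Phi = \frac{1}{6}\left[2\Delta_{T0}^{(S)} - \Delta_{P0}^{(S)} - \Delta_{T2}^{(S)} - \Delta_{P2}^{(S)}\right], \tag{42}
$$

$$
\Pi = \Delta_{P0}^{(S)} + \Delta_{T2}^{(S)} + \Delta_{P2}^{(S)}, \tag{43}
$$

$$
\Omega = \Delta_{T1}^{(S)}. \tag{44}
$$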

Instead, we suggest the alternative of deriving coupled integral equations for Φ, Π, and
Ω by analytically integrating over p̂ in Eq. (36). This gives
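$$
\Phi(q,t) = \int_{t_1}^{t} dt'\,\exp\left[-\int_{t'}^{t} dt''\,\omega_c(t'')\right]
\Bigg[\dot{A}_q(t')\,F_1\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) + q^2\dot{B}_q(t')\,F_2\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)
$$
$$
\quad + \frac{3\omega_c(t')}{2}\left\{-\Phi(q,t')\,F_1\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) + \frac{1}{2}\,\Pi(q,t')\,F_3\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\right\}
$$
$$
\quad + \frac{3}{2a(t')}\,F_4\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\int_{t_1}^{t'} dt''\,\frac{\omega_c(t'')\,a(t'')}{R(t'')}\,\Omega(q,t'')\,\exp\left(-\int_{t''}^{t'}\frac{\omega_c(t''')\,dt'''}{R(t''')}\right)\Bigg], \tag{45}
$$

$$
\Pi(q,t) = \int_{t_1}^{t} dt'\,\exp\left[-\int_{t'}^{t} dt''\,\omega_c(t'')\right]
\Bigg[-2\dot{A}_q(t')\,j_2\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) + 2q^2\dot{B}_q(t')\,F_5\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)
$$
$$
\quad + 3\,\omega_c(t')\left\{\Phi(q,t')\,j_2\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) + \Pi(q,t')\,F_6\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\right\}
$$
$$
\quad + \frac{3}{a(t')}\,F_7\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\int_{t_1}^{t'} dt''\,\frac{\omega_c(t'')\,a(t'')}{R(t'')}\,\Omega(q,t'')\,\exp\left(-\int_{t''}^{t'}\frac{\omega_c(t''')\,dt'''}{R(t''')}\right)\Bigg], \tag{46}
$$

$$
\Omega(q,t) = \int_{t_1}^{t} dt'\,\exp\left[-\int_{t'}^{t} dt''\,\omega_c(t'')\right]
\Bigg[2\dot{A}_q(t')\,j_1\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) - 2q^2\dot{B}_q(t')\,F_8\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)
$$
$$
\quad + \frac{3\omega_c(t')}{2}\left\{-2\,\Phi(q,t')\,j_1\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) + \frac{1}{2}\,\Pi(q,t')\,F_9\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\right\}
$$
$$
\quad + \frac{3}{2a(t')}\,F_{10}\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\int_{t_1}^{t'} dt''\,\frac{\omega_c(t'')\,a(t'')}{R(t'')}\,\Omega(q,t'')\,\exp\left(-\int_{t''}^{t'}\frac{\omega_c(t''')\,dt'''}{R(t''')}\right)\Bigg], \tag{47}
$$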

where

F1 (v) ≡ j1 (v)/v − j0 (v) , (48)

F2 (v) ≡ j1 (v)/v − j2 (v) − j2 (v)/v 2 + j3 (v)/v , (49)

F3 (v) ≡ j1 (v)/v + j2 (v)/v 2 − j3 (v)/v , (50)

F4 (v) ≡ j1 (v) − j2 (v)/v , (51)

F5 (v) ≡ −2j2 (v)/v 2 + 5j3 (v)/v − j4 (v) , (52)

F6 (v) ≡ j0 (v) − 2j1 (v)/v + 2j2 (v) + 2j2 (v)/v 2 − 5j3 (v)/v + j4 (v) , (53)

F7 (v) ≡ −2j2 (v)/v + j3 (v) , (54)

F8 (v) ≡ 3j2 (v)/v − j3 (v) , (55)

F9 (v) ≡ −j1 (v) + 3j2 (v)/v − j3 (v) , (56)

F10 (v) ≡ 2j1 (v)/v − 2j2 (v) , (57)

(We have dropped the terms involving the plasma temperature and velocity perturbations
at the initial time t1 , because as long as the collision rate is much larger than the expansion
rate the photon distribution is given in terms of δT /T̄ and δu by an equilibrium formula like
Eq. (14), and as long as the perturbation is outside the horizon δT /T̄ and δu grow in the
adiabatic mode, so that if t1 is taken sufficiently early in the era when the collision rate is
large and the perturbation is outside the horizon, the initial-data terms become negligible.)
One may again expect that these coupled integral equations may be solved by iteration, as
done for the tensor modes.
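
As an aid to anyone implementing these coupled equations, the kernels of Eqs. (48)–(57) can be tabulated with a few lines of code. The sketch below is illustrative only: the function names `spherical_jn` and `scalar_kernels` are hypothetical, the spherical Bessel functions are summed from their power series (adequate for modest arguments, roughly v < 10), and F5 is written with the term 5 j3(v)/v, which is the form even in v.

```python
import math


def spherical_jn(n, v, terms=60):
    """Spherical Bessel function j_n(v) from its power series.

    j_n(v) = v^n * sum_k (-v^2/2)^k / (k! (2n+2k+1)!!); fine for modest v.
    """
    double_factorial = 1.0
    for m in range(1, 2 * n + 2, 2):  # builds (2n+1)!!
        double_factorial *= m
    term = v ** n / double_factorial
    total = term
    for k in range(1, terms):
        term *= -(v * v / 2.0) / (k * (2 * n + 2 * k + 1))
        total += term
    return total


def scalar_kernels(v):
    """Return {1: F1(v), ..., 10: F10(v)} as in Eqs. (48)-(57); requires v != 0."""
    j0, j1, j2, j3, j4 = (spherical_jn(n, v) for n in range(5))
    return {
        1: j1 / v - j0,
        2: j1 / v - j2 - j2 / v**2 + j3 / v,
        3: j1 / v + j2 / v**2 - j3 / v,
        4: j1 - j2 / v,
        5: -2.0 * j2 / v**2 + 5.0 * j3 / v - j4,
        6: j0 - 2.0 * j1 / v + 2.0 * j2 + 2.0 * j2 / v**2 - 5.0 * j3 / v + j4,
        7: -2.0 * j2 / v + j3,
        8: 3.0 * j2 / v - j3,
        9: -j1 + 3.0 * j2 / v - j3,
        10: 2.0 * j1 / v - 2.0 * j2,
    }
```

The recurrence relation j_{n-1}(v) + j_{n+1}(v) = (2n+1) j_n(v)/v offers a quick consistency check on such a table; for example it implies F8 = (3/5) j1 − (2/5) j3.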

III. Neutrinos

In the absence of collisions, the Boltzmann equation for either massless or massive neu-
trinos for the metric (1) reads

∂n pi ∂n ∂n pj pk ∂gjk
+ 0 i =− . (58)
∂t p ∂x ∂pi p0 ∂xi

Here n(x, p, t) is the phase space density of neutrinos, regarded as a function of the
components x^i, p_i and the time t, while p^i and p^0 are functions of the x^i and t as well as the p_i,
given by $p^i = g^{ij}p_j$ and $p^0 = \sqrt{g^{ij}p_ip_j + m^2}$. (The derivation of Eq. (58) is given in Ref. [4]
for massless neutrinos, but precisely the same derivation applies also for massive neutrinos.)

For weak perturbations, we write

$$
n(\mathbf{x},\mathbf{p},t) = \bar{n}_\nu\!\left(a(t)\sqrt{g^{ij}(\mathbf{x},t)\,p_ip_j}\right) + \delta n(\mathbf{x},\mathbf{p},t), \tag{59}
$$

where n̄ν is a time-independent function of its argument

$$
\bar{n}_\nu(p) \equiv \frac{1}{(2\pi)^3}\left[\exp\left(\frac{\sqrt{p^2 + a^2(t_1)\,m^2}}{k_B\,a(t)\,T(t)}\right) + 1\right]^{-1}. \tag{60}
$$

Here δn is a small correction, representing the dynamical rather than purely geometric
effect of metric perturbations on the neutrinos, and t1 is the time the neutrinos went out of
equilibrium with the baryonic plasma. (Probably all neutrinos have masses much less than
kB T (t1 ), in which case the term a2 (t1 )m2 in the square root can be neglected.) Expanding
to first order in δn and δgij = a2 hij , after many cancellations this gives

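$$
\frac{\partial\,\delta n}{\partial t} + \frac{\partial\,\delta n}{\partial x^i}\,\frac{p_i}{a\sqrt{p_kp_k + a^2m^2}} = \frac{\bar{n}'_\nu\!\left(\sqrt{p_kp_k}\right)}{2\sqrt{p_kp_k}}\;\dot{h}_{kl}\,p_kp_l . \tag{61}
$$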

We again write hij as a Fourier integral (8), and seek a solution in the form

$$
\delta n(\mathbf{x},\mathbf{p},t) = \int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\;\delta n(\mathbf{q},\mathbf{p},t). \tag{62}
$$

Then Eq. (61) becomes an ordinary differential equation, with ∂/∂xi replaced with iqi .
Instead of solving this equation numerically with a truncated partial wave expansion, it can
be solved analytically, as a line of sight integral

$$
\delta n(\mathbf{q},\mathbf{p},t) = \frac{\bar{n}'_\nu\!\left(\sqrt{p_kp_k}\right)}{2\sqrt{p_kp_k}}\int_{t_1}^{t} dt'\,\exp\left(-i\,\mathbf{q}\cdot\mathbf{p}\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p_kp_k + a^2(t'')\,m^2}}\right)\dot{h}_{kl}(\mathbf{q},t')\,p_kp_l , \tag{63}
$$

where t1 is any time taken early enough so that δn(x, p, t1 ) is negligible.
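
That the line-of-sight integral solves the Fourier-transformed Boltzmann equation can be checked in one step. The abbreviation $E(t) \equiv \sqrt{p_kp_k + a^2(t)m^2}$ is introduced here only for brevity. Differentiating the integral in Eq. (63) with respect to t gives one contribution from the upper limit, where the exponential equals unity, and one from the t-dependence of the exponent:

$$
\frac{\partial}{\partial t}\,\delta n(\mathbf{q},\mathbf{p},t) = \frac{\bar{n}'_\nu\!\left(\sqrt{p_kp_k}\right)}{2\sqrt{p_kp_k}}\;\dot{h}_{kl}(\mathbf{q},t)\,p_kp_l \;-\; \frac{i\,q_i\,p_i}{a(t)\,E(t)}\;\delta n(\mathbf{q},\mathbf{p},t),
$$

which is Eq. (61) with ∂/∂x^i replaced by i q_i. The lower limit contributes nothing, consistent with δn being negligible at t1.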


For the foreseeable future, the neutrino distribution will be important only in calculating
the neutrino contribution to the energy-momentum tensor. The first-order perturbation to
the contribution of each species of neutrino or antineutrino to the mixed components of the

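energy-momentum tensor takes the simple form

$$
\delta T^{i}_{\nu\,j}(\mathbf{x},t) = \frac{1}{a^4(t)}\int\left(\prod_{k=1}^{3} dp_k\right)\delta n_\nu(\mathbf{x},\mathbf{p},t)\,\frac{p_i\,p_j}{\sqrt{p_kp_k + a^2(t)\,m^2}}, \tag{64}
$$

$$
\delta T^{0}_{\nu\,j}(\mathbf{x},t) = \frac{1}{a^3(t)}\int\left(\prod_{k=1}^{3} dp_k\right)\delta n_\nu(\mathbf{x},\mathbf{p},t)\,p_j , \tag{65}
$$

$$
\delta T^{0}_{\nu\,0}(\mathbf{x},t) = -\frac{1}{a^4(t)}\int\left(\prod_{k=1}^{3} dp_k\right)\delta n_\nu(\mathbf{x},\mathbf{p},t)\,\sqrt{p_kp_k + a^2(t)\,m^2}, \tag{66}
$$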

with all other first-order contributions nicely cancelling. In the tensor mode, for which
qi hij = hii = 0, the only non-vanishing perturbations are to the space-space components
of the energy-momentum tensor, which are needed in calculating the viscous damping of
gravitational waves[4]. These take the form
Z Z ∞
δTνi j (x, t) = d3 q eiq·x 4πp5 n̄′ν (p) dp
0
 
Z t Z t dt′′ ḣij (q, t′ )
× dt′ K qp q  q ,
t1 t′ a(t′′ ) p2 + a2 (t′′ )m2 p2 + a2 (t′ )m2
(67)

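where K(v) ≡ j2 (v)/v². For scalar modes, we get contributions to all components, of the
form

$$
\delta T^{i}_{\nu\,j} = \delta_{ij}\,\delta p_\nu + \partial_i\partial_j\,\pi_\nu , \qquad \delta T^{0}_{\nu\,j} = \frac{4}{3}\,\bar{\rho}_\nu\,\partial_j\,\delta u_\nu , \qquad \delta T^{0}_{\nu\,0} = -\delta\rho_\nu , \tag{68}
$$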
in which δpν , πν , δuν and δρν may be identified as the pressure perturbation, scalar anisotropic
inertia, velocity potential, and energy density perturbation, respectively, for a given neutrino
species. Inserting Eqs. (32) and (33) in Eq. (63) and then using the result in Eqs. (64)–(66)
gives
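$$
\delta p_\nu(\mathbf{x},t) = \frac{1}{2a^4(t)}\int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\beta(\mathbf{q})\int_0^\infty\frac{4\pi p^5\,\bar{n}'_\nu(p)\,dp}{\sqrt{p^2 + a^2(t)\,m^2}}
\int_{t_1}^{t} dt'\,\Bigg[\dot{A}_q(t')\,F_{11}\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)
- q^2\dot{B}_q(t')\,F_{12}\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)\Bigg], \tag{69}
$$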

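$$
\pi_\nu(\mathbf{x},t) = \frac{1}{2a^4(t)}\int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\beta(\mathbf{q})\int_0^\infty\frac{4\pi p^5\,\bar{n}'_\nu(p)\,dp}{\sqrt{p^2 + a^2(t)\,m^2}}
\int_{t_1}^{t} dt'\,\Bigg[q^{-2}\dot{A}_q(t')\,j_2\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)
- \dot{B}_q(t')\,F_5\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)\Bigg], \tag{70}
$$

$$
\frac{4}{3}\,\bar{\rho}_\nu(t)\,\delta u_\nu(\mathbf{x},t) = -\frac{1}{2a^3(t)}\int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\beta(\mathbf{q})\int_0^\infty 4\pi p^4\,\bar{n}'_\nu(p)\,dp
\int_{t_1}^{t} dt'\,\Bigg[q^{-1}\dot{A}_q(t')\,j_1\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)
- q\,\dot{B}_q(t')\,F_8\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)\Bigg], \tag{71}
$$

$$
\delta\rho_\nu(\mathbf{x},t) = \frac{1}{2a^4(t)}\int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\beta(\mathbf{q})\int_0^\infty 4\pi p^3\,\bar{n}'_\nu(p)\,\sqrt{p^2 + a^2(t)\,m^2}\;dp
\int_{t_1}^{t} dt'\,\Bigg[\dot{A}_q(t')\,j_0\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)
- q^2\dot{B}_q(t')\,F_{13}\!\left(qp\int_{t'}^{t}\frac{dt''}{a(t'')\sqrt{p^2 + a^2(t'')\,m^2}}\right)\Bigg], \tag{72}
$$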

where

F11 (v) ≡ j1 (v)/v , F12 (v) = j2 (v)/v 2 − j3 (v)/v , F13 (v) = j1 (v)/v − j2 (v) . (73)

It is only in the case m = 0 that the integrals over neutrino energies can be done separately
from the integrals over time, and give results simply proportional to ρ̄ν .
The contribution of photons to the perturbations in the energy momentum tensor can be
calculated in a similar way, by using Eqs. (64)–(66) with a2 δnii in place of δn (and of course
with m = 0), taking the integral of δnii over photon energies from the line of sight integrals
(28) or (36) for tensor or scalar modes, respectively.
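
The remark about the massless case can be made explicit. For m = 0 the square roots reduce to p, so the argument of every kernel becomes $q\int_{t'}^{t} dt''/a(t'')$, independent of p, and the momentum integral factors out. As a sketch, assuming the unperturbed energy density of one massless species is $\bar{\rho}_\nu = a^{-4}\int_0^\infty 4\pi p^3\,\bar{n}_\nu(p)\,dp$ (the zeroth-order analogue of Eq. (66), not written out in the text), an integration by parts gives

$$
\int_0^\infty 4\pi p^4\,\bar{n}'_\nu(p)\,dp = -4\int_0^\infty 4\pi p^3\,\bar{n}_\nu(p)\,dp = -4\,a^4(t)\,\bar{\rho}_\nu(t),
$$

so that, for example, Eq. (72) reduces to

$$
\delta\rho_\nu(\mathbf{x},t) = -2\,\bar{\rho}_\nu(t)\int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\beta(\mathbf{q})\int_{t_1}^{t} dt'\left[\dot{A}_q(t')\,j_0\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) - q^2\dot{B}_q(t')\,F_{13}\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\right].
$$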

Added Note: After the preprint of this paper was first circulated, I learned that similar
suggestions regarding the calculation of the photon source terms have been made in the

preprint of a paper by D. Baskaran, L. P. Grishchuk, and A. G. Polnarev, gr-qc/0605100.
Their equation (61) is the same integral equation for the source terms of tensor perturbations
to the cosmic microwave background as presented here in equation (29). However, for scalar
modes they do not give an integral equation for the baryonic plasma streaming velocity, and
so instead of the three coupled integral equations found here, they give two coupled integral
equations, in which the plasma velocity as well as the gravitational field perturbations appear
as inputs.

I am grateful to Eiichiro Komatsu for frequent helpful conversations, and to Raphael
Flauger for preparing the figure and pointing out some typographical errors. This material
is based upon work supported by the National Science Foundation under Grant Nos. PHY-
0071512 and PHY-0455649 and with support from The Robert A. Welch Foundation, Grant
No. F-0014.

References

1. P.J.E. Peebles and J. T. Yu, Astrophys. J. 162, 815 (1970); R. A. Sunyaev and Ya.
B. Zel’dovich, Astrophys. Space Sci. 7, 3 (1970).

2. M. L. Wilson and J. Silk, Astrophys. J. 243, 14 (1981); J. R. Bond and G. Efstathiou,
Astrophys. J. 285, L45 (1984); R. Crittenden, J. R. Bond, R. L. Davis, G. Efstathiou,
and P. J. Steinhardt, Phys. Rev. Lett. 71, 324 (1993); C.-P. Ma and E. Bertschinger,
Astrophys. J. 455, 7 (1995).

3. U. Seljak and M. Zaldarriaga, Astrophys. J. 469, 437 (1996).

4. S. Weinberg, Phys. Rev. D 69, 023503 (2004).

5. R. Crittenden et al., ref. [2].

[Figure 1: three panels plotting Ψ(κ, y) against y over 2 ≤ y ≤ 10, for κ = 1/3, κ = 1, and κ = 3.]

Figure 1: Iterative Solution of Equation (29) for the Source Function Ψ. Short dashes
indicate the zeroth iteration (31); longer dashes indicate the first iteration; and further
iterations are indistinguishable from the solid curve, which therefore represents the solution.
Here the source function is calculated for a gravitational perturbation Dq (t) whose value Dqo
before horizon entry is unity; for any other initial condition, Ψ should be multiplied by the
value of Dqo . The quantity κ is the wave number q in units of the wave number that just
comes into the horizon at matter-radiation equality, and y is the Robertson–Walker scale
factor, in units of the scale factor at matter-radiation equality. The calculation was done
by R. Flauger using the electron number density calculated with the program Recfast [S.
Seager, D. D. Sasselov, and D. Scott, Astrophys. J. 523, L1 (1999); Astrophys. J. Suppl.
128, 407 (2000)] and taking ΩM h2 = 0.133, ΩB h2 = 0.02238, h = 0.72, T0 = 2.725K, and
YHe = 0.24.

UTTG-01-07

Tensor Microwave Background Fluctuations for Large Multipole Order
arXiv:astro-ph/0703179v2 11 May 2007

Raphael Flauger* and Steven Weinberg**


Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract

We present approximate formulas for the tensor BB, EE, TT, and TE multipole coefficients
for large multipole order ℓ. The error in using the approximate formula for the BB multipole
coefficients is less than cosmic variance for ℓ > 10. These approximate formulas make various
qualitative properties of the calculated multipole coefficients transparent: specifically, they
show that, whatever values are chosen for cosmological parameters, the tensor EE multipole
coefficients will always be larger than the BB coefficients for all ℓ > 15, and that these
coefficients will approach each other for ℓ ≪ 100. These approximations also make clear how
these multipole coefficients depend on cosmological parameters.

* Electronic address: flauger@physics.utexas.edu
** Electronic address: weinberg@physics.utexas.edu

I. Introduction

Tensor fluctuations are a prime target for future observations of the cosmic microwave
background, because if detected they can provide a conclusive verification of the theory of
inflation and a unique tool for exploring the details of this theory. The contribution of these
fluctuations to the temperature and polarization correlations is well known.
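They have the multipole coefficients:1

$$
C^T_{EE,\ell} = \pi^2T_0^2\int_0^\infty q^2\,dq\,\left|\int_{t_1}^{t_0} dt\,P(t)\,\Psi(q,t)\left\{\left[12 + 8\rho\frac{\partial}{\partial\rho} - \rho^2 + \rho^2\frac{\partial^2}{\partial\rho^2}\right]\frac{j_\ell(\rho)}{\rho^2}\right\}_{\rho=q\,r(t)}\right|^2, \tag{1}
$$

$$
C^T_{BB,\ell} = \pi^2T_0^2\int_0^\infty q^2\,dq\,\left|\int_{t_1}^{t_0} dt\,P(t)\,\Psi(q,t)\left\{\left[8\rho + 2\rho^2\frac{\partial}{\partial\rho}\right]\frac{j_\ell(\rho)}{\rho^2}\right\}_{\rho=q\,r(t)}\right|^2, \tag{2}
$$

$$
C^T_{TE,\ell} = -2\pi^2T_0^2\sqrt{\frac{(\ell+2)!}{(\ell-2)!}}\int_0^\infty q^2\,dq\int_{t_1}^{t_0} dt\,P(t)\,\Psi(q,t)\left\{\left[12 + 8\rho\frac{\partial}{\partial\rho} - \rho^2 + \rho^2\frac{\partial^2}{\partial\rho^2}\right]\frac{j_\ell(\rho)}{\rho^2}\right\}_{\rho=q\,r(t)}
\times\int_{t_1}^{t_0} dt'\,d(q,t')\,\frac{j_\ell\big(q\,r(t')\big)}{q^2r^2(t')}, \tag{3}
$$

$$
C^T_{TT,\ell} = \frac{4\pi^2(\ell+2)!\,T_0^2}{(\ell-2)!}\int_0^\infty q^2\,dq\,\left|\int_{t_1}^{t_0} dt\,d(q,t)\,\frac{j_\ell\big(q\,r(t)\big)}{q^2r^2(t)}\right|^2. \tag{4}
$$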

Our notation here is consistent with that of reference [2]. To be explicit: T0 is the microwave
background temperature at the present time t0 ; $P(t) = \omega_c(t)\exp[-\int_t^{t_0}\omega_c(t')\,dt']$ is the probability
distribution of last scattering, with ωc (t) the photon collision frequency; t1 is any
time taken early enough before recombination so that any photon present at t1 would have
collided many times before the present; $r(t) = \int_t^{t_0} dt'/a(t')$ is the co-moving radial coordinate
1 These formulas are equivalent to those of Zaldarriaga and Seljak [1]. Their gravitational wave amplitude
h and power spectral function Ph (k) are related to our gravitational wave amplitude Dq (t) by h Ph = D/2.
In consequence, their function Ψ Ph is 1/4 times our source function Ψ.

of a source from which light emitted at time t would reach us at the origin at the present time
t0 ; and Ψ(q, t) is the “source function,” which is customarily calculated from a hierarchy of
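equations for partial-wave amplitudes:

$$
\dot{\Delta}^{(T)}_{T,\ell}(q,t) + \frac{q}{a\,(2\ell+1)}\left[(\ell+1)\,\Delta^{(T)}_{T,\ell+1}(q,t) - \ell\,\Delta^{(T)}_{T,\ell-1}(q,t)\right]
= \left[-2\dot{D}_q(t) + \omega_c(t)\,\Psi(q,t)\right]\delta_{\ell,0} - \omega_c(t)\,\Delta^{(T)}_{T,\ell}(q,t), \tag{5}
$$

$$
\dot{\Delta}^{(T)}_{P,\ell}(q,t) + \frac{q}{a\,(2\ell+1)}\left[(\ell+1)\,\Delta^{(T)}_{P,\ell+1}(q,t) - \ell\,\Delta^{(T)}_{P,\ell-1}(q,t)\right]
= -\omega_c(t)\,\Psi(q,t)\,\delta_{\ell,0} - \omega_c(t)\,\Delta^{(T)}_{P,\ell}(q,t), \tag{6}
$$

with

$$
\Psi(q,t) = \frac{1}{10}\Delta^{(T)}_{T,0}(q,t) + \frac{1}{7}\Delta^{(T)}_{T,2}(q,t) + \frac{3}{70}\Delta^{(T)}_{T,4}(q,t) - \frac{3}{5}\Delta^{(T)}_{P,0}(q,t) + \frac{6}{7}\Delta^{(T)}_{P,2}(q,t) - \frac{3}{70}\Delta^{(T)}_{P,4}(q,t). \tag{7}
$$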
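Here Dq (t) is the gravitational wave amplitude (apart from terms that decay outside the
horizon), defined by

$$
\delta g_{ij}(\mathbf{x},t) = \sum_{\pm}\int d^3q\; e^{i\mathbf{q}\cdot\mathbf{x}}\,\beta(\mathbf{q},\pm2)\,e_{ij}(\hat{q},\pm2)\,D_q(t), \tag{8}
$$

with β(q, ±2) and eij (q̂, ±2) the stochastic parameter and polarization tensor for helicity
±2, normalized so that

$$
\langle\beta(\mathbf{q},\lambda)\,\beta^*(\mathbf{q}',\lambda')\rangle = \delta_{\lambda\lambda'}\,\delta^3(\mathbf{q}-\mathbf{q}'), \tag{9}
$$

and for q̂ in the 3-direction

$$
e_{11}(\hat{q},\pm2) = -e_{22}(\hat{q},\pm2) = 1/\sqrt{2}, \qquad e_{12}(\hat{q},\pm2) = e_{21}(\hat{q},\pm2) = \pm i/\sqrt{2}. \tag{10}
$$

Finally, d(q, t) is the quantity

$$
d(q,t) \equiv \exp\left[-\int_t^{t_0} dt'\,\omega_c(t')\right]\left(\dot{D}_q(t) - \frac{1}{2}\,\omega_c(t)\,\Psi(q,t)\right). \tag{11}
$$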
Aside from the treatment of the tensor mode as a first-order perturbation, and the as-
sumption of purely elastic Thomson scattering, Eqs. (1)–(4) may be regarded as exact.

They serve as the basis of computer programs such as CMBfast and CAMB, that are used
to compare observations of microwave background polarization and temperature fluctuations
with models that predict values for the gravitational wave amplitude Dq (t). But they are
not very transparent.
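
One property that is easy to make explicit is the normalization of the last-scattering distribution P(t) defined above. Since the derivative of the exponent $-\int_t^{t_0}\omega_c(t')\,dt'$ with respect to t is $+\omega_c(t)$, P(t) is a total derivative:

$$
P(t) = \frac{d}{dt}\exp\left[-\int_t^{t_0}\omega_c(t')\,dt'\right],
\qquad
\int_{t_1}^{t_0} P(t)\,dt = 1 - \exp\left[-\int_{t_1}^{t_0}\omega_c(t')\,dt'\right] \simeq 1 .
$$

The final step uses the stated condition on t1, that a photon present at t1 would have collided many times before the present, so the exponential is negligible.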
For one thing, as shown in Figure 1, computer calculations using Eqs. (1) and (2) yield
results for $C^T_{EE,\ell}$ and $C^T_{BB,\ell}$ that are of the same order of magnitude, and nearly equal for
ℓ < 100, while $C^T_{EE,\ell} > C^T_{BB,\ell}$ for all ℓ > 15.

Figure 1: Comparison of $C^T_{EE,\ell}$ and $C^T_{BB,\ell}$, in (µK)².

Of course computer calculations can only show this for specific choices of cosmological
parameters. (The cosmological parameters used in Figure 1 are described below.) It would
be impossible to conclude just by inspection of Eqs. (1) and (2) that these are general prop-
erties of the multipole coefficients, independent of the choice of cosmological parameters. In

this paper we present successive approximations that make these properties apparent, and
that, at the cost of only a small additional loss in accuracy, also clarify how the multipole
coefficients depend on various cosmological parameters.

II. The Large-ℓ Approximation

We can approximate Eqs. (1)–(4) by much simpler and more transparent formulas, by
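using an asymptotic formula [3] for the spherical Bessel functions 2 :

$$
j_\ell(\rho) \to \begin{cases} \dfrac{\cos b\,\cos\left[\nu(\tan b - b) - \pi/4\right]}{\nu\sqrt{\sin b}} & \rho > \nu \\[2ex] 0 & \rho < \nu \end{cases} , \tag{12}
$$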

where ν ≡ ℓ + 1/2, and cos b ≡ ν/ρ, with 0 ≤ b ≤ π/2. This approximation is valid for
|ν 2 − ρ2 | ≫ ν 4/3 . Hence for ℓ ≫ 1, this formula can be used over most of the ranges of
integration in Eqs. (1)–(4). Furthermore, for ρ > ν ≫ 1 the phase ν(tan b − b) in Eq. (12)
is a very rapidly increasing function of ρ, so the derivatives in Eqs. (1)–(3) can be taken to
act chiefly on this phase:
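$$
\left[12 + 8\rho\frac{\partial}{\partial\rho} - \rho^2 + \rho^2\frac{\partial^2}{\partial\rho^2}\right]\frac{j_\ell(\rho)}{\rho^2} \to -j_\ell(\rho) + j_\ell''(\rho)
\to -\frac{(1+\sin^2 b)\cos b}{\nu\sqrt{\sin b}}\,\cos\left[\nu(\tan b - b) - \pi/4\right], \tag{13}
$$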
2
After the preprint of this work was first circulated we learned that the same approximation is used
by J. R. Pritchard and M. Kamionkowski, Ann. Phys. 318, 2 (2005) [astro-ph/0412581]. However, after
making this approximation they make further approximations that are quite different from ours, and that
lead to a divergence in the integral over wave number, which must be dealt with by an arbitrary cut-off.
The error introduced by their approximation is comparable to the one introduced by our last approximation
given by Eqs. (23) and (24). Another approximation that consists in averaging over the rapid oscillations in
the square of the Bessel functions leading to results similar to our last approximation was proposed by M.
Zaldarriaga and D. D. Harari, Phys. Rev. D 52, 3276 (1995) [astro-ph/9504085]. An analytic expression
for the contribution of the tensor modes to the temperature multipole coefficients approximately valid for
1 ≪ ℓ < 50 obtained using a similar average was given by A. A. Starobinsky, Sov. Astron. Lett. 11, 133
(1985)

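$$
\left[8\rho + 2\rho^2\frac{\partial}{\partial\rho}\right]\frac{j_\ell(\rho)}{\rho^2} \to 2j_\ell'(\rho) \to -\frac{2\sqrt{\sin b}\,\cos b}{\nu}\,\sin\left[\nu(\tan b - b) - \pi/4\right]. \tag{14}
$$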
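
The quality of the asymptotic formula (12) is easy to probe numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the reference value of the spherical Bessel function is obtained by upward recurrence from j0 and j1, which is numerically stable in the oscillatory region ρ > ℓ considered here.

```python
import math


def spherical_jn_upward(ell, rho):
    """j_ell(rho) by upward recurrence j_{n+1} = (2n+1) j_n / rho - j_{n-1}.

    Stable when rho > ell, i.e. in the region where Eq. (12) is non-zero.
    """
    j_prev = math.sin(rho) / rho
    if ell == 0:
        return j_prev
    j_curr = math.sin(rho) / rho**2 - math.cos(rho) / rho
    for n in range(1, ell):
        j_prev, j_curr = j_curr, (2 * n + 1) / rho * j_curr - j_prev
    return j_curr


def spherical_jn_asymptotic(ell, rho):
    """Large-ell approximation of Eq. (12), with nu = ell + 1/2 and cos b = nu/rho."""
    nu = ell + 0.5
    if rho <= nu:
        return 0.0
    b = math.acos(nu / rho)
    phase = nu * (math.tan(b) - b) - math.pi / 4.0
    return math.cos(b) * math.cos(phase) / (nu * math.sqrt(math.sin(b)))


def envelope(ell, rho):
    """Amplitude cos b / (nu sqrt(sin b)) of the oscillation in Eq. (12)."""
    nu = ell + 0.5
    b = math.acos(nu / rho)
    return math.cos(b) / (nu * math.sqrt(math.sin(b)))
```

For instance, comparing `spherical_jn_upward(50, 100.0)` with `spherical_jn_asymptotic(50, 100.0)` shows a difference that is a small fraction of `envelope(50, 100.0)`, as expected when |ν² − ρ²| ≫ ν^{4/3}.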

Then Eqs. (1)–(4) become, for ν = ℓ + 1/2 ≫ 1,


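$$
\ell(\ell+1)\,C^T_{EE,\ell} \to \pi^2T_0^2\int_0^\infty q^2\,dq\,\left|\int_{r(t)>\nu/q} dt\,P(t)\,\Psi(q,t)\left\{\frac{(1+\sin^2 b)\cos b}{\sqrt{\sin b}}\,\cos\left[\nu(\tan b - b) - \pi/4\right]\right\}_{\cos b=\nu/q\,r(t)}\right|^2, \tag{15}
$$

$$
\ell(\ell+1)\,C^T_{BB,\ell} \to \pi^2T_0^2\int_0^\infty q^2\,dq\,\left|\int_{r(t)>\nu/q} dt\,P(t)\,\Psi(q,t)\left\{2\sqrt{\sin b}\,\cos b\,\sin\left[\nu(\tan b - b) - \pi/4\right]\right\}_{\cos b=\nu/q\,r(t)}\right|^2, \tag{16}
$$

$$
\ell(\ell+1)\,C^T_{TE,\ell} \to -2\pi^2T_0^2\int_0^\infty q^2\,dq\int_{r(t)>\nu/q} dt\,P(t)\,\Psi(q,t)\left\{\frac{(1+\sin^2 b)\cos b}{\sqrt{\sin b}}\,\cos\left[\nu(\tan b - b) - \pi/4\right]\right\}_{\cos b=\nu/q\,r(t)}
\times\int_{r(t')>\nu/q} dt'\,d(q,t')\left\{\frac{\cos^3 b}{\sqrt{\sin b}}\,\cos\left[\nu(\tan b - b) - \pi/4\right]\right\}_{\cos b=\nu/q\,r(t')}, \tag{17}
$$

$$
\ell(\ell+1)\,C^T_{TT,\ell} \to 4\pi^2T_0^2\int_0^\infty q^2\,dq\,\left|\int_{r(t)>\nu/q} dt\,d(q,t)\left\{\frac{\cos^3 b}{\sqrt{\sin b}}\,\cos\left[\nu(\tan b - b) - \pi/4\right]\right\}_{\cos b=\nu/q\,r(t)}\right|^2. \tag{18}
$$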

In evaluating both the exact and approximate expressions, instead of calculating the
source function Ψ(q, t) by truncating the Boltzmann hierarchy (5), (6), we use the integral
equation [2, 4]

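$$
\Psi(q,t) = \frac{3}{2}\int_{t_1}^{t} dt'\,\exp\left[-\int_{t'}^{t}\omega_c(t'')\,dt''\right]
\left[-2\dot{D}_q(t')\,K\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right) + \omega_c(t')\,F\!\left(q\int_{t'}^{t}\frac{dt''}{a(t'')}\right)\Psi(q,t')\right], \tag{19}
$$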

where K(v) and F (v) are the functions

K(v) ≡ j2 (v)/v 2 , F (v) ≡ j0 (v) − 2j1 (v)/v + 2j2 (v)/v 2 . (20)
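
To illustrate how an integral equation of the form (19) can be solved by successive substitution, here is a toy sketch. Everything physical in it is an assumption chosen for simplicity, not the model used in this paper: the scale factor is set to a = 1, the collision frequency ωc is constant, and the driving term Ḋq(t) = −q sin(qt) is arbitrary. The names `jn_over_vn` and `iterate_source` are hypothetical. The integrals are done with the trapezoidal rule, and the zeroth iterate is Eq. (19) with the Ψ term dropped.

```python
import math


def jn_over_vn(n, v, terms=40):
    """j_n(v) / v**n from the power series; regular at v = 0, fine for modest v."""
    double_factorial = 1.0
    for m in range(1, 2 * n + 2, 2):  # builds (2n+1)!!
        double_factorial *= m
    term = 1.0 / double_factorial
    total = term
    for k in range(1, terms):
        term *= -(v * v / 2.0) / (k * (2 * n + 2 * k + 1))
        total += term
    return total


def K(v):
    """K(v) = j2(v)/v^2, Eq. (20)."""
    return jn_over_vn(2, v)


def F(v):
    """F(v) = j0(v) - 2 j1(v)/v + 2 j2(v)/v^2, Eq. (20)."""
    return jn_over_vn(0, v) - 2.0 * jn_over_vn(1, v) + 2.0 * jn_over_vn(2, v)


def iterate_source(q=2.0, omega_c=1.0, t_end=3.0, n=60, tol=1e-10, max_iter=200):
    """Iterate a toy version of Eq. (19) on a uniform grid starting at t1 = 0."""
    h = t_end / n
    t = [i * h for i in range(n + 1)]
    d_dot = [-q * math.sin(q * ti) for ti in t]            # toy driving term
    damp = [math.exp(-omega_c * m * h) for m in range(n + 1)]
    k_lag = [K(q * m * h) for m in range(n + 1)]           # a = 1: argument is q (t - t')
    f_lag = [F(q * m * h) for m in range(n + 1)]

    def apply(psi_old):
        psi_new = [0.0]                                    # empty integration range at t1
        for i in range(1, n + 1):
            s = 0.0
            for j in range(i + 1):
                w = h if 0 < j < i else 0.5 * h            # trapezoidal weights
                m = i - j
                s += w * damp[m] * (-2.0 * d_dot[j] * k_lag[m]
                                    + omega_c * f_lag[m] * psi_old[j])
            psi_new.append(1.5 * s)
        return psi_new

    psi = apply([0.0] * (n + 1))                           # zeroth iterate
    history = []                                           # max change per iteration
    for _ in range(max_iter):
        nxt = apply(psi)
        history.append(max(abs(a - b) for a, b in zip(nxt, psi)))
        psi = nxt
        if history[-1] < tol:
            break
    return t, psi, history
```

Calling `t, psi, history = iterate_source()` returns the grid, the converged toy source function, and the list of successive changes, which can be inspected to see how quickly the iterates settle.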

(This is not an approximation; in principle it should give the same results as the truncated
Boltzmann hierarchy used by CMBfast and CAMB, aside from the supposedly small errors
produced by the truncation. In fact, our method gives results that differ by a few percent
from both CMBfast and CAMB, but CMBfast and CAMB give results for both $C^T_{EE,\ell}$ and
$C^T_{BB,\ell}$ that differ by similar amounts from each other, especially for large ℓ. At this point we
are not able to tell which of the three methods is the most reliable.) The specific cosmological
model chosen for this and all other numerical calculations in this paper is consistent with
current observations: We assume zero spatial curvature and constant vacuum energy, with
density parameters for baryons, cold dark matter, and dark energy given by ΩB = 0.0432,
ΩCDM = 0.213, ΩΛ = 0.743. We take the reduced Hubble constant as h = 0.72 and
the present microwave background temperature as T0 = 2.725K. In calculating the photon
collision frequency, we use the recfast recombination code [8], with helium abundance Y =
0.24. The gravitational field amplitude outside the horizon is taken as

$$
|D_q|^2 = 4.34\times10^{-11}\,q^{-3}
$$

corresponding to nT = 0, A = 0.739, and a tensor/scalar ratio r = 1 in the notation of [9].
(The values of the parameters used are the rounded maximum likelihood values from the
full N-dimensional likelihood analysis and may differ slightly from the marginalized values
quoted in [9]). The gravitational wave amplitude Dq (t) is calculated including the damping
due to neutrino anisotropic inertia, as in [10]; the effects of photon anisotropic inertia are
negligible. Reionization is ignored. To take into account a finite optical depth τ of the
reionized plasma or a different value of r, for ℓ > 10 it is only necessary to multiply the
multipole coefficients given here by r exp(−2τ ). The approximate results obtained in this
way from Eqs. (15)–(18) are compared with the exact formulas (1)–(4) in Figures 2–5.
We can gain further simplicity and transparency in the formulas for the EE and BB multipole
coefficients by using another approximation that actually leads to improved accuracy
for $C^T_{BB,\ell}$. The last-scattering probability distribution P (t) is concentrated around a time
tL , corresponding to a redshift zL ≃ 1090. For any q of the same order of magnitude as
ν/r(tL ), the quantity $b \equiv \cos^{-1}[\nu/q\,r(t)]$ does not vary appreciably for t within the range
in which P (t) is appreciable. Hence we can set r(t) equal to rL ≡ r(tL ) everywhere except in the
phase ν(tan b − b), which for ν ≫ 1 does vary over a wide range in this interval. Furthermore,
because ν(tan b − b) varies over a wide range for ν ≫ 1, the difference between
cos[ν(tan b − b) − π/4] and sin[ν(tan b − b) − π/4] is immaterial, and we can replace both
with cos[ν(tan b − b)]. Making these replacements in Eqs. (15) and (16) gives
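$$
\ell(\ell+1)\,C^T_{EE,\ell} \to \pi^2T_0^2\int_{\nu/r_L}^\infty q^2\,dq\,\left\{(1+\sin^2 b_L)^2\cos^2 b_L\right\}_{\cos b_L=\nu/q\,r_L}
\times\left|\int_{r(t)>\nu/q} dt\,P(t)\,\Psi(q,t)\left\{\frac{\cos\left[\nu(\tan b - b)\right]}{\sqrt{\sin b}}\right\}_{\cos b=\nu/q\,r(t)}\right|^2, \tag{21}
$$

$$
\ell(\ell+1)\,C^T_{BB,\ell} \to \pi^2T_0^2\int_{\nu/r_L}^\infty q^2\,dq\,\left\{4\sin^2 b_L\cos^2 b_L\right\}_{\cos b_L=\nu/q\,r_L}
\times\left|\int_{r(t)>\nu/q} dt\,P(t)\,\Psi(q,t)\left\{\frac{\cos\left[\nu(\tan b - b)\right]}{\sqrt{\sin b}}\right\}_{\cos b=\nu/q\,r(t)}\right|^2. \tag{22}
$$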

(We have not set b = bL in the factors 1/ sin b in both integrals over t, in order to
avoid a divergence in the integration over q at q = ν/rL . This factor does not introduce a
divergence in the integrals over time, because dt ∝ sin b db.)
These approximate formulas are compared with results of the exact formulas (1) and (2)
in Figures 6 and 7. The approximate result (22) for $C^T_{BB,\ell}$ agrees with the exact result (2)
to about 1% for all ℓ > 10, which is better than cosmic variance. The approximate result
(21) for $C^T_{EE,\ell}$ is not quite as accurate; it agrees with the exact result (1) to better than
about 14% for all ℓ > 10. These approximations are evidently accurate enough for us to
draw qualitative conclusions about the EE and BB multipole coefficients.
One immediate consequence is that, since (1 + sin² bL )² ≥ 4 sin² bL for all real bL , we
expect that $C^T_{EE,\ell} \geq C^T_{BB,\ell}$ for all ℓ large enough to justify our approximations. Also, since
Ψ(q, tL ) falls off for wave lengths that come into the horizon before matter-radiation equality,
we expect that for relatively small ℓ (say, ℓ < 100) the integrals over q are dominated by
values for which cos bL is small, so that (1 + sin² bL )² ≃ 4 sin² bL , and hence $C^T_{EE,\ell} \simeq C^T_{BB,\ell}$
for such ℓ. As mentioned in Section I, and shown in Figure 1, both properties are observed
in the output of numerical calculations based on the accurate formulas (1) and (2).
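
The inequality behind this comparison follows from a one-line identity. Writing $s \equiv \sin^2 b_L$ (a shorthand introduced here), the difference of the two angular factors in Eqs. (21) and (22) is

$$
(1+s)^2 - 4s = (1-s)^2 = \cos^4 b_L \;\geq\; 0 ,
\qquad\text{so}\qquad
\left[(1+\sin^2 b_L)^2 - 4\sin^2 b_L\right]\cos^2 b_L = \cos^6 b_L .
$$

Since the squared time integrals in Eqs. (21) and (22) are identical, the EE and BB integrands differ only by this non-negative factor $\cos^6 b_L = (\nu/q\,r_L)^6$, which is negligible wherever cos bL is small.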

III. Parameter-Dependence of the EE and BB Correlations

With one further approximation, we can find reasonably accurate formulas for $C^T_{EE,\ell}$ and
$C^T_{BB,\ell}$ that reveal the way that these coefficients depend on various cosmological parameters.
We write the squared time integrals in Eqs. (21) and (22) as double integrals over times t
and t′ , and write

$$
\cos\left[\nu(\tan b - b)\right]\cos\left[\nu(\tan b' - b')\right] = \frac{1}{2}\Big[\cos\left[\nu(\tan b - b) - \nu(\tan b' - b')\right] + \cos\left[\nu(\tan b - b) + \nu(\tan b' - b')\right]\Big],
$$

where cos b = ν/q r(t) and cos b′ = ν/q r(t′ ). For ν ≫ 1, and q r(t) and q r(t′ ) both of
order ν, the second term on the right oscillates very rapidly, and hence may be neglected in
the integral over t and t′ . On the other hand, because P (t) and P (t′ ) are sharply peaked
around the same time tL , the argument of the first cosine on the right is small where P (t)
and P (t′ ) are appreciable, so this cosine may be replaced with unity. Then (now dropping
the distinction between ν and ℓ), Eqs. (21) and (22) become

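$$
\ell(\ell+1)\,C^T_{EE,\ell} \to \frac{\pi^2T_0^2}{2}\int_{\ell/r_L}^\infty q^2\,dq\,\left\{(1+\sin^2 b_L)^2\cos^2 b_L\right\}_{\cos b_L=\ell/q\,r_L}
\times\left|\int_{r(t)>\ell/q} dt\,P(t)\,\Psi(q,t)\left(1 - \frac{\ell^2}{q^2r^2(t)}\right)^{-1/4}\right|^2, \tag{23}
$$

$$
\ell(\ell+1)\,C^T_{BB,\ell} \to \frac{\pi^2T_0^2}{2}\int_{\ell/r_L}^\infty q^2\,dq\,\left\{4\sin^2 b_L\cos^2 b_L\right\}_{\cos b_L=\ell/q\,r_L}
\times\left|\int_{r(t)>\ell/q} dt\,P(t)\,\Psi(q,t)\left(1 - \frac{\ell^2}{q^2r^2(t)}\right)^{-1/4}\right|^2. \tag{24}
$$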

This approximation is compared with the results of the exact formulas (1) and (2) in
Figures 8 and 9. As shown there, the fractional error here is less than about 20% for
10 < ℓ < 600, but it becomes larger for larger values of ℓ, where the multipole coefficients
become quite small.
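
The two new ingredients of Eqs. (23) and (24) can be traced in a couple of lines. With cos b = ℓ/q r(t), the factor 1/√(sin b) in Eqs. (21) and (22) becomes

$$
\frac{1}{\sqrt{\sin b}} = \left(1 - \cos^2 b\right)^{-1/4} = \left(1 - \frac{\ell^2}{q^2r^2(t)}\right)^{-1/4},
$$

while the overall factor 1/2 comes from the product of cosines. Writing $f(t) \equiv P(t)\,\Psi(q,t)/\sqrt{\sin b}$ and $\varphi \equiv \nu(\tan b - b)$ as shorthand introduced here, dropping the rapidly oscillating term and setting the slowly varying cosine to unity gives

$$
\left|\int dt\, f(t)\cos\varphi(t)\right|^2 = \int dt\int dt'\, f(t)\,f(t')\,\cos\varphi(t)\cos\varphi(t') \;\to\; \frac{1}{2}\left|\int dt\, f(t)\right|^2 .
$$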
Eqs. (23) and (24) are useful in revealing the parameter dependence of these multipole
coefficients. Where the last-scattering probability distribution P (t) is appreciable, the only
cosmological parameters on which either P (t) or the source function Ψ(q, t) depend are the
baryonic and matter density parameters ΩB h2 and ΩM h2 , as well as the present microwave
background temperature T0 . All dependence of the multipole coefficients on H0 or the
curvature ΩK h2 or the vacuum energy ΩΛ h2 is contained in the function r(t). But Eqs. (23)
and (24) show that r(t) and ℓ enter in the multipole coefficients only in the combination
r(t)/ℓ. Hence, with ΩB h² , ΩM h² , and T0 fixed, to a good approximation $C^T_{EE,\ell}$ and $C^T_{BB,\ell}$
depend on H0 , ΩK h² , and ΩΛ h² only through their effect on the scale of the ℓ-dependence
of $C^T_{EE,\ell}$ and $C^T_{BB,\ell}$. Furthermore, since P (t) is sharply peaked at the time of last scattering,
just as for scalar modes there is a high degree of degeneracy here: for ℓ > 10 the coefficients
$C^T_{EE,\ell}$ and $C^T_{BB,\ell}$ depend on H0 , ΩK h² , and ΩΛ h² only through a single parameter, the radius
r(tL ) of the surface of last scattering. Of course, the degeneracy here is not as important
as it is for scalar modes, because tensor modes when discovered will be studied primarily
for the purpose of measuring the tensor/scalar ratio r and the tensor slope nT , rather than
other cosmological parameters.

We are grateful to Eiichiro Komatsu for frequent helpful conversations. This material
is based upon work supported by the National Science Foundation under Grant No. PHY-
0455649 and with support from The Robert A. Welch Foundation, Grant No. F-0014.

References

1. M. Zaldarriaga and U. Seljak, Phys. Rev. D 55, 1830 (1997) [astro-ph/9609170]. Also
see U. Seljak and M. Zaldarriaga, Phys. Rev. Lett. 78, 2054 (1997) [astro-ph/9609169].
An equivalent analysis is presented by M. Kamionkowski, A. Kosowsky, and A. Stebbins,
Phys. Rev. Lett. 78, 2058 (1997) [astro-ph/9609132]; Phys. Rev. D. 55, 7368 (1997)
[astro-ph/9611125].

2. S. Weinberg, Phys. Rev. D 74, 063517 (2006) [astro-ph/0607076].

3. See, e. g., I. S. Gradshteyn & I. M. Ryzhik, Table of Integrals, Series, and Products, trans-
lated, corrected and enlarged by A. Jeffrey (Academic Press, New York, 1980): formula
8.453.1.

4. D. Baskaran, L. P. Grishchuk, and A. G. Polnarev, Phys. Rev. D 74, 083008 (2006)
[gr-qc/0605100].

5. J. R. Pritchard & M. Kamionkowski, Ann. Phys. 318, 2-36 (2005) [astro-ph/0412581].

6. M. Zaldarriaga & D. D. Harari, Phys. Rev. D52, 3276-3287 (1995) [astro-ph/9504085].

7. A. A. Starobinsky, Sov. Astron. Lett. 11, 133 (1985).

8. S. Seager, D. D. Sasselov & D. Scott, Astrophys. J. 523, L1 (1999) [astro-ph/9909275], and
in more detail, in S. Seager, D. D. Sasselov & D. Scott, Astrophys. J. Suppl. 128, 407 (2000)
[astro-ph/9912182].

9. D. N. Spergel et al., Astrophys. J. Suppl. 148, 175 (2003) [astro-ph/0302209].

10. S. Weinberg, Phys. Rev. D 69, 023503 (2004) [astro-ph/0306304].

11
T
Figure 2: Comparison of formulas for CEE,ℓ . The solid line is the result of using the exact
expression (1); the dashed line is the result of using the approximation (15). Figures 2–5 show
the degree of accuracy of the large-ℓ approximation by itself, without further approximations.
In this and all other figures, all calculations are done using the cosmological parameters given
in Section II, and the units of the vertical axis are square microKelvins.

12
T
Figure 3: Comparison of formulas for CBB,ℓ . The solid line is the result of using the exact
expression (2); the dashed line is the result of using the approximation (16).

13
Figure 4: Comparison of formulas for CTTE,ℓ. The solid line is the result of using the exact
expression (3); the dashed line is the result of using the approximation (17).

Figure 5: Comparison of formulas for $C^T_{TT,\ell}$. The solid line is the result of using the exact
expression (4); the dashed line is the result of using the approximation (18).

Figure 6: Comparison of formulas for $C^T_{EE,\ell}$. The solid line is the result of using the exact
expression (1); the dashed line is the result of using the approximation (21). Figures 6 and 7
show the degree of accuracy of the combined approximations that we use to show analytically
that $C^T_{EE,\ell} > C^T_{BB,\ell}$.

Figure 7: Comparison of formulas for $C^T_{BB,\ell}$. The solid line is the result of using the exact
expression (2); the dashed line is the result of using the approximation (22). This is our best
approximation for $C^T_{BB,\ell}$.

Figure 8: Comparison of formulas for $C^T_{EE,\ell}$. The solid line is the result of using the exact
expression (1); the dashed line is the result of using the approximation (23). This figure
shows the degree of accuracy of the further approximations used to explore the
parameter-dependence of $C^T_{EE,\ell}$.

Figure 9: Comparison of formulas for $C^T_{BB,\ell}$. The solid line is the result of using the exact
expression (2); the dashed line is the result of using the approximation (24). This figure
shows the degree of accuracy of the further approximations used to explore the
parameter-dependence of $C^T_{BB,\ell}$.

UTTG-29-97

Non-Renormalization Theorems in
Non-Renormalizable Theories
arXiv:hep-th/9803099v1 12 Mar 1998

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract
A perturbative non-renormalization theorem is presented that applies to general
supersymmetric theories, including non-renormalizable theories in which the
$\int d^2\theta$ integrand is an arbitrary gauge-invariant function $F(\Phi, W)$ of the
chiral superfields $\Phi$ and gauge field-strength superfields $W$, and the
$\int d^4\theta$-integrand is restricted only by gauge invariance. In the Wilsonian
Lagrangian, $F(\Phi, W)$ is unrenormalized except for the one-loop renormalization
of the gauge coupling parameter, and Fayet–Iliopoulos terms can be
renormalized only by one-loop graphs, which cancel if the sum of the
U(1) charges of the chiral superfields vanishes. One consequence of
this theorem is that in non-renormalizable as well as renormalizable
theories, in the absence of Fayet–Iliopoulos terms supersymmetry
will be unbroken to all orders if the bare superpotential has a
stationary point.


Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY
9511632. E-mail address: weinberg@physics.utexas.edu

The remarkable absence of various radiative corrections in supersymmetric
theories was first shown using supergraph techniques.¹ Later Seiberg
introduced a simple and powerful new approach to this problem,² and used
it to prove non-renormalization theorems in various special cases. This
paper will use a generalized version of the Seiberg approach to give a proof of
the perturbative non-renormalization theorems that applies not only to the
usual renormalizable theories but also to general non-renormalizable
theories. These have a much richer set of couplings, involving terms of arbitrary
order in the gauge superfield, that are shown to be unrenormalized.

This is of some importance in model building. As Witten³ pointed out
long ago, it is the perturbative non-renormalization theorems that, by
limiting the radiative corrections responsible for supersymmetry breaking to
exponentially small non-perturbative terms, offer a hope that supersymmetry
might solve the hierarchy problem. But the theory with which we have to
deal below the Planck or grand-unification scales is surely an effective
quantum field theory, which contains non-renormalizable as well as renormalizable
terms. Thus, in relying on supersymmetry to solve the hierarchy problem,
we had better make sure that the non-renormalization theorems apply to
non-renormalizable as well as to renormalizable theories.


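We consider a general supersymmetric theory involving left-chiral superfields
$\Phi_n$, their right-chiral adjoints $\Phi_n^*$, the matrix gauge superfield $V$, and
their derivatives. The general supersymmetric action has a Lagrangian density
of the form
$$
\mathcal{L} = \int d^2\theta_L\, d^2\theta_R \left[ \Phi^\dagger e^{-V} \Phi
  + G\left(\Phi, \Phi^\dagger, V, D\cdots\right) \right]
  + 2\,\mathrm{Re} \int d^2\theta_L \left[ \frac{\tau}{8\pi i} \sum_{\alpha\beta}
  \epsilon_{\alpha\beta}\, \mathrm{Tr}\, W_\alpha W_\beta + F\left(\Phi, W\right) \right] , \tag{1}
$$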

where $G$ and $F$ are general gauge-invariant functions of the arguments shown;
‘$D\cdots$’ denotes a dependence of $G$ on superderivatives (or spacetime
derivatives) of the other arguments; $W_\alpha$ is the usual gauge-covariant matrix
left-chiral gauge superfield formed from $V$; $\alpha$ and $\beta$ are two-component spinor
indices with $\epsilon_{\alpha\beta}$ antisymmetric; and $\tau$ is the usual complex gauge coupling
parameter
$$
\tau = \frac{4\pi i}{g^2} + \frac{\theta}{2\pi} . \tag{2}
$$
The function $F(\Phi, W)$ must be holomorphic in the left-chiral superfields $\Phi_n$
and $W$, and may not depend on their superderivatives or spacetime derivatives
because, as is well known, any term in $F(\Phi, W)$ that did depend on these
derivatives could be replaced with a gauge-invariant contribution to $G$. The
terms $(\Phi^\dagger e^{-V} \Phi)$ and $(\tau/8\pi i)\,\mathrm{Tr}\,(W^T \epsilon W)$ in Eq. (1) could have been included
in $G$ and $F$, respectively; they are displayed here to identify the zeroth-order
kinematic terms that serve as a starting point for perturbation theory.

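The first perturbative non-renormalization theorem to be proved here
states that the ‘Wilsonian’ effective Lagrangian density $\mathcal{L}_\lambda$ (which, with an
ultraviolet cut-off $\lambda$, yields the same results as the original Lagrangian density
(1)) takes the form
$$
\mathcal{L}_\lambda = \int d^2\theta_L\, d^2\theta_R \left[ \Phi^\dagger e^{-V} \Phi
  + G_\lambda\left(\Phi, \Phi^\dagger, V, D\cdots\right) \right]
  + 2\,\mathrm{Re} \int d^2\theta_L \left[ \frac{\tau_\lambda}{8\pi i} \sum_{\alpha\beta}
  \epsilon_{\alpha\beta}\, \mathrm{Tr}\, W_\alpha W_\beta + F\left(\Phi, W\right) \right] , \tag{3}
$$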
where $G_\lambda$ is some new function of the displayed variables, and $\tau_\lambda$ is a one-loop
renormalized coupling parameter. For instance, for a simple gauge group we
have
$$
\tau_\lambda = \tau + i \left( \frac{3C_1 - C_2}{2\pi} \right) \ln(\lambda/\Lambda) . \tag{4}
$$
Here $\Lambda$ is an integration constant, and $C_1$ and $C_2$ are the Casimir constants
of the gauge and left-chiral superfields, defined by
$$
\sum_{CD} C_{ACD}\, C_{BCD} = C_1 \delta_{AB} , \qquad \mathrm{Tr}\,\{t_A t_B\} = C_2 \delta_{AB} , \tag{5}
$$
where $C_{ABC}$ are the structure constants, and $t_A$ are the matrices representing
the gauge algebra on the left-chiral superfields. Not only is the superpotential
$F(\Phi, 0)$ not renormalized; the whole $W$-dependent integrand of the
$\int d^2\theta_L$ integral is not renormalized, except for a one-loop renormalization of
the gauge coupling constant.
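
As a rough numerical illustration of Eqs. (2) and (4), the sketch below evaluates the one-loop shift of $\tau$ and converts it back to a gauge coupling. It assumes the one-loop term is normalized as $i(3C_1 - C_2)/2\pi$ (the normalization that follows from the standard one-loop beta function); the function names and the input values ($C_1 = 3$, $C_2 = 0$, $\lambda/\Lambda = 10$) are hypothetical choices made only for this example.

```python
import math

def tau_bare(g, theta):
    """Eq. (2): tau = 4*pi*i/g**2 + theta/(2*pi)."""
    return complex(theta / (2 * math.pi), 4 * math.pi / g**2)

def tau_wilsonian(tau, c1, c2, lam, Lam):
    """Eq. (4), assuming a 1/(2*pi) normalisation of the one-loop term."""
    return tau + 1j * (3 * c1 - c2) / (2 * math.pi) * math.log(lam / Lam)

def coupling_from_tau(tau):
    """Invert Im(tau) = 4*pi/g**2; only meaningful while Im(tau) > 0."""
    return math.sqrt(4 * math.pi / tau.imag)

# Hypothetical inputs: pure gauge theory with C1 = 3, C2 = 0, lambda/Lambda = 10.
tau0 = tau_bare(g=1.0, theta=0.3)
tau1 = tau_wilsonian(tau0, c1=3, c2=0, lam=10.0, Lam=1.0)
print(tau0, tau1, coupling_from_tau(tau1))
```

Only the imaginary part of $\tau$ moves: the vacuum angle $\theta$ is untouched by the one-loop term, which is the content of the shift symmetry used later in the proof.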


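Here is the proof. Assuming that the cut-off respects supersymmetry
and gauge invariance,⁴ these symmetries require the Wilsonian effective
Lagrangian to take the same general form as Eq. (1):
$$
\mathcal{L}_\lambda = \int d^2\theta_L\, d^2\theta_R \left[ \Phi^\dagger e^{-V} \Phi
  + G_\lambda\left(\Phi, \Phi^\dagger, V, D\cdots\right) \right]
  + 2\,\mathrm{Re} \int d^2\theta_L \left[ \frac{\tau}{8\pi i} \sum_{\alpha\beta}
  \epsilon_{\alpha\beta}\, \mathrm{Tr}\, W_\alpha W_\beta + F_\lambda\left(\Phi, W\right) \right] , \tag{6}
$$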

where $G_\lambda$ and $F_\lambda$ are again general gauge-invariant functions of the arguments
shown, with $G_\lambda$ Hermitian. (Since the functions $F_\lambda$ and $G_\lambda$ are not yet
otherwise restricted, there is no loss of generality in extracting the terms
shown explicitly in Eq. (6) from them.) Following Seiberg,² we regard $\tau$ and
the coefficients $f_r$ of the various terms in $F(\Phi, W)$ as independent external
left-chiral superfields that happen to have constant scalar components and
no spinor or auxiliary components, and that should also appear among the
arguments of $F_\lambda$ and (along with their adjoints) in $G_\lambda$.


To deal with the non-renormalizable part of the $\int d^4\theta$ integral, we now also
regard the coefficients of the various terms in the real function $G(\Phi, \Phi^\dagger, V, D\cdots)$
as real external superfields, which also have only constant scalar components.
It is tempting to say that because these real superfields are non-chiral, they
cannot appear in the integrand of the $\int d^2\theta_L$ integral in $\mathcal{L}_\lambda$, so that $F_\lambda$ may
be analyzed as if $G$ were not present. This would be too hasty, because any
real superfield $P$ with a positive scalar component may be expressed in terms
of a left-chiral superfield $Z$, as
$$
P = Z^* Z \exp(V_P) , \tag{7}
$$
where $V_P$ has the form of a U(1) gauge superfield in any fixed gauge. (Note
that $\ln P \to \ln P - \ln Z - \ln Z^*$ is a generalized gauge transformation.) Eq. (7)
is invariant under a phase transformation $Z \to Z e^{i\alpha}$, which if unbroken would
prevent the left-chiral superfield $Z$ from appearing in $F_\lambda$. This symmetry is
actually violated by non-perturbative effects. For instance, if we modify the
usual renormalizable kinematic term for a multiplet of left-chiral superfields
to read $\int d^2\theta_L\, d^2\theta_R\, Z^* Z\, (\Phi^* e^{-V} \Phi)$, then since this depends only on $Z\Phi$, the
transformation $Z \to Z e^{i\alpha}$, $\Phi \to \Phi$ has the same anomaly as the
transformation $\Phi \to \Phi e^{i\alpha}$, $Z \to Z$. This anomaly leads to a breakdown of this
symmetry, which allows $Z$ to appear in non-perturbative corrections to the
superpotential. Indeed, if it were not for this breakdown of the symmetry
under $Z \to Z e^{i\alpha}$, there could be no non-perturbative corrections to the
superpotential, because the kinematic term is invariant under a non-anomalous
transformation $Z \to Z e^{i\beta}$, $\Phi \to \Phi e^{-i\beta}$, which would prevent the generation
of a non-perturbative term in the superpotential that depends on $\Phi$ but not
$Z$. Here we are considering only perturbation theory, so $Z$ cannot appear in
$F_\lambda$, and by the same reasoning neither can any of the real superfields that
appear as coefficients of the terms in $G$. Thus $G$ can have no effect on $F_\lambda$.

With $G$ ignored, the perturbation theory based on the Lagrangian density
(1) has two symmetries that restrict the dependence of $F_\lambda$ on $\tau$ and on
the parameters $f_r$ in $F(\Phi, W)$. The first symmetry is conservation of an $R$
quantum number: $\theta_L$ and $\theta_R$ have $R = +1$ and $R = -1$; $\tau$ and the $\Phi_m$ and
$V$ have $R = 0$; and the coefficients $f_r$ of all terms in $F(\Phi, W)$ with $r$ factors
of $W_\alpha$ have $R = 2 - r$. (Since $W_\alpha$ involves two superderivatives of $V$ with
respect to $\theta_R$ and one with respect to $\theta_L$, it has $R = +1$. Also, in accordance
with the usual rules for integration over Grassmann parameters, integration of
a function over $\theta_L$ lowers its $R$ value by two units.) This symmetry requires
the function $F_\lambda(\Phi, W, f, \tau)$ to have $R = 2$, like $F(\Phi, W)$. The other symmetry
is invariance under translation of $\tau$ by an arbitrary real number $\xi$:
$$
\tau \to \tau + \xi , \tag{8}
$$
which leaves the action invariant because
$\mathrm{Im} \int d^2\theta_L \sum_{\alpha\beta} \epsilon_{\alpha\beta}\, \mathrm{Tr}\, W_\alpha W_\beta$ is a
spacetime derivative. This tells us that $F_\lambda$ is independent of $\tau$, except for a
possible term proportional to the $WW$ term in $F$:
$$
F_\lambda(\Phi, W, f, \tau) = c_\lambda \tau \sum_{\alpha\beta} \epsilon_{\alpha\beta}\, \mathrm{Tr}\, W_\alpha W_\beta + H_\lambda(\Phi, W, f) , \tag{9}
$$
with $c_\lambda$ a real function of $\lambda$.


To use this information, let us consider how many powers of $\tau$ are
contributed to $F_\lambda$ by each graph. Suppose a graph has $E_V$ external left-handed
gaugino lines and any number of external $\Phi$-lines; $I_V$ internal $V$-lines and
any number of internal $\Phi$-lines; $A_m$ pure gauge vertices with $m \ge 3$ $V$-lines,
arising from the $WW$ term in Eq. (1); $B_{mr}$ vertices with $m \ge r$ $V$-lines and
any number of $\Phi$-lines, arising from the terms in $F(\Phi, W)$ with $r$ factors of
$W$; and $C_m$ vertices with two $\Phi$-lines and $m \ge 1$ $V$-lines, arising from the
$\Phi^\dagger e^{-V} \Phi$ term in Eq. (1). (By ‘$\Phi$-lines’ and ‘$V$-lines’ are meant lines of the
component fields of left-chiral or gauge superfields, respectively; these are
ordinary Feynman graphs, not supergraphs.) These numbers are related by
$$
2 I_V + E_V = \sum_{m \ge 3} m A_m + \sum_r \sum_{m \ge r} m B_{mr} + \sum_{m \ge 1} m C_m . \tag{10}
$$

Also, since we have specified that all external $V$ lines are for gauginos, this
graph can only contribute to a term in $\mathcal{L}_\lambda$ with $E_V$ factors of $W$, so it must
be proportional to a product of $f_r$ factors with total $R$-value $2 - E_V$, and so
$$
\sum_r \sum_{m \ge r} (2 - r) B_{mr} = 2 - E_V . \tag{11}
$$

Using this to eliminate $E_V$ in Eq. (10), the number of factors of $\tau$ contributed
by such a graph is then
$$
N_\tau = \sum_{m \ge 3} A_m - I_V
       = 1 - \frac{1}{2} \left[ \sum_{m \ge 3} (m - 2) A_m + \sum_r \sum_{m \ge r} (2 - r + m) B_{mr}
         + \sum_{m \ge 1} m\, C_m \right] . \tag{12}
$$
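
For readers who want the intermediate algebra, the elimination of $E_V$ can be spelled out as below. This is only a restatement of Eqs. (10) and (11); it takes as given the counting in the first equality of Eq. (12), in which each $A_m$ vertex supplies one factor of $\tau$ and each internal $V$-line one factor of $1/\tau$.

```latex
\begin{align*}
I_V &= \tfrac12\Big[\sum_{m\ge3} mA_m + \sum_r\sum_{m\ge r} mB_{mr}
       + \sum_{m\ge1} mC_m - E_V\Big] && \text{from Eq. (10)} \\
E_V &= 2 - \sum_r\sum_{m\ge r}(2-r)B_{mr} && \text{from Eq. (11)} \\
N_\tau &= \sum_{m\ge3}A_m - I_V \\
       &= 1 - \tfrac12\Big[\sum_{m\ge3}(m-2)A_m
          + \sum_r\sum_{m\ge r}(2-r+m)B_{mr} + \sum_{m\ge1} mC_m\Big] .
\end{align*}
```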

Each of the $A$s, $B$s, and $C$s in the square brackets in Eq. (12) has a
positive-definite coefficient, so there is a limit to the number of vertices of each type
that can contribute to the $\tau$-independent function $H_\lambda$. To have $N_\tau = 0$,
we can have $A_3 = 2$ and all other $A$s, $B$s, and $C$s zero, or $A_4 = 1$ and
all other $A$s, $B$s, and $C$s zero, which give the one-gauge-loop contributions
proportional to the Casimir constant $C_1$ in Eq. (4); or $B_{mr} = 1$ for some $r$ and $m = r$, and all
other $A$s, $B$s, and $C$s zero, which add up to the one-vertex tree contribution
$F(\Phi, W)$ to the function $H_\lambda(\Phi, W)$ in Eq. (9); or (for the vertex numbers $C_m$ of
Eq. (10)) $C_1 = 2$ and all other $A$s, $B$s, and $C$s zero, or $C_2 = 1$ and all other $A$s,
$B$s, and $C$s zero, which give the one-$\Phi$-loop contributions proportional to the
Casimir constant $C_2$ in Eq. (4). (Graphs with
$A_3 = C_1 = 1$ and all other $A$s, $B$s, and $C$s zero are one-particle-reducible,
and therefore do not contribute to $\mathcal{L}_\lambda$.) Finally, Eq. (12) shows that there
are no graphs at all with $N_\tau = 1$, so the constant $c_\lambda$ in Eq. (9) vanishes,
completing the proof of the non-renormalization theorem (3).
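
The case analysis above can be reproduced by brute force. The sketch below enumerates all sets of vertex numbers for which the bracket in Eq. (12) equals $2(1 - N_\tau)$. The truncation `m_max` and the string labels for vertex types are conveniences of this example, not part of the argument; since every coefficient in the bracket is at least 1, the truncation does not lose any solution built from low-order vertices.

```python
from itertools import combinations_with_replacement

def vertex_weights(m_max):
    """Coefficient of each vertex number in the square bracket of Eq. (12)."""
    weights = {}
    for m in range(3, m_max + 1):
        weights[f"A{m}"] = m - 2                 # pure gauge vertices
    for r in range(0, m_max + 1):
        for m in range(r, m_max + 1):
            weights[f"B{m},{r}"] = 2 - r + m     # vertices from F(Phi, W)
    for m in range(1, m_max + 1):
        weights[f"C{m}"] = m                     # vertices from Phi^dagger e^{-V} Phi
    return weights

def graphs_with_n_tau(n_tau, m_max=6):
    """Vertex multisets whose total weight gives N_tau = 1 - weight/2."""
    target = 2 * (1 - n_tau)
    weights = vertex_weights(m_max)
    names = sorted(weights)
    found = []
    # Every weight is >= 1, so no more than `target` vertices can appear.
    for k in range(1, target + 1):
        for combo in combinations_with_replacement(names, k):
            if sum(weights[v] for v in combo) == target:
                found.append(combo)
    return found

for combo in graphs_with_n_tau(0):
    print(combo)
```

With these weights a single $C_2$ vertex already saturates $N_\tau = 0$, the only two-vertex solutions are $(A_3, A_3)$, $(A_3, C_1)$ and $(C_1, C_1)$, and the list for $N_\tau = 1$ is empty.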

In theories where the gauge group has a U(1) factor, there is also one
term in $G_\lambda$ which is subject to a non-renormalization theorem. As pointed
out by Fayet and Iliopoulos,⁵ although a U(1) gauge superfield $V_1$ is not
gauge invariant, $\int d^4x \int d^4\theta\, V_1$ is gauge invariant as well as supersymmetric,
so we can include a term $\xi V_1$ in $G$. By detailed calculation, Fischler et al.⁶
showed that the constant $\xi$ receives corrections for renormalizable theories
only from one-loop diagrams, and that these corrections vanish if the sum
of the U(1) charges of the left-chiral superfields vanishes. Using the Seiberg
trick of regarding coupling parameters as the scalar components of external
superfields, it is easy to give a very simple proof⁷ of this result, which applies
also in non-renormalizable theories, and even non-perturbatively. The point
is that a term $\int d^4x \int d^4\theta\, S\, V_1$ in $G_\lambda$ is not gauge-invariant if $S$ depends in
a non-trivial way on any superfields, including the external superfields $\tau$ or
$f_r$ or those appearing as coefficients of the non-renormalizable terms in $G$.
There is just one graph that can make a correction to $\xi$ that is independent of
all coupling constants: it is the one-loop tadpole graph, in which an external
$V_1$ line interacts with left-chiral superfields through the term $(\Phi^\dagger \exp(-V) \Phi)$
in Eq. (1). This graph vanishes if the sum of the U(1) charges of the
left-chiral superfields vanishes. This condition is necessary (unless the U(1) gauge
symmetry is spontaneously broken) to avoid gravitational anomalies⁸ that
would violate the conservation of the U(1) current.

What good are the non-renormalization theorems, when so little is known
about the structure of the function $G_\lambda(\Phi, \Phi^\dagger, V, D\cdots)$? Fortunately, in the
absence of Fayet–Iliopoulos terms, it turns out that only the bare superpotential
matters in deciding if supersymmetry is spontaneously broken: if the
superpotential $F(\Phi, 0)$ allows solutions of the equations $\partial F(\Phi, 0)/\partial\Phi_n = 0$
then supersymmetry is not broken in any finite order of perturbation theory.

To test this, we must examine Lorentz-invariant field configurations, in
which the $\Phi_n$ have only constant scalar components $\phi_n$ and constant
auxiliary components $F_n$, while (in Wess–Zumino gauge) the coefficients
$V_A$ of the gauge generators $t_A$ in the matrix gauge superfield $V$ have only
auxiliary components $D_A$. Supersymmetry is unbroken if there are values of
$\phi_n$ for which $\mathcal{L}_\lambda$ has no terms of first order in $F_n$ or $D_A$, in which case there
is sure to be an equilibrium solution with $F_n = D_A = 0$. In the absence of
Fayet–Iliopoulos terms, this requires that for all $A$
$$
\sum_{nm} \frac{\partial K_\lambda(\phi, \phi^*)}{\partial \phi_n^*}\, (t_A)_{mn}\, \phi_m^* = 0 , \tag{13}
$$
and for all $n$
$$
\frac{\partial F(\phi, 0)}{\partial \phi_n} = 0 , \tag{14}
$$
where the effective Kähler potential $K_\lambda(\phi, \phi^*)$ is
$$
K_\lambda(\phi, \phi^*) = \phi^\dagger \phi + G_\lambda(\phi, \phi^*, 0, 0 \cdots) \tag{15}
$$
with $G_\lambda(\phi, \phi^*, 0, 0 \cdots)$ obtained from $G_\lambda$ by setting the gauge superfield and
all superderivatives equal to zero. (With superderivatives required to vanish
by Lorentz invariance, the only dependence of $G_\lambda$ on $V$ is a factor $\exp(-V)$
following every factor $\Phi^\dagger$.) If there is any solution $\phi^{(0)}$ of Eq. (14), then the
gauge symmetry tells us that there is a continuum of such solutions, with $\phi_n$
replaced with
$$
\phi_n(\xi) = \sum_m \left[ \exp\Big( i \sum_A t_A \xi_A \Big) \right]_{nm} \phi^{(0)}_m \tag{16}
$$

where (since $F$ depends only on $\phi$, not $\phi^*$), the $\xi_A$ are an arbitrary set of
complex parameters. If $K_\lambda(\phi, \phi^*)$ has a stationary point anywhere on the
surface $\phi = \phi(\xi)$, then at that point
$$
0 = \sum_{nmA} \frac{\partial K_\lambda(\phi, \phi^*)}{\partial \phi_n}\, (t_A)_{nm}\, \phi_m\, \delta\xi_A
  - \sum_{nmA} \frac{\partial K_\lambda(\phi, \phi^*)}{\partial \phi_n^*}\, (t_A)_{mn}\, \phi_m^*\, \delta\xi_A^* . \tag{17}
$$
Since this must be satisfied for all infinitesimal complex $\delta\xi_A$, the coefficients of
$\delta\xi_A$ and $\delta\xi_A^*$ must both vanish, and therefore Eq. (13) as well as Eq. (14)
is satisfied at this point. Thus the existence of a stationary point of $K_\lambda(\phi, \phi^*)$
on the surface $\phi = \phi(\xi)$ would imply that supersymmetry is unbroken to
all orders of perturbation theory. The zeroth-order Kähler potential $(\phi^\dagger \phi)$
is bounded below and goes to infinity as $\phi \to \infty$, so it certainly has a
minimum on the surface $\phi = \phi(\xi)$, where of course it is stationary. At this
minimum there are flat directions: ordinary global gauge transformations
$\delta\phi = i \sum_A \delta\xi_A\, t_A\, \phi$ with $\xi_A$ real. But these are also flat directions for the
perturbation $G_\lambda(\phi, \phi^*, 0, 0)$. Thus there is still a local minimum of $K_\lambda$ on the
surface $\phi = \phi(\xi)$ for any perturbation $G_\lambda(\phi, \phi^*, 0, 0)$ in at least a finite range,
and thus to all orders in whatever couplings appear in $G_\lambda(\phi, \phi^*, 0, 0)$. We see
that in such a theory supersymmetry can only be broken by non-perturbative
effects, which can naturally lead to a solution of the hierarchy problem.
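
The link between a minimum of the Kähler potential on the surface $\phi = \phi(\xi)$ and Eq. (13) can be seen in the simplest possible case. The sketch below assumes a hypothetical U(1) theory with two scalars of charges $+1$ and $-1$, keeps only the zeroth-order Kähler potential $\phi^\dagger \phi$, and takes arbitrary starting values; a superpotential depending only on the invariant $\phi_+ \phi_-$, stationary at the starting point, is assumed so that Eq. (14) holds along the whole orbit. Only the imaginary part $s$ of $\xi$ changes $\phi^\dagger \phi$, so the code minimizes over $s$ alone.

```python
import math

def orbit_point(phi0, charges, s):
    """Eq. (16) for a U(1) toy model with purely imaginary xi = i*s."""
    return [p * math.exp(-q * s) for p, q in zip(phi0, charges)]

def kahler0(phi):
    """Zeroth-order Kahler potential phi^dagger phi (real values suffice here)."""
    return sum(p * p for p in phi)

def d_term(phi, charges):
    """Left side of Eq. (13) for K = phi^dagger phi: sum_n q_n |phi_n|^2."""
    return sum(q * p * p for p, q in zip(phi, charges))

def minimise_on_orbit(phi0, charges, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection on dK/ds = -2*D(s), which increases monotonically with s."""
    def dk(s):
        return -2.0 * d_term(orbit_point(phi0, charges, s), charges)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dk(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical starting point and charges, chosen only for illustration.
phi0, charges = [2.0, 0.5], [1, -1]
s_min = minimise_on_orbit(phi0, charges)
phi_min = orbit_point(phi0, charges, s_min)
print(s_min, phi_min, kahler0(phi_min), d_term(phi_min, charges))
```

In this toy case the minimum sits where the two moduli are equal, which is exactly the condition that the quantity on the left of Eq. (13) vanishes.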

I am glad to acknowledge helpful conversations with J. Distler, W. Fischler,
V. Kaplunovsky, and N. Seiberg.

References

1. M. T. Grisaru, W. Siegel, and M. Roček, Nucl. Phys. B159, 429 (1979).

2. N. Seiberg, Phys. Lett. B318, 469 (1993).

3. E. Witten, Nucl. Phys. B185, 313 (1981).

4. T. Hayashi, Y. Ohshima, K. Okuyama, and H. Suzuki, Ibaraki preprint
IU-MSTP/27, hep-th/9801062, and earlier references cited therein.

5. P. Fayet and J. Iliopoulos, Phys. Lett. 51B, 461 (1974).

6. W. Fischler, H. P. Nilles, J. Polchinski, S. Raby, and L. Susskind, Phys.
Rev. Lett. 47, 757 (1981).

7. After this paper was submitted for publication I found that the same
proof of the non-renormalization of the Fayet–Iliopoulos term has been
given by M. Dine, in Fields, Strings, and Duality: TASI 96, eds. C.
Efthimiou and B. Greene (World Scientific, Singapore, 1997).

8. R. Delbourgo and A. Salam, Phys. Lett. 40B, 381 (1972); T. Eguchi
and P. Freund, Phys. Rev. Lett. 37, 1251 (1976). For a review and
other references, see T. Eguchi, P. B. Gilkey, and A. J. Hanson, Physics
Reports 66, 213 (1980).

UTTG-02-97

Effective Field Theories in the Large N Limit


arXiv:hep-th/9706042v1 6 Jun 1997

Steven Weinberg∗
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract
Various effective field theories in four dimensions are shown to
have exact non-trivial solutions in the limit as the number $N$ of fields of
some type becomes large. These include extended versions of the U(N)
Gross–Neveu model, the non-linear O(N) σ-model, and the $CP^{N-1}$ model.
Although these models are not renormalizable in the usual sense, the infinite
number of coupling types allows a complete cancellation of infinities. These
models provide qualitative predictions of the form of scattering amplitudes
for arbitrary momenta, but because of the infinite number of free parameters,
it is possible to derive quantitative predictions only in the limit of small
momenta. For small momenta the large-$N$ limit provides only a modest
simplification, removing at most a finite number of diagrams to each order in
momenta, except near phase transitions, where it reduces the infinite number
of diagrams that contribute for low momenta to a finite number.


Electronic address: weinberg@physics.utexas.edu
I. INTRODUCTION

There are a number of instructive models that can be exactly solved in the
limit where the number $N$ of fields becomes very large.¹ Well-known examples
include the linear and non-linear σ-models,² the Gross–Neveu model³ and the
$CP^{N-1}$ model.⁴ In four dimensions none of these models except the linear
σ-model are conventionally renormalizable, so their large-$N$ limit has usually
been studied either by introducing an ultraviolet cutoff, or by working in two
dimensions, where the simpler versions of these models are renormalizable.


There is an alternative approach to infinities. In effective field theories
that are not renormalizable in the usual ‘power-counting’ sense, infinities are
cancelled by renormalization of coupling constants provided we include in the
Lagrangian every interaction allowed by symmetry principles. Even though
this means that the Lagrangian contains an infinite number of interaction
terms, it is often possible to derive useful results in such theories by expanding
in powers of energy rather than coupling constants.⁵

In this paper I will show how non-trivial finite results can be obtained
by passing to the limit of large $N$ in various four-dimensional effective field
theories that are not renormalizable in the conventional sense. In this task
we will encounter problems both of combinatorics and of renormalization.

The combinatoric problems here can be illustrated by recalling the
Gross–Neveu model in its original form. The action is
$$
I[\psi] = \int d^2x \left[ -\bar\psi_r \gamma^\mu \partial_\mu \psi_r - (g/N) \left( \bar\psi_r \psi_r \right)^2 \right] , \tag{1}
$$
where $\psi_r$ is a set of $N$ fermion fields in two spacetime dimensions, forming
the defining representation of a U(N) symmetry, and $g$ is a constant that is
held fixed as $N \to \infty$. As an aid to counting factors of $1/N$, one cancels
the quartic term in Eq. (1) by adding an expression that is quadratic in an
auxiliary field $\sigma$, and that vanishes when $\sigma$ is integrated out. This results in
the replacement of (1) with the equivalent action
$$
I[\psi, \sigma] = I[\psi] + (N/4g) \int d^2x \left[ \sigma + (2g/N)\, \bar\psi_r \psi_r \right]^2
  = \int d^2x \left[ -\bar\psi_r \gamma^\mu \partial_\mu \psi_r + \sigma\, \bar\psi_r \psi_r + (N/4g)\, \sigma^2 \right] . \tag{2}
$$
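
The cancellation of the quartic term in Eq. (2) is a matter of completing the square, and can be checked pointwise. The sketch below treats the bilinear $j = \bar\psi_r \psi_r$ at a single spacetime point as an ordinary number (an assumption that is harmless here because a fermion bilinear commutes); the function names and the random sampling ranges are choices made for this example only.

```python
import random

def quartic(j, g, n):
    """Interaction term of Eq. (1) at one point: -(g/N) j**2."""
    return -(g / n) * j**2

def added_square(sigma, j, g, n):
    """Term added in Eq. (2): (N/4g) * (sigma + (2g/N) j)**2."""
    return (n / (4 * g)) * (sigma + (2 * g / n) * j) ** 2

def auxiliary_form(sigma, j, g, n):
    """Interaction terms in the second line of Eq. (2)."""
    return sigma * j + (n / (4 * g)) * sigma**2

def max_mismatch(trials=1000, seed=0):
    """Largest difference between the two forms over random sample points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        sigma, j = rng.uniform(-3, 3), rng.uniform(-3, 3)
        g, n = rng.uniform(0.1, 2.0), rng.randint(1, 50)
        lhs = quartic(j, g, n) + added_square(sigma, j, g, n)
        rhs = auxiliary_form(sigma, j, g, n)
        worst = max(worst, abs(lhs - rhs))
    return worst

print(max_mismatch())
```

The added square vanishes at $\sigma = -(2g/N)\, j$, which is why integrating out $\sigma$ returns the original action.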

Since the fermion field appears quadratically in Eq. (2) it may be integrated
out, yielding an effective action for $\sigma$
$$
\Gamma[\sigma] = (N/4g) \int d^2x\, \sigma^2 - iN\, \mathrm{Tr} \ln \left( \gamma^\mu \partial_\mu - \sigma \right) . \tag{3}
$$
The whole action for $\sigma$ is proportional to $N$, so the contribution of graphs
with $L$ $\sigma$-loops to the effective action for $\sigma$ is suppressed by a factor $N^{1-L}$.
Because this method uses the special properties of integrals over Gaussians,
it is often said that this method is limited to models in which the interaction
is a product of just two bilinear currents,⁶ as in Eq. (1).


There are also special problems with infinities when an auxiliary field is
introduced in order to impose some constraint on the $N$-component field, as
in the non-linear σ-model in four dimensions. In the original form of this
model the Lagrangian is
$$
\mathcal{L} = -\tfrac{1}{2} f^2\, \partial_\mu \pi_r\, \partial^\mu \pi_r \tag{4}
$$
where $f$ is an $N$-independent constant with the dimensions of mass, and the
scalar fields $\pi_r$ form an O(N) $N$-vector, constrained by
$$
\pi_r \pi_r = N . \tag{5}
$$
The counting of powers of $N$ becomes much easier if one replaces this
constraint with a Lagrange multiplier term, so that the Lagrangian becomes
$$
\mathcal{L} = -\tfrac{1}{2} f^2\, \partial_\mu \pi_r\, \partial^\mu \pi_r - \tfrac{1}{2} f^2 \lambda \left( \pi_r \pi_r - N \right) , \tag{6}
$$

with $\pi_r$ now unconstrained. Integrating out the auxiliary field $\lambda(x)$ yields the
Lagrangian (4), with the $\pi_r$ constrained by Eq. (5). If instead we integrate
over the $\pi_r$ we find an effective action for the auxiliary field
$$
\Gamma[\lambda] = \frac{iN}{2}\, \mathrm{Tr} \ln \left( \Box - \lambda \right) + \frac{N f^2}{2} \int d^4x\, \lambda . \tag{7}
$$
Because both terms are proportional to $N$, the Green's functions for $\lambda(x)$
are given by using the effective action (7) in the tree approximation. As
is well known, this theory is nonrenormalizable. We can see this in Eq. (7).
The field $\lambda(x)$ here has dimensionality $+2$, so the trace term may be written
(aside from an inconsequential constant term) as
$$
\tfrac{1}{2}\, i\, \mathrm{Tr} \ln(\Box - \lambda) = \int d^4x \left[ I_1 \lambda + I_2 \lambda^2 \right] + T_f[\lambda] , \tag{8}
$$
where the $I_a$ are divergent constants, and $T_f[\lambda]$ is finite. The infinite term $I_1$
can be cancelled by an infinity in the parameter $f^2$, leaving a finite remainder
$\tfrac{1}{2} N f_R^2 \int d^4x\, \lambda$ in $\Gamma[\lambda]$, with $f_R^2 = f^2 + 2 I_1$. But the term $I_2$ cannot be
cancelled in this way. We could of course add a term proportional to $\lambda^2$ to
the Lagrangian (6), with a coefficient whose infinite part cancels the infinite
constant $I_2$ in Eq. (8), but then we would lose the constraint (5), and this
would be the linear σ-model rather than the nonlinear σ-model. If we view
this as an effective field theory then the Lagrangian (4) is just the first term
in an infinite series involving higher powers of the currents $\partial_\mu \pi_r \partial_\nu \pi_r$ and
higher derivatives, but it is not immediately obvious how these higher terms
will allow us to cancel the infinity in $I_2$.
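
As a check on the bookkeeping, inserting Eq. (8) into Eq. (7) and collecting the terms linear in $\lambda$ gives the following; nothing beyond Eqs. (7) and (8) is assumed.

```latex
\begin{align*}
\Gamma[\lambda] &= N\int d^4x\,\big[I_1\lambda + I_2\lambda^2\big] + N\,T_f[\lambda]
                   + \tfrac12 N f^2\int d^4x\,\lambda \\
 &= \tfrac12 N\big(f^2 + 2I_1\big)\int d^4x\,\lambda + N I_2\int d^4x\,\lambda^2 + N\,T_f[\lambda] \\
 &= \tfrac12 N f_R^2\int d^4x\,\lambda + N I_2\int d^4x\,\lambda^2 + N\,T_f[\lambda] .
\end{align*}
```

The divergence in $I_1$ is absorbed into $f_R^2$, while the $I_2 \lambda^2$ term has no counterpart in the Lagrangian (6).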


We run into a similar problem when an auxiliary field is introduced to
impose a condition of gauge invariance. The classic example here is the
$CP^{N-1}$ model in four dimensions. This model contains a set of $N$ complex
scalar fields $u_r$, subject to the constraint that
$$
u_r^\dagger(x)\, u_r(x) = N . \tag{9}
$$
In order that the $u_r(x)$ at each $x$ should form a $CP^{N-1}$ manifold, we must
require the action to be invariant under ‘gauge’ transformations
$$
\delta u_r(x) = i\, \epsilon(x)\, u_r(x) , \tag{10}
$$
with $\epsilon(x)$ an arbitrary real infinitesimal function. In the original $CP^{N-1}$
model, this is accomplished by taking the action as
$$
I = -f^2 \int d^4x \left( \partial^\mu u_r - i a^\mu u_r \right)^\dagger \left( \partial_\mu u_r - i a_\mu u_r \right) , \tag{11}
$$
where $f$ is an $N$-independent constant with the dimensions of mass, and
$a_\mu(x)$ may be defined as the bilinear
$$
a_\mu \equiv -(i/2N) \left[ u_r^\dagger\, \partial_\mu u_r - (\partial_\mu u_r^\dagger)\, u_r \right] \tag{12}
$$
which under the gauge transformation (10) changes by
$$
\delta a_\mu = \partial_\mu \epsilon . \tag{13}
$$
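
The transformation rule (13) follows from Eqs. (9), (10), and (12) in a few lines. The steps below use only $\delta u_r^\dagger = -i \epsilon\, u_r^\dagger$, which is the Hermitian conjugate of Eq. (10) for real $\epsilon$.

```latex
\begin{align*}
\delta\big(u_r^\dagger\partial_\mu u_r\big)
  &= -i\epsilon\,u_r^\dagger\partial_\mu u_r + u_r^\dagger\partial_\mu(i\epsilon u_r)
   = i(\partial_\mu\epsilon)\,u_r^\dagger u_r = iN\,\partial_\mu\epsilon , \\
\delta\big((\partial_\mu u_r^\dagger)u_r\big)
  &= \partial_\mu(-i\epsilon u_r^\dagger)\,u_r + i\epsilon\,(\partial_\mu u_r^\dagger)u_r
   = -iN\,\partial_\mu\epsilon , \\
\delta a_\mu &= -\frac{i}{2N}\big(iN\,\partial_\mu\epsilon + iN\,\partial_\mu\epsilon\big)
   = \partial_\mu\epsilon .
\end{align*}
```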

Equivalently, we can replace $a_\mu(x)$ in Eq. (11) with an independent auxiliary
field $A_\mu(x)$, so that the action is
$$
I = -f^2 \int d^4x \left( \partial^\mu u_r - i A^\mu u_r \right)^\dagger \left( \partial_\mu u_r - i A_\mu u_r \right) . \tag{14}
$$
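
Because $A_\mu$ appears in Eq. (14) at most quadratically and without derivatives, its stationary value can be checked at a single spacetime point and for a single index $\mu$. The sketch below draws random complex vectors for $u_r$ and $\partial_\mu u_r$ (hypothetical test data, with $u_r$ rescaled to satisfy Eq. (9)), locates the stationary point of the quadratic in $A$, and compares it with the bilinear of Eq. (12).

```python
import random

def bilinear_a(u, du):
    """Eq. (12) at one point: a = -(i/2N) [u^dagger du - (du^dagger) u]."""
    n = len(u)
    u_dag_du = sum(x.conjugate() * y for x, y in zip(u, du))
    du_dag_u = sum(y.conjugate() * x for x, y in zip(u, du))
    return (-0.5j / n * (u_dag_du - du_dag_u)).real

def quadratic_in_A(A, u, du):
    """Integrand of Eq. (14) without the overall -f^2: |du - i A u|^2."""
    return sum(abs(y - 1j * A * x) ** 2 for x, y in zip(u, du))

def stationary_A(u, du, h=1.0):
    """Stationary point of an exact quadratic, found from three samples."""
    q_minus, q0, q_plus = (quadratic_in_A(A, u, du) for A in (-h, 0.0, h))
    slope = (q_plus - q_minus) / (2 * h)
    curvature = (q_plus - 2 * q0 + q_minus) / h**2
    return -slope / curvature

def random_point(n, seed=0):
    """Random u with u^dagger u = N (Eq. (9)) and an unconstrained random du."""
    rng = random.Random(seed)
    u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = sum(abs(x) ** 2 for x in u) ** 0.5
    u = [x * (n ** 0.5) / norm for x in u]
    du = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    return u, du

u, du = random_point(5)
print(stationary_A(u, du), bilinear_a(u, du))
```

The two printed numbers agree up to rounding, so in this pointwise check eliminating $A_\mu$ reproduces the bilinear $a_\mu$.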

Since $A_\mu(x)$ enters quadratically in Eq. (14), the path integral over $A_\mu(x)$ is
done by giving it a value at which the action (14) is stationary with respect
to $A_\mu(x)$, which turns out to give $A_\mu(x) = a_\mu(x)$. To enforce the constraint
(9) we can add a Lagrange multiplier term $-f^2 \int d^4x\, \lambda\, (u_r^\dagger u_r - N)$, with $\lambda(x)$
another auxiliary field, and $u_r(x)$ now unconstrained. Since $u_r(x)$ enters
quadratically in the action it can be integrated out, yielding an effective
action for the auxiliary fields
$$
\Gamma[A, \lambda] = iN\, \mathrm{Tr} \ln \left[ D_\mu D^\mu - \lambda \right] + N f^2 \int d^4x\, \lambda , \tag{15}
$$
where here $D_\mu \equiv \partial_\mu - i A_\mu$. Because each term is proportional to $N$, the
contribution of graphs with $L$ loops to the Green's functions for $\lambda$ and $a_\mu$ is
suppressed by a factor $N^{1-L}$. Eq. (15) displays the problem with
renormalizability in four dimensions: dimensional analysis and gauge invariance show
that the infinite part of the trace is a linear combination of $\int d^4x\, \lambda$, $\int d^4x\, \lambda^2$,
and $\int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2$ with divergent coefficients, and although the
infinite part of the coefficient of $\int d^4x\, \lambda$ can be cancelled by an infinite term in
$f^2$, there is nothing here that can cancel the infinite coefficients of $\int d^4x\, \lambda^2$ or
$\int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2$. Treating this as an effective field theory, we would
certainly have to add terms to the action involving $\int d^4x\, (\partial_\mu a_\nu - \partial_\nu a_\mu)^2$, where
$a_\mu$ is defined by Eq. (12), but this would not cancel the infinity in $\Gamma[A, \lambda]$
proportional to $\int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2$. We cannot add a term proportional to
$\int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2$ without making $A_\mu(x)$ an independent dynamical field,
thus removing the most interesting aspect of the theory, the appearance of
long-range forces in a theory without an elementary gauge field.

Finally, there is a problem that always confronts us in dealing with
effective field theories: how to use a theory with an infinite number of free
parameters to derive physical predictions. As we shall see, the large $N$ limit
can give qualitative information about the form of S-matrix elements, but
in effective field theories it is not possible actually to calculate the functions
that appear in S-matrix elements except in the low energy limit. In the
extended Gross–Neveu model considered in Section II it turns out that there
are usually only a finite number of graphs that contribute to each order in
energy, whatever the value of $N$, so for low energy the large $N$ limit leads
only to modest simplifications. As shown in Sections III and IV, the same
is true in the extended non-linear σ-model and the extended $CP^{N-1}$ model,
with one interesting exception: near the phase transitions at which the
broken symmetries of these models are restored, there is an infinite number of
graphs of the same order in energy, which can be summed only in the large
$N$ limit.

The original motivation of this work was to decide whether the appearance
of a spin-one ‘photon’ in the two-dimensional $CP^{N-1}$ model occurs also in
four-dimensional versions of this model, when the problem of infinities is
handled by treating the model as an effective quantum field theory. Section
IV shows that the answer to this question is yes, but as discussed in Section
V, this result is less surprising than might be supposed.

II. THE EXTENDED GROSS–NEVEU MODEL

To illustrate the use of the large $N$ limit in effective field theories, let us
consider the general class of models with a set of $N$ massless fermion fields in $d$
spacetime dimensions, transforming according to the defining representation
of a global U(N) symmetry. Any U(N)-invariant action will be a functional
of a set of bilinear currents $j_\ell(x)$ that are invariant under U(N), such as the
currents $j_0 = \bar\psi_r \psi_r$, $j_{1\mu} = \bar\psi_r \partial_\mu \psi_r$, etc. We will consider a class of extended
four-dimensional Gross–Neveu models, with an action of the form∗∗
$$
I[\psi] = -\int d^dx\, \bar\psi_r \gamma^\mu \partial_\mu \psi_r + N F[j/N] , \tag{16}
$$
where $F[\tau]$ is some $N$-independent functional. The original Gross–Neveu
action (1) is a special case of Eq. (16), with $F[j/N]$ quadratic in the particular
current $j_0 = \bar\psi_r \psi_r$. As we shall see, the $N$-dependence given to the second term
in the action (16) makes the theory non-trivial but soluble, as in the original
Gross–Neveu model.

∗∗ The free-field action could itself be regarded as a linear term in $N F[j/N]$, but it is
convenient to treat it separately.

The action (16) may be replaced with an equivalent action
$$
I[\psi, \sigma] = \int d^4x \left[ -\bar\psi_r \gamma^\mu \partial_\mu \psi_r + \sigma_\ell(x)\, j_\ell(x) \right] + N G[\sigma, N] \tag{17}
$$
where $\exp\{iNG[\sigma, N]\}$ is the functional Fourier transform with respect to
$N\tau$ of $\exp\{iNF[\tau]\}$
$$
\exp\{iNG[\sigma, N]\} \equiv \int \prod_{\ell, x} d\tau_\ell(x)\,
  \exp\left\{ -iN \int d^4x\, \sigma_\ell(x)\, \tau_\ell(x) + iNF[\tau] \right\} . \tag{18}
$$
Of course, $\exp\{iNF[\tau]\}$ is then (up to an unimportant constant factor) the
Fourier transform of $\exp\{iNG[\sigma, N]\}$
$$
\exp\{iNF[\tau]\} \propto \int \prod_{\ell, x} d\sigma_\ell(x)\,
  \exp\left\{ iN \int d^4x\, \sigma_\ell(x)\, \tau_\ell(x) + iNG[\sigma, N] \right\} . \tag{19}
$$
The integral over the $\sigma_\ell(x)$ in the functional integral of $\exp\{iI[\psi, \sigma]\}$ thus
yields
$$
\int \prod_{\ell, x} d\sigma_\ell(x)\, \exp\{iI[\psi, \sigma]\} \propto \exp\{iI[\psi]\} \tag{20}
$$
so the action $I[\psi, \sigma]$ given by Eq. (17) is equivalent to the original action
(16).

In the limit of large $N$ the Fourier integral (18) may be done by setting
$\tau_\ell(x)$ at the stationary ‘point’ $\tau^\sigma(x)$ of the integrand, at which
$$
\left. \frac{\delta F[\tau]}{\delta \tau_\ell(x)} \right|_{\tau = \tau^\sigma} \equiv \sigma_\ell(x) , \tag{21}
$$
so that $G[\sigma, N]$ approaches an $N$-independent functional, the Legendre
transform of $F[\tau]$:
$$
G[\sigma, N] \to G[\sigma] \equiv F[\tau^\sigma] - \int d^dx\, \sigma_\ell(x)\, \tau^\sigma_\ell(x) . \tag{22}
$$
That is, for $N \to \infty$, the action may be taken as
$$
I[\psi, \sigma] = \int d^4x \left[ -\bar\psi_r \gamma^\mu \partial_\mu \psi_r + \sigma_\ell(x)\, j_\ell(x) \right] + N G[\sigma] . \tag{23}
$$
We could just as well have taken the action (23) as our starting point, with
$G[\sigma]$ an arbitrary $N$-independent functional; the only difference would be
that then the theory would be equivalent to one with an action of the original
form (16), but with $F[\tau]$ independent of $N$ only in the limit $N \to \infty$.
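
The Legendre transform of Eqs. (21) and (22) can be tried out on the original Gross–Neveu model. Comparing Eq. (1) with Eq. (16) gives $F[\tau] = -g \int \tau^2$, and Eq. (2) then suggests $G[\sigma] = \int \sigma^2 / 4g$. The sketch below checks this for a single mode, treating $\tau$ and $\sigma$ as ordinary numbers; the Newton iteration with finite-difference derivatives, the step size `h`, and the test values are assumptions of this example rather than anything prescribed by the text.

```python
def legendre_transform(F, sigma, tau0=0.0, h=1e-3, steps=50):
    """G(sigma) = F(tau_s) - sigma*tau_s, where F'(tau_s) = sigma (Eqs. (21)-(22))."""
    tau = tau0
    for _ in range(steps):
        d1 = (F(tau + h) - F(tau - h)) / (2 * h)             # F'(tau)
        d2 = (F(tau + h) - 2 * F(tau) + F(tau - h)) / h**2   # F''(tau)
        tau -= (d1 - sigma) / d2                             # Newton step
    return F(tau) - sigma * tau

def gross_neveu_F(g):
    """Single-mode version of F[tau] = -g tau^2, the quadratic case of Eq. (1)."""
    return lambda tau: -g * tau**2

g, sigma = 0.7, 1.3   # arbitrary test values
print(legendre_transform(gross_neveu_F(g), sigma), sigma**2 / (4 * g))
```

The two numbers agree, so in this quadratic case $N G[\sigma]$ reproduces the $(N/4g)\, \sigma^2$ term of Eq. (2). For a non-quadratic $F$ the same routine applies as long as $F''$ does not vanish along the iteration.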
Now, we want to calculate the quantum effective action Γ[ψ, σ], which

by definition in the tree approximation gives the same result as the sum of
all loop and tree graphs calculated using the action (23). According to the
usual prescription, we must replace ψr (x) and σℓ (x) in Eq. (23) with sums
ψr (x)+ψr′ (x) and σℓ (x)+σℓ′ (x) and integrate over the quantum perturbations

ψr′ (x) and σℓ′ (x), including only one-particle-irreducible graphs. A standard
power-counting argument gives, to leading order in 1/N,
Z h i
Γ[ψ, σ] → d4 x −ψ r γ µ ∂µ ψr + σℓ (x)jℓ (x) + NT [σ] + NG[σ] , (24)

where T[σ] is an N-independent functional of σ, like the functional Tr ln(γ^µ ∂µ − σ) in Eq. (3), defined in general by the integral of a Gaussian

$$\exp\{iNT[\sigma]\} \equiv \int \Big[\prod_{r,x} d\psi'_r(x)\Big] \exp\Big\{i\int d^dx\,\Big[-\bar\psi'_r\gamma^\mu\partial_\mu\psi'_r + \sigma_\ell(x)\,j'_\ell(x)\Big]\Big\}. \tag{25}$$

[To obtain Eq. (24), note first that purely fermionic loops yield a term NT[σ] in the effective action, which just makes an additive contribution to NG[σ]. The σℓ propagators are then given by the inverse of the coefficient of the quadratic term in NT[σ] + NG[σ], and are hence proportional to 1/N, while the purely bosonic vertices (including those derived from T[σ]) make contributions proportional to N. Thus a graph with Vσ purely bosonic vertices, Vψ fermion–fermion–boson vertices, Iψ internal fermion lines (excluding those in purely fermionic loops) and Iσ internal boson lines makes a contribution of order $N^{V_\sigma - I_\sigma}$. But 2Vψ = 2Iψ + F, where F is the number of external fermion lines (i.e., factors of the classical ψ field), and the number L of loops (with all purely fermionic loops counted as bosonic vertices) is L = Iψ + Iσ − Vψ − Vσ + 1, so the number of powers of N is Vσ − Iσ = 1 − L − F/2. The leading graphs are therefore those with no loops, but the only one-particle-irreducible graphs with no loops are those consisting of just a single vertex, which yield the result (24).]
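The counting in the bracketed argument can be checked mechanically. The following is a minimal sketch, not part of the original derivation: the function names and the choice to parameterize a graph by (Vσ, Iσ, Vψ, F) are assumptions made for illustration, with Iψ and L then fixed by the two relations quoted above.

```python
from fractions import Fraction

def n_power(v_sigma, i_sigma):
    """Power of N: each purely bosonic vertex gives N, each sigma propagator 1/N."""
    return v_sigma - i_sigma

def loops(i_psi, i_sigma, v_psi, v_sigma):
    """L = I_psi + I_sigma - V_psi - V_sigma + 1 (fermion loops counted as vertices)."""
    return i_psi + i_sigma - v_psi - v_sigma + 1

def counting_holds(v_sigma, i_sigma, v_psi, f_ext):
    """Check V_sigma - I_sigma = 1 - L - F/2, using 2 V_psi = 2 I_psi + F."""
    i_psi = Fraction(2 * v_psi - f_ext, 2)
    L = loops(i_psi, i_sigma, v_psi, v_sigma)
    return n_power(v_sigma, i_sigma) == 1 - L - Fraction(f_ext, 2)

# Single sigma exchange between two fermion lines:
# V_sigma = 0, I_sigma = 1, V_psi = 2, F = 4, so the graph is of order 1/N.
print(n_power(0, 1), counting_holds(0, 1, 2, 4))
```

For the single-exchange graph the power of N is −1 with L = 0 and F = 4, in line with the statement that fermion–fermion scattering is of order 1/N at leading order.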
For instance, the amplitude for fermion–fermion scattering is given to leading order in 1/N by the tree graphs in which a single σ line is exchanged between the external fermions, with the vertices given by the term $\int d^4x\,\sigma_\ell(x)\,j_\ell(x)$ in Eq. (24), and the σ propagator given by the inverse of the coefficient of the quadratic term in NG[σ] + NT[σ]. Unlike the case of the original Gross–Neveu model, the terms of higher order in 1/N arise not only from loop graphs that correct the effective action (24), but also from higher-order terms in G[σ, N].
Now let’s take up the problem of renormalization in the large N limit.
Although the one-loop functional T [σ] is highly non-local, its infinite part

is ‘perturbatively local’ — that is, it is the integral of a series (in general


infinite) of products of fields and their derivatives with divergent coefficients,
of which only a finite number of terms contribute to the tree amplitude
for any given process to any finite order in momentum. In order to cancel

these infinities, it is necessary that G[σ] be a general perturbatively local


functional, subject to no constraints other than symmetry properties that
also constrain T [σ]. But any perturbatively local G[σ] can be obtained as

the Legendre transform of some perturbatively local F [τ ], provided only that

$$\mathrm{Det}\,M \neq 0, \qquad M_{\ell x,\,m y} \equiv \frac{\delta^2 G[\sigma]}{\delta\sigma_\ell(x)\,\delta\sigma_m(y)}. \tag{26}$$

[To see this, it is only necessary to note that the functional G[σ] is the Legendre transform of a functional F[τ] given by the inverse Legendre transform of G[σ]:

$$F[\tau] = G[\sigma^\tau] + \int d^dx\,\sigma^\tau_\ell(x)\,\tau_\ell(x) \tag{27}$$

with σ^τ the stationary ‘point’ of the expression on the right-hand side:

$$\left.\frac{\delta G[\sigma]}{\delta\sigma_\ell(x)}\right|_{\sigma=\sigma^\tau} = -\tau_\ell(x). \tag{28}$$

In terms of Feynman graphs, this just says that F[τ] is given by a sum of tree Feynman graphs calculated from the action $G[\sigma] + \int d^dx\,\sigma_\ell(x)\,\tau_\ell(x)$, for which the propagator is $M^{-1}_{\ell x,\,m y}$. Therefore F[τ] is perturbatively local if G[σ] is, and if Det M ≠ 0.] Furthermore, the condition that Det M ≠ 0 can always be satisfied by adding a finite quadratic term to G[σ], and subtracting the same term from T[σ]. Apart from symmetries, the only constraint on F[τ] is that it be perturbatively local, so it is always possible to choose an F[τ] that gives whatever G[σ] is needed to cancel the infinities in T[σ].


This works out in a particularly simple way if the currents jℓ (x) have
dimensionality (in powers of mass, with h̄ = c = 1) less than the spacetime
dimensionality, so that the σℓ (x) have positive dimensionality. In this case

the infinite part T∞ [σ] of T [σ] is the integral of a polynomial in the σℓ (x) and
their derivatives, with infinite constant coefficients. For instance, consider
an extended Gross–Neveu model in four dimensions, with an action of the
form
$$I[\psi] = -\int d^4x\,\bar\psi_r\gamma^\mu\partial_\mu\psi_r + NF[j_0/N], \tag{29}$$

where F[j0/N] is an arbitrary even local functional of the single current

$$j_0 = \bar\psi_r\psi_r. \tag{30}$$

We take F[j0/N] even so that the action will be invariant under a discrete chiral symmetry transformation ψr → γ5 ψr, which if unbroken keeps the fermions massless. Here σ(x) has dimensionality +1, and chiral symmetry tells us that the functional T[σ] defined by Eq. (25) is even in σ, so this functional takes the form

$$T[\sigma] = \int d^4x\,\Big[I_0 + I_1\sigma^2 + I_2\,\sigma\Box\sigma + I_3\sigma^4\Big] + T_f[\sigma], \tag{31}$$

where the Ia are infinite constants, and Tf is a finite functional, which for constant σ takes the well-known form [7]

$$T_f[\sigma] = -\frac{1}{32\pi^2}\int d^4x\,\sigma^4\ln\sigma^2.$$

The functional F[τ] may be expanded in a series of even local operators of increasing dimensionality,

$$F[\tau] = \int d^4x\,\Big[A_0 + A_1\tau^2 + A_2\,\tau\Box\tau + A_3\tau^4 + A_4\,\tau\Box\Box\tau + \cdots\Big], \tag{32}$$

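in which case Eqs. (21) and (22) give

$$G[\sigma] = \int d^4x\,\left[A_0 - \frac{1}{4A_1}\sigma^2 + \frac{A_2}{4A_1^2}\,\sigma\Box\sigma + \frac{A_3}{16A_1^4}\,\sigma^4 + \left(\frac{A_4}{4A_1^2} - \frac{A_2^2}{4A_1^3}\right)\sigma\Box\Box\sigma + \cdots\right]. \tag{33}$$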
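As a consistency check on the derivative terms in Eq. (33), the part of F[τ] quadratic in τ can be Legendre-transformed in closed form. This is only a sketch under a stated assumption: □ is treated as a self-adjoint operator under the integral (integration by parts, with surface terms dropped), and the subscript "2" marking the quadratic part is notation introduced here.

```latex
% Quadratic part of Eq. (32):
F_2[\tau] = \int d^4x\; \tau\,\big(A_1 + A_2\Box + A_4\Box^2\big)\,\tau .
% Stationarity condition (21):
2\big(A_1 + A_2\Box + A_4\Box^2\big)\,\tau^\sigma = \sigma .
% Legendre transform (22):
G_2[\sigma] = F_2[\tau^\sigma] - \int d^4x\,\sigma\,\tau^\sigma
            = -\frac14 \int d^4x\; \sigma\,\big(A_1 + A_2\Box + A_4\Box^2\big)^{-1}\sigma .
% Expanding the inverse operator to second order in \Box:
\big(A_1 + A_2\Box + A_4\Box^2\big)^{-1}
   = \frac{1}{A_1} - \frac{A_2}{A_1^2}\,\Box
     - \left(\frac{A_4}{A_1^2} - \frac{A_2^2}{A_1^3}\right)\Box^2 + \cdots ,
% so that
G_2[\sigma] = \int d^4x \left[ -\frac{1}{4A_1}\,\sigma^2
     + \frac{A_2}{4A_1^2}\,\sigma\Box\sigma
     + \left(\frac{A_4}{4A_1^2} - \frac{A_2^2}{4A_1^3}\right)\sigma\Box\Box\sigma + \cdots \right].
```

The σ⁴ coefficient follows in the same way to first order in A3, by evaluating A3 τ⁴ at the leading-order solution τ^σ = σ/(2A1).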
In order to cancel infinities, we must take the bare parameters as

$$A_0 = C_0 - I_0, \quad A_1 = -\tfrac14\,[C_1 - I_1]^{-1}, \quad A_2 = \frac{C_2 - I_2}{4[C_1 - I_1]^2}, \quad A_3 = \frac{C_3 - I_3}{16[C_1 - I_1]^4}, \quad A_4 = \frac{C_4}{4[C_1 - I_1]^2} - \frac{[C_2 - I_2]^2}{4[C_1 - I_1]^3}, \quad \cdots \tag{34}$$

where the Ca are the finite renormalized coupling parameters that appear in the final result

$$G[\sigma] + T[\sigma] = \int d^4x\,\Big[C_0 + C_1\sigma^2 + C_2\,\sigma\Box\sigma + C_3\sigma^4 + C_4\,\sigma\Box\Box\sigma + \cdots\Big] + T_f[\sigma]. \tag{35}$$
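The algebra connecting Eqs. (33), (34) and (35) can be verified with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the divergent constants Ia are represented by large finite numbers, as they would be with an ultraviolet cutoff in place.

```python
from fractions import Fraction as Fr

def bare_parameters(C, I):
    """Eq. (34): bare A_0..A_4 from renormalized C_0..C_4 and divergent I_0..I_3."""
    C = [Fr(c) for c in C]
    I = [Fr(i) for i in I]
    d = C[1] - I[1]
    A0 = C[0] - I[0]
    A1 = -Fr(1, 4) / d
    A2 = (C[2] - I[2]) / (4 * d**2)
    A3 = (C[3] - I[3]) / (16 * d**4)
    A4 = C[4] / (4 * d**2) - (C[2] - I[2])**2 / (4 * d**3)
    return [A0, A1, A2, A3, A4]

def g_coefficients(A):
    """Eq. (33): coefficients of 1, s^2, s box s, s^4, s box box s in G[sigma]."""
    A0, A1, A2, A3, A4 = A
    return [A0,
            -1 / (4 * A1),
            A2 / (4 * A1**2),
            A3 / (16 * A1**4),
            A4 / (4 * A1**2) - A2**2 / (4 * A1**3)]

def renormalized(C, I):
    """Coefficients of G + T: Eq. (31) adds I_0..I_3 to the first four operators."""
    g = g_coefficients(bare_parameters(C, I))
    return [g[k] + (Fr(I[k]) if k < 4 else 0) for k in range(5)]

C = [1, 2, 3, 5, 7]                    # finite renormalized couplings (arbitrary)
I = [10**6, -10**9, 123456, 10**4]     # stand-ins for the divergent constants
print(renormalized(C, I) == C)
```

Whatever values are chosen for the Ia (provided C1 ≠ I1), the combination G + T has the finite coefficients Ca, which is the content of Eq. (35).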

It is important but not surprising that the infinite number of unrenormalized constants Aa can be chosen to give finite results for Γ[σ], despite the fact that the Gross–Neveu model is not conventionally renormalizable in four dimensions. What is somewhat surprising is that this is possible with an interaction given by a power series in the single current $j_0 = \bar\psi_r\psi_r$ and its derivatives, without needing to include additional currents, such as $j_1^\mu \equiv \bar\psi_r\partial^\mu\psi_r$. Although it is not necessary to include additional currents in order to cancel infinities, in the spirit of effective field theory we really should include all U(N)-invariant currents in the action. This complicates the cancellation of infinities through renormalization, but as we have shown earlier, it does not make it impossible.
What good is a theory like this, that has an infinite number of arbitrary parameters? For an action of the form (29), the effective action (24) takes the form

$$\Gamma[\psi,\sigma] = -\int d^4x\,\bar\psi_r\gamma^\mu\partial_\mu\psi_r + \int d^4x\,\sigma\,\bar\psi_r\psi_r + NG[\sigma] + NT[\sigma], \tag{36}$$

with G[σ] + T[σ] given by Eq. (35). To calculate scattering amplitudes we must use this as the action in the tree approximation. For instance, the invariant amplitude for a fermion–fermion scattering process A + B → C + D takes the form

$$M(A + B \to C + D) = \delta_{r_A r_C}\,\delta_{r_B r_D}\,(\bar u_C u_A)(\bar u_D u_B)\,\Delta(t) - \delta_{r_A r_D}\,\delta_{r_B r_C}\,(\bar u_D u_A)(\bar u_C u_B)\,\Delta(u), \tag{37}$$

where t and u are the Mandelstam variables t = −(pA − pC)² and u = −(pA − pD)², and ∆(t) is the σ propagator. This particular form of the scattering amplitude is a consequence of the assumption that the action has the form (29), with only the one current j0, and it is valid in the large N limit for arbitrary values of the fermion momenta.
Unfortunately, to go further and actually calculate the propagator ∆(s)
without knowing the infinite number of free parameters in F [τ ] or G[σ], it

is necessary to take the limit of small momenta. But in this limit, little is
gained by also letting N become large.
To compare the consequences of the large N and low momentum approximations, let us consider an amplitude calculated using the action (29) in the equivalent form

$$I[\psi,\sigma] = -\int d^4x\,\bar\psi_r\gamma^\mu\partial_\mu\psi_r + \int d^4x\,\sigma\,\bar\psi_r\psi_r + NG[\sigma], \tag{38}$$

with the N-dependence of G[σ] left unspecified. Take all incoming and outgoing momenta of the order of some small momentum Q, and suppose that infinities are cancelled by renormalizing at momenta also of order Q. Fermion propagators go as $Q^{-1}$, while σ propagators go as constants (any quadratic terms in G[σ] involving derivatives being treated as interactions) and each loop introduces four factors of Q, so an amplitude with Iψ internal fermion lines and L loops goes as $Q^\nu$, where

$$\nu = 4L - I_\psi + \sum_i d_i V_i, \tag{39}$$

where Vi is the number of purely bosonic interactions of type i, and di is the number of derivatives in such an interaction. This may be rewritten by using the familiar formulas

$$2I_\psi + E_\psi = 2V_\psi, \tag{40}$$

$$2I_\sigma + E_\sigma = \sum_i V_i n_i + V_\psi, \tag{41}$$

and

$$L = I_\psi + I_\sigma - \sum_i V_i - V_\psi + 1, \tag{42}$$

where Iσ is the number of internal σ lines, Eψ and Eσ are the numbers of external fermion and σ lines, Vψ is the number of fermion–fermion–σ vertices, and ni is the number of σ fields in the purely bosonic interaction of type i. Eq. (39) then may be put in the form

$$\nu = 2L + \sum_i \Delta_i V_i - \frac{E_\psi}{2} - E_\sigma + 2, \tag{43}$$

where

$$\Delta_i = n_i + d_i - 2. \tag{44}$$
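The step from Eq. (39) to Eq. (43) uses only the topological relations (40) to (42), so it can be checked graph by graph. The sketch below assumes a hypothetical encoding of a graph by its number of fermion–fermion–σ vertices, its external lines, and a list of (V_i, n_i, d_i) triples for the purely bosonic interactions; none of these names come from the original text.

```python
from fractions import Fraction

def graph_counts(V_psi, E_psi, E_sigma, bosonic):
    """Solve Eqs. (40)-(42) for I_psi, I_sigma and L.
    `bosonic` is a list of (V_i, n_i, d_i) triples."""
    I_psi = Fraction(2 * V_psi - E_psi, 2)
    I_sigma = Fraction(sum(V * n for V, n, d in bosonic) + V_psi - E_sigma, 2)
    L = I_psi + I_sigma - sum(V for V, n, d in bosonic) - V_psi + 1
    return I_psi, I_sigma, L

def nu_direct(V_psi, E_psi, E_sigma, bosonic):
    """Eq. (39): nu = 4L - I_psi + sum_i d_i V_i."""
    I_psi, _, L = graph_counts(V_psi, E_psi, E_sigma, bosonic)
    return 4 * L - I_psi + sum(V * d for V, n, d in bosonic)

def nu_rewritten(V_psi, E_psi, E_sigma, bosonic):
    """Eq. (43) with Delta_i = n_i + d_i - 2 from Eq. (44)."""
    _, _, L = graph_counts(V_psi, E_psi, E_sigma, bosonic)
    return (2 * L + sum(V * (n + d - 2) for V, n, d in bosonic)
            - Fraction(E_psi, 2) - E_sigma + 2)

# Fermion-fermion scattering by single sigma exchange (tree graph).
print(nu_direct(2, 4, 0, []), nu_rewritten(2, 4, 0, []))
# Two-sigma exchange (one loop that is not purely fermionic).
print(nu_direct(4, 4, 0, []), nu_rewritten(4, 4, 0, []))
```

Single exchange gives ν = 0 and the two-σ box gives ν = 2, which is the pattern used in the discussion that follows: each loop, and each pure σ interaction with ∆i = 2, costs two powers of Q.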

Now, all σ interactions have either ni ≥ 4 or ni = 2 and di ≥ 2 (the term proportional to $\int \sigma^2\,d^4x$ in G[σ] being treated as the kinematic action for the σ field), so for all purely bosonic interactions ∆i ≥ 2. A scattering amplitude for a fixed process (i.e., Eψ and Eσ fixed) is therefore dominated for Q → 0 by tree graphs with any number of fermion–fermion–σ vertices but no loops and no pure σ interactions. These graphs are a subset of those we encountered in the limit of large N, so the large N limit would introduce no further simplification here.
The large N limit becomes relevant in the next order in Q², which according to Eqs. (43) and (44) is given by graphs with any number of fermion–fermion–σ vertices and either one loop or with no loops and one pure σ interaction with ∆i = 2. The graphs with a single interaction from G[σ] or a single purely fermionic loop reproduce what we would find by using the large N effective action (36) to this order in Q². But here there is also a class of graphs that do not appear in the large N limit: the graphs constructed out of only fermion–fermion–σ vertices and containing a single loop, which is not purely fermionic. (For instance, in fermion–fermion scattering these are the graphs in which a pair of σ lines are exchanged between the fermions.) The large N limit is therefore useful here, but not very useful, because even without it there are only a finite number of graphs to each order in Q². This is just a consequence of the fact that this is a theory where all interactions are nonrenormalizable. In the next section we will see an example where the large N limit is much more important, because there is an infinite number of graphs to each order in small momenta, which cannot be summed except in the limit of large N.

III. THE NON-LINEAR σ-MODEL: INTEGRATING IN AN ORDER PARAMETER

Auxiliary fields are sometimes introduced to enforce constraints on the other fields, as well as to help in counting factors of 1/N. The classic example is the non-linear O(N) σ-model, which in its usual form has Lagrangian (4). Here we will consider a class of extended non-linear σ-models, with Lagrangians of the form

$$I[\pi] = -\frac{f^2}{2}\int d^4x\,\partial_\mu\pi_r\,\partial^\mu\pi_r + Nf^2F[j/N], \tag{45}$$

where πr is a set of N scalar fields, satisfying the constraint

$$\pi_r\pi_r = N; \tag{46}$$

j(x) is the O(N)-invariant scalar current with the minimum number of derivatives,

$$j \equiv \tfrac12\,\partial_\mu\pi_r\,\partial^\mu\pi_r; \tag{47}$$

f² is an arbitrary positive constant; and F[τ] is a functional that, apart from being perturbatively local and N-independent, can be chosen as we like. The N-dependence in Eq. (45) has been chosen so that this model will be soluble but non-trivial in the limit N → ∞.

As in the case of the extended Gross–Neveu model, we shall introduce an auxiliary field σ(x), and replace Eq. (45) with the equivalent action

$$I[\pi,\sigma] = Nf^2G[\sigma,N] - f^2\int d^4x\,(1+\sigma)\,j, \tag{48}$$

where exp{iNf²G[σ, N]} is the functional Fourier transform with respect to f²Nτ of exp{iNf²F[τ]}:

$$\exp\{iNf^2G[\sigma,N]\} \equiv \int \Big[\prod_x d\tau(x)\Big] \exp\Big\{iNf^2\int d^4x\,\sigma(x)\,\tau(x) + iNf^2F[\tau]\Big\}. \tag{49}$$
It is easy to see that we get back to Eq. (45) when we integrate out σ(x), but it will be convenient instead to use the action in the form (48).
In the limit of large N the functional G[σ, N] approaches an N-independent functional given by the Legendre transform of F[τ],

$$G[\sigma,N] \to G[\sigma] = \int d^4x\,\sigma\,\tau^\sigma + F[\tau^\sigma], \tag{50}$$

with τ^σ defined by

$$\left.\frac{\delta F[\tau]}{\delta\tau(x)}\right|_{\tau=\tau^\sigma} = -\sigma(x). \tag{51}$$

At this point, it is not so obvious how infinite counterterms in the functional G[σ] can be used to cancel the ultraviolet divergence proportional to $\int d^4x\,\lambda^2$ that is encountered when we introduce a Lagrange multiplier λ(x) and integrate over the πr. Yet we know that this is possible, because the cancellation of infinities is obvious in an extended linear σ model, in which the action is an arbitrary perturbatively local functional of an unconstrained N-vector field φr, and from such a model it is always possible to construct a non-linear σ-model by integrating out the massive order-parameter represented by the O(N) scalar φrφr. This suggests that we should show the cancellation of infinities in the extended non-linear σ model by using the ingredients appearing in Eq. (48) to construct something like a linear σ-model.
For this purpose, let us define new fields

$$\phi_r \equiv f\sqrt{1+\sigma}\;\pi_r. \tag{52}$$

Using the constraint (46), the current (47) may be written

$$j = \frac{1}{2f^2(1+\sigma)}\,\partial_\mu\phi_r\,\partial^\mu\phi_r - \frac{N}{8(1+\sigma)^2}\,\partial_\mu\sigma\,\partial^\mu\sigma. \tag{53}$$
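For readers who want the intermediate step, the form (53) of the current follows directly from the definition (52) and the constraint (46). The lines below are a sketch of that calculation, with no input beyond those two equations.

```latex
% Differentiate Eq. (52):
\partial_\mu\phi_r = f\left[\sqrt{1+\sigma}\;\partial_\mu\pi_r
      + \frac{\partial_\mu\sigma}{2\sqrt{1+\sigma}}\;\pi_r\right].
% The constraint \pi_r\pi_r = N implies \pi_r\,\partial_\mu\pi_r = 0, so the cross term drops out:
\partial_\mu\phi_r\,\partial^\mu\phi_r
   = f^2\left[(1+\sigma)\,\partial_\mu\pi_r\,\partial^\mu\pi_r
      + \frac{N}{4(1+\sigma)}\,\partial_\mu\sigma\,\partial^\mu\sigma\right].
% Solving for j = \tfrac12\,\partial_\mu\pi_r\,\partial^\mu\pi_r:
j = \frac{1}{2f^2(1+\sigma)}\,\partial_\mu\phi_r\,\partial^\mu\phi_r
    - \frac{N}{8(1+\sigma)^2}\,\partial_\mu\sigma\,\partial^\mu\sigma .
```

Multiplying by −f²(1 + σ), as in the action (48), then produces the canonical kinetic term for φr together with a term (Nf²/8) ∂µσ ∂^µσ/(1 + σ) that depends on σ alone.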

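Also, the constraint now reads

$$\phi_r\phi_r = Nf^2(1+\sigma) \tag{54}$$

and will again be imposed by introducing a Lagrange multiplier λ(x). The action (48) is thereby replaced with the equivalent action

$$I[\phi,\sigma,\lambda] = Nf^2G'[\sigma] - \int d^4x\,\Big[\tfrac12\lambda\big(\phi_r\phi_r - Nf^2(1+\sigma)\big) + \tfrac12\,\partial_\mu\phi_r\,\partial^\mu\phi_r\Big], \tag{55}$$

where

$$G'[\sigma] \equiv G[\sigma] + \frac18\int d^4x\,\frac{\partial_\mu\sigma\,\partial^\mu\sigma}{1+\sigma}. \tag{56}$$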
Now it is λ(x) rather than σ(x) that interacts with the N-vector of scalar fields, so the φr scattering amplitude may be calculated in terms of the effective action for λ(x) and φr(x), which is obtained by integrating out σ(x). The part of the action involving σ(x) is proportional to N, and does not involve the φr(x), so we can integrate out σ(x) to leading order in 1/N by setting σ(x) equal to the value σ^λ(x) where (55) is stationary with respect to σ(x):

$$\left.\frac{\delta G'[\sigma]}{\delta\sigma(x)}\right|_{\sigma=\sigma^\lambda} = -\tfrac12\,\lambda(x). \tag{57}$$

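This gives an action for φr(x) and λ(x):

$$I[\phi,\lambda] = NH[\lambda] - \int d^4x\,\Big[\tfrac12\,\partial_\mu\phi_r\,\partial^\mu\phi_r + \tfrac12\,\lambda\,\phi_r\phi_r\Big], \tag{58}$$

where H[λ] is another Legendre transform,

$$H[\lambda] \equiv f^2G'[\sigma^\lambda] + \tfrac12 f^2\int d^4x\,\lambda\,(1+\sigma^\lambda). \tag{59}$$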

The same reasoning as in the previous section (with λ and πr replacing σ and ψr) shows that to leading order in 1/N, the quantum effective action which in tree approximation gives the complete scattering amplitude is here

$$\Gamma[\phi,\lambda] = \Gamma[\lambda] - \int d^4x\,\Big[\tfrac12\,\partial_\mu\phi_r\,\partial^\mu\phi_r + \tfrac12\,\lambda\,\phi_r\phi_r\Big], \tag{60}$$

where

$$\Gamma[\lambda] = NH[\lambda] + \tfrac12\,iN\,\mathrm{Tr}\ln(\Box - \lambda). \tag{61}$$

Following the same argument as in the previous section, by choosing F[τ] we can make G′[σ] and H[λ] any perturbatively local functionals we like, so we can adjust H[λ] to cancel the infinite terms in the one-loop trace Tr ln(□ − λ) proportional to $\int d^4x\,\lambda$ and $\int d^4x\,\lambda^2$. For λ spacetime-independent, this gives

$$V(\lambda) = \frac{N\lambda^2\ln(\lambda/M^2)}{64\pi^2} + N\sum_{n=0} c_n\lambda^n, \tag{62}$$

where the cn are model-dependent N-independent finite constants; M is a constant that can be chosen for instance so that c2 = 0; and V(λ) is the ‘effective potential,’ defined so that for a spacetime-independent λ,

$$\Gamma[\lambda] = -V_4\,V(\lambda), \tag{63}$$

with V4 the spacetime volume.

To identify the possible phases of this model, we must examine the pos-

sible spacetime-independent vacuum expectation values of the fields φr and


λ, at which the effective action (60) is stationary. These phases are of two
different types:

1. Broken Symmetry Phase

In this phase the vacuum expectation value of λ(x) vanishes, while φr(x) has a vacuum expectation value $\sqrt{N}\,v_r$, given by the solution of the equation

$$\left.\frac{\delta\Gamma[\lambda]}{\delta\lambda(x)}\right|_{\lambda(x)=0} = -\left.\frac{\partial V(\lambda)}{\partial\lambda}\right|_{\lambda=0} = \tfrac12\,N v_r v_r. \tag{64}$$

From Eqs. (62) and (64) we see the system will be in this phase if c1 < 0, and that in this case vr is N-independent and given by

$$v_r v_r = -2c_1. \tag{65}$$

To see the particle content of the theory in this phase, we note that the terms in the effective action (60) of second order in the displacement of the fields from their equilibrium value have a coefficient matrix given in momentum space by −½ D(k), where D(k) is the matrix

$$D_{rs}(k) = k^2\delta_{rs}, \qquad D_{\lambda\lambda}(k) = NA(k^2), \qquad D_{r\lambda}(k) = D_{\lambda r}(k) = \sqrt{N}\,v_r. \tag{66}$$

Here A(k²) is an N-independent one-loop amplitude derived from the part of Γ[λ] quadratic in λ, which is easily calculated to be

$$A(k^2) = \frac{\ln(k^2/\mathcal{M}^2)}{32\pi^2} + \sum_{n=1} d_n\,(k^2)^n, \tag{67}$$

with dn another set of model-dependent finite constants, and $\mathcal{M}$ is a constant chosen so that the constant term d0 in the sum is absent. The scalar propagator ∆(k) = D⁻¹(k) hence has elements

$$\Delta_{rs}(k) = \frac{1}{k^2}\left(\delta_{rs} - \frac{v_r v_s}{v^2 - k^2A(k^2)}\right), \qquad \Delta_{\lambda\lambda}(k) = -\frac{k^2}{N\big(v^2 - k^2A(k^2)\big)}, \qquad \Delta_{r\lambda}(k) = \Delta_{\lambda r}(k) = \frac{v_r}{\sqrt{N}\big(v^2 - k^2A(k^2)\big)}, \tag{68}$$

where v² ≡ vr vr. The pole in ∆rs at k² = 0 clearly arises from the Goldstone bosons of O(N)/O(N − 1). This pole occurs only in the propagators of the components of φr in directions perpendicular to vr. There may be other particles associated with poles at model-dependent non-zero masses arising from the vanishing of the denominators v² − k²A(k²) in the propagators of the fields vrφr and λ, but without special assumptions about H[λ] we can say nothing about them, except that they do not mix with the Goldstone bosons.
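That the propagator elements quoted in Eq. (68) really invert the matrix of Eq. (66) can be confirmed with exact arithmetic. This is a minimal sketch, assuming the off-diagonal elements of D carry a factor √N (as the vacuum expectation value √N vr implies) and taking N to be a perfect square so that √N is rational; the function names are illustrative.

```python
from fractions import Fraction as Fr

def quadratic_matrix(k2, A, v, sqrtN):
    """Matrix D of Eq. (66) for N = sqrtN**2 scalars plus lambda (last index)."""
    N = sqrtN * sqrtN
    D = [[Fr(0)] * (N + 1) for _ in range(N + 1)]
    for r in range(N):
        D[r][r] = Fr(k2)
        D[r][N] = D[N][r] = sqrtN * Fr(v[r])
    D[N][N] = N * Fr(A)
    return D

def propagator(k2, A, v, sqrtN):
    """Matrix Delta of Eq. (68), with den = v^2 - k^2 A(k^2)."""
    N = sqrtN * sqrtN
    k2, A = Fr(k2), Fr(A)
    v = [Fr(x) for x in v]
    den = sum(x * x for x in v) - k2 * A
    P = [[Fr(0)] * (N + 1) for _ in range(N + 1)]
    for r in range(N):
        for s in range(N):
            P[r][s] = ((1 if r == s else 0) - v[r] * v[s] / den) / k2
        P[r][N] = P[N][r] = v[r] / (sqrtN * den)
    P[N][N] = -k2 / (N * den)
    return P

def matmul(X, Y):
    """Plain matrix product for square lists of Fractions."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

# N = 4 with v_r along the last direction; k^2 and A(k^2) are arbitrary rationals.
k2, A, v = Fr(3, 7), Fr(5, 11), [0, 0, 0, Fr(2, 3)]
print(matmul(quadratic_matrix(k2, A, v, 2), propagator(k2, A, v, 2)) == identity(5))
```

With vr along a single direction, the components of ∆rs perpendicular to it reduce to δrs/k², which is the massless Goldstone pole discussed above.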

The invariant amplitude for Goldstone boson–Goldstone boson scattering is given by the λ–λ element of the propagator, as

$$M(ap,\,bq \to a'p',\,b'q') = \frac{1}{N}\left[\frac{\delta_{ab}\,\delta_{a'b'}\,s}{v^2 + sA(-s)} + \frac{\delta_{aa'}\,\delta_{bb'}\,t}{v^2 + tA(-t)} + \frac{\delta_{ab'}\,\delta_{a'b}\,u}{v^2 + uA(-u)}\right], \tag{69}$$

where a, b, a′, b′ run over the Goldstone directions, from 1 to N − 1 (with vr taken in the N-direction), and s, t, u are the usual Mandelstam variables: s = −(p + q)², t = −(p − p′)², and u = −(p − q′)².
Even with the function A(s) unknown, this specific form of the scattering amplitude is a non-trivial consequence of the action (45) in the limit of large N. But to go further and calculate the actual value of the scattering amplitude we need to restrict ourselves to low energies.
In the extreme low-energy limit, Eq. (69) reduces to the usual low energy Goldstone boson scattering amplitude, [8]

$$M(ap,\,bq \to a'p',\,b'q') \longrightarrow \frac{4}{F_\pi^2}\Big[\delta_{ab}\,\delta_{a'b'}\,s + \delta_{aa'}\,\delta_{bb'}\,t + \delta_{ab'}\,\delta_{a'b}\,u\Big], \tag{70}$$

provided we identify the Goldstone boson decay amplitude Fπ (equal to ≈ 184 MeV for pions) as

$$F_\pi = 2v\sqrt{N}. \tag{71}$$

In this low-energy limit, nothing is gained by also taking N large.


The large N limit does produce some simplification in the terms of higher order in energy. According to Eqs. (67), (69), and (71), the term in the Goldstone boson scattering amplitude of fourth order in momenta is

$$M^{(4)}(ap,\,bq \to a'p',\,b'q') = -\frac{N\,\delta_{ab}\,\delta_{a'b'}}{2\pi^2F_\pi^4}\,s^2\ln(-s/\mathcal{M}^2) + \text{crossed terms}, \tag{72}$$

which may be compared with the exact formula for the terms in the amplitude of fourth order in momenta†

$$M^{(4)}_{aba'b'} = \frac{\delta_{ab}\,\delta_{a'b'}}{F_\pi^4}\left[-\frac{N-3}{2\pi^2}\,s^2\ln(-s) - \frac{1}{12\pi^2}\,(u^2 - s^2 + 3t^2)\ln(-t) - \frac{1}{12\pi^2}\,(t^2 - s^2 + 3u^2)\ln(-u) - c\,s^2 - c'(t^2 + u^2)\right] + \text{crossed terms}, \tag{73}$$

where c and c′ are unknown constants. We see that the effect of taking the large N limit here is just to eliminate a few of the terms in Eq. (73).
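The coefficient in Eq. (72) can be traced to the expansion of a single term of Eq. (69) in powers of s. The following is a sketch of that step, keeping only the logarithmic term of A from Eq. (67); it uses nothing beyond Eqs. (67), (69) and (71).

```latex
% Expand the s-channel term of Eq. (69) for small s:
\frac{1}{N}\,\frac{s}{v^2 + sA(-s)}
   = \frac{s}{Nv^2} - \frac{s^2A(-s)}{Nv^4} + O(s^3).
% With F_\pi = 2v\sqrt{N}: \quad Nv^2 = \tfrac14 F_\pi^2, \qquad Nv^4 = \frac{F_\pi^4}{16N}.
% Second order in momenta, reproducing Eq. (70):
\frac{s}{Nv^2} = \frac{4s}{F_\pi^2}.
% Fourth order in momenta, using A(-s) \simeq \ln(-s/\mathcal{M}^2)/32\pi^2:
-\frac{s^2A(-s)}{Nv^4} \simeq -\frac{16N}{F_\pi^4}\,\frac{s^2\ln(-s/\mathcal{M}^2)}{32\pi^2}
   = -\frac{N}{2\pi^2F_\pi^4}\,s^2\ln(-s/\mathcal{M}^2).
```

The same expansion applied to the t- and u-channel terms gives the crossed terms, and the leading power of N agrees with the (N − 3) coefficient of s² ln(−s) in Eq. (73) at large N.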

Inspection of Eqs. (67) and (69) shows that not only the terms in the scattering amplitude of second and fourth order in a generic momentum k, but all the ‘leading log’ terms of order $k^{2(n+1)}(\ln k^2)^n$ for n ≥ 0, are uniquely determined by the first, model-independent term in Eq. (67), with no dependence on the coefficients dn or the model-dependent functional H[λ]. These are just the model-independent consequences of unitarity and the broken O(N) symmetry alone, specialized to the case of large N. It is far easier to calculate the leading logarithms by using a large N model and then passing to the low energy limit where the results become model-independent, as we have done here, than by evaluating the model-independent leading log terms for general N and then passing to the limit of large N, but it is still true that to each order in energy there are only a finite number of diagrams, whether or not we invoke the large N limit.

† This agrees with the result of reference 5 for the physical case N = 4. The term of form δab δa′b′ s² ln(−s) is proportional to N − 3 rather than N − 1 because it receives contributions from graphs that do not have index loops as well as from those that do.

2. Unbroken Symmetry Phase

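In this phase the vacuum expectation value of φr(x) vanishes, while λ(x) has a vacuum expectation value λ0, given by the solution of the equation

$$\left.\frac{\delta\Gamma[\lambda]}{\delta\lambda(x)}\right|_{\lambda(x)=\lambda_0} = -\left.\frac{\partial V(\lambda)}{\partial\lambda}\right|_{\lambda=\lambda_0} = 0. \tag{74}$$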

Here the O(N) symmetry is unbroken, and in the large N limit we have a degenerate multiplet of scalars with squared mass λ0, so the system will be in this phase only if the stationary point λ0 of V(λ) is positive. The S-matrix elements for these degenerate scalars are given in the limit of large N by using the effective action (60) in the tree approximation; for instance, the Feynman scattering amplitude is

$$M(rp,\,sq \to r'p',\,s'q') = \delta_{rs}\,\delta_{r's'}\,\Delta(s) + \delta_{rr'}\,\delta_{ss'}\,\Delta(t) + \delta_{rs'}\,\delta_{r's}\,\Delta(u), \tag{75}$$

where ∆ is the λ propagator, of order 1/N.


Since λ0 is generically of the same order as whatever characteristic squared mass scale appears in the functional Γ[λ], there is no way to use the model to make useful quantitative predictions about masses and scattering amplitudes in this phase without making special assumptions about the constants appearing in H[λ]. From the large N limit we can only infer conclusions like Eq. (75) about the general form of scattering amplitudes.

3. Phase Transition

As we have seen, in general without knowing all the constants cn in the potential V(λ) we can say nothing about the masses in the unbroken symmetry phase except that they are O(N)-degenerate, and without knowing all the constants dn in A(k²) we can say nothing about the masses in the broken symmetry phase except for the existence of a massless multiplet of Goldstone bosons. We can do better in the case where the constants are tuned so that the system is near the transition between the two phases.
In the unbroken symmetry phase the system is near the phase transition if λ0, although non-zero, is small. For small λ, Eq. (62) becomes

$$V(\lambda) \to \frac{N\lambda^2\ln(\lambda/M^2)}{64\pi^2} + Nc_1\lambda. \tag{76}$$

(Recall that M has been chosen to make c2 vanish.) Condition (74) then becomes

$$c_1 = -\frac{\lambda_0\ln(e^{1/2}\lambda_0/M^2)}{32\pi^2}. \tag{77}$$

This has small positive solutions for λ0 as long as c1 is small and positive. On the other hand, in the broken symmetry phase the system is near the phase transition if vr is small, which according to Eq. (65) requires that c1 be small and negative. Thus there is a second-order phase transition between these two phases when c1 = 0, regardless of the values of the other cn.
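The step from the small-λ potential to Eq. (77) is a one-line differentiation. It is written out below as a sketch, taking the linear term of the potential to be N c1 λ as in Eq. (62), so that the overall factor of N drops out of the stationarity condition.

```latex
% Derivative of the small-\lambda potential:
\frac{\partial V}{\partial\lambda}
   = \frac{N}{64\pi^2}\Big[2\lambda\ln(\lambda/M^2) + \lambda\Big] + Nc_1
   = \frac{N\lambda}{32\pi^2}\Big[\ln(\lambda/M^2) + \tfrac12\Big] + Nc_1
   = \frac{N\lambda}{32\pi^2}\,\ln\!\big(e^{1/2}\lambda/M^2\big) + Nc_1 .
% Setting this to zero at \lambda = \lambda_0, as required by Eq. (74):
c_1 = -\frac{\lambda_0\,\ln\!\big(e^{1/2}\lambda_0/M^2\big)}{32\pi^2}.
```

For λ0 much smaller than M² the logarithm is negative, so a small positive λ0 corresponds to a small positive c1, as stated in the text.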

Near this phase transition in the broken symmetry phase the large N approximation allows us to sum amplitudes to all orders in the ratio of momenta to the small vacuum expectation value v, provided the momenta are small compared with all the other mass scales characteristic of H[λ]. Eq. (68) shows that in this case the Goldstone boson scattering amplitude has poles at s = −m², t = −m², and u = −m², with m given in terms of v by

$$v^2 = -m^2A(-m^2) \to -\frac{m^2\ln(-m^2/\mathcal{M}^2)}{32\pi^2}, \tag{78}$$

indicating the presence of an unstable light O(N − 1)-singlet particle with complex mass m. This is not unexpected; continuity suggests that the N − 1 Goldstone bosons should be joined near the phase transition by an additional scalar whose mass must vanish at the phase transition, in order to allow a smooth transition to the unbroken symmetry phase, where the N degenerate scalars become massless at the phase transition. The mass m given by Eq. (78) is complex because this scalar can decay into Goldstone bosons.
There are other possibilities: near the phase transition the unbroken phase could have a degenerate multiplet of light scalars belonging to any representation of O(N) that contains the (N − 1)-vector representation of O(N − 1), not necessarily the defining representation. The Goldstone bosons of the broken symmetry phase would then be joined at the phase transition with those massless scalars that are needed to fill out this representation when O(N) is restored. Our transformation of the theory has emphasized the possibility that near the phase transition the light degenerate multiplet of the unbroken phase forms an N-vector, but of course we do not know that the c1 parameter encountered in this transformed theory is small. To explore other possible types of phase transition, we would have to transform the theory in other ways, and then assume that the parameter corresponding to c1 in those transformed theories is small.
The smallness of c1 opens up a much more powerful role for the large N approximation. The expectation value ⟨λ⟩ is then small or zero, and the propagator of φ goes as $k^{-2}$ for a four-momentum k which though small is larger than ⟨λ⟩. On the other hand, the term in the action of second order in λ has a momentum-independent term which is not small near the phase transition, so the λ propagator must be regarded as of zeroth order in momenta. We can count powers of momentum and/or c1 in any diagram by dimensional analysis, with the fields φr and λ taken as having dimensions one and two (in powers of momentum), respectively.

With this understanding, the action (58) contains one superrenormalizable (‘relevant’) term $c_{1B}\int d^4x\,\lambda$; three renormalizable (‘marginal’) terms

$$-c_{2B}\int d^4x\,\lambda^2, \qquad -\tfrac12\int d^4x\,\partial_\mu\phi_r\,\partial^\mu\phi_r, \qquad -\tfrac12\int d^4x\,\lambda\,\phi_r\phi_r;$$

and an infinite number of nonrenormalizable (‘irrelevant’) terms, including terms of second order in derivatives of λ that act as corrections to the momentum-independent zeroth-order λ propagator. (The subscript B on c1B and c2B indicates that these are bare couplings, chosen to give finite values to the c1 and c2 in Eq. (62). To leading order in 1/N there is no renormalization of the coefficients of $\int d^4x\,\partial_\mu\phi_r\,\partial^\mu\phi_r$ and $\int d^4x\,\lambda\,\phi_r\phi_r$; these coefficients are fixed to be −1/2 by a choice of normalization of λ and φr.) The presence of a superrenormalizable term prevents an expansion in powers of momenta alone, but near the phase transition with c1 small, we can expand any scattering amplitude in powers of the over-all scale of momenta and c1. The leading term in this expansion is given by Feynman diagrams involving only the renormalizable and superrenormalizable interactions listed above.

The presence of the renormalizable interaction $\int d^4x\,\lambda\,\phi_r\phi_r$ means that there is an infinite number of multi-loop graphs of leading order in momenta and/or c1. Here the large N limit offers the huge simplification of reducing the complete quantum effective action to the simple form (60), only now with H[λ] containing only terms linear and quadratic in λ:

$$\Gamma[\phi,\lambda] = -\int d^4x\,\Big[\tfrac12\,\partial_\mu\phi_r\,\partial^\mu\phi_r + \tfrac12\,\lambda\,\phi_r\phi_r\Big] - Nc_{1B}\int d^4x\,\lambda - Nc_{2B}\int d^4x\,\lambda^2 + \tfrac12\,iN\,\mathrm{Tr}\ln(\Box - \lambda). \tag{79}$$

Using Eq. (79) in the tree approximation gives the terms in scattering amplitudes of leading order in 1/N and in small momenta and c1. For instance, in the broken symmetry phase the function A(k²) appearing in Eqs. (66)–(69) is here simply given by the first term in Eq. (67):

$$A(k^2) = \frac{\ln(k^2/\mathcal{M}^2)}{32\pi^2}. \tag{80}$$

For v → 0 there is just one solution of the equation v² = k²A(k²) with k² → 0 (the only case where Eq. (80) can be trusted). This solution has Re k² < 0, and corresponds to an unstable scalar particle that can decay into pairs of Goldstone bosons.

IV. THE EXTENDED $CP^{N-1}$ MODEL: INTEGRATING IN A GAUGE FIELD

Besides helping us to count factors of 1/N and enforcing constraints on the fields, auxiliary fields are sometimes introduced in order to enforce a condition of gauge invariance. The leading example of this sort is the $CP^{N-1}$ model, [4] which in its original form has an action given by Eqs. (11) and (12). Here we shall consider a class of extensions of the $CP^{N-1}$ model in which non-trivial finite results may be obtained in the limit of large N in four spacetime dimensions. As we shall see, just as in its original two dimensional version, this model has the remarkable property that a long range Coulomb force arises even though no elementary gauge field is introduced into the action.
The extended $CP^{N-1}$ models to be considered here contain a set of N complex scalar fields ur, subject to the constraint that

$$u_r^\dagger(x)\,u_r(x) = N. \tag{81}$$

The action is invariant under ‘gauge’ transformations

$$\delta u_r(x) = i\,\epsilon(x)\,u_r(x), \tag{82}$$

with ǫ(x) an arbitrary real infinitesimal function. For a soluble model which yields non-trivial finite results, it turns out to be sufficient to take the action in the form

$$I[u] = Nf^2\int d^4x\,(-b + a_\mu a^\mu) + Nf^2F[a,b], \tag{83}$$

where

$$a_\mu \equiv -\frac{i}{2N}\Big[u_r^\dagger\,\partial_\mu u_r - (\partial_\mu u_r^\dagger)\,u_r\Big], \tag{84}$$

$$b \equiv \frac{1}{N}\,(\partial_\mu u_r^\dagger)\,\partial^\mu u_r, \tag{85}$$

and F[a, b] is an arbitrary N-independent Lorentz-invariant perturbatively local functional, invariant under the transformation induced by the gauge transformation (82):

$$\delta b(x) = 2a^\mu(x)\,\partial_\mu\epsilon(x), \qquad \delta a_\mu(x) = \partial_\mu\epsilon(x). \tag{86}$$

The first term in Eq. (83) is a rewritten version of the action (11), (12) of the original $CP^{N-1}$ model; this term could have been included in NF[a, b], but it is convenient to display the kinematic part of the action explicitly. In principle we should include all SU(N)-invariant bilinear currents in addition to aµ and b, but the effects of other currents are suppressed at small momenta, and the addition of an arbitrary functional of aµ and b is enough to allow the cancellation of infinities. Note that aµ(x) is now given by Eq. (84), and is not taken as an independent field, because we will need to include terms in F[a, b] involving ∂µaν − ∂νaµ, and we are trying to see how a Maxwell field can arise without its being put in from the beginning. The photon will appear here in quite a different way.
For the sake of variety, we will take a different approach to counting powers of 1/N here, which gives the same result as the functional Fourier transform used in the previous sections. We introduce a pair of new auxiliary fields αµ(x), ρµ(x) and β(x), σ(x) for each of the bilinears aµ(x) and b(x) appearing in the action, writing Eq. (83) in the equivalent form

$$I = Nf^2\int d^4x\,\Big[-b + \alpha_\mu\alpha^\mu + \sigma\,(\beta - b) + \rho^\mu(\alpha_\mu - a_\mu)\Big] + Nf^2F[\alpha,\beta]. \tag{87}$$

Integrating out the σ(x) and ρµ(x) yields delta functions which set β(x) = b(x) and αµ(x) = aµ(x), taking us back to Eq. (83). Instead, we shall first integrate out the αµ(x) and β(x). Since the terms in (87) that depend on αµ(x) or β(x) are simply proportional to N and do not depend on the ur, in the limit of large N we can set these fields equal to the values at which (87) is stationary with respect to αµ(x) and β(x), giving

$$I = -Nf^2\int d^4x\,\Big[(1+\sigma)\,b + \rho^\mu a_\mu\Big] + Nf^2G[\rho,\sigma], \tag{88}$$

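where G is the Legendre transform of $F + \int d^4x\,\alpha_\mu\alpha^\mu$:

$$G[\rho,\sigma] = \Big[F[\alpha,\beta] + \int d^4x\,\alpha_\mu\alpha^\mu + \int d^4x\,(\rho^\mu\alpha_\mu + \sigma\beta)\Big]_{\rm staty}, \tag{89}$$

with the subscript ‘staty’ meaning that we set αµ(x) and β(x) equal to values satisfying conditions that make the quantity in square brackets stationary:

$$\frac{\delta F}{\delta\alpha_\mu(x)} = -2\alpha^\mu(x) - \rho^\mu(x), \qquad \frac{\delta F}{\delta\beta(x)} = -\sigma(x). \tag{90}$$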

The Legendre transform of a general perturbatively local functional is just another general perturbatively local functional, so we can regard G[ρ, σ] as arbitrary, except for a gauge invariance condition. Using Eq. (90) and the invariance of F[a, b] under the transformation (86), we easily see that

$$0 = \int d^4x\,\partial_\mu\epsilon\left[2\alpha^\mu\,\frac{\delta F[\alpha,\beta]}{\delta\beta} + \frac{\delta F[\alpha,\beta]}{\delta\alpha_\mu}\right] = \int d^4x\,\partial_\mu\epsilon\left[-2(1+\sigma)\,\frac{\delta G[\rho,\sigma]}{\delta\rho_\mu} - \rho^\mu\right]. \tag{91}$$

It follows that we can define a new functional

$$ G'[\rho, \sigma] \equiv G[\rho, \sigma] + \frac{1}{4}\int d^4x\, \frac{\rho_\mu \rho^\mu}{1+\sigma} \tag{92} $$

that is invariant under the transformations

$$ \delta\rho_\mu = -2(1+\sigma)\,\partial_\mu\epsilon , \qquad \delta\sigma = 0 . \tag{93} $$

This suggests that we should define a gauge field

$$ A_\mu \equiv -\frac{\rho_\mu}{2(1+\sigma)} , \tag{94} $$

which according to Eq. (93) has the gauge transformation property

$$ \delta A_\mu = \partial_\mu\epsilon . \tag{95} $$
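The step from Eq. (91) to the invariance of G′ is short enough to write out. The following is a sketch that uses only Eqs. (91) to (94), with σ held fixed under the transformation:

```latex
\begin{align*}
\delta G'[\rho,\sigma]
 &= \int d^4x\, \frac{\delta G[\rho,\sigma]}{\delta\rho_\mu}\,\delta\rho_\mu
    + \frac{1}{4}\int d^4x\, \frac{2\rho^\mu\,\delta\rho_\mu}{1+\sigma} \\
 &= \int d^4x\, \partial_\mu\epsilon
    \left[ -2(1+\sigma)\frac{\delta G[\rho,\sigma]}{\delta\rho_\mu} - \rho^\mu \right]
  = 0 \quad \text{by Eq. (91)} , \\
\delta A_\mu
 &= -\frac{\delta\rho_\mu}{2(1+\sigma)} = \partial_\mu\epsilon .
\end{align*}
```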

In the original version of the $CP^{N-1}$ model, F = 0, and then Eq. (90) shows
that ρµ = −2αµ = −2aµ and σ = 0, so Eq. (94) gives Aµ = aµ . But in the
general case with $F \neq 0$, it is incorrect to identify Aµ with aµ .
We are not yet ready to add a Lagrange multiplier term $-\int d^4x\, \lambda\, (u_r^\dagger u_r - N)$
and integrate out the ur fields, because then we would again encounter
an infinite term proportional to $\int d^4x\, \lambda^2$, and it is not yet clear how this

could be cancelled. Instead we will first re-define the fields to introduce an


order parameter, as we did in the previous section for the non-linear σ-model.

Define

$$ z_r \equiv f\sqrt{1+\sigma}\; u_r , \tag{96} $$

subject to the constraint

zr† zr = Nf 2 (1 + σ) . (97)

The bilinears (84) and (85) then take the form


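$$ a_\mu = \frac{-i}{2 f^2 N (1+\sigma)} \left[ z_r^\dagger\, \partial_\mu z_r - (\partial_\mu z_r)^\dagger z_r \right] \tag{98} $$

$$ b = \frac{1}{f^2 N (1+\sigma)}\, \partial_\mu z_r^\dagger\, \partial^\mu z_r - \frac{1}{4(1+\sigma)^2}\, \partial_\mu\sigma\, \partial^\mu\sigma . \tag{99} $$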
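The form of Eq. (99) can be checked directly from the definition (96). The following is a sketch under one assumption: that the bilinear (85), which is not reproduced in this part of the text, has the form $b = \partial_\mu u_r^\dagger\, \partial^\mu u_r / N$. It also uses the constraint $u_r^\dagger u_r = N$.

```latex
\begin{align*}
\partial_\mu z_r
 &= f\sqrt{1+\sigma}\;\partial_\mu u_r
    + \frac{f\,\partial_\mu\sigma}{2\sqrt{1+\sigma}}\;u_r , \\
\partial_\mu z_r^\dagger\,\partial^\mu z_r
 &= f^2(1+\sigma)\,\partial_\mu u_r^\dagger\,\partial^\mu u_r
    + \frac{f^2 N\,\partial_\mu\sigma\,\partial^\mu\sigma}{4(1+\sigma)}
    + \tfrac{1}{2} f^2\,\partial^\mu\sigma\;\partial_\mu\!\left(u_r^\dagger u_r\right) .
\end{align*}
% The last term vanishes because u_r^\dagger u_r = N is constant.
% Dividing by f^2 N (1+\sigma) and solving for b (assumed form of Eq. (85)):
\begin{equation*}
b = \frac{\partial_\mu z_r^\dagger\,\partial^\mu z_r}{f^2 N (1+\sigma)}
    - \frac{\partial_\mu\sigma\,\partial^\mu\sigma}{4(1+\sigma)^2} .
\end{equation*}
```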
The action given by Eq. (88) now may be written
$$ I = -\int d^4x\, (D_\mu z_r)^\dagger D^\mu z_r + N f^2 G''[A, \sigma] , \tag{100} $$

where Dµ is the gauge-covariant derivative

Dµ zr ≡ ∂µ zr − iAµ zr , (101)

and G′′ is another arbitrary gauge-invariant perturbatively local functional


$$ G''[A, \sigma] \equiv \frac{1}{4}\int d^4x\, \frac{\partial_\mu\sigma\, \partial^\mu\sigma}{1+\sigma} + G'[\rho, \sigma] . \tag{102} $$
Now we may enforce the constraint (97) by introducing a Lagrange multiplier term

$$ -\int d^4x\, \lambda \left[ z_r^\dagger z_r - N f^2 (1+\sigma) \right] , \tag{103} $$

which preserves gauge invariance if we define λ(x) to be gauge-invariant.
After integrating out the σ field, the action becomes
$$ I = -\int d^4x\, (D_\mu z_r)^\dagger D^\mu z_r - \int d^4x\, \lambda\, z_r^\dagger z_r + N H[A, \lambda] , \tag{104} $$

where H[A, λ] is yet another Legendre transform

$$ H[A, \lambda] = f^2 \left[ G''[A, \sigma] + \int d^4x\, (1+\sigma)\lambda \right]_{\sigma=\sigma_\lambda} \tag{105} $$

with σ λ (x) equal to the σ(x) at which the quantity in square brackets on the

right-hand side of Eq. (105) is stationary



$$ \left. \frac{\delta G''[A, \sigma]}{\delta\sigma(x)} \right|_{\sigma=\sigma_\lambda} = -\lambda(x) . \tag{106} $$

The zr field is now unconstrained.


We want to calculate the effective action Γ[z, A, λ], which in the tree
approximation gives the same result as the sum of all loop and tree graphs

calculated using the action (104). Following the same reasoning as in Section
2 (with zr replacing ψr , and λ and A replacing σ), we find that this is given
to leading order in 1/N by
$$ \Gamma[z, A, \lambda] = -\int d^4x\, (D_\mu z_r)^\dagger D^\mu z_r - \int d^4x\, \lambda\, z_r^\dagger z_r + \Gamma[A, \lambda] , \tag{107} $$

where

Γ[A, λ] = iN Tr ln[Dµ D µ − λ] + N H[A, λ] . (108)

Gauge invariance and dimensional analysis tell us that the infinite part of

the first term in Eq. (108) is a linear combination of the gauge-invariant

functionals $\int d^4x\, \lambda$, $\int d^4x\, \lambda^2$, and $\int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2$. But H[A, λ] is an
arbitrary perturbatively local functional, constrained only by invariance under
the gauge transformation (95), so there is no problem in adjusting it to
cancel these infinities.


Like the non-linear sigma-model, the $CP^{N-1}$ model can exist in several
phases, characterized here by different spacetime-independent vacuum expectation
values of the scalar fields zr and λ, with Aµ (x) = 0. In analyzing

these phases, we shall make use of the fact that for Aµ (x) = 0 and λ(x)
constant, Γ[A, λ] may be expressed as in Eq. (63) in terms of an effective
potential V (λ)
Γ[0, λ] = −V4 V (λ) , (109)

with V4 the spacetime volume. The effective potential here is given by a


formula like Eq. (62)

$$ V(\lambda) = \frac{N\lambda^2 \ln(\lambda/M^2)}{32\pi^2} + N \sum_{n=1} c_n \lambda^n \tag{110} $$

with cn a set of new constant coefficients depending on H[0, λ], and M a new
constant that can be chosen to make c2 = 0. (The coefficient in the first
term is 1/32π 2 instead of 1/64π 2 because zr is complex.) In analyzing the

vector particle mass and the ‘charge’ of the zr particles, we will also need
to study the term in Γ[A, λ] of second order in the photon field for constant
λ(x), which gauge invariance requires must take the form

$$ \Gamma^{(2)}[A, \lambda] = \frac{N}{2}\int d^4x\, A^\mu(x)\, (\eta_{\mu\nu}\Box - \partial_\mu\partial_\nu)\, f(-\Box, \lambda)\, A^\nu(x) , \tag{111} $$

Evaluating the trace in Eq. (108) gives
$$ f(q^2, \lambda) = -\frac{1}{16\pi^2}\int_0^1 dx\, (1-2x)^2 \ln\left( \frac{\lambda + q^2 x(1-x)}{W^2} \right) + \sum_{n=0}^{\infty} f_n(q^2)\, \lambda^n , \tag{112} $$

where fn (q 2 ) are N-independent functions of q 2 analytic at q 2 = 0, arising


from the unknown functional H[A, λ], and W is another mass parameter,
which can be chosen to make f0 (0) = 0.

1. Broken Symmetry Phase


In this phase zr (x) has a non-vanishing vacuum expectation value $\sqrt{N}\, v_r$

while the vacuum expectation values of λ(x) and Aµ (x) both vanish, which
requires that

$$ \left. \frac{\partial V(\lambda)}{\partial\lambda} \right|_{\lambda=0} = -N v^2 \tag{113} $$

where v 2 ≡ vr∗ vr . Since Γ ∝ N, vr is N-independent. For small constant λ,

the effective potential (110) is

$$ V(\lambda) \to \frac{N\lambda^2 \ln(\lambda/M^2)}{32\pi^2} + N c_1 \lambda \tag{114} $$

Hence condition (113) gives

c1 = −v 2 . (115)

In analyzing the degrees of freedom in this phase, it is very convenient to


eliminate the scalar–vector mixing in Eq. (107) by adopting unitarity gauge,
in which Im(vr∗ zr ) = 0. Taking vN = v real and vi = 0 for i = 1, · · · N − 1,

this means that zN is real, while the zi are still complex. The zi are massless

Goldstone boson fields, while zN has the same sort of mixing with λ that
we saw in the previous section; the terms in the action of second-order in λ

and/or $z_N - \sqrt{N}\, v$ have a coefficient matrix given by

$$ D_{NN}(k^2) = k^2 , \qquad D_{\lambda\lambda}(k^2) = N A(k^2) , \qquad D_{N\lambda}(k) = D_{\lambda N}(k) = \sqrt{N}\, v \tag{116} $$

where now

$$ A(k^2) = \frac{\ln(k^2/\mathcal{M}^2)}{16\pi^2} + \sum_{n=1} f_n\, (k^2)^n , \tag{117} $$

with $f_n$ yet another set of model-dependent constants; and $\mathcal{M}$ is a constant
chosen so that the term $f_0$ in the sum is absent. The scalar mass m is given

by the condition that this have a zero determinant at k 2 = −m2 :

− m2 A(−m2 ) = v 2 . (118)

Without further information about the functional H[0, λ], we cannot tell

whether there actually is a massive scalar in the spectrum of zN and λ.


To study the vector particles in this phase, we note that according to
Eq. (107), the term in Γ[v, A, 0] of second order in the photon field is given
in this phase by
$$ \Gamma^{(2)}[\sqrt{N}\, v, A, 0] = -N v^2 \int d^4x\, A_\mu(x) A^\mu(x) + \Gamma^{(2)}[A, 0] \tag{119} $$

where Γ(2) [A, 0] is defined by Eq. (111). There is a vector particle of mass
$\mu \neq 0$ if

µ2 f (−µ2 , 0) = 2v 2 . (120)

Without special assumptions about H[A, 0], it is not possible to tell whether this has
a solution, much less to calculate the vector boson mass µ. But it is clear
that any massive scalar or vector particles would have to be unstable, because

they could decay into the Goldstone bosons zi .

2. Unbroken Symmetry Phase

In this phase λ has a non-zero vacuum expectation value λ0 , satisfying the


condition

$$ \left. \frac{\partial V(\lambda)}{\partial\lambda} \right|_{\lambda=\lambda_0} = 0 , \tag{121} $$

which allows zr (x) as well as Aµ (x) to have vanishing expectation values.


Eq. (107) shows that in this phase the zr have squared mass λ0 , so λ0 must
be positive. The photon propagator in momentum space equals

$$ \Delta_{\mu\nu}(k) = \frac{\eta_{\mu\nu}}{k^2\, N f(k^2, \lambda_0)} + \text{gauge terms} , $$

where ‘gauge terms’ denotes gauge-dependent terms proportional to kµ kν .


Because the zr for $\lambda_0 \neq 0$ have a finite mass, the function f (k 2 , λ0 ) is analytic
at k 2 = 0, so the photon here is massless. Also the renormalized gauge field
is $\sqrt{N f(0, \lambda_0)}\, A_\mu$, so the zr charge is $1/\sqrt{N f(0, \lambda_0)}$, and is hence of order
$1/\sqrt{N}$. Without making special assumptions about the functional H[A, λ] it
is impossible to say anything more about the values of the zr squared mass
λ0 or the zr charge.

3. Phase Transition

As in the case of the non-linear σ-model, we can obtain more detailed


results when the model is near a phase transition between the broken and
unbroken symmetry phases. In the unbroken symmetry phase, the model is
near this phase transition if the λ0 satisfying Eq. (121) is positive and small.

In this case Eqs. (110) and (121) give


$$ c_1 = -\frac{\lambda_0 \ln(e^{1/2} \lambda_0 / M^2)}{16\pi^2} \tag{122} $$
which has positive small solutions for λ0 as long as c1 is small and positive.

On the other hand, in the broken symmetry phase the model is near the
phase transition if v is small, which according to Eq. (115) requires that c1 is
small and negative. Thus there is a second-order phase transition at c1 = 0,
irrespective of the values of other parameters.
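The relation between c1 and λ0 in the unbroken phase can be checked numerically. The sketch below is illustrative only: it keeps just the terms of Eq. (114), works per unit N, sets M² = 1 by default, and uses hypothetical function names. It confirms that Eq. (122) makes the derivative of V vanish, and finds the small positive root λ0 for a small positive c1 by bisection.

```python
import math


def c1_from_lambda0(lam0, M2=1.0):
    """Right-hand side of Eq. (122): c1 as a function of lambda_0."""
    return -lam0 * math.log(math.sqrt(math.e) * lam0 / M2) / (16 * math.pi ** 2)


def dV_dlambda(lam, c1, M2=1.0):
    """Derivative of V/N for the small-lambda potential of Eq. (114)."""
    return (2 * lam * math.log(lam / M2) + lam) / (32 * math.pi ** 2) + c1


def solve_lambda0(c1, M2=1.0, iterations=200):
    """Bisection for the small positive root lambda_0 of Eq. (122).

    c1_from_lambda0 rises monotonically from 0 on (0, M2 * exp(-3/2)),
    so a root exists there only for small positive c1.
    """
    lo, hi = 0.0, M2 * math.exp(-1.5)
    if not 0.0 < c1 < c1_from_lambda0(hi, M2):
        raise ValueError("c1 must be small and positive for a root to exist")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if c1_from_lambda0(mid, M2) < c1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c1_example = 1.0e-4                      # illustrative value, not from the text
lambda0_example = solve_lambda0(c1_example)
```

For negative c1 the function raises an error, which corresponds to the statement that the unbroken phase requires c1 to be small and positive.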

To analyze the low-momentum limit near a phase transition, we note that


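the action (104) contains a single superrenormalizable term $-N c_{1B}\int d^4x\, \lambda$;
four strictly renormalizable terms

$$ -N c_{2B}\int d^4x\, \lambda^2 ; \qquad -\tfrac{1}{4} N Z \int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2 ; \qquad -\int d^4x\, (D_\mu z_r)^\dagger D^\mu z_r ; \qquad -\int d^4x\, \lambda\, z_r^\dagger z_r ; $$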

and an infinite number of non-renormalizable terms. (The subscript B again


indicates bare values, adjusted to give finite values to c1 and c2 in Eq. (110).)
In the limit where c1 and all momenta are small, we can ignore the non-

renormalizable interactions, and calculate scattering amplitudes by using the

quantum effective interaction (107) in the tree approximation, now with
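$$ \Gamma[A, \lambda] = iN\, {\rm Tr} \ln[D_\mu D^\mu - \lambda] - N c_{1B}\int d^4x\, \lambda - N c_{2B}\int d^4x\, \lambda^2 - \tfrac{1}{4} N Z \int d^4x\, (\partial_\mu A_\nu - \partial_\nu A_\mu)^2 . \tag{123} $$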

This tells us, for example, that the potential V (λ) is given by Eq. (114);

that for constant λ(x) the function f (q 2 , λ) appearing in the formula (111)
for the term in Γ[A, λ] of second order in the vector field is here

$$ f(q^2, \lambda) = -\frac{1}{16\pi^2}\int_0^1 dx\, (1-2x)^2 \ln\left( \frac{\lambda + q^2 x(1-x)}{W^2} \right) ; \tag{124} $$

and that the function A(k 2 ) appearing in the scalar two-point function (116)
is
$$ A(k^2) = \frac{\ln(k^2/\mathcal{M}^2)}{32\pi^2} . \tag{125} $$
As an example of the use of these results, let’s look more closely at the
properties of the particles near the phase transition. In the unbroken sym-

metry phase for small λ0 , Eq. (124) gives


$$ f(0, \lambda_0) \to -\frac{1}{48\pi^2} \ln\left( \frac{\lambda_0}{W^2} \right) \tag{126} $$

This is positive but diverges for λ0 → 0, so that the zr charge $1/\sqrt{N f(0, \lambda_0)}$

vanishes at the phase transition. In the broken symmetry phase the vector
boson mass is determined by the function f (q 2 , 0), which for q 2 → 0 is given
by Eq. (124) as
$$ f(q^2, 0) \to -\frac{1}{48\pi^2} \left[ \ln\left( \frac{q^2}{W^2} \right) - \frac{8}{3} \right] . \tag{127} $$
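The numerical coefficients 1/48π² and 8/3 in Eqs. (126) and (127) come from two elementary integrals over the Feynman parameter x in Eq. (124). The sketch below checks them with a simple midpoint rule. The helper names and the choice W² = 1 are illustrative assumptions, not part of the text.

```python
import math


def midpoint_integral(g, n=200_000):
    """Midpoint-rule estimate of the integral of g over [0, 1]."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def weight(x):
    """The factor (1 - 2x)^2 appearing in Eq. (124)."""
    return (1.0 - 2.0 * x) ** 2


# Integral of (1-2x)^2: exactly 1/3, which turns 1/16 pi^2 into 1/48 pi^2.
I0 = midpoint_integral(weight)

# Integral of (1-2x)^2 ln(x(1-x)): exactly -8/9, i.e. (1/3) * (-8/3).
I1 = midpoint_integral(lambda x: weight(x) * math.log(x * (1.0 - x)))


def f_numeric(q2, W2=1.0):
    """Eq. (124) at lambda = 0, assembled from the two numerical integrals."""
    return -(I0 * math.log(q2 / W2) + I1) / (16 * math.pi ** 2)


def f_closed_form(q2, W2=1.0):
    """Closed form quoted in Eq. (127)."""
    return -(math.log(q2 / W2) - 8.0 / 3.0) / (48 * math.pi ** 2)
```

The logarithm is singular at the endpoints but integrable, so the midpoint rule converges, only more slowly than for a smooth integrand.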

Eq. (120) for the vector boson squared mass µ2 has a single solution that
vanishes as v → 0, indicating the presence of a single light vector particle.
Also, in the broken symmetry phase near the phase transition, Eq. (118) for

the scalar mass m takes the form

$$ \frac{-m^2 \ln(-m^2/\mathcal{M}^2)}{32\pi^2} = v^2 . \tag{128} $$

This has one solution for m2 that vanishes as v → 0, indicating the presence

of a single massive but light scalar particle. These solutions for µ2 and m2
both have positive real part but are complex, reflecting the fact that both of
these particles are unstable, because they can decay into pairs of Goldstone
bosons.

V. DYNAMICAL GAUGE BOSONS: A REMARK

The $CP^{N-1}$ model has attracted much attention because of the appearance
of a massless gauge boson in a theory involving only scalar fields. It


is important to recognize that this phenomenon does not depend on the
existence of the gauge symmetry (82), or indeed on any of the symmetry
properties of the action.

This can be seen by a very general argument.9 Consider a theory that


is invariant under a gauge group G, with various matter multiplets forming
various representations of G. Suppose that one of these multiplets consists of

scalar fields, some of which have vacuum expectation values that completely

break the gauge symmetry. Integrate out the massive gauge vector bosons
in unitarity gauge. We then have a perturbatively local effective field theory,
with no hint of the original gauge invariance. It seems pretty clear that if we

allow arbitrary interactions in the original theory, then in this way we obtain
a completely general effective field theory of the remaining fields. But this
procedure can be reversed, so out of any effective field theory with no gauge
symmetry and possibly no global symmetry we can obtain a theory with any

broken gauge symmetry.


The point is that a spontaneously broken gauge symmetry in itself has
no predictive power.10 Of course, it can have plenty of predictive power if
the gauge coupling is weak, but for this we have to fine-tune the parameters

in the action. In the $CP^{N-1}$ model studied in the previous section, this


fine-tuning is achieved by the condition that c1 is small.
To illustrate the possibility of constructing a broken gauge symmetry in

an effective field theory that has no symmetry to begin with, consider a


theory of Dirac fields ψi (x) with an action of form
$$ I[\psi] = -\sum_i \int d^4x\, \bar\psi_i \gamma^\mu \partial_\mu \psi_i - G[\psi] \tag{129} $$

where G[ψ] is an essentially arbitrary perturbatively local functional of the


fermion fields. We can choose G[ψ] so that this action has no internal sym-
metries — if we like, not even fermion conservation. This action can be

obtained by integrating out a vector field Aµ (x) in the action

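$$ I[\psi, A] = -\sum_i \int d^4x\, \bar\psi_i \gamma^\mu \partial_\mu \psi_i - G'[\psi] - \frac{M^2}{2}\int d^4x\, A_\mu A^\mu + \int d^4x\, A_\mu j^\mu - \frac{1}{4}\int d^4x\, F_{\mu\nu} F^{\mu\nu} \tag{130} $$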

where M is an arbitrary mass parameter; Fµν (x) ≡ ∂µ Aν − ∂ν Aµ ; and

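$$ G'[\psi] \equiv G[\psi] - \frac{1}{2}\int d^4x\, j_\mu(x)\, \frac{1}{M^2 - \Box}\, j^\mu(x) - \frac{1}{2M^2}\int d^4x\, \partial_\nu j^\nu(x)\, \frac{1}{M^2 - \Box}\, \partial_\mu j^\mu(x) . \tag{131} $$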

where j µ (x) is the current

$$ j^\mu \equiv \sum_i q_i\, \bar\psi_i \gamma^\mu \psi_i , \tag{132} $$

with qi an arbitrary set of real parameters. As long as $M \neq 0$, G′ is still

perturbatively local. The action (130) can be obtained from another action

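$$ I[\psi, A, u] = -\frac{M^2}{2}\int d^4x\, |\partial_\mu u - iA_\mu u|^2 - \sum_i \int d^4x\, \bar\psi_i \gamma^\mu \left( \partial_\mu - i q_i A_\mu \right) \psi_i - \frac{1}{4}\int d^4x\, F_{\mu\nu} F^{\mu\nu} + G'[\psi'] \tag{133} $$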

with u a scalar field constrained by |u|2 = 1, and

$$ \psi_i' \equiv \psi_i\, u^{-q_i} . \tag{134} $$

The action (133) is invariant under the gauge transformation

$$ \psi_i(x) \to e^{i q_i \alpha(x)}\, \psi_i(x) , \qquad u(x) \to e^{i\alpha(x)}\, u(x) , \qquad A_\mu(x) \to A_\mu(x) + \partial_\mu\alpha(x) , \tag{135} $$

so the action (130) can be obtained from (133) by adopting the unitarity
gauge, in which u = 1. (The action (133) is also perturbatively local, because
in deriving perturbation theory, we expand around u = 1 rather than u = 0.)

Yet there is no trace of this gauge invariance or even global invariance in the
action (129) with which we started.

ACKNOWLEDGEMENTS

I am grateful for helpful discussions with S. Coleman, J. Distler, and V.


Kaplunovsky. This research was supported in part by the Robert A. Welch

Foundation and NSF Grant PHY 9511632.

REFERENCES

1. For a review of the large N limit in various contexts, see S. Coleman, in


Aspects of Symmetry (Cambridge University Press, Cambridge, 1985):

Chapter 8.

2. The earliest reference known to me for the O(N) linear and non-linear
σ-models is M. Gell-Mann and M. Lévy, Nuovo Cimento 16, 705 (1960).
The O(N) linear σ-model was studied in the large N limit in statistical

mechanics by H. E. Stanley, Phys. Rev. 176, 718 (1968); K. Wilson,


Phys. Rev. D7, 2911 (1973); and in four-dimensional relativistic quantum
field theories by L. Dolan and R. Jackiw, Phys. Rev. D9, 3320
(1974); H. J. Schnitzer, Phys. Rev. D10, 2042 (1974); S. Coleman, R.

Jackiw, and H. D. Politzer, Phys. Rev. D10, 2491 (1974); and many
later authors. I don’t know who first studied the non-linear σ model in
the large-N limit.

3. D. Gross and A. Neveu, Phys. Rev. D10, 3235 (1974).

4. H. Eichenherr, Nucl. Phys. B146, 215 (1978); V. Golo and A. Perelo-
mov, Phys. Lett. 79B, 112 (1978); A. D’Adda, M. Lüscher, and P. Di
Vecchia, Nucl. Phys. B146, 63 (1978); B152, 125 (1979); E. Witten,

Nucl. Phys. B149, 285 (1979); H. Haber, I. Hinchliffe, and E. Rabinovici,
Nucl. Phys. B172, 458 (1980); M. Bando, T. Kugo, and K.
Yamawaki, Phys. Rept. 164, 217 (1988).

5. S. Weinberg, Physica 96A, 327 (1979).

6. See, e. g., R. de M. Koch and J. P. Rodrigues, Witwatersrand preprint


hep-th/9605079 (1996).

7. S. Coleman and E. Weinberg, Phys. Rev. D7, 1888 (1973).

8. S. Weinberg, Phys. Rev. Lett. 17, 616 (1966).

9. V. Kaplunovsky, private communication.

10. See, e. g., S. Weinberg, The Quantum Theory of Fields II: Modern
Applications (Cambridge University Press, Cambridge, 1996): p. 318.

UTTG-11-92

Three-Body Interactions Among Nucleons and Pions


arXiv:hep-ph/9209257v1 20 Sep 1992

Steven Weinberg∗

Theory Group
Department of Physics
University of Texas
Austin, Texas 78712

Abstract
A chiral invariant effective Lagrangian may be used to calculate
the three-body interactions among low-energy pions and nucleons
in terms of known parameters. This method is illustrated by the
calculation of the pion-nucleus scattering length.


∗ Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY 9009850.
Recent articles1,2 have described a systematic effective Lagrangian frame-
work for the calculation of reactions involving arbitrary numbers of nucleons
as well as pions of low 3-momentum. To leading order in small momenta,

the ‘potential’ for such reactions is given entirely by the tree graphs in which
only two of the pions and/or nucleons interact; further, their interaction is
calculated using the original effective chiral Lagrangian3 , which consists of
terms with only the minimum numbers of derivatives or pion mass factors,

supplemented by contact interaction terms among nucleons. The corrections


to these two-body interactions of second order in small momenta involve not
only one-loop graphs, but also a large number of new terms4 in the La-
grangian with additional derivatives, so many that not much can be learned

about pion-nucleon or nucleon-nucleon interactions in this way. Fortunately,


these two-body interactions can instead be taken from phenomenological
models that incorporate experimental information on nucleon-nucleon, pion-

nucleon, and pion-pion scattering. The only remaining contributions to the


potential of the same order in small momenta consist of graphs in which three
particles (or two pairs of particles) interact, their interactions given by tree
graphs calculated from the original effective chiral Lagrangian. Thus we can

use the three-body interactions calculated in terms of known parameters from


the original effective chiral Lagrangians together with experimental data on
two-body scattering to calculate all corrections to the potential of first and
second order in small momenta.

This method will be illustrated here in the calculation of the amplitudes
for pion scattering on complex nuclei. But first, a reminder of some general-
ities.

Consider the amplitude for a process with Nn nucleons and Nπ pions in


the initial state and the same numbers of nucleons and pions in the final
state, all with 3-momenta no larger than of order mπ . We wish to develop
a perturbation theory for this amplitude, based on an expansion in powers

of the ratio of these small momenta (and the pion mass) to some momen-
tum scale that is characteristic of quantum chromodynamics, such as mρ .
In counting the number of powers of small momenta in any given “old fash-
ioned” (time-ordered) diagram for this process, we must distinguish between

energy denominators of two types. Those of the first type arise from in-
termediate states that differ from the initial and final states in the number
of pions and/or in the pion energies, and are therefore of the order of the

small momenta or the pion mass. The energy denominators of the second
type arise from intermediate states that differ from the initial and final states
only in the nucleon momenta, and are therefore much smaller, of the order of
the nucleon kinetic energies. A given graph is called irreducible if it contains

only energy denominators of the first type. These are graphs for which the
initial particle lines cannot all be disconnected from the final particle lines
by cutting through any intermediate state containing Nn nucleons and either
all the initial or all the final pions. We shall consider disconnected as well

as connected irreducible graphs, because a general connected graph is built
up from a sequence of both disconnected and connected irreducible graphs
interleaved with small energy denominators of the second type. (However

in all graphs considered here, each of the initial particle lines must be con-
nected to one or more of the final particle lines, and vice versa.) This sum of
disconnected and connected irreducible graphs is what was referred to above
as the potential.

Because irreducible graphs do not contain anomalously small energy de-


nominators of the order of nucleon kinetic energies, it is easy to count the
number ν of powers of small momenta or pion masses in these graphs. For an
irreducible graph with Vi vertices of type i, L loops, and C separate connected

pieces, the number of powers of small momenta or pion masses is1,2

$$ \nu = 4 - N_n - 2C + 2L + \sum_i V_i \Delta_i , \tag{1} $$

where ∆i is an index for an interaction of type i, given in terms of the number


ni of nucleon field factors and the number di of derivatives (or powers of pion
mass) in the interaction, by

$$ \Delta_i = d_i + \tfrac{1}{2} n_i - 2 . \tag{2} $$

(In deriving this result, we count −3 powers of small momenta for each
line passing without interaction through the diagram, because the associated
momentum-space delta function reduces the number of momentum factors

in the total connected amplitude by that amount.)

Eq. (1) is useful because chiral invariance rules out any terms in the
Lagrangian with ∆i < 0. It follows that for any given number of external
lines, the leading irreducible graphs (those with smallest ν) are the tree

graphs (i. e., L = 0) with the maximum number C of connected parts,


constructed solely from vertices with ∆i = 0. The contribution of these
vertices can be read off from the effective interaction Hamiltonian (in the
interaction picture):

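$$
\begin{aligned}
H_{{\rm int},\Delta=0} ={}& \tfrac{1}{2}(D^2 - 1)\,\dot\pi^2 + \tfrac{1}{2}(D^{-2} - 1)\,\vec\nabla\pi\cdot\vec\nabla\pi + \tfrac{1}{2} m_\pi^2 (D^{-1} - 1)\,\pi^2 \\
& + 2F_\pi^{-4} \left( \bar N (t\times\pi) N \right)^2 \\
& + \bar N \left[ 2F_\pi^{-1} g_A D^{-1}\, t\cdot(\vec\sigma\cdot\vec\nabla\pi) + 2F_\pi^{-2} D\, t\cdot(\pi\times\dot\pi) \right] N \\
& + \tfrac{1}{2} C_S (\bar N N)(\bar N N) + \tfrac{1}{2} C_T (\bar N\vec\sigma N)\cdot(\bar N\vec\sigma N) ,
\end{aligned}
\tag{3}
$$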
2 2

which is derived from the most general chiral-invariant Lagrangian with ∆i =


0:

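$$
\begin{aligned}
{\cal L}_{\Delta=0} ={}& -\tfrac{1}{2} D^{-2}\, \partial_\mu\pi\cdot\partial^\mu\pi - \tfrac{1}{2} D^{-1} m_\pi^2 \pi^2 \\
& + \bar N \left[ i\partial_0 - 2D^{-1} F_\pi^{-2}\, t\cdot(\pi\times\partial_0\pi) - m_N - 2D^{-1} F_\pi^{-1} g_A\, t\cdot(\vec\sigma\cdot\vec\nabla)\pi \right] N \\
& - \tfrac{1}{2} C_S (\bar N N)(\bar N N) - \tfrac{1}{2} C_T (\bar N\vec\sigma N)\cdot(\bar N\vec\sigma N)
\end{aligned}
\tag{4}
$$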

where gA ≃ 1.25 and Fπ ≃ 190 MeV are the usual axial coupling constant
and pion decay amplitude; t is the nucleon isospin matrix; CS and CT are

constants whose values can be inferred from the singlet and triplet neutron-
proton scattering lengths; and D ≡ 1 + π 2 /Fπ2 . (As discussed in ref. 2, terms

involving time-derivatives of the nucleon field are eliminated by a suitable
redefinition of that field, while corrections to the non-relativistic treatment
of the nucleon in (3) and (4) appear as terms in the effective Hamiltonian

and Lagrangian with ∆i > 0.) The number C of connected parts is given
its maximum value C = Nn + Nπ − 1 by including only graphs (with one
qualification to be discussed later) for a single πN, NN, or ππ scattering,
with all other lines passing without interaction through the diagram.

The corrections to these leading terms with only one extra factor of small
momenta (or mπ ) arise from (a) tree graphs, with the maximum number
C = Nn +Nπ −1 of connected parts, that involve a single vertex (such as those
arising from non-zero u and d quark masses) with ∆i = 1, plus any number

of vertices with ∆i = 0. The next corrections, with two extra factors of small
momenta (or mπ ), arise from (b) one-loop graphs with C = Nn + Nπ − 1
involving only vertices with ∆i = 0; (c) tree graphs with C = Nn + Nπ − 1

involving either two vertices with ∆i = 1 or one vertex with ∆i = 2 (which


serve as counterterms for the infinities encountered in one-loop graphs), as
well as any number of vertices with ∆i = 0 ; (d) tree graphs, constructed
entirely from vertices with ∆i = 0, that have one less than the maximum

number of connected parts, i. e., with C = Nn + Nπ − 2.
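The power counting of Eqs. (1) and (2) is easy to tabulate. The sketch below uses hypothetical helper names and takes the illustrative case of two nucleons and one pion. It computes ν for the leading graphs and for the correction classes (a) to (d), and shows the extra powers of small momenta relative to the leading term.

```python
def delta_index(d, n):
    """Eq. (2): index of a vertex with d derivatives (or pion-mass powers)
    and n nucleon field factors."""
    return d + n / 2 - 2


def nu(n_nucleons, n_connected, n_loops, vertices=()):
    """Eq. (1): powers of small momenta for an irreducible graph.

    vertices is an iterable of (V_i, Delta_i) pairs.
    """
    vertex_sum = sum(count * delta for count, delta in vertices)
    return 4 - n_nucleons - 2 * n_connected + 2 * n_loops + vertex_sum


# Illustrative process: two nucleons and one pion.
Nn, Npi = 2, 1
C_max = Nn + Npi - 1          # maximum number of connected parts

leading = nu(Nn, C_max, 0, [(1, 0)])

extra_powers = {
    "a": nu(Nn, C_max, 0, [(1, 1)]) - leading,       # one Delta = 1 vertex
    "b": nu(Nn, C_max, 1, [(2, 0)]) - leading,       # one loop, Delta = 0 only
    "c": nu(Nn, C_max, 0, [(1, 2)]) - leading,       # one Delta = 2 vertex
    "d": nu(Nn, C_max - 1, 0, [(2, 0)]) - leading,   # one fewer connected part
}
```

The common vertices of the ∆ = 0 Lagrangian all have vanishing index: `delta_index(1, 2)` for the one-derivative pion-nucleon couplings, `delta_index(0, 4)` for the nucleon contact terms, and `delta_index(2, 0)` for the pion kinetic and mass terms.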


As already mentioned, the vertices with ∆i = 2 that contribute to cor-
rections of type (c) contain so many free parameters4 that little of value can
be learned by using the effective Lagrangian to calculate these corrections.

On the other hand, these corrections as well as the leading terms and the
corrections of types (a) and (b) all only contribute to the maximally discon-
nected irreducible graphs, that consist of a connected piece involving just two

of the incoming nucleons and/or pions, plus disconnected lines passing with-
out interaction through the diagram for all of the other incoming nucleons
and pions. But instead of trying to use the effective Lagrangian to calculate
such two-body interactions, we can draw on various phenomenological mod-

els that incorporate not only chiral symmetry but the whole body of present
experimental information about low energy nucleon-nucleon, nucleon-pion,
and pion-pion scattering.
There remain only the corrections of type (d), with C = Nn + Nπ − 2.

These are to be calculated from tree graphs involving only the ∆i = 0 Hamil-
tonian (3), which involves no unknown parameters. These corrections arise
from graphs that either consist of (d1) a connected piece involving just three

of the incoming nucleons and/or pions, or (d2) two connected pieces each
involving just two of the incoming nucleons and/or pions, plus in both cases
disconnected lines passing without interaction through the diagram for all of
the other incoming nucleons and pions. Graphs of type (d2) may, like the

graphs of types (a), (b), and (c), be taken from suitable phenomenological
models based on experimental information about two-body scattering pro-
cesses. This leaves only the three-body graphs of type (d1), which can be
calculated from first principles in terms of known constants.

Let’s first see how this applies to processes involving only nucleons. Multinucleon
scattering amplitudes and bound-state wave functions are found by
solving an inhomogeneous Lippmann-Schwinger or homogeneous Schrödinger

equation with the effective potential taken as the sum of irreducible graphs.
The graphs for the three-nucleon terms in the effective potential are shown in
Figures (1) and (2). A cancellation (to leading order in small momenta) was
noted in reference 2 among the graphs of Figure (1), the only graphs that

involve the part of the effective Hamiltonian (3) that is non-linear in the
pion field. It is instructive to look at the reason for this cancellation. These
graphs all involve a single quadratic interaction $2F_\pi^{-2}\, \bar N\, t\cdot(\pi\times\dot\pi) N$ plus
linear interactions of the two pion fields in this interaction with the other

two nucleons. In each individual time-ordered graph, the time derivative in


the quadratic interaction makes a contribution of the order of a pion energy.
However, by summing up the old-fashioned graphs for all the time-orderings

of these three vertices, we obtain a Feynman diagram in which energy is


conserved at each vertex, so that the time-derivative yields a difference of
nucleon kinetic energies, smaller by a factor at most of order mπ /mN .
This leaves the 3-nucleon graphs of Figure (2). These are genuine contri-

butions to what we have defined as the 3-nucleon potential, but they involve
only the contact and pion-exchange nucleon-nucleon interactions, and their
effect is actually cancelled by terms in the expansion of the reducible three-
nucleon graphs of Figure (3) in powers of the ratio of nucleon to pion kinetic

energies. Again, the reason for this cancellation is not hard to find. Al-
though in Figure (2) we are not summing over all time orderings, so that
these graphs do not make up a complete Feynman diagram, the sum of all

the time-ordered graphs of Figures (2) and (3) makes up several complete
Feynman diagrams, in which energy denominators are replaced with pion
and nucleon propagators, and energy is conserved at each vertex. Since the
virtual pion energies in these Feynman diagrams are equal to differences of

nucleon kinetic energies, and hence negligible compared with the virtual pion
3-momenta ~q, the pion propagators in these diagrams are just (~q2 + m2π )−1 .
But these Feynman diagrams with such pion propagators are just what we
would get from the old-fashioned diagrams of Figure (3) if we were to neglect

nucleon kinetic energies in energy denominators for states containing a pion.


Thus we may calculate the multi-nucleon potential to second order in small
momenta by ignoring nucleon kinetic energies in the energy denominators of

the leading pion-exchange contributions to the potential, and ignoring the


three (or more) - nucleon contributions altogether. This is more or less what
nuclear physicists have always done anyway.∗
The three-body forces are more interesting in processes involving a pion.

For definiteness, consider the low-energy elastic scattering of a pion from a


nucleus of nucleon number A. General considerations of scattering theory tell

∗ I am grateful to J. Friar for pointing out that in some treatments of the nuclear three-
body problem the pion exchange forces are calculated neglecting nucleon kinetic energies
in energy denominators, and that the corrections to this approximation are of the same
order as the other corrections considered in this work.

us that the S-matrix element for this process is simply given by the matrix
element between nuclear wave functions of the sum of all irreducible graphs
with Nn = A and Nπ = 1. (In applying the effective chiral Lagrangian

to such processes we are making use of the fact that typical 3-momenta of
nucleons in nuclei are of order mπ or less.) The leading irreducible graphs
are those in which the pion scatters off a single nucleon, evaluated using
the ∆i = 0 vertices in the tree approximation.∗∗ To second order in small

momenta, the corrections to these leading terms arise from corrections to


the pion-nucleon scattering amplitude (from loop graphs and from vertices
with ∆i = 1, 2) which can be taken from phenomenological models of pion-nucleon
scattering, together with connected three-body interactions among

two nucleons and the pion, calculated from tree graphs evaluated with the
∆i = 0 vertices in Eq. (3). The graphs for these three-body interactions are
shown in Figure 4.

This is a lot to calculate, but the problem becomes much simpler if we


restrict our attention to the pion-nucleus scattering length, for which the
incoming and outgoing pion have vanishing 3-momenta. The leading terms
as well as the corrections to pion-nucleon scattering give a scattering length

that (apart from reduced-mass corrections) is just the sum of the scattering
∗∗ There are also nominally leading terms in which the incoming pion is absorbed by one
nucleon and the outgoing pion is emitted by another, but when these are summed over
different time-orderings they cancel. Again, this is because summing over time-orderings
yields a Feynman diagram in which energy is conserved, but energy cannot be conserved
in the emission or absorption of a single real pion by a single nucleon.

lengths on the individual nucleons. This leaves only the three-body irre-
ducible graphs, of which the only ones that survive in the limit of vanishing
external pion 3-momenta are those shown in Figures 4(a) to 4(f).

It is easiest to calculate the contributions of Figures 4(a)-4(c) and 4(f) by


noting that the sum over time orderings in graphs of each type [and lumping
together graph 4(f), produced by the interaction term 2Fπ−4(N (t × π)N)2
in the Hamiltonian (3), with the other graphs] must give the same result

as the complete Feynman diagrams of type 4(a) - 4(c) calculated from the
Lagrangian (4) [which does not contain the interaction 2Fπ−4 (N(t × π)N)2 .]
The other graphs, 4(d) and 4(e), are not summed over all time-orderings
(because the sum would include reducible as well as irreducible graphs) and

so their contributions must be calculated using old-fashioned perturbation


theory. These contributions to the pion-nucleon scattering length are:

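$$ a^{[4(a)]}_{ab} = \frac{m_\pi^2}{2\pi^4 F_\pi^4 (1 + m_\pi/m_d)} \sum_{r<s} \left\langle \frac{1}{\vec q_{rs}^{\,2}} \left[ 2\, t^{(r)}\cdot t^{(s)}\, \delta_{ab} - t_a^{(r)} t_b^{(s)} - t_a^{(s)} t_b^{(r)} \right] \right\rangle , \tag{5} $$

$$ a^{[4(b)]}_{ab} = -\frac{g_A^2\, \delta_{ab}}{2\pi^4 F_\pi^4 (1 + m_\pi/m_d)} \sum_{r<s} \left\langle t^{(r)}\cdot t^{(s)}\, \frac{\vec q_{rs}\cdot\vec\sigma^{(r)}\; \vec q_{rs}\cdot\vec\sigma^{(s)}}{\vec q_{rs}^{\,2} + m_\pi^2} \right\rangle , \tag{6} $$

$$ a^{[4(c)]}_{ab} = \frac{g_A^2}{2\pi^4 F_\pi^4 (1 + m_\pi/m_d)} \sum_{r<s} \left\langle \frac{ \left[ \vec q_{rs}^{\,2}\, t^{(r)}\cdot t^{(s)}\, \delta_{ab} + m_\pi^2 \left( t_a^{(r)} t_b^{(s)} + t_a^{(s)} t_b^{(r)} \right) \right] \vec q_{rs}\cdot\vec\sigma^{(r)}\; \vec q_{rs}\cdot\vec\sigma^{(s)} }{ (\vec q_{rs}^{\,2} + m_\pi^2)^2 } \right\rangle , \tag{7} $$

$$ a^{[4(d,e)]}_{ab} = \frac{g_A^2\, m_\pi}{8\pi^4 F_\pi^4 (1 + m_\pi/m_d)} \sum_{r<s} \left\langle \left( t^{(r)} + t^{(s)} \right)\cdot\left( t^{(\pi)} \right)_{ab} \frac{\vec q_{rs}\cdot\vec\sigma^{(r)}\; \vec q_{rs}\cdot\vec\sigma^{(s)}}{(\vec q_{rs}^{\,2} + m_\pi^2)^{3/2}} \right\rangle \tag{8} $$
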
where subscripts a, b are pion isovector indices; r, s label individual nucleons; $\vec q_{rs}$ is the momentum transferred between nucleons r and s in their interaction with the pion; $\vec\sigma^{(r)}$ and $t^{(r)}$ are the Pauli spin vector and isospin vector of nucleon r; and $(t_c^{(\pi)})_{ab} = -i\epsilon_{abc}$ is the pion isospin vector. Note that as a result of a partial cancellation between (6) and (7), the integrand in the sum of these averages vanishes for $\vec q_{rs} \to \infty$, which makes the result less sensitive to the behaviour of the nuclear wave function at small internucleon separation.† To second order in small momenta, the pion-nuclear scattering length is

$$ a_{ab} = \frac{1 + m_\pi/m_N}{1 + m_\pi/(A m_N)} \sum_r a_{ab}^{(r)} + a_{ab}^{[4(a)]} + a_{ab}^{[4(b)]} + a_{ab}^{[4(c)]} + a_{ab}^{[4(d,e)]} \tag{9} $$

where $a_{ab}^{(r)}$ is the pion scattering length on the r'th nucleon.
This all becomes much simpler in two special cases. One is double charge-exchange scattering, $\pi^+ + N \to \pi^- + N'$, where the scattering lengths $a_{ab}^{(r)}$ as well as the corrections (6) and (8) vanish. The other, on which we shall concentrate here, is pion scattering on an isoscalar nucleus. Here $t_a^{(r)} t_b^{(s)} + t_a^{(s)} t_b^{(r)}$ may be replaced with $\frac{2}{3}\delta_{ab}\, t^{(r)} \cdot t^{(s)}$, and Eq. (8) vanishes. More important, the contributions of the nominally leading terms in the pion-nucleon scattering lengths vanish, because they involve an expectation value of $\sum_r t^{(r)} \cdot t^{(\pi)}$, which vanishes in any isoscalar nucleus. The first term in (9) arises only from “σ-term” corrections to the pion-nucleon scattering lengths, and is therefore relatively small, making it feasible to compare calculations of the corrections considered here with experimental measurements of the pion-nuclear scattering lengths.

† This cancellation was noted by Robilotta and Wilkin⁵ in the case of pion-deuteron scattering. They used a different definition of the pion field, so their results for diagrams 4(b) and 4(c) were different from (6) and (7), but the sum of their results agrees with what would be found for this process from the sum of (6) and (7).

This may be illustrated in the paradigmatic case of pion-deuteron scattering. To evaluate the two-body terms here we need to use isotopic spin invariance to derive the pion-neutron scattering lengths from measured values of the $\pi^+ p$ and $\pi^- p$ scattering lengths. This is not entirely straightforward, because we are interested here in the relatively small corrections to the leading soft-pion results for which $a_{\pi p} + a_{\pi n} = 0$, and these corrections arise in part from “sigma terms” proportional to u and d quark masses that do not even approximately conserve isospin. Fortunately to first order in quark masses the isospin violation in the sigma terms affects only processes involving at least one neutral pion,⁶ so that isospin relations can be used to calculate $a_{\pi n}$. This gives the two-body terms in the $\pi^- d$ scattering length as⁷ $\frac{1 + m_\pi/m_N}{1 + m_\pi/m_d}\,[a_{\pi p} + a_{\pi n}] = -(0.021 \pm 0.006)\, m_\pi^{-1}$. Shifting to coordinate space, the remaining corrections are given by:

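$$ a^{[4(a)]} = -\frac{m_\pi^2}{\pi^2 F_\pi^4 (1 + m_\pi/m_d)} \int_0^\infty \frac{u^2 + w^2}{r}\, dr , \tag{10} $$

and

$$ a^{[4(b,c)]} = \frac{m_\pi^2\, g_A^2}{3\pi^2 F_\pi^4 (1 + m_\pi/m_d)} \left[ \frac{1}{4} \int_0^\infty (u^2 + w^2) \left( \frac{1}{r} - \frac{m_\pi}{2} \right) e^{-m_\pi r}\, dr - \int_0^\infty \left( \frac{u w}{\sqrt{2}} - \frac{w^2}{4} \right) \left( \frac{1}{r} + m_\pi \right) e^{-m_\pi r}\, dr \right] \tag{11} $$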
where u and w are the s-wave and d-wave parts of the deuteron wave function, normalized so that

$$ \int_0^\infty (u^2 + w^2)\, dr = 1 \tag{12} $$

The rescattering term (10) [but not (11)] has been previously considered in the books of Eisenberg and Koltun and Ericson and Weise.⁷ Because of the anomalously large radius of the deuteron, this term is considerably larger than the remaining three-body term (11), so it should be calculated including first-order corrections to the pion-nucleon scattering vertices in Figure 4(a). Fortunately these corrections can be taken from the measured values of the scattering lengths.⁸ In this way one finds that⁷ $a^{[4(a)]} = -(0.026 \pm 0.001)\, m_\pi^{-1}$. The remaining three-body terms (11) are calculated‡ to be $a^{[4(b,c)]} = -0.0005\, m_\pi^{-1}$ (mostly arising from the interference between s-wave and d-wave parts of the wave function), in agreement with the numerical result quoted in reference 5. This is small compared with the uncertainties in other terms, and so may be neglected here, though this may not be the case for pion scattering on heavier nuclei. This justifies the final theoretical result of reference 7, $a_{\pi d} = -(0.050 \pm 0.006)\, m_\pi^{-1}$, which is in good agreement with the experimental value $-(0.056 \pm 0.009)\, m_\pi^{-1}$. Although the use of chiral effective Lagrangians has turned out here only to confirm previous calculations of pion-deuteron scattering as well as nuclear binding, the systematic counting of momentum factors in chiral perturbation theory has proved its value in explaining (as previous calculations did not explain) just why it is correct to consider only certain graphs and certain terms in the effective Lagrangian.

‡ The calculation of the integrals in Eqs. (10) and (11) was carried out by R. C. Mastroleo and U. van Kolck, using the Bonn wave function for the deuteron.
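As a rough consistency sketch of the numbers quoted above, the three contributions to the pion-deuteron scattering length can simply be added. The quadrature combination of uncertainties and the variable names are assumptions made for illustration; the text does not say how the final value of reference 7 was assembled. The direct sum is about -0.0475 in units of 1/m_π, close to but not identical with the quoted -0.050.

```python
import math

# Contributions to the pi-d scattering length in units of 1/m_pi,
# as quoted in the text: (central value, quoted uncertainty).
two_body = (-0.021, 0.006)       # mass-corrected [a_pi_p + a_pi_n] term
rescattering = (-0.026, 0.001)   # graph 4(a), Eq. (10)
three_body_bc = (-0.0005, 0.0)   # graphs 4(b,c), Eq. (11); no error is quoted

terms = [two_body, rescattering, three_body_bc]
total = sum(value for value, _ in terms)

# Assumption: the uncertainties are independent and add in quadrature.
sigma = math.sqrt(sum(err ** 2 for _, err in terms))

# Experimental value quoted in the text.
experiment = (-0.056, 0.009)
pull = (total - experiment[0]) / math.hypot(sigma, experiment[1])

print(f"sum of quoted terms: {total:.4f} +/- {sigma:.4f} (1/m_pi)")
print(f"distance from experiment: {pull:.2f} combined standard deviations")
```

Under these assumptions the sum lies within one combined standard deviation of the experimental value.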
I am grateful for discussions with C. Dove, J. Friar, A. Gleeson, C.

Ordoñez, U. van Kolck, and J. D. Walecka.

References

1. S. Weinberg, Physics Letters B 251 (1990) 288.

2. S. Weinberg, Nuclear Physics B 363 (1991) 3.

3. S. Weinberg, Phys. Rev. Lett. 18 (1967) 188; Phys. Rev. 166 (1968) 1568.

4. C. Ordoñez and U. van Kolck, Texas preprint UTTG-01-92, to be published in Physics Letters B 291 (1992).

5. M. R. Robilotta and C. Wilkin, J. Phys. G: Nucl. Phys. 4 (1978) L115. Also see H. McManus and D. O. Riska, Phys. Lett. 92B (1990) 29.

6. S. Weinberg, in A Festschrift for I. I. Rabi, Transactions of the N. Y. Academy of Sciences 38 (1977) 185.

7. J. M. Eisenberg and D. S. Koltun, Theory of Meson Interactions with Nuclei (Wiley-Interscience, New York, 1980); T. Ericson and W. Weise, Pions and Nuclei (Oxford University Press, Oxford, 1988).

8. V. M. Kolybasov and A. E. Kudryavtsev, Zh. Eksper. Teor. Fiz. (USSR) 63 (1972) 35; Sov. Phys. JETP 36 (1973) 18.

UTTG-22-92
LBL 33016
UCB 92/36
arXiv:hep-ph/9303241v1 11 Mar 1993

Flavor Changing Scalar Interactions

Lawrence Hall∗
Department of Physics, University of California
Berkeley, CA, 94720

Steven Weinberg∗∗
Department of Physics, University of Texas
Austin, TX, 78712

Abstract
The smallness of fermion masses and mixing angles has recently been attributed to approximate global U(1) symmetries, one for each fermion type. The parameters associated with these symmetry breakings are estimated here directly from observed masses and mixing angles. It turns out that although flavor changing reaction rates may be acceptably small in electroweak theories with several scalar doublets without imposing any special symmetries on the scalars themselves, such theories generically yield too much CP violation in the neutral kaon mass matrix. Hence in these theories CP must also be a good approximate symmetry. Such models provide an alternative mechanism for CP violation and have various interesting phenomenological features.

∗ Research supported in part by NSF grant PHY90-21139 and DOE contract DE-AC03-76SF00098.
∗∗ Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY 9009850.
The inclusion of multiple scalar doublets at the weak scale in the standard SU(2) ⊗ U(1) electroweak theory entails the risk of flavor-changing neutral current processes with rates in excess of experimental bounds. To avoid this, most studies of such models have adopted the proposal¹ of a global symmetry that allows only one scalar doublet to couple to the right-handed quarks of each charge. Recently the need for such a symmetry has been challenged in an article² that attributes the various small ratios among quark mixing angles and quark and lepton masses to a set of approximate global U(1) symmetries that act only on the quarks, but not on the scalars.³ Specifically, it is assumed that every appearance of a fermion of the i'th generation in a Yukawa interaction of quarks or leptons with any scalar doublet $\phi_n$ is accompanied with a small dimensionless factor: $\epsilon_{Q_i}$ for left-handed quark doublets; $\epsilon_{U_i}$ or $\epsilon_{D_i}$ for right-handed quarks; $\epsilon_{L_i}$ for left-handed lepton doublets; and $\epsilon_{E_i}$ for right-handed charged leptons. That is, writing the general Yukawa interaction in the form

$$ \mathcal{L}_Y = -\lambda^U_{ijn}\, \bar Q_{Li} U_{Rj} \cdot \tilde\phi_n - \lambda^D_{ijn}\, \bar Q_{Li} D_{Rj} \cdot \phi_n - \lambda^E_{ijn}\, \bar L_{Li} E_{Rj} \cdot \phi_n + \text{H.c.}, \tag{1} $$

$$ Q_{Lj} \equiv \begin{bmatrix} U_{Lj} \\ D_{Lj} \end{bmatrix}, \qquad \phi_n \equiv \begin{bmatrix} \phi_n^+ \\ \phi_n^0 \end{bmatrix}, \qquad \tilde\phi_n \equiv \begin{bmatrix} \phi_n^{0*} \\ -\phi_n^{+*} \end{bmatrix}, $$

the Yukawa couplings are assumed to be of order

$$ |\lambda^U_{ijn}| \approx \epsilon_{Q_i} \epsilon_{U_j}, \qquad |\lambda^D_{ijn}| \approx \epsilon_{Q_i} \epsilon_{D_j}, \qquad |\lambda^E_{ijn}| \approx \epsilon_{L_i} \epsilon_{E_j} \tag{2} $$

for all n. (Here and below, “≈” will be understood to indicate equality within a factor of order two or three.) Though there is no compelling theoretical
justification for this assumption, it may be taken as representative of any
theory of fermion-scalar couplings that attributes the small fermion masses
and mixing angles to symmetries that act on the fermions rather than the

scalars. With the aid of an additional somewhat ad hoc ansatz relating the
ǫ’s, it was shown in reference 2 that the rates of flavor-changing neutral
current processes can be kept within experimental bounds without invoking
any symmetry that restricts which scalars can interact with which quarks.

We shall recover the same result here without using this ansatz. But as we shall see, there is a further problem with such multi-scalar models: they do not necessarily yield small violations of CP conservation in the neutral kaon system.

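To analyze this problem, the generations will be ordered so that, for i < j,

$$ \epsilon_{Q_i} \le \epsilon_{Q_j}, \quad \epsilon_{U_i} \le \epsilon_{U_j}, \quad \epsilon_{D_i} \le \epsilon_{D_j}, \quad \epsilon_{L_i} \le \epsilon_{L_j}, \quad \epsilon_{E_i} \le \epsilon_{E_j} . \tag{3} $$

The mass matrices arising from (1) may then be put into a real diagonal form by subjecting the fermions to transformations:

$$ \begin{aligned} U_{Li} &\to V^{UL}_{ij} U_{Lj}, & D_{Li} &\to V^{DL}_{ij} D_{Lj}, \\ U_{Ri} &\to V^{UR}_{ij} U_{Rj}, & D_{Ri} &\to V^{DR}_{ij} D_{Rj}, \\ E_{Li} &\to V^{EL}_{ij} E_{Lj}, & E_{Ri} &\to V^{ER}_{ij} E_{Rj}, \end{aligned} \tag{4} $$

with unitary matrices $V^{UL}_{ij}$, etc., having elements

$$ V^{UL}_{ij} \approx \begin{cases} \epsilon_{Q_i}/\epsilon_{Q_j} & i \le j \\ \epsilon_{Q_j}/\epsilon_{Q_i} & j \le i \end{cases} , \tag{5} $$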
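and likewise for $V^{DL}_{ij}$, $V^{UR}_{ij}$, $V^{DR}_{ij}$, $V^{EL}_{ij}$, and $V^{ER}_{ij}$. This transformation yields quark and lepton masses of order

$$ m_{U_i} \approx \epsilon_{Q_i} \epsilon_{U_i} \langle\phi\rangle, \qquad m_{D_i} \approx \epsilon_{Q_i} \epsilon_{D_i} \langle\phi\rangle, \qquad m_{E_i} \approx \epsilon_{L_i} \epsilon_{E_i} \langle\phi\rangle \tag{6} $$

and a Cabibbo-Kobayashi-Maskawa (CKM) matrix of the form

$$ V_{ij} \equiv [V^{UL\dagger} V^{DL}]_{ij} \approx \begin{cases} \epsilon_{Q_i}/\epsilon_{Q_j} & i \le j \\ \epsilon_{Q_j}/\epsilon_{Q_i} & j \le i \end{cases} , \tag{7} $$

where $\langle\phi\rangle$ is the common order of magnitude of the conventionally normalized complex neutral scalars, of order $247\ \text{GeV}/\sqrt{2} = 175$ GeV.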
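Now we will use experimental data to estimate the ε's. First, the ratios of the $\epsilon_{Q_i}$ are directly given by Eq. (5) in terms of the mixing angles. The ratio $\epsilon_{Q_1}/\epsilon_{Q_2}$ may be determined either from the Cabibbo angle

$$ \epsilon_{Q_1}/\epsilon_{Q_2} \approx V_{us} = 0.218 \text{ to } 0.224 $$

or less accurately from semi-leptonic B meson decays⁴

$$ \epsilon_{Q_1}/\epsilon_{Q_2} \approx \frac{V_{ub}}{V_{cb}} \simeq 0.07 . $$

Given the theoretical uncertainties in extracting the ratio $V_{ub}/V_{cb}$, we regard these two estimates as being satisfactorily consistent, and we take $\epsilon_{Q_1}/\epsilon_{Q_2} = 0.2$. The second ratio of $\epsilon_{Q_i}$ is determined from

$$ \epsilon_{Q_2}/\epsilon_{Q_3} \approx V_{cb} = 0.032 \text{ to } 0.054 . $$

Hence we take

$$ \epsilon_{Q_1}/\epsilon_{Q_2} \approx 0.2, \qquad \epsilon_{Q_2}/\epsilon_{Q_3} \approx 0.04, \qquad \epsilon_{Q_1}/\epsilon_{Q_3} \approx 0.008 . \tag{8} $$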
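Using (8), (6), and the “experimental” values of the quark masses⁵, we have then also

$$ \epsilon_{U_1} \approx 0.004/\epsilon_{Q_3}, \qquad \epsilon_{U_2} \approx 0.2/\epsilon_{Q_3} \tag{9} $$

$$ \epsilon_{D_1} \approx 0.006/\epsilon_{Q_3}, \qquad \epsilon_{D_2} \approx 0.025/\epsilon_{Q_3}, \qquad \epsilon_{D_3} \approx 0.03/\epsilon_{Q_3} . \tag{10} $$

The Yukawa couplings in Eq. (1) can then be estimated from Eq. (2), with the unknown $\epsilon_{Q_3}$ cancelling in all couplings.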
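The arithmetic behind Eqs. (9) and (10) can be sketched as follows. Eq. (6) gives $\epsilon_{X_i}\epsilon_{Q_3} = (m_i/\langle\phi\rangle)/(\epsilon_{Q_i}/\epsilon_{Q_3})$ for a right-handed quark X, and the inputs are the masses of reference 5 and the ratios of Eq. (8). The function and dictionary names are illustrative assumptions, not notation from the paper.

```python
# Quark masses at a renormalization scale of 1 GeV, in GeV (reference 5).
MASS = {"u": 5.5e-3, "d": 9e-3, "s": 0.18, "c": 1.4, "b": 6.0}
VEV = 175.0  # <phi> in GeV, i.e. 247 GeV / sqrt(2)

# Eq. (8): eps_Qi / eps_Q3 for generations i = 1, 2, 3.
Q_RATIO = {1: 0.008, 2: 0.04, 3: 1.0}


def eps_times_q3(quark: str, generation: int) -> float:
    """Return eps_X * eps_Q3 for the right-handed quark X.

    From Eq. (6): m = eps_Qi * eps_Xi * <phi>, so
    eps_Xi * eps_Q3 = (m / <phi>) / (eps_Qi / eps_Q3).
    """
    return MASS[quark] / VEV / Q_RATIO[generation]


estimates = {
    "U1": eps_times_q3("u", 1),
    "U2": eps_times_q3("c", 2),
    "D1": eps_times_q3("d", 1),
    "D2": eps_times_q3("s", 2),
    "D3": eps_times_q3("b", 3),
}

for name, value in estimates.items():
    print(f"eps_{name} * eps_Q3 ~ {value:.4f}")
```

Within the factor-of-two-or-three accuracy that “≈” denotes here, these reproduce the coefficients 0.004, 0.2, 0.006, 0.025 and 0.03.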


Though it is not needed in estimating the Yukawa couplings, we can also estimate the factor $\epsilon_{Q_3}$, which is needed to determine the individual suppression factors. The top quark mass cannot be much less than $\langle\phi\rangle \simeq 175$ GeV, so if either of the quantities $\epsilon_{Q_3}$ and $\epsilon_{U_3}$ were much smaller than the other, then the larger would have to be much larger than unity, contrary to our assumption that the ε's are suppression factors. Thus Eq. (6) indicates that $\epsilon_{Q_3} \approx \epsilon_{U_3} \approx \sqrt{m_t/\langle\phi\rangle}$. But this actually applies to the Yukawa couplings defined at a renormalization scale of $m_t$, while we choose to quote the couplings defined at a renormalization scale of 1 GeV, which are larger by a factor Z ≈ 2. We therefore estimate

$$ \epsilon_{Q_3} \approx \sqrt{Z m_t/\langle\phi\rangle} \tag{11} $$

it being understood from now on that all ε's are defined at a renormalization scale of order 1 GeV.
With no measurable mixing angles for leptons, we cannot determine separate values for the leptonic suppression factors $\epsilon_{E_i}$ and $\epsilon_{L_i}$. However the most stringent limits on scalar interactions were found in reference 2 to be set by the non-leptonic $K^0 \leftrightarrow \bar K^0$ and $B^0 \leftrightarrow \bar B^0$ transitions, to which we now turn. (The transitions $D^0 \leftrightarrow \bar D^0$ and $B_s^0 \leftrightarrow \bar B_s^0$ will be considered later.) Exchange of neutral scalars produces two different kinds of parity-conserving ΔS = 2 four-quark operators that can induce the transition $K^0 \leftrightarrow \bar K^0$:

$$ \mathcal{L}_{\Delta S=2} = 2G\, (\bar s_R d_L)(\bar s_L d_R) + G' \left[ (\bar s_L d_R)^2 + (\bar s_R d_L)^2 \right] \tag{12} $$

with coupling constants

$$ G = \sum_{nmN} \lambda^{D*}_{12n} \lambda^{D}_{21m} A_{nN} A^*_{mN} / m_N^2 \tag{13} $$

$$ G' = \frac{1}{2} \sum_{nmN} \left[ \lambda^{D}_{21n} \lambda^{D}_{21m} A_{nN} A_{mN} + \lambda^{D*}_{12n} \lambda^{D*}_{12m} A^*_{nN} A^*_{mN} \right] / m_N^2 \tag{14} $$

where

$$ \langle 0 | \phi_n^0(0) | N \rangle \equiv \frac{A_{nN}}{(2\pi)^{3/2} \sqrt{2\omega_N}} \tag{15} $$

and the sum over N runs over neutral Higgs scalar mass eigenstates. For an order-of-magnitude estimate of the $K_1^0 - K_2^0$ mass difference produced by this interaction, we will fall back on the vacuum insertion approximation (which is justified in quantum chromodynamics in the limit of a large number of colors):

$$ \langle \bar K^0 | O_1 O_2 | K^0 \rangle \approx \langle \bar K^0 | O_1 | 0 \rangle \langle 0 | O_2 | K^0 \rangle + \langle \bar K^0 | O_2 | 0 \rangle \langle 0 | O_1 | K^0 \rangle \tag{16} $$

where each of $O_1$ and $O_2$ is either $(\bar s_L d_R)$ or $(\bar s_R d_L)$. The one-particle matrix elements of these operators can be calculated from the known matrix elements of the corresponding axial-vector current:

$$ \langle 0 | (\bar s_R d_L) | K^0 \rangle = -\langle 0 | (\bar s_L d_R) | K^0 \rangle = \frac{m_K^2 F_K}{(2\pi)^{3/2} \sqrt{2 m_K}\; 2\sqrt{2}\, m_s} \tag{17} $$

where $F_K \simeq 230$ MeV is the kaon decay amplitude (as compared with $F_\pi \simeq 190$ MeV). This gives a $K_1^0 - K_2^0$ mass difference

$$ \Delta M_K \approx \frac{(G - G')\, m_K^3 F_K^2}{4 m_s^2} . \tag{18} $$

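The flavor-changing suppression factors in G and G′ turn out to be about the same

$$ \epsilon_{Q_1} \epsilon_{D_2} \epsilon_{Q_2} \epsilon_{D_1} \approx \frac{1}{2} \left[ \epsilon_{Q_2}^2 \epsilon_{D_1}^2 + \epsilon_{Q_1}^2 \epsilon_{D_2}^2 \right] \approx 5 \times 10^{-8} . \tag{19} $$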
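A quick numerical check of Eq. (19), using only the rounded values of Eqs. (8) and (10), is sketched below. It works in units where $\epsilon_{Q_3} = 1$, which is harmless because $\epsilon_{Q_3}$ cancels in both products; the variable names are illustrative.

```python
# Inputs in units where eps_Q3 = 1 (it cancels in these products).
q1, q2 = 0.008, 0.04   # eps_Q1/eps_Q3 and eps_Q2/eps_Q3, Eq. (8)
d1, d2 = 0.006, 0.025  # eps_D1*eps_Q3 and eps_D2*eps_Q3, Eq. (10)

# Suppression factor multiplying G: eps_Q1 eps_D2 eps_Q2 eps_D1.
mixed = q1 * d2 * q2 * d1

# Suppression factor multiplying G': (1/2)[eps_Q2^2 eps_D1^2 + eps_Q1^2 eps_D2^2].
same = 0.5 * ((q2 * d1) ** 2 + (q1 * d2) ** 2)

print(f"G  factor: {mixed:.2e}")
print(f"G' factor: {same:.2e}")
```

Both come out just under 5 × 10⁻⁸, which is why the two operators are treated on the same footing in what follows.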
The $A_{nN}$ are of order unity, so barring unexpected cancellations, we expect that

$$ G - G' \approx 5 \times 10^{-8}\, e^{i\delta} / m_H^2 \tag{20} $$

where δ is an unknown phase, and $m_H$ is a weighted average of neutral scalar masses. Using this in (18) [with $m_s \simeq 180$ MeV] then yields

$$ |\Delta M_K| \approx \frac{5 \times 10^{-8}\, m_K^3 F_K^2}{4 m_s^2 m_H^2} \approx \frac{3 \times 10^{-5}\ \text{eV}}{(m_H / 300\ \text{GeV})^2} . \tag{21} $$

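The analysis we use to estimate $\Delta M_B$ parallels that used in Eqs. (12) to (21) for $\Delta M_K$. The relevant coupling suppression factors are now

$$ \begin{array}{ll} (\bar b_L d_R)(\bar b_R d_L): & \epsilon_{D_3} \epsilon_{Q_1} \epsilon_{D_1} \epsilon_{Q_3} \approx 10^{-6} \\[4pt] \frac{1}{2} \left[ (\bar b_L d_R)^2 + (\bar b_R d_L)^2 \right]: & \frac{1}{2} \left[ \epsilon_{D_3}^2 \epsilon_{Q_1}^2 + \epsilon_{D_1}^2 \epsilon_{Q_3}^2 \right] \approx 2 \times 10^{-5} \end{array} \tag{22} $$

(Note that the second suppression factor is an order of magnitude larger than the naive estimate $m_d m_b / \langle\phi\rangle^2$.) There have been many theoretical estimates of $F_B$, summarized by Buras and Harlander⁶. As a rough consensus value, we shall take $F_B \approx 230$ MeV. Following the same reasoning as for $\Delta M_K$, we have then

$$ |\Delta M_B| \approx \frac{2 \times 10^{-5}\, m_B^3 F_B^2}{4 m_b^2 m_H^2} \approx \frac{10^{-2}\ \text{eV}}{(m_H / 300\ \text{GeV})^2} . \tag{23} $$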
There are also the more familiar box diagrams involving WW exchange. Assuming no accidental cancellations between these contributions, it seems reasonable to require that the scalar exchange contributions should not exceed twice the experimental values, $|\Delta M_K| = 3.5 \times 10^{-6}$ eV and $|\Delta M_B| = (3.6 \pm 0.7) \times 10^{-4}$ eV. This yields the conditions $m_H > 600$ GeV and $m_H > 1$ TeV, for K and B, respectively.
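The estimates (21) and (23) and the resulting mass bounds can be reproduced with a few lines, given here as a sketch rather than the authors' own calculation. The meson masses (0.498 GeV and 5.28 GeV) are not quoted in the text and are assumed; the suppression factors, decay constants and quark masses are the rounded values given above, and the function names are illustrative.

```python
import math

GEV_TO_EV = 1e9


def delta_m_ev(suppression, m_meson, f_meson, m_quark, m_higgs):
    """Eqs. (21)/(23): suppression * m^3 F^2 / (4 m_q^2 m_H^2).

    All inputs are in GeV; the result is converted to eV.
    """
    value_gev = suppression * m_meson**3 * f_meson**2 / (4 * m_quark**2 * m_higgs**2)
    return value_gev * GEV_TO_EV


def higgs_mass_bound(dm_at_300_gev, dm_measured, factor=2.0):
    """Smallest m_H (GeV) keeping the scalar term below factor * measured value."""
    return 300.0 * math.sqrt(dm_at_300_gev / (factor * dm_measured))


M_K, M_B = 0.498, 5.28  # meson masses in GeV (assumed, not from the text)

dm_k = delta_m_ev(5e-8, M_K, 0.230, 0.18, 300.0)
dm_b = delta_m_ev(2e-5, M_B, 0.230, 6.0, 300.0)

bound_k = higgs_mass_bound(dm_k, 3.5e-6)
bound_b = higgs_mass_bound(dm_b, 3.6e-4)

print(f"|dM_K| at m_H = 300 GeV: {dm_k:.1e} eV, bound m_H > {bound_k:.0f} GeV")
print(f"|dM_B| at m_H = 300 GeV: {dm_b:.1e} eV, bound m_H > {bound_b:.0f} GeV")
```

With these inputs the bounds come out near 600 GeV for K and somewhat above 1 TeV for B, consistent with the order-of-magnitude conditions stated above.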


These Higgs masses are somewhat larger than seems plausible, but our analysis involves a number of dubious approximations, and it is entirely possible that we have overestimated the matrix elements for $K^0 \leftrightarrow \bar K^0$ and $B^0 \leftrightarrow \bar B^0$ transitions by factors of two or three. We conclude then that, as found in reference 2, the selection rule of reference 1 is not indispensable in keeping the scalar exchange contribution to the $K_1^0 - K_2^0$ and $B_1^0 - B_2^0$ mass differences at a reasonable size. But without this selection rule the Higgs scalars must be relatively heavy, and even so scalar exchange would be likely to make a large and perhaps dominant contribution to the $K_1^0 - K_2^0$ and $B_1^0 - B_2^0$ mass differences.
Our conclusions shift when we consider the CP-violating part of the $K_1^0 - K_2^0$ mass difference. This is usually expressed in terms of a parameter ε with $|\epsilon| \simeq 2.26 \times 10^{-3}$, which (for $|\epsilon'| \ll |\epsilon|$) is given by $\epsilon = \text{Im}(\Delta M_K) / \sqrt{2}\, |\Delta M_K|$. If scalar exchange does indeed make a major contribution to the $K_1^0 - K_2^0$ mass difference, then the phase δ in Eq. (20) would have to be quite small, of the order of a milliradian or less, in contradiction with the general expectation that all phases are of order unity. This leaves us with an interesting choice of alternatives:

• The CP-violating phases are indeed generically of order unity, but scalar exchange contributions to the $K_1^0 - K_2^0$ mass difference are much smaller than we have estimated, perhaps because of accidental cancellations in the calculation of the scalar-exchange contribution to the four-quark operator, or a gross failure of the vacuum insertion approximation used in calculating the $K^0 - \bar K^0$ matrix element, or both. This seems implausible unless the scalars are very heavy.

• The CP-violating phases are generically of order unity, but the scalar couplings are constrained by the selection rule of reference 1. This is of course automatic with just one scalar doublet, or in supersymmetric theories with just two scalar doublets. (However it is not at all automatic in supersymmetric theories with more than two scalar doublets. In particular, if several scalar doublets couple to the right-handed quarks of charge −1/3, and if as usually assumed these scalars have smaller vacuum expectation values than the doublets that couple to the right-handed quarks of charge 2/3, then the Yukawa couplings of these scalars would be correspondingly larger, leading to an even larger $K_1^0 - K_2^0$ mass difference.)

• All of the estimates in this paper are valid, but CP is a good approximate symmetry, with all the CP-violating phases like δ of order $10^{-3}$.

The third alternative is admittedly a somewhat reactionary view of CP nonconservation. After the discovery of the process $K_2^0 \to \pi + \pi$ in 1964 it was widely assumed that this process is much slower than $K_1^0 \to \pi + \pi$ because CP is a good approximate symmetry for the weak interactions. Then following the discovery of a third generation of quarks and leptons in the 1970s, physicists became attracted to the idea that CP-violating phases are typically of order unity, and that CP only seems to be a good approximate symmetry because the third generation is weakly mixed with the first two. However, since we know that in any case we have to deal with quark masses and mixing angles that for mysterious reasons are very small, there is nothing absurd in supposing that CP-violating angles are also small. Indeed, apart from any consideration of scalar exchange effects, we may be driven to this assumption if theories with supersymmetry broken at the electroweak scale prove successful. Such theories have CP-violating phases in the supersymmetry breaking interactions that generically lead to a neutron electric dipole moment three orders of magnitude larger than the present experimental limit⁷. This major problem of supersymmetric models is avoided if we assume that CP-violating phases are generically of order $10^{-3}$. In the balance of this paper we discuss the experimental consequences of this picture of CP violation, combined (where relevant) with our earlier assumptions regarding scalar couplings.
(1) Direct CP violating effects in the decays of K mesons will be unobservably small. The CKM contribution to $|\epsilon'/\epsilon|$ will be of order $10^{-6}$, and the contribution from tree level exchange of scalar mesons will be even smaller. Hence these theories predict that the next round of experiments at CERN and Fermilab will not find a signal for $|\epsilon'/\epsilon|$ at the projected level of sensitivity of $10^{-4}$. Such a null result would be extremely exciting since it would imply that the CKM matrix could not be the origin of the known CP violation (unless the top quark mass is found to take a value allowing a precise cancellation between two contributions to $\epsilon'/\epsilon$), thus implying an alternative source of CP violation, such as scalar exchange.

(2) All CP violating asymmetries which arise in particle decays must be of order $10^{-3}$ or less, since these asymmetries must be proportional to a CP violating phase. In particular CP violating effects in B meson decays will be too small to be observed in any experiment proposed to date. For example the angles α, β and γ of the unitarity triangle of the CKM matrix will be of order $10^{-3}$ and will be far too small to be observed at proposed B factories. Nevertheless such B factories could definitively exclude the CKM origin of CP violation⁸.
(3) The most promising new positive signature of CP violation in our scheme is the neutron electric dipole moment. The electric dipole moment of the up quark arises from a one loop diagram with a virtual top quark and Higgs meson, and using the results of Eqs. (8), (9) and (10) we estimate the resulting neutron electric dipole moment to be of order $10^{-26}$ e cm, close to the current experimental limit. In supersymmetric theories a comparable contribution would be expected from diagrams with internal superpartners. The electron electric dipole moment is expected to be of order $10^{-31}$ e cm.

(4) The predictions for the branching ratios for many rare K meson decays are not the same in our scheme as in the standard model. The most drastic change is for the $K_2^0 \to \pi \nu \bar\nu$ amplitude, which is proportional to the CKM CP violating phase and therefore gets suppressed by two to three orders of magnitude. There is no tree level Higgs exchange contribution to this decay because the Higgs mesons do not couple to neutrinos.


(5) It is striking that for Higgs bosons with a typical mass of about 700 GeV and with couplings to quarks determined by Eqs. (8), (9) and (10), the tree level scalar exchange contribution to neutral K and B meson mass mixing turned out to be at about the level observed by experiment. Although this means that little can be learned about the CKM matrix from $\Delta M_K$ and $\Delta M_B$, the case of $D - \bar D$ presents different opportunities. The analysis we use to estimate $\Delta M_D$ parallels that used in Eqs. (12) to (21) for $\Delta M_K$ and $\Delta M_B$. The relevant coupling suppression factors are now

$$ \begin{array}{ll} (\bar c_L u_R)(\bar c_R u_L): & \epsilon_{U_1} \epsilon_{Q_2} \epsilon_{U_2} \epsilon_{Q_1} \approx 3 \times 10^{-7} \\[4pt] \frac{1}{2} \left[ (\bar c_L u_R)^2 + (\bar c_R u_L)^2 \right]: & \frac{1}{2} \left[ \epsilon_{U_1}^2 \epsilon_{Q_2}^2 + \epsilon_{U_2}^2 \epsilon_{Q_1}^2 \right] \approx 1 \times 10^{-6} \end{array} \tag{24} $$

A theoretical estimate of $F_D$ may be obtained from the previously quoted estimate $F_B \simeq 230$ MeV, using the relation (valid in the limit of large quark masses) $F_D / F_B \simeq \sqrt{m_b / m_c}$. This gives $F_D \approx 470$ MeV, so that

$$ |\Delta M_D| \approx \frac{10^{-6}\, m_D^3 F_D^2}{4 m_c^2 m_H^2} \approx \frac{2 \times 10^{-3}\ \text{eV}}{(m_H / 300\ \text{GeV})^2} . \tag{25} $$

If we take the typical Higgs mass as near 1 TeV to account for the observed values of $|\Delta M_K|$ and $|\Delta M_B|$, then the predicted value of $|\Delta M_D|$ is close to the current experimental limit, $|\Delta M_D| < 1.3 \times 10^{-4}$ eV. In the standard model $\Delta M_D$ is dominated by long distance contributions, which were originally estimated⁹ to be in the range $(0.3 \text{ to } 0.01) \times 10^{-4}$ eV, very much larger than the order $10^{-8}$ eV contribution from the short distance standard model box diagram. In this case, a positive observation of mass mixing at the level of $10^{-4}$ eV would not necessarily require new physics beyond the standard model. However a recent study¹⁰ using heavy quark effective field theory and naive dimensional analysis suggests that the long distance standard model contribution to $\Delta M_D$ is in fact only modestly (about an order of magnitude) larger than the short distance contribution. Furthermore, a subsequent calculation¹¹, which includes leading order QCD corrections, supports this low value of $\Delta M_D$ in the standard model. On this basis, we can conclude that a positive signal of neutral D meson mixing at the next round of searches at Fermilab, CESR and a tau/charm factory would provide evidence in favor of our scheme.
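The numbers in item (5) can be checked with the short sketch below. The D meson mass (1.865 GeV) is not quoted in the text and is assumed; the remaining inputs are the rounded values used above, and the names are illustrative.

```python
import math

F_B = 0.230              # GeV, the rough consensus value adopted in the text
M_B_QUARK = 6.0          # GeV, reference 5
M_C_QUARK = 1.4          # GeV, reference 5
M_D_MESON = 1.865        # GeV (assumed, not from the text)
SUPPRESSION = 1e-6       # second line of Eq. (24)

# Heavy-quark scaling relation F_D / F_B ~ sqrt(m_b / m_c).
f_d = F_B * math.sqrt(M_B_QUARK / M_C_QUARK)


def delta_m_d_ev(m_higgs_gev: float) -> float:
    """Eq. (25): scalar-exchange estimate of |dM_D| in eV."""
    value_gev = SUPPRESSION * M_D_MESON**3 * f_d**2 / (4 * M_C_QUARK**2 * m_higgs_gev**2)
    return value_gev * 1e9


dm_d_300 = delta_m_d_ev(300.0)
dm_d_1000 = delta_m_d_ev(1000.0)

print(f"F_D ~ {1000 * f_d:.0f} MeV")
print(f"|dM_D| ~ {dm_d_300:.1e} eV at m_H = 300 GeV")
print(f"|dM_D| ~ {dm_d_1000:.1e} eV at m_H = 1 TeV (limit quoted: 1.3e-4 eV)")
```

For a 1 TeV scalar the estimate is of the same order as the quoted experimental limit, which is the sense in which the prediction is "close to" it.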

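(6) For $B_s^0 \leftrightarrow \bar B_s^0$ transitions (mixing of strange neutral beauty mesons), the relevant suppression factors are

$$ \begin{array}{ll} (\bar b_L s_R)(\bar b_R s_L): & \epsilon_{D_3} \epsilon_{Q_2} \epsilon_{D_2} \epsilon_{Q_3} \approx 3 \times 10^{-5} \\[4pt] \frac{1}{2} \left[ (\bar b_L s_R)^2 + (\bar b_R s_L)^2 \right]: & \frac{1}{2} \left[ \epsilon_{D_3}^2 \epsilon_{Q_2}^2 + \epsilon_{D_2}^2 \epsilon_{Q_3}^2 \right] \approx 3 \times 10^{-4} . \end{array} \tag{26} $$

Assuming that the experimental value of $\Delta M_B$ is dominated by scalar exchange, the scalar-mediated contribution to $B_s$ mixing is predicted to be of order

$$ (\Delta M_{B_s})_{\text{scalar}} \approx \Delta M_B \left( \frac{\epsilon_{D_3}^2 \epsilon_{Q_2}^2 + \epsilon_{D_2}^2 \epsilon_{Q_3}^2}{\epsilon_{D_3}^2 \epsilon_{Q_1}^2 + \epsilon_{D_1}^2 \epsilon_{Q_3}^2} \right) \approx 5 \times 10^{-3}\ \text{eV} . \tag{27} $$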
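Eqs. (26) and (27) follow from the same rounded inputs, as the sketch below shows. It again works in units where $\epsilon_{Q_3} = 1$ and takes the central experimental value of $\Delta M_B$; the variable names are illustrative.

```python
# Inputs in units where eps_Q3 = 1 (it cancels in every product below).
q1, q2 = 0.008, 0.04            # Eq. (8)
d1, d2, d3 = 0.006, 0.025, 0.03  # Eq. (10)

# Eq. (26): B_s suppression factors.
bs_mixed = d3 * q2 * d2 * 1.0
bs_same = 0.5 * ((d3 * q2) ** 2 + (d2 * 1.0) ** 2)

# Second line of Eq. (22): the corresponding B_d factor.
bd_same = 0.5 * ((d3 * q1) ** 2 + (d1 * 1.0) ** 2)

# Eq. (27): scale the measured B_d mass difference by the ratio of factors.
DELTA_M_B_EV = 3.6e-4  # central experimental value quoted in the text
dm_bs = DELTA_M_B_EV * bs_same / bd_same

print(f"B_s factors: {bs_mixed:.1e} and {bs_same:.1e}")
print(f"(dM_Bs)_scalar ~ {dm_bs:.1e} eV")
```

The result is about 6 × 10⁻³ eV, which agrees with Eq. (27) at the factor-of-two level of accuracy used throughout.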

(7) In theories with only one scalar doublet coupling to quarks of a given charge,¹ the positively charged scalars decay predominantly to $c\bar s$ and $\nu_\tau \bar\tau$, when the $t\bar b$ mode is kinematically forbidden. In the present class of theories the decay to $c\bar b$ completely dominates because the relevant products of $\epsilon_i$ are more than an order of magnitude larger for this mode than any other.

(8) Finally we consider exotic decay modes of the top quark. Our estimates indicate that in the class of theories we are considering Higgs particles would be too heavy to appear among the decay products of top quarks. But the phenomenology of such decays would be quite interesting, so it is worth considering the possibility that we have seriously overestimated neutral meson mass mixing, and that there are some Higgs scalars lighter than the top quark. In most models with more than a single scalar doublet the exotic decays $t \to b h^+$ and $t \to c h^0$ will occur if they are kinematically allowed. (Here $h^+$ and $h^0$ are the lightest non-Goldstone mass eigenstates formed from linear combinations of the scalars destroyed by the fields $\phi_n^+$ and $\phi_n^0$ introduced in Eq. (1).) As indicated above, the $h^+$ would decay predominantly through the channel $h^+ \to c\bar b$, and the $h^0$ decays predominantly via $h^0 \to b\bar b$, so that either of these exotic top quark decays yields $t \to b\bar b c$. However, as will be discussed below, the $h^0$ also has a large branching ratio to tau pairs.

The decays $t \to b h^+$ are induced by the Yukawa interaction $\lambda^U_{33n} \bar Q_{L3} U_{R3} \cdot \tilde\phi_n$, leading to a decay rate

$$ \Gamma(t \to b h^+) \approx \frac{G_F m_t^3}{8\sqrt{2}\pi} \left( 1 - \frac{m_{h^+}^2}{m_t^2} \right)^2 \tag{28} $$

If 46 GeV ≤ $m_t$ ≤ $M_W$ then this exotic decay mode would dominate all others by a large factor, explaining how a top quark with mass less than $M_W$ might not have been discovered. The charged Higgs $h^+$ decays predominantly to $c\bar b$. Using our values of the $\epsilon_i$ we compute the branching ratio to $\bar\tau \nu_\tau$ to be only ≈ $10^{-3}$. Hence, in this class of theories a successful search for the top quark at the Fermilab collider would require a technique to isolate candidate events with four b-type quarks and up to six jets. On the other hand, if $m_t > M_W$ we find

$$ \frac{\Gamma(t \to b h^+)}{\Gamma(t \to b W^+)} \approx \frac{\left( 1 - \frac{m_{h^+}^2}{m_t^2} \right)^2}{\left( 1 - \frac{M_W^2}{m_t^2} \right)^2} \cdot \frac{1}{1 + 2\frac{M_W^2}{m_t^2}} \tag{29} $$
which implies that a significant suppression of the conventional decay mode
can occur. For example for a top quark mass of 100 GeV and a scalar mass
of 50 GeV the conventional isolated lepton signature of the top quark will be

suppressed by a factor of about 3. With sufficient statistics the top quark can
still be discovered by the conventional mode, although a determination of its
mass from the rate of these events could result in a considerable overestimate,
about 25 GeV in the example given above. For mt ≥ 150 GeV the suppression

of the conventional signal will be a factor of two or less.
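The suppression factors quoted for the conventional signal can be recovered from Eq. (29), as in the sketch below. Two assumptions are made that the text does not spell out: a W mass of 80.4 GeV, and that $t \to b W^+$ and $t \to b h^+$ are the only open channels, so that the conventional branching ratio is 1/(1 + ratio). The function names are illustrative.

```python
M_W = 80.4  # GeV (assumed; the text does not quote a value)


def ratio_h_to_w(m_t: float, m_h: float, m_w: float = M_W) -> float:
    """Eq. (29): Gamma(t -> b h+) / Gamma(t -> b W+) for m_t > M_W."""
    phase_h = (1.0 - m_h**2 / m_t**2) ** 2
    phase_w = (1.0 - m_w**2 / m_t**2) ** 2
    return phase_h / (phase_w * (1.0 + 2.0 * m_w**2 / m_t**2))


def conventional_suppression(m_t: float, m_h: float) -> float:
    """Factor by which the t -> b W+ signal is reduced.

    Assumes only the b W+ and b h+ channels are open, so the
    conventional branching ratio is 1 / (1 + ratio).
    """
    return 1.0 + ratio_h_to_w(m_t, m_h)


s_100 = conventional_suppression(100.0, 50.0)
s_150 = conventional_suppression(150.0, 50.0)

print(f"m_t = 100 GeV, m_h = 50 GeV: suppression ~ {s_100:.2f}")
print(f"m_t = 150 GeV, m_h = 50 GeV: suppression ~ {s_150:.2f}")
```

With these inputs the suppression is close to 3 at $m_t$ = 100 GeV and just under 2 at $m_t$ = 150 GeV, matching the statements above.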


Turning to the decay $t \to c h^0$, we note that this decay is of great interest since, unlike the decay to $b h^+$, this flavor-changing decay mode can only be large if the symmetry imposed in reference 1 is relaxed¹². This decay is induced by the operator $\lambda^U_{32n} \bar Q_{L3} U_{R2} \cdot \tilde\phi_n$. The relevant coupling factor $\epsilon_{Q_3} \epsilon_{U_2} \approx 0.2$ is surprisingly large in this case¹³ and such decays dominate (aside from the possible decay $t \to b h^+$) if the top quark is lighter than the W boson. The neutral Higgs $h^0$ decays predominantly to $\bar b b$. Using our values for the $\epsilon_i$ we find the branching ratio to tau pairs to be ≈ $10^{-1}$. Thus $h^0$ has much larger leptonic branching ratios than $h^+$. We expect the best signature at the Fermilab collider to occur when one neutral Higgs decays to b pairs and the other to tau pairs, with one tau giving an isolated electron and the other an isolated muon. For an integrated luminosity of 10 pb$^{-1}$ and a top quark mass of 80 GeV, the Fermilab collider would produce ≈ 30 such events, with a signature e + μ + jets (from 2b and 2c quarks) + missing transverse energy. A search for these events must take into account the softer $p_T$ distribution of the isolated leptons compared to the distribution expected from conventional top quark decays.

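For the case $m_t > M_W$, the exotic decay mode is no longer likely to dominate

$$ \frac{\Gamma(t \to c h^0)}{\Gamma(t \to b W^+)} \approx \frac{\epsilon_{U_2}^2}{\epsilon_{U_3}^2} \cdot \frac{\left( 1 - \frac{m_{h^0}^2}{m_t^2} \right)^2}{\left( 1 - \frac{M_W^2}{m_t^2} \right)^2} \cdot \frac{1}{1 + 2\frac{M_W^2}{m_t^2}} . \tag{30} $$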

The decay t → ch0 does not significantly deplete the conventional decays,

so the discovery of the top quark is not hindered by this process. However
the discovery of such exotic, flavor-changing decays would not only reveal a
Higgs boson but would strongly suggest a theory of several scalar doublets
with approximate flavor and CP symmetries.

We are grateful for conversations with Howard Georgi.

References

1. S. L. Glashow and S. Weinberg, Phys. Rev. D15, 1958 (1977).

2. A. Antaramian, L. J. Hall, and A. Rašin, Phys. Rev. Lett. 69, 1871 (1992).

3. The necessity of coupling scalars according to the rules of reference 1 was also questioned by H. Georgi and D. V. Nanopoulos, Physics Letters 82B, 95 (1979). They accounted for the suppression of flavor changing processes like $K^0 \leftrightarrow \bar K^0$ by supposing that the scalars with flavor changing couplings are much heavier than the Higgs boson responsible for electroweak symmetry breaking. Here we assume that all scalars have roughly comparable masses, and attribute the suppression of flavor changing processes to the smallness of known quark masses and mixing angles.

4. P. Drell, talk presented at the International Conference on High Energy Physics, Dallas, Aug. 1992.

5. We use $m_u$ = 5.5 MeV, $m_d$ = 9 MeV, $m_s$ = 180 MeV, $m_c$ = 1.4 GeV, $m_b$ = 6 GeV, understood to be defined at a renormalization scale of 1 GeV.

6. A. Buras and M. Harlander, Max Planck Institute preprint MPI-PAE/PTH, January 1992, Munich.

7. J. Polchinski and M. B. Wise, Phys. Lett. 125B, 393 (1983).

8. Y. Nir and U. Sarid, Weizmann preprint WIS-92/52/Jun-PH (1992).

9. L. Wolfenstein, Phys. Lett. 164B, 170 (1985); J. F. Donoghue et al., Phys. Rev. D33, 179 (1986).

10. H. Georgi, Harvard University preprint HUTP-92/A049 (Aug. 1992).

11. T. Ohl, G. Ricciardi and E. Simmons, Harvard University preprint HUTP-92/A053 (Dec. 1992).

12. The decay $t \to c h^0$ has also been considered recently by W.-S. Hou, Phys. Lett. B296, 179 (1992), where relaxing the symmetry of reference 1 is motivated by models with a Fritzsch-like texture.

13. The large value of $\epsilon_{U_2}$ is due to the fact that $m_c/m_t$ is not very much less than $V_{cb}$.
UTTG-18-93
arXiv:cond-mat/9306055v1 27 Jun 1993

Effective Action and Renormalization Group Flow of Anisotropic Superconductors

Steven Weinberg¹
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract
We calculate the effective action of a superconductor, without assuming that either the electron-electron potential or the Fermi surface obeys rotational invariance. This approach leads to the same gap equation and equilibrium free energy as more conventional methods. The results are used to obtain the Gell-Mann - Low renormalization group equations for the electron-electron potential.

¹ Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY 9009850. Internet address weinberg@utaphy.ph.utexas.edu.
1. Introduction
This paper aims to demonstrate the usefulness in the theory of superconductivity
of the effective action formalism of quantum field theory.² Although

the effective action may be defined non-perturbatively, for our purposes it


may be taken simply as the sum of all connected one-particle-irreducible
vacuum diagrams in the presence of any background field. As we shall see,
the usual assumptions of the BCS superconductivity theory allow an almost

trivial calculation of the effective action, from which one may obtain all of
the properties of the superconductor, including the gap field, free energy,
penetration depth, and so on.
As an additional application of this formalism, we shall derive the renor-

malization group flow of the electron-electron interaction, a subject that has


received increased attention in the last few years[2]. The renormalization
group equation for the electron-electron potential is obtained here from the

condition that the effective action expressed as a function of the gap field
should be renormalization group invariant. From a practical point of view,
the most significant difference with most earlier work is that here we make
no special assumptions about the form of the Fermi surface or the rotational

invariance of the electron-electron potential.³ Just as in the rotationally
invariant case, it turns out that the renormalization group equations may be
expressed as decoupled equations for an infinite number of coupling parameters.

² For a lucid review of the effective action method and references to earlier work, see S.
Coleman, ref. [1].
³ After this work was completed I learned that the renormalization group equation for
non-circular Fermi surfaces in two dimensions had been derived by R. Shankar, Rev. Mod.
Phys., ref. [2].

2. Calculation of the Effective Action


As usual in the Fermi theory of liquids, we assume that all degrees of free-
dom may be “integrated out” except for electrons (strictly speaking, quasi-

particles with the quantum numbers of electrons) in a narrow shell around a


Fermi surface, say of thickness µ in electron energy. (In the approach to be
used here, this condition on electron wave numbers is implemented through
the renormalization procedure, and must be verified a posteriori.) Also as

usual, we discard all interactions that become irrelevant in the limit µ → 0.


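This leaves just the one-electron and two-electron terms in the Lagrangian:

$$
\begin{aligned}
L ={}& \sum_{s',s}\int d^3x\;\psi^\dagger(\vec x,s')\left[\left(-i\frac{\partial}{\partial t}+eA_0(\vec x,t)\right)\delta_{s',s}+E_{s',s}\bigl(-i\vec\nabla+e\vec A(\vec x,t)\bigr)\right]\psi(\vec x,s)\\
&-\frac{1}{4}\sum_{s_1',s_2',s_1,s_2}\int d^3x_1'\,d^3x_2'\,d^3x_1\,d^3x_2\;V_{s_1',s_2',s_1,s_2}(\vec x_1',\vec x_2',\vec x_1,\vec x_2)\\
&\qquad\times\,\psi^\dagger(\vec x_1',s_1')\,\psi^\dagger(\vec x_2',s_2')\,\psi(\vec x_1,s_1)\,\psi(\vec x_2,s_2)
\end{aligned}
\qquad (1)
$$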

where $A_\mu$ is an external electromagnetic vector potential, and $s$, $s'$, etc. are
spin indices summed over the values $\pm\frac{1}{2}$. [The coefficient of the first term is
adjusted to be a simple Kronecker delta by suitable definition of the electron
field operator $\psi(\vec x, s)$. The time argument is suppressed in all field operators.]
Also, as is well known, the only diagrams that are not suppressed by
powers of µ as µ → 0 are those whose structure constrains each interacting
pair of electrons to have opposite momenta (for very slowly varying exter-
nal fields), so that if one electron is on the Fermi surface, then the other is

also. [The Fermi surface is defined by the vanishing of any eigenvalue of the
energy matrix $E_{s',s}(\vec p)$, which is understood to include a chemical potential
term. Time-reversal invariance tells us that if a momentum $\vec p$ is on the Fermi
surface, then so is $-\vec p$.] In particular, the unsuppressed vacuum diagrams

are those that become disconnected if we cut through an interaction vertex


so as to separate the incoming from the outgoing electrons.
Before setting out to calculate these diagrams, we will introduce a pair
field Ψ by a familiar trick, generally attributed to Hubbard and Stratonovich[3],

which we extend here to general potentials. Add a term to the Lagrangian


of the form
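$$
\begin{aligned}
\Delta L ={}& \frac{1}{4}\sum_{s_1',s_2',s_1,s_2}\int d^3x_1'\,d^3x_2'\,d^3x_1\,d^3x_2\;V_{s_1',s_2',s_1,s_2}(\vec x_1',\vec x_2',\vec x_1,\vec x_2)\\
&\times\bigl[\Psi^\dagger_{s_1',s_2'}(\vec x_1',\vec x_2')-\psi^\dagger(\vec x_1',s_1')\,\psi^\dagger(\vec x_2',s_2')\bigr]\\
&\times\bigl[\Psi_{s_1,s_2}(\vec x_1,\vec x_2)-\psi(\vec x_1,s_1)\,\psi(\vec x_2,s_2)\bigr]
\end{aligned}
\qquad (2)
$$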

and integrate over the pair field $\Psi(\vec x_1,\vec x_2)$ as well as over the electron field
$\psi(\vec x)$. This clearly has no effect; the action is quadratic in Ψ, so the path
integral may be evaluated by setting Ψ equal to its equilibrium value

$$\Psi_{s_1,s_2}(\vec x_1,\vec x_2)=\psi(\vec x_1,s_1)\,\psi(\vec x_2,s_2)$$

at which (2) vanishes. Instead of integrating over the pair field, we shall
evaluate the effective action in the presence of a background pair field, integrating
over the electron field. The term (2) has been chosen to cancel the
term in the Lagrangian quartic in the electron field, leaving just quadratic

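terms:

$$
\begin{aligned}
L+\Delta L ={}& \sum_{s',s}\int d^3x\;\psi^\dagger(\vec x,s')\left[\left(-i\frac{\partial}{\partial t}+eA_0(\vec x,t)\right)\delta_{s',s}+E_{s',s}\bigl(-i\vec\nabla+e\vec A(\vec x,t)\bigr)\right]\psi(\vec x,s)\\
&-\frac{1}{4}\sum_{s_1',s_2',s_1,s_2}\int d^3x_1'\,d^3x_2'\,d^3x_1\,d^3x_2\;V_{s_1',s_2',s_1,s_2}(\vec x_1',\vec x_2',\vec x_1,\vec x_2)\\
&\qquad\times\Bigl[\psi^\dagger(\vec x_1',s_1')\,\psi^\dagger(\vec x_2',s_2')\,\Psi_{s_1,s_2}(\vec x_1,\vec x_2)+\psi(\vec x_1,s_1)\,\psi(\vec x_2,s_2)\,\Psi^\dagger_{s_1',s_2'}(\vec x_1',\vec x_2')\Bigr]\\
&+\frac{1}{4}\sum_{s_1',s_2',s_1,s_2}\int d^3x_1'\,d^3x_2'\,d^3x_1\,d^3x_2\;\Psi^\dagger_{s_1',s_2'}(\vec x_1',\vec x_2')\,V_{s_1',s_2',s_1,s_2}(\vec x_1',\vec x_2',\vec x_1,\vec x_2)\,\Psi_{s_1,s_2}(\vec x_1,\vec x_2)
\end{aligned}
\qquad (3)
$$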
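
The cancellation can be made explicit in a compressed notation, used here only for illustration, in which spin labels, position arguments, sums and integrals are all suppressed. Expanding the product in (2) gives

```latex
\Delta L = \tfrac{1}{4}\,V\,\bigl[\Psi^\dagger - \psi^\dagger\psi^\dagger\bigr]\bigl[\Psi - \psi\psi\bigr]
         = \tfrac{1}{4}\,V\,\Psi^\dagger\Psi
           \;-\; \tfrac{1}{4}\,V\,\bigl[\psi^\dagger\psi^\dagger\,\Psi + \psi\psi\,\Psi^\dagger\bigr]
           \;+\; \tfrac{1}{4}\,V\,\psi^\dagger\psi^\dagger\psi\psi .
```

The last term is equal and opposite to the quartic interaction in (1), so only the term quadratic in the pair field and the terms bilinear in the electron field survive in L + ∆L.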

Now, the effective action in a background pair field is given by the sum of
all one-particle-irreducible vacuum diagrams — that is, all vacuum diagrams

that cannot be disconnected by cutting through any one internal line[1]. On


the other hand, we have already mentioned that in the limit µ → 0, we are to
keep only graphs that can be disconnected by slicing through any electron-
electron interaction so as to separate incoming from outgoing electrons. In

using (3), this means that we are to keep only graphs that are disconnected by
cutting through any internal pair field line. Since we are to keep only graphs
that both are and are not disconnected by cutting through any internal pair
field line, we conclude that we must keep only graphs that have no internal

pair field lines at all. There are just two such graphs; a tree graph arising

from the last term in (3), and a one-electron-loop graph whose value is given
by the determinant of the “matrix” accompanying the terms in (3) quadratic
in the electron field:

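$$
\begin{aligned}
\Gamma[\Psi] ={}& \frac{1}{4}\sum_{s_1',s_2',s_1,s_2}\int dt\int d^3x_1'\,d^3x_2'\,d^3x_1\,d^3x_2\;\Psi^\dagger_{s_1',s_2'}(\vec x_1',\vec x_2')\,V_{s_1',s_2',s_1,s_2}(\vec x_1',\vec x_2',\vec x_1,\vec x_2)\,\Psi_{s_1,s_2}(\vec x_1,\vec x_2)\\
&-\frac{i}{2}\ln\mathrm{Det}\begin{bmatrix}A & B\\ B^\dagger & -A^T\end{bmatrix}+\frac{i}{2}\ln\mathrm{Det}\begin{bmatrix}A & 0\\ 0 & -A^T\end{bmatrix}+\Gamma[0]
\end{aligned}
\qquad (4)
$$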

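where A and B are the “matrices”:

$$A_{s'\vec x't',\,s\vec xt}=\left[\left(-i\frac{\partial}{\partial t}+eA_0(\vec x,t)\right)\delta_{s',s}+E_{s',s}\bigl(-i\vec\nabla+e\vec A(\vec x,t)\bigr)\right]\delta^3(\vec x'-\vec x)\,\delta(t'-t) \qquad (5)$$

$$B_{s'\vec x't',\,s\vec xt}=\Delta_{s's}(\vec x',\vec x)\,\delta(t'-t) \qquad (6)$$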

and $\Delta_{s's}(\vec x',\vec x)$ is the “gap” field:

$$\Delta_{s's}(\vec x',\vec x)\equiv-\frac{1}{2}\sum_{\sigma',\sigma}\int d^3y\,d^3y'\;V_{s's\sigma'\sigma}(\vec x',\vec x,\vec y',\vec y)\,\Psi_{\sigma'\sigma}(\vec y',\vec y) \qquad (7)$$

The constant Γ[0] represents the contribution of electrons not near the Fermi
surface, for which the approximations made here are not applicable.
To avoid becoming lost in a cloud of indices, we now specialize to the case

of spin-independent forces and a spin-singlet pair field, writing

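$$\Psi_{\frac12,-\frac12}=-\Psi_{-\frac12,\frac12}\equiv\Psi, \qquad \Psi_{\frac12,\frac12}=\Psi_{-\frac12,-\frac12}=0 \qquad (8)$$

$$V_{\frac12,-\frac12,\frac12,-\frac12}-V_{\frac12,-\frac12,-\frac12,\frac12}=-V_{-\frac12,\frac12,\frac12,-\frac12}+V_{-\frac12,\frac12,-\frac12,\frac12}\equiv 2V \qquad (9)$$

$$E_{s's}=E\,\delta_{s's} \qquad (10)$$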

(It is easy to extend the results here to systems with a more general spin
dependence, such as liquid ³He.) It follows from (8) and (9) that

$$\Delta_{\frac12,-\frac12}=-\Delta_{-\frac12,\frac12}\equiv\Delta, \qquad \Delta_{\frac12,\frac12}=\Delta_{-\frac12,-\frac12}=0 \qquad (11)$$

where

$$\Delta(\vec x',\vec x)\equiv-\int d^3y\,d^3y'\;V(\vec x',\vec x,\vec y',\vec y)\,\Psi(\vec y',\vec y) \qquad (12)$$

The effective action is now


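$$
\begin{aligned}
\Gamma[\Psi] ={}& \int dt\int d^3x_1'\,d^3x_2'\,d^3x_1\,d^3x_2\;\Psi^\dagger(\vec x_1',\vec x_2')\,V(\vec x_1',\vec x_2',\vec x_1,\vec x_2)\,\Psi(\vec x_1,\vec x_2)\\
&-i\ln\mathrm{Det}\begin{bmatrix}A & B\\ B^\dagger & -A^T\end{bmatrix}+i\ln\mathrm{Det}\begin{bmatrix}A & 0\\ 0 & -A^T\end{bmatrix}+\Gamma[0]
\end{aligned}
\qquad (13)
$$

$$A_{\vec x't',\,\vec xt}=\left[-i\frac{\partial}{\partial t}+eA_0(\vec x,t)+E\bigl(-i\vec\nabla+e\vec A(\vec x,t)\bigr)\right]\delta^3(\vec x'-\vec x)\,\delta(t'-t) \qquad (14)$$

$$B_{\vec x't',\,\vec xt}=\Delta(\vec x',\vec x)\,\delta(t'-t) \qquad (15)$$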

3. Translationally Invariant Equilibrium

First we consider the translationally invariant case, with no external elec-


tromagnetic fields. Then the pair and gap fields can be put in the form
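$$\Psi(\vec x',\vec x)=\int d^3p\;e^{i\vec p\cdot(\vec x'-\vec x)}\,\Psi(\vec p) \qquad (16)$$

$$\Delta(\vec x',\vec x)=(2\pi)^{-3}\int d^3p\;e^{i\vec p\cdot(\vec x'-\vec x)}\,\Delta(\vec p) \qquad (17)$$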

The electron-electron potential appears here in the form


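$$\int d^3x_1\,d^3x_2\,d^3x_3\,d^3x_4\;e^{i\vec p'\cdot(\vec x_1-\vec x_2)}\,e^{i\vec p\cdot(\vec x_3-\vec x_4)}\,V(\vec x_1,\vec x_2,\vec x_3,\vec x_4)\equiv\Omega_4\,V(\vec p',\vec p) \qquad (18)$$

where $\Omega_4$ is the spacetime volume

$$\Omega_4\equiv\int d^3x\int dt \qquad (19)$$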

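The effective potential U[Ψ] is defined[1] as minus the effective action per
spacetime volume

$$
\begin{aligned}
U[\Psi]\equiv{}&-\Gamma[\Psi]/\Omega_4\\
={}&-\int d^3p\,d^3p'\;\Psi^*(\vec p')\,V(\vec p',\vec p)\,\Psi(\vec p)
+\frac{i}{(2\pi)^4}\int d\omega\,d^3p\;\ln\!\left(1-\frac{|\Delta(\vec p)|^2}{\omega^2-E^2(\vec p)+i\epsilon}\right)+U[0]
\end{aligned}
\qquad (20)
$$

with

$$\Delta(\vec p)=-\int d^3p'\;V(\vec p,\vec p')\,\Psi(\vec p') \qquad (21)$$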

(We are working here at zero temperature. For non-zero temperature, the

integral over ω is of course replaced with a sum over discrete Matsubara


frequencies.) Wick rotating, integrating over ω, and expressing Ψ in terms
of ∆, this becomes
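$$
\begin{aligned}
U[\Delta] ={}&-\int d^3p\,d^3p'\;\Delta^*(\vec p')\,V^{-1}(\vec p',\vec p)\,\Delta(\vec p)\\
&-\frac{1}{(2\pi)^3}\int d^3p\left[\sqrt{E^2(\vec p)+|\Delta(\vec p)|^2}-E(\vec p)\right]+U[0]
\end{aligned}
\qquad (22)
$$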
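
The frequency integral behind (22) can be checked numerically. The sketch below is an illustration rather than part of the derivation: after Wick rotation the integrand becomes ln[1 + |∆|²/(ω² + E²)], and the code compares a Simpson-rule estimate of its integral over all Euclidean ω with the closed form 2π[√(E² + |∆|²) − |E|], which on division by (2π)⁴ reproduces the one-loop term of (22). The function names, the substitution ω = sinh u, and the sample values of E and ∆ are assumptions made for this example.

```python
import math


def omega_integral(energy, gap, u_max=14.0, n=20000):
    """Simpson estimate of the integral over Euclidean frequency w,
    from -infinity to +infinity, of ln(1 + gap^2 / (w^2 + energy^2)).

    The substitution w = sinh(u) compresses the slowly decaying tail;
    the cut at u_max leaves a remainder of order gap^2 / sinh(u_max).
    """
    def integrand(u):
        w = math.sinh(u)
        return math.log1p(gap * gap / (w * w + energy * energy)) * math.cosh(u)

    h = u_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    # The integrand is even in w, so double the half-line result.
    return 2.0 * total * h / 3.0


def closed_form(energy, gap):
    """2*pi*(sqrt(E^2 + gap^2) - |E|), the value used to obtain Eq. (22)."""
    return 2.0 * math.pi * (math.sqrt(energy * energy + gap * gap) - abs(energy))


if __name__ == "__main__":
    for energy, gap in [(0.3, 1.0), (1.0, 0.2)]:
        print(energy, gap, omega_integral(energy, gap), closed_form(energy, gap))
```

The two columns printed for each (E, ∆) pair should agree to within the truncation error of the cut at u_max.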
We pause to note that the gap equation is obtained from the condition

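that $\Delta(\vec p)$ be at the stationary point $\Delta_0(\vec p)$ of U[∆]:

$$0=\left.\frac{\delta U[\Delta]}{\delta\Delta^*(\vec p)}\right|_{\Delta=\Delta_0}=-\int d^3p'\;V^{-1}(\vec p,\vec p')\,\Delta_0(\vec p')-\frac{1}{2(2\pi)^3}\,\frac{\Delta_0(\vec p)}{\sqrt{E^2(\vec p)+|\Delta_0(\vec p)|^2}}$$

or in a more familiar form

$$\Delta_0(\vec p)=-\frac{1}{2(2\pi)^3}\int d^3p'\;\frac{V(\vec p,\vec p')\,\Delta_0(\vec p')}{\sqrt{E^2(\vec p')+|\Delta_0(\vec p')|^2}} \qquad (23)$$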

Also, the effective potential at this stationary “point” is the free energy

density, which using (23) may be expressed in terms of the gap field:

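$$
\begin{aligned}
F_{\rm equilibrium}-F_{\Delta=0} ={}& U[\Delta_0]-U[0]\\
={}&\frac{1}{(2\pi)^3}\int d^3p\left[\frac{|\Delta_0(\vec p)|^2}{2\sqrt{E^2(\vec p)+|\Delta_0(\vec p)|^2}}-\sqrt{E^2(\vec p)+|\Delta_0(\vec p)|^2}+E(\vec p)\right]
\end{aligned}
\qquad (24)
$$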

This is the same as the result normally derived from the BCS ground state
wave function[4]. However, it should be noted that (22) is not the same as
the formula for the free energy as a functional of $\Delta(\vec p)$ calculated from the
BCS wave function⁴:

$$F[\Psi]=+\int d^3p\,d^3p'\;\Psi^*(\vec p')\,V(\vec p',\vec p)\,\Psi(\vec p)+\frac{1}{(2\pi)^3}\int d^3p\;E(\vec p)\left[1-\sqrt{1-4(2\pi)^6|\Psi(\vec p)|^2}\right]. \qquad (25)$$

This difference arises because the effective potential does not have the interpretation
of an energy density for a composite field like $\Psi(\vec p)$ or $\Delta(\vec p)$, except
at the stationary point of the effective potential[6]. Nevertheless, both (22)
and (25) yield the same gap equation, and the same value for the equilibrium
free energy.

⁴ This may be obtained e.g. from Eq. (4-64) of ref. [5], using the relations $\Psi_k=u_kv_k$
and $u_k^2+v_k^2=1$ to express the coefficients $u_k$ and $v_k$ in terms of the pair field $\Psi_k$, and
then converting to the notation of the present paper by replacing sums over the discrete
index $k$ with integrals over $\vec p$, and inserting appropriate factors of $(2\pi)^3$.

So far, the limitation of electron momenta to a thin shell around the
Fermi surface has been left implicit. To make this limitation explicit, we will
write the electron momenta as

$$\vec p=\vec k+\hat n(\vec k)\,\ell, \qquad d^3p=d^2k\,d\ell \qquad (26)$$

where $\vec k$ is on the Fermi surface (that is, $E(\vec k)=0$), and $\hat n(\vec k)$ is the unit vector
normal to the Fermi surface at $\vec k$. [Here $d^2k$ should be understood as $J\,d\theta_1\,d\theta_2$,
where $\theta_1$ and $\theta_2$ are coordinates on the Fermi surface, and $J$ is the Jacobian
of the transformation from $\vec p$ to $\ell$, $\theta_1$, and $\theta_2$. For a spherical Fermi surface,
$\int d^2k$ just gives a factor $4\pi k_F^2$.] We will temporarily impose the condition
that electron momenta are close to the Fermi surface by introducing a cut-off
Λ on ℓ, chosen small enough so that $\Lambda\ll|\vec k|$ for all $\vec k$ on the Fermi surface,
and so that $V(\vec p',\vec p)$ and $\Delta(\vec p)$ change negligibly as ℓ and ℓ′ vary from 0 to Λ.
[This cut-off will eventually be obviated by the introduction of a renormalized
electron-electron potential.] With this cut-off, we may approximate

$$E(\vec p)=v_F(\vec k)\,\ell \qquad (27)$$

where $v_F$ is the Fermi velocity:

$$v_F(\vec k)=\left.\frac{\partial E\bigl(\vec k+\hat n(\vec k)\,\ell\bigr)}{\partial\ell}\right|_{\ell=0} \qquad (28)$$
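
As an illustration of definition (28) for a non-spherical Fermi surface, the following sketch evaluates the Fermi velocity for a hypothetical ellipsoidal dispersion E(p) = Σ p_i²/(2m_i) − E_F. The dispersion, the effective masses, the Fermi energy and all function names are assumptions introduced for this example; nothing here is specific to the derivation in the text. The normal derivative in (28) is taken by a central finite difference and can be compared with |∇E|, which it must equal because the normal is parallel to the gradient.

```python
import math

MASSES = (1.0, 2.0, 0.5)  # hypothetical effective masses along x, y, z
E_FERMI = 1.0             # hypothetical Fermi energy (chemical potential)


def energy(p):
    """Ellipsoidal dispersion measured from the Fermi level."""
    return sum(pi * pi / (2.0 * m) for pi, m in zip(p, MASSES)) - E_FERMI


def gradient(p):
    """Analytic gradient of the dispersion."""
    return tuple(pi / m for pi, m in zip(p, MASSES))


def fermi_point(theta, phi):
    """Point on the Fermi surface E(k) = 0 labelled by two angles."""
    direction = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
    return tuple(math.sqrt(2.0 * m * E_FERMI) * d
                 for m, d in zip(MASSES, direction))


def fermi_velocity(k, step=1e-5):
    """Eq. (28): derivative of E along the unit normal at k, by central difference."""
    grad = gradient(k)
    norm = math.sqrt(sum(g * g for g in grad))
    n_hat = tuple(g / norm for g in grad)
    ahead = tuple(ki + step * ni for ki, ni in zip(k, n_hat))
    behind = tuple(ki - step * ni for ki, ni in zip(k, n_hat))
    return (energy(ahead) - energy(behind)) / (2.0 * step)


if __name__ == "__main__":
    k = fermi_point(0.7, 1.1)
    print(energy(k), fermi_velocity(k))
```

Because the masses differ, the value returned by fermi_velocity varies from point to point on the surface, which is the situation the text allows for by writing $v_F(\vec k)$.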

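Eq. (22) may now be written

$$
\begin{aligned}
U[\Delta] ={}&-\Lambda^2\int d^2k\,d^2k'\;\Delta^*(\vec k')\,V^{-1}(\vec k',\vec k)\,\Delta(\vec k)\\
&-\frac{1}{(2\pi)^3}\int_0^\Lambda d\ell\int d^2k\left[\sqrt{\ell^2v_F^2(\vec k)+|\Delta(\vec k)|^2}-\ell\,v_F(\vec k)\right]+U[0]
\end{aligned}
\qquad (29)
$$

so that U[∆] is now defined as a functional of the gap field on the Fermi
surface only. We define a renormalized electron-electron potential at a renormalization
scale µ by

$$V_\mu^{-1}(\vec k',\vec k)\equiv-\left.\frac{\delta^2U[\Delta]}{\delta\Delta^*(\vec k')\,\delta\Delta(\vec k)}\right|_{\Delta(\vec k)=\Delta(\vec k)^*=\mu} \qquad (30)$$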

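When we express the electron-electron potential V in terms of the renormalized
potential $V_\mu$, Eq. (29) for the effective potential becomes:

$$
\begin{aligned}
U[\Delta] ={}&-\int d^2k\,d^2k'\;\Delta^*(\vec k')\,V_\mu^{-1}(\vec k',\vec k)\,\Delta(\vec k)\\
&-\frac{1}{(2\pi)^3}\int_0^\Lambda d\ell\int d^2k\Biggl[\sqrt{\ell^2v_F^2(\vec k)+|\Delta(\vec k)|^2}-\ell\,v_F(\vec k)\\
&\qquad-\frac{|\Delta(\vec k)|^2}{2\bigl(\ell^2v_F^2(\vec k)+\mu^2\bigr)^{1/2}}+\frac{\mu^2|\Delta(\vec k)|^2}{4\bigl(\ell^2v_F^2(\vec k)+\mu^2\bigr)^{3/2}}\Biggr]+U[0]
\end{aligned}
\qquad (31)
$$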

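The one-loop integral over ℓ now converges as ℓ → ∞, so we may remove the
cut-off, and find

$$
\begin{aligned}
U[\Delta] ={}&-\int d^2k\,d^2k'\;\Delta^*(\vec k')\,V_\mu^{-1}(\vec k',\vec k)\,\Delta(\vec k)\\
&+\frac{1}{2(2\pi)^3}\int d^2k\;\frac{|\Delta(\vec k)|^2}{v_F(\vec k)}\left[\ln\!\left(\frac{|\Delta(\vec k)|}{\mu}\right)-1\right]+U[0]
\end{aligned}
\qquad (32)
$$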
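
The passage from (31) to (32) rests on a single convergent integral, which can be checked numerically. In the sketch below, which is an illustration only, x stands for ℓ v_F, so that the bracket in (31) integrated over x from 0 to ∞ should equal −(|∆|²/2)[ln(|∆|/µ) − 1]; multiplying by −1/[(2π)³ v_F] then gives the one-loop term of (32). The function names, the substitution x = sinh u, and the sample values of ∆ and µ are assumptions of this example.

```python
import math


def bracket(x, gap, mu):
    """Square bracket of Eq. (31) as a function of x = l * v_F."""
    root = math.sqrt(x * x + gap * gap)
    s = x * x + mu * mu
    # sqrt(x^2 + gap^2) - x is rewritten as gap^2 / (root + x) to avoid
    # loss of precision at large x.
    return (gap * gap / (root + x)
            - gap * gap / (2.0 * math.sqrt(s))
            + mu * mu * gap * gap / (4.0 * s ** 1.5))


def shell_integral(gap, mu, u_max=12.0, n=20000):
    """Simpson estimate of the integral of bracket over x from 0 to sinh(u_max).

    The integrand falls off like 1/x^3, so the neglected tail is of
    order 1/sinh(u_max)^2.
    """
    def integrand(u):
        return bracket(math.sinh(u), gap, mu) * math.cosh(u)

    h = u_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def closed_form(gap, mu):
    """-(gap^2 / 2) * (ln(gap / mu) - 1), the result used in Eq. (32)."""
    return -0.5 * gap * gap * (math.log(gap / mu) - 1.0)


if __name__ == "__main__":
    for gap, mu in [(0.7, 1.3), (2.0, 0.5)]:
        print(gap, mu, shell_integral(gap, mu), closed_form(gap, mu))
```

The individual terms of the bracket diverge logarithmically when integrated separately; only their sum, fixed by the renormalization condition (30), has a finite limit.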

The corresponding gap equation is obtained from the condition that this
expression for the effective potential be stationary:

$$\Delta_0(\vec k)=\frac{1}{2(2\pi)^3}\int d^2k'\;V_\mu(\vec k,\vec k')\,v_F^{-1}(\vec k')\,\Delta_0(\vec k')\left[\ln\!\left(\frac{|\Delta_0(\vec k')|}{\mu}\right)-\frac{1}{2}\right] \qquad (33)$$

and the equilibrium free energy is

$$F_{\rm equilibrium}=F_{\Delta=0}-\frac{1}{4(2\pi)^3}\int d^2k\;\frac{|\Delta_0(\vec k)|^2}{v_F(\vec k)} \qquad (34)$$
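
The step from the gap equation (33) to the free energy (34) can be sketched as follows. Acting on (33) with the inverse of the renormalized potential, multiplying by the complex conjugate of the equilibrium gap and integrating over the Fermi surface expresses the tree term of (32) through the one-loop structure, after which the logarithms cancel:

```latex
\int d^2k\,d^2k'\;\Delta_0^*(\vec k')\,V_\mu^{-1}(\vec k',\vec k)\,\Delta_0(\vec k)
  = \frac{1}{2(2\pi)^3}\int d^2k\;\frac{|\Delta_0(\vec k)|^2}{v_F(\vec k)}
    \left[\ln\frac{|\Delta_0(\vec k)|}{\mu}-\frac{1}{2}\right],
\\[6pt]
U[\Delta_0]-U[0]
  = \frac{1}{2(2\pi)^3}\int d^2k\;\frac{|\Delta_0(\vec k)|^2}{v_F(\vec k)}
    \left\{-\left[\ln\frac{|\Delta_0(\vec k)|}{\mu}-\frac{1}{2}\right]
           +\left[\ln\frac{|\Delta_0(\vec k)|}{\mu}-1\right]\right\}
  = -\frac{1}{4(2\pi)^3}\int d^2k\;\frac{|\Delta_0(\vec k)|^2}{v_F(\vec k)} .
```

The result is manifestly independent of µ once the equilibrium gap is.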

These very simple results should not be taken entirely literally, because
the effective potential will always contain “irrelevant” terms of higher order
in ∆ arising from degrees of freedom that have been integrated out here. In

particular, note that according to (32) the difference U[∆] − U[0] vanishes
for ∆ → 0, is negative for sufficiently small ∆, and goes to +∞ for ∆ → ∞,
so it must have a stationary point $\Delta_0\neq0$, at which the gap equation (33) is
satisfied, for any renormalized electron-electron potential, whether repulsive
or attractive. But this solution should not be taken seriously if it has $\Delta_0$ so
large that it is outside the range of validity of these equations. To be specific,
it is easy to see from (32) that if $V_\mu(\vec k',\vec k)$ is positive (in the matrix sense)
for some µ, then the minimum $\Delta_0(\vec k)$ of the effective potential will have a
scale $\|\Delta_0\|\geq e\mu$, where the “scale” $\|\Delta\|$ of an arbitrary function $\Delta(\vec k)$ is
defined by the condition

$$\int d^2k\;\frac{|\Delta(\vec k)|^2}{v_F(\vec k)}\,\ln\!\left(\frac{|\Delta(\vec k)|}{\|\Delta\|}\right)\equiv0 .$$

In particular, if $V_\mu(\vec k',\vec k)$ is a positive “matrix” for µ of the order of the Debye
frequency, then the solution of the gap equation (33) is physically irrelevant.

4. Renormalization Group Flow


In the Wilson approach to the renormalization group that is most familiar
in condensed matter physics, we would keep the cut-off Λ finite, and derive a
renormalization group equation for the Λ-dependence of the unrenormalized

electron-electron potential $V(\vec k',\vec k)$ from the condition that the effective
potential (29) should be Λ-independent. This approach would be possible here,
but it would require the introduction of “irrelevant” terms in U[∆] of higher
order in $\Delta(\vec k)$ to keep U[∆] Λ-independent for finite Λ. It is much simpler to

apply the older approach of Gell-Mann and Low: the renormalization group
equation in this approach is the condition that U[∆] is independent of the
arbitrary renormalization scale µ:
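$$\mu\frac{d}{d\mu}V_\mu^{-1}(\vec k',\vec k)=-\frac{\delta^2(\vec k'-\vec k)}{2(2\pi)^3\,v_F(\vec k)} \qquad (35)$$

or equivalently

$$\mu\frac{d}{d\mu}V_\mu(\vec k',\vec k)=\frac{1}{2(2\pi)^3}\int d^2k''\;V_\mu(\vec k',\vec k'')\,v_F^{-1}(\vec k'')\,V_\mu(\vec k'',\vec k) \qquad (36)$$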
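
The equivalence of (35) and (36) uses only the rule for differentiating an inverse. As a sketch, treating the renormalized potential as a matrix in the Fermi-surface momenta, with products understood as integrals over the Fermi surface:

```latex
0 = \mu\frac{d}{d\mu}\bigl(V_\mu\,V_\mu^{-1}\bigr)
\quad\Longrightarrow\quad
\mu\frac{d}{d\mu}V_\mu = -\,V_\mu\Bigl(\mu\frac{d}{d\mu}V_\mu^{-1}\Bigr)V_\mu ,
\\[6pt]
\mu\frac{d}{d\mu}V_\mu(\vec k',\vec k)
  = \int d^2k''\,d^2k'''\;V_\mu(\vec k',\vec k'')\,
    \frac{\delta^2(\vec k''-\vec k''')}{2(2\pi)^3\,v_F(\vec k''')}\,
    V_\mu(\vec k''',\vec k) .
```

Carrying out one of the two Fermi-surface integrals with the delta function gives (36).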
This of course also implies that the solution of the gap equation is inde-

pendent of µ. For the special case of a spherical Fermi surface Eq. (36) is a
continuous version of the discrete renormalization group equation of Benfatto
and Gallavotti[2], provided we identify their constant β with $1/[2(2\pi)^3v_F]$. Eq.
(36) also agrees with the results of Shankar[2], with fairly obvious changes

to convert his results from two to three dimensions.


These renormalization group equations can be separated into equations
for the eigenvalues $\lambda_n(\mu)$ of the Hermitian kernel

$$K_\mu(\vec k',\vec k)\equiv\frac{1}{2(2\pi)^3}\,v_F^{-1/2}(\vec k')\,v_F^{-1/2}(\vec k)\,V_\mu(\vec k',\vec k). \qquad (37)$$

From either (35) or (36), we see that the eigenvectors of K are renormalization-group
invariant, while the eigenvalues are governed by the flow equations

$$\mu\frac{d}{d\mu}\lambda_n(\mu)=\lambda_n^2(\mu). \qquad (38)$$

For a spherical Fermi surface these eigenvectors are just the spherical harmonics
$Y_{\ell m}(\hat k)$, but we see that the decoupling of the eigenmodes of K is in
fact quite general, not depending on rotational invariance.

A completely repulsive potential may be defined as one for which all eigenvalues
$\lambda_n$ are positive. If this is true at some starting scale $\mu_0$ (say, the Debye
frequency) then as µ → 0 the eigenvalues stay positive and become smaller, so
nothing interesting happens. This conclusion may be altered by the inclusion

of formally irrelevant electron-electron couplings, as in the Kohn-Luttinger


effect, but is not directly affected by anisotropies. A completely or partly
attractive potential is one for which all or some of the eigenvalues of K are
negative. Any eigenvalue that is negative at some µ0 becomes infinite at

a finite µ < µ0 , with the eigenvalue that is largest in magnitude becoming


infinite first. As already mentioned, this is the case where superconductivity
actually occurs.
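
The behaviour just described follows from the explicit solution of (38), λ(µ) = λ₀/[1 − λ₀ ln(µ/µ₀)], and can be illustrated with a short sketch. The code below is not part of the original analysis: it integrates (38) numerically with a fourth-order Runge-Kutta step in t = ln µ, compares with the exact solution, and locates the scale µ₀ exp(1/λ₀) at which a negative eigenvalue diverges. The function gap_scale makes the further simplifying assumption of a single channel with constant |∆₀|, for which (33) reduces to 1 = λ[ln(∆₀/µ) − 1/2]. All function names and sample couplings are hypothetical.

```python
import math


def exact_lambda(mu, lam0, mu0=1.0):
    """Exact solution of mu * d(lambda)/d(mu) = lambda^2 with lambda(mu0) = lam0."""
    return lam0 / (1.0 - lam0 * math.log(mu / mu0))


def integrate_flow(lam0, mu_end, mu0=1.0, steps=2000):
    """Integrate Eq. (38) from mu0 to mu_end with RK4 in the variable t = ln(mu)."""
    h = math.log(mu_end / mu0) / steps
    lam = lam0
    for _ in range(steps):
        k1 = lam ** 2
        k2 = (lam + 0.5 * h * k1) ** 2
        k3 = (lam + 0.5 * h * k2) ** 2
        k4 = (lam + h * k3) ** 2
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return lam


def pole_scale(lam0, mu0=1.0):
    """Scale below mu0 at which a negative eigenvalue lam0 becomes infinite."""
    return mu0 * math.exp(1.0 / lam0)


def gap_scale(mu, lam_mu):
    """Single-channel, constant-gap solution of Eq. (33) (simplifying assumption)."""
    return mu * math.exp(0.5 + 1.0 / lam_mu)


if __name__ == "__main__":
    print(integrate_flow(0.5, 0.1), exact_lambda(0.1, 0.5))    # repulsive channel
    print(exact_lambda(0.2, -0.5), pole_scale(-0.5))           # attractive channel
    print(gap_scale(1.0, -0.5), gap_scale(0.5, exact_lambda(0.5, -0.5)))
```

In this simplified setting the last line prints the same gap at two different renormalization scales, an instance of the statement that the solution of the gap equation is independent of µ.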

It is not clear what is gained by this renormalization group analysis.


Within Fermi liquid theory, the effective potential arises solely from tree
and one-loop graphs, so we do not need to use the renormalization group
to make the sort of improvement in perturbation theory familiar in quan-

tum electrodynamics, or to identify a weak coupling regime, as in quantum


chromodynamics or the theory of critical phenomena. Of course, one can go
beyond tree and one-loop graphs, but this would require that we take into
account irrelevant terms in the original Lagrangian, which would involve additional
microscopic information, not just renormalization group equations.

5. Slowly Varying Electromagnetic and Goldstone Fields

Now let us return to the general case of a superconductor in a translationally-


non-invariant external electromagnetic field. In the limit where this field has
very small frequencies and wave numbers (smaller than the inverse correla-
tion length), we can integrate out all degrees of freedom except those that

have zero “mass,” in the sense that their frequency vanishes in the limit of
vanishing wave number. For a superconductor that is not close to a phase
transition at which superconductivity is lost, the only such “massless” degree
of freedom is the Goldstone mode, associated with the spontaneous break-

down of electromagnetic gauge invariance within the superconductor. All of


the classic exact properties of superconductors (persistent currents, Meissner
effect, flux quantization, and Josephson frequency) can be derived by consid-

ering only general properties of the effective action for the Goldstone mode
in the presence of external electromagnetic fields[7]. But to derive values for
quantities like the penetration depth in Type II superconductors, we need a
detailed formula for the effective action.

The effective action for a Goldstone mode is in general obtained by setting


all the fields in the effective action equal to their equilibrium values, and then
subjecting them to a symmetry transformation with space-time dependent
parameters equal to the Goldstone fields. In our case, where the broken
symmetry is electromagnetic gauge invariance, this means that we must make
the replacement

$$\Delta(\vec x',\vec x,t)\to e^{-ie\phi(\vec x',t)}\,\Delta_0(\vec x'-\vec x)\,e^{-ie\phi(\vec x,t)} \qquad (39)$$

where $\phi(\vec x)$ is the Goldstone field, and $\Delta_0(\vec x'-\vec x)$ is the equilibrium gap field,
given by the Fourier transform of the solution of the gap equation:

$$\Delta_0(\vec x'-\vec x)=(2\pi)^{-3}\int d^3p\;e^{i\vec p\cdot(\vec x'-\vec x)}\,\Delta_0(\vec p) \qquad (40)$$

Electromagnetic gauge invariance⁵ then allows us to remove the factors $e^{-ie\phi(\vec x',t')}$
and $e^{-ie\phi(\vec x,t)}$ in Eq. (39), by replacing the electromagnetic vector potential
$A_\mu(\vec x,t)$ with $A_\mu(\vec x,t)-\partial_\mu\phi(\vec x,t)$. In this way, Eqs. (13)-(15) yield the effective
action:

$$\Gamma[\phi,A]=\Gamma_{\Delta=0}[A]-i\ln\mathrm{Det}\begin{bmatrix}A & B\\ B^\dagger & -A^T\end{bmatrix}+i\ln\mathrm{Det}\begin{bmatrix}A & 0\\ 0 & -A^T\end{bmatrix} \qquad (41)$$

where now

$$A_{\vec x't',\,\vec xt}=\left[-i\frac{\partial}{\partial t}+eA_0(\vec x,t)-e\dot\phi(\vec x,t)+E\bigl(-i\vec\nabla+e\vec A(\vec x,t)-e\vec\nabla\phi(\vec x,t)\bigr)\right]\delta^3(\vec x'-\vec x)\,\delta(t'-t) \qquad (42)$$

$$B_{\vec x't',\,\vec xt}=\Delta_0(\vec x'-\vec x)\,\delta(t'-t) \qquad (43)$$

Quantitative properties of the superconductor such as the penetration depth
can be read off from the expansion of Eq. (41) in powers of $A_0(\vec x,t)-\dot\phi(\vec x,t)$
and $\vec A(\vec x,t)-\vec\nabla\phi(\vec x,t)$.
⁵ The Lagrangian (1) is gauge invariant for either a local electron-electron potential,
or an arbitrary electron-electron potential with a suitable dependence on the external
electromagnetic fields.

Acknowledgment
I am grateful for helpful conversations with S. Coleman, D. Fisher, B.
Halperin, and J. Polchinski. My work on this subject was stimulated by

reading the lectures of Polchinski.

References

1. S. Coleman, Aspects of Symmetry (Cambridge University Press, Cambridge, 1985), pp. 136-144; also see S. Coleman and E. Weinberg, Phys. Rev. D7 (1973) 1888.

2. G. Benfatto and G. Gallavotti, J. Stat. Phys. 59 (1990) 541; Phys. Rev. B42 (1990) 9967; J. Feldman and E. Trubowitz, Helv. Phys. Acta 63 (1990) 157; R. Shankar, Physica A 177 (1991) 530; “Renormalization Group Approach to Interacting Fermions,” Rev. Mod. Phys., to be published; J. Polchinski, “Effective Field Theory and the Fermi Surface,” Santa Barbara/Texas preprint NSF-ITP-92-132/UTTG-20-92, to be published.

3. R. L. Stratonovich, Sov. Phys. Dokl. 2 (1957) 416; J. Hubbard, Phys. Rev. Lett. 3 (1959) 77. Also see S. Weinberg, in Proceedings of the 1962 International Conference on High Energy Physics (CERN, Geneva, 1962), p. 683.

4. J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 108 (1957) 1175.

5. P. De Gennes, Superconductivity of Metals and Alloys, translated by P. A. Pincus (Benjamin, New York, 1966), p. 110.

6. S. Coleman, Aspects of Symmetry (Cambridge University Press, Cambridge, 1985), footnote 3 on page 401.

7. S. Weinberg, “Superconductivity for Particular Theorists,” in Fields, Symmetries, Strings: Festschrift for Yoichiro Nambu, Prog. Theor. Phys. Suppl. No. 86 (1986) 43.

UCLA/94/TEP/25 UTTG-12-94

General Effective Actions


arXiv:hep-ph/9409402v1 23 Sep 1994

Eric D’Hoker¹
Department of Physics, University of California at Los Angeles
Los Angeles, CA, 90024

Steven Weinberg²
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712

Abstract
We investigate the structure of the most general actions with sym-
metry group G, spontaneously broken down to a subgroup H. We
show that the only possible terms in the Lagrangian density that,
although not G-invariant, yield G-invariant terms in the action, are
in one to one correspondence with the generators of the fifth co-
homology classes. For the special case of G = SU(N)L × SU(N)R
broken down to the diagonal subgroup H = SU(N)V , there is just
one such term for N ≥ 3, which for N = 3 is the original Wess-
Zumino-Witten term.

¹ Research supported in part by NSF grant PHY92-18900. E-mail address:
dhoker@physics.ucla.edu
² Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY
9009850. E-mail address: weinberg@utaphy.ph.utexas.edu

Effective field theories are increasingly used to understand the dynamics
of the Goldstone bosons that result from spontaneous breaking of continu-
ous symmetries. If the action of a theory is invariant under a (compact) Lie

group G of global symmetries, spontaneously broken to a subgroup H, then


the Goldstone fields $\pi^a(x)$ in the effective action parametrize the coset space
G/H with $a=1,\cdots,\dim G/H$, and accordingly transform under linear representations
of H, but under non-linear realizations of the broken symmetries

of G. The power of effective field theories arises largely from the fact that the
nonlinearly realized broken symmetry allows only a finite number of terms
in the action, up to any given order in an expansion in powers of derivatives
or momenta.

A general method for constructing invariant non-linear effective actions


was given in Ref. [1] for SU(2)L × SU(2)R and was extended to the case of
arbitrary G and H in Ref. [2]. But although this method yields the most

general G-invariant term in the effective Lagrangian, its results are not quite
complete. Wess and Zumino [3] showed that fermion loops produce a four-
derivative term in the effective Lagrangian for the strong-interaction Gold-
stone octet that is not invariant under SU(3) × SU(3), but rather changes

under SU(3)×SU(3) transformations by a total derivative, so that the action


is SU(3) × SU(3) invariant. Subsequently Witten [4] was able to re-express
this term as the integral over an invariant Lagrangian density in five dimensions.
The WZW action has since then been generalized in Ref. [5] to G/H
models with arbitrary G and H.
It is natural to ask whether there are any more possible terms in the
action (not necessarily related to anomalies in the underlying theory), that,

although invariant under a nonlinearly realized symmetry G, are not the


four-dimensional integrals of G-invariant Lagrangian densities. This question
seems to us important, as the effective field theory approach is based on our
ability to catalog all invariant terms in the action with a given number of

derivatives.
The first step is to show that even where the action is not the integral
of a G-invariant Lagrangian density, its variation with respect to the Goldstone
boson fields is an invariant density. The Goldstone boson fields $\pi^a(x)$
enter the action as a parameterization of a general spacetime-dependent
G-transformation $U(\pi(x))$, so the variation of the action under an arbitrary
change in π may be written as

$$\delta S[\pi]=\int d^4x\;\mathrm{Tr}\left\{(U^{-1}\delta U)_X\,J\right\}, \qquad (1)$$

where a subscript X or H will denote the terms proportional to the broken
and unbroken symmetry generators $x_a$ and $t_i$, respectively, and the coefficient
J is a local function of the Goldstone boson fields and their derivatives. Let
us work out how J transforms. According to the general formalism of [2],
under a global transformation g ∈ G, the Goldstone boson fields undergo the
transformation π → π′, with

$$g\,U(\pi)=U(\pi')\,h(\pi,g), \qquad (2)$$

where h(π, g) is some element of the unbroken subgroup H. Since S[π] =
S[π′] for all π, the variational derivatives are also equal

$$\frac{\delta S[\pi']}{\delta\pi^a}=\frac{\delta S[\pi]}{\delta\pi^a}.$$

(Note that the derivative is with respect to π, not π′, on both sides of the
equation.) Using Eq. (1), this is

$$\mathrm{Tr}\left\{\left[U^{-1}(\pi')\frac{\partial U(\pi')}{\partial\pi^a}\right]_XJ(\pi')\right\}=\mathrm{Tr}\left\{\left[U^{-1}(\pi)\frac{\partial U(\pi)}{\partial\pi^a}\right]_XJ(\pi)\right\}. \qquad (3)$$

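To put this in a useful form, take the derivative of Eq. (2) with respect to $\pi^a$,
and multiply on the left with $U(\pi')^{-1}$ and on the right with $h^{-1}(\pi,g)$:

$$U^{-1}(\pi')\frac{\partial U(\pi')}{\partial\pi^a}=h(\pi,g)\left[U^{-1}(\pi)\frac{\partial U(\pi)}{\partial\pi^a}\right]h^{-1}(\pi,g)-\frac{\partial h(\pi,g)}{\partial\pi^a}\,h^{-1}(\pi,g)$$

and so

$$\left[U^{-1}(\pi')\frac{\partial U(\pi')}{\partial\pi^a}\right]_X=h(\pi,g)\left[U^{-1}(\pi)\frac{\partial U(\pi)}{\partial\pi^a}\right]_Xh^{-1}(\pi,g). \qquad (4)$$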
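
For completeness, the intermediate steps can be spelled out as follows. This is a sketch that assumes, as is standard for compact G, that the broken generators can be chosen so that conjugation by an element of H maps broken generators into broken generators.

```latex
g\,\frac{\partial U(\pi)}{\partial\pi^a}
   = \frac{\partial U(\pi')}{\partial\pi^a}\,h(\pi,g) + U(\pi')\,\frac{\partial h(\pi,g)}{\partial\pi^a},
\qquad
U^{-1}(\pi')\,g = h(\pi,g)\,U^{-1}(\pi),
\\[6pt]
\Longrightarrow\quad
U^{-1}(\pi')\,\frac{\partial U(\pi')}{\partial\pi^a}
   = h(\pi,g)\left[U^{-1}(\pi)\,\frac{\partial U(\pi)}{\partial\pi^a}\right]h^{-1}(\pi,g)
     - \frac{\partial h(\pi,g)}{\partial\pi^a}\,h^{-1}(\pi,g) .
```

The last term lies in the algebra of H and so drops out of the X part, while conjugation by h maps the X and H parts of the bracket separately into themselves; taking the X part of both sides then gives (4).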

Eq. (3) then becomes

$$\mathrm{Tr}\left\{\left[U^{-1}(\pi)\frac{\partial U(\pi)}{\partial\pi^a}\right]_X\Bigl[h^{-1}(\pi,g)\,J(\pi')\,h(\pi,g)\Bigr]\right\}=\mathrm{Tr}\left\{\left[U^{-1}(\pi)\frac{\partial U(\pi)}{\partial\pi^a}\right]_XJ(\pi)\right\}. \qquad (5)$$

From linear combinations of the quantities $[U^{-1}(\pi)\,\partial U(\pi)/\partial\pi^a]_X$ we can form
arbitrary linear combinations [6] of the broken symmetry generators $x_a$, so
(5) yields the transformation rule for J:

$$J(\pi')=h(\pi,g)\,J(\pi)\,h^{-1}(\pi,g). \qquad (6)$$

Following the same arguments that led to (4), we easily see that also

$$\left[U^{-1}(\pi')\,\delta U(\pi')\right]_X=h(\pi,g)\left[U^{-1}(\pi)\,\delta U(\pi)\right]_Xh^{-1}(\pi,g), \qquad (7)$$

so $\mathrm{Tr}\{(U^{-1}\delta U)_X\,J\}$ is invariant under G.


This result leads to a natural five-dimensional formulation of the theory.
As usual, we compactify spacetime to a four-sphere $M_4$ by requiring that all
fields approach definite limits as $x^\mu\to\infty$. The operator $U(\pi(x))$ therefore
traces out a four-sphere in the manifold of G/H as $x^\mu$ varies over $M_4$. If
the homotopy group $\pi_4(G/H)$ is trivial (as is the case for SU(N) × SU(N)
spontaneously broken to SU(N) with N ≥ 3), or if $U(\pi(x))$ belongs to
the trivial element of $\pi_4(G/H)$, then we may introduce a smooth function
$\tilde\pi^a(x,t^1)$, such that $\tilde\pi^a(x,1)=\pi^a(x)$, and $\tilde\pi^a(x,0)=0$. In this way spacetime
is extended to a five-ball $B_5$ with boundary $M_4$ and coordinates $x^\mu$ and $t^1$.
The action may then be written in the five-dimensional form

$$S[\pi]=\int_{B_5}d^4x\,dt^1\;L_1, \qquad (8)$$

where $L_1$ is the G-invariant density $\mathrm{Tr}\{(U^{-1}\partial U/\partial t^1)_X\,J\}$. (When $\pi_4(G/H)\neq0$,
we may interpolate between $\pi^a(x)$ and a fixed representative $\pi_o^a(x)$ of the
homotopy class of $\pi^a(x)$. The difference $S[\pi]-S[\pi_o]$ is given by the integral
over the cylinder $M_4\times[0,1]$ of the same density $L_1$ as in (8) and the arguments
to be presented below still hold. In some cases, G/H may be naturally
embedded into a larger space with vanishing fourth homotopy group, as is
the case for SU(2) embedded in SU(3), considered in [4].)

We next show that this is the integral of a G-invariant 5-form on G/H.
Consider a general deformation π(x) → π̃(x; t), where $t^i$ are a set of dim(G/H) − 4
free parameters, that along with the $x^\mu$ provide a set of coordinates for
G/H. The coordinate $t^1$ in (8) can be chosen to be any one of these parameters.
We have shown that

$$\frac{\partial S[\tilde\pi]}{\partial t^i}=\int_{M_4}d^4x\;L_i, \qquad (9)$$

where $L_i\equiv\mathrm{Tr}\{(\tilde U^{-1}\partial\tilde U/\partial t^i)_X\,J\}$ are G-invariant functions of $\tilde\pi^a$ and its
derivatives. The general rules of [2] would allow a wide variety of terms in
$L_i$, but these are limited by integrability conditions. From (9) we see that

$$\int_{M_4}d^4x\left(\frac{\partial L_i}{\partial t^j}-\frac{\partial L_j}{\partial t^i}\right)=0 .$$

Since this integral vanishes for all π̃(x), its integrand must be an x-derivative
[7]:

$$\frac{\partial L_i}{\partial t^j}-\frac{\partial L_j}{\partial t^i}=-\partial_\mu L^\mu_{ij} .$$

This can be written in the language of differential forms, as $d_tF_1=-d_xF_2$,
where

$$d_t\equiv dt^i\,\partial_i, \qquad d_x\equiv dx^\mu\,\partial_\mu$$

and $F_1$ and $F_2$ are the five-forms

$$F_1\equiv\frac{1}{24}\,\epsilon_{\mu\nu\rho\sigma}\,L_i\,dt^i\,dx^\mu dx^\nu dx^\rho dx^\sigma, \qquad F_2\equiv\frac{1}{12}\,\epsilon_{\mu\nu\rho\sigma}\,L^\mu_{ij}\,dt^i dt^j\,dx^\nu dx^\rho dx^\sigma .$$

It follows that $0=d_t^2F_1=d_x(d_tF_2)$, so by an extension of Poincaré’s Lemma,
in any simply connected patch we will have $d_tF_2=-d_xF_3$, where $F_3$ is a
five-form $\epsilon_{\mu\nu\rho\sigma}\,L^{\mu\nu}_{ijk}\,dt^i dt^j dt^k\,dx^\rho dx^\sigma$. Continuing in this way, we can construct
five-forms $F_4$ and $F_5$ proportional respectively to four and five dt factors,
with $d_tF_3=-d_xF_4$, $d_tF_4=-d_xF_5$, and $d_tF_5=0$. Hence $F\equiv\sum_{N=1}^{5}F_N$ is a
closed five-form on G/H:

$$dF=0, \qquad d\equiv d_x+d_t . \qquad (10)$$

Also, because $B_5$ has $t^2$, $t^3$, etc., all constant, Eq. (8) may be written

$$S[\pi]=\int_{B_5}F . \qquad (11)$$

So far, only the term F1 has been shown to be G-invariant. The group

G acts transitively on the manifold G/H, so a G transform of a form is


always continuously connected to the original form. Thus the two forms
are homotopic and define the same de Rham cohomology class. One can
construct a G-invariant form in this cohomology class by integrating the

form over the group G with the invariant Haar measure [8,9]. This has
no effect on (11), since the integral depends only on F1 , which is already
invariant. Also, one can similarly show that any two invariant p-forms in
the same cohomology class differ not only by an exterior derivative, but

specifically by the exterior derivative of an invariant (p − 1)-form. Such an


exterior derivative term in the five-form F would yield a term in (8) that can
be written as the four-dimensional integral of a G invariant density, so the
classification of terms in S[π] that cannot be so written is now reduced to

the problem of finding the fifth de Rham cohomology group $H^5(G/H; R)$ of
the manifold G/H [10].
The fifth de Rham cohomology group is well known where G/H is itself a
simple Lie group. For G = SU(N) with N ≥ 3 (including the case SO(6) ∼
SU(4)), $H^5(G; R)$ has a single generator

$$\Omega_5=\frac{-i}{240\pi^2}\,\mathrm{Tr}\,(U^{-1}dU)^5 . \qquad (12)$$

(Here and henceforth, we suppress wedges in the exterior product of differ-

ential forms, reserving them for the products of cohomology groups.) This is
in particular the case for SU(N) × SU(N) spontaneously broken to SU(N)
with N ≥ 3, where G/H is itself just SU(N). Eq. (12) is the original Wess-
Zumino-Witten term, which we now see is indeed unique. All other simple

(or U(1)) Lie groups have trivial fifth cohomology groups. For the original
case [1] of SU(2) × SU(2) spontaneously broken to SU(2) the cohomology
is trivial, so all invariant actions are the integrals of invariant Lagrangian

densities.
Where $G/H$ is a product space, we use the Künneth formula [8,9]:

$$H^k(K_1\times K_2;\mathbb{R}) = \sum_{k_1+k_2=k} H^{k_1}(K_1;\mathbb{R}) \wedge H^{k_2}(K_2;\mathbb{R}) , \tag{13}$$

which gives $H^5(G/H;\mathbb{R})$ in terms of the cohomologies of its factors up to
degree 5. For this purpose, we need to know that [11,12,13] for all simple
Lie groups $G$, $H^k(G;\mathbb{R})$ vanishes for $k = 1, 2, 4$ while $H^3(G;\mathbb{R})$ has
a single generator (corresponding to the Goldstone-Wilczek topologically
conserved current [14])

$$\Omega_3 = \frac{i}{12\pi}\,{\rm Tr}\,(U^{-1}dU)^3 . \tag{14}$$

Also $H^k(U(1);\mathbb{R})$ vanishes for $k > 1$, while for $k = 1$ it has a single generator

$$\Omega_1 = -i\,{\rm Tr}\,(U^{-1}dU) . \tag{15}$$

Finally, $H^0(K;\mathbb{R}) = \mathbb{R}^c$, where $c$ is the number of connected components
of $K$; for our purposes this just means that if $H^5(K;\mathbb{R})$ for some space $K$
has a generator $\Omega_5$, then $H^5(K\times K';\mathbb{R})$ has the same generator for any $K'$.

To each generator of $H^5(G/H;\mathbb{R})$, there corresponds a WZW-like term in
the five-dimensional Lagrangian, and an independent coupling constant. In
particular, if $G$ is semi-simple, with precisely $p$ factors $SU(N_i)$ with $N_i \ge 3$
and all other simple factors with $H^5 = 0$, then we have $p$ different terms of
the Wess-Zumino-Witten type, each of which has an independent coupling
constant in the action. This result is of course expected for a product of
groups, and is known to appear explicitly in the low energy effective action
when massive fermions are integrated out of the path integral [15].
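The counting implied by the Künneth formula (13) can be illustrated numerically for the case where $G/H$ is itself a product of groups. The sketch below is not part of the original argument: it assumes the standard result that, over the reals, $SU(N)$ has the cohomology of a product of odd spheres $S^3\times S^5\times\cdots\times S^{2N-1}$ and $U(1)$ that of a circle, and it only counts generators (Betti numbers) by multiplying Poincaré polynomials. All function names are hypothetical.

```python
# Count generators of H^5 for products of groups using the Kunneth formula (13).
# Assumption (standard result, not derived in the text): the real cohomology of
# SU(n) is that of S^3 x S^5 x ... x S^(2n-1), and that of U(1) is that of S^1.

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def sphere(n):
    """Poincare polynomial 1 + t^n of the sphere S^n (n >= 1)."""
    return [1] + [0] * (n - 1) + [1]

def su(n):
    """Poincare polynomial of SU(n): product of (1 + t^(2i-1)) for i = 2..n."""
    p = [1]
    for i in range(2, n + 1):
        p = poly_mul(p, sphere(2 * i - 1))
    return p

def u1():
    """Poincare polynomial of U(1), a circle."""
    return sphere(1)

def product(*factors):
    """Kunneth formula: the Poincare polynomial of a product is the product."""
    p = [1]
    for f in factors:
        p = poly_mul(p, f)
    return p

def betti(p, k):
    """Dimension of H^k read off from a Poincare polynomial."""
    return p[k] if k < len(p) else 0

b5_su2 = betti(su(2), 5)
b5_su3 = betti(su(3), 5)
b5_su3_su3 = betti(product(su(3), su(3)), 5)
b5_su3_su2_u1 = betti(product(su(3), su(2), u1()), 5)

print("SU(2):", b5_su2)
print("SU(3):", b5_su3)
print("SU(3) x SU(3):", b5_su3_su3)
print("SU(3) x SU(2) x U(1):", b5_su3_su2_u1)
```

Under these assumptions each $SU(N_i)$ factor with $N_i \ge 3$ contributes one generator of degree 5, in line with the count of $p$ independent terms stated above.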
When $G/H$ is not itself a Lie group, the fifth cohomology group of $G/H$
may still be obtained from that of $G$. For any simple group $G$ and subgroup
$H$, we may construct a ‘projected’ five-form on $G/H$ that is invariant under
local $H$ transformations [5,15,16,17], and is given by:

$$\Omega_5(U;V) = \frac{-i}{240\pi^2}\left\{{\rm Tr}\,(U^{-1}DU)^5 - 5\,{\rm Tr}\,W(U^{-1}DU)^3 + 10\,{\rm Tr}\,W^2\,U^{-1}DU\right\} , \tag{16}$$

where $V$ is the $H$-connection $V = (U^{-1}dU)_H$, $DU$ is the $H$-covariant
derivative $DU = dU - UV$, and the trace is evaluated in any convenient
representation of $G$, usually taken as the defining representation. In general,
$\Omega_5(U;V)$ is neither closed nor simply related to the generator $\Omega_5(U;0)$
of $H^5(G;\mathbb{R})$. Rather,

$$d\Omega_5(U;V) = \frac{i}{24\pi^2}\,d_{rst}\,W^r W^s W^t \tag{17}$$

and

$$\Omega_5(U;V) = \Omega_5(U;0) + \Omega_5(1;V) + d\gamma(U;V) , \tag{18}$$

where $W$ is the field strength $W = dV + V^2$, and $d_{rst}$ is the trace of the
symmetrized product of generators $\rho_r$ of $H$, $2d_{rst} = {\rm Tr}\,\rho_r\{\rho_s,\rho_t\}$,
which plays a key role in the study of the chiral anomaly in four dimensions [18].

But if $d_{rst} = 0$, then the five-form $\Omega_5(U;V)$ is closed, and also each term in
the 5-dimensional Chern-Simons term $\Omega_5(1;V)$ for the $H$-valued gauge field
$V$ vanishes. The form $\Omega_5(U;V)$ then belongs to the same cohomology class
as $\Omega_5(U;0)$ and it can be shown that there is a one to one correspondence
between the fifth cohomology generators of $G/H$ and those of $G$. On the
other hand, if $d_{rst} \neq 0$, then the projected form of (16) is not closed and it
can be shown that the fifth cohomology is trivial in this case. For example,
any coset space of the type $SU(n)/H$ with $n \ge 3$, where $H$ is embedded in
$G$ in such a way that $d_{rst} = 0$, has one cohomology generator, given in (16).
It is noteworthy that the simple groups $SU(2)$; $Sp(2N)$; $SO(N)$, $N \ge 7$;
$E_6$, $E_7$, $E_8$; $F_4$, $G_2$ that have zero fifth cohomology are also those that
have vanishing $d$ symbols. We now see that for such groups, the coset spaces
$G/H$ have $H^5(G/H;\mathbb{R}) = 0$ for all subgroups $H$. These properties are easily
verified for the special case of compact symmetric spaces [12,13]. Also,
when rank($G$) = rank($H$), a classic theorem [13] states that all odd cohomology
classes vanish. An example of a general class of manifolds $G/H$ with
rank($H$) $\le$ rank($G$) for which $H^5(G;\mathbb{R}) \neq 0$ and $H^5(G/H;\mathbb{R}) = 0$ is
provided [12] by

$$SU(n)/S(U(k_1)\times\cdots\times U(k_q)) , \qquad k = \sum_{\alpha=1}^{q} k_\alpha \ge 3 ,$$

with the $U(k_\alpha)$ embedded in $SU(n)$ in such a way that the defining
representation of $SU(n)$ transforms also as the defining representation of
$S(U(k_1)\times\cdots\times U(k_q))$.
Finally, if $G$ is not simple, and $H$ is a non-trivial subgroup, the cohomology
problem can be solved by analyzing the restriction of the $d$-symbols of $G$
to the subgroup $H$ [19]. If $G$ is semi-simple (and $H$ is connected), two types
of cohomology generators arise [20]. First, the projected form of (16) is now
obtained as a linear combination of $\Omega_5(U;V)$ on each simple component of
$G$ with non-vanishing fifth cohomology. Linear combinations for which $d_{rst}$
on the subgroup $H$ vanishes yield generators of $H^5(G/H;\mathbb{R})$. Second, there
may be generators that are linear combinations of products of cohomology
generators on $G/H$ of degrees 2 and 3. Generators of degree 2 correspond to
the field strength associated with generators of invariant Abelian subgroups
of $H$ (i.e. $U(1)$ factors). Generators of degree 3 correspond to the
Goldstone-Wilczek current of (14), projected to $G/H$. When $G$ is not semi-simple and
contains extra $U(1)$ factors, there are also linear combinations of products of
generators of degree 1 with generators of degrees 1, 2, 3 and 4.

We conclude with a brief discussion of global quantization conditions.
Different interpolating maps are generally topologically inequivalent (their
equivalence classes being given by $\pi_5(G/H)$), and there is no natural way
of choosing one interpolation above another. Witten has argued that the
quantum action can be allowed to be multiple valued, provided the action
changes additively by integer multiples of $2\pi$ [4]. The dependence on the
interpolation becomes invisible in the quantum theory provided the coupling
constants multiplying $\Omega_5$ as normalized in (12) are integers. In the present case,
this quantization condition must be enforced on every independent coupling
constant multiplying each non-trivial WZW term normalized as in (12).

A slight refinement of this quantization condition is required when
$\pi_4(G) = 0$ and $\pi_4(H) \neq 0$. For all simple groups $H$ we have $\pi_4(H) = 0$,
except when $H$ is a symplectic group, for which $\pi_4(Sp(2n)) = Z_2$. Whenever
$\pi_4(H) = Z_2$, $H$ has a discrete anomaly [21], even though its $d$-symbols vanish
identically, and it can be shown that the coupling constant of the corresponding
term of $H^5(G/H;\mathbb{R})$ must be quantized in terms of even integers to obtain a
single-valued path integral [22].
We are glad to acknowledge several valuable conversations with E. Farhi,
who collaborated in the early stages of this work. We benefited from helpful
conversations of one of us (E. D’H.) on cohomology with V. S. Varadarajan,
and of the other (S.W.) with E. Witten. We thank H. Leutwyler for drawing
our attention to his related work in [23].

References

1. S. Weinberg, Phys. Rev. 166, 1568 (1968).

2. S. Coleman, J. Wess and B. Zumino, Phys. Rev. 177, 2239 (1969);
C.G. Callan, S. Coleman, J. Wess and B. Zumino, Phys. Rev. 177, 2247 (1969).

3. J. Wess and B. Zumino, Phys. Lett. 37B, 95 (1971).

4. E. Witten, Nucl. Phys. B223, 422 (1983).

5. Y.-S. Wu, Phys. Lett. 153B, 70 (1985); C.M. Hull and B. Spence,
Nucl. Phys. B353, 379 (1991).

6. With the exponential parameterization,
$U^{-1}(\pi)\,\partial U(\pi)/\partial\pi^a \to i x_a$ for small $\pi$, so the quantities
$[U^{-1}(\pi)\,\partial U(\pi)/\partial\pi^a]_X$ span the same space as the $x_a$ for $\pi$
in at least a finite neighborhood of the origin.

7. Here we are using a general theorem, that if the integral over a closed
manifold of a local function of a field and its derivatives vanishes for
all such fields, then the integrand must be a derivative of another local
function of the field and field derivatives. Since we do not know where
this theorem is to be found in the mathematical literature, we have
proven it by direct construction of the latter function.

8. S. Kobayashi and K. Nomizu, Foundations of Differential Geometry,
Vol II (J. Wiley & Sons, 1969).

9. B.A. Dubrovin, A.T. Fomenko and S.P. Novikov, Modern Geometry
and Applications, Vol III (Springer Verlag, 1990).

10. The above arguments may easily be carried over to the construction
of invariant actions in space-times of dimension $d$, where the allowed
non-invariant Lagrangian densities are in one to one correspondence
with the generators of $H^{d+1}(G/H;\mathbb{R})$.

11. Encyclopedic Dictionary of Mathematics, S. Iyanaga and Y. Kawada,
eds. (MIT Press, 1980).

12. W. Greub, S. Halperin and R. Vanstone, Connections, Curvature and
Cohomology, Vol III (Acad. Press, 1976).

13. A. Borel, Ann. Math. (2) 57, 115 (1953); also in A. Borel, Collected
Papers, Vol I (Springer Verlag, 1983).

14. J. Goldstone and F. Wilczek, Phys. Rev. Lett. 47, 986 (1981).

15. E. D’Hoker and E. Farhi, Nucl. Phys. B248, 59, 77 (1984).

16. S.S. Chern, Complex Manifolds without Potential Theory (Springer
Verlag, 1979).

17. B. Zumino, in Relativity, Groups and Topology II: Les Houches 1983,
B. De Witt, R. Stora, eds. (North Holland, 1984); K. Chou, H.Y. Guo,
K. Wu and X. Song, Phys. Lett. 134B, 67 (1984); J. Manes, R. Stora
and B. Zumino, Comm. Math. Phys. 102, 157 (1985); J. Manes, Nucl.
Phys. B250, 369 (1985).

18. W.A. Bardeen, Phys. Rev. 184, 1848 (1969); D.J. Gross and R.
Jackiw, Phys. Rev. D6, 477 (1972); H. Georgi, Lie Algebras in Particle
Physics (Benjamin/Cummings, 1982).

19. H. Cartan, in Colloque de Topologie, Centre Belge de Recherches
Mathématiques, Brussels 1950 (G. Thone, 1950).

20. A discussion of these results will be presented elsewhere.

21. E. Witten, Phys. Lett. 117B, 324 (1982).

22. This includes the cases $SU(2) \sim SO(3) \sim Sp(2)$ and $SO(5) = Sp(4)$.

23. H. Leutwyler, “Foundations of Chiral Perturbation Theory”, Univ.
Bern preprint, BUTP-93/24, to appear in Annals of Physics.
UTTG-16-94

Strong Interactions at Low Energies

Steven Weinberg$^1$
arXiv:hep-ph/9412326v1 20 Dec 1994

Theory Group, Department of Physics, University of Texas


Austin, TX, 78712

Effective field theories are playing an increasing role in the study of a wide
variety of physical phenomena, from W and Z interactions to superconductivity.
Regarding the subject of this talk, we have known for years that the
low energy strong interactions of nucleons and pions are well described in the
tree approximation$^2$ by an effective field theory, with Lagrangian[1]

$$\mathcal{L}_{\rm eff} = -\frac{\partial_\mu\vec\pi\cdot\partial^\mu\vec\pi}{2(1+\vec\pi^2/F_\pi^2)^2} - \frac{m_\pi^2\,\vec\pi^2}{2(1+\vec\pi^2/F_\pi^2)^2} + \bar N\left[i\partial_0 - \frac{2\,\vec t\cdot(\vec\pi\times\partial_0\vec\pi)}{F_\pi^2\,(1+\vec\pi^2/F_\pi^2)^2} - m_N - \frac{2g_A\,\vec t\cdot(\boldsymbol\sigma\cdot\nabla)\vec\pi}{F_\pi\,(1+\vec\pi^2/F_\pi^2)}\right]N - \frac{1}{2}C_S(\bar NN)^2 - \frac{1}{2}C_T(\bar N\boldsymbol\sigma N)^2 \tag{1}$$

where $g_A = 1.25$ and $F_\pi = 190$ MeV, and $C_S$ and $C_T$ are constants whose
values can be fit to the two nucleon-nucleon scattering lengths. (Spatial
vectors are boldface; arrows denote isovectors.) In this talk I will describe
some current research that takes us beyond the leading terms provided by
Eq. (1), in three different directions.

$^1$ Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY
9009850. E-mail address: weinberg@physics.utexas.edu.

$^2$ In dealing with nuclear forces, the tree approximation must be applied to the
nucleon-nucleon potential, rather than to the scattering amplitude.
1 Isospin Breaking Corrections
Using Eq. (1) in the tree approximation gives just the first term in an
expansion in powers of $q$, the typical value of the pion and nucleon
three-momenta and pion mass. Higher terms in the expansion are generated[2]
by including more derivatives in $\mathcal{L}_{\rm eff}$, each of which contributes a factor of
order $q$, or more nucleon fields, each of which contributes a factor of order
$q^{1/2}$, or more factors of $u$ and $d$ quark masses, each of which contributes a
factor of order $q^2$ (because $m_\pi^2 \propto m_u + m_d$), or loops in the Feynman graphs,
each of which contributes a factor $q^2$. These corrections have been explored
in great detail (including also strange particles), especially by Gasser and
Leutwyler[3] for pions and single nucleons, and more recently by Bernard,
Kaiser, and Meissner[4] for pion photoproduction and by Ordoñez and van
Kolck[5] for multinucleon problems. Here I want to concentrate on the quark
mass corrections, which produce violations of isospin conservation. This is
of renewed interest now, because as I learned from Aron Bernstein there
are plans to measure the $\pi^0$-nucleon scattering length, which is sensitive to
isospin violating terms in the effective Lagrangian.
The quark mass terms in quantum chromodynamics may be put in the
form

$$\mathcal{L}_{\rm mass} = -(m_u + m_d)V_4 - (m_u - m_d)A_3 \tag{2}$$

where

$$V_4 = \frac{1}{2}(\bar uu + \bar dd) , \qquad A_3 = \frac{1}{2}(\bar uu - \bar dd) . \tag{3}$$

The operators $V_4$ and $A_3$ are spatial scalars, and components of independent
chiral four-vectors $A_\alpha$ and $V_\alpha$. We must add terms to the effective Lagrangian
with these transformation properties. From just the pion field alone (with
no derivatives) we can construct no term $A_3$ and just one term $V_4$, the term
in (1) proportional to $m_\pi^2$. From pion fields and a nucleon bilinear (but no
derivatives) we can construct only one term of each type

$$V_4 \propto \left(\frac{1-\vec\pi^2/F_\pi^2}{1+\vec\pi^2/F_\pi^2}\right)\bar NN$$

$$A_3 \propto \bar Nt_3N - \frac{2}{F_\pi^2}\left(\frac{\pi_3}{1+\vec\pi^2/F_\pi^2}\right)\bar N\,\vec t\cdot\vec\pi\,N$$
where $N$ is the nucleon doublet. Therefore in the effective Lagrangian we
must include a term

$$\delta\mathcal{L}_{\rm eff} = -\frac{A}{2}\left(\frac{1-\vec\pi^2/F_\pi^2}{1+\vec\pi^2/F_\pi^2}\right)\bar NN - B\left[\bar Nt_3N - \frac{2}{F_\pi^2}\left(\frac{\pi_3}{1+\vec\pi^2/F_\pi^2}\right)\bar N\,\vec t\cdot\vec\pi\,N\right] \tag{4}$$

where $A$ and $B$ are constants proportional to the coefficients in (2):

$$A \propto m_u + m_d , \qquad B \propto m_u - m_d . \tag{5}$$

The pion-nucleon terms in (1) have single derivatives, which contribute factors
of order $q$, while the quark masses in (4) contribute factors of order $q^2$,
so the effects of (4) are leading corrections, suppressed by just one factor of
$q$.

The terms in (4) make a contribution to the scattering length for the
pion-nucleon scattering process $\pi_a + N \to \pi_b + N$ (written as a matrix in the
isospin space of the nucleon):

$$\delta a_{ba} = \frac{1}{4\pi[1 + m_\pi/m_N]} \times \left[\frac{2A}{F_\pi^2}\,\delta_{ab} + \frac{2B}{F_\pi^2}\,(t_a\delta_{3b} + t_b\delta_{3a})\right] . \tag{6}$$
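The isospin structure of (6) can be read off by expanding (4) to second order in the pion field. The following is only a sketch of that expansion, using $(1-x)/(1+x) = 1 - 2x + O(x^2)$ with $x = \vec\pi^2/F_\pi^2$; the overall sign and normalization conventions of the scattering length are not tracked here.

```latex
\begin{align*}
\delta\mathcal{L}_{\rm eff}
  &= -\frac{A}{2}\,\bar N N - B\,\bar N t_3 N
     + \frac{A}{F_\pi^2}\,\vec\pi^{\,2}\,\bar N N
     + \frac{2B}{F_\pi^2}\,\pi_3\,\bar N\,\vec t\cdot\vec\pi\,N + O(\pi^4), \\[4pt]
\frac{\partial^2}{\partial\pi_a\,\partial\pi_b}
  \left[\frac{A}{F_\pi^2}\,\vec\pi^{\,2}
        + \frac{2B}{F_\pi^2}\,\pi_3\,\vec t\cdot\vec\pi\right]
  &= \frac{2A}{F_\pi^2}\,\delta_{ab}
     + \frac{2B}{F_\pi^2}\,\bigl(t_a\,\delta_{3b} + t_b\,\delta_{3a}\bigr).
\end{align*}
```

The first two terms are nucleon mass shifts, $\delta m = A/2 + B\,t_3$, and the terms quadratic in the pion field reproduce the bracket in (6). Every pion term multiplying $B$ carries a factor of $\pi_3$.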

The A term is the notorious σ-term. The B term was also described years
ago[6]. What (I think) is new here is the full isospin-breaking term[7] in the
effective action (4), which allows us easily to calculate the effect of isospin

violations in other processes, such as π + N → π + π + N. One immediate


consequence of (4) is that isospin violation never appears in any process that
does not involve at least one neutral pion.
Inspection of (4) shows that the constants $A$ and $B$ are related to the
shifts $\delta m_n$ and $\delta m_p$ in the nucleon masses due to the quark masses:

$$\begin{aligned}
A &= \delta m_p + \delta m_n = (m_u + m_d)\,\langle p|(\bar uu + \bar dd)|p\rangle \\
B &= \delta m_p - \delta m_n = (m_u - m_d)\,\langle p|(\bar uu - \bar dd)|p\rangle .
\end{aligned} \tag{7}$$

Unfortunately the nucleon expectation value of $\bar uu + \bar dd$ is not related in
any simple way to observable quantities, so it is not possible to calculate $A$
without dynamical assumptions. On the other hand, $B$ is given by $SU(3)$
symmetry as:

$$B \simeq \left(\frac{m_u - m_d}{m_s}\right)(m_\Xi - m_\Sigma) \approx -2.5\ {\rm MeV} . \tag{8}$$

This satisfies an important consistency condition. The full proton-neutron
mass difference is the sum of $B$ and an electromagnetic term, which is almost
certainly positive, so we must have $B < m_p - m_n = -1.3$ MeV, and we do.
It will be very interesting to see if experiments on low energy $\pi^0$-nucleon
interactions confirm these predictions.
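The arithmetic behind Eq. (8) and the consistency condition can be reproduced in a few lines. The inputs below are assumptions made for illustration and are not quoted in the talk: approximate isospin-averaged hyperon masses, and a quark mass ratio $(m_u - m_d)/m_s \approx -0.02$ chosen to be of the size needed to give the quoted result.

```python
# Illustrative evaluation of Eq. (8). All numerical inputs are assumptions
# except the proton-neutron mass difference, which is quoted in the text.
m_xi = 1318.0        # MeV, approximate isospin-averaged Xi mass (assumed input)
m_sigma = 1193.0     # MeV, approximate isospin-averaged Sigma mass (assumed input)
mass_ratio = -0.02   # assumed value of (m_u - m_d)/m_s

# Eq. (8): B = [(m_u - m_d)/m_s] * (m_Xi - m_Sigma)
B = mass_ratio * (m_xi - m_sigma)

# Consistency condition: m_p - m_n = B + (electromagnetic term), with the
# electromagnetic term expected to be positive.
mp_minus_mn = -1.3   # MeV, as quoted in the text
em_term = mp_minus_mn - B

print(f"B = {B:.2f} MeV")
print(f"implied electromagnetic term = {em_term:.2f} MeV")
```

With these inputs the implied electromagnetic contribution comes out positive, which is the consistency condition stated above; different but comparable choices of the mass ratio change $B$ proportionally.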

2 General Effective Lagrangians


The structure of the effective Lagrangian (1) is dictated by its invariance
under $SU(2)\times SU(2)$ spontaneously broken to $SU(2)$, which induces on the
pion field the nonlinear symmetry transformation[8]:

$$\delta\vec\pi = \vec\epsilon\,(1 - \vec\pi^2) + 2\vec\pi\,(\vec\epsilon\cdot\vec\pi) . \tag{9}$$

This was generalized by Callan, Coleman, Wess, and Zumino[9] to any group
$G$ broken to any subgroup $H \subset G$, in which case an element $g \in G$ induces
on the general Goldstone boson fields $\pi^a$ the transformation $\pi \to \pi'$, defined
by

$$gU(\pi) = U(\pi')\,h(\pi, g) , \tag{10}$$

where $h \in H$ and $U(\pi)$ is a representative of the coset space $G/H$,
parameterized by the Goldstone boson fields. Now, we know how to construct
$G$-invariant Lagrangian densities out of covariant derivatives of $\pi^a$, but this
is not the most general possibility. We also can have a Lagrangian density
that under $G$ transformations changes by a derivative

$$\mathcal{L} \to \mathcal{L} + \partial_\mu F^\mu$$

so that the action is still invariant. Wess and Zumino[10] pointed out that
the ABJ anomaly from fermion loops yields such a term in the effective
Lagrangian for the case of $SU(3)\times SU(3)$ spontaneously broken to $SU(3)$:

$$\mathcal{L}_{\rm WZ} = \frac{1}{6\pi^2}\,\epsilon^{\mu\nu\rho\sigma}\,{\rm Tr}\,\{\Pi\,\partial_\mu\Pi\,\partial_\nu\Pi\,\partial_\rho\Pi\,\partial_\sigma\Pi\} + O(\Pi^6) \tag{11}$$

where

$$\Pi \equiv \frac{1}{2}\lambda_a\pi^a ,$$

$\lambda_a$ are the Gell-Mann matrices (with ${\rm Tr}\,\lambda_a^2 = 2$), and the coset representatives
$U(\pi)$ are chosen as

$$U(\pi) = e^{i\Pi} . \tag{12}$$
Witten[11] then showed that although the correction $\int d^4x\,\mathcal{L}_{\rm WZ}$ to the action
is not the integral of a $G$-invariant Lagrangian density over spacetime, it is the
integral of a $G$-invariant Lagrangian density $\mathcal{L}_{\rm WZW}$ over a five-dimensional
ball that has four-dimensional spacetime (Euclideanized and compactified to
a four-sphere) as its boundary. This raises the question whether there are
any other terms in the effective Lagrangian density, not necessarily related
to ABJ anomalies, that although not invariant under $G$ nevertheless yield
$G$-invariant contributions to the action. Have we been missing something?

This question has now been answered by Eric D’Hoker and myself[12],
with help at the start from Eddie Farhi. Our analysis is in four steps:

(a) As in ref. [11], we first compactify spacetime to a sphere $S_4$ by assuming
that all fields approach definite limits for $x^\mu \to \infty$. If the homotopy group
$\pi_4(G/H)$ is trivial (as is the case for $SU(N)\otimes SU(N)$ spontaneously broken
to $SU(N)$), or if $U(\pi(x))$ belongs to the trivial element of $\pi_4(G/H)$, then we
may introduce a smooth function $\tilde\pi^a(x, t_1)$, such that

$$\tilde\pi^a(x, 1) = \pi^a(x) , \qquad \tilde\pi^a(x, 0) = 0 .$$

In this way spacetime is extended to a five-ball $B_5$ with boundary $S_4$ and
coordinates $x^\mu$ and $t_1$. The action may then be written in the five-dimensional
form

$$S[\pi] = \int_{B_5} d^4x\,dt_1\,\mathcal{L}_1$$

where

$$\mathcal{L}_1 \equiv \frac{\delta I[\tilde\pi]}{\delta\tilde\pi^a(x, t_1)}\,\frac{\partial\tilde\pi^a(x, t_1)}{\partial t_1} .$$

(b) It is straightforward to show that if the action $I$ is invariant under $G$,
then the density $\mathcal{L}_1$ is also invariant under $G$. Thus any $G$-invariant term in
the action can be written in the Witten form, as a five-dimensional integral
of an invariant density.

(c) From the definition of $\mathcal{L}_1$ in terms of $\delta I/\delta\pi$, we learn not only that it is
$G$-invariant, but also that it satisfies an integrability condition, which implies
that $\mathcal{L}_1$ is a component of a $G$-invariant closed five-form $\Omega_5$. That is, in the
language of differential forms:

$$I = \int_{B_5}\Omega_5 , \qquad d\Omega_5 = 0 . \tag{13}$$

Now, if $\Omega_5$ is exact, then the four-form $F_4$ satisfying $\Omega_5 = dF_4$ can be chosen
to be $G$-invariant, in which case $I$ is the four-dimensional integral $\int_{S_4} F_4$ over
$S_4$ of a $G$-invariant Lagrangian density. Hence the allowed terms in the
four-dimensional Lagrangian density that are not $G$-invariant are in one-to-one
correspondence with closed five-forms, modulo exact five-forms. These are
the generators of the fifth de Rham cohomology $H^5(G/H;\mathbb{R})$ of the space
$G/H$.

(d) It only remains to find the five-forms that generate $H^5(G/H;\mathbb{R})$. These
are all known where $G/H$ is itself a simple group. If $G/H = SU(N)$ with
$N \ge 3$ then $H^5(G/H;\mathbb{R})$ has a single generator:

$$\Omega_5 = \frac{i}{240\pi^2}\,{\rm Tr}\left\{U^{-1}dU \wedge U^{-1}dU \wedge U^{-1}dU \wedge U^{-1}dU \wedge U^{-1}dU\right\} . \tag{14}$$

For the QCD case of $SU(3)\times SU(3)$ spontaneously broken to $SU(3)$, we have
$G/H = SU(3)$, and the unique generator (14) is the Wess-Zumino-Witten
five-dimensional Lagrangian density. So at least as far as the strong
interactions at low energy are concerned, we have not been missing anything.
Where $G/H$ is any simple Lie group other than $SU(N)$ with $N \ge 3$, the
cohomology is trivial, and so the four-dimensional Lagrangian density must be
$G$-invariant. This includes the original case of $SU(2)\times SU(2)$ spontaneously
broken to $SU(2)$, where $G/H = SU(2)$.
Of course, a great deal is known about the fifth cohomology groups
$H^5(G/H;\mathbb{R})$ even where $G/H$ is not a simple Lie group. One interesting
result is that if $G$ itself is one of the simple Lie groups other than $SU(N)$
with $N \ge 3$, then $G/H$ has trivial fifth cohomology group for any subgroup
$H \subset G$, and so the Lagrangian density must be $G$-invariant. These groups
$G$ also have vanishing triangle anomalies for fermions in any representation
of $G$, because in all representations the generators $t_\alpha$ satisfy

$${\rm Tr}\,\{t_\alpha(t_\beta t_\gamma + t_\gamma t_\beta)\} = 0$$

so here we would not have expected a Wess-Zumino-Witten term anyway.
D’Hoker is continuing with the study of general coset spaces $G/H$, to map
out the detailed relation between the possibility of ABJ anomalies and
non-$G$-invariant terms in the four-dimensional Lagrangian density. It is
remarkable that in all the cases we have studied, the possibility of anomalies could
have been discovered (or ruled out) within the effective field theory of soft
Goldstone bosons, without ever looking at a fermion loop.
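The trace condition quoted above is easy to test numerically in a given representation. The sketch below checks only the defining representations of $SU(2)$ and $SU(3)$, with generators taken as half the Pauli and Gell-Mann matrices (a normalization assumed here); it illustrates the contrast between the two groups and is not a proof of the statement for all representations. The function names are hypothetical.

```python
import itertools
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def halved(m):
    return [[0.5 * x for x in row] for row in m]

def max_symmetrized_trace(gens):
    """Largest |Tr(t_a (t_b t_c + t_c t_b))| over all triples of generators."""
    worst = 0.0
    for a, b, c in itertools.product(gens, repeat=3):
        value = trace(matmul(a, matmul(b, c))) + trace(matmul(a, matmul(c, b)))
        worst = max(worst, abs(value))
    return worst

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

r3 = 1 / math.sqrt(3)
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[r3, 0, 0], [0, r3, 0], [0, 0, -2 * r3]],
]

# Generators in the defining representations (assumed normalization t = matrix / 2).
d_su2 = max_symmetrized_trace([halved(m) for m in PAULI])
d_su3 = max_symmetrized_trace([halved(m) for m in GELL_MANN])

print("SU(2): max |Tr t{t,t}| =", d_su2)
print("SU(3): max |Tr t{t,t}| =", d_su3)
```

The symmetrized trace vanishes identically for $SU(2)$ and not for $SU(3)$, matching the statement that $SU(N)$ with $N \ge 3$ is the exceptional case.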

3 The Nonrelativistic Quark Model: Sum Rules vs Large Nc


The derivation of results of the non-relativistic quark model from quantum
chromodynamics has long remained problematical. Recently, the large $N_c$
approximation[13] has been used[14-18] to derive some of the quark model
results for baryons. Specifically, it is found that the nucleon doublet is
connected by one-pion transitions to a ‘tower’ of narrow baryon states, with
spin and isospin $J = T = 1/2, 3/2, \cdots, N_c/2$, and positive parity. According
to this picture, the amplitudes for pion transitions between the tower
states together with the spin and isospin operators form a contracted $SU(4)$
algebra[15-17], under which the baryon tower transforms irreducibly. The
one-pion transition amplitudes derived in this way are just those of the
non-relativistic quark model.
I want to point out that strikingly similar results can be derived in a very
different way. In this approach no use is made of the large $N_c$ approximation
or dynamical models like the non-relativistic quark model or Skyrme model,
beyond the qualitative assumption that there is a tower of narrow baryon
states, with spin and isospin $J = T = 1/2, 3/2, \cdots, N_c/2$, and positive parity,
that are connected only to each other by one-pion transitions. By the use
of well-known sum rules saturated with narrow tower states, we will be able
to show (1) that the tower states are degenerate$^3$, and (2) that the pion
transition amplitudes are part of an uncontracted $SU(4)\times O(3)$ Lie algebra,
under which the baryonic tower transforms as a symmetric rank-$N_c$ tensor$^4$ of
$SU(4)$ and a singlet under $SO(3)$. Here $N_c$ is any integer, including $N_c = 3$;
we keep $N_c$ a free parameter to facilitate comparison with work based on the
large $N_c$ limit. By relying hardly at all here on the large $N_c$ approximation,
this approach offers a prospect of a more convincing derivation of the main
results of the non-relativistic quark model.

$^3$ This is a somewhat surprising result. In general for large $N_c$ it is only the lower tower
states with $T = J = O(1)$ that become degenerate when $N_c \to \infty$. The work described
here shows that in order to understand the splittings of the baryon tower masses, it will
be necessary to take into account single-pion transitions from tower to non-tower states,
rather than $1/N_c$ corrections to matrix elements between tower states.

$^4$ Of course, if one also assumes that $N_c$ is large, the leading terms in the matrix elements
of the $SU(4)$ generators between low tower states in this representation will grow as $N_c$.
Since the spin and isospin matrices for these states are of order $N_c^0$, they and the
leading terms in the pion transition amplitudes for $N_c \to \infty$ will furnish a contracted
$SU(4)$ algebra, as found in references [15]-[17].

To derive these results, let's first recall the relevant sum rules and their
algebraic consequences. We usually think of spontaneously broken symmetries
like $SU(2)\times SU(2)$ as being manifested entirely in low energy theorems for
the interaction of Goldstone bosons. There are similar low-energy theorems
for the interactions of soft photons. But these low energy theorems when
married to dispersion relations yield sum rules, like the celebrated
Adler-Weisberger, Drell-Hearn, and Cabibbo-Radicati sum rules. Other sum rules
known as superconvergence relations are provided by assumptions that limit
the asymptotic behaviour of scattering amplitudes at high energy. When we
assume that these sum rules are saturated by any number of particles and
narrow resonances, and write down all the sum rules for scattering of
Goldstone bosons and/or photons not only on stable targets but also on all the
resonances, we find that they take remarkably similar algebraic forms[19-21].
This is a very old story, going back more years than I care to remember. The
new thing I want to discuss here is the solution of these sum rules under a
specific assumption about the menu of baryonic spins and isospins for general
$N_c$.

Consider pion scattering on an arbitrary hadronic target. The saturated
sum rules can be expressed in terms of the matrix elements for pion
transitions $\alpha \to \beta + \pi_a$ between stable or resonant states $\alpha$, $\beta$ with helicities $\lambda$ and
$\lambda'$. For any such transition, we can adopt a Lorentz frame in which the initial
and final states have collinear momenta $\mathbf{p}$ and $\mathbf{p}'$, say, in the 3-direction.
Using invariance under rotations around the 3-axis and boosts along the 3-axis,
these matrix elements may be written as:

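$$\langle\beta, p', \lambda'; \pi_a, q|S|\alpha, p, \lambda\rangle \equiv \frac{2(m_\alpha^2 - m_\beta^2)}{(2\pi)^{9/2}(8q^0p^0p'^0)^{1/2}F_\pi}\,[X_a(\lambda)]_{\beta\alpha}\,\delta_{\lambda'\lambda}\,\delta^4(p' + q - p) \tag{15}$$

with a coefficient $[X_a(\lambda)]_{\beta\alpha}$ that is independent of $|\mathbf{p}|$ and $|\mathbf{p}'|$. (The axial
coupling $g_A \simeq 1.25$ is just the helicity $+1/2$ proton-neutron element of the
matrix $X_1 + iX_2$.) Parity conservation tells us that

$$[X_a(-\lambda)]_{\beta\alpha} = -\Pi_\alpha\Pi_\beta\,(-1)^{J_\alpha - J_\beta}\,[X_a(\lambda)]_{\beta\alpha} \tag{16}$$

where $\Pi_\alpha$ and $J_\alpha$ are the parity and spin of the baryonic state $\alpha$. Isospin
invariance tells us that

$$[T_a, X_b(\lambda)] = i\epsilon_{abc}X_c(\lambda) \qquad {\rm where} \qquad [T_a, T_b] = i\epsilon_{abc}T_c .$$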

In this language, when all of the Adler-Weisberger sum rules for scattering
of a pion on all single-hadron states (either stable particles or narrow
resonances) are saturated with single-hadron states, these sum rules read
simply[19]

$$[X_a(\lambda), X_b(\lambda)] = i\epsilon_{abc}T_c . \tag{17}$$

Thus for each helicity the reduced pion matrix elements $X_a(\lambda)$ and the isospin
matrix $T_a$ together form an $SU(2)\times SU(2)$ algebra$^5$. There are also two
superconvergence relations that follow from the absence of $T = 2$ Regge
trajectories with $\alpha(0) > 0$ in the cross channel. One takes the form[19]

$$[X_a(\lambda), [X_b(\lambda), m^2]] \propto \delta_{ab} . \tag{18}$$

The other is a spin-flip superconvergence relation, that connects different
helicities[20]

$$X_a(\lambda\pm1)X_b(\lambda\pm1)\,m - X_a(\lambda\pm1)\,m\,X_b(\lambda) - X_b(\lambda\pm1)\,m\,X_a(\lambda) + m\,X_b(\lambda)X_a(\lambda) \propto \delta_{ab} \tag{19}$$

where $m$ is the hadronic mass matrix.
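The $SU(2)\times SU(2)$ structure asserted after Eq. (17) can be made explicit with a short check. The combinations $L_a$ and $R_a$ below are notation introduced here for illustration and do not appear in the original; the only inputs are Eq. (17) and the isospin relations $[T_a, T_b] = i\epsilon_{abc}T_c$ and $[T_a, X_b] = i\epsilon_{abc}X_c$, all at fixed helicity $\lambda$.

```latex
\begin{align*}
L_a &\equiv \tfrac12\bigl(T_a + X_a\bigr), \qquad
R_a \equiv \tfrac12\bigl(T_a - X_a\bigr), \\
[L_a, L_b] &= \tfrac14\bigl([T_a,T_b] + [T_a,X_b] + [X_a,T_b] + [X_a,X_b]\bigr)
            = \tfrac14\, i\epsilon_{abc}\bigl(2T_c + 2X_c\bigr) = i\epsilon_{abc}L_c, \\
[R_a, R_b] &= \tfrac14\bigl([T_a,T_b] - [T_a,X_b] - [X_a,T_b] + [X_a,X_b]\bigr)
            = \tfrac14\, i\epsilon_{abc}\bigl(2T_c - 2X_c\bigr) = i\epsilon_{abc}R_c, \\
[L_a, R_b] &= \tfrac14\bigl([T_a,T_b] - [T_a,X_b] + [X_a,T_b] - [X_a,X_b]\bigr) = 0 .
\end{align*}
```

Here $[X_a, T_b] = -[T_b, X_a] = i\epsilon_{abc}X_c$ has been used. The two commuting sets $L_a$ and $R_a$ each satisfy the $SU(2)$ algebra.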


To derive the results of the quark model for baryon states, we shall make
use of two lemmas, that may also have applications in other contexts.

Lemma 1: Any set of hadronic states that furnish a representation of the
commutation relations (17) and (18), in which for each helicity any given
isospin appears at most once, must be degenerate.

Proof: Eq. (18) may be written as $[X_a(\lambda), [X_b(\lambda), m^2]] = m_4^2(\lambda)\,\delta_{ab}$. By taking
the commutator of this with $X_c(\lambda)$ and using the Jacobi identity, it is easy to
see that $[X_c(\lambda), m_4^2(\lambda)] = [X_c(\lambda), m^2]$. Hence the mass-squared matrix may
be written $m^2 = m_4^2(\lambda) + m_0^2(\lambda)$, where $m_0^2(\lambda)$ is a chiral scalar, satisfying
$[X_c(\lambda), m_0^2] = 0$. Also, $m_4^2(\lambda)$ and $m_b^2(\lambda) \equiv [X_b(\lambda), m_4^2(\lambda)]$ form a chiral
four-vector, in the sense that $[X_a(\lambda), m_b^2(\lambda)] = m_4^2(\lambda)\,\delta_{ab}$. Now, since we assume
that for a given helicity, each isospin occurs just once, each isospin for a given
helicity can come from just one irreducible representation of $SU(2)\times SU(2)$.
The mass $m$ commutes with isospin, so $m^2$ can have matrix elements only
between baryonic states with the same isospin, and hence belonging to the
same representation of $SU(2)\times SU(2)$. But a $(1/2, 1/2)$ operator like $m_4^2(\lambda)$
can have no matrix elements between two states that belong to the same
irreducible representation $(A, B)$ of $SU(2)\times SU(2)$, so in this representation all
matrix elements of $m_4^2$ must vanish. This leaves us with $m^2 = m_0^2$, and since
this commutes with $X_a$ all hadron states connected by one-pion transitions
must have the same mass.

$^5$ This result can be generalized to arbitrary groups $G$ broken to arbitrary subgroups
$H$: the reduced amplitudes for Goldstone boson emission together with the unbroken
symmetry generators furnish a representation of the algebra of $G$. This holds even where
$G/H$ is not a symmetric space, i.e., where terms linear in the broken as well as the
unbroken generators appear in the commutators of the broken generators with each other.

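Lemma 2: Any set of degenerate hadronic states of the same parity
that furnish a representation of the commutation relations (17) and (19) also
furnish a representation of an $SU(4)\times O(3)$ algebra with $SU(4)$ generators
$T_a$, $S_i$, and $D_{ai}$ and $O(3)$ generators $\tilde S_i \equiv J_i - S_i$, satisfying the commutation
relations$^6$

$$[T_a, T_b] = i\epsilon_{abc}T_c , \qquad [S_i, S_j] = i\epsilon_{ijk}S_k , \qquad [T_a, S_i] = 0 \tag{20}$$

$$[T_a, D_{bi}] = i\epsilon_{abc}D_{ci} , \qquad [S_i, D_{aj}] = i\epsilon_{ijk}D_{ak} \tag{21}$$

$$[D_{ai}, D_{bj}] = i\delta_{ij}\epsilon_{abc}T_c + i\delta_{ab}\epsilon_{ijk}S_k \tag{22}$$

$$[\tilde S_i, \tilde S_j] = i\epsilon_{ijk}\tilde S_k \tag{23}$$

$$[\tilde S_i, T_a] = [\tilde S_i, S_j] = [\tilde S_i, D_{aj}] = 0 \tag{24}$$

where

$$[D_{a3}]_{\lambda'\beta,\lambda\alpha} = \delta_{\lambda'\lambda}\,[X_a(\lambda)]_{\beta\alpha} \tag{25}$$

and $J_i = \tilde S_i + S_i$ is the usual spin matrix acting on helicity indices, with

$$[J_3]_{\lambda'\beta,\lambda\alpha} = \delta_{\lambda'\lambda}\,\delta_{\beta\alpha}\,\lambda . \tag{26}$$

$^6$ We use $a$, $b$, $c$, etc. for isovector indices and $i$, $j$, $k$, etc. for spatial vector indices.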

Proof: For a transition between equal parity states, in the rest frame of the
initial particle the invariant pion transition amplitude < β, p′ , λ′ ; πa , q|S|α, p, λ >

× q 0 p′0 p0 must be an odd function of the momentum of the final particle,
whose magnitude is proportional to m_α² − m_β². Hence for m_α² − m_β² → 0, this invariant amplitude must be proportional to a linear combination of components of the final momentum vector. Rotational invariance requires that the coefficients must also form a three-vector. In particular, for p′ in the 3-direction the coefficient of p′_3, which is proportional to X_a, must be the third component of a quantity that transforms like a spatial 3-vector and an isovector, in the sense that δ_{λ′,λ} [X_a(λ)]_{βα} = [D_a3]_{λ′β,λα}, where [T_a, D_bi] = iε_abc D_ci and [J_i, D_aj] = iε_ijk D_ak. Next, we must consider the commutation relations of the D_ai with each other. The commutators of two D's may be written as a sum of two terms, one symmetric in space indices and antisymmetric in isospin indices, and the other vice versa:

\[ [D_{ai}, D_{bj}] = i\epsilon_{abc} A_{ij,c} + i\epsilon_{ijk} B_{ab,k} \]

with A_ij,c = A_ji,c and B_ab,k = B_ba,k. The commutation relation (17) now takes the form A_33,a = T_a. From rotational invariance (or formally, by taking repeated commutators with J_i), we easily see then that A_ij,a = δ_ij T_a. To find B_ab,k, we must use the spin-flip superconvergence relation (19), which can be rewritten in the form [D_a3, [D_b3, m(J_1 ± iJ_2)]] ∝ δ_ab, or, since the states connected by D_ai are degenerate, [D_a3, D_b1 ± iD_b2] ∝ δ_ab. It follows that B_ab,2 and B_ab,1 are proportional to δ_ab, and by rotational invariance the same is true of B_ab,3, so⁷ B_ab,i = δ_ab S_i, verifying Eq. (22). From (22), we have S_i = −iε_ijk [D_aj, D_ak]/6. Using (22) again to calculate the commutator of this with D_bj, we easily obtain the commutator [S_i, D_bj] = iε_ijk D_bk, and using this together with the above expression for S_j in terms of the D's, we also find [S_i, S_j] = iε_ijk S_k, verifying the remainder of the commutators (20)-(21). We have already mentioned that D_aj is a 3-vector, in the sense that [J_i, D_aj] = iε_ijk D_ak, so the same is true of S_j. It follows then that S̃_i ≡ J_i − S_i satisfies the commutation relations (23) and (24).
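
As a sanity check on the algebra just derived, the commutators can be verified numerically in the simplest quark-model realization. The sketch below assumes a single quark with S_i = σ_i/2, T_a = τ_a/2 and D_ai = σ_i τ_a/2; this normalization and all function names are choices made here for illustration, not taken from the text. It checks that [D_ai, D_bj] = iε_abc δ_ij T_c + iε_ijk δ_ab S_k and that S_i = −iε_ijk [D_aj, D_ak]/6.

```python
from itertools import product

# Pauli matrices as nested lists, used for both spin (sigma_i) and isospin (tau_a).
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
ID2 = [[1, 0], [0, 1]]
ZERO4 = [[0] * 4 for _ in range(4)]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (spin factor first, isospin second)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def matmul(a, b):
    return [[sum(a[r][m] * b[m][c] for m in range(4)) for c in range(4)]
            for r in range(4)]


def add(a, b):
    return [[a[r][c] + b[r][c] for c in range(4)] for r in range(4)]


def scale(x, a):
    return [[x * entry for entry in row] for row in a]


def comm(a, b):
    """Commutator [a, b] of two 4x4 matrices."""
    return add(matmul(a, b), scale(-1, matmul(b, a)))


def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(4) for c in range(4))


def eps(i, j, k):
    """Levi-Civita symbol for indices drawn from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


# Assumed single-quark realization: S_i = sigma_i/2, T_a = tau_a/2, D_ai = sigma_i tau_a/2.
S = [scale(0.5, kron(PAULI[i], ID2)) for i in range(3)]
T = [scale(0.5, kron(ID2, PAULI[a])) for a in range(3)]
D = [[scale(0.5, kron(PAULI[i], PAULI[a])) for i in range(3)] for a in range(3)]  # D[a][i]


def dd_algebra_holds():
    """Check [D_ai, D_bj] = i eps_abc delta_ij T_c + i eps_ijk delta_ab S_k."""
    for a, i, b, j in product(range(3), repeat=4):
        rhs = ZERO4
        for c in range(3):
            rhs = add(rhs, scale(1j * eps(a, b, c) * (i == j), T[c]))
            rhs = add(rhs, scale(1j * eps(i, j, c) * (a == b), S[c]))
        if not close(comm(D[a][i], D[b][j]), rhs):
            return False
    return True


def spin_from_d(i):
    """Rebuild S_i from S_i = -i eps_ijk [D_aj, D_ak] / 6 (sum over a, j, k)."""
    total = ZERO4
    for a, j, k in product(range(3), repeat=3):
        total = add(total, scale(-1j * eps(i, j, k) / 6, comm(D[a][j], D[a][k])))
    return total
```

Calling `dd_algebra_holds()` and comparing `spin_from_d(i)` with `S[i]` exercises both relations. The check covers only this one realization; it illustrates the algebra and does not replace the general argument from the sum rules.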

To apply Lemma 1 to the tower states, we note that for each helicity λ, the tower contains isospins T = |λ|, |λ| + 1, ..., N_c/2. Since each isospin occurs just once, for a given helicity each isospin can come from just one irreducible representation of SU(2) × SU(2). Therefore according to Lemma 1, the tower states must be degenerate.

Lemma 2 then tells us that the pion transition amplitudes are part of an SU(4) × O(3) algebra. It is easy to see that the baryons transform under SU(4) × O(3) as a symmetric SU(4) tensor⁸ of rank N_c and an SO(3) singlet, because this is the only representation of SU(4) × O(3) that contains just the spins and isospin states of the baryon tower. The matrix elements of D_ai in this representation may be calculated by representing it by Σ σ_i t_a, just as in the non-relativistic quark model. Thus once we assume that general Adler-Weisberger and superconvergence sum rules are saturated by the tower states, the other consequences of the non-relativistic quark model for pion transitions and g_A follow immediately from these sum rules, with no further need for the large N_c approximation.

⁷ This result was obtained in reference [19] by a weak argument, that the algebra containing the pion transition amplitudes should not contain any T = 2 operators. The present argument, based on the superconvergence relation (19), avoids this hand-waving.

⁸ There are actually two of these representations, the contravariant and covariant tensors of rank N_c. They differ only in the sign of D_ai, so we can choose either of these representations by adjusting the sign of the one-pion state.
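
A quick dimension count gives a consistency check (not a proof) on the identification of the baryon tower with the rank-N_c symmetric SU(4) tensor. The sketch below assumes that the tower contains one multiplet with J = T for each J = 1/2, 3/2, ..., N_c/2, with N_c odd, and compares the number of spin-isospin states with the dimension (N_c+1)(N_c+2)(N_c+3)/6 of the symmetric tensor. The function names are illustrative.

```python
from math import comb


def tower_state_count(n_c):
    """Count states of a tower with J = T = 1/2, 3/2, ..., n_c/2 (n_c odd).

    Each multiplet contributes (2J + 1) spin states times (2T + 1) isospin states.
    """
    if n_c < 1 or n_c % 2 == 0:
        raise ValueError("this sketch assumes an odd, positive number of colors")
    return sum((two_j + 1) ** 2 for two_j in range(1, n_c + 1, 2))


def symmetric_su4_dimension(rank):
    """Dimension of the totally symmetric rank-`rank` tensor of SU(4)."""
    return comb(rank + 3, 3)


if __name__ == "__main__":
    for n_c in (1, 3, 5, 7):
        print(n_c, tower_state_count(n_c), symmetric_su4_dimension(n_c))
```

For N_c = 3 both counts give 20, the nucleon (4 states) plus the Δ (16 states).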
All of the above results apply also for baryons that contain some number N_h of heavy quarks. Since pion transition amplitudes and baryon masses are independent of the spin 3-component of the heavy quarks, it is only necessary to replace N_c everywhere above with N_c − N_h.

I am grateful for helpful conversations about the large N_c approximation with Howard Georgi, Vadim Kaplunovsky, and Aneesh Manohar.

References

1. The terms in (1) involving pions alone or pions and a single nucleon bilinear were given by S. Weinberg, Phys. Rev. Lett. 18, 188 (1967). It was realized later that the terms involving C_S and C_T make contributions of the same order in small energies: S. Weinberg, Phys. Lett. B251, 288 (1990); Nucl. Phys. B363, 3 (1991).

2. S. Weinberg, Physica 96A, 327 (1979).

3. J. Gasser and H. Leutwyler, Phys. Lett. 125B, 321, 325 (1985); Ann. Phys. 158, 142 (1984).

4. V. Bernard, N. Kaiser, and U-G. Meissner, Nucl. Phys. B383, 442 (1992); Ulf-G. Meissner, Lectures delivered at the XXXII. Internationale Universitätswochen für Kern- und Teilchenphysik, Schladming, February 24 - March 6, 1993, hep-ph/9303298.

5. C. Ordóñez and U. van Kolck, Phys. Lett. B291, 459 (1992); C. Ordóñez, L. Ray, and U. van Kolck, Phys. Rev. Lett. 72, 1982 (1994).

6. S. Weinberg, Transactions of the N. Y. Academy of Sciences 38, 185 (1977).

7. After this talk was presented, H. Leutwyler informed me of a paper by A. Krause, Helv. Phys. Acta 63, 3 (1990). This paper gave an SU(3) × SU(3) effective Lagrangian that includes a mass term from which it would be possible to derive the B term in (4), but did not consider the implications of this Lagrangian for pion nucleon interactions.

8. S. Weinberg, Phys. Rev. 166, 1568 (1968).

9. S. Coleman, J. Wess and B. Zumino, Phys. Rev. 177, 2239 (1969); C.G. Callan, S. Coleman, J. Wess and B. Zumino, Phys. Rev. 177, 2247 (1969).

10. J. Wess and B. Zumino, Phys. Lett. 37B, 95 (1971).

11. E. Witten, Nucl. Phys. B223, 422 (1983).

12. E. D'Hoker and S. Weinberg, UCLA-Texas preprint, to be published in Physical Review D.

13. G. 't Hooft, Nucl. Phys. B72, 461 (1974); S. Coleman, in Aspects of Symmetry (Cambridge University Press, Cambridge, 1985).

14. E. Witten, Nucl. Phys. B160, 57 (1979).

15. J.-L. Gervais and B. Sakita, Phys. Rev. Lett. 52, 87 (1984).

16. R. Dashen and A. V. Manohar, Phys. Lett. B315, 425, 438 (1993); E. Jenkins, Phys. Lett. B315, 431, 447 (1993); R. Dashen, E. Jenkins, and A. V. Manohar, Phys. Rev. D 49, 4713 (1994).

17. C. D. Carone, H. Georgi, and S. Osofsky, Phys. Lett. B322, 227 (1994); C. D. Carone, H. Georgi, L. Kaplan, and D. Morin, Harvard preprint HUTP-94/A008 (1994).

18. A. Wirzba, M. Kirchbach, and D. O. Riska, Darmstadt-Helsinki preprint (1993), hep-ph/9311299.

19. S. Weinberg, Phys. Rev. 177, 2604 (1969). Some algebraic errors in the Appendix to this paper are corrected here.

20. S. Weinberg, Phys. Rev. Lett. 22, 1023 (1969).

21. S. Weinberg, in Lectures on Elementary Particles and Quantum Field Theory, 1970 Brandeis University Summer Institute on Theoretical Physics (M.I.T. Press, Cambridge, MA, 1970), pp. 285-393.

RIMS-1036
UTTG-18-95

Are Nonrenormalizable Gauge Theories Renormalizable?

arXiv:hep-th/9510087v2 15 Mar 1996

Joaquim Gomis^a
Research Institute for Mathematical Sciences
Kyoto University, Kyoto 606-01, JAPAN

Steven Weinberg^b
Theory Group, Department of Physics, University of Texas
Austin, TX, 78712, USA
weinberg@physics.utexas.edu

Abstract — We raise the issue whether gauge theories, that are not renor-
malizable in the usual power-counting sense, are nevertheless renormalizable
in the modern sense that all divergences can be cancelled by renormalization
of the infinite number of terms in the bare action. We find that a theory is
renormalizable in this sense if the a priori constraints that we impose on the
form of the bare action correspond to the cohomology of the BRST transfor-
mations generated by the action. Recent cohomology theorems of Barnich,
Brandt, and Henneaux are used to show that conventionally nonrenormal-
izable theories of Yang-Mills fields (such as quantum chromodynamics with
heavy quarks integrated out) and/or gravitation are renormalizable in the
modern sense.

^a Permanent address: Dept. d'Estructura i Constituents de la Matèria, University of Barcelona; gomis@ecm.ub.es.
^b Research supported in part by the Robert A. Welch Foundation and NSF Grants PHY 9009850 and PHY 9511632.

1. Introduction
There are two senses in which we may say that a theory is perturbatively renormalizable. The first is that the theory satisfies the old Dyson criterion, that the Lagrangian density should contain only operators of dimensionality four or less.¹ This condition is a necessary (though not sufficient) requirement for infinities to be cancelled with only a finite number of terms in the Lagrangian. Even with this condition violated, it still may be possible that all divergences are cancelled by renormalization of the terms in the Lagrangian, but that an infinite number of terms are needed. Despite the presence of an infinite number of free parameters, such theories have a good deal of predictive power (specifically, all the predictive power in the S-matrix axioms of unitarity, analyticity, etc., together with whatever symmetries are imposed on the theory) and can be used to carry out useful perturbative calculations.²

Today it is widely believed that all our present realistic field theories are actually accompanied by interactions that violate the Dyson criterion. The standard model is presumably what we get when we integrate out modes of very high energy from some unknown theory, perhaps a string theory, and like any other effective field theory its Lagrangian density contains terms of arbitrary dimensionality, though the terms in the Lagrangian density with dimensionality greater than four are suppressed by negative powers of very large masses. Likewise for general relativity; there is no reason to believe that the Einstein-Hilbert action is the whole story, but all terms in the action with more than two derivatives are suppressed by negative powers of a very large mass, perhaps the Planck mass. Even if we were to take seriously the idea that, say, the strong interactions are described by a fundamental gauge theory whose Lagrangian contains only terms of dimensionality four or less, nevertheless in calculations of processes at a few GeV we would use an effective field theory with heavier quarks integrated out, and such an effective theory necessarily involves terms in the Lagrangian of unlimited dimensionality. Similarly, although modern string theories have been generally based on two-dimensional field theories that are renormalizable in the Dyson sense, there is some interest in including terms in the action that violate this condition.³
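
The suppression of higher-dimensional terms described above can be made explicit by a schematic power-counting estimate. The following is only a sketch: the heavy scale M, the dimensionless coefficients c_i and the operator labels O_i are symbols introduced here for illustration, not notation from the text.

```latex
% Schematic effective Lagrangian; M, c_i and O_i are illustrative symbols.
\mathcal{L}_{\rm eff} = \mathcal{L}_{d\le 4}
  + \sum_i \frac{c_i}{M^{d_i-4}}\,\mathcal{O}_i , \qquad d_i > 4 ,
% so that at a typical energy E << M one insertion of O_i is suppressed,
% relative to the d <= 4 terms, by roughly
\left(\frac{E}{M}\right)^{d_i-4} .
```

To any fixed accuracy in E/M only a finite number of the O_i can contribute, which is why the infinite set of parameters does not destroy predictive power.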
The second, ‘modern,’ sense in which a theory may be said to be renormalizable is that the infinities from loop graphs are constrained by the symmetries of the bare action in such a way that there is a counterterm available to absorb every infinity. Unlike the Dyson criterion, this condition is absolutely necessary for a theory to make sense perturbatively. It is automatically satisfied if the only limitations imposed on the terms in the bare action arise from global, linearly realized symmetries. The difficulty in satisfying this condition appears when we impose nonlinearly realized symmetries or gauge symmetries on the bare action. Nonlinearly realized symmetries of the bare action are in general not symmetries of the quantum effective action, while gauge symmetries must be eliminated in quantizing the theory. A BRST symmetry⁴ does survive the gauge fixing, but it is nonlinearly realized, so that even though the quantum effective action respects a BRST symmetry, it is not the same as the BRST symmetry of the bare action.


The question of whether gauge theories are renormalizable in the modern sense was originally answered only in the context of theories that are renormalizable in the Dyson sense.⁵ These proofs relied on a brute force enumeration of the possible terms in the quantum effective action of dimensionality four or less, and it was not obvious that these proofs of renormalizability could be extended to Lagrangian densities that contain terms of unlimited dimensionality. This is what is meant by the question asked in the title of this article.∗

Section 2 discusses the ‘structural constraints’ that are imposed on the bare action in specifying a gauge symmetry. Section 3 outlines our method for addressing the question of renormalizability by the use of the antibracket formalism.⁷,⁸ We find there that renormalizability in the modern sense is guaranteed if the structural constraints imposed on the action are chosen in correspondence with the cohomology of the antibracket transformation generated by the bare action. (The renormalizability of theories with nonlinearly realized global symmetries can be dealt with by the same formalism, but with spacetime-independent ghost fields.) In Section 4 we use recently proved cohomology theorems⁹ to show that theories of Yang-Mills fields and/or gravitation are renormalizable in the modern sense, even though we allow terms in the Lagrangian of arbitrary dimensionality. But we shall see that the matching of structural constraints with antibracket cohomologies is only a sufficient, not a necessary, condition for renormalizability. Cohomology theorems give the candidates for ultraviolet divergences or anomalies; a perturbative calculation is needed to see whether the divergences or anomalies actually occur. In fact, in Section 4 we shall encounter terms in the cohomology of the antibracket operator that do not correspond to actual infinities.

∗ To avoid possible confusion, we should distinguish between our aims in this paper and earlier efforts⁶ to make general relativity and other theories renormalizable in the Dyson sense by including higher derivative terms (such as terms bilinear in the curvature) in the unperturbed Lagrangian. Such efforts lead to problems with unitarity at the energies at which the renormalized momentum-space integrals begin to converge. In contrast, we accept the conventional way of splitting the Lagrangian into unperturbed and interaction terms, so that the unperturbed Lagrangian correctly describes the particle content of the theory, and no problems with unitarity arise in perturbation theory. Our aim here is not to restore renormalizability in the Dyson sense, but to learn how to live without it.
There are other cohomology theorems¹⁰ that can be applied to ‘first-quantized’ string theories. The question of the renormalizability of supergravity and superstring theories remains open, but can be studied by the methods of antibracket cohomology. It would be reassuring to prove that all these theories are renormalizable in the modern sense, but even more interesting if some were not, for then renormalizability could again be used, as we used to think that the Dyson power-counting condition could be used, as a criterion for selecting physically acceptable theories.
Our discussion does not pretend to be mathematically rigorous. In particular we work with infinite quantities without explicit consideration of possible regulators, and simply assume that there is some way of introducing a regulator that does not produce anomalies that would invalidate our arguments. This is no problem in Yang-Mills theories that are free of anomalies in one-loop order because of the nature of the gauge group rather than because of cancellations among different fermion multiplets. In such theories the cohomology theorem of reference 9 shows that the gauge symmetries are free of anomalies to all orders, without regard to the dimensionality of the Lagrangian. Theories with U(1) factors may present special difficulties.¹¹
Before proceeding, we wish to comment on earlier work on the renormalization of general gauge theories, most of which was brought to our attention after the circulation of an earlier version of this paper. Dixon¹² and then Voronov, Tyutin, and Lavrov¹³ generalized the ideas of Zinn-Justin⁷ by introducing a canonical transformation of fields and antifields as well as an order-by-order renormalization of coupling constants. They emphasized theories that are renormalizable in the Dyson sense, but Voronov, Tyutin, and Lavrov briefly considered more general theories. More recently, Anselmi¹⁴ has further analyzed the issue of renormalization in gauge theories that are not renormalizable in the Dyson sense. He also uses a canonical transformation as well as coupling constant renormalization to cancel infinities, and notes the possibility that cohomological restrictions might force a weakening of what we here call ‘structural constraints,’ but his motivation is different from ours; he expresses the view that theories with infinite numbers of free parameters are not ‘predictive,’ and explains that his purpose is to find a framework for reducing the infinite number of free parameters in such theories to a finite number. Also, Harada, Kugo, and Yamawaki¹⁵ have recently studied certain aspects of the renormalization of a conventionally non-renormalizable gauge theory (a gauge-invariant formulation of a non-linear sigma model), using a generalization of the Zinn-Justin algorithm. In contrast with these earlier references, we aim here at showing how to use gauge theories with infinite numbers of free parameters as realistic field theories. Apart from our different motivation, we also give a more explicit discussion of the necessity of the possible structural constraints imposed on the bare action, which are used here to deal with the obstructions that arise, for example, for gauge groups with U(1) factors. Our demonstration that renormalizability follows from cohomology is not limited to any specific choice of structural constraints, but only assumes that these are chosen in correspondence with the infinite terms in the BRST-cohomology of the theory, whatever that might be. Where some other assumptions make this impossible, the theory must be regarded as truly unrenormalizable.

2. Structural Constraints
Our first step is to consider how to constrain the bare action to implement local symmetries. The bare action is taken to be a local functional∗∗ S0[Φ, Φ∗] of a set of fields Φ^n, including some set of ‘classical’ (matter and gauge) fields φ^r, ghosts ω^A, and perhaps ghosts for ghosts, etc., as well as ‘non-minimal’ fields (antighosts ω̄^A, auxiliary fields h^A, and perhaps extraghosts), and of a corresponding set of antifields Φ∗_n, which have statistics opposite to Φ^n. The bare action is assumed to satisfy the quantum master equation†

\[ (S_0, S_0) - 2i\hbar\,\tilde\Delta S_0 = 0 , \qquad (1) \]

which incorporates all local symmetries as well as the associated commutation relations, Jacobi identities, etc.⁸ Here (F, G) is the antibracket

\[ (F, G) \equiv \frac{\delta F}{\delta_R\Phi^n}\,\frac{\delta G}{\delta_L\Phi^*_n} - \frac{\delta F}{\delta_R\Phi^*_n}\,\frac{\delta G}{\delta_L\Phi^n} , \qquad (2) \]

with L and R denoting differentiation from the left and right, respectively, and ∆̃S0 is defined by the differential operator

\[ \tilde\Delta S_0 \equiv \frac{\delta^2 S_0}{\delta_L\Phi^n\,\delta_R\Phi^*_n} . \qquad (3) \]

(This is usually called ∆; the tilde is added to distinguish this from a symbol ∆ introduced later.) We further suppose that various global, linearly realized symmetries are imposed, including Lorentz invariance and ghost number conservation. From now on it should be understood that we also impose the usual conditions on the antibrackets of the action with the non-minimal fields ω̄^A and h^A and their antifields.

∗∗ In a sense the bare action is not local, because it is the integral of an infinite power series in the fields and their derivatives, rather than of a polynomial in fields and field derivatives. Bare actions of this sort may be regarded as perturbatively local, in the sense that, to any given order of perturbation theory (whether in small couplings or small energies), only a finite number of terms in the bare action contribute.

† In the original version of this work, we made the stronger assumption that both terms in Eq. (1) vanish. Both Lavrov and Tyutin¹³ and Anselmi¹⁴ considered theories that satisfy only the quantum master equation (1).
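
One standard way to read Eq. (1) is as the statement that the weight exp(iS0/ħ) is annihilated by the second-order operator ∆̃. The sketch below assumes the usual Batalin-Vilkovisky conventions, in which ∆̃ e^W = [∆̃W + ½(W, W)] e^W for a bosonic functional W; these conventions are an assumption made here, since the text does not spell them out.

```latex
% Assumes the standard identity \tilde\Delta e^{W} = [\tilde\Delta W + (W,W)/2] e^{W}
% for bosonic W, applied with W = i S_0 / \hbar.
\tilde\Delta\, e^{iS_0/\hbar}
  = \left[\frac{i}{\hbar}\,\tilde\Delta S_0
      - \frac{1}{2\hbar^2}\,(S_0,S_0)\right] e^{iS_0/\hbar}
  = -\frac{1}{2\hbar^2}\Big[(S_0,S_0) - 2i\hbar\,\tilde\Delta S_0\Big]\, e^{iS_0/\hbar} .
```

Under these assumptions the bracket on the right vanishes exactly when Eq. (1) holds, and the stronger assumption mentioned in the footnote corresponds to the two terms vanishing separately.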
If these were the only constraints imposed on the action then the theory would automatically be renormalizable in the modern sense, because as we shall see in the next section the infinite part of the quantum effective action in any order would satisfy the same constraints as the allowed changes in the counterterms in the bare action. But not all theories are renormalizable in this sense. One very familiar example of a theory that is not renormalizable in the modern sense is one in which we arbitrarily set some parameter (such as the (φ†φ)² coupling in the electrodynamics of a charged scalar φ) equal to zero or any finite value. We are concerned here rather with what we shall call ‘structural constraints’: the constraints that tell us what gauge symmetries are respected by the theory.
The structural constraints can be of various types:

(a) The usual structural constraints require the bare action S0 to consist of a term I[φ] that depends only on the ‘classical’ (gauge and matter) fields and is invariant under some prescribed set of local symmetry transformations, plus appropriate terms depending also on a limited number of antifield factors, whose number and structure are constrained by the master equation. For instance, for a theory with a closed irreducible gauge algebra like Yang-Mills theory or general relativity the action would be linear in antifields with one ghost ω^A and antighost ω̄^A for each gauge symmetry:

\[ S_0[\Phi, \Phi^*] = I[\phi] + \omega^A C_A{}^r[\phi]\,\phi^*_r + \tfrac12\,\omega^A \omega^B C^C{}_{AB}[\phi]\,\omega^*_C - \bar\omega^*_A h^A , \qquad (4) \]

where I[φ] is invariant under the infinitesimal transformation φ^r → φ^r + ε^A C_A^r[φ], and C^C_AB[φ] is the structure constant for these transformations. (We are using a ‘De Witt notation,’ in which indices like A and r include a spacetime coordinate which is integrated in sums over these indices.) For supergravity without auxiliary fields the action would be quadratic in antifields.

(b) Instead of imposing a fixed gauge symmetry on a theory, we can impose a symmetry with a fixed number of generators and fixed commutation relations, but with the effect of the symmetry transformations on the classical fields left arbitrary. For instance, in the case of an irreducible closed gauge symmetry the action would take the form (4), but with the transformation functions C_A^r[φ] otherwise arbitrary.†† This case provides an illustration of the fact that when we make a change ∆S0 in the bare action, the structural constraints apply to S0 + ∆S0 rather than to ∆S0 itself. In particular, ∆I[φ] is not necessarily invariant under the original gauge transformation φ^r → φ^r + ε^A C_A^r[φ], but I[φ] + ∆I[φ] is always required to be invariant under the transformation φ^r → φ^r + ε^A (C_A^r[φ] + ∆C_A^r[φ]).

†† For instance, instead of the usual isospin matrices t_i representing the algebra of SU(2) we can take the generators of the SU(2) gauge transformations to be linear combinations O_ij t_j. As long as the matrix O_ij is real, orthogonal, and unimodular, this will not change the SU(2) structure constants. In this case, the change in the gauge transformations is the same as would be produced by a redefinition of the gauge fields. The cohomology theorem⁹ used in Section 4 shows that in all semisimple Yang-Mills theories and gravitational theories any infinitesimal change in the transformation functions C_A^r[φ] is the same as would be produced by a redefinition of fields and antifields together with a corresponding change in I[φ], but this is not the case in general. For instance, changing the ratios of the coupling constants of various particles to a U(1) gauge field would change the U(1) transformation rules in a way that could not be absorbed into a renormalization of the gauge field, while of course leaving the structure constants zero.

(c) We might weaken the structural constraints further, assuming only that the bare action is a polynomial of a given order in the antifields. For instance, if we required that the action is linear in antifields and involves only the fields φ^r, ω^A, ω̄^A, and h^A and their antifields, then it would have to take the general form (4), but with unspecified coefficients C_A^r[φ] and C^C_AB[φ]. In this case the master equation would require that the action I[φ] is invariant under the transformations φ^r → φ^r + ε^A C_A^r[φ], which form a closed irreducible algebra with structure constants C^C_AB[φ], but we would not be specifying in advance what this gauge symmetry algebra is or how it is represented on the matter fields, except in so far as we specify the transformation of C_A^r[φ] and C^C_AB[φ] under global linear symmetries.
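
For structural constraints of type (a), pure Yang-Mills theory gives a concrete instance of the general form (4). The identification below is a sketch under assumed conventions: the gauge field A^a_μ, the structure constants f^a_bc, the absorption of the coupling constant into the fields, and all overall signs are choices made here and are not fixed by the text.

```latex
% Illustrative identification for pure Yang-Mills theory; A^a_\mu, f^a{}_{bc}
% and the sign and coupling conventions are assumptions.
\phi^r \to A^a_\mu(x) , \qquad
\omega^A C_A{}^r[\phi] \to D_\mu\omega^a
   = \partial_\mu\omega^a + f^a{}_{bc}\,A^b_\mu\,\omega^c , \qquad
C^C{}_{AB} \to f^a{}_{bc}\ \text{(times spacetime delta functions)} ,
% so that the general form (4) becomes
S_0 = I[A] + \int d^4x \left[ D_\mu\omega^a\, A^{*\mu}_a
      + \tfrac12\, f^a{}_{bc}\,\omega^b\omega^c\,\omega^*_a
      - \bar\omega^*_a\, h^a \right] .
```

Here I[A] may contain gauge-invariant terms of arbitrary dimensionality. Constraints of type (b) would keep f^a_bc fixed while leaving the form of the transformation D_μω^a open, and type (c) would leave both unspecified.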

One convenient aspect of structural constraints of types (a) and (b) is that we can reverse the connection between the master equation and the gauge symmetry: an action of the form (4) will automatically satisfy the quantum master equation as long as (1) I[φ] is invariant under the transformations φ^r → φ^r + ε^A C_A^r[φ] with structure constants C^C_AB[φ], and (2) a gauge-invariant regulator is used to define integrals over fields, so that ∆̃S0 = 0. The same is true when we consider the deformed action I[φ] + ∆I[φ] and require invariance under the deformed gauge transformations φ^r → φ^r + ε^A (C_A^r[φ] + ∆C_A^r[φ]). This is not true of structural constraints of type (c); merely assuming that the action is of some definite order in antifields does not lead to the master equation. We will not need to assume here that the structural constraints imply the master equation. We will however assume (as is true of all the constraints discussed above) that the structural constraints are chosen to be linear conditions on possible changes in the action; if S0 + A and S0 + B both satisfy the structural constraints, then so does S0 + αA + βB for arbitrary constants α and β. Until Section 4 we will not be otherwise specific about the structural constraints to be adopted.
It is these structural constraints that create a potential problem for renormalizability, for in general they will not be respected by ultraviolet divergent terms in the quantum effective action. The quantum effective action will not even always satisfy restrictions on the number of antifield factors, so that, for example, a bare action with a closed gauge algebra may yield a quantum effective action with an open gauge algebra.¹³ Structural constraints arise from our fundamental assumptions about the sort of theory we wish to study, but to be physically sensible they must not constrain a theory so severely that they prevent the cancellation of ultraviolet divergences. Our problem is to decide what structural constraints satisfy this condition. As we shall see in the next section, this is a matter of matching the cohomology of the antibracket operation generated by the bare action. Structural constraints of type (a) turn out to be adequate to deal with general relativity and semisimple gauge theories. We would need structural constraints of type (b) to deal with the candidate divergences that arise when the gauge group has U(1) factors, but as we shall see these candidate divergences do not correspond to actual infinities. On the other hand, first-quantized string theories require structural constraints weaker than those of type (a). In considering structural constraints other than those of type (a) and (b), it is intriguing that here we confront the possibility that gauge symmetries may be less fundamental than the antibracket formalism from which they can be derived.

3. Renormalization in General Gauge Theories


We begin with an outline of the antibracket approach to the renormal-
ization of theories with local symmetries, presented here in a way that is
independent of the specific structural constraints imposed on the theory.

A) In analogy with the renormalization of fields in conventionally renormalizable theories like quantum electrodynamics, in order for infinities to cancel here we need to perform a general canonical transformation Φ → Φ′(Φ, Φ∗), Φ∗ → Φ′∗(Φ, Φ∗) of fields and antifields. By a canonical transformation is meant any transformation that preserves the antibracket structure

\[ (\Phi'^{n}, \Phi'^{*}_{m}) = \delta^n_m , \qquad (\Phi'^{n}, \Phi'^{m}) = (\Phi'^{*}_{n}, \Phi'^{*}_{m}) = 0 , \qquad (5) \]

which ensures that antibrackets of general functionals can be calculated in terms of Φ′^n and Φ′∗_n, in the same way as in terms of Φ^n and Φ∗_n. The action S0[Φ, Φ∗] if expressed in terms of the transformed fields becomes a different functional S0′[Φ′, Φ′∗] ≡ S0[Φ, Φ∗], given by S0′[Φ′, Φ′∗] = S0[Φ′, Φ′∗; 1], where S0[Φ, Φ∗; t] is defined by the differential equation

\[ \frac{d}{dt} S_0[\Phi, \Phi^*; t] = \Big( F[\Phi, \Phi^*; t],\, S_0[\Phi, \Phi^*; t] \Big) \qquad (6) \]

with initial condition

\[ S_0[\Phi, \Phi^*; 0] = S_0[\Phi, \Phi^*] , \qquad (7) \]

where F[Φ, Φ∗; t] is an arbitrary fermionic functional of ghost number −1. Since the generator F of the canonical transformation contains terms of arbitrary dimensionality, the bare action S0′[Φ′, Φ′∗] will not generally have any simple dependence on the transformed antifields Φ′∗.

B) As a basis for perturbation theory, we must separate out a finite ‘renormalized’ zeroth-order action S from the transformed bare action S0′, with the remainder regarded as a sum of corrections proportional to powers of a ‘loop-counting’ parameter ħ, with divergent coefficients. The correction term ∆S′ = S0′ − S receives contributions both from the counterterm ∆S ≡ S0 − S in the original bare action, and also from the field-antifield-renormalization canonical transformation in step A. To be specific, suppose we write the original bare action as a power series in ħ:

\[ S_0 = S + \hbar\Delta_1 + \tfrac12\hbar^2\Delta_2 + \cdots . \qquad (8) \]

The generator F(t) of the canonical transformation (6) may similarly be written as a power series

\[ F(t) = \hbar t F_1 + \tfrac12\hbar^2 t^2 F_2 + \cdots . \qquad (9) \]

Eqs. (6) and (7) then give the transformed bare action as

\[ S_0' = S + \hbar\Big[\Delta_1 + (F_1, S)\Big] + \tfrac12\hbar^2\Big[\Delta_2 + 2(F_1, \Delta_1) + (F_2, S) + (F_1, (F_1, S))\Big] + \cdots . \qquad (10) \]

The renormalized action S is taken to have the same form as the original bare action S0, satisfying the same structural constraints (including the same limitations on its dependence on antifields), only with finite instead of infinite coefficients. Also, since it can be regarded as the limit of S0 for ħ = 0, it satisfies the classical master equation

\[ (S, S) = 0 , \qquad (11) \]

with the antibracket calculated in terms of either the original or the canonically transformed fields and antifields.

C) To carry out quantum mechanical calculations of expectation values, Green's functions, etc., it is necessary to fix a gauge by taking the antifields as functions of the fields. This is usually done by taking the antifields in the form

\[ \Phi^*_n = \frac{\delta\Psi(\Phi)}{\delta\Phi^n} + K_n \qquad (12) \]

where Ψ is a local fermionic functional of Φ, and K_n is an external field, held constant in the path integral. It is important to recognize that the same relation then applies to the transformed antifields

\[ \Phi'^{*}_{n} = \frac{\delta\Psi'(\Phi', K)}{\delta\Phi'^{n}} + K_n , \qquad (13) \]

but with a different (and K-dependent) gauge-fixing fermionic functional Ψ′. We do not know whether a proof of this result has been published, so a proof is given in an appendix to this paper. An observable O will be unaffected by small changes in Ψ, provided it is gauge invariant, in the sense that (O, S) − iħ∆̃O = 0.¹⁶

D) Following the same reasoning as used originally by Zinn-Justin,⁷ the quantum effective action Γ(Φ, K) satisfies the master equation

\[ (\Gamma, \Gamma) = 0 , \qquad (14) \]

with antibrackets calculated using K_n in place of the antifield of Φ′^n. But the variables Φ′^n and Φ′∗_n are related to Φ′^n and K_n by a canonical transformation, so we can just as well regard Γ as a functional of Φ′^n and Φ′∗_n, satisfying a master equation (14) with the antibracket calculated in terms of these variables.

In lowest order, Γ is the same as S, and is therefore finite. Suppose that through cancellations of infinities between loop diagrams and the counterterm S0′ − S, all infinities in Γ cancel up to some given order N − 1 in coupling parameters. Then in order N, the infinite part of the master equation constrains the infinite part Γ_{N,∞} of the N-th order term in Γ by

\[ (S, \Gamma_{N,\infty}) = 0 . \qquad (15) \]

Because (S, S) = 0, the mapping X ↦ (S, X) is nilpotent, so that the nature of the solutions of Eq. (15) can be determined with the help of appropriate cohomology theorems.
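
The nilpotency used here follows in one line from the graded Jacobi identity for the antibracket. The form below is a sketch: the overall sign depends on conventions that the text does not fix, which is why it is written with ±, and it does not affect the conclusion.

```latex
% Graded Jacobi identity for the antibracket with bosonic S, written up to a
% convention-dependent overall sign:
\big(S, (S, X)\big) = \pm\tfrac12\,\big((S, S), X\big) = 0
\quad\text{when}\quad (S, S) = 0 .
% Hence any X of the form (S, H) automatically solves (S, X) = 0, and the
% remaining solutions of Eq. (15) are classified by the cohomology of the
% map X \mapsto (S, X).
```

This is what allows the infinite part of Γ to be analyzed order by order as a cohomology problem rather than by direct enumeration of terms.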

E) We shall now suppose that for some given choice of the structural constraints discussed in Section 2, we can prove a cohomology theorem, that any local functional X which is S-closed (in the sense that (S, X) = 0), and is invariant under the same linearly realized global symmetries (including ghost number conservation and Lorentz invariance) as S, may be expressed as

\[ X = G + (S, H) \qquad (16) \]

where G is a local functional for which S + G satisfies the same structural constraints as S, and H is a local fermionic functional, with both G and H satisfying the same linearly realized global symmetries as S. Eq. (15) tells us that Γ_{N,∞} is S-closed, and it automatically is invariant under the same linearly realized global symmetries as S, so it satisfies the conditions of this theorem. The cohomology theorem will be applied below not to Γ_{N,∞} itself, but to a term in Γ_{N,∞} that also satisfies these conditions.

Eq. (10) shows that in N-th order S0′ will contain terms (F_N, S) and ∆_N, which make additive contributions to Γ_{N,∞}, and which do not depend on the terms in F and S0 that appear in Γ_M for M < N. We must now inquire whether ∆_N and F_N can be chosen to cancel the infinities in Γ_N.
Because the structural constraints are supposed to be satisfied by S0 for

all h̄, and are assumed to be linear, they are also satisfied by S + ∆N . Now,
apart from these constraints, and invariance under linearly realized global
symmetries, the only limitation on our freedom to choose the N- th order
counterterm ∆N in the original bare action is that it should not invalidate the

master equation. For the structural constraints of type (a) and (b) discussed
in Section 2, this is not much of a limitation, since the quantum master
equation (1) automatically follows from these structural constraints, provided
we use a gauge-invariant regulator. But for future use we also wish to

consider the more general case, where the master equation must be imposed
on S0 independently of the structural constraints. Since S0 is supposed to
satisfy the master equation for all values of the loop-counting parameter h̄,

the counterterms ∆N are required to satisfy a sequence of equations


$$(S, \Delta_N) = -\frac{1}{2}\sum_{M=1}^{N-1} (\Delta_M, \Delta_{N-M}) + 2i\,\tilde{\Delta}\Delta_{N-1} . \qquad (17)$$

These conditions on ∆N are not the same as the condition (S, ΓN,∞ ) = 0 on
the infinite part of ΓN .


This is no problem. Suppose we find a solution of the equations (17) up
to order N,‡ which satisfies the structural constraints. We may write the
N-th order term in the general solution as

$$\Delta_N = \Delta^0_N + \Delta'_N \qquad (18)$$

where $\Delta^0_N$ is any particular solution satisfying Eq. (17) (and such that
$S + \Delta^0_N$ satisfies the structural constraints), and $\Delta'_N$ is subject only to
the conditions that $S + \Delta'_N$ must satisfy the structural constraints and any
linearly realized global symmetries, and

$$(S, \Delta'_N) = 0 . \qquad (19)$$

We may write the infinite N-th order terms in Γ as

$$\Gamma_{N,\infty} = \Delta'_{N,\infty} - (S, F_{N,\infty}) + X_{N,\infty} \qquad (20)$$

where $X_N$ consists of terms from loop graphs, as well as from the term $\Delta^0_N$
and various terms in Γ that involve $\Delta_M$ and $F_M$ for M < N. For instance,
for N = 2 Eq. (10) gives

$$X_2 = \Delta^0_2 + 2(F_1, \Delta_1) + (F_1, (F_1, S)) + \text{two-loop terms involving only } S + \text{one-loop terms involving } S,\ \Delta_1 \text{ and } F_1 .$$

‡ The reader may be bothered by the question of how we know that these equations
can be solved. It is true that if these equations are satisfied up to order N − 1, then
the right-hand side $R_N$ of the equation for $\Delta_N$ does satisfy the condition $(S, R_N) = 0$,
but we cannot find solutions of the equation $(S, \Delta_N) = R_N$ for arbitrary $R_N$ satisfying
$(S, R_N) = 0$ unless the cohomology (known as $H^1(S|d)$, where d denotes the exterior
derivative) of the antibracket operation X ↦ (S, X) on the local functionals X of ghost
number +1 is trivial, which is not generally the case. (The condition $H^1(S|d) = 0$ would
also rule out anomalies, but it is not a necessary condition for the theory to be anomaly
free. Even for $H^1(S|d) \neq 0$, anomalies can cancel among different fermion multiplets, as
is the case in the standard electroweak theory.) Fortunately, we are not trying to solve
the equations $(S, \Delta_N) = R_N$ for arbitrary $R_N$ satisfying $(S, R_N) = 0$, but only for the
particular functionals that appear on the right-hand side of equations (17). The existence
of such solutions is guaranteed by the assumption that the structural constraints allow the
master equation to be solved for all values of h̄.

For our purposes the only thing we need to know about XN is that it does
not involve ∆′N or FN , and that it is invariant under any linearly realized
global symmetries of S. It follows from Eqs. (15), (19), and (20) that

(S, XN,∞ ) = 0 . (21)

Hence the hypothesized cohomology theorem would allow us to write XN,∞ in
the form (16):

XN,∞ = GN + (S, HN ) , (22)

where GN is a local functional for which S + GN satisfies the same structural


constraints as S, and HN is a local fermionic functional, with both GN and

HN invariant under the same linearly realized global symmetries as S. Since


∆′N and FN are local functionals that can be varied independently of XN ,
subject only to the conditions that they are invariant under linearly realized
global symmetries, that S + ∆′N satisfies the same structural constraints, and

that (S, ∆′N ) = 0, they can be chosen so that

∆′N,∞ = −GN , FN,∞ = HN . (23)

According to Eq. (20), this eliminates the infinities in the quantum effective
action to order N. Continuing this process allows a step-by-step construction

of a counterterm ∆S and canonical transformation generator F that render


the quantum effective action finite to all orders.
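
As a consistency check, the cancellation at order N can be written out in one line by substituting Eqs. (22) and (23) into Eq. (20). No ingredients beyond those three equations are assumed, apart from the nilpotency of the antibracket mapping noted earlier:

```latex
\Gamma_{N,\infty}
  = \Delta'_{N,\infty} - (S, F_{N,\infty}) + X_{N,\infty}
  = -G_N - (S, H_N) + \bigl[\, G_N + (S, H_N) \,\bigr]
  = 0 .
% The choice \Delta'_{N,\infty} = -G_N is admissible only if (S, G_N) = 0,
% as Eq. (19) requires. This follows from Eqs. (21) and (22):
% 0 = (S, X_{N,\infty}) = (S, G_N) + (S, (S, H_N)) = (S, G_N) .
```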

4. Cohomology Theorems

The previous section shows how to use cohomology theorems to prove

the renormalizability of various ‘nonrenormalizable’ gauge theories. As an
example of such a cohomology theorem, we note that Barnich, Brandt, and
Henneaux⁹ have recently shown that if S is the action of a semisimple
Yang-Mills theory, or of gravitation, or both together, which of course has ghost


number zero and is linear in antifields, then the most general local functional
X of ghost number zero that satisfies the condition (S, X) = 0 may be
written as a local gauge-invariant functional G[φ] of the ‘classical’ (gauge and

matter) fields alone, so that in our language S + G[φ] satisfies the structural
constraints, plus a term of the form (S, H). Then by the reasoning of the
previous section, we may eliminate all infinities in the quantum effective
action by adjusting the counterterms in S0 −S to cancel G[φ], and performing

a suitable canonical transformation on the fields and antifields to cancel


(S, H).
Gauge theories with U(1) factors require special consideration. Reference
9 shows that in this case the most general local functional X of ghost number
zero that satisfies the condition (S, X) = 0 may be written as a local
gauge-invariant functional G[φ] of the ‘classical’ fields alone, plus a term of
the form (S, H), plus a term of the form‡‡

$$\int A_\mu(x)\, j^\mu(x)\, d^4x + \text{terms linear in } \phi^*_r , \qquad (24)$$

where $j^\mu(x)$ is the gauge-invariant current associated with any symmetry of
the action, and $A_\mu(x)$ is the U(1) gauge field (supposing for simplicity that
there is only one). If $j^\mu(x)$ is the same current to which $A_\mu(x)$ is coupled in the
bare action, then a term like (24) can be compensated by a renormalization
of the field $A_\mu(x)$ and a corresponding renormalization of the antifield $A^{*\mu}(x)$,
which is one example of the canonical transformations discussed in Step A
of the previous section.

‡‡ There are additional complications⁹ in theories with certain exotic couplings between
matter and gauge fields. We will not go into this here, because such theories do not seem
to be of physical interest.

On the other hand, if the action respects a global symmetry in addition
to the U(1) gauge symmetry, then $j^\mu(x)$ can be the current associated with
that global symmetry, and in this case the cohomology includes terms whose
antifield-independent part is only gauge-invariant ‘on-shell,’ that is, when the
field equations are satisfied. Thus if infinite terms of the form (24) actually
appeared in the quantum effective action, with $j^\mu(x)$ a conserved current
other than that to which $A_\mu(x)$ was originally coupled, then the structural
constraint we used for semisimple gauge theories, that the bare action has
the form (4) with I[φ] off-shell invariant under a prescribed transformation
$\phi^r \to \phi^r + \epsilon^A C^r_A[\phi]$, would not lead to a renormalizable theory. In this case
we would have to use the weaker structural constraint of type (b) discussed
in Section 2, that the action is of the form (4), with the transformation
functions $C^r_A[\phi]$ specified only as to their number and structure constants
(in this case zero). The counterterms in the bare action would then only
be constrained by the condition that they are linear in antifields, do not
invalidate the master equation, and do not change the structure constants,
which in this case are zero.♮ Thus such counterterms could be used to cancel
infinite terms in the quantum effective action of the form (24).
It does not seem that infinities of the form (24), with $j^\mu(x)$ a conserved
current other than that to which $A_\mu(x)$ was originally coupled, actually
appear in the quantum effective action. We have not checked this by direct
calculation, but such infinite terms would represent a change in the mixture
of fermion currents to which long-wave photons couple, and this is prohibited
by the Ward soft-photon theorem. It is not necessary for us to settle this
question, because we have shown that any infinities of form (24) are cancelled
by renormalization of the parameters in the U(1) gauge transformation, but
this seems to be a case where the candidate divergences presented by
cohomology theorems are not actually divergent.


An even clearer case of this sort is presented by theories containing a
set of free U(1) gauge fields $A^b_\mu(x)$.♮♮ The cohomology of the antibracket
operator also includes the terms

$$f_{abc} \int dx \left[ F^{\nu\mu a} A^b_\mu A^c_\nu + 2 A^{*\mu}_a A^b_\mu \omega^c + \omega^*_a \omega^b \omega^c \right] , \qquad (25)$$

where $f_{abc}$ are totally antisymmetric constants. If these corresponded to
actual divergences we would have to weaken the structural constraints so that
not even the structure constants were prescribed in advance, leaving open the
possibility that the fields $A^b_\mu(x)$ transform under a non-Abelian gauge group.
But here it is quite clear that the terms in Eq. (25) are not produced by
radiative corrections; no radiative corrections can give interactions to a field
that does not interact to begin with.

♮ As already noted in Section 2, the antifield-independent term I[φ] + ∆I[φ] is not
required by these structural constraints and the master equation to be invariant under
the original gauge transformations $\phi^r \to \phi^r + \epsilon^A C^r_A$, but only under the modified
gauge transformations $\phi^r \to \phi^r + \epsilon^A (C^r_A + \Delta C^r_A)$, so that

$$\frac{\delta \Delta I[\phi]}{\delta \phi^r}\, C^r_A = - \frac{\delta \left( I[\phi] + \Delta I[\phi] \right)}{\delta \phi^r}\, \Delta C^r_A ,$$

which only requires that ∆I[φ] should be invariant under the original gauge transformation
$\phi^r \to \phi^r + \epsilon^A C^r_A$ when the field equations are satisfied.

♮♮ We are grateful to F. Brandt for suggesting this to us.

A recent cohomology theorem of Brandt, Troost, and Van Proeyen¹⁰
shows that it is also necessary to weaken the structural constraints in
dealing with first-quantized string theories, that is, with gravitation coupled
to scalar matter in two dimensions. If the Liouville field is explicitly
introduced, the analysis of ref. 17 shows that the cohomology of S contains terms
corresponding to a change in the action of its local symmetries, though not
of their algebra, so here one should impose a structural constraint of type
(b). Analogous comments apply to the spinning string.¹⁸


The possibility of weakening the structural constraints may become useful
in applications to other theories. It is important to find out whether super-
gravity and general superstring theories are renormalizable in the modern

sense, and for this purpose we need to know the cohomology generated by
the bare action of these theories.

Acknowledgments We are grateful for helpful conversations with C.


Becchi, F. Brandt, D. Buchholz, M. Henneaux, and J. Pons.

Appendix

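We wish to prove that if

$$\Phi^*_n = \frac{\delta\Psi(\Phi)}{\delta\Phi^n} + K_n , \qquad (26)$$

then canonically transformed variables $\Phi'^n$ and $\Phi'^*_n$ satisfy a relation of the
same form

$$\Phi'^*_n = \frac{\delta\Psi'(\Phi', K)}{\delta\Phi'^n} + K_n , \qquad (27)$$

though generally with a different (and K-dependent) fermionic functional
Ψ′ ≠ Ψ. It is only necessary to show that this is true for infinitesimal
canonical transformations, which are of the form

$$\Phi'^n = \Phi^n + (F, \Phi^n) = \Phi^n - \left(\frac{\delta F}{\delta\Phi^*_n}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K} , \qquad (28)$$

$$\Phi'^*_n = \Phi^*_n + (F, \Phi^*_n) = \Phi^*_n + \left(\frac{\delta F}{\delta\Phi^n}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K} , \qquad (29)$$

where F[Φ, Φ∗] is an infinitesimal fermionic functional. Continuity then
implies that the same will be true for finite canonical transformations, in at
least a finite region around the unit transformation.
To prove Eq. (27), we note that Eqs. (26) and (29) yield

$$\Phi'^*_n = \frac{\delta\Psi}{\delta\Phi^n} + \left(\frac{\delta F}{\delta\Phi^n}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K} + K_n . \qquad (30)$$

The derivative of Ψ with respect to Φ may be expressed in terms of its
derivative with respect to Φ′, using Eq. (28) to write

$$\frac{\delta_L \Phi'^n}{\delta\Phi^m} = \delta^n_m - \frac{\delta_L}{\delta\Phi^m}\left(\frac{\delta F}{\delta\Phi^*_n}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K} . \qquad (31)$$

Using this in Eq. (30) and keeping only terms of first order in F gives

$$\Phi'^*_n = \frac{\delta\Psi}{\delta\Phi'^n}
 - \frac{\delta\Psi}{\delta\Phi^m}\, \frac{\delta_L}{\delta\Phi^n}\left(\frac{\delta F}{\delta\Phi^*_m}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K}
 + \frac{\delta F_{\Phi^* = \delta\Psi/\delta\Phi + K}}{\delta\Phi^n}
 - \left(\frac{\delta_L}{\delta\Phi^n}\frac{\delta\Psi}{\delta\Phi^m}\right)\left(\frac{\delta F}{\delta\Phi^*_m}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K}
 + K_n . \qquad (32)$$

To first order in F this has the same form as the desired result (27), with

$$\Psi' = \Psi - \frac{\delta\Psi}{\delta\Phi^m}\left(\frac{\delta F}{\delta\Phi^*_m}\right)_{\Phi^* = \delta\Psi/\delta\Phi + K} + (F)_{\Phi^* = \delta\Psi/\delta\Phi + K} . \qquad (33)$$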

References

1. F.J. Dyson, Phys. Rev. 75 (1949), 486, 1736.

2. S. Weinberg, Physica 96A (1979), 327. For reviews of more recent
work, see H. Leutwyler, in Proceedings of the XXVI International
Conference on High Energy Nuclear Physics, Dallas, 1992, ed. by J.
Sanford (American Institute of Physics, New York, 1993): 185; U.
G. Meissner, Rep. Prog. Phys. 56 (1993), 903; A. Pich, Valencia
preprint FTUV/95-4, February 1995, to be published in Reports
on Progress in Physics; J. Bijnens, G. Ecker, and J. Gasser, in The
Daphne Physics Handbook, Vol. 1, eds. L. Maiani, G. Pancheri, and N.
Paver (INFN, Frascati, 1995): Chapters 3 and 3.1; G. Ecker, preprint
hep-ph/9501357, to be published in Progress in Particle and Nuclear
Physics, Vol. 35 (Pergamon Press, Oxford).

3. A. Polyakov, Nucl. Phys. B268 (1986), 406; J. Polchinski and A.
Strominger, Phys. Rev. Lett. 67 (1991), 1681.

4. C. Becchi, A. Rouet, and R. Stora, Comm. Math. Phys. 42 (1975),
127; in Renormalization Theory, ed. by G. Velo and A. S. Wightman
(Reidel, Dordrecht, 1976); Ann. Phys. 98 (1976), 287; I. V. Tyutin,
Lebedev Institute preprint N39 (1975).

5. B. W. Lee and J. Zinn-Justin, Phys. Rev. D5 (1972), 3121, 3137;
Phys. Rev. D7 (1972), 1049; G. 't Hooft and M. Veltman, Nucl. Phys.
B50 (1972), 318; B. W. Lee, Phys. Rev. D9 (1974), 933.

6. See, e.g., R. Utiyama and B. De Witt, J. Math. Phys. 3 (1962), 608;
S. Weinberg, in Proceedings of the XVII International Conference on
High Energy Nuclear Physics (Rutherford Laboratory, 1974), p. III-59;
S. Deser, in Gauge Theories and Modern Field Theory, ed. by R.
Arnowitt and P. Nath (M.I.T. Press, Cambridge, 1976); K. S. Stelle,
Phys. Rev. D16 (1977), 953.

7. J. Zinn-Justin, in Trends in Elementary Particle Theory - International
Summer Institute on Theoretical Physics in Bonn 1974 (Springer-Verlag,
Berlin, 1975).

8. I. A. Batalin and G. A. Vilkovisky, Phys. Lett. B102 (1981), 27; Nucl.
Phys. B234 (1984), 106; J. Math. Phys. 26 (1985), 172. For a review,
see J. Gomis, J. París and S. Samuel, Phys. Rep. 259 (1995), 1.

9. G. Barnich and M. Henneaux, Phys. Rev. Lett. 72 (1994), 1588; G.
Barnich, F. Brandt, and M. Henneaux, Phys. Rev. D51 (1995), R1435;
Brussels–Amsterdam preprint ULB-TH-94/07, NIKHEF-H 94-15, to
be published in Comm. Math. Phys.; Brussels–Leuven preprint
KUL-TF-95/16, ULB-TH-95/07.

10. F. Brandt, W. Troost, and A. Van Proeyen, Leuven preprint
KUL-TF-95/17 (September 1995).

11. G. Bandelloni, C. Becchi, A. Blasi, and R. Collina, Ann. Inst. Henri
Poincaré A 28 (1978), 15.

12. J. Dixon, Nucl. Phys. B99 (1975), 420.

13. B. L. Voronov and I. V. Tyutin, Theor. Math. Phys. 50 (1982), 218;
52 (1982), 628; B. L. Voronov, P. M. Lavrov, and I. V. Tyutin, Sov.
J. Nucl. Phys. 36 (1982), 292; P. M. Lavrov and I. V. Tyutin, Sov. J.
Nucl. Phys. 41 (1985), 1049.

14. D. Anselmi, Class. and Quant. Grav. 11 (1994), 2181; 12 (1995), 319.

15. M. Harada, T. Kugo, and K. Yamawaki, Prog. Theor. Phys. 91 (1994),
801.

16. M. Henneaux and C. Teitelboim, Quantization of Gauge Systems
(Princeton University Press, Princeton, 1992): Section 18.1.4; P. M. Lavrov
and I. V. Tyutin, ref. 13.

17. J. Gomis and J. París, Nucl. Phys. B341 (1994), 378.

18. J. Gomis, K. Kamimura, and R. Kuriki, Barcelona-Tokyo preprint
UB-ECM-PF 95/22, TOHO-FP-9553, to be published (1995).

Theories of the Cosmological Constant
Steven Weinberg∗
arXiv:astro-ph/9610044v1 7 Oct 1996

Physics Department, University of Texas


Austin, Texas 78712
weinberg@physics.utexas.edu
August 29, 1996

Abstract: This is a talk given at the conference Critical Dialogues in
Cosmology at Princeton University, June 24–27, 1996. It gives a brief summary of
our present theoretical understanding regarding the value of the cosmological
constant, and describes how to calculate the probability distribution of the
observed cosmological constant in cosmological theories with a large number of
subuniverses (i.e., different expanding regions, or different terms in the wave
function of the universe) in which this constant takes different values.

UTTG-10-96

∗ Research supported in part by the Robert A. Welch Foundation and NSF Grants PHY

9009850 and PHY 9511632.

1 Introduction
The problem of the cosmological constant looks different to astronomers and
particle physicists. Astronomers may prefer the simplicity of a zero cosmologi-
cal constant, but they are also prepared to admit the possibility of a cosmological
constant in a range extending up to values that would make up most of the criti-
cal density required in a spatially flat Robertson–Walker universe. To a particle
physicist, all the values in this observationally allowed range seem ridiculously
implausible.
To see why, it is convenient to consider the effective quantum field theory
that takes into account only degrees of freedom with energy below about 100
GeV, with all higher energy radiative corrections buried in corrections to the
various parameters in the effective Lagrangian. In this effective field theory, the
vacuum energy density that serves as a source of the long-range gravitational
field may be written as
$$\rho_V = \frac{\Lambda}{8\pi G} + \frac{1}{2}\sum \hbar\omega , \qquad (1)$$
where Λ is the cosmological constant appearing in the Einstein field equations,
and the second term symbolizes the contribution of quantum fluctuations in
the fields of the effective field theory, cut off at particle energies equal to 100
GeV. Now, we know almost everything about this effective field theory — it
is what particle physicists call the standard model — and we know that the
quantum fluctuations do not cancel, so that on dimensional grounds, in units
with h̄ = c = 1, they yield
$$\frac{1}{2}\sum \hbar\omega \approx (100\ \mathrm{GeV})^4 . \qquad (2)$$
On the other hand, observations do not allow ρV to be much greater than the
critical density, which in these units is roughly 10⁻⁴⁸ GeV⁴. Not to worry:
just arrange that the Einstein term Λ has a value for which the two terms in
Eq. (1) cancel to fifty-six decimal places. This is the cosmological constant
problem: to understand this cancellation.
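
The size of the required cancellation follows from simple arithmetic. The sketch below uses only the two figures quoted above, (100 GeV)⁴ for the zero-point term and 10⁻⁴⁸ GeV⁴ for the critical density; the variable names are illustrative.

```python
import math

# Figures quoted in the text, in units with hbar = c = 1 (energies in GeV).
zero_point_density = 100.0 ** 4   # (100 GeV)^4, the estimate in Eq. (2)
critical_density = 1e-48          # GeV^4, rough observational ceiling on rho_V

# Number of decimal places to which the two terms in Eq. (1) must cancel.
decimal_places = math.log10(zero_point_density / critical_density)
print(f"cancellation needed to about {decimal_places:.0f} decimal places")
```

The ratio of the two densities is 10⁵⁶, which is where the figure of fifty-six decimal places comes from.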
Here I will consider three main directions for solving this problem[1]:
• Deep Symmetries
• Cancellation Mechanisms
• Anthropic Constraints
By a ‘deep symmetry’ I mean some new symmetry of an underlying theory,
which is not an unbroken symmetry of the effective field theory below 100 GeV
(because we know all these symmetries), but which nevertheless requires ρV to
vanish. In other contexts supersymmetry can sometimes play the role of a deep
symmetry, in the sense that some dimensionless bare constants that are required

to vanish by supersymmetry can be shown to vanish to all orders in perturbation
theory even though supersymmetry is spontaneously broken. Unfortunately the
vacuum density is not a constant of this sort: it has dimensionality (mass)⁴
instead of being dimensionless, and it is a renormalized coupling rather than a
bare coupling. Recently Witten has proposed a highly imaginative and specu-
lative mechanism by which some form of supersymmetry makes ρV vanish[2]. I
am grateful to the organizing committee of this conference for giving me only
15 minutes to talk, so that I don’t have to try to explain Witten’s idea. I turn
instead to the other two approaches on my list.

2 Cancellation Mechanisms
The special thing about having ρV = 0 is that it makes it possible to find
spacetime-independent solutions of the Einstein gravitational field equations.
For such solutions, we have
∂L/∂gµν = 0 , (3)
where L is the Lagrangian density for constant fields. The problem occurs
only in the trace of this equation, which receives a contribution from ρV which
for ρV ≠ 0 prevents a solution. Many theorists have tried to get around this
difficulty by introducing a scalar field φ in such a way that the trace of ∂L/∂gµν
is proportional to δL/δφ:

gµν ∂L/∂gµν = f (φ)δL/δφ , (4)

with f (φ) arbitrary, except for being finite. Where this is done, the existence
of a solution of the field equation δL/δφ = 0 for a spacetime-independent φ
implies that the trace gµν ∂L/∂gµν = 0 of the Einstein field equation for a
spacetime-independent metric is also satisfied. The trouble is that, with these
assumptions, the Lagrangian has such a simple dependence on φ that it is not
possible to find a solution of the field equation for φ. This is because Eq. (4),
together with the general covariance of the action ∫ d⁴x L, tells us that, when
the action is stationary with respect to variations of all other fields, it has a
symmetry under the transformations

δgλν = 2εgλν , δφ = −εf (φ) , (5)

which requires the Lagrangian density for spacetime-independent fields gµν and
φ to have the form
$$L = c\,\sqrt{\det g}\; \exp\left( 4 \int^{\phi} \frac{d\phi'}{f(\phi')} \right) , \qquad (6)$$

where c is a constant whose value depends on the lower limit chosen for the
integral. For c ≠ 0, there is no solution at which this is stationary with respect

to φ. The literature is full of proposed solutions of the cosmological constant
problem based on this sort of spontaneous adjustment of one or more scalar
fields, but if you look at them closely, you will see that either they do not satisfy
Eq. (4), in which case there may be a solution for φ but it does not imply the
vanishing of ρV , or else they do satisfy Eq. (4), in which case a solution of the
field equation for φ would imply a vanishing ρV , but there is no solution of the
field equation for φ. To the best of my knowledge, no one has found a way out
of this impasse.
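
The absence of a stationary point can be read off directly from Eq. (6). The following worked step assumes only Eq. (6), a nonsingular metric, and the finiteness of f(φ):

```latex
% Differentiating Eq. (6) with respect to \phi:
\frac{\partial L}{\partial \phi} = \frac{4}{f(\phi)}\, L .
% For c \neq 0 and finite f(\phi) the right-hand side never vanishes,
% so \partial L / \partial \phi = 0 has no solution.
%
% Check of the symmetry (5) in four dimensions:
%   \delta g_{\lambda\nu} = 2\epsilon\, g_{\lambda\nu}
%     \;\Rightarrow\; \delta \sqrt{\det g} = 4\epsilon \sqrt{\det g} ,
%   \delta\phi = -\epsilon f(\phi)
%     \;\Rightarrow\; \delta \exp\Bigl(4\int^{\phi} d\phi'/f(\phi')\Bigr)
%                     = -4\epsilon \exp\Bigl(4\int^{\phi} d\phi'/f(\phi')\Bigr) ,
% so the two variations cancel and \delta L = 0.
```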

3 Anthropic Considerations
Suppose that the observed subuniverse is only one of many subuniverses, in
which ρV takes a variety of different values. This is the case for instance in
theories of chaotic inflation[3], in which various scalar fields on which the vacuum
energy depends take different values in different expanding regions of space. In
a somewhat more subtle way, this can also be the case in some versions of
quantum cosmology, where the wave function of the universe is a superposition
of terms in which ρV takes different values, either because of the presence of
some vacuum field (like the antisymmetric tensor gauge field Aµνλ introduced
for this purpose by Hawking[4]), or because of wormholes, as in the work of
Coleman[5].
Some authors[4], [6], [7] have argued that in quantum cosmology the dis-
tribution of values of ρV is very sharply peaked at ρV = 0, which would im-
mediately solve the cosmological constant problem. This conclusion has been
challenged[8], and it will be assumed here that the probability distribution of
ρV is smooth at ρV = 0, without any sharp peaks or dips.
In any theory of this general sort the measured effective cosmological con-
stant would be much smaller than the value expected on dimensional grounds
in elementary particle physics, not because there is any physical principle that
makes it small in all subuniverses, but because it is only in the subuniverses
where it is sufficiently small that there would be anyone to measure it. For
negative values of ρV , this limitation comes from the requirement that the sub-
universe must survive long enough to allow for the evolution of life[9]. For
positive values of ρV (which are observationally more promising) the limitation
comes from the requirement that large gravitational condensations like galax-
ies must be able to form before the subuniverse begins its final exponential
expansion[10].
If you don’t find this sort of anthropic explanation palatable, consider the
following fable. You are an astronaut, sent out to explore a randomly chosen
planet around some distant star, about which nothing is known. Shortly before
you leave you learn that because of budget cuts, NASA has not been able to
supply you with any life-support equipment to use on the planet’s surface. You
arrive on the planet, and find to your relief that conditions are quite tolerable —

the air is breathable, the temperature is about 300° K, and the surface gravity
is not very different from what it is on earth. What would you conclude about
the conditions on planets in general? It all depends on how many astronauts
NASA has sent out. If you are the only one then it’s reasonable to infer that
tolerable conditions must be fairly common, contrary to what planetologists
would have naturally expected. On the other hand, if NASA has sent out a
million astronauts, then all you can conclude about the statistics of planetary
conditions is that the number of planets with tolerable conditions is probably not
much less than one in a million — for all you know, almost all of the astronauts
have arrived on planets that cannot support human life. Naturally, the only
astronauts in this program that are in a position to think about the statistics
of planetary conditions are those like you who are lucky enough to have landed
on a planet on which they can live; the others are no longer worrying about it.
In previous work[10] I calculated the anthropic upper bound on the cosmo-
logical constant, which arises from the condition that ρV should not be so large
as to prevent the formation of gravitational condensations on which life could
evolve. This bound is naturally larger than the average value of the cosmologi-
cal constant that would be measured by typical observers, which obviously gives
a better estimate of what we might find in our subuniverse. (Vilenkin[11] has
advocated this point of view under the name of the ‘principle of mediocrity’,
but did not attempt a detailed analysis of its consequences.) The difference
is important, because the anthropic upper bound on ρV is considerably larger
than the largest value of ρV allowed by observation.
I will leave the observational limits on the cosmological constant to Dr.
Fukugita’s talk, but without going into details, it seems that for a spatially flat
(i.e., k = 0) universe, ρV is likely to be positive and somewhat larger than the
present mass density ρ0 , but probably not larger than 3ρ0 [12]. On the other
hand, we know that some galaxies were already formed at redshifts z ≈ 4, at
which time the density of matter was larger than the present density ρ0 by a
factor (1 + z)³ ≈ 125. It therefore seems unlikely that a vacuum energy density
much smaller than 125ρ0 could have completely prevented the formation of
galaxies, so the anthropic upper bound on ρV cannot be much less than about
125ρ0 , which is much greater than the largest observationally allowed value of
ρV .
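
The factor of 125 is just the dilution of matter with expansion. A minimal sketch, assuming only that the matter density scales as (1 + z)³; the function name is illustrative:

```python
def matter_density_ratio(z: float) -> float:
    """Matter density at redshift z in units of the present density rho_0."""
    return (1.0 + z) ** 3

# Galaxies already present at z ~ 4 imply that the anthropic upper bound is not
# much below this multiple of rho_0, far above the observational ceiling ~3 rho_0.
ratio_at_z4 = matter_density_ratio(4.0)
print(ratio_at_z4, ratio_at_z4 / 3.0)
```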
In contrast, we would expect the anthropic mean value of ρV to be roughly
comparable to the mass density of the universe at the time of the greatest rate
for the accretion of matter by growing galaxies, because it is unlikely for ρV
to be much greater than this and there is no reason why it should be much
smaller. (I will make this more quantitative soon.) Although there is evidence
that galaxy formation was well under way by a redshift z ≈ 3, it is quite possible
that most accretion of matter into galaxies continues to lower redshifts, as seems
to be indicated by cold dark matter models. In this case the anthropic mean
value < ρV > will be considerably less than the anthropic upper bound, and
perhaps within the range allowed observationally.

I would like to present an illustrative example of a calculation of the whole
probability distribution of the cosmological constant that would be measured
by observers, weighted by the likelihood that there are observers to measure
it. Instead of the very simple model[13] of galaxy formation from spherically
symmetric pressureless fluctuations used previously[10], here I will rely on the
well-known model of Gunn and Gott[14], which also assumes spherical symmetry
and zero pressure, but takes into account the infall of matter from outside the
initially overdense core. This is still far from realistic, but it will allow me
to make four points about such calculations, which should be more generally
applicable.
As shown in earlier work[10], the condition for a spherically symmetric fluc-
tuation to recondense is that
$$\frac{500\,(\Delta\rho)^3}{729\,\rho^2} > \rho_V , \qquad (7)$$
where ρ and ∆ρ are the average density and the overdensity in the fluctua-
tion at some early initial time, say the time of recombination. Previously ∆ρ
was assumed to be uniform within a spherical fluctuation, but Eq. (7) actually
applies to any sphere, with ∆ρ understood to be the spatially averaged initial
overdensity within the sphere.
Suppose that the fluctuation at recombination consists of a finite spherical
core of volume V with positive average overdensity δρ, outside of which the
density takes its average value ρ. (This picture is appropriate for well separated
fluctuations. The effects of crowding and underdense regions will be considered
in a future paper.) Then the average overdensity within a larger volume V ′
centered on this core is ∆ρ = δρV /V ′ . Assuming that Eq. (7) is satisfied by the
average overdensity δρ within the core,

$$\left( \frac{500\,(\delta\rho)^3}{729\,\rho^2} \right)_{\rm recomb} > \rho_V , \qquad (8)$$

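the average overdensity ∆ρ will satisfy the condition (7) out to a volume

$$V_{\max} = \left( \frac{500}{729\,\rho_V} \right)^{1/3} \rho^{-2/3}\, \delta\rho\, V ,$$

so the total mass that will eventually collapse is

$$M = \delta\rho\, V + \rho\, V_{\max} = V\, \delta\rho \left[ 1 + \left( \frac{500\,\rho}{729\,\rho_V} \right)^{1/3} \right] . \qquad (9)$$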

Once a galaxy forms, the subsequent evolution of stars and planets and life is
essentially independent of the cosmological constant (this is point 1), so the
number of independent observers arising from a given fluctuation at the time of

recombination is proportional to the mass (9) for those fluctuations satisfying
Eq. (8), and is otherwise zero. Of course, the value of the cosmological constant
might be correlated with the values of other fundamental constants, on which
the evolution of life does depend, but the range of anthropically allowed cosmo-
logical constants is so small compared with the natural scale (2) of densities in
elementary particle physics that within this range it is reasonable to suppose
that all other constants are fixed. (This is point 2.) The range of values of ρV
for which gravitational condensations are possible is also so much less than the
average density at the time of recombination, that the number of fluctuations
N (δρ, V ) dV dδρ with volume between V and V + dV and average overdensity
between δρ and δρ + dδρ should be nearly independent of ρV . (This is point 3.)
If P(ρV ) dρV is the a priori probability that a random subuniverse has vacuum
energy density between ρV and ρV + dρV , then according to the principles of
Bayesian statistics, the probability distribution for observed values of ρV is
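$$\begin{aligned}
P_{\rm obs}(\rho_V) &\propto P(\rho_V) \int_0^\infty dV \int_{(729\rho_V\rho^2/500)^{1/3}}^\infty d\delta\rho\; N(\delta\rho, V)\; V\, \delta\rho \left[ 1 + \left( \frac{500\,\rho}{729\,\rho_V} \right)^{1/3} \right] \\
&\propto P(\rho_V) \left[ 1 + \left( \frac{500\,\rho}{729\,\rho_V} \right)^{1/3} \right] \int_{(729\rho_V\rho^2/500)^{1/3}}^\infty d\delta\rho\; N(\delta\rho)\, \delta\rho , \qquad (10)
\end{aligned}$$

where

$$N(\delta\rho) \equiv \int_0^\infty dV\, V\, N(\delta\rho, V) . \qquad (11)$$
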
Finally, the range of values of ρV for which gravitational condensations are
possible is so small compared with the natural scale of densities in elementary
particle physics that within this range the a priori probability P(ρV ) may be
taken as constant. (This is point 4.) The factor P(ρV ) may therefore be omitted
in the probability distribution (10). Also, all anthropically allowed values of ρV
are much smaller than the mass density ρ at recombination, so we may neglect
the 1 in the square brackets in Eq. (10), which now becomes
$$P_{\rm obs}(\rho_V) \propto \rho_V^{-1/3} \int_{(729\rho_V\rho^2/500)^{1/3}}^\infty d\delta\rho\; N(\delta\rho)\, \delta\rho . \qquad (12)$$

Strictly speaking, this gives the probability distribution only for ρV > 0.
For ρV < 0 and k = 0, all mass concentrations that are large enough to allow
pressure to be neglected will undergo gravitational collapse. The number of
astronomers is instead limited[9] for ρV < 0 by the fact that the subuniverse
itself also collapses, in a time
$$T(|\rho_V|) = \frac{2\pi}{3} \sqrt{\frac{3}{8\pi G |\rho_V|}} . \qquad (13)$$

In contrast, the probability distribution for ρV > 0 is weighted by a
ρV-independent factor, the average time T in which stars provide conditions
favorable for intelligent life. The probability distribution for negative values of ρV is
small except for values of |ρV | that are small enough so that T (|ρV |) is less than
or of order T . It will be assumed here that T is very large, so that Pobs (ρV ) is
negligible for ρV < 0 except in a small range near zero, and may therefore be
neglected in calculating the mean value of ρV .
Using the probability distribution (12) and interchanging the order of the
integrals over δρ and ρV , we easily see that the mean value of observed values
of ρV is
$$
\langle\rho_V\rangle = \frac{200}{729}\,\frac{<\delta\rho^6>}{<\delta\rho^3>\,\rho^2} , \tag{14}
$$
with all quantities on the right-hand side evaluated at the time of recombination,
and the brackets on the right-hand side (unlike those in hρV i) indicating averages
over fluctuations:
$$
<f(\delta\rho)> \;\equiv\; \int_0^\infty d\delta\rho\; N(\delta\rho)\, f(\delta\rho) . \tag{15}
$$
It remains to use astronomical observations to calculate the fluctuation spec-
trum N (δρ) for the density fluctuations at recombination, which can then be
used in Eq. (12) to calculate the probability distribution for ρV . Here I will just
give one example of how information about the time of formation of galaxies
can put constraints on < ρV >. With a positive ρV , the core of a fluctuation
with average overdensity δρ at recombination will collapse at a time when the
average cosmic density ρcoll is less than it would be at the time of core collapse
for ρV = 0:[10]
$$
\rho_{\rm coll} < \frac{500}{243\,\pi^2}\,\frac{\delta\rho^3}{\rho^2} , \tag{16}
$$
with ρ and δρ on the right-hand side evaluated at recombination. Using this in
Eq. (14) gives a mean vacuum density

$$
<\rho_V> \;>\; \frac{2\pi^2}{15}\,\langle\rho_{\rm coll}\rangle . \tag{17}
$$
Even if we suppose for example that core collapse occurs for most galaxies at
a redshift as low as z ≈ 1, then ρcoll ≈ 8ρ0 , so Eq. (17) gives < ρV > > 10ρ0 ,
which exceeds current experimental bounds on ρV . On the other hand, the
median value of ρV is less than the mean value, so the discrepancy is less than
this. Even so, it seems that most galaxies must be formed quite late in order
for the value of ρV in our universe to be close to the value that is anthropically
expected.
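The arithmetic behind this estimate can be sketched in a few lines. The snippet assumes only the bound (2π²/15) times the mean collapse density, together with ρcoll/ρ0 = (1 + z)³ for pressureless matter; the function name is illustrative and not taken from the paper.

```python
import math


def mean_vacuum_density_lower_bound(z_collapse: float) -> float:
    """Lower bound on <rho_V>/rho_0 implied by <rho_V> > (2 pi^2 / 15) <rho_coll>.

    Assumes every galaxy core collapses at the single redshift z_collapse,
    so that rho_coll / rho_0 = (1 + z_collapse)**3 for pressureless matter.
    """
    rho_coll_over_rho0 = (1.0 + z_collapse) ** 3
    return (2.0 * math.pi ** 2 / 15.0) * rho_coll_over_rho0


bound = mean_vacuum_density_lower_bound(1.0)
print(f"<rho_V>/rho_0 exceeds about {bound:.1f} for collapse at z = 1")
```

For z = 1 the prefactor 2π²/15 ≈ 1.32 multiplies 8, which is the origin of the figure of roughly 10ρ0 quoted in the text.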
***
At the meeting in Princeton I learned of an interesting paper by Efstathiou[15],
in which he calculated the effect of a cosmological constant on the present num-

ber density of L∗ galaxies, which he took as a measure of the distribution func-
tion Pobs (ρV ). In this calculation he adopted a standard cold dark matter
model for matter density fluctuations, with amplitude at long wavelengths fixed
by the measured anisotropy of the cosmic microwave background. Efstathiou
found that for a spatially flat universe the galaxy density falls off rapidly (say,
by a factor 10) for values of ρV around 7 to 9 times the present mass density
ρ0 , so that < ρV > /ρ0 should be less than of order 7 to 9, giving a contribution
Ω0 = ρ0 /(ρ0 + ρV ) of matter to the total density somewhat greater than around
0.1, which is consistent with lower bounds on the present matter density.
At first sight this seems encouraging, but there are a few problems with
Efstathiou’s calculation. For one thing, as pointed out by Vilenkin[11], the
probability distribution of observed values of ρV is related to the number of
galaxies (or, more accurately, the amount of matter in galaxies) that ever form,
rather than the number that have formed when the age of the universe is at
any fixed value, as assumed by Efstathiou. However, this will not make much
difference if most galaxy formation is complete in typical subuniverses when
they are as old as our own subuniverse. Efstathiou also encountered another
problem that is endemic to this sort of calculation. The cosmological parame-
ters that can reasonably be assumed to be uncorrelated with the cosmological
constant are the baryon–to–entropy ratio and the spectrum of density fluctu-
ations at recombination, because these are presumably fixed by events that
happened before recombination, when any anthropically allowed cosmological
constant would have been negligible. But the only way we know about the
spectrum of density fluctuations at recombination is to use observations of the
present microwave background (or possibly the numbers of galaxies at various
redshifts), and unfortunately the results we obtain from this for N (δρ) depend
on the value of the cosmological constant in our subuniverse. In calculating
Pobs (ρV ) one should ideally make some assumption about the value of ρV in
our subuniverse, then use this value to infer a spectrum of density fluctuations
at recombination from the observed microwave anisotropies, and then calculate
the number of galaxies that ever form as a function of ρV , with the spectrum of
density fluctuations at recombination held fixed. Instead, Efstathiou calculated
the number of L∗ galaxies as a function of ρV , with the microwave anisotropies
held fixed, which gave Pobs (ρV ) an additional spurious dependence on ρV . This
problem was known to Efstathiou, and apparently did not produce large errors.
There is one other problem, that did have a significant effect in Efstathiou’s
calculation. He relied on the standard method[16] of calculating the evolution
of density fluctuations using linear perturbation theory, and declaring a galaxy
to have formed when the fractional overdensity ∆ρ/ρ reaches a value δc , which
is taken as the fractional overdensity of the linear perturbation at a time when a
nonlinear pressureless spherically symmetric fluctuation would recollapse to infinite
density. He took the effective critical overdensity for spatially flat cosmologies as
$\delta_c = 1.68/\Omega_0^{0.28}$, with $\Omega_0 \equiv 1 - \rho_V/\rho_{\rm crit}$, so that $\delta_c = 3.2$ for $\Omega_0 = 0.1$.
But numerical calculations of Martel and Shapiro[17] show that for all fluctua-

tions that result in gravitational recollapse, δc is in a range from 1.63 to 1.69.
The upper bound 1.69 is the well-known result $\delta_c = (3/5)(3\pi/2)^{2/3} = 1.6865$
for ρV = 0. The lower bound 1.63 can also be understood analytically[10]: it is
the critical overdensity for the case where ρV has a value that just barely allows
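gravitational recollapse
$$
(\delta_c)_{\min} = \frac{2}{\sqrt{\pi}} \left(\frac{729}{500}\right)^{1/3} \Gamma\!\left(\frac{11}{6}\right) \Gamma\!\left(\frac{2}{3}\right) = 1.629 . \tag{18}
$$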

With δc always between these bounds, it is impossible that the effective value of
δc for any ensemble of fluctuations could be greater than 1.69. Overestimating
δc biases the calculation toward late galaxy formation, with a corresponding
increased sensitivity to relatively small values of ρV . Efstathiou has now re-
done his calculations with δc given the constant value 1.68, which should be a
good approximation, and, as I interpret his results, he finds that this change in
δc roughly doubles the value of ρV at which the present density of L∗ galaxies
drops by a factor 10, with a corresponding reduction in the expected value of
Ω0 . It remains to be seen whether this change in his results will lead to a conflict
with observational bounds on Ω0 and ρV .
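The numbers in this discussion are easy to reproduce. The sketch below evaluates the fitting form 1.68/Ω0^0.28 attributed to Efstathiou in the text, the ρV = 0 value (3/5)(3π/2)^(2/3), and the Gamma-function expression for the lower bound; the function and variable names are illustrative choices, not the authors'.

```python
import math


def delta_c_fit(omega_0: float) -> float:
    """Fitting form quoted in the text: delta_c = 1.68 / Omega_0**0.28."""
    return 1.68 / omega_0 ** 0.28


# Upper bound: critical overdensity for rho_V = 0, (3/5) (3 pi / 2)^(2/3).
delta_c_max = 0.6 * (1.5 * math.pi) ** (2.0 / 3.0)

# Lower bound: (2 / sqrt(pi)) (729/500)^(1/3) Gamma(11/6) Gamma(2/3).
delta_c_min = (
    (2.0 / math.sqrt(math.pi))
    * (729.0 / 500.0) ** (1.0 / 3.0)
    * math.gamma(11.0 / 6.0)
    * math.gamma(2.0 / 3.0)
)

print(f"fit at Omega_0 = 0.1 : {delta_c_fit(0.1):.2f}")
print(f"upper bound          : {delta_c_max:.4f}")
print(f"lower bound          : {delta_c_min:.4f}")
```

The fitting form at Ω0 = 0.1 lies far above the upper bound, which is the overestimate of δc described above.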
At present Martel and Shapiro are carrying out a numerical calculation of
Pobs using Eq. (12).
I am grateful for helpful discussions with George Efstathiou and Paul Shapiro.

References
[1] For a discussion of these and other possibilities, see Weinberg, S. 1989, Rev.
Mod. Phys. 61, 1
[2] Witten, E. 1995, Int. J. Mod. Phys. 10, 1247; preprint IASNS-HEP-95-51,
hep-th/9506101
[3] Linde, A. D. 1986, Phys. Lett. B 175, 395
[4] Hawking, S. W. 1983, in Shelter Island II — Proceedings of the 1983 Shelter
Island Conference on Quantum Field Theory and the Fundamental Problems
of Physics, ed. R. Jackiw et al. (MIT Press, Cambridge, 1985); Phys. Lett. B
134, 403.
[5] Coleman, S. 1988, Nucl. Phys. B 307, 867
[6] Baum, E. 1984 Phys. Lett. B133, 185
[7] Coleman, S. 1988, Nucl. Phys. B 310, 643
[8] Fischler, W., Klebanov, I., Polchinski, J., and Susskind, L. 1989, Nucl. Phys.
B327, 157

[9] Barrow, J. D., and Tipler, F. J. 1986, The Anthropic Cosmological Principle
(Clarendon, Oxford)
[10] Weinberg, S. 1987, Phys. Rev. Lett. 59, 2607
[11] Vilenkin, A. 1995, Phys. Rev. Lett. 74, 846; Tufts preprint gr-qc/9507018,
to be published in the Proceedings of the 1995 International School of Astro-
physics at Erice; Phys. Rev. D52, 3365; Tufts preprint gr-qc/9512031
[12] For a review and earlier references, see Ostriker, J. P. and Steinhardt, P.
J. 1995, Nature 377, 600
[13] Peebles, P. J. E. 1967, Astrophys. J. 147, 859
[14] Gunn, J. E. and Gott, J. R. 1972, Astrophys. J. 176, 1
[15] Efstathiou, G. 1995, Mon. Not. R. Astron. Soc. 274, L73
[16] Press, W. H. and Schechter, P. 1974, Astrophys. J. 187, 425
[17] Martel, H. and Shapiro, P. R. 1996, paper in preparation

UTTG-01-97
MOP374
astro-ph/9701099, 15 Jan 97

LIKELY VALUES OF THE COSMOLOGICAL CONSTANT

Hugo Martel(1,2), Paul R. Shapiro(1,3), and Steven Weinberg(4,5)

1 Department of Astronomy, The University of Texas, Austin, TX 78712
2 hugo@sagredo.as.utexas.edu
3 shapiro@astro.as.utexas.edu
4 Theory Group, Department of Physics, The University of Texas, Austin, TX 78712
5 weinberg@utaphy.ph.utexas.edu

ABSTRACT
In theories in which the cosmological constant takes a variety of values in
different "subuniverses," the probability distribution of its observed values is
conditioned by the requirement that there must be someone to measure it. This
probability is proportional to the fraction of matter which is destined to condense
out of the background into mass concentrations large enough to form observers.
We calculate this "collapsed fraction" by a simple, pressure-free, spherically
symmetric, nonlinear model for the growth of density fluctuations in a flat universe
with arbitrary value of the cosmological constant, applied in a statistical way to
the observed spectrum of density fluctuations at recombination. From this, the
probability distribution for the vacuum energy density $\rho_V$ for Gaussian random
density fluctuations is derived analytically. (The conventional quantity $\lambda_0$ is the
vacuum energy density in units of the critical density at present, $\lambda_0 = \rho_V/\rho_{\rm crit,0}$,
where $\rho_{\rm crit,0} = 3H_0^2/8\pi G$.) It is shown that the results depend on only one
quantity, $\sigma^3\bar\rho$, where $\sigma^2$ and $\bar\rho$ are the variance and mean value of the fluctuating
matter density field at recombination, respectively. To calculate $\sigma$, we adopt
the flat CDM model with nonzero cosmological constant and fix the amplitude
and shape of the primordial power spectrum in accordance with data on cosmic
microwave background anisotropy from the COBE satellite DMR experiment. A
comparison of the results of this calculation of the likely values of $\rho_V$ with present
observational bounds on the cosmological constant indicates that the small,
positive value of $\rho_V$ (up to 3 times greater than the present cosmic mass density)
suggested recently by several lines of evidence is a reasonably likely value to
observe, even if all values of $\rho_V$ are equally likely a priori.

Subject headings: cosmology: theory; galaxies: formation

1. INTRODUCTION
Though the evidence is still equivocal, there are persistent hints that the vacuum energy
density $\rho_V$ (footnote 6) is positive, and up to 3 times greater than the present cosmic mass
density $\rho_0$ (footnote 7). From the point of view of fundamental physics, such a value seems
absurd. Crude estimates indicate a value of $\rho_V$ some 120 orders of magnitude greater than
$\rho_0$, and while it is hard enough to imagine any sort of symmetry or adjustment mechanism
that could make $\rho_V$ vanish (for a litany of failed attempts, see Weinberg [1989]), it would
be even more peculiar for fundamental physical theory to dictate a non-zero value for $\rho_V$
that happens to be comparable to the cosmic mass density $\rho_0$ at this particular moment in
the history of the universe.
As far as we know, the only way to understand a value of $\rho_V$ comparable to $\rho_0$ is based on
a weak form of the anthropic principle. In several current theories the cosmological constant
does not have a fixed value, but takes a variety of values with varying probabilities. For
instance, Hawking (1983, 1984) showed that the introduction of a three-form gauge field
$A_{\mu\nu\lambda}$ yields a state vector for the universe that is a superposition of terms with different
values for the cosmological constant. Coleman (1988a) subsequently showed that the effect
of wormholes in quantum gravity is to make the state vector a superposition of terms, in
which any coupling coefficient in the Lagrangian that is not fixed by symmetries takes all
possible values (footnote 8). Also, in chaotic inflation (Linde 1986, 1987, 1988) the observed
big bang is just one of an infinite number of expanding regions, in each of which the various
fields that affect the vacuum energy can take different values. For brevity we will refer to
parts of the "universe" in which the cosmological constant takes different values, such as
terms in the state vector, local bangs, or whatever, as subuniverses.

6 By $\rho_V$ is meant the sum of a term $\Lambda/8\pi G$, where $\Lambda$ is the cosmological constant appearing in the
Einstein field equations, plus the contribution to the vacuum energy density from quantum fluctuations. The
conventional quantity $\lambda_0$ is the vacuum energy density in units of the critical density at present,
$\lambda_0 = \rho_V/\rho_{\rm crit,0}$, where $\rho_{\rm crit,0} = 3H_0^2/8\pi G$.
7 For a review and earlier references, see Ostriker & Steinhardt (1995). This conclusion has been recently
challenged by preliminary results of measurements by Perlmutter et al. (1996) of redshifts and distances for
distant Type Ia supernovae.
8 Coleman (1988b) subsequently concluded that the probability distribution of $\rho_V$ is sharply peaked at
zero, as had previously been argued by Hawking (1983, 1984) and Baum (1984). This conclusion has been
challenged by Fischler et al. (1989), and it will be assumed here that there is no sharp peak at $\rho_V = 0$.

In any theory of this general sort the measured value of the vacuum energy density
$\rho_V$ would be much smaller than the value expected on dimensional grounds in elementary
particle physics, not because there is any physical principle that makes it small in all
subuniverses, but because it is only in the subuniverses where it is small that there would be
anyone to measure it. This paper will show how to calculate the probability distribution of
the values of $\rho_V$ that would be observed under these circumstances.
An earlier paper (Weinberg 1987) pointed out that the anthropic limit on the value of
$\rho_V$ for $\rho_V > 0$ arises from the requirement that $\rho_V$ should not be so large as to prevent
the formation of galaxies. That paper suggested that this requirement implies a value of
$\rho_V$ roughly comparable to the cosmic density of nonrelativistic matter at the time that the
earliest galaxies form, because, if $\rho_V$ were much larger than this, then galaxies could not
form and there would be no observers, while there did not seem to be any reason for $\rho_V$
to be much smaller than this. Since then galaxies have continued to be found at higher
and higher redshifts, and hence at higher and higher values of the cosmic mass density, and
it is becoming clear that such values of $\rho_V$ are already ruled out. A galaxy with redshift
$z \simeq 4$ was formed when the cosmic mass density was more than (and perhaps considerably
more than) $(1 + z)^3 \simeq 125$ times the present mass density, which is much greater than the
observational upper limit on $\rho_V$.
On the other hand, it is much more likely that the value of $\rho_V$ in our subuniverse is
comparable to the average or median value measured by astronomers in all subuniverses,
rather than the anthropic upper bound, so that its value should be compared with the
cosmic mass density at the time of formation of typical galaxies, rather than of the earliest
galaxies (footnote 9). Here we will present a detailed Bayesian analysis, which allows a
calculation of the probability distribution of $\rho_V$ from a knowledge of the spectrum of density
fluctuations at recombination. The results suggest a much smaller likely value of $\rho_V$ than the
anthropic upper bound, a value that may not be in conflict with present observational bounds.

9 See Weinberg (1996). This is essentially the same as what was called the "principle of mediocrity" by
Vilenkin (1995a, 1995b, 1995c, 1995d). Vilenkin did not undertake a detailed calculation of the probability
distribution of the cosmological constant. A calculation of this sort was done by Efstathiou (1995), but
it contained some errors (Vilenkin 1995d; Weinberg 1996). Efstathiou's calculation was done numerically,
using linear perturbation theory and what is believed to be a realistic model of initial perturbations, while
the calculation presented here is thoroughly nonlinear, but concentrates on a single spherically symmetric
density fluctuation, so that it is possible to understand the results analytically.

In §2 we describe how to calculate the probability distribution for the vacuum energy
density $\rho_V$ that would be observed in various subuniverses in which there is someone to
observe it. For $\rho_V > 0$, this is simply related to the fraction of matter that condenses into
galaxies. We evaluate this fraction and the resulting probability distribution in §3 under the
assumption of Gaussian random density fluctuations in the cosmic mass density at the time
of recombination. The results depend only on the standard deviation $\sigma$ of these fluctuations
at the time of recombination. In §4 we calculate $\sigma$, adopting the cold dark matter model for
the power spectrum of these density fluctuations, and assuming a flat universe with nonzero
cosmological constant (sometimes referred to as the flat "ΛCDM" model). The amplitude
of the density fluctuations is fixed by the data on cosmic microwave background anisotropy
from the COBE satellite DMR experiment. The results of this calculation of the likely values
of $\rho_V$ are presented in §5, and compared with the range of values of the cosmological constant
allowed by current observational and theoretical constraints. A summary and conclusions
are presented in §6.

2. THE PROBABILITY DISTRIBUTION


We assume that the a priori probability of a net vacuum energy density between $\rho_V$
and $\rho_V + d\rho_V$ is $P(\rho_V)\,d\rho_V$, where $P(\rho_V)$ is some smooth function of $\rho_V$, with no special
behavior near $\rho_V = 0$. What we want is the probability distribution $P_{\rm obs}(\rho_V)$ that a random
observer in any subuniverse will measure values of $\rho_V$ in a given range. According to the
principles of Bayesian statistics, this is given by
$$
P_{\rm obs}(\rho_V) = \frac{A(\rho_V)\,P(\rho_V)}{\int_0^\infty A(\rho'_V)\,P(\rho'_V)\,d\rho'_V} , \tag{1}
$$
where $A(\rho_V)$ is the mean number of astronomers making independent measurements of the
vacuum energy density in subuniverses with vacuum energy density $\rho_V$ (footnote 10).
In calculating the quantity (1), we note that the range of anthropically allowed values
of $\rho_V$ is so much smaller than the energy densities typical of elementary particle physics,
that, within this narrow range, we can take the a priori probability distribution $P(\rho_V)$ to
be constant (footnote 11). The value of this constant then cancels in equation (1), which becomes
$$
P_{\rm obs}(\rho_V) = \frac{A(\rho_V)}{\int_0^\infty A(\rho'_V)\,d\rho'_V} . \tag{2}
$$

10 We have not thought through the problems associated with infinite subuniverses, where A if nonzero is
infinite. Presumably in this case we should take A to be the number of independent astronomers per same
fixed number of baryons.


The evolution of galaxies and astronomers depends on a variety of constants of nature
other than $\rho_V$, and the values of these other constants in the various subuniverses may be
correlated with the values of $\rho_V$, but the range of values of $\rho_V$ that are anthropically allowed
is so small compared with the energy densities typical of elementary particle physics that,
within this range, we can take all other fundamental constants to have fixed values, the values
we observe in our subuniverse (footnote 12). Also, once a fluctuation in the cosmic mass
distribution undergoes gravitational condensation, its subsequent evolution is essentially
independent of $\rho_V$, so the ratio of astronomers to mass in galaxies may be taken as independent
of $\rho_V$. The number of astronomers $A(\rho_V)$ who can measure $\rho_V$ in any subuniverse should
therefore be proportional to the fraction $F(\rho_V)$ of matter incorporated in galaxies, so that
equation (2) may be written
$$
P_{\rm obs}(\rho_V) = \frac{F(\rho_V)}{\int_0^\infty F(\rho'_V)\,d\rho'_V} . \tag{3}
$$
To calculate $F(\rho_V)$, we note that the spectrum of initial density fluctuations at
recombination can be regarded as independent of $\rho_V$, because values of $\rho_V$ for which galaxy
formation is possible are much smaller than the cosmic mass density at or before the time
of recombination. Similarly, it is reasonable to suppose that the total amount of matter in
a subuniverse in theories of chaotic inflation is independent of $\rho_V$ within the narrow range
of values of $\rho_V$ that are anthropically allowed. Our problem is then to calculate the fraction
$F$ of matter that undergoes gravitational condensation into galaxies as a function of $\rho_V$,
for fixed initial conditions at recombination. Ironically, while the tininess of observationally
allowed values of $\rho_V$ creates the cosmological constant problem in the first place, it is the
tininess of the range of anthropically allowed values of $\rho_V$ that offers the possibility of a
realistic calculation of $P_{\rm obs}(\rho_V)$.

11 It might be asked why within this range the a priori distribution $P(\rho_V)$ is not a power of $\rho_V$ rather
than a constant? A power law would mean that there is something special about the value $\rho_V = 0$, since a
power law distribution function would have to vanish or blow up there. But the essence of the cosmological
constant problem is that we do not know of anything in fundamental physics that gives a special significance
to the value $\rho_V = 0$, which requires a precise cancellation of a coupling coefficient in the Lagrangian by
radiative corrections. (Analogously, although the probability distribution of temperatures in the Antarctic
ice must vanish at zero degrees Kelvin and zero degrees Centigrade, since these are the limits of the range
of temperatures in which water freezes, we would hardly expect it to vanish or diverge at zero degrees
Fahrenheit.) The same argument applies to a power-law dependence on $\log\rho_V$. Of course we are not
assuming that $P(\rho_V)$ does not contain terms that vary as positive powers of $\rho_V$, but only that there is also
a constant term, which then naturally dominates for the very small values of $\rho_V$ that are consistent with the
appearance of astronomers who can measure $\rho_V$.
12 If various constants of nature and initial conditions vary from one subuniverse to another independently
of the values of $\rho_V$, then $P_{\rm obs}(\rho_V)\,d\rho_V$ is the probability that, if the other constants and initial conditions
take the values we observe, then the vacuum energy density will be observed to be between $\rho_V$ and $\rho_V + d\rho_V$.

To see how this can work in practice, we will carry out an illustrative calculation of $F(\rho_V)$
and use it to calculate both the integrated probability distribution and the mean and median
values of $\rho_V$ observed by all astronomers in the subuniverses that contain astronomers.
Earlier work (Weinberg 1987) used a very simple model (Peebles 1967) of galaxy formation
from isolated spherically symmetric pressureless fluctuations. This calculation was improved
in the report of a recent conference talk (Weinberg 1996), by using the well-known model of
Gunn & Gott (1972), which also assumes isolated fluctuations with spherical symmetry and
zero pressure, but includes the infall of matter from outside the initially overdense ball. Here
we will also take into account the facts that, with space filled with fluctuations, there is a
limit to the mass that can accrete onto any one fluctuation, and that there must be regions
of negative as well as positive overdensity.
Consider a spherically symmetric pressureless fluctuation, consisting at recombination of
a spherical core of volume $V$ and positive average fractional overdensity $\delta$ [i.e. $\delta \equiv (\rho - \bar\rho)/\bar\rho$,
where $\rho$ is the average density inside $V$ and $\bar\rho$ is the cosmic mean density at recombination],
surrounded by a spherical shell of volume $U$ of constant fractional underdensity, taken to
have the value $\delta(V/U)$, so that the average overdensity within the whole fluctuation is zero.
Outside this shell are other fluctuations, about which we do not need to say anything, except
to assume that they do not seriously interfere with the spherical symmetry of the fluctuation
in question. For simplicity, we will take $V/U$ to be the same for all fluctuations.
As shown in earlier work (Weinberg 1987) the core will undergo gravitational collapse
if (footnote 13)
$$
\delta \ge \left(\frac{729\,\rho_V}{500\,\bar\rho}\right)^{1/3} . \tag{4}
$$
In addition, that portion $U'$ of the outer shell will fall into the core, for which the average
fractional overdensity within the volume $V + U'$ saturates this inequality:
$$
\frac{\delta V - \delta U'(V/U)}{V + U'} = \left(\frac{729\,\rho_V}{500\,\bar\rho}\right)^{1/3} . \tag{5}
$$
The fraction of the total mass $\bar\rho(U + V)$ that suffers gravitational contraction will be
$$
F(\delta, \rho_V) = \frac{(1 + \delta)\,V + [1 - \delta(V/U)]\,U'}{U + V} . \tag{6}
$$
Solving equation (5) for $U'$, we find for the fraction of mass that undergoes gravitational
contraction
$$
F(\delta, \rho_V) = \delta\,(V/U) \left[\frac{1 + (729\rho_V/500\bar\rho)^{1/3}}{(729\rho_V/500\bar\rho)^{1/3} + \delta\,(V/U)}\right] . \tag{7}
$$
We require that the total density be everywhere nonnegative, so that
$$
\delta \le (U/V) . \tag{8}
$$
For $\delta$ satisfying this inequality, equation (7) gives $F \le 1$, so that no fluctuation can get more
than its fair share of mass.

13 It was originally assumed (Weinberg 1987) that the overdensity $\delta$ was uniform, but these results actually
hold for arbitrary spherically symmetric fluctuations, with $\delta$ interpreted as the average fractional overdensity.
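
The algebra leading from equations (5) and (6) to equation (7) can be checked numerically. In the sketch below the symbol `c` stands for the threshold overdensity (729ρV/500ρ̄)^(1/3); the function names and the sample numbers are illustrative assumptions, not part of the paper.

```python
def infall_volume(delta: float, V: float, U: float, c: float) -> float:
    """Shell volume U' that saturates Eq. (5):
    (delta*V - delta*U'*(V/U)) / (V + U') = c."""
    return V * (delta - c) / (c + delta * V / U)


def collapsed_fraction_direct(delta: float, V: float, U: float, c: float) -> float:
    """Eq. (6), with U' obtained by solving Eq. (5)."""
    u_prime = infall_volume(delta, V, U, c)
    return ((1.0 + delta) * V + (1.0 - delta * V / U) * u_prime) / (U + V)


def collapsed_fraction_closed(delta: float, V: float, U: float, c: float) -> float:
    """Closed form, Eq. (7)."""
    s = V / U
    return delta * s * (1.0 + c) / (c + delta * s)


# Sample (hypothetical) values: a weak fluctuation just above threshold.
delta, V, U, c = 0.01, 1.0, 2.0, 0.004
print(collapsed_fraction_direct(delta, V, U, c))
print(collapsed_fraction_closed(delta, V, U, c))
```

The two functions agree to rounding error, and at the limiting overdensity δ = U/V the closed form returns exactly 1, in line with equation (8).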
In what follows we will assume that the fluctuation number density $N(\delta)$ is negligible
for initial fluctuations that are not everywhere weak, so that we will be integrating only over
fluctuations for which $\delta \ll 1$ and $\delta \ll U/V$. Also, for any anthropically allowed cosmological
constant, $\rho_V$ is much less than the mass density $\bar\rho$ at recombination, so we will drop the
term $(729\rho_V/500\bar\rho)^{1/3}$ in the numerator (but not the denominator) of the fraction in square
brackets in equation (7). The fraction of mass winding up in galaxies is then
$$
\begin{aligned}
F(\rho_V) &= \int_{(729\rho_V/500\bar\rho)^{1/3}}^\infty d\delta\; N(\delta)\, F(\delta, \rho_V) \\
&= \int_{(729\rho_V/500\bar\rho)^{1/3}}^\infty d\delta\; \frac{\delta\,(V/U)\, N(\delta)}{(729\rho_V/500\bar\rho)^{1/3} + \delta\,(V/U)} ,
\end{aligned}
\tag{9}
$$
where $N(\delta)\,d\delta$ is the fraction of all positive fluctuations that have average core fractional
overdensity between $\delta$ and $\delta + d\delta$, normalized so that
$$
\int_0^\infty N(\delta)\,d\delta = 1 .
$$
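The normalization integral in equation (3) can be calculated by interchanging the order
of integration over $\delta$ and $\rho_V$, and expressing $\rho_V$ in terms of a dimensionless variable $x$ defined
by
$$
\rho_V = \frac{500\,\bar\rho}{729}\, x^3 \delta^3 .
$$
Equation (9) then gives
$$
\int_0^\infty F(\rho_V)\,d\rho_V = \frac{500}{243}\,\bar\rho\,\langle\delta^3\rangle\,(V/U)\, I_0(V/U) , \tag{10}
$$
where $I_0(s)$ is the function
$$
I_0(s) \equiv \int_0^1 \frac{x^2\,dx}{x + s} = \frac{1}{2} - s - s^2 \ln\left(\frac{s}{1 + s}\right) , \tag{11}
$$
and the brackets denote an average over all positive fluctuations
$$
\langle f(\delta)\rangle \equiv \int_0^\infty N(\delta)\, f(\delta)\, d\delta . \tag{12}
$$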
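
As a sanity check on the closed form quoted for I_0(s), the following sketch compares it with a direct numerical quadrature of x²/(x + s) over [0, 1]. Simpson's rule and the function names are choices made here for illustration; they are not taken from the paper.

```python
import math


def i0_closed(s: float) -> float:
    """Closed form 1/2 - s - s^2 ln(s/(1+s)); the log term vanishes as s -> 0."""
    if s == 0.0:
        return 0.5
    return 0.5 - s - s * s * math.log(s / (1.0 + s))


def i0_numeric(s: float, steps: int = 2000) -> float:
    """Composite Simpson's rule for the integral of x^2/(x+s) over [0, 1].

    steps must be even.
    """
    def f(x: float) -> float:
        # The integrand tends to 0 at x = 0 for every s >= 0.
        return 0.0 if x == 0.0 else x * x / (x + s)

    h = 1.0 / steps
    total = f(0.0) + f(1.0)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0


for s in (0.0, 0.5, 1.0, 5.0):
    print(s, i0_closed(s), i0_numeric(s))
```

For very large s the closed form loses accuracy to cancellation between its terms, so a quadrature or a series in 1/s is the safer route in that regime.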

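The normalized probability distribution (3) for the observed vacuum energy density is then
$$
P_{\rm obs}(\rho_V) = \frac{243}{500\,\bar\rho}\, \frac{1}{\langle\delta^3\rangle\, I_0(V/U)} \int_{(729\rho_V/500\bar\rho)^{1/3}}^\infty d\delta\; \frac{\delta\, N(\delta)}{(729\rho_V/500\bar\rho)^{1/3} + \delta\,(V/U)} , \tag{13}
$$
with all quantities on the right-hand side referring to the time of recombination.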
In using equation (13) we will need to make some assumption about the shape parameter
$s \equiv V/U$. The value $s = 0$ corresponds to the limit in which each positive fluctuation is
isolated, surrounded by an infinite volume of compensating underdensity (at a total density
arbitrarily close to its mean value $\bar\rho$), the case considered by Weinberg (1996). Values of $s$
much greater than unity correspond to the limit in which the additional mass associated
with the compensating underdense volume $U$ is insignificant compared with that contained
within the positive fluctuation in volume $V$, the case considered in Weinberg (1987). The
value $s = 1$ corresponds to the case in which every positive fluctuation is surrounded by
an equal volume of compensating negative fluctuation. This latter value is the one most
relevant to a Gaussian-random distribution of linear density fluctuations, since the volumes
occupied by positive and negative density fluctuations of equal amplitude are exactly equal
in that case. Thus we will concentrate on the value $s = 1$ when we apply our analysis to the
observed universe in what follows. Fortunately, as we shall see, most of our results will turn
out to be almost independent of the value chosen for $s$.
Strictly speaking, equation (13) times $d\rho_V$ gives the probability that, if the vacuum
energy density is positive, then it will be observed to be between $\rho_V$ and $\rho_V + d\rho_V$. For
$\rho_V < 0$, the anthropic bound on $\rho_V$ is set by the condition (Barrow & Tipler 1986) that
the subuniverse should survive long enough for intelligent life to arise. It is plausible that
$\rho_V > 0$ is strongly favored anthropically (Weinberg 1996) as well as observationally, but we
will make no attempt here to calculate the probability distribution for negative values of $\rho_V$.
Equation (13) can be used in various ways. One is to calculate the mean of various
powers of $\rho_V$. Carrying out the same exchange of integrations and change of variables as in
our calculation of equation (10), we find
$$
<\rho_V^n> \;=\; \left(\frac{500\,\bar\rho}{729}\right)^n \frac{\langle\delta^{3n+3}\rangle}{\langle\delta^3\rangle}\, \frac{I_n(V/U)}{I_0(V/U)} , \tag{14}
$$
where $I_n$ is the function
$$
I_n(s) \equiv \int_0^1 \frac{x^{3n+2}}{x + s}\,dx . \tag{15}
$$
The average $<\rho_V^n>$ in equation (14) is taken over all subuniverses, not over fluctuations as
in $\langle\cdots\rangle$, and again all quantities on the right-hand side are to be evaluated at recombination.
In particular, the mean value observed for $\rho_V$ (if $\rho_V$ is positive) is
$$
<\rho_V> \;=\; \frac{500\,\bar\rho}{729}\, \frac{\langle\delta^6\rangle}{\langle\delta^3\rangle}\, \frac{I_1(V/U)}{I_0(V/U)} , \tag{16}
$$
with
$$
I_1(s) = \frac{1}{5} - \frac{s}{4} + \frac{s^2}{3} - \frac{s^3}{2} + s^4 + s^5 \ln\left(\frac{s}{1 + s}\right) . \tag{17}
$$
Fortunately the ratio $I_1(V/U)/I_0(V/U)$ in equation (16) turns out to be nearly constant;
it drops from a value 0.5 when $s \equiv V/U \gg 1$, corresponding to no infall, to a value 0.4 when
$s \ll 1$, corresponding to well separated fluctuations. It is therefore not very important what
value we choose for $s$. As mentioned earlier, it seems reasonable to assume the intermediate
case $s = 1$, where overdense and underdense regions have equal volume at recombination; in
this case, the ratio $I_1/I_0$ takes the value
$$
\frac{I_1(1)}{I_0(1)} = \frac{\frac{47}{60} - \ln 2}{-\frac{1}{2} + \ln 2} = 0.46693 .
$$
The insensitivity of our results to the value of $s$ suggests that they also may not be much
affected by the crudeness of our treatment of the effect of one fluctuation on another.
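
The near-constancy of the ratio I_1/I_0 can be illustrated directly from the integral definition of I_n. The sketch below uses Simpson's rule rather than the closed forms, because the closed forms suffer severe cancellation at large s; the helper names are hypothetical.

```python
import math


def i_n(n: int, s: float, steps: int = 2000) -> float:
    """Simpson's-rule estimate of I_n(s), the integral of x^(3n+2)/(x+s) over [0, 1].

    steps must be even.
    """
    def f(x: float) -> float:
        # The integrand tends to 0 at x = 0 for every s >= 0.
        return 0.0 if x == 0.0 else x ** (3 * n + 2) / (x + s)

    h = 1.0 / steps
    total = f(0.0) + f(1.0)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0


def i1_over_i0(s: float) -> float:
    """Ratio I_1(s)/I_0(s) that multiplies the mean vacuum energy density."""
    return i_n(1, s) / i_n(0, s)


for s in (0.0, 1.0, 1.0e6):
    print(f"s = {s:g}: I1/I0 = {i1_over_i0(s):.5f}")
```

The three sample values of s cover the isolated-fluctuation limit, the equal-volume case, and the no-infall limit.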
We could also use equation (13) to calculate the integrated probability $P(>\rho_V)$ that
the vacuum energy density is greater than $\rho_V$:
$$
P(>\rho_V) \equiv \int_{\rho_V}^\infty P_{\rm obs}(\rho'_V)\, d\rho'_V . \tag{18}
$$
With the same reversal of integrations over $\rho'_V$ and $\delta$ and the same change of variables as
before, we find
$$
P(>\rho_V) = \frac{\left\langle \delta^3\, I\!\left((729\rho_V/500\bar\rho)^{1/3}/\delta ,\; V/U\right) \right\rangle}{\langle\delta^3\rangle\, I_0(V/U)} , \tag{19}
$$
where for $t < 1$,
$$
I(t, s) \equiv \int_t^1 \frac{x^2}{x + s}\,dx = \frac{1}{2}(1 - t^2) - s(1 - t) + s^2 \ln\left(\frac{1 + s}{t + s}\right) , \tag{20}
$$
and $I(t, s) \equiv 0$ for $t > 1$. As in our calculation of $<\rho_V>$, this is insensitive to the precise
value of $s \equiv V/U$. For very small $\rho_V$, equation (19) of course approaches unity, whatever
the value of $s$. For very large $\rho_V$, the only fluctuations that contribute within the integral
over $\delta$ in equation (19) are those with $\delta$ near the lower limit of the integral, for which the
lower limit of the integral (20) is near unity, where this integral behaves as $1/(1 + s)$. But
then $P(>\rho_V)$ has an $s$-dependence proportional to $[I_0(s)(1 + s)]^{-1}$, which only rises from 2
to 3 as $s$ rises from zero to infinity.
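
A short numerical sketch makes the last two statements concrete: near t = 1 the integral I(t, s) is approximately (1 − t)/(1 + s), and the combination [I_0(s)(1 + s)]^(-1) climbs from 2 toward 3. The function names are illustrative, and the closed forms are used only for moderate s, where cancellation is harmless.

```python
import math


def i0(s: float) -> float:
    """I_0(s) = 1/2 - s - s^2 ln(s/(1+s)), with the s -> 0 limit 1/2."""
    return 0.5 if s == 0.0 else 0.5 - s - s * s * math.log(s / (1.0 + s))


def i_ts(t: float, s: float) -> float:
    """I(t, s) for s > 0: integral of x^2/(x+s) from t to 1, zero for t >= 1."""
    if t >= 1.0:
        return 0.0
    return 0.5 * (1.0 - t * t) - s * (1.0 - t) + s * s * math.log((1.0 + s) / (t + s))


def s_dependence(s: float) -> float:
    """[I_0(s) (1 + s)]^(-1), the large-rho_V dependence on s quoted in the text."""
    return 1.0 / (i0(s) * (1.0 + s))


eps = 1.0e-6
print(i_ts(1.0 - eps, 1.0) / eps)  # close to 1/(1+s) = 0.5 for s = 1
for s in (0.0, 0.1, 1.0, 10.0, 1000.0):
    print(s, s_dependence(s))
```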

3. Gaussian Density Fluctuations


To go further, we must make some assumption about the form of the fluctuation
probability distribution $N(\delta)$ at an early epoch, such as that of recombination. Current data
on the anisotropy of the cosmic microwave background on large angular scales and on the
large-scale clustering properties of galaxies, as well as theoretical predictions of the origin
of density fluctuations by quantum processes in the early universe in inflationary cosmology
models, are consistent with the assumption that the primordial fluctuations were isotropic,
Gaussian random noise of very small amplitude. In this case the fluctuation distribution has
the form
$$
N(\delta) = \left(\frac{2}{\pi}\right)^{1/2} \frac{1}{\sigma} \exp\left(-\frac{\delta^2}{2\sigma^2}\right) . \tag{21}
$$
The mean values of powers of $\delta$ are given in terms of the variance $\sigma^2$ by
$$
\langle\delta^N\rangle = 2^{N/2}\, \pi^{-1/2}\, \sigma^N\, \Gamma\left(\frac{N + 1}{2}\right) . \tag{22}
$$
Equation (14) then gives
$$
<\rho_V^n> \;=\; \Gamma\left(\frac{3n + 4}{2}\right) \left(\frac{1000\,\bar\rho\, 2^{1/2} \sigma^3}{729}\right)^n \frac{I_n(s)}{I_0(s)} , \tag{23}
$$
and in particular, the mean vacuum energy density is
$$
<\rho_V> \;=\; \frac{625\,(2\pi)^{1/2}\,\bar\rho\,\sigma^3}{243} \left[\frac{I_1(s)}{I_0(s)}\right] , \tag{24}
$$
where, as before, $s \equiv V/U$. This gives the numerical values
$$
<\rho_V> \;=\; \sigma^3\bar\rho \times
\begin{cases}
2.5788 & s = 0 , \\
3.0103 & s = 1 , \\
3.2235 & s = \infty .
\end{cases}
\tag{25}
$$
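
These three coefficients follow from the Gaussian moment formula and the limiting values of I_1/I_0 given earlier (0.4 at s = 0, (47/60 − ln 2)/(ln 2 − 1/2) at s = 1, and 0.5 as s grows without bound). The sketch below recomputes them; function names are illustrative only.

```python
import math


def gaussian_moment(power: int, sigma: float) -> float:
    """<delta^N> for the half-Gaussian: 2^(N/2) pi^(-1/2) sigma^N Gamma((N+1)/2)."""
    return (
        2.0 ** (power / 2.0)
        / math.sqrt(math.pi)
        * sigma ** power
        * math.gamma((power + 1) / 2.0)
    )


def mean_rho_v_coefficient(i1_over_i0: float) -> float:
    """<rho_V> / (sigma^3 rho_bar): (500/729) <delta^6>/<delta^3> times I_1/I_0.

    The ratio of moments is evaluated at sigma = 1 because sigma^3 has been
    factored out.
    """
    moment_ratio = gaussian_moment(6, 1.0) / gaussian_moment(3, 1.0)
    return (500.0 / 729.0) * moment_ratio * i1_over_i0


ratio_at_one = (47.0 / 60.0 - math.log(2.0)) / (math.log(2.0) - 0.5)
for label, ratio in (("s = 0", 0.4), ("s = 1", ratio_at_one), ("s -> infinity", 0.5)):
    print(f"{label}: {mean_rho_v_coefficient(ratio):.4f}")
```

The common prefactor is 625(2π)^(1/2)/243 ≈ 6.447, so the spread among the three cases comes entirely from I_1/I_0.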

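Also, using the Gaussian distribution (21) in equation (19) and writing
$$
\delta = \left(\frac{729\,\rho_V}{500\,\bar\rho}\right)^{1/3} \left(\frac{x}{\beta}\right)^{1/2} ,
$$
we find the differential probability distribution
$$
P_{\rm obs}(\rho_V)\, d\rho_V = \frac{\beta^{1/2}\, d\beta}{2 I_0(s)} \int_\beta^\infty \frac{e^{-x}\, dx}{s\, x^{1/2} + \beta^{1/2}} , \tag{26}
$$
where
$$
\beta \equiv \frac{1}{2\sigma^2} \left(\frac{729\,\rho_V}{500\,\bar\rho}\right)^{2/3} . \tag{27}
$$
The probability of a vacuum energy density greater than $\rho_V$ is then:
$$
P(>\rho_V) = (1 + \beta)\, e^{-\beta} - \frac{1}{2 I_0(s)} \int_\beta^\infty e^{-x} \left\{ -2s\,(\beta x)^{1/2} + \beta + 2 s^2 x \ln\left[\frac{1}{s}\left(\frac{\beta}{x}\right)^{1/2} + 1\right] \right\} dx . \tag{28}
$$
By combining equations (24) and (27), we see that the parameter $\beta$ in equations (26) and
(28) may be expressed in terms of $s \equiv V/U$ and the ratio $\rho_V/<\rho_V>$ of the vacuum energy
density to its mean value
$$
\beta = \frac{\pi^{1/3}}{4} \left[ 15\, \frac{\rho_V}{<\rho_V>}\, \frac{I_1(s)}{I_0(s)} \right]^{2/3} . \tag{29}
$$
Thus the probability of observing a vacuum energy density in a certain range depends only
on the values of $\rho_V/<\rho_V>$ and $s$. The analysis of initial fluctuations will enter here only as
a means of calculating the parameter $\sigma^3\bar\rho$, and hence $<\rho_V>$.
In using these formulas, it is important to note that during the era of recombination,
when $\rho_V$ is negligible and fluctuations are small, $\sigma$ grows as $t^{2/3}$ and $\bar\rho$ falls as $t^{-2}$, so the
combination $\sigma^3\bar\rho$ is time independent. Therefore equations (24)-(28) show that $<\rho_V>$,
$P_{\rm obs}(\rho_V)$, and $P(>\rho_V)$ do not depend on what we take as the precise moment of
recombination at which $\sigma$ and $\bar\rho$ are evaluated.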
We have plotted the differential probability $dP/d\chi \equiv \langle\rho_V\rangle P_{\rm obs}(\rho_V)$ versus
$\chi \equiv \rho_V/\langle\rho_V\rangle$ in Figure 1, for a range of values of $s$. This figure shows that the
probability distribution drops exponentially for large $\rho_V$, and is remarkably insensitive to the value
of $s$ for all $\rho_V$, except for $\chi \ll 1$. We have also plotted the differential probability per
logarithmic interval of $\chi$, $\chi P_{\rm obs}(\chi)$, versus $\chi$ [i.e. $dP/d\log\chi = \rho_V P_{\rm obs}(\rho_V)$] in the bottom
panel of Figure 1. This quantity is peaked at values of $\chi \approx 0.7$-$0.8$, almost independent
of $s$. Figure 2 shows the integrated probability $P(>\rho_V)$ versus the dimensionless quantity $\beta$
defined by equation (27), for various values of $s$. Again, we see that this probability is only
weakly dependent on $s$. We can evaluate $P(>\rho_V)$ exactly in a few special cases

$$ P(>\rho_V) = \begin{cases} e^{-\beta} & s = 0 ; \\ \frac{3}{2}\,e^{-\beta} & s = \infty \ \&\ \beta \gg 1 . \end{cases} \tag{30} $$

Equation (28) may be used to calculate the median value $(\rho_V)_{1/2}$, for which
$P[>(\rho_V)_{1/2}] \equiv 1/2$. For isolated fluctuations $s = 0$, and the result is

$$ (\rho_V)_{1/2} = \frac{4(\ln 2)^{3/2}}{3\pi^{1/2}}\,\langle\rho_V\rangle = 0.43411\,\langle\rho_V\rangle = 1.1195\,\bar\rho\sigma^3 . \tag{31} $$

For other values of $s$ the median must be calculated numerically, with the result that

$$ (\rho_V)_{1/2} = \langle\rho_V\rangle \times \begin{cases} 0.43411 \\ 0.49335 \\ 0.51480 \end{cases} = \bar\rho\sigma^3 \times \begin{cases} 1.1195 & s = 0 ; \\ 1.4851 & s = 1 ; \\ 1.6595 & s = \infty . \end{cases} \tag{32} $$

Again we see the insensitivity of our results to the shape parameter $s \equiv V/U$.
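The numerical evaluation of equation (28) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the Simpson integration and the truncation of the infinite upper limit are all choices made here, $I_0(s)$ is assumed to be the $t = 0$ value of equation (20), and the ratio $I_1/I_0$ for $s = 1$ is backed out of equations (24) and (25) rather than computed from a definition.

```python
import math

COEFF_24 = 625.0 * math.sqrt(2.0 * math.pi) / 243.0   # coefficient in equation (24)
RATIO_S1 = 3.0103 / COEFF_24                          # I_1/I_0 implied by equation (25), s = 1

def I0(s):
    """I_0(s), assumed equal to I(0, s) from equation (20)."""
    if s == 0.0:
        return 0.5
    return 0.5 - s + s * s * math.log1p(1.0 / s)

def prob_greater(beta, s, steps=4000, span=60.0):
    """Equation (28); the x-integral runs from beta to beta + span."""
    if beta <= 0.0:
        return 1.0

    def bracket(x):
        if s == 0.0:
            return beta
        return (-2.0 * s * math.sqrt(beta * x) + beta
                + 2.0 * s * s * x * math.log1p(math.sqrt(beta / x) / s))

    h = span / steps
    total = 0.0
    for i in range(steps + 1):
        x = beta + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(-x) * bracket(x)
    integral = total * h / 3.0
    return (1.0 + beta) * math.exp(-beta) - integral / (2.0 * I0(s))

def beta_from_chi(chi, ratio_I1_I0):
    """Equation (29), with chi = rho_V/<rho_V>."""
    return math.pi ** (1.0 / 3.0) / 4.0 * (15.0 * chi * ratio_I1_I0) ** (2.0 / 3.0)

if __name__ == "__main__":
    beta_median = beta_from_chi(0.49335, RATIO_S1)   # quoted median for s = 1
    print(beta_median, prob_greater(beta_median, 1.0))
```

Evaluating the probability at the quoted $s = 1$ median should return a value close to one half, and for $s = 0$ the function reduces to $e^{-\beta}$ as in equation (30).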
It is also interesting to ask what is the range of reasonably likely values of $\rho_V$. For
instance, what is the range of values of $\rho_V$ for which only 5% of astronomers in all subuniverses
would observe smaller values, and only 5% would observe larger values? By setting
$P(>\rho_V)$ in equation (28) equal to 0.95 and 0.05 and solving for $\rho_V$, we find that this range
has the lower bound

$$ (\rho_V)_{0.95} = \langle\rho_V\rangle \times \begin{cases} 0.00874 \\ 0.01959 \\ 0.02359 \end{cases} = \bar\rho\sigma^3 \times \begin{cases} 0.02254 & s = 0 ; \\ 0.05897 & s = 1 ; \\ 0.07604 & s = \infty ; \end{cases} \tag{33} $$

and the upper bound

$$ (\rho_V)_{0.05} = \langle\rho_V\rangle \times \begin{cases} 3.9005 \\ 3.6914 \\ 3.6157 \end{cases} = \bar\rho\sigma^3 \times \begin{cases} 10.0586 & s = 0 ; \\ 11.1122 & s = 1 ; \\ 11.6552 & s = \infty . \end{cases} \tag{34} $$
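For $s = 0$ the quantiles have a closed form, because $P(>\rho_V) = e^{-\beta}$ (equation 30) can be inverted directly and equation (29) then gives $\rho_V/\langle\rho_V\rangle$. The sketch below uses a hypothetical function name and takes $I_1(0)/I_0(0) = 2/5$, the value implied by equations (24) and (25).

```python
import math

def chi_quantile_s0(p):
    """rho_V/<rho_V> at which P(>rho_V) = p, for s = 0.

    With P = exp(-beta) and I_1/I_0 = 2/5, equation (29) inverts to
    chi = 4 beta^(3/2) / (3 sqrt(pi)).
    """
    beta = -math.log(p)
    return 4.0 * beta ** 1.5 / (3.0 * math.sqrt(math.pi))

if __name__ == "__main__":
    for p in (0.95, 0.5, 0.05):
        print(p, chi_quantile_s0(p))
```

The three values printed correspond to the $s = 0$ entries of equations (33), (31) and (34) respectively.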
[Figure 1: three plot panels with curves labelled from s = 0.01 to s = 100; see the caption that follows.]

Fig. 1. Differential probability $dP/d\chi$ [$= \langle\rho_V\rangle P_{\rm obs}(\rho_V)$] versus $\chi$, where $\chi \equiv \rho_V/\langle\rho_V\rangle$,
for $s = 0.01$, 0.1, 1, 10, and 100. The bottom panel shows $dP/d\log\chi$ [$= \rho_V P_{\rm obs}(\rho_V)$] versus
$\chi$, instead.

Fig. 2. Integrated probability $P(>\beta)$ versus $\beta$, for $s = 0.01$, 0.1, 1, 10, and 100.

Although the lower bound (33) is evidently somewhat sensitive to the shape parameter $s$, we
see that for all values of $s$ the distribution of $\rho_V$ values is quite broad; it would not be very
unlikely for a subuniverse to have a value of $\rho_V$ that is 50 times smaller or 3.7 times larger
than the average. On the other hand, it would be extremely unlikely to observe a value of
$\rho_V$ which differs from the mean by more than a few orders of magnitude. Not only are large
values of $\rho_V$ unlikely, therefore, as we might previously have guessed based upon the fact
that large $\rho_V$ suppresses galaxy formation, but values of $\rho_V$ extremely close to zero are also
unlikely; there are simply too many other subuniverses to observe which have larger values
of $\rho_V$ but not large enough to prevent galaxy formation.
For some purposes it is useful to have an analytic fit to the integrated probability. The
following generally works quite well:

$$ P(>\rho_V) \approx \frac{(3s+2)\,e^{-\beta} - s\,e^{-2\beta}}{2s+2} . \tag{35} $$

Notice that this fit is consistent with the special cases mentioned above. Writing $P$ for the
exact probability and $f$ for the fit, the absolute
difference $|P - f|$ and the relative difference $|(P - f)/P|$ are always less than 0.025 and 0.028,
respectively, for $0 \le s \le 1$, while these errors increase to 0.027 and 0.092, respectively, for
$1 \le s \le \infty$. Because the approximation formula (35) gives $P(>\rho_V)$ as a quadratic function
of $e^{-\beta}$, it is easy to solve equation (35) for the value of $\beta$ and hence of $\rho_V/\langle\rho_V\rangle$ that give
any specific value for $P(>\rho_V)$. For instance, for $s = 1$ equation (35) yields a median vacuum
density $(\rho_V)_{1/2} = 0.482\,\langle\rho_V\rangle$, as compared with an exact value $(\rho_V)_{1/2} = 0.493\,\langle\rho_V\rangle$,
while for $s = \infty$ equation (35) yields $(\rho_V)_{1/2} = 0.568\,\langle\rho_V\rangle$, as compared with an exact
value $(\rho_V)_{1/2} = 0.515\,\langle\rho_V\rangle$. For $s = 0$, equation (35) is exact.
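Inverting the fit amounts to solving a quadratic in $u = e^{-\beta}$. The sketch below is illustrative, with hypothetical function names; the ratios $I_1/I_0$ are again those implied by equations (24) and (25), and a very large finite $s$ stands in for $s = \infty$.

```python
import math

COEFF_24 = 625.0 * math.sqrt(2.0 * math.pi) / 243.0   # coefficient in equation (24)

def fit_probability(beta, s):
    """Analytic fit of equation (35)."""
    u = math.exp(-beta)
    return ((3.0 * s + 2.0) * u - s * u * u) / (2.0 * s + 2.0)

def fit_beta(p, s):
    """Solve s u^2 - (3s+2) u + (2s+2) p = 0 for u = exp(-beta) in (0, 1]."""
    if s == 0.0:
        return -math.log(p)
    a, b, c = s, -(3.0 * s + 2.0), (2.0 * s + 2.0) * p
    u = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return -math.log(u)

def chi_from_beta(beta, ratio_I1_I0):
    """Equation (29) solved for chi = rho_V/<rho_V>."""
    return 8.0 * beta ** 1.5 / (15.0 * math.sqrt(math.pi) * ratio_I1_I0)

median_s1 = chi_from_beta(fit_beta(0.5, 1.0), 3.0103 / COEFF_24)
median_inf = chi_from_beta(fit_beta(0.5, 1.0e9), 3.2235 / COEFF_24)

if __name__ == "__main__":
    print(median_s1, median_inf)
```

The smaller root of the quadratic is the physical one, since the larger root exceeds unity and would correspond to negative $\beta$.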
Finally, it is also useful to compute the probability that $\chi$ is observed to be within the
interval $[\chi_1, \chi_2]$, according to $P(\chi_1 \le \chi \le \chi_2) \equiv P(>\chi_1) - P(>\chi_2)$. We present these
interval probability results in Table 1, for $s = 1$. We also plot in Figure 3 the interval
probability isocontours such that, for any given value of $\chi$, call it $\chi_\star$, the curves show the
values of $\chi$ above and below this $\chi_\star$ at which the interval probability has some particular
value, as labelled. In particular, each curve in the $(\chi_\star, \chi)$-plane in Figure 3 corresponds to
the locus of points which satisfy the equation $P(>\chi_\star) - P(>\chi) = {\rm constant}$, if $\chi > \chi_\star$,
or $P(>\chi) - P(>\chi_\star) = {\rm constant}$, if $\chi < \chi_\star$, where the labels indicate the values of the
constant.

Table 1. THE PROBABILITY THAT $\rho_V/\langle\rho_V\rangle$ IS BETWEEN TWO VALUES (1)

  chi_1   chi_2   P(>chi_1) - P(>chi_2)  |   chi_1   chi_2   P(>chi_1) - P(>chi_2)
  0.000   0.001   0.00355                |    1.5     1.6    0.01499
  0.001   0.002   0.00317                |    1.6     1.7    0.01366
  0.002   0.003   0.00299                |    1.7     1.8    0.01245
  0.003   0.004   0.00287                |    1.8     1.9    0.01140
  0.004   0.005   0.00277                |    1.9     2.0    0.01044
  0.005   0.006   0.00269                |    2.0     2.1    0.00959
  0.006   0.007   0.00262                |    2.1     2.2    0.00882
  0.007   0.008   0.00256                |    2.2     2.3    0.00813
  0.008   0.009   0.00251                |    2.3     2.4    0.00750
  0.009   0.010   0.00247                |    2.4     2.5    0.00693
                                         |    2.5     2.6    0.00642
  0.00    0.01    0.02820                |    2.6     2.7    0.00585
  0.01    0.02    0.02268                |    2.7     2.8    0.00561
  0.02    0.03    0.02023                |    2.8     2.9    0.00512
  0.03    0.04    0.01858                |    2.9     3.0    0.00477
  0.04    0.05    0.01731                |
  0.05    0.06    0.01629                |    0.0     0.5    0.50350
  0.06    0.07    0.01543                |    0.5     1.0    0.18691
  0.07    0.08    0.01468                |    1.0     1.5    0.10271
  0.08    0.09    0.01404                |    1.5     2.0    0.06294
  0.09    0.10    0.01346                |    2.0     2.5    0.04097
                                         |    2.5     3.0    0.02777
  0.0     0.1     0.18090                |    3.0     3.5    0.01937
  0.1     0.2     0.11175                |    3.5     4.0    0.01382
  0.2     0.3     0.08510                |    4.0     4.5    0.01004
  0.3     0.4     0.06863                |    4.5     5.0    0.00740
  0.4     0.5     0.05712                |    5.0     5.5    0.00553
  0.5     0.6     0.04850                |    5.5     6.0    0.00418
  0.6     0.7     0.04178                |    6.0     6.5    0.00318
  0.7     0.8     0.03638                |    6.5     7.0    0.00245
  0.8     0.9     0.03197                |    7.0     7.5    0.00189
  0.9     1.0     0.02828                |    7.5     8.0    0.00148
  1.0     1.1     0.02518                |    8.0     8.5    0.00115
  1.1     1.2     0.02253                |    8.5     9.0    0.00092
  1.2     1.3     0.02024                |    9.0     9.5    0.00072
  1.3     1.4     0.01825                |    9.5    10.0    0.00058
  1.4     1.5     0.01652                |   10.0    infinity 0.00249

(1) For $s = 1$, where $\chi \equiv \rho_V/\langle\rho_V\rangle$.

Fig. 3. Interval Probability Isocontours. Each curve in the $(\chi_\star, \chi)$-plane is the locus
of points with a constant probability that a subuniverse is observed to have a value of
$\rho_V/\langle\rho_V\rangle$ in the range between $\chi_\star$ and $\chi$ (i.e. $|P(>\chi_\star) - P(>\chi)| = {\rm constant}$, where
each curve is labelled with the value of this constant).

4. Evaluation of $\sigma$
4.1. Filtered Density Fluctuation Spectrum
The mean value $\langle\rho_V\rangle$ as well as the probability distribution $P_{\rm obs}(\rho_V)$ and the
integrated probability $P(>\rho_V)$ have been expressed in equations (24)-(29) in terms of the
variance $\sigma^2$ in the fluctuation distribution (21). Now we must consider how to calculate $\sigma$.
From equation (22), we have $\sigma^2 = \langle\delta^2\rangle$. But the variance $\sigma^2$ which is appropriate for our
purpose here is that which reflects the range of wavenumbers which might possibly contribute
to the formation of gravitational condensations that are large enough to lead to "astronomer
formation." Only wavenumbers corresponding to density fluctuations encompassing such
sufficiently large masses should be allowed to contribute. This implies that the appropriate
$\sigma$ for our purpose here is one calculated by filtering the underlying density field to eliminate
the contribution from small wavelengths. This is accomplished by smoothing the density
field before calculating the variance $\sigma^2$, according to

$$ \sigma^2 = \langle\tilde\delta^2(\mathbf{r})\rangle , \tag{36} $$

where

$$ \tilde\delta(\mathbf{r}) \equiv \int \delta(\mathbf{x})\,W(\mathbf{x}-\mathbf{r})\,d^3x . \tag{37} $$

Here $W$ is a smoothing "window function," and $\mathbf{x}$ and $\mathbf{r}$ are co-moving coordinates, which
following convention we shall normalize to give the proper distance at present. This yields
the following familiar expression for the variance $\sigma^2$:

$$ \sigma^2 = \frac{1}{2\pi^2} \int_0^\infty P(k)\,\hat W^2(kR)\,k^2\,dk , \tag{38} $$

where $P(k)$ is the power spectrum (assuming statistical translation and rotation invariance)

$$ P(|\mathbf{k}|) \equiv \int d^3x\, \langle\delta(\mathbf{x}+\mathbf{r})\,\delta(\mathbf{r})\rangle\, e^{i\mathbf{k}\cdot\mathbf{x}} \tag{39} $$

and $\hat W(kR)$ is the Fourier transform of the window function

$$ \hat W(kR) \equiv \int d^3x\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, W(\mathbf{x}) , \tag{40} $$

with $R$ a length parameter to be specified below, introduced to make the argument of $\hat W$
dimensionless. (The bracket in eq. [39] implies an average over space; the assumptions of
isotropy and homogeneity ensure that $P$ depends only on $k = |\mathbf{k}|$.) The window functions
in which we are interested here are those which filter out modes of wavelength smaller than
some characteristic value $R$. There are two conventional choices for the window function,
the Gaussian window function,

$$ \hat W_G(u) = e^{-u^2/2} , \tag{41} $$

and the Top-Hat window function,

$$ \hat W_{TH}(u) = \frac{3}{u^3}\,(\sin u - u\cos u) , \tag{42} $$

(Peebles 1980). The baryonic mass associated with a density fluctuation of wavelength close
to the filter scale is given by

$$ M_f = \begin{cases} (2\pi)^{3/2}\,\rho_{B0}R^3 & \text{(Gaussian)}, \\ (4\pi/3)\,\rho_{B0}R^3 & \text{(Top-Hat)}, \end{cases} \tag{43} $$

with both $R$ and the cosmic mean baryon density $\rho_{B0}$ evaluated at the present. The radii
for which both window functions enclose the same mass are thus related by

$$ \frac{R_G}{R_{TH}} = \frac{(4\pi/3)^{1/3}}{(2\pi)^{1/2}} = 0.6431 . \tag{44} $$

We shall occasionally refer to a maximum wavenumber $k_{\rm max}$ as the wavenumber
corresponding to a wavelength $R_G$,

$$ k_{\rm max} \equiv \frac{2\pi}{R_G} = \frac{(2\pi)^{3/2}}{(4\pi/3)^{1/3}\,R_{TH}} . \tag{45} $$
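The window functions and the mass-radius relations of equations (41)-(45) are simple to evaluate. The sketch below uses hypothetical function names. The critical density constant (about $2.775\times10^{11}\,h^2\,M_\odot\,{\rm Mpc}^{-3}$) is a standard value and not a number from the text, and the example filter mass of $10^{11}\,M_\odot$ with $h = 1$ and $\Omega_B h^2 = 0.015$ is used purely for illustration.

```python
import math

RHO_CRIT_H2 = 2.775e11   # critical density in h^2 M_sun / Mpc^3 (standard value, assumed)

def w_gauss(u):
    """Gaussian window in Fourier space, equation (41)."""
    return math.exp(-0.5 * u * u)

def w_tophat(u):
    """Top-Hat window in Fourier space, equation (42)."""
    if abs(u) < 1e-4:
        return 1.0 - u * u / 10.0      # small-u series, avoids 0/0
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def filter_mass(R, rho_b0, window="gaussian"):
    """Baryonic mass within the filter, equation (43); R in Mpc."""
    volume = (2.0 * math.pi) ** 1.5 if window == "gaussian" else 4.0 * math.pi / 3.0
    return volume * rho_b0 * R ** 3

def radius_ratio():
    """R_G / R_TH for equal enclosed mass, equation (44)."""
    return (4.0 * math.pi / 3.0) ** (1.0 / 3.0) / math.sqrt(2.0 * math.pi)

def gaussian_radius(M_f, rho_b0):
    """Invert equation (43) for the Gaussian filter radius."""
    return (M_f / ((2.0 * math.pi) ** 1.5 * rho_b0)) ** (1.0 / 3.0)

if __name__ == "__main__":
    rho_b0 = 0.015 * RHO_CRIT_H2       # M_sun / Mpc^3 for Omega_B h^2 = 0.015
    print(radius_ratio(), gaussian_radius(1.0e11, rho_b0))
```

With these illustrative inputs the Gaussian radius comes out at roughly 1 Mpc, the scale adopted for galaxy-mass condensations.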
The particular value of $R_G$ (or $R_{TH}$) appropriate for use in calculating the mass fraction
which collapses out of the background is uncertain. Our understanding of the detailed
conditions necessary for the formation of planets and intelligent life has not yet advanced to
the point of determining what the minimum mass condensation is which is capable of forming
astronomers. Roughly speaking, we should filter out condensations that are too small to
retain metals produced in the first generation of stars. The minimum mass condensation
which is capable of this is currently unknown, however. It is not yet established, for example,
whether globular clusters of mass $10^5$-$10^6\,M_\odot$ are capable of self-enrichment, whereby a
first generation of stars generates and releases heavy elements without expelling them from
the cluster, so that they can subsequently be incorporated in a second generation of stars.
Dwarf galaxies of even greater mass, in fact, are often postulated to undergo an initial burst
of massive star formation which leads to supernova-driven expulsion of their interstellar gas
(containing heavy elements). Even the typical galaxy in a rich cluster of galaxies is widely
believed to have released most of its heavy elements into the intracluster medium, in order
to account for the nearly solar metallicity of that gas, which dominates the baryonic mass of
the cluster. In short, we do not currently know what the minimum mass scale (or associated
wavelength of density fluctuations) is which satisfies the necessary condition that the metals
produced by the first generation of stars are retained. Nor do we know if this is a sufficient
condition for the formation of astronomers.
In fact, all we can say with certainty is that our own Milky Way galaxy met the necessary
and sufficient conditions for forming planets, life, and astronomers. The Milky Way has a
luminosity which makes it roughly an $L_*$-galaxy, the characteristic luminosity in the bright
end of the galaxy luminosity function. If the minimum mass scale $M_f$ that can be responsible
for astronomer formation corresponds to that of an $L_*$-galaxy, then the data on the galaxy
luminosity function and the mass-to-light ratio of the bright inner parts of field spiral
galaxies yield an estimate of the baryonic mass of the bright inner part of an $L_*$-galaxy of
$M_f \approx 10^{11} h^{-1} M_\odot$ (see, for instance, Peebles 1993, pp. 122-123). This leads to an estimate of
$R_G = h^{-1/3}(\Omega_B h^2/0.015)^{-1/3}$ Mpc in present units, assuming a cosmic mean baryon density
which is consistent with the current big bang nucleosynthesis abundance constraints. If, on
the other hand, we take $M_f$ to be the mass of all the baryons initially within a comoving
sphere whose volume equals that which, on average, typically contains just one $L_*$-galaxy
today, this gives $R_G \simeq 2h^{-1}$ Mpc. In view of the fact that the Milky Way is actually not
an isolated galaxy, but is, instead, a member of the Local Group of galaxies, which includes
more than one $L_*$-galaxy, we might even wish to consider the possibility that galaxy group
membership is somehow essential to the formation of astronomers.14 In that case, a value as
large as $R_G \approx 3h^{-1}$ Mpc would even be reasonable.

14 For example, group membership might enable a galaxy which undergoes an early burst of star formation
to expel its heavy elements into the surrounding intra-group environment. The latter might then act as a
reservoir from which the galaxy could later accrete some of its lost metals, after the expelled gas has cooled
off.

In what follows, we will consider a range of values of $R_G$, therefore. For the case where
$M_f$ corresponds to the bright inner part of an $L_*$-galaxy, we shall take $R_G = 1$ Mpc in present
units (where, for simplicity, we shall drop the weak dependence of $R_G$ on $h$ for a fixed value
of $\Omega_B h^2$). We will also bracket the range of possible outcomes by taking values of $R_G$ which
are smaller and larger than this, respectively. On the low side, we take $R_G = 0.01$ Mpc,
relevant for instance, if $M_f$ corresponds to the mass of a globular cluster. On the high side,
we take $R_G = 2$ Mpc or 3 Mpc to illustrate the possibilities that $M_f$ corresponds either to
the total mass within the mean volume per $L_*$-galaxy or else the mass of a small group of
galaxies, respectively.

4.2. The Cold Dark Matter Model

In order to evaluate $\sigma$ for a given value of $R_G$, we must adopt a model for the density
fluctuation power spectrum at recombination. The cosmic microwave background anisotropy
measured at large angles by the COBE satellite is consistent with Gaussian random noise
density fluctuations with a scale-invariant primordial power spectrum $P(k) \propto k^n$, where
$n = 1$, the case referred to as the Harrison-Zel'dovich spectrum. The range currently allowed
by a statistical analysis of the first four years of data from the COBE DMR experiment is, in
fact, $n = 1.2 \pm 0.3$ (Bennett et al. 1996). In what follows, we shall generally assume $n = 1$,
which is the standard prediction of inflationary cosmology. Later, we can consider the effect
of a "tilt" in the primordial spectrum away from the shape for $n = 1$. (Values of $n < 1$ can
result, for example, if the primordial fluctuations include a gravitational wave contribution.)
In general, the power spectrum at recombination differs from the primordial shape $k^n$,
except in the long wavelength limit measured directly by COBE. The difference reflects the
linear growth of the density fluctuations prior to the recombination epoch, which is different
for different wavelengths. The best-studied and most successful model for the growth of
density fluctuations to date is the Cold Dark Matter (CDM) model. This model treats
the CDM density fluctuations as adiabatic fluctuations in a cold, pressure-free gas. Since
we are interested in the growth of density fluctuations in the baryon-electron fluid, which
must be present to form stars, planets, and people, we make the assumption that this
component collapses out in lock-step with the dark matter component, at least for density
fluctuations which are of wavelength large enough to behave in a pressure-free manner.
As long as we restrict our attention to the epoch of recombination and later epochs and
to wavelengths larger than the baryon Jeans length in the intergalactic medium, that is,
the CDM and baryon power spectra should be identical. [For a detailed discussion of the
effects of Jeans-mass-filtering on the linear growth of baryon density fluctuations in a flat,
matter-dominated CDM model in which the Jeans mass is increased by the reheating of the
intergalactic medium which accompanies its reionization, the reader is referred to Shapiro,
Giroux, & Babul (1994)].

4.2.1. The Power Spectrum

We use for the CDM power spectrum the expression given by Liddle et al. (1996) and
references therein:

$$ P(k,z) = 2\pi^2 \left(\frac{c}{H_0}\right)^{3+n} (\delta_H)^2\, k^n\, T^2(q)\, A^{-2}(z,0) , \tag{46} $$

where

$$ A(z,0) = \frac{\delta_+(0)}{\delta_+(z)} , \tag{47} $$

$$ T(q) = \frac{\ln(1+2.34q)}{2.34q} \left[ 1 + 3.89q + (16.1q)^2 + (5.46q)^3 + (6.71q)^4 \right]^{-1/4} , \tag{48} $$

$$ q = \frac{k}{\Gamma h\ {\rm Mpc}^{-1}} , \tag{49} $$

$$ \Gamma = \Omega_0 h\, e^{-\Omega_{B0} - \Omega_{B0}/\Omega_0} , \tag{50} $$

and $\delta_H$ is the dimensionless amplitude at horizon crossing, which must be taken from
observations of anisotropies in the microwave radiation background; $\Omega_0$ is the total matter density
parameter ($\Omega_0 = \bar\rho_0/\rho_{{\rm crit},0}$, where $\rho_{{\rm crit},0} = 3H_0^2/8\pi G$); $\Omega_{B0}$ is the corresponding parameter
for baryonic matter; $H_0$ is the Hubble constant; $h = H_0/100$ km s$^{-1}$ Mpc$^{-1}$; $\delta_+$ is the pure
growing mode solution for the evolution of linear density fluctuations in this flat universe
with nonzero cosmological constant; and $A(z,0) = \delta_+(0)/\delta_+(z)$ is the linear growth factor
between redshift $z$ and the present. For $n = 1$, these formulae describe the case of the
Harrison-Zel'dovich scale-invariant power-law primordial spectrum, modified by the growth
of fluctuations in a CDM model universe, for a flat universe with a nonzero cosmological
constant $\lambda_0 = 1 - \Omega_0 = \rho_V/\rho_{{\rm crit},0}$. The fitting formula for $T(q)$ is from Bardeen et al. (1986),
but with $\Gamma$ given by a fit by Sugiyama (1995) in the form quoted by Liddle et al. (1996).
(The numerical coefficients in this formula depend on the present microwave radiation
energy density and on the quantities 100 km/sec and 1 Mpc used in defining the dimensionless
quantities $q$ and $h$, but not on $\lambda_0$ or $H_0$.) The formula for $\Gamma$ given in equation (50) includes
an exponential correction factor for the effect of nonzero baryon density. Since the variable
$\Gamma$ is often used in the literature to refer, instead, just to the product $\Omega_0 h$ (i.e. without the
exponential correction factor), the so-called "shape parameter" for CDM models, we will
also define $\Gamma_0 \equiv \Omega_0 h$, to use whenever we wish to refer only to this product. In the following
calculations, we use $\Omega_{B0} = 0.015h^{-2}$, consistent with big bang nucleosynthesis constraints
from the abundance of light elements (e.g. Copi, Schramm, & Turner 1995).
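The transfer function of equation (48) and the shape parameter of equation (50) can be written down directly. The sketch below is a minimal illustration with hypothetical function names; the parameter values in the example call are arbitrary choices, not results from the text.

```python
import math

def shape_parameter(omega0, h, omega_b0):
    """Gamma of equation (50), including the baryon correction factor."""
    return omega0 * h * math.exp(-omega_b0 - omega_b0 / omega0)

def transfer(q):
    """CDM transfer function T(q) of equation (48)."""
    if q == 0.0:
        return 1.0                                   # limit of ln(1+x)/x as x -> 0
    poly = 1.0 + 3.89 * q + (16.1 * q) ** 2 + (5.46 * q) ** 3 + (6.71 * q) ** 4
    return math.log1p(2.34 * q) / (2.34 * q) * poly ** -0.25

def spectrum_shape(k, n, omega0, h, omega_b0):
    """k^n T^2(q) with q from equation (49); k in 1/Mpc, amplitude omitted."""
    q = k / (shape_parameter(omega0, h, omega_b0) * h)
    return k ** n * transfer(q) ** 2

if __name__ == "__main__":
    h = 0.5
    omega_b0 = 0.015 / h ** 2                        # baryon density adopted in the text
    for k in (0.001, 0.01, 0.1, 1.0):
        print(k, spectrum_shape(k, 1.0, 1.0, h, omega_b0))
```

On large scales $T \to 1$ and the spectrum keeps its primordial slope, while on small scales the suppression from $T^2(q)$ turns the spectrum over.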
It is important to note that the explicit dependences of equations (46)-(50) on the
values of $\Omega_0$ and $h$ do not mean that the power spectrum at recombination is different for
different subuniverses with different values of the cosmological constant. All factors which
depend upon $\Omega_0$ and $h$ reflect the fact that a knowledge of the local values of $\Omega_0$ and $h$ in
our own subuniverse is required in order to interpret present-day observations in our own
subuniverse (such as those of cosmic microwave background anisotropy) unambiguously to
determine the power spectrum at recombination assumed common to all subuniverses. We
must, therefore, distinguish clearly between the particular values of $\Omega_0 = 1 - \lambda_0$ and $h$ in our
own subuniverse, on which our inference of the universal power spectrum depends, and the
variables $\Omega_0 = 1 - \lambda_0$ and $h$, different for different subuniverses, on which the probability
of galaxy formation in any subuniverse depends. To avoid any possible confusion on this
point, we will, henceforth, indicate the values of these quantities in our own subuniverse by
adding an asterisk to the symbol (i.e. $\lambda_0^*$, $\Omega_0^*$, $H_0^*$, $\rho_V^*$, etc.). This notation is not necessary
for the quantities $\bar\rho$, $\rho_B$, $P(k)$, or $\sigma$, however, since these are assumed not to vary from one
subuniverse to another.

4.2.2. The Linear Growth Rate in a Flat Universe with Nonzero Cosmological Constant
In order to evaluate equation (46) for the CDM power spectrum for any particular value
of the vacuum energy density, we must evaluate the growth factor $A(z,0;\lambda_0)$ in equation (47)
as a function of $z$ and $\lambda_0 \equiv \rho_V/\rho_{{\rm crit},0}$. It is convenient to express this in terms of a function
$f(\lambda_0,z)$, defined as the ratio of the growth factor $(1+z)$ in an Einstein-de Sitter universe
to $A(z,0;\lambda_0)$:

$$ f(\lambda_0,z) \equiv \frac{1+z}{A(z,0;\lambda_0)} = (1+z)\,\frac{\delta_+(z)}{\delta_+(0)} , \tag{51} $$

where $\delta_+$ is the amplitude of the linear growing mode, which is given for general $\lambda_0 \ne 0$ by

$$ \delta_+(z;\lambda_0) = \left(\frac{1}{y} + 1\right)^{1/2} \int_0^y \frac{dw}{w^{1/6}(1+w)^{3/2}} \tag{52} $$

(Martel 1991), with

$$ y \equiv \frac{\rho_V}{\bar\rho(z)} = \frac{\lambda_0}{\Omega_0}\,(1+z)^{-3} , \tag{53} $$

and $\Omega_0 \equiv 1 - \lambda_0$. Using equations (51)-(53), we get, after some algebra,

$$ f(\lambda_0,z) = \Omega_0^{1/2}(1+z)^{5/2} \left[ 1 + \frac{\lambda_0}{\Omega_0(1+z)^3} \right]^{1/2} \left[ \int_0^{\lambda_0/\Omega_0} \frac{dw}{w^{1/6}(1+w)^{3/2}} \right]^{-1} \times \int_0^{\lambda_0/[\Omega_0(1+z)^3]} \frac{dw}{w^{1/6}(1+w)^{3/2}} . \tag{54} $$

For $1 + z \gg 1$, this gives the $z$-independent result

$$ f(\lambda_0,z) \simeq f(\lambda_0) = \frac{6\,\lambda_0^{5/6}}{5\,\Omega_0^{1/3}} \left[ \int_0^{\lambda_0/\Omega_0} \frac{dw}{w^{1/6}(1+w)^{3/2}} \right]^{-1} . \tag{55} $$

The corrections are of order $(1+z)^{-3}$, which for the case $z \approx 1000$ that interests us is entirely
negligible. We have evaluated the integral in equation (55) numerically, and have plotted
the function $f(\lambda_0)$ in Figure 4. As we see, $f(\lambda_0)$ differs substantially from unity only for
relatively large values of $\lambda_0$.
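One way to evaluate equation (55) numerically is sketched below. The function names are hypothetical, and the substitution $w = v^6$ is a choice made here (not necessarily what the authors did) to remove the integrable singularity of the integrand at $w = 0$ before applying Simpson's rule.

```python
import math

def growth_integral(y, steps=2000):
    """Integral of dw / (w^(1/6) (1+w)^(3/2)) from 0 to y.

    With w = v^6 the integrand becomes 6 v^4 / (1 + v^6)^(3/2),
    which is smooth on [0, y^(1/6)].
    """
    v_max = y ** (1.0 / 6.0)
    h = v_max / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * 6.0 * v ** 4 / (1.0 + v ** 6) ** 1.5
    return total * h / 3.0

def f_growth(lambda0):
    """Equation (55) for a flat model with Omega_0 = 1 - lambda_0."""
    if lambda0 <= 0.0:
        return 1.0                      # Einstein-de Sitter limit
    omega0 = 1.0 - lambda0
    prefactor = 6.0 * lambda0 ** (5.0 / 6.0) / (5.0 * omega0 ** (1.0 / 3.0))
    return prefactor / growth_integral(lambda0 / omega0)

if __name__ == "__main__":
    for lam in (0.0, 0.2, 0.4, 0.6, 0.8, 0.9):
        print(lam, f_growth(lam))
```

The function stays near unity for small $\lambda_0$ and reaches about 1.4 at $\lambda_0 = 0.8$, in line with the statement that it departs substantially from unity only at large $\lambda_0$.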

Fig. 4. Ratio $f(\lambda_0)$ of the linear growth factors for the Einstein-de Sitter model and the
flat $\lambda_0 \ne 0$ model defined by equation (55), versus $\lambda_0$.

4.2.3. Normalization at Recombination from Cosmic Microwave Background Anisotropy Measurements

According to Bunn & White (1996), the first four years of data on the cosmic microwave
background temperature anisotropy detected by the COBE DMR experiment may be fit with
a dimensionless amplitude at horizon crossing given by the formula

$$ \delta_H^* = 1.94 \times 10^{-5}\, (\Omega_0^*)^{-0.785 - 0.05\ln\Omega_0^*}\, \exp\left[ a(n-1) + b(n-1)^2 \right] , \tag{56} $$

where an asterisk, recall, denotes that the quantities are evaluated for our own subuniverse
only. There are two sets of values of the constants $a$ and $b$, which correspond to the cases
of $n \ne 1$ without any gravitational wave contribution ($a = -0.95$, $b = -0.169$) and of
power-law inflation with gravitational waves ($a = 1$, $b = 1.97$), respectively.
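Equation (56) translates into a one-line function. The sketch below uses a hypothetical function name and a hypothetical `tensors` flag to select between the two sets of constants quoted above.

```python
import math

def delta_H(omega0, n=1.0, tensors=False):
    """Horizon-crossing amplitude of equation (56) for a flat model.

    tensors=False uses a = -0.95, b = -0.169 (no gravitational waves);
    tensors=True uses a = 1, b = 1.97 (power-law inflation).
    """
    a, b = (1.0, 1.97) if tensors else (-0.95, -0.169)
    tilt = n - 1.0
    return (1.94e-5 * omega0 ** (-0.785 - 0.05 * math.log(omega0))
            * math.exp(a * tilt + b * tilt * tilt))

if __name__ == "__main__":
    for omega0 in (1.0, 0.6, 0.2):
        print(omega0, delta_H(omega0), delta_H(omega0, n=0.9))
```

Lowering $\Omega_0^*$ raises the inferred amplitude, which is one of the ways the assumed local cosmological constant feeds into the inferred fluctuation spectrum.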
Equations (46)-(51), along with equations (55) and (56), can now be evaluated to
compute the power spectrum at recombination for any flat model for any values of $\lambda_0^*$ (or,
equivalently, $\Omega_0^*$) and $h^*$. The results for $n = 1$ are shown in Figure 5 for $\lambda_0^* = 0$ and
$h^* = 0.5$ or 1 (top panel), and for various values of $\lambda_0^*$ and $h^* = 0.5$ (bottom panel).

Fig. 5. CDM power spectrum at recombination ($z_{\rm rec} = 1000$), for $n = 1$, versus comoving
wavenumber $k$ ($= 2\pi/\lambda$, where wavelength $\lambda$ is in present units of Mpc). Top panel:
Einstein-de Sitter model ($\Omega_0^* = 1$, $\lambda_0^* = 0$) with Hubble constants $h^* = 0.5$ and 1. Bottom
panel: Flat models with $h^* = 0.5$ and $\lambda_0^* = 0$, 0.2, 0.4, 0.6, and 0.8.

4.3. Results for $\sigma$ and $\bar\rho\sigma^3$

The variance $\sigma^2$ at recombination is given by equations (38) and (46)-(51), as

$$ \sigma(z_{\rm rec}) = (c_{100})^{(n+3)/2} \left[ \Gamma^{(n+3)/2}\, \delta_H\, A^{-1}(z_{\rm rec},0)\, K_n^{1/2}(q_{\rm max}) \right]_* = (c_{100})^{(n+3)/2} (1+z_{\rm rec})^{-1} \left[ \Gamma^{(n+3)/2}\, \delta_H\, f(\lambda_0)\, K_n^{1/2}(q_{\rm max}) \right]_* , \tag{57} $$

where the symbol "$*$" labelling the brackets in equation (57) and in what follows indicates
that all quantities inside the brackets are evaluated using the values $\lambda_0^*$ and $h^*$ in our own
subuniverse; $c_{100} = 2997.9$ is the speed of light in units of 100 km/sec; the second equality
refers to the result to leading order in $(1+z_{\rm rec})^{-3}$; and

$$ K_n(q_{\rm max}) \equiv \begin{cases} \displaystyle\int_0^\infty q^{n+2}\, T^2(q)\, \hat W_G^2\!\left( \frac{2\pi q}{q_{\rm max}} \right) dq & \text{(Gaussian)} \\ \displaystyle\int_0^\infty q^{n+2}\, T^2(q)\, \hat W_{TH}^2\!\left( \frac{(2\pi)^{3/2}\, q}{(4\pi/3)^{1/3}\, q_{\rm max}} \right) dq & \text{(Top-Hat)} \end{cases} \tag{58} $$

The integrals in equation (58) are evaluated numerically. The results are shown in Figure 6
including the values $n = 1$, 0.9, and 0.8. (Note: For $n \ne 1$, we hereafter adopt constants $a$
and $b$ for the case of $n \ne 1$ with no gravitational waves. The case with gravitational waves
yields a slightly smaller value of $\sigma$ for the same value of $n$.)
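A minimal sketch of the Gaussian-window case of equation (58) is given below. It is not the authors' integration scheme: the function names are hypothetical, the integral is done by Simpson's rule in $\ln q$, and the range is truncated at a small lower cutoff and at $q = q_{\rm max}$, where the window factor $e^{-(2\pi q/q_{\rm max})^2}$ is already below $10^{-17}$.

```python
import math

def transfer(q):
    """CDM transfer function T(q) of equation (48), for q > 0."""
    poly = 1.0 + 3.89 * q + (16.1 * q) ** 2 + (5.46 * q) ** 3 + (6.71 * q) ** 4
    return math.log1p(2.34 * q) / (2.34 * q) * poly ** -0.25

def K_gauss(q_max, n=1.0, steps=2000, q_min=1.0e-4):
    """Gaussian-window K_n(q_max) of equation (58), integrated in x = ln q."""
    x_lo, x_hi = math.log(q_min), math.log(q_max)
    h = (x_hi - x_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        q = math.exp(x_lo + i * h)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        window_sq = math.exp(-(2.0 * math.pi * q / q_max) ** 2)
        # dq = q dx, hence the extra power of q
        total += weight * q ** (n + 3.0) * transfer(q) ** 2 * window_sq
    return total * h / 3.0

if __name__ == "__main__":
    for q_max in (10.0, 63.0, 200.0):
        print(q_max, K_gauss(q_max, 1.0), K_gauss(q_max, 0.9), K_gauss(q_max, 0.8))
```

The integral grows slowly (roughly logarithmically) with $q_{\rm max}$, and for fixed $q_{\rm max}$ of order tens it decreases as $n$ is lowered below 1.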

Fig. 6. The dimensionless integral $K_n(q_{\rm max})$ vs. $q_{\rm max}$, defined by equation (58) for CDM
density fluctuations, for the Gaussian (solid) and Top-Hat (dashed) window functions,
respectively, for $n = 1$, 0.9, and 0.8, as labelled.

It is customary to report the normalizations of the power spectra for different models
in terms of the value of $\sigma$ evaluated at the present for a particular filter scale, assuming
fluctuations continue to grow at the linear growth rate. For our purpose here, however, we
must evaluate $\sigma$ at recombination, thereby undoing the effects of the growth of fluctuations
since that epoch which influence the value of $\sigma$ at the present. Our goal, recall, is to use
the observations of the cosmic microwave background anisotropy made by astronomers in
our own subuniverse to infer the universal density fluctuation distribution, common to all
subuniverses at $z_{\rm rec}$. Unfortunately, our ability to infer this universal density fluctuation
distribution is limited by the fact that we must know the values of $\lambda_0^*$ and $h^*$ in our own
subuniverse in order to interpret the cosmic microwave background anisotropy measurements
unambiguously. We illustrate the dependence of the inferred density fluctuations on the
assumed values of $\lambda_0^*$ and $h^*$ in Figure 7, where we have plotted the value of $\sigma$ at recombination
as a function of $\lambda_0^*$, for $R_G = 0.01$, 1, 2, and 3 Mpc, for $n = 1$, 0.9, and 0.8. We have taken
$z_{\rm rec} = 1000$ for the results in Figure 7. The effect of a "tilt" to $n < 1$ is to decrease $\sigma$ relative
to its value for $n = 1$, for the same value of $q_{\rm max}$, or equivalently, of $R_G$.
As already mentioned, results for the probability distribution of $\rho_V$ actually depend not
on $\sigma$ or $\bar\rho$, but on the parameter $\bar\rho\sigma^3$. The total matter density $\bar\rho$ at recombination is related
to the present matter density $\bar\rho_0^*$ by $\bar\rho = \bar\rho_0^*(1+z_{\rm rec})^3$, so as promised $\bar\rho\sigma^3$ is independent of
the precise value chosen for $z_{\rm rec}$:

$$ \bar\rho\sigma^3 = \bar\rho_0^*\, (c_{100})^{(3n+9)/2} \left[ \Gamma^{(n+3)/2}\, \delta_H\, f(\lambda_0)\, K_n^{1/2}(q_{\rm max}) \right]_*^3 . \tag{59} $$

5. RESULTS
With all of the ingredients necessary to evaluate the probability distribution Pobs(V )
thus assembled, we can now evaluate the probability of observing any particular value of V
anywhere in the universe, as well as the average and median values observed, as functions
of the values we adopt for V (or 0 = 1
0) and h in our own subuniverse. There
are two ways to use this information. We may try to guess the actual value of V , by
assuming that we live in a typical subuniverse, in which V is equal to either the mean or
the median observed values of V for all astronomers in all subuniverses. (This will be done
in subsections 5.1 and 5.2.) Such a \prediction" carries low con dence, since it is always
possible that V in our subuniverse could be signi cantly di erent from the anthropic mean or
median. Alternatively, we can use what data we have to estimate the range of observationally
allowed values of V , and then calculate the likelihood that a randomly chosen astronomer
in any subuniverse would nd such a value. (This will be the subject of subsections 5.3 and
5.4.) If this likelihood proves to be appreciable, then the anthropic principle would survive
as a possible explanation for the particular value of the cosmological constant in our own
{ 28 {

Fig. 7.| (a) The rms density uctuation at recombination (i.e. at zrec = 1000),  =
rec, in the COBE-normalized, at CDM model, versus 0, for RG = 0:01; 1; 2; and 3 Mpc,
respectively, as labelled, for h = 0:5, for n = 1 (top panel), 0.9 (middle panel), and 0.8
(bottom panel). (b) Same as Fig. 7(a), except for h = 1.
{ 29 {

Fig. 7b
{ 30 {

subuniverse.

5.1. The Average Observed Value of the Vacuum Energy Density in a COBE-Normalized CDM Universe

From equations (24) and (59), the average observed vacuum density is given by

$$ \frac{\langle\rho_V\rangle}{\bar\rho_0^*} = \frac{625\,(2\pi)^{1/2}}{243}\, \frac{I_1(s)}{I_0(s)}\, c_{100}^{(3n+9)/2} \left\{ \Gamma^{(3n+9)/2} \left[ f(\lambda_0)\,\delta_H \right]^3 K_n^{3/2}(q_{\rm max}) \right\}_* , \tag{60} $$

where again the symbol $*$ indicates that all quantities inside the braces are evaluated at the
particular values $\lambda_0 = \lambda_0^*$, $\Omega_0 = \Omega_0^*$, $h = h^*$ for our own subuniverse.
We have used equation (60) to plot the ratio $\rho_V^*/\langle\rho_V\rangle$ versus the assumed value $\lambda_0^*$ of
the cosmological constant in our subuniverse, and also versus the ratio $\lambda_0^*/\Omega_0^*$, in Figure 8
for $h^* = 0.5$ and 1, $R_G = 0.01$, 1, 2, and 3 Mpc, and $n = 1$, 0.9, 0.8.
For what values of $\lambda_0^*$ and $h^*$ is the vacuum energy density in our own subuniverse equal
to the observed average of all the subuniverses? For the particular values of $R_G$ and $h^*$
assumed in plotting the curves in Figure 8, the intersection of the horizontal dashed line
with the curves indicates the values for which $\rho_V^* = \langle\rho_V\rangle$. To answer this question more
generally as a function of $R_G$ and $h^*$, we have solved the implicit equation $\rho_V^* = \langle\rho_V\rangle$
numerically by setting the quantity (60) equal to $\rho_V^*/\bar\rho_0^* = \lambda_0^*/(1-\lambda_0^*)$, using the secant
method. The results are shown in Table 2 for $n = 1$ and $s = 1$. According to these results, if
the probability that observers are created in any subuniverse is proportional to the amount
of mass which eventually collapses out into the bright inner parts of $L_*$-galaxy-mass objects
or larger objects (i.e. $R_G \ge 1$ Mpc), then our own subuniverse has the average observed
value of the vacuum energy density if the value of $\lambda_0^*$ which we observe locally is

$$ \frac{\rho_V^*}{\bar\rho_0^*} = \begin{cases} 4.1 , \\ 7.3 , \\ 12.4 , \end{cases} \qquad \lambda_0^* = \begin{cases} 0.80 , & h^* = 0.5 ; \\ 0.88 , & h^* = 0.7 ; \\ 0.93 , & h^* = 1 . \end{cases} \tag{61} $$

Fig. 8. (a) The vacuum energy density adopted for our own subuniverse, $\rho_V^* = 3(H_0^*)^2\lambda_0^*/8\pi G$,
is plotted in units of the mean value of the vacuum energy density observed
in all subuniverses, $\langle\rho_V\rangle$, versus $\lambda_0^*$, the cosmological constant in our own
subuniverse, for local Hubble constant $h^* = 0.5$, for a COBE-normalized flat CDM model with
primordial power spectrum index $n = 1$ (top panel), 0.9 (middle panel), and 0.8 (bottom
panel), assuming shape parameter $s = 1$. Intersections of the dashed horizontal lines at
$\rho_V^*/\langle\rho_V\rangle = 1$ with the various curves give the solution of the implicit equation obtained
by setting $\langle\rho_V\rangle = \rho_V^*$ in equation (60); (b) Same as Fig. 8(a), except plotted versus $\lambda_0^*/\Omega_0^*$;
(c) Same as Fig. 8(a), except for $h^* = 1$; (d) Same as Fig. 8(b), except for $h^* = 1$.

Table 2. LOCAL VACUUM ENERGY DENSITIES WHICH EQUAL THE GLOBAL AVERAGE ($\rho_V^* = \langle\rho_V\rangle$) (1)

                   h* = 0.5                    h* = 0.7                    h* = 1
R_G (Mpc) (2)   lambda_0*  Omega_0*  ratio   lambda_0*  Omega_0*  ratio   lambda_0*  Omega_0*  ratio
   0.003         0.8905    0.1095   8.1324    0.9373    0.0627   14.949    0.9635    0.0365   26.397
   0.010         0.8839    0.1161   7.6133    0.9327    0.0673   13.859    0.9602    0.0398   24.126
   0.030         0.8740    0.1260   6.9365    0.9256    0.0744   12.441    0.9562    0.0438   21.831
   0.100         0.8593    0.1407   6.0173    0.9161    0.0839   10.919    0.9500    0.0500   19.000
   0.300         0.8397    0.1603   5.2383    0.9032    0.0968   9.3306    0.9415    0.0585   16.094
   1.000         0.8046    0.1954   4.1177    0.8791    0.1209   7.2713    0.9252    0.0748   12.369
   2.000         0.7712    0.2288   3.3706    0.8554    0.1446   5.9156    0.9073    0.0927   9.7875
   3.000         0.7418    0.2582   2.8730    0.8339    0.1661   5.0205    0.8912    0.1088   8.1912
   6.000         0.6627    0.3373   1.9647    0.7669    0.2331   3.2900    0.8326    0.1674   4.9737
  10.000         0.5424    0.4576   1.1853    0.6408    0.3592   1.7840    0.6680    0.3320   2.0120
  15.000         0.3582    0.6418   0.5581    0.3539    0.6461   0.5478    0.1665    0.8335   0.1998
  20.000         0.1861    0.8139   0.2287    0.1217    0.8783   0.1386    0.0377    0.9623   0.0918

The columns headed "ratio" give $\lambda_0^*/\Omega_0^*$.
(1) For Harrison-Zel'dovich scale-invariant primordial power spectrum ($n = 1$), and $s = V/U = 1$.
(2) Present units.

If the relevant collapsed fraction is, instead, that which condenses into objects as large as or
larger than those which contain the mean total baryon mass per $L_*$-galaxy (i.e. $R_G \ge 2$ Mpc),
then our own universe has $\rho_V^* = \langle\rho_V\rangle$ if

$$ \frac{\rho_V^*}{\bar\rho_0^*} = \begin{cases} 3.4 , \\ 5.9 , \\ 9.8 , \end{cases} \qquad \lambda_0^* = \begin{cases} 0.77 , & h^* = 0.5 ; \\ 0.86 , & h^* = 0.7 ; \\ 0.92 , & h^* = 1 . \end{cases} \tag{62} $$


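If, instead, galaxy groups are the minimum scale of interest (i.e. $R_G \approx 3$ Mpc), then our
own subuniverse has $\rho_V = \langle\rho_V\rangle$ if
\[
\frac{\rho_V}{\bar\rho_0} \cong
\begin{cases} 2.9, \\ 5.0, \\ 8.2, \end{cases}
\qquad
\lambda_0 \cong
\begin{cases} 0.74, & h = 0.5, \\ 0.83, & h = 0.7, \\ 0.89, & h = 1. \end{cases}
\tag{63}
\]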

On the other hand, if globular cluster formation is enough to satisfy the anthropic constraint
(i.e., $R_G \approx 0.01$ Mpc), then our own universe has $\rho_V = \langle\rho_V\rangle$ if
\[
\frac{\rho_V}{\bar\rho_0} \cong
\begin{cases} 7.6, \\ 13.9, \\ 24.1, \end{cases}
\qquad
\lambda_0 \cong
\begin{cases} 0.88, & h = 0.5, \\ 0.93, & h = 0.7, \\ 0.96, & h = 1. \end{cases}
\tag{64}
\]
We have assumed $s = 1$ in all these cases.
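The tabulated solutions can be checked for internal consistency with a few lines of arithmetic. The sketch below assumes, as the flat model implies, that the matter density parameter is $\Omega_0 = 1 - \lambda_0$, so that the third column of each group in Table 2 is $\lambda_0/(1 - \lambda_0)$. It uses the $R_G = 2$ Mpc row of Table 2; the function name is illustrative only.

```python
# Rows of Table 2 for R_G = 2 Mpc: (h, lambda_0, Omega_0, lambda_0/Omega_0).
ROWS_RG_2_MPC = [
    (0.5, 0.7712, 0.2288, 3.3706),
    (0.7, 0.8554, 0.1446, 5.9156),
    (1.0, 0.9073, 0.0927, 9.7875),
]


def flat_universe_columns(lambda_0):
    """Return (Omega_0, lambda_0/Omega_0), assuming a flat universe."""
    omega_0 = 1.0 - lambda_0
    return omega_0, lambda_0 / omega_0


for h, lambda_0, omega_0, ratio in ROWS_RG_2_MPC:
    omega_calc, ratio_calc = flat_universe_columns(lambda_0)
    print(f"h = {h}: Omega_0 = {omega_calc:.4f} (table {omega_0}), "
          f"lambda_0/Omega_0 = {ratio_calc:.3f} (table {ratio})")
```

Small differences in the last digit of the ratio are expected, because the tabulated $\lambda_0$ values are themselves rounded to four decimal places.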
The effect of the "tilt" from $n = 1$ to $n < 1$ on these results is to reduce the values
of $\langle\rho_V\rangle$, or $\langle\lambda_0/\Omega_0\rangle$, somewhat. This can be understood in terms of the ratio $\sigma_n^3/\sigma_1^3$,
where the subscript refers to the index of the primordial power spectrum. This ratio is given
by
\[
\frac{\sigma_n^3}{\sigma_1^3}
= \left[\frac{c\,\Gamma h\,{\rm Mpc}^{-1}}{H_0}\right]^{3(n-1)/2}
  \left[\frac{\delta_H(n)}{\delta_H(1)}\right]^{3}
  \left[\frac{K_n(q_{\max})}{K_1(q_{\max})}\right]^{3/2}
= (c_{100}\Gamma)^{3(n-1)/2}\,\exp\left[3a(n-1) + 3b(n-1)^2\right]
  \left[\frac{K_n(q_{\max})}{K_1(q_{\max})}\right]^{3/2} .
\tag{65}
\]

The exponential factor in the second line of equation (65) is always close to unity for $1 - n \ll 1$.
The first factor in this line, however, decreases as $(1 - n)$ increases. For $q_{\max} \gtrsim 10$, the
results plotted in Figure 6 show that the ratio $K_n/K_1$ also decreases as $1 - n$ increases, albeit
slowly for $1 - n \ll 1$. For $\Gamma = 0.2$, $h = 0.5$, and $R_G = 1$ Mpc (i.e. $q_{\max} \cong 63$), for example,
we find that $\sigma_n^3/\sigma_1^3 = 0.416$ and 0.173 for $n = 0.9$ and 0.8, respectively. A small tilt to $n < 1$,
therefore, decreases the variance of the density fluctuations on galaxy-mass scales. For a tilt
within the range allowed by the cosmic microwave background anisotropy measured at large
angles by the COBE satellite, this effect of the tilt translates into a modest decrease in the
average $\rho_V$ observed for all subuniverses. For example, our own subuniverse has the average
observed value of $\rho_V$ (i.e. $\rho_V = \langle\rho_V\rangle$) for $R_G = 1$ Mpc and $h = 0.7$ if
\[
\lambda_0 \cong
\begin{cases} 0.827, & n = 0.8, \\ 0.857, & n = 0.9, \\ 0.879, & n = 1. \end{cases}
\tag{66}
\]
If $R_G = 2$ Mpc and $h = 0.7$, this becomes
\[
\lambda_0 \cong
\begin{cases} 0.792, & n = 0.8, \\ 0.828, & n = 0.9, \\ 0.855, & n = 1. \end{cases}
\tag{67}
\]
Again, we have assumed $s = 1$ for all these cases.
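To get a feel for the size of the tilt effect, the first factor in the second line of equation (65) can be evaluated on its own. The sketch below assumes that this factor is $(c_{100}\Gamma)^{3(n-1)/2}$, with $c_{100}$ the speed of light in units of 100 km/s and $\Gamma$ the shape parameter; both identifications are read off a damaged equation and should be treated as assumptions. The exponential and $K_n/K_1$ factors are not computed here.

```python
# Speed of light in km/s, and in units of 100 km/s (dimensionless c_100).
C_KM_S = 299792.458
C_100 = C_KM_S / 100.0


def tilt_prefactor(n, gamma):
    """First factor of eq. (65), assumed to be (c_100 * Gamma)^(3(n-1)/2)."""
    return (C_100 * gamma) ** (3.0 * (n - 1.0) / 2.0)


for n in (1.0, 0.9, 0.8):
    print(f"n = {n}: prefactor = {tilt_prefactor(n, gamma=0.2):.3f}")
```

For $\Gamma = 0.2$ this factor alone already accounts for most of the reduction quoted in the text (0.416 and 0.173 for $n = 0.9$ and 0.8); the remaining factors in equation (65) supply the difference.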

5.2. The Median Observed Value of the Vacuum Energy Density in a COBE-Normalized CDM Universe

We can calculate the median value of the vacuum energy density observed in all
subuniverses, $(\rho_V)_{1/2}$, by using the solutions for the ratio $(\rho_V)_{1/2}/\langle\rho_V\rangle$ given in equation (32),
multiplied by $\langle\rho_V\rangle$ as calculated already from equation (60). The latter depends upon
the local values adopted for $\lambda_0$ and $h$, used in calculating the variance $\sigma^2$. Since the value
of this ratio is, for a given $s$, just a number, independent of $\sigma$, it is sufficient for us to multiply
our previous results for $\langle\rho_V\rangle$ by it (for instance, $(\rho_V)_{1/2}/\langle\rho_V\rangle = 0.43411$ for $s = 1$).
For convenience, we have plotted $\rho_V/(\rho_V)_{1/2}$ versus $\lambda_0$ and versus $\lambda_0/\Omega_0$, for
$h = 0.5$ and 1, $R_G = 0.01$, 1, 2, and 3 Mpc, and $n = 1$, 0.9, 0.8 in Figure 9.
For what values of $\lambda_0$ and $h$ is the vacuum energy density of our own subuniverse equal
to the median value $(\rho_V)_{1/2}$ observed in all subuniverses? For the particular values of $R_G$ and
$h$ assumed in plotting the curves in Figure 9, the intersection of the horizontal dashed lines
with the curves indicates the values for which $\rho_V = (\rho_V)_{1/2}$. To answer this question more
generally as a function of $R_G$ and $h$, we must solve the implicit equation $(\rho_V)_{1/2} = \rho_V$. This
is similar to our previous implicit equation $\langle\rho_V\rangle = \rho_V$, except that $\langle\rho_V\rangle$ is multiplied by the
ratio $(\rho_V)_{1/2}/\langle\rho_V\rangle$, which is a constant for a given $s$ (and only weakly depends on $s$).
The results are shown in Table 3 for $n = 1$ and $s = 1$. We find that for $R_G = 1$ Mpc, our
own subuniverse has a vacuum energy density equal to that of the median if
\[
\frac{\rho_V}{\bar\rho_0} \cong
\begin{cases} 3.4, \\ 6.1, \\ 10.3, \end{cases}
\qquad
\lambda_0 \cong
\begin{cases} 0.77, & h = 0.5, \\ 0.86, & h = 0.7, \\ 0.91, & h = 1; \end{cases}
\tag{68}
\]

Fig. 9. (a) The vacuum energy density adopted for our own subuniverse, $\rho_V$, is plotted in
units of the median value of the vacuum energy density observed in all subuniverses, $(\rho_V)_{1/2}$,
versus $\lambda_0$, the cosmological constant in our own subuniverse, for local Hubble constant
$h = 0.5$, for a COBE-normalized, flat CDM model with primordial spectral index $n = 1$
(top panel), 0.9 (middle panel), and 0.8 (bottom panel), assuming shape parameter $s = 1$.
Intersections of the dashed horizontal lines at $\rho_V/(\rho_V)_{1/2} = 1$ with the various curves give the
solution of the implicit equation obtained by setting $(\rho_V)_{1/2} = \rho_V$ in equations (27), (28),
and (59); (b) Same as Fig. 9(a), except plotted versus $\lambda_0/\Omega_0$; (c) Same as Fig. 9(a), except
for $h = 1$; (d) Same as Fig. 9(b), except for $h = 1$.

Fig. 9b

Fig. 9c

Fig. 9d

Table 3. LOCAL VACUUM ENERGY DENSITIES WHICH EQUAL THE GLOBAL MEDIAN ($\rho_V = (\rho_V)_{1/2}$) [1]

               h = 0.5                   h = 0.7                   h = 1
R_G (Mpc) [2]  λ_0     Ω_0     λ_0/Ω_0   λ_0     Ω_0     λ_0/Ω_0   λ_0     Ω_0     λ_0/Ω_0
0.003          0.8795  0.1205  7.2988    0.9301  0.0699  13.306    0.9592  0.0408  23.510
0.010          0.8709  0.1291  6.7459    0.9239  0.0761  12.141    0.9553  0.0447  21.371
0.030          0.8592  0.1408  6.1023    0.9164  0.0836  10.962    0.9504  0.0496  19.161
0.100          0.8414  0.1586  5.3052    0.9049  0.0951  9.5152    0.9429  0.0571  16.513
0.300          0.8175  0.1825  4.4795    0.8891  0.1109  8.0171    0.9324  0.0676  13.793
1.000          0.7740  0.2260  3.4248    0.8590  0.1410  6.0922    0.9115  0.0885  10.299
2.000          0.7319  0.2681  2.7300    0.8273  0.1727  4.7904    0.8884  0.1116  7.9606
3.000          0.6940  0.3060  2.2680    0.7988  0.2012  3.9702    0.8659  0.1341  6.4571
6.000          0.5877  0.4123  1.4254    0.7036  0.2964  2.3738    0.7761  0.2239  3.4663
10.000         0.4279  0.5721  0.7480    0.5080  0.4920  1.0325    0.4584  0.5416  0.8464
15.000         0.2231  0.7769  0.2872    0.1889  0.8111  0.2389    0.0760  0.9240  0.0823
20.000         0.0972  0.9028  0.1077    0.0582  0.9418  0.0618    0.0179  0.9821  0.0182

[1] For Harrison-Zel'dovich scale-invariant primordial spectrum ($n = 1$), and $s = V/U = 1$.
[2] Present units
while if $R_G = 2$ Mpc, these values shift to
\[
\frac{\rho_V}{\bar\rho_0} \cong
\begin{cases} 2.7, \\ 4.8, \\ 8.0, \end{cases}
\qquad
\lambda_0 \cong
\begin{cases} 0.73, & h = 0.5, \\ 0.83, & h = 0.7, \\ 0.89, & h = 1. \end{cases}
\tag{69}
\]

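If $R_G = 3$ Mpc, then our subuniverse has the median observed value of $\rho_V$ if
\[
\frac{\rho_V}{\bar\rho_0} \cong
\begin{cases} 2.3, \\ 4.0, \\ 6.5, \end{cases}
\qquad
\lambda_0 \cong
\begin{cases} 0.69, & h = 0.5, \\ 0.80, & h = 0.7, \\ 0.87, & h = 1. \end{cases}
\tag{70}
\]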
If, on the other hand, $R_G = 0.01$ Mpc, then this happens if
\[
\frac{\rho_V}{\bar\rho_0} \cong
\begin{cases} 6.7, \\ 12.1, \\ 21.3, \end{cases}
\qquad
\lambda_0 \cong
\begin{cases} 0.87, & h = 0.5, \\ 0.92, & h = 0.7, \\ 0.96, & h = 1. \end{cases}
\tag{71}
\]
Here again, the effect of a tilt to $n < 1$ is to decrease the values of $\rho_V$ and of $\lambda_0$
corresponding to the median $\rho_V$ somewhat. For example, for $R_G = 1$ Mpc and $h = 0.7$, our
own subuniverse has the median value of $\rho_V$ (assuming $s = 1$) for
\[
\lambda_0 \cong
\begin{cases} 0.80, & n = 0.8, \\ 0.83, & n = 0.9, \\ 0.86, & n = 1. \end{cases}
\tag{72}
\]
For $R_G = 2$ Mpc and $h = 0.7$, this becomes
\[
\lambda_0 \cong
\begin{cases} 0.75, & n = 0.8, \\ 0.79, & n = 0.9, \\ 0.83, & n = 1. \end{cases}
\tag{73}
\]

5.3. Observational Constraints on the Cosmological Constant in Our Own Subuniverse
Ostriker & Steinhardt (1995) have argued that the apparent discrepancies that had
previously been identified between observations and the predictions of the standard CDM
model (i.e. cold dark matter in a flat universe with zero cosmological constant and the
scale-invariant Harrison-Zel'dovich primordial power spectrum) can be reconciled if the standard
CDM model is modified to admit a nonzero cosmological constant roughly in the range
$\lambda_0 = 0.65 \pm 0.1$. We have reproduced the observational and theoretical constraints that led
Ostriker & Steinhardt (1995) to this conclusion, plotted here in Figure 10 as a series of upper
and lower bounds in the $(\lambda_0, h)$-plane. The constraint curves plotted in Figure 10 are based
on Ostriker & Steinhardt (1995) and references therein, with the following additions.

Fig. 10. (a) Observational constraints on $\lambda_0$ and $h$ for our own subuniverse. Curves labelled
"LSS" and "$\Gamma_0 = 0.2$" or "$\Gamma_0 = 0.3$" bound the region allowed by the constraint
$\Gamma_0 = \Omega_0 h = 0.25 \pm 0.05$ derived by matching the spatial and angular correlation statistics from galaxy
surveys with the theoretical predictions of the large-scale clustering of galaxies in a
COBE-normalized, flat CDM model with primordial power spectrum index $n = 1$. The curves
labelled "$\sigma_8$ X-ray clusters" bound the values of $\lambda_0$ and $h$ which make this CDM model
satisfy the constraint on the present space density of X-ray clusters. The curve labelled
"$t_0 = 12$ Gyr" indicates the lower limit which makes the age of the universe at least as large
as current estimates of the minimum age of globular clusters. The curves labelled "$\Omega_0 h^{1/2}$"
indicate the boundaries defined by the X-ray-measured total and baryonic masses of clusters
of galaxies, together with the big bang nucleosynthesis limits on the baryon mean density
and the assumption that the ratio of baryon to total mass inside each cluster equals the ratio
of universal mean values. The curve labelled "gravitational lenses" indicates the upper limit
imposed by counts of quasars lensed by intervening galaxies. The dashed curves labelled
"$(\rho_V)_{1/2}$" are the values for which our own subuniverse has the median value of $\rho_V$ for all
subuniverses, if $R_G = 1$ Mpc, if $n = 1$ (top dashed curve), 0.9 (middle dashed curve), or 0.8
(bottom dashed curve). (b) Same as Fig. 10(a), except $R_G = 2$ Mpc. (c) Same as Fig. 10(a),
except $R_G = 3$ Mpc.

Fig. 10b

Fig. 10c

The data on the large-scale clustering of galaxies from galaxy surveys constrain the
flat, CDM model by requiring that the spatial and angular correlation statistics of the
observed galaxies in our local universe at the present epoch agree with the predictions of
structure formation by gravitational instability in the CDM model. This leads to upper
and lower bounds on the so-called "shape parameter" $\Gamma_0 = \Omega_0 h$, given by $\Gamma_0 = 0.25 \pm 0.05$
(assuming $n = 1$) (see curves labelled "$\Gamma_0$" and "LSS" in Fig. 10). A similar constraint
results from the requirement that the CDM model reproduce the observed space density
and luminosity function of X-ray clusters in the present universe. We plot this constraint
separately from that of the shape parameter $\Gamma_0$ which Ostriker & Steinhardt (1995) plotted.
This X-ray cluster abundance constraint is expressed by Viana & Liddle (1996) as bounds
on the rms density fluctuation $\sigma_8(z = 0)$ for a smoothing radius (in present units) given by
$R_{TH} = 8h^{-1}$ Mpc. Assuming $n = 1$, these bounds are given by
\[
\sigma_8 = \left[0.6\,\Omega_0^{-(0.59 - 0.016\,\Omega_0 + 0.06\,\Omega_0^2)}\right]
^{+32\,(\Omega_0)^{0.26\log_{10}\Omega_0}\,\%}_{-24\,(\Omega_0)^{0.26\log_{10}\Omega_0}\,\%} ,
\tag{74}
\]
where the $\Omega_0$-dependent factor that scales the percentage uncertainties is
\[
(\Omega_0)^{0.26\log_{10}\Omega_0} .
\tag{75}
\]
This amounts to a constraint on $\lambda_0$ and $h$ which is similar to that from the correlation
statistics. (The above-mentioned bounds from the statistics of large-scale structure ("LSS")
in the galaxy distribution also refer to the part of the CDM power spectrum at wavelengths
$\gtrsim 8h^{-1}$ Mpc.)
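The two constraints just described are simple enough to evaluate directly. The sketch below makes two assumptions that are not spelled out in the text: that the universe is flat, so the shape-parameter bound $0.20 \le \Omega_0 h \le 0.30$ translates to $\lambda_0 = 1 - \Omega_0$, and that the percentage uncertainties in equation (74) are applied to the central value of $\sigma_8$. The coefficients are copied from equations (74) and (75) as printed; the function names are illustrative.

```python
import math


def lambda0_range_from_shape(h, gamma0_lo=0.20, gamma0_hi=0.30):
    """(min, max) of lambda_0 allowed by Omega_0 * h in [gamma0_lo, gamma0_hi],
    assuming a flat universe (lambda_0 = 1 - Omega_0)."""
    return 1.0 - gamma0_hi / h, 1.0 - gamma0_lo / h


def sigma8_cluster(omega_0):
    """(central, lower, upper) sigma_8 from eqs. (74)-(75) as printed."""
    if omega_0 <= 0.0:
        raise ValueError("omega_0 must be positive")
    exponent = 0.59 - 0.016 * omega_0 + 0.06 * omega_0 ** 2
    central = 0.6 * omega_0 ** (-exponent)
    # Omega_0-dependent factor scaling the percentage errors, eq. (75).
    scale = omega_0 ** (0.26 * math.log10(omega_0))
    return central, central * (1.0 - 0.24 * scale), central * (1.0 + 0.32 * scale)


for h in (0.5, 0.7, 1.0):
    lo, hi = lambda0_range_from_shape(h)
    print(f"h = {h}: {lo:.3f} <= lambda_0 <= {hi:.3f}")

for omega_0 in (1.0, 0.3, 0.15):
    central, lower, upper = sigma8_cluster(omega_0)
    print(f"Omega_0 = {omega_0}: sigma_8 = {central:.3f} ({lower:.3f} to {upper:.3f})")
```

The first loop shows why the "LSS" band in Figure 10 moves to larger $\lambda_0$ as $h$ increases: at fixed $\Omega_0 h$, a larger $h$ requires a smaller $\Omega_0$.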
Estimates of the total masses and baryonic mass fractions of individual clusters of
galaxies, derived by fitting the X-ray surface brightness profiles of each cluster and assuming the
cluster intergalactic medium is an isothermal sphere in hydrostatic equilibrium with a
virialized cluster gravitational potential, yield another pair of bounds. If the assumption is further
made that the ratio of baryonic mass to total mass of each cluster is equal to the universal
mean ratio, $\Omega_{B0}/\Omega_0$, then a comparison of this X-ray-estimated ratio with the constraints
on $\Omega_{B0}$ from standard big bang nucleosynthesis and the observed light element abundances
(i.e. $0.01 < \Omega_{B0}h^2 < 0.02$; Copi, Schramm, & Turner 1995) implies a constraint on the
total density parameter given by $0.09 < \Omega_0 h^{1/2} < 0.33$. The curves labelled "$\Omega_0 h^{1/2}$" and
"X-ray cluster masses + big bang nucleosynthesis" in Figure 10 indicate the bounds on $\lambda_0$
and $h$ which result from this argument. Some recent numerical gas dynamical simulations
of cluster formation in the flat, matter-dominated CDM model suggest that the upper bound
on $\Omega_0 h^{1/2}$ which results from the high values estimated for cluster baryonic mass fraction
by the equilibrium model described above may be too low (e.g. Bartelmann & Steinmetz
1996; Martel, Shapiro, & Valinia 1996; Valinia 1996). The simulated clusters, when properly
resolved, are often found to be composed of subclusters in the act of merging and, together
with projection effects, this can cause an observer who uses the assumption of isothermal
spheres in hydrostatic equilibrium to underestimate the total mass and overestimate the
baryon fraction.
The estimated minimum age of globular clusters derived by comparison of theoretical
models of stellar evolution with observed globular cluster H-R diagrams is about
$12 \times 10^9$ years. This leads to a lower limit to $\lambda_0$ for each $h$ based on the requirement that the
age of our universe exceeds this estimate of the minimum age of globular cluster stars (see
curve in Fig. 10 labelled "$t_0 = 12$ Gyr").
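The age constraint can be sketched numerically. The code below is not taken from the paper; it uses the standard closed-form age of a flat, matter-plus-cosmological-constant universe, $t_0 = \frac{2}{3} H_0^{-1} \lambda_0^{-1/2} \sinh^{-1}\sqrt{\lambda_0/(1-\lambda_0)}$, neglecting radiation, with $1/H_0 \approx 9.78\,h^{-1}$ Gyr. The function names and the bisection solver are illustrative choices.

```python
import math

# Hubble time 1/H_0 in Gyr for h = 1 (H_0 = 100 km/s/Mpc).
HUBBLE_TIME_GYR = 9.7779


def age_gyr(lambda_0, h):
    """Age of a flat universe with Omega_0 = 1 - lambda_0, in Gyr."""
    t_hubble = HUBBLE_TIME_GYR / h
    if lambda_0 == 0.0:
        return 2.0 * t_hubble / 3.0  # Einstein-de Sitter limit
    x = math.sqrt(lambda_0 / (1.0 - lambda_0))
    return 2.0 * t_hubble * math.asinh(x) / (3.0 * math.sqrt(lambda_0))


def min_lambda0_for_age(h, t_min_gyr=12.0):
    """Smallest lambda_0 in [0, 1) for which the age is at least t_min_gyr."""
    if age_gyr(0.0, h) >= t_min_gyr:
        return 0.0
    lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(60):  # bisection; the age increases monotonically with lambda_0
        mid = 0.5 * (lo + hi)
        if age_gyr(mid, h) >= t_min_gyr:
            hi = mid
        else:
            lo = mid
    return hi


for h in (0.5, 0.7, 1.0):
    print(f"h = {h}: lambda_0 >= {min_lambda0_for_age(h):.3f} for t_0 >= 12 Gyr")
```

For small $h$ the age requirement is satisfied even with $\lambda_0 = 0$, while for larger $h$ it forces a substantial cosmological constant, which is the qualitative shape of a lower-limit curve in the $(\lambda_0, h)$-plane.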
An upper bound to $\lambda_0$ results from the comparison of the statistics of quasars observed
to be gravitationally lensed by intervening galaxies with the predictions of flat cosmological
models with a nonzero cosmological constant. A flat cosmology with cosmological constant
tends to produce more gravitationally lensed quasars than does such a cosmology with zero
cosmological constant. The resulting limit was quoted by Ostriker & Steinhardt as $\lambda_0 < 0.75$.
More recently, Kochanek (1996) has argued for a somewhat tighter limit. However, limited
observational data and the possibility that evolution effects on the population of lensing
galaxies have not been properly taken into account suggest that this limit is still uncertain.
We plot the upper bound quoted by Ostriker & Steinhardt (1995) in Figure 10, but caution
that this limit is still in flux and that the future change can be up or down.
Finally, an independent constraint was recently derived by Perlmutter et al. (1996),
using observations of Type Ia supernovae to infer the relationship between redshift and
distance. Their result is consistent with the limit $\lambda_0 < 0.5$. However, this interesting
approach is too preliminary to be considered reliable at this time. We have not included it
in Figure 10.

5.4. How Likely is the Value of the Cosmological Constant Observed in Our Subuniverse?
Along with the observational constraints discussed in the previous subsection, we have
plotted in Figure 10 the values of the cosmological constant $\lambda_0$ such that our own universe
has the median value observed for all subuniverses for different values of $h$ and for $n = 1$,
0.9, and 0.8, for $R_G = 1$, 2, and 3 Mpc and $s = 1$. It is apparent from Figure 10 that if
the universe is flat and the inflationary CDM model applies, then the range of values of the
cosmological constant allowed for our own subuniverse is somewhat below the median, but
not far from it.
This statement is sensitive to the choice of the smoothing scale $R_G$. As discussed in
Section 4.1, this scale is intended to reflect the minimum size of fluctuations which are
responsible for forming observers in any subuniverse. The value $R_G = 1$ Mpc was chosen
to correspond to fluctuations which encompass roughly the mass of the bright inner part of
an $L_*$-galaxy. If, instead, observers are formed if even a much smaller, globular-cluster-size
object collapses out of the background, then the global median value of $\rho_V$ would be larger
since the variance of the density fluctuations at recombination filtered on this smaller scale is
larger. In that case, the median $(\rho_V)_{1/2}$ is on the high side, somewhat further from the range
in the $(\lambda_0, h)$-plane allowed by observations of our own universe. Conversely, if observers
are formed only if fluctuations larger than 1 Mpc collapse out, then the global median $(\rho_V)_{1/2}$
will be smaller [since $\sigma(z_{rec})$ is smaller if $R_G > 1$ Mpc than if $R_G = 1$ Mpc] and the match
between the allowed range in the $(\lambda_0, h)$-plane and $(\rho_V)_{1/2}$ improves. We illustrate this
by plotting the median value curves for $R_G = 2$ and 3 Mpc along with the observational
constraints, in Figures 10(b) and (c).
These results also depend weakly on the value adopted for $\Omega_{B0}$. A small increase of
$\Omega_{B0}$ has the effect of decreasing $\Gamma$ and, with it, the value of $\sigma(z_{rec})$. This, too, would
drive the global median $(\rho_V)_{1/2}$ downward for given values of $R_G$ and $h$. We chose
$\Omega_{B0} = 0.015h^{-2}$ here, but the big bang nucleosynthesis limits allow a value $\Omega_{B0} = 0.02h^{-2}$,
and uncertainties exist even in that upper limit.
Even though $\rho_V$ in our subuniverse seems to be below the median, the allowed range in
the $(\lambda_0, h)$-plane in Figure 10 includes values that are reasonably likely. To see this, it is
useful to recast the curves of Figure 10 as curves in the plane of the ratio $\rho_V/\langle\rho_V\rangle$ versus $h$.
For each point $(\lambda_0, h)$ on a curve in Figure 10 we can use the results of Sections 3 and 4
to compute $\langle\rho_V\rangle$, and hence the ratio $\rho_V/\langle\rho_V\rangle$. We plot these constraint curves
in Figure 11. Inspection of this figure shows that the observationally
allowed values of $\rho_V/\langle\rho_V\rangle$ are between 0.01 and 0.04 for $R_G = 1$ Mpc, between 0.015 and 0.1 for
$R_G = 2$ Mpc, and between 0.03 and 0.15 for $R_G = 3$ Mpc. For all of these values of $R_G$,
there is an appreciable overlap between these observationally allowed values and the range
from 0.02 to 3.7 which was found (for $s = 1$) in equations (33) and (34) to be anthropically
likely.
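The overlap claimed above can be made explicit with a short interval comparison. The numbers are those quoted in the preceding paragraph for the ratio of the local vacuum energy density to its global mean; the dictionary and function names are illustrative only.

```python
# Anthropically likely range of rho_V / <rho_V> quoted for s = 1.
ANTHROPIC_RANGE = (0.02, 3.7)

# Observationally allowed ranges of rho_V / <rho_V>, keyed by R_G in Mpc.
ALLOWED_RANGES = {
    1.0: (0.01, 0.04),
    2.0: (0.015, 0.1),
    3.0: (0.03, 0.15),
}


def overlap(a, b):
    """Return the intersection of two closed intervals, or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


for r_g, allowed in ALLOWED_RANGES.items():
    print(f"R_G = {r_g} Mpc: overlap with the likely range = "
          f"{overlap(allowed, ANTHROPIC_RANGE)}")
```

For $R_G = 1$ Mpc only the upper half of the allowed interval lies inside the likely range, while for $R_G = 3$ Mpc the whole allowed interval does, which matches the statement that the agreement improves as $R_G$ increases.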

Fig. 11. (a) Observational constraints of Fig. 10 are plotted instead as curves in the
plane of $\rho_V/\langle\rho_V\rangle$ versus $h$, where $\langle\rho_V\rangle$ is evaluated by assuming
COBE-normalized density fluctuations for the flat CDM model with $n = 1$, as inferred by adopting
the value $\rho_V = \rho_V$, assuming $R_G = 1$ Mpc and $s = 1$. Curves are labelled just as in Figure
10. (b) Same as Fig. 11(a), except $R_G = 2$ Mpc. (c) Same as Fig. 11(a), except $R_G = 3$ Mpc.

Fig. 11b

Fig. 11c


6. SUMMARY AND CONCLUSION


The range of values of the cosmological constant that allow life to arise is so narrow
that, within that range, we can assume that the a priori probability distribution of values of
the cosmological constant is constant. The probability that a particular value of the
cosmological constant is observed in our universe is then proportional to the number of observers
who might measure that value. That abundance is, in turn, proportional to the fraction of
matter which eventually collapses out of the background into gravitationally bound
concentrations large enough to initiate star formation and retain heavy elements, presumed to be
prerequisites for the origin of planets and intelligent life. We have derived an analytical
estimate for this "collapsed fraction" based upon a simple, pressure-free, spherically symmetric,
nonlinear model for the growth of density fluctuations in a flat universe with arbitrary value
of the vacuum energy density $\rho_V$, applied in a statistical way to a distribution of
cosmological density fluctuations. We have evaluated the resulting probability distribution for the
observed values of $\rho_V$ for density fluctuations which are Gaussian random and of linear
amplitude at recombination. We find that the probability distribution in that case is a unique
function of $\rho_V/\langle\rho_V\rangle$, where $\langle\rho_V\rangle$ is the average of the observed values of $\rho_V$ over
all subuniverses, and a shape parameter $s$ which characterizes the variation of density with
radius within a given density fluctuation. The dependence on the parameter $s$ is, moreover,
found to be very weak. The values of $\rho_V$ (such as the median value) at which the integrated
probability distribution takes any definite values are simply proportional to the mean value
$\langle\rho_V\rangle$ for a given $s$, with proportionality constants that are fairly insensitive to the value
of $s$, and for a given value of $s$, these values of $\rho_V$ are, in turn, exactly proportional to the
quantity $\sigma^3\bar\rho$ evaluated at recombination, where $\sigma^2$ is the variance of the density fluctuations
and $\bar\rho$ is the cosmic mean matter density. The dependence of this proportionality constant
on the value of $s$ is, once again, quite weak. The quantity $\sigma^3\bar\rho$ is assumed to be common to
all subuniverses in which life can arise, because in these subuniverses $\rho_V$ is negligible at and
before recombination.
Presumably it will some day be possible to calculate $\sigma^3\bar\rho$ unambiguously by using
astronomical observations to measure the density fluctuations in our own universe. While
this is not yet possible with great certainty, current measurements of anisotropy at large
angles in the cosmic microwave background by the COBE DMR experiment represent
substantial progress toward this goal. In particular, the detected cosmic microwave background
anisotropy fixes the fluctuation amplitude at long wavelengths and is consistent with
Gaussian random density fluctuations in a flat universe, with the scale-invariant primordial power
spectrum, $P(k) \propto k^n$ with $n \approx 1$, as expected from inflationary cosmology. Unfortunately,
the precise interpretation of these anisotropy measurements is, itself, dependent upon our
uncertain knowledge of the values of $\lambda_0$ and $h$ for our observed universe. In addition, the
variance $\sigma^2$ which is relevant to our anthropic probability calculation is that for the density
fluctuations after the density has been smoothed over some length scale $R_G$ so as to eliminate
fluctuations which are too small to contribute to the formation of astronomers, and $R_G$ is
orders of magnitude smaller than the long wavelengths at which the fluctuations are
constrained directly by the COBE DMR anisotropy measurements. Unfortunately, therefore,
neither the value of $R_G$ nor the amplitude of the density fluctuations at wavelengths close to
$R_G$ is precisely determined at this time. In the meantime, in order to specify $\sigma^3\bar\rho$, we have
here adopted the CDM model for the growth of a scale-invariant primordial spectrum of
Gaussian random density fluctuations in a flat universe with nonzero cosmological constant,
with an amplitude set by the COBE DMR data, and parameterized our results in terms of
the unknown filter scale $R_G$ and power spectrum index $n$.
Our results are encouraging from the point of view of the anthropic hypothesis. Although
the range of observationally favored values of $\rho_V$ is somewhat less than the median value of
$\rho_V$ for all subuniverses, it has a significant overlap with the range of values between the 5th
and 95th percentile for all subuniverses.
In short, these results explain why there might be a nonzero, but small value of $\rho_V$ in
our universe, if all values of $\rho_V$ are otherwise equally likely to occur. They show that a range
of values close to those favored by current observations are reasonably probable, while values
which are orders of magnitude smaller or larger are extremely improbable for us to observe.
This is the essential ingredient in the anthropic explanation for the observed value of $\lambda_0$ in
our own universe.
With a continued improvement in measurements of $\rho_V$ and the spectrum of fluctuations
at recombination, it may turn out that the actual value of $\rho_V$ in our subuniverse is more
than two orders of magnitude less than the average. In this case we would have to conclude
that the anthropic arguments used here do not explain the smallness of the cosmological
constant. Unfortunately, the converse is not possible; observation of a value of $\rho_V$ that is
anthropically likely would support the idea that there is a diversity of possible $\rho_V$ values,
but since we only observe one subuniverse, astronomical observation alone cannot confirm
this idea. Ultimately this issue will have to be settled by advances in fundamental physics,
which we hope will tell us whether in fact it is correct that there are many subuniverses
with different values of the cosmological constant. If this is not correct, then there is no
justification for the anthropic reasoning used here, while if it is correct, then these anthropic
arguments are just common sense.

This research was supported in part by NASA Grant NAG5-2785, NSF Grants ASC
9504046, PHY 9009850, and PHY 9511632, and the Robert A. Welch Foundation.

REFERENCES
Bardeen, J. M., Bond, J. R., Kaiser, N., & Szalay, A. S. 1986, ApJ, 304, 15
Barrow, J. D., & Tipler, F. J. 1986, The Anthropic Cosmological Principle (New York:
Oxford University Press)
Bartelmann, M., & Steinmetz, M. 1996, preprint astro-ph/9603101
Baum, E. 1984, Phys. Lett. B, 133, 185.
Bennett, C. L. et al 1996, ApJ, 464, L1
Bunn, E. F., & White, M. 1996, preprint (astro-ph/9607060)
Coleman, S. 1988a, Nucl. Phys. B, 307, 867
Coleman, S. 1988b, Nucl. Phys. B, 310, 643
Copi, C., Schramm, D. N., & Turner, M. S. 1995, Science, 267, 192
Efstathiou, G. 1995, M.N.R.A.S., 274, L73
Fischler, W., Klebanov, I., Polchinski, J., & Susskind, L. 1989, Nucl. Phys. B, 237, 157
Gunn, J. E., & Gott, J. R. 1972, ApJ, 176, 1
Hawking, S. W. 1983, in Shelter Island II - Proceedings of the 1983 Shelter Island Conference
On Quantum Field Theory and the Fundamental Problem of Physics, ed. Jackiw, R.,
et al. (MIT Press: Cambridge, 1995)
Hawking, S. W. 1984, Phys. Lett. B, 175, 395
Kochanek, C. S. 1996, ApJ, 466, 638
Linde, A. D. 1986, Phys. Lett. B, 175, 395
Linde, A. D. 1987, Phys. Scri., T15, 169
Linde, A. D. 1988, Phys. Lett. B, 202, 194
Liddle, A. R., Lyth, D. H., Viana, P. T. P., & White, M. 1996, M.N.R.A.S., 282, 281
Martel, H. 1991, ApJ, 377, 7
Martel, H., Shapiro, P. R., & Valinia, A. 1996, in preparation
Ostriker, J. P., & Steinhardt, P. J. 1995, Nature, 377, 600
Peebles, P. J. E. 1967, ApJ, 147, 859
Peebles, P. J. E. 1980, The Large-Scale Structure of the Universe (Princeton: Princeton
University Press)
Peebles, P. J. E. 1993, Principles of Physical Cosmology (Princeton: Princeton University
Press)
Perlmutter, S. et al. 1996, preprint astro-ph/9602122
Shapiro, P. R., Giroux, M. L., & Babul, A. 1994, ApJ, 427, 25
Sugiyama, N. 1995, ApJS, 100, 281
Valinia, A. 1996, Ph.D. Dissertation, Dept. of Physics, The University of Texas at Austin.
Viana, P. T. P., & Liddle, A. R. 1996, M.N.R.A.S., 281, 323
Vilenkin, A. 1995a, Phys. Rev. Lett., 74, 846
Vilenkin, A. 1995b, Tufts preprint gr-qc/9507018, to be published in the Proceedings of the
1995 International School of Astrophysics at Erice.
Vilenkin, A. 1995c, Phys.Rev.D, 52, 3365
Vilenkin, A. 1995d, Tufts preprint gr-qc/9512031
Weinberg, S. 1987, Phys. Rev. Lett., 59, 2067
Weinberg, S. 1989, Rev. Mod. Phys., 61, 1
Weinberg, S. 1996, astro-ph/9610044, to be published in the proceedings of the conference
Critical Dialogues in Cosmology at Princeton University, June 24-28, 1996.

This preprint was prepared with the AAS LATEX macros v4.0.
UTTG-05-97

What is Quantum Field Theory, and What Did We Think It Is?∗

Steven Weinberg∗∗
arXiv:hep-th/9702027v1 4 Feb 1997

Physics Department, University of Texas at Austin
Austin, TX 78712

Quantum field theory was originally thought to be simply the quantum
theory of fields. That is, when quantum mechanics was developed physicists
already knew about various classical fields, notably the electromagnetic field,
so what else would they do but quantize the electromagnetic field in the same
way that they quantized the theory of single particles? In 1926, in one of
the very first papers on quantum mechanics,1 Born, Heisenberg and Jordan
presented the quantum theory of the electromagnetic field. For simplicity
they left out the polarization of the photon, and took spacetime to have
one space and one time dimension, but that didn’t affect the main results.
(Response to comment from audience: Yes, they were really doing string
theory, so in this sense string theory is earlier than quantum field theory.)
Born et al. gave a formula for the electromagnetic field as a Fourier transform
and used the canonical commutation relations to identify the coefficients in
this Fourier transform as operators that destroy and create photons, so that
when quantized this field theory became a theory of photons. Photons, of
course, had been around (though not under that name) since Einstein’s work
on the photoelectric effect two decades earlier, but this paper showed that
photons are an inevitable consequence of quantum mechanics as applied to
electromagnetism.
The quantum theory of particles like electrons was being developed at
the same time, and made relativistic by Dirac2 in 1928–1930. For quite a
long time many physicists thought that the world consisted of both fields
and particles: the electron is a particle, described by a relativistically
invariant version of the Schrödinger wave equation, and the electromagnetic
field is a field, even though it also behaves like particles. Dirac I think never
really changed his mind about this, and I believe that this was Feynman’s
understanding when he first developed the path integral and worked out his
rules for calculating in quantum electrodynamics. When I first learned about
the path-integral formalism, it was in terms of electron trajectories (as it is
also presented in the book by Feynman and Hibbs3). I already thought that
wasn’t the best way to look at electrons, so this gave me a distaste for the
path integral formalism, which although unreasonable lasted until I learned
of ’t Hooft’s work4 in 1971. I feel it’s all right to mention autobiographical
details like that as long as the story shows how the speaker was wrong.

∗ Talk presented at the conference “Historical and Philosophical Reflections on the
Foundations of Quantum Field Theory,” at Boston University, March 1996. It will be
published in the proceedings of this conference.
∗∗ Research supported in part by the Robert A. Welch Foundation and NSF Grant PHY
9511632. E-mail address: weinberg@physics.utexas.edu

In fact, it was quite soon after the Born–Heisenberg–Jordan paper of
1926 that the idea came along that in fact one could use quantum field
theory for everything, not just for electromagnetism. This was the work
of many theorists during the period 1928–1934, including Jordan, Wigner,
Heisenberg, Pauli, Weisskopf, Furry, and Oppenheimer. Although this is
often talked about as second quantization, I would like to urge that this
description should be banned from physics, because a quantum field is not
a quantized wave function. Certainly the Maxwell field is not the wave
function of the photon, and for reasons that Dirac himself pointed out, the
Klein–Gordon fields that we use for pions and Higgs bosons could not be
the wave functions of the bosons. In its mature form, the idea of quantum
field theory is that quantum fields are the basic ingredients of the universe,
and particles are just bundles of energy and momentum of the fields. In
a relativistic theory the wave function is a functional of these fields, not a
function of particle coordinates. Quantum field theory hence led to a more
unified view of nature than the old dualistic interpretation in terms of both
fields and particles.
There is an irony in this. (I’ll point out several ironies as I go along — this
whole subject is filled with delicious ironies.) It is that although the battle
is over, and the old dualism that treated photons in an entirely different way
from electrons is I think safely dead and will never return, some calculations
are actually easier in the old particle framework. When Euler, Heisenberg and
Kockel5 in the mid-thirties calculated the effective action (often called the
Euler–Heisenberg action) of a constant external electromagnetic field, they
calculated to all orders in the field, although their result is usually presented
only to fourth order. This calculation would probably have been impossible
with the old fashioned perturbation theory techniques of the time, if they
had not done it by first solving the Dirac equation in a constant external
electromagnetic field and using those Dirac wave functions to figure out the
effective action. These techniques of using particle trajectories rather than
field histories in calculations have been revived in recent years. Under the
stimulus of string theory, Bern and Kosower,6 in particular, have developed
a useful formalism for doing calculations by following particle world lines
rather than by thinking of fields evolving in time. Although this approach
was stimulated by string theory, it has been reformulated entirely within the
scope of ordinary quantum field theory, and simply represents a more efficient
way of doing certain calculations.
One of the key elements in the triumph of quantum field theory was the
development of renormalization theory. I’m sure this has been discussed often
here, and so I won’t dwell on it. The version of renormalization theory that
had been developed in the late 1940s remained somewhat in the shade for a
long time for two reasons: (1) for the weak interactions it did not seem pos-
sible to develop a renormalizable theory, and (2) for the strong interactions
it was easy to write down renormalizable theories, but since perturbation
theory was inapplicable it did not seem that there was anything that could
be done with these theories. Finally all these problems were resolved through
the development of the standard model, which was triumphantly verified by
experiments during the mid-1970s, and today the weak, electromagnetic and
strong interactions are happily all described by a renormalizable quantum
field theory. If you had asked me in the mid-1970s about the shape of future
fundamental physical theories, I would have guessed that they would take the
form of better, more all-embracing, less arbitrary, renormalizable quantum
field theories. I gave a talk at the Harvard Science Center at around this
time, called “The Renaissance of Quantum Field Theory,” which shows you
the mood I was in.
There were two things that especially attracted me to the ideas of renor-
malization and quantum field theory. One of them was that the requirement
that a physical theory be renormalizable is a precise and rational criterion of
simplicity. In a sense, this requirement had been used long before the advent
of renormalization theory. When Dirac wrote down the Dirac equation in
1928 he could have added an extra ‘Pauli’ term7 which would have given
the electron an arbitrary anomalous magnetic moment. Dirac could (and
perhaps did) say ‘I won’t add this term because it’s ugly and complicated
and there’s no need for it.’ I think that in physics this approach generally
makes good strategies but bad rationales. It’s often a good strategy to study
simple theories before you study complicated theories because it’s easier to
see how they work, but the purpose of physics is to find out why nature is
the way it is, and simplicity by itself is I think never the answer. But renor-
malizability was a condition of simplicity which was being imposed for what
seemed after Dyson’s 1949 papers8 like a rational reason, and it explained not
only why the electron has the magnetic moment it has, but also (together
with gauge symmetries) all the detailed features of the standard model of
weak, electromagnetic, and strong interactions, aside from some numerical
parameters.
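
The sense in which renormalizability excludes the Pauli term can be made explicit by counting mass dimensions. What follows is only a sketch, assuming units with $\hbar = c = 1$ in four spacetime dimensions; the dimensionless coefficient $\kappa$ and the mass scale $M$ are labels introduced here for illustration, and normalization conventions differ between textbooks.

```latex
% Mass dimensions in four spacetime dimensions (\hbar = c = 1):
[\mathcal{L}] = 4, \qquad [\psi] = \tfrac{3}{2}, \qquad [F_{\mu\nu}] = 2 .

% Pauli term with an assumed coefficient \kappa / M:
\mathcal{L}_{\rm Pauli} \;\propto\; \frac{\kappa}{M}\,
  \bar\psi\, \sigma^{\mu\nu} \psi\, F_{\mu\nu},
\qquad
[\bar\psi\, \sigma^{\mu\nu} \psi\, F_{\mu\nu}] = \tfrac{3}{2} + \tfrac{3}{2} + 2 = 5 .

% The operator has dimension 5 > 4, so its coefficient has dimension -1:
% it is non-renormalizable in the Dyson power-counting sense.
```

By contrast, the terms of the Dirac Lagrangian coupled to the electromagnetic field all have dimension four or less, which is why the power-counting criterion singles them out.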
The other thing I liked about quantum field theory during this period
of tremendous optimism was that it offered a clear answer to the ancient
question of what we mean by an elementary particle: it is simply a particle
whose field appears in the Lagrangian. It doesn’t matter if it’s stable, unsta-
ble, heavy, light — if its field appears in the Lagrangian then it’s elementary,
otherwise it’s composite.∗∗∗
Now my point of view has changed. It has changed partly because of my
experience in teaching quantum field theory. When you teach any branch
of physics you must motivate the formalism — it isn’t any good just to
present the formalism and say that it agrees with experiment — you have
to explain to the students why this is the way the world is. After all, this is
our aim in physics, not just to describe nature, but to explain nature. In the
course of teaching quantum field theory, I developed a rationale for it, which
very briefly is that it is the only way of satisfying the principles of Lorentz
invariance plus quantum mechanics plus one other principle.
Let me run through this argument very rapidly. The first point is to start
with Wigner’s definition of physical multi-particle states as representations
of the inhomogeneous Lorentz group.9 You then define annihilation and creation
operators $a(\vec p, \sigma, n)$ and $a^\dagger(\vec p, \sigma, n)$ that act on these states (where $\vec p$ is
the three-momentum, $\sigma$ is the spin z-component, and $n$ is a species label).
There’s no physics in introducing such operators, for it is easy to see that
∗∗∗
We should not really give quantum field theory too much credit for clarifying the dis-
tinction between elementary and composite particles, because some quantum field theories
exhibit the phenomenon of bosonization: At least in two dimensions there are theories of
elementary scalars that are equivalent to theories with elementary fermions.

any operator whatever can be expressed as a functional of them. The exis-
tence of a Hamiltonian follows from time-translation invariance, and much
of physics is described by the S-matrix, which is given by the well known
Feynman–Dyson series of integrals over time of time-ordered products of the
interaction Hamiltonian $H_I(t)$ in the interaction picture:

$$
S = \sum_{n=0}^{\infty} \frac{(-i)^n}{n!} \int_{-\infty}^{\infty} dt_1 \int_{-\infty}^{\infty} dt_2 \cdots \int_{-\infty}^{\infty} dt_n \; T\{ H_I(t_1) H_I(t_2) \cdots H_I(t_n) \} . \tag{1}
$$

This should all be familiar. The other principle that has to be added is the
cluster decomposition principle, which requires that distant experiments give
uncorrelated results.10 In order to have cluster decomposition, the Hamilto-
nian is written not just as any functional of creation and annihilation oper-
ators, but as a power series in these operators with coefficients that (aside
from a single momentum-conservation delta function) are sufficiently smooth
functions of the momenta carried by the operators. This condition is satisfied
for an interaction Hamiltonian of the form
$$
H_I(t) = \int d^3x \, \mathcal{H}(\vec x, t) \tag{2}
$$

where $\mathcal{H}(x)$ is a power series (usually a polynomial) with terms that are
local in annihilation fields, which are Fourier transforms of the annihilation
operators:

$$
\psi_\ell^{(+)}(x) = \sum_{\sigma, n} \int d^3p \; e^{ip\cdot x} \, u_\ell(\vec p, \sigma, n) \, a(\vec p, \sigma, n) \tag{3}
$$

together of course with their adjoints, the creation fields.
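
To make the structure of the series (1) concrete, its first few terms can be written out. This is a standard expansion included only as a reminder; the step function $\theta$ is notation introduced here.

```latex
S = 1 \;-\; i \int_{-\infty}^{\infty} dt_1 \, H_I(t_1)
  \;+\; \frac{(-i)^2}{2!} \int_{-\infty}^{\infty} dt_1 \int_{-\infty}^{\infty} dt_2 \,
        T\{ H_I(t_1) H_I(t_2) \} \;+\; \cdots

% Time-ordered product of two factors:
T\{ H_I(t_1) H_I(t_2) \}
  = \theta(t_1 - t_2)\, H_I(t_1) H_I(t_2) + \theta(t_2 - t_1)\, H_I(t_2) H_I(t_1) .

% The two orderings contribute equally, which cancels the 1/2!:
\frac{(-i)^2}{2!} \int dt_1 \int dt_2 \, T\{ H_I(t_1) H_I(t_2) \}
  = (-i)^2 \int_{-\infty}^{\infty} dt_1 \int_{-\infty}^{t_1} dt_2 \, H_I(t_1) H_I(t_2) .
```

The same counting at order $n$ turns the factor $1/n!$ into a single integral over the ordered region $t_1 > t_2 > \cdots > t_n$.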


So far this all applies to nonrelativistic as well as relativistic theories.†
Now if you also want Lorentz invariance, then you have to face the fact that
the time-ordering in the Feynman–Dyson series (1) for the S-matrix doesn’t
look very Lorentz invariant. The obvious way to make the S-matrix Lorentz
invariant is to take the interaction Hamiltonian density $\mathcal{H}(x)$ to be a scalar,

† By the way, the reason that quantum field theory is useful even in nonrelativistic
statistical mechanics, where there is often a selection rule that makes the actual creation
or annihilation of particles impossible, is that in statistical mechanics you have to impose
a cluster decomposition principle, and quantum field theory is the natural way to do so.

and also to require that these Hamiltonian densities commute at spacelike
separations
$$
[\mathcal{H}(x), \mathcal{H}(y)] = 0 \quad \text{for spacelike } x - y , \tag{4}
$$
in order to exploit the fact that time ordering is Lorentz invariant when
the separation between spacetime points is timelike. In order to satisfy the
requirement that the Hamiltonian density commute with itself at spacelike
separations, it is constructed out of fields which satisfy the same requirement.
These are given by sums of fields that annihilate particles plus fields that
create the corresponding antiparticles

$$
\psi_\ell(x) = \sum_{\sigma, n} \int d^3p \, \Big[ e^{ip\cdot x} \, u_\ell(\vec p, \sigma, n) \, a(\vec p, \sigma, n) + e^{-ip\cdot x} \, v_\ell(\vec p, \sigma, n) \, a^\dagger(\vec p, \sigma, \bar n) \Big] , \tag{5}
$$

where $\bar n$ denotes the antiparticle of the particle of species $n$. For a field
$\psi_\ell$ that transforms according to an irreducible representation of the homogeneous
Lorentz group, the form of the coefficients $u_\ell$ and $v_\ell$ is completely
determined (up to a single over-all constant factor) by the Lorentz transfor-
mation properties of the fields and one-particle states, and by the condition
that the fields commute at spacelike separations. Thus the whole formalism
of fields, particles, and antiparticles seems to be an inevitable consequence of
Lorentz invariance, quantum mechanics, and cluster decomposition, without
any ancillary assumptions about locality or causality.
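
The role of the commutation condition (4) can be seen in the simplest case of two Hamiltonian densities. The lines below are a sketch of the argument at this level only, with $\theta$ the step function (notation introduced here); they ignore possible subtleties at coincident points.

```latex
% Two-factor time-ordered product appearing in the n = 2 term of (1):
T\{\mathcal{H}(x)\mathcal{H}(y)\}
  = \theta(x^0 - y^0)\,\mathcal{H}(x)\mathcal{H}(y)
  + \theta(y^0 - x^0)\,\mathcal{H}(y)\mathcal{H}(x) .

% Timelike or lightlike x - y: the sign of x^0 - y^0 is the same in all frames
% related by proper orthochronous Lorentz transformations, so the ordering is invariant.

% Spacelike x - y: the sign of x^0 - y^0 depends on the frame, but (4) gives
\mathcal{H}(x)\mathcal{H}(y) = \mathcal{H}(y)\mathcal{H}(x)
\quad\Longrightarrow\quad
T\{\mathcal{H}(x)\mathcal{H}(y)\} = \mathcal{H}(x)\mathcal{H}(y)
\ \text{in every frame.}
```

So with a scalar density satisfying (4), the frame dependence of the time ordering never shows up in the integrand of (1).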
This discussion has been extremely sketchy, and is subject to all sorts of
qualifications. One of them is that for massless particles, the range of possible
theories is slightly larger than I have indicated here. For example, in quantum
electrodynamics, in a physical gauge like Coulomb gauge, the Hamiltonian
is not of the form (2) — there is an additional term, the Coulomb potential,
which is bilocal and serves to cancel a non-covariant term in the propagator.
But relativistically invariant quantum theories always (with some qualifica-
tions I’ll come to later) do turn out to be quantum field theories, more or
less as I have described them here.
One can go further, and ask why we should formulate our quantum field
theories in terms of Lagrangians. Well, of course creation and annihilation
operators by themselves yield pairs of canonically conjugate variables; from
the as and a† s, it is easy to construct qs and ps. The time-dependence of
these operators is dictated in terms of the Hamiltonian, the generator of time
translations, so the Hamiltonian formalism is trivially always with us. But
why the Lagrangian formalism? Why do we enumerate possible theories by
giving their Lagrangians rather than by writing down Hamiltonians? I think
the reason for this is that it is only in the Lagrangian formalism (or more
generally the action formalism) that symmetries imply the existence of Lie
algebras of suitable quantum operators, and you need these Lie algebras to
make sensible quantum theories. In particular, the S-matrix will be Lorentz
invariant if there is a set of 10 sufficiently smooth operators satisfying the
commutation relations of the inhomogeneous Lorentz group. It’s not trivial
to write down a Hamiltonian that will give you a Lorentz invariant S-matrix
— it’s not so easy to think of the Coulomb potential just on the basis of
Lorentz invariance — but if you start with a Lorentz invariant Lagrangian
density then because of Noether’s theorem the Lorentz invariance of the S-
matrix is automatic.
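
The remark that canonically conjugate variables can be built from creation and annihilation operators can be illustrated for a single mode. The normalization $[a, a^\dagger] = 1$ and the particular linear combinations below are assumptions made for this sketch; for the continuum operators $a(\vec p, \sigma, n)$ the same construction goes through with delta functions in place of the 1.

```latex
% Single mode, assuming [a, a^\dagger] = 1:
q = \frac{1}{\sqrt{2}}\,\big(a + a^\dagger\big), \qquad
p = \frac{-i}{\sqrt{2}}\,\big(a - a^\dagger\big) .

% Both are Hermitian, and
[q, p] = \frac{-i}{2}\,\big[a + a^\dagger,\; a - a^\dagger\big]
       = \frac{-i}{2}\,\Big( -[a, a^\dagger] + [a^\dagger, a] \Big)
       = \frac{-i}{2}\,(-2) = i .
```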
Finally, what is the motivation for the special gauge invariant Lagrangians
that we use in the standard model and general relativity? One possible an-
swer is that quantum theories of mass zero, spin one particles violate Lorentz
invariance unless the fields are coupled in a gauge invariant way, while quan-
tum theories of mass zero, spin two particles violate Lorentz invariance unless
the fields are coupled in a way that satisfies the equivalence principle.
This has been an outline of the way I’ve been teaching quantum field
theory these many years. Recently I’ve put this all together into a book,11
now being sold for a negligible price. The bottom line is that quantum me-
chanics plus Lorentz invariance plus cluster decomposition implies quantum
field theory. But there are caveats that have to be attached to this, and I
can see David Gross in the front row anxious to take me by the throat over
various gaps in what I have said, so I had better list these caveats quickly to
save myself.
First of all, the argument I have presented is obviously based on pertur-
bation theory. Second, even in perturbation theory, I haven’t stated a clear
theorem, much less proved one. As I mentioned there are complications when
you have things like mass zero, spin one particles for example; in this case
you don’t really have a fully Lorentz invariant Hamiltonian density, or even
one that is completely local. Because of these complications, I don’t know
how even to state a general theorem, let alone prove it, even in perturbation
theory. But I don’t think that these are insuperable obstacles.
A much more serious objection to this not-yet-formulated theorem is that
there’s already a counter example to it: string theory. When you first learn
string theory it seems in an almost miraculous way to give Lorentz invariant,
unitary S-matrix elements without being a field theory in the sense that
I’ve been using it. (Of course it is a field theory in a different sense —
it’s a two dimensional conformally invariant field theory, but not a quantum
field theory in four spacetime dimensions.) So before even being formulated
precisely, this theorem suffers from at least one counter example.
Another fundamental problem is that the S-matrix isn’t everything. Space-
time could be radically curved, not just have little ripples on it. Also, at
finite temperature there’s no S-matrix because particles cannot get out to
infinite distances from a collision without bumping into things. Also, it
seems quite possible that at very short distances the description of events in
four-dimensional flat spacetime becomes inappropriate.
Now, all of these caveats really work only against the idea that the final
theory of nature is a quantum field theory. They leave open the view, which is
in fact the point of view of my book, that although you cannot argue that relativity
plus quantum mechanics plus cluster decomposition necessarily leads
only to quantum field theory, it is very likely that any quantum theory that
at sufficiently low energy and large distances looks Lorentz invariant and sat-
isfies the cluster decomposition principle will also at sufficiently low energy
look like a quantum field theory. Picking up a phrase from Arthur Wight-
man, I’ll call this a folk theorem. At any rate, this folk theorem is satisfied
by string theory, and we don’t know of any counterexamples.
This leads us to the idea of effective field theories. When you use quantum
field theory to study low-energy phenomena, then according to the folk the-
orem you’re not really making any assumption that could be wrong, unless
of course Lorentz invariance or quantum mechanics or cluster decomposition
is wrong, provided you don’t say specifically what the Lagrangian is. As
long as you let it be the most general possible Lagrangian consistent with
the symmetries of the theory, you’re simply writing down the most general
theory you could possibly write down. This point of view has been used in
the last fifteen years or so to justify the use of effective field theories, not just
in the tree approximation where they had been used for some time earlier,
but also including loop diagrams. Effective field theory was first used in this
way to calculate processes involving soft π mesons,12 that is, π mesons with
energy less than about 2πFπ ≈ 1200 MeV. The use of effective quantum
field theories has been extended more recently to nuclear physics,13 where
although nucleons are not soft they never get far from their mass shell, and
for that reason can also be treated by methods similar to those used for soft pions.
Nuclear physicists have adopted this point of view, and I gather that they
are happy about using this new language because it allows one to show in a
fairly convincing way that what they’ve been doing all along (using two-body
potentials only, including one-pion exchange and a hard core) is the correct
first step in a consistent approximation scheme. The effective field theory
approach has been applied more recently to superconductivity. Shankar, I
believe, in a contribution to this conference is talking about this. The present
educated view of the standard model, and of general relativity,14 is again that
these are the leading terms in effective field theories.
The essential point in using an effective field theory is that you’re not allowed
to make any assumption of simplicity about the Lagrangian. Certainly
you’re not allowed to assume renormalizability. Such assumptions might be
appropriate if you were dealing with a fundamental theory, but not for an
effective field theory, where you must include all possible terms that are con-
sistent with the symmetry. The thing that makes this procedure useful is
that although the more complicated terms are not excluded because they’re
non-renormalizable, their effect is suppressed by factors of the ratio of the
energy to some fundamental energy scale of the theory. Of course, as you go
to higher and higher energies, you have more and more of these suppressed
terms that you have to worry about.
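
The suppression described above can be put into rough numbers. The following is a minimal sketch, assuming the usual dimensional-analysis estimate that an operator of mass dimension d contributes at relative order (E/M)^(d-4); it ignores dimensionless coefficients and loop factors. The function name `suppression` and the sample energies are hypothetical, and the scales are the order-of-magnitude characteristic energies quoted in this talk.

```python
# Order-of-magnitude estimate only: relative size of a dimension-d operator's
# effect at energy E in an effective theory with characteristic scale M.
# Assumes the naive power-counting rule (E/M)**(d - 4); coefficients of order
# one and loop factors are ignored. All names here are illustrative.

def suppression(energy_gev: float, scale_gev: float, operator_dim: int) -> float:
    """Return the naive suppression factor (E/M)**(d - 4) for d >= 4."""
    if operator_dim < 4:
        raise ValueError("this estimate is meant for operators of dimension >= 4")
    return (energy_gev / scale_gev) ** (operator_dim - 4)


# Rough characteristic scales in GeV, as quoted in the text.
SCALES_GEV = {
    "soft pions": 1.0,
    "standard model": 1e15,
    "gravitation": 1e18,
}

# Hypothetical process energies in GeV, chosen only for illustration.
SAMPLE_ENERGIES_GEV = {
    "soft pions": 0.1,
    "standard model": 100.0,
    "gravitation": 100.0,
}

for name, scale in SCALES_GEV.items():
    energy = SAMPLE_ENERGIES_GEV[name]
    for dim in (5, 6):
        factor = suppression(energy, scale, dim)
        print(f"{name}: E = {energy} GeV, dimension {dim} -> {factor:.1e}")
```

The same function also shows why more terms matter as the energy rises: the factor approaches one as E approaches M, at which point the expansion stops being useful.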
On this basis, I don’t see any reason why anyone today would take Ein-
stein’s general theory of relativity seriously as the foundation of a quantum
theory of gravitation, if by Einstein’s theory is meant the theory with a Lagrangian
density given by just the term $\sqrt{g}\,R/16\pi G$. It seems to me there’s
no reason in the world to suppose that the Lagrangian does not contain all
the higher terms with more factors of the curvature and/or more derivatives,
all of which are suppressed by inverse powers of the Planck mass, and of
course don’t show up at any energy far below the Planck mass, much less in
astronomy or particle physics. Why would anyone suppose that these higher
terms are absent?
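
Schematically, the kind of Lagrangian meant here can be written as an expansion in the curvature. This is only a sketch: the dimensionless coefficients $c_1$, $c_2$ are labels introduced for illustration, a cosmological constant term is omitted, and sign and normalization conventions vary between authors.

```latex
\mathcal{L}_{\rm eff} = \sqrt{g}\,\Big[ \frac{R}{16\pi G}
   + c_1\, R^2 + c_2\, R_{\mu\nu} R^{\mu\nu} + \cdots \Big] ,

% Relative size of a curvature-squared term for curvature R \sim E^2:
\frac{c\, R^2}{R / 16\pi G} \;\sim\; 16\pi G\, c\, E^2
   \;\sim\; c\, \frac{E^2}{M_{\rm Planck}^2} ,
\qquad G \sim M_{\rm Planck}^{-2} .
```

Terms with still more curvatures or derivatives come with correspondingly higher powers of $E/M_{\rm Planck}$.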
Likewise, since now we know that without new fields there’s no way that
the renormalizable terms in the standard model could violate baryon con-
servation or lepton conservation, we now understand in a rational way why
baryon number and lepton number are as well conserved as they are, without
having to assume that they are exactly conserved.†† Unless someone has
some a priori reason for exact baryon and lepton conservation of which I
haven’t heard, I would bet very strong odds that baryon number and lepton
number conservation are in fact violated by suppressed non-renormalizable
corrections to the standard model.
These effective field theories are non-renormalizable in the old Dyson
power-counting sense. That is, although to achieve a given accuracy at any
given energy, you need only take account of a finite number of terms in the
action, as you increase the accuracy or the energy you need to include more
and more terms, and so you have to know more and more. On the other
hand, effective field theories still must be renormalizable theories in what I
call the modern sense: the symmetries that govern the action also have to
govern the infinities, for otherwise there will be infinities that can’t be elimi-
nated by absorbing them into counter terms to the parameters in the action.
This requirement is automatically satisfied for unbroken global symmetries,
such as Lorentz invariance and isotopic spin invariance and so on. Where it’s
not trivial is for gauge symmetries. We generally deal with gauge theories
by choosing a gauge before quantizing the theory, which of course breaks
the gauge invariance, so it’s not obvious how gauge invariance constrains
the infinities. (There is a symmetry called BRST invariance15 that survives
gauge fixing, but that’s non-linearly realized, and non-linearly realized sym-
metries of the action are not symmetries of the Feynman amplitudes.) This
raises a question, whether gauge theories that are not renormalizable in the
power counting sense are renormalizable in the modern sense. The theorem
that says that infinities are governed by the same gauge symmetries as the
terms in the Lagrangian was originally proved back in the old days by ’t
Hooft and Veltman16 and Lee and Zinn-Justin17 only for theories that are
renormalizable in the old power-counting sense, but this theorem has only
recently been extended to theories of the Yang–Mills18 or Einstein type with
arbitrary numbers of complicated interactions that are not renormalizable in
the power-counting sense.‡ You’ll be reassured to know that these theories
are renormalizable in the modern sense, but there’s no proof that this will
††
The extra fields required by low-energy supersymmetry may invalidate this argument.

‡ I refer here to work of myself and Joaquim Gomis,19 relying on recent theorems about
the cohomology of the Batalin–Vilkovisky operator by Barnich, Brandt, and Henneaux.20
Earlier work along these lines but with different motivation was done by Voronov, Tyutin,
and Lavrov;21 Anselmi;22 and Harada, Kugo, and Yamawaki.23

be true of all quantum field theories with local symmetries.
I promised you a few ironies today. The second one takes me back to
the early 1960s when S-matrix theory was very popular at Berkeley and
elsewhere. The hope of S-matrix theory was that, by using the principles of
unitarity, analyticity, Lorentz invariance and other symmetries, it would be
possible to calculate the S-matrix, and you would never have to think about a
quantum field. In a way, this hope reflected a kind of positivistic puritanism:
we can’t measure the field of a pion or a nucleon, so we shouldn’t talk about
it, while we do measure S-matrix elements, so this is what we should stick
to as ingredients of our theories. But more important than any philosophical
hang-ups was the fact that quantum field theory didn’t seem to be going
anywhere in accounting for the strong and weak interactions.
One problem with the S-matrix program was in formulating what is
meant by the analyticity of the S-matrix. What precisely are the analytic
properties of a multi-particle S-matrix element? I don’t think anyone ever
knew. I certainly didn’t know, so even though I was at Berkeley I never got
too enthusiastic about the details of this program, although I thought it was
a lovely idea in principle. Eventually the S-matrix program had to retreat,
as described by Kaiser in a contribution to this conference, to a sort of mix
of field theory and S-matrix theory. Feynman rules were used to find the sin-
gularities in the S-matrix, and then they were thrown away, and the analytic
structure of the S-matrix with these singularities, together with unitarity
and Lorentz invariance, was used to do calculations.
Unfortunately to use these assumptions it was necessary to make uncon-
trolled approximations, such as the strip approximation, whose mention will
bring tears to the eyes of those of us who are old enough to remember it. By
the mid-1960s it was clear that S-matrix theory had failed in dealing with
the one problem it had tried hardest to solve, that of pion–pion scattering.
The strip approximation rested on the assumption that double dispersion re-
lations are dominated by regions of the Mandelstam diagram near the fringes
of the physical region, which would only make sense if π–π scattering is strong
at low energy, and these calculations predicted that π–π scattering is indeed
strong at low energy, which was at least consistent, but it was then discovered
that π–π scattering is not strong at low energy. Current algebra came
along at just that time, and was used not only to predict that low energy π–π
scattering is not strong, but also to predict successfully the values of the π–π
scattering lengths.24 From a practical point of view, this was the greatest
defeat of S-matrix theory. The irony here is that the S-matrix philosophy is
not that far from the modern philosophy of effective field theories, that what
you should do is just write down the most general S-matrix that satisfies
basic principles. But the practical way to implement S-matrix theory is to
use an effective quantum field theory — instead of deriving analyticity prop-
erties from Feynman diagrams, we use the Feynman diagrams themselves.
So here’s another answer to the question of what quantum field theory is: it
is S-matrix theory, made practical.
By the way, I think that the emphasis in S-matrix theory on analyticity
as a fundamental principle was misguided, not only because no one could
ever state the detailed analyticity properties of general S-matrix elements,
but also because Lorentz invariance requires causality (because as I argued
earlier otherwise you’re not going to get a Lorentz invariant S-matrix), and
in quantum field theory causality allows you to derive analyticity proper-
ties. So I would include Lorentz invariance, quantum mechanics and cluster
decomposition as fundamental principles, but not analyticity.
As I have said, quantum field theories provide an expansion in powers
of the energy of a process divided by some characteristic energy; for soft
pions this characteristic energy is about a GeV; for superconductivity it’s
the Debye frequency or temperature; for the standard model it’s $10^{15}$ to $10^{16}$
GeV; and for gravitation it’s about $10^{18}$ GeV. Any effective field theory loses
its predictive power when the energy of the processes in question approaches
the characteristic energy. So what happens to the effective field theories of
electroweak, strong, and gravitational interactions at energies of order $10^{15}$–$10^{18}$
GeV? I know of only two plausible alternatives.
One possibility is that the theory remains a quantum field theory, but one
in which the finite or infinite number of renormalized couplings do not run off
to infinity with increasing energy, but hit a fixed point of the renormalization
group equations. One way that can happen is provided by asymptotic free-
dom in a renormalizable theory,25 where the fixed point is at zero coupling,
but it’s possible to have more general fixed points with infinite numbers of
non-zero nonrenormalizable couplings. Now, we don’t know how to calculate
these non-zero fixed points very well, but one thing we know with fair cer-
tainty is that the trajectories that run into a fixed point in the ultraviolet
limit form a finite dimensional subspace of the infinite dimensional space of
all coupling constants. (If anyone wants to know how we know that, I’ll ex-
plain this later.) That means that the condition, that the trajectories hit a
fixed point, is just as restrictive in a nice way as renormalizability used to be:
It reduces the number of free coupling parameters to a finite number. We
don’t yet know how to do calculations for fixed points that are not near zero
coupling. Some time ago I proposed26 that these calculations could be done
in the theory of gravitation by working in 2 + ε dimensions and expanding
in powers of ε = 2, in analogy with the way that Wilson and Fisher27 had
calculated critical exponents by working in 4 − ε dimensions and expanding
in powers of ε = 1, but this program doesn’t seem to be working very well.
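
The simplest case mentioned above, a fixed point at zero coupling, can be illustrated with a toy calculation. This is a minimal sketch, assuming a one-loop renormalization group equation of the form dg/d ln μ = −b g³ with b > 0; the function names and the numerical values of b and of the starting coupling are hypothetical and are not taken from any particular theory.

```python
import math

# Toy one-loop running coupling, assuming dg/dln(mu) = -b * g**3 with b > 0.
# In terms of y = g**2 this reads dy/dt = -2 * b * y**2, with t = ln(mu/mu0).
# The value of b and the initial coupling are illustrative assumptions.


def g_squared_one_loop(g0_sq: float, b: float, mu_over_mu0: float) -> float:
    """Analytic solution: g^2(mu) = g0^2 / (1 + 2 b g0^2 ln(mu/mu0))."""
    denom = 1.0 + 2.0 * b * g0_sq * math.log(mu_over_mu0)
    if denom <= 0.0:
        # The one-loop formula breaks down where the coupling has grown strong.
        raise ValueError("one-loop solution is not valid at this scale")
    return g0_sq / denom


def g_squared_numeric(g0_sq: float, b: float, mu_over_mu0: float,
                      steps: int = 10000) -> float:
    """Integrate dy/dt = -2 b y^2 with fourth-order Runge-Kutta as a cross-check."""
    h = math.log(mu_over_mu0) / steps

    def rhs(y: float) -> float:
        return -2.0 * b * y * y

    y = g0_sq
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


# The coupling decreases toward the fixed point g = 0 as the scale increases.
for ratio in (1.0, 1e3, 1e6, 1e12):
    print(f"mu/mu0 = {ratio:.0e}: g^2 = {g_squared_one_loop(1.0, 0.05, ratio):.4f}")
```

A fixed point at non-zero coupling would need a beta function with a non-trivial zero, which this one-term toy example does not have.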
The other possibility, which I have to admit is a priori more likely, is that
at very high energy we will run into really new physics, not describable in
terms of a quantum field theory. I think that by far the most likely possibility
is that this will be something like a string theory.
Before I leave the renormalization group, I did want to say another word
about it because there’s going to be an interesting discussion on this sub-
ject here tomorrow morning, and for reasons I’ve already explained I can’t
be here. I’ve read a lot of argument about the Wilson approach28 vs. the
Gell-Mann–Low approach,29 which seems to me to call for reconciliation.
There have been two fundamental insights in the development of the renor-
malization group. One, due to Gell-Mann and Low, is that logarithms of
energy that violate naive scaling and invalidate perturbation theory arise be-
cause of the way that renormalized coupling constants are defined, and that
these logarithms can be avoided by renormalizing at a sliding energy scale.
The second fundamental insight, due to Wilson, is that it’s very important
in dealing with phenomena at a certain energy scale to integrate out the
physics at much higher energy scales. It seems to me these are the same
insight, because when you adopt the Gell-Mann–Low approach and define a
renormalized coupling at a sliding scale and use renormalization theory to
eliminate the infinities rather than an explicit cutoff, you are in effect inte-
grating out the higher energy degrees of freedom — the integrals converge
because after renormalization the integrand begins to fall off rapidly at the
energy scale at which the coupling constant is defined. (This is true whether
or not the theory is renormalizable in the power-counting sense.) So in other
words instead of a sharp cutoff à la Wilson, you have a soft cutoff, but it’s
a cutoff nonetheless and it serves the same purpose of integrating out the
short distance degrees of freedom. There are practical differences between
the Gell-Mann–Low and Wilson approaches, and there are some problems
for which one is better and other problems for which the other is better. In
statistical mechanics it isn’t important to maintain Lorentz invariance, so
you might as well have a cutoff. In quantum field theories, Lorentz invariance
is necessary, so it’s nice to renormalize à la Gell-Mann–Low. On the
other hand, in supersymmetry theories there are some non-renormalization
theorems that are simpler if you use a Wilsonian cutoff than a Gell-Mann–
Low cutoff.30 These are all practical differences, which we have to take into
account, but I don’t find any fundamental philosophical difference between
these two approaches.
On the plane coming here I read a comment by Michael Redhead, in a
paper submitted to this conference: ‘To subscribe to the new effective field
theory programme is to give up on this endeavor’ [the endeavor of finding
really fundamental laws of nature], ‘and retreat to a position that is somehow
less intellectually exciting.’ It seems to me that this is analogous to saying
that to balance your checkbook is to give up dreams of wealth and have a life
that is intrinsically less exciting. In a sense that’s true, but nevertheless it’s
still something that you had better do every once in a while. I think that in
regarding the standard model and general relativity as effective field theories
we’re simply balancing our checkbook and realizing that we perhaps didn’t
know as much as we thought we did, but this is the way the world is and
now we’re going to go on the next step and try to find an ultraviolet fixed
point, or (much more likely) find entirely new physics. I have said that I
thought that this new physics takes the form of string theory, but of course,
we don’t know if that’s the final answer. Nielsen and Olesen31 showed long
ago that relativistic quantum field theories can have string-like solutions. It’s
conceivable, although I admit not entirely likely, that something like modern
string theory arises from a quantum field theory. And that would be the final
irony.

References

1. M. Born, W. Heisenberg, and P. Jordan, Z. f. Phys. 35, 557 (1926).

2. P.A.M. Dirac, Proc. Roy. Soc. A117, 610 (1928); ibid., A118, 351
(1928); ibid., A126, 360 (1930).

3. R. P. Feynman and A. R. Hibbs, Quantum Mechanics and Path Integrals
(McGraw-Hill, New York, 1965).

4. G. ’t Hooft, Nucl. Phys. B35, 167 (1971).

5. H. Euler and B. Kockel, Naturwiss. 23, 246 (1935); W. Heisenberg and
H. Euler, Z. f. Phys. 98, 714 (1936).

6. Z. Bern and D. A. Kosower, in International Symposium on Particles,
Strings, and Cosmology, eds. P. Nath and S. Reucroft (World Scientific,
Singapore, 1992): 794; Phys. Rev. Lett. 66, 669 (1991).

7. W. Pauli, Z. f. Phys. 37, 263 (1926); 43, 601 (1927).

8. F.J. Dyson, Phys. Rev. 75, 486, 1736 (1949).

9. E. P. Wigner, Ann. Math. 40, 149 (1939).

10. The cluster decomposition principle seems to have been first stated ex-
plicitly in quantum field theory by E. H. Wichmann and J. H. Crichton,
Phys. Rev. 132, 2788 (1963).

11. S. Weinberg, The Quantum Theory of Fields, Volume I: Foundations
(Cambridge University Press, Cambridge, 1995).

12. S. Weinberg, Phys. Rev. Lett. 18, 188 (1967); Phys. Rev. 166, 1568
(1968); Physica 96A, 327 (1979).

13. S. Weinberg, Phys. Lett. B251, 288 (1990); Nucl. Phys. B363, 3
(1991); Phys. Lett. B295, 114 (1992). C. Ordóñez and U. van Kolck,
Phys. Lett. B291, 459 (1992); C. Ordóñez, L. Ray, and U. van Kolck,
Phys. Rev. Lett. 72, 1982 (1994); U. van Kolck, Phys. Rev., C49,
2932 (1994); U. van Kolck, J. Friar, and T. Goldman, to appear in
Phys. Lett. B. This approach to nuclear forces is summarized in C.
Ordóñez, L. Ray, and U. van Kolck, Texas preprint UTTG-15-95, nucl-
th/9511380, submitted to Phys. Rev. C; J. Friar, Few-Body Systems
Suppl. 99, 1 (1996). For application of these techniques to related
nuclear processes, see T.-S. Park, D.-P. Min, and M. Rho, Phys. Rep.
233, 341 (1993); Seoul preprint SNUTP 95-043, nucl-th/9505017; S.R.
Beane, C.Y. Lee, and U. van Kolck, Phys. Rev., C52, 2915 (1995);
T. Cohen, J. Friar, G. Miller, and U. van Kolck, Washington preprint
DOE/ER/40427-26-N95, nucl-th/9512036.

14. J. F. Donoghue, Phys. Rev. D 50, 3874 (1994).

15. C. Becchi, A. Rouet, and R. Stora, Comm. Math. Phys. 42, 127
(1975); in Renormalization Theory, eds. G. Velo and A. S. Wightman
(Reidel, Dordrecht, 1976); Ann. Phys. 98, 287 (1976); I. V. Tyutin,
Lebedev Institute preprint N39 (1975).

16. G. ’t Hooft and M. Veltman, Nucl. Phys. B50, 318 (1972).

17. B. W. Lee and J. Zinn-Justin, Phys. Rev. D5, 3121, 3137 (1972); Phys.
Rev. D7, 1049 (1972).

18. C. N. Yang and R. L. Mills, Phys. Rev. 96, 191 (1954).

19. J. Gomis and S. Weinberg, Nuclear Physics B 469, 475–487 (1996).

20. G. Barnich and M. Henneaux, Phys. Rev. Lett. 72, 1588 (1994); G.
Barnich, F. Brandt, and M. Henneaux, Phys. Rev. 51, R143 (1995);
Commun. Math. Phys. 174, 57, 93 (1995); Nucl. Phys. B455, 357
(1995).

21. B. L. Voronov and I. V. Tyutin, Theor. Math. Phys. 50, 218 (1982);
52, 628 (1982); B. L. Voronov, P. M. Lavrov, and I. V. Tyutin, Sov.
J. Nucl. Phys. 36, 292 (1982); P. M. Lavrov and I. V. Tyutin Sov. J.
Nucl. Phys. 41, 1049 (1985).

22. D. Anselmi, Class. and Quant. Grav. 11, 2181 (1994); 12, 319 (1995).

23. M. Harada, T. Kugo, and K. Yamawaki, Prog. Theor. Phys. 91, 801
(1994).

24. S. Weinberg, Phys. Rev. Lett. 16, 879 (1966).

25. D. J. Gross and F. Wilczek, Phys. Rev. Lett., 30, 1343 (1973); H. D.
Politzer, Phys. Rev. Lett., 30, 1346 (1973).

26. S. Weinberg, in General Relativity, eds. S. W. Hawking and W. Israel
(Cambridge University Press, Cambridge, 1979): p. 790.

27. K. G. Wilson and M. E. Fisher, Phys. Rev. Lett. 28, 240 (1972); K.
G. Wilson, Phys. Rev. Lett. 28, 548 (1972).

28. K. G. Wilson, Phys. Rev. B4, 3174, 3184 (1971); Rev. Mod. Phys.
47, 773 (1975).

29. M. Gell-Mann and F. E. Low, Phys. Rev. 95, 1300 (1954).

30. V. Novikov, M. A. Shifman, A. I. Vainshtein, and V. I. Zakharov, Nucl.
Phys. B229, 381 (1983); M. A. Shifman and A. I. Vainshtein, Nucl.
Phys. B277, 456 (1986); and references quoted therein. See also M.
A. Shifman and A. I. Vainshtein, Nucl. Phys. B359, 571 (1991).

31. H. Nielsen and P. Olesen, Nucl. Phys. B61, 45 (1973).
