Introduction To Random Matrices
Introduction To Random Matrices
Introduction To Random Matrices
Editorial Board
B. BOLLOBAS, W. FULTON, A. KATOK, F. KIRWAN,
P. SARNAK, B. SIMON, B. TOTARO
The theory of random matrices plays an important role in many areas of pure
mathematics and employs a variety of sophisticated mathematical tools (analytical,
probabilistic and combinatorial). This diverse array of tools, while attesting to the
vitality of the field, presents several formidable obstacles to the newcomer, and even
the expert probabilist.
This rigorous introduction to the basic theory is sufficiently self-contained to be
accessible to graduate students in mathematics or related sciences, who have mastered
probability theory at the graduate level, but have not necessarily been exposed to
advanced notions of functional analysis, algebra or geometry. Useful background
material is collected in the appendices and exercises are also included throughout to
test the readers understanding. Enumerative techniques, stochastic analysis, large
deviations, concentration inequalities, disintegration and Lie algebras all are
introduced in the text, which will enable readers to approach the research literature
with confidence.
GREG W. ANDERSON
University of Minnesota
ALICE GUIONNET
Ecole Normale Superieure de Lyon
OFER ZEITOUNI
University of Minnesota and
Weizmann Institute of Science
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
So Paulo, Delhi, Dubai, Tokyo
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521194525
1 Introduction 1
vii
viii C ONTENTS
The study of random matrices, and in particular the properties of their eigenval-
ues, has emerged from the applications, first in data analysis and later as statisti-
cal models for heavy-nuclei atoms. Thus, the field of random matrices owes its
existence to applications. Over the years, however, it became clear that models
related to random matrices play an important role in areas of pure mathematics.
Moreover, the tools used in the study of random matrices came themselves from
different and seemingly unrelated branches of mathematics.
At this point in time, the topic has evolved enough that the newcomer, especially
if coming from the field of probability theory, faces a formidable and somewhat
confusing task in trying to access the research literature. Furthermore, the back-
ground expected of such a newcomer is diverse, and often has to be supplemented
before a serious study of random matrices can begin.
We believe that many parts of the field of random matrices are now developed
enough to enable one to expose the basic ideas in a systematic and coherent way.
Indeed, such a treatise, geared toward theoretical physicists, has existed for some
time, in the form of Mehtas superb book [Meh91]. Our goal in writing this book
has been to present a rigorous introduction to the basic theory of random matri-
ces, including free probability, that is sufficiently self-contained to be accessible to
graduate students in mathematics or related sciences who have mastered probabil-
ity theory at the graduate level, but have not necessarily been exposed to advanced
notions of functional analysis, algebra or geometry. Along the way, enough tech-
niques are introduced that we hope will allow readers to continue their journey
into the current research literature.
This project started as notes for a class on random matrices that two of us (G. A.
and O. Z.) taught in the University of Minnesota in the fall of 2003, and notes for
a course in the probability summer school in St. Flour taught by A. G. in the
xiii
xiv P REFACE
This book is concerned with random matrices. Given the ubiquitous role that
matrices play in mathematics and its application in the sciences and engineer-
ing, it seems natural that the evolution of probability theory would eventually
pass through random matrices. The reality, however, has been more complicated
(and interesting). Indeed, the study of random matrices, and in particular the
properties of their eigenvalues, has emerged from the applications, first in data
analysis (in the early days of statistical sciences, going back to Wishart [Wis28]),
and later as statistical models for heavy-nuclei atoms, beginning with the semi-
nal work of Wigner [Wig55]. Still motivated by physical applications, at the able
hands of Wigner, Dyson, Mehta and co-workers, a mathematical theory of the
spectrum of random matrices began to emerge in the early 1960s, and links with
various branches of mathematics, including classical analysis and number theory,
were established. While much progress was initially achieved using enumerative
combinatorics, gradually, sophisticated and varied mathematical tools were intro-
duced: Fredholm determinants (in the 1960s), diffusion processes (in the 1960s),
integrable systems (in the 1980s and early 1990s), and the RiemannHilbert prob-
lem (in the 1990s) all made their appearance, as well as new tools such as the
theory of free probability (in the 1990s). This wide array of tools, while attest-
ing to the vitality of the field, presents, however, several formidable obstacles to
the newcomer, and even to the expert probabilist. Indeed, while much of the re-
cent research uses sophisticated probabilistic tools, it builds on layers of common
knowledge that, in the aggregate, few people possess.
Our goal in this book is to present a rigorous introduction to the basic theory
of random matrices that would be sufficiently self-contained to be accessible to
graduate students in mathematics or related sciences who have mastered probabil-
ity theory at the graduate level, but have not necessarily been exposed to advanced
notions of functional analysis, algebra or geometry. With such readers in mind, we
1
2 1. I NTRODUCTION
present some background material in the appendices, that novice and expert alike
can consult; most material in the appendices is stated without proof, although the
details of some specialized computations are provided.
Keeping in mind our stated emphasis on accessibility over generality, the book
is essentially divided into two parts. In Chapters 2 and 3, we present a self-
contained analysis of random matrices, quickly focusing on the Gaussian ensem-
bles and culminating in the derivation of the gap probabilities at 0 and the Tracy
Widom law. These chapters can be read with very little background knowledge,
and are particularly suitable for an introductory study. In the second part of the
book, Chapters 4 and 5, we use more advanced techniques, requiring more exten-
sive background, to emphasize and generalize certain aspects of the theory, and to
introduce the theory of free probability.
So what is a random matrix, and what questions are we about to study? Through-
out, let F = R or F = C, and set = 1 in the former case and = 2 in the latter. (In
Section 4.1, we will also consider the case F = H, the skew-field of quaternions,
see Appendix E for definitions and details.) Let MatN (F) denote the space of N-
( )
by-N matrices with entries in F, and let HN denote the subset of self-adjoint
matrices (i.e., real symmetric if = 1 and Hermitian if = 2). One can always
( )
consider the sets MatN (F) and HN , = 1, 2, as submanifolds of an appropriate
Euclidean space, and equip it with the induced topology and (Borel) sigma-field.
Recall that a probability space is a triple (, F , P) so that F is a sigma-algebra
of subsets of and P is a probability measure on (, F ). In that setting, a random
matrix XN is a measurable map from (, F ) to MatN (F).
Our main interest is in the eigenvalues of random matrices. Recall that the
eigenvalues of a matrix H MatN (F) are the roots of the characteristic polynomial
PN (z) = det(zIN H), with IN the identity matrix. Therefore, on the (open) set
where the eigenvalues are all simple, they are smooth functions of the entries of
XN (a more complete discussion can be found in Section 4.1).
( )
We will be mostly concerned in this book with self-adjoint matrices H HN ,
= 1, 2, in which case the eigenvalues are all real and can be ordered. Thus,
( )
for H HN , we let 1 (H) N (H) be the eigenvalues of H. A conse-
quence of the perturbation theory of normal matrices (see Lemma A.4) is that the
eigenvalues {i (H)} are continuous functions of H (this also follows from the
HoffmanWielandt theorem, Theorem 2.1.19). In particular, if XN is a random
matrix then the eigenvalues {i (XN )} are random variables.
We present now a guided tour of the book. We begin by considering Wigner
matrices in Chapter 2. These are symmetric (or Hermitian) matrices XN whose
1. I NTRODUCTION 3
entries are independent and identically distributed, except for the symmetry con-
straints. For x R, let x denote the Dirac measure at x, that is, the unique prob-
ability measure satisfying f d x = f (x) for all continuous functions on R. Let
LN = N 1 Ni=1 i (XN ) denote the empirical measure of the eigenvalues of XN .
Wigners Theorem (Theorem 2.1.1) asserts that, under appropriate assumptions
on the law of the entries, LN converges (with respect to the weak convergence
of measures) towards a deterministic probability measure, the semicircle law. We
present in Chapter 2 several proofs of Wigners Theorem. The first, in Section 2.1,
involves a combinatorial machinery that is also exploited to yield central limit the-
orems and estimates on the spectral radius of XN . After first introducing in Section
2.3 some useful estimates on the deviation between the empirical measure and its
mean, we define in Section 2.4 the Stieltjes transform of measures and use it to
give another quick proof of Wigners Theorem.
The expression for the joint density of the eigenvalues in the Gaussian ensem-
bles is the starting point for obtaining local information on the eigenvalues. This
is the topic of Chapter 3. The bulk of the chapter deals with the GUE, because
in that situation the eigenvalues form a determinantal process. This allows one
to effectively represent the probability that no eigenvalues are present in a set
as a Fredholm determinant, a notion that is particularly amenable to asymptotic
analysis. Thus, after representing in Section 3.2 the joint density for the GUE in
terms of a determinant involving appropriate orthogonal polynomials, the Hermite
polynomials, we develop in Section 3.4 in an elementary way some aspects of the
theory of Fredholm determinants. We then present in Section 3.5 the asymptotic
analysis required in order to study the gap probability at 0, that is the probabil-
ity that no eigenvalue is present in an interval around the origin. Relevant tools,
such as the Laplace method, are developed along the way. Section 3.7 repeats this
analysis for the edge of the spectrum, introducing along the way the method of
4 1. I NTRODUCTION
steepest descent. The link with integrable systems and the Painleve equations is
established in Sections 3.6 and 3.8.
As mentioned before, the eigenvalues of the GUE are an example of a deter-
minantal process. The other Gaussian ensembles (GOE and GSE) do not fall into
this class, but they do enjoy a structure where certain Pfaffians replace determi-
nants. This leads to a considerably more involved analysis, the details of which
are provided in Section 3.9.
Chapter 4 is a hodge-podge of results whose common feature is that they all
require new tools. We begin in Section 4.1 with a re-derivation of the joint law
of the eigenvalues of the Gaussian ensemble, in a geometric framework based on
Lie theory. We use this framework to derive the expressions for the joint distri-
bution of eigenvalues of Wishart matrices, of random matrices from the various
unitary groups and of matrices related to random projectors. Section 4.2 studies
in some depth determinantal processes, including their construction, associated
central limit theorems, convergence and ergodic properties. Section 4.3 studies
what happens when in the GUE (or GOE), the Gaussian entries are replaced by
Brownian motions. The powerful tools of stochastic analysis can then be brought
to bear and lead to functional laws of large numbers, central limit theorems and
large deviations. Section 4.4 consists of an in-depth treatment of concentration
techniques and their application to random matrices; it is a generalization of the
discussion in the short Section 2.3. Finally, in Section 4.5, we study a family of
tri-diagonal matrices, parametrized by a parameter , whose distribution of eigen-
values coincides with that of members of the Gaussian ensembles for = 1, 2, 4.
The study of the maximal eigenvalue for this family is linked to the spectrum of
an appropriate random Schrodinger operator.
Chapter 5 is devoted to free probability theory, a probability theory for certain
noncommutative variables, equipped with a notion of independence called free
independence. Invented in the early 1990s, free probability theory has become
a versatile tool for analyzing the laws of noncommutative polynomials in several
random matrices, and of the limits of the empirical measure of eigenvalues of such
polynomials. We develop the necessary preliminaries and definitions in Section
5.2, introduce free independence in Section 5.3, and discuss the link with random
matrices in Section 5.4. We conclude the chapter with Section 5.5, in which we
study the convergence of the spectral radius of noncommutative polynomials of
random matrices.
Each chapter ends with bibliographical notes. These are not meant to be com-
prehensive, but rather guide the reader through the enormous literature and give
some hint of recent developments. Although we have tried to represent accurately
1. I NTRODUCTION 5
rk := max E|Z1,2 |k , E|Y1 |k < . (2.1.1)
Define the semicircle distribution (or law) as the probability distribution (x)dx
6
2.1 T RACES , MOMENTS AND COMBINATORICS 7
on R with density
1
(x) = 4 x2 1|x|2 . (2.1.3)
2
The following theorem, contained in [Wig55], can be considered the starting point
of random matrix theory (RMT).
Theorem 2.1.1 (Wigner) For a Wigner matrix, the empirical measure LN con-
verges weakly, in probability, to the semicircle distribution.
In greater detail, Theorem 2.1.1 asserts that for any f Cb (R), and any > 0,
lim P(|LN , f , f | > ) = 0 .
N
Remark 2.1.2 The assumption (2.1.1) that rk < for all k is not really needed.
See Theorem 2.1.21 in Section 2.1.5.
We will see many proofs of Wigners Theorem 2.1.1. In this section, we give
a direct combinatorics-based proof, mimicking the original argument of Wigner.
Before doing so, however, we need to discuss some properties of the semicircle
distribution.
Hence,
/2
2 22k 4(2k 1)
m2k = sin2k ( )d = m2k2 , (2.1.5)
(2k + 2) /2 2k + 2
8 2. W IGNER MATRICES
Proof of Lemma 2.1.3 Let Bk denote the number of Bernoulli walks {Sn } of
length 2k that satisfy S2k = 0, and let Bk denote the number of Bernoulli walks
{Sn } of length 2k that satisfy S2k = 0 and St < 0 for some t < 2k. Then, k =
Bk Bk . By reflection at the first hitting of 1, one sees that Bk equals the number
of Bernoulli walks {Sn } of length 2k that satisfy S2k = 2. Hence,
2k 2k
k = Bk Bk = = Ck .
k k1
Turning to the evaluation of (z), considering the first return time to 0 of the
Bernoulli walk {Sn } gives the relation
k
k = k j j1 , k 1 , (2.1.7)
j=1
with the convention that 0 = 1. Because the number of Bernoulli walks of length
2k is bounded by 4k , one has that k 4k , and hence the function (z) is well
defined and analytic for |z| < 1/4. But, substituting (2.1.7),
k k
(z) 1 = zk k j j1 = z zk k j j ,
k=1 j=1 k=0 j=0
while
q
(z)2 = zk+k k k = zq q .
k,k =0 q=0 =0
(z) = 1 + z (z)2 ,
2.1 T RACES , MOMENTS AND COMBINATORICS 9
from which (2.1.6) follows (using that (0) = 1 to choose the correct branch of
the square-root).
We note in passing that, expanding (2.1.6) in power series in z in a neighborhood
of zero, one gets (for |z| < 1/4)
k
2 z (2k2)!
k=1 k!(k1)!
(2k)!
(z) =
2z
= k!(k + 1)! zk = zkCk ,
k=0 k=0
Exercise 2.1.5 Prove that for z C such that z [2, 2], the Stieltjes transform
10 2. W IGNER MATRICES
1 1
2 2
3 5 3
5
4 4
Fig. 2.1.1. Non-crossing (left, (1, 4), (2, 3), (5, 6)) and crossing (right, (1, 5), (2, 3), (4, 6))
partitions of the set K6 .
Indeed, assume that Lemmas 2.1.6 and 2.1.7 have been proved. To conclude the
proof of Theorem 2.1.1, one needs to check that for any bounded continuous func-
tion f ,
lim LN , f = , f , in probability. (2.1.8)
N
2.1 T RACES , MOMENTS AND COMBINATORICS 11
Toward this end, note first that an application of the Chebyshev inequality yields
1 LN , x2k
P LN , |x|k 1|x|>B > ELN , |x|k 1|x|>B .
Bk
Hence, by Lemma 2.1.6,
, x2k 4k
lim sup P LN , |x|k 1|x|>B > ,
N Bk Bk
where we used that Ck 4k . Thus, with B = 5, noting that the left side above is
increasing in k, it follows that
lim sup P LN , |x|k 1|x|>B > = 0 . (2.1.9)
N
In particular, when proving (2.1.8), we may and will assume that f is supported
on the interval [5, 5].
Fix next such an f and > 0. By the Weierstrass approximation theorem, one
can find a polynomial Q (x) = Li=0 ci xi such that
sup |Q (x) f (x)| .
x:|x|B 8
Then,
P (|LN , f , f | > ) P |LN , Q LN , Q | >
4
+P |LN , Q , Q | > + P |LN , Q 1|x|>B | >
4 4
=: P1 + P2 + P3 .
By an application of Lemma 2.1.7, P1 N 0. Lemma 2.1.6 implies that P2 = 0
for N large, while (2.1.9) implies that P3 N 0. This completes the proof of
Theorem 2.1.1 (modulo Lemmas 2.1.6 and 2.1.7).
The starting point of the proof of Lemma 2.1.6 is the following identity:
1
LN , xk = EtrXNk
N
N
1
=
N i1 ,...,i =1
EXN (i1 , i2 )XN (i2 , i3 ) XN (ik1 , ik )XN (ik , i1 )
k
N N
1 1
=:
N i1 ,...,i =1
ETiN =: T N ,
N i1 ,...,i =1 i
(2.1.10)
k k
12 2. W IGNER MATRICES
where the notation ! means there exists a unique. Considering the index j > 1
such that either (i j , i j+1 ) = (i2 , i1 ) or (i j , i j+1 ) = (i1 , i2 ), and recalling that i2 = i1
since Yi1 = 0, one obtains
2k N N
1
LN , x2k = (1 + O(N 1 ))
N (2.1.12)
j=2 i1 =i2 =1 i3 ,...,i j1 ,
i j+2 ,...,i2k =1
EXN (i2 , i3 ) XN (i j1 , i2 )XN (i1 , i j+2 ) XN (i2k , i1 )
+EXN (i2 , i3 ) XN (i j1 , i1 )XN (i2 , i j+2 ) XN (i2k , i1 ) .
where we have used the fact that by induction LN , x2k2 is uniformly bounded
and also the fact that odd moments vanish. Further,
1 N
N i,
LN , x2 = EXN (i, j)2 N 1 = C1 . (2.1.14)
j=1
Thus, we conclude from (2.1.13) by induction that LN , x2k converges to a limit
ak with a0 = a1 = 1, and further that the family {ak } satisfies the recursions ak =
kj=1 ak j a j1 . Comparing with (2.1.7), we deduce that ak = Ck , as claimed.
We turn next to the actual proof. To handle the summation in expressions like
(2.1.10), it is convenient to introduce some combinatorial machinery that will
serve us also in the sequel. We thus first digress and discuss the combinatorics
intervening in the evaluation of the sum in (2.1.10). This is then followed by the
actual proof of Lemma 2.1.6.
In the following definition, the reader may think of S as a subset of the integers.
When S = {1, . . . , N} for some finite N, we use the term N-word. Otherwise, if
the set S is clear from the context, we refer to an S -word simply as a word.
For any S -word w = s1 sk , we use (w) = k to denote the length of w, define
the weight wt(w) as the number of distinct elements of the set {s1 , . . . , sk } and the
support of w, denoted supp w, as the set of letters appearing in w. With any word
w we may associate an undirected graph, with wt(w) vertices and (w) 1 edges,
as follows.
The graph Gw is connected since the word w defines a path connecting all the
vertices of Gw , which further starts and terminates at the same vertex if the word
is closed. For e Ew , we use New to denote the number of times this path traverses
14 2. W IGNER MATRICES
the edge e (in any direction). We note that equivalent words generate the same
graphs Gw (up to graph isomorphism) and the same passage-counts New .
Coming back to the evaluation of TiN , see (2.1.10), note that any k-tuple of
integers i defines a closed word wi = i1 i2 ik i1 of length k + 1. We write wti =
wt(wi ), which is nothing but the number of distinct integers in i. Then,
1 wi wi
TiN =
N k/2
c Ne
E(Z1,2 ) s E(Y1Ne ) . (2.1.15)
eEw eEw
i i
w
In particular, TiN = 0 unless 2 for all e Ewi , which implies that wti
Ne i
k/2 + 1. Also, (2.1.15) shows that if wi wi then TiN = TiN . Further, if N t then
there are exactly
N-words that are equivalent to a given N-word of weight t. We make the following
definition:
Wk,t denotes a set of representatives for equivalence classes of closed
t-words w of length k + 1 and weight t with New 2 for each e Ew .
(2.1.16)
One deduces from (2.1.10) and (2.1.15) that
k/2+1
CN,t Nw Nw
LN , xk = N k/2+1 E(Z1,2e ) E(Y1 e ) . (2.1.17)
t=1 wWk,t eEwc eEws
Note that the cardinality of Wk,t is bounded by the number of closed S -words of
length k + 1 when the cardinality of S is t k, that is, |Wk,t | t k kk . Thus,
(2.1.17) and the finiteness of rk , see (2.1.1), imply that
We have now motivated the following definition. Note that for the purpose of this
section, the case k = 0 in Definition 2.1.10 is not really needed. It is introduced in
this way here in anticipation of the analysis in Section 2.1.6.
Proof of Lemma 2.1.6 Let k be even. It is convenient to choose the set of rep-
resentatives Wk,k/2+1 such that each word w = v1 vk+1 in that set satisfies, for
i = 1, . . . , k + 1, the condition that {v1 , . . . , vi } is an interval in Z beginning at 1.
(There is a unique choice of such representatives.) Each element w Wk,k/2+1
determines a path v1 , v2 , . . . , vk , vk+1 = v1 of length k on the tree Gw . We refer
to this path as the exploration process associated with w. Let d(v, v ) denote the
distance between vertices v, v on the tree Gw , i.e. the length of the shortest path
on the tree beginning at v and terminating at v . Setting xi = d(vi+1 , v1 ), one sees
that each word w Wk,k/2+1 defines a Dyck path D(w) = (x1 , x2 , . . . , xk ) of length
k. See Figure 2.1.2 for an example of such coding. Conversely, given a Dyck path
x = (x1 , . . . , xk ), one may construct a word w = T (x) Wk,k/2+1 by recursively
constructing an increasing sequence w2 , . . . , wk = w of words, as follows. Put
w2 = (1, 2). For i > 2, if xi1 = xi2 + 1, then wi is obtained by adjoining on the
right of wi1 the smallest positive integer not appearing in wi1 . Otherwise, wi is
obtained by adjoining on the right of wi1 the next-to-last letter of wi1 . Note that
for all i, Gwi is a tree (because Gw2 is a tree and, inductively, at stage i, either a
backtrack is added to the exploration process on Gwi1 or a leaf is added to Gwi1 ).
Furthermore, the distance in Gwi between first and last letters of wi equals xi1 , and
therefore, D(w) = (x1 , . . . , xk ). With our choice of representatives, T (D(w)) = w,
because each uptick in the Dyck path D(w) starting at location i 2 corresponds
to adjoinment on the right of wi1 of a new letter, which is uniquely determined by
supp wi1 , whereas each downtick at location i 2 corresponds to the adjoinment
of the next-to-last letter in wi1 . This establishes a bijection between Dyck paths
of length k and Wk,k/2+1 . Lemma 2.1.3 then establishes that
From the proof of Lemma 2.1.6 we extract as a further benefit a proof of a fact
needed in Chapter 5. Let k be an even positive integer and let Kk = {1, . . . , k}.
Recall the notion of non-crossing partition of Kk , see Definition 2.1.4. We define
16 2. W IGNER MATRICES
1
3 4 5
Fig. 2.1.2. Coding of the word w = 123242521 into a tree and a Dyck path of length 8.
Note that (w) = 9 and wt(w) = 5.
Proof (i) Because a Wigner word w viewed as a walk on its graph Gw crosses
every edge exactly twice, w is a pair partition. Because the graph Gw is a tree,
the pair partition w is non-crossing.
(ii) The non-crossing pair partitions of Kk correspond bijectively to Dyck paths.
More precisely, given a non-crossing pair partition of Kk , associate with it a
path f = ( f (1), . . . , f (k)) by the rules that f (1) = 1 and, for i = 2, . . . , k,
2.1 T RACES , MOMENTS AND COMBINATORICS 17
where
Ti,iN = ETiN TiN ETiN ETiN . (2.1.22)
The role of words in the proof of Lemma 2.1.6 is now played by pairs of words,
which is a particular case of a sentence.
w wi wi wi
E(Z1,2 ) Ne i
E(Y1Ne ) Ne
E(Z1,2 ) E(Y1Ne ) .
eEwc eEws eEwc eEws
i i i i
a
In particular, Ti,iN = 0 unless 2 for all e Eai,i . Also, Ti,iN = 0 unless
Ne i,i
Ewi Ewi = 0.
/ Further, (2.1.23) shows that if ai,i aj,j then Ti,iN = Tj,jN . Finally,
if N t then there are exactly CN,t N-sentences that are equivalent to a given
N-sentence of weight t. We make the following definition:
(2)
Wk,t denotes a set of representatives for equivalence classes of sentences a
of weight t consisting of two closed t-words (w1 , w2 ), each of length k + 1,
with Nea 2 for each e Ea , and Ew1 Ew2 = 0/ .
(2.1.24)
One deduces from (2.1.21) and (2.1.23) that
E(LN , xk 2 ) LN , xk 2 (2.1.25)
2k
CN,t
Na Na
= N k+2 c E(Z1,2e ) s E(Y1 e )
t=1 (2) eEa eEa
a=(w1 ,w2 )Wk,t
w1 w1 w2 w2
c Ne
E(Z1,2 ) s E(Y1Ne ) c Ne
E(Z1,2 ) s E(Y1Ne ) .
eEw eEw eEw eEw
1 1 2 2
Remark 2.1.14 Note that in the course of the proof of Lemma 2.1.7, we actually
showed that for N > 2k,
Exercise 2.1.15 Consider symmetric random matrices XN , with the zero mean
independent random variables {XN (i, j)}1i jN no longer assumed identically
distributed nor all of variance 1/N. Check that Theorem 2.1.1 still holds if one
assumes that for all > 0,
#{(i, j) : |1 NEXN (i, j)2 | < }
lim = 1,
N N2
and for all k 1, there exists a finite rk independent of N such that
k
sup E NXN (i, j) rk .
1i jN
Exercise 2.1.16 Check that the conclusion of Theorem 2.1.1 remains true when
convergence in probability is replaced by almost sure convergence.
Hint: Using Chebyshevs inequality and the BorelCantelli Lemma, it is enough
to verify that for all positive integers k, there exists a constant C = C(k) such that
C
|E LN , xk 2 LN , xk 2 | 2 .
N
Exercise 2.1.17 In the setup of Theorem 2.1.1, assume that rk < for all k but
2 ] = 1. Show that, for any positive integer k,
not necessarily that E[Z1,2
Exercise 2.1.18 We develop in this exercise the limit theory for Wishart matrices.
Let M = M(N) be a sequence of positive integers such that
EYN (i1 , j1 )YN (i2 , j1 )YN (i2 , j2 )YN (i3 , j2 ) YN (ik , jk )YN (i1 , jk )
i1 ,...,ik
j1 ,..., jk
and show that the only contributions to the sum (divided by N) that survive the
passage to the limit are those in which each term appears exactly twice.
Hint: use the words i1 j1 i2 j2 . . . jk i1 and a bi-partite graph to replace the Wigner
analysis.
(b) Code the contributions as Dyck paths, where the even heights correspond to
i indices and the odd heights correspond to j indices. Let = (i, j) denote the
number of times the excursion makes a descent from an odd height to an even
height (this is the number of distinct j indices in the tuple!), and show that the
combinatorial weight of such a path is asymptotic to N k+1 .
(c) Let denote the number of times the excursion makes a descent from an even
height to an odd height, and set
k = , k = .
Dyck paths of length 2k Dyck paths of length 2k
(The k are the kth moments of any weak limit of LN .) Prove that
k k
k = k j j1 , k = k j j1 , k 1.
j=1 j=1
and thus the limit F of LN possesses the Stieltjes transform (see Definition 2.4.1)
z1 (1/z), where
( 1)2 z
1 ( 1)z 1 4z +1
2 4
(z) = .
2z
2.1 T RACES , MOMENTS AND COMBINATORICS 21
This section is devoted to the following simple observation that often allows one
to considerably simplify arguments concerning the convergence of empirical mea-
sures.
Proof Note that trA2 = i (iA )2 and trB2 = i (iB )2 . Let U denote the matrix
diagonalizing B written in the basis determined by A, and let DA , DB denote the
diagonal matrices with diagonal elements iA , iB respectively. Then,
trAB = trDAUDBU T = iA jB u2i j .
i, j
The last sum is linear in the coefficients vi j = u2i j , and the orthogonality of U
implies that j vi j = 1, i vi j = 1. Thus
trAB sup iA jB vi j .
vi j 0: j vi j =1,i vi j =1 i, j
(2.1.28)
But this is a maximization of a linear functional over the convex set of doubly
stochastic matrices, and the maximum is obtained at the extreme points, which
are well known to correspond to permutations The maximum among permuta-
tions is then easily checked to be i iA iB . Collecting these facts together implies
Lemma 2.1.19. Alternatively, one sees directly that a maximizing V = {vi j } in
(2.1.28) is the identity matrix. Indeed, assume w.l.o.g. that v11 < 1. We then
construct a matrix V = {vi j } with v11 = 1 and vii = vii for i > 1 such that V is also
22 2. W IGNER MATRICES
a maximizing matrix. Indeed, because v11 < 1, there exist a j and a k with v1 j > 0
and vk1 > 0. Set v = min(v1 j , vk1 ) > 0 and define v11 = v11 + v, vk j = vk j + v and
v1 j = v1 j v, vk1 = vk1 v, and vab = vab for all other pairs ab. Then,
iA jB (vi j vi j ) = v(1A 1B + kA jB kA 1B 1A jB )
i, j
= v(1A kA )(1B jB ) 0 .
Thus, V = {vi j } satisfies the constraints, is also a maximum, and the number of
zero elements in the first row and column of V is larger by 1 at least from the
corresponding one for V . If v11 = 1, the claims follows, while if v11 < 1, one
repeats this (at most 2N 2 times) to conclude. Proceeding in this manner with
all diagonal elements of V , one sees that indeed the maximum of the right side of
(2.1.28) is i iA iB , as claimed.
Remark 2.1.20 The statement and proof of Lemma 2.1.19 carry over to the case
where A and B are both Hermitian matrices.
Lemma 2.1.19 allows one to perform all sorts of truncations when proving con-
vergence of empirical measures. For example, let us prove the following variant
of Wigners Theorem 2.1.1.
Proof Fix a constant C and consider the symmetric matrix XN whose elements
satisfy, for 1 i j N,
XN (i, j) = XN (i, j)1N|XN (i, j)|C E(XN (i, j)1N|XN (i, j)|C ).
N, i, j, when C converges to infinity. Hence, one may chose for each a large
enough C such that P(|WN | > ) < . Further, let
| f (x) f (y)
Lip(R) = { f Cb (R) : sup | f (x)| 1, sup 1} .
x x=y |x y|
Wigners theorem asserts the weak convergence of the empirical measure of eigen-
values to the compactly supported semicircle law. One immediately is led to sus-
pect that the maximal eigenvalue of XN should converge to the value 2, the largest
element of the support of the semicircle distribution. This fact, however, does not
follow from Wigners Theorem. Nonetheless, the combinatorial techniques we
have already seen allow one to prove the following, where we use the notation
introduced in (2.1.1) and (2.1.2).
Remark The assumption of Theorem 2.1.22 holds if the random variables |Z1,2 |
and |Y1 | possess a finite exponential moment.
Proof of Theorem 2.1.22 Fix > 0 and let g : R R+ be a continuous function
supported on [2 , 2], with , g = 1. Then, applying Wigners Theorem 2.1.1,
1
P(NN < 2 ) P(LN , g = 0) P(|LN , g , g| > ) N 0 . (2.1.29)
2
We thus need to provide a complementary estimate on the probability that NN is
large. We do that by estimating LN , x2k for k growing with N, using the bounds
24 2. W IGNER MATRICES
Lemma 2.1.23 For all integers k > 2t 2 one has the estimate
To evaluate the last expectation, fix w W2k,t , and let l denote the number of edges
in Ewc with New = 2. Holders inequality then gives
Nw Nw
c E(Z1,2e ) s E(Y1 e ) r2k2l ,
eEw eEw
Proof of Lemma 2.1.23 The idea of the proof is to keep track of the number of
possibilities to prevent words in Wk,t from having weight k/2 + 1. Toward this
end, let w Wk,t be given. A parsing of the word w is a sentence aw = (w1 , . . . , wn )
such that the word obtained by concatenating the words wi is w. One can imagine
creating a parsing of w by introducing commas between parts of w.
We say that a parsing a = aw of w is an FK parsing (after Furedi and Komlos),
and call the sentence a an FK sentence, if the graph associated with a is a tree, if
Nea 2 for all e Ea , and if for any i = 1, . . . , n 1, the first letter of wi+1 belongs
to ij=1 supp w j . If the one-word sentence a = w is an FK parsing, we say that w
is an FK word. Note that the constituent words in an FK parsing are FK words.
As will become clear next, the graph of an FK word consists of trees whose
edges have been visited twice by w, glued together by edges that have been visited
only once. Recalling that a Wigner word is either a one-letter word or a closed
word of odd length and maximal weight (subject to the constraint that edges are
visited at least twice), this leads to the following lemma.
in the sense of formal power series. By the proof of Lemma 2.1.6, |W2l,l+1 | =
Cl = l . Hence, by Lemma 2.1.3, for |z| < 1/4,
1 1 4z2
z+ z 2l+1
|W2l,l+1 | = z (z ) =
2
.
l=1 2z
26 2. W IGNER MATRICES
Substituting in (2.1.33), one sees that (again, in the sense of power series)
z (z2 ) 1 1 4z2 1 z + 12
nN z n
= =
1 z (z2 ) 2z 1 + 1 4z2
= +
2 1 4z2
.
n=1
6 6
5 5
2 3 2 3
1 1
7
4 7 4
Since a word w can be recovered from its FK parsing by omitting the extra
commas, and since the number of equivalence classes of FK words is estimated
by Lemma 2.1.24, one could hope to complete the proof of Lemma 2.1.23 by
controlling the number of possible parsed FK sequences. A key step toward this
end is the following lemma, which explains how FK words are fitted together to
form FK sentences. Recall that any FK word w can be written in a unique way as
a concatenation of disjoint Wigner words wi , i = 1, . . . , r. With si denoting the first
(and last) letter of wi , define the skeleton of w as the word s1 sr . Finally, for a
2.1 T RACES , MOMENTS AND COMBINATORICS 27
sentence a with graph Ga , let G1a = (Va1 , Ea1 ) be the graph with vertex set Va = Va1
and edge set Ea1 = {e Ea : Nea = 1}. Clearly, when a is an FK sentence, G1a is
always a forest, that is a disjoint union of trees.
Completion of proof of Lemma 2.1.23 Let (t, , m) denote the set of equiva-
lence classes of FK sentences a = (w1 , . . . , wm ) consisting of m words, with total
length mi=1 (wi ) = and wt(a) = t. An immediate corollary of Lemma 2.1.25 is
that
1
|(t, , m)| 2mt 2(m1) . (2.1.34)
m1
1
Indeed, there are c,m := m-tuples of positive integers summing to ,
m1
and thus at most 2m c,m equivalence classes of sentences consisting of m pair-
wise disjoint FK words with sum of lengths equal to . Lemma 2.1.25 then shows
that there are at most t 2(m1) ways to glue these words into an FK sentence,
whence (2.1.34) follows.
For any FK sentence a consisting of m words with total length , we have that
m = |Ea1 | 2wt(a) + 2 + . (2.1.35)
Indeed, the word obtained by concatenating the words of a generates a list of 1
(not necessarily distinct) unordered pairs of adjoining letters, out of which m 1
correspond to commas in the FK sentence a and 2|Ea | |Ea1 | correspond to edges
of Ga . Using that |Ea | = |Va | 1, (2.1.35) follows.
Consider a word w Wk,t that is parsed into an FK sentence w consisting of
m words. Note that if an edge e is retained in Gw , then no comma is inserted
at e at the first and second passage on e (but is introduced if there are further
passages on e). Therefore, Ew1 = 0.
/ By (2.1.35), this implies that for such words,
28 2. W IGNER MATRICES
Since |Vb | + |Vc | |Vb Vc | = |Va |, it follows that |Vd | = |Vb Vc |. Since Vd
Vb Vc , one concludes that Vd = Vb Vc , as claimed.
Remark 2.1.26 The result described in Theorem 2.1.22 is not optimal, in the sense
that even with uniform bounds on the (rescaled) entries, i.e. rk uniformly bounded,
the estimate one gets on the displacement of the maximal eigenvalue to the right
of 2 is O(n1/6 log n), whereas the true displacement is known to be of order n2/3
(see Section 2.7 for more details, and, in the context of complex Gaussian Wigner
matrices, see Theorems 3.1.4 and 3.1.5).
Exercise 2.1.27 Prove that the conclusion of Theorem 2.1.22 holds with conver-
gence in probability replaced by either almost sure convergence or L p conver-
gence.
Exercise 2.1.28 Prove that the statement of Theorem 2.1.22 can be strengthened
to yield that for some constant = (C) > 0, N (NN 2) converges to 0, almost
surely.
Exercise 2.1.29 Assume that for some constants > 0, C, the independent (but
not necessarily identically distributed) entries {XN (i, j)}1i jN of the symmetric
matrices XN satisfy
sup E(e N|XN (i, j)|
) C.
i, j,N
Prove that there exists a constant c1 = c1 (C) such that lim supN NN c1 , almost
surely, and lim supN E NN c1 .
2.1 T RACES , MOMENTS AND COMBINATORICS 29
Exercise 2.1.30 We develop in this exercise an alternative proof, that avoids mo-
ment computations, to the conclusion of Exercise 2.1.29, under the stronger as-
sumption that for some > 0,
sup E(e ( N|XN (i, j)|)2
) C.
i, j,N
(a) Prove (using Chebyshevs inequality and the assumption) that there exists a
constant c0 independent of N such that for any fixed z RN , and all C large
enough,
P(zT XN 2 > C) ec0C
2N
. (2.1.36)
N
(b) Let N = {zi }i=1
be a minimal deterministic net in the unit ball of RN , that
is zi 2 = 1, supz:z2 =1 infi z zi 2 , and N is the minimal integer with the
property that such a net can be found. Check that
(c) Combine steps (a) and (b) and the estimate N cN , valid for some c > 0, to
conclude that there exists a constant c2 independent of N such that for all C large
enough, independently of N,
Our goal here is to derive a simple version of a central limit theorem (CLT)
for linear statistics of the eigenvalues of Wigner matrices. With XN a Wigner
matrix and LN the associated empirical measure of its eigenvalues, set WN,k :=
N[LN , xk LN , xk ]. Let
x
1
eu
2 /2
(x) = du
2
denote the Gaussian distribution. We set k2 as in (2.1.44) below, and prove the
following.
Theorem 2.1.31 The law of the sequence of random variables WN,k /k converges
weakly to the standard Gaussian distribution. More precisely,
WN,k
lim P x = (x) . (2.1.38)
N k
30 2. W IGNER MATRICES
(2)
Note that if a = (w1 , w2 ) Wk,k then Ga is connected and possesses k vertices and
at most k edges, each visited at least twice by the paths generated by a. Hence,
(2)
with k vertices, Ga possesses either k 1 or k edges. Let Wk,k,+ denote the subset
(2)
of Wk,k such that |Ea | = k (that is, Ga is unicyclic, i.e. possesses one edge too
(2) (2)
many to be a tree) and let Wk,k, denote the subset of Wk,k such that |Ea | = k 1.
(2)
Suppose first a Wk,k, . Then, Ga is a tree, Eas = 0,
/ and necessarily Gwi is a
subtree of Ga . This implies that k is even and that |Ewi | k/2. In this case, for
Ew1 Ew2 = 0/ one must have |Ewi | = k/2, which implies that all edges of Gwi are
visited twice by the walk generated by wi , and exactly one edge is visited twice
by both w1 and w2 . In particular, wi are both closed Wigner words of length k + 1.
The emerging picture is of two trees with k/2 edges each glued together at one
edge. Since there are Ck/2 ways to chose each of the trees, k/2 ways of choosing
(in each tree) the edge to be glued together, and 2 possible orientations for the
gluing, we deduce that
2
(2) k
|Wk,k, | = 2 2
Ck/2 . (2.1.40)
2
(2)
Further, for each a Wk,k, ,
Na Na
c E(Z1,2e ) s E(Y1 e )
eEa eEa
w1 w1 w2 w2
c Ne
E(Z1,2 ) s E(Y1Ne ) c Ne
E(Z1,2 ) s E(Y1Ne )
eEw eEw eEw eEw
1 1 2 2
= 4
E(Z1,2 2
)[E(Z1,2 )]k2 [E(Z1,2
2
)]k 4
= E(Z1,2 )1. (2.1.41)
2.1 T RACES , MOMENTS AND COMBINATORICS 31
(2)
We next turn to consider Wk,k,+ . In order to do so, we need to understand the
structure of unicyclic graphs.
We call Z the bracelet of G. We call r the circuit length of G, and each of the
components of F we call a pendant tree. (The case r = 2 is excluded from Lemma
2.1.33 because a bracelet of circuit length 2 is a tree and thus never unicyclic.)
See Figure 2.1.4.
4 1
3 2
8 5
7 6
Fig. 2.1.4. The bracelet 1234 of circuit length 4, and the pendant trees, associated with the
unicyclic graph corresponding to [12565752341, 2383412]
32 2. W IGNER MATRICES
(2)
Coming back to a Wk,k,+ , let Za be the associated bracelet (with circuit length
r = 1 or r 3). Note that for any e Ea one has Nea = 2. We claim next that
e Za if and only if New1 = New2 = 1. On the one hand, if e Za then (Va , Ea \ e)
is a tree. If one of the paths determined by w1 and w2 fail to visit e then all edges
visited by this path determine a walk on a tree and therefore the path visits each
edge exactly twice. This then implies that the set of edges visited by the walks
are disjoint, a contradiction. On the other hand, if e = (x, y) and Newi = 1, then all
vertices in Vwi are connected to x and to y by a path using only edges from Ewi \ e.
Hence, (Va , Ea \ e) is connected, and thus e Za .
(2)
Thus, any a = (w1 , w2 ) Wk,k,+ with bracelet length r can be constructed from
the following data: the pendant trees {T ji }rj=1 (possibly empty) associated with
each word wi and each vertex j of the bracelet Za , the starting point for each word
wi on the graph consisting of the bracelet Za and trees {T ji }, and whether Za is
traversed by the words wi in the same or in opposing directions (in the case r 3).
In view of the above, counting the number of ways to attach trees to a bracelet of
length r, and then the distinct number of non-equivalent ways to choose starting
points for the paths on the resulting graph, there are exactly
2
21r3 k2
r
r Cki
(2.1.42)
ki 0: i=1
2 ri=1 ki =kr
(2) (2)
elements of Wk,k,+ with bracelet of length r. Further, for a Wk,k,+ we have
Na Na
E(Z1,2e ) E(Y1 e )
eEac eEas
w1 w1 w2 w2
Ne
E(Z1,2 ) E(Y1Ne ) Ne
E(Z1,2 ) E(Y1Ne )
eEwc eEws eEwc eEws
1 1 2 2
2 ))k 0
(E(Z1,2 if r 3,
=
(E(Z1,2 )) EY1 0 if r = 1
2 k1 2
1 if r 3 ,
= (2.1.43)
EY12 if r = 1 .
Combining (2.1.39), (2.1.40), (2.1.41), (2.1.42) and (2.1.43), and setting Cx = 0 if
x is not an integer, one obtains, with
2
k2 2
2k2
r
k2 = k2C2k1 EY12 +
2 2 2
4
C k [EZ1,2 1] +
r Cki
, (2.1.44)
r=3 ki 0: i=1
2 ri=1 ki =kr
2.1 T RACES , MOMENTS AND COMBINATORICS 33
that
k2 = lim EWN,k
2
. (2.1.45)
N
To see (2.1.46), recall, for a multi-index i = (i1 , . . . , ik ), the terms TiN of (2.1.15),
and the associated closed word wi . Then, as in (2.1.21), one has
N
j
E(WN,k )= n TiN1 ,i2 ,...,i j , (2.1.47)
in1 ,...,ik =1
n=1,2,... j
where
j
TiN1 ,i2 ,...,i j =E (TiNn ETiNn ) . (2.1.48)
n=1
Note that TiN1 ,i2 ,...,i j = 0 if the graph generated by any word wn := win does not
have an edge in common with any graph generated by the other words wn , n = n.
Motivated by that and our variance computation, let
( j)
Wk,t denote a set of representatives for equivalence classes of
sentences a of weight t consisting of j closed words (w1 , w2 , . . . , w j ),
each of length k + 1, with Nea 2 for each e Ea , and such that for
each n there is an n = n (n) = n such that Ewn Ewn = 0.
/
(2.1.49)
As in (2.1.25), one obtains
jk jk
CN,t
j
E(WN,k ) = CN,t TwN1 ,w2 ,...,w j := jk/2 Ta . (2.1.50)
t=1 ( j) t=1 N ( j)
a=(w1 ,w2 ,...,w j )Wk,t aWk,t
The next lemma, whose proof is deferred to the end of the section, is concerned
( j)
with the study of Wk,t .
34 2. W IGNER MATRICES
( j)
By Lemma 2.1.34, if a Wk,k j/2 for j even then Ga possesses exactly j/2 con-
nected components. This is possible only if there exists a permutation
: {1, . . . , j} {1, . . . , j} ,
all of whose cycles have length 2 (that is, a matching), such that the connected
components of Ga are the graphs {G(wi ,w (i) ) }. Letting mj denote the collection of
all possible matchings, one thus obtains that for j even,
j/2
Ta = Twi ,w (i)
( j) mj i=1 (w ,w (2)
aWk,k j/2 i (i) )Wk,k
Proof of Lemma 2.1.34 That c j/2 is immediate from the fact that the sub-
graph corresponding to any word in a must have at least one edge in common with
at least one subgraph corresponding to another word in a.
Next, put
j
!
j
a = [[i,n ]kn=1 ]i=1 ,I= {i} {1, . . . , k} , A = [{i,n , i,n+1 }](i,n)I .
i=1
Now let X = {Xin }(i,n)I be a table of the same shape as A, but with all entries
equal either to 0 or 1. We call X an edge-bounding table under the following
conditions.
For each e E there exist distinct (i1 , n1 ), (i2 , n2 ) I such that Xi1 ,n1 = Xi2 ,n2 =
1 and Ai1 ,n1 = Ai2 ,n2 = e.
For each e E and index i {1, . . . , j}, if e appears in the ith row of A then
there exists (i, n) I such that Ai,n = e and Xi,n = 1.
For any edge-bounding table X the corresponding quantity 12 (i,n)I Xi,n bounds
|E |. At least one edge-bounding table exists, namely the table with a 1 in position
(i, n) for each (i, n) I such that Ai,n E and 0 elsewhere. Now let X be an edge-
bounding table such that for some index i0 all the entries of X in the i0 th row are
equal to 1. Then the closed word wi0 is a walk in G , and hence every entry in the
i0 th row of A appears there an even number of times and a fortiori at least twice.
Now choose (i0 , n0 ) I such that Ai0 ,n0 E appears in more than one row of A.
Let Y be the table obtained by replacing the entry 1 of X in position (i0 , n0 ) by
the entry 0. Then Y is again an edge-bounding table. Proceeding in this way we
can find an edge-bounding table with 0 appearing at least once in every row, and
hence we have |E | |I| j
2 . Together with (2.1.53) and the definition of I, this
completes the proof.
Exercise 2.1.35 (from [AnZ05]) Prove that the random vector {WN,i }ki=1 satisfies
a multidimensional CLT (as N ). (See Exercise 2.3.7 for an extension of this
result.)
In this section we describe the (minor) modifications needed when one considers
the analog of Wigners theorem for Hermitian matrices. Compared with (2.1.2),
we will have complex-valued random variables Zi, j . That is, start with two in-
dependent families of i.i.d. random variables {Zi, j }1i< j (complex-valued) and
{Yi }1i (real-valued), zero mean, such that EZ1,22 = 0, E|Z |2 = 1 and, for all
1,2
integers k 1,
rk := max E|Z1,2 |k , E|Y1 |k < . (2.2.1)
importance, and
for reasons that will become clearer in Chapter 3, such matrices
(rescaled by N) are referred to as Gaussian unitary ensemble (GUE) matrices.
As before, let iN denote the (real) eigenvalues of XN , with 1N 2N
NN , and recall that the empirical distribution of the eigenvalues is the probability
measure on R defined by
1 N
LN = iN .
N i=1
Theorem 2.2.1 (Wigner) For a Hermitian Wigner matrix, the empirical measure
LN converges weakly, in probability, to the semicircle distribution.
lim mNk = mk .
N
Proof of Lemma 2.2.2 We recall the machinery introduced in Section 2.1.3. Thus,
an N-word w = (s1 , . . . , sk ) defines a graph Gw = (Vw , Ew ) and a path on the graph.
For our purpose, it is convenient to keep track of the direction in which edges are
traversed by the path. Thus, given an edge e = {s, s }, with s < s , we define
New,+ as the number of times the edge is traversed from s to s , and we set New, =
New New,+ as the number of times it is traversed in the reverse direction.
Recalling the equality (2.1.10), we now have instead of (2.1.15) the equation
1 wi ,+ wi , wi
TiN = c
N k/2 eEw
Ne
E(Z1,2 Ne
(Z1,2 ) ) s E(Y1Ne ) . (2.2.3)
eEw
i i
w
In particular, TiN = 0 unless Ne i 2 for all e Ewi . 2 = 0,
Furthermore, since EZ1,2
w w ,+
one has TiN = 0 if Ne i = 2 and Ne i = 1 for some e Ewi .
A slight complication occurs since the function
w,+ w,
Ne Ne
gw (New,+ , New, ) := E(Z1,2 (Z1,2 ) )
2.2 C OMPLEX W IGNER MATRICES 37
is not constant over equivalence classes of words (since changing the letters de-
termining w may switch the role of New,+ and New, in the above expression). Note
however that, for any w Wk,t , one has
w
|gw (New,+ , New, )| E(|Z1,2 |Ne ) .
On the other hand, any w Wk,k/2+1 satisfies that Gw is a tree, with each edge
visited exactly twice by the path determined by w. Since the latter path starts and
ends at the same vertex, one has New,+ = New, = 1 for each e Ew . Thus, repeating
the argument in Section 2.1.3, the finiteness of rk implies that
Proof of Lemma 2.2.3 The proof is a rerun of the proof of Lemma 2.1.7, using
the functions gw (New,+ , New, ), defined in the course of proving Lemma 2.2.2. The
(2)
proof boils down to showing that Wk,k+2 is empty, a fact that was established in
the course of proving Lemma 2.1.7.
In this short section we digress slightly and prove that certain functionals of ran-
dom matrices have the concentration property, namely, with high probability these
functionals are close to their mean value. A more complete treatment of concen-
tration inequalities and their application to random matrices is postponed to Sec-
tion 4.4. The results of this section will be useful in Section 2.4, where they will
play an important role in the proof of Wigners Theorem via the Stieltjes trans-
form.
and call G a Lipschitz function if |G|L < . The following lemma is an immediate
application of Lemma 2.1.19. In its statement, we identify C with R2 .
Lemma 2.3.2 Let V : RM R satisfy that for some positive constant C, V (x)
x22 /2C is convex. Then, the probability measure (dx) = Z 1 eV (x) dx, where
Z = eV (x) dx, satisfies the logarithmic Sobolev inequality with constant C. In
particular, the standard Gaussian law on RM satisfies the logarithmic Sobolev
inequality with constant 1.
Lemma 2.3.3 (Herbst) Assume that P satisfies the LSI on RM with constant c.
Let G be a Lipschitz function on RM , with Lipschitz constant |G|L . Then for all
R,
EP [e (GEP (G)) ] ec
2 |G|2 /2
L , (2.3.2)
and so for all > 0
P (|G EP (G)| ) 2e
2 /2c|G|2
L . (2.3.3)
Define
A = log EP e2 (GEP G) .
Then, taking F = e (GEP G) in (2.3.1), some algebra reveals that for > 0,
d A
2c|| ||G||22 || .
d
Now, because G EP (G) is centered,
A
lim =0
0+
and hence integrating with respect to yields
A 2c|| ||G||22 || 2 ,
first for 0 and then for any R by considering the function G instead of G.
This completes the proof of (2.3.2) in the case that G is bounded and differentiable.
Let us now assume only that G is Lipschitz with |G|L < . For > 0, define
G = G (1/ ) (1/ ), and note that |G |L |G|L < . Consider the reg-
ularization G (x) = p G (x) = G (y)p (x y)dy with the Gaussian density
2.3 C ONCENTRATION AND LOGARITHMIC S OBOLEV INEQUALITIES 41
p (x) = e|x| /2 dx/ (2 )M such that p (x)dx converges weakly towards the
2
Thus, we can apply (2.3.2) in the bounded differentiable case to find that for any
> 0 and all R,
EP [e G ] e EP G ec
2 |G|2 /2
L . (2.3.5)
Therefore, by Fatous Lemma,
EP [e G ] elim inf 0 EP G ec
2 |G|2 /2
L . (2.3.6)
We next show that lim 0 EP G = EP G, which, in conjunction with (2.3.6), will
conclude the proof. Indeed, (2.3.5) implies that
P (|G EP G | > ) 2e
2 /2c|G|2
L . (2.3.7)
Consequently,
E[(G EP G )2 ] = 2 xP (|G EP G | > x) dx
0
x2
2c|G|2
4 xe L dx = 4c|G|2L , (2.3.8)
0
so that the sequence (G EP G ) 0 is uniformly integrable. Now, G converges
pointwise towards G and therefore there exists a constant K, independent of ,
such that for < 0 , P(|G | K) 34 . On the other hand, (2.3.7) implies that
P(|G EP G | r) 34 for some r independent of . Thus,
{|G EP G | r} {|G | K} {|EP G | K + r}
is not empty, providing a uniform bound on (EP G ) <0 . We thus deduce from
(2.3.8) that sup <0 EP G2 is finite, and hence (G ) <0 is uniformly integrable. In
particular,
lim EP G = EP G < ,
0
42 2. W IGNER MATRICES
Remark 2.3.6 The assumption of Theorem 2.3.5 is satisfied for Gaussian matrices
whose entries on or above the diagonal are independent, with variance bounded
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 43
Exercise 2.3.7 (From [AnZ05]) Using Exercise 2.1.35, prove that if XN is a Gaus-
sian Wigner matrix and f : R R is a Cb1 function, then N[ f , LN f , LN ]
satisfies a central limit theorem.
Definition 2.4.1 Let be a positive, finite measure on the real line. The Stieltjes
transform of is the function
(dx)
S (z) := , z C\R.
R xz
Note that for z C \ R, both the real and imaginary parts of 1/(x z) are continu-
ous bounded functions of x R and, further, |S (z)| (R)/|z|. These crucial
observations are used repeatedly in what follows.
Remark 2.4.2 The generating function (z), see (2.1.6), is closely related to the
Stieltjes transform of the semicircle distribution : for |z| < 1/4,
(z) = zk x2k (x)dx = zx2 (x)dx
k
k=0 k=0
1
= (x)dx
1 zx2
1 1
= (x)dx = S (1/ z) ,
1 zx z
where the third equality uses the fact that the support of is the interval [2, 2],
and the fourth uses the symmetry of .
Theorem 2.4.3 For any open interval I with neither endpoint on an atom of ,
1 S ( + i ) S ( i )
(I) = lim d
0 I 2i
1
= lim S ( + i )d . (2.4.1)
0 I
Theorem 2.4.3 allows for the reconstruction of a measure from its Stieltjes
transform. Further, one has the following.
(We recall that n converges vaguely to if, for any continuous function f on R
that decays to 0 at infinity, f d n f d . Recall also that a positive measure
on R is a sub-probability measure if it satisfies (R) 1.)
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 45
Proof Part (a) is a restatement of the notion of weak convergence. To see part
(b), let nk be a subsequence on which nk converges vaguely (to a sub-probability
measure ). (Such a subsequence always exists by Hellys selection theorem.)
Because x 1/(z x), for z C \ R, is continuous and decays to zero at infinity,
one obtains the convergence Snk (z) S (z) pointwise for such z. From the hy-
pothesis, it follows that S(z) = S (z). Applying Theorem 2.4.3, we conclude that
all vaguely convergent subsequences converge to the same , and hence n
vaguely.
To see part (c), fix a sequence zi z0 in C \ R with zi = z0 , and define, for
1 , 2 M1 (R), (1 , 2 ) = i 2i |S1 (zi ) S2 (zi )|. Note that (n , ) 0 im-
plies that n converges weakly to . Indeed, moving to a subsequence if neces-
sary, n converges vaguely to some sub-probability measure , and thus Sn (zi )
S (zi ) for each i. On the other hand, the uniform (in i, n) boundedness of Sn (zi )
and (n , ) 0 imply that Sn (zi ) S (zi ). Thus, S (z) = S (z) for all z = zi
and hence, for all z C \ R since the set {zi } possesses an accumulation point and
S , S are analytic. By the inversion formula (2.4.1), it follows that = and in
particular is a probability measure and n converges weakly to = . From
the assumption of part (c) we have that (n , ) 0, in probability, and thus n
converges weakly to in probability, as claimed.
We consider in this section the case when XN is a Gaussian Wigner matrix, pro-
viding
Proof #2 of Theorem 2.1.1 (XN a Gaussian Wigner matrix).
Recall first the following identity, characterizing the Gaussian distribution, which
is proved by integration by parts.
Lemma 2.4.5 If is a zero mean Gaussian random variable, then for f differen-
tiable, with polynomial growth of f and f ,
E( f ( )) = E( f ( ))E( 2 ) .
46 2. W IGNER MATRICES
This, and the boundedness of 1/(z x)2 for a fixed z as above, imply the existence
of a sequence N (z) N 0 such that, letting SN (z) := N 1 EtrSXN (z), one has
1 1
SN (z) = SN (z)2 + N (z) .
z z
Thus any limit point s(z) of SN (z) satisfies
Further, let C+ = {z C : z > 0}. Then, for z C+ , by its definition, s(z) must
have a nonnegative imaginary part, while for z C \ (R C+ ), s(z) must have a
nonpositive imaginary part. Hence, for all z C, with the choice of the branch of
the square-root dictated by the last remark,
1
s(z) = z z2 4 . (2.4.7)
2
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 47
Comparing with (2.1.6) and using Remark 2.4.2, one deduces that s(z) is the Stielt-
jes transform of the semicircle law , since s(z) coincides with the latter for |z| > 2
and hence for all z C \ R by analyticity. Applying again Theorem 2.3.5 and Re-
mark 2.3.6, it follows that SLN (z) converges in probability to s(z), solution of
(2.4.7), for all z C \ R. The proof is completed by using part (c) of Theorem
2.4.4.
We consider in this section the case when XN is a Wigner matrix. We give now:
(1)
Lemma 2.4.6 Let W HN be a symmetric matrix, and let wi denote the ith col-
umn of W with the entry W (i, i) removed (i.e., wi is an N 1-dimensional vector).
(1)
Let W (i) HN1 denote the matrix obtained by erasing the ith column and row
from W . Then, for every z C \ R,
1
(W zI)1 (i, i) = . (2.4.8)
W (i, i) z wTi (W (i) zIN1 )1 wi
and use the matrix identity (A.1) with A = W (N) zIN1 , B = wN , C = wTN and
D = W (N, N) z to conclude that
det(W zIN ) =
det(W (N) zIN1 ) det W (N, N) z wTN (W (N) zIN1 )1 wN .
The last formula holds in the same manner with W (i) , wi and W (i, i) replacing
W (N) , wN and W (N, N) respectively. Substituting in (2.4.9) completes the proof of
Lemma 2.4.6.
We are now ready to return to the proof of Theorem 2.1.1. Repeating the trunca-
tion argument used in the proof of Theorem 2.1.21, we may and will assume in
48 2. W IGNER MATRICES
the sequel thatXN (i, i) = 0 for all i and that for some constant C independent of N,
it holds that | NXN (i, j)| C for all i, j. Define k (i) = XN (i, k), i.e. k is the kth
column of the matrix XN . Let k denote the N 1 dimensional vector obtained
(k) (1)
from k by erasing the entry k (k) = 0. Denote by XN HN the matrix con-
sisting of XN with the kth row and column removed. By Lemma 2.4.6, one gets
that
1 1 N 1
N
trSXN (z) = (i)
N i=1 z T (X zIN1 )1 i
i N
1
= N (z) , (2.4.10)
z + N 1 trSXN (z)
where
1 N i,N
N (z) =
N i=1 (z N trSXN (z) + i,N )(z N 1 trSXN (z))
1
, (2.4.11)
and
i,N = N 1 trSXN (z) iT (XN zIN1 )1 i .
(i)
(2.4.12)
Our next goal is to prove the convergence in probability of N (z) to zero for
each fixed z C \ R with |z| = 0 > 0. Toward this end, note that the term
z N 1 trSXN (z)) in the right side of (2.4.11) has modulus at least 0 , since
|z| = 0 and all eigenvalues of XN are real. Thus, if we prove the convergence
of supiN |i,N | to zero in probability, it will follow that N (z) converges to 0 in
(i)
probability. Toward this end, let XN denote the matrix XN with the ith column
(i) (i)
and row set to zero. Then, the eigenvalues of XN and XN coincide except that
(i)
XN has one more zero eigenvalue. Hence,
1 1
|trS (i) (z) trS (i) (z)| ,
N XN XN 0 N
(i) (i) (i) (i)
whereas, with the eigenvalues of XN denoted 1 2 N , and those
of XN denoted 1N 2N NN , one has
1/2
1 1 N (i) 1 1 N (i)
N
|trS (i) (z) trSXN (z)|
XN | k | 2 N |k k |
02 N k=1 k
N N 2
0 k=1
1/2
1 2 N
02 N k=1
XN (i, k)2 ,
where Lemma 2.1.19 was used in the last inequality. Since | NXN (i, j)| C,
we get that supi N 1 |trS (i) (z) trSXN (z)| converges to zero (deterministically).
XN
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 49
Combining the above, it follows that to prove the convergence of supiN |i,N | to
zero in probability, it is enough to prove the convergence to 0 in probability of
supiN |i,N |, where
(i) 1 (i)
i,N = iT BN (z)i trBN (z)
N
1 N1
2 N1
BN (z)(k, k) + i (k)i (k )BN (z)(k, k )
(i) (i)
= N i (k) 1
N k=1 k,k =1,k=k
=: i,N (1) + i,N (2) , (2.4.13)
possesses zero mean independent entries of variance 1/N, one observes by condi-
(i)
tioning on the sigma-field Fi,N generated by XN that E i,N = 0. Further, since
1
N 1 tr BN (z)2 2 ,
(i)
0
and the random variables | N i (k)| are uniformly bounded, it follows that
c1
E|i,N (1)|4 .
N2
for some constant c1 that depends only on 0 and C. Similarly, one checks that
c2
E|i,N (2)|4 2 ,
N
for some constant c2 depending only on C, 0 . One obtains then, by Chebyshevs
inequality, the claimed convergence of supiN |i,N (z)| to 0 in probability.
The rest of the argument is similar to what has already been done in Section
2.4.1, and is omitted.
Remark 2.4.7 We note that reconstruction and continuity results that are stronger
than those contained in Theorems 2.4.3 and 2.4.4 are available. An accessible
introduction to these and their use in RMT can be found in [Bai99]. For example,
in Theorem 2.4.3, if possesses a Holder continuous density m then, for R,
(dx)
S ( + i0) := lim S ( + ) = i m( ) + P.V. (2.4.14)
0 R x
exists, where the notation P.V. stands for principal value. Also, in the context of
Theorem 2.4.4, if the and are probability measures supported on [B, B], a,
are constants satisfying
1 1 1
:= du > ,
|u|a u2 + 1 2
50 2. W IGNER MATRICES
4B
:= (0, 1) ,
(A B)(2 1)
In the context of random matrices, equation (2.4.15) is useful in obtaining the rate
of convergence of LN to its limit, but we will not discuss this issue here at all.
1
yTi (B + yi yTi )1 = yT B1 ,
1 + yTi B1 yi i
with the matrices Bi = WN zI yi yTi , to show that the normalized trace of the
right side of (2.4.16) converges to 0.
2.5.1 Definition and preliminary discussion of the GOE and the GUE
Let {i, j , i, j }
i, j=1 be an i.i.d. family of real mean 0 variance 1 Gaussian random
variables. We define
(1) (1)
P2 , P3 , . . .
( ) ( )
respectively. A random matrix X HN with law PN is said to belong to the
Gaussian orthogonal ensemble (GOE) or the Gaussian unitary ensemble (GUE)
according as = 1 or = 2, respectively. (We often write GOE(N) and GUE(N)
when an emphasis on the dimension is needed.) The theory of Wigner matrices
developed in previous sections of this book applies here. In particular, for fixed
( ) ( )
, given for each N a random matrix X(N) H N with law PN , the empirical
distribution of the eigenvalues of XN := X(N)/ N tends to the semicircle law of
mean 0 and variance 1.
( )
So whats special about the law PN within the class of laws of Wigner matri-
( )
ces? The law PN is highly symmetrical. To explain the symmetry, as well as
to explain the presence of the terms orthogonal and unitary in our terminol-
( ) ( )
ogy, let us calculate the density of PN with respect to Lebesgue measure N on
( ) ( )
HN . To fix N unambiguously (rather than just up to a positive constant fac-
tor) we use the following procedure. In the case = 1, consider the one-to-one
(1)
onto mapping HN RN(N+1)/2 defined by taking on-or-above-diagonal entries
(1)
as coordinates, and normalize N by requiring it to push forward to Lebesgue
measure on R N(N+1)/2 . Similarly, in the case = 2, consider the one-to-one
(2) 2
onto mapping HN RN CN(N1)/2 = RN defined by taking on-or-above-
(2)
diagonal entries as coordinates, and normalize N by requiring it to push forward
52 2. W IGNER MATRICES
2 ( )
to Lebesgue measure on RN . Let Hi, j denote the entry of H HN in row i and
column j. Note that
N
trH 2 = trHH = Hi,i
2
+2 |Hi, j |2 .
i=1 1i< jN
( )
The latter formula clarifies the symmetry of PN . The main thing to notice is that
the density at H depends only on the eigenvalues of H. It follows that if X is a
(1) (1)
random element of HN with law PN , then for any N N orthogonal matrix U,
again UXU has law PN ; and similarly, if X is a random element of HN with
(1) (2)
law PN , then for any N N unitary matrix U, again UXU has law PN . As
(2) (2)
( )
we already observed, for random X HN it makes sense to talk about the joint
distribution of the eigenvalues 1 (X) N (X).
(For an easy verification of the second equality in (2.5.2), note that the determinant
is a polynomial that must vanish when xi = x j for any pair i = j.)
The main result in this section is the following.
( )
Remark 2.5.3 We refer to the probability measure PN on RN with density
( ) N
dPN ( )
= CN |(x)| e xi /4 ,
2
(2.5.6)
dLebN i=1
where LebN is the Lebesgue measure on RN and CN is given in (2.5.4), as the law
of the unordered eigenvalues of the GOE(N) (when = 1) or GUE(N) (when =
2). The special case = 4 corresponds to the GSE(N) (see Section 4.1 for details
on the explicit construction of random matrices whose eigenvalues are distributed
(4)
according to PN ).
( )
The distributions PN for 1, = 1, 2, 4 also appear as the law of the
unordered eigenvalues of certain random matrices, although with a very different
structure, see Section 4.5.
A consequence of Theorem 2.5.2 is that a.s., the eigenvalues of the GOE and
GUE are all distinct. Let v1 , . . . , vN denote the eigenvectors corresponding to the
eigenvalues (1N , . . . , NN ) of a matrix X from GOE(N) or GUE(N), with their first
nonzero entry positive real. Recall that O(N) (the group of orthogonal matrices)
and U(N) (the group of unitary matrices) admit a unique Haar probability measure
(see Theorem F.13). The invariance of the law of X under arbitrary orthogonal
(unitary) transformations implies then the following.
(for the GUE). Further, (v1 , . . . , vN ) is distributed like a sample of Haar measure
on O(N) (for the GOE) or U(N) (for the GUE), with each column multiplied by a
N1 N1
norm one scalar so that the columns all belong to S+ (for the GOE) and SC,+
(for the GUE).
is distributed like X for any orthogonal (in the GOE case) or unitary (in the GUE
case) T independent of X, and since choosing T uniformly according to Haar
measure and independent of U makes TU Haar distributed and hence of law in-
dependent of that of U, the independence of the eigenvectors and the eigenvalues
follows. All other statements are immediate consequences of this and the fact that
each column of a Haar distributed orthogonal (resp., unitary) matrix is distributed,
after multiplication by a scalar that makes its first entry real and nonnegative, uni-
formly on S+ N1 N1
(resp. SC,+ ).
We present in this section a proof of Theorem 2.5.2 that has the advantage of
being direct, elementary, and not requiring much in terms of computations. On
the other hand, this proof is not enough to provide one with the evaluation of the
normalization constant CN in (2.5.4). The evaluation of the latter is postponed to
subsection 2.5.3, where the Selberg integral formula is derived. Another approach
to evaluating the normalization constants, in the case of the GUE, is provided in
Section 3.2.1.
( )
The idea behind the proof of Theorem 2.5.2 is as follows. Since X HN ,
there exists a decomposition X = UDU , with eigenvalue matrix D DN , where
DN denotes diagonal matrices with real entries, and with eigenvector matrix U
( ) ( )
UN , where UN denotes the collection of orthogonal matrices (when = 1)
or unitary matrices (when = 2). Suppose this map were a bijection (which it
is not, at least at the matrices X without distinct eigenvalues) and that one could
( )
parametrize UN using N(N 1)/2 parameters in a smooth way (which one
cannot). An easy computation shows that the Jacobian of the transformation
would then be a polynomial in the eigenvalues with coefficients that are func-
( )
tions of the parametrization of UN , of degree N(N 1)/2. Since the bijection
must break down when Dii = D j j for some i = j, the Jacobian must vanish on
that set; symmetry and degree considerations then show that the Jacobian must
( )
be proportional to the factor (x) . Integrating over the parametrization of UN
then yields (2.5.3).
In order to make the above construction work, we need to throw away subsets
( )
of HN that fortunately turn out to have zero Lebesgue measure. Toward this
( )
end, we say that U UN is normalized if every diagonal entry of U is strictly
( )
positive real. We say that U UN is good if it is normalized and every entry of
( ),g
U is nonzero. The collection of good matrices is denoted UN . We also say that
D DN is distinct if its entries are all distinct, denoting by DNd the collection of
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 55
distinct matrices, and by DNdo the subset of matrices with decreasing entries, that
is DNdo = {D DNd : Di,i > Di+1,i+1 }.
( ),dg
Let HN denote the subset of H ( ) consisting of those matrices that possess
( ),g
a decomposition X = UDU where D DNd and U UN . The first step is
contained in the following lemma.
( ) ( ),dg
Lemma 2.5.5 HN \ HN has null Lebesgue measure. Further, the map
( ),g ( ),dg
(DN , UN ) HN
do given by (D,U) UDU is one-to-one and onto, while
( ),g ( ),dg
(DNd , UN ) HN given by the same map is N!-to-one.
Proof of Lemma 2.5.5 In order to prove the first part of the lemma, we note
that for any nonvanishing polynomial function p of the entries of X, the set {X :
p(X) = 0} is closed and has zero Lebesgue measure (this fact can be checked by
applying Fubinis Theorem). So it is enough to exhibit a nonvanishing polynomial
( ) ( ),dg
p with p(X) = 0 if X HN \ HN . Toward this end, we will show that
for such X, either X has some multiple eigenvalue, or, for some k, X and the
matrix X (k) obtained by erasing the kth row and column of X possess a common
eigenvalue.
Given any n by n matrix H, for i, j = 1, . . . , n let H (i, j) be the n 1 by n 1
matrix obtained by deleting the ith column and jth row of H, and write H (k) for
H (k,k) . We begin by proving that if X = UDU with D DNd , and X and X (k) do
not have eigenvalues in common for any k = 1, 2, . . . , N, then all entries of U are
nonzero. Indeed, let be an eigenvalue of X, set A = X I, and define Aadj as the
adj
N by N matrix with Ai, j = (1)i+ j det(A(i, j) ). Using the identity AAadj = det(A)I,
one concludes that AAadj = 0. Since the eigenvalues of X are assumed distinct,
the null space of A has dimension 1, and hence all columns of Aadj are scalar
multiple of some vector v , which is then an eigenvector of X corresponding to the
adj
eigenvalue . Since v (i) = Ai,i = det(X (i) I) = 0 by assumption, it follows
that all entries of v are nonzero. But each column of U is a nonzero scalar
multiple of some v , leading to the conclusion that all entries of U do not vanish.
We recall, see Appendix A.4, that the resultant of the characteristic polynomials
of X and X (k) , which can be written as a polynomial in the entries of X and X (k) ,
and hence as a polynomial P1 in the entries of X, vanishes if and only if X and X (k)
have a common eigenvalue. Further, the discriminant of X, which is a polynomial
P2 in the entries of X, vanishes if and only if not all eigenvalues of X are distinct.
Taking p(X) = P1 (X)P2 (X), one obtains a nonzero polynomial p with p(X) = 0
( ) ( ),dg
if X HN \ HN . This completes the proof of the first part of Lemma 2.5.5.
The second part of the lemma is immediate since the eigenspace corresponding
56 2. W IGNER MATRICES
( ),g
Next, we say that U UN is very good if all minors of U have nonvanishing
( ),vg
determinant. Let UN denote the collection of very good matrices. The interest
in such matrices is that they possess a particularly nice parametrization.
( ),vg
Lemma 2.5.6 The map T : UN R N(N1)/2 defined by
U1,2 U1,N U2,3 U2,N UN1,N
T (U) = ,..., , ,..., ,..., (2.5.7)
U1,1 U1,1 U2,2 U2,2 UN1,N1
(where C is identified
with R2in the case = 2) is one-to-one with smooth inverse.
( ),vg c
Further, the set T (UN ) is closed and has zero Lebesgue measure.
Proof of Lemma 2.5.6 We begin with the first part. The proof is by an inductive
2
construction. Clearly, U1,1 = 1 + Nj=2 |U1, j |2 /|U1,1 |2 . So suppose that Ui, j are
given for 1 i i0 and 1 j N. Let vi = (Ui,1 , . . . ,Ui,i0 ), i = 1, . . . , i0 . One
then solves the equation
U
U1,i0 +1 + Ni=i0 +2 U1,i Ui +1,i
i0 +1,i
v1 U0 0 +1
v2 N i0 +1,i
2,i0 +1 i=i0 +2 2,i Ui +1,i
U + U
. Z = 0 0 +1 .
.. .
..
U
vi0 i0 +1,i
Ui0 ,i0 +1 + Ni=i0 +2 Ui0 ,i Ui +1,i +1
0 0
The very good condition on U ensures that the vector Z is uniquely determined by
this equation, and one then sets
N
i0
Ui0 +1,i 2
Ui2
0 +1,i0 +1
= 1 + k |Z |2
+
k=1 i=i0 +2 Ui0 +1,i0 +1
and
Ui0 +1, j = Z j Ui0 +1,i0 +1 , for 1 j i0 .
(All entries Ui0 +1, j with j > i0 + 1 are then determined by T (U).) This completes
the proof of the first part.
( )
To see the second part, let ZN be the space of matrices whose columns are
orthogonal, whose diagonal entries all equal to 1, and all of whose minors have
( )
nonvanishing determinants. Define the action of T on ZN using (2.5.7). Then,
( ),vg ( )
T (UN ) = T (ZN ). Applying the previous constructions, one immediately
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 57
obtains a polynomial type condition for a point in R N(N1)/2 to not belong to the
( )
set T (ZN ).
( ),vg ( ),dg
Let HN denote the subset of HN consisting of those matrices X that
( ),vg
can be written as X = UDU with D DN and U UN
d .
( ) ( ),vg
Lemma 2.5.7 The Lebesgue measure of HN \ HN is zero.
( ),vg
Proof of Lemma 2.5.7 We identify a subset of HN which we will prove to
be of full Lebesgue measure. We say that a matrix D DNd is strongly distinct if
for any integer r = 1, 2, . . . , N 1 and subsets I, J of {1, 2, . . . , N},
Proof of (2.5.3) Recall the map T introduced in Lemma 2.5.6, and define the
( ),vg ( ) ( ),vg
map T : T (UN ) RN HN by setting, for RN and z T (UN ),
D DN with Di,i = i and T (z, ) = T 1 (z)DT 1 (z) . By Lemma 2.5.6, T is
smooth, whereas by Lemma 2.5.5, it is N!-to-1 on a set of full Lebesgue measure
and is locally one-to-one on a set of full Lebesgue measure. Letting J T denote the
58 2. W IGNER MATRICES
Theorem 2.5.8 (Selbergs integral formula) For all positive numbers a, b and c
we have
1 1
1 n n1
(a + jc)(b + jc)(( j + 1)c)
|(x)|2c xia1 (1 xi )b1 dxi = .
n! 0 0 i=1 j=0 (a + b + (n + j 1)c)(c)
(2.5.9)
and
1 n n1
(( j + 1)c)
|(x)|2c exi /2 dxi = (2 )n/2
2
. (2.5.11)
n! i=1 j=0 (c)
Remark 2.5.10 The identities in Theorem 2.5.8 and Corollary 2.5.9 hold under
rather less stringent conditions on the parameters a, b and c. For example, one
can allow a, b and c to be complex with positive real parts. We refer to the biblio-
graphical notes for references. We note also that only (2.5.11) is directly relevant
to the study of the normalization constants for the GOE and GUE. The usefulness
of the other more complicated formulas will become apparent in Section 4.1.
We will prove Theorem 2.5.8 following Andersons method [And91], after first
explaining how to deduce Corollary 2.5.9 from (2.5.9) by means of the Stirling
approximation, which we recall is the statement
2 s s
(s) = (1 + os+ (1)), (2.5.12)
s e
where s tends to + along the positive real axis. (For a proof of (2.5.12) by an
application of Laplaces method, see Exercise 3.5.5.)
Proof of Corollary 2.5.9 We denote the left side of (2.5.9) by Sn (a, b, c). Consider
first the integral
s s n
1
Is = (x)2c xia1 (1 xi /s)s dxi ,
n! 0 0 i=1
Is = sn(a+(n1)c) Sn (a, s + 1, c) .
60 2. W IGNER MATRICES
Js = 23n(n1)/2+3n/2+2ns sn(n1)c/2+n/2 Sn (s + 1, s + 1, c) .
Before providing the proof of Theorem 2.5.8, we note the following identity
involving the beta integral in the left side:
sn+1 1
n n
(s1 ) (sn+1 )
1 xi xisi 1 dxi = .
{xRn :minni=1 xi >0,ni=1 xi <1} i=1 i=1 (s 1 + + sn+1 )
(2.5.15)
0
0
usi i 1 eui dui ,
i=1
and applying Fubinis Theorem both before and after the substitution.
Proof of Theorem 2.5.8 We aim now to rewrite the left side of (2.5.9) in an
intuitive way, see Lemma 2.5.12 below. Toward this end, we introduce some
notation.
Let Dn be the space consisting of monic polynomials P(t) of degree n in a vari-
able t with real coefficients such that P(t) has n distinct real roots. More generally,
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 61
,n put
Lemma 2.5.11 For k, = 1, . . . , n and = (1 , . . . , n ) D
k
k = k (1 , . . . , n ) = i1 ik , k, =
.
1i1 <<ik n
Then
n
det k, = |i j | = |( )| . (2.5.16)
k,=1
1i< jn
Proof We have
k, = k1 (t i ) ,
i{1,...,n}\{}
Lemma 2.5.13 Fix Q Dn+1 with roots 1 < < n+1 . Fix real numbers
1 , . . . , n+1 and let P(t) be the unique polynomial in t of degree n with real
coefficients such that the partial fraction expansion
P(t) n+1 i
=
Q(t) i=1 t i
holds. Then the following statements are equivalent:
(I) (P, Q) En .
i=1 i > 0 and i=1 i = 1.
(II) minn+1 n+1
Proof (III) The numbers P(i ) do not vanish and their signs alternate. Similarly,
the numbers Q (i ) do not vanish and their signs alternate. By LHopitals rule, we
have i = P(i )/Q (i ) for i = 1, . . . , n + 1. Thus all the quantities i are nonzero
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 63
and have the same sign. The quantity P(t)/Q (t) depends continuously on t in
the interval [n+1 , ), does not vanish in that interval, and tends to 1/(n + 1) as
t +. Thus n+1 is positive. Since the signs of P(i ) alternate, and so do the
signs of Q (i ), it follows that i = P(i )/Q (i ) > 0 for all i. Because P(t) is
monic, the numbers i sum to 1. Thus condition (II) holds.
(III) Because the signs of the numbers Q (i ) alternate, we have sufficient in-
formation to force P(t) to change sign n + 1 times, and thus to have n distinct real
roots interlaced with the roots of Q(t). And because the numbers i sum to 1, the
polynomial P(t) must be monic in t. Thus condition (I) holds.
Lemma 2.5.14 Fix Q Dn+1 with roots 1 < < n+1 . Then we have
1 n+1 D(Q)1/2
n ({P Dn | (P, Q) En }) =
n! j=1
|Q ( j )|1/2 =
n!
. (2.5.20)
A = {x Rn | (Px , Q) En } .
By definition the left side of (2.5.20) equals the Lebesgue measure of A. Consider
the polynomials Q j (t) = Q(t)/(t j ) for j = 1, . . . , n + 1. By Lemma 2.5.13, for
all x Rn , we have x A if and only if Px (t) = n+1 i=1 i Qi (t) for some real numbers
i such that min i > 0 and i = 1, or equivalently, A is the interior of the convex
hull of the points
2, j (1 , . . . , n+1 ), . . . , n+1, j (1 , . . . , n+1 ) Rn for j = 1, . . . , n + 1 ,
where the s are defined as in Lemma 2.5.11 (but with n replaced by n+1). Noting
that 1, 1 for = 1, . . . , n + 1, the Lebesgue measure of A equals the absolute
k,=1 k, (1 , . . . , n+1 ) by the determinantal formula for computing
value of n!1 detn+1
the volume of a simplex in Rn . Finally, we get the claimed result by (2.5.16).
Lemma 2.5.15 Fix Q Dn+1 with roots 1 < < n+1 . Fix positive numbers
s1 , . . . , sn+1 . Then we have
si 1/2 (s )
i=1 |Q (i )|
n+1
n+1
|P(i )|si 1 dn (P) =
i
. (2.5.21)
{PDn |(P,Q)En } i=1 (i=1 si )
n+1
Proof For P in the domain of integration in the left side of (2.5.21), define i =
i (P) = P(i )/Q (i ), i = 1, . . . , n + 1. By Lemma 2.5.13, i > 0, n+1
i=1 i = 1,
and further P (i )ni=1 is a bijection from {P Dn | (P, Q) En } to the domain
of integration in the right side of (2.5.15). Further, the map x (Px ) is linear.
64 2. W IGNER MATRICES
Hence
n+1
P(i ) si 1
{PDn |(P,Q)En i=1 }
Q (i ) dn (P)
equals, up to a constant multiple C independent of {si }, the right side of (2.5.15).
Finally, by evaluating the left side of (2.5.21) for s1 = = sn+1 = 1 by means of
Lemma 2.5.14 (and recalling that (n + 1) = n!) we find that C = 1.
We may now complete the proof of Theorem 2.5.8. Recall that the integral on
the left side of (2.5.9), denoted as above by Sn (a, b, c), can be represented as the
integral (2.5.17). Consider the double integral
Kn (a, b, c) = |Q(0)|a1 |Q(1)|b1 |R(P, Q)|c1 dn (P)dn+1 (Q) ,
En (0,1)
where R(P, Q) denotes the resultant of P and Q, see Appendix A.4. We will apply
Fubinis Theorem in both possible ways. On the one hand, we have
Kn (a, b, c) = |Q(0)|a1 |Q(1)|b1
Dn+1 (0,1)
|R(P, Q)|
c1
dn (P) dn+1 (Q)
{PDn (0,1)|(P,Q)En }
(c)n+1
= Sn+1 (a, b, c) ,
((n + 1)c)
via Lemma 2.5.15. On the other hand, writing P = t(t 1)P, we have
Kn (a, b, c) =
Dn (0,1) {QDn+1 |(Q,P)En+2 }
|Q(0)|a1 |Q(1)|b1 |R(P, Q)|c1 dn+1 (Q) dn (P)
(a)(b)(c)n
= |P (0)|a1/2 |P (1)|b1/2 |R(P, P )|c1/2 dn (P)
Dn (0,1) (a + b + nc)
(a)(b)(c)n
= Sn (a + c, b + c, c) ,
(a + b + nc)
by another application of Lemma 2.5.15. This proves (2.5.9) by induction on n;
the induction base n = 1 is an instance of (2.5.15).
Exercise 2.5.16 Provide an alternative proof of Lemma 2.5.11 by noting that the
determinant in the left side of (2.5.16) is a polynomial of degree n(n 1)/2 that
vanishes whenever xi = x j for some i = j, and thus, must equal a constant multiple
of (x).
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 65
It is sometimes useful to represent the formulas for the joint distribution of eigen-
values as integration formulas for functions that depend only on the eigenvalues.
We develop this correspondence now.
( )
Let f : HN [0, ] be a Borel function such that f (H) depends only on the
sequence of eigenvalues 1 (H) N (H). In this situation, for short, we say
that f (H) depends only on the eigenvalues of H. (Note that the definition implies
( )
that f is a symmetric function of the eigenvalues of H.) Let X HN be random
( )
with law PN . Assuming the validity of Theorem 2.5.2, we have
N xi2 /4 dx
f (x1 , . . . , xN )|(x)| i=1 e i
E f (X) = , (2.5.22)
2 /4
|(x)| i=1 e
N x i dxi
where f (x1 , . . . , xN ) denotes the value of f at the diagonal matrix with diago-
nal entries x1 , . . . , xN . Conversely, assuming (2.5.22), we immediately verify that
(2.5.3) is proportional to the joint density of the eigenvalues 1 (X), . . . , N (X) by
taking f (H) = 1(1 (H),...,N (H))A where A RN is any Borel set. In turn, to prove
(2.5.22), it suffices to prove the general integration formula
N
( ) ( )
f (H)N (dH) = CN f (x1 , . . . , xN )|(x)| dxi , (2.5.23)
j=1
where
1 N (1/2)k
N! k=1 (k/2)
if = 1 ,
( )
CN =
1 N k1
(k 1)!
N! k=1
if = 2 ,
and as in (2.5.22), the integrand f (H) is nonnegative, Borel measurable, and de-
pends only on the eigenvalues of H. Moreover, assuming the validity of (2.5.23),
it follows by taking f (H) = exp(atr(H 2 )/2) with a > 0 and using Gaussian
integration that
N
1
|(x)| eaxi /2 dxi
2
N! i=1
N
( j /2) 1
= (2 )N/2 a N(N1)/4N/2 =: . (2.5.24)
j=1 ( /2) ( )
N!CN
Thus, Theorem 2.5.2 is equivalent to the integration formula (2.5.23).
66 2. W IGNER MATRICES
The goal of this short subsection is to show how the eigenvalues of the GUE can be
coupled (that is, constructed on the same probability space) with the eigenvalues
of the GOE. As a by-product, we also discuss the eigenvalues of the GSE. Besides
the obvious probabilistic interest in such a construction, the coupling will actually
save us some work in the analysis of limit distributions for the maximal eigenvalue
of the GOE and the GSE.
To state our results, we introduce some notation. For a finite subset A R with
|A| = n, we define Ord(A) to be the vector in Rn whose entries are the elements of
A, ordered, that is
Note that if x is ordered, then Dec(x) erases from x the smallest entry, the third
smallest entry, etc.
The main result of this section is the following.
Theorem 2.5.17 For N > 0 integer, let AN and BN+1 denote the (collection of)
eigenvalues of two independent random matrices distributed according to GOE(N)
and GOE(N+1), respectively. Set
and
(1N , . . . , NN ) = N = Dec(Ord(A2N+1 )) . (2.5.26)
The proof of Theorem 2.5.17 goes through an integration relation that is slightly
more general than our immediate needs. To state it, let L = (a, b) R be a
nonempty open interval, perhaps unbounded, and let f and g be positive real-
valued infinitely differentiable functions defined on L. We will use the following
assumption on the triple (L, f , g).
Assumption 2.5.18 For (L, f , g) as above, for each integer k 0, write fk (x) =
xk f (x) and gk (x) = xk g(x) for x L. Then the following hold.
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 67
(I) There exists a matrix M (n) Matn+1 (R), independent of x, such that
det M (n) > 0 and
M (n) ( f0 , f1 , . . . , fn )T = (g 0 , g 1 , . . . , g n1 , f0 )T .
(II) ab | fn (x)|dx < .
(III) limxa gn (x) = 0 and limxb gn (x) = 0.
Proposition 2.5.19 Let Assumption 2.5.18 hold for a triple (L, f , g) with L =
(a, b). For x2n+1 = (x1 , . . . , x2n+1 ), set
(e) (o)
xn = Dec(x2n+1 ) = (x2 , x4 , . . . , x2n ) , and xn+1 = (x1 , x3 , . . . , x2n+1 ) .
Let
a x2
x2n
(xI )(xJ ) f (xi ) dx2n+1 dx3 dx1
(I,J)J2n+1 i=1
(e) 2 b
2n (xn ) a f (x)dx (ni=1 f (x2i )) (ni=1 g(x2i ))
= , (2.5.27)
det M (n)
and
x2 x4 b
2n+1
a x2
x2n
(x2n+1 ) f (xi ) dx2n+1 dx3 dx1
i=1
4
b (e)
a f (x)dx (xn ) (ni=1 g(x2i ))2
= . (2.5.28)
det M (2n)
Assumption 2.5.18(II) guarantees the finiteness of the integrals in the proposition.
The value of the positive constant det M (n) will be of no interest in applications.
The proof of Proposition 2.5.19 will take up most of this section, after we com-
plete the
Proof of Theorem 2.5.17 We first check that Assumption 2.5.18 with L = (, ),
68 2. W IGNER MATRICES
f (x) = g(x) = ex /4 holds, that is we verify that a matrix M (n) as defined there
2
M (n) ( f0 , f1 , . . . , fn )T = ( f0 , f0 , f1 , . . . , fn1
)T .
(n)
coefficient equal 1/2, we have that M (n) is a lower triangular matrix, with M1,1 =
(n)
1/2 for i > 1 and M1,1 = 1, and thus det M (n) = (1/2)n . Since M (n) is obtained
from M (n) by a cyclic permutation (of length n+1, and hence sign equal to (1)n ),
we conclude that det M (n) = (1/2)n > 0, as needed.
To see the statement of Theorem 2.5.17 concerning the GUE, one applies equa-
tion (2.5.27) of Proposition 2.5.19 with the above choices of (L, f , g) and M (n) ,
together with Theorem 2.5.2. The statement concerning the GSE follows with the
same choice of (L, f , g), this time using (2.5.28).
In preparation for the proof of Proposition 2.5.19, we need three lemmas. Only
the first uses Assumption 2.5.18 in its proof. To compress notation, write
A11 . . . A1N
.. .
[Ai j ]n,N = ... .
An1 ... AnN
Dividing G(y2n ) by ni=1 (y2i y2i1 ) and substituting y2i1 = y2i = xi for i =
1, . . . , n give the left side of (2.5.30). On the other hand, let u j denote the jth col-
umn of [gi1 (y j )]2n,2n . (Thus, G(y2n ) = det[u1 , . . . , u2n ].) Since it is a determinant,
G(y2n ) = det[u1 , u2 u1 , u3 , u4 u3 , . . . , u2n1 , u2n u2n1 ] and thus
" #
G(y2n ) u2 u1 u2n u2n1
= det u1 , , . . . , u2n1 , .
ni=1 (y2i y2i1 ) y2 y1 y2n y2n1
Applying LHopitals rule thus shows that the last expression evaluated at y2i1 =
y2i = xi for i = 1, . . . , n equals the right side of (2.5.30).
Lemma 2.5.22 For every positive integer n and x2n+1 = (x1 , . . . , x2n+1 ) we have
an identity
(o) (e)
2n (xn+1 )(xn ) = (xI )(xJ ) . (2.5.32)
(I,J)J2n+1
Proof Given I = {i1 < < ir } {1, . . . , 2n + 1}, we write I = (xI ). Given
a polynomial P = P(x1 , . . . , x2n+1 ) and a permutation S2n+1 , let P be defined
by the rule
( P)(x1 , . . . , x2n+1 ) = P(x (1) , . . . , x (2n+1) ) .
Given a permutation S2n+1 , let I = { (i) | i I}. Now let I J be a term
appearing on the right side of (2.5.32) and let = (i j) S2n+1 be a transposition.
We claim that
(I J ) 1 if {i, j} I or {i, j} J,
= (2.5.33)
I J (1)|i j|+1 otherwise.
To prove (2.5.33), since the cases {i, j} I and {i, j} J are trivial, and we may
allow i and j to exchange roles, we may assume without loss of generality that
i I and j J. Let k (resp., ) be the number of indices in the set I (resp., J)
strictly between i and j. Then
k + = |i j| 1, I / I = (1)k , J / J = (1) ,
which proves (2.5.33). It follows that if i and j have the same parity, the effect of
applying to the right side of (2.5.32) is to multiply by 1, and therefore (xi x j )
divides the right side. On the other hand, the left side of (2.5.32) equals 2n times
the product of (xi x j ) with i < j of the same parity. Therefore, because the
70 2. W IGNER MATRICES
polynomial functions on both sides of (2.5.32) are homogeneous of the same total
degree in the variables x1 , . . . , x2n+1 , the left side equals the right side times some
constant factor. Finally, the constant factor has to be 1 because the monomial
n+1 i1 n i1 n
i=1 x2i1 i=1 x2i appears with coefficient 2 on both sides.
and then evaluate using (2.5.29) and the second equality in (2.5.31). To prove
(2.5.28), rewrite the left side multiplied by det M (2n) as
" x j+1 #
(2n) x j1 f i1 (x)dx if j is odd
det M ,
fi1 (x j ) if j is even 2n+1,2n+1
Exercise 2.5.23 Let , > 1 be real constants. Show that each of the following
triples (L, f , g) satisfies Assumption 2.5.18:
(a) L = (0, ), f (x) = x ex , g(x) = x +1 ex (the Laguerre ensembles);
(b) L = (0, 1), f (x) = x (1 x) , g(x) = x +1 (1 x) +1 (the Jacobi ensembles).
for a > 0 and a continuous function V : RR such that, for some > 1 satis-
fying ,
V (x)
lim inf > 1. (2.6.2)
|x| log |x|
When V (x) = x2 /4, and = 1, 2, we saw in Section 2.5 that PNx2 /4, is the law
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 71
We endow M1 (R) with the usual weak topology, compatible with the Lipschitz
bounded metric, see (C.1). Our goal is to estimate the probability PV,N (LN A),
for measurable sets A M1 (R). Of particular interest is the case where A does
not contain the limiting distribution of LN .
Define the noncommutative entropy : M1 (R) [, ) as
log |x y|d (x)d (y) if log(|x| + 1)d (x) < ,
( ) = (2.6.4)
otherwise ,
where
cV = inf { V (x)d (x) ( )} (, ) . (2.6.6)
M1 (R) 2
(Lemma 2.6.2 below and its proof show that both and IV are well defined, and
that cV is finite.)
Theorem 2.6.1 Let LN = N 1 Ni=1 N where the random variables {iN }Ni=1 are
i
distributed according to the law PV,N of (2.6.1), with potential V satisfying (2.6.2).
Then, the family of random measures LN satisfies, in M1 (R) equipped with the
weak topology, a large deviation principle with speed N 2 and good rate function
IV . That is,
The proof of Theorem 2.6.1 relies on the properties of the function IV collected in
Lemma 2.6.2 below. Define the logarithmic capacity of a measurable set A R
as -
1
(A) := exp inf log d (x)d (y) .
M1 (A) |x y|
Lemma 2.6.2
(a) cV (, ) and IV is well defined on M1 (R), taking its values in [0, +].
(b) IV ( ) is infinite as soon as satisfies one of the following conditions
(b.1) V (x)d (x) = +.
(b.2) There exists a set A R of positive mass but null logarithmic capacity,
i.e. a set A such that (A) > 0 but (A) = 0.
(c) IV is a good rate function.
(d) IV is a strictly convex function on M1 (R).
(e) IV achieves its minimum value at unique V M1 (R). The measure V is
compactly supported, and is characterized by the equality
V (x) V , log | x| = CV , for V -almost every x, (2.6.9)
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 73
and inequality
Proof of Lemma 2.6.2 For all M1 (R), ( ) is well defined and < due to
the bound
log |x y| log(|x| + 1) + log(|y| + 1) . (2.6.11)
Further, cV < as can be checked by taking as the uniform law on [0, 1].
Set
1 1
f (x, y) = V (x) + V (y) log |x y| . (2.6.12)
2 2 2
Note that (2.6.2) implies that f (x, y) goes to + when x, y do since (2.6.11) yields
1 1
f (x, y) (V (x) log(|x| + 1)) + (V (y) log(|y| + 1)) . (2.6.13)
2 2
Further, f (x, y) goes to + when x, y approach the diagonal {x = y}. Therefore,
for all L > 0, there exists a constant K(L) (going to infinity with L) such that, with
BL := {(x, y) : |x y| < L1 } {(x, y) : |x| > L} {(x, y) : |y| > L},
Since f is continuous on the compact set BcL , we conclude that f is bounded below
on R2 , and denote by b f > a lower bound. It follows that cV b f > . Thus,
because V is bounded below by (2.6.2), we conclude that IV is well defined and
takes its values in [0, ], completing the proof of part (a). Further, since for any
measurable subset A R,
IV ( ) = ( f (x, y) b f )d (x)d (y) + b f cV
( f (x, y) b f )d (x)d (y) + b f cV
A A
log |x y|1 d (x)d (y) + inf V (x) (A)2 |b f | cV
2 A A xR
(A)2 log( (A)) |b f | cV + inf V (x) (A)2 ,
2 xR
74 2. W IGNER MATRICES
one concludes that if IV ( ) < , and A is a measurable set with (A) > 0, then
(A) > 0. This completes the proof of part (b).
We now show that IV is a good rate function, and first that its level sets {IV
M} are closed, that is that IV is lower semicontinuous. Indeed, by the monotone
convergence theorem,
IV ( ) = f (x, y)d (x)d (y) cV
= sup ( f (x, y) M)d (x)d (y) cV .
M0
/
Hence, taking (B) = [ (M + cV b f )+ / (K (B) b f )+ ] 1, which goes to
zero when B goes to infinity, one has that {IV M} K . This completes the
proof of part (c).
Since IV is a good rate function, it achieves its minimal value. Let V be
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 75
IV (V + ) IV (V ) , (2.6.16)
which implies
V (x) log |x y|d V (y) d (x) 0 .
CV = 2cV V ,V ,
proving (2.6.9) and (2.6.10), with the strict inequality in (2.6.10) following from
the uniqueness of V , since the later implies that the inequality (2.6.16) is strict
as soon as is nontrivial. Finally, integrating (2.6.9) with respect to V reveals
that the latter must be a minimizer of IV , so that (2.6.9) characterizes V .
The claimed uniqueness of V , and hence the completion of the proof of part
(e), will follow from the strict convexity claim (part (d) of the lemma), which we
turn to next. Note first that, extending the definition of to signed measures in
evident fashion when the integral in (2.6.4) is well defined, we can rewrite IV as
IV ( ) = ( V ) + V (x) log |x y|d V (y) CV d (x) .
2
The fact that IV is strictly convex will follow as soon as we show that is strictly
concave. Toward this end, note the formula
1 1 |x y|2
log |x y| = exp{ } exp{ } dt , (2.6.19)
0 2t 2t 2t
76 2. W IGNER MATRICES
( + (1 ) ) ( ( ) + (1 )( )) = ( 2 )( ) ,
2
N
,V
PV,N (d 1 , , d N ) = (ZN )1 eN x=y f (x,y)dLN (x)dLN (y)
eV (i ) d i .
i=1
were a bounded continuous function, the proof would follow from a standard ap-
plication of Varadhans Lemma, Theorem D.8. The main point will therefore be
to overcome the singularities of this function, with the most delicate part being to
overcome the singularity of the logarithm.
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 77
Following Appendix D (see Corollary D.6 and Definition D.3), a full large devi-
ation principle can be proved by proving that exponential tightness holds, as well
as estimating the probability of small balls. We follow these steps below.
Exponential tightness
Recall that d denotes the Lipschitz bounded metric, see (C.1). We prove here that
,V
for any M1 (R), if we set PV,N = ZN PV,N ,
1
lim lim sup log PV,N (d(LN , ) ) f (x, y)d (x)d (y) . (2.6.22)
0 N N2
(We will prove the full LDP for PV,N as a consequence of both the upper and lower
bounds on PV,N , see (2.6.28) below.) For any M 0, set fM (x, y) = f (x, y) M.
Then the bound
2
N
PV,N (d(LN , ) )
d(LN , )
eN x=y f M (x,y)dLN (x)dLN (y)
eV (i ) d i
i=1
holds. Since under the product Lebesgue measure, the i s are almost surely dis-
tinct, it holds that LN LN (x = y) = N 1 , PV,N almost surely. Thus we deduce
that
fM (x, y)dLN (x)dLN (y) = fM (x, y)dLN (x)dLN (y) + MN 1 ,
x=y
78 2. W IGNER MATRICES
and so
PV,N (d(LN , ) )
2
N
eMN
d(LN , )
eN fM (x,y)dLN (x)dLN (y)
eV (i ) d i .
i=1
Since fM is bounded and continuous, : IV,M fM (x, y)d (x)d (y) is a con-
tinuous functional, and therefore we deduce that
1
lim lim sup log PV,N (d(LN , ) ) IV,M ( ) .
0 N N2
We finally let M go to infinity and conclude by the monotone convergence theo-
rem. Note that the same argument shows that
1 ,V
lim sup log ZN inf f (x, y)d (x)d (y) . (2.6.23)
N N2 M1 (R)
which ensures that it is enough to prove the lower bound for (M , M R+ , IV ( ) <
), and so for compactly supported probability measures with finite entropy.
The idea is to localize the eigenvalues (i )1iN in small sets and to take ad-
vantage of the fast speed N 2 of the large deviations to neglect the small volume of
these sets. To do so, we first remark that, for any M1 (R) with no atoms, if we
set
-
1
x1,N = inf x : ((, x]) ,
N +1
-
1
xi+1,N = inf x xi,N : (xi,N , x] , 1 i N 1,
N +1
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 79
for any real number , there exists a positive integer N( ) such that, for any N
larger than N( ),
1 N
d , xi,N < .
N i=1
In particular, for N N( 2 ),
-
(i )1iN | |i x | < i [1, N] {(i )1iN | d(LN , ) < } ,
i,N
2
so that we have the lower bound
PV,N (d(LN , ) )
2
N
0 i,N |<
eN x=y f (x,y)dLN (x)dLN (y)
eV (i ) d i
i {|i x 2 } i=1
N
|xi,N x j,N + i j | eN i=1 V (x d i
N i,N +
= i)
0
i {|i |< }
2 i< j i=1
|xi,N x j,N | |xi,N xi+1,N | 2 eN i=1 V (x
N i,N )
i+1< j i
N
|i i+1 | i=1 [V (x +i )V (x )]
N N
d i
i,N i,N
0 2 e
i {|i |< 2 }
i <i+1 i i=1
=: PN,1 PN,2 , (2.6.25)
where we used the fact that |xi,N x j,N + i j | |xi,N x j,N | |i j | when
i j and xi,N x j,N . To estimate PN,2 , note that since we assumed that is
compactly supported, the (xi,N , 1 i N)NN are uniformly bounded and so, by
continuity of V ,
|i |< 2 i |i i+1 | d i 2
0<ui < 2N
ui dui
2
( + 2)N
.
i <i1 i i=1 i=2 i=1
Therefore,
1
lim lim inf log PN,2 0 . (2.6.26)
0 N N2
To handle the term PN,1 , the uniform boundedness of the xi,N s and the convergence
80 2. W IGNER MATRICES
Thus, (2.6.24) and (2.6.22) imply the weak large deviation principle, i.e. that for
all M1 (R),
1
lim lim inf log PV,N (d(LN , ) )
0 N N2
1
= lim lim sup 2 log PV,N (d(LN , ) ) = IV ( ) . (2.6.28)
0 N N
This, together with the exponential tightness property proved above, completes
the proof of Theorem 2.6.1.
We consider next the large deviations for the maximum N = maxNi=1 i , of ran-
dom variables that possess the joint law (2.6.1). These will be obtained under the
following assumption.
N satisfy
Assumption 2.6.5 The normalization constants ZV,
N1
1 ZNV /(N1),
lim log N
= V, . (2.6.29)
N N ZV,
Theorem 2.6.6 Let (1N , . . . , NN ) be distributed according to the joint law PV,N of
(2.6.1), with continuous potential V that satisfies (2.6.2) and Assumption 2.6.5.
Let V be the minimizing measure of Lemma 2.6.2, and set x = max{x : x
supp V }. Then, N = maxNi=1 iN satisfies the LDP in R with speed N and good
rate function
log |x y|V (dy) V (x) V, if x x ,
JV (x) =
otherwise .
Further,
1
lim lim sup log PV,N (N > M) = (2.6.34)
M N N
Equipped with Lemma 2.6.7, we may complete the proof of Theorem 2.6.6. We
begin with the upper bound (2.6.31). Note that for any M > x,
By choosing M large enough and using (2.6.34), the first term in the right side of
NJV (x)
(2.6.36) can be made smaller than e , for all N large. In the sequel, we fix
an M such that the above is satisfied, the analogous bound with 1 also holds,
and further
" # " #
log |x y|V (dy) V (x) > sup log |z y|V (dy) V (z) .
z[M,)
(2.6.37)
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 83
Combining the last equality with (2.6.40) and (2.6.36), we obtain (2.6.31).
We finally prove the lower bound (2.6.32). Let 2 < x x and fix r (x , x
2 ). Then, with Ir = (M, r)N1 ,
PV,N (N (x , x + ))
PV,N (N (x , x + ), i (M, r), i = 1, . . . , N 1) (2.6.41)
x+
d N e(N1)(N ,LN1 ) PNV
N
= N N1
/(N1), (d 1 , . . . , d N1 )
xIr
2 N exp (N 1) inf (z, ) PNV /(N1), (LN1 Br,M ( )) ,
N1
z(x ,x+ )
Br,M ( )
where Br,M ( ) denotes those measures in B( ) with support in [M, r]. Recall
from the upper bound (2.6.31) together with (2.6.35) that
Proof of Lemma 2.6.7 We first prove (2.6.33). Note that, for any > 0 and all N
large,
N1 N1 N1 N1
ZV, ZV,
ZNV /(N1), ZV,
N
= N1
N
N1
eN(V, + ) , (2.6.42)
ZV, ZNV /(N1),
ZV, ZNV /(N1),
By the LDP for LN1 (at speed N 2 , see Theorem 2.6.1), Lemma 2.6.2 and (2.6.21),
N(V ,V + )
the last integral is bounded above by e . Substituting this in (2.6.43) and
(2.6.42) yields (2.6.33).
For |x| > M, M large and i R, for some constants a , b ,
Therefore,
N1 N1
ZV,
PV,N (N > M) N N
ZV, M
eNV (N ) d N
RN1 i=1
|x i | eV (i ) dPV,N1
N1
ZV,
NbN1
eNV (M)/2 N
eV (N ) d N ,
ZV, M
Wigners Theorem was presented in [Wig55], and proved there using the method
of moments developed in Section 2.1. Since then, this result has been extended in
many directions. In particular, under appropriate moment conditions, an almost
sure version holds, see [Arn67] for an early result in that direction. Relaxation
of moment conditions, requiring only the existence of third moments of the vari-
ables, is described by Bai and co-workers, using a mixture of combinatorial, prob-
abilistic and complex-analytic techniques. For a review, we refer to [Bai99]. It is
important to note that one cannot hope to forgo the assumption of finiteness of sec-
ond moments, because without this assumption the empirical measure, properly
rescaled, converges toward a noncompactly supported measure, see [BeG08].
Regarding the proof of Wigners Theorem that we presented, there is a slight
ambiguity in the literature concerning the numbering of Catalan numbers. Thus,
[Aig79, p. 85] denotes by ck what we denote by Ck1 . Our notation follows
[Sta97]. Also, there does not seem to be a clear convention as to whether the
Dyck paths we introduced should be called Dyck paths of length 2k or of length
k. Our choice is consistent with our notion of length of Bernoulli walks. Finally,
we note that the first part of the proof of Lemma 2.1.3 is an application of the
reflection principle, see [Fel57, Ch. III.2].
The study of Wigner matrices is closely related to the study of Wishart ma-
trices, discussed in Exercises 2.1.18 and 2.4.8. The limit of the empirical mea-
sure of eigenvalues of Wishart matrices (and generalizations) can be found in
[MaP67], [Wac78] and [GrS77]. Another similar model is given by band ma-
trices, see [BoMP91]. In fact, both Wigner and Wishart matrices fall under the
86 2. W IGNER MATRICES
class of the general band matrices discussed in [Shl96], [Gui02] (for the Gaussian
case) and [AnZ05], [HaLN06].
Another promising combinatorial approach to the study of the spectrum of ran-
dom Wigner matrices, making a direct link with orthogonal polynomials, is pre-
sented in [Sod07].
The rate of convergence toward the semicircle distribution has received some
attention in the literature, see, e.g., [Bai93a], [Bai93b], [GoT03].
Lemma 2.1.19 first appears in [HoW53]. In the proof we mention that permu-
tation matrices form the extreme points of the set of doubly stochastic matrices,
a fact that is is usually attributed to G. Birkhoff. See [Chv83] for a proof and a
historical discussion which attributes this result to D. Konig. The argument we
present (that bypasses this characterization) was kindly communicated to us by
Hongjie Dong. The study of the distribution of the maximal eigenvalue of Wigner
matrices by combinatorial techniques was initiated by [Juh81], and extended by
[FuK81] (whose treatment we essentially follow; see also [Vu07] for recent im-
provements). See also [Gem80] for the analogous results for Wishart matrices.
The method was widely extended in the papers [SiS98a], [SiS98b], [Sos99] (with
symmetric distribution of the entries) and [PeS07] (in the general case), allow-
ing one to derive much finer behavior on the law of the largest eigenvalue, see
the discussion in Section 3.7. Some extensions of the FurediKomlos and Sinai
Soshnikov techniques can also be found in [Kho01]. Finally, conditions for the
almost sure convergence of the maximal eigenvalue of Wigner matrices appear in
[BaY88].
The study of maximal and minimal eigenvalues for Wishart matrices is of fun-
damental importance in statistics, where they are referred to as sample covari-
ance matrices, and has received a great deal of attention recently. See [SpT02],
[BeP05], [LiPRTJ05], [TaV09a], [Rud08], [RuV08] for a sample of recent devel-
opments.
The study of central limit theorems for traces of powers of random matrices
can be traced back to [Jon82], in the context of Wishart matrices (an even earlier
announcement appears in [Arh71], without proofs). Our presentation follows to a
large extent Jonssons method, which allows one to derive a CLT for polynomial
functions. A by-product of [SiS98a] is a CLT for tr f (XN ) for analytic f , under
a symmetry assumption on the moments. The paper [AnZ05] generalizes these
results, allowing for differentiable functions f and for nonconstant variance of the
independent entries. See also [AnZ08a] for a different version of Lemma 2.1.34.
For functions of the form f (x) = ai /(zi x) where zi C \ R, and matrices of
Wigner type, CLT statements can be found in [KhKP96], with somewhat sketchy
proofs. A complete treatment for f analytic in a domain including the support of
2.7 B IBLIOGRAPHICAL NOTES 87
the limit of the empirical distribution of eigenvalues is given in [BaY05] for ma-
trices of Wigner type, and in [BaS04] for matrices of Wishart type under a certain
restriction on fourth moments. Finally, an approach based on Fourier transforms
and interpolation was recently proposed in [PaL08].
Much more is known concerning the CLT for restricted classes of matrices:
[Joh98] uses an approach based on the explicit joint density of the eigenvalues,
see Section 2.5. (These results apply also to a class of matrices with dependent
entries.) For Gaussian matrices, an approach based on the stochastic calculus
introduced in Section 4.3 can be found in [Cab01] and [Gui02]. Recent extensions
and reinterpretation of this work, using the notion of second order freeness, can
be found in [MiS06] (see Chapter 5 for the notion of freeness and its relation to
random matrices).
The study of spectra of random matrices via the Stieltjes transform (resolvent
functions) was pioneered by Pastur co-workers, and greatly extended by Bai and
co-workers. See [MaP67] for an early reference, and [Pas73] for a survey of the
literature up to 1973. Our derivation is based on [KhKP96], [Bai99] and [SiB95].
We presented in Section 2.3 a very brief introduction to concentration inequali-
ties. This topic is picked up again in Section 4.4, to which we refer the reader for
a complete introduction to different concentration inequalities and their applica-
tion in RMT, and for full bibliographical notes. Good references for the logarith-
mic Sobolev inequalities used in Section 2.3 are [Led01] and [AnBC+ 00]. Our
treatment is based on [Led01] and [GuZ00]. Lemma 2.3.2 is taken from [BoL00,
Proposition 3.1]. We note in passing that, on R, a criterion for a measure to satisfy
the logarithmic Sobolev inequality was developed by Bobkov and Gotze [BoG99].
In particular, any probability measure on R possessing a bounded above and be-
low density with respect to the measures (dx) = Z 1 e|x| dx for 2, where
|x|
Z= e dx, satisfies the LSI, see [Led01], [GuZ03, Property 4.6]. Finally,
in the Gaussian case, estimates on the expectation of the maximal eigenvalue (or
minimal and maximal singular values, in the case of Wishart matrices) can be ob-
tained from Slepians and Gordons inequalities, see [LiPRTJ05] and [DaS01]. In
particular, these estimates are useful when using, in the Gaussian case, (2.3.10)
with k = N.
The basic results on joint distribution of eigenvalues in the GOE and GUE pre-
sented in Section 2.5, as well as an extensive list of integral formulas similar to
(2.5.4) are given in [Meh91], [For05]. We took, however, a quite different ap-
proach to all these topics based on the elementary proof of the Selberg integral
formula [Sel44], see [AnAR99], given in [And91]. The proof of [And91] is based
on a similar proof [And90] of some trigonometric sum identities, and is also simi-
88 2. W IGNER MATRICES
lar in spirit to the proofs in [Gus90] of much more elaborate identities. For a recent
review of the importance of the Selberg integral, see [FoO08], where in particular
it is pointed out that Lemma 2.5.15 seems to have first appeared in [Dix05].
We follow [FoR01] in our treatment of superposition and decimation (The-
orem 2.5.17). We remark that triples (L, f , g) satisfying Assumption 2.5.18, and
hence the conclusions of Proposition 2.5.19, can be classified, see [FoR01], to
which we refer for other classical examples where superposition and decimation
relations hold. An early precursor of such relations can be traced to [MeD63].
Theorem 2.6.1 is stated in [BeG97, Theorem 5.2] under the additional assump-
tion that V does not grow faster than exponentially and proved there in detail when
V (x) = x2 . In [HiP00b], the same result is obtained when the topology over M1 (R)
is taken to be the weak topology with respect to polynomial test functions instead
of bounded continuous functions. Large deviations for the empirical measure of
random matrices with complex eigenvalues were considered in [BeZ98] (where
non self-adjoint matrices with independent Gaussian entries were studied) and in
[HiP00a] (where Haar unitary distributed matrices are considered). This strategy
can also be used when one is interested in discretized versions of the law PN,V
as they appear in the context of Young diagrams, see [GuM05]. The LDP for
the maximal eigenvalue described in Theorem 2.6.6 is based on [BeDG01]. We
mention in passing that other results discussed in this chapter have analogs for the
law PN,V . In particular, the CLT for linear statistics is discussed in [Joh98], and
concentration inequalities for V convex are a consequence of the results in Section
4.4.
Models of random matrices with various degrees of dependence between entries
have also be treated extensively in the literature. For a sample of existing results,
see [BodMKV96], [ScS05] and [AnZ08b]. Random Toeplitz, Hankel and Markov
matrices have been studied in [BrDJ06] and [HaM05].
Many of the results described in this chapter (except for Sections 2.3, 2.5 and
2.6) can also be found in the book [Gir90], a translation of a 1975 Russian edition,
albeit with somewhat sketchy and incomplete proofs.
We have restricted attention in this chapter to Hermitian matrices. A natural
question concerns the complex eigenvalues of a matrix XN where all are i.i.d. In
the Gaussian case, the joint distribution of the eigenvalues was derived by [Gin65].
The analog of the semicircle law is now the circular law: the empirical measure of
the (rescaled) eigenvalues converges to the circular law, i.e. the measure uniform
on the unit disc in the complex plane. This is stated in [Gir84], with a sketchy
proof. A full proof for the Gaussian case is provided in [Ede97], who also eval-
uated the probability that exactly k eigenvalues are real. Large deviations for the
2.7 B IBLIOGRAPHICAL NOTES 89
empirical measure in the Gaussian case are derived in [BeZ98]. For non-Gaussian
entries whose law possesses a density and finite moments of order at least 6, a
full proof, based on Girko ideas, appears in [Bai97]. The problem was recently
settled in full generality, see [TaV08a], [TaV08b], [GoT07]; the extra ingredients
in the proof are closely related to the study of the minimal singular value of XX
discussed above.
3
Hermite polynomials, spacings and limit
distributions for the Gaussian ensembles
In this chapter, we present the analysis of asymptotics for the joint eigenvalue dis-
tribution for the Gaussian ensembles: the GOE, GUE and GSE. As it turns out, the
analysis takes a particularly simple form for the GUE, because then the process of
eigenvalues is a determinantal process. (We postpone to Section 4.2 a discussion
of general determinantal processes, opting to present here all computations with
bare hands.) In keeping with our goal of making this chapter accessible with
minimal background, in most of this chapter we consider the GUE, and discuss
the other Gaussian ensembles in Section 3.9. Generalizations to other ensembles,
refinements and other extensions are discussed in Chapter 4 and in the biblio-
graphical notes.
3.1 Summary of main results: spacing distributions in the bulk and edge of
the spectrum for the Gaussian ensembles
We recall that the N eigenvalues ofthe GUE/GOE/GSE are spread out on an in-
terval of width roughly equal to 4 N, and hence the spacing between adjacent
eigenvalues is expected to be of order 1/ N.
90
3.1 S UMMARY OF MAIN RESULTS 91
with
t (x)
1 F(t) = exp dx for t 0 ,
0 x
with the solution of
(t )2 + 4(t )(t + ( )2 ) = 0 ,
so that
t t2 t3
= 2 3 + O(t 4 ) as t 0 . (3.1.2)
The differential equation satisfied by is the -form of Painleve V. Note that
Theorem 3.1.2 implies that F(t) t0 0. Additional analysis (see Remark 3.6.5 in
Subsection 3.6.3) yields that also F(t) t 1, showing that F is the distribution
function of a probability distribution on R+ .
We now turn our attention to the edge of the spectrum.
where C is the contour in the -plane consisting of the ray joining e i/3 to the
origin plus the ray joining the origin to e i/3 (see Figure 3.1.1).
92 3. S PACINGS FOR G AUSSIAN ENSEMBLES
C
6
By differentiating under the integral and then integrating by parts, it follows that
Ai(x), for x R, satisfies the Airy equation:
d2y
xy = 0 . (3.1.4)
dx2
Various additional properties of the Airy function and kernel are summarized in
Subsection 3.7.3.
The fundamental result concerning the eigenvalues of the GUE at the edge of
the spectrum is the following.
Note that the statement of Theorem 3.1.4 does not ensure that F2 is a distribu-
tion function (and in particular, does not ensure that F2 () = 0), since it only
implies thevague convergence, not the weak convergence, of the random vari-
ables NN / N 2. The latter convergence, as well as a representation of F2 , are
contained in the following.
where q satisfies
q = tq + 2q3 , q(t) Ai(t) , as t + . (3.1.8)
We next state the results for the GOE and GSE in a concise way that allows easy
comparison with the GUE. Most of the analysis will be devoted to controlling the
influence of the departure from a determinantal structure in these ensembles.
( ,n) ( ,n)
For = 1, 2, 4, let ( ,n) = (1 , . . . , n ) be a random vector in Rn with
( )
the law Pn , see (2.5.6), possessing a density with respect to Lebesgue measure
proportional to |(x)| e |x| /4 . (Thus, = 1 corresponds to the GOE, = 2 to
2
where
1 t
t 2 ((tr) + (tr))2 = 4(tr)2 ((tr)2 + ((tr) )2 ) , r(t) = + + Ot0 (t 2 ).
2
94 3. S PACINGS FOR G AUSSIAN ENSEMBLES
The following is the main result of the analysis of spacings for the GOE and GSE.
In this section we show why orthogonal polynomials arise naturally in the study
of the law of the GUE. The relevant orthogonal polynomials in this study are the
Hermite polynomials and the associated oscillator wave-functions, which we in-
troduce and use to derive a Fredholm determinant representation for certain prob-
abilities connected with the GUE.
We now show that the joint distribution of the eigenvalues following the GUE has
a nice determinantal form, see Lemma 3.2.2 below. We then use this formula
in order to deduce a Fredholm determinant expression for the probability that no
eigenvalues belong to a given interval, see Lemma 3.2.4.
Throughout this section, we shall consider the eigenvalues of GUE matrices
with complex Gaussian entries of unit variance as in Theorem 2.5.2, and later
normalize the eigenvalues to study convergence issues. We shall be interested in
symmetric statistics of the eigenvalues. For p N, recalling the joint distributions
(2)
PN of the unordered eigenvalues of the GUE, see Remark 2.5.3, we call its
marginal P p,N on p coordinates the distribution of p unordered eigenvalues of
3.2 H ERMITE POLYNOMIALS AND THE GUE 95
(2)
the GUE. More explicitly, P p,N is the probability measure on R p so that, for any
f Cb (R p ),
(2) (2)
f (1 , , p )dP p,N (1 , , p ) = f (1 , , p )dPN (1 , , N )
(2)
(recall that PN is the law of the unordered eigenvalues). Clearly, one also has
(2)
f (1 , , p )dP p,N (1 , , p )
(N p)!
(2)
= f ( (1) , , (p) )dPN (1 , , N ) ,
N! S p,N
where S p,N is the set of injective maps from {1, , p} into {1, , N}. Note that
(2) (2)
we automatically have PN,N = PN .
We now introduce the Hermite polynomials and associated normalized (har-
monic) oscillator wave-function.
ex /4 Hn (x)
2
n (x) = .
2 n!
d 2 n
x is taken as the definition of the nth Her-
2
(Often, in the literature, (1)n ex dx ne
mite polynomial. We find (3.2.1) more convenient.)
For our needs, the most important property of the oscillator wave-functions is
their orthogonality relations
k (x) (x)dx = k . (3.2.2)
We will also use the monic property of the Hermite polynomials, that is
The proofs of (3.2.2) and (3.2.3) appear in Subsection 3.2.2, see Lemmas 3.2.7
and 3.2.5.
(2)
We are finally ready to describe the determinantal structure of P p,N . (See Sec-
tion 4.2 for more information on implications of this determinantal structure.)
96 3. S PACINGS FOR G AUSSIAN ENSEMBLES
(2)
Lemma 3.2.2 For any p N, the law P p,N is absolutely continuous with respect
to Lebesgue measure, with density
(2) (N p)! p (N)
p,N (1 , , p ) = det K (k , l ) ,
N! k,l=1
where
N1
K (N) (x, y) = k (x)k (y) . (3.2.4)
k=0
(2)
Proof Theorem 2.5.2 shows that p,N exists and equals
N N
|(x)|2 exi /2
(2) 2
p,N (1 , , p ) = Cp,N d i , (3.2.5)
i=1 i=p+1
where xi = i for i p and i for i > p, and C p,N is a normalization constant. The
fundamental remark is that this density depends on the Vandermonde determinant
N N
(x) = (x j xi ) = det xij1 = det H j1 (xi ) ,
i, j=1 i, j=1
(3.2.6)
1i< jN
where in the last line we used the fact that det(AB) = det(A) det(B) with A = B =
( j1 (i ))i, j=1 . Here, CN,N = k=0 ( 2 k!)CN,N .
N N1
(2)
Of course, from (2.5.4) we know that CN,N = CN . We provide now yet another
direct evaluation of the normalization constant, following [Meh91]. We introduce
a trick that will be very often applied in the sequel.
Proof Using the identity det(AB) = det(A) det(B) applied to A = { fk (xi )}ik and
B = {gk (x j )}k j , we get
n n n n n n
det
i, j=1
fk (xi )gk (x j ) dxi = det fi (x j ) det gi (x j ) dxi ,
i, j=1 i, j=1
k=1 i=1 i=1
which equals, by expanding the determinants involving the families {gi } and { fi },
n n
( ) ( ) f (i) (xi ) g (i) (xi )) dxi
, Sn i=1 i=1
n
= ( ) ( ) f (i) (x)g (i) (x)dx
, Sn i=1
n n
= n! ( ) fi (x)g (i) (x)dx = n! det
i, j=1
fi (x)g j (x)dx.
Sn i=1
where in the first equality we used the orthogonality of the family { j } to con-
clude that the contribution comes only from permutations of SN for which (i) =
98 3. S PACINGS FOR G AUSSIAN ENSEMBLES
p
(2)
p,N (1 , , p ) = Cp,N det (K (N) (i , j )).
i, j=1
p 2
1 = Cp,N det j 1 (i )
i, j=1
d 1 d p , (3.2.11)
11 << p N
p 2
det j 1 (i ) d 1 d p = p!.
i, j=1
Thus, since there are (N!)/((N p)!p!) terms in the sum at the right side of
(3.2.11), we conclude that C p,N = (N p)!/N!.
Now we arrive at the main point, on which the study of the local properties of
the GUE will be based.
.
N k
(1)k k
{i A}) = 1 + det K (N) (xi , x j ) dxi . (3.2.12)
(2)
PN (
i=1 k=1 k! Ac Ac i, j=1 i=1
(The proof will show that the sum in (3.2.12) is actually finite.) The last expres-
sion appearing in (3.2.12) is a Fredholm determinant. The latter are discussed in
greater detail in Section 3.4.
Proof By using Lemmas 3.2.2 and 3.2.3 in the first equality, and the orthogonality
relations (3.2.2) in the second equality, we have
(2)
PN [i A, i = 1, . . . , N]
N1 N1
= det i (x) j (x)dx = det i j i (x) j (x)dx
i, j=0 A i, j=0 Ac
N
k
= 1 + (1)k det i (x) j (x)dx ,
k=1 01 <<k N1 i, j=1 Ac
3.2 H ERMITE POLYNOMIALS AND THE GUE 99
Therefore,
(2)
PN [i A, i = 1, . . . , N]
N 2 k
(1)k k
= 1+ det i (x j ) dxi
k=1 k! Ac Ac 0 << N1 i, j=1 i=1
1 k
N k
(1)k k
= 1+ det K (N) (xi , x j ) dxi
k=1 k! Ac Ac i, j=1 i=1
k
(1)k k
= 1+ det K (N) (xi , x j ) dxi , (3.2.13)
k=1 k! Ac Ac i, j=1 i=1
where the first equality uses (3.2.8) with gi (x) = fi (x) = i (x)1Ac (x), the second
equality uses the CauchyBinet Theorem A.2, and the last step is trivial since the
determinant detki, j=1 K (N) (xi , x j ) has to vanish identically for k > N because the
rank of {K (N) (xi , x j )}ki, j=1 is at most N.
Recall the definition of the Hermite polynomials, Definition 3.2.1. Some proper-
ties of the Hermite polynomials are collected in the following lemma. Through-
out, we use the notation f , gG = R f (x)g(x)ex /2 dx. In anticipation of further
2
development, we collect much more information than was needed so far. Thus,
the proof of Lemma 3.2.5 may be skipped at first reading. Note that (3.2.3) is the
second point of Lemma 3.2.5.
Property 2 shows that {Hn }n0 is a basis of polynomial functions, whereas prop-
erty 5 implies that it is an orthogonal basis for the scalar product f , gG defined
on L2 (ex /2 dx) (since the polynomial functions are dense in the latter space).
2
Remark 3.2.6 Properties 7 and 10 are the three-term recurrence and the Christoffel
Darboux identity satisfied by the Hermite polynomials, respectively.
Proof of Lemma 3.2.5 Properties 1, 2 and 3 are clear. To prove property 5, use
integration by parts to get that
x2 /2 dl
= (1) Hk (x) l (ex /2 )dx
l 2
Hk (x)Hl (x)e dx
dx
" l #
d
ex /2 dx
2
= Hk (x)
dxl
vanishes
if l > k (since the degree of Hk is strictly less than l), and is equal to
2 k! if k = l, by property 2. Then, we deduce property 4 since, by property 3,
H2n is an even function and so is the function ex /2 . Properties 2 and 5 suffice
2
equals the analogous integral with G(x, y) replacing F(x, y); we leave the details to
the reader. Equality of these integrals granted, property 10 follows since {Hk }k0
being a basis of the set of polynomials, it implies almost sure equality and hence
3.3 T HE SEMICIRCLE LAW REVISITED 101
Recall next the oscillator wave-functions, see Definition 3.2.1. Their basic
properties are contained in the following lemma, which is an easy corollary of
Lemma 3.2.5. Note that (3.2.2) is just the first point of the lemma.
We remark that the last relation above is the one-dimensional Schrodinger equa-
tion for the eigenstates of the one-dimensional quantum-mechanical harmonic os-
cillator. This explains the terminology.
LN = ( N /N + + N /N )/N (3.3.1)
1 N
denote the empirical distribution of the eigenvalues of the rescaled matrix XN / N.
LN thus corresponds to the eigenvalues of a Gaussian Wigner matrix.
We are going to make the average empirical distribution LN explicit in terms
of Hermite polynomials, calculate the moments of LN explicitly, check that the
moments of LN converge to those of the semicircle law, and thus provide an al-
ternative proof of Lemma 2.1.7. We also derive a recursion for the moments of
LN and
estimate the order of fluctuation of the renormalized maximum eigenvalue
NN / N above the spectrum edge, an observation that will be useful in Section
3.7.
102 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Therefore
d (n)
K (x, x)/ n = n (x)n1 (x) n1
(x)n (x) = n (x)n1 (x). (3.3.4)
dx
By (3.3.3) the function K (N) ( Nx, Nx)/ N is the RadonNikodym derivative
of LN with respect to Lebesgue measure and hence we have the following repre-
sentation of the moment-generating function of LN :
1
LN , e =
s
esx/ N
K (N) (x, x)dx. (3.3.5)
N
Thus the calculation of the moment generating function of LN boils down to the
problem of evaluating the integral on the right.
By Taylors theorem it follows from point 8 of Lemma 3.2.5 that, for any n,
n n
n n
Hn (x + t) = Hnk (x)t =
k
Hk (x)t nk .
k=0
k k=0
k
3.3 T HE SEMICIRCLE LAW REVISITED 103
Let Stn =: e n (x)n1 (x)dx.
tx By the preceding identity and orthogonality we
have
n
Hn (x)Hn1 (x)ex /2+tx dx
2
Stn =
n! 2
t 2 /2
ne
Hn (x + t)Hn1 (x + t)ex /2 dx
2
=
n! 2
t 2 /2 n1
n1
k! n
= e n t 2n12k .
k=0 n!
k k
Changing the index of summation in the last sum from k to n 1 k, we then get
t 2 /2 (n 1 k)! n1
n1
n
n
St = e n t 2k+1
k=0 n! n 1 k n 1 k
n1
(n 1 k)! n n1
n
2
t /2
= e t 2k+1 .
k=0 n! k + 1 k
From the last calculation combined with (3.3.6) and after a further bit of re-
arrangement we obtain (3.3.2).
We can now present another
Proof of Lemma 2.1.7 (for Gaussian Wigner matrices) We have written the
moment generating function in the form (3.3.2), making it obvious that as N
the moments of LN tend to the moments of the semicircle distribution.
Recall that, throughout this chapter, NN denotes the maximal eigenvalue of a GUE
matrix. Our goal in this section is to provide the proof of the following lemma.
Lemma 3.3.2 (Ledouxs bound) There exist positive constants c and C such that
N
2/3
P N eN C ec , (3.3.7)
2 N
for all N 1 and > 0.
Roughly speaking, the last inequality says that fluctuations of the rescaled top
eigenvalue NN := NN /2 N 1 above 0 are of order of magnitude N 2/3 . This is
an a priori indication that the random variables N 2/3 NN converge in distribution,
as stated in Theorems 3.1.4 and 3.1.5. In fact, (3.3.7) is going to play a role in the
proof of Theorem 3.1.4, see Subsection 3.7.1.
104 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Proof of Lemma 3.3.2 From (3.3.8) and the definitions we obtain the inequalities
(N) (N) k(k + 1) (N)
0 bk bk+1 1 + bk
4N 2
for N 1, k 0. As a consequence, we deduce that
3
(N) c k2
bk e N , (3.3.13)
for some finite constant c > 0. By Stirlings approximation (2.5.12) we have
k3/2 2k
sup 2k < .
k=0 2 (k + 1) k
It follows from (3.3.13) and the last display that, for appropriate positive constants
c and C,
N 2k
N NN
P e E (3.3.14)
2 N 2 Ne
(N)
e2 k Nbk 2k
CNt 3/2 e2 t+ct /N ,
3 2
22k (k + 1) k
for all N 1, k 0 and real numbers ,t > 0 such that k = t, where t denotes
the largest integer smaller than or equal to t. Taking t = N 2/3 and substituting
N 2/3 for yields the lemma.
Exercise 3.3.4 Prove that, in the setup of this section, for every integer k it holds
that
lim ELN , xk 2 = lim LN , xk 2 . (3.3.15)
N N
Using the fact that the moments of LN converge to the moments of the semicircle
distribution, complete yet another proof of Wigners Theorem 2.1.1 in the GUE
setup.
Hint: Deduce from (3.3.3) that
1
LN , xk = xk K (N) (x, x)dx .
N k/2+1
Also, rewrite ELN , xk 2 as
N N N
1 1
= ( xik )2 det K (N) (xi , x j ) dx j
N 2+k N! i=1 i, j=1 j=1
2
! 1 1
= x2k K (N) (x, y)2 dxdy + xk K (N) (x, x)dx
N k+2 N k+2
(N)
= LN , xk 2 + Ik ,
106 3. S PACINGS FOR G AUSSIAN ENSEMBLES
(N)
where Ik is equal to
1 x2k xk yk
(N (x)N1 (y) N1 (x)N (y))K(x, y)dxdy .
N k+3/2 xy
To prove the equality marked with the exclamation point, show that
K (n) (x,t)K (n) (t, y)dt = K (n) (x, y) ,
(N)
while the expression for Ik uses the ChristoffelDarboux formula (see Section
(N)
3.2.1). To complete the proof of (3.3.15), show that limN Ik = 0, expanding
the expression
x2k xk yk
(N (x)N1 (y) N1 (x)N (y))
xy
as a linear combination of the functions (x)m (y) by exploiting the three-term
recurrence (see Section 3.2.1) satisfied by the oscillator wave-functions.
Exercise 3.3.5 With the notation of Lemma 3.3.2, show that there exist c ,C > 0
so that, for all N 1, if > 1 then
N
2/3 1
3
P N eN C 3 ec 2 .
2 N 4
This bound improves upon (3.3.7) for large .
Hint: optimize differently over the parameter t at the end of the proof of Lemma
3.3.2, replacing there by N 2/3 .
Exercise 3.3.6 The function Fn (t) defined in (3.3.9) is a particular case of the
general hypergeometric function, see [GrKP94]. Let
xk = x(x + 1) (x + k 1)
be the ascending factorial power. The general hypergeometric function is given
by the rule
a1 a p ak ak k
p t
F
b1 bq t = 1
.
k=0 b bq k!
k k
1
We have seen in Lemma 3.2.4 that a certain gap probability, i.e. the probability
that a set does not contain any eigenvalue, is given by a Fredholm determinant.
The asymptotic study of gap probabilities thus involves the analysis of such de-
terminants. Toward this end, in this section we review key definitions and facts
concerning Fredholm determinants. We make no attempt to achieve great gen-
erality. In particular we do not touch here on any functional analytic aspects of
the theory of Fredholm determinants. The reader interested only in the proof of
Theorem 3.1.1 may skip Subsection 3.4.2 in a first reading.
Let X be a locally compact Polish space, with BX denoting its Borel -algebra.
Let be a complex-valued measure on (X, BX ), such that
1 = | (dx)| < . (3.4.1)
X
Given two kernels K(x, y) and L(x, y), define their composition (with respect to )
as
(K L)(x, y) = K(x, z)L(z, y)d (z). (3.4.4)
The trace in (3.4.3) and the composition in (3.4.4) are well defined because 1 <
and K < , and further, K L is itself a kernel. By Fubinis Theorem, for any
108 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Lemma 3.4.2 Fix n > 0. For any two kernels F(x, y) and G(x, y) we have
n n
det F(xi , y j ) det G(xi , y j ) n1+n/2 F G max(F, G)n1 (3.4.5)
i, j=1 i, j=1
and
n
det F(xi , y j ) nn/2 Fn . (3.4.6)
i, j=1
The factor nn/2 in (3.4.5) and (3.4.6) comes from Hadamards inequality (Theorem
A.3). In view of Stirlings approximation (2.5.12), it is clear that the Hadamard
bound is much better than the bound n! we would get just by counting terms.
Proof Define
G(x, y) if i < k,
(k)
Hi (x, y) = F(x, y) G(x, y) if i = k,
F(x, y) if i > k,
noting that, by the linearity of the determinant with respect to rows,
n n n n
i,det
(k)
det F(xi , y j ) det G(xi , y j ) = Hi (xi , y j ) . (3.4.7)
i, j=1 i, j=1 j=1
k=1
(k) (k)
Considering the vectors vi = vi with vi ( j) = Hi (xi , y j ), and applying Hadamards
inequality (Theorem A.3), one gets
n
det H (k) (xi , y j ) nn/2 F G max(F, G)n1 .
i
i, j=1
Substituting in (3.4.7) yields (3.4.5). Noting that the summation in (3.4.7) involves
only one nonzero term when G = 0, one obtains (3.4.6).
We are now finally ready to define the Fredholm determinant associated with a
kernel K(x, y). For n > 0, put
n
n = n (K, ) = det K(i , j )d (1 ) d (n ) , (3.4.8)
i, j=1
3.4 F REDHOLM DETERMINANTS 109
Definition 3.4.3 The Fredholm determinant associated with the kernel K is defined
as
(1)n
(K) = (K, ) = n (K, ).
n=0 n!
(As in (3.4.8) and Definition 3.4.3, we often suppress the dependence on from
the notation for Fredholm determinants.) In view of Stirlings approximation
(2.5.12) and estimate (3.4.9), the series in Definition 3.4.3 converges absolutely,
and so (K) is well defined. The reader should not confuse the Fredholm determi-
nant (K) with the Vandermonde determinant (x): in the former, the argument
is a kernel while, in the latter, it is a vector.
Remark 3.4.4 Here is some motivation for calling (K) a determinant. Let
f1 (x), . . . , fN (x), g1 (x), . . . , gN (x) be given. Put
N
K(x, y) = fi (x)gi (y).
i=1
Assume further that maxi supx fi (x) < and max j supy g j (y) < . Then K(x, y) is
a kernel and so fits into the theory developed thus far. Paraphrasing the proof of
Lemma 3.2.4, we have that
N
(K) = det i j fi (x)g j (x)d (x) . (3.4.10)
i, j=1
For this reason, one often encounters the notation det(I K) for the Fredholm
determinant of K.
The determinants (K) inherit good continuity properties with respect to the
norm.
Lemma 3.4.5 For any two kernels K(x, y) and L(x, y) we have
n1+n/2 n1 max(K, L)n1
|(K) (L)| K L . (3.4.11)
n=1 n!
110 3. S PACINGS FOR G AUSSIAN ENSEMBLES
In particular, with K held fixed, and with L varying in such a way that K L
0, it follows that (L) (K). This is the only thing we shall need to obtain
the convergence in law of the spacing distribution of the eigenvalues of the GUE,
Theorem 3.1.1. On the other hand, the next subsections will be useful in the proof
of Theorem 3.1.2.
Throughout, we fix a measure and a kernel K(x, y). We put = (K). All the
constructions under this heading depend on K and , but we suppress reference to
this dependence in the notation in order to control clutter. Define, for any integer
n 1,
x1 . . . xn n
K = det K(xi , y j ) , (3.4.12)
y1 . . . yn i, j=1
set
x 1 ... n
Hn (x, y) = K d (1 ) d (n ) (3.4.13)
y 1 ... n
and
H0 (x, y) = K(x, y) .
We then have from Lemma 3.4.2 that
|Hn (x, y)| Kn+1 n1 (n + 1)(n+1)/2 . (3.4.14)
Definition 3.4.6 The Fredholm adjugant of the kernel K(x, y) is the function
(1)n
H(x, y) = Hn (x, y) . (3.4.15)
n=0 n!
Lemma 3.4.7 (The fundamental identity) Let H(x, y) be the Fredholm adjugant
of the kernel K(x, y). Then,
K(x, z)H(z, y)d (z) = H(x, y) (K) K(x, y)
= H(x, z)K(z, y)d (z) , (3.4.18)
K H = H (K) K = H K . (3.4.19)
Remark 3.4.8 Before proving the fundamental identity (3.4.19), we make some
amplifying remarks. If (K) = 0 and hence the resolvent R(x, y) = H(x, y)/(K)
of K(x, y) is well defined, then the fundamental identity takes the form
K(x, z)R(z, y)d (z) = R(x, y) K(x, y) = R(x, z)K(z, y)d (z) (3.4.20)
K R = RK = RK.
It is helpful if not perfectly rigorous to rewrite the last formula as the operator
identity
1 + R = (1 K)1 .
Rigor is lacking here because we have not taken the trouble to associate linear
operators with our kernels. Lack of rigor notwithstanding, the last formula makes
it clear that R(x, y) deserves to be called the resolvent of K(x, y). Moreover, this
formula is useful for discovering composition identities which one can then verify
directly and rigorously.
Proof of Lemma 3.4.7 Here are two reductions to the proof of the fundamental
identity. Firstly, it is enough just to prove the first of the equalities claimed in
(3.4.18) because the second is proved similarly. Secondly, proceeding term by
112 3. S PACINGS FOR G AUSSIAN ENSEMBLES
where n = n (K).
Now we can quickly give the proof of the fundamental identity (3.4.19). Ex-
panding by minors of the first row, we find that
x 1 . . . n
K
y 1 . . . n
1 . . . n
= K(x, y)K
1 . . . n
n
1 . . . j1 j j+1 . . . n
+ (1) j K(x, j )K
j=1
y 1 . . . j1 j+1 . . . n
1 . . . n
= K(x, y)K
1 . . . n
n
j 1 . . . j1 j+1 . . . n
K(x, j )K .
j=1
y 1 . . . j1 j+1 . . . n
We extract two further benefits from the proof of the fundamental identity. Re-
call from (3.4.8) and Definition 3.4.3 the abbreviated notation n = n (K) and
(K).
(ii) Further,
n
(1)n (1)k
n+1 = k tr(K
3 45
K6) . (3.4.23)
n! k=0 k! n+1k
Proof Part (i) follows from (3.4.21) by employing an induction on n. We leave the
details to the reader. Part (ii) follows by putting x = and y = in (3.4.22), and
integrating out the variable .
We now prove a result needed for our later analysis of GOE and GSE. A reader
interested only in GUE can skip this material.
(K + L L K) = (K)(L) . (3.4.24)
In the sequel we refer to this relation as the multiplicativity of the Fredholm deter-
minant construction.
Proof Let t be a complex variable. We are going to prove multiplicativity by
studying the entire function
of t. We assume below that K,L (t) does not vanish identically, for otherwise there
is nothing to prove. We claim that
K,L (0) = (K) tr(L L K) + tr((L L K) H)
= (K) tr(L) , (3.4.25)
where H is the Fredholm adjugant of K, see equation (3.4.15). The first step
is justified by differentiation under the integral; to justify the exchange of limits
one notes that for any entire analytic function f (z) and > 0 one has f (0) =
1 f (z)
2 i |z|= z2 dz, and then uses Fubinis Theorem. The second step follows by the
fundamental identity, see Lemma 3.4.7. This completes the proof of (3.4.25).
Since 0,L (t) = (tL) equals 1 for t = 0, the product 0,L (t)K,L (t) does not
vanish identically. Arbitrarily fix a complex number t0 such that 0,L (t0 )K,L (t0 ) =
0. Note that the resolvant S of t0 L is defined. One can verify by straightforward
calculation that the kernels
K = K + t0 (L L K), L = L + L S , (3.4.26)
With K and L as in (3.4.26), we have K,L (t) = K,L (t + t0 ) by (3.4.27) and hence
d
log K,L (t) = tr(L)
dt t=t0
by (3.4.25). Now the last identity holds also for K = 0 and the right side is inde-
pendent of K. It follows that the logarithmic derivatives of the functions 0,L (t)
and K,L (t) agree wherever neither has a pole, and so these logarithmic deriva-
tives must be identically equal. Integrating and exponentiating once we obtain an
identity K,L (t) = K,L (0)0,L (t) of entire functions of t. Finally, by evaluating
the last relation at t = 1, we recover the multiplicativity relation (3.4.24).
see the statement of Theorem 3.1.1. We note that a priori, because of Theorems
2.1.1 and 2.1.22, the limit in (3.5.1)
Nhas some chance of being nondegenerate
because the N random variables N 1 ,. . ., N N are spread out over an interval
N
very nearly of length 4N. As we will show in Section 4.2, the computation of the
limit in (3.5.1) allows one to evaluate other limits, such as the limit of the empirical
measure of the spacings in the bulk of the spectrum.
As in (3.2.4), set
n1 n (x)n1 (y) n1 (x)n (y)
K (n) (x, y) = k (x)k (y) = n
xy
,
k=0
where the k (x) are the normalized oscillator wave-functions introduced in Defi-
nition 3.2.1. Set
1 x y
S(n) (x, y) = K (n) , .
n n n
A crucial step in the proof of Theorem 3.1.1 is the following lemma, whose proof,
which takes most of the analysis in this section, is deferred.
(The scaling of Lebesgues measure in the last equality explains the appearance
of the scaling by 1/ n in the definition of S(n) (x, y).) Lemma 3.5.1 together with
Lemma 3.4.5 complete the proof of the theorem.
The proof of Lemma 3.5.1 takes up the rest of this section. We begin by bring-
ing, in Subsection 3.5.1, a quick introduction to Laplaces method for the evalua-
tion of asymptotics of integrals, which will be useful for other asymptotic compu-
tations, as well. We then apply it in Subsection 3.5.2 to conclude the proof.
Remark 3.5.2 We remark that one is naturally tempted to guess that the ran-
dom variable WN =width of the largest
open interval
symmetric about the origin
containing none of the eigenvalues N 1N , . . . , N NN should possess a limit in
distribution. Note however that we do not a priori have tightness for that random
variable. But, as we show in Section 3.6, we do have tightness (see (3.6.34) be-
low) a posteriori. In particular, in Section 3.6 we prove Theorem 3.1.2, which
provides an explicit expression for the limit distribution of WN .
We will be concerned with the situation in which the function f possesses a global
maximum at some point a, and behaves quadratically in a neighborhood of that
maximum. More precisely, let f : R R+ be given, and for some constant a
and positive constants s0 , K, L, M, let G = G (a, 0 , s0 , f (), K, L, M) be the class of
measurable functions g : R R satisfying the following conditions:
(i) |g(a)| K;
116 3. S PACINGS FOR G AUSSIAN ENSEMBLES
(ii) sup0<|xa|0 g(x)g(a)
xa L;
(iii) f (x)s0 |g(x)|dx M.
Note that by point (b) of the assumptions, f (a) > 0. The intuition here is that as s
tends to infinity the function ( f (x)/ f (a))s near x = a peaks more and more sharply
and looks at the microscopic level more and more like a bell-curve, whereas f (x)s
elsewhere becomes negligible. Formula (3.5.3) is arguably the simplest nontrivial
application of Laplaces method. Later we are going to encounter more sophisti-
cated applications.
Proof of Theorem 3.5.3 Let (s) be a positive function defined for s s0 such
that (s) s 0 and s (s)2 s , while 0 = supss0 (s). For example, we
could take (s) = 0 (s0 /s)1/4 . For s s0 , write
f (x)s g(x)dx = g(a)I1 + I2 + I3 ,
where
I1 = |xa| (s) f (x)s dx ,
I2 = |xa| (s) f (x)s (g(x) g(a))dx ,
I3 = |xa|> (s) f (x)s g(x)dx .
thus defining a continuous function of t such that h(0) = f (a)/2 f (a) and which
by Taylors Theorem satisfies
and hence
lim s f (a)s I3 = 0 .
s
This is enough to prove that the limit formula (3.5.3) holds and enough also to
prove the uniformity of convergence over all functions g(x) of the class G .
The main step in the proof of Lemma 3.5.1 is the following uniform convergence
result, whose proof is deferred. Let
1 t
(t) = n 4 ,
n
with a quantity whose difference from n is fixed (in the proof of Lemma 3.5.1,
we will use = n, n 1, n 2).
In order to prove the claimed uniform convergence, it is useful to get rid of the
division by (x y) in S(n) (x, y). Toward this end, noting that for any differentiable
functions f , g on R,
f (x)g(y) f (y)g(x)
xy
f (x) f (y) g(y) g(x)
= g(y) + f (y)
xy xy
1 1
= g(y) f (tx + (1 t)y)dt f (y) g (tx + (1 t)y)dt , (3.5.5)
0 0
we deduce
1
y x y
S(n) (x, y) = n1 ( ) n (t + (1 t) )dt
n 0 n n
1
y x y
n ( ) n1 (t + (1 t) )dt (3.5.6)
n 0 n n
1
y z
= n1 ( ) ( nn1 (z) n (z))|z=t x +(1t) y dt
n 0 2 n n
1
y z
n ( ) ( n 1n2 (z) n1 (z))|z=t x +(1t) y dt ,
n 0 2 n n
where we used in the last equality point 4 of Lemma 3.2.7. Using (3.5.4) (in the
case = n, n 1, n 2) in (3.5.6) and elementary trigonometric formulas shows
that
1 (n 1) 1 (n 1)
S(n) (x, y) cos(y ) cos tx + (1 t)y dt
2 0 2
n 1 (n 2)
cos(y ) cos tx + (1 t)y dt
2 0 2
1 sin(x y)
,
xy
which, Lemma 3.5.4 granted, completes the proof of Lemma 3.5.1.
e
2 /2i x
(x) = d . (3.5.7)
(2 )3/4 !
We use the letter here instead of n to help avoid confusion at the next step. As a
consequence, setting C ,n = n/(2 ), we have
i et /(4n) n1/4
2
2 /2i t/n
(t) = e d
(2 )3/4 !
(2 )1/4C ,n et /(4n) n1/4+ /2
2
( e /2 )n i ei t n d
2
=
!
(2 )1/4C ,n n1/4+n/2
( e /2 )n i ei t n d
2
n!
| e |n [(i sign ) ei t ]| n |d ,
2 /2
C ,n en/2
where Stirlings approximation (2.5.12) and the fact that (t) is real were used
in the last line. Using symmetry, we can rewrite the last expressions as
2C ,n en/2 f ( )n gt ( )d ,
Exercise 3.5.5 Use Laplaces method (Theorem 3.5.3) with a = 1 to prove (2.5.12):
as s along the positive real axis,
dx dx
(s) = xs ex = ss (xex )s 2 ss1/2 es .
0 x 0 x
120 3. S PACINGS FOR G AUSSIAN ENSEMBLES
That is, the generating function in the left side of (3.5.8) can be represented in
terms of a Fredholm determinant. We note that this holds in greater generality, see
Section 4.2.
Proof The proof is a slight modification of the method presented in Subsection
3.5.2. Note that the right side of (3.5.9) defines, by the fundamental estimate
(3.4.9), an entire function of the complex variables s1 , . . . , s p , whereas the left side
defines a function analytic in a domain containing the product of p copies of the
unit disc centered at the origin. Clearly we have
N
E 1
N iN = PN (1 , . . . , p ; A1 , . . . , A p )s11 s pp .
i=1 1 ,..., p 0
1 ++ p N
(3.5.10)
The function of s1 , . . . , s p on the right is simply a polynomial, whereas the expec-
tation on the left can be represented as a Fredholm determinant. From this, the
lemma follows after representing the probability PN (1 , . . . , p ; A1 , . . . , A p ) as a p-
dimensional Cauchy integral.
Our goal in this section is to derive differential equations (in the parameter t)
for the probability that no eigenvalue of the (properly rescaled) GUE lies in the
interval (t/2,t/2). We will actually derive slightly more general systems of
differential equations that can be used to evaluate expressions like (3.5.9).
s1 , . . . , sn1 , s0 = 0 = sn .
Set
= s1 1(t1 ,t2 ) + + sn1 1(tn1 ,tn ) ,
and define so that it has density with respect to the Lebesgue measure on
X = R. We then have, for f L1 [(a, b)],
n1 ti+1
f , = f (x)d (x) = si ti
f (x)dx .
i=1
and hence R is also of rank 2. Letting P(x) = (1 + R) cos(x)/ and Q(x) =
(1 + R) sin(x)/ , we then obtain R = Q(x)P(y) Q(y)P(x), and thus
Q(x)P(y) Q(y)P(x)
R(x, y) = . (3.6.2)
xy
(See Lemma 3.6.2 below for the precise statement and proof.) One checks that
differentiating with respect to the endpoints t1 ,t2 the function log (S) yields the
functions R(ti ,ti ), i = 1, 2, which in turn may be related to derivatives of P and
Q by a careful differentiation, using (3.6.2). The system of differential equations
thus obtained, see Theorem 3.6.2, can then be simplified, after specialization to
the case t2 = t1 = t/2, to yield the Painleve V equation appearing in Theorem
3.1.2.
Turning to the actual derivation, we consider the parameters t1 , . . . ,tn as vari-
able, whereas we consider the kernel S(x, y) and the parameters s1 , . . . , sn1 to be
fixed. Motivated by the sketch above, set f (x) = (sin x)/ and
Q(x) = f (x) + R(x, y) f (y) d (y), P(x) = f (x) + R(x, y) f (y) d (y) .
(3.6.3)
We emphasize that P(x), Q(x) and R(x, y) depend on t1 , . . . ,tn (through ), al-
though the notation does not show it. The main result of this section, of which
Theorem 3.1.2 is an easy corollary, is the following system of differential equa-
tions.
Ri j = (qi p j q j pi )/(ti t j ) ,
q j / ti = (si si1 ) R ji qi ,
p j / ti = (si si1 ) R ji pi ,
qi / ti = +pi + (sk sk1 ) Rik qk ,
k=i
with H1 as in (3.4.13). Multiplying by (1) /! and summing, using the esti-
mate (3.4.9) and dominated convergence, we find that
= (si si1 )H(ti ,ti ) . (3.6.6)
ti
From (3.6.6) we get
log = (si si1 )R(ti ,ti ) . (3.6.7)
ti
124 3. S PACINGS FOR G AUSSIAN ENSEMBLES
We also need to be able to differentiate R(x, y). From the fundamental identity
(3.4.20), we have
R(z, y)
R(x, y) = (si si1 )R(x,ti )S(ti , y) + S(x, z) (dz) . (3.6.8)
ti ti
Substituting y = z in (3.6.8) and integrating against R(z , y) with respect to (dz )
gives
R(x, z )
R(z , y) (dz ) = (si si1 )R(x,ti ) S(ti , z )R(z , y) (dz )
ti
R(z, z )
+ S(x, z) R(z , y) (dz) (dz ) . (3.6.9)
ti
Summing (3.6.8) and (3.6.9) and using again the fundamental identity (3.4.20)
then yields
R(x, y) = (si1 si )R(x,ti )R(ti , y) . (3.6.10)
ti
The next lemma will play an important role in the proof of Theorem 3.6.1.
Q(x) = (si1 si )R(x,ti )Q(ti ) , (3.6.13)
ti
and similarly
P(x) = (si1 si )R(x,ti )P(ti ) . (3.6.14)
ti
R S + R S = R S .
3.6 A NALYSIS OF THE SINE - KERNEL 125
(ii) Using the fact that the kernel S is an entire function, extend the definitions of
H , H and in the setup of this section to analytic functions in the parameters
t1 , . . . ,tn , s1 , . . . , sn1 .
(iii) View the signed measure as defining a family of distributions (in the
sense of Schwartz) on the interval (a, b) depending on the parameters t1 , . . . ,tn , by
the formula
n1 ti+1
, = si ti
(x)dx ,
i=1
valid for any smooth function (x) on (a, b). Show that / ti is a distribution
satisfying
= (si1 si )ti (3.6.16)
ti
for i = 1, . . . , n, and that the distributional derivative (d/dx) of satisfies
d n n
= (si si1 )ti = . (3.6.17)
dx i=1 i=1 ti
(iv) Use (3.6.16) to justify (3.6.5) and step (i) to justify (3.6.6).
To proceed farther we need means for differentiating Q(x) and P(x) both with
respect to x and with respect to the parameters t1 , . . . ,tn . To this end we introduce
the further abbreviated notation
S (x, y) = + S(x, y) = 0, R (x, y) = + R(x, y)
x y x y
and
n
(F G)(x, y) = F(x, z)G(z, y)d (z) := (si si1 )F(x,ti )G(ti , y) ,
i=1
RS = RS = S R,
R S + R S + R S = R S .
3.6 A NALYSIS OF THE SINE - KERNEL 127
Applying the operation R on both sides of the last equation we find that
R (R S) + R (R S) + R S R = R R S R .
Adding the last two equations and then making the obvious cancellations (includ-
ing now the cancellation S = 0) we find that
R = R R .
Now we can differentiate Q(x) and P(x). We have from the last identity
Q (x) = f (x) + R(x, y) f (y)d (y)
x
= f (x) R(x, y) f (y)d (y)
y
+ R(x,t)R(t, y)d (t) f (y)d (y) .
and similarly
n
P (x) = Q(x) + (sk sk1 )R(x,tk )P(tk ) . (3.6.20)
k=1
n
Q(ti ) = P(ti ) + (sk sk1 )R(ti ,tk )Q(tk ) . (3.6.21)
ti k=1,k=i
n
P(ti ) = Q(ti ) + (sk sk1 )R(ti ,tk )P(tk ). (3.6.22)
ti k=1,k=i
R(ti ,ti ) = P(ti ) Q(ti ) Q(ti ) P(ti ) . (3.6.23)
ti ti
(Note that the terms involving Q(x)/ ti |x=ti cancel out to yield the above equal-
ity.) Unraveling the definitions, this completes the proof of (3.6.4) and hence of
Theorem 3.6.1.
1 1
( / t2 / t1 ) log (t1 ,t2 ) = s(p21 + q21 + p22 + q22 ) + s2 (t2 t1 )R221 ,
2 2
1
( q1 / t2 q1 / t1 ) = p1 /2 + sR12 q2 ,
2
1
( p1 / t2 p1 / t1 ) = +q1 /2 + sR12 p2 . (3.6.26)
2
in order to emphasize the roles of the parameters t1 and t2 . To begin with, since
S(x + c, y + c) = S(x, y) ,
1 (1)n sn+1
p1 (t1 ,t2 ) = f (t1 ) +
(t1 ,t2 ) n=0 n!
t2 t2
t1 x1 . . . xn
S f (y) dx1 dxn dy
t1 t1 y x1 . . . xn
1 (1)n sn+1
= f (t1 ) +
(t2 , t1 ) n=0 n!
t2 t2
t1 x1 . . . xn
S f (y) dx1 dxn dy
t1 t1 y x1 . . . xn
1 (1)n sn+1
= f (t1 ) +
(t2 , t1 ) n=0 n!
t1 t1
t1 x1 . . . xn
S f (y) dx1 dxn dy
t2 t2 y x1 . . . xn
= p2 (t2 , t1 ) . (3.6.28)
Similarly, we have
= st(p2 + q2 ) + 4s2 q2 p2 ,
q = p/2 + 2spq2 /t , (3.6.31)
p = +q/2 2sp q/t , 2
= s(p2 + q2 ) ,
t = 4s2 (p3 q q3 p) . (3.6.32)
Using (3.6.32) together with the equation for from (3.6.31) to eliminate the
variables p, q, we obtain finally
4t( )3 + 4t 2 ( )2 4 ( )2 + 4 2 + (t )2 8t = 0 , (3.6.33)
Each of the terms inside the limit in the last display is an entire function in t, and
the convergence (in n) is uniform due to the boundedness of the kernel and the
Hadamard inequality, see Lemma 3.4.2. The claimed analyticity of in t follows.
We next explicitly compute a few terms of the expansion of in powers of t.
Indeed,
t/2 t/2 t/2 k sin(xi x j ) k
t/2
dx = t,
t/2
det dx j = O(t 4 ) for k 2 ,
t/2 i, j=1 (xi x j ) j=1
and hence the part of (3.6.25) dealing with follows. With more computational
effort, which we omit, one verifies the other part of (3.6.25).
Remark 3.6.5 We emphasize that we have not yet proved that the function F()
in Theorem 3.1.2 is a distribution function, that is, we have not shown tightness
for the sequence of gaps around 0. From the expansion at 0 of (t), see (3.1.2),
it follows immediately that limt0 F(t) = 0. To show that F(t) 1 as t
requires more work. One approach, that uses careful and nontrivial analysis of the
resolvent equation, see [Wid94] for the first rigorous proof, shows that in fact
(t) t 2 /4 as t + , (3.6.34)
implying that limt F(t) = 1. An easier approach, which does not however yield
such precise information, proceeds from the CLT for determinantal processes de-
veloped in Section 4.2; indeed, it is straightforward to verify, see Exercise 4.2.40,
that for the determinantal process determined by the sine-kernel, the expected
number of points in an interval of length L around 0 increases linearly in L, while
the variance increases only logarithmically in N. This is enough to show that with
A = [t/2,t/2], the right side of (3.1.1) decreases to 0 as t , which implies
that limt F(t) = 1. In particular, it follows that the random variable giving the
width
of the largest open interval centered at the origin in which no eigenvalue of
NXN appears is weakly convergent as N to a random variable with distri-
bution F.
132 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Our goal in this section is to study the spacing of eigenvalues at the edge of the
spectrum. The main result is the proof of Theorem 3.1.4, which is completed in
Subsection 3.7.1 (some technical estimates involving the steepest descent method
are postponed to Subsection 3.7.2). For the proof of Theorem 3.1.4, we need the
3.7 E DGE - SCALING 133
following a priori estimate on the Airy kernel. Its proof is postponed to Subsection
3.7.3, where additional properties of the Airy function are studied.
then
n (x) n (y) n (y) n (x) 1
A(n) (x, y) = 1/3 n (x)n (y) .
xy 2n
The following lemma plays the role of Lemma 3.5.1 in the study of the spacing in
the bulk. Its proof is rather technical and takes up most of Subsection 3.7.2.
Since the functions n are entire, the convergence in Lemma 3.7.2 entails the
uniform convergence of n to Ai on compact subsets of C. Together with Lemma
3.4.5, this completes the proof of the theorem.
Remark 3.7.3 An analysis similar to, but more elaborate than, the proof of Theo-
rem 3.1.4 shows that
" #
N
lim P N 2/3
2 t
N
N N
exists for each positive integer and real number t. In other words, the suitably
rescaled th largest eigenvalue converges vaguely and in fact weakly. Similar
statements can be made concerning the joint distribution of the rescaled top
eigenvalues.
In this subsection, we use the steepest descent method to prove Lemma 3.7.2.
The steepest descent method is a general, more elaborate version of the method
of Laplace discussed in Subsection 3.5.1, which is inadequate when oscillatory
integrands are involved. Indeed, consider the evaluation of integrals of the form
f (x)s g(x)dx ,
see (3.5.3), in the situation where f and g are analytic functions and the integral
is a contour integral. The oscillatory nature of f prevents the use of Laplaces
method. Instead, the oscillatory integral is tamed by modifying the contour of
integration in such a way that f can be written along the contour as e f with f real,
and the oscillations of g at a neighborhood of the critical points of f are slow. In
practice, one needs to consider slightly more general versions of this example, in
which g itself may depend (weakly) on s.
3.7 E DGE - SCALING 135
The main effort in the proof is to modify the contour integral in the formula above
in such a way that the leading asymptotic order of all terms in the integrand match,
and then keep track of the behavior of the integrand near its critical point. To carry
out this program, note that, by Cauchys Theorem, we may replace the contour of
integration in (3.7.5) by any straight line in the complex plane with slope of ab-
solute value greater than 1 oriented so that height above the real axis is increasing
(the condition on the slope is to ensure that no contribution appears from the con-
tour near ). Since (x) > 0 under our assumptions concerning u and n, we may
take the contour of integration in (3.7.5) to be the perpendicular bisector of the
line segment joining x to the origin, that is, replace by (x/2)(1 + ), to obtain
i
ex /8 (x/2)n+1
2
2 ( 2 /2 )
n (x) = (1 + )n e(x/2) d . (3.7.6)
i(2 )3/4 n! i
Let log be the principal branch of the logarithm, that is, the branch real on the
interval (0, ) and analytic in the complement of the interval (, 0], and set
F( ) = log(1 + ) + 2 /2 . (3.7.7)
Note that the leading term in the integrand in (3.7.6) has the form enF( ) , where
(F) has a maximum along the contour of integration at = 0, and a Taylor
expansion starting with 3 /3 in a neighborhood of that point (this explains the
particular scaling we took for u). Put
x 2/3
= , u = 2 n/ ,
2
where to define fractional powers of complex numbers such as that figuring in
the definition of we follow the rule that a = exp(a log ) whenever is in the
domain of our chosen branch of the logarithm. We remark that as n we have
u u and n1/3 , uniformly for |u| < C. Now rearrange (3.7.6) to the form
where
i
1 3 F( )u log(1+ )
In (u) = e d . (3.7.9)
2 i i
because we have
n1/12 (x/2)n+1/3 ex u n1/3 u
2 /8
1 u2
log = n+ log 1 + 2/3 1/3
en/2 nn/2+1/4 3 2n 2 8n
and hence
(2 )1/4 n1/12 (x/2)n+1/3 ex2 /8
lim sup 1 = 0 ,
n |u|<C n!
in the complex plane with corner at the origin. For each > 0 let S be the in-
tersection of S with the closed disc of radius centered at the origin and let S be
the boundary of S . For each t > 0 and all sufficiently large , the curve F( S )
winds exactly once about the point t. Since, by the argument principle of com-
plex analysis, the winding number equals the difference between the number of
zeros and the number of poles of the function F() + t in the domain S , and the
function F() + t does not possess poles there, it follows that there exists a unique
solution (t) S of the equation F( ) = t (see Figure 3.7.1). Clearly (0) = 0
is the unique solution of the equation F( ) = 0 in S. We have the following.
4 3 2 1 0 1 2
Fig. 3.7.1. The contour S3 (solid), its image F( S3 ) (dashed), and the curve () (dash
and dots).
Proof (i) follows by noting that F restricted to S is proper, that is for any sequence
zn S with |zn | as n , it holds that |F(zn )| . The real analyticity
claim in (ii) follows from the implicit function theorem. (iii) follows from a direct
computation, and together with (0) = 0 implies the continuity claim in (ii).
From Lemma 3.7.4 we obtain the formula
1
e t (1 + (t)) u (t) (1 + (t)) u (t) dt ,
3
In (u) =
2 i 0
by deforming the contour i i in (3.7.9) to . After replacing t by t 3 /3n
in the integral above we obtain the formula
1
In (u) = (An (t, u) Bn (t, u))dt , (3.7.11)
2 i 0
where
3 u 3 2
3t 3 t t t
An (t, u) = exp 1+ ,
3n 3n 3n n
3 u 3 2
3t 3 t t t
Bn (t, u) = exp 1 + .
3n 3n 3n n
138 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Put
t3 i/3
A(t, u) = exp e tu + i/3 ,
33
t i/3
B(t, u) = exp e tu i/3 .
3
By modifying the contour of integration in the definition of the Airy function
Ai(x), see (3.7.16), we have
1
Ai(u) = (A(t, u) B(t, u))dt . (3.7.12)
2 i 0
A calculus exercise reveals that, for any positive constant c and each t0 0,
An (t, u)
lim sup sup 1 = 0 (3.7.13)
n 0tt |u|<c A(t, u)
0
and clearly the analogous limit formula linking Bn (t, u) to B(t, u) holds also. There
exist positive constants c1 and c2 such that
| log(1 + (t))| c1t 1/3 , | (t)| c2 max(t 2/3 ,t 1/2 )
for all t > 0. There exists a positive constant n0 such that
( 3 ) n/2, | | 2n1/3 , |u | < 2c
for all n n0 and |u| < c. Also there exists a positive constant c3 such that
1/3
ec3 t t 1/6
for t 1. Consequently there exist positive constants c4 and c5 such that
| e t (1 + (t)) u (t)| c4 n1/3 ent/2+c5 n t 2/3 ,
3 1/3 t
hence
|An (t, u)| c4 et
3 /6+c
5t (3.7.14)
for all n n0 , t > 0 and |u| < c. Clearly we have the same majorization for
|Bn (t, u)|. Integral formula (3.7.12), uniformity of convergence (3.7.13) and ma-
jorization (3.7.14) together are enough to finish the proof of limit formula (3.7.10)
and hence of limit formula (3.7.4).
Note that the rapid decay of the integrand in (3.7.15) along the indicated con-
tour ensures that Ai(x) is well defined and depends holomorphically on x. By
parametrizing the contour appearing in (3.7.15) in evident fashion, we also obtain
the formula
3
1 t i i 3i i
Ai(x) = exp exp xte + 3 exp xte dt .
2 i 0 3 3 3
(3.7.16)
In the statement of the next lemma, we use the notation x to mean that x goes
to along the real axis. Recall also the definition of Eulers Gamma function, see
(2.5.5): (s) = 0 ex xs1 dx, for s with positive real part.
Lemma 3.7.6 (a) For any integer 0, the derivative Ai( ) (x) satisfies
We next evaluate the asymptotics of the Airy functions at infinity. For two
functions f , g, we write f g as x if limx f (x)/g(x) = 1.
2 3/2
Ai (x) 1/2 x1/4 e 3 x /2 . (3.7.21)
where
C = (e2 i/3 , i 3] + [i 3, i 3] + [i 3, e2 i/3 ) =: C1 +C2 +C3 .
Since the infimum of the real part of u2 u3 /3 on the rays C1 and C3 is strictly
negative, the contribution of the integral over C1 and C3 to the right side of (3.7.22)
vanishes as x . The remaining integral (over C2 ) gives
3x3/4
3 3/4
et +it x /3 dt et dt = i
2 2
i i as x ,
3x 3/4
Proof of Lemma 3.7.1 Fix x0 R. By (3.7.20), (3.7.21) and the Airy differential
equation (3.1.4), there exists a positive constant C (possibly depending on x0 ) such
that
max(| Ai(x)|, | Ai (x)|, | Ai (x)|) Cex
But by the variant (3.5.5) of Taylors Theorem noted above we also have, for
x, y x0 ,
|x y| < 1 |A(x, y)| 2C2 e2 exy .
Exercise 3.7.8 Show that 0 Ai(x)dx = 1/3.
Hint: for > 0, let denote the path (t e2 it ) : [5/6, 7/6] C, and define
the contour C = (e2 i/3 , e2 i/3 ] + + [ e2 i/3 , e2 i/3 ). Show that
1
w1 ew
3 /3
Ai(x)dx = dw ,
0 2 i C
Exercise 3.7.9 Write x if x along the real axis. Prove the asymptotics
sin( 23 |x|3/2 + 4 )
Ai(x) as x (3.7.23)
|x|1/4
and
cos( 23 |x|3/2 + 4 )|x|1/4
Ai (x) as x . (3.7.24)
Conclude that Lemma 3.7.1 can be strengthened to the statement
Exercise 3.7.10 The proof of Lemma 3.7.7 as well as the asymptotics in Exercise
3.7.17 are based on finding an appropriate explicit contour of integration. An al-
ternative to this approach utilizes the steepest descent method. Provide the details
of the proof of (3.7.20), using the following steps.
(a) Replacing by x1/2 in (3.1.3), deduce the integral representation, for x > 0,
x1/2 3/2 H( )
Ai(x) = ex d , H( ) = 3 /3 . (3.7.26)
2 i C
and the intersection of S with the closed disc of radius about 1, and apply a
reasoning similar to the proof of Lemma 3.7.2 to find a curve (t) such that
e2x
3/2 /3
x1/2
ex ( (t) (t))dt for x > 0 .
3/2 t
Ai(x) = (3.7.27)
2 i 0
Show
that Bi(x) satisfies (3.1.4) with the boundary conditions [Bi(0) Bi (0)] =
31/6
1
31/6 (2/3) (1/3)
. Show that for any x R,
" #
Ai(x) Ai (x) 1
det = , (3.7.29)
Bi(x) Bi (x)
concluding that Ai and Bi are linearly independent solutions. Show also that
Bi(x) > 0 and Bi (x) > 0 for all x > 0. Finally, repeat the analysis in Lemma
3.7.7, using the substitution w x1/2 (u + 1) and the (undeformed!) contour
Put
k
(1)k x x1 ... xk
H(x, y) = A(x, y) + A d (x j ) .
k=1 k!
y x1 ... xk j=1
In view of the basic estimate (3.4.9) and the crude bound (3.7.1) for the Airy
kernel, we must have (t) 1 as t . Similarly, we have
sup sup ex+y |H(x, y)| < (3.8.1)
tt0 x,yR
the last equality by symmetry R(x, y) = R(y, x). Convergence of all these integrals
is easy to check. Note that each of the quantities q, p, u and v tends to 0 as t .
144 3. S PACINGS FOR G AUSSIAN ENSEMBLES
More precise information is also available. For example, from (3.8.1) and (3.8.2)
it follows that
q(x)/ Ai(x) x 1 , (3.8.4)
because for x large, (3.7.20) implies that for some constant C independent of x,
R(x, y) Ai(y)dy C exy Ai(y)dy C Ai(x)e2x .
x x
We follow the trail blazed in the discussion of the sine-kernel in Section 3.6. The
first few steps we can get through quickly by analogy. We have
log = R(t,t) , (3.8.5)
t
R(x, y) = R(x,t)R(t, y) . (3.8.6)
t
As before we have a relation
Q(x)P(y) Q(y)P(x)
R(x, y) = = R(y, x) (3.8.7)
xy
and hence by LHopitals rule we have
Heres the wrinkle in the carpet that changes the game in a critical way: A does
not vanish identically. Instead we have
R = R R + A + R A + A R + R A R.
The wrinkle propagates to produce the extra term on the right. We now have
Q (x) = Ai(x) + R(x, y) Ai(y)d (y)
x
= Ai (x) R(x, y) Ai(y)d (y)
y
+R(x,t) R(t, y) Ai(y)d (y) Q(x)u
= Ai(x) + R(x, y) Ai (y)d (y) + R(x, y) Ai(y)d (y)
+R(x,t) R(t, y) Ai(y)d (y) Q(x)u
= Ai(x) + R(x, y) Ai (y)d (y)
+R(x,t)(Ai(t) + R(t, y) Ai(y)d (y)) Q(x)u
= P(x) + R(x,t)Q(t) Q(x)u . (3.8.13)
This is more or less in analogy with the sine-kernel case. But the wrinkle continues
to propagate, producing the extra terms involving the quantities u and v.
146 3. S PACINGS FOR G AUSSIAN ENSEMBLES
v = Q(x) Ai (x)d (x) + Q(x) Ai (x)d (x)
t t
= Q(t) R(t, x) Ai (x)d (x) Q(t) Ai (t) = pq .
Let us now write q(t) and (t) to emphasize the t-dependence. In view of the
rapid decay of (t) 1, (log (t)) and q(t) as t we must have
(t) = exp (x t)q(x)2 dx , (3.8.20)
t
whence the conclusion that F2 (t) = (t) satisfies F2 () = 1 and, because of the
factor (x t) in (3.8.20) and the fact that q() does not identically vanish, also
F2 () = 0. In other words, F2 is a distribution function. Together with (3.8.17)
and Theorem 3.1.4, this completes the proof of Theorem 3.1.5.
Remark 3.8.1 The Painleve II equation q = tq+2q3 has been studied extensively.
The following facts, taken from [HaM80], are particularly relevant: any solution
of Painleve II that satisfies q(t) t 0 satisfies also that as t , q(t) Ai(t)
for some R, and for each fixed , such a solution exists and is unique. For
= 1, which is the case of interest to us, see (3.1.8), one then gets
q(t) t/2 , t . (3.8.21)
Remark 3.8.2 The analysis in this section would have proceeded verbatim if the
Airy kernel A(x, y) were replaced by sA(x, y) for any s (0, 1), the only difference
being that the boundary condition for (3.1.8) would be replaced by q(t) s Ai(t)
as t . On the other hand, by Corollary 4.2.23 below, the kernel sA(n) (x, y)
replaces A(n) (x, y) if one erases each eigenvalue of the GUE with probability s. In
particular, one concludes that for any k fixed,
lim lim sup P(N 1/6 (Nk
N
2 N) t) = 0 . (3.8.22)
t N
Exercise 3.8.3 Using (3.7.20), (3.8.4) and (3.8.21), deduce from the representation
(3.1.7) of F2 that
1 4
lim log[1 F2 (t)] = ,
t t 3/2 3
1 1
lim log F2 (t) = ,
t t 3 12
Note the different decay rate of the upper and lower tails of the distribution of the
(rescaled) largest eigenvalue.
148 3. S PACINGS FOR G AUSSIAN ENSEMBLES
We prove Theorems 3.1.6 and 3.1.7 in this section, using the tools developed in
Sections 3.4, 3.6 and 3.7, along with some new tools, namely, Pfaffians and matrix
kernels. The multiplicativity of Fredholm determinants, see Theorem 3.4.10, also
plays a key role.
We begin our analysis of the limiting behavior of the GOE and GSE by proving a
series of integration identities involving Pfaffians; the latter are needed to handle
the novel algebraic situations created by the factors |(x)| with {1, 4} ap-
pearing in the joint distribution of eigenvalues in the GOE and GSE, respectively.
Then, with Remark 3.4.4 in mind, we use the Pfaffian integration identities to
obtain determinant formulas for squared gap probabilities in the GOE and GSE.
Recall that Matk (C) denotes the space of k-by- matrices with complex entries,
with Matn (C) = Matnn (C) and In Matn (C) denoting the identity matrix. Let
0 1
1 0
..
Jn = . Mat2n (C)
0 1
1 0
" #
0 1
be the block-diagonal matrix consisting of n copies of strung along
1 0
the diagonal. Given a family of matrices
let
X(1, 1) ... X(1, n)
.. ..
X(i, j)|m,n = . . Matkmn (C) .
X(m, 1) . . . X(m, n)
" #
0 1
For example, Jn = i, j |n,n Mat2n (C).
1 0
Next, recall a basic definition.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 149
1
For example, Pf Jn = 1, which explains the normalization 2n n! .
We collect without proof some standard facts related to Pfaffians.
To evaluate gap probabilities in the GOE and GSE, we will specialize Proposi-
tion 3.9.3 in several different ways, varying both F and n. To begin the evaluation,
2
let denote a function on the real line of the form (x) = eC1 x +C2 x+C3 , where
C1 < 0, C2 and C3 are real constants, and let On denote the span over C of the set
of functions {xi1 (x)}n1
i=0 . Later we will make use of specially chosen bases for
On consisting of suitably modified oscillator wave-functions, but initially these
are not needed. Recall that (x) = 1i< jn (x j xi ) for x = (x1 , . . . , xn ) Rn .
The application of (3.9.1) to the GSE is the following.
Proof By Theorem 3.9.2(i), we may assume without loss of generality that fi (x) =
xi1 (x), and it suffices to show that (3.9.3) holds with c = 0. By identity (3.9.1)
and the confluent alternant identity (2.5.30), identity (3.9.3) does indeed hold for
suitable nonzero c independent of A.
The corresponding result for the GOE uses indefinite integrals of functions. To
streamline the handling of the latter, we introduce the following notation, which
is used throughout Section 3.9. For each integrable real-valued function f on the
real line we define a continuous function f by the formula
1 1
( f )(x) = sign (x y) f (y)dy = f (y)dy + f (y)dy
2 x 2
x
1
= f (y)dy sign(y) f (y)dx , (3.9.4)
0 2
where sign(x) = 1x>0 1x<0 , and we write f (x)dx = f (x)dx to abbreviate
notation. Note that ( f ) (x) = f (x) almost everywhere, that is, inverts dif-
ferentiation. Note also that the operation reverses parity and commutes with
translation.
The application of (3.9.1) to the GOE is the following.
Proof By Theorem 3.9.2(i), we may assume without loss of generality that fi (x) =
xi1 (x), and it suffices to show that (3.9.5) holds with c = 0 independent of A.
For x R, let f (x) = [ fi (x) ] |n,1 Matn1 (C). Let An+ be the subset of An Rn
consisting of n-tuples in strictly increasing order. Then, using the symmetry of the
integrand of (3.9.5) and the Vandermonde determinant identity, one can verify that
the integral An+ det[ f (y j )]|1,n n1 dyi equals the right side of (3.9.5) with c = 1/n!.
Put r = n/2. Consider, for z Rr , the n n matrix
[ (1A fi )|z 1 z
]|n,1 [ fi (z j ) (1A fi )|z j+1
j ]|n,r if n is odd,
A (z) =
z
[ fi (z j ) (1A fi )|z j+1
j ]|n,r if n is even,
where zr+1 = , and h|ts = h(t) h(s). By integrating every other variable, we
obtain a relation
r n
det A (z) dzi = det[ f (y j )]|1,n dyi .
Ar+ 1 An+ 1
Consider, for z Rr ,
the n n matrix
[[FA (z j )]|1,r a A f (x)dx] if n is odd,
A (z) =
[FA (z j )]|1,r if n is even.
Because A (z) arises from A (z) by evident column operations, we deduce that
det A (z) = c1 det A (z) for some nonzero complex constant c1 independent of A
and z. Since the function det A (z) of z Rr is symmetric, we have
r r
1
det A (z) dzi = det A (z) dzi .
Ar+ 1 r! Ar 1
If n is even, we conclude the proof by using the Pfaffian integration identity (3.9.1)
to verify that the right side above equals the left side of (3.9.5).
Assume for the rest of the proof that n is odd. For i = 1, . . . , n, let FAe,i (x) be the
152 3. S PACINGS FOR G AUSSIAN ENSEMBLES
result of striking out the ith row from FAe (x) and similarly, let iA (z) be the result
of striking the ith row and last column from A (z). Then we have expansions
" e #
A FA (x)J F e (x)T dx a A f (x)dx
Pf 1 AT
a A f (x) dx 0
n
= a (1) ( fi (x)dx) Pf FA (x)J1 FA (x) dx ,
i+1 e,i e,i T
i=1 A A
n
det A (z) = a (1)i+n ( fi (x)dx) det iA (z) ,
i=1 A
obtained in the first case by Theorem 3.9.2(iii), and in the second by expanding
the determinant by minors of the last column. Finally, by applying (3.9.1) term
by term to the latter expansion, and comparing the resulting terms with those of
the former expansion, one verifies that r!1 Ar det A (z) r1 dzi equals the left side
of (3.9.5). This concludes the proof in the remaining case of odd n.
The next lemma gives further information about the structure of the antisym-
metric matrix A FA (x)J1FA (x)T dx appearing in Proposition 3.9.5. Let n = 2In
" #
2In 0
for even n, and n = for odd n.
0 1/ 2
Lemma 3.9.6 In the setup of Proposition 3.9.5, for all measurable sets A R,
FA (x)J1 FA (x)T dx = FR (x)J1 FR (x)T dx n FR (x)J1 FA (x)T n dx . (3.9.6)
A Ac
Proof Let Li, j (resp., Ri, j ) denote the (i, j) entry of the matrix on the left (resp.,
right). To abbreviate notation we write f , g = f (x)g(x)dx. For i, j < n + 1,
using antisymmetry of the kernel 12 sign(x y), we have
1 1
Li, j = (1A fi , (1A f j ) 1A f j , (1A fi )) = 1A fi , (1A f j )
2 2
= fi , f j 1Ac fi , f j 1A fi , (1Ac f j )
1
= fi , f j 1Ac fi , f j + (1A fi ), 1Ac f j = Ri, j ,
2
which concludes the proof in the case of even n. In the case of odd n it remains
only to consider the cases max(i, j) = n+1. If i = j = n+1, then Li, j = 0 = Ri, j . If
i < j = n + 1, then Li, j = a1A , fi = Ri, j . If j < i = n + 1, then Li, j = a1A , f j =
Ri, j . The proof is complete.
formulas for squared gap probabilities. Toward that end, for fixed > 0 and real
, let
n (x) = n, , (x) = 1/2 n ( 1 x + ) , (3.9.7)
and 1 0 for convenience. The functions n are shifted and scaled versions of
the oscillator wave-functions, see Definition 3.2.1.
We are ready to state the main results for gap probabilities in the GSE and GOE.
These should be compared with Lemma 3.2.4 and Remark 3.4.4. The result for
the GSE is as follows.
To prove the proposition we will interpret GA as a matrix of the form FAT n ap-
pearing on the right side of (3.9.6) in Lemma 3.9.6.
Before commencing the proofs we record a series of elementary properties of
the functions i following immediately from Lemmas 3.2.5 and 3.2.7. These
properties will be useful throughout Section 3.9. As above, we write f , g =
f (x)g(x)dx. Let k, , n 0 be integers. Let On = On, , denote the span of the
set {i }n1
i=0 over C.
Proof of Proposition 3.9.7 Using property (3.9.19), and recalling that inverts
differentiation, we observe that, with = 0 and F(x) = H(x)T , the integra-
tion identity (3.9.3) holds with a constant c independent of A. Further, we have
, T H(x)dx = I2n by (3.9.14) and (3.9.15), and hence
H(x)
2
det In H(x), T H(x)dx = Pf F(x)J1 F(x)T dx ,
A Ac
after some algebraic manipulations using part (ii) of Theorem 3.9.2 and the fact
that det Jn = 1. Thus, by (3.9.3) with A replaced by Ac , the integration identity
(3.9.9) holds up to a constant factor independent of A. Finally, since (3.9.9) obvi-
ously holds for A = 0,
/ it holds for all A.
Definition 3.9.10 For k {1, 2}, let Kerk denote the space of Borel-measurable
functions K : R R Matk (C). We call elements of Ker1 scalar kernels, ele-
ments of Ker2 matrix kernels, and elements of Ker1 Ker2 simply kernels. We
often view a matrix kernel K Ker2 as a 2 2 matrix with entries Ki, j Ker1 .
We are now using the term kernel in a sense somewhat differing from that in
Section 3.4. On the one hand, usage is more general because boundedness is not
assumed any more. On the other hand, usage is more specialized in that kernels
are always functions defined on R R.
Since the definition of Fredholm determinant made in Section 3.4 applies only
to bounded kernels on measure spaces of finite total mass, to use it efficiently we
have to make the next several definitions.
Given a real constant 0, let w (x) = exp( |x + | 2 ) for x R. Note that
w (x) = e x for x > and w0 (x) 1.
Definition 3.9.12 ( -twisting) Given k {1, 2}, a kernel K Kerk , and a constant
0, we define the -twisted kernel K ( ) Kerk by
K(x, y)w (y) if k = 1,
K ( ) (x, y) = " #
w (x)K11 (x, y) w (x)K12 (x, y)w (y)
if k = 2 .
K21 (x, y) K22 (x, y)w (y)
We remark that K Ker2 K11
T , K Ker where K T (x, y) = K (y, x).
22 1 11 11
As before, let Leb denote Lebesgue measure on the real line. For 0, let
Leb (dx) = w (x)1 Leb(dx), noting that Leb0 = Leb, and that Leb has finite
total mass for > 0.
Note that Kerk is closed under the operation because, for K, L Kerk , we have
(K L)( ) (x, y) = K ( ) (x,t)L( ) (t, y)Leb (dt) (3.9.23)
and hence K L Kerk .
We turn next to the formulation of a version of the definition of Fredholm de-
terminant suited to kernels of the class Kerk .
Definition 3.9.14 Given k {1, 2}, 0, and L Kerk , we define Fredk (L) by
specializing the setup of Section 3.4 as follows.
(i) Choose U R open and c > 0 such that maxi, j |(L( ) )i, j | c1UU .
(ii) Let X = U I , where I = {1}, {1, 2} according as k = 1, 2.
(iii) Let = (restriction of Leb to U) (counting measure on I ).
(iv) Let K((s, i), (t, j)) = L( ) (s,t)i, j for (s, i), (t, j) X.
Finally, we let Fredk (L) = (K), where the latter is given as in Definition 3.4.3,
with inputs X, and K as defined above.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 157
The complex number Fredk (L) is independent of the choice of U and c made in
point (i) of the definition, and hence well defined. The definition is contrived so
that if L Kerki for i = 1, 2, then Fredki (L) is independent of i, as one verifies by
comparing the expansions of these Fredholm determinants term by term.
Two formal properties of Fredk () deserve emphasis.
Remark 3.9.15 If K, L Kerk , then multiplicativity holds in the form
Fredk (K + L K L) = Fredk (K)Fredk (L) ,
by (3.9.23) and Theorem 3.4.10. Further, by Corollary 3.4.9, if K Ker2 satisfies
K21 0 or K12 0, then
T
Fred2 (K) = Fred1 (K11 )Fred1 (K22 ) .
If K Kerk and Fredk (K) = 0, then one can adapt the Fredholm adjugant con-
struction, see equation (3.4.15), to the present situation, and one can verify that
there exists unique R Kerk such that the resolvent equation RK = K R = RK
holds.
Definition 3.9.17 The kernel R Kerk associated as above with K Kerk is called
the resolvent of K with respect to , and we write R = Resk (K).
This definition is contrived so that if K Kerki for i = 1, 2, then Reski (K) is in-
dependent of i. In fact, we will need to use this definition only for k = 1, and the
only resolvents that we will need are those we have already used to analyze GUE
in the bulk and at the edge of the spectrum.
158 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Main results
1 n1
Kn, , ,2 (x, y) = i (x)i (y) .
2 i=0
(3.9.24)
The kernel Kn, , ,2 (x, y) is nothing new: we have previously studied it to obtain
limiting results for the GUE.
We come to the novel definitions. We write Kn = Kn, , ,2 to abbreviate. Let
Kn (x, y) Kyn (x, y)
Kn, , ,1 (x, y) =
12 sign(x y) + yx Kn (t, y)dt Kn (x, y)
n1 (x)n (y) n1 (x)n (y)
n
+ x
2 3
( y n1 (t)dt)n (y) n (x)n1 (y)
n1 (x)
n1 ,1 0
x (t)dt if n is odd,
+ y n1 (y) (3.9.25)
n1
n1 ,1 n1 ,1
0 if n is even,
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 159
and
K
1 K2n+1 (x, y) 2n+1
y (x, y)
Kn, , ,4 (x, y) = x (3.9.26)
2 K
y 2n+1 (t, y)dt K 2n+1 (x, y)
2n + 1 2n (x)2n+1 (y) 2n (x)2n+1 (y)
+ x .
4 3
( 2n (t)dt)2n+1 (y) 2n+1 (x)2n (y)
y
Theorem 3.9.19 Let 0 and a Borel set A R be given. Assume either that
> 0 or that A is bounded. Let {1, 4}. Then we have
2
n
|(x)| i=1 0, , i(x ) dx
Ac Ac
i
= Fred (1AA Kn, , , ) . (3.9.27)
2
|(x)| i=1 0, , (xi ) dxi
n
It is easy to check using Lemma 3.9.9 that the right side is defined. For compari-
son, we note that under the same hypotheses on and A we have
Ac Ac |(x)| i=1 0, , (xi ) dxi
2 n 2
= Fred1 (1AA Kn, , ,2 ) . (3.9.28)
|(x)|2 ni=1 0, , (xi )2 dxi
The latter is merely a restatement in the present setup of Lemma 3.2.4.
Before commencing the proof we need to prove a Pfaffian analog of (3.9.21).
For integers n > 0, put
(x) (y)
Ln (x, y) = Ln, , (x, y) = 2
(x) (y) .
0<n
(1) =(1)n
Lemma 3.9.20
n 1 n1
Ln (x, y) = n1 (x)n (y) + 2 i (x)i (y) .
2 3 i=0
Let F1 (x, y) and F2 (x, y) denote the left and right sides of the equation above,
respectively. Fix {1, 2} and integers j, k 0 arbitrarily. By means of (3.9.14)
160 3. S PACINGS FOR G AUSSIAN ENSEMBLES
and (3.9.17), one can verify that F (x, y) j (x)k (y)dxdy is independent of ,
which is enough by (3.9.14) to complete the proof.
Proof of Theorem 3.9.19 Given smooth L Ker1 , to abbreviate notation, let Lext
Ker2 denote the differential extension of L, see Definition 3.9.18.
First we prove the case = 4 pertaining to the GSE. Let H(x) be as defined
in Proposition 3.9.7. By straightforward calculation based on Lemma 3.9.20, one
can verify that
1 ext
H(x)J1 T
n H(y) J1 = L2n+1, , (x, y) = Kn, , ,4 (x, y) .
2
Then formula (3.9.27) in the case = 4 follows from (3.9.9) and Remark 3.9.16.
We next prove the case = 1 pertaining to the GOE. We use all the notation
introduced in Proposition 3.9.8. One verifies by straightforward calculation using
Lemma 3.9.20 that
GR (x)J1 GR (y)T J1 ext ext
n = Ln, , (x, y) + Mn, , (x, y) ,
where
n1 (x)n1 (y)
1,n1 if n is odd,
Mn, , (x, y) =
0 if n is even.
Further, with
" #
0 0
Q(x, y) = GAc (x)J1 GR (y)T J1
n , E(x, y) = , (3.9.29)
1
2 sign(x y) 0
QA = 1AA Q and EA = 1AA E, we have
EA + QA + EA QA = 1AA Kn, , ,1 .
Finally, formula (3.9.27) in the case = 1 follows from (3.9.10) combined with
Remarks 3.9.15 and 3.9.16.
Remark 3.9.21 Because the kernel Ln, , is smooth and antisymmetric, the proof
above actually shows that Kn, , ,4 is both self-dual and the differential extension
of its entry in the lower left. Further, the proof shows the same for Kn, , ,1 + E.
Recall the symmetric scalar kernels, see Theorem 3.1.1, and Definition 3.1.3,
1 sin(x y)
Ksine (x, y) = Ksine,2 (x, y) = , (3.9.30)
xy
The subscripts 1 and 4 refer to the parameters for the GOE and GSE, respec-
tively. Note that each of the kernels Ksine,4 and, with E as in (3.9.29), E + Ksine,1 is
self-dual and the differential extension of its entry in the lower left. In other words,
the kernels Ksine, have properties analogous to those of Kn, , , mentioned in Re-
mark 3.9.18.
We will prove the following limit formulas.
uniformly for x, y I.
Limit formula (3.9.35) is merely a restatement of Lemma 3.5.1, and to the proof
of the latter there is not much to add in order to prove the other two limit formu-
las. Using these we will prove the following concerning the bulk limits Fbulk, (t)
considered in Theorem 3.1.6.
162 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Corollary 3.9.23 For {1, 2, 4} and constants t > 0, the limits Fbulk, (t) exist.
More precisely, with I = (t/2,t/2) R,
Formula (3.9.38) merely restates the limit formula in Theorem 3.1.1. Note that the
limit formulas limt0 Fbulk, (t) = 0 for {1, 2, 4} hold automatically as a conse-
quence of the Fredholm determinant formulas (3.9.37), (3.9.38) and (3.9.39), re-
spectively. The case = 2 of (3.9.40) was discussed previously in Remark 3.6.5.
We will see that the cases {1, 4} are easily deduced from the case = 2 by
using decimation and superposition, see Theorem 2.5.17.
We turn to the study of the edge of the spectrum. We introduce matrix variants
of the Airy kernel KAiry and then state limit results. Let
KAiry,1 (x, y)
K
KAiry (x, y) Airy
y (x, y)
=
12 sign(x y) + yx KAiry (t, y)dt KAiry (x, y)
1 Ai(x)(1 y Ai(t)dt) Ai(x) Ai(y)
+ x , (3.9.41)
2 ( y Ai(t)dt)(1 y Ai(t)dt) (1 x Ai(t)dt) Ai(y)
KAiry,4 (x, y)
K
1 KAiry (x, y) Airy
y (x, y)
= x
2 y KAiry (t, y)dt KAiry (x, y)
1 Ai(x) y Ai(t)dt Ai(x) Ai(y)
+ x . (3.9.42)
4 ( y Ai(t)dt)( y Ai(t)dt) ( x Ai(t)dt) Ai(y)
Although it is not immediately apparent, the scalar kernels appearing in the lower
left of KAiry, for {1, 4} are antisymmetric, as can be verified by using formula
(3.9.58) below and integration by parts. More precisely, each of the kernels KAiry,4
and E + KAiry,1 (with E as in (3.9.29)) is self-dual and the differential extension
of its entry in the lower left. In other words, the kernels KAiry, have properties
analogous to those of Kn, , , mentioned in Remark 3.9.18.
We will prove the following limit formulas.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 163
uniformly for x, y I.
The proofs of the limit formulas are based on a strengthening of Lemma 3.7.2
capable of handling intervals unbounded above, see Proposition 3.9.30. The limit
formulas imply, with some extra arguments, the following results concerning the
edge limits Fedge, (t) considered in Theorem 3.1.7.
Corollary 3.9.25 For {1, 2, 4} and real constants t, the edge limits Fedge, (t)
exist. More precisely, with I = (t, ), and > 0 any constant,
Fedge,1 (t)2 = Fred2 (1II KAiry,1 ) , (3.9.46)
Fedge,2 (t) = Fred1 (1II KAiry,2 ) , (3.9.47)
2/3 2
Fedge,4 (t/2 ) = Fred2 (1II KAiry,4 ) . (3.9.48)
Further, for {1, 2, 4},
lim Fedge, (t) = 0 . (3.9.49)
t
We will show below, see Lemma 3.9.33, that for 0 and {1, 2, 4}, the
( )
-twisted kernel KAiry, is bounded on sets of the form I I with I an interval
bounded below, and hence all Fredholm determinants on the right are defined.
Note that the limits limt+ Fedge, (t) = 1 for {1, 2, 4} follow automatically
from formulas (3.9.46), (3.9.47) and (3.9.48), respectively. In particular, formula
(3.9.47) provides another route to the proof of Theorem 3.1.4 concerning edge-
scaling in the GUE which, bypassing the Ledoux bound (Lemma 3.3.2), handles
the right-tightness issue directly.
uniformly for x I.
Proof The case k = 0 of the proposition is exactly (3.5.4). Assume hereafter that
k > 0. By (3.9.17) and (3.9.20) we have
n+ xn+ ,n,0 (x)
n+ ,n,0 (x) = n+ 1,n,0 (x) .
n 2n
Repeated differentiation of the latter yields a relation which finishes the proof by
induction on k.
In the bulk case only the order of magnitude established here is needed, but in the
edge case we will need the exact value of the limit.
Proof By (3.9.11) in the case = 1 and = 0 we have
0 (x) = 21/4 1/4 ex
2 /4
, 0 (x)dx = 23/4 1/4 . (3.9.51)
by the Stirling approximation, see (2.5.12). Then (3.9.50) follows from (3.9.51).
for odd positive integers n. By (3.9.51) and (3.9.53), the right side is positive and
in any case is O(n5/4 ). The bound (3.9.52) follows.
These hold by Propositions 3.9.28 and 3.9.29, respectively. The proof of Theo-
rem 3.9.22 is complete.
( ,n) ( ,n)
Proof of Corollary 3.9.23 For {1, 2, 4}, let ( ,n) = (1 , . . . , n ) be
a random vector in Rn with law possessing a density with respect to Lebesgue
measure proportional to |(x)| e |x| /4 . We have by Theorem 3.9.19, formula
2
The proofs of (3.9.37), (3.9.38) and (3.9.39) are completed by using Lemma 3.4.5
and Theorem 3.9.22. It remains only to prove the statement (3.9.40). For = 2,
it is a fact which can be proved in a couple of ways described in Remark 3.6.5.
The case = 2 granted, the cases {1, 4} can be proved by using decimation
and superposition, see Theorem 2.5.17. Indeed, consider first the case = 1. To
derive a contradiction, assume limt Fbulk,1 (t) = 1 for some > 0. Then, by
the decimation relation (2.5.25), limt Fbulk,2 (t) 1 2 , a contradiction. Thus,
limt Fbulk,1 (t) = 1. This also implies by symmetry that the probability that no
(rescaled) eigenvalue of the GOE appears in [0,t], denoted F1 (t), decays to 0 as
t . By the decimation relation (2.5.26), we then have
uniformly for x I.
We first need to prove two lemmas. The first is a classical trick giving growth
information about solutions of one-dimensional Schrodinger equations. The sec-
ond applies the first to the Schrodinger equation (3.9.22) satisfied by oscillator
wave-functions.
Lemma 3.9.31 Fix real numbers a < b. Let and V be infinitely differentiable
real-valued functions defined on the interval (a, ) satisfying the following:
(i) = V ; (ii) > 0 on [b, ); (iii) limx (log ) (x) = ;
(iv) V > 0 on [b, );
(v) V 0 on [b, ).
Then (log ) V on [b, ).
Lemma 3.9.32 Fix n > 0 and put n (x) = n,n1/6 ,2n (x). Then for x 1 we have
n (x) > 0 and (log n ) (x) (x 1/2)1/2 .
Proof Let be the rightmost of the finitely many zeroes of the function n . Then
n does not change sign on ( , ) and in fact is positive by (3.9.20). The logarith-
mic derivative of n tends to as x + because n is a polynomial in x times
a Gaussian density function of x. In the present case the Schrodinger equation
(3.9.22) takes the form
We finally apply Lemma 3.9.31 with a = max(1, ) < b, thus obtaining the esti-
mate
(log n ) (b) b 1/2 for b ( , ) (1, ) .
This inequality forces one to have < 1 because the function of b on the left side
tends to + as b .
Proof of Proposition 3.9.30 We write n, (x) instead of n+ ,n1/6 ,2n (x) to abbre-
viate. We have
xn, (x) n n1/6
n, 1 (x) n, (x) = + 1 n, (x) n, (x) ,
2n1/6 n + n+ n+
by (3.9.20) and (3.9.17), and by means of this relation we can easily reduce to the
case = 0. Assume that = 0 hereafter and write simply n = n,0 .
By Lemma 3.7.2, the limit (3.9.54) holds on bounded intervals I. Further, from
Lemma 3.7.7 and the Airy equation Ai (x) = x Ai(x), we deduce that
sup sup |e x n (x)| < .
(k)
(3.9.57)
n=1 x1
( )
Lemma 3.9.33 For {1, 2, 4}, 0 and intervals I bounded below, KAiry, is
bounded on I I.
Proof We have
KAiry (x, y) = Ai(x + t) Ai(y + t)dt . (3.9.58)
0
To verify this formula, first apply x + y to both sides, using (3.9.56) to justify
differentiation under the integral, then apply the Airy equation Ai (x) = x Ai(x) to
verify equality of derivatives, and finally apply (3.9.56) again to fix the constant
of integration. By further differentiation under the integral, it follows that for all
integers k, 0, constants 0 and intervals I bounded below,
k+
sup e (x+y) k KAiry (x, y) < . (3.9.59)
x,yI x y
uniformly for x, y I.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 169
This is proved using (3.9.12), (3.9.21) and (3.9.22), following the pattern set in
proving (3.9.58) above. In the case = 0 we then get the desired uniform con-
vergence (3.9.50) by Proposition 3.9.30 and dominated convergence. After differ-
entiating under the integrals in (3.9.58) and (3.9.61), we get the desired uniform
convergence for = 1 in similar fashion.
Proof of Theorem 3.9.24 The limit (3.9.44) follows from Proposition 3.9.34. To
see (3.9.43) and (3.9.45), note that by definitions (3.9.41) and (3.9.42), and Propo-
sitions 3.9.30 and 3.9.34, we just have to verify the (numerical) limit formulas
1 n1/4 1
lim 1/6 , 1 =
n 4 n,n ,2 n
lim
n
n , 1 = ,
n:even n:even
4 2
1 1 1
lim = lim = .
n
n:odd
n1,n1/6 ,2n , 1 n n1/4
n:odd n1 , 1 2
To finish the proofs of (3.9.46), (3.9.47) and (3.9.48), use Lemma 3.4.5 and The-
orem 3.9.24. The statement (3.9.49) holds for = 2 by virtue of Theorem 3.1.5,
and for = 1 as a consequence of the decimation relation (2.5.25).
The argument for = 4 is slightly more complicated. We use some information
on determinantal processes as developed in Section 4.2. By (3.8.22), the sequence
of laws of the second eigenvalue of the GUE, rescaled at the edge scaling, is
tight. Exactly as in the argument above concerning = 1, this property is inherited
by the sequence of laws of the (rescaled) second eigenvalue of the GOE. Using
170 3. S PACINGS FOR G AUSSIAN ENSEMBLES
(2.5.26), we conclude that the same applies to the sequence of laws of the largest
eigenvalue of the GSE.
Exercise 3.9.36 Using Exercise 3.8.3, (3.7.20), (3.8.4), (3.8.21) and Theorem
3.1.7, show that for = 1, 2, 4,
1 2
lim log[1 Fedge, (t)] = ,
t t 3/2 3
1
lim log Fedge, (t) = .
t t 3 24
Again, note the different rates of decay for the upper and lower tails of the distri-
bution of the largest eigenvalue.
for {1, 4}, thus finishing the proofs of Theorems 3.1.6 and 3.1.7.
We aim to represent each of the quantities bulk, (t) and edge, (t) as a Fredholm
determinant of a finite rank kernel. Toward that end we prove the following two
lemmas.
Fix a constant 0. Fix kernels
" # " #
a b 0 0
, Ker2 , , w Ker1 . (3.9.63)
c d e 0
Assume that
d = + w , Fred1 ( ) = 0 . (3.9.64)
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 171
Lemma 3.9.37 With data (3.9.63) and under assumption (3.9.64), the kernels K1
and K4 are well defined, and have the following properties:
K1 , K4 Ker2 , (3.9.67)
" #
a b
Fred2
e + c d
Fred2 (K1 + K1 R) = , (3.9.68)
Fred1 ( )
" #
a b
Fred2 12
c d
Fred2 (K4 + K4 R) = . (3.9.69)
Fred1 ( )
Proof Put
" # " # " #
0 b 0 0 0 0
B= , E= , S= .
0 0 e 0 0
Note that B, E, S Ker2 . Given L1 , . . . , Ln Ker2 with n 2, let
m(L1 , L2 ) = L1 + L2 L1 L2 Ker2 ,
m(L1 , . . . , Ln ) = m(m(L1 , . . . , Ln1 ), Ln ) Ker2 for n > 2 .
Put
" #
a b
L1 = m , E, B, R ,
e + c d
" #
1 a b 1
L4 = m E, , E, B, R .
2 c d 2
Ones verifies that
K = L L S, L = K + K R (3.9.70)
172 3. S PACINGS FOR G AUSSIAN ENSEMBLES
The next lemma shows that K can indeed be of finite rank in cases of interest.
Lemma 3.9.38 Let K Ker2 be smooth, self-dual, and the differential extension
of its entry K21 Ker1 in the lower left. Let I = (t1 ,t2 ) be a bounded interval. Let
" #
a(x, y) b(x, y) 1
= 1II (x, y)K(x, y), e(x, y) = 1II (x, y)sign(x y) ,
c(x, y) d(x, y) 2
We begin by recalling basic objects from the analysis of the GUE in the bulk of
the spectrum. Reverting to the briefer notation introduced in equation (3.6.1), we
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 173
write S(x, y) = Ksine,2 (x, y) for the sine-kernel. Explicitly, equation (3.9.38) says
that
n
(1)n x1 . . . xn
1 Fbulk,2 (t) = 1 +
x1 . . . xn
S dxi .
n=1 n! [ 2t , 2t ]n i=1
Let R(x, y;t) be the resolvent kernel introduced in Section 3.6.1 (obtained from
the sine-kernel with the choice n = 2, s0 = 0 = s2 , s1 = 1 and t2 = t1 = t/2).
Explicitly, R(x, y;t) is given by
n
(1)n x x1 xn
(1Fbulk,2 (t))R(x, y;t) = S(x, y)+
y x1 xn
S dxi ,
n=1 n! [ 2t , 2t ]n i=1
and satisfies
t/2
S(x, y) + S(x, z)R(z, y;t)dz = R(x, y;t) (3.9.76)
t/2
noting that
r = r(t) = 2pq/t , (3.9.77)
= (t) = (t/2,t), = (x;t) = (x;t) .
x
sin x
f (x;t) = ,
1
g(x;t) = (S(x,t/2) + S(x, t/2)) ,
2
1
h(x;t) = (S(x,t/2) S(x, t/2)) ,
2
x
G(x;t) = g(z;t)dz .
0
Then we have
" #
g(x;t)
K1 (x, y) = 1II (x, y) 1 2h(x;t) ,
G(x;t)
" #" #
g(x;t)/2 0 1 h(y;t)
K4 (x, y) = 1II (x, y) ,
G(x;t) 1 0 g(x;t)/2
" #
0 0
R(x, y) = 1II (x, y) ,
0 R(x, y;t)
where the first two formulas can be checked using Lemma 3.9.38, and the last
formula holds by the resolvent identity (3.9.76).
The right sides of (3.9.68) and (3.9.69) equal bulk, (t) for {1, 4}, respec-
tively, by Corollary 3.9.23. Using Remark 3.9.15, one can check that the left side
of (3.9.68) equals the right side of (3.9.81), which concludes the proof of the latter.
A similar argument shows that the left side of (3.9.69) equals
" + #
G (t) + h|Gt h|1t
det I2 .
2 g|Gt 12 g|1t
1
But h|1t and g|Gt are forced to vanish identically by (3.9.79). This concludes
the proof of (3.9.82).
Toward the goal of evaluating the logarithmic derivatives of the right sides of
(3.9.81) and (3.9.82), we prove a final lemma. Given a test-function = (x;t),
let D = (D )(x;t) = (x x + t t ) (x;t). In the statement of the lemma and the
calculations following we drop subscripts of t for brevity.
Proof The resolvent identity (3.9.76) and the symmetry S(x, y) S(y, x) yield the
relation
t/2
g h| t = R(t/2, x;t) (x)dx.
t/2
These facts, along with the symmetry (3.9.78) and integration by parts, yield
(3.9.83) after a straightforward calculation. Similarly, using the previously proved
formulas for t R(x, y;t), (x y)R(x, y;t), P (x;t) and Q (x;t), see Section 3.6,
along with the trick
1+x +y R= (x y)R + y + R,
x y x x y
one gets
1+x +y +t R(x, y;t) = P(x;t)P(y;t) + Q(x;t)Q(y;t) ,
x y t
whence formula (3.9.83) by differentiation under the integral.
To apply the preceding lemma we need the following identities for which the
verifications are straightforward.
d +
h + Dh = f + f , g + Dg = f + f , DG = f + f , t G = f + f + . (3.9.85)
dt
The notation here is severely abbreviated. For example, the third relation written
out in full reads (DG)(x;t) = f + (t) f (x) = f (t/2) f (x). The other relations are
interpreted similarly.
We are ready to conclude. We claim that
d
t (1 2G+ 2h|G)
dt
= 2( f + + h| f )( f + + f |G) = 2q( f + + f |G)
= 2q( f + + g| f 2( f + + g| f )(G+ + h|G))
= 2pq(1 2G+ 2h|G) = tr(1 2G+ 2h|G) . (3.9.86)
At the first step we apply (3.9.79), (3.9.84) and (3.9.85). At the second and fourth
steps we apply (3.9.80). At the third step we apply (3.9.83) with 1 = f and
2 = G, using (3.9.79) to simplify. At the last step we apply (3.9.77). Thus the
claim (3.9.86) is proved. The claim is enough to prove (3.1.11) since both sides
of the latter tend to 1 as t 0. Similarly, we have
d
t(1 + g|1) = p f |1 = 2pq(1 + g|1) = tr(1 + g|1) ,
dt
which is enough in conjunction with (3.1.11) to verify (3.1.12). The proof of
Theorem 3.1.6 is complete.
The pattern of the proof of Theorem 3.1.6 will be followed rather closely, albeit
with some extra complications. We begin by recalling the main objects from the
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 177
analysis of the GUE at the edge of the spectrum. We revert to the abbreviated
notation A(x, y) = KAiry,2 (x, y). Explicitly, equation (3.9.47) says that
n
(1)n x1 ... xn
Fedge,2 (t) = 1 + A dxi .
n=1 n! [t,)n x1 ... xn i=1
Let R(x, y;t) be the resolvent kernel studied in Section 3.8. Explicitly, R(x, y;t) is
given by
(1)n x x1 xn n
Fedge,2 (t)R(x, y;t) = A(x, y) + A dxi ,
n=1 n! (t,)n y x1 xn i=1
which are as in definition (3.8.3), noting that q is the function appearing in Theo-
rem 3.1.7.
Given any smooth functions 1 = 1 (x;t) and 2 = 2 (x;t) defined on R2 , we
define
1 |2 t = 1 (t + x;t)2 (t + x;t)dx
0
+ 1 (t + x;t)R(t + x,t + y;t)2 (t + y;t)dxdy ,
0 0
provided that the integrals converge absolutely for each fixed t. We call the result-
ing function of t an angle bracket. Since the kernel R(x, y;t) is symmetric in x and
y, we have 1 |2 t = 2 |1 t .
We will only need finitely many explicitly constructed pairs (1 , 2 ) to substi-
tute into |t . For each of these pairs it will be clear using the estimates (3.9.56)
and (3.9.59) that the integrals above converge absolutely, and that differentiation
under the integral is permissible.
We now define the finite collection of smooth functions of (x,t) R2 from
178 3. S PACINGS FOR G AUSSIAN ENSEMBLES
f = f (x;t) = Ai(x) ,
g = g(x;t) = A(t, x) ,
F = F(x;t) = f (z)dz ,
x
G = G(x;t) = g(z;t)dz .
x
= (x;t) = (x;t) ,
x
= (t) = (t;t) ,
D = (D )(x;t) = + (x;t) .
x y
We have
d
D f = f , DF = F = f , G = g , F = f, (3.9.88)
dt
d
Dg = f f , DG = f F , G = (F )2 /2 , G = f F , (3.9.89)
dt
the first four relations clearly, and the latter four following from the integral rep-
resentation (3.9.58) of A(x, y). We further have
q = f + f |g , (3.9.90)
by (3.9.87). The next lemma links q to the ratios (3.9.62) in the edge case by
expressing these ratios in terms of angle brackets. For {1, 4} let
h 1 12 F " #
g =
12
,1 1
g
,
2 + 4F f
f 0 1
" # ,1 ,1 1 G
G
1
F + F
= 2 4
,1
2 4 1 .
F 0 1
2 2 F
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 179
It is easy to check that all the angle brackets are well defined.
Proof We arbitrarily fix real t, along with {1, 4} and > 0. Let K = E +KAiry,1
if = 1 and otherwise let K = 2KAiry,4 if = 4. Let I = (t, ) and define inputs
to Lemma 3.9.37 as follows.
" #
a(x, y) b(x, y)
= 1II (x, y)K(x, y) ,
c(x, y) d(x, y)
1
e(x, y) = 1II (x, y) sign(x y) ,
2
(x, y) = 1II (x, y)A(x, y) ,
1
w(x, y) = ,1 Ai(z)dz Ai(y) .
2 x
Using Lemma 3.9.38 with t1 = t and t2 , one can verify after a straightforward
if long calculation that if = 1, then
" #" #
g1 (y;t) 0 1 h1 (x;t)
K1 (x, y) = 1II (x, y) ,
G1 (y;t) F1 (y;t) 0 f1 (x;t)
whereas, if = 4, then
" # 1 h4 (y;t)/2
g4 (x;t)/2 0 0 0 g4 (y;t)/2 .
K4 (x, y) = 1II (x, y)
G4 (x;t) 1 F4 (x;t)
0 f4 (y;t)
We also have
" #
0 0
R(x, y) = 1II (x, y) .
0 R(x, y;t)
The right sides of (3.9.68) and (3.9.69) equal edge, (t) for {1, 4}, respec-
tively, by Corollary 3.9.25. Using Remark 3.9.15, and the identity
,1
g (x;t)dx = F (t) ,
t 2
which follows from (3.9.88) and the definitions, one can check that for = 1 the
180 3. S PACINGS FOR G AUSSIAN ENSEMBLES
left side of (3.9.68) equals the right side of (3.9.91), and that for = 4, the left
side of (3.9.69) equals the right side of (3.9.92). This completes the proof.
One last preparation is required. For the rest of the proof we drop the subscript
t, writing 1 |2 instead of 1 |2 t . For 1 { f , g} and 2 {1, F, G}, we have
d
1 |2 = D1 |2 + 1 |D2 f |1 f |2 , (3.9.93)
dt
1 |2 + 1 |2 = (1 + g|1 )(2 + g|2 ) + f |1 f |2 , (3.9.94)
as
one verifies
by straightforwardly applying the previously obtained formulas for
x + y R(x, y;t) and t R(x, y;t), see Section 3.8.
We now calculate using (3.9.88), (3.9.89), (3.9.90), (3.9.93) and (3.9.94). We
have
d
(1 + g|1) = q( f |1) ,
dt
d
( f |1) = f |1 + f | f 1| f = q(1 + g|1) ,
dt
d
(1 f |F) = f |F f | f + f | f f |F = q(F + g|F) ,
dt
d
(F + g|F) = q(1 f |F) ,
dt
g|1 = (G + g|G)(1 + g|1) + f |G f |1 ,
g|F + f |G = (G + g|G)(F + g|F) + f |F f |G .
The first four differential equations are easy to integrate, and moreover the con-
stants of integration can be fixed in each case by noting that the angle brackets
tend to 0 as t +, as does q. In turn, the last two algebraic equations are easily
solved for g|G and f |G. Letting
x = x(t) = exp q(x)dx ,
t
Remark 3.9.42 The evaluations of determinants which conclude the proof above
are too long to suffer through by hand. Fortunately one can organize them into
manipulations of matrices with entries that are (Laurent) polynomials in variables
x and F , and carry out the details with a computer algebra system.
The study of spacings between eigenvalues of random matrices in the bulk was
motivated by Wigners surmise [Wig58], that postulated a density of spacing
distributions of the form Cses /4 . Soon afterwords, it was realized that this was
2
not the case [Meh60]. This was followed by the path-breaking work [MeG60],
that established the link with orthogonal polynomials and the sine-kernel. Other
relevant papers from that early period include the series [Dys62b], [Dys62c],
[Dys62d] and [DyM63]. An important early paper concerning the orthogonal
and symplectic ensembles is [Dys70]. Both the theory and a description of the
history of the study of spacings of eigenvalues of various ensembles can be found
in the treatise [Meh91]. The results concerning the largest eigenvalue are due to
[TrW94a] for the GUE (with a 1992 ArXiv online posting), and [TrW96] for the
GOE and GSE; a good review is in [TrW93]. These results have been extended
in many directions; at the end of this section we provide a brief description and
pointers to the relevant (huge) literature.
The book [Wil78] contains an excellent short introduction to orthogonal poly-
nomials as presented in Section 3.2. Other good references are the classical
[Sze75] and the recent [Ism05]. The three term recurrence and the Christoffel
Darboux identities mentioned in Remark 3.2.6 hold for any system of polynomials
orthogonal with respect to a given weight on the real line.
Section 3.3.1 follows [HaT03], who proved (3.3.11) and observed that differ-
ential equation (3.3.12) implies a recursion for the moments of LN discovered by
[HaZ86] in the course of the latters investigation of the moduli space of curves.
Their motivation came from the following: at least formally, we have the expan-
sion
s2p
LN , es = LN , x2p .
p0 2p!
with N C tr(X2p ),g (1) the number of perfect matchings on one vertex of degree 2p
whose associated graph has genus g. Hence, computing LN , es as in Lemma
3.3.1 gives exact expressions for the numbers N C tr(X2p ),g (1). The link between
random matrices and the enumeration of maps was first described in the physics
context in [tH74] and [BrIPZ78], and has since been enormously developed, also
to situations involving multi-matrices, see [GrPW91], [FrGZJ95] for a descrip-
tion of the connection to quantum gravity. In these cases, matrices do not have in
general independent entries but their joint distribution is described by a Gibbs
measure. When this joint distribution is a small perturbation of the Gaussian
law, it was shown in [BrIPZ78] that, at least at a formal level, annealed mo-
ments LN , x2p expands formally into a generating function of the numbers of
maps. For an accessible introduction, see [Zvo97], and for a discussion of the as-
sociated asymptotic expansion (in contrast with formal expansion), see [GuM06],
[GuM07], [Mau06] and the discussion of RiemannHilbert methods below.
The sharp concentration estimates for max contained in Lemma 3.3.2 are de-
rived in [Led03].
Our treatment of Fredholm determinants in Section 3.4 is for the most part
adapted from [Tri85]. The latter gives an excellent short introduction to Fredholm
determinants and integral equations from the classical viewpoint.
The beautiful set of nonlinear partial differential equations (3.6.4), contained in
Theorem 3.6.1, is one of the great discoveries reported in [JiMMS80]. Their work
follows the lead of the theory of holonomic quantum fields developed by Sato,
Miwa and Jimbo in the series of papers [SaMJ80]. The link between Toeplitz
and Fredholm determinants and the Painleve theory of ordinary differential equa-
tions was earlier discussed in [WuMTB76], and influenced the series [SaMJ80].
See the recent monograph [Pal07] for a discussion of these developments in the
original context of the evaluation of correlations for two dimensional fields. To
derive the equations (3.6.4) we followed the simplified approach of [TrW93], how-
ever we altered the operator-theoretic viewpoint of [TrW93] to a matrix algebra
viewpoint consistent with that taken in our general discussion in Section 3.4 of
Fredholm determinants. The differential equations have a Hamiltonian structure
discussed briefly in [TrW93]. The same system of partial differential equations is
discussed in [Mos80] in a wider geometrical context. See also [HaTW93].
Limit formula (3.7.4) appears in the literature as [Sze75, Eq. 8.22.14, p. 201]
but is stated there without much in the way of proof. The relatively short self-
contained proof of (3.7.4) presented in Section 3.7.2 is based on the ideas of
[PlR29]; the latter paper is, however, devoted to the asymptotic behavior of the
Hermite polynomials Hn (x) for real positive x only.
3.10 B IBLIOGRAPHICAL NOTES 183
In Section 3.8, we follow [TrW02] fairly closely. It is possible to work out a sys-
tem of partial differential equations for the Fredholm determinant of the Airy ker-
nel in the multi-interval case analogous to the system (3.6.4) for the sine-kernel.
See [AdvM01] for a general framework that includes also non-Gaussian models.
As in the case of the sine-kernel, there is an interpretation of the system of partial
differential equations connected to the Airy kernel in the multi-interval case as an
integrable Hamiltonian system, see [HaTW93] for details.
The statement contained in Remark 3.8.1, taken from [HaM80], is a solution
of a connection problem. For another early solution to connection problems, see
[McTW77]. The book [FoIKN06] contains a modern perspective on Painleve
equations and related connection problems, via the RiemannHilbert approach.
Precise asymptotics on the TracyWidom distribution are contained in [BaBD08]
and [DeIK08].
Section 3.9 borrows heavily from [TrW96] and [TrW05], again reworked to our
matrix algebra viewpoint.
Our treatment of Pfaffians in Section 3.9.1 is classical, see [Jac85] for more
information. We avoided the use of quaternion determinants; for a treatment based
on these, see e.g. [Dys70] and [Meh91].
An analog of Lemma 3.2.2 exists for = 1, 4, see Theorem 6.2.1 and its proof
in [Meh91] (in the language of quaternion determinants) and the exposition in
[Rai00] (in the Pfaffian language).
As mentioned above, the results of this chapter have been extended in many
directions, seeking to obtain universality results, stating that the limit distributions
for spacings at the bulk and the edge of the GOE/GUE/GSE appear also in other
matrix models, and in other problems. Four main directions for such universality
occur in the literature, and we describe these next.
First, other classical ensembles have been considered (see Section 4.1 for what
ensembles mean in this context). These involve the study of other types of orthog-
onal polynomials than the Hermite polynomials (e.g., Laguerre or Jacobi). See
[For93], [For94], [TrW94b], [TrW00], [Joh00], [John01], [For06], and the book
[For05].
Second, one may replace the entries of the random matrix by non-Gaussian
entries. In that case, the invariance of the law under conjugation is lost, and no ex-
plicit expression for the joint distribution of the eigenvalues exist. It is, however,
remarkable that it is still possible to obtain results concerning the top eigenvalue
and spacings at the edge that are of the same form as Theorems 3.1.4 and 3.1.7,
in the case that the law of the entries possesses good tail properties. The seminal
184 3. S PACINGS FOR G AUSSIAN ENSEMBLES
Third, one can consider joint distribution of eigenvalues of the form (2.6.1), for
general potentials V . This is largely motivated by applications in physics. When
deriving the bulk and edge asymptotics, one is naturally led to study the asymp-
totics of orthogonal polynomials associated with the weight eV . At this point,
the powerful RiemannHilbert approach to the asymptotics of orthogonal poly-
nomials and spacing distributions can be applied. Often, that approach yields the
sharpest estimates, especially in situations where the orthogonal polynomials are
not known explicitly, thereby proving universality statements for random matri-
ces. Describing this approach in detail goes beyond the scope of this book (and
bibliography notes). For the origins and current state of the art of this approach we
refer the reader to the papers [FoIK92], [DeZ93], [DeZ95], [DeIZ97], [DeVZ97]
[DeKM+ 98], [DeKM+ 99], [BlI99], to the books [Dei99], [DeG09] and to the
lecture [Dei07]. See also [PaS08a].
Let Ln denote the length of the longest increasing subsequence of a random per-
mutation on {1, . . . , n}. The problem is to understand the asymptotics of the law
of Ln . Based on his subadditive ergodic theorem, Hammersley [Ham72] showed
that Ln / n converges to a deterministic limit and, shortly thereafter, [VeK77] and
[LoS77] independently proved that the limit equals 2. It was conjectured (in anal-
ogy with conjectures for first passage percolation, see [AlD99] for some of the
history and references) that Ln := (Ln 2 n)/n1/6 has variance of order 1. Using
a combinatorial representation, due to Gessel, of the distribution of Ln in terms
of an integral over an expression resembling a joint distribution of eigenvalues
(but with non-Gaussian potential V ), [BaDJ99] applied the RiemannHilbert ap-
proach to prove that not only is the conjecture true, but in fact Ln asymptotically
is distributed according to the TracyWidom distribution F2 . Subsequently, di-
rect proofs that do not use the RiemannHilbert approach (but do use the random
matrices connection) emerged, see [Joh01a], [BoOO00] and [Oko00]. Certain
growth models also fall in the same pattern, see [Joh00] and [PrS02]. Since then,
many other examples of combinatorial problems leading to a universal behavior
of the TracyWidom type have emerged. We refer the reader to the forthcoming
book [BaDS09] for a thorough discussion.
We have not discussed, neither in the main text nor in these bibliographical
notes, the connections between random matrices and number theory, more specif-
ically the connections with the Riemann zeta function. We refer the reader to
[KaS99] for an introduction to these links, and to [Kea06] for a recent account.
4
Some generalities
In this chapter, we introduce several tools useful in the study of matrix ensem-
bles beyond GUE, GOE and Wigner matrices. We begin by setting up in Section
4.1 a general framework for the derivation of joint distribution of eigenvalues in
matrix ensembles and then we use it to derive joint distribution results for several
classical ensembles, namely, the GOE/GUE/GSE, the Laguerre ensembles (corre-
sponding to Gaussian Wishart matrices), the Jacobi ensembles (corresponding to
random projectors) and the unitary ensembles (corresponding to random matrices
uniformly distributed in classical compact Lie groups). In Section 4.2, we study
a class of point processes that are determinantal; the eigenvalues of the GUE, as
well as those for the unitary ensembles, fall within this class. We derive a repre-
sentation for determinantal processes and deduce from it a CLT for the number
of eigenvalues in an interval, as well as ergodic consequences. In Section 4.3,
we analyze time-dependent random matrices, where the entries are replaced by
Brownian motions. The introduction of Brownian motion allows us to use the
powerful theory of Ito integration. Generalizations of the Wigner law, CLTs, and
large deviations are discussed. We then present in Section 4.4 a discussion of
concentration inequalities and their applications to random matrices, substantially
extending Section 2.3. Concentration results for matrices with independent en-
tries, as well as for matrices distributed according to Haar measure on compact
groups, are discussed. Finally, in Section 4.5, we introduce a tridiagonal model of
random matrices, whose joint distribution of eigenvalues generalizes the Gaussian
ensembles by allowing for any value of 1 in Theorem 2.5.3. We refer to this
matrix model as the beta ensemble.
186
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 187
Throughout this section, we let F denote any of the (skew) fields R, C or H. (See
Appendix E for the definition of the skew field of quaternions H. Recall that H
is a skew field, but not a field, because the product in H is not commutative.)
We set = 1, 2, 4 according as F = R, C, H, respectively. (Thus is the dimen-
sion of F over R.) We next recall matrix notation which in greater detail is set
out in Appendix E.1. Let Mat pq (F) be the space of p q matrices with en-
tries in F, and write Matn (F) = Matnn (F). For each matrix X Mat pq (F), let
X Matqp (F) be the matrix obtained by transposing X and then applying the
conjugation operation to every entry. We endow Mat pq (F) with the structure
188 4. S OME GENERALITIES
of Euclidean space (that is, with the structure of finite-dimensional real Hilbert
space) by setting X Y = tr X Y . Let GLn (F) be the group of invertible ele-
ments of Matn (F), and let Un (F) be the subgroup of GLn (F) consisting of unitary
matrices; by definition U Un (F) iff UU = In iff U U = In .
The first integration formula that we present pertains to the Gaussian ensembles,
that is, to the GOE, GUE and GSE. Let Hn (F) = {X Matn (F) : X = X}. Let
Hn (F) denote the volume measure on Hn (F). (See Proposition F.8 for the general
definition of the volume measure M on a manifold M embedded in a Euclidean
space.) Let Un (F) denote the volume measure on Un (F). (We will check below,
see Proposition 4.1.14, that Un (F) is a manifold.) The measures Hn (F) and Un (F)
are just particular normalizations of Lebesgue and Haar measure, respectively. Let
[Un (F)] denote the (finite and positive) total volume of Un (F). (For any manifold
M embedded in a Euclidean space, we write [M] = M (M).) We will calculate
[Un (F)] explicitly in Section 4.1.2. Recall that if x = (x1 , . . . , xn ), then we write
(x) = 1i< jn (x j xi ). The notion of eigenvalue used in the next result is
defined for general F in a uniform way by Corollary E.12 and is the standard one
for F = R, C.
where for every x = (x1 , . . . , xn ) Rn we write (x) = (X) for any X Hn (F)
with eigenvalues x1 , . . . , xn .
According to Corollary E.12, the hypothesis that (X) depends only on the eigen-
values of X could be restated as the condition that (UXU ) = (X) for all
X Hn (F) and U Un (F).
Suppose now that X Hn (F) is random. Suppose more precisely that the en-
tries on or above the diagonal are independent; that each diagonal entry is (real)
Gaussian of mean 0 and variance 2/ ; and that each above-diagonal entry is stan-
dard normal over F. (We say that a random variable G with values in F is stan-
dard normal if, with {Gi }4i=1 independent real-valued Gaussian random variables
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 189
G1 if F = R ,
(G1 + iG2 )/ 2 if F = C ,
(G1 + iG2 + jG3 + kG4 )/2 if F = H .) (4.1.2)
Remark 4.1.2 As in formula (4.1.1), all the integration formulas in this section
involve normalization constants given in terms of volumes of certain manifolds.
Frequently, when working with probability distributions, one bypasses the need
to evaluate these volumes by instead using the Selberg integral formula, Theorem
2.5.8, and its limiting forms, as in our previous discussion of the GOE and GUE
in Section 2.5.
We saw in Chapter 3 that the Hermite polynomials play a crucial role in the
analysis of GUE/GOE/GSE matrices. For that reason we will sometimes speak of
Gaussian/Hermite ensembles. In similar fashion we will tag each of the next two
ensembles by the name of the associated family of orthogonal polynomials.
We next turn our attention to random matrices generalizing the Wishart matrices
discussed in Exercise 2.1.18, in the case of Gaussian entries. Fix integers 0 <
p q and put n = p + q. Let Mat pq (F) be the volume measure on the Euclidean
space Mat pq (F). The analog of integration formula (4.1.1) for singular values of
rectangular matrices is the following. The notion of singular value used here is
defined for general F in a uniform way by Corollary E.13 and is the standard one
for F = R, C.
190 4. S OME GENERALITIES
where for every x = (x1 , . . . , x p ) R p we write (x) = (X) for any matrix X
H p (F) with eigenvalues x1 , . . . , x p .
The symmetry here crucial for the proof is that (W (p) ) = ((UWU )(p) ) for all
U Un (F) commuting with diag(Ip , 0q ) and all W Flagn (D, F).
Now up to a normalization constant, Flagn (D,F) is the law of a random matrix
of the form Un DUn , where Un Un (F) is Haar-distributed. (See Exercise 4.1.19
for evaluation of the constant [Flagn (D, F)].) We call such a random matrix
Un DUn a random projector. The joint distribution of eigenvalues of the submatrix
(Un DUn )(p) is then specified by formula (4.1.5). Now the orthogonal polynomials
corresponding to weights of the form x (1 x) on [0, 1] are the Jacobi polyno-
mials. In the analysis of random matrices of the form (Un DUn )(p) , the Jacobi
polynomials play a role analogous to that played by the Hermite polynomials in
the analysis of GUE/GOE/GSE matrices. For this reason we call (Un DUn )(p) a
random element of a Jacobi ensemble over F.
The last several integration formulas we present pertain to the classical compact
Lie groups Un (F) for F = R, C, H, that is, to the ensembles of orthogonal, unitary
and symplectic "matrices, respectively, # equipped with normalized Haar measure.
cos sin
We set R( ) = U2 (R) for R. More generally, for =
sin cos
(1 , . . . , n ) R , we set Rn ( ) = diag(R(1 ), . . . , R(n )) U2n (R). We also write
n
Remark 4.1.5 The choice of letters A, B, C, and D made here is consistent with
the standard labeling of the corresponding root systems.
(Odd orthogonal case) For odd n = 2+1 and every nonnegative Borel-measurable
central function on Un (R), we have
d Un (R)
1 1
d i
= (diag(R ( ), (1) ))B ( ) 2 .
[Un (R)] 2+1 ! [0,2 ] k=0
k
i=1
(4.1.7)
(Symplectic case) For every nonnegative Borel-measurable central function on
Un (H), we have
d Un (H) n
1 d i
= (eidiag( )
)Cn ( ) . (4.1.8)
[Un (H)] 2n n! [0,2 ]n i=1 2
We will recover these classical results of Weyl in our setup in order to make it
clear that all the results on joint distribution discussed in Section 4.1 fall within
Weyls circle of ideas.
Dn ( ) = (2 cos i 2 cos j )2 ,
1i< jn
Section 4.1.2 introduces the coarea formula, Theorem 4.1.8. In the specialized
form of Corollary 4.1.10, the coarea formula will be our main tool for proving the
formulas of Section 4.1.1. To allow for quick reading by the expert, we merely
state the coarea formula here, using standard terminology; precise definitions,
preliminary material and a proof of Theorem 4.1.8 are all presented in Appendix
F. After presenting the coarea formula, we illustrate it by working out an explicit
formula for [Un (F)].
Fix a smooth map f : M N from an n-manifold to a k-manifold, with deriva-
tive at a point p M denoted T p ( f ) : T p (M) T f (p) (N). Let Mcrit , Mreg , Ncrit
and Nreg be the sets of critical (regular) points (values) of f , see Definition F.3
and Proposition F.10 for the terminology. For q N such that Mreg f 1 (q) is
nonempty (and hence by Proposition F.16 a manifold) we equip the latter with the
volume measure Mreg f 1 (q) (see Proposition F.8). Put 0/ = 0 for convenience.
Finally, let J(T p ( f )) denote the generalized determinant of T p ( f ), see Definition
F.17.
Theorem 4.1.8 (The coarea formula) With notation and setting as above, let
be any nonnegative Borel-measurable function on M. Then:
(i) the function p J(T p ( f )) on M is Borel-measurable;
(ii) the function q (p)d Mreg f 1 (q) (p) on N is Borel-measurable;
194 4. S OME GENERALITIES
holds.
Corollary 4.1.10 We continue in the setup of Theorem 4.1.8. For every Borel-
measurable nonnegative function on N one has the integral formula
( f (p))J(T p ( f ))d M (p) = [ f 1 (q)] (q)d N (q) . (4.1.11)
Nreg
whence the result by Sards Theorem (Theorem F.11), Proposition F.16, and the
definitions.
Let Sn1 be the unit sphere centered at the origin in Rn . We will calculate
[Un (F)] by relating it to [Sn1 ]. We prepare by proving two well-known lem-
mas concerning Sn1 and its volume. Their proofs provide templates for the more
complicated proofs of Lemma 4.1.15 and Proposition 4.1.14 below.
Lemma 4.1.11 Sn1 is a manifold and for every x Sn1 we have Tx (Sn1 ) =
{X Rn : x X = 0}.
s1 x
Recall that (s) = 0 x e dx is Eulers Gamma function.
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 195
As further preparation for the evaluation of [Un (F)], we state without proof
the following elementary lemma which allows us to consider transformations of
manifolds by left (or right) matrix multiplication.
Lemma 4.1.13 Let M Matnk (F) be a manifold. Fix g GLn (F). Let f = (p
gp) : M gM = {gp Matnk (F) : p M}. Then:
(i) gM is a manifold and f is a diffeomorphism;
(ii) for every p M and X T p (M) we have T p ( f )(X) = gX;
(iii) if g Un (F), then f is an isometry (and hence measure-preserving).
The proof of Proposition 4.1.14 will be obtained by applying the coarea formula
to the smooth map
f = (g (last column of g)) : Un (F) S n1 (4.1.14)
196 4. S OME GENERALITIES
S n1 = {x Matn1 (F) : x x = 1}
Lemma 4.1.15 Un (F) is a manifold and TIn (Un (F)) is the space of anti-self-
adjoint matrices in Matn (F).
Let be a curve in Matn (F) with (0) = In and (0) = X TIn (Matn (F)) =
Matn (F). Then, for all g Un (F) and X Matn (F),
Lemma 4.1.16 f is onto, and furthermore (provided that n > 1), for any s S n1 ,
the fiber f 1 (s) is isometric to Un1 (F).
Proof The first claim (which should be obvious in the cases F = R, C) is proved
by applying Corollary E.8 with k = 1. To see the second claim, note first that for
any W Un1 (F), we have
" #
W 0
Un (F) , (4.1.16)
0 1
and that every g Un (F) whose last column is the unit vector en = (0, . . . , 0, 1)T
is necessarily of the form (4.1.16). Therefore the fiber f 1 (en ) is isometric to
Un1 (F). To see the claim for other fibers, note that if g, h Un (F), then f (gh) =
g f (h), and then apply part (iii) of Lemma 4.1.13.
Proof (i) Fix h Un (F) arbitrarily. Let en = (0, . . . , 0, 1)T Matn1 . The diagram
TI ( f )
TIn (Un (F)) n Ten (S n1 )
TIn (ghg) Ten (xhx)
Th ( f )
Th (Un (F)) T f (h) (S n1 )
commutes. Furthermore, its vertical arrows are, by part (ii) of Lemma 4.1.13,
induced by left-multiplication by h, and hence are isometries of Euclidean spaces.
Therefore we have J(Th ( f )) = J(TIn ( f )).
(ii) Recall the notation i, j, k in Definition E.1. Recall the elementary matrices
ei j Matn (F) with 1 in position (i, j) and 0s elsewhere, see Appendix E.1. By
Lemma 4.1.15 the collection
{(uei j u e ji )/ 2 : 1 i < j n, u {1, i, j, k} F}
{ueii : 1 i n, u {i, j, k} F}
is an orthonormal basis for TIn (Un (F)). Let be a curve in Un (F) with (0) = In
and (0) = X TIn (Un (F)). We have
(TIn ( f ))(X) = ( en ) (0) = Xen ,
hence the collection
{(uein u eni )/ 2 : 1 i < n, u {1, i, j, k} F}
{uenn : u {i, j, k} F}
is an orthonormal basis for TIn (Un (F))(ker(TIn ( f ))) . An application of Lemma
F.19 yields the desired formula.
(iii) This follows from the preceding two statements, since f is onto.
Proof of Proposition 4.1.14 Assume at first that n > 1. We apply Corollary 4.1.10
to f with 1. After simplifying with the help of the preceding two lemmas, we
find the relation
(1n)
2 [Un (F)] = [Un1 (F)] [S n1 ] .
By induction on n we conclude that formula (4.1.13) holds for all positive integers
n; the induction base n = 1 holds because S 1 = U1 (F).
With an eye toward the proof of Proposition 4.1.4 about Jacobi ensembles, we
prove the following concerning the spaces Flagn ( , F) defined in (4.1.4).
198 4. S OME GENERALITIES
Proof In view of Corollary E.12 (the spectral theorem for self-adjoint matrices
over F), Flagn (D, F) is the set of projectors in Matn (F) of trace p. Now consider
the open set O Hn (F) consisting of matrices whose p-by-p block in upper left
is invertible, noting that D O. Using Corollary E.9, one can construct a smooth
map from Mat pq (F) to O Flagn (D, F) with a smooth inverse. Now let P
Flagn (D, F) be any point. By definition P = U DU for some U Un (D, F). By
Lemma 4.1.13 the set {UMU | M O Flagn (D, F)} is a neighborhood of P
diffeomorphic to O Flagn (D, F) and hence to Mat pq (F). Thus Flagn (D, F) is
indeed a manifold of dimension pq.
Motivated by Lemma 4.1.18, we refer to Flagn (D, F) as the flag manifold deter-
mined by D. In fact the claim in Lemma 4.1.18 holds for all real diagonal matrices
D, see Exercise 4.1.19 below.
dim Un (F) dim Uni (F).
i=1
[Un (F)]
[Flagn ( , F)] =
i=1 [U (F)]
|i j | . (4.1.17)
ni 1i< jn
i = j
(c) Derive the joint distribution of eigenvalues in the GUE, GOE and GSE from
(4.1.17) and (4.1.18).
For the rest of Section 4.1 we will be working in the setup of Lie groups, see
Appendix F for definitions and basic properties. We aim to derive an integration
formula of Weyl type, Theorem 4.1.28, in some generality, which encompasses
all the results enunciated in Section 4.1.1.
Our immediate goal is to introduce a framework within which a uniform ap-
proach to derivation of joint eigenvalue distributions is possible. For motivation,
suppose that G and M are submanifolds of Matn (F) and that G is a closed sub-
group of Un (F) such that {gmg1 : m M, g G} = M. We want to integrate
out the action of G. More precisely, given a submanifold M which satisfies
M = {g g1 : g G, }, and a function on M such that (gmg1 ) = (m)
for all m M and g G, we want to represent d M in a natural way as an
integral on . This is possible if we can control the set of solutions (g, ) G
of the equation g g1 = m for all but a negligible set of m M. Such a procedure
was followed in Section 2.5 when deriving the law of the eigenvalues of the GOE.
However, as was already noted in the derivation of the law of the eigenvalues of
the GUE, decompositions of the form m = g g1 are not unique, and worse, the
set {(g, ) G : g g1 = m} is in general not discrete. Fortunately, however,
it typically has the structure of compact manifold. These considerations (and hind-
sight based on familiarity with classical matrix ensembles) motivate the following
definition.
H, M and with common ambient space Matn (F) satisfying the following condi-
tions:
(I) (a) G is a closed subgroup of Un (F),
(b) H is a closed subgroup of G, and
(c) dim G dim H = dim M dim .
(II) (a) M = {g g1 : g G, },
(b) = {h h1 : h H, },
(c) for every the set {h h1 : h H} is finite, and
(d) for all , we have = .
(III) There exists such that
(a) is open in ,
(b) ( \ ) = 0, and
(c) for every we have H = {g G : g g1 }.
We say that a subset for which (IIIa,b,c) hold is generic.
We emphasize that by conditions (Ia,b), the groups G and H are compact, and
that by Lemma 4.1.13(iii), the measures G and H are Haar measures. We also
remark that we make no connectedness assumptions concerning G, H, M and
. (In general, we do not require manifolds to be connected, although we do
assume that all tangent spaces of a manifold are of the same dimension.) In fact,
in practice, H is usually not connected.
In the next proposition we present the simplest example of a Weyl quadruple.
We recall, as in Definition E.4, that a matrix h Matn (F) is monomial if it factors
as the product of a diagonal matrix and a permutation matrix.
This Weyl quadruple and the value of the associated constant [G]/ [H] will be
used to prove Proposition 4.1.1.
Proof Of all the conditions imposed by Definition 4.1.22, only conditions (Ic),
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 201
(IIa) and (IIIc) require special attention, because the others are clear. To verify
condition (Ic), we note that
The first two equalities are clear since M and are real vector spaces. By Lemma
4.1.15 the tangent space TIn (G) consists of the collection of anti-self-adjoint ma-
trices in Matn (F), and thus the third equality holds. So does the fourth because
TIn (H) consists of the diagonal elements of TIn (G). Thus condition (Ic) holds.
To verify condition (IIa), we have only to apply Corollary E.12(i) which asserts
the possibility of diagonalizing a self-adjoint matrix. To verify condition (IIIc),
arbitrarily fix , and g G such that g g1 = , with the goal to show
that g H. In any case, by Corollary E.12(ii), the diagonal entries of are merely
a rearrangement of those of . After left-multiplying g by a permutation matrix
(the latter belongs by definition to H), we may assume that = , in which case g
commutes with . Then, because the diagonal entries of are distinct, it follows
that g is diagonal and thus belongs to H. Thus (IIIc) is proved. Thus (G, H, M, )
is a Weyl quadruple for which is generic.
We turn to the verification of formula (4.1.19). It is clear that the numerator on
the right side of (4.1.19) is correct. To handle the denominator, we observe that H
is the disjoint union of n! isometric copies of the manifold U1 (F)n , and then apply
Proposition F.8(vi). Thus (4.1.19) is proved.
Note that condition (IIa) of Definition 4.1.22 implies that gmg1 M for all
m M and g G. Thus the following definition makes sense.
For the calculation of the factor J(T(g, ) ( f )) figuring in the coarea formula for the
map f we need to understand for each fixed the structure of the derivative
at In G of the map
f = (g g g1 ) : G M (4.1.21)
202 4. S OME GENERALITIES
Concerning the derivative TIn ( f ) we then have the following key result.
Lemma 4.1.26 Fix a Weyl quadruple (G, H, M, ) with ambient space Matn (F)
and a point . Let f be as in (4.1.21). Then we have
be the linear map induced by TIn ( f ). For each we define the Weyl operator
to equal D D .
Theorem 4.1.28 (Weyl) Let (G, H, M, ) be a Weyl quadruple. Then for every
Borel-measurable nonnegative G-conjugation-invariant function on M, we have
[G]
d M = ( ) det d ( ) .
[H]
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 203
The proof takes up the rest of Section 4.1.3. We emphasize that a Weyl quadruple
(G, H, M, ) with ambient space Matn (F) is fixed now and remains so until the
end of Section 4.1.3.
We begin with the analysis of the maps f and f defined in (4.1.20) and (4.1.21),
respectively.
Proof The function f |H is continuous and by assumption (IIc) takes only finitely
many values. Thus f |H is locally constant, whence the result.
The inclusion follows from assumption (IIb). To prove the opposite inclu-
sion , suppose now that g g1 = g0 0 g1 0 for some g G and . Then
we have g1 g0 H by assumption (IIIc), hence g1 0 g = h for some h H, and
hence (g, ) = (g0 h, h1 0 h). The claim is proved. By assumptions (Ia,b) and
Lemma 4.1.13(iii), the map
(h g0 h) : H g0 H = {g0 h : h H}
is an isometry of manifolds, and indeed is the restriction to H of an isometry of
Euclidean spaces. In view of Lemma 4.1.29, the map
(h (g0 h, h1 0 h)) : H f 1 (g0 0 g1
0 ) (4.1.26)
is also an isometry, which finishes the proof of Lemma 4.1.30.
Note that we have not asserted that the map (4.1.26) preserves distances as
measured in ambient Euclidean spaces, but rather merely that it preserves geodesic
distances within the manifolds in question. For manifolds with several connected
components (as is typically the case for H), distinct connected components are
considered to be at infinite distance one from the other.
Proof of Lemma 4.1.26 The identity (4.1.22) follows immediately from Lemma
4.1.29.
204 4. S OME GENERALITIES
We prove (4.1.23). Let be a curve in G with (0) = In and (0) = X TIn (G).
Since ( 1 ) = 1 1 , we have TIn ( f )(X) = ( 1 ) (0) = [X, ]. Thus
(4.1.23) holds.
It remains to prove (4.1.24). As a first step, we note that
[ , X] = 0 for and X T () . (4.1.27)
Indeed, let be a curve in with (0) = and (0) = X. Then [ , ] vanishes
identically by Assumption (IId) and hence [ , X] = 0.
We further note that
[X, ] Y = X [Y, ] for X,Y Matn (F) , (4.1.28)
which follows from the definition A B = trX Y for any A, B Matn (F) and
straightforward manipulations.
We now prove (4.1.24). Given X TIn (G) and L T (), we have
TIn ( f )(X) L = [X, ] L = X [L, ] = 0 ,
where the first equality follows from (4.1.23), the second from (4.1.28) and the
last from (4.1.27). This completes the proof of (4.1.24) and of Lemma 4.1.26.
Lemma 4.1.31 Let : Matn (F) TIn (G) TIn (H) be the orthogonal projec-
tion. Fix . Then the following hold:
(X) = ([ , [ , X]]) for X TIn (G) TIn (H) , (4.1.29)
J(T(g, ) ( f )) = det for g G . (4.1.30)
Proof We prove (4.1.29). Fix X,Y TIn (G) TIn (H) arbitrarily. We have
(X) Y = D (D (X)) Y = D (X) D (Y )
= TIn ( f )(X) TIn ( f )(Y )
= [X, ] [Y, ] = [[X, ], ] Y = ([[X, ], ]) Y
at the first step by definition, at the second step by definition of adjoint, at the third
step by definition of D , at the fourth step by (4.1.23), at the fifth step by (4.1.28)
and at the last step trivially. Thus (4.1.29) holds.
Fix h G arbitrarily. We claim that J(T(h, ) ( f )) is independent of h G.
Toward that end consider the commuting diagram
T(In , ) ( f )
T(In , ) (G ) T (M)
T(In , ) ((g, )(hg, )) T (mhmh1 ).
T(h, ) ( f )
T(h, ) (G ) Thmh1 (M)
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 205
Since the vertical arrows are isometries of Euclidean spaces by assumption (Ia)
and Lemma 4.1.13(ii), it follows that J(T(h, ) ( f )) = J(T(In , ) ( f ), and in particular
is independent of h, as claimed.
We now complete the proof of (4.1.30), assuming without loss of generality that
g = In . By definition
where we recall that the direct sum is equipped with Euclidean structure by declar-
ing the summands to be orthogonal. Clearly we have
By (4.1.24) and (4.1.31), the linear map T(In , ) ( f ) decomposes as the orthogonal
direct sum of TIn ( f ) and the identity map of T () to itself. Consequently we
have J(TIn , ( f )) = J( TIn ( f )) by Lemma F.18. Finally, by assumption (Ic),
formula (4.1.22) and Lemma F.19, we find that J( TIn ( f )) = det .
Proof of Theorem 4.1.28 Let Mreg be the set of regular values of the map f . We
have
[ f 1 (m)] (m)d M (m) = ( ) det d G (g, )
Mreg
= [G] ( ) det d ( ) . (4.1.32)
The two equalities in (4.1.32) are justified as follows. The first holds by formula
(4.1.30), the pushed down version (4.1.11) of the coarea formula, and the fact
that ( f (g, )) = ( ) by the assumption that is G-conjugation-invariant. The
second holds by Fubinis Theorem and the fact that G = G by Proposi-
tion F.8(vi).
By assumption (IIa) the map f is onto, hence Mreg = M \ Mcrit , implying by
Sards Theorem (Theorem F.11) that Mreg has full measure in M. For every m
Mreg , the quantity [ f 1 (m)] is positive (perhaps infinite). The quantity [G] is
positive and also finite since G is compact. It follows by (4.1.32) that the claimed
integration formula at least holds in the weak sense that a G-conjugation-invariant
Borel set A M is negligible in M if the intersection A is negligible in .
Now put M = {g g1 : g G, }. Then M is a Borel set. Indeed, by
assumption (IIIa) the set is -compact, hence so is M . By construction M is
G-conjugation-invariant. Now we have M , hence by assumption (IIIb)
the intersection M is of full measure in , and therefore by what we proved
in the paragraph above, M is of full measure in M. Thus, if we replace by 1M
in (4.1.32), neither the first nor the last integral in (4.1.32) changes and further,
206 4. S OME GENERALITIES
by Lemma 4.1.30, we can replace the factor f 1 (m) f 1 (m) in the first integral by
[H]. Therefore we have
[H] (m)d M (m) = [G] ( ) det d ( ) .
M Mreg M
We now present the proofs of the integration formulas of Section 4.1.1. We prove
each by applying Theorem 4.1.28 to a suitable Weyl quadruple.
We begin with the Gaussian/Hermite ensembles.
Proof of Proposition 4.1.1 Let (G, H, M, ) be the Weyl quadruple defined in
Proposition 4.1.23. As in the proof of Lemma 4.1.17 above, and for a similar
purpose, we use the notation ei j , i, j, k. By Lemma 4.1.15 we know that TIn (G)
Matn (F) is the space of anti-self-adjoint matrices, and it is clear that TIn (H)
TIn (G) is the subspace consisting of diagonal anti-self-adjoint matrices. Thus the
set
8 9
uei j u e ji u {1, i, j, k} F, 1 i < j n
is an orthogonal basis for TIn (G) TIn (H) . By formula (4.1.29), we have
and hence
/
det diag(x) = |(x)| for x Rn .
To finish the bookkeeping, note that the map x diag(x) sends Rn isometrically to
and hence pushes Lebesgue measure on Rn forward to . Then the integration
formula (4.1.1) follows from Theorem 4.1.28 combined with formula (4.1.19) for
[G]/ [H].
Let be the subset consisting of elements for which the corresponding real
diagonal matrix x has nonzero diagonal entries with distinct absolute values. Then
(G, H, M, ) is a Weyl quadruple with ambient space Matn (F) for which the set
is generic and, furthermore,
[G] [U p (F)] [Uq (F)]
= p . (4.1.33)
[H] 2 p!(2( 1)/2 [U1 (F)]) p [Uqp (F)]
We remark that in the case p = q we are abusing notation slightly. For p = q one
should ignore V in the definition of H, and similarly modify the other definitions
and formulas.
Proof Of the conditions imposed by Definition 4.1.22, only conditions (Ic), (IIa)
and (IIIc) deserve comment. As in the proof of Proposition 4.1.23 one can verify
(Ic) by means of Lemma 4.1.15. Conditions (IIa) and (IIIc) follow from Corollary
E.13 concerning the singular value decomposition in Mat pq (F), and specifically
follow from points (i) and (iii) of that corollary, respectively. Thus (G, H, M, ) is
a Weyl quadruple for which is generic.
Turning to the proof of (4.1.33), note that the group G is isometric to the product
U p (F) Uq (F). Thus the numerator on the right side of (4.1.33) is justified. The
map x diag(x, x) from U1 (F) to 2U (F) magnifies by a factor of 2. Abusing
notation, we denote its image by 2U1 (F). The group H is the disjoint union of
2 p p! isometric copies of the manifold ( 2U1 (F)) p Uqp (F). This justifies the
denominator on the right side of (4.1.33), and completes the proof.
TIn (G) TIn (H) may be described as the set of matrices of the form
a a+b 0 0
b := 0 ab c
c 0 c 0
where a, b Mat p (F) are anti-self-adjoint with a vanishing identically on the di-
agonal, and c Mat pq (F). Given (real) diagonal x Mat p , we also put
0 x 0
(x) := x 0 0 ,
0 0 0qp
Integration formula (4.1.3) now follows from Theorem 4.1.28 combined with for-
mula (4.1.33) for [G]/ [H].
We turn next to the Jacobi ensembles. The next proposition provides the needed
Weyl quadruples.
where:
(Unlike in the proofs of Propositions 4.1.1 and 4.1.3, the orthogonal projection
is used nontrivially.) We find that
/ p p
det (diag(x),diag(y)) = |(x)| (4xi (1 xi ))( 1)/2 (xir (1 xi )s ) /2
i=1 i=1
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 211
for x, y R p such that xi (1 xi ) = y2i (and hence xi [0, 1]) for i = 1, . . . , p. The
calculation of the determinant is straightforward once it is noted that the identity
The next five propositions supply the Weyl quadruples needed to prove Proposi-
tion 4.1.6. All the propositions have similar proofs, with the last two proofs being
the hardest. We therefore supply only the last two proofs.
Proof of Proposition 4.1.37 Only conditions (IIa) and (IIIc) require proof. The
other parts of the proposition, including formula (4.1.38), are easy to check.
To verify condition (IIa), fix m M arbitrarily. After conjugating m by some
element of G, we may assume by Theorem E.11 that m is block-diagonal with R-
standard blocks on the diagonal. Now the only orthogonal R-standard blocks are
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 213
1 Mat1 and R( ) Mat2 for 0 < < . Since we assume det m = 1, there are
even numbers of 1s and 1s along the diagonal of m, and hence after conjugating
m by a suitable permutation matrix, we have m as required. Thus condition
(IIa) is proved.
To verify condition (IIIc), we fix , g G and such that g g1 = ,
with the goal to show that g H. After conjugating by a suitably chosen element
of Wn+ , we may assume that the angles 1 , . . . , describing , as in the definition
of , satisfy 0 < 1 < < < . By another application of Theorem E.11,
after replacing g by wg for suitably chosen w Wn+ , we may assume that = .
Then g commutes with , which is possible only if g . Thus condition (IIIc)
is proved, and the proposition is proved.
Proof of Proposition 4.1.6 It remains only to calculate det for each of the
five types of Weyl quadruples defined above in order to complete the proofs of
(4.1.6), (4.1.7), (4.1.8) and (4.1.9), for then we obtain each formula by invoking
Theorem 4.1.28, combined with the formulas (4.1.35), (4.1.36), (4.1.37), (4.1.38)
and (4.1.39), respectively, for the ratio [H]/ []. Note that the last two Weyl
quadruples are needed to handle the two terms on the right side of (4.1.9), respec-
tively.
All the calculations are similar. Those connected with the proof of (4.1.9) are
the hardest, and may serve to explain all the other calculations. In the follow-
ing, we denote the Weyl quadruples defined in Propositions 4.1.37 and 4.1.38 by
(G, H + , M + , + ) and (G, H , M , ), respectively. We treat each quadruple in
a separate paragraph below.
To prepare for the calculation it is convenient to introduce two special functions.
Given real numbers and , let D( , ) be the square-root of the absolute value
of the determinant of the R-linear operator
on Mat2 (R), and let C( ) be the square-root of the absolute value of the determi-
214 4. S OME GENERALITIES
which proves (4.1.9) for all functions supported on M . (The last factor of 2 is
accounted for by the fact that for Z Mat2 real antisymmetric, [ , [ , Z]] = 4Z.)
This completes the proof of (4.1.9).
All the remaining details needed to complete the proof of Proposition 4.1.6,
being similar, we omit.
Exercise 4.1.39
Let G = Un (C) and let H G be the subgroup consisting of monomial ele-
ments. Let M Matn (C) be the set consisting of normal matrices with distinct
eigenvalues, and let M be the subset consisting of diagonal elements. Show
that (G, H, M, ) is a Weyl quadruple. Show that det = 1i< jn |i j |2
for all = diag(1 , . . . , n ) .
Note that the event in (4.2.1) is measurable due to the fact that is Polish. One
may think about also in terms of configurations. Let X denote the space of
locally finite configurations in , and let X = denote the space of locally finite
configurations with no repetitions. More precisely, for xi , i I an interval
of positive integers (beginning at 1 if nonempty), with I finite or countable, let
[xi ] denote the equivalence class of all sequences {x (i) }iI , where runs over all
permutations (finite or countable) of I. Then, set
X = X () = {x = [xi ]i=1 , where xi , , and
|xK | := {i : xi K} < for all compact K }
and
X = = {x X : xi = x j for i = j} .
We endow X and X = with the -algebra CX generated by the cylinder sets
CnB = {x X : |xB | = n}, with B Borel with compact closure and n a nonnegative
integer. Since = i=1 i for some (possibly random) and random i , each
point process can be associated with a point in X (in X = if is simple). The
216 4. S OME GENERALITIES
With a slight abuse, we will therefore not distinguish between the point process
and the induced configuration x. In the sequel, we associate the law with the
point process , and write E for expectation with respect to this law.
We next note that if x is not simple, then one may construct a simple point pro-
cess x = {(xj , N j )}j=1 X ( ) on = N+ by letting denote the num-
ber of distinct entries in x, introducing a many-to-one mapping j(i) : {1, . . . , }
{1, . . . , } with N j = |{i : j(i) = j}| such that if j(i) = j(i ) then xi = xi , and then
setting xj = xi if j(i) = j. In view of this observation, we only consider in the se-
quel simple point processes.
Definition 4.2.3 Let be a simple point process. Assume locally integrable func-
tions k : k [0, ), k 1, exist such that for any mutually disjoint family of
subsets D1 , , Dk of ,
k
E [ (Di )] = k (x1 , , xk )d (x1 ) d (xk ) .
i=1 ki=1 Di
Then the functions k are called the joint intensities (or correlation functions) of
the point process with respect to .
The term correlation functions is standard in the physics literature, while joint
intensities is more commonly used in the mathematical literature.
The joint intensities, if they exist, allow one to consider overlapping sets, as well.
In what follows, for a configuration x X = , and k integer, we let xk denote
4.2 D ETERMINANTAL POINT PROCESSES 217
Proof of Lemma 4.2.5 Note first that, for any compact Q , there exists an
increasing sequence of partitions {Qni }ni=1 of Q such that, for any x Q,
. .
Qni = {x} .
n i:xQni
Thus
E (Mkn ) = (Q1 Qk )B
k (x1 , . . . , xk )d (x1 ) . . . d (xk ) . (4.2.4)
(Q1 ,...,Qk )Qnk
Note that Mkn increases monotonically in n to |xk B|. On the other hand, since x
is simple, and by our convention concerning the intensities k , see Remark 4.2.4,
lim sup
n (Q ,...,Q )(Q 1 )k \Q k (Q1 Qk )B
k (x1 , . . . , xk )d (x1 ) . . . d (xk ) = 0 .
1 k n n
The conclusion follows from these facts, the fact that X is a Radon measure and
(4.2.4).
(b) Equation (4.2.3) follows from (4.2.2) through the choice B = Dki
i .
Remark 4.2.6 Note that a system of nonnegative, measurable and symmetric func-
tions {r : r [0, ]}
r=1 is a system of joint intensities for a simple point process
218 4. S OME GENERALITIES
that consists of exactly n points almost surely, if and only if r = 0 for r > n, 1 /n
is a probability density function, and the family is consistent, that is, for 1 < r n,
r (x1 , . . . , xr )d (xr ) = (n r + 1)r1 (x1 , . . . , xr1 ) .
As we have seen, for a simple point process, the joint intensities give information
concerning the number of points in disjoint sets. Let now Di be given disjoint
compact sets, with D = Li=1 Di be such that E(z (D) ) < for z in a neighborhood
of 1. Consider the Taylor expansion, valid for z in a neighborhood of 1,
L
(D )
L
(Di )! L
z = 1+ ( (Di ) ni )!ni ! (zi 1)ni (4.2.5)
=1 n=1 ni (Di ) i=1 i=1
ni #L n
L
( (Di )( (Di ) 1) ( (Di ) ni + 1)) L
= 1+ (zi 1)ni ,
n=1 ni #L n i=1 ni ! i=1
where
L
{ni #L n} = {(n1 , . . . , nL ) NL+ : ni = n} .
i=1
Then one sees that, under these conditions, the factorial moments in (4.2.3) deter-
mine the characteristic function of the collection { (Di )}Li=1 . A more direct way
to capture the distribution of the point process is via its Janossy densities, that
we define next.
The following easy consequences of the definition are proved in the same way that
Lemma 4.2.5 was proved.
Lemma 4.2.8 For any compact D , if the Janossy densities jD,k , k 1 exist
then
1
P( (D) = k) = jD,k (x1 , . . . , xk ) (dxi ) , (4.2.7)
k! Dk i
4.2 D ETERMINANTAL POINT PROCESSES 219
and, for any mutually disjoint measurable sets Di D, i = 1, . . . , k and any integer
r 0,
P( (D) = k + r, (Di ) = 1, i = 1, . . . , k)
1
= jD,k+r (x1 , . . . , xk+r ) (dxi ) . (4.2.8)
r! ki=1 Di Dr i
In view of (4.2.8) (with r = 0), one can naturally view the collection of Janossy
densities as a distribution on the space k
k=0 D .
Janossy densities and joint intensities are (at least locally, i.e. restricted to a
compact set D) equivalent descriptions of the point process , as the following
proposition states.
where
r
jD,k+r (x1 , . . . , xk , D, . . . , D) = jD,k+r (x1 , . . . , xk , y1 , . . . , yr ) (dyi ) .
Dr i=1
Then the Janossy densities jD,k exist for all k and satisfy
(1)r k+r (x1 , . . . , xk , D, . . . , D)
jD,k (x1 , . . . , xk ) = r!
, (4.2.12)
r=0
where
r
k+r (x1 , . . . , xk , D, . . . , D) = k+r (x1 , . . . , xk , y1 , . . . , yr ) (dyi ) .
Dr i=1
220 4. S OME GENERALITIES
The proof follows the same procedure as in Lemma 4.2.5: partition and use
dominated convergence together with the integrability conditions and the fact that
is assumed simple. We omit further details. We note in passing that under a
slightly stronger assumption of the existence of exponential moments, part (b) of
the proposition follows from (4.2.5) and part (b) of Lemma 4.2.5.
Exercise 4.2.10 Show that, for the standard Poisson process of rate > 0 on
= R with taken as the Lebesgue measure, one has, for any compact D R
with Lebesgue measure |D|,
The following standard result, which we quote from [Sim05b, Theorem 2.12]
without proof, gives sufficient conditions for a (positive definite) kernel to be ad-
missible.
By standard results, see e.g. [Sim05b, Theorem 1.4], an integral compact operator
K with admissible kernel K possesses the decomposition
n
K f (x) = k k (x)k , f L2 ( ) , (4.2.17)
k=1
We will later see (see Corollary 4.2.21) that if the kernel K in definition 4.2.11 of
a determinantal process is (locally) admissible, then it must in fact be good.
The following example is our main motivation for discussing determinantal
point processes.
We state next the following extension of Lemma 4.2.5. (Recall, see Definition
3.4.3, that (G) denotes the Fredholm determinant of a kernel G.)
Proof of Lemma 4.2.16 By our assumptions, the right side of (4.2.19) is well
defined for any choice of (z )L=1 CL as a Fredholm determinant (see Definition
3.4.3), and
L
1D (1 z )K1D 1
=1
@n
L
1
= det (z 1)K(xi , x j )1D (x j ) (dx1 ) (dxL )
n=1 n! D D =1 i, j=1
L n
1
= n! (zk 1) (4.2.20)
n=1 1 ,...,n =1 k=1
1 2n
det 1D (xi )K(xi , x j )1D j (x j ) (dx1 ) (dxL ) .
i, j=1
On the other hand, recall the Taylor expansion (4.2.5). Using (4.2.3) we see that
the -expectation of each term in the last power series equals the corresponding
term in the power series in (4.2.20), which represents an entire function. Hence,
by monotone convergence, (4.2.19) follows.
A natural question is now whether, given a good kernel K, one may construct
an associated determinantal point process. We will answer this question in the
4.2 D ETERMINANTAL POINT PROCESSES 223
Proof By assumption, n < in (4.2.18). The matrix {K(xi , x j )}ki, j=1 has rank at
most n for all k. Hence, by (4.2.3), () n, almost surely. On the other hand,
n
E ( ()) = 1 (x)d (x) = K(x, x)d (x) = |i (x)|2 d (x) = n .
i=1
A simple proof of Proposition 4.2.19 can be obtained by noting that the function
detni, j=1 K(xi , x j )/n! is nonnegative, integrates to 1, and by a computation similar
to Lemma 3.2.2, see in particular (3.2.10), its kth marginal is (n k)! detki, j=1
K(xi , x j )/n!. We present an alternative proof that has the advantage of providing
an explicit construction of the resulting determinantal point process.
Proof For a finite-dimensional subspace H of L2 ( ) of dimension d, let KH
denote the projection operator into H and let KH denote an associated kernel. That
is, KH (x, y) = dk=1 k (x)k (y) for some orthonormal family {k }dk=1 in H. For
x , set kxH () = KH (x, ). (Formally, kxH = KH x , in the sense of distributions.)
The function kxH () L2 ( ) does not depend on the choice of basis {k }, for
almost every x: indeed, if {k } is another orthonormal basis in H, then there exist
complex coefficients {ai, j }ki, j=1 such that
d d
k = ak, j j , ak, j ak, j = j, j .
j=1 j=1
If j = 0, stop.
Pick a point Z j distributed according to H j / j.
H
Let H j1 be the orthocomplement to the function kZ jj in H j .
Decrease j by one and iterate.
Hence the density of the random vector (Z1 , . . . , Zn ) with respect to n equals
H n KH kH 2
n kx j j 2 j xj
p(x1 , . . . , xn ) = = .
j=1 j j=1 j
equals the volume of the parallelepiped determined by the vectors kxH1 , . . . , kxHn in
the finite-dimensional subspace H L2 ( ). Since kxHi (x)kxHj (x) (dx) = K(xi , x j ),
it follows that V 2 = det(K(xi , x j ))ni, j=1 . Hence
1
p(x1 , . . . , xn ) = det(K(xi , x j ))ni, j=1 .
n!
Thus, the random variables Z1 , . . . , Zn are exchangeable, almost surely distinct,
and the n-point intensity of the point process x equals n!p(x1 , . . . , xn ). In partic-
ular, integrating and applying the same argument as in (3.2.10), all k-point inten-
sities have the determinantal form for k n. Together with Lemma 4.2.18, this
completes the proof.
The statement in the proposition can be interpreted as stating that the mixture of
determinental processes I has the same distribution as .
Proof Assume first n is finite. We need to show that for all m n, the m-point
joint intensities of and I are the same, that is
m m
det (K(xi , x j )) = E[ det (KI (xi , x j ))] .
i, j=1 i, j=1
= lim
N
det(C{1,..,m}{1 , ,m } ) det(B{1 , ,m }{1,..,m} )
11 <<m N
m m
= lim det (KN (xi , x j )) = det (K(xi , x j )) , (4.2.24)
N i, j=1 i, j=1
Thus, 1 is determinantal with kernel K1 = (1/1 )K. Since had finitely many
points almost surely (recall that K was assumed trace-class), it follows that
P(1 () = 0) > 0. But, the process 1 can be constructed by the procedure
of Proposition 4.2.20, and since the top eigenvalue of K1 equals 1, we obtain
P(1 () 1) = 1, a contradiction.
Corollary 4.2.22 Let K be a locally admissible kernel on , such that for any
compact D , the nonzero eigenvalues of KD belong to (0, 1]. Then K uniquely
determines a determinantal point process on .
4.2 D ETERMINANTAL POINT PROCESSES 227
Proof Repeat the argument in the proof of the necessity part of Corollary 4.2.21.
We begin with the following immediate corollary of Proposition 4.2.20 and Lemma
4.2.18. Throughout, for a good kernel K and a set D , we write KD (x, y) =
1D (x)K(x, y)1D (y) for the restriction of K to D.
Corollary 4.2.24 Let K be a good kernel, and let D be such that KD is trace-
class, with eigenvalues k , k 1. Then (D) has the same distribution as k k
where k are independent Bernoulli random variables with P(k = 1) = k and
P(k = 0) = 1 k .
The above representation immediately leads to a central limit theorem for oc-
cupation measures.
n (Dn ) E [n (Dn )]
Zn =
Var(n (Dn ))
Proof We write Kn for the kernel (Kn )Dn and set Sn = Var(n (Dn )). By
Corollary 4.2.24, n (Dn ) has the same distribution as the sum of independent
Bernoulli variables kn , whose parameters kn are the eigenvalues of Kn . In partic-
228 4. S OME GENERALITIES
k
k k
= + log(1 + kn (e /Sn 1))
Sn k
2 k kn (1 kn ) k kn (1 kn )
= + o( ),
2Sn2 Sn3
uniformly for in compacts. Since k kn /Sn3 n 0, the conclusion follows.
(n)
We also note that from (4.2.3) (with r = 1 and k = 2, and k denoting the inten-
sity functions corresponding to the kernel Kn from Theorem 4.2.25), we get
Var(n (Dn )) = Kn (x, x)d n (x) Kn2 (x, y)d n (x)d n (y) . (4.2.26)
Dn Dn Dn
Exercise 4.2.26 Using (4.2.26), provide an alternative proof that a necessary con-
dition for Var(n (Dn )) is that (4.2.25) holds.
The GUE
Corollary 4.2.27 Let D = [a, b] with a, b > 0, (1/2, 1/2), and set DN =
N D. Then
N (DN ) E[N (DN )]
ZN =
Var(N (DN ))
converges in distribution towards a standard normal variable.
Proof In view of Example 4.2.15 and Theorem 4.2.25, the only thing we need to
check is that Var(N (DN )) as N . Recalling that
2
K (N) (x, y) dy = K (N) (x, x) ,
R
it follows from (4.2.26) that for any R > 0, and all N large,
2
Var(N (DN )) = K (N) (x, y) dxdy
DN (DN )c
2
1 x y
= K (N) ( , ) dxdy
NDN N(DN )c N N N
0 R
(N)
SbN (x, y)dxdy , (4.2.27)
R 0
where
(N) 1 x y
Sz (x, y) = K (N) z + , z +
N N N
(N)
is as in Exercise 3.7.5, and SbN (x, y) converges uniformly on compacts, as N
, to the sine-kernel sin(x y)/( (x y)). Therefore, there exists a constant c > 0
such that the right side of (4.2.27) is bounded below, for large N, by c log R. Since
R is arbitrary, the conclusion follows.
Exercise 4.2.28 Using Exercise 3.7.5 again, prove that if DN = [a N, b N]
with a, b (0, 2), then Corollary 4.2.27 still holds.
Exercise 4.2.29 Prove that the conclusions of Corollary 4.2.27 and Exercise 4.2.28
hold when the GUE is replaced by the GOE.
Hint: Write (N) (DN ) for the variable corresponding to N (DN ) in Corollary
4.2.27, with the GOE replacing the GUE. Let (N) (DN ) and (N+1) (DN ) be inde-
pendent.
(a) Use Theorem 2.5.17 to show that N (DN ) can be constructed on the same prob-
ability space as (N) (DN ), (N+1) (DN ) in such a way that, for any > 0, there is
a C so that
lim sup P(|N (DN ) ( (N) (DN ) + (N+1) (DN ))/2| > C ) < .
N
230 4. S OME GENERALITIES
Writing ksine (z) = Ksine (x, y)|z=xy , we see that ksine (z) is the Fourier transform of
the function 1[1/2 ,1/2 ] ( ). In particular, for any f L2 (R),
1/2
f , Ksine f = f (x) f (y)ksine (x y)dxdy = | f( )|2 d f 22 .
1/2
(4.2.28)
Thus, Ksine (x, y) is positive definite, and by Lemma 4.2.13, Ksine is locally admis-
sible. Further, (4.2.28) implies that all eigenvalues of restrictions of Ksine to any
compact interval belong to the interval [0, 1]. Hence, by Corollary 4.2.22, Ksine
determines a determinantal point process on R (which is translation invariant in
the terminology of Section 4.2.6 below).
is the contour in the -plane consisting of the ray joining e i/3 to the origin plus
the ray joining the origin to e i/3 , and the Airy kernel KAiry (x, y) = A(x, y) :=
(Ai(x) Ai (y) Ai (x) Ai(y))/(x y) . Take = R and the Lebesgue measure.
Fix L > and let KAiry L denote the operator on L2 ([L, )) determined by
KAiry
L
f (x) = KAiry (x, y) f (y)dy .
L
Proposition 4.2.30 For any L > , the kernel KAiry L (x, y) is locally admissible.
Further, all the eigenvalues of its restriction to compact sets belong to the interval
L
(0, 1]. In particular, KAiry determines a determinantal point process.
4.2 D ETERMINANTAL POINT PROCESSES 231
To complete the proof, as in the case of the sine process, we need an upper
bound on the eigenvalues of restrictions of KAiry to compact subsets of R. Toward
this end, deforming the contour of integration in the definition of Ai(x) to the
imaginary line, using integration by parts to control the contribution of the integral
outside a large disc in the complex plane, and applying Cauchys Theorem, we
obtain the representation, for x R,
R
1 3 /3+xs)
Ai(x) = lim ei(s ds ,
R 2 R
with the convergence uniform for x in compacts (from this, one can conclude
3
that Ai(x) is the Fourier transform, in the sense of distributions, of eis /3 / 2 , al-
though we will not use that). We now obtain, for continuous functions f supported
on [M, M] [L, ),
2 2
M
f , KAiry f = f (x) Ai(x + t)dx dt f (x) Ai(x + t)dx dt .
0 L M
(4.2.30)
But, for any fixed K > 0,
K M 2
f (x) Ai(x + t)dx dt
K M
K M 2
1 R i(s3 /3+ts) ixs
= elim e ds f (x)dx dt
K M R 2 R
1 K R i(s3 /3+ts) 2
= lim e f (s)ds dt ,
R 2 K R
232 4. S OME GENERALITIES
where f denotes the Fourier transform of f and we have used dominated conver-
gence (to pull the limit out) and Fubinis Theorem in the last equality. Therefore,
K M 2 K R 2
1
eits eis /3 f(s)ds dt
3
f (x) Ai(x + t)dx dt = lim
K M R K 2 R
2
1
eits eis /3 1[R,R] (s) f(s)ds dt
3
lim sup
R 2
2
it 3 /3 2
= lim sup e 1[R,R] (t) f(t)dt dt f (t) dt = f 22 ,
R
where we used Parsevals Theorem in the two last equalities. Using (4.2.30), we
thus obtain
f , KAiry f f 22 ,
first for all compactly supported continuous functions f and then for all f
L2 ([L, )) by approximation. An application of Corollary 4.2.22 completes the
proof.
In particular,
Vol(D)K(0) K 2 (x y)dxdy .
DD
By monotone convergence, it then follows by taking L that K 2 (x)dx
K(0) < . Further, again from (4.2.26),
Var( (D)) = Vol(D)(K(0) K(x)2 dx) + dx K 2 (x y)dy.
Rd D y:yD
Since Rd K(x)2 dx < , (4.2.31) follows from the last equality.
We emphasize that the RHS in (4.2.31) can vanish. In such a situation, a more
careful analysis of the limiting variance is needed. We refer to Exercise 4.2.40 for
an example of such a situation in the (important) case of the sine-kernel.
We turn next to the ergodic properties of determinantal processes. It is natural
to discuss these in the framework of the configuration space X . For t Rd , let T t
denote the shift operator, that is for any Borel set A Rd , T t A = {x + t : x A}.
We also write T t f (x) = f (x + t) for Borel functions. We can extend the shift to
act on X via the formula T t x = (xi + t)i=1 for x = (xi )i=1 . T t then extends to a
shift on CX in the obvious way. Note that one can alternatively also define T t
by the formula T t (A) = (T t A).
Proof Recall from Theorem 4.2.25 that K 2 (x)dx < . It is enough to check
that for arbitrary collections of compact Borel sets {Fi }Li=1 1
and {G j }Lj=1
2
such that
Fi Fi = 0/ and G j G j = 0/ for i = i , j = j , and with the notation Gtj = T t G j , it
holds that for any z = {zi }Li=1
1
CL1 , w = {w j }Lj=1
2
CL2 ,
L1 L2 L1 L2
(F ) (Gt ) (F ) (G )
E zi i wj j |t| E zi i E wj j . (4.2.32)
i=1 j=1 i=1 j=1
234 4. S OME GENERALITIES
L1 L2
Define F = i=1 Fi , Gt = t
j=1 G j . Let
L1 L2
K1 = 1F (1 zi )K1Fi , K2t = 1Gt (1 w j )K1Gtj ,
i=1 j=1
L2 L1
t
K12 = 1F (1 w j )K1Gtj , t
K21 = 1Gt (1 zi )K1Fi .
j=1 i=1
By Lemma 4.2.16, the left side of (4.2.32) equals, for |t| large enough so that
F Gt = 0,
/
(K1 + K2t + K12
t t
+ K21 ). (4.2.33)
t
|t| 0, supx,y K21 |t| 0. Therefore, by
Note that, by assumption, supx,y K12 t
Next, note that for |t| large enough such that F Gt = 0, K1 K2t = 0 and hence,
by the definition of the Fredholm determinant,
where K2 := K20 and the last equality follows from the translation invariance of K.
Therefore, substituting in (4.2.33) and using (4.2.34), we get that the left side of
(4.2.32) equals (K1 )(K2 ). Using Lemma 4.2.16 again, we get (4.2.32).
exists and is strictly positive, and is called the intensity of the point process.
For stationary point processes, an alternative description can be obtained by
considering configurations conditioned to have a point at the origin. When spe-
cialized to one-dimensional stationary point processes, this point of view will be
used in Subsection 4.2.7 when relating statistical properties of the gap around zero
for determinantal processes to ergodic averages of spacings.
Definition 4.2.35 Let be a translation invariant point process, and let B denote a
Borel subset of Rd of positive and finite Lebesgue measure. The Palm distribution
Q associated with is the measure on M+ (Rd ) determined by the equation, valid
4.2 D ETERMINANTAL POINT PROCESSES 235
We then have:
Lemma 4.2.36 The Palm distribution Q does not depend on the choice of the Borel
set B.
Proof We first note that, due to the stationarity, E( (B)) = c (B) with the
Lebesgue measure, for some constant c. (It is referred to as the intensity of , and
for determinantal translation invariant point processes, it coincides with the pre-
viously defined notion of intensity, see (4.2.35)). It is obvious from the definition
that the random measure
A (B) := 1A (T s ) (ds)
B
Due to Lemma 4.2.36, we can speak of the point process 0 attached to the
Palm measure Q, which we refer to as the Palm process. Note that 0 is such
that Q( 0 ({0}) = 1) = 1, i.e. 0 is such that the associated configurations have
a point at zero. It turns out that this analogy goes deeper, and in fact the law Q
corresponds to conditioning on an atom at the origin. Let V 0 denote the Voronoi
cell associated with 0 , i.e., with B(a, r) denoting the Euclidean ball of radius r
around a,
V 0 = {t Rd : 0 (B(t, |t|)) = 0} .
Proof From the definition of 0 it follows that for any bounded measurable func-
tion g,
E g(T s ) (ds) = c (B)Eg( 0 ) . (4.2.37)
B
Since all configurations are countable, the set of ts in the indicator in the inner
integral on the right side of the last expression is contained in a countable collec-
tion of (d 1)-dimensional surfaces. In particular, its Lebesgue measure vanishes.
One thus concludes that
P(D) = 0 . (4.2.39)
Exercise 4.2.39 Assume that K satisfies the assumptions of Lemma 4.2.32, and
define the Fourier transform
K( ) = K(x) exp(2 ix )dx L2 (Rd ) .
xRd
Give a direct proof that the right side of (4.2.31) is nonnegative.
Hint: use the fact that, since K is a good kernel, it follows that K 1.
Exercise 4.2.40 [CoL95] Take d = 1 and check that the sine-kernel Ksine (x) =
sin(x)/ x is a good translation invariant kernel for which the right side of (4.2.31)
vanishes. Check that then, if a < b are fixed,
E[ (L[a, b])] = L(b a)/ ,
whereas
1
Var( (L[a, b])) =
log L + O(1).
2
Hint: (a) Apply Parsevals Theorem and the fact that the Fourier transform of the
function sin(x)/ x is the indicator over the interval [1/2 , 1/2 ] to conclude
2
that K (x)dx = 1/ = K(0).
(b) Note that, with D = L[a, b] and Dx = [La x, Lb x],
1 1 cos(2u)
dx K 2 (x y)dy = dx K 2 (u)du = dx du ,
D Dc D Dcx 2 D Dcx 2u2
from which the conclusion follows.
Exercise 4.2.41 Let |V 0 | denote the Lebesgue measure of the Voronoi cell for a
Palm process 0 corresponding to a stationary determinantal process on Rd with
intensity c. Prove that E(|V 0 |) = 1/c.
We restrict attention in the sequel to the case of most interest to us, namely to
dimension d = 1, in which case the results are particularly explicit. Indeed, when
238 4. S OME GENERALITIES
the point process translates then to stationarity for the Palm process increments,
as follows.
Lemma 4.2.42 Let x0 denote the Palm process associated with a determinantal
translation invariant point process x on R with good kernel K satisfying
K(|x|) |x| 0, and with intensity c > 0. Then the sequence y0 := {xi+1
0 x0 }
i iZ
is stationary and ergodic.
Proof Let T y0 = {y0i+1 }iZ denote the shift of y0 . Consider g a Borel function
on R2r for some r 1, and set g(y0 ) = g(y0r , . . . , y0r1 ). For any configuration x
with xi < xi+1 and x1 < 0 x0 , set y := {xi+1 xi }iZ . Set f (x) = g(xr+1
xr , . . . , xr xr1 ), and let Au = { : f (x) u}. Au is clearly measurable, and
by Definition 4.2.35 and Lemma 4.2.36, for any Borel B with positive and finite
Lebesgue measure,
P(g(y ) u) = Q(Au ) = E
0
1Au (T ) (ds) /c (B)
s
B
= E 1g(T i y)u /c (B) . (4.2.40)
i:xi B
(Note the different roles of the shifts T s , which is a spatial shift, and T i , which is
a shift on the index set, i.e. on Z.) Hence,
Taking B = Bn = [n, n] and then n , we obtain that the left side of the last
expression vanishes. This proves the stationarity. The ergodicity (and in fact,
mixing property) of the sequence y0 is proved similarly, starting from Theorem
4.2.34.
Proposition 4.2.43 gives an natural way to construct the point process starting
from 0 (whose increments form a stationary sequence): indeed, it implies that
is nothing but the size biased version of 0 , where the size biasing is obtained by
the value of x10 . More explicitly, let x denote a translation invariant determinantal
process with intensity c, and let x0 denote the associated Palm process on R.
Consider the sequence y0 introduced in Lemma 4.2.42, and denote its law by Qy .
Let y denote a sequence with law Qy satisfying d Qy /dQy (y) = cy0 , let x0 denote
the associated configuration, that is xi0 = i1
j=1 y j , noting that x0 = 0, and let U
denote a random variable distributed uniformly on [0, 1], independent of x0 . Set
0
x = T U x1 x0 . We then have
Corollary 4.2.44 has an important implication to averages. Let Bn = [0, n]. For
a bounded measurable function f and a point process x on R, let
xi Bn f (T xi x)
fn (x) = .
|{i : xi Bn }|
Proof The statement is immediate from the ergodic theorem and Lemma 4.2.42
for the functions fn (x0 ). Since, by Corollary 4.2.44, the law of T x1 x is absolutely
continuous with respect to that of x0 , the conclusion follows by an approximation
argument.
the Palm measure, and with Q1 defined by d Q1 /dQ1 (u) = cu, note that
P(Gx t) = P(x10 t) = Q1 (du) = c uQ1 (du) .
t t
that is, G(t) can be read off easily from the kernel K. Other quantities can be read
off G, as well. In particular, the following holds.
where the change of variables w = t/u was used in the last equality. Integrating
by parts, using V (w) = 1/w and U (w) = Q1 ([w, )), we get
G(t) = U (2t) 2t w1 Q1 (dw)
2t
= U (2t) 2ct Q1 (dw) = U (2t) 2ctQ1 ([2t, ))
2t
= c [w 2t]Q1 (dw) .
2t
Theorem 4.2.49 Let gt (x) = 1x>t , and define gNn,t = 1n ni=1 gt (yNi ) . Suppose further
that n = o(N) N is such that for any constant a > 0,
Then
gNn,t N EQ1 gt = Q1 (dw) , in probability . (4.2.46)
t
and
1 $an/ %
S(s, , n) = 1i 1, Nj =0, j=i+1,...,i+s/ .
n i=1
1 $an/ %
=
n i=1
(1Bi KN 1Bi ) (1B+ KN 1B+ ) ,
i i
i+s/ i+s/
where Bi = j=i+1 D j and B+
i = j=i D j , and we used Lemma 4.2.16 in the
last equality. Similarly,
1 $an/ %
ES(s, , n) =
n i=1
(1Bi K1Bi ) (1 +
Bi K1 +
Bi ) ,
4.2.9 Examples
N N N
N (dx1 , , dxN ) = det (i (x j )) det (i (x j )) d (xi ) . (4.2.51)
i, j=1 i, j=1 i=1
Lemma 4.2.50 Assume that all principal minors of G = (gi j ) are not zero. Then
the measure N of (4.2.51) defines a determinantal simple point process with N
points.
Proof The hypothesis implies that G admits a Gauss decomposition, that is, it
can be decomposed into the product of a lower triangular and an upper triangular
matrix, with nonzero diagonal entries. Thus there exist matrices L = (li j )Ni,j=1 and
U = (ui j )Ni,j=1 so that LGU = I. Setting
= U = L ,
i , j = i, j , (4.2.52)
and, further,
N N N
N (dx1 , , dxN ) = CN det (i (x j )) det (i (x j )) d (xi )
i, j=1 i, j=1 i=1
The proof of Lemma 4.2.50 is concluded by using (4.2.52) and computations sim-
ilar to Lemma 3.2.2 in order to verify the property in Remark 4.2.6.
Exercise 4.2.51 By using Remark 4.1.7, show that all joint distributions appear-
ing in Weyls formula for the unitary groups (Proposition 4.1.6) correspond to
determinantal processes.
4.2 D ETERMINANTAL POINT PROCESSES 245
Fix x = (x1 < < xN ) with xi 2Z. Let {Xxn }n0 = {(Xn1 , . . . , XnN )}n0 denote
N independent copies of {Xn }n0 , with initial positions (X01 , . . . , X0N ) = x. For
0
integer T , define the event AT = 0kT {Xk1 < Xk2 < < XkN }.
Lemma 4.2.52 (GesselViennot) With the previous notation, set y = (y1 < <
yN ) with yi 2Z. Then
N &
K2T (x, y) = P (Xx2T = y|A2T )
detNi,j=1 (K2T (xi , y j ))
= .
z1 <<zN detNi,j=1 (K2T (xi , z j )) d (z j )
Proof The proof is an illustration of the reflection principle. Let P2T (x, y),
x, y 2Z, denote the collection of Z-valued, nearest neighbor paths { ()}2T
=0
with (0) = x, (2T ) = y and | ( + 1) ()| = 1. Let
8 9
2T (x, y) = { i }Ni=1 : i P2T (xi , yi )
denote the collection of N nearest neighbor paths, with the ith path connecting xi
and yi . For any permutation SN , set y = {y (i) }Ni=1 . Then
N N
det (K2T (xi , y j )) =
i, j=1
( ) K2T ( i ) , (4.2.53)
SN { i }N
i=1 2T (x,y )
i=1
where
2T 2
K2T ( ) = K1 (x , (2))
i i i
K1 ( (k), (k + 1))
i i
K1 ( i (2T 1), y (i) ) .
k=2
N Cx,y
2T = {{ }i=1 2T (x, y) : { } { } = 0
i N i j
/ if i = j}
246 4. S OME GENERALITIES
denote the collection of disjoint nearest neighbor paths connecting x and y. Then
N
P (Xx2T = y, A2T ) = x,y
K2T ( i ) . (4.2.54)
{ i }i=1 N C2T i=1
N
Thus, to prove the lemma, it suffices to check that the total contribution in (4.2.53)
of the collection of paths not belonging to N Cx,y 2T vanishes. Toward this end,
the important observation is that because we assumed x, y 2Z, for any n 2t
and i, j N, any path 2T (xi , y j ) satisfies (n) 2Z + n. In particular, if
{ i }Ni=1 SN 2T (x, y ) and there is a time n 2T and integers i < j such
that i (n) j (n), then there actually is a time m n with i (m) = j (m).
Now, suppose that in a family { i }Ni=1 2T (x, y ), there are integers i < j so
that i (n) = j (n). Consider the path so that
j
(), k = i, > n
k () = i (), k = j, > n
k
(), otherwise.
with
N N N
CN (n, T, x, y) = det (Kn (xi , z j )) det (K2T n (zi , y j )) d (zi ) .
i, j=1 i, j=1 i=1
We note that, in the proof of Lemma 4.2.52, it was enough to consider only
the first time in which paths cross; the proof can therefore be adapted to cover
diffusion processes, as follows. Take = R, the Lebesgue measure, and con-
sider a time homogeneous, real valued diffusion process (Xt )t0 with transition
kernel Kt (x, y) which is jointly continuous in (x, y). Fix x = (x1 < < xN )
with xi R. Let {Xtx }t0 = {(Xt1 , . . . , XtN )}t0 denote N independent copies of
{Xt }t0 , with initial positions (X01 , . . . , X0N ) = x. For real T , define the event
0
AT = 0tT {Xt1 < Xt2 < < XtN }.
Exercise 4.2.55 Prove the analog of Corollary 4.2.53 in the setup of Lemma
4.2.54. Use the following steps.
(a) For t < T , construct the density qtN,T,x,y of Xtx conditioned on AT {XxT = y}
so as to satisfy, for any Borel sets A, B RN and t < T ,
N N
P (Xtx A, XxT B|AT ) =
A
dzi B dyi qtN,T,x,y (z)px (y|AT ) .
i=1 i=1
(b) Show that the collection of densities qtN,T,x,y determine a Markov semigroup
corresponding to a diffusion process, and
N N
qtN,T,x,y (z) = CN,T (t, x, y) det (Kt (xi , z j )) det (KT t (zi , y j ))
i, j=1 i, j=1
with
N N N
CN,T (t, x, y) = det (Kt (xi , z j )) det (KT t (zi , y j )) d (zi ) ,
i, j=1 i, j=1 i=1
Exercise 4.2.56 (a) Use Exercise 4.2.55 and the heat kernel
K1 (x, y) = (2 )1/2 e(xy)
2 /2
to conclude that the law of the (ordered) eigenvalues of the GOE coincides with
the law of N Brownian motions run for a unit of time and conditioned not to inter-
sect at positive times smaller than 1.
Hint: start the Brownian motion at locations 0 = x1 < x2 < < xN and then take
xN 0, keeping only the leading term in x and noting that it is a polynomial in y
that vanishes when (y) = 0.
(b) Using part (a) and Exercise 4.2.55, show that the law of the (ordered) eigen-
values of the GUE coincides with the law of N Brownian motions at time 1, run
for two units of time, and conditioned not to intersect at positive times less than 2,
while returning to 0 at time 2.
In this section we introduce yet another effective tool for the study of Gaussian
random matrices. The approach is based on the fact that a standard Gaussian
variable of mean 0 and variance 1 can be seen as the value, at time 1, of a standard
Brownian motion. (Recall that a Brownian motion Wt is a zero mean Gaussian
process of covariance E(Wt Ws ) = t s.) Thus, replacing the entries by Brownian
motions, one gets a matrix-valued random process, to which stochastic analysis
and the theory of martingales can be applied, leading to alternative derivations and
extensions of laws of large numbers, central limit theorems, and large deviations
for classes of Gaussian random matrices that generalize the Wigner ensemble of
Gaussian matrices. As discussed in the bibliographical notes, Section 4.6, some of
the later results, when specialized to fixed matrices, are currently only accessible
through stochastic calculus.
Our starting point is the introduction of the symmetric and Hermitian Brownian
motions; we leave the introduction of the symplectic Brownian motions to the
exercises.
We refer the reader to Appendix H, Definitions H.4 and H.3, for the notions of
strong and weak solutions.
250 4. S OME GENERALITIES
Note that, in Theorem 4.3.2, we do not assume that N (0) N . The fact that
N (t) N for all t > 0 is due to the natural repulsion of the eigenvalues. This
repulsion will be fundamental in the proof of the theorem.
It is not hard to guess the form of the stochastic differential equation for the
eigenvalues of X N, (t), simply by writing X N, (t) = (ON ) (t)(t)ON (t), with
(t) diagonal and (ON ) (t)ON (t) = IN . Differentiating formally (using Itos for-
mula) then allows one to write the equations (4.3.3) and appropriate stochastic dif-
ferential equations for ON (t). However, the resulting equations are singular, and
proceeding this way presents several technical difficulties. Instead, our derivation
of the evolution of the eigenvalues N (t) will be somewhat roundabout. We first
show, in Lemma 4.3.3, that the solution of (4.3.3), when started at N , exists, is
unique, and stays in N . Once this is accomplished, the proof that ( N (t))t0
solves this system will involve routine stochastic analysis.
Lemma 4.3.3 Let N (0) = (1N (0), . . . , NN (0)) N . For any 1, there exists
a unique strong solution ( N (t))t0 C(R+ , N ) to the stochastic differential
system (4.3.3) with initial condition N (0). Further, the weak solution to (4.3.3)
is unique.
1 N 2 1
f (x) = f (x1 , . . . , xN ) = xi N 2 log |xi x j | .
N i=1 i= j
2 N 1
+ 1 + N 2 ui,2 ( (t)) dt + dMN (t) , (4.3.8)
N i=1
N,R
252 4. S OME GENERALITIES
Similarly,
N
N(N 1)
ui,1 (x)xi = 2
.
i=1
Thus, for all 1, for all M < , since (M N (t TM ),t 0) is a martingale with
zero expectation,
P (M N : TM2 t) = 1,
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 253
The proof we present goes backward by proposing a way to construct the ma-
trix X N, (t) from the solution of (4.3.3) and a Brownian motion on the orthogonal
(resp. unitary) group. Its advantage with respect to a forward proof is that we
do not need to care about justifying that certain quantities defined from X N, are
semi-martingales to insure that Itos calculus applies.
We first prove the theorem in the case N (0) N . We begin by enlarging the
probability space by adding to the independent Brownian motions (Wi , 1 i N)
an independent collection of independent Brownian motions (wi j , 1 i < j
1
N), which are complex if = 2 (that is, wi j = 2 2 (w1i j + 1w2i j ) with two
independent real Brownian motions w1i j , w2i j ) and real if = 1. We continue to use
Ft to denote the enlarged sigma-algebra (wi j (s), 1 i < j N,Wi (s), 1 i
N, s t).
1 1
dRNij (t) = dwi j (t) , RNij (0) = 0. (4.3.9)
N i (t) jN (t)
N
We let RN (t) be the skew-Hermitian matrix (i.e. RN (t) = RN (t) ) with such en-
tries above the diagonal and null entries on the diagonal. Note that since N (t)
N for all t, the matrix-valued process RN (t) is well defined, and its entries are
semi-martingales.
254 4. S OME GENERALITIES
Recalling the notation for the bracket of semi-martingales, see (H.1), for A, B
two semi-martingales with values in MN , we denote by A, Bt the matrix
N
(A, Bt )i j = (AB)i j t = Aik , Bk j t , 1 i, j N.
k=1
Observe that for all t 0, A, Bt = B , A t . We set ON to be the (strong) solution
of
1
dON (t) = ON (t)dRN (t) ON (t)d(RN ) , RN t , ON (0) = IN . (4.3.10)
2
This solution exists and is unique since it is a linear equation in ON and RN is a
well defined semi-martingale. In fact, as the next lemma shows, ON (t) describes
a process in the space of unitary matrices (orthogonal if = 1).
Further, let D( N (t)) denote a diagonal matrix with D( N (t))ii = N (t)i and set
Y N (t) = ON (t)D( N (t))ON (t) . Then
P(t 0, Y N (t) HN ) = 1 ,
and the entries of the process (Y N (t))t0 are continuous martingales with respect
to the filtration F , with bracket
Proof We begin by showing that J N (t) := ON (t) ON (t) equals the identity IN for
all time t. Toward this end, we write a differential equation for K N (t) := J N (t)IN
based on the fact that the process (ON (t))t0 is the strong solution of (4.3.10). We
have
. .
d(ON ) , (ON )t i j = d d(RN ) (s)(ON ) (s), ON (s)dRN (s)t
0 0 ij
N . .
= d( 0
(dRN ) (s)(ON ) (s))ik , (
0
ON (s)dRN (s))k j t
k=1
N N
= ONkm (t)ONkn (t)dRNmi , RNnj t
m,n=1 k=1
N
= N
Jmn (t)dRNmi , RNnj t , (4.3.11)
m,n=1
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 255
where here and in the sequel we use 0 to denote an indefinite integral viewed as
a process. Therefore, setting A.B = AB + BA, we obtain
1
dK N (t) = J N (t)[dRN (t) d(RN ) , RN t ]
2
1
+[d(R ) (t) d(RN ) , RN t ]J N (t) + d(ON ) , ON t
N
2
1
= K (t).(dR (t) d(RN ) , RN t ) + drN (t) ,
N N
2
with drN (t)i j = Nm,n=1 Kmn
N (t)dRN , RN . For any deterministic M > 0 and
mi n j t
0 S T , set, with TM given by (4.3.7),
and note that E (M, S, T ) < for all M, S, T , and that it is nondecreasing in
S. From the BurkholderDavisGundy inequality (Theorem H.8), the equality
KN (0) = 0, and the fact that (RN (t TM ))tT has a uniformly (in T ) bounded mar-
tingale bracket, we deduce that there exists a constant C(M) < (independent of
S, T ) such that for all S T ,
S
E (M, S, T ) C(M)E (M,t, T )dt .
0
and we used in the last equality the independence of (wi j , 1 i < j N) and
(Wi , 1 i N) to assert that the martingale bracket of N and ON vanishes. Set-
256 4. S OME GENERALITIES
ting
dZ N (t) := ON (t) dY N (t)ON (t) , (4.3.13)
we obtain from the left multiplication by ON (t) and right multiplication by ON (t)
of (4.3.12) that
dZ N (t) = (ON ) (t)dON (t)D( N (t)) + D( N (t))dON (t) ON (t)
+dD( N (t)) + ON (t) dON D( N )(ON ) t ON (t) . (4.3.14)
We next compute the last term in the right side of (4.3.14). For all i, j {1, . . . , N}2 ,
we have
N
dON D( N )(ON ) t i j = kN (t)dONik , ONjk t
k=1
N
= kN (t)ONil (t)ONjm (t)dRNlk , RNmk t .
k,l,m=1
Therefore, identifying the terms on the diagonal in (4.3.14) and recalling that RN
vanishes on the diagonal, we find, substituting in (4.3.13), that
7
N 2
dZii (t) = dWi (t).
N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 257
and thus the a.s. continuity of the Brownian motions paths results in the a.s.
continuity of t N (t) for any given N. Letting 0 completes the proof of the
theorem.
Our next goal is to extend the statement of Lemma 4.3.3 to initial conditions
belonging to N . Namely, we have the following.
258 4. S OME GENERALITIES
Proposition 4.3.5 Let N (0) = (1N (0), . . . , NN (0)) N . For any 1, there
exists a unique strong solution ( N (t))t0 C(R+ , N ) to the stochastic differen-
tial system (4.3.3) with initial condition N (0). Further, for any t > 0, N (t) N
and N (t) is a continuous function of N (0).
Lemma 4.3.6 Let ( N (t))t0 and ( N (t))t0 be two strong solutions of (4.3.3)
starting, respectively, from N (0) N and N (0) N . Assume that iN (0) <
iN (0) for all i. Then,
Proof of Lemma 4.3.6 We note first that d(i iN (t) i iN (t)) = 0. In particular,
Next, for all i {1, . . . , N}, we have from (4.3.3) and the fact that N (t) N ,
N (t) N for all t that
1 (iN iN Nj + jN )(t)
d(iN iN )(t) =
N (iN (t) Nj (t))(iN (t) jN (t))
dt .
j: j=i
Proof of Proposition 4.3.5 Set N (0) = (1N (0), . . . , NN (0)) N and put for n
Z, iN,n (0) = iN (0) + ni . We have N,n (0) N and, further, if n > 0, iN,n (0) <
iN,n1 (0) < iN,n+1 (0) < iN,n (0). Hence, by Lemma 4.3.6, the corresponding
solutions to (4.3.3) satisfy almost surely and for all t > 0
iN,n (t) < iN,n1 (t) < iN,n+1 (t) < iN,n (t) .
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 259
Since
N N
( N,n (t) N,n (t)) = ( N,n (0) N,n (0)) (4.3.18)
i=1 i=1
goes to zero as n goes to infinity, we conclude that the sequences N,n and N,n
converge uniformly to a limit, which we denote by N . By construction, N
C(R+ , N ). Moreover, if we take any other sequence N,p (0) N converging
to N (0), the solution N,p to (4.3.3) also converges to N (as can be seen by
comparing N,p (0) with some N,n (0), N,n (0) for p large enough).
We next show that N is a solution of (4.3.3). Toward that end it is enough
to show that for all t > 0, N (t) N , since then if we start at any positive time
s we see that the solution of (4.3.3) starting from N (s) can be bounded above
and below by N,n and N,n for all large enough n, so that this solution must
coincide with the limit ( N (t),t s). So let us assume that there is t > 0 so that
N (s) N \N for all s t and obtain a contradiction. We let I be the largest
i {2, . . . , N} so that kN (s) < k+1
N (s) for k I but N (s) = N (s) for s t.
I1 I
Then, we find a constant C independent of n and n going to zero with n so that,
for n large enough,
|kN,n (s) k+1
N,n
(s)| C k I, |IN,n (s) I1
N,n
(s)| n .
Since N,n solves (4.3.3), we deduce that for s t
N,n N,n 2 I1 1 1
I1 (s) I1 (0) + W + (n C(N I))s.
N s N
N,n
This implies that I1 (s) goes to infinity as n goes to infinity, a.s. To obtain a
contradiction, we show that with CN (n,t) := N1 Ni=1 (iN,n (t))2 , we have
sup sup CN (n,t) < , a.s. (4.3.19)
n s[0,t]
With (4.3.19), we conclude that for all t > 0, N (t) N , and in particular it is
the claimed strong solution.
To see (4.3.19), note that since iN,n (s) iN,n (s) for any n n and all s by
Lemma 4.3.6, we have that
1 N N,n
|CN (n, s) CN (n , s)| = (i (s) iN,n (s))|(iN,n (s) + iN,n (s))|
N i=1
N N
1
(iN,n (s) iN,n (s)) N (|(iN,n (s)| + |iN,n (s)|)
i=1 i=1
N
( CN (n, s) + CN (n , s)) ( N,n (0) N,n (0)) ,
i=1
260 4. S OME GENERALITIES
where (4.3.18) and the CauchySchwarz inequality were used in the last inequal-
ity. It follows that
N
CN (n, s) CN (n , s) + ( N,n (0) N,n (0)) ,
i=1
and thus
N
sup sup CN (n, s) sup CN (n , s) + ( N,n (0) N,n (0)) .
nn s[0,t] s[0,t] i=1
Thus, to see (4.3.19), it is enough to bound almost surely sups[0,t] CN (n,t) for a
fixed n. From Itos Lemma (see Lemma 4.3.12 below for a generalization of this
particular computation),
2 2 N t N,n
CN (n,t) = DN (n,t) +
N N i=1 0
i (s)dWi (s)
E[ sup CN (n, s SR )2 ]
s[0,t]
t
2[DN (n,t)]2 + 2N 2 E[ sup CN (n, s SR )]du
0 s[0,u]
t
2[DN (n,t)]2 + N 2 t + N 2
E[ sup CN (n, s SR )2 ]du ,
0 s[0,u]
where the constant does not depend on R. Gronwalls Lemma then implies, with
EN (n,t) := 2[DN (n,t)]2 + N 2 t, that
t
2
E[ sup CN (n, s SR )2 ] EN (n,t) + e2N 1 (st) EN (n, s)ds .
s[0,t] 0
We can finally let R go to infinity and conclude that E[sups[0,t] CN (n, s)] is finite
and so sups[0,t] CN (n, s), and therefore supn sups[0,t] CN (n, s), are finite al-
most surely, completing the proof of (4.3.19).
N,
Exercise 4.3.7 Let H N,4 = Xi j be 2N2N complex Gaussian Wigner matrices
defined as the self-adjoint random matrices with entries
N, 4i=1 gikl ei N,4 1
Hkl = , 1 k < l N, Xkk = gkk e1 , 1 k N ,
4N 2N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 261
Show that with H N,4 as above, and X N,4 (0) a Hermitian matrix with eigenval-
ues (1N (0), . . . , 2N
N (0)) , the eigenvalues ( N (t), . . . , N (t)) of X N,4 (0) +
N 1 2N
N,4
H (t) satisfy the stochastic differential system
1 1 1
d iN (t) = dWi (t) +
2N N N (t) N (t) dt , i = 1, . . . , 2N . (4.3.20)
j=i i j
Exercise 4.3.8 [Bru91] Let V (t) be an NM matrix whose entries are independent
complex Brownian motions and let V (0) be an NM matrix with complex entries.
Let N (0) = ( N (0), . . . , NN (0)) N be the eigenvalues of V (0)V (0) . Show
that the law of the eigenvalues of X(t) = V (t)V (t) is the weak solution to
7
iN (t) M N + iN
d iN (t) = 2 dWi (t) + 2( + kN )dt ,
N N k=i i kN
(b) Show that if X0N = H N, (1), then the law of XtN is the same law for all t 0.
( )
Conclude that the law PN of the eigenvalues of Gaussian Wigner matrices is sta-
tionary for the process (4.3.21).
( )
(c) Deduce that PN is absolutely continuous with respect to the Lebesgue mea-
sure, with density
N
|xi x j | e xi /4 ,
2
1x1 xN
1i< jN i=1
as proved in Theorem 2.5.2. Hint: obtain a partial differential equation for the
invariant measure of (4.3.21) and solve it.
262 4. S OME GENERALITIES
where (iN (t))t0 is a solution of (4.3.3) for 1 (see Proposition 4.3.10). Spe-
cializing to = 1 or = 2, we will then deduce in Corollary 4.3.11 a dynamical
proof of Wigners Theorem, Theorem 2.1.1, which, while restricted to Gaussian
entries, generalizes the latter theorem in the sense that it allows one to consider
the sum of a Wigner matrix with an arbitrary, N-dependent Hermitian matrix,
provided the latter has a converging empirical distribution. The limit law is then
described as the law at time one of the solution to a complex Burgers equation, a
definition which introduces already the concept of free convolution (with respect
to the semicircle law) that we shall develop in Section 5.3.3. In Exercise 4.3.18,
Wigners Theorem is recovered from its dynamical version.
We recall that, for T > 0, we denote by C([0, T ], M1 (R)) the space of contin-
uous processes from [0, T ] into M1 (R) (the space of probability measures on R,
equipped with its weak topology). We now prove the convergence of the empirical
measure LN (), viewed as an element of C([0, T ], M1 (R)).
1 N
C0 := sup log(iN (0)2 + 1) < ,
N0 N i=1
(4.3.23)
and the empirical measure LN (0) = N1 Ni=1 N (0) converges weakly as N goes to
k
infinity towards a M1 (R).
Let N (t) = (1N (t), . . . , NN (t))t0 be the solution of (4.3.3) with initial con-
dition N (0), and set LN (t) as in (4.3.22). Then, for any fixed time T < ,
(LN (t))t[0,T ] converges almost surely in C([0, T ], M1 (R)). Its limit is the unique
measure-valued process (t )t[0,T ] so that 0 = and the function
Gt (z) = (z x)1 d t (x) (4.3.24)
for z C\R .
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 263
Proof of Proposition 4.3.10 We begin by showing that the sequence (LN (t))t[0,T ]
is almost surely pre-compact in C([0, T ], M1 (R)) and then show that it has a unique
limit point characterized by (4.3.25). The key step of our approach is the follow-
ing direct application of Itos Lemma, Theorem H.9, to the stochastic differential
system (4.3.3), whose elementary proof we omit.
Lemma 4.3.12 Under the assumptions of Proposition 4.3.10, for all T > 0, all
f C2 ([0, T ]R, R) and all t [0, T ],
t
f (t, ), LN (t) = f (0, ), LN (0) + s f (s, ), LN (s)ds (4.3.26)
0
1 t x f (s, x) y f (s, y)
+ dLN (s)(x)dLN (s)(y)ds
2 0 xy
t
2 1
+ ( 1) 2 f (s, ), LN (s)ds + M Nf (t) ,
2N 0 x
sequence (LN (t))t[0,T ] is a pre-compact family in C([0, T ], M1 (R)) for all T < .
Toward this end, we first describe a family of compact sets of C([0, T ], M1 (R)).
Proof of Lemma 4.3.13 The space C([0, T ], M1 (R)) being Polish, it is enough to
prove that the set K is sequentially compact and closed. Toward this end, let
( n )n0 be a sequence in K . Then, for all i N, the functions ttn ( fi ) be-
long to the compact sets Ci and hence we can find a subsequence i (n) n
(n)
such that the sequence of bounded continuous functions tt i ( fi ) converges
in C[0, T ]. By a diagonalization procedure, we can find an i independent subse-
(n)
quence (n) n such that for all i N, the functions tt ( fi ) converge
towards some function tt ( fi ) C[0, T ]. Because ( fi )i0 is convergence deter-
mining in K M1 (R), it follows that one may extract a further subsequence, still
denoted (n), such that for a fixed dense countable subset of [0, T ], the limit t
belongs to M1 . The continuity of tt ( fi ) then shows that t M1 (R) for all t,
which completes the proof that ( n )n0 is sequentially compact. Since K is an
intersection of closed sets, it is closed. Thus, K is compact, as claimed.
We next prove the pre-compactness of the sequence (LN (t),t [0, T ]).
Proof We begin with a couple of auxiliary estimates. Note that from Lemma
4.3.12, for any function f that is twice continuously differentiable,
f (x) f (y)
dLN (s)(x)dLN (s)(y)
xy
1
= f ( x + (1 )y)d dLN (s)(x)dLN (s)(y) . (4.3.28)
0
Apply Lemma 4.3.12 with the function f (x) = log(1 + x2 ), which is twice contin-
uously differentiable with second derivative uniformly bounded by 2, to deduce
that
1
sup | f , LN (t)| | f , LN (0)| + T (1 + ) + sup |M Nf (t)| (4.3.29)
tT N tT
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 265
which, together with (4.3.29), proves that there exists a = a(T ) < so that, for
M > T +C0 + 1,
a
P sup log(x2 + 1), LN (t) M . (4.3.31)
t[0,T ] (M T C 0 1) N
2 2
Indeed, apply Lemma 4.3.12 with f (x,t) = f (x). Using (4.3.28), one deduces that
for all t s,
| f , LN (t) f , LN (s)| || f || |s t| + |M Nf (t) M Nf (s)| , (4.3.33)
where M Nf (t) is a martingale with bracket 2 1 N 2 0t ( f )2 , LN (u)du. Now,
cutting [0, T ] to intervals of length we get, with J := [T 1 ],
N
P sup M f (t) M Nf (s) (M 1) 1/8
|ts|
t,sT
J+1 N
P sup M (t) M N (k ) (M 1) 1/8 /3
f f
k=1 k t(k+1)
J+1
34 N
1/2 (M 1)4 E sup M (t) M N (k ) 4
f f
k=1 k t(k+1)
1
4 34 2 2 a 2
(J + 1)|| f ||2 =: 2 f 2 ,
N (M 1)
2 4 1/2 4 N (M 1)4
where again we used in the second inequality Chebyshevs inequality, and in the
last the BurkholderDavisGundy inequality (Theorem H.8) with m = 2. Com-
bining this inequality with (4.3.33) completes the proof of (4.3.32).
266 4. S OME GENERALITIES
Then, by (4.3.32),
a 4
P (LN CT ( f , )c ) . (4.3.35)
N4
Combining (4.3.34) and (4.3.35), we get from the BorelCantelli Lemma that
! .
P {LN K } = 1.
N0 0 NN0
the equation
t
f (t, x)d t (x) = f (0, x)d 0 (x) + s f (s, x)d s (x)ds
0
t
1 x f (s, x) x f (s, y)
+ d s (x)d s (y)ds . (4.3.37)
2 0 xy
Taking f (x) = (z x)1 for some z C\R, we deduce that the function Gt (z) =
(z x)1 d t (x) satisfies (4.3.24), (4.3.25). Note also that since the limit t is a
probability measure on the real line, Gt (z) is analytic in z for z C+ .
To conclude the proof of Proposition 4.3.10, we show below in Lemma 4.3.15
that (4.3.24), (4.3.25) possess a unique solution analytic on z C+ := {z C :
(z) > 0}. Since we know a priori that the support of any limit point t lives in
R for all t, this uniqueness implies the uniqueness of the Stieltjes transform of t
for all t and hence, by Theorem 2.4.3, the uniqueness of t for all t, completing
the proof of Proposition 4.3.10.
Proof We first note that since |G0 (z)| 1/|z|, (z + tG0 (z)) z t/z is
positive for t < (z)2 and z > 0. Thus, t ,t t for t < (t t )2 /(1 + t2 ).
Moreover, |G0 (z)| 1/2|z| from which we see that, for all t 0, the image of
t ,t by z + tG0 (z) is contained in some t ,t provided t is large enough. Note
that we can choose the t ,t and t ,t decreasing in time.
We next use the method of characteristics. Fix G. a solution of (4.3.24), (4.3.25).
We associate with z C+ the solution {zt ,t 0} of the equation
t zt = Gt (zt ) , z0 = z. (4.3.38)
We can construct a solution z. to this equation up to time (z)2 /4 with zt z/2
as follows. We put for > 0,
z x
Gt (z) := d t (x), t zt = Gt (zt ) , z0 = z.
|z x|2 +
z. exists and is unique since Gt is uniformly Lipschitz. Moreover,
t (zt ) 1 1
= d t (x) [ , 0],
(zt ) |zt x| +
2 |(zt )|2
268 4. S OME GENERALITIES
With a view toward later applications in Subsection 4.3.3 to the proof of central
limit theorems, we extend the previous results to polynomial test functions.
With the same notation and assumptions as in Proposition 4.3.10, for any T < ,
for any polynomial function q, the process (q, LN (t))t[0,T ] converges almost
surely and in all L p , towards the process (t (q))t[0,T ] , that is,
lim sup sup |q, LN (t) q, t | = 0 a.s.
N t[0,T ]
A key ingredient in the proof is the following control of the moments of N (t) :=
max1iN |iN (t)| = max(NN (t), 1N (t)).
Lemma 4.3.17 Let 1 and N (0) N . Then there exist finite constants
= ( ) > 0,C = C( ), and for all t 0 a random variable N (t) with law
independent of t, such that
P(N (t) x +C) e Nx
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 269
and, further, the unique strong solution of (4.3.3) satisfies, for all t 0,
N (t) N (0) + t N (t) . (4.3.39)
We note that for = 1, 2, 4, this result can be deduced from the study of the
maximal eigenvalue of X N, (0) + H N, (t), since the spectral radius of H N, (t)
has the same law as the spectral radius of tH N, (1), that can be controlled as in
Section 2.1.6. The proof we give below is based on stochastic analysis, and works
for all 1. It is based on the comparison between strong solutions of (4.3.3)
presented in Lemma 4.3.6.
Proof of Lemma 4.3.17 Our approach is to construct a stationary process N (t) =
(1N (t), . . . , NN (t)) N , t 0, with marginal distribution P(N ) := PNx2 /4, as in
(2.6.1), such that, with N (t) = max(NN (t), 1N (t)), the bound (4.3.39) holds.
We first construct this process (roughly corresponding to the process of eigenval-
ues of H N, (t)/ t if = 1, 2, 4) and then prove (4.3.39) by comparing solutions
to (4.3.3) started from different initial conditions.
Fix > 0. Consider, for t , the stochastic differential system
7
2 1 1 1
N
dui (t) =
Nt
dWi (t) +
Nt j=i ui (t) u j (t)
N N
dt uNi (t)dt.
2t
(4.3.40)
( )
Let PN denote the rescaled version of PN from (2.5.1), that is, the law on N with
density proportional to
|i j | eN i /4 .
2
i< j i
Because PN (N ) = 1, we may take uN ( ) distributed according to PN , and the
proof of Lemma 4.3.3 carries over to yield the strong existence and uniqueness of
solutions to (4.3.40) initialized from such (random) initial conditions belonging to
N .
Our next goal is to prove that PN is a stationary distribution for the system
(4.3.40) with this initial distribution, independently of . Toward this end, note
that by Itos calculus (Lemma 4.3.12), one finds that for any twice continuously
differentiable function f : RN R,
1 i f (uN (t)) j f (uN (t))
t E[ f (uN (t))] = E[
2Nt i= j uNi (t) uNj (t)
]
1 1
E[
2t i
uNi (t)i f (uN (t))] + E[
Nt
i2 f (uN (t))] ,
i
x2 /4
with J (s) > 0 for s > 2. Thus, there exist C < and > 0 so that for x C,
for all N N ,
P(uN (t) x) 2PN (N x) e Nx . (4.3.41)
Define next N,0 (t) = tuN (t). Clearly, N,0 (0) = 0 N . An application
of Itos calculus, Lemma 4.3.12, shows that N,0 (t) is a continuous solution of
(4.3.3) with initial data 0, and N,0 (t) N for all t > 0. For an arbitrary constant
A, define N,A (t) N by iN,A (t) = iN,0 (t) + A, noting that ( N,A (t))t0 is again
a solution of (4.3.3), starting from the initial data (A, . . . , A) N , that belongs to
N for all t > 0.
N, +N (0)
Note next that for any > 0, i (0) > iN (0) for all i. Further, for
N, +N (0)
t small, i > iN (t) for all i by continuity. Therefore, we get from
(t)
Lemma 4.3.6 that, for all t > 0,
N, +N (0)
NN (t) N (t) N (0) + + tuN (t) .
Proof of Lemma 4.3.16 We use the estimates on N (t) from Lemma 4.3.17 in
order to approximate q, LN (t) for polynomial functions q by similar expressions
involving bounded continuous functions.
We begin by noting that, due to Lemma 4.3.17 and the BorelCantelli Lemma,
for any fixed t,
lim sup N (t) N (0) + tC a.s. (4.3.42)
N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 271
To prove (4.3.45), we use (4.3.26) with f (t, x) = xn and an integer n > 0 to get
xn+2 , LN (t) = xn+2 , LN (0) + Mn+2
N
(t)
t
(n + 1)(n + 2) 2
+ 1 xn , LN (s)ds
2N 0
(n + 2) n t
+
2 =0 0 x , LN (s)xn , LN (s)ds , (4.3.47)
N is a local martingale with bracket
where Mn+2
t
2(n + 2)2
Mn+2
N
t = x2n+2 , LN (s)ds .
N2 0
272 4. S OME GENERALITIES
and deduce from (4.3.47) and the last estimate that for p [0, N/2] integer,
1
(c1 ) 2 p tC(t)(2p+1)
t (2(p + 1)) 0 (2(p + 1)) +
N
t
+(p + 1) E[(N (t))2p ]ds
2
(4.3.48)
0
1
(c1 ) 2 p tC(t)(2p+1)
C2(p+1) + + ( N)2C(t)2p .
N
Taking p = N/2, we deduce that the left side is bounded by (2C(T )) N , for all
N large. Therefore, by Jensens inequality, we conclude
t () t ( N) N (2C(T )) for all [0, N] . (4.3.49)
We may now complete the proof of the lemma. For > 0 and continuous
function q, set
x
q (x) = q .
1 + x2
By Proposition 4.3.10, for any > 0, we have
lim sup |q , LN (t) q , t | = 0 . (4.3.50)
N t[0,T ]
for any N. By the BorelCantelli Lemma, taking = (log N)2 and A larger
than 2C(T ), we conclude that
lim sup sup |(q q ), LN (t)| [(2C(T )) p+2 + (2C(T ))3 ]C , a.s.
N t[0,T ]
Together with (4.3.50) and (4.3.51), this yields the almost sure uniform conver-
gence of q, LN (t) to q, t . The proof of the L p convergence is similar once we
have (4.3.45).
Exercise 4.3.18 Take 0 = 0 . Show that the empirical measure LN (1) of the
Gaussian (real) Wigner matrices converges almost surely. Show that
1
G1 (z) = G1 (z)2
z
and conclude that the limit is the semicircle law, hence giving a new proof of
Theorem 2.1.1 for Gaussian entries.
Hint: by the scaling property, show that Gt (z) = t 1/2 G1 (t 1/2 z) and use Lemma
4.3.25.
Exercise 4.3.19 Using Exercise 4.3.7, extend Corollary 4.3.11 to the symplectic
setup ( = 4).
and that LN (0) converges towards a probability measure in such a way that, for
all p 2,
sup E[|N(xn , LN (0) xn , )| p ] < .
NN
Assume that for any n N and any P1 , . . . , Pn C[X], GN, (P1 , . . . , Pn )(0) con-
verges in law towards a random vector (G(P1 )(0), . . . , G(Pn )(0)). Then
(a) there exists a process (G(P)(t))t[0,T ],PC[X] , such that for any polynomial
functions P1 , . . . , Pn C[X], the process (GN, (P1 , . . . , Pn )(t))t[0,T ] converges in
law towards (G(P1 )(t), . . . , G(Pn )(t))t[0,T ] ;
(b) the limit process (G(P)(t))t[0,T ],PC[X] is uniquely characterized by the fol-
lowing two properties.
(1) For all P, Q C[X] and ( , ) R2 ,
G( P + Q)(t) = G(P)(t) + G(Q)(t) t [0, T ].
(2) For any n N, (G(xn )(t))t[0,T ],nN is the unique solution of the system of
equations
G(1)(t) = 0 , G(x)(t) = G(x)(0) + Gt1 ,
and, for n 2,
t n2
G(xn )(t) = G(xn )(0) + n s (xnk2 )G(xk )(s)ds
0 k=0
t
2
+ n(n 1) s (xn2 )ds + Gtn , (4.3.52)
2 0
Nxi , (LN (t) t ) to get, using (4.3.47) (which is still valid with obvious modifi-
cations if i = 1),
i2 t
GNi (t) = GNi (0) + i GNk (s)s (xi2k )ds + MiN (t)
k=0 0
t t
2 i i2
+
2
i(i 1)
0
xi2 , LN (s)ds +
2N k=0 0
GNk (s)GNi2k (s)ds , (4.3.53)
(Note that by Lemma 4.3.16, the L p norm of MiN is finite for all p, and so in
particular MiN are martingales and not just local martingales.)
By Lemma 4.3.16, for all t 0, MiN , M Nj t converges in L2 and almost surely
towards 2 i j 0t xi+ j2 , s ds. Thus, by Theorem H.14, and with the Gaussian
process (Gti )t[0,T ],iN as defined in the theorem, we see that, for all k N,
(MkN (t), . . . , M1N (t))t[0,T ] converges in law towards
the k-dimensional Gaussian process (Gtk , Gtk1 , . . . , Gt1 )t[0,T ] . (4.3.54)
Moreover, (Gtk , Gtk1 , . . . , Gt1 )t[0,T ] is independent of (G(xn )(0))nN since the con-
vergence in (4.3.54) holds given any initial condition such that LN (0) converges
to . We next show by induction over p that, for all q 2,
Aqp := max sup E[ sup |GNi (t)|q ] < . (4.3.55)
ip NN t[0,T ]
To begin the induction, note that (4.3.55) holds for p = 0 since GN0 (t) = 0. Assume
(4.3.55) is verified for polynomials of degree strictly less than p and all q. Recall
that, by (4.3.45) of Lemma 4.3.16, for all q N,
Bq = sup sup E[|x|q , LN (t)] < . (4.3.56)
NN t[0,T ]
Set Aqp (N, T ) := E[supt[0,T ] |GNp (t)|q ]. Using (4.3.56), Jensens inequality in the
form E(x1 + x2 + x3 )q 3q1 3i=1 E|xi |q , and the BurkholderDavisGundy in-
equality (Theorem H.8), we obtain that, for all > 0,
Aqp (N, T ) 3q [Aqp (N, 0)
p2
1
+(pT )q (Akq(1+ ) (N, T ))(1+ ) 1+
B(1+ ) 1 (p2k)q
k=0
T
1 q q1
+(pN ) T q/2 E[ x2q(p1) , LN (s)ds] .
0
276 4. S OME GENERALITIES
By the induction hypothesis (Akq(1+ ) is bounded since k < p), the fact that we
control Aqp (N, 0) by hypothesis and the finiteness of Bq for all q, we conclude
also that Aqp (N, T ) is bounded uniformly in N for all q N. This completes the
induction and proves (4.3.55).
Set next, for i N,
i2 t
N (i)(s) := iN 1 GNk (s)GNi2k (s)ds .
k=0 0
Since
1
sup E[N (i)(s)q ] N q i2q (A2q
p 2
) T,
s[0,T ]
Setting
i2 t
YiN (t) = GNi (t) GNi (0) i GNk (s)xi2k , s ds,
k=0 0
for all t [0, T ], we conclude from (4.3.53), (4.3.54) and (4.3.57) that the pro-
cesses (YiN (t),Yi1N (t), . . . ,Y N (t))
1 t0 converge in law towards the centered Gaus-
sian process (G (t), . . . , G1 (t))t0 .
i
Exercise 4.3.21 Recover the results of Section 2.1.7 in the case of Gaussian
Wigner matrices. by taking X N, (0) = 0, with 0 = 0 and G(xn )(0) = 0. Note
that mn (t) := EG(xn )(t) = t n/2 mn (1) may not vanish.
Exercise 4.3.22 In each part of this exercise, check that the given initial data
X N (0) fulfills the hypotheses of Theorem 4.3.20. (a) Let X N (0) be a diagonal
matrix with entries on the diagonal ( ( Ni ), 1 i N), with a continuously
differentiable function on [0, 1]. Show that
1
1
0 ( f ) = f ( (x))dx, G(x p )(0) = [ (1) p (0) p ] for all p,
0 2
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 277
0 ( f ) = 0 , G(x p )(0) = 0 if p = 1
and
1
Ss,t ( , f ) = Ss,t ( , f ) f , f s,t . (4.3.60)
2
Set, for any probability measure M1 (R),
+ , if 0 = ,
S ( ) := S ( ) := sup f C2,1 (R[0,T ]) sup0stT Ss,t ( , f ) , otherwise.
0,T
b
278 4. S OME GENERALITIES
We now show that S () is a candidate for rate function, and that a large devia-
tion upper bound holds with it.
Lemma 4.3.24 Assume (4.3.23). Let T R+ . Then, there exists a(T ) > 0 and
M(T ),C(T ) < so that:
(a) for M M(T ),
C(T )ea(T )MN ;
2
P sup log(x2 + 1), LN (t) M
t[0,T ]
(b) for any L N, there exists a compact set K (L) C([0, T ], M1 (R)) so that
P (LN () K (L)c ) eN L .
2
It follows in particular from the second part of Lemma 4.3.24 that the sequence
(LN (t),t [0, T ]) is almost surely pre-compact in C([0, T ], M1 (R)); compare with
Lemma 4.3.14.
Proof The proof proceeds as in Lemma 4.3.14. Set first f (x) = log(x2 + 1).
Recalling (4.3.29) and Corollary H.13, we then obtain that, for all L 0,
N 2 L2
P sup |M f (s)| L 2e 16T ,
N
sT
which combined with (4.3.29) yields the first part of the lemma.
For the second part of the lemma, we proceed similarly, by first noticing that
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 279
if f C2 (R) is bounded, together with its first and second derivatives, by 1, then
from Corollary H.13 and (4.3.33) we have that
sup | f , LN (s) LN (ti )| 2 + ,
i s(i+1)
N 2 ( )2
with probability greater than 1 2e 16 . Using the compact sets K = KM of
C([0, T ], M1 (R)) as in (4.3.36) with k = 1/kM( fk + fk + fk ), we then
conclude that
P(LN KM ) 2ecM N ,
2
Proof of Proposition 4.3.23 We first prove that S () is a good rate function. Then
we obtain a weak large deviation upper bound, which gives, by the exponential
tightness proved in the Lemma 4.3.24, the full large deviation upper bound.
(a) Observe first that, from Riesz Theorem (Theorem B.11), S ( ) is also
given, when 0 = , by
1 Ss,t ( , f )2
SD ( ) = sup sup . (4.3.61)
2 f Cb (R[0,T ]) 0stT
2,1 f , f s,t
Hence, (4.3.61) implies, by taking f = f in the supremum, that, for any (0, 1],
any t [0, T ], any . {SD M},
t ( f ) 0 ( f ) + 2Ct + 2C Mt .
Chebyshevs inequality and (4.3.23) thus imply that for any . {S () M} and
any K R+ ,
CD + 2C(1 + M)
sup t (|x| K) ,
t[0,T ] log(K 2 + 1)
We turn next to establishing the weak large deviation upper bound. Pick
C([0, T ], M1 (R)) and f C2,1 ([0, T ]R). By Lemma 4.3.12, for any s 0, the
s} is a martingale for the filtration of the Brownian motion
process {Ss,t (LN , f ),t
W , which is equal to 2/ N 3/2 Ni=1 st f (iN (u))dWui . Its bracket is f , f s,t
LN .
As f is uniformly bounded, we can apply Theorem H.10 to deduce that the pro-
cess {MN (LN , f )(t),t s} is a martingale if for C([0, T ], M1 (R)) we denote
N2
MN ( , f )(t) := exp{N 2 Ss,t ( , f ) f , f s,t s,t
+ N ( f ) }
2
with
t
1 1
( f )s,t
:= ( ) x2 f (s, x)d (x)du.
2 s
some metric d compatible with the weak topology on C([0, T ], M1 (R))) of radius
around , we obtain, for all s t T ,
MN (LN , f )(t)
P (d(LN , ) < ) = E[ 1 ]
MN (LN , f )(t) d(LN , )<
2 +N f 2 Ss,t ( , f )
N
eN E[MN (LN , f )(t)1d(LN , )< ]
N 2 +N f N 2 Ss,t ( , f )
e
E[MN (LN , f )(t)]
N 2 +N f N
2 Ss,t ( , f )
= e ,
where we finally used the fact that E[MN (LN , f )(t)] = E[MN (LN , f )(s)] = 1 since
the process {MN (LN , f )(t),t s} is a martingale. Hence,
1
lim lim log P (d(LN , ) < ) Ss,t ( , f )
0 N N 2
Exercise 4.3.25 In this exercise, you prove that the set { : S ( ) = 0} consists
of the unique solution of (4.3.25).
(a) By applying Riesz Theorem, show that
Ss,t ( , f )2
S0,T ( ) := sup sup .
f Cb (R[0,T ]) 0stT
2,1 2 f , f s,t
(b) Show that S (. ) = 0 iff 0 = and Ss,t ( , f ) = 0 for all 0 s t T and all
f Cb2,1 (R[0, T ]). Take f (x) = (z x)1 to conclude.
expand on this theme by developing both concentration techniques and their ap-
plications to random matrices. To do so we follow each of two well-established
routes. Taking the first route, we consider functionals of the empirical measure
of a matrix as functions of the underlying entries. When enough independence is
present, and for functionals that are smooth enough (typically, Lipschitz), concen-
tration inequalities for product measures can be applied. Taking the second route,
which applies to situations in which random matrix entries are no longer inde-
pendent, we view ensembles of matrices as manifolds equipped with probability
measures. When the manifold satisfies appropriate curvature constraints, and the
measure satisfies coercivity assumptions, semigroup techniques can be invoked to
prove concentration of measure results.
Hence, concentration results for linear functionals of the empirical measure of the
singular values of YN can be deduced from such results for the eigenvalues of XN .
For an example, see Exercise 4.4.9.
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 283
Our first goal is to extend the concentration inequalities, Lemma 2.3.3 and The-
orem 2.3.5, to Hermitian matrices whose independent entries satisfy a weaker
condition than the LSI, namely to matrices whose entries satisfy a Poincare type
inequality.
It is not hard to check that if P satisfies an LSI with constant c, then it satisfies a PI
with constant m c1 , see [GuZ03, Theorem 4.9]. However, there are probability
measures which satisfy the PI but not the LSI such as Z 1 e|x| dx for a (1, 2).
a
Further, like the LSI, the PI tensorizes: if P satisfies the PI with constant m, PM
also satisfies the PI with constant m for any M N, see [GuZ03, Theorem 2.5].
Finally, if for some uniformly bounded function V we set PV = Z 1 eV (x) dP(x),
then PV also satisfies the PI with constant bounded below by e supV +infV m, see
[GuZ03, Property 2.6].
As we now show, probability measures on RM satisfying the PI have sub-
exponential tails.
Lemma 4.4.3 Assume that P satisfies the PI on RM with constant m. Then, for
any differentiable function G on RM , for |t| m/ 2 G2 ,
EP (et(GEP (G)) ) K , (4.4.1)
with K = i0 2i log(1 21 4i ). Consequently, for all > 0,
m
P(|G EP (G)| ) 2Ke 2 G2
. (4.4.2)
Proof With G as in the statement, for t 2 < m/G22 , set f = etG and note
that
2 t 2
EP (e2tG ) EP (etG ) G22 EP (e2tG )
m
so that
t2 2
EP (e2tG ) (1 )1 EP (etG ) .
m G2
2
284 4. S OME GENERALITIES
and
4it 2
Dt := 2i log(1 G22 ) <
i=0 m
increases with |t|, we conclude that with t0 = m/ 2 G2 ,
EP (e2t0 (GEP (G)) ) Dt0 = K .
The estimate (4.4.2) then follows by Chebyshevs inequality.
We can immediately apply this result in the context of large random matri-
ces. Consider Hermitian matrices such that the laws of the independent entries
{XN (i, j)}1i jN all satisfy the PI (over R or R2 ) with constant bounded below
by Nm. Note that, as for the LSI, if P satisfies the PI with constant m, the law of
ax under P satisfies it also with a constant bounded by a2 m1 , so that our hypoth-
esis includes the case where XN (i, j) = aN (i, j)YN (i, j) with YN (i, j) i.i.d. of law P
satisfying the PI and a(i, j) deterministic and uniformly bounded.
Corollary 4.4.4 Under the preceding assumptions, there exists a universal con-
stant C > 0 such that, for any differentiable function f , and any > 0,
C Nm
P (|tr( f (XN )) E[tr( f (XN ))]| N) Ce f
2 .
x2
Exercise 4.4.6 Let (dx) = (2 )1/2 e 2 dx be the standard Gaussian measure.
Show that satisfies the Poincare inequality with constant one, by following the
following approaches.
Use Lemma 2.3.2.
Use the interpolation
1
2
(( f ( f ))2 ) = f ( x + 1 y)d (y) d (x)d ,
0
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 285
integration by parts, the CauchySchwarz inequality and the fact that, for
any [0, 1], the law of x + 1 y is under .
Exercise 4.4.7 [GuZ03, Theorem 2.5] Show that the PI tensorizes: if P satisfies
the PI with constant m then PM also satisfies the PI with constant m for any
M N.
Exercise 4.4.8 [GuZ03, Theorem 4.9] Show that if P satisfies an LSI with constant
c, then it satisfies a PI with constant m c1 . Hint: Use the LSI with f = 1 + g
and 0.
Exercise 4.4.9 Show that Corollary 4.4.4 extends to the setup of singular values of
the Wishart matrices introduced in Exercise 2.1.18. That is, in the setup described
there, assume the entries YN (i, j) satisfy the PI with constant bounded below by
Nm, and set XN = (YN YNT )1/2 . Prove that, for a universal constant C, and all > 0,
C Nm
P (|tr( f (XN )) E[tr( f (XN ))]| (M + N)) Ce f
2 .
Recall that the median MY of a random variable Y is defined as the largest real
number such that P(Y x) 21 . The following is an easy consequence of a
theorem due to Talagrand, see [Tal96, Theorem 6.6].
f c 0
for some constant c, we have R f (x, y) 2c (x y)2 = R c x2 (x, y). Consider also
2
the matrix R f (X,Y ) = f (X) f (Y ) (X Y ) f (Y ), noting that tr(R c x2 (X,Y )) =
2
tr( 2c (X Y )2 ). For i {1, . . . , N}, with ci j = |i , j |2 , and with summations on
j {1, . . . , N}, we have
= ci j R f (xi , y j ) ci j R 2c x2 (xi , y j ) ,
j j
where at the middle step we use the fact that j ci j = 1. After summing on i
{1, . . . , N} we have
c
tr( f (X) f (Y ) (X Y ) f (Y )) tr(X Y )2 0 . (4.4.3)
2
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 287
Now take successively (X,Y ) = (A, (A + B)/2), (B, (A + B)/2). After summing
(2)
the resulting inequalities, we have for arbitrary A, B Hn that
1 1 1 1
tr f ( A + B) tr ( f (A)) + tr ( f (B)) .
2 2 2 2
The result follows for general convex functions f by approximations.
We can now apply Corollary 4.4.11 and Lemma 4.4.12 to the function
f ({XN (i, j)}1i jN ) = tr( f (XN )) to obtain the following.
Theorem 4.4.13 Let (Pi, j , i j) and (Qi, j , i < j) be probability measures sup-
ported on a convex compact subset K of R. Let XN be a Hermitian matrix, such
that XN (i, j), i j, is distributed according to Pi, j , and XN (i, j), i < j, is dis-
tributed according to Qi, j , and such that all these random variables are indepen-
dent. Fix 1 (N) = 8|K| a/N. Then, for any 4 |K|1 (N), and any convex
Lipschitz function f on R,
PN |tr( f (XN )) E N [tr( f (XN ))]| N
32|K| 1 2
exp N 2
[ 1 (N)] . (4.4.4)
16|K|2 a2 16|K|| f |2L
Let be a smooth function from Rm into R, with fast enough growth at infinity
such that the measure
1 (x1 ,...,xm )
(dx) := e dx1 dxm
Z
is a well defined probability measure. (Further assumptions of will be imposed
below.) We consider the operator L on twice continuously differentiable func-
288 4. S OME GENERALITIES
tions defined by
m
L = () = [i2 (i )i ] .
i=1
Then, integrating by parts, we see that L is symmetric in L2 ( ), that is, for any
compactly supported smooth functions f , g,
( f L g) d = (gL f ) d .
In the rest of this section, we will use the notation f = f d .
Let B denote a Banach space of real functions on M, equipped with a partial
order <, that contains Cb (M), the Banach space of continuous functions on M
equipped with the uniform norm, with the latter being dense in B. We will be
concerned in the sequel with B = L2 ( ).
The collection of functions for which the right side of (4.4.5) exists is the domain
of L , and is denoted D(L ).
We will only be interested in the cases n = 1, 2. Thus, the carre du champ operator
1 satisfies
1
1 ( f , g) = (L f g f L g gL f ) , (4.4.7)
2
and the carre du champ itere operator 2 satisfies
1
2 ( f , f ) = {L 1 ( f , f ) 21 ( f , L f )} . (4.4.8)
2
We often write i ( f ) for i ( f , f ), i = 1, 2. Simple algebra shows that 1 ( f ) =
i=1 (i f ) , and
m 2
m m
2 ( f , f ) = (i j f )2 + i f Hess()i j j f , (4.4.9)
i, j=1 i, j=1
Definition 4.4.16 We say that the BakryEmery condition (denoted BE) is satisfied
if there exists a positive constant c > 0 such that
1
2 ( f , f ) 1 ( f , f ) (4.4.11)
c
for any smooth function f .
Theorem 4.4.17 Assume that C2 (Rm ) and that the BE condition (4.4.12)
290 4. S OME GENERALITIES
holds. Then, satisfies the logarithmic Sobolev inequality with constant c, that
is, for any f L2 ( ),
f2
f 2 log d 2c 1 ( f , f )d . (4.4.13)
f 2 d
(Rm ) denote the subset of C (Rm ) that consists of func-
In the sequel, we let Cpoly
tions all of whose derivatives have polynomial growth at infinity. The proof of
Theorem 4.4.17 is based on the following result which requires stronger assump-
tions.
From Theorem 4.4.17, (4.4.9) and Lemma 2.3.3 of Section 2.3, we immediately
get the following.
Proof of Theorem 4.4.17 (with Theorem 4.4.18 granted). Fix > 0, M > 1, and
set B(0, M) = {x Rm : x2 M}. We will construct below approximations of
(Rm ) with the following properties:
by functions M, Cpoly
with the operator norm on Matm (R). Such an approximation exists by Weier-
strass Theorem. Note that
With c1
=c
1 > 0 for small, note that Hess(P )(x) c1 I on B(0, 2M)
4
and define P as the function on Rm given by
-
1
P (x) = sup P (y) + P (y) (x y) + x y22 .
yB(0,2M) 2c
is convex as a supremum of convex functions (and thus its Hessian, which is al-
(Rm )-
most everywhere well defined, is nonnegative). Finally, to define a Cpoly
valued function we put, for some small t,
,t (x) = P (x + tz)d (z)
Thus, M ( ,t) vanishes when and t go to zero and we choose these two pa-
(Rm ) since the
rameters so that it is bounded by . Moreover, ,t belongs to Cpoly
density of the Gaussian law is C and P has at most a quadratic growth at infinity.
Finally, since Hess(P ) c1 1
I almost everywhere, Hess ,t c I everywhere.
To conclude, we choose small enough so that c c + .
Our proof of Theorem 4.4.18 proceeds via the introduction of the semigroup Pt
associated with L through the solution of the stochastic differential equation
dXtx = (Xtx )dt + 2dwt , X0x = x , (4.4.17)
292 4. S OME GENERALITIES
Lemma 4.4.20 With assumptions as in Theorem 4.4.18, for any x Rm , the solu-
tion of (4.4.17) exists for all t R+ . Further, the formula
Pt f (x) = E( f (Xtx )) (4.4.18)
determines a Markov semigroup on B = L2 (
), with infinitesimal generator L
(Rm ).
so that D(L ) contains Cpoly (R ), and L coincides with L on Cpoly
m
Proof Since the second derivatives of are locally bounded, the coefficients of
(4.4.17) are locally Lipschitz, and the solution exists and is unique up to (possi-
bly) an explosion time. We now show that no explosion occurs, in a way similar
to our analysis in Lemma 4.3.3. Let Tn = inf{t : |Xtx | > n}. Itos Lemma and
the inequality x (x) |x|2 /c c for some constant c > 0 (consequence of
(4.4.12)) imply that
tT
n
x
E(|XtT n
|2
) = x 2
E X s (X s )ds + 2E(t Tn )
0
tT
1 n
x2 + E |Xs |2 ds + (2 + c )E(t Tn ) . (4.4.19)
c 0
Arguing similarly with the term containing L f (Xsx ), we conclude that all terms
in (4.4.20) are uniformly integrable. Taking n and using the fact that Tn
together with the above uniform integrability yields that
t
E ( f (Xtx )) f (x) = E L f (Xsx )ds .
0
Taking the limit as t 0 (and using again the uniform integrability together with
(Rm ) D(L ) .
the continuity Xsx s0 x a.s.) completes the proof that Cpoly
Pt f = gt ,
by Lemma 4.4.20, belongs to D(L ) and so does Pt 1 ( f , g). The rest follows from
the definitions.
A direct proof can be given based on part (i) of Lemma 4.4.22. Instead, we present
a slightly longer proof that allows us to derive useful intermediate estimates.
We first note that we can localize (4.4.21): because Pt 1 = 1 and Pt f 0 for f
positive continuous, it is enough to prove (4.4.21) for h Cb (Rm ) that is compactly
supported. Because Cb (K) is dense in C(K) for any compact K, it is enough
to prove (4.4.21) for h Cb (Rm ). To prepare for what follows, we will prove
(4.4.21) for a function h satisfying h = (P g) for some g Cb , 0, and
that is infinitely differentiable with bounded derivatives on the range of g (the
immediate interest is with = 0, (x) = x).
Set ht = Pt h and for s [0,t], define (s) = Ps 1 (hts , hts ). By part (ii) of
Lemma 4.4.22, 1 (hts , hts ) D(L ). Therefore,
d 2 2
(s) = 2Ps 2 (Pts h, Pts h) Ps 1 (Pts h, Pts h) = (s) ,
ds c c
where we use the BE condition in the inequality. In particular,
Next, using the fact that Pt is symmetric together with the CauchySchwarz in-
equality, we get
1 (ht , log ht )d = 1 (h, Pt (log ht )) d
1 1
1 (h, h) 2 2
d h1 (Pt log ht , Pt log ht )d . (4.4.24)
h
Now, applying (4.4.22) with the function log ht (note that since ht is bounded
below uniformly away from 0, log() is indeed smooth on the range of ht ), we
obtain
2
h1 (Pt log ht , Pt log ht )d he c t Pt 1 (log ht , log ht )d
2 2
= e c t ht 1 (log ht , log ht )d = e c t 1 (ht , log ht )d , (4.4.25)
where in the last equality we have used symmetry of the semigroup and the Leib-
niz rule for 1 . The inequalities (4.4.24) and (4.4.25) imply the bound
2 1 (h, h) 2 1 1
1 (ht , log ht )d e c t d = 4e c t 1 (h 2 , h 2 )d .
h
(4.4.26)
Using this, one arrives at
2t 1 1
S f (0) 4e c dt 1 (h 2 , h 2 )d = 2c 1 ( f , f )d ,
0
We now consider the version of Corollary 4.4.19 applying to the setting of a com-
pact connected manifold M of dimension m equipped with a Riemannian metric g
and volume measure , see Appendix F for the notions employed.
296 4. S OME GENERALITIES
Remark 4.4.23 For the reader familiar with such language, we note that, in local
coordinates,
m m
L = gi j i j + b
i i
i, j=1 i=1
with
b
i (x) = e
(x)
j e(x) det(gx )gixj .
j
in terms of a local orthonormal frame {Li }. The latter expression for 1 may be
verified by a straightforward manipulation of differential operators. The expres-
sion for 2 is more complicated and involves derivatives of the metric g, reflecting
the fact that the LeviCivita connection does not preserve the Lie bracket. In other
words, the curvature intervenes, as follows.
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 297
(See Appendix F for the definition of the Ricci tensor Ric(, ).)
+ ((LiCkk
j
LkCijk )(Li f )(L j f ))| p . (4.4.27)
i, j,k
We have [Li , L j ] = k (Cikj Ckji )Lk because is torsion-free. We also have [Li , L j ]| p
= 0 because {Li } is geodesic at p. It follows that
[Li , L j ]Li f | p = 0,
Li [Li , L j ] f | p = (LiCikj LiCkji )(Lk f )| p ,
k
([L j , Ai Li ] f )(L j f )| p = (L jCkki + L j Li )(Li f )(L j f )| p .
k
Therefore, after some relabeling of dummy indices, we can see that equation
(4.4.27) holds.
Rerunning the proofs of Theorem 4.4.18 and Lemma 2.3.3 (this time, not wor-
rying about explosions, since the process lives on a compact manifold, and replac-
(Rm ) by C (M)), we deduce from Lemma 4.4.24 the
ing throughout the space Cpoly b
following.
298 4. S OME GENERALITIES
then satisfies the LSI (4.4.13) with constant c and, further, for any differen-
tiable function G on M,
|G G(x) (dx)| 2e /2cE 1 (G,G) .
2
(4.4.28)
for the product Lebesgue measure on the entries on-and-above the diagonal of
XN , where the Lebesgue measure on C is taken as the product of the Lebesgue
measure on the real and imaginary parts.
We next apply, in the setup of compact Riemannian manifolds, the general con-
centration inequality of Corollary 4.4.25. We study concentration on orthogonal
and unitary groups. We let O(N) denote the N-by-N orthogonal group and U(N)
denote the N-by-N unitary group. (In the notation of Appendix E, O(N) = UN (R)
and U(N) = Un (C).) We let SU(N) = {X U(N) : det X = 1} and SO(N) =
O(N) SU(N). All the groups O(N), SO(N), U(N) and SU(N) are manifolds
embedded in MatN (C). We consider each of these manifolds to be equipped with
the Riemannian metric it inherits from MatN (C), the latter equipped with the in-
ner product X Y = tr XY . It is our aim is to get concentration results for O(N)
and U(N) by applying Corollary 4.4.25 to SO(N) and SU(N).
We introduce some general notation. Given a compact group G, let G denote
the unique Haar probability measure on G. Given a compact Riemannian mani-
fold M with metric g, and f C (M), let | f |L ,M be the maximum achieved by
g(grad f , grad f )1/2 on M.
Although we are primarily interested in SO(N) and SU(N), in the following
result, for completeness, we consider also the Lie group USp(N) = UN (H)
MatN (H).
Proof Recall from Appendix F, see (F.6), that the Ricci curvature of GN is given
by
(N + 2)
Ricx (GN )(X, X) = 1 gx (X, X) (4.4.30)
4
for x GN and X Tx (GN ). Consider now the specialization of Corollary 4.4.25
to the following case:
0 and (hence) = GN .
We next deduce a corollary with an elementary character which does not make
reference to differential geometry.
and furthermore
(N+2)
4 1 2
GN | f () f (Y )d SGN (Y )| 2e 2C2 (4.4.33)
For the proof we need a lemma which records some group-theoretical tricks. We
continue in the setting of Corollary 4.4.28.
Proof To abbreviate we write X = (tr XX )1/2 for X MatN (C). For X,Y GN
we have
A A
| f (X) f (Y )| 2NFL AXD N X Y D N Y A 2 2NDX Y .
Further, by Lemma 4.4.29, since T f = f , we have GN f = S f . Plugging into
Corollary 4.4.28, we obtain the result.
Proof We may assume without loss of generality that p = ti1 ti for some indices
i1 , . . . , i {1, . . . , k +2}, and also that N > . We claim first that, for all X U(N),
U(N) f = f (Y X)d SU(N) (Y ) =: (S f )(X) . (4.4.35)
302 4. S OME GENERALITIES
For some integer a such that |a| we have f (ei X) = eia f (X) for all R
and X U(N). If a = 0, then S f = U(N) f by Lemma 4.4.29. Otherwise, if
a > 0, then U(N) f = 0, but also S f = 0, because f (e2 i/N X) = e2 ia/N f (X) and
e2 ia/N IN SU(N). This completes the proof of (4.4.35).
It is clear that f is a Lipschitz function, with Lipschitz constant depending
only on and D. Thus, from Corollary 4.4.28 in the case = 2 and the equality
U(N) f = S f , we obtain (4.4.34) for p = ti1 ti with N0 = and c = c(, D),
which finishes the proof of Corollary 4.4.31.
Exercise 4.4.33 In this exercise, you provide another proof of Proposition 4.4.26
by proving directly that the law
N
1 N N V (i )
PVN (d 1 , . . . , d N ) =
ZNV
e i=1 ( i )
d i
i=1
on RN satisfies the LSI with constant (Nc)1 . This proof extends to the -
ensembles discussed in Section 4.5.
(i) Use Exercise 4.4.32 to show that Theorem 4.4.18 extends to the case where
N
( ) = N V (i ) log |i j | .
i=1 2 i= j
We consider in this section a class of random matrices that are tridiagonal and
possess joint distribution of eigenvalues that generalize the classical GOE, GUE
and GSE matrices. The tridiagonal representation has some advantages, among
them a link with the well-developed theory of random Schroedinger operators.
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 303
21t/2 xt1 ex
2 /2
ft (x) =
(t/2)
is called the distribution with t degrees of freedom, and is denoted t .
Here, () is Eulers Gamma function, see (2.5.5). The reason for the name is that
if t is integer and X is distributed according to t , then X has the same law as
/
ti=1 i2 where i are standard Gaussian random variables.
Let i be independent i.i.d. standard Gaussian random variables of zero mean
and variance 1, and let Yi i be independent and independent of the vari-
ables {i }. Define the tridiagonal symmetric matrix HN MatN (R) withen-
tries HN (i, j) = 0 if |i j| > 1, HN (i, i) = 2/ i and HN (i, i + 1) = YNi / ,
i = 1, . . . , N. The main result of this section is the following.
Lemma 4.5.36 The eigenvalues of any H TN are distinct, and all eigenvectors
v = (v1 , . . . , vN ) of H satisfy v1 = 0.
Proof The null space of any matrix H TN is at most one dimensional. Indeed,
suppose Hv = 0 for some nonzero vector v = (v1 , . . . , vN ). Because all entries of
b are nonzero, it is impossible that v1 = 0 (for then, necessarily all vi = 0). So
suppose v1 = 0, and then v2 = aN /bN1 . By solving recursively the equation
which is possible because all entries of b are nonzero, all entries of v are deter-
mined. Thus, the null space of any H TN is one dimensional at most. Since
H I TN for any , the first part of the lemma follows. The second part fol-
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 305
(d)
. (4.5.4)
N1 i1
i=1 bi
Proof That the map in (4.5.3) is a bijection follows from the proof of Lemma
4.5.36, and in particular from (4.5.2) (the map (d, v) (a, b) is determined by
the relation H = UDU T ).
To evaluate the Jacobian, we recall the proof of the = 1 case of Theorem
4.5.35. Let X be a matrix distributed according to the GOE, consider the tridiag-
onal matrix with diagonals a, b obtained from X by the successive Householder
transformations employed in that proof. Write X = UDU where U is orthogonal,
D is diagonal (with elements d), and the first row u of U consists of nonnegative
entries (and strictly positive except on a set of measure 0). Note that, by Corollary
2.5.4, u is independent of D and, by Theorem 2.5.2, the density of the distribution
of the vector (d, u) with respect to the product of the Lebesgue measure on cN
is proportional to (d)e i=1 di /4 . Using
N1 N 2
and the the uniform measure on S+
Theorem 4.5.35 and the first part of the lemma, we conclude that the latter (when
evaluated in the variables a, b) is proportional to
a2i 2
N1 bi
N1 N1
Je i=1 4 i=1 2
bi1 = Je i=1 di /4 bi1
N N 2
i i .
i=1 i=1
Proof Write H = UDU T . Let e1 = (1, 0, . . . , 0)T . Let w1 be the first column of
U T , which is the vector made out of the first entries of v1 , . . . , vn . One then has
N1
bii = det[e1 , He1 , . . . , H N1 e1 ] = det[e1 ,UDU T e1 , . . . ,UDN1U T e1 ]
i=1
N
= det[w1 , Dw1 , . . . , DN1 w1 ] = (d) vi1 .
i=1
Because all terms involved are positive by construction, the is actually a +, and
the lemma follows.
where (4.5.5) was used in the second equality. Substituting in (4.5.6) and integrat-
ing over the variables v completes the proof.
2N /2 B x
(N HN )(n) N +1/22 (x) + (x) xN + 1/2 (x) , (4.5.8)
where (4.5.8) has to be understood after an appropriate integration by parts against
smooth test functions. To obtain a scaling limit, one then needs to take , so that
1 1 1 1
+ 2 = = + = 0 = , = .
2 2 2 3 6
In particular, we recover the TracyWidom scaling, and expect the top of the
spectrum of N 1/6 HN to behave like the top of the spectrum of the stochastic Airy
operator
d2 2
H := x + B x . (4.5.9)
dx2
308 4. S OME GENERALITIES
elements of L are continuous functions, and vanish at the origin. Further prop-
erties of L are collected in Lemma 4.5.43 below.
Remark 4.5.41 Using the fact that f L , one can integrate by parts in (4.5.11)
and express all integrals as integrals involving only. In this way, one obtains
that ( f , ) is an eigenvectoreigenvalue pair of H if and only if, for Lebesgue
almost every x and some constant C, f (x) exists and
x x
f (x) = C + ( + ) f ( )d Bx f (x) + B f ( )d . (4.5.12)
0 0
Since the right side is a continuous function, we conclude that f can be taken con-
tinuous. (4.5.12) will be an important tool in deriving properties of eigenvector
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 309
The proof of Theorem 4.5.42 will take the rest of this section. It is divided into
two main steps. We first study the operator H by associating with it a variational
problem. We prove, see Corollary 4.5.45 and Lemma 4.5.47 below, that the eigen-
values of H are discrete, that they can be obtained from this variational problem
and that the associated eigenspaces are simple. In a second step, we introduce a
discrete quadratic form associated with HN = N 1/6 HN and prove its convergence
to that associated with H , see Lemma 4.5.50. Combining these facts will then
lead to the proof of Theorem 4.5.42.
We begin with some preliminary material related to the space L .
f 2 (x) 2 f 2 f 2 . (4.5.14)
Points (ii) and (iv) in the statement of the lemma follow from the Banach
Alaoglu Theorem (Theorem B.8). Point (iii) follows from the uniform equi-
continuity on compacts of the sequence fn that is a consequence of the uniform
Holder estimate. Together with the uniform integrability supn x fn2 (x)dx < ,
this gives (i).
Lemma 4.5.44 (a) For each > 0 there exists a random constant C (depending
on , and B only) such that
4 |Q | |R |
sup x x . (4.5.18)
x C+ x
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 311
We can now consider the variational problem associated with the quadratic form
, H of (4.5.16).
We will shortly see in Lemma 4.5.47 that the minimizer in Corollary 4.5.45 is
unique.
Proof By the estimate (4.5.19), the infimum in (4.5.20) is finite. Let { fn }n be a
minimizing sequence, that is fn 2 = 1 and fn , fn H 0 . Again by (4.5.19),
there is some (random) constant K so that fn K for all n. Write
" #
2
fn , fn H = fn 2 fn 22 Qx fn2 (x)dx 2 Rx fn (x) fn (x)dx .
0 0
Let f L be a limit point of fn (in all the senses provided by Lemma 4.5.43).
312 4. S OME GENERALITIES
f , f H lim inf fn , fn H + K = 0 + K .
n
f , f H f , f H
= 2 f , f H f (x) (x)dx + 2 ( f (x) (x) + x f (x) (x))dx
0 0
" #
4
Qx (x) f (x)dx Rx [ (x) f (x)] dx + O( 2 ) .
0 0
k := inf f , f H . (4.5.22)
f L , f 2 =1, f Hk
Mimicking the proof of Corollary 4.5.45, one shows that the infimum in (4.5.22)
is achieved at some f L , and ( f , k ) is an eigenvectoreigenvalue pair for
H , with k = k . We then denote by Hk the (finite dimensional) linear space
of scalar multiples of minimizers in (4.5.22). It follows that the collection of
eigenvalues of H is discrete and can be ordered as 0 > 1 > .
Our next goal is to show that the spaces Hk are one dimensional, i.e. that each
eigenvalue is simple. This will come from the analysis of (4.5.12). We have the
following.
Lemma 4.5.47 For each given C, and continuous function B , the solution to
(4.5.12) is unique. As a consequence, the spaces Hk are all one-dimensional.
[xN 1/3 ]
yN,2 (x) = 2N 1/6
( N HN (i, i + 1)) . (4.5.25)
i=1
Lemma 4.5.48 There exists a probability space supporting the processes yN, j ()
and two independent Brownian motions B, j , j = 1, 2, such that, with respect to
the Skorohod topology, the following convergence holds almost surely:
7
2
yN, j () Bx, j + x2 ( j 1)/2 , j = 1, 2 .
In the sequel, we work in the probability space determined by Lemma 4.5.48, and
write Bx = Bx,1 + Bx,2 (thus defining naturally a version of the operator H whose
relation to the matrices HN needs clarification). Toward this end, we consider the
matrices HN as operators acting on RN equipped with the norm
N N N
v2N, = N 1/3 (v(i + 1) v(i))2 + N 2/3 iv(i)2 + N 1/3 v(i)2 ,
i=1 i=1 i=1
where we set v(N + 1) = 0. Write v, wN,2 = N 1/3 and let vN,2
Ni=1 v(i)w(i)
denote the associated norm on R . Recall the random variables Yi appearing in
N
the definition of the tridiagonal matrix HN , see Theorem 4.5.35, and motivated by
the scaling in Theorem 4.5.42, introduce
1
i = 2N 1/6 ( N EYNi ) ,
1
i = 2N 1/6 (EYNi YNi ) .
It is straightforward to verify that i 0 and that, for some constant independent
of N,
i i
i + . (4.5.26)
N N
Also, with wk = 2/ N 1/6 ki=1 i and wk = ki=1 i , we have that for any
(1) (2)
|wk wi |2 iN 1/3 + N, .
( j) ( j)
sup (4.5.27)
iki+N 1/3
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 315
and thus, together with the bound (4.5.26), we have that S2 is bounded above by
a constant multiple of the sum of the second and third terms in v2N, . Similarly,
we have from the bound ab (a b)2 /3 + a2 /4 that
1 1 1 1
i v(i)v(i + 1) (vi+1 vi )2 + i v2i (vi+1 vi )2 + iv2i v2i
3 4 3 4 4
and using (4.5.26) again, we conclude that, for an appropriate constant c( ),
2
S2 + S1 v2N, c( )v22 . (4.5.29)
3
by parts we get
N N
(wi+1 wi wi )v2 (i) + wi v2 (i)
(1) (1) (1) (1)
S3 =
i=1 i=1
N i+N 1/3 N
N 1/3 (w wi ) (v2 (i + 1) v2 (i)) + wi v2 (i)
(1) (1) (1)
=
i=1 =i+1 i=1
=: S3,1 + S3,2 . (4.5.30)
316 4. S OME GENERALITIES
Because the family of random variables in Lemma 4.5.49 is tight, any subse-
quence {Nk } possesses a further subsequence {Nki } so that the estimates there
hold with fixed random variables ci (now independent of N). To prove Theorem
4.5.42, it is enough to consider such a subsequence. With some abuse of notation,
we continue to write N instead of Nk .
Each vector v RN can be identified with a piecewise constant function fv by
the formula fv (x) = v($N 1/3 x%) for x [0, $N 2/3 %] and fv (x) = 0 for all other x.
The collection of such functions (for a fixed N) forms a closed linear subspace of
L2 := L2 (R+ ), denoted L2,N , and HN acts naturally on L2,N . Let PN denote the
projection from L2 to L2,N L2 . Then HN extends naturally to an operator on L2
by the formula HN f = HN PN f . The relation between the operators HN and H
is clarified in the following lemma.
, HN fN 2 , H . (4.5.31)
, HN fN 2 , f H .
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 317
Proof The first part is an exercise in summation by parts that we omit. To see
the second part, pick a subsequence such that both fN and N 1/3 ( fN (x + N 1/3 )
fN (x)) converge weakly in L2 to a limit ( f , g), with f (x) = 0t g(s)ds (this is pos-
sible because fN N, < ). An application of the first part of the lemma then
completes the proof.
We have now put in place all the analytic machinery needed to conclude.
Proof of Theorem 4.5.42 Write N,k = N 1/6 (Nk N 2 N). Then
N,k is the
kth top eigenvalue of HN . Let vN,k denote the associated eigenvector, so that
fvN,k 2 = 1. We first claim that k := lim sup k,N k . Indeed, if k > , one
can find a subsequence, that we continue to denote by N, so that (N,1 , . . . , N,k )
(1 , . . . , k = k ). By Lemma 4.5.49, for j = 1, . . . , k, vN, j N, are uniformly
bounded, and hence, on a further subsequence, fvN, j converge in L2 to a limit f j ,
j = 1, . . . , N, and the f j are eigenvectors of H with eigenvalue at least k . Since
the f j are orthogonal in L2 and the spaces H j are one dimensional, it follows that
k k .
To see the reverse implication, that will complete the proof, we use an inductive
argument. Suppose that N, j j and fvN, j f j in L2 for j = 1, . . . , k 1, where
( f j , j ) is the jth eigenvectoreigenvalue pair for H . Let ( fk , k ) be the kth
eigenvectoreigenvalue pair for H . Let fk be smooth and of compact support, so
that fk fk , and set
k1
fN,k = PN fk vN, j , PN fk vN, j .
j1
Since vN, j N, < c for some fixed c by Lemma 4.5.49, and PN fk fvN,k 2
is bounded by 2 for N large, it follows that fN,k PN fk N, < c for some
(random) constant c. Using Lemma 4.5.49 again, we get that
fN,k , HN fN,k PN fk , HN PN fk
lim inf N,k lim inf = lim inf + s( ) ,
N N fN,k , fN,k N PN fk , PN fk
(4.5.32)
where s( ) 0 0. Applying (4.5.31), we have that
lim PN fk , HN PN fk = fk , fk H .
N
The background material on manifolds that we used in Section 4.1 can be found
in [Mil97] and [Ada69]. The Weyl formula (Theorem 4.1.28) can be found in
[Wey39]. A general version of the coarea formula, Theorem 4.1.8, is due to Fed-
erer and can be found in [Fed69], see also [Sim83] and [EvG92] for less intimi-
dating descriptions.
The physical motivation for studying different ensembles of random matrices
is discussed in [Dys62e]. We note that the Laguerre and Jacobi ensembles oc-
cur also through statistical applications (the latter under the name MANOVA, or
multivariate analysis of variance), see [Mui81].
Our treatment of the derivation of joint distributions of eigenvalues was influ-
enced by [Due04] (the latter relies directly on Weyls formula) and [Mat97]. The
book [For05] is an excellent recent reference on the derivation of joint distribu-
tions of eigenvalues of random matrices belonging to various ensembles; see also
[Meh91] and the more recent [Zir96]. Note, however, that the circular ensembles
COE and CSE do not correspond to random matrices drawn uniformly from the
unitary ensembles as in Proposition 4.1.6. A representation theoretic approach to
the study of the latter that also gives central limit theorems for moments is pre-
sented in [DiS94] and further developed in [DiE01]. The observation contained
in Remark 4.1.7 is motivated by the discussion in [KaS99]. For more on the root
systems mentioned in Remark 4.1.5 and their link to the Weyl integration formula,
see [Bou05, Chapter 9, Section 2].
The theory of point processes and the concept of Palm measures apply to much
more general situations than we have addressed in Section 4.2. A good treatment
of the theory is contained in [DaVJ88]. Our exposition builds on [Kal02, Chapter
11].
Point processes x0 on R whose associated difference sequences y0 (see Lemma
4.2.42) are stationary with marginals of finite mean are called cyclo-stationary. It
is a general fact, see [Kal02, Theorem 11.4], that all cyclo-stationary processes
are in one-to-one correspondence with nonzero stationary simple point processes
of finite intensity via the Palm recipe.
Determinantal point processes were studied in [Mac75], see also the survey
[Sos00]. The representation of Proposition 4.2.20, as well as the observation that
it leads to a simple proof of Corollary 4.2.21 and of the CLT of Corollary 4.2.23
(originally proved in [Sos02a]), is due to [HoKPV06]. See also [HoKPV09]. The
4.6 B IBLIOGRAPHICAL NOTES 319
Proposition 4.3.23 is due to [CaG01]. It was completed into a full large devi-
ation principle in [GuZ02] and [GZ04]. By the contraction principle (Theorem
D.7), it implies also the large deviations principle for LN (1), and in particular for
the empirical measure of eigenvalues for the sum of a Gaussian Wigner matrix XN
and a deterministic matrix AN whose empirical measure of eigenvalues converges
and satisfies (4.3.23). For AN = 0, this recovers the results of Theorem 2.6.1 in
the Gaussian case.
As pointed out in [GuZ02] (see also [Mat94]), the large deviations for the em-
pirical measure of the eigenvalues of AN + XN are closely related to the Itzykson
ZuberHarish-Chandra integral, also called spherical integral, given by
N
(2)
IN (A, D) = e 2 tr(UDU A) dm( ) (U),
N
where the integral is with respect to the Haar measure on the orthogonal group
(when = 1) and unitary group (when = 2). This integral appeared first in the
work of Harish-Chandra [Har56] who proved that when = 2,
Citing D. Voiculescu, Around 1982, I realized that the right way to look at certain
operator algebra problems was by imitating some basic probability theory. More
precisely, in noncommutative probability theory a new kind of independence can
be defined by replacing tensor products with free products and this can help un-
derstand the von Neumann algebras of free groups. The subject has evolved into a
kind of parallel to basic probability theory, which should be called free probability
theory.
Thus, Voiculescus first motivation to introduce free probability was the analy-
sis of the von Neumann algebras of free groups. One of his central observations
was that such groups can be equipped with tracial states (also called traces), which
resemble expectations in classical probability, whereas the property of freeness,
once properly stated, can be seen as a notion similar to independence in classical
probability. This led him to the statement
These two components are the basis for a probability theory for noncommuta-
tive variables where many concepts taken from probability theory such as the no-
tions of laws, convergence in law, independence, central limit theorem, Brownian
motion, entropy and more can be naturally defined. For instance, the law of one
self-adjoint variable is simply given by the traces of its powers (which generalizes
the definition through moments of compactly supported probability measures on
the real line), and the joint law of several self-adjoint noncommutative variables
is defined by the collection of traces of words in these variables. Similarly to the
classical notion of independence, freeness is defined by certain relations between
traces of words. Convergence in law just means that the trace of any word in the
noncommutative variables converges towards the right limit.
322
5.1 I NTRODUCTION AND MAIN RESULTS 323
This chapter is devoted to free probability theory and some of its consequences
for the study of random matrices.
The key relation between free probability and random matrices was discovered
by Voiculescu in 1991 when he proved that the trace of any word in independent
Wigner matrices converges toward the trace of the corresponding word in free
semicircular variables. Roughly speaking, he proved the following (see Theorem
5.4.2 for a complete statement).
Laws of free variables are defined in Definition 5.3.1. These are noncommutative
laws which are defined uniquely in terms of the laws of their variables, that is,
in terms of their one-variable marginal distributions. In Theorem 5.1.1 all the
one-variable marginals are the same, namely, the semicircle law. The statement
of Theorem 5.1.1 extends to Hermitian or real symmetric Wigner matrices whose
entries have finite moments, see Theorem 5.4.2. Another extension deals with
words that include also deterministic matrices whose law converges, as in the
following.
1
tr Q1 (DN )P1 (XN )Q2 (DN ) P (XN )
N
324 5. F REE PROBABILITY
(See Theorem 5.4.5 for the full statement and the proof.)
Theorems 5.1.1 and 5.1.2 are extremely useful in the study of random matrices.
Indeed, many classical models of random matrices can be written as some polyno-
mials in Wigner matrices and deterministic matrices. This is the case for Wishart
matrices or, more generally, for band matrices (see Exercises 5.4.14 and 5.4.16).
The law of free variables appears also when one considers random matrices fol-
lowing Haar measure on the unitary group. The following summarizes Theorem
5.4.10.
Theorem 5.1.3 Take DN = {DNi }1ip as in Theorem 5.1.2. Let UN = {UiN }1ip
be a collection of independent Haar-distributed unitary matrices independent
from {DNi }1ip , and set (UN ) = {(UiN ) }1ip . Then, for any positive integer
and any polynomial functions (Qi , Pi )1i ,
1
lim tr Q1 (DN )P1 (UN , (UN ) )Q2 (DN ) P (UN , (UN ) )
N N
= (Q1 (D)P1 (U, U )Q2 (D) P (U, U )) a.s.,
where is the law of p free variables U = (U1 , . . . ,Up ), free from the noncommu-
tative variables D of law . The law of Ui , 1 i p, is such that
((UiUi 1)2 ) = 0, (Uin ) = ((Ui )n ) = 1n=0 .
Thus, free probability appears as the natural setting to study the asymptotics of
traces of words in several (possibly random) matrices.
Adopting the point of view that traces of words in several matrices are funda-
mental objects is fruitful because it leads to the study of some general structure
such as freeness (see Section 5.3); freeness in turns simplifies the analysis of con-
vergence of moments. The drawback is that one needs to consider more general
objects than empirical measures of eigenvalues converging towards a probabil-
ity measure, namely, traces of noncommutative polynomials in random matrices
converging towards a linear functional on such polynomials, called a tracial state.
Analysis of such objects is then achieved using free probability tools.
In the first part of this chapter, Section 5.2, we introduce the setup of free prob-
ability theory (the few required notions from the theory of operator algebras are
5.2 N ONCOMMUTATIVE LAWS AND PROBABILITY SPACES 325
contained in Appendix G). We then define in Section 5.3 the property of freeness
and discuss free cumulants and free convolutions. In Section 5.4, which can be
read independently of the previous ones except for the description of the limit-
ing quantities in terms of free variables, we show that the asymptotics of many
classical models of random matrices satisfy the freeness property, and use that
observation to evaluate limiting laws. Finally, Section 5.5 uses free probability
tools to describe the behavior of spectral norms of noncommutative polynomials
in independent random matrices taken from the GUE.
Example 5.2.2
(i) Classical probability theory Let (X, B, ) be a probability space and set
A = L (X, B, ). Take to be the expectation (a) = X a(x) (dx).
Note that, for any p < , the spaces L p (X, B, ) are not algebras for the
0
usual product. (But the intersection 1p< L p (X, B, ) is again an alge-
bra.) To consider unbounded variables, we will introduce later the notion
of affiliated operators, see Subsection 5.2.3.
(ii) Discrete groups Let G be a discrete group with identity e and let A =
C(G) denote the group algebra (see Definition G.1). Take to be the
linear functional on A so that, for all g G, (g) = 1g=e .
326 5. F REE PROBABILITY
(iii) Matrices Let N be a positive integer and A = MatN (C). Let , denote
the scalar product on CN and fix v CN such that v, v = 1. We can take
on A to be given by v (a) = av, v, or by N (a) = N 1 tr(a).
(iv) Random matrices Let (X, B, ) be a probability space. Define A =
L (X, , MatN (C)), the space of NN-dimensional complex random ma-
trices with -almost surely uniformly bounded entries. Set
1 1 N
N (a) = tr(a(x)) (dx) = a(x)ei , ei (dx) , (5.2.1)
N X N i=1
where here the ei are the standard basis vectors in CN . Alternatively, one
can consider, with v CN so that v, v = 1,
v (a) = a(x)v, v (dx) . (5.2.2)
X
(v) Bounded operators on a Hilbert space Let H be a Hilbert space with inner
product , and B(H) be the set of bounded linear operators on H. We
set for v H so that v, v = 1 and a B(H),
v (a) = av, v.
The GNS construction discussed below will show that this example is in a
certain sense universal. It is therefore a particularly important example to
keep in mind.
and so a is (the sequence of moments of) the law of a under (or equiv-
alently the push-forward a# of by a).
(ii) Discrete groups Let G be a group with identity e and take (g) = 1g=e . Fix
{gi }1in Gn . The law = {gi }1in has then the following description:
for any monomial P = Xi1 Xi2 Xik , we have (P) = 1 if gi1 gik = e and
(P) = 0 otherwise.
(iii) One matrix Let a be an NN Hermitian matrix with eigenvalues
(1 , . . . , N ). Then we have, for all polynomials P C[X],
1 1 N
a (P) = tr(P(a)) = P(i ).
N N i=1
= LN , P . (5.2.3)
their quenched empirical distribution {ai (x)}iJ for almost all x, or their
annealed empirical distribution {ai (x)}iJ d (x).
(vi) Bounded operators on a Hilbert space Let H be a Hilbert space and T a
bounded normal linear operator on H with spectrum (T ) (see Appendix
G, and in particular Section G.1, for definitions). According to the spec-
tral theorem, Theorem G.6, if is the spectral resolution of T , for any
polynomial function P C[X],
P(T ) = P( )d ( ).
(T )
We first recall C -algebras, see Appendix G.1 for detailed definitions. We will re-
strict our discussion throughout to unital C -algebras (and C -subalgebras) with-
out further mentioning it. Thus, in the following, a C -algebra A is a unital
algebra equipped with a norm and an involution so that
(A )i j = A ji , 1 i, j N
Part (iv) of Example 5.2.6 is, in a sense, generic: any C -algebra A is isomorphic
to a sub C -algebra of B(H) for some Hilbert space H (see e.g. [Rud91, Theorem
12.41]). We provide below a concrete example.
always exists, does not depend on the sequence of approximations, and yields an
5.2 N ONCOMMUTATIVE LAWS AND PROBABILITY SPACES 331
To begin discussing probability, we need two more concepts: the first is posi-
tivity and the second is that of a state.
{a A : a 0} = {aa : a A } . (5.2.4)
C -probability spaces
We show next how all cases in Example 5.2.2 can be made to fit the definition
of C -probability space.
cg vg , c g vg = cg c g ,
gG gG gG
on A is a tracial state. There are many other states on A ; for any vector
v CN with ||v|| = 1,
v (a) = a(x)v, vd (x)
is a state.
5.2 N ONCOMMUTATIVE LAWS AND PROBABILITY SPACES 333
We now consider the set of laws of variables {ai }iJ defined on a C -probability
space.
(By Lemma G.11, a state automatically satisfies 1, that is, | (x)| x
for any x A .) Note that by either Lemma G.11 or (5.2.4), equation (5.2.6) is
equivalent to
(bb ) 0 b A , (1) = 1 . (5.2.7)
Proposition 5.2.14 Let J be a set of positive integers. Fix a constant 0 < R < .
Let the involution on CXi |i J be as in (5.2.8). Then there exists a C -algebra
A = A (R, J) and a family {ai }iJ of self-adjoint elements of it with the following
properties.
(a) supiJ ai R.
(b) A is generated by {ai }iJ as a C -algebra.
(c) For any C -algebra B and family of self-adjoint elements {bi }iJ of it
satisfying supiJ bi R, we have P({ai }iJ ) P({bi }iJ ) for all
polynomials P CXi |i J.
(d) A linear functional CXi |i J is the law of {ai }iJ under some state
MA if and only if (1) = 1,
| (Xi1 Xik )| Rk (5.2.9)
for all words Xi1 , . . . , Xik , and (P P) 0 for all P CXi |i J.
334 5. F REE PROBABILITY
(e) Under the equivalent conditions stated in point (d), the state is unique,
and furthermore is tracial if (PQ) = (QP) for all P, Q CXi |i J.
Points (a), (b) and (c) of Proposition 5.2.14 imply that, for any C -algebra B and
{bi }iJ as in point (c), there exists a unique continuous algebra homomorphism
A B commuting with sending ai to bi for i J. In this sense, A is the
universal example of a C -algebra equipped with an R-bounded J-indexed family
of self-adjoint elements.
Proof To abbreviate notation, we write
A = CXi |i J.
First we construct A and {ai }iJ to fulfill the first three points of the proposition
by completing A in a certain way. For P = P({Xi }iJ ) A, put
where B ranges over all C -algebras and {bi }iJ ranges over all families of self-
adjoint elements of B such that supiJ bi R. Put
L = {P A : PR,J,C = 0}.
where deg denotes the length of the word . One checks that PR,J is a norm
on A and further, from assumption (5.2.9),
By the continuity of with respect to R,J , see (5.2.11), and Lemma G.22, we
have that P P PR,J . In particular, Xi R for all i J.
1/2
Using again the quotient and completion process which we used to construct
A , but this time using the seminorm , we obtain a C -algebra B and self-
adjoint elements {bi }iJ satisfying supiJ bi R and P = P({bi }iJ ) for
P A. But then by point (c) we have P PR,J,C for P A, and thus
| (P)| PR,J,C . Let be the unique continuous linear functional on A such
336 5. F REE PROBABILITY
Weak*-topology
Recall that we endowed the set of noncommutative laws with its weak*-topology,
see Definition 5.2.5.
We remark that a different proof of part (i) of Corollary 5.2.16 can be given
directly by using part (d) of Proposition 5.2.14. A different proof of part (ii) is
sketched in Exercise 5.2.20.
then it follows from Corollary 5.2.16 that there exists a state on the uni-
versal C -algebra A (R, J) and elements {ai }iJ A (R, J) so that
{MiN ( )}iJ converges in expectation to {ai }iJ , i.e.
The space MA possesses a nice topological property that we state next. The
main part of the proof (which we omit) uses the BanachAlaoglu Theorem, The-
orem B.8.
Exercise 5.2.20 In the setting of Corollary 5.2.16, show, without using part (d) of
Proposition 5.2.14, that under the assumptions of part (ii) of the corollary, there
exists a sequence of states N on A (R + 1, J) so that N (P) converges to (P)
for all P CXi |i J. Conclude that is a state on A (R + 1, J).
Hint: set fR (x) = x (R + 1) ((R + 1)), and define aN,R
i = fR (aNi ). Using the
N,R
CauchySchwarz inequality, show that N (P({ai }iJ )) converges to (P) for
all P CXi |i J. Conclude by applying part (i) of the corollary.
5.2 N ONCOMMUTATIVE LAWS AND PROBABILITY SPACES 339
g(b)x, y = g(z)d x,y
b
(z) . (5.2.14)
In general, g(b) may not belong to the C -algebra generated by b; it will, however,
belong to a larger algebra that we now define.
(Weak operator topology closure means that b b on a net if, for any fixed
x, y H, b x, y converges to bx, y. Recall, see Theorem G.14, that in Definition
5.2.21, the requirement of closure with respect to the weak operator topology is
equivalent to closure with respect to the strong operator topology, i.e., with the
previous notation, to b x converging to bx in H.)
Example 5.2.23
(i) We have seen in Remark 5.2.8 that the C -algebra Ab generated by a
self-adjoint bounded operator b on a separable Hilbert space H is exactly
{ f (b), f C(sp(b))}. It turns out that the von Neumann algebra generated
340 5. F REE PROBABILITY
Proposition 5.2.14, L is a left ideal. It is closed due to the continuity of the map
f ( f f ). Consider the quotient space A := A \ L . Denote by : a a
the map from A into A . Note that, by (G.6), (x y) depends only on x , y , and
put
1
x , y = (x y), x := x , x 2 ,
which defines a pre-Hilbert structure on A . Let H be the (separable) Hilbert
space obtained by completing A with respect to the Hilbert norm .
To construct the morphism , we consider A as acting on A by left multipli-
cation and define, for a A and b A ,
(a)b := ab A .
By (G.7),
|| (a)b ||2 = ||ab ||2 = (b a ab) ||a||2 (b b) = ||a||2 ||b ||2 ,
and therefore (a) extends uniquely to an element of B(H), still denoted (a),
with operator norm bounded by a. is a -homomorphism from A into B(H),
that is, (ab) = (a) (b) and (a) = (a ). To complete the construction, we
take 1 as the image under of the unit in A .
We now verify the conclusions (a)(c) of the theorem. Part (a) holds since H
was constructed as the closure of { (a)1 : a A }. To see (b), observe that
for all a A , 1 , (a)1 = 1 , a = (a). Finally, since is a morphism,
(P({ai }iJ )) = P({ (ai )}iJ ), which together with part (b), shows part (c).
To verify part (d), note that part (b) implies that for a, b A ,
(ab) = ( (ab)) = ( (a) (b))
and thus, if is tracial, one gets ( (a) (b)) = ( (b) (a)). The conclusion
follows by a density argument, using the Kaplansky density theorem, Theorem
G.15, to first reduce attention to self-adjoint operators and their approximation by
a net, belonging to (A ), of self-adjoint operators.
Corollary 5.2.25 In the setup of Theorem 5.2.24, there exists a separable Hilbert
space H, a norm-preserving -homomorphism : A B(H) and a unit vector
H such that for all a A , (a) = (a) , .
We will see that the state of Theorem 5.2.24 satisfies additional properties
that we now define. These properties will play an important role in our treatment
of unbounded operators in subsection 5.2.3.
inf (a ) = 0 .
Proof We keep the same notation as in the proof of Theorem 5.2.24. We begin by
showing that is faithful on W ({ai }iJ ) B(H). Take x W ({ai }iJ ) so that
(x x) = 0. Then we claim that
Indeed, we have
where we used in the last equality the fact that is tracial on W ({ai }iJ ). Be-
cause is a morphism we have (a) (a ) = (aa ), and because the operator
norm of (aa ) B(H) is bounded by the norm aa in A , we obtain from the
last display
completing the proof of (5.2.15). Since (a)1 is dense in H by part (a) of The-
orem 5.2.24, and x B(H), we conclude that x = 0 for all H, and therefore
x = 0, completing the proof that is faithful in W ({ai }iJ ). By using Proposi-
tion G.21 with x the projection onto the linear vector space generated by 1 , we
see that is normal.
(Here, f (X) is defined by the spectral theorem, Theorem G.8, see Section G.2 for
details.)
It follows from the definition that a self-adjoint operator X is affiliated with A
iff (1 + zX)1 X A for one (or equivalently all) z C\R. (Equivalently, iff all
the spectral projections of X belong to A .) By the double commutant theorem,
Theorem G.13, this is also equivalent to saying that, for any unitary operator u in
the commutant of A , uXu = X.
The proof of Proposition 5.2.32 is based on the two following auxiliary lemmas.
Note that part of the statement is that Q(T1 , . . . , Tk )p A . In the proof of Proposi-
tion 5.2.32, we use Lemma 5.2.33 with projections pi = pni := Ti ([n, n]) on the
domain of the Ti that ensure that (T1 , . . . , Tk ) belong to A . Since such projections
can be chosen with traces arbitrarily close to 1, Lemma 5.2.33 will allow us to
define the law of polynomials in affiliated operators by density, as a consequence
of the following lemma.
dominates the Levy distance on M1 (R) defined in Theorem C.8. Lemma 5.2.34
shows that, with X,Y, p, as in the statement, dKS (X , Y ) .
Proof of Lemma 5.2.33 The key to the proof is to show that if Z ABand p is a
projection, then there exists a projection q such that
S2 p = S2 q p = p1 S2 q p = p1 S2 p = p1 S2 p2 p . (5.2.17)
Therefore
S1 S2 p = S1 S2 p , (5.2.18)
where (5.2.17) was used in the last equality. Note that part of the equality is that
the image of S2 p is in the domain of S1 and so S1 S2 p A . Moreover, (p)
1 4 max (1 pi ) by Property G.18. We proceed by induction. We first detail
the next step involving the product S1 S2 S3 . Set S = S2 S3 and let p be the projection
as in (5.2.18), so that Sp = S2 S3 p A . Repeat the previous step now with S and
S1 , yielding a projection q so that S1 S2 S3 pq = S1 S2 S3 pq. Proceeding by induction,
we can thus find a projection p so that S1 Sn p = S1 Sn p with Si = Si pi and
(p) 1 2n max (1 pi ). Similarly, (S1 + + Sn )q = (S1 + + Sn )q if
q = p1 p2 pn . Iterating these two results, for any given polynomial Q, we
find a finite constant m(Q) such that for any Ti = Ti pi with (pi ) 1 , 1 i k,
there exists p so that Q(T1 , . . . , Tk )p = Q(T1 , . . . , Tk )p and (p) 1 m(Q) .
To complete the argument by proving (5.2.16), we write the polar decompo-
sition (1 p)Z = uT (see G.9), with a self-adjoint nonnegative operator T =
|(1 p)Z| and u a partial isometry such that u vanishes on the ortho-complement
of the range of T . Set q = 1 u u. Noting that uu 1 p, we have (q) (p).
Also, qT = (1 u u)T = 0 implies that T q = 0 since T and q are self-adjoint, and
therefore (1 p)Zq = 0.
Proof of Lemma 5.2.34 We first claim that, given an unbounded self-adjoint op-
erator T affiliated to A and a real number x, we have
FT (x) = sup{ (q) : q = q2 = q A , qT q A , qT q xq}. (5.2.19)
More precisely, we now prove that the supremum is achieved for c with the
projections qT,c (x) = T ((c, x]) provided by the spectral theorem. At any rate, it is
clear that FT (x) = (T ((, x])) is a lower bound for the right side of (5.2.19).
To show that FT (x) is also an upper bound, consider any projection r A such
that (r) > FT (x) with rTr bounded. Put q = T ((, x]). We have (r) > (q).
We have (r r q) = (r q q) (r) (q) > 0 using Proposition G.17.
Therefore we can find a unit vector v H such that rTrv, v > x, thus ruling out
the possibility that (r) belongs to the set of numbers on the right side of (5.2.19).
This completes the proof of the latter equality.
Consider next the quantity
FT,p (x) = sup{ (q) : q = q2 = q A , qT q A , qT q xq, q p} .
We claim that
FT (x) FT,p (x) FT (x) . (5.2.20)
The inequality on the right of (5.2.20) is obvious. We get the lower equality by
taking q = qT,c (x) p on the right side of the definition of FT,p (x) with c large and
using Proposition G.17 again. Thus, (5.2.20) is proved.
To complete the proof of Lemma 5.2.34, simply note that FX,p (x) = FY,p (x) by
hypothesis, and apply (5.2.20).
Proof of Proposition 5.2.32 Put Tin := Ti pni with pni = Ti ([n, n]). Define the
multiplication operator MQ := MQ(T1 ,...,Tk ) as in Theorem 5.2.31. By Lemma
5.2.33, we can find a projection pn such that
X n := pn Q(T1n , . . . , Tkn )pn = pn Q(T1 , . . . , Tk )pn = pn MQ pn
and (pn ) 1 m(Q) maxi (1 Ti ([n, n])). By Lemma 5.2.34,
dKS (MQ , Q(T1n ,...,Tkn ) ) m(Q) max (1 Ti ([n, n])) ,
i
implying the convergence of the law of Q(T1n , . . . , Tkn ) to the law of MQ . Since also
by construction pni Ti pni = wn (Ti ) with wn (x) = x1|x|n , we see that we can replace
now wn by any other local approximation un of the identity since the difference
X n pn Q(un (T1 ), . . . , un (Tk ))pn
is uniformly bounded by c sup|x|n |wn un |(x) for some finite constant c =
c(n, sup|x|n |wn (x)|, Q) and therefore goes to zero when un (x) approaches the
identity map on [n, n].
What makes free probability special is the notion of freeness that we define in
Section 5.3.1. It is the noncommutative analog of independence in probability.
In some sense, probability theory distinguishes itself from integration theory by
the notions of independence and of random variables which are the basis to treat
problems from a different perspective. Similarly, free probability differentiates
from noncommutative probability by this very notion of freeness which makes it
a noncommutative analog of classical probability.
(a1 an ) = 0 .
Let r, (mk )1kr be positive integers. The sets (X1,p , . . . , Xm p ,p )1pr of noncom-
mutative random variables are called free if the algebras they generate are free.
Remark 5.3.2
(i) Independence and free independence are quite different. Indeed, let X,Y
be two self-adjoint elements of a noncommutative probability space (A , )
such that (X) = (Y ) = 0 but (X 2 ) = 0 and (Y 2 ) = 0. If X,Y commute
and are independent,
(XY ) = 0, (XY XY ) = (X 2 ) (Y 2 ) = 0,
whereas if X,Y are free, then (XY ) = 0 but (XY XY ) = 0.
(ii) The interest in free independence is that if the subalgebras Ai are freely
independent, the restrictions of to the Ai are sufficient in order to com-
pute on the subalgebra generated by all Ai . To see that, note that it is
enough to compute (a1 a2 an ) for ai Ak(i) and k(i) = k(i + 1). But,
from the freeness condition,
The proof of some basic properties of free independence that are inherited by
subalgebras is left to Exercise 5.3.8.
The following are standard examples of free variables.
Example 5.3.3
(i) Free products of groups (Continuation of Example 5.2.2, part (ii)) Sup-
pose G is a group which is the free product of its subgroups Gi , that is,
every element in G can be written as the product of elements in the Gi and
350 5. F REE PROBABILITY
Note that even though i i is typically not the identity, it does hold true that
i j = i j I with I the identity in B(T ). Due to that, the algebra generated
by (i , i , I) is generated by the terms qi (i ) p , p + q > 0, and I. Note also
5.3 F REE INDEPENDENCE 351
that
(qi (i ) p ) = (i ) p 1, (i )q 1 = 0 ,
since at least one of p, q is nonzero. Thus, we need only to prove that if
pk + qk > 0, ik = ik+1 ,
Z := qi11 (i1 ) p1 qi22 (i2 ) p2 qinn (in ) pn = 0 .
Let H be defined similarly but without the restriction j1 = j. Note that all the
Hilbert spaces H ( j) are closed subspaces of H . We equip B(H ) with the state
= (a a , ), and hereafter regard it as a noncommutative probability space.
352 5. F REE PROBABILITY
j ,
h j h j ,
j (h j1 h j2 h jn ) h j1 h j2 h jn ,
h j (h j1 h j2 h jn ) h j h j1 h j2 h jn .
j (T ) = V j (T IH ( j) ) V j
j (T )(h j1 h jm ) = j (T )h j1 h jm + (T j ) h j1 h jm . (5.3.4)
We have nearly reached our goal. The key point is the following.
The lemma granted, we can quickly conclude the construction of the free product
(A , ), as follows. We take A to be the C -subalgebra of B(H ) generated by
the images j (A j ), to be the restriction of to A , and i j to be the restriction of
j to A j . It is immediate that the images i j (A j ) are free in (A , ).
Proof of Lemma 5.3.4 Fix j1 = j2 = = jm and operators Tk B(H jk ) for
k = 1, . . . , m. Note that by definition ( jk (Tk )) = Tk jk , jk . Put Tk = Tk
Tk jk , jk I jk , where I jk denotes the identity mapping of H jk to itself, noting that
( jk (Tk )) = 0. By iterated application of (5.3.4) we have
( j1 (T1 ) jm (Tm )) = 0.
Thus the C -subalgebras j (B(H j )) are indeed free in B(H ) with respect to the
state .
Remark 5.3.5 In point (i) of Example 5.3.3 the underlying Hilbert space equipped
5.3 F REE INDEPENDENCE 353
with unit vector is the free product of the pairs (2 (Gi ), veGi ), while in point (ii) it
F
is the free product of the pairs ( n
n=0 Cei , 1).
Exercise 5.3.7 In the setting of part (ii) of Example 5.3.3, show that, for all n N,
2
1
[(1 + 1 )n ] = xn 4 x2 dx.
2
Hint: Expand the left side and show that ( p1 p2 pn ), with pi = 1 or , van-
ishes unless ni=1 1 pi =1 = ni=1 1 pi = . Deduce that the left side vanishes when n
is odd. Show that when n is even, the only indices (p1 , . . . , pn ) contributing to
the expansion are those for which the path (Xi = Xi1 + 1 pi =1 1 pi = )1in , with
X0 = 0, is a Dyck path. Conclude by using Section 2.1.3.
Exercise 5.3.8 (i) Show that freely independent algebras can be piled up, as
follows. Let {Ai }iI be a family of freely independent subalgebras of A . Partition
I into subsets {I j } jJ and denote by B j the subalgebra generated by the family
{Ai }iI j . Show that the family {B j } jJ is freely independent. (ii) Show that
freeness is preserved under (strong or weak) closures, as follows. Suppose that
(A , ) is a C - or W -probability space. Let {Ai }iI be a family consisting of
unital subalgebras closed under the involution, and for each index i I let AC i
be the strong or weak closure of Ai . Show that the family {AC i }iI is still freely
independent.
354 5. F REE PROBABILITY
Whereas classical cumulants are related to moments via a sum on the whole set
of partitions, free cumulants are defined with the help of non-crossing partitions
(recall Definition 2.1.4). A pictorial description of non-crossing versus crossing
partitions was given in Figure 2.1.1.
Before turning to the definition of free cumulants, we need to review key prop-
erties of non-crossing partitions. It is convenient to define, for any finite nonempty
set J of positive integers, the set NC(J) to be the family of non-crossing partitions
of J. This makes sense because the non-crossing property of a partition is well de-
fined in the presence of a total ordering. Also, we define an interval in J to be any
nonempty subset consisting of consecutive elements of J. Given , NC(J) we
say that refines if every block of is contained in some block of , and in
this case we write . Equipped with this partial order, NC(J) is a poset, that
is, a partially ordered set. For J = {1, . . . , n}, we simply write NC(n) = NC(J).
The unique maximal element of NC(n), namely {{1, . . . , n}}, we denote by 1n .
Property 5.3.9 For any finite nonempty family {i }iJ of elements of NC(n) there
exists a greatest lower bound iJ i NC(n) and a least upper bound iJ i
NC(n) with respect to the refinement partial ordering.
We remark that greatest lower bounds and least upper bounds in a poset are auto-
matically unique. Below, we write i{1,2} i = 1 2 and i{1,2} i = 1 2 .
Proof It is enough to prove existence of the greatest lower bound iJ i , for then
iJ i can be obtained as kK k , where {k }kK is the family of elements of
NC(n) coarser than i for all i J. (The family {k } is nonempty since 1n belongs
to it.) It is clear that in the refinement-ordered family of all partitions of {1, . . . , n}
there exists a greatest lower bound for the family {i }iJ . Finally, it is routine
to check that is in fact non-crossing, and hence = iJ i .
Remark 5.3.10 As noted in the proof above, for , NC(n), the greatest lower
bound of and in the poset NC(n) coincides with the greatest lower bound in
5.3 F REE INDEPENDENCE 355
the poset of all partitions of {1, . . . , n}. But the analogous statement about least
upper bounds is false in general.
The proof is straightforward and so omitted. But this property bears emphasis
because it is crucial for defining free cumulants.
Proof We define ({ai }iJ ) C for finite nonempty sets J of positive integers,
families {ai }iJ of elements of A and NC(J) in two stages: first we write
J = {i1 < < im } and define iJ ai = ai1 aim ; then we define ({ai }iJ ) =
V (iV ai ). If the defining relations (5.3.5) hold, then, more generally, we
must have
(a1 , . . . , an ) = k (a1 , . . . , an ) (5.3.6)
NC(n)
for all n, (a1 , . . . , an ) A n and NC(n), by Property 5.3.11. Since every partial
ordering of a finite set can be extended to a linear ordering, the system of linear
equations (5.3.6), for fixed n and (a1 , . . . , an ) A n , has (in effect) a square tri-
angular coefficient matrix with 1s on the diagonal, and hence a unique solution.
Thus, the free cumulants are indeed well defined.
Before beginning the proof of the theorem, we prove a result which explains
why the description of freeness by cumulants does not require any centering of
the variables.
kn (a1 , . . . , an ) = 0.
where by our induction hypothesis all the partitions contributing to the above
sum must be such that {i} is a block. But then, by the induction hypothesis,
where the second equality is due to the definition of cumulants and the third to
(5.3.8). As a consequence, because (a1 ai1 ai+1 an ) = (a1 an ), we
have proved that kn (a1 , . . . , an ) = 0.
= k (a1 , . . . , an ) , (5.3.10)
NC(n)
has no singleton blocks
where the second equality is due to Proposition 5.3.16 and the vanishing k1 (ai
(ai )) = 0. To finish the proof of (5.3.9) it is enough to prove that the last sum
reduces to kn (a1 , . . . , an ). If n = 2 this is clear; otherwise, for n > 2, this holds by
induction on n, using Property 5.3.12.
The next lemma provides the inductive step needed to finish the proof of Theo-
rem 5.3.15.
f : {1, . . . , n} {1, . . . , n 1}
be the unique onto monotone increasing function such that f (i) = f (i + 1). Let
NC(n) be the partition whose blocks are of the form f 1 (V ) with V a block
of . Summing the left side of (5.3.11) on we get (a1 , . . . , ai ai+1 , . . . , an )
by (5.3.6). Now summing the right side of (5.3.11) on is the same thing as
replacing the sum already there by a sum over NC(n) such that . Thus,
summing the right side of (5.3.11) over , we get (a1 , . . . , an ) by another
application of (5.3.6). But clearly
In the present case the first of the terms on the right vanishes by induction on n.
Now each NC(n) contributing on the right is of the form = {Vi ,Vi+1 } where
i Vi and i + 1 Vi+1 . Since the function i j(i) cannot be constant both on
Vi and on Vi+1 lest it be constant, it follows that every term in the sum on the far
right vanishes by induction on n. We conclude that kn (a1 , . . . , an ) = 0. The proof
of Theorem 5.3.15 is complete.
We postpone giving a direct link between free independence and random matrices
in order to first exhibit some consequence of free independence, often described as
free harmonic analysis. We will consider two self-adjoint noncommutative vari-
ables a and b. Our goal is to determine the law of a + b or of ab when a, b are free.
Since the law of (a, b) with a, b free is uniquely determined by the laws a of a and
b of b (see part (ii) of Remark 5.3.2), the law of their sum (respectively, product)
is a function of a and b denoted by a b (respectively, a b ). There are
several approaches to these questions; we will detail first a purely combinatorial
approach based on free cumulants and then mention an algebraic approach based
on the Fock space representations (see part (ii) of Example 5.3.3). These two
approaches concern the case where the probability measures a , b have compact
support (that is, a and b are bounded). We will generalize the results to unbounded
variables in Section 5.3.5.
360 5. F REE PROBABILITY
kn (a + b) = kn (a) + kn (b).
kn (a + b) = kn (1 a + (1 1 )b, . . . , n a + (1 n )b)
i =0,1
= kn (a) + kn (b) ,
R a b = R a + R b ,
Let K (z) be the formal inverse of G , i.e. G (K (z)) = z. The formal power
series expansion of K is
1
K (z) = + Cn zn1 .
z n=1
Proof Consider the generating function of the cumulants as the formal power
series
Ca (z) = 1 + kn (a)zn
n=1
and the generating function of the moments as the formal power series
Ma (z) = 1 + mn (a)zn
n=1
then gives Ca (Ga (z)) = zGa (z) and so, by composition with Ka ,
This equality proves that kn = Cn for n 1. To derive (5.3.14), we will first show
that
n
mn (a) = ks (a)mi1 (a) mis (a) . (5.3.15)
s=1 i1 ,...,is {0,1,...,ns}
i1 ++is =ns
362 5. F REE PROBABILITY
mn (a) = k (a) .
NC(n)
k = ks k1 ks .
where we used again the relation (5.3.5) between cumulants and moments. The
proof of (5.3.15), and hence of the lemma, is thus complete.
a1 + a2 = (1 + 2 ) + j,1 1j + j,2 2j (5.3.16)
j=0 j=0
and
a3 = 1 + j,1 1j + j,2 1j (5.3.17)
j=0 j=0
possess the same distribution in the noncommutative probability space (T , 1, 1).
In the above lemma, infinite sums are formal. The law of the associated operators
is still well defined since the (ij ) jM will not contribute to moments of order
smaller than M; thus, any finite family of moments is well defined.
Proof We need to show that the traces ak3 1, 1 and (a1 + a2 )k 1, 1 are equal
for all positive integers k. Comparing (5.3.16) and (5.3.17), there is a bijection
between each term in the sum defining (a1 + a2 ) and the sum defining a3 , which
extends to the expansions of ak3 and (a1 + a2 )k . We thus only need to compare the
vacuum expectations of individual terms; for ak3 1, 1 they are of the form Z :=
w1 1 w1 2 w1 n 1, 1 where wi {, 1}, whereas the expansion of (a1 + a2 )k 1, 1
yields similar terms except that 1 has to be replaced by 1 + 2 and some of the
11 by 12 . Note, however, that Z = 0 if and only if the sequence w1 , w2 , . . . , wn is
a Dyck path, i.e. the walk defined by it forms a positive excursion that returns
to 0 at time n (replacing the symbol by 1). But, since (1 + 2 )i = 1 = i i
for i = 1, 2, the value of Z is unchanged under the rules described above, which
completes the proof.
To deduce another proof of Lemma 5.3.21 from Lemma 5.3.25, we next show
that the cumulants of the distribution of an operator of the form
a = + j j ,
j0
364 5. F REE PROBABILITY
for some creation operator on T , are given by ki = i+1 . To prove this point,
we compute the moments of a. By definition,
n
an 1, 1 = + j j 1, 1
j0
where for j = 1 we wrote for j and set 1 = 1, and further observed that
mixed moments vanish if some i(l) n. Recall now that i(1) i(n) 1, 1 van-
ishes except if the path (i(1), . . . , i(n)) forms a positive excursion that returns to
the origin at time n, that is,
(Such a path is not in general a Dyck path since the (i(p), 1 p n) may take
any values in {1, 0, . . . , n 1}.) We thus have proved that
Define next a bijection between the set of integers (i(1), . . . , i(n)) satisfying
(5.3.18) and non-crossing partitions = {V1 , . . . ,Vr } by i(m) = |Vi | 1 if m is
the first element of the block Vi , and i(m) = 1 otherwise. To see it is a bijection,
being given a partition, the numbers (i(1), . . . , i(n)) satisfy (5.3.18). Reciprocally,
being given the numbers (i(1), . . . , i(n)), we have a unique non-crossing partition
= (V1 , . . . ,Vk ) satisfying |Vi | = i(m) + 1 with m the first point of Vi . It is drawn
inductively by removing block intervals which are sequences of indices such that
{i(m) = p, i(m + k) = 1, 1 k p} (including p = 0 in which case an interval is
{i(m) = 0}). Such a block must exist by the second assumption in (5.3.18). Fixing
such intervals as blocks of the partition, we can remove the corresponding indices
and search for intervals in the corresponding subset S of {i(k), 1 k n}. The
indices in S also satisfy (5.3.18), so that we can continue the construction until no
indices are left.
This bijection allows us to replace the summation over the i(k) in (5.3.19) by
summation over non-crossing partitions to obtain
Thus, by the definition (5.3.5) of the cumulants, we deduce that, for all i 0,
i1 = ki , with ki the ith cumulant. Therefore, Lemma 5.3.25 is equivalent to the
5.3 F REE INDEPENDENCE 365
additivity of the free cumulants of Lemma 5.3.21 and the rest of the analysis is
similar.
Exercise 5.3.27 (i) Let = 12 (+1 + 1 ). Show that G (z) = (z2 1)1 z and
1 + 4z2 1
R (z) =
2z
1
with the appropriate branch of the square root. Deduce that G (z) = z2 4 .
Recall
that if is the standard semicircle law d (x) = (x)dx, G (x) = 12 (z
z 4). Deduce by derivations and integration by parts that
2
1 1
(1 zG (z)) = x (x)dx.
2 zx
Conclude that is absolutely continuous with respect to Lebesgue measure
1
and with density proportional to 1|x|2 (4 x2 ) 2 .
(ii) (Free Poisson) Let > 0. Show that if one takes pn (dx) = (1 n )0 + n ,
pn
n converges to a limit p whose R-transform is given by
R(z) = .
1 z
Deduce that p is the MarcenkoPastur law given, if > 1, by
/
1
p(dx) = p(dx) = 4 2 (x ( + 1))2 dx ,
2 x
and for < 1, p = (1 )0 + p.
is, the collection of moments { ((ab)n ), n N}. Note that ab does not need to be
a self-adjoint operator. In the case where is tracial and a self-adjoint positive,
1 1
we can, however, rewrite ((ab)n ) = ((a 2 ba 2 )n ) so that the law of ab coincides
1 1
with the spectral measure of a 2 ba 2 when b is self-adjoint. However, the following
analysis of the family { ((ab)n ), n N} holds in a more general context where
these quantities might not be related to a spectral measure.
Denote by ma the generating function of the moments, that is, the formal power
series
ma (z) := (an )zn = Ma (z) 1 .
m1
We next prove that the S-transform plays the same role in free probability that the
Mellin transform does in classical probability.
See Exercise 5.3.31 for extensions of Lemma 5.3.30 to the case where either (a)
or (b) vanish.
Proof The idea is to use the structure of non-crossing partitions to relate the gen-
erating functions
where (c, d) = (a, b) or (b, a). Note first that, from Theorem 5.3.15,
= k1 (a)k2 (b) .
1 NC(1,3,...,2n1)2 NC(2,4,...,2n)
1 2 NC(2n)
Now we can do the same for (b(ab)n ) by fixing the first block V1 = (v1 , . . . , vs ) in
the partition of the bs (on the odd numbers); the corresponding first intervals are
{vk + 1, vk+1 1} for k s 1 (representing the words of the form (ab)ik a, with
ik = 21 (vk+1 vk ) 1), whereas the last interval {vs + 1, 2n + 1} corresponds to
a word of the form (ab)i0 with i0 = 21 (2n + 1 vs ). Thus we get, for n 0,
n s
(b(ab)n ) = ks+1 (b) i ++i=ns ((ab)i0 ) (a(ba)ik ) . (5.3.21)
s=0 0 s k=1
ik 0
Set ca (z) := n1 kn (a)zn . Summing (5.3.20) and (5.3.21) yields the relations
b
Mab (z) = 1 + ca (zMab (z)) ,
Mab (z)
b
Mab (z) = zs ks+1 (b)Mab (z)Mbaa (z)s = zMa (z) cb (zMbaa (z)) .
s0 ba
which yields, noting that ca , cb are invertible as formal power series since k1 (a) =
(a) = 0 and k1 (b) = (b) = 0 by assumption,
c1 1
a (Mab (z) 1)cb (Mab (z) 1) = zMab (z) (Mab (z) 1) . (5.3.22)
Exercise 5.3.31 In the case where a is a self-adjoint operator such that (a) = 0
but a = 0, define m1
a , the inverse of ma , as a formal power series in z. Define
the S-transform Sa (z) = (z1 + 1)m1 a (z) and extend Lemma 5.3.30 to the case
where (a) or (b) may vanish.
Hint: Note that (a2 ) = 0 so that ma (z) = (a2 )z2 + m3 (am )zm has formal
inverse m1 2 1
a (z) = (a ) 2 z + ( (a )/2 (a ) )z + , which is a formal power
3 2 2
series in z.
In view of the free harmonic analysis that we developed in the previous sections,
which is analogous to the classical one, it is no surprise that standard results from
classical probability can be generalized to the noncommutative setting. One of the
most important such generalizations is the free central limit theorem.
Lemma 5.3.32 Let {ai }iN be a family of free self-adjoint random variables in
a noncommutative probability space with a tracial state . Assume that, for all
k N,
sup | (akj )| < . (5.3.23)
j
1 N
XN = ai
N i=1
converges in law as N goes to infinity to a standard semicircle distribution.
5.3 F REE INDEPENDENCE 369
Proof Note that by (5.3.23) the cumulants of words in the ai are well defined and
finite. Moreover, by Lemma 5.3.21, for all p 1, we have
N N
ai 1
k p (XN ) = k p ( N ) = N 2p k p (ai ) .
k=1 k=1
lim k p (XN ) = 0 .
N
The notion of freeness was defined for bounded variables possessing all moments.
It naturally extends to general unbounded variables thanks to the notion of affili-
ated operators defined in Section 5.2.3, as follows.
Definition 5.3.33 Self-adjoint operators {Xi }1ip , affiliated with a von Neumann
algebra A , are called freely independent, or simply free, iff the algebras generated
by { f (Xi ) : f bounded measurable}1ip are free.
Proof Set Ai = B(Hi ) with Hi = L2 (i ) and construct the free product H as in the
discussion following (5.3.3), yielding a C -probability space (A , ) with a tracial
370 5. F REE PROBABILITY
state and a morphism such that the algebras ( (Ai ))1ip are free. By the
GNS construction, see Proposition 5.2.24 and Corollary 5.2.27, we can construct
a normal faithful tracial state on a von Neumann algebra B and unbounded
operators (a1 , . . . , a p ) affiliated with B, with marginal distribution (1 , . . . , p ).
They are free since since the algebras they generate are free (note that and
satisfy the relations of Definition 5.3.1 according to Remark 5.3.2).
Corollary 5.3.35 Let {Ti }1ik ABbe free self-adjoint variables with marginal
distribution {i }1ik and let Q be a self-adjoint polynomial in k noncommuting
variables. Then the law of Q({Ti }1ik ) depends only on {i }1ik and it is
continuous in these measures.
Free harmonic analysis can be extended to affiliated operators, that is, to laws
with unbounded support. We consider here the additive free convolution. We
first show that the R-transform can be defined as an analytic function, at least
for arguments with large enough imaginary part, without using the existence of
moments.
lim F (z) = 1 .
|z|,z ,
In particular, the latter shows that |F (z)| > 1/2 on , for large enough.
We can thus apply the implicit function theorem (also known in this context as
the Lagrange inversion theorem) to deduce that F is invertible, with an analytic
inverse. The other claims follow by noting that F is approximately the identity
for sufficiently large.
(z) = F1 (z) z .
Lemma 5.3.38 If is compactly supported and |z| is small enough, then R (z)
equals the absolutely convergent series n0 kn+1 (a)zn .
Note that the definition of G given in (5.3.24) is analytic (in the upper half plane),
whereas it was defined as a formal power series in (5.3.13). However, when is
compactly supported and z is large enough, the formal series (5.3.13) is absolutely
convergent and is equal to the analytic definition (5.3.24), which justifies the use
of the same notation. Similarly, Lemma 5.3.38 shows that the formal Definition
5.3.22 of R can be strengthened into an analytic definition when is compactly
supported.
372 5. F REE PROBABILITY
Proof Let be supported in [M, M] for some M < . Then observe that G
defined in (5.3.13) can be as well defined as an absolutely converging series for
|z| > M, and the resulting function is analytic in this neighborhood of infinity. R
is then defined using Lemma 5.3.36 by applying the same procedure as in Lemma
5.3.24, but on analytic functions rather than formal series.
Proof The proof is obtained by continuity from the bounded variables case. In-
deed, Lemmas 5.3.23 and 5.3.24, together with the last point of Lemma 5.3.36,
show that Corollary 5.3.39 holds when 1 and 2 are compactly supported. We
will next show that
if n converge to in the weak topology, then there exist
, > 0 such that n converges to uniformly on (5.3.25)
compacts subsets of , .
With (5.3.25) granted, put d in = i ([n, n])1 1|x|n d i , note that in converges
to i for i = 1, 2, and observe that the law 1n 2n of un (X1 ) + un (X2 ), with X1 , X2
being two free affiliated variables, converges to 1 2 by Proposition 5.2.32.
The convergence of n to on the compacts of some , for = 1 , 2
and 1 2 , together with the corollary applied to the compactly supported in ,
implying
1n 2n = 1n + 2n ,
yield the corollary for arbitrary measures i .
It remains to prove (5.3.25). Fix a probability measure and a sequence n
converging to . Then, F converges to F uniformly on compact sets of C+ (as
well as its derivatives, since the functions Fn are analytic). Since |F n (z)| > 1/2
on , for sufficiently large, |F n (z)| > 1/4 uniformly in n large enough for z in
compact subsets of , for sufficiently large. Therefore, the implicit function
theorem asserts that there exist , > 0 such that Fn has a right inverse F1 n
on
, , and thus the functions (n , n N, ) are well defined analytic functions
on , and are such that n (z) = o(z) uniformly in n as |z| goes to infinity.
Therefore, by Montels Theorem, the family {n , n N} has subsequences that
converge uniformly on compacts of , . We claim that all limit points must be
5.3 F REE INDEPENDENCE 373
The first term in the right side goes to zero as j goes to infinity by continuity of F
and the second term goes to zero by uniform convergence of Fn j on , . (Note
that n j (z) is uniformly small compared to |z| so that z + n j (z), j N, stays in
, .) Thus, z + is a right inverse of F , that is, = .
The study of free convolution via the analytic functions (or R ) is useful
in deducing properties of free convolution and of free infinitely divisible laws
(whose definition is analogous to the classical one, with free convolution replacing
classical convolution). The following lemma sheds light on the special role of the
semicircle law with respect to free convolution. For a measure M1 (R), we
define the rescaled measure # 1 M1 (R) by the relation
2
x
# 1 , f = f ( )d (x) for all bounded measurable functions f .
2 2
# 1 # 1 = , (5.3.26)
2 2
(z) = 2 (z) .
# 1
2
But
G (z) = 2G ( 2z) (z) = 2 (z/ 2) ,
# 1 # 1
2 2
and so we obtain
(z/ 2) = 2 (z) . (5.3.27)
374 5. F REE PROBABILITY
Show that for z C+ , G p (z) = 1/(z + i ) and so R p (z) = i and therefore that
for any probability measure on R, G p (z) = G (z + i ). Show by the residue
theorem that G p (z) = G (z + i ) and conclude that p = p , that is, the
free convolution by a Cauchy law is the same as the standard convolution.
Random matrices played a central role in free probability since Voiculescus sem-
inal observation that independent Gaussian Wigner matrices converge in distri-
bution as their size goes to infinity to free semicircular variables (see Theorem
5.4.2). This result can be extended to approximate any law of free variables by
taking diagonal matrices and conjugating them by independent unitary matrices
(see Corollary 5.4.11). In this section we aim at presenting these results and the
underlying combinatorics.
We first prove that independent (not necessarily Gaussian) Wigner matrices are
asymptotically free.
that (XiN (m, ), 1 m N, 1 i p) are independent, and that E[XiN (m, )] =
0 and E[|XiN (m, )|2 ] = 1.
Then the empirical distribution N := { 1 X N }1ip of { 1N XiN }1ip converges
N i
almost surely and in expectation to the law of p free semicircular variables. In
other words, the matrices { 1N XiN }1ip , viewed as elements of the noncom-
mutative probability space (MatN (C), , N1 tr) (respectively, (MatN (C), , E[ N1 tr])),
are almost surely asymptotically free (respectively, asymptotically free) and their
spectral measures almost surely converge (respectively, converge) to the semicir-
cle law.
In the course of the proof of this theorem, we shall prove the following useful
intermediate remark, which in particular holds when only one matrix is involved.
Remark 5.4.3 Under the hypotheses of Theorem 5.4.2, except that we do not
require that E[|XiN (m, l)|2 ] = 1 but only that it is bounded by 1, for all monomials
q CXi , 1 i p of degree k normalized so that q(1, 1, . . . , 1) = 1,
Proof of Theorem 5.4.2 We first prove the convergence of E[N ]. The proof
follows closely that of Lemma 2.1.6 (see also Lemma 2.2.3 in the case of complex
entries). We need to show, for any monomial q({Xi }1ip ) = Xi1 Xik CXi |1
i p, the convergence of
1
E[N (q)] = k
2 +1
Tj , (5.4.2)
N j
376 5. F REE PROBABILITY
where j = ( j1 , . . . , jk ) and
Tj := E XiN1 ( j1 , j2 )XiN2 ( j2 , j3 ) XiNk ( jk , j1 ) .
Therefore,
Tj N t ck t k C(k)N 2k
j:wtj k t k
2 2
where the set {j : wtj = 2k + 1} is empty if k is odd. This already shows that, if k
is odd,
lim E[N (q)] = 0 . (5.4.4)
N
If k is even, recall also that if wt(wj ) = 2k + 1, then Gwj is a tree (see an explana-
tion below Definition 2.1.10) and (by the cited definition) wj is a Wigner word.
This means that each (unoriented) edge of Gwj is traversed exactly once in each
direction by the walk j1 jk j1 . Hence, Tj will be a product of covariances of
5.4 L INK WITH RANDOM MATRICES 377
the entries, and therefore vanishes if these covariances involve two independent
matrices. Also, when c2 1, Tj will be bounded above by one and therefore
lim supN |E[N (q)]| is bounded above by |Wk,k/2+1 | 2k , where, as in Def-
inition 2.1.10, Wk,k/2+1 denotes a set of representatives for equivalence classes
of Wigner words of length k + 1, and (hence) |Wk,k/2+1 | is equal to the Catalan
1 k
number k/2+1 k/2 . This will prove Remark 5.4.3.
Moreover, trivially,
q Xk
|Wk,k/2+1 | |Wk,k/2+1
1
| = |Wk,k/2+1 | . (5.4.6)
Recall that Wk,k/2+1 is canonically in bijection with the set NC2 (k) of non-crossing
pair partitions of Kk = {1, . . . , k} (see Proposition 2.1.11 and its proof). Similarly,
q
for q = Xi1 Xik , the set Wk,k/2+1 is canonically in bijection with the subset of
NC2 (k) consisting of non-crossing pair partitions of Kk such that for every
block {b, b } one has ib = ib . Thus, we can also write
lim E[N (q)] =
N
1ib =ib ,
NC2 (k) (b,b )
where the product runs over all blocks {b, b } of the pair partition . Recalling
that kn (ai ) = 1n=2 for semicircular variables by Example 5.3.26 and (5.3.7), we
378 5. F REE PROBABILITY
with k = 0 if is not a pair partition and k2 (ai , a j ) = 1i= j . The right side corre-
sponds to the definition of the moments of free semicircular variables according
to Theorem 5.3.15 and Example 5.3.26. This proves the convergence of E[N ] to
the law of m free semicircular variables.
We now prove the almost sure convergence. Continuing to adapt the ideas of
the (first) proof of Theorem 2.1.1, we follow the proof of Lemma 2.1.7 closely.
(Recall that we proved in Lemma 2.1.7 that the variance of LN , xk is of or-
der N 2 . As in Exercise 2.1.16, this was enough, using Chebyshevs inequal-
ity and the BorelCantelli Lemma, to conclude the almost sure convergence in
Wigners Theorem, Theorem 2.1.1.) Here, we study the variance of N (q) for
q(X1 , . . . , Xp ) = Xi1 Xik which is given by
1
N k+2
Var(N (q)) = E[|N (q) E[N (q)]|2 ] = Tj,j (5.4.7)
j,j
with
Tj,j = E[Xi1 ( j1 , j2 ) Xik ( jk , j1 )Xik ( j1 , j2 ) Xi1 ( jk , j1 )]
E[Xi1 ( j1 , j2 ) Xik ( jk , j1 )]E[Xik ( j1 , j2 ) Xi1 ( jk , j1 )] ,
where we observed that N (q) = N (q ). We consider the sentence
wj,j = ( j1 jk j1 , j1 j2 j1 ) and its associated graph Gwj,j = (Vwj,j , Ewj,j ). As
in the proof of Lemma 2.1.7, Tj,j vanishes unless each edge in Ewj,j appears at
least twice and the graph Gwj,j is connected. This implies that the number of dis-
tinct elements in Vwj,j is not more than k + 1, and it was further shown in the proof
of Lemma 2.1.7 that the case where it is equal to k + 1 never happens. Hence,
there are at most k different vertices and so at most N k possible choices for them.
Thus, since Tj,j is uniformly bounded by 2c2k , we conclude that there exists a
finite constant c(k) such that
c(k)
Var(N (q))
.
N2
By Chebyshevs inequality we therefore find that
c(k)
P(|N (Xi1 Xik ) E[N (Xi1 Xik )]| ) .
2N2
The BorelCantelli Lemma then yields that
lim |N (Xi1 Xik ) E[N (Xi1 Xik )]| = 0 , a.s.
N
5.4 L INK WITH RANDOM MATRICES 379
We next show that Theorem 5.4.2 generalizes to the case of polynomials that
may include some deterministic matrices.
and that the law of DN in the noncommutative probability space (MatN (C), ,
N tr) converges to a noncommutative law . Then we have the following.
1
To avoid repetition, we follow a different route than that used in the proof of
Theorem 5.4.2 (even though similar arguments could be developed). We de-
note by CDi , Xi |1 i p the set of polynomials in {Di , Xi }1ip , by N (re-
spectively, N ) the quenched (respectively, annealed) empirical distribution of
1 1
{DN , N 2 XN } = {DNi , N 2 XiN }1ip given, for q CDi , Xi |1 i p, by
1 XN N
N (q) := tr q( , D ) , N (q) := E[N (q)] .
N N
To prove the convergence of {N }NN we first show that this sequence is tight
(see Lemma 5.4.6), and then show that any limit point satisfies the so-called
SchwingerDyson, or master loop, equation which has a unique solution (see
Lemma 5.4.7).
any R, d N,
sup lim sup |N (q)| Dd 2R . (5.4.9)
qCXi ,Di |1ipR,d N
i PQ = i P (1 Q) + (P 1) i Q (5.4.10)
Lemma 5.4.7 For any R, d N, the following hold under the hypotheses of Theo-
rem 5.4.5.
(i) Any limit point of {N (q), q CXi , Di |1 i pR,d }NN satisfies the
boundary and tracial conditions
where the second equality in (5.4.11) holds for all monomials P, Q such
that PQ CXi , Di |1 i pR,d . Moreover, for all i {1, . . . , m} and all
q CXi , Di |1 i mR1,d , we have
(Xi q) = (i q) . (5.4.12)
Remark 5.4.8 The system of equations (5.4.11) and (5.4.12) is often referred to
in the physics literature as the SchwingerDyson, or master loop, equation.
We next show heuristically how, when {XiN }1ip are taken from the GUE, the
SchwingerDyson equation can be derived using Gaussian integration by parts,
see Lemma 2.4.5. Toward this end, we introduce the derivative z = (z iz )/2
with respect to the complex variable z = z + iz, so that z z = 1 but z z = 0.
Using this definition for the complex variable XiN (, r) when = r (and otherwise
the usual definition for the real variable XiN (, )), note that we have
Using Lemma 2.4.5 directly, one verifies that (5.4.15) still holds for m = . (One
could just as well take (5.4.15) as the definition of X N (m,) .) Now let us consider
i
N
(5.4.15) with the special choice of f = P( X
N
, DN )( j, k), where P CXi , Di |1
i p and j, k {1, . . . , N}. Some algebra reveals that, using the notation (A
B)( j, m, , k) = A( j, m)B(, k),
X N (m,) P(XN , DN ) ( j, k) = i P(XN , DN ) ( j, m, , k) . (5.4.16)
i
Together with (5.4.15), and after summation over j = m and = k, this shows that
E [N (Xi P) N N (i P)] = 0 .
382 5. F REE PROBABILITY
XN
lim sup N max E[|q( , DN )(i, j)|k ] = 0 . (5.4.17)
N 1i jN N
(ii) There exists a finite constant C(q) such that, for all positive integers N,
C(q)
E[|N (q) N (q)|2 ] . (5.4.18)
N 2
We next give the proof of Theorem 5.4.5, with Lemmas 5.4.6, 5.4.7 and 5.4.9
granted.
Proof of Theorem 5.4.5 By Lemmas 5.4.6 and 5.4.7, {N (q), q CXi , Di |1
i pR,d } is tight and converges to the unique solution {R,d (q), q CXi , Di |1
i pR,d } of the system of equations (5.4.11) and (5.4.12). As a consequence,
R,d (q) = R ,d (q) for q CXi , Di |1 i pR ,d , R R and d d , and we can
define (q) = R,d (q) for q CXi , Di |1 i pR,d . This completes the proof of
the first point of Theorem 5.4.5 since is the law of p free semicircular variables,
free with {Di }1ip with law by part (iii) of Lemma 5.4.7.
The almost sure convergence asserted in the second part of the theorem is a
direct consequence of (5.4.18), the BorelCantelli Lemma and the previous con-
vergence in expectation.
which proves (5.4.19) for K = R, and thus completes the proof of the induction
step. Equation (5.4.9) follows.
Proof of Lemma 5.4.9 Without loss of generality, we assume in what follows that
D 1. If q is a monomial in CXi , Di |1 i pR,d , and if max (X) denotes the
spectral radius of a matrix X and ei the canonical orthonormal basis of CN ,
XN XN p XN
|q( , DN )(i, j)| = |ei , q( , DN )e j | Di=1 di max ( i )i ,
N N 1ip N
pN(N+1)/2
N := E[|N (q) N (q)|2 ] = r , (5.4.20)
r=1
with
Xr := (1 )X r + X r .
where the sum runs over all decompositions of q into q1 Xs q2 . Hence we obtain
that there exists a finite constant C(q) such that
1 XN ,r
C(q)
r
N3
q=q X q 0
E[|YsN (k, )|2 |(q2 q1 )( , DN )(, k)|2 ]d ,
N
1 s 2
(k,)=(i, j) or ( j,i)
with XN ,r the p-tuple of matrices where the (i, j) and ( j, i) entries of the matrix s
were replaced by the interpolation Xr and its conjugate and YsN (i, j) = XsN (i, j)
XsN (i, j). We interpolate again with the p-tuple XNr where the entries (i, j) and
( j, i) of the matrix s vanishes to obtain by the CauchySchwarz inequality and
5.4 L INK WITH RANDOM MATRICES 385
independence of XNr with YsN (i, j) that, for some finite constants C(q)1 , C(q)2 ,
C(q)1 XN
r
N 3
q=q1 Xs q2
E[|(q2 q1 )( r , DN )(k, )|2 ]
N
(k,)=(i, j) or ( j,i)
1
XN XN ,r 1
+ E[|(q2 q1 )( r , DN )(k, ) (q2 q1 )( , DN )(k, )|4 ] 2 d
0 N N
C(q)2 XN N
N3
q=q1 Xs q2
E[|(q2 q1 )( , D )(k, )|2 ]
N
(k,)=(i, j) or ( j,i)
XN XN
+ E[|(q2 q1 )( , DN )(k, ) (q2 q1 )( r , DN )(k, )|2 ]
N N
1
XNr XN ,r 1
+ E[|(q2 q1 )( , DN )(i, j) (q2 q1 )( , DN )(k, )|4 ] 2 d . (5.4.21)
0 N N
To control the last two terms, consider two p-tuples of matrices XN and XN
that differ only at the entries (i, j) and ( j, i) of the matrix s and put YsN (i, j) =
XsN (i, j) XsN (i, j). Let q be a monomial and 1 k, N. Then, if we set
XN = (1 )XN + XN , we have
XN XN
q(k, ) := q( , DN )(k, ) q( , DN )(k, )
N N
N 1 XN XN
Y (m, n)
= s 1 N p ( , DN
)(k, m)p2 ( , DN )(n, )d .
(m,n)=(i, j) N 0 q=p1 Xs p2 N
or ( j,i)
As a consequence, the two last terms in (5.4.21) are at most of order N 1+ and
summing (5.4.21) over r, we deduce that there exist finite constants C(q)3 , C(q)4
so that
C(q)3 p XN N
N
N 3 s=1
E[ |(q2 q1 )(
N
, D )(i, j)|2
] + N 1+
q=q1 Xs q2 1i, jN
C(q)3 p C(q)4
= 2
N s=1 q=q1 Xs q2
N (q2 q1 q1 q2 ) + 2 .
N
Proof of Lemma 5.4.7 To derive the equations satisfied by a limiting point R,d of
N , note that the first equality of (5.4.11) holds since we assumed that the law of
386 5. F REE PROBABILITY
{DNi }1ip converges to , whereas the second equality is verified by N for each
N, and therefore by all its limit points. To check that R,d also satisfies (5.4.12),
we write
N
1 XN
N (Xi q) =
N 3/2
E[XiN ( j1 , j2 )q( , DN )( j2 , j1 )] = I1 ,2 , (5.4.23)
N
j1 , j2 =1 1 ,2
1 XN XN
I0,1 =
N2 E[q1 ( , D)( j1 , j1 )q2 ( , DN )( j2 , j2 )] ,
N N
j1 , j2 q=q1 Xi q2
where XN is the p-tuple of matrices whose entries are the same as XN , except that
XiN ( j1 , j2 ) = XiN ( j2 , j1 ) = 0. By (5.4.22), we can replace the matrices XN by XN
up to an error of order N 2 for any > 0, and therefore
1
with
XN XN
I( j1 , j2 , ) := E[q1 ( )( (1), (2)) qk+1 ( )( (k + 1), (1))] ,
N N
Corollary 5.4.11 Let {DNi }1ip be a sequence of uniformly bounded real di-
agonal matrices with empirical measure of diagonal elements converging to i ,
i = 1, . . . , p respectively. Let {UiN }1ip be independent unitary matrices follow-
ing the Haar measure, independent from {DNi }1ip .
(i) The noncommutative variables {UiN DNi (UiN ) }1ip in the noncommuta-
tive probability space (MatN (C), , E[ N1 tr]) (respectively,
(MatN (C), , N1 tr)) are asymptotically free (respectively, almost surely
asymptotically free), the law of the marginals being given by the i .
(ii) The empirical measure of eigenvalues of of DN1 +UN DN2 UN converges weakly
almost surely to 1 2 as N goes to infinity.
(iii) Assume that DN1 is nonnegative. Then, the empirical measure of eigenval-
ues of
1 1
(DN1 ) 2 UN DN2 UN (DN1 ) 2
random matrices. If DN1 and DN2 are two diagonal matrices whose eigenvalues
are independent and equidistributed, the spectral measure of DN1 + DN2 converges
to a standard convolution. At the other extreme, if the eigenvectors of a matrix
AN1 are very independent from those of a matrix AN2 in the sense that the joint
distribution of the matrices can be written as the distribution of (AN1 ,U N AN2 (U N ) ),
then free convolution will describe the limit law.
Proof of Theorem 5.4.10 We denote by N := {DN ,U N ,(U N ) }1ip the joint em-
i i i
pirical distribution of {DNi ,UiN , (UiN ) }1ip , considered as an element of the al-
gebraic dual of CXi , 1 i n with n = 3p, equipped with the involution such
that ( Xi1 Xin ) = Xin Xi1 if
X3i2 = X3i2 , 1 i p, X3i1 = X3i , 1 i p .
The norm is the operator norm on matrices. We may and will assume that D 1,
and then our variables are bounded uniformly by D. Hence, N is a state on the
universal C -algebra A (D, {1, , 3n}) as defined in Proposition 5.2.14 by an
appropriate separation/completion construction of CXi , 1 i n. The sequence
{E[N ]}NN is tight for the weak*-topology according to Lemma 5.2.18. Hence,
we can take converging subsequences and consider their limit points. The strategy
of the proof will be to show, as in the proof of Theorem 5.4.5, that these limit
points satisfy a SchwingerDyson equation. Of course, this SchwingerDyson
equation will be slightly different from the equation obtained in Lemma 5.4.7 in
the context of Gaussian random matrices. However, it will again be a system
of equations defined by an appropriate noncommutative derivative, and will be
derived from the invariance by multiplication of the Haar measure, replacing the
integration by parts (5.4.15) (the latter could be derived from the invariance by
translation of the Lebesgue measure). We will also show that the Schwinger
Dyson equation has a unique solution, implying the convergence of (E[N ], N
N). We will then show that this limit is exactly the law of free variables. Finally,
concentration inequalities will allow us to extend the result to the almost sure
convergence of {N }NN .
SchwingerDyson equation We consider a limit point of {E[N ]}NN . Be-
cause we have N ((Ui (Ui ) 1)2 ) = 0 and N (PQ) = N (QP) for any P, Q
CDi ,Ui ,Ui |1 i p, almost surely, we know by taking the large N limit that
i (PQ) = i P 1 Q + P 1 i Q ,
i U j = 1 j=iU j 1, iU j = 1 j=i 1 U j ,
where we used the notation (A B)(k, r, q, l) := A(k, r)B(q, l). Taking k = r and
q = l and summing over r, q gives
E [N N (i P)] = 0 . (5.4.29)
Using Corollary 4.4.31 inductively (on the number $p$ of independent unitary matrices), we find that, for any polynomial $P\in\mathbb{C}\langle D_i, U_i, U_i^* \mid 1\le i\le p\rangle$, there exists a positive constant $c(P)$ such that, under the Haar measure on $U(N)^{\otimes p}$,
$$\mathbb{P}\left(\left|\mathrm{tr}\,P(\{D_i^N, U_i^N, (U_i^N)^*\}_{1\le i\le p}) - E\,\mathrm{tr}\,P\right| > \delta\right) \le 2e^{-c(P)\delta^2}\,,$$
and therefore
$$E\left[|\mathrm{tr}\,P - E\,\mathrm{tr}\,P|^2\right] \le \frac{2}{c(P)}\,.$$
Writing $\partial_i P = \sum_{j=1}^{M} P_j\otimes Q_j$ for an appropriate integer $M$ and polynomials $P_j, Q_j\in\mathbb{C}\langle D_i, U_i, U_i^* \mid 1\le i\le p\rangle$, we deduce by the Cauchy-Schwarz inequality that
$$\lim_{N\to\infty}\left(E[\hat\mu_N\otimes\hat\mu_N] - E[\hat\mu_N]\otimes E[\hat\mu_N]\right)(\partial_i P) = 0\,.$$
Hence, any limit point $\tau$ satisfies the Schwinger-Dyson equation
$$\tau\otimes\tau(\partial_i P) = 0\,. \qquad (5.4.30)$$
Uniqueness of the solution to (5.4.30) Since $\partial_i(QU_i) = \partial_i Q\,(1\otimes U_i) + (QU_i)\otimes 1$, applying (5.4.30) to $QU_i$ gives
$$\tau(QU_i) = -\tau\otimes\tau\big(\partial_i Q\,(1\otimes U_i)\big) = -\sum_{Q = Q_1U_iQ_2}\tau(Q_1U_i)\,\tau(Q_2U_i) + \sum_{Q = Q_1U_i^*Q_2}\tau(Q_1)\,\tau(Q_2)\,,$$
where we used the fact that $\tau(U_i^*Q_2U_i) = \tau(Q_2)$ by (5.4.28). Each term on the right side is the trace under $\tau$ of a polynomial whose degree in $U_i$ and $U_i^*$ is strictly smaller than that of $QU_i$. Hence, this relation defines $\tau$ uniquely by induction. In particular,
taking $P = U_i^n$ we get, for all $n\ge 1$,
$$\sum_{k=1}^{n}\tau(U_i^k)\,\tau(U_i^{n-k}) = 0\,,$$
from which we deduce by induction that $\tau(U_i^n) = 0$ for all $n\ge 1$ since $\tau(U_i^0) = \tau(1) = 1$. Moreover, as $\tau$ is a state, $\tau((U_i^*)^n) = \overline{\tau(((U_i^*)^n)^*)} = \overline{\tau(U_i^n)} = 0$ for $n\ge 1$.
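Spelled out for the first two values of $n$ (a check we add for the reader): for $n = 1$ the relation reads $\tau(U_i)\tau(U_i^0) = \tau(U_i) = 0$; granting this, for $n = 2$ it reads
$$\tau(U_i)\,\tau(U_i) + \tau(U_i^2)\,\tau(U_i^0) = \tau(U_i^2) = 0\,,$$
and the general inductive step is identical: the $k = n$ term isolates $\tau(U_i^n)$ while all other terms vanish by the inductive hypothesis.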
The solution is the law of free variables It is enough to show, by the previous point, that the joint law $\tau$ of the two free $p$-tuples $\{U_i, U_i^*\}_{1\le i\le p}$ and $\{D_i\}_{1\le i\le p}$ satisfies (5.4.30). So take $P = U_{i_1}^{n_1}B_1\cdots U_{i_p}^{n_p}B_p$ with some $B_k$'s in the algebra generated by $\{D_i\}_{1\le i\le p}$ and $n_i\in\mathbb{Z}\setminus\{0\}$ (where we observed that $U_i^{-1} = U_i^*$). We wish to show that, for all $i\in\{1,\ldots,p\}$,
$$\tau\otimes\tau(\partial_i P) = 0\,. \qquad (5.4.31)$$
Note that, by linearity, it is enough to prove this equality when $\tau(B_j) = 0$ for all $j$. Now, by definition, we have
$$\partial_i P = \sum_{k:\,i_k=i,\,n_k>0}\ \sum_{l=1}^{n_k} U_{i_1}^{n_1}B_1\cdots B_{k-1}U_i^{l}\ \otimes\ U_{i_k}^{n_k-l}B_k\cdots U_{i_p}^{n_p}B_p$$
$$\qquad -\ \sum_{k:\,i_k=i,\,n_k<0}\ \sum_{l=0}^{-n_k-1} U_{i_1}^{n_1}B_1\cdots B_{k-1}U_i^{-l}\ \otimes\ U_{i_k}^{n_k+l}B_k\cdots U_{i_p}^{n_p}B_p\,.$$
Proof of Corollary 5.4.11 The only point to prove is the first. By Theorem 5.4.10, we know that the normalized trace of any polynomial $P$ in $\{U_i^N D_i^N (U_i^N)^*\}_{1\le i\le p}$ converges to $\tau(P(\{U_iD_iU_i^*\}_{1\le i\le p}))$ with the subalgebras generated by $\{D_i\}_{1\le i\le p}$ and $\{U_i, U_i^*\}_{1\le i\le p}$ free. Thus, if
$$P(\{X_i\}_{1\le i\le p}) = Q_1(X_{i_1})\cdots Q_k(X_{i_k})\,, \quad\text{with } i_{\ell+1}\ne i_\ell\,,\ 1\le\ell\le k-1\,,$$
and $\tau(Q_\ell(X_{i_\ell})) = \tau(Q_\ell(D_{i_\ell})) = 0$, then
$$\tau(P(\{U_iD_iU_i^*\}_{1\le i\le p})) = \tau(U_{i_1}Q_1(D_{i_1})U_{i_1}^*\cdots U_{i_k}Q_k(D_{i_k})U_{i_k}^*) = 0\,,$$
since $\tau(Q_\ell(D_{i_\ell})) = 0$ and $\tau(U_i) = \tau(U_i^*) = 0$.
Exercise 5.4.12 Extend Theorem 5.4.2 to the self-dual random matrices constructed in Exercise 2.2.4.
Exercise 5.4.13 In the case where the $D_i^N$ are diagonal matrices, generalize the arguments of Theorem 5.4.2 to prove Theorem 5.4.5.
Exercise 5.4.14 Take $D^N(ij) = 1_{i=j}1_{i\le[\alpha N]}$, the projection on the first $[\alpha N]$ indices, and let $X^N$ be an $N\times N$ matrix satisfying the hypotheses of Theorem 5.4.5. With $I_N$ the identity matrix, set
$$Z^N = D^NX^N(I_N - D^N) + (I_N - D^N)X^ND^N = \begin{pmatrix} 0 & X^N_{[\alpha N],[\alpha N]^c}\\ (X^N_{[\alpha N],[\alpha N]^c})^* & 0\end{pmatrix}$$
with $X^N_{[\alpha N],[\alpha N]^c}$ the corner $(X^N)_{1\le i\le[\alpha N],\,[\alpha N]+1\le j\le N}$ of the matrix $X^N$. Show that $(Z^N)^2$ has the same eigenvalues as those of the Wishart matrix
$$W^{N,\alpha} := X^N_{[\alpha N],[\alpha N]^c}\,(X^N_{[\alpha N],[\alpha N]^c})^*\,.$$
Exercise 5.4.17 Another proof of Theorem 5.4.10 can be based on Theorem 5.4.2 and the polar decomposition $U_j^N = G_j^N(G_j^N(G_j^N)^*)^{-\frac12}$, with $G_j^N$ a complex Gaussian matrix which can be written, in terms of independent self-adjoint Gaussian Wigner matrices, as $G_j^N = X_j^N + i\tilde X_j^N$.
(i) Show that $U_j^N$ follows the Haar measure.
(ii) Approximating $G_j^N(G_j^N(G_j^N)^*)^{-\frac12}$ by a polynomial in $(X_j^N, \tilde X_j^N)_{1\le j\le p}$, prove Theorem 5.4.10 by using Theorem 5.4.5.
Exercise 5.4.18 State and prove the analog of Theorem 5.4.10 when the $U_i^N$ follow the Haar measure on the orthogonal group $O(N)$ instead of the unitary group $U(N)$.
The goal of this section is to show that not only do the traces of polynomials in
Gaussian Wigner matrices converge to the traces of polynomials in free semicir-
cular variables, as shown in Theorem 5.4.2, but that this convergence extends to
the operator norm, thus generalizing Theorem 2.1.22 and Exercise 2.1.27 to any
polynomial in independent Gaussian Wigner matrices.
The main result of this section is the following.
On the left, we consider the operator norm (largest singular value) of the $N\times N$ random matrix $P(\frac{X_1^N}{\sqrt N},\ldots,\frac{X_m^N}{\sqrt N})$, whereas, on the right, we consider the norm of $P(S_1,\ldots,S_m)$ in the $C^*$-algebra $\mathcal{S}$. The theorem asserts a correspondence between random matrices and free probability going considerably beyond moment computations.
However, (5.5.1) fails in general, because the spectrum of $aa^*$ can be strictly larger than the support of the law of $aa^*$. We assume faithfulness and traciality in Theorem 5.5.1 precisely so that we can use (5.5.1).
We mention the state $\tau$ and degree bound $d$ in the statement of the proposition because, even though they do not appear in the conclusion, they figure prominently in many formulas and estimates below. We remark that since formula (5.5.1) is not needed to prove Proposition 5.5.3, we do not assume faithfulness and traciality of $\tau$. Note the scale invariance of the proposition: for any constant $\gamma > 0$, the conclusion of the proposition holds for $P$ if and only if it holds for $\gamma P$.
Proof of Theorem 5.5.1 (Proposition 5.5.3 granted). We may assume that $P$ is self-adjoint. By Proposition 5.5.3, using $P(S)^* = P(S)$,
$$\limsup_{N\to\infty}\Big\|P\Big(\frac{X^N}{\sqrt N}\Big)\Big\| \le (\text{spectral radius of } P(S)) + \varepsilon = \|P(S)\| + \varepsilon\,, \quad a.s.\,,$$
$$\tau\big(P(S)^{2k}\big) = \lim_{N\to\infty}\frac1N\mathrm{tr}\Big(P\Big(\frac{X^N}{\sqrt N}\Big)^{2k}\Big) \le \liminf_{N\to\infty}\Big\|P\Big(\frac{X^N}{\sqrt N}\Big)\Big\|^{2k}\,, \quad a.s.$$
By (5.5.1), and our assumption that $\tau$ is faithful and tracial,
$$\liminf_{N\to\infty}\Big\|P\Big(\frac{X^N}{\sqrt N}\Big)\Big\| \ge \sup_{k\ge 0}\tau\big(P(S)^{2k}\big)^{\frac{1}{2k}} = \|P(S)\|\,, \quad a.s.$$
We pause for more notation. Recall that, given a complex number $z$, $\Re z$ and $\Im z$ denote the real and imaginary parts of $z$, respectively. In general, we let $1_{\mathcal A}$ denote the unit of a unital complex algebra $\mathcal A$. (But we let $I_n$ denote the unit of $\mathrm{Mat}_n(\mathbb{C})$.) Note that, for any self-adjoint element $a$ of a $C^*$-algebra $\mathcal A$, and $\lambda\in\mathbb{C}$ such that $\Im\lambda > 0$, we have that $a - \lambda 1_{\mathcal A}$ is invertible and $\|(a - \lambda 1_{\mathcal A})^{-1}\| \le 1/\Im\lambda$. The latter observation is used repeatedly below.
For $\lambda\in\mathbb{C}$ such that $\Im\lambda > 0$, with $P\in\mathbb{C}\langle X\rangle$ self-adjoint, as in Proposition 5.5.3, let
$$g(\lambda) = g_P(\lambda) = \tau\big((P(S) - \lambda 1_{\mathcal S})^{-1}\big)\,, \qquad (5.5.2)$$
$$g_N(\lambda) = g_P^N(\lambda) = E\Big[\frac1N\mathrm{tr}\Big(\Big(P\Big(\frac{X^N}{\sqrt N}\Big) - \lambda I_N\Big)^{-1}\Big)\Big]\,. \qquad (5.5.3)$$
Both $g(\lambda)$ and $g_N(\lambda)$ are analytic in the upper half-plane $\{\Im\lambda > 0\}$. Further, $g(\lambda)$ is the Stieltjes transform of the law of the noncommutative random variable $P(S)$ under $\tau$, and $g_N(\lambda)$ is the expected value of the Stieltjes transform of the empirical distribution of the eigenvalues of the random matrix $P(\frac{X^N}{\sqrt N})$. The uniform bounds
$$|g(\lambda)| \le \frac{1}{\Im\lambda}\,, \qquad |g_N(\lambda)| \le \frac{1}{\Im\lambda} \qquad (5.5.4)$$
are clear.
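In the simplest case $m = 1$, $P = X$, the limit transform is explicit and the agreement can be tested numerically (a sketch under our own normalization choices; here $P(S)$ is a standard semicircular variable):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 800
# GUE-type matrix scaled so that X/sqrt(N) has semicircle spectral measure on [-2,2].
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = (G + G.conj().T) / 2
eigs = np.linalg.eigvalsh(X / np.sqrt(N))

lam = 1.0 + 0.5j                            # a point in the upper half-plane
g_N = np.mean(1.0 / (eigs - lam))           # empirical version of (5.5.3), without E
g = (-lam + np.sqrt(lam * lam - 4)) / 2     # Stieltjes transform of the semicircle law
print(abs(g_N - g), 1.0 / lam.imag)         # difference is small; both obey (5.5.4)
```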
We now break the proof of Proposition 5.5.3 into three lemmas.
Lemma 5.5.4 For any choice of constants $c_0, c_0' > 0$, there exist constants $N_0, c_1, c_2, c_3 > 0$ (depending only on $P$, $c_0$ and $c_0'$) such that the following holds. If $N$ and $\lambda$ satisfy (5.5.5), then
$$|g_P(\lambda) - g_P^N(\lambda)| \le \frac{c_2}{N^2(\Im\lambda)^{c_3}}\,. \qquad (5.5.6)$$
Now for any $\gamma > 0$ we have $g_{\gamma P}(\gamma\lambda) = \gamma^{-1}g_P(\lambda)$ and $g_{\gamma P}^N(\gamma\lambda) = \gamma^{-1}g_P^N(\lambda)$. Thus, crucially, this lemma, just like Proposition 5.5.3, is scale invariant: for any $\gamma > 0$, the lemma holds for $P$ if and only if it holds for $\gamma P$.
Lemma 5.5.6 With $\phi$ and $P$ as above, $\lim_{N\to\infty} N^{4/3}\,\frac1N\mathrm{tr}\,\phi\big(P\big(\frac{X^N}{\sqrt N}\big)\big) = 0$, almost surely.
The heart of the matter, and the hardest to prove, is Lemma 5.5.4. The main
idea of its proof is the linearization trick, which has a strong algebraic flavor. But
before commencing the proof of that lemma, we will present (in reverse order) the
chain of implications leading from Lemma 5.5.4 to Proposition 5.5.3.
Proof of Proposition 5.5.3 (Lemma 5.5.6 granted) Let $D = \mathrm{sp}(P(S))$, and write $D_\varepsilon = \{y\in\mathbb{R} : d(y,D) < \varepsilon\}$. Denote by $\mu_N$ the empirical measure of the eigenvalues of the matrix $P(\frac{X^N}{\sqrt N})$. By Exercise 2.1.27, the spectral radii of the matrices $\frac{X_i^N}{\sqrt N}$ for $i = 1,\ldots,m$ converge almost surely towards 2 and therefore there exists a finite constant $M$ such that $\limsup_N\mu_N([-M,M]^c) = 0$ almost surely. Consider a smooth compactly supported function $\phi : \mathbb{R}\to\mathbb{R}$ equal to one on $(D_\varepsilon)^c\cap[-M,M]$ and vanishing on $D_{\varepsilon/2}\cup[-2M,2M]^c$. We now see that almost surely for large $N$, no eigenvalue can belong to $(D_\varepsilon)^c$, since otherwise
$$\frac1N\mathrm{tr}\,\phi\Big(P\Big(\frac{X^N}{\sqrt N}\Big)\Big) = \int\phi(x)\,d\mu_N(x) \ge N^{-1}\ (\gg N^{-4/3})\,,$$
in contradiction with Lemma 5.5.6.
N N
Proof of Lemma 5.5.6 (Lemma 5.5.5 granted) As before, let $\mu_N$ denote the empirical distribution of the eigenvalues of $P(\frac{X^N}{\sqrt N})$. Let $\partial_i$ be the noncommutative derivative defined in (5.4.10). Let $\partial_{X_i^N(\ell,k)}$ be the derivative as it appears in (5.4.13) and (5.4.15). The quantity $\int\phi(x)\,d\mu_N(x)$ is a bounded smooth function of $X^N$ satisfying
$$\partial_{X_i^N(\ell,k)}\int\phi(x)\,d\mu_N(x) = \frac{1}{N^{3/2}}\Big((\partial_i P)\Big(\frac{X^N}{\sqrt N}\Big)\,\sharp\,\phi'\Big(P\Big(\frac{X^N}{\sqrt N}\Big)\Big)\Big)_{k,\ell}\,, \qquad (5.5.7)$$
where we let $(A\otimes B)\,\sharp\,C = BCA$. Formula (5.5.7) can be checked for polynomial $\phi$, and then extended to general smooth $\phi$ by approximations. As a consequence, with $d$ bounding the degree of $P$ as in the statement of Proposition 5.5.3, we find that
$$\Big\|\nabla\int\phi(x)\,d\mu_N(x)\Big\|_2^2 \le \frac{C}{N^2}\sum_{i=1}^m\Big(\Big\|\frac{X_i^N}{\sqrt N}\Big\|^{2d-2} + 1\Big)\,\frac1N\mathrm{tr}\,\Big|\phi'\Big(P\Big(\frac{X^N}{\sqrt N}\Big)\Big)\Big|^2$$
for some finite constant $C = C(P)$. Now the Gaussian Poincaré inequality must hold with a constant $c$ independent of $N$ and $f$ since all matrix entries $X_i^N(\ell,r)$ are standard Gaussian, see Exercise 4.4.6. Consequently, for every suffi-
Proof of Lemma 5.5.5 (Lemma 5.5.4 granted) We first briefly review a method for reconstructing a measure from its Stieltjes transform. Let $\psi : \mathbb{R}^2\to\mathbb{C}$ be a smooth compactly supported function. Put $\Lambda\psi(x,y) = \frac{1}{\pi}(\partial_x + i\partial_y)\psi(x,y)$. Assume that $\Im\psi(x,0)\equiv 0$ and $\Lambda\psi(x,0)\equiv 0$. Note that by Taylor's Theorem $\Lambda\psi(x,y)/|y|$ is bounded for $|y|\ne 0$. Let $\mu$ be a probability measure on the real line. Then we have the following formula for reconstructing $\mu$ from its Stieltjes transform:
$$\int\Big(\int_0^{+\infty}\!\!\int_{-\infty}^{+\infty}\frac{\Lambda\psi(x,y)}{t - x - iy}\,dx\,dy\Big)\mu(dt) = \int\psi(t,0)\,\mu(dt)\,. \qquad (5.5.11)$$
This can be verified in two steps. One first reduces to the case $\mu = \delta_0$, using Fubini's Theorem, compact support of $\Lambda\psi(x,y)$ and the hypothesis that
$$|\Lambda\psi(x,y)|/|t - x - iy| \le |\Lambda\psi(x,y)|/|y|$$
is bounded for $y > 0$. Then, letting $|(x,y)| = \sqrt{x^2 + y^2}$, one uses Green's Theorem on the domain $\{0 < \rho\le|(x,y)|\le R,\ y\ge 0\}$ with $R$ so large that $\psi$ is supported in the disc $\{|(x,y)|\le R/2\}$, and with $\rho\to 0$.
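The familiar one-dimensional inversion is the special case worth keeping in mind: the density of $\mu$ is recovered as $\frac1\pi\Im G_\mu(x+iy)$ as $y\downarrow 0$, where $G_\mu$ is the Stieltjes transform. A numerical sketch (our own illustration, with the semicircle law as test measure):

```python
import numpy as np

def g_semicircle(z):
    # Stieltjes transform of the semicircle law on [-2,2], branch with Im g > 0 on Im z > 0.
    s = np.sqrt(z * z - 4.0 + 0j)
    s = np.where(np.imag(z) * np.imag(s) >= 0, s, -s)
    return (-z + s) / 2.0

xs = np.linspace(-1.5, 1.5, 7)
y = 1e-6
approx = np.imag(g_semicircle(xs + 1j * y)) / np.pi     # (1/pi) Im g(x+iy)
exact = np.sqrt(4.0 - xs ** 2) / (2.0 * np.pi)          # semicircle density
print(np.max(np.abs(approx - exact)))                    # tends to 0 as y -> 0
```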
Now let be as specified in Lemma 5.5.5. Let M be a large positive integer,
claim is proved.
As before, let $\mu_N$ be the empirical distribution of the eigenvalues of $P(\frac{X^N}{\sqrt N})$. Let $\mu$ be the law of the noncommutative random variable $P(S)$. By hypothesis $\phi$ vanishes on the spectrum of $P(S)$ and hence also vanishes on the support of $\mu$. By (5.5.11) and using the uniform bound
$$\Big\|\Big(P\Big(\frac{X^N}{\sqrt N}\Big) - (x + iy)I_N\Big)^{-1}\Big\| \le 1/y\,,$$
we have
$$E\int\phi\,d\mu_N = E\int\phi\,d\mu_N - \int\phi(t)\,\mu(dt) = \int_0^{+\infty}\!\!\int\big(\Lambda\phi(x,y)\big)\big(g_N(x+iy) - g(x+iy)\big)\,dx\,dy\,,$$
where the first error term is justified by the uniform bound (5.5.4). With $M$ large enough, the right side is of order $N^{-2}$ at most.
We turn finally to the task of proving Lemma 5.5.4. We need first to introduce suitable notation and conventions for handling block-decomposed matrices with entries in unital algebras.
Let $\mathcal A$ be any unital algebra over the complex numbers. Let $\mathrm{Mat}_{k\times k'}(\mathcal A)$ denote the space of $k$-by-$k'$ matrices with entries in $\mathcal A$, and write $\mathrm{Mat}_k(\mathcal A) = \mathrm{Mat}_{k\times k}(\mathcal A)$. Elements of $\mathrm{Mat}_{k\times k'}(\mathcal A)$ can and will be identified with elements of the tensor product $\mathrm{Mat}_{k\times k'}(\mathbb{C})\otimes\mathcal A$. In the case that $\mathcal A$ itself is a matrix algebra, say $\mathrm{Mat}_n(\mathcal B)$, we identify $\mathrm{Mat}_{k\times k'}(\mathrm{Mat}_n(\mathcal B))$ with $\mathrm{Mat}_{kn\times k'n}(\mathcal B)$ by viewing each element of the latter space as a $k$-by-$k'$ array of blocks each of which is an $n$-by-$n$ matrix. Recall that the unit of $\mathcal A$ is denoted by $1_{\mathcal A}$, but that the unit of $\mathrm{Mat}_n(\mathbb{C})$ is usually denoted by $I_n$. Thus, the unit in $\mathrm{Mat}_n(\mathcal A)$ is denoted by $I_n\otimes 1_{\mathcal A}$.
Suppose that $\mathcal A$ is an algebra equipped with an involution. Then, given a matrix $a\in\mathrm{Mat}_k(\mathcal A)$, we define $a^*\in\mathrm{Mat}_k(\mathcal A)$ to be the matrix with entries $(a^*)_{i,j} = a_{j,i}^*$. Suppose further that $\mathcal A$ is a $C^*$-algebra. Then we use the GNS construction to equip $\mathrm{Mat}_k(\mathcal A)$ with a norm by first identifying $\mathcal A$ with a $C^*$-subalgebra of $B(H)$ for some Hilbert space $H$, and then identifying $\mathrm{Mat}_k(\mathcal A)$ in compatible fashion with a subspace of $B(H^k)$. In particular, the rules enunciated above equip $\mathrm{Mat}_n(\mathcal A)$ with the structure of a $C^*$-algebra. That structure is unique because a $C^*$-algebra cannot be renormed without destroying the property $\|aa^*\| = \|a\|^2$.
We define the degree of $Q\in\mathrm{Mat}_k(\mathbb{C}\langle X\rangle)$ to be the maximum of the lengths of the words in the variables $X_i$ appearing in the entries of $Q$. Also, given a collection $x = (x_1,\ldots,x_m)$ of elements in a unital complex algebra $\mathcal A$, we define $Q(x)\in\mathrm{Mat}_k(\mathcal A)$ to be the result of making the substitution $X = x$ in every entry of $Q$.
Given for $i = 1, 2$ a linear map $T_i : V_i\to W_i$, the tensor product $T_1\otimes T_2 : V_1\otimes V_2\to W_1\otimes W_2$ of the maps is defined by the formula
For example, given $A\in\mathrm{Mat}_k(\mathcal A) = \mathrm{Mat}_k(\mathbb{C})\otimes\mathrm{Mat}_N(\mathbb{C})$, one evaluates $(\mathrm{id}_k\otimes\frac1N\mathrm{tr})(A)\in\mathrm{Mat}_k(\mathbb{C})$ by viewing $A$ as a $k$-by-$k$ array of $N$-by-$N$ blocks and then replacing each block by its normalized trace.
and put $K_{d+1} = K_1$. Note that $\{1,\ldots,k\}$ is the disjoint union of $K_1,\ldots,K_d$. Let $\mathcal A$ be a $C^*$-algebra and for $i = 1,\ldots,d$, let $t_i\in\mathrm{Mat}_{k_i\times k_{i+1}}(\mathcal A)$ be given. Consider the block-decomposed matrix
$$T = \begin{pmatrix} & t_1 & & \\ & & \ddots & \\ & & & t_{d-1}\\ t_d & & & \end{pmatrix}\in\mathrm{Mat}_k(\mathcal A)\,, \qquad (5.5.13)$$
where for $i = 1,\ldots,d$, the matrix $t_i$ is placed in the block with rows (resp., columns) indexed by $K_i$ (resp., $K_{i+1}$), and all other entries of $T$ equal $0_{\mathcal A}$. We remark that the GNS-based procedure we used to equip each matrix space $\mathrm{Mat}_{p\times q}(\mathcal A)$ with a norm implies that
$$\|T\| \le \max_{i=1,\ldots,d}\|t_i\|\,. \qquad (5.5.14)$$
" #
0
Let C be given and put = Matk (C). Below, we write =
0 Ik1
1A , = 1A and more generally = 1A for any Matk (C). This
will not cause confusion, and is needed to compress notation.
(ii) For all $\Lambda\in\mathrm{Mat}_k(\mathbb{C})$, if $2c\|\Lambda\| < 1$, then $T - \hat\lambda - \Lambda$ is invertible and
$$\|(T - \hat\lambda - \Lambda)^{-1} - (T - \hat\lambda)^{-1}\| \le 2c^2\|\Lambda\| < c\,.$$
Here we have abbreviated notation even further by writing $1 = I_{k_i}\otimes 1_{\mathcal A}$. The first matrix above is $T - \hat\lambda$. Call the next two matrices $A$ and $B$, respectively, and the last $D$. The matrices $A$ and $B$ are invertible since $A - I_k$ is strictly lower triangular and $B - I_k$ is strictly upper triangular. The diagonal matrix $D$ is invertible by the hypothesis that $t_1\cdots t_d - \lambda$ is invertible. Thus $T - \hat\lambda$ is invertible with inverse $(T - \hat\lambda)^{-1} = AD^{-1}B^{-1}$. This proves the first of the three claims made in point (i).
For $i, j = 1,\ldots,d$ let $B^{-1}(i,j)$ denote the $K_i\times K_j$ block of $B^{-1}$. It is not difficult to check that $B^{-1}(i,j) = 0$ for $i > j$, $B^{-1}(i,i) = I_{k_i}$, and $B^{-1}(i,j) = t_i\cdots t_{j-1}$ for $i < j$. The second claim of point (i) can now be verified by direct calculation, and the third by using (5.5.14) to bound $\|A\|$ and $\|AB^{-1}\|$. Point (ii) follows by consideration of the Neumann series expansion for $(I_k - (T - \hat\lambda)^{-1}\Lambda)^{-1}$.
Lemma 5.5.8 Let $P\in\mathbb{C}\langle X\rangle$ be given, and let $d\ge 2$ be an integer bounding the degree of $P$. Then there exists an integer $n\ge 1$ and matrices
Proof We have
$$P = \sum_{r=0}^{d}\,\sum_{i_1=1}^{m}\cdots\sum_{i_r=1}^{m} c^r_{i_1,\ldots,i_r}\,X_{i_1}\cdots X_{i_r}$$
for some complex constants $c^r_{i_1,\ldots,i_r}$. Let $\{P_\nu\}_{\nu=1}^{n}$ be an enumeration of the terms on the right. Let $e^{(k,\ell)}_{i,j}\in\mathrm{Mat}_{k\times\ell}(\mathbb{C})$ denote the elementary matrix with entry 1 in position $(i,j)$ and 0 elsewhere. Then we have a factorization
We continue to prepare for the proof of Lemma 5.5.4. For the rest of this section we fix a self-adjoint noncommutative polynomial $P\in\mathbb{C}\langle X\rangle$ and also, as in the statement of Proposition 5.5.3, an integer $d\ge 2$ bounding the degree of $P$. For $i = 1,\ldots,d$, fix $V_i\in\mathrm{Mat}_{k_i\times k_{i+1}}(\mathbb{C}\langle X\rangle)$ of degree $\le 1$, for suitably chosen positive integers $k_1,\ldots,k_{d+1}$, such that $P = V_1\cdots V_d$. This is possible by Lemma 5.5.8. Any such factorization serves our purposes. Put $k = k_1 + \cdots + k_d$ and let $K_i$ be as
for uniquely determined matrices $a_i\in\mathrm{Mat}_k(\mathbb{C})$. As we will see, Lemma 5.5.7 allows us to use the matrices $L(\frac{X^N}{\sqrt N})$ and $L(S)$ to code the spectral properties of $P(\frac{X^N}{\sqrt N})$ and $P(S)$, respectively. We will exploit this coding to prove Lemma 5.5.4.
We will say that any matrix of the form $L$ arising from $P$ by the factorization procedure above is a d-linearization of $P$. Of course $P$ has many d-linearizations. However, the linearization construction is scale invariant in the sense that, for any constant $\gamma > 0$, if $L$ is a d-linearization of $P$, then $\gamma^{1/d}L$ is a d-linearization of $\gamma P$.
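To make the construction concrete, here is the smallest example (ours, not the book's): for $P = X_1X_2$ one may take $d = 2$, $V_1 = X_1$, $V_2 = X_2$, $k_1 = k_2 = 1$, $k = 2$, giving
$$L = \begin{pmatrix} 0 & X_1\\ X_2 & 0\end{pmatrix} = a_0 + a_1X_1 + a_2X_2\,, \qquad a_0 = 0\,,\quad a_1 = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}\,,\quad a_2 = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}\,,$$
and for elements $x_1, x_2$ of a unital algebra,
$$L(x) - \hat\lambda = \begin{pmatrix} -\lambda & x_1\\ x_2 & -1\end{pmatrix}\,,$$
whose Schur complement with respect to the lower right entry is $-\lambda + x_1x_2$; thus $L(x) - \hat\lambda$ is invertible exactly when $P(x) - \lambda$ is, which is the coding promised by Lemma 5.5.7.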
Put
$$\kappa_1 = \sup_{N\ge 1} E\Big[\Big(1 + d\,\Big\|L\Big(\frac{X^N}{\sqrt N}\Big)\Big\|\Big)^{8d-8}\Big]\,, \qquad (5.5.17)$$
$$\kappa_2 = \|a_0\| + \sum_{i=1}^m\|a_i\|^2\,, \qquad (5.5.18)$$
$$\kappa_3 = (1 + d\,\|L(S)\|)^{2d-2}\,. \qquad (5.5.19)$$
Note that $\kappa_1 < \infty$ by (5.5.10). We will take care to make all our estimates below explicit in terms of the constants $\kappa_i$ (and the constant $c$ appearing in (5.5.8)), in anticipation of exploiting the scale invariance of Lemma 5.5.4 and the d-linearization construction.
We next present the linearized versions of the definitions (5.5.2) and (5.5.3). For $\lambda\in\mathbb{C}$ such that $\Im\lambda > 0$, let $\hat\lambda = \begin{pmatrix}\lambda & 0\\ 0 & I_{k-1}\end{pmatrix}\in\mathrm{Mat}_k(\mathbb{C})$. We define
The next two lemmas, which are roughly parallel in form, give the basic properties of $G_N(\lambda)$ and $G(\lambda)$, respectively, and in particular show that these matrices are well defined.
Lemma 5.5.9 (i) For $\lambda\in\mathbb{C}$ such that $\Im\lambda > 0$, $G_N(\lambda)$ is well defined, depends analytically on $\lambda$, and satisfies the bound
$$\|G_N(\lambda)\| \le \kappa_1\Big(1 + \frac{1}{\Im\lambda}\Big)\,. \qquad (5.5.22)$$
(ii) The upper left entry of $G_N(\lambda)$ equals $g_N(\lambda)$.
(iii) We have
$$\Big\|I_k + (\hat\lambda - a_0)G_N(\lambda) + \sum_{i=1}^m a_iG_N(\lambda)a_iG_N(\lambda)\Big\| \le \frac{c\,\kappa_1^2\kappa_2^4}{N^2}\Big(1 + \frac{1}{\Im\lambda}\Big)^2\,, \qquad (5.5.23)$$
so that $A = \sum_{\ell,r} A[\ell,r]\otimes e_{\ell,r}$. (Thus, within this proof, we view $A$ as an $N$-by-$N$ array of $k$-by-$k$ blocks $A[\ell,r]$.)
Since $\lambda$ is fixed throughout the proof, we drop it from the notation to the extent possible. To abbreviate, we write
$$R_N = \Big(L\Big(\frac{X^N}{\sqrt N}\Big) - \hat\lambda\otimes I_N\Big)^{-1}\,, \qquad H_N = \Big(\mathrm{id}_k\otimes\frac1N\mathrm{tr}\Big)R_N = \frac1N\sum_{i=1}^N R_N[i,i]\,.$$
From Lemma 5.5.7(i) we get an estimate
$$\|R_N\| \le \Big(1 + d\,\Big\|L\Big(\frac{X^N}{\sqrt N}\Big)\Big\|\Big)^{2d-2}\Big(1 + \frac{1}{\Im\lambda}\Big) \qquad (5.5.24)$$
which, combined with (5.5.17), yields assertion (i). From Lemma 5.5.7(i) we also get assertion (ii).
Assertion (iii) will follow from an integration by parts as in (5.4.15). Recall that $\partial_{X_i^N(\ell,r)}X_{i'}^N(\ell',r') = \delta_{i,i'}\delta_{\ell,\ell'}\delta_{r,r'}$. We have, for $i\in\{1,\ldots,m\}$ and $\ell, r, \ell', r'\in\{1,\ldots,N\}$,
$$\partial_{X_i^N(r,\ell)}R_N[r',\ell'] = -\frac{1}{\sqrt N}\,R_N[r',r]\,a_i\,R_N[\ell,\ell']\,. \qquad (5.5.25)$$
Recall that $E[\partial_{X_i^N(r,\ell)}f(X^N)] = E[X_i^N(\ell,r)f(X^N)]$. We obtain
$$-\frac{1}{\sqrt N}\,E\big[R_N(\lambda)[r',r]\,a_i\,R_N(\lambda)[\ell,\ell']\big] = E\big[X_i^N(\ell,r)\,R_N(\lambda)[r',\ell']\big]\,. \qquad (5.5.26)$$
Now left-multiply both sides of (5.5.26) by $\frac{a_i}{N^{3/2}}$, and sum on $i$, $\ell = \ell'$, and $r = r'$, thus obtaining the first equality below.
$$-\sum_{i=1}^m E(a_iH_Na_iH_N) = E\Big(\mathrm{id}_k\otimes\frac1N\mathrm{tr}\Big)\Big(\Big(L\Big(\frac{X^N}{\sqrt N}\Big) - a_0\otimes I_N\Big)R_N\Big)$$
$$= E\Big(\mathrm{id}_k\otimes\frac1N\mathrm{tr}\Big)\big(I_k\otimes I_N + ((\hat\lambda - a_0)\otimes I_N)R_N\big) = I_k + (\hat\lambda - a_0)G_N(\lambda)\,.$$
The last two steps are simple algebra. Thus the left side of (5.5.23) is bounded by the quantity
$$\Delta_N = \Big\|E\Big[\sum_{i=1}^m a_i(H_N - EH_N)a_i(H_N - EH_N)\Big]\Big\| \le \Big(\sum_i\|a_i\|^2\Big)E\|H_N - EH_N\|_2^2 \le c\Big(\sum_i\|a_i\|^2\Big)\sum_{i,\ell,r}E\Big\|\partial_{X_i^N(r,\ell)}H_N\Big\|_2^2\,,$$
where at the last step we use once again the Gaussian Poincaré inequality in the form (5.5.8). For the quantity at the extreme right under the expectation, we have by (5.5.25) an estimate
$$\frac{1}{N^3}\sum_{i,r,\ell,r',\ell'}\mathrm{tr}\Big(R_N[\ell',r]\,a_i\,R_N[\ell,\ell']\,R_N[\ell,r']^*\,a_i^*\,R_N[r',r]^*\Big) \le \frac{1}{N^2}\Big(\sum_i\|a_i\|^2\Big)\,\|R_N\|^4\,.$$
The latter, combined with (5.5.17), (5.5.18) and (5.5.24), finishes the proof of (5.5.23).
We will need a generalization of $G(\lambda)$. For any $\Lambda\in\mathrm{Mat}_k(\mathbb{C})$ such that $L(S) - \Lambda\otimes 1_{\mathcal S}$ is invertible, we define
$$G(\Lambda) = (\mathrm{id}_k\otimes\tau)\big((L(S) - \Lambda\otimes 1_{\mathcal S})^{-1}\big)\,.$$
Now for $\lambda\in\mathbb{C}$ such that $G(\lambda)$ is defined, $G(\hat\lambda)$ is also defined and
$$G\begin{pmatrix}\lambda & 0\\ 0 & I_{k-1}\end{pmatrix} = G(\lambda)\,. \qquad (5.5.27)$$
Lemma 5.5.10 (i) For $\lambda\in\mathbb{C}$ such that $\Im\lambda > 0$, $G(\lambda)$ is well defined, depends analytically on $\lambda$, and satisfies the bound
$$\|G(\lambda)\| \le k^2\kappa_3\Big(1 + \frac{1}{\Im\lambda}\Big)\,. \qquad (5.5.29)$$
(ii) The upper left entry of $G(\lambda)$ equals $g(\lambda)$.
(iii) More generally, $G(\Lambda)$ is well defined and analytic for $\Lambda\in O$, and satisfies the bound
$$\Big\|G(\Lambda) - G\begin{pmatrix}\lambda & 0\\ 0 & I_{k-1}\end{pmatrix}\Big\| \le 2k^2\kappa_3^2\Big(1 + \frac{1}{\Im\lambda}\Big)^2\|\Lambda - \hat\lambda\| < k^2\kappa_3\Big(1 + \frac{1}{\Im\lambda}\Big) \qquad (5.5.30)$$
for $\lambda$ and $\Lambda$ as in (5.5.28).
(iv) If there exists $\Lambda\in O$ such that $\Lambda - a_0$ is invertible and the operator
for all $\Lambda\in O$.
Points (i) and (ii) of Lemma 5.5.10 follow. In view of the relationship (5.5.27) between $G(\Lambda)$ and $G(\lambda)$, point (iii) of Lemma 5.5.10 follows from Lemma 5.5.7(ii). It remains only to prove assertion (iv). Since the open set $O$ is connected, and $G(\Lambda)$ is analytic on $O$, it is necessary only to show that (5.5.32) holds for all $\Lambda$ in the nonempty open subset of $O$ consisting of $\Lambda$ for which the operator (5.5.31) is defined and has norm $< 1$. Fix such $\Lambda$ now, and let $M$ denote the corresponding operator (5.5.31). Put
$$b_i = a_i(\Lambda - a_0)^{-1}\in\mathrm{Mat}_k(\mathbb{C})$$
for $i = 1,\ldots,m$. By developing
$$(L(S) - \Lambda\otimes 1_{\mathcal S})^{-1} = -((\Lambda - a_0)^{-1}\otimes 1_{\mathcal S})(I_k\otimes 1_{\mathcal S} - M)^{-1}$$
as a power series in $M$, we arrive at the identity
$$I_k + (\Lambda - a_0)G(\Lambda) = \sum_{\ell=0}^{\infty}(\mathrm{id}_k\otimes\tau)(M^{\ell+1})\,.$$
Lemma 5.5.12 Fix $\lambda\in\mathbb{C}$ and a positive integer $N$ such that $\Im\lambda > 0$ and the right side of (5.5.23) is $< 1/2$. Put $\hat\lambda = \begin{pmatrix}\lambda & 0\\ 0 & I_{k-1}\end{pmatrix}\in\mathrm{Mat}_k(\mathbb{C})$. Then $G_N(\lambda)$ is invertible and the matrix
$$\Lambda_N(\lambda) = G_N(\lambda)^{-1} + a_0 - \sum_{i=1}^m a_iG_N(\lambda)a_i \qquad (5.5.33)$$
satisfies
$$\|\Lambda_N(\lambda) - \hat\lambda\| \le \frac{2c\,\kappa_1^2\kappa_2^4}{N^2}\Big(1 + \frac{1}{\Im\lambda}\Big)^4\Big(|\lambda| + 1 + \kappa_2 + \kappa_1\kappa_2 + \frac{1}{\Im\lambda}\Big)^2\,, \qquad (5.5.34)$$
where $c$ is the constant appearing in (5.5.8).
We now arrive at estimate (5.5.34) by our hypothesis that the right side of (5.5.23) is $< 1/2$, along with (5.5.23) to bound $\Lambda_N(\lambda)$ more strictly, and finally (5.5.18) and (5.5.22).
then z = w.
$S^* = S$ and $T^* = T$, $\Pi S\Pi = S$.
Then we have
$$\Pi = T^{-1}(T\Pi - TST) = (\Pi T - TST)T^{-1}\,, \qquad (5.5.39)$$
$$s_i = r_i + r_i^*\,, \qquad \pi_0\,s_i\,r_j = \delta_{ij}\,\pi_0 = r_j^*\,s_i\,\pi_0\,, \qquad (5.5.40)$$
holding for $i, j = 1,\ldots,m$.
(iv) Identify $\mathrm{Mat}_k(B(H))$ with $B(H^k)$. Let $L = a_0 + \sum_{i=1}^m a_iX_i\in\mathrm{Mat}_k(\mathbb{C}\langle X\rangle)$ be of degree $\le 1$. Fix $\Lambda\in\mathrm{Mat}_k(\mathbb{C})$ such that $T = L(s) - \Lambda\otimes 1_{B(H)}\in B(H^k)$ is invertible. Put $\Pi = I_k\otimes\pi_0\in B(H^k)$ and $S = \sum_{i=1}^m(I_k\otimes r_i)\,T^{-1}\,(I_k\otimes r_i)^*\in B(H^k)$. Put $G(\Lambda) = (\mathrm{id}_k\otimes\tau)(T^{-1})$. Use (5.5.39) and (5.5.40) to verify (5.5.32).
Note that, according to [BeV93, Proposition 5.2], the second set of conditions on $F$ is equivalent to the existence of a probability measure $\mu$ on $\mathbb{R}$ so that $F = F_\mu$ is the reciprocal of a Stieltjes transform. Such a point of view can actually serve as a definition of free convolution, see [ChG08] or [BeB07].
Lemma 5.3.40 is a particularly simple example of infinite divisibility. The assumption of finite variance in the lemma can be removed by observing that the solution of (5.3.26) is infinitely divisible, and then using [BeV93, Theorem 7.5]. The theory of free infinite divisibility parallels the classical one, and in particular, a Lévy-Khitchine formula does exist to characterize infinitely divisible laws, see [BeP00] and [BaNT04]. The former paper introduces the Bercovici-Pata bijection between the classical and free infinitely divisible laws (see also the Boolean Bercovici-Pata bijection in [BN08]). Matrix approximations to free infinitely divisible laws are constructed in [BeG05].
The generalization of multiplicative free convolution to affiliated operators is done in [BeV93], see also [NiS97].
The relation between random matrices and asymptotic freeness was first established in the seminal article of Voiculescu [Voi91]. In [Voi91, Theorem 2.2], he proved Theorem 5.4.5 in the case of Wigner Gaussian (Hermitian) random matrices and diagonal matrices $\{D_i^N\}_{1\le i\le p}$, whereas in [Voi91, Theorem 3.8], he generalized this result to independent unitary matrices. In [Voi98b], he removed the former hypothesis on the matrices $\{D_i^N\}_{1\le i\le p}$ to obtain Theorem 5.4.5 for Gaussian matrices and Theorem 5.4.10 in full generality (following the same ideas as in Exercise 5.4.17). An elegant proof of Theorem 5.4.2 for Gaussian matrices which avoids combinatorial arguments appears in [CaC04]. Theorem 5.4.2 was extended to non-Gaussian entries in [Dyk93b]. The proof of Theorem 5.4.10 we presented follows the characterization of the law of free unitary variables by a Schwinger-Dyson equation given in [Voi99, Proposition 5.17] and the ideas of [CoMG06]. Other proofs were given in terms of Weingarten functions in [Col03] and with a more combinatorial approach in [Xu97]. For uses of master loop (or Schwinger-Dyson) equations in the physics literature, see e.g. [EyB99] and [Eyn03].
Asymptotic freeness can be extended to other models such as joint distribu-
tion of random matrices with correlated entries [ScS05] or to deterministic mod-
els such as permutation matrices [Bia95]. Biane [Bia98b] (see also [Sni06] and
[Bia01]) showed that the asymptotic behavior of rescaled Young diagrams and as-
sociated representations and characters of the symmetric groups can be expressed
in terms of free cumulants.
The study of the correction (central limit theorem) to Theorem 5.4.2 for Gaussian entries was performed in [Cab01] and [MiS06]. The generalization to non-Gaussian entries, as done in [AnZ05], is still open in the general noncommutative framework. A systematic study and analysis of the limiting covariance was undertaken in [MiN04]. The failure of the central limit theorem for a matrix model whose potential has two deep wells was shown in [Pas06].
We have not mentioned the notion of freeness with amalgamation, which is a freeness property where the scalar-valued state is replaced by an operator-valued conditional expectation with properties analogous to conditional expectation from classical probability theory. This notion is particularly natural when considering the algebra generated by two subalgebras. For instance, the free algebras $\{X_i\}_{1\le i\le p}$ as in Theorem 5.4.5 are free with amalgamation with respect to the algebra generated by the $\{D_i\}_{1\le i\le p}$. We refer to [Voi00b] for definitions and to [Shl98] for a nice application to the study of the asymptotics of the spectral measure of band matrices. The central limit theorem for the trace of mixed moments of band matrices and deterministic matrices was done in [Gui02].
The convergence of the operator norm of polynomials in independent GUE matrices discussed in Section 5.5 was first proved in [HaT05]. (The norms of the limiting object, namely free operators with matrix coefficients, were already studied in [Leh99].) This result was generalized to independent matrices from the GOE and the GSE in [Sch05], see also [HaST06], and to Wigner or Wishart matrices with entries satisfying the Poincaré inequality in [CaD07]. It was also shown in [GuS08] to hold with matrices whose laws are absolutely continuous with respect to the Lebesgue measure and possess a strictly log-concave density. The norm of long words in free noncommutative variables is discussed in [Kar07a]. We note that a by-product of the proof of Theorem 5.5.1 is that the Stieltjes transform of the law of any self-adjoint polynomial in free semicircular random variables is an algebraic function, as one sees by applying the algebraicity criterion [AnZ08b, Theorem 6.1] to the Schwinger-Dyson equation as expressed in the form (5.5.32). Proposition 5.5.3 is analogous to a result for sample covariance matrices proved earlier in [BaS98a].
Many topics related to free probability have been left out in our discussion. In particular, we have not mentioned free Brownian motion as defined in [Spe90], which appears as the limit of the Hermitian Brownian motion with size going to infinity [Bia97a]. We refer to [BiS98b] for a study of the related stochastic calculus, to [Bia98a] for the introduction of a wide class of processes with free increments and for the study of their Markov properties, to [Ans02] for the introduction of stochastic integrals with respect to processes with free increments, and to [BaNT02] for a thorough discussion of Lévy processes and Lévy laws. Such a stochastic calculus was used to prove a central limit theorem in [Cab01], large deviation principles, see the survey [Gui04], and the convergence of the empirical distribution of interacting matrices [GuS08]. In such a noncommutative stochastic calculus framework, inequalities such as the Burkholder-Davis-Gundy inequality [BiS98b] or the Burkholder-Rosenthal inequalities [JuX03] hold.
Another important topic we did not discuss is the notion of free entropy. We refer the interested readers to the reviews [Voi02] and [HiP00b]. Voiculescu defined several concepts for an entropy in the noncommutative setup. First, the so-called microstates entropy was defined in [Voi94], analogously to the Boltzmann-Shannon entropy, as the volume of the collection of random matrices whose empirical distribution approximates a given tracial state. Second, in [Voi98a], the microstates-free free entropy was defined by following an infinitesimal approach based on the free Fisher information. Voiculescu showed in [Voi93] that, in the case of one variable, both entropies are equal. Following a large deviations and stochastic processes approach, bounds between these two entropies could be given in the general setting, see [CaG01] and [BiCG03], providing strong evidence toward the conjecture that they are equal in full generality. Besides its connections with large deviations questions, free entropies were used to define in [Voi94] another important concept, namely the free entropy dimension. This dimension is related with $L^2$-Betti numbers [CoS05], [MiS05] and is analogous to a fractal dimension in the classical setting [GuS07]. A long-standing conjecture is that the entropy dimension is an invariant of the von Neumann algebra, which would settle the well-known problem of the isomorphism between free group factors [Voi02, section 2.6]. Free entropy theory has already been used to settle some important questions in von Neumann algebras, see [Voi96], [Ge97], [Ge98] or [Voi02, section 2.5]. In another direction, random matrices can be an efficient way to tackle questions concerning $C^*$-algebras or von Neumann algebras, see e.g. [Voi90], [Dyk93a], [Rad94], [HaT99], [Haa02], [PoS03], [HaT05], [HaST06], [GuJS07] and [HaS09].
The free probability concepts developed in this chapter, and in particular free cumulants, can also be used in more applied subjects such as telecommunications, see [LiTV01] and [TuV04].
Appendices
This appendix recalls some basic results from linear algebra. We refer the reader to [HoJ85] for further details and proofs.
The following identities are repeatedly used. Throughout, $A, B, C, D$ denote arbitrary matrices of appropriate dimensions. We then have
$$1_{\det A\ne 0}\,\det\begin{pmatrix}A & B\\ C & D\end{pmatrix} = \det\left(\begin{pmatrix}A & 0\\ C & D - CA^{-1}B\end{pmatrix}\begin{pmatrix}1 & A^{-1}B\\ 0 & 1\end{pmatrix}\right) = \det A\,\det[D - CA^{-1}B]\,, \qquad (A.1)$$
where the right side of (A.1) is set to 0 if $A$ is not invertible.
The following lemma, proved by multiplying on the right by $(X - zI)$ and on the left by $(X - A - zI)$, is very useful.
Lemma A.1 (Matrix inversion) For matrices $X$, $A$ and scalar $z$, the following identity holds if all matrices involved are invertible:
$$(X - A - zI)^{-1} - (X - zI)^{-1} = (X - A - zI)^{-1}A\,(X - zI)^{-1}\,.$$
Many manipulations of matrices involve their minors. Thus, let $I = \{i_1,\ldots,i_{|I|}\}\subset\{1,\ldots,m\}$, $J = \{j_1,\ldots,j_{|J|}\}\subset\{1,\ldots,n\}$, and for an $m$-by-$n$ matrix $A$, let $A_{I,J}$ be the $|I|$-by-$|J|$ matrix obtained by erasing all entries that do not belong to a row with index from $I$ and a column with index from $J$. That is,
$$A_{I,J}(l,k) = A(i_l, j_k)\,, \quad l = 1,\ldots,|I|\,,\ k = 1,\ldots,|J|\,.$$
For Hermitian matrices, more can be said. Recall that, for a Hermitian matrix $A$, we let $\lambda_1(A)\le\lambda_2(A)\le\cdots\le\lambda_N(A)$ denote the ordered eigenvalues of $A$. We first recall the
Theorem A.5 (Weyl's inequalities) Let $A, B\in\mathcal{H}_N^{(2)}$. Then, for each $k\in\{1,\ldots,N\}$, we have
$$\lambda_k(A) + \lambda_1(B) \le \lambda_k(A + B) \le \lambda_k(A) + \lambda_N(B)\,. \qquad (A.3)$$
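A quick numerical spot check of (A.3) (our illustration; the random Hermitian matrices and their size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6

def rand_herm(n):
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / 2

A, B = rand_herm(N), rand_herm(N)
lA = np.linalg.eigvalsh(A)        # ascending: lambda_1 <= ... <= lambda_N
lB = np.linalg.eigvalsh(B)
lAB = np.linalg.eigvalsh(A + B)

# lambda_k(A) + lambda_1(B) <= lambda_k(A+B) <= lambda_k(A) + lambda_N(B)
assert np.all(lA + lB[0] <= lAB + 1e-10) and np.all(lAB <= lA + lB[-1] + 1e-10)
print("Weyl inequalities hold for this sample")
```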
Corollary A.6 is weaker than Lemma 2.1.19, which in its Hermitian formulation, see Remark 2.1.20, actually implies that, under the same assumptions,
Theorem A.7 Let $A\in\mathcal{H}_N^{(2)}$ and $z\in\mathbb{C}^N$. Then, for $1\le k\le N-2$,
$$\lambda_k(A - zz^*) \le \lambda_{k+1}(A) \le \lambda_{k+2}(A - zz^*)\,. \qquad (A.6)$$
Properties (A.7), (A.8) and (A.9) are immediate consequences of the definition. A proof of (A.10) and (A.11) can be found in [Sim05b, Prop. 2.6 & Thm. 2.7]. It follows from (A.10) that if $X$ is a square matrix then
$$\|X\|_1 \ge |\mathrm{tr}(X)|\,. \qquad (A.12)$$
For matrices $X$ and $Y$ with complex entries which can be multiplied, and exponents $1\le p, q, r\le\infty$ satisfying $\frac1p + \frac1q = \frac1r$, we have the noncommutative Hölder inequality
$$\|XY\|_r \le \|X\|_p\,\|Y\|_q\,. \qquad (A.13)$$
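An instance with $p = q = 2$ and $r = 1$ can be checked directly, evaluating the Schatten norms through singular values (a numerical sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((5, 7)), rng.standard_normal((7, 4))

def schatten(M, p):
    # ||M||_p = (sum of p-th powers of the singular values)^(1/p)
    return np.sum(np.linalg.svd(M, compute_uv=False) ** p) ** (1.0 / p)

# (A.13) with p = q = 2, r = 1: trace norm of XY vs. Frobenius norms.
print(schatten(X @ Y, 1) <= schatten(X, 2) * schatten(Y, 2))  # True
```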
be two polynomials, where the $a$s, $b$s, $\alpha$s and $\beta$s are complex numbers, the lead coefficients $a_m$ and $b_n$ are nonzero, and $t$ is a variable. The resultant of $P$ and $Q$ is defined as
$$R(P,Q) = a_m^nb_n^m\prod_{i=1}^m\prod_{j=1}^n(\alpha_i - \beta_j) = a_m^n\prod_{i=1}^m Q(\alpha_i) = (-1)^{mn}\,b_n^m\prod_{j=1}^n P(\beta_j)\,.$$
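The three expressions can be compared on a small example (ours; sympy computes the resultant from the Sylvester matrix, so the agreement is a genuine cross-check of the root-product formula):

```python
from sympy import symbols, resultant, expand

t = symbols('t')
P = expand((t - 1) * (t - 2))   # roots alpha_i = 1, 2; leading coefficient a_m = 1
Q = t - 3                       # root beta_j = 3; leading coefficient b_n = 1

lhs = resultant(P, Q, t)              # Sylvester-matrix computation
rhs = Q.subs(t, 1) * Q.subs(t, 2)     # a_m^n * Q(alpha_1) * Q(alpha_2)
print(lhs, rhs)                       # both equal 2
```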
B Topological preliminaries
B.1 Generalities
Theorem B.2 A subset of a metric space is compact iff it is closed and sequentially
compact.
contains a countable dense set. Any topological space that possesses a countable
base is separable, whereas any separable metric space possesses a countable base.
Even if a space is not metric, the notion of convergence of sequences may be extended to convergence along filters, or nets, such that compactness, closedness, etc. may be checked by convergence. The interested reader is referred to [DuS58]
or [Bou87] for details.
Let $J$ be an arbitrary set. Let $X$ be the Cartesian product of topological spaces $X_j$, i.e., $X = \prod_j X_j$. The product topology on $X$ is the topology generated by the base $\prod_j U_j$, where the $U_j$ are open and equal to $X_j$ except for a finite number of values of $j$. This topology is the weakest one which makes all projections $p_j : X\to X_j$ continuous. The Hausdorff property is preserved under products, and any countable product of metric spaces (with metrics $d_n(\cdot,\cdot)$) is metrizable, with the metric on $X$ given by
$$d(x,y) = \sum_{n=1}^{\infty} 2^{-n}\,\frac{d_n(p_nx, p_ny)}{1 + d_n(p_nx, p_ny)}\,.$$
A vector space over the reals is a set $\mathcal X$ that is closed under the operations of addition and multiplication by scalars, i.e., if $x, y\in\mathcal X$, then $x + y\in\mathcal X$ and $\alpha x\in\mathcal X$ for all $\alpha\in\mathbb{R}$. All vector spaces in this book are over the reals. A topological vector space is a vector space equipped with a Hausdorff topology that makes the vector space operations continuous. The convex hull of a set $A$, denoted $\mathrm{co}(A)$, is the intersection of all convex sets containing $A$. The closure of $\mathrm{co}(A)$ is denoted $\overline{\mathrm{co}}(A)$. $\mathrm{co}(\{x_1,\ldots,x_N\})$ is compact, and, if $K_i$ are compact, convex sets, then the set $\mathrm{co}(\bigcup_{i=1}^N K_i)$ is closed. A locally convex topological vector space is a vector space that possesses a base of convex sets for its topology.
for a topological vector space. The product of two topological vector spaces is a topological vector space, and is locally convex if each of the coordinate spaces is locally convex. The topological dual of the product space is the product of the topological duals of the coordinate spaces. A set $H\subset\mathcal X^*$ is called separating if for any point $x\in\mathcal X$, $x\ne 0$, one may find an $h\in H$ such that $h(x)\ne 0$. It follows from its definition that $\mathcal X^*$ is separating.
It follows in particular that there may be different topological vector spaces with
the same topological dual. Such examples arise when the original topology on X
is strictly finer than the weak topology.
Unlike in the Euclidean setup, balls need not be convex in a metric space. How-
ever, in normed spaces, all balls are convex. Actually, the following partial con-
verse holds.
Theorem B.9 A topological vector space is normable, i.e., a norm may be de-
fined on it that is compatible with its topology, iff its origin has a convex bounded
neighborhood.
Weak topologies may be defined on Banach spaces and their topological duals. A
striking property of the weak topology of Banach spaces is the fact that compact-
ness, apart from closure, may be checked using sequences.
We collect below some basic results tying measures and functions on locally com-
pact Hausdorff spaces. In most of our applications, the underlying space will be
R. A good reference that contains this material is [Rud87].
C.1 Generalities
The following indicates why Polish spaces are convenient when handling measur-
ability issues. Throughout, unless explicitly stated otherwise, Polish spaces are
equipped with their Borel $\sigma$-fields.
1 ({ : ( ) = 1 }) = 0 ;
It is property (b) that allows for the decomposition of measures. In Polish spaces,
the existence of an r.c.p.d. follows from:
({1 : 1 = 1 }) = 0 .
We now turn our attention to the particular case where $\Sigma$ is metric (and, whenever needed, Polish).
Theorem C.6 Let $\Sigma$ be Polish, and let $\mu\in M_1(\Sigma)$. Then there exists a unique closed set $C_\mu$ such that $\mu(C_\mu) = 1$ and, if $D$ is any other closed set with $\mu(D) = 1$, then $C_\mu\subset D$. Finally,
$$C_\mu = \{\omega\in\Sigma : \text{for every open } U\ni\omega\,,\ \mu(U) > 0\}\,.$$
where $\phi\in C_b(\Sigma)$, $\varepsilon > 0$ and $x\in\mathbb{R}$. If one takes only functions $\phi\in C_b(\Sigma)$ that are of compact support, the resulting topology is the vague topology.
Hereafter, $M_1(\Sigma)$ always denotes $M_1(\Sigma)$ equipped with the weak topology. The following are some basic properties of this topological space.
Theorem C.9 (Prohorov) Let $\Sigma$ be Polish, and let $\Gamma\subset M_1(\Sigma)$. Then $\bar\Gamma$ is compact iff $\Gamma$ is tight.
The following theorem is the analog of Fatou's Lemma for measures. It is proved from Fatou's Lemma either directly or by using the Skorohod representation theorem.
This appendix recalls basic definitions and main results of large deviation theory. We refer the reader to [DeS89] and [DeZ98] for a full treatment.
In what follows, $X$ will be assumed to be a Polish space (that is, a complete separable metric space). We recall that a function $f : X\to\mathbb{R}$ is lower semicontinuous if the level sets $\{x : f(x)\le C\}$ are closed for any constant $C$.
For any open set $O\subset X$,
$$\liminf_{N\to\infty}\frac{1}{a_N}\log\mu_N(O) \ge -\inf_O I\,. \qquad (D.2)$$
For any closed set $F\subset X$,
$$\limsup_{N\to\infty}\frac{1}{a_N}\log\mu_N(F) \le -\inf_F I\,. \qquad (D.3)$$
When it is clear from the context, we omit the reference to the speed or rate function and simply say that the sequence $\{\mu_N\}$ satisfies the LDP. Also, if $x_N$ are $X$-valued random variables distributed according to $\mu_N$, we say that the sequence $\{x_N\}$ satisfies the LDP if the sequence $\{\mu_N\}$ satisfies the LDP.
The proof of a large deviation principle often proceeds first by the proof of a weak large deviation principle, in conjunction with the so-called exponential tightness property.
(b) A rate function $I$ is good if the level sets $\{x\in X : I(x)\le M\}$ are compact for all $M\ge 0$.
Theorem D.4 (a) ([DeZ98, Lemma 1.2.18]) If $\{\mu_N\}$ satisfies the weak LDP and it is exponentially tight, then it satisfies the full LDP, and the rate function $I$ is good.
(b) ([DeZ98, Exercise 4.1.10]) If $\{\mu_N\}$ satisfies the upper bound (D.3) with a good rate function $I$, then it is exponentially tight.
A weak large deviation principle is itself equivalent to the estimation of the probability of deviations towards small balls.
From a given large deviation principle one can deduce a large deviation principle for other sequences of probability measures by using either the so-called contraction principle or Laplace's method.
Theorem D.8 (Varadhan's Lemma) Assume that $(\mu_N)_{N\in\mathbb{N}}$ satisfies a large deviation principle with good rate function $I$. Let $F : X\to\mathbb{R}$ be a bounded continuous function. Then
$$\lim_{N\to\infty}\frac{1}{a_N}\log\int e^{a_NF(x)}\,d\mu_N(x) = \sup_{x\in X}\{F(x) - I(x)\}\,.$$
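As a sanity check, consider a standard example (ours, not the text's): let $\mu_N$ be the law of the empirical mean of $N$ i.i.d. standard Gaussians, which satisfies the LDP with speed $a_N = N$ and good rate function $I(x) = x^2/2$, and take $F(x) = \theta x$. Then
$$\frac1N\log\int e^{N\theta x}\,d\mu_N(x) = \frac1N\log e^{N\theta^2/2} = \frac{\theta^2}{2} = \sup_{x\in\mathbb{R}}\Big\{\theta x - \frac{x^2}{2}\Big\}\,,$$
matching the conclusion of the theorem (this $F$ is unbounded, but the Gaussian tails justify the computation).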
Proof In the special case $f(t) = Bt^\gamma$ we have $F(x) = B\,\Gamma(\gamma+1)/x^{\gamma+1}$, and hence the claim holds. To handle the general case we may assume that $B = 0$. Then we have $\int_0^1 e^{-tx}f(t)\,dt = O\big(\int_0^\infty t^{\gamma+1}e^{-tx}\,dt\big)$ and $\int_1^\infty e^{-tx}f(t)\,dt$ decays exponentially fast, which proves the lemma.
Definition E.1 The field $\mathbb{H}$ is the associative (but not commutative) $\mathbb{R}$-algebra with unit for which $1, i, j, k$ form a basis over $\mathbb{R}$, and in which multiplication is dictated by the rules
$$i^2 = j^2 = k^2 = ijk = -1\,. \qquad (E.1)$$
Remark E.2 Here is a concrete model for the quaternions in terms of matrices. Note that the matrices
$$\begin{pmatrix} i & 0\\ 0 & -i\end{pmatrix}\,, \qquad \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\,, \qquad \begin{pmatrix}0 & i\\ i & 0\end{pmatrix}$$
with complex number entries satisfy the rules (E.1). It follows that the map
$$a + bi + cj + dk \mapsto \begin{pmatrix} a + bi & c + di\\ -c + di & a - bi\end{pmatrix} \qquad (a, b, c, d\in\mathbb{R})$$
is an isomorphism of $\mathbb{H}$ onto a subring of the ring of 2-by-2 matrices with entries in $\mathbb{C}$. The quaternions often appear in the literature identified with 2-by-2 matrices in this way. We do not use this identification in this book.
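The relations (E.1) for this matrix model can be verified mechanically (a small sketch of ours):

```python
import numpy as np

I2 = np.eye(2)
qi = np.array([[1j, 0], [0, -1j]])
qj = np.array([[0, 1], [-1, 0]], dtype=complex)
qk = np.array([[0, 1j], [1j, 0]])

# The defining relations (E.1): i^2 = j^2 = k^2 = ijk = -1.
for M in (qi @ qi, qj @ qj, qk @ qk, qi @ qj @ qk):
    assert np.allclose(M, -I2)
print("relations (E.1) verified")
```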
For every
$$x = a + bi + cj + dk\in\mathbb{H} \qquad (a, b, c, d\in\mathbb{R})$$
we define
$$\|x\| = \sqrt{a^2 + b^2 + c^2 + d^2}\,, \qquad x^* = a - bi - cj - dk\,, \qquad \Re x = a\,.$$
We then have
$$\|x\|^2 = xx^*\,, \quad \|xy\| = \|x\|\,\|y\|\,, \quad (xy)^* = y^*x^*\,, \quad \Re x = \frac{x + x^*}{2}\,, \quad \Re(xy) = \Re(yx)$$
for all $x, y\in\mathbb{H}$. In particular, we have $x^{-1} = x^*/\|x\|^2$ for nonzero $x\in\mathbb{H}$.
The space of all real multiples of $1\in\mathbb{H}$ is a copy of $\mathbb{R}$ and the space of all real linear combinations of $1$ and $i$ is a copy of $\mathbb{C}$. Thus $\mathbb{R}$ and $\mathbb{C}$ can be and will be identified with subfields of $\mathbb{H}$, and in particular both $i$ and $\sqrt{-1}$ will be used to denote the imaginary unit of the complex numbers. In short, we think of $\mathbb{R}$, $\mathbb{C}$ and $\mathbb{H}$ as forming a tower
$$\mathbb{R}\subset\mathbb{C}\subset\mathbb{H}\,.$$
Let $\mathrm{Mat}_{p\times q}(F)$ denote the space of $p$-by-$q$ matrices with entries in $F$. Given $X\in\mathrm{Mat}_{p\times q}(F)$, let $X_{ij}\in F$ denote the entry of $X$ in row $i$ and column $j$. Let $\mathrm{Mat}_{p\times q} = \mathrm{Mat}_{p\times q}(\mathbb{R})$ and $\mathrm{Mat}_n(F) = \mathrm{Mat}_{n\times n}(F)$. Let $0_{p\times q}$ denote the $p$-by-$q$ zero matrix, and let $0_p = 0_{p\times p}$. Let $I_n$ denote the $n$-by-$n$ identity matrix. Given $X\in\mathrm{Mat}_{p\times q}(F)$, let $X^*\in\mathrm{Mat}_{q\times p}(F)$ be the matrix obtained by transposing $X$ and then applying the asterisk to every entry. The operation $X\mapsto X^*$ is $\mathbb{R}$-linear and, furthermore, $(XY)^* = Y^*X^*$ for all $X\in\mathrm{Mat}_{p\times q}(F)$ and $Y\in\mathrm{Mat}_{q\times r}(F)$. Similarly, we have $(xX)^* = X^*x^*$ for any matrix $X\in\mathrm{Mat}_{p\times q}(F)$ and scalar $x\in F$. Given $X\in\mathrm{Mat}_n(F)$, we define $\mathrm{tr}\,X\in F$ to be the sum of the diagonal entries of $X$. Given $X, Y\in\mathrm{Mat}_{p\times q}(F)$, we set $X\cdot Y = \Re\,\mathrm{tr}\,X^*Y$, thus equipping $\mathrm{Mat}_{p\times q}(F)$ with the structure of a finite-dimensional real Hilbert space (Euclidean space). Given matrices $X_i\in\mathrm{Mat}_{n_i}(F)$ for $i = 1,\ldots,\ell$, let $\mathrm{diag}(X_1,\ldots,X_\ell)\in\mathrm{Mat}_{n_1+\cdots+n_\ell}(F)$ be the block-diagonal matrix obtained by stringing the given matrices $X_i$ along the diagonal.
Definition E.3 The matrix $e_{ij} = e_{ij}^{(p,q)}\in\mathrm{Mat}_{p\times q}$ with entry 1 in row $i$ and column $j$ and 0s elsewhere is called an elementary matrix.
The set
$$\{ue_{ij} \mid u\in F\cap\{1, i, j, k\}\,,\ e_{ij}\in\mathrm{Mat}_{p\times q}\}$$
is an orthonormal basis for $\mathrm{Mat}_{p\times q}(F)$.
Definition E.4 (i) Let $X\in\mathrm{Mat}_n(F)$ be a matrix. It is invertible if there exists $Y\in\mathrm{Mat}_n(F)$ such that $YX = I_n = XY$. It is normal if $X^*X = XX^*$. It is unitary if $X^*X = I_n = XX^*$. It is self-adjoint (resp., anti-self-adjoint) if $X^* = X$ (resp., $X^* = -X$). It is upper triangular (resp., lower triangular) if $X_{ij} = 0$ unless $i\le j$ (resp., $i\ge j$).
(ii) A matrix $X\in\mathrm{Mat}_n(F)$ is monomial if there is exactly one nonzero entry in every row and in every column; if, moreover, every entry of $X$ is either 0 or 1, we call $X$ a permutation matrix.
(iii) A self-adjoint $X\in\mathrm{Mat}_n(F)$ is positive definite if $v^*Xv > 0$ for all nonzero $v\in\mathrm{Mat}_{n\times 1}(F)$.
(iv) A matrix $X\in\mathrm{Mat}_n(F)$ is a projector if it is both self-adjoint and idempotent, that is, if $X = X^* = X^2$.
(v) A matrix $X\in\mathrm{Mat}_{p\times q}(F)$ is diagonal if $X_{ij} = 0$ unless $i = j$. The set of positions $(i,i)$ for $i = 1,\ldots,\min(p,q)$ is called the (main) diagonal of $X$.
The group of invertible elements of $\mathrm{Mat}_n(F)$ is denoted $\mathrm{GL}_n(F)$, while the subgroup of $\mathrm{GL}_n(F)$ consisting of unitary matrices is denoted $\mathrm{U}_n(F)$. Permutation matrices in $\mathrm{Mat}_n$ belong to $\mathrm{U}_n(F)$.
We next present several factorization theorems. The first is obtained by the
Gaussian elimination method.
Theorem E.5 (Gaussian elimination) Let $X\in\mathrm{Mat}_{p\times q}(F)$ have the property that for all $v\in\mathrm{Mat}_{q\times 1}(F)$, if $Xv = 0$, then $v = 0$. Then $p\ge q$. Furthermore, there exists a permutation matrix $P\in\mathrm{Mat}_p(F)$ and an upper triangular matrix $T\in\mathrm{Mat}_q(F)$ with every diagonal entry equal to 1 such that $PXT$ vanishes above the main diagonal but vanishes nowhere on the main diagonal.
Corollary E.7 (UT factorization) Every $X\in\mathrm{GL}_n(F)$ has a unique factorization $X = UT$ where $T\in\mathrm{GL}_n(F)$ is upper triangular with every diagonal entry positive and $U\in\mathrm{U}_n(F)$.
A reference for the proof of the spectral theorem in the unfamiliar case F = H is
[FaP03].
Definition E.10 (Standard blocks) A $\mathbb{C}$-standard block is any element of $\mathrm{Mat}_1(\mathbb{C}) = \mathbb{C}$. An $\mathbb{H}$-standard block is any element of $\mathrm{Mat}_1(\mathbb{C}) = \mathbb{C}$ with nonnegative imaginary part. An $\mathbb{R}$-standard block is either an element of $\mathrm{Mat}_1 = \mathbb{R}$, or a matrix $\begin{pmatrix} a & b\\ -b & a\end{pmatrix}\in\mathrm{Mat}_2$ with $b > 0$. Finally, $X\in\mathrm{Mat}_n(F)$ is $F$-reduced if $X = \mathrm{diag}(B_1,\ldots,B_\ell)$ for some $F$-standard blocks $B_i$.
We call the absolute values of the entries of $D$ the singular values of the rectangular matrix $X$. (When $F = \mathbb{R}, \mathbb{C}$ this is the standard notion of singular value.) The squares of the singular values of $X$ are the eigenvalues of $X^*X$ or $XX^*$, whichever has $\min(p,q)$ rows and columns.
Proposition E.14 Let $0 < p\le q$ be integers and put $n = p + q$. Let $\Pi\in\mathrm{Mat}_n(F)$ be a projector. Then there exists $U\in\mathrm{U}_n(F)$ commuting with $\mathrm{diag}(I_p, 0_q)$ such that
$$U\Pi U^* = \begin{pmatrix} a & b\\ b^T & d\end{pmatrix}\,,$$
where $a\in\mathrm{Mat}_p$, $b\in\mathrm{Mat}_{p\times q}$ and $d\in\mathrm{Mat}_q$ are diagonal with entries in the closed unit interval $[0,1]$.
" #
a
Proof Write = with a Mat p (F), Mat pq (F) and d Matq (F).
d
Since every element of Un (F) commuting with diag(Ip , 0q ) is of the form diag(v, w)
for v U p (F) and w Uq (F), we may by Corollary E.13 assume that a and d are
diagonal and real. Necessarily the diagonal entries of a and d belong to the closed
unit interval [0, 1]. For brevity, write ai = aii and d j = d j j . We may assume that
the diagonal entries of a are ordered so that ai (1 ai ) is nonincreasing as a func-
tion of i, and similarly d j (1 d j ) is nonincreasing as a function of j. We may
further assume that whenever ai (1 ai ) = ai+1 (1 ai+1 ) we have ai ai+1 , but
that whenever d j (1 d j ) = d j+1 (1 d j+1 ) we have d j d j+1 .
We present an identity needed to compute the Ricci curvature of the special orthogonal and special unitary groups, see Lemma F.27 and the discussion immediately following. The identity is well known in Lie algebra theory, but the effort needed to decode a typical statement in the literature is about equal to the effort needed to prove it from scratch. So we give a proof here.
Let $su_n(F)$ be the set of anti-self-adjoint matrices $X\in\mathrm{Mat}_n(F)$ such that, if $F = \mathbb{C}$, then $\mathrm{tr}\,X = 0$. We equip the real vector space $su_n(F)$ with the inner product inherited from $\mathrm{Mat}_n(F)$, namely $X\cdot Y = \Re\,\mathrm{tr}\,X^*Y = -\Re\,\mathrm{tr}\,XY$. Let $[X,Y] = XY - YX$ for $X, Y\in\mathrm{Mat}_n(F)$, noting that $su_n(F)$ is closed under the bracket operation. Let $\beta = 1, 2, 4$ according as $F = \mathbb{R}, \mathbb{C}, \mathbb{H}$.
Proposition E.15 For all $X\in su_n(F)$ and orthonormal bases $\{L_\alpha\}$ for $su_n(F)$, we have
$$\frac14\sum_\alpha\,[[X, L_\alpha], L_\alpha] = \Big(1 - \frac{\beta(n+2)}{4}\Big)X\,. \qquad (E.2)$$
Proof We have $su_1(\mathbb{R}) = su_1(\mathbb{C}) = 0$, and the case $su_1(\mathbb{H})$ can be checked by direct calculation with $i, j, k$. Therefore we assume that $n\ge 2$ for the rest of the proof.
Now for fixed $X\in su_n(F)$, the expression $[[X, L], M]$ for $L, M\in su_n(F)$ is an $\mathbb{R}$-bilinear form on $su_n(F)$. It follows that the left side of (E.2) is independent of the choice of orthonormal basis $\{L_\alpha\}$. We are therefore free to choose $\{L_\alpha\}$ at our convenience, and we do so as follows. Let $e_{ij}\in\mathrm{Mat}_n$ for $i, j = 1,\ldots,n$ be the elementary matrices. For $1\le k < n$ and $u\in\{i, j, k\}$, let
$$D_k^u = \frac{u}{\sqrt{k + k^2}}\Big(-k\,e_{k+1,k+1} + \sum_{i=1}^k e_{ii}\Big)\,, \qquad D_k = D_k^i\,, \qquad D_n^u = \frac{u}{\sqrt n}\sum_{i=1}^n e_{ii}\,.$$
(I) Given $\{L_\alpha\}$ and $X$ for which (E.2) holds and any $U\in\mathrm{U}_n(F)$, again (E.2) holds for $\{UL_\alpha U^*\}$ and $UXU^*$.
Claim (I) holds because the operation $X\mapsto UXU^*$ stabilizes $su_n(F)$, preserves the bracket $[X,Y]$, and preserves the inner product $X\cdot Y$. We turn to the proof of claim (II). By considering conjugations that involve appropriate 2-by-2 blocks, one can generate any element of the collection $\{F_{12}^u, D_1^u\}$ from $E_{12}$. Further, using conjugation by permutation matrices and taking linear combinations, one can generate $\{F_{ij}^u, D_k^u\}$. Finally, to obtain $D_n^u$, it is enough to show that $\mathrm{diag}(i, -i, 0,\ldots,0)$ can be generated, and this follows from the identity
It follows that the left side of (E.2) with $X = E_{12}$ and $\{L_\alpha\}$ specially chosen as above equals $cE_{12}$, where the constant $c$ is equal to
$$\frac14\Big(-\frac{\beta}{2}\cdot 2(n-2) - 2\cdot 2(\beta - 1)\Big) = 1 - \frac{\beta(n+2)}{4}\,.$$
Since (E.2) holds with $X = E_{12}$ and specially chosen $\{L_\alpha\}$, by the previous steps it holds in general. The proof of the proposition is finished.
F Manifolds
(A locally closed set is the intersection of a closed set with an open set.) We refer to $E$ as the ambient space of $M$.
We consider $\mathbb{R}^n$ as Euclidean space by adopting the standard inner product $(x,y)_{\mathbb{R}^n} = x\cdot y = \sum_{i=1}^n x_iy_i$. Given Euclidean spaces $E$ and $F$, and a map $f : U\to V$ from an open subset of $E$ to an open subset of $F$, we say that $f$ is smooth if (after identifying $E$ with $\mathbb{R}^n$ and $F$ with $\mathbb{R}^k$ as vector spaces over $\mathbb{R}$ in some way) $f$ is infinitely differentiable.
Given for $i = 1, 2$ a Euclidean set $M_i$ with ambient space $E_i$, we define the product $M_1\times M_2$ to be the subset $\{m_1\oplus m_2 \mid m_1\in M_1, m_2\in M_2\}$ of the orthogonal direct sum $E_1\oplus E_2$.
Let f : M N be a map from one Euclidean set to another. We say that f is
smooth if for every point p M there exists an open neighborhood U of p in the
ambient space of M such that f |UM can be extended to a smooth map from U to
the ambient space of N. If f is smooth, then f is continuous. We say that f is a
diffeomorphism if f is smooth and has a smooth inverse, in which case we also
say that M and N are diffeomorphic. Note that the definition implies that every
n-dimensional linear subspace of a Euclidean space is diffeomorphic to Rn .
Remark F.4 Isometries need not preserve distances in ambient Euclidean spaces. For example, $\{(x,y)\in\mathbb{R}^2 : x^2 + y^2 = 1\}\setminus\{(1,0)\}\subset\mathbb{R}^2$ and $\{0\}\times(0, 2\pi)\subset\mathbb{R}^2$ are isometric.
(ii) The chart measure $\rho_{T,\psi}$ on the Borel sets of $T$ is the measure absolutely continuous with respect to Lebesgue measure restricted to $T$, $\lambda_T$, defined by
$$\frac{d\rho_{T,\psi}}{d\lambda_T} = \sigma\,.$$
Critical vocabulary
Our usage of the term regular value therefore does not conform to the traditions
of differential topology. In the latter context, a regular value is simply a point
which is not a critical value.
The following facts, which we use repeatedly, are straightforwardly deduced
from the definitions.
Regular values are easier to handle than critical ones. Sard's Theorem allows one to restrict attention, when integrating, to such values.
Theorem F.11 (Sard) [Mil97, Chapter 3] The set of critical values of a smooth
map of manifolds is negligible.
Definition F.12 A Lie group G is a manifold with ambient space Matn (F) for some
n and F such that G is a closed subgroup of GLn (F).
This ad hoc definition is of course not as general as possible but it is simple and
suits our purposes well. For example, GLn (F) is a Lie group. By Lemma 4.1.15,
Un (F) is a Lie group.
Let $G$ be a locally compact topological group, e.g., a Lie group. Let $\mu$ be a measure on the Borel sets of $G$. We say that $\mu$ is left-invariant if $\mu(A) = \mu(gA)$, where $gA = \{ga \mid a\in A\}$, for all Borel $A\subset G$ and $g\in G$. Right-invariance is defined analogously.
We note that Lebesgue measure in $\mathbb{R}^n$ is a Haar measure. Further, for any Lie group $G$ contained in $\mathrm{U}_n(F)$, the volume measure $\rho_G$ is by Proposition F.8(vi) and Lemma 4.1.13(iii) a Haar measure.
In this subsection, we prove the coarea formula, Theorem 4.1.8. We begin by in-
troducing the notion of f -adapted pairs of charts, prove a few preliminary lemmas,
and then provide the proof of the theorem. Lemmas F.18 and F.19 can be skipped
in the course of the proof of the coarea formula, but are included since they are
useful in Section 4.1.3.
Let $f : M\to N$ be a smooth map from an $n$-manifold to a $k$-manifold and assume that $n\ge k$. Let $\pi : \mathbb{R}^n\to\mathbb{R}^k$ be the projection to the first $k$ coordinates. Recall that a chart on $M$ is an open nonempty subset $S\subset\mathbb{R}^n$ together with a diffeomorphism from $S$ to an open subset of $M$.
$$S\subset\pi^{-1}(T)\subset\mathbb{R}^n\,, \qquad U\subset f^{-1}(V)\,, \qquad \psi^{-1}\circ f\circ\phi = \pi|_S\,,$$
in which case we also say that the open set $U\subset M$ is good for $f$.
Proof Without loss we may assume that $M\subset\mathbb{R}^n$ and $N\subset\mathbb{R}^k$ are open sets. We may also assume that $p = 0\in\mathbb{R}^n$ and $q = f(p) = 0\in\mathbb{R}^k$. Write $f = (f_1,\ldots,f_k)$. Let $t_1,\ldots,t_n$ be the standard coordinates in $\mathbb{R}^n$. By hypothesis, for some permutation $\sigma$ of $\{1,\ldots,n\}$, putting $g_i = f_i$ for $i = 1,\ldots,k$ and $g_i = t_{\sigma(i)}$ for $i = k+1,\ldots,n$, the determinant $\det_{i,j=1}^n\partial_jg_i$ does not vanish at the origin. By the inverse function theorem there exist open neighborhoods $U, S\subset\mathbb{R}^n$ of the origin such that $\Phi = (f_1|_U,\ldots,f_k|_U, t_{\sigma(k+1)}|_U,\ldots,t_{\sigma(n)}|_U)$ maps $U$ diffeomorphically to $S$. Take $\phi$ to be the inverse of $\Phi$. Take $\psi$ to be the identity map of $N$ to itself. Then $(\phi,\psi)$ is an $f$-adapted pair of charts and the origin belongs to the image of $\phi$.
Proof We may assume that $M_{\mathrm{reg}}\ne\emptyset$ and hence $n\ge k$, for otherwise there is nothing to prove. By Lemma F.15 we may assume that $M\subset\mathbb{R}^n$ and $N\subset\mathbb{R}^k$ are open sets and that $f$ is projection to the first $k$ coordinates, in which case all assertions here are obvious.
Definition F.17 Let $f : E\to F$ be a linear map between Euclidean spaces and let $f^T : F\to E$ be the adjoint of $f$. The generalized determinant $J(f)$ is defined as the square root of the determinant of $f\circ f^T : F\to F$.
way that $x_1,\ldots,x_{n+k}$ (resp., $y_1,\ldots,y_n$) becomes the standard basis in $\mathbb{R}^{n+k}$ (resp., $\mathbb{R}^n$). Then $f$ is represented by the matrix $[A\ 0]$, where $0\in\mathrm{Mat}_{n\times k}$. Finally, by definition, $J(f)^2 = \det\big([A\ 0][A\ 0]^T\big) = \det A^TA$, which proves the result.
Proof Let $A$ (resp., $B$) be the $n$-by-$n$ (resp., $k$-by-$k$) real symmetric positive definite matrix with entries $A_{ij} = (x_i, x_j)_E$ (resp., $B_{ij} = (y_i, y_j)_F$). Let $C$ be the $(n-k)$-by-$(n-k)$ block of $A$ in the lower right corner. We have to prove that $J(f)^2\det A = \det C\,\det B$. Make $\mathbb{R}$-linear (but in general not isometric) identifications $E = \mathbb{R}^n$ and $F = \mathbb{R}^k$ in such a way that $\{x_i\}_{i=1}^n$ (respectively, $\{y_i\}_{i=1}^k$) is the standard basis in $\mathbb{R}^n$ (respectively, $\mathbb{R}^k$), and (hence) $f$ is projection to the first $k$ coordinates.
Let $P$ be the $k$-by-$n$ matrix with 1s along the main diagonal and 0s elsewhere. Then we have $fx = Px$ for all $x\in E$. Let $Q$ be the unique $n$-by-$k$ matrix such that $f^Ty = Qy$ for all $y\in F = \mathbb{R}^k$. Now the inner product on $E$ is given in terms of $A$ by the formula $(x,y)_E = x^TAy$ and similarly $(x,y)_F = x^TBy$. By definition of $Q$ we have $(Px)^TBy = x^TA(Qy)$ for all $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^k$, hence $P^TB = AQ$, and hence $Q = A^{-1}P^TB$. By definition of $J(f)$ we have $J(f)^2 = \det(PA^{-1}P^TB) = \det(PA^{-1}P^T)\det B$. Now decompose $A$ into blocks thus:
$$A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}\,, \qquad a = PAP^T\,,\quad d = C\,.$$
From the matrix inversion lemma, Lemma A.1, it follows that $\det(PA^{-1}P^T) = \det C/\det A$. The result follows.
$$S_t = \{x\in\mathbb{R}^{n-k} \mid (t,x)\in U\}$$
is a chart of $M_{\mathrm{reg}}\cap f^{-1}(\psi(t))$, and hence the correction factor $\sigma_t$, see Definition F.6, is defined.
Proof Use Lemma F.20 to calculate $J(T_{\phi(s)}(f))$, taking $\{(\partial_i\phi)(s)\}_{i=1}^n$ as the basis for the domain of $T_{\phi(s)}(f)$ and $\{(\partial_i\psi)(\pi(s))\}_{i=1}^k$ as the basis for the range.
Proof of Theorem 4.1.8 We may assume that $M_{\mathrm{reg}}\ne\emptyset$ and hence $n\ge k$, for otherwise there is nothing to prove. Lemma F.21 expresses the function $p\mapsto J(T_p(f))$ locally in a fashion which makes continuity on $M_{\mathrm{reg}}$ clear. Moreover, $M_{\mathrm{crit}} = \{p\in M \mid J(T_p(f)) = 0\}$. Thus the function in question is indeed Borel-measurable. (In fact it is continuous, but to prove that fact requires uglier formulas.) Thus part (i) of the theorem is proved. We turn to the proof of parts (ii) and (iii) of the theorem. Since on the set $M_{\mathrm{crit}}$ no contribution is made to any of the integrals under consideration, we may assume that $M = M_{\mathrm{reg}}$. We may assume that $\phi$ is the indicator of a Borel subset $A\subset M$. By Lemma F.15 the manifold $M$ is covered by open sets good for $f$. Accordingly $M$ can be expressed as a countable disjoint union of Borel sets each of which is contained in an open set good for $f$, say $M = \bigcup M_\ell$. By monotone convergence we may replace $A$ by $A\cap M_\ell$ for some index $\ell$, and thus we may assume that for some $f$-adapted pair $(\phi : S\to U, \psi : T\to V)$ of charts we have $A\subset U$. We adopt again the notation introduced in Lemma F.21. We have
$$\int_A J(T_p(f))\,d\rho_M(p) = \int_{\phi^{-1}(A)} J(T_{\phi(s)}(f))\,d\rho_{S,\phi}(s) = \int_T\Big(\int_{\phi^{-1}(A)_t} d\rho_{S_t,\phi_t}(x)\Big)d\rho_{T,\psi}(t) = \int\Big(\int_{A\cap f^{-1}(q)} d\rho_{f^{-1}(q)}(p)\Big)d\rho_N(q)\,.$$
At the first and last steps we appeal to Proposition F.8(i) which characterizes the measures $\rho_{(\cdot)}$. At the crucial second step we apply Lemma F.21 and Fubini's Theorem. The last calculation proves both the measurability assertion (ii) and the integral formula (iii).
Definition F.22 (i) A vector field (on $M$) is a smooth map $X$ from $M$ to its ambient space such that, for all $p\in M$, $X(p)\in T_p(M)$. Given a vector field $X$ and a smooth function $f\in C^\infty(M)$, we define the function $Xf\in C^\infty(M)$ by the requirement that $Xf(p) = \frac{d}{dt}f(\gamma(t))|_{t=0}$ for any curve $\gamma$ through $p$ with $\gamma'(0) = X(p)$.
(ii) If $X, Y$ are vector fields, we define $g(X,Y)\in C^\infty(M)$ by
The Lie bracket $[X,Y]$ is the unique vector field satisfying, for all $f\in C^\infty(M)$,
$$[X,Y]f = X(Yf) - Y(Xf)\,.$$
Definition F.23 (i) For $f\in C^\infty(M)$, the gradient $\mathrm{grad}\,f$ is the unique vector field satisfying $g(X, \mathrm{grad}\,f) = Xf$ for all vector fields $X$. If $\{L_i\}$ is any local orthonormal frame, then $\mathrm{grad}\,f = \sum_i(L_if)L_i$.
(ii) A connection $\nabla$ is a bilinear operation associating with vector fields $X$ and $Y$ a vector field $\nabla_XY$ such that, for any $f\in C^\infty(M)$,
$$\nabla_{fX}Y = f\nabla_XY\,, \qquad \nabla_X(fY) = f\nabla_XY + X(f)Y\,.$$
(iv) Given a vector field $X$, the divergence $\mathrm{div}\,X\in C^\infty(M)$ is the unique function satisfying, for any orthonormal local frame $\{L_i\}$,
From part (iv) of Definition F.23, we have the classical integration by parts formula: for all functions $\phi, \psi\in C^\infty(M)$ at least one of which is compactly supported,
$$\int g(\mathrm{grad}\,\phi, \mathrm{grad}\,\psi)\,d\rho = -\int\phi\,(\Delta\psi)\,d\rho\,. \qquad (F.1)$$
Definition F.24 Given $f\in C^\infty(M)$, we define the Hessian $\mathrm{Hess}\,f$ to be the operation associating with two vector fields $X$ and $Y$ the function
$$\mathrm{Hess}(f)(X,Y) = (XY - \nabla_XY)f = g(\nabla_X\,\mathrm{grad}\,f, Y) = \mathrm{Hess}(f)(Y,X)\,.$$
(The second and third equalities can be verified from the definition of the Levi-Civita connection.)
We have $\mathrm{Hess}(f)(hX,Y) = \mathrm{Hess}(f)(X,hY) = h\,\mathrm{Hess}(f)(X,Y)$ for all $h\in C^\infty(M)$ and hence $(\mathrm{Hess}(f)(X,Y))(p)$ depends only on $X(p)$ and $Y(p)$.
With respect to any orthonormal local frame {Li }, we have the relations
Definition F.25 (i) The Riemann curvature tensor $R(\cdot,\cdot)$ associates with vector fields $X, Y$ an operator $R(X,Y)$ on vector fields defined by the formula
$$R(X,Y)Z = \nabla_X(\nabla_YZ) - \nabla_Y(\nabla_XZ) - \nabla_{[X,Y]}Z\,.$$
(ii) The Ricci curvature tensor associates with vector fields $X$ and $Y$ the function $\mathrm{Ric}(X,Y)\in C^\infty(M)$, which, with respect to any orthonormal local frame $\{L_i\}$, satisfies $\mathrm{Ric}(X,Y) = \sum_i g(R(X,L_i)L_i, Y)$.
$$\mathrm{Ric}(X,X) = -\frac14\sum_\alpha\,[[X,L_\alpha],L_\alpha]\cdot X\,, \qquad (F.5)$$
where the sum runs over any orthonormal basis $\{L_\alpha\}$ of $T_{I_n}(G)$.
and thus no uniform strictly positive lower bound on the Ricci tensor exists for $G = \mathrm{U}_N(\mathbb{C})$. We also note that (F.6) remains valid for $G = \mathrm{U}_N(\mathbb{H})$ and $\beta = 4$.
x(yz) = (xy)z,
(x + y)z = xz + yz, x(y + z) = xy + xz,
(xy) = ( x)y = x( y).
We will say that A is unital if there exists a unit element e A such that xe =
ex = x (e is necessarily unique because if e is also a unit then ee = e = e e = e) .
A group algebra F(G) of a group (G, ◦) over a field F is the set {Σ_{g∈G} a_g g : a_g ∈ F} of linear combinations of finitely many elements of G with coefficients in F (above, a_g = 0 except for finitely many g). F(G) is the algebra over F with addition and multiplication
Σ_{g∈G} a_g g + Σ_{g∈G} b_g g = Σ_{g∈G} (a_g + b_g) g ,   (Σ_{g∈G} a_g g)(Σ_{h∈G} b_h h) = Σ_{g,h∈G} a_g b_h (g ◦ h) .
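For example, for the two-element group G = Z/2Z = {e, g} and F = C, the multiplication rule reads
(a_e e + a_g g)(b_e e + b_g g) = (a_e b_e + a_g b_g) e + (a_e b_g + a_g b_e) g ,
so that C(Z/2Z) is isomorphic to C[x]/(x² − 1).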
Here ᾱ denotes the complex conjugate of α. Note that the assumption ‖a‖ = ‖a*‖ ensures the continuity of the involution.
The following collects some of the fundamental properties of Banach algebras (see [Rud91, pp. 234–235]).
Theorem G.3 Let A be a unital Banach algebra and let G(A) denote the invertible elements of A. Then G(A) is open, and it is a group under multiplication. Furthermore, for every a ∈ A, the spectrum of a, defined as
sp(a) = {λ ∈ C : λe − a ∉ G(A)} ,
is nonempty, compact and, defining the spectral radius
ρ(a) = sup{|λ| : λ ∈ sp(a)} ,
we have that
ρ(a) = lim_{n→∞} ‖aⁿ‖^{1/n} = inf_{n≥1} ‖aⁿ‖^{1/n} .
Let B(H) denote the space of bounded linear operators on the Hilbert space H. We define the adjoint T* of any T ∈ B(H) as the unique element of B(H) satisfying
⟨Tx, y⟩ = ⟨x, T*y⟩ ,   x, y ∈ H . (G.3)
The space B(H), equipped with the involution T ↦ T* and the operator norm ‖T‖ = sup{‖Tx‖ : ‖x‖ ≤ 1}, has the structure of a C*-algebra, see Definition G.2, and a fortiori that of a Banach algebra. Therefore, Theorem G.3 applies, and we denote by sp(T) the spectrum of the operator T ∈ B(H).
We have (see [Rud91, Theorem 12.26]) the following.
The GNS construction (Theorem 5.2.24) discussed in the main text can be used
to prove the following fundamental fact (see [Rud91, Theorem 12.41]).
Theorem G.5 For every C*-algebra A there exists a Hilbert space H_A and a norm-preserving ∗-homomorphism π_A : A → B(H_A).
Let M be a σ-algebra of subsets of a set Ω. A resolution of the identity is a map
E : M → B(H)
with the following properties.
(i) E(∅) = 0, E(Ω) = I.
(ii) Each E(ω) is a self-adjoint projection.
(iii) E(ω ∩ ω′) = E(ω)E(ω′).
(iv) If ω ∩ ω′ = ∅, then E(ω ∪ ω′) = E(ω) + E(ω′).
(v) For every x ∈ H and y ∈ H, the set function E_{x,y}(ω) = ⟨E(ω)x, y⟩ is a complex measure on M.
When M is the σ-algebra of all Borel sets on a locally compact Hausdorff space, it is customary to add the requirement that each E_{x,y} is a regular Borel measure (this is automatically satisfied on compact metric spaces). Then we have the following theorem. (For bounded operators, see [Rud91, Theorem 12.23], and for unbounded operators, see [Ber66] or references therein.)
Note that sp(T) is a bounded set if T ∈ B(H), ensuring that E_{x,y} is a compactly supported measure for all x, y ∈ H. For any bounded measurable function f on sp(T), we can use the spectral theorem to define f(T) by
f(T) = ∫_{sp(T)} f(λ) dE(λ) .
We then have (see [Rud91, Section 12.24]) the following.
Theorem G.7
(i) f ↦ f(T) is a homomorphism of the algebra of all bounded Borel functions on sp(T) into B(H) which carries the function 1 to I and the identity function to T, and which satisfies f̄(T) = f(T)*.
(ii) ‖f(T)‖ ≤ sup{|f(λ)| : λ ∈ sp(T)}, with equality for continuous f.
(iii) If f_n converges to f uniformly on sp(T), then ‖f_n(T) − f(T)‖ goes to zero as n goes to infinity.
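In finite dimensions, where T is a Hermitian matrix and E is supported on the eigenvalues, f(T) can be computed from an eigendecomposition. The sketch below (ours, in Python with numpy, not part of the text) checks the assertions of part (i) numerically:

  import numpy as np

  rng = np.random.default_rng(0)
  B = rng.standard_normal((4, 4))
  T = (B + B.T) / 2  # a self-adjoint operator on R^4

  def f_of_T(f, T):
      # spectral theorem in finite dimensions: f(T) = U diag(f(lambda)) U*
      lam, U = np.linalg.eigh(T)
      return U @ np.diag(f(lam)) @ U.T.conj()

  print(np.allclose(f_of_T(lambda x: x, T), T))         # identity function -> T
  print(np.allclose(f_of_T(lambda x: x**2, T), T @ T))  # homomorphism property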
Another good property of closed and densely defined operators (not necessarily
self-adjoint) is the existence of a polar decomposition.
Theorem G.9 [DuS58, p. 1249] Let T be a closed, densely defined operator. Then T can be written uniquely as a product T = PA, where P is a partial isometry (that is, P*P is a projection), A is a nonnegative self-adjoint operator, the closures of the ranges of A and T* coincide, and both are contained in the domain of P.
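In finite dimensions the polar decomposition is readily obtained from a singular value decomposition T = U S V*: take A = (T*T)^{1/2} = V S V* and P = U V*. A minimal numerical sketch (ours, not part of the text; for a square matrix P even comes out unitary):

  import numpy as np

  rng = np.random.default_rng(1)
  T = rng.standard_normal((3, 3))
  U, s, Vh = np.linalg.svd(T)
  A = Vh.T.conj() @ np.diag(s) @ Vh  # nonnegative self-adjoint factor (T*T)^{1/2}
  P = U @ Vh                         # (partial) isometry
  print(np.allclose(P @ A, T))       # T = PA
  print(np.allclose(A, A.T.conj()), np.all(s >= 0))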
Some authors use the term positive functional where we use nonnegative func-
tional.
Theorem G.15 (Kaplansky density theorem) Let H be a Hilbert space and let A ⊂ B(H) be a C*-algebra with strong closure B. Let A_sa and B_sa denote the self-adjoint elements of A and B. Then:
(i) A_sa is strongly dense in B_sa;
(ii) the closed unit ball of A_sa is strongly dense in the closed unit ball of B_sa;
(iii) the closed unit ball of A is strongly dense in the closed unit ball of B.
Von Neumann algebras are classified into three types: I, II and III [Li92, Chap-
ter 6]. The class of finite von Neumann algebras will be of special interest to
us. Since its definition is related to properties of projections, we first describe the
latter (see [Li92, Definition 6.1.1] and [Li92, Proposition 1.3.5]).
(i) Let ε > 0 and let p, q be two projections in A such that τ(p) ≥ 1 − ε and τ(q) ≥ 1 − ε. Then, with r = p ∧ q, τ(r) ≥ 1 − 2ε.
(ii) If p_i is an increasing sequence of projections converging weakly to the identity, then τ(p_i) goes to one.
(iii) Conversely, if p_i is an increasing sequence of projections such that τ(p_i) goes to one, then p_i converges weakly to the identity in A.
Von Neumann algebras equipped with nice tracial states are finite von Neumann
algebras, as stated below.
τ(aa*) ≥ 0 . (G.4)
Using the sub-multiplicativity of the norm and taking the limit as n → ∞ yields (G.7).
In the case when the martingale X_t possesses continuous paths, the bracket ⟨X⟩_t equals its quadratic variation. The usefulness of the notion of bracket of a continuous martingale is apparent in the following.
∫_0^T X_t dB_t := lim_{n→∞} Σ_{k=0}^{n−1} X_{Tk/n} (B_{T(k+1)/n} − B_{Tk/n})
exists, the convergence holds in L², and the limit does not depend on the choice of the discretization of [0, T] (see [KaS91, Chapter 3]).
One can therefore consider the problem of finding solutions to the integral equation
X_t = X_0 + ∫_0^t σ(X_s) dB_s + ∫_0^t b(X_s) ds (H.1)
with a given X_0, σ and b some functions on R^n, and B an n-dimensional Brownian motion. This can be written in differential form as
dX_s = σ(X_s) dB_s + b(X_s) ds . (H.2)
There are at least two notions of solutions: strong solutions and weak solutions.
Definition H.3 [KaS91, Definition 5.2.1] A strong solution of the stochastic differential equation (H.2) on the given probability space (Ω, F, P), with respect to the fixed Brownian motion B and initial condition ξ, is a process {X_t, t ≥ 0} with continuous sample paths such that the following hold.
(i) X_t is adapted to the filtration F_t given by F_t = σ(G_t ∪ N), with
G_t = σ(B_s, s ≤ t; X_0) ,   N = {N : ∃ G ∈ G_∞ with N ⊂ G, P(G) = 0} .
(ii) P(X_0 = ξ) = 1.
(iii) P(∀t, ∫_0^t (|b_i(X_s)| + |σ_{ij}(X_s)|²) ds < ∞) = 1 for all i, j ≤ n.
(iv) The integral equation (H.1) holds P-almost surely.
Definition H.4 [KaS91, Definition 5.3.1] A weak solution of the stochastic differential equation (H.2) is a pair (X, B) and a triple (Ω, F, P) such that (Ω, F, P) is a probability space equipped with a filtration F_t, B is an n-dimensional Brownian motion, and X is a continuous adapted process satisfying (iii) and (iv) of Definition H.3.
for some finite constant K independent of t. Then there exists a unique solution to (H.2), and it is strong. Moreover, it satisfies
E[ ∫_0^T |b(t, X_t)|² dt ] < ∞
for all T ≥ 0.
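For illustration only (our sketch, in Python with numpy; the Euler scheme is an approximation device, not part of the existence theorem), a strong solution path of (H.2) can be simulated for the Ornstein–Uhlenbeck coefficients σ(x) = 1 and b(x) = −x in one dimension:

  import numpy as np

  rng = np.random.default_rng(3)

  def sigma(x): return 1.0   # diffusion coefficient
  def b(x): return -x        # drift coefficient

  T, n = 1.0, 10**4
  dt = T / n
  X = np.empty(n + 1)
  X[0] = 1.0                 # initial condition xi = 1
  for k in range(n):
      dB = rng.standard_normal() * np.sqrt(dt)
      X[k + 1] = X[k] + sigma(X[k]) * dB + b(X[k]) * dt
  print(X[-1])  # one sample of (an approximation to) X_T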
λ_m E(A_T^m) ≤ E( sup_{t≤T} M_t^{2m} ) ≤ Λ_m E(A_T^m) .
is an F_t-martingale.
P( sup_{0≤t≤τ} X_t ≥ ε ) ≤ ε⁻¹ E[X_τ⁺] .
∫_0^T ‖X_t‖² dt = ∫_0^T Σ_{i=1}^d (X_t^i)² dt
is uniformly bounded by the constant A_T. Let {W_t, F_t, t ≥ 0} be a d-dimensional Brownian motion. Then, for any L > 0,
P( sup_{0≤t≤T} | ∫_0^t X_u · dW_u | ≥ L ) ≤ 2 e^{−L²/(2A_T)} .
Proof We denote in short Y_t = ∫_0^t X_u · dW_u and write, for λ > 0,
[Ada69] F. Adams. Lectures on Lie groups. New York, NY, W. A. Benjamin, 1969.
[Adl05] M. Adler. PDEs for the Dyson, Airy and sine processes. Ann. Inst. Fourier (Greno-
ble), 55:18351846, 2005.
[AdvM01] M. Adler and P. van Moerbeke. Hermitian, symmetric and symplectic random
ensembles: PDEs for the distribution of the spectrum. Annals Math., 153:149189,
2001.
[AdvM05] M. Adler and P. van Moerbeke. PDEs for the joint distributions of the Dyson,
Airy and sine processes. Annals Probab., 33:13261361, 2005.
[Aig79] M. Aigner. Combinatorial Theory. New York, NY, Springer, 1979.
[AlD99] D. Aldous and P. Diaconis. Longest increasing subsequences: from patience sort-
ing to the BaikDeiftJohansson theorem. Bull. Amer. Math. Soc. (N.S.), 36:413432,
1999.
[AlKV02] N. Alon, M. Krivelevich and V. H. Vu. On the concentration of eigenvalues of
random symmetric matrices. Israel J. Math., 131:259267, 2002.
[AnAR99] G. E. Andrews, R. Askey and R. Roy. Special Functions, volume 71 of Ency-
clopedia of Mathematics and its Applications. Cambridge University Press, 1999.
[And90] G. W. Anderson. The evaluation of Selberg sums. C.R. Acad. Sci. I.-Math.,
311:469472, 1990.
[And91] G. W. Anderson. A short proof of Selberg's generalized beta formula. Forum
Math., 3:415417, 1991.
[AnZ05] G. W. Anderson and O. Zeitouni. A CLT for a band matrix model. Probab. Theory
Rel. Fields, 134:283338, 2005.
[AnZ08a] G. W. Anderson and O. Zeitouni. A CLT regularized sample covariance matrices.
Ann. Statistics, 36:25532576, 2008.
[AnZ08b] G. W. Anderson and O. Zeitouni. A LLN for finite-range dependent random
matrices. Comm. Pure Appl. Math., 61:11181154, 2008.
[AnBC+00] C. Ané, S. Blachère, D. Chafaï, P. Fougères, I. Gentil, F. Malrieu, C. Roberto and G. Scheffer. Sur les inégalités de Sobolev logarithmiques, volume 11 of Panoramas et Synthèses. Paris, Société Mathématique de France, 2000.
[Ans02] M. Anshelevich. Ito formula for free stochastic integrals. J. Funct. Anal., 188:292
315, 2002.
[Arh71] L. V. Arharov. Limit theorems for the characteristic roots of a sample covariance
matrix. Dokl. Akad. Nauk SSSR, 199:994997, 1971.
[Arn67] L. Arnold. On the asymptotic distribution of the eigenvalues of random matrices.
J. Math. Anal. Appl., 20:262268, 1967.
[AuBP07] A. Auffinger, G. Ben Arous and S. Peche. Poisson convergence for the largest
eigenvalues of heavy tailed random matrices. arXiv:0710.3132v3 [math.PR], 2007.
[Bai93a] Z. D. Bai. Convergence rate of expected spectral distributions of large random
matrices. I. Wigner matrices. Annals Probab., 21:625648, 1993.
[Bai93b] Z. D. Bai. Convergence rate of expected spectral distributions of large random
matrices. II. Sample covariance matrices. Annals Probab., 21:649672, 1993.
[Bai97] Z. D. Bai. Circular law. Annals Probab., 25:494529, 1997.
[Bai99] Z. D. Bai. Methodologies in spectral analysis of large-dimensional random matri-
ces, a review. Stat. Sinica, 9:611677, 1999.
[BaS98a] Z. D. Bai and J. W. Silverstein. No eigenvalues outside the support of the lim-
iting spectral distribution of large-dimensional sample covariance matrices. Annals
Probab., 26:316345, 1998.
[BaS04] Z. D. Bai and J. W. Silverstein. CLT for linear spectral statistics of large-
dimensional sample covariance matrices. Annals Probab., 32:553605, 2004.
[BaY88] Z. D. Bai and Y. Q. Yin. Necessary and sufficient conditions for almost sure
convergence of the largest eigenvalue of a Wigner matrix. Annals Probab., 16:1729
1741, 1988.
[BaY05] Z. D. Bai and J.-F. Yao. On the convergence of the spectral empirical process of
Wigner matrices. Bernoulli, 6:10591092, 2005.
[BaBP05] J. Baik, G. Ben Arous and S. Peche. Phase transition of the largest eigenvalue for
nonnull complex sample covariance matrices. Annals Probab., 33:16431697, 2005.
[BaBD08] J. Baik, R. Buckingham and J. DiFranco. Asymptotics of TracyWidom distri-
butions and the total integral of a Painleve II function. Comm. Math. Phys., 280:463
497, 2008.
[BaDJ99] J. Baik, P. Deift and K. Johansson. On the distribution of the length of the longest
increasing subsequence of random permutations. J. Amer. Math. Soc., 12:11191178,
1999.
[BaDS09] J. Baik, P. Deift and T. Suidan. Some Combinatorial Problems and Random
Matrix Theory. To appear, 2009.
[Bak94] D. Bakry. L'hypercontractivité et son utilisation en théorie des semigroupes, volume 1581 of Lecture Notes in Mathematics, pages 1–114. Berlin, Springer, 1994.
[BaE85] D. Bakry and M. Émery. Diffusions hypercontractives. In Séminaire de probabilités, XIX, 1983/84, volume 1123 of Lecture Notes in Mathematics, pages 177–206. Berlin, Springer, 1985.
[BaNT02] O. E. Barndorff-Nielsen and S. Thorbjrnsen. Levy processes in free probability.
Proc. Natl. Acad. Sci. USA, 99:1657616580, 2002.
[BaNT04] O. E. Barndorff-Nielsen and S. Thorbjrnsen. A connection between free and
classical infinite divisibility. Infin. Dimens. Anal. Qu., 7:573590, 2004.
[BeB07] S. T. Belinschi and H. Bercovici. A new approach to subordination results in free
probability. J. Anal. Math., 101:357365, 2007.
[BN08] S. T. Belinschi and A. Nica. -series and a Boolean Bercovici-Pata bijection for
bounded k-tuples. Adv. Math., 217:141, 2008.
[BeDG01] G. Ben Arous, A. Dembo and A. Guionnet. Aging of spherical spin glasses.
Probab. Theory Rel. Fields, 120:167, 2001.
[BeG97] G. Ben Arous and A. Guionnet. Large deviations for Wigner's law and Voiculescu's non-commutative entropy. Probab. Theory Rel. Fields, 108:517–542,
1997.
[BeG08] G. Ben Arous and A. Guionnet. The spectrum of heavy-tailed random matrices.
Comm. Math. Phys., 278:715751, 2008.
[BeP05] G. Ben Arous and S. Peche. Universality of local eigenvalue statistics for some
sample covariance matrices. Comm. Pure Appl. Math., 58:13161357, 2005.
[BeZ98] G. Ben Arous and O. Zeitouni. Large deviations from the circular law. ESAIM
Probab. Statist., 2:123134, 1998.
[BeG05] F. Benaych-Georges. Classical and free infinitely divisible distributions and ran-
dom matrices. Annals Probab., 33:11341170, 2005.
[BeG09] F. Benaych-Georges. Rectangular random matrices, related convolution. Probab.
Theory Rel. Fields, 144:471515, 2009.
[BeP00] H. Bercovici and V. Pata. A free analogue of Hinčin's characterization of infinite
divisibility. P. Am. Math. Soc., 128:10111015, 2000.
[BeV92] H. Bercovici and D. Voiculescu. Lévy–Hinčin type theorems for multiplicative and
additive free convolution. Pacific J. Math., 153:217248, 1992.
[BeV93] H. Bercovici and D. Voiculescu. Free convolution of measures with unbounded
support. Indiana U. Math. J., 42:733773, 1993.
[Ber66] S.J. Bernau. The spectral theorem for unbounded normal operators. Pacific J.
Math., 19:391406, 1966.
[Bia95] P. Biane. Permutation model for semi-circular systems and quantum random walks.
Pacific J. Math., 171:373387, 1995.
[Bia97a] P. Biane. Free Brownian motion, free stochastic calculus and random matrices. In
Free Probability Theory (Waterloo, ON 1995), volume 12 of Fields Inst. Commun.,
pages 119. Providence, RI, American Mathematical Society, 1997.
[Bia97b] P. Biane. On the free convolution with a semi-circular distribution. Indiana U.
Math. J., 46:705718, 1997.
[Bia98a] P. Biane. Processes with free increments. Math. Z., 227:143174, 1998.
[Bia98b] P. Biane. Representations of symmetric groups and free probability. Adv. Math.,
138:126181, 1998.
[Bia01] P. Biane. Approximate factorization and concentration for characters of symmetric
groups. Int. Math. Res. Not., pages 179192, 2001.
[BiBO05] P. Biane, P. Bougerol and N. O'Connell. Littelmann paths and Brownian paths.
Duke Math. J., 130:127167, 2005.
[BiCG03] P. Biane, M. Capitaine and A. Guionnet. Large deviation bounds for matrix
Brownian motion. Invent. Math., 152:433459, 2003.
[BiS98b] P. Biane and R. Speicher. Stochastic calculus with respect to free Brownian mo-
tion and analysis on Wigner space. Probab. Theory Rel. Fields, 112:373409, 1998.
[BlI99] P. Bleher and A. Its. Semiclassical asymptotics of orthogonal polynomials,
Riemann-Hilbert problem, and universality in the matrix model. Annals Math.,
150:185266, 1999.
[BoG99] S. G. Bobkov and F. G. Gotze. Exponential integrability and transportation cost
related to log-Sobolev inequalities. J. Funct. Anal., 163:128, 1999.
[BoL00] S. G. Bobkov and M. Ledoux. From BrunnMinkowski to BrascampLieb and to
logarithmic Sobolev inequalities. Geom. Funct. Anal., 10:10281052, 2000.
[BoMP91] L. V. Bogachev, S. A. Molchanov and L. A. Pastur. On the density of states of
random band matrices. Mat. Zametki, 50:3142, 157, 1991.
[BoNR08] P. Bourgade, A. Nikeghbali and A. Rouault. Circular Jacobi ensembles and deformed Verblunsky coefficients. arXiv:0804.4512v2 [math.PR], 2008.
[BodMKV96] A. Boutet de Monvel, A. Khorunzhy and V. Vasilchuk. Limiting eigenvalue
distribution of random matrices with correlated entries. Markov Process. Rel. Fields,
2:607636, 1996.
[Bor99] A. Borodin. Biorthogonal ensembles. Nuclear Phys. B, 536:704732, 1999.
[BoOO00] A. Borodin, A. Okounkov and G. Olshanski. Asymptotics of Plancherel mea-
sures for symmetric groups. J. Amer. Math. Soc., 13:481515, 2000.
[BoS03] A. Borodin and A. Soshnikov. Janossy densities. I. Determinantal ensembles. J.
Statist. Phys., 113:595610, 2003.
[For06] P. J. Forrester. Hard and soft edge spacing distributions for random matrix ensem-
bles with orthogonal and symplectic symmetry. Nonlinearity, 19:29893002, 2006.
[FoO08] P. J. Forrester and S. Ole Warnaar. The importance of the Selberg integral. Bulletin
AMS, 45:489534, 2008.
[FoR01] P. J. Forrester and E. M. Rains. Interrelationships between orthogonal, unitary
and symplectic matrix ensembles. In Random Matrix Models and their Applications,
volume 40 of Math. Sci. Res. Inst. Publ., pages 171207. Cambridge, Cambridge Uni-
versity Press, 2001.
[FoR06] P. J. Forrester and E. M. Rains. Jacobians and rank 1 perturbations relating to
unitary Hessenberg matrices. Int. Math. Res. Not., page 48306, 2006.
[FrGZJ95] P. Di Francesco, P. Ginsparg and J. Zinn-Justin. 2D gravity and random matrices.
Phys. Rep., 254:133, 1995.
[FuK81] Z. Furedi and J. Komlos. The eigenvalues of random symmetric matrices. Combi-
natorica, 1:233241, 1981.
[Ge97] L. Ge. Applications of free entropy to finite von Neumann algebras. Amer. J. Math.,
119:467485, 1997.
[Ge98] L. Ge. Applications of free entropy to finite von Neumann algebras. II. Annals
Math., 147:143157, 1998.
[Gem80] S. Geman. A limit theorem for the norm of random matrices. Annals Probab.,
8:252261, 1980.
[GeV85] I. Gessel and G. Viennot. Binomial determinants, paths, and hook length formulae.
Adv. Math., 58:300321, 1985.
[GiT98] D. Gilbarg and N. S. Trudinger. Elliptic Partial Differential Equations of Second Order. New
York, NY, Springer, 1998.
[Gin65] J. Ginibre. Statistical ensembles of complex, quaternion, and real matrices. J.
Math. Phys., 6:440449, 1965.
[Gir84] V. L. Girko. The circular law. Theory Probab. Appl., 29:694706, 1984.
[Gir90] V. L. Girko. Theory of Random Determinants. Dordrecht, Kluwer, 1990.
[GoT03] F. Gotze and A. Tikhomirov. Rate of convergence to the semi-circular law. Probab.
Theory Rel. Fields, 127:228276, 2003.
[GoT07] F. Gotze and A. Tikhomirov. The circular law for random matrices.
arXiv:0709.3995v3 [math.PR], 2007.
[GrKP94] R. Graham, D. Knuth and O. Patashnik. Concrete Mathematics: a Foundation
for Computer Science. Reading, MA, Addison-Wesley, second edition, 1994.
[GrS77] U. Grenander and J. W. Silverstein. Spectral analysis of networks with random
topologies. SIAM J. Appl. Math., 32:499519, 1977.
[GrMS86] M. Gromov, V. Milman and G. Schechtman. Asymptotic Theory of Finite Di-
mensional Normed Spaces, volume 1200 of Lectures Notes in Mathematics. Berlin,
Springer, 1986.
[GrPW91] D. Gross, T. Piran and S. Weinberg. Two dimensional quantum gravity and
random surfaces. In Jerusalem Winter School. Singapore, World Scientific, 1991.
[Gui02] A. Guionnet. Large deviation upper bounds and central limit theorems for band
matrices. Ann. Inst. H. Poincare Probab. Statist., 38:341384, 2002.
[Gui04] A. Guionnet. Large deviations and stochastic calculus for large random matrices.
Probab. Surv., 1:72172, 2004.
[GuJS07] A. Guionnet, V. F. R Jones and D. Shlyakhtenko. Random matrices, free proba-
bility, planar algebras and subfactors. arXiv:math/0712.2904 [math.OA], 2007.
[GuM05] A. Guionnet and M. Mada. Character expansion method for a matrix integral.
Probab. Theory Rel. Fields, 132:539578, 2005.
[GuM06] A. Guionnet and E. Maurel-Segala. Combinatorial aspects of matrix models.
Alea, 1:241279, 2006.
[GuM07] A. Guionnet and E. Maurel-Segala. Second order asymptotics for matrix models.
Ann. Probab., 35:21602212, 2007.
[GuS07] A. Guionnet and D. Shlyakhtenko. On classical analogues of free entropy dimen-
sion. J. Funct. Anal., 251:738771, 2007.
[GuS08] A. Guionnet and D. Shlyakhtenko. Free diffusion and matrix models with strictly
convex interaction. GAFA, 18:18751916, 2008.
[GuZ03] A. Guionnet and B. Zegarlinski. Lectures on logarithmic Sobolev inequalities.
In Seminaire de Probabilites XXXVI, volume 1801 of Lecture Notes in Mathematics.
Paris, Springer, 2003.
[GuZ00] A. Guionnet and O. Zeitouni. Concentration of the spectral measure for large
matrices. Electron. Commun. Prob., 5:119136, 2000.
[GuZ02] A. Guionnet and O. Zeitouni. Large deviations asymptotics for spherical integrals.
J. Funct. Anal., 188:461515, 2002.
[GZ04] A. Guionnet and O. Zeitouni. Addendum to: Large deviations asymptotics for
spherical integrals. J. Funct. Anal., 216:230241, 2004.
[Gus90] R. A. Gustafson. A generalization of Selbergs beta integral. B. Am. Math. Soc.,
22:97105, 1990.
[Haa02] U. Haagerup. Random matrices, free probability and the invariant subspace prob-
lem relative to a von Neumann algebra. In Proceedings of the International Congress
of Mathematicians, Vol. I (Beijing, 2002), pages 273290, Beijing, Higher Education
Press, 2002.
[HaS09] U. Haagerup and H. Schultz. Invariant subspaces for operators in a general II1 -
factor. To appear in Publ. Math. Inst. Hautes Etudes Sci., 2009.
[HaST06] U. Haagerup, H. Schultz and S. Thorbjrnsen. A random matrix approach to the
lack of projections in C (F2 ). Adv. Math., 204:183, 2006.
[HaT99] U. Haagerup and S. Thorbjrnsen. Random matrices and k-theory for exact C -
algebras. Doc. Math., 4:341450, 1999.
[HaT03] U. Haagerup and S. Thorbjrnsen. Random matrices with complex Gaussian en-
tries. Expo. Math., 21:293337, 2003.
[HaT05] U. Haagerup and S. Thorbjrnsen. A new application of random matrices:
Ext(C (F2 )) is not a group. Annals Math., 162:711775, 2005.
[HaLN06] W. Hachem, P. Loubaton and J. Najim. The empirical distribution of the eigen-
values of a Gram matrix with a given variance profile. Ann. Inst. H. Poincare Probab.
Statist., 42:649670, 2006.
[Ham72] J. M. Hammersley. A few seedlings of research. In Proceedings of the Sixth Berke-
ley Symposium on Mathematical Statistics and Probability (University of California,
Berkeley, CA, 1970/1971), Vol. I: Theory of Statistics, pages 345394, Berkeley, CA,
University of California Press, 1972.
[HaM05] C. Hammond and S. J. Miller. Distribution of eigenvalues for the ensemble of real
symmetric Toeplitz matrices. J. Theoret. Probab., 18:537566, 2005.
[HaZ86] J. Harer and D. Zagier. The Euler characteristic of the moduli space of curves.
Invent. Math., 85:457485, 1986.
[Har56] Harish-Chandra. Invariant differential operators on a semisimple Lie algebra. Proc.
Nat. Acad. Sci. U.S.A., 42:252253, 1956.
[HaTW93] J. Harnad, C. A. Tracy and H. Widom. Hamiltonian structure of equations ap-
pearing in random matrices. In Low-dimensional Topology and Quantum Field Theory
(Cambridge, 1992), volume 315 of Adv. Sci. Inst. Ser. B Phys., pages 231245, New
York, NY, NATO, Plenum, 1993.
[HaM80] S. P. Hastings and J. B. McLeod. A boundary value problem associated with the
second Painleve transcendent and the Kortewegde Vries equation. Arch. Rational
Mech. Anal., 73:3151, 1980.
[Hel01] S. Helgason. Differential Geometry, Lie Groups, and Symmetric Spaces, volume 34
of Graduate Studies in Mathematics. Providence, RI, American Mathematical Society,
2001. Corrected reprint of the 1978 original.
[HiP00a] F. Hiai and D. Petz. A large deviation theorem for the empirical eigenvalue distri-
bution of random unitary matrices. Ann. Inst. H. Poincare Probab. Statist., 36:7185,
2000.
[HiP00b] F. Hiai and D. Petz. The Semicircle Law, Free Random Variables and Entropy,
volume 77 of Mathematical Surveys and Monographs. Providence, RI, American
Mathematical Society, 2000.
[HoW53] A. J. Hoffman and H. W. Wielandt. The variation of the spectrum of a normal
matrix. Duke Math. J., 20:3739, 1953.
[HoJ85] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge, Cambridge University
Press, 1985.
[HoX08] C. Houdre and H. Xu. Concentration of the spectral measure for large random
matrices with stable entries. Electron. J. Probab., 13:107134, 2008.
[HoKPV06] J. B. Hough, M. Krishnapur, Y. Peres and B. Virag. Determinantal processes
and independence. Probab. Surv., 3:206229, 2006.
[HoKPV09] J. B. Hough, M. Krishnapur, Y. Peres and B. Virag. Zeros of Gaussian Analytic
Functions and Determinantal Point Processes. Providence, RI, American Mathemat-
ical Society, 2009.
[Ism05] M. E. H. Ismail. Classical and Quantum Orthogonal Polynomials in One Vari-
able, volume 98 of Encyclopedia of Mathematics and its Applications. Cambridge,
Cambridge University Press, 2005.
[ItZ80] C. Itzykson and J. B. Zuber. The planar approximation. II. J. Math. Phys., 21:411
421, 1980.
[Jac85] N. Jacobson. Basic Algebra. I. New York, NY, W. H. Freeman and Company,
second edition, 1985.
[JiMMS80] M. Jimbo, T. Miwa, Y. Mori and M. Sato. Density matrix of an impenetrable
Bose gas and the fifth Painleve transcendent. Physica, 1D:80158, 1980.
[Joh98] K. Johansson. On fluctuations of eigenvalues of random Hermitian matrices. Duke
Math. J., 91:151204, 1998.
[Joh00] K. Johansson. Shape fluctuations and random matrices. Comm. Math. Phys.,
209:437476, 2000.
[Joh01a] K. Johansson. Discrete orthogonal polynomial ensembles and the Plancherel mea-
sure. Annals Math., 153:259296, 2001.
[Joh01b] K. Johansson. Universality of the local spacing distribution in certain ensembles
of Hermitian Wigner matrices. Comm. Math. Phys., 215:683705, 2001.
[Joh02] K. Johansson. Non-intersecting paths, random tilings and random matrices. Probab.
Theory Rel. Fields, 123:225280, 2002.
[Joh05] K. Johansson. The arctic circle boundary and the Airy process. Annals Probab.,
33:130, 2005.
[John01] I. M. Johnstone. On the distribution of the largest eigenvalue in principal compo-
nents analysis. Ann. Statist., 29:295327, 2001.
[Jon82] D. Jonsson. Some limit theorems for the eigenvalues of a sample covariance matrix.
J. Multivariate Anal., 12:138, 1982.
[Juh81] F. Juhasz. On the spectrum of a random graph. In Algebraic Methods in Graph
Theory, Coll. Math. Soc. J. Bolyai, volume 25, pages 313316. Amsterdam, North-
Holland, 1981.
[JuX03] M. Junge and Q. Xu. Noncommutative Burkholder/Rosenthal inequalities. Annals
Probab., 31:948995, 2003.
[Kal02] O. Kallenberg. Foundations of Modern Probability. Probability and its Applications. New York, NY, Springer, second edition, 2002.
[Mac75] O. Macchi. The coincidence approach to stochastic point processes. Adv. Appl.
Probability, 7:83122, 1975.
[MaP67] V. A. Marcenko and L. A. Pastur. Distribution of eigenvalues in certain sets of
random matrices. Math. USSR Sb., 1:457483, 1967.
[Mat97] T. Matsuki. Double coset decompositions of reductive Lie groups arising from two
involutions. J. Algebra, 197:4991, 1997.
[Mat94] A. Matytsin. On the large-N limit of the ItzyksonZuber integral. Nuclear Phys.
B, 411:805820, 1994.
[Mau06] E. Maurel-Segala. High order asymptotics of matrix models and enumeration of
maps. arXiv:math/0608192v1 [math.PR], 2006.
[McTW77] B. McCoy, C. A. Tracy and T. T. Wu. Painleve functions of the third kind. J.
Math. Physics, 18:10581092, 1977.
[McK05] H. P. McKean. Stochastic integrals. Providence, RI, AMS Chelsea Publishing,
2005. Reprint of the 1969 edition, with errata.
[Meh60] M. L. Mehta. On the statistical properties of the level-spacings in nuclear spectra.
Nuclear Phys. B, 18:395419, 1960.
[Meh91] M.L. Mehta. Random Matrices. San Diego, Academic Press, second edition,
1991.
[MeD63] M. L. Mehta and F. J. Dyson. Statistical theory of the energy levels of complex
systems. V. J. Math. Phys., 4:713719, 1963.
[MeG60] M. L. Mehta and M. Gaudin. On the density of eigenvalues of a random matrix.
Nuclear Phys. B, 18:420427, 1960.
[Mil63] J. W. Milnor. Morse Theory. Princeton, NJ, Princeton University Press, 1963.
[Mil97] J. W. Milnor. Topology from the Differentiable Viewpoint. Princeton, NJ, Princeton
University Press, 1997. Revised printing of the 1965 edition.
[MiS05] I. Mineyev and D. Shlyakhtenko. Non-microstates free entropy dimension for
groups. Geom. Funct. Anal., 15:476490, 2005.
[MiN04] J. A. Mingo and A. Nica. Annular noncrossing permutations and partitions, and
second-order asymptotics for random matrices. Int. Math. Res. Not., pages 1413
1460, 2004.
[MiS06] J. A. Mingo and R. Speicher. Second order freeness and fluctuations of random
matrices. I. Gaussian and Wishart matrices and cyclic Fock spaces. J. Funct. Anal.,
235:226270, 2006.
[Mos80] J. Moser. Geometry of quadrics and spectral theory. In The Chern Symposium 1979
(Proc. Int. Sympos., Berkeley, CA., 1979), pages 147188, New York, NY, Springer,
1980.
[Mui81] R. J. Muirhead. Aspects of Multivariate Statistical Theory. New York, NY, John
Wiley & Sons, 1981.
[Mur90] G. J. Murphy. C -algebras and Operator Theory. Boston, MA, Academic Press,
1990.
[Nel74] E. Nelson. Notes on non-commutative integration. J. Funct. Anal., 15:103116,
1974.
[NiS97] A. Nica and R. Speicher. A Fourier transform for multiplicative functions on
non-crossing partitions. J. Algebraic Combin., 6:141160, 1997.
[NiS06] A. Nica and R. Speicher. Lectures on the Combinatorics of Free Probability, vol-
ume 335 of London Mathematical Society Lecture Note Series. Cambridge, Cam-
bridge University Press, 2006.
[NoRW86] J.R. Norris, L.C.G. Rogers and D. Williams. Brownian motions of ellipsoids.
Trans. Am. Math. Soc., 294:757765, 1986.
[Oco03] N. O'Connell. Random matrices, non-colliding processes and queues. In Séminaire de Probabilités XXXVI, volume 1801 of Lecture Notes in Mathematics. Berlin, Springer, 2003.
[PoS03] S. Popa and D. Shlyakhtenko. Universal properties of L(F ) in subfactor theory.
Acta Math., 191:225257, 2003.
[PrS02] M. Prahofer and H. Spohn. Scale invariance of the PNG droplet and the Airy
process. J. Stat. Phys., 108:10711106, 2002.
[Rad94] F. Radulescu. Random matrices, amalgamated free products and subfactors of the
von Neumann algebra of a free group, of noninteger index. Invent. Math., 115:347
389, 1994.
[Rai00] E. Rains. Correlation functions for symmetrized increasing subsequences.
arXiv:math/0006097v1 [math.CO], 2000.
[RaR08] J. A. Ramírez and B. Rider. Diffusion at the random matrix hard edge.
arXiv:0803.2043v3 [math.PR], 2008.
[RaRV06] J. A. Ramírez, B. Rider and B. Virág. Beta ensembles, stochastic Airy spectrum, and a diffusion. arXiv:math/0607331v3 [math.PR], 2006.
[Reb80] R. Rebolledo. Central limit theorems for local martingales. Z. Wahrs. verw. Geb.,
51:269286, 1980.
[ReY99] D. Revuz and M. Yor. Continuous Martingales and Brownian motion, volume 293
of Grundlehren der Mathematischen Wissenschaften. Berlin, Springer, third edition,
1999.
[RoS93] L. C. G. Rogers and Z. Shi. Interacting Brownian particles and the Wigner law.
Probab. Theory Rel. Fields, 95:555570, 1993.
[Roy07] G. Royer. An Initiation to Logarithmic Sobolev Inequalities, volume 14 of
SMF/AMS Texts and Monographs. Providence, RI, American Mathematical Society,
2007. Translated from the 1999 French original.
[Rud87] W. Rudin. Real and Complex Analysis. New York, NY, McGraw-Hill Book Co.,
third edition, 1987.
[Rud91] W. Rudin. Functional Analysis. New York, NY, McGraw-Hill Book Co, second
edition, 1991.
[Rud08] M. Rudelson. Invertibility of random matrices: norm of the inverse. Annals Math.,
168:575600, 2008.
[RuV08] M. Rudelson and R. Vershynin. The LittlewoodOfford problem and invertibility
of random matrices. Adv. Math., 218:600633, 2008.
[Rue69] D. Ruelle. Statistical Mechanics: Rigorous Results. Amsterdam, Benjamin, 1969.
[SaMJ80] M. Sato, T. Miwa and M. Jimbo. Holonomic quantum fields I-V. Publ. RIMS
Kyoto Univ., 14:223267, 15:201278, 15:577629, 15:871972, 16:531584, 1978-
1980.
[ScS05] J. H. Schenker and H. Schulz-Baldes. Semicircle law and freeness for random
matrices with symmetries or correlations. Math. Res. Lett., 12:531542, 2005.
[Sch05] H. Schultz. Non-commutative polynomials of independent Gaussian random ma-
trices. The real and symplectic cases. Probab. Theory Rel. Fields, 131:261309, 2005.
[Sel44] A. Selberg. Bermerkninger om et multipelt integral. Norsk Mat. Tidsskr., 26:7178,
1944.
[Shl96] D. Shlyakhtenko. Random Gaussian band matrices and freeness with amalgama-
tion. Int. Math. Res. Not., pages 10131025, 1996.
[Shl98] D. Shlyakhtenko. Gaussian random band matrices and operator-valued free proba-
bility theory. In Quantum Probability (Gdansk, 1997), volume 43 of Banach Center
Publ., pages 359368. Warsaw, Polish Acad. Sci., 1998.
[SiB95] J. Silverstein and Z. D. Bai. On the empirical distribution of eigenvalues of large
dimensional random matrices. J. Multivariate Anal., 54:175192, 1995.
[Sim83] L. Simon. Lectures on Geometric Measure Theory, volume 3 of Proceedings of the
Centre for Mathematical Analysis, Australian National University. Canberra, Aus-
tralian National University Centre for Mathematical Analysis, 1983.
[Sim05a] B. Simon. Orthogonal Polynomials on the Unit Circle, I, II. American Math-
ematical Society Colloquium Publications. Providence, RI, American Mathematical
Society, 2005.
[Sim05b] B. Simon. Trace Ideals and their Applications, volume 120 of Mathematical
Surveys and Monographs. Providence, RI, American Mathematical Society, second
edition, 2005.
[Sim07] B. Simon. CMV matrices: five years after. J. Comput. Appl. Math., 208:120154,
2007.
[SiS98a] Ya. Sinai and A. Soshnikov. Central limit theorem for traces of large random
symmetric matrices with independent matrix elements. Bol. Soc. Bras. Mat., 29:124,
1998.
[SiS98b] Ya. Sinai and A. Soshnikov. A refinement of Wigners semicircle law in a neigh-
borhood of the spectrum edge for random symmetric matrices. Funct. Anal. Appl., 32:114–131, 1998.
[Sni02] P. Sniady. Random regularization of Brown spectral measure. J. Funct. Anal.,
193:291313, 2002.
[Sni06] P. Sniady. Asymptotics of characters of symmetric groups, genus expansion and
free probability. Discrete Math., 306:624665, 2006.
[Sod07] S. Sodin. Random matrices, nonbacktracking walks, and orthogonal polynomials.
J. Math. Phys., 48:123503, 21, 2007.
[Sos99] A. Soshnikov. Universality at the edge of the spectrum in Wigner random matrices.
Commun. Math. Phys., 207:697733, 1999.
[Sos00] A. Soshnikov. Determinantal random point fields. Uspekhi Mat. Nauk, 55:107160,
2000.
[Sos02a] A. Soshnikov. Gaussian limit for determinantal random point fields. Annals
Probab., 30:171187, 2002.
[Sos02b] A. Soshnikov. A note on universality of the distribution of the largest eigenvalues
in certain sample covariance matrices. J. Statist. Phys., 108:10331056, 2002.
[Sos03] A. Soshnikov. Janossy densities. II. Pfaffian ensembles. J. Statist. Phys., 113:611
622, 2003.
[Sos04] A. Soshnikov. Poisson statistics for the largest eigenvalues of Wigner random ma-
trices with heavy tails. Electron. Comm. Probab., 9:8291, 2004.
[Spe90] R. Speicher. A new example of independence and white noise. Probab. Theory
Rel. Fields, 84:141159, 1990.
[Spe98] R. Speicher. Combinatorial theory of the free product with amalgamation and
operator-valued free probability theory. Mem. Amer. Math. Soc., 132(627), 1998.
[Spe03] R. Speicher. Free calculus. In Quantum Probability Communications, Vol. XII
(Grenoble, 1998), pages 209235, River Edge, NJ, World Scientific Publishing, 2003.
[SpT02] D. A. Spielman and S. H. Teng. Smoothed analysis of algorithms. In Proceedings of
the International Congress of Mathematicians (Beijing 2002), volume I, pages 597
606. Beijing, Higher Education Press, 2002.
[Sta97] R. P. Stanley. Enumerative Combinatorics, volume 2. Cambridge University Press,
1997.
[Sze75] G. Szego. Orthogonal Polynomials. Providence, R.I., American Mathematical
Society, fourth edition, 1975. Colloquium Publications, Vol. XXIII.
[Tal96] M. Talagrand. A new look at independence. Annals Probab., 24:134, 1996.
[TaV08a] T. Tao and V. H. Vu. Random matrices: the circular law. Commun. Contemp.
Math., 10:261307, 2008.
[TaV08b] T. Tao and V. H. Vu. Random matrices: universality of esds and the circular law.
arXiv:0807.4898v2 [math.PR], 2008.
[TaV09a] T. Tao and V. H. Vu. Inverse LittlewoodOfford theorems and the condition num-
ber of random discrete matrices. Annals Math., 169:595632, 2009.
[TaV09b] T. Tao and V. H. Vu. Random matrices: Universality of local eigenvalue statistics.
arXiv:0906.0510v4 [math.PR], 2009.
[tH74] G. 't Hooft. Magnetic monopoles in unified gauge theories. Nuclear Phys. B,
79:276284, 1974.
[Tri85] F. G. Tricomi. Integral Equations. New York, NY, Dover Publications, 1985.
Reprint of the 1957 original.
[TuV04] A. M. Tulino and S. Verdu. Random matrix theory and wireless communications.
In Foundations and Trends in Communications and Information Theory, volume 1,
Hanover, MA, Now Publishers, 2004.
[TrW93] C. A. Tracy and H. Widom. Introduction to Random Matrices, volume 424 of
Lecture Notes in Physics, pages 103130. New York, NY, Springer, 1993.
[TrW94a] C. A. Tracy and H. Widom. Level spacing distributions and the Airy kernel. Comm. Math. Phys., 159:151–174, 1994.
[Voi00a] D. Voiculescu. The coalgebra of the free difference quotient and free probability.
Int. Math. Res. Not., pages 79106, 2000.
[Voi00b] D. Voiculescu. Lectures on Probability Theory and Statistics: École d'Été de Probabilités de Saint-Flour XXVIII – 1998, volume 1738 of Lecture Notes in Mathematics, pages 283–349. New York, NY, Springer, 2000.
[Voi02] D. Voiculescu. Free entropy. Bull. London Math. Soc., 34:257278, 2002.
[Vu07] V. H. Vu. Spectral norm of random matrices. Combinatorica, 27:721736, 2007.
[Wac78] K. W. Wachter. The strong limits of random matrix spectra for sample matrices of
independent elements. Annals Probab., 6:118, 1978.
[Wey39] H. Weyl. The Classical Groups: their Invariants and Representations. Princeton,
NJ, Princeton University Press, 1939.
[Wid94] H. Widom. The asymptotics of a continuous analogue of orthogonal polynomials.
J. Approx. Theory, 77:5164, 1994.
[Wig55] E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions.
Annals Math., 62:548564, 1955.
[Wig58] E. P. Wigner. On the distribution of the roots of certain symmetric matrices. Annals
Math., 67:325327, 1958.
[Wil78] H. S. Wilf. Mathematics for the Physical Sciences. New York, NY, Dover Publica-
tions, 1978.
[Wis28] J. Wishart. The generalized product moment distribution in samples from a normal
multivariate population. Biometrika, 20A:3252, 1928.
[WuMTB76] T. T. Wu, B. M. McCoy, C. A. Tracy and E. Barouch. Spin–spin correlation functions for the two-dimensional Ising model: exact theory in the scaling region. Phys. Rev. B, 13, 1976.
[Xu97] F. Xu. A random matrix model from two-dimensional YangMills theory. Comm.
Math. Phys., 190:287307, 1997.
[Zir96] M. Zirnbauer. Riemannian symmetric superspaces and their origin in random matrix
theory. J. Math. Phys., 37:49865018, 1996.
[Zvo97] A. Zvonkin. Matrix integrals and maps enumeration: an accessible introduction.
Math. Comput. Modeling, 26:281304, 1997.
General conventions and notation
Unless stated otherwise, for S a Polish space, M1 (S) is given the topology of weak
convergence, that makes it into a Polish space.
When we write a(s) ∼ b(s), we assert that there exists c(s), defined for s ≫ 0, such that lim_{s→∞} c(s) = 1 and c(s)a(s) = b(s) for s ≫ 0. We use the notation a_n ∼ b_n for sequences in the analogous sense. We write a(s) = O(b(s)) if lim sup_{s→∞} |a(s)/b(s)| < ∞. We write a(s) = o(b(s)) if lim sup_{s→∞} |a(s)/b(s)| = 0. a_n = O(b_n) and a_n = o(b_n) are defined analogously.
The following is a list of frequently used notation. In case the notation is not routine, we
provide a pointer to the definition.
∀ for all
a.s., a.e. almost sure, almost everywhere
Ai(x) Airy function
(A , ·, ∗, ‖·‖) C*-algebra (see Definition 5.2.11)
A, Ao , Ac closure, interior and complement of A
A\B set difference
B(H) space of bounded operators on a Hilbert space H
C^k(S), C_b^k(S) functions on S with continuous (resp., bounded continuous) derivatives up to order k
C^∞(S) infinitely differentiable functions on S
C_b^∞(S) bounded functions on S possessing bounded derivatives of all orders
C_c^∞(S) infinitely differentiable functions on S of compact support
C(S, S′) continuous functions from S to S′
C_poly^∞(R^m) infinitely differentiable functions on R^m, all of whose derivatives have polynomial growth at infinity
CLT central limit theorem
→^Prob convergence in probability
d(·, ·), d(x, A) metric and distance from point x to a set A
det(M) determinant of M
Δ(x) Vandermonde determinant, see (2.5.2)
Δ(K) Fredholm determinant of a kernel K, see Definition 3.4.3
N open (N−1)-dimensional simplex
D(L) domain of L
∅ the empty set
ε(σ) the signature of a permutation σ
∃, ∃! there exists, there exists a unique
Conjugation-invariant, 201, 202, 205, 208 Diffusion process, 247281, 319, 321
Connection problem, 183 Discriminant, 55, 257, 417
Contraction principle, 320, 428 Distribution (law),
Convergence, 419, 420 Bernoulli, see Bernoulli random variables
almost sure, 28, 71, 73, 263, 268, 323, Cauchy, 374
324, 375, 378, 379, 393 χ, 303, 307
L p , 28, 268, 375, 388 function, 344
in distribution (law), 92, 93, 103, 241, Gaussian, see Gaussian, distribution
274, 322, 328 noncommutative, 326, 327, 331, 333, 343,
in expectation, 323, 324, 375, 379 344, 349, 360, 363, 365, 366, 378, 380,
in moments, 328, 337 382, 385, 387, 391, 394
in probability, 23, 27, 48 Schwarz, 126, 310
sequential, 338 Semicircle, see Semicircle distribution
vague, 44, 45, 134 stable, 321
weak, 44, 134, 388, 420 Double commutant theorem (von Neumann),
weakly, in probability, 7, 23, 44, 71 340, 343, 455
Convex, Doubly stochastic matrix, 21, 86
function, 72, 285287, 291 Dyck path, 7, 8, 9, 1517, 20, 85, 353, 363,
hull, 63, 420 364
set, 21, 421, 422 Dyson, 181, 249, 319
strict, 72, 75, 298 see also SchwingerDyson equation
Correlation functions, 216
see also Intensity, joint
Edelman–Dumitriu, 303
Coupling, 66
Edge (of graph), 9, 1319, 2427, 3037,
Critical (point, value), 134136, 141, 193,
376378, 387,
440, 441
bounding table, 34, 35
Cumulant, 354357, 360364, 369, 410, 411 connecting, 13, 17
see also Free, cumulant self, 13, 17
Cut-off, 250 Edge (of support of spectrum), 90, 9294,
Cyclo-stationary, 318 101, 132, 162, 163, 166, 177, 183, 184,
Cylinder set, 215 215, 306, 319, 321
hard, 321
Decimation, 66, 88, 162, 166, 169, 170 Eigenvalue, 2, 6, 20, 2123, 36, 37, 45, 48,
Determinantal 51, 55, 58, 65, 71, 78, 9094, 131, 186,
formulas, 152155 188, 193, 198, 199, 209212, 220, 221,
process, 90, 94, 131, 186, 193, 214220 223, 226228, 230, 231, 240, 249, 261,
248, 318, 319 263, 269, 286, 298, 320, 321, 327, 374
projections, 222227 393, 395, 396, 399, 433
relations, 120 complex, 88, 89, 213
stationary process, 215, 237239 joint density, 65, 87
structure, form, 9395 joint distribution, 5070, 87, 88, 184, 186,
187, 191, 261, 303, 318
Diagonal, block-diagonal, 190, 191, 198, 200, law of ordered, 52, 248, 249
201, 206, 207, 209214, 254, 263, 276, law of unordered, 53, 94, 95, 189, 304
277, 282, 300, 301, 304, 305, 319, 388, maximal, 23, 28, 66, 81, 8688, 103, 183,
389, 402, 411, 432437 269, 306, 321
Differential equations (system), 121-123, 126 see also Empirical measure
130, 170180, 182, 183 Eigenvector, 38, 53, 286, 304, 389
Differential extension, 158, 160162, 172 Eigenvectoreigenvalue pair, 308317
Differentiation formula, 121, 123, 144
Federer, 194, 318, see Coarea formula Gamma function (Euler's), 53, 139, 194, 303
Feynman's diagrams, 181 Gap, 107, 114119, 131, 148, 150155, 159,
Fiber, 196, 203, 440 234, 239
Field, 187, 430 Gaudin–Mehta, 91
vector, 446 Gauss decomposition, 244
Filtration, 249, 251, 254, 280, 459 Gaussian, 42, 88
Fisher information, 413 distribution (law), 29, 30, 33, 39, 45, 182,
Flag manifold, 190, 197, 198, 209, 211 184, 188, 277, 284, 291, 303, 307, 311,
Fock, BoltzmannFock space, 350, 359, 362, 381, 397, 405
409 ensembles, see Ensembles, Gaussian
Forest, 27, 31, 34 process, 248, 274276
Fourier transform, 76, 87, 118, 230232, sub-, 39
237, 360 Wigner matrix see Wigner
Fredholm Gaussian orthogonal ensemble (GOE), 6, 51
adjugant, 110, 111, 113, 125, 157 54, 58, 66, 71, 81, 82, 87, 90, 93, 94, 113,
determinant, 94, 98, 107, 108, 109113, 132, 148, 150, 155, 157, 160, 161, 166,
120, 121, 128, 142, 156, 163, 170, 182, 169, 181193, 198, 199, 229, 230, 248,
183, 222, 234 302305, 323, 412
resolvent, 110, 111, 121123, 157 Gaussian symplectic ensemble (GSE), 37,
53, 58, 66, 68, 71, 93, 132, 148, 150, Hypergeometric function, 104, 106
160, 170, 183193, 302, 412
Gaussian unitary ensemble (GUE), 36, 51 Implicit function theorem, 58, 137, 268, 371,
54, 58, 66, 68, 71, 81, 82, 87, 9098, 372
101, 103, 105, 121, 158, 163, 169, 183
Inequality,
193, 198, 199, 215, 228, 229, 230, 248,
Burkholder–Davis–Gundy, 255, 260, 265,
302305, 319, 323, 394, 395, 412
266, 271, 272, 275, 413, 461
Gelfand–Naimark Theorem, 331 Burkholder–Rosenthal, 413
Gelfand–Naimark–Segal construction (GNS), Cauchy–Schwarz, 260, 285, 295, 335, 338,
326, 333, 340, 342, 369, 370, 400, 401, 384, 390, 457, 458
452 Chebyshev, 11, 17, 19, 29, 40, 49, 265,
Generalized determinant, 193, 443 271, 280, 284, 378, 398, 463
Generic, 200, 201, 203, 207, 209212 Gordon, 87
Geodesic, 27, 28, 203 Hadamard, 108, 131, 415
frame, 297, 448 Hölder, 24, 387
Geršgorin circle theorem, 415 Jensen, 23, 77, 272, 275, 292
noncommutative Hölder, 416, 458
Gessel–Viennot, 245
Logarithmic Sobolev (LSI), 38, 3943, 87,
Graph, 283285, 287, 290, 298, 302
unicyclic, 30, 31 Poincare (PI), 283285, 397, 405, 412
see also Sentence, Word Slepian, 87
Green's theorem, 398 Weyl, 415
Gromov, 299 Infinitesimal generator, 288, 292
Gronwall's Lemma, 260, 292, 313 Infinite divisibility, 411
Group, 186, 188, 192, 200, 207, 211, 212, Initial condition, 249, 250, 257, 258, 261,
299, 300, 432 262, 269, 275, 460
algebra, 325, 450 Integral operator, 220
discrete, 325, 327, 332 admissible, 220222, 226, 227, 230241
see also Free group, Lie group, Orthogo- compact, 221
nal group and Unitary groups good, 221, 233239, 241
Integration formula, 65, 66, 148, 187214,
Hamburger moment problem, 329 318
Harer–Zagier recursions, 104, 181 Intensity, 216220, 222228, 234238, 240,
Heat equation, 320 242, 318
Helly's selection theorem, 45 Interlace, 62
Herbst's Lemma, 40, 284 Involution, 329334, 389, 394, 400, 450
Hermite, Isometry, 195, 196, 197, 201, 203, 205
polynomials, 90, 94, 95, 99, 100, 101, 207, 211, 343, 346, 439, 440, 443, 454
Itô's Lemma (formula), 250, 251, 260, 263,
ensemble, 189, see also Ensemble, Gaus- 269, 279, 280, 292, 293, 461
sian/Hermite Itzykson–Zuber–Harish-Chandra, 184, 320
Hessian, 289291, 298, 437, 447, 448
Hilbert space, 308, 326, 328, 330332, 339 Jacobi,
345, 350353, 370, 372, 400, 409, 431, ensemble, 70, 183, 186, 190, 191, 193,
439, 451457 197, 206, 208, 318, 321, 434
HoffmanWielandt, 21 polynomial, 187, 191
Holder norm, 265 Jacobian, 54, 58, 305, 306
Householder reflector (transformation), 303 Janossy density, 218, 219, 319
305
Scale invariance, 395, 403, 408 radius (norm), 269, 323, 325, 330, 331,
Schrödinger operator, 302 336, 383, 393396, 451
Schur function, 320 resolution, 328, 453
Schwinger–Dyson equation, 379, 381, 382, theorem, 198, 328, 331, 339, 340, 343,
386, 389, 391, 404, 406409, 411, 412 344, 347, 433, 452454
also appears as the master loop equation Spectrum, 328, 330332, 394396, 398, 399,
Self-adjoint, 198, 220, 260, 323, 329, 332 451454
334, 343347, 359, 366, 368, 370, 395, Spiked models, 184
396, 412, 432, 433, 451456 State, 331334, 336342, 391, 395, 454,
anti-, 196, 201, 206, 207, 210, 432, 436 455
Self-dual, 37, 392 faithful, 342345, 369, 370, 394, 395,
see also Kernel, self-dual 456, 457
Selberg integral formula, 54, 5964, 87, 88, normal, 342345, 369, 370, 454, 456,
189 457
tracial, 322, 324, 331334, 337, 340
Semicircle distribution (law), 6, 7, 10, 21, 345, 349, 366370, 372, 380, 387, 389,
23, 36, 43, 47, 51, 81, 86, 88, 101, 105, 391, 394, 395, 413, 456458
139, 262, 273, 319, 323, 365, 368, 369,
373, 374, 375, 404, 410 Stationary process, 261, 269, 318
see also Determinantal, stationary pro-
Semicircular variables, 323, 324, 365, 374, cess and Translation invariance
375, 377380, 382, 394, 395, 410, 412
Steepest descent, 134, 138, 141
Semi-martingale, 249, 253, 254, 461
Stieltjes transform, 9, 20, 38, 4350, 81, 87,
Sentence, 17, 18, 25, 33, 378 267, 360, 396, 398, 411, 412
equivalent, 17
FK, 2528 Stirling's formula, 59, 105, 108, 109, 119,
graph associated with, 17, 378 136, 164
support of, 17 Stochastic,
weight of, 17 analysis (calculus), 87, 248281, 319, 412,
Separable, 338342, 345, 351, 419, 420 413, 459463
427 differential equation (system), 249, 250,
258, 261, 263, 269, 274, 291, 460
Shift, 233, 238 noncommutative calculus, 413
Sinai–Soshnikov, 86 noncommutative calculus, 413
Singular value, 8789, 189, 190, 193, 207, Stone–Weierstrass Theorem, 330
210, 282, 285, 301, 394, 416, 434
S-transform, 366, 368
Size bias, 239
Subordination function, 410
Skew field, 187, 430
Superposition, 66, 88, 162, 166
Skew-Hermitian matrix, 253
Symmetric function, 65, 217, 218
Sobolev space, 293
Symplectic, 192
Solution, see also Gaussian symplectic ensemble
strong, 249251, 253259, 269, 460
weak, 249, 251, 261, 460
Soshnikov, 184 Talagrand, 285287, 320
Spacing, 9093, 110, 114, 132, 134, 160, Tangent space, 196, 200, 437, 439
181184, 240, 242 Tensor product, 322, 348, 399, 400, 451
Spectral, Telecommunications, 413
analysis, 330 Three-term recurrence (recursion), 100, 106,
measure, 319, 327, 366, 370, 375, 389, 181, 321
393, 412 Tight, 314317, 379, 380, 382, 389, 425,
projection, 332, 343, 344, 454