
CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS 118

Editorial Board
B. BOLLOBÁS, W. FULTON, A. KATOK, F. KIRWAN,
P. SARNAK, B. SIMON, B. TOTARO

An Introduction to Random Matrices

The theory of random matrices plays an important role in many areas of pure
mathematics and employs a variety of sophisticated mathematical tools (analytical,
probabilistic and combinatorial). This diverse array of tools, while attesting to the
vitality of the field, presents several formidable obstacles to the newcomer, and even
the expert probabilist.
This rigorous introduction to the basic theory is sufficiently self-contained to be
accessible to graduate students in mathematics or related sciences, who have mastered
probability theory at the graduate level, but have not necessarily been exposed to
advanced notions of functional analysis, algebra or geometry. Useful background
material is collected in the appendices and exercises are also included throughout to
test the reader's understanding. Enumerative techniques, stochastic analysis, large
deviations, concentration inequalities, disintegration and Lie algebras all are
introduced in the text, which will enable readers to approach the research literature
with confidence.

GREG W. ANDERSON is Professor of Mathematics at the University of Minnesota.

ALICE GUIONNET is CNRS Research Director at the École Normale Supérieure in
Lyon (ENS-Lyon).

OFER ZEITOUNI is Professor of Mathematics at both the University of Minnesota
and the Weizmann Institute of Science in Rehovot, Israel.
An Introduction to Random Matrices

GREG W. ANDERSON
University of Minnesota

ALICE GUIONNET
École Normale Supérieure de Lyon

OFER ZEITOUNI
University of Minnesota and
Weizmann Institute of Science
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press


The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521194525

© G. W. Anderson, A. Guionnet and O. Zeitouni 2010

This publication is in copyright. Subject to statutory exception and to the


provisions of relevant collective licensing agreements, no reproduction of any part
may take place without the written permission of Cambridge University Press.
First published in print format 2009

ISBN-13 978-0-511-78780-5 eBook (EBL)


ISBN-13 978-0-521-19452-5 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy


of urls for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To Meredith, Benoît and Naomi
Contents

Preface page xiii

1 Introduction 1

2 Real and complex Wigner matrices 6


2.1 Real Wigner matrices: traces, moments and combinatorics 6
2.1.1 The semicircle distribution, Catalan numbers and
Dyck paths 7
2.1.2 Proof #1 of Wigner's Theorem 2.1.1 10
2.1.3 Proof of Lemma 2.1.6: words and graphs 11
2.1.4 Proof of Lemma 2.1.7: sentences and graphs 17
2.1.5 Some useful approximations 21
2.1.6 Maximal eigenvalues and Füredi–Komlós enumeration 23
2.1.7 Central limit theorems for moments 29
2.2 Complex Wigner matrices 35
2.3 Concentration for functionals of random matrices and
logarithmic Sobolev inequalities 38
2.3.1 Smoothness properties of linear functions of the
empirical measure 38
2.3.2 Concentration inequalities for independent variables
satisfying logarithmic Sobolev inequalities 39
2.3.3 Concentration for Wigner-type matrices 42
2.4 Stieltjes transforms and recursions 43


2.4.1 Gaussian Wigner matrices 45


2.4.2 General Wigner matrices 47
2.5 Joint distribution of eigenvalues in the GOE and the GUE 50
2.5.1 Definition and preliminary discussion of the GOE
and the GUE 51
2.5.2 Proof of the joint distribution of eigenvalues 54
2.5.3 Selberg's integral formula and proof of (2.5.4) 58
2.5.4 Joint distribution of eigenvalues: alternative formu-
lation 65
2.5.5 Superposition and decimation relations 66
2.6 Large deviations for random matrices 70
2.6.1 Large deviations for the empirical measure 71
2.6.2 Large deviations for the top eigenvalue 81
2.7 Bibliographical notes 85

3 Hermite polynomials, spacings and limit distributions for the Gaussian
ensembles 90
3.1 Summary of main results: spacing distributions in the bulk
and edge of the spectrum for the Gaussian ensembles 90
3.1.1 Limit results for the GUE 90
3.1.2 Generalizations: limit formulas for the GOE and GSE 93
3.2 Hermite polynomials and the GUE 94
3.2.1 The GUE and determinantal laws 94
3.2.2 Properties of the Hermite polynomials and oscillator
wave-functions 99
3.3 The semicircle law revisited 101
3.3.1 Calculation of moments of LN 102
3.3.2 The Harer–Zagier recursion and Ledoux's argument 103
3.4 Quick introduction to Fredholm determinants 107
3.4.1 The setting, fundamental estimates and definition of
the Fredholm determinant 107
3.4.2 Definition of the Fredholm adjugant, Fredholm
resolvent and a fundamental identity 110

3.5 Gap probabilities at 0 and proof of Theorem 3.1.1 114


3.5.1 The method of Laplace 115
3.5.2 Evaluation of the scaling limit: proof of Lemma
3.5.1 117
3.5.3 A complement: determinantal relations 120
3.6 Analysis of the sine-kernel 121
3.6.1 General differentiation formulas 121
3.6.2 Derivation of the differential equations: proof of
Theorem 3.6.1 126
3.6.3 Reduction to Painlevé V 128
3.7 Edge-scaling: proof of Theorem 3.1.4 132
3.7.1 Vague convergence of the largest eigenvalue: proof
of Theorem 3.1.4 133
3.7.2 Steepest descent: proof of Lemma 3.7.2 134
3.7.3 Properties of the Airy functions and proof of Lemma
3.7.1 139
3.8 Analysis of the Tracy–Widom distribution and proof of
Theorem 3.1.5 142
3.8.1 The first standard moves of the game 144
3.8.2 The wrinkle in the carpet 144
3.8.3 Linkage to Painlevé II 146
3.9 Limiting behavior of the GOE and the GSE 148
3.9.1 Pfaffians and gap probabilities 148
3.9.2 Fredholm representation of gap probabilities 155
3.9.3 Limit calculations 160
3.9.4 Differential equations 170
3.10 Bibliographical notes 181

4 Some generalities 186


4.1 Joint distribution of eigenvalues in the classical matrix
ensembles 187
4.1.1 Integration formulas for classical ensembles 187
4.1.2 Manifolds, volume measures and the coarea formula 193

4.1.3 An integration formula of Weyl type 199


4.1.4 Applications of Weyl's formula 206
4.2 Determinantal point processes 214
4.2.1 Point processes: basic definitions 215
4.2.2 Determinantal processes 220
4.2.3 Determinantal projections 222
4.2.4 The CLT for determinantal processes 227
4.2.5 Determinantal processes associated with eigenvalues 228
4.2.6 Translation invariant determinantal processes 232
4.2.7 One-dimensional translation invariant determinantal
processes 237
4.2.8 Convergence issues 241
4.2.9 Examples 243
4.3 Stochastic analysis for random matrices 248
4.3.1 Dyson's Brownian motion 249
4.3.2 A dynamical version of Wigner's Theorem 262
4.3.3 Dynamical central limit theorems 273
4.3.4 Large deviation bounds 277
4.4 Concentration of measure and random matrices 281
4.4.1 Concentration inequalities for Hermitian matrices
with independent entries 282
4.4.2 Concentration inequalities for matrices with depen-
dent entries 287
4.5 Tridiagonal matrix models and the β ensembles 302
4.5.1 Tridiagonal representation of β ensembles 303
4.5.2 Scaling limits at the edge of the spectrum 306
4.6 Bibliographical notes 318

5 Free probability 322


5.1 Introduction and main results 323
5.2 Noncommutative laws and noncommutative probability spaces 325

5.2.1 Algebraic noncommutative probability spaces and


laws 325
5.2.2 C*-probability spaces and the weak*-topology 329
5.2.3 W*-probability spaces 339
5.3 Free independence 348
5.3.1 Independence and free independence 348
5.3.2 Free independence and combinatorics 354
5.3.3 Consequence of free independence: free convolution 359
5.3.4 Free central limit theorem 368
5.3.5 Freeness for unbounded variables 369
5.4 Link with random matrices 374
5.5 Convergence of the operator norm of polynomials of inde-
pendent GUE matrices 394
5.6 Bibliographical notes 410
Appendices 414
A Linear algebra preliminaries 414
A.1 Identities and bounds 414
A.2 Perturbations for normal and Hermitian matrices 415
A.3 Noncommutative matrix L^p-norms 416
A.4 Brief review of resultants and discriminants 417
B Topological preliminaries 418
B.1 Generalities 418
B.2 Topological vector spaces and weak topologies 420
B.3 Banach and Polish spaces 422
B.4 Some elements of analysis 423
C Probability measures on Polish spaces 423
C.1 Generalities 423
C.2 Weak topology 425
D Basic notions of large deviations 427
E The skew field H of quaternions and matrix theory over F 430
E.1 Matrix terminology over F and factorization theorems 431

E.2 The spectral theorem and key corollaries 433


E.3 A specialized result on projectors 434
E.4 Algebra for curvature computations 435
F Manifolds 437
F.1 Manifolds embedded in Euclidean space 438
F.2 Proof of the coarea formula 442
F.3 Metrics, connections, curvature, Hessians, and the
Laplace–Beltrami operator 445
G Appendix on operator algebras 450
G.1 Basic definitions 450
G.2 Spectral properties 452
G.3 States and positivity 454
G.4 von Neumann algebras 455
G.5 Noncommutative functional calculus 457
H Stochastic calculus notions 459
References 465
General conventions and notation 481
Index 484
Preface

The study of random matrices, and in particular the properties of their eigenval-
ues, has emerged from the applications, first in data analysis and later as statisti-
cal models for heavy-nuclei atoms. Thus, the field of random matrices owes its
existence to applications. Over the years, however, it became clear that models
related to random matrices play an important role in areas of pure mathematics.
Moreover, the tools used in the study of random matrices came themselves from
different and seemingly unrelated branches of mathematics.
At this point in time, the topic has evolved enough that the newcomer, especially
if coming from the field of probability theory, faces a formidable and somewhat
confusing task in trying to access the research literature. Furthermore, the back-
ground expected of such a newcomer is diverse, and often has to be supplemented
before a serious study of random matrices can begin.
We believe that many parts of the field of random matrices are now developed
enough to enable one to expose the basic ideas in a systematic and coherent way.
Indeed, such a treatise, geared toward theoretical physicists, has existed for some
time, in the form of Mehta's superb book [Meh91]. Our goal in writing this book
has been to present a rigorous introduction to the basic theory of random matri-
ces, including free probability, that is sufficiently self-contained to be accessible to
graduate students in mathematics or related sciences who have mastered probabil-
ity theory at the graduate level, but have not necessarily been exposed to advanced
notions of functional analysis, algebra or geometry. Along the way, enough tech-
niques are introduced that we hope will allow readers to continue their journey
into the current research literature.
This project started as notes for a class on random matrices that two of us (G. A.
and O. Z.) taught at the University of Minnesota in the fall of 2003, and notes for
a course in the probability summer school in St. Flour taught by A. G. in the


summer of 2006. The comments of participants in these courses, and in particular


A. Bandyopadhyay, H. Dong, K. Hoffman-Credner, A. Klenke, D. Stanton and
P.M. Zamfir, were extremely useful. As these notes evolved, we taught from them
again at the University of Minnesota, the University of California at Berkeley,
the Technion and the Weizmann Institute, and received more much-appreciated
feedback from the participants in those courses. Finally, when expanding and
refining these course notes, we have profited from the comments and questions of
many colleagues. We would like in particular to thank G. Ben Arous, F. Benaych-
Georges, P. Biane, P. Deift, A. Dembo, P. Diaconis, U. Haagerup, V. Jones, M.
Krishnapur, Y. Peres, R. Pinsky, G. Pisier, B. Rider, D. Shlyakhtenko, B. Solel, A.
Soshnikov, R. Speicher, T. Suidan, C. Tracy, B. Virag and D. Voiculescu for their
suggestions, corrections and patience in answering our questions or explaining
their work to us. Of course, any remaining mistakes and unclear passages are
fully our responsibility.

GREG ANDERSON, Minneapolis, Minnesota
ALICE GUIONNET, Lyon, France
OFER ZEITOUNI, Rehovot, Israel
1
Introduction

This book is concerned with random matrices. Given the ubiquitous role that
matrices play in mathematics and its application in the sciences and engineer-
ing, it seems natural that the evolution of probability theory would eventually
pass through random matrices. The reality, however, has been more complicated
(and interesting). Indeed, the study of random matrices, and in particular the
properties of their eigenvalues, has emerged from the applications, first in data
analysis (in the early days of statistical sciences, going back to Wishart [Wis28]),
and later as statistical models for heavy-nuclei atoms, beginning with the semi-
nal work of Wigner [Wig55]. Still motivated by physical applications, at the able
hands of Wigner, Dyson, Mehta and co-workers, a mathematical theory of the
spectrum of random matrices began to emerge in the early 1960s, and links with
various branches of mathematics, including classical analysis and number theory,
were established. While much progress was initially achieved using enumerative
combinatorics, gradually, sophisticated and varied mathematical tools were intro-
duced: Fredholm determinants (in the 1960s), diffusion processes (in the 1960s),
integrable systems (in the 1980s and early 1990s), and the RiemannHilbert prob-
lem (in the 1990s) all made their appearance, as well as new tools such as the
theory of free probability (in the 1990s). This wide array of tools, while attest-
ing to the vitality of the field, presents, however, several formidable obstacles to
the newcomer, and even to the expert probabilist. Indeed, while much of the re-
cent research uses sophisticated probabilistic tools, it builds on layers of common
knowledge that, in the aggregate, few people possess.
Our goal in this book is to present a rigorous introduction to the basic theory
of random matrices that would be sufficiently self-contained to be accessible to
graduate students in mathematics or related sciences who have mastered probabil-
ity theory at the graduate level, but have not necessarily been exposed to advanced
notions of functional analysis, algebra or geometry. With such readers in mind, we


present some background material in the appendices, that novice and expert alike
can consult; most material in the appendices is stated without proof, although the
details of some specialized computations are provided.
Keeping in mind our stated emphasis on accessibility over generality, the book
is essentially divided into two parts. In Chapters 2 and 3, we present a self-
contained analysis of random matrices, quickly focusing on the Gaussian ensem-
bles and culminating in the derivation of the gap probabilities at 0 and the Tracy
Widom law. These chapters can be read with very little background knowledge,
and are particularly suitable for an introductory study. In the second part of the
book, Chapters 4 and 5, we use more advanced techniques, requiring more exten-
sive background, to emphasize and generalize certain aspects of the theory, and to
introduce the theory of free probability.
So what is a random matrix, and what questions are we about to study? Throughout,
let F = R or F = C, and set β = 1 in the former case and β = 2 in the latter. (In
Section 4.1, we will also consider the case F = H, the skew-field of quaternions;
see Appendix E for definitions and details.) Let Mat_N(F) denote the space of N-
by-N matrices with entries in F, and let H_N^(β) denote the subset of self-adjoint
matrices (i.e., real symmetric if β = 1 and Hermitian if β = 2). One can always
consider the sets Mat_N(F) and H_N^(β), β = 1, 2, as submanifolds of an appropriate
Euclidean space, and equip them with the induced topology and (Borel) sigma-field.
Recall that a probability space is a triple (Ω, F, P) so that F is a sigma-algebra
of subsets of Ω and P is a probability measure on (Ω, F). In that setting, a random
matrix X_N is a measurable map from (Ω, F) to Mat_N(F).
Our main interest is in the eigenvalues of random matrices. Recall that the
eigenvalues of a matrix H ∈ Mat_N(F) are the roots of the characteristic polynomial
P_N(z) = det(zI_N − H), with I_N the identity matrix. Therefore, on the (open) set
where the eigenvalues are all simple, they are smooth functions of the entries of
X_N (a more complete discussion can be found in Section 4.1).

We will be mostly concerned in this book with self-adjoint matrices H ∈ H_N^(β),
β = 1, 2, in which case the eigenvalues are all real and can be ordered. Thus,
for H ∈ H_N^(β), we let λ_1(H) ≤ ⋯ ≤ λ_N(H) be the eigenvalues of H. A consequence
of the perturbation theory of normal matrices (see Lemma A.4) is that the
eigenvalues {λ_i(H)} are continuous functions of H (this also follows from the
Hoffman–Wielandt theorem, Theorem 2.1.19). In particular, if X_N is a random
matrix then the eigenvalues {λ_i(X_N)} are random variables.
We present now a guided tour of the book. We begin by considering Wigner
matrices in Chapter 2. These are symmetric (or Hermitian) matrices XN whose

entries are independent and identically distributed, except for the symmetry
constraints. For x ∈ R, let δ_x denote the Dirac measure at x, that is, the unique
probability measure satisfying ∫ f dδ_x = f(x) for all continuous functions f on R. Let
L_N = N^{−1} Σ_{i=1}^{N} δ_{λ_i(X_N)} denote the empirical measure of the eigenvalues of X_N.
Wigner's Theorem (Theorem 2.1.1) asserts that, under appropriate assumptions
on the law of the entries, L_N converges (with respect to the weak convergence
of measures) towards a deterministic probability measure, the semicircle law. We
present in Chapter 2 several proofs of Wigner's Theorem. The first, in Section 2.1,
involves a combinatorial machinery that is also exploited to yield central limit
theorems and estimates on the spectral radius of X_N. After first introducing in Section
2.3 some useful estimates on the deviation between the empirical measure and its
mean, we define in Section 2.4 the Stieltjes transform of measures and use it to
give another quick proof of Wigner's Theorem.

Having discussed techniques valid for entries distributed according to general


laws, we turn attention to special situations involving additional symmetry. The
simplest of these concerns the Gaussian ensembles, the GOE and GUE, so named
because their law is invariant under conjugation by orthogonal (resp., unitary)
matrices. The latter extra symmetry is crucial in deriving in Section 2.5 an explicit
joint distribution for the eigenvalues (thus effectively reducing consideration from
a problem involving on the order of N² random variables, namely the matrix entries, to
one involving only N variables). (The GSE, or Gaussian symplectic ensemble,
also shares this property and is discussed briefly.) A large deviations principle for
the empirical distribution, which leads to yet another proof of Wigner's Theorem,
follows in Section 2.6.

The expression for the joint density of the eigenvalues in the Gaussian ensem-
bles is the starting point for obtaining local information on the eigenvalues. This
is the topic of Chapter 3. The bulk of the chapter deals with the GUE, because
in that situation the eigenvalues form a determinantal process. This allows one
to effectively represent the probability that no eigenvalues are present in a set
as a Fredholm determinant, a notion that is particularly amenable to asymptotic
analysis. Thus, after representing in Section 3.2 the joint density for the GUE in
terms of a determinant involving appropriate orthogonal polynomials, the Hermite
polynomials, we develop in Section 3.4 in an elementary way some aspects of the
theory of Fredholm determinants. We then present in Section 3.5 the asymptotic
analysis required in order to study the gap probability at 0, that is the probabil-
ity that no eigenvalue is present in an interval around the origin. Relevant tools,
such as the Laplace method, are developed along the way. Section 3.7 repeats this
analysis for the edge of the spectrum, introducing along the way the method of

steepest descent. The link with integrable systems and the Painlevé equations is
established in Sections 3.6 and 3.8.
As mentioned before, the eigenvalues of the GUE are an example of a deter-
minantal process. The other Gaussian ensembles (GOE and GSE) do not fall into
this class, but they do enjoy a structure where certain Pfaffians replace determi-
nants. This leads to a considerably more involved analysis, the details of which
are provided in Section 3.9.
Chapter 4 is a hodge-podge of results whose common feature is that they all
require new tools. We begin in Section 4.1 with a re-derivation of the joint law
of the eigenvalues of the Gaussian ensemble, in a geometric framework based on
Lie theory. We use this framework to derive the expressions for the joint distri-
bution of eigenvalues of Wishart matrices, of random matrices from the various
unitary groups and of matrices related to random projectors. Section 4.2 studies
in some depth determinantal processes, including their construction, associated
central limit theorems, convergence and ergodic properties. Section 4.3 studies
what happens when in the GUE (or GOE), the Gaussian entries are replaced by
Brownian motions. The powerful tools of stochastic analysis can then be brought
to bear and lead to functional laws of large numbers, central limit theorems and
large deviations. Section 4.4 consists of an in-depth treatment of concentration
techniques and their application to random matrices; it is a generalization of the
discussion in the short Section 2.3. Finally, in Section 4.5, we study a family of
tridiagonal matrices, parametrized by a parameter β, whose distribution of eigenvalues
coincides with that of members of the Gaussian ensembles for β = 1, 2, 4.
The study of the maximal eigenvalue for this family is linked to the spectrum of
an appropriate random Schrödinger operator.
Chapter 5 is devoted to free probability theory, a probability theory for certain
noncommutative variables, equipped with a notion of independence called free
independence. Invented in the early 1990s, free probability theory has become
a versatile tool for analyzing the laws of noncommutative polynomials in several
random matrices, and of the limits of the empirical measure of eigenvalues of such
polynomials. We develop the necessary preliminaries and definitions in Section
5.2, introduce free independence in Section 5.3, and discuss the link with random
matrices in Section 5.4. We conclude the chapter with Section 5.5, in which we
study the convergence of the spectral radius of noncommutative polynomials of
random matrices.
Each chapter ends with bibliographical notes. These are not meant to be com-
prehensive, but rather guide the reader through the enormous literature and give
some hint of recent developments. Although we have tried to represent accurately

the historical development of the subject, we have necessarily omitted important


references, misrepresented facts, or plainly erred. Our apologies to those authors
whose work we have thus unintentionally slighted.
Of course, we have barely scratched the surface of the subject of random ma-
trices. We mention now the most glaring omissions, together with references to
some recent books that cover these topics. We have not discussed the theory of the
RiemannHilbert problem and its relation to integrable systems, Painleve equa-
tions, asymptotics of orthogonal polynomials and random matrices. The interested
reader is referred to the books [FoIKN06], [Dei99] and [DeG09] for an in-depth
treatment. We do not discuss the relation between asymptotics of random matri-
ces and combinatorial problems; a good summary of these appears in [BaDS09].
We barely discuss applications of random matrices, and in particular do not re-
view the recent increase in applications to statistics or communication theory;
for a nice introduction to the latter we refer to [TuV04]. We have presented only a
partial discussion of ensembles of matrices that possess explicit joint distribution
of eigenvalues. For a more complete discussion, including also the case of non-
Hermitian matrices that are not unitary, we refer the reader to [For05]. Finally,
we have not discussed the link between random matrices and number theory; the
interested reader should consult [KaS99] for a taste of that link. We further re-
fer to the bibliographical notes for additional reading, less glaring omissions and
references.
2
Real and complex Wigner matrices

2.1 Real Wigner matrices: traces, moments and combinatorics

We introduce in this section a basic model of random matrices. Nowhere do we


attempt to provide the weakest assumptions or sharpest results available. We point
out in the bibliographical notes (Section 2.7) some places where the interested
reader can find finer results.
Start with two independent families of independent and identically distributed
(i.i.d.) zero mean, real-valued random variables {Z_{i,j}}_{1≤i<j} and {Y_i}_{1≤i}, such that
E[Z_{1,2}²] = 1 and, for all integers k ≥ 1,

    r_k := max( E|Z_{1,2}|^k , E|Y_1|^k ) < ∞ .        (2.1.1)

Consider the (symmetric) N × N matrix X_N with entries

    X_N(j, i) = X_N(i, j) = Z_{i,j}/√N  if i < j ,   Y_i/√N  if i = j .        (2.1.2)
We call such a matrix a Wigner matrix, and if the random variables Z_{i,j} and Y_i are
Gaussian, we use the term Gaussian Wigner matrix. The case of Gaussian Wigner
matrices in which E[Y_1²] = 2 is of particular importance, and for reasons that will
become clearer in Chapter 3, such matrices (rescaled by √N) are referred to as
Gaussian orthogonal ensemble (GOE) matrices.

Let λ_i^N denote the (real) eigenvalues of X_N, with λ_1^N ≤ λ_2^N ≤ ⋯ ≤ λ_N^N, and
define the empirical distribution of the eigenvalues as the (random) probability
measure on R defined by

    L_N = (1/N) Σ_{i=1}^{N} δ_{λ_i^N} .

Define the semicircle distribution (or law) as the probability distribution σ(x)dx
on R with density

    σ(x) = (1/2π) √(4 − x²) 1_{|x|≤2} .        (2.1.3)
The following theorem, contained in [Wig55], can be considered the starting point
of random matrix theory (RMT).

Theorem 2.1.1 (Wigner) For a Wigner matrix, the empirical measure L_N converges
weakly, in probability, to the semicircle distribution.

In greater detail, Theorem 2.1.1 asserts that, for any f ∈ C_b(R) and any ε > 0,

    lim_{N→∞} P( |⟨L_N, f⟩ − ⟨σ, f⟩| > ε ) = 0 .

Remark 2.1.2 The assumption (2.1.1) that r_k < ∞ for all k is not really needed.
See Theorem 2.1.21 in Section 2.1.5.

We will see many proofs of Wigner's Theorem 2.1.1. In this section, we give
a direct combinatorics-based proof, mimicking the original argument of Wigner.
Before doing so, however, we need to discuss some properties of the semicircle
distribution.
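
The convergence in Theorem 2.1.1 is easy to observe numerically. The following minimal sketch (ours, not part of the text; it assumes NumPy is available, and the dimension N = 2000 and the seed are arbitrary) samples a Gaussian Wigner matrix normalized as in (2.1.2) and compares the empirical measure of its eigenvalues with the semicircle law:

```python
import numpy as np

# Sample a Gaussian Wigner matrix as in (2.1.2): off-diagonal entries of
# variance 1/N, diagonal entries of variance 2/N (the GOE normalization).
rng = np.random.default_rng(0)
N = 2000
Z = rng.standard_normal((N, N))
X = (Z + Z.T) / np.sqrt(2 * N)

eigs = np.linalg.eigvalsh(X)

def sc_cdf(x):
    """Distribution function of the semicircle law (2.1.3)."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + (x * np.sqrt(4 - x**2) + 4 * np.arcsin(x / 2)) / (4 * np.pi)

edges = np.linspace(-2, 2, 9)
emp = np.histogram(eigs, bins=edges)[0] / N   # masses of L_N on each bin
theo = np.diff(sc_cdf(edges))                 # semicircle masses
print(np.round(emp, 3))
print(np.round(theo, 3))                      # the two rows nearly agree
```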

2.1.1 The semicircle distribution, Catalan numbers and Dyck paths

Define the moments m_k := ⟨σ, x^k⟩. Recall the Catalan numbers

    C_k = \binom{2k}{k} / (k+1) = (2k)! / ((k+1)! k!) .

We now check that, for all integers k ≥ 1,

    m_{2k} = C_k ,   m_{2k+1} = 0 .        (2.1.4)

Indeed, m_{2k+1} = 0 by symmetry, while, substituting x = 2 sin θ,

    m_{2k} = ∫_{−2}^{2} x^{2k} σ(x) dx = (2·2^{2k}/π) ∫_{−π/2}^{π/2} sin^{2k}(θ) cos²(θ) dθ
           = (2·2^{2k}/π) ∫_{−π/2}^{π/2} sin^{2k}(θ) dθ − (2k+1) m_{2k} ,

where the last equality uses cos²θ = 1 − sin²θ together with an integration by parts.
Hence,

    m_{2k} = (2·2^{2k} / ((2k+2)π)) ∫_{−π/2}^{π/2} sin^{2k}(θ) dθ = (4(2k−1)/(2k+2)) m_{2k−2} ,        (2.1.5)

from which, together with m_0 = 1, one concludes (2.1.4).
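
For small k, both sides of (2.1.4) are easy to check by machine. The short sketch below (ours, standard library only) iterates the recursion (2.1.5) and compares with the Catalan numbers:

```python
from math import comb

def catalan(k):
    return comb(2 * k, k) // (k + 1)

# Iterate (2.1.5): m_{2k} = 4(2k-1)/(2k+2) * m_{2k-2}, with m_0 = 1,
# and check the claim (2.1.4) that m_{2k} = C_k.
m = 1.0
for k in range(1, 9):
    m *= 4 * (2 * k - 1) / (2 * k + 2)
    assert abs(m - catalan(k)) < 1e-9
    print(k, catalan(k))   # 1, 2, 5, 14, 42, 132, 429, 1430
```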


The Catalan numbers possess many combinatorial interpretations. To introduce
a first one, say that an integer-valued sequence {S_n}_{0≤n≤ℓ} is a Bernoulli walk of
length ℓ if S_0 = 0 and |S_{t+1} − S_t| = 1 for 0 ≤ t ≤ ℓ−1. Of particular relevance here is
the fact that C_k counts the number of Dyck paths of length 2k, that is, the number
of nonnegative Bernoulli walks of length 2k that terminate at 0. Indeed, let β_k
denote the number of such paths. A classical exercise in combinatorics is

Lemma 2.1.3 β_k = C_k < 4^k. Further, the generating function β̂(z) := 1 + Σ_{k=1}^{∞} β_k z^k
satisfies, for |z| < 1/4,

    β̂(z) = (1 − √(1 − 4z)) / (2z) .        (2.1.6)

Proof of Lemma 2.1.3 Let B_k denote the number of Bernoulli walks {S_n} of
length 2k that satisfy S_{2k} = 0, and let B̄_k denote the number of Bernoulli walks
{S_n} of length 2k that satisfy S_{2k} = 0 and S_t < 0 for some t < 2k. Then, β_k =
B_k − B̄_k. By reflection at the first hitting of −1, one sees that B̄_k equals the number
of Bernoulli walks {S_n} of length 2k that satisfy S_{2k} = −2. Hence,

    β_k = B_k − B̄_k = \binom{2k}{k} − \binom{2k}{k−1} = C_k .

Turning to the evaluation of β̂(z), considering the first return time to 0 of the
Bernoulli walk {S_n} gives the relation

    β_k = Σ_{j=1}^{k} β_{k−j} β_{j−1} ,   k ≥ 1 ,        (2.1.7)

with the convention that β_0 = 1. Because the number of Bernoulli walks of length
2k is bounded by 4^k, one has that β_k ≤ 4^k, and hence the function β̂(z) is well
defined and analytic for |z| < 1/4. But, substituting (2.1.7),

    β̂(z) − 1 = Σ_{k=1}^{∞} z^k Σ_{j=1}^{k} β_{k−j} β_{j−1} = z Σ_{k=0}^{∞} z^k Σ_{j=0}^{k} β_{k−j} β_j ,

while

    β̂(z)² = Σ_{k,k′=0}^{∞} z^{k+k′} β_k β_{k′} = Σ_{q=0}^{∞} z^q Σ_{ℓ=0}^{q} β_ℓ β_{q−ℓ} .

Combining the last two equations, one sees that

    β̂(z) = 1 + z β̂(z)² ,
from which (2.1.6) follows (using that β̂(0) = 1 to choose the correct branch of
the square root).


We note in passing that, expanding (2.1.6) in a power series in z in a neighborhood
of zero, one gets (for |z| < 1/4)

    β̂(z) = ( 2 Σ_{k=1}^{∞} z^k (2k−2)!/(k!(k−1)!) ) / (2z) = Σ_{k=0}^{∞} ((2k)!/(k!(k+1)!)) z^k = Σ_{k=0}^{∞} C_k z^k ,

which provides an alternative proof of the fact that β_k = C_k.
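
The first-return decomposition (2.1.7) translates directly into a recursive count of Dyck paths. The following check (an illustration of ours, standard library only) confirms β_k = C_k for small k against the reflection-principle formula from the proof:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def beta(k):
    """Number of Dyck paths of length 2k, via the first-return
    decomposition (2.1.7); beta(0) = 1 by convention."""
    if k == 0:
        return 1
    return sum(beta(k - j) * beta(j - 1) for j in range(1, k + 1))

for k in range(1, 10):
    # reflection principle: beta_k = binom(2k,k) - binom(2k,k-1) = C_k
    assert beta(k) == comb(2 * k, k) - comb(2 * k, k - 1)
print([beta(k) for k in range(1, 8)])   # 1, 2, 5, 14, 42, 132, 429
```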


Another useful interpretation of the Catalan numbers is that Ck counts the num-
ber of rooted planar trees with k edges. (A rooted planar tree is a planar graph
with no cycles, with one distinguished vertex, and with a choice of ordering at
each vertex; the ordering defines a way to explore the tree, starting at the root.)
It is not hard to check that the Dyck paths of length 2k are in bijection with such
rooted planar trees. See the proof of Lemma 2.1.6 in Section 2.1.3 for a formal
construction of this bijection.
We note in closing that a third interpretation of the Catalan numbers, particu-
larly useful in the context of Chapter 5, is that they count the non-crossing parti-
tions of the ordered set Kk := {1, 2, . . . , k}.

Definition 2.1.4 A partition of the set Kk := {1, 2, . . . , k} is called crossing if there


exists a quadruple (a, b, c, d) with 1 ≤ a < b < c < d ≤ k such that a, c belong to
one part while b, d belong to another part. A partition which is not crossing is a
non-crossing partition.

Non-crossing partitions form a lattice with respect to refinement. A look at Fig-


ure 2.1.1 should explain the terminology non-crossing: one puts the points
1, . . . , k on the circle, and connects each point with the next member of its part
(in cyclic order) by an internal path. Then, the partition is non-crossing if this can
be achieved without arcs crossing each other.
It is not hard to check that C_k is indeed the number γ_k of non-crossing partitions
of K_k. To see that, let Π be a non-crossing partition of K_k and let j denote the
largest element connected to 1 (with j = 1 if the part containing 1 is the set {1}).
Then, because Π is non-crossing, it induces non-crossing partitions on the sets
{1, . . . , j−1} and {j+1, . . . , k}. Therefore, γ_k = Σ_{j=1}^{k} γ_{k−j} γ_{j−1}. With γ_1 = 1, and
comparing with (2.1.7), one sees that γ_k = β_k.

Fig. 2.1.1. Non-crossing (left, (1, 4), (2, 3), (5, 6)) and crossing (right, (1, 5), (2, 3), (4, 6))
partitions of the set K_6.
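
For small k, one can also verify γ_k = C_k by brute force. The sketch below (ours, not part of the text) enumerates all partitions of K_k and tests the crossing condition of Definition 2.1.4 directly:

```python
from itertools import combinations
from math import comb

def partitions(s):
    """Enumerate all set partitions of the list s (parts as tuples)."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for r in range(len(rest) + 1):
        for others in combinations(rest, r):
            part = (first,) + others
            remaining = [x for x in rest if x not in others]
            for p in partitions(remaining):
                yield [part] + p

def crossing(p):
    """Definition 2.1.4: a < b < c < d with a, c in one part, b, d in another."""
    for P in p:
        for Q in p:
            if P is Q:
                continue
            for a in P:
                for c in P:
                    if a < c and any(a < b < c for b in Q) and any(d > c for d in Q):
                        return True
    return False

for k in range(1, 7):
    count = sum(1 for p in partitions(list(range(1, k + 1))) if not crossing(p))
    assert count == comb(2 * k, k) // (k + 1)   # gamma_k = C_k
print("non-crossing counts match the Catalan numbers for k <= 6")
```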

Exercise 2.1.5 Prove that for z ∈ C such that z ∉ [−2, 2], the Stieltjes transform
S(z) of the semicircle law (see Definition 2.4.1) equals

    S(z) = ∫ σ(dx)/(x − z) = (−z + √(z² − 4)) / 2 .

Hint: Either use the residue theorem, or relate S(z) to the generating function β̂(z),
see Remark 2.4.2.
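
As a numerical sanity check on the formula in Exercise 2.1.5 (our own sketch; the quadrature grid is arbitrary), one can compare a Riemann-sum evaluation of ∫ σ(x)/(x − z) dx with the closed form for real z > 2, where the principal square root gives the right branch:

```python
import numpy as np

def S_formula(z):
    # closed form of Exercise 2.1.5; the principal square root is the
    # correct branch for real z > 2
    return (-z + np.sqrt(z**2 - 4)) / 2

x = np.linspace(-2, 2, 200001)
sigma = np.sqrt(4 - x**2) / (2 * np.pi)   # semicircle density (2.1.3)
for z in (2.5, 3.0, 5.0):
    quad = np.sum(sigma / (x - z)) * (x[1] - x[0])    # Riemann sum of S(z)
    print(z, round(quad, 5), round(S_formula(z), 5))  # the columns agree
```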

2.1.2 Proof #1 of Wigner's Theorem 2.1.1

Define the probability distribution L̄_N = EL_N by the relation ⟨L̄_N, f⟩ = E⟨L_N, f⟩
for all f ∈ C_b, and set m_k^N := ⟨L̄_N, x^k⟩. Theorem 2.1.1 follows from the following
two lemmas.

Lemma 2.1.6 For every k ∈ N,

    lim_{N→∞} m_k^N = m_k .

(See (2.1.4) for the definition of m_k.)

Lemma 2.1.7 For every k ∈ N and ε > 0,

    lim_{N→∞} P( |⟨L_N, x^k⟩ − ⟨L̄_N, x^k⟩| > ε ) = 0 .

Indeed, assume that Lemmas 2.1.6 and 2.1.7 have been proved. To conclude the
proof of Theorem 2.1.1, one needs to check that for any bounded continuous
function f,

    lim_{N→∞} ⟨L_N, f⟩ = ⟨σ, f⟩ ,  in probability.        (2.1.8)
Toward this end, note first that an application of the Chebyshev inequality yields

    P( ⟨L_N, |x|^k 1_{|x|>B}⟩ > ε ) ≤ (1/ε) E⟨L_N, |x|^k 1_{|x|>B}⟩ ≤ ⟨L̄_N, x^{2k}⟩ / (ε B^k) .

Hence, by Lemma 2.1.6,

    limsup_{N→∞} P( ⟨L_N, |x|^k 1_{|x|>B}⟩ > ε ) ≤ ⟨σ, x^{2k}⟩ / (ε B^k) ≤ 4^k / (ε B^k) ,

where we used that C_k ≤ 4^k. Thus, with B = 5, noting that the left side above is
increasing in k, it follows that

    limsup_{N→∞} P( ⟨L_N, |x|^k 1_{|x|>B}⟩ > ε ) = 0 .        (2.1.9)

In particular, when proving (2.1.8), we may and will assume that f is supported
on the interval [−5, 5].

Fix next such an f and ε > 0. By the Weierstrass approximation theorem, one
can find a polynomial Q_ε(x) = Σ_{i=0}^{L} c_i x^i such that

    sup_{x:|x|≤B} |Q_ε(x) − f(x)| ≤ ε/8 .

Then,

    P( |⟨L_N, f⟩ − ⟨σ, f⟩| > ε ) ≤ P( |⟨L_N, Q_ε⟩ − ⟨L̄_N, Q_ε⟩| > ε/4 )
        + P( |⟨L̄_N, Q_ε⟩ − ⟨σ, Q_ε⟩| > ε/4 ) + P( |⟨L_N, Q_ε 1_{|x|>B}⟩| > ε/4 )
        =: P_1 + P_2 + P_3 .

By an application of Lemma 2.1.7, P_1 →_{N→∞} 0. Lemma 2.1.6 implies that P_2 = 0
for N large, while (2.1.9) implies that P_3 →_{N→∞} 0. This completes the proof of
Theorem 2.1.1 (modulo Lemmas 2.1.6 and 2.1.7).

2.1.3 Proof of Lemma 2.1.6: words and graphs

The starting point of the proof of Lemma 2.1.6 is the following identity:

    ⟨L̄_N, x^k⟩ = (1/N) E tr X_N^k
               = (1/N) Σ_{i_1,…,i_k=1}^{N} E[ X_N(i_1,i_2) X_N(i_2,i_3) ⋯ X_N(i_{k−1},i_k) X_N(i_k,i_1) ]
               =: (1/N) Σ_{i_1,…,i_k=1}^{N} E T_i^N =: (1/N) Σ_{i_1,…,i_k=1}^{N} T̄_i^N ,        (2.1.10)
where we use the notation i = (i_1, . . . , i_k).

The proof of Lemma 2.1.6 now proceeds by considering which terms contribute
to (2.1.10). Let us provide first an informal sketch that explains the emergence of
the Catalan numbers, followed by a formal proof. For the purpose of this sketch,
assume that the variables Y_i vanish, and that the law of Z_{1,2} is symmetric, so that
all odd moments vanish (and in particular, ⟨L̄_N, x^k⟩ = 0 for k odd).

A first step in the sketch (that is fully justified in the actual proof below) is to
check that the only terms in (2.1.10) that survive the passage to the limit involve
only second moments of Z_{i,j}, because there are order N^{k/2+1} nonzero terms but
only at most order N^{k/2} terms that involve moments higher than or equal to 4. One
then sees that

    ⟨L̄_N, x^{2k}⟩ = (1 + O(N^{−1})) (1/N) Σ_{∀p, ∃! j≠p: (i_p,i_{p+1})=(i_j,i_{j+1}) or (i_{j+1},i_j)} T̄^N_{i_1,…,i_{2k}} ,        (2.1.11)

where the notation ∃! means "there exists a unique". Considering the index j > 1
such that either (i_j, i_{j+1}) = (i_2, i_1) or (i_j, i_{j+1}) = (i_1, i_2), and recalling that i_2 ≠ i_1
since Y_{i_1} = 0, one obtains

    ⟨L̄_N, x^{2k}⟩ = (1 + O(N^{−1})) (1/N) Σ_{j=2}^{2k} Σ_{i_1≠i_2=1}^{N} Σ_{i_3,…,i_{j−1}, i_{j+2},…,i_{2k}=1}^{N}
        ( E[X_N(i_2,i_3) ⋯ X_N(i_{j−1},i_2) X_N(i_1,i_{j+2}) ⋯ X_N(i_{2k},i_1)]
        + E[X_N(i_2,i_3) ⋯ X_N(i_{j−1},i_1) X_N(i_2,i_{j+2}) ⋯ X_N(i_{2k},i_1)] ) .        (2.1.12)

Hence, if we could prove that E[⟨L_N − L̄_N, x^k⟩²] = O(N^{−2}) and hence

    E[⟨L_N, x^j⟩ ⟨L_N, x^{2k−j−2}⟩] = ⟨L̄_N, x^j⟩ ⟨L̄_N, x^{2k−j−2}⟩ (1 + O(N^{−1})) ,

we would obtain

    ⟨L̄_N, x^{2k}⟩ = (1 + O(N^{−1})) ( Σ_{j=0}^{2(k−1)} ⟨L̄_N, x^j⟩ ⟨L̄_N, x^{2k−j−2}⟩ + (1/N) ⟨L̄_N, x^{2k−2}⟩ )
                  = (1 + O(N^{−1})) Σ_{j=0}^{2k−2} ⟨L̄_N, x^j⟩ ⟨L̄_N, x^{2k−j−2}⟩
                  = (1 + O(N^{−1})) Σ_{j=0}^{k−1} ⟨L̄_N, x^{2j}⟩ ⟨L̄_N, x^{2(k−j−1)}⟩ ,        (2.1.13)

where we have used the fact that by induction ⟨L̄_N, x^{2k−2}⟩ is uniformly bounded
and also the fact that odd moments vanish. Further,

    ⟨L̄_N, x²⟩ = (1/N) Σ_{i,j=1}^{N} E X_N(i,j)² →_{N→∞} 1 = C_1 .        (2.1.14)

Thus, we conclude from (2.1.13) by induction that ⟨L̄_N, x^{2k}⟩ converges to a limit
a_k with a_0 = a_1 = 1, and further that the family {a_k} satisfies the recursions a_k =
Σ_{j=1}^{k} a_{k−j} a_{j−1}. Comparing with (2.1.7), we deduce that a_k = C_k, as claimed.
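
The moment identity (2.1.10) driving this sketch can itself be sampled by Monte Carlo. In the following illustration (ours; the sizes and number of trials are arbitrary), the averaged normalized traces of even powers approach the Catalan numbers:

```python
import numpy as np

# Monte Carlo estimate of <L̄_N, x^{2k}> = (1/N) E tr X_N^{2k}, cf. (2.1.10).
rng = np.random.default_rng(1)
N, trials = 400, 50
acc = np.zeros(5)
for _ in range(trials):
    Z = rng.standard_normal((N, N))
    X = (Z + Z.T) / np.sqrt(2 * N)          # Gaussian Wigner matrix
    eigs = np.linalg.eigvalsh(X)
    for k in range(5):
        acc[k] += np.mean(eigs ** (2 * k)) / trials
print(np.round(acc, 2))   # approaches the Catalan numbers 1, 1, 2, 5, 14
```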
We turn next to the actual proof. To handle the summation in expressions like
(2.1.10), it is convenient to introduce some combinatorial machinery that will
serve us also in the sequel. We thus first digress and discuss the combinatorics
intervening in the evaluation of the sum in (2.1.10). This is then followed by the
actual proof of Lemma 2.1.6.
In the following definition, the reader may think of S as a subset of the integers.

Definition 2.1.8 (S-words) Given a set S, an S-letter s is simply an element of
S. An S-word w is a finite sequence of letters s_1 ⋯ s_n, at least one letter long.
An S-word w is closed if its first and last letters are the same. Two S-words
w_1, w_2 are called equivalent, denoted w_1 ∼ w_2, if there is a bijection on S that
maps one into the other.

When S = {1, . . . , N} for some finite N, we use the term N-word. Otherwise, if
the set S is clear from the context, we refer to an S-word simply as a word.
For any S-word w = s_1 ⋯ s_k, we use ℓ(w) = k to denote the length of w, define
the weight wt(w) as the number of distinct elements of the set {s_1, . . . , s_k}, and the
support of w, denoted supp w, as the set of letters appearing in w. With any word
w we may associate an undirected graph, with wt(w) vertices and at most ℓ(w) − 1
edges, as follows.

Definition 2.1.9 (Graph associated with an S-word) Given a word w = s_1 ⋯ s_k,
we let G_w = (V_w, E_w) be the graph with set of vertices V_w = supp w and (undirected)
edges E_w = {{s_i, s_{i+1}}, i = 1, . . . , k−1}. We define the set of self edges as
E_w^s = {e ∈ E_w : e = {u, u}, u ∈ V_w} and the set of connecting edges as E_w^c = E_w \ E_w^s.

The graph G_w is connected since the word w defines a path connecting all the
vertices of G_w, which further starts and terminates at the same vertex if the word
is closed. For e ∈ E_w, we use N_e^w to denote the number of times this path traverses
the edge e (in any direction). We note that equivalent words generate the same
graphs G_w (up to graph isomorphism) and the same passage-counts N_e^w.

Coming back to the evaluation of T̄_i^N, see (2.1.10), note that any k-tuple of
integers i defines a closed word w_i = i_1 i_2 ⋯ i_k i_1 of length k + 1. We write wt_i =
wt(w_i), which is nothing but the number of distinct integers in i. Then,

    T̄_i^N = N^{−k/2} Π_{e∈E_{w_i}^c} E(Z_{1,2}^{N_e^{w_i}}) Π_{e∈E_{w_i}^s} E(Y_1^{N_e^{w_i}}) .        (2.1.15)

In particular, T̄_i^N = 0 unless N_e^{w_i} ≥ 2 for all e ∈ E_{w_i}, which implies that wt_i ≤
k/2 + 1. Also, (2.1.15) shows that if w_i ∼ w_{i′} then T̄_i^N = T̄_{i′}^N. Further, if N ≥ t then
there are exactly

    C_{N,t} := N(N−1)(N−2) ⋯ (N−t+1)

N-words that are equivalent to a given N-word of weight t. We make the following
definition:

    W_{k,t} denotes a set of representatives for equivalence classes of closed
    t-words w of length k + 1 and weight t with N_e^w ≥ 2 for each e ∈ E_w.
                                                                        (2.1.16)

One deduces from (2.1.10) and (2.1.15) that

    ⟨L̄_N, x^k⟩ = Σ_{t=1}^{⌊k/2⌋+1} (C_{N,t} / N^{k/2+1}) Σ_{w∈W_{k,t}} Π_{e∈E_w^c} E(Z_{1,2}^{N_e^w}) Π_{e∈E_w^s} E(Y_1^{N_e^w}) .        (2.1.17)

Note that the cardinality of W_{k,t} is bounded by the number of closed S-words of
length k + 1 when the cardinality of S is t ≤ k, that is, |W_{k,t}| ≤ t^k ≤ k^k. Thus,
(2.1.17) and the finiteness of r_k, see (2.1.1), imply that

    lim_{N→∞} ⟨L̄_N, x^k⟩ = 0 ,  if k is odd ,

while, for k even,

    lim_{N→∞} ⟨L̄_N, x^k⟩ = Σ_{w∈W_{k,k/2+1}} Π_{e∈E_w^c} E(Z_{1,2}^{N_e^w}) Π_{e∈E_w^s} E(Y_1^{N_e^w}) .        (2.1.18)

We have now motivated the following definition. Note that for the purpose of this
section, the case k = 0 in Definition 2.1.10 is not really needed. It is introduced in
this way here in anticipation of the analysis in Section 2.1.6.

Definition 2.1.10 A closed word w of length k + 1 ≥ 1 is called a Wigner word if
either k = 0 or k is even and w is equivalent to an element of W_{k,k/2+1}.
We next note that if w ∈ W_{k,k/2+1} then G_w is a tree: indeed, G_w is a connected
graph with |V_w| = k/2 + 1, hence |E_w| ≥ k/2, while the condition N_e^w ≥ 2 for each
e ∈ E_w implies that |E_w| ≤ k/2. Thus, |E_w| = |V_w| − 1, implying that G_w is a tree,
that is a connected graph with no loops. Further, the above implies that E_w^s is
empty for w ∈ W_{k,k/2+1}, and thus, for k even,

    lim_{N→∞} ⟨L̄_N, x^k⟩ = |W_{k,k/2+1}| .        (2.1.19)

We may now complete the

Proof of Lemma 2.1.6 Let k be even. It is convenient to choose the set of
representatives W_{k,k/2+1} such that each word w = v_1 ⋯ v_{k+1} in that set satisfies, for
i = 1, . . . , k + 1, the condition that {v_1, . . . , v_i} is an interval in Z beginning at 1.
(There is a unique choice of such representatives.) Each element w ∈ W_{k,k/2+1}
determines a path v_1, v_2, . . . , v_k, v_{k+1} = v_1 of length k on the tree G_w. We refer
to this path as the exploration process associated with w. Let d(v, v′) denote the
distance between vertices v, v′ on the tree G_w, i.e. the length of the shortest path
on the tree beginning at v and terminating at v′. Setting x_i = d(v_{i+1}, v_1), one sees
that each word w ∈ W_{k,k/2+1} defines a Dyck path D(w) = (x_1, x_2, . . . , x_k) of length
k. See Figure 2.1.2 for an example of such coding. Conversely, given a Dyck path
x = (x_1, . . . , x_k), one may construct a word w = T(x) ∈ W_{k,k/2+1} by recursively
constructing an increasing sequence w_2, . . . , w_{k+1} = w of words, as follows. Put
w_2 = (1, 2). For i > 2, if x_{i−1} = x_{i−2} + 1, then w_i is obtained by adjoining on the
right of w_{i−1} the smallest positive integer not appearing in w_{i−1}. Otherwise, w_i is
obtained by adjoining on the right of w_{i−1} the next-to-last letter of w_{i−1}. Note that
for all i, G_{w_i} is a tree (because G_{w_2} is a tree and, inductively, at stage i, either a
backtrack is added to the exploration process on G_{w_{i−1}} or a leaf is added to G_{w_{i−1}}).
Furthermore, the distance in G_{w_i} between first and last letters of w_i equals x_{i−1}, and
therefore, D(w) = (x_1, . . . , x_k). With our choice of representatives, T(D(w)) = w,
because each uptick in the Dyck path D(w) starting at location i ≥ 2 corresponds
to adjoinment on the right of w_{i−1} of a new letter, which is uniquely determined by
supp w_{i−1}, whereas each downtick at location i ≥ 2 corresponds to the adjoinment
of the next-to-last letter in w_{i−1}. This establishes a bijection between Dyck paths
of length k and W_{k,k/2+1}. Lemma 2.1.3 then establishes that

    |W_{k,k/2+1}| = C_{k/2} .        (2.1.20)

This completes the proof of Lemma 2.1.6.
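
The two maps w ↦ D(w) and x ↦ T(x) used in the proof are short enough to implement verbatim. The sketch below (ours) codes the word of Figure 2.1.2 and recovers it from its Dyck path:

```python
def word_to_dyck(w):
    """D(w): the distances d(v_{i+1}, v_1) along the exploration process."""
    stack = [w[0]]                 # path from the root to the current vertex
    path = []
    for letter in w[1:]:
        if len(stack) >= 2 and stack[-2] == letter:
            stack.pop()            # backtrack along an already-visited edge
        else:
            stack.append(letter)   # a new leaf is attached to the tree
        path.append(len(stack) - 1)
    return path

def dyck_to_word(x):
    """T(x): the canonical representative word built from a Dyck path."""
    w, stack, nxt = [1], [1], 2
    for i, xi in enumerate(x):
        uptick = (xi == 1) if i == 0 else (xi == x[i - 1] + 1)
        if uptick:                 # adjoin the smallest unused letter
            w.append(nxt); stack.append(nxt); nxt += 1
        else:                      # adjoin the next-to-last letter
            stack.pop(); w.append(stack[-1])
    return w

w = [1, 2, 3, 2, 4, 2, 5, 2, 1]    # the word of Figure 2.1.2
x = word_to_dyck(w)
print(x)                           # [1, 2, 1, 2, 1, 2, 1, 0]
assert dyck_to_word(x) == w        # T(D(w)) = w
```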


From the proof of Lemma 2.1.6 we extract as a further benefit a proof of a fact
needed in Chapter 5. Let k be an even positive integer and let K_k = {1, . . . , k}.
Recall the notion of non-crossing partition of K_k, see Definition 2.1.4. We define
a pair partition of K_k to be a partition all of whose parts are two-element sets.

Fig. 2.1.2. Coding of the word w = 123242521 into a tree and a Dyck path of length 8.
Note that ℓ(w) = 9 and wt(w) = 5.


The fact we need is that the equivalence classes of Wigner words of length k + 1
and the non-crossing pair partitions of K_k are in canonical bijective correspondence.
More precisely, we have the following result which describes the bijection
in detail.

Proposition 2.1.11 Given a Wigner word w = i_1 ⋯ i_{k+1} of length k + 1, let π_w be
the partition generated by the function j ↦ {i_j, i_{j+1}} : {1, . . . , k} → E_w. (Here, recall,
E_w is the set of edges of the graph G_w associated with w.) Then the following
hold:
(i) π_w is a non-crossing pair partition;
(ii) every non-crossing pair partition of K_k is of the form π_w for some Wigner
word w of length k + 1;
(iii) if two Wigner words w and w′ of length k + 1 satisfy π_w = π_{w′}, then w and w′
are equivalent.

Proof (i) Because a Wigner word w viewed as a walk on its graph G_w crosses
every edge exactly twice, π_w is a pair partition. Because the graph G_w is a tree,
the pair partition π_w is non-crossing.
(ii) The non-crossing pair partitions of K_k correspond bijectively to Dyck paths.
More precisely, given a non-crossing pair partition π of K_k, associate with it a
path f_π = (f_π(1), . . . , f_π(k)) by the rules that f_π(1) = 1 and, for i = 2, . . . , k,
f_π(i) = f_π(i−1) + 1 (resp., f_π(i) = f_π(i−1) − 1) if i is the first (resp., second)
member of the part of π to which i belongs. It is easy to check that f_π is a Dyck
path, and furthermore that the map π ↦ f_π puts non-crossing pair partitions of
K_k into bijective correspondence with Dyck paths of length k. Now choose a
Wigner word w whose associated Dyck path D(w), see the proof of Lemma 2.1.6,
equals f_π. One can verify that π_w = π.
(iii) Given π_w = π_{w′}, one can verify that D(w) = D(w′), from which the equivalence
of w and w′ follows.

2.1.4 Proof of Lemma 2.1.7: sentences and graphs

By Chebyshev's inequality, it is enough to prove that

    lim_{N→∞} | E[⟨L_N, x^k⟩²] − ⟨L̄_N, x^k⟩² | = 0 .

Proceeding as in (2.1.10), one has

    E(⟨L_N, x^k⟩²) − ⟨L̄_N, x^k⟩² = (1/N²) Σ_{i_1,…,i_k=1}^{N} Σ_{i′_1,…,i′_k=1}^{N} T̄_{i,i′}^N ,        (2.1.21)

where

    T̄_{i,i′}^N = E(T_i^N T_{i′}^N) − E(T_i^N) E(T_{i′}^N) .        (2.1.22)

The role of words in the proof of Lemma 2.1.6 is now played by pairs of words,
which is a particular case of a sentence.

Definition 2.1.12 (S-sentences) Given a set S, an S-sentence a is a finite
sequence of S-words w_1, . . . , w_n, at least one word long. Two S-sentences a_1, a_2
are called equivalent, denoted a_1 ∼ a_2, if there is a bijection on S that maps one
into the other.

As with words, for a sentence a = (w_1, w_2, . . . , w_n), we define the support as
supp(a) = ∪_{i=1}^{n} supp(w_i), and the weight wt(a) as the cardinality of supp(a).

Definition 2.1.13 (Graph associated with an S-sentence) Given a sentence a
= (w_1, . . . , w_k), with w_i = s^i_1 s^i_2 ⋯ s^i_{ℓ(w_i)}, we set G_a = (V_a, E_a) to be the graph with
set of vertices V_a = supp(a) and (undirected) edges

    E_a = { {s^i_j, s^i_{j+1}} , j = 1, . . . , ℓ(w_i) − 1 , i = 1, . . . , k } .

We define the set of self edges as E_a^s = {e ∈ E_a : e = {u, u}, u ∈ V_a} and the set of
connecting edges as E_a^c = E_a \ E_a^s.
In words, the graph associated with a sentence a = (w_1, . . . , w_k) is obtained by
piecing together the graphs of the individual words w_i (and in general, it differs
from the graph associated with the word obtained by concatenating the words
w_i). Unlike the graph of a word, the graph associated with a sentence may be
disconnected. Note that the sentence a defines k paths in the graph G_a. For e ∈ E_a,
we use N_e^a to denote the number of times the union of these paths traverses the
edge e (in any direction). We note that equivalent sentences generate the same
graphs G_a and the same passage-counts N_e^a.

Coming back to the evaluation of T̄_{i,i′}^N, see (2.1.21), recall the closed words w_i, w_{i′}
of length k + 1, and define the two-word sentence a_{i,i′} = (w_i, w_{i′}). Then,

    T̄_{i,i′}^N = N^{−k} ( Π_{e∈E_{a_{i,i′}}^c} E(Z_{1,2}^{N_e^{a_{i,i′}}}) Π_{e∈E_{a_{i,i′}}^s} E(Y_1^{N_e^{a_{i,i′}}})        (2.1.23)
        − Π_{e∈E_{w_i}^c} E(Z_{1,2}^{N_e^{w_i}}) Π_{e∈E_{w_i}^s} E(Y_1^{N_e^{w_i}}) Π_{e∈E_{w_{i′}}^c} E(Z_{1,2}^{N_e^{w_{i′}}}) Π_{e∈E_{w_{i′}}^s} E(Y_1^{N_e^{w_{i′}}}) ) .

In particular, T̄_{i,i′}^N = 0 unless N_e^{a_{i,i′}} ≥ 2 for all e ∈ E_{a_{i,i′}}. Also, T̄_{i,i′}^N = 0 unless
E_{w_i} ∩ E_{w_{i′}} ≠ ∅. Further, (2.1.23) shows that if a_{i,i′} ∼ a_{j,j′} then T̄_{i,i′}^N = T̄_{j,j′}^N. Finally,
if N ≥ t then there are exactly C_{N,t} N-sentences that are equivalent to a given
N-sentence of weight t. We make the following definition:

    W_{k,t}^{(2)} denotes a set of representatives for equivalence classes of sentences a
    of weight t consisting of two closed t-words (w_1, w_2), each of length k + 1,
    with N_e^a ≥ 2 for each e ∈ E_a, and E_{w_1} ∩ E_{w_2} ≠ ∅.
                                                                        (2.1.24)

One deduces from (2.1.21) and (2.1.23) that

    E(⟨L_N, x^k⟩²) − ⟨L̄_N, x^k⟩²        (2.1.25)
        = Σ_{t=1}^{2k} (C_{N,t} / N^{k+2}) Σ_{a=(w_1,w_2)∈W_{k,t}^{(2)}} ( Π_{e∈E_a^c} E(Z_{1,2}^{N_e^a}) Π_{e∈E_a^s} E(Y_1^{N_e^a})
        − Π_{e∈E_{w_1}^c} E(Z_{1,2}^{N_e^{w_1}}) Π_{e∈E_{w_1}^s} E(Y_1^{N_e^{w_1}}) Π_{e∈E_{w_2}^c} E(Z_{1,2}^{N_e^{w_2}}) Π_{e∈E_{w_2}^s} E(Y_1^{N_e^{w_2}}) ) .

We have completed the preliminaries to


Proof of Lemma 2.1.7 In view of (2.1.25), it suffices to check that W_{k,t}^{(2)} is empty
for t ≥ k + 2. Since we need it later, we prove a slightly stronger claim, namely
that W_{k,t}^{(2)} is empty for t ≥ k + 1.

Toward this end, note that if a ∈ W_{k,t}^{(2)} then G_a is a connected graph (because
E_{w_1} ∩ E_{w_2} ≠ ∅), with t vertices and at most k edges (since N_e^a ≥ 2 for e ∈ E_a),
which is impossible when t > k + 1. Considering the case t = k + 1, it follows that
G_a is a tree, and each edge must be visited by the paths generated by a exactly
twice. Because the path generated by w_1 in the tree G_a starts and ends at the same
vertex, it must visit each edge an even number of times. Thus, the set of edges
visited by w_1 is disjoint from the set of edges visited by w_2, contradicting the
definition of W_{k,t}^{(2)}.

Remark 2.1.14 Note that in the course of the proof of Lemma 2.1.7, we actually
showed that for N > 2k,

    E(⟨L_N, x^k⟩²) − ⟨L̄_N, x^k⟩²        (2.1.26)
        = Σ_{t=1}^{k} (C_{N,t} / N^{k+2}) Σ_{a=(w_1,w_2)∈W_{k,t}^{(2)}} ( Π_{e∈E_a^c} E(Z_{1,2}^{N_e^a}) Π_{e∈E_a^s} E(Y_1^{N_e^a})
        − Π_{e∈E_{w_1}^c} E(Z_{1,2}^{N_e^{w_1}}) Π_{e∈E_{w_1}^s} E(Y_1^{N_e^{w_1}}) Π_{e∈E_{w_2}^c} E(Z_{1,2}^{N_e^{w_2}}) Π_{e∈E_{w_2}^s} E(Y_1^{N_e^{w_2}}) ) ,

that is, that the summation in (2.1.25) can be restricted to t ≤ k.

Exercise 2.1.15 Consider symmetric random matrices X_N, with the zero mean
independent random variables {X_N(i, j)}_{1≤i≤j≤N} no longer assumed identically
distributed nor all of variance 1/N. Check that Theorem 2.1.1 still holds if one
assumes that for all ε > 0,

    lim_{N→∞} #{(i, j) : |1 − N E X_N(i,j)²| < ε} / N² = 1 ,

and for all k ≥ 1, there exists a finite r_k independent of N such that

    sup_{1≤i≤j≤N} E|√N X_N(i,j)|^k ≤ r_k .

Exercise 2.1.16 Check that the conclusion of Theorem 2.1.1 remains true when
convergence in probability is replaced by almost sure convergence.
Hint: Using Chebyshev's inequality and the Borel–Cantelli Lemma, it is enough
to verify that for all positive integers k, there exists a constant C = C(k) such that

    | E[⟨L_N, x^k⟩²] − ⟨L̄_N, x^k⟩² | ≤ C / N² .

Exercise 2.1.17 In the setup of Theorem 2.1.1, assume that r_k < ∞ for all k but
not necessarily that E[Z_{1,2}²] = 1. Show that, for any positive integer k,

    sup_{N∈N} E[⟨L_N, x^k⟩] =: C(r_ℓ, ℓ ≤ k) < ∞ .

Exercise 2.1.18 We develop in this exercise the limit theory for Wishart matrices.
Let M = M(N) be a sequence of positive integers such that

    lim_{N→∞} M(N)/N = α ∈ [1, ∞) .

Consider an N × M(N) matrix Y_N with i.i.d. entries of mean zero and variance
1/N, and such that E[N^{k/2} |Y_N(1,1)|^k] ≤ r_k < ∞. Define the N × N Wishart matrix
as W_N = Y_N Y_N^T, and let L_N denote the empirical measure of the eigenvalues of W_N.
Set L̄_N = EL_N.
(a) Write N⟨L̄_N, x^k⟩ as

    Σ_{i_1,…,i_k} Σ_{j_1,…,j_k} E[ Y_N(i_1,j_1) Y_N(i_2,j_1) Y_N(i_2,j_2) Y_N(i_3,j_2) ⋯ Y_N(i_k,j_k) Y_N(i_1,j_k) ]

and show that the only contributions to the sum (divided by N) that survive the
passage to the limit are those in which each term appears exactly twice.
Hint: use the words i_1 j_1 i_2 j_2 . . . j_k i_1 and a bipartite graph to replace the Wigner
analysis.
(b) Code the contributions as Dyck paths, where the even heights correspond to
i indices and the odd heights correspond to j indices. Let ℓ = ℓ(i, j) denote the
number of times the excursion makes a descent from an odd height to an even
height (this is the number of distinct j indices in the tuple!), and show that the
combinatorial weight of such a path is asymptotic to N^{k+1} α^ℓ.
(c) Let ℓ̄ denote the number of times the excursion makes a descent from an even
height to an odd height, and set

    β_k^α = Σ_{Dyck paths of length 2k} α^ℓ ,   β̄_k^α = Σ_{Dyck paths of length 2k} α^{ℓ̄} .

(The β_k^α are the kth moments of any weak limit of L̄_N.) Prove that

    β_k^α = α Σ_{j=1}^{k} β̄_{k−j}^α β_{j−1}^α ,   β̄_k^α = Σ_{j=1}^{k} β_{k−j}^α β̄_{j−1}^α ,   k ≥ 1 .

(d) Setting β̂_α(z) = Σ_{k=0}^{∞} β_k^α z^k, prove that β̂_α(z) = 1 + z β̂_α(z)² + (α−1)z β̂_α(z),
and thus the limit F_α of L̄_N possesses the Stieltjes transform (see Definition 2.4.1)
−z^{−1} β̂_α(1/z), where

    β̂_α(z) = ( 1 − (α−1)z − √(1 − 2(α+1)z + (α−1)²z²) ) / (2z) .

(e) Conclude that F_α possesses a density f_α supported on [b_−, b_+], with b_− =
(1 − √α)², b_+ = (1 + √α)², satisfying

    f_α(x) = √((x − b_−)(b_+ − x)) / (2πx) ,   x ∈ [b_−, b_+] .        (2.1.27)

(This is the famous Marčenko–Pastur law, due to [MaP67].)
(f) Prove the analog of Lemma 2.1.7 for Wishart matrices, and deduce that L_N →
F_α weakly, in probability.
(g) Note that F_1 is the image of the semicircle distribution under the transformation
x ↦ x².
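
The limit in parts (e)–(f) is again easy to observe numerically. This sketch (ours, with the arbitrary choice α = 2 and arbitrary sizes) compares the spectrum of a sample Wishart matrix with the density (2.1.27):

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 1500, 2.0
M = int(alpha * N)
Y = rng.standard_normal((N, M)) / np.sqrt(N)   # entries of variance 1/N
eigs = np.linalg.eigvalsh(Y @ Y.T)             # Wishart spectrum

b_lo, b_hi = (1 - np.sqrt(alpha))**2, (1 + np.sqrt(alpha))**2
edges = np.linspace(b_lo, b_hi, 9)
emp = np.histogram(eigs, bins=edges)[0] / N

def f_alpha(x):
    """Marchenko-Pastur density (2.1.27)."""
    return np.sqrt((x - b_lo) * (b_hi - x)) / (2 * np.pi * x)

theo = []
for a, b in zip(edges[:-1], edges[1:]):
    grid = np.linspace(a, b, 20001)
    theo.append(float(np.mean(f_alpha(grid)) * (b - a)))
print(np.round(emp, 3))
print(np.round(theo, 3))   # the two rows approximately agree
```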

2.1.5 Some useful approximations

This section is devoted to the following simple observation that often allows one
to considerably simplify arguments concerning the convergence of empirical mea-
sures.

Lemma 2.1.19 (Hoffman–Wielandt) Let A, B be N × N symmetric matrices, with
eigenvalues λ_1^A ≤ λ_2^A ≤ ⋯ ≤ λ_N^A and λ_1^B ≤ λ_2^B ≤ ⋯ ≤ λ_N^B. Then

    Σ_{i=1}^{N} |λ_i^A − λ_i^B|² ≤ tr(A − B)² .

Proof Note that tr A² = Σ_i (λ_i^A)² and tr B² = Σ_i (λ_i^B)². Let U denote the matrix
diagonalizing B written in the basis determined by A, and let D_A, D_B denote the
diagonal matrices with diagonal elements λ_i^A, λ_i^B respectively. Then,

    tr AB = tr D_A U D_B U^T = Σ_{i,j} λ_i^A λ_j^B u_{ij}² .

The last sum is linear in the coefficients v_{ij} = u_{ij}², and the orthogonality of U
implies that Σ_j v_{ij} = 1, Σ_i v_{ij} = 1. Thus

    tr AB ≤ sup_{v_{ij}≥0: Σ_j v_{ij}=1, Σ_i v_{ij}=1} Σ_{i,j} λ_i^A λ_j^B v_{ij} .        (2.1.28)

But this is a maximization of a linear functional over the convex set of doubly
stochastic matrices, and the maximum is obtained at the extreme points, which
are well known to correspond to permutations. The maximum among permutations
is then easily checked to be Σ_i λ_i^A λ_i^B. Collecting these facts together implies
Lemma 2.1.19. Alternatively, one sees directly that a maximizing V = {v_{ij}} in
(2.1.28) is the identity matrix. Indeed, assume w.l.o.g. that v_{11} < 1. We then
construct a matrix V̄ = {v̄_{ij}} with v̄_{11} > v_{11} and v̄_{ii} = v_{ii} for i > 1 such that V̄ is also
a maximizing matrix. Indeed, because v_{11} < 1, there exist a j and a k with v_{1j} > 0
and v_{k1} > 0. Set v = min(v_{1j}, v_{k1}) > 0 and define v̄_{11} = v_{11} + v, v̄_{kj} = v_{kj} + v and
v̄_{1j} = v_{1j} − v, v̄_{k1} = v_{k1} − v, and v̄_{ab} = v_{ab} for all other pairs ab. Then,

    Σ_{i,j} λ_i^A λ_j^B (v̄_{ij} − v_{ij}) = v( λ_1^A λ_1^B + λ_k^A λ_j^B − λ_k^A λ_1^B − λ_1^A λ_j^B )
        = v( λ_1^A − λ_k^A )( λ_1^B − λ_j^B ) ≥ 0 .

Thus, V̄ = {v̄_{ij}} satisfies the constraints, is also a maximum, and the number of
zero elements in the first row and column of V̄ is larger by at least 1 than the
corresponding one for V. If v̄_{11} = 1, the claim follows, while if v̄_{11} < 1, one
repeats this (at most 2N − 2 times) to conclude. Proceeding in this manner with
all diagonal elements of V, one sees that indeed the maximum of the right side of
(2.1.28) is Σ_i λ_i^A λ_i^B, as claimed.

Remark 2.1.20 The statement and proof of Lemma 2.1.19 carry over to the case
where A and B are both Hermitian matrices.

Lemma 2.1.19 allows one to perform all sorts of truncations when proving con-
vergence of empirical measures. For example, let us prove the following variant
of Wigners Theorem 2.1.1.

Theorem 2.1.21 Assume XN is as in (2.1.2), except that instead of (2.1.1), only


r2 < is assumed. Then, the conclusion of Theorem 2.1.1 still holds.

Proof Fix a constant C and consider the symmetric matrix XN whose elements
satisfy, for 1 i j N,
XN (i, j) = XN (i, j)1N|XN (i, j)|C E(XN (i, j)1N|XN (i, j)|C ).

Then, with iN denoting the eigenvalues of XN , ordered, it follows from Lemma


2.1.19 that
1 N N 1
|i iN |2 N tr(XN XN )2 .
N i=1
But
1
WN := tr(XN XN )2
N
1 2
N2
NX (i, j)1 E( NX (i, j)1 ) .
N | NXN (i, j)|C N | NXN (i, j)|C
i, j

Since r2 < , and the involved


random variables are identical in law to either Z1,2
or Y1 , it follows that E[( NXN (i, j))2 1|NXN (i, j)|C ] converges to 0 uniformly in
2.1 T RACES , MOMENTS AND COMBINATORICS 23

N, i, j, when C converges to infinity. Hence, one may chose for each a large
enough C such that P(|WN | > ) < . Further, let
| f (x) f (y)
Lip(R) = { f Cb (R) : sup | f (x)| 1, sup 1} .
x x =y |x y|

Then, on the event {|WN | < }, it holds that for f Lip(R),


1
|LN , f  LN , f |
N i
|iN iN | ,

where LN denotes the empirical measure of the eigenvalues of XN , and Jensens


inequality was used in the second inequality. This, together with the weak conver-
gence in probability of LN toward the semicircle law assured by Theorem 2.1.1,
and the fact that weak convergence is equivalent to convergence with respect to
the Lipschitz bounded metric, see Theorem C.8, complete the proof of Theorem
2.1.21.

2.1.6 Maximal eigenvalues and FurediKomlos enumeration

Wigners theorem asserts the weak convergence of the empirical measure of eigen-
values to the compactly supported semicircle law. One immediately is led to sus-
pect that the maximal eigenvalue of XN should converge to the value 2, the largest
element of the support of the semicircle distribution. This fact, however, does not
follow from Wigners Theorem. Nonetheless, the combinatorial techniques we
have already seen allow one to prove the following, where we use the notation
introduced in (2.1.1) and (2.1.2).

Theorem 2.1.22 (Maximal eigenvalue) Consider a Wigner matrix XN satisfying


rk kCk for some constant C and all positive integers k. Then, NN converges to 2
in probability.

Remark The assumption of Theorem 2.1.22 holds if the random variables |Z1,2 |
and |Y1 | possess a finite exponential moment.
Proof of Theorem 2.1.22 Fix > 0 and let g : R  R+ be a continuous function
supported on [2 , 2], with  , g = 1. Then, applying Wigners Theorem 2.1.1,
1
P(NN < 2 ) P(LN , g = 0) P(|LN , g  , g| > ) N 0 . (2.1.29)
2
We thus need to provide a complementary estimate on the probability that NN is
large. We do that by estimating LN , x2k  for k growing with N, using the bounds
24 2. W IGNER MATRICES

on rk provided in the assumptions. The key step is contained in the following


combinatorial lemma that gives information on the sets Wk,t , see (2.1.16).

Lemma 2.1.23 For all integers k > 2t 2 one has the estimate

|Wk,t | 2k k3(k2t+2) . (2.1.30)

The proof of Lemma 2.1.23 is deferred to the end of this section.


Equipped with Lemma 2.1.23, we have for 2k < N, using (2.1.17),
k+1
Nw Nw
LN , x2k  Nt(k+1) |W2k,t | wW
sup E(Z1,2e ) E(Y1 e )
c s
(2.1.31)
t=1 2k,t eEw eEw
k+1  k+1t
(2k)6 Nw Nw
4 k
N
sup c E(Z1,2e ) s E(Y1 e ) .
wW2k,t eEw
t=1 eEw

To evaluate the last expectation, fix w W2k,t , and let l denote the number of edges
in Ewc with New = 2. Holders inequality then gives
Nw Nw
c E(Z1,2e ) s E(Y1 e ) r2k2l ,
eEw eEw

with the convention that r0 = 1. Since Gw is connected, |Ewc | |Vw |1 = t 1. On


the other hand, by noting that New 3 for |Ewc |l edges, one has 2k 3(|Ewc |l)+
2l + 2|Ews |. Hence, 2k 2l 6(k + 1 t). Since r2q is a nondecreasing function of
q bounded below by 1, we get, substituting back in (2.1.31), that for some constant
c1 = c1 (C) > 0 and all k < N,
k+1  k+1t
(2k)6
LN , x  4
2k k
r6(k+1t) (2.1.32)
t=1 N
k+1  k+1t k  c1 i
(2k)6 (6(k + 1 t))6C k
4k 4k .
t=1 N i=0 N

Choose next a sequence k(N) N such that

k(N)c1 /N N 0 but k(N)/ log N N .

Then, for any > 0, and all N large,

P(NN > (2 + )) P(NLN , x2k(N)  > (2 + )2k(N) )


NLN , x2k(N)  2N4k(N)
N 0 ,
(2 + ) 2k(N) (2 + )2k(N)
completing the proof of Theorem 2.1.22, modulo Lemma 2.1.23.

2.1 T RACES , MOMENTS AND COMBINATORICS 25

Proof of Lemma 2.1.23 The idea of the proof is to keep track of the number of
possibilities to prevent words in Wk,t from having weight k/2 + 1. Toward this
end, let w Wk,t be given. A parsing of the word w is a sentence aw = (w1 , . . . , wn )
such that the word obtained by concatenating the words wi is w. One can imagine
creating a parsing of w by introducing commas between parts of w.
We say that a parsing a = aw of w is an FK parsing (after Furedi and Komlos),
and call the sentence a an FK sentence, if the graph associated with a is a tree, if
Nea 2 for all e Ea , and if for any i = 1, . . . , n 1, the first letter of wi+1 belongs

to ij=1 supp w j . If the one-word sentence a = w is an FK parsing, we say that w
is an FK word. Note that the constituent words in an FK parsing are FK words.
As will become clear next, the graph of an FK word consists of trees whose
edges have been visited twice by w, glued together by edges that have been visited
only once. Recalling that a Wigner word is either a one-letter word or a closed
word of odd length and maximal weight (subject to the constraint that edges are
visited at least twice), this leads to the following lemma.

Lemma 2.1.24 Each FK word can be written in a unique way as a concatenation


of pairwise disjoint Wigner words. Further, there are at most 2n1 equivalence
classes of FK words of length n.

Proof of Lemma 2.1.24 Let w = s1 sn be an FK word of length n. By definition,


Gw is a tree. Let {si j , si j +1 }rj=1 denote those edges of Gw visited only once by the
walk induced by w. Defining i0 = 1, one sees that the words w j = si j1 +1 si j ,
j 1, are closed, disjoint, and visit each edge in the tree Gw j exactly twice. In
particular, with l j := i j i j1 1, it holds that l j is even (possibly, l j = 0 if w j
is a one-letter word), and further if l j > 0 then w j Wl j ,l j /2+1 . This decomposi-
tion being unique, one concludes that for any z, with Nn denoting the number of
equivalence classes of FK words of length n, and with |W0,1 | := 1,
r
Nn zn = r zl j +1 |Wl j ,l j /2+1 |
n=1 r=1 {l j } j=1 j=1
l j even
 r

= z+ z 2l+1
|W2l,l+1 | , (2.1.33)
r=1 l=1

in the sense of formal power series. By the proof of Lemma 2.1.6, |W2l,l+1 | =
Cl = l . Hence, by Lemma 2.1.3, for |z| < 1/4,


1 1 4z2
z+ z 2l+1
|W2l,l+1 | = z (z ) =
2
.
l=1 2z
26 2. W IGNER MATRICES

Substituting in (2.1.33), one sees that (again, in the sense of power series)


z (z2 ) 1 1 4z2 1 z + 12
nN z n
= =
1 z (z2 ) 2z 1 + 1 4z2
= +
2 1 4z2
.
n=1

Using the fact that


 k  
1 t 2k
= k ,
1 t k=0 4 k
one concludes that
 
1 2n
Nn z = z + 2 (1 + 2z) z2n
n
n
,
n=1 n=1

from which Lemma 2.1.24 follows.


Our interest in FK parsings is the following FK parsing w of a word w =


s1 sn . Declare an edge e of Gw to be new (relative to w) if for some index
1 i < n we have e = {si , si+1 } and si+1 {s1 , . . . , si }. If the edge e is not new,
then it is old. Define w to be the sentence obtained by breaking w (that is, insert-
ing a comma) at all visits to old edges of Gw and at third and subsequent visits to
new edges of Gw .

6 6
5 5

2 3 2 3

1 1
7

4 7 4

Fig. 2.1.3. Two inequivalent FK sentences [x1 , x2 ] corresponding to (solid line) b =


141252363 and (dashed line) c = 1712 (in left) 3732 (in right).

Since a word w can be recovered from its FK parsing by omitting the extra
commas, and since the number of equivalence classes of FK words is estimated
by Lemma 2.1.24, one could hope to complete the proof of Lemma 2.1.23 by
controlling the number of possible parsed FK sequences. A key step toward this
end is the following lemma, which explains how FK words are fitted together to
form FK sentences. Recall that any FK word w can be written in a unique way as
a concatenation of disjoint Wigner words wi , i = 1, . . . , r. With si denoting the first
(and last) letter of wi , define the skeleton of w as the word s1 sr . Finally, for a
2.1 T RACES , MOMENTS AND COMBINATORICS 27

sentence a with graph Ga , let G1a = (Va1 , Ea1 ) be the graph with vertex set Va = Va1
and edge set Ea1 = {e Ea : Nea = 1}. Clearly, when a is an FK sentence, G1a is
always a forest, that is a disjoint union of trees.

Lemma 2.1.25 Suppose b is an FK sentence with n 1 words and c is an FK


word with skeleton s1 sr such that s1 supp (b). Let  be the largest index such
that s supp b, and set d = s1 s . Then a = (b, c) is an FK sentence only if
supp b supp c = supp d and d is a geodesic in G1b .

(A geodesic connecting x, y G1b is a path of minimal length starting at x and


terminating at y.) A consequence of Lemma 2.1.25 is that there exist at most
(wt(b))2 equivalence classes of FK sentences x1 , . . . , xn such that b x1 , . . . , xn1
and c xn . See Figure 2.1.3 for an example of two such equivalence classes and
their pictorial description.
Before providing the proof of Lemma 2.1.25, we explain how it leads to

Completion of proof of Lemma 2.1.23 Let (t, , m) denote the set of equiva-
lence classes of FK sentences a = (w1 , . . . , wm ) consisting of m words, with total
length mi=1 (wi ) =  and wt(a) = t. An immediate corollary of Lemma 2.1.25 is
that
 
1
|(t, , m)| 2mt 2(m1) . (2.1.34)
m1
 
1
Indeed, there are c,m := m-tuples of positive integers summing to ,
m1
and thus at most 2m c,m equivalence classes of sentences consisting of m pair-
wise disjoint FK words with sum of lengths equal to . Lemma 2.1.25 then shows
that there are at most t 2(m1) ways to glue these words into an FK sentence,
whence (2.1.34) follows.
For any FK sentence a consisting of m words with total length , we have that
m = |Ea1 | 2wt(a) + 2 +  . (2.1.35)
Indeed, the word obtained by concatenating the words of a generates a list of  1
(not necessarily distinct) unordered pairs of adjoining letters, out of which m 1
correspond to commas in the FK sentence a and 2|Ea | |Ea1 | correspond to edges
of Ga . Using that |Ea | = |Va | 1, (2.1.35) follows.
Consider a word w Wk,t that is parsed into an FK sentence w consisting of
m words. Note that if an edge e is retained in Gw , then no comma is inserted
at e at the first and second passage on e (but is introduced if there are further
passages on e). Therefore, Ew1 = 0.
/ By (2.1.35), this implies that for such words,
28 2. W IGNER MATRICES

m 1 = k + 2 2t. Inequality (2.1.34) then allows one to conclude the proof of


Lemma 2.1.23.

Proof of Lemma 2.1.25 Assume a is an FK sentence. Then Ga is a tree, and since


the Wigner words composing c are disjoint, d is the unique geodesic in Gc Ga
connecting s1 to s . Hence, it is also the unique geodesic in Gb Ga connecting
s1 to s . But d visits only edges of Gb that have been visited exactly once by the
constituent words of b, for otherwise (b, c) would not be an FK sentence (that
is, a comma would need to be inserted to split c). Thus, Ed Eb1 . Since c is
an FK word, Ec1 = Es1 sr . Since a is an FK sentence, Eb Ec = Eb1 Ec1 . Thus,
Eb Ec = Ed . But, recall that Ga , Gb , Gc , Gd are trees, and hence

|Va | = 1 + |Ea | = 1 + |Eb | + |Ec | |Eb Ec | = 1 + |Eb | + |Ec | |Ed |


= 1 + |Eb | + 1 + |Ec | 1 |Ed | = |Vb | + |Vc | |Vd | .

Since |Vb | + |Vc | |Vb Vc | = |Va |, it follows that |Vd | = |Vb Vc |. Since Vd
Vb Vc , one concludes that Vd = Vb Vc , as claimed.

Remark 2.1.26 The result described in Theorem 2.1.22 is not optimal, in the sense
that even with uniform bounds on the (rescaled) entries, i.e. rk uniformly bounded,
the estimate one gets on the displacement of the maximal eigenvalue to the right
of 2 is O(n1/6 log n), whereas the true displacement is known to be of order n2/3
(see Section 2.7 for more details, and, in the context of complex Gaussian Wigner
matrices, see Theorems 3.1.4 and 3.1.5).

Exercise 2.1.27 Prove that the conclusion of Theorem 2.1.22 holds with conver-
gence in probability replaced by either almost sure convergence or L p conver-
gence.

Exercise 2.1.28 Prove that the statement of Theorem 2.1.22 can be strengthened
to yield that for some constant = (C) > 0, N (NN 2) converges to 0, almost
surely.

Exercise 2.1.29 Assume that for some constants > 0, C, the independent (but
not necessarily identically distributed) entries {XN (i, j)}1i jN of the symmetric
matrices XN satisfy

sup E(e N|XN (i, j)|
) C.
i, j,N

Prove that there exists a constant c1 = c1 (C) such that lim supN NN c1 , almost
surely, and lim supN E NN c1 .
2.1 T RACES , MOMENTS AND COMBINATORICS 29

Exercise 2.1.30 We develop in this exercise an alternative proof, that avoids mo-
ment computations, to the conclusion of Exercise 2.1.29, under the stronger as-
sumption that for some > 0,

sup E(e ( N|XN (i, j)|)2
) C.
i, j,N

(a) Prove (using Chebyshevs inequality and the assumption) that there exists a
constant c0 independent of N such that for any fixed z RN , and all C large
enough,
P(zT XN 2 > C) ec0C
2N
. (2.1.36)
N
(b) Let N = {zi }i=1
be a minimal deterministic net in the unit ball of RN , that
is zi 2 = 1, supz:z2 =1 infi z zi 2 , and N is the minimal integer with the
property that such a net can be found. Check that

(1 2 ) sup zT XN z sup zTi XN zi + 2 sup sup zT XN zi . (2.1.37)


z:z2 =1 zi N i z:zzi 2

(c) Combine steps (a) and (b) and the estimate N cN , valid for some c > 0, to
conclude that there exists a constant c2 independent of N such that for all C large
enough, independently of N,

P(NN > C) = P( sup zT XN z > C) ec2C


2N
.
z:z2 =1

2.1.7 Central limit theorems for moments

Our goal here is to derive a simple version of a central limit theorem (CLT)
for linear statistics of the eigenvalues of Wigner matrices. With XN a Wigner
matrix and LN the associated empirical measure of its eigenvalues, set WN,k :=
N[LN , xk  LN , xk ]. Let
 x
1
eu
2 /2
(x) = du
2

denote the Gaussian distribution. We set k2 as in (2.1.44) below, and prove the
following.

Theorem 2.1.31 The law of the sequence of random variables WN,k /k converges
weakly to the standard Gaussian distribution. More precisely,
 
WN,k
lim P x = (x) . (2.1.38)
N k
30 2. W IGNER MATRICES

Proof of Theorem 2.1.31 Most of the proof consists of a variance computation.


The reader interested only in a proof of convergence to a Gaussian distribution
(without worrying about the actual variance) can skip to the text following equa-
tion (2.1.45).
(2)
Recall the notation Wk,t , see (2.1.24). Using (2.1.26), we have

2
lim E(WN,k ) = lim N 2 E(LN , xk 2 ) LN , xk 2 (2.1.39)
N N

Na Na
= E(Z1,2e ) E(Y1 e )
(2) eEac eEas
a=(w1 ,w2 )Wk,k
w1 w1 w2 w2 
c Ne
E(Z1,2 ) s E(Y1Ne ) c Ne
E(Z1,2 ) s E(Y1Ne ) .
eEw eEw eEw eEw
1 1 2 2

(2)
Note that if a = (w1 , w2 ) Wk,k then Ga is connected and possesses k vertices and
at most k edges, each visited at least twice by the paths generated by a. Hence,
(2)
with k vertices, Ga possesses either k 1 or k edges. Let Wk,k,+ denote the subset
(2)
of Wk,k such that |Ea | = k (that is, Ga is unicyclic, i.e. possesses one edge too
(2) (2)
many to be a tree) and let Wk,k, denote the subset of Wk,k such that |Ea | = k 1.
(2)
Suppose first a Wk,k, . Then, Ga is a tree, Eas = 0,
/ and necessarily Gwi is a
subtree of Ga . This implies that k is even and that |Ewi | k/2. In this case, for
Ew1 Ew2 = 0/ one must have |Ewi | = k/2, which implies that all edges of Gwi are
visited twice by the walk generated by wi , and exactly one edge is visited twice
by both w1 and w2 . In particular, wi are both closed Wigner words of length k + 1.
The emerging picture is of two trees with k/2 edges each glued together at one
edge. Since there are Ck/2 ways to chose each of the trees, k/2 ways of choosing
(in each tree) the edge to be glued together, and 2 possible orientations for the
gluing, we deduce that
 2
(2) k
|Wk,k, | = 2 2
Ck/2 . (2.1.40)
2
(2)
Further, for each a Wk,k, ,

Na Na
c E(Z1,2e ) s E(Y1 e )
eEa eEa
w1 w1 w2 w2 
c Ne
E(Z1,2 ) s E(Y1Ne ) c Ne
E(Z1,2 ) s E(Y1Ne )
eEw eEw eEw eEw
1 1 2 2

= 4
E(Z1,2 2
)[E(Z1,2 )]k2 [E(Z1,2
2
)]k 4
= E(Z1,2 )1. (2.1.41)
2.1 T RACES , MOMENTS AND COMBINATORICS 31
(2)
We next turn to consider Wk,k,+ . In order to do so, we need to understand the
structure of unicyclic graphs.

Definition 2.1.32 A graph G = (V, E) is called a bracelet if there exists an enu-


meration 1 , 2 , . . . , r of V such that


{{1 , 1 }} if r = 1,

{{1 , 2 }} if r = 2,
E=

{{1 , 2 }, {2 , 3 }, {3 , 1 }} if r = 3,

{{1 , 2 }, {2 , 3 }, {3 , 4 }, {4 , 1 }} if r = 4,

and so on. We call r the circuit length of the bracelet G.

We need the following elementary lemma, allowing one to decompose a uni-


cyclic graph as a bracelet and its associated pendant trees. Recall that a graph
G = (V, E) is unicyclic if it is connected and |E| = |V |.

Lemma 2.1.33 Let G = (V, E) be a unicyclic graph. Let Z be the subgraph of


G consisting of all e E such that G \ e is connected, along with all attached
vertices. Let r be the number of edges of Z. Let F be the graph obtained from G
by deleting all edges of Z. Then, Z is a bracelet of circuit length r, F is a forest
with exactly r connected components, and Z meets each connected component of
F in exactly one vertex. Further, r = 1 if E s = 0/ while r 3 otherwise.

We call Z the bracelet of G. We call r the circuit length of G, and each of the
components of F we call a pendant tree. (The case r = 2 is excluded from Lemma
2.1.33 because a bracelet of circuit length 2 is a tree and thus never unicyclic.)
See Figure 2.1.4.

4 1

3 2

8 5

7 6

Fig. 2.1.4. The bracelet 1234 of circuit length 4, and the pendant trees, associated with the
unicyclic graph corresponding to [12565752341, 2383412]
32 2. W IGNER MATRICES
(2)
Coming back to a Wk,k,+ , let Za be the associated bracelet (with circuit length
r = 1 or r 3). Note that for any e Ea one has Nea = 2. We claim next that
e Za if and only if New1 = New2 = 1. On the one hand, if e Za then (Va , Ea \ e)
is a tree. If one of the paths determined by w1 and w2 fail to visit e then all edges
visited by this path determine a walk on a tree and therefore the path visits each
edge exactly twice. This then implies that the set of edges visited by the walks
are disjoint, a contradiction. On the other hand, if e = (x, y) and Newi = 1, then all
vertices in Vwi are connected to x and to y by a path using only edges from Ewi \ e.
Hence, (Va , Ea \ e) is connected, and thus e Za .
(2)
Thus, any a = (w1 , w2 ) Wk,k,+ with bracelet length r can be constructed from
the following data: the pendant trees {T ji }rj=1 (possibly empty) associated with
each word wi and each vertex j of the bracelet Za , the starting point for each word
wi on the graph consisting of the bracelet Za and trees {T ji }, and whether Za is
traversed by the words wi in the same or in opposing directions (in the case r 3).
In view of the above, counting the number of ways to attach trees to a bracelet of
length r, and then the distinct number of non-equivalent ways to choose starting
points for the paths on the resulting graph, there are exactly
2
21r3 k2

r
r Cki
(2.1.42)
ki 0: i=1
2 ri=1 ki =kr

(2) (2)
elements of Wk,k,+ with bracelet of length r. Further, for a Wk,k,+ we have

Na Na
E(Z1,2e ) E(Y1 e )
eEac eEas
w1 w1 w2 w2 
Ne
E(Z1,2 ) E(Y1Ne ) Ne
E(Z1,2 ) E(Y1Ne )
eEwc eEws eEwc eEws
1 1 2 2

2 ))k 0
(E(Z1,2 if r 3,
=
(E(Z1,2 )) EY1 0 if r = 1
2 k1 2


1 if r 3 ,
= (2.1.43)
EY12 if r = 1 .
Combining (2.1.39), (2.1.40), (2.1.41), (2.1.42) and (2.1.43), and setting Cx = 0 if
x is not an integer, one obtains, with
2
k2 2
2k2

r
k2 = k2C2k1 EY12 +
2 2 2
4
C k [EZ1,2 1] +
r Cki
, (2.1.44)
r=3 ki 0: i=1
2 ri=1 ki =kr
2.1 T RACES , MOMENTS AND COMBINATORICS 33

that
k2 = lim EWN,k
2
. (2.1.45)
N

The rest of the proof consists in verifying that, for j 3,


  
WN,k j 0 if j is odd ,
lim E = (2.1.46)
N k ( j 1)!! if j is even ,
where ( j 1)!! = ( j 1)( j 3) 1. Indeed, this completes the proof of the
theorem since the right hand side of (2.1.46) coincides with the moments of the
Gaussian distribution , and the latter moments determine the Gaussian distribu-
tion by an application of Carlemans theorem (see, e.g., [Dur96]), since

[(2 j 1)!!](1/2 j) = .
n=1

To see (2.1.46), recall, for a multi-index i = (i1 , . . . , ik ), the terms TiN of (2.1.15),
and the associated closed word wi . Then, as in (2.1.21), one has
N
j
E(WN,k )= n TiN1 ,i2 ,...,i j , (2.1.47)
in1 ,...,ik =1
n=1,2,... j

where

j
TiN1 ,i2 ,...,i j =E (TiNn ETiNn ) . (2.1.48)
n=1

Note that TiN1 ,i2 ,...,i j = 0 if the graph generated by any word wn := win does not
have an edge in common with any graph generated by the other words wn , n = n.
Motivated by that and our variance computation, let
( j)
Wk,t denote a set of representatives for equivalence classes of
sentences a of weight t consisting of j closed words (w1 , w2 , . . . , w j ),
each of length k + 1, with Nea 2 for each e Ea , and such that for
each n there is an n = n (n) = n such that Ewn Ewn = 0.
/
(2.1.49)
As in (2.1.25), one obtains
jk jk
CN,t
j
E(WN,k ) = CN,t TwN1 ,w2 ,...,w j := jk/2 Ta . (2.1.50)
t=1 ( j) t=1 N ( j)
a=(w1 ,w2 ,...,w j )Wk,t aWk,t

The next lemma, whose proof is deferred to the end of the section, is concerned
( j)
with the study of Wk,t .
34 2. W IGNER MATRICES

Lemma 2.1.34 Let c denote the number of connected components of Ga for a


( j)
t Wk,t . Then, c  j/2 and wt(a) c j + (k + 1) j/2.

In particular, Lemma 2.1.34 and (2.1.50) imply that



j
0 if j is odd ,
lim E(WN,k ) = ( j) Ta if j is even . (2.1.51)
N aWk,k j/2

( j)
By Lemma 2.1.34, if a Wk,k j/2 for j even then Ga possesses exactly j/2 con-
nected components. This is possible only if there exists a permutation

: {1, . . . , j} {1, . . . , j} ,

all of whose cycles have length 2 (that is, a matching), such that the connected
components of Ga are the graphs {G(wi ,w (i) ) }. Letting mj denote the collection of
all possible matchings, one thus obtains that for j even,
j/2
Ta = Twi ,w (i)
( j) mj i=1 (w ,w (2)
aWk,k j/2 i (i) )Wk,k

= m kj = |mj |kj = kj ( j 1)!! , (2.1.52)


j

which, together with (2.1.51), completes the proof of Theorem 2.1.31.


Proof of Lemma 2.1.34 That c  j/2 is immediate from the fact that the sub-
graph corresponding to any word in a must have at least one edge in common with
at least one subgraph corresponding to another word in a.
Next, put
j
!
j
a = [[i,n ]kn=1 ]i=1 ,I= {i} {1, . . . , k} , A = [{i,n , i,n+1 }](i,n)I .
i=1

We visualize A as a left-justified table of j rows. Let G = (V , E ) be any spanning


forest in Ga , with c connected components. Since every connected component of
G is a tree, we have
wt(a) = c + |E | . (2.1.53)

Now let X = {Xin }(i,n)I be a table of the same shape as A, but with all entries
equal either to 0 or 1. We call X an edge-bounding table under the following
conditions.

For all (i, n) I, if Xi,n = 1, then Ai,n E .


2.2 C OMPLEX W IGNER MATRICES 35

For each e E there exist distinct (i1 , n1 ), (i2 , n2 ) I such that Xi1 ,n1 = Xi2 ,n2 =
1 and Ai1 ,n1 = Ai2 ,n2 = e.
For each e E and index i {1, . . . , j}, if e appears in the ith row of A then
there exists (i, n) I such that Ai,n = e and Xi,n = 1.

For any edge-bounding table X the corresponding quantity 12 (i,n)I Xi,n bounds
|E |. At least one edge-bounding table exists, namely the table with a 1 in position
(i, n) for each (i, n) I such that Ai,n E and 0 elsewhere. Now let X be an edge-
bounding table such that for some index i0 all the entries of X in the i0 th row are
equal to 1. Then the closed word wi0 is a walk in G , and hence every entry in the
i0 th row of A appears there an even number of times and a fortiori at least twice.
Now choose (i0 , n0 ) I such that Ai0 ,n0 E appears in more than one row of A.
Let Y be the table obtained by replacing the entry 1 of X in position (i0 , n0 ) by
the entry 0. Then Y is again an edge-bounding table. Proceeding in this way we
can find an edge-bounding table with 0 appearing at least once in every row, and
hence we have |E |  |I| j
2 . Together with (2.1.53) and the definition of I, this
completes the proof.

Exercise 2.1.35 (from [AnZ05]) Prove that the random vector {WN,i }ki=1 satisfies
a multidimensional CLT (as N ). (See Exercise 2.3.7 for an extension of this
result.)

2.2 Complex Wigner matrices

In this section we describe the (minor) modifications needed when one considers
the analog of Wigners theorem for Hermitian matrices. Compared with (2.1.2),
we will have complex-valued random variables Zi, j . That is, start with two in-
dependent families of i.i.d. random variables {Zi, j }1i< j (complex-valued) and
{Yi }1i (real-valued), zero mean, such that EZ1,22 = 0, E|Z |2 = 1 and, for all
1,2
integers k 1,
 
rk := max E|Z1,2 |k , E|Y1 |k < . (2.2.1)

Consider the (Hermitian) N N matrix XN with entries



/ N if i < j ,
Zi, j
XN ( j, i) = XN (i, j) = (2.2.2)
Yi / N if i = j .
We call such a matrix a Hermitian Wigner matrix, and if the random variables Zi, j
and Yi are Gaussian, we use the term Gaussian Hermitian Wigner matrix. The
case of Gaussian Hermitian Wigner matrices in which EY12 = 1 is of particular
36 2. W IGNER MATRICES

importance, and
for reasons that will become clearer in Chapter 3, such matrices
(rescaled by N) are referred to as Gaussian unitary ensemble (GUE) matrices.
As before, let iN denote the (real) eigenvalues of XN , with 1N 2N
NN , and recall that the empirical distribution of the eigenvalues is the probability
measure on R defined by
1 N
LN = iN .
N i=1

The following is the analog of Theorem 2.1.1.

Theorem 2.2.1 (Wigner) For a Hermitian Wigner matrix, the empirical measure
LN converges weakly, in probability, to the semicircle distribution.

As in Section 2.1.2, the proof of Theorem 2.2.1 is a direct consequence of the


following two lemmas.

Lemma 2.2.2 For any k N,

lim mNk = mk .
N

Lemma 2.2.3 For any k N and > 0,


 

lim P LN , xk  LN , xk  > = 0 .
N

Proof of Lemma 2.2.2 We recall the machinery introduced in Section 2.1.3. Thus,
an N-word w = (s1 , . . . , sk ) defines a graph Gw = (Vw , Ew ) and a path on the graph.
For our purpose, it is convenient to keep track of the direction in which edges are
traversed by the path. Thus, given an edge e = {s, s }, with s < s , we define
New,+ as the number of times the edge is traversed from s to s , and we set New, =
New New,+ as the number of times it is traversed in the reverse direction.
Recalling the equality (2.1.10), we now have instead of (2.1.15) the equation
1 wi ,+ wi , wi
TiN = c
N k/2 eEw
Ne
E(Z1,2 Ne
(Z1,2 ) ) s E(Y1Ne ) . (2.2.3)
eEw
i i

w
In particular, TiN = 0 unless Ne i 2 for all e Ewi . 2 = 0,
Furthermore, since EZ1,2
w w ,+
one has TiN = 0 if Ne i = 2 and Ne i = 1 for some e Ewi .
A slight complication occurs since the function
w,+ w,
Ne Ne
gw (New,+ , New, ) := E(Z1,2 (Z1,2 ) )
2.2 C OMPLEX W IGNER MATRICES 37

is not constant over equivalence classes of words (since changing the letters de-
termining w may switch the role of New,+ and New, in the above expression). Note
however that, for any w Wk,t , one has
w
|gw (New,+ , New, )| E(|Z1,2 |Ne ) .

On the other hand, any w Wk,k/2+1 satisfies that Gw is a tree, with each edge
visited exactly twice by the path determined by w. Since the latter path starts and
ends at the same vertex, one has New,+ = New, = 1 for each e Ew . Thus, repeating
the argument in Section 2.1.3, the finiteness of rk implies that

lim LN , xk  = 0 , if k is odd ,


N

while, for k even,


lim LN , xk  = |Wk,k/2+1 |gw (1, 1) . (2.2.4)
N

Since gw (1, 1) = 1, the proof is completed by applying (2.1.20).


Proof of Lemma 2.2.3 The proof is a rerun of the proof of Lemma 2.1.7, using
the functions gw (New,+ , New, ), defined in the course of proving Lemma 2.2.2. The
(2)
proof boils down to showing that Wk,k+2 is empty, a fact that was established in
the course of proving Lemma 2.1.7.

Exercise 2.2.4 We consider in this exercise Hermitian self-dual matrices, which


in the Gaussian case reduce to matrices from the Gaussian symplectic ensemble
discussed in greater detail in Section 4.1. For any a, b C, set
 
a b
ma,b = Mat2 (C) .
b a
(k)
Let {Zi, j }1i< j,1k4 and {Yi }1iN be independent zero mean real-valued ran-
dom variables of unit variance satisfying the condition (2.1.1). For 1 i < j N,
(1) (2) (3) (4)
set ai, j = (Zi, j + iZi, j )/(2 N), bi, j = (Zi, j + iZi, j )/(2 N), ai,i = Yi / N, bi,i =
0, and write mi, j = mai, j ,bi, j for 1 i j N. Finally, construct a Hermitian matrix
(2)
XN H2N from the 2-by-2 matrices mi, j by setting XN (i, j) = mi, j , 1 i j N.
(a) Let
 
0 1
J1 = Mat2 (R) ,
1 0
and let JN = diag(J1 , . . . , J1 ) Mat2N (R) be the block diagonal matrix with blocks
J1 on the diagonal. Check that XN = JN XN JN1 . This justifies the name self-dual.
(b) Verify that the eigenvalues of XN occur in pairs, and that Wigners Theorem
continues to hold.
38 2. W IGNER MATRICES

2.3 Concentration for functionals of random matrices and logarithmic


Sobolev inequalities

In this short section we digress slightly and prove that certain functionals of ran-
dom matrices have the concentration property, namely, with high probability these
functionals are close to their mean value. A more complete treatment of concen-
tration inequalities and their application to random matrices is postponed to Sec-
tion 4.4. The results of this section will be useful in Section 2.4, where they will
play an important role in the proof of Wigners Theorem via the Stieltjes trans-
form.

2.3.1 Smoothness properties of linear functions of the empirical measure

Let us recall that if X is a symmetric (Hermitian) matrix and f is a bounded mea-


surable function, f (X) is defined as the matrix with the same eigenvectors as X
but with eigenvalues that are the image by f of those of X; namely, if e is an eigen-
vector of X with eigenvalue , Xe = e, f (X)e := f ( )e. In terms of the spectral
decomposition X = UDU with U orthogonal (unitary) and D diagonal real, one
has f (X) = U f (D)U with f (D)ii = f (Dii ). For M N, we denote by ,  the
Euclidean scalar product on RM (or CM ), x, y = M M
i=1 xi yi (x, y = i=1 xi yi ),
and by || ||2 the associated norm ||x||2 = x, x.
2

General functions of independent random variables need not, in general, satisfy


a concentration property. Things are different when the functions involved satisfy
certain regularity conditions. It is thus reassuring to see that linear functionals of
the empirical measure, viewed as functions of the matrix entries, do possess some
regularity properties.
Throughout this section, we denote the Lipschitz constant of a function G :
RM R by
|G(x) G(y)|
|G|L := sup ,
x =yRM x y2

and call G a Lipschitz function if |G|L < . The following lemma is an immediate
application of Lemma 2.1.19. In its statement, we identify C with R2 .

Lemma 2.3.1 Let g : RN R be Lipschitz with Lipschitz constant |g|L . Then,


with X denoting the Hermitian matrix with entries X(i, j), the map

{X(i, j)}1i jN  g(1 (X), . . . , N (X))


2
is a Lipschitz function on RN with Lipschitz constant bounded by 2|g|L . In
2.3 C ONCENTRATION AND LOGARITHMIC S OBOLEV INEQUALITIES 39

particular, if f is a Lipschitz function on R,


{X(i, j)}1i jN  tr( f (X))

is a Lipschitz function on RN(N+1) with Lipschitz constant bounded by 2N| f |L .

2.3.2 Concentration inequalities for independent variables satisfying


logarithmic Sobolev inequalities

We derive in this section concentration inequalities based on the logarithmic


Sobolev inequality.
To begin with, recall that a probability measure P on R is said to satisfy the log-
arithmic Sobolev inequality (LSI) with constant c if, for any differentiable function
f in L2 (P),
 
f2
f 2 log  dP 2c | f |2 dP .
f 2 dP
It is not hard to check, by induction, that if Pi satisfy the LSI with constant c and
if P(M) = M i=1 Pi denotes the product measure on R , then P
M (M) satisfies the LSI

with constant c in the sense that, for every differentiable function F on RM ,


 
F2
F log  2 (M) dP(M) 2c
2
||F||22 dP(M) , (2.3.1)
F dP
where F denotes the gradient of F. (See Exercise 2.3.4 for hints.) We note that
if the law of a random variable X satisfies the LSI with constant c, then for any
fixed = 0, the law of X satisfies the LSI with constant 2 c.
Before discussing consequences of the logarithmic Sobolev inequality, we quote
from [BoL00] a general sufficient condition for it to hold.

Lemma 2.3.2 Let V : RM R satisfy that for some positive constant C, V (x)
x22 /2C is convex. Then, the probability measure (dx) = Z 1 eV (x) dx, where

Z = eV (x) dx, satisfies the logarithmic Sobolev inequality with constant C. In
particular, the standard Gaussian law on RM satisfies the logarithmic Sobolev
inequality with constant 1.

The lemma is also a consequence of the BakryEmery criterion, see Theorem


4.4.18 in Section 4.4 for details.
The interest in the logarithmic Sobolev inequality, in the context of concentra-
tion inequalities, lies in the following argument, that among other things, shows
that LSI implies sub-Gaussian tails.
40 2. W IGNER MATRICES

Lemma 2.3.3 (Herbst) Assume that P satisfies the LSI on RM with constant c.
Let G be a Lipschitz function on RM , with Lipschitz constant |G|L . Then for all
R,
EP [e (GEP (G)) ] ec
2 |G|2 /2
L , (2.3.2)
and so for all > 0
P (|G EP (G)| ) 2e
2 /2c|G|2
L . (2.3.3)

Note that part of the statement in Lemma 2.3.3 is that EP G is finite.


Proof of Lemma 2.3.3 Note first that (2.3.3) follows from (2.3.2). Indeed, by
Chebyshevs inequality, for any > 0,
P (|G EP G| > ) e EP [e |GEP G| ]
e (EP [e (GEP G) ] + EP [e (GEP G) ])
2e ec|G|L
2 2 /2
.
Optimizing with respect to (by taking = /c|G|2L ) yields the bound (2.3.3).
Turning to the proof of (2.3.2), let us first assume that G is a bounded differen-
tiable function such that
M
|| ||G||22 || := sup
M
(xi G(x))2 < .
xR i=1

Define
A = log EP e2 (GEP G) .
Then, taking F = e (GEP G) in (2.3.1), some algebra reveals that for > 0,
 
d A
2c|| ||G||22 || .
d
Now, because G EP (G) is centered,
A
lim =0
0+
and hence integrating with respect to yields
A 2c|| ||G||22 || 2 ,
first for 0 and then for any R by considering the function G instead of G.
This completes the proof of (2.3.2) in the case that G is bounded and differentiable.
Let us now assume only that G is Lipschitz with |G|L < . For > 0, define
G = G (1/ ) (1/ ), and note that |G |L |G|L < . Consider the reg-

ularization G (x) = p G (x) = G (y)p (x y)dy with the Gaussian density
2.3 C ONCENTRATION AND LOGARITHMIC S OBOLEV INEQUALITIES 41

p (x) = e|x| /2 dx/ (2 )M such that p (x)dx converges weakly towards the
2

atomic measure 0 as converges to 0. Since, for any x RM ,



|G (x) G (x)| |G|L ||y||2 p (y)dy = M|G|L ,

G converges pointwise towards G. Moreover, G is Lipschitz, with Lipschitz


constant bounded by |G|L independently of . G is also continuously differen-
tiable and
 G 22  = sup sup {2G (x), u u22 }
xRM uRM
sup sup{2 1 (G (x + u) G (x)) u22 }
u,xRM >0

sup {2|G|L u2 u22 } = |G|2L . (2.3.4)


uRM

Thus, we can apply (2.3.2) in the bounded differentiable case to find that for any
> 0 and all R,
EP [e G ] e EP G ec
2 |G|2 /2
L . (2.3.5)
Therefore, by Fatous Lemma,
EP [e G ] elim inf 0 EP G ec
2 |G|2 /2
L . (2.3.6)
We next show that lim 0 EP G = EP G, which, in conjunction with (2.3.6), will
conclude the proof. Indeed, (2.3.5) implies that
P (|G EP G | > ) 2e
2 /2c|G|2
L . (2.3.7)
Consequently,

E[(G EP G )2 ] = 2 xP (|G EP G | > x) dx
0
 x2
2c|G|2
4 xe L dx = 4c|G|2L , (2.3.8)
0
so that the sequence (G EP G ) 0 is uniformly integrable. Now, G converges
pointwise towards G and therefore there exists a constant K, independent of ,
such that for < 0 , P(|G | K) 34 . On the other hand, (2.3.7) implies that
P(|G EP G | r) 34 for some r independent of . Thus,
{|G EP G | r} {|G | K} {|EP G | K + r}
is not empty, providing a uniform bound on (EP G ) <0 . We thus deduce from
(2.3.8) that sup <0 EP G2 is finite, and hence (G ) <0 is uniformly integrable. In
particular,
lim EP G = EP G < ,
0
42 2. W IGNER MATRICES

which finishes the proof.


Exercise 2.3.4 (From [Led01], page 98)



(a) Let f 0 be a measurable function and set EntP ( f ) = f log( f /EP f )dP.
Prove that
EntP ( f ) = sup{EP f g : EP eg 1} .
(b) Use induction and the above representation to prove (2.3.1).

2.3.3 Concentration for Wigner-type matrices

We consider in this section (symmetric) matrices XN with independent (and not


necessarily identically distributed) entries {XN (i, j)}1i jN . The following is an
immediate corollary of Lemmas 2.3.1 and 2.3.3.

Theorem 2.3.5 Suppose that the laws of the independent entries


{XN (i, j)}1i jN all satisfy the LSI with constant c/N. Then, for any Lipschitz
function f on R, for any > 0,
1 N2 2
4c| f |2
P (|tr( f (XN ) E[tr( f (XN )]| N) 2e L . (2.3.9)
Further, for any k {1, . . . , N},
1 N 2
4c| f |2
P (| f (k (XN )) E f (k (XN ))| ) 2e L . (2.3.10)

We note that under the assumptions of Theorem 2.3.5, E N (XN ) is uniformly


bounded, see Exercise 2.1.29 or Exercise 2.1.30. In the Gaussian case, more in-
formation is available, see the bibliographical notes (Section 2.7).
Proof of Theorem 2.3.5 To see (2.3.9), take
G(XN (i, j), 1 i j N) = tr( f (XN )) .
By Lemma 2.3.1,
we see that if f is Lipschitz, G is also Lipschitz with constant
bounded by 2N| f |L and hence Lemma 2.3.3 with M = N(N + 1)/2 yields the
result. To see (2.3.10), apply the same argument to the function
G(XN (i, j), 1 i j N) = f (k (XN )) .

Remark 2.3.6 The assumption of Theorem 2.3.5 is satisfied for Gaussian matrices
whose entries on or above the diagonal are independent, with variance bounded
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 43

by c/N. In particular, the assumptions hold for Gaussian Wigner matrices. We


emphasize that Theorem 2.3.5 applies also when the variance of XN (i, j) depends
on i, j, e.g. when XN (i, j) = aN (i, j)YN (i, j) with YN (i, j) i.i.d. with law P satisfying
the log-Sobolev inequality and a(i, j) uniformly bounded (since if P satisfies the
log-Sobolev inequality with constant c, the law of ax under P satisfies it also with
a constant bounded by a2 c).

Exercise 2.3.7 (From [AnZ05]) Using Exercise 2.1.35, prove that if XN is a Gaus-
sian Wigner matrix and f : R R is a Cb1 function, then N[ f , LN   f , LN ]
satisfies a central limit theorem.

2.4 Stieltjes transforms and recursions

We begin by recalling some classical results concerning the Stieltjes transform of


a probability measure.

Definition 2.4.1 Let be a positive, finite measure on the real line. The Stieltjes
transform of is the function

(dx)
S (z) := , z C\R.
R xz

Note that for z C \ R, both the real and imaginary parts of 1/(x z) are continu-
ous bounded functions of x R and, further, |S (z)| (R)/|z|. These crucial
observations are used repeatedly in what follows.

Remark 2.4.2 The generating function (z), see (2.1.6), is closely related to the
Stieltjes transform of the semicircle distribution : for |z| < 1/4,
 
 
 
(z) = zk x2k (x)dx = zx2 (x)dx
k

k=0 k=0

1
= (x)dx
1 zx2

1 1
= (x)dx = S (1/ z) ,
1 zx z
where the third equality uses the fact that the support of is the interval [2, 2],
and the fourth uses the symmetry of .

Stieltjes transforms can be inverted. In particular, one has


44 2. W IGNER MATRICES

Theorem 2.4.3 For any open interval I with neither endpoint on an atom of ,

1 S ( + i ) S ( i )
(I) = lim d
0 I 2i

1
= lim S ( + i )d . (2.4.1)
0 I

Proof Note first that because



1
S (i) = (dx) ,
1 + x2
we have that S 0 implies = 0. So assume next that S does not vanish
identically. Then, since

y2
lim yS (iy) = lim (dx) = (R)
y+ y+ x2 + y2
by bounded convergence, we may and will assume that (R) = 1, i.e. that is a
probability measure.
Let X be distributed according to , and denote by C a random variable, inde-
pendent of X, Cauchy distributed with parameter , i.e. the law of C has density
dx
. (2.4.2)
(x2 + 2 )
Then, S ( + i )/ is nothing but the density (with respect to Lebesgue mea-
sure) of the law of X +C evaluated at R. The convergence in (2.4.1) is then
just a rewriting of the weak convergence of the law of X + C to that of X, as
0.

Theorem 2.4.3 allows for the reconstruction of a measure from its Stieltjes
transform. Further, one has the following.

Theorem 2.4.4 Let n M1 (R) be a sequence of probability measures.


(a) If n converges weakly to a probability measure then Sn (z) converges to
S (z) for each z C \ R.
(b) If Sn (z) converges for each z C \ R to a limit S(z), then S(z) is the Stieltjes
transform of a sub-probability measure , and n converges vaguely to .
(c) If the probability measures n are random and, for each z C \ R, Sn (z)
converges in probability to a deterministic limit S(z) that is the Stieltjes transform
of a probability measure , then n converges weakly in probability to .

(We recall that n converges vaguely to if, for any continuous function f on R
 
that decays to 0 at infinity, f d n f d . Recall also that a positive measure
on R is a sub-probability measure if it satisfies (R) 1.)
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 45

Proof Part (a) is a restatement of the notion of weak convergence. To see part
(b), let nk be a subsequence on which nk converges vaguely (to a sub-probability
measure ). (Such a subsequence always exists by Hellys selection theorem.)
Because x  1/(z x), for z C \ R, is continuous and decays to zero at infinity,
one obtains the convergence Snk (z) S (z) pointwise for such z. From the hy-
pothesis, it follows that S(z) = S (z). Applying Theorem 2.4.3, we conclude that
all vaguely convergent subsequences converge to the same , and hence n
vaguely.
To see part (c), fix a sequence zi z0 in C \ R with zi = z0 , and define, for
1 , 2 M1 (R), (1 , 2 ) = i 2i |S1 (zi ) S2 (zi )|. Note that (n , ) 0 im-
plies that n converges weakly to . Indeed, moving to a subsequence if neces-
sary, n converges vaguely to some sub-probability measure , and thus Sn (zi )
S (zi ) for each i. On the other hand, the uniform (in i, n) boundedness of Sn (zi )
and (n , ) 0 imply that Sn (zi ) S (zi ). Thus, S (z) = S (z) for all z = zi
and hence, for all z C \ R since the set {zi } possesses an accumulation point and
S , S are analytic. By the inversion formula (2.4.1), it follows that = and in
particular is a probability measure and n converges weakly to = . From
the assumption of part (c) we have that (n , ) 0, in probability, and thus n
converges weakly to in probability, as claimed.

For a matrix X, define SX (z) := (X zI)1 . Taking A = X in the matrix inver-


sion lemma (Lemma A.1), one gets
SX (z) = z1 (XSX (z) I) , z C \ R. (2.4.3)
Note that with LN denoting the empirical measure of the eigenvalues of XN ,
1 1
SLN (z) = trSXN (z) , SLN (z) = EtrSXN (z) .
N N

2.4.1 Gaussian Wigner matrices

We consider in this section the case when XN is a Gaussian Wigner matrix, pro-
viding
Proof #2 of Theorem 2.1.1 (XN a Gaussian Wigner matrix).
Recall first the following identity, characterizing the Gaussian distribution, which
is proved by integration by parts.

Lemma 2.4.5 If is a zero mean Gaussian random variable, then for f differen-
tiable, with polynomial growth of f and f ,
E( f ( )) = E( f ( ))E( 2 ) .
46 2. W IGNER MATRICES

Define next the matrix i,k


N as the symmetric N N matrix satisfying

i,k 1, (i, k) = ( j, l) or (i, k) = (l, j) ,
N ( j, l) =
0, otherwise .
Then, with X an N N symmetric matrix,

SX (z) = SX (z)i,k
N SX (z) . (2.4.4)
X(i, k)
Using now (2.4.3) in the first equality and Lemma 2.4.5 and (2.4.4) (conditioning
on all entries of XN but one) in the second, one concludes that
1 1 11
EtrSXN (z) = + E (trXN SXN (z))
N z zN
 
1 1
= 2 E [SXN (z)(i, i)SXN (z)(k, k) + SXN (z)(i, k) ]
2
z zN i,k
1  2  
zN 2
EYi 2 ESXN (z)(i, i)2
i
1 1 1
= E[LN , (x z)1 2 ] LN , (x z)2 
z z zN
1  2  
2 EYi 2 ESXN (z)(i, i)2 . (2.4.5)
zN i

Since (x z)1 is a Lipschitz function for any fixed z C \ R, it follows from


Theorem 2.3.5 and Remark 2.3.6 that

|E[LN , (x z)1 2 ] LN , (x z)1 2 | N 0 .

This, and the boundedness of 1/(z x)2 for a fixed z as above, imply the existence
of a sequence N (z) N 0 such that, letting SN (z) := N 1 EtrSXN (z), one has
1 1
SN (z) = SN (z)2 + N (z) .
z z
Thus any limit point s(z) of SN (z) satisfies

s(z)(z + s(z)) + 1 = 0 . (2.4.6)

Further, let C+ = {z C : z > 0}. Then, for z C+ , by its definition, s(z) must
have a nonnegative imaginary part, while for z C \ (R C+ ), s(z) must have a
nonpositive imaginary part. Hence, for all z C, with the choice of the branch of
the square-root dictated by the last remark,
1  
s(z) = z z2 4 . (2.4.7)
2
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 47

Comparing with (2.1.6) and using Remark 2.4.2, one deduces that s(z) is the Stielt-
jes transform of the semicircle law , since s(z) coincides with the latter for |z| > 2
and hence for all z C \ R by analyticity. Applying again Theorem 2.3.5 and Re-
mark 2.3.6, it follows that SLN (z) converges in probability to s(z), solution of
(2.4.7), for all z C \ R. The proof is completed by using part (c) of Theorem
2.4.4.

2.4.2 General Wigner matrices

We consider in this section the case when XN is a Wigner matrix. We give now:

Proof #3 of Theorem 2.1.1 (XN a Wigner matrix).


We begin again with a general fact valid for arbitrary symmetric matrices.

(1)
Lemma 2.4.6 Let W HN be a symmetric matrix, and let wi denote the ith col-
umn of W with the entry W (i, i) removed (i.e., wi is an N 1-dimensional vector).
(1)
Let W (i) HN1 denote the matrix obtained by erasing the ith column and row
from W . Then, for every z C \ R,
1
(W zI)1 (i, i) = . (2.4.8)
W (i, i) z wTi (W (i) zIN1 )1 wi

Proof of Lemma 2.4.6 Note first that from Cramers rule,


det(W (i) zIN1 )
(W zIN )1 (i, i) = . (2.4.9)
det(W zI)
Write next
 
W (N) zIN1 wN
W zIN = ,
wNT W (N, N) z

and use the matrix identity (A.1) with A = W (N) zIN1 , B = wN , C = wTN and
D = W (N, N) z to conclude that

det(W zIN ) =

det(W (N) zIN1 ) det W (N, N) z wTN (W (N) zIN1 )1 wN .

The last formula holds in the same manner with W (i) , wi and W (i, i) replacing
W (N) , wN and W (N, N) respectively. Substituting in (2.4.9) completes the proof of
Lemma 2.4.6.

We are now ready to return to the proof of Theorem 2.1.1. Repeating the trunca-
tion argument used in the proof of Theorem 2.1.21, we may and will assume in
48 2. W IGNER MATRICES

the sequel thatXN (i, i) = 0 for all i and that for some constant C independent of N,
it holds that | NXN (i, j)| C for all i, j. Define k (i) = XN (i, k), i.e. k is the kth
column of the matrix XN . Let k denote the N 1 dimensional vector obtained
(k) (1)
from k by erasing the entry k (k) = 0. Denote by XN HN the matrix con-
sisting of XN with the kth row and column removed. By Lemma 2.4.6, one gets
that
1 1 N 1
N
trSXN (z) = (i)
N i=1 z T (X zIN1 )1 i
i N
1
= N (z) , (2.4.10)
z + N 1 trSXN (z)
where
1 N i,N
N (z) =
N i=1 (z N trSXN (z) + i,N )(z N 1 trSXN (z))
1
, (2.4.11)

and
i,N = N 1 trSXN (z) iT (XN zIN1 )1 i .
(i)
(2.4.12)

Our next goal is to prove the convergence in probability of N (z) to zero for
each fixed z C \ R with |z| = 0 > 0. Toward this end, note that the term
z N 1 trSXN (z)) in the right side of (2.4.11) has modulus at least 0 , since
|z| = 0 and all eigenvalues of XN are real. Thus, if we prove the convergence
of supiN |i,N | to zero in probability, it will follow that N (z) converges to 0 in
(i)
probability. Toward this end, let XN denote the matrix XN with the ith column
(i) (i)
and row set to zero. Then, the eigenvalues of XN and XN coincide except that
(i)
XN has one more zero eigenvalue. Hence,
1 1
|trS (i) (z) trS (i) (z)| ,
N XN XN 0 N
(i) (i) (i) (i)
whereas, with the eigenvalues of XN denoted 1 2 N , and those
of XN denoted 1N 2N NN , one has
 1/2
1 1 N (i) 1 1 N (i)
N
|trS (i) (z) trSXN (z)|
XN | k | 2 N |k k |
02 N k=1 k
N N 2
0 k=1
 1/2
1 2 N

02 N k=1
XN (i, k)2 ,

where Lemma 2.1.19 was used in the last inequality. Since | NXN (i, j)| C,
we get that supi N 1 |trS (i) (z) trSXN (z)| converges to zero (deterministically).
XN
2.4 S TIELTJES TRANSFORMS AND RECURSIONS 49

Combining the above, it follows that to prove the convergence of supiN |i,N | to
zero in probability, it is enough to prove the convergence to 0 in probability of
supiN |i,N |, where

(i) 1 (i)
i,N = iT BN (z)i trBN (z)
N
 
1 N1 2 N1
BN (z)(k, k) + i (k)i (k )BN (z)(k, k )
(i) (i)
= N i (k) 1
N k=1 k,k =1,k =k
=: i,N (1) + i,N (2) , (2.4.13)

where BN (z) = (XN zIN1 )1 . Noting that i is independent of BN (z), and


(i) (i) (i)

possesses zero mean independent entries of variance 1/N, one observes by condi-
(i)
tioning on the sigma-field Fi,N generated by XN that E i,N = 0. Further, since
  1
N 1 tr BN (z)2 2 ,
(i)
0

and the random variables | N i (k)| are uniformly bounded, it follows that
c1
E|i,N (1)|4 .
N2
for some constant c1 that depends only on 0 and C. Similarly, one checks that
c2
E|i,N (2)|4 2 ,
N
for some constant c2 depending only on C, 0 . One obtains then, by Chebyshevs
inequality, the claimed convergence of supiN |i,N (z)| to 0 in probability.
The rest of the argument is similar to what has already been done in Section
2.4.1, and is omitted.

Remark 2.4.7 We note that reconstruction and continuity results that are stronger
than those contained in Theorems 2.4.3 and 2.4.4 are available. An accessible
introduction to these and their use in RMT can be found in [Bai99]. For example,
in Theorem 2.4.3, if possesses a Holder continuous density m then, for R,

(dx)
S ( + i0) := lim S ( + ) = i m( ) + P.V. (2.4.14)
0 R x
exists, where the notation P.V. stands for principal value. Also, in the context of
Theorem 2.4.4, if the and are probability measures supported on [B, B], a,
are constants satisfying

1 1 1
:= du > ,
|u|a u2 + 1 2
50 2. W IGNER MATRICES

and A is a constant satisfying

4B
:= (0, 1) ,
(A B)(2 1)

then for any v > 0,

(1 )(2 1) sup | ([B, x]) ([B, x])


|x|B
 A
|S (u + iv) S (u + iv)|du (2.4.15)
A
 
1
+ sup | ([B, x + y]) ([B, x])|dy .
v x |y|2va

In the context of random matrices, equation (2.4.15) is useful in obtaining the rate
of convergence of LN to its limit, but we will not discuss this issue here at all.

Exercise 2.4.8 Let Y (N) be a sequence of matrices as in Exercise 2.1.18. By writ-


M(N)
ing WN = YN YNT = i=1 yi yTi for appropriate vectors yi , and again using Lemma
A.1, provide a proof of points (d) and (e) of Exercise 2.1.18 based on Stieltjes
transforms, showing that N 1 trSWN (z) converges to the solution of the equation
m(z) = 1/(z /(1 + m(z)).
Hint: use the equality

IN + (z x)(WN zIN )1 = (WN xIN )(WN zIN )1 , (2.4.16)

and then use the equality

1
yTi (B + yi yTi )1 = yT B1 ,
1 + yTi B1 yi i

with the matrices Bi = WN zI yi yTi , to show that the normalized trace of the
right side of (2.4.16) converges to 0.

2.5 Joint distribution of eigenvalues in the GOE and the GUE

We are going to calculate the joint distribution of eigenvalues of a random sym-


metric or Hermitian matrix under a special type of probability law which displays
a high degree of symmetry but still makes on-or-above-diagonal entries indepen-
dent so that the theory of Wigner matrices applies.
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 51

2.5.1 Definition and preliminary discussion of the GOE and the GUE

Let {i, j , i, j }
i, j=1 be an i.i.d. family of real mean 0 variance 1 Gaussian random
variables. We define
(1) (1)
P2 , P3 , . . .

to be the laws of the random matrices



" # 21,1
21,1 1,2 (1) 1,2 1,3
H2 , 1,2 22,2 H (1) , . . . ,
1,2 22,2 2,3 3
1,3 2,3 23,3
respectively. We define
(2) (2)
P2 , P3 , . . .

to be the laws of the random matrices


1,2 +i1,2 1,3 +i1,3
11
1,2 +i1,2
1,1 1,2 i1,2 2 2
2,3 +i2,3
2 H ,
(2)
2,2 H (2) , . . . ,
1,2 i1,2

2 2 2 3
2,2 1,3 i1,3 2,3 i2,3
2
2

2
3,3

( ) ( )
respectively. A random matrix X HN with law PN is said to belong to the
Gaussian orthogonal ensemble (GOE) or the Gaussian unitary ensemble (GUE)
according as = 1 or = 2, respectively. (We often write GOE(N) and GUE(N)
when an emphasis on the dimension is needed.) The theory of Wigner matrices
developed in previous sections of this book applies here. In particular, for fixed
( ) ( )
, given for each N a random matrix X(N) H N with law PN , the empirical
distribution of the eigenvalues of XN := X(N)/ N tends to the semicircle law of
mean 0 and variance 1.
( )
So whats special about the law PN within the class of laws of Wigner matri-
( )
ces? The law PN is highly symmetrical. To explain the symmetry, as well as
to explain the presence of the terms orthogonal and unitary in our terminol-
( ) ( )
ogy, let us calculate the density of PN with respect to Lebesgue measure N on
( ) ( )
HN . To fix N unambiguously (rather than just up to a positive constant fac-
tor) we use the following procedure. In the case = 1, consider the one-to-one
(1)
onto mapping HN RN(N+1)/2 defined by taking on-or-above-diagonal entries
(1)
as coordinates, and normalize N by requiring it to push forward to Lebesgue
measure on R N(N+1)/2 . Similarly, in the case = 2, consider the one-to-one
(2) 2
onto mapping HN RN CN(N1)/2 = RN defined by taking on-or-above-
(2)
diagonal entries as coordinates, and normalize N by requiring it to push forward
52 2. W IGNER MATRICES
2 ( )
to Lebesgue measure on RN . Let Hi, j denote the entry of H HN in row i and
column j. Note that
N
trH 2 = trHH = Hi,i
2
+2 |Hi, j |2 .
i=1 1i< jN

It is a straightforward matter now to verify that



N/2
( )
2 (2 )N(N+1)/4 exp(trH 2 /4) if = 1,
dPN
(H) = (2.5.1)
dN
( )
2N/2 N /2 exp(trH 2 /2) if = 2.
2

( )
The latter formula clarifies the symmetry of PN . The main thing to notice is that
the density at H depends only on the eigenvalues of H. It follows that if X is a
(1) (1)
random element of HN with law PN , then for any N N orthogonal matrix U,
again UXU has law PN ; and similarly, if X is a random element of HN with
(1) (2)

law PN , then for any N N unitary matrix U, again UXU has law PN . As
(2) (2)

( )
we already observed, for random X HN it makes sense to talk about the joint
distribution of the eigenvalues 1 (X) N (X).

Definition 2.5.1 Let x = (x1 , . . . , xN ) CN . The Vandermonde determinant asso-


ciated with x is
(x) = det({xij1 }ni, j=1 ) = (x j xi ) . (2.5.2)
i< j

(For an easy verification of the second equality in (2.5.2), note that the determinant
is a polynomial that must vanish when xi = x j for any pair i = j.)
The main result in this section is the following.

Theorem 2.5.2 (Joint distribution of eigenvalues: GOE and GUE) Let X


( ) ( )
HN be random with law PN , = 1, 2. The joint distribution of the eigenvalues
1 (X) N (X) has density with respect to Lebesgue measure which equals
N
( )
N!CN 1x1 xN |(x)| e xi /4 ,
2
(2.5.3)
i=1
where
 
1
N
( )
N!CN = N!



|(x)|
e xi2 /4
dxi
i=1
  N(N1)/4+N/2 N
( /2)
= (2 )N/2 ( j /2) . (2.5.4)
2 j=1
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 53

Here, for any positive real s,



(s) = xs1 ex dx (2.5.5)
0

is Eulers Gamma function.

( )
Remark 2.5.3 We refer to the probability measure PN on RN with density
( ) N
dPN ( )
= CN |(x)| e xi /4 ,
2
(2.5.6)
dLebN i=1

where LebN is the Lebesgue measure on RN and CN is given in (2.5.4), as the law
of the unordered eigenvalues of the GOE(N) (when = 1) or GUE(N) (when =
2). The special case = 4 corresponds to the GSE(N) (see Section 4.1 for details
on the explicit construction of random matrices whose eigenvalues are distributed
(4)
according to PN ).
( )
The distributions PN for 1, = 1, 2, 4 also appear as the law of the
unordered eigenvalues of certain random matrices, although with a very different
structure, see Section 4.5.

A consequence of Theorem 2.5.2 is that a.s., the eigenvalues of the GOE and
GUE are all distinct. Let v1 , . . . , vN denote the eigenvectors corresponding to the
eigenvalues (1N , . . . , NN ) of a matrix X from GOE(N) or GUE(N), with their first
nonzero entry positive real. Recall that O(N) (the group of orthogonal matrices)
and U(N) (the group of unitary matrices) admit a unique Haar probability measure
(see Theorem F.13). The invariance of the law of X under arbitrary orthogonal
(unitary) transformations implies then the following.

Corollary 2.5.4 The collection (v1 , . . . , vN ) is independent of the eigenvalues


(1N , . . . , NN ). Each of the eigenvectors v1 , . . . , vN is distributed uniformly on
N1
S+ = {x = (x1 , . . . , xN ) : xi R, x2 = 1, x1 > 0}

(for the GOE), or on


N1
SC,+ = {x = (x1 , . . . , xN ) : x1 R, xi C for i 2, x2 = 1, x1 > 0}

(for the GUE). Further, (v1 , . . . , vN ) is distributed like a sample of Haar measure
on O(N) (for the GOE) or U(N) (for the GUE), with each column multiplied by a
N1 N1
norm one scalar so that the columns all belong to S+ (for the GOE) and SC,+
(for the GUE).

Proof Write X = UDU . Since T XT possesses the same eigenvalues as X and


54 2. W IGNER MATRICES

is distributed like X for any orthogonal (in the GOE case) or unitary (in the GUE
case) T independent of X, and since choosing T uniformly according to Haar
measure and independent of U makes TU Haar distributed and hence of law in-
dependent of that of U, the independence of the eigenvectors and the eigenvalues
follows. All other statements are immediate consequences of this and the fact that
each column of a Haar distributed orthogonal (resp., unitary) matrix is distributed,
after multiplication by a scalar that makes its first entry real and nonnegative, uni-
formly on S+ N1 N1
(resp. SC,+ ).

2.5.2 Proof of the joint distribution of eigenvalues

We present in this section a proof of Theorem 2.5.2 that has the advantage of
being direct, elementary, and not requiring much in terms of computations. On
the other hand, this proof is not enough to provide one with the evaluation of the

normalization constant CN in (2.5.4). The evaluation of the latter is postponed to
subsection 2.5.3, where the Selberg integral formula is derived. Another approach
to evaluating the normalization constants, in the case of the GUE, is provided in
Section 3.2.1.
( )
The idea behind the proof of Theorem 2.5.2 is as follows. Since X HN ,
there exists a decomposition X = UDU , with eigenvalue matrix D DN , where
DN denotes diagonal matrices with real entries, and with eigenvector matrix U
( ) ( )
UN , where UN denotes the collection of orthogonal matrices (when = 1)
or unitary matrices (when = 2). Suppose this map were a bijection (which it
is not, at least at the matrices X without distinct eigenvalues) and that one could
( )
parametrize UN using N(N 1)/2 parameters in a smooth way (which one
cannot). An easy computation shows that the Jacobian of the transformation
would then be a polynomial in the eigenvalues with coefficients that are func-
( )
tions of the parametrization of UN , of degree N(N 1)/2. Since the bijection
must break down when Dii = D j j for some i = j, the Jacobian must vanish on
that set; symmetry and degree considerations then show that the Jacobian must
( )
be proportional to the factor (x) . Integrating over the parametrization of UN
then yields (2.5.3).
In order to make the above construction work, we need to throw away subsets
( )
of HN that fortunately turn out to have zero Lebesgue measure. Toward this
( )
end, we say that U UN is normalized if every diagonal entry of U is strictly
( )
positive real. We say that U UN is good if it is normalized and every entry of
( ),g
U is nonzero. The collection of good matrices is denoted UN . We also say that
D DN is distinct if its entries are all distinct, denoting by DNd the collection of
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 55

distinct matrices, and by DNdo the subset of matrices with decreasing entries, that
is DNdo = {D DNd : Di,i > Di+1,i+1 }.
( ),dg
Let HN denote the subset of H ( ) consisting of those matrices that possess
( ),g
a decomposition X = UDU where D DNd and U UN . The first step is
contained in the following lemma.

( ) ( ),dg
Lemma 2.5.5 HN \ HN has null Lebesgue measure. Further, the map
( ),g ( ),dg
(DN , UN ) HN
do given by (D,U)  UDU is one-to-one and onto, while
( ),g ( ),dg
(DNd , UN ) HN given by the same map is N!-to-one.

Proof of Lemma 2.5.5 In order to prove the first part of the lemma, we note
that for any nonvanishing polynomial function p of the entries of X, the set {X :
p(X) = 0} is closed and has zero Lebesgue measure (this fact can be checked by
applying Fubinis Theorem). So it is enough to exhibit a nonvanishing polynomial
( ) ( ),dg
p with p(X) = 0 if X HN \ HN . Toward this end, we will show that
for such X, either X has some multiple eigenvalue, or, for some k, X and the
matrix X (k) obtained by erasing the kth row and column of X possess a common
eigenvalue.
Given any n by n matrix H, for i, j = 1, . . . , n let H (i, j) be the n 1 by n 1
matrix obtained by deleting the ith column and jth row of H, and write H (k) for
H (k,k) . We begin by proving that if X = UDU with D DNd , and X and X (k) do
not have eigenvalues in common for any k = 1, 2, . . . , N, then all entries of U are
nonzero. Indeed, let be an eigenvalue of X, set A = X I, and define Aadj as the
adj
N by N matrix with Ai, j = (1)i+ j det(A(i, j) ). Using the identity AAadj = det(A)I,
one concludes that AAadj = 0. Since the eigenvalues of X are assumed distinct,
the null space of A has dimension 1, and hence all columns of Aadj are scalar
multiple of some vector v , which is then an eigenvector of X corresponding to the
adj
eigenvalue . Since v (i) = Ai,i = det(X (i) I) = 0 by assumption, it follows
that all entries of v are nonzero. But each column of U is a nonzero scalar
multiple of some v , leading to the conclusion that all entries of U do not vanish.
We recall, see Appendix A.4, that the resultant of the characteristic polynomials
of X and X (k) , which can be written as a polynomial in the entries of X and X (k) ,
and hence as a polynomial P1 in the entries of X, vanishes if and only if X and X (k)
have a common eigenvalue. Further, the discriminant of X, which is a polynomial
P2 in the entries of X, vanishes if and only if not all eigenvalues of X are distinct.
Taking p(X) = P1 (X)P2 (X), one obtains a nonzero polynomial p with p(X) = 0
( ) ( ),dg
if X HN \ HN . This completes the proof of the first part of Lemma 2.5.5.
The second part of the lemma is immediate since the eigenspace corresponding
56 2. W IGNER MATRICES

to each eigenvalue is of dimension 1, the eigenvectors are fixed by the normaliza-


tion condition, and the multiplicity arises from the possible permutations of the
order of the eigenvalues.

( ),g
Next, we say that U UN is very good if all minors of U have nonvanishing
( ),vg
determinant. Let UN denote the collection of very good matrices. The interest
in such matrices is that they possess a particularly nice parametrization.

( ),vg
Lemma 2.5.6 The map T : UN R N(N1)/2 defined by
 
U1,2 U1,N U2,3 U2,N UN1,N
T (U) = ,..., , ,..., ,..., (2.5.7)
U1,1 U1,1 U2,2 U2,2 UN1,N1
(where C is identified
 with R2in the case = 2) is one-to-one with smooth inverse.
( ),vg c
Further, the set T (UN ) is closed and has zero Lebesgue measure.

Proof of Lemma 2.5.6 We begin with the first part. The proof is by an inductive
2
construction. Clearly, U1,1 = 1 + Nj=2 |U1, j |2 /|U1,1 |2 . So suppose that Ui, j are
given for 1 i i0 and 1 j N. Let vi = (Ui,1 , . . . ,Ui,i0 ), i = 1, . . . , i0 . One
then solves the equation
 U 
U1,i0 +1 + Ni=i0 +2 U1,i Ui +1,i
i0 +1,i
v1  U0 0 +1 
v2 N i0 +1,i
2,i0 +1 i=i0 +2 2,i Ui +1,i
U + U
. Z = 0 0 +1 .
.. .
..
 U 
vi0 i0 +1,i
Ui0 ,i0 +1 + Ni=i0 +2 Ui0 ,i Ui +1,i +1
0 0

The very good condition on U ensures that the vector Z is uniquely determined by
this equation, and one then sets
N

i0
Ui0 +1,i 2
Ui2
0 +1,i0 +1
= 1 + k |Z |2
+
k=1 i=i0 +2 Ui0 +1,i0 +1

and
Ui0 +1, j = Z j Ui0 +1,i0 +1 , for 1 j i0 .
(All entries Ui0 +1, j with j > i0 + 1 are then determined by T (U).) This completes
the proof of the first part.
( )
To see the second part, let ZN be the space of matrices whose columns are
orthogonal, whose diagonal entries all equal to 1, and all of whose minors have
( )
nonvanishing determinants. Define the action of T on ZN using (2.5.7). Then,
( ),vg ( )
T (UN ) = T (ZN ). Applying the previous constructions, one immediately
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 57

obtains a polynomial type condition for a point in R N(N1)/2 to not belong to the
( )
set T (ZN ).

( ),vg ( ),dg
Let HN denote the subset of HN consisting of those matrices X that
( ),vg
can be written as X = UDU with D DN and U UN
d .

( ) ( ),vg
Lemma 2.5.7 The Lebesgue measure of HN \ HN is zero.

( ),vg
Proof of Lemma 2.5.7 We identify a subset of HN which we will prove to
be of full Lebesgue measure. We say that a matrix D DNd is strongly distinct if
for any integer r = 1, 2, . . . , N 1 and subsets I, J of {1, 2, . . . , N},

I = {i1 < < ir }, J = { j1 < < jr }


( ),sdg
with I = J, it holds that iI Di,i = iJ Di,i . We consider the subset HN
( ),vg
of HN consisting of those matrices X = UDU with D strongly distinct and
( ),vg
U UN .
Given a positive integer r and subsets I, J as above, put
*
r r
( X)IJ := det Xi , j ,
, =1
+
thus defining a square matrix r X with rows and columns indexed by r-element
subsets of {1, . . . , n}. If we replace each entry of X by its complex conjugate, we
+
replace each entry of r X by its complex conjugate. If we replace X by its trans-
+r
pose, we replace X by its transpose. Given another N by N matrix Y with com-
+ + +
plex entries, by the CauchyBinet Theorem A.2 we have r (XY ) = ( r X)( r Y ).
( ) + ( )
Thus, if U UN then r U Ucr where crN = N!/(N r)!r!. We thus obtain
+ N + + + +
that if X = UDU then r X can be decomposed as r X = ( r U)( r D)( r U ).
+
In particular, if D is not strongly distinct then, for some r, r X does not possess all
( ),vg
eigenvalues distinct. Similarly, if D is strongly distinct but U UN , then some
+r
entry of U vanishes. Repeating the argument presented in the proof of the first
( ) ( ),sdg
part of Lemma 2.5.5, we conclude that the Lebesgue measure of HN \ HN
vanishes. This completes the proof of the lemma.

We are now ready to provide the

Proof of (2.5.3) Recall the map T introduced in Lemma 2.5.6, and define the
( ),vg ( ) ( ),vg
map T : T (UN ) RN HN by setting, for RN and z T (UN ),
D DN with Di,i = i and T (z, ) = T 1 (z)DT 1 (z) . By Lemma 2.5.6, T is
smooth, whereas by Lemma 2.5.5, it is N!-to-1 on a set of full Lebesgue measure
and is locally one-to-one on a set of full Lebesgue measure. Letting J T denote the
58 2. W IGNER MATRICES

Jacobian of T , we note that J T (z, ) is a homogeneous polynomial in of degree


(at most) N(N 1)/2, with coefficients that are functions of z (since derivatives
of T (z, ) with respect to the -variables do not depend on , while derivatives
with respect to the z variables are linear in ). Note next that T fails to be locally
one-to-one when i = j for some i = j. In particular, it follows by the implicit
function theorem that J T vanishes at such points. Hence, ( ) = i< j ( j i )
is a factor of J T . In fact, we have that
( ) is a factor of J T . (2.5.8)
We postpone the proof of (2.5.8) in the case = 2. Since ( ) is a polynomial
of degree N(N 1)/2, it follows from (2.5.8) that J T (z, ) = g(z)( ) for some
(continuous, hence measurable) function g. By Lemma 2.5.7, we conclude that
for any function f that depends only on the eigenvalues of X, it holds that
   N
( )
f ( )|( )| e i /4 d i .
2
N! f (H)dPN = |g(z)|dz
i=1

Up to the normalization constant ( |g(z)|dz)/N!, this is (2.5.3).
It only remains to complete the proof of (2.5.8) in the case = 2. Writing for
brevity W = T 1 (z), we have T = W DW , and W W = I. Using the notation
d T for the matrix of differentials of T , we have d T = (dW )DW + W (dD)W +
W D(dW ). Using the relation d(W W ) = (dW )W + W (dW ) = 0, we deduce
that
W (d T )W = W (dW )D DW (dW ) + (dD) .
Therefore, when i = j for some i = j, a complex entry (above the diagonal) of
W (d T )W vanishes. This implies that, when i = j , there exist two linear (real)
relations between the on-and-above diagonal entries of d T , which implies in turn
that (i j )2 must divide J T .

2.5.3 Selbergs integral formula and proof of (2.5.4)


To complete the description of the joint distribution of eigenvalues of the GOE,
GUE and GSE, we derive in this section an expression for the normalization con-
stant in (2.5.4). The value of the normalization constant does not play a role in the
rest of this book, except for Section 2.6.2.
We begin by stating Selbergs integral formula. We then describe in Corol-
lary 2.5.9 a couple of limiting cases of Selbergs formula. The evaluation of the
normalization constant in (2.5.4) is immediate from Corollary 2.5.9. Recall, see
Definition 2.5.1, that (x) denotes the Vandermonde determinant of x.
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 59

Theorem 2.5.8 (Selbergs integral formula) For all positive numbers a, b and c
we have
 1  1
1 n n1
(a + jc)(b + jc)(( j + 1)c)
|(x)|2c xia1 (1 xi )b1 dxi = .
n! 0 0 i=1 j=0 (a + b + (n + j 1)c)(c)
(2.5.9)

Corollary 2.5.9 For all positive numbers a and c we have


 
1 n n1
(a + jc)(( j + 1)c)
|(x)|2c xia1 exi dxi = , (2.5.10)
n! 0 0 i=1 j=0 (c)

and
 
1 n n1
(( j + 1)c)
|(x)|2c exi /2 dxi = (2 )n/2
2
. (2.5.11)
n! i=1 j=0 (c)

Remark 2.5.10 The identities in Theorem 2.5.8 and Corollary 2.5.9 hold under
rather less stringent conditions on the parameters a, b and c. For example, one
can allow a, b and c to be complex with positive real parts. We refer to the biblio-
graphical notes for references. We note also that only (2.5.11) is directly relevant
to the study of the normalization constants for the GOE and GUE. The usefulness
of the other more complicated formulas will become apparent in Section 4.1.

We will prove Theorem 2.5.8 following Andersons method [And91], after first
explaining how to deduce Corollary 2.5.9 from (2.5.9) by means of the Stirling
approximation, which we recall is the statement
  
2 s s
(s) = (1 + os+ (1)), (2.5.12)
s e
where s tends to + along the positive real axis. (For a proof of (2.5.12) by an
application of Laplaces method, see Exercise 3.5.5.)
Proof of Corollary 2.5.9 We denote the left side of (2.5.9) by Sn (a, b, c). Consider
first the integral
 s  s n
1
Is = (x)2c xia1 (1 xi /s)s dxi ,
n! 0 0 i=1

where s is a large positive number. By monotone convergence, the left side of


(2.5.10) equals lims Is . By rescaling the variables of integration, we find that

Is = sn(a+(n1)c) Sn (a, s + 1, c) .
60 2. W IGNER MATRICES

From (2.5.12) we deduce the formula


(s + 1 + A)
= sAB (1 + os+ (1)) , (2.5.13)
(s + 1 + B)
in which A and B are any real constants. Finally, assuming the validity of (2.5.9),
we can evaluate lims Is with the help of (2.5.13), thus verifying (2.5.10).
Turning to the proof of (2.5.11), consider the integral
 2s  2s n  s
1 xi2
Js =
n! 2s

2s
|(x)|2c
1
2s
dxi ,
i=1

where s is a large positive number. By monotone convergence the left side of


(2.5.11) equals lims Js . By shifting and rescaling the variables of integration,
we find that

Js = 23n(n1)/2+3n/2+2ns sn(n1)c/2+n/2 Sn (s + 1, s + 1, c) .

From (2.5.12) we deduce the formula


(2s + 2 + A) 2A+3/2+2s sA2B+1/2
= (1 + os+ (1)) , (2.5.14)
(s + 1 + B)2 2
where A and B are any real constants. Assuming the validity of (2.5.9), we can
evaluate lims Js with the help of (2.5.14), thus verifying (2.5.11).

Before providing the proof of Theorem 2.5.8, we note the following identity
involving the beta integral in the left side:

 sn+1 1
n n
(s1 ) (sn+1 )
1 xi xisi 1 dxi = .
{xRn :minni=1 xi >0,ni=1 xi <1} i=1 i=1 (s 1 + + sn+1 )

(2.5.15)

The identity (2.5.15) is proved by substituting u1 = tx1 , . . . , un = txn , un+1 = t(1


x1 xn ) in the integral
  n+1

0

0
usi i 1 eui dui ,
i=1

and applying Fubinis Theorem both before and after the substitution.
Proof of Theorem 2.5.8 We aim now to rewrite the left side of (2.5.9) in an
intuitive way, see Lemma 2.5.12 below. Toward this end, we introduce some
notation.
Let Dn be the space consisting of monic polynomials P(t) of degree n in a vari-
able t with real coefficients such that P(t) has n distinct real roots. More generally,
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 61

given an open interval I R, let Dn I Dn be the subspace consisting of polyno-


mials with n distinct roots in I. Given x Rn , let Px (t) = t n + ni=1 (1)i xnit ni .
For any open interval I R, the set {x Rn | Px Dn I} is open, since the pertur-
bation of a degree n polynomial by the addition of a degree n 1 polynomial
with small real coefficients does not destroy the property of having n distinct
real roots, nor does it move the roots very much. By definition a set A Dn
is measurable if and only if {x Rn | Px A} is Lebesgue measurable. Let n
be the measure on Dn obtained by pushing Lebesgue measure on the open set
{x Rn | Px Dn } forward to Dn via x  Px (that is, under n , monic polynomials
of degree n have coefficients that are jointly Lebesgue distributed). Given P Dn ,
we define k (P) R for k = 0, . . . , n by the rule P(t) = nk=0 (1)k k (P)t nk .
Equivalently, if 1 < < n are the roots of P Dn , we have 0 (P) = 1 and
k (P) = i1 ik
1i1 <<ik n

for k = 1, . . . , n. The map (P  (1 (P), . . . , n (P))) : Dn Rn inverts the map


(x  Px ) : {x Rn | Px Dn } Dn . Let D ,n Rn be the open set consisting
of n-tuples (1 , . . . , n ) such that 1 < < n . Finally, for P Dn with roots
= (1 < < n ), we set D(P) = i< j ( j i )2 = ( )2 .

,n put
Lemma 2.5.11 For k,  = 1, . . . , n and = (1 , . . . , n ) D
k
k = k (1 , . . . , n ) = i1 ik , k, =

.
1i1 <<ik n

Then
n
det k, = |i j | = |( )| . (2.5.16)
k,=1
1i< jn

Proof We have
 
k, = k1 (t i ) ,
i{1,...,n}\{}

whence follows the identity


n
(1)m1 knm m, = k ( i ) .
m=1 i{1,...,n}\{}

This last is equivalent to a matrix identity AB = C where det A up to a sign equals


the Vandermonde determinant detni, j=1 ni
j , det B is the determinant we want to
calculate, and detC up to a sign equals (det A)2 . Formula (2.5.16) follows.

(See Exercise 2.5.16 for an alternative proof of Lemma 2.5.11.)


62 2. W IGNER MATRICES

We can now rewrite (2.5.9).

Lemma 2.5.12 The left side of (2.5.9) equals



|P(0)|a1 |P(1)|b1 D(P)c1/2 dn (P) . (2.5.17)
Dn (0,1)

Proof We prove a slightly more general statement: for any nonnegative


n -measurable function f on Dn , we have
  n
f dn = f ((t i ))( )d 1 d n , (2.5.18)
Dn ,n
D i=1

from which (2.5.17) follows by taking f (P) = |P(0)|a1 |P(1)|b1 D(P)c1/2 . To


see (2.5.18), put g(x) = f (Px ) for x Rn such that Px Dn . Then, the left side of
(2.5.18) equals
 
n
g(x1 , . . . , xn )dx1 dxn = g(1 , . . . , n ) det k, d 1 . . . d n ,
{xRn |Px Dn } ,n
D k,=1
(2.5.19)
by the usual formula for changing variables in a multivariable integral. The left
sides of (2.5.18) and (2.5.19) are equal by definition; the right sides are equal by
(2.5.16).

We next transform some naturally occurring integrals on Dn to beta integrals,


see Lemma 2.5.15 below. This involves some additional notation. Let En Dn
Dn+1 be the subset consisting of pairs (P, Q) such that the roots 1 < < n
of P and the roots 1 < < n+1 of Q are interlaced, that is, i (i , i+1 ) for
i = 1, . . . , n. More generally, given an interval I R, let En I = En (Dn I Dn+1 I).

Lemma 2.5.13 Fix Q Dn+1 with roots 1 < < n+1 . Fix real numbers
1 , . . . , n+1 and let P(t) be the unique polynomial in t of degree n with real
coefficients such that the partial fraction expansion
P(t) n+1 i
=
Q(t) i=1 t i
holds. Then the following statements are equivalent:
(I) (P, Q) En .
i=1 i > 0 and i=1 i = 1.
(II) minn+1 n+1

Proof (III) The numbers P(i ) do not vanish and their signs alternate. Similarly,
the numbers Q (i ) do not vanish and their signs alternate. By LHopitals rule, we
have i = P(i )/Q (i ) for i = 1, . . . , n + 1. Thus all the quantities i are nonzero
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 63

and have the same sign. The quantity P(t)/Q (t) depends continuously on t in
the interval [n+1 , ), does not vanish in that interval, and tends to 1/(n + 1) as
t +. Thus n+1 is positive. Since the signs of P(i ) alternate, and so do the
signs of Q (i ), it follows that i = P(i )/Q (i ) > 0 for all i. Because P(t) is
monic, the numbers i sum to 1. Thus condition (II) holds.
(III) Because the signs of the numbers Q (i ) alternate, we have sufficient in-
formation to force P(t) to change sign n + 1 times, and thus to have n distinct real
roots interlaced with the roots of Q(t). And because the numbers i sum to 1, the
polynomial P(t) must be monic in t. Thus condition (I) holds.

Lemma 2.5.14 Fix Q Dn+1 with roots 1 < < n+1 . Then we have

1 n+1 D(Q)1/2
n ({P Dn | (P, Q) En }) =
n! j=1
|Q ( j )|1/2 =
n!
. (2.5.20)

Proof Consider the set

A = {x Rn | (Px , Q) En } .

By definition the left side of (2.5.20) equals the Lebesgue measure of A. Consider
the polynomials Q j (t) = Q(t)/(t j ) for j = 1, . . . , n + 1. By Lemma 2.5.13, for
all x Rn , we have x A if and only if Px (t) = n+1 i=1 i Qi (t) for some real numbers
i such that min i > 0 and i = 1, or equivalently, A is the interior of the convex
hull of the points
 
2, j (1 , . . . , n+1 ), . . . , n+1, j (1 , . . . , n+1 ) Rn for j = 1, . . . , n + 1 ,

where the s are defined as in Lemma 2.5.11 (but with n replaced by n+1). Noting
that 1, 1 for  = 1, . . . , n + 1, the Lebesgue measure of A equals the absolute
k,=1 k, (1 , . . . , n+1 ) by the determinantal formula for computing
value of n!1 detn+1
the volume of a simplex in Rn . Finally, we get the claimed result by (2.5.16).

Lemma 2.5.15 Fix Q Dn+1 with roots 1 < < n+1 . Fix positive numbers
s1 , . . . , sn+1 . Then we have
 si 1/2 (s )
i=1 |Q (i )|
n+1
n+1
|P(i )|si 1 dn (P) =
i
. (2.5.21)
{PDn |(P,Q)En } i=1 (i=1 si )
n+1

Proof For P in the domain of integration in the left side of (2.5.21), define i =
i (P) = P(i )/Q (i ), i = 1, . . . , n + 1. By Lemma 2.5.13, i > 0, n+1
i=1 i = 1,
and further P  (i )ni=1 is a bijection from {P Dn | (P, Q) En } to the domain
of integration in the right side of (2.5.15). Further, the map x  (Px ) is linear.
64 2. W IGNER MATRICES

Hence

n+1
P(i ) si 1
{PDn |(P,Q)En i=1 }
Q (i ) dn (P)
equals, up to a constant multiple C independent of {si }, the right side of (2.5.15).
Finally, by evaluating the left side of (2.5.21) for s1 = = sn+1 = 1 by means of
Lemma 2.5.14 (and recalling that (n + 1) = n!) we find that C = 1.

We may now complete the proof of Theorem 2.5.8. Recall that the integral on
the left side of (2.5.9), denoted as above by Sn (a, b, c), can be represented as the
integral (2.5.17). Consider the double integral

Kn (a, b, c) = |Q(0)|a1 |Q(1)|b1 |R(P, Q)|c1 dn (P)dn+1 (Q) ,
En (0,1)

where R(P, Q) denotes the resultant of P and Q, see Appendix A.4. We will apply
Fubinis Theorem in both possible ways. On the one hand, we have

Kn (a, b, c) = |Q(0)|a1 |Q(1)|b1
Dn+1 (0,1)
 
|R(P, Q)|
c1
dn (P) dn+1 (Q)
{PDn (0,1)|(P,Q)En }
(c)n+1
= Sn+1 (a, b, c) ,
((n + 1)c)

via Lemma 2.5.15. On the other hand, writing P = t(t 1)P, we have
 
Kn (a, b, c) =
Dn (0,1) {QDn+1 |(Q,P)En+2 }

|Q(0)|a1 |Q(1)|b1 |R(P, Q)|c1 dn+1 (Q) dn (P)

(a)(b)(c)n
= |P (0)|a1/2 |P (1)|b1/2 |R(P, P )|c1/2 dn (P)
Dn (0,1) (a + b + nc)
(a)(b)(c)n
= Sn (a + c, b + c, c) ,
(a + b + nc)
by another application of Lemma 2.5.15. This proves (2.5.9) by induction on n;
the induction base n = 1 is an instance of (2.5.15).

Exercise 2.5.16 Provide an alternative proof of Lemma 2.5.11 by noting that the
determinant in the left side of (2.5.16) is a polynomial of degree n(n 1)/2 that
vanishes whenever xi = x j for some i = j, and thus, must equal a constant multiple
of (x).
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 65

2.5.4 Joint distribution of eigenvalues: alternative formulation

It is sometimes useful to represent the formulas for the joint distribution of eigen-
values as integration formulas for functions that depend only on the eigenvalues.
We develop this correspondence now.
( )
Let f : HN [0, ] be a Borel function such that f (H) depends only on the
sequence of eigenvalues 1 (H) N (H). In this situation, for short, we say
that f (H) depends only on the eigenvalues of H. (Note that the definition implies
( )
that f is a symmetric function of the eigenvalues of H.) Let X HN be random
( )
with law PN . Assuming the validity of Theorem 2.5.2, we have
  N xi2 /4 dx
f (x1 , . . . , xN )|(x)| i=1 e i
E f (X) =   , (2.5.22)
2 /4
|(x)| i=1 e
N x i dxi

where f (x1 , . . . , xN ) denotes the value of f at the diagonal matrix with diago-
nal entries x1 , . . . , xN . Conversely, assuming (2.5.22), we immediately verify that
(2.5.3) is proportional to the joint density of the eigenvalues 1 (X), . . . , N (X) by
taking f (H) = 1(1 (H),...,N (H))A where A RN is any Borel set. In turn, to prove
(2.5.22), it suffices to prove the general integration formula
   N
( ) ( )
f (H)N (dH) = CN f (x1 , . . . , xN )|(x)| dxi , (2.5.23)
j=1

where


1 N (1/2)k



N! k=1 (k/2)
if = 1 ,
( )
CN =



1 N k1

(k 1)!
N! k=1
if = 2 ,

and as in (2.5.22), the integrand f (H) is nonnegative, Borel measurable, and de-
pends only on the eigenvalues of H. Moreover, assuming the validity of (2.5.23),
it follows by taking f (H) = exp(atr(H 2 )/2) with a > 0 and using Gaussian
integration that
  N
1
|(x)| eaxi /2 dxi
2

N! i=1
N
( j /2) 1
= (2 )N/2 a N(N1)/4N/2 =: . (2.5.24)
j=1 ( /2) ( )
N!CN
Thus, Theorem 2.5.2 is equivalent to the integration formula (2.5.23).
66 2. W IGNER MATRICES

2.5.5 Superposition and decimation relations

The goal of this short subsection is to show how the eigenvalues of the GUE can be
coupled (that is, constructed on the same probability space) with the eigenvalues
of the GOE. As a by-product, we also discuss the eigenvalues of the GSE. Besides
the obvious probabilistic interest in such a construction, the coupling will actually
save us some work in the analysis of limit distributions for the maximal eigenvalue
of the GOE and the GSE.
To state our results, we introduce some notation. For a finite subset A R with
|A| = n, we define Ord(A) to be the vector in Rn whose entries are the elements of
A, ordered, that is

Ord(A) = (x1 , . . . , xn ) with xi A and x1 x2 . . . xn .

For a vector x = (x1 , . . . , xn ) Rn , we define Dec(x) as the even-location deci-


mated version of x, that is

Dec(x) = (x2 , x4 , . . . , x2n/2 ) .

Note that if x is ordered, then Dec(x) erases from x the smallest entry, the third
smallest entry, etc.
The main result of this section is the following.

Theorem 2.5.17 For N > 0 integer, let AN and BN+1 denote the (collection of)
eigenvalues of two independent random matrices distributed according to GOE(N)
and GOE(N+1), respectively. Set

(1N , . . . , NN ) = N = Dec(Ord(AN BN+1 )) , (2.5.25)

and
(1N , . . . , NN ) = N = Dec(Ord(A2N+1 )) . (2.5.26)

Then, { N } (resp., { N }) is distributed as the eigenvalues of GUE(N) (resp.,


GSE(N)).

The proof of Theorem 2.5.17 goes through an integration relation that is slightly
more general than our immediate needs. To state it, let L = (a, b) R be a
nonempty open interval, perhaps unbounded, and let f and g be positive real-
valued infinitely differentiable functions defined on L. We will use the following
assumption on the triple (L, f , g).

Assumption 2.5.18 For (L, f , g) as above, for each integer k 0, write fk (x) =
xk f (x) and gk (x) = xk g(x) for x L. Then the following hold.
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 67

(I) There exists a matrix M (n) Matn+1 (R), independent of x, such that
det M (n) > 0 and

M (n) ( f0 , f1 , . . . , fn )T = (g 0 , g 1 , . . . , g n1 , f0 )T .

(II) ab | fn (x)|dx < .
(III) limxa gn (x) = 0 and limxb gn (x) = 0.

For a vector xn = (x1 , . . . , xn ), recall that (xn ) = 1i< jn (x j xi ) is the


Vandermonde determinant associated with xn , noting that if xn is ordered then
(xn ) 0. For an ordered vector xn and an ordered collection of indices I = {i1 <
i2 < . . . < i|I| } {1, . . . , n}, we write xI = (xi1 , xi2 , . . . , xi|I| ). The key to the proof
of Theorem 2.5.17 is the following proposition.

Proposition 2.5.19 Let Assumption 2.5.18 hold for a triple (L, f , g) with L =
(a, b). For x2n+1 = (x1 , . . . , x2n+1 ), set
(e) (o)
xn = Dec(x2n+1 ) = (x2 , x4 , . . . , x2n ) , and xn+1 = (x1 , x3 , . . . , x2n+1 ) .

Let

J2n+1 = {(I, J) : I, J {1, . . . , 2n + 1}, |I| = n, |J| = n + 1, I J = 0}


/ .
(e)
Then for each positive integer n and xn Ln , we have the integration identities
  
  
x2 x4 b 2n+1

a x2

x2n
(xI )(xJ ) f (xi ) dx2n+1 dx3 dx1
(I,J)J2n+1 i=1
   
(e) 2  b
2n (xn ) a f (x)dx (ni=1 f (x2i )) (ni=1 g(x2i ))
= , (2.5.27)
det M (n)
and
 x2  x4  b
 
2n+1

a x2

x2n
(x2n+1 ) f (xi ) dx2n+1 dx3 dx1
i=1
  4
b (e)
a f (x)dx (xn ) (ni=1 g(x2i ))2
= . (2.5.28)
det M (2n)
Assumption 2.5.18(II) guarantees the finiteness of the integrals in the proposition.
The value of the positive constant det M (n) will be of no interest in applications.
The proof of Proposition 2.5.19 will take up most of this section, after we com-
plete the
Proof of Theorem 2.5.17 We first check that Assumption 2.5.18 with L = (, ),
68 2. W IGNER MATRICES

f (x) = g(x) = ex /4 holds, that is we verify that a matrix M (n) as defined there
2

exists. Define M (n) as the solution to

M (n) ( f0 , f1 , . . . , fn )T = ( f0 , f0 , f1 , . . . , fn1

)T .

Because fi is a polynomial of degree i + 1 multiplied by ex /4 , with leading


2

(n)
coefficient equal 1/2, we have that M (n) is a lower triangular matrix, with M1,1 =
(n)
1/2 for i > 1 and M1,1 = 1, and thus det M (n) = (1/2)n . Since M (n) is obtained
from M (n) by a cyclic permutation (of length n+1, and hence sign equal to (1)n ),
we conclude that det M (n) = (1/2)n > 0, as needed.
To see the statement of Theorem 2.5.17 concerning the GUE, one applies equa-
tion (2.5.27) of Proposition 2.5.19 with the above choices of (L, f , g) and M (n) ,
together with Theorem 2.5.2. The statement concerning the GSE follows with the
same choice of (L, f , g), this time using (2.5.28).

In preparation for the proof of Proposition 2.5.19, we need three lemmas. Only
the first uses Assumption 2.5.18 in its proof. To compress notation, write

A11 . . . A1N
.. .
[Ai j ]n,N = ... .
An1 ... AnN

Lemma 2.5.20 For positive integers n and N, we have


 
M (n) xxj1
j
fi1 (x)dx [1i j ]N+1,N+1
n+1,N+1

gi1 (x j ) if i < n + 1 and j < N + 1,
= 0 if i < n + 1 and j = N + 1, (2.5.29)
 xj
a f 0 (x)dx if i = n + 1 n+1,N+1

for all a = x0 < x1 < < xN < xN+1 = b.

The left side of (2.5.29) is well-defined by Assumptions 2.5.18(I,II).


Proof Let hi = g i for i = 0, . . . , n 1 and put hn = f0 . The left side of (2.5.29)

equals [ ax j hi1 (x)dx]n+1,N+1 and this in turn equals the right side of (2.5.29) by
Assumption 2.5.18(III).

Lemma 2.5.21 For every positive integer n and x Ln , we have


 2 " #
n
gi1 (x( j+1)/2 ) if j is odd
((x)) g(xi ) = det
4
. (2.5.30)
i=1 g i1 (x j/2 ) if j is even 2n,2n
2.5 J OINT DISTRIBUTIONS IN THE GOE AND THE GUE 69

The case g = 1 is the classical confluent alternant identity.


Proof Write y2n = (y1 , . . . , y2n ). Set
2n
G(y2n ) = det([gi1 (y j )]2n,2n ) = (y) g(yi ) . (2.5.31)
i=1

Dividing G(y2n ) by ni=1 (y2i y2i1 ) and substituting y2i1 = y2i = xi for i =
1, . . . , n give the left side of (2.5.30). On the other hand, let u j denote the jth col-
umn of [gi1 (y j )]2n,2n . (Thus, G(y2n ) = det[u1 , . . . , u2n ].) Since it is a determinant,
G(y2n ) = det[u1 , u2 u1 , u3 , u4 u3 , . . . , u2n1 , u2n u2n1 ] and thus
" #
G(y2n ) u2 u1 u2n u2n1
= det u1 , , . . . , u2n1 , .
ni=1 (y2i y2i1 ) y2 y1 y2n y2n1
Applying LHopitals rule thus shows that the last expression evaluated at y2i1 =
y2i = xi for i = 1, . . . , n equals the right side of (2.5.30).

Lemma 2.5.22 For every positive integer n and x2n+1 = (x1 , . . . , x2n+1 ) we have
an identity

(o) (e)
2n (xn+1 )(xn ) = (xI )(xJ ) . (2.5.32)
(I,J)J2n+1

Proof Given I = {i1 < < ir } {1, . . . , 2n + 1}, we write I = (xI ). Given
a polynomial P = P(x1 , . . . , x2n+1 ) and a permutation S2n+1 , let P be defined
by the rule
( P)(x1 , . . . , x2n+1 ) = P(x (1) , . . . , x (2n+1) ) .
Given a permutation S2n+1 , let I = { (i) | i I}. Now let I J be a term
appearing on the right side of (2.5.32) and let = (i j) S2n+1 be a transposition.
We claim that

(I J ) 1 if {i, j} I or {i, j} J,
= (2.5.33)
I J (1)|i j|+1 otherwise.
To prove (2.5.33), since the cases {i, j} I and {i, j} J are trivial, and we may
allow i and j to exchange roles, we may assume without loss of generality that
i I and j J. Let k (resp., ) be the number of indices in the set I (resp., J)
strictly between i and j. Then
k +  = |i j| 1, I / I = (1)k , J / J = (1) ,
which proves (2.5.33). It follows that if i and j have the same parity, the effect of
applying to the right side of (2.5.32) is to multiply by 1, and therefore (xi x j )
divides the right side. On the other hand, the left side of (2.5.32) equals 2n times
the product of (xi x j ) with i < j of the same parity. Therefore, because the
70 2. W IGNER MATRICES

polynomial functions on both sides of (2.5.32) are homogeneous of the same total
degree in the variables x1 , . . . , x2n+1 , the left side equals the right side times some
constant factor. Finally, the constant factor has to be 1 because the monomial
n+1 i1 n i1 n
i=1 x2i1 i=1 x2i appears with coefficient 2 on both sides.

We can now provide the

Proof of Proposition 2.5.19 Let x0 = a and x2n+2 = b. To prove (2.5.27), use


(2.5.32) to rewrite the left side multiplied by det M (n) as
  x   n
f (x2i ) ,
(e)
2n (xn ) det M (n) x22j2j
fi1 (x)dx
n+1.n+1 i=1

and then evaluate using (2.5.29) and the second equality in (2.5.31). To prove
(2.5.28), rewrite the left side multiplied by det M (2n) as
 "  x j+1 # 
(2n) x j1 f i1 (x)dx if j is odd
det M ,
fi1 (x j ) if j is even 2n+1,2n+1

and then evaluate using (2.5.29) and (2.5.30).


Exercise 2.5.23 Let , > 1 be real constants. Show that each of the following
triples (L, f , g) satisfies Assumption 2.5.18:
(a) L = (0, ), f (x) = x ex , g(x) = x +1 ex (the Laguerre ensembles);
(b) L = (0, 1), f (x) = x (1 x) , g(x) = x +1 (1 x) +1 (the Jacobi ensembles).

2.6 Large deviations for random matrices

In this section, we consider N random variables (1 , , N ) with law


N
N 1 N i=1 V (i )
d i ,
N
PV,N (d 1 , , d N ) = (ZV, ) |( )| e (2.6.1)
i=1

for a > 0 and a continuous function V : RR such that, for some > 1 satis-
fying ,
V (x)
lim inf > 1. (2.6.2)
|x| log |x|

Here, ( ) = 1i< jN (i j ) and


  N
|( )| eN i=1 V (i ) d i .
N
N
ZV, = (2.6.3)
R R i=1

When V (x) = x2 /4, and = 1, 2, we saw in Section 2.5 that PNx2 /4, is the law
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 71

of the (rescaled) eigenvalues of a GOE(N) matrix when = 1, and of a GUE(N)


matrix when = 2. It also follows from the general results in Section 4.1 that the
case = 4 corresponds to another matrix ensemble, namely the GSE(N). In view
of these and applications to certain problems in physics, we consider in this section
the slightly more general model. We emphasize, however, that the distribution
(2.6.1) precludes us from considering random matrices with independent non-
Gaussian entries.
We have proved earlier in this chapter (for the GOE, see Section 2.1, and for the
GUE, see Section 2.2) that the empirical measure LN = N 1 Ni=1 i converges in
probability (and almost surely, under appropriate moment assumptions), and we
studied its fluctuations around its mean. We have also considered the convergence
of the top eigenvalue NN . Such results did not depend much on the Gaussian
nature of the entries.
We address here a different type of question. Namely, we study the probability
that LN , or NN , take a very unlikely value. This was already considered in our
discussion of concentration inequalities, see Section 2.3, where the emphasis was
put on obtaining upper bounds on the probability of deviation. In contrast, the
purpose of the analysis here is to exhibit a precise estimate of these probabilities,
or at least of their logarithmic asymptotics. The appropriate tool for handling
such questions is large deviation theory, and we give in Appendix D a concise
introduction to that theory and related definitions, together with related references.

2.6.1 Large deviations for the empirical measure

We endow M1 (R) with the usual weak topology, compatible with the Lipschitz
bounded metric, see (C.1). Our goal is to estimate the probability PV,N (LN A),
for measurable sets A M1 (R). Of particular interest is the case where A does
not contain the limiting distribution of LN .
Define the noncommutative entropy : M1 (R) [, ) as
  
log |x y|d (x)d (y) if log(|x| + 1)d (x) < ,
( ) = (2.6.4)
otherwise ,

and the function IV : M1 (R) [0, ] as


  
V (x)d (x) 2 ( ) cV if V (x)d (x) < ,
IV ( ) = (2.6.5)
otherwise ,
72 2. W IGNER MATRICES

where


cV = inf { V (x)d (x) ( )} (, ) . (2.6.6)
M1 (R) 2
(Lemma 2.6.2 below and its proof show that both and IV are well defined, and
that cV is finite.)

Theorem 2.6.1 Let LN = N 1 Ni=1 N where the random variables {iN }Ni=1 are
i
distributed according to the law PV,N of (2.6.1), with potential V satisfying (2.6.2).
Then, the family of random measures LN satisfies, in M1 (R) equipped with the
weak topology, a large deviation principle with speed N 2 and good rate function
IV . That is,

(a) IV : M1 (R) [0, ] possesses compact level sets


{ : IV ( ) M} for all M R+ ,
(b) for any open set O M1 (R) ,
1
lim inf 2 log PN,V (LN O) inf IV , (2.6.7)
N N O
(c) for any closed set F M1 (R) ,
1
lim sup 2 log PN,V (LN F) inf IV . (2.6.8)
N N F

The proof of Theorem 2.6.1 relies on the properties of the function IV collected in
Lemma 2.6.2 below. Define the logarithmic capacity of a measurable set A R
as  -
 
1
(A) := exp inf log d (x)d (y) .
M1 (A) |x y|

Lemma 2.6.2
(a) cV (, ) and IV is well defined on M1 (R), taking its values in [0, +].
(b) IV ( ) is infinite as soon as satisfies one of the following conditions

(b.1) V (x)d (x) = +.
(b.2) There exists a set A R of positive mass but null logarithmic capacity,
i.e. a set A such that (A) > 0 but (A) = 0.
(c) IV is a good rate function.
(d) IV is a strictly convex function on M1 (R).
(e) IV achieves its minimum value at unique V M1 (R). The measure V is
compactly supported, and is characterized by the equality
V (x) V , log | x| = CV , for V -almost every x, (2.6.9)
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 73

and inequality

V (x) V , log | x| > CV , for all x supp(V ), (2.6.10)

for some constant CV . Necessarily, CV = 2cV V ,V .

As an immediate corollary of Theorem 2.6.1 and of part (e) of Lemma 2.6.2 we


have the following.

Corollary 2.6.3 Under PV,N , LN converges almost surely towards V .

Proof of Lemma 2.6.2 For all M1 (R), ( ) is well defined and < due to
the bound
log |x y| log(|x| + 1) + log(|y| + 1) . (2.6.11)

Further, cV < as can be checked by taking as the uniform law on [0, 1].
Set
1 1
f (x, y) = V (x) + V (y) log |x y| . (2.6.12)
2 2 2
Note that (2.6.2) implies that f (x, y) goes to + when x, y do since (2.6.11) yields
1 1
f (x, y) (V (x) log(|x| + 1)) + (V (y) log(|y| + 1)) . (2.6.13)
2 2
Further, f (x, y) goes to + when x, y approach the diagonal {x = y}. Therefore,
for all L > 0, there exists a constant K(L) (going to infinity with L) such that, with
BL := {(x, y) : |x y| < L1 } {(x, y) : |x| > L} {(x, y) : |y| > L},

BL {(x, y) : f (x, y) K(L)} . (2.6.14)

Since f is continuous on the compact set BcL , we conclude that f is bounded below
on R2 , and denote by b f > a lower bound. It follows that cV b f > . Thus,
because V is bounded below by (2.6.2), we conclude that IV is well defined and
takes its values in [0, ], completing the proof of part (a). Further, since for any
measurable subset A R,

IV ( ) = ( f (x, y) b f )d (x)d (y) + b f cV
 
( f (x, y) b f )d (x)d (y) + b f cV
A A
 

log |x y|1 d (x)d (y) + inf V (x) (A)2 |b f | cV
2 A A xR

(A)2 log( (A)) |b f | cV + inf V (x) (A)2 ,
2 xR
74 2. W IGNER MATRICES

one concludes that if IV ( ) < , and A is a measurable set with (A) > 0, then
(A) > 0. This completes the proof of part (b).

We now show that IV is a good rate function, and first that its level sets {IV

M} are closed, that is that IV is lower semicontinuous. Indeed, by the monotone
convergence theorem,

IV ( ) = f (x, y)d (x)d (y) cV

= sup ( f (x, y) M)d (x)d (y) cV .
M0

But f M = f M is bounded continuous and so, for M < ,



IV,M ( ) = ( f (x, y) M)d (x)d (y)

is bounded continuous on M1 (R). As a supremum of the continuous functions


IV,M , IV is lower semicontinuous.
To complete the proof that IV is a good rate function, we need to show that the
set {IV L} is compact. By Theorem C.9, to see the latter it is enough to show
that {IV L} is included in a compact subset of M1 (R) of the form
.
K = { M1 (R) : ([B, B]c ) (B)} ,
BN

with a sequence (B) going to zero as B goes to infinity. Arguing as in (2.6.14),


there exist constants K (L) going to infinity as L goes to infinity, such that

{(x, y) : |x| > L, |y| > L} {(x, y) : f (x, y) K (L)} . (2.6.15)

Therefore, for any large positive L,

(|x| > L)2 (|x| > L, |y| > L)


=
 
f (x, y) K (L)

1
( f (x, y) b f )d (x)d (y)
K (L) b f
1
= (IV ( ) + cV b f ) .
K (L) b f

/ 
Hence, taking (B) = [ (M + cV b f )+ / (K (B) b f )+ ] 1, which goes to
zero when B goes to infinity, one has that {IV M} K . This completes the
proof of part (c).
Since IV is a good rate function, it achieves its minimal value. Let V be
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 75

a minimizer. Let us derive some consequences of minimality. For any signed


measure (dx) = (x)V (dx) + (x)dx with two bounded measurable compactly
supported functions ( , ) such that 0 and (R) = 0, for > 0 small enough,
V + is a probability measure so that

IV (V + ) IV (V ) , (2.6.16)
which implies
   
V (x) log |x y|d V (y) d (x) 0 .

Taking = 0, we deduce (using ) that there is a constant CV such that



V (x) log |x y|d V (y) = CV , V a.s., (2.6.17)

which implies that V is compactly supported (because V (x) log |xy|d V (y)

goes to infinity when x does by (2.6.13)). Taking = (y)dy on the support
of V , we then find that

V (x) log |x y|d V (y) CV , (2.6.18)

Lebesgue almost surely, and then everywhere outside of the support of V by


continuity. Integrating (2.6.17) with respect to V then shows that

CV = 2cV V ,V  ,
proving (2.6.9) and (2.6.10), with the strict inequality in (2.6.10) following from
the uniqueness of V , since the later implies that the inequality (2.6.16) is strict
as soon as is nontrivial. Finally, integrating (2.6.9) with respect to V reveals
that the latter must be a minimizer of IV , so that (2.6.9) characterizes V .
The claimed uniqueness of V , and hence the completion of the proof of part
(e), will follow from the strict convexity claim (part (d) of the lemma), which we
turn to next. Note first that, extending the definition of to signed measures in
evident fashion when the integral in (2.6.4) is well defined, we can rewrite IV as
   

IV ( ) = ( V ) + V (x) log |x y|d V (y) CV d (x) .
2
The fact that IV is strictly convex will follow as soon as we show that is strictly
concave. Toward this end, note the formula
  
1 1 |x y|2
log |x y| = exp{ } exp{ } dt , (2.6.19)
0 2t 2t 2t
76 2. W IGNER MATRICES

which follows from the equality



1 1
= eu/2 du
z 2z 0

by the change of variables u  z2 /t and integration of z from 1 to |x y|. Now,


(2.6.19) implies that for any M1 (R),
  
1 |x y|2
( V ) = exp{ }d( V )(x)d( V )(y) dt .
0 2t 2t
(2.6.20)
Indeed, one may apply Fubinis Theorem when , are supported in [ 12 , 12 ]
V

since then V (exp{ 2t1 }exp{ |xy|


2
2t } 0) = 1. One then deduces (2.6.20)
for any compactly supported probability measure by scaling and finally for all
probability measures by approximations. The fact that, for all t 0,

|x y|2
exp{ }d( V )(x)d( V )(y)
2t
  2
t +  t 2
= exp{i x}d( )(x) exp{
V
}d ,
2 2
 2

therefore entails that is concave since exp{i x}d( V )(x) is convex
for all R. Strict convexity comes from the fact that

( + (1 ) ) ( ( ) + (1 )( )) = ( 2 )( ) ,

which vanishes for (0, 1) if and only if ( ) = 0. The latter equality


implies that all the Fourier transforms of vanish, and hence = . This
completes the proof of part (d) and hence of the lemma.

Proof of Theorem 2.6.1 With f as in (2.6.12),

2 
N
,V
PV,N (d 1 , , d N ) = (ZN )1 eN x =y f (x,y)dLN (x)dLN (y)
eV (i ) d i .
i=1

(No typo here: indeed, no N before V (i ).) Hence, if



f (x, y)d (x)d (y)
x =y

were a bounded continuous function, the proof would follow from a standard ap-
plication of Varadhans Lemma, Theorem D.8. The main point will therefore be
to overcome the singularities of this function, with the most delicate part being to
overcome the singularity of the logarithm.
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 77

Following Appendix D (see Corollary D.6 and Definition D.3), a full large devi-
ation principle can be proved by proving that exponential tightness holds, as well
as estimating the probability of small balls. We follow these steps below.

Exponential tightness

Observe that, by Jensens inequality, for some constant C,



,V
log ZN N log eV (x) dx
   N V ( )
i d
e
f (x, y)dLN (x)dLN (y)  V (x) CN 2 .
i
N 2
x =y i=1 e dx
Moreover, by (2.6.13) and (2.6.2), there exist constants a > 0 and c > so that
f (x, y) a|V (x)| + a|V (y)| + c ,
from which one concludes that for all M 0,
   N
|V (x)|dLN M e2aN M+(Cc)N eV (x) dx
2 2
PV,N . (2.6.21)

Since V goes to infinity at infinity, KM = { M1 (R) : |V |d M} is a com-
pact set for all M < , so that we have proved that the law of LN under PV,N is
exponentially tight.

A large deviation upper bound

Recall that d denotes the Lipschitz bounded metric, see (C.1). We prove here that
,V
for any M1 (R), if we set PV,N = ZN PV,N ,

1
lim lim sup log PV,N (d(LN , ) ) f (x, y)d (x)d (y) . (2.6.22)
0 N N2
(We will prove the full LDP for PV,N as a consequence of both the upper and lower
bounds on PV,N , see (2.6.28) below.) For any M 0, set fM (x, y) = f (x, y) M.
Then the bound

2
N
PV,N (d(LN , ) )
d(LN , )
eN x =y f M (x,y)dLN (x)dLN (y)
eV (i ) d i
i=1

holds. Since under the product Lebesgue measure, the i s are almost surely dis-
tinct, it holds that LN LN (x = y) = N 1 , PV,N almost surely. Thus we deduce
that
 
fM (x, y)dLN (x)dLN (y) = fM (x, y)dLN (x)dLN (y) + MN 1 ,
x =y
78 2. W IGNER MATRICES

and so

PV,N (d(LN , ) )


2
N
eMN
d(LN , )
eN fM (x,y)dLN (x)dLN (y)
eV (i ) d i .
i=1

Since fM is bounded and continuous, :  IV,M fM (x, y)d (x)d (y) is a con-
tinuous functional, and therefore we deduce that
1
lim lim sup log PV,N (d(LN , ) ) IV,M ( ) .
0 N N2
We finally let M go to infinity and conclude by the monotone convergence theo-
rem. Note that the same argument shows that

1 ,V
lim sup log ZN inf f (x, y)d (x)d (y) . (2.6.23)
N N2 M1 (R)

A large deviation lower bound

We prove here that for any M1 (R),



1
lim lim inf log PV,N (d(LN , ) ) f (x, y)d (x)d (y) . (2.6.24)
0 N N2
Note that we can assume without loss of generality that IV ( ) < , since other-
wise the bound is trivial, and so, in particular, we may and will assume that has
no atoms. We can also assume that is compactly supported since if we con-
sider M = ([M, M])1 1|x|M d (x), clearly M converges towards and by
the monotone convergence theorem, one checks that, since f is bounded below,
 
lim f (x, y)d M (x)d M (y) = f (x, y)d (x)d (y) ,
M

which ensures that it is enough to prove the lower bound for (M , M R+ , IV ( ) <
), and so for compactly supported probability measures with finite entropy.
The idea is to localize the eigenvalues (i )1iN in small sets and to take ad-
vantage of the fast speed N 2 of the large deviations to neglect the small volume of
these sets. To do so, we first remark that, for any M1 (R) with no atoms, if we
set
 -
1
x1,N = inf x : ((, x]) ,
N +1
 -
  1
xi+1,N = inf x xi,N : (xi,N , x] , 1 i N 1,
N +1
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 79

for any real number , there exists a positive integer N( ) such that, for any N
larger than N( ),
 
1 N
d , xi,N < .
N i=1

In particular, for N N( 2 ),
 -

(i )1iN | |i x | < i [1, N] {(i )1iN | d(LN , ) < } ,
i,N
2
so that we have the lower bound

PV,N (d(LN , ) )

2
N
0 i,N |<
eN x =y f (x,y)dLN (x)dLN (y)
eV (i ) d i
i {|i x 2 } i=1
 N
|xi,N x j,N + i j | eN i=1 V (x d i
N i,N +
= i)
0
i {|i |< }
2 i< j i=1
 

|xi,N x j,N | |xi,N xi+1,N | 2 eN i=1 V (x
N i,N )

i+1< j i
 
N
|i i+1 | i=1 [V (x +i )V (x )]
N N
d i
i,N i,N
0 2 e
i {|i |< 2 }
i <i+1 i i=1
=: PN,1 PN,2 , (2.6.25)

where we used the fact that |xi,N x j,N + i j | |xi,N x j,N | |i j | when
i j and xi,N x j,N . To estimate PN,2 , note that since we assumed that is
compactly supported, the (xi,N , 1 i N)NN are uniformly bounded and so, by
continuity of V ,

lim sup sup sup |V (xi,N + x) V (xi,N )| = 0 .


N NN 1iN |x|

Moreover, writing u1 = 1 , ui+1 = i+1 i ,


   N( +1)
N N N
2

|i |< 2 i |i i+1 | d i 2

0<ui < 2N
ui dui
2
( + 2)N
.
i <i1 i i=1 i=2 i=1

Therefore,
1
lim lim inf log PN,2 0 . (2.6.26)
0 N N2
To handle the term PN,1 , the uniform boundedness of the xi,N s and the convergence
80 2. W IGNER MATRICES

of their empirical measure towards imply that



1 N
lim
N N
V (xi,N ) = V (x)d (x). (2.6.27)
i=1

Finally since x log(x) increases on R+ , we notice that



log(y x)d (x)d (y)
x1,N x<yxN,N

log(x j+1,N xi,N ) x[xi,N ,xi+1,N ]
1x<y d (x)d (y)
1i jN1 y[x j,N ,x j+1,N ]
N1
1 1
=
(N + 1)2 i< j
log |x i,N
x j+1,N
| + log |xi+1,N xi,N | .
2(N + 1)2 i=1

Since log |x y| is upper-bounded when x, y are in the support of the compactly


supported measure , the monotone convergence theorem implies that the left side
in the last display converges towards 12 ( ). Thus, with (2.6.27), we have proved
 
1
lim inf log PN,1 log(y x)d (x)d (y) V (x)d (x) ,
N N2 x<y

which concludes, with (2.6.25) and (2.6.26), the proof of (2.6.24).

Conclusion of the proof of Theorem 2.6.1

By (2.6.24), for all M1 (R),


1 1
lim inf log ZN,V lim lim inf log PV,N (d(LN , ) )
N N2 0 N

N2
f (x, y)d (x)d (y) ,

and so, optimizing with respect to M1 (R) and with (2.6.23),



1
lim log ZN,V = inf { f (x, y)d (x)d (y)} = cV .
N N 2 M1 (R)

Thus, (2.6.24) and (2.6.22) imply the weak large deviation principle, i.e. that for
all M1 (R),
1
lim lim inf log PV,N (d(LN , ) )
0 N N2
1
= lim lim sup 2 log PV,N (d(LN , ) ) = IV ( ) . (2.6.28)
0 N N

This, together with the exponential tightness property proved above, completes
the proof of Theorem 2.6.1.

2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 81

Exercise 2.6.4 [Proof #5 of Wigners Theorem] Take V (x) = x2 /4 and apply


Corollary 2.6.3 together with Lemma 2.6.2 to provide a proof of Wigners Theo-
rem 2.1.1 in the case of GOE or GUE matrices.
Hint: It is enough to check (2.6.9) and (2.6.10), that is to check that

x2 1
log |x y| (dy) ,
4 2
with equality for x [2, 2], where is the semicircle law. Toward this end, use
the representation of the Stieltjes transform of , see (2.4.6).

2.6.2 Large deviations for the top eigenvalue

We consider next the large deviations for the maximum N = maxNi=1 i , of ran-
dom variables that possess the joint law (2.6.1). These will be obtained under the
following assumption.

N satisfy
Assumption 2.6.5 The normalization constants ZV,

N1
1 ZNV /(N1),
lim log N
= V, . (2.6.29)
N N ZV,

It is immediate from (2.5.11) that if V (x) = x2 /4 then Assumption 2.6.5 holds,


with V, = /2.
Assumption 2.6.5 is crucial in deriving the following LDP.

Theorem 2.6.6 Let (1N , . . . , NN ) be distributed according to the joint law PV,N of
(2.6.1), with continuous potential V that satisfies (2.6.2) and Assumption 2.6.5.
Let V be the minimizing measure of Lemma 2.6.2, and set x = max{x : x
supp V }. Then, N = maxNi=1 iN satisfies the LDP in R with speed N and good
rate function
 
log |x y|V (dy) V (x) V, if x x ,
JV (x) =
otherwise .

Proof of Theorem 2.6.6 Since JV () is continuous on (x , ) and JV (x) increases


to infinity as x , it is a good rate function. Therefore, the stated LDP follows
as soon as we show that
1
for any x < x , lim sup log PV,N (N x) = , (2.6.30)
N N
82 2. W IGNER MATRICES
1
for any x > x , lim sup log PV,N (N x) JV (x) (2.6.31)
N N
and
1
for any x > x , lim lim inf log PV,N (N (x , x + )) JV (x) . (2.6.32)
0 N N
The limit (2.6.30) follows immediately from the LDP (at speed N 2 ) for the empiri-
cal measure, Theorem 2.6.1; indeed, the event N x implies that LN ((x, x ]) = 0.
Hence, one can find a bounded continuous function f with support in (x, x ], inde-
pendent of N, such that LN , f  = 0 but V , f  > 0. Theorem 2.6.1 implies that
this event has probability that decays exponentially (at speed N 2 ), whence (2.6.30)
follows.
The following lemma, whose proof is deferred, will allow for a proper trunca-
tion of the top and bottom eigenvalues. (The reader interested only in the GOE or
GUE setups can note that Lemma 2.6.7 is then a consequence of Exercise 2.1.30.)

Lemma 2.6.7 Under the assumptions of Theorem 2.6.6, we have


N1
1 ZV,
lim sup log N < . (2.6.33)
N N ZV,

Further,
1
lim lim sup log PV,N (N > M) = (2.6.34)
M N N

and, with 1 = minNi=1 iN ,


1
lim lim sup log PV,N (1 < M) = . (2.6.35)
M N N

Equipped with Lemma 2.6.7, we may complete the proof of Theorem 2.6.6. We
begin with the upper bound (2.6.31). Note that for any M > x,

PV,N (N x) PV,N (N > M) + PV,N (N [x, M]) . (2.6.36)

By choosing M large enough and using (2.6.34), the first term in the right side of
NJV (x)
(2.6.36) can be made smaller than e , for all N large. In the sequel, we fix
an M such that the above is satisfied, the analogous bound with 1 also holds,
and further
"  # "  #
log |x y|V (dy) V (x) > sup log |z y|V (dy) V (z) .
z[M,)
(2.6.37)
2.6 L ARGE DEVIATIONS FOR RANDOM MATRICES 83

Set, for z [M, M] and supported on [M, M],



(z, ) = log |z y| (dy) V (z) log(2M) +V =: M ,

where V = infxR V (x) < . Setting B( ) as the ball of radius around V ,


BM ( ) as those probability measures in B( ) with support in [M, M], and writing
N1
ZNV /(N1),
N = N
, IM = [M, M]N1 ,
ZV,
we get
PV,N (N [x, M])
PV,N (1 < M)
 M 
+N N d N e(N1)(N ,LN1 ) PNV
N1
/(N1), (d 1 , . . . , d N1 )
x IM
 M
PV,N (1 < M) + N N e(N1) sup BM ( ) (z, ) dz
x

+(M x)e(N1)M PNV
N1
/(N1), (LN1
B( )) . (2.6.38)

(The choice of metric in the definition of B( ) plays no role in our argument,


as long as it is compatible with weak convergence.) Noting that the perturbation
involving the multiplication of V by N/(N 1) introduces only an exponential in
N factor, see (2.6.33), we get from the LDP for the empirical measure, Theorem
2.6.1, that
1
/(N1), (LN1 B( )) < 0 ,
N1
lim sup 2 log PNV
N N

and hence, for any fixed > 0,


1
/(N1), (LN1 B( )) = .
N1
lim sup log PNV (2.6.39)
N N
We conclude from (2.6.38) and (2.6.39) that
1 N 1
lim sup P ( [x, M]) lim sup log N + lim sup (z, )
N N V, N N N 0 z[x,M], BM ( )

= V, + lim sup (z, ) . (2.6.40)


0 z[x,M], BM ( )

Since (z, ) = inf >0 [ log(|z y| ) (dy) V (z), it holds that (z, ) 
(z, ) is upper semicontinuous on [M, M] M1 ([M, M]). Therefore, using
(2.6.37) in the last equality,
lim sup (z, ) = sup (z, V ) = sup (z, V ) .
0 z[x,M], BM ( ) z[x,M] z[x,)
84 2. W IGNER MATRICES

Combining the last equality with (2.6.40) and (2.6.36), we obtain (2.6.31).
We finally prove the lower bound (2.6.32). Let 2 < x x and fix r (x , x
2 ). Then, with Ir = (M, r)N1 ,
PV,N (N (x , x + ))
PV,N (N (x , x + ), i (M, r), i = 1, . . . , N 1) (2.6.41)
 x+ 
d N e(N1)(N ,LN1 ) PNV
N
= N N1
/(N1), (d 1 , . . . , d N1 )
xIr
 
2 N exp (N 1) inf (z, ) PNV /(N1), (LN1 Br,M ( )) ,
N1
z(x ,x+ )
Br,M ( )

where Br,M ( ) denotes those measures in B( ) with support in [M, r]. Recall
from the upper bound (2.6.31) together with (2.6.35) that

/(N1), (i (M, r) for some i {1, . . . , N 1}) = 0 .


N1
lim sup PNV
N
Combined with (2.6.39) and the strict inequality in (2.6.10) of Lemma 2.6.2, we
get by substituting in (2.6.41) that
1
lim lim inf log PV,N (N (x , x + )) V, + lim inf (z, )
0 N N 0 z(x ,x+ )
Br,M ( )

= V, + (x, V ) ,
where in the last step we used the continuity of (z, )  (z, ) on [x , x +
] M1 ([M, r]). The bound (2.6.32) follows.

Proof of Lemma 2.6.7 We first prove (2.6.33). Note that, for any > 0 and all N
large,
N1 N1 N1 N1
ZV, ZV,
ZNV /(N1), ZV,
N
= N1
N
N1
eN(V, + ) , (2.6.42)
ZV, ZNV /(N1),
ZV, ZNV /(N1),

by (2.6.29). On the other hand,


N1
ZV, 

N1
= eNLN1 ,V  dPN1,NV /(N1) . (2.6.43)
ZNV /(N1),

By the LDP for LN1 (at speed N 2 , see Theorem 2.6.1), Lemma 2.6.2 and (2.6.21),
N(V ,V + )
the last integral is bounded above by e . Substituting this in (2.6.43) and
(2.6.42) yields (2.6.33).
For |x| > M, M large and i R, for some constants a , b ,

|x i | eV (i ) a (|x| + |i | )eV (i ) b |x| b eV (x) .


2.7 B IBLIOGRAPHICAL NOTES 85

Therefore,
N1   N1  
ZV,
PV,N (N > M) N N
ZV, M
eNV (N ) d N
RN1 i=1
|x i | eV (i ) dPV,N1


N1 
ZV,
NbN1
eNV (M)/2 N
eV (N ) d N ,
ZV, M

implying, together with (2.6.33), that


1
lim lim sup log PV,N (N > M) = .
M N N
This proves (2.6.34). The proof of (2.6.35) is similar.

2.7 Bibliographical notes

Wigners Theorem was presented in [Wig55], and proved there using the method
of moments developed in Section 2.1. Since then, this result has been extended in
many directions. In particular, under appropriate moment conditions, an almost
sure version holds, see [Arn67] for an early result in that direction. Relaxation
of moment conditions, requiring only the existence of third moments of the vari-
ables, is described by Bai and co-workers, using a mixture of combinatorial, prob-
abilistic and complex-analytic techniques. For a review, we refer to [Bai99]. It is
important to note that one cannot hope to forgo the assumption of finiteness of sec-
ond moments, because without this assumption the empirical measure, properly
rescaled, converges toward a noncompactly supported measure, see [BeG08].
Regarding the proof of Wigners Theorem that we presented, there is a slight
ambiguity in the literature concerning the numbering of Catalan numbers. Thus,
[Aig79, p. 85] denotes by ck what we denote by Ck1 . Our notation follows
[Sta97]. Also, there does not seem to be a clear convention as to whether the
Dyck paths we introduced should be called Dyck paths of length 2k or of length
k. Our choice is consistent with our notion of length of Bernoulli walks. Finally,
we note that the first part of the proof of Lemma 2.1.3 is an application of the
reflection principle, see [Fel57, Ch. III.2].
The study of Wigner matrices is closely related to the study of Wishart ma-
trices, discussed in Exercises 2.1.18 and 2.4.8. The limit of the empirical mea-
sure of eigenvalues of Wishart matrices (and generalizations) can be found in
[MaP67], [Wac78] and [GrS77]. Another similar model is given by band ma-
trices, see [BoMP91]. In fact, both Wigner and Wishart matrices fall under the
86 2. W IGNER MATRICES

class of the general band matrices discussed in [Shl96], [Gui02] (for the Gaussian
case) and [AnZ05], [HaLN06].
Another promising combinatorial approach to the study of the spectrum of ran-
dom Wigner matrices, making a direct link with orthogonal polynomials, is pre-
sented in [Sod07].
The rate of convergence toward the semicircle distribution has received some
attention in the literature, see, e.g., [Bai93a], [Bai93b], [GoT03].
Lemma 2.1.19 first appears in [HoW53]. In the proof we mention that permu-
tation matrices form the extreme points of the set of doubly stochastic matrices,
a fact that is is usually attributed to G. Birkhoff. See [Chv83] for a proof and a
historical discussion which attributes this result to D. Konig. The argument we
present (that bypasses this characterization) was kindly communicated to us by
Hongjie Dong. The study of the distribution of the maximal eigenvalue of Wigner
matrices by combinatorial techniques was initiated by [Juh81], and extended by
[FuK81] (whose treatment we essentially follow; see also [Vu07] for recent im-
provements). See also [Gem80] for the analogous results for Wishart matrices.
The method was widely extended in the papers [SiS98a], [SiS98b], [Sos99] (with
symmetric distribution of the entries) and [PeS07] (in the general case), allow-
ing one to derive much finer behavior on the law of the largest eigenvalue, see
the discussion in Section 3.7. Some extensions of the FurediKomlos and Sinai
Soshnikov techniques can also be found in [Kho01]. Finally, conditions for the
almost sure convergence of the maximal eigenvalue of Wigner matrices appear in
[BaY88].
The study of maximal and minimal eigenvalues for Wishart matrices is of fun-
damental importance in statistics, where they are referred to as sample covari-
ance matrices, and has received a great deal of attention recently. See [SpT02],
[BeP05], [LiPRTJ05], [TaV09a], [Rud08], [RuV08] for a sample of recent devel-
opments.
The study of central limit theorems for traces of powers of random matrices
can be traced back to [Jon82], in the context of Wishart matrices (an even earlier
announcement appears in [Arh71], without proofs). Our presentation follows to a
large extent Jonssons method, which allows one to derive a CLT for polynomial
functions. A by-product of [SiS98a] is a CLT for tr f (XN ) for analytic f , under
a symmetry assumption on the moments. The paper [AnZ05] generalizes these
results, allowing for differentiable functions f and for nonconstant variance of the
independent entries. See also [AnZ08a] for a different version of Lemma 2.1.34.
For functions of the form f (x) = ai /(zi x) where zi C \ R, and matrices of
Wigner type, CLT statements can be found in [KhKP96], with somewhat sketchy
proofs. A complete treatment for f analytic in a domain including the support of
2.7 B IBLIOGRAPHICAL NOTES 87

the limit of the empirical distribution of eigenvalues is given in [BaY05] for ma-
trices of Wigner type, and in [BaS04] for matrices of Wishart type under a certain
restriction on fourth moments. Finally, an approach based on Fourier transforms
and interpolation was recently proposed in [PaL08].
Much more is known concerning the CLT for restricted classes of matrices:
[Joh98] uses an approach based on the explicit joint density of the eigenvalues,
see Section 2.5. (These results apply also to a class of matrices with dependent
entries.) For Gaussian matrices, an approach based on the stochastic calculus
introduced in Section 4.3 can be found in [Cab01] and [Gui02]. Recent extensions
and reinterpretation of this work, using the notion of second order freeness, can
be found in [MiS06] (see Chapter 5 for the notion of freeness and its relation to
random matrices).
The study of spectra of random matrices via the Stieltjes transform (resolvent
functions) was pioneered by Pastur co-workers, and greatly extended by Bai and
co-workers. See [MaP67] for an early reference, and [Pas73] for a survey of the
literature up to 1973. Our derivation is based on [KhKP96], [Bai99] and [SiB95].
We presented in Section 2.3 a very brief introduction to concentration inequali-
ties. This topic is picked up again in Section 4.4, to which we refer the reader for
a complete introduction to different concentration inequalities and their applica-
tion in RMT, and for full bibliographical notes. Good references for the logarith-
mic Sobolev inequalities used in Section 2.3 are [Led01] and [AnBC+ 00]. Our
treatment is based on [Led01] and [GuZ00]. Lemma 2.3.2 is taken from [BoL00,
Proposition 3.1]. We note in passing that, on R, a criterion for a measure to satisfy
the logarithmic Sobolev inequality was developed by Bobkov and Gotze [BoG99].
In particular, any probability measure on R possessing a bounded above and be-

low density with respect to the measures (dx) = Z 1 e|x| dx for 2, where
 |x|
Z= e dx, satisfies the LSI, see [Led01], [GuZ03, Property 4.6]. Finally,
in the Gaussian case, estimates on the expectation of the maximal eigenvalue (or
minimal and maximal singular values, in the case of Wishart matrices) can be ob-
tained from Slepians and Gordons inequalities, see [LiPRTJ05] and [DaS01]. In
particular, these estimates are useful when using, in the Gaussian case, (2.3.10)
with k = N.
The basic results on joint distribution of eigenvalues in the GOE and GUE pre-
sented in Section 2.5, as well as an extensive list of integral formulas similar to
(2.5.4) are given in [Meh91], [For05]. We took, however, a quite different ap-
proach to all these topics based on the elementary proof of the Selberg integral
formula [Sel44], see [AnAR99], given in [And91]. The proof of [And91] is based
on a similar proof [And90] of some trigonometric sum identities, and is also simi-
88 2. W IGNER MATRICES

lar in spirit to the proofs in [Gus90] of much more elaborate identities. For a recent
review of the importance of the Selberg integral, see [FoO08], where in particular
it is pointed out that Lemma 2.5.15 seems to have first appeared in [Dix05].
We follow [FoR01] in our treatment of superposition and decimation (The-
orem 2.5.17). We remark that triples (L, f , g) satisfying Assumption 2.5.18, and
hence the conclusions of Proposition 2.5.19, can be classified, see [FoR01], to
which we refer for other classical examples where superposition and decimation
relations hold. An early precursor of such relations can be traced to [MeD63].
Theorem 2.6.1 is stated in [BeG97, Theorem 5.2] under the additional assump-
tion that V does not grow faster than exponentially and proved there in detail when
V (x) = x2 . In [HiP00b], the same result is obtained when the topology over M1 (R)
is taken to be the weak topology with respect to polynomial test functions instead
of bounded continuous functions. Large deviations for the empirical measure of
random matrices with complex eigenvalues were considered in [BeZ98] (where
non self-adjoint matrices with independent Gaussian entries were studied) and in
[HiP00a] (where Haar unitary distributed matrices are considered). This strategy
can also be used when one is interested in discretized versions of the law PN,V
as they appear in the context of Young diagrams, see [GuM05]. The LDP for
the maximal eigenvalue described in Theorem 2.6.6 is based on [BeDG01]. We
mention in passing that other results discussed in this chapter have analogs for the
law PN,V . In particular, the CLT for linear statistics is discussed in [Joh98], and
concentration inequalities for V convex are a consequence of the results in Section
4.4.
Models of random matrices with various degrees of dependence between entries
have also be treated extensively in the literature. For a sample of existing results,
see [BodMKV96], [ScS05] and [AnZ08b]. Random Toeplitz, Hankel and Markov
matrices have been studied in [BrDJ06] and [HaM05].
Many of the results described in this chapter (except for Sections 2.3, 2.5 and
2.6) can also be found in the book [Gir90], a translation of a 1975 Russian edition,
albeit with somewhat sketchy and incomplete proofs.
We have restricted attention in this chapter to Hermitian matrices. A natural
question concerns the complex eigenvalues of a matrix XN where all are i.i.d. In
the Gaussian case, the joint distribution of the eigenvalues was derived by [Gin65].
The analog of the semicircle law is now the circular law: the empirical measure of
the (rescaled) eigenvalues converges to the circular law, i.e. the measure uniform
on the unit disc in the complex plane. This is stated in [Gir84], with a sketchy
proof. A full proof for the Gaussian case is provided in [Ede97], who also eval-
uated the probability that exactly k eigenvalues are real. Large deviations for the
2.7 B IBLIOGRAPHICAL NOTES 89

empirical measure in the Gaussian case are derived in [BeZ98]. For non-Gaussian
entries whose law possesses a density and finite moments of order at least 6, a
full proof, based on Girko ideas, appears in [Bai97]. The problem was recently
settled in full generality, see [TaV08a], [TaV08b], [GoT07]; the extra ingredients
in the proof are closely related to the study of the minimal singular value of XX
discussed above.
3
Hermite polynomials, spacings and limit
distributions for the Gaussian ensembles

In this chapter, we present the analysis of asymptotics for the joint eigenvalue dis-
tribution for the Gaussian ensembles: the GOE, GUE and GSE. As it turns out, the
analysis takes a particularly simple form for the GUE, because then the process of
eigenvalues is a determinantal process. (We postpone to Section 4.2 a discussion
of general determinantal processes, opting to present here all computations with
bare hands.) In keeping with our goal of making this chapter accessible with
minimal background, in most of this chapter we consider the GUE, and discuss
the other Gaussian ensembles in Section 3.9. Generalizations to other ensembles,
refinements and other extensions are discussed in Chapter 4 and in the biblio-
graphical notes.

3.1 Summary of main results: spacing distributions in the bulk and edge of
the spectrum for the Gaussian ensembles

We recall that the N eigenvalues ofthe GUE/GOE/GSE are spread out on an in-
terval of width roughly equal to 4 N, and hence the spacing between adjacent
eigenvalues is expected to be of order 1/ N.

3.1.1 Limit results for the GUE

Using the determinantal structure of the eigenvalues {1N , . . . , NN } of the GUE,


developed in Sections 3.23.4, we prove the following.

90
3.1 S UMMARY OF MAIN RESULTS 91

Theorem 3.1.1 (GaudinMehta) For any compact set A R,



lim P[ N 1N , . . . , N NN A]
N

(1)k  
= 1+ A A deti, j=1 Ksine (xi , x j ) j=1 dx j ,
k k
(3.1.1)
k=1 k!
where

1 sin(xy)
xy , x = y ,
Ksine (x, y) = 1
, x = y.

(Similar results apply to the sets A + c n with |c| < 2, see Exercise 3.7.5.)
As a consequence of Theorem 3.1.1, we will show that the theory of integrable
systems applies and yields the following fundamental result concerning the be-
havior of spacings between eigenvalues in the bulk.

Theorem 3.1.2 (JimboMiwaMoriSato) One has



lim P[ N 1N , . . . , N NN (t/2,t/2)] = 1 F(t),
N

with
 
t (x)
1 F(t) = exp dx for t 0 ,
0 x
with the solution of
(t )2 + 4(t )(t + ( )2 ) = 0 ,
so that
t t2 t3
= 2 3 + O(t 4 ) as t 0 . (3.1.2)

The differential equation satisfied by is the -form of Painleve V. Note that
Theorem 3.1.2 implies that F(t) t0 0. Additional analysis (see Remark 3.6.5 in
Subsection 3.6.3) yields that also F(t) t 1, showing that F is the distribution
function of a probability distribution on R+ .
We now turn our attention to the edge of the spectrum.

Definition 3.1.3 The Airy function is defined by the formula



1
e
3 /3x
Ai(x) = d , (3.1.3)
2 i C

where C is the contour in the -plane consisting of the ray joining e i/3 to the
origin plus the ray joining the origin to e i/3 (see Figure 3.1.1).
92 3. S PACINGS FOR G AUSSIAN ENSEMBLES

C

6

Fig. 3.1.1. Contour of integration for the Airy function

The Airy kernel is defined by


Ai(x) Ai (y) Ai (x) Ai(y)
KAiry (x, y) = A(x, y) := ,
xy
where the value for x = y is determined by continuity.

By differentiating under the integral and then integrating by parts, it follows that
Ai(x), for x R, satisfies the Airy equation:
d2y
xy = 0 . (3.1.4)
dx2
Various additional properties of the Airy function and kernel are summarized in
Subsection 3.7.3.
The fundamental result concerning the eigenvalues of the GUE at the edge of
the spectrum is the following.

Theorem 3.1.4 For all < t t ,


"  N  #

lim P N 2/3 i 2 [t,t ], i = 1, . . . , N
N N
 t  t k k
(1)k
= 1+ det A (xi , x j ) dx j , (3.1.5)
k=1 k! t t i, j=1 j=1

with A the Airy kernel. In particular,


"  N  #

lim P N 2/3 N 2 t
N N

(1)k   k
= 1+ t t deti, j=1 A(xi , x j ) j=1 dx j =: F2 (t) . (3.1.6)
k
k=1 k!
3.1 S UMMARY OF MAIN RESULTS 93

Note that the statement of Theorem 3.1.4 does not ensure that F2 is a distribu-
tion function (and in particular, does not ensure that F2 () = 0), since it only
implies thevague convergence, not the weak convergence, of the random vari-
ables NN / N 2. The latter convergence, as well as a representation of F2 , are
contained in the following.

Theorem 3.1.5 (TracyWidom) The function F2 () is a distribution function that


admits the representation
  
F2 (t) = exp (x t)q(x)2 dx , (3.1.7)
t

where q satisfies
q = tq + 2q3 , q(t) Ai(t) , as t + . (3.1.8)

The function F2 () is the TracyWidom distribution. Equation (3.1.8) is the Painleve


II equation. Some information on its solutions is collected in Remark 3.8.1 below.

3.1.2 Generalizations: limit formulas for the GOE and GSE

We next state the results for the GOE and GSE in a concise way that allows easy
comparison with the GUE. Most of the analysis will be devoted to controlling the
influence of the departure from a determinantal structure in these ensembles.
( ,n) ( ,n)
For = 1, 2, 4, let ( ,n) = (1 , . . . , n ) be a random vector in Rn with
( )
the law Pn , see (2.5.6), possessing a density with respect to Lebesgue measure
proportional to |(x)| e |x| /4 . (Thus, = 1 corresponds to the GOE, = 2 to
2

the GUE and = 4 to the GSE.) Consider the limits



1 F ,bulk (t) = lim P({ n ( ,n) } (t/2,t/2)} = 0) / ,
n
for t > 0 , (3.1.9)
1 2 
F ,edge (t) = lim P n1/6 ( ( ,n) 2 n) (t, ) = 0/ ,
n
for all real t . (3.1.10)
The existence of these limits for = 2 follows from Theorems 3.1.2 and 3.1.4,
together with Corollary 3.1.5. Further, from Lemma 3.6.6 below, we also have
  t 
t
1 F2,bulk (t) = exp (t x)r(x) dx ,
2
0

where
1 t
t 2 ((tr) + (tr))2 = 4(tr)2 ((tr)2 + ((tr) )2 ) , r(t) = + + Ot0 (t 2 ).
2
94 3. S PACINGS FOR G AUSSIAN ENSEMBLES

The following is the main result of the analysis of spacings for the GOE and GSE.

Theorem 3.1.6 The limits 1 F ,bulk ( = 1, 4) exist and are as follows:


  
1 F1,bulk (t) 1 t
 = exp r(x)dx , (3.1.11)
1 F2,bulk (t) 2 0
  
1 F4,bulk (t/2) 1 t
 = cosh r(x)dx . (3.1.12)
1 F2,bulk (t) 2 0

Theorem 3.1.7 The limits F ,edge ( = 1, 4) exist and are as follows:


  
F1,edge (t) 1
 = exp q(x)dx , (3.1.13)
F2,edge (t) 2 t
  
F4,edge (t/22/3 ) 1
 = cosh q(x)dx . (3.1.14)
F2,edge (t) 2 t

The proofs of Theorems 3.1.6 and 3.1.7 appear in Section 3.9.

3.2 Hermite polynomials and the GUE

In this section we show why orthogonal polynomials arise naturally in the study
of the law of the GUE. The relevant orthogonal polynomials in this study are the
Hermite polynomials and the associated oscillator wave-functions, which we in-
troduce and use to derive a Fredholm determinant representation for certain prob-
abilities connected with the GUE.

3.2.1 The GUE and determinantal laws

We now show that the joint distribution of the eigenvalues following the GUE has
a nice determinantal form, see Lemma 3.2.2 below. We then use this formula
in order to deduce a Fredholm determinant expression for the probability that no
eigenvalues belong to a given interval, see Lemma 3.2.4.
Throughout this section, we shall consider the eigenvalues of GUE matrices
with complex Gaussian entries of unit variance as in Theorem 2.5.2, and later
normalize the eigenvalues to study convergence issues. We shall be interested in
symmetric statistics of the eigenvalues. For p N, recalling the joint distributions
(2)
PN of the unordered eigenvalues of the GUE, see Remark 2.5.3, we call its
marginal P p,N on p coordinates the distribution of p unordered eigenvalues of
3.2 H ERMITE POLYNOMIALS AND THE GUE 95
(2)
the GUE. More explicitly, P p,N is the probability measure on R p so that, for any
f Cb (R p ),
 
(2) (2)
f (1 , , p )dP p,N (1 , , p ) = f (1 , , p )dPN (1 , , N )

(2)
(recall that PN is the law of the unordered eigenvalues). Clearly, one also has

(2)
f (1 , , p )dP p,N (1 , , p )

(N p)!

(2)
= f ( (1) , , (p) )dPN (1 , , N ) ,
N! S p,N

where S p,N is the set of injective maps from {1, , p} into {1, , N}. Note that
(2) (2)
we automatically have PN,N = PN .
We now introduce the Hermite polynomials and associated normalized (har-
monic) oscillator wave-function.

Definition 3.2.1 (a) The nth Hermite polynomial Hn (x) is defined as


2 /2 d n x2 /2
Hn (x) := (1)n ex e . (3.2.1)
dxn
(b) The nth normalized oscillator wave-function is the function

ex /4 Hn (x)
2

n (x) =  .
2 n!

d 2 n
x is taken as the definition of the nth Her-
2
(Often, in the literature, (1)n ex dx ne
mite polynomial. We find (3.2.1) more convenient.)
For our needs, the most important property of the oscillator wave-functions is
their orthogonality relations

k (x) (x)dx = k . (3.2.2)

We will also use the monic property of the Hermite polynomials, that is

Hn (x) is a polynomial of degree n with leading term xn . (3.2.3)

The proofs of (3.2.2) and (3.2.3) appear in Subsection 3.2.2, see Lemmas 3.2.7
and 3.2.5.
(2)
We are finally ready to describe the determinantal structure of P p,N . (See Sec-
tion 4.2 for more information on implications of this determinantal structure.)
96 3. S PACINGS FOR G AUSSIAN ENSEMBLES
(2)
Lemma 3.2.2 For any p N, the law P p,N is absolutely continuous with respect
to Lebesgue measure, with density
(2) (N p)! p (N)
p,N (1 , , p ) = det K (k , l ) ,
N! k,l=1
where
N1
K (N) (x, y) = k (x)k (y) . (3.2.4)
k=0

(2)
Proof Theorem 2.5.2 shows that p,N exists and equals
 N N
|(x)|2 exi /2
(2) 2
p,N (1 , , p ) = Cp,N d i , (3.2.5)
i=1 i=p+1

where xi = i for i p and i for i > p, and C p,N is a normalization constant. The
fundamental remark is that this density depends on the Vandermonde determinant
N N
(x) = (x j xi ) = det xij1 = det H j1 (xi ) ,
i, j=1 i, j=1
(3.2.6)
1i< jN

where we used (3.2.3) in the last equality.


(2) (2)
We begin by considering p = N, writing N for N,N . Then
 2 N
N
N (1 , , N ) = CN,N det H j1 (i ) ei /2
(2) 2
(3.2.7)
i, j=1 i=1
 2
N N
= CN,N det j1 (i ) = CN,N det K (N) (i , j ) ,
i, j=1 i, j=1

where in the last line we used the fact that det(AB) = det(A) det(B) with A = B =
( j1 (i ))i, j=1 . Here, CN,N = k=0 ( 2 k!)CN,N .
N N1

(2)
Of course, from (2.5.4) we know that CN,N = CN . We provide now yet another
direct evaluation of the normalization constant, following [Meh91]. We introduce
a trick that will be very often applied in the sequel.

Lemma 3.2.3 For any square-integrable functions f1 , . . . , fn and g1 , . . . , gn on the


real line, we have
  n
 
n n
1
det fk (xi )gk (x j ) dxi
n! i, j=1
k=1 i=1
  n n n n 
1
= det fi (x j ) det gi (x j ) dxi = det fi (x)g j (x)dx . (3.2.8)
n! i, j=1 i, j=1 i=1 i, j=1
3.2 H ERMITE POLYNOMIALS AND THE GUE 97

Proof Using the identity det(AB) = det(A) det(B) applied to A = { fk (xi )}ik and
B = {gk (x j )}k j , we get
 
   
n n n n n n
det
i, j=1
fk (xi )gk (x j ) dxi = det fi (x j ) det gi (x j ) dxi ,
i, j=1 i, j=1
k=1 i=1 i=1

which equals, by expanding the determinants involving the families {gi } and { fi },
  n n
( ) ( ) f (i) (xi ) g (i) (xi )) dxi
, Sn i=1 i=1
n 
= ( ) ( ) f (i) (x)g (i) (x)dx
, Sn i=1
n  n 
= n! ( ) fi (x)g (i) (x)dx = n! det
i, j=1
fi (x)g j (x)dx.
Sn i=1

Substituting fi = gi = i1 and n = N in Lemma 3.2.3, and using the orthogonality


relations (3.2.2), we deduce that
 N N
det K (N) (i , j ) d i = N! , (3.2.9)
i, j=1 i=1
which completes the evaluation of CN,N and the proof of Lemma 3.2.2 for p = N.
For p < N, using (3.2.5) and (3.2.6) in a manner similar to (3.2.7), we find that
for some constant Cp,N , with xi = i if i p and i otherwise,
 N N

(2)
p,N (1 , , p ) = Cp,N ( det j1 (xi ))2 d i
i, j=1 i=p+1
 N N
= C p,N ( ) ( ) ( j)1 (x j ) ( j)1 (x j ) d i .
, SN j=1 i=p+1

Therefore, letting S (p, ) denote the bijections from {1, , p} to {1 , , p }


:= , we get
(2)
p,N (1 , , p )
p
= Cp,N ( ) ( ) (i)1 (i ) (i)1 (i )
11 << p N , S (p, ) i=1
 p
2
= Cp,N det j 1 (i )
i, j=1
, (3.2.10)
11 << p N

where in the first equality we used the orthogonality of the family { j } to con-
clude that the contribution comes only from permutations of SN for which (i) =
98 3. S PACINGS FOR G AUSSIAN ENSEMBLES

(i) for i > p, and we put {1 , , p } = { (1), , (p)} = { (1), , (p)}.


Using the CauchyBinet Theorem A.2 with A = B (of dimension p N) and
Ai, j = j1 (i ), we get that

p
(2)
p,N (1 , , p ) = Cp,N det (K (N) (i , j )).
i, j=1

To compute Cp,N , note that, by integrating both sides of (3.2.10), we obtain

  p 2
1 = Cp,N det j 1 (i )
i, j=1
d 1 d p , (3.2.11)
11 << p N

whereas Lemma 3.2.3 implies that for all {1 , . . . , p },

  p 2
det j 1 (i ) d 1 d p = p!.
i, j=1

Thus, since there are (N!)/((N p)!p!) terms in the sum at the right side of
(3.2.11), we conclude that C p,N = (N p)!/N!.


Now we arrive at the main point, on which the study of the local properties of
the GUE will be based.

Lemma 3.2.4 For any measurable subset A of R,

.
N   k
(1)k k
{i A}) = 1 + det K (N) (xi , x j ) dxi . (3.2.12)
(2)
PN (
i=1 k=1 k! Ac Ac i, j=1 i=1

(The proof will show that the sum in (3.2.12) is actually finite.) The last expres-
sion appearing in (3.2.12) is a Fredholm determinant. The latter are discussed in
greater detail in Section 3.4.
Proof By using Lemmas 3.2.2 and 3.2.3 in the first equality, and the orthogonality
relations (3.2.2) in the second equality, we have

(2)
PN [i A, i = 1, . . . , N]
 
N1  N1 
= det i (x) j (x)dx = det i j i (x) j (x)dx
i, j=0 A i, j=0 Ac
N  
k
= 1 + (1)k det i (x) j (x)dx ,
k=1 01 <<k N1 i, j=1 Ac
3.2 H ERMITE POLYNOMIALS AND THE GUE 99

Therefore,
(2)
PN [i A, i = 1, . . . , N]
N    2 k
(1)k k
= 1+ det i (x j ) dxi
k=1 k! Ac Ac 0 << N1 i, j=1 i=1
1 k
N   k
(1)k k
= 1+ det K (N) (xi , x j ) dxi
k=1 k! Ac Ac i, j=1 i=1
  k
(1)k k
= 1+ det K (N) (xi , x j ) dxi , (3.2.13)
k=1 k! Ac Ac i, j=1 i=1

where the first equality uses (3.2.8) with gi (x) = fi (x) = i (x)1Ac (x), the second
equality uses the CauchyBinet Theorem A.2, and the last step is trivial since the
determinant detki, j=1 K (N) (xi , x j ) has to vanish identically for k > N because the
rank of {K (N) (xi , x j )}ki, j=1 is at most N.

3.2.2 Properties of the Hermite polynomials and oscillator wave-functions

Recall the definition of the Hermite polynomials, Definition 3.2.1. Some proper-
ties of the Hermite polynomials are collected in the following lemma. Through-

out, we use the notation  f , gG = R f (x)g(x)ex /2 dx. In anticipation of further
2

development, we collect much more information than was needed so far. Thus,
the proof of Lemma 3.2.5 may be skipped at first reading. Note that (3.2.3) is the
second point of Lemma 3.2.5.

Lemma 3.2.5 The sequence of polynomials {Hn (x)}


n=0 has the following proper-
ties.
1. H0 (x) = 1, H1 (x) = x and Hn+1 (x) = xHn (x) H n (x).
2. Hn (x) is a polynomial of degree n with leading term xn .
3. Hn (x) is even or odd according as n is even or odd.
4. x, H2n G = 0.

5. Hk , H G = 2 k! k .
6.  f , Hn G = 0 for all polynomials f (x) of degree < n.
7. xHn (x) = Hn+1 (x) + nHn1 (x) for n 1.
8. H n (x) = nHn1 (x).
9. H n (x) xH n (x) + nHn (x) = 0.
10. For x = y,
n1
Hk (x)Hk (y) (Hn (x)Hn1 (y) Hn1 (x)Hn (y))
k!
=
(n 1)!(x y)
.
k=0
100 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Property 2 shows that {Hn }n0 is a basis of polynomial functions, whereas prop-
erty 5 implies that it is an orthogonal basis for the scalar product  f , gG defined
on L2 (ex /2 dx) (since the polynomial functions are dense in the latter space).
2

Remark 3.2.6 Properties 7 and 10 are the three-term recurrence and the Christoffel
Darboux identity satisfied by the Hermite polynomials, respectively.

Proof of Lemma 3.2.5 Properties 1, 2 and 3 are clear. To prove property 5, use
integration by parts to get that
 
x2 /2 dl
= (1) Hk (x) l (ex /2 )dx
l 2
Hk (x)Hl (x)e dx
dx
 " l #
d
ex /2 dx
2
= Hk (x)
dxl

vanishes
if l > k (since the degree of Hk is strictly less than l), and is equal to
2 k! if k = l, by property 2. Then, we deduce property 4 since, by property 3,
H2n is an even function and so is the function ex /2 . Properties 2 and 5 suffice
2

to prove property 6. To prove property 7, we proceed by induction on n. By


properties 2 and 5 we have, for n 1,
n+1
xHn , Hk G
xHn (x) = Hk (x).
k=0 Hk , Hk G

By property 6 the kth term on the right vanishes unless |k n| 1, by property 4


the nth term vanishes, and by property 2 the (n + 1)st term equals 1. To get the
(n 1)st term we observe that
xHn , Hn1 G xHn , Hn1 G Hn , Hn G
= = 1n = n
Hn1 , Hn1 G Hn , Hn G Hn1 , Hn1 G
by induction on n and property 5. Thus property 7 is proved. Property 8 is a direct
consequence of properties 1 and 7, and property 9 is obtained by differentiating
the last identity in property 1 and using property 8. To prove property 10, call the
left side of the claimed identity F(x, y) and the right side G(x, y). Using properties
2 and 5, followed by integration by parts and property 8, one sees that the integral
 
ex
2 /2y2 /2
Hk (x)H (y)F(x, y)(x y)dxdy

equals the analogous integral with G(x, y) replacing F(x, y); we leave the details to
the reader. Equality of these integrals granted, property 10 follows since {Hk }k0
being a basis of the set of polynomials, it implies almost sure equality and hence
3.3 T HE SEMICIRCLE LAW REVISITED 101

equality by continuity of F, G. Thus the claimed properties of Hermite polynomi-


als are proved.

Recall next the oscillator wave-functions, see Definition 3.2.1. Their basic
properties are contained in the following lemma, which is an easy corollary of
Lemma 3.2.5. Note that (3.2.2) is just the first point of the lemma.

Lemma 3.2.7 The oscillator wave-functions satisfy the following.



1. k (x) (x)dx = k .

2. xn (x) = n + 1n+1 (x) + nn1 (x) .
n1
3. k (x)k (y) = n(n (x)n1 (y) n1 (x)n (y))/(x y) .
k=0
x
4. n (x) = n (x) + n n1 (x) .
2
1 x2
5. n (x) + (n + )n (x) = 0 .
2 4

We remark that the last relation above is the one-dimensional Schrodinger equa-
tion for the eigenstates of the one-dimensional quantum-mechanical harmonic os-
cillator. This explains the terminology.

3.3 The semicircle law revisited


(2)
Let XN HN be a random Hermitian matrix from the GUE with eigenvalues
1N NN , and let

LN = ( N /N + + N /N )/N (3.3.1)
1 N


denote the empirical distribution of the eigenvalues of the rescaled matrix XN / N.
LN thus corresponds to the eigenvalues of a Gaussian Wigner matrix.
We are going to make the average empirical distribution LN explicit in terms
of Hermite polynomials, calculate the moments of LN explicitly, check that the
moments of LN converge to those of the semicircle law, and thus provide an al-
ternative proof of Lemma 2.1.7. We also derive a recursion for the moments of
LN and
estimate the order of fluctuation of the renormalized maximum eigenvalue
NN / N above the spectrum edge, an observation that will be useful in Section
3.7.
102 3. S PACINGS FOR G AUSSIAN ENSEMBLES

3.3.1 Calculation of moments of LN

In this section, we derive the following explicit formula for LN , es .

Lemma 3.3.1 For any s R, any N N,


 
N1
1 2k (N 1) (N k) s2k
LN , e  = e
s s2 /(2N)
k Nk
. (3.3.2)
k=0 k + 1 (2k)!

Proof By Lemma 3.2.2,


   
1 x K (N) ( Nx, Nx)
LN ,  = K (N) (x, x)dx = (x) dx. (3.3.3)
N N N
This last identity shows that
LN is absolutely
continuous with respect to Lebesgue
measure, with density K (N) ( Nx, Nx)/ N.
Using points 3 and 5 of Lemma 3.2.7, we obtain that, for any n,
n (x)n1 (y) n1 (x)n (y)
K (n) (x, y)/ n =
xy
and hence by LHopitals rule

K (n) (x, x)/ n = n (x)n1 (x) n1

(x)n (x) .

Therefore
d (n)
K (x, x)/ n = n (x)n1 (x) n1

(x)n (x) = n (x)n1 (x). (3.3.4)
dx

By (3.3.3) the function K (N) ( Nx, Nx)/ N is the RadonNikodym derivative
of LN with respect to Lebesgue measure and hence we have the following repre-
sentation of the moment-generating function of LN :

1
LN , e  =
s
esx/ N
K (N) (x, x)dx. (3.3.5)
N

Integrating by parts once and then applying (3.3.4), we find that



1
LN , es  = esx/ N
N (x)N1 (x)dx . (3.3.6)
s

Thus the calculation of the moment generating function of LN boils down to the
problem of evaluating the integral on the right.
By Taylors theorem it follows from point 8 of Lemma 3.2.5 that, for any n,
n   n  
n n
Hn (x + t) = Hnk (x)t =
k
Hk (x)t nk .
k=0
k k=0
k
3.3 T HE SEMICIRCLE LAW REVISITED 103

Let Stn =: e n (x)n1 (x)dx.
tx By the preceding identity and orthogonality we
have

n
Hn (x)Hn1 (x)ex /2+tx dx
2
Stn =
n! 2
t 2 /2 
ne
Hn (x + t)Hn1 (x + t)ex /2 dx
2
=
n! 2
  
t 2 /2 n1
n1
k! n
= e n t 2n12k .
k=0 n!
k k
Changing the index of summation in the last sum from k to n 1 k, we then get
  
t 2 /2 (n 1 k)! n1
n1
n
n
St = e n t 2k+1
k=0 n! n 1 k n 1 k
  
n1
(n 1 k)! n n1
n
2
t /2
= e t 2k+1 .
k=0 n! k + 1 k
From the last calculation combined with (3.3.6) and after a further bit of re-
arrangement we obtain (3.3.2).


We can now present another

Proof of Lemma 2.1.7 (for Gaussian Wigner matrices) We have written the
moment generating function in the form (3.3.2), making it obvious that as N
the moments of LN tend to the moments of the semicircle distribution.

3.3.2 The HarerZagier recursion and Ledouxs argument

Recall that, throughout this chapter, NN denotes the maximal eigenvalue of a GUE
matrix. Our goal in this section is to provide the proof of the following lemma.

Lemma 3.3.2 (Ledouxs bound) There exist positive constants c and C such that
 N 
2/3
P N eN C ec , (3.3.7)
2 N
for all N 1 and > 0.

Roughly speaking, the last inequality says that fluctuations of the rescaled top
eigenvalue NN := NN /2 N 1 above 0 are of order of magnitude N 2/3 . This is
an a priori indication that the random variables N 2/3 NN converge in distribution,
as stated in Theorems 3.1.4 and 3.1.5. In fact, (3.3.7) is going to play a role in the
proof of Theorem 3.1.4, see Subsection 3.7.1.
104 3. S PACINGS FOR G AUSSIAN ENSEMBLES

The proof of Lemma 3.3.2 is based on a recursion satisfied by the moments of


LN . We thus first introduce this recursion in Lemma 3.3.3 below, prove it, and
then show how to deduce from it Lemma 3.3.2. Write
b(N)   2k
2k s
LN , es  = k .
k=0 k + 1
k (2k)!

Lemma 3.3.3 (HarerZagier recursions) For any integer numbers k and N,

(N) (N) k(k + 1) (N)


bk+1 = bk + b , (3.3.8)
4N 2 k1
where if k = 0 we ignore the last term.

Proof of Lemma 3.3.3 Define the (hypergeometric) function


   
1 n
(1)k n1
Fn (t) = F
2
t := (k + 1)! k
tk , (3.3.9)
k=0

and note that


 2 
d d
t 2 + (2 t) + (n 1) Fn (t) = 0 . (3.3.10)
dt dt
By rearranging (3.3.2) it follows from (3.3.9) that
 2
s
LN , e  = N
s
, (3.3.11)
N
where
n (t) = et/2 Fn (t) .

From (3.3.10) we find that


 
d2 d
4t 2 + 8 + 4n t n (t) = 0 . (3.3.12)
dt dt

Write next n (t) =


(n)
k
k=0 ak t . By (3.3.12) we have

(n) (n) (n)


0 = 4(k + 2)(k + 1)ak+1 + 4nak ak1 ,

where if k = 0 we ignore the last term. Clearly we have, taking n = N,


(N) (N)  
(1)k ak (2k)! b 2k
k
= k = LN , x2k  .
N k+1 k
The lemma follows.

3.3 T HE SEMICIRCLE LAW REVISITED 105

Proof of Lemma 3.3.2 From (3.3.8) and the definitions we obtain the inequalities
 
(N) (N) k(k + 1) (N)
0 bk bk+1 1 + bk
4N 2
for N 1, k 0. As a consequence, we deduce that
3
(N) c k2
bk e N , (3.3.13)
for some finite constant c > 0. By Stirlings approximation (2.5.12) we have
 
k3/2 2k
sup 2k < .
k=0 2 (k + 1) k
It follows from (3.3.13) and the last display that, for appropriate positive constants
c and C,
 N   2k
N NN
P e E (3.3.14)
2 N 2 Ne
(N)  
e2 k Nbk 2k
CNt 3/2 e2 t+ct /N ,
3 2

22k (k + 1) k
for all N 1, k 0 and real numbers ,t > 0 such that k = t, where t denotes
the largest integer smaller than or equal to t. Taking t = N 2/3 and substituting
N 2/3 for yields the lemma.

Exercise 3.3.4 Prove that, in the setup of this section, for every integer k it holds
that
lim ELN , xk 2 = lim LN , xk 2 . (3.3.15)
N N

Using the fact that the moments of LN converge to the moments of the semicircle
distribution, complete yet another proof of Wigners Theorem 2.1.1 in the GUE
setup.
Hint: Deduce from (3.3.3) that

1
LN , xk  = xk K (N) (x, x)dx .
N k/2+1
Also, rewrite ELN , xk 2 as
  N N N
1 1
= ( xik )2 det K (N) (xi , x j ) dx j
N 2+k N! i=1 i, j=1 j=1
   2
! 1 1
= x2k K (N) (x, y)2 dxdy + xk K (N) (x, x)dx
N k+2 N k+2
(N)
= LN , xk 2 + Ik ,
106 3. S PACINGS FOR G AUSSIAN ENSEMBLES
(N)
where Ik is equal to
 
1 x2k xk yk
(N (x)N1 (y) N1 (x)N (y))K(x, y)dxdy .
N k+3/2 xy
To prove the equality marked with the exclamation point, show that

K (n) (x,t)K (n) (t, y)dt = K (n) (x, y) ,

(N)
while the expression for Ik uses the ChristoffelDarboux formula (see Section
(N)
3.2.1). To complete the proof of (3.3.15), show that limN Ik = 0, expanding
the expression
x2k xk yk
(N (x)N1 (y) N1 (x)N (y))
xy
as a linear combination of the functions  (x)m (y) by exploiting the three-term
recurrence (see Section 3.2.1) satisfied by the oscillator wave-functions.

Exercise 3.3.5 With the notation of Lemma 3.3.2, show that there exist c ,C > 0
so that, for all N 1, if > 1 then
 N 
2/3 1
3
P N eN C 3 ec 2 .
2 N 4
This bound improves upon (3.3.7) for large .
Hint: optimize differently over the parameter t at the end of the proof of Lemma
3.3.2, replacing there by N 2/3 .

Exercise 3.3.6 The function Fn (t) defined in (3.3.9) is a particular case of the
general hypergeometric function, see [GrKP94]. Let
xk = x(x + 1) (x + k 1)
be the ascending factorial power. The general hypergeometric function is given
by the rule
 
a1 a p ak ak k
p t
F
b1 bq t = 1
.
k=0 b bq k!
k k
1

(i) Verify the following generalization of (3.3.10):


     
d d d a1 ap
t + b1 1 t + bq 1 F t
dt dt dt b1 bq
     
d d a1 ap
= t + a1 t + a p F t .
dt dt b1 bq
3.4 F REDHOLM DETERMINANTS 107

(ii) (Proposed by D. Stanton) Check that Fn (t) in (3.3.9) is a Laguerre polynomial.

3.4 Quick introduction to Fredholm determinants

We have seen in Lemma 3.2.4 that a certain gap probability, i.e. the probability
that a set does not contain any eigenvalue, is given by a Fredholm determinant.
The asymptotic study of gap probabilities thus involves the analysis of such de-
terminants. Toward this end, in this section we review key definitions and facts
concerning Fredholm determinants. We make no attempt to achieve great gen-
erality. In particular we do not touch here on any functional analytic aspects of
the theory of Fredholm determinants. The reader interested only in the proof of
Theorem 3.1.1 may skip Subsection 3.4.2 in a first reading.

3.4.1 The setting, fundamental estimates and definition of the Fredholm


determinant

Let X be a locally compact Polish space, with BX denoting its Borel -algebra.
Let be a complex-valued measure on (X, BX ), such that

 1 = | (dx)| < . (3.4.1)
X

(In many applications, X = R, and will be a scalar multiple of the Lebesgue


measure on a bounded interval).

Definition 3.4.1 A kernel is a Borel measurable, complex-valued function K(x, y)


defined on X X such that

K := sup |K(x, y)| < . (3.4.2)


(x,y)XX

The trace of a kernel K(x, y) (with respect to ) is



tr(K) = K(x, x)d (x) . (3.4.3)

Given two kernels K(x, y) and L(x, y), define their composition (with respect to )
as 
(K  L)(x, y) = K(x, z)L(z, y)d (z). (3.4.4)

The trace in (3.4.3) and the composition in (3.4.4) are well defined because  1 <
and K < , and further, K  L is itself a kernel. By Fubinis Theorem, for any
108 3. S PACINGS FOR G AUSSIAN ENSEMBLES

three kernels K, L and M, we have

tr(K  L) = tr(L  K) and (K  L)  M = K  (L  M).

Warning We do not restrict K in Definition 3.4.1 to be continuous. Thus, we may


have situations where two kernels K, K satisfy K = K , - a.e., but tr(K) =
tr(K ).
We turn next to a basic estimate.

Lemma 3.4.2 Fix n > 0. For any two kernels F(x, y) and G(x, y) we have

n n
det F(xi , y j ) det G(xi , y j ) n1+n/2 F G max(F, G)n1 (3.4.5)
i, j=1 i, j=1

and

n
det F(xi , y j ) nn/2 Fn . (3.4.6)
i, j=1

The factor nn/2 in (3.4.5) and (3.4.6) comes from Hadamards inequality (Theorem
A.3). In view of Stirlings approximation (2.5.12), it is clear that the Hadamard
bound is much better than the bound n! we would get just by counting terms.
Proof Define

G(x, y) if i < k,
(k)
Hi (x, y) = F(x, y) G(x, y) if i = k,

F(x, y) if i > k,
noting that, by the linearity of the determinant with respect to rows,
n n n n
i,det
(k)
det F(xi , y j ) det G(xi , y j ) = Hi (xi , y j ) . (3.4.7)
i, j=1 i, j=1 j=1
k=1

(k) (k)
Considering the vectors vi = vi with vi ( j) = Hi (xi , y j ), and applying Hadamards
inequality (Theorem A.3), one gets

n
det H (k) (xi , y j ) nn/2 F G max(F, G)n1 .
i
i, j=1

Substituting in (3.4.7) yields (3.4.5). Noting that the summation in (3.4.7) involves
only one nonzero term when G = 0, one obtains (3.4.6).

We are now finally ready to define the Fredholm determinant associated with a
kernel K(x, y). For n > 0, put
  n
n = n (K, ) = det K(i , j )d (1 ) d (n ) , (3.4.8)
i, j=1
3.4 F REDHOLM DETERMINANTS 109

setting 0 = 0 (K, ) = 1. We have, by (3.4.6),


  n

det K(i , j )d (1 ) d (n )  n Kn nn/2 . (3.4.9)
i, j=1 1

So, n is well defined.

Definition 3.4.3 The Fredholm determinant associated with the kernel K is defined
as

(1)n
(K) = (K, ) = n (K, ).
n=0 n!

(As in (3.4.8) and Definition 3.4.3, we often suppress the dependence on from
the notation for Fredholm determinants.) In view of Stirlings approximation
(2.5.12) and estimate (3.4.9), the series in Definition 3.4.3 converges absolutely,
and so (K) is well defined. The reader should not confuse the Fredholm determi-
nant (K) with the Vandermonde determinant (x): in the former, the argument
is a kernel while, in the latter, it is a vector.

Remark 3.4.4 Here is some motivation for calling (K) a determinant. Let
f1 (x), . . . , fN (x), g1 (x), . . . , gN (x) be given. Put
N
K(x, y) = fi (x)gi (y).
i=1

Assume further that maxi supx fi (x) < and max j supy g j (y) < . Then K(x, y) is
a kernel and so fits into the theory developed thus far. Paraphrasing the proof of
Lemma 3.2.4, we have that
  
N
(K) = det i j fi (x)g j (x)d (x) . (3.4.10)
i, j=1

For this reason, one often encounters the notation det(I K) for the Fredholm
determinant of K.

The determinants (K) inherit good continuity properties with respect to the  
norm.

Lemma 3.4.5 For any two kernels K(x, y) and L(x, y) we have
 

n1+n/2  n1 max(K, L)n1
|(K) (L)| K L . (3.4.11)
n=1 n!
110 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Proof Sum the estimate (3.4.5).


In particular, with K held fixed, and with L varying in such a way that K L
0, it follows that (L) (K). This is the only thing we shall need to obtain
the convergence in law of the spacing distribution of the eigenvalues of the GUE,
Theorem 3.1.1. On the other hand, the next subsections will be useful in the proof
of Theorem 3.1.2.

3.4.2 Definition of the Fredholm adjugant, Fredholm resolvent and a


fundamental identity

Throughout, we fix a measure and a kernel K(x, y). We put = (K). All the
constructions under this heading depend on K and , but we suppress reference to
this dependence in the notation in order to control clutter. Define, for any integer
n 1,
 
x1 . . . xn n
K = det K(xi , y j ) , (3.4.12)
y1 . . . yn i, j=1

set
   
x 1 ... n
Hn (x, y) = K d (1 ) d (n ) (3.4.13)
y 1 ... n
and
H0 (x, y) = K(x, y) .
We then have from Lemma 3.4.2 that
|Hn (x, y)| Kn+1  n1 (n + 1)(n+1)/2 . (3.4.14)

Definition 3.4.6 The Fredholm adjugant of the kernel K(x, y) is the function

(1)n
H(x, y) = Hn (x, y) . (3.4.15)
n=0 n!

If (K) = 0 we define the resolvent of the kernel K(x, y) as the function


H(x, y)
R(x, y) = . (3.4.16)
(K)

By (3.4.14), the series in (3.4.15) converges absolutely and uniformly on X X.


Therefore H() is well defined (and continuous on X 2p if K is continuous on X
X). The main fact to bear in mind as we proceed is that
sup |F(x, y)| < (3.4.17)
3.4 F REDHOLM DETERMINANTS 111

for F = K, H, R. These bounds are sufficient to guarantee the absolute convergence


of all the integrals we will encounter in the remainder of Section 3.4. Also it bears
emphasizing that the two-variable functions H(x, y) (resp., R(x, y) if defined) are
kernels.
We next prove a fundamental identity relating the Fredholm adjugant and de-
terminant associated with a kernel K.

Lemma 3.4.7 (The fundamental identity) Let H(x, y) be the Fredholm adjugant
of the kernel K(x, y). Then,

K(x, z)H(z, y)d (z) = H(x, y) (K) K(x, y)

= H(x, z)K(z, y)d (z) , (3.4.18)

and hence (equivalently)

K  H = H (K) K = H  K . (3.4.19)

Remark 3.4.8 Before proving the fundamental identity (3.4.19), we make some
amplifying remarks. If (K) = 0 and hence the resolvent R(x, y) = H(x, y)/(K)
of K(x, y) is well defined, then the fundamental identity takes the form
 
K(x, z)R(z, y)d (z) = R(x, y) K(x, y) = R(x, z)K(z, y)d (z) (3.4.20)

and hence (equivalently)

K R = RK = RK.

It is helpful if not perfectly rigorous to rewrite the last formula as the operator
identity
1 + R = (1 K)1 .

Rigor is lacking here because we have not taken the trouble to associate linear
operators with our kernels. Lack of rigor notwithstanding, the last formula makes
it clear that R(x, y) deserves to be called the resolvent of K(x, y). Moreover, this
formula is useful for discovering composition identities which one can then verify
directly and rigorously.

Proof of Lemma 3.4.7 Here are two reductions to the proof of the fundamental
identity. Firstly, it is enough just to prove the first of the equalities claimed in
(3.4.18) because the second is proved similarly. Secondly, proceeding term by
112 3. S PACINGS FOR G AUSSIAN ENSEMBLES

term, since H0 = K and 0 = 1, it is enough to prove that, for n > 0,



(1)n1 (1)n
K(x, z)Hn1 (z, y)d (z) = (Hn (x, y) n K(x, y))
(n 1)! n!
or, equivalently,

Hn (x, y) = n K(x, y) n K(x, z)Hn1 (z, y)d (z) , (3.4.21)

where n = n (K).
Now we can quickly give the proof of the fundamental identity (3.4.19). Ex-
panding by minors of the first row, we find that
 
x 1 . . . n
K
y 1 . . . n
 
1 . . . n
= K(x, y)K
1 . . . n
 
n
1 . . . j1 j j+1 . . . n
+ (1) j K(x, j )K
j=1
y 1 . . . j1 j+1 . . . n
 
1 . . . n
= K(x, y)K
1 . . . n
 
n
j 1 . . . j1 j+1 . . . n
K(x, j )K .
j=1
y 1 . . . j1 j+1 . . . n

Integrating out the variables 1 , . . . , n in evident fashion, we obtain (3.4.21). Thus


the fundamental identity is proved.

We extract two further benefits from the proof of the fundamental identity. Re-
call from (3.4.8) and Definition 3.4.3 the abbreviated notation n = n (K) and
(K).

Corollary 3.4.9 (i) For all n 0,


n
(1)n (1)k
Hn (x, y) = k (K
3  45
 K6)(x, y) . (3.4.22)
n! k=0 k! n+1k

(ii) Further,
n
(1)n (1)k
n+1 = k tr(K
3  45
 K6) . (3.4.23)
n! k=0 k! n+1k

In particular, the sequence of numbers


tr(K), tr(K  K), tr(K  K  K), . . .
3.4 F REDHOLM DETERMINANTS 113

uniquely determines the Fredholm determinant (K).

Proof Part (i) follows from (3.4.21) by employing an induction on n. We leave the
details to the reader. Part (ii) follows by putting x = and y = in (3.4.22), and
integrating out the variable .

Multiplicativity of Fredholm determinants

We now prove a result needed for our later analysis of GOE and GSE. A reader
interested only in GUE can skip this material.

Theorem 3.4.10 Fix kernels K(x, y) and L(x, y) arbitrarily. We have

(K + L L  K) = (K)(L) . (3.4.24)

In the sequel we refer to this relation as the multiplicativity of the Fredholm deter-
minant construction.
Proof Let t be a complex variable. We are going to prove multiplicativity by
studying the entire function

K,L (t) = (K + t(L L  K))

of t. We assume below that K,L (t) does not vanish identically, for otherwise there
is nothing to prove. We claim that

K,L (0) = (K) tr(L L  K) + tr((L L  K)  H)
= (K) tr(L) , (3.4.25)

where H is the Fredholm adjugant of K, see equation (3.4.15). The first step
is justified by differentiation under the integral; to justify the exchange of limits
one notes that for any entire analytic function f (z) and > 0 one has f (0) =
1  f (z)
2 i |z|= z2 dz, and then uses Fubinis Theorem. The second step follows by the
fundamental identity, see Lemma 3.4.7. This completes the proof of (3.4.25).
Since 0,L (t) = (tL) equals 1 for t = 0, the product 0,L (t)K,L (t) does not
vanish identically. Arbitrarily fix a complex number t0 such that 0,L (t0 )K,L (t0 ) =
0. Note that the resolvant S of t0 L is defined. One can verify by straightforward
calculation that the kernels

K = K + t0 (L L  K), L = L + L  S , (3.4.26)

satisfy the composition identity

K + (t0 + t)(L L  K) = K + t(L L  K) . (3.4.27)


114 3. S PACINGS FOR G AUSSIAN ENSEMBLES

With K and L as in (3.4.26), we have K,L (t) = K,L (t + t0 ) by (3.4.27) and hence

d
log K,L (t) = tr(L)
dt t=t0

by (3.4.25). Now the last identity holds also for K = 0 and the right side is inde-
pendent of K. It follows that the logarithmic derivatives of the functions 0,L (t)
and K,L (t) agree wherever neither has a pole, and so these logarithmic deriva-
tives must be identically equal. Integrating and exponentiating once we obtain an
identity K,L (t) = K,L (0)0,L (t) of entire functions of t. Finally, by evaluating
the last relation at t = 1, we recover the multiplicativity relation (3.4.24).

3.5 Gap probabilities at 0 and proof of Theorem 3.1.1


(2)
In the remainder of this chapter, we let XN HN be a random Hermitian matrix
from the GUE with eigenvalues 1N NN . We initiate in this section the
study of the spacings between eigenvalues of XN . We focus on those eigenvalues
that lie near 0, and seek, for a fixed t > 0, to evaluate the limit

lim P[ N 1N , . . . , N NN (t/2,t/2)] , (3.5.1)
N

see the statement of Theorem 3.1.1. We note that a priori, because of Theorems
2.1.1 and 2.1.22, the limit in (3.5.1)
Nhas some chance of being nondegenerate
because the N random variables N 1 ,. . ., N N are spread out over an interval
N

very nearly of length 4N. As we will show in Section 4.2, the computation of the
limit in (3.5.1) allows one to evaluate other limits, such as the limit of the empirical
measure of the spacings in the bulk of the spectrum.
As in (3.2.4), set
n1 n (x)n1 (y) n1 (x)n (y)
K (n) (x, y) = k (x)k (y) = n
xy
,
k=0

where the k (x) are the normalized oscillator wave-functions introduced in Defi-
nition 3.2.1. Set  
1 x y
S(n) (x, y) = K (n) , .
n n n
A crucial step in the proof of Theorem 3.1.1 is the following lemma, whose proof,
which takes most of the analysis in this section, is deferred.

Lemma 3.5.1 With the above notation, it holds that


1 sin(x y)
lim S(n) (x, y) = , (3.5.2)
n xy
3.5 G AP PROBABILITIES AT 0 115

uniformly on each bounded subset of the (x, y)-plane.

Proof of Theorem 3.1.1 Recall that by Lemma 3.2.4,


(n) (n)
P[ n1 , . . . , nn A]

(1)k  
= 1+ 1 1 detk
n A n A
(n) k
i, j=1 K (xi , x j ) j=1 dx j
k=1 k!

(1)k  
= 1+ A A deti, j=1 S
k (n) (x , x ) k
i j j=1 dx j .
k=1 k!

(The scaling of Lebesgues measure in the last equality explains the appearance

of the scaling by 1/ n in the definition of S(n) (x, y).) Lemma 3.5.1 together with
Lemma 3.4.5 complete the proof of the theorem.

The proof of Lemma 3.5.1 takes up the rest of this section. We begin by bring-
ing, in Subsection 3.5.1, a quick introduction to Laplaces method for the evalua-
tion of asymptotics of integrals, which will be useful for other asymptotic compu-
tations, as well. We then apply it in Subsection 3.5.2 to conclude the proof.

Remark 3.5.2 We remark that one is naturally tempted to guess that the ran-
dom variable WN =width of the largest
open interval
symmetric about the origin
containing none of the eigenvalues N 1N , . . . , N NN should possess a limit in
distribution. Note however that we do not a priori have tightness for that random
variable. But, as we show in Section 3.6, we do have tightness (see (3.6.34) be-
low) a posteriori. In particular, in Section 3.6 we prove Theorem 3.1.2, which
provides an explicit expression for the limit distribution of WN .

3.5.1 The method of Laplace

Laplaces method deals with the asymptotic (as s ) evaluation of integrals of


the form 
f (x)s g(x)dx .

We will be concerned with the situation in which the function f possesses a global
maximum at some point a, and behaves quadratically in a neighborhood of that
maximum. More precisely, let f : R  R+ be given, and for some constant a
and positive constants s0 , K, L, M, let G = G (a, 0 , s0 , f (), K, L, M) be the class of
measurable functions g : R  R satisfying the following conditions:

(i) |g(a)| K;
116 3. S PACINGS FOR G AUSSIAN ENSEMBLES


(ii) sup0<|xa|0 g(x)g(a)
xa L;

(iii) f (x)s0 |g(x)|dx M.

We then have the following.

Theorem 3.5.3 (Laplace) Let f : R R+ be a function such that, for some a R


and some positive constants 0 , c, the following hold.
(a) f (x) f (x ) if either a 0 x x a or a x x a + 0 .
(b) For all < 0 , sup|xa|> f (x) f (a) c 2 .
(c) f (x) has two continuous derivatives in the interval (a 20 , a + 20 ).
(d) f (a) < 0.
Then, for any function g G (a, 0 , s0 , f (), K, L, M), we have
7

2 f (a)
lim s f (a)s f (x)s g(x)dx = g(a) , (3.5.3)
s f (a)

and moreover, for fixed f , a, 0 , s0 , K, L, M, the convergence is uniform over the


class G (a, 0 , s0 , f (), K, L, M).

Note that by point (b) of the assumptions, f (a) > 0. The intuition here is that as s
tends to infinity the function ( f (x)/ f (a))s near x = a peaks more and more sharply
and looks at the microscopic level more and more like a bell-curve, whereas f (x)s
elsewhere becomes negligible. Formula (3.5.3) is arguably the simplest nontrivial
application of Laplaces method. Later we are going to encounter more sophisti-
cated applications.
Proof of Theorem 3.5.3 Let (s) be a positive function defined for s s0 such
that (s) s 0 and s (s)2 s , while 0 = supss0 (s). For example, we
could take (s) = 0 (s0 /s)1/4 . For s s0 , write

f (x)s g(x)dx = g(a)I1 + I2 + I3 ,

where

I1 = |xa| (s) f (x)s dx ,

I2 = |xa| (s) f (x)s (g(x) g(a))dx ,

I3 = |xa|> (s) f (x)s g(x)dx .

For |t| < 20 , put


 1
h(t) = (1 r)(log f ) (a + rt)dr ,
0
3.5 G AP PROBABILITIES AT 0 117

thus defining a continuous function of t such that h(0) = f (a)/2 f (a) and which
by Taylors Theorem satisfies

f (x) = f (a) exp(h(x a)(x a)2 )

for |x a| < 20 . We then have


    
f (a)s t
I1 = exp h t 2 dt ,
s |t| (s) s s
and hence
7
  
2 f (a)
lim s f (a)s I1 = exp h(0)t 2 dt = .
s f (a)

We have |I2 | L (s)I1 and hence



lim s f (a)s I2 = 0 .
s

We have, since (s) < 0 ,


 ss0
c (s)2
|I3 | M sup | f (x)|ss0 M f (a)ss0 1 ,
x:|xa|> (s) f (a)

and hence

lim s f (a)s I3 = 0 .
s

This is enough to prove that the limit formula (3.5.3) holds and enough also to
prove the uniformity of convergence over all functions g(x) of the class G .

3.5.2 Evaluation of the scaling limit: proof of Lemma 3.5.1

The main step in the proof of Lemma 3.5.1 is the following uniform convergence
result, whose proof is deferred. Let
 
1 t
(t) = n 4 ,
n
with a quantity whose difference from n is fixed (in the proof of Lemma 3.5.1,
we will use = n, n 1, n 2).

Lemma 3.5.4 Uniformly for t in a fixed bounded interval,


1  
lim | (t) cos t | = 0. (3.5.4)
n 2
118 3. S PACINGS FOR G AUSSIAN ENSEMBLES

With Lemma 3.5.4 granted, we can complete the


Proof of Lemma 3.5.1 Recall that
y y
n ( n )n1 ( n ) n1 ( n )n ( n )
x x
S(n) (x, y) = n .
xy

In order to prove the claimed uniform convergence, it is useful to get rid of the
division by (x y) in S(n) (x, y). Toward this end, noting that for any differentiable
functions f , g on R,
f (x)g(y) f (y)g(x)
xy
   
f (x) f (y) g(y) g(x)
= g(y) + f (y)
xy xy
 1  1
= g(y) f (tx + (1 t)y)dt f (y) g (tx + (1 t)y)dt , (3.5.5)
0 0

we deduce
1 
y x y
S(n) (x, y) = n1 ( ) n (t + (1 t) )dt
n 0 n n
 1
y x y
n ( ) n1 (t + (1 t) )dt (3.5.6)
n 0 n n
 1
y z
= n1 ( ) ( nn1 (z) n (z))|z=t x +(1t) y dt
n 0 2 n n
 1
y z
n ( ) ( n 1n2 (z) n1 (z))|z=t x +(1t) y dt ,
n 0 2 n n

where we used in the last equality point 4 of Lemma 3.2.7. Using (3.5.4) (in the
case = n, n 1, n 2) in (3.5.6) and elementary trigonometric formulas shows
that
   
1 (n 1) 1 (n 1)
S(n) (x, y) cos(y ) cos tx + (1 t)y dt
2 0 2
   
n 1 (n 2)
cos(y ) cos tx + (1 t)y dt
2 0 2
1 sin(x y)
,
xy
which, Lemma 3.5.4 granted, completes the proof of Lemma 3.5.1.

Proof of Lemma 3.5.4 Recall the Fourier transform identity



1
e
2 /2i x
ex
2 /2
= d .
2
3.5 G AP PROBABILITIES AT 0 119

Differentiating under the integral, we find that



d n x2 /2 1
(i )n e
2 /2i x
Hn (x)ex
2 /2
= (1)n e = d ,
dxn 2
or equivalently

i ex /4
2

e
2 /2i x
(x) = d . (3.5.7)
(2 )3/4 !
We use the letter here instead of n to help avoid confusion at the next step. As a

consequence, setting C ,n = n/(2 ), we have

i et /(4n) n1/4
2
2 /2i t/n
(t) = e d
(2 )3/4 !

(2 )1/4C ,n et /(4n) n1/4+ /2
2

( e /2 )n i ei t n d
2
=
!

(2 )1/4C ,n n1/4+n/2
( e /2 )n i ei t n d
2

n!

| e |n [(i sign ) ei t ]| n |d ,
2 /2
C ,n en/2

where Stirlings approximation (2.5.12) and the fact that (t) is real were used
in the last line. Using symmetry, we can rewrite the last expressions as

2C ,n en/2 f ( )n gt ( )d ,

with f (x) = xex 1x0 and g(x) = gt (x) = cos(xt n .


2 /2
2 )x
Consider t as fixed, and let n in one of the four possible ways such that
g() does not depend on n (recall that n does not depend on n). Note that f (x)
achieves its maximal value at x = 1 and
f (1) = e1/2 , f (1) = 0, f (1) = 2e1/2 .
Hence, we can apply Laplaces method (Theorem 3.5.3) to find that
1  
(t) n cos t .
2
Moreover, the convergence here is uniform for t in a fixed bounded interval, as
follows from the uniformity asserted for convergence in limit formula (3.5.3).

Exercise 3.5.5 Use Laplaces method (Theorem 3.5.3) with a = 1 to prove (2.5.12):
as s along the positive real axis,
 
dx dx
(s) = xs ex = ss (xex )s 2 ss1/2 es .
0 x 0 x
120 3. S PACINGS FOR G AUSSIAN ENSEMBLES

This recovers in particular Stirlings approximation (2.5.12).

3.5.3 A complement: determinantal relations

Let integers 1 , . . . ,  p 0 and bounded disjoint Borel sets A1 , . . . , A p be given.


Put

8 9
PN (1 , . . . ,  p ; A1 , . . . , A p ) = P i = N 1N , . . . , N NN Ai , for i = 1, . . . , p .
We have the following.

Lemma 3.5.6 Let s1 , . . . , s p be independent complex variables and let


= (1 s1 )1A1 + + (1 s p )1A p .
Then, the limit
P(1 , . . . ,  p ; A1 , . . . , A p ) = lim PN (1 , . . . ,  p ; A1 , . . . , A p ) (3.5.8)
N

exists and satisfies



P(1 , . . . ,  p ; A1 , . . . , A p )s11 s pp


1 =0  p =0

(1)k   sin(xi x j ) k
= 1+ detki, j=1 1 xi x j=1 (x j )dx j . (3.5.9)
k=1 k! j

That is, the generating function in the left side of (3.5.8) can be represented in
terms of a Fredholm determinant. We note that this holds in greater generality, see
Section 4.2.
Proof The proof is a slight modification of the method presented in Subsection
3.5.2. Note that the right side of (3.5.9) defines, by the fundamental estimate
(3.4.9), an entire function of the complex variables s1 , . . . , s p , whereas the left side
defines a function analytic in a domain containing the product of p copies of the
unit disc centered at the origin. Clearly we have
N   
E 1

N iN = PN (1 , . . . ,  p ; A1 , . . . , A p )s11 s pp .
i=1 1 ,..., p 0
1 ++ p N
(3.5.10)
The function of s1 , . . . , s p on the right is simply a polynomial, whereas the expec-
tation on the left can be represented as a Fredholm determinant. From this, the
lemma follows after representing the probability PN (1 , . . . ,  p ; A1 , . . . , A p ) as a p-
dimensional Cauchy integral.

3.6 A NALYSIS OF THE SINE - KERNEL 121

3.6 Analysis of the sine-kernel

Our goal in this section is to derive differential equations (in the parameter t)
for the probability that no eigenvalue of the (properly rescaled) GUE lies in the
interval (t/2,t/2). We will actually derive slightly more general systems of
differential equations that can be used to evaluate expressions like (3.5.9).

3.6.1 General differentiation formulas

Recalling the setting of our general discussion of Fredholm determinants in Sec-


tion 3.4, we fix a bounded open interval (a, b) R, real numbers

a < t1 < < tn < b

in the interval (a, b) and complex numbers

s1 , . . . , sn1 , s0 = 0 = sn .

Set
= s1 1(t1 ,t2 ) + + sn1 1(tn1 ,tn ) ,

and define so that it has density with respect to the Lebesgue measure on
X = R. We then have, for f L1 [(a, b)],
 n1  ti+1
 f ,  = f (x)d (x) = si ti
f (x)dx .
i=1

Motivated by Theorem 3.1.1, we fix the function


sin(x y)
S(x, y) = (3.6.1)
(x y)
on (a, b)2 as our kernel. As usual = (S) denotes the Fredholm determinant
associated with S and the measure . We assume that = 0 so that the Fredholm
resolvent R(x, y) is also defined.
Before proceeding with the construction of a system of differential equations,
we provide a description of the main ideas, disregarding in this sketch issues of
rigor, and concentrating on the most important case of n = 2. View the kernels S
and R as operators on L1 [(a, b)], writing multiplication instead of the  operation.
As noted in Remark 3.4.8, we have, with S(x, y) = (x y)S(x, y) and R(x, y) =
(x y)R(x, y), that

(1 S)1 = 1 + R, S = [M, S], R = [M, R] ,


122 3. S PACINGS FOR G AUSSIAN ENSEMBLES

where M is the operation of multiplication by x and the bracket [A, B] = ABBA is


the commutator of the operators A, B. Note also that under our special assumptions

S(x, y) = (sin x cos y sin y cos x)/ ,

and hence the operator S is of rank 2. But we have

R = [M, R] = [M, (1 S)1 ]


= (1 S)1 [M, 1 S](1 S)1 = (1 + R)S(1 + R) ,


and hence R is also of rank 2. Letting P(x) = (1 + R) cos(x)/ and Q(x) =

(1 + R) sin(x)/ , we then obtain R = Q(x)P(y) Q(y)P(x), and thus

Q(x)P(y) Q(y)P(x)
R(x, y) = . (3.6.2)
xy

(See Lemma 3.6.2 below for the precise statement and proof.) One checks that
differentiating with respect to the endpoints t1 ,t2 the function log (S) yields the
functions R(ti ,ti ), i = 1, 2, which in turn may be related to derivatives of P and
Q by a careful differentiation, using (3.6.2). The system of differential equations
thus obtained, see Theorem 3.6.2, can then be simplified, after specialization to
the case t2 = t1 = t/2, to yield the Painleve V equation appearing in Theorem
3.1.2.
Turning to the actual derivation, we consider the parameters t1 , . . . ,tn as vari-
able, whereas we consider the kernel S(x, y) and the parameters s1 , . . . , sn1 to be

fixed. Motivated by the sketch above, set f (x) = (sin x)/ and
 
Q(x) = f (x) + R(x, y) f (y) d (y), P(x) = f (x) + R(x, y) f (y) d (y) .
(3.6.3)
We emphasize that P(x), Q(x) and R(x, y) depend on t1 , . . . ,tn (through ), al-
though the notation does not show it. The main result of this section, of which
Theorem 3.1.2 is an easy corollary, is the following system of differential equa-
tions.

Theorem 3.6.1 With the above notation, put, for i, j = 1, . . . , n,

pi = P(ti ), qi = Q(ti ), Ri j = R(ti ,t j ) .


3.6 A NALYSIS OF THE SINE - KERNEL 123

Then, for i, j = 1, . . . , n with i = j, we have the following equations:

Ri j = (qi p j q j pi )/(ti t j ) ,
q j / ti = (si si1 ) R ji qi ,
p j / ti = (si si1 ) R ji pi ,
qi / ti = +pi + (sk sk1 ) Rik qk ,
k =i

pi / ti = qi + (sk sk1 ) Rik pk ,


k =i
Rii = pi qi / ti qi pi / ti ,
( / ti ) log = (si si1 ) Rii . (3.6.4)

The proof of Theorem 3.6.1 is completed in Subsection 3.6.2. In the rest of


this subsection, we derive a fundamental differentiation formula, see (3.6.10), and
derive several relations concerning the functions P, Q introduced in (3.6.3), and
the resolvent R.
  ti+1
In the sequel, we write Ii for ti . Recall from (3.4.8) that
   
n1 n1
1 ... 
 = si1 si Ii1

Ii
S
1 ... 
d 1 d  .
i1 =1 i =1

Therefore, by the fundamental theorem of calculus,



 (x, y)
ti
 n1 n1 n1 n1
= si1 si j1 si j+1 sik (si si1 )
j=1 i1 =1 i j1 =1 i j+1 =1 i =1
     
1 ... i1 ti i+1 ...  

Ii1

Ii j1 Ii j+1

Ii
S
1 ... i1 ti i+1 ...  d j
j=1
j =i
= (si si1 )H1 (ti ,ti ) , (3.6.5)

with H1 as in (3.4.13). Multiplying by (1) /! and summing, using the esti-
mate (3.4.9) and dominated convergence, we find that

= (si si1 )H(ti ,ti ) . (3.6.6)
ti
From (3.6.6) we get

log = (si si1 )R(ti ,ti ) . (3.6.7)
ti
124 3. S PACINGS FOR G AUSSIAN ENSEMBLES

We also need to be able to differentiate R(x, y). From the fundamental identity
(3.4.20), we have

R(z, y)
R(x, y) = (si si1 )R(x,ti )S(ti , y) + S(x, z) (dz) . (3.6.8)
ti ti
Substituting y = z in (3.6.8) and integrating against R(z , y) with respect to (dz )
gives
 
R(x, z )
R(z , y) (dz ) = (si si1 )R(x,ti ) S(ti , z )R(z , y) (dz )
ti
 
R(z, z )
+ S(x, z) R(z , y) (dz) (dz ) . (3.6.9)
ti
Summing (3.6.8) and (3.6.9) and using again the fundamental identity (3.4.20)
then yields

R(x, y) = (si1 si )R(x,ti )R(ti , y) . (3.6.10)
ti

The next lemma will play an important role in the proof of Theorem 3.6.1.

Lemma 3.6.2 The functions P, Q, R satisfy the following relations:


Q(x)P(y) Q(y)P(x)
R(x, y) = = R(y, x) , (3.6.11)
xy

R(x, x) = Q (x)P(x) Q(x)P (x) , (3.6.12)


Q(x) = (si1 si )R(x,ti )Q(ti ) , (3.6.13)
ti
and similarly

P(x) = (si1 si )R(x,ti )P(ti ) . (3.6.14)
ti

Proof We rewrite the fundamental identity (3.4.19) in the abbreviated form

RS = RS = S R. (3.6.15)

To abbreviate notation further, put

R(x, y) = (x y)R(x, y), S(x, y) = (x y)S(x, y) .

From (3.6.15) we deduce that

R  S + R  S = R S .
3.6 A NALYSIS OF THE SINE - KERNEL 125

Applying the operation ()  R on both sides, we get


R  (R S) + R  S  R = R  R S  R .
Adding the last two relations and making the obvious cancellations and rearrange-
ments, we get
R = (1 + R)  S  (1 + R).
Together with the trigonometric identity
sin(x y) = sin x cos y sin y cos x
as well as the symmetry
S(x, y) = S(y, x), R(x, y) = R(y, x) ,
this yields (3.6.11). An application of LHopitals rule then yields (3.6.12). Fi-
nally, by (3.6.10) and the definitions we obtain
  

Q(x) = (si1 si )R(x,ti ) f (ti ) + R(ti , y) f (y)d (y)
ti
= (si1 si )R(x,ti )Q(ti ) ,
yielding (3.6.13). Equation (3.6.14) is obtained similarly.

Exercise 3.6.3 An alternative to the elementary calculus used in deriving (3.6.5)


and (3.6.6), which is useful in obtaining higher order derivatives of the determi-
nants, resolvents and adjugants, is sketched in this exercise.
(i) Let D be a domain (connected open subset) in Cn . With X a measure space, let
f (x, ) be a measurable function on X D, depending analytically on for each
fixed x and satisfying the condition

sup | f (x, )|d (x) <
K

for all compact subsets K D. Prove that the function



F( ) = f (x, )d (x)

is analytic in D and that for each index i = 1, . . . , n and all compact K D,





sup f (x, ) d (x) < .
K i
Further, applying Cauchys Theorem to turn the derivative into an integral, and
then Fubinis Theorem, prove the identity of functions analytic in D:
  

F( ) = f (x, ) d (x) .
i i
126 3. S PACINGS FOR G AUSSIAN ENSEMBLES

(ii) Using the fact that the kernel S is an entire function, extend the definitions of
H , H and in the setup of this section to analytic functions in the parameters
t1 , . . . ,tn , s1 , . . . , sn1 .
(iii) View the signed measure as defining a family of distributions (in the
sense of Schwartz) on the interval (a, b) depending on the parameters t1 , . . . ,tn , by
the formula
n1  ti+1
 ,  = si ti
(x)dx ,
i=1

valid for any smooth function (x) on (a, b). Show that / ti is a distribution
satisfying

= (si1 si )ti (3.6.16)
ti
for i = 1, . . . , n, and that the distributional derivative (d/dx) of satisfies
d n n

= (si si1 )ti = . (3.6.17)
dx i=1 i=1 ti

(iv) Use (3.6.16) to justify (3.6.5) and step (i) to justify (3.6.6).

3.6.2 Derivation of the differential equations: proof of Theorem 3.6.1

To proceed farther we need means for differentiating Q(x) and P(x) both with
respect to x and with respect to the parameters t1 , . . . ,tn . To this end we introduce
the further abbreviated notation
   

S (x, y) = + S(x, y) = 0, R (x, y) = + R(x, y)
x y x y
and
 n
(F  G)(x, y) = F(x, z)G(z, y)d (z) := (si si1 )F(x,ti )G(ti , y) ,
i=1

which can be taken as the definition of .


Below we persist for a while in writing

S instead of just automatically putting S = 0 everywhere in order to keep the
structure of the calculations clear. From the fundamental identity (3.4.19),

RS = RS = S R,

we deduce, after integrating by parts, that

R  S + R  S + R  S = R S .
3.6 A NALYSIS OF THE SINE - KERNEL 127

Applying the operation R on both sides of the last equation we find that

R  (R S) + R  (R S) + R  S  R = R  R S  R .

Adding the last two equations and then making the obvious cancellations (includ-
ing now the cancellation S = 0) we find that

R = R  R .

Written out in longhand the last equation says that


 
n
+ R(x, y) = (si si1 )R(x,ti )R(ti , y) . (3.6.18)
x y i=1

Now we can differentiate Q(x) and P(x). We have from the last identity


Q (x) = f (x) + R(x, y) f (y)d (y)
x


= f (x) R(x, y) f (y)d (y)
y

  

+ R(x,t)R(t, y)d (t) f (y)d (y) .

Integrating by parts and then rearranging the terms, we get


 
Q (x) = f (x) + R(x, y) f (y)d (y) + R(x, y) f (y)d (y)
  
+ R(x,t)R(t, y) (t)dt f (y)d (y)

= f (x) +
R(x, y) f (y)d (y)
   
+ R(x,t) f (t) + R(t, y) f (y)d (y) d (t)
n
= P(x) + (sk sk1 )R(x,tk )Q(tk ) , (3.6.19)
k=1

and similarly
n
P (x) = Q(x) + (sk sk1 )R(x,tk )P(tk ) . (3.6.20)
k=1

Observing now that





Q(ti ) = Q (ti ) +
Q(x) ,
P(ti ) = P (ti ) + P(x) ,
ti ti x=ti t i t i x=ti
128 3. S PACINGS FOR G AUSSIAN ENSEMBLES

and adding (3.6.19) and (3.6.13), we have

n
Q(ti ) = P(ti ) + (sk sk1 )R(ti ,tk )Q(tk ) . (3.6.21)
ti k=1,k =i

Similarly, by adding (3.6.20) and (3.6.14) we have

n
P(ti ) = Q(ti ) + (sk sk1 )R(ti ,tk )P(tk ). (3.6.22)
ti k=1,k =i

It follows also via (3.6.12) and (3.6.13) that


R(ti ,ti ) = P(ti ) Q(ti ) Q(ti ) P(ti ) . (3.6.23)
ti ti
(Note that the terms involving Q(x)/ ti |x=ti cancel out to yield the above equal-
ity.) Unraveling the definitions, this completes the proof of (3.6.4) and hence of
Theorem 3.6.1.

3.6.3 Reduction to Painleve V

In what follows, we complete the proof of Theorem 3.1.2. We take in Theorem


3.6.1 the values n = 2, s1 = s. Our goal is to figure out the ordinary differential
equation we get by reducing still farther to the case t1 = t/2 and t2 = t/2. Recall
the sine-kernel S in (3.6.1), set = ddx = s1(t/2,t/2) and write = (S) for the
Fredholm determinant of S with respect to the measure . Finally, set = (t) =
t dtd log . We now prove the following.

Lemma 3.6.4 With notation as above,

(t )2 + 4(t )(t + ( )2 ) = 0 , (3.6.24)

and, for each fixed s, is analytic in t C, with the following expansions as t 0:


s s  s 2  s 3
= 1 t + O(t 4 ), = t t2 t 3 + O(t 4 ) . (3.6.25)

Proof We first consider the notation of Theorem 3.6.1 specialized to n = 2, writ-


ing (t1 ,t2 ) for the Fredholm determinant there. (Thus, = (t1 ,t2 )|t1 =t2 =t/2 .)
Recall that
R21 = (q2 p1 q1 p2 )/(t2 t1 ) = R12 .
3.6 A NALYSIS OF THE SINE - KERNEL 129

From Theorem 3.6.1 specialized to n = 2 we have

1 1
( / t2 / t1 ) log (t1 ,t2 ) = s(p21 + q21 + p22 + q22 ) + s2 (t2 t1 )R221 ,
2 2
1
( q1 / t2 q1 / t1 ) = p1 /2 + sR12 q2 ,
2
1
( p1 / t2 p1 / t1 ) = +q1 /2 + sR12 p2 . (3.6.26)
2

We now analyze symmetry. Temporarily, we write

p1 (t1 ,t2 ), q1 (t1 ,t2 ), p2 (t1 ,t2 ), q2 (t1 ,t2 ),

in order to emphasize the roles of the parameters t1 and t2 . To begin with, since

S(x + c, y + c) = S(x, y) ,

for any constant c we have

(t1 ,t2 ) = (t1 + c,t2 + c) = (t2 , t1 ) . (3.6.27)



Further, we have (recall that f (x) = (sin x)/ )


1 (1)n sn+1
p1 (t1 ,t2 ) = f (t1 ) +
(t1 ,t2 ) n=0 n!
 t2  t2  
t1 x1 . . . xn
S f (y) dx1 dxn dy
t1 t1 y x1 . . . xn

1 (1)n sn+1
= f (t1 ) +
(t2 , t1 ) n=0 n!
 t2  t2  
t1 x1 . . . xn
S f (y) dx1 dxn dy
t1 t1 y x1 . . . xn

1 (1)n sn+1
= f (t1 ) +
(t2 , t1 ) n=0 n!
 t1  t1  
t1 x1 . . . xn
S f (y) dx1 dxn dy
t2 t2 y x1 . . . xn
= p2 (t2 , t1 ) . (3.6.28)

Similarly, we have

q1 (t1 ,t2 ) = q2 (t2 , t1 ) . (3.6.29)


130 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Now we are ready to reduce to the one-dimensional situation. We specialize as


follows. Put

p = p(t) = p1 (t/2,t/2) = p2 (t/2,t/2) ,


q = q(t) = q1 (t/2,t/2) = q2 (t/2,t/2) ,
r = r(t) = R12 (t/2,t/2) = 2pq/t ,
d
= (t) = t log (t/2,t/2) . (3.6.30)
dt
Note that, by the symmetry relations, writing for differentiation with respect to
t, we have
1
p (t) = ( p1 / t2 p1 / t1 )|t2 =t1 =t/2 ,
2
1
q (t) = ( q1 / t2 q1 / t1 )|t2 =t1 =t/2 ,
2
while
t
(t) =
( / t2 / t1 ) log (t1 ,t2 )|t2 =t1 =t/2 .
2
From (3.6.26) and the above we get

= st(p2 + q2 ) + 4s2 q2 p2 ,
q = p/2 + 2spq2 /t , (3.6.31)

p = +q/2 2sp q/t , 2

while differentiating (twice) and using these relations gives

= s(p2 + q2 ) ,
t = 4s2 (p3 q q3 p) . (3.6.32)

Using (3.6.32) together with the equation for from (3.6.31) to eliminate the
variables p, q, we obtain finally

4t( )3 + 4t 2 ( )2 4 ( )2 + 4 2 + (t )2 8t = 0 , (3.6.33)

or equivalently, we get (3.6.24). Note that the differential equation is independent


of s.
Turning to the proof of the claimed analyticity of and of (3.6.25), we write
 t/2  t/2
(s)k k sin(xi x j ) k
= 1+ det dx j
k=1 k! t/2 t/2 i, j=1 (xi x j ) j=1
 1/2  1/2
n
(st)k k sin(txi tx j ) k
= 1 + lim
n
k! 1/2
det dx j .
1/2 i, j=1 (txi tx j ) j=1
k=1
3.6 A NALYSIS OF THE SINE - KERNEL 131

Each of the terms inside the limit in the last display is an entire function in t, and
the convergence (in n) is uniform due to the boundedness of the kernel and the
Hadamard inequality, see Lemma 3.4.2. The claimed analyticity of in t follows.
We next explicitly compute a few terms of the expansion of in powers of t.
Indeed,
 t/2  t/2  t/2 k sin(xi x j ) k
t/2
dx = t,
t/2
det dx j = O(t 4 ) for k 2 ,
t/2 i, j=1 (xi x j ) j=1

and hence the part of (3.6.25) dealing with follows. With more computational
effort, which we omit, one verifies the other part of (3.6.25).

Proof of Theorem 3.1.2 We use Lemma 3.6.4. Take s = 1 and set


 t 
(u)
F(t) = 1 = 1 exp du for t 0 .
0 u
Then by (3.1.1) we have

1 F(t) = lim P[ N 1N , . . . , N NN (t/2,t/2)] ,
N

completing the proof of the theorem.


Remark 3.6.5 We emphasize that we have not yet proved that the function F()
in Theorem 3.1.2 is a distribution function, that is, we have not shown tightness
for the sequence of gaps around 0. From the expansion at 0 of (t), see (3.1.2),
it follows immediately that limt0 F(t) = 0. To show that F(t) 1 as t
requires more work. One approach, that uses careful and nontrivial analysis of the
resolvent equation, see [Wid94] for the first rigorous proof, shows that in fact

(t) t 2 /4 as t + , (3.6.34)

implying that limt F(t) = 1. An easier approach, which does not however yield
such precise information, proceeds from the CLT for determinantal processes de-
veloped in Section 4.2; indeed, it is straightforward to verify, see Exercise 4.2.40,
that for the determinantal process determined by the sine-kernel, the expected
number of points in an interval of length L around 0 increases linearly in L, while
the variance increases only logarithmically in N. This is enough to show that with
A = [t/2,t/2], the right side of (3.1.1) decreases to 0 as t , which implies
that limt F(t) = 1. In particular, it follows that the random variable giving the
width
of the largest open interval centered at the origin in which no eigenvalue of
NXN appears is weakly convergent as N to a random variable with distri-
bution F.
132 3. S PACINGS FOR G AUSSIAN ENSEMBLES

We finally present an alternative formulation of Theorem 3.1.2 that is useful


in comparing with the limit results for the GOE and GSE. Recall the function
r = r(t) = R12 (t/2,t/2), see (3.6.30).

Lemma 3.6.6 With F() as in Theorem 3.1.2, we have


  t 
t
1 F(t) = exp (t x)r(x) dx , 2
(3.6.35)
0

and furthermore the differential equation

t 2 ((tr) + (tr))2 = 4(tr)2 ((tr)2 + ((tr) )2 ) (3.6.36)

is satisfied with boundary conditions


1 t
r(t) = + 2 + Ot0 (t 2 ) . (3.6.37)

The function r(t) has a convergent expansion in powers of t valid for small t.

Proof Recall p and q from (3.6.30). We have


4p2 q2
= p2 + q2 , tr = 2pq, p = q/2 2p2 q/t, q = p/2 + 2pq2 /t ,
t t
hence (3.6.36) holds and furthermore
d  
= r2 , (3.6.38)
dt t
as one verifies by straightforward calculations. From the analyticity of it follows
that it is possible to extend both r(t) and (t) to analytic functions defined in a
neighborhood of [0, ) in the complex plane, and thus in particular both functions
have convergent expansions in powers of t valid for small t. It is clear that
1
lim r(t) = . (3.6.39)
t0
Thus (3.6.35) and (3.6.37) follow from (3.6.33), (3.6.38), (3.6.39) and (3.6.25).

3.7 Edge-scaling: proof of Theorem 3.1.4

Our goal in this section is to study the spacing of eigenvalues at the edge of the
spectrum. The main result is the proof of Theorem 3.1.4, which is completed in
Subsection 3.7.1 (some technical estimates involving the steepest descent method
are postponed to Subsection 3.7.2). For the proof of Theorem 3.1.4, we need the
3.7 E DGE - SCALING 133

following a priori estimate on the Airy kernel. Its proof is postponed to Subsection
3.7.3, where additional properties of the Airy function are studied.

Lemma 3.7.1 For any x0 R,

sup ex+y |A(x, y)| < . (3.7.1)


x,yx0

3.7.1 Vague convergence of the largest eigenvalue: proof of Theorem 3.1.4


(2)
Again we let XN HN be a random Hermitian matrix from the GUE with eigen-
values 1N NN . We now present the
Proof of Theorem 3.1.4 As before put
n (x)n1 (y) n1 (x)n (y)
K (n) (x, y) = n ,
xy
where the n (x) is the normalized oscillator wave-function. Define
1 (n)  x y 
A(n) (x, y) = K 2 n + , 2 n + . (3.7.2)
n1/6 n1/6 n1/6
In view of the basic estimate (3.4.9) in the theory of Fredholm determinants and
the crude bound (3.7.1) for the Airy kernel we can by dominated convergence
integrate to the limit on the right side of (3.1.5). By the bound (3.3.7) of Ledoux
type, if the limit
"  N  #

lim lim P N 2/3
i 2 (t,t ) for i = 1, . . . , N (3.7.3)
t + N N
exists then the limit (3.1.6) also exists and both limits are equal. Therefore we
can take the limit as t on the left side of (3.1.5) inside the limit as n in
order to conclude (3.1.6). We thus concentrate in the sequel on proving (3.1.5) for
t < .
We begin by extending by analyticity the definition of K (n) and A(n) to the
complex plane C. Our goal will be to prove the convergence of A(n) to A on
compact sets of C, which will imply also the convergence of derivatives. Recall
that by part 4 of Lemma 3.2.7,
n (x)n (y) n (y)n (x) 1
K (n) (x, y) = n (x)n (y) ,
xy 2
so that if we set
x
n (x) := n1/12 n (2 n + 1/6 ) ,
n
134 3. S PACINGS FOR G AUSSIAN ENSEMBLES

then
n (x) n (y) n (y) n (x) 1
A(n) (x, y) = 1/3 n (x)n (y) .
xy 2n
The following lemma plays the role of Lemma 3.5.1 in the study of the spacing in
the bulk. Its proof is rather technical and takes up most of Subsection 3.7.2.

Lemma 3.7.2 Fix a number C > 1. Then,


lim sup |n (u) Ai(u)| = 0 . (3.7.4)
n uC:|u|<C

Since the functions n are entire, the convergence in Lemma 3.7.2 entails the
uniform convergence of n to Ai on compact subsets of C. Together with Lemma
3.4.5, this completes the proof of the theorem.

Remark 3.7.3 An analysis similar to, but more elaborate than, the proof of Theo-
rem 3.1.4 shows that
"   #
N
lim P N 2/3
2 t
N
N N
exists for each positive integer  and real number t. In other words, the suitably
rescaled th largest eigenvalue converges vaguely and in fact weakly. Similar
statements can be made concerning the joint distribution of the rescaled top 
eigenvalues.

3.7.2 Steepest descent: proof of Lemma 3.7.2

In this subsection, we use the steepest descent method to prove Lemma 3.7.2.
The steepest descent method is a general, more elaborate version of the method
of Laplace discussed in Subsection 3.5.1, which is inadequate when oscillatory
integrands are involved. Indeed, consider the evaluation of integrals of the form

f (x)s g(x)dx ,

see (3.5.3), in the situation where f and g are analytic functions and the integral
is a contour integral. The oscillatory nature of f prevents the use of Laplaces
method. Instead, the oscillatory integral is tamed by modifying the contour of

integration in such a way that f can be written along the contour as e f with f real,
and the oscillations of g at a neighborhood of the critical points of f are slow. In
practice, one needs to consider slightly more general versions of this example, in
which g itself may depend (weakly) on s.
3.7 E DGE - SCALING 135

Proof of Lemma 3.7.2 Throughout, we let


u  u 
x = 2n1/2 + 1/6 = 2n1/2 1 + 2/3 , n (u) = n1/12 n (x) .
n 2n
We assume throughout the proof that n is large enough so that |u| < C < n2/3 .
Let be a complex variable. By reinterpreting formula (3.5.7) above as a con-
tour integral we get the formula
2  i
ex /4
n e
2 /2 x
n (x) = d . (3.7.5)
i(2 )3/4 n! i

The main effort in the proof is to modify the contour integral in the formula above
in such a way that the leading asymptotic order of all terms in the integrand match,
and then keep track of the behavior of the integrand near its critical point. To carry
out this program, note that, by Cauchys Theorem, we may replace the contour of
integration in (3.7.5) by any straight line in the complex plane with slope of ab-
solute value greater than 1 oriented so that height above the real axis is increasing
(the condition on the slope is to ensure that no contribution appears from the con-
tour near ). Since (x) > 0 under our assumptions concerning u and n, we may
take the contour of integration in (3.7.5) to be the perpendicular bisector of the
line segment joining x to the origin, that is, replace by (x/2)(1 + ), to obtain
 i
ex /8 (x/2)n+1
2
2 ( 2 /2 )
n (x) = (1 + )n e(x/2) d . (3.7.6)
i(2 )3/4 n! i

Let log be the principal branch of the logarithm, that is, the branch real on the
interval (0, ) and analytic in the complement of the interval (, 0], and set
F( ) = log(1 + ) + 2 /2 . (3.7.7)
Note that the leading term in the integrand in (3.7.6) has the form enF( ) , where
(F) has a maximum along the contour of integration at = 0, and a Taylor
expansion starting with 3 /3 in a neighborhood of that point (this explains the
particular scaling we took for u). Put
 x 2/3
= , u = 2 n/ ,
2
where to define fractional powers of complex numbers such as that figuring in
the definition of we follow the rule that a = exp(a log ) whenever is in the
domain of our chosen branch of the logarithm. We remark that as n we have
u u and n1/3 , uniformly for |u| < C. Now rearrange (3.7.6) to the form

(2 )1/4 n1/12 (x/2)n+1/3 ex


2 /8

n (u) = In (u) , (3.7.8)


n!
136 3. S PACINGS FOR G AUSSIAN ENSEMBLES

where
 i
1 3 F( )u log(1+ )
In (u) = e d . (3.7.9)
2 i i

To prove (3.7.4) it is enough to prove that

lim sup |In (u) Ai(u)| = 0 , (3.7.10)


n |u|<C

because we have
  
n1/12 (x/2)n+1/3 ex u  n1/3 u
2 /8
1 u2
log = n+ log 1 + 2/3 1/3
en/2 nn/2+1/4 3 2n 2 8n
and hence

(2 )1/4 n1/12 (x/2)n+1/3 ex2 /8

lim sup 1 = 0 ,
n |u|<C n!

by Stirlings approximation (2.5.12) and some calculus.


To prove (3.7.10), we proceed by a saddle point analysis near the critical point
= 0 of (F)( ). The goal is to replace complex integration with real integra-
tion. This is achieved by making a change of contour of integration so that F is
real along that contour. Ideally, we seek a contour so that the maximum of F is
achieved at a unique point along the contour. We proceed to find such a contour
now, noting that since the maximum of (F)( ) along the imaginary axis is 0 and
is achieved at = 0, we may seek contours that pass through 0 and such that F is
strictly negative at all other points of the contour.
Turning to the actual construction, consider the wedge-shaped closed set

S = {rei |r [0, ), [ /3, /2]}

in the complex plane with corner at the origin. For each > 0 let S be the in-
tersection of S with the closed disc of radius centered at the origin and let S be
the boundary of S . For each t > 0 and all sufficiently large , the curve F( S )
winds exactly once about the point t. Since, by the argument principle of com-
plex analysis, the winding number equals the difference between the number of
zeros and the number of poles of the function F() + t in the domain S , and the
function F() + t does not possess poles there, it follows that there exists a unique
solution (t) S of the equation F( ) = t (see Figure 3.7.1). Clearly (0) = 0
is the unique solution of the equation F( ) = 0 in S. We have the following.

Lemma 3.7.4 The function : [0, ) S has the following properties.


(i) limt | (t)| = .
3.7 E DGE - SCALING 137
5

4 3 2 1 0 1 2

Fig. 3.7.1. The contour S3 (solid), its image F( S3 ) (dashed), and the curve () (dash
and dots).

(ii) (t) is continuous for t 0 and real analytic for t > 0.


(iii)
(t) = O(t 1/2 ) as t ,
(t) = O(t 1/2 ) as t ,
(t) = e i/3 31/3t 1/3 + O(t 4/3 ) as t 0 ,
(t) = e i/3 32/3t 2/3 + O(t 1/3 ) as t 0 .

Proof (i) follows by noting that F restricted to S is proper, that is for any sequence
zn S with |zn | as n , it holds that |F(zn )| . The real analyticity
claim in (ii) follows from the implicit function theorem. (iii) follows from a direct
computation, and together with (0) = 0 implies the continuity claim in (ii).

From Lemma 3.7.4 we obtain the formula
  
1
e t (1 + (t)) u (t) (1 + (t)) u (t) dt ,
3
In (u) =
2 i 0
by deforming the contour i i in (3.7.9) to . After replacing t by t 3 /3n
in the integral above we obtain the formula

1
In (u) = (An (t, u) Bn (t, u))dt , (3.7.11)
2 i 0

where
   3  u  3  2
3t 3 t t t
An (t, u) = exp 1+ ,
3n 3n 3n n
   3  u  3  2
3t 3 t t t
Bn (t, u) = exp 1 + .
3n 3n 3n n
138 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Put
 
t3 i/3
A(t, u) = exp e tu + i/3 ,
 33 
t i/3
B(t, u) = exp e tu i/3 .
3
By modifying the contour of integration in the definition of the Airy function
Ai(x), see (3.7.16), we have

1
Ai(u) = (A(t, u) B(t, u))dt . (3.7.12)
2 i 0
A calculus exercise reveals that, for any positive constant c and each t0 0,

An (t, u)
lim sup sup 1 = 0 (3.7.13)
n 0tt |u|<c A(t, u)
0

and clearly the analogous limit formula linking Bn (t, u) to B(t, u) holds also. There
exist positive constants c1 and c2 such that
| log(1 + (t))| c1t 1/3 , | (t)| c2 max(t 2/3 ,t 1/2 )
for all t > 0. There exists a positive constant n0 such that
( 3 ) n/2, | | 2n1/3 , |u | < 2c
for all n n0 and |u| < c. Also there exists a positive constant c3 such that
1/3
ec3 t t 1/6
for t 1. Consequently there exist positive constants c4 and c5 such that

| e t (1 + (t)) u (t)| c4 n1/3 ent/2+c5 n t 2/3 ,
3 1/3 t

hence
|An (t, u)| c4 et
3 /6+c
5t (3.7.14)
for all n n0 , t > 0 and |u| < c. Clearly we have the same majorization for
|Bn (t, u)|. Integral formula (3.7.12), uniformity of convergence (3.7.13) and ma-
jorization (3.7.14) together are enough to finish the proof of limit formula (3.7.10)
and hence of limit formula (3.7.4).

Exercise 3.7.5 Set


(n) 1
Szn (x, y) = K (n) (zn + x/ n, zn + y/ n) .
n

Apply the steepest descent method to show that if zn / n n c with |c| < 2,
(n)
then Szn (x, y) converges to the rescaled sine-kernel sin[g(c)(x y)]/( (x y)),
3.7 E DGE - SCALING 139

uniformly in x, y in compacts, where g(c) = (c) = 4 c2 /2 and () is the
semicircle density, see (2.1.3).
Hint: use (3.7.6) and note the different behavior of the function F at 0 when c < 2.

3.7.3 Properties of the Airy functions and proof of Lemma 3.7.1

Throughout this subsection, we will consider various contours in the complex


plane. We introduce the following convenient notation: for complex numbers a, b,
we let [a, b] denote the contour joining a to b along the segment connecting them,
i.e. the contour (t  (1 t)a + tb) : [0, 1] C. We also write [a, c) for the ray
emanating from a in the direction c, that is the contour (t  a + ct) : [0, ) C,
and write (c, a] = [a, c). With this notation, and performing the change of
variables  w, we can rewrite (3.1.3) as

1 3 /3
Ai(x) = exww dw . (3.7.15)
2 i (e2 i/3 ,0]+[0,e2 i/3 )

Note that the rapid decay of the integrand in (3.7.15) along the indicated con-
tour ensures that Ai(x) is well defined and depends holomorphically on x. By
parametrizing the contour appearing in (3.7.15) in evident fashion, we also obtain
the formula
  3     
1 t i i 3i i
Ai(x) = exp exp xte + 3 exp xte dt .
2 i 0 3 3 3
(3.7.16)
In the statement of the next lemma, we use the notation x to mean that x goes
to along the real axis. Recall also the definition of Eulers Gamma function, see

(2.5.5): (s) = 0 ex xs1 dx, for s with positive real part.

Lemma 3.7.6 (a) For any integer 0, the derivative Ai( ) (x) satisfies

Ai( ) (x) 0 , as x . (3.7.17)

(b) The function Ai(x) is a solution of (3.1.4) that satisfies


1 1
Ai(0) = , Ai (0) = . (3.7.18)
32/3 (2/3) 31/3 (1/3)
(c) Ai(x) > 0 and Ai (x) < 0 for all x > 0 .

Proof For x 0 real, c C satisfying c3 = 1 and k 0 integer, define


 
3 /3 3 /3
I(x, c, k) = wk ewxw dw = ck+1 t k exctt dt . (3.7.19)
[0,c) 0
140 3. S PACINGS FOR G AUSSIAN ENSEMBLES

As x we have I(x, e2 i/3 , k) 0 by dominated convergence. This proves


(3.7.17). Next, (3.7.18) follows from (3.7.19) and the definition of (). We next
prove that Ai(x) > 0 for x > 0. Assume otherwise that for some x0 > 0 one has
Ai(x0 ) 0. By (3.7.29), if Ai(x0 ) = 0 then Ai (x0 ) = 0. Thus, for some x1 > 0,
Ai(x1 ) < 0. Since Ai(0) = 0 and Ai(x) 0 as x , Ai() possesses a global
minimum at some x2 (0, ), and Ai (x2 ) 0, contradicting the Airy differential
equation.

We next evaluate the asymptotics of the Airy functions at infinity. For two
functions f , g, we write f g as x if limx f (x)/g(x) = 1.

Lemma 3.7.7 For x we have the following asymptotic formulas:


2 3/2
Ai(x) 1/2 x1/4 e 3 x /2 . (3.7.20)

2 3/2
Ai (x) 1/2 x1/4 e 3 x /2 . (3.7.21)

Proof Making the substitution w  x1/2 (u 1) and deforming the contour of


integration in (3.7.15), we obtain

2/3 /3 3/2 (u2 u3 /3)
2 ix1/4 e2x Ai(x) = x3/4 ex du , (3.7.22)
C

where

C = (e2 i/3 , i 3] + [i 3, i 3] + [i 3, e2 i/3 ) =: C1 +C2 +C3 .

Since the infimum of the real part of u2 u3 /3 on the rays C1 and C3 is strictly
negative, the contribution of the integral over C1 and C3 to the right side of (3.7.22)
vanishes as x . The remaining integral (over C2 ) gives
 3x3/4 
3 3/4
et +it x /3 dt et dt = i
2 2
i i as x ,
3x 3/4

by dominated convergence. This completes the proof of (3.7.20). A similar proof


gives (3.7.21). Further details are omitted.

Proof of Lemma 3.7.1 Fix x0 R. By (3.7.20), (3.7.21) and the Airy differential
equation (3.1.4), there exists a positive constant C (possibly depending on x0 ) such
that
max(| Ai(x)|, | Ai (x)|, | Ai (x)|) Cex

for all real x x0 and hence for x, y x0 ,

|x y| 1 |A(x, y)| 2C2 exy .


3.7 E DGE - SCALING 141

But by the variant (3.5.5) of Taylors Theorem noted above we also have, for
x, y x0 ,
|x y| < 1 |A(x, y)| 2C2 e2 exy .

Thus the lemma is proved.



Exercise 3.7.8 Show that 0 Ai(x)dx = 1/3.
Hint: for > 0, let denote the path (t  e2 it ) : [5/6, 7/6] C, and define
the contour C = (e2 i/3 , e2 i/3 ] + + [ e2 i/3 , e2 i/3 ). Show that
 
1
w1 ew
3 /3
Ai(x)dx = dw ,
0 2 i C

and take 0 to conclude.

Exercise 3.7.9 Write x if x along the real axis. Prove the asymptotics

sin( 23 |x|3/2 + 4 )
Ai(x) as x (3.7.23)
|x|1/4
and
cos( 23 |x|3/2 + 4 )|x|1/4
Ai (x) as x . (3.7.24)

Conclude that Lemma 3.7.1 can be strengthened to the statement

sup ex+y |A(x, y)| < . (3.7.25)


x,yR

Exercise 3.7.10 The proof of Lemma 3.7.7 as well as the asymptotics in Exercise
3.7.17 are based on finding an appropriate explicit contour of integration. An al-
ternative to this approach utilizes the steepest descent method. Provide the details
of the proof of (3.7.20), using the following steps.
(a) Replacing by x1/2 in (3.1.3), deduce the integral representation, for x > 0,

x1/2 3/2 H( )
Ai(x) = ex d , H( ) = 3 /3 . (3.7.26)
2 i C

(b) Modify the contour C to another (implicitly defined) contour C , so that


(H(C )) is constant, and the deformed contour C snags the critical point = 1
of H, so that the image H(C ) runs on the real axis from to 2/3 and back.
Hint: Consider the closed sets

S = {1 + rei |r 0, [ /3, /2]}


142 3. S PACINGS FOR G AUSSIAN ENSEMBLES

and the intersection of S with the closed disc of radius about 1, and apply a
reasoning similar to the proof of Lemma 3.7.2 to find a curve (t) such that

e2x
3/2 /3
x1/2
ex ( (t) (t))dt for x > 0 .
3/2 t
Ai(x) = (3.7.27)
2 i 0

Identify the asymptotics of (t) and its derivative as t 0 and t .


(c) Apply Laplaces method, Lemma D.9, to obtain (3.7.20).

Exercise 3.7.11 Another solution of (3.1.4), denoted Bi(x), is obtained by replac-


ing the contour in (3.7.15) with the contour (e2 i/3 , 0] + [0, ) + (e2 i/3 , 0] +
[0, ), that is

1 3 /3
Bi(x) = exww dw . (3.7.28)
2 (e2 i/3 ,0]+2[0,)+(e2 i/3 ,0]

Show
that Bi(x) satisfies (3.1.4) with the boundary conditions [Bi(0) Bi (0)] =
31/6
1
31/6 (2/3) (1/3)
. Show that for any x R,
" #
Ai(x) Ai (x) 1
det = , (3.7.29)
Bi(x) Bi (x)
concluding that Ai and Bi are linearly independent solutions. Show also that
Bi(x) > 0 and Bi (x) > 0 for all x > 0. Finally, repeat the analysis in Lemma
3.7.7, using the substitution w  x1/2 (u + 1) and the (undeformed!) contour

C = (e2 i/3 , 1] + [1, 1] + [1, ) + e2 i/3 , 1] + [1, 1] + [1, ) ,

and conclude that


2 3/2
Bi(x) 1/2 x1/4 e 3 x , (3.7.30)
2 3/2
Bi (x) 1/2 x1/4 e 3 x . (3.7.31)

3.8 Analysis of the TracyWidom distribution and proof of Theorem 3.1.5

We will study the Fredholm determinant


 
(1)k   x1 ... xk
= (t) := 1 + t t A kj=1 dx j
k=1 k!
x1 ... xk

where A(x, y) is the Airy kernel and as before we write


 
x1 . . . xk k
A = det A(xi , y j ) .
y1 . . . yk i, j=1
3.8 A NALYSIS OF THE T RACYW IDOM DISTRIBUTION 143

We are going to explain why (t) is a distribution function, which, together


 with

(n)
n
Theorem 3.1.4, will complete our proof of weak convergence of n2/3 n
2 .
Further, we are going to link (t) to the Painleve II differential equation.
We begin by putting the study of the TracyWidom distribution (t) into a
framework compatible with the general theory of Fredholm determinants devel-
oped in Section 3.4. Let denote the measure on the real line with density
d /dx = 1(t,) (x) with respect to the Lebesgue measure (although depends on
t, we suppress this dependence from the notation). We have then
    k
(1)k x1 . . . xk
= 1+
x1 . . . xk
A d (x j ) .
k=1 k! j=1

Put
    k
(1)k x x1 ... xk
H(x, y) = A(x, y) + A d (x j ) .
k=1 k!
y x1 ... xk j=1

In view of the basic estimate (3.4.9) and the crude bound (3.7.1) for the Airy
kernel, we must have (t) 1 as t . Similarly, we have
sup sup ex+y |H(x, y)| < (3.8.1)
tt0 x,yR

for each real t0 and


lim sup ex+y |H(x, y) A(x, y)| = 0 . (3.8.2)
t x,yR

Note that because can be extended to a not-identically-vanishing entire analytic


function of t, it follows that vanishes only for isolated real values of t. Put
R(x, y) = H(x, y)/ ,
provided of course that = 0; a similar proviso applies to each of the following
definitions since each involves R(x, y). Put

Q(x) = Ai(x) + R(x, y) Ai(y)d (y) ,

P(x) = Ai (x) + R(x, y) Ai (y)d (y) ,

q = Q(t), p = P(t) , u = Q(x) Ai(x)d (x) ,
 
v = Q(x) Ai (x)d (x) = P(x) Ai(x)d (x) , (3.8.3)

the last equality by symmetry R(x, y) = R(y, x). Convergence of all these integrals
is easy to check. Note that each of the quantities q, p, u and v tends to 0 as t .
144 3. S PACINGS FOR G AUSSIAN ENSEMBLES

More precise information is also available. For example, from (3.8.1) and (3.8.2)
it follows that
q(x)/ Ai(x) x 1 , (3.8.4)

because for x large, (3.7.20) implies that for some constant C independent of x,
 
R(x, y) Ai(y)dy C exy Ai(y)dy C Ai(x)e2x .
x x

3.8.1 The first standard moves of the game

We follow the trail blazed in the discussion of the sine-kernel in Section 3.6. The
first few steps we can get through quickly by analogy. We have

log = R(t,t) , (3.8.5)
t

R(x, y) = R(x,t)R(t, y) . (3.8.6)
t
As before we have a relation
Q(x)P(y) Q(y)P(x)
R(x, y) = = R(y, x) (3.8.7)
xy
and hence by LHopitals rule we have

R(x, x) = Q (x)P(x) Q(x)P (x) . (3.8.8)

We have the differentiation formulas



Q(x) = R(x,t)Q(t) = Q(t)R(t, x) , (3.8.9)
t

P(x) = R(x,t)P(t) = P(t)R(t, x) . (3.8.10)
t
Here the Airy function and its derivative are playing the roles previously played
by sine and cosine, but otherwise to this point our calculation is running just as
before. Actually the calculation to this point is simpler since we are focusing on a
single interval of integration rather than on several.

3.8.2 The wrinkle in the carpet

As before we introduce the abbreviated notation


   

A (x, y) = + A(x, y), R (x, y) = + R(x, y) ,
x y x y
3.8 A NALYSIS OF THE T RACYW IDOM DISTRIBUTION 145

(F  G)(x, y) = F(x, z)G(z, y)d (z) = F(x,t)G(t, y) .

Heres the wrinkle in the carpet that changes the game in a critical way: A does
not vanish identically. Instead we have

A (x, y) = Ai(x) Ai(y) , (3.8.11)

which is an immediate consequence of the Airy differential equation y xy = 0.


Calculating as before but this time not putting A to zero we find that

R = R  R + A + R  A + A  R + R  A  R.

Written out in longhand the last equation says that


 

+ R(x, y) = R(x,t)R(t, y) Q(x)Q(y) . (3.8.12)
x y

The wrinkle propagates to produce the extra term on the right. We now have
  

Q (x) = Ai(x) + R(x, y) Ai(y)d (y)
x
  

= Ai (x) R(x, y) Ai(y)d (y)
y

+R(x,t) R(t, y) Ai(y)d (y) Q(x)u
 
= Ai(x) + R(x, y) Ai (y)d (y) + R(x, y) Ai(y)d (y)

+R(x,t) R(t, y) Ai(y)d (y) Q(x)u

= Ai(x) + R(x, y) Ai (y)d (y)

+R(x,t)(Ai(t) + R(t, y) Ai(y)d (y)) Q(x)u
= P(x) + R(x,t)Q(t) Q(x)u . (3.8.13)

Similar manipulations yield

P (x) = xQ(x) + R(x,t)P(t) + P(x)u 2Q(x)v . (3.8.14)

This is more or less in analogy with the sine-kernel case. But the wrinkle continues
to propagate, producing the extra terms involving the quantities u and v.
146 3. S PACINGS FOR G AUSSIAN ENSEMBLES

3.8.3 Linkage to Painleve II

The derivatives of the quantities p, q, u and v with respect to t we denote simply


by a prime. We calculate these derivatives as follows. Observe that



q =
Q(x) + Q (t),
p = P(x) + P (t) .
t x=t t x=t
By adding (3.8.9) to (3.8.13) and (3.8.10) to (3.8.14) we have
q = p qu, p = tq + pu 2qv . (3.8.15)
It follows also via (3.8.8) that

log (t) = R(t,t) = q p p q = p2 tq2 2pqu + 2q2 v . (3.8.16)
t
We have
     

u = Q(x) Ai(x)d (x) + Q(x) Ai(x)d (x)
t t

= Q(t) R(t, x) Ai(x)d (x) Q(t) Ai(t) = q2 .

     

v = Q(x) Ai (x)d (x) + Q(x) Ai (x)d (x)
t t

= Q(t) R(t, x) Ai (x)d (x) Q(t) Ai (t) = pq .

We have a first integral


u2 2v = q2 ;
at least it is clear that the t-derivative here vanishes, but then the constant of inte-
gration has to be 0 because all the functions here tend to 0 as t . Finally,
q = (p qu) = p q u qu = tq + pu 2qv (p qu)u q(q2 )
= tq + pu 2qv pu + qu2 + q3 = tq + 2q3 , (3.8.17)
which is Painleve II; that q(t) Ai(t) as t was already proved in (3.8.4).
It remains to prove that the function F2 defined in (3.1.6) is a distribution func-
tion. By adding equations (3.8.12) and (3.8.6) we get
 

+ + R(x, y) = Q(x)Q(y) . (3.8.18)
x y t
By evaluating both sides at x = t = y and also using (3.8.5) we get
2
log = q2 . (3.8.19)
t2
3.8 A NALYSIS OF THE T RACYW IDOM DISTRIBUTION 147

Let us now write q(t) and (t) to emphasize the t-dependence. In view of the
rapid decay of (t) 1, (log (t)) and q(t) as t we must have
  
(t) = exp (x t)q(x)2 dx , (3.8.20)
t

whence the conclusion that F2 (t) = (t) satisfies F2 () = 1 and, because of the
factor (x t) in (3.8.20) and the fact that q() does not identically vanish, also
F2 () = 0. In other words, F2 is a distribution function. Together with (3.8.17)
and Theorem 3.1.4, this completes the proof of Theorem 3.1.5.

Remark 3.8.1 The Painleve II equation q = tq+2q3 has been studied extensively.
The following facts, taken from [HaM80], are particularly relevant: any solution
of Painleve II that satisfies q(t) t 0 satisfies also that as t , q(t) Ai(t)
for some R, and for each fixed , such a solution exists and is unique. For
= 1, which is the case of interest to us, see (3.1.8), one then gets

q(t) t/2 , t . (3.8.21)

We defer additional comments to the bibliographical notes.

Remark 3.8.2 The analysis in this section would have proceeded verbatim if the
Airy kernel A(x, y) were replaced by sA(x, y) for any s (0, 1), the only difference
being that the boundary condition for (3.1.8) would be replaced by q(t) s Ai(t)
as t . On the other hand, by Corollary 4.2.23 below, the kernel sA(n) (x, y)
replaces A(n) (x, y) if one erases each eigenvalue of the GUE with probability s. In
particular, one concludes that for any k fixed,

lim lim sup P(N 1/6 (Nk
N
2 N) t) = 0 . (3.8.22)
t N

This observation will be useful in the proof of Theorem 3.1.7.

Exercise 3.8.3 Using (3.7.20), (3.8.4) and (3.8.21), deduce from the representation
(3.1.7) of F2 that
1 4
lim log[1 F2 (t)] = ,
t t 3/2 3
1 1
lim log F2 (t) = ,
t t 3 12
Note the different decay rate of the upper and lower tails of the distribution of the
(rescaled) largest eigenvalue.
148 3. S PACINGS FOR G AUSSIAN ENSEMBLES

3.9 Limiting behavior of the GOE and the GSE

We prove Theorems 3.1.6 and 3.1.7 in this section, using the tools developed in
Sections 3.4, 3.6 and 3.7, along with some new tools, namely, Pfaffians and matrix
kernels. The multiplicativity of Fredholm determinants, see Theorem 3.4.10, also
plays a key role.

3.9.1 Pfaffians and gap probabilities

We begin our analysis of the limiting behavior of the GOE and GSE by proving a
series of integration identities involving Pfaffians; the latter are needed to handle
the novel algebraic situations created by the factors |(x)| with {1, 4} ap-
pearing in the joint distribution of eigenvalues in the GOE and GSE, respectively.
Then, with Remark 3.4.4 in mind, we use the Pfaffian integration identities to
obtain determinant formulas for squared gap probabilities in the GOE and GSE.

Pfaffian integration formulas

Recall that Matk (C) denotes the space of k-by- matrices with complex entries,
with Matn (C) = Matnn (C) and In Matn (C) denoting the identity matrix. Let

0 1
1 0

..
Jn = . Mat2n (C)

0 1
1 0
" #
0 1
be the block-diagonal matrix consisting of n copies of strung along
1 0
the diagonal. Given a family of matrices

{X(i, j) Matk (C) : i = 1, . . . , m and j = 1, . . . , n} ,

let

X(1, 1) ... X(1, n)
.. ..
X(i, j)|m,n = . . Matkmn (C) .
X(m, 1) . . . X(m, n)
" #
0 1
For example, Jn = i, j |n,n Mat2n (C).
1 0
Next, recall a basic definition.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 149

Definition 3.9.1 (Pfaffians) Let X Mat2n (C) be antisymmetric, that is, X T =


X, X j,i = Xi, j . The Pfaffian of X is defined by the formula
n
1
Pf X =
2n n! S2n
(1)
X (2i1), (2i) ,
i=1

where (1) denotes the sign of the permutation .

1
For example, Pf Jn = 1, which explains the normalization 2n n! .
We collect without proof some standard facts related to Pfaffians.

Theorem 3.9.2 Let X Mat2n (C) be antisymmetric. The following hold:


(i) Pf(Y T XY ) = (Pf X) (detY ) for every Y Mat2n (C);
(ii) (Pf X)2 = det X;
(iii) Pf X = 2n1 i+1 X {i,2n} , where X {i,2n} is the submatrix obtained
i=1 (1) i,2n Pf X
by striking out the ith row, ith column, (2n)th row and (2n)th column.

We next give a general integration identity involving Pfaffians, which is the


analog for {1, 4} of Lemma 3.2.3.

Proposition 3.9.3 Let f1 , . . . , f2n and g1 , . . . , g2n be C-valued measurable func-


tions on the real line. Assume that all products fi g j are integrable. For x R,
put
F(x) = [ fi (x) gi (x)]|2n,1 Mat2n2 (C).

Then, for all measurable sets A R,


   n
1
Pf F(x)J1 F(x)T dx = det[F(x j )]|1,n dxi . (3.9.1)
A n! A A i=1

Here and throughout the discussion of Pfaffian integration identities, measurable


means Lebesgue measurable.
Proof Expand the right side of (3.9.1) as
  n " # n
1 f (2i1) (xi ) g (2i1) (xi )
2n n!
(1)
det dxi . (3.9.2)
S A A i=1 f (2i) (xi ) g (2i) (xi ) i=1
2n

The (i, j)"entry of the matrix


# appearing on the left side of (3.9.1) can be expressed
 fi (x) gi (x)
as A det dx. Therefore, by Fubinis Theorem, the expansion
f j (x) g j (x)
(3.9.2) matches term for term the analogous expansion of the left side of (3.9.1)
according to the definition of the Pfaffian.

150 3. S PACINGS FOR G AUSSIAN ENSEMBLES

To evaluate gap probabilities in the GOE and GSE, we will specialize Proposi-
tion 3.9.3 in several different ways, varying both F and n. To begin the evaluation,
2
let denote a function on the real line of the form (x) = eC1 x +C2 x+C3 , where
C1 < 0, C2 and C3 are real constants, and let On denote the span over C of the set
of functions {xi1 (x)}n1
i=0 . Later we will make use of specially chosen bases for
On consisting of suitably modified oscillator wave-functions, but initially these
are not needed. Recall that (x) = 1i< jn (x j xi ) for x = (x1 , . . . , xn ) Rn .
The application of (3.9.1) to the GSE is the following.

Proposition 3.9.4 Let { fi }2n


i=1 be any family of elements of O2n . For x R, put

F(x) = [ fi (x) fi (x) ]|2n,1 Mat2n2 (C) .

Then, for all measurable sets A R,


   n
Pf F(x)J1 F(x)T dx = c (x)4 (xi )2 dxi , (3.9.3)
A A A i=1

where c = c({ fi }) is a complex number depending only on the family { fi }, not on


A. Further, c = 0 if and only if { fi }2n
i=1 is a basis for O2n over C.

Proof By Theorem 3.9.2(i), we may assume without loss of generality that fi (x) =
xi1 (x), and it suffices to show that (3.9.3) holds with c = 0. By identity (3.9.1)
and the confluent alternant identity (2.5.30), identity (3.9.3) does indeed hold for
suitable nonzero c independent of A.

The corresponding result for the GOE uses indefinite integrals of functions. To
streamline the handling of the latter, we introduce the following notation, which
is used throughout Section 3.9. For each integrable real-valued function f on the
real line we define a continuous function f by the formula
  
1 1
( f )(x) = sign (x y) f (y)dy = f (y)dy + f (y)dy
2 x 2
 x 
1
= f (y)dy sign(y) f (y)dx , (3.9.4)
0 2


where sign(x) = 1x>0 1x<0 , and we write f (x)dx = f (x)dx to abbreviate

notation. Note that ( f ) (x) = f (x) almost everywhere, that is, inverts dif-
ferentiation. Note also that the operation reverses parity and commutes with
translation.
The application of (3.9.1) to the GOE is the following.

Proposition 3.9.5 Let { fi }ni=1 be any family of elements of On . Let a = 0 be a


3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 151

complex constant. For each measurable set A R and x R, put




FAe (x) = fi (x) (1A fi )(x) |n,1 Matn2 (C) .
If n is even, let FA (x) = FAe (x) Matn 2 (C). Otherwise, if n is odd, let FA (x)
Matn 2 (C) be the result of adjoining the row [ 0 a ] at the bottom of FAe (x). Then,
for all measurable sets A R,
   n
Pf FA (x)J1 FA (x)T dx = c |(x)| (xi )dxi , (3.9.5)
A A A i=1

where c = c({ fi }, a) is a complex number depending only on the data ({ fi }, a),


not on A. Further, c = 0 if and only if { fi }ni=1 is a basis for On over C.

Proof By Theorem 3.9.2(i), we may assume without loss of generality that fi (x) =
xi1 (x), and it suffices to show that (3.9.5) holds with c = 0 independent of A.
For x R, let f (x) = [ fi (x) ] |n,1 Matn1 (C). Let An+ be the subset of An Rn
consisting of n-tuples in strictly increasing order. Then, using the symmetry of the
integrand of (3.9.5) and the Vandermonde determinant identity, one can verify that

the integral An+ det[ f (y j )]|1,n n1 dyi equals the right side of (3.9.5) with c = 1/n!.
Put r = n/2. Consider, for z Rr , the n n matrix

[ (1A fi )|z 1 z
]|n,1 [ fi (z j ) (1A fi )|z j+1
j ]|n,r if n is odd,
A (z) =
z
[ fi (z j ) (1A fi )|z j+1
j ]|n,r if n is even,
where zr+1 = , and h|ts = h(t) h(s). By integrating every other variable, we
obtain a relation
 r  n
det A (z) dzi = det[ f (y j )]|1,n dyi .
Ar+ 1 An+ 1

Consider, for z Rr ,
the n n matrix
 
[[FA (z j )]|1,r a A f (x)dx] if n is odd,
A (z) =
[FA (z j )]|1,r if n is even.
Because A (z) arises from A (z) by evident column operations, we deduce that
det A (z) = c1 det A (z) for some nonzero complex constant c1 independent of A
and z. Since the function det A (z) of z Rr is symmetric, we have
 r  r
1
det A (z) dzi = det A (z) dzi .
Ar+ 1 r! Ar 1

If n is even, we conclude the proof by using the Pfaffian integration identity (3.9.1)
to verify that the right side above equals the left side of (3.9.5).
Assume for the rest of the proof that n is odd. For i = 1, . . . , n, let FAe,i (x) be the
152 3. S PACINGS FOR G AUSSIAN ENSEMBLES

result of striking out the ith row from FAe (x) and similarly, let iA (z) be the result
of striking the ith row and last column from A (z). Then we have expansions
"  e  #
A FA (x)J F e (x)T dx a A f (x)dx
Pf  1 AT
a A f (x) dx 0
n    
= a (1) ( fi (x)dx) Pf FA (x)J1 FA (x) dx ,
i+1 e,i e,i T
i=1 A A
n 
det A (z) = a (1)i+n ( fi (x)dx) det iA (z) ,
i=1 A

obtained in the first case by Theorem 3.9.2(iii), and in the second by expanding
the determinant by minors of the last column. Finally, by applying (3.9.1) term
by term to the latter expansion, and comparing the resulting terms with those of

the former expansion, one verifies that r!1 Ar det A (z) r1 dzi equals the left side
of (3.9.5). This concludes the proof in the remaining case of odd n.

The next lemma gives further information about the structure of the antisym-


metric matrix A FA (x)J1FA (x)T dx appearing in Proposition 3.9.5. Let n = 2In
" #
2In 0
for even n, and n = for odd n.
0 1/ 2

Lemma 3.9.6 In the setup of Proposition 3.9.5, for all measurable sets A R,
  
FA (x)J1 FA (x)T dx = FR (x)J1 FR (x)T dx n FR (x)J1 FA (x)T n dx . (3.9.6)
A Ac

Proof Let Li, j (resp., Ri, j ) denote the (i, j) entry of the matrix on the left (resp.,

right). To abbreviate notation we write  f , g = f (x)g(x)dx. For i, j < n + 1,
using antisymmetry of the kernel 12 sign(x y), we have
1 1
Li, j = (1A fi , (1A f j ) 1A f j , (1A fi )) = 1A fi , (1A f j )
2 2
=  fi , f j  1Ac fi , f j  1A fi , (1Ac f j )
1
=  fi , f j  1Ac fi , f j  +  (1A fi ), 1Ac f j  = Ri, j ,
2
which concludes the proof in the case of even n. In the case of odd n it remains
only to consider the cases max(i, j) = n+1. If i = j = n+1, then Li, j = 0 = Ri, j . If
i < j = n + 1, then Li, j = a1A , fi  = Ri, j . If j < i = n + 1, then Li, j = a1A , f j  =
Ri, j . The proof is complete.

Determinant formulas for squared gap probabilities


By making careful choices for the functions fi in Propositions 3.9.4 and 3.9.5,
and applying Theorems 3.9.2(ii) and 2.5.2, we are going to obtain determinant
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 153

formulas for squared gap probabilities. Toward that end, for fixed > 0 and real
, let
n (x) = n, , (x) = 1/2 n ( 1 x + ) , (3.9.7)
and 1 0 for convenience. The functions n are shifted and scaled versions of
the oscillator wave-functions, see Definition 3.2.1.
We are ready to state the main results for gap probabilities in the GSE and GOE.
These should be compared with Lemma 3.2.4 and Remark 3.4.4. The result for
the GSE is as follows.

Proposition 3.9.7 For x R, put


"
#
1 2i1 (x) 2i1 (x)
H(x) = |1,n Mat22n (C) (3.9.8)
2 2i1 (x) 2i1 (x)
, = J1 H(x)J1
and H(x) n . Then, for all measurable sets A R,
     2
Ac Ac (x) i=1 0 (xi ) dxi
4 n 2
, T H(x)dx =
det I2n H(x) . (3.9.9)
A (x)4 ni=1 0 (xi )2 dxi

To prove the proposition we will interpret H as the transpose of a matrix of the


form F appearing in Proposition 3.9.4, which is possible because inverts differ-
entiation.
The result for the GOE is as follows.

Proposition 3.9.8 Let r = n/2. Let n = n if n is even, and otherwise, if n is odd,


let n = n + 1. Let  {1, 2} have the same parity as n. For x R, and measurable
sets A R, put
"
#
1 2i (x) 2i (x)
GeA (x) = |1,r Mat22r (C) .
(1A 2i )(x) (1A 2i )(x)
If n is even, put GA (x) = GeA (x) Mat2n (C). Otherwise, if n is odd, let GA (x)
Mat2n (C) be obtained from GeA (x) by adjoining the block
" #
n1 (x) 0
(1A n1 )(x) 1/n1 , 1

on the far right. Also put G,A (x) = J1 GA (x)J1


n /2 . Then, for all measurable sets
A R,
     2
Ac Ac |(x)| i=1 0 (xi )dxi
n
det In G ,R (x)T GAc (x)dx = .
A |(x)| ni=1 0 (xi )dxi
(3.9.10)
154 3. S PACINGS FOR G AUSSIAN ENSEMBLES

To prove the proposition we will interpret GA as a matrix of the form FAT n ap-
pearing on the right side of (3.9.6) in Lemma 3.9.6.
Before commencing the proofs we record a series of elementary properties of
the functions i following immediately from Lemmas 3.2.5 and 3.2.7. These
properties will be useful throughout Section 3.9. As above, we write  f , g =

f (x)g(x)dx. Let k, , n 0 be integers. Let On = On, , denote the span of the
set {i }n1
i=0 over C.

Lemma 3.9.9 The following hold:


( 1 x+ )2
0 (x) = 1/2 (2 )1/4 e 4 , (3.9.11)
|x|
sup e |n (x)| < for every real constant , (3.9.12)
x
n = (n ) = (n ) , (3.9.13)
k ,   = 2
k = k ,   , (3.9.14)
k ,   = 0 and k ,   = 0 for k +  even , (3.9.15)
n , 1 = 0 for n odd , (3.9.16)

n+1 n
n = n+1 + n1 , (3.9.17)
2 2
n , 1 > 0 for n even , (3.9.18)
n On1 for n odd , (3.9.19)
1

( x + )n (x) = n + 1n+1 (x) + nn1 (x) , (3.9.20)
n1
i (x)i (y) n (x)n (y) n (x)n (y) n (x)n (y)
2
=
xy

2 2
, (3.9.21)
i=0
 
( 1 x + )2 1
2 n (x) = n n (x) . (3.9.22)
4 2

Proof of Proposition 3.9.7 Using property (3.9.19), and recalling that inverts
differentiation, we observe that, with = 0 and F(x) = H(x)T , the integra-
tion identity (3.9.3) holds with a constant c independent of A. Further, we have

, T H(x)dx = I2n by (3.9.14) and (3.9.15), and hence
H(x)
     2
det In H(x), T H(x)dx = Pf F(x)J1 F(x)T dx ,
A Ac

after some algebraic manipulations using part (ii) of Theorem 3.9.2 and the fact
that det Jn = 1. Thus, by (3.9.3) with A replaced by Ac , the integration identity
(3.9.9) holds up to a constant factor independent of A. Finally, since (3.9.9) obvi-
ously holds for A = 0,
/ it holds for all A.

3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 155

Proof of Proposition 3.9.8 Taking n as in Lemma 3.9.6, = 0 and FA (x) =


n1 GA (x)T , the integration identity (3.9.5) holds with a constant c independent

of A. Further, we have In = J1 T
n /2 FR (x)J1 FR (x) dx by (3.9.14), (3.9.15) and
(3.9.16), and hence
     2
det In G(x)T GAc (x)dx = Pf FAc (x)J1 FAc (x)T dx
A Ac

by Lemma 3.9.6 with A replaced by Ac , after some algebraic manipulations using


part (ii) of Theorem 3.9.2 and the fact that det Jn = 1. Thus, by (3.9.5) with
A replaced by Ac , the integration identity (3.9.10) holds up to a constant factor
independent of A. Finally, since (3.9.10) obviously holds for A = 0,/ it holds for
all A.

3.9.2 Fredholm representation of gap probabilities

In this section, by reinterpreting formulas (3.9.9) and (3.9.10), we represent the


square of a gap probability for the GOE or GSE as a Fredholm determinant of a
matrix kernel, see Theorem 3.9.19.

Matrix kernels and a revision of the Fredholm setup

We make some specialized definitions to adapt Fredholm determinants as defined


in Section 3.4 to the study of limits in the GOE and GSE.

Definition 3.9.10 For k {1, 2}, let Kerk denote the space of Borel-measurable
functions K : R R Matk (C). We call elements of Ker1 scalar kernels, ele-
ments of Ker2 matrix kernels, and elements of Ker1 Ker2 simply kernels. We
often view a matrix kernel K Ker2 as a 2 2 matrix with entries Ki, j Ker1 .

We are now using the term kernel in a sense somewhat differing from that in
Section 3.4. On the one hand, usage is more general because boundedness is not
assumed any more. On the other hand, usage is more specialized in that kernels
are always functions defined on R R.

Definition 3.9.11 Given K, L Kerk , we define K  L by the formula



(K  L)(x, y) = K(x,t)L(t, y)dt ,

whenever |Ki, (x,t)L, j (t, y)|dt < for all x, y R and i, j,  {1, . . . , k}.
156 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Since the definition of Fredholm determinant made in Section 3.4 applies only
to bounded kernels on measure spaces of finite total mass, to use it efficiently we
have to make the next several definitions.
Given a real constant 0, let w (x) = exp( |x + | 2 ) for x R. Note that
w (x) = e x for x > and w0 (x) 1.

Definition 3.9.12 ( -twisting) Given k {1, 2}, a kernel K Kerk , and a constant
0, we define the -twisted kernel K ( ) Kerk by


K(x, y)w (y) if k = 1,

K ( ) (x, y) = " #

w (x)K11 (x, y) w (x)K12 (x, y)w (y)
if k = 2 .
K21 (x, y) K22 (x, y)w (y)


We remark that K Ker2 K11
T , K Ker where K T (x, y) = K (y, x).
22 1 11 11

As before, let Leb denote Lebesgue measure on the real line. For 0, let
Leb (dx) = w (x)1 Leb(dx), noting that Leb0 = Leb, and that Leb has finite
total mass for > 0.

Definition 3.9.13 Given k {1, 2}, a kernel K Kerk , and a constant 0, we



write K Kerk if there exists some open set U R and constant c > 0 such that
Leb (U) < and maxi, j |(K ( ) )i, j | c1UU .


Note that Kerk is closed under the operation  because, for K, L Kerk , we have

(K  L)( ) (x, y) = K ( ) (x,t)L( ) (t, y)Leb (dt) (3.9.23)

and hence K  L Kerk .
We turn next to the formulation of a version of the definition of Fredholm de-

terminant suited to kernels of the class Kerk .


Definition 3.9.14 Given k {1, 2}, 0, and L Kerk , we define Fredk (L) by
specializing the setup of Section 3.4 as follows.
(i) Choose U R open and c > 0 such that maxi, j |(L( ) )i, j | c1UU .
(ii) Let X = U I , where I = {1}, {1, 2} according as k = 1, 2.
(iii) Let = (restriction of Leb to U) (counting measure on I ).
(iv) Let K((s, i), (t, j)) = L( ) (s,t)i, j for (s, i), (t, j) X.

Finally, we let Fredk (L) = (K), where the latter is given as in Definition 3.4.3,
with inputs X, and K as defined above.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 157

The complex number Fredk (L) is independent of the choice of U and c made in
point (i) of the definition, and hence well defined. The definition is contrived so

that if L Kerki for i = 1, 2, then Fredki (L) is independent of i, as one verifies by
comparing the expansions of these Fredholm determinants term by term.

Two formal properties of Fredk () deserve emphasis.

Remark 3.9.15 If K, L Kerk , then multiplicativity holds in the form

Fredk (K + L K  L) = Fredk (K)Fredk (L) ,

by (3.9.23) and Theorem 3.4.10. Further, by Corollary 3.4.9, if K Ker2 satisfies
K21 0 or K12 0, then
T
Fred2 (K) = Fred1 (K11 )Fred1 (K22 ) .

The analog of Remark 3.4.4 in the present situation is the following.

Remark 3.9.16 Let 0 be a constant. Let U R be an open set such that


Leb (U) < . Let G, G : R Mat22n (C) be Borel-measurable. Assume further
that all entries of the matrices
" # " #
w (x) 0 1 0 ,
G(x), G(x)
0 1 0 w (x)
are bounded for x U. Let
, T Mat2 (C)
K(x, y) = G(x)G(y)

for x, y R. Let A U be a Borel set. Then 1AA K Ker2 and
  
, T G(x)dx .
Fred2 (1AA K) = det I2n G(x)
A


If K Kerk and Fredk (K) = 0, then one can adapt the Fredholm adjugant con-
struction, see equation (3.4.15), to the present situation, and one can verify that

there exists unique R Kerk such that the resolvent equation RK = K R = RK
holds.

Definition 3.9.17 The kernel R Kerk associated as above with K Kerk is called

the resolvent of K with respect to , and we write R = Resk (K).

This definition is contrived so that if K Kerki for i = 1, 2, then Reski (K) is in-
dependent of i. In fact, we will need to use this definition only for k = 1, and the
only resolvents that we will need are those we have already used to analyze GUE
in the bulk and at the edge of the spectrum.
158 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Finally, we introduce terminology pertaining to useful additional structure a


kernel may possess.

Definition 3.9.18 We say that K Kerk for k {1, 2} is smooth if K is infinitely


differentiable. We say that L Ker1 is symmetric (resp., antisymmetric) if L(x, y)
L(y, x) (resp., L(x, y) L(y, x)). We say that K Ker2 is self-dual if K21 and K12
are antisymmetric and K11 (x, y) K21 (x, y). Given smooth L Ker1 and K Ker2 ,
we say that K is the differential extension of L if

L
xLy (x, y)
2
x (x, y)
K(x, y) .
L(x, y) Ly (x, y)

Note that if K Ker2 is smooth, K21 is antisymmetric, and K is the differential



extension of K21 , then K is self-dual and K21 (x, y) = yx K11 (t, y)dt.

Main results

Fix real constants > 0 and . With n = n, , as defined by formula (3.9.7),


we put

1 n1
Kn, , ,2 (x, y) = i (x)i (y) .
2 i=0
(3.9.24)

The kernel Kn, , ,2 (x, y) is nothing new: we have previously studied it to obtain
limiting results for the GUE.
We come to the novel definitions. We write Kn = Kn, , ,2 to abbreviate. Let

Kn (x, y) Kyn (x, y)
Kn, , ,1 (x, y) = 
12 sign(x y) + yx Kn (t, y)dt Kn (x, y)
 n1 (x)n (y) n1 (x)n (y)
n
+ x
2 3
( y n1 (t)dt)n (y) n (x)n1 (y)

n1 (x)

n1 ,1 0

 x (t)dt if n is odd,
+ y n1 (y) (3.9.25)

 
n1

n1 ,1 n1 ,1
0 if n is even,
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 159

and

K
1 K2n+1 (x, y) 2n+1
y (x, y)
Kn, , ,4 (x, y) = x (3.9.26)
2 K
y 2n+1 (t, y)dt K 2n+1 (x, y)

2n + 1 2n (x)2n+1 (y) 2n (x)2n+1 (y)
+ x .
4 3
( 2n (t)dt)2n+1 (y) 2n+1 (x)2n (y)
y

We then have the following representations of squares of gap probabilities as Fred-


holm determinants of matrix kernels.

Theorem 3.9.19 Let 0 and a Borel set A R be given. Assume either that
> 0 or that A is bounded. Let {1, 4}. Then we have
  2
n
|(x)| i=1 0, , i(x ) dx
Ac Ac
i
= Fred (1AA Kn, , , ) . (3.9.27)
2

|(x)| i=1 0, , (xi ) dxi
n

It is easy to check using Lemma 3.9.9 that the right side is defined. For compari-
son, we note that under the same hypotheses on and A we have
 
Ac Ac |(x)| i=1 0, , (xi ) dxi
2 n 2

  = Fred1 (1AA Kn, , ,2 ) . (3.9.28)
|(x)|2 ni=1 0, , (xi )2 dxi
The latter is merely a restatement in the present setup of Lemma 3.2.4.
Before commencing the proof we need to prove a Pfaffian analog of (3.9.21).
For integers n > 0, put

 (x)  (y)
Ln (x, y) = Ln, , (x, y) = 2
 (x)  (y) .
0<n
(1) =(1)n

Lemma 3.9.20

n 1 n1
Ln (x, y) = n1 (x)n (y) + 2 i (x)i (y) .
2 3 i=0

Proof In view of (3.9.13), it is enough to prove



 (x)  (y) n n1

(x) (y) = 2 n1 (x)n (y) + i (x)i (y) .
0<n   i=0
(1) =(1)n

Let F1 (x, y) and F2 (x, y) denote the left and right sides of the equation above,
respectively. Fix {1, 2} and integers j, k 0 arbitrarily. By means of (3.9.14)
160 3. S PACINGS FOR G AUSSIAN ENSEMBLES

and (3.9.17), one can verify that F (x, y) j (x)k (y)dxdy is independent of ,
which is enough by (3.9.14) to complete the proof.


Proof of Theorem 3.9.19 Given smooth L Ker1 , to abbreviate notation, let Lext
Ker2 denote the differential extension of L, see Definition 3.9.18.
First we prove the case = 4 pertaining to the GSE. Let H(x) be as defined
in Proposition 3.9.7. By straightforward calculation based on Lemma 3.9.20, one
can verify that
1 ext
H(x)J1 T
n H(y) J1 = L2n+1, , (x, y) = Kn, , ,4 (x, y) .
2
Then formula (3.9.27) in the case = 4 follows from (3.9.9) and Remark 3.9.16.
We next prove the case = 1 pertaining to the GOE. We use all the notation
introduced in Proposition 3.9.8. One verifies by straightforward calculation using
Lemma 3.9.20 that
GR (x)J1 GR (y)T J1 ext ext
n = Ln, , (x, y) + Mn, , (x, y) ,

where 
n1 (x)n1 (y)
1,n1  if n is odd,
Mn, , (x, y) =
0 if n is even.
Further, with
" #
0 0
Q(x, y) = GAc (x)J1 GR (y)T J1
n , E(x, y) = , (3.9.29)
1
2 sign(x y) 0
QA = 1AA Q and EA = 1AA E, we have
EA + QA + EA  QA = 1AA Kn, , ,1 .
Finally, formula (3.9.27) in the case = 1 follows from (3.9.10) combined with
Remarks 3.9.15 and 3.9.16.

Remark 3.9.21 Because the kernel Ln, , is smooth and antisymmetric, the proof
above actually shows that Kn, , ,4 is both self-dual and the differential extension
of its entry in the lower left. Further, the proof shows the same for Kn, , ,1 + E.

3.9.3 Limit calculations


( )
In this section we evaluate various limits of the form limn Kn,n ,n , , paying
strict attention to uniformity of the convergence, see Theorems 3.9.22 and 3.9.24
below. Implications of these to spacing probabilities are summarized in Corollar-
ies 3.9.23 and 3.9.25 below.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 161

Statements of main results

Recall the symmetric scalar kernels, see Theorem 3.1.1, and Definition 3.1.3,
1 sin(x y)
Ksine (x, y) = Ksine,2 (x, y) = , (3.9.30)
xy

Ai(x) Ai (y) Ai (x) Ai(y)


KAiry (x, y) = KAiry,2 (x, y) = . (3.9.31)
xy
It is understood that these kernels are defined for x = y in the unique way making
them continuous (and in fact infinitely differentiable). The subscript 2 refers to
the parameter for the GUE.
We define matrix variants of the sine-kernel, and state the main result on con-
vergence toward these variants. Let

Ksine (x, y) Ksine
y (x, y)
Ksine,1 (x, y) =  x ,
2 sign(x y) + y Ksine (t, y)dt
1
Ksine (x, y)
(3.9.32)

1 Ksine (x, y) Ksine
y (x, y)
Ksine,4 (x, y) = x . (3.9.33)
2 y Ksine (t, y)dt Ksine (x, y)

The subscripts 1 and 4 refer to the parameters for the GOE and GSE, respec-
tively. Note that each of the kernels Ksine,4 and, with E as in (3.9.29), E + Ksine,1 is
self-dual and the differential extension of its entry in the lower left. In other words,
the kernels Ksine, have properties analogous to those of Kn, , , mentioned in Re-
mark 3.9.18.
We will prove the following limit formulas.

Theorem 3.9.22 For all bounded intervals I R,

lim Kn,n,0,1 (x, y) = Ksine,1 (x, y) , (3.9.34)


n
lim Kn,n,0,2 (x, y) = Ksine,2 (x, y) , (3.9.35)
n
lim Kn,2n,0,4 (x, y) = Ksine,4 (x, y) , (3.9.36)
n

uniformly for x, y I.

Limit formula (3.9.35) is merely a restatement of Lemma 3.5.1, and to the proof
of the latter there is not much to add in order to prove the other two limit formu-
las. Using these we will prove the following concerning the bulk limits Fbulk, (t)
considered in Theorem 3.1.6.
162 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Corollary 3.9.23 For {1, 2, 4} and constants t > 0, the limits Fbulk, (t) exist.
More precisely, with I = (t/2,t/2) R,

(1 Fbulk,1 (t))2 = Fred02 (1II Ksine,1 ) , (3.9.37)


1 Fbulk,2 (t) = Fred01 (1II Ksine,2 ) , (3.9.38)
(1 Fbulk,4 (t/2)) 2
= Fred02 (1II Ksine,4 ) . (3.9.39)

Further, for {1, 2, 4},


lim Fbulk, (t) = 1 . (3.9.40)
t

Formula (3.9.38) merely restates the limit formula in Theorem 3.1.1. Note that the
limit formulas limt0 Fbulk, (t) = 0 for {1, 2, 4} hold automatically as a conse-
quence of the Fredholm determinant formulas (3.9.37), (3.9.38) and (3.9.39), re-
spectively. The case = 2 of (3.9.40) was discussed previously in Remark 3.6.5.
We will see that the cases {1, 4} are easily deduced from the case = 2 by
using decimation and superposition, see Theorem 2.5.17.
We turn to the study of the edge of the spectrum. We introduce matrix variants
of the Airy kernel KAiry and then state limit results. Let

KAiry,1 (x, y)

K
KAiry (x, y) Airy
y (x, y)
= 
12 sign(x y) + yx KAiry (t, y)dt KAiry (x, y)
 
1 Ai(x)(1 y Ai(t)dt) Ai(x) Ai(y)
+ x   , (3.9.41)
2 ( y Ai(t)dt)(1 y Ai(t)dt) (1 x Ai(t)dt) Ai(y)
KAiry,4 (x, y)

K
1 KAiry (x, y) Airy
y (x, y)
= x
2 y KAiry (t, y)dt KAiry (x, y)
 
1 Ai(x) y Ai(t)dt Ai(x) Ai(y)
+ x   . (3.9.42)
4 ( y Ai(t)dt)( y Ai(t)dt) ( x Ai(t)dt) Ai(y)

Although it is not immediately apparent, the scalar kernels appearing in the lower
left of KAiry, for {1, 4} are antisymmetric, as can be verified by using formula
(3.9.58) below and integration by parts. More precisely, each of the kernels KAiry,4
and E + KAiry,1 (with E as in (3.9.29)) is self-dual and the differential extension
of its entry in the lower left. In other words, the kernels KAiry, have properties
analogous to those of Kn, , , mentioned in Remark 3.9.18.
We will prove the following limit formulas.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 163

Theorem 3.9.24 For constants 0 and intervals I R bounded below,


( ) ( )
lim Kn,n1/6 ,2n,1 (x, y) = KAiry,1 (x, y) , (3.9.43)
n
( ) ( )
lim K (x, y) = KAiry,2 (x, y) , (3.9.44)
n n,n1/6 ,2 n,2
( ) ( )
lim K (x, y) = KAiry,4 (x, y) , (3.9.45)
n n,(2n)1/6 ,2 2n,4

uniformly for x, y I.

The proofs of the limit formulas are based on a strengthening of Lemma 3.7.2
capable of handling intervals unbounded above, see Proposition 3.9.30. The limit
formulas imply, with some extra arguments, the following results concerning the
edge limits Fedge, (t) considered in Theorem 3.1.7.

Corollary 3.9.25 For {1, 2, 4} and real constants t, the edge limits Fedge, (t)
exist. More precisely, with I = (t, ), and > 0 any constant,

Fedge,1 (t)2 = Fred2 (1II KAiry,1 ) , (3.9.46)

Fedge,2 (t) = Fred1 (1II KAiry,2 ) , (3.9.47)
2/3 2
Fedge,4 (t/2 ) = Fred2 (1II KAiry,4 ) . (3.9.48)
Further, for {1, 2, 4},
lim Fedge, (t) = 0 . (3.9.49)
t

We will show below, see Lemma 3.9.33, that for 0 and {1, 2, 4}, the
( )
-twisted kernel KAiry, is bounded on sets of the form I I with I an interval
bounded below, and hence all Fredholm determinants on the right are defined.
Note that the limits limt+ Fedge, (t) = 1 for {1, 2, 4} follow automatically
from formulas (3.9.46), (3.9.47) and (3.9.48), respectively. In particular, formula
(3.9.47) provides another route to the proof of Theorem 3.1.4 concerning edge-
scaling in the GUE which, bypassing the Ledoux bound (Lemma 3.3.2), handles
the right-tightness issue directly.

Proofs of bulk results

The proof of Theorem 3.9.22 is based on the following refinement of (3.5.4).

Proposition 3.9.26 For all integers k 0, integers , and bounded intervals I of


real numbers, we have
   
d k
cos(x (n + )/2)
lim n+ ,n,0 (x) = 0,
n dx
164 3. S PACINGS FOR G AUSSIAN ENSEMBLES

uniformly for x I.

Proof The case k = 0 of the proposition is exactly (3.5.4). Assume hereafter that
k > 0. By (3.9.17) and (3.9.20) we have

n+ xn+ ,n,0 (x)
n+ ,n,0 (x) = n+ 1,n,0 (x) .
n 2n
Repeated differentiation of the latter yields a relation which finishes the proof by
induction on k.

Proposition 3.9.27 For , {0, 1} and bounded intervals I R we have


   

lim Kn+ ,n,0,2 (x, y) = Ksine,2 (x, y) ,
n y y
uniformly for x, y I.

The proof is a straightforward modification of the proof of Lemma 3.5.1, using


Proposition 3.9.26 to justify differentiation under the integral. We omit the details.
The following elementary properties of the oscillator wave-functions will also
be needed.

Proposition 3.9.28 We have



lim n1/4
n
n (x)dx = 2 . (3.9.50)
n:even

In the bulk case only the order of magnitude established here is needed, but in the
edge case we will need the exact value of the limit.
Proof By (3.9.11) in the case = 1 and = 0 we have

0 (x) = 21/4 1/4 ex
2 /4
, 0 (x)dx = 23/4 1/4 . (3.9.51)

By (3.9.17) in the case = 0 and = 1 we have


: 7
 ; n/2 
n (x)dx ; < 2i 1 n! 4 2
 = = ,
0 (x)dx i=1 2i 2n ((n/2)!)2 n

by the Stirling approximation, see (2.5.12). Then (3.9.50) follows from (3.9.51).

3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 165

Proposition 3.9.29 We have





sup n (x)dx < . (3.9.52)
n=1 0
n:odd

Proof For odd positive integers n we have a recursion


  
2 n+1
n+2 (x)dx = n+1 (0) + n (x)dx ,
0 n+2 n+2 0
which follows directly from (3.9.17) in the case = 0 and = 1. Iterating, and
using also the special case

n + 1n+1 (0) = nn1 (0) (3.9.53)
of (3.9.20), we obtain the relation
   
n+3 n+1
(1) (n+5)/2
n+4 (x)dx (1)(n+1)/2 n (x)dx
0 n+4 n+2 0
   
2 n+2 n+3
= (1)(n+1)/2 n+1 (0) + ,
n+4 n+3 n+2

for odd positive integers n. By (3.9.51) and (3.9.53), the right side is positive and
in any case is O(n5/4 ). The bound (3.9.52) follows.

Proof of Theorem 3.9.22 The equality (3.9.35) is the case = 0 of Proposition


3.9.27. To prove (3.9.34) and (3.9.36), in view of Propositions 3.9.26 and 3.9.27,
we just have to verify the (numerical) limit formulas
1 1
lim = lim = 0,
n
n:odd
n1,n,0 , 1 n n3/4 
n:odd n1 , 1
n,n,0 (0) 
1
lim = lim n (x)dx = 0 .
n
n:odd
n n
n:odd 2n1/4 0

These hold by Propositions 3.9.28 and 3.9.29, respectively. The proof of Theo-
rem 3.9.22 is complete.

( ,n) ( ,n)
Proof of Corollary 3.9.23 For {1, 2, 4}, let ( ,n) = (1 , . . . , n ) be
a random vector in Rn with law possessing a density with respect to Lebesgue
measure proportional to |(x)| e |x| /4 . We have by Theorem 3.9.19, formula
2

(3.9.11) and the definitions that


P({ ( (1,n) )} I = 0)
/ 2 = Fred02 (1II Kn, , ,1 ) ,
P({ ( (2,n) )} I = 0)
/ = Fred01 (1II Kn, , ,2 ) ,
(4,n)
P({ ( 2 )} I = 0)
/ 2 = Fred02 (1II Kn, , ,4 ) .
166 3. S PACINGS FOR G AUSSIAN ENSEMBLES

The proofs of (3.9.37), (3.9.38) and (3.9.39) are completed by using Lemma 3.4.5
and Theorem 3.9.22. It remains only to prove the statement (3.9.40). For = 2,
it is a fact which can be proved in a couple of ways described in Remark 3.6.5.
The case = 2 granted, the cases {1, 4} can be proved by using decimation
and superposition, see Theorem 2.5.17. Indeed, consider first the case = 1. To
derive a contradiction, assume limt Fbulk,1 (t) = 1 for some > 0. Then, by
the decimation relation (2.5.25), limt Fbulk,2 (t) 1 2 , a contradiction. Thus,
limt Fbulk,1 (t) = 1. This also implies by symmetry that the probability that no
(rescaled) eigenvalue of the GOE appears in [0,t], denoted F1 (t), decays to 0 as
t . By the decimation relation (2.5.26), we then have

1 Fbulk,4 (t) 2F1 (2t) t 0 .

This completes the proof of (3.9.40).


Proofs of edge results

The proof of Theorem 3.9.24 is similar in structure to that of Theorem 3.9.22. We


begin by refining Lemma 3.7.2.

Proposition 3.9.30 For all constants 0, integers k 0, integers and intervals


I bounded below we have

lim e x n+ ,n1/6 ,2n (x) = e x Ai(k) (x)


(k)
(3.9.54)
n

uniformly for x I.

We first need to prove two lemmas. The first is a classical trick giving growth
information about solutions of one-dimensional Schrodinger equations. The sec-
ond applies the first to the Schrodinger equation (3.9.22) satisfied by oscillator
wave-functions.

Lemma 3.9.31 Fix real numbers a < b. Let and V be infinitely differentiable
real-valued functions defined on the interval (a, ) satisfying the following:
(i) = V ; (ii) > 0 on [b, ); (iii) limx (log ) (x) = ;
(iv) V > 0 on [b, );

(v) V 0 on [b, ).
Then (log ) V on [b, ).

The differentiability assumptions, while satisfied in our intended application, are


much stronger than needed.
Proof Suppose rather that the conclusion does not hold. After replacing
 b by

some point of the interval (b, ) we may assume that (b) > V (b). After
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 167

making a linear change of both independent and dependent variables, we may



assume that b = 0, V (0) = 1 and hence (0) > 1. Consider the function


(x) = cosh x+ (0) sinh x. Clearly we have (0) = 1, (0) = (0) and
= .

Further, because (0) > 1, we have > 0 and > 1 on [0, ). Finally, we
have
d
( )(0) = 0, ( ) = (V 1) 0 on [0, ) ,
dx

and hence > 1 on [0, ), which is a contradiction.

Lemma 3.9.32 Fix n > 0 and put n (x) = n,n1/6 ,2n (x). Then for x 1 we have
n (x) > 0 and (log n ) (x) (x 1/2)1/2 .

Proof Let be the rightmost of the finitely many zeroes of the function n . Then
n does not change sign on ( , ) and in fact is positive by (3.9.20). The logarith-
mic derivative of n tends to as x + because n is a polynomial in x times
a Gaussian density function of x. In the present case the Schrodinger equation
(3.9.22) takes the form

n (x) = (x + n2/3 x2 /4 1/(2n1/3 )) n (x) . (3.9.55)

We finally apply Lemma 3.9.31 with a = max(1, ) < b, thus obtaining the esti-
mate

(log n ) (b) b 1/2 for b ( , ) (1, ) .

This inequality forces one to have < 1 because the function of b on the left side
tends to + as b .

Proof of Proposition 3.9.30 We write n, (x) instead of n+ ,n1/6 ,2n (x) to abbre-
viate. We have
 
xn, (x) n n1/6
n, 1 (x) n, (x) = + 1 n, (x) n, (x) ,
2n1/6 n + n+ n+
by (3.9.20) and (3.9.17), and by means of this relation we can easily reduce to the
case = 0. Assume that = 0 hereafter and write simply n = n,0 .
By Lemma 3.7.2, the limit (3.9.54) holds on bounded intervals I. Further, from
Lemma 3.7.7 and the Airy equation Ai (x) = x Ai(x), we deduce that

e x Ai(k) (x) is bounded on intervals bounded below . (3.9.56)


168 3. S PACINGS FOR G AUSSIAN ENSEMBLES

Thus it is enough to establish the following bound, for arbitrary constants 0


and integers k 0:


sup sup |e x n (x)| < .
(k)
(3.9.57)
n=1 x1

Since in any case sup


n=1 n (1) < , we get the bound (3.9.57) for k = 0, 1 and all
0 by Lemma 3.9.32. We then get (3.9.57) for k 2 and all 0 by (3.9.55)
and induction on k.

Growth of KAiry, is under control in the following sense.

( )
Lemma 3.9.33 For {1, 2, 4}, 0 and intervals I bounded below, KAiry, is
bounded on I I.

Proof We have

KAiry (x, y) = Ai(x + t) Ai(y + t)dt . (3.9.58)
0

To verify this formula, first apply x + y to both sides, using (3.9.56) to justify
differentiation under the integral, then apply the Airy equation Ai (x) = x Ai(x) to
verify equality of derivatives, and finally apply (3.9.56) again to fix the constant
of integration. By further differentiation under the integral, it follows that for all
integers k,  0, constants 0 and intervals I bounded below,

k+
sup e (x+y) k  KAiry (x, y) < . (3.9.59)
x,yI x y

The latter is more than enough to prove the lemma.


The following is the analog of Proposition 3.9.27.

Proposition 3.9.34 For , {0, 1}, constants 0 and intervals I R bounded


below, we have
   

lim e (x+y) Kn+ ,n1/6 ,2n,2 (x, y) = e (x+y) KAiry,2 (x, y) , (3.9.60)
n y y

uniformly for x, y I.
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 169

Proof To abbreviate we write n, = n+ ,n1/6 ,2n . We have

Kn+ ,n1/6 ,2n,2 (x, y)



= n, (x + t)n, (y + t)dt
0

1
+ (x + y + 2t)n, (x + t)n, (y + t)dt (3.9.61)
4n2/3 0
  
1
+ 1/3 n, (x + t)n, (y + t) + n, (x + t)n, (y + t) dt .
2n 0

This is proved using (3.9.12), (3.9.21) and (3.9.22), following the pattern set in
proving (3.9.58) above. In the case = 0 we then get the desired uniform con-
vergence (3.9.50) by Proposition 3.9.30 and dominated convergence. After differ-
entiating under the integrals in (3.9.58) and (3.9.61), we get the desired uniform
convergence for = 1 in similar fashion.

Proof of Theorem 3.9.24 The limit (3.9.44) follows from Proposition 3.9.34. To
see (3.9.43) and (3.9.45), note that by definitions (3.9.41) and (3.9.42), and Propo-
sitions 3.9.30 and 3.9.34, we just have to verify the (numerical) limit formulas
1 n1/4 1
lim  1/6 , 1 =
n 4 n,n ,2 n
lim
n
n , 1 = ,
n:even n:even
4 2
1 1 1
lim = lim = .
n
n:odd
n1,n1/6 ,2n , 1 n n1/4 
n:odd n1 , 1 2

These hold by Proposition 3.9.28. The proof of Theorem 3.9.24 is complete.


Proof of Corollary 3.9.25 With the notation ( ,n)


as defined at the beginning of
the proof of Corollary 3.9.23, we have by Theorem 3.9.19, formula (3.9.11) and
the definitions that

P({ ( (1,n) )} I = 0)
/ 2 = Fred2 (1II Kn, , ,1 ) ,

P({ ( (2,n) )} I = 0)
/ = Fred1 (1II Kn, , ,2 ) ,
(4,n)
P({ ( 2 )} I = 0)
/ 2 = Fred2 (1II Kn, , ,4 )) .

To finish the proofs of (3.9.46), (3.9.47) and (3.9.48), use Lemma 3.4.5 and The-
orem 3.9.24. The statement (3.9.49) holds for = 2 by virtue of Theorem 3.1.5,
and for = 1 as a consequence of the decimation relation (2.5.25).
The argument for = 4 is slightly more complicated. We use some information
on determinantal processes as developed in Section 4.2. By (3.8.22), the sequence
of laws of the second eigenvalue of the GUE, rescaled at the edge scaling, is
tight. Exactly as in the argument above concerning = 1, this property is inherited
by the sequence of laws of the (rescaled) second eigenvalue of the GOE. Using
170 3. S PACINGS FOR G AUSSIAN ENSEMBLES

(2.5.26), we conclude that the same applies to the sequence of laws of the largest
eigenvalue of the GSE.

Remark 3.9.35 An alternative to using the decimation relations (2.5.25) and


(2.5.26) in the proof of lower tail tightness is to use the asymptotics of solutions
of the Painleve II equations, see Remark 3.8.1. It has the advantage of leading to
more precise tail estimates on Fedge, . We sketch the argument in Exercise 3.9.36.

Exercise 3.9.36 Using Exercise 3.8.3, (3.7.20), (3.8.4), (3.8.21) and Theorem
3.1.7, show that for = 1, 2, 4,

1 2
lim log[1 Fedge, (t)] = ,
t t 3/2 3
1
lim log Fedge, (t) = .
t t 3 24
Again, note the different rates of decay for the upper and lower tails of the distri-
bution of the largest eigenvalue.

3.9.4 Differential equations

We derive differential equations for the ratios

(1 Fbulk, (t/2))2 Fedge, (t/22/3 )2


bulk, (t) = , edge, (t) = , (3.9.62)
1 Fbulk,2 (t) Fedge,2 (t)

for {1, 4}, thus finishing the proofs of Theorems 3.1.6 and 3.1.7.

Block matrix calculations

We aim to represent each of the quantities bulk, (t) and edge, (t) as a Fredholm
determinant of a finite rank kernel. Toward that end we prove the following two
lemmas.
Fix a constant 0. Fix kernels
" # " #
a b 0 0
, Ker2 , , w Ker1 . (3.9.63)
c d e 0

Assume that

d = + w , Fred1 ( ) = 0 . (3.9.64)
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 171

Below, for brevity, we suppress , writing AB for A  B. Put


" #
a be (a be)b
K1 = , (3.9.65)
c de w + (c de)b
 (abe)b
abe
2 4
K4 = cde+e(abe) (cde+e(abe))b
, (3.9.66)
ebd
2 w+ 2 + 4
" #
0 0
R = Ker2 .
0 Res1 ( )

That R is well defined and belongs to Ker2 follows from assumption (3.9.64).
That K1 and K4 are well defined will be proved below. Recall that for k {1, 2}

and L1 , L2 Kerk , again L1 L2 Kerk , by (3.9.23).

Lemma 3.9.37 With data (3.9.63) and under assumption (3.9.64), the kernels K1
and K4 are well defined, and have the following properties:

K1 , K4 Ker2 , (3.9.67)
" #
a b
Fred2
e + c d
Fred2 (K1 + K1 R) = , (3.9.68)
Fred1 ( )
 " #
a b
Fred2 12
c d
Fred2 (K4 + K4 R) = . (3.9.69)
Fred1 ( )

Proof Put
" # " # " #
0 b 0 0 0 0
B= , E= , S= .
0 0 e 0 0

Note that B, E, S Ker2 . Given L1 , . . . , Ln Ker2 with n 2, let

m(L1 , L2 ) = L1 + L2 L1 L2 Ker2 ,

m(L1 , . . . , Ln ) = m(m(L1 , . . . , Ln1 ), Ln ) Ker2 for n > 2 .
Put
" # 
a b
L1 = m , E, B, R ,
e + c d
 " # 
1 a b 1
L4 = m E, , E, B, R .
2 c d 2
Ones verifies that
K = L L S, L = K + K R (3.9.70)
172 3. S PACINGS FOR G AUSSIAN ENSEMBLES

for {1, 4} by straightforward calculation with 2 2 matrices in which one


uses the first part of assumption (3.9.64), namely d = + w, and the resolvent
identity R S = RS = SR. Relation (3.9.70) establishes that K1 and K4 are well
defined and proves (3.9.67). By Remark 3.9.15, we have

Fred2 (cB) = 1, Fred2 (E) = 1, Fred2 (R)Fred1 ( ) = 1 ,

where c is any real constant, and for L1 , . . . , Ln Ker2 with n 2,

Fred2 (m(L1 , . . . , Ln )) = Fred2 (L1 ) Fred2 (Ln ) .

We can now evaluate Fred2 (L ), thus proving (3.9.68) and (3.9.69).

The next lemma shows that K can indeed be of finite rank in cases of interest.

Lemma 3.9.38 Let K Ker2 be smooth, self-dual, and the differential extension
of its entry K21 Ker1 in the lower left. Let I = (t1 ,t2 ) be a bounded interval. Let
" #
a(x, y) b(x, y) 1
= 1II (x, y)K(x, y), e(x, y) = 1II (x, y)sign(x y) ,
c(x, y) d(x, y) 2

thus defining a, b, c, d, e Ker01 . Let


1
(x) = (K11 (x,t1 ) + K11 (x,t2 )) , (3.9.71)
2
(x) = K11 (x,t2 ) K11 (x,t1 ) , (3.9.72)
 x  t2 
1
(x) = (y)dy (y)dy . (3.9.73)
2 t1 x

Let K for {1, 4} be as defined in (3.9.65) and (3.9.66), respectively, with


w = 0. Then
" #
(x)

K1 (x, y) = 1II (x, y) 1 (y) , (3.9.74)
(x)
" #" #
(x)/2 0 1 (y)/2
K4 (x, y) = 1II (x, y) . (3.9.75)
(x) 1 0 (y)/2

We omit the straightforward proof.

Proof of Theorem 3.1.6

We begin by recalling basic objects from the analysis of the GUE in the bulk of
the spectrum. Reverting to the briefer notation introduced in equation (3.6.1), we
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 173

write S(x, y) = Ksine,2 (x, y) for the sine-kernel. Explicitly, equation (3.9.38) says
that
   n
(1)n x1 . . . xn
1 Fbulk,2 (t) = 1 +
x1 . . . xn
S dxi .
n=1 n! [ 2t , 2t ]n i=1

Let R(x, y;t) be the resolvent kernel introduced in Section 3.6.1 (obtained from
the sine-kernel with the choice n = 2, s0 = 0 = s2 , s1 = 1 and t2 = t1 = t/2).
Explicitly, R(x, y;t) is given by
   n
(1)n x x1 xn
(1Fbulk,2 (t))R(x, y;t) = S(x, y)+
y x1 xn
S dxi ,
n=1 n! [ 2t , 2t ]n i=1

and satisfies
 t/2
S(x, y) + S(x, z)R(z, y;t)dz = R(x, y;t) (3.9.76)
t/2

by the fundamental identity, see Lemma 3.4.7. Recall the functions


t/2 
sin x sin y
Q(x;t) = + R(x, y;t) dy ,
t/2
 t/2
cos x cos y
P(x;t) = + R(x, y;t) dy ,
t/2
which are as in definition (3.6.3) as specialized to the case n = 2, s0 = 0, s1 = 1,
s2 = 0, t1 = t/2 and t2 = t/2 studied in Section 3.6.3. Finally, as in (3.6.30), let

p = p(t) = P(t/2;t), q = q(t) = Q(t/2;t) ,

noting that
r = r(t) = 2pq/t , (3.9.77)

is the function appearing in Theorem 3.1.6.


We introduce a systematic method for extracting useful functions of t from
R(x, y;t). A smooth (infinitely differentiable) function (x;t) defined for real x
and positive t will be called a test-function. Given two test-functions 1 and 2 ,
we define
 1/2
1 |2 t = t 1 (tx;t)2 (tx;t)dx
1/2
 1/2  1/2
+t 2 1 (tx;t)R(tx,ty;t)2 (ty;t)dxdy .
1/2 1/2

We call the resulting function of t an angle bracket. Because

R(x, y;t) R(y, x;t) R(x, y;t) , (3.9.78)


174 3. S PACINGS FOR G AUSSIAN ENSEMBLES

the pairing |t is symmetric and, furthermore,

1 (x;t)2 (x;t) 1 (x;t)2 (x;t) 1 |2 t 0 . (3.9.79)

Given a test-function = (x;t), we also define


= (t) = (t/2,t), = (x;t) = (x;t) .
x

Now consider the test-functions

sin x
f (x;t) = ,

1
g(x;t) = (S(x,t/2) + S(x, t/2)) ,
2
1
h(x;t) = (S(x,t/2) S(x, t/2)) ,
2
 x
G(x;t) = g(z;t)dz .
0

By the resolvent identity (3.9.76) and the symmetry (3.9.78) we have

p(t) = f + (t) + g| f t , q(t) = f + (t) + h| f t . (3.9.80)

It follows by (3.9.77) that r(t) is also expressible in terms of angle brackets. To


link the function r(t) to the ratios (3.9.62) in the bulk case, we begin by expressing
the latter in terms of angle brackets, as follows.

Lemma 3.9.39 For each constant t > 0 we have

bulk,1 (t) = 1 2G+ (t) 2h|Gt , (3.9.81)


1
bulk,4 (t) = (1 G+ (t) h|Gt )(1 + g|1t ) . (3.9.82)
2

Proof Let I = (t/2,t/2) and define inputs to Lemma 3.9.37 as follows:


" #
a(x, y) b(x, y)
= 21II (x, y)Ksine,4 (x, y) ,
c(x, y) d(x, y)
1
e(x, y) = 1II (x, y) sign(x y) ,
2
(x, y) = 1II (x, y)S(x, y) , w = 0 .
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 175

Then we have
" #
g(x;t)

K1 (x, y) = 1II (x, y) 1 2h(x;t) ,
G(x;t)
" #" #
g(x;t)/2 0 1 h(y;t)
K4 (x, y) = 1II (x, y) ,
G(x;t) 1 0 g(x;t)/2
" #
0 0
R(x, y) = 1II (x, y) ,
0 R(x, y;t)
where the first two formulas can be checked using Lemma 3.9.38, and the last
formula holds by the resolvent identity (3.9.76).
The right sides of (3.9.68) and (3.9.69) equal bulk, (t) for {1, 4}, respec-
tively, by Corollary 3.9.23. Using Remark 3.9.15, one can check that the left side
of (3.9.68) equals the right side of (3.9.81), which concludes the proof of the latter.
A similar argument shows that the left side of (3.9.69) equals
 " + #
G (t) + h|Gt h|1t
det I2 .
2 g|Gt 12 g|1t
1

But h|1t and g|Gt are forced to vanish identically by (3.9.79). This concludes
the proof of (3.9.82).

Toward the goal of evaluating the logarithmic derivatives of the right sides of
(3.9.81) and (3.9.82), we prove a final lemma. Given a test-function = (x;t),
let D = (D )(x;t) = (x x + t t ) (x;t). In the statement of the lemma and the
calculations following we drop subscripts of t for brevity.

Lemma 3.9.40 For all test-functions 1 , 2 we have


1 |2  + 1 |2  = (3.9.83)
(1+ + g + h|1 )(2+ + g + h|2 ) (1 + g h|1 )(2 + g h|2 ) ,
d
t 1 |2  =
dt
1 |2  + D1 |2  + 1 |D2  + 1 | f  f |2  + 1 | f  f |2  . (3.9.84)

Proof The resolvent identity (3.9.76) and the symmetry S(x, y) S(y, x) yield the
relation
 t/2
g h| t = R(t/2, x;t) (x)dx.
t/2

Formula (3.6.18) with n = 2, s0 = 0 = s2 , s1 = 1, t2 = t1 = t/2 states that


 

+ R(x, y;t) = R(x, t/2;t)R(t/2, y;t) R(x,t/2;t)R(t/2, y;t).
x y
176 3. S PACINGS FOR G AUSSIAN ENSEMBLES

These facts, along with the symmetry (3.9.78) and integration by parts, yield
(3.9.83) after a straightforward calculation. Similarly, using the previously proved
formulas for t R(x, y;t), (x y)R(x, y;t), P (x;t) and Q (x;t), see Section 3.6,
along with the trick
   

1+x +y R= (x y)R + y + R,
x y x x y
one gets
 

1+x +y +t R(x, y;t) = P(x;t)P(y;t) + Q(x;t)Q(y;t) ,
x y t
whence formula (3.9.83) by differentiation under the integral.

To apply the preceding lemma we need the following identities for which the
verifications are straightforward.
d +
h + Dh = f + f , g + Dg = f + f , DG = f + f , t G = f + f + . (3.9.85)
dt
The notation here is severely abbreviated. For example, the third relation written
out in full reads (DG)(x;t) = f + (t) f (x) = f (t/2) f (x). The other relations are
interpreted similarly.
We are ready to conclude. We claim that
d
t (1 2G+ 2h|G)
dt
= 2( f + + h| f )( f + +  f |G) = 2q( f + +  f |G)
= 2q( f + + g| f  2( f + + g| f )(G+ + h|G))
= 2pq(1 2G+ 2h|G) = tr(1 2G+ 2h|G) . (3.9.86)
At the first step we apply (3.9.79), (3.9.84) and (3.9.85). At the second and fourth
steps we apply (3.9.80). At the third step we apply (3.9.83) with 1 = f and
2 = G, using (3.9.79) to simplify. At the last step we apply (3.9.77). Thus the
claim (3.9.86) is proved. The claim is enough to prove (3.1.11) since both sides
of the latter tend to 1 as t 0. Similarly, we have
d
t(1 + g|1) = p f |1 = 2pq(1 + g|1) = tr(1 + g|1) ,
dt
which is enough in conjunction with (3.1.11) to verify (3.1.12). The proof of
Theorem 3.1.6 is complete.

Proof of Theorem 3.1.7

The pattern of the proof of Theorem 3.1.6 will be followed rather closely, albeit
with some extra complications. We begin by recalling the main objects from the
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 177

analysis of the GUE at the edge of the spectrum. We revert to the abbreviated
notation A(x, y) = KAiry,2 (x, y). Explicitly, equation (3.9.47) says that

   n
(1)n x1 ... xn
Fedge,2 (t) = 1 + A dxi .
n=1 n! [t,)n x1 ... xn i=1

Let R(x, y;t) be the resolvent kernel studied in Section 3.8. Explicitly, R(x, y;t) is
given by

  
(1)n x x1 xn n
Fedge,2 (t)R(x, y;t) = A(x, y) + A dxi ,
n=1 n! (t,)n y x1 xn i=1

and by Lemma 3.4.7 satisfies



A(x, y) + A(x, z)R(z, y;t)dz = R(x, y;t) . (3.9.87)
t

Recall the functions



Q(x;t) = Ai(x) + R(x, y;t) Ai(y)dy , q = q(t) = Q(t;t) ,
t

which are as in definition (3.8.3), noting that q is the function appearing in Theo-
rem 3.1.7.
Given any smooth functions 1 = 1 (x;t) and 2 = 2 (x;t) defined on R2 , we
define

1 |2 t = 1 (t + x;t)2 (t + x;t)dx
0
 
+ 1 (t + x;t)R(t + x,t + y;t)2 (t + y;t)dxdy ,
0 0

provided that the integrals converge absolutely for each fixed t. We call the result-
ing function of t an angle bracket. Since the kernel R(x, y;t) is symmetric in x and
y, we have 1 |2 t = 2 |1 t .
We will only need finitely many explicitly constructed pairs (1 , 2 ) to substi-
tute into |t . For each of these pairs it will be clear using the estimates (3.9.56)
and (3.9.59) that the integrals above converge absolutely, and that differentiation
under the integral is permissible.
We now define the finite collection of smooth functions of (x,t) R2 from
178 3. S PACINGS FOR G AUSSIAN ENSEMBLES

which we will draw pairs to substitute into |t . Let

f = f (x;t) = Ai(x) ,
g = g(x;t) = A(t, x) ,

F = F(x;t) = f (z)dz ,
x
G = G(x;t) = g(z;t)dz .
x

Given any smooth function = (x;t), it is convenient to define


= (x;t) = (x;t) ,
x
= (t) = (t;t) ,
 

D = (D )(x;t) = + (x;t) .
x y

We have
d
D f = f , DF = F = f , G = g , F = f, (3.9.88)
dt

d
Dg = f f , DG = f F , G = (F )2 /2 , G = f F , (3.9.89)
dt

the first four relations clearly, and the latter four following from the integral rep-
resentation (3.9.58) of A(x, y). We further have

q = f +  f |g , (3.9.90)

by (3.9.87). The next lemma links q to the ratios (3.9.62) in the edge case by
expressing these ratios in terms of angle brackets. For {1, 4} let


h 1 12 F " #
g =
12
,1 1
g
,
2 + 4F f
f 0 1

" # ,1 ,1 1 G
G
1
F + F
= 2 4
,1
2 4 1 .
F 0 1
2 2 F
3.9 L IMITING B EHAVIOR OF THE GOE AND THE GSE 179

Lemma 3.9.41 For each real t we have


 " #
F (t)/2 + h1 |G1 t h1 |F1 t
edge,1 (t) = det I2 , (3.9.91)
 f1 |G1 t  f1 |F1 t

h4 |G4 t /2 h4 |1t /2 h4 |F4 t /2
edge,4 (t) = det I3 g4 |G4 t /2 g4 |1t /2 g4 |F4 t /2 .
 f4 |G4 t  f4 |1t  f4 |F4 t
(3.9.92)

It is easy to check that all the angle brackets are well defined.
Proof We arbitrarily fix real t, along with {1, 4} and > 0. Let K = E +KAiry,1
if = 1 and otherwise let K = 2KAiry,4 if = 4. Let I = (t, ) and define inputs
to Lemma 3.9.37 as follows.
" #
a(x, y) b(x, y)
= 1II (x, y)K(x, y) ,
c(x, y) d(x, y)
1
e(x, y) = 1II (x, y) sign(x y) ,
2
(x, y) = 1II (x, y)A(x, y) ,
  
1
w(x, y) = ,1 Ai(z)dz Ai(y) .
2 x

Using Lemma 3.9.38 with t1 = t and t2 , one can verify after a straightforward
if long calculation that if = 1, then
" #" #
g1 (y;t) 0 1 h1 (x;t)
K1 (x, y) = 1II (x, y) ,
G1 (y;t) F1 (y;t) 0 f1 (x;t)
whereas, if = 4, then

" # 1 h4 (y;t)/2
g4 (x;t)/2 0 0 0 g4 (y;t)/2 .
K4 (x, y) = 1II (x, y)
G4 (x;t) 1 F4 (x;t)
0 f4 (y;t)
We also have
" #
0 0
R(x, y) = 1II (x, y) .
0 R(x, y;t)
The right sides of (3.9.68) and (3.9.69) equal edge, (t) for {1, 4}, respec-
tively, by Corollary 3.9.25. Using Remark 3.9.15, and the identity

,1
g (x;t)dx = F (t) ,
t 2
which follows from (3.9.88) and the definitions, one can check that for = 1 the
180 3. S PACINGS FOR G AUSSIAN ENSEMBLES

left side of (3.9.68) equals the right side of (3.9.91), and that for = 4, the left
side of (3.9.69) equals the right side of (3.9.92). This completes the proof.

One last preparation is required. For the rest of the proof we drop the subscript
t, writing 1 |2  instead of 1 |2 t . For 1 { f , g} and 2 {1, F, G}, we have
d
1 |2  = D1 |2  + 1 |D2   f |1  f |2  , (3.9.93)
dt
1 |2  + 1 |2  = (1 + g|1 )(2 + g|2 ) +  f |1  f |2  , (3.9.94)
as
 one verifies
 by straightforwardly applying the previously obtained formulas for

x + y R(x, y;t) and t R(x, y;t), see Section 3.8.
We now calculate using (3.9.88), (3.9.89), (3.9.90), (3.9.93) and (3.9.94). We
have
d
(1 + g|1) = q( f |1) ,
dt
d
( f |1) =  f |1 +  f | f 1| f  = q(1 + g|1) ,
dt
d
(1  f |F) =  f |F  f | f  +  f | f  f |F = q(F + g|F) ,
dt
d
(F + g|F) = q(1  f |F) ,
dt
g|1 = (G + g|G)(1 + g|1) +  f |G f |1 ,
g|F +  f |G = (G + g|G)(F + g|F) +  f |F f |G .
The first four differential equations are easy to integrate, and moreover the con-
stants of integration can be fixed in each case by noting that the angle brackets
tend to 0 as t +, as does q. In turn, the last two algebraic equations are easily
solved for g|G and  f |G. Letting
  
x = x(t) = exp q(x)dx ,
t

we thus obtain the relations


" #
g|G g|1 g|F
(3.9.95)
 f |G  f |1  f |F

x+x1 1 x+x1 xx1
2 xx2 F + (F )2 /2 1 2 1 2 F
= x+x1 xx1 1 x+x1
.
2 F 2 xx2 1 2
It remains only to use these formulas to evaluate the determinants on the right sides
of (3.9.91) and (3.9.92) in terms of x and F . The former determinant evaluates
1
to x and the latter to x+2+x
4 . The proof of Theorem 3.1.7 is complete.

3.10 B IBLIOGRAPHICAL NOTES 181

Remark 3.9.42 The evaluations of determinants which conclude the proof above
are too long to suffer through by hand. Fortunately one can organize them into
manipulations of matrices with entries that are (Laurent) polynomials in variables
x and F , and carry out the details with a computer algebra system.

3.10 Bibliographical notes

The study of spacings between eigenvalues of random matrices in the bulk was
motivated by Wigners surmise [Wig58], that postulated a density of spacing
distributions of the form Cses /4 . Soon afterwords, it was realized that this was
2

not the case [Meh60]. This was followed by the path-breaking work [MeG60],
that established the link with orthogonal polynomials and the sine-kernel. Other
relevant papers from that early period include the series [Dys62b], [Dys62c],
[Dys62d] and [DyM63]. An important early paper concerning the orthogonal
and symplectic ensembles is [Dys70]. Both the theory and a description of the
history of the study of spacings of eigenvalues of various ensembles can be found
in the treatise [Meh91]. The results concerning the largest eigenvalue are due to
[TrW94a] for the GUE (with a 1992 ArXiv online posting), and [TrW96] for the
GOE and GSE; a good review is in [TrW93]. These results have been extended
in many directions; at the end of this section we provide a brief description and
pointers to the relevant (huge) literature.
The book [Wil78] contains an excellent short introduction to orthogonal poly-
nomials as presented in Section 3.2. Other good references are the classical
[Sze75] and the recent [Ism05]. The three term recurrence and the Christoffel
Darboux identities mentioned in Remark 3.2.6 hold for any system of polynomials
orthogonal with respect to a given weight on the real line.
Section 3.3.1 follows [HaT03], who proved (3.3.11) and observed that differ-
ential equation (3.3.12) implies a recursion for the moments of LN discovered by
[HaZ86] in the course of the latters investigation of the moduli space of curves.
Their motivation came from the following: at least formally, we have the expan-
sion
s2p
LN , es  = LN , x2p .
p0 2p!

Using graphical rules for the evaluation of expectations of products of Gaussian


variables (Feynmans diagrams), one checks that LN , x2p  expands formally into
1
N 2g N C tr(X2p ),g (1)
g0
182 3. S PACINGS FOR G AUSSIAN ENSEMBLES

with N C tr(X2p ),g (1) the number of perfect matchings on one vertex of degree 2p
whose associated graph has genus g. Hence, computing LN , es  as in Lemma
3.3.1 gives exact expressions for the numbers N C tr(X2p ),g (1). The link between
random matrices and the enumeration of maps was first described in the physics
context in [tH74] and [BrIPZ78], and has since been enormously developed, also
to situations involving multi-matrices, see [GrPW91], [FrGZJ95] for a descrip-
tion of the connection to quantum gravity. In these cases, matrices do not have in
general independent entries but their joint distribution is described by a Gibbs
measure. When this joint distribution is a small perturbation of the Gaussian
law, it was shown in [BrIPZ78] that, at least at a formal level, annealed mo-
ments LN , x2p  expands formally into a generating function of the numbers of
maps. For an accessible introduction, see [Zvo97], and for a discussion of the as-
sociated asymptotic expansion (in contrast with formal expansion), see [GuM06],
[GuM07], [Mau06] and the discussion of RiemannHilbert methods below.
The sharp concentration estimates for max contained in Lemma 3.3.2 are de-
rived in [Led03].
Our treatment of Fredholm determinants in Section 3.4 is for the most part
adapted from [Tri85]. The latter gives an excellent short introduction to Fredholm
determinants and integral equations from the classical viewpoint.
The beautiful set of nonlinear partial differential equations (3.6.4), contained in
Theorem 3.6.1, is one of the great discoveries reported in [JiMMS80]. Their work
follows the lead of the theory of holonomic quantum fields developed by Sato,
Miwa and Jimbo in the series of papers [SaMJ80]. The link between Toeplitz
and Fredholm determinants and the Painleve theory of ordinary differential equa-
tions was earlier discussed in [WuMTB76], and influenced the series [SaMJ80].
See the recent monograph [Pal07] for a discussion of these developments in the
original context of the evaluation of correlations for two dimensional fields. To
derive the equations (3.6.4) we followed the simplified approach of [TrW93], how-
ever we altered the operator-theoretic viewpoint of [TrW93] to a matrix algebra
viewpoint consistent with that taken in our general discussion in Section 3.4 of
Fredholm determinants. The differential equations have a Hamiltonian structure
discussed briefly in [TrW93]. The same system of partial differential equations is
discussed in [Mos80] in a wider geometrical context. See also [HaTW93].
Limit formula (3.7.4) appears in the literature as [Sze75, Eq. 8.22.14, p. 201]
but is stated there without much in the way of proof. The relatively short self-
contained proof of (3.7.4) presented in Section 3.7.2 is based on the ideas of
[PlR29]; the latter paper is, however, devoted to the asymptotic behavior of the
Hermite polynomials Hn (x) for real positive x only.
3.10 B IBLIOGRAPHICAL NOTES 183

In Section 3.8, we follow [TrW02] fairly closely. It is possible to work out a sys-
tem of partial differential equations for the Fredholm determinant of the Airy ker-
nel in the multi-interval case analogous to the system (3.6.4) for the sine-kernel.
See [AdvM01] for a general framework that includes also non-Gaussian models.
As in the case of the sine-kernel, there is an interpretation of the system of partial
differential equations connected to the Airy kernel in the multi-interval case as an
integrable Hamiltonian system, see [HaTW93] for details.
The statement contained in Remark 3.8.1, taken from [HaM80], is a solution
of a connection problem. For another early solution to connection problems, see
[McTW77]. The book [FoIKN06] contains a modern perspective on Painleve
equations and related connection problems, via the RiemannHilbert approach.
Precise asymptotics on the TracyWidom distribution are contained in [BaBD08]
and [DeIK08].
Section 3.9 borrows heavily from [TrW96] and [TrW05], again reworked to our
matrix algebra viewpoint.
Our treatment of Pfaffians in Section 3.9.1 is classical, see [Jac85] for more
information. We avoided the use of quaternion determinants; for a treatment based
on these, see e.g. [Dys70] and [Meh91].
An analog of Lemma 3.2.2 exists for = 1, 4, see Theorem 6.2.1 and its proof
in [Meh91] (in the language of quaternion determinants) and the exposition in
[Rai00] (in the Pfaffian language).
As mentioned above, the results of this chapter have been extended in many
directions, seeking to obtain universality results, stating that the limit distributions
for spacings at the bulk and the edge of the GOE/GUE/GSE appear also in other
matrix models, and in other problems. Four main directions for such universality
occur in the literature, and we describe these next.
First, other classical ensembles have been considered (see Section 4.1 for what
ensembles mean in this context). These involve the study of other types of orthog-
onal polynomials than the Hermite polynomials (e.g., Laguerre or Jacobi). See
[For93], [For94], [TrW94b], [TrW00], [Joh00], [John01], [For06], and the book
[For05].
Second, one may replace the entries of the random matrix by non-Gaussian
entries. In that case, the invariance of the law under conjugation is lost, and no ex-
plicit expression for the joint distribution of the eigenvalues exist. It is, however,
remarkable that it is still possible to obtain results concerning the top eigenvalue
and spacings at the edge that are of the same form as Theorems 3.1.4 and 3.1.7,
in the case that the law of the entries possesses good tail properties. The seminal
184 3. S PACINGS FOR G AUSSIAN ENSEMBLES

work is [Sos99], who extended the combinatorial techniques in [SiS98b] to show


that the dominant term in the evaluation of traces of large powers of random ma-
trices does not depend on the law of the entry, as long as the mean is zero, the vari-
ance as in the GOE/GUE, and the distribution of the entries is symmetric. This has
been extended to other models, and specifically to certain Wishart matrices, see
[Sos02b] and [Pec09]. Some partial results relaxing the symmetry assumption can
be found in [PeS07], [PeS08b], although at this time the universality at the edge
of Wigner matrices with entries possessing non-symmetric distribution remains
open. When the entries possess heavy tail, limit laws for the largest eigenvalue
change, see [Sos04], [AuBP07]. Concerning the spacing in the bulk, universality
was proved when the i.i.d. entries are complex and have a distribution that can be
written as convolution with a Gaussian law, see [Joh01b] (for the complex Wigner
case) and [BeP05] (for the complex Wishart case). The proof is based on an ap-
plication of the ItzyksonZuberHarish-Chandra formula, see the bibliographical
notes for Chapter 4. Similar techniques apply to the study of the largest eigenvalue
of so called spiked models, which are matrices of the form XT X with X possess-
ing i.i.d. complex entries and T a diagonal real matrix, all of whose entries except
for a finite number equal to 1, and to small rank perturbations of Wigner matrices,
see [BaBP05], [Pec06], [FeP07], [Kar07b] and [Ona08]. Finally, a wide ranging
extension of the universality results in [Joh01b] to Hermitian matrices with inde-
pendent entries on and above the diagonal appears in [ERSY09], [TaV09b] and
[ERS+ 09].

Third, one can consider joint distribution of eigenvalues of the form (2.6.1), for
general potentials V . This is largely motivated by applications in physics. When
deriving the bulk and edge asymptotics, one is naturally led to study the asymp-
totics of orthogonal polynomials associated with the weight eV . At this point,
the powerful RiemannHilbert approach to the asymptotics of orthogonal poly-
nomials and spacing distributions can be applied. Often, that approach yields the
sharpest estimates, especially in situations where the orthogonal polynomials are
not known explicitly, thereby proving universality statements for random matri-
ces. Describing this approach in detail goes beyond the scope of this book (and
bibliography notes). For the origins and current state of the art of this approach we
refer the reader to the papers [FoIK92], [DeZ93], [DeZ95], [DeIZ97], [DeVZ97]
[DeKM+ 98], [DeKM+ 99], [BlI99], to the books [Dei99], [DeG09] and to the
lecture [Dei07]. See also [PaS08a].

Finally, expressions similar to the joint distribution of the eigenvalues of ran-


dom matrices have appeared in the study of various combinatorial problems. Ar-
guably, the most famous is the problem of the longest increasing subsequence of
a random permutation, also known as Ulams problem, which we now describe.
3.10 B IBLIOGRAPHICAL NOTES 185

Let Ln denote the length of the longest increasing subsequence of a random per-
mutation on {1, . . . , n}. The problem is to understand the asymptotics of the law
of Ln . Based on his subadditive ergodic theorem, Hammersley [Ham72] showed

that Ln / n converges to a deterministic limit and, shortly thereafter, [VeK77] and
[LoS77] independently proved that the limit equals 2. It was conjectured (in anal-
ogy with conjectures for first passage percolation, see [AlD99] for some of the

history and references) that Ln := (Ln 2 n)/n1/6 has variance of order 1. Using
a combinatorial representation, due to Gessel, of the distribution of Ln in terms
of an integral over an expression resembling a joint distribution of eigenvalues
(but with non-Gaussian potential V ), [BaDJ99] applied the RiemannHilbert ap-
proach to prove that not only is the conjecture true, but in fact Ln asymptotically
is distributed according to the TracyWidom distribution F2 . Subsequently, di-
rect proofs that do not use the RiemannHilbert approach (but do use the random
matrices connection) emerged, see [Joh01a], [BoOO00] and [Oko00]. Certain
growth models also fall in the same pattern, see [Joh00] and [PrS02]. Since then,
many other examples of combinatorial problems leading to a universal behavior
of the TracyWidom type have emerged. We refer the reader to the forthcoming
book [BaDS09] for a thorough discussion.
We have not discussed, neither in the main text nor in these bibliographical
notes, the connections between random matrices and number theory, more specif-
ically the connections with the Riemann zeta function. We refer the reader to
[KaS99] for an introduction to these links, and to [Kea06] for a recent account.
4
Some generalities

In this chapter, we introduce several tools useful in the study of matrix ensem-
bles beyond GUE, GOE and Wigner matrices. We begin by setting up in Section
4.1 a general framework for the derivation of joint distribution of eigenvalues in
matrix ensembles and then we use it to derive joint distribution results for several
classical ensembles, namely, the GOE/GUE/GSE, the Laguerre ensembles (corre-
sponding to Gaussian Wishart matrices), the Jacobi ensembles (corresponding to
random projectors) and the unitary ensembles (corresponding to random matrices
uniformly distributed in classical compact Lie groups). In Section 4.2, we study
a class of point processes that are determinantal; the eigenvalues of the GUE, as
well as those for the unitary ensembles, fall within this class. We derive a repre-
sentation for determinantal processes and deduce from it a CLT for the number
of eigenvalues in an interval, as well as ergodic consequences. In Section 4.3,
we analyze time-dependent random matrices, where the entries are replaced by
Brownian motions. The introduction of Brownian motion allows us to use the
powerful theory of Ito integration. Generalizations of the Wigner law, CLTs, and
large deviations are discussed. We then present in Section 4.4 a discussion of
concentration inequalities and their applications to random matrices, substantially
extending Section 2.3. Concentration results for matrices with independent en-
tries, as well as for matrices distributed according to Haar measure on compact
groups, are discussed. Finally, in Section 4.5, we introduce a tridiagonal model of
random matrices, whose joint distribution of eigenvalues generalizes the Gaussian
ensembles by allowing for any value of 1 in Theorem 2.5.3. We refer to this
matrix model as the beta ensemble.

186
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 187

4.1 Joint distribution of eigenvalues in the classical matrix ensembles

In Section 2.5, we derived an expression for the joint distribution of eigenvalues


of a GUE or GOE matrix which could be stated as an integration formula, see
(2.5.22). Although we did not emphasize it in our derivation, a key point was that
the distribution of the random matrices was invariant under the action of a group
(orthogonal for the GOE, unitary for the GUE). A collection of matrices equipped
with a probability measure invariant under a large group of symmetries is gener-
ally called an ensemble. It is our goal in this section to derive integration formulas,
and hence joint distribution of eigenvalues, for several ensembles of matrices, in
a unified way, by following in the footsteps of Weyl. The point of view we adopt
is that of differential geometry, according to which we consider ensembles of ma-
trices as manifolds embedded in Euclidean spaces. The prerequisites and notation
are summarized in Appendix F.
The plan for Section 4.1 is as follows. In Section 4.1.1, after briefly recalling
notation, we present the main results of Section 4.1, namely integration formu-
las yielding joint distribution of eigenvalues in three classical matrix ensembles
linked to Hermite, Laguerre and Jacobi polynomials, respectively, and also Weyls
integration formulas for the classical compact Lie groups. We then state in Section
4.1.2 a special case of Federers coarea formula and illustrate it by calculating the
volumes of unitary groups. (A proof of the coarea formula in the easy version
used here is presented in Appendix F.) In Section 4.1.3 we present a general-
ized Weyl integration formula, Theorem 4.1.28, which we prove by means of the
coarea formula and a modest dose of Lie group theory. In Section 4.1.4 we verify
the hypotheses of Theorem 4.1.28 in each of the setups discussed in Section 4.1.1,
thus completing the proofs of the integration formulas by an updated version of
Weyls original method.

4.1.1 Integration formulas for classical ensembles

Throughout this section, we let F denote any of the (skew) fields R, C or H. (See
Appendix E for the definition of the skew field of quaternions H. Recall that H
is a skew field, but not a field, because the product in H is not commutative.)
We set = 1, 2, 4 according as F = R, C, H, respectively. (Thus is the dimen-
sion of F over R.) We next recall matrix notation which in greater detail is set
out in Appendix E.1. Let Mat pq (F) be the space of p q matrices with en-
tries in F, and write Matn (F) = Matnn (F). For each matrix X Mat pq (F), let
X Matqp (F) be the matrix obtained by transposing X and then applying the
conjugation operation to every entry. We endow Mat pq (F) with the structure
188 4. S OME GENERALITIES

of Euclidean space (that is, with the structure of finite-dimensional real Hilbert
space) by setting X Y = tr X Y . Let GLn (F) be the group of invertible ele-
ments of Matn (F), and let Un (F) be the subgroup of GLn (F) consisting of unitary
matrices; by definition U Un (F) iff UU = In iff U U = In .

The Gaussian ensembles

The first integration formula that we present pertains to the Gaussian ensembles,
that is, to the GOE, GUE and GSE. Let Hn (F) = {X Matn (F) : X = X}. Let
Hn (F) denote the volume measure on Hn (F). (See Proposition F.8 for the general
definition of the volume measure M on a manifold M embedded in a Euclidean
space.) Let Un (F) denote the volume measure on Un (F). (We will check below,
see Proposition 4.1.14, that Un (F) is a manifold.) The measures Hn (F) and Un (F)
are just particular normalizations of Lebesgue and Haar measure, respectively. Let
[Un (F)] denote the (finite and positive) total volume of Un (F). (For any manifold
M embedded in a Euclidean space, we write [M] = M (M).) We will calculate
[Un (F)] explicitly in Section 4.1.2. Recall that if x = (x1 , . . . , xn ), then we write
(x) = 1i< jn (x j xi ). The notion of eigenvalue used in the next result is
defined for general F in a uniform way by Corollary E.12 and is the standard one
for F = R, C.

Proposition 4.1.1 For every nonnegative Borel-measurable function on Hn (F)


such that (X) depends only on the eigenvalues of X, we have
 
[Un (F)] n
d Hn (F) = (x)|(x)| dxi , (4.1.1)
( [U1 (F)])n n! Rn i=1

where for every x = (x1 , . . . , xn ) Rn we write (x) = (X) for any X Hn (F)
with eigenvalues x1 , . . . , xn .

According to Corollary E.12, the hypothesis that (X) depends only on the eigen-
values of X could be restated as the condition that (UXU ) = (X) for all
X Hn (F) and U Un (F).
Suppose now that X Hn (F) is random. Suppose more precisely that the en-
tries on or above the diagonal are independent; that each diagonal entry is (real)
Gaussian of mean 0 and variance 2/ ; and that each above-diagonal entry is stan-
dard normal over F. (We say that a random variable G with values in F is stan-
dard normal if, with {Gi }4i=1 independent real-valued Gaussian random variables
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 189

of zero mean and unit variance, we have that G is distributed like

G1 if F = R ,

(G1 + iG2 )/ 2 if F = C ,
(G1 + iG2 + jG3 + kG4 )/2 if F = H .) (4.1.2)

Then for F = R (resp., F = C) the matrix X is a random element of the GOE


(resp., GUE), and in the case F = H is by definition a random element of the
Gaussian Symplectic Ensemble (GSE). Consider now the substitution (X) =
e tr X /4 f (X) in (4.1.1), in conjunction with Proposition 4.1.14 below which
2

computes volumes of unitary groups. For = 1, 2, we recover Theorem 2.5.2


in the formulation given in (2.5.22). In the remaining case = 4 the substitution
yields the joint distribution of the (unordered) eigenvalues in the GSE.

Remark 4.1.2 As in formula (4.1.1), all the integration formulas in this section
involve normalization constants given in terms of volumes of certain manifolds.
Frequently, when working with probability distributions, one bypasses the need
to evaluate these volumes by instead using the Selberg integral formula, Theorem
2.5.8, and its limiting forms, as in our previous discussion of the GOE and GUE
in Section 2.5.

We saw in Chapter 3 that the Hermite polynomials play a crucial role in the
analysis of GUE/GOE/GSE matrices. For that reason we will sometimes speak of
Gaussian/Hermite ensembles. In similar fashion we will tag each of the next two
ensembles by the name of the associated family of orthogonal polynomials.

Laguerre ensembles and Wishart matrices

We next turn our attention to random matrices generalizing the Wishart matrices
discussed in Exercise 2.1.18, in the case of Gaussian entries. Fix integers 0 <
p q and put n = p + q. Let Mat pq (F) be the volume measure on the Euclidean
space Mat pq (F). The analog of integration formula (4.1.1) for singular values of
rectangular matrices is the following. The notion of singular value used here is
defined for general F in a uniform way by Corollary E.13 and is the standard one
for F = R, C.
190 4. S OME GENERALITIES

Proposition 4.1.3 For every nonnegative Borel-measurable function on


Mat pq (F) such that (X) depends only on the singular values of X, we have

[U p (F)] [Uq (F)]2 p/2
d Mat pq (F) = (4.1.3)
[U1 (F)] p [Uqp (F)]2 pq/2 p!
 p
(qp+1)1
p
(x) |(x2 )| xi dxi ,
R+ i=1
p
where for every x = (x1 , . . . , x p ) R+ we write x2 = (x12 , . . . , x2p ), and (x) = (X)
for any X Mat pq (F) with singular values x1 , . . . , x p .

Here and in later formulas, by convention, [U0 (F)] = 1. According to Corol-


lary E.13, the hypothesis that (X) depends only on the singular values of X
could be restated as the condition that (UXV ) = (X) for all U U p (F), X
Mat pq (F) and V Uq (F).
Suppose now that the entries of X Mat pq (F) are i.i.d. standard normal. In
the case F = R the random matrix XX is an example of a Wishart matrix, the
latter as studied in Exercise 2.1.18. In the case of general F we call XX a Gaus-
sian Wishart matrix over F. Proposition 4.1.3 implies that the distribution of the
(unordered) eigenvalues of XX (which are the squares of the singular values of
X) possesses a density on (0, ) p with respect to Lebesgue measure proportional
to
p p
(qp+1)/21
|(x)| e xi /4 xi .
i=1 i=1

Now the orthogonal polynomials corresponding to weights of the form x e x on


(0, ) are the Laguerre polynomials. In the analysis of random matrices of the
form XX , the Laguerre polynomials and their asymptotics play a role analogous
to that played by the Hermite polynomials and their asymptotics in the analysis of
GUE/GOE/GSE matrices. For this reason we also call XX a random element of
a Laguerre ensemble over F.

Jacobi ensembles and random projectors

We first make a general definition. Put


Flagn ( , F) = {U U : U Un (F)} Hn (F), (4.1.4)
where Matn is any real diagonal matrix. The compact set Flagn ( , F) is al-
ways a manifold, see Lemma 4.1.18 and Exercise 4.1.19.
Now fix integers 0 < p q and put n = p + q. Also fix 0 r q p and
write q = p + r + s. Consider the diagonal matrix D = diag(Ip+r , 0 p+s ), and the
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 191

corresponding space Flagn (D, F) as defined in (4.1.4) above. (As in Appendix


E.1, we will use the notation diag to form block-diagonal matrices as well as
matrices diagonal in the usual sense.) Let Flagn (D,F) denote the volume measure
on Flagn (D, F). Given W Flagn (D, F), let W (p) H p (F) denote the upper left
p p block. Note that all eigenvalues of W (p) are in the unit interval [0, 1].

Proposition 4.1.4 With notation as above, for all Borel-measurable nonnegative


functions on H p (F) such that (X) depends only on the eigenvalues of X, we
have

[U p (F)] [Uq (F)]2 p/2
(W (p) )d Flagn (D,F) (W ) =
[U1 (F)] p [Ur (F)] [Us (F)]2 p p!
 p
(r+1) /21
(x) |(x)| (xi (1 xi )(s+1) /21 dxi ) , (4.1.5)
[0,1] p i=1

where for every x = (x1 , . . . , x p ) R p we write (x) = (X) for any matrix X
H p (F) with eigenvalues x1 , . . . , x p .

The symmetry here crucial for the proof is that (W (p) ) = ((UWU )(p) ) for all
U Un (F) commuting with diag(Ip , 0q ) and all W Flagn (D, F).
Now up to a normalization constant, Flagn (D,F) is the law of a random matrix
of the form Un DUn , where Un Un (F) is Haar-distributed. (See Exercise 4.1.19
for evaluation of the constant [Flagn (D, F)].) We call such a random matrix
Un DUn a random projector. The joint distribution of eigenvalues of the submatrix
(Un DUn )(p) is then specified by formula (4.1.5). Now the orthogonal polynomials
corresponding to weights of the form x (1 x) on [0, 1] are the Jacobi polyno-
mials. In the analysis of random matrices of the form (Un DUn )(p) , the Jacobi
polynomials play a role analogous to that played by the Hermite polynomials in
the analysis of GUE/GOE/GSE matrices. For this reason we call (Un DUn )(p) a
random element of a Jacobi ensemble over F.

The classical compact Lie groups

The last several integration formulas we present pertain to the classical compact
Lie groups Un (F) for F = R, C, H, that is, to the ensembles of orthogonal, unitary
and symplectic "matrices, respectively, # equipped with normalized Haar measure.
cos sin
We set R( ) = U2 (R) for R. More generally, for =
sin cos
(1 , . . . , n ) R , we set Rn ( ) = diag(R(1 ), . . . , R(n )) U2n (R). We also write
n

diag( ) = diag(1 , . . . , n ) Matn .


192 4. S OME GENERALITIES

We define nonnegative functions An , Bn ,Cn , Dn on Rn as follows:


2
ii i j
An ( ) = |eii
ei j 2
| , Dn ( ) = An ( ) e e ,
1i< jn 1i< jn
n n
Bn ( ) = Dn ( ) |eii 1|2 , Cn ( ) = Dn ( ) |e eii |2 .
ii
i=1 i=1

(Recall that i equals the imaginary unit viewed as an element of C or H.)

Remark 4.1.5 The choice of letters A, B, C, and D made here is consistent with
the standard labeling of the corresponding root systems.

We say that a function on a group G is central if (g) depends only on the


conjugacy class of g, that is, if (g1 g2 g1
1 ) = (g2 ) for all g1 , g2 G.

Proposition 4.1.6 (Weyl) (Unitary case) For every nonnegative Borel-measurable


central function on Un (C), we have

d Un (C)  n  
1 d i
= (eidiag( )
)An ( ) . (4.1.6)
[Un (C)] n! [0,2 ]n i=1 2

(Odd orthogonal case) For odd n = 2+1 and every nonnegative Borel-measurable
central function on Un (R), we have

d Un (R)    
1 1
d i
= (diag(R ( ), (1) ))B ( ) 2 .
[Un (R)] 2+1 ! [0,2 ] k=0
k
i=1
(4.1.7)
(Symplectic case) For every nonnegative Borel-measurable central function on
Un (H), we have

d Un (H)  n  
1 d i
= (eidiag( )
)Cn ( ) . (4.1.8)
[Un (H)] 2n n! [0,2 ]n i=1 2

(Even orthogonal case) For even n = 2 and every nonnegative Borel-measurable


central function on Un (R) we have

d Un (R)

[Un (R)]
  
1 
d i
= (R ( ))D ( ) (4.1.9)
2 ! [0,2 ] i=1 2
 1  
1 d i
+  (diag(R1 ( ), 1, 1))C1 ( ) .
2 ( 1)! [0,2 ]1 i=1 2
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 193

We will recover these classical results of Weyl in our setup in order to make it
clear that all the results on joint distribution discussed in Section 4.1 fall within
Weyls circle of ideas.

Remark 4.1.7 Because we have

Dn ( ) = (2 cos i 2 cos j )2 ,
1i< jn

the process of eigenvalues of Un (F) is determinantal (see Section 4.2.9 and in


particular Lemma 4.2.50) not only for F = C but also for F = R, H. This is in sharp
contrast to the situation with Gaussian/Hermite, Laguerre and Jacobi ensembles
where, in the cases F = R, H, the eigenvalue (singular value) processes are not
determinantal. One still has tools for studying the latter processes, but they are
Pfaffian- rather than determinant-based, of the same type considered in Section
3.9 to obtain limiting results for GOE/GSE.

4.1.2 Manifolds, volume measures and the coarea formula

Section 4.1.2 introduces the coarea formula, Theorem 4.1.8. In the specialized
form of Corollary 4.1.10, the coarea formula will be our main tool for proving the
formulas of Section 4.1.1. To allow for quick reading by the expert, we merely
state the coarea formula here, using standard terminology; precise definitions,
preliminary material and a proof of Theorem 4.1.8 are all presented in Appendix
F. After presenting the coarea formula, we illustrate it by working out an explicit
formula for [Un (F)].
Fix a smooth map f : M N from an n-manifold to a k-manifold, with deriva-
tive at a point p M denoted T p ( f ) : T p (M) T f (p) (N). Let Mcrit , Mreg , Ncrit
and Nreg be the sets of critical (regular) points (values) of f , see Definition F.3
and Proposition F.10 for the terminology. For q N such that Mreg f 1 (q) is
nonempty (and hence by Proposition F.16 a manifold) we equip the latter with the
volume measure Mreg f 1 (q) (see Proposition F.8). Put 0/ = 0 for convenience.
Finally, let J(T p ( f )) denote the generalized determinant of T p ( f ), see Definition
F.17.

Theorem 4.1.8 (The coarea formula) With notation and setting as above, let
be any nonnegative Borel-measurable function on M. Then:
(i) the function p  J(T p ( f )) on M is Borel-measurable;

(ii) the function q  (p)d Mreg f 1 (q) (p) on N is Borel-measurable;
194 4. S OME GENERALITIES

(iii) the integral formula


   
(p)J(T p ( f ))d M (p) = (p)d Mreg f 1 (q) (p) d N (q) (4.1.10)

holds.

Theorem 4.1.8 is in essence a version of Fubinis Theorem. It is also a particu-


lar case of the general coarea formula due to Federer. The latter formula at full
strength (that is, in the language of Hausdorff measures) requires far less differ-
entiability of f and is much harder to prove.

Remark 4.1.9 Since f in Theorem 4.1.8 is smooth, we have by Sards Theorem


(Theorem F.11) that for N almost every q, Mreg f 1 (q) = f 1 (q). Thus, with
slight abuse of notation, one could write the right side of (4.1.10) with f 1 (q)
replacing Mreg f 1 (q).

Corollary 4.1.10 We continue in the setup of Theorem 4.1.8. For every Borel-
measurable nonnegative function on N one has the integral formula
 
( f (p))J(T p ( f ))d M (p) = [ f 1 (q)] (q)d N (q) . (4.1.11)
Nreg

Proof of Corollary 4.1.10 By (4.1.10) with = f , we have


 
( f (p))J(T p ( f ))d M (p) = [Mreg f 1 (q)] (q)d N (q) ,

whence the result by Sards Theorem (Theorem F.11), Proposition F.16, and the
definitions.


Let Sn1 be the unit sphere centered at the origin in Rn . We will calculate
[Un (F)] by relating it to [Sn1 ]. We prepare by proving two well-known lem-
mas concerning Sn1 and its volume. Their proofs provide templates for the more
complicated proofs of Lemma 4.1.15 and Proposition 4.1.14 below.

Lemma 4.1.11 Sn1 is a manifold and for every x Sn1 we have Tx (Sn1 ) =
{X Rn : x X = 0}.

Proof Consider the smooth map f = (x  x x) : Rn R. Let be a curve with


(0) = x Rn and (0) = X Tx (Rn ) = Rn . We have (Tx ( f ))(X) = ( ) (0) =
2x X. Thus 1 is a regular value of f , whence the result by Proposition F.16.

 s1 x
Recall that (s) = 0 x e dx is Eulers Gamma function.
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 195

Proposition 4.1.12 With notation as above, we have


2 n/2
[Sn1 ] = . (4.1.12)
(n/2)

Proof Consider the smooth map


f = (x  x/x) : Rn \ {0} Sn1 .
Let be a curve with (0) = x Rn \ {0} and (0) = X Tx (Rn \ {0}) = Rn .
We have
 
X x X x
(Tx ( f ))(X) = ( / ) (0) = ,
x x x x
and hence J(Tx ( f )) = x1n . Letting (x) = xn1 exp(x2 ), we have
  
exx dx1 dxn = [Sn1 ] rn1 er dr ,
2

0
by Theorem 4.1.8 applied to f and . Formula (4.1.12) now follows.

As further preparation for the evaluation of [Un (F)], we state without proof
the following elementary lemma which allows us to consider transformations of
manifolds by left (or right) matrix multiplication.

Lemma 4.1.13 Let M Matnk (F) be a manifold. Fix g GLn (F). Let f = (p 
gp) : M gM = {gp Matnk (F) : p M}. Then:
(i) gM is a manifold and f is a diffeomorphism;
(ii) for every p M and X T p (M) we have T p ( f )(X) = gX;
(iii) if g Un (F), then f is an isometry (and hence measure-preserving).

The analogous statement concerning right-multiplication by an invertible matrix


also holds. The lemma, especially part (iii) of it, will be frequently exploited
throughout the remainder of Section 4.1.
Now we can state our main result concerning Un (F) and its volume. Recall in
what follows that = 1, 2, 4 according as F = R, C, H.

Proposition 4.1.14 Un (F) is a manifold whose volume is


n n
2(2 ) k/2
[Un (F)] = 2 n(n1)/4 [S k1 ] = /2 ( k/2)
. (4.1.13)
k=1 k=1 2

The proof of Proposition 4.1.14 will be obtained by applying the coarea formula
to the smooth map
f = (g  (last column of g)) : Un (F) S n1 (4.1.14)
196 4. S OME GENERALITIES

where, abusing notation slightly, we make the isometric identification

S n1 = {x Matn1 (F) : x x = 1}

on the extreme right in (4.1.14).


Turning to the actual proof, we begin with the identification of Un (F) as a man-
ifold and the calculation of its tangent space at In .

Lemma 4.1.15 Un (F) is a manifold and TIn (Un (F)) is the space of anti-self-
adjoint matrices in Matn (F).

Proof Consider the smooth map

h = (X  X X) : Matn (F) Hn (F) .

Let be a curve in Matn (F) with (0) = In and (0) = X TIn (Matn (F)) =
Matn (F). Then, for all g Un (F) and X Matn (F),

(Tg (h))(gX) = ((g ) (g )) (0) = X + X . (4.1.15)

Thus In is a regular value of h, and hence Un (F) is a manifold by Proposition F.16.


To find the tangent space TIn (Un (F)), consider a curve (t) Un (F) with
(0) = In . Then, because XX = In on Un (F) and thus the derivative of h( (t))
vanishes for t = 0, we deduce from (4.1.15) that X +X = 0, and hence TIn (Un (F))
is contained in the space of anti-self-adjoint matrices in Matn (F). Because the lat-
ter two spaces have the same dimension, the inclusion must be an equality.

Recall the function f introduced in (4.1.14).

Lemma 4.1.16 f is onto, and furthermore (provided that n > 1), for any s S n1 ,
the fiber f 1 (s) is isometric to Un1 (F).

Proof The first claim (which should be obvious in the cases F = R, C) is proved
by applying Corollary E.8 with k = 1. To see the second claim, note first that for
any W Un1 (F), we have
" #
W 0
Un (F) , (4.1.16)
0 1

and that every g Un (F) whose last column is the unit vector en = (0, . . . , 0, 1)T
is necessarily of the form (4.1.16). Therefore the fiber f 1 (en ) is isometric to
Un1 (F). To see the claim for other fibers, note that if g, h Un (F), then f (gh) =
g f (h), and then apply part (iii) of Lemma 4.1.13.

4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 197

Lemma 4.1.17 Let f be as in (4.1.14). Then:


(i) J(Tg ( f )) is constant as a function of g Un (F);
(1n)
(ii) J(TIn ( f )) = 2 ;
(iii) every value of f is regular.

Proof (i) Fix h Un (F) arbitrarily. Let en = (0, . . . , 0, 1)T Matn1 . The diagram
TI ( f )
TIn (Un (F)) n Ten (S n1 )
TIn (ghg) Ten (xhx)
Th ( f )
Th (Un (F)) T f (h) (S n1 )
commutes. Furthermore, its vertical arrows are, by part (ii) of Lemma 4.1.13,
induced by left-multiplication by h, and hence are isometries of Euclidean spaces.
Therefore we have J(Th ( f )) = J(TIn ( f )).
(ii) Recall the notation i, j, k in Definition E.1. Recall the elementary matrices
ei j Matn (F) with 1 in position (i, j) and 0s elsewhere, see Appendix E.1. By
Lemma 4.1.15 the collection

{(uei j u e ji )/ 2 : 1 i < j n, u {1, i, j, k} F}
{ueii : 1 i n, u {i, j, k} F}
is an orthonormal basis for TIn (Un (F)). Let be a curve in Un (F) with (0) = In
and (0) = X TIn (Un (F)). We have
(TIn ( f ))(X) = ( en ) (0) = Xen ,
hence the collection

{(uein u eni )/ 2 : 1 i < n, u {1, i, j, k} F}
{uenn : u {i, j, k} F}
is an orthonormal basis for TIn (Un (F))(ker(TIn ( f ))) . An application of Lemma
F.19 yields the desired formula.
(iii) This follows from the preceding two statements, since f is onto.

Proof of Proposition 4.1.14 Assume at first that n > 1. We apply Corollary 4.1.10
to f with 1. After simplifying with the help of the preceding two lemmas, we
find the relation
(1n)
2 [Un (F)] = [Un1 (F)] [S n1 ] .
By induction on n we conclude that formula (4.1.13) holds for all positive integers
n; the induction base n = 1 holds because S 1 = U1 (F).

With an eye toward the proof of Proposition 4.1.4 about Jacobi ensembles, we
prove the following concerning the spaces Flagn ( , F) defined in (4.1.4).
198 4. S OME GENERALITIES

Lemma 4.1.18 With p, q, n positive integers so that p+q = n, and D = diag(Ip , 0q ),


the collection Flagn (D, F) is a manifold of dimension pq.

Proof In view of Corollary E.12 (the spectral theorem for self-adjoint matrices
over F), Flagn (D, F) is the set of projectors in Matn (F) of trace p. Now consider
the open set O Hn (F) consisting of matrices whose p-by-p block in upper left
is invertible, noting that D O. Using Corollary E.9, one can construct a smooth
map from Mat pq (F) to O Flagn (D, F) with a smooth inverse. Now let P
Flagn (D, F) be any point. By definition P = U DU for some U Un (D, F). By
Lemma 4.1.13 the set {UMU | M O Flagn (D, F)} is a neighborhood of P
diffeomorphic to O Flagn (D, F) and hence to Mat pq (F). Thus Flagn (D, F) is
indeed a manifold of dimension pq.

Motivated by Lemma 4.1.18, we refer to Flagn (D, F) as the flag manifold deter-
mined by D. In fact the claim in Lemma 4.1.18 holds for all real diagonal matrices
D, see Exercise 4.1.19 below.

Exercise 4.1.19 Fix 1 , . . . , n R and put = diag(1 , . . . , n ). In this exercise


we study Flagn ( , F). Write {1 < <  } = {1 , . . . , n } and let ni be the
number of indices j such that i = j . (Thus, n = n1 + + n .)
(a) Prove that Flagn ( , F) is a manifold of dimension equal to


dim Un (F) dim Uni (F).
i=1

(b) Applying the coarea formula to the smooth map f = (g  g g1 ) : Un (F)


Flagn (D, F), show that

[Un (F)]
[Flagn ( , F)] = 
i=1 [U (F)]
|i j | . (4.1.17)
ni 1i< jn
i = j

Exercise 4.1.20 We look at joint distribution of eigenvalues in the Gaussian en-


sembles (GUE/GOE/GSE) in yet another way. We continue with the notation of
the previous exercise.
(a) Consider the smooth map f = (A  (tr(A), tr(A2 )/2, . . . , tr(An )/n)) : Hn (F)
Rn . Show that J(TA ( f )) depends only on the eigenvalues of A Hn (F), that
J(T ( f )) = |( )|, and that a point of Rn is a regular value of f if and only if it
is of the form f (X) for some X Hn (F) with distinct eigenvalues.
(b) Applying the coarea formula to f , prove that for any nonnegative Borel-
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 199

measurable function on Hn (F),


    
d Hn (F) = d Flagn ( ,F) d 1 d n . (4.1.18)
3 45 6
<1 <<n <
=diag(1 ,...,n )

(c) Derive the joint distribution of eigenvalues in the GUE, GOE and GSE from
(4.1.17) and (4.1.18).

Exercise 4.1.21 Fix 1 , . . . , n C and put = diag(1 , . . . , n ). Let Flagn ( , C)


be the set of normal matrices with the same eigenvalues as . (When has real
entries, then Flagn ( , C) is just as we defined it before.) Show that in this extended
setting Flagn ( , C) is again a manifold and that formula (4.1.17), with F = C and
= 2, still holds.

4.1.3 An integration formula of Weyl type

For the rest of Section 4.1 we will be working in the setup of Lie groups, see
Appendix F for definitions and basic properties. We aim to derive an integration
formula of Weyl type, Theorem 4.1.28, in some generality, which encompasses
all the results enunciated in Section 4.1.1.
Our immediate goal is to introduce a framework within which a uniform ap-
proach to derivation of joint eigenvalue distributions is possible. For motivation,
suppose that G and M are submanifolds of Matn (F) and that G is a closed sub-
group of Un (F) such that {gmg1 : m M, g G} = M. We want to integrate
out the action of G. More precisely, given a submanifold M which satisfies
M = {g g1 : g G, }, and a function on M such that (gmg1 ) = (m)

for all m M and g G, we want to represent d M in a natural way as an
integral on . This is possible if we can control the set of solutions (g, ) G
of the equation g g1 = m for all but a negligible set of m M. Such a procedure
was followed in Section 2.5 when deriving the law of the eigenvalues of the GOE.
However, as was already noted in the derivation of the law of the eigenvalues of
the GUE, decompositions of the form m = g g1 are not unique, and worse, the
set {(g, ) G : g g1 = m} is in general not discrete. Fortunately, however,
it typically has the structure of compact manifold. These considerations (and hind-
sight based on familiarity with classical matrix ensembles) motivate the following
definition.

Definition 4.1.22 A Weyl quadruple (G, H, M, ) consists of four manifolds G,


200 4. S OME GENERALITIES

H, M and with common ambient space Matn (F) satisfying the following condi-
tions:
(I) (a) G is a closed subgroup of Un (F),
(b) H is a closed subgroup of G, and
(c) dim G dim H = dim M dim .
(II) (a) M = {g g1 : g G, },
(b) = {h h1 : h H, },
(c) for every the set {h h1 : h H} is finite, and
(d) for all , we have = .
(III) There exists such that
(a) is open in ,
(b) ( \ ) = 0, and
(c) for every we have H = {g G : g g1 }.
We say that a subset for which (IIIa,b,c) hold is generic.

We emphasize that by conditions (Ia,b), the groups G and H are compact, and
that by Lemma 4.1.13(iii), the measures G and H are Haar measures. We also
remark that we make no connectedness assumptions concerning G, H, M and
. (In general, we do not require manifolds to be connected, although we do
assume that all tangent spaces of a manifold are of the same dimension.) In fact,
in practice, H is usually not connected.
In the next proposition we present the simplest example of a Weyl quadruple.
We recall, as in Definition E.4, that a matrix h Matn (F) is monomial if it factors
as the product of a diagonal matrix and a permutation matrix.

Proposition 4.1.23 Let G = Un (F) and let H G be the subset consisting of


monomial elements. Let M = Hn (F), let M be the subset consisting of (real)
diagonal elements, and let be the subset consisting of matrices with dis-
tinct diagonal entries. Then (G, H, M, ) is a Weyl quadruple with ambient space
Matn (F) for which the set is generic, and furthermore

[G] [Un (F)]


= . (4.1.19)
[H] n! [U1 (F)]n

This Weyl quadruple and the value of the associated constant [G]/ [H] will be
used to prove Proposition 4.1.1.
Proof Of all the conditions imposed by Definition 4.1.22, only conditions (Ic),
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 201

(IIa) and (IIIc) require special attention, because the others are clear. To verify
condition (Ic), we note that

dim M = n + n(n 1)/2, dim = n ,


dim G = ( 1)n + n(n 1)/2, dim H = ( 1)n .

The first two equalities are clear since M and are real vector spaces. By Lemma
4.1.15 the tangent space TIn (G) consists of the collection of anti-self-adjoint ma-
trices in Matn (F), and thus the third equality holds. So does the fourth because
TIn (H) consists of the diagonal elements of TIn (G). Thus condition (Ic) holds.
To verify condition (IIa), we have only to apply Corollary E.12(i) which asserts
the possibility of diagonalizing a self-adjoint matrix. To verify condition (IIIc),
arbitrarily fix , and g G such that g g1 = , with the goal to show
that g H. In any case, by Corollary E.12(ii), the diagonal entries of are merely
a rearrangement of those of . After left-multiplying g by a permutation matrix
(the latter belongs by definition to H), we may assume that = , in which case g
commutes with . Then, because the diagonal entries of are distinct, it follows
that g is diagonal and thus belongs to H. Thus (IIIc) is proved. Thus (G, H, M, )
is a Weyl quadruple for which is generic.
We turn to the verification of formula (4.1.19). It is clear that the numerator on
the right side of (4.1.19) is correct. To handle the denominator, we observe that H
is the disjoint union of n! isometric copies of the manifold U1 (F)n , and then apply
Proposition F.8(vi). Thus (4.1.19) is proved.

Note that condition (IIa) of Definition 4.1.22 implies that gmg1 M for all
m M and g G. Thus the following definition makes sense.

Definition 4.1.24 Given a Weyl quadruple (G, H, M, ) and a function on M


(resp., a subset A M), we say that (resp., A) is G-conjugation-invariant if
(gmg1 ) = (m) (resp., 1A (gmg1 ) = 1A (m)) for all g G and m M.

Given a Weyl quadruple (G, H, M, ) and a G-conjugation-invariant nonnega-



tive Borel-measurable function on M, we aim now to represent d M as an
integral on . Our strategy for achieving this is to apply the coarea formula to the
smooth map
f = (g  g g1 ) : G M . (4.1.20)

For the calculation of the factor J(T(g, ) ( f )) figuring in the coarea formula for the
map f we need to understand for each fixed the structure of the derivative
at In G of the map
f = (g  g g1 ) : G M (4.1.21)
202 4. S OME GENERALITIES

obtained by freezing the second variable in f . For study of the derivative


TIn ( f ) the following ad hoc version of the Lie bracket will be useful.

Definition 4.1.25 Given X,Y Matn (F), let [X,Y ] = XY Y X.

Concerning the derivative TIn ( f ) we then have the following key result.

Lemma 4.1.26 Fix a Weyl quadruple (G, H, M, ) with ambient space Matn (F)
and a point . Let f be as in (4.1.21). Then we have

TIn ( f )(TIn (H)) = 0 , (4.1.22)


TIn ( f )(X) = [X, ] , (4.1.23)
TIn ( f )(TIn (G)) T (M) T () . (4.1.24)

The proof will be given later.

Definition 4.1.27 Let (G, H, M, ) be a Weyl quadruple. Given , let

D : TIn (G) TIn (H) T (M) T () (4.1.25)

be the linear map induced by TIn ( f ). For each we define the Weyl operator
to equal D D .

The abbreviated notation D and is appropriate because in applications be-


low the corresponding Weyl quadruple (G, H, M, ) will be fixed, and thus need
not be referenced in the notation. We emphasize that the source and target of
the linear map D have the same dimension by assumption (Ic). The determi-
nant det , which is independent of the choice of basis used to compute it, is

nonnegative because is positive semidefinite, and hence det is a well-
defined nonnegative number. We show in formula (4.1.29) below how to reduce
the calculation of to an essentially mechanical procedure. Remarkably, in all
intended applications, we can calculate det by exhibiting an orthogonal basis
for TIn (G) TIn (H) simultaneously diagonalizing the whole family { } .
We are now ready to state the generalized Weyl integration formula.

Theorem 4.1.28 (Weyl) Let (G, H, M, ) be a Weyl quadruple. Then for every
Borel-measurable nonnegative G-conjugation-invariant function on M, we have
  
[G]
d M = ( ) det d ( ) .
[H]
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 203

The proof takes up the rest of Section 4.1.3. We emphasize that a Weyl quadruple
(G, H, M, ) with ambient space Matn (F) is fixed now and remains so until the
end of Section 4.1.3.
We begin with the analysis of the maps f and f defined in (4.1.20) and (4.1.21),
respectively.

Lemma 4.1.29 The restricted function f |H is constant on connected components


of H, and a fortiori has identically vanishing derivative.

Proof The function f |H is continuous and by assumption (IIc) takes only finitely
many values. Thus f |H is locally constant, whence the result.

Lemma 4.1.30 Let be generic. Then for every g0 G and 0 , the


fiber f 1 (g0 0 g1
0 ) is a manifold isometric to H.

It follows from Lemma 4.1.30 and Proposition F.8(v) that [ f 1 (g0 0 g1


0 )] =
[H].
Proof We claim that
f 1 (g0 0 g1 1
0 ) = {(g0 h, h 0 h) G M : h H} .

The inclusion follows from assumption (IIb). To prove the opposite inclu-
sion , suppose now that g g1 = g0 0 g1 0 for some g G and . Then
we have g1 g0 H by assumption (IIIc), hence g1 0 g = h for some h H, and
hence (g, ) = (g0 h, h1 0 h). The claim is proved. By assumptions (Ia,b) and
Lemma 4.1.13(iii), the map
(h  g0 h) : H g0 H = {g0 h : h H}
is an isometry of manifolds, and indeed is the restriction to H of an isometry of
Euclidean spaces. In view of Lemma 4.1.29, the map
(h  (g0 h, h1 0 h)) : H f 1 (g0 0 g1
0 ) (4.1.26)
is also an isometry, which finishes the proof of Lemma 4.1.30.

Note that we have not asserted that the map (4.1.26) preserves distances as
measured in ambient Euclidean spaces, but rather merely that it preserves geodesic
distances within the manifolds in question. For manifolds with several connected
components (as is typically the case for H), distinct connected components are
considered to be at infinite distance one from the other.
Proof of Lemma 4.1.26 The identity (4.1.22) follows immediately from Lemma
4.1.29.
204 4. S OME GENERALITIES

We prove (4.1.23). Let be a curve in G with (0) = In and (0) = X TIn (G).
Since ( 1 ) = 1 1 , we have TIn ( f )(X) = ( 1 ) (0) = [X, ]. Thus
(4.1.23) holds.
It remains to prove (4.1.24). As a first step, we note that
[ , X] = 0 for and X T () . (4.1.27)
Indeed, let be a curve in with (0) = and (0) = X. Then [ , ] vanishes
identically by Assumption (IId) and hence [ , X] = 0.
We further note that
[X, ] Y = X [Y, ] for X,Y Matn (F) , (4.1.28)
which follows from the definition A B = trX Y for any A, B Matn (F) and
straightforward manipulations.
We now prove (4.1.24). Given X TIn (G) and L T (), we have
TIn ( f )(X) L = [X, ] L = X [L, ] = 0 ,
where the first equality follows from (4.1.23), the second from (4.1.28) and the
last from (4.1.27). This completes the proof of (4.1.24) and of Lemma 4.1.26.

Lemma 4.1.31 Let : Matn (F) TIn (G) TIn (H) be the orthogonal projec-
tion. Fix . Then the following hold:
(X) = ([ , [ , X]]) for X TIn (G) TIn (H) , (4.1.29)

J(T(g, ) ( f )) = det for g G . (4.1.30)

Proof We prove (4.1.29). Fix X,Y TIn (G) TIn (H) arbitrarily. We have
(X) Y = D (D (X)) Y = D (X) D (Y )
= TIn ( f )(X) TIn ( f )(Y )
= [X, ] [Y, ] = [[X, ], ] Y = ([[X, ], ]) Y
at the first step by definition, at the second step by definition of adjoint, at the third
step by definition of D , at the fourth step by (4.1.23), at the fifth step by (4.1.28)
and at the last step trivially. Thus (4.1.29) holds.
Fix h G arbitrarily. We claim that J(T(h, ) ( f )) is independent of h G.
Toward that end consider the commuting diagram
T(In , ) ( f )
T(In , ) (G ) T (M)
T(In , ) ((g, )(hg, )) T (mhmh1 ).
T(h, ) ( f )
T(h, ) (G ) Thmh1 (M)
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 205

Since the vertical arrows are isometries of Euclidean spaces by assumption (Ia)
and Lemma 4.1.13(ii), it follows that J(T(h, ) ( f )) = J(T(In , ) ( f ), and in particular
is independent of h, as claimed.
We now complete the proof of (4.1.30), assuming without loss of generality that
g = In . By definition

T(In , ) (G ) = TIn (G) T () ,

where we recall that the direct sum is equipped with Euclidean structure by declar-
ing the summands to be orthogonal. Clearly we have

(T(In , ) ( f ))(X L) = TIn ( f )(X) + L for X TIn (G) and L T () . (4.1.31)

By (4.1.24) and (4.1.31), the linear map T(In , ) ( f ) decomposes as the orthogonal
direct sum of TIn ( f ) and the identity map of T () to itself. Consequently we
have J(TIn , ( f )) = J( TIn ( f )) by Lemma F.18. Finally, by assumption (Ic),

formula (4.1.22) and Lemma F.19, we find that J( TIn ( f )) = det .

Proof of Theorem 4.1.28 Let Mreg be the set of regular values of the map f . We
have
  
[ f 1 (m)] (m)d M (m) = ( ) det d G (g, )
Mreg
 
= [G] ( ) det d ( ) . (4.1.32)

The two equalities in (4.1.32) are justified as follows. The first holds by formula
(4.1.30), the pushed down version (4.1.11) of the coarea formula, and the fact
that ( f (g, )) = ( ) by the assumption that is G-conjugation-invariant. The
second holds by Fubinis Theorem and the fact that G = G by Proposi-
tion F.8(vi).
By assumption (IIa) the map f is onto, hence Mreg = M \ Mcrit , implying by
Sards Theorem (Theorem F.11) that Mreg has full measure in M. For every m
Mreg , the quantity [ f 1 (m)] is positive (perhaps infinite). The quantity [G] is
positive and also finite since G is compact. It follows by (4.1.32) that the claimed
integration formula at least holds in the weak sense that a G-conjugation-invariant
Borel set A M is negligible in M if the intersection A is negligible in .
Now put M = {g g1 : g G, }. Then M is a Borel set. Indeed, by
assumption (IIIa) the set is -compact, hence so is M . By construction M is
G-conjugation-invariant. Now we have M , hence by assumption (IIIb)
the intersection M is of full measure in , and therefore by what we proved
in the paragraph above, M is of full measure in M. Thus, if we replace by 1M
in (4.1.32), neither the first nor the last integral in (4.1.32) changes and further,
206 4. S OME GENERALITIES

by Lemma 4.1.30, we can replace the factor f 1 (m) f 1 (m) in the first integral by
[H]. Therefore we have
  
[H] (m)d M (m) = [G] ( ) det d ( ) .
M Mreg M

Finally, since M Mreg is of full measure in M and M is of full measure in


, the desired formula holds.

4.1.4 Applications of Weyls formula

We now present the proofs of the integration formulas of Section 4.1.1. We prove
each by applying Theorem 4.1.28 to a suitable Weyl quadruple.
We begin with the Gaussian/Hermite ensembles.
Proof of Proposition 4.1.1 Let (G, H, M, ) be the Weyl quadruple defined in
Proposition 4.1.23. As in the proof of Lemma 4.1.17 above, and for a similar
purpose, we use the notation ei j , i, j, k. By Lemma 4.1.15 we know that TIn (G)
Matn (F) is the space of anti-self-adjoint matrices, and it is clear that TIn (H)
TIn (G) is the subspace consisting of diagonal anti-self-adjoint matrices. Thus the
set
8 9
uei j u e ji u {1, i, j, k} F, 1 i < j n

is an orthogonal basis for TIn (G) TIn (H) . By formula (4.1.29), we have

diag(x) (uei j u e ji ) = [diag(x), [diag(x), uei j u e ji ]] = (xi x j )2 (uei j u e ji )

and hence
/
det diag(x) = |(x)| for x Rn .

To finish the bookkeeping, note that the map x  diag(x) sends Rn isometrically to
and hence pushes Lebesgue measure on Rn forward to . Then the integration
formula (4.1.1) follows from Theorem 4.1.28 combined with formula (4.1.19) for
[G]/ [H].

We remark that the orthogonal projection appearing in formula (4.1.29) is


unnecessary in the Gaussian setup. In contrast, we will see that it does play a
nontrivial role in the study of the Jacobi ensembles.
We turn next to the Laguerre ensembles. The following proposition provides
the needed Weyl quadruples.
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 207

Proposition 4.1.32 Fix integers 0 < p q and put n = p + q. Let

G = {diag(U,V ) : U U p (F), V Uq (F)} Un (F) ,


H = {diag(U,V ,V ) : U,V U p (F), V Uqp (F) ,
U,V are monomial, U(V ) is diagonal, (U(V ) )2 = I p } G ,
" # -
0 X
M = : X Mat pq (F) Hn (F) ,
X 0

0 x 0
= x 0 0 : x Mat p is (real) diagonal M .

0 0 0qp

Let be the subset consisting of elements for which the corresponding real
diagonal matrix x has nonzero diagonal entries with distinct absolute values. Then
(G, H, M, ) is a Weyl quadruple with ambient space Matn (F) for which the set
is generic and, furthermore,
[G] [U p (F)] [Uq (F)]
= p . (4.1.33)
[H] 2 p!(2( 1)/2 [U1 (F)]) p [Uqp (F)]

We remark that in the case p = q we are abusing notation slightly. For p = q one
should ignore V in the definition of H, and similarly modify the other definitions
and formulas.
Proof Of the conditions imposed by Definition 4.1.22, only conditions (Ic), (IIa)
and (IIIc) deserve comment. As in the proof of Proposition 4.1.23 one can verify
(Ic) by means of Lemma 4.1.15. Conditions (IIa) and (IIIc) follow from Corollary
E.13 concerning the singular value decomposition in Mat pq (F), and specifically
follow from points (i) and (iii) of that corollary, respectively. Thus (G, H, M, ) is
a Weyl quadruple for which is generic.
Turning to the proof of (4.1.33), note that the group G is isometric to the product
U p (F) Uq (F). Thus the numerator on the right side of (4.1.33) is justified. The
map x  diag(x, x) from U1 (F) to 2U (F) magnifies by a factor of 2. Abusing
notation, we denote its image by 2U1 (F). The group H is the disjoint union of
2 p p! isometric copies of the manifold ( 2U1 (F)) p Uqp (F). This justifies the
denominator on the right side of (4.1.33), and completes the proof.

Proof of Proposition 4.1.3 Let (G, H, M, ) be the Weyl quadruple defined in


Proposition 4.1.32. By Lemma 4.1.15, TIn (G) consists of matrices of the form
diag(X,Y ), where X Mat p (F) and Y Matq (F) are anti-self-adjoint. By the
same lemma, TIn (H) consists of matrices of the form diag(W,W, Z), where W
Mat p (F) is diagonal anti-self-adjoint and Z Matqp (R) is anti-self-adjoint. Thus
208 4. S OME GENERALITIES

TIn (G) TIn (H) may be described as the set of matrices of the form

a a+b 0 0
b := 0 ab c
c 0 c 0

where a, b Mat p (F) are anti-self-adjoint with a vanishing identically on the di-
agonal, and c Mat pq (F). Given (real) diagonal x Mat p , we also put

0 x 0
(x) := x 0 0 ,
0 0 0qp

thus parametrizing . By a straightforward calculation using formula (4.1.29), in


which the orthogonal projection is again unnecessary, one verifies that
2
a x a 2xax + ax2
(x) b = x2 b + 2xbx + bx2 ,
c x2 c

and hence that


/ p p
det (diag(x)) = |(x2 )| |2xi | 1 |xi | (qp) for x R p .
i=1 i=1
#"
0 X
Now for X Mat pq (F), put X = M. With as in the statement
X 0
of formula (4.1.3), let be the unique function on M such that (X ) = (X)
for all X Mat pq (F). By construction, is G-conjugation-invariant, and in
particular, ( (diag(x)) depends only on the absolute values
of the entries of

x. Note also that the map X  X magnifies by a factor of 2. We thus have
integration formulas
   p 
2 pq/2 d Mat pq (F) = d M , 23p/2 p
(x) dxi = d .
R+ i=1

Integration formula (4.1.3) now follows from Theorem 4.1.28 combined with for-
mula (4.1.33) for [G]/ [H].

We turn next to the Jacobi ensembles. The next proposition provides the needed
Weyl quadruples.

Proposition 4.1.33 Fix integers 0 < p q and put n = p + q. Fix 0 r q p


4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 209

and write q = p + r + s. Let


G = {diag(U,V ) : U U p (F), V Uq (F)} Un (F) ,
H = {diag(U,V ,V ,V ) : U,V U p (F), V Ur (F), V Us (F) ,
U,V are monomial, U(V ) is diagonal, (U(V ) )2 = I p } G ,
M = Flagn (diag(Ip+r , 0 p+s ), F) ,
" #
x y
= {diag( , Ir , 0s ) : x, y Mat p are diagonal
y Ip x
and x2 + y2 = x} M .
Let be the subset consisting of elements such that the absolute values of
the diagonal entries of the corresponding diagonal matrix y belong to the interval
(0, 1/2) and are distinct. Then (G, H, M, ) is a Weyl quadruple with ambient
space Matn (F) for which is generic and, furthermore,
[G] [U p (F)] [Uq (F)]
= p . (4.1.34)
[H] 2 p!(2( 1)/2 [U1 (F)]) p [Ur (F)] [Us (F)]

As in Proposition 4.1.32, we abuse notation slightly; one has to make appropriate


adjustments to handle extreme values of the parameters p, q, r, s.
Proof As in the proof of Proposition 4.1.32, of the conditions imposed by Defini-
tion 4.1.22, only conditions (Ic), (IIa) and (IIIc) need be treated. One can verify
(Ic) by means of Lemma 4.1.18 and Lemma 4.1.15.
We turn to the verification of condition (IIa). By Proposition E.14, for every
m M, there exists g G such that
" #
1 x y
gmg = diag( , w)
y z
where x, y, z Mat p and w Matn2p are real diagonal and satisfy the relations
dictated by the fact that gmg1 squares to itself and has trace p + r. If we have
tr w = r, then after left-multiplying g by a permutation matrix in G we have w =
diag(Ir , 0s ), and we are done. Otherwise tr w = r. After left-multiplying g by
a permutation matrix belonging to G, we can write y = diag(y , 0) where y
Mat p has nonzero diagonal entries. Correspondingly, we write x = diag(x , x )
and z = diag(z , z ) with x , z Mat p and x , z Mat pp . We then have z =
I p x . Further, all diagonal entries of x and z belong to {0, 1}, and finally,
tr z + tr w r. Thus, if we left-multiply g by a suitable permutation matrix in G
we can arrange to have tr w = r and we are done.
We turn finally to the verification of condition (IIIc). Fix and g G
such that g g1 . Let x, y Mat p be the real diagonal matrices corresponding
210 4. S OME GENERALITIES

to as in the definition of . By definition of , no two of the four diagonal


matrices x, I p x, Ir and 0s have a diagonal entry in common, and hence g =
diag(U,V,W, T ) for some U,V U p (F), W Ur (F) and T Us (F). Also by
definition of , the diagonal entries of y have distinct nonzero absolute values,
and hence we have g H by Corollary E.13(iii) concerning the singular value
decomposition. Thus (G, H, M, ) is a Weyl quadruple for which is generic.
A slight modification of the proof of formula (4.1.33) yields formula (4.1.34).

Proof of Proposition 4.1.4 Let (G, H, M, ) be the Weyl quadruple provided by


Proposition 4.1.33. We follow the pattern established in the previous analysis
of the Laguerre ensembles, but proceed more rapidly. We parametrize and
TIn (G) TIn (H) , respectively, in the following way.
" # 
x y
(x, y) := diag , Ir , 0s ,
y Ip x

a
b a+b 0 0 0

0 ab c d
,
c :=
0 c 0 e
d
0 d e 0
e

where:

x, y Mat p are real diagonal and satisfy x2 + y2 = x,


a, b Mat p (F) are anti-self-adjoint with a vanishing identically along the diag-
onal, and
c Mat pr (F), d Mat ps (F) and e Matrs (F).

By a straightforward if rather involved calculation using formula (4.1.29), we have



a xa + ax 2xax 2yay
b xb + bx 2xbx + 2yby


(x,y) c = xc .

d (Ip x)d
e e

(Unlike in the proofs of Propositions 4.1.1 and 4.1.3, the orthogonal projection
is used nontrivially.) We find that
/ p p
det (diag(x),diag(y)) = |(x)| (4xi (1 xi ))( 1)/2 (xir (1 xi )s ) /2
i=1 i=1
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 211

for x, y R p such that xi (1 xi ) = y2i (and hence xi [0, 1]) for i = 1, . . . , p. The
calculation of the determinant is straightforward once it is noted that the identity

(x1 + x2 2x1 x2 2y1 y2 )(x1 + x2 2x1 x2 + 2y1 y2 ) = (x1 x2 )2

holds if xi (1 xi ) = y2i for i = 1, 2.


Now let be as it appears in formula (4.1.5). Note that is an isometric copy
Flag2 (diag(1, 0), R) and that Flag2 (diag(1, 0), R) is a circle of circumference
of p

2 . Note also that


  1
f (x)dx
f ((1 + cos )/2)d =  .
0 0 x(1 x)
We find that
  p
dxi
( (p) )d ( ) = 2 p/2 (x)  .
[0,1] p i=1 xi (1 xi )

Finally, note that the unique function on M satisfying (W ) = (W (p) ) is G-


conjugation invariant. We obtain (4.1.5) now by Theorem 4.1.28 combined with
formula (4.1.34) for [G]/ [H].

The next five propositions supply the Weyl quadruples needed to prove Proposi-
tion 4.1.6. All the propositions have similar proofs, with the last two proofs being
the hardest. We therefore supply only the last two proofs.

Proposition 4.1.34 Let G = M = Un (C). Let H G be the set of monomial


elements of G. Let G be the set of diagonal elements of G, and let be
the subset consisting of elements with distinct diagonal entries. Then (G, H, M, )
is a Weyl quadruple with ambient space Matn (C) for which is generic and,
furthermore,
[H]/ [] = n! . (4.1.35)

The proof of this proposition is an almost verbatim repetition of that of Proposi-


tion 4.1.23.
" #
0 1
Put = Mat2 and recall the notation Rn ( ) used in Proposition
1 0
4.1.6.

Proposition 4.1.35 Let n = 2 + 1 be odd. Let G = M = Un (R). Let Wn be the


group consisting of permutation matrices in Matn commuting with diag( , . . . , , 1).
Let
= {diag(R ( ), 1) : R } , H = {w : , w Wn } .
212 4. S OME GENERALITIES

Let be the subset consisting of elements with distinct (complex) eigen-


values. Then (G, H, M, ) is a Weyl quadruple with ambient space Matn (R) for
which is generic and, furthermore,
[H]/ [] = 2 ! . (4.1.36)

Proposition 4.1.36 Let G = M = Un (H). Let H G be the set of monomial ele-


ments with entries in C Cj. Let G be the set of diagonal elements with en-
tries in C. Let be the subset consisting of elements such that diag( , )
has distinct diagonal entries. Then (G, H, M, ) is a Weyl quadruple with ambient
space Matn (H) for which is generic and, furthermore,
[H]/ [] = 2n n! . (4.1.37)

Proposition 4.1.37 Let n = 2 be even. Let G = Un (R) and let M G be the


subset on which det = 1. Let Wn+ G be the group consisting of permutation
matrices commuting with diag( , . . . , ). Put
= {R ( ) : R } M, H = {w : , w Wn+ } G.
Let be the subset consisting of elements with distinct (complex) eigenval-
ues. Then (G, H, M, ) is a Weyl quadruple with ambient space Matn (R) such that
is generic and, furthermore,
[H]/ [] = 2 ! . (4.1.38)

Proposition 4.1.38 Let n = 2 be even. Let G = Un (R) and let M G be the


subset on which det = 1. Put
Wn = {diag(w, 1, 1) : w Wn2
+
} G,
= {diag(R1 ( ), 1, 1) : R1 } M ,
H = {w : w Wn , } G.
Let be the subset consisting of elements with distinct (complex) eigen-
values. Then (G, H, M, ) is a Weyl quadruple with ambient space Matn (R) for
which is generic and, furthermore,
[H]/ [] = 2+1 ( 1)! . (4.1.39)

Proof of Proposition 4.1.37 Only conditions (IIa) and (IIIc) require proof. The
other parts of the proposition, including formula (4.1.38), are easy to check.
To verify condition (IIa), fix m M arbitrarily. After conjugating m by some
element of G, we may assume by Theorem E.11 that m is block-diagonal with R-
standard blocks on the diagonal. Now the only orthogonal R-standard blocks are
4.1 J OINT DISTRIBUTIONS FOR CLASSICAL MATRIX ENSEMBLES 213

1 Mat1 and R( ) Mat2 for 0 < < . Since we assume det m = 1, there are
even numbers of 1s and 1s along the diagonal of m, and hence after conjugating
m by a suitable permutation matrix, we have m as required. Thus condition
(IIa) is proved.
To verify condition (IIIc), we fix , g G and such that g g1 = ,
with the goal to show that g H. After conjugating by a suitably chosen element
of Wn+ , we may assume that the angles 1 , . . . ,  describing , as in the definition
of , satisfy 0 < 1 < <  < . By another application of Theorem E.11,
after replacing g by wg for suitably chosen w Wn+ , we may assume that = .
Then g commutes with , which is possible only if g . Thus condition (IIIc)
is proved, and the proposition is proved.

Proof of Proposition 4.1.38 As in the proof of Proposition 4.1.37, only conditions


(IIa) and (IIIc) require proof. To verify condition (IIa) we argue exactly as in the
proof of Proposition 4.1.37, but this time, because det m = 1, we have to pair
off a 1 with a 1, and we arrive at the desired conclusion. To prove condition
(IIIc), we again fix , g G and such that g g1 = , with the
goal to show that g H; and arguing as before, we may assume that g commutes
with . The hypothesis that has distinct complex eigenvalues then insures then
g = diag(In2 , 1, 1) for some , and hence g H. Thus condition (IIIc)
is verified, and the proposition is proved.


Proof of Proposition 4.1.6 It remains only to calculate det for each of the
five types of Weyl quadruples defined above in order to complete the proofs of
(4.1.6), (4.1.7), (4.1.8) and (4.1.9), for then we obtain each formula by invoking
Theorem 4.1.28, combined with the formulas (4.1.35), (4.1.36), (4.1.37), (4.1.38)
and (4.1.39), respectively, for the ratio [H]/ []. Note that the last two Weyl
quadruples are needed to handle the two terms on the right side of (4.1.9), respec-
tively.
All the calculations are similar. Those connected with the proof of (4.1.9) are
the hardest, and may serve to explain all the other calculations. In the follow-
ing, we denote the Weyl quadruples defined in Propositions 4.1.37 and 4.1.38 by
(G, H + , M + , + ) and (G, H , M , ), respectively. We treat each quadruple in
a separate paragraph below.
To prepare for the calculation it is convenient to introduce two special functions.
Given real numbers and , let D( , ) be the square-root of the absolute value
of the determinant of the R-linear operator

Z  R( )(R( )Z ZR( )) (R( )Z ZR( ))R( )

on Mat2 (R), and let C( ) be the square-root of the absolute value of the determi-
214 4. S OME GENERALITIES

nant of the R-linear operator


Z  R( )(R( )Z Z ) (R( )Z Z )
on Mat2 (R), where = diag(1, 1). Actually both operators in question are non-
negative definite and hence have nonnegative determinants. One finds that
D( , ) = |(ei ei )(ei ei )|, C( ) = |ei ei |
by straightforward calculations.
Consider the Weyl quadruple (G, H + , M + , + ) and for R put + ( ) =
R ( ). The space TIn (G) TIn (H + ) consists of real antisymmetric matrices
X Matn such that X2i,2i1 = 0 for i = 1, . . . , . Using formula (4.1.29), one finds
that
/
det + ( ) = D(i , j ) = D ( )
1i< j

which proves (4.1.9) for all functions supported on M + .


Consider next the Weyl quadruple (G, H , M , ) and for R1 put ( )
= diag(R ( ), 1, 1). The space TIn (G) TIn (H ) consists of real antisymmet-
ric matrices X Matn such that X2i,2i1 = 0 for i = 1, . . . ,  1. Using formula
(4.1.29) one finds that
/
det ( ) = D(i , j ) C(i ) 2 ,
1i< j1 1i1

which proves (4.1.9) for all functions supported on M . (The last factor of 2 is
accounted for by the fact that for Z Mat2 real antisymmetric, [ , [ , Z]] = 4Z.)
This completes the proof of (4.1.9).
All the remaining details needed to complete the proof of Proposition 4.1.6,
being similar, we omit.

Exercise 4.1.39
Let G = Un (C) and let H G be the subgroup consisting of monomial ele-
ments. Let M Matn (C) be the set consisting of normal matrices with distinct
eigenvalues, and let M be the subset consisting of diagonal elements. Show

that (G, H, M, ) is a Weyl quadruple. Show that det = 1i< jn |i j |2
for all = diag(1 , . . . , n ) .

4.2 Determinantal point processes


The collection of eigenvalues of a random matrix naturally can be viewed as a
configuration of points (on R or on C), that is, as a point process. This section
4.2 D ETERMINANTAL POINT PROCESSES 215

is devoted to the study of a class of point processes known as determinantal pro-


cesses; such processes possess useful probabilistic properties, such as CLTs for
occupation numbers, and, in the presence of approximate translation invariance,
convergence to stationary limits. The point process determined by the eigenvalues
of the GUE is, as we show below, a determinantal process. Further, determinantal
processes occur as limits of the rescaled configuration of eigenvalues of the GUE,
in the bulk and in the edge of the spectrum, see Section 4.2.5.

4.2.1 Point processes: basic definitions

Let be a locally compact Polish space, equipped with a (necessarily -finite)


positive Radon measure on its Borel -algebra (recall that a positive measure
is Radon if (K) < for each compact set K). We let M () denote the space
of -finite Radon measures on , and let M+ () denote the subset of M ()
consisting of positive measures.

Definition 4.2.1 (a) A point process is a random, integer-valued M+ (). (By


random we mean that for any Borel B , (B) is an integer-valued random
variable.)
(b) A point process is simple if
P(x : ({x}) > 1) = 0 . (4.2.1)

Note that the event in (4.2.1) is measurable due to the fact that is Polish. One
may think about also in terms of configurations. Let X denote the space of
locally finite configurations in , and let X = denote the space of locally finite
configurations with no repetitions. More precisely, for xi , i I an interval
of positive integers (beginning at 1 if nonempty), with I finite or countable, let
[xi ] denote the equivalence class of all sequences {x (i) }iI , where runs over all
permutations (finite or countable) of I. Then, set
X = X () = {x = [xi ]i=1 , where xi , , and
|xK | := {i : xi K} < for all compact K }
and
X = = {x X : xi = x j for i = j} .
We endow X and X = with the -algebra CX generated by the cylinder sets
CnB = {x X : |xB | = n}, with B Borel with compact closure and n a nonnegative
integer. Since = i=1 i for some (possibly random) and random i , each
point process can be associated with a point in X (in X = if is simple). The
216 4. S OME GENERALITIES

converse is also true, as is summarized in the following elementary lemma, where


we let be a probability measure on the measure space (X , CX ).

Lemma 4.2.2 A -distributed random element x of X can be associated with a


point process via the formula (B) = |xB | for all Borel B . If (X = ) = 1,
then is a simple point process.

With a slight abuse, we will therefore not distinguish between the point process
and the induced configuration x. In the sequel, we associate the law with the
point process , and write E for expectation with respect to this law.
We next note that if x is not simple, then one may construct a simple point pro-

cess x = {(xj , N j )}j=1 X ( ) on = N+ by letting denote the num-
ber of distinct entries in x, introducing a many-to-one mapping j(i) : {1, . . . , } 
{1, . . . , } with N j = |{i : j(i) = j}| such that if j(i) = j(i ) then xi = xi , and then
setting xj = xi if j(i) = j. In view of this observation, we only consider in the se-
quel simple point processes.

Definition 4.2.3 Let be a simple point process. Assume locally integrable func-
tions k : k [0, ), k 1, exist such that for any mutually disjoint family of
subsets D1 , , Dk of ,
k 
E [ (Di )] = k (x1 , , xk )d (x1 ) d (xk ) .
i=1 ki=1 Di

Then the functions k are called the joint intensities (or correlation functions) of
the point process with respect to .

The term correlation functions is standard in the physics literature, while joint
intensities is more commonly used in the mathematical literature.

Remark 4.2.4 By Lebesgues Theorem, for k almost every (x1 , . . . , xk ),


P ( (B(xi , )) = 1, i = 1, . . . , k)
lim = k (x1 , . . . , xk ) .
0 ki=1 (B(xi , ))
Further, note that k () is in general only defined k -almost everywhere, and that
k (x1 , . . . , xk ) is not determined by Definition 4.2.3 if there are i = j with xi = x j .
For consistency with Lemma 4.2.5 below and the fact that we consider simple
processes only, we set k (x1 , . . . , xk ) = 0 for such points.

The joint intensities, if they exist, allow one to consider overlapping sets, as well.
In what follows, for a configuration x X = , and k integer, we let xk denote
4.2 D ETERMINANTAL POINT PROCESSES 217

the set of ordered samples of k distinct elements from x. (Thus, if = R and


x = {1, 2, 3}, then x2 = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)}.)

Lemma 4.2.5 Let be a simple point process with intensities k .


(a) For any Borel set B k with compact closure,
  
k
E |x B| = k (x1 , , xk )d (x1 ) d (xk ) . (4.2.2)
B

(b) If Di , i = 1, . . . , r, are mutually disjoint subsets of contained in a compact


set K, and if {ki }ri=1 is a collection of positive integers such that ri=1 ki = k, then
   
r
(Di )
E ki ! = k
k (x1 , . . . , xk ) (dx1 ) (dxk ) . (4.2.3)
i=1 ki Di i

Proof of Lemma 4.2.5 Note first that, for any compact Q , there exists an
increasing sequence of partitions {Qni }ni=1 of Q such that, for any x Q,
. .
Qni = {x} .
n i:xQni

We denote by Qnk the collection of (ordered) k-tuples of distinct elements of {Qni }.


(a) It is enough to consider sets of the form B = B1 B2 Bk , with the sets
Bi Borel of compact closure. Then
k
Mkn := |(Q1 Qk ) B xk | = (Qi Bi ) .
(Q1 ,...,Qk )Qnk (Q1 ,...,Qk )Qnk i=1

Thus

E (Mkn ) = (Q1 Qk )B
k (x1 , . . . , xk )d (x1 ) . . . d (xk ) . (4.2.4)
(Q1 ,...,Qk )Qnk

Note that Mkn increases monotonically in n to |xk B|. On the other hand, since x
is simple, and by our convention concerning the intensities k , see Remark 4.2.4,

lim sup
n (Q ,...,Q )(Q 1 )k \Q k (Q1 Qk )B
k (x1 , . . . , xk )d (x1 ) . . . d (xk ) = 0 .
1 k n n

The conclusion follows from these facts, the fact that X is a Radon measure and
(4.2.4).
(b) Equation (4.2.3) follows from (4.2.2) through the choice B = Dki
i .

Remark 4.2.6 Note that a system of nonnegative, measurable and symmetric func-
tions {r : r [0, ]}
r=1 is a system of joint intensities for a simple point process
218 4. S OME GENERALITIES

that consists of exactly n points almost surely, if and only if r = 0 for r > n, 1 /n
is a probability density function, and the family is consistent, that is, for 1 < r n,

r (x1 , . . . , xr )d (xr ) = (n r + 1)r1 (x1 , . . . , xr1 ) .

As we have seen, for a simple point process, the joint intensities give information
concerning the number of points in disjoint sets. Let now Di be given disjoint

compact sets, with D = Li=1 Di be such that E(z (D) ) < for z in a neighborhood
of 1. Consider the Taylor expansion, valid for z in a neighborhood of 1,

L
(D )
L
(Di )! L
z = 1+ ( (Di ) ni )!ni ! (zi 1)ni (4.2.5)
=1 n=1 ni (Di ) i=1 i=1
ni #L n
L
( (Di )( (Di ) 1) ( (Di ) ni + 1)) L
= 1+ (zi 1)ni ,
n=1 ni #L n i=1 ni ! i=1

where
L
{ni #L n} = {(n1 , . . . , nL ) NL+ : ni = n} .
i=1

Then one sees that, under these conditions, the factorial moments in (4.2.3) deter-
mine the characteristic function of the collection { (Di )}Li=1 . A more direct way
to capture the distribution of the point process is via its Janossy densities, that
we define next.

Definition 4.2.7 Let D be compact. Assume there exist symmetric functions


jD,k : Dk R+ such that, for any finite collection of mutually disjoint measurable
sets Di D, i = 1, . . . , k,

P( (D) = k, (Di ) = 1, i = 1, . . . , k) = jD,k (x1 , . . . , xk ) (dxi ) . (4.2.6)
i Di i

Then we refer to the collection { jD,k }


k=1 as the Janossy densities of in D.

The following easy consequences of the definition are proved in the same way that
Lemma 4.2.5 was proved.

Lemma 4.2.8 For any compact D , if the Janossy densities jD,k , k 1 exist
then

1
P( (D) = k) = jD,k (x1 , . . . , xk ) (dxi ) , (4.2.7)
k! Dk i
4.2 D ETERMINANTAL POINT PROCESSES 219

and, for any mutually disjoint measurable sets Di D, i = 1, . . . , k and any integer
r 0,

P( (D) = k + r, (Di ) = 1, i = 1, . . . , k)

1
= jD,k+r (x1 , . . . , xk+r ) (dxi ) . (4.2.8)
r! ki=1 Di Dr i

In view of (4.2.8) (with r = 0), one can naturally view the collection of Janossy
densities as a distribution on the space k
k=0 D .
Janossy densities and joint intensities are (at least locally, i.e. restricted to a
compact set D) equivalent descriptions of the point process , as the following
proposition states.

Proposition 4.2.9 Let be a simple point process on and assume D is


compact.
(a) Assume the Janossy densities jD,k , k 1, exist, and that

kr jD,k (x1 , . . . , xk )
Dk k! (dxi ) < , for all r integer . (4.2.9)
k i

Then restricted to D possesses the intensities


jD,k+r (x1 , . . . , xk , D, . . . , D)
k (x1 , . . . , xk ) = r!
, xi D, (4.2.10)
r=0

where
 r
jD,k+r (x1 , . . . , xk , D, . . . , D) = jD,k+r (x1 , . . . , xk , y1 , . . . , yr ) (dyi ) .
Dr i=1

(b) Assume the intensities k (x1 , . . . , xk ) exist and satisfy



kr k (x1 , . . . , xk )
Dk k! (dxi ) < , for all r integer . (4.2.11)
k i

Then the Janossy densities jD,k exist for all k and satisfy

(1)r k+r (x1 , . . . , xk , D, . . . , D)
jD,k (x1 , . . . , xk ) = r!
, (4.2.12)
r=0

where
 r
k+r (x1 , . . . , xk , D, . . . , D) = k+r (x1 , . . . , xk , y1 , . . . , yr ) (dyi ) .
Dr i=1
220 4. S OME GENERALITIES

The proof follows the same procedure as in Lemma 4.2.5: partition and use
dominated convergence together with the integrability conditions and the fact that
is assumed simple. We omit further details. We note in passing that under a
slightly stronger assumption of the existence of exponential moments, part (b) of
the proposition follows from (4.2.5) and part (b) of Lemma 4.2.5.

Exercise 4.2.10 Show that, for the standard Poisson process of rate > 0 on
= R with taken as the Lebesgue measure, one has, for any compact D R
with Lebesgue measure |D|,

k (x1 , . . . , xk ) = e |D| jD,k (x1 , . . . , xk ) = k .

4.2.2 Determinantal processes

We begin by introducing the general notion of a determinantal process.

Definition 4.2.11 A simple point process is said to be a determinantal point


process with kernel K (in short: determinantal process) if its joint intensities k
exist and are given by
k
k (x1 , , xk ) = det (K(xi , x j )) . (4.2.13)
i, j=1

In what follows, we will be mainly interested in certain locally trace-class op-


erators on L2 ( ) (viewed as either a real or complex Hilbert space, with inner
product denoted  f , gL2 ( ) ), motivating the following definition.

Definition 4.2.12 An integral operator K : L2 ( ) L2 ( ) with kernel K given


by

K ( f )(x) = K(x, y) f (y)d (y) , f L2 ( )

is admissible (with admissible kernel K) if K is self-adjoint, nonnegative and lo-


cally trace-class, that is, with the operator KD = 1D K 1D having kernel KD (x, y) =
1D (x)K(x, y)1D (y), the operators K and KD satisfy:

g, K ( f )L2 ( ) = K (g), f L2 ( ) , f , g L2 ( ) , (4.2.14)


 f , K ( f )L2 ( ) 0 , f L2 ( ) , (4.2.15)
For all compact sets D , the eigenvalues (iD )i0 ( R+ )
(4.2.16)
of KD satisfy iD < .
4.2 D ETERMINANTAL POINT PROCESSES 221

We say that K is locally admissible (with locally admissible kernel K) if (4.2.14)


and (4.2.15) hold with KD replacing K .

The following standard result, which we quote from [Sim05b, Theorem 2.12]
without proof, gives sufficient conditions for a (positive definite) kernel to be ad-
missible.

Lemma 4.2.13 Suppose K : C is a continuous, Hermitian and posi-


tive definite function, that is, ni=1 zi z j K(xi , x j ) 0 for any n, x1 , . . . , xn and
z1 , . . . , zn C. Then K is locally admissible.

By standard results, see e.g. [Sim05b, Theorem 1.4], an integral compact operator
K with admissible kernel K possesses the decomposition
n
K f (x) = k k (x)k , f L2 ( ) , (4.2.17)
k=1

where the functions k are orthonormal in L2 ( ), n is either finite or infinite, and


k > 0 for all k, leading to
n
K(x, y) = k k (x)k (y) . (4.2.18)
k=1

(The last equality is to be understood in L2 ( ).) If K is only locally admis-


sible, KD is admissible and compact for any compact D, and the relation (4.2.18)
holds with KD replacing K and the k and k depending on D.

Definition 4.2.14 An admissible (respectively, locally admissible) integral opera-


tor K with kernel K is good (with good kernel K) if the k (respectively, kD ) in
(4.2.17) satisfy k (0, 1].

We will later see (see Corollary 4.2.21) that if the kernel K in definition 4.2.11 of
a determinantal process is (locally) admissible, then it must in fact be good.
The following example is our main motivation for discussing determinantal
point processes.

Example 4.2.15 Let (1N , , NN ) be the eigenvalues of the GUE of dimension N,


and denote by N the point process N (D) = Ni=1 1 N D . By Lemma 3.2.2, N is
i
a determinantal process with (admissible, good) kernel
N1
K (N) (x, y) = k (x)k (y) ,
k=0

where the functions k are the oscillator wave-functions.


222 4. S OME GENERALITIES

We state next the following extension of Lemma 4.2.5. (Recall, see Definition
3.4.3, that (G) denotes the Fredholm determinant of a kernel G.)

Lemma 4.2.16 Suppose is a -distributed determinantal point processes. Then,


for mutually disjoint Borel sets D ,  = 1, . . . , L, whose closure is compact,
 
L L
(D )
E ( z ) = 1D (1 z )K1D , (4.2.19)
=1 =1

where D = L=1 D and the equality is valid for all (z )L=1 CL . In particu-
lar, the law of the restriction of simple determinantal processes to compact sets
is completely determined by the intensity functions, and the restriction of a de-
terminantal process to a compact set D is determinantal with admissible kernel
1D (x)K(x, y)1D (y).

Proof of Lemma 4.2.16 By our assumptions, the right side of (4.2.19) is well
defined for any choice of (z )L=1 CL as a Fredholm determinant (see Definition
3.4.3), and
 
L
1D (1 z )K1D 1
=1
 
 @n
L
1
= det (z 1)K(xi , x j )1D (x j ) (dx1 ) (dxL )
n=1 n! D D =1 i, j=1
L n
1
= n! (zk 1) (4.2.20)
n=1 1 ,...,n =1 k=1
  1 2n
det 1D (xi )K(xi , x j )1D j (x j ) (dx1 ) (dxL ) .
i, j=1

On the other hand, recall the Taylor expansion (4.2.5). Using (4.2.3) we see that
the -expectation of each term in the last power series equals the corresponding
term in the power series in (4.2.20), which represents an entire function. Hence,
by monotone convergence, (4.2.19) follows.

Note that an immediate consequence of Definition 4.2.3 and Lemma 4.2.16 is


that the restriction of a determinantal process with kernel K(x, y) to a compact
subset D is determinantal, with kernel 1xD K(x, y)1yD .

4.2.3 Determinantal projections

A natural question is now whether, given a good kernel K, one may construct
an associated determinantal point process. We will answer this question in the
4.2 D ETERMINANTAL POINT PROCESSES 223

affirmative by providing an explicit construction of determinantal point processes.


We begin, however, with a particular class of determinantal processes defined by
projection kernels.

Definition 4.2.17 A good kernel K is called a trace-class projection kernel if all


eigenvalues k in (4.2.18) satisfy k = 1, and nk=1 k < . For a trace-class
projection kernel K, set HK = span{k }.

Lemma 4.2.18 Suppose is a determinantal point process with trace-class pro-


jection kernel K. Then () = n, almost surely.

Proof By assumption, n < in (4.2.18). The matrix {K(xi , x j )}ki, j=1 has rank at
most n for all k. Hence, by (4.2.3), () n, almost surely. On the other hand,
  n 
E ( ()) = 1 (x)d (x) = K(x, x)d (x) = |i (x)|2 d (x) = n .
i=1

This completes the proof.


Proposition 4.2.19 Let K be a trace-class projection kernel. Then a simple deter-


minantal point process with kernel K exists.

A simple proof of Proposition 4.2.19 can be obtained by noting that the function
detni, j=1 K(xi , x j )/n! is nonnegative, integrates to 1, and by a computation similar
to Lemma 3.2.2, see in particular (3.2.10), its kth marginal is (n k)! detki, j=1
K(xi , x j )/n!. We present an alternative proof that has the advantage of providing
an explicit construction of the resulting determinantal point process.
Proof For a finite-dimensional subspace H of L2 ( ) of dimension d, let KH
denote the projection operator into H and let KH denote an associated kernel. That
is, KH (x, y) = dk=1 k (x)k (y) for some orthonormal family {k }dk=1 in H. For
x , set kxH () = KH (x, ). (Formally, kxH = KH x , in the sense of distributions.)
The function kxH () L2 ( ) does not depend on the choice of basis {k }, for
almost every x: indeed, if {k } is another orthonormal basis in H, then there exist
complex coefficients {ai, j }ki, j=1 such that
d d
k = ak, j j , ak, j ak, j = j, j .
j=1 j=1

Hence, for -almost every x, y,


d d d
k (x)k (y) = ak, j ak, j j (x) j (y) = j (x) j (y) .
k=1 k, j, j =1 j=1
224 4. S OME GENERALITIES

We have that KH (x, x) = kxH 2 belongs to L1 ( ) and that different choices of


basis {k } lead to the same equivalent class of functions in L1 ( ). Let H be the
measure on defined by d H /d (x) = KH (x, x).
By assumption, n < in (4.2.18). Thus the associated subspace HK is finite-
dimensional. We construct a sequence of random variables Z1 , . . . , Zn in as
follows. Set Hn = HK and j = n.

If j = 0, stop.
Pick a point Z j distributed according to H j / j.
H
Let H j1 be the orthocomplement to the function kZ jj in H j .
Decrease j by one and iterate.

We now claim that the point process x = (Z1 , . . . , Zn ), of law , is determinantal


with kernel K. To see that, note that
H
kZ jj = KH j kZHj , in L2 ( ), -a.s.

Hence the density of the random vector (Z1 , . . . , Zn ) with respect to n equals
H n KH kH 2
n kx j j 2 j xj
p(x1 , . . . , xn ) = = .
j=1 j j=1 j

Since H j = H (kxHj+1 , . . . , kxHn ) , it holds that


n
V = KH j kxHj 
j=1

equals the volume of the parallelepiped determined by the vectors kxH1 , . . . , kxHn in

the finite-dimensional subspace H L2 ( ). Since kxHi (x)kxHj (x) (dx) = K(xi , x j ),
it follows that V 2 = det(K(xi , x j ))ni, j=1 . Hence

1
p(x1 , . . . , xn ) = det(K(xi , x j ))ni, j=1 .
n!
Thus, the random variables Z1 , . . . , Zn are exchangeable, almost surely distinct,
and the n-point intensity of the point process x equals n!p(x1 , . . . , xn ). In partic-
ular, integrating and applying the same argument as in (3.2.10), all k-point inten-
sities have the determinantal form for k n. Together with Lemma 4.2.18, this
completes the proof.

Projection kernels can serve as building blocks for trace-class determinantal


processes.
4.2 D ETERMINANTAL POINT PROCESSES 225

Proposition 4.2.20 Suppose is a determinantal process with good kernel K of


the form (4.2.18), with k k < . Let {Ik }nk=1 be independent Bernoulli variables
with P(Ik = 1) = k . Set
n
KI (x, y) = Ik k (x)k (y) ,
k=1
and let I denote the determinantal process with (random) kernel KI . Then and
I have the same distribution.

The statement in the proposition can be interpreted as stating that the mixture of
determinental processes I has the same distribution as .
Proof Assume first n is finite. We need to show that for all m n, the m-point
joint intensities of and I are the same, that is
m m
det (K(xi , x j )) = E[ det (KI (xi , x j ))] .
i, j=1 i, j=1

But, with Ai,k = Ik k (xi ) and Bk,i = k (xi ) for 1 i m, 1 k n, then


(KI (xi , x j ))m
i, j=1 = AB , (4.2.21)
and by the CauchyBinet Theorem A.2,
m
det (KI (xi , x j )) =
i, j=1
det(A{1,..,m}{1 , ,m } ) det(B{1 , ,m }{1,..,m} ) .
11 <<m n

Since E(Ik ) = k , we have


E[det(A{1,..,m}{1 , ,m } )] = det(C{1,..,m}{1 , ,m } )
with Ci,k = k k (xi ). Therefore,
m
E[ det (KI (xi , x j ))] =
i, j=1
det(C{1,..,m}{1 , ,m } ) det(B{1 , ,m }{1,..,m} )
11 <<m n
m
= det(CB) = det(K(xi , x j )) , (4.2.22)
i=1
where the CauchyBinet Theorem A.2 was used again in the last line.
Suppose next that n = . Since k < , we have that I := Ik < almost
surely. Thus, I is a well defined point process. Let IN denote the determinantal
process with kernel KIN = Nk=1 Ik k (x)k (y). Then IN is a well defined point
process, and arguing as in (4.2.21), we get, for every integer m,
m
det (KIN (xi , x j )) =
i, j=1
det(A{1,..,m}{1 , ,m } ) det(B{1 , ,m }{1,..,m} )
11 <<m N

= 1{I j =1, j=1,...,m} | det(B{1 , ,m }{1,..,m} )|2 . (4.2.23)


11 <<m N
226 4. S OME GENERALITIES

In particular, the left side of (4.2.23) increases in N. Taking expectations and


using the CauchyBinet Theorem A.2 and monotone convergence, we get, with
the same notation as in (4.2.22), that
m m
E det (KI (xi , x j )) = lim E det (KIN (xi , x j ))
i, j=1 N i, j=1

= lim
N
det(C{1,..,m}{1 , ,m } ) det(B{1 , ,m }{1,..,m} )
11 <<m N
m m
= lim det (KN (xi , x j )) = det (K(xi , x j )) , (4.2.24)
N i, j=1 i, j=1

where we write KN (x, y) = Nk=1 k k (x)k (y).


We have the following.

Corollary 4.2.21 Let K be admissible on L2 ( ), with trace-class kernel K. Then


there exists a determinantal process with kernel K if and only if the eigenvalues
of K belong to [0, 1].

Proof From the definition, determinantal processes are determined by restriction


to compact subsets, and the resulting process is determinantal too, see Lemma
4.2.16. Since the restriction of an admissible K to a compact subset is trace-
class, it thus suffices to consider only the case where K is trace-class. Thus, the
sufficiency is immediate from the construction in Proposition 4.2.20.
To see the necessity, suppose is a determinantal process with nonnegative
kernel K(x, y) = k k (x)k (y), with max i = 1 > 1. Let 1 denote the point
process with each point xi deleted with probability 1 1/1 , independently. 1 is
clearly a simple point process and, moreover, for disjoint subsets D1 , . . . , Dk of ,
k 
E [ 1 (Di )] = (1/1 )k k (x1 , , xk )d (x1 ) d (xk ) .
i=1 ki=1 Di

Thus, 1 is determinantal with kernel K1 = (1/1 )K. Since had finitely many
points almost surely (recall that K was assumed trace-class), it follows that
P(1 () = 0) > 0. But, the process 1 can be constructed by the procedure
of Proposition 4.2.20, and since the top eigenvalue of K1 equals 1, we obtain
P(1 () 1) = 1, a contradiction.

We also have the following corollaries.

Corollary 4.2.22 Let K be a locally admissible kernel on , such that for any
compact D , the nonzero eigenvalues of KD belong to (0, 1]. Then K uniquely
determines a determinantal point process on .
4.2 D ETERMINANTAL POINT PROCESSES 227

Proof By Corollary 4.2.21, a determinantal process is uniquely determined by KD


for any compact D. By the definition of the intensity functions, this sequence of
laws of the processes is consistent, and hence they determine uniquely a determi-
nantal process on .

Corollary 4.2.23 Let be a determinantal process corresponding to an admissi-


ble trace class kernel K. Define the process p by erasing, independently, each
point with probability (1 p). Then p is a determinantal process with kernel pK.

Proof Repeat the argument in the proof of the necessity part of Corollary 4.2.21.

4.2.4 The CLT for determinantal processes

We begin with the following immediate corollary of Proposition 4.2.20 and Lemma
4.2.18. Throughout, for a good kernel K and a set D , we write KD (x, y) =
1D (x)K(x, y)1D (y) for the restriction of K to D.

Corollary 4.2.24 Let K be a good kernel, and let D be such that KD is trace-
class, with eigenvalues k , k 1. Then (D) has the same distribution as k k
where k are independent Bernoulli random variables with P(k = 1) = k and
P(k = 0) = 1 k .

The above representation immediately leads to a central limit theorem for oc-
cupation measures.

Theorem 4.2.25 Let n be a sequence of determinantal processes on with good


kernels Kn . Let Dn be a sequence of measurable subsets of such that (Kn )Dn is
trace class and Var(n (Dn )) n . Then

n (Dn ) E [n (Dn )]
Zn = 
Var(n (Dn ))

converges in distribution towards a standard normal variable.


Proof We write Kn for the kernel (Kn )Dn and set Sn = Var(n (Dn )). By
Corollary 4.2.24, n (Dn ) has the same distribution as the sum of independent
Bernoulli variables kn , whose parameters kn are the eigenvalues of Kn . In partic-
228 4. S OME GENERALITIES

ular, Sn2 = k kn (1 kn ). Since Kn is trace-class, we can write, for any real,

log E[e Zn ] = log E[e (k k )/Sn ]


n n

k
k k
= + log(1 + kn (e /Sn 1))
Sn k
2 k kn (1 kn ) k kn (1 kn )
= + o( ),
2Sn2 Sn3
uniformly for in compacts. Since k kn /Sn3 n 0, the conclusion follows.

We note in passing that, under the assumptions of Theorem 4.2.25,



Var(n (Dn )) = kn (1 kn ) kn = Kn (x, x)d n (x) .
k k

Thus, for Var(n (Dn )) to go to infinity, it is necessary that



lim Kn (x, x)d n (x) = + . (4.2.25)
n Dn

(n)
We also note that from (4.2.3) (with r = 1 and k = 2, and k denoting the inten-
sity functions corresponding to the kernel Kn from Theorem 4.2.25), we get
 
Var(n (Dn )) = Kn (x, x)d n (x) Kn2 (x, y)d n (x)d n (y) . (4.2.26)
Dn Dn Dn

Exercise 4.2.26 Using (4.2.26), provide an alternative proof that a necessary con-
dition for Var(n (Dn )) is that (4.2.25) holds.

4.2.5 Determinantal processes associated with eigenvalues

We provide in this section several examples of point processes related to configu-


rations of eigenvalues of random matrices that possess a determinantal structure.
We begin with the eigenvalues of the GUE, and move on to define the sine and
Airy processes, associated with the sine and Airy kernels.

The GUE

[Continuation of Example 4.2.15] Let (1N , , NN ) be the eigenvalues of the GUE


of dimension N, and denote by N the point process N (D) = Ni=1 1 N D . Recall
i
that, with the GUE scaling, the empirical measure of theeigenvalues is, with high
probability, roughly supported on the interval [2 N, 2 N].
4.2 D ETERMINANTAL POINT PROCESSES 229

Corollary 4.2.27 Let D = [a, b] with a, b > 0, (1/2, 1/2), and set DN =
N D. Then
N (DN ) E[N (DN )]
ZN = 
Var(N (DN ))
converges in distribution towards a standard normal variable.

Proof In view of Example 4.2.15 and Theorem 4.2.25, the only thing we need to
check is that Var(N (DN )) as N . Recalling that
  2
K (N) (x, y) dy = K (N) (x, x) ,
R
it follows from (4.2.26) that for any R > 0, and all N large,
   2
Var(N (DN )) = K (N) (x, y) dxdy
DN (DN )c
   2
1 x y
= K (N) ( , ) dxdy
NDN N(DN )c N N N
 0  R
(N)
SbN (x, y)dxdy , (4.2.27)
R 0

where  
(N) 1 x y
Sz (x, y) = K (N) z + , z +
N N N
(N)
is as in Exercise 3.7.5, and SbN (x, y) converges uniformly on compacts, as N
, to the sine-kernel sin(x y)/( (x y)). Therefore, there exists a constant c > 0
such that the right side of (4.2.27) is bounded below, for large N, by c log R. Since
R is arbitrary, the conclusion follows.


Exercise 4.2.28 Using Exercise 3.7.5 again, prove that if DN = [a N, b N]
with a, b (0, 2), then Corollary 4.2.27 still holds.

Exercise 4.2.29 Prove that the conclusions of Corollary 4.2.27 and Exercise 4.2.28
hold when the GUE is replaced by the GOE.
Hint: Write (N) (DN ) for the variable corresponding to N (DN ) in Corollary
4.2.27, with the GOE replacing the GUE. Let (N) (DN ) and (N+1) (DN ) be inde-
pendent.
(a) Use Theorem 2.5.17 to show that N (DN ) can be constructed on the same prob-
ability space as (N) (DN ), (N+1) (DN ) in such a way that, for any > 0, there is
a C so that
lim sup P(|N (DN ) ( (N) (DN ) + (N+1) (DN ))/2| > C ) < .
N
230 4. S OME GENERALITIES

(b) By writing a GOE(N + 1) matrix as a rank 2 perturbation of a GOE(N) matrix,


show that the laws of (N) (DN ) and (N+1) (DN ) are close in the sense that a copy
of (N) (DN ) could be constructed on the same probability space as (N+1) (DN )
in such a way that their difference is bounded by 4.

The sine process

Recall the sine-kernel


1 sin(x y)
Ksine (x, y) = .
xy
Take = R and to be the Lebesgue measure, and for f L2 (R), define

Ksine f (x) = Ksine (x y) f (y)dy .

Writing ksine (z) = Ksine (x, y)|z=xy , we see that ksine (z) is the Fourier transform of
the function 1[1/2 ,1/2 ] ( ). In particular, for any f L2 (R),
   1/2
 f , Ksine f  = f (x) f (y)ksine (x y)dxdy = | f( )|2 d  f 22 .
1/2
(4.2.28)
Thus, Ksine (x, y) is positive definite, and by Lemma 4.2.13, Ksine is locally admis-
sible. Further, (4.2.28) implies that all eigenvalues of restrictions of Ksine to any
compact interval belong to the interval [0, 1]. Hence, by Corollary 4.2.22, Ksine
determines a determinantal point process on R (which is translation invariant in
the terminology of Section 4.2.6 below).

The Airy process



Recall from Definition 3.1.3 the Airy function Ai(x) = 21 i C e /3x d , where C
3

is the contour in the -plane consisting of the ray joining e i/3 to the origin plus
the ray joining the origin to e i/3 , and the Airy kernel KAiry (x, y) = A(x, y) :=
(Ai(x) Ai (y) Ai (x) Ai(y))/(x y) . Take = R and the Lebesgue measure.
Fix L > and let KAiry L denote the operator on L2 ([L, )) determined by


KAiry
L
f (x) = KAiry (x, y) f (y)dy .
L

We now have the following.

Proposition 4.2.30 For any L > , the kernel KAiry L (x, y) is locally admissible.

Further, all the eigenvalues of its restriction to compact sets belong to the interval
L
(0, 1]. In particular, KAiry determines a determinantal point process.
4.2 D ETERMINANTAL POINT PROCESSES 231

Proof We first recall, see (3.9.58), that



KAiry (x, y) = Ai(x + t) Ai(y + t)dt . (4.2.29)
0

In particular, for any L > and functions f , g L2 ([L, )),


  
 f , KAiry
L
g = g, KAiry
L
f = Ai(x + t) Ai(y + t) f (x)g(y)dtdxdy .
L L 0

It follows that KAiry


L is self-adjoint on L ([L, )). Further, from this representa-
2
tion, by an application of Fubinis Theorem,
  2

 f , KAiry
L
f = f (x) Ai(x + t)dx dt 0 .
0 L

Together with Lemma 4.2.13, this proves that KAiry


L is locally admissible.

To complete the proof, as in the case of the sine process, we need an upper
bound on the eigenvalues of restrictions of KAiry to compact subsets of R. Toward
this end, deforming the contour of integration in the definition of Ai(x) to the
imaginary line, using integration by parts to control the contribution of the integral
outside a large disc in the complex plane, and applying Cauchys Theorem, we
obtain the representation, for x R,
 R
1 3 /3+xs)
Ai(x) = lim ei(s ds ,
R 2 R

with the convergence uniform for x in compacts (from this, one can conclude
3
that Ai(x) is the Fourier transform, in the sense of distributions, of eis /3 / 2 , al-
though we will not use that). We now obtain, for continuous functions f supported
on [M, M] [L, ),
  2   2
M

 f , KAiry f  = f (x) Ai(x + t)dx dt f (x) Ai(x + t)dx dt .
0 L M
(4.2.30)
But, for any fixed K > 0,
 K  M 2

f (x) Ai(x + t)dx dt
K M
 K  M  2
1 R i(s3 /3+ts) ixs
= elim e ds f (x)dx dt
K M R 2 R
 
1 K R i(s3 /3+ts) 2

= lim e f (s)ds dt ,
R 2 K R
232 4. S OME GENERALITIES

where f denotes the Fourier transform of f and we have used dominated conver-
gence (to pull the limit out) and Fubinis Theorem in the last equality. Therefore,
 K  M 2  K  R 2
1
eits eis /3 f(s)ds dt
3
f (x) Ai(x + t)dx dt = lim
K M R K 2 R
  2
1
eits eis /3 1[R,R] (s) f(s)ds dt
3
lim sup
R 2
 2 
it 3 /3 2
= lim sup e 1[R,R] (t) f(t)dt dt f (t) dt =  f 22 ,
R

where we used Parsevals Theorem in the two last equalities. Using (4.2.30), we
thus obtain
 f , KAiry f   f 22 ,
first for all compactly supported continuous functions f and then for all f
L2 ([L, )) by approximation. An application of Corollary 4.2.22 completes the
proof.

4.2.6 Translation invariant determinantal processes


In this section we specialize the discussion to determinantal processes on Eu-
clidean space equipped with Lebesgues measure. Thus, let = Rd and let be
the Lebesgue measure.

Definition 4.2.31 A determinantal process with (, ) = (Rd , dx) is translation


invariant if the associated kernel K is admissible and can be written as K(x, y) =
K(x y) for some continuous function K : Rd R.

As we will see below after introducing appropriate notation, a determinantal pro-


cess is translation invariant if its law is invariant under (spatial) shifts.
For translation invariant determinantal processes, the conditions of Theorem
4.2.25 can sometimes be simplified.

Lemma 4.2.32 Assume that K is associated with a translation invariant determi-


nantal process on Rd . Then

1
lim Var( ([L, L]d )) = K(0) K(x)2 dx . (4.2.31)
L (2L)d Rd

Proof. By (4.2.26) with D = [L, L]d and Vol(D) = (2L)d ,



Var( (D)) = Vol(D)K(0) K 2 (x y)dxdy .
DD
4.2 D ETERMINANTAL POINT PROCESSES 233

In particular,

Vol(D)K(0) K 2 (x y)dxdy .
DD

By monotone convergence, it then follows by taking L that K 2 (x)dx
K(0) < . Further, again from (4.2.26),
  
Var( (D)) = Vol(D)(K(0) K(x)2 dx) + dx K 2 (x y)dy.
Rd D y:y D

Since Rd K(x)2 dx < , (4.2.31) follows from the last equality.

We emphasize that the RHS in (4.2.31) can vanish. In such a situation, a more
careful analysis of the limiting variance is needed. We refer to Exercise 4.2.40 for
an example of such a situation in the (important) case of the sine-kernel.
We turn next to the ergodic properties of determinantal processes. It is natural
to discuss these in the framework of the configuration space X . For t Rd , let T t
denote the shift operator, that is for any Borel set A Rd , T t A = {x + t : x A}.
We also write T t f (x) = f (x + t) for Borel functions. We can extend the shift to
act on X via the formula T t x = (xi + t)i=1 for x = (xi )i=1 . T t then extends to a
shift on CX in the obvious way. Note that one can alternatively also define T t
by the formula T t (A) = (T t A).

Definition 4.2.33 Let x be a point process in (X , CX , ). We say that x is ergodic


if for any A CX satisfying T t A = A for all real t, it holds that (A) {0, 1}. It
is mixing if for any A, B CX , (A T t B) |t| (A) (B).

By standard ergodic theory, if x is mixing then it is ergodic.

Theorem 4.2.34 Let x be a translation invariant determinantal point process in


Rd , with good kernel K satisfying K(|x|) |x| 0. Then x is mixing, and hence
ergodic.


Proof Recall from Theorem 4.2.25 that K 2 (x)dx < . It is enough to check
that for arbitrary collections of compact Borel sets {Fi }Li=1 1
and {G j }Lj=1
2
such that

Fi Fi = 0/ and G j G j = 0/ for i = i , j = j , and with the notation Gtj = T t G j , it
holds that for any z = {zi }Li=1
1
CL1 , w = {w j }Lj=1
2
CL2 ,
     
L1 L2 L1 L2
(F ) (Gt ) (F ) (G )
E zi i wj j |t| E zi i E wj j . (4.2.32)
i=1 j=1 i=1 j=1
234 4. S OME GENERALITIES
L1 L2
Define F = i=1 Fi , Gt = t
j=1 G j . Let

L1 L2
K1 = 1F (1 zi )K1Fi , K2t = 1Gt (1 w j )K1Gtj ,
i=1 j=1
L2 L1
t
K12 = 1F (1 w j )K1Gtj , t
K21 = 1Gt (1 zi )K1Fi .
j=1 i=1

By Lemma 4.2.16, the left side of (4.2.32) equals, for |t| large enough so that
F Gt = 0,
/
(K1 + K2t + K12
t t
+ K21 ). (4.2.33)
t
|t| 0, supx,y K21 |t| 0. Therefore, by
Note that, by assumption, supx,y K12 t

Lemma 3.4.5, it follows that

lim |(K1 + K2t + K12


t t
+ K21 ) (K1 + K2t )| = 0 . (4.2.34)
|t|

Next, note that for |t| large enough such that F Gt = 0, K1  K2t = 0 and hence,
by the definition of the Fredholm determinant,

(K1 + K2t ) = (K1 )(K2t ) = (K1 )(K2 ) ,

where K2 := K20 and the last equality follows from the translation invariance of K.
Therefore, substituting in (4.2.33) and using (4.2.34), we get that the left side of
(4.2.32) equals (K1 )(K2 ). Using Lemma 4.2.16 again, we get (4.2.32).

Let be a nonzero translation invariant determinantal point process with good


kernel K satisfying K(|x|) |x| 0. As a consequence of Theorem 4.2.34 and
the ergodic theorem, the limit

c := lim ([n, n]d )/(2n)d (4.2.35)


n

exists and is strictly positive, and is called the intensity of the point process.
For stationary point processes, an alternative description can be obtained by
considering configurations conditioned to have a point at the origin. When spe-
cialized to one-dimensional stationary point processes, this point of view will be
used in Subsection 4.2.7 when relating statistical properties of the gap around zero
for determinantal processes to ergodic averages of spacings.

Definition 4.2.35 Let be a translation invariant point process, and let B denote a
Borel subset of Rd of positive and finite Lebesgue measure. The Palm distribution
Q associated with is the measure on M+ (Rd ) determined by the equation, valid
4.2 D ETERMINANTAL POINT PROCESSES 235

for any measurable A,


 
Q(A) = E 1A (T s ) (ds) /E( (B)) .
B

We then have:

Lemma 4.2.36 The Palm distribution Q does not depend on the choice of the Borel
set B.

Proof We first note that, due to the stationarity, E( (B)) = c (B) with the
Lebesgue measure, for some constant c. (It is referred to as the intensity of , and
for determinantal translation invariant point processes, it coincides with the pre-
viously defined notion of intensity, see (4.2.35)). It is obvious from the definition
that the random measure

A (B) := 1A (T s ) (ds)
B

is stationary, namely A (T t B) has the same distribution as A (B). It follows that


E A (T t B) = E A (B) for all t Rd , implying that E A (B) = cA (B) for some
constant cA , since the Lebesgue measure is (up to multiplication by scalar) the
unique translation invariant measure on Rd . The conclusion follows.

Due to Lemma 4.2.36, we can speak of the point process 0 attached to the
Palm measure Q, which we refer to as the Palm process. Note that 0 is such
that Q( 0 ({0}) = 1) = 1, i.e. 0 is such that the associated configurations have
a point at zero. It turns out that this analogy goes deeper, and in fact the law Q
corresponds to conditioning on an atom at the origin. Let V 0 denote the Voronoi
cell associated with 0 , i.e., with B(a, r) denoting the Euclidean ball of radius r
around a,
V 0 = {t Rd : 0 (B(t, |t|)) = 0} .

Proposition 4.2.37 Let be a nonzero translation invariant point process with


good kernel K satisfying K(|x|) |x| 0, with intensity c. Let 0 denote the
associated Palm process. Then the law P of can be determined from the law Q
of 0 via the formula, valid for any bounded measurable function f ,

E f ( ) = cE f (T t 0 )dt , (4.2.36)
V 0

where c is the intensity of .


236 4. S OME GENERALITIES

Proof From the definition of 0 it follows that for any bounded measurable func-
tion g,

E g(T s ) (ds) = c (B)Eg( 0 ) . (4.2.37)
B

This extends by monotone class to jointly measurable nonnegative functions h :


M+ (Rd ) Rd R as
 
E h(T ,t) (dt) = cE
t
h( 0 ,t)dt .
Rd Rd

Applying the last equality to h( ,t) = g(T t ,t), we get


  
E g( ,t) (dt) = cE g(T t 0 ,t)dt = cE g(T t 0 , t)dt . (4.2.38)
Rd Rd Rd

Before proceeding, we note a particularly useful consequence of (4.2.38). Namely,


let

D := { : there exist t = t Rd with t = t  and ({t}) ({t }) = 1} .

The measurability of D is immediate from the measurability of the set

D = {(t,t ) (Rd )2 : t = t ,t = t } .

Now, with Et = { : (y) = 1 for some y = t with y = t},



1D 1Et (dt) .

Therefore, using (4.2.38),



P(D) cE 1T t 0 Et dt .
Rd

Since all configurations are countable, the set of ts in the indicator in the inner
integral on the right side of the last expression is contained in a countable collec-
tion of (d 1)-dimensional surfaces. In particular, its Lebesgue measure vanishes.
One thus concludes that
P(D) = 0 . (4.2.39)

Returning to the proof of the proposition, apply (4.2.38) with


g( ,t) = f ( )1 ({t})=1, (B(0,|t|))=0 , and use the fact that T t 0 (B(0, |t|)) = 0 iff
t V 0 to conclude that
    
E f ( ) 1 (B(0,|t|))=0 (dt) = cE f (T t 0 )dt .
V 0
4.2 D ETERMINANTAL POINT PROCESSES 237

Since P(D) = 0, it follows that 1 (B(0,|t|))=0 (dt) = 1, for almost every . This
yields (4.2.36).

Exercise 4.2.38 Let be a nonzero translation invariant determinantal point pro-


cess with good kernel K. Show that the intensity c defined in (4.2.35) satisfies
c = K(0).

Exercise 4.2.39 Assume that K satisfies the assumptions of Lemma 4.2.32, and
define the Fourier transform

K( ) = K(x) exp(2 ix )dx L2 (Rd ) .
xRd
Give a direct proof that the right side of (4.2.31) is nonnegative.
Hint: use the fact that, since K is a good kernel, it follows that K 1.

Exercise 4.2.40 [CoL95] Take d = 1 and check that the sine-kernel Ksine (x) =
sin(x)/ x is a good translation invariant kernel for which the right side of (4.2.31)
vanishes. Check that then, if a < b are fixed,
E[ (L[a, b])] = L(b a)/ ,
whereas
1
Var( (L[a, b])) =
log L + O(1).
2
Hint: (a) Apply Parsevals Theorem and the fact that the Fourier transform of the
function sin(x)/ x is the indicator over the interval [1/2 , 1/2 ] to conclude
 2
that K (x)dx = 1/ = K(0).
(b) Note that, with D = L[a, b] and Dx = [La x, Lb x],
     
1 1 cos(2u)
dx K 2 (x y)dy = dx K 2 (u)du = dx du ,
D Dc D Dcx 2 D Dcx 2u2
from which the conclusion follows.

Exercise 4.2.41 Let |V 0 | denote the Lebesgue measure of the Voronoi cell for a
Palm process 0 corresponding to a stationary determinantal process on Rd with
intensity c. Prove that E(|V 0 |) = 1/c.

4.2.7 One-dimensional translation invariant determinantal processes

We restrict attention in the sequel to the case of most interest to us, namely to
dimension d = 1, in which case the results are particularly explicit. Indeed, when
238 4. S OME GENERALITIES

d = 1, each configuration x of a determinantal process can be ordered, and we


write x = (. . . , x1 , x0 , x1 , . . .) with the convention that xi < xi+1 for all i and
x0 < 0 < x1 (by stationarity and local finiteness, P( ({0}) = 1) = 0, and thus
the above is well defined). We also use x0 = (. . . , x1 0 , 0 = x0 , x0 , . . .) to denote the
0 1
configuration corresponding to the Palm process . The translation invariance of
0

the point process translates then to stationarity for the Palm process increments,
as follows.

Lemma 4.2.42 Let x0 denote the Palm process associated with a determinantal
translation invariant point process x on R with good kernel K satisfying
K(|x|) |x| 0, and with intensity c > 0. Then the sequence y0 := {xi+1
0 x0 }
i iZ
is stationary and ergodic.

Proof Let T y0 = {y0i+1 }iZ denote the shift of y0 . Consider g a Borel function
on R2r for some r 1, and set g(y0 ) = g(y0r , . . . , y0r1 ). For any configuration x
with xi < xi+1 and x1 < 0 x0 , set y := {xi+1 xi }iZ . Set f (x) = g(xr+1
xr , . . . , xr xr1 ), and let Au = { : f (x) u}. Au is clearly measurable, and
by Definition 4.2.35 and Lemma 4.2.36, for any Borel B with positive and finite
Lebesgue measure,
 
P(g(y ) u) = Q(Au ) = E
0
1Au (T ) (ds) /c (B)
s
B
 
= E 1g(T i y)u /c (B) . (4.2.40)
i:xi B

(Note the different roles of the shifts T s , which is a spatial shift, and T i , which is
a shift on the index set, i.e. on Z.) Hence,

|P(g(y0 ) u) P(g(T y0 ) u)| 2/c (B) .

Taking B = Bn = [n, n] and then n , we obtain that the left side of the last
expression vanishes. This proves the stationarity. The ergodicity (and in fact,
mixing property) of the sequence y0 is proved similarly, starting from Theorem
4.2.34.

We also have the following analog of Proposition 4.2.37.

Proposition 4.2.43 Assume x is a nonzero stationary determinantal process on R


with intensity c. Then for any bounded measurable function f ,
 x0
1
E( f (x)) = cE f (T t x0 )dt . (4.2.41)
0
4.2 D ETERMINANTAL POINT PROCESSES 239

Proof Apply (4.2.38) with g( ,t) = f (x)1x0 ( )=t .


Proposition 4.2.43 gives an natural way to construct the point process starting
from 0 (whose increments form a stationary sequence): indeed, it implies that
is nothing but the size biased version of 0 , where the size biasing is obtained by
the value of x10 . More explicitly, let x denote a translation invariant determinantal
process with intensity c, and let x0 denote the associated Palm process on R.
Consider the sequence y0 introduced in Lemma 4.2.42, and denote its law by Qy .
Let y denote a sequence with law Qy satisfying d Qy /dQy (y) = cy0 , let x0 denote
the associated configuration, that is xi0 = i1
j=1 y j , noting that x0 = 0, and let U
denote a random variable distributed uniformly on [0, 1], independent of x0 . Set
0
x = T U x1 x0 . We then have

Corollary 4.2.44 The point process x has the same law as x.

Proof By construction, for any bounded measurable f ,


 1  x0
0 1 dt
E f (x) = E f (T ux1 x0 )du = E f (T t x0 )
0 0 x10
 x0
1
= cE f (T t x0 )dt = E f (x) ,
0

where Proposition 4.2.43 was used in the last step.


Corollary 4.2.44 has an important implication to averages. Let Bn = [0, n]. For
a bounded measurable function f and a point process x on R, let

xi Bn f (T xi x)
fn (x) = .
|{i : xi Bn }|

Corollary 4.2.45 Let x be a translation invariant determinantal process with in-


tensity c, and good kernel K satisfying K(x) |x| 0, and Palm measure Q. Then

lim fn (x) = EQ f , almost surely .


n

Proof The statement is immediate from the ergodic theorem and Lemma 4.2.42
for the functions fn (x0 ). Since, by Corollary 4.2.44, the law of T x1 x is absolutely
continuous with respect to that of x0 , the conclusion follows by an approximation
argument.

Corollary 4.2.44 allows us to relate several quantities of interest in the study of


determinantal processes. For a translation invariant determinantal point process x,
let Gx = x1 x0 denote the gap around 0. With Q1 denoting the marginal on x10 of
240 4. S OME GENERALITIES

the Palm measure, and with Q1 defined by d Q1 /dQ1 (u) = cu, note that
 
P(Gx t) = P(x10 t) = Q1 (du) = c uQ1 (du) .
t t

Let G(t) = P({x} (t,t) = 0)


/ be the probability that the interval (t,t) does not
contain any point of the configuration x. Letting Dt = 1(t,t) Kt = 1Dt K1Dt , and
t = (Dt ), we have, using Lemma 4.2.16, that

G(t) = P(t = 0) = lim E(zt ) = (Kt ) , (4.2.42)


|z|0

that is, G(t) can be read off easily from the kernel K. Other quantities can be read
off G, as well. In particular, the following holds.

Proposition 4.2.46 Let x be a translation invariant determinantal point process


of intensity c. Then the function G is differentiable and

G(t)
= 2c Q1 (dw) . (4.2.43)
t 2t

Proof By Corollary 4.2.44,


 1/2  1/2 
G(t) = 2 P(ux10 t)du = 2 Q1 (ds)du
0 0 t/u
 
= 2t dww2 Q1 (ds) ,
2t w

where the change of variables w = t/u was used in the last equality. Integrating
by parts, using V (w) = 1/w and U (w) = Q1 ([w, )), we get

G(t) = U (2t) 2t w1 Q1 (dw)
2t

= U (2t) 2ct Q1 (dw) = U (2t) 2ctQ1 ([2t, ))
2t

= c [w 2t]Q1 (dw) .
2t

Differentiating in t, we then get (4.2.43).


Finally, we describe an immediate consequence of Proposition 4.2.46, which


is useful when relating different statistics related to the spacing of eigenvalues
of random matrices. Recall the spacing process y associated with a stationary
point process x, i.e. yi = xi+1 xi .
4.2 D ETERMINANTAL POINT PROCESSES 241

Corollary 4.2.47 Let g be a bounded measurable function on R+ and define gn =


1 n
n i=1 g(yi ) . Then

gn n EQ1 g = g(w)Q1 (dw) , almost surely .
0

In particular, with gt (w) = 1w>2t , we get


G(t)
= 2c lim (gt )n , almost surely . (4.2.44)
t n

4.2.8 Convergence issues

We continue to assume K is a good translation invariant kernel on R satisfying


K(|x|) |x| 0. In many situations, the kernel K arises as a suitable limit of
kernels KN (x, y) that are not translation invariant, and it is natural to relate prop-
erties of determinantal processes xN (or N ) associated with KN to those of the
determinantal process x (or ) associated with K.
We begin with a simple lemma that is valid for (not necessarily translation
invariant) determinantal processes. Let KN denote a sequence of good kernels
corresponding to a determinantal process xN , and let K be a good kernel cor-
responding to a determinantal process x. Set G(t) = P({x} (t, t) = 0)
/ and
GN (t) = P({x } (t,t) = 0).
N /

Lemma 4.2.48 Let D denote disjoint compact subsets of R. Suppose a se-


quence of good kernels KN satisfy KN (x, y) K(x, y) uniformly on compact sub-
sets of R, where K is a good kernel. Then for any L finite, the random vector
( N (D1 ), . . . , N (DL )) converges to the random vector ( (D1 ), . . . , (DL )) in dis-
tribution. In particular, GN (t) N G(t).

Proof It is clearly enough to check that


   
L L
N (D ) (D )
E z  N E z  .
=1 =1
L
By Lemma 4.2.16, with D = =1 , the last limit would follow from the conver-
gence
   
L L
1D (1 z )KN 1D N 1D (1 z )K1D ,
=1 =1

which is an immediate consequence of Lemma 3.4.5.


In what follows, we assume that K is a good translation invariant kernel on R


242 4. S OME GENERALITIES

satisfying K(|x|) |x| 0. In many situations, the kernel K arises as a suitable


limit of kernels KN (x, y) that are not translation invariant, and it is natural to relate
properties of determinantal processes xN (or N ) associated with KN to those of
the determinantal process x (or ) associated with K.
We next discuss a modification of Corollary 4.2.47 that is applicable to the
process xN and its associated spacing process yN .

Theorem 4.2.49 Let gt (x) = 1x>t , and define gNn,t = 1n ni=1 gt (yNi ) . Suppose further
that n = o(N) N is such that for any constant a > 0,

lim sup sup |KN (x, y) K(x y)| = 0 . (4.2.45)


N |x|+|y|2an

Then 
gNn,t N EQ1 gt = Q1 (dw) , in probability . (4.2.46)
t

Proof In view of Corollary 4.2.47, it is enough to prove that |gNn,t gn,t | N 0,


in probability. Let c denote the intensity of the process x. For a > 0, let Dn,a =
[0, an]. By Corollary 4.2.45, (Dn,a )/n converges almost surely to a/c. We now
claim that
N (Dn,a ) a
N , in probability . (4.2.47)
n c
Indeed, recall that by Lemma 4.2.5 and the estimate (4.2.45),
 an
1 N 1 anK(0) a
E (Dn,a ) = [KN (x, x) K(0)]dx + N ,
n n 0 n c
while, c.f. (4.2.26),
  
1 N 1 an
Var (Dn,a ) 2 KN (x, x)dx N 0 ,
n n 0
proving (4.2.47).
In the sequel, fix a > 0 and let
1 1
CN (s, n) = 1an>xiN ,xi+1
n i=1
N xN >s ,
i
C(s, n) = 1an>xi ,xi+1 xi >s .
n i=1
In view of (4.2.47), in order to prove (4.2.46) it is enough to show that, for any
a, s > 0,

|ECN (s, n) EC(s, n)| N 0 , |E (CN (s, n))2 E (C(s, n))2 | N 0 .


(4.2.48)
Fix > 0, and divide the interval [0, an) into $n/ % disjoint intervals Di =
4.2 D ETERMINANTAL POINT PROCESSES 243

[(i 1) , i ) [0, n), each of length . Let iN = N (Di ) and i = (Di ).


Set
1 $an/ %
SN (s, , n) = 1iN 1, Nj =0, j=i+1,...,i+s/  ,
n i=1

and
1 $an/ %
S(s, , n) = 1i 1, Nj =0, j=i+1,...,i+s/  .
n i=1

We prove below that, for any fixed s, ,

|ESN (s, , n) ES(s, , n)| N 0 , (4.2.49)


|E(S (s, , n) ) E(S(s, , n) )| N 0 ,
N 2 2
(4.2.50)

from which (4.2.48) follows by approximation.


To see (4.2.49), note first that
 
1 $an/ % i+s/ 
ES (s, , n) =
N
E 1iN 1 j
n i=1
N
j=i+1
 
1 $an/ % i+s/ 
= E (1 1iN =0 ) j
n i=1
N
j=i+1
    
$an/ % i+s/  N i+s/  N
1 j j
= maxlim
n i=1 j |z j |0
E zj E zj
j=i+1 j=i

1 $an/ % 
=
n i=1
(1Bi KN 1Bi ) (1B+ KN 1B+ ) ,
i i

i+s/  i+s/ 
where Bi = j=i+1 D j and B+
i = j=i D j , and we used Lemma 4.2.16 in the
last equality. Similarly,

1 $an/ % 
ES(s, , n) =
n i=1
(1Bi K1Bi ) (1 +
Bi K1 +
Bi ) ,

Applying Corollary 4.2.45, (4.2.49) follows.


The proof of (4.2.50) is similar and omitted.

4.2.9 Examples

We consider in this subsection several examples of determinantal processes.


244 4. S OME GENERALITIES

The biorthogonal ensembles

In the setup of Subsection 4.2.1, let (i , i )i0 be functions in L2 (, ). Let



gi j = i (x) j (x)d (x), 1 i, j N .

Define the measure N on N by

N N N
N (dx1 , , dxN ) = det (i (x j )) det (i (x j )) d (xi ) . (4.2.51)
i, j=1 i, j=1 i=1

Lemma 4.2.50 Assume that all principal minors of G = (gi j ) are not zero. Then
the measure N of (4.2.51) defines a determinantal simple point process with N
points.

Proof The hypothesis implies that G admits a Gauss decomposition, that is, it
can be decomposed into the product of a lower triangular and an upper triangular
matrix, with nonzero diagonal entries. Thus there exist matrices L = (li j )Ni,j=1 and
U = (ui j )Ni,j=1 so that LGU = I. Setting

= U = L ,

it follows that, with respect to the scalar product in L2 ( ),

i , j  = i, j , (4.2.52)

and, further,
N N N
N (dx1 , , dxN ) = CN det (i (x j )) det (i (x j )) d (xi )
i, j=1 i, j=1 i=1

for some constant CN . Proceeding as in the proof of Lemma 3.2.2, we conclude


that
N N N
N (dx1 , , dxN ) = CN det
i, j=1
k (xi )k (x j ) d (xi ) .
k=1 i=1

The proof of Lemma 4.2.50 is concluded by using (4.2.52) and computations sim-
ilar to Lemma 3.2.2 in order to verify the property in Remark 4.2.6.

Exercise 4.2.51 By using Remark 4.1.7, show that all joint distributions appear-
ing in Weyls formula for the unitary groups (Proposition 4.1.6) correspond to
determinantal processes.
4.2 D ETERMINANTAL POINT PROCESSES 245

Birthdeath processes conditioned not to intersect

Take to be Z, the counting measure and Kn a homogeneous (discrete time)


Markov semigroup, that is, Kn : R+ so that, for any integers n, m,

Kn+m (x, y) = Kn  Km (x, y) = Kn (x, z)Km (z, y)d (z) ,

and, further, Kn (x, y)d (y) = 1. We assume K1 (x, y) = 0 if |x y| = 1. We let
{Xn }n0 denote the Markov process with kernel K1 , that is for all n < m integers,

P(Xm A|X j , j n) = P(Xm A|Xn ) = Kmn (Xs , y)d (y) .
yA

Fix x = (x1 < < xN ) with xi 2Z. Let {Xxn }n0 = {(Xn1 , . . . , XnN )}n0 denote
N independent copies of {Xn }n0 , with initial positions (X01 , . . . , X0N ) = x. For
0
integer T , define the event AT = 0kT {Xk1 < Xk2 < < XkN }.

Lemma 4.2.52 (GesselViennot) With the previous notation, set y = (y1 < <
yN ) with yi 2Z. Then
N &
K2T (x, y) = P (Xx2T = y|A2T )
detNi,j=1 (K2T (xi , y j ))
=  .
z1 <<zN detNi,j=1 (K2T (xi , z j )) d (z j )

Proof The proof is an illustration of the reflection principle. Let P2T (x, y),
x, y 2Z, denote the collection of Z-valued, nearest neighbor paths { ()}2T
=0
with (0) = x, (2T ) = y and | ( + 1) ()| = 1. Let
8 9
2T (x, y) = { i }Ni=1 : i P2T (xi , yi )

denote the collection of N nearest neighbor paths, with the ith path connecting xi
and yi . For any permutation SN , set y = {y (i) }Ni=1 . Then
N N
det (K2T (xi , y j )) =
i, j=1
( ) K2T ( i ) , (4.2.53)
SN { i }N
i=1 2T (x,y )
i=1

where
 
2T 2
K2T ( ) = K1 (x , (2))
i i i
K1 ( (k), (k + 1))
i i
K1 ( i (2T 1), y (i) ) .
k=2

On the other hand, let

N Cx,y
2T = {{ }i=1 2T (x, y) : { } { } = 0
i N i j
/ if i = j}
246 4. S OME GENERALITIES

denote the collection of disjoint nearest neighbor paths connecting x and y. Then
N
P (Xx2T = y, A2T ) = x,y
K2T ( i ) . (4.2.54)
{ i }i=1 N C2T i=1
N

Thus, to prove the lemma, it suffices to check that the total contribution in (4.2.53)
of the collection of paths not belonging to N Cx,y 2T vanishes. Toward this end,
the important observation is that because we assumed x, y 2Z, for any n 2t
and i, j N, any path 2T (xi , y j ) satisfies (n) 2Z + n. In particular, if

{ i }Ni=1 SN 2T (x, y ) and there is a time n 2T and integers i < j such
that i (n) j (n), then there actually is a time m n with i (m) = j (m).
Now, suppose that in a family { i }Ni=1 2T (x, y ), there are integers i < j so
that i (n) = j (n). Consider the path so that
j
(), k = i,  > n
k () = i (), k = j,  > n
k
(), otherwise.

Then, obviously, Ni=1 K2T ( i ) = Ni=1 K2T ( i ). Further, for some SN ,


{ i }Ni=1 2T (x, y ), with and differing only by the transposition of i and
j. In particular, ( ) + ( ) = 0.
We can now conclude: by the previous argument, the contribution in (4.2.53)
of the collection of paths where 1 intersects with any other path vanishes. On the
other hand, for the collection of paths where 1 does not intersect any other path
(and thus 1 (2T ) = y1 ), one freezes a path 1 and repeats the same argument to
conclude that the sum over all other paths, restricted not to intersect the frozen path
1 but to have 2 intersect another path, vanishes. Proceeding inductively, one
concludes that the sum in (4.2.53) over all collections { i }Ni=1 N Cx,y
2T vanishes.
This completes the proof.

Combining Lemma 4.2.52 with Lemma 4.2.50, we get the following.

Corollary 4.2.53 In the setup of Lemma 4.2.52, let


.
N
B2T,y = A2T {X i (2T ) = yi } .
i=1

Conditioned on the event B2T,y , the process (X 1 (n), . . . , X N (n))n[0,2T ] is a (time


inhomogeneous) Markov process satisfying, with z = (z1 < z2 < < zN ) and
n < 2T ,
N N
P (Xxn = z|A2T ) = CN (n, T, x, y) det (Kn (xi , z j )) det (K2T n (zi , y j ))
i, j=1 i, j=1
4.2 D ETERMINANTAL POINT PROCESSES 247

with
 N N N
CN (n, T, x, y) = det (Kn (xi , z j )) det (K2T n (zi , y j )) d (zi ) .
i, j=1 i, j=1 i=1

At any time n < 2T , the configuration (X 1 (n), . . . , X N (n)), conditioned on the


event B2T,y , is a determinantal simple point process.

We note that, in the proof of Lemma 4.2.52, it was enough to consider only
the first time in which paths cross; the proof can therefore be adapted to cover
diffusion processes, as follows. Take = R, the Lebesgue measure, and con-
sider a time homogeneous, real valued diffusion process (Xt )t0 with transition
kernel Kt (x, y) which is jointly continuous in (x, y). Fix x = (x1 < < xN )
with xi R. Let {Xtx }t0 = {(Xt1 , . . . , XtN )}t0 denote N independent copies of
{Xt }t0 , with initial positions (X01 , . . . , X0N ) = x. For real T , define the event
0
AT = 0tT {Xt1 < Xt2 < < XtN }.

Lemma 4.2.54 (KarlinMcGregor) With the previous notation, the probability


measure P (XxT |AT ) is absolutely continuous with respect to Lebesgue measure
restricted to the set {y = (y1 < y2 < < yN )} RN , with density pxT (y|AT )
satisfying
detNi,j=1 (KT (xi , y j ))
pxT (y|AT ) =  .
z1 <<zN detNi,j=1 (KT (xi , z j )) dz j

Exercise 4.2.55 Prove the analog of Corollary 4.2.53 in the setup of Lemma
4.2.54. Use the following steps.
(a) For t < T , construct the density qtN,T,x,y of Xtx conditioned on AT {XxT = y}
so as to satisfy, for any Borel sets A, B RN and t < T ,
 N  N
P (Xtx A, XxT B|AT ) =
A
dzi B dyi qtN,T,x,y (z)px (y|AT ) .
i=1 i=1

(b) Show that the collection of densities qtN,T,x,y determine a Markov semigroup
corresponding to a diffusion process, and
N N
qtN,T,x,y (z) = CN,T (t, x, y) det (Kt (xi , z j )) det (KT t (zi , y j ))
i, j=1 i, j=1

with
 N N N
CN,T (t, x, y) = det (Kt (xi , z j )) det (KT t (zi , y j )) d (zi ) ,
i, j=1 i, j=1 i=1

whose marginal at any time t < T corresponds to a determinantal simple point


process with N points.
248 4. S OME GENERALITIES

Exercise 4.2.56 (a) Use Exercise 4.2.55 and the heat kernel
K1 (x, y) = (2 )1/2 e(xy)
2 /2

to conclude that the law of the (ordered) eigenvalues of the GOE coincides with
the law of N Brownian motions run for a unit of time and conditioned not to inter-
sect at positive times smaller than 1.
Hint: start the Brownian motion at locations 0 = x1 < x2 < < xN and then take
xN 0, keeping only the leading term in x and noting that it is a polynomial in y
that vanishes when (y) = 0.
(b) Using part (a) and Exercise 4.2.55, show that the law of the (ordered) eigen-
values of the GUE coincides with the law of N Brownian motions at time 1, run
for two units of time, and conditioned not to intersect at positive times less than 2,
while returning to 0 at time 2.

4.3 Stochastic analysis for random matrices

In this section we introduce yet another effective tool for the study of Gaussian
random matrices. The approach is based on the fact that a standard Gaussian
variable of mean 0 and variance 1 can be seen as the value, at time 1, of a standard
Brownian motion. (Recall that a Brownian motion Wt is a zero mean Gaussian
process of covariance E(Wt Ws ) = t s.) Thus, replacing the entries by Brownian
motions, one gets a matrix-valued random process, to which stochastic analysis
and the theory of martingales can be applied, leading to alternative derivations and
extensions of laws of large numbers, central limit theorems, and large deviations
for classes of Gaussian random matrices that generalize the Wigner ensemble of
Gaussian matrices. As discussed in the bibliographical notes, Section 4.6, some of
the later results, when specialized to fixed matrices, are currently only accessible
through stochastic calculus.
Our starting point is the introduction of the symmetric and Hermitian Brownian
motions; we leave the introduction of the symplectic Brownian motions to the
exercises.

Definition 4.3.1 Let (Bi, j , Bi, j , 1 i j N) be a collection of i.i.d. real valued


standard Brownian motions. The symmetric (resp. Hermitian) Brownian 8 N,motion,

denoted H N, HN , = 1, 2, is the random process with entries Hi, j (t),t
9
0, i j equal to
1
(Bk,l + i( 1)Bk,l ), if k < l ,
N,
Hk,l = N (4.3.1)
2 Bl,l , if k = l .
N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 249

We will be studying the stochastic process of the (ordered) eigenvalues of H N, . In


Subsection 4.3.1, we derive equations for the system of eigenvalues, and show that
at all positive times, eigenvalues do not collide. These stochastic equations are
then used in Subsections 4.3.2, 4.3.3 and 4.3.4 to derive laws of large numbers,
central limit theorems, and large deviation upper bounds, respectively, for the
process of empirical measure of the eigenvalues.

4.3.1 Dysons Brownian motion

We begin in this subsection our study of the process of eigenvalues of time-


dependent matrices. Throughout, we let (W1 , . . . ,WN ) be a N-dimensional Brow-
nian motion in a probability space (, P) equipped with a filtration F = {Ft ,t
0}. Let N denote the open simplex

N = {(xi )1iN RN : x1 < x2 < < xN1 < xN } ,



with closure N . With {1, 2}, let X N, (0) HN be a matrix with (real)
eigenvalues (1N (0), . . . , NN (0)) N . For t 0, let N (t) = (1N (t), . . . , NN (t))
N denote the ordered collection of (real) eigenvalues of

X N, (t) = X N, (0) + H N, (t) , (4.3.2)

with H N, as in Definition 4.3.1. A fundamental observation (due to Dyson in the


case X N, (0) = 0) is that the process ( N (t))t0 is a vector of semi-martingales,
whose evolution is described by a stochastic differential system.
 
Theorem 4.3.2 (Dyson) Let X N, (t) t0 be as in (4.3.2), with eigenvalues
( N (t))t0 and N (t) N for all t 0. Then, the processes ( N (t))t0 are
semi-martingales. Their joint law is the unique distribution on C(R+ , RN ) so that
 
P t > 0, (1N (t), , NN (t)) N = 1 ,

which is a weak solution to the system



2 1 1
d i (t) = 
N
N
dWi (t) +
N N (t) N (t) dt , i = 1, . . . , N , (4.3.3)
j: j =i i j

with initial condition N (0).

We refer the reader to Appendix H, Definitions H.4 and H.3, for the notions of
strong and weak solutions.
250 4. S OME GENERALITIES

Note that, in Theorem 4.3.2, we do not assume that N (0) N . The fact that
N (t) N for all t > 0 is due to the natural repulsion of the eigenvalues. This
repulsion will be fundamental in the proof of the theorem.
It is not hard to guess the form of the stochastic differential equation for the
eigenvalues of X N, (t), simply by writing X N, (t) = (ON ) (t)(t)ON (t), with
(t) diagonal and (ON ) (t)ON (t) = IN . Differentiating formally (using Itos for-
mula) then allows one to write the equations (4.3.3) and appropriate stochastic dif-
ferential equations for ON (t). However, the resulting equations are singular, and
proceeding this way presents several technical difficulties. Instead, our derivation
of the evolution of the eigenvalues N (t) will be somewhat roundabout. We first
show, in Lemma 4.3.3, that the solution of (4.3.3), when started at N , exists, is
unique, and stays in N . Once this is accomplished, the proof that ( N (t))t0
solves this system will involve routine stochastic analysis.

Lemma 4.3.3 Let N (0) = (1N (0), . . . , NN (0)) N . For any 1, there exists
a unique strong solution ( N (t))t0 C(R+ , N ) to the stochastic differential
system (4.3.3) with initial condition N (0). Further, the weak solution to (4.3.3)
is unique.

This result is extended to initial conditions N (0) N in Proposition 4.3.5.


Proof The proof is routine stochastic analysis, and proceeds in three steps. To
overcome the singularity in the drift, one first introduces a cut-off, parametrized
by a parameter M, thus obtaining a stochastic differential equation with Lipschitz
coefficients. In a second step, a Lyapunov function is introduce that allows one
to control the time TM until the diffusion sees the cut-off; before that time, the
solution to the system with cut-off is also a solution to the original system. Finally,
taking M one shows that TM almost surely, and thus obtains a solution
for all times.
Turning to the proof, set, for R > 0,
 1
x if |x| R1 ,
R (x) =
R2 x otherwise.

Introduce the auxiliary system


7
2 1
d iN,R (t) =
N
dWi (t) +
N R (iN,R (t) jN,R (t))dt, i = 1, . . . , N ,
j: j =i
(4.3.4)
with iN,R (0) = iN (0) for i = 1, . . . , N. Since R is uniformly Lipschitz, it follows
from Theorem H.6 that (4.3.4) admits a unique strong solution, adapted to the
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 251
N,R
filtration F , as well as a unique weak solution PT, N (0)
M1 (C([0, T ], RN )). Let

R := inf{t : min |iN,R (t) jN,R (t)| < R1 } ,


i = j

noting that R is monotone increasing in R and



N,R (t) = N,R (t) for all t R and R < R . (4.3.5)
We now construct a solution to (4.3.3) by taking N (t) = N,R (t) on the event R >
t, and then showing that R R , almost surely. Toward this end, consider the
Lyapunov function, defined for x = (x1 , . . . , xN ) N ,

1 N 2 1
f (x) = f (x1 , . . . , xN ) = xi N 2 log |xi x j | .
N i=1 i = j

Using the fact that


log |x y| log(|x| + 1) + log(|y| + 1) and x2 2 log(|x| + 1) 4 ,
we find that, for all i = j,
1
f (x1 , . . . , xN ) 4 , log |xi x j | f (x1 , . . . , xN ) + 4 . (4.3.6)
N2
For any M > 0 and x = (x1 , . . . , xN ) N , set
2 (4+M)
R = R(N, M) = eN and TM = inf{t 0 : f ( N,R (t)) M} . (4.3.7)
Since f is C (N , R) on sets where it is uniformly bounded (note here that f is
bounded below uniformly), we have that {TM > T } FT for all T 0, and hence
TM is a stopping time. Moreover, due to (4.3.6), on the event {TM > T }, we get
that, for all t T ,
|iN,R (t) jN,R (t)| R1 ,
and thus on the event {T TM }, ( N,R (t),t T ) provides an adapted strong
solution to (4.3.3). For i = 1, . . . , N and j = 1, 2, define the functions ui, j : N R
by
1 1
ui,1 (x) = , ui,2 (x) = .
k:k =i xi xk k:k =i (xi xk )2

Itos Lemma (see Theorem H.9) gives


 
2 N 1
d f ( (t)) =
N,R
i (t) N ui,1 ( (t)) ui,1 ( N,R (t))dt
N 2 i=1
N,R N,R

 
2 N 1
+ 1 + N 2 ui,2 ( (t)) dt + dMN (t) , (4.3.8)
N i=1
N,R
252 4. S OME GENERALITIES

with M N (t) the local martingale


3
 
N
22 1 1
dM (t) = 1 3 i (t)
N N,R
N,R (t) N,R (t) dWi (t) .
2 N 2 i=1 N k:k =i i k

Observing that, for all x = (x1 , . . . , xN ) N ,


N   1 1
ui,1 (x)2 ui,2 (x) = xi xk xi xl
i=1 k =i,l =i
k =l
 
1 1 1 1 1
= xl xk

xi xl xi xk
= 2 xi xk xi xl
,
k =i,l =i k =i,l =i
k =l k =l

we conclude that, for x N ,


N  
ui,1 (x)2 ui,2 (x) = 0 .
i=1

Similarly,
N
N(N 1)
ui,1 (x)xi = 2
.
i=1

Substituting the last two equalities into (4.3.8), we get


2 1 2(1 )
N2
d f ( N,R (t)) = (1 + )dt + ui,2 ( N,R (t))dt + dM N (t) .
N i

Thus, for all 1, for all M < , since (M N (t TM ),t 0) is a martingale with
zero expectation,

E[ f ( N,R (t TM ))] 3E[t TM ] + f ( N,R (0)).

Therefore, recalling (4.3.6),


 
(M + 4)P (TM t) = E[ f ( N,R (t TM )) + 4 1tTM ]
E[ f ( N,R (t TM )) + 4] 3E[t TM ] + 4 + f ( N,R (0))
3t + 4 + f ( N,R (0)) ,

which proves that


3t + 4 + f ( N,R (0))
P (TM t) .
M+4
Hence, the BorelCantelli Lemma implies that, for all t R+ ,

P (M N : TM2 t) = 1,
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 253

and in particular, TM2 goes to infinity almost surely. As a consequence, recalling


that M = 4 + (log R)/N 2 , see (4.3.7), and setting N (t) = N,R (t) for t TM2 ,
gives, due to (4.3.5), a strong solution to (4.3.3), which moreover satisfies N (t)
N for all t. The strong (and weak) uniqueness of the solutions to (4.3.4), together
with N,R (t) = N (t) on {T TM } and the fact that TM almost surely, imply
the strong (and weak) uniqueness of the solutions to (4.3.3).

Proof of Theorem 4.3.2 As a preliminary observation, note that the law of H N, is


invariant under the action of the orthogonal (when = 1) or unitary (when = 2)
groups, that is, (OH N, (t)O )t0 has the same distribution as (H N, (t))t0 if O
belongs to the orthogonal (if = 1) or unitary (if = 2) groups. Therefore, the
law of ( N (t))t0 does not depend on the basis of eigenvectors of X N, (0) and we
shall assume in the sequel, without loss of generality, that X N, (0) is diagonal and
real.

The proof we present goes backward by proposing a way to construct the ma-
trix X N, (t) from the solution of (4.3.3) and a Brownian motion on the orthogonal
(resp. unitary) group. Its advantage with respect to a forward proof is that we
do not need to care about justifying that certain quantities defined from X N, are
semi-martingales to insure that Itos calculus applies.

We first prove the theorem in the case N (0) N . We begin by enlarging the
probability space by adding to the independent Brownian motions (Wi , 1 i N)
an independent collection of independent Brownian motions (wi j , 1 i < j
1
N), which are complex if = 2 (that is, wi j = 2 2 (w1i j + 1w2i j ) with two
independent real Brownian motions w1i j , w2i j ) and real if = 1. We continue to use
Ft to denote the enlarged sigma-algebra (wi j (s), 1 i < j N,Wi (s), 1 i
N, s t).

Fix M > 0 and R as in (4.3.7). We consider the strong solution of (4.3.3),


constructed with the Brownian motions (Wi , 1 i N), till the stopping time TM
defined in (4.3.7). We set, for i < j,

1 1
dRNij (t) = dwi j (t) , RNij (0) = 0. (4.3.9)
N i (t) jN (t)
N

We let RN (t) be the skew-Hermitian matrix (i.e. RN (t) = RN (t) ) with such en-
tries above the diagonal and null entries on the diagonal. Note that since N (t)
N for all t, the matrix-valued process RN (t) is well defined, and its entries are
semi-martingales.
254 4. S OME GENERALITIES

Recalling the notation for the bracket of semi-martingales, see (H.1), for A, B
two semi-martingales with values in MN , we denote by A, Bt the matrix
N
(A, Bt )i j = (AB)i j t = Aik , Bk j t , 1 i, j N.
k=1

Observe that for all t 0, A, Bt = B , A t . We set ON to be the (strong) solution
of
1
dON (t) = ON (t)dRN (t) ON (t)d(RN ) , RN t , ON (0) = IN . (4.3.10)
2
This solution exists and is unique since it is a linear equation in ON and RN is a
well defined semi-martingale. In fact, as the next lemma shows, ON (t) describes
a process in the space of unitary matrices (orthogonal if = 1).

Lemma 4.3.4 The solution of (4.3.10) satisfies

ON (t)ON (t) = ON (t) ON (t) = I for all t 0 .

Further, let D( N (t)) denote a diagonal matrix with D( N (t))ii = N (t)i and set
Y N (t) = ON (t)D( N (t))ON (t) . Then

P(t 0, Y N (t) HN ) = 1 ,

and the entries of the process (Y N (t))t0 are continuous martingales with respect
to the filtration F , with bracket

YiNj ,YklN t = N 1 (1i j=kl (2 ) + 1i j=lk )t.

Proof We begin by showing that J N (t) := ON (t) ON (t) equals the identity IN for
all time t. Toward this end, we write a differential equation for K N (t) := J N (t)IN
based on the fact that the process (ON (t))t0 is the strong solution of (4.3.10). We
have
 .  . 
 
d(ON ) , (ON )t i j = d d(RN ) (s)(ON ) (s), ON (s)dRN (s)t
0 0 ij
N  .  .
= d( 0
(dRN ) (s)(ON ) (s))ik , (
0
ON (s)dRN (s))k j t
k=1
N N
= ONkm (t)ONkn (t)dRNmi , RNnj t
m,n=1 k=1
N
= N
Jmn (t)dRNmi , RNnj t , (4.3.11)
m,n=1
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 255

where here and in the sequel we use 0 to denote an indefinite integral viewed as
a process. Therefore, setting A.B = AB + BA, we obtain
1
dK N (t) = J N (t)[dRN (t) d(RN ) , RN t ]
2
1
+[d(R ) (t) d(RN ) , RN t ]J N (t) + d(ON ) , ON t
N
2
1
= K (t).(dR (t) d(RN ) , RN t ) + drN (t) ,
N N
2
with drN (t)i j = Nm,n=1 Kmn
N (t)dRN , RN  . For any deterministic M > 0 and
mi n j t
0 S T , set, with TM given by (4.3.7),

(M, S, T ) = max sup |KiNj (t TM )|2 ,


1i, jN tS

and note that E (M, S, T ) < for all M, S, T , and that it is nondecreasing in
S. From the BurkholderDavisGundy inequality (Theorem H.8), the equality
KN (0) = 0, and the fact that (RN (t TM ))tT has a uniformly (in T ) bounded mar-
tingale bracket, we deduce that there exists a constant C(M) < (independent of
S, T ) such that for all S T ,
 S
E (M, S, T ) C(M)E (M,t, T )dt .
0

It follows that E (M, T, T ) vanishes for all T, M. Letting M going to infinity we


conclude that K N (t) = 0 almost surely, that is, ON (t) ON (t) = IN .
We now show that Y N has martingales entries and compute their martingale
bracket. By construction,

dY N (t) = dON (t)D( N (t))ON (t) + ON (t)D( N (t))dON (t)


+ON (t)dD( N (t))ON (t) + dON D( N )(ON ) t (4.3.12)

where for all i, j {1, , N}, we have denoted


 
dON D( N )(ON ) t i j
N  
1 N 1
= Oik (t)dkN , ONjk t + kN (t)dONik , ONjk t + ONjk (t)dkN , ONik t
k=1 2 2
N
= kN (t)dONik , ONjk t ,
k=1

and we used in the last equality the independence of (wi j , 1 i < j N) and
(Wi , 1 i N) to assert that the martingale bracket of N and ON vanishes. Set-
256 4. S OME GENERALITIES

ting
dZ N (t) := ON (t) dY N (t)ON (t) , (4.3.13)
we obtain from the left multiplication by ON (t) and right multiplication by ON (t)
of (4.3.12) that
dZ N (t) = (ON ) (t)dON (t)D( N (t)) + D( N (t))dON (t) ON (t)
+dD( N (t)) + ON (t) dON D( N )(ON ) t ON (t) . (4.3.14)
We next compute the last term in the right side of (4.3.14). For all i, j {1, . . . , N}2 ,
we have
  N
dON D( N )(ON ) t i j = kN (t)dONik , ONjk t
k=1
N
= kN (t)ONil (t)ONjm (t)dRNlk , RNmk t .
k,l,m=1

But, by the definition (4.3.9) of RN ,


1
dRNlk , RNmk t = 1m=l 1m =k dt , (4.3.15)
N(kN (t) mN (t))2
and so we obtain
  kN (t)
dON D( N )(ON ) t i j = N( N (t)
ON (t)ONjl (t)dt .
N (t))2 il
1k =lN k l

Hence, for all i, j {1, . . . , N}2 ,


kN (t)
[ON (t) dON D( N )(ON ) t ON (t)]i j = 1i= j N(iN (t) kN (t))2
dt .
1kN
k =i

Similarly, recall that


ON (t) dON (t) = dRN (t) 21 d(RN ) , RN t ,
so that from (4.3.15) we get, for all i, j {1, , N}2 ,
1
[ON (t) dON (t)]i j = dRNij (t) 21 1i= j dt .
1kN N(i (t) k (t))
N N 2
k =i

Therefore, identifying the terms on the diagonal in (4.3.14) and recalling that RN
vanishes on the diagonal, we find, substituting in (4.3.13), that
7
N 2
dZii (t) = dWi (t).
N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 257

Away from the diagonal, for i = j, we get


1
dZiNj (t) = [dRN (t)D( N (t)) + D( N (t))dRN (t) ]i j = dwi j (t) .
N
Hence, (Z N (t))t0 has the law of a symmetric (resp. Hermitian) Brownian motion.
Thus, since (ON (t))t0 is adapted,
 t
Y N (t) = ON (s)dZ N (s)ON (s)
0
is a continuous matrix-valued martingale whose quadratic variation
dYiNj ,YiN j t is given by
N


ONik (t)ONjl (t)ONi k (t)ONj l (t)dZklN , ZkN l t
k,l,k ,l =1
N
1
= ON (t)ONjl (t)ONi k (t)ONj l (t)(1kl=l k + 1 =1 1kl=k l )dt
N k,l,k ,l =1 ik
1
= (1 + 1 =1 1i j=i j )dt .
N i j= j i

We return to the proof of Theorem 4.3.2. Applying Levys Theorem (Theo-


rem H.2) to the entries of Y N , we conclude that (Y N (t) Y N (0))t0 is a symmet-
ric (resp. Hermitian) Brownian motion, and so (Y N (t))t0 has the same law as
(X N, (t))t0 since X N (0) = Y N (0), which completes the proof of the theorem in
the case Y N (0) N .
Consider next the case where X N, (0) N \ N . Note that the condition
N (t) N means that the discriminant of the characteristic polynomial of X N, (t)
vanishes. The latter discriminant is a polynomial in the entries of X N, (t), that
does not vanish identically. By the same argument as in the proof of Lemma
2.5.5, it follows that N (t) N , almost surely. Hence, for any > 0, the law of
(X N, (t))t coincides with the strong solution of (4.3.3) initialized at X N, ( ).
By Lemma 2.1.19, it holds that for all s,t R,
N N
1 N, N,
(iN (t) iN (s))2 N (Hi j (t) Hi j (s))2 ,
i=1 i, j=1

and thus the a.s. continuity of the Brownian motions paths results in the a.s.
continuity of t N (t) for any given N. Letting 0 completes the proof of the
theorem.

Our next goal is to extend the statement of Lemma 4.3.3 to initial conditions
belonging to N . Namely, we have the following.
258 4. S OME GENERALITIES

Proposition 4.3.5 Let N (0) = (1N (0), . . . , NN (0)) N . For any 1, there
exists a unique strong solution ( N (t))t0 C(R+ , N ) to the stochastic differen-
tial system (4.3.3) with initial condition N (0). Further, for any t > 0, N (t) N
and N (t) is a continuous function of N (0).

When = 1, 2, 4, Proposition 4.3.5 can be proved by using Theorem 4.3.2. In-


stead, we provide a proof valid for all 1, that does not use the random matrices
representation of the solutions. As a preliminary step, we present a comparison
between strong solutions of (4.3.3) with initial condition in N .

Lemma 4.3.6 Let ( N (t))t0 and ( N (t))t0 be two strong solutions of (4.3.3)
starting, respectively, from N (0) N and N (0) N . Assume that iN (0) <
iN (0) for all i. Then,

P(for all t 0 and i = 1, . . . , N, iN (t) < iN (t)) = 1 . (4.3.16)

Proof of Lemma 4.3.6 We note first that d(i iN (t) i iN (t)) = 0. In particular,

(iN (t) iN (t)) = (iN (0) iN (0)) < 0 . (4.3.17)


i i

Next, for all i {1, . . . , N}, we have from (4.3.3) and the fact that N (t) N ,
N (t) N for all t that

1 (iN iN Nj + jN )(t)
d(iN iN )(t) =
N (iN (t) Nj (t))(iN (t) jN (t))
dt .
j: j =i

Thus, iN iN is differentiable for all i and, by continuity, negative for small


enough times. Let T be the first time at which (iN iN )(t) vanishes for some i
{1, . . . , N}, and assume T < . Since (iN (t) Nj (t))(iN (t) jN (t)) is strictly
positive for all time, we deduce that t (iN iN )|t=T is negative (note that it is
impossible to have ( jN Nj )(T ) = 0 for all j because of (4.3.17)). This provides
a contradiction since (iN iN )(t) was strictly negative for t < T .

We can now prove Proposition 4.3.5.

Proof of Proposition 4.3.5 Set N (0) = (1N (0), . . . , NN (0)) N and put for n
Z, iN,n (0) = iN (0) + ni . We have N,n (0) N and, further, if n > 0, iN,n (0) <
iN,n1 (0) < iN,n+1 (0) < iN,n (0). Hence, by Lemma 4.3.6, the corresponding
solutions to (4.3.3) satisfy almost surely and for all t > 0

iN,n (t) < iN,n1 (t) < iN,n+1 (t) < iN,n (t) .
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 259

Since
N N
( N,n (t) N,n (t)) = ( N,n (0) N,n (0)) (4.3.18)
i=1 i=1

goes to zero as n goes to infinity, we conclude that the sequences N,n and N,n
converge uniformly to a limit, which we denote by N . By construction, N
C(R+ , N ). Moreover, if we take any other sequence N,p (0) N converging
to N (0), the solution N,p to (4.3.3) also converges to N (as can be seen by
comparing N,p (0) with some N,n (0), N,n (0) for p large enough).
We next show that N is a solution of (4.3.3). Toward that end it is enough
to show that for all t > 0, N (t) N , since then if we start at any positive time
s we see that the solution of (4.3.3) starting from N (s) can be bounded above
and below by N,n and N,n for all large enough n, so that this solution must
coincide with the limit ( N (t),t s). So let us assume that there is t > 0 so that
N (s) N \N for all s t and obtain a contradiction. We let I be the largest
i {2, . . . , N} so that kN (s) < k+1
N (s) for k I but N (s) = N (s) for s t.
I1 I
Then, we find a constant C independent of n and n going to zero with n so that,
for n large enough,
|kN,n (s) k+1
N,n
(s)| C k I, |IN,n (s) I1
N,n
(s)| n .
Since N,n solves (4.3.3), we deduce that for s t
N,n N,n 2 I1 1 1
I1 (s) I1 (0) + W + (n C(N I))s.
N s N
N,n
This implies that I1 (s) goes to infinity as n goes to infinity, a.s. To obtain a
contradiction, we show that with CN (n,t) := N1 Ni=1 (iN,n (t))2 , we have

sup sup CN (n,t) < , a.s. (4.3.19)
n s[0,t]

With (4.3.19), we conclude that for all t > 0, N (t) N , and in particular it is
the claimed strong solution.

To see (4.3.19), note that since iN,n (s) iN,n (s) for any n n and all s by
Lemma 4.3.6, we have that
1 N N,n
|CN (n, s) CN (n , s)| = (i (s) iN,n (s))|(iN,n (s) + iN,n (s))|
N i=1
N N
1
(iN,n (s) iN,n (s)) N (|(iN,n (s)| + |iN,n (s)|)
i=1 i=1
  N

( CN (n, s) + CN (n , s)) ( N,n (0) N,n (0)) ,
i=1
260 4. S OME GENERALITIES

where (4.3.18) and the CauchySchwarz inequality were used in the last inequal-
ity. It follows that
  N

CN (n, s) CN (n , s) + ( N,n (0) N,n (0)) ,
i=1

and thus
  N

sup sup CN (n, s) sup CN (n , s) + ( N,n (0) N,n (0)) .
nn s[0,t] s[0,t] i=1

Thus, to see (4.3.19), it is enough to bound almost surely sups[0,t] CN (n,t) for a
fixed n. From Itos Lemma (see Lemma 4.3.12 below for a generalization of this
particular computation),

2 2 N t N,n
CN (n,t) = DN (n,t) + 
N N i=1 0
i (s)dWi (s)

with DN (n,t) := CN (n, 0) + ( 2 + N1


N )t. Define the stopping time SR = inf{s :
CN (n, s) R}. Then, by the BurkholderDavisGundy inequality (Theorem H.8)
we deduce that

E[ sup CN (n, s SR )2 ]
s[0,t]
 t
2[DN (n,t)]2 + 2N 2 E[ sup CN (n, s SR )]du
0 s[0,u]
 t
2[DN (n,t)]2 + N 2 t + N 2
E[ sup CN (n, s SR )2 ]du ,
0 s[0,u]

where the constant does not depend on R. Gronwalls Lemma then implies, with
EN (n,t) := 2[DN (n,t)]2 + N 2 t, that
 t
2
E[ sup CN (n, s SR )2 ] EN (n,t) + e2N 1 (st) EN (n, s)ds .
s[0,t] 0

We can finally let R go to infinity and conclude that E[sups[0,t] CN (n, s)] is finite
 
and so sups[0,t] CN (n, s), and therefore supn sups[0,t] CN (n, s), are finite al-
most surely, completing the proof of (4.3.19).

 
N,
Exercise 4.3.7 Let H N,4 = Xi j be 2N2N complex Gaussian Wigner matrices
defined as the self-adjoint random matrices with entries

N, 4i=1 gikl ei N,4 1
Hkl = , 1 k < l N, Xkk = gkk e1 , 1 k N ,
4N 2N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 261

where (ei )1i are the Pauli matrices


       
1 0 0 1 0 i i 0
e14 = , e24 = , e34 = , e44 = .
0 1 1 0 i 0 0 i

Show that with H N,4 as above, and X N,4 (0) a Hermitian matrix with eigenval-
ues (1N (0), . . . , 2N
N (0)) , the eigenvalues ( N (t), . . . , N (t)) of X N,4 (0) +
N 1 2N
N,4
H (t) satisfy the stochastic differential system

1 1 1
d iN (t) = dWi (t) +
2N N N (t) N (t) dt , i = 1, . . . , 2N . (4.3.20)
j =i i j

Exercise 4.3.8 [Bru91] Let V (t) be an NM matrix whose entries are independent
complex Brownian motions and let V (0) be an NM matrix with complex entries.
Let N (0) = ( N (0), . . . , NN (0)) N be the eigenvalues of V (0)V (0) . Show
that the law of the eigenvalues of X(t) = V (t)V (t) is the weak solution to
7
iN (t) M N + iN
d iN (t) = 2 dWi (t) + 2( + kN )dt ,
N N k =i i kN

with initial condition N (0).

Exercise 4.3.9 Let X N be the matrix-valued process solution of the stochastic


N,
differential system dXtN = dHt XtN dt, with D(X N (0)) N .
(a) Show that the law of the eigenvalues of XtN is a weak solution of

2 1 1
d iN (t) =  dWi (t) + N dt iN (t)dt . (4.3.21)
N N j =i i (t) jN (t)

(b) Show that if X0N = H N, (1), then the law of XtN is the same law for all t 0.
( )
Conclude that the law PN of the eigenvalues of Gaussian Wigner matrices is sta-
tionary for the process (4.3.21).
( )
(c) Deduce that PN is absolutely continuous with respect to the Lebesgue mea-
sure, with density
N
|xi x j | e xi /4 ,
2
1x1 xN
1i< jN i=1

as proved in Theorem 2.5.2. Hint: obtain a partial differential equation for the
invariant measure of (4.3.21) and solve it.
262 4. S OME GENERALITIES

4.3.2 A dynamical version of Wigners Theorem

In this subsection, we derive systems of (deterministic) differential equations sat-


isfied by the limits of expectation of LN (t), g, for nice test functions g and

LN (t) = N 1 N (t) , (4.3.22)


i

where (iN (t))t0 is a solution of (4.3.3) for 1 (see Proposition 4.3.10). Spe-
cializing to = 1 or = 2, we will then deduce in Corollary 4.3.11 a dynamical
proof of Wigners Theorem, Theorem 2.1.1, which, while restricted to Gaussian
entries, generalizes the latter theorem in the sense that it allows one to consider
the sum of a Wigner matrix with an arbitrary, N-dependent Hermitian matrix,
provided the latter has a converging empirical distribution. The limit law is then
described as the law at time one of the solution to a complex Burgers equation, a
definition which introduces already the concept of free convolution (with respect
to the semicircle law) that we shall develop in Section 5.3.3. In Exercise 4.3.18,
Wigners Theorem is recovered from its dynamical version.
We recall that, for T > 0, we denote by C([0, T ], M1 (R)) the space of contin-
uous processes from [0, T ] into M1 (R) (the space of probability measures on R,
equipped with its weak topology). We now prove the convergence of the empirical
measure LN (), viewed as an element of C([0, T ], M1 (R)).

Proposition 4.3.10 Let 1 and let N (0) = (1N (0), . . . , NN (0)) N , be a


sequence of real vectors so that N (0) N ,

1 N
C0 := sup log(iN (0)2 + 1) < ,
N0 N i=1
(4.3.23)

and the empirical measure LN (0) = N1 Ni=1 N (0) converges weakly as N goes to
k
infinity towards a M1 (R).
Let N (t) = (1N (t), . . . , NN (t))t0 be the solution of (4.3.3) with initial con-
dition N (0), and set LN (t) as in (4.3.22). Then, for any fixed time T < ,
(LN (t))t[0,T ] converges almost surely in C([0, T ], M1 (R)). Its limit is the unique
measure-valued process (t )t[0,T ] so that 0 = and the function

Gt (z) = (z x)1 d t (x) (4.3.24)

satisfies the equation


 t
Gt (z) = G0 (z) Gs (z)z Gs (z)ds (4.3.25)
0

for z C\R .
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 263

An immediate consequence of Proposition 4.3.10 is the following.

Corollary 4.3.11 For = 1, 2, let (X N, (0))NN be a sequence of real diago-


nal matrices, with eigenvalues (1N (0), . . . , NN (0)) satisfying the assumptions of
Proposition 4.3.10. For t 0, let iN (t) = (1N (t), . . . , NN (t) N denote the
eigenvalues of X N, (t) = X N, (0) + H N, (t), and let LN (t) be as in (4.3.22). Then
the measure-valued process (LN (t))t0 converges almost surely towards (t )t0
in C([0, T ], M1 (R)).

Proof of Proposition 4.3.10 We begin by showing that the sequence (LN (t))t[0,T ]
is almost surely pre-compact in C([0, T ], M1 (R)) and then show that it has a unique
limit point characterized by (4.3.25). The key step of our approach is the follow-
ing direct application of Itos Lemma, Theorem H.9, to the stochastic differential
system (4.3.3), whose elementary proof we omit.

Lemma 4.3.12 Under the assumptions of Proposition 4.3.10, for all T > 0, all
f C2 ([0, T ]R, R) and all t [0, T ],
 t
 f (t, ), LN (t) =  f (0, ), LN (0) + s f (s, ), LN (s)ds (4.3.26)
0
 
1 t x f (s, x) y f (s, y)
+ dLN (s)(x)dLN (s)(y)ds
2 0 xy
 t
2 1
+ ( 1)  2 f (s, ), LN (s)ds + M Nf (t) ,
2N 0 x

where M Nf is the martingale given for t T by


N  t
2
M Nf (t) =  3 x f (s, iN (s))dWsi .
N 2 i=1 0

We note that the bracket of the martingale M Nf appearing in Lemma 4.3.12 is


 t 2t sups[0,t] x f (., s)2
2
M Nf t = (x f (s, x))2 , LN (s)ds .
N2 0 N2
We also note that the term multiplying (2/ 1) in (4.3.26) is coming from both
the quadratic variation term in Itos Lemma and the finite variation term where
the terms on the diagonal x = y were added. That it vanishes when = 2 is a
curious coincidence, and emphasizes once more that the Hermitian case ( = 2)
is in many ways the simplest case.
We return now to the proof of Proposition 4.3.10, and begin by showing that the
264 4. S OME GENERALITIES

sequence (LN (t))t[0,T ] is a pre-compact family in C([0, T ], M1 (R)) for all T < .
Toward this end, we first describe a family of compact sets of C([0, T ], M1 (R)).

Lemma 4.3.13 Let K be a an arbitrary compact subset of M1 (R), let ( fi )i0 be a


sequence of bounded continuous functions dense in C0 (R), and let Ci be compact
subsets of C([0, T ], R). Then the sets
.
K := {t [0, T ], t K} {tt ( fi ) Ci } (4.3.27)
i0

are compact subsets of C([0, T ], M1 (R)).

Proof of Lemma 4.3.13 The space C([0, T ], M1 (R)) being Polish, it is enough to
prove that the set K is sequentially compact and closed. Toward this end, let
( n )n0 be a sequence in K . Then, for all i N, the functions ttn ( fi ) be-
long to the compact sets Ci and hence we can find a subsequence i (n) n
(n)
such that the sequence of bounded continuous functions tt i ( fi ) converges
in C[0, T ]. By a diagonalization procedure, we can find an i independent subse-
(n)
quence (n) n such that for all i N, the functions tt ( fi ) converge
towards some function tt ( fi ) C[0, T ]. Because ( fi )i0 is convergence deter-
mining in K M1 (R), it follows that one may extract a further subsequence, still
denoted (n), such that for a fixed dense countable subset of [0, T ], the limit t
belongs to M1 . The continuity of tt ( fi ) then shows that t M1 (R) for all t,
which completes the proof that ( n )n0 is sequentially compact. Since K is an
intersection of closed sets, it is closed. Thus, K is compact, as claimed.

We next prove the pre-compactness of the sequence (LN (t),t [0, T ]).

Lemma 4.3.14 Under the assumptions of Proposition 4.3.10, fix T R+ . Then


the sequence (LN (t),t [0, T ]) is almost surely pre-compact in C([0, T ], M1 (R)).

Proof We begin with a couple of auxiliary estimates. Note that from Lemma
4.3.12, for any function f that is twice continuously differentiable,

f (x) f (y)
dLN (s)(x)dLN (s)(y)
xy
  1
= f ( x + (1 )y)d dLN (s)(x)dLN (s)(y) . (4.3.28)
0

Apply Lemma 4.3.12 with the function f (x) = log(1 + x2 ), which is twice contin-
uously differentiable with second derivative uniformly bounded by 2, to deduce
that
1
sup | f , LN (t)| | f , LN (0)| + T (1 + ) + sup |M Nf (t)| (4.3.29)
tT N tT
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 265

with M Nf a martingale with bracket bounded by 2( N 2 )1 since | f | 1. By the


BurkholderDavisGundy inequality (Theorem H.8) and Chebyshevs inequality,
we get that, for a universal constant 1 ,
21
P(sup |M Nf (t)| ) , (4.3.30)
tT N2
2

which, together with (4.3.29), proves that there exists a = a(T ) < so that, for
M > T +C0 + 1,
 
a
P sup log(x2 + 1), LN (t) M . (4.3.31)
t[0,T ] (M T C 0 1) N
2 2

We next need an estimate on the Holder norm of the function t f , LN (t),


for any twice boundedly differentiable function f on R, with first and second
derivatives bounded by 1. We claim that there exists a constant a = a(T ) so that,
for any (0, 1) and M > 2,

1 a 1/2
P sup | f , LN (t)  f , LN (s)| M 8 4 4 . (4.3.32)
t,s[0,T ] M N
|ts|

Indeed, apply Lemma 4.3.12 with f (x,t) = f (x). Using (4.3.28), one deduces that
for all t s,
| f , LN (t)  f , LN (s)| || f || |s t| + |M Nf (t) M Nf (s)| , (4.3.33)

where M Nf (t) is a martingale with bracket 2 1 N 2 0t ( f )2 , LN (u)du. Now,
cutting [0, T ] to intervals of length we get, with J := [T 1 ],

N
P sup M f (t) M Nf (s) (M 1) 1/8
|ts|
t,sT
 
J+1 N
P sup M (t) M N (k ) (M 1) 1/8 /3
f f
k=1 k t(k+1)
 
J+1
34 N
1/2 (M 1)4 E sup M (t) M N (k ) 4
f f
k=1 k t(k+1)
1
4 34 2 2 a 2
(J + 1)|| f ||2 =: 2  f 2 ,
N (M 1)
2 4 1/2 4 N (M 1)4
where again we used in the second inequality Chebyshevs inequality, and in the
last the BurkholderDavisGundy inequality (Theorem H.8) with m = 2. Com-
bining this inequality with (4.3.33) completes the proof of (4.3.32).
266 4. S OME GENERALITIES

We can now conclude the proof of the lemma. Setting



KM = { M1 (R) : log(1 + x2 )d (x) M} ,

BorelCantellis Lemma and (4.3.31) show that


 
! .
P { t [0, T ], LN (t) KM } = 1. (4.3.34)
N0 0 NN0

Next, recall that by the ArzelaAscoli Theorem, sets of the form


.
C= {g C([0, T ], R) : sup |g(t) g(s)| n , sup |g(t)| M} ,
n t,s[0,T ] t[0,T ]
|ts|n

where {n , n 0} and {n , n 0} are sequences of positive real numbers going


to zero as n goes to infinity, are compact. For f C2 (R) with derivatives bounded
by 1, and > 0, consider the subset of C([0, T ], M1 (R)) defined by

. 1
CT ( f , ) := { C([0, T ], M1 (R)) : sup |t ( f ) s ( f )| } .
n=1 |ts|n4 n

Then, by (4.3.32),
a 4
P (LN CT ( f , )c ) . (4.3.35)
N4

Choose a countable family fk of twice continuously differentiable functions


1
dense in C0 (R), and set k = 1/k( fk  +  fk  +  fk  ) 2 < 21 , with
.
K = KM CT ( fk , k ) C([0, T ], M1 (R)) . (4.3.36)
k1

Combining (4.3.34) and (4.3.35), we get from the BorelCantelli Lemma that
 
! .
P {LN K } = 1.
N0 0 NN0

Since K is compact by Lemma 4.3.13, the claim follows.


We return to the proof of Proposition 4.3.10. To characterize the limit points


of LN , we again use Lemma 4.3.12 with a general twice continuously differen-
tiable function f with bounded derivatives. Exactly as in the derivation leading to
(4.3.30), the BoreliCantelli Lemma and the BurkholderDavisGundy inequality
(Theorem H.8) yield the almost sure convergence of M Nf towards zero, uniformly
on compact time intervals. Therefore, any limit point (t ,t [0, T ]) of LN satisfies
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 267

the equation
   t
f (t, x)d t (x) = f (0, x)d 0 (x) + s f (s, x)d s (x)ds
0
 t
1 x f (s, x) x f (s, y)
+ d s (x)d s (y)ds . (4.3.37)
2 0 xy
Taking f (x) = (z x)1 for some z C\R, we deduce that the function Gt (z) =

(z x)1 d t (x) satisfies (4.3.24), (4.3.25). Note also that since the limit t is a
probability measure on the real line, Gt (z) is analytic in z for z C+ .
To conclude the proof of Proposition 4.3.10, we show below in Lemma 4.3.15
that (4.3.24), (4.3.25) possess a unique solution analytic on z C+ := {z C :
(z) > 0}. Since we know a priori that the support of any limit point t lives in
R for all t, this uniqueness implies the uniqueness of the Stieltjes transform of t
for all t and hence, by Theorem 2.4.3, the uniqueness of t for all t, completing
the proof of Proposition 4.3.10.

Lemma 4.3.15 Let , = {z C+ : z |z|, |z| } and for t 0, set


t := {z C+ : z + tG0 (z) C+ }. For all t 0, there exist positive constants
t , t , t , t such that t ,t t and the function z t ,t z + tG0 (z) t ,t
is invertible with inverse Ht : t ,t t ,t . Any solution of (4.3.24), (4.3.25) is
the unique analytic function on C+ such that for all t and all z t ,t ,
Gt (z) = G0 (Ht (z)) .

Proof We first note that since |G0 (z)| 1/|z|, (z + tG0 (z)) z t/z is
positive for t < (z)2 and z > 0. Thus, t ,t t for t < (t t )2 /(1 + t2 ).
Moreover, |G0 (z)| 1/2|z| from which we see that, for all t 0, the image of
t ,t by z + tG0 (z) is contained in some t ,t provided t is large enough. Note
that we can choose the t ,t and t ,t decreasing in time.
We next use the method of characteristics. Fix G. a solution of (4.3.24), (4.3.25).
We associate with z C+ the solution {zt ,t 0} of the equation
t zt = Gt (zt ) , z0 = z. (4.3.38)
We can construct a solution z. to this equation up to time (z)2 /4 with zt z/2
as follows. We put for > 0,

z x
Gt (z) := d t (x), t zt = Gt (zt ) , z0 = z.
|z x|2 +
z. exists and is unique since Gt is uniformly Lipschitz. Moreover,

t (zt ) 1 1
= d t (x) [ , 0],
(zt ) |zt x| +
2 |(zt )|2
268 4. S OME GENERALITIES

implies that |(zt )|2 [|(z)|2 2t, |(z)|2 ] and



(zt ) x 1 1
t (zt ) = d t (x) [  , ]
|zt x|2 +
|(zt )| +
2 |(zt )|2 +
shows that (zt ) stays uniformly bounded, independently of , up to time (z)2 /4
as well as its time derivative. Hence, {zt ,t (z)2 /4} is tight by ArzelaAscolis
Theorem. Any limit point is a solution of the original equation and such that
zt z/2 > 0. It is unique since Gt is uniformly Lipschitz on this domain.
Now, t Gt (zt ) = 0 implies that for t (z)2 /4,
zt = tG0 (z) + z, Gt (z + tG0 (z)) = G0 (z) .
By the implicit function theorem, z + tG0 (z) is invertible from t ,t into t ,t
since 1 + tG 0 (z) = 0 (note that G 0 (z) = 0) on t ,t . Its inverse Ht is analytic
from t ,t into t ,t and satisfies
Gt (z) = G0 (Ht (z)).

With a view toward later applications in Subsection 4.3.3 to the proof of central
limit theorems, we extend the previous results to polynomial test functions.

Lemma 4.3.16 Let 1. Assume that


C = sup max |iN (0)| < .
NN 1iN

With the same notation and assumptions as in Proposition 4.3.10, for any T < ,
for any polynomial function q, the process (q, LN (t))t[0,T ] converges almost
surely and in all L p , towards the process (t (q))t[0,T ] , that is,
lim sup sup |q, LN (t) q, t | = 0 a.s.
N t[0,T ]

and for all p N,


lim sup E[ sup |q, LN (t) q, t | p ] = 0 .
N t[0,T ]

A key ingredient in the proof is the following control of the moments of N (t) :=
max1iN |iN (t)| = max(NN (t), 1N (t)).

Lemma 4.3.17 Let 1 and N (0) N . Then there exist finite constants
= ( ) > 0,C = C( ), and for all t 0 a random variable N (t) with law
independent of t, such that
P(N (t) x +C) e Nx
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 269

and, further, the unique strong solution of (4.3.3) satisfies, for all t 0,

N (t) N (0) + t N (t) . (4.3.39)

We note that for = 1, 2, 4, this result can be deduced from the study of the
maximal eigenvalue of X N, (0) + H N, (t), since the spectral radius of H N, (t)

has the same law as the spectral radius of tH N, (1), that can be controlled as in
Section 2.1.6. The proof we give below is based on stochastic analysis, and works
for all 1. It is based on the comparison between strong solutions of (4.3.3)
presented in Lemma 4.3.6.
Proof of Lemma 4.3.17 Our approach is to construct a stationary process N (t) =
(1N (t), . . . , NN (t)) N , t 0, with marginal distribution P(N ) := PNx2 /4, as in
(2.6.1), such that, with N (t) = max(NN (t), 1N (t)), the bound (4.3.39) holds.
We first construct this process (roughly corresponding to the process of eigenval-

ues of H N, (t)/ t if = 1, 2, 4) and then prove (4.3.39) by comparing solutions
to (4.3.3) started from different initial conditions.
Fix > 0. Consider, for t , the stochastic differential system
7
2 1 1 1
N
dui (t) =
Nt
dWi (t) +
Nt j =i ui (t) u j (t)
N N
dt uNi (t)dt.
2t
(4.3.40)

( )
Let PN denote the rescaled version of PN from (2.5.1), that is, the law on N with
density proportional to

|i j | eN i /4 .
2

i< j i


Because PN (N ) = 1, we may take uN ( ) distributed according to PN , and the
proof of Lemma 4.3.3 carries over to yield the strong existence and uniqueness of
solutions to (4.3.40) initialized from such (random) initial conditions belonging to
N .

Our next goal is to prove that PN is a stationary distribution for the system
(4.3.40) with this initial distribution, independently of . Toward this end, note
that by Itos calculus (Lemma 4.3.12), one finds that for any twice continuously
differentiable function f : RN R,
1 i f (uN (t)) j f (uN (t))
t E[ f (uN (t))] = E[
2Nt i = j uNi (t) uNj (t)
]

1 1
E[
2t i
uNi (t)i f (uN (t))] + E[
Nt
i2 f (uN (t))] ,
i

where we used the notation i f (x) = xi f (x1 , . . . , xN ). Hence, if at any time t,


270 4. S OME GENERALITIES

uN (t) has law PN , we see by integration by parts that t E[ f (uN (t))]| vanishes
for any twice continuously differentiable f . Therefore, (uN (t))t is a stationary

process with marginal law PN . Because the marginal PN does not depend on ,
one may extend this process to a stationary process (uN (t))t0 .
Set uN (t) = max(uNN (t), uN1 (t)). Recall that by Theorem 2.6.6 together with
(2.5.11),
1 x2 /4
lim log PN (N u) = inf J (s) ,
N N su

x2 /4
with J (s) > 0 for s > 2. Thus, there exist C < and > 0 so that for x C,
for all N N ,

P(uN (t) x) 2PN (N x) e Nx . (4.3.41)


Define next N,0 (t) = tuN (t). Clearly, N,0 (0) = 0 N . An application
of Itos calculus, Lemma 4.3.12, shows that N,0 (t) is a continuous solution of
(4.3.3) with initial data 0, and N,0 (t) N for all t > 0. For an arbitrary constant
A, define N,A (t) N by iN,A (t) = iN,0 (t) + A, noting that ( N,A (t))t0 is again
a solution of (4.3.3), starting from the initial data (A, . . . , A) N , that belongs to
N for all t > 0.
N, +N (0)
Note next that for any > 0, i (0) > iN (0) for all i. Further, for
N, +N (0)
t small, i > iN (t) for all i by continuity. Therefore, we get from
(t)
Lemma 4.3.6 that, for all t > 0,
N, +N (0)
NN (t) N (t) N (0) + + tuN (t) .

A similar argument shows that



1N (t) N (0) + + tuN (t) .

Since uN (t) is distributed according to the law PN , taking 0 and recalling
(4.3.41) completes the proof of the lemma.

Proof of Lemma 4.3.16 We use the estimates on N (t) from Lemma 4.3.17 in
order to approximate q, LN (t) for polynomial functions q by similar expressions
involving bounded continuous functions.
We begin by noting that, due to Lemma 4.3.17 and the BorelCantelli Lemma,
for any fixed t,

lim sup N (t) N (0) + tC a.s. (4.3.42)
N
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 271

Again from Lemma 4.3.17, we also have that, for any p 0,


  

p p
p1 Nx
E[(N (t)) ] 2 (N (0) +C t) + pt
p p 2 x e dx
0
 
p! p
= 2 p (N (0) +C t) p + t2 . (4.3.43)
( N) p
As a consequence, there exists an increasing function C(t), such that for any T <
, C(T ) = suptT C(t) < , and so that for all N sufficiently large, all p [0, N],
E[(N (t)) p ] (2C(t)) p . (4.3.44)
Note that (4.3.42) implies that, under the current assumptions, the support of the
limit t , see Proposition 4.3.10, is contained in the compact set [A(t), A(t)],

where A(t) := C +C t.
We next improve (4.3.42) to uniform (in t T ) bounds. Fix a constant

< min( /6, 1/T 1 ), where 1 is as in the BurkholderDavisGundy inequal-
ity (Theorem H.8). We will show that, for all T < and p N,
E[ sup |x| p , LN (t)] C(T ) p . (4.3.45)
t[0,T ]

This will imply that


E[ sup N (t) p ] NC(T ) p , (4.3.46)
t[0,T ]

and therefore, by Chebyshevs inequality, for any > 0,


NC(T ) p
P( sup N (t) > C(T ) + ) .
t[0,T ] (C(T ) + ) p

Taking p = p(N) = (log N)2 , we conclude by the BorelCantelli Lemma that


lim sup sup N (t) C(T ) a.s.
N 0tT

To prove (4.3.45), we use (4.3.26) with f (t, x) = xn and an integer n > 0 to get
xn+2 , LN (t) = xn+2 , LN (0) + Mn+2
N
(t)
  t
(n + 1)(n + 2) 2
+ 1 xn , LN (s)ds
2N 0

(n + 2) n t 
+
2 =0 0 x , LN (s)xn , LN (s)ds , (4.3.47)
N is a local martingale with bracket
where Mn+2
 t
2(n + 2)2
Mn+2
N
t = x2n+2 , LN (s)ds .
N2 0
272 4. S OME GENERALITIES

Setting n = 2p and using the BurkholderDavisGundy inequality (Theorem H.8),


one obtains
 T
81 (p + 1)2
N
E[ sup M2(p+1) (t)2 ] E[ x4p+2 , LN (s)ds]
t[0,T ] N2 0
T
1 p2 C(t)(4p+2)m dt 1 p2 TC(T )(4p+2)
c 0
c ,
N2 N2
for some constant c = c( ) independent of p or T , where we used (4.3.44) (and
thus used that 4p + 2 N). We set
t (p) := E[ sup |x| p , LN (t)] ,
t[0,T ]

and deduce from (4.3.47) and the last estimate that for p [0, N/2] integer,
1
(c1 ) 2 p tC(t)(2p+1)
t (2(p + 1)) 0 (2(p + 1)) +
N
 t
+(p + 1) E[(N (t))2p ]ds
2
(4.3.48)
0
1
(c1 ) 2 p tC(t)(2p+1)
C2(p+1) + + ( N)2C(t)2p .
N
Taking p = N/2, we deduce that the left side is bounded by (2C(T )) N , for all
N large. Therefore, by Jensens inequality, we conclude

t () t ( N) N (2C(T )) for all  [0, N] . (4.3.49)

We may now complete the proof of the lemma. For > 0 and continuous
function q, set
 
x
q (x) = q .
1 + x2
By Proposition 4.3.10, for any > 0, we have
lim sup |q , LN (t) q , t | = 0 . (4.3.50)
N t[0,T ]

Further, since the collection of measures t , t [0, T ], is uniformly compactly


supported by the remark following (4.3.42), it follows that
lim sup |q , t  q, t | = 0 . (4.3.51)
0 t[0,T ]

Now, if q is a polynomial of degree p, we find a finite constant C so that


|x|3
|q(x) q (x)| C (|x| p1 + 1) C (|x| p+2 + |x|3 ) .
1 + x2
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 273

Hence, (4.3.45) shows that, for any A > 0,


 
P sup |(q q ), LN (t)| AC
t[0,T ]
1 1

E[ sup (|x|(p+2) + |x|3 ) , LN (t)]  ((2C(T ))(p+2) + (2C(T ))3 ) ,
A t[0,T ] A

for any  N. By the BorelCantelli Lemma, taking  = (log N)2 and A larger
than 2C(T ), we conclude that
lim sup sup |(q q ), LN (t)| [(2C(T )) p+2 + (2C(T ))3 ]C , a.s.
N t[0,T ]

Together with (4.3.50) and (4.3.51), this yields the almost sure uniform conver-
gence of q, LN (t) to q, t . The proof of the L p convergence is similar once we
have (4.3.45).

Exercise 4.3.18 Take 0 = 0 . Show that the empirical measure LN (1) of the
Gaussian (real) Wigner matrices converges almost surely. Show that
1
G1 (z) = G1 (z)2
z
and conclude that the limit is the semicircle law, hence giving a new proof of
Theorem 2.1.1 for Gaussian entries.
Hint: by the scaling property, show that Gt (z) = t 1/2 G1 (t 1/2 z) and use Lemma
4.3.25.

Exercise 4.3.19 Using Exercise 4.3.7, extend Corollary 4.3.11 to the symplectic
setup ( = 4).

4.3.3 Dynamical central limit theorems

In this subsection, we study the fluctuations of (LN (t))t0 on path space. We


shall only consider the fluctuations of moments, the generalization to other test
functions such as continuously differentiable functions is possible by using con-
centration inequalities, see Exercise 2.3.7.
We continue in the notation of Subsection 4.3.2. For any n-tuple of polynomial
functions P1 , . . . , Pn C[X] and (t )t[0,T ] as in Lemma 4.3.16 with 0 = , set
 
GN, (P1 , . . . , Pn )(t) = N P1 , LN (t) t , . . . , Pn , LN (t) t  .

The main result of this subsection is the following.


274 4. S OME GENERALITIES

Theorem 4.3.20 Let 1 and T < . Assume that


C = sup max |iN (0)| <
NN 1iN

and that LN (0) converges towards a probability measure in such a way that, for
all p 2,
sup E[|N(xn , LN (0) xn , )| p ] < .
NN

Assume that for any n N and any P1 , . . . , Pn C[X], GN, (P1 , . . . , Pn )(0) con-
verges in law towards a random vector (G(P1 )(0), . . . , G(Pn )(0)). Then
(a) there exists a process (G(P)(t))t[0,T ],PC[X] , such that for any polynomial
functions P1 , . . . , Pn C[X], the process (GN, (P1 , . . . , Pn )(t))t[0,T ] converges in
law towards (G(P1 )(t), . . . , G(Pn )(t))t[0,T ] ;
(b) the limit process (G(P)(t))t[0,T ],PC[X] is uniquely characterized by the fol-
lowing two properties.
(1) For all P, Q C[X] and ( , ) R2 ,
G( P + Q)(t) = G(P)(t) + G(Q)(t) t [0, T ].
(2) For any n N, (G(xn )(t))t[0,T ],nN is the unique solution of the system of
equations
G(1)(t) = 0 , G(x)(t) = G(x)(0) + Gt1 ,
and, for n 2,
 t n2
G(xn )(t) = G(xn )(0) + n s (xnk2 )G(xk )(s)ds
0 k=0
 t
2
+ n(n 1) s (xn2 )ds + Gtn , (4.3.52)
2 0

where (Gtn )t[0,T ],nN is a centered Gaussian process, independent of


(G(xn )(0))nN , such that, if n1 , n2 1, then for all s,t 0,
 ts
E[Gtn1 Gns 2 ] = n1 n2 u (xn1 +n2 2 )du .
0

Note that a consequence of Theorem 4.3.20 is that if (G(xn )(0))nN is a centered


Gaussian process, then so is (G(xn )(t))t[0,T ],nN .
Proof of Theorem 4.3.20 The idea of the proof is to use (4.3.47) to show that
the process (GN (x, . . . , xn )(t))t[0,T ] is the solution of a stochastic differential sys-
tem whose martingale terms converge by Rebolledos Theorem H.14 towards a
Gaussian process.
It is enough to prove the theorem with Pi = xi for i N. Set GNi (t) := GN (xi )(t) =
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 275

Nxi , (LN (t) t ) to get, using (4.3.47) (which is still valid with obvious modifi-
cations if i = 1),
i2  t
GNi (t) = GNi (0) + i GNk (s)s (xi2k )ds + MiN (t)
k=0 0
 t  t
2 i i2
+
2
i(i 1)
0
xi2 , LN (s)ds +
2N k=0 0
GNk (s)GNi2k (s)ds , (4.3.53)

where (MiN , i N) are martingales with bracket


 t
2
MiN , M Nj t = ij xi+ j2 , LN (s)ds .
0

(Note that by Lemma 4.3.16, the L p norm of MiN  is finite for all p, and so in
particular MiN are martingales and not just local martingales.)
By Lemma 4.3.16, for all t 0, MiN , M Nj t converges in L2 and almost surely

towards 2 i j 0t xi+ j2 , s ds. Thus, by Theorem H.14, and with the Gaussian
process (Gti )t[0,T ],iN as defined in the theorem, we see that, for all k N,
(MkN (t), . . . , M1N (t))t[0,T ] converges in law towards
the k-dimensional Gaussian process (Gtk , Gtk1 , . . . , Gt1 )t[0,T ] . (4.3.54)
Moreover, (Gtk , Gtk1 , . . . , Gt1 )t[0,T ] is independent of (G(xn )(0))nN since the con-
vergence in (4.3.54) holds given any initial condition such that LN (0) converges
to . We next show by induction over p that, for all q 2,
Aqp := max sup E[ sup |GNi (t)|q ] < . (4.3.55)
ip NN t[0,T ]

To begin the induction, note that (4.3.55) holds for p = 0 since GN0 (t) = 0. Assume
(4.3.55) is verified for polynomials of degree strictly less than p and all q. Recall
that, by (4.3.45) of Lemma 4.3.16, for all q N,
Bq = sup sup E[|x|q , LN (t)] < . (4.3.56)
NN t[0,T ]

Set Aqp (N, T ) := E[supt[0,T ] |GNp (t)|q ]. Using (4.3.56), Jensens inequality in the
form E(x1 + x2 + x3 )q 3q1 3i=1 E|xi |q , and the BurkholderDavisGundy in-
equality (Theorem H.8), we obtain that, for all > 0,
Aqp (N, T ) 3q [Aqp (N, 0)
p2
1
+(pT )q (Akq(1+ ) (N, T ))(1+ ) 1+
B(1+ ) 1 (p2k)q
k=0
 T
1 q q1
+(pN ) T q/2 E[ x2q(p1) , LN (s)ds] .
0
276 4. S OME GENERALITIES

By the induction hypothesis (Akq(1+ ) is bounded since k < p), the fact that we
control Aqp (N, 0) by hypothesis and the finiteness of Bq for all q, we conclude
also that Aqp (N, T ) is bounded uniformly in N for all q N. This completes the
induction and proves (4.3.55).
Set next, for i N,
i2  t
N (i)(s) := iN 1 GNk (s)GNi2k (s)ds .
k=0 0

Since
1
sup E[N (i)(s)q ] N q i2q (A2q
p 2
) T,
s[0,T ]

we conclude from (4.3.55) and the BorelCantelli Lemma that

N (i)() N 0 , in all Lq , q 2, and a.s. (4.3.57)

Setting
i2  t
YiN (t) = GNi (t) GNi (0) i GNk (s)xi2k , s ds,
k=0 0

for all t [0, T ], we conclude from (4.3.53), (4.3.54) and (4.3.57) that the pro-
cesses (YiN (t),Yi1N (t), . . . ,Y N (t))
1 t0 converge in law towards the centered Gaus-
sian process (G (t), . . . , G1 (t))t0 .
i

To conclude, we need to deduce the convergence in law of the GN s from that


of the Y N s. But this is clear again by induction; GN1 is uniquely determined from
Y1N and GN1 (0), and so the convergence in law of Y1N implies that of GN1 since
GN1 (0) converges in law. By induction, if we assume the convergence in law of
(GNk , k p 2), we deduce that of GNp1 and GNp from the convergence in law of
YpN and Yp1
N .

Exercise 4.3.21 Recover the results of Section 2.1.7 in the case of Gaussian
Wigner matrices. by taking X N, (0) = 0, with 0 = 0 and G(xn )(0) = 0. Note
that mn (t) := EG(xn )(t) = t n/2 mn (1) may not vanish.

Exercise 4.3.22 In each part of this exercise, check that the given initial data
X N (0) fulfills the hypotheses of Theorem 4.3.20. (a) Let X N (0) be a diagonal
matrix with entries on the diagonal ( ( Ni ), 1 i N), with a continuously
differentiable function on [0, 1]. Show that
 1
1
0 ( f ) = f ( (x))dx, G(x p )(0) = [ (1) p (0) p ] for all p,
0 2
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 277

and that (G(x p )(0), p 0) are deterministic.


(b) Let X N, (0) be a finite rank diagonal matrix, i.e. for some k fixed indepen-
dently of N, X0N = diag(1 , . . . , k , 0, . . . , 0), with the i s uniformly bounded.
Check that
k
0 = 0 , G(x p )(0) = lp for all p,
l=1

and that G(x p )(0) is random if the i s are.


(c) Let X N, (0) be a diagonal matrix with entries X N (0)(ii) = i / N for 1 i
N, with some i.i.d. centered bounded random variables i . Check that

0 ( f ) = 0 , G(x p )(0) = 0 if p = 1

but G(x)(0) is a standard Gaussian variable.

4.3.4 Large deviation bounds

Fix T R+ . We discuss in this subsection the derivation of large deviation esti-


mates for the measure-valued process {LN (t)}t[0,T ] . We will only derive expo-
nential upper bounds, and refer the reader to the bibliographical notes for infor-
mation on complementary lower bounds, applications and relations to spherical
integrals.
We begin by introducing a candidate for a rate function on C([0, T ], M1 (R)).
For any f , g Cb2,1 (R[0, T ]), s t [0, T ] and . C([0, T ], M1 (R)), set
 
Ss,t ( , f ) = f (x,t)d t (x) f (x, s)d s (x)
 t
u f (x, u)d u (x)du
s
 t 
1 x f (x, u) x f (y, u)
d u (x)d u (y)du, (4.3.58)
2 s xy
 t
 f , gs,t
= x f (x, u)x g(x, u)d u (x)du (4.3.59)
s

and
1
Ss,t ( , f ) = Ss,t ( , f )  f , f s,t . (4.3.60)
2
Set, for any probability measure M1 (R),

+ , if 0 = ,
S ( ) := S ( ) := sup f C2,1 (R[0,T ]) sup0stT Ss,t ( , f ) , otherwise.
0,T
b
278 4. S OME GENERALITIES

We now show that S () is a candidate for rate function, and that a large devia-
tion upper bound holds with it.

Proposition 4.3.23 (a) For any M1 (R), S () is a good rate function on


C([0, T ], M1 (R)), that is, { C([0, T ], M1 (R)); S ( ) M} is compact for any
M R+ .
(b) With assumptions as in Proposition 4.3.10, the sequence (LN (t))t[0,T ] satisfies
a large deviation upper bound of speed N 2 and good rate function S , that is, for
all closed subsets F of C([0, T ], M1 (R)),
1
lim sup log P(LN () F) inf S .
N N2 F

We note in passing that, since S () is a good rate function, the process


(LN (t))t[0,T ] concentrates on the set { : S ( ) = 0}. Exercise 4.3.25 below
establishes that the latter set consists of a singleton, the solution of (4.3.25).
The proof of Proposition 4.3.23 is based on Itos calculus and the introduction
of exponential martingales. We first need to improve Lemma 4.3.14 in order to
obtain exponential tightness.

Lemma 4.3.24 Assume (4.3.23). Let T R+ . Then, there exists a(T ) > 0 and
M(T ),C(T ) < so that:
(a) for M M(T ),
 
C(T )ea(T )MN ;
2
P sup log(x2 + 1), LN (t) M
t[0,T ]

(b) for any L N, there exists a compact set K (L) C([0, T ], M1 (R)) so that

P (LN () K (L)c ) eN L .
2

It follows in particular from the second part of Lemma 4.3.24 that the sequence
(LN (t),t [0, T ]) is almost surely pre-compact in C([0, T ], M1 (R)); compare with
Lemma 4.3.14.
Proof The proof proceeds as in Lemma 4.3.14. Set first f (x) = log(x2 + 1).
Recalling (4.3.29) and Corollary H.13, we then obtain that, for all L 0,
 
N 2 L2
P sup |M f (s)| L 2e 16T ,
N
sT

which combined with (4.3.29) yields the first part of the lemma.
For the second part of the lemma, we proceed similarly, by first noticing that
4.3 S TOCHASTIC ANALYSIS FOR RANDOM MATRICES 279

if f C2 (R) is bounded, together with its first and second derivatives, by 1, then
from Corollary H.13 and (4.3.33) we have that
sup | f , LN (s) LN (ti )| 2 + ,
i s(i+1)

N 2 ( )2
with probability greater than 1 2e 16 . Using the compact sets K = KM of
C([0, T ], M1 (R)) as in (4.3.36) with k = 1/kM( fk  +  fk  +  fk  ), we then
conclude that
P(LN KM ) 2ecM N ,
2

with cM M . Adjusting M = M(L) completes the proof.


Proof of Proposition 4.3.23 We first prove that S () is a good rate function. Then
we obtain a weak large deviation upper bound, which gives, by the exponential
tightness proved in the Lemma 4.3.24, the full large deviation upper bound.
(a) Observe first that, from Riesz Theorem (Theorem B.11), S ( ) is also
given, when 0 = , by
1 Ss,t ( , f )2
SD ( ) = sup sup . (4.3.61)
2 f Cb (R[0,T ]) 0stT
2,1  f , f s,t

Consequently, S is nonnegative. Moreover, S is obviously lower semicontin-


uous as a supremum of continuous functions. Hence, we merely need to check
that its level sets are contained in relatively compact sets. By Lemma 4.3.13, it is
enough to show that, for any M > 0:
M so that for any
(1) for any integer m, there is a positive real number Lm
{SD M},
1
sup s (|x| LmM
) , (4.3.62)
0sT m
proving that s KLM for all s [0, T ];
(2) for any integer m and f Cb2 (R), there exists a positive real number mM so
that for any {S () M},
1
sup |t ( f ) s ( f )| , (4.3.63)
|ts|mM m

showing that ss ( f ) C M ,|| f || .


 
To prove (4.3.62), we consider, for > 0, f (x) = log x2 (1 + x2 )1 + 1
Cb2,1 (R[0, T ]). We observe that
C := sup ||x f || + sup ||x2 f ||
0< 1 0< 1
280 4. S OME GENERALITIES

is finite and, for (0, 1],



x f (x) x f (y)
C.
xy

Hence, (4.3.61) implies, by taking f = f in the supremum, that, for any (0, 1],
any t [0, T ], any . {SD M},

t ( f ) 0 ( f ) + 2Ct + 2C Mt .

Consequently, we deduce by the monotone convergence theorem and letting


decrease to zero that for any . {S () M},

sup t (log(x2 + 1))  , log(x2 + 1) + 2C(1 + M) .
t[0,T ]

Chebyshevs inequality and (4.3.23) thus imply that for any . {S () M} and
any K R+ ,

CD + 2C(1 + M)
sup t (|x| K) ,
t[0,T ] log(K 2 + 1)

which finishes the proof of (4.3.62).


The proof of (4.3.63) again relies on (4.3.61) which implies that for any f
Cb2 (R), any . {S () M} and any 0 s t T ,

| f , t s |  f  |t s| + 2 f  M |t s|. (4.3.64)

We turn next to establishing the weak large deviation upper bound. Pick
C([0, T ], M1 (R)) and f C2,1 ([0, T ]R). By Lemma 4.3.12, for any s 0, the
s} is a martingale for the filtration of the Brownian motion
process {Ss,t (LN , f ),t 

W , which is equal to 2/ N 3/2 Ni=1 st f (iN (u))dWui . Its bracket is  f , f s,t
LN .
As f is uniformly bounded, we can apply Theorem H.10 to deduce that the pro-
cess {MN (LN , f )(t),t s} is a martingale if for C([0, T ], M1 (R)) we denote
N2
MN ( , f )(t) := exp{N 2 Ss,t ( , f )  f , f s,t s,t
+ N ( f ) }
2
with
 t
1 1
( f )s,t
:= ( ) x2 f (s, x)d (x)du.
2 s

Moreover, C([0, T ], M1 (R))Ss,t ( , f ) := Ss,t ( , f ) 12  f , f s,t


is continu-
ous as f and its two first derivatives are bounded continuous whereas the function
 
 st x2 f (s, x)d (x)du is uniformly bounded by T x2 f  . Therefore, if we
pick small enough so that Ss,t (., f ) varies by at most > 0 on the ball (for
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 281

some metric d compatible with the weak topology on C([0, T ], M1 (R))) of radius
around , we obtain, for all s t T ,
MN (LN , f )(t)
P (d(LN , ) < ) = E[ 1 ]
MN (LN , f )(t) d(LN , )<
2 +N f  2 Ss,t ( , f )
N
eN E[MN (LN , f )(t)1d(LN , )< ]
N 2 +N f  N 2 Ss,t ( , f )
e
E[MN (LN , f )(t)]
N 2 +N f  N
2 Ss,t ( , f )
= e ,
where we finally used the fact that E[MN (LN , f )(t)] = E[MN (LN , f )(s)] = 1 since
the process {MN (LN , f )(t),t s} is a martingale. Hence,
1
lim lim log P (d(LN , ) < ) Ss,t ( , f )
0 N N 2

for any f C2,1 ([0, T ]R). Optimizing over f gives


1
lim lim log P (d(LN , ) < ) S0,1 ( , f ).
0 N N 2

Since LN (0) is deterministic and converges to A , if 0 = A ,


1
lim lim log P (d(LN , ) < ) = ,
0 N N2
which allows us to conclude that
1
lim lim 2 log P (d(LN , ) < ) SA ( , f ) .
0 N N

Exercise 4.3.25 In this exercise, you prove that the set { : S ( ) = 0} consists
of the unique solution of (4.3.25).
(a) By applying Riesz Theorem, show that
Ss,t ( , f )2
S0,T ( ) := sup sup .
f Cb (R[0,T ]) 0stT
2,1 2 f , f s,t

(b) Show that S (. ) = 0 iff 0 = and Ss,t ( , f ) = 0 for all 0 s t T and all
f Cb2,1 (R[0, T ]). Take f (x) = (z x)1 to conclude.

4.4 Concentration of measure and random matrices

We have already seen in Section 2.3 that the phenomenon of concentration of


measure can be useful in the study of random matrices. In this section, we further
282 4. S OME GENERALITIES

expand on this theme by developing both concentration techniques and their ap-
plications to random matrices. To do so we follow each of two well-established
routes. Taking the first route, we consider functionals of the empirical measure
of a matrix as functions of the underlying entries. When enough independence is
present, and for functionals that are smooth enough (typically, Lipschitz), concen-
tration inequalities for product measures can be applied. Taking the second route,
which applies to situations in which random matrix entries are no longer inde-
pendent, we view ensembles of matrices as manifolds equipped with probability
measures. When the manifold satisfies appropriate curvature constraints, and the
measure satisfies coercivity assumptions, semigroup techniques can be invoked to
prove concentration of measure results.

4.4.1 Concentration inequalities for Hermitian matrices with independent


entries

We begin by considering Hermitian matrices XN whose entries on-and-above the


diagonal are independent (but not necessarily identically distributed) random vari-
ables. We will mainly be concerned with concentration inequalities for the random
variable tr f (XN ), which is a Lipschitz function of the entries of XN , see Lemma
2.3.1.

Remark 4.4.1 Wishart matrices, as well as matrices of the form YN TN YN with TN


diagonal and deterministic, and YN MatMN possessing independent entries, can
be easily treated by the techniques of this section. For example, to treat Wishart
matrices, fix N M positive integers, and define the matrix XN MatN+M ,
 
0 YN
XN = .
YN 0

Now (XN )2 equals


 
YN YN 0
0 YNYN

and therefore, for any continuous function f ,

tr( f (XN2 )) = 2tr( f (YN YN )) + (M N) f (0) .

Hence, concentration results for linear functionals of the empirical measure of the
singular values of YN can be deduced from such results for the eigenvalues of XN .
For an example, see Exercise 4.4.9.
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 283

Entries satisfying Poincares inequality

Our first goal is to extend the concentration inequalities, Lemma 2.3.3 and The-
orem 2.3.5, to Hermitian matrices whose independent entries satisfy a weaker
condition than the LSI, namely to matrices whose entries satisfy a Poincare type
inequality.

Definition 4.4.2 (Poincare inequality) A probability measure P on RM satisfies


the Poincare inequality (PI) with constant m > 0 if, for all continuously differen-
tiable functions f ,
  1
VarP ( f ) := EP | f (x) EP ( f (x))|2 EP (| f |2 ).
m

It is not hard to check that if P satisfies an LSI with constant c, then it satisfies a PI
with constant m c1 , see [GuZ03, Theorem 4.9]. However, there are probability
measures which satisfy the PI but not the LSI such as Z 1 e|x| dx for a (1, 2).
a

Further, like the LSI, the PI tensorizes: if P satisfies the PI with constant m, PM
also satisfies the PI with constant m for any M N, see [GuZ03, Theorem 2.5].
Finally, if for some uniformly bounded function V we set PV = Z 1 eV (x) dP(x),
then PV also satisfies the PI with constant bounded below by e supV +infV m, see
[GuZ03, Property 2.6].
As we now show, probability measures on RM satisfying the PI have sub-
exponential tails.

Lemma 4.4.3 Assume that P satisfies the PI on RM with constant m. Then, for

any differentiable function G on RM , for |t| m/ 2 G2  ,
EP (et(GEP (G)) ) K , (4.4.1)
with K = i0 2i log(1 21 4i ). Consequently, for all > 0,

m

P(|G EP (G)| ) 2Ke 2 G2 
. (4.4.2)

Proof With G as in the statement, for t 2 < m/G22  , set f = etG and note
that
 2 t 2
EP (e2tG ) EP (etG )  G22  EP (e2tG )
m
so that
t2  2
EP (e2tG ) (1 )1 EP (etG ) .
m G2 
2
284 4. S OME GENERALITIES

Iterating, we deduce that


n
4it 2 n
log EP (e2tG ) 2i log(1  G22  ) + 2n+1 log EP (e2 tG ).
i=0 m
Since
n tG
lim 2n+1 log EP (e2 ) = 2tEP (G)
n

and

4it 2
Dt := 2i log(1  G22  ) <
i=0 m

increases with |t|, we conclude that with t0 = m/ 2 G2  ,
EP (e2t0 (GEP (G)) ) Dt0 = K .
The estimate (4.4.2) then follows by Chebyshevs inequality.

We can immediately apply this result in the context of large random matri-
ces. Consider Hermitian matrices such that the laws of the independent entries
{XN (i, j)}1i jN all satisfy the PI (over R or R2 ) with constant bounded below
by Nm. Note that, as for the LSI, if P satisfies the PI with constant m, the law of
ax under P satisfies it also with a constant bounded by a2 m1 , so that our hypoth-
esis includes the case where XN (i, j) = aN (i, j)YN (i, j) with YN (i, j) i.i.d. of law P
satisfying the PI and a(i, j) deterministic and uniformly bounded.

Corollary 4.4.4 Under the preceding assumptions, there exists a universal con-
stant C > 0 such that, for any differentiable function f , and any > 0,

C Nm
P (|tr( f (XN )) E[tr( f (XN ))]| N) Ce f 
2 .

Exercise 4.4.5 Using an approximation argument similar to that employed in the


proof of Herbsts Lemma 2.3.3, show that the conclusions of Lemma 4.4.3 and
Corollary 4.4.4 remain true if G is only assumed Lipschitz continuous, with |G|L
replacing  G2  .

x2
Exercise 4.4.6 Let (dx) = (2 )1/2 e 2 dx be the standard Gaussian measure.
Show that satisfies the Poincare inequality with constant one, by following the
following approaches.
Use Lemma 2.3.2.
Use the interpolation
 1  
2

(( f ( f ))2 ) = f ( x + 1 y)d (y) d (x)d ,
0
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 285

integration by parts, the CauchySchwarz inequality and the fact that, for

any [0, 1], the law of x + 1 y is under .

Exercise 4.4.7 [GuZ03, Theorem 2.5] Show that the PI tensorizes: if P satisfies
the PI with constant m then PM also satisfies the PI with constant m for any
M N.

Exercise 4.4.8 [GuZ03, Theorem 4.9] Show that if P satisfies an LSI with constant
c, then it satisfies a PI with constant m c1 . Hint: Use the LSI with f = 1 + g
and 0.

Exercise 4.4.9 Show that Corollary 4.4.4 extends to the setup of singular values of
the Wishart matrices introduced in Exercise 2.1.18. That is, in the setup described
there, assume the entries YN (i, j) satisfy the PI with constant bounded below by
Nm, and set XN = (YN YNT )1/2 . Prove that, for a universal constant C, and all > 0,

C Nm
P (|tr( f (XN )) E[tr( f (XN ))]| (M + N)) Ce f 
2 .

Matrices with bounded entries and Talagrands method

Recall that the median MY of a random variable Y is defined as the largest real
number such that P(Y x) 21 . The following is an easy consequence of a
theorem due to Talagrand, see [Tal96, Theorem 6.6].

Theorem 4.4.10 (Talagrand) Let K be a convex compact subset of R with diam-


eter |K| = supx,yK |x y|. Consider a convex real-valued function f defined on
K M . Assume that f is Lipschitz on K M , with constant | f |L . Let P be a probability
measure on K and let X1 , . . . , XM be independent random variables with law P.
Then, if M f is the median of f (X1 , . . . , XM ), for all > 0,
2
 
2 2
P | f (X1 , . . . , XM ) M f | 4e 16|K| | f |L .

Under the hypotheses of Theorem 4.4.10,


  
E[| f (X1 , . . . , XM ) M f |] = P | f (X1 , . . . , XM ) M f | t dt
0
 t2
16|K|2 | f |2
4 e L dt = 16|K|| f |L .
0

Hence we obtain as an immediate corollary of Theorem 4.4.10 the following.


286 4. S OME GENERALITIES

Corollary 4.4.11 Under the hypotheses of Theorem 4.4.10, for all t R+ ,


t2
P (| f (X1 , . . . , XM ) E[ f (X1 , . . . , XM )]| (t + 16)|K|| f |L ) 4e 16 .

In order to apply Corollary 4.4.11 in the context of (Hermitian) random matrices


XN , we need to identify convex functions of the entries. Since

1 (XN ) = sup v, XN v ,


vCN ,|v|2 =1

it is obvious that the top eigenvalue of a Hermitian matrix is a convex function of


the real and imaginary parts of the entries. Somewhat more surprisingly, so is the
trace of a convex function of the matrix.

Lemma 4.4.12 (Kleins Lemma) Suppose that f is a real-valued convex func-


(2)
tion on R. Then the function X  tr f (X) on the vector space HN of N-by-N
Hermitian matrices is convex.

For f twice-differentiable and f bounded away from 0 we actually prove a


sharper result, see (4.4.3) below.
Proof We denote by X (resp. Y ) an N N Hermitian matrix with eigenvalues
(xi )1iN (resp. (yi )1iN ) and eigenvectors (i )1iN (resp. (i )1iN ). Assume
at first that f is twice continuously differentiable, and consider the Taylor remain-
der R f (x, y) = f (x) f (y) (x y) f (y). Since

f c 0

for some constant c, we have R f (x, y) 2c (x y)2 = R c x2 (x, y). Consider also
2
the matrix R f (X,Y ) = f (X) f (Y ) (X Y ) f (Y ), noting that tr(R c x2 (X,Y )) =
2
tr( 2c (X Y )2 ). For i {1, . . . , N}, with ci j = |i , j |2 , and with summations on
j {1, . . . , N}, we have

i , R f (X,Y )i  = f (xi ) + (ci j f (y j ) xi ci j f (y j ) + ci j y j f (y j ))


j

= ci j R f (xi , y j ) ci j R 2c x2 (xi , y j ) ,
j j

where at the middle step we use the fact that j ci j = 1. After summing on i
{1, . . . , N} we have
c
tr( f (X) f (Y ) (X Y ) f (Y )) tr(X Y )2 0 . (4.4.3)
2
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 287

Now take successively (X,Y ) = (A, (A + B)/2), (B, (A + B)/2). After summing
(2)
the resulting inequalities, we have for arbitrary A, B Hn that
 
1 1 1 1
tr f ( A + B) tr ( f (A)) + tr ( f (B)) .
2 2 2 2
The result follows for general convex functions f by approximations.

We can now apply Corollary 4.4.11 and Lemma 4.4.12 to the function
f ({XN (i, j)}1i jN ) = tr( f (XN )) to obtain the following.

Theorem 4.4.13 Let (Pi, j , i j) and (Qi, j , i < j) be probability measures sup-
ported on a convex compact subset K of R. Let XN be a Hermitian matrix, such
that XN (i, j), i j, is distributed according to Pi, j , and XN (i, j), i < j, is dis-
tributed according to Qi, j , and such that all these random variables are indepen-

dent. Fix 1 (N) = 8|K| a/N. Then, for any 4 |K|1 (N), and any convex
Lipschitz function f on R,
 
PN |tr( f (XN )) E N [tr( f (XN ))]| N
 
32|K| 1 2
exp N 2
[ 1 (N)] . (4.4.4)
16|K|2 a2 16|K|| f |2L

4.4.2 Concentration inequalities for matrices with dependent entries

We develop next an approach to concentration inequalities based on semigroup


theory. When working on Rm , this approach is related to concentration inequali-
ties for product measures, and in particular to the LSI. However, its great advan-
tage is that it also applies to manifolds, through the BakryEmery criterion.
Our general setup will be concerned with a manifold M equipped with a mea-
sure . We will consider either M = Rm or M compact.

The setup with M = Rm and =Lebesgue measure

Let be a smooth function from Rm into R, with fast enough growth at infinity
such that the measure
1 (x1 ,...,xm )
(dx) := e dx1 dxm
Z
is a well defined probability measure. (Further assumptions of will be imposed
below.) We consider the operator L on twice continuously differentiable func-
288 4. S OME GENERALITIES

tions defined by
m
L = () = [i2 (i )i ] .
i=1

Then, integrating by parts, we see that L is symmetric in L2 ( ), that is, for any
compactly supported smooth functions f , g,
 
( f L g) d = (gL f ) d .

In the rest of this section, we will use the notation f = f d .
Let B denote a Banach space of real functions on M, equipped with a partial
order <, that contains Cb (M), the Banach space of continuous functions on M
equipped with the uniform norm, with the latter being dense in B. We will be
concerned in the sequel with B = L2 ( ).

Definition 4.4.14 A collection of operators (Pt )t0 with Pt : BB is a Markov


semigroup with infinitesimal generator L if the following hold.
(i) P0 f = f for all f B.
(ii) The map tPt is continuous in the sense that for all f B, tPt f is a con-
tinuous map from R+ into B.
(iii) For any f B and (t, s) R2+ , Pt+s f = Pt Ps f .
(iv) Pt 1 = 1 for t 0, and Pt preserves positivity: for each f 0 and t 0, Pt f 0.
(v) For any function f for which the limit exists,

L ( f ) = lim t 1 (Pt f f ) . (4.4.5)


t0

The collection of functions for which the right side of (4.4.5) exists is the domain
of L , and is denoted D(L ).

Property (iv) implies in particular that Pt f   f  . Furthermore, Pt is re-


versible in L2 ( ), i.e., ( f Pt g) = (gPt f ) for any smooth functions f , g. In
particular, is invariant under Pt : that is, Pt = . It also follows immediately
from the definition that, for any f D(L ) and t 0,

f D(L ) Pt f D(L ) , L Pt f = Pt L f . (4.4.6)

In what follows we will be interested in the case where L = L , at least as


operators on a large enough class of functions. We introduce a family of bilinear
forms n on smooth functions by setting 0 ( f , g) = f g and, for n 1,
1
n ( f , g) = (L n1 ( f , g) n1 ( f , L g) n1 (g, L f )) .
2
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 289

We will only be interested in the cases n = 1, 2. Thus, the carre du champ operator
1 satisfies
1
1 ( f , g) = (L f g f L g gL f ) , (4.4.7)
2
and the carre du champ itere operator 2 satisfies
1
2 ( f , f ) = {L 1 ( f , f ) 21 ( f , L f )} . (4.4.8)
2
We often write i ( f ) for i ( f , f ), i = 1, 2. Simple algebra shows that 1 ( f ) =
i=1 (i f ) , and
m 2

m m
2 ( f , f ) = (i j f )2 + i f Hess()i j j f , (4.4.9)
i, j=1 i, j=1

with Hess()i j = Hess() ji = i j the Hessian of .

Remark 4.4.15 We introduced the forms n ( f , f ) in a purely formal way. To


motivate, note that, assuming all differentiation and limits can be taken as written,
one has
1 d
n ( f , g) = (Pt (n1 ( f , g)) n1 (Pt f , Pt g)) |t=0
2 dt
1
= (L n1 ( f , g) n1 ( f , L g) n1 (g, L f )) . (4.4.10)
2
We will see below in Lemma 4.4.22 that indeed these manipulations are justified
when f , g are sufficiently smooth.

Definition 4.4.16 We say that the BakryEmery condition (denoted BE) is satisfied
if there exists a positive constant c > 0 such that
1
2 ( f , f ) 1 ( f , f ) (4.4.11)
c
for any smooth function f .

Note (by taking f = ai xi with ai arbitrary constants) that the BE condition is


equivalent to
1
Hess()(x) I for all x Rm , in the sense of the partial order
c
on positive definite matrices . (4.4.12)

Theorem 4.4.17 Assume that C2 (Rm ) and that the BE condition (4.4.12)
290 4. S OME GENERALITIES

holds. Then, satisfies the logarithmic Sobolev inequality with constant c, that
is, for any f L2 ( ),
 
f2
f 2 log  d 2c 1 ( f , f )d . (4.4.13)
f 2 d
(Rm ) denote the subset of C (Rm ) that consists of func-
In the sequel, we let Cpoly
tions all of whose derivatives have polynomial growth at infinity. The proof of
Theorem 4.4.17 is based on the following result which requires stronger assump-
tions.

Theorem 4.4.18 Assume the BE condition (4.4.12). Further assume that


(Rm ). Then satisfies the logarithmic Sobolev inequality with constant c.
Cpoly

From Theorem 4.4.17, (4.4.9) and Lemma 2.3.3 of Section 2.3, we immediately
get the following.

Corollary 4.4.19 Under the hypotheses of Theorem 4.4.17,


  
|G G(x) (dx)| 2e /2c|G|L .
2 2
(4.4.14)

Proof of Theorem 4.4.17 (with Theorem 4.4.18 granted). Fix > 0, M > 1, and
set B(0, M) = {x Rm : x2 M}. We will construct below approximations of
(Rm ) with the following properties:
by functions M, Cpoly

sup |M, (x) (x)| ,


xB(0,M)
1
Hess(M ) I uniformly . (4.4.15)
c+
With such a construction, M, converges weakly (as M tends to infinity and
tends to 0) toward , by bounded convergence. Further, by Theorem 4.4.18,
for any M, as above, M, satisfies (4.4.13) with the constant c + > 0. For
f 2 smooth, bounded below by a strictly positive constant, and constant outside a
compact set, we deduce that satisfies (4.4.13) by letting M go to infinity and
go to zero in this family of inequalities. We then obtain the bound (4.4.13) for all

functions f L2 ( ) with 1 ( f , f )d < by density.
So it remains to construct a family M, satisfying (4.4.15). For > 0, we let
P be a polynomial approximation of on B(0, 2M) such that

sup Hess(P )(x) Hess()(x) < , P (0) = (0), P (0) = (0)
xB(0,2M) 4
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 291

with  the operator norm on Matm (R). Such an approximation exists by Weier-
strass Theorem. Note that

sup |P (x) (x)|


xB(0,2M)
 1
M2
sup d x, (Hess(P )( x) Hess()( x))x
. (4.4.16)
xB(0,2M) 0 2

With c1
=c
1 > 0 for small, note that Hess(P )(x) c1 I on B(0, 2M)
4
and define P as the function on Rm given by
 -
1
P (x) = sup P (y) + P (y) (x y) + x y22 .
yB(0,2M) 2c

Note that P = P on B(0, 2M) whereas Hess(P ) c1 I almost everywhere since


the map
 -
1 1
x sup P (y) + P (y) (x y) + x y2 x2
yB(0,2M) 2c 2c

is convex as a supremum of convex functions (and thus its Hessian, which is al-
(Rm )-
most everywhere well defined, is nonnegative). Finally, to define a Cpoly
valued function we put, for some small t,

,t (x) = P (x + tz)d (z)

with the standard centered Gaussian law. By (4.4.16) and since P = P on


B(0, M), we obtain for x B(0, M),

M ( ,t) := sup | ,t (x) (x)|


xB(0,M)

M2
sup |P (x + tz) P (x)|d (z) + .
xB(0,M) 2

Thus, M ( ,t) vanishes when and t go to zero and we choose these two pa-
(Rm ) since the
rameters so that it is bounded by . Moreover, ,t belongs to Cpoly

density of the Gaussian law is C and P has at most a quadratic growth at infinity.
Finally, since Hess(P ) c1 1
I almost everywhere, Hess ,t c I everywhere.
To conclude, we choose small enough so that c c + .

Our proof of Theorem 4.4.18 proceeds via the introduction of the semigroup Pt
associated with L through the solution of the stochastic differential equation

dXtx = (Xtx )dt + 2dwt , X0x = x , (4.4.17)
292 4. S OME GENERALITIES

where wt is an m-dimensional Brownian motion. We first verify the properties of


the solutions of (4.4.17), and then deduce in Lemma 4.4.20 some analytical prop-
erties of the semigroup. The proof of Theorem 4.4.18 follows these preliminary
steps.

Lemma 4.4.20 With assumptions as in Theorem 4.4.18, for any x Rm , the solu-
tion of (4.4.17) exists for all t R+ . Further, the formula
Pt f (x) = E( f (Xtx )) (4.4.18)
determines a Markov semigroup on B = L2 (
), with infinitesimal generator L
(Rm ).
so that D(L ) contains Cpoly (R ), and L coincides with L on Cpoly
m

Proof Since the second derivatives of are locally bounded, the coefficients of
(4.4.17) are locally Lipschitz, and the solution exists and is unique up to (possi-
bly) an explosion time. We now show that no explosion occurs, in a way similar
to our analysis in Lemma 4.3.3. Let Tn = inf{t : |Xtx | > n}. Itos Lemma and
the inequality x (x) |x|2 /c c for some constant c > 0 (consequence of
(4.4.12)) imply that
 tT 
n
x
E(|XtT n
|2
) = x 2
E X s (X s )ds + 2E(t Tn )
0
 tT 
1 n
x2 + E |Xs |2 ds + (2 + c )E(t Tn ) . (4.4.19)
c 0

Gronwalls Lemma then yields that


x
E(|XtTn
|2 ) (x2 + (2 + c )t)et/c .
Since the right side of the last estimate does not depend on n, it follows from
Fatous Theorem that the probability that explosion occurs in finite time vanishes.
That (4.4.18) determines a Markov semigroup is then immediate (note that Pt is a
contraction on L2 ( ) by virtue of Jensens inequality).
To analyze the infinitesimal generator of Pt , we again use Itos Lemma. First
note that (4.4.19) implies that E x |Xt |2 C(t)(x2 + 1) for some locally bounded
C(t) . Repeating the same computation (with the function |XtT x |2p , p positive
n
x 2p 2p
integer) yields that E |Xt | C(t, p)(x + 1). For f Cpoly (Rm ), we then get
that
 tTn  tTn
x
f (XtTn
) f (x) = L f (Xsx )ds + g(Xsx )dws , (4.4.20)
0 0
where the function g has polynomial growth at infinity and thus, in particular,
 t
E(sup( g(Xsx )dws )2 ) < .
t1 0
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 293

Arguing similarly with the term containing L f (Xsx ), we conclude that all terms
in (4.4.20) are uniformly integrable. Taking n and using the fact that Tn
together with the above uniform integrability yields that
 t
E ( f (Xtx )) f (x) = E L f (Xsx )ds .
0

Taking the limit as t 0 (and using again the uniform integrability together with
(Rm ) D(L ) .
the continuity Xsx s0 x a.s.) completes the proof that Cpoly

Remark 4.4.21 In fact, D(L ) can be explicitly characterized: it is the subset


of L2 ( ) consisting of functions f that are locally in the Sobolev space W 2,2
and such that L f L2 ( ) in the sense of distributions (see [Roy07, Theorem
2.2.27]). In the interest of providing a self-contained proof, we do not use this
fact.

An important analytical consequence of Lemma 4.4.20 is the following.

Lemma 4.4.22 With assumptions as in Theorem 4.4.18, we have the following.


(i) If f is a Lipschitz(1) function on Rm , then Pt f is a Lipschitz(e2t/c ) function
for all t R+ .
(ii) If f Cb (Rm ), then Pt f Cpoly
(Rm ).

(iii) If f , g Cpoly (R ), then the equality (4.4.10) with n = 2 holds.
m

Proof (i) By applying Itos Lemma we obtain that


d x 2
|Xt Xty |2 = 2(Xtx Xty )((Xtx ) (Xty )) |Xtx Xty |2 .
dt c
In particular, |Xtx Xty | |x y|e2t/c , and thus for f Lipschitz with Lipschitz
constant equal to 1, we have | f (Xtx ) f (Xty )| |x y|e2t/c . Taking expectations
completes the proof.
(Rm ), we have that f D(L ) and L f = L f . Therefore, also
(ii) Since f Cpoly
Pt f D(L ), and L Pt f = Pt L f L2 ( ) (since L f L2 ( ) and Pt is a
contraction on L2 ( )). By part (i) of the lemma, |Pt f | is uniformly bounded
and, by assumption, || has at most polynomial growth. It follows that Pt f ,
which exists everywhere, satisfies

Pt f = gt ,

where the function gt L2 ( ) has at most polynomial growth at infinity. Stan-


dard estimates for the solutions of uniformly elliptic equations (for this version,
see [GiT98, Theorem 4.8]) then imply that Pt f Cpoly (Rm ).
(Rm ). Thus ( f , g) C (Rm ) and, in particular,
(iii) By assumption, f , g Cpoly 1 poly
294 4. S OME GENERALITIES

by Lemma 4.4.20, belongs to D(L ) and so does Pt 1 ( f , g). The rest follows from
the definitions.

Proof of Theorem 4.4.18 Let h be a positive bounded continuous function so that



hd = 1. We begin by proving that Pt is ergodic in the sense that

lim (Pt h h)2 = 0. (4.4.21)


t

A direct proof can be given based on part (i) of Lemma 4.4.22. Instead, we present
a slightly longer proof that allows us to derive useful intermediate estimates.
We first note that we can localize (4.4.21): because Pt 1 = 1 and Pt f 0 for f
positive continuous, it is enough to prove (4.4.21) for h Cb (Rm ) that is compactly
supported. Because Cb (K) is dense in C(K) for any compact K, it is enough
to prove (4.4.21) for h Cb (Rm ). To prepare for what follows, we will prove
(4.4.21) for a function h satisfying h = (P g) for some g Cb , 0, and
that is infinitely differentiable with bounded derivatives on the range of g (the
immediate interest is with = 0, (x) = x).
Set ht = Pt h and for s [0,t], define (s) = Ps 1 (hts , hts ). By part (ii) of
Lemma 4.4.22, 1 (hts , hts ) D(L ). Therefore,
d 2 2
(s) = 2Ps 2 (Pts h, Pts h) Ps 1 (Pts h, Pts h) = (s) ,
ds c c
where we use the BE condition in the inequality. In particular,

ht 22 = 1 (ht , ht ) = (0) e2t/c (t) = e2t/c Pt 1 (h, h) . (4.4.22)

The expression 1 (ht , ht ) converges to 0 as t because 1 (h, h) = h22


is uniformly bounded. Further, since for any x, y Rm ,
 1

|ht (x) ht (y)| = ht ( x + (1 )y), (x y)d
0

x y2  ht 2  x y2 et/c  h2  ,

it follows that ht () (ht ) converges almost everywhere to zero. Since (ht ) =


(h), we conclude that ht converges almost everywhere and in L2 ( ) to (h),
yielding (4.4.21).
We now prove Theorem 4.4.18 for f 2 = h Cb that is uniformly bounded below
by a strictly positive constant. Set

S f (t) = (ht log ht )d .

Since ht log ht is uniformly bounded and ht Pt Cb (Rm ), we have by (4.4.21) that


4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 295

Sh (t) converges to 0 as t . Hence


  
d
S f (0) = dt S f (t) = dt 1 (ht , log ht )d , (4.4.23)
0 dt 0

where, in the second equality, we used (4.4.7) and the fact that L (g)d = 0
(Rm ) and, in particular, for g = h log h .
for any g Cpoly t t

Next, using the fact that Pt is symmetric together with the CauchySchwarz in-
equality, we get
 
1 (ht , log ht )d = 1 (h, Pt (log ht )) d
  1  1
1 (h, h) 2 2
d h1 (Pt log ht , Pt log ht )d . (4.4.24)
h
Now, applying (4.4.22) with the function log ht (note that since ht is bounded
below uniformly away from 0, log() is indeed smooth on the range of ht ), we
obtain
 
2
h1 (Pt log ht , Pt log ht )d he c t Pt 1 (log ht , log ht )d
 
2 2
= e c t ht 1 (log ht , log ht )d = e c t 1 (ht , log ht )d , (4.4.25)

where in the last equality we have used symmetry of the semigroup and the Leib-
niz rule for 1 . The inequalities (4.4.24) and (4.4.25) imply the bound
  
2 1 (h, h) 2 1 1
1 (ht , log ht )d e c t d = 4e c t 1 (h 2 , h 2 )d .
h
(4.4.26)
Using this, one arrives at
  
2t 1 1
S f (0) 4e c dt 1 (h 2 , h 2 )d = 2c 1 ( f , f )d ,
0

which completes the proof of (4.4.13) when f Cb is strictly bounded below.


To consider f Cb , apply the inequality (4.4.13) to the function f2 = f 2 + ,
noting that 1 ( f , f ) 1 ( f , f ), and use monotone convergence. Another use of
localization and dominated convergence is used to complete the proof for arbitrary
f L2 ( ) with 1 ( f , f ) < .

The setup with M a compact Riemannian manifold

We now consider the version of Corollary 4.4.19 applying to the setting of a com-
pact connected manifold M of dimension m equipped with a Riemannian metric g
and volume measure , see Appendix F for the notions employed.
296 4. S OME GENERALITIES

We let be a smooth function on M and define


1
(dx) = e(x) d (x)
Z
as well as the operator L such that for all smooth functions h, f C (M),

( f L h) = (hL f ) = g(grad f , grad h)d .
M
We have, for all f C (M),
L f = f g(grad , grad f ),
where is the LaplaceBeltrami operator. In terms of a local orthonormal frame
{Li }, we can rewrite the above as
L = (Li2 Li Li (Li )Li ) ,
i

where is the LeviCivita connection.

Remark 4.4.23 For the reader familiar with such language, we note that, in local
coordinates,
m m
L = gi j i j + b
i i
i, j=1 i=1

with
  
b
i (x) = e
(x)
j e(x) det(gx )gixj .
j

We will not need to use this formula.

Given f , h C (M) we define Hess f , Hess h C (M) by requiring that


Hess f , Hess h = (Hess f )(Li , L j )(Hess h)(Li , L j )
i, j

for all local orthonormal frames {Li }.


We define n , for n 0, as in (4.4.10). In particular, 1 and 2 are given by
(4.4.7) and (4.4.8). We have 1 ( f , h) = g(grad f , grad h) or equivalently
1 ( f , h) = (Li f )(Li h)
i

in terms of a local orthonormal frame {Li }. The latter expression for 1 may be
verified by a straightforward manipulation of differential operators. The expres-
sion for 2 is more complicated and involves derivatives of the metric g, reflecting
the fact that the LeviCivita connection does not preserve the Lie bracket. In other
words, the curvature intervenes, as follows.
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 297

Lemma 4.4.24 (BochnerBakryEmery)


2 ( f , f ) = Hess f , Hess f  + (Ric + Hess )(grad f , grad f ).

(See Appendix F for the definition of the Ricci tensor Ric(, ).)

Proof Fix p M arbitrarily and let | p denote evaluation at p. Let L1 , . . . , Lm be


an orthonormal frame defined near p M. Write Li L j = k Cikj Lk , where Cikj =
g(Li L j , Lk ). We assume that the frame {Li } is geodesic at p, see Definition F.26.
After exploiting the simplifications made possible by use of a geodesic frame, it
will be enough to prove that
 
2 ( f , f )| p = (Li L j f )2 + Li L j )(Li f )(L j f ) | p
ij

+ ((LiCkk
j
LkCijk )(Li f )(L j f ))| p . (4.4.27)
i, j,k

To abbreviate write Ai = Li + k Ckk i . By definition, and after some trivial ma-

nipulations of differential operators, we have


1
2 ( f , f ) = ( 2 ((Li2 Ai Li ) f )(L j f )2 (L j (Li2 Ai Li ) f )(L j f ))
i, j

= ((Li L j f )2 + ([Li , L j ]Li + Li [Li , L j ] + [L j , Ai Li ]) f )(L j f )).


i, j

We have [Li , L j ] = k (Cikj Ckji )Lk because is torsion-free. We also have [Li , L j ]| p
= 0 because {Li } is geodesic at p. It follows that
[Li , L j ]Li f | p = 0,
Li [Li , L j ] f | p = (LiCikj LiCkji )(Lk f )| p ,
k
([L j , Ai Li ] f )(L j f )| p = (L jCkki + L j Li )(Li f )(L j f )| p .
k

We have g(Li L j , Lk ) + g(L j , Li Lk ) = Cikj + Cikj = 0 by orthonormality of {Li }


and thus
(Li [Li , L j ] f )(L j f )| p = (LiCkji )(Lk f )(L j f )| p .
i, j i, j,k

Therefore, after some relabeling of dummy indices, we can see that equation
(4.4.27) holds.

Rerunning the proofs of Theorem 4.4.18 and Lemma 2.3.3 (this time, not wor-
rying about explosions, since the process lives on a compact manifold, and replac-
(Rm ) by C (M)), we deduce from Lemma 4.4.24 the
ing throughout the space Cpoly b
following.
298 4. S OME GENERALITIES

Corollary 4.4.25 If for all x M and v Tx M,

(Ric + Hess )x (v, v) c1 gx (v, v) ,

then satisfies the LSI (4.4.13) with constant c and, further, for any differen-
tiable function G on M,
  
|G G(x) (dx)| 2e /2cE 1 (G,G) .
2
(4.4.28)

Applications to random matrices

We begin by applying, in the setup M = Rm and =Lebesgue measure, the gen-


( )
eral concentration inequality of Corollary 4.4.19. For XN HN we write

d XN = dXN (i, j) dXN (i, i) ,


i< j i

for the product Lebesgue measure on the entries on-and-above the diagonal of
XN , where the Lebesgue measure on C is taken as the product of the Lebesgue
measure on the real and imaginary parts.

(R) be a strictly convex function satisfying V (x)


Proposition 4.4.26 Let V Cpoly
cI for all x R and some c > 0. Let = 1 or = 2, and suppose XNV is a random
matrix distributed according to the probability measure
1 N tr(V (XN ))
e d XN .
ZNV
Let PNV denote the law of the eigenvalues (1 , . . . , N ) of XNV . Then, for any Lips-
chitz function f : RN R ,
2
  Nc2
PNV | f (1 , . . . , N ) PNV f | > e 2| f |L .
1
Note that if f (1 , . . . , N ) = 1
N Ni=1 g(i ), then | f |L = 2N |g|L .
( )
Proof Take m = N(N 1) /2 + N. Let h : HN
Rm denote the one-to-one
and onto mapping as defined in the beginning of Section 2.5.1, and let V be the
function on Rm defined by trV (X) = V (h(X)). Note that tr X 2 h(X)2 . For
( )
X,Y HN we have
c
tr (V (X) V (Y ) (X Y )V (Y )) h(X) h(Y )2
2
by (4.4.3), and hence Hess V cIm . Now the function f gives rise to a function
f(X) = f (1 , . . . , n ) on Rm , where the i are the eigenvalues of h1 (X). By
4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 299

Lemma 2.3.1, the Lipschitz constants of f and f coincide. Applying Corollary


4.4.19 yields the proposition.

We next apply, in the setup of compact Riemannian manifolds, the general con-
centration inequality of Corollary 4.4.25. We study concentration on orthogonal
and unitary groups. We let O(N) denote the N-by-N orthogonal group and U(N)
denote the N-by-N unitary group. (In the notation of Appendix E, O(N) = UN (R)
and U(N) = Un (C).) We let SU(N) = {X U(N) : det X = 1} and SO(N) =
O(N) SU(N). All the groups O(N), SO(N), U(N) and SU(N) are manifolds
embedded in MatN (C). We consider each of these manifolds to be equipped with
the Riemannian metric it inherits from MatN (C), the latter equipped with the in-
ner product X Y = tr XY . It is our aim is to get concentration results for O(N)
and U(N) by applying Corollary 4.4.25 to SO(N) and SU(N).
We introduce some general notation. Given a compact group G, let G denote
the unique Haar probability measure on G. Given a compact Riemannian mani-
fold M with metric g, and f C (M), let | f |L ,M be the maximum achieved by
g(grad f , grad f )1/2 on M.
Although we are primarily interested in SO(N) and SU(N), in the following
result, for completeness, we consider also the Lie group USp(N) = UN (H)
MatN (H).

Theorem 4.4.27 (Gromov) Let {1, 2, 4}. Let


GN = SO(N), SU(N),USp(N)
according as = 1, 2, 4. Then, for all f C (GN ) and 0, we have
 
(N+2)
4 1 2

2| f |2
GN (| f GN f | ) 2e L ,GN
. (4.4.29)

Proof Recall from Appendix F, see (F.6), that the Ricci curvature of GN is given
by
 
(N + 2)
Ricx (GN )(X, X) = 1 gx (X, X) (4.4.30)
4
for x GN and X Tx (GN ). Consider now the specialization of Corollary 4.4.25
to the following case:

M = GN , which is a connected manifold;


g = the Riemannian metric inherited from MatN (F), with F = R, C, H accord-
ing as = 1, 2, 4;
= the volume measure on M corresponding to g;
300 4. S OME GENERALITIES

0 and (hence) = GN .

Then the corollary yields the theorem.


We next deduce a corollary with an elementary character which does not make
reference to differential geometry.

Corollary 4.4.28 Let {1, 2}. Let GN = O(N),U(N), according as = 1, 2.


Put SGN = {X GN : det X = 1}. Let f be a continuous real-valued function on
GN which, for some constant C and all X,Y GN , satisfies
| f (X) f (Y )| C tr((X Y )(X Y ) )1/2 . (4.4.31)
Then we have

sup |GN f f (Y X)d SGN (Y )| 2C, (4.4.32)
XGN

and furthermore
 
   (N+2)
4 1 2

GN | f () f (Y )d SGN (Y )| 2e 2C2 (4.4.33)

for all > 0.

For the proof we need a lemma which records some group-theoretical tricks. We
continue in the setting of Corollary 4.4.28.

Lemma 4.4.29 Let HN GN be the subgroup consisting of diagonal matrices with


all diagonal entries equal to 1 except possibly the entry in the upper left corner.
Let HN GN be the subgroup consisting of scalar multiples of the identity. For
any continuous real-valued function f on GN , put

(S f )(X) = f (Y X)d SGN (Y ) ,

(T f )(X) = f (XZ)d HN (Z) ,

(T f )(X) = f (XZ)d HN (Z) .

Then we have T S f = ST f = GN f . Furthermore, if = 2 or N is odd, then we


have T S f = ST f = GN f .

Proof It is clear that T S = ST . Since GN = {XY : X SGN , Y HN }, and Haar


measure on a compact group is both left- and right-invariant, it follows that T S f
is constant, and hence that T S f = GN f . The remaining assertions of the lemma
are proved similarly.

4.4 C ONCENTRATION OF MEASURE AND RANDOM MATRICES 301

Proof of Corollary 4.4.28 From (4.4.31) it follows that | f T f | 2C. The


bound (4.4.32) then follows by applying the previous lemma. We turn to the proof
of (4.4.33). By mollifying f as in the course of the proof of Lemma 2.3.3, we
may assume for the rest of this proof that f C (GN ). Now fix Z HN and
define fZ C (SGN ) by fZ (Y ) = f (Y Z), noting that SGN fZ = (S f )(Z) and that
the constant C bounds | fZ |L ,SGN . We obtain (4.4.33) by applying (4.4.29) to fZ
and then averaging over Z HN . The proof is complete.

We next describe a couple of important applications of Corollary 4.4.28. We


continue in the setup of Corollary 4.4.28.

Corollary 4.4.30 Let D be a constant and let DN , D N MatN be real diagonal


matrices with all entries bounded in absolute value by D. Let F be a Lipschitz
function on R with Lipschitz constant |F|L . Set f (X) = tr(F(D N + XDN X )) for
X GN . Then for every > 0 we have
 (N+2) 
4 1 N 2
GN (| f GN f | N) 2 exp .
16D2 F2L

Proof To abbreviate we write X = (tr XX )1/2 for X MatN (C). For X,Y GN
we have
A A
| f (X) f (Y )| 2NFL AXD N X Y D N Y A 2 2NDX Y  .
Further, by Lemma 4.4.29, since T f = f , we have GN f = S f . Plugging into
Corollary 4.4.28, we obtain the result.

In Chapter 5, we will need the following concentration result for noncommuta-


tive polynomials.

Corollary 4.4.31 Let Xi MatN (C) for i = 1, . . . , k be a collection of nonrandom


matrices and let D be a constant bounding all singular values of these matrices.
Let p = p(t1 , . . . ,tk+2 ) be a polynomial in k +2 noncommuting variables with com-
plex coefficients, and for X U(N), define f (X) = tr p(X, X , X1 , . . . , Xk ). Then
there exist positive constants N0 = N0 (p) and c = c(p, D) such that, for any > 0
and N > N0 (p),
 
U(N) | f U(N) f | > N 2ecN .
2 2
(4.4.34)

Proof We may assume without loss of generality that p = ti1 ti for some indices
i1 , . . . , i {1, . . . , k +2}, and also that N > . We claim first that, for all X U(N),

U(N) f = f (Y X)d SU(N) (Y ) =: (S f )(X) . (4.4.35)
302 4. S OME GENERALITIES

For some integer a such that |a|  we have f (ei X) = eia f (X) for all R
and X U(N). If a = 0, then S f = U(N) f by Lemma 4.4.29. Otherwise, if
a > 0, then U(N) f = 0, but also S f = 0, because f (e2 i/N X) = e2 ia/N f (X) and
e2 ia/N IN SU(N). This completes the proof of (4.4.35).
It is clear that f is a Lipschitz function, with Lipschitz constant depending
only on  and D. Thus, from Corollary 4.4.28 in the case = 2 and the equality
U(N) f = S f , we obtain (4.4.34) for p = ti1 ti with N0 =  and c = c(, D),
which finishes the proof of Corollary 4.4.31.

Exercise 4.4.32 Prove Lemma 2.3.2.


Hint: follow the approximation ideas used in the proof of Theorem 4.4.17, replac-

ing V by an approximation V (x) = V (x + z) (dz) with the normal distribu-
tion.

Exercise 4.4.33 In this exercise, you provide another proof of Proposition 4.4.26
by proving directly that the law
N
1 N N V (i )
PVN (d 1 , . . . , d N ) =
ZNV
e i=1 ( i )
d i
i=1

on RN satisfies the LSI with constant (Nc)1 . This proof extends to the -
ensembles discussed in Section 4.5.
(i) Use Exercise 4.4.32 to show that Theorem 4.4.18 extends to the case where
N

( ) = N V (i ) log |i j | .
i=1 2 i = j

(Alternatively, you may prove this directly by first smoothing .)


(ii) Note that

(k l )2 if k = l,
Hess( log |i j |)kl = 2
2 i = j (
j =k k j ) otherwise ,

is a nonnegative matrix, and apply Theorem 4.4.18.

4.5 Tridiagonal matrix models and the ensembles

We consider in this section a class of random matrices that are tridiagonal and
possess joint distribution of eigenvalues that generalize the classical GOE, GUE
and GSE matrices. The tridiagonal representation has some advantages, among
them a link with the well-developed theory of random Schroedinger operators.
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 303

4.5.1 Tridiagonal representation of ensembles

We begin by recalling the definition of random variables (with t degrees of


freedom).

Definition 4.5.34 The density on R+

21t/2 xt1 ex
2 /2

ft (x) =
(t/2)
is called the distribution with t degrees of freedom, and is denoted t .

Here, () is Eulers Gamma function, see (2.5.5). The reason for the name is that
if t is integer and X is distributed according to t , then X has the same law as
/
ti=1 i2 where i are standard Gaussian random variables.
Let i be independent i.i.d. standard Gaussian random variables of zero mean
and variance 1, and let Yi i be independent and independent of the vari-
ables {i }. Define the tridiagonal symmetric  matrix HN MatN (R) withen-
tries HN (i, j) = 0 if |i j| > 1, HN (i, i) = 2/ i and HN (i, i + 1) = YNi / ,
i = 1, . . . , N. The main result of this section is the following.

Theorem 4.5.35 (EdelmanDumitriu) The joint distribution of the eigenvalues


of HN is given by

CN ( )( ) e 4 i=1 i ,
N 2
(4.5.1)

where the normalization constant CN ( ) can be read off (2.5.11).

We begin by performing a preliminary computation that proves Theorem 4.5.35


in the case = 1 and also turns out to be useful in the proof of the theorem in the
general case.
Proof of Theorem 4.5.35 ( = 1) Let XN be a matrix distributed according to the
GOE law (and in particular, its joint distribution of eigenvalueshas the density
(2.5.3) with = 1, coinciding with (4.5.1)). Set N = XN (1, 1)/ 2, noting that,
due to the construction in Section 2.5.1, N is a standard Gaussian variable. Let
(1,1)
XN denote the matrix obtained from XN by striking the first column and row,
T (1,1)
and let ZN1 = (XN (1, 2), . . . , XN (1, N)). Then ZN1 is independent of XN and
N . Let HN be an orthogonal N 1-by-N 1 matrix, measurable on (ZN1 ),
such that HN ZN1 = (ZN1 2 , 0, . . . , 0), and set YN1 = ZN1 2 , noting that
YN1 is independent of N and is distributed according to N1 . (A particular
choice of HN is the Householder reflector HN = I 2uuT /u22 , where u = ZN1
304 4. S OME GENERALITIES

ZN1 2 (1, . . . , 0).) Let


 
1 0
HN = .
0 HN

Then the law of eigenvalues of HN XN HNT is still (4.5.1), while



2N YN1 0N2
YN1
HN XN HNT =

,

XN1
0N2

where XN1 is again distributed according to the GOE and is independent of N


and YN1 . Iterating this construction N 1 times (in the next step, with the House-
holder matrix corresponding to XN1 ), one concludes the proof (with = 1).

We next prove some properties of the eigenvalues and eigenvectors of tridiag-
onal matrices. Recall some notation from Section 2.5: DN denotes the collection
of diagonal N-by-N matrices with real entries, DNd denotes the subset of DN con-
sisting of matrices with distinct entries, and DNdo denotes the subset of matrices
with decreasing entries, that is DNdo = {D DNd : Di,i > Di+1,i+1 }. Recall also that
(1) (1),+
UN denotes the collection of N-by-N orthogonal matrices, and let UN denote
(1)
the subset of UN consisting of matrices whose first row has all elements strictly
positive.
We parametrize tridiagonal matrices by two vectors of length N and N 1,
(1)
a = (a1 , . . . , aN ) and b = (b1 , . . . , bN1 ), so that if H HN is tridiagonal then
(1)
H(i, i) = aNi+1 and H(i, i + 1) = bNi . Let TN HN denote the collection of
tridiagonal matrices with all entries of b strictly positive.

Lemma 4.5.36 The eigenvalues of any H TN are distinct, and all eigenvectors
v = (v1 , . . . , vN ) of H satisfy v1 = 0.

Proof The null space of any matrix H TN is at most one dimensional. Indeed,
suppose Hv = 0 for some nonzero vector v = (v1 , . . . , vN ). Because all entries of
b are nonzero, it is impossible that v1 = 0 (for then, necessarily all vi = 0). So
suppose v1 = 0, and then v2 = aN /bN1 . By solving recursively the equation

bNi vi1 + aNi vi = bNi1 vi+1 , i = 2, . . . , N 1, (4.5.2)

which is possible because all entries of b are nonzero, all entries of v are deter-
mined. Thus, the null space of any H TN is one dimensional at most. Since
H I TN for any , the first part of the lemma follows. The second part fol-
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 305

lows because we showed that if v = 0 is in the null space of H I, it is impossible


to have v1 = 0.

Let H TN , with diagonals a and b as above, and write H = UDU T with


D DNdo and U = [v1 , . . . , vN ] orthogonal, such that the first row of U, denoted v =
(v11 , . . . , vN1 ), has nonnegative entries. (Note that v2 = 1.) Write d =
(D1,1 , . . . , DN,N ). Let cN = {(x1 , . . . , xN ) : x1 > x2 > xN } and let
N1
S+ = {v = (v1 , . . . , vN ) RN : v2 = 1, vi > 0} .

(Note that cN is similar to N , except that the ordering of coordinates is reversed.)

Lemma 4.5.37 The map


(N1)
(a, b)  (d, v) : RN R+ cN S+
N1
(4.5.3)

is a bijection, whose Jacobian J is proportional to

(d)
. (4.5.4)
N1 i1
i=1 bi

Proof That the map in (4.5.3) is a bijection follows from the proof of Lemma
4.5.36, and in particular from (4.5.2) (the map (d, v)  (a, b) is determined by
the relation H = UDU T ).
To evaluate the Jacobian, we recall the proof of the = 1 case of Theorem
4.5.35. Let X be a matrix distributed according to the GOE, consider the tridiag-
onal matrix with diagonals a, b obtained from X by the successive Householder
transformations employed in that proof. Write X = UDU where U is orthogonal,
D is diagonal (with elements d), and the first row u of U consists of nonnegative
entries (and strictly positive except on a set of measure 0). Note that, by Corollary
2.5.4, u is independent of D and, by Theorem 2.5.2, the density of the distribution
of the vector (d, u) with respect to the product of the Lebesgue measure on cN
is proportional to (d)e i=1 di /4 . Using
N1 N 2
and the the uniform measure on S+
Theorem 4.5.35 and the first part of the lemma, we conclude that the latter (when
evaluated in the variables a, b) is proportional to

a2i 2
N1 bi
N1 N1
Je i=1 4 i=1 2
bi1 = Je i=1 di /4 bi1
N N 2
i i .
i=1 i=1

The conclusion follows.


We will also need the following useful identity.


306 4. S OME GENERALITIES

Lemma 4.5.38 With notation as above, we have the identity


N1
i=1 bi
i
(d) = . (4.5.5)
Ni=1 vi1

Proof Write H = UDU T . Let e1 = (1, 0, . . . , 0)T . Let w1 be the first column of
U T , which is the vector made out of the first entries of v1 , . . . , vn . One then has
N1
bii = det[e1 , He1 , . . . , H N1 e1 ] = det[e1 ,UDU T e1 , . . . ,UDN1U T e1 ]
i=1
N
= det[w1 , Dw1 , . . . , DN1 w1 ] = (d) vi1 .
i=1

Because all terms involved are positive by construction, the is actually a +, and
the lemma follows.

We can now conclude.


Proof of Theorem 4.5.35 (general > 0) The density of the independent vectors
a and b, together with Lemma 4.5.37, imply that the joint density of d and v with
respect to the product of the Lebesgue measure on cN and the uniform measure
N1
on S+ is proportional to
N1
i 1 4 N
bi
2
J e i=1 di . (4.5.6)
i=1

Using the expression (4.5.4) for the Jacobian, one has


  1   1
N1 N1 N
i 1
J bi = (d) bii = (d)
vi1 ,
i=1 i=1 i=1

where (4.5.5) was used in the second equality. Substituting in (4.5.6) and integrat-
ing over the variables v completes the proof.

4.5.2 Scaling limits at the edge of the spectrum



By Theorem 4.5.35, Corollary 2.6.3 and Theorem 2.6.6, we know that N / N,
the maximal eigenvalue of HN / N, converges
to 2 as N . It is thus natural
to consider the matrix HN = HN 2 NIN , and study its top eigenvalue. For
= 1, 2, 4, we have seen in Theorems 3.1.4 and 3.1.7 that the top eigenvalue of
N 1/6 HN converges in distribution (to the TracyWidom distributions F ). In this
section, we give an alternative derivation, valid for all , of the convergence in
distribution, although the identification of the limit does not involve the Tracy
Widom distributions.
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 307

One of the advantages of the tridiagonal representation of Theorem 4.5.35 is


that one can hope that scaling limits of tridiagonal matrices naturally relate to
(second order) differential operators. We begin by providing a heuristic argument
that allows one to guess both the correct scale and the form of the limiting oper-
ator. From the definition of variables with t degrees of freedom, such variables

are asymptotically (for large t) equivalent to t + G/ 2 where G is a standard
Gaussian random variable. Consider HN as an operator acting on column vectors
= (1 , . . . , N )T . We look for parameters , such that, if one writes n = [xN ]
and n = (x) for some nice function , the action of the top left corner of
N HN on approximates the action of a second order differential operator on .
(We consider the upper left corner because this is where the off-diagonal terms
have largest order, and one expects the top of the spectrum to be related to that
corner.) Toward this end, expand in a Taylor series up to second order, and
write n1 n N (x) + N 2 (x)/2. Using the asymptotic form of
variables mentioned above, one gets, after neglecting small error terms, that, for
< 1 and x in some compact subset of R+ ,

(N HN )(n) N +1/22 (x)


7
1  (1) 
N 2Gn + Gn + Gn1 (x) xN + 1/2 (x) ,
(2) (2)
+ (4.5.7)
2
(i)
where {Gn }, i = 1, 2, are independent sequences of i.i.d. standard Gaussian vari-
(i)
ables. It is then natural to try to represent Gn as discrete derivatives of indepen-
dent Brownian motions: thus, let Wx , W x denote standard Brownian motions and

(formally) write Gn = N /2Wx , Gn = N /2W x with the understanding that
(1) (2)

a rigorous definition will involve integration by parts. Substituting in (4.5.7) and


writing Bx = (Wx +W x )/ 2, we obtain formally

2N /2 B x
(N HN )(n) N +1/22 (x) +  (x) xN + 1/2 (x) , (4.5.8)

where (4.5.8) has to be understood after an appropriate integration by parts against
smooth test functions. To obtain a scaling limit, one then needs to take , so that
1 1 1 1
+ 2 = = + = 0 = , = .
2 2 2 3 6
In particular, we recover the TracyWidom scaling, and expect the top of the
spectrum of N 1/6 HN to behave like the top of the spectrum of the stochastic Airy
operator
d2 2
H := x +  B x . (4.5.9)
dx2
308 4. S OME GENERALITIES

The rest of this section is devoted to providing a precise definition of H , devel-


oping some of its properties, and proving the convergence of the top eigenvalues
of N 1/6 HN to the top eigenvalues of H . In doing so, the convergence of the
quadratic forms associated with N 1/6 HN toward a quadratic form associated with
H plays an important role. We thus begin by providing some analytical machin-
ery that will be useful in controlling this convergence.
On smooth functions of compact support in (0, ), introduce the bilinear non-
degenerate form
 
 f , g = f (x)g (x)dx + (1 + x) f (x)g(x)dx .
0 0

Define L as the Hilbert space obtained


 by completion with respect to the inner
product ,  (and norm  f  =  f , f  ). Because of the estimate

| f (x) f (y)| |x y| f  , (4.5.10)

elements of L are continuous functions, and vanish at the origin. Further prop-
erties of L are collected in Lemma 4.5.43 below.

Definition 4.5.39 A pair ( f , ) L R is called an eigenvectoreigenvalue pair


of H if  f 2 = 1 and, for any compactly supported infinitely differentiable func-
tion ,
 
(x) f (x)dx = [ (x) f (x) x (x) f (x)]dx
0 0
"  #
2
 (x) f (x)Bx dx + (x)Bx f (x)dx . (4.5.11)
0 0

Remark 4.5.40 Equation (4.5.11) expresses the following: ( f , ) is an


eigenvectoreigenvalue pair of H if H f = f in the sense of Schwarz dis-
tributions, where we understand f (x)B x as the Schwarz distribution that is the

derivative of the continuous function f (x)Bx 0x By f (y)dy.

Remark 4.5.41 Using the fact that f L , one can integrate by parts in (4.5.11)
and express all integrals as integrals involving only. In this way, one obtains
that ( f , ) is an eigenvectoreigenvalue pair of H if and only if, for Lebesgue
almost every x and some constant C, f (x) exists and
 x  x
f (x) = C + ( + ) f ( )d Bx f (x) + B f ( )d . (4.5.12)
0 0

Since the right side is a continuous function, we conclude that f can be taken con-
tinuous. (4.5.12) will be an important tool in deriving properties of eigenvector
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 309

eigenvalue pairs, and in particular the nonexistence of two eigenvectoreigenvalue


pairs sharing the same eigenvalue.

The main result of this section in the following.

Theorem 4.5.42 (RamirezRiderVirag) Fix > 0 and let NN > N1 N >


denote the eigenvalues of HN . For almost every Brownian path Bx , for each k 0,
the collection of eigenvalues of H possesses a well
defined k + 1st largest element
k . Further, the random vector N (N j 2 N) j=0 converges in distribution
1/6 N k

to the random vector ( j )kj=0 .

The proof of Theorem 4.5.42 will take the rest of this section. It is divided into
two main steps. We first study the operator H by associating with it a variational
problem. We prove, see Corollary 4.5.45 and Lemma 4.5.47 below, that the eigen-
values of H are discrete, that they can be obtained from this variational problem
and that the associated eigenspaces are simple. In a second step, we introduce a
discrete quadratic form associated with HN = N 1/6 HN and prove its convergence
to that associated with H , see Lemma 4.5.50. Combining these facts will then
lead to the proof of Theorem 4.5.42.
We begin with some preliminary material related to the space L .

Lemma 4.5.43 Any f L is Holder(1/2)-continuous and satisfies

x1/4 | f (x)| 2 f  , x > 1. (4.5.13)

Further, if { fn } is a bounded sequence in L then it possesses a subsequence that


converges to some f in L in the following senses: (i) fn L2 f , (ii) fn f
weakly in L2 , (iii) fn f uniformly on compacts, (iv) fn f weakly in L .

Proof The Holder continuity statement is a consequence of (4.5.10). The latter


also implies that for any function f with derivative in L2 ,
  
| f (y)| | f (x)| |y x| f 2
+

and in particular, for any x,

f 2 (x) 2 f 2  f 2 . (4.5.14)

(Indeed, fix x and consider the set Ax = {y : |y x| f 2 (x)/4 f 22 }. On Ax ,



| f (y)| | f (x)|/2. Writing  f 22 Ax f 2 (y)dy then gives (4.5.14).) Since  f 2
  2
z (1 + x) f (x)dx z z f (x)dx, applying the estimate (4.5.14) on the function
2

f (z)1zx yields (4.5.13).


310 4. S OME GENERALITIES

Points (ii) and (iv) in the statement of the lemma follow from the Banach
Alaoglu Theorem (Theorem B.8). Point (iii) follows from the uniform equi-
continuity on compacts of the sequence fn that is a consequence of the uniform

Holder estimate. Together with the uniform integrability supn x fn2 (x)dx < ,
this gives (i).

The next step is the introduction of a bilinear form on L associated with H .


Toward this end, note that if one interprets H for smooth in the sense of
Schwarz distributions, then it can be applied (as a linear functional) again on ,
yielding the quadratic form

4
 , H :=  22 +  x (x)22 +  Bx (x) (x)dx . (4.5.15)
0

We seek to extend the quadratic form in (4.5.15) to functions in L . The main


issue is the integral
 
2 Bx (x) (x)dx = Bx ( (x)2 ) dx .
0 0

Since it is not true that |Bx | < C x for all large x, in order to extend the quadratic
form in (4.5.15) to functions in L , we need to employ the fact that Bx is itself
regular in x. More precisely, define
 x+1
Bx = By dy .
x

For smooth and compactly supported, we can write Bx = Bx + (Bx Bx ) and


integrate by parts to obtain
  
Bx ( (x)2 ) dx = (Bx ) 2 (x)dx + 2 (Bx Bx ) (x) (x)dx .
0 0 0
This leads us to define
"  #
2
 , H :=  22 +  x (x)22  Qx 2 (x)dx 2 Rx (x) (x)dx ,
0 0
(4.5.16)
where
Qx = (Bx ) = Bx+1 Bx , Rx = Bx Bx . (4.5.17)
As we now show, this quadratic form extends to L .

Lemma 4.5.44 (a) For each > 0 there exists a random constant C (depending
on , and B only) such that
4 |Q | |R |
 sup x x . (4.5.18)
x C+ x
4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 311

(b) The quadratic form , H of (4.5.16) extends to a continuous symmetric bi-


linear form on L L : there exists a (random) constant C , depending on the
Brownian path B only, such that, almost surely,
1
 f 2 C  f 22  f , f H C  f 2 . (4.5.19)
2
Proof For part (a), note that
|Qx | |Rx | Z[x] + Z[x]+1 ,
where Zi = supy[0,1] |Bi+y Bi |. The random variables Zi are i.i.d. and satisfy
P(Zi > t) 4P(G > t) where G is a standard Gaussian random variable. From
this and the BorelCantelli Lemma, (4.5.18) follows.
We turn to the proof of (b). The sum of the first two terms in the definition
of  f , f H equals  f 2  f 22 . By the estimate (4.5.18) on Q with = 1/10,
the third term can be bounded in absolute value by  f 2 /10 + C1  f 22 for some

(random) constant C1 (this is achieved by upper bounding C(1 + x) by C1 +
x/10). Similarly, the last term can be controlled as

1 1 1
(C + x)| f (x)|| f (x)|dx C f  f  +  f 2  f 2 +C2  f 22 .
0 10 10 5
Combining these estimates (and the fact that  f  dominates  f 2 ) yields (4.5.19).

We can now consider the variational problem associated with the quadratic form
, H of (4.5.16).

Corollary 4.5.45 The infimum in the minimization problem


0 := inf  f , f H (4.5.20)
f L , f 2 =1

is achieved at some f L , and ( f , 0 ) is an eigenvectoreigenvalue pair for


H , with 0 = 0 .

We will shortly see in Lemma 4.5.47 that the minimizer in Corollary 4.5.45 is
unique.
Proof By the estimate (4.5.19), the infimum in (4.5.20) is finite. Let { fn }n be a
minimizing sequence, that is  fn 2 = 1 and  fn , fn H 0 . Again by (4.5.19),
there is some (random) constant K so that  fn  K for all n. Write
"  #
2
 fn , fn H =  fn 2  fn 22  Qx fn2 (x)dx 2 Rx fn (x) fn (x)dx .
0 0

Let f L be a limit point of fn (in all the senses provided by Lemma 4.5.43).
312 4. S OME GENERALITIES

Then 1 =  fn 2  f 2 and hence  f 2 = 1, while lim inf  fn   f  . Fix


> 0. Then, by (4.5.18), there is a random variable X such that
#
2 " 

 Q f (x)dx 2
2
Rx fn (x) fn (x)dx  fn  .
X x n X

The convergence of fn to f uniformly on [0, X] together with the boundedness of


 fn  then imply that

 f , f H lim inf fn , fn H + K = 0 + K .
n

Since is arbitrary, it follows from the definition of 0 that  f , f H = 0 , as


claimed.
To see that ( f , 0 ) is an eigenvectoreigenvalue pair, fix > 0 and smooth
of compact support, and set f , = ( f + )/ f + 2 (reduce if needed so
that = f / ). Then

 f , f H  f , f H
 
= 2  f , f H f (x) (x)dx + 2 ( f (x) (x) + x f (x) (x))dx
0 0
"  #
4
 Qx (x) f (x)dx Rx [ (x) f (x)] dx + O( 2 ) .
0 0

Thus, a necessary condition for f to be a minimizer is that the linear in term in


the last equality vanishes for all such smooth and compactly supported . Using
the fact that is compactly supported, one can integrate by parts the term involv-
ing Q and rewrite it in terms of Bx . Using also the fact that  f , f H = 0 , one
gets from this necessary condition that ( f , 0 ) satisfies (4.5.11).
Finally, we note that by (4.5.11) and an integration by parts, if (g, ) is an
eigenvectoreigenvalue pair then for any compactly supported smooth ,
 
(x)g(x)dx = [ (x)g(x) x (x)g(x)]dx
0 0
"  #
4
 (x)g(x)Qx dx Rx [ (x)g(x)] dx . (4.5.21)
0 0

Take a sequence {n } of smooth, compactly supported functions, so that n g in


L . Applying the same argument as in the proof of Lemma 4.5.44, one concludes
that all terms in (4.5.21) (with n replacing ) converge to their value with f
replacing . This implies that g, gH = g22 , and in particular that 0 .
Since the existence of a minimizer f to (4.5.20) was shown to imply that ( f , 0 )
is an eigenvectoreigenvalue pair, we conclude that in fact 0 = 0 .

4.5 T RIDIAGONAL MATRIX MODELS AND ENSEMBLES 313

Remark 4.5.46 The collection of scalar multiples of minimizers in Corollary


4.5.45 forms a linear subspace H0 . We show that H0 is finite dimensional: in-
deed, let { fn } denote an orthogonal (in L2 ) basis of H0 , and suppose that it is
infinite dimensional. By Lemma 4.5.44, there is a constant C such that  fn  C.
Switching to a subsequence if necessary, it follows from Lemma 4.5.43 that fn
converges to some f in L2 , with  f 2 = 1, and in fact f H0 . But on the other
hand, f is orthogonal to all fn in H0 and thus f H0 , a contradiction.

We can now repeat the construction of Corollary 4.5.45 inductively. For k ≥ 1, denoting by H_{k−1}^⊥ the ortho-complement of H_{k−1} in L², set
\[
\Lambda_k := \inf_{f \in L^*,\ \|f\|_2 = 1,\ f \in H_{k-1}^\perp} \langle f, f\rangle_H . \tag{4.5.22}
\]
Mimicking the proof of Corollary 4.5.45, one shows that the infimum in (4.5.22) is achieved at some f ∈ L^*, and (f, λ_k) is an eigenvector-eigenvalue pair for H_β, with λ_k = −Λ_k. We then denote by H_k the (finite dimensional) linear space of scalar multiples of minimizers in (4.5.22). It follows that the collection of eigenvalues of H_β is discrete and can be ordered as λ_0 > λ_1 > ⋯.
Our next goal is to show that the spaces H_k are one dimensional, i.e. that each eigenvalue is simple. This will come from the analysis of (4.5.12). We have the following.

Lemma 4.5.47 For each given λ, C, and continuous function B̄_·, the solution to (4.5.12) is unique. As a consequence, the spaces H_k are all one dimensional.

Proof Integrating by parts, we rewrite (4.5.12) as
\[
f'(x) = C + (\lambda + x)\int_0^x f(\xi)\,d\xi - \int_0^x\!\!\int_0^\xi f(\eta)\,d\eta\,d\xi
- \frac{2}{\sqrt{\beta}}\Big(\bar B_x \int_0^x f'(\xi)\,d\xi - \int_0^x \bar B_\xi f'(\xi)\,d\xi\Big) . \tag{4.5.23}
\]
By linearity, it is enough to show that solutions of (4.5.23) vanish when C = 0. But, for C = 0, one gets that for some bounded C̄(x) = C̄(λ, B̄, x), with C̄(x) increasing in x, |f'(x)| ≤ C̄(x)∫_0^x |f'(ξ)|dξ. An application of Gronwall's Lemma shows that f(x) = 0 for all positive x. To see that H_k is one dimensional, note that if f satisfies (4.5.12) with constant C, then cf satisfies the same equation with constant cC.

Another ingredient of the proof of Theorem 4.5.42 is the representation of the matrix H̃_N := N^{1/6}(H_N − 2√N I_N) as an operator on L^*. Toward this end, define (for x ∈ R_+)
\[
y_{N,1}(x) = N^{-1/6} \sum_{i=1}^{[xN^{1/3}]} H_N(i,i) , \tag{4.5.24}
\]
\[
y_{N,2}(x) = 2N^{-1/6} \sum_{i=1}^{[xN^{1/3}]} \big(\sqrt{N} - H_N(i,i+1)\big) . \tag{4.5.25}
\]

Standard estimates lead to the following.

Lemma 4.5.48 There exists a probability space supporting the processes y_{N,j}(·) and two independent Brownian motions B_{·,j}, j = 1, 2, such that, with respect to the Skorohod topology, the following convergence holds almost surely:
\[
y_{N,j}(\cdot) \to \sqrt{2/\beta}\, B_{x,j} + x^2 (j-1)/2 , \qquad j = 1, 2 .
\]
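Lemma 4.5.48 and the edge scaling of Theorem 4.5.42 are easy to probe numerically. The following minimal sketch (not part of the book; it assumes the tridiagonal sampling convention of Theorem 4.5.35, with N(0, 2/β) diagonal entries and χ_{β(N−i)}/√β off-diagonal entries) samples the rescaled top eigenvalue N^{1/6}(λ_max − 2√N), whose law should approach the β-analog of the Tracy-Widom distribution:

```python
import numpy as np
from scipy.linalg import eigvalsh_tridiagonal

def beta_hermite_edge(n, beta, rng):
    # Tridiagonal beta-Hermite model (cf. Theorem 4.5.35): diagonal entries
    # are N(0, 2/beta), off-diagonal entries are chi_{beta*(n-i)}/sqrt(beta).
    d = rng.standard_normal(n) * np.sqrt(2.0 / beta)
    e = np.sqrt(rng.chisquare(beta * np.arange(n - 1, 0, -1)) / beta)
    lam_max = eigvalsh_tridiagonal(d, e)[-1]     # largest eigenvalue
    return n ** (1 / 6) * (lam_max - 2.0 * np.sqrt(n))

rng = np.random.default_rng(0)
samples = [beta_hermite_edge(2000, beta=2, rng=rng) for _ in range(200)]
print(np.mean(samples))  # for beta = 2 this should be near E[TW_2], about -1.77
```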

In the sequel, we work in the probability space determined by Lemma 4.5.48, and write B̄_x = B_{x,1} + B_{x,2} (thus defining naturally a version of the operator H_β whose relation to the matrices H̃_N needs clarification). Toward this end, we consider the matrices H̃_N as operators acting on R^N equipped with the norm
\[
\|v\|_{N,*}^2 = N^{1/3} \sum_{i=1}^N (v(i+1) - v(i))^2 + N^{-2/3} \sum_{i=1}^N i\, v(i)^2 + N^{-1/3} \sum_{i=1}^N v(i)^2 ,
\]
where we set v(N+1) = 0. Write ⟨v, w⟩_{N,2} = N^{−1/3} Σ_{i=1}^N v(i)w(i) and let ‖·‖_{N,2} denote the associated norm on R^N. Recall the random variables Y_i appearing in the definition of the tridiagonal matrix H_N, see Theorem 4.5.35, and, motivated by the scaling in Theorem 4.5.42, introduce
\[
\eta_i = 2N^{1/6}\Big(\sqrt{N} - \frac{1}{\sqrt{\beta}}\, E Y_{N-i}\Big) , \qquad
\omega_i = 2N^{1/6}\, \frac{1}{\sqrt{\beta}}\, \big(E Y_{N-i} - Y_{N-i}\big) .
\]
It is straightforward to verify that η_i ≥ 0 and that, for some constant κ independent of N,
\[
\eta_i \le \kappa\Big(\frac{i}{N^{1/3}} + \frac{i^2}{N^{4/3}}\Big) . \tag{4.5.26}
\]
Also, with w_k^{(1)} = √(2/β) N^{−1/6} Σ_{i=1}^k ξ_i, where ξ_i := √(β/2) H_N(i,i) denotes the rescaled diagonal entry (so that y_{N,1}(x) = w^{(1)}_{[xN^{1/3}]}), and w_k^{(2)} = N^{−1/3} Σ_{i=1}^k ω_i, we have that for any ε > 0 there is a tight sequence of random variables ζ_{N,ε} satisfying
\[
\sup_{i \le k \le i + N^{1/3}} |w_k^{(j)} - w_i^{(j)}|^2 \le \varepsilon\, i N^{-1/3} + \zeta_{N,\varepsilon} . \tag{4.5.27}
\]

We now have the following analog of (4.5.19).

Lemma 4.5.49 There exists a tight sequence of random variables c_i = c_i(N), i = 1, 2, 3, so that, for all N and v,
\[
c_1 \|v\|_{N,*}^2 - c_2 \|v\|_{N,2}^2 \le -\langle v, \tilde H_N v\rangle_{N,2} \le c_3 \|v\|_{N,*}^2 .
\]

Proof Using the definitions, one gets (setting v(N+1) = 0)
\[
-\langle v, \tilde H_N v\rangle_{N,2} = N^{1/3} \sum_{i=1}^N (v(i+1)-v(i))^2 + N^{-1/3} \sum_{i=1}^N \eta_i\, v(i)v(i+1)
- \sqrt{2/\beta}\, N^{-1/6} \sum_{i=1}^N \xi_i\, v^2(i) + N^{-1/3} \sum_{i=1}^N \omega_i\, v(i)v(i+1)
=: S_1 + S_2 - S_3 + S_4 . \tag{4.5.28}
\]
One identifies S_1 with the first term in ‖v‖²_{N,*}. Next, we have
\[
\sum_{i=1}^N \eta_i\, v(i)v(i+1) \le \sqrt{\sum_{i=1}^N \eta_i\, v(i)^2}\ \sqrt{\sum_{i=1}^N \eta_i\, v(i+1)^2} ,
\]
and thus, together with the bound (4.5.26), we have that S_2 is bounded above by a constant multiple of the sum of the second and third terms in ‖v‖²_{N,*}. Similarly, we have from the bound ab ≥ −(a−b)²/3 + a²/4 that
\[
\eta_i\, v(i)v(i+1) \ge -\frac{\eta_i}{3}(v(i+1)-v(i))^2 + \frac{\eta_i}{4} v(i)^2
\ge -\frac{\eta_i}{3}(v(i+1)-v(i))^2 + \frac{1}{4}\frac{i}{N^{1/3}}\, v(i)^2 - \frac{\varepsilon}{4}\, v(i)^2 ,
\]
and using (4.5.26) again, we conclude that, for an appropriate constant c(ε),
\[
S_2 + S_1 \ge \frac{2}{3} \|v\|_{N,*}^2 - c(\varepsilon)\|v\|_2^2 . \tag{4.5.29}
\]
We turn next to S_3. Write Δw_k^{(j)} = N^{−1/3}[w_{k+N^{1/3}}^{(j)} − w_k^{(j)}], j = 1, 2. Summing by parts, we get
\[
S_3 = \sum_{i=1}^N \big(w_{i+1}^{(1)} - w_i^{(1)} - \Delta w_i^{(1)}\big) v^2(i) + \sum_{i=1}^N \Delta w_i^{(1)} v^2(i)
= -\sum_{i=1}^N N^{-1/3} \sum_{\ell=i+1}^{i+N^{1/3}} \big(w_\ell^{(1)} - w_i^{(1)}\big)\big(v^2(i+1) - v^2(i)\big) + \sum_{i=1}^N \Delta w_i^{(1)} v^2(i)
=: S_{3,1} + S_{3,2} . \tag{4.5.30}
\]

Using (4.5.27) we find that
\[
|S_{3,1}| \le \sum_{i=1}^N |v^2(i+1) - v^2(i)|\, \sqrt{\varepsilon\, i N^{-1/3} + \zeta_{N,\varepsilon}}
\le \delta N^{1/3} \sum_{i=1}^N (v(i+1)-v(i))^2 + \frac{1}{\delta} \sum_{i=1}^N \big(\varepsilon\, i N^{-2/3} + \zeta_{N,\varepsilon} N^{-1/3}\big) v^2(i)
\le \Big(\delta + \frac{\varepsilon}{\delta}\Big) \|v\|_{N,*}^2 + \frac{\zeta_{N,\varepsilon}}{\delta}\, \|v\|_2^2 .
\]
Applying (4.5.27) again to estimate S_{3,2}, we conclude that
\[
|S_3| \le (\varepsilon + \delta) \|v\|_{N,*}^2 + \Big(\frac{1}{\delta} + 1\Big)\zeta_{N,\varepsilon}\, \|v\|_2^2 .
\]
A similar argument applies to S_4. Choosing ε and δ small and combining with the estimate (4.5.29) then concludes the proof of the lemma.

Because the family of random variables in Lemma 4.5.49 is tight, any subsequence {N_k} possesses a further subsequence {N_{k_i}} so that the estimates there hold with fixed random variables c_i (now independent of N). To prove Theorem 4.5.42, it is enough to consider such a subsequence. With some abuse of notation, we continue to write N instead of N_{k_i}.
Each vector v ∈ R^N can be identified with a piecewise constant function f_v by the formula f_v(x) = v(⌈N^{1/3} x⌉) for x ∈ [0, N^{2/3}] and f_v(x) = 0 for all other x. The collection of such functions (for a fixed N) forms a closed linear subspace of L² := L²(R_+), denoted L^{2,N}, and H̃_N acts naturally on L^{2,N}. Let P_N denote the projection from L² to L^{2,N} ⊂ L². Then H̃_N extends naturally to an operator on L² by the formula H̃_N f = H̃_N P_N f. The relation between the operators H̃_N and H_β is clarified in the following lemma.

Lemma 4.5.50 (a) Let f_N ∈ L^{2,N} and suppose f_N → f weakly in L², so that
\[
N^{1/3}\big(f_N(x + N^{-1/3}) - f_N(x)\big) \to f'(x) \quad \text{weakly in } L^2 .
\]
Then, for any compactly supported φ,
\[
\langle \varphi, \tilde H_N f_N\rangle_2 \to -\langle \varphi, f\rangle_H . \tag{4.5.31}
\]
(b) Let f_N ∈ L^{2,N} with ‖f_N‖_{N,*} ≤ c and ‖f_N‖_2 = 1. Then there exist an f ∈ L^* and a subsequence N' so that f_{N'} → f in L² and, for all smooth, compactly supported φ, one has
\[
\langle \varphi, \tilde H_{N'} f_{N'}\rangle_2 \to -\langle \varphi, f\rangle_H .
\]
Proof The first part is an exercise in summation by parts that we omit. To see the second part, pick a subsequence such that both f_N and N^{1/3}(f_N(x + N^{−1/3}) − f_N(x)) converge weakly in L² to a limit (f, g), with f(x) = ∫_0^x g(s)ds (this is possible because ‖f_N‖_{N,*} < ∞). An application of the first part of the lemma then completes the proof.

We have now put in place all the analytic machinery needed to conclude.

Proof of Theorem 4.5.42 Write Λ̄_{N,k} = N^{1/6}(λ_{N−k+1}(H_N) − 2√N). Then Λ̄_{N,k} is the kth top eigenvalue of H̃_N. Let v_{N,k} denote the associated eigenvector, normalized so that ‖f_{v_{N,k}}‖_2 = 1. We first claim that λ̄_k := lim sup_N Λ̄_{N,k} ≤ λ_k. Indeed, if λ̄_k > −∞, one can find a subsequence, that we continue to denote by N, so that (Λ̄_{N,1}, …, Λ̄_{N,k}) → (λ̄_1, …, λ̄_k). By Lemma 4.5.49, for j = 1, …, k, the ‖v_{N,j}‖_{N,*} are uniformly bounded, and hence, on a further subsequence, the f_{v_{N,j}} converge in L² to a limit f_j, j = 1, …, k, and the f_j are eigenvectors of H_β with eigenvalue at least λ̄_k. Since the f_j are orthogonal in L² and the spaces H_j are one dimensional, it follows that λ̄_k ≤ λ_k.
To see the reverse implication, which will complete the proof, we use an inductive argument. Suppose that Λ̄_{N,j} → λ_j and f_{v_{N,j}} → f_j in L² for j = 1, …, k−1, where (f_j, λ_j) is the jth eigenvector-eigenvalue pair for H_β. Let (f_k, λ_k) be the kth eigenvector-eigenvalue pair for H_β. Let f̂_k be smooth and of compact support, so that ‖f̂_k − f_k‖_* ≤ ε, and set
\[
f_{N,k} = P_N \hat f_k - \sum_{j=1}^{k-1} \langle v_{N,j}, P_N \hat f_k\rangle\, v_{N,j} .
\]
Since ‖v_{N,j}‖_{N,*} < c for some fixed c by Lemma 4.5.49, and ⟨P_N f̂_k, f_{v_{N,j}}⟩_2 is bounded by 2ε for N large, it follows that ‖f_{N,k} − P_N f̂_k‖_{N,*} < c′ε for some (random) constant c′. Using Lemma 4.5.49 again, we get that
\[
\liminf_{N} \bar\Lambda_{N,k} \ge \liminf_{N} \frac{\langle f_{N,k}, \tilde H_N f_{N,k}\rangle}{\langle f_{N,k}, f_{N,k}\rangle}
= \liminf_{N} \frac{\langle P_N \hat f_k, \tilde H_N P_N \hat f_k\rangle}{\langle P_N \hat f_k, P_N \hat f_k\rangle} + s(\varepsilon) , \tag{4.5.32}
\]
where s(ε) →_{ε→0} 0. Applying (4.5.31), we have that
\[
\lim_{N} \langle P_N \hat f_k, \tilde H_N P_N \hat f_k\rangle = -\langle \hat f_k, \hat f_k\rangle_H .
\]
Substituting in (4.5.32), we get that
\[
\liminf_{N} \bar\Lambda_{N,k} \ge -\frac{\langle \hat f_k, \hat f_k\rangle_H}{\|\hat f_k\|_2^2} + s'(\varepsilon) ,
\]
where again s′(ε) →_{ε→0} 0. This implies, after taking ε → 0, that
\[
\liminf_{N} \bar\Lambda_{N,k} \ge \lambda_k .
\]
The convergence f_{v_{N,k}} → f_k follows from point (b) of Lemma 4.5.50.


4.6 Bibliographical notes

The background material on manifolds that we used in Section 4.1 can be found
in [Mil97] and [Ada69]. The Weyl formula (Theorem 4.1.28) can be found in
[Wey39]. A general version of the coarea formula, Theorem 4.1.8, is due to Fed-
erer and can be found in [Fed69], see also [Sim83] and [EvG92] for less intimi-
dating descriptions.
The physical motivation for studying different ensembles of random matrices
is discussed in [Dys62e]. We note that the Laguerre and Jacobi ensembles oc-
cur also through statistical applications (the latter under the name MANOVA, or
multivariate analysis of variance), see [Mui81].
Our treatment of the derivation of joint distributions of eigenvalues was influ-
enced by [Due04] (the latter relies directly on Weyl's formula) and [Mat97]. The
book [For05] is an excellent recent reference on the derivation of joint distribu-
tions of eigenvalues of random matrices belonging to various ensembles; see also
[Meh91] and the more recent [Zir96]. Note, however, that the circular ensembles
COE and CSE do not correspond to random matrices drawn uniformly from the
unitary ensembles as in Proposition 4.1.6. A representation theoretic approach to
the study of the latter that also gives central limit theorems for moments is pre-
sented in [DiS94] and further developed in [DiE01]. The observation contained
in Remark 4.1.7 is motivated by the discussion in [KaS99]. For more on the root
systems mentioned in Remark 4.1.5 and their link to the Weyl integration formula,
see [Bou05, Chapter 9, Section 2].
The theory of point processes and the concept of Palm measures apply to much
more general situations than we have addressed in Section 4.2. A good treatment
of the theory is contained in [DaVJ88]. Our exposition builds on [Kal02, Chapter
11].
Point processes x0 on R whose associated difference sequences y0 (see Lemma
4.2.42) are stationary with marginals of finite mean are called cyclo-stationary. It
is a general fact, see [Kal02, Theorem 11.4], that all cyclo-stationary processes
are in one-to-one correspondence with nonzero stationary simple point processes
of finite intensity via the Palm recipe.
Determinantal point processes were studied in [Mac75], see also the survey
[Sos00]. The representation of Proposition 4.2.20, as well as the observation that
it leads to a simple proof of Corollary 4.2.21 and of the CLT of Corollary 4.2.23
(originally proved in [Sos02a]), is due to [HoKPV06]. See also [HoKPV09]. The
Janossy densities of Definition 4.2.7 for determinantal processes were studied in


[BoS03], see [Sos03] for the Pfaffian analog.
The argument in the proof of Proposition 4.2.30 was suggested to us by T.
Suidan. Lemma 4.2.50 appears in [Bor99]. Lemma 4.2.52 is taken from [GeV85].
A version valid for continuous time processes was proved earlier in [KaM59]. The
relation between non-intersecting random walks, Brownian motions and queueing
systems was developed in [OcY01], [OcY02], [KoOR02] and [Oco03]. There is
a bijection between paths conditioned not to intersect and certain tiling problems,
see [Joh02], [Kra90] and references therein; thus, certain tiling problems are re-
lated to determinantal processes. The relation with spanning trees in graphs is
described in [BuP93]. Finally, two-dimensional determinantal processes appear
naturally in the study of zeroes of random analytic functions, as was discovered
in [PeV05], see [HoKPV09].
The description of eigenvalues of the GUE as a diffusion process, that is, Theo-
rem 4.3.2, was first stated by Dyson [Dys62a]. McKean [McK05, p.123] consid-
ered the symmetric Brownian motion and related its eigenvalues to Dyson's Brow-
nian motion. A more general framework is developed in [NoRW86] in the context
of Brownian motions of ellipsoids. The relation between paths conditioned not to
intersect and the Dyson process is studied in [BiBO05] and [DoO05]. The ideas
behind Lemma 4.3.6 come from [Sni02]. A version of Lemma 4.3.10 can be found
in [RoS93]. When β = 1, 2, μ_t in that lemma is the asymptotic limit of the spectral measure of X^{N,β}(0) + H^{N,β}(t). It is a special case of free convolution (of the law μ and the semicircle law with variance t) that we shall describe in Chapter 5. A
refined study of the analytic properties of free convolution with a semicircle law
that greatly expands on the results in Lemma 4.3.15 appears in [Bia97b].
The properly rescaled process of eigenvalues converges weakly to the sine pro-
cess (in the bulk) and the Airy process (at the edge), see [TrW03], [Adl05] and
[AdvM05]. The Airy process also appears as the limit of various combinatorial
problems. For details, see [PrS02] or [Joh05]. Other processes occur in the study
of rescaled versions of the eigenvalue processes of other random matrices. In par-
ticular, the Laguerre process arises as the scaling limit of the low-lying eigenvalues
of Wishart matrices, see [Bru91], [KoO01] and [Dem07], and has the interpreta-
tion of Bessel processes conditioned not to intersect.
The use of stochastic calculus as in Theorem 4.3.20 to prove central limit theo-
rems in the context of Gaussian random matrices was introduced in [Cab01]. This
approach extends to the study of the fluctuations of words of two (or more) inde-
pendent Wigner matrices, see [Gui02] who considered central limit theorems for
words of a Gaussian band matrix and deterministic diagonal matrices.
Proposition 4.3.23 is due to [CaG01]. It was completed into a full large devi-
ation principle in [GuZ02] and [GZ04]. By the contraction principle (Theorem
D.7), it implies also the large deviations principle for LN (1), and in particular for
the empirical measure of eigenvalues for the sum of a Gaussian Wigner matrix XN
and a deterministic matrix AN whose empirical measure of eigenvalues converges
and satisfies (4.3.23). For AN = 0, this recovers the results of Theorem 2.6.1 in
the Gaussian case.
As pointed out in [GuZ02] (see also [Mat94]), the large deviations for the empirical measure of the eigenvalues of A_N + X_N are closely related to the Itzykson-Zuber-Harish-Chandra integral, also called the spherical integral, given by
\[
I_N^{(\beta)}(A, D) = \int e^{\frac{N\beta}{2}\, \mathrm{tr}(U D U^* A)}\, dm_N^{(\beta)}(U) ,
\]
where the integral is with respect to the Haar measure m_N^{(β)} on the orthogonal group (when β = 1) and the unitary group (when β = 2). This integral appeared first in the work of Harish-Chandra [Har56] who proved that, when β = 2,
\[
I_N^{(2)}(A, D) = \frac{\det\big((e^{N d_i a_j})_{1 \le i, j \le N}\big)}{\prod_{i<j}(a_i - a_j)\, \prod_{i<j}(d_i - d_j)} ,
\]
where (d_i)_{1≤i≤N} (resp. (a_i)_{1≤i≤N}) denote the eigenvalues of D (resp. A). Itzykson and Zuber [ItZ80] rederived this result, proved it using the heat equation, and gave some properties of I_N^{(2)}(A, D) as N goes to infinity. The integral I_N^{(2)}(A, D) is also related to Schur functions, see [GuM05].
Concentration inequalities have a long history; we refer to [Led01] for a modern and concise introduction. Theorem 4.4.13 is taken from [GuZ00], where analogous bounds are derived, via Talagrand's method [Tal96], for the case in which the entries of the matrix X_N are bounded uniformly by c/√N for some constant c. Under boundedness assumptions, concentration inequalities for the s-largest eigenvalue are derived in [AlKV02]. The proof of Klein's Lemma 4.4.12 follows [Rue69, Page 26].
In [GuZ00] it is explained how Theorems 2.3.5 and 4.4.4 allow one to obtain concentration results for the empirical measure, with respect to the Wasserstein distance
\[
d(\mu, \nu) = \sup_{f :\, \|f\|_\infty \le 1,\ \|f\|_L \le 1} \Big| \int f\, d\mu - \int f\, d\nu \Big| , \qquad \mu, \nu \in M_1(\mathbb{R}) ,
\]
where ‖f‖_L denotes the Lipschitz constant of f. (d(μ, ν) is also called the Monge-Kantorovich-Rubinstein distance, see the historical comments in [Dud89, pp. 341-342].)
Concentration inequalities for the Lebesgue measure on a compact connected Riemannian manifold were first obtained, in the case of the sphere, in [Lev22], and then generalized to arbitrary compact connected Riemannian manifolds of dimension n with Ricci curvature bounded below by (n−1)R² for some R > 0 in [GrMS86, p. 128]. Our approach in Section 4.4.2 follows Bakry and Emery [BaE85], who introduced the criterion that carries their names. The ergodicity of P_t invoked in the course of proving Theorem 4.4.18, see (4.4.21), does not depend on the BE criterion and holds in greater generality, as a consequence of the fact that the Dirichlet form vanishes only on the constants, see [Bak94]. In much of our treatment, we follow [AnBC+00, Ch. 5], [GuZ03, Ch. 4] and [Roy07], which we recommend for more details and other applications.
Concentration inequalities for the empirical measure and largest eigenvalue of
Hermitian matrices with stable entries are derived in [HoX08].
The first derivation of tridiagonal matrix models for the β-Hermite and Laguerre ensembles is due to [DuE02]. These authors used the models to derive CLT results for linear statistics [DuE06]. In our derivation, we borrowed some tools from [Par80, Ch. 7]. Soon after, other three- and five-diagonal models for the β-Jacobi and circular ensembles were devised in [KiN04], explicitly linking to the theory of orthogonal polynomials on the unit circle and the canonical matrix form of unitary matrices introduced in [CaMV03]. The book [Sim05a] and the survey [Sim07] contain much information on the relations between the coefficients in the three-term recursions for orthogonal polynomials on the unit circle with respect to a given measure (the Verblunsky coefficients) and the CMV matrices of [CaMV03]. In this language, the key observation of [KiN04] is that the Verblunsky coefficients corresponding to Haar-distributed unitaries are independent. See also [FoR06], [KiN07] and [BoNR08] for further developments in this direction.
The derivation in Section 4.5.2 of the asymptotics of the eigenvalues of the β-ensembles at the edge is due to [RaRV06], who followed a conjecture of Edelman and Sutton [EdS07]. (In [RaRV06], tail estimates on the top eigenvalue are deduced from the diffusion representation.) The results in [RaRV06] are more general than we have exposed here, in that they apply to a large class of tridiagonal matrices, as long as properly rescaled coefficients converge to Brownian motion. Analogous results for the hard edge (as in the case of the bottom eigenvalue of Wishart matrices) are described in [RaR08]. A major challenge is to identify the Tracy-Widom distributions (and their β-analogs) from the diffusion in Theorem 4.5.42. The description of the process of eigenvalues in the bulk involves a different machinery, see [VaV07] (where it is called the Brownian carousel) and [KiS09].
5
Free probability

Citing D. Voiculescu: "Around 1982, I realized that the right way to look at certain operator algebra problems was by imitating some basic probability theory. More precisely, in noncommutative probability theory a new kind of independence can be defined by replacing tensor products with free products and this can help understand the von Neumann algebras of free groups. The subject has evolved into a kind of parallel to basic probability theory, which should be called free probability theory."

Thus, Voiculescu's first motivation to introduce free probability was the analysis of the von Neumann algebras of free groups. One of his central observations was that such groups can be equipped with tracial states (also called traces), which resemble expectations in classical probability, whereas the property of freeness, once properly stated, can be seen as a notion similar to independence in classical probability. This led him to the statement

free probability theory = noncommutative probability theory + free independence.

These two components are the basis for a probability theory for noncommuta-
tive variables where many concepts taken from probability theory such as the no-
tions of laws, convergence in law, independence, central limit theorem, Brownian
motion, entropy and more can be naturally defined. For instance, the law of one
self-adjoint variable is simply given by the traces of its powers (which generalizes
the definition through moments of compactly supported probability measures on
the real line), and the joint law of several self-adjoint noncommutative variables
is defined by the collection of traces of words in these variables. Similarly to the
classical notion of independence, freeness is defined by certain relations between
traces of words. Convergence in law just means that the trace of any word in the
noncommutative variables converges towards the right limit.

This chapter is devoted to free probability theory and some of its consequences
for the study of random matrices.

5.1 Introduction and main results

The key relation between free probability and random matrices was discovered
by Voiculescu in 1991 when he proved that the trace of any word in independent
Wigner matrices converges toward the trace of the corresponding word in free
semicircular variables. Roughly speaking, he proved the following (see Theorem
5.4.2 for a complete statement).

Theorem 5.1.1 Let (Ω, B, P) be a probability space and N, p be positive integers. Let X_i^N : Ω → H_N^{(β)}, 1 ≤ i ≤ p, be a family of independent Gaussian Wigner matrices following the (rescaled) GOE or GUE. Then, for any integer k ≥ 1 and i_1, …, i_k ∈ {1, …, p}, N^{−1} tr(X_{i_1}^N ⋯ X_{i_k}^N) converges almost surely (and in expectation) as N → ∞ to a limit denoted σ^{(p)}(X_{i_1} ⋯ X_{i_k}). σ^{(p)} is a linear form on noncommutative polynomial functions which is called the law of p free semicircular variables.
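For free standard semicircular variables one computes, using freeness, σ(s_1 s_2 s_1 s_2) = 0 while σ(s_1^2 s_2^2) = σ(s_1^2)σ(s_2^2) = 1. A minimal numerical sketch of Theorem 5.1.1 (not from the book; the GUE normalization is chosen here so that σ(s^2) = 1) compares these values with traces of words in two independent GUE matrices:

```python
import numpy as np

def gue(n, rng):
    # GUE normalized so that the spectrum lies asymptotically in [-2, 2]
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / (2 * np.sqrt(n))

rng = np.random.default_rng(0)
n, trials = 400, 10
m_alt, m_sq = [], []
for _ in range(trials):
    x1, x2 = gue(n, rng), gue(n, rng)
    m_alt.append(np.trace(x1 @ x2 @ x1 @ x2).real / n)
    m_sq.append(np.trace(x1 @ x1 @ x2 @ x2).real / n)
print(np.mean(m_alt))  # -> sigma(s1 s2 s1 s2) = 0 for free semicirculars
print(np.mean(m_sq))   # -> sigma(s1^2 s2^2) = 1
```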

Laws of free variables are defined in Definition 5.3.1. These are noncommutative
laws which are defined uniquely in terms of the laws of their variables, that is,
in terms of their one-variable marginal distributions. In Theorem 5.1.1 all the
one-variable marginals are the same, namely, the semicircle law. The statement
of Theorem 5.1.1 extends to Hermitian or real symmetric Wigner matrices whose
entries have finite moments, see Theorem 5.4.2. Another extension deals with
words that include also deterministic matrices whose law converges, as in the
following.

Theorem 5.1.2 Let β = 1 or 2 and let (Ω, B, P) be a probability space. Let D^N = {D_i^N}_{1≤i≤p} be a sequence of Hermitian deterministic matrices with uniformly bounded spectral radius, and let X^N = {X_i^N}_{1≤i≤p}, X_i^N : Ω → H_N^{(β)}, 1 ≤ i ≤ p, be self-adjoint independent Wigner matrices whose entries have zero mean and finite moments of all order. Assume that, for any positive integer k and i_1, …, i_k ∈ {1, …, p}, N^{−1} tr(D_{i_1}^N ⋯ D_{i_k}^N) converges to some number μ(D_{i_1} ⋯ D_{i_k}). Then, for any positive integer ℓ and polynomial functions (Q_i, P_i)_{1≤i≤ℓ},
\[
\frac{1}{N}\, \mathrm{tr}\big( Q_1(D^N) P_1(X^N) Q_2(D^N) \cdots P_\ell(X^N) \big)
\]
converges almost surely and in expectation to a limit denoted
\[
\tau\big( Q_1(D) P_1(X) Q_2(D) \cdots P_\ell(X) \big) .
\]
Here, τ is the law of p free semicircular variables X, free from the collection of noncommutative variables D of law μ.

(See Theorem 5.4.5 for the full statement and the proof.)
Theorems 5.1.1 and 5.1.2 are extremely useful in the study of random matrices.
Indeed, many classical models of random matrices can be written as some polyno-
mials in Wigner matrices and deterministic matrices. This is the case for Wishart
matrices or, more generally, for band matrices (see Exercises 5.4.14 and 5.4.16).
The law of free variables appears also when one considers random matrices fol-
lowing Haar measure on the unitary group. The following summarizes Theorem
5.4.10.

Theorem 5.1.3 Take D^N = {D_i^N}_{1≤i≤p} as in Theorem 5.1.2. Let U^N = {U_i^N}_{1≤i≤p} be a collection of independent Haar-distributed unitary matrices, independent of {D_i^N}_{1≤i≤p}, and set (U^N)* = {(U_i^N)*}_{1≤i≤p}. Then, for any positive integer ℓ and any polynomial functions (Q_i, P_i)_{1≤i≤ℓ},
\[
\lim_{N\to\infty} \frac{1}{N}\, \mathrm{tr}\big( Q_1(D^N) P_1(U^N, (U^N)^*) Q_2(D^N) \cdots P_\ell(U^N, (U^N)^*) \big)
= \tau\big( Q_1(D) P_1(U, U^*) Q_2(D) \cdots P_\ell(U, U^*) \big) \quad a.s.,
\]
where τ is the law of p free variables U = (U_1, …, U_p), free from the noncommutative variables D of law μ. The law of U_i, 1 ≤ i ≤ p, is such that
\[
\tau\big((U_i U_i^* - 1)^2\big) = 0 , \qquad \tau(U_i^n) = \tau((U_i^*)^n) = 1_{n=0} .
\]
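As a hedged numerical illustration (not from the book), one can sample a Haar unitary by QR decomposition of a complex Ginibre matrix, fixing the phases of the diagonal of R, and check a consequence of Theorem 5.1.3: for U Haar distributed and D deterministic with τ(D) = 0, N^{−1} tr(U D U* D) → τ(D)² = 0 as N → ∞:

```python
import numpy as np

def haar_unitary(n, rng):
    # QR of a complex Ginibre matrix; multiplying the columns of Q by the
    # phases of R's diagonal makes the distribution exactly Haar.
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(1)
n = 500
d = np.diag(np.linspace(-1.0, 1.0, n))            # deterministic D with tau(D) = 0
u = haar_unitary(n, rng)
print(np.trace(u @ d @ u.conj().T @ d).real / n)  # -> tau(D)^2 = 0 as N grows
print(np.trace(d @ d).real / n)                   # tau(D^2) ~ 1/3, for comparison
```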

Thus, free probability appears as the natural setting to study the asymptotics of
traces of words in several (possibly random) matrices.
Adopting the point of view that traces of words in several matrices are funda-
mental objects is fruitful because it leads to the study of some general structure
such as freeness (see Section 5.3); freeness in turn simplifies the analysis of con-
vergence of moments. The drawback is that one needs to consider more general
objects than empirical measures of eigenvalues converging towards a probabil-
ity measure, namely, traces of noncommutative polynomials in random matrices
converging towards a linear functional on such polynomials, called a tracial state.
Analysis of such objects is then achieved using free probability tools.
In the first part of this chapter, Section 5.2, we introduce the setup of free prob-
ability theory (the few required notions from the theory of operator algebras are
contained in Appendix G). We then define in Section 5.3 the property of freeness
and discuss free cumulants and free convolutions. In Section 5.4, which can be
read independently of the previous ones except for the description of the limit-
ing quantities in terms of free variables, we show that the asymptotics of many
classical models of random matrices satisfy the freeness property, and use that
observation to evaluate limiting laws. Finally, Section 5.5 uses free probability
tools to describe the behavior of spectral norms of noncommutative polynomials
in independent random matrices taken from the GUE.

5.2 Noncommutative laws and noncommutative probability spaces

In this section, we introduce the notions of noncommutative laws and noncommu-


tative probability spaces. An example that the reader should keep in mind con-
cerns N N matrices (M1 , . . . , M p ); a natural noncommutative probability space
is then the algebra of N N matrices, equipped with the normalized trace N 1 tr,
whereas the law (or empirical distribution) of (M1 , . . . , M p ) is given by the collec-
tion of the normalized traces of all words in these matrices.

5.2.1 Algebraic noncommutative probability spaces and laws

Basic algebraic notions are recalled in Appendix G.1.

Definition 5.2.1 A noncommutative probability space is a pair (A, φ) where A is a unital algebra over C and φ is a linear functional φ : A → C so that φ(1) = 1. Elements a ∈ A are called noncommutative random variables.

Let us give some relevant examples of noncommutative probability spaces.

Example 5.2.2
(i) Classical probability theory Let (X, B, μ) be a probability space and set A = L^∞(X, B, μ). Take φ to be the expectation φ(a) = ∫_X a(x) μ(dx). Note that, for any p < ∞, the spaces L^p(X, B, μ) are not algebras for the usual product. (But the intersection ∩_{1≤p<∞} L^p(X, B, μ) is again an algebra.) To consider unbounded variables, we will introduce later the notion of affiliated operators, see Subsection 5.2.3.
(ii) Discrete groups Let G be a discrete group with identity e and let A = C(G) denote the group algebra (see Definition G.1). Take φ to be the linear functional on A so that, for all g ∈ G, φ(g) = 1_{g=e}.
(iii) Matrices Let N be a positive integer and A = Mat_N(C). Let ⟨·,·⟩ denote the scalar product on C^N and fix v ∈ C^N such that ⟨v, v⟩ = 1. We can take φ on A to be given by φ^v(a) = ⟨av, v⟩, or by φ_N(a) = N^{−1} tr(a).
(iv) Random matrices Let (X, B, μ) be a probability space. Define A = L^∞(X, μ, Mat_N(C)), the space of N×N-dimensional complex random matrices with μ-almost surely uniformly bounded entries. Set
\[
\varphi_N(a) = \int_X \frac{1}{N}\, \mathrm{tr}(a(x))\, \mu(dx) = \int \frac{1}{N} \sum_{i=1}^N \langle a(x) e_i, e_i\rangle\, \mu(dx) , \tag{5.2.1}
\]
where here the e_i are the standard basis vectors in C^N. Alternatively, one can consider, with v ∈ C^N so that ⟨v, v⟩ = 1,
\[
\varphi^v(a) = \int_X \langle a(x) v, v\rangle\, \mu(dx) . \tag{5.2.2}
\]
(v) Bounded operators on a Hilbert space Let H be a Hilbert space with inner product ⟨·,·⟩ and B(H) be the set of bounded linear operators on H. We set, for v ∈ H so that ⟨v, v⟩ = 1 and a ∈ B(H), φ^v(a) = ⟨av, v⟩. The GNS construction discussed below will show that this example is in a certain sense universal. It is therefore a particularly important example to keep in mind.

We now describe the notion of laws of noncommutative variables. Hereafter, J denotes a subset of N, and C⟨X_i | i ∈ J⟩ denotes the set of polynomials in noncommutative indeterminates {X_i}_{i∈J}, that is, the set of all finite C-linear combinations of words in the variables X_i, with the empty word identified with 1 ∈ C; in symbols,
\[
\mathbb{C}\langle X_i \mid i \in J\rangle = \Big\{ \alpha_0 + \sum_{k=1}^m \alpha_k X_{i_1^k} \cdots X_{i_{p_k}^k} :\ \alpha_k \in \mathbb{C},\ m \in \mathbb{N},\ i_j^k \in J \Big\} .
\]
C[X] = C⟨X⟩ denotes the set of polynomial functions in one variable.

Definition 5.2.3 Let {a_i}_{i∈J} be a family of elements in a noncommutative probability space (A, φ). Then the distribution (or law) of {a_i}_{i∈J} is the map μ_{{a_i}_{i∈J}} : C⟨X_i | i ∈ J⟩ → C such that
\[
\mu_{\{a_i\}_{i\in J}}(P) = \varphi\big(P(\{a_i\}_{i\in J})\big) .
\]

This definition is reminiscent of the description of compactly supported probability measures (on a collection of random variables) by means of their (mixed) moments. Since linear functionals on C⟨X_i | i ∈ J⟩ are uniquely determined by their values on words X_{i_1} ⋯ X_{i_k}, (i_1, …, i_k) ∈ J^k, we can and often do think of laws as word-indexed families of complex numbers.

Example 5.2.4 Example 5.2.2 continued.
(i) Classical probability theory If a ∈ L^∞(X, B, μ), we get by definition that
\[
\mu_a(P) = \int P(a(x))\, d\mu(x) ,
\]
and so μ_a is (the sequence of moments of) the law of a under μ (or equivalently the push-forward a#μ of μ by a).
(ii) Discrete groups Let G be a group with identity e and take φ(g) = 1_{g=e}. Fix {g_i}_{1≤i≤n} ∈ G^n. The law μ = μ_{{g_i}_{1≤i≤n}} has then the following description: for any monomial P = X_{i_1} X_{i_2} ⋯ X_{i_k}, we have μ(P) = 1 if g_{i_1} ⋯ g_{i_k} = e and μ(P) = 0 otherwise.
(iii) One matrix Let a be an N×N Hermitian matrix with eigenvalues (λ_1, …, λ_N). Then we have, for all polynomials P ∈ C[X],
\[
\mu_a(P) = \frac{1}{N}\, \mathrm{tr}(P(a)) = \frac{1}{N} \sum_{i=1}^N P(\lambda_i) .
\]
Thus, μ_a is (the sequence of moments of) the spectral measure of a, and thus (in effect) a probability measure on R.
(iv) One random matrix In the setting of part (iv) of Example 5.2.2, if a : X → H_N^{(β)}, for β = 1 or 2, has eigenvalues (λ_1(x), …, λ_N(x))_{x∈X}, we have
\[
\varphi_N(P(a)) = \int_X \frac{1}{N}\, \mathrm{tr}(P(a)(x))\, \mu(dx) = \int \frac{1}{N} \sum_{i=1}^N P(\lambda_i(x))\, \mu(dx) = E\langle L_N, P\rangle . \tag{5.2.3}
\]
Thus, μ_a is (the sequence of moments of) the mean spectral measure of a.
(v) Several matrices (Setting of Example 5.2.2, parts (iii) and (iv).) If we are given {a_i}_{i∈J} ⊂ Mat_N(C) so that a_i = a_i^* for all i ∈ J, then, for P ∈ C⟨X_i | i ∈ J⟩,
\[
\mu_{\{a_i\}_{i\in J}}(P) := N^{-1}\, \mathrm{tr}\big(P(\{a_i\}_{i\in J})\big)
\]
defines a distribution of noncommutative variables. μ_{{a_i}_{i∈J}} is called the empirical distribution or law of the matrices {a_i}_{i∈J} (see the numerical sketch following this example). Note that if J = {1} and a_1 is self-adjoint, μ_{a_1} can be identified, by the previous example, with the empirical distribution of the eigenvalues of a_1. Observe that if the {a_i}_{i∈J} are random then, with the notation of Example 5.2.2, part (iv), we may define their quenched empirical distribution μ_{{a_i(x)}_{i∈J}} for almost all x, or their annealed empirical distribution ∫ μ_{{a_i(x)}_{i∈J}} dμ(x).
(vi) Bounded operators on a Hilbert space Let H be a Hilbert space and T a bounded normal linear operator on H with spectrum sp(T) (see Appendix G, and in particular Section G.1, for definitions). According to the spectral theorem, Theorem G.6, if E is the spectral resolution of T, then for any polynomial function P ∈ C[X],
\[
P(T) = \int_{\mathrm{sp}(T)} P(\lambda)\, dE(\lambda) .
\]
Therefore, with v ∈ H so that ⟨v, v⟩ = 1, we find that
\[
\varphi^v(P(T)) = \langle P(T) v, v\rangle = \int_{\mathrm{sp}(T)} P(\lambda)\, d\langle E(\lambda) v, v\rangle .
\]
Hence, the law of T ∈ (B(H), φ^v) is (the sequence of moments of) the compactly supported complex measure d⟨E(λ)v, v⟩.
(vii) Tautological example Let A = C⟨X_i | i ∈ J⟩ and let φ : A → C be any linear functional such that φ(1) = 1. Then (A, φ) is a noncommutative probability space and φ is identically equal to the law μ_{{X_i}_{i∈J}}.
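A small sketch (not from the book) of the empirical distribution of item (v) for two Hermitian matrices, viewing the law as a word-indexed family of numbers:

```python
import numpy as np

# Empirical law of several Hermitian matrices:
# mu(X_{i_1}...X_{i_k}) = tr(a_{i_1}...a_{i_k}) / N.
def empirical_law(word, matrices):
    n = matrices[0].shape[0]
    prod = np.eye(n, dtype=complex)
    for i in word:            # word is a tuple of indices, e.g. (0, 1, 0, 1)
        prod = prod @ matrices[i]
    return np.trace(prod).real / n

rng = np.random.default_rng(2)
n = 3
a = rng.standard_normal((n, n)); a = (a + a.T) / 2   # real symmetric = Hermitian
b = rng.standard_normal((n, n)); b = (b + b.T) / 2
print(empirical_law((0, 1, 0, 1), [a, b]))  # mu(X_1 X_2 X_1 X_2)
print(empirical_law((0, 0), [a, b]))        # mu(X_1^2): mean squared eigenvalue of a
```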

It is convenient to have a notion of convergence of laws. It is easiest to work


with the weak*-topology. This leads us to the following definition.

Definition 5.2.5 Let (A_N, φ_N), N ∈ N ∪ {∞}, be noncommutative probability spaces, and let {a_i^N}_{i∈J} be a sequence of elements of A_N. Then {a_i^N}_{i∈J} converges in law to {a_i^∞}_{i∈J} if and only if, for all P ∈ C⟨X_i | i ∈ J⟩,
\[
\lim_{N\to\infty} \mu_{\{a_i^N\}_{i\in J}}(P) = \mu_{\{a_i^\infty\}_{i\in J}}(P) .
\]
We also say in such a situation that {a_i^N}_{i∈J} converges in moments to {a_i^∞}_{i∈J}.

Since a law is uniquely determined by its values on monomials in the noncommutative variables X_i, the notion of convergence introduced here is the same as word-wise convergence.
The tautological example mentioned in Example 5.2.4 underscores the point that the notion of law is purely algebraic and for that reason too broad to capture any flavor of analysis. We have to enrich the structure of a noncommutative probability space in various ways in order to put the analysis back. To begin to see what sort of additional structure would be useful, consider the case in which J is reduced to a single element. Then a law is simply a linear functional μ : C[X] → C such that μ(1) = 1, or equivalently a sequence of complex numbers μ_n = μ(X^n) indexed by positive integers n. Consider the following question.
Does there exist a probability measure μ on the real line such that μ(P) = ∫ P(x) μ(dx) for all P ∈ C[X]?

This is a reformulation in the present setup of the Hamburger moment problem. It is well known that the problem has an affirmative solution if and only if all the moments μ_n are real and, furthermore, the matrices {μ_{i+j}}_{i,j=0}^{n−1} are positive definite for all n. We can rephrase the latter conditions in our setup as follows. Given P = Σ_i a_i X^i ∈ C[X], a_i ∈ C, put P* = Σ_i ā_i X^i. Then the Hamburger moment problem has an affirmative solution if and only if μ(P*P) ≥ 0 for all P ∈ C[X]. This example underscores the important role played by positivity. Our next immediate goal is, therefore, to introduce the notion of positivity into the setup of noncommutative probability spaces, through the concept of states and C*-probability spaces. We will then give sufficient conditions, see Proposition 5.2.14, for a linear functional μ : C⟨X_i | i ∈ J⟩ → C to be written as μ(P) = φ(P({a_i}_{i∈J})) for all polynomials P ∈ C⟨X_i | i ∈ J⟩, where {a_i}_{i∈J} is a fixed family of elements of a C*-algebra A and φ is a state on A.
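As a hedged illustration (not from the book), the Hamburger positivity condition is easy to test numerically; here it is checked for the standard semicircle law, whose odd moments vanish and whose even moments μ_{2m} are the Catalan numbers:

```python
import numpy as np
from math import comb

# Moments of the standard semicircle law: mu_{2m} = Catalan(m), odd moments 0.
def moment(k):
    return comb(k, k // 2) // (k // 2 + 1) if k % 2 == 0 else 0

# The Hankel matrices {mu_{i+j}} must be positive definite for every n.
for n in range(1, 7):
    hankel = np.array([[moment(i + j) for j in range(n)] for i in range(n)], float)
    print(n, np.linalg.eigvalsh(hankel).min() > 0)  # True for every n
```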

5.2.2 C*-probability spaces and the weak*-topology

We first recall C*-algebras, see Appendix G.1 for detailed definitions. We will restrict our discussion throughout to unital C*-algebras (and C*-subalgebras) without further mentioning it. Thus, in the following, a C*-algebra A is a unital algebra equipped with a norm ‖·‖ and an involution * so that
\[
\|xy\| \le \|x\|\, \|y\| , \qquad \|a^* a\| = \|a\|^2 .
\]
Recall that A is complete under its norm.
An element a of A is said to be self-adjoint (respectively, normal) if a* = a (respectively, a*a = aa*). Let A_sa (respectively, A_n) denote the set of self-adjoint (respectively, normal) elements of A.

Example 5.2.6 The following are examples of C*-algebras.
(i) Function spaces If X is a Polish space, the spaces B(X) and C_b(X), of C-valued functions which are, respectively, bounded and bounded continuous, are unital C*-algebras when equipped with the supremum norm and the conjugation operation. Note however that the space C_0(R) of continuous functions vanishing at infinity is in general not a (unital) C*-algebra, for it has no unit.
(ii) Classical probability theory Take (X, B, μ) a measure space and set A = L^∞(X, μ), with the norm
\[
\|f\| = \mathrm{ess\,sup}_x\, |f(x)| .
\]
(iii) Matrices An important example is obtained if one takes A = Mat_N(C). It is a C*-algebra when equipped with the standard involution
\[
(A^*)_{ij} = \bar A_{ji} , \qquad 1 \le i, j \le N ,
\]
and the operator norm given by the spectral radius.
(iv) Bounded operators on a Hilbert space The previous example generalizes as follows. Take H a complex Hilbert space, and consider as A the space B(H) of linear operators T : H → H which are bounded for the norm
\[
\|T\|_{B(H)} = \sup_{\|e\|_H = 1} \|Te\|_H .
\]
Here, the multiplication operation is taken as composition. The adjoint T* of T ∈ B(H) is defined as the unique element of B(H) such that ⟨Ty, x⟩ = ⟨y, T*x⟩ for all x, y ∈ H, see (G.3).

Part (iv) of Example 5.2.6 is, in a sense, generic: any C*-algebra A is isomorphic to a sub-C*-algebra of B(H) for some Hilbert space H (see e.g. [Rud91, Theorem 12.41]). We provide below a concrete example.

Example 5.2.7 Let μ be a probability measure on a Polish space X. The C*-algebra A = L^∞(X, μ) can be identified as a subset of B(H) with H = L²(X, μ) as follows. For all f ∈ L^∞(X, μ), we define the multiplication operator M_f ∈ B(H) by M_f g = f g (which is in H if g ∈ H). Then M : f ↦ M_f maps L^∞(X, μ) into B(H).

In C*-algebras, spectral analysis can be developed. We recall (see Appendix G.2) that the spectrum of a normal operator a in a C*-algebra A is the compact set
\[
\mathrm{sp}(a) = \{\lambda \in \mathbb{C} : \lambda e - a \text{ is not invertible}\} \subset \{z \in \mathbb{C} : |z| \le \|a\|\} .
\]
The same functional calculus we encountered in the context of matrices can be used in C*-algebras, for such normal operators a. Suppose that f is continuous on sp(a). By the Stone-Weierstrass Theorem, f can be uniformly approximated on sp(a) by a sequence of polynomials p_n^f in a and a*. Then, by part (iii) of Theorem G.7, the limit
\[
f(a) = \lim_{n\to\infty} p_n^f(a, a^*)
\]
always exists, does not depend on the sequence of approximations, and yields an element of A. It can thus serve as the definition of f : a ∈ A ↦ f(a) ∈ A (one may alternatively use the spectral theorem, see Section G.2).

Remark 5.2.8 The smallest C*-subalgebra A_a ⊂ A containing a given self-adjoint operator a is given by A_a = {f(a) : f ∈ C(sp(a))}. Indeed, A_a contains {p(a) : p ∈ C[X]} and so, by functional calculus, contains {f(a) : f ∈ C(sp(a))}. The conclusion follows from the fact that the latter is a C*-algebra. The norm on A_a is necessarily the spectral radius, by Theorem G.3. Observe that this determines an isomorphism of C(sp(a)) into A that preserves linearity and involution. It is a theorem of Gelfand and Naimark (see e.g. [Rud91, Theorem 11.18]) that if a C*-algebra A is commutative then it is isomorphic to the algebra C(X) for some compact X; we will not need this fact.

To begin discussing probability, we need two more concepts: the first is posi-
tivity and the second is that of a state.

Definition 5.2.9 Let (A, ‖·‖, *) be a C*-algebra.
(i) An element a ∈ A is nonnegative (denoted a ≥ 0) if a = a* and its spectrum sp(a) is nonnegative.
(ii) A state is a linear map φ : A → C with φ(e) = 1 and φ(a) ≥ 0 if a ≥ 0.
(iii) A state φ is tracial if φ(ab) = φ(ba) for all a, b ∈ A.

It is standard to check (see e.g. [Mur90, Theorem 2.2.4]) that
\[
\{a \in A : a \ge 0\} = \{a a^* : a \in A\} . \tag{5.2.4}
\]

Example 5.2.10 An important example is A = C(X) with X some compact space. Then, by the Riesz representation theorem, Theorem B.11, a state φ is a probability measure on X.

C*-probability spaces

Definition 5.2.11 A quadruple (A, ‖·‖, *, φ) is called a C*-probability space if (A, ‖·‖, *) is a C*-algebra and φ is a state.

As a consequence of Theorem 5.2.24 below, the law of a family of random variables {a_i}_{i∈J} in a C*-probability space can always be realized as the law of random variables {b_i}_{i∈J} in a C*-probability space of the form (B(H), ‖·‖, *, a ↦ ⟨av, v⟩), where H is a Hilbert space with inner product ⟨·,·⟩, ‖·‖ is the operator norm, and v ∈ H is a unit vector.
We show next how all cases in Example 5.2.2 can be made to fit the definition of a C*-probability space.

Example 5.2.12 Examples 5.2.2 and 5.2.4 continued.
(i) Classical probability theory Let (X, B, μ) be a probability space and set A = L^∞(X, B, μ). Let φ(a) = ∫_X a(x) μ(dx) be the expectation operator. In this setup, use H = L²(X, B, μ), consider each a ∈ A as an element of B(H) by associating with it the multiplication operator M_a f = a f (for f ∈ H), and then write φ(a) = ⟨M_a 1, 1⟩. A is equipped with a structure of C*-algebra as in part (i) of Example 5.2.6. Note that if a is self-adjoint, it is just a real-valued element of L^∞(X, B, μ), and the spectrum of M_a is a subset of [ess-inf_{x∈X} a(x), ess-sup_{x∈X} a(x)]. The spectral projections are then given by E(λ) = M_{1_{a^{-1}((-\infty,\lambda])}} for any λ in that interval.
(ii) Discrete groups Let G be a discrete group. Consider an orthonormal basis {v_g}_{g∈G} of ℓ²(G), the set of sums Σ_{g∈G} c_g v_g with c_g ∈ C and Σ |c_g|² < ∞. ℓ²(G) is equipped with a scalar product
\[
\Big\langle \sum_{g\in G} c_g v_g , \sum_{g\in G} c'_g v_g \Big\rangle = \sum_{g\in G} c_g \bar c'_g ,
\]
which turns it into a Hilbert space. The action of each g ∈ G on ℓ²(G) becomes (λ(g))(Σ_{g'} c_{g'} v_{g'}) = Σ_{g'} c_{g'} v_{g g'}, yielding the left regular representation λ determined by G, which defines a family of unitary operators on ℓ²(G). These operators are determined by λ(g) v_h = v_{gh}. The C*-algebra associated with this representation is generated by the unitary operators {λ(g)}_{g∈G}, and coincides with the operator-norm closure of the linear span of {λ(g)}_{g∈G} (the latter contains any sum Σ c_g λ(g) when Σ |c_g| < ∞). It is in particular included in B(ℓ²(G)). Take as trace the function φ(a) = ⟨a v_e, v_e⟩ where e ∈ G is the unit. In particular, φ(Σ_g b_g λ(g)) = b_e.
(iii) Random matrices In the setting of part (iv) of Example 5.2.2, consider A = L^∞(X, μ, Mat_N(C)). The function
\[
\varphi_N(a) = \int_X \frac{1}{N}\, \mathrm{tr}(a(x))\, \mu(dx) = \int \frac{1}{N} \sum_{i=1}^N \langle a(x) e_i, e_i\rangle\, \mu(dx) , \tag{5.2.5}
\]
on A is a tracial state. There are many other states on A; for any vector v ∈ C^N with ‖v‖ = 1,
\[
\varphi^v(a) = \int \langle a(x) v, v\rangle\, d\mu(x)
\]
is a state.
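A tiny sketch (not from the book) of item (ii) for the cyclic group Z/nZ, whose left regular representation acts by shift matrices; the state φ(a) = ⟨a v_e, v_e⟩ reads off the coefficient of the identity:

```python
import numpy as np

# Left regular representation of Z/nZ on l^2(Z/nZ): lambda(1) is the shift.
n = 5
shift = np.roll(np.eye(n), 1, axis=0)        # lambda(1): v_h -> v_{h+1}
b = np.array([2.0, -1.0, 0.0, 3.0, 0.5])     # coefficients b_g, g = 0, ..., 4
a = sum(b[g] * np.linalg.matrix_power(shift, g) for g in range(n))
print(a[0, 0])   # phi(sum_g b_g lambda(g)) = b_e; here b_0 = 2.0
```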
We now consider the set of laws of variables {a_i}_{i∈J} defined on a C*-probability space.

Definition 5.2.13 Let (A, ‖·‖, *) be a C*-algebra. Define M_A = M_{A,‖·‖,*} to be the set of states on A, i.e. the set of linear forms φ on A so that, for all positive elements a ∈ A,
\[
\varphi(a) \ge 0 , \qquad \varphi(1) = 1 . \tag{5.2.6}
\]
(By Lemma G.11, a state automatically satisfies ‖φ‖ ≤ 1, that is, |φ(x)| ≤ ‖x‖ for any x ∈ A.) Note that by either Lemma G.11 or (5.2.4), equation (5.2.6) is equivalent to
\[
\varphi(b b^*) \ge 0 \quad \forall b \in A , \qquad \varphi(1) = 1 . \tag{5.2.7}
\]

In studying laws of random variables {a_i}_{i∈J} in a C*-algebra A, we may restrict attention to self-adjoint variables, by writing any a ∈ A as a = b + ic with b = (a + a*)/2 and c = i(a* − a)/2 both self-adjoint. Thus, in the sequel, we restrict ourselves to studying the law of self-adjoint elements. In view of this restriction, it is convenient to equip C⟨X_i | i ∈ J⟩ with the unique involution so that X_i* = X_i and, as a consequence,
\[
(\alpha X_{i_1} \cdots X_{i_m})^* = \bar\alpha\, X_{i_m} \cdots X_{i_1} . \tag{5.2.8}
\]

We now present a criterion for verifying that a given linear functional on C⟨X_i | i ∈ J⟩ represents the law of a family of (self-adjoint) random variables on some C*-algebra. Its proof follows ideas that are also employed in the proof of the Gelfand-Naimark-Segal construction, Theorem 5.2.24 below.

Proposition 5.2.14 Let J be a set of positive integers. Fix a constant 0 < R < ∞. Let the involution on C⟨X_i | i ∈ J⟩ be as in (5.2.8). Then there exist a C*-algebra A = A(R, J) and a family {a_i}_{i∈J} of self-adjoint elements of it with the following properties.
(a) sup_{i∈J} ‖a_i‖ ≤ R.
(b) A is generated by {a_i}_{i∈J} as a C*-algebra.
(c) For any C*-algebra B and family of self-adjoint elements {b_i}_{i∈J} of it satisfying sup_{i∈J} ‖b_i‖ ≤ R, we have ‖P({a_i}_{i∈J})‖ ≥ ‖P({b_i}_{i∈J})‖ for all polynomials P ∈ C⟨X_i | i ∈ J⟩.
(d) A linear functional μ : C⟨X_i | i ∈ J⟩ → C is the law of {a_i}_{i∈J} under some state φ ∈ M_A if and only if μ(1) = 1,
\[
|\mu(X_{i_1} \cdots X_{i_k})| \le R^k \tag{5.2.9}
\]
for all words X_{i_1} ⋯ X_{i_k}, and μ(P*P) ≥ 0 for all P ∈ C⟨X_i | i ∈ J⟩.
(e) Under the equivalent conditions stated in point (d), the state φ is unique, and furthermore φ is tracial if μ(PQ) = μ(QP) for all P, Q ∈ C⟨X_i | i ∈ J⟩.

Points (a), (b) and (c) of Proposition 5.2.14 imply that, for any C*-algebra B and {b_i}_{i∈J} as in point (c), there exists a unique continuous algebra homomorphism A → B commuting with * and sending a_i to b_i for i ∈ J. In this sense, A is the universal example of a C*-algebra equipped with an R-bounded, J-indexed family of self-adjoint elements.
Proof To abbreviate notation, we write
\[
A = \mathbb{C}\langle X_i \mid i \in J\rangle .
\]
First we construct A and {a_i}_{i∈J} to fulfill the first three points of the proposition by completing A in a certain way. For P = P({X_i}_{i∈J}) ∈ A, put
\[
\|P\|_{R,J,C^*} = \sup_{B, \{b_i\}_{i\in J}} \|P(\{b_i\}_{i\in J})\| , \tag{5.2.10}
\]
where B ranges over all C*-algebras and {b_i}_{i∈J} ranges over all families of self-adjoint elements of B such that sup_{i∈J} ‖b_i‖ ≤ R. Put
\[
L = \{P \in A : \|P\|_{R,J,C^*} = 0\} .
\]
Now the function ‖·‖_{R,J,C*} is a seminorm on the algebra A. It follows that L is a two-sided ideal of A and that ‖·‖_{R,J,C*} induces on the quotient A/L a norm. Furthermore ‖P*P‖_{R,J,C*} = ‖P‖²_{R,J,C*}, and hence ‖P*‖_{R,J,C*} = ‖P‖_{R,J,C*} for all P ∈ A. In particular, the involution passes to the quotient A/L and preserves the norm induced by ‖·‖_{R,J,C*}. Now complete A/L with respect to the norm induced by ‖·‖_{R,J,C*}, and equip it with the involution induced by P ↦ P*, thus obtaining a C*-algebra. Call this completion A and let a_i denote the image of X_i in A for i ∈ J. Thus we obtain A and self-adjoint {a_i}_{i∈J} fulfilling points (a), (b), (c).
Since the implication (d)(⇒) is trivial, and point (e) is easy to prove by approximation arguments, it remains only to prove (d)(⇐). Given P = Σ_w c_w w ∈ A, where the summation extends over all words w in the X_i (including the empty word) and all but finitely many of the coefficients c_w ∈ C vanish, we define
\[
\|P\|_{R,J} = \sum_w |c_w|\, R^{\deg w} < \infty ,
\]
where deg w denotes the length of the word w. One checks that ‖·‖_{R,J} is a norm on A and further, from assumption (5.2.9),
\[
|\mu(P)| \le \|P\|_{R,J} , \qquad P \in A . \tag{5.2.11}
\]
For P ∈ A and Q ∈ A satisfying μ(Q*Q) > 0 we define
\[
\mu_Q(P) = \frac{\mu(Q^* P Q)}{\mu(Q^* Q)} ,
\]
and we set
\[
\|P\| = \sup_{Q \in A,\ \mu(Q^*Q) > 0} \mu_Q(P^* P)^{1/2} .
\]
By the continuity of μ with respect to ‖·‖_{R,J}, see (5.2.11), and Lemma G.22, we have that ‖P‖ ≤ ‖P*P‖_{R,J}^{1/2}. In particular, ‖X_i‖ ≤ R for all i ∈ J.
We check that ‖·‖ is a seminorm on A satisfying ‖P*P‖ = ‖P‖² for all P ∈ A. Indeed, for α ∈ C, ‖αP‖ = |α| ‖P‖ by definition. We verify next the sub-additivity of ‖·‖. Since μ_Q is a nonnegative linear form on A, we have from (G.6) that, for any S, T ∈ A,
\[
[\mu_Q((S+T)^*(S+T))]^{1/2} \le [\mu_Q(S^*S)]^{1/2} + [\mu_Q(T^*T)]^{1/2} ,
\]
from which ‖S + T‖ ≤ ‖S‖ + ‖T‖ follows by optimization over Q.
To prove the sub-multiplicativity of ‖·‖, note first that, by the Cauchy-Schwarz inequality (G.5), for Q, S, T ∈ A with μ(Q*Q) > 0,
\[
\mu_Q(T^* S^* S T) \text{ vanishes if } \mu_Q(T^* T) = 0 .
\]
Then, assuming ‖T‖ > 0,
\[
\|ST\|^2 = \sup_{Q \in A,\ \mu(Q^*Q) > 0} \mu_Q(T^* S^* S T)
= \sup_{Q \in A,\ \mu(Q^* T^* T Q) > 0} \mu_{TQ}(S^* S)\, \mu_Q(T^* T) \le \|S\|^2\, \|T\|^2 . \tag{5.2.12}
\]
We conclude that ‖·‖ is a seminorm on A.
To verify that ‖T*T‖ = ‖T‖², note that, by the Cauchy-Schwarz inequality (G.5) and μ_Q(1) = 1, we have |μ_Q(T*T)|² ≤ μ_Q((T*T)²), hence ‖T‖² ≤ ‖T*T‖. By (5.2.12), ‖T*T‖ ≤ ‖T*‖ ‖T‖, and therefore we get that ‖T‖ ≤ ‖T*‖. By symmetry, this implies ‖T‖ = ‖T*‖ = ‖T*T‖^{1/2}, as claimed.
Using again the quotient and completion process which we used to construct A, but this time using the seminorm ‖·‖, we obtain a C*-algebra B and self-adjoint elements {b_i}_{i∈J} satisfying sup_{i∈J} ‖b_i‖ ≤ R and ‖P‖ = ‖P({b_i}_{i∈J})‖ for P ∈ A. But then by point (c) we have ‖P‖ ≤ ‖P‖_{R,J,C*} for P ∈ A, and thus |μ(P)| ≤ ‖P‖_{R,J,C*}. Let φ be the unique continuous linear functional on A such that φ(P({a_i}_{i∈J})) = μ(P) for all P ∈ A. Since μ(P*P) ≥ 0 for P ∈ A, it follows, see (5.2.7), that φ is positive and hence a state on A. The proof of point (d)(⇐) is complete.

Example 5.2.15 Examples 5.2.2 continued.
(i) Classical probability The set M_1([−R, R]) of probability measures on [−R, R] can be recovered as the set M_{A(R,{1})}.
(ii) Matrices The study of noncommutative laws of matrices {a_i}_{i∈J} belonging to Mat_N(C) with spectral radii bounded by R reduces, by the remark following (5.2.7), to the study of laws of Hermitian matrices. For the latter, the noncommutative law of k matrices whose spectral radii are bounded by R can be represented as elements of M_{A(R,{1,…,k})}.

The examples above do not accommodate laws of unbounded variables. We will see in Section 5.2.3 that such laws can be defined using the notion of affiliated operators.

Weak*-topology

Recall that we endowed the set of noncommutative laws with its weak*-topology, see Definition 5.2.5.

Corollary 5.2.16 For N ∈ N, let {a_i^N}_{i∈J} be self-adjoint elements of a C*-probability space (A_N, ‖·‖_N, *_N, φ_N). Assume that, for all P ∈ C⟨X_i | i ∈ J⟩, φ_N(P(a_i^N, i ∈ J)) converges to some μ(P). Let R > 0 be given, with A(R, J) the universal C*-algebra and {a_i}_{i∈J} the elements of it defined in Proposition 5.2.14.
(i) If sup_{i∈J, N} ‖a_i^N‖_N ≤ R, then there exists a collection of states φ̄_N, φ̄ on A(R, J) so that, for any P ∈ C⟨X_i | i ∈ J⟩,
\[
\bar\varphi_N(P(\{a_i\}_{i\in J})) = \varphi_N(P(\{a_i^N\}_{i\in J})) , \qquad \bar\varphi(P(\{a_i\}_{i\in J})) = \mu(P) .
\]
(ii) If there exists a finite R so that, for all k ∈ N and all (i_j)_{1≤j≤k} ∈ J^k,
\[
|\mu(X_{i_1} \cdots X_{i_k})| \le R^k , \tag{5.2.13}
\]
then there exists a state φ̄ on A(R, J) so that, for any P ∈ C⟨X_i | i ∈ J⟩,
\[
\bar\varphi(P(\{a_i\}_{i\in J})) = \mu(P) .
\]

Proof By the remark following Proposition 5.2.14, there exist for N ∈ N C*-homomorphisms h_N : A(R, J) → A_N so that a_i^N = h_N(a_i), and the state φ̄_N = φ_N ∘ h_N satisfies φ̄_N(P({a_i}_{i∈J})) = φ_N(P({a_i^N}_{i∈J})) for each P ∈ C⟨X_i | i ∈ J⟩. By assumption, φ̄_N(P({a_i})) converges to μ(P), and thus |μ(P)| ≤ ‖P({a_i}_{i∈J})‖ (the norm here is the norm on A(R, J)). As a consequence, μ extends to a state φ̄ on A(R, J), completing the proof of the first part of the corollary.
The second part of the corollary is a direct consequence of part (d) of Proposition 5.2.14.

We remark that a different proof of part (i) of Corollary 5.2.16 can be given
directly by using part (d) of Proposition 5.2.14. A different proof of part (ii) is
sketched in Exercise 5.2.20.

Example 5.2.17 Examples 5.2.2, parts (iii) and (iv), continued.
(i) Matrices Let {M_j^N}_{j∈J} ⊂ Mat_N(C) be a sequence of Hermitian matrices and assume that there exists R finite so that
\[
\limsup_{N\to\infty} |\mu_{\{M_j^N\}_{j\in J}}(X_{i_1} \cdots X_{i_k})| \le R^k .
\]
Assume that μ_{{M_j^N}_{j∈J}}(P) converges as N goes to infinity to some limit μ(P) for all P ∈ C⟨X_i | i ∈ J⟩. Then there exist noncommutative random variables {a_j}_{j∈J} in a C*-probability space so that a_i = a_i* and {M_j^N}_{j∈J} converges in law to {a_j}_{j∈J}.
(ii) Random matrices Let (X, B, μ) be a probability space. For j ∈ J, let M_j^N(ω) ∈ H_N^{(2)} be a collection of Hermitian random matrices. If the requirements of the previous example are satisfied for almost all ω ∈ X, then we can conclude similarly that {M_j^N(ω)}_{j∈J} ⊂ Mat_N(C) converges in law to some {a_j(ω)}_{j∈J}. Alternatively, assume one can show the convergence of the moments of products of elements from {M_j^N(ω)}_{j∈J} in L¹(μ). In this case, we endow the C*-algebra (Mat_N(C), ‖·‖_N, *) with the tracial state φ_N = E N^{−1} tr. Observe that φ_N is continuous with respect to ‖M‖ := ess sup ‖M(ω)‖, but the latter unfortunately may be infinite. However, if we assume that, for all i_j ∈ J, φ_N(M_{i_1}^N ⋯ M_{i_k}^N) converges as N goes to infinity to μ(X_{i_1} ⋯ X_{i_k}), and that there exists R < ∞ so that, for all i_j ∈ J,
\[
|\mu(X_{i_1} \cdots X_{i_k})| \le R^k ,
\]
then it follows from Corollary 5.2.16 that there exist a state φ̄ on the universal C*-algebra A(R, J) and elements {a_i}_{i∈J} ∈ A(R, J) so that {M_i^N(ω)}_{i∈J} converges in expectation to {a_i}_{i∈J}, i.e.
\[
\lim_{N\to\infty} \varphi_N(P(M_i^N(\omega), i \in J)) = \bar\varphi(P(a_i, i \in J)) \qquad \forall P \in \mathbb{C}\langle X_i \mid i \in J\rangle .
\]
This example applies in particular to collections of independent Wigner matrices.
338 5. F REE PROBABILITY

The space M_A possesses a nice topological property that we state next. The main part of the proof (which we omit) uses the Banach-Alaoglu Theorem, Theorem B.8.

Lemma 5.2.18 Let (A, ‖·‖, *) be a C*-algebra, with A separable. Then M_A is compact and separable, hence metrizable.

Thus, on M_A, sequential convergence determines convergence.
As we next show, the construction of noncommutative laws is such that any one-dimensional marginal distribution is a probability measure. This can be seen as a variant of the Riesz representation theorem, Theorem B.11.

Lemma 5.2.19 Let (A, ‖·‖, *) be a C*-algebra and φ a state on (A, ‖·‖, *). Let F ∈ A, F = F*. Then there exists a unique probability measure μ_F ∈ M_1(R) with moments ∫ x^k μ_F(dx) = φ(F^k). The support of μ_F is included in [−‖F‖_A, ‖F‖_A]. Further, the map φ ↦ μ_F, from M_A furnished with the weak*-topology into M_1(R) equipped with the weak topology, is continuous.

Proof The uniqueness of μ_F with the prescribed properties is a standard consequence of the bound |φ(F^k)| ≤ ‖F‖_A^k. To prove existence of μ_F, recall the functional calculus described in Remark 5.2.8, which provides us with a map f ↦ f(F) identifying the C*-algebra C(sp_A(F)) isometrically with the C*-subalgebra A_F ⊂ A generated by F. The composite map f ↦ φ(f(F)) is then a state on C(sp_A(F)) and hence, by Example 5.2.10, a probability measure μ_F on sp_A(F) ⊂ [−‖F‖_A, ‖F‖_A]. It is clear that this probability measure has the moments prescribed for μ_F. Existence of μ_F ∈ M_1(R) with the prescribed moments follows. Abusing notation, for f ∈ C_b(R), let f(F) = g(F) ∈ A where g = f|_{sp_A(F)}, and note that μ_F(f) = ∫ f dμ_F = φ(f(F)) by construction. Finally, to see the claimed continuity, if we take a sequence φ_n ∈ M_A converging to φ for the weak*-topology, then, for any f ∈ C_b(R), μ_F^n(f) converges to μ_F(f) as n goes to infinity since f(F) ∈ A. Therefore φ ↦ μ_F is indeed continuous.

Exercise 5.2.20 In the setting of Corollary 5.2.16, show, without using part (d) of Proposition 5.2.14, that under the assumptions of part (ii) of the corollary there exists a sequence of states φ_N on A(R+1, J) so that φ_N(P) converges to μ(P) for all P ∈ C⟨X_i | i ∈ J⟩. Conclude that μ is a state on A(R+1, J).
Hint: set f_R(x) = (x ∧ (R+1)) ∨ (−(R+1)), and define a_i^{N,R} = f_R(a_i^N). Using the Cauchy-Schwarz inequality, show that φ_N(P({a_i^{N,R}}_{i∈J})) converges to μ(P) for all P ∈ C⟨X_i | i ∈ J⟩. Conclude by applying part (i) of the corollary.
5.2.3 W*-probability spaces

In the previous section, we considered noncommutative probability measures defined on C*-algebras. This is equivalent, in the classical setting, to defining probability measures as linear forms on the set of continuous bounded functions. However, in the classical setting, it is well known that one can define probability measures as linear forms, satisfying certain regularity conditions, on the set of measurable bounded functions. One can define a generalization of the notion of measurable functions in the noncommutative setting.
If one deals with a single (not necessarily bounded) self-adjoint operator b, it is possible, by the spectral theorem, Theorem G.6, to define g(b) for any function g in the set B(sp(b)) of bounded, Borel-measurable functions on sp(b). This extension is such that, for any x, y ∈ H, there exists a compactly supported measure ν_b^{x,y} (which equals ⟨E_b(·)x, y⟩ if E_b is the resolution of the identity of b, see Appendix G.2) such that
\[
\langle g(b) x, y\rangle = \int g(z)\, d\nu_b^{x,y}(z) . \tag{5.2.14}
\]
In general, g(b) may not belong to the C*-algebra generated by b; it will, however, belong to a larger algebra that we now define.

Definition 5.2.21 A C*-algebra A ⊂ B(H), for some Hilbert space H, is a von Neumann algebra (or W*-algebra) if it is closed with respect to the weak operator topology.

(Weak operator topology closure means that b_α → b on a net if, for any fixed x, y ∈ H, ⟨b_α x, y⟩ converges to ⟨bx, y⟩. Recall, see Theorem G.14, that in Definition 5.2.21 the requirement of closure with respect to the weak operator topology is equivalent to closure with respect to the strong operator topology, i.e., with the previous notation, to b_α x converging to bx in H.)

Definition 5.2.22 A W*-probability space is a pair (A, φ) where A is a W*-algebra, subset of B(H) for some Hilbert space H, and φ is a state that can be written as φ(a) = ⟨aψ, ψ⟩ for some unit vector ψ ∈ H.

Example 5.2.23
(i) We have seen in Remark 5.2.8 that the C*-algebra A_b generated by a self-adjoint bounded operator b on a separable Hilbert space H is exactly {f(b) : f ∈ C(sp(b))}. It turns out that the von Neumann algebra Ā_b generated by b is Ā_b = {f(b) : f ∈ B(sp(b))}. Indeed, by Lusin's Theorem, Theorem B.13, for all x, y ∈ H and any bounded measurable function g, there exists a sequence g_n of uniformly bounded continuous functions converging in ν_b^{x,y}-probability to g. Since we assumed that H is separable, we can, by a diagonalization argument, assume that this convergence holds for all x, y ∈ H simultaneously. Therefore, the above considerations show that g_n(b) converges weakly to g(b). Thus the weak closure of A_b contains Ā_b. One sees that Ā_b is a von Neumann algebra by the double commutant theorem, Theorem G.13, and the spectral theorem, Theorem G.7.
(ii) As a particular case of the previous example (take b to be the right multiplication operator by a random variable with law μ), L^∞(X, μ) can be identified as a W*-algebra. In fact, every commutative von Neumann algebra on a separable Hilbert space H can be represented as L^∞(X, μ) for some (X, B, μ). (Since we do not use this fact, the proof, which can be found in [Mur90, Theorem 4.4.4], is omitted.)
(iii) An important example of a W*-algebra is B(H) itself, which is a von Neumann algebra since it is trivially closed.

We saw in Proposition 5.2.14 sufficient conditions for a linear functional on C⟨X_i | i ∈ J⟩ to be represented by a state in a C*-algebra (A, ‖·‖, *). The following GNS construction gives a canonical way to represent the latter as a state on B(H) for some Hilbert space H.

Theorem 5.2.24 (Gelfand-Naimark-Segal construction) Let φ be a state on a unital C*-algebra (A, ‖·‖, *) generated by a countable family {a_i}_{i∈J} of self-adjoint elements. Then there exist a separable Hilbert space H equipped with a scalar product ⟨·,·⟩, a norm-decreasing *-homomorphism π : A → B(H), and a vector 1_π ∈ H so that the following hold.
(a) {π(a)1_π : a ∈ A} is dense in H.
(b) Set φ_π(x) = ⟨1_π, x 1_π⟩ for x ∈ B(H). Then, for all a in A,
\[
\varphi(a) = \varphi_\pi(\pi(a)) .
\]
(c) The noncommutative law of {a_i}_{i∈J} in the C*-probability space (A, ‖·‖, *, φ) equals the law of {π(a_i)}_{i∈J} in the W*-probability space (B(H), φ_π).
(d) Let W*({a_i}_{i∈J}) denote the von Neumann algebra generated by {π(a_i) : i ∈ J} in B(H). If φ is tracial, so is the restriction of the state φ_π to W*({a_i}_{i∈J}).

Proof of Theorem 5.2.24 Let $\mathcal{L}=\{f\in\mathcal{A}\,|\,\phi(f^*f)=0\}$. As in the proof of Proposition 5.2.14, $\mathcal{L}$ is a left ideal. It is closed due to the continuity of the map $f\mapsto\phi(f^*f)$. Consider the quotient space $\bar{\mathcal{A}}:=\mathcal{A}\backslash\mathcal{L}$. Denote by $a\mapsto\bar a$ the natural map from $\mathcal{A}$ into $\bar{\mathcal{A}}$. Note that, by (G.6), $\phi(x^*y)$ depends only on $\bar x,\bar y$, and put
$$\langle\bar x,\bar y\rangle=\phi(x^*y)\,,\qquad \|\bar x\|:=\langle\bar x,\bar x\rangle^{\frac12}\,,$$
which defines a pre-Hilbert structure on $\bar{\mathcal{A}}$. Let $H$ be the (separable) Hilbert space obtained by completing $\bar{\mathcal{A}}$ with respect to the Hilbert norm $\|\cdot\|$.

To construct the morphism $\pi$, we consider $\mathcal{A}$ as acting on $\bar{\mathcal{A}}$ by left multiplication and define, for $a\in\mathcal{A}$ and $\bar b\in\bar{\mathcal{A}}$,
$$\pi(a)\bar b:=\overline{ab}\in\bar{\mathcal{A}}\,.$$
By (G.7),
$$\|\pi(a)\bar b\|^2=\|\overline{ab}\|^2=\phi(b^*a^*ab)\le\|a\|^2\phi(b^*b)=\|a\|^2\|\bar b\|^2\,,$$
and therefore $\pi(a)$ extends uniquely to an element of $B(H)$, still denoted $\pi(a)$, with operator norm bounded by $\|a\|$. $\pi$ is a $\ast$-homomorphism from $\mathcal{A}$ into $B(H)$, that is, $\pi(ab)=\pi(a)\pi(b)$ and $\pi(a)^*=\pi(a^*)$. To complete the construction, we take $\xi_1$ as the image of the unit of $\mathcal{A}$ under the map $a\mapsto\bar a$.

We now verify the conclusions (a)-(c) of the theorem. Part (a) holds since $H$ was constructed as the closure of $\{\pi(a)\xi_1 : a\in\mathcal{A}\}$. To see (b), observe that for all $a\in\mathcal{A}$, $\langle\xi_1,\pi(a)\xi_1\rangle=\langle\xi_1,\bar a\rangle=\phi(a)$. Finally, since $\pi$ is a morphism, $\pi(P(\{a_i\}_{i\in J}))=P(\{\pi(a_i)\}_{i\in J})$, which, together with part (b), shows part (c).

To verify part (d), note that part (b) implies that for $a,b\in\mathcal{A}$,
$$\phi(ab)=\bar\phi(\pi(ab))=\bar\phi(\pi(a)\pi(b))$$
and thus, if $\phi$ is tracial, one gets $\bar\phi(\pi(a)\pi(b))=\bar\phi(\pi(b)\pi(a))$. The conclusion follows by a density argument, using the Kaplansky density theorem, Theorem G.15, to first reduce attention to self-adjoint operators and their approximation by a net, belonging to $\pi(\mathcal{A})$, of self-adjoint operators.
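In finite dimensions the GNS construction can be carried out explicitly. The following Python sketch is a minimal illustration of the theorem, not taken from the text: the algebra $M_2(\mathbb C)$, the density matrix $\rho$ and the test element $a$ are our own choices. Since the state is faithful here, $\mathcal L=\{0\}$ and the GNS space is $\mathcal A$ itself with $\langle x,y\rangle=\phi(x^*y)$, the map $\pi(a)$ acting by left multiplication.

```python
import numpy as np

# Minimal numerical sketch of the GNS construction (illustrative choices):
# A = M_2(C) with the faithful state phi(a) = Tr(rho a).
rho = np.diag([0.7, 0.3])                      # density matrix defining phi
phi = lambda a: complex(np.trace(rho @ a))
ip = lambda x, y: phi(x.conj().T @ y)          # <x, y> = phi(x* y), as in the proof

one = np.eye(2)                                # cyclic vector xi_1 = class of 1
a = np.array([[1.0, 2.0], [2.0, -1.0]])        # a self-adjoint test element

# pi(a) is left multiplication, so part (b) reads phi(a) = <xi_1, pi(a) xi_1>,
# and multiplicativity of pi gives phi(a^2) = <xi_1, pi(a) pi(a) xi_1>.
assert np.isclose(phi(a), ip(one, a @ one))
assert np.isclose(phi(a @ a), ip(one, a @ (a @ one)))
```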

The norm-decreasing $\ast$-homomorphism $\pi$ constructed by the theorem is in general not one-to-one. This defect can be corrected as follows.

Corollary 5.2.25 In the setup of Theorem 5.2.24, there exist a separable Hilbert space $\hat H$, a norm-preserving $\ast$-homomorphism $\hat\pi:\mathcal{A}\to B(\hat H)$ and a unit vector $\hat\xi\in\hat H$ such that for all $a\in\mathcal{A}$, $\phi(a)=\langle\hat\pi(a)\hat\xi,\hat\xi\rangle$.

Proof By Theorem G.5 there exists a norm-preserving $\ast$-homomorphism $\pi_{\mathcal A}:\mathcal{A}\to B(H_{\mathcal A})$, but $H_{\mathcal A}$ might be nonseparable. Using the separability of $\mathcal{A}$, it is routine to construct a separable Hilbert space $H_0\subset H_{\mathcal A}$, stable under the action of $\mathcal{A}$ via $\pi_{\mathcal A}$, so that the induced representation $\pi_0:\mathcal{A}\to B(H_0)$ is a norm-preserving $\ast$-homomorphism. Then, with $\pi:\mathcal{A}\to B(H)$ and $\xi_1$ as in Theorem 5.2.24, the direct sum $\hat\pi=\pi_0\oplus\pi:\mathcal{A}\to B(H_0\oplus H)$ of representations and the unit vector $\hat\xi=0\oplus\xi_1\in H_0\oplus H$ have the desired properties.

We will see that the state $\bar\phi$ of Theorem 5.2.24 satisfies additional properties that we now define. These properties will play an important role in our treatment of unbounded operators in subsection 5.2.3.

Definition 5.2.26 Let $\mathcal{A}$ be a von Neumann algebra.
A state $\phi$ on $\mathcal{A}$ is faithful iff $\phi(xx^*)=0$ implies $x=0$.
A state $\phi$ on $\mathcal{A}$ is normal iff for any net $a_\alpha$ of nonnegative elements of $\mathcal{A}$ decreasing monotonically to zero,
$$\inf_\alpha\phi(a_\alpha)=0\,.$$

The normality assumption is an analog, in the noncommutative setup, of the regularity assumptions on linear functionals on measurable functions needed to ensure they are represented by measures. For some consequences of normality, see Proposition G.21.
We next show that the Gelfand–Naimark–Segal construction allows us, if $\phi$ is tracial, to represent any joint law of noncommutative variables as the law of elements of a von Neumann algebra equipped with a faithful normal state. In what follows, we will always restrict ourselves to $W^*$-probability spaces equipped with a tracial state. The properties we list below often depend on this assumption.

Corollary 5.2.27 Let $\phi$ be a tracial state on a unital $C^*$-algebra satisfying the assumptions of Theorem 5.2.24. Then, the tracial state $\bar\phi$ on $W^*(\{a_i\}_{i\in J})$ of Theorem 5.2.24 is normal and faithful.

Proof We keep the same notation as in the proof of Theorem 5.2.24. We begin by showing that $\bar\phi$ is faithful on $W^*(\{a_i\}_{i\in J})\subset B(H)$. Take $x\in W^*(\{a_i\}_{i\in J})$ so that $\bar\phi(x^*x)=0$. Then we claim that
$$x\,\pi(a)\xi_1=0\,,\quad\text{for all }a\in\mathcal{A}\,. \qquad(5.2.15)$$
Indeed, we have
$$\|x\pi(a)\xi_1\|_H^2=\langle x\pi(a)\xi_1,x\pi(a)\xi_1\rangle=\langle\xi_1,\pi(a)^*x^*x\pi(a)\xi_1\rangle=\bar\phi(\pi(a)^*x^*x\pi(a))=\bar\phi(x\pi(a)\pi(a)^*x^*)\,,$$
where we used in the last equality the fact that $\bar\phi$ is tracial on $W^*(\{a_i\}_{i\in J})$. Because $\pi$ is a morphism we have $\pi(a)\pi(a)^*=\pi(aa^*)$, and because the operator norm of $\pi(aa^*)\in B(H)$ is bounded by the norm $\|aa^*\|$ in $\mathcal{A}$, we obtain from the last display
$$\|x\pi(a)\xi_1\|_H^2=\langle\xi_1,x\pi(aa^*)x^*\xi_1\rangle\le\|aa^*\|\,\bar\phi(xx^*)=0\,,$$
completing the proof of (5.2.15). Since $\{\pi(a)\xi_1\}$ is dense in $H$ by part (a) of Theorem 5.2.24, and $x\in B(H)$, we conclude that $x\xi=0$ for all $\xi\in H$, and therefore $x=0$, completing the proof that $\bar\phi$ is faithful on $W^*(\{a_i\}_{i\in J})$. By using Proposition G.21 with $x$ the projection onto the linear vector space generated by $\xi_1$, we see that $\bar\phi$ is normal.

Laws of self-adjoint operators

So far, we have considered bounded operators. However, with applications to random matrices in mind, it is useful also to consider unbounded operators. The theory incorporates such operators via the notion of affiliated operators. Let $\mathcal{A}$ be a $W^*$-algebra, subset of $B(H)$ for some Hilbert space $H$.

Definition 5.2.28 A densely defined self-adjoint operator $X$ on a Hilbert space $H$ is said to be affiliated to $\mathcal{A}$ if, for any bounded Borel function $f$ on the spectrum of $X$, $f(X)\in\mathcal{A}$. A closed, densely defined operator $Y$ is affiliated with $\mathcal{A}$ if its polar decomposition $Y=uX$ (see Lemma G.9) is such that $u\in\mathcal{A}$ is a partial isometry and $X$ is a self-adjoint operator affiliated with $\mathcal{A}$. We denote by $\tilde{\mathcal{A}}$ the collection of operators affiliated with $\mathcal{A}$.

(Here, $f(X)$ is defined by the spectral theorem, Theorem G.8; see Section G.2 for details.)
It follows from the definition that a self-adjoint operator $X$ is affiliated with $\mathcal{A}$ iff $(1+zX)^{-1}X\in\mathcal{A}$ for one (or, equivalently, all) $z\in\mathbb{C}\setminus\mathbb{R}$. (Equivalently, iff all the spectral projections of $X$ belong to $\mathcal{A}$.) By the double commutant theorem, Theorem G.13, this is also equivalent to saying that, for any unitary operator $u$ in the commutant of $\mathcal{A}$, $uXu^*=X$.

Example 5.2.29 Let $\mu$ be a probability measure on $\mathbb{R}$, $H=L^2(\mu)$ and $\mathcal{A}=B(H)$. Let $X$ be the left multiplication by $x$ with law $\mu$, that is, $Xf:=xf$, $f\in H$. Then $X$ is a densely defined operator, affiliated with $\mathcal{A}$.

We define below the noncommutative laws of affiliated operators and of polynomials in affiliated operators.

Definition 5.2.30 Let $(\mathcal{A},\phi)$ be a $W^*$-probability space and let $T$ be a self-adjoint operator affiliated with $\mathcal{A}$. Then the law $\mu_T$ of $T$ is the unique probability measure on $\mathbb{R}$ such that $\phi(u(T))=\int u(\lambda)\,d\mu_T(\lambda)$ for any bounded measurable function $u$. The associated distribution function is $F_T(x):=F_{\mu_T}(x):=\mu_T((-\infty,x])$, $x\in\mathbb{R}$.

(The uniqueness of $\mu_T$ follows from the Riesz representation theorem, Theorem B.11.) The spectral theorem, Theorem G.8, implies that $F_T(x)=\phi(E_T((-\infty,x]))$ if $E_T$ is the resolution of the identity of the operator $T$ (this is well defined since the spectral projection $E_T((-\infty,x])$ belongs to $\mathcal{A}$).
Polynomials of affiliated operators are defined by the following algebraic rules: $(A+B)v:=Av+Bv$ for any $v\in H$ belonging to the domains of both $A$ and $B$, and, similarly, $(AB)v:=A(Bv)$ for $v$ in the domain of $B$ such that $Bv$ is in the domain of $A$. One difficulty arising with such polynomials is that, in general, they are not closed, and therefore not affiliated. This difficulty again can be overcome by an appropriate completion procedure, which we now describe. Given a $W^*$-algebra $\mathcal{A}$ equipped with a normal faithful tracial state $\phi$, introduce a topology by declaring the sets
$$N(\varepsilon,\delta)=\{a\in\mathcal{A}: \text{for some projection }p\in\mathcal{A},\ \|ap\|\le\varepsilon,\ \phi(1-p)\le\delta\}$$
and their translates to be neighborhoods. Similarly, introduce neighborhoods in $H$ by declaring the sets
$$O(\varepsilon,\delta)=\{h\in H: \text{for some projection }p\in\mathcal{A},\ \|ph\|\le\varepsilon,\ \phi(1-p)\le\delta\}$$
to be a fundamental system of neighborhoods, i.e., their translates are also neighborhoods. Let $\widehat{\mathcal{A}}$ be the completion of the vector space $\mathcal{A}$ with respect to the uniformity defined by the system $N(\varepsilon,\delta)$ of neighborhoods of the origin. Let $\widehat H$ be the analogous completion with respect to the system of neighborhoods $O(\varepsilon,\delta)$. A fundamental property of this completion is the following theorem, whose proof, which we skip, can be found in [Nel74].

Theorem 5.2.31 (Nelson) Suppose $\mathcal{A}$ is a von Neumann algebra equipped with a normal faithful tracial state.
(i) The mappings $a\mapsto a^*$, $(a,b)\mapsto a+b$, $(a,b)\mapsto ab$, $(h,g)\mapsto h+g$, $(a,h)\mapsto ah$, with $a,b\in\mathcal{A}$ and $h,g\in H$, possess unique uniformly continuous extensions to $\widehat{\mathcal{A}}$ and $\widehat H$.
(ii) With $b\in\widehat{\mathcal{A}}$ associate a multiplication operator $M_b$, with domain $D(M_b)=\{h\in H: bh\in H\}$, by declaring $M_bh=bh$ for $h\in D(M_b)$. Then $M_b$ is a closed, densely defined operator affiliated with $\mathcal{A}$, with $M_b^*=M_{b^*}$. Further, if $a\in\tilde{\mathcal{A}}$, then there exists a unique $b\in\widehat{\mathcal{A}}$ so that $a=M_b$.
The advantage of the operators $M_b$ is that they recover an algebraic structure. Namely, if $a,a'\in\tilde{\mathcal{A}}$ then it is not necessarily the case that $a+a'$ or $aa'$ belong to $\tilde{\mathcal{A}}$; however, if $a=M_b$ and $a'=M_{b'}$ then $M_{b+b'}$ and $M_{bb'}$ are affiliated operators that equal the closures of $M_b+M_{b'}$ and $M_bM_{b'}$ (see [Nel74, Theorem 4]). Thus, with some standard abuse of notation, if $T_i\in\tilde{\mathcal{A}}$, $i=1,\dots,k$, we say that for $Q\in\mathbb{C}\langle X_i|1\le i\le k\rangle$, $Q(T_1,\dots,T_k)\in\tilde{\mathcal{A}}$, meaning that, with $T_i=M_{a_i}$, we have $M_{Q(a_1,\dots,a_k)}\in\tilde{\mathcal{A}}$.
The assumption of the existence of a normal faithful tracial state ensures Property G.18, which is crucial in the proof of the following proposition.

Proposition 5.2.32 Let $(\mathcal{A},\phi)$ be a $W^*$-probability space, subset of $B(H)$ for some separable Hilbert space $H$. Assume that $\phi$ is a normal faithful tracial state. Let $Q\in\mathbb{C}\langle X_i|1\le i\le k\rangle$ be self-adjoint. Let $T_1,\dots,T_k\in\tilde{\mathcal{A}}$ be self-adjoint, and let $Q(T_1,\dots,T_k)$ be the self-adjoint affiliated operator described following Theorem 5.2.31. Then, for any sequence $u_n$ of bounded measurable functions converging, as $n$ goes to infinity, to the identity uniformly on compact subsets of $\mathbb{R}$, the law of $Q(u_n(T_1),\dots,u_n(T_k))$ converges to the law of $Q(T_1,\dots,T_k)$.

The proof of Proposition 5.2.32 is based on the following two auxiliary lemmas.

Lemma 5.2.33 Let $(\mathcal{A},\phi)$ be as in Proposition 5.2.32. Let $T_1,\dots,T_k$ be self-adjoint operators in $\tilde{\mathcal{A}}$, and let $Q\in\mathbb{C}\langle X_i|1\le i\le k\rangle$. Then there exists a constant $m(Q)<\infty$ such that, for any projections $p_1,\dots,p_k\in\mathcal{A}$ so that $T_i'=T_ip_i\in\mathcal{A}$ for $i=1,2,\dots,k$, there exists a projection $p$ such that
$$Q(T_1,\dots,T_k)p=Q(T_1',\dots,T_k')p\,,\qquad \phi(p)\ge1-m(Q)\max_{1\le i\le k}(1-\phi(p_i))\,.$$

Note that part of the statement is that $Q(T_1',\dots,T_k')p\in\mathcal{A}$. In the proof of Proposition 5.2.32, we use Lemma 5.2.33 with the spectral projections $p_i=p_i^n:=E_{T_i}([-n,n])$, which ensure that the $T_ip_i$ belong to $\mathcal{A}$. Since such projections can be chosen with traces arbitrarily close to 1, Lemma 5.2.33 will allow us to define the law of polynomials in affiliated operators by density, as a consequence of the following lemma.

Lemma 5.2.34 Let $(\mathcal{A},\phi)$ be as in Proposition 5.2.32. Let $X,Y$ be two self-adjoint operators in $\tilde{\mathcal{A}}$. Fix $\varepsilon>0$. Assume that there exists a projection $p\in\mathcal{A}$ such that $pXp=pYp$ and $\phi(p)\ge1-\varepsilon$. Then
$$\sup_{x\in\mathbb{R}}|F_X(x)-F_Y(x)|\le\varepsilon\,.$$
Note that the Kolmogorov–Smirnov distance
$$d_{KS}(\mu,\nu):=\max_{x\in\mathbb{R}}|F_\mu(x)-F_\nu(x)|$$
dominates the Levy distance on $M_1(\mathbb{R})$ defined in Theorem C.8. Lemma 5.2.34 shows that, with $X,Y,p,\varepsilon$ as in the statement, $d_{KS}(\mu_X,\mu_Y)\le\varepsilon$.
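In finite dimensions the lemma can be tested directly. The following Python sketch (an illustration with our own choice of matrices, not from the text) takes $\mathcal{A}=M_n(\mathbb{C})$ with the normalized trace, perturbs the first $k$ rows and columns of a symmetric matrix so that the compressions by the projection onto the last $n-k$ coordinates agree, and checks that the Kolmogorov–Smirnov distance of the empirical spectral distributions is at most $\varepsilon=k/n$.

```python
import numpy as np

# Finite-dimensional sanity check of Lemma 5.2.34 (illustrative choices):
# A = M_n(C) with the normalized trace; pXp = pYp for a projection p with
# normalized trace 1 - k/n forces the spectral CDFs within eps = k/n.
rng = np.random.default_rng(0)
n, k = 500, 25                                   # eps = k/n = 0.05
G = rng.standard_normal((n, n)); X = (G + G.T) / 2
B = rng.standard_normal((n, n)) * 10; P = (B + B.T) / 2
Y = X.copy()
Y[:k, :] = P[:k, :]; Y[:, :k] = P[:, :k]         # perturb first k rows/columns
# With p the projection onto the last n-k coordinates, pXp = pYp exactly.
eigX = np.linalg.eigvalsh(X); eigY = np.linalg.eigvalsh(Y)
grid = np.linspace(-400, 400, 8001)
FX = (eigX[None, :] <= grid[:, None]).mean(axis=1)
FY = (eigY[None, :] <= grid[:, None]).mean(axis=1)
assert np.abs(FX - FY).max() <= k / n + 1e-9     # d_KS(mu_X, mu_Y) <= eps
```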
Proof of Lemma 5.2.33 The key to the proof is to show that if $Z\in\tilde{\mathcal{A}}$ and $p$ is a projection, then there exists a projection $q$ such that
$$\phi(q)\ge\phi(p)\quad\text{and}\quad Zq=pZq\,. \qquad(5.2.16)$$
With (5.2.16) granted, we proceed by induction, as follows. Let $S_i\in\tilde{\mathcal{A}}$ and let $p_i$ be projections so that $S_i'=S_ip_i\in\mathcal{A}$, $i=1,2$. (To prepare for the induction argument, at this stage we do not assume that the $S_i$ are self-adjoint.) Write $p_{12}=p_1\wedge p_2$. By (5.2.16) (applied with $p=p_{12}$), there exist two projections $q$ and $q'$ such that $p_{12}S_1'q=S_1'q$, $p_{12}S_2'q'=S_2'q'$. Set $p'=p_1\wedge p_2\wedge q\wedge q'$. We have that $p_2p'=p'$ and $q'p'=p'$, and thus $S_2p'=S_2q'p'$. The range of $S_2q'$ belongs to the range of $p_1$ and of $p_2$ (because $p_{12}S_2'q'=S_2'q'$). Thus
$$S_2p'=S_2q'p'=p_1S_2q'p'=p_1S_2p'=p_1S_2p_2p'\,. \qquad(5.2.17)$$
Therefore
$$S_1S_2p'=S_1'S_2'p'\,, \qquad(5.2.18)$$
where (5.2.17) was used in the last equality. Note that part of the equality is that the image of $S_2p'$ is in the domain of $S_1$, and so $S_1S_2p'\in\mathcal{A}$. Moreover, $\phi(p')\ge1-4\max\phi(1-p_i)$ by Property G.18. We proceed by induction. We first detail the next step, involving the product $S_1S_2S_3$. Set $S=S_2S_3$ and let $p'$ be the projection as in (5.2.18), so that $Sp'=S_2'S_3'p'\in\mathcal{A}$. Repeat the previous step, now with $S$ and $S_1$, yielding a projection $q$ so that $S_1S_2S_3p'q=S_1'S_2'S_3'p'q$. Proceeding by induction, we can thus find a projection $p$ so that $S_1\cdots S_np=S_1'\cdots S_n'p$ with $S_i'=S_ip_i$ and $\phi(p)\ge1-2n\max\phi(1-p_i)$. Similarly, $(S_1+\cdots+S_n)q=(S_1'+\cdots+S_n')q$ if $q=p_1\wedge p_2\wedge\cdots\wedge p_n$. Iterating these two results, for any given polynomial $Q$ we find a finite constant $m(Q)$ such that, for any $T_i'=T_ip_i$ with $\phi(p_i)\ge1-\varepsilon$, $1\le i\le k$, there exists $p$ so that $Q(T_1,\dots,T_k)p=Q(T_1',\dots,T_k')p$ and $\phi(p)\ge1-m(Q)\varepsilon$.
To complete the argument by proving (5.2.16), we write the polar decomposition $(1-p)Z=uT$ (see Lemma G.9), with a self-adjoint nonnegative operator $T=|(1-p)Z|$ and $u$ a partial isometry such that $u$ vanishes on the ortho-complement of the range of $T$. Set $q=1-u^*u$. Noting that $uu^*\le1-p$, we have $\phi(q)\ge\phi(p)$. Also, $qT=(1-u^*u)T=0$ implies that $Tq=0$ since $T$ and $q$ are self-adjoint, and therefore $(1-p)Zq=0$, that is, $Zq=pZq$.

Proof of Lemma 5.2.34 We first claim that, given an unbounded self-adjoint operator $T$ affiliated to $\mathcal{A}$ and a real number $x$, we have
$$F_T(x)=\sup\{\phi(q): q=q^2=q^*\in\mathcal{A},\ qTq\in\mathcal{A},\ qTq\le xq\}\,. \qquad(5.2.19)$$
More precisely, we now prove that the supremum is achieved, in the limit $c\to\infty$, with the projections $q_{T,c}(x)=E_T((-c,x])$ provided by the spectral theorem. At any rate, it is clear that $F_T(x)=\phi(E_T((-\infty,x]))$ is a lower bound for the right side of (5.2.19). To show that $F_T(x)$ is also an upper bound, consider any projection $r\in\mathcal{A}$, with $rTr$ bounded, such that $\phi(r)>F_T(x)$. Put $q=E_T((-\infty,x])$. We have $\phi(r)>\phi(q)$. We have $\phi(r-r\wedge q)=\phi(r\vee q-q)\ge\phi(r)-\phi(q)>0$, using Proposition G.17. Therefore we can find a unit vector $v\in H$ such that $\langle rTrv,v\rangle>x$, thus ruling out the possibility that $\phi(r)$ belongs to the set of numbers on the right side of (5.2.19). This completes the proof of the latter equality.
Consider next the quantity
$$F_{T,p}(x)=\sup\{\phi(q): q=q^2=q^*\in\mathcal{A},\ qTq\in\mathcal{A},\ qTq\le xq,\ q\le p\}\,.$$
We claim that
$$F_T(x)-\varepsilon\le F_{T,p}(x)\le F_T(x)\,. \qquad(5.2.20)$$
The inequality on the right of (5.2.20) is obvious. We get the inequality on the left by taking $q=q_{T,c}(x)\wedge p$ on the right side of the definition of $F_{T,p}(x)$, with $c$ large, and using Proposition G.17 again. Thus, (5.2.20) is proved.
To complete the proof of Lemma 5.2.34, simply note that $F_{X,p}(x)=F_{Y,p}(x)$ by hypothesis, and apply (5.2.20).

Proof of Proposition 5.2.32 Put $T_i^n:=T_ip_i^n$ with $p_i^n=E_{T_i}([-n,n])$. Define the multiplication operator $M_Q:=M_{Q(T_1,\dots,T_k)}$ as in Theorem 5.2.31. By Lemma 5.2.33, we can find a projection $p^n$ such that
$$X^n:=p^nQ(T_1^n,\dots,T_k^n)p^n=p^nQ(T_1,\dots,T_k)p^n=p^nM_Qp^n$$
and $\phi(p^n)\ge1-m(Q)\max_i(1-\mu_{T_i}([-n,n]))$. By Lemma 5.2.34,
$$d_{KS}\big(\mu_{M_Q},\mu_{Q(T_1^n,\dots,T_k^n)}\big)\le m(Q)\max_i\big(1-\mu_{T_i}([-n,n])\big)\,,$$
implying the convergence of the law of $Q(T_1^n,\dots,T_k^n)$ to the law of $M_Q$. Since also by construction $p_i^nT_ip_i^n=w_n(T_i)$ with $w_n(x)=x\mathbf{1}_{|x|\le n}$, we see that we can replace $w_n$ by any other local approximation $u_n$ of the identity, since the difference
$$X^n-p^nQ(u_n(T_1),\dots,u_n(T_k))p^n$$
is uniformly bounded by $c\sup_{|x|\le n}|w_n-u_n|(x)$ for some finite constant $c=c(n,\sup_{|x|\le n}|w_n(x)|,Q)$, and therefore goes to zero when $u_n(x)$ approaches the identity map on $[-n,n]$.

5.3 Free independence

What makes free probability special is the notion of freeness, which we define in Section 5.3.1; it is the noncommutative analog of independence in probability. In some sense, probability theory distinguishes itself from integration theory by the notions of independence and of random variables, which are the basis for treating problems from a different perspective. Similarly, free probability differentiates itself from noncommutative probability by this very notion of freeness, which makes it a noncommutative analog of classical probability.

5.3.1 Independence and free independence

Classical independence of random variables can be defined in the noncommutative context. We assume throughout that $(\mathcal{A},\phi)$ is a noncommutative probability space. Suppose $\{\mathcal{A}_i\}_{i\in I}$ is a family of subalgebras of $\mathcal{A}$, each containing the unit of $\mathcal{A}$. The family is called independent if the algebras $\mathcal{A}_i$ commute and if $\phi(a_1\cdots a_n)=\phi(a_1)\cdots\phi(a_n)$ for $a_i\in\mathcal{A}_{k(i)}$ with $i\ne j\Rightarrow k(i)\ne k(j)$. This is the natural notion of independence when considering tensor products, as is the case in the classical probability example $L^\infty(X,\mathcal{B},\mu)$.
Free independence is a completely different matter.

Definition 5.3.1 Let $\{\mathcal{A}_j\}_{j\in I}$ be a family of subalgebras of $\mathcal{A}$, each containing the unit of $\mathcal{A}$. The family $\{\mathcal{A}_j\}_{j\in I}$ is called freely independent if, for any positive integer $n$, indices $k(1)\ne k(2)$, $k(2)\ne k(3)$, $\dots$, $k(n-1)\ne k(n)$ in $I$ and any $a_j\in\mathcal{A}_{k(j)}$, $j=1,\dots,n$, with $\phi(a_j)=0$, it holds that
$$\phi(a_1\cdots a_n)=0\,.$$
Let $r$ and $(m_k)_{1\le k\le r}$ be positive integers. The sets $(X_{1,p},\dots,X_{m_p,p})_{1\le p\le r}$ of noncommutative random variables are called free if the algebras they generate are free.

Note that, in contrast to the classical notion of independence, repetition of indices is allowed provided they are not consecutive; thus, free independence is a truly noncommutative notion. Note also that it is impossible to have $a_i=1$ in Definition 5.3.1 because of the condition $\phi(a_i)=0$.
Observe that we could have assumed that $\mathcal{A}$, as well as all members of the family $\{\mathcal{A}_i\}_{i\in I}$, are $W^*$-algebras. In that situation, if $\Sigma_i$ is a family of generators of the $W^*$-algebra $\mathcal{A}_i$, then the $W^*$-subalgebras $\{\mathcal{A}_i\}_{i\in I}$ are free iff the families of variables $\{\Sigma_i\}_{i\in I}$ are free.
Remark 5.3.2
(i) Independence and free independence are quite different. Indeed, let $X,Y$ be two self-adjoint elements of a noncommutative probability space $(\mathcal{A},\phi)$ such that $\phi(X)=\phi(Y)=0$ but $\phi(X^2)\ne0$ and $\phi(Y^2)\ne0$. If $X,Y$ commute and are independent, then
$$\phi(XY)=0\,,\qquad \phi(XYXY)=\phi(X^2)\phi(Y^2)\ne0\,,$$
whereas if $X,Y$ are free, then $\phi(XY)=0$ but also $\phi(XYXY)=0$.
(ii) The interest in free independence is that, if the subalgebras $\mathcal{A}_i$ are freely independent, the restrictions of $\phi$ to the $\mathcal{A}_i$ are sufficient in order to compute $\phi$ on the subalgebra generated by all the $\mathcal{A}_i$. To see that, note that it is enough to compute $\phi(a_1a_2\cdots a_n)$ for $a_i\in\mathcal{A}_{k(i)}$ and $k(i)\ne k(i+1)$. But, from the freeness condition,
$$\phi\big((a_1-\phi(a_1)1)(a_2-\phi(a_2)1)\cdots(a_n-\phi(a_n)1)\big)=0\,. \qquad(5.3.1)$$
Expanding the product (using linearity), one can inductively compute $\phi(a_1\cdots a_n)$ as a function of lower order terms (see the worked examples following this remark). We will see a systematic way to perform such computations in Section 5.3.2.
(iii) The law of free sets of noncommutative variables is a continuous function of the laws of the sets. For example, let $\mathbf{X}_p=(X_{1,p},\dots,X_{m,p})$ and $\mathbf{Y}_p=(Y_{1,p},\dots,Y_{n,p})$ be sets of noncommutative variables for each $p$ which are free. Assume that the law of $\mathbf{X}_p$ (respectively, $\mathbf{Y}_p$) converges as $p$ goes to infinity towards the law of $\mathbf{X}=(X_1,\dots,X_m)$ (respectively, $\mathbf{Y}=(Y_1,\dots,Y_n)$).
(a) If the sets $\mathbf{X}$ and $\mathbf{Y}$ are free, then the joint law of $(\mathbf{X}_p,\mathbf{Y}_p)$ converges to the joint law of $(\mathbf{X},\mathbf{Y})$.
(b) If instead the joint law of $(\mathbf{X}_p,\mathbf{Y}_p)$ converges to the joint law of $(\mathbf{X},\mathbf{Y})$, then $\mathbf{X}$ and $\mathbf{Y}$ are free.
(iv) If the restriction of $\phi$ to each of the subalgebras $\{\mathcal{A}_i\}_{i\in I}$ is tracial, then the restriction of $\phi$ to the algebra generated by $\{\mathcal{A}_i\}_{i\in I}$ is also tracial.
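For instance, expanding (5.3.1) in the simplest case $n=2$ gives
$$0=\phi\big((a_1-\phi(a_1)1)(a_2-\phi(a_2)1)\big)=\phi(a_1a_2)-\phi(a_1)\phi(a_2)\,,$$
so that $\phi(a_1a_2)=\phi(a_1)\phi(a_2)$ whenever $a_1,a_2$ lie in free subalgebras. In the same way, for $a,a'\in\mathcal{A}_1$ and $b\in\mathcal{A}_2$ free one finds $\phi(aba')=\phi(aa')\phi(b)$, while for alternating words of length four,
$$\phi(a_1b_1a_2b_2)=\phi(a_1a_2)\phi(b_1)\phi(b_2)+\phi(a_1)\phi(a_2)\phi(b_1b_2)-\phi(a_1)\phi(a_2)\phi(b_1)\phi(b_2)\,.$$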

The proof of some basic properties of free independence that are inherited by
subalgebras is left to Exercise 5.3.8.
The following are standard examples of free variables.

Example 5.3.3
(i) Free products of groups (continuation of Example 5.2.2, part (ii)). Suppose $G$ is a group which is the free product of its subgroups $G_i$, that is, every element in $G$ can be written as a product of elements of the $G_i$, and $g_1g_2\cdots g_n\ne e$ whenever $g_j\in G_{i(j)}\setminus\{e\}$ and $i(j)\ne i(j+1)$ for all $j$. In this setup, we may take as $\mathcal{A}$ the $W^*$-algebra generated by the left regular representation $\lambda(G)$, see part (ii) of Example 5.2.12, and we may take as $\phi$ the trace defined in that example. Take also as $\mathcal{A}_i$ the $W^*$-algebra generated by the left regular representation $\lambda(G_i)$. This coincides with those operators $\sum_g c_g\lambda(g)$ with $c_g=0$ for $g\notin G_i$ that form bounded operators. Now, if $a\in\mathcal{A}_i$ and $\phi(a)=0$ then $c_e=\phi(a)=0$. Thus, if $a_i\in\mathcal{A}_{k(i)}$ with $\phi(a_i)=0$ and $k(i)\ne k(i+1)$, the resulting operator corresponding to $a_1\cdots a_n$, denoted $\sum_g c_g\lambda(g)$, satisfies $c_g\ne0$ only if $g=g_1\cdots g_n$ for $g_i\in G_{k(i)}\setminus\{e\}$. In particular, since $g_1\cdots g_n\ne e$, we have that $c_e=0$, i.e. $\phi(a_1\cdots a_n)=0$, which proves the freeness of the $\mathcal{A}_i$. The converse is also true, that is, if the subalgebras $\mathcal{A}_i$ associated with the subgroups $G_i$ are free, then the subgroups are algebraically free.
(ii) Fock spaces. Let $H$ be a Hilbert space and define the Boltzmann–Fock space as
$$\mathcal{T}=\bigoplus_{n\ge0}H^{\otimes n}\,. \qquad(5.3.2)$$
(Here, $H^{\otimes0}=\mathbb{C}\mathbf{1}$, where $\mathbf{1}$ is an arbitrary unit vector.) $\mathcal{T}$ is itself a Hilbert space (with the inner product determined from the inner product in $H$ by (G.1) and (G.2)). If $\{e_i\}$ is an orthonormal basis in $H$, then $\{\mathbf{1}\}$ is an orthonormal basis for $H^{\otimes0}$, and $\{e_{i_1}\otimes\cdots\otimes e_{i_n}\}$ is an orthonormal basis for $H^{\otimes n}$. An orthonormal basis for $\mathcal{T}$ is constructed naturally from these bases.
For $h\in H$, define $\ell(h)$ to be the left creation operator, $\ell(h)g=h\otimes g$. On the algebra of bounded operators on $\mathcal{T}$, denoted $B(\mathcal{T})$, consider the state given by the vacuum, $\phi(a)=\langle a\mathbf{1},\mathbf{1}\rangle$. We next show that the family of sets $\{\ell(e_i),\ell^*(e_i)\}$, indexed by $i$, is freely independent in $(B(\mathcal{T}),\phi)$. Here, $\ell_i^*:=\ell^*(e_i)$, the left annihilation operator, is the operator adjoint to $\ell_i:=\ell(e_i)$. We have $\ell_i^*\mathbf{1}=0$. More generally,
$$\ell_i^*\,e_{i_1}\otimes e_{i_2}\otimes\cdots\otimes e_{i_n}=\delta_{ii_1}\,e_{i_2}\otimes\cdots\otimes e_{i_n}$$
because, for $g\in\mathcal{T}$ with $(n-1)$th term equal to $g_{n-1}$,
$$\langle e_{i_1}\otimes e_{i_2}\otimes\cdots\otimes e_{i_n},\ell_ig\rangle=\langle e_{i_1}\otimes e_{i_2}\otimes\cdots\otimes e_{i_n},e_i\otimes g_{n-1}\rangle=\langle\delta_{ii_1}\,e_{i_2}\otimes\cdots\otimes e_{i_n},g_{n-1}\rangle\,.$$
Note that even though $\ell_i\ell_i^*$ is typically not the identity, it does hold true that $\ell_i^*\ell_j=\delta_{ij}I$ with $I$ the identity in $B(\mathcal{T})$. Due to that, the algebra generated by $(\ell_i,\ell_i^*,I)$ is generated by the terms $\ell_i^q(\ell_i^*)^p$, $p+q>0$, and $I$. Note also that
$$\phi\big(\ell_i^q(\ell_i^*)^p\big)=\langle(\ell_i^*)^p\mathbf{1},(\ell_i^*)^q\mathbf{1}\rangle=0\,,$$
since at least one of $p,q$ is nonzero. Thus, we need only prove that if $p_k+q_k>0$ and $i_k\ne i_{k+1}$,
$$Z:=\phi\big(\ell_{i_1}^{q_1}(\ell_{i_1}^*)^{p_1}\,\ell_{i_2}^{q_2}(\ell_{i_2}^*)^{p_2}\cdots\ell_{i_n}^{q_n}(\ell_{i_n}^*)^{p_n}\big)=0\,.$$
But necessarily, if $Z\ne0$ then $q_1=0$ (for otherwise a term $e_{i_1}$ pops out on the left of the expression, which will then be annihilated in the scalar product with $\mathbf{1}$). Thus $p_1>0$, and then one must have $q_2=0$, implying in turn $p_2>0$, etc., up to $p_n>0$. But since $(\ell_{i_n}^*)^{p_n}\mathbf{1}=0$, we conclude that $Z=0$.

In classical probability one can create independent random variables by forming products of probability spaces. Analogously, in free probability, one can create free random variables by forming free products of noncommutative probability spaces. More precisely, if $\{(\mathcal{A}_j,\phi_j)\}$ is a family of noncommutative probability spaces, one may construct a noncommutative probability space $(\mathcal{A},\phi)$ equipped with injections $i_j:\mathcal{A}_j\to\mathcal{A}$ such that $\phi_j=\phi\circ i_j$ and the images $i_j(\mathcal{A}_j)$ are free in $\mathcal{A}$.
We now explain the construction of free products in a simplified setting sufficient for the applications we have in mind. We assume each noncommutative probability space $(\mathcal{A}_j,\phi_j)$ is a $C^*$-probability space, $\mathcal{A}_j$ is separable, and the family $\{(\mathcal{A}_j,\phi_j)\}$ is countable. By Corollary 5.2.25, we may assume that $\mathcal{A}_j$ is a $C^*$-subalgebra of $B(H_j)$ for some separable Hilbert space $H_j$, and that for some unit vector $\xi_j\in H_j$ we have $\phi_j(a)=\langle a\xi_j,\xi_j\rangle$ for all $a\in\mathcal{A}_j$. Then the free product $(\mathcal{A},\phi)$ we aim to construct will be a $C^*$-subalgebra of $B(H)$ for a certain separable Hilbert space $H$, and we will have, for some unit vector $\xi\in H$, that $\phi(a)=\langle a\xi,\xi\rangle$ for all $a\in\mathcal{A}$.
We construct $(H,\xi)$ as the free product of the pairs $(H_j,\xi_j)$. Toward that end, given $f\in H_j$, let $\mathring f=f-\langle f,\xi_j\rangle\xi_j\in H_j$ and put $\mathring H_j=\{\mathring f: f\in H_j\}$. Then, for a unit vector $\xi$ in some Hilbert space, which is independent of $j$, put
$$H(j):=\mathbb{C}\xi\oplus\bigoplus_{n\ge1}\ \bigoplus_{\substack{j_1\ne j_2\ne\cdots\ne j_n\\ j_1\ne j}}\mathring H_{j_1}\otimes\mathring H_{j_2}\otimes\cdots\otimes\mathring H_{j_n}\,. \qquad(5.3.3)$$
Let $H$ be defined similarly but without the restriction $j_1\ne j$. Note that all the Hilbert spaces $H(j)$ are closed subspaces of $H$. We equip $B(H)$ with the state $\phi=(a\mapsto\langle a\xi,\xi\rangle)$, and hereafter regard it as a noncommutative probability space.
We need next, for each fixed $j$, to define an embedding of $B(H_j)$ in $B(H)$. Toward that end we define a Hilbert space isomorphism $V_j:H_j\otimes H(j)\to H$ as follows, where $h_j$ denotes a general element of $\mathring H_j$:
$$\xi_j\otimes\xi\mapsto\xi\,,\qquad h_j\otimes\xi\mapsto h_j\,,$$
$$\xi_j\otimes(h_{j_1}\otimes h_{j_2}\otimes\cdots\otimes h_{j_n})\mapsto h_{j_1}\otimes h_{j_2}\otimes\cdots\otimes h_{j_n}\,,$$
$$h_j\otimes(h_{j_1}\otimes h_{j_2}\otimes\cdots\otimes h_{j_n})\mapsto h_j\otimes h_{j_1}\otimes h_{j_2}\otimes\cdots\otimes h_{j_n}\,.$$
Then, given $T\in B(H_j)$, we define $\lambda_j(T)\in B(H)$ by the formula
$$\lambda_j(T)=V_j\,(T\otimes I_{H(j)})\,V_j^*\,,$$
where $I_{H(j)}$ denotes the identity mapping of $H(j)$ to itself. Note that $\lambda_j$ is a norm-preserving $\ast$-homomorphism of $B(H_j)$ into $B(H)$. The crucial feature of the definition is that, for $j\ne j_1\ne j_2\ne\cdots\ne j_m$,
$$\lambda_j(T)(h_{j_1}\otimes\cdots\otimes h_{j_m})=\mathring{(T\xi_j)}\otimes h_{j_1}\otimes\cdots\otimes h_{j_m}+\langle T\xi_j,\xi_j\rangle\,h_{j_1}\otimes\cdots\otimes h_{j_m}\,. \qquad(5.3.4)$$

We have nearly reached our goal. The key point is the following.

Lemma 5.3.4 In the noncommutative probability space $(B(H),\phi)$, the subalgebras $\lambda_j(B(H_j))$ are free.

The lemma granted, we can quickly conclude the construction of the free product $(\mathcal{A},\phi)$, as follows. We take $\mathcal{A}$ to be the $C^*$-subalgebra of $B(H)$ generated by the images $\lambda_j(\mathcal{A}_j)$, we let $\phi$ denote the restriction of the state above to $\mathcal{A}$, and we take $i_j$ to be the restriction of $\lambda_j$ to $\mathcal{A}_j$. It is immediate that the images $i_j(\mathcal{A}_j)$ are free in $(\mathcal{A},\phi)$.
Proof of Lemma 5.3.4 Fix $j_1\ne j_2\ne\cdots\ne j_m$ and operators $T_k\in B(H_{j_k})$ for $k=1,\dots,m$. Note that by definition $\phi(\lambda_{j_k}(T_k))=\langle T_k\xi_{j_k},\xi_{j_k}\rangle$. Put $\mathring T_k=T_k-\langle T_k\xi_{j_k},\xi_{j_k}\rangle I_{j_k}$, where $I_{j_k}$ denotes the identity mapping of $H_{j_k}$ to itself, noting that $\phi(\lambda_{j_k}(\mathring T_k))=0$. By iterated application of (5.3.4) we have
$$\lambda_{j_1}(\mathring T_1)\cdots\lambda_{j_m}(\mathring T_m)\,\xi=\mathring{(T_1\xi_{j_1})}\otimes\cdots\otimes\mathring{(T_m\xi_{j_m})}\in\mathring H_{j_1}\otimes\mathring H_{j_2}\otimes\cdots\otimes\mathring H_{j_m}\,.$$
Since the space on the right is orthogonal to $\xi$, we have
$$\phi\big(\lambda_{j_1}(\mathring T_1)\cdots\lambda_{j_m}(\mathring T_m)\big)=0\,.$$
Thus the $C^*$-subalgebras $\lambda_j(B(H_j))$ are indeed free in $B(H)$ with respect to the state $\phi$.

Remark 5.3.5 In point (i) of Example 5.3.3 the underlying Hilbert space equipped with unit vector is the free product of the pairs $(\ell^2(G_i),v_{e_{G_i}})$, while in point (ii) it is the free product of the pairs $(\bigoplus_{n=0}^\infty\mathbb{C}e_i^{\otimes n},\mathbf{1})$.

Remark 5.3.6 The free product $(\mathcal{A},\phi)$ of a family $\{(\mathcal{A}_j,\phi_j)\}$ can be constructed purely algebraically, using just the spaces $(\mathcal{A}_j,\phi_j)$ themselves, but it is less simple to describe precisely. Given $a\in\mathcal{A}_j$, put $\mathring a=a-\phi_j(a)1_{\mathcal{A}_j}$ and $\mathring{\mathcal{A}}_j=\{\mathring a: a\in\mathcal{A}_j\}$. At the level of vector spaces,
$$\mathcal{A}=\mathbb{C}1_{\mathcal{A}}\oplus\bigoplus_{j_1\ne j_2\ne\cdots\ne j_m}\mathring{\mathcal{A}}_{j_1}\otimes\cdots\otimes\mathring{\mathcal{A}}_{j_m}\,.$$
The injection $i_j:\mathcal{A}_j\to\mathcal{A}$ is given by the formula
$$i_j(a)=\phi_j(a)1_{\mathcal{A}}\oplus\mathring a\in\mathbb{C}1_{\mathcal{A}}\oplus\mathring{\mathcal{A}}_j\subset\mathcal{A}$$
and the state $\phi$ is defined by
$$\phi(1_{\mathcal{A}})=1\,,\qquad \phi\big(\mathring{\mathcal{A}}_{j_1}\otimes\cdots\otimes\mathring{\mathcal{A}}_{j_m}\big)=0\,.$$
Multiplication in $\mathcal{A}$ is obtained, roughly, by simplifying as much as possible when elements of the same algebra $\mathcal{A}_j$ are juxtaposed. Since a rigorous definition takes some effort and is not needed, we do not describe it in detail.

Exercise 5.3.7 In the setting of part (ii) of Example 5.3.3, show that, for all $n\in\mathbb{N}$,
$$\phi\big[(\ell_1+\ell_1^*)^n\big]=\frac1{2\pi}\int_{-2}^2x^n\sqrt{4-x^2}\,dx\,.$$
Hint: expand the left side and show that $\phi(\ell^{p_1}\ell^{p_2}\cdots\ell^{p_n})$, with $\ell^{p_i}=\ell_1$ or $\ell_1^*$, vanishes unless $\sum_{i=1}^n\mathbf{1}_{\ell^{p_i}=\ell_1}=\sum_{i=1}^n\mathbf{1}_{\ell^{p_i}=\ell_1^*}$. Deduce that the left side vanishes when $n$ is odd. Show that, when $n$ is even, the only indices $(p_1,\dots,p_n)$ contributing to the expansion are those for which the path $(X_i=X_{i-1}+\mathbf{1}_{\ell^{p_i}=\ell_1}-\mathbf{1}_{\ell^{p_i}=\ell_1^*})_{1\le i\le n}$, with $X_0=0$, is a Dyck path. Conclude by using Section 2.1.3.
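As a numerical complement, here is a small Python sketch (our own illustration, not part of the exercise) that truncates the Fock space over a single basis vector to levels $0,\dots,N$ and checks the claimed moments; moments of order at most $N$ are unaffected by the truncation, since $(\ell+\ell^*)^n\mathbf{1}$ never leaves the first $n$ levels.

```python
import numpy as np
from math import comb

# Truncated Boltzmann-Fock space over C e_1: the creation operator l is the
# shift sending level n to level n+1 (illustrative truncation level N).
N = 12
l = np.zeros((N + 1, N + 1))
for n in range(N):
    l[n + 1, n] = 1.0
s = l + l.T                               # l + l*, compressed to levels <= N

vac = np.zeros(N + 1); vac[0] = 1.0       # the vacuum vector "1"
for n in range(1, N + 1):
    moment = vac @ np.linalg.matrix_power(s, n) @ vac
    # Even moments are the Catalan numbers (semicircle moments); odd vanish.
    catalan = comb(n, n // 2) // (n // 2 + 1) if n % 2 == 0 else 0
    assert np.isclose(moment, catalan)
```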

Exercise 5.3.8 (i) Show that freely independent algebras can be piled up, as follows. Let $\{\mathcal{A}_i\}_{i\in I}$ be a family of freely independent subalgebras of $\mathcal{A}$. Partition $I$ into subsets $\{I_j\}_{j\in J}$ and denote by $\mathcal{B}_j$ the subalgebra generated by the family $\{\mathcal{A}_i\}_{i\in I_j}$. Show that the family $\{\mathcal{B}_j\}_{j\in J}$ is freely independent. (ii) Show that freeness is preserved under (strong or weak) closures, as follows. Suppose that $(\mathcal{A},\phi)$ is a $C^*$- or $W^*$-probability space. Let $\{\mathcal{A}_i\}_{i\in I}$ be a family consisting of unital subalgebras closed under the involution, and for each index $i\in I$ let $\bar{\mathcal{A}}_i$ be the strong or weak closure of $\mathcal{A}_i$. Show that the family $\{\bar{\mathcal{A}}_i\}_{i\in I}$ is still freely independent.
5.3.2 Free independence and combinatorics

Definition 5.3.1 of free independence is given in terms of the vanishing of certain moments of the variables. It is not particularly easy to handle in computations. We explore in this section the notion of cumulant, which is often much easier to handle.

Basic properties of non-crossing partitions

Whereas classical cumulants are related to moments via a sum over the whole set of partitions, free cumulants are defined with the help of non-crossing partitions (recall Definition 2.1.4). A pictorial description of non-crossing versus crossing partitions was given in Figure 2.1.1.
Before turning to the definition of free cumulants, we need to review key properties of non-crossing partitions. It is convenient to define, for any finite nonempty set $J$ of positive integers, the set $NC(J)$ to be the family of non-crossing partitions of $J$. This makes sense because the non-crossing property of a partition is well defined in the presence of a total ordering. Also, we define an interval in $J$ to be any nonempty subset consisting of consecutive elements of $J$. Given $\pi,\sigma\in NC(J)$ we say that $\pi$ refines $\sigma$ if every block of $\pi$ is contained in some block of $\sigma$, and in this case we write $\pi\le\sigma$. Equipped with this partial order, $NC(J)$ is a poset, that is, a partially ordered set. For $J=\{1,\dots,n\}$, we simply write $NC(n)=NC(J)$. The unique maximal element of $NC(n)$, namely $\{\{1,\dots,n\}\}$, we denote by $1_n$.

Property 5.3.9 For any finite nonempty family $\{\pi_i\}_{i\in J}$ of elements of $NC(n)$ there exist a greatest lower bound $\wedge_{i\in J}\pi_i\in NC(n)$ and a least upper bound $\vee_{i\in J}\pi_i\in NC(n)$ with respect to the refinement partial ordering.

We remark that greatest lower bounds and least upper bounds in a poset are automatically unique. Below, we write $\wedge_{i\in\{1,2\}}\pi_i=\pi_1\wedge\pi_2$ and $\vee_{i\in\{1,2\}}\pi_i=\pi_1\vee\pi_2$.
Proof It is enough to prove existence of the greatest lower bound $\wedge_{i\in J}\pi_i$, for then $\vee_{i\in J}\pi_i$ can be obtained as $\wedge_{k\in K}\sigma_k$, where $\{\sigma_k\}_{k\in K}$ is the family of elements of $NC(n)$ coarser than $\pi_i$ for all $i\in J$. (The family $\{\sigma_k\}$ is nonempty since $1_n$ belongs to it.) It is clear that in the refinement-ordered family of all partitions of $\{1,\dots,n\}$ there exists a greatest lower bound $\pi$ for the family $\{\pi_i\}_{i\in J}$. Finally, it is routine to check that $\pi$ is in fact non-crossing, and hence $\pi=\wedge_{i\in J}\pi_i$.

Remark 5.3.10 As noted in the proof above, for $\pi,\sigma\in NC(n)$, the greatest lower bound of $\pi$ and $\sigma$ in the poset $NC(n)$ coincides with the greatest lower bound in the poset of all partitions of $\{1,\dots,n\}$. But the analogous statement about least upper bounds is false in general.

Property 5.3.11 Let $\pi$ be a non-crossing partition of a finite nonempty set $S$ of positive integers. Let $S_1,\dots,S_m$ be an enumeration of the blocks of $\pi$. For $i=1,\dots,m$ let $\pi_i$ be a partition of $S_i$. Then the partition $\cup_{i=1}^m\pi_i$ of $S$ obtained by combining the $\pi_i$ is non-crossing if and only if $\pi_i$ is non-crossing for $i=1,\dots,m$.

The proof is straightforward and so omitted. But this property bears emphasis because it is crucial for defining free cumulants.

Property 5.3.12 If a partition $\pi$ of a finite nonempty set $S$ of positive integers is non-crossing, then there is at least one block of $\pi$ which is an interval in $S$.

Proof Let $W$ be any block of $\pi$, let $\bar W\supset W$ be the interval in $S$ bounded by the least and greatest elements of $W$, and put $S'=\bar W\setminus W$. If $S'$ is empty, we are done. Otherwise $S'$ is a union of blocks of $\pi$, by the non-crossing property. Let $\pi'$ be the restriction of $\pi$ to $S'$. By induction on the cardinality of $S$, some block $V$ of $\pi'$ is an interval of $S'$, hence $V$ is an interval in $S$ and a block of $\pi$.

Free cumulants and freeness

In classical probability, moments can be written as a sum over partitions of classical cumulants. A similar formula holds in free probability, except that the partitions have to be non-crossing. This relation between moments and free cumulants can be used to define the free cumulants, as follows.
We pause to introduce some notation. Suppose we are given a collection $\{k_n:\mathcal{A}^n\to\mathbb{C}\}_{n=1}^\infty$ of multilinear functionals on a fixed complex algebra $\mathcal{A}$. We define $k_\pi(\{a_i\}_{i\in J})\in\mathbb{C}$ for finite nonempty sets $J$ of positive integers, families $\{a_i\}_{i\in J}$ of elements of $\mathcal{A}$ and $\pi\in NC(J)$ in two stages: first we write $J=\{i_1<\cdots<i_m\}$ and define $k(\{a_i\}_{i\in J})=k_m(a_{i_1},\dots,a_{i_m})$; then we define $k_\pi(\{a_i\}_{i\in J})=\prod_{V\in\pi}k(\{a_i\}_{i\in V})$.

Definition 5.3.13 Let $(\mathcal{A},\phi)$ be a noncommutative probability space. The free cumulants are defined as a collection of multilinear functionals
$$k_n:\mathcal{A}^n\to\mathbb{C}\quad(n\in\mathbb{N})$$
by the following system of equations:
$$\phi(a_1\cdots a_n)=\sum_{\pi\in NC(n)}k_\pi(a_1,\dots,a_n)\,. \qquad(5.3.5)$$
Lemma 5.3.14 The free cumulants are well defined.

Proof We define $\phi_\pi(\{a_i\}_{i\in J})\in\mathbb{C}$ for finite nonempty sets $J$ of positive integers, families $\{a_i\}_{i\in J}$ of elements of $\mathcal{A}$ and $\pi\in NC(J)$ in two stages: first we write $J=\{i_1<\cdots<i_m\}$ and define $\prod_{i\in J}a_i=a_{i_1}\cdots a_{i_m}$; then we define $\phi_\pi(\{a_i\}_{i\in J})=\prod_{V\in\pi}\phi(\prod_{i\in V}a_i)$. If the defining relations (5.3.5) hold then, more generally, we must have
$$\phi_\pi(a_1,\dots,a_n)=\sum_{\substack{\sigma\in NC(n)\\ \sigma\le\pi}}k_\sigma(a_1,\dots,a_n) \qquad(5.3.6)$$
for all $n$, $(a_1,\dots,a_n)\in\mathcal{A}^n$ and $\pi\in NC(n)$, by Property 5.3.11. Since every partial ordering of a finite set can be extended to a linear ordering, the system of linear equations (5.3.6), for fixed $n$ and $(a_1,\dots,a_n)\in\mathcal{A}^n$, has (in effect) a square triangular coefficient matrix with 1s on the diagonal, and hence a unique solution. Thus, the free cumulants are indeed well defined.
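The triangular system (5.3.5) can also be evaluated mechanically. The following Python sketch (our own illustration; all names are ours) computes moments from prescribed free cumulants by brute-force enumeration of non-crossing partitions; with $k_2=1$ and all other cumulants zero it recovers the Catalan moments of the semicircle law (cf. Section 2.1.3).

```python
from math import prod

def partitions(s):
    """Yield all set partitions of the sorted list s."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def crossing(p):
    """pi crosses iff a < b < c < d with a, c in one block and b, d in another."""
    return any(a < b < c < d
               for B in p for C in p if B is not C
               for a in B for c in B for b in C for d in C)

def moment(n, k):
    """m_n = sum over pi in NC(n) of prod over blocks V of k_{|V|}, cf. (5.3.5)."""
    return sum(prod(k.get(len(V), 0) for V in p)
               for p in partitions(list(range(1, n + 1))) if not crossing(p))

# Semicircle: k_2 = 1, all other free cumulants vanish -> Catalan moments.
print([moment(n, {2: 1}) for n in (2, 4, 6, 8)])   # -> [1, 2, 5, 14]
```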

We now turn to the description of freeness in terms of cumulants, which is analogous to the characterization of independence by cumulants in classical probability.

Theorem 5.3.15 Let $(\mathcal{A},\phi)$ be a noncommutative probability space and consider unital subalgebras $\mathcal{A}_1,\dots,\mathcal{A}_m\subset\mathcal{A}$. Then $\mathcal{A}_1,\dots,\mathcal{A}_m$ are free if and only if, for all $n\ge2$ and for all $a_i\in\mathcal{A}_{j(i)}$ with $1\le j(1),\dots,j(n)\le m$,
$$k_n(a_1,\dots,a_n)=0\quad\text{if there exist }1\le l,k\le n\text{ with }j(l)\ne j(k)\,. \qquad(5.3.7)$$

Before beginning the proof of the theorem, we prove a result which explains
why the description of freeness by cumulants does not require any centering of
the variables.

Proposition 5.3.16 Let $(\mathcal{A},\phi)$ be a noncommutative probability space and assume $a_1,\dots,a_n\in\mathcal{A}$ with $n\ge2$. If there is $i\in\{1,\dots,n\}$ so that $a_i=1$, then
$$k_n(a_1,\dots,a_n)=0\,.$$
As a consequence, for $n\ge2$ and any $a_1,\dots,a_n\in\mathcal{A}$,
$$k_n(a_1,\dots,a_n)=k_n\big(a_1-\phi(a_1),a_2-\phi(a_2),\dots,a_n-\phi(a_n)\big)\,.$$

Proof We use induction on $n\ge2$. To establish the induction base, for $n=2$ we have, since $k_1(a)=\phi(a)$,
$$\phi(a_1a_2)=k_2(a_1,a_2)+\phi(a_1)\phi(a_2)$$
and so, if $a_1=1$ or $a_2=1$, we deduce, since $\phi(1)=1$, that $k_2(a_1,a_2)=0$. For the rest of the proof we assume that $n>2$. By induction, we may assume that for $p\le n-1$, $k_p(b_1,\dots,b_p)=0$ if one of the $b_i$ is the identity. Suppose now that $a_i=1$. Then
$$\phi(a_1\cdots a_n)=k_n(a_1,\dots,a_n)+\sum_{\substack{\pi\in NC(n)\\ \pi\ne1_n}}k_\pi(a_1,\dots,a_n)\,, \qquad(5.3.8)$$
where, by our induction hypothesis, all the partitions $\pi$ contributing to the above sum must be such that $\{i\}$ is a block. But then, by the induction hypothesis,
$$\sum_{\substack{\pi\in NC(n)\\ \pi\ne1_n}}k_\pi(a_1,\dots,a_n)=\sum_{\sigma\in NC(n-1)}k_\sigma(a_1,\dots,a_{i-1},a_{i+1},\dots,a_n)=\phi(a_1\cdots a_{i-1}a_{i+1}\cdots a_n)=\phi(a_1\cdots a_n)-k_n(a_1,\dots,a_n)\,,$$
where the second equality is due to the definition of cumulants and the third to (5.3.8). As a consequence, because $\phi(a_1\cdots a_{i-1}a_{i+1}\cdots a_n)=\phi(a_1\cdots a_n)$, we have proved that $k_n(a_1,\dots,a_n)=0$.

Proof of the implication $\Leftarrow$ in Theorem 5.3.15 We assume that the cumulants vanish when evaluated at elements of different algebras $\mathcal{A}_1,\dots,\mathcal{A}_m$ and consider, for $a_i\in\mathcal{A}_{j(i)}$ with $j(i)\ne j(i+1)$ for all $i\in\{1,\dots,n-1\}$, the equation
$$\phi\big((a_1-\phi(a_1))\cdots(a_n-\phi(a_n))\big)=\sum_{\pi\in NC(n)}k_\pi\big(a_1-\phi(a_1),\dots,a_n-\phi(a_n)\big)\,.$$
By our hypothesis, $k_\pi$ vanishes as soon as a block of $\pi$ contains $1\le p,q\le n$ so that $j(p)\ne j(q)$. Therefore, since we assumed $j(p)\ne j(p+1)$ for all $p\in\{1,\dots,n-1\}$, we see that the contribution in the above sum comes from partitions whose blocks cannot contain two nearest neighbors $\{p,p+1\}$ for any $p\in\{1,\dots,n-1\}$. On the other hand, by Property 5.3.12, $\pi$ must contain an interval in $\{1,\dots,n\}$, and the previous remark implies that this interval must be of the form $V=\{p\}$ for some $p\in\{1,\dots,n\}$. But then $k_\pi$ vanishes, since $k_1=0$ by the centering of the variables. Therefore, if $j(p)\ne j(p+1)$ for $1\le p\le n-1$, we get
$$\phi\big((a_1-\phi(a_1))\cdots(a_n-\phi(a_n))\big)=0\,,$$
and hence $\phi$ satisfies (5.3.1).


The next lemma handles an important special case of the implication $\Rightarrow$ in Theorem 5.3.15.
Lemma 5.3.17 If $\mathcal{A}_1,\dots,\mathcal{A}_m$ are free, then for $n\ge2$,
$$k_n(a_1,\dots,a_n)=0\quad\text{if }a_i\in\mathcal{A}_{j(i)}\text{ with }j(1)\ne j(2)\ne\cdots\ne j(n)\,. \qquad(5.3.9)$$

Proof We proceed by induction on $n\ge2$. We have
$$0=\phi\big((a_1-\phi(a_1))\cdots(a_n-\phi(a_n))\big)=\sum_{\pi\in NC(n)}k_\pi\big(a_1-\phi(a_1),\dots,a_n-\phi(a_n)\big)=\sum_{\substack{\pi\in NC(n)\\ \pi\text{ has no singleton blocks}}}k_\pi(a_1,\dots,a_n)\,, \qquad(5.3.10)$$
where the second equality is due to Proposition 5.3.16 and the vanishing $k_1(a_i-\phi(a_i))=0$. To finish the proof of (5.3.9), it is enough to prove that the last sum reduces to $k_n(a_1,\dots,a_n)$. If $n=2$ this is clear; otherwise, for $n>2$, this holds by induction on $n$, using Property 5.3.12.

The next lemma provides the inductive step needed to finish the proof of Theorem 5.3.15.

Lemma 5.3.18 Fix $n\ge2$ and $a_1,\dots,a_n\in\mathcal{A}$. Fix $1\le i\le n-1$ and let $\rho\in NC(n)$ be the non-crossing partition all blocks of which are singletons except for $\{i,i+1\}$. Let $f:\{1,\dots,n\}\to\{1,\dots,n-1\}$ be the unique onto, monotone increasing function such that $f(i)=f(i+1)$, and for $\tau\in NC(n-1)$ let $\hat\tau\in NC(n)$ be the partition whose blocks are of the form $f^{-1}(V)$ with $V$ a block of $\tau$. Then for all $\tau\in NC(n-1)$ we have that
$$k_\tau(a_1,\dots,a_ia_{i+1},\dots,a_n)=\sum_{\substack{\pi\in NC(n)\\ \pi\vee\rho=\hat\tau}}k_\pi(a_1,\dots,a_n)\,. \qquad(5.3.11)$$

Proof Fix $\sigma\in NC(n-1)$ arbitrarily. It will be enough to prove equality after summing both sides of (5.3.11) over $\tau\le\sigma$. Summing the left side of (5.3.11) over $\tau\le\sigma$, we get $\phi_\sigma(a_1,\dots,a_ia_{i+1},\dots,a_n)$ by (5.3.6). Now, summing the right side of (5.3.11) over $\tau\le\sigma$ is the same thing as replacing the sum already there by a sum over $\pi\in NC(n)$ such that $\pi\le\hat\sigma$. Thus, summing the right side of (5.3.11) over $\tau\le\sigma$, we get $\phi_{\hat\sigma}(a_1,\dots,a_n)$ by another application of (5.3.6). But clearly
$$\phi_\sigma(a_1,\dots,a_ia_{i+1},\dots,a_n)=\phi_{\hat\sigma}(a_1,\dots,a_n)\,,$$
and thus (5.3.11) holds.

Proof of the implication $\Rightarrow$ in Theorem 5.3.15 For $n\ge2$, indices $j(1),\dots,j(n)\in\{1,\dots,m\}$ such that $\{j(1),\dots,j(n)\}$ is a set of more than one element, and $a_i\in\mathcal{A}_{j(i)}$ for $i=1,\dots,n$, assuming $\mathcal{A}_1,\dots,\mathcal{A}_m$ are free in $\mathcal{A}$ with respect to $\phi$, we have to prove that $k_n(a_1,\dots,a_n)=0$. We proceed by induction on $n\ge2$. The induction base $n=2$ holds by (5.3.9). Assume for the rest of the proof that $n>2$. Because of (5.3.9), we may assume there exists $i\in\{1,\dots,n-1\}$ such that $j(i)=j(i+1)$. Let $\rho\in NC(n)$ be the unique partition all blocks of which are singletons except for the block $\{i,i+1\}$. In the special case $\tau=1_{n-1}$, equation (5.3.11) after slight rearrangement takes the form
$$k_n(a_1,\dots,a_n)=k_{n-1}(a_1,\dots,a_ia_{i+1},\dots,a_n)-\sum_{\substack{\pi\in NC(n),\ \pi\ne1_n\\ \pi\vee\rho=1_n}}k_\pi(a_1,\dots,a_n)\,. \qquad(5.3.12)$$
In the present case the first of the terms on the right vanishes by induction on $n$. Now each $\pi\in NC(n)$ contributing on the right is of the form $\pi=\{V_i,V_{i+1}\}$ where $i\in V_i$ and $i+1\in V_{i+1}$. Since the function $l\mapsto j(l)$ cannot be constant both on $V_i$ and on $V_{i+1}$ (lest, because $j(i)=j(i+1)$, it be constant on all of $\{1,\dots,n\}$), it follows that every term in the sum on the far right vanishes by induction on $n$. We conclude that $k_n(a_1,\dots,a_n)=0$. The proof of Theorem 5.3.15 is complete.

Exercise 5.3.19 Prove that
$$k_3(a_1,a_2,a_3)=\phi(a_1a_2a_3)-\phi(a_1)\phi(a_2a_3)-\phi(a_1a_3)\phi(a_2)-\phi(a_1a_2)\phi(a_3)+2\phi(a_1)\phi(a_2)\phi(a_3)\,.$$
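The exercise can be verified symbolically: since every partition of $\{1,2,3\}$ is non-crossing, (5.3.5) for $n=3$ can be solved directly for $k_3$. A short sympy sketch (the symbols $p_1,p_{12},\dots$, standing for the moments $\phi(a_1),\phi(a_1a_2),\dots$, are our own notation):

```python
import sympy as sp

# Solve (5.3.5) for n = 3. NC(3) consists of all 5 partitions of {1,2,3}:
#   phi(a1 a2 a3) = k3 + k1(a1)k2(a2,a3) + k1(a2)k2(a1,a3)
#                      + k1(a3)k2(a1,a2) + k1(a1)k1(a2)k1(a3).
p1, p2, p3, p12, p13, p23, p123 = sp.symbols('p1 p2 p3 p12 p13 p23 p123')
k1 = {1: p1, 2: p2, 3: p3}
k2 = {(1, 2): p12 - p1 * p2, (1, 3): p13 - p1 * p3, (2, 3): p23 - p2 * p3}
k3 = p123 - (k1[1] * k2[(2, 3)] + k1[2] * k2[(1, 3)]
             + k1[3] * k2[(1, 2)] + k1[1] * k1[2] * k1[3])
target = p123 - p1 * p23 - p2 * p13 - p3 * p12 + 2 * p1 * p2 * p3
assert sp.expand(k3 - target) == 0   # matches the formula of the exercise
```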

5.3.3 Consequence of free independence: free convolution

We postpone giving a direct link between free independence and random matrices in order to first exhibit some consequences of free independence, often described as free harmonic analysis. We will consider two self-adjoint noncommutative variables $a$ and $b$. Our goal is to determine the law of $a+b$ or of $ab$ when $a,b$ are free. Since the law of $(a,b)$ with $a,b$ free is uniquely determined by the laws $\mu_a$ of $a$ and $\mu_b$ of $b$ (see part (ii) of Remark 5.3.2), the law of their sum (respectively, product) is a function of $\mu_a$ and $\mu_b$, denoted by $\mu_a\boxplus\mu_b$ (respectively, $\mu_a\boxtimes\mu_b$). There are several approaches to these questions; we will first detail a purely combinatorial approach based on free cumulants and then mention an algebraic approach based on the Fock space representations (see part (ii) of Example 5.3.3). These two approaches concern the case where the probability measures $\mu_a,\mu_b$ have compact support (that is, $a$ and $b$ are bounded). We will generalize the results to unbounded variables in Section 5.3.5.
Free additive convolution

Definition 5.3.20 Let $a,b$ be two noncommutative variables in a noncommutative probability space $(\mathcal{A},\phi)$ with laws $\mu_a,\mu_b$ respectively. If $a,b$ are free, then the law of $a+b$ is denoted $\mu_a\boxplus\mu_b$.

We use $k_n(a)=k_n(a,\dots,a)$ to denote the $n$th cumulant of the variable $a$.

Lemma 5.3.21 Let $a,b$ be two bounded operators in a noncommutative probability space $(\mathcal{A},\phi)$. If $a$ and $b$ are free, then for all $n\ge1$,
$$k_n(a+b)=k_n(a)+k_n(b)\,.$$

Proof The result is obvious for $n=1$ by linearity of $k_1$. Moreover, for all $n\ge2$, by multilinearity of the cumulants,
$$k_n(a+b)=\sum_{\varepsilon_i\in\{0,1\}}k_n\big(\varepsilon_1a+(1-\varepsilon_1)b,\dots,\varepsilon_na+(1-\varepsilon_n)b\big)=k_n(a)+k_n(b)\,,$$
where the second equality is a consequence of Theorem 5.3.15.


Definition 5.3.22 For a bounded operator $a$, the formal power series
$$R_a(z)=\sum_{n\ge0}k_{n+1}(a)z^n$$
is called the R-transform of the law $\mu_a$. We also write $R_{\mu_a}:=R_a$ since $R_a$ only depends on the law $\mu_a$.

By Lemma 5.3.21, the R-transform is to free probability what the log-Fourier transform is to classical probability, in the sense that it is linear for free additive convolution, as stated in the next corollary.

Corollary 5.3.23 Let $a,b$ be two bounded operators in a noncommutative probability space $(\mathcal{A},\phi)$. If $a$ and $b$ are free, we have
$$R_{\mu_a\boxplus\mu_b}=R_{\mu_a}+R_{\mu_b}\,,$$
where the equality holds between formal power series.

We next provide a more tractable definition of the R-transform in terms of the Stieltjes transform. Let $\mu:\mathbb{C}[X]\to\mathbb{C}$ be a distribution in the sense of Definition 5.2.3 and define the formal power series
$$G_\mu(z):=\sum_{n\ge0}\mu(X^n)z^{-(n+1)}\,. \qquad(5.3.13)$$
Let $K_\mu(z)$ be the formal inverse of $G_\mu$, i.e. $G_\mu(K_\mu(z))=z$. The formal power series expansion of $K_\mu$ is
$$K_\mu(z)=\frac1z+\sum_{n=1}^\infty C_nz^{n-1}\,.$$

Lemma 5.3.24 Let $\mu$ be a compactly supported probability measure. For every integer $n\ge1$, $C_n=k_n$, and so we have equality in the sense of formal series
$$R_\mu(z)=K_\mu(z)-1/z\,.$$

Proof Consider the generating function of the cumulants as the formal power series
$$C_a(z)=1+\sum_{n=1}^\infty k_n(a)z^n$$
and the generating function of the moments as the formal power series
$$M_a(z)=1+\sum_{n=1}^\infty m_n(a)z^n$$
with $m_n(a):=\phi(a^n)$. We will prove that
$$C_a(zM_a(z))=M_a(z)\,. \qquad(5.3.14)$$
The rest of the proof is pure algebra since, with
$$G_a(z):=G_{\mu_a}(z)=z^{-1}M_a(z^{-1})\,,\qquad R_a(z):=z^{-1}(C_a(z)-1)\,,$$
(5.3.14) then gives $C_a(G_a(z))=zG_a(z)$ and so, by composition with $K_a$,
$$zR_a(z)+1=C_a(z)=zK_a(z)\,.$$
This equality proves that $k_n=C_n$ for $n\ge1$. To derive (5.3.14), we will first show that
$$m_n(a)=\sum_{s=1}^n\ \sum_{\substack{i_1,\dots,i_s\in\{0,1,\dots,n-s\}\\ i_1+\cdots+i_s=n-s}}k_s(a)\,m_{i_1}(a)\cdots m_{i_s}(a)\,. \qquad(5.3.15)$$
With (5.3.15) granted, (5.3.14) follows readily since
$$M_a(z)=1+\sum_{n=1}^\infty m_n(a)z^n=1+\sum_{n=1}^\infty\sum_{s=1}^n\ \sum_{\substack{i_1,\dots,i_s\in\{0,1,\dots,n-s\}\\ i_1+\cdots+i_s=n-s}}k_s(a)z^s\,m_{i_1}(a)z^{i_1}\cdots m_{i_s}(a)z^{i_s}=1+\sum_{s=1}^\infty k_s(a)z^s\Big(\sum_{i=0}^\infty m_i(a)z^i\Big)^s=C_a(zM_a(z))\,.$$
To prove (5.3.15), recall that, by the definition of the cumulants,
$$m_n(a)=\sum_{\pi\in NC(n)}k_\pi(a)\,.$$
Given a non-crossing partition $\pi=\{V_1,\dots,V_r\}\in NC(n)$, write $V_1=(1,v_2,\dots,v_s)$ with $s=|V_1|\in\{1,\dots,n\}$. Since $\pi$ is non-crossing, we see that for any $l\in\{2,\dots,r\}$ there exists $k\in\{1,\dots,s\}$ so that the elements of $V_l$ lie between $v_k$ and $v_{k+1}$. Here $v_{s+1}=n+1$ by convention. This means that $\pi$ decomposes into $V_1$ and at most $s$ other (non-crossing) partitions $\pi_1,\dots,\pi_s$. Therefore
$$k_\pi=k_s\,k_{\pi_1}\cdots k_{\pi_s}\,.$$
If we let $i_k$ denote the number of elements in $\pi_k$, we thus have proved that
$$m_n(a)=\sum_{s=1}^nk_s(a)\sum_{\substack{\pi_k\in NC(i_k)\\ i_1+\cdots+i_s=n-s}}k_{\pi_1}(a)\cdots k_{\pi_s}(a)=\sum_{s=1}^nk_s(a)\sum_{\substack{i_1+\cdots+i_s=n-s\\ i_k\ge0}}m_{i_1}(a)\cdots m_{i_s}(a)\,,$$
where we used again the relation (5.3.5) between cumulants and moments. The proof of (5.3.15), and hence of the lemma, is thus complete.
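Lemma 5.3.24 can be checked numerically in a concrete case. The following Python sketch (illustrative; the semicircle formulas are those of Example 5.3.26 below) verifies $G(K(z))=z$ for $G(z)=(z-\sqrt{z^2-4})/2$ and $K(z)=1/z+z$, so that $R(z)=K(z)-1/z=z$ for the standard semicircle law.

```python
import numpy as np

# Check G(K(z)) = z for the standard semicircle law (illustrative test points).
def G(z):
    # principal branch; appropriate here since Re(K(z)) > 0 at the test points
    return (z - np.sqrt(z * z - 4 + 0j)) / 2

for z in (0.3 + 0.1j, 0.05 - 0.2j):
    K = 1 / z + z                    # K(z) = 1/z + R(z) with R(z) = z
    assert np.isclose(G(K), z)
```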

We now digress by rapidly describing the original proof of Corollary 5.3.23, due to Voiculescu. The idea is that, since laws only depend on moments, one can choose a specific representation of the free noncommutative variables $a,b$ with given marginal distributions to actually compute the law of $a+b$. A standard choice is then to use left creation and annihilation operators, as described in part (ii) of Example 5.3.3. Let $\mathcal{T}$ denote the Fock space described in (5.3.2) and let $\ell_i=\ell(e_i)$, $i=1,2$, be two creation operators on $\mathcal{T}$.
Lemma 5.3.25 Let $(\alpha_{j,i},\ i=1,2,\ j\in\mathbb{N})$ be complex numbers and consider the operators on $\mathcal{T}$
$$a_i=\ell_i+\alpha_{0,i}I+\sum_{j=1}^\infty\alpha_{j,i}(\ell_i^*)^j\,,\quad i=1,2\,.$$
Then, denoting in short $(\ell_i^*)^0=I$ for $i=1,2$, we have that
$$a_1+a_2=(\ell_1+\ell_2)+\sum_{j=0}^\infty\alpha_{j,1}(\ell_1^*)^j+\sum_{j=0}^\infty\alpha_{j,2}(\ell_2^*)^j \qquad(5.3.16)$$
and
$$a_3=\ell_1+\sum_{j=0}^\infty\alpha_{j,1}(\ell_1^*)^j+\sum_{j=0}^\infty\alpha_{j,2}(\ell_1^*)^j \qquad(5.3.17)$$
possess the same distribution in the noncommutative probability space $(\mathcal{T},\langle\cdot\,\mathbf{1},\mathbf{1}\rangle)$.

In the above lemma, infinite sums are formal. The law of the associated operators is still well defined since the $(\ell_i^*)^j$ with $j\ge M$ do not contribute to moments of order smaller than $M$; thus, any finite family of moments is well defined.
Proof We need to show that the traces $\langle a_3^k\mathbf{1},\mathbf{1}\rangle$ and $\langle(a_1+a_2)^k\mathbf{1},\mathbf{1}\rangle$ are equal for all positive integers $k$. Comparing (5.3.16) and (5.3.17), there is a bijection between each term in the sum defining $a_1+a_2$ and the sum defining $a_3$, which extends to the expansions of $a_3^k$ and $(a_1+a_2)^k$. We thus only need to compare the vacuum expectations of individual terms; for $\langle a_3^k\mathbf{1},\mathbf{1}\rangle$ they are of the form $Z:=\langle w_1w_2\cdots w_n\mathbf{1},\mathbf{1}\rangle$ where $w_i\in\{\ell_1,\ell_1^*\}$, whereas the expansion of $\langle(a_1+a_2)^k\mathbf{1},\mathbf{1}\rangle$ yields similar terms, except that $\ell_1$ has to be replaced by $\ell_1+\ell_2$ and some of the $\ell_1^*$ by $\ell_2^*$. Note, however, that $Z\ne0$ if and only if the sequence $w_1,w_2,\dots,w_n$ is a Dyck path, i.e. the walk defined by it forms a nonnegative excursion that returns to 0 at time $n$ (replacing the symbol $\ell$ by $+1$ and $\ell^*$ by $-1$). But, since $\ell_i^*(\ell_1+\ell_2)=I=\ell_i^*\ell_i$ for $i=1,2$, the value of $Z$ is unchanged under the rules described above, which completes the proof.

To deduce another proof of Lemma 5.3.21 from Lemma 5.3.25, we next show that the cumulants of the distribution of an operator of the form
$$a=\ell+\sum_{j\ge0}\alpha_j(\ell^*)^j\,,$$
for some creation operator $\ell$ on $\mathcal{T}$, are given by $k_{i+1}=\alpha_i$. To prove this point, we compute the moments of $a$. By definition,
$$\langle a^n\mathbf{1},\mathbf{1}\rangle=\Big\langle\Big(\ell+\sum_{j\ge0}\alpha_j(\ell^*)^j\Big)^n\mathbf{1},\mathbf{1}\Big\rangle=\sum_{i(1),\dots,i(n)\in\{-1,0,\dots,n-1\}}\alpha_{i(1)}\cdots\alpha_{i(n)}\,\big\langle\ell^{(i(1))}\cdots\ell^{(i(n))}\mathbf{1},\mathbf{1}\big\rangle\,,$$
where for $j=-1$ we wrote $\ell^{(-1)}$ for $\ell$ and set $\alpha_{-1}=1$, while $\ell^{(j)}=(\ell^*)^j$ for $j\ge0$, and we further observed that mixed moments vanish if some $i(l)\ge n$. Recall now that $\langle\ell^{(i(1))}\cdots\ell^{(i(n))}\mathbf{1},\mathbf{1}\rangle$ vanishes except if the path $(i(1),\dots,i(n))$ forms a nonnegative excursion that returns to the origin at time $n$, that is,
$$i(1)+\cdots+i(m)\ge0\ \text{ for all }m\le n\,,\quad\text{and}\quad i(1)+\cdots+i(n)=0\,. \qquad(5.3.18)$$
(Such a path is not in general a Dyck path, since the $(i(p),\,1\le p\le n)$ may take any values in $\{-1,0,\dots,n-1\}$.) We thus have proved that
$$\langle a^n\mathbf{1},\mathbf{1}\rangle=\sum_{\substack{i(1),\dots,i(n)\in\{-1,\dots,n-1\}\\ \sum_{p=1}^mi(p)\ge0,\ \sum_{p=1}^ni(p)=0}}\alpha_{i(1)}\cdots\alpha_{i(n)}\,. \qquad(5.3.19)$$

Define next a bijection between the set of integers $(i(1),\dots,i(n))$ satisfying (5.3.18) and non-crossing partitions $\pi=\{V_1,\dots,V_r\}$ by $i(m)=|V_i|-1$ if $m$ is the first element of the block $V_i$, and $i(m)=-1$ otherwise. To see it is a bijection: given a partition, the numbers $(i(1),\dots,i(n))$ satisfy (5.3.18). Reciprocally, given the numbers $(i(1),\dots,i(n))$, we have a unique non-crossing partition $\pi=(V_1,\dots,V_k)$ satisfying $|V_i|=i(m)+1$ with $m$ the first point of $V_i$. It is drawn inductively by removing block intervals, which are sequences of indices such that $\{i(m)=p,\ i(m+k)=-1,\ 1\le k\le p\}$ (including $p=0$, in which case an interval is $\{i(m)=0\}$). Such a block must exist by the second assumption in (5.3.18). Fixing such intervals as blocks of the partition, we can remove the corresponding indices and search for intervals in the corresponding subset $S$ of $\{i(k),\,1\le k\le n\}$. The indices in $S$ also satisfy (5.3.18), so that we can continue the construction until no indices are left.
This bijection allows us to replace the summation over the $i(k)$ in (5.3.19) by a summation over non-crossing partitions, to obtain
$$\langle a^n\mathbf{1},\mathbf{1}\rangle=\sum_{\pi=(V_1,\dots,V_r)}\alpha_{|V_1|-1}\cdots\alpha_{|V_r|-1}\,.$$
Thus, by the definition (5.3.5) of the cumulants, we deduce that, for all $i\ge1$, $\alpha_{i-1}=k_i$, with $k_i$ the $i$th cumulant. Therefore, Lemma 5.3.25 is equivalent to the additivity of the free cumulants of Lemma 5.3.21, and the rest of the analysis is similar.

Example 5.3.26 Consider the standard semicircle law $\mu_a(dx)=\sigma(x)\,dx$, with $\sigma(x)=\frac1{2\pi}\sqrt{4-x^2}\,\mathbf{1}_{|x|\le2}$. By Lemma 2.1.3 and Remark 2.4.2,
$$G_{\mu_a}(z)=\frac{z-\sqrt{z^2-4}}2\,.$$
Thus, $K_{\mu_a}(z)=z^{-1}+z$. In particular, the R-transform of the semicircle law is the linear function $z$, and summing two (freely independent) semicircular variables yields again a semicircular variable, with a different variance. Indeed, repeating the computation above, the R-transform of a semicircle law with support $[-\alpha,\alpha]$ (or, equivalently, with variance $\alpha^2/4$) is $\alpha^2z/4$. Note here that the linearity of the R-transform is equivalent to $k_n(a)=0$ except if $n=2$, and $k_2(a)=\alpha^2/4=\phi(a^2)$.

Exercise 5.3.27 (i) Let $\mu=\frac12(\delta_{+1}+\delta_{-1})$. Show that $G_\mu(z)=(z^2-1)^{-1}z$ and
$$R_\mu(z)=\frac{\sqrt{1+4z^2}-1}{2z}$$
with the appropriate branch of the square root. Deduce that $G_{\mu\boxplus\mu}(z)=\frac1{\sqrt{z^2-4}}$. Recall that if $\sigma$ is the standard semicircle law $d\sigma(x)=\sigma(x)dx$, then $G_\sigma(z)=\frac12(z-\sqrt{z^2-4})$. Deduce, by derivations and integration by parts, that
$$\frac12\big(1-zG_{\mu\boxplus\mu}(z)\big)=-\frac1{2\pi}\int_{-2}^2\frac{x}{(z-x)\sqrt{4-x^2}}\,dx\,.$$
Conclude that $\mu\boxplus\mu$ is absolutely continuous with respect to Lebesgue measure, with density proportional to $\mathbf{1}_{|x|\le2}\,(4-x^2)^{-\frac12}$.
(ii) (Free Poisson) Let $\lambda>0$. Show that if one takes $p_n(dx)=(1-\frac\lambda n)\delta_0+\frac\lambda n\delta_1$, then $p_n^{\boxplus n}$ converges to a limit $p$ whose R-transform is given by
$$R(z)=\frac\lambda{1-z}\,.$$
Deduce that $p$ is the Marcenko–Pastur law given, if $\lambda>1$, by
$$p(dx)=\tilde p(dx):=\frac1{2\pi x}\sqrt{4\lambda-(x-(\lambda+1))^2}\;\mathbf{1}_{[(1-\sqrt\lambda)^2,(1+\sqrt\lambda)^2]}(x)\,dx\,,$$
and, for $\lambda<1$, by $p=(1-\lambda)\delta_0+\tilde p$.
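Part (i) can also be illustrated with random matrices. Independent Haar-conjugated matrices are asymptotically free (a link with random matrices made precise later in this chapter), so the spectrum of $A+UAU^*$, with $A$ a symmetry with eigenvalues $\pm1$ in equal proportion and $U$ Haar unitary, should approach the arcsine law $\mu\boxplus\mu$ on $[-2,2]$. The Python sketch below is our own illustration, with all parameters chosen for convenience.

```python
import numpy as np

# Compare the spectrum of A + U A U* with the arcsine law on [-2, 2],
# whose CDF is F(x) = 1/2 + arcsin(x/2)/pi (illustrative simulation).
rng = np.random.default_rng(0)
n = 2000
A = np.diag([1.0] * (n // 2) + [-1.0] * (n // 2))      # law (d_1 + d_{-1})/2
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, R = np.linalg.qr(Z)
U = U * (np.diagonal(R) / np.abs(np.diagonal(R)))      # phase fix: Haar unitary
eig = np.linalg.eigvalsh(A + U @ A @ U.conj().T)

for x in (-1.5, -0.5, 0.5, 1.5):
    assert abs(np.mean(eig <= x) - (0.5 + np.arcsin(x / 2) / np.pi)) < 0.05
```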

Multiplicative free convolution

We consider again two bounded self-adjoint operators $a,b$ in a noncommutative probability space $(\mathcal{A},\phi)$ with laws $\mu_a$ and $\mu_b$, but now study the law of $ab$, that is, the collection of moments $\{\phi((ab)^n),\,n\in\mathbb{N}\}$. Note that $ab$ need not be a self-adjoint operator. In the case where $\phi$ is tracial and $a$ is self-adjoint and positive, we can, however, rewrite $\phi((ab)^n)=\phi((a^{\frac12}ba^{\frac12})^n)$, so that the law of $ab$ coincides with the spectral measure of $a^{\frac12}ba^{\frac12}$ when $b$ is self-adjoint. However, the following analysis of the family $\{\phi((ab)^n),\,n\in\mathbb{N}\}$ holds in a more general context, where these quantities might not be related to a spectral measure.

Definition 5.3.28 Let $a,b$ be two noncommutative variables in a noncommutative probability space $(\mathcal{A},\phi)$ with laws $\mu_a$ and $\mu_b$ respectively. If $a$ and $b$ are free, the law of $ab$ is denoted $\mu_a\boxtimes\mu_b$.

Denote by $m_a$ the generating function of the moments, that is, the formal power series
$$m_a(z):=\sum_{n\ge1}\phi(a^n)z^n=M_a(z)-1\,.$$
When $\phi(a)\ne0$, $m_a$ is invertible as a formal power series. Denote by $m_a^{-1}$ its (formal) inverse. We then define the following.

Definition 5.3.29 Assume $\phi(a)\ne0$. The S-transform of $a$ is given by
$$S_a(z):=\frac{1+z}z\,m_a^{-1}(z)\,.$$

We next prove that the S-transform plays the same role in free probability that the
Mellin transform does in classical probability.

Lemma 5.3.30 Let $a,b$ be two free bounded operators in a noncommutative probability space $(\mathcal{A},\phi)$, so that $\phi(a)\ne0$, $\phi(b)\ne0$. Then
$$S_{ab}(z)=S_a(z)S_b(z)\,.$$

See Exercise 5.3.31 for extensions of Lemma 5.3.30 to the case where either $\phi(a)$ or $\phi(b)$ vanishes.
Proof The idea is to use the structure of non-crossing partitions to relate the generating functions
$$M_{ab}(z)=\sum_{n\ge0}\phi((ab)^n)z^n\,,\qquad M_{cd}^d(z)=\sum_{n\ge0}\phi\big(d(cd)^n\big)z^n\,,$$
where $(c,d)=(a,b)$ or $(b,a)$. Note first that, from Theorem 5.3.15,
$$\phi((ab)^n)=\phi(abab\cdots ab)=\sum_{\pi\in NC(2n)}k_\pi(a,b,\dots,a,b)=\sum_{\substack{\pi_1\in NC(1,3,\dots,2n-1),\ \pi_2\in NC(2,4,\dots,2n)\\ \pi_1\cup\pi_2\in NC(2n)}}k_{\pi_1}(a)\,k_{\pi_2}(b)\,.$$
The last formula is symmetric in $a,b$, so that, even if $\phi$ is not tracial, $\phi((ab)^n)=\phi((ba)^n)$ for all $n\ge1$. We use below the notation $P(\text{odd})$ and $P(\text{even})$ for the partitions of the odd, respectively even, positive integers. Fix the first block $V_1=\{v_1,\dots,v_s\}$ in the partition $\pi_1$. We denote by $W_1,\dots,W_s$ the intervals between the elements of $V_1\cup\{2n\}$. For $k=1,\dots,s$, the sum over the non-crossing partitions of $W_k$ corresponds to a word $b(ab)^{i_k}$ if $|W_k|=2i_k+1=v_{k+1}-v_k-1$. Therefore we have
$$\phi((ab)^n)=\sum_{s=1}^nk_s(a)\sum_{\substack{i_1+\cdots+i_s=n-s\\ i_k\ge0}}\prod_{k=1}^s\ \sum_{\substack{\pi_1\in P(\text{odd}),\,\pi_2\in P(\text{even})\\ \pi_1\cup\pi_2\in NC(\{1,\dots,2i_k+1\})}}k_{\pi_1}(b)\,k_{\pi_2}(a)=\sum_{s=1}^nk_s(a)\sum_{\substack{i_1+\cdots+i_s=n-s\\ i_k\ge0}}\prod_{k=1}^s\phi\big(b(ab)^{i_k}\big)\,. \qquad(5.3.20)$$
Now we can do the same for $\phi(b(ab)^n)$ by fixing the first block $V_1=(v_1,\dots,v_s)$ in the partition of the $b$s (on the odd numbers); the corresponding first intervals are $\{v_k+1,\dots,v_{k+1}-1\}$ for $k\le s-1$ (representing the words of the form $(ab)^{i_k}a$, with $i_k=\frac12(v_{k+1}-v_k)-1$), whereas the last interval $\{v_s+1,\dots,2n+1\}$ corresponds to a word of the form $(ab)^{i_0}$ with $i_0=\frac12(2n+1-v_s)$. Thus we get, for $n\ge0$,
$$\phi\big(b(ab)^n\big)=\sum_{s=0}^nk_{s+1}(b)\sum_{\substack{i_0+\cdots+i_s=n-s\\ i_k\ge0}}\phi\big((ab)^{i_0}\big)\prod_{k=1}^s\phi\big(a(ba)^{i_k}\big)\,. \qquad(5.3.21)$$
Set $c_a(z):=\sum_{n\ge1}k_n(a)z^n$. Summing (5.3.20) and (5.3.21) yields the relations
$$M_{ab}(z)=1+c_a\big(zM_{ab}^b(z)\big)\,,$$
$$M_{ab}^b(z)=\sum_{s\ge0}z^sk_{s+1}(b)\,M_{ab}(z)\,M_{ba}^a(z)^s=\frac{M_{ab}(z)}{zM_{ba}^a(z)}\,c_b\big(zM_{ba}^a(z)\big)\,.$$
Since $M_{ab}=M_{ba}$, we deduce that
$$M_{ab}(z)-1=c_a\big(zM_{ab}^b(z)\big)=c_b\big(zM_{ba}^a(z)\big)=\frac{zM_{ab}^b(z)M_{ba}^a(z)}{M_{ab}(z)}\,,$$
which yields, noting that $c_a,c_b$ are invertible as formal power series since $k_1(a)=\phi(a)\ne0$ and $k_1(b)=\phi(b)\ne0$ by assumption,
$$c_a^{-1}\big(M_{ab}(z)-1\big)\,c_b^{-1}\big(M_{ab}(z)-1\big)=zM_{ab}(z)\big(M_{ab}(z)-1\big)\,. \qquad(5.3.22)$$
Finally, from the equality (5.3.14) (note here that $c_a=C_a-1$), if $m_a=M_a-1$, then
$$m_a(z)=c_a\big(z(1+m_a(z))\big)\quad\Longrightarrow\quad c_a^{-1}(z)=(1+z)\,m_a^{-1}(z)=zS_a(z)\,.$$
Therefore, (5.3.22) implies
$$z^2S_a(z)S_b(z)=(1+z)\,z\,m_{ab}^{-1}(z)=z^2S_{ab}(z)\,,$$
which completes the proof of the lemma.
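As an illustration of Definition 5.3.29 and Lemma 5.3.30, one can check a known closed form symbolically: for the Marcenko–Pastur law with $\lambda=1$ (whose moments are the Catalan numbers, see Exercise 5.3.27(ii)), the S-transform is the standard expression $S(z)=1/(1+z)$, which by Definition 5.3.29 is equivalent to $m_a^{-1}(z)=z/(1+z)^2$. The sympy sketch below (our own illustration) verifies this order by order.

```python
import sympy as sp

# Marcenko-Pastur, lambda = 1: moments are Catalan numbers, and the claim
# S(z) = 1/(1+z) amounts to m(z/(1+z)^2) = z as a formal power series.
z = sp.symbols('z')
N = 8
m = sum(sp.catalan(n) * z**n for n in range(1, N + 1))   # m_a(z), truncated
w = z / (1 + z)**2                                       # candidate m_a^{-1}
comp = sp.series(m.subs(z, w), z, 0, N + 1).removeO()
assert sp.expand(comp - z) == 0                          # holds to order N
```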


Exercise 5.3.31 In the case where $a$ is a self-adjoint operator such that $\phi(a)=0$ but $a\ne0$, define $m_a^{-1}$, the inverse of $m_a$, as a formal power series in $\sqrt z$. Define the S-transform $S_a(z)=(z^{-1}+1)\,m_a^{-1}(z)$ and extend Lemma 5.3.30 to the case where $\phi(a)$ or $\phi(b)$ may vanish.
Hint: note that $\phi(a^2)\ne0$, so that $m_a(z)=\phi(a^2)z^2+\sum_{m\ge3}\phi(a^m)z^m$ has formal inverse
$$m_a^{-1}(z)=\phi(a^2)^{-\frac12}\sqrt z-\big(\phi(a^3)/2\phi(a^2)^2\big)z+\cdots\,,$$
which is a formal power series in $\sqrt z$.

5.3.4 Free central limit theorem

In view of the free harmonic analysis that we developed in the previous sections, which is analogous to the classical one, it is no surprise that standard results from classical probability can be generalized to the noncommutative setting. One of the most important such generalizations is the free central limit theorem.

Lemma 5.3.32 Let $\{a_i\}_{i\in\mathbb{N}}$ be a family of free self-adjoint random variables in a noncommutative probability space with a tracial state $\phi$. Assume that, for all $k\in\mathbb{N}$,
$$\sup_j|\phi(a_j^k)|<\infty\,. \qquad(5.3.23)$$
Assume $\phi(a_i)=0$, $\phi(a_i^2)=1$. Then
$$X_N=\frac1{\sqrt N}\sum_{i=1}^Na_i$$
converges in law, as $N$ goes to infinity, to a standard semicircle distribution.
Proof Note that by (5.3.23) the cumulants of words in the $a_i$ are well defined and finite. Moreover, by Lemma 5.3.21, for all $p\ge1$ we have
$$k_p(X_N)=k_p\Big(\sum_{i=1}^N\frac{a_i}{\sqrt N}\Big)=N^{-\frac p2}\sum_{i=1}^Nk_p(a_i)\,.$$
Since, for each $p$, the $\{k_p(a_i)\}_{i=1}^\infty$ are bounded uniformly in $i$, we get, for $p\ge3$,
$$\lim_{N\to\infty}k_p(X_N)=0\,.$$
Moreover, since $\phi(a_i)=0$, $\phi(a_i^2)=1$, for any integer $N$, $k_1(X_N)=0$ whereas $k_2(X_N)=1$. Therefore, we see by Definition 5.3.13 that, for all $p\in\mathbb{N}$,
$$\lim_{N\to\infty}\phi(X_N^p)=\begin{cases}0&\text{if }p\text{ is odd,}\\ \#\{\pi\in NC(p):\pi\text{ a pair partition}\}&\text{if }p\text{ is even.}\end{cases}$$
Here we recall that a pair partition is a partition whose blocks have exactly two elements. The right side corresponds to the definition of the moments of the semicircle law; see Proposition 2.1.11.

5.3.5 Freeness for unbounded variables

The notion of freeness was defined for bounded variables possessing all moments.
It naturally extends to general unbounded variables thanks to the notion of affili-
ated operators defined in Section 5.2.3, as follows.

Definition 5.3.33 Self-adjoint operators {Xi }1ip , affiliated with a von Neumann
algebra A , are called freely independent, or simply free, iff the algebras generated
by { f (Xi ) : f bounded measurable}1ip are free.

Free unbounded variables can be constructed in a noncommutative space, even


though it is not possible anymore to represent these variables as bounded opera-
tors, so that standard tools such as the GNS representation, Theorem 5.2.24, do
not hold directly. However, we can construct free affiliated variables as follows.

Proposition 5.3.34 Let (1 , . . . , p ) be probability measures on R. Then there


exist a W -probability space (A , ) with a normal faithful tracial state, and self-
adjoint operators {Xi }1ip which are affiliated with A , with laws i , 1 i p,
and which are free.

Proof Set Ai = B(Hi ) with Hi = L2 (i ) and construct the free product H as in the
discussion following (5.3.3), yielding a C -probability space (A , ) with a tracial
370 5. F REE PROBABILITY

state and a morphism such that the algebras ( (Ai ))1ip are free. By the
GNS construction, see Proposition 5.2.24 and Corollary 5.2.27, we can construct
a normal faithful tracial state on a von Neumann algebra B and unbounded
operators (a1 , . . . , a p ) affiliated with B, with marginal distribution (1 , . . . , p ).
They are free since since the algebras they generate are free (note that and
satisfy the relations of Definition 5.3.1 according to Remark 5.3.2).

From now on we assume that we are given a Hilbert space H as well as a


W -algebra A B(H) and self-adjoint operators affiliated with A . The law of
affiliated operators is given by their spectral measure and, according to Theorem
5.2.31 and Proposition 5.2.32, if {Ti }1ik are self-adjoint affiliated operators, the
law of Q({Ti }1ik ) is well defined for any polynomial Q.
The following corollary is immediate.

Corollary 5.3.35 Let {Ti }1ik ABbe free self-adjoint variables with marginal
distribution {i }1ik and let Q be a self-adjoint polynomial in k noncommuting
variables. Then the law of Q({Ti }1ik ) depends only on {i }1ik and it is
continuous in these measures.

Proof of Corollary 5.3.35 Let un : R R be bounded continuous functions so


that un (x) = x for |x| < n and un (x) = 0 for |x| > 2n. By Proposition 5.2.32, the
law of Q({Ti }1ik ) can be approximated by the law of Q({un (Ti )}1ik ). To see
the claimed continuity, note that if ip i converges weakly as p for i =
1, . . . , k, then the sequences {ip } are tight, and thus for each > 0 there exists an
M independent of p so that ip ({x : |x| > M}) < . In particular, with Ti p denoting
the operators corresponding to the measures ip , it follows that the convergence of
the law of Q({un (Ti p )}1ik ) to the law of Q({Ti p }1ik ) is uniform in p. Since,
for each n, the law of Q({un (Ti p )}1ik ) converges to that of Q({un (Ti )}1ik ),
the claimed continuity follows.

Free harmonic analysis can be extended to affiliated operators, that is, to laws
with unbounded support. We consider here the additive free convolution. We
first show that the R-transform can be defined as an analytic function, at least
for arguments with large enough imaginary part, without using the existence of
moments.

Lemma 5.3.36 Let be a probability measure on R. For , > 0, let , C+


be given by

, = {z = x + iy C+ : |x| < y, y > } .


5.3 F REE INDEPENDENCE 371

Put, for z C\R,



1
G (z) := d (x), F (z) = 1/G (z). (5.3.24)
zx
For any > 0 and (0, ), there exists > 0 so that:
(i) F is univalent on , ;
(ii) F ( , ) contains , (1+ ) and in particular, the inverse of F , de-
noted F1 , satisfies F1 : , (1+ ) , ;
(iii) F1 is analytic on , (1+ ) .

Proof Observe that F is analytic on , and

lim F (z) = 1 .
|z|,z ,

In particular, the latter shows that |F (z)| > 1/2 on , for large enough.
We can thus apply the implicit function theorem (also known in this context as
the Lagrange inversion theorem) to deduce that F is invertible, with an analytic
inverse. The other claims follow by noting that F is approximately the identity
for sufficiently large.

Definition 5.3.37 Let , be as in Lemma 5.3.36. We define the Voiculescu


transform of on , as

(z) = F1 (z) z .

For 1/z , , we define the R-transform of as R (z) := ( 1z ).

By Lemma 5.3.36, for large enough, is analytic on , . As the following


lemma shows, the analyticity extends to a full neighborhood of infinity (and to an
analyticity of R in a neighborhood of 0) as soon as is compactly supported.

Lemma 5.3.38 If is compactly supported and |z| is small enough, then R (z)
equals the absolutely convergent series n0 kn+1 (a)zn .

Note that the definition of G given in (5.3.24) is analytic (in the upper half plane),
whereas it was defined as a formal power series in (5.3.13). However, when is
compactly supported and z is large enough, the formal series (5.3.13) is absolutely
convergent and is equal to the analytic definition (5.3.24), which justifies the use
of the same notation. Similarly, Lemma 5.3.38 shows that the formal Definition
5.3.22 of R can be strengthened into an analytic definition when is compactly
supported.
372 5. F REE PROBABILITY

Proof Let be supported in [M, M] for some M < . Then observe that G
defined in (5.3.13) can be as well defined as an absolutely converging series for
|z| > M, and the resulting function is analytic in this neighborhood of infinity. R
is then defined using Lemma 5.3.36 by applying the same procedure as in Lemma
5.3.24, but on analytic functions rather than formal series.

By Property 5.3.34, we can always construct a Hilbert space H, a tracial state


, and two free variables X1 , X2 with laws 1 and 2 , respectively, affiliated with
B(H). By Corollary 5.3.35, we may define the law of X1 + X2 which we denote
1  2 .

Corollary 5.3.39 Let 1 and 2 be probability measures on R, and let = 1 


2 . For each > 0, we have = 1 + 2 in , for sufficiently large.

Proof The proof is obtained by continuity from the bounded variables case. In-
deed, Lemmas 5.3.23 and 5.3.24, together with the last point of Lemma 5.3.36,
show that Corollary 5.3.39 holds when 1 and 2 are compactly supported. We
will next show that
if n converge to in the weak topology, then there exist
, > 0 such that n converges to uniformly on (5.3.25)
compacts subsets of , .

With (5.3.25) granted, put d in = i ([n, n])1 1|x|n d i , note that in converges
to i for i = 1, 2, and observe that the law 1n  2n of un (X1 ) + un (X2 ), with X1 , X2
being two free affiliated variables, converges to 1  2 by Proposition 5.2.32.
The convergence of n to on the compacts of some , for = 1 , 2
and 1  2 , together with the corollary applied to the compactly supported in ,
implying
1n 2n = 1n + 2n ,
yield the corollary for arbitrary measures i .
It remains to prove (5.3.25). Fix a probability measure and a sequence n
converging to . Then, F converges to F uniformly on compact sets of C+ (as
well as its derivatives, since the functions Fn are analytic). Since |F n (z)| > 1/2
on , for sufficiently large, |F n (z)| > 1/4 uniformly in n large enough for z in
compact subsets of , for sufficiently large. Therefore, the implicit function
theorem asserts that there exist , > 0 such that Fn has a right inverse F1 n
on
, , and thus the functions (n , n N, ) are well defined analytic functions
on , and are such that n (z) = o(z) uniformly in n as |z| goes to infinity.
Therefore, by Montels Theorem, the family {n , n N} has subsequences that
converge uniformly on compacts of , . We claim that all limit points must be
5.3 F REE INDEPENDENCE 373

equal to and hence n converges to on , . Indeed, assume n j converges


to on a compact K , . We have

|F ( (z) + z) z| = |F ( (z) + z) Fn j (n j (z) + z)|


= |F ( (z) + z) F (n j (z) + z)|
+|F (n j (z) + z) Fn j (n j (z) + z)| .

The first term in the right side goes to zero as j goes to infinity by continuity of F
and the second term goes to zero by uniform convergence of Fn j on , . (Note
that n j (z) is uniformly small compared to |z| so that z + n j (z), j N, stays in
, .) Thus, z + is a right inverse of F , that is, = .

The study of free convolution via the analytic functions (or R ) is useful
in deducing properties of free convolution and of free infinitely divisible laws
(whose definition is analogous to the classical one, with free convolution replacing
classical convolution). The following lemma sheds light on the special role of the
semicircle law with respect to free convolution. For a measure M1 (R), we
define the rescaled measure # 1 M1 (R) by the relation
2

x
# 1 , f  = f ( )d (x) for all bounded measurable functions f .
2 2

Lemma 5.3.40 Let be a probability measure on R, so that  , x2  < . If

# 1  # 1 = , (5.3.26)
2 2

then is a scalar rescale of the semicircle law.

(The assumption of finite variance in Lemma 5.3.40 is superfluous, see Section


5.6. The statement we present has the advantage of possessing a short proof.)
Proof Below, we consider the definition of Voiculescus transform of , see Defi-
nition 5.3.37. We deduce from (5.3.26) that

(z) = 2 (z) .
# 1
2

But

G (z) = 2G ( 2z) (z) = 2 (z/ 2) ,
# 1 # 1
2 2

and so we obtain

(z/ 2) = 2 (z) . (5.3.27)
374 5. F REE PROBABILITY

When  , x2  < and z has large imaginary part, since


 
1  , x  , x2  2
G (z) = 1+ + + o(|z| ) ,
z z z2
we get
 , x2   , x2
(z) =  , x + + o(|z|1 ) . (5.3.28)
2z
From (5.3.27) and (5.3.28), we deduce first that  , x = 0 and then that, as z
, z (z) converges to  , x2 /2. Since 5.3.27 implies that z (z) = 2n/2 (2n/2 z),
it follows by letting n go to infinity that z (z) =  , x2 /2, for all z with z = 0.
From Example 5.3.26, we conclude that is a scalar rescale of the semicircle
law.

Exercise 5.3.41 Let > 0 and p (dx) be the Cauchy law


1
p (dx) = dx.
x + 2
2

Show that for z C+ , G p (z) = 1/(z + i ) and so R p (z) = i and therefore that
for any probability measure on R, G p (z) = G (z + i ). Show by the residue
theorem that G p (z) = G (z + i ) and conclude that  p = p , that is, the
free convolution by a Cauchy law is the same as the standard convolution.

5.4 Link with random matrices

Random matrices played a central role in free probability since Voiculescus sem-
inal observation that independent Gaussian Wigner matrices converge in distri-
bution as their size goes to infinity to free semicircular variables (see Theorem
5.4.2). This result can be extended to approximate any law of free variables by
taking diagonal matrices and conjugating them by independent unitary matrices
(see Corollary 5.4.11). In this section we aim at presenting these results and the
underlying combinatorics.

Definition 5.4.1 A sequence of collections of noncommutative random variables

({aNi }iJ )NN

in noncommutative probability spaces (AN , , N ) is called asymptotically free if it


converges in law as N goes to infinity to a collection of noncommutative random
variables {ai }iJ in a noncommutative probability space (A, , ), where {ai }iJ
5.4 L INK WITH RANDOM MATRICES 375

is free. In other words, for any positive integer p and any i1 , . . . , i p J,

lim N (aNi1 aNi2 aNip ) = (ai1 ai p )


N

and the noncommutative variables ai , i J, are free in (A, , ).

We first prove that independent (not necessarily Gaussian) Wigner matrices are
asymptotically free.

Theorem 5.4.2 Let (, B, P) be a probability space and N, p be positive integers.


( )
Let = 1 or 2,and let XiN : HN , 1 i p, be a family of random matrices
such that XiN / N are Wigner matrices. Assume that, for all k N,

sup sup sup E[|XiN (m, )|k ] ck < , (5.4.1)


NN 1ip 1mN

that (XiN (m, ), 1 m  N, 1 i p) are independent, and that E[XiN (m, )] =
0 and E[|XiN (m, )|2 ] = 1.
Then the empirical distribution N := { 1 X N }1ip of { 1N XiN }1ip converges
N i
almost surely and in expectation to the law of p free semicircular variables. In
other words, the matrices { 1N XiN }1ip , viewed as elements of the noncom-
mutative probability space (MatN (C), , N1 tr) (respectively, (MatN (C), , E[ N1 tr])),
are almost surely asymptotically free (respectively, asymptotically free) and their
spectral measures almost surely converge (respectively, converge) to the semicir-
cle law.

In the course of the proof of this theorem, we shall prove the following useful
intermediate remark, which in particular holds when only one matrix is involved.

Remark 5.4.3 Under the hypotheses of Theorem 5.4.2, except that we do not
require that E[|XiN (m, l)|2 ] = 1 but only that it is bounded by 1, for all monomials
q CXi , 1 i p of degree k normalized so that q(1, 1, . . . , 1) = 1,

lim sup |E[N (q)]| 2k .


N

Proof of Theorem 5.4.2 We first prove the convergence of E[N ]. The proof
follows closely that of Lemma 2.1.6 (see also Lemma 2.2.3 in the case of complex
entries). We need to show, for any monomial q({Xi }1ip ) = Xi1 Xik CXi |1
i p, the convergence of
1
E[N (q)] = k
2 +1
Tj , (5.4.2)
N j
376 5. F REE PROBABILITY

where j = ( j1 , . . . , jk ) and
 
Tj := E XiN1 ( j1 , j2 )XiN2 ( j2 , j3 ) XiNk ( jk , j1 ) .

(Compare with (2.1.10).) By (5.4.1), Tj is uniformly bounded by ck .


We use the language of Section 2.1.3. Consider the closed word w = wj =
j1 jk j1 and recall that its weight wt(w) is the number of distinct letters in w.
Let Gw = (Vw , Ew ) be the graph as defined in the proof of Lemma 2.1.6. As there,
we need to find out which set of indices contributes to the leading order of the
sum in the right side of (5.4.2). Loosely speaking, Tj vanishes more often when
one has independent matrices than when one always has the same matrix. Hence,
the indices corresponding to graphs Gw which are not trees will be negligible. We
will then only consider indices corresponding to graphs which are trees, for which
Tj will be easily computed. Recall the following from the proof of Lemma 2.1.6
(see also Lemma 2.2.3 for complex entries).
w
(i) Tj vanishes if each edge in Ewj is not repeated at least twice (i.e. Ne j 2
for each e Ewj ); hence, wt(wj ) 2k + 1 for all contributing indices;
(ii) the number of N-words in the equivalence class of a given N-word of
weight t is N(N 1) (N t + 1) N t ;
(iii) the number of equivalence classes of closed N-words w of length k + 1 and
weight t such that New 2 for each e Ew is bounded by t k kk .

Therefore,



Tj N t ck t k C(k)N 2k

j:wtj k t k
2 2

and, considering (5.4.2), we deduce




1
E[N (q)] Tj C(k)N 1 , (5.4.3)
k +1
N 2 j:wtj = k +1
2

where the set {j : wtj = 2k + 1} is empty if k is odd. This already shows that, if k
is odd,
lim E[N (q)] = 0 . (5.4.4)
N

If k is even, recall also that if wt(wj ) = 2k + 1, then Gwj is a tree (see an explana-
tion below Definition 2.1.10) and (by the cited definition) wj is a Wigner word.
This means that each (unoriented) edge of Gwj is traversed exactly once in each
direction by the walk j1 jk j1 . Hence, Tj will be a product of covariances of
5.4 L INK WITH RANDOM MATRICES 377

the entries, and therefore vanishes if these covariances involve two independent
matrices. Also, when c2 1, Tj will be bounded above by one and therefore
lim supN |E[N (q)]| is bounded above by |Wk,k/2+1 | 2k , where, as in Def-
inition 2.1.10, Wk,k/2+1 denotes a set of representatives for equivalence classes
of Wigner words of length k + 1, and (hence) |Wk,k/2+1 | is equal to the Catalan
 
1 k
number k/2+1 k/2 . This will prove Remark 5.4.3.

We next introduce a refinement of Definition 2.1.8 needed to handle the more


complicated combinatorics of monomials in several independent Wigner matrices.
(Throughout, we consider the set S = {1, . . . , N} and omit it from the notation.)

Definition 5.4.4 Let q = q({Xi }1ip ) = Xi1 Xik CXi |1 i p be given,


where k is even. Let w = s1 sk sk+1 , sk+1 = s1 be any Wigner word of length
k + 1 and let Gw be the tree associated with w. We say that w is q-colorable if,
for j,  = 1, . . . , k, equality of edges {s j , s j+1 } = {s , s+1 } of the tree Gw implies
equality of indices (colors) i j = i . With, as above, Wk,k/2+1 denoting a set of
representatives for the equivalence classes of Wigner words of length k + 1, let
q
Wk,k/2+1 denote the subset of q-colorable such.

By the previous considerations, each index j contributing to the leading or-


der in the evaluation of E[N (q)] corresponds to a tree Gwj , each edge of which
is traversed exactly once in each direction by the walk j1 jk j1 . Further, since
E[XiN (1, 2)XiN (2, 1)] = 1= , an index j contributes to the leading order of

E[N (q)] if and only if it the associated Wigner word wj is q-colorable, and hence
q
equivalent to an element of Wk,k/2+1 . Therefore, for even k,
q
lim E[N (q)] = |Wk,k/2+1 |. (5.4.5)
N

Moreover, trivially,
q Xk
|Wk,k/2+1 | |Wk,k/2+1
1
| = |Wk,k/2+1 | . (5.4.6)

Recall that Wk,k/2+1 is canonically in bijection with the set NC2 (k) of non-crossing
pair partitions of Kk = {1, . . . , k} (see Proposition 2.1.11 and its proof). Similarly,
q
for q = Xi1 Xik , the set Wk,k/2+1 is canonically in bijection with the subset of
NC2 (k) consisting of non-crossing pair partitions of Kk such that for every
block {b, b } one has ib = ib . Thus, we can also write
lim E[N (q)] =
N
1ib =ib ,
NC2 (k) (b,b )

where the product runs over all blocks {b, b } of the pair partition . Recalling
that kn (ai ) = 1n=2 for semicircular variables by Example 5.3.26 and (5.3.7), we
378 5. F REE PROBABILITY

can rephrase the above as


lim E[N (q)] =
N
k (ai1 , . . . , aik ) ,
NC(k)

with k = 0 if is not a pair partition and k2 (ai , a j ) = 1i= j . The right side corre-
sponds to the definition of the moments of free semicircular variables according
to Theorem 5.3.15 and Example 5.3.26. This proves the convergence of E[N ] to
the law of m free semicircular variables.
We now prove the almost sure convergence. Continuing to adapt the ideas of
the (first) proof of Theorem 2.1.1, we follow the proof of Lemma 2.1.7 closely.
(Recall that we proved in Lemma 2.1.7 that the variance of LN , xk  is of or-
der N 2 . As in Exercise 2.1.16, this was enough, using Chebyshevs inequal-
ity and the BorelCantelli Lemma, to conclude the almost sure convergence in
Wigners Theorem, Theorem 2.1.1.) Here, we study the variance of N (q) for
q(X1 , . . . , Xp ) = Xi1 Xik which is given by
1
N k+2
Var(N (q)) = E[|N (q) E[N (q)]|2 ] = Tj,j (5.4.7)
j,j

with
Tj,j = E[Xi1 ( j1 , j2 ) Xik ( jk , j1 )Xik ( j1 , j2 ) Xi1 ( jk , j1 )]
E[Xi1 ( j1 , j2 ) Xik ( jk , j1 )]E[Xik ( j1 , j2 ) Xi1 ( jk , j1 )] ,
where we observed that N (q) = N (q ). We consider the sentence
wj,j = ( j1 jk j1 , j1 j2 j1 ) and its associated graph Gwj,j = (Vwj,j , Ewj,j ). As
in the proof of Lemma 2.1.7, Tj,j vanishes unless each edge in Ewj,j appears at
least twice and the graph Gwj,j is connected. This implies that the number of dis-
tinct elements in Vwj,j is not more than k + 1, and it was further shown in the proof
of Lemma 2.1.7 that the case where it is equal to k + 1 never happens. Hence,
there are at most k different vertices and so at most N k possible choices for them.
Thus, since Tj,j is uniformly bounded by 2c2k , we conclude that there exists a
finite constant c(k) such that
c(k)
Var(N (q))
.
N2
By Chebyshevs inequality we therefore find that
c(k)
P(|N (Xi1 Xik ) E[N (Xi1 Xik )]| ) .
2N2
The BorelCantelli Lemma then yields that
lim |N (Xi1 Xik ) E[N (Xi1 Xik )]| = 0 , a.s.

N
5.4 L INK WITH RANDOM MATRICES 379

We next show that Theorem 5.4.2 generalizes to the case of polynomials that
may include some deterministic matrices.

Theorem 5.4.5 Let = 1 or 2 and let (, B, P) be a probability space. Let


DN = {DNi }1ip be a sequence of Hermitian deterministic matrices and let XN =
( )
{XiN }1ip , XiN : HN , 1 i p, be matrices satisfying the hypotheses of
Theorem 5.4.2. Assume that
1 1
D := sup max sup tr(|DNi |k ) k < , (5.4.8)
kN 1ip N N

and that the law of DN in the noncommutative probability space (MatN (C), ,
N tr) converges to a noncommutative law . Then we have the following.
1

(i) The noncommutative variables 1 XN and DN in the noncommutative prob-


N
ability space (MatN (C), , E[ N1 tr]) are asymptotically free.
(ii) The noncommutative variables 1N XN and DN in the noncommutative prob-
ability space (MatN (C), , N1 tr) are almost surely asymptotically free.
In particular, the empirical distribution of { 1N XN , DN } converges almost surely
and in expectation to the law of {X, D}, X and D being free, D with law and X
being p free semicircular variables.

To avoid repetition, we follow a different route than that used in the proof of
Theorem 5.4.2 (even though similar arguments could be developed). We de-
note by CDi , Xi |1 i p the set of polynomials in {Di , Xi }1ip , by N (re-
spectively, N ) the quenched (respectively, annealed) empirical distribution of
1 1
{DN , N 2 XN } = {DNi , N 2 XiN }1ip given, for q CDi , Xi |1 i p, by
 
1 XN N
N (q) := tr q( , D ) , N (q) := E[N (q)] .
N N
To prove the convergence of {N }NN we first show that this sequence is tight
(see Lemma 5.4.6), and then show that any limit point satisfies the so-called
SchwingerDyson, or master loop, equation which has a unique solution (see
Lemma 5.4.7).

Lemma 5.4.6 For R, d N, we denote by CXi , Di |1 i pR,d the set of mono-


mials in X := {Xi }1ip and D := {Di }1ip with total degree in the variables
X (respectively, D) less than R (respectively, d). Under the hypotheses of The-
orem 5.4.5, except that instead of E[|XiN (m, l)|2 ] = 1 we only require that it is
bounded by 1, assuming without loss of generality that D 1, we have that, for
380 5. F REE PROBABILITY

any R, d N,
sup lim sup |N (q)| Dd 2R . (5.4.9)
qCXi ,Di |1ipR,d N

As a consequence, {N (q), q CXi , Di |1 i pR,d }NN is tight as a CC(R,d) -


valued sequence, with C(R, d) denoting the number of monomials in CXi , Di |1
i pR,d .

We next characterize the limit points of {N (q), q CXi , Di |1 i pR,d }NN .


To this end, let i be the noncommutative derivative with respect to the variable
Xi which is defined as the linear map from CXi , Di |1 i p to CXi , Di |1 i
p2 which satisfies the Leibniz rule

i PQ = i P (1 Q) + (P 1) i Q (5.4.10)

and i X j = 1i= j 1 1, i D j = 0 0. (Here, A B C D = AC BD). If q is a


monomial, we have
i q = q1 q2 ,
q=q1 Xi q2

where the sum runs over all possible decompositions of q as q1 Xi q2 .

Lemma 5.4.7 For any R, d N, the following hold under the hypotheses of Theo-
rem 5.4.5.
(i) Any limit point of {N (q), q CXi , Di |1 i pR,d }NN satisfies the
boundary and tracial conditions

|CDi |1ip0,d = |CDi |1ip0,d , (PQ) = (QP) , (5.4.11)

where the second equality in (5.4.11) holds for all monomials P, Q such
that PQ CXi , Di |1 i pR,d . Moreover, for all i {1, . . . , m} and all
q CXi , Di |1 i mR1,d , we have

(Xi q) = (i q) . (5.4.12)

(ii) There exists a unique solution {R,d (q), q CXi , Di |1 i pR,d } to


(5.4.11) and (5.4.12).
(iii) Set to be the linear functional on CXi , Di |1 i p so that (q) =
R,d (q) for q CXi , Di |1 i pR,d , any R, d N. Then is character-
ized as the unique solution of the system of equations (5.4.11) and (5.4.12)
holding for q, Q, P CXi , Di |1 i p. Further, is the law of p free
semicircular variables, free with variables {Di }1ip possessing law .
5.4 L INK WITH RANDOM MATRICES 381

Note here that q CXi , Di |1 i pR,d implies that q1 , q2 CXi , Di |1 i


pR,d for any decomposition of q into q1 Xi q2 . Therefore, equation (5.4.12), which
is given by
(Xi q) = (q1 ) (q2 ) ,
q=q1 Xi q2

makes sense for any q CXi , Di |1 i pR1,d if { (q), q CXi , Di |1 i


pR,d } is well defined.

Remark 5.4.8 The system of equations (5.4.11) and (5.4.12) is often referred to
in the physics literature as the SchwingerDyson, or master loop, equation.

We next show heuristically how, when {XiN }1ip are taken from the GUE, the
SchwingerDyson equation can be derived using Gaussian integration by parts,
see Lemma 2.4.5. Toward this end, we introduce the derivative z = (z iz )/2
with respect to the complex variable z = z + iz, so that z z = 1 but z z = 0.
Using this definition for the complex variable XiN (, r) when  = r (and otherwise
the usual definition for the real variable XiN (, )), note that we have

X N (,r) XiN ( , r ) = i,i , r,r . (5.4.13)


i

Lemma 2.4.5 can be extended to standard complex Gaussian variables, as intro-


duced in (4.1.2), by
 
z f (z, z)e|z| dz = z f (z, z)e|z| dz .
2 2
(5.4.14)

Here, dz is the Lebesgue measure on C, dz = dzdz. Applying (5.4.14) with


z = XiN (m, ) for m =  and f (XN ) a smooth function of {XiN }1ip of polynomial
growth along with its derivatives, we have


E XiN (, m) f (XN ) = E X N (m,) f (XN ) . (5.4.15)
i

Using Lemma 2.4.5 directly, one verifies that (5.4.15) still holds for m = . (One
could just as well take (5.4.15) as the definition of X N (m,) .) Now let us consider
i
N
(5.4.15) with the special choice of f = P( X
N
, DN )( j, k), where P CXi , Di |1
i p and j, k {1, . . . , N}. Some algebra reveals that, using the notation (A
B)( j, m, , k) = A( j, m)B(, k),
   
X N (m,) P(XN , DN ) ( j, k) = i P(XN , DN ) ( j, m, , k) . (5.4.16)
i

Together with (5.4.15), and after summation over j = m and  = k, this shows that

E [N (Xi P) N N (i P)] = 0 .
382 5. F REE PROBABILITY

We have thus seen that, as a consequence of Gaussian integration by parts, N


satisfies the master loop equation in expectation. In order to prove that N satis-
fies asymptotically the master loop equation, that is, part (i) of Lemma 5.4.7, it is
therefore enough to show that N self-averages (that is, it is close to its expecta-
tion). The latter point is the content of the following technical lemma, which is
stated in the generality of Theorem 5.4.5. The proof of the lemma is postponed
until after we derive Theorem 5.4.5 from the lemma.

Lemma 5.4.9 Let q be a monomial in CXi , Di |1 i p. Under the hypotheses


of Theorem 5.4.5, except that instead of E[|XiN (m, l)|2 ] = 1, we only require that
it is bounded by 1, we have the following for any > 0.
(i) For any positive integer k,

XN
lim sup N max E[|q( , DN )(i, j)|k ] = 0 . (5.4.17)
N 1i jN N

(ii) There exists a finite constant C(q) such that, for all positive integers N,

C(q)
E[|N (q) N (q)|2 ] . (5.4.18)
N 2

We next give the proof of Theorem 5.4.5, with Lemmas 5.4.6, 5.4.7 and 5.4.9
granted.
Proof of Theorem 5.4.5 By Lemmas 5.4.6 and 5.4.7, {N (q), q CXi , Di |1
i pR,d } is tight and converges to the unique solution {R,d (q), q CXi , Di |1
i pR,d } of the system of equations (5.4.11) and (5.4.12). As a consequence,
R,d (q) = R ,d (q) for q CXi , Di |1 i pR ,d , R R and d d , and we can
define (q) = R,d (q) for q CXi , Di |1 i pR,d . This completes the proof of
the first point of Theorem 5.4.5 since is the law of p free semicircular variables,
free with {Di }1ip with law by part (iii) of Lemma 5.4.7.
The almost sure convergence asserted in the second part of the theorem is a
direct consequence of (5.4.18), the BorelCantelli Lemma and the previous con-
vergence in expectation.

We now prove Lemmas 5.4.6, 5.4.7 and 5.4.9.


Proof of Lemma 5.4.6 We prove by induction over R a slightly stronger result,

namely that for all R, d N, with |q| = qq ,
1
sup sup lim sup |N (|q|r )| r Dd 2R . (5.4.19)
r0 qCXi ,Di |1ipR,d N
5.4 L INK WITH RANDOM MATRICES 383

If R = 0, this is obvious by (5.4.8). When R = 1, by using (G.10) twice, for any


q CXi , Di |1 i p1,d ,
1 1
|N (|q|r )| r Dd max |N (|Xi |r )| r ,
1ip

which yields (5.4.19) since by Remark 5.4.3, if r 2p for some p N,


1 1
lim sup |N (|Xi |r )| r lim sup |N ((Xi )2p )| 2p 2.
N N

We next proceed by induction and assume that (5.4.19) is true up to R = K 1.


We write q = q X j p(D) with p a monomial of degree  and q CXi , Di |1 i
pK1,d . By (G.10) and the induction hypothesis, we have, for all r 0,
1 1 1
lim sup |N (|q|r )| r D |N (|X j |2r )| 2r |N (|q |2r )| 2r 2D 2K1 Dd ,
N

which proves (5.4.19) for K = R, and thus completes the proof of the induction
step. Equation (5.4.9) follows.

Proof of Lemma 5.4.9 Without loss of generality, we assume in what follows that
D 1. If q is a monomial in CXi , Di |1 i pR,d , and if max (X) denotes the
spectral radius of a matrix X and ei the canonical orthonormal basis of CN ,
XN XN p XN
|q( , DN )(i, j)| = |ei , q( , DN )e j | Di=1 di max ( i )i ,
N N 1ip N

where i (respectively, di ) is the degree of qi in the variable Xi (respectively, Di )


(in particular i R and di d). As a consequence, we obtain the following
bound, for any even positive integer k and any s 1,
XN XN
E[|q( , DN )(i, j)|k ] Dkd E[max ( i )ki ]
N 1ip N
p  - 1
1 21
XN s p
Dkd E tr(( 1 )ksi )
s
Dkd N s E N ((X1N )ksR ) ,
i=1 N
where the last term is bounded uniformly in N by Lemma 2.1.6 (see Exercise
2.1.17 in the case where the variances of the entries are bounded by one rather than
equal to one, and recall that D 1) or Remark 5.4.3. Choosing s large enough so
that ps < completes the proof of (5.4.17). Note that this control holds uniformly
on all Wigner matrices with normalized entries possessing ksR moments bounded
above by some value.
To prove (5.4.18) we consider a lexicographical order (X r , 1 r pN(N +
1)/2) of the (independent) entries (XkN (i, j), 1 i j N, 1 k p) and denote
384 5. F REE PROBABILITY

by k = {X r , r k} the associated sigma-algebra. By convention we denote by


0 the trivial algebra. Then we have the decomposition

pN(N+1)/2
N := E[|N (q) N (q)|2 ] = r , (5.4.20)
r=1

with

r := E[|E[N (q)|r ] E[N (q)|r1 ]|2 ] .

By the properties of conditional expectation and the independence of the X r , we


can write r = E[|r |2 ] with

r := E[N (q)|r ](X r , X r1 , . . . , X 1 ) E[N (q)|r ](X r , X r1 , . . . , X 1 )



and (X r , X r ) identically distributed and independent of each other and of X r , r =
r. If X r = XsN (i, j) for some s {1, . . . , p} and i, j {1, . . . , N}2 , we denote by Xr
the interpolation

Xr := (1 )X r + X r .

Taylors formula then gives


 1
r = E[N (q)|r ](Xr , X r1 , . . . , X 1 )d
0
 1
1
=
N 3/2 0
Xr E[(q2 q1 )( j, i)|r ](Xr , X r1 , . . . , X 1 )d
q=q1 Xs q2
 1
1
+
N 3/2 0
Xr E[(q2 q1 )(i, j)|r ](Xr , X r1 , . . . , X 1 )d ,
q=q1 Xs q2

where the sum runs over all decompositions of q into q1 Xs q2 . Hence we obtain
that there exists a finite constant C(q) such that
 1 XN ,r
C(q)
r
N3
q=q X q 0
E[|YsN (k, )|2 |(q2 q1 )( , DN )(, k)|2 ]d ,
N
1 s 2
(k,)=(i, j) or ( j,i)

with XN ,r the p-tuple of matrices where the (i, j) and ( j, i) entries of the matrix s
were replaced by the interpolation Xr and its conjugate and YsN (i, j) = XsN (i, j)
XsN (i, j). We interpolate again with the p-tuple XNr where the entries (i, j) and
( j, i) of the matrix s vanishes to obtain by the CauchySchwarz inequality and
5.4 L INK WITH RANDOM MATRICES 385

independence of XNr with YsN (i, j) that, for some finite constants C(q)1 , C(q)2 ,

C(q)1 XN
r
N 3
q=q1 Xs q2
E[|(q2 q1 )( r , DN )(k, )|2 ]
N
(k,)=(i, j) or ( j,i)
 1 
XN XN ,r 1
+ E[|(q2 q1 )( r , DN )(k, ) (q2 q1 )( , DN )(k, )|4 ] 2 d
0 N N

C(q)2 XN N

N3
q=q1 Xs q2
E[|(q2 q1 )( , D )(k, )|2 ]
N
(k,)=(i, j) or ( j,i)

XN XN
+ E[|(q2 q1 )( , DN )(k, ) (q2 q1 )( r , DN )(k, )|2 ]
N N
 1 
XNr XN ,r 1
+ E[|(q2 q1 )( , DN )(i, j) (q2 q1 )( , DN )(k, )|4 ] 2 d . (5.4.21)
0 N N
To control the last two terms, consider two p-tuples of matrices XN and XN
that differ only at the entries (i, j) and ( j, i) of the matrix s and put YsN (i, j) =
XsN (i, j) XsN (i, j). Let q be a monomial and 1 k,  N. Then, if we set
XN = (1 )XN + XN , we have

XN XN
q(k, ) := q( , DN )(k, ) q( , DN )(k, )
N N
N  1 XN XN
Y (m, n)
= s 1 N p ( , DN
)(k, m)p2 ( , DN )(n, )d .
(m,n)=(i, j) N 0 q=p1 Xs p2 N
or ( j,i)

Using (5.4.17), we deduce, that for all , r > 0,


lim N 2 N max
r
max E[|q(k, )|r ] = 0 . (5.4.22)
N 1i, jN 1k,N

As a consequence, the two last terms in (5.4.21) are at most of order N 1+ and
summing (5.4.21) over r, we deduce that there exist finite constants C(q)3 , C(q)4
so that
 
C(q)3 p XN N
N
N 3 s=1
E[ |(q2 q1 )(
N
, D )(i, j)|2
] + N 1+
q=q1 Xs q2 1i, jN

C(q)3 p C(q)4
= 2
N s=1 q=q1 Xs q2
N (q2 q1 q1 q2 ) + 2 .
N

Using (5.4.17) again, we conclude that N C(q)/N 2 .


Proof of Lemma 5.4.7 To derive the equations satisfied by a limiting point R,d of
N , note that the first equality of (5.4.11) holds since we assumed that the law of
386 5. F REE PROBABILITY

{DNi }1ip converges to , whereas the second equality is verified by N for each
N, and therefore by all its limit points. To check that R,d also satisfies (5.4.12),
we write
N
1 XN
N (Xi q) =
N 3/2
E[XiN ( j1 , j2 )q( , DN )( j2 , j1 )] = I1 ,2 , (5.4.23)
N
j1 , j2 =1 1 ,2

where 1 (respectively, 2 ) denotes the number of occurrences of the entry


XiN ( j1 , j2 ) (respectively, XiN ( j2 , j1 )) in the expansion of q in terms of the entries
of XN . I0,0 in the right side of (5.4.23) vanishes by independence and centering.
To show that the equation (5.4.15) leading to the master loop equation is approxi-
mately true, we will prove that (1 ,2 ) =(0,1) I1 ,2 is negligible.
We evaluate separately the different terms in the right side of (5.4.23). Con-
cerning I0,1 , we have

1 XN XN
I0,1 =
N2 E[q1 ( , D)( j1 , j1 )q2 ( , DN )( j2 , j2 )] ,
N N
j1 , j2 q=q1 Xi q2

where XN is the p-tuple of matrices whose entries are the same as XN , except that
XiN ( j1 , j2 ) = XiN ( j2 , j1 ) = 0. By (5.4.22), we can replace the matrices XN by XN
up to an error of order N 2 for any > 0, and therefore
1

I0,1 = E[N (q1 )N (q2 )] + o(1)


q=q1 Xi q2

= E[N (q1 )]E[N (q2 )] + o(1) , (5.4.24)


q=q1 Xi q2

where we used (5.4.18) in the second equality.


We similarly find that
N
1 XN XN
I1,0 =
N2 E[q1 ( , D)( j2 , j1 )q2 ( , D)( j2 , j1 )]
N N
j1 , j2 =1 q=q1 Xi q2

so that replacing XN by XN as above shows that


1
I1,0 = N (q1 q2 ) + o(1) N 0 , (5.4.25)
N
where (5.4.9) was used in the limit, and we used that (zXi1 Xi p ) = zXi p Xi1 .
Finally, with (1 , 2 ) = (1, 0) or (0, 1), we find that
1
I1 ,2 =  + 1
2+ 1 22
I( j1 , j2 , )
N q=q1 Xi q2 Xi qk+1 j1 , j2
5.4 L INK WITH RANDOM MATRICES 387

with

XN XN
I( j1 , j2 , ) := E[q1 ( )( (1), (2)) qk+1 ( )( (k + 1), (1))] ,
N N

where we sum over all possible maps : {1, . . . , k + 1}{ j1 , j2 } correspond-


ing to 1 (respectively, 2 ) occurrences of the oriented edge ( j1 , j2 ) (respectively,
( j2 , j1 )). Using Holders inequality and (5.4.17) we find that the above is at most
1 +2 1
of order N 2 + for any > 0. Combined with (5.4.24) and (5.4.25), we have
proved that
 
lim
N
N (Xi q) N (q1 )N (q2 ) = 0 . (5.4.26)
q=q1 Xi q2

Since if q CXi , Di |1 i pR1,d , any q1 , q2 such that q = q1 Xi q2 also belong


to this set, we conclude that any limit point R,d of { 1 X N ,DN }1ip restricted to
N i i
CXi , Di |1 i pR,d satisfies (5.4.12).
Since (5.4.12) together with (5.4.11) defines (P) uniquely for any P
CXi , Di |1 i pR,d by induction over the degree of P in the Xi , it follows that
N converges as N goes to infinity towards a law which coincides with R,d on
CXi , Di |1 i pR,d for all R, d 0. Thus, to complete the proof of part (i) of
Theorem 5.4.5, it only remains to check that is the law of free variables. This
task is achieved by induction: we verify that the trace of

Q(X, D) = q1 (X)p1 (D)q2 (X)p2 (D) pk (D) (5.4.27)

vanishes for all polynomials qi , pi such that (pi (D)) = (q j (X)) = 0, i 1, j 2.


By linearity, we can restrict attention to the case where qi , pi are monomials.
Let degX (Q) denote the degree of Q in X. We need only consider degX (Q) 1.
If degX (Q) = 1 (and thus Q = p1 (D)Xi p2 (D)) we have (Q) = (Xi p2 p1 (D)) = 0
by (5.4.12). We continue by induction: assume that (Q) = 0 whenever degX (Q) <
K and (pi (D)) = (q j (X)) = 0, i 1, j 2. Consider now Q of the form
(5.4.27) with degX (Q) = K and (q j (X)) = 0, j 2, (pi ) = 0, i 1. Using
traciality, we can write (Q) = (Xi q) with degX (q) = K 1 and q satisfies all
assumptions in the induction hypothesis. Applying (5.4.12), we find that (Q) =
q=q1 Xi q2 (q1 ) (q2 ), where q1 (respectively, q2 ) is a product of centered polyno-
mials except possibly for the first or last polynomials in the Xi . The induction hy-
pothesis now yields that (Xi q) = q=q1 Xi q2 (q1 ) (q2 ) = 0, completing the proof
of the claimed asymptotic freeness. The marginal distribution of the {Xi }1ip is
given by Theorem 5.4.2.

388 5. F REE PROBABILITY

We now consider conjugation by unitary matrices following the Haar measure


U(N) on the set U(N) of N N unitary matrices (see Theorem F.13 for a defini-
tion).

Theorem 5.4.10 Let DN = {DNi }1ip be a sequence of Hermitian (possibly ran-


dom) N N matrices. Assume that their empirical distribution converges to a
noncommutative law . Assume also that there exists a deterministic D < such
that, for all k N and all N N,
1
tr((DNi )2k ) D2k , a.s.
N
Let UN = {UiN }1ip be independent unitary matrices with Haar law U(N) , in-
dependent from {DNi }1ip . Then the subalgebras Ui N generated by the matrices
{UiN , (UiN ) }1ip , and the subalgebra D N generated by the matrices {DNi }1ip ,
in the noncommutative probability space (MatN (C), , E[ N1 tr]) (respectively,
(MatN (C), , N1 tr)) are asymptotically free (respectively, almost surely asymptot-
ically free). For all i {1, . . . , p}, the limit law of {UiN , (UiN ) } is given as the
element of MCU,U ,1 , such that

((UU 1)2 ) = 0, (U n ) = ((U )n ) = 1n=0 .

We have the following corollary.

Corollary 5.4.11 Let {DNi }1ip be a sequence of uniformly bounded real di-
agonal matrices with empirical measure of diagonal elements converging to i ,
i = 1, . . . , p respectively. Let {UiN }1ip be independent unitary matrices follow-
ing the Haar measure, independent from {DNi }1ip .
(i) The noncommutative variables {UiN DNi (UiN ) }1ip in the noncommuta-
tive probability space (MatN (C), , E[ N1 tr]) (respectively,
(MatN (C), , N1 tr)) are asymptotically free (respectively, almost surely
asymptotically free), the law of the marginals being given by the i .
(ii) The empirical measure of eigenvalues of of DN1 +UN DN2 UN converges weakly
almost surely to 1  2 as N goes to infinity.
(iii) Assume that DN1 is nonnegative. Then, the empirical measure of eigenval-
ues of
1 1
(DN1 ) 2 UN DN2 UN (DN1 ) 2

converges weakly almost surely to 1  2 as N goes to infinity.

Corollary 5.4.11 provides a comparison between independence (respectively,


standard convolution) and freeness (respectively, free convolution) in terms of
5.4 L INK WITH RANDOM MATRICES 389

random matrices. If DN1 and DN2 are two diagonal matrices whose eigenvalues
are independent and equidistributed, the spectral measure of DN1 + DN2 converges
to a standard convolution. At the other extreme, if the eigenvectors of a matrix
AN1 are very independent from those of a matrix AN2 in the sense that the joint
distribution of the matrices can be written as the distribution of (AN1 ,U N AN2 (U N ) ),
then free convolution will describe the limit law.
Proof of Theorem 5.4.10 We denote by N := {DN ,U N ,(U N ) }1ip the joint em-
i i i
pirical distribution of {DNi ,UiN , (UiN ) }1ip , considered as an element of the al-
gebraic dual of CXi , 1 i n with n = 3p, equipped with the involution such
that ( Xi1 Xin ) = Xin Xi1 if

X3i2 = X3i2 , 1 i p, X3i1 = X3i , 1 i p .

The norm is the operator norm on matrices. We may and will assume that D 1,
and then our variables are bounded uniformly by D. Hence, N is a state on the
universal C -algebra A (D, {1, , 3n}) as defined in Proposition 5.2.14 by an
appropriate separation/completion construction of CXi , 1 i n. The sequence
{E[N ]}NN is tight for the weak*-topology according to Lemma 5.2.18. Hence,
we can take converging subsequences and consider their limit points. The strategy
of the proof will be to show, as in the proof of Theorem 5.4.5, that these limit
points satisfy a SchwingerDyson equation. Of course, this SchwingerDyson
equation will be slightly different from the equation obtained in Lemma 5.4.7 in
the context of Gaussian random matrices. However, it will again be a system
of equations defined by an appropriate noncommutative derivative, and will be
derived from the invariance by multiplication of the Haar measure, replacing the
integration by parts (5.4.15) (the latter could be derived from the invariance by
translation of the Lebesgue measure). We will also show that the Schwinger
Dyson equation has a unique solution, implying the convergence of (E[N ], N
N). We will then show that this limit is exactly the law of free variables. Finally,
concentration inequalities will allow us to extend the result to the almost sure
convergence of {N }NN .
SchwingerDyson equation We consider a limit point of {E[N ]}NN . Be-
cause we have N ((Ui (Ui ) 1)2 ) = 0 and N (PQ) = N (QP) for any P, Q
CDi ,Ui ,Ui |1 i p, almost surely, we know by taking the large N limit that

(PQ) = (QP) , ((UiUi 1)2 ) = 0, 1 i p . (5.4.28)

Since is a tracial state by Proposition 5.2.16, the second equality in (5.4.28)


implies that, in the C -algebra (CDi ,Ui ,Ui |1 i p, ,   ), UiUi = 1 (note
that this algebra was obtained by taking the quotient with {P : (PP ) = 0}).
390 5. F REE PROBABILITY

By definition, the Haar measure U(N) is invariant under multiplication by a


unitary matrix. In particular, if P CDi ,Ui ,Ui |1 i p, we have for all k, l
{1, . . . , N},
  
t P(Di ,Ui etBi , etBi Ui ) (k, l)d U(N) (U1 ) d U(N) (Up ) = 0

for any anti-Hermitian matrices Bi (Bi = Bi ), 1 i p, since etBi U(N).


Taking Bi = 0 except for i = i0 and Bi0 = 0 except at the entries (q, r) and (r, q),
we find that

(i0 P)({Di ,Ui ,Ui }1ip )(k, r, q, l)d U(N) (U1 ) d U(N) (Up ) = 0

with i the derivative which obeys the Leibniz rules

i (PQ) = i P 1 Q + P 1 i Q ,
i U j = 1 j=iU j 1, iU j = 1 j=i 1 U j ,

where we used the notation (A B)(k, r, q, l) := A(k, r)B(q, l). Taking k = r and
q = l and summing over r, q gives

E [N N (i P)] = 0 . (5.4.29)

Using Corollary 4.4.31 inductively (on the number p of independent unitary ma-
trices), we find that, for any polynomial P CDi ,Ui ,Ui |1 i p, there exists
a positive constant c(P) such that
p  
|trP({DNi ,UiN , (UiN ) }1ip ) EtrP| > 2ec(P) ,
2
U(N)

and therefore
2
E[|trP EtrP|2 ] .
c(P)
Writing i P = M j=1 Pj Q j for appropriate integer M and polynomials Pj , Q j
CDi ,Ui ,Ui |1 i p, we deduce by the CauchySchwarz inequality that

|E [(N E[N ]) (N E[N ])(i P)]|



M

E [(N E[N ])(Pj )(N E[N ])(Q j )]
j=1
2M 1 1
2
max max{ , } N 0 .
N 1 jp c(Pj ) c(Q j )
We thus deduce from (5.4.29) that

lim E [N ] E [N ] (i P) = 0 .
N
5.4 L INK WITH RANDOM MATRICES 391

Therefore, the limit point satisfies the SchwingerDyson equation

(i P) = 0 , (5.4.30)

for all i {1, . . . , p} and P CDi ,Ui ,Ui |1 i p.


Uniqueness of the solution to (5.4.30) Let be a solution to (5.4.28) and
(5.4.30), and let P be a monomial in CDi ,Ui ,Ui |1 i p. We show by induc-
tion over the total degree n of P in the variables Ui and Ui that (P) is uniquely
determined by (5.4.28) and (5.4.30). Note that if P CDi |1 i p, (P) =
(P) is uniquely determined. If P CDi ,Ui ,Ui |1 i p\CDi |1 i p is
a monomial, we can always write (P) = (QUi ) or (P) = (Ui Q) for some
monomial Q by the tracial property (5.4.28). We study the first case, the second
being similar. If (P) = (QUi ),

i (QUi ) = i Q 1 Ui + (QUi ) 1 ,

and so (5.4.30) gives

(QUi ) = (i Q 1 Ui )
= (Q1Ui ) (Q2Ui ) + (Q1 ) (Q2 ) ,
Q=Q1Ui Q2 Q=Q1Ui Q2

where we used the fact that (Ui Q2Ui ) = (Q2 ) by (5.4.28). Each term in the
right side is the trace under of a polynomial of degree strictly smaller in Ui and
Ui than QUi . Hence, this relation defines uniquely by induction. In particular,
taking P = Uin we get, for all n 1,

n
(Uik ) (Uink ) = 0 ,
k=1

from which we deduce by induction that (Uin ) = 0 for all n 1 since (Ui0 ) =
(1) = 1. Moreover, as is a state, ((Ui )n ) = (((Ui )n ) ) = (Uin ) = 0 for n 1.
The solution is the law of free variables It is enough to show by the previous
point that the joint law of the two free p-tuples {Ui ,Ui }1ip and {Di }1ip
n
satisfies (5.4.30). So take P = Uin11 B1 Ui pp B p with some Bk s in the algebra gen-
erated by {Di }1ip and ni Z\{0} (where we observed that Ui = Ui1 ). We
wish to show that, for all i {1, . . . , p},

(i P) = 0. (5.4.31)
392 5. F REE PROBABILITY

Note that, by linearity, it is enough to prove this equality when (B j ) = 0 for all
j. Now, by definition, we have
nk
n l
Uin11 B1 Bk1Uil Ui k
n
i P = Bk Ui pp B p
k:ik =i,nk >0 l=1
nk 1
Uin11 B1 Bk1Uil Ui k
n +l n
Bk Ui pp B p .
k:ik =i,nk <0 l=0

Taking the expectation on both sides, since (U ji ) = 0 and (B j ) = 0 for all i = 0


and j, we see that freeness implies that the trace of the right side vanishes (recall
here that, in the definition of freeness, two consecutive elements have to be in
free algebras but the first and the last element can be in the same algebra). Thus,
(i P) = 0, which proves the claim.

Proof of Corollary 5.4.11 The only point to prove is the first. By Theorem 5.4.10,
we know that the normalized trace of any polynomial P in {UiN DNi (UiN ) }1ip
converges to (P({Ui DiUi }1ip )) with the subalgebras generated by {Di }1ip
and {Ui ,Ui }1ip free. Thus, if
P({Xi }1ip ) = Q1 (Xi1 ) Qk (Xik ) , with i+1 = i , 1  k 1
and (Q (Xi )) = (Q (Di )) = 0, then
(P({Ui DiUi }1ip )) = (Ui1 Q1 (Di1 )Ui1 Uik Qk (Dik )Uik ) = 0 ,
since (Q (Di )) = 0 and (Ui ) = (Ui ) = 0.

Exercise 5.4.12 Extend Theorem 5.4.2 to the self-dual random matrices con-
structed in Exercise 2.2.4.

Exercise 5.4.13 In the case where the Di are diagonal matrices, generalize the
arguments of Theorem 5.4.2 to prove Theorem 5.4.5.

Exercise 5.4.14 Take DN (i j) = 1i= j 1i[ N] the projection on the first [ N] indices
and X N be an N N matrix satisfying the hypotheses of Theorem 5.4.5. With In
the identity matrix, set
ZN = DN X N (IN DN ) + (IN DN )X N DN
 
0 X N[ N],[ N]
=
(X N[ N],[ N] ) 0

with X N[ N],[ N] the corner (X N )1i[ N],[ N]+1 jN of the matrix X N . Show
that (Z N )2 has the same eigenvalues as those of the Wishart matrix W N, :=
5.4 L INK WITH RANDOM MATRICES 393

X N[ N],[ N] (X N[ N],[ N] ) with multiplicity 2, plus N 2[ N] zero eigenval-


ues (if 1/2 so that N [ N] [ N] ). Prove the almost sure convergence of
the spectral measure of the Wishart matrix W N, by using Theorem 5.4.5.

Exercise 5.4.15 Continuing in the setup of Exercise 5.4.14, take TN Mat[ N] to


be a self-adjoint matrix with converging spectral distribution. Prove the almost
sure convergence of the spectral measure of the Wishart matrix

X N[ N],[ N] TN TN (X N[ N],[ N] ) .

Exercise 5.4.16 Take ( (p, q))0p,qk1 Mk (C) and put

i j (N) = (p, q)1 [pN/k]i<[(p+1)N/k] for 0 p, q k 1 .


[qN/k] j<[(q+1)N/k]

Take X N to be an N N matrix satisfying the hypotheses of Theorem 5.4.5 and


1
put YiNj = N 2 i j (N)XiNj . Let AN be a deterministic matrix in the noncommutative
probability space MN (C) and DN be the diagonal matrix diag(1/N, 2/N, . . . , 1).
Assume that (AN , (AN ) , DN ) converge in law towards , while the spectral radius
of AN stays uniformly bounded. Prove that (Y N + AN )(Y N + AN ) converges in
law almost surely and in expectation.
Hint: Show that Y N = 1ik2 ai Ni X N Ni , with {Ni , Ni }1ik2 appropriate pro-
jection matrices. Show the convergence in law of {(Ni , Ni )1ik2 , AN , (AN ) } by
approximating the projections Ni by functions of DN . Conclude by using Theo-
rem 5.4.5.

Exercise 5.4.17 Another proof of Theorem 5.4.10 can be based on Theorem 5.4.2
1
and the polar decomposition U jN = GNj (GNj (GNj ) ) 2 with GNj a complex Gaussian
matrix which can be written, in terms of independent self-adjoint Gaussian Wigner
matrices, as GNj = X jN + iX jN .
(i) Show that U jN follows the Haar measure.
1
(ii) Approximating GNj (GNj (GNj ) ) 2 by a polynomial in (X jN , X jN )1 jp , prove
Theorem 5.4.10 by using Theorem 5.4.5.

Exercise 5.4.18 State and prove the analog of Theorem 5.4.10 when the UiN fol-
low the Haar measure on the orthogonal group O(N) instead of the unitary group
U(N).
394 5. F REE PROBABILITY

5.5 Convergence of the operator norm of polynomials of independent GUE


matrices

The goal of this section is to show that not only do the traces of polynomials in
Gaussian Wigner matrices converge to the traces of polynomials in free semicir-
cular variables, as shown in Theorem 5.4.2, but that this convergence extends to
the operator norm, thus generalizing Theorem 2.1.22 and Exercise 2.1.27 to any
polynomial in independent Gaussian Wigner matrices.
The main result of this section is the following.

Theorem 5.5.1 Let (X1N , . . . , XmN ) be a collection of independent matrices from


the GUE. Let (S1 , . . . , Sm ) be a collection of free semicircular variables in a C -
probability space (S , ) equipped with a faithful tracial state. For any noncom-
mutative polynomial P CX1 , . . . , Xm , we have
XN XN
lim P( 1 , . . . , m ) = P(S1 , . . . , Sm ) a.s.
N N N

On the left, we consider the operator norm (largest singular value) of the N N
XN N Xm
random matrix P( 1N , . . . , N
), whereas, on the right, we consider the norm of
P(S1 , . . . , Sm ) in the C -algebra S . The theorem asserts a correspondence be-
tween random matrices and free probability going considerably beyond moment
computations.

Remark 5.5.2 If (A , ) is a C -probability space equipped with a faithful tracial


state, then the norm of a noncommutative random variable a A can be recovered
by the limit formula
1
a = lim ((aa )k ) 2k . (5.5.1)
k

However, (5.5.1) fails in general, because the spectrum of aa can be strictly larger
than the support of the law of aa . We assume faithfulness and traciality in Theo-
rem 5.5.1 precisely so that we can use (5.5.1).

We pause to introduce some notation. Let X = (X1 , . . . , Xm ). We often abbrevi-


ate using this notation. For example, we abbreviate the statement Q(X1 , . . . , Xm )
CX1 , . . . , Xm  to Q(X) CX. Analogous boldface notation will often be used
below.
Theorem 5.5.1 will follow easily from the next proposition. The proof of the
proposition will take up most of this section. Recall that CX is equipped with
the unique involution such that Xi = Xi for i = 1, . . . , m. Recall also that the degree
5.5 C ONVERGENCE OF OPERATOR NORMS 395

of Q = Q(X) CX is defined to be the maximum of the lengths of the words in


the variables Xi appearing in Q.

Proposition 5.5.3 Let XN := (X1N , . . . , XmN ) be a collection of independent matrices


from the GUE. Let S := (S1 , . . . , Sm ) be a collection of free semicircular variables
in a C -probability space (S , ). Fix an integer d 2 and let P = P(X) CX
be a self-adjoint noncommutative polynomial of degree d. Then, for any > 0,
XN
P( N
), for all N large enough, has no eigenvalue at distance larger than from
the spectrum of P(S), almost surely.

We mention the state and degree bound d in the statement of the proposition
because, even though they do not appear in the conclusion, they figure prominently
in many formulas and estimates below. We remark that since formula (5.5.1) is
not needed to prove Proposition 5.5.3, we do not assume faithfulness and traciality
of . Note the scale invariance of the proposition: for any constant > 0, the
conclusion of the proposition holds for P if and only if it holds for P.
Proof of Theorem 5.5.1 (Proposition 5.5.3 granted). We may assume that P is
self-adjoint. By Proposition 5.5.3, using P(S) = P(S),

XN
lim sup P( ) (spectral radius of P(S)) + = P(S) + , a.s. ,
N N

for any positive . Using Theorem 5.4.2, we obtain the bound

1 XN XN
(P(S) ) = lim tr(P( ) ) lim inf P( ) , a.s.
N N N N N
By (5.5.1), and our assumption that is faithful and tracial,

XN 1
lim inf P( ) sup (P(S)2 ) 2 = P(S) , a.s. ,
N N 0

which gives the complementary bound.


We pause for more notation. Recall that, given a complex number z, z and z
denote the real and imaginary parts of z, respectively. In general, we let 1A denote
the unit of a unital complex algebra A . (But we let In denote the unit of Matn (C).)
Note that, for any self-adjoint element a of a C -algebra
A A , and A C such that
> 0, we have that a 1A is invertible and (a 1A )1 A 1/ . The
A
latter observation is used repeatedly below.
For C such that > 0, with P CX self-adjoint, as in Proposition 5.5.3,
396 5. F REE PROBABILITY

let

g( ) = gP ( ) = ((P(S) 1S )1 ) , (5.5.2)
 
1 XN
gN ( ) = gPN ( ) = E tr (P( ) IN )1 . (5.5.3)
N N
Both g( ) and gN ( ) are analytic in the upper half-plane { > 0}. Further, g( )
is the Stieltjes transform of the law of the noncommutative random variable P(S)
under , and gN ( ) is the expected value of the Stieltjes transform of the empirical
XN
distribution of the eigenvalues of the random matrix P( N
). The uniform bounds

1 1
|g( )| , |gN ( )| (5.5.4)

are clear.
We now break the proof of Proposition 5.5.3 into three lemmas.

Lemma 5.5.4 For any choice of constants c0 , c 0 > 0, there exist constants N0 , c1 ,
c2 , c3 > 0 (depending only on P, c0 and c 0 ) such that the following holds.

For all integers N and complex numbers , if

N max(N0 , (c 0 )1/c1 ) , | | c0 , and N c1 c 0 , (5.5.5)

then
c2
|gP ( ) gPN ( )| . (5.5.6)
N 2 ( )c3
P
Now for any > 0 we have g P ( ) = gP ( ) and gN ( ) = gPN ( ). Thus,
crucially, this lemma, just like Proposition 5.5.3, is scale invariant: for any > 0,
the lemma holds for P if and only if it holds for P.

Lemma 5.5.5 For each smooth compactly supported function : R R vanishing


on the spectrum of P(S), there exists a constant c depending only on and P such
that |E N1 tr (P(XN ))| Nc2 for all N.

4 N
Lemma 5.5.6 With and P as above, limN N 3 N1 tr (P(
X
N
)) = 0, almost
surely.

The heart of the matter, and the hardest to prove, is Lemma 5.5.4. The main
idea of its proof is the linearization trick, which has a strong algebraic flavor. But
before commencing the proof of that lemma, we will present (in reverse order) the
chain of implications leading from Lemma 5.5.4 to Proposition 5.5.3.
5.5 C ONVERGENCE OF OPERATOR NORMS 397

Proof of Proposition 5.5.3 (Lemma 5.5.6 granted) Let D = sp(P(S)), and write
D = {y R : d(y, D) < }. Denote by N the empirical measure of the eigenval-
X N XN
ues of the matrix P( N
). By Exercise 2.1.27, the spectral radii of the matrices iN
for i = 1, . . . , m converge almost surely towards 2 and therefore there exists a fi-
nite constant M such that lim supN N ([M, M]c ) = 0 almost surely. Consider a
smooth compactly supported function : R R equal to one on (D )c [M, M]
and vanishing on D /2 [2M, 2M]c . We now see that almost surely for large N,
no eigenvalue can belong to (D )c , since otherwise


1 XN 4
tr (P( )) = (x)d N (x) N 1 ( N 3 ,
N N

in contradiction to Lemma 5.5.6.


Proof of Lemma 5.5.6 (Lemma 5.5.5 granted) As before, let N denote the em-
XN
pirical distribution of the eigenvalues of P( N
). Let i be the noncommutative
derivative defined in (5.4.10). Let X N (,k) be the derivative as it appears in (5.4.13)
 i
and (5.4.15). The quantity (x)d N (x) is a bounded smooth function of XN sat-
isfying


1 XN XN
X N (,k) (x)d N (x) = 3 (( i P)( )  (P( )))k, (5.5.7)
i
N2 N N

where we let A BC = BCA. Formula (5.5.7) can be checked for polynomial
, and then extended to general smooth by approximations. As a consequence,
with d bounding the degree of P as in the statement of Proposition 5.5.3, we find
that

  
C m XiN 2d2 1 XN 2
 (x)d N (x)22 N
N 2 i=1
(  + 1)
N
tr |
(P(
N
))|

for some finite constant C = C(P). Now the Gaussian Poincare inequality

Var( f (XN )) cE |X N (,r) f (XN )|2 (5.5.8)


i
i,,r

must hold with a constant c independent of N and f since all matrix entries
XiN (, r) are standard Gaussian, see Exercise 4.4.6. Consequently, for every suffi-
398 5. F REE PROBABILITY

ciently small > 0, we have


 
Var( (x)d N (x)) cE( (x)d N (x)22 )

2cCmN
E( (x)2 d N (x))
N2
C m XiN 2d2
+c 2 E(  N  1 XiN 2d2 N )
N 2 i=1 N

2cCm C
E( (x)2 d N (x)) +  2 (5.5.9)
N 2 N4
for a constant C = C ( ), where we use the fact that
A N Ap
A Xi A
1 p < , sup E A A
A N A < (5.5.10)
N

by Lemma 2.6.7. But Lemma 5.5.5 implies that E[ (x)2 d N (x)] is at most of
order N 2 since vanishes on the spectrum of P(S). Thus the right side of (5.5.9)
is of order N 4+ at most when vanishes on the spectrum of P(S). Applying
Chebyshevs inequality, we deduce that
 
1
) C N 3 4+
8
P(| (x)d N (x) E( (x)d N (x))| 4
N 3

for a finite constant C = C (P, , ). Thus, by the BorelCantelli Lemma and


 4
Lemma 5.5.5, (x)d N (x) is almost surely of order N 3 at most.

Proof of Lemma 5.5.5 (Lemma 5.5.4 granted) We first briefly review a method for
reconstructing a measure from its Stieltjes transform. Let : R2 C be a smooth
compactly supported function. Put (x, y) = 1 (x + iy )(x, y). Assume that
(x, 0) 0 and (x, 0) 0. Note that by Taylors Theorem (x, y)/|y| is
bounded for |y| = 0. Let be a probability measure on the real line. Then we
have the following formula for reconstructing from its Stieltjes transform:
  +   
(x, y)
dy dx (dt) = (t, 0) (dt) . (5.5.11)
0 t x iy
This can be verified in two steps. One first reduces to the case = 0 , using
Fubinis Theorem, compact support of (x, y) and the hypothesis that
| (x, y)|/|t x iy| | (x, y)|/|y|

is bounded for y > 0. Then, letting |(x, y)| = x2 + y2 , one uses Greens Theorem
on the domain {0 < |(x, y)| R, y 0} with R so large that is supported in
the disc {|(x, y)| R/2}, and with 0.
Now let be as specified in Lemma 5.5.5. Let M be a large positive integer,
5.5 C ONVERGENCE OF OPERATOR NORMS 399

later to be chosen appropriately. Choose the arbitrary constant c0 in Lemma 5.5.4


so that is supported in the interval [c0 , c0 ]. Choose c 0 > 0 arbitrarily. We
claim that there exists a smooth function : R2 C supported in the rectangle
[c0 , c0 ] [c 0 , c 0 ] such that (t, 0) = (t) and (x, y)/|y|M is bounded for
|y| = 0. To prove the claim, pick a smooth function : R [0, 1] identically
equal to 1 near the origin, and supported in the interval [c 0 , c 0 ]. One verifies
i ()
immediately that (x, y) = M =0 ! (x) (y)y has the desired properties. The


claim is proved.
N
As before, let N be the empirical distribution of the eigenvalues of P(
X
N
).
Let be the law of the noncommutative random variable P(S). By hypothesis
vanishes on the spectrum of P(S) and hence also vanishes on the support of . By
(5.5.11) and using the uniform bound
A A
A XN A
A(P( 1 A
) IN ) A 1/ ,
A N
we have
  
E d N = E d N (t) (dt)
  +
= ( (x, y))(gN (x + iy) g(x + iy))dz .
0

Let c4 = c4 (M) > 0 be a constant such that

sup | (x, y)|/|y|M < c4 .


(x,y)[c0 ,c0 ](0,c 0 ]

Then, with constants N0 , c1 , c2 and c3 coming from the conclusion of Lemma


5.5.4, for all N N0 ,
  c0  N c1  c0  c
c4 c2 0
|E d N | 2c4 yM1 dxdy + yMc3 dxdy ,
c0 0 N2 c0 0

where the first error term is justified by the uniform bound (5.5.4). With M large
enough, the right side is of order N 2 at most.

We turn finally to the task of proving Lemma 5.5.4. We need first to introduce
suitable notation and conventions for handling block-decomposed matrices with
entries in unital algebras.
Let A be any unital algebra over the complex numbers. Let Matk,k (A ) denote
the space of k-by-k matrices with entries in A , and write Matk (A ) = Matk,k (A ).
Elements of Matk,k (A ) can and will be identified with elements of the tensor
product Matk,k (C)A . In the case that A itself is a matrix algebra, say Matn (B),
we identify Matk,k (Matn (B)) with Matkn,k n (B) by viewing each element of the
400 5. F REE PROBABILITY

latter space as a k-by-k array of blocks each of which is an n-by-n matrix. Re-
call that the unit of A is denoted by 1A , but that the unit of Matn (C) is usually
denoted by In . Thus, the unit in Matn (A ) is denoted by In 1A .
Suppose that A is an algebra equipped with an involution. Then, given a ma-
trix a Matk (A ), we define a Matk (A ) to be the matrix with entries
(a )i, j = aj,i . Suppose further that A is a C -algebra. Then we use the GNS
construction to equip Matk (A ) with a norm by first identifying A with a C -
subalgebra of B(H) for some Hilbert space H, and then identifying Matk (A )
in compatible fashion with a subspace of B(H  , H k ). In particular, the rules enun-
ciated above equip Matn (A ) with the structure of a C -algebra. That structure is
unique because a C -algebra cannot be renormed without destroying the property
aa  = a2 .
We define the degree of Q Matk (CX) to be the maximum of the lengths of
the words in the variables Xi appearing in the entries of Q. Also, given a collection
x = (x1 , . . . , xm ) of elements in a unital complex algebra A , we define Q(x)
Matk (A ) to be the result of making the substitution X = x in every entry of Q.
Given for i = 1, 2 a linear map Ti : Vi Wi , the tensor product T1 T2 : V1 V2
W1 W2 of the maps is defined by the formula

(T1 T2 )(A1 A2 ) = T1 (A1 ) T2 (A2 ) , Ai Vi .

For example, given A Matk (A ) = Matk (C) MatN (C), one evaluates (idk
N tr)(A) Matk (C) by viewing A as a k-by-k array of N-by-N blocks and then
1

replacing each block by its normalized trace.


We now present the linearization trick. It consists of two parts summarized in
Lemmas 5.5.7 and 5.5.8. The first part is the core idea: it describes the spectral
properties of a certain sort of patterned matrix with entries in a C -algebra. The
second part is a relatively simple statement concerning factorization of a noncom-
mutative polynomial into matrices of degree 1.
To set up for Lemma 5.5.7, fix an integer d 2 and let k1 , . . . , kd+1 be positive
integers such that k1 = kd+1 = 1. Put k = k1 + + kd . For i = 1, . . . , d, let
 @
Ki = 1 + k , . . . , k {1, . . . , k} (5.5.12)
<i i

and put Kd+1 = K1 . Note that {1, . . . , k} is the disjoint union of K1 , . . . , Kd . Let A
be a C -algebra and for i = 1, . . . , d, let ti Matki ki+1 (A ) be given. Consider the
5.5 C ONVERGENCE OF OPERATOR NORMS 401

block-decomposed matrix

t1
..
.
T = Matk (A ) , (5.5.13)
td1
td
where for i = 1, . . . , d, the matrix ti is placed in the block with rows (resp., columns)
indexed by Ki (resp., Ki+1 ), and all other entries of T equal 0 A . We remark that
the GNS-based procedure we used to equip each matrix space Mat p,q (A ) with a
norm implies that
d
T  max ti  . (5.5.14)
i=1
" #
0
Let C be given and put = Matk (C). Below, we write =
0 Ik1
1A , = 1A and more generally = 1A for any Matk (C). This
will not cause confusion, and is needed to compress notation.

Lemma 5.5.7 Assume that t1 td A is invertible and let c be a constant


such that
A A
c (1 + dT )2d2 (1 + A(t1 td )1 A) .
Then the following hold.
(i) T is invertible,
A the entry
A of (T )1 in the upper left equals (t1 td
) , and A(T ) A c.
1 1

A all Mat
(ii) For k (C), if 2cA  < 1, then T is invertible and
A(T )1 (T )1 A 2c2   < c.

Proof Put ti = ti td . The following matrix identity is easy to verify.



t1 1
1 t2 t2 1

.. .. . ..
. . .. .

1 td1 td1 1
td 1 td 1

1 t1
t1 td
1 t 2
1
= .. ..
. . ..
.
1 td1
1
1
402 5. F REE PROBABILITY

Here we have abbreviated notation even further by writing 1 = Iki 1A . The first
matrix above is T . Call the next two matrices A and B, respectively, and the
last D. The matrices A and B are invertible since A Ik is strictly lower triangular
and B Ik is strictly upper triangular. The diagonal matrix D is invertible by the
hypothesis that t1 td is invertible. Thus T is invertible with inverse
( T )1 = AD1 B1 . This proves the first of the three claims made in point (i).
For i, j = 1, . . . , d let B1 (i, j) denote the Ki K j block of B1 . It is not difficult
to check that B1 (i, j) = 0 for i > j, B1 (i, i) = Iki , and B1 (i, j) = ti t j1 for
i < j. The second claim of point (i) can now be verified A A by direct calculation,
and the third by using (5.5.14) to bound A and AB1 A. Point (ii) follows by
consideration of the Neumann series expansion for (Ik (T )1 )1 .

The second part of the linearization trick is the following.

Lemma 5.5.8 Let P CX be given, and let d 2 be an integer bounding the
degree of P. Then there exists an integer n 1 and matrices

V1 Mat1n (CX), V2 , . . . ,Vd1 Matn (CX), Vd Matn1 (CX)

of degree 1 such that P = V1 Vd .

Proof We have
d m m
P= cri1 ,...,ir Xi1 Xir
r=0 i1 =1 ir =1

for some complex constants cri1 ,...,ir . Let {P }n =1 be an enumeration of the terms
(k,)
on the right. Let ei, j Matk (C) denote the elementary matrix with entry 1 in
position (i, j) and 0 elsewhere. Then we have a factorization

P = (e1, V1 )(e , V2 ) (e , Vd1



)(e ,1 Vd )
(1,n) (n,n) (n,n) (n,1)

for suitably chosen Vi CX of degree 1. Take V1 = e1, V1 , V =


(1,n)

e , V for  = 2, . . . , d 1, and Vd = e ,1 Vd . Then V1 , . . . ,Vd have


(n,n) (n,1)

all the desired properties.


We continue to prepare for the proof of Lemma 5.5.4. For the rest of this section
we fix a self-adjoint noncommutative polynomial P CX and also, as in the
statement of Proposition 5.5.3, an integer d 2 bounding the degree of P. For
i = 1, . . . , d, fix Vi Matki ki+1 (CX) of degree 1, for suitably chosen positive
integers k1 , . . . , kd+1 , such that P = V1 Vd . This is possible by Lemma 5.5.8.
Any such factorization serves our purposes. Put k = k1 + + kd and let Ki be as
5.5 C ONVERGENCE OF OPERATOR NORMS 403

defined in (5.5.12). Consider the matrix



V1
..
.
L= Matk (CX) , (5.5.15)
Vd1
Vd
where, for i = 1, . . . , d, the matrix Vi occupies the block with rows (resp., columns)
indexed by the set Ki (resp., Ki+1 ), and all other entries of L equal 0 CX. It is
convenient to write
m
L = a0 1CX + ai Xi , (5.5.16)
i=1

for uniquely determined matrices ai Matk (C). As we will see, Lemma 5.5.7
XN
allows us to use the matrices L( N
) and L(S) to code the spectral properties of
XN
P( N
) and P(S), respectively. We will exploit this coding to prove Lemma 5.5.4.
We will say that any matrix of the form L arising from P by the factorization
procedure above is a d-linearization of P. Of course P has many d-linearizations.
However, the linearization construction is scale invariant in the sense that, for any
constant > 0, if L is a d-linearization of P, then 1/d L is a d-linearization of P.
Put
A A
A XN A 8d8
1 = sup E(1 + d AL( )A
A ) , (5.5.17)
N=1 N A
m
2 = a0  + ai 2 , (5.5.18)
i=1
3 = (1 + dL(S))2d2 . (5.5.19)

Note that 1 < by (5.5.10). We will take care to make all our estimates below
explicit in terms of the constants i (and the constant c appearing in (5.5.8)), in an-
ticipation of exploiting the scale invariance of Lemma 5.5.4 and the d-linearization
construction.
We next present the linearized versions
" of the# definitions (5.5.2) and (5.5.3).
0
For C such that > 0, let = Matk (C). We define
0 Ik1

G( ) = (idk )((L(S) 1S )1 ) , (5.5.20)


1 XN
GN ( ) = E(idk tr)((L( ) IN )1 ) , (5.5.21)
N N
which are matrices in Matk (C).
404 5. F REE PROBABILITY

The next two lemmas, which are roughly parallel in form, give the basic prop-
erties of GN ( ) and G( ), respectively, and in particular show that these matrices
are well defined.

Lemma 5.5.9 (i) For C such that > 0, GN ( ) is well defined, depends
analytically on , and satisfies the bound
1
GN ( ) 1 (1 + ). (5.5.22)

(ii) The upper left entry of GN ( ) equals gN ( ).
(iii) We have
A A
A m A c 2
A A 1 4
AIk + ( a0 )GN ( ) + ai GN ( )ai GN ( )A
1 2
(1 + ) , (5.5.23)
A i=1
A N 2

where c is the constant appearing in (5.5.8).

We call (5.5.23) the SchwingerDyson approximation. Indeed, as N goes to infin-


ity, the left hand side of (5.5.23) must go to zero, yielding a system of equations
which is closely related to (5.4.12). We remark also that the proof of (5.5.23) fol-
lows roughly the same plan as was used in Section 2.4.1 to give Proof #2 of the
semicircle law.
Proof As before, let e,r = eN,N,r MatN (C) denote the elementary matrix with
entry 1 in position (, r), and 0 elsewhere. Given A Matkn (C), let

A[, r] = (idk trN )((Ik er, )A) Matk (C) ,

so that A = ,r A[, r] e,r . (Thus, within this proof, we view A as an N-by-N
array of k-by-k blocks A[, r].)
Since is fixed throughout the proof, we drop it from the notation to the extent
possible. To abbreviate, we write

XN 1 1 N
RN = (L( ) IN )1 , HN = (idk tr)RN = RN [i, i].
N N N i=1
From Lemma 5.5.7(i) we get an estimate
A A
A XN A 2d2 1
RN  (1 + d AL( )A
A
A) (1 + ) (5.5.24)
N
which, combined with (5.5.17), yields assertion (i). From Lemma 5.5.7(i) we also
get assertion (ii).
Assertion (iii) will follow from an integration by parts as in (5.4.15). Recall
5.5 C ONVERGENCE OF OPERATOR NORMS 405

that X N (,r) XiN ( , r ) = i,i , r,r . We have, for i {1, . . . , m} and , r,  , r
i
{1, . . . , N},
1
X N (r,) RN [r ,  ] = RN [r , r]ai RN [,  ] . (5.5.25)
i N
Recall that E X N (r,) f (XN ) = EXiN (, r) f (XN ). We obtain
i

1
ERN ( )[r , r]ai RN ( )[,  ] = EXiN (, r)RN ( )[r ,  ] . (5.5.26)
N
Now left-multiply both sides of (5.5.26) by ai
N 3/2
, and sum on i,  =  , and r = r ,
thus obtaining the first equality below.
m
1 XN
E(ai HN ai HN ) = E(idk tr)((L( ) a0 IN )RN )
i=1 N N
1
= E(idk tr) (Ik IN + (( a0 ) IN )RN )
N
= Ik + ( a0 )GN ( ) .
The last two steps are simple algebra. Thus the left side of (5.5.23) is bounded by
the quantity
A A
A m A
A A
N = AE[ ai (HN EHN )ai (HN EHN )]A
A i=1 A
A A2
A A
( ai 2 )EHN EHN 22 c( ai 2 )E AX N (r,) HN A ,
i 2
i i i,,r

where at the last step we use once again the Gaussian Poincare inequality in the
form (5.5.8). For the quantity at the extreme right under the expectation, we have
by (5.5.25) an estimate
1   1
3
N i,r,,r ,
tr RN [ , r]ai RN [,  ]RN [, r ] ai RN [r , r] 2 ( ai 2 )RN 4 .
N i

The latter, combined with (5.5.17), (5.5.18) and (5.5.24), finishes the proof of
(5.5.23).


We will need a generalization of G( ). For any Matk (C) such that
L(S) 1S is invertible, we define
G() = (idk )((L(S) 1S )1 ) .
Now for C such that G( ) is defined, G() is also defined and
" #
0
G = G( ) . (5.5.27)
0 Ik1
406 5. F REE PROBABILITY

Thus, the function G() should be regarded as an extension of G( ). Let O be


the connected open subset of Matk (C) consisting of all sums of the form
" #
0
+ ,
0 Ik1
where
1
C , Matk (C) , > 0 , 23  (1 + ) < 1. (5.5.28)

Recall that the constant 3 is specified in (5.5.19).

Lemma 5.5.10 (i) For C such that > 0, G( ) is well defined, depends
analytically on , and satisfies the bound
1
G( ) k2 3 (1 + ). (5.5.29)

(ii) The upper left entry of G( ) equals g( ).
(iii) More generally, G() is well defined and analytic for O, and satisfies the
bound
A " #  A
A 0 A 1 2 1
AG + G( )A A 2k 3 (1 + )   < k 3 (1 + )
2 2 2
A 0 Ik1
(5.5.30)
for and as in (5.5.28).
(iv) If there exists O such that a0 is invertible and the operator

(L(S) a0 1S )(( a0 )1 1S ) Matk (S ) (5.5.31)

has norm < 1, then


m
Ik + ( a0 )G() + ai G()ai G() = 0 (5.5.32)
i=1

for all O.

In particular, G() is by (5.5.32) invertible for all O. As we will see in


the course of the proof, equation (5.5.32) is essentially a reformulation of the
SchwingerDyson equation (5.4.12).
Proof Let us specialize Lemma 5.5.7 by taking ti = Vi (S) for i = 1, . . . , d and hence
T = L(S). Then we may take 3 (1 + 1/ )1 as the constant in Lemma 5.5.7.
We note also the crude bound (idk )(M) k2 M for M Matk (S ). By
Lemma 5.5.7(i) the operator L(S) 1S is invertible, with inverse bounded in
norm by 3 (1 + 1/ )1 and possessing (P(S) 1S )1 as its upper left entry.
5.5 C ONVERGENCE OF OPERATOR NORMS 407

Points (i) and (ii) of Lemma 5.5.10 follow. In view of the relationship (5.5.27) be-
tween G() and G( ), point (iii) of Lemma 5.5.10 follows from Lemma 5.5.7(ii).
It remains only to prove assertion (iv). Since the open set O is connected, and
G() is analytic on O, it is necessary only to show that (5.5.32) holds for all in
the nonempty open subset of O consisting of for which the operator (5.5.31) is
defined and has norm < 1. Fix such now, and let M denote the corresponding
operator (5.5.31). Put
bi = ai ( a0 )1 Matk (C)
for i = 1, . . . , m. By developing
(L(S) 1S )1 = (( a0 )1 1S )(Ik 1S M)1 ,
as a power series in M, we arrive at the identity

Ik + ( a0 )G() = (idk )(M +1 ) .
=0

According to the SchwingerDyson equation (5.4.12),



bi (idk )(Si M  ) = bi (idk )(M p1 )bi (idk )(Mp ) ,
p=1

whence, after summation, we get (5.5.32).


Remark 5.5.11 In Exercise 5.5.15 we indicate a purely operator-theoretic way to


prove (5.5.32), using a special choice of C -probability space.

Lemma 5.5.12 Fix C and a positive " integer N# such that > 0 and the
0
right side of (5.5.23) is < 1/2. Put = Matk (C). Then GN ( ) is
0 Ik1
invertible and the matrix
m
N ( ) = GN ( )1 + a0 ai GN ( )ai (5.5.33)
i=1

satisfies
2c1 22 1 4 1 2
N ( )  (1 + ) (| | + 1 + 2 + 1 2 + ), (5.5.34)
N 2
where c is the constant appearing in (5.5.8).

Proof Let us write


m
Ik + ( a0 )GN ( ) + ai GN ( )ai GN ( ) = N ( ) .
i=1
408 5. F REE PROBABILITY

By hypothesis N ( ) < 1/2, hence Ik N ( ) is invertible, hence GN ( ) is


invertible, and we have an algebraic identity
m
N ( ) = (Ik N ( ))1 N ( )( a0 + ai GN ( )ai ) .
i=1

We now arrive at estimate (5.5.34) by our hypothesis N ( ) < 1/2, along with
(5.5.23) to bound N ( ) more strictly, and finally (5.5.18) and (5.5.22).

We record the last trick.

Lemma 5.5.13 Let z, w Matk (C) be invertible. If


m m m
z1 + ai zai = w1 + ai wai , and zw ai 2 < 1 ,
i=1 i=1 i=1

then z = w.

Proof Suppose that z = w. We have w z = m


i=1 zai (w z)ai w after some alge-
braic manipulation, whence a contradiction.

Completion of the proof of Lemma 5.5.4 By the scale invariance of Lemma


5.5.4 and of the d-linearization construction, for any constant > 0, we are free to
replace P by P, and hence to replace the linearization L by 1/d L. Thus, without
loss of generality, we may assume that
1
1 < 2 , 2 < , 3 < 2 . (5.5.35)
18
The # of Lemma 5.5.10(iv) is then fulfilled. More precisely, with =
" hypothesis
i 0
, the matrix a0 is invertible, and the operator (5.5.31) has norm
0 Ik1
< 1. Consequently, we may take the SchwingerDyson equation (5.5.32) for
granted.
Now fix c0 , c 0 > 0 arbitrarily. We are free to increase c 0 , so we may assume
that
c 0 > 3 . (5.5.36)
We then pick N0 and c1 so that:
If (5.5.5) holds, then the right side of (5.5.23) is < 1/2 and
1 1 1
the right side of (5.5.34) is < 23 (1 + ) .
Suppose now that N and satisfy (5.5.5). Then N ( ) is well defined by formula
(5.5.33) because GN ( ) is invertible, and moreover belongs to O. We claim that
G(N ( )) = GN ( ) . (5.5.37)
5.5 C ONVERGENCE OF OPERATOR NORMS 409

To prove (5.5.37), which is an equality of analytic functions of , we may assume


in view of (5.5.36) that
> 2 . (5.5.38)

Put z = GN ( ) and w = G(N ( )). Now


z < 3

by (5.5.22), (5.5.35) and (5.5.38), whereas


w < 6

by (5.5.29), (5.5.30), (5.5.35) and (5.5.38). Applying the SchwingerDyson equa-


tion (5.5.32) along with (5.5.35), we see that the hypotheses of Lemma 5.5.13 are
fulfilled. Thus z = w, which completes the proof of the claim (5.5.37). The claim
granted, for suitably chosen c2 and c3 , the bound (5.5.6) in Lemma 5.5.4 holds by
(5.5.30) and (5.5.34), along with Lemma 5.5.9(ii) and Lemma 5.5.10(ii). In turn,
the proofs of Proposition 5.5.3 and Theorem 5.5.1 are complete.

In the next two exercises we sketch an operator-theoretic approach to the


SchwingerDyson equation (5.5.32) based on the study of BoltzmannFock space
(see Example 5.3.3).

Exercise 5.5.14 Let T, and S be bounded linear operators on a Hilbert space.


Assume that T is invertible. Assume that is a projector and let = 1 be
the complementary projector. Assume that

S = S and T = T S = .
Then we have

= T 1 (T T ST ) = (T T ST ) T 1 . (5.5.39)

Hint: Use the block matrix factorization


" # " #" #" #
a b 1 bd 1 a bd 1 c 0 1 0
=
c d 0 1 0 d d 1 c 1
in the Hilbert space setting.

Exercise 5.5.15 Let V be a finite-dimensional Hilbert space with orthonormal


F i be the corresponding BoltzmannFock space, as
basis {ei }m
i=1 . Let H = i=0 V
0
in Example 5.3.3. Let v V H be the vacuum state. Equip B(H) with the
state = (a  av, v). For i = 1, . . . , m, let i = ei B(H) be the left creation
operator previously considered. We will also consider the right creation operator
ri = ei B(H). For i = 1, . . . , m put si = i + i and recall that s1 , . . . , sm are
410 5. F REE PROBABILITY

free semicircular elements in B(H). Put s = (s1 , . . . , sm ).


(i) For = 1, . . . , m, show that r r = 1B(H) and = r r is the orthogonal
projection of H onto the closed linear span of all words ei1 eir with terminal
letter eir equal to e .
(ii) Let 0 B(H) be the orthogonal projection of H onto V 0 . Show that we have
F
an orthogonal direct sum decomposition H = m =0 H.
(iii) Verify the relations

si = r si r , 0 si r = i 0 = r si 0 (5.5.40)

holding for i, , = 1, . . . , m.
(iv) Identify Matk (B(H)) with B(H k ). Let L = a0 + m i=1 ai Xi Matk (CX) be
of degree 1. Fix Matk (C) such that T = L(s) 1B(H) B(H k ) is invert-
ible. Put = Ik 0 B(H k ) and S = m 1 (I r ) B(H k ). Put
i=1 (Ik ri )T k i
G() = (idk )(T 1 ). Use (5.5.39) and (5.5.40) to verify (5.5.32).

5.6 Bibliographical notes

For basics in free probability and operator algebras, we relied on Voiculescus


St. Flour course [Voi00b] and on [VoDN92]. A more combinatorial approach is
presented in [Spe98]. For notions of operator algebras which are summarized in
Appendix G, we used [Rud91], [DuS58], [Mur90], [Li92], [Ped79] and [Dix69].
For affiliated operators, we relied on [BeV93] and [DuS58], and on the paper
[Nel74]. (In particular, the remark following Definition 5.2.28 clarifies that the
notion of affiliated operators in these references coincide.) Section 5.3.2 follows
closely [Spe03]. Many refinements of the relation between free cumulants and
freeness can be found in the work of Speicher, Nica and co-workers, see the mem-
oir [Spe98] and the recent book [NiS06] with its bibliography. A theory of cumu-
lants for finite dimensional random matrices was initiated in [CaC06]. Subjects
related to free probability are also discussed in the collection of papers [Voi97].
Free additive convolutions were first studied in [Voi86] and [BeV92] for boun-
ded operators, then generalized to operators with finite variance in [Maa92] and
finally to the general setting presented here in [BeV93]. A detailed study of free
convolution by the semicircle law was done by Biane [Bia97b]. Freeness for
rectangular matrices and related free convolution were studied in [BeG09]. The
Markovian structure of free convolution (see [Voi00a] for a basic derivation) was
shown in [Voi93] and [Bia98a, Theorem 3.1] to imply the existence of a unique
subordination function F : CC such that

for all z C\R, Ga+b (z) = Ga (F(z)),


5.6 B IBLIOGRAPHICAL NOTES 411

F(C+ ) C+ , F(z) = F(z), (F(z)) (z) for z C+ and F(iy)/iy1 as y


goes to infinity while staying in R.

Note that, according to [BeV93, Proposition 5.2], the second set of conditions on
F is equivalent to the existence of a probability measure on R so that F = F is
the reciprocal of a Stieltjes transform. Such a point of view can actually serve as
a definition of free convolution, see [ChG08] or [BeB07].
Lemma 5.3.40 is a particularly simple example of infinite divisibility. The as-
sumption of finite variance in the lemma can be removed by observing that the
solution of (5.3.26) is infinitely divisible, and then using [BeV93, Theorem 7.5].
The theory of free infinite divisibility parallels the classical one, and in particular,
a LevyKhitchine formula does exist to characterize infinitely divisible laws, see
[BeP00] and [BaNT04]. The former paper introduces the BercoviciPata bijec-
tion between the classical and free infinitely divisible laws (see also the Boolean
BercoviciPata bijection in [BN08]). Matrix approximations to free infinitely di-
visible laws are constructed in [BeG05].
The generalization of multiplicative free convolution to affiliated operators is
done in [BeV93], see also [NiS97].
The relation between random matrices and asymptotic freeness was first estab-
lished in the seminal article of Voiculescu [Voi91]. In [Voi91, Theorem 2.2], he
proved Theorem 5.4.5 in the case of Wigner Gaussian (Hermitian) random matri-
ces and diagonal matrices {DNi }1ip , whereas in [Voi91, Theorem 3.8], he gen-
eralized this result to independent unitary matrices. In [Voi98b], he removed the
former hypothesis on the matrices {DNi }1ip to obtain Theorem 5.4.5 for Gaus-
sian matrices and Theorem 5.4.10 in full generality (following the same ideas as in
Exercise 5.4.17). An elegant proof of Theorem 5.4.2 for Gaussian matrices which
avoid combinatorial arguments appears in [CaC04]. Theorem 5.4.2 was extended
to non-Gaussian entries in [Dyk93b]. The proof of Theorem 5.4.10 we presented
follows the characterization of the law of free unitary variables by a Schwinger
Dyson equation given in [Voi99, Proposition 5.17] and the ideas of [CoMG06].
Other proofs were given in terms of Weingarten functions in [Col03] and with a
more combinatorial approach in [Xu97]. For uses of master loop (or Schwinger
Dyson) equations in the physics literature, see e.g. [EyB99] and [Eyn03].
Asymptotic freeness can be extended to other models such as joint distribu-
tion of random matrices with correlated entries [ScS05] or to deterministic mod-
els such as permutation matrices [Bia95]. Biane [Bia98b] (see also [Sni06] and
[Bia01]) showed that the asymptotic behavior of rescaled Young diagrams and as-
sociated representations and characters of the symmetric groups can be expressed
in terms of free cumulants.
412 5. F REE PROBABILITY

The study of the correction (central limit theorem) to Theorem 5.4.2 for Gaus-
sian entries was performed in [Cab01] and [MiS06]. The generalization to non-
Gaussian entries, as done in [AnZ05], is still open in the general noncommutative
framework. A systematic study and analysis of the limiting covariance was un-
dertaken in [MiN04]. The failure of the central limit theorem for a matrix model
whose potential has two deep wells was shown in [Pas06].
We have not mentioned the notion of freeness with amalgamation, which is a
freeness property where the scalar-valued state is replaced by an operator-valued
conditional expectation with properties analogous to conditional expectation from
classical probability theory. This notion is particularly natural when consider-
ing the algebra generated by two subalgebras. For instance, the free algebras
{Xi }1ip as in Theorem 5.4.5 are free with amalgamation with respect to the al-
gebra generated by the {Di }1ip . We refer to [Voi00b] for definitions and to
[Shl98] for a nice application to the study the asymptotics of the spectral measure
of band matrices. The central limit theorem for the trace of mixed moments of
band matrices and deterministic matrices was done in [Gui02].
The convergence of the operator norm of polynomials in independent GUE ma-
trices discussed in Section 5.5 was first proved in [HaT05]. (The norms of the lim-
iting object, namely free operators with matrix coefficients, were already studied
in [Leh99].) This result was generalized to independent matrices from the GOE
and the GSE in [Sch05], see also [HaST06], and to Wigner or Wishart matrices
with entries satisfying the Poincare inequality in [CaD07]. It was also shown in
[GuS08] to hold with matrices whose laws are absolutely continuous with respect
to the Lebesgue measure and possess a strictly log-concave density. The norm of
long words in free noncommutative variables is discussed in [Kar07a]. We note
that a by-product of the proof of Theorem 5.5.1 is that the Stieltjes transform of
the law of any self-adjoint polynomial in free semicircular random variables is
an algebraic function, as one sees by applying the algebraicity criterion [AnZ08b,
Theorem 6.1], to the SchwingerDyson equation as expressed in the form (5.5.32).
Proposition 5.5.3 is analogous to a result for sample covariance matrices proved
earlier in [BaS98a].
Many topics related to free probability have been left out in our discussion. In
particular, we have not mentioned free Brownian motion as defined in [Spe90],
which appears as the limit of the Hermitian Brownian motion with size going
to infinity [Bia97a]. We refer to [BiS98b] for a study of the related stochastic
calculus, to [Bia98a] for the introduction of a wide class of processes with free
increments and for the study of their Markov properties, to [Ans02] for the intro-
duction of stochastic integrals with respect to processes with free increments, and
to [BaNT02] for a thorough discussion of Levy processes and Levy laws. Such
5.6 B IBLIOGRAPHICAL NOTES 413

a stochastic calculus was used to prove a central limit theorem in [Cab01], large
deviation principles, see the survey [Gui04], and the convergence of the empirical
distribution of interacting matrices [GuS08]. In such a noncommutative stochastic
calculus framework, inequalities such as the BurkholderDavisGundy inequality
[BiS98b] or the BurkholderRosenthal inequalities [JuX03] hold.
Another important topic we did not discuss is the notion of free entropy. We re-
fer the interested readers to the reviews [Voi02] and [HiP00b]. Voiculescu defined
several concepts for an entropy in the noncommutative setup. First, the so-called
microstates entropy was defined in [Voi94], analogously to the
BoltzmannShannon entropy, as the volume of the collection of random matri-
ces whose empirical distribution approximates a given tracial state. Second, in
[Voi98a], the microstates-free free entropy was defined by following an infinitesi-
mal approach based on the free Fisher information. Voiculescu showed in [Voi93]
that, in the case of one variable, both entropies are equal. Following a large de-
viations and stochastic processes approach, bounds between these two entropies
could be given in the general setting, see [CaG01] and [BiCG03], providing strong
evidence toward the conjecture that they are equal in full generality. Besides its
connections with large deviations questions, free entropies were used to define
in [Voi94] another important concept, namely the free entropy dimension. This
dimension is related with L2 -Betti numbers [CoS05], [MiS05] and is analogous
to a fractal dimension in the classical setting [GuS07]. A long standing conjec-
ture is that the entropy dimension is an invariant of the von Neumann algebra,
which would settle the well known problem of the isomorphism between free
group factors [Voi02, section 2.6]. Free entropy theory has already been used to
settle some important questions in von Neumann algebras, see [Voi96], [Ge97],
[Ge98] or [Voi02, section 2.5]. In another direction, random matrices can be an
efficient way to tackle questions concerning C -algebras or von Neumman alge-
bras, see e.g. [Voi90], [Dyk93a], [Rad94], [HaT99], [Haa02], [PoS03], [HaT05],
[HaST06], [GuJS07] and [HaS09].
The free probability concepts developed in this chapter, and in particular free
cumulants, can also be used in more applied subjects such as telecommunications,
see [LiTV01] and [TuV04].
Appendices

A Linear algebra preliminaries

This appendix recalls some basic results from linear algebra. We refer the reader
to [HoJ85] for further details and proofs.

A.1 Identities and bounds

The following identities are repeatedly used. Throughout, A, B,C, D denote arbi-
trary matrices of appropriate dimensions. We then have
" # " #" #
A B A 0 1 A1 B
1det A =0 det = det
C D C D CA1 B 0 1
= det A det[D CA1 B] , (A.1)
where the right side of (A.1) is set to 0 if A is not invertible.
The following lemma, proved by multiplying on the right by (X zI) and on
the left by (X A zI), is very useful.

Lemma A.1 (Matrix inversion) For matrices X, A and scalar z, the following
identity holds if all matrices involved are invertible:
(X A zI)1 (X zI)1 = (X A zI)1 A(X zI)1 .

Many manipulations of matrices involve their minors. Thus, let I = {i1 , . . . , i|I| }
{1, . . . , m}, J = { j1 , . . . , j|J| } {1, . . . , n}, and for an m-by-n matrix A, let AI,J
be the |I|-by-|J| matrix obtained by erasing all entries that do not belong to a row
with index from I and a column with index from J. That is,
AI,J (l, k) = A(il , jk ) , l = 1, . . . , |I|, k = 1, . . . , |J| .

414
A. L INEAR ALGEBRA PRELIMINARIES 415

The I, J minor of A is then defined as det AI,J . We have the following.

Theorem A.2 (CauchyBinet Theorem) Suppose A is an m-by-k matrix, B a


k-by-n matrix, C = AB, and, with r min{m, k, n}, set I = {i1 , . . . , ir } {1, . . . , m},
J = { j1 , . . . , jr } {1, . . . , n}. Then, letting Kr,k denote all subsets of {1, . . . , k} of
cardinality r,
detCI,J = det AI,K det BK,J . (A.2)
KKr,k

We next provide a fundamental bound on determinants.

Theorem A.3 (Hadamards inequality) For any column vectors v1 , . . . , vn of


length n with complex entries, it holds that
n  n
det [v1 . . . vn ] vi T vi nn/2 |vi | .
i=1 i=1

A.2 Perturbations for normal and Hermitian matrices

We recall that a normal matrix A satisfies the relation AA = A A. In particular,


( )
all matrices in HN , = 1, 2, are normal.
/
In what follows, we let A2 := i, j |A(i, j)|2 denote the Frobenius norm of
the matrix A. The following lemma is a corollary of Gersgorins circle theorem.

Lemma A.4 (Perturbations of normal matrices) Let A be an N by N normal


matrix with eigenvalues i , i = 1, . . . , N, and let E be an arbitrary N by N matrix.
Let be any eigenvalues of A + E. Then there is an i {1, . . . , N} such that
| i | E2 .

For Hermitian matrices, more can be said. Recall that, for a Hermitian matrix A,
we let 1 (A) 2 (A) N (A) denote the ordered eigenvalues of A. We first
recall the

(2)
Theorem A.5 (Weyls inequalities) Let A, B HN . Then, for each k {1, . . . , N},
we have
k (A) + 1 (B) k (A + B) k (A) + N (B) . (A.3)

The following is a useful corollary of Weyls inequalities.


416 A PPENDICES
(2)
Corollary A.6 (Lipschitz continuity) Let A, E HN . Then
|k (A + E) k (A)| E2 . (A.4)

Corollary A.6 is weaker than Lemma 2.1.19, which in its Hermitian formulation,
see Remark 2.1.20, actually implies that, under the same assumptions,

|k (A + E) k (A)|2 E22 . (A.5)


k

We finally note the following comparison, whose proof is based on the


CourantFischer representation of the eigenvalues of Hermitian matrices.

(2)
Theorem A.7 Let A HN and z CN . Then, for 1 k N 2,
k (A zz ) k+1 (A) k+2 (A zz ) . (A.6)

A.3 Noncommutative matrix L p -norms


Given X Matk (C) with singular values 1 r 0, where r = min(k, ),
and a constant 1 p , one defines the noncommutative L p -norm of X by
 1/p
X p = ri=1 ip if p < and X = lim p X p = 1 .

Theorem A.8 The noncommutative L p norms satisfy the following.


A A
X = X  = AX T A .
p p p
(A.7)
UX p = X p for unitary matrices U Matk (C) . (A.8)
tr(XX ) = X22 . (A.9)
 1/p
r
X p |Xi,i | p for 1 p . (A.10)
i=1
 p is a norm on the complex vector space Matk (C) . (A.11)

Properties (A.7), (A.8) and (A.9) are immediate consequences of the definition. A
proof of (A.10) and (A.11) can be found in [Sim05b, Prop. 2.6 & Thm. 2.7]. It
follows from (A.10) that if X is a square matrix then
X1 | tr(X)| . (A.12)

For matrices X and Y with complex entries which can be multiplied, and expo-
nents 1 p, q, r satisfying 1p + 1q = 1r , we have the noncommutative Holder
inequality
XY r X p Y q . (A.13)
A. L INEAR ALGEBRA PRELIMINARIES 417

(See [Sim05b, Thm. 2.8].)

A.4 Brief review of resultants and discriminants

Definition A.9 Let


m m n n
P = P(t) = ait i = am (t i ), Q = Q(t) = b j t j = bn (t j ),
i=0 i=1 j=0 j=1

be two polynomials where the as, bs, s and s are complex numbers, the lead
coefficients am and bn are nonzero, and t is a variable. The resultant of P and Q is
defined as
m n m n
n (i j ) = am Q(i ) = (1) bn P( j ).
R(P, Q) = anm bm n mn m
i=1 j=1 i=1 j=1

The resultant R(P, Q) can be expressed as the determinant of the (m + n)-by-(m +


n) Sylvester matrix

am . . . a0
.. ..
. .


.. ..
. .

am . . . a0 .

bn . . . . . . . . . b0

.. ..
. .
bn . . . . . . . . . b0
Here there are n rows of as and m rows of bs. In particular, the resultant R(P, Q) is
a polynomial (with integer coefficients) in the as and bs. Hence R(P, Q) depends
only on the as and bs and does so continuously.

Definition A.10 Given a polynomial P as in Definition A.9, the discriminant of P


is defined as
m
D(P) = (1)m(m1)/2 R(P, P ) = (1)m(m1)/2 P (i )
i=1
= a2m1
m (i j )2 . (A.14)
1i< jn

We emphasize that D(P) depends only on the as and does so continuously.


418 A PPENDICES

B Topological preliminaries

The material in Appendices B and C is classical. These appendices are adapted


from [DeZ98].

B.1 Generalities

A family of subsets of a set X is a topology if 0/ , if X , if any union


of sets of belongs to , and if any finite intersection of elements of belongs
to . A topological space is denoted (X , ), and this notation is abbreviated to
X if the topology is obvious from the context. Sets that belong to are called
open sets. Complements of open sets are closed sets. An open set containing a
point x X is a neighborhood of x. Likewise, an open set containing a subset
A X is a neighborhood of A. The interior of a subset A X , denoted Ao , is
the union of the open subsets of A. The closure of A, denoted A, is the intersection
of all closed sets containing A. A point p is called an accumulation point of a set
A X if every neighborhood of p contains at least one point in A. The closure
of A is the union of its accumulation points.
A base for the topology is a collection of sets A such that any set from
is the union of sets in A . If 1 and 2 are two topologies on X , 1 is called
stronger (or finer) than 2 , and 2 is called weaker (or coarser) than 1 if 2 1 .
A topological space is Hausdorff if single points are closed and every two dis-
tinct points x, y X have disjoint neighborhoods. It is regular if, in addition,
any closed set F X and any point x / F possess disjoint neighborhoods. It is
normal if, in addition, any two disjoint closed sets F1 , F2 possess disjoint neigh-
borhoods.
If (X , 1 ) and (Y , 2 ) are topological spaces, a function f : X Y is a
bijection if it is one-to-one and onto. It is continuous if f 1 (A) 1 for any A 2 .
This implies also that the inverse image of a closed set is closed. Continuity is
preserved under compositions, i.e., if f : X Y and g : Y Z are continuous,
then g f : X Z is continuous. If both f and f 1 are continuous, then f is
a homeomorphism, and spaces X , Y are called homeomorphic if there exists a
homeomorphism f : X Y .
A function f : X R is lower semicontinuous (upper semicontinuous) if its
level sets {x X : f (x) } (respectively, {x X : f (x) } ) are closed
sets. Clearly, every continuous function is lower (upper) semicontinuous and the
pointwise supremum of a family of lower semicontinuous functions is lower semi-
continuous.
B. T OPOLOGICAL PRELIMINARIES 419

A Hausdorff topological space is completely regular if for any closed set F


X and any point x / F, there exists a continuous function f : X [0, 1] such
that f (x) = 1 and f (y) = 0 for all y F.
A cover of a set A X is a collection of open sets whose union contains A. A
set is compact if every cover of it has a finite subset that is also a cover. A contin-
uous image of a compact set is compact. A continuous bijection between compact
spaces is a homeomorphism. Every compact subset of a Hausdorff topological
space is closed. A set is pre-compact if its closure is compact. A topological
space is locally compact if every point possesses a neighborhood that is compact.

Theorem B.1 A lower (upper) semicontinuous function f achieves its minimum


(respectively, maximum) over any compact set K.

Let (X , ) be a topological space, and let A X . The relative (or induced)


0
topology on A is the collection of sets A . The Hausdorff, normality and regu-
larity properties are preserved under the relative topology. Furthermore, the com-
pactness is preserved, i.e., B A is compact in the relative topology iff it is com-
pact in the original topology . Note, however, that the closedness property is
not preserved.
A nonnegative real function d : X X R is called a metric if d(x, y) = 0
x = y, d(x, y) = d(y, x), and d(x, y) d(x, z) + d(z, y). The last property is referred
to as the triangle inequality. The set Bx, = {y : d(x, y) < } is called the ball of
center x and radius . The metric topology of X is the weakest topology which
contains all balls. The set X equipped with the metric topology is a metric space
(X , d). A topological space whose topology is the same as some metric topology
is called metrizable. Every metrizable space is normal. Every regular space that
possesses a countable base is metrizable.
A sequence xn X converges to x X (denoted xn x) if every neighbor-
hood of x contains all but a finite number of elements of the sequence {xn }. If
X , Y are metric spaces, then f : X Y is continuous iff f (xn ) f (x) for any
convergent sequence xn x. A subset A X of a topological space is sequen-
tially compact if every sequence of points in A has a subsequence converging to a
point in X .

Theorem B.2 A subset of a metric space is compact iff it is closed and sequentially
compact.

A set A X is dense if its closure is X . A topological space is separable if it


420 A PPENDICES

contains a countable dense set. Any topological space that possesses a countable
base is separable, whereas any separable metric space possesses a countable base.
Even if a space is not metric, the notion of convergence on a sequence may be
extended to convergence on filters, or nets, such that compactness, closedness,
etc. may be checked by convergence. The interested reader is referred to [DuS58]
or [Bou87] for details.
Let J be an arbitrary set. Let X be the Cartesian product of topological spaces
X j , i.e., X = j X j . The product topology on X is the topology generated by
the base j U j , where U j are open and equal to X j except for a finite number
of values of j. This topology is the weakest one which makes all projections
p j : X X j continuous. The Hausdorff property is preserved under products,
and any countable product of metric spaces (with metric dn (, )) is metrizable,
with the metric on X given by

1 dn (pn x, pn y)
d(x, y) = 2n 1 + dn (pn x, pn y) .
n=1

Theorem B.3 (Tychonoff) A product of compact spaces is compact.

B.2 Topological vector spaces and weak topologies

A vector space over the reals is a set X that is closed under the operations of
addition and multiplication by scalars, i.e., if x, y X , then x + y X and x
X for all R. All vector spaces in this book are over the reals. A topological
vector space is a vector space equipped with a Hausdorff topology that makes the
vector space operations continuous. The convex hull of a set A, denoted co(A), is
the intersection of all convex sets containing A. The closure of co(A) is denoted
co(A). co({x1 , . . . , xN }) is compact, and, if Ki are compact, convex sets, then the

set co( Ni=1 Ki ) is closed. A locally convex topological vector space is a vector
space that possesses a convex base for its topology.

Theorem B.4 Every (Hausdorff) topological vector space is regular.

A linear functional on the vector space X is a function f : X R that satisfies


f ( x + y) = f (x) + f (y) for any scalars , R and any x, y X . The
algebraic dual of X , denoted X , is the collection of all linear functionals on
X . The topological dual of X , denoted X , is the collection of all continuous
linear functionals on the topological vector space X . Both the algebraic dual
and the topological dual are vector spaces. Note that, whereas the algebraic dual
may be defined for any vector space, the topological dual may be defined only
B. T OPOLOGICAL PRELIMINARIES 421

for a topological vector space. The product of two topological vector spaces is
a topological vector space, and is locally convex if each of the coordinate spaces
is locally convex. The topological dual of the product space is the product of the
topological duals of the coordinate spaces. A set H X is called separating if
for any point x X , x = 0, one may find an h H such that h(x) = 0. It follows
from its definition that X is separating.

Theorem B.5 (HahnBanach) Suppose A and B are two disjoint, nonempty,


closed, convex sets in the locally convex topological vector space X . If A is
compact, then there exists an f X and scalars , R such that, for all
x A, y B,

f (x) < < < f (y) . (B.1)

It follows in particular that if X is locally convex, then X is separating. Now


let H be a separating family of linear functionals on X . The H -topology of
X is the weakest (coarsest) one that makes all elements of H continuous. Two
particular cases are of interest.
(a) If H = X , then the X -topology on X obtained in this way is called the
weak topology of X . It is weaker (coarser) than the original topology on X .
(b) Let X be a topological vector space (not necessarily locally convex). Every
x X defines a linear functionals fx on X by the formula fx (x ) = x (x). The
set of all such functionals is separating in X . The X -topology of X obtained
in this way is referred to as the weak topology of X .

Theorem B.6 Suppose X is a vector space and Y X is a separating vector


space. Then the Y -topology makes X into a locally convex topological vector
space with X = Y .

It follows in particular that there may be different topological vector spaces with
the same topological dual. Such examples arise when the original topology on X
is strictly finer than the weak topology.

Theorem B.7 Let X be a locally convex topological vector space. A convex


subset of X is weakly closed iff it is originally closed.

Theorem B.8 (BanachAlaoglu) Let V be a neighborhood of 0 in the topological


vector space X . Let K = {x X : |x (x)| 1 , x V }. Then K is weak
compact.
422 A PPENDICES

B.3 Banach and Polish spaces

A norm || || on a vector space X is a metric d(x, y) = ||x y|| that satisfies


the scaling property || (x y)|| = ||x y|| for all > 0. The metric topology
then yields a topological vector space structure on X , which is referred to as a
normed space. The standard norm on the topological dual of a normed space X
is ||x ||X = sup||x||1 |x (x)|, and then ||x|| = sup||x ||X 1 x (x), for all x X .

A Cauchy sequence in a metric space X is a sequence xn X such that,


for every > 0, there exists an N( ) such that d(xn , xm ) < for any n > N( )
and m > N( ). If every Cauchy sequence in X converges to a point in X , the
metric in X is called complete. Note that completeness is not preserved under
homeomorphism. A complete separable metric space is called a Polish space. In
particular, a compact metric space is Polish, and an open subset of a Polish space
(equipped with the induced topology) is homeomorphic to a Polish space.

A complete normed space is called a Banach space. The natural topology on a


Banach space is the topology defined by its norm.

A set B in a topological vector space X is bounded if, given any neighborhood


V of the origin in X , there exists an > 0 such that { x : x B, | | } V .
In particular, a set B in a normed space is bounded iff supxB ||x|| < . A set B in
a metric space X is totally bounded if, for every > 0, it is possible to cover B
by a finite number of balls of radius centered in B. A totally bounded subset of
a complete metric space is pre-compact.

Unlike in the Euclidean setup, balls need not be convex in a metric space. How-
ever, in normed spaces, all balls are convex. Actually, the following partial con-
verse holds.

Theorem B.9 A topological vector space is normable, i.e., a norm may be de-
fined on it that is compatible with its topology, iff its origin has a convex bounded
neighborhood.

Weak topologies may be defined on Banach spaces and their topological duals. A
striking property of the weak topology of Banach spaces is the fact that compact-
ness, apart from closure, may be checked using sequences.

Theorem B.10 (EberleinSmulian) Let X be a Banach space. In the weak


topology of X , a set is sequentially compact iff it is pre-compact.
C. P ROBABILITY MEASURES ON P OLISH SPACES 423

B.4 Some elements of analysis

We collect below some basic results tying measures and functions on locally com-
pact Hausdorff spaces. In most of our applications, the underlying space will be
R. A good reference that contains this material is [Rud87].

Theorem B.11 (Riesz representation theorem) Let X be a locally compact Haus-


dorff space, and let be a positive linear functional on Cc (X). Then there exists
a -algebra M in X which contains all Borel sets in X, and there exists a unique
positive measure on M which represents in the sense that

f = f d for every f Cc (X).
X

We next discuss the approximation of measurable functions by nice functions.


Recall that a function s is said to be simple if there are measurable sets Ai and real
constants (i )1in such that s = ni=1 i 1Ai .

Theorem B.12 Let X be a measure space, and let f : X [0, ] be measurable.


Then there exist simple functions (s p ) p0 on X such that 0 s1 s2 sk f
and sk (x) converges to f (x) for all x X.

The approximation of measurable functions by continuous ones is often achieved


using the following.

Theorem B.13 (Lusin) Suppose X is a locally compact Hausdorff space and


is a positive Borel measure on X. Let A X be measurable with (A) < , and
suppose f is a complex measurable function on X, with f (x) = 0 if x A. Then,
for any > 0 there exists a g Cc (X) such that

({x : f (x) = g(x)}) < .

Furthermore, g can be taken such that supxX |g(x)| supxX | f (x)|.

C Probability measures on Polish spaces

C.1 Generalities

The following indicates why Polish spaces are convenient when handling measur-
ability issues. Throughout, unless explicitly stated otherwise, Polish spaces are
equipped with their Borel -fields.
424 A PPENDICES

Theorem C.1 (Kuratowski) Let 1 , 2 be Polish spaces, and let f : 1 2 be


a measurable, one-to-one map. Let E1 1 be a Borel set. Then f (E1 ) is a Borel
set in 2 .

A probability measure on the Borel -field B of a Hausdorff topological space


is a countably additive, positive set function with () = 1. The space of
(Borel) probability measures on is denoted M1 (). When is separable, the
structure of M1 () becomes simpler, and conditioning becomes easier to handle;
namely, let , 1 be two separable Hausdorff spaces, and let be a probability
measure on (, B ). Let : 1 be measurable, and let = 1 be the
measure on B1 defined by (E1 ) = ( 1 (E1 )).

Definition C.2 A regular conditional probability distribution given (referred to


as r.c.p.d.) is a mapping 1 1  1 M1 () such that:
(a) there exists a set N B1 with (N) = 0 and, for each 1 1 \N,

1 ({ : ( ) = 1 }) = 0 ;

(b) for any set E B , the map 1  1 (E) is B1 measurable and



(E) = 1 (E) (d 1 ) .
1

It is property (b) that allows for the decomposition of measures. In Polish spaces,
the existence of an r.c.p.d. follows from:

Theorem C.3 Let , 1 be Polish spaces, M1 (), and : 1 a measurable


map. Then there exists an r.c.p.d. 1 . Moreover, it is unique in the sense that any
other r.c.p.d. 1 satisfies

({1 : 1 = 1 }) = 0 .

Another useful property of separable spaces is their behavior under products.

Theorem C.4 Let N be either finite or N = .


(a) Ni=1 B BN .
i=1
(b) If is separable, then Ni=1 B = BN .
i=1

We now turn our attention to the particular case where is metric (and, when-
ever needed, Polish).

Theorem C.5 Let be a metric space. Then any M1 () is regular.


C. P ROBABILITY MEASURES ON P OLISH SPACES 425

Theorem C.6 Let be Polish, and let M1 (). Then there exists a unique
closed set C such that (C ) = 1 and, if D is any other closed set with (D) = 1,
then C D. Finally,

C = { : U o (U o ) > 0 } .

The set C of Theorem C.6 is called the support of .


A probability measure on the metric space is tight if, for each > 0,
there exists a compact set K such that (Kc ) < . A family of probability
measures { } on the metric space is called a tight family if the set K may be
chosen independently of .

Theorem C.7 Each probability measure on a Polish space is tight.

C.2 Weak topology

Whenever is Polish, a topology may be defined on M1 () that possesses nice


properties; namely, define the weak topology on M1 () as the topology generated
by the sets

U ,x, = { M1 () : | d x| < } ,

where Cb (), > 0 and x R. If one takes only functions Cb () that are
of compact support, the resulting topology is the vague topology.
Hereafter, M1 () always denotes M1 () equipped with the weak topology. The
following are some basic properties of this topological space.

Theorem C.8 Let be Polish.


(i) M1 () is Polish.
(ii) A metric compatible with the weak topology is the Levy metric:

d( , ) = inf{ : (F) (F ) + F closed} .

(iii) M1 () is compact iff is compact.


(iv) Let E be a dense countable subset of . The set of all probability
measures whose supports are finite subsets of E is dense in M1 ().
(v) Another metric compatible with the weak topology is the Lipschitz bound-
ed metric:  
dLU ( , ) = sup | f d f d| , (C.1)
f FLU
426 A PPENDICES

where FLU is the class of Lipschitz continuous functions f : R, with


Lipschitz constant at most 1 and uniform bound 1.

The space M1 () possesses a useful criterion for compactness.

Theorem C.9 (Prohorov) Let be Polish, and let M1 (). Then is compact
iff is tight.

Since M1 () is Polish, convergence may be decided by sequences. The following


lists some useful properties of converging sequences in M1 ().

Theorem C.10 (Portmanteau theorem) Let be Polish. The following state-


ments are equivalent.
(i) n as n .  
(ii) g bounded and uniformly continuous, lim g d n = g d.
n
(iii) F closed, lim sup n (F) (F).
n
(iv) G open, lim inf n (G) (G).
n
(v) A B , which is a continuity set, i.e., such that (A\Ao ) = 0, limn
n (A) = (A).

A collection of functions G B() is called convergence determining for M1 ()


if
 
lim gd n = gd , g G n n .
n

For Polish, there exists a countable convergence determining collection of func-


tions for M1 () and the collection { f (x)g(y)} f ,gCb () is convergence determining
for M1 (2 ).

Theorem C.11 Let be Polish. If K is a set of continuous, uniformly bounded


functions on that are equicontinuous on compact subsets of , then n
implies that
1  2
lim sup sup | d n d | = 0 .
n K

The following theorem is the analog of Fatous Lemma for measures. It is proved
from Fatous Lemma either directly or by using the Skorohod representation the-
orem.
D. N OTIONS OF LARGE DEVIATIONS 427

Theorem C.12 Let be Polish. Let f : [0, ] be a lower semicontinuous


function, and assume n . Then
 
lim inf f d n f d .
n

D Basic notions of large deviations

This appendix recalls basic definitions and main results of large deviation theory.
We refer the reader to [DeS89] and [DeZ98] for a full treatment.
In what follows, X will be assumed to be a Polish space (that is a complete sep-
arable metric space). We recall that a function f : X R is lower semicontinuous
if the level sets {x : f (x) C} are closed for any constant C.

Definition D.1 A sequence (N )NN of probability measures on X satisfies a large


deviation principle with speed aN (going to infinity with N) and rate function I iff
I : X[0, ] is lower semicontinuous. (D.1)

1
For any open set O X, lim inf log N (O) inf I. (D.2)
N aN O

1
For any closed set F X, lim sup log N (F) inf I. (D.3)
N aN F

When it is clear from the context, we omit the reference to the speed or rate func-
tion and simply say that the sequence {N } satisfies the LDP. Also, if xN are
X-valued random variables distributed according to N , we say that the sequence
{xN } satisfies the LDP if the sequence {N } satisfies the LDP.

Definition D.2 A sequence (N )NN of probability measures on X satisfies a weak


large deviation principle if (D.1) and (D.2) hold, and in addition (D.3) holds for
all compact sets F X.

The proof of a large deviation principle often proceeds first by the proof of a weak
large deviation principle, in conjuction with the so-called exponential tightness
property.

Definition D.3 (a) A sequence (N )NN of probability measures on X is exponen-


tially tight iff there exists a sequence (KL )LN of compact sets such that
1
lim sup lim sup log N (KLc ) = .
L N aN
428 A PPENDICES

(b) A rate function I is good if the level sets {x X : I(x) M} are compact for
all M 0.

The interest in these concepts lies in the following.

Theorem D.4 (a) ([DeZ98, Lemma 1.2.18]) If {N } satisfies the weak LDP and
it is exponentially tight, then it satisfies the full LDP, and the rate function I is
good.
(b) ([DeZ98, Exercise 4.1.10]) If {N } satisfies the upper bound (D.3) with a good
rate function I, then it is exponentially tight.

A weak large deviation principle is itself equivalent to the estimation of the prob-
ability of deviations towards small balls.

Theorem D.5 Let A be a base of the topology of X. For every A A , define


1
A = lim inf log N (A)
N aN
and
I(x) = sup A .
AA :xA

Suppose that, for all x X,


 -
1
I(x) = sup lim sup log N (A) .
AA :xA N aN
Then N satisfies a weak large deviation principle with rate function I.

Let d be the metric in X, and set B(x, ) = {y X : d(y, x) < }.

Corollary D.6 Assume that, for all x X,


1 1
I(x) = lim sup lim sup log N (B(x, )) = lim inf lim inf log N (B(x, )) .
0 N aN 0 N aN

Then N satisfies a weak large deviation principle with rate function I.

From a given large deviation principle one can deduce a large deviation principle
for other sequences of probability measures by using either the so-called contrac-
tion principle or Laplaces method.

Theorem D.7 (Contraction principle) Assume that the sequence of probability


measures (N )NN on X satisfies a large deviation principle with good rate func-
tion I. Then, for any function F : XY with values in a Polish space Y which is
D. N OTIONS OF LARGE DEVIATIONS 429

continuous, the image (FN )NN M1 (Y )N defined as FN (A) = F 1 (A)


also satisfies a large deviation principle with the same speed and rate function
given for any y Y by
J(y) = inf{I(x) : F(x) = y}.

Theorem D.8 (Varadhans Lemma) Assume that (N )NN satisfies a large devi-
ation principle with good rate function I. Let F : XR be a bounded continuous
function. Then

1
lim log eaN F(x) d N (x) = sup{F(x) I(x)}.
N aN xX

Moreover, the sequence


1
N (dx) =  eaN F(x) d N (x) M1 (X)
eaN F(y) d N (y)
satisfies a large deviation principle with good rate function
J(x) = I(x) F(x) sup{F(y) I(y)}.
yX

Laplaces method for the asymptotic evaluation of integrals, which is discussed


in Section 3.5.1, can be viewed as a (refined) precursor to Theorem D.8 in a nar-
rower context. In developing it, we make use of the following elementary result.

Lemma D.9 (Asymptotics for Laplace transforms) Let f : R+ C posses poly-


nomial growth at infinity. Suppose that for some exponent > 1 and complex
constant B,
f (t) = At + O(t +1 ) as t 0.
Consider the Laplace transform

F(x) = f (t)etx dt
0
which is defined (at least) for all real x > 0. Then,
 
B( + 1) 1
F(x) = + O +2 as x .
x +1 x

Proof In the special case f (t) = Bt we have F(x) = B(x+1+1) , and hence the
claim holds. To handle the general case we may assume that B = 0. Then we
  
have 01 etx f (t)dt = O( 0 t +1 etx dt) and 1 etx f (t)dt decays exponentially
fast, which proves the lemma.

Note that if f (t) has an expansion in powers t , t +1 , t +2 and so on, then


430 A PPENDICES

iterated application of the claim yields an asymptotic expansion of the Laplace


transform F(x) at infinity in powers x 1 , x 2 , x 3 and so on.

E The skew field H of quaternions and matrix theory over F


Whereas the reader is undoubtedly familiar with the fields R and C, the skew
field H of quaternions invented by Hamilton may be less familiar. We give a brief
account of its most important features here. Then, with F denoting any of the
(skew) fields R, C or H, we recount (without proof) the elements of matrix theory
over F, culminating in the spectral theorem (Theorem E.11) and its corollaries. We
also prove a couple of specialized results (one concerning projectors and another
concerning Lie algebras of unitary groups) which are well known in principle but
for which references uniform in F are not known to us.

Definition E.1 The field H is the associative (but not commutative) R-algebra
with unit for which 1, i, j, k form a basis over R, and in which multiplication is
dictated by the rules
i2 = j2 = k2 = ijk = 1. (E.1)

Elements of H are called quaternions. Multiplication in H is not commutative.


However, every nonzero element of H is invertible. Indeed, we have (a + bi +
cj + dk)1 = (a bi cj dk)/(a2 + b2 + c2 + d 2 ) for all a, b, c, d R not all
vanishing. Thus H is a skew field: that is, an algebraic system satisfying all the
axioms of a field except for commutativity of multiplication.

Remark E.2 Here is a concrete model for the quaternions in terms of matrices.
Note that the matrices
" # " # " #
i 0 0 1 0 i
, ,
0 i 1 0 i 0
with complex number entries satisfy the rules (E.1). It follows that the map
" #
a + bi c + di
a + bi + cj + dk  (a, b, c, d R)
c + di a bi
is an isomorphism of H onto a subring of the ring of 2-by-2 matrices with entries
in C. The quaternions often appear in the literature identified with 2-by-2 matrices
in this way. We do not use this identification in this book.

For every
x = a + bi + cj + dk H (a, b, c, d R)
E. Q UATERNIONS AND MATRIX THEORY OVER F 431

we define

x = a2 + b2 + c2 + d 2 , x = a bi cj dk, x = a.

We then have
x + x
x2 = xx , xy = x y, (xy) = y x , x = , xy = yx
2
for all x, y H. In particular, we have x1 = x /x2 for nonzero x H.
The space of all real multiples of 1 H is a copy of R and the space of all real
linear combinations of 1 and i is a copy of C. Thus R and C can be and will be
identified with subfields of H, and in particular both i and i will be used to denote
the imaginary unit of the complex numbers. In short, we think of R, C and H as
forming a tower
R C H.

If x C, then x (resp., x , x) is the absolute value (resp., complex conjugate,


real part) of x in the usual sense. Further, jx = x j for all x C. Finally, for all
nonreal x C, we have {y H | xy = yx} = C.

E.1 Matrix terminology over F and factorization theorems

Let Mat pq (F) denote the space of p-by-q matrices with entries in F. Given
X Mat pq (F), let Xi j F denote the entry of X in row i and column j. Let
Mat pq = Mat pq (R) and Matn (F) = Matnn (F). Let 0 pq denote the p-by-q
zero matrix, and let 0 p = 0 pp . Let In denote the n-by-n identity matrix. Given
X Mat pq (F), let X Matqp (F) be the matrix obtained by transposing X and
then applying asterisk to every entry. The operation X  X is R-linear and,
furthermore, (XY ) = Y X for all X Mat pq (F) and Y Matqr (F). Similarly,
we have (xX) = X x for any matrix X Mat pq (F) and scalar x F. Given
X Matn (F), we define tr X F to be the sum of the diagonal entries of X. Given
X,Y Mat pq (F), we set X Y = tr X Y , thus equipping Mat pq (F) with the
structure of finite-dimensional real Hilbert space (Euclidean space). Given ma-
trices Xi Matni (F) for i = 1, . . . , , let diag(X1 , . . . , X ) Matn1 ++n (F) be the
block-diagonal matrix obtained by stringing the given matrices Xi along the diag-
onal.

(p,q)
Definition E.3 The matrix ei j = ei j Mat pq with entry 1 in row i and column
j and 0s elsewhere is called an elementary matrix.
432 A PPENDICES

The set
{uei j | u F {1, i, j, k}, ei j Mat pq }
is an orthonormal basis for Mat pq (F).

Definition E.4 (i) Let X Matn (F) be a matrix. It is invertible if there exists
Y Matn (F) such that Y X = In = XY . It is normal if X X = XX . It is unitary
if X X = In = XX . It is self-adjoint (resp., anti-self-adjoint) if X = X (resp.,
X = X). It is upper triangular (resp., lower triangular) if Xi j = 0 unless i j
(resp., i j).
(ii) A matrix X Matn (F) is monomial if there is exactly one nonzero entry in
every row and in every column; if, moreover, every entry of X is either 0 or 1, we
call X a permutation matrix.
(iii) A self-adjoint X Matn (F) is positive definite if v Xv > 0 for all nonzero
v Matn1 (F).
(iv) A matrix X Matn (F) is a projector if it is both self-adjoint and idempotent,
that is, if X = X = X 2 .
(v) A matrix X Mat pq (F) is diagonal if Xi j = 0 unless i = j. The set of positions
(i, i) for i = 1, . . . , min(p, q) is called the (main) diagonal of X.

The group of invertible elements of Matn (F) is denoted GLn (F), while the sub-
group of GLn (F) consisting of unitary matrices is denoted Un (F). Permutation
matrices in Matn belong to Un (F).
We next present several factorization theorems. The first is obtained by the
Gaussian elimination method.

Theorem E.5 (Gaussian elimination) Let X Mat pq (F) have the property that
for all v Matq1 (F), if Xv = 0, then v = 0. Then p q. Furthermore, there exists
a permutation matrix P Mat p (F) and an upper triangular matrix T Matq (F)
with every diagonal entry equal to 1 such that PXT vanishes above the main
diagonal but vanishes nowhere on the main diagonal.

In particular, for square A, B Mat p (F), if AB = Ip , then BA = Ip . It follows also


that GLn (F) is an open subset of Matn (F).
The GramSchmidt process gives more information when p = q.

Theorem E.6 (Triangular factorization) Let Q Matn (F) be self-adjoint and


positive definite. Then there exists a unique upper triangular matrix T Matn (F)
with every diagonal entry equal to 1 such that T QT is diagonal. Further, T
depends smoothly (that is, infinitely differentiably) on the entries of Q.
E. Q UATERNIONS AND MATRIX THEORY OVER F 433

Corollary E.7 (UT factorization) Every X GLn (F) has a unique factorization
X = UT where T GLn (F) is upper triangular with every diagonal entry positive
and U Un (F).

Corollary E.8 (Unitary extension) If V Matnk (F) satisfies V V = Ik , then


n k and there exists U Un (F) agreeing with V in the first k columns.

Corollary E.9 (Construction of projectors) Let p and q be positive integers. Fix


Y Mat pq (F). Put n = p + q. Write T (Ip + YY )T = I p for some (unique)
upper triangular matrix T Mat p (F) with positive diagonal entries. Then =
" #
TT T T Y
(Y ) = Matn (F) is a projector. Further, every projector
Y T T Y T T Y
Matn (F) such that tr = p and the p p block in upper left is invertible is
of the form = (Y ) for unique Y Mat pq (F).

E.2 The spectral theorem and key corollaries

A reference for the proof of the spectral theorem in the unfamiliar case F = H is
[FaP03].

Definition E.10 (Standard blocks) A C-standard block is any element of Mat1 (C)
= C. An H-standard block is any element of Mat1 (C) = C with nonnegative
imaginary
" part. An # R-standard block is either an element of Mat1 = R, or a
a b
matrix Mat2 with b > 0. Finally, X Matn (F) is F-reduced if
b a
X = diag(B1 , . . . , B ) for some F-standard blocks Bi .

Theorem E.11 (Spectral theorem) Let X Matn (F) be normal.


(i) There exists U Un (F) such that U XU is F-reduced.
(ii) Fix U Un (F) and F-standard blocks B1 , . . . , B such that diag(B1 , . . . , B )
= U XU. Up to order, the Bi depend only on X, not on U.

Corollary E.12 (Eigenvalues) Fix a self-adjoint X Matn (F).


(i) There exist U Un (F) and a diagonal matrix D Matn such that D = U XU.
(ii) For any such D and U, the sequence of diagonal entries of D arranged in
nondecreasing order is the same.

We call the entries of D the eigenvalues of the self-adjoint matrix X. (When


F = R, C this is the standard notion of eigenvalue.)
434 A PPENDICES

Corollary E.13 (Singular values) Fix X Mat pq (F).


(i) There exist U U p (F), V Uq (F) and diagonal D Mat pq such that D =
UXV .
(ii) For any such U, V and D, the sequence of absolute values of diagonal entries
of D arranged in nondecreasing order is the same.
(iii) Now assume that p q, and that X is diagonal with nonzero diagonal entries
the absolute values of which are distinct. Then, for any U, V and D as in (i),
U is monomial and V = diag(V ,V ), where V U p (F) and V Uqp (F). (We
simply put V = V if p = q.) Furthermore, the product UV is diagonal and squares
to the identity.

We call the absolute values of the entries of D the singular values of the rectangu-
lar matrix X. (When F = R, C this is the standard notion of singular value.) The
squares of the singular values of X are the eigenvalues of X X or XX , whichever
has min(p, q) rows and columns.

E.3 A specialized result on projectors

We present a factorization result for projectors which is used in the discussion of


the Jacobi ensemble in Section 4.1. The case F = C of the result is well known.
But for lack of a suitable reference treating the factorization uniformly in F, we
give a proof here.

Proposition E.14 Let 0 < p q be integers and put n = p + q. Let Matn (F)
be a projector. Then there exists U Un (F) commuting with diag(Ip , 0q ) such that
" #
a b
U U = , where a Mat p , 2b Mat pq and d Matq are diagonal
bT d
with entries in the closed unit interval [0, 1].
" #
a
Proof Write = with a Mat p (F), Mat pq (F) and d Matq (F).
d
Since every element of Un (F) commuting with diag(Ip , 0q ) is of the form diag(v, w)
for v U p (F) and w Uq (F), we may by Corollary E.13 assume that a and d are
diagonal and real. Necessarily the diagonal entries of a and d belong to the closed
unit interval [0, 1]. For brevity, write ai = aii and d j = d j j . We may assume that
the diagonal entries of a are ordered so that ai (1 ai ) is nonincreasing as a func-
tion of i, and similarly d j (1 d j ) is nonincreasing as a function of j. We may
further assume that whenever ai (1 ai ) = ai+1 (1 ai+1 ) we have ai ai+1 , but
that whenever d j (1 d j ) = d j+1 (1 d j+1 ) we have d j d j+1 .
E. Q UATERNIONS AND MATRIX THEORY OVER F 435

From the equation 2 = we deduce that a(I p a) = and d(Iq d) =


. Let b Mat p be the unique diagonal matrix with nonnegative entries such
that b2 = . Note that the diagonal entries of b appear in nonincreasing order,
and in particular all nonvanishing diagonal entries are grouped together in the
upper left. Furthermore, all entries of b belong to the closed interval [0, 1/2].
By Corollary E.13 there exist v U p (F) and w Uq (F) such that v[b 0 p(qp) ]w
= . From the equation b2 = we deduce that v commutes with b2 and hence
also with b. After replacing w by diag(v, Iqp )w, we may assume without loss of
generality that = [b 0 p(qp) ]w. From the equation

w diag(b2 , 0qp )w = = d(Iq d) ,

we deduce that w commutes with diag(b, 0qp ).


Let 0 r p be the number of nonzero diagonal entries of b. Write b =
diag(b, 0 pr ), where b GLr (R). Since w commutes with diag(b, 0qr ), we can
write w = diag(w, w ), where w Ur (F) and w Uqr (F). Then we have =
[diag(bw, 0 pr ) 0 p(qp) ] and, further, w commutes with b.
Now write a = diag(a, a ) with a Matr and a Mat pr . Similarly, write
d ) with d Matr and d Matqr . Both a and d are diagonal with
d = diag(d,
diagonal entries in (0, 1). Both a and d are diagonal with diagonal entries in
{0, 1}. We have a block decomposition

a 0 bw 0
0 a 0 0
= w b 0 d 0 .

0 0 0 d

From the equation 2 = we deduce that baw = abw = bw(Ir d),


hence aw =
w(Ir d),
hence a and Ir d have the same eigenvalues, and hence (on account of
the care we took in ordering the diagonal entries of a and d), we have a = Ir d.
Finally, since d and w commute, with U = diag(I p , w, Iqr ), we have U U =
" #
a b
.

bT d

E.4 Algebra for curvature computations

We present an identity needed to compute the Ricci curvature of the special or-
thogonal and special unitary groups, see Lemma F.27 and the discussion immedi-
ately following. The identity is well known in Lie algebra theory, but the effort
436 A PPENDICES

needed to decode a typical statement in the literature is about equal to the effort
needed to prove it from scratch. So we give a proof here.
Let sun (F) be the set of anti-self-adjoint matrices X Matn (F) such that, if
F = C, then tr X = 0. We equip the real vector space sun (F) with the inner product
inherited from Matn (F), namely X Y = tr XY . Let [X,Y ] = XY Y X for X,Y
Matn (F), noting that sun (F) is closed under the bracket operation. Let = 1, 2, 4
according as F = R, C, H.

Proposition E.15 For all X sun (F) and orthonormal bases {L } for sun (F), we
have
 
1 (n + 2)
[[X, L ], L ] = 1 X . (E.2)
4 4

Proof We have su1 (R) = su1 (C) = 0, and the case su1 (H) can be checked by
direct calculation with i, j, k. Therefore we assume that n 2 for the rest of the
proof.
Now for fixed X sun (F), the expression [[X, L], M] for L, M sun (F) is an
R-bilinear form on sun (F). It follows that the left side of (E.2) is independent of
the choice of orthonormal basis {L }. We are therefore free to choose {L } at
our convenience, and we do so as follows. Let ei j Matn for i, j = 1, . . . , n be the
elementary matrices. For 1 k < n and u {i, j, k}, let
 
k
u u n
Dk =
u
kek+1,k+1 + eii , Dk = Dik , Dun = eii .
k + k2 i=1 n i=1

For 1 i < j n and u {1, i, j, k}, let


uei j u e ji
Fiuj = , Ei j = Fi1j , Fi j = Fiij .
2
Then
{Ei j : 1 i < j n},
{Dk : 1 k < n} {Ei j , Fi j : 1 i < j n},
{Duk : 1 k n, u {i, j, k}} {Fiuj : 1 i < j n, u {1, i, j, k}}
are orthonormal bases for sun (R), sun (C) and sun (H), respectively.
We next want to show that, in proving (E.2), it is enough to consider just one
X, namely X = E12 . We achieve that goal by proving the following two claims.

(I) Given {L } and X for which (E.2) holds and any U Un (F), again (E.2)
holds for {UL U } and UXU .
F. M ANIFOLDS 437

(II) The set {UE12U | U Un (F)} spans sun (F) over R.

Claim (I) holds because the operation X  UXU stabilizes sun (F), preserves the
bracket [X,Y ], and preserves the inner product X Y . We turn to the proof of claim
(II). By considering conjugations that involve appropriate 2-by-2 blocks, one can
generate any element of the collection {F12 u , Du } from E . Further, using conju-
1 12
gation by permutation matrices and taking linear combinations, one can generate
{Fiuj , Duk }. Finally, to obtain Dun , it is enough to show that diag(i, i, 0, . . . , 0) can be
generated, and this follows from the identity

diag(1, j)diag(i, i)diag(1, j)1 = diag(i, i).

Thus claim (II) is proved.


We are ready to conclude. The following facts may be verified by straightfor-
ward calculations:

E12 commutes with Duk for k > 1 and u {i, j, k};


E12 commutes with Fiuj for 2 < i < j n and u {1, i, j, k};
[[E12 , Fiuj ], Fiuj ] = 12 E12 for 1 i < j < n such that #{i, j} {1, 2} = 1 and
u {1, i, j, k}; and
u ], F u ] = [[E , Du ], Du ] = 2E for u {i, j, k}.
[[E12 , F12 12 12 1 1 12

It follows that the left side of (E.2) with X = E12 and {L } specially chosen as
above equals cE12 , where the constant c is equal to
 
1 1 (n + 2)
2 (n 2) + 2 2( 1) = 1.
4 2 4
Since (E.2) holds with X = E12 and specially chosen {L }, by the previous steps
it holds in general. The proof of the lemma is finished.

F Manifolds

We have adopted in Section 4.1 a framework in which all groups of matrices we


used were embedded as submanifolds of Euclidean space. This had the advantage
that the structure of the tangent space was easy to identify. For completeness, we
present in this appendix all notions employed, and provide in Subsection F.2 the
proof of the coarea formula, Theorem 4.1.8. An inspiration for our treatment is
[Mil97]. At the end of the appendix, in Subsection F.3, we introduce the language
of connections, LaplaceBeltrami operators, and Hessians, used in Section 4.4.
For the latter we follow [Hel01] and [Mil63].
438 A PPENDICES

F.1 Manifolds embedded in Euclidean space

Given a differentiable function f defined on an open subset of Rn with values in


a finite-dimensional real vector space and an index i = 1, . . . , n, we let i f denote
the partial derivative of f with respect to the ith coordinate. If n = 1, then we write
f = 1 f .

Definition F.1 A Euclidean space is a finite-dimensional real Hilbert space E,


with inner product denoted by (, )E . A Euclidean set M is a nonempty locally
closed subset of E, which we equip with the induced topology.

(A locally closed set is the intersection of a closed set with an open set.) We refer
to E as the ambient space of M.
We consider Rn as Euclidean space by adopting the standard inner product
(x, y)Rn = x y = ni=1 xi yi . Given Euclidean spaces E and F, and a map f : U V
from an open subset of E to an open subset of F, we say that f is smooth if (after
identifying E with Rn and F with Rk as vector spaces over R in some way) f is
infinitely differentiable.
Given for i = 1, 2 a Euclidean set Mi with ambient space Ei , we define the
product M1 M2 to be the subset {m1 m2 | m1 M1 , m2 M2 } of the orthogonal
direct sum E1 E2 .
Let f : M N be a map from one Euclidean set to another. We say that f is
smooth if for every point p M there exists an open neighborhood U of p in the
ambient space of M such that f |UM can be extended to a smooth map from U to
the ambient space of N. If f is smooth, then f is continuous. We say that f is a
diffeomorphism if f is smooth and has a smooth inverse, in which case we also
say that M and N are diffeomorphic. Note that the definition implies that every
n-dimensional linear subspace of a Euclidean space is diffeomorphic to Rn .

Definition F.2 (Manifolds) A manifold M of dimension n (for short: n-manifold)


is a Euclidean set such that every point of M has an open neighborhood diffeo-
morphic to an open subset of Rn .

We call n the dimension of M and write n = dim M. A diffeomorphism : T U


where T Rn is a nonempty open set and U is an open subset of M is called a
chart of M. By definition M is covered by the images of charts. The product of
manifolds is again a manifold. A subset N M is called a submanifold if N is a
manifold in its own right when viewed as a subset of the ambient space of M.
F. M ANIFOLDS 439

Definition F.3 Let M be an n-manifold with ambient space E. Let p M be a


point. A curve through p M is by definition a smooth map : I M, where
I R is a nonempty open interval, 0 I, and (0) = p. We define the tangent
space T p (M) of M at p to be the subset of E consisting of all vectors of the form
(0) for some curve through p M.

The set T p (M) is a vector subspace of E of dimension n over R. More precisely,


for any chart : T U and point t0 T such that (t0 ) = p, the vectors (i )(t0 )
for i = 1, . . . , n form a basis over R for T p (M). We endow T p (M) with the struc-
ture of Euclidean space it inherits from E.
Let f : M N be a smooth map of manifolds, and let p M. There exists
a unique R-linear transformation T p ( f ) : T p (M) T f (p) (N) with the follow-
ing property: for every curve with (0) = p and (0) = X Tp (M), we have
(T p ( f ))(X) = ( f ) (0). We call T p ( f ) the derivative of f at p. The map T p ( f )
is an isomorphism if and only if f maps some open neighborhood of p M diffeo-
morphically to some open neighborhood of f (p) N. If f is a diffeomorphism
and T p ( f ) is an isometry of real Hilbert spaces for every p M, we call f an
isometry.

Remark F.4 Isometries need not preserve distances in ambient Euclidean spaces.
For example, {(x, y) R2 \ {(0, 0)} : x2 + y2 = 1} R2 and {0} (0, 2 ) R2
are isometric.

Definition F.5 Let M be an n-manifold, with A M. We say that A is negligible if


for every chart : T U of M the subset 1 (A) Rn is of Lebesgue measure
zero.

By the change of variable formula of Lebesgue integration, a subset A M is


negligible if and only if for every p M there exists a chart : T U such that
p U and 1 (A) Rn is of Lebesgue measure zero.
We exploit the change of variables formula to define a volume measure on the
Borel subsets of M. We begin with the following.

Definition F.6 Let : T U be a chart of an n-manifold M. Let E be the ambient


space of M.
(i) The correction factor is the smooth positive function on T defined by the
following formula, valid for all t T :
7
n
(t) = det ((i )(t), ( j )(t))E .
i, j=1
440 A PPENDICES

(ii) The chart measure T, on the Borel sets of T is the measure absolutely con-
tinuous with respect to Lebesgue measure restricted to T , T , defined by
dT,
= .
dT

Lemma F.7 Let A be a Borel subset of an n-manifold M, and let : T U be a


chart such that A U. Then T, (1 (A)) is independent of the chart .

Since a measure on a Polish space is defined by its (compatible) restrictions to


open subsets of the space, one may employ charts and Lemma F.7 and define in a
unique way a measure on a manifold M, which we call the volume measure on M.

Proposition F.8 (Volume measure) Let M be a manifold.


(i) There exists a unique measure M on the Borel subsets of M such that for
all Borel subsets A M and charts : T U of M we have M (A U) =
T, (1 (A)). The measure M is finite on compacts.
(ii) A Borel set A M is negligible if and only if M (A) = 0.
(iii) For every nonempty open subset U M and Borel set A M we have U (A
U) = M (A U).
(iv) For every isometry f : M1 M2 of manifolds we have M1 f 1 = M2 .
(v) For all manifolds M1 and M2 we have M1 M2 = M1 M2 .

Clearly, Rn is Lebesgue measure on the Borel subsets of Rn .


We write [M] = M (M) for every manifold M. We have frequently to con-
sider such normalizing constants in the sequel. We always have [M] (0, ].
(It is possible to have [M] = , for example [R] = ; but it is impossible to
have [M] = 0 because we do not allow the empty set to be a manifold.) If M is
compact, then [M] < .

Critical vocabulary

Definition F.9 Critical and regular points Let f : M N be a smooth map of


manifolds. A p M is a critical point for f if the derivative T p ( f ) fails to be onto;
otherwise p is a regular point for f . We say that q N is a critical value of f if
there exists a critical point p M for f such that f (p) = q. Given q N, the fiber
f 1 (q) is by definition the set {p M | f (p) = q}. Finally, q N is a regular
value for f if q is not a critical value and the fiber f 1 (q) is nonempty.
F. M ANIFOLDS 441

Our usage of the term regular value therefore does not conform to the traditions
of differential topology. In the latter context, a regular value is simply a point
which is not a critical value.
The following facts, which we use repeatedly, are straightforwardly deduced
from the definitions.

Proposition F.10 Let f : M N be a smooth map of manifolds. Let Mreg (resp.,


Mcrit ) be the set of regular (resp., critical) points for f . Let Ncrit (resp., Nreg ) be
the set of critical (resp., regular) values of f .
(i) The set Mreg (resp., Mcrit ) is open (resp., closed) in M.
(ii) The sets Ncrit and Nreg , being -compact, are Borel subsets of N.

Regular values are easier to handle than critical ones. Sards Theorem allows
one to restrict attention, when integrating, to such values.

Theorem F.11 (Sard) [Mil97, Chapter 3] The set of critical values of a smooth
map of manifolds is negligible.

Lie groups and Haar measure

Definition F.12 A Lie group G is a manifold with ambient space Matn (F) for some
n and F such that G is a closed subgroup of GLn (F).

This ad hoc definition is of course not as general as possible but it is simple and
suits our purposes well. For example, GLn (F) is a Lie group. By Lemma 4.1.15,
Un (F) is a Lie group.
Let G be a locally compact topological group, e.g., a Lie group. Let be a
measure on the Borel sets of G. We say that is left-invariant if A = {ga | a
A} for all Borel A G and g G. Right-invariance is defined analogously.

Theorem F.13 Let G be a locally compact topological group.


(i) There exists a left-invariant measure on G (neither 0 nor infinite on com-
pacts), called Haar measure, which is unique up to a positive constant multiple.
(ii) If G is compact, then every Haar measure is right-invariant, and has finite
total mass. In particular, there exists a unique Haar probability measure.

We note that Lebesgue measure in Rn is a Haar measure. Further, for any Lie
group G contained in Un (F), the volume measure G is by Proposition F.8(vi) and
Lemma 4.1.13(iii) a Haar measure.
442 A PPENDICES

F.2 Proof of the coarea formula

In this subsection, we prove the coarea formula, Theorem 4.1.8. We begin by in-
troducing the notion of f -adapted pairs of charts, prove a few preliminary lemmas,
and then provide the proof of the theorem. Lemmas F.18 and F.19 can be skipped
in the course of the proof of the coarea formula, but are included since they are
useful in Section 4.1.3.
Let f : M N be a smooth map from an n-manifold to a k-manifold and assume
that n k. Let : Rn Rk be the projection to the first k coordinates. Recall that
a chart on M is a an open nonempty subset S Rn together with a diffeomorphism
from S to an open subset of M.

Definition F.14 A pair ( : S U, : T V ) consisting of a chart of M and a


chart of N is f -adapted if

S 1 (T ) Rn , U f 1 (V ), f = |S ,

in which case we also say that the open set U M is good for f .

The commuting diagram



Rn S
U M
|S f |U f

Rk T
V N

summarizes the relationships among the maps in question here.

Lemma F.15 Let f : M N be a smooth map from an n-manifold to a k-manifold.


Let p M be a regular point. (Since a regular point exists, necessarily n k.)
Then there exists an open neighborhood of p good for f .

Proof Without loss we may assume that M Rn and N Rk are open sets. We
may also assume that p = 0 Rn and q = f (p) = 0 Rk . Write f = ( f1 , . . . , fk ).
Let t1 , . . . ,tn be the standard coordinates in Rn . By hypothesis, for some permuta-
tion of {1, . . . , n}, putting gi = fi for i = 1, . . . , k and gi = t (i) for i = k +1, . . . , n,
the determinant detni, j=1 j gi does not vanish at the origin. By the inverse func-
tion theorem there exist open neighborhoods U, S Rn of the origin such that
() = ( f1 |U , . . . , fk |U ,t (k+1) |U , . . . ,t (n) |U ) maps U diffeomorphically to S. Take
to be the inverse of (). Take to be the identity map of N to itself. Then
(, ) is an f -adapted pair of charts and the origin belongs to the image of .

F. M ANIFOLDS 443

Proposition F.16 Let f : M N be a smooth map from an n-manifold to a k-


manifold. Let Mreg M be the set of regular points of f . Fix q N such that
f 1 (q) Mreg is nonempty. Then:
(i) Mreg f 1 (q) is a manifold of dimension n k;
(ii) for every p Mreg f 1 (q) we have T p (Mreg f 1 (q)) = ker(T p ( f )).

Proof We may assume that Mreg = 0/ and hence n k, for otherwise there is noth-
ing to prove. By Lemma F.15 we may assume that M Rn and N Rk are open
sets and that f is projection to the first k coordinates, in which case all assertions
here are obvious.

We pause to introduce some apparatus from linear algebra.

Definition F.17 Let f : E F be a linear map between Euclidean spaces and let
f : F E be the adjoint of f . The generalized determinant J( f ) is defined as
the square root of the determinant of f f : F F.

We emphasize that J( f ) is always nonnegative. If a linear map f : Rn Rk is


represented by a k-by-n matrix A with real entries, and the Euclidean structures
of source and target f are the usual ones, then J( f )2 = det(AAT ). In general, we
have J( f ) = 0 if and only if f is onto. Note also that, if f is an isometry, then
J( f ) = 1.

Lemma F.18 For i = 1, 2 let fi : Ei Fi be a linear map between Euclidean


spaces. Let f1 f2 : E1 E2 F1 F2 be the orthogonal direct sum of f1 and f2 .
Then we have J( f f ) = J( f )J( f ).

Proof This follows directly from the definitions.

Lemma F.19 Let f : E F be a linear map between Euclidean spaces. Let


D ker( f ) be a subspace such that D and F have the same dimension. Let
x1 , . . . , xn D be an orthonormal basis. Let : E D be the orthogonal
projection. Then:
(i) J( f )2 = detni, j=1 ( f xi , f x j )F ;
(ii) J( f )2 is the determinant of the R-linear operator f f : D D .

Proof Since ( f xi , f x j )F = (xi , f f x j )F , statements (i) and (ii) are equivalent.


We have only to prove statement (i). Extend x1 , . . . , xn to an orthonormal basis
of x1 , . . . , xn+k of E. Let y1 , . . . , yn be an orthonormal basis of F. Let A be the
n-by-n matrix with entries (yi , f x j )F , in which case AT A is the n-by-n matrix with
entries ( f xi , f x j )E . Now make the identifications E = Rn+k and F = Rn such a
444 A PPENDICES

way that x1 , . . . , xn+k (resp., y1 , . . . , yn ) becomes the standard basis in Rn+k (resp.,
Rn ). Then f is represented by the matrix [A 0], where 0 Matnk . Finally, by
definition, J( f )2 = det[A 0][A 0]T = det AT A, which proves the result.

Lemma F.20 Let f : E F be an onto linear map from an n-dimensional Eu-


clidean space to a k-dimensional Euclidean space. Let {xi }ni=1 and {yi }ki=1 be
bases (not necessarily orthonormal) for E and F, respectively, such that f (xi ) = yi
for i = 1, . . . , k and f (xi ) = 0 for i = k + 1, . . . , n. Then we have
n n k
J( f )2 det (xi , x j )E = det (xi , x j )E det (yi , y j )F .
i, j=1 i, j=k+1 i, j=1

Proof Let A (resp., B) be the n-by-n (resp., k-by-k) real symmetric positive definite
matrix with entries Ai j = (xi , x j )E (resp., Bi j = (yi , y j )F ). Let C be the (n k)-by-
(n k) block of A in the lower right corner. We have to prove that J( f )2 det A =
detC det B. Make R-linear (but in general not isometric) identifications E = Rn
and F = Rk in such a way that {xi }ni=1 (respectively, {yi }ki=1 ) is the standard basis
in Rn (respectively, Rk ), and (hence) f is projection to the first k coordinates.
Let P be the k-by-n matrix with 1s along the main diagonal and 0s elsewhere.
Then we have f x = Px for all x E. Let Q be the unique n-by-k matrix such
that f y = Qy for all y F = Rk . Now the inner product on E is given in terms
of A by the formula (x, y)E = xT Ay and similarly (x, y)F = xT By. By definition
of Q we have (Px)T By = xT A(Qy) for all x Rn and y Rk , hence PT B = AQ,
and hence Q = A1 PT B. By definition of J( f ) we have J( f )2 = det(PA1 PT B) =
det(PA1 PT ) det B. Now decompose A into blocks thus:
" #
a b
A= , a = PAPT , d = C.
c d

From the matrix inversion lemma, Lemma A.1, it follows that det(PA1 PT )
= det A/ detC. The result follows.

We need one more technical lemma. We continue in the setting of Theorem


4.1.8. For the statement of the lemma we also fix an f -adapted pair ( : S
U, : T V ) of charts. (Existence of such implies that n k.) Let : Rn Rk
be projection to the first k coordinates. Let : Rn Rnk be projection to the last
n k coordinates. Given t T such that the set

St = {x Rnk |(t, x) U}

is nonempty, the map

t = (x  (t, x)) : St U f 1 ((t))


F. M ANIFOLDS 445

is chart of Mreg f 1 ((t)), and hence the correction factor t , see Definition
F.6, is defined.

Lemma F.21 With notation as above, for all s S we have


J(T(s) ( f )) (s) = (s) ( (s)) ( (s)).

Proof Use Lemma F.20 to calculate J(T(s) ( f )), taking {(i )(s)}ni=1 as the basis
for the domain of T(s) ( f ) and {(i )( (s))}ki=1 as the basis for the range.

Proof of Theorem 4.1.8 We may assume that Mreg = 0/ and hence n k, for other-
wise there is nothing to prove. Lemma F.21 expresses the function p  J(T p ( f ))
locally in a fashion which makes continuity on Mreg clear. Moreover, Mcrit = {p
M | J(T p ( f )) = 0}. Thus the function in question is indeed Borel-measurable. (In
fact it is continuous, but to prove that fact requires uglier formulas.) Thus part (i)
of the theorem is proved. We turn to the proof of parts (ii) and (iii) of the theorem.
Since on the set Mcrit no contribution is made to any of the integrals under con-
sideration, we may assume that M = Mreg . We may assume that is the indicator
of a Borel subset A M. By Lemma F.15 the manifold M is covered by open
sets good for f . Accordingly M can be expressed as a countable disjoint union of

Borel sets each of which is contained in an open set good for f , say M = M . By
monotone convergence we may replace A by A M for some index , and thus
we may assume that for some f -adapted pair ( : S U, : T V ) of charts we
have A U. We adopt again the notation introduced in Lemma F.21. We have
 
A J(T p ( f ))d M (p) = 1 (A) J(T(s) ( f ))dS, (s)
  
= 1
(A) dSt ,t (x) dT, (t)
  t
= ( A f 1 (q) d f 1 (q) (p))d N (q).

At the first and last steps we appeal to Proposition F.8(i) which characterizes the
measures () . At the crucial second step we apply Lemma F.21 and Fubinis
Theorem. The last calculation proves both the measurability assertion (ii) and the
integral formula (iii).

F.3 Metrics, connections, curvature, Hessians, and the LaplaceBeltrami


operator

We briefly review some notions of Riemannian geometry. Although in this book


we work exclusively with manifolds embedded in Euclidean space, all formulas in
this subsection can be understood in the general setting of Riemannian geometry.
Let M be a manifold of dimension m, equipped with a Riemannian metric g, and
446 A PPENDICES

let be the measure naturally associated with g. By definition, g is the specifica-


tion for every p M of a scalar product g p on T p (M). In the setup of manifolds
embedded in some Euclidean space that we have adopted, T p (M) is a subspace
of the ambient Euclidean space, the Riemannian metric g p is given by the restric-
tion of the Euclidean inner product to that subspace, and the volume measure
coincides with the measure M given in Proposition F.8.
Let C (M) denote the space of real-valued smooth functions on M.

Definition F.22 (i) A vector field (on M) is a smooth map X from M to its ambient
space such that, for all p M, X(p) T p (M). Given a vector field X and a smooth
function f C (M), we define the function X f C (M) by the requirement that
X f (p) = dtd f ( (t))|t=0 for any curve through p with (0) = X(p).
(ii) If X,Y are vector fields, we define g(X,Y ) C (M) by

g(X,Y )(p) = g p (X(p),Y (p)) .

The Lie bracket [X,Y ] is the unique vector field satisfying, for all f C (M),

[X,Y ] f = X(Y f ) Y (X f ) .

(iii) A collection of vector fields L1 , . . . , Lm defined on an open set U M is a


local frame if L1 (p), . . . , Lm (p) are a basis of T p (M) for all p U. The local
frame {Li } is orthonormal if g(Li , L j ) = i j .

Definition F.23 (i) For f C (M), the gradient grad f is the unique vector field
satisfying g(X, grad f ) = X f for all vector fields X. If {Li } is any local orthonor-
mal frame, then grad f = i (Li f )Li .
(ii) A connection is a bilinear operation associating with vector fields X and Y
a vector field X Y such that, for any f C (M),

f X Y = f X Y , X ( fY ) = f X Y + X( f )Y .

The connection is torsion-free if X Y Y X = [X,Y ].


(iii) The LeviCivita connection is the unique torsion-free connection satisfying
that, for all vector fields X,Y, Z,

Xg(Y, Z) = g(X Y, Z) + g(Y, X Z) .

(iv) Given a vector field X, the divergence divX C (M) is the unique function
satisfying, for any orthonormal local frame {Li },

divX = g(Li , [Li , X]) .


i
F. M ANIFOLDS 447

Alternatively, for any compactly supported f C (M),


 
g(grad f , X)d = f divXd .

(v) The LaplaceBeltrami operator on C (M) is defined by f = div grad f .


With respect to any orthonormal local frame {Li } we have
f = Li2 f + g(Li , [Li , L j ])L j f .
i, j

From part (iv) of Definition F.23, we have the classical integration by parts for-
mula: for all functions , C (M) at least one of which is compactly sup-
ported,
 
g(grad , grad )d = ( )d . (F.1)

In our setup of manifolds embedded in a Euclidean space, the gradient grad f


introduced in Definition F.23 can be evaluated at a point p M by extending
f , in a neighborhood of p, to a smooth function f in the ambient space, taking
the standard gradient of f in the ambient space at p, and finally projecting it
orthogonally to Tp (M). We also note (but do not use) that a connection gives
rise to the notion of parallel transport of a vector field along a curve, and in this
language the LeviCivita connection is characterized by being torsion-free and
preserving the metric g under parallel transport.
We use in the sequel the symbol to denote exclusively the LeviCivita con-
nection. It follows from part (iv) of Definition F.23 that, for a vector field X and
an orthonormal local frame {Li }, divX = i g(Li X, Li ). Further, for all vector
fields X, Y and Z,
2g(X Y, Z) = Xg(Y, Z) +Y g(Z, X) Zg(X,Y ) (F.2)
+g([X,Y ], Z) + g([Z, X],Y ) + g(X, [Z,Y ]) .

Definition F.24 Given f C (M), we define the Hessian Hess f to be the opera-
tion associating with two vector fields X and Y the function
Hess( f )(X,Y ) = (XY X Y ) f = g(X grad f ,Y ) = Hess( f )(Y, X) .

(The second and third equalities can be verified from the definition of the Levi
Civita connection.)
We have Hess( f )(hX,Y ) = Hess( f )(X, hY ) = hHess( f )(X,Y ) for all h C (M)
and hence (Hess( f )(X,Y ))(p) depends only X(p) and Y (p).
448 A PPENDICES

With respect to any orthonormal local frame {Li }, we have the relations

Hess( f )(Li , L j ) = (Li L j Li L j ) f ,


f = (Li2 Li Li ) f = Hess( f )(Li , Li ) . (F.3)
i i

In this respect, the LaplaceBeltrami operator is a contraction of the Hessian.


The divergence, the Hessian and the LaplaceBeltrami operator coincide with the
usual notions of gradient, Hessian and Laplacian when M = Rm and the tangent
spaces (all of which can be identified with Rm in that case) are equipped with the
standard Euclidean metric.
We are ready to introduce the Riemannian curvature tensor and its contraction,
the Ricci curvature tensor.

Definition F.25 (i) The Riemann curvature tensor R(, ) associates with vector
fields X,Y an operator R(X,Y ) on vector fields defined by the formula

R(X,Y )Z = X (Y Z) Y (X Z) [X,Y ] Z .

(ii) The Ricci curvature tensor associates with vector fields X and Y the function
Ric(X,Y ) C (M), which, with respect to any orthonormal local frame {Li },
satisfies Ric(X,Y ) = i g(R(X, Li )Li ,Y ).

We have R( f X,Y )Z = R(X, fY )Z = R(X,Y )( f Z) = f R(X,Y )Z for all f C (M)


and hence (R(X,Y )Z)(p) T p (M) depends only on X(p), Y (p) and Z(p). The
analogous remark holds for Ric(X,Y ) since it is a contraction of R(X,Y )Z.
Many computations are simplified by the introduction of a special type of or-
thonormal frame.

Definition F.26 Let p M. An orthonormal local frame {Li } in a neighborhood


of p is said to be geodesic at p if (Li L j )(p) = 0.

A geodesic local frame {Li } in a neighborhood U of p M can always be built


from a given orthonormal local frame {Ki } by setting Li = j Ai j K j with A :
U Matn a smooth map satisfying A(p) = Im , AT A = Im , and (Ki A jk )(p) =
g(Ki K j , Kk )(p). With respect to geodesic frames {Li }, we have the simple
expressions

Hess( f )(Li , L j )(p) = (Li L j f )(p), Ric(Li , L j )(p) = ( LiCkk


j
LkCikj )(p) ,
k
(F.4)
where Cikj = g(Li L j , Lk ).
F. M ANIFOLDS 449

Curvature of classical compact Lie groups

Let G be a closed subgroup and submanifold of Un (F), where the latter is as


defined in Appendix E. In this situation both left- and right-translation in G are
isometries. We specialize now to the case M = G. We are going to compute the
Ricci curvature of G and then apply the result to concrete examples. In particular,
we will provide the differential geometric interpretation of Proposition E.15.
The crucial observation is that, in this situation, all computations can be done
at the identity, as we now explain. For each X TIn (G), choose any curve
through In such that (0) = X and let X, be the vector field whose associated first
order differential operator is given by (X, f )(x) = dtd f (x (t))|t=0 for all f C (G)
and x G. The vector field X, does not depend on the choice of . Recall that
[X,Y ] = XY Y X and X Y = tr XY for X,Y Matn (F). For all X,Y TIn (G)
one verifies by straightforward calculation that
] = [X,
[X,Y ] TIn (G), [X,Y , Y, ], g(X,
, Y, ) = X Y.

It follows in particular from dimension considerations that every orthonormal ba-


, } on G.
sis {L } for TIn (G) gives rise to a global orthonormal frame {L

Lemma F.27 For all X,Y, Z,W TIn (G) we have


1 , ) = 1 [[X,Y ], Z] W ,
X,Y, = [X,Y , Y, )Z,
], g(R(X, ,W
2 4
and hence

Ric(X, , = 1 [[X, L ], L ] X,
, X) (F.5)
4

where the sum runs over any orthonormal basis {L } of TIn (G).

Proof By formula (F.2) we have g(X,Y, , Z)


, = 1 [X,Y ] Z, whence the result after
2
a straightforward calculation.

We now consider the special cases G = {U UN (F) | detU = 1} for F = R, C.


If F = R, then G is the special orthogonal group SO(N) whereas, if F = C, then
G is the special unitary group SU(N). Using now the notation of Proposition
E.15, one can show that TIN (G) = suN (F). Thus, from (E.2) and (F.5) one gets
for G = SO(N) or G = SU(N) that
 
(N + 2)
Ric(X, X) = 1 g(X, X) , (F.6)
4
for every vector field X on G, where = 1 for SO(N) and = 2 for SU(N). We
, X)
note in passing that if G = UN (C) then Ric(X, , = 0 for X = iIN TI (UN (C)),
N
450 A PPENDICES

and thus no uniform strictly positive lower bound on the Ricci tensor exists for
G = UN (C). We also note that (F.6) remains valid for G = UN (H) and = 4.

G Appendix on operator algebras

G.1 Basic definitions

An algebra is a vector space A over a field F equipped with a multiplication


which is associative, distributive and F-bilinear, that is, for x, y, z A and F:

x(yz) = (xy)z,
(x + y)z = xz + yz, x(y + z) = xy + xz,
(xy) = ( x)y = x( y).

We will say that A is unital if there exists a unit element e A such that xe =
ex = x (e is necessarily unique because if e is also a unit then ee = e = e e = e) .
A group algebra F(G) of a group (G, ) over a field F is the set {gG ag g :
ag F} of linear combinations of finitely many elements of G with coefficients
in F (above, ag = 0 except for finitely many g). F(G) is the algebra over F with
addition and multiplication
  
ag g + bg g = (ag + bg )g, ag g bg g = ag bh g h,
gG gG gG gG gG g,hG

respectively, and with product by a scalar b gG ag g = gG (bag )g. The unit of


F(G) is identified with the unit of G.
A complex algebra is an algebra over the complex field C. A seminorm on a
complex algebra A is a map from A into R+ such that for all x, y A and C,
 x = | |x, x + y x + y, xy x y,
and, if A is unital with unit e, also e = 1. A norm on a complex algebra A is a
seminorm satisfying that x = 0 implies x = 0 in A . A normed complex algebra
is a complex algebra A equipped with a norm ..

Definition G.1 A complex normed algebra (A , ||.||) is a Banach algebra if the


norm || || induces a complete distance.

Definition G.2 Let A be a Banach algebra.


An involution on A is a map from A to itself that satisfies (a + b) =
a +b , (ab) = b a , ( a) = a (for C), (a ) = a and a  = a.
G. O PERATOR ALGEBRAS 451

A is a C -algebra if it possesses an involution a  a that satisfies ||a a|| =


||a||2 .
B is a (unital) C -subalgebra of a (unital) C -algebra if it is a subalgebra
and, in addition, is closed with respect to the norm and the involution (and
contains the unit).

Here denotes the complex conjugate of . Note that the assumption ||a|| = ||a ||
ensures the continuity of the involution.
The following collects some of the fundamental properties of Banach algebras
(see [Rud91, pp. 234235]).

Theorem G.3 Let A be a unital Banach algebra and let G(A ) denote the invert-
ible elements of A . Then G(A ) is open, and it is a group under multiplication.
Furthermore, for every a A , the spectrum of a, defined as
sp(a) = { C : e a G(A )} ,
is nonempty, compact and, defining the spectral radius
(a) = sup{| | : sp(a)} ,
we have that
(a) = lim ||an ||1/n = inf ||an ||1/n .
n n1

(The last equality is valid due to sub-additivity.)

An element a of A is said to be self-adjoint (resp., normal, unitary) if a = a


(resp., a a = aa , a a = e = aa ). Note that, if A is unital, its unit e is self-
adjoint. Indeed, for all x A , we have e x = (x e) = x, similarly xe = x, and
hence e = e by uniqueness of the unit.
A Hilbert space H is a vector space equipped with aninner product ,  that is
complete for the topology induced by the norm   := , .
Let H1 , H2 be two Hilbert spaces with inner products , H1 and , H2 respec-
tively. The direct sum H1 H2 is a Hilbert space equipped with the inner product
(x1 , y1 ), (x2 , y2 )H1 H2 = x1 , x2 H1 + y1 , y2 H2 . (G.1)
The tensor product H1 H2 is a Hilbert space with inner product
x1 y1 , x2 y2 H1 H2 = x1 , x2 H1 y1 , y2 H2 . (G.2)

Let B(H) denote the space of bounded linear operators on the Hilbert space
H. We define the adjoint T of any T B(H) as the unique element of B(H)
452 A PPENDICES

satisfying
T x, y = x, T y x, y H. (G.3)

The space B(H), equipped with the involution and the norm

T B(H) = sup{|T x, y|, x = y = 1},

has a structure of C -algebra, see Definition G.2, and a fortiori that of Banach
algebra. Therefore, Theorem G.3 applies, and we denote by sp(T ) the spectrum
of the operator T B(H).
We have (see [Rud91, Theorem 12.26]) the following.

Theorem G.4 Let H be a Hilbert space. A normal T B(H) is


(i) self-adjoint iff sp(T ) lies in the real axis,
(ii) unitary iff sp(T ) lies on the unit circle.

The GNS construction (Theorem 5.2.24) discussed in the main text can be used
to prove the following fundamental fact (see [Rud91, Theorem 12.41]).

Theorem G.5 For every C -algebra A there exists a Hilbert space HA and a
norm-preserving -homomorphism A : A B(HA ).

G.2 Spectral properties

We next state the spectral theorem. Let M be a -algebra in a set . A resolution


of the identity (on M ) is a mapping

: M B(H)

with the following properties.

(i) (0)
/ = 0, () = I.
(ii) Each ( ) is a self-adjoint projection.
(iii) ( ) = ( ) ( ).
(iv) If = 0,
/ ( ) = ( ) + ( ).
(v) For every x H and y H, the set function x,y ( ) =  ( )x, y is a
complex measure on M .

When M is the -algebra of all Borel sets on a locally compact Hausdorff space,
it is customary to add the requirement that each x,y is a regular Borel measure
G. O PERATOR ALGEBRAS 453

(this is automatically satisfied on compact metric spaces). Then we have the fol-
lowing theorem. (For bounded operators, see [Rud91, Theorem 12.23], and for
unbounded operators, see [Ber66] or references therein.)

Theorem G.6 If T is a normal linear operator on a Hilbert space H with domain


dense in H, there exists a unique resolution of the identity on the Borel subsets
of sp(T ) which satisfies

T= d ( ).
sp(T )
We call the spectral resolution of T .

Note that sp(T ) is a bounded set if T B(H), ensuring that x,y is a compactly
supported measure for all x, y H. For any bounded measurable function f on
sp(T ), we can use the spectral theorem to define f (T ) by

f (T ) = f ( )d ( ).
sp(T )
We then have (see [Rud91, Section 12.24]) the following.

Theorem G.7
(i) f f (T ) is a homomorphism of the algebra of all bounded Borel func-
tions on sp(T ) into B(H) which carries the function 1 to I, the identity into
T and which satisfies f(T ) = f (T ) .
(ii)  f (T ) sup{| f ( )| : sp(T )}, with equality for continuous f .
(iii) If fn converges to f uniformly on sp(T ),  fn (T ) f (T ) goes to zero as n
goes to infinity.

The theory can be extended to unbounded operators as follows. An operator


T on H is a linear map from H into H with domain of definition D(T ). Two
operators T, S are equal if D(T ) = D(S) and T x = Sx for x D(T ). T is said
to be closed if, for every sequence {xn }nN D(T ) converging to some x H
such that T xn converges as n goes to infinity to y, one has x D(A) and y = T x.
Equivalently, the graph (h, T h)hD(A) in the direct sum H H is closed. T is
closable if the closure of its graph in H H is the graph of a (closed) operator.
The spectrum sp(T ) of T is the complement of the set of all complex numbers
such that ( I T )1 exists as an everywhere defined bounded operator. We
next define the adjoint of a densely defined operator T ; if the domain D(T ) of the
operator T is dense in H, then the domain D(T ) consists, by definition, of all
y H such that T x, y is continuous for x D(T ). Then, by density of D(T ),
there exists a unique y H such that T x, y = x, y  and we then set T y := y .
454 A PPENDICES

A densely defined operator T is self-adjoint iff D(T ) = D(T ) and T = T . We


can now state the generalization of Theorem G.6 to unbounded operators.

Theorem G.8 [DuS58, p. 1192] Let T be a densely defined self-adjoint operator.


Then its spectrum is real and there is a uniquely determined regular countably
additive self-adjoint spectral measure T defined on the Borel sets of the real line,
vanishing on the complement of the spectrum, and related to T by the equations

(a) D(T ) = {x H| 2 dT ( )x, x < },
sp(T )
 n
(b) T x = lim d T ( )x .
n n

Another good property of closed and densely defined operators (not necessarily
self-adjoint) is the existence of a polar decomposition.

Theorem G.9 [DuS58, p. 1249] Let T be a closed, densely defined operator. Then
T can be written uniquely as a product T = PA, where P is a partial isometry, that
is, P P is a projection, A is a nonnegative self-adjoint operator, the closures of the
ranges of A and T coincide, and both are contained in the domain of P.

Let A be a sub-algebra of B(H). A self-adjoint operator T on H is affiliated


with A iff it is a densely defined self-adjoint operator such that for any bounded
Borel function f on the spectrum of A, f (A) A . This is equivalent, by the spec-
tral theorem, to requiring that all the spectral projections {T ([n, m]), n m} be-
long to A (see [Ped79, p. 164]).

G.3 States and positivity

Lemma G.10 [Ped79, p. 6] An element x of a C -algebra A is nonnegative, x 0,


iff one of the following equivalent conditions is true:
(i) x is normal and with nonnegative spectrum;
(ii) x = y2 for some self-adjoint operator y in A;
(iii) x is self-adjoint and ||t1 x|| t for any t ||x||;
(iv) x is self-adjoint and ||t1 x|| t for some t ||x||.

Lemma G.11 [Ped79, Section 3.1] Let be a linear functional on a C -algebra


(A , , ||.||). Then the two following conditions are equivalent:
(i) (x x) 0 for all x A ;
(ii) (x) 0 for all x 0 in A .
G. O PERATOR ALGEBRAS 455

When one of these conditions is satisfied, we say that is nonnegative. Then is


self-adjoint, that is, (x ) = (x) and if A has a unit I, | (x)| (I)||x||.

Some authors use the term positive functional where we use nonnegative func-
tional.

Lemma G.12 [Ped79, Theorem 3.1.3] If is a nonnegative functional on a C -


algebra A , then for all x, y A ,
| (y x)|2 (x x) (y y) .

G.4 von Neumann algebras

By Theorem G.5, any C -algebra can be represented as a C -subalgebra of B(H),


for H a Hilbert space. So, let us fix a Hilbert space H. B(H) can be endowed with
different topologies. In particular, the strong (resp., weak) topology on B(H) is
the locally convex vector space topology associated with the family of seminorms
{xx  : H} (resp., the family of linear functionals {xx ,  : ,
H}).

Theorem G.13 (von Neumanns double commutant theorem) For a subset


S B(H) that is closed under the involution , define the commutant of S as
S := {b B(H) : ba = ab, a S } .
Then a C -subalgebra A of B(H) is a W -algebra if and only if A = A .

We have also the following.

Theorem G.14 [Ped79, Theorem 2.2.2] Let A B(H) be a subalgebra that is


closed under the involution and contains the identity operator. Then the follow-
ing are equivalent:
(i) A = A ;
(ii) A is strongly closed;
(iii) A is weakly closed.

In particular, A is the weak closure of A . The advantage of a von Neumann


algebra is that it allows one to construct functions of operators which are not
continuous.
A useful property of self-adjoint operators is their behavior under closures.
More precisely, we have the following. (See [Mur90, Theorem 4.3.3] for a proof.)
456 A PPENDICES

Theorem G.15 (Kaplansky density theorem) Let H be a Hilbert space and let
A B(H) be a C -algebra with strong closure B. Let Asa and Bsa denote the
self-adjoint elements of A and B. Then:
(i) Asa is strongly dense in Bsa ;
(ii) the closed unit ball of Asa is strongly dense in the closed unit ball of Bsa ;
(iii) the closed unit ball of A is strongly dense in the closed unit ball of B.

Von Neumann algebras are classified into three types: I, II and III [Li92, Chap-
ter 6]. The class of finite von Neumann algebras will be of special interest to
us. Since its definition is related to properties of projections, we first describe the
latter (see [Li92, Definition 6.1.1] and [Li92, Proposition 1.3.5]).

Definition G.16 Let A be a von Neumann algebra.


(i) A projection is an element p A such that p = p = p2 .
(ii) We say that p q if q p is a nonnegative element of A . We say that
p q if there exists a v A so that p = vv and q = v v.
(iii) A projection p A is said to be finite if any projection q of A such that
q p and q p must be equal to p.

We remark that the relation in point (ii) of Definition G.16 is an equivalence


relation.
Recall that, for projections p, q B(H), the minimum of p and q, denoted p q,
is the projection from H onto pH qH, while the maximum p q is the projection
from H onto pH + qH. The minimum p q can be checked to be the largest
operator dominated by both p and q, with respect to the order . The maximum
p q has the analogous least upper bound property.
The following elementary proposition clarifies the analogy between the role the
operations of taking minimum and maximum of projections play in noncommuta-
tive probability, and the role intersection and unions play in classical probability.
This, and other related facts concerning projections, can be found in [Nel74, Sec-
tion 1], see in particular (3) there. (For similar statements, see [Li92].) Recall the
notions of tracial, faithful and normal states, see Definitions 5.2.9 and 5.2.26.

Proposition G.17 Let (A , ) be a W -probability space, with tracial. Let p, q


A be projections. Then p q, p q A and (p) + (q) = (p q) + (p q).

As a consequence of Proposition G.17, we have the following.

Property G.18 Let (A , ) be a W - probability space, subset of B(H) for some


Hilbert space H. Assume that is a a normal faithful tracial state.
G. O PERATOR ALGEBRAS 457

(i) Let > 0 and p, q be two projections in A so that (p) 1 and (q)
1 . Then, with r = p q, (r) 1 2 .
(ii) If pi is an increasing sequence of projections converging weakly to the
identity, then (pi ) goes to one.
(iii) Conversely, if pi is an increasing sequence of projections such that (pi )
goes to one, then pi converges weakly to the identity in A .

Proof of Property G.18 The first point is an immediate consequence of Proposi-


tion G.17. The second point is a direct consequence of normality of while the
third is a consequence of the faithfulness of .

Definition G.19 A von Neumann algebra A is finite if its identity is finite.

Von Neumann algebras equipped with nice tracial states are finite von Neumann
algebras, as stated below.

Proposition G.20 [Li92, Proposition 6.3.15] Let A be a von Neumann algebra. If


there is a faithful normal tracial state on A , A is a finite von Neumann algebra.

We also have the following equivalent characterization of normal states on a von
Neumann algebra, see [Ped79, Theorem 3.6.4].

Proposition G.21 Let φ be a state on a von Neumann algebra A ⊂ B(H). Let
{e_i}_{i≥0} be an orthonormal basis for H and put, for x ∈ B(H), Tr(x) = Σ_i ⟨x e_i, e_i⟩.
Then the following are equivalent:
(i) φ is normal;
(ii) there exists an operator x of trace class on H such that φ(y) = Tr(xy);
(iii) φ is weakly continuous on the unit ball of A.

G.5 Noncommutative functional calculus

We take φ to be a linear form on a unital complex algebra A equipped with an
involution ∗, such that, for all a ∈ A,

φ(aa∗) ≥ 0 .  (G.4)

Then, for all a, b ∈ A, φ(a∗b) is the complex conjugate of φ(b∗a), and the noncommutative
version of the Cauchy–Schwarz inequality holds, namely

|φ(a∗b)| ≤ φ(a∗a)^{1/2} φ(b∗b)^{1/2} .  (G.5)
(See, e.g., [Ped79, Theorem 3.1.3].) Moreover, by an application of Minkowski's
inequality,

φ((a + b)(a + b)∗)^{1/2} ≤ φ(aa∗)^{1/2} + φ(bb∗)^{1/2} .  (G.6)
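For a concrete instance of (G.4)–(G.6), one may take A = M_n(C) with the vector state φ(a) = ⟨aξ, ξ⟩, which is positive but in general not tracial. The following numpy sketch (our illustration, not part of the text) verifies (G.5) and (G.6) on a random sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
xi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
xi /= np.linalg.norm(xi)

phi = lambda a: np.vdot(xi, a @ xi)      # vector state phi(a) = <a xi, xi>
star = lambda a: a.conj().T              # the involution *

a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Cauchy-Schwarz (G.5)
print(abs(phi(star(a) @ b))
      <= np.sqrt(phi(star(a) @ a).real * phi(star(b) @ b).real))

# Minkowski (G.6), with c = a + b
c = a + b
print(np.sqrt(phi(c @ star(c)).real)
      <= np.sqrt(phi(a @ star(a)).real) + np.sqrt(phi(b @ star(b)).real))
```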

Lemma G.22 If φ is as above and, in addition, for some norm ‖·‖ on A we have
|φ(a)| ≤ ‖a‖ for all a ∈ A, then

|φ(b∗a∗ab)| ≤ ‖a∗a‖ φ(b∗b) .  (G.7)

Proof By the Cauchy–Schwarz inequality (G.5), the claim is trivial if φ(b∗b) = 0.
Thus, fix b ∈ A with φ(b∗b) > 0. Define

φ_b(a) = φ(b∗ab) / φ(b∗b) .

Note that φ_b is still a linear form on A satisfying (G.4). Thus, for all a₁, a₂ ∈ A,
by the Cauchy–Schwarz inequality (G.5) applied to φ_b(a₁∗a₂),

|φ(b∗a₁∗a₂b)|² ≤ φ(b∗a₁∗a₁b) φ(b∗a₂∗a₂b) .

Taking a₁ = (a∗a)^{2^n} and a₂ the unit in A yields

φ(b∗(a∗a)^{2^n} b)² ≤ φ(b∗(a∗a)^{2^{n+1}} b) φ(b∗b) .

Chaining these inequalities gives

φ(b∗(a∗a)b) ≤ φ(b∗(a∗a)^{2^n} b)^{2^{−n}} φ(b∗b)^{1−2^{−n}} ≤ ‖b∗(a∗a)^{2^n} b‖^{2^{−n}} φ(b∗b)^{1−2^{−n}} .

Using the sub-multiplicativity of the norm ‖·‖ and taking the limit as n → ∞ yields
(G.7). □

We next assume that (A, ∗, ‖·‖) is a von Neumann algebra and τ a tracial state
on (A, ∗). The following noncommutative versions of the Hölder inequalities can be
found in [Nel74].
For a ∈ A, we denote |a| = (aa∗)^{1/2}. We have, for a, b ∈ A with b a self-adjoint
bounded operator,

|τ(ab)| ≤ ‖b‖ τ(|a|) .  (G.8)

We have the noncommutative Hölder inequality saying that, for all p, q ≥ 1 such
that 1/p + 1/q = 1, we have

|τ(ab)| ≤ τ(|a|^q)^{1/q} τ(|b|^p)^{1/p} .  (G.9)

More generally, see [FaK86, Theorem 4.9(i)], for all r > 0 and 1/p + 1/q = 1/r,

τ(|ab|^r)^{1/r} ≤ τ(|a|^q)^{1/q} τ(|b|^p)^{1/p} .  (G.10)

This generalizes and extends the matricial case of (A.13).
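When A = M_n(C) and τ is the normalized trace, τ(|a|^p) is the normalized power sum of the singular values of a, so (G.9) can be tested directly; the sketch below is our illustration, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def tau_abs_pow(x, p):
    """tau(|x|^p) for the normalized trace: |x| has the singular values of x."""
    s = np.linalg.svd(x, compute_uv=False)
    return np.sum(s ** p) / n

p, q = 3.0, 1.5                       # 1/p + 1/q = 1
lhs = abs(np.trace(a @ b)) / n        # |tau(ab)|
rhs = tau_abs_pow(a, q) ** (1 / q) * tau_abs_pow(b, p) ** (1 / p)
print(lhs <= rhs)                     # (G.9) holds for this sample
```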
H Stochastic calculus notions

A good background on stochastic analysis, at a level suitable to our needs, is
provided in [KaS91] and [ReY99].

Definition H.1 Let (Ω, F) be a measurable space.
(i) A filtration F_t, t ≥ 0, is a nondecreasing family of sub-σ-fields of F.
(ii) A random time T is a stopping time of the filtration F_t, t ≥ 0, if the event
{T ≤ t} belongs to the σ-field F_t for all t ≥ 0.
(iii) A process X_t, t ≥ 0, is adapted to the filtration F_t if, for all t ≥ 0, X_t is an
F_t-measurable random variable. In this case, we say {X_t, F_t, t ≥ 0} is an
adapted process.
(iv) Let {X_t, F_t, t ≥ 0} be an adapted process, so that E[|X_t|] < ∞ for all t ≥ 0.
The process X_t, t ≥ 0, is said to be an F_t-martingale (respectively, sub-martingale)
if, for every 0 ≤ s < t < ∞,

E[X_t | F_s] = X_s  (resp., E[X_t | F_s] ≥ X_s) .

(v) Let X_t, t ≥ 0, be an F_t-martingale, so that E[X_t²] < ∞ for all t ≥ 0. The
martingale bracket ⟨X⟩_t, t ≥ 0, of X_t is the unique adapted increasing process
so that X_t² − ⟨X⟩_t is a martingale for the filtration F_t.
(vi) If X_t, t ≥ 0, and Y_t, t ≥ 0, are F_t-martingales, their cross-bracket is defined
as ⟨X, Y⟩_t = [⟨X + Y⟩_t − ⟨X − Y⟩_t]/4.

In the case when the martingale X_t possesses continuous paths, ⟨X⟩_t equals its
quadratic variation. The usefulness of the notion of bracket of a continuous martingale
is apparent in the following.

Theorem H.2 (Lévy) Let {X_t, F_t, t ≥ 0} with X₀ = 0 be a continuous, adapted,
n-dimensional process such that each component is a continuous F_t-martingale
and the martingale cross-brackets satisfy ⟨X^i, X^j⟩_t = δ_{i,j} t. Then the components
X_t^i are independent Brownian motions.

Let X_t, t ≥ 0, be a real-valued F_t-adapted process, and let B be a Brownian motion.
Assume that E[∫₀^T X_t² dt] < ∞. Then

∫₀^T X_t dB_t := lim_{n→∞} Σ_{k=0}^{n−1} X_{kT/n} (B_{(k+1)T/n} − B_{kT/n})

exists, the convergence holds in L² and the limit does not depend on the choice of
the discretization of [0, T] (see [KaS91, Chapter 3]).
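The L² convergence above can be watched numerically: the following sketch (ours, not from the text) computes the left-point Riemann sums for X = B and compares them with the closed form ∫₀^T B dB = (B_T² − T)/2 supplied by Itô's formula (Theorem H.9 below).

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 200_000
dt = T / n
dB = rng.standard_normal(n) * np.sqrt(dt)
B = np.concatenate([[0.0], np.cumsum(dB)])

# Left-point Riemann sum approximating the Ito integral of X = B against B.
ito = np.sum(B[:-1] * dB)

# Closed form from Ito's formula: int_0^T B dB = (B_T^2 - T) / 2.
print(ito, (B[-1] ** 2 - T) / 2)   # the two agree up to O(n^{-1/2})
```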
One can therefore consider the problem of finding solutions to the integral equation

X_t = X₀ + ∫₀^t σ(X_s) dB_s + ∫₀^t b(X_s) ds  (H.1)

with X₀ given, σ and b some functions on R^n, and B an n-dimensional Brownian
motion. This can be written in the differential form

dX_s = σ(X_s) dB_s + b(X_s) ds .  (H.2)
There are at least two notions of solutions: strong solutions and weak solutions.

Definition H.3 [KaS91, Definition 5.2.1] A strong solution of the stochastic differential
equation (H.2) on the given probability space (Ω, F), with respect to the fixed
Brownian motion B and initial condition ξ, is a process {X_t, t ≥ 0} with continuous
sample paths so that the following hold.
(i) X_t is adapted to the filtration F_t given by F_t = σ(G_t ∪ N), with
G_t = σ(B_s, s ≤ t; X₀),  N = {N : there exists G ∈ G_∞ with N ⊂ G, P(G) = 0} .
(ii) P(X₀ = ξ) = 1.
(iii) P(for all t, ∫₀^t (|b_i(X_s)| + |σ_{ij}(X_s)|²) ds < ∞) = 1 for all i, j ≤ n.
(iv) (H.1) holds almost surely.

Definition H.4 [KaS91, Definition 5.3.1] A weak solution of the stochastic differential
equation (H.2) is a pair (X, B) and a triple (Ω, F, P) so that (Ω, F, P)
is a probability space equipped with a filtration F_t, B is an n-dimensional Brownian
motion, and X is a continuous adapted process, satisfying (iii) and (iv) of
Definition H.3.

There are also two notions of uniqueness.

Definition H.5 [KaS91, Definition 5.3.4]
(i) We say that strong uniqueness holds if two solutions with common probability
space, common Brownian motion B and common initial condition
are almost surely equal at all times.
(ii) We say that weak uniqueness, or uniqueness in the sense of probability
law, holds if any two weak solutions have the same law.

Theorem H.6 Suppose that b and σ satisfy

‖b(t, x) − b(t, y)‖ + ‖σ(t, x) − σ(t, y)‖ ≤ K ‖x − y‖ ,
‖b(t, x)‖² + ‖σ(t, x)‖² ≤ K² (1 + ‖x‖²) ,
for some finite constant K independent of t. Then there exists a unique solution to
(H.2), and it is strong. Moreover, it satisfies

E[∫₀^T ‖b(t, X_t)‖² dt] < ∞

for all T ≥ 0.
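Under the Lipschitz and growth hypotheses of Theorem H.6, the strong solution of (H.2) can be approximated pathwise by the Euler–Maruyama scheme. The following minimal one-dimensional sketch is our illustration; the Ornstein–Uhlenbeck coefficients b(x) = −x, σ(x) = 1 are chosen only because they satisfy the hypotheses.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """Simulate dX = sigma(X) dB + b(X) dt on [0, T] with n Euler steps."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dB = rng.standard_normal() * np.sqrt(dt)
        x[k + 1] = x[k] + b(x[k]) * dt + sigma(x[k]) * dB
    return x

# Ornstein-Uhlenbeck coefficients are globally Lipschitz and of linear growth,
# so Theorem H.6 applies and the scheme approximates the unique strong solution.
rng = np.random.default_rng(4)
path = euler_maruyama(lambda x: -x, lambda x: 1.0, x0=1.0, T=5.0, n=5000, rng=rng)
print(path[0], path[-1])
```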

Theorem H.7 Any weak solutions (X^i, B^i, Ω^i, F^i, P^i), i = 1, 2, of (H.2) with σ = I_n,
so that

E[∫₀^T ‖b(t, X_t^i)‖² dt] < ∞

for all T < ∞ and i = 1, 2, have the same law.

Theorem H.8 (Burkholder–Davis–Gundy inequality) There exist universal constants
λ_m, Λ_m so that, for all m ∈ N, and any continuous local martingale (M_t, t ≥ 0)
with bracket (A_t, t ≥ 0),

λ_m E(A_T^m) ≤ E( sup_{t≤T} M_t^{2m} ) ≤ Λ_m E(A_T^m) .

Theorem H.9 (Itô, Kunita–Watanabe) Let f : R → R be a function of class C²
and let X = {X_t, F_t; 0 ≤ t < ∞} be a continuous semi-martingale with decomposition

X_t = X₀ + M_t + A_t ,

where M is a local martingale and A the difference of continuous, adapted, nondecreasing
processes. Then, almost surely,

f(X_t) = f(X₀) + ∫₀^t f′(X_s) dM_s + ∫₀^t f′(X_s) dA_s + (1/2) ∫₀^t f″(X_s) d⟨M⟩_s ,  0 ≤ t < ∞ .
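For X = B a standard Brownian motion (so M = B, A = 0 and ⟨M⟩_t = t) and f(x) = x³, Theorem H.9 reads B_T³ = 3∫₀^T B_s² dB_s + 3∫₀^T B_s ds; the following sketch (ours, not from the text) checks this identity on a simulated path.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 1.0, 200_000
dt = T / n
dB = rng.standard_normal(n) * np.sqrt(dt)
B = np.concatenate([[0.0], np.cumsum(dB)])

# f(x) = x^3, X = B: f'(x) = 3x^2, f''(x) = 6x, dM = dB, dA = 0, d<M> = dt.
stochastic = np.sum(3 * B[:-1] ** 2 * dB)        # int 3 B^2 dB
correction = np.sum(3 * B[:-1] * dt)             # (1/2) int 6 B dt
print(B[-1] ** 3, stochastic + correction)       # agree up to discretization error
```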

Theorem H.10 (Novikov) Let {X_t, F_t, t ≥ 0} be an adapted process with values
in R^d such that

E[ exp( (1/2) ∫₀^T Σ_{i=1}^d (X_t^i)² dt ) ] < ∞

for all T ∈ R₊. If {W_t, F_t, t ≥ 0} is a d-dimensional Brownian motion, then

M_t = exp{ ∫₀^t X_u · dW_u − (1/2) ∫₀^t Σ_{i=1}^d (X_u^i)² du }

is an F_t-martingale.
Theorem H.11 (Girsanov) Let {X_t, F_t, t ≥ 0} be an adapted process with values
in R^d such that

E[ exp( (1/2) ∫₀^T Σ_{i=1}^d (X_t^i)² dt ) ] < ∞ .

Then, if {W_t, F_t, P, 0 ≤ t ≤ T} is a d-dimensional Brownian motion,

W̃_t^i = W_t^i − ∫₀^t X_s^i ds ,  0 ≤ t ≤ T ,

is a d-dimensional Brownian motion under the probability measure

P̃ = exp{ ∫₀^T X_u · dW_u − (1/2) ∫₀^T Σ_{i=1}^d (X_u^i)² du } P .
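In the simplest case of a constant drift X_t ≡ θ in d = 1, Theorem H.11 says that W_t − θt is a Brownian motion under P̃ = e^{θW_T − θ²T/2} P; in particular E_P[e^{θW_T − θ²T/2}(W_T − θT)²] = T. The following Monte Carlo sketch (our illustration, not from the text) checks this identity.

```python
import numpy as np

rng = np.random.default_rng(6)
T, theta, samples = 1.0, 0.7, 10 ** 6

W_T = rng.standard_normal(samples) * np.sqrt(T)    # W_T under P
M_T = np.exp(theta * W_T - 0.5 * theta ** 2 * T)   # Girsanov density for constant drift

# Under the tilted measure, W_T - theta*T is N(0, T), so the reweighted
# second moment equals T.
print(np.mean(M_T * (W_T - theta * T) ** 2))       # approximately T = 1.0
```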

Theorem H.12 Let {X_t, F_t, 0 ≤ t < ∞} be a submartingale whose every path is
right-continuous. Then, for any λ > 0 and τ > 0,

λ P( sup_{0≤t≤τ} X_t ≥ λ ) ≤ E[X_τ⁺] .

We shall use the following consequence.

Corollary H.13 Let {X_t, F_t, t ≥ 0} be an adapted process with values in R^d, such
that

∫₀^T ‖X_t‖² dt = ∫₀^T Σ_{i=1}^d (X_t^i)² dt

is uniformly bounded by the constant A_T. Let {W_t, F_t, t ≥ 0} be a d-dimensional
Brownian motion. Then, for any L > 0,

P( sup_{0≤t≤T} | ∫₀^t X_u · dW_u | ≥ L ) ≤ 2 e^{−L²/(2A_T)} .

Proof We denote in short Y_t = ∫₀^t X_u · dW_u and write, for λ > 0,

P( sup_{0≤t≤T} |Y_t| ≥ L ) ≤ P( sup_{0≤t≤T} e^{λY_t} ≥ e^{λL} ) + P( sup_{0≤t≤T} e^{−λY_t} ≥ e^{λL} )

≤ P( sup_{0≤t≤T} e^{λY_t − (λ²/2)∫₀^t ‖X_u‖² du} ≥ e^{λL − λ²A_T/2} )
  + P( sup_{0≤t≤T} e^{−λY_t − (λ²/2)∫₀^t ‖X_u‖² du} ≥ e^{λL − λ²A_T/2} ) .
By Theorem H.10, M_t = e^{λY_t − (λ²/2)∫₀^t ‖X_u‖² du} is a nonnegative martingale with
E[M_T] = E[M₀] = 1. Thus, by Chebyshev's inequality and Doob's inequality,

P( sup_{0≤t≤T} M_t ≥ e^{λL − λ²A_T/2} ) ≤ e^{−λL + λ²A_T/2} E[M_T] = e^{−λL + λ²A_T/2} ,

and the same bound holds for the martingale associated with −Y. Optimizing with
respect to λ (the choice λ = L/A_T gives 2 e^{−L²/(2A_T)}) completes the proof. □
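Corollary H.13 can also be illustrated by simulation; the sketch below (ours, not from the text, with a deterministic integrand chosen so that A_T is explicit) compares the empirical tail probability with the bound 2e^{−L²/(2A_T)}.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n, paths = 1.0, 500, 10_000
dt = T / n

# Deterministic integrand X_t = sin(2 pi t) in d = 1, so A_T = int_0^T X_t^2 dt = 1/2.
t = np.arange(n) * dt
X = np.sin(2 * np.pi * t)
A_T = np.sum(X ** 2) * dt

dW = rng.standard_normal((paths, n)) * np.sqrt(dt)
Y = np.cumsum(X * dW, axis=1)          # Y_t = int_0^t X_u dW_u along each path
sup_abs = np.abs(Y).max(axis=1)

L = 1.5
print(np.mean(sup_abs >= L), "<=", 2 * np.exp(-L ** 2 / (2 * A_T)))
```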


The next statement, an easy consequence of the Dubins–Schwarz time change
identities (see [KaS91, Thm. 3.4.6]), was extended in [Reb80] to a much more
general setup than we need to consider.

Theorem H.14 (Rebolledo's theorem) Let n ∈ N, and let M^N be a sequence of
continuous centered martingales with values in R^n with bracket ⟨M^N⟩ converging
pointwise (that is, for all t ≥ 0) in L¹ towards a continuous deterministic function
φ(t). Then, for any T > 0, (M^N(t), t ∈ [0, T]) converges in law as a continuous
process from [0, T] into R^n towards a Gaussian process G with covariance

E[G(s) G^T(t)] = φ(t ∧ s) .
References

[Ada69] F. Adams. Lectures on Lie groups. New York, NY, W. A. Benjamin, 1969.
[Adl05] M. Adler. PDEs for the Dyson, Airy and sine processes. Ann. Inst. Fourier (Greno-
ble), 55:18351846, 2005.
[AdvM01] M. Adler and P. van Moerbeke. Hermitian, symmetric and symplectic random
ensembles: PDEs for the distribution of the spectrum. Annals Math., 153:149189,
2001.
[AdvM05] M. Adler and P. van Moerbeke. PDEs for the joint distributions of the Dyson,
Airy and sine processes. Annals Probab., 33:13261361, 2005.
[Aig79] M. Aigner. Combinatorial Theory. New York, NY, Springer, 1979.
[AlD99] D. Aldous and P. Diaconis. Longest increasing subsequences: from patience sort-
ing to the BaikDeiftJohansson theorem. Bull. Amer. Math. Soc. (N.S.), 36:413432,
1999.
[AlKV02] N. Alon, M. Krivelevich and V. H. Vu. On the concentration of eigenvalues of
random symmetric matrices. Israel J. Math., 131:259267, 2002.
[AnAR99] G. E. Andrews, R. Askey and R. Roy. Special Functions, volume 71 of Ency-
clopedia of Mathematics and its Applications. Cambridge University Press, 1999.
[And90] G. W. Anderson. The evaluation of Selberg sums. C.R. Acad. Sci. I.-Math.,
311:469472, 1990.
[And91] G. W. Anderson. A short proof of Selbergs generalized beta formula. Forum
Math., 3:415417, 1991.
[AnZ05] G. W. Anderson and O. Zeitouni. A CLT for a band matrix model. Probab. Theory
Rel. Fields, 134:283338, 2005.
[AnZ08a] G. W. Anderson and O. Zeitouni. A CLT regularized sample covariance matrices.
Ann. Statistics, 36:25532576, 2008.
[AnZ08b] G. W. Anderson and O. Zeitouni. A LLN for finite-range dependent random
matrices. Comm. Pure Appl. Math., 61:11181154, 2008.
[AnBC+ 00] C. Ane, S. Blachere, D. Chafaï, P. Fougeres, I. Gentil, F. Malrieu, C. Roberto
and G. Scheffer. Sur les inegalites de Sobolev logarithmiques, volume 11 of Panoramas
et Syntheses. Paris, Societe Mathematique de France, 2000.
[Ans02] M. Anshelevich. Ito formula for free stochastic integrals. J. Funct. Anal., 188:292
315, 2002.
[Arh71] L. V. Arharov. Limit theorems for the characteristic roots of a sample covariance
matrix. Dokl. Akad. Nauk SSSR, 199:994997, 1971.
[Arn67] L. Arnold. On the asymptotic distribution of the eigenvalues of random matrices.
J. Math. Anal. Appl., 20:262268, 1967.

[AuBP07] A. Auffinger, G. Ben Arous and S. Peche. Poisson convergence for the largest
eigenvalues of heavy tailed random matrices. arXiv:0710.3132v3 [math.PR], 2007.
[Bai93a] Z. D. Bai. Convergence rate of expected spectral distributions of large random
matrices. I. Wigner matrices. Annals Probab., 21:625648, 1993.
[Bai93b] Z. D. Bai. Convergence rate of expected spectral distributions of large random
matrices. II. Sample covariance matrices. Annals Probab., 21:649672, 1993.
[Bai97] Z. D. Bai. Circular law. Annals Probab., 25:494529, 1997.
[Bai99] Z. D. Bai. Methodologies in spectral analysis of large-dimensional random matri-
ces, a review. Stat. Sinica, 9:611677, 1999.
[BaS98a] Z. D. Bai and J. W. Silverstein. No eigenvalues outside the support of the lim-
iting spectral distribution of large-dimensional sample covariance matrices. Annals
Probab., 26:316345, 1998.
[BaS04] Z. D. Bai and J. W. Silverstein. CLT for linear spectral statistics of large-
dimensional sample covariance matrices. Annals Probab., 32:553605, 2004.
[BaY88] Z. D. Bai and Y. Q. Yin. Necessary and sufficient conditions for almost sure
convergence of the largest eigenvalue of a Wigner matrix. Annals Probab., 16:1729
1741, 1988.
[BaY05] Z. D. Bai and J.-F. Yao. On the convergence of the spectral empirical process of
Wigner matrices. Bernoulli, 6:10591092, 2005.
[BaBP05] J. Baik, G. Ben Arous and S. Peche. Phase transition of the largest eigenvalue for
nonnull complex sample covariance matrices. Annals Probab., 33:16431697, 2005.
[BaBD08] J. Baik, R. Buckingham and J. DiFranco. Asymptotics of TracyWidom distri-
butions and the total integral of a Painleve II function. Comm. Math. Phys., 280:463
497, 2008.
[BaDJ99] J. Baik, P. Deift and K. Johansson. On the distribution of the length of the longest
increasing subsequence of random permutations. J. Amer. Math. Soc., 12:11191178,
1999.
[BaDS09] J. Baik, P. Deift and T. Suidan. Some Combinatorial Problems and Random
Matrix Theory. To appear, 2009.
[Bak94] D. Bakry. L'hypercontractivite et son utilisation en theorie des semigroupes, vol-
ume 1581 of Lecture Notes in Mathematics, pages 1–114. Berlin, Springer, 1994.
[BaE85] D. Bakry and M. Emery. Diffusions hypercontractives. In Seminaire de proba-
bilites, XIX, 1983/84, volume 1123 of Lecture Notes in Mathematics, pages 177206.
Berlin, Springer, 1985.
[BaNT02] O. E. Barndorff-Nielsen and S. Thorbjørnsen. Levy processes in free probability.
Proc. Natl. Acad. Sci. USA, 99:16576–16580, 2002.
[BaNT04] O. E. Barndorff-Nielsen and S. Thorbjørnsen. A connection between free and
classical infinite divisibility. Infin. Dimens. Anal. Qu., 7:573–590, 2004.
[BeB07] S. T. Belinschi and H. Bercovici. A new approach to subordination results in free
probability. J. Anal. Math., 101:357365, 2007.
[BN08] S. T. Belinschi and A. Nica. η-series and a Boolean Bercovici–Pata bijection for
bounded k-tuples. Adv. Math., 217:1–41, 2008.
[BeDG01] G. Ben Arous, A. Dembo and A. Guionnet. Aging of spherical spin glasses.
Probab. Theory Rel. Fields, 120:167, 2001.
[BeG97] G. Ben Arous and A. Guionnet. Large deviations for Wigners law and
Voiculescus non-commutative entropy. Probab. Theory Rel. Fields, 108:517542,
1997.
[BeG08] G. Ben Arous and A. Guionnet. The spectrum of heavy-tailed random matrices.
Comm. Math. Phys., 278:715751, 2008.
[BeP05] G. Ben Arous and S. Peche. Universality of local eigenvalue statistics for some
sample covariance matrices. Comm. Pure Appl. Math., 58:13161357, 2005.
[BeZ98] G. Ben Arous and O. Zeitouni. Large deviations from the circular law. ESAIM
Probab. Statist., 2:123134, 1998.
[BeG05] F. Benaych-Georges. Classical and free infinitely divisible distributions and ran-
dom matrices. Annals Probab., 33:11341170, 2005.
[BeG09] F. Benaych-Georges. Rectangular random matrices, related convolution. Probab.
Theory Rel. Fields, 144:471515, 2009.
[BeP00] H. Bercovici and V. Pata. A free analogue of Hincins characterization of infinite
divisibility. P. Am. Math. Soc., 128:10111015, 2000.
[BeV92] H. Bercovici and D. Voiculescu. Levy-Hincin type theorems for multiplicative and
additive free convolution. Pacific J. Math., 153:217248, 1992.
[BeV93] H. Bercovici and D. Voiculescu. Free convolution of measures with unbounded
support. Indiana U. Math. J., 42:733773, 1993.
[Ber66] S.J. Bernau. The spectral theorem for unbounded normal operators. Pacific J.
Math., 19:391406, 1966.
[Bia95] P. Biane. Permutation model for semi-circular systems and quantum random walks.
Pacific J. Math., 171:373387, 1995.
[Bia97a] P. Biane. Free Brownian motion, free stochastic calculus and random matrices. In
Free Probability Theory (Waterloo, ON 1995), volume 12 of Fields Inst. Commun.,
pages 119. Providence, RI, American Mathematical Society, 1997.
[Bia97b] P. Biane. On the free convolution with a semi-circular distribution. Indiana U.
Math. J., 46:705718, 1997.
[Bia98a] P. Biane. Processes with free increments. Math. Z., 227:143174, 1998.
[Bia98b] P. Biane. Representations of symmetric groups and free probability. Adv. Math.,
138:126181, 1998.
[Bia01] P. Biane. Approximate factorization and concentration for characters of symmetric
groups. Int. Math. Res. Not., pages 179192, 2001.
[BiBO05] P. Biane, P. Bougerol and N. OConnell. Littelmann paths and Brownian paths.
Duke Math. J., 130:127167, 2005.
[BiCG03] P. Biane, M. Capitaine and A. Guionnet. Large deviation bounds for matrix
Brownian motion. Invent. Math., 152:433459, 2003.
[BiS98b] P. Biane and R. Speicher. Stochastic calculus with respect to free Brownian mo-
tion and analysis on Wigner space. Probab. Theory Rel. Fields, 112:373409, 1998.
[BlI99] P. Bleher and A. Its. Semiclassical asymptotics of orthogonal polynomials,
Riemann-Hilbert problem, and universality in the matrix model. Annals Math.,
150:185266, 1999.
[BoG99] S. G. Bobkov and F. G. Gotze. Exponential integrability and transportation cost
related to log-Sobolev inequalities. J. Funct. Anal., 163:128, 1999.
[BoL00] S. G. Bobkov and M. Ledoux. From BrunnMinkowski to BrascampLieb and to
logarithmic Sobolev inequalities. Geom. Funct. Anal., 10:10281052, 2000.
[BoMP91] L. V. Bogachev, S. A. Molchanov and L. A. Pastur. On the density of states of
random band matrices. Mat. Zametki, 50:3142, 157, 1991.
[BoNR08] P. Bourgade, A. Nikeghbali and A. Rouault. Circular Jacobi ensembles and de-
formed Verblunsky coefficients. arXiv:0804.4512v2 [math.PR], 2008.
[BodMKV96] A. Boutet de Monvel, A. Khorunzhy and V. Vasilchuk. Limiting eigenvalue
distribution of random matrices with correlated entries. Markov Process. Rel. Fields,
2:607636, 1996.
[Bor99] A. Borodin. Biorthogonal ensembles. Nuclear Phys. B, 536:704732, 1999.
[BoOO00] A. Borodin, A. Okounkov and G. Olshanski. Asymptotics of Plancherel mea-
sures for symmetric groups. J. Amer. Math. Soc., 13:481515, 2000.
[BoS03] A. Borodin and A. Soshnikov. Janossy densities. I. Determinantal ensembles. J.
Statist. Phys., 113:595610, 2003.
[Bou87] N. Bourbaki. Elements of Mathematics General Topology. Berlin, Springer,


1987.
[Bou05] N. Bourbaki. Lie Groups and Lie Algebras. Berlin, Springer, 2005.
[BrIPZ78] E. Brezin, C. Itzykson, G. Parisi and J. B. Zuber. Planar diagrams. Comm. Math.
Phys., 59:3551, 1978.
[Bru91] M. F. Bru. Wishart processes. J. Theoret. Probab., 4:725751, 1991.
[BrDJ06] W. Bryc, A. Dembo and T. Jiang. Spectral measure of large random Hankel,
Markov and Toeplitz matrices. Annals Probab., 34:138, 2006.
[BuP93] R. Burton and R. Pemantle. Local characteristics, entropy and limit theorems for
spanning trees and domino tilings via transfer-impedances. Annals Probab., 21:1329
1371, 1993.
[Cab01] T. Cabanal-Duvillard. Fluctuations de la loi empirique de grande matrices
aleatoires. Ann. Inst. H. Poincare Probab. Statist., 37:373402, 2001.
[CaG01] T. Cabanal-Duvillard and A. Guionnet. Large deviations upper bounds for the
laws of matrix-valued processes and non-communicative entropies. Annals Probab.,
29:12051261, 2001.
[CaMV03] M. J. Cantero, L. Moral and L. Velazquez. Five-diagonal matrices and zeros of
orthogonal polynomials on the unit circle. Linear Algebra Appl., 362:2956, 2003.
[CaC04] M. Capitaine and M. Casalis. Asymptotic freeness by generalized moments for
Gaussian and Wishart matrices. Application to beta random matrices. Indiana Univ.
Math. J., 53:397431, 2004.
[CaC06] M. Capitaine and M. Casalis. Cumulants for random matrices as convolutions on
the symmetric group. Probab. Theory Rel. Fields, 136:1936, 2006.
[CaD07] M. Capitaine and C. Donati-Martin. Strong asymptotic freeness for Wigner and
Wishart matrices. Indiana Univ. Math. J., 56:767803, 2007.
[ChG08] G. P. Chistyakov and F. Gotze. Limit theorems in free probability theory I. Annals
Probab., 36:5490, 2008.
[Chv83] V. Chvatal. Linear Programming. New York, NY, W. H. Freeman, 1983.
[Col03] B. Collins. Moments and cumulants of polynomial random variables on unitary
groups, the Itzykson-Zuber integral, and free probability. Int. Math. Res. Not., pages
953982, 2003.
[CoMG06] B. Collins, E. Maurel-Segala and A. Guionnet. Asymptotics of unitary and
orthogonal matrix integrals. arxiv:math/0608193 [math.PR], 2006.
[CoS05] A. Connes and D. Shlyakhtenko. L2 -homology for von Neumann algebras. J.
Reine Angew. Math., 586:125168, 2005.
[CoL95] O. Costin and J. Lebowitz. Gaussian fluctuations in random matrices. Phys. Rev.
Lett., 75:6972, 1995.
[DaVJ88] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes.
Springer Series in Statistics. New York, NY, Springer, 1988.
[DaS01] K. R. Davidson and S. J. Szarek. Local operator theory, random matrices and
Banach spaces. In Handbook of the Geometry of Banach Spaces, Vol. I, pages 317
366. Amsterdam, North-Holland, 2001.
[Dei99] P. A. Deift. Orthogonal Polynomials and Random Matrices: a Riemann-Hilbert
Approach, volume 3 of Courant Lecture Notes in Mathematics. New York, NY, New
York University Courant Institute of Mathematical Sciences, 1999.
[Dei07] P. Deift. Universality for mathematical and physical systems. In International
Congress of Mathematicians 2006. Vol. I, pages 125152. Zurich, Eur. Math. Soc.,
2007.
[DeG09] P. A. Deift and D. Gioev. Invariant Random Matrix Ensembles: General Theory
and Universality. Courant Lecture Notes in Mathematics. New York, NY, New York
University Courant Institute of Mathematical Sciences, 2009. To appear.
[DeIK08] P. Deift, A. Its and I. Krasovsky. Asymptotics of the Airy-kernel determinant.
Comm. Math. Phys., 278:643–678, 2008.
[DeIZ97] P. A. Deift, A. R. Its and X. Zhou. A RiemannHilbert approach to asymptotic
problems arising in the theory of random matrix models, and also in the theory of
integrable statistical mechanics. Annals Math., 146:149235, 1997.
[DeKM+ 98] P. Deift, T. Kriecherbauer, K.T.-R. McLaughlin, S. Venakides and X. Zhou.
Uniform asymptotics for orthogonal polynomials. Doc. Math., III:491501, 1998.
Extra volume ICM 1998.
[DeKM+ 99] P. Deift, T. Kriecherbauer, K. T-R. McLaughlin, S. Venakides and X. Zhou.
Uniform asymptotics for polynomials orthogonal with respect to varying exponential
weights and applications to universality questions in random matrix theory. Comm.
Pure Appl. Math., 52:13351425, 1999.
[DeVZ97] P. Deift, S. Venakides and X. Zhou. New results in small dispersion KdV by an
extension of the steepest descent method for RiemannHilbert problems. Int. Math.
Res. Not., pages 286299, 1997.
[DeZ93] P. A. Deift and X. Zhou. A steepest descent method for oscillatory Riemann
Hilbert problems. Asymptotics for the MKdV equation. Annals Math., 137:295368,
1993.
[DeZ95] P. A. Deift and X. Zhou. Asymptotics for the Painleve II equation. Comm. Pure
Appl. Math., 48:277337, 1995.
[DeZ98] A. Dembo and O. Zeitouni. Large Deviation Techniques and Applications. New
York, NY, Springer, second edition, 1998.
[Dem07] N. Demni. The Laguerre process and generalized HartmanWatson law.
Bernoulli, 13:556580, 2007.
[DeS89] J. D. Deuschel and D. W. Stroock. Large Deviations. Boston, MA, Academic
Press, 1989.
[DiE01] P. Diaconis and S. N. Evans. Linear functionals of eigenvalues of random matrices.
Trans. Amer. Math. Soc., 353:26152633, 2001.
[DiS94] P. Diaconis and M. Shahshahani. On the eigenvalues of random matrices. J. Appl.
Probab., 31A:4962, 1994.
[Dix69] J. Dixmier. Les C∗-algebres et leurs representations. Les Grands Classiques
Gauthier-Villars. Paris, Jacques Gabay, 1969.
[Dix05] A. L. Dixon. Generalizations of Legendre's formula KE′ − (K − E)K′ = π/2. Proc.
London Math. Society, 3:206–224, 1905.
[DoO05] Y. Doumerc and N. OConnell. Exit problems associated with finite reflection
groups. Probab. Theory Rel. Fields, 132:501538, 2005.
[Dud89] R. M. Dudley. Real Analysis and Probability. Pacific Grove, CA, Wadsworth &
Brooks/Cole, 1989.
[Due04] E. Duenez. Random matrix ensembles associated to compact symmetric spaces.
Comm. Math. Phys., 244:2961, 2004.
[DuE02] I. Dumitriu and A. Edelman. Matrix models for beta ensembles. J. Math. Phys.,
43:58305847, 2002.
[DuE06] I. Dumitriu and A. Edelman. Global spectrum fluctuations for the -Hermite and
-Laguerre ensembles via matrix models. J. Math. Phys., 47:063302, 36, 2006.
[DuS58] N. Dunford and J. T. Schwartz. Linear Operators, Part I. New York, NY, Inter-
science Publishers, 1958.
[Dur96] R. Durrett. Probability: Theory and Examples. Belmont, MA, Duxbury Press,
second edition, 1996.
[Dyk93a] K. Dykema. Free products of hyperfinite von Neumann algebras and free dimen-
sion. Duke Math. J., 69:97119, 1993.
[Dyk93b] K. Dykema. On certain free product factors via an extended matrix model. J.
Funct. Anal., 112:31–60, 1993.
[Dys62a] F. J. Dyson. A Brownian-motion model for the eigenvalues of a random matrix.
J. Math. Phys., 3:11911198, 1962.
[Dys62b] F. J. Dyson. Statistical theory of the energy levels of complex systems. I. J. Math.
Phys., 3:140156, 1962.
[Dys62c] F. J. Dyson. Statistical theory of the energy levels of complex systems. II. J. Math.
Phys., 3:157165, 1962.
[Dys62d] F. J. Dyson. Statistical theory of the energy levels of complex systems. III. J.
Math. Phys., 3:166175, 1962.
[Dys62e] F. J. Dyson. The threefold way. Algebraic structure of symmetry groups and
ensembles in quantum mechanics. J. Math. Phys., 3:11991215, 1962.
[Dys70] F. J. Dyson. Correlations between eigenvalues of a random matrix. Comm. Math.
Phys., 19:235250, 1970.
[DyM63] F. J. Dyson and M. L. Mehta. Statistical theory of the energy levels of complex
systems. IV. J. Math. Phys., 4:701712, 1963.
[Ede97] A. Edelman. The probability that a random real Gaussian matrix has k real eigen-
values, related distributions, and the circular law. J. Multivariate Anal., 60:203232,
1997.
[EdS07] A. Edelman and B. D. Sutton. From random matrices to stochastic operators. J.
Stat. Phys., 127:11211165, 2007.
[ERSY09] L. Erdos, J. Ramírez, B. Schlein and H.-T. Yau. Bulk universality for Wigner
matrices. arXiv:0905.4176v1 [math-ph], 2009.
[ERS+ 09] L. Erdos, J. Ramírez, B. Schlein, T. Tao, V. Vu and H.-T. Yau. Bulk univer-
sality for Wigner Hermitian matrices with subexponential decay. arXiv:0906.4400v1
[math.PR], 2009.
[EvG92] L. C. Evans and R. F. Gariepy. Measure Theory and Fine Properties of Functions.
Boca Raton, CRC Press, 1992.
[Eyn03] B. Eynard. Master loop equations, free energy and correlations for the chain of
matrices. J. High Energy Phys., 11:018, 45 pp., 2003.
[EyB99] B. Eynard and G. Bonnet. The Potts-q random matrix model: loop equations,
critical exponents, and rational case. Phys. Lett. B, 463:273279, 1999.
[FaK86] T. Fack and H. Kosaki. Generalized s-numbers of τ-measurable operators. Pacific
J. Math., 123:269–300, 1986.
[FaP03] D. G. Farenick and B. F. Pidkowich. The spectral theorem in quaternions. Linear
Algebra and its Applications, 371:75102, 2003.
[Fed69] H. Federer. Geometric Measure Theory. New York, NY, Springer, 1969.
[Fel57] W. Feller. An Introduction to Probability Theory and its Applications, Part I. New
York, NY, Wiley, second edition, 1957.
[FeP07] D. Feral and S. Peche. The largest eigenvalue of rank one deformation of large
Wigner matrices. Comm. Math. Phys., 272:185228, 2007.
[FoIK92] A. S. Fokas, A. R. Its and A. V. Kitaev. The Isomonodromy approach to matrix
models in 2D quantum gravity. Comm. Math. Phys., 147:395430, 1992.
[FoIKN06] A. S. Fokas, A. R. Its, A. A. Kapaev and V. Yu. Novokshenov. Painleve Tran-
scendents. The RiemannHilbert Approach, volume 128 of Mathematical Surveys and
Monographs. Providence, RI, American Mathematical Society, 2006.
[For93] P. J. Forrester. The spectrum edge of random matrix ensembles. Nuclear Phys. B,
402:709728, 1993.
[For94] P. J. Forrester. Exact results and universal asymptotics in the Laguerre random
matrix ensemble. J. Math. Phys., 35:25392551, 1994.
[For05] P. J. Forrester. Log-gases and Random Matrices. 2005.
http://www.ms.unimelb.edu.au/~matpjf/matpjf.html.
[For06] P. J. Forrester. Hard and soft edge spacing distributions for random matrix ensem-
bles with orthogonal and symplectic symmetry. Nonlinearity, 19:29893002, 2006.
[FoO08] P. J. Forrester and S. Ole Warnaar. The importance of the Selberg integral. Bulletin
AMS, 45:489534, 2008.
[FoR01] P. J. Forrester and E. M. Rains. Interrelationships between orthogonal, unitary
and symplectic matrix ensembles. In Random Matrix Models and their Applications,
volume 40 of Math. Sci. Res. Inst. Publ., pages 171207. Cambridge, Cambridge Uni-
versity Press, 2001.
[FoR06] P. J. Forrester and E. M. Rains. Jacobians and rank 1 perturbations relating to
unitary Hessenberg matrices. Int. Math. Res. Not., page 48306, 2006.
[FrGZJ95] P. Di Francesco, P. Ginsparg and J. Zinn-Justin. 2D gravity and random matrices.
Phys. Rep., 254:133, 1995.
[FuK81] Z. Furedi and J. Komlos. The eigenvalues of random symmetric matrices. Combi-
natorica, 1:233241, 1981.
[Ge97] L. Ge. Applications of free entropy to finite von Neumann algebras. Amer. J. Math.,
119:467485, 1997.
[Ge98] L. Ge. Applications of free entropy to finite von Neumann algebras. II. Annals
Math., 147:143157, 1998.
[Gem80] S. Geman. A limit theorem for the norm of random matrices. Annals Probab.,
8:252261, 1980.
[GeV85] I. Gessel and G. Viennot. Binomial determinants, paths, and hook length formulae.
Adv. Math., 58:300321, 1985.
[GiT98] D. Gilbarg and N. S. Trudinger. Elliptic Partial Equations of Second Order. New
York, NY, Springer, 1998.
[Gin65] J. Ginibre. Statistical ensembles of complex, quaternion, and real matrices. J.
Math. Phys., 6:440449, 1965.
[Gir84] V. L. Girko. The circular law. Theory Probab. Appl., 29:694706, 1984.
[Gir90] V. L. Girko. Theory of Random Determinants. Dordrecht, Kluwer, 1990.
[GoT03] F. Gotze and A. Tikhomirov. Rate of convergence to the semi-circular law. Probab.
Theory Rel. Fields, 127:228276, 2003.
[GoT07] F. Gotze and A. Tikhomirov. The circular law for random matrices.
arXiv:0709.3995v3 [math.PR], 2007.
[GrKP94] R. Graham, D. Knuth and O. Patashnik. Concrete Mathematics: a Foundation
for Computer Science. Reading, MA, Addison-Wesley, second edition, 1994.
[GrS77] U. Grenander and J. W. Silverstein. Spectral analysis of networks with random
topologies. SIAM J. Appl. Math., 32:499519, 1977.
[GrMS86] M. Gromov, V. Milman and G. Schechtman. Asymptotic Theory of Finite Di-
mensional Normed Spaces, volume 1200 of Lectures Notes in Mathematics. Berlin,
Springer, 1986.
[GrPW91] D. Gross, T. Piran and S. Weinberg. Two dimensional quantum gravity and
random surfaces. In Jerusalem Winter School. Singapore, World Scientific, 1991.
[Gui02] A. Guionnet. Large deviation upper bounds and central limit theorems for band
matrices. Ann. Inst. H. Poincare Probab. Statist., 38:341384, 2002.
[Gui04] A. Guionnet. Large deviations and stochastic calculus for large random matrices.
Probab. Surv., 1:72172, 2004.
[GuJS07] A. Guionnet, V. F. R Jones and D. Shlyakhtenko. Random matrices, free proba-
bility, planar algebras and subfactors. arXiv:math/0712.2904 [math.OA], 2007.
[GuM05] A. Guionnet and M. Mada. Character expansion method for a matrix integral.
Probab. Theory Rel. Fields, 132:539578, 2005.
[GuM06] A. Guionnet and E. Maurel-Segala. Combinatorial aspects of matrix models.
Alea, 1:241279, 2006.
[GuM07] A. Guionnet and E. Maurel-Segala. Second order asymptotics for matrix models.
Ann. Probab., 35:21602212, 2007.
[GuS07] A. Guionnet and D. Shlyakhtenko. On classical analogues of free entropy dimen-
sion. J. Funct. Anal., 251:738771, 2007.
[GuS08] A. Guionnet and D. Shlyakhtenko. Free diffusion and matrix models with strictly
convex interaction. GAFA, 18:18751916, 2008.
[GuZ03] A. Guionnet and B. Zegarlinski. Lectures on logarithmic Sobolev inequalities.
In Seminaire de Probabilites XXXVI, volume 1801 of Lecture Notes in Mathematics.
Paris, Springer, 2003.
[GuZ00] A. Guionnet and O. Zeitouni. Concentration of the spectral measure for large
matrices. Electron. Commun. Prob., 5:119136, 2000.
[GuZ02] A. Guionnet and O. Zeitouni. Large deviations asymptotics for spherical integrals.
J. Funct. Anal., 188:461515, 2002.
[GZ04] A. Guionnet and O. Zeitouni. Addendum to: Large deviations asymptotics for
spherical integrals. J. Funct. Anal., 216:230241, 2004.
[Gus90] R. A. Gustafson. A generalization of Selbergs beta integral. B. Am. Math. Soc.,
22:97105, 1990.
[Haa02] U. Haagerup. Random matrices, free probability and the invariant subspace prob-
lem relative to a von Neumann algebra. In Proceedings of the International Congress
of Mathematicians, Vol. I (Beijing, 2002), pages 273290, Beijing, Higher Education
Press, 2002.
[HaS09] U. Haagerup and H. Schultz. Invariant subspaces for operators in a general II1 -
factor. To appear in Publ. Math. Inst. Hautes Etudes Sci., 2009.
[HaST06] U. Haagerup, H. Schultz and S. Thorbjørnsen. A random matrix approach to the
lack of projections in C∗_red(F2). Adv. Math., 204:1–83, 2006.
[HaT99] U. Haagerup and S. Thorbjørnsen. Random matrices and K-theory for exact C∗-
algebras. Doc. Math., 4:341–450, 1999.
[HaT03] U. Haagerup and S. Thorbjørnsen. Random matrices with complex Gaussian en-
tries. Expo. Math., 21:293–337, 2003.
[HaT05] U. Haagerup and S. Thorbjørnsen. A new application of random matrices:
Ext(C∗_red(F2)) is not a group. Annals Math., 162:711–775, 2005.
[HaLN06] W. Hachem, P. Loubaton and J. Najim. The empirical distribution of the eigen-
values of a Gram matrix with a given variance profile. Ann. Inst. H. Poincare Probab.
Statist., 42:649670, 2006.
[Ham72] J. M. Hammersley. A few seedlings of research. In Proceedings of the Sixth Berke-
ley Symposium on Mathematical Statistics and Probability (University of California,
Berkeley, CA, 1970/1971), Vol. I: Theory of Statistics, pages 345394, Berkeley, CA,
University of California Press, 1972.
[HaM05] C. Hammond and S. J. Miller. Distribution of eigenvalues for the ensemble of real
symmetric Toeplitz matrices. J. Theoret. Probab., 18:537566, 2005.
[HaZ86] J. Harer and D. Zagier. The Euler characteristic of the moduli space of curves.
Invent. Math., 85:457485, 1986.
[Har56] Harish-Chandra. Invariant differential operators on a semisimple Lie algebra. Proc.
Nat. Acad. Sci. U.S.A., 42:252253, 1956.
[HaTW93] J. Harnad, C. A. Tracy and H. Widom. Hamiltonian structure of equations ap-
pearing in random matrices. In Low-dimensional Topology and Quantum Field Theory
(Cambridge, 1992), volume 315 of Adv. Sci. Inst. Ser. B Phys., pages 231245, New
York, NY, NATO, Plenum, 1993.
[HaM80] S. P. Hastings and J. B. McLeod. A boundary value problem associated with the
second Painleve transcendent and the Kortewegde Vries equation. Arch. Rational
Mech. Anal., 73:3151, 1980.
[Hel01] S. Helgason. Differential Geometry, Lie Groups, and Symmetric Spaces, volume 34
of Graduate Studies in Mathematics. Providence, RI, American Mathematical Society,
2001. Corrected reprint of the 1978 original.
[HiP00a] F. Hiai and D. Petz. A large deviation theorem for the empirical eigenvalue distri-
bution of random unitary matrices. Ann. Inst. H. Poincare Probab. Statist., 36:7185,
2000.
[HiP00b] F. Hiai and D. Petz. The Semicircle Law, Free Random Variables and Entropy,
volume 77 of Mathematical Surveys and Monographs. Providence, RI, American
Mathematical Society, 2000.
[HoW53] A. J. Hoffman and H. W. Wielandt. The variation of the spectrum of a normal
matrix. Duke Math. J., 20:3739, 1953.
[HoJ85] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge, Cambridge University
Press, 1985.
[HoX08] C. Houdre and H. Xu. Concentration of the spectral measure for large random
matrices with stable entries. Electron. J. Probab., 13:107134, 2008.
[HoKPV06] J. B. Hough, M. Krishnapur, Y. Peres and B. Virag. Determinantal processes
and independence. Probab. Surv., 3:206229, 2006.
[HoKPV09] J. B. Hough, M. Krishnapur, Y. Peres and B. Virag. Zeros of Gaussian Analytic
Functions and Determinantal Point Processes. Providence, RI, American Mathemat-
ical Society, 2009.
[Ism05] M. E. H. Ismail. Classical and Quantum Orthogonal Polynomials in One Vari-
able, volume 98 of Encyclopedia of Mathematics and its Applications. Cambridge,
Cambridge University Press, 2005.
[ItZ80] C. Itzykson and J. B. Zuber. The planar approximation. II. J. Math. Phys., 21:411
421, 1980.
[Jac85] N. Jacobson. Basic Algebra. I. New York, NY, W. H. Freeman and Company,
second edition, 1985.
[JiMMS80] M. Jimbo, T. Miwa, Y. Mori and M. Sato. Density matrix of an impenetrable
Bose gas and the fifth Painleve transcendent. Physica, 1D:80158, 1980.
[Joh98] K. Johansson. On fluctuations of eigenvalues of random Hermitian matrices. Duke
Math. J., 91:151204, 1998.
[Joh00] K. Johansson. Shape fluctuations and random matrices. Comm. Math. Phys.,
209:437476, 2000.
[Joh01a] K. Johansson. Discrete orthogonal polynomial ensembles and the Plancherel mea-
sure. Annals Math., 153:259296, 2001.
[Joh01b] K. Johansson. Universality of the local spacing distribution in certain ensembles
of Hermitian Wigner matrices. Comm. Math. Phys., 215:683705, 2001.
[Joh02] K. Johansson. Non-intersecting paths, random tilings and random matrices. Probab.
Theory Rel. Fields, 123:225280, 2002.
[Joh05] K. Johansson. The arctic circle boundary and the Airy process. Annals Probab.,
33:130, 2005.
[John01] I. M. Johnstone. On the distribution of the largest eigenvalue in principal compo-
nents analysis. Ann. Statist., 29:295327, 2001.
[Jon82] D. Jonsson. Some limit theorems for the eigenvalues of a sample covariance matrix.
J. Multivariate Anal., 12:138, 1982.
[Juh81] F. Juhasz. On the spectrum of a random graph. In Algebraic Methods in Graph
Theory, Coll. Math. Soc. J. Bolyai, volume 25, pages 313316. Amsterdam, North-
Holland, 1981.
[JuX03] M. Junge and Q. Xu. Noncommutative Burkholder/Rosenthal inequalities. Annals
Probab., 31:948995, 2003.
[Kal02] O. Kallenberg. Foundations of Modern Probability. Probability and its Applications.
New York, NY, Springer, second edition, 2002.
[KaS91] I. Karatzas and S. Shreve. Brownian Motion and Stochastic Calculus, volume 113
of Graduate Texts in Mathematics. New York, NY, Springer, second edition, 1991.
[Kar07a] V. Kargin. The norm of products of free random variables. Probab. Theory Rel.
Fields, 139:397413, 2007.
[KaM59] S. Karlin and J. McGregor. Coincidence properties of birth and death processes.
Pacific J. Math., 9:11091140, 1959.
[Kar07b] N. El Karoui. TracyWidom limit for the largest eigenvalue of a large class of
complex sample covariance matrices. Annals Probab., 35:663714, 2007.
[KaS99] N. M. Katz and P. Sarnak. Random Matrices, Frobenius Eigenvalues, and Mon-
odromy, volume 45 of American Mathematical Society Colloquium Publications.
Providence, RI, American Mathematical Society, 1999.
[Kea06] J. P. Keating. Random matrices and number theory. In Applications of Random
Matrices in Physics, volume 221 of NATO Sci. Ser. II Math. Phys. Chem., pages 132.
Dordrecht, Springer, 2006.
[Kho01] A. M. Khorunzhy. Sparse random matrices: spectral edge and statistics of rooted
trees. Adv. Appl. Probab., 33:124140, 2001.
[KhKP96] A. M. Khorunzhy, B. A. Khoruzhenko and L. A. Pastur. Asymptotic properties
of large random matrices with independent entries,. J. Math. Phys., 37:50335060,
1996.
[KiN04] R. Killip and I. Nenciu. Matrix models for circular ensembles. Int. Math. Res.
Not., pages 26652701, 2004.
[KiN07] R. Killip and I. Nenciu. CMV: the unitary analogue of Jacobi matrices. Comm.
Pure Appl. Math., 60:11481188, 2007.
[KiS09] R. Killip and M. Stoiciu. Eigenvalue statistics for cmv matrices: from poisson to
clock via circular beta ensembles. Duke Math. J., 146:361399, 2009.
[KoO01] W. Konig and N. OConnell. Eigenvalues of the Laguerre process as non-colliding
squared Bessel processes. Electron. Comm. Probab., 6:107114, 2001.
[KoOR02] W. Konig, N. OConnell and S. Roch. Non-colliding random walks, tandem
queues, and discrete orthogonal polynomial ensembles. Electron. J. Probab., 7, 24
pp., 2002.
[Kra90] C. Krattenthaler. Generating functions for plane partitions of a given shape.
Manuscripta Math., 69:173201, 1990.
[Led01] M. Ledoux. The Concentration of Measure Phenomenon. Providence, RI, Ameri-
can Mathematical Society, 2001.
[Led03] M. Ledoux. A remark on hypercontractivity and tail inequalities for the largest
eigenvalues of random matrices. In Seminaire de Probabilites XXXVII, volume 1832
of Lecture Notes in Mathematics. Paris, Springer, 2003.
[Leh99] F. Lehner. Computing norms of free operators with matrix coefficients. Amer. J.
Math., 121:453486, 1999.
[Lev22] P. Levy. Lecons d'analyse fonctionnelle. Paris, Gauthier-Villars, 1922.
[Li92] B. R. Li. Introduction to Operator Algebras. River Edge, NJ, World Scientific
Publishing Co., 1992.
[LiTV01] L. Li, A. M. Tulino and S. Verdu. Asymptotic eigenvalue moments for linear
multiuser detection. Commun. Inf. Syst., 1:273304, 2001.
[LiPRTJ05] A. E. Litvak, A. Pajor, M. Rudelson and N. Tomczak-Jaegermann. Smallest
singular value of random matrices and geometry of random polytopes. Adv. Math.,
195:491523, 2005.
[LoS77] B. F. Logan and L. A. Shepp. A variational problem for random Young tableaux.
Adv. Math., 26:206222, 1977.
[Maa92] H. Maassen. Addition of freely independent random variables. J. Funct. Anal.,
106:409–438, 1992.
[Mac75] O. Macchi. The coincidence approach to stochastic point processes. Adv. Appl.
Probability, 7:83122, 1975.
[MaP67] V. A. Marcenko and L. A. Pastur. Distribution of eigenvalues in certain sets of
random matrices. Math. USSR Sb., 1:457483, 1967.
[Mat97] T. Matsuki. Double coset decompositions of reductive Lie groups arising from two
involutions. J. Algebra, 197:4991, 1997.
[Mat94] A. Matytsin. On the large-N limit of the ItzyksonZuber integral. Nuclear Phys.
B, 411:805820, 1994.
[Mau06] E. Maurel-Segala. High order asymptotics of matrix models and enumeration of
maps. arXiv:math/0608192v1 [math.PR], 2006.
[McTW77] B. McCoy, C. A. Tracy and T. T. Wu. Painleve functions of the third kind. J.
Math. Physics, 18:10581092, 1977.
[McK05] H. P. McKean. Stochastic integrals. Providence, RI, AMS Chelsea Publishing,
2005. Reprint of the 1969 edition, with errata.
[Meh60] M. L. Mehta. On the statistical properties of the level-spacings in nuclear spectra.
Nuclear Phys. B, 18:395419, 1960.
[Meh91] M.L. Mehta. Random Matrices. San Diego, Academic Press, second edition,
1991.
[MeD63] M. L. Mehta and F. J. Dyson. Statistical theory of the energy levels of complex
systems. V. J. Math. Phys., 4:713719, 1963.
[MeG60] M. L. Mehta and M. Gaudin. On the density of eigenvalues of a random matrix.
Nuclear Phys. B, 18:420427, 1960.
[Mil63] J. W. Milnor. Morse Theory. Princeton, NJ, Princeton University Press, 1963.
[Mil97] J. W. Milnor. Topology from the Differentiable Viewpoint. Princeton, NJ, Princeton
University Press, 1997. Revised printing of the 1965 edition.
[MiS05] I. Mineyev and D. Shlyakhtenko. Non-microstates free entropy dimension for
groups. Geom. Funct. Anal., 15:476490, 2005.
[MiN04] J. A. Mingo and A. Nica. Annular noncrossing permutations and partitions, and
second-order asymptotics for random matrices. Int. Math. Res. Not., pages 1413
1460, 2004.
[MiS06] J. A. Mingo and R. Speicher. Second order freeness and fluctuations of random
matrices. I. Gaussian and Wishart matrices and cyclic Fock spaces. J. Funct. Anal.,
235:226270, 2006.
[Mos80] J. Moser. Geometry of quadrics and spectral theory. In The Chern Symposium 1979
(Proc. Int. Sympos., Berkeley, CA., 1979), pages 147188, New York, NY, Springer,
1980.
[Mui81] R. J. Muirhead. Aspects of Multivariate Statistical Theory. New York, NY, John
Wiley & Sons, 1981.
[Mur90] G. J. Murphy. C∗-algebras and Operator Theory. Boston, MA, Academic Press,
1990.
[Nel74] E. Nelson. Notes on non-commutative integration. J. Funct. Anal., 15:103116,
1974.
[NiS97] A. Nica and R. Speicher. A Fourier transform for multiplicative functions on
non-crossing partitions. J. Algebraic Combin., 6:141160, 1997.
[NiS06] A. Nica and R. Speicher. Lectures on the Combinatorics of Free Probability, vol-
ume 335 of London Mathematical Society Lecture Note Series. Cambridge, Cam-
bridge University Press, 2006.
[NoRW86] J.R. Norris, L.C.G. Rogers and D. Williams. Brownian motions of ellipsoids.
Trans. Am. Math. Soc., 294:757765, 1986.
[Oco03] N. OConnell. Random matrices, non-colliding processes and queues. In
Seminaire de Probabilites, XXXVI, volume 1801 of Lecture Notes in Math., pages
165–182. Berlin, Springer, 2003.
[OcY01] N. OConnell and M. Yor. Brownian analogues of Burkes theorem. Stochastic
Process. Appl., 96:285304, 2001.
[OcY02] N. OConnell and M. Yor. A representation for non-colliding random walks. Elec-
tron. Comm. Probab., 7, 12 pp., 2002.
[Oko00] A. Okounkov. Random matrices and random permutations. Int. Math. Res. Not.,
pages 10431095, 2000.
[Ona08] A. Onatski. The TracyWidom limit for the largest eigenvalues of singular com-
plex Wishart matrices. Ann. Appl. Probab., 18:470490, 2008.
[Pal07] J. Palmer. Planar Ising Correlations, volume 49 of Progress in Mathematical
Physics. Boston, MA, Birkhauser, 2007.
[Par80] B. N. Parlett. The Symmetric Eigenvalue Problem. Englewood Cliffs, N.J., Prentice-
Hall, 1980.
[Pas73] L. A. Pastur. Spectra of random selfadjoint operators. Uspehi Mat. Nauk, 28:364,
1973.
[Pas06] L. Pastur. Limiting laws of linear eigenvalue statistics for Hermitian matrix models.
J. Math. Phys., 47:103303, 2006.
[PaL08] L. Pastur and A. Lytova. Central limit theorem for linear eigenvalue statistics of
random matrices with independent entries. arXiv:0809.4698v1 [math.PR], 2008.
[PaS08a] L. Pastur and M. Shcherbina. Bulk universality and related properties of Hermi-
tian matrix models. J. Stat. Phys., 130:205250, 2008.
[Pec06] S. Peche. The largest eigenvalue of small rank perturbations of Hermitian random
matrices. Probab. Theory Rel. Fields, 134:127173, 2006.
[Pec09] S. Peche. Universality results for largest eigenvalues of some sample covariance
matrix ensembles. Probab. Theory Rel. Fields, 143:481516, 2009.
[PeS07] S. Peche and A. Soshnikov. Wigner random matrices with non-symmetrically dis-
tributed entries. J. Stat. Phys., 129:857884, 2007.
[PeS08b] S. Peche and A. Soshnikov. On the lower bound of the spectral norm of symmetric
random matrices with independent entries. Electron. Commun. Probab., 13:280290,
2008.
[Ped79] G. Pedersen. C∗-algebras and their Automorphism Groups, volume 14 of London
Mathematical Society Monographs. London, Academic Press, 1979.
[PeV05] Y. Peres and B. Virag. Zeros of the i.i.d. Gaussian power series: a conformally
invariant determinantal process. Acta Math., 194:135, 2005.
[PlR29] M. Plancherel and W. Rotach. Sur les valeurs asymptotiques des polynomes
d'Hermite Hn(x) = (−1)^n e^{x²/2} (d/dx)^n e^{−x²/2}. Comment. Math. Helv., 1:227–254,
1929.
[PoS03] S. Popa and D. Shlyakhtenko. Universal properties of L(F∞) in subfactor theory.
Acta Math., 191:225–257, 2003.
[PrS02] M. Prahofer and H. Spohn. Scale invariance of the PNG droplet and the Airy
process. J. Stat. Phys., 108:10711106, 2002.
[Rad94] F. Radulescu. Random matrices, amalgamated free products and subfactors of the
von Neumann algebra of a free group, of noninteger index. Invent. Math., 115:347
389, 1994.
[Rai00] E. Rains. Correlation functions for symmetrized increasing subsequences.
arXiv:math/0006097v1 [math.CO], 2000.
[RaR08] J. A. Ramírez and B. Rider. Diffusion at the random matrix hard edge.
arXiv:0803.2043v3 [math.PR], 2008.
[RaRV06] J. A. Ramírez, B. Rider and B. Virag. Beta ensembles, stochastic Airy spectrum,
and a diffusion. arXiv:math/0607331v3 [math.PR], 2006.
[Reb80] R. Rebolledo. Central limit theorems for local martingales. Z. Wahrs. verw. Geb.,
51:269286, 1980.
[ReY99] D. Revuz and M. Yor. Continuous Martingales and Brownian motion, volume 293
of Grundlehren der Mathematischen Wissenschaften. Berlin, Springer, third edition,
1999.
[RoS93] L. C. G. Rogers and Z. Shi. Interacting Brownian particles and the Wigner law.
Probab. Theory Rel. Fields, 95:555570, 1993.
[Roy07] G. Royer. An Initiation to Logarithmic Sobolev Inequalities, volume 14 of
SMF/AMS Texts and Monographs. Providence, RI, American Mathematical Society,
2007. Translated from the 1999 French original.
[Rud87] W. Rudin. Real and Complex Analysis. New York, NY, McGraw-Hill Book Co.,
third edition, 1987.
[Rud91] W. Rudin. Functional Analysis. New York, NY, McGraw-Hill Book Co, second
edition, 1991.
[Rud08] M. Rudelson. Invertibility of random matrices: norm of the inverse. Annals Math.,
168:575600, 2008.
[RuV08] M. Rudelson and R. Vershynin. The LittlewoodOfford problem and invertibility
of random matrices. Adv. Math., 218:600633, 2008.
[Rue69] D. Ruelle. Statistical Mechanics: Rigorous Results. Amsterdam, Benjamin, 1969.
[SaMJ80] M. Sato, T. Miwa and M. Jimbo. Holonomic quantum fields I–V. Publ. RIMS
Kyoto Univ., 14:223–267, 15:201–278, 15:577–629, 15:871–972, 16:531–584, 1978–
1980.
[ScS05] J. H. Schenker and H. Schulz-Baldes. Semicircle law and freeness for random
matrices with symmetries or correlations. Math. Res. Lett., 12:531542, 2005.
[Sch05] H. Schultz. Non-commutative polynomials of independent Gaussian random ma-
trices. The real and symplectic cases. Probab. Theory Rel. Fields, 131:261309, 2005.
[Sel44] A. Selberg. Bermerkninger om et multipelt integral. Norsk Mat. Tidsskr., 26:7178,
1944.
[Shl96] D. Shlyakhtenko. Random Gaussian band matrices and freeness with amalgama-
tion. Int. Math. Res. Not., pages 10131025, 1996.
[Shl98] D. Shlyakhtenko. Gaussian random band matrices and operator-valued free proba-
bility theory. In Quantum Probability (Gdansk, 1997), volume 43 of Banach Center
Publ., pages 359368. Warsaw, Polish Acad. Sci., 1998.
[SiB95] J. Silverstein and Z. D. Bai. On the empirical distribution of eigenvalues of large
dimensional random matrices. J. Multivariate Anal., 54:175192, 1995.
[Sim83] L. Simon. Lectures on Geometric Measure Theory, volume 3 of Proceedings of the
Centre for Mathematical Analysis, Australian National University. Canberra, Aus-
tralian National University Centre for Mathematical Analysis, 1983.
[Sim05a] B. Simon. Orthogonal Polynomials on the Unit Circle, I, II. American Math-
ematical Society Colloquium Publications. Providence, RI, American Mathematical
Society, 2005.
[Sim05b] B. Simon. Trace Ideals and their Applications, volume 120 of Mathematical
Surveys and Monographs. Providence, RI, American Mathematical Society, second
edition, 2005.
[Sim07] B. Simon. CMV matrices: five years after. J. Comput. Appl. Math., 208:120154,
2007.
[SiS98a] Ya. Sinai and A. Soshnikov. Central limit theorem for traces of large random
symmetric matrices with independent matrix elements. Bol. Soc. Bras. Mat., 29:124,
1998.
[SiS98b] Ya. Sinai and A. Soshnikov. A refinement of Wigner's semicircle law in a neigh-
borhood of the spectrum edge for random symmetric matrices. Funct. Anal. Appl.,
32:114–131, 1998.
[Sni02] P. Sniady. Random regularization of Brown spectral measure. J. Funct. Anal.,
193:291313, 2002.
[Sni06] P. Sniady. Asymptotics of characters of symmetric groups, genus expansion and
free probability. Discrete Math., 306:624665, 2006.
[Sod07] S. Sodin. Random matrices, nonbacktracking walks, and orthogonal polynomials.
J. Math. Phys., 48:123503, 21, 2007.
[Sos99] A. Soshnikov. Universality at the edge of the spectrum in Wigner random matrices.
Commun. Math. Phys., 207:697733, 1999.
[Sos00] A. Soshnikov. Determinantal random point fields. Uspekhi Mat. Nauk, 55:107160,
2000.
[Sos02a] A. Soshnikov. Gaussian limit for determinantal random point fields. Annals
Probab., 30:171187, 2002.
[Sos02b] A. Soshnikov. A note on universality of the distribution of the largest eigenvalues
in certain sample covariance matrices. J. Statist. Phys., 108:10331056, 2002.
[Sos03] A. Soshnikov. Janossy densities. II. Pfaffian ensembles. J. Statist. Phys., 113:611
622, 2003.
[Sos04] A. Soshnikov. Poisson statistics for the largest eigenvalues of Wigner random ma-
trices with heavy tails. Electron. Comm. Probab., 9:8291, 2004.
[Spe90] R. Speicher. A new example of independence and white noise. Probab. Theory
Rel. Fields, 84:141159, 1990.
[Spe98] R. Speicher. Combinatorial theory of the free product with amalgamation and
operator-valued free probability theory. Mem. Amer. Math. Soc., 132(627), 1998.
[Spe03] R. Speicher. Free calculus. In Quantum Probability Communications, Vol. XII
(Grenoble, 1998), pages 209235, River Edge, NJ, World Scientific Publishing, 2003.
[SpT02] D. A. Spielman and S. H. Teng. Smooth analysis of algorithms. In Proceedings of
the International Congress of Mathematicians (Beijing 2002), volume I, pages 597
606. Beijing, Higher Education Press, 2002.
[Sta97] R. P. Stanley. Enumerative Combinatorics, volume 2. Cambridge University Press,
1997.
[Sze75] G. Szego. Orthogonal Polynomials. Providence, R.I., American Mathematical
Society, fourth edition, 1975. Colloquium Publications, Vol. XXIII.
[Tal96] M. Talagrand. A new look at independence. Annals Probab., 24:134, 1996.
[TaV08a] T. Tao and V. H. Vu. Random matrices: the circular law. Commun. Contemp.
Math., 10:261307, 2008.
[TaV08b] T. Tao and V. H. Vu. Random matrices: universality of esds and the circular law.
arXiv:0807.4898v2 [math.PR], 2008.
[TaV09a] T. Tao and V. H. Vu. Inverse LittlewoodOfford theorems and the condition num-
ber of random discrete matrices. Annals Math., 169:595632, 2009.
[TaV09b] T. Tao and V. H. Vu. Random matrices: Universality of local eigenvalue statistics.
arXiv:0906.0510v4 [math.PR], 2009.
[tH74] G. 't Hooft. Magnetic monopoles in unified gauge theories. Nuclear Phys. B,
79:276–284, 1974.
[Tri85] F. G. Tricomi. Integral Equations. New York, NY, Dover Publications, 1985.
Reprint of the 1957 original.
[TuV04] A. M. Tulino and S. Verdu. Random matrix theory and wireless communications.
In Foundations and Trends in Communications and Information Theory, volume 1,
Hanover, MA, Now Publishers, 2004.
[TrW93] C. A. Tracy and H. Widom. Introduction to Random Matrices, volume 424 of
Lecture Notes in Physics, pages 103130. New York, NY, Springer, 1993.
[TrW94a] C. A. Tracy and H. Widom. Level spacing distributions and the Airy kernel.
Commun. Math. Phys., 159:151–174, 1994.
[TrW94b] C. A. Tracy and H. Widom. Level spacing distributions and the Bessel kernel. Commun. Math. Phys., 161:289–309, 1994.
[TrW96] C. A. Tracy and H. Widom. On orthogonal and symplectic matrix ensembles. Commun. Math. Phys., 177:727–754, 1996.
[TrW00] C. A. Tracy and H. Widom. Universality of the distribution functions of random matrix theory. In Integrable Systems: from Classical to Quantum (Montreal, QC, 1999), volume 26 of CRM Proc. Lecture Notes, pages 251–264. Providence, RI, American Mathematical Society, 2000.
[TrW02] C. A. Tracy and H. Widom. Airy kernel and Painlevé II. In A. Its and J. Harnad, editors, Isomonodromic Deformations and Applications in Physics, volume 31 of CRM Proceedings and Lecture Notes, pages 85–98. Providence, RI, American Mathematical Society, 2002.
[TrW03] C. A. Tracy and H. Widom. A system of differential equations for the Airy process. Electron. Comm. Probab., 8:93–98, 2003.
[TrW05] C. A. Tracy and H. Widom. Matrix kernels for the Gaussian orthogonal and symplectic ensembles. Ann. Inst. Fourier (Grenoble), 55:2197–2207, 2005.
[VaV07] B. Valkó and B. Virág. Continuum limits of random matrices and the Brownian carousel. arXiv:0712.2000v3 [math.PR], 2007.
[VeK77] A. M. Vershik and S. V. Kerov. Asymptotics of the Plancherel measure of the symmetric group and the limiting form of Young tableaux. Soviet Math. Dokl., 18:527–531, 1977.
[VoDN92] D. V. Voiculescu, K. J. Dykema and A. Nica. Free Random Variables, volume 1 of CRM Monograph Series. Providence, RI, American Mathematical Society, 1992.
[Voi86] D. Voiculescu. Addition of certain non-commuting random variables. J. Funct. Anal., 66:323–346, 1986.
[Voi90] D. Voiculescu. Circular and semicircular systems and free product factors. In Operator Algebras, Unitary Representations, Enveloping Algebras, and Invariant Theory (Paris, 1989), volume 92 of Progr. Math., pages 45–60. Boston, MA, Birkhäuser, 1990.
[Voi91] D. Voiculescu. Limit laws for random matrices and free products. Invent. Math., 104:201–220, 1991.
[Voi93] D. Voiculescu. The analogues of entropy and of Fisher's information measure in free probability theory. I. Commun. Math. Phys., 155:71–92, 1993.
[Voi94] D. Voiculescu. The analogues of entropy and of Fisher's information measure in free probability theory. II. Invent. Math., 118:411–440, 1994.
[Voi96] D. Voiculescu. The analogues of entropy and of Fisher's information measure in free probability theory. III. The absence of Cartan subalgebras. Geom. Funct. Anal., 6:172–199, 1996.
[Voi97] D. Voiculescu, editor. Free Probability Theory, volume 12 of Fields Institute Communications. Providence, RI, American Mathematical Society, 1997. Papers from the Workshop on Random Matrices and Operator Algebra Free Products held during the Special Year on Operator Algebra at the Fields Institute for Research in Mathematical Sciences, Waterloo, ON, March 1995.
[Voi98a] D. Voiculescu. The analogues of entropy and of Fisher's information measure in free probability theory. V. Noncommutative Hilbert transforms. Invent. Math., 132:189–227, 1998.
[Voi98b] D. Voiculescu. A strengthened asymptotic freeness result for random matrices with applications to free entropy. Int. Math. Res. Not., pages 41–63, 1998.
[Voi99] D. Voiculescu. The analogues of entropy and of Fisher's information measure in free probability theory. VI. Liberation and mutual free information. Adv. Math.,
146:101–166, 1999.
[Voi00a] D. Voiculescu. The coalgebra of the free difference quotient and free probability. Int. Math. Res. Not., pages 79–106, 2000.
[Voi00b] D. Voiculescu. Lectures on Probability Theory and Statistics: École d'Été de Probabilités de Saint-Flour XXVIII - 1998, volume 1738 of Lecture Notes in Mathematics, pages 283–349. New York, NY, Springer, 2000.
[Voi02] D. Voiculescu. Free entropy. Bull. London Math. Soc., 34:257–278, 2002.
[Vu07] V. H. Vu. Spectral norm of random matrices. Combinatorica, 27:721–736, 2007.
[Wac78] K. W. Wachter. The strong limits of random matrix spectra for sample matrices of independent elements. Annals Probab., 6:1–18, 1978.
[Wey39] H. Weyl. The Classical Groups: their Invariants and Representations. Princeton, NJ, Princeton University Press, 1939.
[Wid94] H. Widom. The asymptotics of a continuous analogue of orthogonal polynomials. J. Approx. Theory, 77:51–64, 1994.
[Wig55] E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Annals Math., 62:548–564, 1955.
[Wig58] E. P. Wigner. On the distribution of the roots of certain symmetric matrices. Annals Math., 67:325–327, 1958.
[Wil78] H. S. Wilf. Mathematics for the Physical Sciences. New York, NY, Dover Publications, 1978.
[Wis28] J. Wishart. The generalized product moment distribution in samples from a normal multivariate population. Biometrika, 20A:32–52, 1928.
[WuMTB76] T. T. Wu, B. M. McCoy, C. A. Tracy and E. Barouch. Spin–spin correlation functions for the two-dimensional Ising model: exact theory in the scaling region. Phys. Rev. B, 13, 1976.
[Xu97] F. Xu. A random matrix model from two-dimensional Yang–Mills theory. Commun. Math. Phys., 190:287–307, 1997.
[Zir96] M. Zirnbauer. Riemannian symmetric superspaces and their origin in random matrix theory. J. Math. Phys., 37:4986–5018, 1996.
[Zvo97] A. Zvonkin. Matrix integrals and map enumeration: an accessible introduction. Math. Comput. Modelling, 26:281–304, 1997.
General conventions and notation

Unless stated otherwise, for S a Polish space, M1(S) is given the topology of weak convergence, which makes it into a Polish space.
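Concretely, μ_n → μ in this topology if and only if ⟨f, μ_n⟩ → ⟨f, μ⟩ for every bounded continuous f : S → R, where ⟨f, μ⟩ denotes the integral of f with respect to μ (see the notation list below).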
When we write a(s) ∼ b(s), we assert that there exists c(s), defined for s ≫ 0, such that lim_{s→∞} c(s) = 1 and c(s)a(s) = b(s) for s ≫ 0. We use the notation a_n ∼ b_n for sequences in the analogous sense. We write a(s) = O(b(s)) if lim sup_{s→∞} |a(s)/b(s)| < ∞. We write a(s) = o(b(s)) if lim sup_{s→∞} |a(s)/b(s)| = 0. a_n = O(b_n) and a_n = o(b_n) are defined analogously.
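For instance, if a(s) = s^2 + s and b(s) = s^2, then c(s) = b(s)/a(s) = s/(s+1) satisfies c(s)a(s) = b(s) and lim_{s→∞} c(s) = 1, so a(s) ∼ b(s); at the same time a(s) = O(s^2), since lim sup_{s→∞} |a(s)/s^2| = 1 < ∞, and a(s) = o(s^3), since lim sup_{s→∞} |a(s)/s^3| = 0.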
The following is a list of frequently used notation. In case the notation is not routine, we
provide a pointer to the definition.

∀  for all
a.s., a.e.  almost sure, almost everywhere
Ai(x)  Airy function
(A, ∗, ‖·‖)  C∗-algebra (see Definition 5.2.11)
Ā, A°, A^c  closure, interior and complement of A
A\B  set difference
B(H)  space of bounded operators on a Hilbert space H
C^k(S), C_b^k(S)  functions on S with continuous (resp., bounded continuous) derivatives up to order k
C^∞(S)  infinitely differentiable functions on S
C_b^∞(S)  bounded functions on S possessing bounded derivatives of all orders
C_c^∞(S)  infinitely differentiable functions on S of compact support
C(S, S′)  continuous functions from S to S′
C_poly^∞(R^m)  infinitely differentiable functions on R^m all of whose derivatives have polynomial growth at infinity
CLT  central limit theorem
→_Prob  convergence in probability
d(·, ·), d(x, A)  metric, and distance from the point x to a set A
det(M)  determinant of M
Δ(x)  Vandermonde determinant, see (2.5.2)
Δ(K)  Fredholm determinant of a kernel K, see Definition 3.4.3
Δ_N  open (N−1)-dimensional simplex
D(L)  domain of L
∅  the empty set
ε(σ)  the signature of a permutation σ
∃, ∃!  there exists, there exists a unique

f(A)  image of A under f
f^{-1}  inverse image under f
f ∘ g  composition of functions
Flag_n(·, F)  Flag manifold, see (4.1.4)
GL_n(F)  invertible elements of Mat_n(F)
H  skew-field of quaternions
H_n(F)  elements X of Mat_n(F) with X∗ = X
i  basis element of C (together with 1)
i, j, k  basis elements of H (together with 1)
i.i.d.  independent, identically distributed (random variables)
1_A(·), 1_a(·)  indicator on A and on {a}
I_n  identity matrix in GL_n(F)
⌊t⌋, ⌈t⌉  largest integer smaller than or equal to t, smallest integer greater than or equal to t
LDP  large deviation principle (see Definition D.1)
Lip(R)  Lipschitz functions on R
LLN  law of large numbers
log(·)  logarithm, natural base
LSI  logarithmic Sobolev inequality (see Subsection 2.3.2 and (4.4.13))
Mat_{p×q}(F)  p-by-q matrices with entries belonging to F (where F = R, C or H)
Mat_p(F)  same as Mat_{p×p}(F)
M1(S)  probability measures on S
μ, ν, …  probability measures
μ ∘ f^{-1}  composition of a (probability) measure and a measurable map
N(0, I)  zero mean, identity covariance standard multivariate normal
∧, ∨  (pointwise) minimum, maximum
PI  Poincaré inequality (see Definition 4.4.2)
P(·), E(·)  probability and expectation, respectively
R, C  real and complex fields
R^d  d-dimensional Euclidean space (where d is a positive integer)
R_μ(z)  R-transform of the measure μ (see Definition 5.3.37)
ρ_M  volume measure on the Riemannian manifold M
sp(T)  spectrum of an operator T
S_a(z)  S-transform of a (see Definition 5.3.29)
S_μ(z)  Stieltjes transform of the measure μ (see Definition 2.4.1)
S^{n−1}  unit sphere in R^n
SO(N), SU(N)  special orthogonal group (resp., special unitary group)
su_n(F)  anti-self-adjoint elements of Mat_n(F), with vanishing trace if F = C
Σ(μ)  noncommutative entropy of the measure μ, see (2.6.4)
tr(M), tr(K)  trace of a matrix M or of a kernel K
v^T  transpose of the vector (matrix) v
v∗  transpose and complex conjugate of the vector (matrix) v
U_n(F)  unitary matrices in GL_n(F)
{x}  set consisting of the point x
Z_+  positive integers
⊂  contained in (not necessarily properly)
⟨·, ·⟩  scalar product in R^d
⟨f, μ⟩  integral of f with respect to μ
⊕  direct sum
⊗  tensor product
⊞  free additive convolution (see Definition 5.3.20)
⊠  free multiplicative convolution (see Definition 5.3.28)
Index

Adapted, 251, 257, 459, 460
Airy
  equation, 92, 140, 142, 145, 167
  function, 91, 133, 138–141, 231
  kernel, see Kernel, Airy
  process, see Process, Airy
  stochastic operator, 307–317
Algebraic function, algebraicity condition, 412
Ambient space, 200, 202, 203, 207, 209–212, 438, 439
Antisymmetric matrices, 214
Arzelà–Ascoli Theorem, 266, 268
Bakry–Émery condition (BE criterion), 39, 287, 289, 290, 294, 321
Banach–Alaoglu Theorem, 310, 338, 421
Bercovici–Pata bijection, 411
Bernoulli random variables, 225, 227
Bernoulli walk, 8, 85
Beta integral, 60, 62
(L²)-Betti numbers, 413
Birkhoff, G., 86
Bobkov–Götze, 87
Bochner–Bakry–Émery, 297
Borel–Cantelli Lemma, 19, 252, 266, 270, 273, 276, 311, 378, 382, 398
Bracelet, (circuit length of), 31
Branch, 9, 46, 135, 365
Brownian motion, 186, 248, 253, 257, 261, 280, 292, 307, 309, 314, 319, 321, 459
  carousel, 321
  free, 412
  Hermitian, 248, 257
  symmetric, 248, 257, 319
  symplectic, 248
Bulk, 90, 91, 114, 163, 183, 184, 215, 319, 321
C∗-
  algebra, 329–339, 394, 400, 413, 451
  probability space, 329, 331–338, 351, 353, 369, 394, 395, 407
  universal C∗-algebra, 334, 336
Carré du champ operator, 289
  itéré, 289
Catalan number, 7–9, 10, 12, 85, 377
Cauchy–Binet Theorem, 57, 98, 99, 225, 226, 415
Central function, 192
Central limit theorem (CLT), 29, 86, 87, 88, 131, 186, 215, 227, 248, 318, 319, 321, 412, 413
  dynamical, 273–277
  multidimensional, 35
  see also Free, central limit theorem
Characteristic polynomial, 55, 257
Christoffel–Darboux, 100, 106, 181
Circular law, 88
Coarea formula, 187, 193–195, 198, 201, 205, 318, 437, 442–445
Configuration, 215, 216, 228–240, 247
Combinatorial problems, 184, 319
Commutant, 343, 455
Complete, completion, 329, 334, 335, 341, 389, 422
Concentration, 38–43, 71, 87, 88, 186, 273, 281–302, 320, 321, 389
Confluent alternant identity, 69, 150
Conjugation-invariant, 201, 202, 205, 208
Connection problem, 183
Contraction principle, 320, 428
Convergence, 419, 420
  almost sure, 28, 71, 73, 263, 268, 323, 324, 375, 378, 379, 393
  L^p, 28, 268, 375, 388
  in distribution (law), 92, 93, 103, 241, 274, 322, 328
  in expectation, 323, 324, 375, 379
  in moments, 328, 337
  in probability, 23, 27, 48
  sequential, 338
  vague, 44, 45, 134
  weak, 44, 134, 388, 420
  weakly, in probability, 7, 23, 44, 71
Convex,
  function, 72, 285–287, 291
  hull, 63, 420
  set, 21, 421, 422
  strict, 72, 75, 298
Correlation functions, 216
  see also Intensity, joint
Coupling, 66
Critical (point, value), 134–136, 141, 193, 440, 441
Cumulant, 354–357, 360–364, 369, 410, 411
  see also Free, cumulant
Cut-off, 250
Cyclo-stationary, 318
Cylinder set, 215
Decimation, 66, 88, 162, 166, 169, 170
Determinantal
  formulas, 152–155
  process, 90, 94, 131, 186, 193, 214–220, 248, 318, 319
  projections, 222–227
  relations, 120
  stationary process, 215, 237–239
  structure, form, 93–95
Diagonal, block-diagonal, 190, 191, 198, 200, 201, 206, 207, 209–214, 254, 263, 276, 277, 282, 300, 301, 304, 305, 319, 388, 389, 402, 411, 432–437
Differential equations (system), 121–123, 126–130, 170–180, 182, 183
Differential extension, 158, 160–162, 172
Differentiation formula, 121, 123, 144
Diffusion process, 247–281, 319, 321
Discriminant, 55, 257, 417
Distribution (law),
  Bernoulli, see Bernoulli random variables
  Cauchy, 374
  χ, 303, 307
  function, 344
  Gaussian, see Gaussian, distribution
  noncommutative, 326, 327, 331, 333, 343, 344, 349, 360, 363, 365, 366, 378, 380, 382, 385, 387, 391, 394
  Schwartz, 126, 310
  Semicircle, see Semicircle distribution
  stable, 321
Double commutant theorem (von Neumann), 340, 343, 455
Doubly stochastic matrix, 21, 86
Dyck path, 7, 8, 9, 15–17, 20, 85, 353, 363, 364
Dyson, 181, 249, 319
  see also Schwinger–Dyson equation
Edelman–Dumitriu, 303
Edge (of graph), 9, 13–19, 24–27, 30–37, 376–378, 387
  bounding table, 34, 35
  connecting, 13, 17
  self, 13, 17
Edge (of support of spectrum), 90, 92–94, 101, 132, 162, 163, 166, 177, 183, 184, 215, 306, 319, 321
  hard, 321
Eigenvalue, 2, 6, 20, 21–23, 36, 37, 45, 48, 51, 55, 58, 65, 71, 78, 90–94, 131, 186, 188, 193, 198, 199, 209–212, 220, 221, 223, 226–228, 230, 231, 240, 249, 261, 263, 269, 286, 298, 320, 321, 327, 374–393, 395, 396, 399, 433
  complex, 88, 89, 213
  joint density, 65, 87
  joint distribution, 50–70, 87, 88, 184, 186, 187, 191, 261, 303, 318
  law of ordered, 52, 248, 249
  law of unordered, 53, 94, 95, 189, 304
  maximal, 23, 28, 66, 81, 86–88, 103, 183, 269, 306, 321
  see also Empirical measure
Eigenvector, 38, 53, 286, 304, 389
Eigenvector–eigenvalue pair, 308–317
Empirical distribution (measure)
  eigenvalues, 6, 7, 20–23, 29, 36, 38, 45, 51, 71, 80–89, 101, 114, 228, 262, 320, 324
  annealed, 328, 379
  matrices, 327, 375, 379, 388, 396, 397, 399, 413
  quenched, 328, 379
Ensemble, 187
  beta, 186, 303, 321
  Biorthogonal, 244
  COE, CSE, 318
  Gaussian, 90, 186, 188, 189, 193, 198, 206 (see also Hermite)
  Jacobi, see Jacobi, ensemble
  Laguerre, see Laguerre, ensemble
  unitary, 186, 318
  see also GOE, GSE, GUE
Entropy, see Noncommutative, entropy
Enumeration of maps, 182
Ergodic, 185, 186, 233, 234, 238, 239, 294, 321
Euclidean space, 187–190, 197, 203, 205, 232, 437, 438–445
Exploration process, 15
Exponential tightness, 77, 80, 278, 279, 427, 428
Extreme point, 21, 86
Federer, 194, 318, see Coarea formula
Feynman's diagrams, 181
Fiber, 196, 203, 440
Field, 187, 430
  vector, 446
Filtration, 249, 251, 254, 280, 459
Fisher information, 413
Flag manifold, 190, 197, 198, 209, 211
Fock, Boltzmann–Fock space, 350, 359, 362, 409
Forest, 27, 31, 34
Fourier transform, 76, 87, 118, 230–232, 237, 360
Fredholm
  adjugant, 110, 111, 113, 125, 157
  determinant, 94, 98, 107, 108, 109–113, 120, 121, 128, 142, 156, 163, 170, 182, 183, 222, 234
  resolvent, 110, 111, 121–123, 157
Free,
  asymptotically, 374–393, 411
  central limit theorem, 368
  convolution, 262, 319, 325, 359–368, 373, 374, 388, 389, 410, 411
  cumulant, 325, 354–356, 359, 360, 365, 410, 411
  increments, 412
  independence, 322, 348–374
  infinitely divisible law, 373
  group, 322
  group factors, 413
  harmonic analysis, 359, 368, 370
  multiplicative convolution, 365–368, 411
  probability, 322–410, 366
  Poisson, 365
  product, 349–353
  semicircular variables, 323, 324
  variables, 325, 348–352, 362, 378, 380, 382, 387, 391, 394, 395, 410–413
  see also Brownian motion, free
Freeness, 87, 324, 350, 387, 388, 392, 410
  second order, 87
  with amalgamation, 412
Functional calculus, 330, 331, 338, 457–458
Fundamental identity, 111–113, 124, 126, 173
Füredi–Komlós (FK), 23–29, 86
Gamma function (Euler's), 53, 139, 194, 303
Gap, 107, 114–119, 131, 148, 150–155, 159, 234, 239
Gaudin–Mehta, 91
Gauss decomposition, 244
Gaussian, 42, 88
  distribution (law), 29, 30, 33, 39, 45, 182, 184, 188, 277, 284, 291, 303, 307, 311, 381, 397, 405
  ensembles, see Ensembles, Gaussian
  process, 248, 274–276
  sub-, 39
  Wigner matrix, see Wigner
Gaussian orthogonal ensemble (GOE), 6, 51–54, 58, 66, 71, 81, 82, 87, 90, 93, 94, 113, 132, 148, 150, 155, 157, 160, 161, 166, 169, 181–193, 198, 199, 229, 230, 248, 302–305, 323, 412
Gaussian symplectic ensemble (GSE), 37, 53, 58, 66, 68, 71, 93, 132, 148, 150, 160, 170, 183–193, 302, 412
Gaussian unitary ensemble (GUE), 36, 51–54, 58, 66, 68, 71, 81, 82, 87, 90–98, 101, 103, 105, 121, 158, 163, 169, 183–193, 198, 199, 215, 228, 229, 230, 248, 302–305, 319, 323, 394, 395, 412
Gelfand–Naimark Theorem, 331
Gelfand–Naimark–Segal construction (GNS), 326, 333, 340, 342, 369, 370, 400, 401, 452
Generalized determinant, 193, 443
Generic, 200, 201, 203, 207, 209–212
Geodesic, 27, 28, 203
  frame, 297, 448
Geršgorin circle theorem, 415
Gessel–Viennot, 245
Graph,
  unicyclic, 30, 31
  see also Sentence, Word
Green's theorem, 398
Gromov, 299
Gronwall's Lemma, 260, 292, 313
Group, 186, 188, 192, 200, 207, 211, 212, 299, 300, 432
  algebra, 325, 450
  discrete, 325, 327, 332
  see also Free group, Lie group, Orthogonal group and Unitary groups
Hamburger moment problem, 329
Harer–Zagier recursions, 104, 181
Heat equation, 320
Helly's selection theorem, 45
Herbst's Lemma, 40, 284
Hermite,
  polynomials, 90, 94, 95, 99, 100, 101, 182, 183, 187, 189, 190, 191
  ensemble, 189, see also Ensemble, Gaussian/Hermite
Hessian, 289–291, 298, 437, 447, 448
Hilbert space, 308, 326, 328, 330–332, 339–345, 350–353, 370, 372, 400, 409, 431, 439, 451–457
Hoffman–Wielandt, 21
Hölder norm, 265
Householder reflector (transformation), 303–305
Hypergeometric function, 104, 106
Implicit function theorem, 58, 137, 268, 371, 372
Inequality,
  Burkholder–Davis–Gundy, 255, 260, 265, 266, 271, 272, 275, 413, 461
  Burkholder–Rosenthal, 413
  Cauchy–Schwarz, 260, 285, 295, 335, 338, 384, 390, 457, 458
  Chebyshev, 11, 17, 19, 29, 40, 49, 265, 271, 280, 284, 378, 398, 463
  Gordon, 87
  Hadamard, 108, 131, 415
  Hölder, 24, 387
  Jensen, 23, 77, 272, 275, 292
  noncommutative Hölder, 416, 458
  Logarithmic Sobolev (LSI), 38, 39–43, 87, 283–285, 287, 290, 298, 302
  Poincaré (PI), 283–285, 397, 405, 412
  Slepian, 87
  Weyl, 415
Infinitesimal generator, 288, 292
Infinite divisibility, 411
Initial condition, 249, 250, 257, 258, 261, 262, 269, 275, 460
Integral operator, 220
  admissible, 220–222, 226, 227, 230–241
  compact, 221
  good, 221, 233–239, 241
Integration formula, 65, 66, 148, 187–214, 318
Intensity, 216–220, 222–228, 234–238, 240, 242, 318
Interlace, 62
Involution, 329–334, 389, 394, 400, 450
Isometry, 195, 196, 197, 201, 203, 205–207, 211, 343, 346, 439, 440, 443, 454
Itô's Lemma (formula), 250, 251, 260, 263, 269, 279, 280, 292, 293, 461
Itzykson–Zuber–Harish-Chandra, 184, 320
Jacobi,
  ensemble, 70, 183, 186, 190, 191, 193, 197, 206, 208, 318, 321, 434
  polynomial, 187, 191
Jacobian, 54, 58, 305, 306
Jánossy density, 218, 219, 319
Jimbo–Miwa–Mori–Sato, 91, 182
Jonsson, 86
Kaplansky density theorem, 341, 456
Karlin–McGregor, 247
Kernel, 107, 121, 220, 224–228
  Airy, 92, 133, 143, 147, 161, 162, 168, 177, 183, 228, 230
  antisymmetric, 158, 160, 162
  twisting of, 156
  good, see Integral operator, good
  Hermitian, 221
  matrix, 148, 155–159, 161, 170–172
  positive definite, 221, 230
  trace-class projection, 223–228
  transition, 241, 245, 247
  resolvent, 110, 121–125, 157, 172–177
  self-dual, 158–162, 172
  sine, 91, 114, 121, 122, 131, 144, 145, 161, 173, 181, 228, 229, 233, 237
  smooth, 158, 160, 172
  symmetric, 158, 161, 177
Klein's Lemma, 286, 320
Kolmogorov–Smirnov distance, 346
Laguerre
  ensemble, 70, 186, 189, 190, 193, 206, 210, 318, 321
  polynomial, 107, 183, 187, 190
Lagrange inversion theorem, 371
Laplace–Beltrami operator, 296, 437, 447, 448
Laplace's method, 59, 115–117, 119, 134, 142, 428, 429
Large deviation, 70–85, 88, 186, 248, 277, 320, 413, 427–429
  lower bound, 72, 78, 79, 84
  principle (LDP), 72, 77, 81–83, 413, 427
  rate function, see Rate function
  speed, 72, 78, 81, 82, 84, 278, 427
  upper bound, 72, 77, 82, 84, 278–281
  weak LDP, 80, 427
Lattice, 9
Law, see Distribution (law)
  of large numbers, 248, 249
Lebesgue's Theorem, 216
Ledoux's bound, 103, 133, 163, 181
Left regular representation, 332, 350
Leibniz rule, 295, 380, 390
  see also Noncommutative, Leibniz rule
Letter (S-letter), 13
Levi-Civita connection, 296, 446, 447
Lévy–Khintchine, 411
Lévy distance (metric), 346, 425
Lévy process, 412
Lévy's Theorem, 257, 459
Lie,
  algebra, 430, 435
  bracket, 202, 296, 446
  group, 186, 187, 191, 199, 299, 441
Linearization (trick), 396, 400, 402, 403, 408
Lipschitz
  bounded metric, 23, 71, 77, 425
  constant, 38–42, 299, 302
  function, 23, 38–42, 46, 250, 267, 268, 282, 284–287, 292, 293, 298, 301, 302, 426
Limit distribution (law), 66, 115, 183, 184, 262, 388, 389
Lusin's Theorem, 340, 423
Logarithmic asymptotics, see Large deviation
Logarithmic capacity, 72
Lyapunov function, 250, 251
Manifold, 187, 193–200, 207, 318, 437–450
  Riemannian, 295, 299, 320, 321
  submanifold, 199, 437, 438, 449
  see also Flag manifold
MANOVA, 318
Marčenko–Pastur law, 21, 365
Markov, 410, 412
  process, 246
  semigroup, 245, 247, 282, 287, 288, 291, 292, 295
Martingale (martingale bracket, submartingale), 252, 254, 255, 263, 265, 271, 274, 275, 278, 280, 281, 459
  see also Semi-martingale
Master loop equation, see Schwinger–Dyson equation
Matching, 34, 182
Matrix
  band, 85, 86, 319, 324, 412
  distinct, 54, 55
  good, 54
  Hankel, 88
  inversion lemma, 45, 414
  Markov, 88
  normalized, 54
  sample covariance, 412, see also Wishart matrix
  Toeplitz, 88, 182
  Wigner matrix, see Wigner
  with dependent entries, 87, 88, 287–302
Measure,
  Gaussian, see Gaussian distribution
  Haar, 53, 88, 186, 188, 191, 200, 299, 300, 320, 321, 324, 388, 389, 390, 393, 441
  Hausdorff, 194
  Lebesgue, 51–57, 61, 63, 77, 93, 96, 102, 107, 115, 121, 143, 149, 156, 165, 188, 206, 220, 230, 236, 238, 247, 261, 287, 298, 305, 320, 439–441
  positive, 44, 215, 423
  Radon, 215, 217
  reconstruction of, 44, 49
  sub-probability, 44, 45
Median, 285
Mellin transform, 366
Metrizable, 338, 419, 420
Minimizer (of variational problem), 75, 311–314
Mixing, 233, 238
Moment, 7, 12, 13, 20, 29, 85–89, 101, 102, 181, 182, 268, 273, 318, 328, 361–364, 366, 369, 370, 383, 412
  see also Hamburger moment problem
Monge–Kantorovich–Rubinstein distance, 320
Monomial, 70, 200, 207, 209–214, 327, 328, 375, 378, 379, 380, 382, 383, 391, 432
Montel's Theorem, 372
Noncommutative,
  derivative, 380, 389, 390, 397
  entropy, 71, 78, 413
  law, 325, 326, 336, 338, 340, 343, 379, 388
  Leibniz rule, 380, 390
  L^p-norm, 416
  polynomial, 301, 323, 324–326, 394, 395, 400, 402
  probability (space), 322, 325, 326, 328, 329, 348–352, 355, 356, 360, 363, 365, 366, 368, 374, 375, 379, 388, 393, 400
  (random) variable, 322, 325–328, 337, 349, 359, 360, 362, 366, 374, 375, 379, 394, 396, 399, 412
Non-intersecting, 245, 319
Norm, 38, 109, 329–331, 334, 336, 341, 343, 352, 394, 400, 401, 406, 412, 422
  Frobenius, 415
  Hölder, 265
  L^p, 275
  operator, 291
  uniform, 288
  semi-, 334, 335, 450, 455
  sub-multiplicativity, 335, 458
  see also Noncommutative, L^p-norm and Operator, norm
Normal
  matrix, 199, 214, 415
  standard variable, 188, 190, 227, 229
Normalization constant, 54, 58, 59, 81, 96, 189, 191, 303
Operator,
  algebra, 322, 324, 410, 450–458
  affiliated, 325, 336, 343–345, 347, 369, 370, 372, 410, 411, 454
  bounded, 326–328, 330, 339, 343, 350, 360, 366, 369, 409, 410, 453, 458
  commutator, 122
  densely-defined, 343, 344, 453, 454
  left annihilation, 350, 362
  left (right) creation, 350, 362, 364, 409
  multiplication, 122, 330, 332, 340, 341, 343, 344, 347, 353
  norm, 343, 394, 412
  normal, 328–330, 415, 432, 451–453
  unbounded, 342, 343, 347, 369
  unitary, 332, 343, 451, 452
  see also under Self-adjoint
Orthogonal, 192, 205, 313, 317, 352, 438, 443, 447
  basis, 100, 202, 206, 313
  bi-, 244
  ensemble, see GOE
  group, 53, 187, 253, 299, 320, 393, 435, 449
  matrix, 38, 52, 54, 56, 254, 303–305
  polynomial, 86, 94, 181, 183, 184, 189–191, 321
  projection, 204, 206, 208, 210, 410, 443
Oscillator wave-function, 94, 95, 99, 101, 114, 133, 150, 153, 164, 166, 221
Painlevé, 91, 93, 122, 128, 143, 146, 147, 170, 182, 183
  σ-form, 91
Palm (distribution, process), 234, 235, 237–240, 318
Parseval's Theorem, 232, 237
Partition, 9, 16, 217, 354, 355, 357, 359, 367
  block of, 354–359, 364, 367, 369, 377
  crossing, 9, 10
  interval of, 354–359, 367
  non-crossing, 9, 10, 15–17, 354, 355, 358, 362, 364, 366, 367, 377
  pair, 16, 17, 369, 377, 378
  refinement of, 354
Pauli matrices, 261
Permutation matrix, 86, 200, 201, 209–213, 411, 432, 437
Perturbation, 61, 182, 184, 230, 415
Pfaffian, 148, 149, 159, 183, 193, 319
  integration formulas, 148–151, 154
Point process, 215–220, 225, 318
  simple, 215–220
  see also Determinantal, process
Poisson process, 220
  free, 365
Polar decomposition, 343, 346, 393, 454
Polish space, 107, 215, 264, 330, 423–426
Polynomial, 11, 52, 54, 55, 58, 60–70, 100, 120, 167, 181, 248, 257, 268, 270–275, 290, 323, 324, 326–330, 333, 343–346, 370, 379, 381, 387, 390–395, 397, 412, 417
  degree, 394, 400, 402, 410
  see also Hermite, polynomial, Jacobi, polynomial, Laguerre, polynomial, Noncommutative, polynomial and Orthogonal, polynomial.
Poset, 354, 355
Principal value, 49
Process,
  Airy, 228, 230, 319
  Bessel, 319
  Birth-death, 245
  eigenvalue, 319
  exploration, 15
  Gaussian, see Gaussian, process
  Laguerre, 319
  measure-valued, 262, 263, 277
  sine, 230, 231, 319
  see also Diffusion, Markov, process and Point, process.
Projector, projection, 186, 190, 191, 198, 345–347, 409, 410, 430, 432–435, 456
Quaternion, 187, 430
  determinant, 183
Queueing, 319
Quotient (space), 334, 335, 341, 389
Ramírez–Rider–Virág Theorem, 309
Random analytic functions, 319
Random permutation, 184, 185
Rate function, 72, 277, 278, 427–429
  good, 72, 74, 81, 278, 279, 427, 428
  minimizer, 75, 81
  strictly convex, 72, 75
Rebolledo's Theorem, 274, 463
Reflection, 8, 85, 245
Regular (point, value), 193, 194, 196–198, 205, 440–443
Resolution of the identity, 339, 344, 452, 453
Resolvent, 87
  see also Fredholm, resolvent
Resultant, 55, 64, 417
Ricci tensor (curvature), 297, 299, 321, 435, 448–450
Riemannian,
  manifold, see Manifold, Riemannian
  metric, 295, 299, 445, 446
Riemann–Hilbert, 182–185
Riemann zeta function, 185
Riesz Theorem, 279, 281, 331, 338, 344, 423
Root system, 192, 318
R-transform, 360, 365, 370, 371
S-transform, 366, 368
Saddle point, 136
Sard's Theorem, 194, 205, 441
Scale invariance, 395, 403, 408
Schrödinger operator, 302
Schur function, 320
Schwinger–Dyson equation, 379, 381, 382, 386, 389, 391, 404, 406–409, 411, 412
  also appears as master loop equation
Self-adjoint, 198, 220, 260, 323, 329, 332–334, 343–347, 359, 366, 368, 370, 395, 396, 412, 432, 433, 451–456
  anti-, 196, 201, 206, 207, 210, 432, 436
Self-dual, 37, 392
  see also Kernel, self-dual
Selberg integral formula, 54, 59–64, 87, 88, 189
Semicircle distribution (law), 6, 7, 10, 21, 23, 36, 43, 47, 51, 81, 86, 88, 101, 105, 139, 262, 273, 319, 323, 365, 368, 369, 373, 374, 375, 404, 410
Semicircular variables, 323, 324, 365, 374, 375, 377–380, 382, 394, 395, 410, 412
Semi-martingale, 249, 253, 254, 461
Sentence, 17, 18, 25, 33, 378
  equivalent, 17
  FK, 25–28
  graph associated with, 17, 378
  support of, 17
  weight of, 17
Separable, 338–342, 345, 351, 419, 420–427
Shift, 233, 238
Sinai–Soshnikov, 86
Singular value, 87–89, 189, 190, 193, 207, 210, 282, 285, 301, 394, 416, 434
Size bias, 239
Skew field, 187, 430
Skew-Hermitian matrix, 253
Sobolev space, 293
Solution,
  strong, 249–251, 253–259, 269, 460
  weak, 249, 251, 261, 460
Soshnikov, 184
Spacing, 90–93, 110, 114, 132, 134, 160, 181–184, 240, 242
Spectral,
  analysis, 330
  measure, 319, 327, 366, 370, 375, 389, 393, 412
  projection, 332, 343, 344, 454
  radius (norm), 269, 323, 325, 330, 331, 336, 383, 393–396, 451
  resolution, 328, 453
  theorem, 198, 328, 331, 339, 340, 343, 344, 347, 433, 452–454
Spectrum, 328, 330–332, 394–396, 398, 399, 451–454
Spiked models, 184
State, 331–334, 336–342, 391, 395, 454, 455
  faithful, 342–345, 369, 370, 394, 395, 456, 457
  normal, 342–345, 369, 370, 454, 456, 457
  tracial, 322, 324, 331–334, 337, 340–345, 349, 366–370, 372, 380, 387, 389, 391, 394, 395, 413, 456–458
Stationary process, 261, 269, 318
  see also Determinantal, stationary process and Translation invariance
Steepest descent, 134, 138, 141
Stieltjes transform, 9, 20, 38, 43–50, 81, 87, 267, 360, 396, 398, 411, 412
Stirling's formula, 59, 105, 108, 109, 119, 136, 164
Stochastic,
  analysis (calculus), 87, 248–281, 319, 412, 413, 459–463
  differential equation (system), 249, 250, 258, 261, 263, 269, 274, 291, 460
  noncommutative calculus, 413
Stone–Weierstrass Theorem, 330
Stopping time, 251, 253, 260, 459
S-transform, 366, 368
Subordination function, 410
Superposition, 66, 88, 162, 166
Symmetric function, 65, 217, 218
Symplectic, 192
  see also Gaussian symplectic ensemble
Talagrand, 285–287, 320
Tangent space, 196, 200, 437, 439
Tensor product, 322, 348, 399, 400, 451
Telecommunications, 413
Three-term recurrence (recursion), 100, 106, 181, 321
Tight, 314–317, 379, 380, 382, 389, 425, 426
  see also Exponential tightness
Tiling, 319
Torsion-free, 297, 446, 447
Trace, 50, 86, 107, 198, 286, 322–325, 332, 350, 363, 387, 392, 394, 400, 412
  -class, 220, 223–228, 412, 457
  normalized, 50, 325, 392, 400
  see also State, tracial
Tracy–Widom, 93, 142–147, 181–185, 306, 307, 321
Translation invariance, 215, 230, 231–241
  operator, 202
Tridiagonal, 186, 302–317, 321
Topology, 88, 344, 418
  Skorohod, 314
  strong operator, 339
  weak, 71, 72, 88, 262, 281, 282, 372, 421, 425–427
  weak operator, 339, 455
  weak*, 328, 336, 338, 389, 421
Tree, 15, 16, 19, 25, 27, 28, 30, 32, 34, 37, 376, 377
  pendant, 31
  rooted planar tree, 9
Trigonometric sum identities, 87, 118, 125
Ulam's problem, 184
Unbounded variable, 325, 336, 369
Unital algebra, 325, 329, 340, 342, 353, 356, 395, 399, 400, 450, 457
Unitary, 192
  ensemble, see Ensemble, unitary
  Gaussian ensemble, see GUE
  groups, 53, 187, 191–197, 244, 253, 299, 320, 324, 393, 435, 449
  matrix, 52, 54, 88, 188, 254, 321, 324, 374, 388–390, 411, 416, 432
  see also Operator, unitary
Universality, 183–185
Vacuum, 350, 363, 409
Vandermonde determinant, 52, 58, 61, 67, 96, 109, 151
Varadhan's lemma, 76, 429
Verblunsky coefficients, 321
Vertices, 13, 15, 17, 18, 30–32, 378
Voiculescu, 322, 323, 362, 374, 410–413
  transform, 371, 373
Volume (measure), 188, 189, 191, 193, 195, 224, 234, 295, 299, 439, 440, 441, 446
von Neumann algebra, 322, 339–342, 344, 348, 353, 370, 413, 455–458
Voronoi cell, 235–237
Wasserstein distance, 320
Weierstrass approximation theorem, 11
Weingarten function, 411
Weyl, 187, 192, 193, 199, 202
  formula, 206, 244, 318
  quadruple, 199–203, 206–214
  see also Inequality, Weyl
Wigner,
  complex (Hermitian) Wigner matrix, 35–37, 184
  complex Gaussian (Hermitian) Wigner matrix, 28, 35, 260, 323, 393, 411
  Gaussian Wigner matrix, 6, 43, 45, 101, 103, 261, 273, 276, 320, 323, 374, 394
  matrix, 6, 23, 29, 42, 47, 50, 51, 86, 87, 186, 262, 323, 324, 337, 375, 383, 412
  surmise, 181
  Theorem, 7, 10, 22, 23, 35, 36–38, 81, 85, 105, 186, 262, 378
  word, see Word
Wishart matrix, 20, 21, 85–87, 184, 186, 189, 190, 261, 282, 285, 319, 321, 324, 392, 393, 412
Word, 11, 13, 18, 25, 34, 36, 37, 319, 322, 325–328, 333, 334, 367, 369, 395, 400, 410, 412
  closed, 13, 14, 18, 30, 33, 376
  q-colorable, 377
  equivalent, 13, 14
  FK, 25–28
  FK parsing of, 25, 27
  graph associated with, 13, 376
  length of, 13, 334
  skeleton of FK, 26
  weight of, 13, 14, 18, 20, 25, 33, 376
  Wigner, 14, 16, 17, 25, 26, 28, 30, 376, 377
W*-
  algebra, see von Neumann algebra
  probability space, 339–347
Young diagram, 88, 411
