Large Deviations

LARGE DEVIATIONS FOR INTERACTING MULTISCALE PARTICLE SYSTEMS
Z.W. BEZEMEK AND K. SPILIOPOULOS
arXiv:2011.03032v1 [math.PR] 5 Nov 2020
Abstract. We consider a collection of weakly interacting diffusion processes moving in a two-scale locally
periodic environment. We study the large deviations principle of the empirical distribution of the particles’
positions in the combined limit as the number of particles grows to infinity and the time-scale separation
parameter goes to zero simultaneously. We make use of weak convergence methods, which provide a convenient
representation for the large deviations rate function and allow us to characterize the effective controlled
mean field dynamics. In addition, we obtain in certain special cases equivalent representations for the large
deviations rate function.
1. Introduction
The goal of this article is to obtain the large deviations principle (LDP) for interacting particle systems
of diffusion type in multiscale environments. We use methods from weak convergence and stochastic control,
[21], ultimately making connections with mean field stochastic control problems [12].
In particular, we consider the interacting particle system

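\[
dX_t^{i,N} = \Big[\frac{1}{\delta} f\big(X_t^{i,N}, X_t^{i,N}/\delta, \mu_t^N\big) + b\big(X_t^{i,N}, X_t^{i,N}/\delta, \mu_t^N\big)\Big]dt + \sigma\big(X_t^{i,N}, X_t^{i,N}/\delta, \mu_t^N\big)\,dW_t^i, \tag{1}
\]
\[
X_0^{i,N} = x^{i,N},
\]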
where $t \in [0,1]$, $W_t^i$, $i = 1, \dots, N$, are $m$-dimensional independent $\mathcal{F}_t$-Brownian motions,
\[
\mu_t^N(\omega) := \frac{1}{N}\sum_{i=1}^N \delta_{X_t^{i,N}(\omega)},
\]
$X_t^{i,N}(\omega), f(x,y,\mu), b(x,y,\mu) \in \mathbb{R}^d$, $\sigma(x,y,\mu) \in \mathbb{R}^{d\times m}$, and all coefficients are 1-periodic in the second
coordinate. Suppose also that $\delta > 0$, $N \in \mathbb{N}$, and $\delta(N) \to 0$ as $N \to \infty$.
Our goal is to obtain the large deviations principle for the measure-valued process $\{\mu_t^N, t \in [0,1]\}_{N\in\mathbb{N}}$ in
the combined limit N → ∞ and δ ↓ 0. Here, δ is the time scale separation parameter. One can regard $X^{i,N}$
as the slow $i$th component and $Y^{i,N} = X^{i,N}/\delta$ as the fast $i$th component.
Systems of interacting diffusions arise in many areas of science, finance and engineering, see for example
[6, 32, 33, 38, 44, 46] to name just a few. On the other hand, diffusions in multiscale environments are also
common in many applications ranging from chemical physics to finance and climate modeling, see for example
[1, 7, 23–25, 37, 45, 64] for a representative, but by no means complete, list. Our goal in this paper is to study
the combined effect of weak mean field interactions in a fast oscillating multiscale environment from the
point of view of large deviations for the empirical measure of particles.
In the case δ = 1, i.e. in the absence of multiple scales, the limiting problem of N → ∞ has been very
well studied in the literature. Typical behavior, fluctuations as well as large deviations have been obtained,
see for example [11, 15, 16] for related classical works. Analogously, if N = 1, i.e., in the single particle case,
the limiting behavior as δ ↓ 0 has also been extensively studied in the literature under various modeling
assumptions, see for example [2, 9, 20, 26, 31, 48, 50, 51, 56–60] and the references therein. In this paper, we
study the combined limit as N → ∞ and δ(N ) ↓ 0. The main result of the paper is Theorem 3.4 that gives
the large deviations principle of the empirical distribution of the particles in the combined limit N → ∞
and δ ↓ 0. As a byproduct we also obtain in Theorem 3.7 the typical behavior, i.e. the law of large numbers.
We use the weak convergence methods of [21], which lead to the study of related mean field stochastic control
problems [12, 29, 42]. In addition, in Section 4, we connect, in a simplified regime, the variational form of the
action functional that we obtain in Theorem 3.4 with the expected “dual” form based on the classical work
of Dawson and Gärtner [16] in the δ = 1 case and [20] in the N = 1 case. As pointed out in [29], while the
limit problems here are not mean-field games, but rather optimal control problems of McKean–Vlasov type,
the modern advances in control theory for mean-field models and McKean–Vlasov equations resulting from
the current popularity of mean-field games may provide a rigorous means of proving the formal results provided
in Section 4. This analysis will appear elsewhere.
2010 Mathematics Subject Classification. 60F10, 60F05.
Key words and phrases. interacting particle systems, multiscale processes, empirical measure, large deviations.
This work has been partially supported by the National Science Foundation (DMS 1550918) and Simons Foundation Award
672441.
As an example, we consider in Section 5 a system of interacting processes in dimension one in a multiscale
confining potential, i.e. a Curie-Weiss type of interaction. This example is motivated by the work of Dawson
in [15]. We verify that this system satisfies the required assumptions for an LDP to hold, derive the large
deviations principle and discuss equivalent formulations. We provide an alternative variational form of the
rate function for this particular system in Theorem 5.2.
To our knowledge, this is the first large deviations result for the combined δ ↓ 0 and N → ∞ limit.
Some similar results include the proof of an averaging principle for slow-fast McKean-Vlasov SDEs found
in [54], i.e. the δ ↓ 0 limit for a system of the type we get after N → ∞. There is also the result of
[3], in which the authors establish an averaging principle for the empirical mean of a system of mean field
multiscale diffusions at the level of large deviations. What is meant by this is that they study the known
large deviations rate functions for the N → ∞ limit of the empirical measures with fixed δ, J δ , and are able
to prove Γ-convergence of the sequence {J δ }δ>0 to a functional J as δ ↓ 0. Lastly, in [17], a result similar
to Theorem 3.7 appears (only typical behavior, not LDP). A key difference between the regime of [17] and
the regime of our paper is that rather than depending on the slow process $X_t^{i,N}$, the fast process $X_t^{i,N}/\delta$,
and the empirical measure $\mu_t^N$, their coefficients depend on the fast process $X_t^{i,N}/\delta$ and the “fast empirical
measure” $\mu_t^{N,\delta} := \frac{1}{N}\sum_{i=1}^N \delta_{X_t^{i,N}/\delta}$. As a result, the invariant measure π (see Equation 3) depends not on
the parameter $\mu = \mathcal{L}(\bar X_\cdot)$ in the limit, as in our regime, but implicitly on itself as µ = π. Consequently, in
[17], multiple steady states can exist, potentially affecting the way in which the limits δ → 0 and N → ∞
interact. We discuss this further in our conclusions in Section 11.
The rest of the paper is organized as follows. In Section 2, we lay out notation and main assumptions
in regards to the model (1). In addition, we introduce the corresponding controlled particle system and
controlled McKean-Vlasov process which will be crucial components of the large deviations analysis. In
Section 3 we present our main result on large deviations for the measure-valued process $\{\mu_t^N, t \in [0,1]\}_{N\in\mathbb{N}}$
in the combined limit N → ∞ and δ ↓ 0. Section 4 connects the obtained Laplace principle with other
classical works in the literature, i.e. the LDP in the δ = 1 case of [16] and the LDP in the N = 1 case of
[20]. Section 5 includes an example with a bistable confining potential motivated by the classical work of
Dawson in [15] and an alternative variational form of the rate function. In Section 6 we discuss the limiting
behavior of the controlled particle system, proving tightness and identifying the limit. In Sections 7 and
8 we prove the Laplace principle (which is equivalent to the large deviations principle) lower and upper
bounds, respectively. Compactness of the level sets of the rate function is proven in Section 9. In Section 10
we discuss how the assumptions on the coefficients can be relaxed. In Appendices A, C, and D we discuss
technical preliminary results that are used in various places of the paper. For purposes of self containment
and for the reader’s convenience, Appendix B reviews the necessary material from Lions differentiation.
Lastly, an equivalent variational representation for the rate function offered in Theorem 3.4 is discussed in
Appendix E. Section 11 has our conclusions and directions for future work.
2. Notation, Assumptions, and the Controlled McKean-Vlasov process

Consider a probability space (Ω, F , P) with filtration Ft that satisfies the usual conditions. For S a Polish
space, we will use C([0, 1]; S) to denote the space of continuous functions from [0, 1] to S, equipped with
the topology of uniform convergence and the sup norm. We will use $D_S[0,1]$ to denote the space of functions $B : [0,1] \to S$ such
that B is right-continuous and has left-hand limits, equipped with the Skorohod topology. A useful fact
is that both $D_{\mathbb{R}^d}[0,1]$ and $C([0,1];\mathbb{R}^d)$ with the previously described topologies are Polish spaces (see [21]
Theorem A.6.5). We will use $C_b(S)$ to denote the space of continuous, bounded functions $B : S \to \mathbb{R}$, and let
$\|B\|_\infty := \sup_{x\in S}|B(x)|$. We use $C_b^k(S)$ to denote the space of continuous, bounded functions $B : S \to \mathbb{R}$ with
$k$ continuous, bounded derivatives. We use $C_c^\infty(S)$ to denote the space of continuous, infinitely differentiable
functions $B : S \to \mathbb{R}$ with compact support. $L^2(S,\mu;\mathbb{R}^k)$, where µ is a measure on S, will denote the class
of functions $B : S \to \mathbb{R}^k$ such that $\|B\|_{L^2(\mu)} := \big(\int_S |B(x)|^2\,\mu(dx)\big)^{1/2} < \infty$. We may omit the codomain
in this notation when convenient. P(S) will denote the space of probability measures on the Borel σ-field
B(S), where open sets are induced by the metric on S. P(S) is given the topology of weak convergence and
the Prokhorov metric, and is itself a Polish space ([22] Theorem 3.1.7). P2 (E) ⊂ P(E) will denote the set of
square integrable measures on E. It is given the L2 -Wasserstein distance (see Definition B.1) as its metric
and is also a Polish space ([12] p.360). Given a random variable η, L(η) will denote the distribution of η.
Assume the following:
(A1) For g = f, b, σ, g,∇x g, ∇y g, ∇x ∇x g, and ∇x ∇y g exist, are uniformly bounded, and are continuous
in Rd × Td × P(Rd ).
(A2) For A = σσ ⊤ there exists γ > 0 such that uniformly in x ∈ Rd , y ∈ Td , µ ∈ P2 ,
ξ ⊤ A(x, y, µ)ξ ≥ γ|ξ|2 , ∀ξ ∈ Rd .
(A3) For g = f, b, σ and x ∈ Rd , y ∈ Td , g(x, y, ·), ∇x g(x, y, ·), ∇x ∇x g(x, y, ·), ∇x ∇y g(x, y, ·), and
∇y g(x, y, ·) are all Fully C 2 in the sense of Definition B.2.
(A4) For g = f, b or σ, ∂µ g(x, y, µ)(·), ∇x ∂µ g(x, y, µ)(·), ∇y ∂µ g(x, y, µ)(·), ∂v ∂µ g(x, y, µ)(·), ∂µ2 g(x, y, µ)(·, ·)
are bounded in L2 (Rd , µ) uniformly in x, y, and µ.
(A5) There exists $L \in (0,\infty)$ such that for $x_1, x_2 \in \mathbb{R}^d$, $y_1, y_2 \in \mathbb{T}^d$, $\mu_1, \mu_2 \in \mathcal{P}_2$,
\[
|g(x_1,y_1,\mu_1) - g(x_2,y_2,\mu_2)| \le L\big(|x_1 - x_2| + |y_1 - y_2| + W_2(\mu_1,\mu_2)\big),
\]
where g = f, b, or σ and $W_2$ is the $L^2$-Wasserstein distance (see Appendix B).

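(A6) For some $\nu_0 \in \mathcal{P}(\mathbb{R}^d)$, $\frac{1}{N}\sum_{i=1}^N \delta_{x^{i,N}} \to \nu_0$ as $N \to \infty$ and $\sup_{N\in\mathbb{N}} \frac{1}{N}\sum_{i=1}^N |x^{i,N}|^2 < \infty$.
(A7) All the first and second derivatives listed in (A1) and (A4) are Hölder continuous in y uniformly in x and µ.
(A8) The centering condition
\[
\int_{\mathbb{T}^d} f(x,y,\mu)\,\pi(dy|x,\mu) = 0, \quad \forall x \in \mathbb{R}^d,\ \mu \in \mathcal{P}_2(\mathbb{R}^d),
\]
holds.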
By Proposition A.1, under Assumptions (A1)-(A8) this system has a unique strong solution for each
N ∈ N. It is also worth noting here that, though the assumption (A8) is standard in the theory of averaging
and homogenization for guaranteeing unique strong solutions to the cell problem (4), in this particular instance
it also has a significant effect on the identification of the limiting SDE given in Equation 12 (see (41)).
As we shall see in Section 10, Assumptions (A1)-(A8) can be substantially relaxed at the expense of more
technical estimates. We choose to present the main results with Assumptions (A1)-(A8) for the sake of
readability.
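An important object of study will be the operator $L^1_{x,\mu}$, parameterized by $x \in \mathbb{R}^d$ and $\mu \in \mathcal{P}_2(\mathbb{R}^d)$, which
acts on $g \in C^2(\mathbb{T}^d)$ by
\[
L^1_{x,\mu}\, g(y) := f(x,y,\mu)\cdot\nabla g(y) + \frac{1}{2}A(x,y,\mu) : \nabla\nabla g(y). \tag{2}
\]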
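Related to this operator we consider the measure $\pi(\cdot|x,\mu) \in \mathcal{P}(\mathbb{T}^d)$, parameterized by $x \in \mathbb{R}^d$ and $\mu \in \mathcal{P}_2(\mathbb{R}^d)$, whose
density $\tilde\pi(\cdot|x,\mu)$ satisfies the adjoint equation
\[
(L^1_{x,\mu})^{*}\,\tilde\pi(y|x,\mu) = 0, \qquad \int_{\mathbb{T}^d}\tilde\pi(y|x,\mu)\,dy = 1, \quad \forall x \in \mathbb{R}^d,\ \mu \in \mathcal{P}_2(\mathbb{R}^d), \tag{3}
\]
and the function $\Phi : \mathbb{R}^d\times\mathbb{T}^d\times\mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}^d$, $\Phi = (\Phi_1,\dots,\Phi_d)$, solving
\[
L^1_{x,\mu}\,\Phi_l(x,y,\mu) = -f_l(x,y,\mu), \qquad \int_{\mathbb{T}^d}\Phi_l(x,y,\mu)\,\pi(dy|x,\mu) = 0, \tag{4}
\]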
where both of these equations are given periodic boundary conditions. As we will see in Propositions C.1
and C.2, π and Φ are uniquely defined and π indeed admits a density π̃.
We wish to observe the behavior of the sequence of $\mathcal{P}(C([0,1];\mathbb{R}^d))$-valued random variables
\[
\mu^N(\omega) := \frac{1}{N}\sum_{i=1}^N \delta_{X^{i,N}(\omega)} \tag{5}
\]
as N → ∞. Specifically, letting $ev : C([0,1];\mathbb{R}^d) \to \mathbb{R}^d$ be the evaluation map at time t, in Theorem
3.7 we see that, under assumptions (A1)-(A8), $\mathcal{L}(\mu^N) \to \delta_{\mu^*}$ in $\mathcal{P}(\mathcal{P}(C([0,1];\mathbb{R}^d)))$, where the deterministic
$\mu^* \in \mathcal{P}(C([0,1];\mathbb{R}^d))$ satisfies $\mu^*\circ ev^{-1}(t) = \mathcal{L}(X_t)$, $t \in [0,1]$, for X solving the McKean-Vlasov SDE:
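\[
\begin{aligned}
dX_t &= \Big[\int_{\mathbb{T}^d}\Big([\nabla_y\Phi(X_t,y,\mathcal{L}(X_t)) + I]\,b(X_t,y,\mathcal{L}(X_t)) + \nabla_x\Phi(X_t,y,\mathcal{L}(X_t))\,f(X_t,y,\mathcal{L}(X_t)) \\
&\qquad\quad + A : \nabla_x\nabla_y\Phi(X_t,y,\mathcal{L}(X_t))\Big)\,\pi(dy|X_t,\mathcal{L}(X_t))\Big]dt + B(X_t,\mathcal{L}(X_t))\,dW_t,
\end{aligned} \tag{6}
\]
\[
\frac{1}{2}B(x,\mu)B(x,\mu)^\top = \int_{\mathbb{T}^d}\Big(\nabla_y\Phi(x,y,\mu)A(x,y,\mu) + f(x,y,\mu)\otimes\Phi(x,y,\mu) + \frac{1}{2}A(x,y,\mu)\Big)\,\pi(dy|x,\mu),
\]
\[
X_0 \sim \nu_0,
\]
for $W_t$ a d-dimensional $\mathcal{F}_t^X$-Brownian motion. Here we define
\[
A : \nabla_x\nabla_y\Phi(x,y,\mu) := \big(A(x,y,\mu) : \nabla_x\nabla_y\Phi_1(x,y,\mu), \dots, A(x,y,\mu) : \nabla_x\nabla_y\Phi_d(x,y,\mu)\big)^\top.
\]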
We seek to quantify the rate at which the convergence of the random measures given by Equation 5 to the
law of the solution of Equation 6 occurs by deriving a large deviations principle for $\{\mu^N\}_{N\in\mathbb{N}}$.
2.1. The Controlled Process. We start by constructing a controlled version of the system of mean-field
SDEs (1), which will then allow us to use the weak convergence approach to large deviations of [21].
For N ∈ N let $\mathcal{U}_N$ denote the space of $\mathcal{F}_t$-progressively measurable functions $u : [0,1]\times\Omega \to \mathbb{R}^{N\times m}$ such
that $E\big[\int_0^1 |u(t)|^2\,dt\big] < \infty$, where E denotes the expectation with respect to P and | · | the Euclidean norm.
For $u \in \mathcal{U}_N$, we write $u = (u_1, \dots, u_N)$ where $u_i \in \mathbb{R}^m$, $i = 1, \dots, N$.
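Given $u^N \in \mathcal{U}_N$, we consider the controlled system of SDEs
\[
\begin{aligned}
d\bar X_t^{i,N} &= \Big[\frac{1}{\delta} f\big(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N\big) + b\big(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N\big) + \sigma\big(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N\big)\,u_i^N(t)\Big]dt \\
&\quad + \sigma\big(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N\big)\,dW_t^i,
\end{aligned} \tag{7}
\]
\[
\bar X_0^{i,N} = x^{i,N},
\]
where $\bar\mu_t^N$ and $\bar\mu^N$ are the empirical measures of $\bar X_t^{i,N}$ and $\bar X^{i,N}$ respectively,
\[
\bar\mu_t^N(\omega) := \frac{1}{N}\sum_{i=1}^N \delta_{\bar X_t^{i,N}(\omega)}, \qquad \bar\mu^N(\omega) := \frac{1}{N}\sum_{i=1}^N \delta_{\bar X^{i,N}(\omega)}. \tag{8}
\]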
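For notational convenience, we now introduce some spaces of interest. Let $\mathcal{X} := C([0,1];\mathbb{R}^d)$, $\mathcal{Y} := R^1(\mathbb{T}^d)$,
$\mathcal{Z} := R_1^1$, and $\mathcal{W} := \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$. Here
\[
R_1^\alpha := \Big\{ r : r \text{ is a positive Borel measure on } \mathbb{R}^m\times[0,\alpha],\ r(\mathbb{R}^m\times[0,t]) = t\ \ \forall t \in [0,\alpha], \text{ and } \int_{\mathbb{R}^m\times[0,\alpha]} |z|\,r(dz\times dt) < \infty \Big\}
\]
and
\[
R^\alpha(\mathbb{T}^d) := \big\{ n : n \text{ is a positive Borel measure on } \mathbb{T}^d\times[0,\alpha] \text{ and } n(\mathbb{T}^d\times[0,t]) = t\ \ \forall t \in [0,\alpha] \big\}.
\]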
Note that we construct Y and Z this way to allow for extension of the results of this paper to bounded time
intervals other than [0, 1]. Also note that by Section 6.3 in [53] Z is a Polish space and that by Theorem
A.3.3 in [21] Y is a Polish space.
Note that if $u \in \mathcal{U}_N$ for any N ∈ N, then u induces a Z-valued random variable r via
\[
r_\omega(D\times I) := \int_I \delta_{u(t,\omega)}(D)\,dt, \quad D \in \mathcal{B}(\mathbb{R}^m),\ I \in \mathcal{B}([0,1]),\ \omega \in \Omega. \tag{9}
\]
Since for r ∈ Z, $t \mapsto r(B\times[0,t])$ for $B \in \mathcal{B}(\mathbb{R}^m)$ is absolutely continuous, there exists $r_t : [0,1] \to \mathcal{P}(\mathbb{R}^m)$
such that r(dzdt) = rt (dz)dt. Similarly, for m ∈ Y, there exists mt : [0, 1] → P(Td ) such that m(dydt) =
mt (dy)dt.
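As a worked illustration of this disintegration (a sketch added for orientation, with the path $u$ and the vector $a$ chosen purely for the example), take a single deterministic control path $u : [0,1] \to \mathbb{R}^m$ with $\int_0^1 |u(t)|^2\,dt < \infty$. The construction in Equation 9 then gives $r(dz\,dt) = \delta_{u(t)}(dz)\,dt$, so that $r_t = \delta_{u(t)}$ and
\[
r(\mathbb{R}^m\times[0,t]) = t, \qquad \int_{\mathbb{R}^m} z\,r_t(dz) = u(t), \qquad \int_{\mathbb{R}^m\times[0,1]} |z|^2\,r(dz\,dt) = \int_0^1 |u(t)|^2\,dt.
\]
By contrast, a genuinely relaxed control such as $r_t = \tfrac{1}{2}\delta_{a} + \tfrac{1}{2}\delta_{-a}$ for a fixed $a \in \mathbb{R}^m$ satisfies
\[
\int_{\mathbb{R}^m} z\,r_t(dz) = 0, \qquad \int_{\mathbb{R}^m\times[0,1]} |z|^2\,r(dz\,dt) = |a|^2.
\]
The first moment of $r_t$ is what an equation driven by the relaxed control sees in its drift, while the second moment is what a quadratic cost charges, so randomizing a control around its mean can only increase such a cost.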
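Consider the McKean-Vlasov SDE parameterized by $\nu \in C([0,1];\mathcal{P}(\mathbb{R}^d))$ given by:
\[
\begin{aligned}
d\tilde X_t^\nu &= \Big[\int_{\mathbb{T}^d}\Big([\nabla_y\Phi(\tilde X_t^\nu,y,\nu(t)) + I]\Big[b(\tilde X_t^\nu,y,\nu(t)) + \sigma(\tilde X_t^\nu,y,\nu(t))\int_{\mathbb{R}^m} z\,\rho_t(dz)\Big] \\
&\qquad\quad + \nabla_x\Phi(\tilde X_t^\nu,y,\nu(t))\,f(\tilde X_t^\nu,y,\nu(t)) + A : \nabla_x\nabla_y\Phi(\tilde X_t^\nu,y,\nu(t))\Big)\,m_t(dy)\Big]dt \\
&\quad + B(t,\tilde X_t^\nu,\nu(t))\,dW_t,
\end{aligned} \tag{10}
\]
\[
\frac{1}{2}B(t,x,\mu)B(t,x,\mu)^\top = \int_{\mathbb{T}^d}\Big(\big(\nabla_y\Phi(x,y,\mu) + \tfrac{1}{2}I\big)A(x,y,\mu) + f(x,y,\mu)\otimes\Phi(x,y,\mu)\Big)\,m_t(dy),
\]
for $\tilde X^\nu \in \mathcal{X}$, $m \in \mathcal{Y}$, and $\rho \in \mathcal{Z}$. $Q \in \mathcal{P}(\mathcal{W})$ corresponds to a weak solution of (10) if there exists a filtered
probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$, $\{\tilde{\mathcal{F}}_t\}$ and an m-dimensional $\tilde{\mathcal{F}}_t$-Brownian motion W such that $(\tilde X^\nu, m, \rho)$ is an
$\tilde{\mathcal{F}}_t$-adapted $\mathcal{W}$-valued random variable that has distribution Q under $\tilde P$. Note that $\tilde X^\nu$, m, and ρ are each
random processes.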
The following remark makes the point that the measures ρ(ω), m(ω) and the process $\tilde X^\nu(\omega)$ all depend on
each other.
Remark 2.1. While at a given ω, the measure ρ(ω) appears separately from m(ω) and $\tilde X^\nu(\omega)$ in Equation
10, a priori $\tilde X^\nu$, m, and ρ are not independent random variables (nor should we expect them to be).
We are interested in particular in weak solutions Q to $\tilde X^{\nu_Q}$, where $\nu_Q : [0,1] \to \mathcal{P}(\mathbb{R}^d)$ is the Borel
measurable mapping defined by
\[
\nu_Q(t)(B) := Q(\{(\phi,n,r) \in \mathcal{W} : \phi(t) \in B\}), \quad B \in \mathcal{B}(\mathbb{R}^d),\ t \in [0,1]. \tag{11}
\]
(For a description of $\mathcal{B}(\mathcal{P}(\mathbb{R}^d))$ see [21] Lemma A.5.1.)
Since $\nu_Q(t) = \mathcal{L}(\tilde X_t^{\nu_Q})$, we are thus interested in weak solutions to the limiting controlled McKean-Vlasov
SDE
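\[
\begin{aligned}
d\bar X_t &= \Big[\int_{\mathbb{T}^d}\Big([\nabla_y\Phi(\bar X_t,y,\mathcal{L}(\bar X_t)) + I]\Big[b(\bar X_t,y,\mathcal{L}(\bar X_t)) + \sigma(\bar X_t,y,\mathcal{L}(\bar X_t))\int_{\mathbb{R}^m} z\,\rho_t(dz)\Big] \\
&\qquad\quad + \nabla_x\Phi(\bar X_t,y,\mathcal{L}(\bar X_t))\,f(\bar X_t,y,\mathcal{L}(\bar X_t)) + A : \nabla_x\nabla_y\Phi(\bar X_t,y,\mathcal{L}(\bar X_t))\Big)\,m_t(dy)\Big]dt \\
&\quad + B(t,\bar X_t,\mathcal{L}(\bar X_t))\,dW_t,
\end{aligned} \tag{12}
\]
\[
\frac{1}{2}B(t,x,\mu)B(t,x,\mu)^\top = \int_{\mathbb{T}^d}\Big(\big(\nabla_y\Phi(x,y,\mu) + \tfrac{1}{2}I\big)A(x,y,\mu) + f(x,y,\mu)\otimes\Phi(x,y,\mu)\Big)\,m_t(dy).
\]
Note that in the case $\int_{\mathbb{R}^m} z\,\rho_t(\omega)(dz) = 0$ and $m_t(dy) = \pi(dy|\bar X_t,\mathcal{L}(\bar X_t))$ for almost every $\omega \in \tilde\Omega$ and
t ∈ [0, 1], this agrees with Equation 6.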
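The process triple $(\bar X, m, \rho)$ can be given explicitly as the coordinate process on the probability space
$(\mathcal{W}, \mathcal{B}(\mathcal{W}), Q)$ endowed with the canonical filtration $\mathcal{G}_t := \sigma\big((\bar X_s, m(s), \rho(s)),\ 0 \le s \le t\big)$. Thus, for
$\omega = (\phi, n, r) \in \mathcal{W}$,
\[
\bar X_t(\omega) = \phi(t), \qquad m(t,\omega) = n|_{\mathcal{B}(\mathbb{T}^d\times[0,t])}, \qquad \rho(t,\omega) = r|_{\mathcal{B}(\mathbb{R}^m\times[0,t])}. \tag{13}
\]
Thus, for $g : \mathbb{T}^d \to \mathbb{R}$, when we write $E^Q\big[\int_{\mathbb{T}^d\times[s,t]} g(y)\,m(t)(dy\,d\tau)\big]$, we mean
\[
\begin{aligned}
E^Q\Big[\int_{\mathbb{T}^d\times[s,t]} g(y)\,m(t)(dy\,d\tau)\Big] &= \int_{\mathcal{W}}\int_{\mathbb{T}^d\times[s,t]} g(y)\,m(t,\omega)(dy\,d\tau)\,Q(d\omega) \\
&= \int_{\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}\int_{\mathbb{T}^d\times[s,t]} g(y)\,n|_{\mathcal{B}(\mathbb{T}^d\times[0,t])}(dy\,d\tau)\,Q(d\phi\,dn\,dr) \\
&= \int_{\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}\int_s^t\int_{\mathbb{T}^d} g(y)\,n_\tau(dy)\,d\tau\,Q(d\phi\,dn\,dr).
\end{aligned}
\]
Throughout this paper we will only integrate m(t, ω) against time intervals of the form [s, t], so we will
simply write $E^Q\big[\int_{\mathbb{T}^d\times[s,t]} g(y)\,m(dy\,d\tau)\big]$ in the place of $E^Q\big[\int_{\mathbb{T}^d\times[s,t]} g(y)\,m(t)(dy\,d\tau)\big]$ and $n(dy\,d\tau)$ in the
place of $n|_{\mathcal{B}(\mathbb{T}^d\times[0,t])}(dy\,d\tau)$. The same applies to ρ(t, ω) and $r|_{\mathcal{B}(\mathbb{R}^m\times[0,t])}$.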
We end this section with a discussion on well posedness of the limiting controlled McKean-Vlasov SDE
(12).
Definition 2.2. We will say weak-sense uniqueness holds for Equation 12 if under the assumptions:
(1) Θ, Θ̃ ∈ V, where V is defined in Definition 3.1.
(2) ΘZ = Θ̃Z
we have ΘX = Θ̃X .
Proposition 2.3. Under the assumptions (A1)-(A8), weak-sense uniqueness as defined in Definition 2.2
holds for Equation (12).
Proof. Since the product of bounded Lipschitz functions is Lipschitz (Φ is Lipschitz in x, y because it has
bounded first order derivatives and in µ by Corollary C.3), the averaged coefficients are Lipschitz continuous
by Section 6.2 of [54]. Then the requirements for Theorem 2.1 in [62] are satisfied by the limiting system. By
a slight modification of the proof to include the control term (in particular the Yamada-Watanabe Theorem
used in Lemma 2.3 does not directly apply, but a modification of its proof is simple given we consider relaxed
controls ρ such that $\int_0^1\int_{\mathbb{R}^m} |z|^2\,\rho_t(dz)\,dt < \infty$) we get that for any fixed ρ there is a unique strong solution to
Equation 10 under assumptions (A1)-(A6). Then, noting that by Remark 6.7 $\Theta|_{\mathcal{B}(\mathcal{X}\times\mathcal{Y})}$ is completely
determined by $\Theta_{\mathcal{X}}$, we get immediately that weak-sense uniqueness holds by the proof of Proposition C.2 in
[30].
3. Statement of the Main Results

In order to state the main results of this paper, we need the following two definitions:
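Definition 3.1. We will say $\Theta \in \mathcal{P}(\mathcal{W})$ is in $\mathcal{V}$ if
(V1) Θ corresponds to a weak solution $\bar X$ of (12).
(V2) $E^\Theta\big[\int_{\mathbb{R}^m\times[0,1]} |z|^2\,\rho(dz\,dt)\big] < \infty$.
(V3) $\nu_\Theta(0) = \nu_0$ from (A6).
(V4) For all $t \in [0,1]$ and $g \in C_b^2(\mathbb{T}^d)$, $E^\Theta\big[\int_{\mathbb{T}^d\times[0,t]} L^1_{\bar X_s,\nu_\Theta(s)}\, g(y)\,m(dy\,ds)\big] = 0$.
Here we are using the notation for the coordinate process given in Equation 13.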
Essentially, condition (V4) says that the coordinate process ms (dy) is π(dy|X̄s , νΘ (s)), the invariant
measure for the fast dynamics, for Θ-a.e. ω ∈ W. A detailed explanation on why this is true is in Section
6.2 and in particular in Remark 6.7.
Definition 3.2. A function $I : \mathcal{P}(\mathcal{X}) \to [0,\infty]$ is called a (good) rate function if for each M < ∞, the set
$\{\theta \in \mathcal{P}(\mathcal{X}) : I(\theta) \le M\}$ is compact. We say that a Laplace principle holds for the family $\{\mu^N\}_{N\in\mathbb{N}}$ with rate
function I if for any bounded, continuous $F : \mathcal{P}(\mathcal{X}) \to \mathbb{R}$,
\[
\lim_{N\to\infty} -\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] = \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{F(\theta) + I(\theta)\}. \tag{14}
\]
It is well known that in our setting the Laplace principle holds if and only if $\{\mu^N\}_{N\in\mathbb{N}}$ satisfies an LDP
with rate function I. See [21] Theorem 1.2.3.
In order to prove the Laplace principle for $\{\mu^N\}_{N\in\mathbb{N}}$, we make use of the following proposition:
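Proposition 3.3. The pre-limit expression in (14) can be written as
\[
\begin{aligned}
-\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] &= \inf_{u^N\in\mathcal{U}_N}\Big\{\frac{1}{2}E\Big[\frac{1}{N}\int_0^1 |u^N(t)|^2\,dt\Big] + E\big[F(\bar\mu^N)\big]\Big\} \\
&= \inf_{u^N\in\mathcal{U}_N}\Big\{\frac{1}{2}E\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1 |u_i^N(t)|^2\,dt\Big] + E\big[F(\bar\mu^N)\big]\Big\}
\end{aligned} \tag{15}
\]
for any $F \in C_b(\mathcal{P}(\mathcal{X}))$, where $\bar\mu^N$ is given by (8) with $u^N = (u_1^N, \dots, u_N^N) \in \mathcal{U}_N$ the control in Equation 7.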
Proof. By Proposition A.1 and [63] there is a Borel measurable $\psi^{i,N}$ such that
\[
\psi^{i,N}\big((x^{1,N},\dots,x^{N,N}),(W^1,\dots,W^N)\big) = X^{i,N},
\]
and by the characterization of $\mathcal{B}(\mathcal{P}(\mathcal{X}))$ given in Lemma A.5.1 of [21], $p : C([0,1];\mathbb{R}^d)^N \to \mathcal{P}(C([0,1];\mathbb{R}^d))$
given by $p(\phi_1,\dots,\phi_N) = \frac{1}{N}\sum_{i=1}^N \delta_{\phi_i}$ is Borel measurable. So
\[
\mu^N = p\Big(\psi^{1,N}\big((x^{1,N},\dots,x^{N,N}),(W^1,\dots,W^N)\big), \dots, \psi^{N,N}\big((x^{1,N},\dots,x^{N,N}),(W^1,\dots,W^N)\big)\Big)
\]
and is thus a Borel-measurable function of the driving Wiener processes for each N. Then Theorem 3.6 in
[10] applies, giving us the desired result.
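Theorem 3.4. Under assumptions (A1)-(A8), the sequence of $\mathcal{P}(\mathcal{X})$-valued random variables $\{\mu^N\}_{N\in\mathbb{N}}$ as
defined by Equation 5 satisfies the Laplace principle with good rate function
\[
I(\theta) = \inf_{\Theta\in\mathcal{V}:\,\Theta_{\mathcal{X}} = \theta} E^\Theta\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]} |z|^2\,\rho(dz\,dt)\Big], \tag{16}
\]
where inf(∅) := +∞.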
Proof. As is standard, in order to prove that $\{\mu^N\}_{N\in\mathbb{N}}$ satisfies the Laplace principle with rate function
I, we prove the Laplace principle lower bound (42) in Section 7 and the Laplace principle upper bound
(44) in Section 8. The main tool in these proofs is the Variational Representation Theorem for Functionals
of Brownian Motion, given in Proposition 3.3. Once we identify the law of large numbers result for the
controlled process in Section 6, the Laplace principle lower bound, i.e.
\[
\liminf_{N\to\infty} -\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] \ge \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{F(\theta) + I(\theta)\},
\]
follows immediately from Fatou’s lemma, as seen in Section 7. Then to prove the Laplace principle upper
bound,
\[
\limsup_{N\to\infty} -\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] \le \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{F(\theta) + I(\theta)\},
\]
in Section 8, given $\theta \in \mathcal{P}(\mathcal{X})$, we construct a probability space and a sequence of i.i.d. controls whose law
corresponds to $\Theta_{\mathcal{Z}}$ for $\Theta \in \mathcal{V}$, $\Theta_{\mathcal{X}} = \theta$, that nearly reaches the infimum in the definition of I(θ). Once we
show these two bounds, we get
\[
\begin{aligned}
\inf_{\theta\in\mathcal{P}(\mathcal{X})}\{F(\theta) + I(\theta)\} &\le \liminf_{N\to\infty} -\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] \\
&\le \lim_{N\to\infty} -\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] \\
&\le \limsup_{N\to\infty} -\frac{1}{N}\log E\big[\exp(-N F(\mu^N))\big] \\
&\le \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{F(\theta) + I(\theta)\},
\end{aligned}
\]
so that Equation 14 is satisfied.

In Section 9 we prove that the level sets of I are compact, so indeed I is a good rate function.
Remark 3.5. See Section 4 for a formal connection of this rate function to known rate functions in the
literature which are not of variational form, and Theorem 5.2 for an alternative variational form of the rate
function for a certain subclass of systems.
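Remark 3.6. We can see that the law of the control in the definition of I depends on θ and the invariant measure
π directly as follows: decomposing $\Theta \in \mathcal{V}$ into stochastic kernels as
$\Theta(d\phi\,dn\,dr) = \gamma(dr|n,\phi)\lambda(dn|\phi)\Theta_{\mathcal{X}}(d\phi) = \gamma(dr|n,\phi)\lambda(dn|\phi)\theta(d\phi)$,
we have by Remark 6.7 that $\lambda(dn|\phi) = \delta_{\pi(\cdot|\phi(s),\theta(s))\otimes ds}(dn)$, which we will abbreviate
as $\delta_\pi(dn)$. Here by θ(s) we mean $\theta\circ ev^{-1}(s)$. Then, using the fact that the control only appears linearly
in the dynamics of Equation 12,
\[
\begin{aligned}
E^\Theta\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]} |z|^2\,\rho(dz\,dt)\Big] &= \int_{\mathcal{W}}\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]} |z|^2\,\rho(\omega)(dz\,dt)\,\Theta(d\omega) \\
&= \int_{\mathcal{X}}\int_{\mathcal{Y}}\int_{\mathcal{Z}}\frac{1}{2}\int_0^1\int_{\mathbb{R}^m} |z|^2\,r_t(dz)\,dt\,\gamma(dr|n,\phi)\,\delta_\pi(dn)\,\theta(d\phi) \\
&= \int_{\mathcal{X}}\int_{\mathcal{Z}}\frac{1}{2}\int_0^1\int_{\mathbb{R}^m} |z|^2\,r_t(dz)\,dt\,\gamma(dr|\pi,\phi)\,\theta(d\phi).
\end{aligned}
\]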
In order to solve the variational problem which defines the rate function I, one must find a minimizing
stochastic kernel γ.
In proving Theorem 3.4, we will also have proved the following convergence result:
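Theorem 3.7. Let $ev : \mathcal{X} \to \mathbb{R}^d$ be the evaluation map at time t and $\{\mu^N\}_{N\in\mathbb{N}}$ be as defined by Equation
5. Under assumptions (A1)-(A8), $\mathcal{L}(\mu^N) \to \delta_{\mu^*}$ in $\mathcal{P}(\mathcal{P}(\mathcal{X}))$, where the deterministic $\mu^* \in \mathcal{P}(\mathcal{X})$ satisfies
$\mu^*\circ ev^{-1}(t) = \mathcal{L}(X_t)$, $t \in [0,1]$, for X solving the McKean-Vlasov SDE (6).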
Proof. This follows immediately from the proofs in Section 6 by taking uN ≡ 0 for all N ∈ N.
Remark 3.8. It is worth noting that via an integration-by-parts argument, the diffusion term in Equation 6
can also be written as
\[
B(x,\mu)B(x,\mu)^\top = \int_{\mathbb{T}^d} [I + \nabla_y\Phi(x,y,\mu)]\,A(x,y,\mu)\,[I + \nabla_y\Phi(x,y,\mu)]^\top\,\pi(dy|x,\mu). \tag{17}
\]
See [4] Chapter 3 Section 6.2.
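To make the integration-by-parts argument concrete, here is a sketch in dimension d = 1. It is an illustrative special case worked out for this remark, not the argument of [4]. Write $L = L^1_{x,\mu}$ and $\pi = \pi(\cdot|x,\mu)$, and suppress the dependence on $(x,\mu)$. By the adjoint equation (3), $\int_{\mathbb{T}} L h\,d\pi = 0$ for every smooth periodic h. Taking $h = \Phi^2$ and using the cell problem (4) gives
\[
0 = \int_{\mathbb{T}} L(\Phi^2)\,d\pi = \int_{\mathbb{T}} \big(2\Phi\,L\Phi + A\,|\Phi'|^2\big)\,d\pi = \int_{\mathbb{T}} \big(-2 f\,\Phi + A\,|\Phi'|^2\big)\,d\pi,
\]
so that $\int_{\mathbb{T}} f\,\Phi\,d\pi = \frac{1}{2}\int_{\mathbb{T}} A\,|\Phi'|^2\,d\pi$. Substituting this into the expression for $\frac{1}{2}BB^\top$ stated with Equation 6 yields
\[
\frac{1}{2}B^2 = \int_{\mathbb{T}} \Big(\Phi' A + f\,\Phi + \frac{1}{2}A\Big)\,d\pi = \frac{1}{2}\int_{\mathbb{T}} \big(1 + 2\Phi' + |\Phi'|^2\big)A\,d\pi = \frac{1}{2}\int_{\mathbb{T}} (1 + \Phi')^2 A\,d\pi,
\]
which is the d = 1 form of the identity in this remark.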
4. Formal Connection to Rate Functions in the Existing Literature

The goal of this section is to connect, at least at a formal level, the rate function obtained in Theorem
3.4 to the ones obtained in the literature for δ = 1, which is the result of [16], and for N = 1, which is the
result of [20]. As noted in Section 7.1 of [11], in the absence of multiple scales there is a heuristic connection
between the variational form of the rate function of weakly interacting particles and that from the classical
paper [16]. In order to connect this to our case, we also must compare the LDP for small noise diffusions
from [27] to the rate function for combined small noise and averaging from [20].
Assume d = m, f ≡ 0, σ = I, and the initial conditions of all the particles are 0. Then π is Lebesgue
measure and Φ = 0. The limiting system given by Equation 12 becomes
\[
d\bar X_t = \int_{\mathbb{T}^d}\Big[b(\bar X_t,y,\mathcal{L}(\bar X_t)) + \int_{\mathbb{R}^d} z\,\rho_t(dz)\Big]\,dy\,dt + dW_t.
\]
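The reduction can be checked directly from the definitions; the following is a short sketch of the steps, added for the reader's convenience. With f ≡ 0 and $A = \sigma\sigma^\top = I$, the operator (2) is $L^1_{x,\mu} = \frac{1}{2}\Delta_y$, which is self-adjoint on the torus, so (3) and (4) read
\[
\frac{1}{2}\Delta_y\tilde\pi = 0,\quad \int_{\mathbb{T}^d}\tilde\pi(y)\,dy = 1 \ \Longrightarrow\ \tilde\pi \equiv 1, \qquad\qquad \frac{1}{2}\Delta_y\Phi_l = 0,\quad \int_{\mathbb{T}^d}\Phi_l(y)\,dy = 0 \ \Longrightarrow\ \Phi \equiv 0,
\]
because periodic harmonic functions are constant. The centering condition (A8) holds trivially. Consequently $\nabla_y\Phi = \nabla_x\Phi = 0$ and $A : \nabla_x\nabla_y\Phi = 0$ in Equation 12, the diffusion relation reduces to
\[
\frac{1}{2}B B^\top = \int_{\mathbb{T}^d}\frac{1}{2}I\,m_t(dy) = \frac{1}{2}I,
\]
and, for $\Theta \in \mathcal{V}$, condition (V4) identifies $m_t(dy)$ with the invariant measure, here $dy$.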
Since in [20], the joint small noise and scale separation limit is being taken, we expect that we should be
able to formally connect our rate function, given by Equation 16, to $I_{DG}$, the rate function from [16], and
$I_{DS}$, the rate function from [20], by assuming that the analogue of $I_{DS}$ acting on $\mathcal{P}(\mathcal{X})$
instead of $\mathcal{X}$ is obtained from $I_{DS}$ through the same relationship that takes the small noise rate function from [27] to the
mean-field rate function from [16].
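The rate function in [16] for the empirical measure for strong solutions to
\[
dX_t^{i,N} = b(X_t^{i,N},\mu_t^N)\,dt + dW_t^i
\]
acts on $\theta \in C([0,1];\mathcal{P}(\mathbb{R}^d))$ by
\[
I_{DG}(\theta) = \frac{1}{2}\int_0^1 \sup_{g\in\mathcal{D}:\,\langle|\nabla g|^2,\theta(t)\rangle \ne 0} \frac{|\langle g, \dot\theta(t) - L^2(\theta(t))^*\theta(t)\rangle|^2}{\langle|\nabla g|^2,\theta(t)\rangle}\,dt
\]
if $\theta : [0,1] \to \mathcal{P}(\mathbb{R}^d)$ is absolutely continuous in the distribution sense (see Definition 4.1 in [16]) and
$I_{DG}(\theta) = +\infty$ otherwise. Here $\dot\theta(t)$ denotes the distributional derivative of θ at time t, $\mathcal{D}$ is the Schwartz
space of compactly supported smooth test functions $g : \mathbb{R}^d \to \mathbb{R}$, and $L^2(\theta(t))$ acts on $g \in \mathcal{D}$ according to
\[
L^2(\theta(t))[g](x) := b(x,\theta(t))\cdot\nabla g(x) + \frac{1}{2}\sum_{j,k=1}^d \frac{\partial^2 g}{\partial x_k\,\partial x_j}(x).
\]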
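Under the current assumptions, the rate function for $\{X^\epsilon\}_{\epsilon>0}$, the strong solutions to
\[
dX_t^\epsilon = b(X_t^\epsilon)\,dt + \sqrt{\epsilon}\,dW_t,
\]
of [27] acts on $\phi \in \mathcal{X}$ by
\[
I_{DE}(\phi) = \frac{1}{2}\int_0^1 |\dot\phi(t) - b(\phi(t))|^2\,dt
\]
for $\phi \in H^1([0,1];\mathbb{R}^d)$ and +∞ otherwise.
The rate function for $\{X^\delta\}_{\delta>0}$, the strong solutions to
\[
dX_t^\delta = b(X_t^\delta, X_t^\delta/\delta)\,dt + \sqrt{\epsilon}\,dW_t,
\]
given in [20], acts on $\phi \in \mathcal{X}$ by
\[
I_{DS}(\phi) = \frac{1}{2}\int_0^1 |\dot\phi(t) - r(\phi(t))|^2\,dt
\]
for φ absolutely continuous and +∞ otherwise. Here $\epsilon(\delta) \downarrow 0$ as δ ↓ 0 such that $\epsilon/\delta \to \infty$ as δ ↓ 0, and
$r(x) := \int_{\mathbb{T}^d} b(x,y)\,dy$.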
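Comparing $I_{DE}$ to $I_{DG}$ and $I_{DS}$, we see our rate function can be expected to be the same as $I_{DG}$ but
with $b(x,\theta(t))$ replaced by $r(x,\theta(t)) := \int_{\mathbb{T}^d} b(x,y,\theta(t))\,dy$. In other words, in the notation of [16], we expect
the rate function to have the representation
\[
S(\theta) = \frac{1}{2}\int_0^1 \sup_{g\in\mathcal{D}:\,\langle|\nabla g|^2,\theta(t)\rangle \ne 0} \frac{|\langle g, \dot\theta(t) - L^3(\theta(t))^*\theta(t)\rangle|^2}{\langle|\nabla g|^2,\theta(t)\rangle}\,dt
\]
if $\theta : [0,1] \to \mathcal{P}(\mathbb{R}^d)$ is absolutely continuous in the distribution sense and S(θ) = +∞ otherwise. Here
$L^3(\theta(t))$ acts on $g \in \mathcal{D}$ according to
\[
L^3(\theta(t))[g](x) := \int_{\mathbb{T}^d} b(x,y,\theta(t))\,dy\cdot\nabla g(x) + \frac{1}{2}\sum_{j,k=1}^d \frac{\partial^2 g}{\partial x_k\,\partial x_j}(x).
\]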
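Formally we can see the connection between our rate function given by Equation 16 and S by denoting by
$\mathcal{V}^1$ the class of $\Theta \in \mathcal{V}$ such that Θ-almost surely, $\rho_t(dz) = \delta_{\nabla v(t,\bar X_t)}(dz)$, for some $v : [0,1]\times\mathbb{R}^d \to \mathbb{R}$ such
that $v(t,\cdot) \in \mathcal{D}$, ∀t ∈ [0, 1]. Applying Itô’s formula, we get for $g \in \mathcal{D}$ and $\Theta \in \mathcal{V}^1$ with $\Theta_{\mathcal{X}} = \theta$ that
\[
g(\bar X_{t+h}) - g(\bar X_t) = \int_t^{t+h} L^3(\theta(s))[g](\bar X_s)\,ds + \int_t^{t+h} \nabla v(s,\bar X_s)\cdot\nabla g(\bar X_s)\,ds + \int_t^{t+h} \nabla g(\bar X_s)\cdot dW_s.
\]
Taking expectations, dividing by h, and sending h → 0, we get
\[
\langle g, \dot\theta(t) - L^3(\theta(t))^*\theta(t)\rangle = \langle\nabla v(t,\cdot)\cdot\nabla g, \theta(t)\rangle, \quad \forall t \in [0,1].
\]
Then
\[
\sup_{g\in\mathcal{D}:\,\langle|\nabla g|^2,\theta(t)\rangle \ne 0} \frac{|\langle g, \dot\theta(t) - L^3(\theta(t))^*\theta(t)\rangle|^2}{\langle|\nabla g|^2,\theta(t)\rangle} = \sup_{g\in\mathcal{D}:\,\langle|\nabla g|^2,\theta(t)\rangle \ne 0} \frac{|\langle\nabla v(t,\cdot)\cdot\nabla g, \theta(t)\rangle|^2}{\langle|\nabla g|^2,\theta(t)\rangle}.
\]
By Hölder’s inequality the expression on the right hand side is bounded above by $\langle|\nabla v(t,\cdot)|^2,\theta(t)\rangle$, with
equality reached when g(x) = v(t, x). So
\[
\sup_{g\in\mathcal{D}:\,\langle|\nabla g|^2,\theta(t)\rangle \ne 0} \frac{|\langle g, \dot\theta(t) - L^3(\theta(t))^*\theta(t)\rangle|^2}{\langle|\nabla g|^2,\theta(t)\rangle} = \langle|\nabla v(t,\cdot)|^2,\theta(t)\rangle.
\]
Since this holds for every $\Theta \in \mathcal{V}^1$, it holds in particular for the $\Theta \in \mathcal{V}^1$ along which the infimum in
\[
I_{\mathcal{V}^1}(\theta) := \inf_{\Theta\in\mathcal{V}^1:\,\Theta_{\mathcal{X}} = \theta} E^\Theta\Big[\frac{1}{2}\int_0^1\int_{\mathbb{R}^d} |z|^2\,\rho_t(dz)\,dt\Big]
\]
is attained, so $I_{\mathcal{V}^1} = S$. Thus we see if indeed the infimum over $\Theta \in \mathcal{V}$ in Equation 16 can be restricted to
$\mathcal{V}^1$, then our rate function is equivalent to S.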
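For the reader's convenience, the two computations behind this identification can be spelled out as follows. This is a sketch at the same formal level as the rest of the section, not a rigorous proof. First, by the Cauchy-Schwarz inequality applied pointwise and then in $L^2(\theta(t))$,
\[
|\langle\nabla v(t,\cdot)\cdot\nabla g, \theta(t)\rangle|^2 \le \langle|\nabla v(t,\cdot)|\,|\nabla g|, \theta(t)\rangle^2 \le \langle|\nabla v(t,\cdot)|^2,\theta(t)\rangle\,\langle|\nabla g|^2,\theta(t)\rangle,
\]
so dividing by $\langle|\nabla g|^2,\theta(t)\rangle$ gives the upper bound, and the choice $g = v(t,\cdot)$ turns both inequalities into equalities whenever $\langle|\nabla v(t,\cdot)|^2,\theta(t)\rangle \ne 0$. Second, for a feedback control $\rho_t(dz) = \delta_{\nabla v(t,\bar X_t)}(dz)$ with $\mathcal{L}(\bar X_t) = \theta(t)$, the cost is
\[
E^\Theta\Big[\frac{1}{2}\int_0^1\int_{\mathbb{R}^d} |z|^2\,\rho_t(dz)\,dt\Big] = \frac{1}{2}\int_0^1 E^\Theta\big[|\nabla v(t,\bar X_t)|^2\big]\,dt = \frac{1}{2}\int_0^1 \langle|\nabla v(t,\cdot)|^2,\theta(t)\rangle\,dt,
\]
which is exactly the value of S(θ) obtained by inserting the supremum computed above into the definition of S.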
5. An example with a bistable confining potential

Even though we have chosen for presentation purposes to prove Theorem 3.4 under strong boundedness
assumptions on the coefficients, the extended results of Section 10 show that Theorem 3.4 holds true under
greater generality. In particular, if one can appropriately control the behavior of the solution to (4) then
the results can be extended. As we explain in detail in Section 10, in the absence of such a general theory,
one in principle would need to confirm this in a case by case situation. In this example, we do so for the
popular interacting particle system considered in the classical work [15] where the drift is a bistable, confining
potential.
Consider the 1-D system of weakly interacting particles from Dawson’s classical paper [15], where the
interacting potential is modified from $V(x) = x^4/4 - x^2/2$ to $V^\delta(x) = x^4/4 - x^2/2 - \frac{\epsilon}{2\pi}\cos(2\pi x/\delta)$ for some
ǫ > 0. Then the system of controlled SDEs is given by
\[
d\bar X_t^{i,N} = \Big[-(\bar X_t^{i,N})^3 + \bar X_t^{i,N} + \sigma u_i^N(t) - \kappa(\bar X_t^{i,N} - \nu_t^N) - \frac{\epsilon}{\delta}\sin(2\pi\bar X_t^{i,N}/\delta)\Big]dt + \sigma\,dW_t^i, \tag{18}
\]
where σ, κ > 0, $\nu_t^N := \langle\cdot,\bar\mu_t^N\rangle = \frac{1}{N}\sum_{i=1}^N \bar X_t^{i,N}$, i ∈ {1, ..., N}, t ∈ [0, 1], and $W_t^i$ are independent 1-D
Brownian motions.
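To get a feel for the dynamics of (18), the following is a minimal Euler-Maruyama sketch of the uncontrolled system (all controls set to zero). It is an illustration only: the function names `simulate_particles` and `empirical_average`, the parameter values, the common initial condition and the step size are assumptions made for this example and are not taken from the paper. Since the fast variable $\bar X/\delta$ evolves on the time scale $\delta^2$, the time step is chosen small compared with $\delta^2$.

```python
import math
import random


def simulate_particles(n_particles=20, delta=0.2, eps=1.0, kappa=1.0,
                       sigma=1.0, n_steps=2000, x0=0.0, seed=0):
    """Euler-Maruyama sketch of system (18) with zero control on [0, 1].

    Returns the particle positions at time 1. All default values are
    illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps            # should be small compared with delta**2
    sqrt_dt = math.sqrt(dt)
    x = [x0] * n_particles
    for _ in range(n_steps):
        mean = sum(x) / n_particles   # nu_t^N, the empirical mean
        new_x = []
        for xi in x:
            # Drift of (18) with u = 0: confining potential, mean-field
            # attraction, and the fast oscillating term of order 1/delta.
            drift = (-xi ** 3 + xi
                     - kappa * (xi - mean)
                     - (eps / delta) * math.sin(2.0 * math.pi * xi / delta))
            noise = sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            new_x.append(xi + drift * dt + noise)
        x = new_x
    return x


def empirical_average(positions, g):
    """Integrate a test function g against the empirical measure."""
    return sum(g(p) for p in positions) / len(positions)


if __name__ == "__main__":
    final_positions = simulate_particles()
    print("empirical mean at t = 1:",
          empirical_average(final_positions, lambda p: p))
```

Calling `empirical_average(final_positions, g)` for several test functions `g` gives Monte Carlo approximations of $\langle g, \bar\mu_1^N\rangle$; an explicit scheme of this kind is only a rough tool here, because the cubic drift and the $1/\delta$ term both force small time steps.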
In order to confirm that the result of Theorem 3.4 holds in this case, we need to verify that Assumptions
(A1’)-(A12’) of Section 10 hold (see Theorem 10.2). After doing so, we discuss how the typical events and
large deviations principle look here, and provide an alternative variational form for the rate function.
5.1. Verification that the necessary assumptions hold. Let us start by assuming that
(1) (A1’) in Section 10 holds.
(2) $\sup_{N\in\mathbb{N}} \frac{1}{N}\sum_{i=1}^N |\bar X_0^{i,N}|^4 < D$, D ∈ (0, ∞).
To see that the uncontrolled equation has a unique strong solution, one can write the system as a 2N
dimensional SDE in the same way as in the proof of Proposition A.1 and use a standard truncation argument
for SDEs with one-sided Lipschitz drift and the fact that the solution is nonexplosive (see [15] p.37 and [47]).
In fact for this particular case, Theorem 1 in [61] directly applies for each N ∈ N. Thus (A2’) holds.
To see (A3’) holds, we can apply standard PDE theory as in the bounded case. Further, since we are in
the case d = m = 1, we can solve for the density π̃ of π explicitly. We get that π̃(y) is the solution to
′ Z 1
1 2 ′′
− −ǫ sin(2πy)π̃(y) + σ π̃ (y) = 0, π̃ 1 - periodic, π̃(y)dy = 1.
2 0
Solving, we get
1
1 ǫ ǫ
Z
π̃(y) = exp 2 cos(2πy) , Z= exp 2 cos(2πx) dx.
Z σ π 0 σ π
The cell problem takes the form
\[
-\epsilon \sin(2\pi y)\Phi'(y) + \frac{1}{2}\sigma^2\Phi''(y) = \epsilon\sin(2\pi y), \qquad \Phi \text{ 1-periodic}, \qquad \int_0^1 \Phi(y)\tilde\pi(y)\,dy = 0,
\]
and its unique solution is

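\[
\Phi(y) = \frac{1}{\hat Z}\int_0^y \exp\Big( -\frac{\epsilon}{\sigma^2\pi}\cos(2\pi x) \Big)dx - y, \qquad \hat Z = \int_0^1 \exp\Big( -\frac{\epsilon}{\sigma^2\pi}\cos(2\pi x) \Big)dx.
\]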
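As a sanity check on the closed-form expressions for the invariant density and the derivative of the cell-problem solution, the sketch below evaluates them numerically. The parameter values and all function names are illustrative assumptions; the second derivative of Φ is computed by hand from the formula for Φ′, and integrals use a simple midpoint rule.

```python
import math

def midpoint(fn, a, b, n=2000):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

eps, sigma = 0.7, 1.3                 # illustrative parameter values
a = eps / (sigma**2 * math.pi)        # the exponent eps / (sigma^2 * pi)

Z = midpoint(lambda x: math.exp(a * math.cos(2 * math.pi * x)), 0.0, 1.0)
Z_hat = midpoint(lambda x: math.exp(-a * math.cos(2 * math.pi * x)), 0.0, 1.0)

def pi_density(y):
    """Invariant density: exp(a cos(2 pi y)) / Z."""
    return math.exp(a * math.cos(2 * math.pi * y)) / Z

def phi_prime(y):
    """Phi'(y) = exp(-a cos(2 pi y)) / Z_hat - 1."""
    return math.exp(-a * math.cos(2 * math.pi * y)) / Z_hat - 1.0

def phi_second(y):
    """Phi''(y), obtained by differentiating phi_prime by hand."""
    return 2 * math.pi * a * math.sin(2 * math.pi * y) * (phi_prime(y) + 1.0)

def cell_residual(y):
    """Residual of f Phi' + (sigma^2 / 2) Phi'' = -f with f = -eps sin(2 pi y)."""
    f = -eps * math.sin(2 * math.pi * y)
    return f * phi_prime(y) + 0.5 * sigma**2 * phi_second(y) + f

grid = [k / 200 for k in range(200)]
max_residual = max(abs(cell_residual(y)) for y in grid)
mass = midpoint(pi_density, 0.0, 1.0)        # should be close to 1
mean_slope = midpoint(phi_prime, 0.0, 1.0)   # close to 0 means Phi(1) = Phi(0)
print(max_residual, mass, mean_slope)
```

A vanishing mean of Φ′ over one period is exactly the periodicity of Φ, which is what makes the uniform bound on Φ and its derivatives available.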
So, keeping in mind that Φ is 1-periodic and smooth, we have

\[
\sup_{y \in \mathbb{R}} \big( |\Phi(y)| + |\Phi'(y)| + |\Phi''(y)| \big) \le K
\]
for some K > 0. Thus, (A3’) and (A4’) hold, and we obtain some a priori bounds that will help us in proving
(A9’). Since Φ is independent of µ, (A5’) and (A11’) hold trivially, and by our explicit representation (A6’)
holds. Since the only coefficient which depends on a measure is through a first order interaction through
µ 7→ h·, µi, and convergence in W2 implies convergence of first moments, we also have that (A10’) holds.
The proof that (A7’) and (A8’) hold is given by Proposition 5.1, where we show that in this particular
example we can take p = 4 and p2 = 6.

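Proposition 5.1. If $\sup_{N \in \mathbb{N}} \mathbb{E}\big[ \frac{1}{N}\sum_{i=1}^N \int_0^1 |u_i^N(t)|^2 dt \big] < B$, $B \in (0, \infty)$, then
\[
\sup_{N \in \mathbb{N}} \mathbb{E}\Big[ \sup_{t \in [0,1]} \frac{1}{N}\sum_{i=1}^N |\bar X_t^{i,N}|^4 \Big] + \sup_{N \in \mathbb{N}} \mathbb{E}\Big[ \frac{1}{N}\sum_{i=1}^N \int_0^1 |\bar X_t^{i,N}|^6 dt \Big] \le C(\kappa, \sigma, B, D, \epsilon).
\]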
For presentation purposes we offer the proof of Proposition 5.1 in Appendix D. We next confirm that
(A9’) is satisfied. We identify that
b(x, y, µ) = −x3 + x − κ(x − h·, µi),
f (x, y, µ) = −ǫ sin(2πy), σ(x, y, µ) = σ > 0,
Φ(x, y, µ) = Φ(y) is bounded with bounded derivatives,
Thus, |b(x, y, µ)| ≤ C(1 + |x|3 + µ(| · |1 )) and all the other terms are bounded uniformly in x and µ. This
gives via the triangle and Cauchy-Schwarz inequalities that all the conditions in (A9’) are satisfied.
Lastly, we check that weak-sense uniqueness as given in Definition 2.2 holds, so that (A12’) is satisfied.
For this, we first write down the limiting controlled system:
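\[
d\tilde X_t = \int_{\mathbb{T}} \big[ \Phi'(y) + 1 \big]\Big( -\tilde X_t^3 + (1-\kappa)\tilde X_t + \kappa\int_{\mathbb{R}} x\,\mathcal{L}(\tilde X_t)(dx) + \sigma\int_{\mathbb{R}} z\,\rho_t(dz) \Big) m_t(dy)\,dt
+ \sigma\sqrt{\int_{\mathbb{T}} \big[ 1 + \Phi'(y) \big]^2 m_t(dy)}\;dW_t, \tag{19}
\]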
T
where here we used Remark 3.8 to simplify the diffusion term. Since for all m ∈ Y and all t ∈ [0, 1],
$c_1(t) = \int_{\mathbb{T}} [\Phi'(y) + 1]\,m_t(dy)$ and $c_2(t) = \int_{\mathbb{T}} [1 + \Phi'(y)]^2\,m_t(dy)$ are bounded and $c_1(t) > 0$, we quickly see
that, due to the assumed $L^2$ bound on the control in the conditions of Definition 2.2, the proof goes
through in the same way as in Appendix A of [15].
5.2. Form of the limiting theorems and equivalent formulations. Now that we have confirmed that
the necessary assumptions indeed hold for the system (18), let us discuss what the law of large numbers and
large deviations principle look like.
As already discussed, the limiting controlled system takes the form (19). In particular, noting that
\[
\int_{\mathbb{T}} \big[ \Phi'(y) + 1 \big]^2 \pi(dy) = \int_{\mathbb{T}} \big[ \Phi'(y) + 1 \big] \pi(dy) = \frac{1}{Z\hat Z}, \tag{20}
\]
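we get that Theorem 3.7 (equivalently Theorem 10.3) holds and $\mathcal{L}(\mu^N) \to \delta_{\mu^*}$ in P(P(X)), where the deterministic
$\mu^* \in \mathcal{P}(\mathcal{X})$ satisfies $\mu^* \circ ev^{-1}(t) = \mathcal{L}(X_t)$, t ∈ [0, 1], for X solving the McKean-Vlasov SDE:
\[
dX_t = \frac{1}{Z\hat Z}\Big[ -X_t^3 + (1-\kappa)X_t + \kappa\int_{\mathbb{R}} x\,\mathcal{L}(X_t)(dx) \Big]dt + \sigma\sqrt{\frac{1}{Z\hat Z}}\;dW_t.
\]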
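The identity (20), which produces the common factor 1/(Z Ẑ) in the drift and diffusion of the McKean-Vlasov SDE above, can be checked numerically. The following is a small sketch under assumed parameter values; the helper names are hypothetical and the integrals use a midpoint rule.

```python
import math

def midpoint(fn, a, b, n=2000):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

eps, sigma = 0.7, 1.3                 # illustrative parameter values
a = eps / (sigma**2 * math.pi)

def w_plus(y):
    return math.exp(a * math.cos(2 * math.pi * y))    # unnormalized invariant density

def w_minus(y):
    return math.exp(-a * math.cos(2 * math.pi * y))   # unnormalized Phi'(y) + 1

Z = midpoint(w_plus, 0.0, 1.0)
Z_hat = midpoint(w_minus, 0.0, 1.0)

# Integrals of (Phi' + 1) and (Phi' + 1)^2 against the invariant measure.
first_moment = midpoint(lambda y: (w_minus(y) / Z_hat) * (w_plus(y) / Z), 0.0, 1.0)
second_moment = midpoint(lambda y: (w_minus(y) / Z_hat) ** 2 * (w_plus(y) / Z), 0.0, 1.0)
effective = 1.0 / (Z * Z_hat)

print(first_moment, second_moment, effective)
```

By the Cauchy-Schwarz inequality Z Ẑ ≥ 1, so in this example the effective drift and squared diffusion coefficients are no larger than those of the system without the oscillating term.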
Analogously, Theorem 3.4 (equivalently Theorem 10.2) shows that the Large Deviations Principle holds
with rate function given by (50) where Θ corresponds to a weak solution of (19) according to Definition 10.1.
Unlike the system offered in Section 4, the nontrivial interaction of this system with π makes it so we
cannot make a direct formal connection to the rate function of [16] by the same means. However, we are able
to prove an alternative variational form of the rate function, as stated in the following Theorem 5.2.
In preparation for stating this result, let us set Ŵ = X × Ŷ where X = C([0, 1]; Rd ) and
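\[
\hat{\mathcal{Y}} = \Big\{ \hat r \in \mathcal{P}(\mathbb{T}^d \times \mathbb{R}^m \times [0,1]) : \hat r(\mathbb{T}^d \times \mathbb{R}^m \times [0,t]) = t,\ \forall t \in [0,1], \text{ and } \int_{\mathbb{T}^d \times \mathbb{R}^m \times [0,1]} |z|\,\hat r(dy \times dz \times dt) < \infty \Big\}.
\]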
We shall write that Θ̂ ∈ P(Ŵ) is in V̂ if

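(V̂1) Θ̂ corresponds to a weak solution X̂ of (21),
\[
d\hat X_t = \int_{\mathbb{T} \times \mathbb{R}} \big[ \Phi'(y) + 1 \big]\Big( -\hat X_t^3 + (1-\kappa)\hat X_t + \kappa\int_{\mathbb{R}} x\,\mathcal{L}(\hat X_t)(dx) + \sigma z \Big)\hat\rho_t(dy\,dz)\,dt
+ \sigma\sqrt{\int_{\mathbb{T} \times \mathbb{R}} \big[ 1 + \Phi'(y) \big]^2 \hat\rho_t(dy\,dz)}\;dW_t, \tag{21}
\]
with (X̂, ρ̂) the coordinate process on Ŵ defined as in Equation 13.
(V̂2) $\mathbb{E}^{\hat\Theta}\Big[ \int_{\mathbb{T}^d \times \mathbb{R}^m \times [0,1]} |z|^2 \hat\rho(dy\,dz\,dt) + \int_0^1 |\hat X_t|^4 dt \Big] < \infty$.
(V̂3) $\hat\nu_{\hat\Theta}(0) = \nu_0$, where $\hat\nu_{\hat\Theta}$ is as in Equation 11 but acting on P(Ŵ).
(V̂4) $\forall t \in [0, 1]$, $g \in C_b^2(\mathbb{T}^d)$, $\mathbb{E}^{\hat\Theta}\Big[ \int_{\mathbb{T}^d \times \mathbb{R}^m \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_{\hat\Theta}(s)} g(y)\,\hat\rho(dy\,dz\,ds) \Big] = 0$.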
Now, we are ready to state Theorem 5.2.

Theorem 5.2. The rate function given by Theorem 10.2 (equivalently in Theorem 3.4) is equivalent to:
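\[
\hat I(\theta) = \inf_{\hat\Theta \in \hat{\mathcal{V}} :\, \hat\Theta_{\mathcal{X}} = \theta} \frac{1}{2}\,\mathbb{E}^{\hat\Theta}\Big[ \int_{\mathbb{T}^d \times \mathbb{R}^m \times [0,1]} |z|^2 \hat\rho(dy\,dz\,dt) \Big] \tag{22}
\]
where inf(∅) := +∞. In particular, we have that $\hat I(\theta) = I(\theta)$ for all θ ∈ P(X).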
The proof of Theorem 5.2 is deferred to Appendix E. Note that the proof of this theorem extends to the
more general case where σ(x, y, µ) = σ(y, µ), f(x, y, µ) = f(y, µ), m = d, $[\nabla_y\Phi(y, \mu) + I]\sigma(y, \mu)$ is uniformly
bounded in y and µ, and $\int_{\mathbb{T}^d} [\nabla_y\Phi(y, \mu) + I]\sigma(y, \mu)\,\pi(dy|\mu)$ is invertible with uniformly bounded inverse for
all µ, where as with Theorem 10.2, the exponent on $|\hat X_t|$ in (V̂2) must be adapted to p in (A7’). For the
sake of brevity, we leave the proof of this extension to the interested reader.
Lastly, even though we do not show this here, we do mention that along the same lines one is able to
check that the required assumptions to satisfy Theorem 10.2 also hold for slightly more complicated cases
where the empirical measure appears linearly in the coefficient which blows up in the drift. For example,
one can consider a particle system of the form

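\[
d\bar X_t^{i,N} = \Big[ -(\bar X_t^{i,N})^3 + \bar X_t^{i,N} + \sigma u_i^N(t) - \kappa(\bar X_t^{i,N} - \nu_t^N) - \epsilon\cos(2\pi\bar X_t^{i,N}/\delta)
+ \frac{\epsilon}{\delta}\big( \bar X_t^{i,N} + \nu_t^N \big)\sin(2\pi\bar X_t^{i,N}/\delta) \Big]dt + \sigma\,dW_t^i.
\]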
We leave the details of this example to the interested reader.
6. Limiting Behavior of the Controlled Empirical Measure

Our object of study in this section is the family of occupation measures {QN }N ∈N ∈ P(W) defined by:
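\[
Q^N_\omega(A \times B \times C) = \frac{1}{N}\sum_{i=1}^N \delta_{\bar X^{i,N}(\omega)}(A)\,\delta_{m^{i,N}(\omega)}(B)\,\delta_{\rho^{i,N}(\omega)}(C) \tag{23}
\]
for A × B × C ∈ B(W), ω ∈ Ω, $\rho^{i,N}$ the ordinary relaxed control corresponding to $u^{i,N}$ via Equation 9, $\bar X^{i,N}$
as in (7), and $m^{i,N}(\omega) \in \mathcal{Y}$ given by
\[
m^{i,N}(\omega)(I \times D) := \int_I \delta_{(\bar X_t^{i,N}(\omega)/\delta) \bmod 1}(D)\,dt \tag{24}
\]
for I ∈ B([0, 1]) and $D \in \mathcal{B}(\mathbb{T}^d)$. We use the convention that for s > 1, $u_i^N(s) = 0$, ∀i, N ∈ N.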
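To illustrate definition (24) in dimension one, the sketch below approximates m(I × D) for a single deterministic path by a midpoint rule in time: it measures how long the fast variable (x_t/δ) mod 1 spends in D while t ranges over I. The function name `fast_occupation`, the test path x_t = t, and the grid size are assumptions made only for this example.

```python
def fast_occupation(path, delta, interval, cell, n=10000):
    """Approximate m(I x D) from (24) for a scalar path on [0, 1].

    path: function t -> x_t; interval: (lo, hi) for I; cell: (lo, hi) for D in [0, 1).
    """
    lo_t, hi_t = interval
    lo_y, hi_y = cell
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h                      # midpoint of the k-th time cell
        if lo_t <= t < hi_t:
            y = (path(t) / delta) % 1.0        # fast variable wrapped to the torus
            if lo_y <= y < hi_y:
                total += h
    return total

# The path x_t = t with delta = 0.1 winds around the torus ten times on [0, 1].
half_cell = fast_occupation(lambda t: t, 0.1, (0.0, 1.0), (0.0, 0.5))
time_marginal = fast_occupation(lambda t: t, 0.1, (0.2, 0.7), (0.0, 1.0))
print(half_cell, time_marginal)
```

Taking D to be the whole torus returns the length of I, which is the statement that the time marginal of m is Lebesgue measure.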
Remark 6.1. For a discussion of another possible choice of occupation measures, see Remark E.1 after the
proof of Theorem 5.2 in Appendix E.
Assume that there exists B > 0 such that P almost-surely,
\[
\sup_{N \in \mathbb{N}} \frac{1}{N}\sum_{i=1}^N \int_0^1 |u_i^N(t)|^2 dt \le B. \tag{25}
\]
We will prove the following two propositions:

Proposition 6.2. Under assumption (25), the sequence {L(QN )}N ∈N is precompact in P(P(W)).
Proof. See Subsection 6.1.
Proposition 6.3. Under assumption (25), for Q such that $\mathcal{L}(Q^N) \to \mathcal{L}(Q)$ in P(P(W)), Q ∈ V almost
surely, where the class of measures V is described in Definition 3.1.
Proof. See Subsection 6.2.
6.1. Tightness of the Occupation Measures. We prove tightness of the occupation measures $\{Q^N\}_{N \in \mathbb{N}}$
defined in Equation 23 as P(W)-valued random variables by proving tightness of each of the marginals, $Q^N_{\mathcal{Z}}$,
$Q^N_{\mathcal{Y}}$, and $Q^N_{\mathcal{X}} = \bar\mu^N$ (as defined by Equation 8).
6.1.1. Tightness of $Q^N_{\mathcal{Z}}$. This will follow analogously to p.88-89 of [11]. We recall here the arguments for the
reader's convenience. Observe that
\[
g(r) := \int_{\mathbb{R}^m \times [0,1]} |z|^2\,r(dz \times dt)
\]
is a tightness function on R1 . Namely, it is bounded from below and has relatively compact level sets.
Indeed, boundedness from below is obvious and in order to confirm the second property, for c ∈ (0, ∞) let
us set Rc := {r ∈ R1 : g(r) ≤ c}. Chebyshev’s inequality for M > 0 gives that
\[
\sup_{r \in R_c} r\big( \{ z \in \mathbb{R}^m : |z| > M \} \times [0, 1] \big) \le \frac{c}{M^2}. \tag{26}
\]
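As a concrete instance of the bound (26), take r(dz × dt) = δ_{u(t)}(dz) dt for an ordinary scalar control u, so that g(r) is the integral of |u(t)|² over [0, 1] and the left-hand side of (26) is the Lebesgue measure of the set where |u(t)| > M. The sketch below checks the inequality on a time grid; the control u(t) = 3t, the level M = 2, and the function name are illustrative assumptions.

```python
def tail_mass_and_bound(control, level, n=10000):
    """Return (Lebesgue measure of {|u| > level}, energy / level^2) on a midpoint grid."""
    h = 1.0 / n
    energy = 0.0   # approximates g(r) = integral of |u(t)|^2 over [0, 1]
    tail = 0.0     # approximates r({|z| > level} x [0, 1])
    for k in range(n):
        u = control((k + 0.5) * h)
        energy += h * u * u
        if abs(u) > level:
            tail += h
    return tail, energy / level**2

tail, bound = tail_mass_and_bound(lambda t: 3.0 * t, 2.0)
print(tail, bound)
```

For this control the tail mass is about 1/3 while the Chebyshev bound is about 3/4, so the inequality holds with room to spare.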
Therefore, Rc is tight and thus relatively compact as a subset of R. Let {rn }n∈N ⊂ Rc be such that
{rn }n∈N converges weakly to r∗ ∈ R. We need to show that r∗ has finite first moment and that first
moments of {rn }n∈N converge to the first moment of r∗ . By Jensen’s inequality and Fatou’s lemma (Thm
A.3.12 in [21]),
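\[
\sqrt{c} \ge \liminf_{n \to \infty} \sqrt{g(r_n)} \ge \liminf_{n \to \infty} \int_{\mathbb{R}^m \times [0,1]} |z|\,r_n(dz \times dt) \ge \int_{\mathbb{R}^m \times [0,1]} |z|\,r^*(dz \times dt).
\]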
Now letting M > 0, by Equation 26 and Hölder’s inequality, we have for all r ∈ Rc ,
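\[
\int_{\mathbb{R}^m \times [0,1]} 1_{\{z \in \mathbb{R}^m : |z| > M\}} |z|\,r(dz \times dt) \le \sqrt{ \int_{\mathbb{R}^m \times [0,1]} |z|^2\,r(dz \times dt) \int_{\mathbb{R}^m \times [0,1]} 1_{\{z \in \mathbb{R}^m : |z| > M\}}\,r(dz \times dt) }
\le \sqrt{c\,\frac{c}{M^2}} = \frac{c}{M}.
\]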
So by reverse Fatou’s Lemma we get
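\[
\limsup_{n \to \infty} \int_{\mathbb{R}^m \times [0,1]} |z|\,r_n(dz \times dt) \le \frac{c}{M} + \int_{\mathbb{R}^m \times [0,1]} 1_{\{z \in \mathbb{R}^m : |z| \le M\}} |z|\,r^*(dz \times dt)
\le \frac{c}{M} + \int_{\mathbb{R}^m \times [0,1]} |z|\,r^*(dz \times dt).
\]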
Given that M may be taken to be arbitrarily large, we have

\[
\lim_{n \to \infty} \int_{\mathbb{R}^m \times [0,1]} |z|\,r_n(dz \times dt) = \int_{\mathbb{R}^m \times [0,1]} |z|\,r^*(dz \times dt).
\]
Thus we have g is a tightness function on R1 . Now define G : P(Z) → [0, ∞] by

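\[
G(\Theta) := \int_{\mathcal{Z}} g(r)\,\Theta(dr).
\]
Then G is a tightness function on P(Z) (see Theorem A.3.17 in [21]). Thus in order to prove tightness of
$\{Q^N_{\mathcal{Z}}\}_{N \in \mathbb{N}}$, it is enough to show that
\[
\sup_{N \in \mathbb{N}} \mathbb{E}[G(Q^N_{\mathcal{Z}})] < \infty.
\]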
But this follows immediately from assumption (25), since by definition of G and QN ,
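\begin{align*}
\mathbb{E}[G(Q^N_{\mathcal{Z}})] &= \mathbb{E}\Big[ \int_{\mathcal{Z}} \int_{\mathbb{R}^m \times [0,1]} |z|^2\,r(dz\,dt)\,Q^N_{\mathcal{Z}}(dr) \Big] \\
&= \frac{1}{N}\sum_{i=1}^N \mathbb{E}\Big[ \int_{\mathbb{R}^m \times [0,1]} |z|^2\,\rho^{i,N}(dz\,dt) \Big] \\
&= \frac{1}{N}\sum_{i=1}^N \mathbb{E}\Big[ \int_0^1 |u_i^N(t)|^2 dt \Big] \\
&< \infty.
\end{align*}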
6.1.2. Tightness of $Q^N_{\mathcal{Y}}$. First we note that $\mathbb{T}^d \times [0, 1]$ is compact, so $M^1(\mathbb{T}^d \times [0, 1])$, where $M^1(E)$ denotes
the set of subprobability measures on E, is compact by Corollary A.3.16 in [21] (this also works for $M^\alpha(E)$,
positive Borel measures µ on E with µ(E) ≤ α, for any α > 0). Then by the proof of Lemma 3.3.1 in [21],
$R_1(\mathbb{T}^d) \subset M^1(\mathbb{T}^d \times [0, 1])$ is closed in the topology of weak convergence (if a weakly converging sequence of
measures on $\mathbb{T}^d \times [0, 1]$ has the property that for each member of the sequence, its second marginal is Lebesgue
measure, then this will also be true of the limiting measure), and hence $\mathcal{Y} = R_1(\mathbb{T}^d)$ is compact. Then P(Y)
is compact, and hence P(P(Y)) is compact. Since $\{\mathcal{L}(Q^N_{\mathcal{Y}})\}_{N \in \mathbb{N}} \subset \mathcal{P}(\mathcal{P}(\mathcal{Y}))$, and on a metrizable space
compactness implies sequential compactness, we immediately get that $Q^N_{\mathcal{Y}}$ is tight as a P(Y)-valued random variable.
6.1.3. Tightness of $Q^N_{\mathcal{X}}$. Let $E = \mathcal{P}(\mathbb{R}^d)$, the space of probability measures on $\mathbb{R}^d$. We will prove tightness
of $\{Q^N_{\mathcal{X}}\}_{N \in \mathbb{N}} = \{\bar\mu^N\}_{N \in \mathbb{N}}$ as $D_E[0, 1]$-valued random variables, where $D_E[0, 1]$ is the space of maps from
[0, 1] to E which are right continuous and have left-hand limits. Noting that µ̄N (ω) ∈ P(X ) for each
ω ∈ Ω, N ∈ N, we can treat {µ̄N }N ∈N as a sequence of C([0, 1]; P(Rd ))-valued random variables, so that
indeed {µ̄N (ω)}N ∈N ⊂ DE [0, 1], ∀ω ∈ Ω, N ∈ N. Proving tightness of {µ̄N }N ∈N as DE [0, 1]-valued random
variables will imply tightness as C([0, 1]; P(Rd ))-valued random variables by Problem 25 on p.153 of [22].
We will use Theorem 3.8.6 in [22] together with Theorem 3.1 in [39] to show the tightness of {µ̄N }N ∈N as
DE [0, 1] random variables.
Note that by Proposition 3.4.4. and Theorem 3.4.5. in [22], the class of functions
G := {g ∈ Cb (E) : g(µ) = hh, µi, h ∈ Cc∞ (Rd )}
separates points in E and is closed under addition. Thus, by Theorem 3.1 in [39], if we show Lemma 6.4
holds, then to show tightness of {µ̄N }N ∈N , it is enough to show tightness of {g ◦ µ̄N }N ∈N as DR [0, 1]-valued
random variables for each g ∈ G.
Lemma 6.4. For each η > 0, there exists $K^\eta \subset\subset E$ such that
\[
P\big( \bar\mu^N \notin \{ \mu \in D_E[0, 1] : \mu(t) \in K^\eta,\ \forall t \in [0, 1] \} \big) < \eta.
\]
Here by A ⊂⊂ B we mean A is a compact subset of B. Appealing to Theorem 8.6 and Remark 8.7 in
[22], it will thus be sufficient to show:
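Lemma 6.5. For q(x, y) := 1 ∧ |x − y| and 0 ≤ t ≤ t + τ ≤ 1, 0 ≤ τ ≤ γ, there exists a family
$\{\xi^N(\gamma)\}_{\gamma \in (0,1), N \in \mathbb{N}}$ of nonnegative random variables and r > 0 such that
\[
\mathbb{E}\big[ q^r\big( g(\bar\mu^N(t + \tau)), g(\bar\mu^N(t)) \big) \,\big|\, \mathcal{F}_t^N \big] \le \mathbb{E}\big[ \xi^N(\gamma) \,\big|\, \mathcal{F}_t^N \big], \qquad \forall g \in G
\]
and
\[
\lim_{\gamma \to 0} \limsup_{N \to \infty} \mathbb{E}[\xi^N(\gamma)] = 0
\]
where here $\mathcal{F}_t^N := \mathcal{F}_t^{\bar\mu^N}$ ($= \mathcal{F}_t$ since $\bar X_t^{i,N}$ are strong solutions).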
Note in particular that the following compact containment condition required by Theorem 3.8.6 in [22] as
provided in the following corollary follows directly from Lemma 6.4 and its proof.
Corollary 6.6. For each η > 0 and t ∈ [0, 1], there exists $K_t^\eta \subset\subset E$ such that:
\[
P(\bar\mu_t^N \notin K_t^\eta) < \eta.
\]
We proceed with the proof of Lemma 6.4:
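Proof of Lemma 6.4. Define $K_L \subset\subset \mathbb{R}^d$ by $K_L := \{x \in \mathbb{R}^d : |x| \le L\}$. Then
\begin{align*}
\mathbb{E}\Big[ \sup_{t \in [0,1]} \bar\mu_t^N(\mathbb{R}^d \setminus K_L) \Big] &\le \frac{1}{N}\sum_{i=1}^N P\Big( \sup_{t \in [0,1]} |\bar X_t^{i,N}| > L \Big) \\
&\le \frac{1}{N}\sum_{i=1}^N \frac{\mathbb{E}[\sup_{t \in [0,1]} |\bar X_t^{i,N}|^2]}{L^2} && \text{by Chebyshev's inequality} \\
&\le \frac{C}{L^2} && \text{by Proposition A.2.}
\end{align*}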
Now define
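\[
K_L^* := \Big\{ \nu \in \mathcal{P}(\mathbb{R}^d) : \nu(\mathbb{R}^d \setminus K_{(L+j)^2}) \le \frac{1}{\sqrt{L + j}},\ \forall j \in \mathbb{N} \Big\}.
\]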
By Prokhorov’s Theorem, KL∗ ⊂⊂ P(Rd ) for each L. Now we see that
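\begin{align*}
P\big( \bar\mu^N \notin \{ \mu \in D_E[0, 1] : \mu(t) \in K_L^*,\ \forall t \in [0, 1] \} \big)
&= P\Big( \exists j \in \mathbb{N},\ t \in [0, 1] : \bar\mu_t^N(\mathbb{R}^d \setminus K_{(L+j)^2}) > \frac{1}{\sqrt{L + j}} \Big) \\
&\le \sum_{j=1}^\infty P\Big( \sup_{t \in [0,1]} \bar\mu_t^N(\mathbb{R}^d \setminus K_{(L+j)^2}) > \frac{1}{\sqrt{L + j}} \Big) \\
&\le \sum_{j=1}^\infty \mathbb{E}\Big[ \sup_{t \in [0,1]} \bar\mu_t^N(\mathbb{R}^d \setminus K_{(L+j)^2}) \Big]\sqrt{L + j} && \text{by Chebyshev's inequality} \\
&\le C\sum_{j=1}^\infty (L + j)^{-7/2}.
\end{align*}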
This bound approaches 0 as L → ∞, so the lemma is proved.
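The decay of the final series can be made quantitative: since x ↦ x^{-7/2} is decreasing, the sum over j ≥ 1 of (L + j)^{-7/2} is at most the integral of x^{-7/2} over [L, ∞), which equals (2/5) L^{-5/2}. The short sketch below compares truncated sums with this integral bound; the truncation level and function names are assumptions for illustration.

```python
def tail_sum(L, terms=100000):
    """Truncated version of the series sum_{j >= 1} (L + j)^(-7/2)."""
    return sum((L + j) ** -3.5 for j in range(1, terms + 1))

def integral_bound(L):
    """Integral of x^(-7/2) over [L, infinity), namely (2/5) * L^(-5/2)."""
    return 0.4 * L ** -2.5

for L in (1.0, 10.0, 100.0):
    print(L, tail_sum(L), integral_bound(L))
```

Both columns shrink like L^{-5/2}, consistent with the bound tending to 0 as L grows.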
Now we prove Lemma 6.5.

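Proof of Lemma 6.5. We take r = 1. Now we observe that for all 0 ≤ t ≤ t + τ ≤ 1, 0 ≤ τ ≤ γ and g ∈ G,
\begin{align*}
\mathbb{E}\big[ q\big( g(\bar\mu^N(t + \tau)), g(\bar\mu^N(t)) \big) \,\big|\, \mathcal{F}_t^N \big]
&= \mathbb{E}\big[ 1 \wedge \big| \langle h, \bar\mu^N(t + \tau) \rangle - \langle h, \bar\mu^N(t) \rangle \big| \,\big|\, \mathcal{F}_t^N \big] && \text{for some } h \in C_c^\infty \\
&\le \mathbb{E}\big[ \big| \langle h, \bar\mu^N(t + \tau) \rangle - \langle h, \bar\mu^N(t) \rangle \big| \,\big|\, \mathcal{F}_t^N \big] \\
&= \mathbb{E}\Big[ \Big| \frac{1}{N}\sum_{i=1}^N \big( h(\bar X_{t+\tau}^{i,N}) - h(\bar X_t^{i,N}) \big) \Big| \,\Big|\, \mathcal{F}_t^N \Big].
\end{align*}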
Applying Itô’s formula to h, we get (ignoring the arguments for notational convenience)
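\[
h(\bar X_t^{i,N}) = h(x^{i,N}) + \int_0^t \Big( \big[ \tfrac{1}{\delta} f + b + \sigma u_i^N \big] \cdot \nabla h + \tfrac{1}{2} A : \nabla\nabla h \Big) ds + \int_0^t \nabla h \cdot (\sigma\,dW_s^i).
\]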
In order to control the term that blows up as δ → 0, we define ψl (x, y, µ) := Φl (x, y, µ)hxl (x), l = 1, ..., d,
for Φ as in Equation 4. Then ψl solves
L1x,µ ψl (x, y, µ) = −fl (x, y, µ)hxl (x).
Now applying Itô’s formula (using Equations 51 and 52 in Proposition B.3 and the regularity of Φ from
Proposition C.2) to ψl , we get
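\begin{align*}
&\psi_l(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N) \\
&= \psi_l(x^{i,N}, x^{i,N}/\delta, \bar\mu_0^N) + \int_0^t \Big( \big[ \tfrac{1}{\delta} f + b + \sigma u_i^N \big] \cdot \nabla_x\psi_l + \tfrac{1}{2} A : \nabla_x\nabla_x\psi_l + \tfrac{1}{\delta}\big[ \tfrac{1}{\delta} f + b + \sigma u_i^N \big] \cdot \nabla_y\psi_l \\
&\qquad + \tfrac{1}{2\delta^2} A : \nabla_y\nabla_y\psi_l + \tfrac{1}{\delta} A : \nabla_x\nabla_y\psi_l \Big) ds + \int_0^t \nabla_x\psi_l \cdot (\sigma\,dW_s^i) + \tfrac{1}{\delta}\int_0^t \nabla_y\psi_l \cdot (\sigma\,dW_s^i) \\
&\quad + \frac{1}{N}\sum_{j=1}^N \bigg\{ \int_0^t \Big( \partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \\
&\qquad\qquad \cdot \big[ \tfrac{1}{\delta} f(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) + b(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) + \sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) u_j^N(s) \big] \\
&\qquad + \tfrac{1}{2} A(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) : \partial_v\partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \\
&\qquad + \tfrac{1}{2N} A(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) : \partial_\mu^2\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}, \bar X_s^{j,N}) \Big) ds \\
&\qquad + \int_0^t \partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \cdot \big( \sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N)\,dW_s^j \big) \bigg\} \\
&\quad + \frac{1}{N}\int_0^t A : \nabla_x\partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{i,N})\,ds + \frac{1}{N\delta}\int_0^t A : \nabla_y\partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{i,N})\,ds \\
&= \psi_l(x^{i,N}, x^{i,N}/\delta, \bar\mu_0^N) + \int_0^t \Big( \big[ \tfrac{1}{\delta} f + b + \sigma u_i^N \big] \cdot \nabla_x\psi_l + \tfrac{1}{2} A : \nabla_x\nabla_x\psi_l + \tfrac{1}{\delta}\big[ b + \sigma u_i^N \big] \cdot \nabla_y\psi_l \\
&\qquad + \tfrac{1}{\delta} A : \nabla_x\nabla_y\psi_l \Big) ds + \int_0^t \nabla_x\psi_l \cdot (\sigma\,dW_s^i) + \tfrac{1}{\delta}\int_0^t \nabla_y\psi_l \cdot (\sigma\,dW_s^i) - \tfrac{1}{\delta^2}\int_0^t f_l h_{x_l}\,ds \\
&\quad + \frac{1}{N}\sum_{j=1}^N \bigg\{ \int_0^t \Big( \partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \\
&\qquad\qquad \cdot \big[ \tfrac{1}{\delta} f(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) + b(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) + \sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) u_j^N(s) \big] \\
&\qquad + \tfrac{1}{2} A(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) : \partial_v\partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \\
&\qquad + \tfrac{1}{2N} A(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) : \partial_\mu^2\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}, \bar X_s^{j,N}) \Big) ds \\
&\qquad + \int_0^t \partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \cdot \big( \sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N)\,dW_s^j \big) \bigg\} \\
&\quad + \frac{1}{N}\int_0^t A : \nabla_x\partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{i,N})\,ds + \frac{1}{N\delta}\int_0^t A : \nabla_y\partial_\mu\psi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{i,N})\,ds,
\end{align*}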
where in all coefficients where the argument is suppressed, the argument is $(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)$. Solving for
$\frac{1}{\delta}\int_0^t f_l h_{x_l}\,ds$ and plugging into our representation for h, we get
\[
h(\bar X_t^{i,N}) = h(x^{i,N}) + \sum_{k=1}^8 B_k^{i,N}(t) \tag{27}
\]
where
t d
1
Z
B1i,N (t) =
X
b · ∇h + A : ∇∇h + [f + δb] · [∇x Φl hxl + Φl ∇x hxl ]
0 2
l=1
δ
+ A : [hxl ∇x ∇x Φl + 2∇x Φl ⊗ ∇x hxl + ∇x ∇x hxl Φl ] + b · [hxl ∇y Φl ]
2
+ A : [hxl ∇x ∇y Φl + ∇x hxl ⊗ ∇y Φl ] ds
Z t
δ
= :
[I + δ∇x Φ + ∇y Φ]b + ∇x Φf + A [∇x ∇y Φ + ∇x ∇x Φ] · ∇h
2
0
1
+ [ + δ∇x Φ + ∇y Φ]A + [f + δb] ⊗ Φ : ∇∇h
2
d
X δ
+ [A : ∇x ∇x hxl ]Φl ds
2
l=1
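\begin{align*}
B_2^{i,N}(t) &= \int_0^t \Big( [\sigma u_i^N] \cdot \nabla h + \sum_{l=1}^d \big( \delta[\sigma u_i^N] \cdot [\nabla_x\Phi_l h_{x_l} + \Phi_l\nabla_x h_{x_l}] + [\sigma u_i^N] \cdot [h_{x_l}\nabla_y\Phi_l] \big) \Big) ds \\
&= \int_0^t \Big( [I + \delta\nabla_x\Phi + \nabla_y\Phi]\sigma u_i^N \cdot \nabla h + \delta[\sigma u_i^N] \otimes \Phi : \nabla\nabla h \Big) ds \\
B_3^{i,N}(t) &= \int_0^t \nabla h \cdot (\sigma\,dW_s^i) + \int_0^t \sum_{l=1}^d \big[ \delta[\nabla_x\Phi_l h_{x_l} + \Phi_l\nabla_x h_{x_l}] + [h_{x_l}\nabla_y\Phi_l] \big] \cdot (\sigma\,dW_s^i) \\
&= \int_0^t \nabla h \cdot \big( [I + \delta\nabla_x\Phi + \nabla_y\Phi]\sigma\,dW_s^i \big) + \int_0^t \delta\nabla\nabla h : \Phi \otimes (\sigma\,dW_s^i) \\
B_4^{i,N}(t) &= \delta\sum_{l=1}^d \Big( h_{x_l}(x^{i,N})\Phi_l(x^{i,N}, x^{i,N}/\delta, \bar\mu_0^N) - h_{x_l}(\bar X_t^{i,N})\Phi_l(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N) \Big) \\
&= \delta\Big( \nabla h(x^{i,N}) \cdot \Phi(x^{i,N}, x^{i,N}/\delta, \bar\mu_0^N) - \nabla h(\bar X_t^{i,N}) \cdot \Phi(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N) \Big)
\end{align*}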
d N Z t
δ X X
B5i,N (t) = [hxl ∂µ Φl (X̄si,N , X̄si,N /δ, µ̄N j,N
s )(X̄s )]
N 0
l=1 j=1

1
· f (X̄sj,N , X̄sj,N /δ, µ̄N
s ) + b(X̄ s
j,N
, X̄ j,N
s /δ, µ̄ N
s )
δ
1
+ A(X̄sj,N , X̄sj,N /δ, µ̄N i,N i,N
s ) : [hxl ∂v ∂µ Φl (X̄s , X̄s /δ, µ̄s )(X̄s )]
N j,N
2
1
+ A(X̄sj,N , X̄sj,N /δ, µ̄Ns ) : [h ∂ 2
xl µ l Φ (X̄ i,N
s , X̄ i,N
s /δ, µ̄ N
s )( X̄ j,N
s , X̄ s
j,N
)]ds
2N
Z t Z
1
= ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N N N
s )(v) f (v, v/δ, µ̄s ) + δb(v, v/δ, µ̄s ) + A(v, v/δ, µ̄s )
N
0 R d 2

δ 2
: δ∂v ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N s )(v) + ∂ Φ( X̄ i,N
, X̄ i,N
/δ, µ̄ N
)(v, v) µ̄ N
(dv)
N µ s s s s
· ∇hds
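\begin{align*}
B_6^{i,N}(t) &= \frac{\delta}{N}\sum_{l=1}^d \sum_{j=1}^N \int_0^t \big[ h_{x_l}\partial_\mu\Phi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \big] \cdot \sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) u_j^N(s)\,ds \\
&= \int_0^t \frac{\delta}{N}\sum_{j=1}^N \partial_\mu\Phi(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N})\,\sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N) u_j^N(s) \cdot \nabla h\,ds \\
B_7^{i,N}(t) &= \frac{\delta}{N}\sum_{l=1}^d \sum_{j=1}^N \int_0^t \big[ h_{x_l}\partial_\mu\Phi_l(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N}) \big] \cdot \big( \sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N)\,dW_s^j \big) \\
&= \int_0^t \nabla h \cdot \frac{\delta}{N}\sum_{j=1}^N \partial_\mu\Phi(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N})\,\sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N)\,dW_s^j
\end{align*}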
d Z t
1 X
B8i,N (t) = δ A : [∇x hxl ⊗ ∂µ Φl (X̄si,N , X̄si,N /δ, µ̄N i,N
s )(X̄s )
N 0l=1
+ hxl ∇x ∂µ Φl (X̄si,N , X̄si,N /δ, µ̄N i,N
s )(X̄s )]ds
Z t
+ A : [hxl ∇y ∂µ Φl (X̄si,N , X̄si,N /δ, µ̄N
s )(X̄ i,N
s )]ds
0
Z t
1 i,N i,N N i,N δ i,N i,N N i,N
= :
A [ ∇y ∂µ Φ(X̄s , X̄s /δ, µ̄s )(X̄s ) + ∇x ∂µ Φ(X̄s , X̄s /δ, µ̄s )(X̄s )] · ∇h
0 N N

δ
+ ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N i,N
s )(X̄s )A : ∇∇hds.
N
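Thus we have
\[
\mathbb{E}\big[ q\big( g(\bar\mu^N(t + \tau)), g(\bar\mu^N(t)) \big) \,\big|\, \mathcal{F}_t^N \big] \le \mathbb{E}\Big[ \Big| \sum_{k=1}^8 \frac{1}{N}\sum_{i=1}^N \big( B_k^{i,N}(t + \tau) - B_k^{i,N}(t) \big) \Big| \,\Big|\, \mathcal{F}_t^N \Big].
\]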
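Firstly, we observe that
\begin{align*}
\Big| \frac{1}{N}\sum_{i=1}^N \big( B_4^{i,N}(t + \tau) - B_4^{i,N}(t) \big) \Big| &\le 2\delta\|\nabla h\|_\infty \frac{1}{N}\sum_{i=1}^N \sup_{t \in [0,1]} |\Phi(\bar X_t^{i,N}, \bar X_t^{i,N}/\delta, \bar\mu_t^N)| \\
&\le \delta C
\end{align*}
by Proposition C.2.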
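Next, we observe that
\begin{align*}
\Big| \frac{1}{N}\sum_{i=1}^N \big( B_2^{i,N}(t + \tau) - B_2^{i,N}(t) \big) \Big|
&\le c_1(m, d)\|\nabla h\|_\infty \frac{1}{N}\sum_{i=1}^N \Big( \int_t^{t+\tau} |[I + \delta\nabla_x\Phi + \nabla_y\Phi]\sigma|^2 ds \int_0^1 |u_i^N(s)|^2 ds \Big)^{1/2} \\
&\quad + c_2(m, d)\|\nabla\nabla h\|_\infty \frac{\delta}{N}\sum_{i=1}^N \Big( \int_t^{t+\tau} |\Phi|^2|\sigma|^2 ds \int_0^1 |u_i^N(s)|^2 ds \Big)^{1/2} \\
&\le c_3(1 + \delta)\tau^{1/2} \frac{1}{N}\sum_{i=1}^N \Big( \int_0^1 |u_i^N(s)|^2 ds \Big)^{1/2} \\
&\le c_3(1 + \delta)\tau^{1/2} \Big( \frac{1}{N}\sum_{i=1}^N \int_0^1 |u_i^N(s)|^2 ds \Big)^{1/2}
\end{align*}
by Hölder's inequality, monotonicity, Assumption (A1), Proposition C.2, and Jensen's inequality. Similarly,
we have
\begin{align*}
\Big| \frac{1}{N}\sum_{i=1}^N \big( B_6^{i,N}(t + \tau) - B_6^{i,N}(t) \big) \Big|
&\le c_4(d, m)|\nabla h| \frac{\delta}{N^2}\sum_{i=1}^N \sum_{j=1}^N \Big( \int_t^{t+\tau} |\partial_\mu\Phi(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N})\,\sigma(\bar X_s^{j,N}, \bar X_s^{j,N}/\delta, \bar\mu_s^N)|^2 ds \\
&\qquad \times \int_0^1 |u_j^N(s)|^2 ds \Big)^{1/2} \\
&\le c_5(d, m)|\nabla h| \frac{\delta}{N}\sum_{i=1}^N \Big( \frac{1}{N}\sum_{j=1}^N \int_t^{t+\tau} |\partial_\mu\Phi(\bar X_s^{i,N}, \bar X_s^{i,N}/\delta, \bar\mu_s^N)(\bar X_s^{j,N})|^2 ds \Big)^{1/2} \Big( \frac{1}{N}\sum_{j=1}^N \int_0^1 |u_j^N(s)|^2 ds \Big)^{1/2} \\
&\le c_6\,\delta\,\tau^{1/2} \Big( \frac{1}{N}\sum_{i=1}^N \int_0^1 |u_i^N(s)|^2 ds \Big)^{1/2}.
\end{align*}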
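In addition,
\begin{align*}
&\mathbb{E}\Big[ \Big| \frac{1}{N}\sum_{i=1}^N \big( B_3^{i,N}(t + \tau) + B_7^{i,N}(t + \tau) - B_3^{i,N}(t) - B_7^{i,N}(t) \big) \Big| \,\Big|\, \mathcal{F}_t^N \Big] \\
&\le \mathbb{E}\Big[ \Big| \frac{1}{N}\sum_{i=1}^N \big( B_3^{i,N}(t + \tau) + B_7^{i,N}(t + \tau) - B_3^{i,N}(t) - B_7^{i,N}(t) \big) \Big|^2 \,\Big|\, \mathcal{F}_t^N \Big]^{1/2} && \text{by Jensen's inequality} \\
&\le \tau^{1/2} c_7 \Big( \frac{1}{N} + \frac{\delta}{N} \Big) && \text{by Itô Isometry, Assumption (A1), and Proposition C.2.}
\end{align*}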
Lastly, we observe that

1 N i,N
X
i,N i,N i,N i,N i,N

N B1 (t + τ ) + B5 (t + τ ) + B8 (t + τ ) − B1 (t) − B5 (t) − B8 (t)
i=1
N Z t+τ
1 X
≤ k∇hk∞ [I + δ∇x Φ + ∇y Φ]b + ∇x Φf
N i=1 t

δ 1
+ A : [∇x ∇y Φ + ∇x ∇x Φ + ∇y ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N i,N
s )(X̄s )
2 N
δ
+ ∇x ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N i,N
s )(X̄s )]
N
Z
+ ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N N N
s )(v) f (v, v/δ, µ̄s ) + δb(v, v/δ, µ̄s )
Rd
1
+ A(v, v/δ, µ̄N s )
2
i,N i,N N δ 2 i,N i,N N N

: δ∂v ∂µ Φ(X̄s , X̄s /δ, µ̄s )(v) + ∂µ Φ(X̄s , X̄s /δ, µ̄s )(v, v) µ̄s (dv) ds
N
Z t+τ
1 δ i,N i,N N i,N

+k∇∇hk∞ [ 2 + δ∇x Φ + ∇y Φ + N ∂µ Φ(X̄s , X̄s /δ, µ̄s )(X̄s )]A + [f + δb] ⊗ Φds

t
Z t+τ
+ δc8 (d, m) sup |∇∇hxl | |Φ||A|ds
l∈{1,...,d} t
1 δ
≤ c9 (1 + δ + + )τ
N N
by Assumption (A1) and Proposition C.2. So, letting
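\[
\xi^N(\gamma) = C\Big[ \delta + (1 + \delta)\gamma^{1/2} \Big( \frac{1}{N}\sum_{i=1}^N \int_0^1 |u_i^N(s)|^2 ds \Big)^{1/2} + \Big( 1 + \delta + \frac{1}{N} + \frac{\delta}{N} \Big)\gamma + \Big( \frac{1}{N} + \frac{\delta}{N} \Big)\gamma^{1/2} \Big]
\]
for C large enough to dominate all the constants in the previous bounds, we see that for each g ∈ G,
\[
\mathbb{E}\big[ q\big( g(\bar\mu^N(t + \tau)), g(\bar\mu^N(t)) \big) \,\big|\, \mathcal{F}_t^N \big] \le \mathbb{E}\big[ \xi^N(\gamma) \,\big|\, \mathcal{F}_t^N \big].
\]
Now we must show that
\[
\lim_{\gamma \to 0} \limsup_{N \to \infty} \mathbb{E}[\xi^N(\gamma)] = 0.
\]
This follows immediately since
\begin{align*}
\mathbb{E}[\xi^N(\gamma)] &\le C\Big[ \delta + (1 + \delta)\gamma^{1/2} \Big( \mathbb{E}\Big[ \frac{1}{N}\sum_{i=1}^N \int_0^1 |u_i^N(s)|^2 ds \Big] \Big)^{1/2} + \Big( 1 + \delta + \frac{1}{N} + \frac{\delta}{N} \Big)\gamma + \Big( \frac{1}{N} + \frac{\delta}{N} \Big)\gamma^{1/2} \Big] && \text{by Jensen's inequality} \\
&\le C\Big[ \delta + (1 + \delta)\gamma^{1/2}B^{1/2} + \Big( 1 + \delta + \frac{1}{N} + \frac{\delta}{N} \Big)\gamma + \Big( \frac{1}{N} + \frac{\delta}{N} \Big)\gamma^{1/2} \Big] && \text{by Assumption (25),}
\end{align*}
so
\[
\limsup_{N \to \infty} \mathbb{E}[\xi^N(\gamma)] \le C(B^{1/2}\gamma^{1/2} + \gamma)
\]
and
\[
\lim_{\gamma \to 0} \limsup_{N \to \infty} \mathbb{E}[\xi^N(\gamma)] \le \lim_{\gamma \to 0} C(B^{1/2}\gamma^{1/2} + \gamma) = 0.
\]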

6.2. Identification of the Limit. Extract a convergent subsequence from {QN }N ∈N and relabel with new
indexes so that {QN }N ∈N converges to some Q weakly as a P(W)-valued random variable. Let (Ω̃, F̃ , P̃) be
the probability space on which Q lies.
We wish to identify the limit Q as a member of V for P̃ almost every ω̃ ∈ Ω̃. Our main tool here is the
associated martingale problem to weak solutions of (10). An important element to the proof which is special
to the joint limit as N → ∞, δ ↓ 0 is that we must first show that (V4) holds before identifying the SDE
associated to Q to prove (V1). This is because, as proven in (41), in the prelimit there exists a term which
is O(1) in N , but is in fact 0 in the limit due to the centering condition (A8). As the centering condition
is a statement involving the invariant measure π (see Equation 3), it is necessary to the proof that we have
already identified the Y component of the limiting Coordinate Process 13 as being concentrated on π.
6.2.1. Proof of (V4). As stated, we offer the proof of (V4) first.
We want to show that for almost every ω̃ ∈ Ω̃ and ∀t ∈ [0, 1], g ∈ Cb2 (Td ),
\[
\mathbb{E}^{Q_{\tilde\omega}}\Big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_{Q_{\tilde\omega}}(s)} g(y)\,m(dy\,ds) \Big] = 0.
\]
Let $g_l : \mathbb{T}^d \to \mathbb{R}$, l ∈ N be smooth and bounded with bounded derivatives and dense in $C_b^2(\mathbb{T}^d)$. This
set exists by using Stone–Weierstrass and taking rational coefficients. Let $\bar Y^{i,N} = \bar X^{i,N}/\delta$. Consider the
operator which acts on $g \in C_b^2(\mathbb{T}^d)$ by
\[
\mathcal{A}_{x,z,\mu}[g](y) := \Big[ \frac{1}{\delta^2} f(x, y, \mu) + \frac{1}{\delta}\big( b(x, y, \mu) + \sigma(x, y, \mu)z \big) \Big] \cdot \nabla g(y) + \frac{1}{2\delta^2} A(x, y, \mu) : \nabla\nabla g(y).
\]
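Note that by (A1), for t ∈ [0, 1] and fixed N ∈ N,
\begin{align*}
M_t^{i,N} &:= g_l(\bar Y_t^{i,N}) - g_l(x_0^{i,N}/\delta) - \int_0^t \mathcal{A}_{\bar X_s^{i,N}, u_i^N(s), \bar\mu_s^N}[g_l](\bar Y_s^{i,N})\,ds \\
&= \frac{1}{\delta}\int_0^t \nabla_y g_l(\bar Y_s^{i,N}) \cdot \big( \sigma(\bar X_s^{i,N}, \bar Y_s^{i,N}, \bar\mu_s^N)\,dW_s^i \big)
\end{align*}
is an $\mathcal{F}_t$-martingale. By definition, for t ∈ [0, 1],
\begin{align*}
\int_0^t \mathcal{A}_{\bar X_s^{i,N}, u_i^N(s), \bar\mu_s^N}[g_l](\bar Y_s^{i,N})\,ds &= \frac{1}{\delta^2}\int_0^t \mathcal{L}^1_{\bar X_s^{i,N}, \bar\mu_s^N} g_l(\bar Y_s^{i,N})\,ds \\
&\quad + \frac{1}{\delta}\int_0^t \big[ b(\bar X_s^{i,N}, \bar Y_s^{i,N}, \bar\mu_s^N) + \sigma(\bar X_s^{i,N}, \bar Y_s^{i,N}, \bar\mu_s^N) u_i^N(s) \big] \cdot \nabla_y g_l(\bar Y_s^{i,N})\,ds.
\end{align*}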
Consider now the operator which acts on g ∈ Cb2 (Td ) by

Bx,z,µ [g](y) := b(x, y, µ) + σ(x, y, µ)z · ∇g(y).
Then
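\begin{align*}
&\frac{1}{N}\sum_{i=1}^N \Big[ \delta^2\big( -M_t^{i,N} + g_l(\bar Y_t^{i,N}) - g_l(x_0^{i,N}/\delta) \big) - \delta\int_0^t \mathcal{B}_{\bar X_s^{i,N}, u_i^N(s), \bar\mu_s^N}[g_l](\bar Y_s^{i,N})\,ds \Big] \tag{28} \\
&= \frac{1}{N}\sum_{i=1}^N \int_0^t \mathcal{L}^1_{\bar X_s^{i,N}, \bar\mu_s^N} g_l(\bar Y_s^{i,N})\,ds.
\end{align*}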

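We will show the right hand side of Equation 28 converges in distribution to $\mathbb{E}^Q\big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_Q(s)} g_l(y)\,m(dy\,ds) \big]$
and the left hand side converges in distribution to 0, so by a density argument the result holds.
The proof that the right hand side of Equation 28 converges to $\mathbb{E}^Q\big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_Q(s)} g_l(y)\,m(dy\,ds) \big]$ in distribution
follows from the observation that
\[
\frac{1}{N}\sum_{i=1}^N \int_0^t \mathcal{L}^1_{\bar X_s^{i,N}, \bar\mu_s^N} g_l(\bar Y_s^{i,N})\,ds = \int_{\mathcal{W}} \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\phi(s), \nu_{Q^N}(s)} g_l(y)\,n(dy\,ds)\,Q^N(d\phi\,dn\,dr).
\]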
We invoke Skorokhod's representation theorem (Theorem 3.1.8 in [22]) to assume the convergence of $Q^N \to Q$
holds with probability 1. Without making a distinction between the original probability space and the new
one, we will prove
\[
\mathbb{E}\Big| \frac{1}{N}\sum_{i=1}^N \int_0^t \mathcal{L}^1_{\bar X_s^{i,N}, \bar\mu_s^N} g_l(\bar Y_s^{i,N})\,ds - \mathbb{E}^Q\Big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_Q(s)} g_l(y)\,m(dy\,ds) \Big] \Big| \to 0, \tag{29}
\]
so that by Chebyshev's inequality the convergence holds in probability and hence in distribution.
First we prove that
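\begin{align*}
\mathbb{E}\Big| &\int_{\mathcal{W}} \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\phi(s), \nu_{Q^N}(s)} g_l(y)\,n(dy\,ds)\,Q^N(d\phi\,dn\,dr) \tag{30} \\
&- \int_{\mathcal{W}} \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\phi(s), \nu_Q(s)} g_l(y)\,n(dy\,ds)\,Q^N(d\phi\,dn\,dr) \Big| \to 0.
\end{align*}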
We have
Z Z
L1φ(s),ν N (s) gl (y)n(dyds)QN (dφdndr)

lim E
N →∞ W

Td ×[0,t]
Q
Z Z
1
N
−
d L φ(s),νQ (s) lg (y)n(dyds)
Q (dφdndr)

W T ×[0,t]
Z Z
L1φ(s),ν j (s) gl (y)n(dyds)

≤ sup lim E
N ∈N j→∞ W Td ×[0,t]
Q
Z
1
N
− Lφ(s),νQ (s) gl (y)n(dyds)Q (dφdndr)

Td ×[0,t]
Z Z
1 1
N
≤ sup lim E L φ(s),νQj (s) − L φ(s),νQ (s) g l (y)n(dyds) Q (dφdndr)
N ∈N j→∞

W Td ×[0,t]
Z Z t Z
1 1 N

≤ sup lim E Lφ(s),ν j (s) − Lφ(s),νQ (s) gl (y)ns (dy)dsQ (dφdndr)

N ∈N j→∞ Q

W 0 Td
By Proposition A.3 and Assumption (A1), we get for fixed φ, n, and s that
Z
1 1

lim Lφ(s),ν j (s) − Lφ(s),νQ (s) gl (y)ns (dy) = 0.
j→∞ d Q
T
By Assumption (A1) the bounded convergence theorem applies. So we can pass the limit through the
integrals, and the results follows.
Now we will show
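\[
\mathbb{E}\Big| \frac{1}{N}\sum_{i=1}^N \int_0^t \mathcal{L}^1_{\bar X_s^{i,N}, \nu_Q(s)} g_l(\bar Y_s^{i,N})\,ds - \mathbb{E}^Q\Big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_Q(s)} g_l(y)\,m(dy\,ds) \Big] \Big| \to 0, \tag{31}
\]
which together with the triangle inequality and Equation 30 proves Equation 29.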
Noting that
1 N t 1
X Z Z
i,N Q 1

E LX̄ i,N ,ν (s) gl (Ȳs )ds − E
LX̄s ,νQ (s) gl (y)m(dyds)
N i=1 0
s Q
Td ×[0,t]
Z Z
L1φ(s),νQ (s) gl (y)n(dyds)QN (dφdndr)

= E
d
W T ×[0,t]
Z Z
1

−
Lφ(s),νQ (s) gl (y)n(dyds)Q(dφdndr) ,

W Td ×[0,t]

1
R
so since (φ, n) 7→ Td ×[0,t] Lφ(s),νQ (s) gl (y)n(dyds) is continuous from X × Y to R, and since we have that

R 1

sup(φ,n)∈X ×Y Td ×[0,t] Lφ(s),νQ (s) gl (y)n(dyds) < ∞ by assumption (A1), the result follows immediately from

the convergence of QN → Q.
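To prove the left hand side of (28) converges to zero in distribution, we will show that
\[
\mathbb{E}\Big| \frac{1}{N}\sum_{i=1}^N \Big[ \delta^2\big( -M_t^{i,N} + g_l(\bar Y_t^{i,N}) - g_l(x_0^{i,N}/\delta) \big) - \delta\int_0^t \mathcal{B}_{\bar X_s^{i,N}, u_i^N(s), \bar\mu_s^N}[g_l](\bar Y_s^{i,N})\,ds \Big] \Big| \to 0
\]
as N → ∞, so the result will follow by Chebyshev's inequality.

We first note that
\[
\delta^2\,\mathbb{E}\Big| \frac{1}{N}\sum_{i=1}^N M_t^{i,N} \Big| \le \delta^2\,\mathbb{E}\Big[ \Big| \frac{1}{N}\sum_{i=1}^N M_t^{i,N} \Big|^2 \Big]^{1/2} \le \delta\|\nabla g_l\|_\infty C \qquad \text{by Assumption (A1) and Itô Isometry.}
\]
Also,
\[
\delta^2\,\mathbb{E}\Big| \frac{1}{N}\sum_{i=1}^N \big( g_l(\bar Y_t^{i,N}) - g_l(x^{i,N}/\delta) \big) \Big| \le 2\delta^2\|g_l\|_\infty.
\]
Lastly,
\[
\delta\,\mathbb{E}\Big| \frac{1}{N}\sum_{i=1}^N \int_0^t \mathcal{B}_{\bar X_s^{i,N}, u_i^N(s), \bar\mu_s^N}[g_l](\bar Y_s^{i,N})\,ds \Big| \le \delta\|\nabla g_l\|\,C(B)
\]
by Hölder's inequality, Assumption (25), and Assumption (A1).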
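Now we have that for each $g_l$ and t ∈ [0, 1], there exists a set $N_{g_l,t}$ such that $\tilde P(N_{g_l,t}) = 0$ and
\[
\mathbb{E}^{Q_{\tilde\omega}}\Big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_{Q_{\tilde\omega}}(s)} g_l(y)\,m(dy\,ds) \Big] = 0, \qquad \forall\tilde\omega \in \tilde\Omega \setminus N_{g_l,t}.
\]
Taking a countable dense set D ⊂ [0, 1] and letting $N = \cup_{l \in \mathbb{N}} \cup_{t \in D} N_{g_l,t}$, we have $\tilde P(N) = 0$ and
\[
\mathbb{E}^{Q_{\tilde\omega}}\Big[ \int_{\mathbb{T}^d \times [0,t]} \mathcal{L}^1_{\bar X_s, \nu_{Q_{\tilde\omega}}(s)} g(y)\,m(dy\,ds) \Big] = 0, \qquad \forall\tilde\omega \in \tilde\Omega \setminus N,\ \forall g \in C_b^2(\mathbb{T}^d).
\]
So (V4) holds for almost every ω̃ ∈ Ω̃. In Remark 6.7 we discuss implications of (V4).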
Remark 6.7. Θ ∈ P(W) satisfying condition (V4) implies that ∀t ∈ [0, 1], g ∈ Cb2 (Td ),
Z Z Z
Θ
L1X̄s ,νΘ(s) g(y)m(dyds) L1φ(s),νΘ (s) g(y)n(dyds)Θ(dφdndr)

E :=

Td ×[0,t] W Td ×[0,t]
= 0.
Note that we can write Θ|B(X ×Y) (dφdn) = λ(dn|φ)ΘX (dφ). Then (V4) can be written as:
Z Z Z
Cb2 (Td ), L1φ(s),νΘ (s) g(y)n(dyds)λ(dn|φ)ΘX (dφ)

∀t ∈ [0, 1], g ∈
= 0.
X Y Td ×[0,t]
So
Z Z
L1φ(s),νΘ (s) g(y)n(dyds)λ(dn|φ) = 0, ∀t ∈ [0, 1], ∀g ∈ Cb2 (Td ), ΘX − a.e. φ ∈ X

Y Td ×[0,t]
(32) ⇒ for ΘX -a.e.φ ∈ X , λ({n ∈ Y : (L1φ(s),νΘ (s) )∗ ns = 0, ∀s ∈ [0, 1]}|φ) = 1.

But by Proposition C.1, the invariant measure associated with L1φ(s),νΘ (s) is unique for each φ, Θ, and s,
so λ(·|φ) is concentrated on the single measure π(dy|φ(s), νΘ (s)) ⊗ ds for ΘX -a.e. φ. Thus we have

Θ (φ, n, r) ∈ W : ns (dy) = π(dy|φ(s), νΘ (s)), ∀s ∈ [0, 1]

= Θ|B(X ×Y) (φ, n) ∈ X × Y : ns (dy) = π(dy|φ(s), νΘ (s)), ∀s ∈ [0, 1]
since the set we are measuring doesn’t depend on r

Z
= :
λ (φ, n) ∈ X × Y ns (dy) = π(dy|φ(s), νΘ (s)), ∀s ∈ [0, 1] |φ ΘX (dφ)
X
Z
= :
λ (φ, n) ∈ X × Y ns (dy) = π(dy|φ(s), νΘ (s)), ∀s ∈ [0, 1] |φ ΘX (dφ)
A
for A ∈ B(X ) such that 32 holds ∀φ ∈ A, ΘX (A) = 1
= 1.
6.2.2. Proof of (V1). We wish to prove that Qω̃ corresponds to X̄, a weak solution of Equation 12, for P̃-a.e.
ω̃ ∈ Ω̃.
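Given $g \in C_b^2(\mathbb{R}^d)$ and Θ ∈ P(W), define a real-valued process $\{M_g^\Theta(t)\}_{t \in [0,1]}$ on (W, B(W), Θ) given by
\[
M_g^\Theta(t, (\phi, n, r)) = g(\phi(t)) - g(\phi(0)) - \int_0^t \int_{\mathbb{R}^m} \int_{\mathbb{T}^d} A[g](\phi(s), y, z, \nu_\Theta(s))\,n_s(dy)\,r_s(dz)\,ds \tag{33}
\]
where
\begin{align*}
A[g](x, y, z, \nu_\Theta(s)) := {} & \Big( [\nabla_y\Phi(x, y, \nu_\Theta(s)) + I][b(x, y, \nu_\Theta(s)) + \sigma(x, y, \nu_\Theta(s))z] + \nabla_x\Phi(x, y, \nu_\Theta(s)) f(x, y, \nu_\Theta(s)) \\
& \quad + A : \nabla_x\nabla_y\Phi(x, y, \nu_\Theta(s)) \Big) \cdot \nabla g(x) \\
& + \Big( [\nabla_y\Phi(x, y, \nu_\Theta(s)) + \tfrac{1}{2} I] A(x, y, \nu_\Theta(s)) + f(x, y, \nu_\Theta(s)) \otimes \Phi(x, y, \nu_\Theta(s)) \Big) : \nabla\nabla g(x).
\end{align*}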
We will say Θ ∈ P(W) solves the martingale problem associated to $\tilde X^{\nu_\Theta}$ with initial distribution $\nu_0$ if for
all 0 ≤ s ≤ t ≤ 1 and $g \in C_b^2(\mathbb{R}^d)$,
\[
\mathbb{E}^\Theta\big[ M_g^\Theta(t) \,\big|\, \mathcal{G}_s \big] = M_g^\Theta(s), \qquad \nu_\Theta(0) = \nu_0 \tag{34}
\]
for $\{\mathcal{G}_t\}_{t \in [0,1]}$ the canonical filtration on the coordinate process as defined in Equation (13). To identify
the limit Q as a weak solution $\tilde X^{\nu_Q}$ to (10), by the density argument offered at the end of this subsection, it
suffices to show the following for fixed $h \in C_c^\infty(\mathbb{R}^d)$, 0 ≤ s ≤ t ≤ 1, and $\mathcal{G}_s$-measurable $\Psi \in C_b(\mathcal{W})$.
Theorem 6.8.
\[
\mathbb{E}^{Q_{\tilde\omega}}\big[ \Psi\big( M_h^{Q_{\tilde\omega}}(t) - M_h^{Q_{\tilde\omega}}(s) \big) \big] = 0
\]
for almost every ω̃ ∈ Ω̃.

We note that it is enough to prove Theorem 6.8 for $h \in C_c^\infty(\mathbb{R}^d)$, due to the fact that $C_c^\infty(\mathbb{R}^d)$ is separating
in the sense of Chapter 3 Section 4 in [22] (see [14] Definition 3.1 and Chapter 4 Section 8 in [22]). In order
to prove Theorem 6.8 we will prove Lemma 6.9 and Lemma 6.10.
Lemma 6.9.
\[
\mathbb{E}^{Q^N}\big[ \Psi\big( M_h^{Q^N}(t) - M_h^{Q^N}(s) \big) \big] \to \mathbb{E}^Q\big[ \Psi\big( M_h^Q(t) - M_h^Q(s) \big) \big] \quad \text{in distribution.} \tag{35}
\]
Lemma 6.10.
\[
\mathbb{E}^{Q^N}\big[ \Psi\big( M_h^{Q^N}(t) - M_h^{Q^N}(s) \big) \big] \to 0 \quad \text{in distribution.} \tag{36}
\]
Then the conclusion follows.

We proceed with the proof of Lemma 6.9.
Proof of Lemma 6.9. Unpacking the notation in Equation 35, we see what we are trying to show is that
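\begin{align*}
&\int_{\mathcal{W}} \Psi\Big( h(\phi(t)) - h(\phi(s)) - \int_s^t \int_{\mathbb{R}^m} \int_{\mathbb{T}^d} A[h](\phi(\tau), y, z, \nu_{Q^N}(\tau))\,n_\tau(dy)\,r_\tau(dz)\,d\tau \Big) Q^N(d\phi\,dn\,dr) \\
&\to \int_{\mathcal{W}} \Psi\Big( h(\phi(t)) - h(\phi(s)) - \int_s^t \int_{\mathbb{R}^m} \int_{\mathbb{T}^d} A[h](\phi(\tau), y, z, \nu_Q(\tau))\,n_\tau(dy)\,r_\tau(dz)\,d\tau \Big) Q(d\phi\,dn\,dr)
\end{align*}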
in distribution. We invoke Skorokhod’s representation theorem to assume the convergence of QN → Q occurs

with probability 1, without making a distinction in the notation between the new probability space and the
original one. Then we see Lemma 6.9 essentially follows from the definition of convergence of measures in the
space P(W). The only caveats are that the integrand is also a function of the converging measures through
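its dependence on $\nu_{Q^N}$, and that the integrand is not a priori in $C_b(\mathcal{W})$, since it grows linearly in the control.
Thus, we will show
\[
\mathbb{E}\Big[ \mathbb{E}^{Q^N}\Big[ \sup_{t \in [0,1]} \big| M_h^{Q^N}(t) - M_h^Q(t) \big| \Big] \Big] \to 0 \tag{37}
\]
and
\[
\mathbb{E}\Big| \mathbb{E}^{Q^N}\big[ \Psi\big( M_h^Q(t) - M_h^Q(s) \big) \big] - \mathbb{E}^Q\big[ \Psi\big( M_h^Q(t) - M_h^Q(s) \big) \big] \Big| \to 0. \tag{38}
\]
Once these limits are established, by the triangle inequality and Chebyshev's inequality Lemma 6.9 will be
proved.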
To see Equation 37 holds we write

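\[
A[h](x, y, z, \nu_\Theta(s)) = \big[ A^1(x, y, \nu_\Theta(s)) + A^2(x, y, \nu_\Theta(s))z \big] \cdot \nabla h(x) + A^3(x, y, \nu_\Theta(s)) : \nabla\nabla h(x) \tag{39}
\]
where for s ∈ [0, 1], $x \in \mathbb{R}^d$, $y \in \mathbb{T}^d$, and Θ ∈ P(W):
\begin{align*}
A^1(x, y, \nu_\Theta(s)) &:= [\nabla_y\Phi(x, y, \nu_\Theta(s)) + I] b(x, y, \nu_\Theta(s)) + \nabla_x\Phi(x, y, \nu_\Theta(s)) f(x, y, \nu_\Theta(s)) + A : \nabla_x\nabla_y\Phi(x, y, \nu_\Theta(s)) \\
A^2(x, y, \nu_\Theta(s)) &:= [\nabla_y\Phi(x, y, \nu_\Theta(s)) + I]\sigma(x, y, \nu_\Theta(s)) \\
A^3(x, y, \nu_\Theta(s)) &:= [\nabla_y\Phi(x, y, \nu_\Theta(s)) + \tfrac{1}{2} I] A(x, y, \nu_\Theta(s)) + f(x, y, \nu_\Theta(s)) \otimes \Phi(x, y, \nu_\Theta(s)).
\end{align*}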
Propositions A.3 and C.2 along with Assumption (A1) imply for fixed z ∈ Rm , h ∈ Cc∞ (Rm ), x ∈ Rd , y ∈
d
T , and s ∈ [0, 1] that
(40)

lim A (x, y, νQN (s)) + A (x, y, νQN (s))z · ∇h(x) + A3 (x, y, νQN (s)) : ∇∇h(x)
1 2

N →∞

1 2 3

:
− A (x, y, νQ (s)) + A (x, y, νQ (s))z · ∇h(x) − A (x, y, νQ (s)) ∇∇h(x) = 0
Thus if we are able to prove that we can pass the limit into the integrals in Equation 37, then we will be
done. We observe that for Θ = QN , N ∈ N or Q,
2
A[h](x, y, z, νΘ (s)) ≤ k∇hk A1 (x, y, νΘ (s)) + Ck∇hk
2
A (x, y, νΘ (s)) + |z|2

∞ ∞

+k∇∇hk∞ A3 (x, y, νΘ (s))

≤ C(1 + |z|2 ) by Young’s inequality, Assumption (A1) and Proposition C.2.

Then
Z 1 Z Z
N
sup E EQ C(1 + |z|2 )nt (dy)rt (dz)dt
N ∈N 0 Rm Td
Z 1 N
1 X N
= sup C(1 + E |ui (t)|2 dt ) < ∞ by Assumption (25)
N ∈N 0 N i=1
We have then that

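\begin{align*}
&\lim_{N\to\infty}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\sup_{t\in[0,1]}\big|M_h^{Q^N}(t,(\phi,n,r)) - M_h^{Q}(t,(\phi,n,r))\big|\Big]\Big]\\
&\quad\le \lim_{N\to\infty}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_0^1\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}\big|A[h](\phi(s),y,z,\nu_{Q^N}(s)) - A[h](\phi(s),y,z,\nu_{Q}(s))\big|\,n_s(dy)\,r_s(dz)\,ds\Big]\Big]\\
&\quad\le \sup_{N\in\mathbb{N}}\lim_{j\to\infty}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_0^1\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}\big|A[h](\phi(s),y,z,\nu_{Q^j}(s)) - A[h](\phi(s),y,z,\nu_{Q}(s))\big|\,n_s(dy)\,r_s(dz)\,ds\Big]\Big]\\
&\quad= \sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_0^1\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}\lim_{j\to\infty}\big|A[h](\phi(s),y,z,\nu_{Q^j}(s)) - A[h](\phi(s),y,z,\nu_{Q}(s))\big|\,n_s(dy)\,r_s(dz)\,ds\Big]\Big]\\
&\qquad\text{by the Dominated Convergence Theorem}\\
&\quad= 0 \quad\text{by Equation 40,}
\end{align*}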
and hence Equation 37 is proved. Now we prove Equation 38. We have:

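\begin{align*}
&\mathbb{E}\Big[\Big|\mathbb{E}^{Q^N}\big[\Psi\big(M_h^{Q}(t)-M_h^{Q}(s)\big)\big] - \mathbb{E}^{Q}\big[\Psi\big(M_h^{Q}(t)-M_h^{Q}(s)\big)\big]\Big|\Big]\\
&\quad= \mathbb{E}\Big[\Big|\mathbb{E}^{Q^N}\Big[\Psi\Big(h(\phi(t))-h(\phi(s)) - \int_s^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\phi(\tau),y,z,\nu_Q(\tau))\,n_\tau(dy)\,r_\tau(dz)\,d\tau\Big)\Big]\\
&\qquad\quad - \mathbb{E}^{Q}\Big[\Psi\Big(h(\phi(t))-h(\phi(s)) - \int_s^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\phi(\tau),y,z,\nu_Q(\tau))\,n_\tau(dy)\,r_\tau(dz)\,d\tau\Big)\Big]\Big|\Big].
\end{align*}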
Noting that h(φ(t)) − h(φ(s)) is bounded and writing A[h] as in Equation 39, we see that only the term $\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))z\,n_\tau(dy)\,r_\tau(dz)$ exhibits growth in the control r. For the remaining terms the desired convergence follows immediately from boundedness of the integrand and the almost-sure convergence QN → Q, so we only show the convergence of this term. Let
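\[
B^2(M) := \Big\{r\in\mathcal{Z} : \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt) > M\Big\},
\]
\[
\psi_M^1(r,t) := \begin{cases}
\int_{\mathbb{R}^m} z\,r_t(dz) & \text{if } \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)\le M,\\[4pt]
\dfrac{M}{\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)}\int_{\mathbb{R}^m} z\,r_t(dz) & \text{if } \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt) > M,
\end{cases}
\]
\[
\psi_M^2(r,t) := \begin{cases}
0 & \text{if } \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)\le M,\\[4pt]
\int_{\mathbb{R}^m} z\,r_t(dz)\Big(1-\dfrac{M}{\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)}\Big) & \text{if } \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt) > M.
\end{cases}
\]
Then for all $r\in\mathcal{Z}$, $M>0$, $t\in[0,1]$, we have $\psi_M^1(r,t)+\psi_M^2(r,t)=\int_{\mathbb{R}^m}z\,r_t(dz)$, $\int_0^1|\psi_M^1(r,t)|\,dt\le M$, and
\[
\int_0^1|\psi_M^2(r,t)|\,dt \le 1_{B^2(M)}(r)\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt).
\]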
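The three properties of the truncations can be checked case by case. The following is a short verification sketch, not taken from the source; the shorthand $J(r)$ is introduced here only for illustration, and the sketch assumes the disintegration $r(dz\,dt)=r_t(dz)\,dt$ used for the occupation measures.

```latex
% J(r) is illustrative shorthand, not notation from the paper.
\[
J(r) := \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt),
\qquad
\Big|\int_{\mathbb{R}^m} z\,r_t(dz)\Big| \le \int_{\mathbb{R}^m}|z|\,r_t(dz).
\]
% Case J(r) <= M: psi^1 is the full first moment and psi^2 = 0.
\[
\psi^1_M+\psi^2_M=\int_{\mathbb{R}^m} z\,r_t(dz),\qquad
\int_0^1|\psi^1_M(r,t)|\,dt\le J(r)\le M,\qquad
\int_0^1|\psi^2_M(r,t)|\,dt = 0 .
\]
% Case J(r) > M: the weights M/J(r) and 1 - M/J(r) sum to one.
\[
\psi^1_M+\psi^2_M=\Big(\tfrac{M}{J(r)}+1-\tfrac{M}{J(r)}\Big)\int_{\mathbb{R}^m} z\,r_t(dz),\qquad
\int_0^1|\psi^1_M(r,t)|\,dt\le \tfrac{M}{J(r)}\,J(r)=M,
\]
\[
\int_0^1|\psi^2_M(r,t)|\,dt\le \Big(1-\tfrac{M}{J(r)}\Big)J(r)\le J(r)=1_{B^2(M)}(r)\,J(r).
\]
```

In both cases the bounded part $\psi^1_M$ carries at most mass $M$, while the remainder $\psi^2_M$ is supported on the event $B^2(M)$.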
So
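\begin{align*}
&\mathbb{E}\Big|\mathbb{E}^{Q^N}\Big[\int_s^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))z\,n_\tau(dy)\,r_\tau(dz)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\\
&\qquad - \mathbb{E}^{Q}\Big[\int_s^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))z\,n_\tau(dy)\,r_\tau(dz)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\Big|\\
&\quad= \mathbb{E}\Big|\mathbb{E}^{Q^N}\Big[\int_s^t\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^1_M(r,\tau)\,n_\tau(dy)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\\
&\qquad + \mathbb{E}^{Q^N}\Big[\int_s^t\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^2_M(r,\tau)\,n_\tau(dy)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\\
&\qquad - \mathbb{E}^{Q}\Big[\int_s^t\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^1_M(r,\tau)\,n_\tau(dy)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\\
&\qquad - \mathbb{E}^{Q}\Big[\int_s^t\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^2_M(r,\tau)\,n_\tau(dy)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\Big|\\
&\quad\le \mathbb{E}\Big|\mathbb{E}^{Q^N}\Big[\int_s^t\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^1_M(r,\tau)\,n_\tau(dy)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\\
&\qquad - \mathbb{E}^{Q}\Big[\int_s^t\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^1_M(r,\tau)\,n_\tau(dy)\cdot\nabla h(\phi(\tau))\,d\tau\Big]\Big|\\
&\qquad + 2\|\nabla h\|_\infty\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_s^t\Big|\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^2_M(r,\tau)\,n_\tau(dy)\Big|\,d\tau\Big]\Big]\quad\text{by A.3.12 in [21].}
\end{align*}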
By the Bounded Convergence Theorem and convergence of QN → Q, the first term vanishes as N → ∞ (for continuity of the time integral of $\psi^1_M$ see Lemma 5.3.4/3.3.1 in [21]). To handle the second term, we have:
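\begin{align*}
&\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_s^t\Big|\int_{\mathbb{T}^d}A^2(\phi(\tau),y,\nu_Q(\tau))\psi^2_M(r,\tau)\,n_\tau(dy)\Big|\,d\tau\Big]\Big]\\
&\quad\le C\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_0^1|\psi^2_M(r,\tau)|\,d\tau\Big]\Big]\quad\text{by Assumption (A1), Proposition C.2, and monotonicity}\\
&\quad\le C\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[1_{B^2(M)}(r)\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)\Big]\Big]\\
&\quad= C\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[1_{B^2(M)}(r)\frac{\big(\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)\big)^2}{\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)}\Big]\Big]\\
&\quad\le \frac{C}{M}\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\Big(\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)\Big)^2\Big]\Big]\quad\text{by the definition of } B^2(M)\\
&\quad\le \frac{C}{M}\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\Big[\int_0^1\int_{\mathbb{R}^m}|z|^2\,r_\tau(dz)\,d\tau\Big]\Big]\quad\text{by Jensen's inequality}\\
&\quad= \frac{C}{M}\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^N(\tau)|^2\,d\tau\Big]\quad\text{by definition of } Q^N\\
&\quad\le \frac{CB}{M}\quad\text{by Assumption (25).}
\end{align*}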
Taking N → ∞ then M → ∞ the result follows.
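The gain of the factor $1/M$ in the estimate above rests on two elementary steps. The sketch below spells them out; it is an illustration rather than part of the source, and it assumes that each $r\in\mathcal{Z}$ has total mass one on $\mathbb{R}^m\times[0,1]$ (time marginal equal to Lebesgue measure), which is what makes Jensen's inequality applicable. The shorthand $J(r)$ is hypothetical notation.

```latex
% J(r) is illustrative shorthand for the total first moment of r.
\[
J(r) := \int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt).
\]
% Step 1: on B^2(M) we have J(r) > M, hence 1/J(r) < 1/M.
\[
1_{B^2(M)}(r)\,J(r) = 1_{B^2(M)}(r)\,\frac{J(r)^2}{J(r)} \le \frac{1}{M}\,J(r)^2 .
\]
% Step 2: Jensen's inequality for the probability measure r.
\[
J(r)^2 = \Big(\int_0^1\int_{\mathbb{R}^m}|z|\,r(dz\,dt)\Big)^2
\le \int_0^1\int_{\mathbb{R}^m}|z|^2\,r(dz\,dt).
\]
% Combining both steps with the uniform second-moment bound B on the controls:
\[
\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\mathbb{E}^{Q^N}\big[1_{B^2(M)}(r)\,J(r)\big]\Big]
\le \frac{B}{M}\xrightarrow[M\to\infty]{}0 .
\]
```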
Now we prove Lemma 6.10.
Proof of Lemma 6.10. Again, we invoke Skorohod’s representation theorem to assume the convergence of
QN → Q occurs with probability 1.

We will show $\mathbb{E}\big|\mathbb{E}^{Q^N}\big[\Psi\big(M_h^{Q^N}(t)-M_h^{Q^N}(s)\big)\big]\big|\to 0$ as $N\to\infty$, and so the conclusion will follow via Chebyshev's inequality.
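For completeness, the Chebyshev (Markov) step can be written out as follows. This is a minimal sketch; $Y_N$ is a name introduced here for the random variable whose mean is shown to vanish, not notation from the source.

```latex
% Y_N is hypothetical shorthand for the quantity controlled in the proof.
\[
Y_N := \mathbb{E}^{Q^N}\big[\Psi\big(M_h^{Q^N}(t)-M_h^{Q^N}(s)\big)\big],
\qquad
\mathbb{P}\big(|Y_N|>\varepsilon\big)\le \frac{\mathbb{E}|Y_N|}{\varepsilon}
\quad\text{for every } \varepsilon>0 .
\]
% Hence E|Y_N| -> 0 implies Y_N -> 0 in probability.
\[
\mathbb{E}|Y_N|\to 0 \;\Longrightarrow\; \lim_{N\to\infty}\mathbb{P}\big(|Y_N|>\varepsilon\big)=0 .
\]
```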
From Equation 27, we get that
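\[
h(\bar X_t^{i,N}) - h(\bar X_s^{i,N}) = \int_s^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\bar X_\tau^{i,N},y,z,\bar\mu_\tau^N)\,m_\tau^{i,N}(dy)\,\delta_{u_i^N(\tau)}(dz)\,d\tau + \sum_{k=1}^5 D_k^{i,N},
\]
where
\begin{align*}
D_1^{i,N} &= \int_s^t\Big(\frac{1}{N}A:[\nabla_y\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{i,N})]\\
&\qquad + \delta\Big[\nabla_x\Phi\,b + \frac{1}{2}A:\nabla_x\nabla_x\Phi + \frac{1}{N}A:\nabla_x\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{i,N})\\
&\qquad + \int_{\mathbb{R}^d}\Big(\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(v)\,b(v,v/\delta,\bar\mu_\tau^N) + \frac{1}{2}A(v,v/\delta,\bar\mu_\tau^N)\\
&\qquad\quad :\Big[\partial_v\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(v) + \frac{1}{N}\partial_\mu^2\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(v,v)\Big]\Big)\bar\mu_\tau^N(dv)\Big]\Big)\cdot\nabla h(\bar X_\tau^{i,N})\,d\tau\\
&\quad + \delta\int_s^t\Big[\nabla_x\Phi\,A + b\otimes\Phi + \frac{1}{N}\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{i,N})A\Big]:\nabla\nabla h(\bar X_\tau^{i,N})\,d\tau\\
&\quad + \delta\int_s^t\sum_{l=1}^d\frac{1}{2}\big[A:\nabla_x\nabla_x h_{x_l}(\bar X_\tau^{i,N})\big]\Phi_l\,d\tau,\\
D_2^{i,N} &= \delta\int_s^t\Big(\Big[\nabla_x\Phi\,\sigma u_i^N(\tau) + \frac{1}{N}\sum_{j=1}^N\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{j,N})\,\sigma(\bar X_\tau^{j,N},\bar X_\tau^{j,N}/\delta,\bar\mu_\tau^N)u_j^N(\tau)\Big]\cdot\nabla h(\bar X_\tau^{i,N})\\
&\qquad + [\sigma u_i^N(\tau)]\otimes\Phi:\nabla\nabla h(\bar X_\tau^{i,N})\Big)\,d\tau,\\
D_3^{i,N} &= \int_s^t\nabla h(\bar X_\tau^{i,N})\cdot\big([I+\delta\nabla_x\Phi+\nabla_y\Phi]\sigma\,dW_\tau^i\big) + \delta\int_s^t\nabla\nabla h(\bar X_\tau^{i,N}):\Phi\otimes(\sigma\,dW_\tau^i)\\
&\quad + \frac{\delta}{N}\int_s^t\nabla h(\bar X_\tau^{i,N})\cdot\sum_{j=1}^N\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{j,N})\,\sigma(\bar X_\tau^{j,N},\bar X_\tau^{j,N}/\delta,\bar\mu_\tau^N)\,dW_\tau^j,\\
D_4^{i,N} &= \delta\Big[\nabla h(\bar X_s^{i,N})\cdot\Phi(\bar X_s^{i,N},\bar X_s^{i,N}/\delta,\bar\mu_s^N) - \nabla h(\bar X_t^{i,N})\cdot\Phi(\bar X_t^{i,N},\bar X_t^{i,N}/\delta,\bar\mu_t^N)\Big],\\
D_5^{i,N} &= \int_s^t\frac{1}{N}\sum_{j=1}^N\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{j,N})\,f(\bar X_\tau^{j,N},\bar X_\tau^{j,N}/\delta,\bar\mu_\tau^N)\cdot\nabla h(\bar X_\tau^{i,N})\,d\tau,
\end{align*}
the $m^{i,N}$ are defined as in Equation 24, and the arguments which are omitted are taken to be $(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)$.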
Thus
\[
\mathbb{E}^{Q^N}\big[\Psi\big(M_h^{Q^N}(t)-M_h^{Q^N}(s)\big)\big] = \frac{1}{N}\sum_{i=1}^N\Psi\sum_{k=1}^5 D_k^{i,N}.
\]
By the same proofs as in tightness of $Q^N_{\mathcal{X}}$ in Subsection 6.1.3, using Assumption (A1) and Proposition C.2, we get for large enough N that
\[
\mathbb{E}\Big|\frac{1}{N}\sum_{i=1}^N\Psi\sum_{k=1}^4 D_k^{i,N}\Big| \le \max\Big\{\delta,\frac{1}{N}\Big\}C,
\]
where C depends only on the sup norms of Ψ and of h and its first 3 derivatives. This vanishes as N → ∞, so once we prove the following (41), Lemma 6.10 will be proved.
\[
\lim_{N\to\infty}\mathbb{E}\Big|\frac{1}{N}\sum_{i=1}^N\Psi D_5^{i,N}\Big| = 0. \tag{41}
\]
Unlike the average of Dki,N , k = 1, 2, 3, 4, the term we wish to vanish in (41) is O(1) in N . However, as
we will see, the fact that Q almost surely satisfies (V4) along with Remark 6.7 and the centering condition
from Assumption (A8) will result in this term vanishing when we pass to the limit. We first observe that
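\[
\mathbb{E}\Big|\frac{1}{N}\sum_{i=1}^N\Psi D_5^{i,N}\Big| \le \|\Psi\|_\infty\|\nabla h\|_\infty\,\mathbb{E}\big[D^N\big],
\]
where
\[
D^N := \int_s^t\frac{1}{N}\sum_{i=1}^N\Big|\frac{1}{N}\sum_{j=1}^N\partial_\mu\Phi(\bar X_\tau^{i,N},\bar X_\tau^{i,N}/\delta,\bar\mu_\tau^N)(\bar X_\tau^{j,N})\,f(\bar X_\tau^{j,N},\bar X_\tau^{j,N}/\delta,\bar\mu_\tau^N)\Big|\,d\tau.
\]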
We can rewrite DN in terms of the occupation measures defined in Equation 23 as:

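\[
D^N = \int_s^t\int_{\mathcal{W}}\Big|\int_{\mathbb{T}^d}\int_{\mathcal{W}}\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\bar\mu_\tau^N)(\psi(\tau))\,f(\psi(\tau),\hat y,\bar\mu_\tau^N)\,n_\tau(d\hat y)\,Q^N(d\psi\,dn\,dr)\,m_\tau(dy)\Big|\,Q^N(d\phi\,dm\,d\rho)\,d\tau.
\]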
And thus
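\begin{align*}
\mathbb{E}\big[D^N\big] &\le \mathbb{E}\Big[\int_s^t\int_{\mathcal{W}}\int_{\mathcal{W}}\int_{\mathbb{T}^d}\Big|\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\bar\mu_\tau^N)(\psi(\tau))\,f(\psi(\tau),\hat y,\bar\mu_\tau^N)\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\,d\tau\Big]\\
&\le \mathbb{E}\Big[\int_s^t\int_{\mathcal{W}}\int_{\mathcal{W}}\int_{\mathbb{T}^d}\int_{\mathbb{T}^d}\Big|\partial_\mu\Phi(\phi(\tau),y,\bar\mu_\tau^N)(\psi(\tau))\,f(\psi(\tau),\hat y,\bar\mu_\tau^N)\\
&\qquad - \partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\Big|\,n_\tau(d\hat y)\,m_\tau(dy)\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\,d\tau\Big]\\
&\quad + \mathbb{E}\Big[\int_s^t\int_{\mathcal{W}}\int_{\mathcal{W}}\int_{\mathbb{T}^d}\Big|\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\,d\tau\Big].
\end{align*}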
First we show the first of these terms vanishes as N → ∞. We have

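\begin{align*}
&\lim_{N\to\infty}\mathbb{E}\Big[\int_s^t\int_{\mathcal{W}}\int_{\mathcal{W}}\int_{\mathbb{T}^d}\int_{\mathbb{T}^d}\Big|\partial_\mu\Phi(\phi(\tau),y,\bar\mu_\tau^N)(\psi(\tau))\,f(\psi(\tau),\hat y,\bar\mu_\tau^N)\\
&\qquad - \partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\Big|\,n_\tau(d\hat y)\,m_\tau(dy)\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\,d\tau\Big]\\
&\quad\le \lim_{j\to\infty}\sup_{N\in\mathbb{N}}\mathbb{E}\Big[\int_s^t\int_{\mathcal{W}}\int_{\mathcal{W}}\int_{\mathbb{T}^d}\int_{\mathbb{T}^d}\Big|\partial_\mu\Phi(\phi(\tau),y,\bar\mu_\tau^j)(\psi(\tau))\,f(\psi(\tau),\hat y,\bar\mu_\tau^j)\\
&\qquad - \partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\Big|\,n_\tau(d\hat y)\,m_\tau(dy)\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\,d\tau\Big].
\end{align*}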

We note now that by Propositions A.3 and C.2 along with Assumption (A1) and Bounded Convergence
Theorem, for fixed φ, ψ, n, m and τ
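\[
\lim_{j\to\infty}\int_{\mathbb{T}^d}\int_{\mathbb{T}^d}\Big|\partial_\mu\Phi(\phi(\tau),y,\bar\mu_\tau^j)(\psi(\tau))\,f(\psi(\tau),\hat y,\bar\mu_\tau^j) - \partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\Big|\,n_\tau(d\hat y)\,m_\tau(dy) = 0.
\]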
So by Assumption (A1) and Proposition C.2, we have by Bounded Convergence Theorem that this term
vanishes.
Now we have
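\[
\lim_{N\to\infty}\mathbb{E}\big[D^N\big] \le \lim_{N\to\infty}\mathbb{E}\Big[\int_{\mathcal{W}}\int_{\mathcal{W}}\int_s^t\int_{\mathbb{T}^d}\Big|\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,d\tau\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\Big].
\]
By the same argument as above, we can pass the limit through the expectation to get
\[
\lim_{N\to\infty}\mathbb{E}\big[D^N\big] \le \mathbb{E}\Big[\lim_{N\to\infty}\int_{\mathcal{W}}\int_{\mathcal{W}}\int_s^t\int_{\mathbb{T}^d}\Big|\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,d\tau\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\Big].
\]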
Since QN → Q in P(W) almost surely and the integrand is bounded, we have via Proposition 4.6 on p.115
of [22] that
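\begin{align*}
&\mathbb{E}\Big[\lim_{N\to\infty}\int_{\mathcal{W}}\int_{\mathcal{W}}\int_s^t\int_{\mathbb{T}^d}\Big|\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,d\tau\,Q^N(d\psi\,dn\,dr)\,Q^N(d\phi\,dm\,d\rho)\Big]\\
&\quad= \mathbb{E}\Big[\int_{\mathcal{W}}\int_{\mathcal{W}}\int_s^t\int_{\mathbb{T}^d}\Big|\int_{\mathbb{T}^d}\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\,f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,d\tau\,Q(d\psi\,dn\,dr)\,Q(d\phi\,dm\,d\rho)\Big].
\end{align*}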
Now by Hölder’s inequality and Tonelli’s Theorem,

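\begin{align*}
&\mathbb{E}\Big[\int_{\mathcal{W}}\int_{\mathcal{W}}\int_s^t\int_{\mathbb{T}^d}\big|\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\big|\,\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|\,m_\tau(dy)\,d\tau\,Q(d\psi\,dn\,dr)\,Q(d\phi\,dm\,d\rho)\Big]\\
&\quad\le \mathbb{E}\Big[\Big(\int_s^t\int_{\mathcal{W}}\Big[\Big(\int_{\mathcal{W}}\int_{\mathbb{T}^d}\big|\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(\psi(\tau))\big|^2\,m_\tau(dy)\,Q(d\psi\,dn\,dr)\Big)^{1/2}\Big]^2 Q(d\phi\,dm\,d\rho)\,d\tau\Big)^{1/2}\\
&\qquad\times\Big(\int_s^t\int_{\mathcal{W}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|^2 Q(d\psi\,dn\,dr)\,d\tau\Big)^{1/2}\Big]\\
&\quad= \mathbb{E}\Big[\Big(\int_s^t\int_{\mathcal{W}}\Big[\Big(\int_{\mathbb{R}^d}\int_{\mathbb{T}^d}\big|\partial_\mu\Phi(\phi(\tau),y,\nu_Q(\tau))(x)\big|^2\,m_\tau(dy)\,\nu_Q(\tau)(dx)\Big)^{1/2}\Big]^2 Q(d\phi\,dm\,d\rho)\,d\tau\Big)^{1/2}\\
&\qquad\times\Big(\int_s^t\int_{\mathcal{W}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|^2 Q(d\psi\,dn\,dr)\,d\tau\Big)^{1/2}\Big].
\end{align*}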
s W Td
The first term in the product inside the expectation is bounded by Proposition C.2. For the second, we
have that
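\begin{align*}
\int_{\mathcal{W}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|^2 Q(d\psi\,dn\,dr)
&= \int_{\mathcal{X}\times\mathcal{Y}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|^2 Q|_{\mathcal{B}(\mathcal{X}\times\mathcal{Y})}(d\psi\,dn)\\
&= \int_{\mathcal{X}}\int_{\mathcal{Y}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|^2\lambda(dn|\psi)\,Q_{\mathcal{X}}(d\psi),
\end{align*}
by writing $Q|_{\mathcal{B}(\mathcal{X}\times\mathcal{Y})}(d\psi\,dn) = \lambda(dn|\psi)\,Q_{\mathcal{X}}(d\psi)$. Since Q almost surely satisfies (V4), we have by Remark 6.7 that P-almost surely, for every τ ∈ [0, 1],
\begin{align*}
\int_{\mathcal{X}}\int_{\mathcal{Y}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,n_\tau(d\hat y)\Big|^2\lambda(dn|\psi)\,Q_{\mathcal{X}}(d\psi)
&= \int_{\mathcal{X}}\Big|\int_{\mathbb{T}^d}f(\psi(\tau),\hat y,\nu_Q(\tau))\,\pi(d\hat y|\psi(\tau),\nu_Q(\tau))\Big|^2 Q_{\mathcal{X}}(d\psi)\\
&= 0 \quad\text{by Assumption (A8).}
\end{align*}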
Thus limN →∞ E[DN ] is bounded by the expectation of a term which is almost surely bounded times a
term which is almost surely 0, and thus (41) holds and the proof of Lemma 6.10 is complete.
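We have then that for each $(s,t,\Psi,h)\in[0,1]\times[0,1]\times C_b(\mathcal{W})\times C_c^\infty(\mathbb{R}^d)$ there is a set $Z_{(s,t,\Psi,h)}\in\tilde{\mathcal{F}}$ such that $\tilde{\mathbb{P}}(Z_{(s,t,\Psi,h)})=0$ and
\[
\mathbb{E}^{Q_{\tilde\omega}}\big[\Psi\big(M_h^{Q_{\tilde\omega}}(t)-M_h^{Q_{\tilde\omega}}(s)\big)\big] = 0,\qquad\forall\tilde\omega\in\tilde\Omega\setminus Z_{(s,t,\Psi,h)}.
\]
Since there is a countable collection of $h\in C_c^\infty(\mathbb{R}^d)$ which is dense in $C_c^\infty(\mathbb{R}^d)$, a countable collection of $(s,t)\in[0,1]^2$ which is dense in $[0,1]^2$, and countably many $\Psi\in C_b(\mathcal{W})$ generating each of the countably many sigma algebras $\mathcal{G}_s^l$, letting Z be the union over all these countable collections of $Z_{(s,t,\Psi,h)}$, we have $Z\in\tilde{\mathcal{F}}$, $\tilde{\mathbb{P}}(Z)=0$, and
\[
\mathbb{E}^{Q_{\tilde\omega}}\big[\Psi\big(M_h^{Q_{\tilde\omega}}(t)-M_h^{Q_{\tilde\omega}}(s)\big)\big] = 0,\qquad\forall\tilde\omega\in\tilde\Omega\setminus Z.
\]
So Theorem 6.8 is proved.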
6.2.3. Proof of (V2). By Skorohod’s representation theorem, we can invoke another probability space on
which the convergence of QN → Q occurs with probability 1. Without making a distinction in the notation
between that probability space and our original one, we note that by Fatou’s lemma
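\begin{align*}
\mathbb{E}^{Q}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] &= \mathbb{E}\Big[\mathbb{E}^{Q}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big]\Big]\\
&\le \liminf_{N\to\infty}\mathbb{E}\Big[\int_{\mathcal{W}}\int_{\mathbb{R}^m\times[0,1]}|z|^2 r(dz\,dt)\,Q^N(d\phi\,dn\,dr)\Big]\\
&= \liminf_{N\to\infty}\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^N(s)|^2\,ds\Big]\\
&< \infty \quad\text{by Assumption (25),}
\end{align*}
where in the first equality we use that through Theorem 6.8 we identified Q to be, P-a.s., the unique (by
Proposition 2.3) deterministic measure in P(W) with Z-marginal QZ solving the Martingale Problem 34.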
6.2.4. Proof of (V3). This follows immediately from weak convergence, since by Proposition A.3, for $f\in C_b(\mathbb{R}^d)$, $\Theta\mapsto\int_{\mathbb{R}^d}f(x)\,\nu_\Theta(0)(dx)$ is a continuous bounded map from P(W) to R. Thus,
\begin{align*}
\int_{\mathbb{R}^d}f(x)\,\nu_0(dx) &= \lim_{N\to\infty}\int_{\mathbb{R}^d}f(x)\,\frac{1}{N}\sum_{i=1}^N\delta_{x^{i,N}}(dx)\\
&= \lim_{N\to\infty}\mathbb{E}\Big[\int_{\mathbb{R}^d}f(x)\,\nu_{Q^N}(0)(dx)\Big]\\
&= \tilde{\mathbb{E}}\Big[\int_{\mathbb{R}^d}f(x)\,\nu_{Q}(0)(dx)\Big]\\
&= \int_{\mathbb{R}^d}f(x)\,\nu_{Q}(0)(dx),
\end{align*}
where again in the last step we use that through Theorem 6.8 we identified Q to be, P̃-a.s., the unique (by
Proposition 2.3) deterministic measure in P(W) with Z-marginal QZ solving the Martingale Problem 34.
Thus Q satisfies (V3) P̃-a.s.
7. The Laplace Principle Lower Bound

We now proceed with proving the Laplace Principle Lower Bound:
\[
\liminf_{N\to\infty} -\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big] \ge \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{F(\theta)+I(\theta)\}. \tag{42}
\]
It suffices to prove this bound along any subsequence such that the left hand side converges. Such a
sequence exists since $-\frac{1}{N}\log\mathbb{E}[\exp(-NF(\mu^N))]\le\|F\|_\infty$. Fix η > 0. By Proposition 3.3, for each N ∈ N,
there exists $v^N\in\mathcal{U}_N$ such that
\[
-\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big] \ge \frac{1}{2}\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|v_i^N(t)|^2\,dt\Big] + \mathbb{E}\big[F(\bar\mu^N)\big] - \eta.
\]
Note also that for this choice of controls, we have for all N ∈ N,
\[
\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|v_i^N(t)|^2\,dt\Big] \le 4\|F\|_\infty + 2\eta. \tag{43}
\]
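The constant in (43) follows from combining the choice of the near-optimal controls with the trivial bounds on F. A short derivation, given here as a sketch using only the two-sided bound $|F|\le\|F\|_\infty$:

```latex
% Upper bound on the left-hand side and lower bound on the F-term:
\[
-\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big]\le\|F\|_\infty,
\qquad
\mathbb{E}\big[F(\bar\mu^N)\big]\ge-\|F\|_\infty .
\]
% Rearranging the defining inequality for v^N:
\[
\frac{1}{2}\,\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|v_i^N(t)|^2\,dt\Big]
\le -\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big]-\mathbb{E}\big[F(\bar\mu^N)\big]+\eta
\le 2\|F\|_\infty+\eta .
\]
% Multiplying by 2 gives the bound (43).
\[
\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|v_i^N(t)|^2\,dt\Big]\le 4\|F\|_\infty+2\eta .
\]
```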
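Thus, since (25) is satisfied, by the proof of Theorem 4.4 in [10], we also get that it is enough to assume
that P-almost surely,
\[
\frac{1}{N}\sum_{i=1}^N\int_0^1|v_i^N(t)|^2\,dt \le \frac{4\|F\|_\infty(4\|F\|_\infty+\eta)}{\eta},
\]
so the results of Section 6 apply with $\{v^N\}_{N\in\mathbb{N}}$ as our choice of controls, and for $\{Q^N\}_{N\in\mathbb{N}}$ as in Equation
23 with Z-marginal determined by $\{v^N\}_{N\in\mathbb{N}}$, $\mathcal{L}(Q^N)\to\delta_Q$ in P(P(W)) such that Q ∈ V almost surely. So
\begin{align*}
\liminf_{N\to\infty} -\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big]
&\ge \liminf_{N\to\infty}\Big(\frac{1}{2}\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|v_i^N(t)|^2\,dt\Big] + \mathbb{E}\big[F(\bar\mu^N)\big]\Big) - \eta\\
&= \liminf_{N\to\infty}\Big[\mathbb{E}\Big[\frac{1}{2}\int_{\mathcal{Z}}\int_{\mathbb{R}^m\times[0,1]}|z|^2 r(dz\,dt)\,Q^N_{\mathcal{Z}}(dr)\Big] + \mathbb{E}\big[F(Q^N_{\mathcal{X}})\big]\Big] - \eta\\
&\ge \frac{1}{2}\int_{\mathcal{Z}}\int_{\mathbb{R}^m\times[0,1]}|z|^2 r(dz\,dt)\,Q_{\mathcal{Z}}(dr) + F(Q_{\mathcal{X}}) - \eta\quad\text{by Fatou's Lemma}\\
&\ge \inf_{\theta\in\mathcal{P}(\mathcal{X})}\Big\{\inf_{\Theta\in\mathcal{V}:\Theta_{\mathcal{X}}=\theta}\mathbb{E}^{\Theta}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] + F(\theta)\Big\} - \eta\\
&= \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{I(\theta)+F(\theta)\} - \eta.
\end{align*}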
Since η is arbitrary the lower bound (42) is proved.
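8. The Laplace Principle Upper Bound

In order to close the proof of Theorem 3.4, we now need to show for F ∈ Cb (P(X )) that
\[
\limsup_{N\to\infty} -\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big] \le \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{I(\theta)+F(\theta)\}. \tag{44}
\]
Given η > 0, take θ ∈ P(X ) such that
\[
I(\theta)+F(\theta) \le \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{I(\theta)+F(\theta)\} + \frac{\eta}{2}.
\]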
Since the bound given in Equation 44 is trivial if the right hand side is +∞, we may assume it is finite.
By the definition of I, there exists Θ ∈ V such that $\Theta_{\mathcal{X}}=\theta$. By merit of (V2), letting $(\bar X,m,\rho)$
be the canonical process on W as defined by Equation 13, there exists an m-dimensional $\mathcal{G}^\Theta_{t+}$-Brownian
motion W such that
\[
\big((\mathcal{W},\mathcal{B}(\mathcal{W}),\Theta),\{\mathcal{G}^\Theta_{t+}\},(\bar X,m,\rho,W)\big)
\]
is a weak solution of (10). Note that we take the Θ-augmentation and right limit of $\mathcal{G}_t$ so that we have a
filtration that satisfies the usual conditions. By (V1) we know that the martingale problem (34) is satisfied
by the coordinate process, and by Exercise 5.4.13 in [41] it is also satisfied with $\mathcal{G}^\Theta_{t+}$ in the place of $\mathcal{G}_t$. We
note that in the dynamics of Equation 10 the control only appears linearly as
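\[
u(t,\omega) = \int_{\mathbb{R}^m}z\,\rho_{\omega,t}(dz),\qquad\omega\in\mathcal{W},
\]
and
\[
\mathbb{E}^{\Theta}\Big[\int_0^1|u(t)|^2\,dt\Big] = \mathbb{E}^{\Theta}\Big[\int_0^1\Big|\int_{\mathbb{R}^m}z\,\rho_t(dz)\Big|^2 dt\Big] \le \mathbb{E}^{\Theta}\Big[\int_0^1\int_{\mathbb{R}^m}|z|^2\rho_t(dz)\,dt\Big] = \mathbb{E}^{\Theta}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big].
\]
Then we see that, defining $\tilde\rho(\omega)(I\times D) = \int_I\delta_{u(t,\omega)}(D)\,dt$ for $D\in\mathcal{B}(\mathbb{R}^m)$, $I\in\mathcal{B}([0,1])$, $\omega\in\mathcal{W}$,
\[
\big((\mathcal{W},\mathcal{B}(\mathcal{W}),\Theta),\{\mathcal{G}^\Theta_{t+}\},(\bar X,m,\tilde\rho,W)\big)
\]
is still a weak solution to (10) and the cost of ρ̃ does not exceed that of ρ.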
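Both claims about the ordinary control $\tilde\rho$ can be checked in two lines. The sketch below is illustrative and assumes, as in the text, that each $\rho_t$ is a probability measure on $\mathbb{R}^m$ and that the control enters the limiting dynamics only through its first moment.

```latex
% The first moment, which is all the dynamics see, is unchanged:
\[
\int_{\mathbb{R}^m}z\,\tilde\rho_t(dz)=\int_{\mathbb{R}^m}z\,\delta_{u(t)}(dz)=u(t)=\int_{\mathbb{R}^m}z\,\rho_t(dz).
\]
% The cost can only decrease, by Jensen's inequality applied to each rho_t:
\[
\int_{\mathbb{R}^m\times[0,1]}|z|^2\,\tilde\rho(dz\,dt)
=\int_0^1|u(t)|^2\,dt
=\int_0^1\Big|\int_{\mathbb{R}^m}z\,\rho_t(dz)\Big|^2dt
\le\int_{\mathbb{R}^m\times[0,1]}|z|^2\,\rho(dz\,dt).
\]
```

This is the usual reason relaxed controls do not lower the cost when the control enters linearly and the running cost is convex.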
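We thus can find Θ̃ ∈ V such that $\tilde\Theta_{\mathcal{X}}=\theta$ and there exists an m-dimensional $\mathcal{G}^{\tilde\Theta}_{t+}$-Brownian motion W such
that
\[
\big((\mathcal{W},\mathcal{B}(\mathcal{W}),\tilde\Theta),\{\mathcal{G}^{\tilde\Theta}_{t+}\},(\bar X,m,\tilde\rho,W)\big)
\]
is a weak solution to (10) with $\tilde\rho_t(\omega)(D)=\delta_{\tilde u(t,\omega)}(D)$ for $D\in\mathcal{B}(\mathbb{R}^m)$, and ũ an $\mathbb{R}^m$-valued process on W such
that
\[
\mathbb{E}^{\tilde\Theta}\Big[\frac{1}{2}\int_0^1|\tilde u(t)|^2\,dt\Big] = \mathbb{E}^{\tilde\Theta}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\tilde\rho(dz\,dt)\Big] \le I(\theta)+\frac{\eta}{2}.
\]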
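Now let us define a filtered probability space $(\Omega^\infty,\mathcal{F}^\infty,\mathbb{P}^\infty),\{\mathcal{F}_t^\infty\}$ by taking countably infinitely many
products of
\[
(\mathcal{W},\mathcal{B}(\mathcal{W}),\tilde\Theta),\ \{\mathcal{G}^{\tilde\Theta}_{t+}\}.
\]
For $\omega=(\omega_1,\omega_2,\ldots)\in\Omega^\infty$, define (see also Section 6 of [11])
\[
u_i^\infty(t,\omega) = \tilde u(t,\omega_i),\qquad i\in\mathbb{N},\ t\in[0,1].
\]
Note that $\{u_i^\infty\}_{i\in\mathbb{N}}$ are independent and identically distributed and
\[
\mathbb{E}^\infty\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^\infty(t)|^2\,dt\Big] = \frac{1}{N}\sum_{i=1}^N\mathbb{E}^\infty\Big[\int_0^1|u_i^\infty(t)|^2\,dt\Big] = \mathbb{E}^{\tilde\Theta}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\tilde\rho(dz\,dt)\Big] < \infty \tag{45}
\]
by (V2).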
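Let $\{W^i\}_{i\in\mathbb{N}}$ be independent m-dimensional $\mathcal{G}^{\tilde\Theta}_{t+}$-Brownian motions and $\{\tilde X^{i,N}\}_{i\in\{1,\ldots,N\}}$ be the unique
solution to the system of SDEs on $(\Omega^\infty,\mathcal{F}^\infty,\mathbb{P}^\infty)$
\begin{align*}
d\tilde X_t^{i,N} &= \Big[\frac{1}{\delta}f(\tilde X_t^{i,N},\tilde X_t^{i,N}/\delta,\tilde\mu_t^N) + b(\tilde X_t^{i,N},\tilde X_t^{i,N}/\delta,\tilde\mu_t^N) + \sigma(\tilde X_t^{i,N},\tilde X_t^{i,N}/\delta,\tilde\mu_t^N)\,u_i^\infty(t)\Big]dt\\
&\quad + \sigma(\tilde X_t^{i,N},\tilde X_t^{i,N}/\delta,\tilde\mu_t^N)\,dW_t^i,\\
\tilde X_0^{i,N} &= x^{i,N},
\end{align*}
for N ∈ N, with $\tilde\mu_t^N$ the empirical measure of $\tilde X^{1,N},\ldots,\tilde X^{N,N}$ at time t.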
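Now define
\[
\tilde Q^N_\omega(A\times B\times C) = \frac{1}{N}\sum_{i=1}^N\delta_{\tilde X^{i,N}(\cdot,\omega)}(A)\,\delta_{\tilde m^{i,N}(\omega)}(B)\,\delta_{\tilde\rho^{i,\infty}(\omega)}(C),
\]
where $\tilde\rho^{i,\infty}(\omega)(I\times D) = \tilde\rho(\omega_i)(I\times D) = \int_I\delta_{\tilde u(t,\omega_i)}(D)\,dt = \int_I\delta_{u_i^\infty(t,\omega)}(D)\,dt$ and $\tilde m^{i,N}(\omega)$ is as in Equation
24 with $\tilde X^{i,N}$ in the place of $\bar X^{i,N}$. Since Equation 45 holds, $\sup_{N\in\mathbb{N}}\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^\infty(t)|^2\,dt<\infty$ $\mathbb{P}^\infty$-almost surely,
and the proof of tightness and the fact that $\tilde Q^N\to\tilde Q\in\mathcal{V}$ weakly as a P(W)-valued random variable
along some subsequence hold by the proofs in Section 6.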
We wish now to conclude that $\tilde Q_{\mathcal{X}}=\tilde\Theta_{\mathcal{X}}$. By Proposition 2.3, weak-sense uniqueness as defined in
Definition 2.2 holds, so it suffices to prove that $\tilde Q_{\mathcal{Z}}=\tilde\Theta_{\mathcal{Z}}$. By the mapping theorem (Theorem 2.7 in [5])
and continuity of the projection operator from W to Z, we can simply show $\tilde Q^N_{\mathcal{Z}}\to\tilde\Theta_{\mathcal{Z}}$ weakly as a P(Z)-valued
random variable. Since the $\tilde\rho^{i,\infty}$ are independent and identically distributed under $\mathbb{P}^\infty$ with common
distribution the same as that of ρ̃ under Θ̃, for Q̃ a limit point of $\{\tilde Q^N\}_{N\in\mathbb{N}}$ defined on a probability space
(Ω̃, F̃ , P̃), we have by Varadarajan's theorem ([18] p.399) that P̃-almost surely,
\[
\tilde Q_{\mathcal{Z}} = \tilde\Theta\circ\tilde\rho^{-1} = \tilde\Theta_{\mathcal{Z}}.
\]
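Therefore $\tilde Q_{\mathcal{X}}=\tilde\Theta_{\mathcal{X}}$, and we have, where the infimum in the first line is taken over all stochastic bases,
\begin{align*}
\limsup_{N\to\infty} -\frac{1}{N}\log\mathbb{E}\big[\exp(-NF(\mu^N))\big]
&= \limsup_{N\to\infty}\inf_{u^N\in\mathcal{U}_N}\Big\{\frac{1}{2}\mathbb{E}\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^N(t)|^2\,dt\Big] + \mathbb{E}\big[F(\bar\mu^N)\big]\Big\}\\
&\le \limsup_{N\to\infty}\Big\{\frac{1}{2}\mathbb{E}^\infty\Big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^\infty(t)|^2\,dt\Big] + \mathbb{E}^\infty\big[F(\bar\mu^N)\big]\Big\}\\
&= \frac{1}{2}\mathbb{E}^{\tilde\Theta}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\tilde\rho(dz\,dt)\Big] + \limsup_{N\to\infty}\mathbb{E}^\infty\big[F(\tilde Q^N_{\mathcal{X}})\big]\\
&\le \frac{1}{2}\mathbb{E}^{\tilde\Theta}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\tilde\rho(dz\,dt)\Big] + F(\tilde\Theta_{\mathcal{X}})\\
&\le I(\theta)+F(\theta)+\frac{\eta}{2}\\
&\le \inf_{\theta\in\mathcal{P}(\mathcal{X})}\{I(\theta)+F(\theta)\}+\eta.
\end{align*}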
Since η is arbitrary, Equation 44 is proved.
9. Compactness of Level Sets

Consider I as defined in Equation 16. We want to prove that for each s ∈ [0, ∞), the set
(46) Is := {θ ∈ P(X ) : I(θ) ≤ s}
is a compact subset of P(X ). This will imply that indeed I is a good rate function.
Since in this section we are dealing with sequences of measures all of which coincide with weak solutions
of Equation 12, but with possibly different controls, we introduce a new notation for the coordinate process
which allows us to keep track of which measure the X -component of the coordinate process corresponds to.
For this we use the parameterized version of the limiting Equation 10.
For Q corresponding to a weak solution of Equation 12, Q also corresponds to a solution of Equation 10
with $\nu_Q$ as defined in Equation 11 in the place of ν. Thus we consider the process triple $(\tilde X^{\nu_Q},m,\rho)$, which
can be given explicitly as the coordinate process on the probability space (W, B(W), Q) endowed with the
canonical filtration $\mathcal{G}_t := \sigma\big((\tilde X_s^{\nu_Q},m(s),\rho(s)),\,0\le s\le t\big)$. Thus, for ω = (φ, n, r) ∈ W,
\[
\tilde X_t^{\nu_Q}(\omega)=\phi(t),\qquad m(t,\omega)=n|_{\mathcal{B}(\mathbb{T}^d\times[0,t])},\qquad \rho(t,\omega)=r|_{\mathcal{B}(\mathbb{R}^m\times[0,t])}. \tag{47}
\]
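Lemma 9.1. Fix K < ∞ and consider a sequence $\{Q^N\}_{N\in\mathbb{N}}\subset\mathcal{P}(\mathcal{W})$ such that for every N ∈ N, $Q^N$ is
viable and
\[
\mathbb{E}^{Q^N}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] < K.
\]
Then $\{Q^N\}_{N\in\mathbb{N}}$ is tight.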
Proof. As in Subsection 6.1, it suffices to show tightness of each of the marginals. It is worth noting that
where before we were proving tightness of $\mathcal{L}(Q^N)$ in P(P(W)), here the $Q^N$ are deterministic
measures and we are proving tightness of the measures themselves in P(W).
Tightness of the Y-marginals follows in essentially the same way as in Subsection 6.1.2: P(Y) is itself
compact, so $\{Q^N_{\mathcal{Y}}\}_{N\in\mathbb{N}}$ is tight.
Tightness of the Z-marginals is also very similar to Subsection 6.1.1. The function
\[
g(r) := \int_{\mathbb{R}^m\times[0,1]}|z|^2 r(dz\,dt)
\]
is a tightness function on $\mathcal{R}_1$, so since
\[
\sup_{N\in\mathbb{N}}\mathbb{E}^{Q^N}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] \le K < \infty,
\]
$\{Q^N_{\mathcal{Z}}\}_{N\in\mathbb{N}}$ is tight.
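For the tightness of the X-marginals, we use that each $Q^N$ satisfies (V1); that is, $Q^N_{\mathcal{X}}=\mathcal{L}(\tilde X^{\nu_{Q^N}})$. Via
Theorem 2.4.10 in [41], it suffices to show that for every η > 0,
\[
\lim_{\rho\downarrow 0}\sup_{N\in\mathbb{N}}Q^N_{\mathcal{X}}\Big(\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big|\ge\eta\Big) = 0,
\]
where here we are using the notation from Equation 47. We have that by Chebyshev's inequality,
\[
\lim_{\rho\downarrow 0}\sup_{N\in\mathbb{N}}Q^N_{\mathcal{X}}\Big(\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big|\ge\eta\Big)
\le \lim_{\rho\downarrow 0}\frac{1}{\eta}\sup_{N\in\mathbb{N}}\mathbb{E}^{Q^N}\Big[\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big|\Big].
\]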
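Since
\begin{align*}
\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big|
&= \Big|\int_{t_1}^{t_2}\int_{\mathbb{T}^d}\Big([\nabla_y\Phi(\tilde X_t^{\nu_{Q^N}},y,\nu_{Q^N}(t))+I]\Big[b(\tilde X_t^{\nu_{Q^N}},y,\nu_{Q^N}(t)) + \sigma(\tilde X_t^{\nu_{Q^N}},y,\nu_{Q^N}(t))\int_{\mathbb{R}^m}z\,\rho_t(dz)\Big]\\
&\qquad + \nabla_x\Phi(\tilde X_t^{\nu_{Q^N}},y,\nu_{Q^N}(t))f(\tilde X_t^{\nu_{Q^N}},y,\nu_{Q^N}(t)) + A:\nabla_x\nabla_y\Phi(\tilde X_t^{\nu_{Q^N}},y,\nu_{Q^N}(t))\Big)m_t(dy)\,dt\\
&\qquad + \int_{t_1}^{t_2}B(t,\tilde X_t^{\nu},\nu(t))\,dW_t\Big|,
\end{align*}
where
\[
\frac{1}{2}B(t,x,\mu)B(t,x,\mu)^\top = \int_{\mathbb{T}^d}\Big[\big(\nabla_y\Phi(x,y,\mu)+\tfrac{1}{2}\big)A(x,y,\mu) + f(x,y,\mu)\otimes\Phi(x,y,\mu)\Big]m_t(dy),
\]
we get via Hölder's inequality, Itô isometry, Assumption (A1) and Proposition C.2 that
\[
\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big| \le C\Big[(t_2-t_1) + \sqrt{t_2-t_1}\Big(\sqrt{\int_0^1\int_{\mathbb{R}^m}|z|^2\rho_t(dz)\,dt}+1\Big)\Big].
\]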
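Then we have by Young's inequality that
\[
\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big| \le C\Big[\frac{5}{2}+\frac{1}{2}\int_0^1\int_{\mathbb{R}^m}|z|^2\rho_t(dz)\,dt\Big].
\]
Then
\[
\sup_{N\in\mathbb{N}}\mathbb{E}^{Q^N}\Big[C\Big(\frac{5}{2}+\frac{1}{2}\int_0^1\int_{\mathbb{R}^m}|z|^2\rho_t(dz)\,dt\Big)\Big] \le C\Big(\frac{5}{2}+\frac{1}{2}K\Big)\quad\text{by assumption.}
\]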
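So by the dominated convergence theorem, we have
\begin{align*}
&\lim_{\rho\downarrow 0}\sup_{N\in\mathbb{N}}Q^N_{\mathcal{X}}\Big(\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big|\ge\eta\Big)\\
&\quad\le \lim_{\rho\downarrow 0}\frac{1}{\eta}\sup_{N\in\mathbb{N}}\mathbb{E}^{Q^N}\Big[\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\big|\tilde X_{t_1}^{\nu_{Q^N}}-\tilde X_{t_2}^{\nu_{Q^N}}\big|\Big]\\
&\quad\le \frac{C}{\eta}\sup_{N\in\mathbb{N}}\mathbb{E}^{Q^N}\Big[\lim_{\rho\downarrow 0}\sup_{|t_1-t_2|<\rho,\,0\le t_1<t_2\le 1}\Big((t_2-t_1)+\sqrt{t_2-t_1}\Big(\sqrt{\int_0^1\int_{\mathbb{R}^m}|z|^2\rho_t(dz)\,dt}+1\Big)\Big)\Big]\\
&\quad= 0.
\end{align*}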

Lemma 9.2. Fix K < ∞ and consider a convergent sequence $\{Q^N\}_{N\in\mathbb{N}}\subset\mathcal{P}(\mathcal{W})$ such that for every N ∈ N,
$Q^N$ is viable and
\[
\mathbb{E}^{Q^N}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] < K.
\]
Then for Q such that $Q^N\to Q$, Q is viable.

Proof. The fact that Q satisfies (V2) follows immediately from Fatou's lemma. Since by Proposition A.3
$\nu_0=\lim_{N\to\infty}\nu_{Q^N}(0)=\nu_Q(0)$, (V3) is satisfied.
We now prove Q satisfies (V1). As before, our tool here is the martingale problem associated to Equation
10. It suffices to show that for fixed $h\in C_c^\infty(\mathbb{R}^d)$, 0 ≤ s ≤ t ≤ 1, and $\mathcal{G}_s$-measurable $\Psi\in C_b(\mathcal{W})$ that
\[
\mathbb{E}^{Q}\big[\Psi\big(M_h^{Q}(t)-M_h^{Q}(s)\big)\big] = 0,
\]
where $M_h^Q$ is given in Equation 33. It suffices to show that
\[
\mathbb{E}^{Q^N}\big[\Psi\big(M_h^{Q^N}(t)-M_h^{Q^N}(s)\big)\big] \to \mathbb{E}^{Q}\big[\Psi\big(M_h^{Q}(t)-M_h^{Q}(s)\big)\big],
\]
since by (V1)
\[
\mathbb{E}^{Q^N}\big[\Psi\big(M_h^{Q^N}(t)-M_h^{Q^N}(s)\big)\big] = 0.
\]
Unlike in the previous proof of (V1), here the convergence is as a sequence of real numbers and not in
distribution, since the $Q^N$ are deterministic.
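So that we can keep track of which measure m and ρ correspond to in the Coordinate Process 47 on W
under $Q^N$, we relabel it $(\tilde X^{\nu_{Q^N}},m^N,\rho^N)$. Under Q, we keep the notation $(\tilde X^{\nu_Q},m,\rho)$. Invoking Skorohod's
representation theorem to find another probability space on which the convergence of the random variables
$(\tilde X^{\nu_{Q^N}},m^N,\rho^N)\to(\tilde X^{\nu_Q},m,\rho)$ occurs for almost every ω ∈ Ω, we have
\[
\Big|\mathbb{E}\Big[\Psi\Big(M_h^{Q^N}(t)-M_h^{Q}(t)+M_h^{Q}(s)-M_h^{Q^N}(s)\Big)\Big]\Big| \le C(\|\Psi\|_\infty)\Big(\mathbb{E}\big|M_h^{Q^N}(t)-M_h^{Q}(t)\big| + \mathbb{E}\big|M_h^{Q^N}(s)-M_h^{Q}(s)\big|\Big)
\]
and
\begin{align*}
\mathbb{E}\big|M_h^{Q^N}(t)-M_h^{Q}(t)\big|
&= \mathbb{E}\Big|h(\tilde X_t^{\nu_{Q^N}})-h(\tilde X_t^{\nu_Q}) + \int_0^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_Q},y,z,\nu_Q(s))\,m_s(dy)\,\rho_s(dz)\,ds\\
&\qquad - \int_0^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_{Q^N}},y,z,\nu_{Q^N}(s))\,m^N_s(dy)\,\rho^N_s(dz)\,ds\Big|.
\end{align*}
By continuity and boundedness of h and convergence of $\tilde X^{\nu_{Q^N}}\to\tilde X^{\nu_Q}$, along with the bounded convergence
theorem,
\[
\mathbb{E}\big|h(\tilde X_t^{\nu_{Q^N}})-h(\tilde X_t^{\nu_Q})\big| \to 0\quad\text{as } N\to\infty.
\]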
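By Assumption (A1) and Proposition C.2 along with the bounded convergence theorem,
\begin{align*}
&\lim_{N\to\infty}\mathbb{E}\Big|\int_0^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_Q},y,z,\nu_Q(s))\,m_s(dy)\,\rho_s(dz)\,ds - \int_0^t\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_{Q^N}},y,z,\nu_{Q^N}(s))\,m^N_s(dy)\,\rho^N_s(dz)\,ds\Big|\\
&\quad\le \lim_{N\to\infty}\mathbb{E}\Big[\int_0^t\Big|\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_Q},y,z,\nu_Q(s))\,m_s(dy)\,\rho_s(dz) - \int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_{Q^N}},y,z,\nu_{Q^N}(s))\,m^N_s(dy)\,\rho^N_s(dz)\Big|\,ds\Big]\\
&\quad= \mathbb{E}\Big[\int_0^t\lim_{N\to\infty}\Big|\int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_Q},y,z,\nu_Q(s))\,m_s(dy)\,\rho_s(dz) - \int_{\mathbb{R}^m}\int_{\mathbb{T}^d}A[h](\tilde X_s^{\nu_{Q^N}},y,z,\nu_{Q^N}(s))\,m^N_s(dy)\,\rho^N_s(dz)\Big|\,ds\Big].
\end{align*}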
By continuity of the coefficients in x and µ from Assumption (A1) and Proposition C.2, along with the
assumed uniform L2 bound on the control and with the fact that the growth in the control is linear, if we
can show that νQN (t) → νQ (t) in P(Rd ) for each t ∈ [0, 1] then this term will vanish by the same triangle-
inequality argument given in the proof of Lemma 6.9. But this follows immediately by the assumption that
QN → Q almost surely and Proposition A.3, and by Chebyshev’s inequality and the same density argument
as at the end of Subsection 6.2.2 we have that Q satisfies (V1).
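Finally we prove that Q satisfies (V4). Again invoking Skorohod's representation theorem to find another
probability space on which the convergence of the random variables $(\tilde X^{\nu_{Q^N}},m^N,\rho^N)\to(\tilde X^{\nu_Q},m,\rho)$ occurs
for almost every ω ∈ Ω,
\begin{align*}
&\mathbb{E}\Big|\int_{\mathbb{T}^d\times[0,t]}\mathcal{L}^1_{\tilde X_s^{\nu_Q},\nu_Q(s)}f(y)\,m(dy\,ds)\Big|\\
&\quad\le \liminf_{N\to\infty}\mathbb{E}\Big|\int_{\mathbb{T}^d\times[0,t]}\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_Q(s)}f(y)\,m^N(dy\,ds)\Big|\quad\text{by Fatou's Lemma}\\
&\quad\le \liminf_{N\to\infty}\mathbb{E}\Big|\int_{\mathbb{T}^d\times[0,t]}\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_{Q^N}(s)}f(y)\,m^N(dy\,ds)\Big|\\
&\qquad + \liminf_{N\to\infty}\mathbb{E}\Big|\int_{\mathbb{T}^d\times[0,t]}\Big(\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_Q(s)}f(y)-\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_{Q^N}(s)}f(y)\Big)m^N(dy\,ds)\Big|\\
&\quad= \liminf_{N\to\infty}\mathbb{E}\Big|\int_{\mathbb{T}^d\times[0,t]}\Big(\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_Q(s)}f(y)-\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_{Q^N}(s)}f(y)\Big)m^N(dy\,ds)\Big|\quad\text{by (V4)}\\
&\quad\le \sup_{N\in\mathbb{N}}\lim_{j\to\infty}\mathbb{E}\Big[\int_{\mathbb{T}^d\times[0,t]}\Big|\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_Q(s)}f(y)-\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_{Q^j}(s)}f(y)\Big|\,m^N(dy\,ds)\Big]\\
&\quad\le \sup_{N\in\mathbb{N}}\mathbb{E}\Big[\int_{\mathbb{T}^d\times[0,t]}\lim_{j\to\infty}\Big|\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_Q(s)}f(y)-\mathcal{L}^1_{\tilde X_s^{\nu_{Q^N}},\nu_{Q^j}(s)}f(y)\Big|\,m^N(dy\,ds)\Big]\\
&\qquad\text{by Assumption (A1) and the Dominated Convergence Theorem}\\
&\quad= 0
\end{align*}
by continuity of $\mathcal{L}^1_{x,\mu}$ in µ via Assumption (A1) and Proposition A.3.
So Q satisfies (V4).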
Lemma 9.1 establishes precompactness of Is defined in (46). Now we will use both Lemmas 9.1 and 9.2
to prove the level sets Is are closed via showing lower-semicontinuity of I.
Lemma 9.3. The functional I given in Equation 16 is lower semi-continuous.
Proof. Consider a sequence {θN } ⊂ P(X ) with limit θ. We wish to show
lim inf I(θN ) ≥ I(θ).
N →∞
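It suffices to consider the case where the left hand side is finite, so there is M ∈ [0, ∞) such that $\liminf_{N\to\infty}I(\theta^N)\le M$. Then, recalling that
\[
I(\theta^N) = \inf_{\Theta^N\in\mathcal{V}:\Theta^N_{\mathcal{X}}=\theta^N}\mathbb{E}^{\Theta^N}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big],
\]
by taking a subsequence of $\{\theta^N\}$ if necessary, we can find measures $\Theta^N$ such that $\Theta^N_{\mathcal{X}}=\theta^N$,
\[
\sup_{N\in\mathbb{N}}\mathbb{E}^{\Theta^N}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] < M+1, \tag{48}
\]
and
\[
I(\theta^N) \ge \mathbb{E}^{\Theta^N}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big] - \frac{1}{N}.
\]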
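Then by Lemma 9.1 we can consider a subsequence along which $\{\Theta^N\}$ converges to some Θ. By Lemma 9.2
Θ is viable. Hence by Fatou's lemma,
\begin{align*}
\liminf_{N\to\infty}I(\theta^N) &\ge \liminf_{N\to\infty}\Big(\mathbb{E}^{\Theta^N}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big]-\frac{1}{N}\Big)\\
&\ge \mathbb{E}^{\Theta}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big]\\
&\ge \inf_{\Theta\in\mathcal{V}:\Theta_{\mathcal{X}}=\theta}\mathbb{E}^{\Theta}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big]\\
&= I(\theta),
\end{align*}
so lower semi-continuity of I is proved.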
10. A More General Regime

For presentation purposes we proved Theorem 3.4 under quite strong boundedness assumptions for the
coefficients. The proofs however make it clear that one can relax such assumptions at the expense of requiring
more precise information on the behavior of the solution to Equation 4 and its derivatives.
In particular, let us make the following set of assumptions:
(A1') For some $\nu_0\in\mathcal{P}_2(\mathbb{R}^d)$, $\frac{1}{N}\sum_{i=1}^N\delta_{x^{i,N}}\to\nu_0$ as N → ∞.
(A2') The system of SDEs (1) admits a unique strong solution.
(A3') There exists a unique invariant measure π(dy|x, µ) satisfying Equation 3, and the centering condition
\[
\int_{\mathbb{T}^d}f(x,y,\mu)\,\pi(dy|x,\mu) = 0,\qquad\forall x\in\mathbb{R}^d,\ \mu\in\mathcal{P}_2(\mathbb{R}^d)
\]
holds.
(A4') There exists a unique function $\Phi:\mathbb{R}^d\times\mathbb{T}^d\times\mathcal{P}_2(\mathbb{R}^d)\to\mathbb{R}^d$, $\Phi=(\Phi_1,\ldots,\Phi_d)$, solving Equation (4).
(A5') For each $x\in\mathbb{R}^d$, $y\in\mathbb{T}^d$, Φ(x, y, ·) is fully $C^2$ in the sense of Definition B.2.
(A6') The partial derivatives in x and y of Φ, ∂µ Φ, and ∇x Φ, ∇y Φ exist.
(A7') There exists p ≥ 2 such that for $\bar X^{i,N}$ controlled by any $u^N\in\mathcal{U}_N$ satisfying $\sup_{N\in\mathbb{N}}\mathbb{E}\big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^N(t)|^2\,dt\big]<\infty$, we have $\sup_{N\in\mathbb{N}}\mathbb{E}\big[\sup_{0\le t\le 1}\frac{1}{N}\sum_{i=1}^N|\bar X_t^{i,N}|^p\big]<\infty$. This assures that for all N, there exists a modification
of $\bar\mu^N\in C([0,1];\mathcal{P}_2(\mathbb{R}^d))$ so that $\bar\mu_t^N$ is in the domain of the coefficients and Φ for all time.
(A8') There exists $p_2>p$ such that for $\bar X^{i,N}$ controlled by any $u^N\in\mathcal{U}_N$ satisfying $\sup_{N\in\mathbb{N}}\mathbb{E}\big[\frac{1}{N}\sum_{i=1}^N\int_0^1|u_i^N(t)|^2\,dt\big]<\infty$, we have $\sup_{N\in\mathbb{N}}\mathbb{E}\big[\int_0^1\frac{1}{N}\sum_{i=1}^N|\bar X_t^{i,N}|^{p_2}\,dt\big]<\infty$.
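(A9') For $(x,y,\nu)\in\mathbb{R}^d\times\mathbb{T}^d\times\mathcal{P}_2(\mathbb{R}^d)$ and α > 0 sufficiently small, the following terms are bounded
by
\[
C\big(1+|x|^p+\nu(|\cdot|^p)\big)
\]
for some C > 0 and p from Assumption (A7'):
i.) $|f(x,y,\nu)|+|b(x,y,\nu)|+|\sigma(x,y,\nu)|^2$
ii.) $|\nabla_x\Phi(x,y,\nu)\sigma(x,y,\nu)|^2+|\Phi(x,y,\nu)|^2|\sigma(x,y,\nu)|^2+\big|[I+\nabla_y\Phi(x,y,\nu)]\sigma(x,y,\nu)\big|^2$
iii.) $\|\partial_\mu\Phi(x,y,\nu)\sigma(v,y,\nu)\|_{L^2(\nu,\mathbb{R}^d)}+\|\partial_\mu\Phi(x,y,\nu)\|_{L^2(\nu,\mathbb{R}^d)}$
iv.) $\big|[I+\nabla_y\Phi(x,y,\nu)]b(x,y,\nu)+\nabla_x\Phi(x,y,\nu)f(x,y,\nu)+A:\nabla_x\nabla_y\Phi(x,y,\nu)\big|$
v.) $\big|[I+\nabla_y\Phi(x,y,\nu)]A(x,y,\nu)+f\otimes\Phi(x,y,\nu)\big|$
vi.)
\begin{align*}
\Big|&\nabla_x\Phi(x,y,\nu)b(x,y,\nu)+A:\Big[\frac{1}{2}\nabla_x\nabla_x\Phi(x,y,\nu)+\nabla_y\partial_\mu\Phi(x,y,\nu)(x)+\alpha\nabla_x\partial_\mu\Phi(x,y,\nu)(x)\Big]\\
&+\int_{\mathbb{R}^d}\Big(\partial_\mu\Phi(x,y,\nu)(v)b(v,y,\nu)+\frac{1}{2}A(v,y,\nu):\big[\partial_v\partial_\mu\Phi(x,y,\nu)(v)+\alpha\partial_\mu^2\Phi(x,y,\nu)(v,v)\big]\Big)\nu(dv)\Big|
\end{align*}
vii.) $\big|[\nabla_x\Phi(x,y,\nu)+\alpha\partial_\mu\Phi(x,y,\nu)(x)]A(x,y,\nu)+b\otimes\Phi(x,y,\nu)\big|$
(A10') For fixed $x,\tilde x\in\mathbb{R}^d$, $y\in\mathbb{T}^d$, the maps $\nu\mapsto\partial_\mu\Phi(x,y,\nu)(\tilde x)$ and $\nu\mapsto\tilde f(x,y,\nu)$ for $\tilde f=b,f,\sigma,\Phi,\nabla_y\Phi,\nabla_x\Phi$,
and $\nabla_x\nabla_y\Phi$ are continuous in $\mathcal{P}(\mathbb{R}^d)$ (or if p > 2 in (A7'), it is sufficient to have continuity in
$(\mathcal{P}_2(\mathbb{R}^d),W_2)$).
(A11') For $x,\hat x\in\mathbb{R}^d$, $y,\hat y\in\mathbb{T}^d$, $\nu\in\mathcal{P}_2(\mathbb{R}^d)$ and p from Assumption (A7') we assume $|\partial_\mu\Phi(x,y,\nu)(\hat x)f(\hat x,\hat y,\nu)|\le C(1+|x|^p+|\hat x|^p+\nu(|\cdot|^p))$.
(A12') Weak uniqueness in the sense of Definition 2.2 holds for the limiting Equation 12.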
Admittedly these assumptions seem somewhat contrived to meet the requirements of our proofs, but
the reason for this presentation is two-fold. The first reason is that these assumptions encapsulate two
major classes of SDEs for the particles: ones where the coefficients are bounded (in which case we can drop
assumption (A8’)), and ones where b is a confining potential and the growth of f and σ in x and µ is not too
fast (as in Section 5). The second is that, with the notable exception of [54], regularity of the Cell Problem
4 with respect to the parameter µ ∈ P2 (Rd ) is not yet well-studied in the existing literature. Thus this
regularity must be studied on a case-by-case basis in applications, though sufficient regularity can be proved
either through explicitly solving for Φ or through the representation offered by Equation 54 in the proof of
Proposition C.2.
The proofs go through in the same manner as under assumptions (A1)- (A8), with a few notable changes.
The first is that the necessary results of Appendix A, C, and Proposition 2.3 are assumed rather than proven.
The second is that the bounds in assumption (A9’) are needed for the proof of tightness in Subsection 6.1.
Those bounds yield the moment dependence on |X̄ti,N |p which then due to Assumption (A7’)-(A8’) allow us
to conclude tightness and then convergence. (A11’) is needed for technical reasons in the proof of Lemma
6.10.
Lastly, the class of viable measures in the definition of the rate function (16) must be modified:
Definition 10.1. We will say Θ ∈ P(W) is in V′ if

(1) Θ satisfies (V1), (V3), and (V4) from Definition 3.1.
(2) Θ satisfies (V2)': $\mathbb{E}^{\Theta}\big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)+\int_0^1|\bar X_t|^p\,dt\big]<\infty$, where p is from Assumption (A7').
Due to this modification, the proofs in Section 9 have a few minor differences: Lemmas 9.1 and 9.2 should
instead be proved under the assumption that there exists K > 0 such that
\[
\mathbb{E}^{Q^N}\Big[\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)+\int_0^1|\tilde X^{\nu_{Q^N}}(t)|^p\,dt\Big] < K, \tag{49}
\]
where here we are using the modified coordinate process notation of Equation 47. The proofs follow in
essentially the same manner using the uniform integrability granted by Assumption (A8'). Then, when
proving Lemma 9.3, we see that in fact the bound offered in Equation 48 implies that Equation 49 holds for
some K by using the same construction of a prelimit system for each $Q^N$ given in Section 8, Fatou's Lemma,
and Assumption (A7').
Thus we have the following theorems:
Theorem 10.2. Under assumptions (A1')-(A12'), the sequence of P(X)-valued random variables $\{\mu^N\}$ as
defined by Equation 5 satisfies the Laplace Principle with good rate function
\[
I(\theta) = \inf_{\Theta\in\mathcal{V}':\Theta_{\mathcal{X}}=\theta}\mathbb{E}^{\Theta}\Big[\frac{1}{2}\int_{\mathbb{R}^m\times[0,1]}|z|^2\rho(dz\,dt)\Big], \tag{50}
\]
where inf(∅) := ∞.
Theorem 10.3. Let $ev:\mathcal{X}\to\mathbb{R}^d$ be the evaluation map at time t and $\{\mu^N\}$ be as defined by Equation
5. Under assumptions (A1')-(A12'), $\mathcal{L}(\mu^N)\to\delta_{\mu^*}$ in P(P(X)), where the deterministic $\mu^*\in\mathcal{P}(\mathcal{X})$ satisfies
$\mu^*\circ ev^{-1}(t)=\mathcal{L}(X_t)$, t ∈ [0, 1], for X solving the McKean-Vlasov SDE (6).
11. Conclusions and Future Work

We have derived a large deviations principle and law of large numbers for the empirical measure of a
system of weakly interacting particles in a two-scale environment in the joint many-particle and averaging
limit. We use weak convergence methods, and obtain a variational form of the rate function. We saw that
for the system (1), the two limiting procedures commute.
The results of this paper bring to light many interesting problems to be explored in future work. Of
significant interest is making the discussion of Section 4 rigorous; in other words, proving the equivalence
of rate functions in variational form, such as from Theorem 3.4 or [11], to rate functions of the form found
in [16]. A related problem is finding methods for solving constrained optimization problems over a space of
measures where the constraint is determined by a class of joint laws for a controlled McKean-Vlasov SDE.
This would allow for finding an explicit form of the rate function I in Theorem 3.4. Finally, an interesting
extension of this work would be to prove a large deviations principle for systems whose coefficients depend
on the "fast empirical measure" $\mu_t^{N,\delta} := \frac{1}{N}\sum_{i=1}^N\delta_{X_t^{i,N}/\delta}$ as well. Such a result would then capture the system
explored in [17], and would perhaps serve to give insight into the nature of bifurcations in the number of
steady states for certain classes of McKean-Vlasov systems as originally investigated in [15].
Appendix A. Preliminary Results on the Prelimit System 7 and the Operator νQ (t)
Proposition A.1. Under assumptions (A1)-(A8), the system of mean-field SDEs 1 admits a unique strong
solution for each N ∈ N.
Proof. We observe that Equation 1 can be written as a standard 2dN -dimensional SDE via
\[
d\hat X^N_t = \Big[ \frac{1}{\delta} \hat f(\hat X^N_t) + \hat b(\hat X^N_t) \Big] dt + \hat\sigma(\hat X^N_t)\, d\hat W^N_t,
\]
where, letting Y_t^{i,N} = X_t^{i,N}/δ, ∀i ∈ {1, ..., N }, and x̂ = (x̂_1 , ..., x̂_{2N})⊤ , x̂_i ∈ Rd , i ∈ {1, ..., 2N }, we have
X̂_t^N = (X_t^{1,N} , · · · , X_t^{N,N} , Y_t^{1,N} , · · · , Y_t^{N,N}).
Let g play the role of f, b or σ. For g : Rd × Td × P2 (Rd ) → Rj , j = d or d × m, here we denote by
ĝ : Rd × Td × (Rd)^N → Rj the empirical projection of the last coordinate of g, i.e. the function satisfying
\[
\hat g(x, y, \beta_1, \dots, \beta_N) = g\Big(x, y, \frac{1}{N}\sum_{i=1}^N \delta_{\beta_i}\Big), \qquad \forall x, \beta_1, \dots, \beta_N \in \mathbb{R}^d,\ y \in \mathbb{T}^d.
\]
One can then verify that (A5) implies that for each N ∈ N, ∃C(N ) such that for all x̂_1 , x̂_2 ∈ (Rd )^{2N} ,
\[
\Big| \frac{1}{\delta}\hat f(\hat x_1) + \hat b(\hat x_1) - \frac{1}{\delta}\hat f(\hat x_2) - \hat b(\hat x_2) \Big| + |\hat\sigma(\hat x_1) - \hat\sigma(\hat x_2)| \le C(N)|\hat x_1 - \hat x_2|,
\]
so that by standard existence and uniqueness results for SDEs with globally Lipschitz coefficients, the
proposition holds. See, for example, Theorem 5.2.1 in [49].
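The empirical projection used in the proof above can be illustrated numerically. The following is only a minimal sketch in dimension d = 1: the coefficient `b` below is a hypothetical example chosen for illustration (it is not one of the paper's coefficients), and a measure is represented simply by the array of its atoms, each carrying weight 1/N.

```python
import numpy as np

def empirical_projection(g):
    """Map g(x, y, mu) to g_hat(x, y, beta_1, ..., beta_N) = g(x, y, (1/N) sum_i delta_{beta_i}).

    Assumption of this sketch: an empirical measure is passed to g as the
    array of its atoms, each atom having weight 1/N.
    """
    def g_hat(x, y, betas):
        atoms = np.asarray(betas, dtype=float)
        return g(x, y, atoms)
    return g_hat

# Hypothetical coefficient (illustration only): 1-periodic in y,
# and depending on mu only through its mean.
def b(x, y, atoms, kappa=0.5):
    return -x + np.cos(2 * np.pi * y) + kappa * (atoms.mean() - x)

b_hat = empirical_projection(b)

# For this b, |b_hat(x, y, beta) - b_hat(x, y, beta')| <= (kappa / sqrt(N)) * |beta - beta'|,
# which is the kind of N-dependent Lipschitz bound used in the proof.
beta1 = np.array([1.0, 3.0, -2.0])
beta2 = np.array([1.5, 2.0, -2.5])
lhs = abs(b_hat(0.3, 0.2, beta1) - b_hat(0.3, 0.2, beta2))
rhs = 0.5 / np.sqrt(len(beta1)) * np.linalg.norm(beta1 - beta2)
```

Here `lhs <= rhs` holds by the Cauchy-Schwarz inequality applied to the difference of the two means.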
Proposition A.2. For X̄^{i,N} as in Equation 7 controlled by any u^N ∈ U_N satisfying almost surely for some
B > 0 the bound sup_{N∈N} (1/N) Σ_{i=1}^N ∫_0^1 |u_i^N(t)|^2 dt < B,
\[
\sup_{N\in\mathbb{N}} E\Big[ \sup_{0\le t\le 1} \Big( \frac{1}{N}\sum_{i=1}^N |\bar X^{i,N}_t|^2 \Big)^q \Big] < \infty, \qquad \forall q > 0.
\]
This ensures that for all N , there exists a modification of µ̄^N ∈ C([0, 1]; P2 (Rd )) so that µ̄^N_t is in the domain
of the coefficients and Φ for all time.
Proof. It clearly suffices to prove the result assuming q ≥ 1. We use the same computations as in the proof
of tightness of Q^N_X in Subsection 6.1.3, but taking h(x) = |x|^2.
Then, by Equation 27 in Subsection 6.1, we get
\[
|\bar X^{i,N}_t|^p = |x^{i,N}|^p + \sum_{k=1}^{8} B^{i,N}_k(t).
\]
Using that for any p,
\[
|\nabla h(x)| = p|x|^{p-1}, \qquad |\nabla\nabla h(x)| = p(p^2 - 2p + d)^{1/2}|x|^{p-2}
\]
and when p = 2, we have ∇∇hxl = 0, l ∈ {1, ..., d}, we apply Cauchy Schwarz to all the inner products with
∇h, ∇∇h, and ∇∇hxl in Bki,N (t), k = 1, 2, 4, 5, 6, 8 and, using Assumption (A1) and Proposition C.2,
N q
1 X i,N 2
E sup |X̄t |
0≤t≤1 N i=1
X N Z t
1
≤ E sup |xi,N |2 + C(d) |xi,N | + |X̄ti,N | + 1 + |X̄si,N | + |uN i,N N
i (s)||X̄s | + |ui (s)|ds
t∈[0,1] N i=1 0
N t
1 X
Z
+ |uN (s)||X̄si,N |ds
N j=1 0 j
t N
1 X
Z
+ X̄si,N ·∂µ Φ(X̄si,N , X̄si,N /δ, µ̄N
s )( X̄ j,N
s )σ(X̄ j,N
s , X̄ j,N
s /δ, µ̄ N
s )dWs
j
0 N j=1
Z t
i,N i,N i,N N i,N i,N N i,N i,N N i
+ X̄t · I + δ∇x Φ(X̄s , X̄s /δ, µ̄s ) + ∇y Φ(X̄s , X̄s /δ, µ̄s ) σ(X̄s , X̄s /δ, µ̄s )dWs
0
Z t q
i,N i,N N i,N i,N N i
+ tr Φ(X̄s , X̄s /δ, µ̄s ) ⊗ (σ(X̄s , X̄s /δ, µ̄s )dWs .
0
Since C(d) is a constant depending only on the assumed bounds on the coefficients and the dimension d,
we can use Young’s inequality on |X̄_t^{i,N}| to get |X̄_t^{i,N}| ≤ γ|X̄_t^{i,N}|^2/2 + 1/(2γ) with γ small enough that C(d)γ/2 < 1
and move that term to the left-hand side. Using also our bound on the initial values assumed in (A6)
and the C_r-inequality, we get
N q
1 X i,N 2
E sup |X̄t |
t∈[0,1] N i=1
N Z t
1 X
≤ C2 (d, q) 1 + E sup 1 + |X̄si,N | + |uN i,N N
i (s)||X̄s | + |ui (s)|ds
t∈[0,1] N i=1 0
N Z q
1 X t N
+ |uj (s)||X̄si,N |ds
N j=1 0
N Z t X N
1 X 1
+E sup X̄si,N · ∂µ Φ(X̄si,N , X̄si,N /δ, µ̄Ns )(X̄ j,N
s )σ(X̄ j,N
s , X̄ j,N
s /δ, µ̄ N
s )dWs
j
t∈[0,1] N i=1 0 N j=1
Z t
i,N i,N i,N N i,N i,N N i,N i,N N i
+ X̄t · I + δ∇x Φ(X̄s , X̄s /δ, µ̄s ) + ∇y Φ(X̄s , X̄s /δ, µ̄s ) σ(X̄s , X̄s /δ, µ̄s )dWs
0
Z t q
+ tr Φ(X̄si,N , X̄si,N /δ, µ̄N
s ) ⊗ (σ( X̄ i,N
s , X̄ i,N
s /δ, µ̄ N
s )dW i
s
0
N Z t
1 X
≤ C3 (d, q) 1 + E sup 1 + |X̄si,N | + |uN i,N N
i (s)||X̄s | + |ui (s)|ds
t∈[0,1] N i=1 0
N Z q N Z q/2
1 X t N 1 X 1 i,N 2

i,N
+ |u (s)||X̄s |ds +E |X̄s | ds by Burkholder-Davis-Gundy inequality
N j=1 0 j N 2 i=1 0
on the martingale terms, Assumption (A1), and Proposition C.2
N Z q 1/2 N Z q 1/2
1 X t i,N 2 1 X t N

2
≤ C4 (d, q) 1 + E sup |X̄s | dt 1+E sup |ui (s)| ds
t∈[0,1] N i=1 0 t∈[0,1] N i=1 0
N Z q/2
1 X 1 i,N 2

+E | X̄ s | ds by Hölder’s, Jensen’s, and Cr inequalities
N 2 i=1 0
X N Z 1 q 1/2 X N Z 1 q 1/2
1 1
≤ C5 (d, q) 1 + E sup |X̄si,N |2 dt 1+E |uN
i (s)| 2
ds
N i=1 0 s∈[0,t] N i=1 0
by Jensen’s inequality, monotonicity of the integrands, and the fact that N ≥ 1
X N Z 1 q 1/2
1 i,N 2 q/2
≤ C5 (d, q) 1 + E sup |X̄ | dt 1+B by the assumed bound on the control
N i=1 0 s∈[0,t] s
N q 2
1 1
X
1 1
Z
≤ C5 (d, q) 1 + E sup |X̄si,N |2 dt + 1 + B q/2
2 0 N i=1 s∈[0,t] 2
by Young’s inequality, Tonelli’s Theorem, and Jensen’s inequality. By Grönwall’s inequality, we can conclude
the proof of the proposition.
We end this section with a proposition regarding the mapping defined in Equation 11.
Proposition A.3. For fixed t ∈ [0, 1], Q 7→ νQ (t) is continuous.
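Proof. Take {Qn } ⊂ P(W) such that Qn → Q and f ∈ Cb (Rd ). Then, since (φ, n, r) 7→ f (φ(t)) ∈ Cb (W) we
get
\[
\lim_{n\to\infty} \int_{\mathbb{R}^d} f(x)\,\nu_{Q_n}(t)(dx) = \lim_{n\to\infty} \int_{\mathcal{W}} f(\phi(t))\,Q_n(d\phi\,dn\,dr) = \int_{\mathcal{W}} f(\phi(t))\,Q(d\phi\,dn\,dr) = \int_{\mathbb{R}^d} f(x)\,\nu_Q(t)(dx).
\]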

Appendix B. On Lions Differentiation

We will need the following two definitions from [12]:
Definition B.1. Given a function u : P2 (Rd ) → R, we may define a lifting of u to ũ : L2 (Ω̃, F̃, P̃; Rd ) → R
via ũ(X) = u(L(X)) for X ∈ L2 (Ω̃, F̃ , P̃; Rd ). Here we assume Ω̃ is a Polish space, F̃ its Borel σ-field,
and P̃ is an atomless probability measure (since Ω̃ is Polish, this is equivalent to every singleton having zero
measure).
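Here, denoting by µ(| · |^r ) := ∫_{Rd} |x|^r µ(dx) for r > 0,
\[
\mathcal{P}_2(\mathbb{R}^d) := \Big\{ \mu \in \mathcal{P}(\mathbb{R}^d) : \mu(|\cdot|^2) = \int_{\mathbb{R}^d} |x|^2 \mu(dx) < \infty \Big\}.
\]
P2 (Rd ) is a Polish space under the L^2-Wasserstein distance
\[
W_2(\mu_1, \mu_2) := \inf_{\pi \in C_{\mu_1,\mu_2}} \Big[ \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2 \pi(dx, dy) \Big]^{1/2},
\]
where C_{µ1,µ2} denotes the set of all couplings of µ1 , µ2 .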

We say u is L-differentiable or Lions-differentiable at µ0 ∈ P2 (Rd ) if there exists a random variable
X0 on some (Ω̃, F̃ , P̃) satisfying the above assumptions such that L(X0 ) = µ0 and ũ is Fréchet differentiable
at X0 .
The Fréchet derivative of ũ can be viewed as an element of L2 (Ω̃, F̃ , P̃; Rd ) by identifying L2 (Ω̃, F̃ , P̃; Rd )
and its dual. From this, one can find that if u is L-differentiable at µ0 ∈ P2 (Rd ), there is a deterministic
measurable function ξ : Rd → Rd such that Dũ(X0 ) = ξ(X0 ), and that ξ is uniquely defined µ0 -almost
everywhere on Rd . We denote this equivalence class of ξ ∈ L2 (Rd , µ0 ; Rd ) by ∂µ u(µ0 ) and call ∂µ u(µ0 )(·) :
Rd → Rd the Lions derivative of u at µ0 . Note that this definition is independent of the choice of X0 and
(Ω̃, F̃ , P̃). See [12] Section 5.2.
To avoid confusion when u depends on more variables than just µ, if ∂µ u(µ0 ) is differentiable at v0 ∈ Rd ,
we denote its derivative at v0 by ∂v ∂µ u(µ0 )(v0 ).
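To make the L^2-Wasserstein distance on P2 (Rd ) concrete, the sketch below computes it between two empirical measures with the same number of atoms in dimension d = 1. It relies on the standard fact that in one dimension the monotone (sorted) coupling is optimal for the quadratic cost; the brute-force search over permutations is included only as a sanity check for small N. The function names are illustrative and do not come from the paper.

```python
import itertools
import numpy as np

def w2_empirical_1d(xs, ys):
    """W_2 between (1/N) sum delta_{x_i} and (1/N) sum delta_{y_i} in d = 1.

    Uses the sorted (monotone) coupling, which is optimal in one dimension.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    return np.sqrt(np.mean((xs - ys) ** 2))

def w2_bruteforce(xs, ys):
    """Same quantity by minimising over all permutation couplings (small N only)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    best = min(
        np.mean((xs - ys[list(perm)]) ** 2)
        for perm in itertools.permutations(range(len(ys)))
    )
    return np.sqrt(best)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(loc=1.0, size=5)
d_sorted = w2_empirical_1d(x, y)
d_brute = w2_bruteforce(x, y)
```

Restricting the brute-force search to permutations is enough here because, for two uniform empirical measures with the same number of atoms, an optimal coupling can always be taken to be a permutation.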
Definition B.2. ([12] Definition 5.83) We say u : P2 (Rd ) → R is Fully C^2 if the following conditions are
satisfied:
(1) u is C^1 in the sense of L-differentiation, and its first derivative has a jointly continuous version
P2 (Rd ) × Rd ∋ (µ, v) 7→ ∂µ u(µ)(v) ∈ Rd .
(2) For each fixed µ ∈ P2 (Rd ), the version of Rd ∋ v 7→ ∂µ u(µ)(v) ∈ Rd from the first condition is
differentiable on Rd in the classical sense and its derivative is given by a jointly continuous function
P2 (Rd ) × Rd ∋ (µ, v) 7→ ∂v ∂µ u(µ)(v) ∈ Rd×d .
(3) For each fixed v ∈ Rd , the version of P2 (Rd ) ∋ µ 7→ ∂µ u(µ)(v) ∈ Rd in the first condition is
continuously L-differentiable component-by-component, with a derivative given by a function
P2 (Rd ) × Rd × Rd ∋ (µ, v, v′ ) 7→ ∂µ^2 u(µ)(v)(v′ ) ∈ Rd×d such that for any µ ∈ P2 (Rd ) and
X ∈ L2 (Ω̃, F̃ , P̃; Rd ) with L(X) = µ, ∂µ^2 u(µ)(v)(X) gives the Fréchet derivative at X of
L2 (Ω̃, F̃, P̃; Rd ) ∋ X′ 7→ ∂µ u(L(X′ ))(v) for every v ∈ Rd . Denoting ∂µ^2 u(µ)(v)(v′ ) by ∂µ^2 u(µ)(v, v′ ),
the map P2 (Rd ) × Rd × Rd ∋ (µ, v, v′ ) 7→ ∂µ^2 u(µ)(v, v′ ) is also assumed to be continuous in the
product topology.
We recall now a useful connection between the Lions derivative as defined in B.1 and the empirical measure.
Proposition B.3. For g : P2 (Rd ) → Rd which is Fully C^2 in the sense of Definition B.2, we can define the
empirical projection of g, as g^N : (Rd )^N → Rd given by
\[
g^N(\beta_1, \dots, \beta_N) := g\Big( \frac{1}{N}\sum_{k=1}^N \delta_{\beta_k} \Big).
\]
Then g^N is twice differentiable on (Rd )^N , and for each β1 , ..., βN ∈ Rd , (i, j) ∈ {1, ..., N }^2 , l ∈ {1, ..., d}
\[
(51)\qquad \nabla_{\beta_i} g^N_l(\beta_1, \dots, \beta_N) = \frac{1}{N}\, \partial_\mu g_l\Big( \frac{1}{N}\sum_{k=1}^N \delta_{\beta_k} \Big)(\beta_i)
\]
and
\[
(52)\qquad \nabla_{\beta_i}\nabla_{\beta_j} g^N_l(\beta_1, \dots, \beta_N) = \frac{1}{N}\, \partial_v\partial_\mu g_l\Big( \frac{1}{N}\sum_{k=1}^N \delta_{\beta_k} \Big)(\beta_i)\, 1_{i=j} + \frac{1}{N^2}\, \partial^2_\mu g_l\Big( \frac{1}{N}\sum_{k=1}^N \delta_{\beta_k} \Big)(\beta_i, \beta_j).
\]
In particular, this holds for Φ(x, y, ·) for fixed x ∈ Rd and y ∈ Td .

Proof. This follows from Propositions 5.35 and 5.91 of [12]. Since by Proposition C.2 Φ is Fully C 2 , it applies
to Φ(x, y, ·).
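Formulas (51) and (52) can be checked by finite differences on a simple example. The sketch below assumes d = 1 and the illustrative functional g(µ) = (∫ sin(x) µ(dx))^2, which is not taken from the paper; for functionals of this form the Lions derivatives are known in closed form: ∂µ g(µ)(v) = 2m cos(v), ∂v ∂µ g(µ)(v) = −2m sin(v) and ∂µ^2 g(µ)(v, v′) = 2 cos(v) cos(v′), where m = ∫ sin(x) µ(dx).

```python
import numpy as np

def g_N(beta):
    """Empirical projection of g(mu) = (integral of sin against mu)^2."""
    return np.mean(np.sin(beta)) ** 2

def grad_formula(beta):
    """Right-hand side of (51): (1/N) * d_mu g(mu^N)(beta_i) for each i."""
    m = np.mean(np.sin(beta))
    return 2.0 * m * np.cos(beta) / len(beta)

def hess_formula(beta):
    """Right-hand side of (52): diagonal d_v d_mu term plus the d_mu^2 term."""
    n = len(beta)
    m = np.mean(np.sin(beta))
    diagonal = np.diag(-2.0 * m * np.sin(beta) / n)
    cross = 2.0 * np.outer(np.cos(beta), np.cos(beta)) / n**2
    return diagonal + cross

def grad_fd(f, beta, h=1e-5):
    """Central finite-difference gradient of a scalar function f."""
    out = np.zeros_like(beta)
    for i in range(len(beta)):
        e = np.zeros_like(beta)
        e[i] = h
        out[i] = (f(beta + e) - f(beta - e)) / (2 * h)
    return out

def hess_fd(beta, h=1e-5):
    """Central finite differences of the gradient formula (51), row by row."""
    n = len(beta)
    out = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        out[j] = (grad_formula(beta + e) - grad_formula(beta - e)) / (2 * h)
    return out

beta = np.random.default_rng(0).normal(size=4)
grad_error = np.max(np.abs(grad_fd(g_N, beta) - grad_formula(beta)))
hess_error = np.max(np.abs(hess_fd(beta) - hess_formula(beta)))
```

Both errors should be at the level of the finite-difference discretization. Note that the Hessian check differentiates the gradient formula rather than g^N itself, so it verifies (52) conditionally on (51).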
Appendix C. On the Operator L1 and Related PDEs

Proposition C.1. Under assumptions (A1)-(A8), the invariant measure π defined by Equation 3 is uniquely
determined for each x ∈ Rd and µ ∈ P2 (Rd ) and has a density π̃.
Proof. This follows immediately from Theorem 6.16 in [52].
Proposition C.2. Under assumptions (A1)-(A8), there is a unique strong solution Φ to equation 4. More-
over, Φ is Fully C 2 in the sense of Definition B.2 and Φ, all second order partial derivatives of Φ in x
and y, and ∂µ Φ(x, y, µ)(·), ∂v ∂µ Φ(x, y, µ)(·), ∇x ∂µ Φ(x, y, µ)(·), ∇y ∂µ Φ(x, y, µ)(·), ∂µ2 Φ(x, y, µ)(·, ·) exist, are
continuous with respect to all variables x, v, v ′ ∈ Rd , y ∈ Td , µ ∈ P(Rd ), and are uniformly bounded L2 (Rd , µ)
with respect to x and y.
Proof. Existence and uniqueness follow directly from Theorems 6.16 and 7.9 in [52].
Consider the frozen process on Td for fixed x ∈ Rd , µ ∈ P2 (Rd ), y ∈ Td given by
\[
(53)\qquad dY^{x,y,\mu}_t = f(x, Y^{x,y,\mu}_t, \mu)\,dt + \sigma(x, Y^{x,y,\mu}_t, \mu)\,d\tilde W_t, \qquad Y^{x,y,\mu}_0 = y,
\]
where W̃t is a d-dimensional, F̃t -adapted Brownian motion on some probability space (Ω̃, F̃ , P̃) satisfying
the usual conditions.
As per Proposition 4.1 in [54] and Section 11.6 in [52], Φ is given by
\[
(54)\qquad \Phi(x, y, \mu) = \int_0^\infty \tilde E\big[ f(x, Y^{x,y,\mu}_s, \mu) \big]\,ds.
\]
Then the fact that Φ is Fully C^2 and smooth in x and y, the boundedness of Φ, and the regularity of
Φ of the same type given in (A3) and (A4) follow from the unique representation of the solution of the cell
problem given by Equation 54 and the regularity assumptions on the coefficients given by (A3), (A4), and (A7) (see, for
example [50], [51] for general results on Euclidean space with no measure dependence, [4] Chapter 3 Section 6
for the case where the fast component is on the torus with no measure dependence, as well as [54] for when Φ
depends on a measure). Differentiability in x and y can also be seen directly from standard interior regularity
theory for elliptic PDEs (see [36], [28], and Proposition 5.1 in [19]). Though second differentiability in µ
is not addressed in [54], it follows by applying the same direct differentiation to the transition density of the
semigroup associated to Equation 53 twice. See [14] and [13] for an example of where estimates on the second
derivative of the transition density in µ are derived under different, weaker assumptions. In addition, in [54]
the fast component of the process is on Euclidean space, so they make use of a certain confining assumption on
the coefficients to prove Lemma 3.6. In our case this assumption is not needed, as the analogous ergodicity
result to Lemma 3.6 is automatically given on a compact space via Theorem 6.16 in [52]. For a proof see
Chapter 3 Section 3 in [4].
Corollary C.3. For fixed x, v ∈ Rd and y ∈ Td , µ 7→ g(x, y, µ) is Lipschitz continuous in (P2 (Rd ), W2 ) for
g = Φ, ∇x Φ, ∇y Φ, or ∇x ∇y Φ.
Proof. From Proposition C.2, we have ∂µ g(x, y, µ)(·) is bounded in L2 (Rd , µ) for all x ∈ Rd , y ∈ Td , µ ∈
P2 (Rd ). This implies that g(x, y, ·) is Lipschitz continuous with respect to W2 for each x ∈ Rd and y ∈ Td
by Remark 5.27 in [12].
Appendix D. Proof of Proposition 5.1

Here we present the proof of Proposition 5.1.
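Proof of Proposition 5.1. Applying Itô’s formula, we get
\[
\frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 = \frac{1}{N}\sum_{i=1}^N \Big\{ (\bar X^{i,N}_0)^4 + \int_0^t \Big( \Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) - \frac{\epsilon}{\delta}\sin(2\pi \bar X^{i,N}_s/\delta) \Big] 4(\bar X^{i,N}_s)^3 + 6\sigma^2 (\bar X^{i,N}_s)^2 \Big) ds + \int_0^t 4\sigma(\bar X^{i,N}_s)^3\, dW^i_s \Big\}.
\]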
δ 0
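Now considering ψ(x, y) = x^3 Φ(y), we have that ψ satisfies
\[
-\epsilon\sin(2\pi y)\frac{\partial}{\partial y}\psi(x, y) + \frac{1}{2}\sigma^2\frac{\partial^2}{\partial y^2}\psi(x, y) = \epsilon\sin(2\pi y)\,x^3.
\]
Letting Ȳ_t^{i,N} = X̄_t^{i,N}/δ and applying Itô’s formula to ψ(X̄_t^{i,N} , Ȳ_t^{i,N}), we get
\[
\begin{aligned}
\psi(\bar X^{i,N}_t, \bar Y^{i,N}_t) = {} & \psi(\bar X^{i,N}_0, \bar Y^{i,N}_0) + \int_0^t \Big( \Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) - \frac{\epsilon}{\delta}\sin(2\pi \bar Y^{i,N}_s) \Big] 3(\bar X^{i,N}_s)^2 \Phi(\bar Y^{i,N}_s) \\
& + \frac{1}{\delta}\Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) \Big] (\bar X^{i,N}_s)^3 \Phi'(\bar Y^{i,N}_s) \\
& + 3\bar X^{i,N}_s \Phi(\bar Y^{i,N}_s)\sigma^2 + \frac{\sigma^2}{\delta}\, 3(\bar X^{i,N}_s)^2 \Phi'(\bar Y^{i,N}_s) \Big) ds \\
& + \int_0^t \Big( 3\sigma(\bar X^{i,N}_s)^2 \Phi(\bar Y^{i,N}_s) + \frac{\sigma}{\delta}(\bar X^{i,N}_s)^3 \Phi'(\bar Y^{i,N}_s) \Big) dW^i_s + \frac{1}{\delta^2}\int_0^t \epsilon\sin(2\pi \bar Y^{i,N}_s)(\bar X^{i,N}_s)^3\, ds.
\end{aligned}
\]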
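Solving for (1/δ) ∫_0^t ε sin(2π Ȳ_s^{i,N})(X̄_s^{i,N})^3 ds and plugging this into our equation for (1/N) Σ_{i=1}^N (X̄_t^{i,N})^4 , we get
\[
\begin{aligned}
\frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 = \frac{1}{N}\sum_{i=1}^N \Big\{ & (\bar X^{i,N}_0)^4 + 4\delta(\bar X^{i,N}_0)^3\Phi(\bar Y^{i,N}_0) - 4\delta(\bar X^{i,N}_t)^3\Phi(\bar Y^{i,N}_t) \\
& + \int_0^t \Big( \Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) \Big] 4(\bar X^{i,N}_s)^3 + 6\sigma^2(\bar X^{i,N}_s)^2 \\
& \quad + 12\delta\Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) - \frac{\epsilon}{\delta}\sin(2\pi \bar Y^{i,N}_s) \Big] (\bar X^{i,N}_s)^2 \Phi(\bar Y^{i,N}_s) \\
& \quad + 4\Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) \Big] (\bar X^{i,N}_s)^3 \Phi'(\bar Y^{i,N}_s) \\
& \quad + 12\delta \bar X^{i,N}_s \Phi(\bar Y^{i,N}_s)\sigma^2 + 12\sigma^2(\bar X^{i,N}_s)^2 \Phi'(\bar Y^{i,N}_s) \Big) ds \\
& + \int_0^t \Big( 4\sigma\big(1 + \Phi'(\bar Y^{i,N}_s)\big)(\bar X^{i,N}_s)^3 + 12\delta\sigma(\bar X^{i,N}_s)^2 \Phi(\bar Y^{i,N}_s) \Big) dW^i_s \Big\}.
\end{aligned}
\]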
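First applying Young’s inequality to 4δ(X̄_0^{i,N})^3 Φ(Ȳ_0^{i,N}), we get |4δ(X̄_0^{i,N})^3 Φ(Ȳ_0^{i,N})| ≤ (3/4)(X̄_0^{i,N})^4 + 4^3 (δΦ(Ȳ_0^{i,N}))^4 .
Similarly, |4δ(X̄_t^{i,N})^3 Φ(Ȳ_t^{i,N})| ≤ (3/4)(X̄_t^{i,N})^4 + 4^3 (δΦ(Ȳ_t^{i,N}))^4 . So
\[
\begin{aligned}
\frac{1}{4} E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 \Big] \le {} & C(D, K, \delta) + E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N \Big\{ \int_0^t \Big( \Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) \Big] 4(\bar X^{i,N}_s)^3 \\
& + 6\sigma^2(\bar X^{i,N}_s)^2 + 12\delta\Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) - \frac{\epsilon}{\delta}\sin(2\pi \bar Y^{i,N}_s) \Big] (\bar X^{i,N}_s)^2 \Phi(\bar Y^{i,N}_s) \\
& + 4\Big[ -(\bar X^{i,N}_s)^3 + \bar X^{i,N}_s + \sigma u^N_i(s) - \kappa(\bar X^{i,N}_s - \nu^N_s) \Big] (\bar X^{i,N}_s)^3 \Phi'(\bar Y^{i,N}_s) + 12\delta \bar X^{i,N}_s \Phi(\bar Y^{i,N}_s)\sigma^2 \\
& + 12\sigma^2(\bar X^{i,N}_s)^2 \Phi'(\bar Y^{i,N}_s) \Big) ds + \int_0^t \Big( 4\sigma\big(1 + \Phi'(\bar Y^{i,N}_s)\big)(\bar X^{i,N}_s)^3 + 12\delta\sigma(\bar X^{i,N}_s)^2 \Phi(\bar Y^{i,N}_s) \Big) dW^i_s \Big\} \Big].
\end{aligned}
\]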
Now we use the explicit form of Φ′ (y) to get a crucial lower bound. We have
\[
\Phi'(y) = -1 + \exp\Big( -\frac{\epsilon}{\pi\sigma^2}\cos(2\pi y) \Big) \Big/ Z > -1.
\]
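The explicit form of Φ′ and the resulting lower bound can be sanity-checked numerically. The sketch below makes two assumptions that are not spelled out at this point of the text: the parameter values are hypothetical, and the normalizing constant is taken to be Z = ∫_0^1 exp(−(ε/(πσ^2)) cos(2πy)) dy, which is the value that makes Φ′ integrate to zero over one period (as it must for a periodic Φ). Under these assumptions it checks that Φ′ solves the cell equation −ε sin(2πy)Φ′ + (σ^2/2)Φ′′ = ε sin(2πy), and that the infimum of Φ′ equals −1 + exp(−ε/(πσ^2))/Z.

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
eps, sigma = 1.0, 0.7
a = eps / (np.pi * sigma**2)

# Uniform grid on [0, 1); averaging over it is very accurate for smooth periodic functions.
y = np.arange(4096) / 4096
w = np.exp(-a * np.cos(2 * np.pi * y))

# Assumed normalisation: Z = integral over [0, 1] of exp(-a cos(2 pi y)).
Z = w.mean()

dPhi = -1.0 + w / Z                                      # Phi'(y)
d2Phi = 2 * np.pi * a * np.sin(2 * np.pi * y) * w / Z    # Phi''(y), differentiated by hand

# Residual of the cell equation written in terms of Phi' and Phi''.
residual = (-eps * np.sin(2 * np.pi * y) * dPhi
            + 0.5 * sigma**2 * d2Phi
            - eps * np.sin(2 * np.pi * y))

# Candidate constant in the lower bound inf Phi' >= -1 + gamma (minimum attained at y = 0).
gamma = np.exp(-a) / Z
```

With this normalization any γ in (0, exp(−ε/(πσ^2))/Z] is admissible in the bound that follows.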
Then taking γ > 0 such that inf y∈[0,1] Φ′ (y) ≥ −1 + γ, we get
N
1 1 X i,N 4
E sup (X̄ )
4 t∈[0,1] N i=1 t
N Z t
1 X
≤ C(D, K, δ) + E sup −(X̄si,N )3 + X̄si,N + σuN i (s) − κ( X̄ i,N
s − ν N
s ) 4(X̄si,N )3
t∈[0,1] N i=1 0

ǫ
+ 6σ 2 (X̄si,N )2 + 12δ −(X̄si,N )3 + X̄si,N + σuN i (s) − κ( X̄ i,N
s − ν N
s ) − sin(2π Ȳs
i,N
) (X̄si,N )2 Φ(Ȳsi,N )
δ

+ 4 (1 − γ)(X̄si,N )6 + K(X̄si,N )4 + K|σuN i (s)|| X̄ i,N 3
s | + (1 − γ)κ( X̄ i,N 4
s ) + Kκ|ν N
s || X̄ i,N 3
s |

+ 12δ X̄si,N Φ(Ȳsi,N )σ 2 + 12σ 2 (X̄si,N )2 K ds
Z t
+ σ4(1 + Φ′ (Ȳsi,N ))(X̄si,N )3 + 12δσ(X̄si,N )2 Φ(Ȳsi,N ) dWsi
0
N Z t
1 X
≤ C(D, K, δ) + E sup 4 −γ(X̄si,N )6 + (1 + K)(X̄si,N )4 + (1 + K)|σuN i,N 3
i (s)||X̄s |
t∈[0,1] N i=1 0

− γκ(X̄s ) + (1 + K)κ|νs ||X̄s | + 6σ 2 (X̄si,N )2
i,N 4 N i,N 3

ǫ
+ 12δ −(X̄s ) + X̄s + σui (s) − κ(X̄s − νs ) − sin(2π Ȳs ) (X̄si,N )2 Φ(Ȳsi,N )
i,N 3 i,N N i,N N i,N
δ

+ 12δ X̄si,N Φ(Ȳsi,N )σ 2 + 12σ 2 (X̄si,N )2 K ds
Z t
+ σ4(1 + Φ′ (Ȳsi,N ))(X̄si,N )3 + 12δσ(X̄si,N )2 Φ(Ȳsi,N ) dWsi .
0
Now by Young’s and Jensen’s inequalities (in particular using repeatedly that (ν_t^N)^2 ≤ (1/N) Σ_{k=1}^N (X̄_t^{k,N})^2 )
as well as the Burkholder-Davis-Gundy inequality on the martingale terms,
N
1 1 X i,N 4
E sup (X̄ )
4 t∈[0,1] N i=1 t
N Z t
1 X
≤ C(D, K, δ) + E sup −4γ(X̄si,N )6 + 4(1 + K − γκ)(X̄si,N )4
t∈[0,1] N i=1 0

3 α
+ 16 (1 + K)2 |σuN i (s)| 2
+ [(1 + K)κ] 2
(X̄ i,N 2
s ) + 9[δ(1 + κ)K] 2
+ |X̄si,N |6
2α 2

5 β
+ 36 σ 4 + 4[δK]2 |σuN 2 2 i,N 2
i (s)| + 4[κδK] (X̄s ) + 4[ǫK] + 4[σ K]
2 2 2
+ (X̄si,N )4
2β 2
Z t 1/2
1 1
+ 12δ|X̄si,N |5 K + [12δKσ 2 ]2 + (X̄si,N )2 ds + 4σc (1 + K 2 )(X̄si,N )6 ds
2 2 0
Z t 1/2
+ 12δσc K 2 (X̄si,N )4 ds
0
N Z t
1 X
≤ C(D, K, δ) + E sup −4γ(X̄si,N )6 + 4(1 + K − γκ)(X̄si,N )4
t∈[0,1] N i=1 0

3 α
+ 16 (1 + K)2 |σuN i (s)| 2
+ [(1 + K)κ] 2
(X̄ i,N 2
s ) + 9[δ(1 + κ)K] 2
+ |X̄si,N |6
2α 2

5 β
+ 36 σ 4 + 4[δK]2 |σuN 2 2 i,N 2
i (s)| + 4[κδK] (X̄s ) + 4[ǫK] + 4[σ K]
2 2 2
+ (X̄si,N )4
2β 2
2 Z t
1 1 [4σc] α
+ 12δ|X̄si,N |5 K + [12δKσ 2 ]2 + (X̄si,N )2 ds + + (1 + K 2 )(X̄si,N )6 ds
2 2 2α 2 0
[12δσc]2 β t 2 i,N 4
Z
+ + K (X̄s ) ds ,
2β 2 0
where α, β > 0 are to be chosen. To simplify things, since δ ↓ 0 as N → ∞ and all terms are increasing
in δ (including C(D, K, δ)), we may assume N is large enough that δ ≤ 1. This, along with our bound on
E (1/N) Σ_{i=1}^N ∫_0^1 |u_i^N(t)|^2 dt, gives
N
1 1 X i,N 4
E sup (X̄t )
4 t∈[0,1] N i=1
N Z t
1 X
≤ C(D, K) + E sup −4γ(X̄si,N )6 + 4(1 + K − γκ)(X̄si,N )4
t∈[0,1] N i=1 0

3 α
+ 16 [(1 + K)σ] B + [(1 + K)κ] (X̄s ) + 9[(1 + κ)K] + |X̄si,N |6
2 2 i,N 2 2
2α 2

5 β
+ 36 σ 4 + 4[σK]2 B + 4[κK]2 (X̄si,N )2 + 4[ǫK]2 + 4[σ 2 K]2 + (X̄si,N )4
2β 2
Z t
1 1 1 α
+ 12|X̄si,N |5 K + [12Kσ 2 ]2 + (X̄si,N )2 ds + [4σc]2 + (1 + K 2 )(X̄si,N )6 ds
2 2 2α 0 2
Z t
1 β 2 i,N 4
+ [12σc]2 + K (X̄s ) ds
2β 0 2
N Z t
1 X
= C(D, K) + E sup (α(1 + K 2 /2) − 4γ)(X̄si,N )6
t∈[0,1] N i=1 0
+ (β(1 + K 2 )/2 + 4(1 + K) − 4γκ)(X̄si,N )4

3
+ 16 [(1 + K)σ]2 B + [(1 + K)κ]2 (X̄si,N )2 + 9[(1 + κ)K]2
2α

5
+ 36 σ 4 + 4[σK]2 B + 4[κK]2 (X̄si,N )2 + 4[ǫK]2 + 4[σ 2 K]2
2β

1 1 1 1
+ 12|X̄si,N |5 K + [12Kσ 2 ]2 + (X̄si,N )2 ds + [4σc]2 + [12σc]2 .
2 2 2α 2β
Letting α = 3γ/(1 + K 2 /2), β = 8γκ/(1 + K 2 ), we get

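\[
\frac{1}{4} E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 \Big] \le C(D, K) + E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N \int_0^t \Big( -\gamma(\bar X^{i,N}_s)^6 + 4(1+K)(\bar X^{i,N}_s)^4 + C(K, \kappa)(\bar X^{i,N}_s)^2 + 12K|\bar X^{i,N}_s|^5 \Big) ds \Big] + C(K, \sigma, \kappa, \epsilon, B, c, \gamma),
\]
where we used t ≤ 1 to pull the constant out of the integral. Once again by Young’s inequality, we have
C(K, κ)x^2 ≤ C(K, κ)^2/2 + x^4/2 and 12K|x|^5 ≤ (12K/α)^6 + (5/6)α^{6/5} x^6 for any α > 0. Then letting α = (3γ/5)^{5/6} ,
we get
\[
(55)\qquad \frac{1}{4} E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 \Big] \le E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N \int_0^t \Big( -\frac{\gamma}{2}(\bar X^{i,N}_s)^6 + C_2(K, \kappa)(\bar X^{i,N}_s)^4 \Big) ds \Big] + C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma).
\]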
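Now we have
\[
\begin{aligned}
E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 \Big] & \le 4C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma) + 4C_2(K, \kappa)\, E\Big[ \sup_{t\in[0,1]} \int_0^t \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_s)^4\, ds \Big] \\
& \le 4C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma) + 4C_2(K, \kappa) \int_0^1 E\Big[ \sup_{s\in[0,t]} \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_s)^4 \Big] dt,
\end{aligned}
\]
so by Grönwall’s inequality,
\[
(56)\qquad E\Big[ \sup_{t\in[0,1]} \frac{1}{N}\sum_{i=1}^N (\bar X^{i,N}_t)^4 \Big] \le 4C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma) \exp\big( 4C_2(K, \kappa) \big) < \infty.
\]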
Also, by performing the exact same proof without the supremum over time in the expectation (using Jensen’s
inequality and the Itô isometry instead of the Burkholder-Davis-Gundy inequality for the martingale terms), we get
from Equation 55 that for each t ∈ [0, 1],
\[
\begin{aligned}
\gamma E\Big[ \frac{1}{N}\sum_{i=1}^N \int_0^t (\bar X^{i,N}_s)^6\, ds \Big] & \le 2C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma) + 2C_2(K, \kappa)\, E\Big[ \frac{1}{N}\sum_{i=1}^N \int_0^t (\bar X^{i,N}_s)^4\, ds \Big] \\
& \le 2C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma) + 8C_2(K, \kappa)\, C(D, K, \sigma, \kappa, \epsilon, B, c, \gamma) \exp\big( 4C_2(K, \kappa) \big) \quad \text{by Equation 56} \\
& < \infty.
\end{aligned}
\]
Recalling that K and γ are just fixed constants from the bounds on Φ and its derivatives and c is the fixed
constant from applying the Burkholder-Davis-Gundy inequality, we are done.
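The moment bound just proved can be explored numerically. The following is only a rough Euler-Maruyama sketch of the uncontrolled system (u ≡ 0), with drift −x^3 + x − κ(x − ν) − (ε/δ) sin(2πx/δ) and constant diffusion σ as read off from the Itô computation above, where ν is taken to be the empirical mean. All parameter values, the initial law, the step size, and the function name are assumptions made for illustration; the simulation is not part of the proof and says nothing rigorous about uniformity in N.

```python
import numpy as np

def sup_fourth_moment(N=200, delta=0.1, eps=0.5, sigma=1.0, kappa=1.0,
                      dt=1e-3, T=1.0, seed=0):
    """Euler-Maruyama estimate of sup_t (1/N) sum_i (X_t^i)^4 along one sample path.

    Hypothetical setup: u = 0, N(0, 0.25) initial conditions, nu = empirical mean.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 0.5, size=N)
    running_sup = np.mean(X**4)
    for _ in range(int(round(T / dt))):
        nu = X.mean()
        drift = (-X**3 + X - kappa * (X - nu)
                 - (eps / delta) * np.sin(2 * np.pi * X / delta))
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=N)
        running_sup = max(running_sup, np.mean(X**4))
    return running_sup

# Decreasing delta together with increasing N, mimicking the joint limit.
estimates = [sup_fourth_moment(N=n, delta=d) for n, d in [(100, 0.2), (200, 0.1), (400, 0.05)]]
```

The explicit scheme is only reliable when dt is small relative to δ/ε and to the cubic drift, so much smaller values of δ would require a smaller step.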
Appendix E. Proof of Theorem 5.2

We begin with the proof of Theorem 5.2.
Proof. As a consequence of (V̂ 4), we get for Θ̂ ∈ V̂,
\[
(57)\qquad \hat\Theta\Big( (\phi, \hat r) \in \hat{\mathcal{W}} : \hat r_t(dy\,dz) = \gamma(dz|y, t)\,\pi(dy|\phi(t), \hat\nu_{\hat\Theta}(t)) \text{ for some stochastic kernel } \gamma,\ \forall t \in [0, 1] \Big) = 1.
\]
This follows in essentially the same way as in Remark 6.7. Then we can write Equation 21 as:
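\[
(58)\qquad d\hat X_t = \frac{1}{Z\hat Z}\Big[ -\hat X_t^3 + (1-\kappa)\hat X_t + \kappa\int_{\mathbb{R}} x\,\mathcal{L}(\hat X_t)(dx) \Big] dt + \sigma\sqrt{\frac{1}{Z\hat Z}}\, dW_t + \sigma\int_{\mathbb{T}}\int_{\mathbb{R}} z\big[ \Phi'(y) + 1 \big]\gamma(dz|y, t)\,\pi(dy)\,dt.
\]
We recall here for convenience that, using Equation 20 and Remark 6.7, the corresponding limiting
controlled system under the original control formulation on W from Equation 12 is given by:
\[
(59)\qquad d\tilde X_t = \frac{1}{Z\hat Z}\Big[ -\tilde X_t^3 + (1-\kappa)\tilde X_t + \kappa\int_{\mathbb{R}} x\,\mathcal{L}(\tilde X_t)(dx) \Big] dt + \sigma\sqrt{\frac{1}{Z\hat Z}}\, dW_t + \frac{\sigma}{Z\hat Z}\int_{\mathbb{R}} z\,\rho_t(dz)\,dt.
\]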
To prove the equivalence of I given in Theorems 3.4 and 10.2 and Iˆ given in Theorem 5.2, we prove that
given any θ ∈ P(X ), we can construct from Θ ∈ V some Θ̂ ∈ V̂ with the expression inside the infimum in
the definition of Iˆ less than the expression inside the infimum in the definition of I, and vice versa. Then
this construction holds for any Θ or Θ̂ where the infimum is attained, and the result will be proved.
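Fix θ ∈ P(X ) such that I(θ) < ∞. Given Θ ∈ V such that ΘX = θ, we construct Θ̂ ∈ V̂ with Θ̂X = θ
such that
\[
E^{\hat\Theta}\Big[ \frac{1}{2}\int_{\mathbb{T}\times\mathbb{R}\times[0,1]} |z|^2 \hat\rho(dy\,dz\,dt) \Big] \le E^{\Theta}\Big[ \frac{1}{2}\int_{\mathbb{R}\times[0,1]} |z|^2 \rho(dz\,dt) \Big].
\]
To do this, simply set
\[
\hat\Theta_{\hat{\mathcal{Y}}}(B) = \Theta_{\mathcal{Z}}\big( \{ r \in \mathcal{Z} : \exists\, \hat r \in B \text{ where } \hat r(dy\,dz\,dt) = r(dz\,dt)\,\pi(dy) \} \big)
\]
for all B ∈ Ŷ.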

What we are really doing here is embedding Z and Ŷ into the product measure space (Z × Ŷ, B(Z × Ŷ))
and defining a probability measure Θ̄ on this space by
\[
\bar\Theta(E \times F) = \int_E \int_F \delta_{p(r)}(d\hat r)\,\Theta_{\mathcal{Z}}(dr)
\]
for E × F ∈ B(Z × Ŷ) and p : Z → Ŷ defined by p(r(dzdt)) = r(dzdt) ⊗ π(dy). Then, Θ̄ has first marginal
ΘZ and we define its second marginal to be Θ̂Ŷ . Due to the fact that p is injective, we can quickly see that
Θ̄ is indeed a probability measure on the product space, so that this is a well-defined way of constructing
Θ̂Ŷ .
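Then the equality in distribution
\[
\begin{aligned}
\sigma\int_0^s\int_{\mathbb{T}\times\mathbb{R}} z\big[ \Phi'(y) + 1 \big]\hat\rho_t(dy\,dz)\,dt & = \sigma\int_0^s \int_{\mathbb{T}} \big[ \Phi'(y) + 1 \big]\pi(dy) \int_{\mathbb{R}} z\,\rho_t(dz)\,dt \\
& = \frac{\sigma}{Z\hat Z}\int_0^s\int_{\mathbb{R}} z\,\rho_t(dz)\,dt \quad \text{by Equation 20}
\end{aligned}
\]
holds for all s ∈ [0, 1]. In other words, the dynamics of Equation 58 under Θ̂ and Equation 59 under Θ are
the same. The fact that Θ̂ ∈ V̂ then follows from the fact that Θ ∈ V. In addition,
\[
\begin{aligned}
E^{\hat\Theta}\Big[ \frac{1}{2}\int_{\mathbb{T}\times\mathbb{R}\times[0,1]} |z|^2 \hat\rho(dy\,dz\,dt) \Big] & = E^{\Theta}\Big[ \frac{1}{2}\int_{\mathbb{T}}\int_{\mathbb{R}\times[0,1]} |z|^2 \rho(dz\,dt)\,\pi(dy) \Big] \\
& = E^{\Theta}\Big[ \frac{1}{2}\int_{\mathbb{R}\times[0,1]} |z|^2 \rho(dz\,dt) \Big].
\end{aligned}
\]
Now fix θ ∈ P(X ) such that Î(θ) < ∞. Given Θ̂ ∈ V̂ such that Θ̂X = θ, we construct Θ ∈ V with ΘX = θ
such that
\[
E^{\Theta}\Big[ \frac{1}{2}\int_{\mathbb{R}\times[0,1]} |z|^2 \rho(dz\,dt) \Big] \le E^{\hat\Theta}\Big[ \frac{1}{2}\int_{\mathbb{T}\times\mathbb{R}\times[0,1]} |z|^2 \hat\rho(dy\,dz\,dt) \Big].
\]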
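The fact that Θ ∈ V already fixes Θ_{X×Y} . Then we set
\[
\Theta_{\mathcal{Z}}(C) = \hat\Theta_{\hat{\mathcal{Y}}}\Big( \Big\{ \hat r \in \hat{\mathcal{Y}} : \exists\, r \in C \text{ where } r_t(dz) = Z\hat Z\, \delta_{\int_{\mathbb{T}\times\mathbb{R}} [\Phi'(y)+1]\hat z\, \hat r_t(dy\,d\hat z)}(dz), \text{ for a.e. } t \in [0, 1] \Big\} \Big)
\]
for all C ∈ Z. Notice that in our definition of Z we do not require rt to be a probability measure (the
proof of tightness in Subsection 6.1.3 still holds in the larger space of positive Borel measures on Rm × [0, 1]
with finite first moment rather than just probability measures with finite first moment since {Q^N_Z}_{N∈N} is
uniformly bounded in the total variation norm, see for example Theorem 8.6.2 in [8]), and that since Φ′ (y)
is bounded,
\[
\begin{aligned}
\int_0^1\int_{\mathbb{R}} Z\hat Z\,|z|\,\delta_{\int_{\mathbb{T}\times\mathbb{R}} [\Phi'(y)+1]\hat z\, \hat r_t(dy\,d\hat z)}(dz)\,dt & = Z\hat Z\int_0^1 \Big| \int_{\mathbb{T}\times\mathbb{R}} [\Phi'(y)+1]\hat z\, \hat r_t(dy\,d\hat z) \Big| dt \\
& \le Z\hat Z\, C \int_0^1\int_{\mathbb{T}\times\mathbb{R}} |\hat z|\, \hat r_t(dy\,d\hat z)\,dt \\
& < \infty \quad \text{for } \hat r \in \hat{\mathcal{Y}}.
\end{aligned}
\]
So indeed for r̂ ∈ Ŷ, r(dzdt) = Z Ẑ δ_{∫_{T×R} [Φ′(y)+1]ẑ r̂_t(dydẑ)}(dz)dt is in Z. The equality in distribution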
\[
\frac{\sigma}{Z\hat Z}\int_0^s\int_{\mathbb{R}} z\,\rho_t(dz)\,dt = \sigma\int_0^s\int_{\mathbb{T}\times\mathbb{R}} [\Phi'(y) + 1]\hat z\,\hat\rho_t(dy\,d\hat z)\,dt
\]
holds for all s ∈ [0, 1]. In other words, the dynamics of Equation 58 under Θ̂ and Equation 59 under Θ are
the same. The fact that Θ ∈ V then follows from the fact that Θ̂ ∈ V̂. In addition,
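\[
\begin{aligned}
E^{\Theta}\Big[ \frac{1}{2}\int_{\mathbb{R}\times[0,1]} |z|^2 \rho(dz\,dt) \Big] & = E^{\hat\Theta}\Big[ \frac{Z\hat Z}{2}\int_0^1 \Big| \int_{\mathbb{T}\times\mathbb{R}} [\Phi'(y) + 1]\hat z\,\hat\rho_t(dy\,d\hat z) \Big|^2 dt \Big] \\
& = E^{\hat\Theta}\Big[ \frac{Z\hat Z}{2}\int_0^1 \Big| \int_{\mathbb{T}}\int_{\mathbb{R}} [\Phi'(y) + 1] z\,\gamma(dz|y, t)\,\pi(dy) \Big|^2 dt \Big] \quad \text{by Equation 57} \\
& \le E^{\hat\Theta}\Big[ \frac{Z\hat Z}{2}\int_0^1 \int_{\mathbb{T}} |\Phi'(y) + 1|^2 \pi(dy) \int_{\mathbb{T}} \Big| \int_{\mathbb{R}} z\,\gamma(dz|y, t) \Big|^2 \pi(dy)\,dt \Big] \quad \text{by Hölder's inequality} \\
& = E^{\hat\Theta}\Big[ \frac{1}{2}\int_0^1 \int_{\mathbb{T}} \Big| \int_{\mathbb{R}} z\,\gamma(dz|y, t) \Big|^2 \pi(dy)\,dt \Big] \quad \text{by Equation 20} \\
& \le E^{\hat\Theta}\Big[ \frac{1}{2}\int_0^1 \int_{\mathbb{T}}\int_{\mathbb{R}} |z|^2 \gamma(dz|y, t)\,\pi(dy)\,dt \Big] \quad \text{by Jensen's inequality} \\
& = E^{\hat\Theta}\Big[ \frac{1}{2}\int_{\mathbb{T}\times\mathbb{R}\times[0,1]} |z|^2 \hat\rho(dy\,dz\,dt) \Big].
\end{aligned}
\]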
The rigorous construction of ΘZ in this direction follows in the same way as for the previous one, but
this time p : Ŷ → Z is given by p(r̂) = Z Ẑ δ_{∫_{T×R} [Φ′(y)+1]ẑ r̂_t(dydẑ)}(dz)dt. The issue here is that for
two different r̂^1 , r̂^2 ∈ Ŷ, we could have ∫_{T×R} [Φ′(y) + 1]ẑ r̂_t^1(dydẑ) = ∫_{T×R} [Φ′(y) + 1]ẑ r̂_t^2(dydẑ), so p is
not injective. However, this can be overcome by defining the equivalence relation on Ŷ by r̂^1 ∼ r̂^2 if
∫_{T×R} [Φ′(y) + 1]ẑ r̂_t^1(dydẑ) = ∫_{T×R} [Φ′(y) + 1]ẑ r̂_t^2(dydẑ). Then we can take the quotient measure space Ŷ/ ∼
and corresponding measure Θ̂Ŷ / ∼ and define ΘZ as above, but with Θ̂Ŷ replaced by Θ̂Ŷ / ∼ (see, for example,
Section 3 of [55]). Since ρ̂ only shows up in the dynamics of Equation 58 as ∫_0^s ∫_{T×R} [Φ′(y) + 1]ẑ ρ̂_t(dydẑ)dt,
Θ still satisfies (V1) by virtue of Θ̂ satisfying (V̂ 1), and the rest of the analysis still follows.
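The two places where Equation 20 enters the proof above amount to the identities ∫_T [Φ′(y) + 1] π(dy) = 1/(Z Ẑ) and ∫_T |Φ′(y) + 1|^2 π(dy) = 1/(Z Ẑ). The sketch below checks them numerically under assumptions that are not restated in this appendix: π is taken to have density exp((ε/(πσ^2)) cos(2πy))/Ẑ on the torus (the invariant density of the fast motion dY = −ε sin(2πY) dt + σ dW), Z and Ẑ are taken to be the corresponding normalizing integrals, and the parameter values are hypothetical.

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
eps, sigma = 1.0, 0.7
a = eps / (np.pi * sigma**2)

# Uniform periodic grid on [0, 1); grid averages approximate integrals over the torus.
y = np.arange(4096) / 4096
c = np.cos(2 * np.pi * y)

Z = np.exp(-a * c).mean()        # assumed: Z = integral of exp(-a cos(2 pi y))
Z_hat = np.exp(a * c).mean()     # assumed: Z_hat = integral of exp(+a cos(2 pi y))

pi_density = np.exp(a * c) / Z_hat       # assumed invariant density of the fast motion
one_plus_dPhi = np.exp(-a * c) / Z       # Phi'(y) + 1 from the explicit formula

first_moment = (one_plus_dPhi * pi_density).mean()       # integral of (Phi' + 1) d pi
second_moment = (one_plus_dPhi**2 * pi_density).mean()   # integral of (Phi' + 1)^2 d pi
target = 1.0 / (Z * Z_hat)
```

Under these assumptions the first identity holds pointwise, since [Φ′(y) + 1] times the density of π is the constant 1/(Z Ẑ), which is why the averaged control term in Equation 59 carries the factor σ/(Z Ẑ).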
Remark E.1. While reading Section 6, one might realize that, due to the linear interaction of the prelimit
process with the second and third marginals of the occupation measures {Q^N}_{N∈N} , there is an equally viable
choice of occupation measures which joins the relaxed controls and the prelimit invariant measure to lie on
one space. Namely:
\[
\hat Q^N_\omega(A \times B) = \frac{1}{N}\sum_{i=1}^N \delta_{\bar X^{i,N}(\cdot,\omega)}(A)\,\delta_{\hat\rho^{i,N}(\omega)}(B)
\]
for A × B ∈ B(Ŵ), ω ∈ Ω, and ρ̂^{i,N}(ω) ∈ Ŷ given by
\[
\hat\rho^{i,N}(\omega)(I \times C \times D) := \int_I \delta_{(\bar X^{i,N}(t,\omega)/\delta) \bmod 1}(C)\,\delta_{u^N_i(t,\omega)}(D)\,dt
\]
for I ∈ B([0, 1]), C ∈ B(Td ), D ∈ B(Rm ). Here Ŵ is as in Theorem 5.2.
Indeed, with such a choice the proofs in Sections 6, 7, and 9 go through in essentially the same way. It
is a well known result that the rate function for the Laplace Principle of a sequence of random variables on
a given space is unique (see, for example, Theorem 1.3.1 in [21]). The proof of Theorem 5.2 shows that, at
least in a certain subclass of our system, the rate functions attained from the use of {Q̂N }N ∈N and {QN }N ∈N
agree. It is thus possibly a viable option to use {Q̂N }N ∈N to attain a rate function of the form offered in
Theorem 5.2 for the general system.
The reason why we instead make the choice of occupation measures described by Equation 23 is because
it makes the proof of the upper bound in Section 8 much easier. We would not be able to exploit weak-sense
uniqueness (see Definition 2.2) as we did if we had used {Q̂N }N ∈N .
References
[1] A. Ansari, Mean first passage time solution of the Smoluchowski equation: Application of relaxation dynamics in myoglobin,
Journal of Chemical Physics 112 (2000), no. 5, 2516–2522.
[2] P. Baldi, Large deviations for diffusions processes with homogenization and applications, Annals of Probability 19 (1991),
no. 2, 509–524.
[3] J. Barré, C. Bernardin, R. Chétrite, Y. Chopra, and M. Mariani, Gamma convergence approach for the large deviations of
the density in systems of interacting diffusion processes (2019), available at arXiv:1910.04026 [math.AP].
[4] A. Bensoussan, J. L. Lions, and G. Papanicolau, Asymptotic Analysis for Periodic Structures, North Holland, Amsterdam,
1978.
[5] P. Billingsley, Convergence of Probability Measures, Wiley, NY, 1999.
[6] J. Binney and S. Tremaine, Galactic Dynamics, Princeton University Press, Princeton, 2008.
[7] J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes, Funnels, pathways and the energy landscape of protein
folding: A synthesis, Proteins 21 (1995), no. 3, 167–195.
[8] V. I. Bogachev, Measure Theory, Vol. 2, Springer, NY, 2007.
[9] V. Borkar and V. Gaitsgory, Averaging of singularly perturbed controlled stochastic differential equations, Applied Math-
ematics and Optimization 56 (2007), no. 2, 169–209.
[10] A. Budhiraja and P. Dupuis, A variational representation for positive functionals of infinite dimensional Brownian motion,
Probab. Math. Statist. 20 (2001).
[11] A. Budhiraja, P. Dupuis, and M. Fischer, Large deviation properties of weakly interacting particles via weak convergence
methods, The Annals of Probability 40 (2012), 74–100.
[12] R. Carmona and F. Delarue, Probabilistic Theory of Mean Field Games with Applications I, Springer, NY, 2018.
[13] P. E. Chaudru de Raynal and N. Frikha, From the backward Kolmogorov PDE on the Wasserstein space to propagation
of chaos for McKean-Vlasov SDEs (2019), available at arXiv:1907.01410 [math.AP].
[14] , Well-posedness for some non-linear diffusion processes and related PDE on the Wasserstein space (2018), available
at arXiv:1811.06904 [math.CA].
[15] D. A. Dawson, Critical dynamics and fluctuations for a mean-field model of cooperative behavior, J. Stat. Phys. 31 (1983),
29–85.
[16] D. A. Dawson and J. Gärtner, Large deviations from the mckean-vlasov limit for weakly interacting diffusions, Stochastics
20 (1987), no. 4, 247–308.
[17] M. G. Delgadino, R. S. Gvalani, and G. A. Pavliotis, On the diffusive-mean field limit for weakly interacting diffusions
exhibiting phase transitions (2020), available at arXiv:2001.03920 [math.AP].
[18] R. M. Dudley, Real Analysis and Probability, Cambridge University Press, Cambridge, 2010.
[19] A. B. Duncan and G. A. Pavliotis, Well-posedness for some non-linear diffusion processes and related PDE on the Wasser-
stein space (2016), available at arXiv:1605.05854 [math.ph].
[20] P. Dupuis and K. Spiliopoulos, Large deviations for multiscale diffusion via weak convergence methods, Stochastic Processes
and their Applications 122 (2012), no. 4, 1947–1987.
[21] P. Dupuis and R. S. Ellis, A Weak Convergence Approach to the Theory of Large Deviations, Wiley, NY, 1997.
[22] S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence, Wiley, NY, 1986.
[23] J. Feng, M. Forde, and J. P. Fouque, Short-maturity asymptotics for a fast mean-reverting heston stochastic volatility
model, SIAM Journal on Financial Mathematics 1 (2010), no. 1, 126–141.
[24] J. Feng, J. P. Fouque, and R. Kumar, Small-time asymptotics for fast mean-reverting stochastic volatility models, The
Annals of Applied Probability 22 (2012), no. 4, 1541–1575.
[25] J. P. Fouque, G. Papanicolaou, and K. R. Sircar, Derivatives in financial markets with stochastic volatility, Cambridge
University Press, Cambridge, 2000.
[26] M. Freidlin and R. Sowers, A comparison of homogenization and large deviations, with applications to wavefront propaga-
tion, Stochastic Process and Their Applications 82 (1999), no. 1, 23–52.
[27] M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems, Springer, Heidelberg, 2012.
[28] A. Friedman, Partial Differential Equations of Parabolic Type, R. E. Krieger Publishing, Malabar, Florida, 1983.
[29] M. Fischer, On the connection between symmetric N-player games and mean field games, Ann. Appl. Probab. 27 (2017),
no. 2, 757–810.
[30] , On the form of the large deviation rate function for the empirical measures of weakly interacting systems, Bernoulli
20 (2014), no. 4, 1765–1801.
[31] V. Gaitsgory and M. T. Nguyen, Multiscale singularly perturbed control systems: Limit occupational measures sets and
averaging, SIAM Journal on Control and Optimization 41 (2002), no. 3, 954–974.
[32] J. Garnier, G. Papanicolaou, and T. W. Yang, Large deviations for a mean field model of systemic risk, SIAM Journal of
financial mathematics 4 (2013), no. 1, 151–184.
[33] , Consensus convergence with stochastic effects, Vietnam Journal of mathematics 45 (2017), no. 1-2, 51–75.
[34] J. Gärtner, On Large deviations from the invariant measure, Theory of probability and applications 22 (1975), no. 1,
24–39.
[35] A. Gut, Probability: A Graduate Course, Springer, NY, 2013.
[36] D. Gilbarg and N. S. Trudinger, Elliptic Partial Differential Equations of Second Order, Springer, NY, 2001.
[37] C. Hyeon and D. Thirumalai, Can energy landscapes roughness of proteins and RNA be measured by using mechanical
unfolding experiments?, Proc. Natl. Acad. Sci. 100 (2003), no. 18, 10249–10253.
[38] S. A. Isaacson, J. Ma, and K. Spiliopoulos, Mean field limits of particle-based stochastic reaction-diffusion models (2020),
available at arXiv:2003.11868 [math.PR].
[39] A. Jakubowski, On the Skorokhod topology, Annales de l’I. H. P., section B 22 (1986), no. 3, 263–285.
[40] M. Kac, Probability and related topics in physical sciences, American Mathematical Society, Providence, 1957.
[41] I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus, Springer, NY, 1998.
[42] D. Lacker, Limit theory for controlled McKean-Vlasov dynamics, SIAM Journal on Control and Optimization 55 (2017),
no. 3, 1641–1672.
[43] S. Lifson and J. L. Jackson, On the self-diffusion of ions in a polyelectrolyte solution, Journal of Chemical Physics 36
(1962), 2410–2414.
[44] E. Luçon and W. Stannat, Transition from Gaussian to non-Gaussian fluctuations for mean-field diffusions in spatial
interaction, Annals of Applied Probability 26 (2016), no. 6, 3840–3909.
[45] A. J. Majda, C. Franzke, and B. Khouider, An applied mathematics perspective on stochastic modelling for climate,
Philosophical Transactions of the Royal Society A 366 (2008), no. 1875, 2429–2455.
[46] S. Motsch and E. Tadmor, Heterophilious dynamics enhances consensus, SIAM Review 56 (2014), no. 4, 577–621.
[47] H. P. McKean, Stochastic Integrals, AMS, NY, 2005.
[48] R. Liptser, Large deviations for two scaled diffusions, Probability Theory and Related Fields 106 (1996), no. 1, 71–104.
[49] B. Øksendal, Stochastic Differential Equations, Springer, NY, 2003.
[50] E. Pardoux and A. Y. Veretennikov, On Poisson equation and diffusion approximation I, Annals of Probability 29 (2001),
no. 3, 1061–1085.
[51] E. Pardoux and A. Y. Veretennikov, On Poisson equation and diffusion approximation 2, Annals of Probability 31 (2003), no. 3, 1166–1192.
[52] G. A. Pavliotis and A. M. Stuart, Multiscale Methods, Springer, NY, 2008.
[53] S. T. Rachev, Probability Metrics and the Stability of Stochastic Models, Wiley, NY, 1991.
[54] M. Röckner, X. Sun, and Y. Xie, Strong convergence order for slow-fast McKean-Vlasov stochastic differential equations
(2019), available at arXiv:1909.07665 [math.PR].
[55] V. A. Rokhlin, On the Fundamental Ideas of Measure Theory, Mat. Sb. (N.S.) 25 (1949), no. 67, 107–150; English transl.,
Translations Amer. Math. Soc., Series 1 10 (1962), 1–54.
[56] K. Spiliopoulos, Large deviations and importance sampling for systems of slow-fast motion, Applied Mathematics and
Optimization 67 (2013), 123–161.
[57] K. Spiliopoulos, Fluctuation analysis and short time asymptotics for multiple scales diffusion processes, Stochastics and Dynamics
14 (2014), no. 3, 1350026.
[58] K. Spiliopoulos, Quenched Large Deviations for Multiscale Diffusion Processes in Random Environments, Electronic Journal of
Probability 20 (2015), no. 15, 1–29.
[59] A. Yu. Veretennikov, On large deviations in the averaging principle for SDEs with a “full dependence”, correction (2005),
available at arXiv:math/0502098 [math.PR]. Initial article in Annals of Probability, Vol. 27, No. 1, (1999), pp. 284–296.
[60] A. Yu. Veretennikov, On large deviations for SDEs with small diffusion and averaging, Stochastic Processes and their Applications 89
(2000), no. 1, 69–79.
[61] A. Yu. Veretennikov, On the strong solutions of stochastic differential equations, Theory of Probability and Its Applications 24 (1977),
no. 2, 354–366.
[62] F. Y. Wang, Distribution dependent SDEs for Landau type equations, Stochastic Processes and their Applications 128
(2017), no. 2, 595–621.
[63] T. Yamada and S. Watanabe, On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto Univ. 11
(1971), no. 1, 155–167.
[64] R. Zwanzig, Diffusion in a rough potential, Proc. Natl. Acad. Sci. 85 (1988), 2029–2030.
Boston University, Department of Mathematics and Statistics, 111 Cummington Mall, Boston, MA 02215, USA
Email address, Zachary William Bezemek: bezemek@bu.edu
Email address, Konstantinos Spiliopoulos: kspiliop@bu.edu

2010 Mathematics Subject Classification. 60F10, 60F05.
