
Introduction to Stochastic Calculus

Math 545 - Duke University

Andrea Agazzi, Jonathan C. Mattingly


Contents

Chapter 1. Introduction
1. Motivations
2. Outline For a Course
Chapter 2. Probabilistic Background
1. Countable Probability Spaces
2. Uncountable Probability Spaces
3. General Probability Spaces and Sigma Algebras
4. Distributions and Convergence of Random Variables
Chapter 3. Brownian Motion and Stochastic Processes
1. An Illustrative Example: A Collection of Random Walks
2. General Stochastic Processes
3. Definition of Brownian Motion (Wiener Process)
4. Constructive Approach to Brownian Motion
5. Brownian Motion has Rough Trajectories
6. More Properties of Random Walks
7. More Properties of General Stochastic Processes
8. A Glimpse of the Connection with PDEs
Chapter 4. Itô Integrals
1. Properties of the Noise Suggested by Modeling
2. Riemann–Stieltjes Integral
3. A Motivating Example
4. Itô Integrals for a Simple Class of Step Functions
5. Extension to the Closure of Elementary Processes
6. Properties of Itô Integrals
7. A Continuous-in-Time Version of the Itô Integral
8. An Extension of the Itô Integral
9. Itô Processes
Chapter 5. Stochastic Calculus
1. Itô's Formula for Brownian Motion
2. Quadratic Variation and Covariation
3. Itô's Formula for an Itô Process
4. Full Multidimensional Version of Itô's Formula
5. Collection of the Formal Rules for Itô's Formula and Quadratic Variation
Chapter 6. Stochastic Differential Equations
1. Definitions
2. Examples of SDEs
3. Existence and Uniqueness for SDEs
4. Weak Solutions to SDEs
5. Markov Property of Itô Diffusions
Chapter 7. PDEs and SDEs: The Connection
1. Infinitesimal Generators
2. Martingales Associated with Diffusion Processes
3. Connection with PDEs
4. Time-Homogeneous Diffusions
5. Stochastic Characteristics
6. A Fundamental Example: Brownian Motion and the Heat Equation
Chapter 8. Martingales and Localization
1. Martingales & Co.
2. Optional Stopping
3. Localization
4. Quadratic Variation for Martingales
5. Lévy–Doob Characterization of Brownian Motion
6. Random Time Changes
7. Martingale Inequalities
8. Martingale Representation Theorem
Chapter 9. Girsanov's Theorem
1. An Illustrative Example
2. Tilted Brownian Motion
3. Girsanov's Theorem for SDEs
Chapter 10. One-Dimensional SDEs
1. Natural Scale and Speed Measure
2. Existence of Weak Solutions
3. Exit From an Interval
4. Recurrence
5. Intervals with Singular End Points
Bibliography
Appendix A. Some Results from Analysis
Appendix B. Exponential Martingales and Hermite Polynomials

CHAPTER 1

Introduction

1. Motivations
Evolutions in time with random influences/random dynamics. Let $N(t)$ be the "number of rabbits in some population" or "the price of a stock". Then one might want to make a model of the dynamics which includes "random influences". A (very) simple example is
$$\frac{dN(t)}{dt} = a(t)N(t) \quad \text{where} \quad a(t) = r(t) + \text{"noise"} . \tag{1.1}$$
Making sense of "noise" and learning how to make calculations with it is one of the principal objectives of this course. This will allow us to predict, in a probabilistic sense, the behavior of $N(t)$.
Examples of situations like the one introduced above are ubiquitous in nature:
i) The gambler's ruin problem. We play the following game: we start with \$3 in our pocket and we flip a coin. If the result is tails we lose one dollar, while if it is heads we win one dollar. We stop when we have no money left to gamble, or when we reach \$9. We may ask: what is the probability that I end up broke?
ii) Population dynamics/infectious diseases. As anticipated, (1.1) can be used to model the evolution of the number of rabbits in some population. Similar models are used to model the number of genetic mutations in an animal species. We may also think of $N(t)$ as the number of sick individuals in a population. Reasonable and widely applied models for the spread of infectious diseases are obtained by modifying (1.1) and observing its behavior. In all these cases, one may be interested in knowing whether the disease/mutation is likely to take over the population, or rather to go extinct.
iii) Stock prices. We may think of a set of $M$ risky investments (e.g. stocks), where the price $N_i(t)$, $i \in \{1, \dots, M\}$, per unit at time $t$ evolves according to (1.1). In this case, one would like to optimize one's choice of stocks to maximize the total value $\sum_{i=1}^{M} \alpha_i N_i(t)$ at a later time $T$.
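The ruin question in i) can already be explored by simulation. For a fair coin the exact ruin probability starting from \$3 with absorbing barriers at \$0 and \$9 is $1 - 3/9 = 2/3$; the following Monte Carlo sketch (the trial count and seed are arbitrary choices, not part of the notes) reproduces it:

```python
import random

random.seed(1)

def ruined(start=3, target=9, p=0.5):
    """Play the fair coin game until broke (0) or reaching the target."""
    x = start
    while 0 < x < target:
        x += 1 if random.random() < p else -1
    return x == 0

trials = 100_000
est = sum(ruined() for _ in range(trials)) / trials
print(abs(est - 2 / 3) < 0.01)  # True: matches the exact answer (9 - 3)/9
```

The exact value comes from the standard gambler's ruin formula for a fair game: starting from $k$ with target $N$, the ruin probability is $1 - k/N$.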
Connections with diffusion theory and PDEs. There exists a deep connection between noisy processes such as the one introduced above and the deterministic theory of partial differential equations. This startling connection will be explored and expanded upon during the course, but we anticipate some examples below:
i) Dirichlet problem. Let $u(x)$ be the solution to the PDE given below with the noted boundary conditions. Here $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$. The amazing fact is the following: if we start a Brownian motion diffusing from a point $(x_0, y_0)$ inside the domain, then the probability that it first hits the boundary in the darker region is given by $u(x_0, y_0)$.
ii) Black–Scholes equation. Suppose that at time $t = 0$ the person in iii) is offered the right (without obligation) to buy one unit of the risky asset at a specified price $S$ and at a specified future time $t = T$. Such a right is called a European call option. How much should the person be willing to pay for such an option? This question can be answered by solving the famous Black–Scholes equation, giving for any stock price $N(t)$ the right value $S$ of the European option.
[Figure: a domain in which $\frac{1}{2}\Delta u = 0$, with boundary data $u = 0$ on one part of the boundary and $u = 1$ on the darker part.]

2. Outline For a Course


What follows is a rough outline of the class, giving a good indication of the topics to be covered,
though there will be modifications.
i) Weeks 1-2: Motivation and Introduction to Stochastic Processes
(a) Motivating Examples: Random Walks, Population Model with noise, Black-Scholes,
Dirichlet problems
(b) Themes: Direct calculation with stochastic calculus, connections with PDEs
(c) Introduction: Probability Spaces, Expectations, σ-algebras, Conditional expectations,
Random walks and discrete time stochastic processes. Continuous time stochastic pro-
cesses and characterization of the law of a process by its finite dimensional distributions
(Kolmogorov Extension Theorem). Markov Process and Martingales.
ii) Weeks 3-4: Brownian motion and its Properties
(a) Definitions of Brownian motion (BM) as a continuous Gaussian process with indepen-
dent increments. Chapman-Kolmogorov equation, forward and backward Kolmogorov
equations for BM. Continuity of sample paths (Kolmogorov Continuity Theorem).
More on BM as a Markov process and a martingale.
(b) First and second variation (a.k.a. variation and quadratic variation); application to BM
iii) Week 5: Stochastic Integrals
(a) The Riemann–Stieltjes integral. Why can't we use it?
(b) Building the Itô and Stratonovich integrals (making sense of "$\int_0^t \sigma \, dB$")
(c) Standard properties of integrals hold: linearity, additivity
(d) Itô isometry: $\mathbb{E}\left(\int f \, dB\right)^2 = \mathbb{E}\int f^2 \, ds$.
iv) Week 6: Itô’s Formula and Applications
(a) Change of variable
(b) Connections with PDEs and the Backward Kolmogorov equation
v) Week 7: Stochastic Differential Equations
(a) What does it mean to solve an SDE?
(b) Existence of solutions (Picard iteration), Uniqueness of solutions
vi) Week 8-9: Stopping Times
(a) Definition. σ-algebra associated to stopping time. Bounded stopping times. Doob’s
optional stopping theorem
(b) Dirichlet Problems and hitting probabilities
(c) Localization via stopping times
vii) Week 10: Lévy–Doob theorem and Girsanov's Theorem
(a) How to tell when a continuous martingale is a Brownian motion
(b) Random time changes to turn a Martingale into a Brownian motion
(c) Hermite Polynomials and the exponential martingale
(d) Girsanov’s Theorem, Cameron-Martin formula, and changes of measure
(1) The simple example of shifted i.i.d. Gaussian random variables
(2) Idea of Importance sampling and how to sample from tails
(3) The shift of a Brownian motion
(4) Changing the drift in a diffusion
viii) Week 11: Feller Theory of one dimensional diffusions
(a) Speed measures, natural scales, the classification of boundary points.
ix) Week 12-13: Applications
(a) Option Pricing and the Black-Scholes equation
(b) Population biology and Chemical Kinetics
(c) Stochastic Control, Signal Processing and Reinforcement learning
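The Itô isometry previewed in week 5 can be sanity-checked numerically even before the integral is constructed: for a deterministic integrand, the left-endpoint Riemann sums against Brownian increments already satisfy the identity approximately. This is a sketch under my own assumptions (the choice $f(t) = t$ on $[0,1]$, so $\mathbb{E}\int_0^1 f^2\,ds = 1/3$, and the discretization parameters are mine, not the notes'):

```python
import math
import random

random.seed(2)

# Check E[(int_0^1 f dB)^2] = int_0^1 f(s)^2 ds for f(t) = t, whose
# right-hand side is int_0^1 t^2 dt = 1/3.
n_steps, n_paths = 200, 10_000
dt = 1.0 / n_steps

def ito_integral_of_t():
    # Left-endpoint (Ito) Riemann sum: sum_i f(t_i) * (B_{t_{i+1}} - B_{t_i}),
    # with independent Gaussian increments of variance dt.
    return sum((i * dt) * random.gauss(0.0, math.sqrt(dt)) for i in range(n_steps))

second_moment = sum(ito_integral_of_t() ** 2 for _ in range(n_paths)) / n_paths
print(abs(second_moment - 1 / 3) < 0.025)  # True, up to Monte Carlo error
```

The left-endpoint evaluation is essential: it is exactly what makes the Itô sums a martingale, as the course will make precise.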

CHAPTER 2

Probabilistic Background

1. Countable probability spaces


Example 1.1. We begin with the following motivating example. Consider a random sequence $\omega = \{\omega_i\}_{i=1}^{N}$ where
$$\omega_i = \begin{cases} 1 & \text{with probability } p \\ -1 & \text{with probability } 1-p \end{cases}$$
independent of the other $\omega_j$'s. We will also write this as
$$\mathbb{P}[(\omega_1, \omega_2, \dots, \omega_N) = (s_1, s_2, \dots, s_N)] = p^{n_+} (1-p)^{N - n_+}$$
for $s_i = \pm 1$, where $n_+ := |\{i : s_i = +1\}|$. We can group the possible outcomes with $\omega_1 = +1$:
$$A_1 = \{\omega \in \Omega : \omega_1 = 1\} ,$$
and compute the probability of such an event:
$$\mathbb{P}[A_1] = \sum_{\omega \in A_1} \mathbb{P}[\omega] = p .$$

Let $\Omega$ be the set of all such sequences of length $N$ (i.e. $\Omega = \{-1,1\}^N$), and consider now the sequence of functions $\{X_n : \Omega \to \mathbb{Z}\}$ where
$$X_0(\omega) = 0 , \qquad X_n(\omega) = \sum_{i=1}^{n} \omega_i \tag{2.1}$$
for $n \in \{1, \dots, N\}$. This sequence is a biased random walk (we call it unbiased, or simply a random walk, if $p = 1/2$) of length $N$: a simple example of a stochastic process. We can compute its expectation:
$$\mathbb{E}[X_2] = \sum_{i \in \{-2,0,2\}} i \, \mathbb{P}[X_2 = i] = 2p^2 - 2(1-p)^2 = 2(2p - 1) .$$
This expectation changes if we assume that we have some information on the state of the random walk at an earlier time:
$$\mathbb{E}[X_2 | X_1 = 1] = \sum_{i \in \{-2,0,2\}} i \, \mathbb{P}[X_2 = i | X_1 = 1] = 2p + 0 \cdot (1-p) = 2p .$$
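Both expectations can be checked by direct simulation. The following sketch uses my own arbitrary choices of $p = 0.7$ and trial count (neither is from the notes); note that $2(2p-1) = 0.8$ and $2p = 1.4$ for this $p$:

```python
import random

random.seed(0)

def sample_X(n, p):
    """One realization of the biased random walk X_n = w_1 + ... + w_n."""
    return sum(1 if random.random() < p else -1 for _ in range(n))

p, trials = 0.7, 200_000

# E[X_2] = 2(2p - 1)
mean_X2 = sum(sample_X(2, p) for _ in range(trials)) / trials
print(abs(mean_X2 - 2 * (2 * p - 1)) < 0.02)  # True

# Given X_1 = 1 we have X_2 = 1 + w_2, which is 2 w.p. p and 0 w.p. 1 - p,
# so E[X_2 | X_1 = 1] = 2p.
cond_samples = [2 if random.random() < p else 0 for _ in range(trials)]
mean_cond = sum(cond_samples) / trials
print(abs(mean_cond - 2 * p) < 0.02)  # True
```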

We now recall some basic definitions from the theory of probability which will allow us to put
this example on solid ground.
In the above example, the set $\Omega$ is called the sample space (or outcome space). Intuitively, each $\omega \in \Omega$ is a possible outcome of all of the randomness in our system. The subsets of $\Omega$ (the sets of outcomes we want to compute the probability of) are referred to as the events, and the measure given by $\mathbb{P}$ on subsets $A \subseteq \Omega$ is called the probability measure, giving the chance of the various outcomes. Finally, each $X_n$ is an example of an integer-valued random variable. We will refer to this collection of random variables as a random walk.
In the above setting, where the outcome space $\Omega$ consists of a finite number of elements, we are able to define everything in a straightforward way. We begin by quickly recalling a number of definitions in the countable (possibly finite) setting.
If $\Omega$ is countable it is enough to define the probability of each element of $\Omega$. That is to say, we give a function $p : \Omega \to [0,1]$ with $\sum_{\omega \in \Omega} p(\omega) = 1$ and define
$$\mathbb{P}[\omega] = p(\omega)$$
for each $\omega \in \Omega$. An event $A$ is just a subset of $\Omega$. We naturally extend the definition of $\mathbb{P}$ to an event $A$ by
$$\mathbb{P}[A] := \sum_{\omega \in A} \mathbb{P}[\omega] .$$
Observe that this definition has a number of consequences. In particular, if the $A_i$ are disjoint events, that is to say $A_i \subset \Omega$ and $A_i \cap A_j = \emptyset$ if $i \neq j$, then
$$\mathbb{P}\left[\bigcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} \mathbb{P}[A_i]$$
and if $A^c := \{\omega \in \Omega : \omega \notin A\}$ is the complement of $A$ then $\mathbb{P}[A] = 1 - \mathbb{P}[A^c]$.


Given two events $A$ and $B$, the conditional probability of $A$ given $B$ is defined by
$$\mathbb{P}[A|B] := \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]} \tag{2.2}$$
For fixed $B$, this is just a new probability measure $\mathbb{P}[\,\cdot\,|B]$ on $\Omega$ which gives probability $\mathbb{P}[\omega|B]$ to the outcome $\omega \in \Omega$.
A random variable taking values in some set $\mathcal{X}$ is a function $X : \Omega \to \mathcal{X}$. In particular, a real-valued random variable $X$ is simply a real-valued function $X : \Omega \to \mathbb{R}$. Throughout this course we will almost exclusively consider real-valued random variables, so we can set $\mathcal{X} = \mathbb{R}$ in most cases. We can then define the expected value of a random variable $X$ (or simply the expectation of $X$) as
$$\mathbb{E}[X] := \sum_{x \in \mathrm{Range}(X)} x \, \mathbb{P}[X = x] = \sum_{\omega \in \Omega} X(\omega) \, \mathbb{P}[\omega] \tag{2.3}$$
Here we have used the convention that $\{X = x\}$ is shorthand for $\{\omega \in \Omega : X(\omega) = x\}$ and the definition $\mathrm{Range}(X) = \{x \in \mathcal{X} : \exists \omega \text{ with } X(\omega) = x\} = X(\Omega)$. We can further define the covariance of two random variables $X, Y$ on the same space as $\mathrm{Cov}[X,Y] = \mathbb{E}[(X - \mathbb{E}[X]) \cdot (Y - \mathbb{E}[Y])]$ and
$$\mathrm{Var}[X] := \mathrm{Cov}[X,X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 .$$

Two events $A$ and $B$ are independent if $\mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B]$. Two random variables are independent if $\mathbb{P}[X = x, Y = y] = \mathbb{P}[X = x]\,\mathbb{P}[Y = y]$ for all $x$ and $y$. Of course this implies that for any sets $A$ and $B$ one has $\mathbb{P}[X \in A, Y \in B] = \mathbb{P}[X \in A]\,\mathbb{P}[Y \in B]$ and that $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$, so that $\mathrm{Cov}[X,Y] = 0$ (note that $\mathrm{Cov}[X,Y] = 0$ is a necessary but not sufficient condition for independence). A collection of events $A_1, \dots, A_n$ is said to be mutually independent if
$$\mathbb{P}[A_1 \cap \dots \cap A_n] = \prod_{i=1}^{n} \mathbb{P}[A_i]$$
(and similarly for every subcollection of the $A_i$). Similarly, a collection of random variables $X_i$ is mutually independent if for any collection of sets $A_i$ from their ranges, the collection of events $\{X_i \in A_i\}$ is mutually independent. As before, as a consequence one has that
$$\mathbb{E}[X_1 \cdots X_n] = \prod_{i=1}^{n} \mathbb{E}[X_i] .$$
Given two $\mathcal{X}$-valued random variables $Y$ and $Z$, for any $z \in \mathrm{Range}(Z)$ we define the conditional expectation of $Y$ given $\{Z = z\}$ as
$$\mathbb{E}[Y|Z = z] := \sum_{y \in \mathrm{Range}(Y)} y \, \mathbb{P}[Y = y | Z = z] \tag{2.4}$$
which is to say that $\mathbb{E}[Y|Z = z]$ is just the expected value of $Y$ under the probability measure $\mathbb{P}[\,\cdot\,|Z = z]$.
In general, for any event $A$ we can define the conditional expectation of $Y$ given $A$ as
$$\mathbb{E}[Y|A] := \sum_{y \in \mathrm{Range}(Y)} y \, \mathbb{P}[Y = y | A] \tag{2.5}$$
We can extend the definition of $\mathbb{E}[Y|Z = z]$ to $\mathbb{E}[Y|Z]$, which we understand to be a function of $Z$ which takes the value $\mathbb{E}[Y|Z = z]$ when $Z = z$. More formally, $\mathbb{E}[Y|Z] := h(Z)$ where $h : \mathrm{Range}(Z) \to \mathcal{X}$ is given by $h(z) = \mathbb{E}[Y|Z = z]$.
Example 1.2 (Example 1.1 continued). By clever rearrangement one does not always have to calculate the function $\mathbb{E}[Y|Z]$ explicitly:
$$\mathbb{E}[X_7|X_6] = \mathbb{E}\Big[\sum_{i=1}^{7} \omega_i \Big| X_6\Big] = \mathbb{E}\Big[\sum_{i=1}^{6} \omega_i + \omega_7 \Big| X_6\Big] = \mathbb{E}[X_6 + \omega_7 | X_6] = \mathbb{E}[X_6|X_6] + \mathbb{E}[\omega_7|X_6] = X_6 + \mathbb{E}[\omega_7] = X_6 + (2p - 1)$$
since $\omega_7$ is independent of $\{\omega_i\}_{i=1}^{6}$ (and therefore of $X_6$) and we have $\mathbb{E}[\omega_7] = 2p - 1$.
Example 1.3 (again, Example 1.1 continued). Setting $p = 1/2$ we consider the random variable $(X_3)^2$ and we see that
$$\mathbb{E}[(X_3)^2|X_2 = 2] = \sum_{i \in \mathbb{N}} i \, \mathbb{P}[(X_3)^2 = i|X_2 = 2] = (1)^2 \, \mathbb{P}[X_3 = 1|X_2 = 2] + (3)^2 \, \mathbb{P}[X_3 = 3|X_2 = 2] = 5 .$$
Of course, $X_2$ can also take the values $-2$ and $0$. For these values of $X_2$ we have
$$\mathbb{E}[(X_3)^2|X_2 = -2] = (-1)^2 \, \mathbb{P}[X_3 = -1|X_2 = -2] + (-3)^2 \, \mathbb{P}[X_3 = -3|X_2 = -2] = 5$$
$$\mathbb{E}[(X_3)^2|X_2 = 0] = (-1)^2 \, \mathbb{P}[X_3 = -1|X_2 = 0] + (1)^2 \, \mathbb{P}[X_3 = 1|X_2 = 0] = 1$$
Hence $\mathbb{E}[(X_3)^2|X_2] = h(X_2)$ where
$$h(x) = \begin{cases} 5 & \text{if } x = \pm 2 \\ 1 & \text{if } x = 0 \end{cases} \tag{2.6}$$
Again, note that we can arrive at the same result by cleverly rearranging the terms involved in the computation:
$$\mathbb{E}[X_3^2|X_2] = \mathbb{E}[(X_2 + \omega_3)^2|X_2] = \mathbb{E}[X_2^2 + 2\omega_3 X_2 + \omega_3^2|X_2] = X_2^2 + 2\,\mathbb{E}[\omega_3]\,X_2 + \mathbb{E}[\omega_3^2] = X_2^2 + 1$$
since $\mathbb{E}[\omega_3] = 0$ and $\mathbb{E}[\omega_3^2] = 1$. Compare this to the definition of $h$ given in (2.6) above.
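Since $\Omega = \{-1,1\}^3$ is finite here, the function $h$ in (2.6) can also be computed by brute-force enumeration rather than by hand. A short sketch (the helper name is mine):

```python
from itertools import product

# All 8 equally likely sign sequences (p = 1/2).
outcomes = list(product([-1, 1], repeat=3))

def cond_exp_X3sq_given_X2(x2):
    # Average X_3^2 over the outcomes with X_2 = x2 (uniform measure).
    hits = [w for w in outcomes if w[0] + w[1] == x2]
    return sum((w[0] + w[1] + w[2]) ** 2 for w in hits) / len(hits)

print(cond_exp_X3sq_given_X2(2))   # 5.0
print(cond_exp_X3sq_given_X2(0))   # 1.0
print(cond_exp_X3sq_given_X2(-2))  # 5.0
```

Each value agrees with both the case analysis (2.6) and the closed form $x_2^2 + 1$.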
2. Uncountable Probability Spaces
If we consider Example 1.1 in the case $N = \infty$ (or, even worse, if we imagine our stochastic process to live on the continuous interval $[0,1]$) we need to consider sample spaces $\Omega$ which have uncountably many points. To illustrate the difficulties one can encounter in this setting, let us consider the following example:
Example 2.1. Consider $\Omega = [0,1]$ and let $\mathbb{P}$ be the uniform probability distribution on $\Omega$, i.e., the measure that associates the same probability to each of the points in $\Omega$. We immediately see that in order for $\mathbb{P}[\Omega]$ to be finite we must have $\mathbb{P}[\omega] = 0$ for all $\omega \in \Omega$, as otherwise
$$\mathbb{P}[\Omega] = \mathbb{P}\Big[\bigcup_{\omega \in \Omega} \omega\Big] = \sum_{\omega \in \Omega} \mathbb{P}[\omega] = \infty .$$
For this reason it is no longer sufficient to simply assign a probability to each point $\omega \in \Omega$ as we did before. We have to assign a probability to sets:
$$\mathbb{P}[(a,b)] = b - a \quad \text{for } 0 \le a \le b \le 1 .$$
To handle settings such as the one introduced above completely rigorously we need ideas from basic measure theory. However, if one is willing to accept a few formal rules of manipulation, we can proceed with learning basic stochastic calculus without needing to distract ourselves with too much measure theory.
As we did in the previous section, we can define a real random variable as a function $X : \Omega \to \mathbb{R}$. To define the measure associated by $\mathbb{P}$ to the values of this random variable we specify its Cumulative Distribution Function (CDF) $F(x)$ defined by $F(x) = \mathbb{P}[X \le x]$. We say that an $\mathbb{R}$-valued random variable $X$ is a continuous random variable if there exists an (absolutely continuous) density function $\rho : \mathbb{R} \to \mathbb{R}$ so that
$$\mathbb{P}[X \in [a,b]] = \int_a^b \rho(x)\,dx$$
for any $[a,b] \subset \mathbb{R}$. By the fundamental theorem of calculus we see that $\rho$ satisfies $\rho(x) = F'(x)$. More generally, an $\mathbb{R}^n$-valued random variable $X$ is called a continuous random variable if there exists a density function $\rho : \mathbb{R}^n \to \mathbb{R}_{\ge 0}$ so that
$$\mathbb{P}[X \in [a,b]] = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} \rho(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \int_{[a,b]} \rho(x)\,dx = \int_{[a,b]} \rho(x)\,\mathrm{Leb}(dx)$$
for any $[a,b] = \prod_{i=1}^{n} [a_i, b_i] \subset \mathbb{R}^n$. The last two expressions are just different ways of writing the same thing. Here we have introduced the notation $\mathrm{Leb}(dx)$ for the standard Lebesgue measure on $\mathbb{R}^n$ given by $dx_1 \cdots dx_n$.
If $X$ and $Y$ are $\mathbb{R}^n$-valued and $\mathbb{R}^m$-valued random variables, respectively, then the vector $(X,Y)$ is an $\mathbb{R}^{n+m}$-valued random variable; when it is again continuous, its density is called the joint probability density function (joint density for short) of $X$ and $Y$. If $Y$ has density $\rho_Y$ and $\rho_{XY}$ is the joint density of $X$ and $Y$ we can define
$$\mathbb{P}[X \in A|Y = y] = \int_A \frac{\rho_{XY}(x,y)}{\rho_Y(y)}\,dx . \tag{2.7}$$
Hence $X$ given $Y = y$ is a new continuous random variable with density $x \mapsto \frac{\rho_{XY}(x,y)}{\rho_Y(y)}$ for a fixed $y$.
Finally, analogously to the countable case, we define the expectation of a continuous random variable with density $\rho$ by
$$\mathbb{E}[h(X)] = \int_{\mathbb{R}^n} h(x)\rho(x)\,dx . \tag{2.8}$$

The conditional expectation is defined using the density (2.7).
Definition 2.2. A real-valued random variable $X$ is Gaussian with mean $\mu$ and variance $\sigma^2$ if
$$\mathbb{P}[X \in A] = \int_A \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx .$$
If a random variable has this distribution we will write $X \sim \mathcal{N}(\mu, \sigma^2)$. More generally, we say that an $\mathbb{R}^n$-valued random variable $X$ is Gaussian with mean $\mu \in \mathbb{R}^n$ and SPD covariance matrix $\Sigma \in GL(\mathbb{R}^n)$ if
$$\mathbb{P}[X \in A] = \int_A \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left[-\frac{(x-\mu)^\top \Sigma^{-1} (x-\mu)}{2}\right]\,dx .$$
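As a quick numerical sanity check of Definition 2.2 (the parameters and sample size below are arbitrary choices of ours), one can compare the Monte Carlo frequency of $\{X \le \mu + \sigma\}$ with the exact value obtained by integrating the density, namely $\Phi(1) = \frac{1}{2}\big(1 + \mathrm{erf}(1/\sqrt{2})\big) \approx 0.8413$:

```python
import math
import random

random.seed(0)

mu, sigma = 1.0, 2.0
samples = [random.gauss(mu, sigma) for _ in range(200_000)]

# Integrating the N(mu, sigma^2) density up to mu + sigma gives Phi(1),
# expressible through the error function.
exact = 0.5 * (1 + math.erf(1 / math.sqrt(2)))

empirical = sum(x <= mu + sigma for x in samples) / len(samples)
print(abs(empirical - exact) < 0.01)  # True: Monte Carlo agrees with the integral
```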
While many calculations can be handled satisfactorily at this level, we will soon see that we need to consider random variables on much more complicated spaces, such as the space of real-valued continuous functions on the time interval $[0,T]$, which will be denoted $C([0,T];\mathbb{R})$. To give all of the details in such a setting would require a level of technical detail which we do not wish to enter into on our first visit to the subject of stochastic calculus. If one is willing to "suspend a little disbelief" one can learn the formal rules of manipulation, much as one did when one first learned regular calculus. The technical details are important but better appreciated after one first has the big picture.

3. General Probability Spaces and Sigma Algebras


To this end, we will introduce the idea of a sigma algebra (usually written σ-algebra, or σ-field in [Klebaner]). In Section 1, we defined our probability measures by beginning with assigning a probability to each $\omega \in \Omega$. This was fine when $\Omega$ was finite or countably infinite. However, as we have seen in Example 2.1, when $\Omega$ is uncountable, as in the case of picking a uniform point from the unit interval ($\Omega = [0,1]$), the probability of any given point must be zero. Otherwise the sum of all of the probabilities would be $\infty$, since there are infinitely many points and each of them has the same probability, as no point is more or less likely than another.
This is only the tip of the iceberg and there are many more complicated issues. The solution is to fix a collection of subsets of $\Omega$ about which we are "allowed" to ask "what is the probability of this event?". We will be able to make this collection of subsets very large, but it will not, in general, contain all of the subsets of $\Omega$ in situations where $\Omega$ is uncountable. This collection of subsets is called the σ-algebra. The triplet $(\Omega, \mathcal{F}, \mathbb{P})$ of an outcome space $\Omega$, a probability measure $\mathbb{P}$ and a σ-algebra $\mathcal{F}$ is called a Probability Space. For any event $A \in \mathcal{F}$, the "probability of this event happening" is well defined and equal to $\mathbb{P}[A]$. A subset of $\Omega$ which is not in $\mathcal{F}$ might not have a well defined probability. Essentially all of the events you will think of naturally will be in the σ-algebra with which we will work. In light of this, it is reasonable to ask why we bring them up at all. It turns out that σ-algebras are a useful way to "encode the information" contained in a collection of events or random variables. This idea and notation is used in many different contexts. If you want to be able to read the literature, it is useful to have an operational understanding of σ-algebras without entering into the technical detail.
3.1. Sigma-algebras and probability spaces. Before attempting to convey any intuition or operational knowledge about σ-algebras, we give the formal definitions, since they are short (even if unenlightening).
Definition 3.1. Given a set $\Omega$, a σ-algebra $\mathcal{F}$ is a collection of subsets of $\Omega$ such that
i) $\Omega \in \mathcal{F}$
ii) $A \in \mathcal{F} \implies A^c = \Omega \setminus A \in \mathcal{F}$
iii) given $\{A_n\}$ a countable collection of sets in $\mathcal{F}$, we have $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
In this case the pair $(\Omega, \mathcal{F})$ is referred to as a measurable space.


Intuitively, a σ-algebra contains the sets of events that we are able to distinguish, i.e., the set of events we are able to talk about. The more sets are contained in a σ-algebra, the larger the number of events we can talk about, and the more information we have (or can have) on the state of our system. Therefore,
for us a σ-algebra is the embodiment of information.
Note that the requirements i)-iii) in the above definition correspond to operations we need to be able to talk about:
i) we should be able to know if anything has happened;
ii) if we can infer whether an event has happened, we should also know if that event has not happened;
iii) given our knowledge of whether each of a series of events has happened, we should also be able to say if any of those events has happened.
Example 3.2 (Example 1.1 continued). If we set the total number of coin tosses $N = 2$ we can enumerate the σ-algebra completely: by iterating the operations i)-iii) in Def. 3.1 and by denoting $\{+-\} = \{\omega : \omega_1 = +1, \omega_2 = -1\}$ and so on, we have
$$\mathcal{F}_2 = \{\Omega, \emptyset, \{++\}, \{+-\}, \{-+\}, \{--\}, \{++,+-\}, \{++,-+\}, \{++,--\}, \{+-,-+\}, \{+-,--\}, \{-+,--\}, \{++,+-,-+\}, \{++,+-,--\}, \{++,-+,--\}, \{+-,-+,--\}\} .$$
Given any collection of subsets $G$ of $\Omega$, we can talk about the "σ-algebra generated by $G$" as simply what we get by taking all of the elements of $G$ and exhaustively applying all of the operations listed above in the definition of a σ-algebra. More formally,
Definition 3.3. Given $\Omega$ and $F$ a collection of subsets of $\Omega$, $\sigma(F)$ is the σ-algebra generated by $F$. This is defined as the smallest (in terms of number of sets) σ-algebra which contains $F$. Intuitively, $\sigma(F)$ represents all of the probability data contained in $F$.
Example 3.4 (Example 1.1 continued). We define
$$F_1 = \{\{\omega \in \Omega : \omega_1 = 1\}, \{\omega \in \Omega : \omega_1 = -1\}\} ,$$
a division of the possible outcomes fixing $\omega_1$. This collection of sets generates a σ-algebra on $\Omega$, given by
$$\mathcal{F}_1 := \{\Omega, \emptyset, \{\omega \in \Omega : \omega_1 = 1\}, \{\omega \in \Omega : \omega_1 = -1\}\} , \tag{2.9}$$
representing the information we have on the process knowing $\omega_1$.
Example 3.5. If $\Omega = \mathbb{R}^n$ or any subset of it, we talk about the Borel σ-algebra as the σ-algebra generated by all of the intervals $[a,b]$ with $a, b \in \Omega$. This σ-algebra contains essentially any event you would think about in most reasonable problems. Using $(a,b)$, $[a,b)$, $(a,b]$ or some mixture of them makes no difference.
To complete our measurable space $(\Omega, \mathcal{F})$ into a probability space we need to add a probability measure. Since we will not build our measure from its definition on individual $\omega \in \Omega$ as we did in Section 1, we will instead assume that it satisfies certain reasonable properties, which follow from that construction in the countable or finite case. The fact that the following assumptions are all that is needed would be covered in a measure-theoretic probability or analysis class.
Definition 3.6. Given a measurable space $(\Omega, \mathcal{F})$, a function $\mathbb{P} : \mathcal{F} \to [0,1]$ is a probability measure if
i) $\mathbb{P}[\Omega] = 1$,
ii) $\mathbb{P}[A^c] = 1 - \mathbb{P}[A]$ for all $A \in \mathcal{F}$,
iii) given $\{A_i\}$ a countable collection of pairwise disjoint sets in $\mathcal{F}$, $\mathbb{P}\left[\bigcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} \mathbb{P}[A_i]$.
In this case the triple $(\Omega, \mathcal{F}, \mathbb{P})$ is referred to as a probability space.


Given any events $A$ and $B$ in $\mathcal{F}$, we define the conditional probability just as before, namely
$$\mathbb{P}[A|B] = \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]} .$$
3.2. Random Variables. As we anticipated in the previous sections, a random variable is a function mapping an outcome to a number.
Definition 3.7. Let $(\Omega, \mathcal{F})$ and $(\mathcal{X}, \mathcal{B})$ be measurable spaces. Then $X : \Omega \to \mathcal{X}$ is an $\mathcal{X}$-valued random variable if for all $B \in \mathcal{B}$ we have
$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F} . \tag{2.10}$$
When (2.10) holds we say that the random variable is measurable with respect to $\mathcal{F}$ ($\mathcal{F}$-measurable for short). In this case we will write $X \in \mathcal{F}$. While this is a slight abuse of notation, it will be very convenient.
The condition (2.10) in the above definition guarantees that admissible events in $\mathcal{X}$ get pulled back to admissible events in $\Omega$. Speaking intuitively, a random variable is measurable with respect to a given σ-algebra if the information in the σ-algebra is always sufficient to uniquely fix the value of the random variable. In other words, for a random variable to make sense we need to have enough information on the state of the system (in $\mathcal{F}$) to uniquely determine the value of $X$ (within the "precision" of $\mathcal{B}$).
Remark 3.8. The above definition can be naturally extended to sub-σ-algebras of $\mathcal{F}$: for any σ-algebra $\mathcal{G} \subseteq \mathcal{F}$ we say that the random variable $X$ is $\mathcal{G}$-measurable if for all $B \in \mathcal{B}$ we have $X^{-1}(B) \in \mathcal{G}$.
In particular, we say that a real-valued random variable $X$ is measurable with respect to a σ-algebra $\mathcal{G}$ if every set of the form $X^{-1}([a,b])$ is in $\mathcal{G}$.
Example 3.9 (Example 1.1 continued). Consider the case $N = 2$, $X_n = \sum_{i=1}^{n} \omega_i$, and
$$\mathcal{F}_1 = \{\Omega, \emptyset, \{++,+-\}, \{-+,--\}\}$$
(from (2.9)). Then we see that $X_1 \in \mathcal{F}_1$, as $X_1^{-1}(1) = \{++,+-\}$ and $X_1^{-1}(-1) = \{-+,--\}$, while $X_2 \notin \mathcal{F}_1$, as $X_2^{-1}(0) = \{+-,-+\} \notin \mathcal{F}_1$.
One can define the smallest amount of information needed to specify the value of a certain random variable: the generated σ-algebra.
Definition 3.10. Given a random variable $X$ on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ taking values in a measurable space $(\mathcal{X}, \mathcal{B})$, we define the σ-algebra generated by the random variable $X$ as
$$\sigma(X) = \sigma(\{X^{-1}(B) : B \in \mathcal{B}\}) .$$
The idea is that $\sigma(X)$ contains all of the information contained in $X$. If an event is in $\sigma(X)$, then whether this event happens or not is completely determined by knowing the value of the random variable $X$. Of course, the random variable $X$ is always measurable with respect to $\sigma(X)$. More specifically, $\sigma(X)$ is the smallest σ-algebra $\mathcal{G}$ on $\Omega$ such that $X$ is $\mathcal{G}$-measurable.
Example 3.11 (Example 1.1 continued). By definition (2.1), since $X_1 = \omega_1$, the σ-algebra generated by the random variable $X_1$ is $\sigma(X_1) = \mathcal{F}_1$ from (2.9). However, denoting by $\{+-\}$ the event $\{\omega : \omega_1 = 1, \omega_2 = -1\}$, the σ-algebra generated by $X_2 = \omega_1 + \omega_2$ is given by
$$\sigma(X_2) = \sigma(\{\{++\}, \{--\}, \{+-,-+\}\}) = \{\emptyset, \Omega, \{++\}, \{--\}, \{+-,-+\}, \{++,--\}, \{+-,-+,--\}, \{++,+-,-+\}\} .$$
Note that this σ-algebra is different from $\mathcal{F}_2 = \sigma(\{\{\omega \in \Omega : (\omega_1, \omega_2) = (s_1, s_2)\} : s_1, s_2 \in \{-1,1\}\})$. Indeed, knowing the value of $X_2$ is not always sufficient to know the value of $\omega_1 = X_1$. Conversely, knowing the value of $(\omega_1, \omega_2)$ (contained in $\mathcal{F}_2$) definitely implies that you know the value of $X_2$. In other words, (the information of) $\sigma(X_2)$ is contained in $\mathcal{F}_2$; concisely, $\sigma(X_2) \subset \mathcal{F}_2$.
Now compare $\sigma(X_2)$ and $\sigma(Y)$ where $Y = X_2^2$. Let's consider three events: $A = \{X_2 = 2\}$, $B = \{X_2 = 0\}$, and $C = \{X_2 \text{ is even}\}$. Clearly all three events are in the σ-algebra generated by $X_2$ (i.e. $\sigma(X_2)$), since if you know the value of $X_2$ then you always know whether the events happen or not. Next notice that $B \in \sigma(Y)$, since if you know that $Y = 0$ then $X_2 = 0$, and if $Y \neq 0$ then $X_2 \neq 0$. Hence, no matter what the value of $Y$ is, knowing it you can decide whether $X_2 = 0$ or not. However, knowing the value of $Y$ does not always tell you whether $X_2 = 2$. It does sometimes, but not always. If $Y = 0$ then you know that $X_2 = 0 \neq 2$. However, if $Y = 4$ then $X_2$ could be equal to either $2$ or $-2$. We conclude that $A \notin \sigma(Y)$ but $B \in \sigma(Y)$. Since $X_2$ is always even, we do not need any information to decide $C$, and it is in fact in both $\sigma(X_2)$ and $\sigma(Y)$. In fact, $C = \Omega$, and $\Omega$ is in any σ-algebra, since by definition $\Omega$ and the empty set $\emptyset$ are always included. Lastly, since whenever we know $X_2$ we know $Y$, it is clear that $\sigma(X_2)$ contains all of the information contained in $\sigma(Y)$; indeed, $\sigma(Y) \subset \sigma(X_2)$ follows from the definition. To say that one σ-algebra is contained in another is to say that the second contains all of the information of the first and possibly more. In other words, $Y$ is measurable with respect to $\sigma(X_2)$, since knowing the value of $X_2$ fixes the value of $Y$.
Example 3.12. Let X be a random variable taking values in [−1, 1]. Let g be the function from
[−1, 1] to {−1, 1} such that g(x) = −1 if x ≤ 0 and g(x) = 1 if x > 0. Define the random variable
Y by Y(ω) = g(X(ω)). Hence Y is a random variable taking values in {−1, 1}. Let F_Y be the
σ-algebra generated by the random variable Y, that is, F_Y = σ(Y) := {Y^{-1}(B) : B ∈ B(R)}. In this
case, we can figure out exactly what F_Y looks like. Since Y takes on only two values, we see that for
any subset B in B(R) (the Borel σ-algebra of R)

    Y^{-1}(B) = Y^{-1}(−1) := {ω : Y(ω) = −1}   if −1 ∈ B, 1 ∉ B,
    Y^{-1}(B) = Y^{-1}(1) := {ω : Y(ω) = 1}     if 1 ∈ B, −1 ∉ B,
    Y^{-1}(B) = ∅                                if −1 ∉ B, 1 ∉ B,
    Y^{-1}(B) = Ω                                if −1 ∈ B, 1 ∈ B.

Thus F_Y consists of exactly four sets, namely {∅, Ω, Y^{-1}(−1), Y^{-1}(1)}. For a function f : Ω → R
to be measurable with respect to the σ-algebra F_Y, the inverse image of any set B ∈ B(R) must be one
of the four sets in F_Y. This is another way of saying that f must be constant on both Y^{-1}(−1) and
Y^{-1}(1). Note that Y^{-1}(−1) ∪ Y^{-1}(1) = Ω.
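Since F_Y has only finitely many atoms, this measurability criterion can be checked numerically. Below is a minimal sketch (assuming, purely for illustration, that X(ω) = ω with ω uniform on [−1, 1]; this particular law of X is not fixed by the example, and the helper name `is_FY_measurable` is ours): a function of ω is F_Y-measurable exactly when it is constant on each of the two atoms Y^{-1}(−1) and Y^{-1}(1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: X(omega) = omega, with omega uniform on [-1, 1].
omega = rng.uniform(-1.0, 1.0, size=10_000)
X = omega
Y = np.where(X <= 0, -1, 1)            # Y = g(X), with g as in Example 3.12

# The two atoms of F_Y = sigma(Y): the preimages Y^{-1}(-1) and Y^{-1}(1).
atom_neg, atom_pos = (Y == -1), (Y == 1)

def is_FY_measurable(f_vals):
    """A function of omega is F_Y-measurable iff it is constant on each atom."""
    return all(np.unique(f_vals[atom]).size == 1 for atom in (atom_neg, atom_pos))
```

For instance, `is_FY_measurable(Y**2 + 3)` is True, since any function of Y is constant on the atoms, while `is_FY_measurable(X)` is False, since X varies within each atom.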
Definition 3.13. Given a probability space (Ω, F, P) and A, B ∈ F, we say that A and B are
independent (written A ⫫ B) if

    P[A ∩ B] = P[A] · P[B] .    (2.11)

Furthermore, random variables {X_i} are jointly independent if for all Borel sets C_i,

    P[X_1 ∈ C_1 and … and X_n ∈ C_n] = ∏_{i=1}^n P[X_i ∈ C_i] .    (2.12)

We conclude by extending our main example:


Example 3.14 (Infinite random walk). We now consider ω = {ω_i}_{i=1}^∞ where

    ω_i = 1 with probability p,   and   ω_i = −1 with probability 1 − p,
independent of the other ωj ’s. We note that the cardinality of the sample space Ω is uncountable:
you can map each outcome to a number in the interval r0, 1s in binary representation. We now start
by considering the events for which we know we can compute the probability: for any couple of finite
disjoint sets of indices I, J Ă N we have
P rAI,J s “ p|I| p1 ´ pq|J| for AI,J :“ tω : ωi “ 1 @i P I, ωj “ ´1 @j P Ju .
Then we define a σ-algebra generated by these sets:
F “ σptAI,J uq .
With a σ algebra and a probability measure defined on our sample space Ω1 we have a probability
space pΩ, F, Pq and can now define random variables on it. We immediately see that the random
walk
n
ÿ
Xn :“ ωi
i“1
is a random variable for any n ∈ N, as the sets determining the outcome of the first n coin tosses
are in F by construction. Indeed, for any k ∈ Z we have

    X_n^{-1}(k) = {ω : Σ_{i=1}^n ω_i = k} = ⋃ A_{I,J} ,

the union taken over all pairs I, J with I ∪ J = {1, …, n} and |I| − |J| = k,
which clearly belongs to F.


Consider now the random variable Y₊ := min{k ∈ N : ω_k = 1}. We see that Y₊ ∈ F (i.e., Y₊ is measurable),
too! To check that this is the case, it is sufficient to verify that for every n ∈ N we have {Y₊ = n} ∈ F.
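These events can be illustrated by direct simulation. The sketch below (assuming a fair coin, p = 1/2; the variable names are ours) samples finite truncations of ω, builds X_n, and checks that the empirical frequency of {Y₊ = n} matches p(1 − p)^{n−1}, since {Y₊ = n} is exactly the single event A_{I,J} with I = {n} and J = {1, …, n − 1}.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5                                   # assumed fair coin for illustration
n_paths, n_steps = 20_000, 200

# Each row is a truncation (omega_1, ..., omega_{n_steps}) of one outcome omega.
steps = np.where(rng.random((n_paths, n_steps)) < p, 1, -1)
X = steps.cumsum(axis=1)                  # X_n = omega_1 + ... + omega_n

# Y_plus = min{k : omega_k = 1}; argmax returns the index of the first True.
Y_plus = (steps == 1).argmax(axis=1) + 1

# {Y_plus = 3} = A_{I,J} with I = {3}, J = {1, 2}: probability p (1-p)^2.
emp = (Y_plus == 3).mean()
exact = p * (1 - p) ** 2
```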
3.3. Expectation. We now introduce notation that allows us to write the sums (or integrals) (??)
and (2.8) in a unified way. Given a real-valued random variable X on a probability space (Ω, F, P),
we define the expected value of X as the integral

    E(X) = ∫_Ω X(ω) P(dω) .    (2.13)

We will take for granted that this integral makes sense; this follows from the general theory
of measure spaces.
Example 3.15. Consider Ω = [0, 1] equipped with the Borel σ-algebra, P ∼ Unif(Ω), and the
random variable X(ω) = e^ω. Then the expectation (2.13) is given by

    E[X] = ∫_Ω e^ω P(dω) = ∫_0^1 e^ω dω = e − 1 .
1 It is not a priori clear that the probability measure we have defined on single events can be extended to the
σ-algebra of interest. It turns out that this can be done without problems; this is the content of the Carathéodory
extension theorem.
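The computation in Example 3.15 can be checked by Monte Carlo: sampling ω from P and averaging X(ω) approximates the integral (2.13). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
omega = rng.uniform(0.0, 1.0, size=200_000)   # samples from P ~ Unif([0, 1])
X = np.exp(omega)

mc_estimate = X.mean()      # Monte Carlo approximation of E[X] in (2.13)
exact = np.e - 1.0          # the closed-form value e - 1 computed above
```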

We recall below some properties of the expected value:

• For any A ∈ F we have E[1_A] = P[A], where 1_A(ω) is defined in (2.14).

• Independence: if the random variables X and Y are independent, then one has
    E[XY] = E[X] · E[Y] ,

• Jensen's inequality: if g : I → R is convex² on I ⊆ R, then for a random variable X with
  range(X) ⊆ I we have
    g(E[X]) ≤ E[g(X)] ,

• Markov–Chebyshev inequality: for an integrable random variable X we have that for any λ > 0
    P[|X| > λ] ≤ E[|X|] / λ .
We now go back to conditional expectations. Recall Example 1.2 and Example 1.3, where the
definition of conditional expectation was extended to be a function of the random variable we
are conditioning on (i.e., a random variable itself!). In other words, conditional expectations wrt
a random variable depend on the information contained in that random variable. It is therefore
natural to further generalize this concept to that of conditional expectation with respect to a
σ-algebra. To do so, we first introduce the indicator function.
Definition 3.16. Given a probability space (Ω, F, P) and A ∈ F, the indicator function of A is

    1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 otherwise.    (2.14)

Note that the above is a measurable function. Fixing a probability space pΩ, F, Pq, we define the
conditional expectation:
Proposition 3.17. If X is a random variable on pΩ, F, Pq with Er|X|s ă 8, and G Ă F is a
σ-algebra, then there is a unique random variable Y on pΩ, G, Pq such that

i) Er|Y |s ă 8 ,
ii) Er1A Y s “ Er1A Xs for all A P G .

Definition 3.18. We define the conditional expectation with respect to a σ-algebra G as the
unique random variable Y from Proposition 3.17, i.e., ErX|Gs :“ Y .
The intuition behind Definition 3.18 is that the conditional expectation of a random variable X
wrt a σ-algebra G ⊂ F is the G-measurable random variable Y that is equivalent (in terms
of expected value, or predictive power) to X given the information contained in G. In other words,
Y “ ErX|Gs is that random variable that is

i) measurable with respect to G, and


ii) the best approximation of the value of X given the information in G in the sense of
Proposition 3.17 ii).

2 A function g is convex on I ⊆ R if for all x, y ∈ I with [x, y] ⊆ I and for all λ ∈ [0, 1] one has g(λx + (1 − λ)y) ≤
λg(x) + (1 − λ)g(y).

The previous definition of conditional expectation wrt a fixed set of events is obtained by evaluating
the random variable ErX|Gs on the events of interest, i.e., by fixing the events in G that may have
occurred.
When we condition on a random variable we are really conditioning on the information that
random variable is giving to us. In other words, we are conditioning on the σ-algebra generated by
that random variable:
E rX|Zs :“ E rX|σpZqs .
As in the discrete case, one can show that there exists a function h : Range(Z) → R such that

    E[X | Z](ω) = h(Z(ω)) ,

and hence we can think of the conditional expectation as a function of Z(ω). In particular, this
allows us to define

    E[X | Z = z] := h(z) .
Example 3.19 (Example 3.15 continued). Consider Ω “ r0, 1s, P „ UnifpΩq and the real random
variable Xpωq “ eω . We further define A1 “ r0, 13 s, A2 “ p 13 , 32 s, A3 “ p 23 , 1s and G “ σptA1 , A2 , A3 uq.
We want to find E rX|Gs. By definition (point i) above) the random variable Y “ E rX|Gs must be
measurable on G, i.e., it must assign a unique value to all the outcomes ω in each of the intervals
A1 , A2 , A3 . Therefore, we can write a random variable Y P G as
ÿ
Y pωq “ ai 1Ai pωq . (2.15)
iPI

for I “ t1, 2, 3u and real numbers tai uiPI .3 It therefore only remains to specify the value of tai uiPI so
that Y (which is now measurable wrt G) is the best approximation to the original random variable
X. We do so enforcing the condition from Proposition 3.17 ii):
« ff
“ ‰ ÿ ÿ “ ‰ aj
E Y 1Aj “ E ai 1Ai 1Aj “ ai E 1Ai XAj “ aj P rω P Aj s “
iPI iPI
3
ż
E X1Aj “ E eω 1Aj “ eω dω
“ ‰ “ ‰
Aj

and evaluating the above we obtain


E rX|Gs pωq “ 3pe1{3 ´ 1q1A1 pωq ` 3pe2{3 ´ e1{3 q1A2 pωq ` 3pe1 ´ e2{3 q1A3 pωq .
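The coefficients a_j can also be recovered numerically: on each atom A_j, the defining property ii) forces a_j = E[X 1_{A_j}] / P[A_j], i.e. the average of X over the atom. A minimal sketch (the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.uniform(0.0, 1.0, size=300_000)
X = np.exp(omega)

# Atoms A_1 = [0, 1/3], A_2 = (1/3, 2/3], A_3 = (2/3, 1].
edges = [0.0, 1/3, 2/3, 1.0]
a_hat = []
for lo, hi in zip(edges[:-1], edges[1:]):
    atom = (omega >= lo) & (omega <= hi)
    # Property ii) on the atom: a_j = E[X 1_{A_j}] / P[A_j].
    a_hat.append(X[atom].mean())

exact = [3*(np.exp(1/3) - 1), 3*(np.exp(2/3) - np.exp(1/3)), 3*(np.e - np.exp(2/3))]
```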
We now list some properties of the conditional expectation:
‚ Linearity: for all α, β P R we have
ErαX ` βY |Gs “ αErX|Gs ` βErY |Gs ,
‚ if X is G-measurable then
ErX|Gs “ X and ErXY |Gs “ XErY |Gs .
Intuitively, since X P G (X is measurable wrt the σ-algebra G), the best approximation of
X on the sets contained in G is X itself, so we do not need to approximate it.
• if X is independent of G then
    E[X|G] = E[X], and in particular E[X | {Ω, ∅}] = E[X] .
3In fact, we know that when a σ-algebra G is countable any real-valued random variable Z P G has the form (2.15)
for an index set I, a family of sets tAi uiPI Ă G and real numbers tai uiPI .

• Tower property: if G and H are both σ-algebras with G ⊂ H, then
    E[E[X|H] | G] = E[E[X|G] | H] = E[X|G] .
  Since G is a smaller σ-algebra, the functions which are measurable with respect to it are
  contained in the space of functions measurable with respect to H. More intuitively,
  E[E[X|H] | G] means: first give the best guess using only the information contained in H,
  and then re-evaluate this guess using only the information in G, which is a subset of the
  information in H. Limiting oneself to the information in G is the bottleneck, so in the end
  it is the only effect one sees. In other words, once one takes the conditional expectation
  with respect to a smaller σ-algebra one loses information. Therefore, in E[E[X|G] | H] the
  innermost expectation loses information that cannot be recovered by the outer one.
• Optimal approximation: the conditional expectation with respect to a σ-algebra G ⊂ F
  is the G-measurable random variable satisfying

    E[X|G] = argmin_{Y measurable w.r.t. G} E[(X − Y)²] .    (2.16)

  This should be thought of as the best guess of the value of X given the information in G.

Example 3.20 (Example 3.12 continued). In the previous example, E[X|F_Y] is the best
approximation to X which is measurable with respect to F_Y, that is, constant on Y^{-1}(−1) and
Y^{-1}(1). In other words, E[X|F_Y] is the random variable built from a function h_min composed with
the random variable Y such that the expression

    E[(X − h_min(Y))²]

is minimized. Since Y(ω) takes only two values in our example, the only details of h_min which matter
are its values at 1 and −1. Furthermore, since h_min(Y) only depends on the information in Y, it
is measurable with respect to F_Y. If by chance X is measurable with respect to F_Y, then the best
approximation to X is X itself, so in that case E[X|F_Y](ω) = X(ω).
In light of (2.16), we see that
ErX|Y1 , . . . , Yk s “ ErX|σpY1 , . . . , Yk qs
This fits with our intuitive idea that σpY1 , . . . , Yk q embodies the information contained in the
random variables Y1 , Y2 , . . . Yk and that ErX|σpY1 , . . . , Yk qs is our best guess at X if we only know
the information in σpY1 , . . . , Yk q.

4. Distributions and Convergence of Random Variables


Definition 4.1. We say that two X-valued random variables X and Y have the same distribution
or have the same law if for all bounded (measurable) functions f : X → R we have E[f(X)] =
E[f(Y)]. This equivalence is sometimes written
LawpXq “ LawpY q (2.17)
Remark 4.2. Each of the following is equivalent to two random variables X and Y on a
probability space (Ω, F, P) having the same distribution.

i) E[f(X)] = E[f(Y)] for all continuous f with compact support.
ii) P[X < x] = P[Y < x] for all x ∈ R.⁴

As in the case of functions, there are many ways a sequence of random variables tXn unPN can
converge to another random variable X:
Definition 4.3. Let tXn unPN be a sequence of random variables on a probability space pΩ, F, Pq,
and let X be a random variable on the same space. Then
• almost sure convergence: {X_n} converges to X almost surely if
    P[{ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}] = 1 ,

• convergence in probability: {X_n} converges to X in probability if, for all ε > 0,
    lim_{n→∞} P[{ω ∈ Ω : |X_n(ω) − X(ω)| > ε}] = 0 ,

• weak convergence: {X_n} converges weakly (or in distribution) to X if
    lim_{n→∞} P[X_n < x] = P[X < x]
  for every x ∈ R at which x ↦ P[X < x] is continuous.

• L^p convergence: for p ≥ 1, {X_n} converges in L^p to X if
    lim_{n→∞} E[|X_n − X|^p] = 0 .

Remark 4.4. The above definitions can be ordered by strength: we have the following implications
almost sure convergence ñ convergence in probability ñ weak convergence .
and, for 1 ď q ď p ă 8
convergence in Lp ñ convergence in Lq ñ convergence in probability .
Moreover, we note that in order to have convergence in distribution the random variables do
not need to live on the same probability space.
A useful method of showing that the distribution of a sequence of random variables converges to
another is to consider the associated sequence of Fourier transforms, or the characteristic function
of a random variable as it is called in probability theory.
Definition 4.5. The characteristic function (or Fourier Transform) of a random variable X is
defined as
ψptq “ ErexppitXqs
for all t P R.
It is a basic fact that the characteristic function of a random variable uniquely determines its
distribution. Furthermore, the following convergence theorem is a classical theorem from probability
theory.
Theorem 4.6. Let Xn be a sequence of real-valued random variables and let ψn be the associated
characteristic functions. Assume that there exists a function ψ so that for each t P R
lim ψn ptq “ ψptq .
nÑ8
If ψ is continuous at zero then there exists a random variable X so that the distribution of Xn
converges to the distribution of X. Furthermore the characteristic function of X is ψ.
4or, equivalently if PrX P As “ PrY P As for all continuity sets A P F.

Example 4.7. If X is a one-dimensional Gaussian random variable with mean m and variance σ²,
then its characteristic function is

    E[e^{iλX}] = e^{iλm − σ²λ²/2}

for all λ ∈ R. Motivated by this, we say that X = (X_1, …, X_k) is a k-dimensional Gaussian if there exist
m ∈ R^k and R a positive definite symmetric k × k matrix so that for all λ ∈ R^k we have

    E[e^{iλ·X}] = e^{iλ·m − (Rλ)·λ/2} .
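The one-dimensional Gaussian characteristic function can be verified empirically by averaging exp(iλX) over samples. A minimal sketch (the mean, standard deviation, and λ below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
m, sigma = 0.7, 1.3                     # illustrative parameters
X = rng.normal(m, sigma, size=400_000)

lam = 0.9
mc_cf = np.exp(1j * lam * X).mean()                         # E[exp(i lam X)]
exact_cf = np.exp(1j * lam * m - sigma**2 * lam**2 / 2)     # Gaussian formula
```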

CHAPTER 3

Brownian Motion and Stochastic Processes

1. An Illustrative Example: A Collection of Random Walks


Fixing an n ≥ 0, let {ξ_k^(n) : k = 1, …, 2^n} be a collection of independent random variables, each
distributed as a normal with mean zero and variance 2^{-n}. For t = k2^{-n} with k ∈ {1, …, 2^n}, we
define

    B^(n)(t) = Σ_{j=1}^k ξ_j^(n) .    (3.1)

For intermediate times t ∈ [0, 1] not of the form k2^{-n}, we define the function by linear
interpolation between the two nearest points of that form. In other words, if t ∈ [s, r] where
s = k2^{-n} and r = (k + 1)2^{-n}, then

    B^(n)(t) = ((r − t)/2^{-n}) B^(n)(s) + ((t − s)/2^{-n}) B^(n)(r) ,

where 2^{-n} = r − s.
We will see momentarily that B^(n) has the following properties, independent of n:

i) B^(n)(0) = 0.
ii) E[B^(n)(t)] = 0 for all t ∈ [0, 1].
iii) E[|B^(n)(t) − B^(n)(s)|²] = t − s for 0 ≤ s < t ≤ 1 of the form k2^{-n}.
iv) The distribution of B^(n)(t) − B^(n)(s) is Gaussian for 0 ≤ s < t ≤ 1 of the form k2^{-n}.
v) The collection of random variables

    {B^(n)(t_i) − B^(n)(t_{i−1})}

   are mutually independent as long as 0 ≤ t_0 < t_1 < … < t_m ≤ 1 for some m and the {t_i}
   are of the form k2^{-n}.

The first property is clear since the sum in (3.1) is empty. The second property for t = k2^{-n}
follows from

    E[B^(n)(t)] = Σ_{j=1}^k E[ξ_j^(n)] = 0 ,

since E[ξ_j^(n)] = 0 for all j ∈ {1, …, 2^n}. For general t, we have t ∈ (s, r) = (k2^{-n}, (k + 1)2^{-n}) for
some k, so that

    E[B^(n)(t)] = ((r − t)/2^{-n}) E[B^(n)(s)] + ((t − s)/2^{-n}) E[B^(n)(r)] = 0 .
To see the second moment calculation, take s = m2^{-n} and t = k2^{-n} with m < k and observe that

    E[|B^(n)(t) − B^(n)(s)|²] = E[ (Σ_{j=m+1}^k ξ_j^(n)) (Σ_{ℓ=m+1}^k ξ_ℓ^(n)) ]
                              = Σ_{j=m+1}^k Σ_{ℓ=m+1}^k E[ξ_j^(n) ξ_ℓ^(n)]
                              = Σ_{j=m+1}^k E[(ξ_j^(n))²] + Σ_{j≠ℓ} E[ξ_j^(n)] E[ξ_ℓ^(n)]
                              = Σ_{j=m+1}^k 2^{-n} = k2^{-n} − m2^{-n} = t − s ,

since ξ_j^(n) and ξ_ℓ^(n) are independent if j ≠ ℓ, while by definition E[ξ_j^(n)] = 0 and E[(ξ_j^(n))²] = 2^{-n}.
Since B^(n)(t) is just a sum of independent Gaussians, it is itself Gaussian, with mean and variance
given by the sums of the individual means and variances respectively.
Because for disjoint time intervals the differences B^(n)(t_i) − B^(n)(t_{i−1}) are sums over disjoint
collections of the ξ_j's, they are mutually independent. □
Since all of these properties are independent of n it is tempting to think about the limit as
n Ñ 8 and the mesh becoming increasingly fine. It is not clear that such a limit would exist as the
curves B pnq become increasingly “wiggly.” We will see in fact that it does exist. We begin by taking
an abstract perspective in the next sections though we will return to a more concrete perspective at
the end.
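Properties i)–v) above can be checked numerically for a fixed n by sampling many independent copies of the walk. A minimal sketch (the level n = 6 and the particular dyadic times are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, paths = 6, 100_000
# xi_k^(n) ~ N(0, 2^{-n}), k = 1, ..., 2^n, as in (3.1).
xi = rng.normal(0.0, np.sqrt(2.0 ** -n), size=(paths, 2 ** n))
B = np.concatenate([np.zeros((paths, 1)), xi.cumsum(axis=1)], axis=1)
# B[:, k] holds B^(n)(k 2^{-n}); property i): B^(n)(0) = 0.

incr = B[:, 48] - B[:, 16]        # increment over [16/64, 48/64], length 1/2
mean_incr = incr.mean()           # property ii): should be ~ 0
var_incr = incr.var()             # property iii): should be ~ t - s = 1/2
# property v): increments over disjoint intervals should be uncorrelated.
cov = np.mean((B[:, 16] - B[:, 0]) * (B[:, 48] - B[:, 16]))
```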

2. General Stochastic Processes


Motivated by the example of the previous section, we pause to discuss the idea of a stochastic
process more generally.
Definition 2.1. Let pΩ, F, Pq be a probability space and let pX, Bq be a measurable space. Also
let T be an indexing set which for our purposes will typically be R, R` , N, or Z. Suppose that for
each t P T we have Xt : Ω Ñ X a measurable function. Then the set tXt u is a stochastic process
on T with values in X. Also, given ω P Ω, Xt pωq : T Ñ X is called a path or trajectory of tXt u.
Remark 2.2. Commonly used notations for stochastic processes include tXt u, Xt pωq, X¨ , . . .
Now that we have defined a stochastic process we would like to characterize its distribution.
However, as we have seen in Example 3.14, defining a probability distribution on high-dimensional
spaces such as the ones we are dealing with is not entirely trivial. We recall that one way to define
a probability distribution on an uncountable σ-algebra (such as the ones generated by the
stochastic processes defined above) is to first define the probability of a certain family of events,
and then generate a σ-algebra from those events. A natural and useful way to define events of
interest is via the marginals of the process: for any finite collection of times
{t_i}_{i=1}^n and corresponding collection of sets {A_i}_{i=1}^n in X, one can identify the distribution of a process
by specifying the probabilities

    P[X_{t_1} ∈ A_1, X_{t_2} ∈ A_2, …, X_{t_n} ∈ A_n] = μ_{t_1,t_2,…,t_n}(A_1, A_2, …, A_n) .    (3.2)
The above definition of the distribution of a process allows us to define a type of equivalence
between stochastic processes.
Definition 2.3. We say that two stochastic processes have the same distribution or have the
same law if for all t1 ă ¨ ¨ ¨ ă tn P T we have
LawpXt1 , . . . , Xtn q “ LawpYt1 , . . . , Ytn q
where we think of the vector pXt1 , . . . Xtn q as a random variable taking values in the product space
Xn .
We see that in order to be sensible probability measures on cylinder sets (events appearing
on the rhs of (3.2)), the family of functions tµu must have some properties, summarized in the
definition below.
Definition 2.4. Given a set of finite dimensional distributions tµu over an indexing set T on
X we say that the set is compatible if
i) For all t1 ă ¨ ¨ ¨ ă tm`1 P T and A1 , . . . , Am P B we have
µt1 ...tm pA1 , . . . , Am q “ µt1 ...tm`1 pA1 , . . . , Am , Xq
ii) For all t1 ă ¨ ¨ ¨ ă tm , A1 , . . . , Am P B, and σ a permutation on m letters, we have
` ˘
µt1 ...tm pA1 , . . . , Am q “ µtσp1q ...tσpmq Aσp1q , . . . , Aσpmq
The first condition is roughly saying that if one considers a null condition (the total space) in
a higher-dimensional measure, one gets the same result without the null condition in the lower
dimensional measure. The second condition is saying that the order of the indexing of the µ doesn’t
matter.
Remark 2.5. The first of the above two requirements is called the Chapman-Kolmogorov
equation.
We conclude this paragraph by stating a nice extension theorem for constructing stochastic
processes. This theorem (much as the Carathéodory extension theorem in Example 3.14) extends
the definition of the probability measure P originally defined on cylinder sets to the whole σ-algebra
and ensures the existence of a stochastic process with such distribution.
Theorem 2.6 (Kolmogorov Extension Theorem). Given a set of compatible finite dimensional
distributions tµt1 ...tm u with indexing set T , there exists a probability space pΩ, F, Pq and a stochastic
process tXt u so that Xt has the required finite dimensional distributions, i.e. for all t1 ă ¨ ¨ ¨ ă tm P T
and A1 , . . . Am P B we have
PrXt1 P A1 and . . . and Xtm P Am s “ µt1 ...tm pA1 , . . . Am q .

3. Definition of Brownian motion (Wiener Process)


Looking back at Section 1, the list of properties of B pnq suggest a reasonable collection of
compatible finite distributions. Namely independent increments with each increment distributed
normally with mean zero and variance proportional to the time interval. These are the distribution
of marginals of a fundamental process in stochastic calculus: Brownian motion, which we define
below.
Definition 3.1. Standard Brownian motion tBt u is a stochastic process on R such that
i) B0 “ 0 almost surely (i.e. P rtω P Ω : B0 ‰ 0us “ 0),
ii) Bt has independent increments: for any t1 ă t2 ă . . . ă tn ,
Bt1 , Bt2 ´ Bt1 , . . . , Btn ´ Btn´1 are independent,

iii) The increments Bt ´ Bs are Gaussian random variables with mean 0 and variance given
by the length of the interval:
VarpBt ´ Bs q “ |t ´ s| .
iv) The paths t ÞÑ Bt pωq are continuous with probability one.
We define, in particular, a process satisfying assumption iv) above as continuous:
Definition 3.2. A stochastic process is continuous if its paths t → X_t(ω) are continuous with
probability one.
This allows us to define Brownian motion concisely as:
Brownian motion is a continuous stochastic process with independent Gaussian increments B_t − B_s ∼ N(0, t − s).
Points i)–iii) of the above definition specify the distribution of the increments of Brownian
motion, which in turn defines the distribution of the marginals (3.2). Indeed, we can “separate” the
event of a path running through two sets A1 , A2 at times t1 ă t2 by considering the events of
i) the path arriving at y P A1 at time t1 and
ii) the path arriving at z P A2 conditioned on starting at y.
For a fixed y, the second event depends exclusively on the increment B_{t_2} − B_{t_1}, and by the definition
of Brownian motion we can write:
    P[B_{t_1} ∈ A_1, B_{t_2} ∈ A_2] = P[B_{t_1} − B_0 ∈ A_1, B_{t_1} + (B_{t_2} − B_{t_1}) ∈ A_2]
        = ∫_{A_1} P[y + (B_{t_2} − B_{t_1}) ∈ A_2 | B_{t_1} = y] P[B_{t_1} − B_0 ∈ dy]
        = ∫_{A_1} ∫_{A_2} P[y + (B_{t_2} − B_{t_1}) ∈ dz] P[B_{t_1} − B_0 ∈ dy]
        = ∫_{A_1} ∫_{A_2} ρ(y, z, t_2 − t_1) dz ρ(0, y, t_1) dy    (3.3)
where in the third line we have used the independence of the increments, and in the last line we
have used that the increments are normal random variables. Furthermore we have defined
    ρ(x_1, x_2, Δt) = (1/√(2πΔt)) e^{−(x_1 − x_2)²/(2Δt)} ,    (3.4)
which can be interpreted as the probability density of a transition from x1 to x2 in a time interval
∆t. The above conditioning procedure can trivially be extended to any finite number of marginals.
The family of probability distributions that this process generates is compatible according to
Definition 2.4, and therefore Theorem 2.6 guarantees the existence of the process we have described.
More precisely, Theorem 2.6 guarantees the existence of a process with properties i) and ii) from
above.
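Formula (3.3) can be checked numerically: integrating the product of transition densities over A_1 × A_2 should match the frequency observed when simulating the two increments directly. A minimal sketch (the times t_1, t_2 and the intervals A_1, A_2 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)

def rho(x1, x2, dt):
    """Transition density (3.4)."""
    return np.exp(-(x1 - x2) ** 2 / (2 * dt)) / np.sqrt(2 * np.pi * dt)

t1, t2 = 0.5, 1.0
A1, A2 = (0.0, 1.0), (-0.5, 0.5)          # illustrative intervals

# Midpoint-rule double integral of (3.3) over A1 x A2.
ny = nz = 400
dy, dz = (A1[1] - A1[0]) / ny, (A2[1] - A2[0]) / nz
y = A1[0] + dy * (np.arange(ny) + 0.5)
z = A2[0] + dz * (np.arange(nz) + 0.5)
prob_formula = (rho(0.0, y, t1)[:, None]
                * rho(y[:, None], z[None, :], t2 - t1)).sum() * dy * dz

# Monte Carlo using independent Gaussian increments.
paths = 400_000
B1 = rng.normal(0.0, np.sqrt(t1), paths)
B2 = B1 + rng.normal(0.0, np.sqrt(t2 - t1), paths)
prob_mc = ((B1 > A1[0]) & (B1 < A1[1]) & (B2 > A2[0]) & (B2 < A2[1])).mean()
```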
Notice that the above definition makes no mention of the continuity. It turns out that the finite
dimensional distributions can not guarantee that a stochastic process is almost surely continuous
(but they can imply that it is possible for a given process to be continuous, as we see below). Indeed,
defining the distribution of the marginals still leaves us with some freedom in the choice of the
process. This is captured by the following definition:
Definition 3.3. A stochastic process tXt u is a version (or modification) of a second stochastic
process tYt u if for all t , PrXt “ Yt s “ 1. Notice that this is a symmetric relation.
We now give an example showing that different versions of a process, despite having by definition
the same distribution, can have different continuity properties:
Example 3.4. We consider the probability space (Ω, F, P) = ([0, 1], B, Unif([0, 1])). On this
space we define two processes:

    X_t(ω) = 0   and   Y_t(ω) = 0 if t ≠ ω, Y_t(ω) = 1 if t = ω .

We immediately see that for any t ∈ [0, 1]

    P[X_t ≠ Y_t] = P[ω = t] = 0 ,

so that one process is a version of the other. However, all of the paths of X_t are continuous,
while none of the paths of Y_t are.
The previous example showcases a family of marginal distributions that is trivially compatible
with the continuity of the process it represents (the process can have a continuous version). The
following theorem gives sufficient conditions on the distribution of the marginals guaranteeing that the
corresponding process has a continuous version. We use this result to prove that, in particular, the
marginals defined by points i)–iii) in Def. 3.1 are compatible with iv), i.e., there exists a process
satisfying all such conditions.
Theorem 3.5 (Kolmogorov Continuity Theorem (a version)). Suppose that a stochastic process
tXt u, t ě 0 satisfies the estimate:
for all T ą 0 there exist positive constants α, β, D so that
Er|Xt ´ Xs |α s ď D|t ´ s|1`β @t, s P r0, T s, (3.5)
then there exists a version of X_t which is continuous.
Remark 3.6. The estimate in (3.5) holds for a Brownian motion. We give the details in
one-dimension. First recall that if X is a Gaussian random variable with mean 0 and variance σ 2
then E[X⁴] = 3σ⁴. Applying this to Brownian motion we have E|B_t − B_s|⁴ = 3|t − s|² and conclude
that (3.5) holds with α = 4, β = 1, D = 3. Hence it is not incompatible with the already-assumed
properties of Brownian motion to assume that B_t is continuous almost surely.
Remark 3.6 shows that continuity is a fundamental attribute of Brownian motion. In fact we have the
following second (and equivalent) definition of Brownian motion which assumes a form of continuity as a
basic assumption, replacing other assumptions.
Theorem 3.7. Let Bt be a stochastic process such that the following conditions hold:
i) EpB12 q “ constant,
ii) B0 “ 0 almost surely,
iii) Bt`h ´ Bt is independent of tBs : s ď tu.
iv) The distribution of Bt`h ´ Bt is independent of t ě 0 (stationary increments),
v) (Continuity in probability.) For all δ ą 0,
lim Pr|Bt`h ´ Bt | ą δs “ 0
hÑ0

then B_t is a Brownian motion. When E[B_1²] = 1 we call it standard Brownian motion.


The process introduced above can be straightforwardly generalized to n dimensions:
Definition 3.8. n-dimensional Standard Brownian motion tBt u is a stochastic process on Rn
such that
i) B_0 = 0 almost surely (i.e. P[{ω : B_0 ≠ 0}] = 0),
ii) B_t has independent increments: for any t_1 < t_2 < … < t_m,
    B_{t_1}, B_{t_2} − B_{t_1}, …, B_{t_m} − B_{t_{m−1}} are independent,
iii) The increments B_t − B_s are Gaussian random vectors with mean 0 and covariance given
    by the length of the interval: denoting by (B_t)_i the i-th component of B_t,

    Cov((B_t − B_s)_i, (B_t − B_s)_j) = |t − s| if i = j, and 0 otherwise.

4. Constructive Approach to Brownian motion


Returning to the construction of Section 1, one might be tempted to hope that the random
walks B^(n) converge to a Brownian motion B_t as n → ∞. While it is true that the distribution
of B^(n) converges weakly to that of B_t as n → ∞, a moment's reflection shows that there is no
hope that the sequence converges almost surely, since B^(n) and B^(n+1) have no relation for a given
realization of the underlying random variables {ξ_k^(n)}.
We will now show that by cleverly rearranging the randomness we can construct a new sequence
of random walks W^(n)(t) so that the stochastic process W^(n) has the same distribution as B^(n), yet
W^(n) will converge almost surely to a realization of Brownian motion.
We begin by defining a new collection of random variables {η_k^(n)} from the {ξ_k^(n)}. We define
η_1^(0) to be a normal random variable with mean 0 and variance 1 which is independent of all of the
ξ's. Then for n ≥ 0 and k ∈ {1, …, 2^n} we define

    η_{2k}^(n+1) = ½ η_k^(n) + ½ ξ_k^(n)   and   η_{2k−1}^(n+1) = ½ η_k^(n) − ½ ξ_k^(n) .

Since each η_k^(n) is a sum of independent Gaussian random variables, the η's are themselves Gaussian.
It is easy to see that η_k^(n) has mean zero and variance 2^{-n}. Moreover, for any n ≥ 0 and
j, k ∈ {1, …, 2^n} with j ≠ k we have E[η_k^(n) η_j^(n)] = 0, and because the variables are jointly Gaussian
we conclude that the collection {η_k^(n) : k ∈ {1, …, 2^n}} is mutually independent. Hence if we define

    W^(n)(k2^{-n}) = Σ_{j=1}^k η_j^(n) ,

and define W^(n) at intermediate times as the value of the line connecting the two nearest such points,
then W^(n) has the same distribution as B^(n) from Section 1.
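The refinement rule can be implemented directly; the key structural fact is that consecutive pairs of the η^(n+1) sum to the η^(n), so W^(n+1) agrees with W^(n) at the coarser dyadic points while adding new independent detail in between. A sketch (the number of refinement levels is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
paths, levels = 50_000, 5

# eta starts as eta_1^(0) ~ N(0, 1), one per sample path.
eta = rng.normal(0.0, 1.0, size=(paths, 1))
for n in range(levels):
    # xi_k^(n) ~ N(0, 2^{-n}), independent of everything so far.
    xi = rng.normal(0.0, np.sqrt(2.0 ** -n), size=eta.shape)
    new = np.empty((paths, 2 * eta.shape[1]))
    new[:, 0::2] = eta / 2 - xi / 2       # eta_{2k-1}^(n+1)
    new[:, 1::2] = eta / 2 + xi / 2       # eta_{2k}^(n+1)
    W_coarse = eta.cumsum(axis=1)         # W^(n) at the points k 2^{-n}
    eta = new
W_fine = eta.cumsum(axis=1)               # W^(levels) at the points k 2^{-levels}

# Consecutive pairs of eta^(n+1) sum to eta^(n), so the refined walk
# agrees with the coarser one at the coarser dyadic points:
consistency_gap = np.abs(W_fine[:, 1::2] - W_coarse).max()
var_at_one = W_fine[:, -1].var()          # Var W^(levels)(1) should be ~ 1
```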
Theorem 4.1. With probability one, the sequence of functions (W^(n)(t))(ω) converges
uniformly on [0, 1] to a continuous function B_t(ω) as n → ∞, and the process B_t(ω) is a Brownian motion
on [0, 1].
Proof. For n ≥ 0 and k ∈ {1, …, 2^n} define

    Z_k^(n) = sup_{t ∈ [(k−1)2^{-n}, k2^{-n}]} |W^(n)(t) − W^(n+1)(t)|

and observe that, since both paths are piecewise linear and agree at the endpoints of the interval,
the supremum is attained at the midpoint, so

    Z_k^(n) = |½ η_k^(n) − η_{2k−1}^(n+1)| = |½ ξ_k^(n)| .

Since ½ ξ_k^(n) is normal with mean zero and variance 2^{-(n+2)}, we have by the Markov inequality that

    P[Z_k^(n) > δ] ≤ E[|Z_k^(n)|⁴]/δ⁴ = 3 · 2^{-2(n+2)}/δ⁴ .
In turn, a union bound over the 2^n intervals implies that

    P[ sup_{t∈[0,1]} |W^(n)(t) − W^(n+1)(t)| > δ ] = P[ sup_k Z_k^(n) > δ ]
        ≤ 2^n P[Z_1^(n) > δ] ≤ 2^n · 3 · 2^{-2(n+2)}/δ⁴ =: ψ(n, δ) .
Since ψ(n, 2^{-n/5}) ∼ c 2^{-n/5} for some c > 0, we have that

    Σ_{n=1}^∞ P[ sup_{t∈[0,1]} |W^(n)(t) − W^(n+1)(t)| > 2^{-n/5} ] < ∞ .

Hence the Borel-Cantelli lemma implies that with probability one there exists a random kpωq so
that if n ě k then
    sup_{t∈[0,1]} |W^(n)(t) − W^(n+1)(t)| ≤ 2^{-n/5} .

In other words, with probability one the {W^(n)} form a Cauchy sequence in the supremum norm. Let
B_t denote the limit. It is not hard to see that B_t has the properties that define Brownian motion.
Furthermore, since each W^(n) is continuous and the convergence is uniform, we conclude that
with probability one B_t is also continuous. □

5. Brownian motion has Rough Trajectories


In Section 4, we saw that Brownian motion could be seen as the limit of an ever-roughening
path. This leads us to wonder “how rough is Brownian motion?” We know it is continuous, but is
it differentiable?
Definition 5.1. The (standard) p-th variation on the interval (s, t) of a continuous function
f is defined to be

    V_p[f](s, t) = sup_Γ Σ_k |f(t_{k+1}) − f(t_k)|^p ,    (3.6)

where the supremum is over all partitions

    Γ = {{t_k} : s = t_0 < t_1 < … < t_{n−1} < t_n = t} ,    (3.7)

for some n.
For a given partition Γ = {t_k}, let us define the mesh width of the partition to be

    |Γ| = sup_{0<k≤N} |t_k − t_{k−1}| .

The variations of Brownian motion are finite only for a certain range of p:
Proposition 5.2. If Bt is a Brownian motion on the interval r0, T s with T ă 8, then
Vp rBsp0, T q ă 8 a.s. if and only if p ą 2. (3.8)
Proof. See [20, 13] for details. 
The fact that large values of p imply boundedness of the p-th variation may be surprising
at first. However, this results from the fact that in order for the supremum in (3.6) to diverge
for a continuous function we must consider a sequence Γ_N of partitions with a diverging number of
intervals. As these intervals become smaller, the variation that each of them captures becomes
smaller. These small contributions become even smaller if they are raised to a power p > 1, whence
the (possible) convergence. This concept is in close relation with that of Hölder continuity.
Notice that if a function f has a bounded derivative on the interval [0, t], then V_1[f](0, t) < ∞,
since

    |f(t_{k+1}) − f(t_k)| = |∫_{t_k}^{t_{k+1}} f′(s) ds| ≤ (sup_{s∈[0,t]} |f′(s)|) |t_{k+1} − t_k| ,

so that, summing over any partition,

    V_1[f](0, t) ≤ (sup_{s∈[0,t]} |f′(s)|) t .

Similar considerations hold if f is Lipschitz continuous on [0, T] with Lipschitz constant L:

    V_1[f](0, T) = sup_Γ Σ_k |f(t_{k+1}) − f(t_k)| ≤ Σ_k L(t_{k+1} − t_k) = LT .
Hence (3.8) implies that with probability one Brownian motion can not have a bounded derivative
on any interval. In fact something much stronger is true. With probability one, Brownian motion is
nowhere differentiable as a function of time (see [13] for details).
From Proposition 5.2, we see that p “ 2 is the border case. It is quite subtle. On one hand the
statement (3.8) is true, yet if one considers a specific sequence of shrinking partitions ΓpN q (such
that each successive partition contains the previous partition as a sub-partition) then
    Q_N(T) := Σ_{Γ^(N)} |B(t_{k+1}^(N)) − B(t_k^(N))|² → T  a.s.
Initially we will prove the following simpler statement.
Theorem 5.3. Let Γ^(N) be a sequence of partitions of [0, T] as in (3.7) with lim_{N→∞} |Γ^(N)| = 0.
Then

    Q_N(T) := Σ_{Γ^(N)} |B(t_{k+1}^(N)) − B(t_k^(N))|² → T  as N → ∞    (3.9)

in L²(Ω, P).
Corollary 5.4. Under the conditions of the above theorem we have limN Ñ8 QN pT q “ T in
probability.
Proof. Writing Z_N := Q_N(T), we see by the Chebyshev inequality that for any ε > 0

    P[ω : |Z_N(ω) − T| > ε] ≤ E[|Z_N(ω) − T|²]/ε² → 0  as N → ∞. □
Proof of Theorem 5.3. Fix any sequence of partitions

    Γ^(N) := {{t_k^(N)} : 0 = t_0^(N) < t_1^(N) < … < t_N^(N) = T}

of [0, T] with |Γ^(N)| → 0 as N → ∞. Defining

    Z_N := Σ_{k=0}^{N−1} [B(t_{k+1}^(N)) − B(t_k^(N))]² ,

we need to show that E[(Z_N − T)²] → 0. Since E[Z_N] = Σ_k (t_{k+1}^(N) − t_k^(N)) = T, we have

    E[(Z_N − T)²] = E[Z_N²] − 2T E[Z_N] + T² = E[Z_N²] − T² .
Using the convenient notation Δ_k B := B(t_{k+1}^{(N)}) − B(t_k^{(N)}) and Δ_k t^{(N)} := t_{k+1}^{(N)} − t_k^{(N)}, we have that

$$E[Z_N^2] = E\Big[ \sum_n (\Delta_n B)^2 \sum_k (\Delta_k B)^2 \Big] = \sum_n E\big[ (\Delta_n B)^4 \big] + \sum_{n \ne k} E\big[ (\Delta_k B)^2 (\Delta_n B)^2 \big] = 3 \sum_n (\Delta_n t^{(N)})^2 + \sum_{n \ne k} (\Delta_k t^{(N)})(\Delta_n t^{(N)}) ,$$

since E[(Δ_k B)²] = Δ_k t^{(N)} and E[(Δ_k B)⁴] = 3 (Δ_k t^{(N)})², because Δ_k B is a Gaussian random variable with mean zero and variance Δ_k t^{(N)}.
The first term tends to 0 as the maximum partition spacing goes to zero, since

$$3 \sum_n (\Delta_n t^{(N)})^2 \le 3 \big( \sup_n \Delta_n t^{(N)} \big)\, T .$$
Returning to the remaining term,

$$\sum_{n \ne k} (\Delta_k t^{(N)})(\Delta_n t^{(N)}) = \sum_{k=1}^{N} \Delta_k t^{(N)} \Big[ \sum_{n=1}^{k-1} \Delta_n t^{(N)} + \sum_{n=k+1}^{N} \Delta_n t^{(N)} \Big] = \sum_{k=1}^{N} \Delta_k t^{(N)} \big( T - \Delta_k t^{(N)} \big) = T \sum_k \Delta_k t^{(N)} - \sum_k (\Delta_k t^{(N)})^2 \longrightarrow T^2 - 0 .$$

Summarizing, we have shown that

$$E\big[ (Z_N - T)^2 \big] \to 0 \quad \text{as } N \to \infty . \qquad \square$$
Corollary 5.5. Under the conditions of Theorem 5.3, if in addition Γ^{(N)} ⊂ Γ^{(N+1)} for every N, then lim_{N→∞} Q_N(T) = T almost surely.
Proof. You can prove the above result as an exercise. To do so, repeat the proof of the main
theorem and apply the Borel–Cantelli lemma where necessary. □
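The convergence of the quadratic variation is easy to see numerically. Below is a minimal sketch (the function name `quadratic_variation` is ours, not from the text): over a uniform partition of [0, T] with N intervals, each squared increment has mean T/N, so the sum concentrates around T as N grows.

```python
import random
import math

def quadratic_variation(T=1.0, N=2**14, seed=0):
    """Sample the sum of squared Brownian increments over a uniform
    partition of [0, T] with N intervals."""
    rng = random.Random(seed)
    dt = T / N
    # Each increment B(t_{k+1}) - B(t_k) is N(0, dt); square and sum.
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(N))

# As N grows, Q_N concentrates around T (here T = 1).
for N in (2**6, 2**10, 2**14):
    print(N, round(quadratic_variation(N=N), 3))
```

Since Var(Q_N) = 2 Σ_k (Δ_k t)² = 2T²/N for a uniform partition, the printed values tighten around T = 1 as N increases.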

6. More Properties of Random Walks


We now return to the family of random walks constructed in Section 1. The collection of random
walks B^{(n)}(t) constructed in (3.1) has additional properties which are useful to identify and isolate,
as these general structures will be important for our development of stochastic calculus.
Fixing an n ≥ 0, define t_k = k 2^{−n} for k = 0, …, 2ⁿ. Then notice that for each such k, B^{(n)}(t_k)
is a Gaussian random variable, since it is the sum of the mutually independent random variables
ξ_j^{(n)}. Furthermore, for any collection of times (t_{k_1}, t_{k_2}, …, t_{k_m}) with k_j ∈ {1, …, 2ⁿ} we have that

$$\big( B^{(n)}(t_{k_1}), \cdots, B^{(n)}(t_{k_m}) \big)$$

is a multidimensional Gaussian vector.
Next notice that for 0 ≤ t_ℓ < t_k ≤ 1 we have

$$E\big[ B^{(n)}_{t_k} \mid B^{(n)}_{t_\ell} \big] = E\big[ (B^{(n)}_{t_k} - B^{(n)}_{t_\ell}) + B^{(n)}_{t_\ell} \mid B^{(n)}_{t_\ell} \big] = E\big[ B^{(n)}_{t_k} - B^{(n)}_{t_\ell} \big] + B^{(n)}_{t_\ell} = B^{(n)}_{t_\ell} \qquad (3.10)$$

since B^{(n)}_{t_k} − B^{(n)}_{t_ℓ} is independent of B^{(n)}_{t_ℓ} and E[B^{(n)}_{t_k} − B^{(n)}_{t_ℓ}] = 0. We will choose to view this single
fact as the result of two finer-grained facts. The first is that the conditional distribution of the walk at time
t_k given the values {B^{(n)}_s : s ≤ t_ℓ} is the same as the conditional distribution of the walk at time t_k
given only B^{(n)}_{t_ℓ}. In light of Definition 4.1, we can state this more formally by saying that for all
functions f

$$E\big[ f(B^{(n)}_{t_k}) \mid \mathcal F_{t_\ell} \big] = E\big[ f(B^{(n)}_{t_k}) \mid B^{(n)}_{t_\ell} \big]$$

where F_t = σ(B^{(n)}_s : s ≤ t). This property is called the Markov property, which states that the
distribution of the future depends on the past only through the present value of the process.
There is a stronger version of this property, called the strong Markov property, which states that
one can in fact stop the process at a (possibly random) time, restart it from the current value, run it for the
remaining amount of time, and obtain the same answer. To state this more precisely, let us introduce
the process X(t) = x + B^{(n)}(t), the random walk starting from the point x, and let P_x be the
probability distribution induced on C([0, 1]; R) by the trajectory of X(t) for fixed initial x. Let E_x
be the expected value associated to P_x. Of course P_0 is simply the law of the random walk starting from 0
that we have been previously considering. Then for t_ℓ < t_k the Markov property can be written as: for any
function f,

$$E_0\big[ f(X_{t_k}) \big] = E_0\big[ F(X_{t_\ell},\, t_k - t_\ell) \big] \quad \text{where} \quad F(x, t) = E_x\big[ f(X_t) \big] ,$$

and the strong version allows the fixed time t_ℓ to be replaced by a stopping time.
Neither of these Markov properties alone is enough to produce (3.10); we also need some fact
about the mean of the process given the past. Again defining F_t = σ(B^{(n)}_s : s ≤ t), we can rewrite
(3.10), using the Markov property, as

$$E\big[ B^{(n)}_{t_k} \mid \mathcal F_{t_\ell} \big] = B^{(n)}_{t_\ell} .$$

This equality is the principal fact that makes a process what is called a martingale.
We now revisit these ideas, making more general definitions which abstract these properties so
we can talk about and use them in broader contexts.

7. More Properties of General Stochastic Processes


Gaussian processes. We begin by giving the general definition of a Gaussian process of which
Brownian motion is an example.
Definition 7.1. {X_t} is a Gaussian random process if all finite-dimensional distributions of
X are Gaussian. That is, for all t_1 < ⋯ < t_k ∈ T and A_1, …, A_k ∈ B (where B denotes the Borel
sets of the real line), there exist a positive definite symmetric k × k matrix R and a vector m ∈ R^k so that

$$P[X_{t_1} \in A_1, \ldots, X_{t_k} \in A_k] = \int_{A_1 \times \cdots \times A_k} \frac{1}{(2\pi)^{k/2} \sqrt{\det R}}\, e^{-\frac{1}{2} (x - m)^T R^{-1} (x - m)}\, dx .$$
We have the associated definitions

$$\mu_t := E[X_t] , \qquad R_{t,s} := \mathrm{Cov}(X_t, X_s) = E\big[ (X_t - \mu_t)(X_s - \mu_s) \big] . \qquad (3.11)$$
Example 7.2. By definition, Brownian motion is a Gaussian process: we can rewrite the right-hand side
of (3.3) using the definition (3.4) and obtain

$$P[B_{t_1} \in A_1,\, B_{t_2} \in A_2] = \int_{A_1} \int_{A_2} \frac{1}{2\pi \sqrt{t_1 (t_2 - t_1)}}\, e^{-\left( \frac{y^2}{2 t_1} + \frac{(x - y)^2}{2 (t_2 - t_1)} \right)}\, dx\, dy .$$
Expanding the exponent of the above expression, or even better applying (3.11), we see that Brownian
motion has mean μ_t = 0 for all t > 0 and covariance

$$\mathrm{Cov}(B_t, B_s) = \mathrm{Cov}\big( B_s + (B_t - B_s),\, B_s \big) = E[B_s^2] + E[(B_t - B_s) B_s] = E[B_s^2] + E[B_t - B_s]\, E[B_s] = s .$$

Here we have assumed without loss of generality that s < t, and in the third identity we have used
the independence of the increments of Brownian motion. This shows that for general t, s ≥ 0 we have

$$\mathrm{Cov}(B_t, B_s) = \min\{t, s\} . \qquad (3.12)$$
In fact, because the mean and covariance structure of a Gaussian process completely determine
the properties of its marginals, if a Gaussian process has the same covariance and mean as a
Brownian motion then it is a Brownian motion.
Theorem 7.3. A Brownian motion is a Gaussian process with zero mean function, and covariance
function minpt, sq. Conversely, a Gaussian process with zero mean function, and covariance function
minpt, sq is a Brownian motion.
Proof. Example 7.2 proves the forward direction. To prove the reverse direction, assume that
X_t is a Gaussian process with zero mean and Cov(X_t, X_s) = min(t, s). Then the increments of the
process, given by (X_t, X_{t+s} − X_t), are jointly Gaussian random variables with mean 0. The variance of
the increment X_{t+s} − X_t is given by

$$\mathrm{Var}(X_{t+s} - X_t) = \mathrm{Cov}(X_{t+s}, X_{t+s}) - 2\, \mathrm{Cov}(X_t, X_{t+s}) + \mathrm{Cov}(X_t, X_t) = (t + s) - 2t + t = s .$$

The independence of X_t and X_{t+s} − X_t follows immediately from

$$\mathrm{Cov}(X_t, X_{t+s} - X_t) = \mathrm{Cov}(X_t, X_{t+s}) - \mathrm{Cov}(X_t, X_t) = t - t = 0 . \qquad \square$$
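As a quick numerical sanity check of (3.12) (the sampler below and its name are ours, not from the text), one can estimate Cov(B_s, B_t) by simulating B_s and the independent increment B_t − B_s:

```python
import random
import math

def sample_cov(s=0.3, t=0.7, n=200_000, seed=1):
    """Monte Carlo estimate of Cov(B_s, B_t) for standard Brownian motion."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        bs = rng.gauss(0.0, math.sqrt(s))           # B_s ~ N(0, s)
        bt = bs + rng.gauss(0.0, math.sqrt(t - s))  # add an independent increment
        acc += bs * bt
    return acc / n  # means are zero, so E[B_s B_t] is the covariance

print(round(sample_cov(), 3))  # should be close to min(0.3, 0.7) = 0.3
```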


Martingales. In order to introduce the concept of a martingale, we first adapt the concept
of σ-algebra to the framework of stochastic processes. In particular, in the case of a stochastic
process we would like to encode the idea of the history of a process: by observing a process up to a
time t > 0 we have all the information on the behavior of the process before that time but none
after it. Furthermore, as t increases, we increase the amount of information we have on that process.
This is the idea that underlies the concept of a filtration:
Definition 7.4. Given an indexing set T , a filtration of σ-algebras is a set of sigma algebras
tFt utPT such that for all t1 ă ¨ ¨ ¨ ă tm P T we have
Ft1 Ă ¨ ¨ ¨ Ă Ftm .
We now define in which sense a filtration contains the information associated to a certain process.
Definition 7.5. A stochastic process {X_t} is adapted to a filtration {F_t} if its marginals are
measurable with respect to the corresponding σ-algebras, i.e., if σ(X_t) ⊆ F_t for all t ∈ T. In this
case we say that the process is adapted to the filtration {F_t}.
We also extend the concept of σ-algebras generated by a random variable to the case of a
filtration. In this case, the filtration generated by a process tXt u is the smallest filtration containing
enough information about tXt u.
Definition 7.6. Let tXt u be a stochastic process on pΩ, F, Pq . Then the filtration tFtX u
generated by Xt is given by
FtX “ σpXs |0 ď s ď tq ,
which means the smallest σ-algebra with respect to which the random variable Xs is measurable, for
all s P r0, ts. Thus, FtX contains σpXs q for all 0 ď s ď t.
The canonical example of this is the filtration generated by a discrete random process (i.e.
T “ N):
Example 7.7 (Example 1.1 continued). The filtration generated by the N -coin flip process for
m ă N is
Fm :“ σpX0 , . . . , Xm q .
Intuitively, we will think of tFtX u as the history of tXt u up to time t, or the “information” about
tXt u up to time t. Roughly speaking, an event A is in tFtX u if its occurrence can be determined by
knowing tXs u for all s P r0, ts.
Example 7.8. Let B_t be a Brownian motion and consider the event

$$A = \Big\{ \omega \in \Omega : \max_{s \in (0, 1/2)} |B_s(\omega)| \le 2 \Big\} .$$

It is clear that we have A ∈ F_{1/2}, as the history of B_t up to time t = 1/2 determines whether A has
occurred or not. However, we have that A ∉ F_{1/3}, as the process may not yet have reached 2 at time
t = 1/3 but may do so before t = 1/2.
We now have all the tools to define the concept of a martingale:
Definition 7.9. tXt u is a Martingale with respect to a filtration Ft if for all t ą s we have
i) Xt is Ft -measurable ,
ii) Er|Xt |s ă 8 ,
iii) ErXt |Fs s “ Xs .

Condition iii) in Def. 7.9 involves a conditional expectation with respect to the σ-algebra F_s.
Recall that E[X_{t+s} | F_t] is an F_t-measurable random variable which approximates X_{t+s} in a certain
optimal way (and it is uniquely defined). Then Def. 7.9 states that, given the history of X_t up to
time t, our best estimate of X_{t+s} is simply X_t, the value of the process at the present time t. In a certain
sense, a martingale is the stochastic-calculus analogue of a constant function.
Example 7.10. Brownian motion is a martingale with respect to {F_t^B}. Indeed, we have that

$$E[B_{t+s} \mid \mathcal F_t^B] = E[B_t + (B_{t+s} - B_t) \mid \mathcal F_t^B] = E[B_t \mid \mathcal F_t^B] + E[B_{t+s} - B_t \mid \mathcal F_t^B] = B_t + 0 ,$$

by the independence-of-increments property.
The above strategy can be extended to general functions g, as the only property that was
used is the independence of the increments: because B_{t+s} − B_t is independent of F_t^B, we have

$$E\big[ g(B_{t+s} - B_t) \mid \mathcal F_t^B \big] = E\big[ g(B_{t+s} - B_t) \big] . \qquad (3.13)$$
Example 7.11. The process X_t := B_t² − t is a martingale with respect to {F_t^B}. Indeed, the process is
obviously measurable with respect to {F_t^B}, and E[|B_t²|] = t < ∞, verifying i) and ii) from
Def. 7.9. For iii) we have

$$E\big[ B_{t+s}^2 \mid \mathcal F_t^B \big] = E\big[ (B_t + B_{t+s} - B_t)^2 \mid \mathcal F_t^B \big] = E\big[ B_t^2 \mid \mathcal F_t^B \big] + 2 B_t\, E\big[ B_{t+s} - B_t \mid \mathcal F_t^B \big] + E\big[ (B_{t+s} - B_t)^2 \mid \mathcal F_t^B \big] = B_t^2 + s .$$

Subtracting t + s from both sides of the above equation, we obtain E[B_{t+s}² − (t + s) | F_t^B] = B_t² − t.
Example 7.12. The process Y_t := exp[λB_t − λ²t/2] is a martingale with respect to {F_t^B}. Again, the
process is obviously measurable with respect to {F_t^B} and, computing the moment generating function of a
Gaussian random variable, we have that E[exp(λB_t)] = exp(tλ²/2) < ∞, verifying i) and ii) from
Def. 7.9. For iii) we have

$$E\big[ e^{\lambda B_{t+s}} \mid \mathcal F_t^B \big] = E\big[ e^{\lambda (B_t + B_{t+s} - B_t)} \mid \mathcal F_t^B \big] = e^{\lambda B_t}\, E\big[ e^{\lambda (B_{t+s} - B_t)} \mid \mathcal F_t^B \big] = e^{\lambda B_t}\, E\big[ e^{\lambda B_s} \big] = e^{\lambda B_t}\, e^{s \lambda^2 / 2} .$$

Multiplying both sides of the above equation by exp(−(t + s)λ²/2), we obtain

$$E\big[ e^{\lambda B_{t+s} - \lambda^2 (t+s)/2} \mid \mathcal F_t^B \big] = e^{\lambda B_t - \lambda^2 t / 2} .$$

Markov processes. We now turn to the general idea of a Markov Process. As we have seen
in the example above, this family of processes has the “memoryless” property, i.e., their future
depends on their past (their history, their filtration) only through their present (their state at the
present time, or the σ-algebra generated by the random variable of the process at that time).
In the discrete time and countable sample space setting, this holds if given t1 ă ¨ ¨ ¨ ă tm ă t,
we have that the distribution of Xt given pXt1 , . . . , Xtm q equals the distribution of Xt given pXtm q.
This is the case if for all A ∈ B(X) and s_1, …, s_m ∈ X we have that

$$P[X_t \in A \mid X_{t_1} = s_1, \ldots, X_{t_m} = s_m] = P[X_t \in A \mid X_{t_m} = s_m] .$$
This property can be stated in more general terms as follows:
Definition 7.13. A random process tXt u is called Markov with respect to a filtration tFt u when
Xt is adapted to the filtration and, for any s ą t, Xs is independent of Ft given Xt .
The above definition can be restated in terms of Brownian motion as follows: for any Borel set
A ∈ B(R) we have

$$P[B_t \in A \mid \mathcal F_s^B] = P[B_t \in A \mid B_s] \quad \text{a.s.} \qquad (3.14)$$

Remember that conditional probabilities with respect to a σ-algebra are really random variables, in
the same way that a conditional expectation with respect to a σ-algebra is a random variable. That is,

$$P\big[ B_t \in A \mid \mathcal F_s^B \big] = E\big[ \mathbf 1_{B_t \in A}(\omega) \mid \mathcal F_s^B \big] \quad \text{and} \quad P[B_t \in A \mid B_s] = E\big[ \mathbf 1_{B_t \in A}(\omega) \mid \sigma(B_s) \big] .$$
Example 7.14. The fact that (3.14) holds can be shown directly by using characteristic functions.
Indeed, to show that the distributions on the right- and left-hand sides of (3.14) coincide, it is enough to
identify their characteristic functions. We compute

$$E\big[ e^{i \vartheta B_t} \mid \mathcal F_s^B \big] = E\big[ e^{i \vartheta (B_s + B_t - B_s)} \mid \mathcal F_s^B \big] = e^{i \vartheta B_s}\, E\big[ e^{i \vartheta (B_t - B_s)} \mid \mathcal F_s^B \big] = e^{i \vartheta B_s}\, E\big[ e^{i \vartheta (B_t - B_s)} \big] ,$$

and similarly

$$E\big[ e^{i \vartheta B_t} \mid B_s \big] = E\big[ e^{i \vartheta (B_s + B_t - B_s)} \mid B_s \big] = e^{i \vartheta B_s}\, E\big[ e^{i \vartheta (B_t - B_s)} \mid B_s \big] = e^{i \vartheta B_s}\, E\big[ e^{i \vartheta (B_t - B_s)} \big] .$$
Stopping times and the strong Markov property. We now introduce the concept of a stopping
time. As the name suggests, a stopping time is a time at which one can stop the process. The accent
in this sentence should be put on can, in the following sense: an observer watching the process as it
evolves, given instructions on when to stop, can stop the process based on their observations alone.
In other words, the observer does not need future information to know whether the event triggering
the stop of the process has occurred or not. We now define this concept formally:
Definition 7.15. For a measurable space (Ω, F) and a filtration {F_t} with F_t ⊆ F for all t ∈ T,
a random variable τ is a stopping time with respect to {F_t} if {ω ∈ Ω : τ ≤ t} ∈ F_t for all t ∈ T.
Classical examples of stopping times are hitting times, such as the one defined in the following
example.
Example 7.16. The random time τ₁ := inf{s > 0 : B_s ≥ 1} is a stopping time with respect to the natural
filtration {F_t^B} of Brownian motion. Indeed, at any time t we can know whether τ₁ ≤ t
by looking at the past history of B_t. However, the random time τ₀ := sup{s ∈ (0, t*) : B_s = 0} is
NOT a stopping time with respect to {F_t^B} for t < t*, as before time t* we cannot know for sure whether the
process will reach 0 again.
The strong Markov property introduced below is a generalization of the Markov property to
stopping times (as opposed to fixed times in Def. 7.13). More specifically, we say that a stochastic
process has the strong Markov property if its future after any stopping time depends on its past
only through the present (i.e., its state at the stopping time).
Definition 7.17. The stochastic process {X_t} has the strong Markov property if for every finite
stopping time τ one has

$$P[X_{\tau + s} \in A \mid \mathcal F_\tau] = P[X_{\tau + s} \in A \mid X_\tau] ,$$

where F_τ := {A ∈ F : {τ ≤ t} ∩ A ∈ F_t for all t > 0}.
In the above definition, the σ-algebra Fτ can be interpreted as “all the information we have on
the process up to time τ ”.
Theorem 7.18. Brownian motion has the strong Markov property.
We now use the above result to investigate some of the properties of Brownian motion:
Example 7.19. For any t > 0 define the maximum of Brownian motion on the interval [0, t]
as M_t := max_{s∈[0,t]} B_s. Similarly, for any m > 0 define the hitting time of m as τ_m := inf{s > 0 : B_s ≥ m}. Then we write

$$P[M_t \ge m] = P[\tau_m \le t] = P[\tau_m \le t,\, B_t \ge m] + P[\tau_m \le t,\, B_t < m] = P[\tau_m \le t,\, B_t - B_{\tau_m} \ge 0] + P[\tau_m \le t,\, B_t - B_{\tau_m} < 0] .$$

Using the strong Markov property of Brownian motion, we have that (B_{τ_m + s} − B_{τ_m})_{s≥0} is independent of
F_{τ_m} and is a Brownian motion. So by the symmetry of Brownian motion we have that

$$P[\tau_m \le t,\, B_t - B_{\tau_m} \ge 0] = P[\tau_m \le t,\, B_t - B_{\tau_m} \le 0] = P[B_t \ge m] ,$$

and we conclude that

$$P[M_t \ge m] = 2\, P[B_t \ge m] = \frac{2}{\sqrt{2\pi t}} \int_m^\infty e^{-\frac{y^2}{2t}}\, dy .$$
This argument is called the reflection principle for Brownian motion. From the above argument one
can also extract that
lim P rτm ă T s “ 1 ,
T Ñ8
i.e., that the hitting times of Brownian motion are almost surely finite.
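The reflection principle lends itself to a Monte Carlo check. The sketch below (function name ours) estimates P[M_1 ≥ 1] from discretized paths and compares with 2P[B_1 ≥ 1] = erfc(1/√2) ≈ 0.317; the discrete estimate undershoots slightly, because the grid can miss excursions above m between grid points.

```python
import random
import math

def hit_prob(m=1.0, T=1.0, steps=400, paths=10_000, seed=3):
    """Monte Carlo estimate of P[max_{s<=T} B_s >= m] from a discretized path."""
    rng = random.Random(seed)
    sd = math.sqrt(T / steps)  # standard deviation of one increment
    hits = 0
    for _ in range(paths):
        b = 0.0
        for _ in range(steps):
            b += rng.gauss(0.0, sd)
            if b >= m:  # path has reached the level m
                hits += 1
                break
    return hits / paths

# Reflection principle: P[M_t >= m] = 2 P[B_t >= m] = erfc(m / sqrt(2 t)).
print(round(hit_prob(), 3), round(math.erfc(1.0 / math.sqrt(2.0)), 3))
```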
From the above example we can also derive the following formula.
Example 7.20. We will compute the probability density ρ_{τ_m}(s) of the hitting time of level m by
Brownian motion, defined by P[τ_m ≤ t] = ∫₀ᵗ ρ_{τ_m}(s) ds. To do so we write

$$P[\tau_m \le t] = P[M_t \ge m] = 2\, P[B_t \ge m] = \sqrt{\frac{2}{\pi t}} \int_m^\infty e^{-\frac{y^2}{2t}}\, dy = \sqrt{\frac{2}{\pi}} \int_{m/\sqrt{t}}^\infty e^{-\frac{u^2}{2}}\, du , \qquad (3.15)$$

where in the last equality we made the change of variables u = y/√t. Now, differentiating (3.15) with respect to
t, we obtain by the Leibniz rule

$$\rho_{\tau_m}(t) = \frac{d}{dt} P[\tau_m \le t] = \frac{|m|}{\sqrt{2\pi}}\, t^{-3/2}\, e^{-\frac{m^2}{2t}} . \qquad (3.16)$$

We immediately see from (3.16) that

$$E[\tau_m] = \frac{|m|}{\sqrt{2\pi}} \int_0^\infty s^{-1/2}\, e^{-\frac{m^2}{2s}}\, ds = \infty .$$
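One can verify (3.16) against (3.15) numerically: integrating the density over (0, t] should reproduce P[τ_m ≤ t] = 2P[B_t ≥ m] = erfc(m/√(2t)). A small sketch (function names ours):

```python
import math

def rho(s, m=1.0):
    """Hitting-time density (3.16): |m|/sqrt(2 pi) * s^(-3/2) * exp(-m^2/(2s))."""
    return abs(m) / math.sqrt(2 * math.pi) * s ** -1.5 * math.exp(-m * m / (2 * s))

def cdf(t, m=1.0, n=200_000):
    """Midpoint-rule integral of rho over (0, t]."""
    h = t / n
    return sum(rho((k + 0.5) * h, m) for k in range(n)) * h

# Compare with P[tau_m <= t] = 2 P[B_t >= m] = erfc(m / sqrt(2 t)); both ~ 0.3173.
print(round(cdf(1.0), 4), round(math.erfc(1.0 / math.sqrt(2.0)), 4))
```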

Example 7.21. From the above computations we derive the distribution of the zeros of Brownian
motion in the interval [a, b] for 0 < a < b. We start by computing the desired quantity on the
interval [0, t] for an initial condition x, which we assume without loss of generality satisfies x < 0:

$$P[B_s = 0 \text{ for some } s \in [0, t] \mid B_0 = x] = P\Big[ \max_{s \in [0,t]} B_s \ge 0 \,\Big|\, B_0 = x \Big] = P\Big[ \max_{s \in [0,t]} B_s \ge -x \,\Big|\, B_0 = 0 \Big] = P[\tau_{-x} \le t] = \frac{|x|}{\sqrt{2\pi}} \int_0^t s^{-3/2}\, e^{-\frac{x^2}{2s}}\, ds . \qquad (3.17)$$

Since (by symmetry) the analogous expression holds for all x, we obtain the distribution of zeros in the interval [a, b] by
integrating (3.17) over all possible x, weighted by the probability of reaching x at time a:

$$P[B_s = 0 \text{ for some } s \in [a, b] \mid B_0 = 0] = \int_{-\infty}^\infty P[B_s = 0 \text{ for some } s \in [0, b - a] \mid B_0 = x]\, P[B_a \in dx] = \int_{-\infty}^\infty \Big( \frac{|x|}{\sqrt{2\pi}} \int_0^{b-a} s^{-3/2}\, e^{-\frac{x^2}{2s}}\, ds \Big) \frac{e^{-\frac{x^2}{2a}}}{\sqrt{2\pi a}}\, dx = \frac{2}{\pi} \arccos\Big( \sqrt{\frac{a}{b}} \Big) .$$

By taking the complement of the above we also obtain the probability that Brownian motion has no
zeros in the interval [a, b]:

$$P[B_s \ne 0 \ \forall s \in [a, b]] = 1 - P[B_s = 0 \text{ for some } s \in [a, b]] = \frac{2}{\pi} \arcsin\Big( \sqrt{\frac{a}{b}} \Big) .$$

The above result is referred to as the arcsine law for Brownian motion.
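The arcsine law can also be checked by simulation. The sketch below (function name ours) detects zeros exactly on each grid interval using the standard Brownian-bridge crossing probability exp(−2xy/Δt) for endpoints x, y of the same sign, so the only error is Monte Carlo noise; for a = 1/2, b = 1 the target value is (2/π) arcsin(√(1/2)) = 1/2.

```python
import random
import math

def no_zero_prob(a=0.5, b=1.0, steps=200, paths=20_000, seed=4):
    """P[B has no zero in [a, b]] via exact Brownian-bridge zero detection."""
    rng = random.Random(seed)
    dt = (b - a) / steps
    sd = math.sqrt(dt)
    survive = 0
    for _ in range(paths):
        x = rng.gauss(0.0, math.sqrt(a))  # B_a ~ N(0, a)
        hit = False
        for _ in range(steps):
            y = x + rng.gauss(0.0, sd)
            # The bridge surely hits 0 on a sign change, else with prob exp(-2xy/dt).
            if x * y <= 0 or rng.random() < math.exp(-2 * x * y / dt):
                hit = True
                break
            x = y
        survive += not hit
    return survive / paths

print(round(no_zero_prob(), 3), round(2 / math.pi * math.asin(math.sqrt(0.5)), 3))
```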

8. A glimpse of the connection with PDEs


The Gaussian

$$\rho(t, x) = \frac{e^{-x^2 / 2t}}{\sqrt{2\pi t}}$$

is the fundamental solution of the heat equation. By direct calculation one sees that if t > 0 then

$$\frac{\partial \rho}{\partial t} = \frac{1}{2} \frac{\partial^2 \rho}{\partial x^2} .$$

There is a small problem at zero, namely ρ blows up. However, for any "nice" function φ(x) (smooth with compact support),

$$\lim_{t \to 0} \int \rho(t, x)\, \varphi(x)\, dx = \varphi(0) .$$

This is the definition of the "delta function" δ(x). (If this is uncomfortable to you, look at [18].) Hence we see that
ρ(t, x) is the (weak) solution to

$$\frac{\partial \rho}{\partial t} = \frac{1}{2} \frac{\partial^2 \rho}{\partial x^2} , \qquad \rho(0, x) = \delta(x) .$$
To see the connection to probability, we set p(t, x, y) = ρ(t, x − y) and observe that for any function f we have

$$E\big\{ f(B_t) \mid B_0 = x \big\} = \int_{-\infty}^\infty f(y)\, p(t, x, y)\, dy .$$

We will write E_x f(B_t) for E{f(B_t) | B_0 = x}. Now notice that if u(t, x) = E_x f(B_t), then u(t, x) solves

$$\frac{\partial u}{\partial t} = \frac{1}{2} \frac{\partial^2 u}{\partial x^2} , \qquad u(0, x) = f(x) .$$

This is the Kolmogorov backward equation. We can also write it in terms of the transition density p(t, x, y):

$$\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2} , \qquad p(0, x, y) = \delta(x - y) .$$

From this we see why it is called the "backward" equation: it is a differential equation in the variable x, the
initial point of the transition density p(t, x, y). This begs a question. Yes, there is also a forward
equation, written in terms of the forward variable y:

$$\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial y^2} , \qquad p(0, x, y) = \delta(x - y) .$$

In this case it is identical to the backward equation; in general it will not be.
We make one last observation: the functions p(s, t, x, y) := p(t − s, x, y) = ρ(t − s, x − y) satisfy the Chapman–Kolmogorov
equation (the semigroup property). Namely, for any s < r < t and any x, y we have

$$p(s, t, x, y) = \int_{-\infty}^\infty p(s, r, x, z)\, p(r, t, z, y)\, dz .$$

[Figure: a path starting from x at time s, passing through an intermediate point z at time r, and ending at y at time t.]

This also suggests the following form for the Kolmogorov forward equation. If we write an equation for p(s, t, x, y)
evolving in s and y, then we get an equation with a final condition instead of an initial condition. Namely, for s ≤ t,

$$\frac{\partial p}{\partial s} = -\frac{1}{2} \frac{\partial^2 p}{\partial y^2} , \qquad p(t, t, x, y) = \delta(x - y) .$$

Hence, we are solving backwards in time.
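The claim that ρ solves the heat equation is easy to verify by finite differences (purely illustrative; the point (t, z) = (1, 0.7) and step h are arbitrary choices of ours):

```python
import math

def rho(t, z):
    """Heat kernel: rho(t, z) = exp(-z^2 / (2 t)) / sqrt(2 pi t)."""
    return math.exp(-z * z / (2 * t)) / math.sqrt(2 * math.pi * t)

# Check d rho/dt = (1/2) d^2 rho/dz^2 at (t, z) = (1, 0.7) by central differences.
t, z, h = 1.0, 0.7, 1e-4
lhs = (rho(t + h, z) - rho(t - h, z)) / (2 * h)
rhs = 0.5 * (rho(t, z + h) - 2 * rho(t, z) + rho(t, z - h)) / (h * h)
print(round(lhs, 6), round(rhs, 6))  # the two sides agree to discretization error
```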

CHAPTER 4

Itô Integrals

1. Properties of the Noise Suggested by Modeling


If we want to model a process X_t subject to random noise, we might think of writing
down a differential equation for X_t with a "noise" term regularly injecting randomness, like the
following:

$$\frac{dX_t}{dt} = f(X_t) + g(X_t) \cdot (\text{noise})_t .$$

The way we have written it suggests the "noise" is generic and is shaped to the specific state of the
system by the coefficient g(X_t).
It is instructive to write this in integral form over the interval [0, t]:

$$X_t = X_0 + \int_0^t f(X_s)\, ds + \int_0^t g(X_s) \cdot (\text{noise})_s\, ds . \qquad (4.1)$$

It is reasonable to take the "noise" term to be a pure noise, independent of the structure of X_t, and
leave the "shaping" of the noise to a particular setting to the function g(X_t). Since we want all
moments of time to be treated the same, it is reasonable to assume that the distribution of the "noise" is stationary
in time. We would also like the noise at one moment to be independent of the noise at any other
moment. Since in particular both of these properties should hold when the function g ≡ 1, we
consider that simplified case to gain insight.
Defining

$$V_t = \int_0^t (\text{noise})_s\, ds ,$$

stationarity translates to the distribution of V_{t+h} − V_t being independent of t. Independence
translates to

$$V_{t_1} - V_0,\ V_{t_2} - V_{t_1},\ \ldots,\ V_{t_{n+1}} - V_{t_n}$$

being a collection of mutually independent random variables for any collection of times
0 < t_1 < t_2 < ⋯ < t_{n+1}.
Rewriting (4.1) with g ≡ 1 produces

$$X_t = X_0 + \int_0^t f(X_s)\, ds + V_t .$$

We see that if we further decide that we would like to model processes X_t which are continuous in time,
we need to require that t ↦ V_t is almost surely a continuous process.
Clearly, from our informal definition, V_0 = 0. Collecting all of the properties we desire of V_t:
i) V_0 = 0
ii) Stationary increments
iii) Independent increments
iv) t ↦ V_t is almost surely continuous.
Comparing this list with Theorem 3.7, we see that V_t must be a Brownian motion. We will choose it
to be standard Brownian motion to fix a normalization.
Hence we are left to make sense of the integral equation

$$X_t = X_0 + \int_0^t f(X_s)\, ds + \int_0^t g(X_s)\, \frac{dB_s}{ds}\, ds .$$

Of course this leads to its own problems, since we saw in Section 5 that B_s is nowhere differentiable.
Formally canceling the "ds", maybe we can make sense of the integral

$$\int_0^t g(X_s)\, dB_s .$$

There is a well-established classical theory of integrals of this form, called "Riemann–Stieltjes"
integration. We will briefly sketch this theory in the next section. However, we will see that even
this theory is not applicable to the above integral. This will lead us to consider a new type of
integration theory designed explicitly for random functions like Brownian motion. This is named
the Itô integral after Kiyoshi Itô, who developed the modern version, though earlier versions exist
(notably in the work of Paley, Wiener and Zygmund).

2. Riemann–Stieltjes Integral
Before we try to understand how to integrate against Brownian motion, we recall the classical
Riemann–Stieltjes integration theory. Given two continuous functions f and g, we want to define

$$\int_0^T f(t)\, dg(t) . \qquad (4.2)$$

We begin by considering a piecewise constant function φ defined by

$$\varphi(t) = \begin{cases} a_0 & \text{for } t \in [t_0, t_1] \\ a_k & \text{for } t \in (t_k, t_{k+1}],\ k = 1, \ldots, n - 1 \end{cases} \qquad (4.3)$$

for some partition 0 = t_0 < t_1 < ⋯ < t_{n−1} < t_n = T and constants a_k ∈ R. For such a function φ it is intuitively clear that

$$\int_0^T \varphi(t)\, dg(t) = \sum_{k=0}^{n-1} \int_{t_k}^{t_{k+1}} \varphi(s)\, dg(s) = \sum_{k=0}^{n-1} a_k \int_{t_k}^{t_{k+1}} dg(s) = \sum_{k=0}^{n-1} a_k \big[ g(t_{k+1}) - g(t_k) \big] , \qquad (4.4)$$

because ∫_{t_k}^{t_{k+1}} dg(s) = g(t_{k+1}) − g(t_k) by the fundamental theorem of calculus (since ∫_{t_k}^{t_{k+1}} dg(s) = ∫_{t_k}^{t_{k+1}} g′(s) ds if g is differentiable).
The basic idea for defining (4.2) is to approximate f by a sequence of step functions {φ_n(t)}, each
of the form given in (4.3), so that

$$\sup_{t \in [0,T]} |f(t) - \varphi_n(t)| \to 0 \quad \text{as } n \to \infty . \qquad (4.5)$$

A natural choice of partition at the nth level is t_k^{(n)} = T k 2^{−n} for k = 0, …, 2ⁿ; we then define the
nth approximating function by

$$\varphi_n(t) = \begin{cases} f(T) & \text{if } t = T \\ f(t_k^{(n)}) & \text{if } t \in [t_k^{(n)}, t_{k+1}^{(n)}) \end{cases}$$

If f is continuous, it is easy to see that (4.5) holds.
We are then left to show that there exists a constant α so that

$$\Big| \int_0^T \varphi^{(n)}(t)\, dg(t) - \alpha \Big| \longrightarrow 0 \quad \text{as } n \to \infty .$$

We would then define the integral ∫₀ᵀ f(t) dg(t) to be equal to α. One of the keys to proving this
convergence is a uniform bound on the approximating integrals ∫₀ᵀ φ^{(n)}(t) dg(t). Observe that

$$\Big| \int_0^T \varphi^{(n)}(t)\, dg(t) \Big| \le \|f\|_\infty \sum_{k=0}^{n-1} |g(t_{k+1}) - g(t_k)| \le \|f\|_\infty\, V_1[g](0, T)$$

where

$$\|f\|_\infty = \sup_{t \in [0,T]} |f(t)| .$$

This uniform bound implies that the integrals ∫₀ᵀ φ^{(n)}(t) dg(t) stay in a compact subset of R. Hence there must
be a limit point α of the sequence and a subsequence which converges to it. It is then not hard to
show that this limit point is unique; that is to say, if any other subsequence converges, it must
also converge to α. We define the value of ∫₀ᵀ f(t) dg(t) to be α. Hence it is sufficient that g
have V_1[g](0, T) < ∞ when g and f are continuous. It can also be shown to be necessary for a
reasonable class of f.
It is further possible to show, using essentially the same calculations, that the limit α is independent
of the sequence of partitions, as long as the maximal spacing goes to zero, and independent of the
choice of point at which to evaluate the integrand f. In the above discussion we chose the left-hand
endpoint t_k of the interval [t_k, t_{k+1}]; however, we were free to choose any point in the interval.
While the compactness argument above is a standard path in mathematics, it is often more
satisfying to explicitly show that the {φ^{(n)}} form a Cauchy sequence, by showing that for any ε > 0
there exists an N so that if n, m > N then

$$\Big| \int_0^T \varphi^{(m)}(t)\, dg(t) - \int_0^T \varphi^{(n)}(t)\, dg(t) \Big| < \varepsilon .$$

Since φ^{(m)} − φ^{(n)} is again a step function of the form (4.3), the integral ∫₀ᵀ [φ^{(m)} − φ^{(n)}](t) dg(t) is
well defined and given by a sum of the form (4.4). Hence we have

$$\Big| \int_0^T \varphi^{(m)}(t)\, dg(t) - \int_0^T \varphi^{(n)}(t)\, dg(t) \Big| = \Big| \int_0^T \big[ \varphi^{(m)} - \varphi^{(n)} \big](t)\, dg(t) \Big| \le \big\| \varphi^{(m)} - \varphi^{(n)} \big\|_\infty\, V_1[g](0, T) .$$

Since f is continuous and the partition spacing goes to zero, it is not hard to see that the {φ^{(n)}}
form a Cauchy sequence under the ‖·‖_∞ norm, which completes the proof that the integrals of the step
functions form a Cauchy sequence.
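The construction above can be sketched numerically. For a smooth integrator g, the left-endpoint sums converge to ∫₀ᵀ f(t) g′(t) dt; below we take f(t) = t² and g(t) = sin t, for which ∫₀¹ t² cos t dt = 2 cos 1 − sin 1 (the function name is ours):

```python
import math

def rs_integral(f, g, T=1.0, n=100_000):
    """Left-endpoint Riemann-Stieltjes sum: sum of f(t_k) [g(t_{k+1}) - g(t_k)]."""
    h = T / n
    return sum(f(k * h) * (g((k + 1) * h) - g(k * h)) for k in range(n))

# For f(t) = t^2 and g(t) = sin t, int_0^1 f dg = int_0^1 t^2 cos t dt.
approx = rs_integral(lambda t: t * t, math.sin)
exact = 2 * math.cos(1.0) - math.sin(1.0)  # antiderivative evaluated at 1
print(round(approx, 6), round(exact, 6))
```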

3. A motivating example
We begin by considering the example

$$\int_0^T B_s\, dB_s \qquad (4.6)$$

where B is a standard Brownian motion. Since V₁[B](0, T) = ∞ almost surely, we cannot entirely
follow the prescription of the Riemann–Stieltjes integral given in Section 2. However, it still seems
reasonable to approximate the integrand B_s by a sequence of step functions of the form (4.3).
However, since B is random, the a_k from (4.3) will have to be random variables.
Fixing a partition 0 = t_0 < t_1 < ⋯ < t_N = T, we define two different step-function
approximations of B. For t ∈ [0, T], we define

$$\varphi^N(t) = B(t_k) \quad \text{if } t \in [t_k, t_{k+1}) , \qquad \hat\varphi^N(t) = B(t_{k+1}) \quad \text{if } t \in (t_k, t_{k+1}] .$$

Just as in the Riemann–Stieltjes setting (see (4.4)), for such step functions it is clear that one should
define the respective integrals in the following manner:

$$\int_0^T \varphi^N(t)\, dB_s = \sum_{k=0}^{N-1} B(t_k) \big[ B(t_{k+1}) - B(t_k) \big] , \qquad \int_0^T \hat\varphi^N(t)\, dB_s = \sum_{k=0}^{N-1} B(t_{k+1}) \big[ B(t_{k+1}) - B(t_k) \big] .$$
In the Riemann–Stieltjes setting, the two are the same. But in this case, the two have very different
properties, as the following calculation shows. Since B(t_k) and B(t_{k+1}) − B(t_k) are independent, we
have E[B(t_k)(B(t_{k+1}) − B(t_k))] = E[B(t_k)] E[B(t_{k+1}) − B(t_k)] = 0. So

$$E\Big[ \int_0^T \varphi^N\, dB_s \Big] = \sum_{k=0}^{N-1} E\big[ B(t_k) [B(t_{k+1}) - B(t_k)] \big] = \sum_{k=0}^{N-1} E[B(t_k)]\, E[B(t_{k+1}) - B(t_k)] = 0 ,$$

while, since E[(B(t_{k+1}) − B(t_k))²] = t_{k+1} − t_k, we have

$$E\Big[ \int_0^T \hat\varphi^N\, dB_s \Big] = \sum_{k=0}^{N-1} E\big[ B(t_k) [B(t_{k+1}) - B(t_k)] \big] + \sum_{k=0}^{N-1} E\big[ (B(t_{k+1}) - B(t_k))^2 \big] = 0 + \sum_{k=0}^{N-1} (t_{k+1} - t_k) = T .$$
Hence, how we construct our step functions will be important in our analysis. The choice of
endpoint used in φ^N(t) leads to what is called the Itô integral; the choice used in φ̂^N(t) is called
the Klimontovich integral; and if the midpoint is chosen, this leads to the Stratonovich integral.
The question of which to use is a modeling question and is dependent on the problem being studied.
We will see that it is possible to translate between all three in most cases. We will concentrate on
the Itô integral since it has some nice additional properties which make the analysis attractive.
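The two expectations computed above are easy to reproduce by simulation. A minimal sketch (function name ours) averaging the left- and right-endpoint sums over many discretized Brownian paths:

```python
import random
import math

def endpoint_sums(T=1.0, steps=200, paths=10_000, seed=5):
    """Monte Carlo means of the left- and right-endpoint sums for int_0^T B dB."""
    rng = random.Random(seed)
    sd = math.sqrt(T / steps)
    left = right = 0.0
    for _ in range(paths):
        b = 0.0
        for _ in range(steps):
            db = rng.gauss(0.0, sd)
            left += b * db          # Ito choice: B(t_k) [B(t_{k+1}) - B(t_k)]
            right += (b + db) * db  # right endpoint: B(t_{k+1}) [B(t_{k+1}) - B(t_k)]
            b += db
    return left / paths, right / paths

print(tuple(round(v, 3) for v in endpoint_sums()))  # means should be near (0, T) = (0, 1)
```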

4. Itô integrals for a simple class of step functions


Let pΩ, F, Pq be a probability space and Bt be a standard Brownian motion. Let Ft be a filtration
on the probability space to which Bt is adapted. For example, one could have Ft “ σpBs : s ď tq be
the filtration generated by the Brownian motion Bt .
Definition 4.1. φ(t, ω) is an elementary stochastic process if there exists a collection of bounded,
disjoint intervals {I_k} = {[t_0, t_1), [t_1, t_2), …, [t_{N−1}, t_N)} associated to a partition 0 ≤ t_0 < t_1 <
⋯ < t_N, and a collection of random variables {α_k : k = 0, …, N − 1}, so that

$$\varphi(t, \omega) = \sum_{k=0}^{N-1} \alpha_k(\omega)\, \mathbf 1_{I_k}(t) ,$$

the random variable α_k(ω) is measurable with respect to F_{t_k}, and E[|α_k(ω)|²] < ∞. We denote the
space of all elementary processes by S₂.
To be precise, stochastic integrals are defined on the class of progressively measurable processes, defined
below:
Definition 4.2. A stochastic process {X_t}_{t≥0} on (Ω, F, P, {F_t}) is called progressively measurable if, for
any t ≥ 0, the restriction of (s, ω) ↦ X_s(ω) to [0, t] × Ω is B_{[0,t]} × F_t-measurable, where B_{[0,t]} is the
Borel σ-algebra on [0, t].
Fact 1. Every adapted right continuous with left limits (“cadlag”) or left continuous with right limits
process is progressively measurable. The reason to assume progressive measurability is to ensure that the
expectation and the integral can be interchanged (by Fubini’s theorem).
Fact 2. Any progressively measurable process X = {X_t}_{t∈[0,T]} can be approximated by a sequence of
simple processes Xⁿ = {Xⁿ_t}_{t∈[0,T]} ∈ S₂ in the L²-sense, that is

$$E\Big[ \int_0^T |X_t^n - X_t|^2\, dt \Big] = \int_0^T E\big[ |X_t^n - X_t|^2 \big]\, dt \to 0 \quad \text{as } n \to \infty .$$

The proof of this approximation can be found in [14].


Because of the above result, we will be able to extend the notion of integral to adapted processes, and we
restrict our attention to such processes for the rest of the chapter.
ş8
Next, we define a functional I which will be our integral operator. That is to say Ipφq “ 0 φ dB.
Just as for Riemann-Stieltjes integral, if φ is an elementary stochastic process, it is relatively clear
what we should mean by Ipφq, namely
Definition 4.3. The stochastic integral operator of an elementary stochastic process is given by
ÿ
Ipφq :“ αk rBptk`1 q ´ Bptk qs .
k

We first observe that I satisfies one of the standard properties of an integral in that it is a linear
functional. In other words, if λ ∈ R and φ and ψ are elementary stochastic processes, then

$$I(\lambda \varphi) = \lambda I(\varphi) \quad \text{and} \quad I(\psi + \varphi) = I(\psi) + I(\varphi) . \qquad (4.7)$$

Thanks to our requirement that α_k be measurable with respect to the filtration at the
left endpoint of the interval [t_k, t_{k+1}), we have the following properties, which will play a central role
in what follows and should be compared to the calculations in Section 3.
Lemma 4.4. If φ is an elementary stochastic process then

$$E[I(\varphi)] = 0 \quad \text{(mean zero)} , \qquad E\big[ I(\varphi)^2 \big] = \int_0^\infty E\big[ |\varphi(t)|^2 \big]\, dt \quad \text{(Itô isometry)} .$$
Remark 4.5. An isometry is a map between two spaces which preserves distance (i.e. the norm).
If we consider

$$I(S_2) = \{ I(\varphi) : \varphi \in S_2 \} ,$$

then according to Lemma 4.4 the map φ ↦ I(φ) is an isometry between the space of elementary
stochastic processes L²(S₂, P[dω] × dt), equipped with the norm

$$\|\varphi\| = \Big( \int_0^\infty E\big[ \varphi^2(t, \omega) \big]\, dt \Big)^{1/2} ,$$

and the space of random variables

$$L^2\big( I(S_2), P \big) = \big\{ X \in I(S_2) : \|X\| = \sqrt{E[X^2]} < \infty \big\} .$$
Proof of Lemma 4.4. We begin by showing that I(φ) has mean zero:

$$E[I(\varphi)] = \sum_k E\big[ \alpha_k (B(t_{k+1}) - B(t_k)) \big] = \sum_k E\Big[ E\big[ \alpha_k (B(t_{k+1}) - B(t_k)) \mid \mathcal F_{t_k} \big] \Big] = \sum_k E\Big[ \alpha_k\, E\big[ B(t_{k+1}) - B(t_k) \mid \mathcal F_{t_k} \big] \Big] = 0 .$$

Turning to the Itô isometry,

$$E\big[ I(\varphi)^2 \big] = E\Big[ \Big( \sum_k \alpha_k (B(t_{k+1}) - B(t_k)) \Big) \Big( \sum_j \alpha_j (B(t_{j+1}) - B(t_j)) \Big) \Big] = \sum_{j,k} E\big[ \alpha_k \alpha_j [B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)] \big]$$

$$= 2 \sum_{j < k} E\big[ \alpha_k \alpha_j [B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)] \big] + \sum_k E\big[ \alpha_k^2 [B(t_{k+1}) - B(t_k)]^2 \big] .$$

Next, we examine each component separately. For j < k (so that t_{j+1} ≤ t_k),

$$E\big[ \alpha_k \alpha_j [B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)] \big] = E\Big[ E\big[ \alpha_k \alpha_j [B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)] \mid \mathcal F_{t_k} \big] \Big] = E\Big[ \alpha_k \alpha_j [B(t_{j+1}) - B(t_j)]\, E\big[ B(t_{k+1}) - B(t_k) \mid \mathcal F_{t_k} \big] \Big] = 0$$

since E[B(t_{k+1}) − B(t_k) | F_{t_k}] = 0. Similarly,

$$\sum_k E\big[ \alpha_k^2 [B(t_{k+1}) - B(t_k)]^2 \big] = \sum_k E\Big[ E\big[ \alpha_k^2 [B(t_{k+1}) - B(t_k)]^2 \mid \mathcal F_{t_k} \big] \Big] = \sum_k E\big[ \alpha_k^2 \big] (t_{k+1} - t_k) .$$

Hence we have

$$E\big[ I(\varphi)^2 \big] = 0 + \sum_k E[\alpha_k^2]\, (t_{k+1} - t_k) = \int E\big[ \varphi^2(s) \big]\, ds . \qquad \square$$

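Both properties of Lemma 4.4 are easy to probe numerically. The following Monte Carlo sketch is an illustration, not part of the construction: it uses the hypothetical choice $\alpha_k = \sin(B(t_k))$, which is $\mathcal{F}_{t_k}$-measurable as the definition of $S_2$ requires, and checks that $I(\varphi)$ has mean zero and satisfies the Itô isometry up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic partition 0 = t_0 < ... < t_4 and many sample paths.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
n_paths = 200_000

dt = np.diff(t)                                             # interval lengths t_{k+1} - t_k
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, len(dt)))  # Brownian increments
B = np.cumsum(dB, axis=1)                                   # B(t_1), ..., B(t_4)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])     # B(t_k) at the left endpoints

alpha = np.sin(B_left)               # alpha_k = sin(B(t_k)): adapted, bounded (illustration only)
I_phi = np.sum(alpha * dB, axis=1)   # I(phi) = sum_k alpha_k [B(t_{k+1}) - B(t_k)]

mean = I_phi.mean()                            # should be near 0
lhs = np.mean(I_phi**2)                        # E[I(phi)^2]
rhs = np.sum(np.mean(alpha**2, axis=0) * dt)   # sum_k E[alpha_k^2] (t_{k+1} - t_k)
print(mean, lhs, rhs)
```

With $2 \times 10^5$ paths the two sides of the isometry typically agree to about two decimal places.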
So far we have just defined the Itô integral on the whole positive half line $[0, \infty)$. For any $0 \le s < t \le \infty$, we make the following definition:
$$ \int_s^t \varphi_r \, dB_r = I\big( \varphi 1_{[s,t)} \big) . $$
We can now talk about the stochastic process
$$ M_t = I\big( \varphi 1_{[0,t)} \big) = \int_0^t \varphi_s \, dB_s \tag{4.8} $$
associated to a given elementary stochastic process $\varphi_t$, where the last two expressions are just different notation for the same object.
We now state a few simple consequences of our definitions.
Lemma 4.6. Let $\varphi \in S_2$ and $0 < s < t$. Then
$$ \int_0^t \varphi_r \, dB_r = \int_0^s \varphi_r \, dB_r + \int_s^t \varphi_r \, dB_r $$
and
$$ M_t = \int_0^t \varphi_s \, dB_s $$
is measurable with respect to $\mathcal{F}_t$.

Proof of Lemma 4.6. Clearly $M_t$ is measurable with respect to $\mathcal{F}_t$, since the $\{\varphi_s : s \le t\}$ are by assumption and the construction of the integral only uses the information from $\{B_s : s \le t\}$. The first property follows from $\varphi 1_{[0,t)} = \varphi 1_{[0,s)} + \varphi 1_{[s,t)}$, and hence
$$ I\big( \varphi 1_{[0,t)} \big) = I\big( \varphi 1_{[0,s)} \big) + I\big( \varphi 1_{[s,t)} \big) $$
by (4.7). $\square$
Lemma 4.7. Let $M_t$ be as in (4.8) for an elementary process $\varphi$ and a Brownian motion $B_t$, both adapted to a filtration $\{\mathcal{F}_t : t \ge 0\}$. Then $M_t$ is a martingale with respect to the filtration $\mathcal{F}_t$.

Proof of Lemma 4.7. Looking at Definition 7.9, there are three conditions we need to verify. The measurability is contained in Lemma 4.6. The fact that $\mathbb{E}[M_t^2] < \infty$ follows from the Itô isometry, since
$$ \mathbb{E}[M_t^2] = \int_0^t \mathbb{E}|\varphi_s|^2 \, ds \le \int_0^\infty \mathbb{E}|\varphi_s|^2 \, ds, $$
because the last integral is assumed to be finite in the definition of an elementary stochastic process. All that remains is to verify that for $s < t$,
$$ \mathbb{E}\big[ M_t(\varphi) - M_s(\varphi) \mid \mathcal{F}_s \big] = 0 . $$
There are a few cases. Taking one case, say $s$ and $t$ lie in the disjoint intervals $[t_k, t_{k+1})$ and $[t_j, t_{j+1})$ respectively, we have
$$ M_t(\varphi) - M_s(\varphi) = \alpha_k \big[ B(t_{k+1}) - B_s \big] + \Big( \sum_{n=k+1}^{j-1} \alpha_n \big[ B(t_{n+1}) - B(t_n) \big] \Big) + \alpha_j \big[ B_t - B(t_j) \big] . $$
Next, take repeated conditional expectations with respect to the filtrations $\{\mathcal{F}_a\}$, where $a \in \{t_j, t_{j-1}, \dots, t_{k+1}, s\}$. For each such term this gives
$$ \mathbb{E}\big[ \alpha_a (B(t_{a+1}) - B(t_a)) \mid \mathcal{F}_a \big] = \alpha_a \, \mathbb{E}\big[ B(t_{a+1}) - B(t_a) \mid \mathcal{F}_a \big] = 0 . $$
Hence $\mathbb{E}[M_t(\varphi) - M_s(\varphi) \mid \mathcal{F}_s] = 0$. The other cases can be handled similarly, and the conclusion follows. $\square$
Lemma 4.8. In the same setting as Lemma 4.7, $M_t$ is a continuous stochastic process. (That is to say, with probability one the map $t \mapsto M_t$ is continuous for all $t \in [0, \infty)$.)

Proof of Lemma 4.8. We begin by noticing that if $\varphi(t, \omega)$ is a simple process with
$$ \varphi(t, \omega) = \sum_{k=1}^{N-1} \alpha_k(\omega) 1_{[t_k, t_{k+1})}(t), $$
then for $t \in (t_{k^*}, t_{k^*+1})$,
$$ M_t(\varphi, \omega) = \sum_{k=1}^{k^*-1} \alpha_k(\omega) \big[ B(t_{k+1}, \omega) - B(t_k, \omega) \big] + \alpha_{k^*}(\omega) \big[ B(t, \omega) - B(t_{k^*}, \omega) \big] . $$
Hence it is clear that $M_t(\varphi, \omega)$ is continuous if $\varphi$ is a simple function, since the Brownian motion $B_t$ is continuous. $\square$

5. Extension to the Closure of Elementary Processes


We will denote by $\bar S_2$ the closure in $L^2(\mathbb{P} \times dt)$ of the square integrable elementary processes $S_2$. Namely,
$$ \bar S_2 = \Big\{ \text{stochastic processes } f(t, \omega) : \text{there exists a sequence } \varphi_n(t, \omega) \in S_2 \text{ with } \int_0^\infty \mathbb{E}\big( f(t) - \varphi_n(t) \big)^2 \, dt \to 0 \text{ as } n \to \infty \Big\} . $$
Also recall that we define
$$ L^2(\Omega, \mathbb{P}) := \Big\{ f : \Omega \to \mathbb{R} : \int f(\omega)^2 \, \mathbb{P}[d\omega] < \infty \Big\} = \big\{ \text{random variables } X \text{ with } \mathbb{E}(X^2) < \infty \big\} $$
and
$$ L^2\big( \Omega \times [0, \infty), \mathbb{P} \times dt \big) := \Big\{ f : \Omega \times [0, \infty) \to \mathbb{R} : \int_0^\infty \!\! \int_\Omega f(t, \omega)^2 \, \mathbb{P}[d\omega] \, dt < \infty \Big\} = \Big\{ \text{stochastic processes } X_t \text{ with } \int_0^\infty \mathbb{E}(X_t^2) \, dt < \infty \Big\} . $$
In the interest of brevity, we will write $L^2(\Omega)$ for the first and $L^2(\Omega \times [0, \infty))$ for the second space above. We will occasionally write $\|X\|_{L^2(\Omega)}$ for $\sqrt{\mathbb{E}(X^2)}$ and $\|X\|_{L^2(\Omega \times [0,\infty))}$ for $\big( \int_0^\infty \mathbb{E}(X_t^2) \, dt \big)^{1/2}$, though we will simply write $\|\cdot\|_{L^2}$ when the context is clear.
Recalling that our definitions of $S_2$, and hence $\bar S_2$, had a filtration $\mathcal{F}_t$ in the background, we define $L^2_{ad}(\Omega \times [0, \infty))$ to be the subset of $L^2(\Omega \times [0, \infty))$ in which all of the stochastic processes are adapted to the filtration $\mathcal{F}_t$. We have the following simple observation.

Lemma 5.1. The closure of the space of elementary processes is contained in the space of square integrable, adapted processes, i.e., $\bar S_2 \subset L^2_{ad}(\Omega \times [0, \infty))$.
Proof. The adaptedness follows from the adaptedness of $S_2$. The fact that elements in $\bar S_2$ are square integrable follows from the following calculation. Fix an $f \in \bar S_2$ and a sequence $\varphi_n \in S_2$ with $\|\varphi_n - f\| \to 0$ as $n \to \infty$, and fix an $n$ so that $\|\varphi_n - f\| \le 1$. Then, since for any real numbers $a$ and $b$ one has $|a + b|^2 \le 2|a|^2 + 2|b|^2$, we see that
$$ \|f\|_{L^2}^2 = \|\varphi_n + (f - \varphi_n)\|_{L^2}^2 \le 2 \|\varphi_n\|_{L^2}^2 + 2 \|f - \varphi_n\|_{L^2}^2 \le 2 \big( \|\varphi_n\|_{L^2}^2 + 1 \big) < \infty, $$
where the term on the far right is finite since for every $n$, $\varphi_n \in S_2$ (observe that $\|\varphi_n\|_{L^2}^2 = \sum_k \mathbb{E}(\alpha_k^2)(t_{k+1} - t_k)$). $\square$
In fact, $\bar S_2$ is exactly $L^2_{ad}(\Omega \times [0, \infty))$:

Theorem 5.2. $\bar S_2 = L^2_{ad}(\Omega \times [0, \infty))$.

Proof. The inclusion $\bar S_2 \subset L^2_{ad}(\Omega \times [0, \infty))$ was just proven in Lemma 5.1. For the other direction, see the proof of [14, Theorem 3.1.5] or [11]. $\square$
We are now ready to state the main theorem of this section. This result shows that given a sequence of elementary stochastic processes $\{\varphi_n\} \subset S_2$ converging in $L^2(\Omega \times [0, \infty))$ to a process $f \in \bar S_2$, there exists a random variable $X \in L^2(\Omega)$ to which the stochastic integrals $I(\varphi_n)$ converge in $L^2(\Omega)$. We will then define this variable $X$ to be the stochastic integral $I(f) = \int_0^\infty f_s \, dB_s$. The construction is summarized in the following scheme: $\{\varphi_n(t)\} \in S_2$ maps under $I(\cdot)$ to $\int \varphi_n \, dB_s \in I(S_2)$; as
$$ \int_0^\infty \mathbb{E}\big[ (\varphi_n - f)^2 \big] \, dt \to 0 \quad (n \to \infty), \qquad \text{correspondingly} \qquad \mathbb{E}\big[ (I(\varphi_n) - X)^2 \big] \to 0 \quad (n \to \infty), $$
and the limit $X \in L^2(\Omega)$ is declared to be $I(f) =: \int f \, dB_s$ for $\{f(t)\} \in \bar S_2$.

Theorem 5.3. For every $f \in \bar S_2$ there exists a random variable $X \in L^2(\Omega)$ such that if $\{\varphi_n : n = 1, \dots, \infty\}$ is a sequence of elementary stochastic processes (i.e. elements of $S_2$) converging to the stochastic process $f$ in $L^2(\Omega \times [0, \infty))$, then $I(\varphi_n)$ converges to $X$ in $L^2(\Omega)$.

Definition 5.4. For any $f \in \bar S_2$, we define the Itô integral
$$ I(f) = \int_0^\infty f_t \, dB_t $$
to be the random variable $X \in L^2(\Omega)$ given by Theorem 5.3.


Proof of Theorem 5.3. We start by showing that $I(\varphi_n)$ is a Cauchy sequence in $L^2(\Omega)$. By the linearity of the map $I$ (see (4.7)), $I(\varphi_n) - I(\varphi_m) = I(\varphi_n - \varphi_m)$. Hence, by the Itô isometry for elementary stochastic processes (see Lemma 4.4), we have
$$ \mathbb{E}\big[ (I(\varphi_n) - I(\varphi_m))^2 \big] = \mathbb{E}\big[ I(\varphi_n - \varphi_m)^2 \big] = \int_0^\infty \mathbb{E}\big[ (\varphi_n(t) - \varphi_m(t))^2 \big] \, dt . \tag{4.9} $$
Since the sequence $\varphi_n$ converges to $f$ in $L^2(\Omega \times [0, \infty))$, we know that it is a Cauchy sequence in $L^2(\Omega \times [0, \infty))$. By the above calculation we hence have that $\{I(\varphi_n)\}$ is a Cauchy sequence in $L^2(\Omega)$. It is a classical fact from real analysis that this space is complete, which implies that every Cauchy sequence converges to a point in the same space. Let $X \in L^2(\Omega)$ denote the limit.
To see that $X$ does not depend on the sequence $\{\varphi_n\}$, let $\{\tilde\varphi_n\}$ be another sequence converging to $f$. The same reasoning as above ensures the existence of an $\tilde X \in L^2(\Omega)$ so that $I(\tilde\varphi_n) \to \tilde X$ in $L^2(\Omega)$. On the other hand,
$$ \mathbb{E}\big[ (I(\varphi_n) - I(\tilde\varphi_n))^2 \big] = \mathbb{E}\big[ I(\varphi_n - \tilde\varphi_n)^2 \big] = \int_0^\infty \mathbb{E}\big[ (\varphi_n(t) - \tilde\varphi_n(t))^2 \big] \, dt \le 2 \int_0^\infty \mathbb{E}\big[ (\varphi_n(t) - f(t))^2 \big] \, dt + 2 \int_0^\infty \mathbb{E}\big[ (f(t) - \tilde\varphi_n(t))^2 \big] \, dt, $$
where in the inequality we have again used the fact that $(\varphi_n - \tilde\varphi_n)^2 = [(\varphi_n - f) + (f - \tilde\varphi_n)]^2 \le 2 (\varphi_n - f)^2 + 2 (f - \tilde\varphi_n)^2$. Since both $\varphi_n$ and $\tilde\varphi_n$ converge to $f$ in $L^2(\Omega \times [0, \infty))$, the last two terms on the right-hand side go to $0$. This in turn implies that $\mathbb{E}(X - \tilde X)^2 = 0$ and that the two random variables are the same. $\square$
Remark 5.5. While the above construction might seem like magic since it is so soft, it is an application of a general principle in mathematics: if one can define a linear map on a dense subset of elements in a space in such a way that the map is an isometry, then the map can be uniquely extended to the whole space. This approach is beautifully presented in our current context in [10] using the following lemma.

Lemma 5.6 (Extension Theorem). Let $B_1$ and $B_2$ be two Banach spaces and let $B_0 \subset B_1$ be a linear subspace. If $L : B_0 \to B_2$ is defined for all $b \in B_0$ and $|Lb|_{B_2} = |b|_{B_1}$ for all $b \in B_0$, then there exists a unique extension of $L$ to $\bar B_0$ (the closure of $B_0$), called $\bar L$, with $\bar L b = L b$ for all $b \in B_0$.
Example 5.7. We use the above theorem to show that
$$ I_T(B) = \int_0^T B_s \, dB_s = \frac12 \big( B_T^2 - T \big) . $$
To do so, we show that the sequence $\varphi_n = \sum_{j=1}^N 1_{[t_j^{(n)}, t_{j+1}^{(n)})} B_{t_j^{(n)}}$, with $\Delta t_j^n = t_{j+1}^{(n)} - t_j^{(n)} \to 0$, converges in $L^2(\Omega \times [0, T))$ to $\{B_t\}$. Indeed we have that
$$ \mathbb{E}\Big[ \int_0^T |\varphi_n - B_s|^2 \, ds \Big] = \sum_{j=1}^N \int_{t_j^{(n)}}^{t_{j+1}^{(n)}} \mathbb{E}\big[ (B_{t_j^{(n)}} - B_s)^2 \big] \, ds = \sum_{j=1}^N \int_{t_j^{(n)}}^{t_{j+1}^{(n)}} \big( s - t_j^{(n)} \big) \, ds = \frac12 \sum_{j=1}^N \big( t_{j+1}^{(n)} - t_j^{(n)} \big)^2 \to 0 . $$
Then, by Theorem 5.3 we have that
$$ \int_0^T B_s \, dB_s = \lim_{n \to \infty} I(\varphi_n) = \lim_{n \to \infty} \sum_j B_{t_j^{(n)}} \Delta B_j^n . $$
Now, writing $\Delta B_j^n := B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}$, we have
$$ \Delta(B_j^2) := B^2_{t_{j+1}^{(n)}} - B^2_{t_j^{(n)}} = \big( B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}} \big)^2 + 2 B_{t_j^{(n)}} \big( B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}} \big) = \big( \Delta B_j^n \big)^2 + 2 B_{t_j^{(n)}} \Delta B_j^n, $$
and therefore
$$ \sum_j B_{t_j^{(n)}} \Delta B_j^n = \frac12 \Big( \sum_j \Delta(B_j^2) - \sum_j \big( \Delta B_j^n \big)^2 \Big) = \frac12 \Big( B_T^2 - \sum_j \big( \Delta B_j^n \big)^2 \Big) . $$
The term on the right-hand side converges to $(B_T^2 - T)/2$ in $L^2(\Omega)$, which therefore equals $\int_0^T B_s \, dB_s$.
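The convergence in Example 5.7 can be watched happen in simulation: left-endpoint Riemann–Itô sums against the Brownian increments approach $(B_T^2 - T)/2$ in $L^2(\Omega)$, with mean-square error of order $T^2/(2n)$ for an $n$-point uniform mesh. A minimal sketch (the parameters are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

T, n_steps, n_paths = 1.0, 1000, 10_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])   # left-endpoint evaluation

ito_sum = np.sum(B_left * dB, axis=1)     # sum_j B_{t_j} (B_{t_{j+1}} - B_{t_j})
closed_form = 0.5 * (B[:, -1]**2 - T)     # (B_T^2 - T)/2

l2_err = np.mean((ito_sum - closed_form)**2)   # ~ T^2 / (2 n_steps)
print(l2_err)
```

Halving the mesh roughly halves `l2_err`, consistent with the $\frac12 \sum (\Delta t_j^n)^2$ computation above.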

6. Properties of Itô integrals


Proposition 6.1. Let $f, g \in L^2_{ad}(\Omega \times [0, \infty))$ and $\lambda \in \mathbb{R}$. Then:
i) Linearity:
$$ \int_0^\infty (\lambda f_s + g_s) \, dB_s = \lambda \int_0^\infty f_s \, dB_s + \int_0^\infty g_s \, dB_s , \tag{4.10} $$
ii) Separability: for all $S > 0$,
$$ \int_0^\infty f_s \, dB_s = \int_0^S f_s \, dB_s + \int_S^\infty f_s \, dB_s , $$
iii) Mean zero:
$$ \mathbb{E}\Big[ \int_0^\infty f_s \, dB_s \Big] = 0 , \tag{4.11} $$
iv) Itô isometry:
$$ \mathbb{E}\Big[ \Big( \int_0^\infty f_s \, dB_s \Big)^2 \Big] = \int_0^\infty \mathbb{E}\big[ f_s^2 \big] \, ds . \tag{4.12} $$

Proof. For the proof of points i) and ii) we refer to [9].
To prove iv), for $f \in L^2_{ad}(\Omega \times [0, \infty))$ let $\{\varphi_n\} \subset S_2$ be a sequence of elementary stochastic processes converging to $f$ in $L^2_{ad}(\Omega \times [0, \infty))$. By Theorem 5.3, there exists $X \in L^2(\Omega)$ with $\mathbb{E}\big[ (X - I(\varphi_n))^2 \big] \to 0$ as $n \to \infty$. Since $\mathbb{E}[X^2], \mathbb{E}[I(\varphi_n)^2] < \infty$, using the Cauchy–Schwarz (or Hölder) inequality we write
$$ \big| \mathbb{E}\big[ I(\varphi_n)^2 \big] - \mathbb{E}\big[ X^2 \big] \big| = \big| \mathbb{E}\big[ (X - I(\varphi_n))(X + I(\varphi_n)) \big] \big| \le \big| \mathbb{E}\big[ (X - I(\varphi_n)) I(\varphi_n) \big] \big| + \big| \mathbb{E}\big[ (X - I(\varphi_n)) X \big] \big| \le \Big( \sqrt{\mathbb{E}[X^2]} + \sqrt{\mathbb{E}[I(\varphi_n)^2]} \Big) \sqrt{\mathbb{E}\big[ (X - I(\varphi_n))^2 \big]} , $$
with the last factor $\sqrt{\mathbb{E}[(X - I(\varphi_n))^2]} = \|X - I(\varphi_n)\|_{L^2(\Omega)} \to 0$ as $n \to \infty$. This implies that in the limit $n \to \infty$ we have
$$ \mathbb{E}\big[ I(\varphi_n)^2 \big] \to \mathbb{E}\big[ X^2 \big] . $$
At the same time,
$$ \mathbb{E}\big[ I(\varphi_n)^2 \big] = \int_0^\infty \mathbb{E}\big[ \varphi_n(t)^2 \big] \, dt \to \int_0^\infty \mathbb{E}\big[ f(t)^2 \big] \, dt . $$
Combining these facts produces
$$ \mathbb{E}(X^2) = \int_0^\infty \mathbb{E}\big[ f(t)^2 \big] \, dt , $$
as desired. The exact same logic produces iii), i.e. $\mathbb{E}X = 0$. $\square$

7. A continuous in time version of the Itô integral


In this section, we consider the following definition.

Definition 7.1. For any $f \in L^2_{ad}(\Omega \times [0, \infty))$ and any $t \ge 0$, we define the Itô integral process $\{I_t(f)\}$ as
$$ I_t(f) = \int_0^t f_s(\omega) \, dB_s(\omega) := I\big( f 1_{(0, t]} \big) . $$
Note that the process introduced above is well defined and adapted to $\mathcal{F}_t$. In fact, since
$$ \mathbb{E}\big[ I_t(f)^2 \big] = \int_0^t \mathbb{E}\big[ f_s^2 \big] \, ds \le \int_0^\infty \mathbb{E}\big[ f_s^2 \big] \, ds < \infty , $$
we see that $I_t(f)$ is an adapted stochastic process whose second moment is uniformly bounded in time. It is not immediately clear that we can fix the realization $\omega$ (which in turn fixes the realization of $f$ and $B$) and then change the time $t$ in $I_t(f)(\omega) = \int_0^t f_s(\omega) \, dB_s(\omega)$. We built the integral as a limit at each fixed time $t$; changing the time $t$ requires us to repeat the limiting procedure. This would be fine except that we built the Itô integral through an $L^2$-limit, so at each time it is only defined up to sets of probability zero. If we only wanted to define $I_t$ for some countable sequence of times this would still not be a problem, because the countable union of measure zero sets is still measure zero. However, we would like to define $I_t(f)$ for all $t \in [0, T]$. This is a problem, and more work is required.
Theorem 7.2. Let $f \in L^2_{ad}(\Omega \times [0, \infty))$. Then there exists a continuous version of $I_t(f)$, i.e., there exists a $t$-continuous stochastic process $\{J_t\}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ such that for all $t \ge 0$,
$$ \mathbb{P}\big[ J_t = I_t(f) \big] = 1 . $$
To prove this result, we state a very useful theorem which we will prove later.

Theorem 7.3 (Doob's Martingale Inequality). Let $M_t$ be a continuous-time martingale with respect to the filtration $\mathcal{F}_t$. Then
$$ \mathbb{P}\Big( \sup_{0 \le t \le T} |M_t| \ge \lambda \Big) \le \frac{\mathbb{E}\big[ |M_T|^p \big]}{\lambda^p} , $$
where $\lambda \in \mathbb{R}^+$ and $p \ge 1$.
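Before using the inequality, here is a quick Monte Carlo sanity check with $p = 2$ and the martingale $M_t = B_t$ (a sketch with arbitrary illustrative parameters; the discretized supremum slightly undershoots the true one, which only makes the inequality easier to satisfy):

```python
import numpy as np

rng = np.random.default_rng(2)

T, n_steps, n_paths, lam = 1.0, 500, 20_000, 1.5
dt = T / n_steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)

lhs = np.mean(np.max(np.abs(B), axis=1) >= lam)   # P( sup_{t <= T} |B_t| >= lam )
rhs = np.mean(B[:, -1]**2) / lam**2               # E[|B_T|^2] / lam^2
print(lhs, rhs)
```

For this choice of $\lambda$ the bound is loose but clearly satisfied; Doob's inequality trades sharpness for complete generality over martingales.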
Proof of Theorem 7.2. By Lemma 4.8 we know that $I_t(\varphi)$ is continuous in $t$ if $\varphi$ is a simple function. If we have a sequence of simple functions $\varphi_n$ converging to $f$ in $L^2$, we would like to "transfer" the continuity of the processes $I_t(\varphi_n)$ to the limit. To do so we use the following fact: "the uniform limit of continuous functions is continuous". In other words, if each $f_n$ is continuous and $\sup_{t \in [0,T]} |f_n(t) - f(t)| \to 0$ as $n \to \infty$, then $f$ is continuous.
To this end, let $I_t(\varphi_n) = \int_0^t \varphi_n(s) \, dB_s$ for a sequence $\{\varphi_n\}$ converging to $f$. By Doob's inequality (Theorem 7.3 with $p = 2$) and the Itô isometry,
$$ \mathbb{P}\Big[ \sup_{0 \le t \le T} |I_t(\varphi_n) - I_t(\varphi_m)| > \varepsilon \Big] \le \frac{1}{\varepsilon^2} \mathbb{E}\big[ |I_T(\varphi_n) - I_T(\varphi_m)|^2 \big] = \frac{1}{\varepsilon^2} \mathbb{E}\Big[ \int_0^T \big( \varphi_n(s) - \varphi_m(s) \big)^2 \, ds \Big] . $$
The last term goes to zero as $n, m \to \infty$. Hence we can find a subsequence $\{n_k\}$ so that
$$ \mathbb{P}\Big[ \sup_{0 \le t \le T} |I_t(\varphi_{n_{k+1}}) - I_t(\varphi_{n_k})| > \frac{1}{k^2} \Big] \le \frac{1}{2^k} . $$
If we set
$$ A_k = \Big\{ \sup_{0 \le t \le T} |I_t(\varphi_{n_{k+1}}) - I_t(\varphi_{n_k})| > \frac{1}{k^2} \Big\} , $$
then $\sum_k \mathbb{P}[A_k] \le \sum_k 2^{-k} < \infty$. Hence the Borel–Cantelli lemma tells us that there is a random $N(\omega)$ so that
$$ k > N(\omega) \implies \sup_{0 \le t \le T} |I_t(\varphi_{n_{k+1}}) - I_t(\varphi_{n_k})| \le \frac{1}{k^2} . $$
If we set $J_t^{(k)} = I_t(\varphi_{n_k})$, then the $\{J_t^{(k)}\}$ form a Cauchy sequence in the sup norm ($|f|_{\sup} = \sup_{t \in [0,T]} |f(t)|$). Since the convergence is uniform in $t$ and each $J_t^{(k)}$ is continuous, we know that for almost every $\omega$ the limit point $J_t = \lim_{k \to \infty} J_t^{(k)}$ is also continuous in $t$. Finally, since by construction we also have $J_t^{(k)} \to I_t(f) = \int_0^t f_s(\omega) \, dB_s(\omega)$ in $L^2$, we have that
$$ \int_0^t f_s(\omega) \, dB_s(\omega) = J_t(\omega) \quad \text{a.s.} , $$
as required. $\square$

8. An Extension of the Itô Integral


Up until now we have only considered the Itô integral for integrands $f$ such that $\mathbb{E} \int_0^T f_s^2 \, ds < \infty$. However, it is possible to make sense of $\int_0^T f_s(\omega) \, dB_s(\omega)$ if we only know that
$$ \mathbb{P}\Big[ \int_0^T |f_s(\omega)|^2 \, ds < \infty \Big] = 1 . $$
Most of the previous properties continue to hold. In particular, $\int_0^T f_s(\omega) \, dB_s(\omega)$ is a perfectly good random variable which is almost surely finite. However, it is not necessarily true that $\mathbb{E}\big[ \int_0^T f_s(\omega) \, dB_s(\omega) \big] = 0$, which in turn means that $\int_0^t f_s(\omega) \, dB_s(\omega)$ need not be a martingale. (In fact it is what is called a local martingale.) For the same reasons, the Itô isometry property (4.12) may not hold in this case.
Example 8.1.

9. Itô Processes
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be the canonical probability space and let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the Brownian motion $B_t(\omega)$.

Definition 9.1. $X_t(\omega)$ is an Itô process if there exist stochastic processes $f(t, \omega)$ and $\sigma(t, \omega)$ such that
i) $f(t, \omega)$ and $\sigma(t, \omega)$ are $\mathcal{F}_t$-measurable,
ii) $\int_0^t |f| \, ds < \infty$ and $\int_0^t |\sigma|^2 \, ds < \infty$ almost surely,
iii) $X_0(\omega)$ is $\mathcal{F}_0$-measurable,
iv) with probability one the following holds:
$$ X_t(\omega) = X_0(\omega) + \int_0^t f_s(\omega) \, ds + \int_0^t \sigma_s(\omega) \, dB_s(\omega) . \tag{4.13} $$
The processes $f(t, \omega)$ and $\sigma(t, \omega)$ are referred to as the drift and diffusion coefficients of $X_t$.
For brevity, one often writes (4.13) as
$$ dX_t(\omega) = f_t(\omega) \, dt + \sigma_t(\omega) \, dB_t(\omega) , $$
but this is just notation for the integral equation above!
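A standard way to produce approximate samples of (4.13) is the Euler–Maruyama scheme, which discretizes the integral equation one step at a time. The sketch below is illustrative only; the drift $f(x) = -x$ and constant diffusion $\sigma = 0.5$ (an Ornstein–Uhlenbeck-type process) are hypothetical choices, not taken from the notes.

```python
import numpy as np

def euler_maruyama(x0, f, sigma, T, n_steps, rng):
    """Simulate X_{k+1} = X_k + f(X_k) dt + sigma(X_k) dB over [0, T]."""
    dt = T / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x = x + f(x) * dt + sigma(x) * dB   # one discretized step of (4.13)
        path.append(x)
    return np.array(path)

rng = np.random.default_rng(3)
path = euler_maruyama(1.0, lambda x: -x, lambda x: 0.5, T=1.0, n_steps=1000, rng=rng)
print(path.shape)   # (1001,)
```

The scheme converges (in the strong sense) to the Itô process as the step size shrinks, precisely because the two integrals in (4.13) are limits of such sums.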

CHAPTER 5

Stochastic Calculus

This chapter introduces the fundamental tools for the computation of stochastic integrals. Indeed, similarly to what is done in classical calculus, stochastic integrals are rarely computed by applying the definition of the Itô integral from the previous chapter. In classical calculus, instead of applying the definition of the Riemann integral, one usually computes $\int f(x) \, dx$ by applying the fundamental theorem of calculus: choosing $F$ such that $\frac{d}{dx} F(x) = f(x)$, one has
$$ \int f(x) \, dx = \int \frac{d}{dx} F(x) \, dx = F(x) . \tag{5.1} $$
Even though, as we have seen in the previous chapter, differentiation in this framework is not possible, it is possible to obtain a similar result for Itô integrals. In the following we introduce such a formula (called the Itô formula), allowing for rapid computation of stochastic integrals.

1. Itô’s Formula for Brownian motion


We first introduce the Itô formula for the Brownian motion process.

Theorem 1.1. Let $f \in C^2(\mathbb{R})$ (the set of twice continuously differentiable functions on $\mathbb{R}$) and let $B_t$ be a standard Brownian motion. Then for any $t > 0$,
$$ f(B_t) = f(0) + \int_0^t f'(B_s) \, dB_s + \frac12 \int_0^t f''(B_s) \, ds . $$
To prove this theorem, we first state the following partial result, proven at the end of the section.

Lemma 1.2. Let $g$ be a continuous function and let $\Gamma_n := \{t_k^n : k = 0, \dots, N(n)\}$ be a sequence of partitions of $[0, t]$ with $t_0^n = 0$, $t_{N(n)}^n = t$ and
$$ |\Gamma_n| := \sup_i |t_{i+1}^n - t_i^n| \to 0 \quad \text{as } n \to \infty . \tag{5.2} $$
Then
$$ \sum_{k=0}^{N-1} g(\xi_k^n) \big( B_{t_{k+1}^n} - B_{t_k^n} \big)^2 \to \int_0^t g(B_s) \, ds $$
for any choice of $\xi_k^n$ between $B_{t_k^n}$ and $B_{t_{k+1}^n}$.
Proof of Theorem 1.1. Without loss of generality we can assume that $f$ and its first two derivatives are bounded. After establishing the result for such functions, we can approximate any $C^2$ function by such a sequence and pass to the limit to obtain the general result.
Let $\{t_k^n : k = 0, \dots, N(n)\}$ be a sequence of partitions of $[0, t]$ such that $t_0 = 0$, $t_N = t$ and
$$ |\Gamma_n| := \sup_i |t_{i+1}^n - t_i^n| \to 0 \quad \text{as } n \to \infty . $$
Now for any level $n$,
$$ f(B_t) - f(0) = \sum_{k=1}^{N(n)} \big( f(B_{t_k}) - f(B_{t_{k-1}}) \big) . \tag{5.3} $$
Taylor's Theorem implies
$$ f(B_{t_k}) - f(B_{t_{k-1}}) = f'(B_{t_{k-1}}) \big( B_{t_k} - B_{t_{k-1}} \big) + \frac12 f''(\xi_k) \big( B_{t_k} - B_{t_{k-1}} \big)^2 $$
for some $\xi_k$ between $B_{t_{k-1}}$ and $B_{t_k}$. Returning to (5.3), we have
$$ f(B_t) - f(0) = \sum_{k=1}^{N(n)} f'(B_{t_{k-1}}) \big( B_{t_k} - B_{t_{k-1}} \big) + \frac12 \sum_{k=1}^{N(n)} f''(\xi_k) \big( B_{t_k} - B_{t_{k-1}} \big)^2 . \tag{5.4} $$
By the construction of the Itô integral, the first term on the right-hand side of (5.4) satisfies
$$ \sum_{k=1}^{N(n)} f'(B_{t_{k-1}}) \big( B_{t_k} - B_{t_{k-1}} \big) \longrightarrow \int_0^t f'(B_s) \, dB_s $$
in $L^2$ as $n \to \infty$. Combining this with the result of Lemma 1.2 (proven below), applied with $g = f''$ to the second term on the right-hand side of (5.4), we conclude the proof. $\square$
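The statement of Theorem 1.1 can also be checked numerically. For the illustrative choice $f(x) = x^3$ (so $f(0) = 0$, $f'(x) = 3x^2$, $f''(x) = 6x$), the formula reads $B_t^3 = 3 \int_0^t B_s^2 \, dB_s + 3 \int_0^t B_s \, ds$; the sketch below compares both sides path by path on a discretized grid:

```python
import numpy as np

rng = np.random.default_rng(4)

T, n_steps, n_paths = 1.0, 1000, 5_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])   # left endpoints, as the Ito sum requires

stoch_int = np.sum(3.0 * B_left**2 * dB, axis=1)   # approximates   int_0^T f'(B) dB
correction = np.sum(3.0 * B_left * dt, axis=1)     # approximates (1/2) int_0^T f''(B) ds
lhs = B[:, -1]**3                                  # f(B_T)
rhs = stoch_int + correction

l2_err = np.mean((lhs - rhs)**2)   # shrinks as the mesh is refined
print(l2_err)
```

Without the correction term the two sides disagree by a quantity of order one, which is exactly the point of the lemma below.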
Proof of Lemma 1.2. We want to show that
$$ A_n := \sum_{k=1}^{N(n)} g(\xi_k) \big( B_{t_k} - B_{t_{k-1}} \big)^2 \longrightarrow \int_0^t g(B_s) \, ds $$
in probability as $n \to \infty$. We begin by showing that
$$ C_n := \sum_{k=1}^{N(n)} g(B_{t_{k-1}}) \big( B_{t_k} - B_{t_{k-1}} \big)^2 \longrightarrow \int_0^t g(B_s) \, ds \tag{5.5} $$
in probability as $n \to \infty$. First, since $g(B_s)$ is continuous, we have that
$$ D_n := \sum_{k=1}^{N(n)} g(B_{t_{k-1}}) (t_k - t_{k-1}) \longrightarrow \int_0^t g(B_s) \, ds $$
as $n \to \infty$. Therefore we obtain (5.5) by showing that $|C_n - D_n|$ converges to $0$ in $L^2(\Omega)$, as this directly implies convergence in probability. To that end, observe that
$$ \mathbb{E}\big[ (D_n - C_n)^2 \big] = \mathbb{E}\Big[ \sum_{k=1}^{N(n)} g(B_{t_{k-1}})^2 \big( \Delta_k t - \Delta_k^2 B \big)^2 \Big] + 2\, \mathbb{E}\Big[ \sum_{j<k} g(B_{t_{k-1}}) g(B_{t_{j-1}}) \big( \Delta_k t - \Delta_k^2 B \big)\big( \Delta_j t - \Delta_j^2 B \big) \Big] , \tag{5.6} $$
where $\Delta_k t := t_k - t_{k-1}$ and $\Delta_k^2 B := (B_{t_k} - B_{t_{k-1}})^2$. Now, since
$$ \mathbb{E}\big[ (\Delta_k t - \Delta_k^2 B)^2 \big] = (\Delta_k t)^2 - 2 (\Delta_k t)^2 + 3 (\Delta_k t)^2 = 2 (\Delta_k t)^2 , $$
considering the first term in (5.6) we have
$$ \mathbb{E}\Big[ \sum_{k=1}^{N(n)} g(B_{t_{k-1}})^2 \big( \Delta_k t - \Delta_k^2 B \big)^2 \Big] = 2 \sum_{k=1}^{N(n)} \mathbb{E}\big[ g(B_{t_{k-1}})^2 \big] (\Delta_k t)^2 \le 2 |\Gamma_n| \sum_{k=1}^{N(n)} \mathbb{E}\big[ g(B_{t_{k-1}})^2 \big] \Delta_k t . $$
Since the sum on the right converges to
$$ \int_0^t \mathbb{E}\big[ g(B_s)^2 \big] \, ds < \infty $$
as $n \to \infty$ (recall $g$ is bounded here) while $|\Gamma_n| \to 0$, the product goes to zero. All that remains of (5.6) is the second sum. Since, for $j < k$,
$$ \mathbb{E}\big[ g(B_{t_{k-1}}) g(B_{t_{j-1}}) (\Delta_k t - \Delta_k^2 B)(\Delta_j t - \Delta_j^2 B) \big] = \mathbb{E}\Big[ g(B_{t_{k-1}}) g(B_{t_{j-1}}) (\Delta_j t - \Delta_j^2 B)\, \mathbb{E}\big[ \Delta_k t - \Delta_k^2 B \mid \mathcal{F}_{t_{k-1}} \big] \Big] $$
and $\mathbb{E}\big[ \Delta_k t - \Delta_k^2 B \mid \mathcal{F}_{t_{k-1}} \big] = 0$, we see that the second sum is in fact zero.
All that remains is to show that $A_n - C_n \to 0$. Now
$$ |C_n - A_n| \le \sum_{k=1}^{N(n)} \big| g(\xi_k) - g(B_{t_{k-1}}) \big| \big( B_{t_k} - B_{t_{k-1}} \big)^2 \le \Big( \sup_k \big| g(\xi_k) - g(B_{t_{k-1}}) \big| \Big) \sum_{k=1}^{N(n)} \big( B_{t_k} - B_{t_{k-1}} \big)^2 . $$
Since the first factor goes to zero in probability as $n \to \infty$ by the continuity and boundedness of $g(B_s)$, and the second factor converges to the quadratic variation of $B_t$ (which equals $t$), we conclude that the product converges to zero in probability. $\square$
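Lemma 1.2 is also easy to watch in simulation: weighting the squared Brownian increments by $g$ along the path reproduces the time integral $\int_0^t g(B_s)\, ds$. The sketch below uses the illustrative bounded choice $g(x) = 1/(1+x^2)$ and evaluates $g$ at the left endpoints (one admissible choice of the $\xi_k^n$):

```python
import numpy as np

rng = np.random.default_rng(5)

T, n_steps, n_paths = 1.0, 2_000, 1_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
B_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)[:, :-1]])
g = 1.0 / (1.0 + B_left**2)          # bounded continuous g, evaluated along the path

qv_sum = np.sum(g * dB**2, axis=1)   # sum_k g(B_{t_k}) (B_{t_{k+1}} - B_{t_k})^2
riemann = np.sum(g * dt, axis=1)     # Riemann sum for  int_0^t g(B_s) ds

l2_err = np.mean((qv_sum - riemann)**2)
print(l2_err)
```

The mean-square gap is of order $2t\,|\Gamma_n|$, mirroring the $2(\Delta_k t)^2$ bound in the proof.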

1.1. A second look at Itô's Formula. Looking back at (5.4), one sees that the Itô integral term in Itô's formula comes from the sum against the increments of Brownian motion. This term results directly from the first order Taylor expansion of $f$, and can be identified with the first order derivative term that we are used to seeing in the fundamental theorem of calculus (5.1). The second sum,
$$ \sum_{k=1}^{N(n)} f''(\xi_k) \big( B_{t_k} - B_{t_{k-1}} \big)^2 , $$
which contains the squares of the increments of Brownian motion, results from the second order term in the Taylor expansion and is absent in the classical calculus formulation. However, since the sum
$$ \sum_{k=1}^{N(n)} \big( B_{t_k} - B_{t_{k-1}} \big)^2 $$
converges in probability to the quadratic variation of the Brownian motion $B_t$, which according to Lemma 5.3 is simply $t$, this term gives a nonzero contribution in the limit $n \to \infty$ and must be kept in this framework. We refer to this term as the Itô correction term. In light of this remark, if we let $[B]_t$ denote the quadratic variation of $B_t$, then one can reinterpret the Itô correction term
$$ \frac12 \int_0^t f''(B_s) \, ds \quad \text{as} \quad \frac12 \int_0^t f''(B_s) \, d[B]_s . \tag{5.7} $$
We wish to derive a more general version of Itô's formula for a general Itô process $X_t$ defined by
$$ X_t = X_0 + \int_0^t f_s \, ds + \int_0^t g_s \, dB_s . $$
Beginning in the same way as before, we write the expression analogous to (5.4), namely
$$ f(X_t) - f(X_0) = \sum_{k=1}^{N(n)} f'(X_{t_{k-1}}) \big( X_{t_k} - X_{t_{k-1}} \big) + \frac12 \sum_{k=1}^{N(n)} f''(\xi_k) \big( X_{t_k} - X_{t_{k-1}} \big)^2 \tag{5.8} $$
for some twice continuously differentiable function $f$ and some partition $\{t_k\}$ of $[0, t]$. It is reasonable to expect the first and second sums to converge respectively to the integrals
$$ \int_0^t f'(X_s) \, dX_s \quad \text{and} \quad \frac12 \int_0^t f''(X_s) \, d[X]_s . \tag{5.9} $$
The first is the Itô stochastic integral with respect to an Itô process $X_t$, while the second is an integral with respect to the differential of the quadratic variation $[X]_t$ of the process $X_t$. So far this discussion has proceeded mainly by analogy to the simple Brownian motion case. To make sense of the two terms in (5.9) we need to better understand the quadratic variation of an Itô process $X_t$ and to define the concept of stochastic integral against $X_t$. While the former point is covered in the following section, the latter is quickly clarified by this intuitive definition:

Definition 1.3. Given an Itô process $\{X_t\}$ with differential $dX_t = f_t \, dt + \sigma_t \, dB_t$ and an adapted stochastic process $\{h_t\}$ such that
$$ \int_0^\infty |h_s f_s| \, ds < \infty \quad \text{and} \quad \int_0^\infty (h_s \sigma_s)^2 \, ds < \infty \quad \text{a.s.} , $$
we define the integral of $h_t$ against $X_t$ as
$$ \int_0^t h_s \, dX_s := \int_0^t h_s f_s \, ds + \int_0^t h_s \sigma_s \, dB_s . \tag{5.10} $$
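Definition (5.10) is purely a bookkeeping identity: each increment of $X$ splits into its $ds$ and $dB_s$ parts, so on a discretized path the left-endpoint sums for the two sides agree term by term. A sketch with the hypothetical choices $f \equiv 1$, $\sigma \equiv 1$ and $h_s = B_s$:

```python
import numpy as np

rng = np.random.default_rng(6)

T, n_steps, n_paths = 1.0, 1000, 1_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
B_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)[:, :-1]])

dX = dt + dB                          # increments of X with f = 1, sigma = 1
lhs = np.sum(B_left * dX, axis=1)     # int h dX as a left-endpoint sum
rhs = np.sum(B_left * dt, axis=1) + np.sum(B_left * dB, axis=1)   # int h f ds + int h sigma dB

gap = np.max(np.abs(lhs - rhs))       # zero up to floating-point roundoff
print(gap)
```

This is why (5.10) is a definition rather than a theorem: no new limiting procedure is needed beyond the two integrals already constructed.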

2. Quadratic Variation and Covariation


We generalize the definition (3.9) of quadratic variation to that of quadratic covariation.

Definition 2.1. Let $X_t, Y_t$ be two adapted stochastic processes. Their quadratic covariation is defined as
$$ [X, Y]_t := \operatorname*{p\text{-}lim}_{N \to \infty} \sum_{j=0}^{j_N} \big( X_{t_{j+1}^N} - X_{t_j^N} \big)\big( Y_{t_{j+1}^N} - Y_{t_j^N} \big) , $$
where $\operatorname{p\text{-}lim}$ denotes a limit in probability and $\{t_j^N\}$ is a set partitioning the interval $[0, t]$ defined by
$$ \Gamma^N := \big\{ \{t_j^N\} : 0 = t_0^N < t_1^N < \dots < t_{j_N}^N = t \big\} \tag{5.11} $$
with $|\Gamma^N| := \sup_j |t_{j+1}^N - t_j^N| \to 0$ as $N \to \infty$. Furthermore, we define the quadratic variation of $X_t$ as
$$ [X]_t := [X, X]_t . \tag{5.12} $$
We can also speak about the quadratic variation on an interval other than $[0, t]$. For $0 \le s < t$ we will write respectively $[X]_{s,t}$ and $[X, Y]_{s,t}$ for the quadratic variation and the quadratic covariation on the interval $[s, t]$.
Just from the algebraic form of the pre-limiting object, the quadratic variation satisfies a number of properties.

Lemma 2.2. Assuming all of the objects are defined, then for any adapted, continuous stochastic processes $X_t, Y_t$:
i) for any constant $c \in \mathbb{R}$ we have $[cX]_t = c^2 [X]_t$;
ii) for $0 < s < t$ we have
$$ [X]_{0,s} + [X]_{s,t} = [X]_{0,t} ; \tag{5.13} $$
iii) we have that
$$ 0 \le [X]_{0,s} \le [X]_{0,t} \tag{5.14} $$
for $t > s \ge 0$; in other words, the map $t \mapsto [X]_t$ is nondecreasing a.s.;
iv) we can write
$$ [X \pm Y]_t = [X]_t + [Y]_t \pm 2 [X, Y]_t . \tag{5.15} $$
Consequently, quadratic covariations can be written in terms of quadratic variations as
$$ [X, Y]_t = \frac12 \big( [X + Y]_t - [X]_t - [Y]_t \big) = \frac14 \big( [X + Y]_t - [X - Y]_t \big) . \tag{5.16} $$

Proof. Parts i) and ii) are a direct consequence of Definition 2.1, while part iii) results from ii) and the fact that $[X]_t$ is defined as a limit of sums of squares and is therefore nonnegative: $[X]_{0,t} = [X]_{0,s} + [X]_{s,t} \ge [X]_{0,s}$. Part iv) is obtained by noticing that
$$ [X \pm Y]_t = \operatorname*{p\text{-}lim}_{N \to \infty} \sum_{j=0}^{j_N} \Big( \big( X_{t_{j+1}^N} - X_{t_j^N} \big) \pm \big( Y_{t_{j+1}^N} - Y_{t_j^N} \big) \Big)^2 = \operatorname*{p\text{-}lim}_{N \to \infty} \sum_{j=0}^{j_N} \Big( \big( X_{t_{j+1}^N} - X_{t_j^N} \big)^2 \pm 2 \big( X_{t_{j+1}^N} - X_{t_j^N} \big)\big( Y_{t_{j+1}^N} - Y_{t_j^N} \big) + \big( Y_{t_{j+1}^N} - Y_{t_j^N} \big)^2 \Big) = [X]_t \pm 2 [X, Y]_t + [Y]_t , $$
while (5.16) is obtained by rearranging the terms of the above result (for the first equality) and by using it to compute $[X + Y]_t - [X - Y]_t$ (for the second). $\square$
In the following sections we will see that the quadratic variation of an Itô integral takes a particularly simple form. We will show this by first considering quadratic variations of Itô integrals and then extending this result to Itô processes.
2.1. Quadratic Variation of an Itô Integral.

Lemma 2.3. Let $\sigma_t$ be a process adapted to the filtration $\{\mathcal{F}_t^B\}$ and such that $\int_0^\infty \sigma_s^2 \, ds < \infty$ a.s. Then, defining $M_t := I_t(\sigma) = \int_0^t \sigma_s \, dB_s$, we have that
$$ [M]_t = \int_0^t \sigma_s^2 \, ds , \tag{5.17} $$
or, in differential notation, $d[M]_t = \sigma_t^2 \, dt$.

Proof of Lemma 2.3. It is enough to prove (5.17) when $\sigma_s$ is an elementary stochastic process in $S_2$; the general case can then be handled by approximation, as in the proof of the Itô isometry. Hence, we assume that
$$ \sigma_t = \sum_{j=1}^K \alpha_{j-1} 1_{[t_{j-1}, t_j)}(t) , \tag{5.18} $$
where the $\alpha_k$ satisfy the properties required by $S_2$ and $K$ is some integer. Without loss of generality we can assume that $t$ is the right endpoint of our interval, so that the partition takes the form
$$ 0 = t_0 < t_1 < \dots < t_K = t . $$
Now observe that if $[s, r] \subset [t_{j-1}, t_j]$ then
$$ \int_s^r \sigma_\tau \, dB_\tau = \alpha_{j-1} \big( B_r - B_s \big) . $$
Now let $\{s_\ell^{(n)}\}$ be a sequence of partitions of the interval $[t_{j-1}, t_j]$, so that
$$ t_{j-1} = s_0^{(n)} < s_1^{(n)} < \dots < s_{N(n)}^{(n)} = t_j $$
and $|\Gamma^n| = \sup_\ell |s_\ell^{(n)} - s_{\ell-1}^{(n)}| \to 0$ as $n \to \infty$. Then the quadratic variation of $M_t$ on the interval $[t_{j-1}, t_j]$ is the limit as $n \to \infty$ of
$$ \sum_{\ell=1}^{N(n)} \big( M_{s_\ell} - M_{s_{\ell-1}} \big)^2 = \alpha_{j-1}^2 \sum_{\ell=1}^{N(n)} \big( B_{s_\ell} - B_{s_{\ell-1}} \big)^2 . $$
Since the sum on the right-hand side limits to the quadratic variation of the Brownian motion $B$ on the interval $[t_{j-1}, t_j]$, which we know to be $t_j - t_{j-1}$, we conclude that
$$ [M]_{t_{j-1}, t_j} = \alpha_{j-1}^2 (t_j - t_{j-1}) . $$
Since the quadratic variation on disjoint intervals adds, we have that
$$ [M]_t = \sum_{j=1}^K [M]_{t_{j-1}, t_j} = \sum_{j=1}^K \alpha_{j-1}^2 (t_j - t_{j-1}) = \int_0^t \sigma_s^2 \, ds , $$
where the last equality follows from the fact that $\sigma_s$ takes the form (5.18). As mentioned at the start, the general case follows from this calculation by approximation by processes in $S_2$. $\square$
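Lemma 2.3 is a pathwise statement, and on a discretized path the realized quadratic variation of $M_t = \int_0^t \sigma_s \, dB_s$ indeed tracks $\int_0^t \sigma_s^2 \, ds$ path by path. A sketch using the illustrative adapted integrand $\sigma_s = \cos(B_s)$, evaluated at left endpoints:

```python
import numpy as np

rng = np.random.default_rng(7)

T, n_steps, n_paths = 1.0, 2_000, 1_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
B_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)[:, :-1]])
sigma = np.cos(B_left)                      # adapted, bounded integrand (illustration only)

dM = sigma * dB                             # increments of M
realized_qv = np.sum(dM**2, axis=1)         # sum_l (M_{s_l} - M_{s_{l-1}})^2
predicted = np.sum(sigma**2 * dt, axis=1)   # Riemann sum for  int_0^T sigma_s^2 ds

l2_err = np.mean((realized_qv - predicted)**2)
print(l2_err)
```

Note that the agreement is per path, not merely in expectation, which is what distinguishes (5.17) from the Itô isometry.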
Remark 2.4. The proof of the above result in Klebaner (Theorem 4.14 on p. 106) has a subtle issue. When bounding
$$ \mathbb{E}\Big[ 2 \sum_{i=0}^{n-1} g^2(B_{t_i^n}) (t_{i+1}^n - t_i^n)^2 \Big] \le 2 \delta_n \sum_{i=0}^{n-1} \mathbb{E}\big[ g^2(B_{t_i^n}) \big] (t_{i+1}^n - t_i^n) , $$
Klebaner asserts that as $n \to \infty$, $\delta_n = |\Gamma_n| \to 0$ while $\sum_{i=0}^{n-1} \mathbb{E}[g^2(B_{t_i^n})](t_{i+1}^n - t_i^n)$ stays finite, and thus their product goes to $0$. However, the finiteness of $\sum_{i=0}^{n-1} \mathbb{E}[g^2(B_{t_i^n})](t_{i+1}^n - t_i^n)$ is unjustified. In fact, if it were finite, it would have to converge to $\int_0^t \mathbb{E}[g^2(B_s)] \, ds$ (a Riemann sum). However, this integral might be infinite for certain choices of $g$, for example $g(x) = e^{x^2}$ (see Example 4.5 on p. 99 of [Klebaner]). The proof here uses the same computation of the second moment, but only for "nice" functions (i.e., those with compact support); the convergence in probability (note: this is weaker than convergence in $L^2$) for general continuous functions is established using approximation. Stopping rules for the Itô integral are needed there, but we defer them to a later part of the course.
We now consider the quadratic covariation of two Itô integrals with respect to independent Brownian motions.

Lemma 2.5. Let $B_t, W_t$ be two independent Brownian motions and $f_s, g_s$ two stochastic processes, all adapted to the underlying filtration $\mathcal{F}_t$ and such that $\int_0^\infty f_s^2 \, ds < \infty$ and $\int_0^\infty g_s^2 \, ds < \infty$ almost surely. We define
$$ M_t := \int_0^t f_s \, dB_s \quad \text{and} \quad N_t := \int_0^t g_s \, dW_s . $$
Then, for all $t \ge 0$ one has
$$ [N, M]_t = 0 . \tag{5.19} $$

Proof of (5.19). Again, without loss of generality it is enough to prove the result for $f_s$ and $g_s$ in $S_2$. We can further assume that both are defined with respect to the same partition
$$ 0 = t_0 < t_1 < \dots < t_K = t . $$
Since, as observed in (5.13), the quadratic covariation on disjoint intervals adds, we need only show that $[N, M]_{t_{j-1}, t_j} = 0$ on each of the partition intervals $[t_{j-1}, t_j]$.
Fixing such an interval $[t_{j-1}, t_j]$, we see that $[N, M]_{t_{j-1}, t_j} = f_{t_{j-1}} g_{t_{j-1}} [W, B]_{t_{j-1}, t_j}$. The easiest way to see that this vanishes is to use the "polarization" equality (5.16):
$$ 2 [W, B]_{t_{j-1}, t_j} = \Big[ \tfrac{W + B}{\sqrt{2}} \Big]_{t_{j-1}, t_j} - \Big[ \tfrac{W - B}{\sqrt{2}} \Big]_{t_{j-1}, t_j} = (t_j - t_{j-1}) - (t_j - t_{j-1}) = 0 , $$
since $\frac{W + B}{\sqrt{2}}$ and $\frac{W - B}{\sqrt{2}}$ are standard Brownian motions and hence have quadratic variation equal to the length of the time interval. $\square$
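The vanishing covariation in Lemma 2.5 shows up clearly in simulation: the realized covariation of two integrals driven by independent Brownian motions fluctuates around zero, with variance of order the mesh size. A sketch with the illustrative integrands $f_s = B_s$ and $g_s = W_s$:

```python
import numpy as np

rng = np.random.default_rng(8)

T, n_steps, n_paths = 1.0, 2_000, 1_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))   # increments of B
dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))   # independent increments of W
B_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)[:, :-1]])
W_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)[:, :-1]])

dM = B_left * dB                         # increments of M = int f dB, with f = B
dN = W_left * dW                         # increments of N = int g dW, with g = W
realized_cov = np.sum(dM * dN, axis=1)   # approximates [N, M]_T

mean_sq = np.mean(realized_cov**2)       # -> 0 as the mesh is refined
print(mean_sq)
```

Replacing $W$ by $B$ (the same driving noise) makes the realized covariation concentrate around $\int_0^T f_s g_s \, ds$ instead of zero.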

Remark 2.6. One can also prove the above result more directly by following the same argument as in Theorem 5.3 of Chapter 3. The key calculation is to show that the expected value of the approximating covariation is $0$, and not the length of the time interval as in the proof of Theorem 5.3 of Chapter 3: for any partition $\{s_\ell\}$ of $[t_{j-1}, t_j]$, by independence one has
$$ \mathbb{E}\Big[ \sum_\ell \big( B_{s_\ell} - B_{s_{\ell-1}} \big)\big( W_{s_\ell} - W_{s_{\ell-1}} \big) \Big] = \sum_\ell \mathbb{E}\big[ B_{s_\ell} - B_{s_{\ell-1}} \big]\, \mathbb{E}\big[ W_{s_\ell} - W_{s_{\ell-1}} \big] = 0 . $$

2.2. Quadratic Variation of an Itô Process. In this section, using the results presented above, we finally obtain a simple expression for the quadratic variation of an Itô process.

Lemma 2.7. If $X_t$ is an Itô process with differential $dX_t = \mu_t \, dt + \sigma_t \, dB_t$, then
$$ [X]_t = [I(\sigma)]_t = \int_0^t \sigma_s^2 \, ds , \tag{5.20} $$
or equivalently $d[X]_t = \sigma_t^2 \, dt$.

By comparing this result with (5.17) we notice that the only contribution to the quadratic variation process comes from the Itô integral. The following result will be useful in the proof of (5.20).

Lemma 2.8. Let $X_t$ and $Y_t$ be adapted stochastic processes such that $X_t$ is continuous a.s. and $Y_t$ has trajectories of finite first variation ($V_1[Y](t) < \infty$). Then $[X, Y]_t = 0$ a.s.

Before we give the proof of Lemma 2.8, we observe that it immediately yields (5.20).

Proof of Lemma 2.7. Defining $F_t = \int_0^t \mu_s \, ds$ and $M_t = \int_0^t \sigma_s \, dB_s$, observe that $X_t = X_0 + F_t + M_t$ and that $F_t$ is continuous and of finite first variation almost surely. Hence $[F]_t = 0$ by Lemma 2.8. Since $M_t$ is continuous a.s., we also have $[M, F]_t = 0$ by Lemma 2.8. Hence
$$ [X]_t = [F]_t + 2 [F, M]_t + [M]_t = [M]_t = \int_0^t \sigma_s^2 \, ds , $$
where the last equality is Lemma 2.3. $\square$

Proof of Lemma 2.8. Let $\Gamma^N := \{t_i^N : i = 0, \dots, i_N\}$ be a sequence of partitions of $[0, t]$ such that $|\Gamma^N| = \sup_i |t_{i+1}^N - t_i^N| \to 0$ as $N \to \infty$. Now
$$ \Big| \sum_{i=1}^{i_N} \big( X_{t_i} - X_{t_{i-1}} \big)\big( Y_{t_i} - Y_{t_{i-1}} \big) \Big| \le \Big( \sup_i \big| X_{t_i} - X_{t_{i-1}} \big| \Big) \sum_{i=1}^{i_N} \big| Y_{t_i} - Y_{t_{i-1}} \big| . $$
The summation on the right-hand side is bounded from above by the first variation of $Y_t$, which by assumption is finite a.s. On the other hand, as $N \to \infty$ the supremum goes to zero, since $|\Gamma^N| \to 0$ and $X_t$ is a.s. continuous. $\square$
Remark 2.9. Similarly to the formal considerations in Section 1.1, we may think of the differential of the quadratic variation process $d[X]_t$ as the limit of the difference term $(X_{t_{k+1}^n} - X_{t_k^n})^2$, which in turn is formally the square $(dX_t)^2$ of the differential of $X_t$. Therefore, formally speaking, we can obtain the result of the previous lemma by writing
$$ d[X]_t = (dX_t)^2 = (\mu_t \, dt + \sigma_t \, dB_t)^2 = \mu_t^2 (dt)^2 + 2 \mu_t \sigma_t (dt)(dB_t) + \sigma_t^2 (dB_t)^2 = \sigma_t^2 \, dt , $$
where we have applied $(dt)^2 = (dt)(dB_t) = 0$ (cf. Lemma 2.8) and $(dB_t)^2 = dt$ (cf. Lemma 2.3). These formal multiplication rules are summarized in the following table:
$$ \begin{array}{c|cc} \times & dt & dB_t \\ \hline dt & 0 & 0 \\ dB_t & 0 & dt \end{array} $$
By the same formal arguments, such rules apply to the computation of the quadratic covariation of two Itô processes, say $X_t$ and $Y_t$ with $dY_t = \mu'_t \, dt + \sigma'_t \, dB_t$ (driven by the same Brownian motion):
$$ d[X, Y]_t = (dX_t)(dY_t) = (\mu_t \, dt + \sigma_t \, dB_t)(\mu'_t \, dt + \sigma'_t \, dB_t) = \mu_t \mu'_t (dt)^2 + (\mu_t \sigma'_t + \mu'_t \sigma_t)(dt)(dB_t) + \sigma_t \sigma'_t (dB_t)^2 = \sigma_t \sigma'_t \, dt . \tag{5.21} $$
This result can be verified by going through the steps of the proof of the above lemmas.
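The covariation rule $d[X, Y]_t = \sigma_t \sigma'_t \, dt$ from (5.21) can also be tested directly on discretized paths. The sketch below uses hypothetical constant coefficients ($\mu = 1$, $\sigma = 0.5$, $\mu' = -2$, $\sigma' = 2$), with both processes driven by the same Brownian motion:

```python
import numpy as np

rng = np.random.default_rng(9)

T, n_steps, n_paths = 1.0, 2_000, 1_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))

dX = 1.0 * dt + 0.5 * dB     # dX = mu dt + sigma dB
dY = -2.0 * dt + 2.0 * dB    # dY = mu' dt + sigma' dB, same driving noise

realized_cov = np.sum(dX * dY, axis=1)   # approximates [X, Y]_T
predicted = 0.5 * 2.0 * T                # int_0^T sigma sigma' dt = 1.0

l2_err = np.mean((realized_cov - predicted)**2)
print(l2_err)
```

The $dt \cdot dt$ and $dt \cdot dB$ cross terms contribute only at order $|\Gamma|$ and vanish in the limit, exactly as the multiplication table predicts.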

3. Itô’s Formula for an Itô Process


Using the lessons learned in the previous sections, we now proceed to compute the infinitesimal
of (5.8) as
1
df pXt q “ f 1 pXt qdXt ` f 2 pXt qpdXt q2 . (5.22)
2
Of course this is just notation for the integral equation
żt
1 t 2
ż
f pXt q ´ f pX0 q “ 1
f pXs qdXs ` f pXs qpdXs q2 .
0 2 0
The first integral is simply the integral against an Itô process as we have already discussed. In light
of Remark 2.9, we should interpret $d[X]_t = (dX_t)^2 = (\mu_t\, dt + \sigma_t\, dB_t)^2 = \sigma_t^2\, dt$. Hence
\[
\frac12 \int_0^t f''(X_s)\, (dX_s)^2 = \frac12 \int_0^t f''(X_s)\, \sigma_s^2\, ds .
\]
This formal calculation (which is correct) leads us to suggest the following general Itô formula.
Theorem 3.1. Let $X_t$ be the Itô process given by
\[
dX_t = \mu_t\, dt + \sigma_t\, dB_t .
\]
If $f$ is a $C^2$ function then
\[
f(X_t) = f(X_0) + \int_0^t f'(X_s)\, dX_s + \frac12 \int_0^t f''(X_s)\, d[X]_s
= f(X_0) + \int_0^t f'(X_s)\,\mu_s\, ds + \int_0^t f'(X_s)\,\sigma_s\, dB_s + \frac12 \int_0^t f''(X_s)\,\sigma_s^2\, ds .
\]
Proof of Theorem 3.1. Without loss of generality, we can assume that both $\mu_t$ and $\sigma_t$ are adapted elementary stochastic processes satisfying the assumptions required by an Itô process. Furthermore, by inserting partition points if needed, we can assume that they are both defined on the same partition
\[
0 = t_0 < t_1 < \dots < t_N = t .
\]
Since
\[
f(X_t) - f(X_0) = \sum_{\ell=1}^N \big( f(X_{t_\ell}) - f(X_{t_{\ell-1}}) \big)
\]
we need only prove Itô's formula for $f(X_{t_\ell}) - f(X_{t_{\ell-1}})$. Now, since $\mu_t$ and $\sigma_t$ are constant for $t \in [t_{\ell-1}, t_\ell)$, for $[r, s] \subset [t_{\ell-1}, t_\ell]$ we have
\[
X_s - X_r = \int_r^s \mu_\tau\, d\tau + \int_r^s \sigma_\tau\, dB_\tau = \mu_{t_{\ell-1}}(s - r) + \sigma_{t_{\ell-1}}(B_s - B_r) .
\]
Let $\{s_i^{(n)} : i = 0, \dots, K(n)\}$ be a sequence of partitions of $[t_{\ell-1}, t_\ell]$ such that $|\Gamma^{(n)}| = \sup_i |s_i^{(n)} - s_{i-1}^{(n)}| \to 0$ as $n \to \infty$. Now, using Taylor's theorem, we have
\[
f(X_{s_j}) - f(X_{s_{j-1}}) = f'(X_{s_{j-1}})(X_{s_j} - X_{s_{j-1}}) + \frac12 f''(\xi_j)(X_{s_j} - X_{s_{j-1}})^2
\]
for some $\xi_j$ between $X_{s_{j-1}}$ and $X_{s_j}$. Hence we have
\[
f(X_{t_\ell}) - f(X_{t_{\ell-1}}) = \sum_{j=1}^{K(n)} \big( f(X_{s_j}) - f(X_{s_{j-1}}) \big)
= \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})(X_{s_j} - X_{s_{j-1}}) + \frac12 \sum_{j=1}^{K(n)} f''(\xi_j)(X_{s_j} - X_{s_{j-1}})^2 =: (I) + \frac12 (II) .
\]

Since
\[
(X_{s_j} - X_{s_{j-1}})^2 = \mu_{t_{\ell-1}}^2 (s_j - s_{j-1})^2 + 2\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}} (s_j - s_{j-1})(B_{s_j} - B_{s_{j-1}}) + \sigma_{t_{\ell-1}}^2 (B_{s_j} - B_{s_{j-1}})^2
\]
we have
\[
(I) = \mu_{t_{\ell-1}} \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})(s_j - s_{j-1}) + \sigma_{t_{\ell-1}} \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})(B_{s_j} - B_{s_{j-1}}) =: (Ia) + (Ib)
\]
and
\[
(II) = \mu_{t_{\ell-1}}^2 \sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})^2
+ 2\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}} \sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})(B_{s_j} - B_{s_{j-1}})
+ \sigma_{t_{\ell-1}}^2 \sum_{j=1}^{K(n)} f''(\xi_j)(B_{s_j} - B_{s_{j-1}})^2 =: (IIa) + (IIb) + (IIc) .
\]

As $n \to \infty$, it is clear that
\[
(Ia) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} f'(X_s)\,\mu_s\, ds
\quad\text{and}\quad
(Ib) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} f'(X_s)\,\sigma_s\, dB_s .
\]
Using the same arguments as in Theorem 1.1 we see that
\[
(IIc) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} \sigma_{t_{\ell-1}}^2 f''(X_s)\, ds = \int_{t_{\ell-1}}^{t_\ell} \sigma_s^2 f''(X_s)\, ds .
\]

All that remains is to show that $(IIa)$ and $(IIb)$ converge to zero as $n \to \infty$. Observe that
\[
|(IIa)| \le \mu_{t_{\ell-1}}^2 |\Gamma^{(n)}| \Big| \sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1}) \Big|
\quad\text{and}\quad
|(IIb)| \le 2|\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}}|\, |\Gamma^{(n)}| \Big| \sum_{j=1}^{K(n)} f''(\xi_j)(B_{s_j} - B_{s_{j-1}}) \Big| .
\]
Since the two sums converge to $\int f''(X_s)\, ds$ and $\int f''(X_s)\, dB_s$ respectively, the fact that $|\Gamma^{(n)}| \to 0$ implies that $(IIa)$ and $(IIb)$ converge to zero as $n \to \infty$. Putting all of these results together produces the quoted result. $\square$
Remark 3.2. Notice that the "multiplication table" given in Remark 2.9 is reflected in the details of the proof of Theorem 3.1. Each of the terms in $(dX_t)^2$ corresponds to one of the terms labeled $(II)$, which came from $(X(s_j) - X(s_{j-1}))^2$ in the Taylor expansion. The term $(IIa)$, which corresponds to $(dt)^2$, limits to zero, as the multiplication table indicates. The term $(IIb)$, which corresponds to $(dt)(dB_t)$, also tends to zero, again as the table indicates. Lastly, $(IIc)$, which corresponds to $(dB_t)^2$, limits to an integral against $dt$, as indicated in the table.
Example 3.3. Consider the stochastic process with differential
\[
dX_t = X_t\, dB_t + \frac12 X_t\, dt .
\]
The above process is an example of a geometric Brownian motion, a process widely used in finance to model the price of a stock. We apply Itô's formula to the function $f(x) := \log x$. Using that $\partial_x f(x) = x^{-1}$ and $\partial^2_{xx} f(x) = -x^{-2}$ we obtain
\[
d\log X_t = \frac{1}{X_t}\, dX_t - \frac{1}{2X_t^2}\, d[X]_t
= \frac{1}{X_t}\Big( X_t\, dB_t + \frac12 X_t\, dt \Big) - \frac{1}{2X_t^2}\, X_t^2\, dt = dB_t .
\]
In integral form the above can be written as $\log X_t = \log X_0 + B_t$, and therefore $X_t = X_0 e^{B_t}$.
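The closed form can be checked against a direct discretization of the sde. The sketch below (an illustration, not part of the notes) applies an Euler–Maruyama step to $dX_t = \tfrac12 X_t\, dt + X_t\, dB_t$ along one Brownian path and compares the terminal value with $X_0 e^{B_t}$:

```python
import numpy as np

# Euler-Maruyama sketch for dX = (1/2) X dt + X dB along one path:
# X_{k+1} = X_k (1 + dt/2 + dB_k).  The result should match X_0 * exp(B_T)
# up to discretization error.
rng = np.random.default_rng(1)
T, n, x0 = 1.0, 100_000, 1.0
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)     # Brownian increments
X = x0 * np.prod(1.0 + 0.5 * dt + dB)         # Euler-Maruyama terminal value
B_T = dB.sum()                                # terminal Brownian value
exact = x0 * np.exp(B_T)                      # closed-form solution X_0 e^{B_T}
print(X, exact)                               # agree up to small relative error
```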

4. Full Multidimensional Version of Itô Formula


We now give the full multidimensional version of Itô’s formula. We will include the possibility
that the function depends on time and that there is more than one Brownian motion. We begin
with the definition of a multidimensional Itô process.
Definition 4.1. A stochastic process $X_t = (X_1(t), \dots, X_d(t)) \in \mathbb{R}^d$ is an Itô process if each of the coordinate processes $X_i(t)$ is a one-dimensional Itô process.
Let $\mu_i(t)$ and $\sigma_{ij}(t)$ be adapted stochastic processes and $\{B_j(t)\}_{j=1}^m$ be $m$ mutually independent standard Brownian motions such that, for $i = 1, \dots, d$, we can write each component of the $d$-dimensional Itô process as
\[
dX_i(t) = \mu_i(t)\, dt + \sum_{j=1}^m \sigma_{ij}(t)\, dB_j(t) . \tag{5.23}
\]

If we collect the Brownian motions into one m-dimensional Brownian motion Bt “ pB1 ptq, . . . , Bm ptqq
and define the Rd -valued process µt “ pµ1 ptq, . . . , µd ptqq and the matrix valued process σt whose
matrix elements are the σij ptq then we can write
dXt “ µt dt ` σt dBt . (5.24)

While this is nice and compact, it is perhaps more suggestive to define the $\mathbb{R}^d$-valued processes $\sigma^{(j)} = (\sigma_{1j}, \dots, \sigma_{dj})$ for $j = 1, \dots, m$ and write
\[
dX_t = \mu_t\, dt + \sum_{j=1}^m \sigma_t^{(j)}\, dB_j(t) . \tag{5.25}
\]
This emphasizes that the process $X_t$ at each moment of time is pushed in the direction in which $\mu_t$ points and given $m$ random kicks, in the directions the $\sigma_t^{(j)}$ point, whose magnitude and sign are dictated by the Brownian motions $B_j(t)$.
We now want to derive the process which describes the evolution of $F(X_t)$, where $F : \mathbb{R}^d \to \mathbb{R}$; in other words, the multidimensional Itô formula.
We begin by developing some intuition. Recall Lemma 2.5, stating that the cross-quadratic variation of independent Brownian motions is zero. Hence, if $B_t$ and $W_t$ are independent standard Brownian motions, the multiplication table needed to compute $(dX_t)^2$ and $(dY_t)(dX_t)$ for two Itô processes $X_t$ and $Y_t$ driven by them is the following.

    ×       dt    dB_t    dW_t
    dt      0     0       0
    dB_t    0     dt      0
    dW_t    0     0       dt

Table 1. Formal multiplication rules for differentials of two independent Brownian motions

Theorem 4.2. Let $F : \mathbb{R}^d \to \mathbb{R}$ be a $C^2$ function. If $X_t$ is as above, then
\[
dF(X_t) = \sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_t)\, dX_i(t) + \frac12 \sum_{i=1}^d \sum_{k=1}^d \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t)\, d[X_i, X_k]_t . \tag{5.26}
\]
Furthermore, one has
\[
\sum_{i=1}^d \sum_{k=1}^d \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t)\, d[X_i, X_k](t) = \sum_{i=1}^d \sum_{k=1}^d \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t)\, a_{ik}(t)\, dt , \tag{5.27}
\]
where
\[
a_{ik}(t) = \sum_{j=1}^m \sigma_{ij}(t)\, \sigma_{kj}(t) .
\]
The matrix $a(t)$ can be written compactly as $\sigma(t)\sigma(t)^T$. The matrix $a$ is often called the diffusion matrix.
We will only sketch the proof of this version of Itô's formula, since it follows the same logic as the others already proven. Proofs can be found in many places, including [14, 7, 3].
Sketch of proof. Similarly to the proof of Theorem 3.1, we introduce the family of partitions $\Gamma^N$ of the interval $[0,t]$ as in (8.8), with $\lim_{N\to\infty} |\Gamma^N| = 0$, and Taylor-expand the function $F$ on each of these intervals:
\[
F(X_t) - F(X_0) = \sum_{\ell=1}^N \Big\{ \sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_{s_{\ell-1}})\big(X_i(s_\ell) - X_i(s_{\ell-1})\big)
+ \frac12 \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i \partial x_j}(\xi_\ell)\big(X_i(s_\ell) - X_i(s_{\ell-1})\big)\big(X_j(s_\ell) - X_j(s_{\ell-1})\big) \Big\}
= \sum_{\ell=1}^N \Big\{ (I)_\ell + \frac12 (II)_\ell \Big\} ,
\]
for $\xi_\ell \in \prod_{i=1}^d [X_i(s_{\ell-1}), X_i(s_\ell)]$. For the first-order term, it is straightforward to generalize the proof of Theorem 3.1 to obtain that
\[
\lim_{N \to \infty} \sum_{\ell=1}^N (I)_\ell = \int_0^t \sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_s)\, dX_i(s) .
\]

We formally recover the expression of the second-order term by combining (5.21) with the rules of Table 1:
\[
\lim_{N \to \infty} \sum_{\ell=1}^N (II)_\ell = \int_0^t \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i \partial x_j}(X_s)\,(dX_i(s))(dX_j(s))
= \int_0^t \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i \partial x_j}(X_s) \sum_{k,l=1}^m \big(\sigma_{ik}(s)\, dB_k(s)\big)\big(\sigma_{jl}(s)\, dB_l(s)\big)
= \int_0^t \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i \partial x_j}(X_s) \sum_{k=1}^m \sigma_{ik}(s)\,\sigma_{jk}(s)\, ds ,
\]
where in the second equality we have used that $(dt)(dB_j(t)) = 0$, and in the third that $(dB_k(t))(dB_l(t)) = 0$ for $k \ne l$. Note that one should check that, when taking the limit in the first equality, $F(\xi_\ell)$ can be replaced by $F(X_{s_\ell})$. This can be done by reproducing the proof of Lemma 1.2. $\square$
Remark 4.3. The fact that (5.27) holds requires that Bi and Bj are independent if i ‰ j.
However, (5.26) holds even when they are not independent.
Theorem 4.2 is for a scalar valued function F . However by applying the result to each coordinate function
Fi : Rd Ñ R of the function F : Rd Ñ Rp with F “ pF1 , . . . , Fp q we obtain the full multidimensional version.
Instead of writing it again in coordinates, we take the opportunity to write a version aligned with the
perspective and notation of (5.26). Recalling that the directional derivative of F in the direction ν P Rd at
the point x P Rd is
\[
DF(x)[\nu] = \lim_{\varepsilon \to 0} \frac{F(x + \varepsilon\nu) - F(x)}{\varepsilon} = (\nabla F \cdot \nu)(x) = \sum_{k=1}^p \sum_{i=1}^d \frac{\partial F_k}{\partial x_i}\,\nu_i\, e_k
\]

where ek is the k-th unit vector of Rp . Similarly, the second directional derivative at the point x P Rd in the
directions ν, η P Rd is given by
\[
D^2 F(x)[\nu, \eta] = \lim_{\varepsilon \to 0} \frac{DF(x + \varepsilon\eta)[\nu] - DF(x)[\nu]}{\varepsilon} = \sum_{k=1}^p \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2 F_k}{\partial x_i\, \partial x_j}\,\nu_i\,\eta_j\, e_k .
\]

Then, in the notation of (5.26), Theorem 4.2 can be rewritten as
\[
dF(X(t)) = DF(X(t))[\mu(t)]\, dt + \sum_{i=1}^m DF(X(t))[\sigma^{(i)}(t)]\, dB_i(t)
+ \frac12 \sum_{i=1}^m \sum_{j=1}^m D^2 F(X(t))[\sigma^{(i)}(t), \sigma^{(j)}(t)]\, d[B_i, B_j](t) .
\]
If $B_i$ and $B_j$ are assumed to be independent for $i \ne j$, then
\[
dF(X(t)) = DF(X(t))[\mu(t)]\, dt + \sum_{i=1}^m DF(X(t))[\sigma^{(i)}(t)]\, dB_i(t) + \frac12 \sum_{i=1}^m D^2 F(X(t))[\sigma^{(i)}(t), \sigma^{(i)}(t)]\, dt .
\]

We now consider special cases of the multidimensional Itô formula that will be helpful in practice. The first describes the evolution of a function $F(x,t)$ that depends explicitly on time:
Corollary 4.4. Let $F : \mathbb{R}^d \times [0,\infty) \to \mathbb{R}$ be such that $F(x,t)$ is $C^2$ in $x \in \mathbb{R}^d$ and $C^1$ in $t \in [0,\infty)$. Furthermore, let $X_t$ be a $d$-dimensional Itô process as in (5.23). Then
\[
dF(X_t, t) = \frac{\partial F}{\partial t}(X_t, t)\, dt + \sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_t, t)\, dX_i(t) + \frac12 \sum_{i=1}^d \sum_{k=1}^d \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t, t)\, d[X_i, X_k]_t , \tag{5.28}
\]
where, as in (5.27), $d[X_i, X_k]_t = a_{ik}(t)\, dt$.


Proof. In this proof we will make use of the "multiplication table" at the beginning of this section. Consider the $(d+1)$-dimensional process $\bar{X}_t := (X_1(t), \dots, X_d(t), Y_t)$, for $Y_t$ given by $dY_t = dt + 0\, dB_t$. Then, by applying Itô's formula (Theorem 4.2) and the fact that $Y_t$ is of finite variation (so $d[Z, Y]_t = 0$ for continuous $Z_t$), we obtain
\[
dF(X_t, t) = \sum_{i=1}^d \partial_i F(X_t, t)\, dX_i(t) + \partial_t F(X_t, t)\,(dt + 0\, dB_t)
+ \frac12 \Big( \sum_{i,j=1}^d \partial^2_{ij} F(X_t, t)\, d[X_i, X_j]_t + 2 \sum_{i=1}^d \partial_i \partial_t F(X_t, t)\, d[X_i, Y]_t + \partial_t^2 F(X_t, t)\, d[Y, Y]_t \Big)
\]
\[
= \sum_{i=1}^d \partial_i F(X_t, t)\, dX_i(t) + \partial_t F(X_t, t)\, dt + \frac12 \sum_{i,j=1}^d \partial^2_{ij} F(X_t, t)\, d[X_i, X_j]_t .
\]
Note that the existence of the second derivative in $t$ is not actually needed in the above formula: all the quadratic (co)variations involving $Y_t$ vanish, so the corresponding terms can be dropped. $\square$
Corollary 4.5. Let $X_t, Y_t$ be two Itô processes. Then
\[
d(X_t Y_t) = Y_t\, dX_t + X_t\, dY_t + d[X, Y]_t . \tag{5.29}
\]
This result is known as the stochastic integration by parts formula.
Proof. Let $F : \mathbb{R}^2 \to \mathbb{R}$ with $F(x,y) = x \cdot y$. Then, since
\[
\partial_x F(x,y) = y , \quad \partial_y F(x,y) = x , \quad \partial^2_{xx} F(x,y) = \partial^2_{yy} F(x,y) = 0 , \quad \partial^2_{xy} F(x,y) = 1 ,
\]
by Itô's formula (Theorem 4.2) we have
\[
d(X_t Y_t) = dF(X_t, Y_t) = Y_t\, dX_t + X_t\, dY_t + d[X, Y]_t . \tag{5.30}
\]
$\square$
Example 4.6. We compute the stochastic integral
\[
\int_0^t s\, dB_s .
\]
Applying the integration by parts formula (5.29) with $dX_t = dB_t$ and $dY_t = dt$, we obtain
\[
d(tB_t) = t\, dB_t + B_t\, dt + d[B, Y]_t .
\]
Since $Y_t = t$ is of finite variation we have $d[B, Y]_t = 0$, and by integrating and rearranging the terms we obtain
\[
\int_0^t s\, dB_s = \int_0^t d(sB_s) - \int_0^t B_s\, ds = tB_t - \int_0^t B_s\, ds .
\]
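The identity can be verified pathwise on a discretization. The following sketch (an illustration, not part of the notes) compares the left-endpoint Itô sum for $\int_0^t s\, dB_s$ with $tB_t - \int_0^t B_s\, ds$ on one simulated path:

```python
import numpy as np

# Pathwise check of  int_0^T s dB_s = T B_T - int_0^T B_s ds  on a grid.
rng = np.random.default_rng(2)
T, n = 1.0, 50_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])   # B_0, ..., B_T on the grid
s = np.linspace(0.0, T, n + 1)
lhs = float(np.sum(s[:-1] * dB))             # Ito sum with left endpoints
rhs = float(T * B[-1] - np.sum(B[:-1] * dt)) # T*B_T minus Riemann sum of B
print(lhs, rhs)                              # agree up to O(dt) discretization error
```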

Example 4.7. Assume that $f(x,t) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$ satisfies the pde
\[
\frac{\partial}{\partial t} f(x,t) + \frac12 \frac{\partial^2}{\partial x^2} f(x,t) = 0 ,
\]
and $E\big[ f(B_t, t)^2 \big] < \infty$. Then we have that
\[
df(B_t, t) = \partial_t f(B_t, t)\, dt + \partial_x f(B_t, t)\, dB_t + \frac12 \partial^2_{xx} f(B_t, t)\, d[B]_t
= \Big( \partial_t + \frac12 \partial^2_{xx} \Big) f(B_t, t)\, dt + \partial_x f(B_t, t)\, dB_t = \partial_x f(B_t, t)\, dB_t .
\]
Therefore $f(B_t, t) = f(0,0) + \int_0^t \partial_x f(B_s, s)\, dB_s$ is a martingale.
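A concrete instance (my choice, not from the notes) is $f(x,t) = e^{x - t/2}$, which satisfies $\partial_t f + \tfrac12 \partial^2_{xx} f = 0$; the martingale property then implies $E[f(B_t, t)] = f(0,0) = 1$ for every $t$, which is easy to check by Monte Carlo:

```python
import numpy as np

# f(x, t) = exp(x - t/2) solves f_t + (1/2) f_xx = 0, so f(B_t, t) is a
# martingale and E[f(B_t, t)] = f(0, 0) = 1 for every fixed t.
rng = np.random.default_rng(3)
t, paths = 2.0, 400_000
Bt = rng.normal(0.0, np.sqrt(t), size=paths)    # samples of B_t ~ N(0, t)
mean = float(np.mean(np.exp(Bt - t / 2)))
print(mean)                                     # close to 1
```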

5. Collection of the Formal Rules for Itô’s Formula and Quadratic Variation
We now recall some of the formal calculations, bringing them all together in one place. We consider a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\mathcal{F}_t$. We assume that $B_t$ and $W_t$ are independent standard Brownian motions adapted to the filtration $\mathcal{F}_t$.
For any $\rho \in [0,1]$, $Z_t = \rho B_t + \sqrt{1-\rho^2}\, W_t$ is again a standard Brownian motion. Furthermore,
\[
[Z]_t = [Z, Z]_t = \rho^2 [B, B]_t + 2\rho\sqrt{1-\rho^2}\, [B, W]_t + (1-\rho^2)[W, W]_t = \rho^2 t + 0 + (1-\rho^2)t = t ,
\]
or, in the formal differential notation, $d[Z]_t = dt$. This result can be understood by using the formal multiplication table for differentials, which formally states:
\[
d[Z]_t = (dZ_t)^2 = (\rho\, dB_t + \sqrt{1-\rho^2}\, dW_t)^2 = \rho^2 (dB_t)^2 + 2\rho\sqrt{1-\rho^2}\, (dB_t)(dW_t) + (1-\rho^2)(dW_t)^2 = \rho^2\, dt + 0 + (1-\rho^2)\, dt = dt .
\]
Similarly, one has
\[
d[Z, B]_t = (dZ_t)(dB_t) = \rho (dB_t)^2 + \sqrt{1-\rho^2}\, (dW_t)(dB_t) = \rho\, dt + 0 = \rho\, dt ,
\]
\[
d[Z, W]_t = (dZ_t)(dW_t) = \rho (dB_t)(dW_t) + \sqrt{1-\rho^2}\, (dW_t)^2 = 0 + \sqrt{1-\rho^2}\, dt = \sqrt{1-\rho^2}\, dt .
\]
Now let $\sigma_t$ and $g_t$ be stochastic processes adapted to $\mathcal{F}_t$ with
\[
\int_0^t \sigma_s^2\, ds < \infty \quad\text{and}\quad \int_0^t g_s^2\, ds < \infty
\]
almost surely. Now define
\[
dM_t = \sigma_t\, dB_t , \qquad dN_t = g_t\, dB_t , \qquad dU_t = \sigma_t\, dW_t , \qquad dV_t = \sigma_t\, dZ_t .
\]
Of course these are just formal expressions. For example, $dM_t = \sigma_t\, dB_t$ means $M_t = M_0 + \int_0^t \sigma_s\, dB_s$. Using the multiplication table from before we have
\[
d[M]_t = (dM_t)^2 = \sigma_t^2 (dB_t)^2 = \sigma_t^2\, dt , \qquad d[U]_t = (dU_t)^2 = \sigma_t^2 (dW_t)^2 = \sigma_t^2\, dt ,
\]
\[
d[N]_t = (dN_t)^2 = g_t^2 (dB_t)^2 = g_t^2\, dt , \qquad d[V]_t = (dV_t)^2 = \sigma_t^2 (dZ_t)^2 = \sigma_t^2\, dt ,
\]
and the cross-quadratic variations
\[
d[M, N]_t = (dM_t)(dN_t) = \sigma_t g_t (dB_t)^2 = \sigma_t g_t\, dt ,
\]
\[
d[M, U]_t = (dM_t)(dU_t) = \sigma_t^2 (dB_t)(dW_t) = 0 ,
\]
\[
d[U, V]_t = (dU_t)(dV_t) = \rho \sigma_t^2 (dW_t)(dB_t) + \sqrt{1-\rho^2}\, \sigma_t^2 (dW_t)^2 = \sqrt{1-\rho^2}\, \sigma_t^2\, dt .
\]
Next we define
\[
dH_t = \mu_t\, dt \quad\text{and}\quad dK_t = f_t\, dt
\]
and observe that, since $H_t$ and $K_t$ have finite first variation, we have
\[
d[H]_t = (dH_t)^2 = \mu_t^2 (dt)^2 = 0 \quad\text{and}\quad d[K]_t = (dK_t)^2 = f_t^2 (dt)^2 = 0 .
\]
Furthermore, if $X_t = H_t + M_t$ and $Y_t = K_t + N_t$ then, using the previous calculations,
\[
d[X]_t = d[X, X]_t = d[H + M, H + M]_t = d[H]_t + d[M]_t + 2\, d[H, M]_t = \sigma_t^2\, dt ,
\]
\[
d[X, Y]_t = d[H + M, K + N]_t = d[H, K + N]_t + d[M, K + N]_t = d[M, N]_t = \sigma_t g_t\, dt ,
\]
or, using the formal algebra,
\[
d[X]_t = (dX_t)^2 = \mu_t^2 (dt)^2 + 2\mu_t \sigma_t (dt)(dB_t) + \sigma_t^2 (dB_t)^2 = 0 + 0 + \sigma_t^2\, dt = \sigma_t^2\, dt ,
\]
\[
d[X, Y]_t = (dX_t)(dY_t) = \mu_t f_t (dt)^2 + \sigma_t f_t (dt)(dB_t) + \mu_t g_t (dt)(dB_t) + \sigma_t g_t (dB_t)^2 = 0 + 0 + 0 + \sigma_t g_t\, dt = \sigma_t g_t\, dt .
\]
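The formal algebra above is mechanical enough to implement. The toy sketch below (an illustration, not part of the notes) represents a differential as a dict of coefficients on the basis symbols `dt`, `dB`, `dW` and extracts the $dt$-coefficient of a formal product, i.e. $d[X,Y]_t/dt$, using the multiplication table:

```python
# Formal multiplication table: only dB*dB = dt and dW*dW = dt survive;
# dt*dt, dt*dB, dt*dW, and dB*dW all vanish.
RULES = {('dB', 'dB'): 'dt', ('dW', 'dW'): 'dt'}

def qv(dX, dY):
    """Formal product (dX)(dY): returns the dt-coefficient, i.e. d[X,Y]_t / dt."""
    out = 0.0
    for a, ca in dX.items():
        for b, cb in dY.items():
            if RULES.get((a, b)) == 'dt':
                out += ca * cb
    return out

rho = 0.6
dB_ = {'dB': 1.0}
dZ = {'dB': rho, 'dW': (1 - rho**2) ** 0.5}   # Z = rho*B + sqrt(1-rho^2)*W
print(qv(dZ, dZ))    # d[Z]_t / dt = rho^2 + (1 - rho^2) = 1
print(qv(dZ, dB_))   # d[Z,B]_t / dt = rho
```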

CHAPTER 6

Stochastic Differential Equations

1. Definitions
Let $(\Omega, \mathcal{F}, P)$ be a probability space equipped with a filtration $\{\mathcal{F}_t\}_{t \le T}$. Let $B_t = (B_1(t), \dots, B_m(t)) \in \mathbb{R}^m$ be an $m$-dimensional Brownian motion, with $\{B_j(t)\}_{j=1}^m$ a collection of mutually independent Brownian motions such that $B_j(t) \in \mathcal{F}_t$ and, for any $0 \le s < t$, $B_j(t) - B_j(s)$ is independent of $\mathcal{F}_s$. Obviously, these conditions are satisfied by the natural filtration $\{\mathcal{F}_t^B\}_{t \le T}$.

Definition 1.1. Let $\mu : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma_j : \mathbb{R}^d \to \mathbb{R}^d$ for $j = 1, \dots, m$ be fixed functions. An equation of the form
\[
dX_t = \mu(X_t)\, dt + \sum_{j=1}^m \sigma_j(X_t)\, dB_j(t) , \tag{6.1}
\]
representing the integral equation
\[
X_t = x + \int_0^t \mu(X_s)\, ds + \sum_{j=1}^m \int_0^t \sigma_j(X_s)\, dB_j(s) , \tag{6.2}
\]
where $X_t$ is an unknown process, is a Stochastic Differential Equation (sde) driven by the Brownian motion $\{B_t\}$. The functions $\mu(x)$, $\sigma(x)$ are called the drift and diffusion coefficients, respectively.
It is more compact to introduce the matrix
\[
\sigma(x) = \big( \sigma_1(x) \mid \cdots \mid \sigma_m(x) \big) \in \mathbb{R}^{d \times m} ,
\]
whose columns are the $\sigma_j(x)$, and write
\[
dX_t = \mu(X_t)\, dt + \sigma(X_t)\, dB_t . \tag{6.3}
\]
There are different concepts of solution for an sde. The most natural is that of a strong solution:
Definition 1.2. A stochastic process $\{X_t\}$ is a strong solution to the sde (6.1) driven by the Brownian motion $B_t$ with (possibly random) initial condition $X_0$ if the following holds:
i) $\{X_t\}$ is adapted to $\{\mathcal{F}_t\}$,
ii) $\{X_t\}$ is continuous,
iii) $X_t = X_0 + \int_0^t \mu(X_s)\, ds + \sum_{j=1}^m \int_0^t \sigma_j(X_s)\, dB_j(s)$ almost surely.

Remark 1.3. Often, the choice of Brownian motion in the above definition is implicit. However, it is important to keep in mind that the strong solution of an sde depends on the chosen Brownian motion driving it. A conceptually useful way to restate strong existence, say for all $t \ge 0$, is that there exists a measurable map $\Phi : (t, B) \mapsto X_t(B)$ from $[0,\infty) \times C([0,\infty), \mathbb{R}^m) \to \mathbb{R}^d$ such that $X_t = \Phi(t, B)$ solves (6.2) and $X_t$ is measurable with respect to the filtration generated by $B_t$.
Definition 1.4. We say that a strong solution to (6.1) (driven by a Brownian motion $B_t$) is strongly unique if for any two solutions $X_t, Y_t$ of (6.1) with the same initial condition $X_0$ we have that
\[
P\big[ X_t = Y_t \ \text{for all } t \in [0,T] \big] = 1 .
\]
Remark 1.5. By definition, the strong solution of an sde is continuous. For this reason, to prove strong uniqueness it is sufficient to prove that two solutions $X_t, Y_t$ satisfy
\[
P[X_t = Y_t] = 1 \quad\text{for all } t \in [0,T] .
\]
Indeed, assuming that $X_t$ is a version of $Y_t$, by countable additivity the set $A = \{\omega : X_t(\omega) = Y_t(\omega) \text{ for all } t \in \mathbb{Q}^+\}$ has probability one. By continuity of the sample paths, it follows that $X$ and $Y$ have the same paths for all $\omega \in A$.

2. Examples of SDEs
We now consider a few useful examples of sdes that have a strong solution.

2.1. Geometric Brownian motion. The geometric Brownian motion, or Black–Scholes model in finance, is a stochastic process $X_t$ that solves the sde
\[
dX_t = \mu X_t\, dt + \sigma X_t\, dB_t , \tag{6.4}
\]
where $\mu, \sigma \in \mathbb{R}$ are constants. This model can be used to describe the evolution of the price $X_t$ of a stock, whose mean growth rate and fluctuation size are both assumed to depend linearly on the stock price $X_t$. The coefficients $\mu, \sigma$ are called the percentage drift and percentage volatility, respectively. We see immediately that (6.4) has a solution by Itô's formula: letting $f(x) = \log x$ we have that
\[
d\log X_t = \frac{1}{X_t}(\mu X_t\, dt + \sigma X_t\, dB_t) - \frac{1}{2X_t^2}\,\sigma^2 X_t^2\, dt = \Big( \mu - \frac12 \sigma^2 \Big) dt + \sigma\, dB_t .
\]
Therefore, by integrating and exponentiating, the solution of the equation reads
\[
X_t = X_0 \exp\Big[ \Big( \mu - \frac12 \sigma^2 \Big) t + \sigma B_t \Big] .
\]
The uniqueness of this solution will be proven shortly.
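The closed form above implies $E[X_t] = X_0 e^{\mu t}$, since $E[e^{\sigma B_t}] = e^{\sigma^2 t/2}$. A quick Monte Carlo sanity check (an illustration, with arbitrarily chosen parameters, not from the notes):

```python
import numpy as np

# Sample X_t = X_0 exp((mu - sigma^2/2) t + sigma B_t) and check that the
# empirical mean matches X_0 * exp(mu * t).
rng = np.random.default_rng(4)
mu, sigma, t, x0, paths = 0.3, 0.4, 1.0, 2.0, 500_000
Bt = rng.normal(0.0, np.sqrt(t), size=paths)          # B_t ~ N(0, t)
Xt = x0 * np.exp((mu - sigma**2 / 2) * t + sigma * Bt)
expected = x0 * np.exp(mu * t)
print(float(Xt.mean()), expected)                     # should be close
```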

2.2. Stochastic Exponential. Let $X_t$ be an Itô process with differential $dX_t = \mu_t\, dt + \sigma_t\, dB_t$. We consider the following sde:
\[
dU_t = U_t\, dX_t \tag{6.5}
\]
with initial condition $U_0 = u_0 \in \mathbb{R}$. Note that often one chooses $u_0 = 1$. Since the above sde is analogous to the ode $df = f\, dt$, whose solution is given by the exponential function $f(t) = \exp(t)$, the process $U_t$ solving (6.5) is often called the stochastic exponential, and one writes $U_t = \mathcal{E}(X)_t$. The following result ensures that this process exists and is unique:
Proposition 2.1. The sde (6.5) has a unique strong solution, given by
\[
U_t = \mathcal{E}(X)_t := U_0 \exp\Big[ X_t - X_0 - \frac12 [X]_t \Big] = U_0 \exp\Big[ \int_0^t \Big( \mu_s - \frac12 \sigma_s^2 \Big) ds + \int_0^t \sigma_s\, dB_s \Big] . \tag{6.6}
\]
Proof. If $u_0 = 0$, then it is immediate by (6.5) that $U_t \equiv 0$ for all $t \ge 0$, and it is the only solution. Now suppose $u_0 \ne 0$. We start by proving existence. The proposed solution is clearly adapted, so, defining $V_t = X_t - X_0 - \frac12 [X]_t$, we only have to verify that $e^{V_t}$ satisfies (6.5), i.e., $d(e^{V_t}) = e^{V_t}\, dX_t$. First, using Itô's formula, we get
\[
dV_t = dX_t - \frac12\, d[X]_t = dX_t - \frac12 \sigma_t^2\, dt , \tag{6.7}
\]
\[
d(e^{V_t}) = e^{V_t}\Big( dV_t + \frac12\, d[V]_t \Big) . \tag{6.8}
\]
Note that (6.7) implies $[V]_t = [X]_t$, because the $\int \sigma_t^2\, dt$ part is of finite variation. Therefore $dV_t = dX_t - \frac12\, d[V]_t$. Plugging this back into (6.8), we obtain the desired equality.
We next check that the solution is unique. Suppose we have another solution $\tilde{U}$ that satisfies (6.5). Noticing that $U_t$ is nonzero when $u_0 \ne 0$, we can compute $d(\tilde{U}_t/U_t)$:
\[
d(\tilde{U}_t/U_t) = \tilde{U}_t\, d(1/U_t) + \frac{1}{U_t}\, d\tilde{U}_t + d[\tilde{U}, U^{-1}]_t
= \tilde{U}_t \Big[ -\frac{dU_t}{U_t^2} + \frac{d[U]_t}{U_t^3} \Big] + \frac{\tilde{U}_t\, dX_t}{U_t} + d[\tilde{U}, U^{-1}]_t
= \frac{\tilde{U}_t\, \sigma_t^2\, dt}{U_t} + d[\tilde{U}, U^{-1}]_t .
\]
We need the stochastic differential of $U^{-1}$:
\[
d(U_t^{-1}) = -\frac{dU_t}{U_t^2} + \frac{d[U]_t}{U_t^3} = -\frac{dX_t}{U_t} + \frac{\sigma_t^2\, dt}{U_t} ,
\quad\text{hence}\quad
d[\tilde{U}, U^{-1}]_t = -\sigma_t^2\, \tilde{U}_t\, U_t^{-1}\, dt .
\]
Plugging this back gives $d(\tilde{U}_t/U_t) = 0$. So the ratio of the two solutions stays constant for all $t \ge 0$. Since both solutions start at $u_0$, we conclude $P[\tilde{U}_t = U_t, \forall t \ge 0] = 1$. $\square$
Similarly to the above definition, we introduce the stochastic logarithm $X_t = \mathcal{L}(U)_t$ of a process $U_t$ with stochastic differential $dU_t = \mu'_t\, dt + \sigma'_t\, dB_t$ and $U_t \ne 0$ as the solution to the following sde:
\[
dX_t = \frac{dU_t}{U_t} \quad\text{and}\quad X_0 = 0 . \tag{6.9}
\]
Again, the solution to the above sde exists and is unique.
Proposition 2.2. Under the conditions listed above, the sde (6.9) has a unique solution, given by
\[
\mathcal{L}(U)_t = \log\Big( \frac{U_t}{U_0} \Big) + \frac12 \int_0^t \frac{d[U]_s}{U_s^2} .
\]
Furthermore, as suggested by the framework of stochastic calculus, the above operators are inverse to each other:
Proposition 2.3. If $u_0 = 1$ we have $\mathcal{L}(\mathcal{E}(X))_t = X_t$, and if $U_t \ne 0$ then $\mathcal{E}(\mathcal{L}(U))_t = U_t$.
Proof. It is enough to check that $dX_t = d(\mathcal{E}(X)_t)/\mathcal{E}(X)_t$ and $dU_t = U_t\, d\mathcal{L}(U)_t$. $\square$
2.3. Linear SDEs. Let $\{\alpha_t\}, \{\beta_t\}, \{\gamma_t\}, \{\delta_t\}$ be given (i.e., independent of $X_t$) continuous stochastic processes adapted to the filtration $\{\mathcal{F}_t\}$. We consider the family of sdes given by
\[
dX_t = (\alpha_t + \beta_t X_t)\, dt + (\gamma_t + \delta_t X_t)\, dB_t . \tag{6.10}
\]
We proceed to solve this family of sdes, which includes as special cases some of the examples treated previously in this course. We do so in two steps:
i) First we consider the case where $\alpha_t = \gamma_t \equiv 0$. In this case we should solve the sde
\[
dU_t = \beta_t U_t\, dt + \delta_t U_t\, dB_t ,
\]
which, by Proposition 2.1, choosing $U_0 = 1$ and defining $dY_t = \beta_t\, dt + \delta_t\, dB_t$, has the unique solution
\[
U_t = \mathcal{E}(Y)_t = \exp\Big[ \int_0^t \Big( \beta_s - \frac12 \delta_s^2 \Big) ds + \int_0^t \delta_s\, dB_s \Big] .
\]
ii) We now proceed to consider the full sde (6.10), and make the ansatz of a separable solution, i.e., we assume that $X_t = U_t V_t$ where $dV_t = a_t\, dt + b_t\, dB_t$ for unknown processes $\{a_t\}, \{b_t\}$. Then we compute
\[
dX_t = U_t\, dV_t + V_t\, dU_t + d[U, V](t)
= U_t a_t\, dt + U_t b_t\, dB_t + V_t \beta_t U_t\, dt + V_t \delta_t U_t\, dB_t + b_t \delta_t U_t\, dt
= (a_t U_t + b_t \delta_t U_t + \beta_t X_t)\, dt + (b_t U_t + \delta_t X_t)\, dB_t .
\]
We notice that the above expression coincides with the right-hand side of (6.10) if
\[
a_t = \frac{\alpha_t - \delta_t \gamma_t}{U_t} \quad\text{and}\quad b_t = \frac{\gamma_t}{U_t} .
\]
This uniquely defines the process $V_t$ (whose initial condition is fixed to $V_0 = X_0$ by the fact that $U_0 = 1$) as
\[
V_t = X_0 + \int_0^t \frac{\alpha_s - \delta_s \gamma_s}{U_s}\, ds + \int_0^t \frac{\gamma_s}{U_s}\, dB_s ,
\]
which in turn defines the solution to (6.10) as $X_t = U_t \cdot V_t$.
Example 2.4. Letting $a, b \in \mathbb{R}$, consider the following sde on $t \in (0, T)$:
\[
dX_t = \frac{b - X_t}{T - t}\, dt + dB_t \quad\text{with } X_0 = a . \tag{6.11}
\]
It is clear that this is a linear sde with
\[
\alpha_t = \frac{b}{T - t} , \quad \beta_t = -\frac{1}{T - t} , \quad \gamma_t = 1 , \quad \delta_t = 0 .
\]
Therefore, the solution to (6.11) is given by
\[
X_t = a\Big( 1 - \frac{t}{T} \Big) + b\, \frac{t}{T} + (T - t) \int_0^t \frac{1}{T - s}\, dB_s . \tag{6.12}
\]
Since $\int_0^t (T - s)^{-2}\, ds < \infty$ for all $t < T$, the Itô integral in (6.12) is a martingale. Furthermore, as we have proven in Homework 2, it is a Gaussian process. Hence $X_t$ is also a Gaussian process, with $E[X_t] = a + (b - a)t/T$ and covariance structure
\[
\mathrm{Cov}(X_s, X_t) = (T - t)(T - s)\,\mathrm{Cov}\Big( \int_0^{\min(s,t)} \frac{dB_q}{T - q} ,\; \int_0^{\min(s,t)} \frac{dB_q}{T - q} + \int_{\min(s,t)}^{\max(s,t)} \frac{dB_q}{T - q} \Big)
= (T - t)(T - s)\,\mathrm{Var}\Big( \int_0^{\min(s,t)} \frac{dB_q}{T - q} \Big) = \min(s,t) - \frac{st}{T} ,
\]
where in the second equality we have used that the covariance of independent random variables is 0.
The above expression shows that the variance of the process $X_t$ is 0 at $t = 0$ and $t = T$ and is maximized at $t = T/2$, while the expected value of the process $X_t$ lies on the line interpolating between $a$ and $b$. Hence the name Brownian Bridge: the process above can be seen as a Brownian motion with initial condition $B_0 = a$ conditioned on its final value $B_T = b$. Indeed, one can prove (cf. Klebaner, Example 5.11) that
\[
\lim_{t \to T} (T - t) \int_0^t \frac{1}{T - s}\, dB_s = 0 \quad\text{a.s.}
\]
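The moment formulas can be checked by simulating (6.12) directly. The sketch below (an illustration, not from the notes, with $a = 0$, $b = 1$, $T = 1$) samples $X_{T/2}$ over many paths; the covariance formula predicts $E[X_{T/2}] = (a+b)/2 = 1/2$ and $\mathrm{Var}(X_{T/2}) = T/2 - (T/2)^2/T = 1/4$:

```python
import numpy as np

# Simulate X_{T/2} = a(1 - 1/2) + b/2 + (T - T/2) * int_0^{T/2} dB_s/(T - s)
# over many paths and compare the empirical mean and variance with 1/2 and 1/4.
rng = np.random.default_rng(5)
a, b, T = 0.0, 1.0, 1.0
n, paths = 500, 10_000
dt = (T / 2) / n
s = np.arange(n) * dt                      # left endpoints on [0, T/2)
dB = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
I = (dB / (T - s)).sum(axis=1)             # Ito sums for int_0^{T/2} dB_s / (T - s)
X_half = a * 0.5 + b * 0.5 + (T - 0.5) * I
print(float(X_half.mean()), float(X_half.var()))   # ~0.5 and ~0.25
```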
3. Existence and Uniqueness for SDEs
In this section we prove a theorem giving sufficient conditions on the coefficients µp¨q, σp¨q for
the existence and uniqueness of the solution to the associated sde. We will consider solutions to the
equation
dXt “ µpt, Xt q dt ` σpt, Xt q dBt (6.13)
Note that we have assumed explicit dependence on time t of the drift and diffusion coefficients as
this will allow us to weaken the conditions of the following theorem:
Theorem 3.1. Fix a terminal time $T > 0$. Assume that $\sigma(t,x)$ and $\mu(t,x)$ are globally Lipschitz continuous, i.e., that there is a positive constant $K$ so that for any $t \in [0,T]$ and any $x$ and $y$ we have
\[
|\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le K|x - y| .
\]
Then the sde (6.13) with initial condition $X_0 = x$ has a solution
\[
X_t = x + \int_0^t \mu(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s ,
\]
and this solution is strongly unique. Furthermore, the solution is in $L^2(\Omega \times [0,T])$, i.e.,
\[
E\Big[ \int_0^T X_s^2\, ds \Big] < \infty .
\]
It is clear that in order for the solution $X_t$ to be well defined we need that
\[
\int_0^t |\mu(s, X_s)|\, ds < \infty \quad\text{and}\quad \int_0^t \sigma(s, X_s)^2\, ds < \infty \quad\text{a.s.}
\]
However, the assumptions of Theorem 3.1, while being easier to check, are stricter than the ones above. It is useful to recall that such assumptions are needed even for odes to have existence and uniqueness: the following examples recall how the non-Lipschitz character of the drift can cause existence or uniqueness to fail.
Example 3.2 (Existence). The ode
\[
\frac{dx}{dt} = x^2 \quad\text{with } x(0) = 1
\]
has a drift coefficient $\mu(x) = x^2$ that is not globally Lipschitz continuous (although it is locally Lipschitz continuous), because it grows faster than linearly. This ode has the solution $x(t) = \frac{1}{1 - t}$. However, this solution is only well defined for $t \in [0, 1)$ and diverges as $t \to 1$. In other words, the solution to this ode does not exist beyond $t = 1$.
Example 3.3 (Uniqueness). The ode
\[
\frac{dx}{dt} = 2\sqrt{|x|}
\]
has a drift that is not locally Lipschitz continuous at $x = 0$. The solution of this ode with initial condition $x_0 = 0$ is not unique. Indeed, it is immediate to check that
\[
x_{1,t} = 0 \quad\text{and}\quad x_{2,t} = t^2
\]
are both solutions to this equation with the given initial condition.
3.1. Proof of Theorem 3.1.
Preparatory results. We start by proving two very useful lemmas:
Lemma 3.4 (Gronwall's inequality). Let $y(t)$ be a nonnegative, integrable function on $[0,T]$ such that
\[
y(t) \le A + D \int_0^t y(s)\, ds \tag{6.14}
\]
for nonnegative constants $A, D \in \mathbb{R}$. Then $y(t)$ satisfies
\[
y(t) \le A \exp(Dt) .
\]
Proof of Lemma 3.4. By repeatedly iterating (6.14) we obtain
\[
y(t) \le A + D \int_0^t y(s)\, ds
\le A + D \int_0^t \Big( A + D \int_0^s y(q)\, dq \Big) ds
\le A + ADt + D^2 \int_0^t \int_0^s y(q)\, dq\, ds
\le \dots
\le A + ADt + A\frac{D^2 t^2}{2} + A\frac{D^3 t^3}{3!} + D^4 \int_0^t \int_0^s \int_0^q \int_0^r y(\tau)\, d\tau\, dr\, dq\, ds .
\]
We notice that, repeating the above procedure $k$ times, we obtain the first $k$ terms of the Taylor expansion of $A\exp(Dt)$, plus a remainder term resulting from an integral iterated $k+1$ times. For finite $T$ we can bound such an integral by defining the constants
\[
C := \int_0^T y(s)\, ds < \infty \quad\text{and}\quad G := A + DC ,
\]
so that $y(t) \le G$. Consequently, we can bound the remainder term by $G D^{k+1} t^{k+1} / (k+1)!$, which vanishes in the limit $k \to \infty$, uniformly in $t \in [0,T]$.
Alternative proofs, assuming the existence and uniqueness of solutions to odes, can be found in any good ode or dynamics book, for instance [6] or [5]. $\square$
Lemma 3.5. Let $\{y_n(t)\}$ be a sequence of nonnegative functions satisfying
\[
y_0(t) \le A \quad\text{and}\quad y_{n+1}(t) \le D \int_0^t y_n(s)\, ds < \infty \quad \forall\, n \ge 0 ,\ t \in [0,T] ,
\]
for positive constants $A, D \in \mathbb{R}$. Then $y_n(t) \le A D^n t^n / n!$.
Proof. The proof of this result goes by induction: the base case is trivial, while for the induction step we have
\[
y_{n+1}(t) \le D \int_0^t y_n(s)\, ds \le D \int_0^t A \frac{D^n s^n}{n!}\, ds = A \frac{D^{n+1} t^{n+1}}{(n+1)!} . \qquad \square
\]
Uniqueness for the sde. If $X_1(t)$ and $X_2(t)$ are two solutions, then taking their difference produces
\[
X_1(t) - X_2(t) = \int_0^t \big[ \mu(s, X_1(s)) - \mu(s, X_2(s)) \big]\, ds + \int_0^t \big[ \sigma(s, X_1(s)) - \sigma(s, X_2(s)) \big]\, dB_s .
\]
We now use the fact that
\[
\max\{ (a - b)^2, (a + b)^2 \} \le (a - b)^2 + (a + b)^2 = 2a^2 + 2b^2 ,
\]
which gives
\[
|X_1(t) - X_2(t)|^2 \le 2\Big| \int_0^t \big[ \mu(s, X_1(s)) - \mu(s, X_2(s)) \big]\, ds \Big|^2 + 2\Big| \int_0^t \big[ \sigma(s, X_1(s)) - \sigma(s, X_2(s)) \big]\, dB_s \Big|^2 =: (I) + (II) .
\]
Next recall that Hölder's (or the Cauchy–Schwarz) inequality implies that $\big( \int_0^t f\, ds \big)^2 \le t \int_0^t f^2\, ds$ (apply Hölder to the product $1 \cdot f$ with $p = q = 2$). Hence
\[
E(I) \le 2t\, E \int_0^t \big[ \mu(s, X_1(s)) - \mu(s, X_2(s)) \big]^2\, ds \le 2tK^2 \int_0^t E|X_1(s) - X_2(s)|^2\, ds .
\]
Applying Itô's isometry to the second term gives
\[
E(II) = 2 \int_0^t E\big| \sigma(s, X_1(s)) - \sigma(s, X_2(s)) \big|^2\, ds \le 2K^2 \int_0^t E|X_1(s) - X_2(s)|^2\, ds .
\]
Putting this all together and recalling that $t \in [0,T]$ gives
\[
E|X_1(t) - X_2(t)|^2 \le 2K^2(T + 1) \int_0^t E|X_1(s) - X_2(s)|^2\, ds .
\]
Hence, by Gronwall's inequality (Lemma 3.4, with $A = 0$), we conclude that $E|X_1(t) - X_2(t)|^2 = 0$ for all $t \in [0,T]$. Hence $X_1(t)$ and $X_2(t)$ are identical almost surely.
Existence for the sde. The existence of solutions is proved by a variant of Picard's iteration. Fixing an initial value $x$, we define a sequence of processes $X_n(t)$ as follows; by induction, the processes have continuous paths and are adapted:
\[
X_0(t) = x , \qquad X_{n+1}(t) = x + \int_0^t \mu(s, X_n(s))\, ds + \int_0^t \sigma(s, X_n(s))\, dB_s , \quad n \ge 0 .
\]
Fix $t \ge 0$; we will show that $X_n(t)$ converges in $L^2$, so that there is a random variable $X(t) \in L^2(\Omega, \mathcal{F}, P)$ with $X_n(t) \xrightarrow{L^2} X(t)$. Let $y_n(t) = E\big[ (X_{n+1}(t) - X_n(t))^2 \big]$; we will verify the two conditions of Lemma 3.5. First, for $n = 0$ and any $t \in [0,T]$,
\[
y_0(t) = E\big[ (X_1(t) - X_0(t))^2 \big] \le 2E\Big[ \Big( \int_0^t \mu(s, x)\, ds \Big)^2 \Big] + 2E\Big[ \Big( \int_0^t \sigma(s, x)\, dB_s \Big)^2 \Big]
\le 2E\Big[ \Big( \int_0^t K(1 + |x|)\, ds \Big)^2 \Big] + 2E\Big[ \Big( \int_0^t K(1 + |x|)\, dB_s \Big)^2 \Big] \le C < \infty ,
\]
where the second inequality uses the fact that the coefficients grow no faster than linearly.
Second, a computation similar to the one for uniqueness yields
\[
y_{n+1}(t) \le 2K^2(1 + T) \int_0^t y_n(s)\, ds \quad \forall\, t \in [0,T],\ n = 0, 1, 2, \dots ,
\]
which is finite by induction. Lemma 3.5 implies
\[
y_n(t) = E\big[ (X_{n+1}(t) - X_n(t))^2 \big] \le C\, \frac{\big( 2K^2(1 + T) \big)^n t^n}{n!} ,
\]
which goes to zero uniformly for all $t \in [0,T]$. We thus conclude that $X_n(t)$ converges in $L^2$, uniformly in $t$, and denote its $L^2$-limit by $X(t) \in L^2(\Omega, \mathcal{F}, P)$.
It remains to show that the limit process $X(t)$ solves (6.13). Since $X_n \xrightarrow{L^2} X$, we have
\[
E\big[ \mu(t, X_n(t)) - \mu(t, X(t)) \big]^2 + E\big[ \sigma(t, X_n(t)) - \sigma(t, X(t)) \big]^2 \le 2K^2\, E\big[ (X_n(t) - X(t))^2 \big] \to 0 , \quad\text{uniformly in } t .
\]
By Itô's isometry and Fubini:
\[
E\Big[ \Big( \int_0^t \sigma(s, X_n(s))\, dB_s - \int_0^t \sigma(s, X(s))\, dB_s \Big)^2 \Big] = \int_0^t E\big[ \big( \sigma(s, X_n(s)) - \sigma(s, X(s)) \big)^2 \big]\, ds \xrightarrow{n \to \infty} 0 .
\]
Similarly, by the Cauchy–Schwarz inequality, we have that
\[
E\Big[ \Big( \int_0^t \mu(s, X_n(s))\, ds - \int_0^t \mu(s, X(s))\, ds \Big)^2 \Big] \le t \int_0^t E\big[ \big( \mu(s, X_n(s)) - \mu(s, X(s)) \big)^2 \big]\, ds \xrightarrow{n \to \infty} 0 .
\]
We thus have
\[
X(t) = x + \int_0^t \mu(s, X(s))\, ds + \int_0^t \sigma(s, X(s))\, dB_s ,
\]
i.e., $X(t)$ solves (6.13).
Remark 3.6. Looking through the proof of Theorem 3.1, we see that the assumption of global Lipschitz continuity can be weakened to the following assumptions:
i) $|\mu(t,x)| + |\sigma(t,x)| \le C(1 + |x|)$ (linear growth, needed for existence),
ii) $|\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le C|x - y|$ (Lipschitz continuity, needed for uniqueness).
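The Picard scheme from the existence proof can be demonstrated numerically. The sketch below (an illustration, not part of the notes) runs the iteration $X_{n+1}(t) = x + \int_0^t \mu(s, X_n(s))\, ds + B_t$ for the choice $\mu(t,x) = -x$, $\sigma \equiv 1$ on a single fixed Brownian path; successive iterates collapse onto the discrete fixed point very quickly, reflecting the $t^n/n!$ bound from Lemma 3.5:

```python
import numpy as np

# Picard iteration for dX = -X dt + dB on one fixed Brownian path:
# X_{n+1}(t) = x0 + int_0^t (-X_n(s)) ds + B_t, discretized with left-endpoint sums.
rng = np.random.default_rng(6)
T, n, x0, iters = 1.0, 2_000, 1.0, 30
dt = T / n
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
X = np.full(n + 1, x0)                         # X_0(t) = x0
for _ in range(iters):
    drift = np.concatenate([[0.0], np.cumsum(-X[:-1] * dt)])
    X_new = x0 + drift + B
    gap = float(np.max(np.abs(X_new - X)))     # sup-norm distance between iterates
    X = X_new
print(gap)                                     # essentially zero after 30 iterations
```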

4. Weak solutions to SDEs


Until now we have studied strong solutions to sdes, i.e., solutions for which a Brownian motion (and a probability space) is given in advance and which are constructed from that Brownian motion. If we are only given the coefficients $\mu(x)$ and $\sigma(x)$, without fixing a Brownian motion, we may still be able to construct a weak solution to an sde of the form (6.1). Such solutions allow us to choose a convenient Brownian motion (and consequently a probability space!) for the solution $X_t$ to satisfy the desired sde.
Definition 4.1. A weak solution of the stochastic differential equation (6.1) with initial condition $X_0 \sim \rho_0$, for a given probability distribution $\rho_0$, is a continuous stochastic process $X_t$ defined on some probability space $(\Omega, \mathcal{F}, P)$ such that, for some Brownian motion $B_t$ and some filtration $\{\mathcal{F}_t\}$ with $B_j(t) \in \mathcal{F}_t$ and $B_j(t) - B_j(s)$ independent of $\mathcal{F}_s$ for any $0 \le s < t$, the process $X_t$ is adapted and satisfies the stochastic integral equation (6.2).
In other words, in the case of a weak solution we are free to choose some convenient Brownian motion that allows $X_t$ to be a solution. In this sense, these solutions are also distributional solutions, i.e., solutions that have the "right" marginals.
Because two solutions Xt , Yt may live on different probability spaces, we cannot compare their
paths as in the case of strong solutions. Instead, we weaken the concept of strong uniqueness to the
one of weak uniqueness, i.e., uniqueness in law of the solution process:
Definition 4.2. The weak solution of an sde is said to be weakly unique if any two solutions $X_t, Y_t$ have the same law, i.e., for all $\{t_i \in [0,T]\}$ and $\{A_i \in \mathcal{B}\}$ we have
\[
P\Big[ \bigcap_i \{X_{t_i} \in A_i\} \Big] = P\Big[ \bigcap_i \{Y_{t_i} \in A_i\} \Big] .
\]

Example 4.3. Consider the sde dY_t = dB_t with initial condition Y_0 = 0. This sde clearly has a strong solution, namely Y_t = B_t. If we let W_t be another Brownian motion (possibly defined on another probability space), then W_t will not, in general, be a strong solution to the above sde (if the two probability spaces are different, the two solutions cannot even be compared). It will, however, be a weak solution to the sde, as being a Brownian motion completely determines the marginals of the process.
We will now consider an example for which there exists a weak solution, but not a strong
solution:
Example 4.4 (Tanaka’s sde). For certain µ and σ, solutions to (6.1) may exist for some
Brownian motion and some admissible filtrations but not for others. Consider the sde
dXt “ signpXt qdBt , X0 “ 0; (6.15)
where σ(t,x) = sign(x) is the sign function

sign(x) = +1 if x ≥ 0,   −1 if x < 0.
The function σpxq is not continuous and thus not Lipschitz. A strong solution does not exist for
this sde, with the filtration F “ pFt q chosen to be Ft :“ σpBs , 0 ď s ď tq. Suppose Xt is a strong
solution to Tanaka’s sde, then we must have
F̃t :“ σpXs , 0 ď s ď tq Ď Ft . (6.16)
Notice that for any T ≥ 0, \int_0^T E[\mathrm{sign}(X_s)^2]\,ds < \infty, so the Itô integral \int_0^t \mathrm{sign}(X_s)\,dB_s is well defined and X_t is a martingale. Moreover, the quadratic variation of X_t is

[X]_t = \int_0^t \mathrm{sign}(X_s)^2\,ds = \int_0^t 1\,ds = t,
thus Xt must be a Brownian motion (by Lévy’s characterization, to be proved later). We may
denote Xt “ B̃t to emphasize that it is a Brownian motion. Now multiplying both sides of (6.15) by
signpXt q, we obtain
dBt “ signpB̃t qdB̃t . (6.17)
and thus B_t = \int_0^t \mathrm{sign}(\tilde B_s)\,d\tilde B_s. By Tanaka's formula (to be shown later), we then have
Bt “ |B̃t | ´ L̃t
where \tilde L_t is the local time of \tilde B_t at zero. It follows that B_t is σ(|\tilde B_s|, 0 ≤ s ≤ t)-measurable. This contradicts (6.16), because it would imply that

F_t ⊆ σ(|\tilde B_s|, 0 ≤ s ≤ t) ⊊ σ(\tilde B_s, 0 ≤ s ≤ t) = \tilde F_t ⊆ F_t.
Still, as we have seen above, choosing X_t = \tilde B_t there exists a Brownian motion B_t such that Tanaka's sde holds. Such a pair of Brownian motions forms a weak solution to Tanaka's equation.

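The failure of strong existence does not prevent us from simulating the equation. As a sanity check, the sketch below (a simple Euler–Maruyama discretization, with the convention sign(0) = +1 as in (6.15); step size and sample counts are arbitrary choices) produces a process whose marginal at time T is approximately N(0, T), consistent with Lévy's characterization invoked above.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_tanaka(T=1.0, n_steps=1000, n_paths=5000):
    """Euler-Maruyama for dX = sign(X) dB, X_0 = 0 (convention sign(0) := +1)."""
    dt = T / n_steps
    X = np.zeros(n_paths)
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        sgn = np.where(X >= 0, 1.0, -1.0)   # the sign convention of (6.15)
        X += sgn * dB
    return X

X_T = euler_tanaka()
# Any solution is a Brownian motion, so X_T should look like N(0, T):
print(X_T.mean(), X_T.var())
```

Note that the scheme pins down only the law of the solution, in line with weak uniqueness: different seeds produce different paths, all with the same marginals.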
5. Markov property of Itô diffusions
The solutions (weak or strong) to stochastic differential equations are referred to as diffusion
processes (or Itô diffusions).
Definition 5.1. An Itô process dXt “ µt dt ` σt dBt is an Itô diffusion if µt , σt are measurable
wrt the filtration tFtX u generated by Xt for all t P r0, T s, i.e.,
µt , σt P FtX .
Remark 5.2. It is clear that the solution {X_t} to the sde dX_t = µ(t, X_t) dt + σ(t, X_t) dB_t for continuous functions µ, σ is an Itô diffusion. For this reason, such sdes are said to be of diffusion type.
Recall from Def. 7.13 that a Markov process is a process whose future depends on its past only through its present value; if this property holds also at stopping times, the process is said to have the strong Markov property (cf. Def. 7.17).
Theorem 5.3. The solution tXt u to the sde
dXt “ µpt, Xt q dt ` σpt, Xt q dBt , (6.18)
has the strong Markov property.
While we do not present the proof of this result, which can be found in [14], it should be
intuitively clear why solutions to (6.18) have the Markov property. Indeed, we see that the drift
and diffusion coefficients of the above sde only depend on the time and on the value of Xt at that
time (and not on its past value). This fact, combined with the independence of the increments of
Brownian motion results in the Markov (and the strong Markov) property of such solutions.
CHAPTER 7
PDEs and SDEs: The connection
Throughout this chapter, except when specified otherwise, we let tXt u be a solution to the sde
dXt “ µpt, Xt q dt ` σpt, Xt q dBt . (7.1)
As we have seen in the last chapter, solutions to the above equation are referred to as diffusion
processes. This name comes from the fact that Brownian motion, the archetypal diffusion process,
was invented to model the diffusion of a dust particle in water. Similarly, in the world of partial
differential equations, diffusion equations model precisely the same type of phenomenon. In this
chapter we will see that the correspondence between these two domains goes well beyond this point.
1. Infinitesimal generators
Having seen in the previous chapter that solutions to sdes possess the strong Markov property, we introduce the following operator to study the evolution of their finite-dimensional distributions:
Definition 1.1. The infinitesimal generator for a continuous time Markov process Xt is an
operator A such that for any function f ,
A_t f(x) := \lim_{dt \downarrow 0} \frac{E[f(X_{t+dt}) \mid X_t = x] - f(x)}{dt}, \qquad (7.2)
provided the limit exists. The set of functions for which the above limit exists is called the domain D(A_t) of the generator.
This operator encodes the infinitesimal change in the probability distribution of the process X_t. One way of seeing this is by choosing f(x) = 1_A(x) for a measurable set A ⊆ R^d.
We now look at some examples where we find the explicit form of the generator for Itô diffusions:
Example 1.2. The infinitesimal generator for a standard one-dimensional Brownian motion Bt
is
A = \frac{1}{2}\,\frac{d^2}{dx^2}

for all f that are C^2 with compact support. To derive this, we first apply Itô's formula to any f ∈ C^2 and write
f(B_t) = f(B_0) + \int_0^t \frac{d}{dx} f(B_s)\,dB_s + \int_0^t \frac{1}{2}\frac{d^2}{dx^2} f(B_s)\,ds = f(B_0) + \int_0^t f'(B_s)\,dB_s + \int_0^t \tfrac{1}{2} f''(B_s)\,ds.

Applying this formula to the two time points t and t + r, we have

f(B_{t+r}) = f(B_t) + \int_t^{t+r} f'(B_s)\,dB_s + \int_t^{t+r} \tfrac{1}{2} f''(B_s)\,ds.
When f has compact support, f'(x) is bounded, say |f'(x)| ≤ K, and thus for each t,

\int_0^t E[f'(B_s)^2]\,ds ≤ \int_0^t K^2\,ds = K^2 t < \infty.
Hence the first integral has expectation zero, by Itô's isometry. It follows that

E[f(B_{t+r}) \mid B_t = x] = f(x) + E\Big[\int_t^{t+r} f'(B_s)\,dB_s + \int_t^{t+r} \tfrac12 f''(B_s)\,ds \,\Big|\, B_t = x\Big]
= f(x) + E\Big[\int_t^{t+r} f'(B_s)\,dB_s + \int_t^{t+r} \tfrac12 f''(B_s)\,ds\Big]
= f(x) + E\Big[\int_t^{t+r} \tfrac12 f''(B_s)\,ds\Big],

where the second equality is due to the independence of the post-t process B_{t+s} − B_t from B_t (the conditioning B_t = x is understood inside the remaining expectation).
Subtracting f(x), dividing by r and letting r ↓ 0 on both sides, we obtain

A f(x) = \lim_{r\downarrow 0} \frac{E[f(B_{t+r}) \mid B_t = x] - f(x)}{r} = \lim_{r\downarrow 0} \frac{1}{r}\, E\Big[\int_t^{t+r} \tfrac12 f''(B_s)\,ds\Big]
\overset{(*)}{=} \frac{d}{dr}\bigg|_{r=0} \int_t^{t+r} \tfrac12\, E[f''(B_s)]\,ds = \tfrac12\, E[f''(B_s)]\Big|_{s=t} = \tfrac12 f''(B_t) = \tfrac12 f''(x).

In step (*) we exchanged the expectation and the time integral using the Fubini–Tonelli theorem.
Remark 1.3. Here we omit the subscript t in the generator A because Brownian motion is time-homogeneous, i.e.,

\frac{E[f(B_{t+dt}) \mid B_t = x] - f(x)}{dt} = \frac{E[f(B_{s+dt}) \mid B_s = x] - f(x)}{dt}

and thus A_t f(x) = A_s f(x): the generator A = \tfrac12 \frac{d^2}{dx^2} does not change with time.
The procedure to obtain the infinitesimal generator of Brownian motion can be straightforwardly
generalized to the case of Itô diffusions:
Example 1.4. Assume that Xt satisfies the sde (7.1), then its generator At is
A_t f(x) = \mu(t,x)\,\frac{d}{dx} f(x) + \frac{\sigma^2(t,x)}{2}\,\frac{d^2}{dx^2} f(x) \qquad (7.3)

for all f ∈ C^2 with compact support. The computation is similar to the Brownian motion case. First
apply Itô’s formula to f pXt q and get
ż t`r " ż t`r
σ 2 ps, Xs q d2
*
d d
f pXt`r q “ f pXt q ` µps, Xs q f pXs q ` 2
f pXs q ds ` σps, Xs q f pXs qdBs
t dx 2 dx t dx
Then, using the fact that f ∈ C^2 has compact support, the last integral has expectation zero. Conditioning on X_t = x, computing \frac{E[f(X_{t+r}) \mid X_t = x] - f(x)}{r}, exchanging expectation and integration by Fubini–Tonelli, and taking r ↓ 0, we conclude that the generator has the form (7.3).
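Formula (7.3) can be checked numerically for geometric Brownian motion, where X_h can be sampled exactly (a sketch; for f(x) = x² one has A_t f(x) = (2µ + σ²)x², and all parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma, x, h, n = 0.1, 0.3, 1.0, 1e-2, 10**6
# Exact sampling of dX = mu X dt + sigma X dB at time h, started at x
Z = rng.normal(size=n)
X_h = x * np.exp((mu - 0.5 * sigma**2) * h + sigma * np.sqrt(h) * Z)

f = lambda y: y**2
est = (f(X_h).mean() - f(x)) / h
exact = (2 * mu + sigma**2) * x**2   # A f(x) = mu*x*2x + (sigma^2 x^2/2)*2
print(est, exact)
```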
The above example can be further generalized to the case when the function f also depends on time:
Example 1.5. Consider the two dimensional process pt, Xt q, where the first coordinate is
deterministic and the second coordinate Xt satisfies (7.1). We treat it as a process Yt “ pt, Xt q P R2 .
In this case, the generator of Y_t, according to the definition in (7.2), is given by

A_t f(t,x) := \lim_{dt \downarrow 0} \frac{E[f(Y_{t+dt}) \mid Y_t = (t,x)] - f(t,x)}{dt} = \mu(t,x)\,\frac{\partial}{\partial x} f(t,x) + \frac{\sigma^2(t,x)}{2}\,\frac{\partial^2}{\partial x^2} f(t,x) + \frac{\partial}{\partial t} f(t,x) \qquad (7.4)

for any f ∈ C^{1,2} that has compact support.
Formally speaking, if At is the generator of Xt , what At does to f is to map it to the “drift
coefficient” in the stochastic differential of f pXt q, i.e.,
df pXt q “ At f pXt q ¨ dt ` (something) ¨ dBt
Remark 1.6. The notation here is slightly different from [Klebaner], where Klebaner always
uses Lt to denote the operator on functions f P C 1,2 so that
L_t f(t,x) = \mu(t,x)\,\frac{\partial}{\partial x} f(t,x) + \frac{\sigma^2(t,x)}{2}\,\frac{\partial^2}{\partial x^2} f(t,x) \qquad (7.5)

and calls such L_t the “generator of X_t”. Comparing this form with (7.3) and (7.4), and since L_t acts on C^{1,2} functions, we can relate L_t to the generator A_t of (t, X_t), i.e.,

L_t f(t,x) + \frac{\partial}{\partial t} f(t,x) = A_t f(t,x).
When we look at martingales constructed from the generators, At will give a more compact (and
maybe more intuitive) notation.
Exercise. Find the generator A_t of (X_t, Y_t), where X_t and Y_t satisfy

dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dB_t,
dY_t = \alpha(t, Y_t)\,dt + \beta(t, Y_t)\,dB_t.

What if X_t and Y_t are driven by two independent Brownian motions?
2. Martingales associated with diffusion processes
Suppose X_t solves (7.1) and A_t is its generator (see (7.3)). For f ∈ C^2, we know from Itô's formula that

f(X_t) = f(X_0) + \int_0^t A_s f(X_s)\,ds + \int_0^t \sigma(s, X_s) f'(X_s)\,dB_s.
Under proper conditions, the third term on the right is a well-defined Itô integral and also a
martingale. We can construct martingales by isolating this integral, i.e., let
M_t := f(X_t) - f(X_0) - \int_0^t A_s f(X_s)\,ds \;\Big({}= \int_0^t \sigma(s, X_s) f'(X_s)\,dB_s\Big), \qquad (7.6)
and we will see that for certain µ, σ and f functions, Mt will be a martingale. First of all, if Xt is a
solution to (7.1) (either weak or strong), then, by definition, Mt is always Gt :“ σpXs , 0 ď s ď tq
measurable. So from now on, for the purpose of constructing martingales, we will only say “Xt
solves the sde (7.1)” without specifying whether Xt is a strong or a weak solution. Recall that if
\int_0^t E\big[\sigma^2(s, X_s)\, f'(X_s)^2\big]\,ds < \infty, \qquad (7.7)
then the Itô integral \int_0^t \sigma(s, X_s) f'(X_s)\,dB_s is a martingale. Therefore, the usual technical step is to prove (7.7) in order to conclude that M_t is a martingale.
Theorem 6.2 in [Klebaner] gives a set of conditions for M_t to be a martingale:
Condition 1. Let the following assumptions hold:
i) µpt, xq and σpt, xq are locally Lipschitz in x with a Lipschitz constant independent of t and
are growing at most linearly in x; and
ii) f P C 2 and f 1 is bounded,
Condition (i) implies that (7.1) has a strong solution but, more importantly, it controls the speed of growth of the solution X_t (see Theorem 5.4 and also the proof of Theorem 6.2 in [Klebaner] for more details); (ii) controls the magnitude of f, which together with (i) ensures the finiteness of the integral in (7.7). [9, Theorem 6.3] gives an alternative set of conditions to Condition 1; the proof, however, follows the same idea. The above result can be summarized in the following theorem:
Theorem 2.1. Let tXt u be a solution to (7.1), f a function such that Condition 1 holds, then
Mt defined in (7.6) is a martingale.
We now generalize the above result to the case when f is time-dependent. Let Xt solve (7.1). If
At is the generator of the two dimensional process pt, Xt q (see the expression in (7.4)), then for any
function f pt, xq P C 1,2
M_t := f(t, X_t) - f(0, X_0) - \int_0^t A_s f(s, X_s)\,ds \qquad (7.8)
can be a martingale if µ, σ and f satisfy certain conditions. Again, using Itô’s formula, we see that
M_t = \int_0^t \sigma(s, X_s)\,\frac{\partial}{\partial x} f(s, X_s)\,dB_s.
The approach to show that Mt is a martingale is the same as above. For example, if Condition 1 (ii)
above is modified to
Condition 2. Let the following assumptions hold:
i) µpt, xq and σpt, xq are locally Lipschitz in x with a Lipschitz constant independent of t and
are growing at most linearly in x; and
ii)' f ∈ C^{1,2} and \frac{\partial}{\partial x} f(t,x) is bounded for all t and x.
Then we can conclude that M_t defined in (7.8) is a martingale:
Theorem 2.2. Let tXt u be a solution to (7.1), f a function such that Condition 2 holds, then
Mt defined in (7.8) is a martingale.
One advantage of using A_t instead of L_t is that we can express M_t in the same form, that is, M_t := f(X_t) - f(X_0) - \int_0^t A_s f(X_s)\,ds, provided A_t is chosen to be the generator of X_t (which might
be high-dimensional). The following are a few immediate consequences, stated under Condition 2.
However, one should keep in mind that there are other conditions, under which these claims are
also true.
Corollary 2.3 (Dynkin’s formula). Suppose that Xt solves (7.1) and that Condition 2 holds.
Let At be the generator of pt, Xt q (see (7.4)). Then for any t P r0, T s,
E[f(t, X_t)] = f(0, X_0) + E\Big[\int_0^t A_s f(s, X_s)\,ds\Big].
The result is also true if t is replaced by a bounded stopping time τ P r0, T s.
Corollary 2.4. Assume that Xt solves (7.1) and that Condition 2 holds. If f solves the
following pde
\big(A_t f(t,x) =\big)\ \mu(t,x)\,\frac{\partial}{\partial x} f(t,x) + \frac{\sigma^2(t,x)}{2}\,\frac{\partial^2}{\partial x^2} f(t,x) + \frac{\partial}{\partial t} f(t,x) \equiv 0,
then f pt, Xt q is a martingale.
Example 2.5. Consider X_t = B_t; then σ ≡ 1 and µ ≡ 0, which satisfies Condition 2 (i). Then, for any f(t,x) that satisfies Condition 2 (ii') (or the conditions in Theorem 6.3 of Klebaner) and solves

\frac{1}{2}\,\frac{\partial^2}{\partial x^2} f(t,x) + \frac{\partial}{\partial t} f(t,x) \equiv 0,

f(t, B_t) is a martingale. For example, f(t,x) = x,\; x^2 - t,\; x^3 - 3tx,\; e^{t/2}\sin(x), or e^{x - t/2}.

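A quick Monte Carlo sanity check of this example (a sketch; sample sizes and time points are arbitrary): sampling B_t ∼ N(0, t) directly, the empirical means of f(t, B_t) for f = x³ − 3tx and f = e^{x−t/2} should stay at their initial values f(0,0) = 0 and f(0,0) = 1 for every t.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10**6
poly_means, exp_means = [], []
for t in (0.5, 1.0, 2.0):
    B = rng.normal(0.0, np.sqrt(t), size=n)          # B_t ~ N(0, t)
    poly_means.append((B**3 - 3 * t * B).mean())     # f = x^3 - 3tx
    exp_means.append(np.exp(B - t / 2).mean())       # f = e^{x - t/2}
print(poly_means, exp_means)   # constant in t: near 0 and near 1
```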
3. Connection with PDEs
In the previous section we have seen that, under proper conditions, the solution f of some pde can be used to construct a martingale. In this section we will see, conversely, that the solutions of certain pdes may be represented as expectations of functionals of the solution of an sde.
Throughout this section we will assume that Xt solves the sde (7.1) whose coefficients satisfy
Condition 2 (i), and At is the generator of pt, Xt q as given in (7.4). Furthermore, we assume that
f satisfies Condition 2 (ii’). Note that other conditions, under which Mt defined in (7.8) is a
martingale, would also work.
3.1. Kolmogorov Backwards Equation.
Theorem 3.1. Under the standing assumptions, suppose f(t,x) solves the pde

A_t f = 0 \text{ for all } t ∈ (0,T), \qquad f(T,x) = g(x), \qquad (7.9)

for some function g such that E[|g(X_T)|] < ∞. Then

f(t,x) = E[g(X_T) \mid X_t = x], \quad \text{for all } t ∈ [0,T].
Proof. Under the standing assumptions and the fact that At f “ 0, we know that f pt, Xt q is a
martingale, due to Corollary 2.4. Then for any t P r0, T s,
Erf pT, XT q|Ft s “ f pt, Xt q
Using the boundary condition, we have f pT, XT q “ gpXT q. The result then follows from the Markov
property of the solution to the diffusion-type sde, i.e.,
f pt, Xt q “ ErgpXT q|Ft s “ ErgpXT q|Xt s.

Remark 3.2. Note that the above theorem, assuming that f pt, xq solves the given pde, represents
expectation values of the process Xt in terms of such solutions. Under suitable regularity conditions
on the coefficients of the sde (7.1) and on the boundary condition gpxq one can show that such
expected value is the unique solution to the pde (7.9). These results, however, go beyond the scope
of this course and will not be presented here. We refer the interested reader to, e.g., [14].
Definition 3.3. For an Itô diffusion {X_t} solving (7.1), the pde (7.9) is called the Kolmogorov Backwards equation.
The name of the above pde is due to the fact that it has to be solved backwards in time, i.e., the
boundary condition in (7.9) is fixed at the end of the time interval of interest. This may seem at
first counterintuitive. One possible way to interpret this fact is that bringing the time derivative on
the other side of the equality we obtain ´Bt f “ Lt f , where Lt is the generator defined in (7.5). In
this form, the “arrow of time” is given by the fact that σ 2 has nonnegative values and corresponds
to the second derivative “widening the support” of f , and the negative sign in front of the time
derivative corresponds to an evolution in the “reverse” direction. Another way of understanding why the direction of time is reversed in (7.9) is the following: in order to establish an expectation of a certain function in the future (e.g., the value of an option at expiration), an operator that evolves such a function should project it backwards to the information we have at the moment, i.e., the value of X_t.
Example 3.4. Letting gpxq “ 1A pxq, we have that being able to solve (7.9) is equivalent to
knowing
E r1A pXT q|Xt “ xs “ P rXT P A|Xt “ xs .
Example 3.5. Letting Xt be the solution to the Black Scholes model dXt “ µXt dt ` σXt dBt
and gpxq “ V pxq some value function of an option at time T , then being able to solve (7.9) is
equivalent to knowing the expected value of that option at expiration: E rV pXT q|Xt “ xs.
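As a numerical illustration of the representation in Theorem 3.1 (a sketch; the choices X_t = B_t, g(x) = x² and all parameter values are arbitrary): for Brownian motion and g(x) = x², the solution of (7.9) is f(t,x) = x² + (T − t), which we can compare with a Monte Carlo estimate of E[g(X_T) | X_t = x].

```python
import numpy as np

rng = np.random.default_rng(4)

T, t, x, n = 1.0, 0.3, 0.7, 10**6
g = lambda y: y**2
# Sample X_T given X_t = x when X is a Brownian motion
X_T = x + rng.normal(0.0, np.sqrt(T - t), size=n)
mc = g(X_T).mean()
exact = x**2 + (T - t)   # solution f(t, x) of (7.9) with g(x) = x^2
print(mc, exact)
```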
We now state an extension of Theorem 3.1 which deals with the case where the right side of the
pde is nonzero.
Theorem 3.6. Under the standing assumptions, suppose f(t,x) solves the pde

A_t f(t,x) = -φ(x) \text{ for all } t ∈ (0,T), \qquad f(T,x) = g(x),

for some bounded function φ : R → R and some g such that E[|g(X_T)|] < ∞. Then

f(t,x) = E\Big[\, g(X_T) + \int_t^T φ(X_s)\,ds \,\Big|\, X_t = x \Big], \quad \text{for all } t ∈ [0,T].
Proof. By Theorem 2.2 we have that

M_t := f(t, X_t) - f(0, X_0) - \int_0^t A_s f(s, X_s)\,ds

is a martingale. Plugging in A_t f(t,x) = -φ(x) and taking the conditional expectation of M_T given F_t, since M_t = E[M_T \mid F_t] we get

f(t, X_t) - f(0, X_0) - \int_0^t A_s f(s, X_s)\,ds = E[f(T, X_T) \mid F_t] - f(0, X_0) - E\Big[\int_0^T A_s f(s, X_s)\,ds \,\Big|\, F_t\Big],

which can be rewritten as

f(t, X_t) = E\Big[\, g(X_T) + \int_t^T φ(X_s)\,ds \,\Big|\, F_t \Big].

Finally, by the Markov property of Itô diffusions, we obtain

f(t, x) = E\Big[\, g(X_T) + \int_t^T φ(X_s)\,ds \,\Big|\, X_t = x \Big].

3.2. Feynman-Kac formula. Theorem 3.1 can be generalized even further:
Theorem 3.7 (Feynman–Kac Formula). Under the standing assumptions, if f(t,x) solves

A_t f(t,x) = r(t,x)\, f(t,x) \text{ for all } t ∈ [0,T], \qquad f(T,x) = g(x), \qquad (7.10)

where r(t,x) and g(x) are some bounded functions, then

f(t,x) = E\Big[\, e^{-\int_t^T r(s, X_s)\,ds}\, g(X_T) \,\Big|\, X_t = x \Big].
Furthermore, f pt, xq above is the unique solution to (7.10).
Proof. The proof of uniqueness of the solution goes beyond the scope of this lecture and we do not give it here. As in the previous cases, we want to show that the quantity inside the expectation is a martingale. Therefore, consider

M_τ := e^{-\int_t^τ r(s, X_s)\,ds}\, f(τ, X_τ).

Defining U_τ = e^{-\int_t^τ r(s, X_s)\,ds} and Y_τ = f(τ, X_τ), we apply Itô's formula to obtain

dM_τ = d(U_τ Y_τ) = U_τ\,dY_τ + Y_τ\,dU_τ + d[U, Y]_τ.

Recall from the chapter on the stochastic exponential that U_τ is the stochastic exponential of -\int_t^τ r(s, X_s)\,ds, and that therefore dU_τ = -r(τ, X_τ)\, U_τ\,dτ. Furthermore, we recognize that U_τ has finite variation, so d[U, Y]_τ = 0. Combining these observations we obtain, by Itô's formula for f,

dM_τ = U_τ\Big( \big[\partial_τ f(τ, X_τ) + \mu(τ, X_τ)\,\partial_x f(τ, X_τ) + \tfrac12 \sigma^2(τ, X_τ)\,\partial^2_{xx} f(τ, X_τ)\big]\,dτ + \sigma(τ, X_τ)\,\partial_x f(τ, X_τ)\,dB_τ - r(τ, X_τ)\, f(τ, X_τ)\,dτ \Big)
= U_τ\Big( \big(A_τ - r(τ, X_τ)\big) f(τ, X_τ)\,dτ + \sigma(τ, X_τ)\,\partial_x f(τ, X_τ)\,dB_τ \Big).

We immediately see that the drift term in the above formula vanishes by assumption (7.10), and that the Itô integral term is a martingale by the standing assumptions on f, σ and µ. Consequently, the expected value of the martingale is constant and we have that

f(t,x)\, e^{0} = M_t = E[M_T \mid F_t] = E\Big[\, e^{-\int_t^T r(s, X_s)\,ds}\, g(X_T) \,\Big|\, X_t = x \Big].


Example 3.8 (Example 3.5 continued). Let us consider the Black–Scholes model, i.e., dX_t = µX_t\,dt + σX_t\,dB_t for σ, µ ∈ R. We consider the case where one can instead cash in the option and invest the proceeds at the risk-free interest rate, i.e., in a bank account R_t satisfying the ode

dR_t = r R_t\,dt,

for a positive constant r ∈ R. Then one needs to factor this possible risk-free earning into the value V(t, X_t) of the option on the underlying X_t, i.e., compare the expected value at the future time T, V(X_T) = V^*(X_T), with the projected risk-free value today:

e^{r(T-t)}\, V(t, X_t) = E[V^*(X_T) \mid X_t = x],

or, in other words,

V(t, X_t) = E\big[\, e^{-r(T-t)}\, V^*(X_T) \mid X_t = x \big].

The above is an example of the expected value in Theorem 3.7, and therefore obeys the pde

\partial_t V(t,x) + µx\,\partial_x V(t,x) + \tfrac12 σ^2 x^2\,\partial^2_{xx} V(t,x) - r\,V(t,x) = 0 \text{ for all } t ∈ [0,T], \qquad V(T,x) = V^*(x),

which is called the Black–Scholes equation.
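A hedged numerical illustration of Theorem 3.7 in this setting: we take the drift equal to r (the risk-neutral case, an assumption made here so that a closed-form benchmark exists; the pricing-measure discussion is deferred to Girsanov's theorem) and a call payoff V*(x) = max(x − K, 0), and compare the Monte Carlo estimate of the discounted expectation with the classical Black–Scholes formula. All parameter values are arbitrary.

```python
import numpy as np
from math import log, sqrt, exp, erf

rng = np.random.default_rng(5)

def norm_cdf(d):
    return 0.5 * (1.0 + erf(d / sqrt(2.0)))

def bs_call(x, K, r, sigma, tau):
    """Closed-form Black-Scholes call price with time-to-expiry tau."""
    d1 = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return x * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

x, K, r, sigma, T, t, n = 100.0, 105.0, 0.05, 0.2, 1.0, 0.0, 10**6
tau = T - t
# Feynman-Kac: V(t,x) = E[e^{-r tau} V*(X_T) | X_t = x], drift mu = r
Z = rng.normal(size=n)
X_T = x * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * Z)
mc = exp(-r * tau) * np.maximum(X_T - K, 0.0).mean()
print(mc, bs_call(x, K, r, sigma, tau))
```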

4. Time-homogeneous Diffusions
In this section we now consider a class of diffusion processes whose drift and diffusion coefficients
do not depend explicitly on time:
Definition 4.1. If Xt solves
dXt “ µpXt q dt ` σpXt q dBt (7.11)
then Xt is a time-homogeneous Itô diffusion process.
Intuitively, the evolution of such processes does not depend on the time at which the process is
started, i.e., P rXt P A|X0 “ x0 s “ P rXt`s P A|Xs “ x0 s for all A P BpRq, s P T s.t. t ` s P T . In
other words, their evolution is invariant wrt translations in time, whence the name time-homogeneous.
Definition 4.2. Given a sde with a unique solution we define the associated Markov semigroup
Pt by
pPt φqpxq “ Ex φpXt q
To see that this definition satisfies the semigroup property observe that the Markov property
states that
pPt`s φqpxq “ Ex φpXt`s q “ Ex EXs φpXt q “ Ex pPt φqpXs q “ pPs Pt φqpxq .
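The semigroup identity can be checked numerically for Brownian motion (a sketch; φ = cos is chosen because P_tφ is known in closed form, E[cos(x + B_t)] = e^{−t/2} cos x, and the values of s, t, x are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

s, t, x, n = 0.4, 0.7, 0.3, 10**6
# For Brownian motion, (P_t phi)(y) = E[cos(y + B_t)] = exp(-t/2) cos(y)
Pt_phi = lambda y: np.exp(-t / 2) * np.cos(y)
# Apply P_s by Monte Carlo to the (exact) function P_t phi
Y = x + rng.normal(0.0, np.sqrt(s), size=n)
lhs = Pt_phi(Y).mean()                      # (P_s P_t phi)(x)
rhs = np.exp(-(s + t) / 2) * np.cos(x)      # (P_{s+t} phi)(x)
print(lhs, rhs)
```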
Note that for the class of processes introduced above the infinitesimal generator is time-independent,
i.e., we have At f “ Af . As a further consequence of the translation invariance (in time) of the sde
(7.11), the fact that the final condition of the backward Kolmogorov equation is at a specific time T
is not relevant in this framework. This enables us to “store” the time-reversal in the function itself
and look at the backward Kolmogorov equation as a forward equation as we explain below.
Let f_−(x,t) be a bounded C^{2,1} function satisfying

\frac{\partial f_-}{\partial t} = L f_-, \qquad f_-(x, 0) = g(x), \qquad (7.12)
where L is the generator defined in (7.5). For simplicity we also assume that g is bounded and
continuous. Then we have the analogous result to Theorem 3.1
Theorem 4.3. Under the above assumptions, M_t := f_-(X_t, T - t) is a martingale for t ∈ [0,T).
Proof. The proof is identical to the Brownian case. We start by applying Itô's formula:

f_-(X_s, T-s) - f_-(X_0, T) = \int_0^s \Big[ -\frac{\partial f_-}{\partial t}(X_γ, T-γ) + L f_-(X_γ, T-γ) \Big]\,dγ + \int_0^s \frac{\partial f_-}{\partial x}(X_γ, T-γ)\,dB_γ.

As before, the integrand of the first integral is identically zero because \frac{\partial f_-}{\partial t} = L f_-. Hence only the stochastic integral is left on the right-hand side. 
And as before we have
Corollary 4.4. In the above setting,

f_-(x, t) = E[g(X_t) \mid X_0 = x].

The restriction to bounded and continuous g is not needed.

Proof of Cor. 4.4. Since s ↦ f_-(X_s, T-s) - f_-(X_0, T) is a martingale starting from 0 at s = 0, it has constant (zero) expectation, so

E[f_-(X_T, 0) \mid X_0 = x] = E[f_-(X_0, T) \mid X_0 = x].

Since E[f_-(X_0, T) \mid X_0 = x] = f_-(x, T) and E[f_-(X_T, 0) \mid X_0 = x] = E[g(X_T) \mid X_0 = x], the proof is complete (take T = t). 
For a more detailed discussion of Poisson and Dirichlet problems we refer to [14].
5. Stochastic Characteristics
To better understand Theorem 4.3 and Corollary 4.4, we begin by considering the deterministic case
\frac{\partial f_-}{\partial t} = (b \cdot \nabla) f_-, \qquad f_-(x, 0) = f(x). \qquad (7.13)
We want to make an analogy between the method of characteristics used to solve (7.13) and the results in Theorem 4.3 and Corollary 4.4. The method of characteristics is a way of solving (7.13) which, in this simple setting, amounts to finding a collection of curves (“characteristic curves”) along which the solution is constant. Let us call these curves ξ(t), where t is the parametrizing variable. Mathematically, we want f_-(ξ(t), T-t) to be constant, independent of t ∈ [0,T], for some fixed T; the constant then depends only on the choice of ξ(t). We will look for curves ξ(t) = (ξ_1(t), \dots, ξ_d(t)) which solve an ODE, so that we can parametrize them by their initial condition ξ(0) = x. It may seem odd (and unneeded) to introduce the final time T. This is done so that f_-(x, T) = f_-(ξ(0), T), and to keep the analogy close to what is traditionally done
in sdes. Differentiating f_-(ξ(t), T-t) with respect to t, we see that remaining constant amounts to

\sum_{i=1}^d \frac{\partial f_-}{\partial x_i}(ξ(t), T-t)\,\frac{dξ_i}{dt}(t) = \frac{\partial f_-}{\partial t}(ξ(t), T-t) = \sum_{i=1}^d b_i(ξ(t))\,\frac{\partial f_-}{\partial x_i}(ξ(t), T-t),

where the last equality follows from (7.13). We conclude that for this equality to hold in general we need

\frac{dξ}{dt} = b(ξ(t)) \quad \text{and} \quad ξ(0) = x.
Since f_-(ξ(t), T-t) is constant, we have

f_-(ξ(0), T) = f_-(ξ(T), 0) \implies f_-(x, T) = f(ξ(T)), \qquad (7.14)

which provides a solution to (7.13) at all points which can be reached by the curves ξ(T). Under mild assumptions this is all of R^d.
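The deterministic picture can be verified directly (a sketch with the arbitrary choice b(x) = x, for which the characteristic through x is ξ(t) = x e^t):

```python
import math

def characteristic(x, b, T=1.0, n_steps=100000):
    """Forward-Euler integration of d(xi)/dt = b(xi), xi(0) = x."""
    xi, dt = x, T / n_steps
    for _ in range(n_steps):
        xi += b(xi) * dt
    return xi

b = lambda y: y                       # example drift b(x) = x
x, T = 0.5, 1.0
xi_T = characteristic(x, b, T)
print(xi_T, x * math.exp(T))          # exact characteristic: xi(T) = x e^T
# By (7.14), f_-(x, T) = f(xi(T)) then solves the transport equation (7.13).
```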
Looking back at Theorem 4.3, we notice that, differently from the ode case, we did not find an sde X_t which keeps f_-(X_t, T-t) constant in the fully fledged sense. However, we have obtained something very close to it: we showed that t ↦ f_-(X_t, T-t) is a martingale, i.e., a process that is constant on average! This is the content of Theorem 4.3 and Corollary 4.4 (putting the accent on the expectation part of the result), which mimics the result of (7.14), only with the addition of expected values. Hence we might be provoked to make the following fanciful statement.

Stochastic differential equations are the method of characteristics for diffusions. Rather than follow a single characteristic back to the initial value to find the current value, we trace an infinite collection of stochastic curves, each back to its own initial value, which we then average, weighting with the probability of the curve.
6. A fundamental example: Brownian motion and the Heat Equation
We now consider the simple but fundamental case of standard Brownian motion.
Let us consider a compact subset D ⊂ R^2 with a smooth boundary ∂D and a function f(x) defined on ∂D.

The Dirichlet problem: We are looking for a function u such that

\Delta u = \frac{\partial^2 u}{\partial y_1^2} + \frac{\partial^2 u}{\partial y_2^2} = 0 \quad \text{for } y = (y_1, y_2) \text{ inside } D,
\lim_{y \to x} u(y) = f(x) \quad \text{for all } x ∈ ∂D.
Let Bpt, ωq “ pB1 pt, ωq, B2 pt, ωqq be a two dimensional Brownian motion. Define the stopping time
τ “ inftt ą 0 : Bptq R Du
Let E_y denote expectation with respect to the Wiener measure for a Brownian motion starting from y at time t = 0. Let us define φ(y) = E_y f(B(τ)). We are going to show that φ solves the Dirichlet problem.
Lemma 6.1. With probability 1, τ < ∞. In fact, E[τ^r] < ∞ for all r > 0.
Proof.

P\{τ ≥ n\} ≤ P\{|B(1)-B(0)| ≤ \mathrm{diam}\,D,\ |B(2)-B(1)| ≤ \mathrm{diam}\,D,\ \dots,\ |B(n)-B(n-1)| ≤ \mathrm{diam}\,D\} = \prod_{k=1}^n P\{|B(k)-B(k-1)| ≤ \mathrm{diam}\,D\} = α^n,

where α ∈ (0,1) and the product form follows from the independence of the increments. Hence \sum_{n=1}^\infty P\{τ ≥ n\} < ∞ and the Borel–Cantelli lemma says that τ is almost surely finite. Now let us look at the moments:

E[τ^r] = \int_0^\infty x^r\, P\{τ ∈ dx\} ≤ \sum_{n=1}^\infty n^r\, P\{τ ∈ (n-1, n]\} ≤ \sum_{n=1}^\infty n^r\, P\{τ ≥ n-1\} ≤ \sum_{n=1}^\infty n^r α^{n-1} < ∞.


Let us fix a point y in the interior of D and put a small circle of radius ρ around y so that the circle is contained completely in D. Let τ_{ρ,y} be the first time B(t) hits the circle of radius ρ centered at y. Because the law of Brownian motion is invariant under rotations, we see that B(τ_{ρ,y}) is distributed uniformly on the circle centered at y (let us call this circle S_ρ(y)).
Theorem 6.2. φ solves the Laplace equation in the interior of D.
Proof. i) We start by proving the mean value property. To do so we invoke the strong Markov property of B(t). Let τ_S = \inf\{t : B(t) ∈ S_ρ(y)\} and z_ϑ = y + (ρ\cos ϑ, ρ\sin ϑ) be the point on S_ρ(y) at angle ϑ. We notice that any path from y to the boundary of D must pass through S_ρ(y). Thus we can think of φ(y) as the weighted average of E[f(B(τ)) \mid B(τ_S) = z_ϑ] as ϑ moves us around the circle S_ρ(y). Each entry in this average is weighted by the chance of hitting that point on the circle starting from y. Since this chance is uniform (all points are equally likely), we simply get the factor \frac{1}{2π} that normalizes things to a probability measure:

φ(y) = \frac{1}{2π} \int_0^{2π} E\{f(B(τ)) \mid B(τ_S) = z_ϑ\}\,dϑ = \frac{1}{2π} \int_0^{2π} φ(z_ϑ)\,dϑ. \qquad (7.15)
ii) φ is infinitely differentiable. This can be shown without much difficulty, but let us just assume it, since we are doing this explicit calculation to improve our understanding, not to prove every detail of the theorems.
iii) Now we see that φ satisfies \Delta φ = \frac{\partial^2 φ}{\partial y_1^2} + \frac{\partial^2 φ}{\partial y_2^2} = 0. We expand about a point y in the interior of D:

φ(z) = φ(y) + (z_1 - y_1)\frac{\partial φ}{\partial y_1} + (z_2 - y_2)\frac{\partial φ}{\partial y_2} + \frac12\Big[ (z_1-y_1)^2 \frac{\partial^2 φ}{\partial y_1^2} + (z_2-y_2)^2 \frac{\partial^2 φ}{\partial y_2^2} + 2(z_1-y_1)(z_2-y_2)\frac{\partial^2 φ}{\partial y_1 \partial y_2} \Big] + O(|z-y|^3).
Now we integrate this over a circle S_ρ(y) centered at y of radius ρ. We take ρ sufficiently small so that the entire disk lies in the domain D. By direct calculation we have

\int_{S_ρ(y)} (z_1-y_1)\,dz = 0, \quad \int_{S_ρ(y)} (z_2-y_2)\,dz = 0, \quad \int_{S_ρ(y)} (z_1-y_1)(z_2-y_2)\,dz = 0
and
\int_{S_ρ(y)} (z_1-y_1)^2\,dz = (\mathrm{const})\,ρ^2, \qquad \int_{S_ρ(y)} (z_2-y_2)^2\,dz = (\mathrm{const})\,ρ^2.
Since by the mean value property

φ(y) = (\mathrm{const}) \int_{S_ρ(y)} φ(z)\,dz,

we see that

0 = (\mathrm{const})\,ρ^2 \Big( \frac{\partial^2 φ}{\partial y_1^2} + \frac{\partial^2 φ}{\partial y_2^2} \Big) + O(ρ^3).

Dividing by ρ^2 and letting ρ → 0, we conclude

\Delta φ = \frac{\partial^2 φ}{\partial y_1^2} + \frac{\partial^2 φ}{\partial y_2^2} = 0.


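The uniform exit law (7.15) suggests a simple numerical scheme, often called walk-on-spheres, for estimating φ(y) = E_y f(B(τ)) without simulating whole Brownian paths: repeatedly jump to a uniform point on the largest circle around the current position contained in D. The sketch below (for the unit disk, with the harmonic boundary data f(x₁, x₂) = x₁, so that φ(y) = y₁; the tolerance and sample sizes are arbitrary) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(7)

def walk_on_spheres(y, f_boundary, eps=1e-3, n_paths=20000):
    """Estimate E_y[f(B_tau)] on the unit disk via walk-on-spheres.

    Each jump lands uniformly on the largest circle around the current
    position contained in the disk, which is legitimate by the uniform
    exit law used in (7.15)."""
    total = 0.0
    for _ in range(n_paths):
        p = np.array(y, dtype=float)
        while True:
            rho = 1.0 - np.linalg.norm(p)   # distance to the boundary
            if rho < eps:
                break
            theta = rng.uniform(0.0, 2 * np.pi)
            p += rho * np.array([np.cos(theta), np.sin(theta)])
        total += f_boundary(p / np.linalg.norm(p))  # project onto the boundary
    return total / n_paths

phi = walk_on_spheres((0.3, 0.2), lambda z: z[0])
print(phi)   # f(x1, x2) = x1 is harmonic, so phi(y) = y1 = 0.3
```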
CHAPTER 8
Martingales and Localization
This chapter is dedicated to a more in-depth study of martingales and their properties. Some of the results exposed here are fairly general, and their proofs in full generality require tools that are more advanced than the ones we have at our disposal. For this reason, some of the proofs will be given in a simplified setting or under stronger assumptions, together with a reference for the more general result.
1. Martingales & Co.
We recall the definition of a martingale given at the beginning of the course, extending it slightly.
Definition 1.1. tXt u is a Martingale with respect to a filtration Ft (Ft -martingale for short)
if for all t ą s we have
i) Xt is Ft -measurable ,
ii) Er|Xt |s ă 8 ,
iii) ErXt |Fs s “ Xs a.s. .
Similarly, X_t is an F_t-supermartingale [F_t-submartingale] if it satisfies conditions i) and ii) above, and

E[X_t \mid F_s] ≤ X_s \quad [\,E[X_t \mid F_s] ≥ X_s\,] \quad a.s.
When the filtration is clear from the context we simply say that a process is a [super/sub-]martingale.
Super- and Submartingales extend the idea of a process that is constant in expectation to
processes that are, respectively, nonincreasing and nondecreasing in expectation. It is clear that a
martingale is both a supermartingale and a submartingale, while a supermartingale that is also a
submartingale is a martingale.
Proposition 1.2. A supermartingale [submartingale] Mt is a martingale on r0, T s if and only
if E rMT s “ E rM0 s.
Proof. The “only if” direction follows by definition: if Mt is a martingale then E rMT s “ E rM0 s
and it is both a super- and a submartingale. For the “if” assume that Mt is a supermartingale
and E rMT s “ E rM0 s. Assume by contradiction that it is not a martingale, i.e., that there is a set
W Ď Ω of positive probability such that E rMt |Fs s ă Ms for all ω P W . Then by the supermartingale
property of Mt we have that
E rMT s ď E rMt s “ E rE rMt |Fs ss ă E rMs s ď E rM0 s ,
which contradicts the assumption. 
Remark 1.3. By Jensen's inequality for conditional expectations we have, for any convex function g : R → R, that a martingale M_t satisfies

E[g(M_t) \mid F_s] ≥ g(E[M_t \mid F_s]) = g(M_s),

so applying a convex [concave] map to a martingale makes it a submartingale [supermartingale], provided g(M_t) remains integrable.
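As a small illustration of this remark (a sketch; the convex choice g(x) = |x| is arbitrary, and for Brownian motion E|B_t| = √(2t/π) is known in closed form), the expectation of g(B_t) is indeed nondecreasing in t, as it must be for a submartingale:

```python
import numpy as np

rng = np.random.default_rng(8)

n = 10**6
means = []
for t in (0.5, 1.0, 2.0):
    B = rng.normal(0.0, np.sqrt(t), size=n)   # B_t ~ N(0, t)
    means.append(np.abs(B).mean())            # E|B_t| = sqrt(2 t / pi)
print(means)   # nondecreasing in t
```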
Recall that a random variable X is [square-]integrable if E[|X|] < ∞ [E[X^2] < ∞]. The condition of simple integrability of a random variable X can be equivalently stated as the condition

\lim_{n→∞} E[\,|X|\, \mathbf{1}_{|X|>n}\,] = 0. \qquad (8.1)

Indeed, on one hand, as \lim_{n→∞} |X|\, \mathbf{1}_{|X|>n} = 0 a.s. and E[\,|X|\, \mathbf{1}_{|X|>n}\,] ≤ E[|X|] < ∞, we have by the dominated convergence theorem¹ that \lim_{n→∞} E[\,|X|\, \mathbf{1}_{|X|>n}\,] = E[0] = 0. On the other hand, we have that

E[|X|] = E[\,|X|\, \mathbf{1}_{|X|≤n}\,] + E[\,|X|\, \mathbf{1}_{|X|>n}\,] ≤ n + E[\,|X|\, \mathbf{1}_{|X|>n}\,], \qquad (8.2)

and by using that both summands on the right-hand side are bounded (the first by definition and the second by assumption) we obtain that E[|X|] < ∞.
We now generalize the definitions above to stochastic processes.

Definition 1.4. A stochastic process X_t on T = [0,T] (where possibly T = ∞) is
i) integrable if \sup_{t∈T} E[|X_t|] < ∞;
ii) square integrable if \sup_{t∈T} E[X_t^2] < ∞ (i.e., the second moments are uniformly bounded);
iii) uniformly integrable if \lim_{n→∞} \sup_{t∈T} E[\,|X_t|\, \mathbf{1}_{|X_t|>n}\,] = 0.

The introduction of (8.1) allows us to separate the concepts of simple and uniform integrability for stochastic processes, as in the latter definition the limit is taken after the supremum. As one would expect, uniform integrability is stronger than simple integrability: similarly to (8.2) one has

\sup_{t∈T} E[|X_t|] ≤ \sup_{t∈T} E[\,|X_t|\, \mathbf{1}_{|X_t|≤n}\,] + \sup_{t∈T} E[\,|X_t|\, \mathbf{1}_{|X_t|>n}\,] ≤ n + \sup_{t∈T} E[\,|X_t|\, \mathbf{1}_{|X_t|>n}\,] < ∞.
For the converse result we need stronger assumptions. We give below examples of such results:
Proposition 1.5. A stochastic process tXt u is uniformly integrable if either
i) it is dominated by an integrable random variable Y defined on the same probability space,
i.e., |Xt pωq| ď Y pωq for all t with E r|Y |s ă 8, or
ii) There exists some positive function Gpxq on p0, 8q with limxÑ8 Gpxq{x “ 8 such that
sup E rGp|Xt |qs ă 8 .
tPT
Proof. We only prove the first result, for which we have that
limnÑ8 suptPT E r|Xt |1|Xt |ąn s ď limnÑ8 E r|Y |1|Y |ąn s “ 0 ,
using (8.1) and the integrability of Y . For the proof of the second result we refer, e.g., to [16]. □
In ii) of the above proposition we see that we need something slightly better than simple integrability
to have uniform integrability. For instance, Gpxq “ x1`ε for any ε ą 0 satisfies condition ii); in
particular, all square integrable processes, and hence all square integrable martingales, are uniformly
integrable.
Theorem 1.6. Let Y be an integrable random variable on a filtered probability space pΩ, F, P, Ft q,
then
Mt :“ E rY |Ft s (8.3)
is a uniformly integrable martingale.
Proof. We refer to Klebaner [9, Proof of Thm. 7.9]. □
1See Theorem 0.2 in the appendix for a reminder of this theorem
We define a martingale such as the one in (8.3) as closed by the random variable Y . In particular,
for any finite time interval r0, T s by definition every martingale is closed by its value at T since
E rMT |Ft s “ Mt and we have the following corollary.
Corollary 1.7. Any martingale Mt on a finite time interval is uniformly integrable.
The above results can be extended to infinite time intervals.
Theorem 1.8 (Martingale convergence theorem). Let Mt on T “ r0, 8q be an integrable
[sub/super]-martingale. Then there exists an almost sure (i.e., pointwise) limit limtÑ8 Mt “ Y ,
and Y is an integrable random variable.
The above theorem does not establish a correspondence between the random variables in terms
of expected values. In particular, we may have cases where the theorem above applies but we have
limtÑ8 E rMt s ‰ E rY s:
Example 1.9. Consider the martingale Mt “ exprBt ´ t{2s. Because it is positive, we have that
suptPT E r|Mt |s “ suptPT E rMt s “ E rM0 s “ 1 ă 8 ,
so it converges almost surely to a random variable Y by Theorem 1.8. However, we see that by the
law of large numbers for Brownian motion Bt {t Ñ 0 a.s. and therefore
E rY s “ E rlimtÑ8 Mt s “ E rlimtÑ8 etpBt {t´1{2q s “ 0 ,
which differs from limtÑ8 E rMt s “ 1.
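The gap between almost-sure and mean behaviour in this example is easy to see numerically. The following Monte Carlo sketch (an illustration, not part of the notes; the sample sizes and the two evaluation times are arbitrary choices) samples Mt “ exprBt ´ t{2s at a small and a large time:

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_martingale_samples(t, n_paths):
    """Sample M_t = exp(B_t - t/2) at a fixed time t."""
    B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)
    return np.exp(B_t - t / 2.0)

# The martingale property fixes the mean at E[M_0] = 1 for every t ...
M_small = exp_martingale_samples(1.0, 200_000)
print(np.mean(M_small))            # should stay close to 1

# ... yet for large t almost every path has collapsed: the a.s. limit is Y = 0.
M_large = exp_martingale_samples(100.0, 200_000)
print(np.median(M_large))          # essentially 0
print(np.mean(M_large > 1e-6))     # tiny fraction of paths still above 1e-6
```

At large times the mean 1 is carried by an exponentially rare set of huge paths, which is exactly why uniform integrability fails.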
The above observation means in particular that the conditions of Theorem 1.8 do not guarantee
convergence in the L1 norm. Under the stronger condition of uniform integrability of the process
Xt one obtains the same result with convergence in L1 norm and consequently the closedness of the
martingale:
Theorem 1.10. Let Mt be a uniformly integrable martingale on T “ r0, 8q, then it converges
as t Ñ 8 in L1 and a.s. to a random variable Y . Conversely, if Mt converges in L1 to an integrable
random variable Y , then it is uniformly integrable and converges almost surely. In both cases Mt is
closed by Y .
2. Optional stopping
After studying martingales per se, we consider their relation with stopping times. In particular,
we will see that martingales behave nicely with respect to stopping times. To be more explicit, given
a stochastic process Xt and recalling the definition Def. 7.15 of a stopping time τ , we denote by
τ ^ t “ minpτ, tq and define the stopped process
Xtτ :“ Xτ ^t , i.e., Xtτ “ Xt if t ă τ and Xtτ “ Xτ otherwise.
The following theorem gives an example of the nice relationship between martingales and stopping
times: it says that the martingale property is maintained by a process when such process is stopped.
Theorem 2.1. For an Ft -martingale Mt and any stopping time τ , the process Mτ ^t is an Ft -
martingale (and therefore an Fτ ^t -martingale), so
E rMτ ^t s “ E rM0 s for all t ą 0 . (8.4)
Martingales are often thought of as fair games because of their property of conserving their
expected value: It is impossible, on average, to make positive gains by playing such game. Under
this interpretation, Theorem 2.1 states that even if a player is given the possibility of quitting the
game using any betting strategy, he/she will not be able to make net gains at time t provided that
his/her strategy only depends on past information (cfr. Def. 7.15 of stopping time). However, the
above property is lost if the player is patient enough, as the following example shows:
Example 2.2. Let Bt be a Standard Brownian motion (a martingale, hence an example of a
“fair game”: you can think of it as a continuous version of betting one dollar on every coin flip) and
define τ1 :“ inftt : Bt ě 1u (the strategy of stopping as soon as you have a net gain of 1$). Then
by definition we have that Bτ1 “ 1, so E rBτ1 s “ 1 ‰ 0 “ E rB0 s.
A similar situation to the one described above holds when considering the “martingale” betting
strategy of doubling your bet every time you lose a coin flip. This strategy leads to an almost
sure net win of 1$ if one is patient enough (and has enough money to bet). As the examples above
show, stopped martingales may lose the property of conserving the expected value in the limit
t Ñ 8. The following theorem gives sufficient conditions for the martingale property to hold in this
limit, i.e., for the expected value of a game to be conserved at a stopping time τ :
Theorem 2.3 (Optional stopping theorem). Let Mt be a martingale, τ a stopping time, then
we have E rMτ s “ E rM0 s if either of the following conditions holds:
i) the stopping time τ is bounded a.s., i.e., DK ă 8 such that τ ď K a.s.;
ii) the martingale Mt is uniformly integrable;
iii) the stopping time is finite a.s. (i.e., Prτ “ 8s “ 0), Mt is integrable, and
limtÑ8 E rMt 1τ ąt s “ 0 .
Proof. We omit the proof; see, e.g., [9]. □
Under the gaming interpretation above, we see that a game is “fair”, i.e., it is impossible
to make net gains, on average, using only past information, if any of the conditions i)-iii) hold. In
particular, in the case of coin-flip games (or casino games) we see that a winning strategy does not
exist as condition ii) holds: there is only a finite amount of money in the world, so the martingale is
uniformly bounded, and in particular uniformly integrable. A simplified example of such a situation
is given next:
Example 2.4. Let Bt be a standard Brownian motion, let a ă 0 ă b, and define the stopping
time τ “ τab “ inftt P r0, 8q : Bt R pa, bqu. The stopped process Bτ ^t is
uniformly bounded and in particular uniformly integrable. Hence, by Theorem 2.3 we have that
E rBτ s “ E rB0 s “ 0. However, we also have that Bτ “ b with probability p and Bτ “ a with
probability 1 ´ p, therefore
0 “ E rBτ s “ a ¨ p1 ´ pq ` b ¨ p ñ P rBτ “ bs “ p “ ´a{pb ´ aq ,
a conclusion based only on the martingale property of Bt , which therefore extends to any
martingale for which τab is finite a.s..
We conclude the chapter by presenting the converse of Theorem 2.3:
Proposition 2.5. Let Xt be a stochastic process such that for any stopping time τ , Xτ is
integrable and E rX0 s “ E rXτ s. Then Xt is a martingale.
Proof. We refer to Klebaner [9, Proof of Thm. 7.17]. 
3. Localization
This section is devoted to the use of stopping times for the study of the properties of stochastic
processes. As we have seen, the stopped process may have some properties that the original process
did not have (e.g., uniform integrability on r0, 8q in Example 2.4). One can generalize such situation
to a sequence of stopping times, such as the following example:
Example 3.1. Consider, similarly to Example 2.4, a standard Brownian motion Bt and the
intervals p´n, nq for n P N. We can define the stopping times τn :“ inftt : Bt R p´n, nqu. For
each n, the stopped process Bτn ^t is uniformly bounded and hence uniformly integrable.
In the above example, by taking the limit n Ñ 8 one approaches the original setting of
unbounded Brownian motion by approximating it with uniformly bounded stopped processes. This
procedure can be extremely useful to obtain stronger versions of results obtained previously in the
course, as we will see later in this section, and it justifies the following definition:
Definition 3.2. A property of a stochastic process Xt is said to hold locally if there exists
a sequence tτn u of stopping times with the property limnÑ8 τn pωq “ 8 a.s. such that the stopped
process Xτn ^t has such property. In this case, the sequence tτn u is called the localizing sequence.
A particularly useful example is the one of the martingale property:
Definition 3.3. An adapted process Mt is a local martingale if there exists a sequence of
stopping times tτn u such that limnÑ8 τn pωq “ 8 a.s. and the stopped process Mτn ^t is a martingale
for all n.
It is clear that if a property holds in the original sense, then it also holds locally: one just has
to take τn “ n ą t. On the contrary a local martingale is in general not a martingale:
Example 3.4. Consider the Itô integral Mt “ ż0t exprBs2 s dBs and define τn :“ inftt ą
0 : exprBt2 s “ nu. For each n, the process Mτn ^t is a martingale, since we can write it as
Mτn ^t “ ż0t exprBs2 s1exprBs2 sďn dBs ,
which is square integrable by the Itô isometry. However, we have that
E rexpr2Bt2 ss “ p2πtq´1{2 ż8´8 expr2x2 ´ x2 {p2tqs dx ,
which diverges for t ě 1{4, so the integrand is then not square integrable: beyond t “ 1{4 the
process Mt is only a local martingale, with localizing sequence tτn u.
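The threshold t “ 1{4 can be checked numerically: for t ă 1{4, completing the square in the Gaussian integral above gives the closed form p1 ´ 4tq´1{2 , and a direct quadrature agrees. A small sketch (the grid parameters are arbitrary choices):

```python
import numpy as np

def second_exp_moment(t, L=30.0, n=600_001):
    """E[exp(2 B_t^2)] via a Riemann sum of the Gaussian integral (finite iff t < 1/4)."""
    x, dx = np.linspace(-L, L, n, retstep=True)
    # combine the exponents 2x^2 and -x^2/(2t) to avoid floating-point overflow
    integrand = np.exp((2.0 - 1.0 / (2.0 * t)) * x**2) / np.sqrt(2.0 * np.pi * t)
    return float(np.sum(integrand) * dx)

for t in (0.10, 0.20, 0.24):
    print(t, second_exp_moment(t), 1.0 / np.sqrt(1.0 - 4.0 * t))
# For t >= 1/4 the combined exponent is nonnegative for large |x|: the integral diverges.
```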
We now list some results that, besides allowing us to practice the use of localization methods, give
sufficient conditions for a local martingale to be a martingale.
Proposition 3.5. Let Mt be a local martingale such that |Mt | ď Y for an integrable random
variable Y . Then Mt is a uniformly integrable martingale.
Proof. Let τn be a localizing sequence, then for any n and s ă t we have
E rMt^τn |Fs s “ Ms^τn .
Because τn Ò 8 a.s. we have the pointwise convergence limnÑ8 Ms^τn “ Ms . Furthermore, by our
assumption Mt is dominated by the integrable random variable Y , so we can apply the dominated
convergence theorem2 to obtain that
E rMt |Fs s “ E rlimnÑ8 Mt^τn |Fs s “ limnÑ8 E rMt^τn |Fs s “ limnÑ8 Ms^τn “ Ms ,
showing that Mt is a martingale. By Proposition 1.5 i) we establish uniform integrability of Mt . □
2a version of this theorem is presented in the appendix
Proposition 3.6. A non-negative local martingale Mt , for t P r0, T s is a supermartingale.
Proof. Let tτn u be the localizing sequence of Mt . Then for any t we have that limnÑ8 τn ^t “ t
a.s and therefore that limnÑ8 Mτn ^t “ Mt . Consequently, by Fatou’s lemma on conditional
expectations we have
” ı
E rMt |Fs s “ E lim inf Mτn ^t |Fs ď lim inf E rMτn ^t |Fs s “ lim inf Mτn ^s “ Ms a.s. ,
nÑ8 nÑ8 nÑ8
where in the last equality we have used that the limit exists. In particular, we have that
E rMt s ď E rM0 s ă 8. □
Corollary 3.7. A non-negative local martingale Mt on T “ r0, T s for T ă 8 is a martingale
if and only if E rMT s “ E rM0 s.
Proof. This is a direct result of Proposition 1.2 and Proposition 3.6. □
Remark 3.8. As explained in [9], there exists a necessary and sufficient condition for a local
martingale to be a martingale: the local martingale must be of “Dirichlet class”, i.e., such that
the collection of random variables
X “ tXτ : τ is a finite stopping timeu
is uniformly integrable, i.e., limnÑ8 supXPX E r|X|1|X|ąn s “ 0.
We now give some slightly more advanced examples of the use of the localization procedure.
We begin by revisiting the problem of proving moment bounds for Itô integrals.
Moment Bounds for Itô Integrals. We let It “ ż0t σs dBs . We want to prove the moment
bounds
E|It |2p ď p2p ´ 1qp2p ´ 3q ¨ ¨ ¨ 3 ¨ 1 ¨ M 2p tp ,
under the assumption that |σs | ď M a.s.
The case p “ 1 follows from the Itô isometry. Therefore, we now proceed to prove the induction
step. Let us assume the inequality for p ´ 1 and use it to prove the inequality for p. For any N ą 0,
we define
τN “ inftt ě 0 : ż0t |Is |4p´2 σs2 ds ě N u .
Applying Itô’s formula to x ÞÑ |x|2p and evaluating at the time t ^ τN produces
|It^τN |2p “ pp2p ´ 1q ż0t^τN |Is |2pp´1q σs2 ds ` 2p ż0t^τN |Is |2p´1 σs dBs “ pIq ` pIIq ;
0 0
now by the induction hypothesis
EpIq ď pp2p ´ 1q ż0t E|Is |2pp´1q σs2 ds ď p2p ´ 1qp2p ´ 3q ¨ ¨ ¨ 3 ¨ 1 ¨ M 2p p ż0t sp´1 ds
“ p2p ´ 1qp2p ´ 3q ¨ ¨ ¨ 3 ¨ 1 ¨ M 2p tp .
If we define
Ut “ ż0t |Is |2p´1 σs 1sďτN dBs ,
then Ut is a martingale, since
ż0t |Is |4p´2 |σs |2 1sďτN ds “ ż0t^τN |Is |4p´2 |σs |2 ds ď N .
Since t ^ τN is a bounded stopping time, the optional stopping lemma says that EUt^τN “ 0.
However as noted above
EpIIq “ EUt^τN
so one obtains
E|It^τN |2p ď p2p ´ 1qp2p ´ 3q ¨ ¨ ¨ 3 ¨ 1 ¨ M 2p tp .
Since s ÞÑ |Is | is continuous and almost surely finite, τN Ñ 8 as N Ñ 8 with probability one.
Hence |It^τN |2p Ñ |It |2p almost surely, and by Fatou’s lemma we have
E|It |2p ď lim inf N Ñ8 E|It^τN |2p ď p2p ´ 1qp2p ´ 3q ¨ ¨ ¨ 3 ¨ 1 ¨ M 2p tp . (8.5)
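As a sanity check of (8.5), one can simulate the Itô integral for a concrete bounded integrand and compare the empirical 2p-th moment with the right hand side. A minimal sketch (the integrand M cospsq and all sample sizes are arbitrary choices; for constant σ the bound is attained with equality, since It is then Gaussian):

```python
import numpy as np

rng = np.random.default_rng(2)

def ito_integral_samples(sigma_fn, t, n_steps, n_paths):
    """Euler approximation of I_t = int_0^t sigma_s dB_s for a deterministic sigma."""
    dt = t / n_steps
    s = np.linspace(0.0, t, n_steps, endpoint=False)       # left endpoints
    dB = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    return dB @ sigma_fn(s)

M, t, p = 1.5, 2.0, 2
I = ito_integral_samples(lambda s: M * np.cos(s), t, 400, 50_000)  # |sigma_s| <= M
double_factorial = (2 * p - 1) * (2 * p - 3)   # (2p-1)!! = 3 for p = 2
bound = double_factorial * M ** (2 * p) * t ** p
print(float(np.mean(np.abs(I) ** (2 * p))), bound)  # empirical moment vs the bound
```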
SDEs with Superlinear Coefficients. Let b : Rd Ñ Rd and σ piq : Rd Ñ Rd be such that for
any R ą 0 there exists a C “ CpRq such that
|bpxq ´ bpyq| ` řm i“1 |σ piq pxq ´ σ piq pyq| ď C|x ´ y| and |bpxq| ` |σpxq| ď C
for any x, y P B0 pRq, where B0 pRq :“ tx P Rd : }x}2 ă Ru.
Consider the sde
dXt “ bpXt q dt ` řm i“1 σ piq pXt q dBtpiq . (8.6)
For any R let bR and σR be are globally bounded and globally Lipchitz functions in Rd such
that bR pxq “ bpxq and σR pxq “ σpxq in B0 pRq.
Since bR and σR satisfy the existence and uniqueness assumptions of Chapter 6.3, there exists a
solution XtpRq to the equation
dXtpRq “ bR pXtpRq q dt ` řm i“1 σRpiq pXtpRq q dBtpiq . (8.7)
For any R ą 0 we define the stopping time
τR :“ inftt ě 0 : |XtpRq | ą Ru .
Theorem 3.9. If
P rlimRÑ8 τR “ 8s “ 1 ,
then there exists a unique strong solution to (8.6).
Proof. Fix T ą 0. For R P N let ΩR “ tτR´1 ă T ď τR u, with τ0 :“ 0. By the assumption
P rŤ8 R“1 ΩR s “ 1 .
Also notice that the ΩR are disjoint for R “ 1, 2, . . . and we can define the process
Xt pωq “ XtpRq pωq for t P r0, T s if ω P ΩR .
Since suptPr0,T s |XtpRq pωq| ď R for ω P ΩR , we know that bpXtpRq pωqq “ bR pXtpRq pωqq and
σpXtpRq pωqq “ σR pXtpRq pωqq for all t P r0, T s. Hence Xt as defined solves the original equation.
Uniqueness comes from the fact that solutions to (8.7) are unique. □
4. Quadratic variation for martingales
Recall the definition of quadratic variation of a stochastic process:
Definition 4.1. The quadratic variation of an adapted stochastic process Xt is defined as
rXst :“ limpN Ñ8 řjN ´1 j“0 pXtN j`1 ´ XtN j q2 ,
where limp denotes a limit in probability and ttN j u is a partition of the interval r0, ts defined by
ΓN :“ tttN j u : 0 “ tN 0 ă tN 1 ă ¨ ¨ ¨ ă tN jN “ tu (8.8)
with |ΓN | :“ supj |tN j`1 ´ tN j | Ñ 0 as N Ñ 8.
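The definition is easy to probe numerically: for a sampled Brownian path, the sum of squared increments over a fine partition concentrates near t. A small sketch (mesh and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

t, n_steps = 1.0, 100_000
dt = t / n_steps
increments = np.sqrt(dt) * rng.standard_normal(n_steps)
B = np.concatenate([[0.0], np.cumsum(increments)])   # sampled Brownian path on [0, t]

qv = float(np.sum(np.diff(B) ** 2))   # sum of squared increments over the partition
print(qv)                             # concentrates near [B]_t = t = 1
```

The fluctuation of this sum around t is of order |ΓN |^{1/2}, so refining the mesh tightens the concentration.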

The process defined above is a sum of positive contributions and is therefore nondecreasing in t
a.s..
Now let Mt be a [local] martingale. In light of Remark 1.3 we know that Mt2 is a [local]
submartingale. Hence, we would like to know if we can transform Mt2 back to a martingale, for
example by subtracting a “compensation process” removing the nondecreasing part of the squared
process. It turns out that such process exists and is precisely the quadratic variation process. The
intuition behind this result comes from the following computation: assume that s ă t; then we have
E rMt Ms s “ E rMs E rMt |Fs ss “ E rMs2 s ,
where in the second equality we have used the martingale property. As a consequence of this we can
write
E rpMt ´ Ms q2 s “ E rMt2 s ´ 2E rMt Ms s ` E rMs2 s “ E rMt2 s ´ E rMs2 s . (8.9)
In particular this implies that the summands in the definition of quadratic variation can be expressed,
on expectation, as differences of expectation values that cancel telescopically, leading to (part of)
the following theorem.
Theorem 4.2. This theorem can be stated in the martingale and local martingale version:
i) Let Mt be a square-integrable martingale, then the quadratic variation process rM st exists
and Mt2 ´ rM st is a martingale.
ii) Let Mt be a local martingale, then the quadratic variation process rM st exists and Mt2 ´rM st
is a local martingale.
Proof. We only prove point i) of the theorem above. Point ii) follows for locally square
integrable martingales by localization, i.e., by substituting t Ñ τn ^ t where τn is the localizing
sequence. Repeating the calculation leading to (8.9) with conditional expectations, on a partition
s “ tN 0 ă tN 1 ă ¨ ¨ ¨ ă tN jN “ t of rs, ts, we obtain
E rMt2 ´ Ms2 |Fs s “ E rřjN j“1 pMtN j ´ MtN j´1 q2 |Fs s .
Now, taking the limit in probability of the right hand side (we do not prove that such limit exists
here, but we refer to [16]) and rearranging, we obtain that E rMt2 ´ rM st |Fs s “ Ms2 ´ rM ss as
desired. □
We conclude this section by proving a surprising result about martingales with finite first
variation.
Lemma 4.3. Let Mt be a continuous local martingale with finite first variation. Then Mt is
almost surely constant.
The intuition behind the above result is quite simple: if a continuous martingale is constrained to
behave “nicely” enough to have finite first variation (for example, to move monotonically or
differentiably on small intervals), then it would have to be “consistent with its trend at t´ ”
(except, of course, on a set of measure 0) and could therefore not respect the constant conditional
expectation property. In other words, martingales with finite first variation are too “stiff” to be
anything but constant.
Remark 4.4. Note that continuity is a key requirement in the above result: jump processes
(constant between jumps, discontinuous when jumps occur) give an example of martingales that are
not constant but that have finite first variation.
Proof of Lemma 4.3. We assume for this proof that Mt is a [locally] bounded martingale.
We will eventually show that the variance of Mt is zero and hence Mt is constant; without loss of
generality take M0 “ 0. Picking some partition of time 0 “ t0 ă t1 ă ¨ ¨ ¨ ă tk “ t and recalling
(8.9), we consider the variance at time t:
EMt2 “ řn E rMt2n ´ Mt2n´1 s “ řn E rpMtn ´ Mtn´1 q2 s ď E rsupn |Mtn ´ Mtn´1 | řn |Mtn ´ Mtn´1 |s .
Since the first variation V ptq “ lim∆T Ñ0 řn |Mtn ´ Mtn´1 | was assumed to be finite, we obtain
EMt2 ď pconstq E rlim∆T Ñ0 supn |Mtn ´ Mtn´1 |s ,
and this limit is zero because M was assumed to be continuous.
Hence the variance of Mt is zero and thus Mt “ M0 almost surely for each fixed t. Thus Mt is
constant on any countable collection of times; using the rational numbers and then continuity, we
conclude that it is the same constant for all times. □
5. Lévy-Doob characterization of Brownian motion
In the beginning of this course we have given several equivalent conditions on the continuity
and the marginals of a process to guarantee that such process is a Brownian motion. Using the
intuition on martingales that we have developed in the previous sections, we are now ready to give a
different set of conditions that allow us to draw the same conclusion:
Theorem 5.1 (Lévy-Doob). If Xptq is a continuous martingale such that
i) Xp0q “ 0,
ii) Xptq is a square-integrable martingale with respect to the filtration it generates,
iii) Xptq2 ´ t is a square-integrable martingale with respect to the filtration it generates,
then Xptq is a standard Brownian motion.
It is important that Xptq be continuous. For example, if Nt is a rate-one Poisson process, then
Nt ´ t and pNt ´ tq2 ´ t are both martingales, but Nt ´ t is quite different from Brownian motion.
Proof. Our proof essentially follows that of Doob found in [2], which approaches the problem
as a central limit theorem, proved through a clever trick using a telescopic sum. Fix a positive
integer N and an ε ą 0. Define
τ pε, N q “ infts ą 0 : sups1 ăs2 ďs, |s1 ´s2 |ă1{N |Xps1 q ´ Xps2 q| “ εu .
If there is no such time s, set τ “ 8. Fix a time t. We want to show that the random variable Xptq
has the same Gaussian distribution as Bt . To do this it is enough to show that EeiαXptq “ e´α2 t{2 ,
that is, to show that both have the same characteristic function (Fourier transform). It is a
standard result in basic probability that if a sequence of random variables has characteristic
functions which converge for each α to e´α2 t{2 , then the sequence converges in distribution to a
Gaussian limit. See [1] for a nice discussion of characteristic functions and convergence of probability
measures. Hence, we will show that
E teiαXpt^τ q u Ñ e´α2 t{2 ` Opεq as N Ñ 8, for any ε ą 0.
Since ε is arbitrary and the N Ñ 8 limit of the left hand side is E teiαXptq u, this will imply the result.
Partition the interval r0, ts with points tk “ kt{N . Set
I “ ˇE tśN j“1 eiαpXj ´Xj´1 q ´ śN j“1 e´pα2 {2qptj ´tj´1 q uˇ ,
where Xj :“ Xptj ^ τ q. In general, observe that the following identity holds
A1 A2 A3 ¨ ¨ ¨ AN ´ B1 B2 ¨ ¨ ¨ BN “A1 A2 ¨ ¨ ¨ AN ´1 pAN ´ BN q
`A1 A2 ¨ ¨ ¨ AN ´2 pAN ´1 ´ BN ´1 qBN
`A1 A2 ¨ ¨ ¨ AN ´3 pAN ´2 ´ BN ´2 qBN ´1 BN
..
.
`pA1 ´ B1 qB2 B3 ¨ ¨ ¨ BN .
Hence
I “ ˇE třN k“1 pśN ´k j“1 eiαpXj ´Xj´1 q q peiαpXN ´k`1 ´XN ´k q ´ e´pα2 {2qptN ´k`1 ´tN ´k q q pśN j“N ´k`2 e´pα2 {2qptj ´tj´1 q quˇ .

All of the terms in the first product have modulus one and all of the terms in the second product
are less than one. Hence
I ď E třN k“1 ˇE teiαpXN ´k`1 ´XN ´k q ´ e´pα2 {2qptN ´k`1 ´tN ´k q ˇFtN ´k ^τ uˇu .

Now observe that by Taylor’s theorem
eiα∆k X ´ e´pα2 {2q∆k t “ iα∆k X ´ pα2 {2qp∆k Xq2 ` pα2 {2q∆k t ` Op∆k Xq3 ` Op∆k tq2 ,
where ∆k X “ Xk ´ Xk´1 and ∆k t “ tk ´ tk´1 . The constants implied by Op∆k Xq3 and Op∆k tq2
can be taken to be uniformly bounded for ε P p0, ε0 s and N P rN0 , 8q for some ε0 ą 0 and N0 ă 8.
Observe that by using the martingale assumptions on X and the optional stopping lemma, we have
that
E t∆N ´k X|FtN ´k´1 ^τ u “ 0 ,
E tp∆N ´k Xq2 |FtN ´k´1 ^τ u “ ∆N ´k pt ^ τ q ď tN ´k ´ tN ´k´1 .
Here ∆k pt ^ τ q “ ptk ^ τ q ´ ptk´1 ^ τ q. By our definition of τ , |∆N ´k X| ď ε. So we have
E t|∆N ´k X|3 |FtN ´k´1 ^τ u ď psupk,ω |∆N ´k X|q E tp∆N ´k Xq2 |FtN ´k´1 ^τ u ď ε E tp∆N ´k Xq2 |FtN ´k´1 ^τ u .
And thus,
I ď E třN k“1 ˇiαE t∆k X|Ftk´1 ^τ u ´ pα2 {2qE tp∆k Xq2 |Ftk´1 ^τ u ` pα2 {2q∆k t ` Cp∆k tq2 ` CE t|∆k X|3 |Ftk´1 ^τ uˇu
ď N pt{N qpα2 {2q ´ N pt{N qpα2 {2q ` CN pt{N q2 ` CεN pt{N q “ Cpt2 {N q ` εCt .
Observe that τ Ñ 8 as N Ñ 8 for any fixed ε. Hence we have that
ˇE teiαXptq u ´ e´α2 t{2 ˇ ď limN Ñ8 I ď εCt .
Notice that the left hand side is independent of ε. Since C and t are fixed and ε was an arbitrary
number in p0, ε0 s, we conclude that
E teiαXptq u “ e´α2 t{2 . □
We now give a slightly different formulation of the Lévy-Doob theorem. Let Mt be a continuous
square-integrable martingale. Then by Theorem 4.2, if rM st “ t, condition iii) of Theorem 5.1 is
satisfied and we obtain the following result.
Theorem 5.2 (Lévy-Doob theorem). If Mt is a continuous martingale with rM st “ t and
M0 “ 0, then Mt is a standard Brownian motion.
6. Random time changes
Let Mt be a continuous martingale with respect to the filtration Ft . Since the quadratic variation
map t ÞÑ rM st is non-decreasing, we can define its left-inverse by
τt “ infts ě 0 : rM ss ě tu (8.10)
and the limiting value
rM s8 “ lim rM st
tÑ8
Theorem 6.1 (Dambis-Dubins-Schwarz). Let Mt , τt be as above. If rM s8 ą T then B̂t “ Mτt
is a Brownian motion on the interval r0, T s with respect to the filtration Gt “ Fτt . Conversely, there
exists a standard Brownian motion B̂t such that Mt “ B̂rM st for t ě 0. This result also holds when
Mt is a continuous local martingale.
Remark 6.2. Theorem 6.1 shows that any continuous martingale is just the time change of
Brownian motion with rM st giving the rate at which fluctuations are injected into the system. This
intuition is particularly useful in finance, where rM st can be thought of as a measure of the volatility
of the process.
Proof of Theorem 6.1. By the definition of τt as the left-inverse of the map t ÞÑ rM st we
have that rB̂st “ rM sτt “ t. Hence Mτ2t ´ t is a martingale. By localizing the stopping time τt to
τt ^ s for a finite s if necessary (i.e., to allow for the application of the optional stopping theorem)
we have that
EpB̂t |Gs q “ EpMτt |Fτs q “ Mτs “ B̂s
and consequently we see that B̂t is also a martingale. Hence, by the Lévy-Doob theorem
(Theorem 5.2), B̂t is a standard Brownian motion. The converse result follows from the first: for B̂t
defined above we see by the definition of τt that B̂rM st “ MτrM st “ Mt , since τrM st “ t. □
For martingales that can be written as
dMt “ Ht dBt ,
we know that rM st “ ş0t Hs2 ds. Therefore, by the above theorem, if ş08 Hs2 ds “ 8 we can write Mt as
Mt pωq “ B̂pω, ş0t Hs2 pωq dsq , (8.11)
0
for a Brownian motion B̂pω, sq that can be constructed from Mt . We note that we can explicitly
invert the time-change: letting f pt, ωq “ ş0t Hs pωq2 ds, we have
Ht pωq “ aBt f pt, ωq .
This implies that Mt , i.e., the time-changed Brownian motion B̂pω, f pt, ωqq, satisfies the sde
dMt “ dB̂pf ptqq “ aBt f ptq dBt , (8.12)
where, in general, B ‰ B̂! We also note that, if Hs pωq “ Hs i.e., Hs is a deterministic process, the
time-change is deterministic and the interpretation of the above calculation simplifies (cfr the next
example). Furthermore, in this case, changing time for another Brownian motion B̃ still satisfies
(8.11) in distribution.
Example 6.3. We consider the time-change Hs “ σeαs i.e.,
f ptq “ ż0t σ 2 e2αs ds “ σ 2 pe2αt ´ 1q{p2αq .
Then we have that the process B̂pf ptqq is the (weak) solution to the sde dXt “ σeαt dBt . Now,
consider the process
Ut :“ e´αt Xt “ e´αt B̂pσ 2 pe2αt ´ 1q{p2αqq .
By Itô’s product rule we see that this process satisfies
dUt “ ´αUt dt ` σ dBt ,
which is the well known sde for the Ornstein-Uhlenbeck process (cfr. the Langevin equation).
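Since the time change in this example is deterministic, the law of Ut at a fixed time can be checked directly: VarpUt q should match the Ornstein-Uhlenbeck variance σ 2 p1 ´ e´2αt q{p2αq. A minimal numerical sketch (parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

alpha, sigma, t = 0.7, 1.3, 2.0

# U_t = e^{-alpha t} Bhat(f(t)) with the deterministic clock
# f(t) = sigma^2 (e^{2 alpha t} - 1) / (2 alpha), so Bhat(f(t)) ~ N(0, f(t)):
f_t = sigma**2 * (np.exp(2.0 * alpha * t) - 1.0) / (2.0 * alpha)
U = np.exp(-alpha * t) * rng.normal(0.0, np.sqrt(f_t), size=500_000)

# Variance of the OU process dU = -alpha U dt + sigma dB started at 0:
ou_var = sigma**2 * (1.0 - np.exp(-2.0 * alpha * t)) / (2.0 * alpha)
print(np.var(U), ou_var)   # the two should agree
```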
Time Change for an SDE. We now extend the above reasoning and use it to construct a
new way of solving sdes.
Consider the simple one dimensional sde
dXt “ σpXt q dBt
with σpxq ą 0. We can rewrite the above equation as
dBt “ p1{σpXt qq dXt .
Now, by Theorem 6.1 we write Xt as a time-changed Brownian motion Xt pωq “ B̂pω, rXst q, and in
the new timescale τ defined by rXst we have that the sde reads
dMτ “ p1{σpB̂τ qq dB̂τ .
In the following, by abuse of notation, we will denote the new timescale by the old one, i.e.,
τ “ t. We assume that ş08 σ ´2 pB̂s q ds “ 8 almost surely (a simple condition which ensures this is
|σpxq| ď c ă 8 for all x). Now we would like to invert the change of time we just performed, i.e., go
back to the timescale where M¨ is a Brownian motion. Similarly to the previous paragraph, we do
this by defining the inverse transformation:
rM st “ ż0t σ ´2 pB̂s q ds “: Gptq and τt “ G´1 ptq “ infts : rM ss ą tu . (8.13)
In other words, we are now in the same setting as in the previous section, where f ptq “ G´1 ptq. At
the same time, by the inverse function theorem we obtain
f 1 ptq “ Bt pG´1 qptq “ pG1 pG´1 ptqqq´1 “ σpB̂pτt qq2 .
Inserting this into (8.12) we finally obtain
dXt “ dB̂τt “ σpB̂τt q dBt “ σpXt q dBt (8.14)
Remark 6.4. We note that the above calculation can be performed for a general choice of
time-change
żt
Hptq “ hpXs q ds , and τt “ H ´1 ptq ,
0
resulting in the sde for Yt “ Xτt given by
dYt “ pσpYt q{ahpYt qq dBt ,
which for the choice h “ σ 2 gives standard Brownian motion as a solution. Inverting this
time transformation as done above gives the solution to the original sde. Note again that the
time-changed Brownian motion is a weak solution to the original sde, since we first choose a
Brownian motion with respect to which we solve the sde in the new timescale, and then we transform
it back to the original timescale, mapping the solution to another Brownian motion.
Now consider the full-fledged sde
dXt “ µpXt q dt ` σpXt q dBt (8.15)
where µ : Rd Ñ Rd , σ : Rd Ñ Rdˆm , and Bt is an m-dimensional Brownian motion. As we have done
in Remark 6.4, we define the time change
Hptq :“ ż0t hpXs q ds and τt “ H ´1 ptq “ infts : Hpsq ą tu . (8.16)
Then we can show the following result:
Theorem 6.5. Let Xt be the solution to (8.15), then the process Yt “ Xτt is a weak solution to
the sde
dYt “ pµpYt q{hpYt qq dt ` pσpYt q{ahpYt qq dBt .
Proof. With the same definitions as above, define dMt “ ahpXt q dBt and B̂t “ Mτt . Since
rM st “ ş0t hpXs q ds, we see that B̂t is a standard Brownian motion. Observe that dτt “ hpXτt q´1 dt
and
dBτt “ dB̂t {ahpXτt q .
Defining Yt “ Xτt , we have that
dYt “ dXτt “ µpXτt q dτt ` σpXτt q dBτt “ pµpYt q{hpYt qq dt ` pσpYt q{ahpYt qq dB̂t . □
Note that, as in all the cases above, Yt is only a weak solution, since the Brownian motion
Bt was constructed at the same time as the solution Yt . A strong solution would require that the
Brownian motion be specified in advance.
Example 6.6. We consider the equation for the squared Bessel process (cfr. problem sets)
dXt “ δ dt ` 2aXt dBt ,
and define the time change
τ ptq “ pσ 2 {p2νp2 ´ δqqq p1 ´ expp´2νt{p2 ´ δqqq .
Then by the above theorem we obtain, for X̃t “ Xτ ptq ,
dX̃t “ δτ 1 ptq dt ` 2aX̃t τ 1 ptq dBt .
Now defining
Yt “ exppνtqX̃t1´δ{2 ,
we have that
dYt “ νYt dt ` exppνtqp1 ´ δ{2qX̃t´δ{2 dX̃t ` 2 exppνtqp´δ{2qp1 ´ δ{2qτ 1 ptqX̃t´δ{2 dt ,
and combining with the definition of τ and dX̃t we obtain that Yt solves
dYt “ νYt dt ` σYtp1´δq{p2´δq dWt .
Remark 6.7. The same argument shows that if
dXt “ ht µpXt q dt ` aht σpXt q dBt
for some positive, adapted stochastic process ht , then setting τt “ infts : ş0s hu du ą tu and
Yt “ Xτt , we have
dYt “ µpYt q dt ` σpYt q dB̂t
for the standard Brownian motion B̂t “ Mτt , where Mt “ ş0t ahs dBs .
7. Martingale inequalities
We now present some very useful inequalities that allow one to control the fluctuations of martingales.
The first result is due to Doob and controls the probability distribution of the maximum of a
martingale on a given time interval; for this reason these inequalities are sometimes called Doob’s
maximal inequalities. The first inequality bounds from above the probability that the supremum of
a martingale on an interval exceeds a certain value λ, while the second bounds the first moment of
that supremum, i.e., the expected value of the supremum on the given interval.
Theorem 7.1 (Doob’s Martingale Inequality). Let Mt be a martingale (or a positive submartin-
gale) with respect to the filtration Ft . Then for T ą 0 and for all λ ą 0
P rsup0ďtďT |Mt | ě λs ď Er|MT |p s{λp for all p ě 1 ,
and
E rsup0ďtďT |Mt |p s ď pp{pp ´ 1qqp Er|MT |p s for all p ą 1 .
Before turning to the proof, we remark on the similarity of the first inequality with Markov’s
inequality: given a random variable X, for every p ě 1 we have
P r|X| ą λs ď E r|X|p s{λp .
The difference between the two inequalities is the supremum inside the probability, which Doob’s
inequality controls under the condition that the process Mt is a martingale.
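Doob's first inequality is easy to test empirically for |Bt | (a positive submartingale) with p “ 2. A minimal sketch (discretization and sample sizes are arbitrary choices; the discrete maximum slightly underestimates the true supremum, which only makes the inequality easier to satisfy):

```python
import numpy as np

rng = np.random.default_rng(5)

T, n_steps, n_paths = 1.0, 1000, 10_000
dt = T / n_steps
B = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)

lam = 1.5
lhs = float(np.mean(np.max(np.abs(B), axis=1) >= lam))   # P(sup_{t<=T} |B_t| >= lam)
rhs = float(np.mean(B[:, -1] ** 2)) / lam**2             # E[|B_T|^2] / lam^2
print(lhs, rhs)   # Doob with p = 2: lhs <= rhs
```

Note that plain Markov's inequality applied to |B_T| alone would bound only P(|B_T| >= lam), a much smaller event than the running-maximum event on the left.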
Proof. First of all we note that by convexity of |x| and xp on R` the process |Mt |p is a
submartingale. Consequently defining the stopping time
τλ :“ inftt : |Mt | ą λu ,
we have by Doob’s optional stopping theorem
E r|Mτλ ^t |p s ď E r|Mt |p s . (8.17)
At the same time, we have that
E r|Mτλ ^t |p s “ E r|Mτλ |p 1τλ ďt s ` E r|Mt |p 1τλ ąt s ě λp P rτλ ď ts ` E r|Mt |p 1τλ ąt s . (8.18)
Combining (8.17) and (8.18) we finally obtain
P rsupsPr0,ts |Ms | ě λs “ P rτλ ď ts ď E r|Mt |p 1τλ ďt s{λp ď E r|Mt |p s{λp ,
where in the last passage we have used the nonnegativity of |Mt |p . □
The above result is key to derive numerous results in stochastic calculus. We have seen one
example in the proof of Theorem 7.2. We can also use it to bound the supremum of Itô integrals:
Example 7.2. Under the assumption that |σs | ď M ă 8, we have shown in Section 3 that
E r| ş0t σs dBs |2p s ă 8. Consequently, by Doob’s inequality (recall that for a martingale Mt , |Mt |p
is a positive submartingale for p ě 1) we have
E rsuptPp0,T q | ş0t σs dBs |2p s ď C2p E r| ş0T σs dBs |2p s ă 8 .
We now introduce the very useful Burkholder-Davis-Gundy inequalities.
Theorem 7.3 (Burkholder-Davis-Gundy inequality). Let Xt be a local martingale. Then for
any p ě 1,
cp E rrXspt s ď E rsup0ďsďt |Xs |2p s ď Cp E rrXspt s ,
where cp , Cp are constants independent of the process, depending only on p.
Proof. We only prove the upper bound of this result, under the simplifying assumption that
Xt “ ş0t fs pωq dBs for a bounded process |fs | ď M with M ă 8. For complete versions of the
proof see [4, 8, 15].
Doob’s Lp maximal inequality combined with Itô’s formula implies that
„  ˆ ˙p
2p p
E |Xt |2p
“ ‰
E sup |Xs | ď (8.19)
0ďsďt p´1
˙p „
2pp2p ´ 1q t
ˆ ż żt 
p 2pp´1q 2 2p´1
“ E |Xs | f psq ds ` 2p |Xs | f psqdBs .
p´1 2 0 0
Next we introduce the stopping time
\[ \tau_N = \inf\Big\{t \ge 0 : \int_0^t |X_s|^{4p-2}|f_s|^2\,ds \ge N\Big\} \]
and let \(I_N(t) = 2p\int_0^t |X_{s\wedge\tau_N}|^{2p-1} f_{s\wedge\tau_N}\,dB_s\). Since the integrand is bounded by the construction of τN , we have that ErIN ptqs “ 0. Notice that
\[ E\Big[2p\int_0^{t\wedge\tau_N} |X_s|^{2p-1} f(s)\,dB_s\Big] = E\big[I_N(t\wedge\tau_N)\big] = 0, \]
where the last equality follows from the Optional Stopping Theorem. Next observe that
\[ E\Big[\int_0^t |X_s|^{2(p-1)} f(s)^2\,ds\Big] \le E\Big[\sup_{0\le s\le t}|X_s|^{2(p-1)}\int_0^t f(s)^2\,ds\Big] = E\Big[\sup_{0\le s\le t}|X_s|^{2(p-1)}\,[X]_t\Big], \]
and by Hölder’s inequality with exponents p and q “ p{pp ´ 1q we have that
\[ E\Big[\sup_{0\le s\le t}|X_s|^{2(p-1)}\,[X]_t\Big] \le E\Big[\sup_{0\le s\le t}|X_s|^{2p}\Big]^{1-\frac{1}{p}}\, E\big[[X]_t^p\big]^{\frac{1}{p}}. \]

Putting everything together produces
\[ E\Big[\sup_{0\le s\le t\wedge\tau_N}|X_s|^{2p}\Big] \le C\, E\Big[\sup_{0\le s\le t\wedge\tau_N}|X_s|^{2p}\Big]^{1-\frac{1}{p}}\, E\big[[X]_{t\wedge\tau_N}^p\big]^{\frac{1}{p}}. \]
By the definition of the stopping time everything is finite, hence we can divide through by the first term on the right to obtain
\[ E\Big[\sup_{0\le s\le t\wedge\tau_N}|X_s|^{2p}\Big]^{\frac{1}{p}} \le C\, E\big[[X]_{t\wedge\tau_N}^p\big]^{\frac{1}{p}}. \]

Both the right- and the left-hand sides are uniformly bounded under our assumption, by (8.5) and by \(\int_0^t f_s^2\,ds \le tM^2\) respectively. The proof is concluded by raising both sides to the power p and, by means of the dominated convergence theorem, removing the stopping time by taking the limit as N Ñ 8. □
8. Martingale representation theorem
We conclude this chapter by introducing a last fundamental result about martingales, strengthening the connection between martingales and Itô integrals. Recall that Itô integrals of square-integrable processes are martingales. The Martingale Representation Theorem, a quite remarkable result, essentially establishes that the converse is also true: every martingale (with respect to the Brownian filtration) can be expressed as the Itô integral of a square-integrable process. Furthermore, such a process is unique among the family of predictable processes. As suggested by the name, predictable processes are those whose value at time t can be predicted given the information before time t. Examples of such processes are given
106
by processes that are continuous from the left, i.e., for which limsÒt Xs “ Xt .
A precise definition of this class of processes is given below:
Definition 8.1. Given a filtered probability space pΩ, F, P, tFt utě0 q, a continuous-time stochastic process tXt utě0 is predictable if X, considered as a mapping from Ω ˆ R` to R, is measurable with respect to the σ-algebra generated by all left-continuous adapted processes. This σ-algebra is also called the predictable σ-algebra.
One can think about predictable processes as processes that an external observer can control, as
exemplified below:
Remark 8.2. This example lives in discrete time, where predictability means that Xn`1 P Fn . Suppose we have a certain amount of money Vn at a certain time tn . We decide to invest a certain percentage Xn of this money in an asset with value Sn at time tn and put the remaining part 1 ´ Xn in our bank account. Sn can be modeled as a random variable, but so can Xn : our fund’s allocation varies based on how the asset’s value fluctuates. We can view the σ-algebra Fn as the information carried by the values of Sn up to time tn . What makes Sn and Xn different is that we control the amount of money Xn`1 invested in the asset: this decision must be made at time tn , i.e., before tn`1 . In other words, the value of Xn`1 must depend exclusively on the information up to time tn , i.e., Xn`1 P Fn .
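In discrete time this can be made concrete with a short simulation (an illustrative sketch; the betting rule below is an arbitrary example, not from the notes). Because the bet is computed from past flips only, the gains process is a martingale transform and its expected terminal value is zero, no matter how clever the rule looks.

```python
import random

def mean_gain(strategy, n_rounds=20, n_sims=100000, seed=0):
    """Average terminal gain of a predictable betting strategy against
    fair +/-1 increments S_{k+1} - S_k."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_sims):
        flips, gain = [], 0.0
        for _ in range(n_rounds):
            bet = strategy(flips)          # uses only the past: predictable
            step = random.choice((-1, 1))  # fair coin increment
            gain += bet * step
            flips.append(step)
        total += gain
    return total / n_sims

# hypothetical rule: bet more aggressively right after a losing flip
strategy = lambda flips: 1.0 if flips and flips[-1] < 0 else 0.5
avg = mean_gain(strategy)
```

The sample mean of the gains stays near zero; by contrast, a strategy allowed to peek at the next flip (non-predictable) would make money, which is exactly what the measurability requirement rules out.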
Theorem 8.3 (Martingale representation theorem). Let Mt be a square-integrable [or local] FtB -martingale on p0, T q (where possibly T “ 8). Then there exists a square-integrable process Cs [or a process Cs s.t. \(P[\int_0^T C_s^2\,ds < \infty] = 1\)] such that
\[ M_t = M_0 + \int_0^t C_s\,dB_s. \]
We do not prove the above result here, but refer to [16] for a proof. We note that the result is
restricted to FtB . This result is especially useful in finance, as we will see in the final chapters of
this course.
107
CHAPTER 9

Girsanov’s Theorem

1. An illustrative example
We begin with a simple example. We will frame it in a rather formal way as this will make the
analogies with later examples clearer.
One-dimensional Gaussian case. Let us consider the probability space pΩ, F, Pq where Ω “ R and P is the standard Gaussian measure with mean zero and variance one. (For completeness, let F be the Borel σ-algebra on R.) We define two random variables Z and Z̃ on this probability space. As always, a real-valued random variable is a function from Ω into R. Let us define
Zpωq “ ω and Z̃pωq “ ω ` µ
for some fixed constant µ. Since ω is drawn from the N p0, 1q measure P on R, we have that Z is distributed N p0, 1q and Z̃ is distributed N pµ, 1q.
Now let us introduce the density function associated to P,
\[ \varphi(\omega) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{\omega^2}{2}\Big), \]
and the function
\[ \Lambda_\mu(\omega) = \frac{\varphi(\omega-\mu)}{\varphi(\omega)} = \exp\Big(\omega\mu - \frac{\mu^2}{2}\Big). \]
Since Λµ is a function from Ω to R, it can be viewed as a random variable, and we have
\[ E_P \Lambda_\mu = \int_\Omega \Lambda_\mu(\omega)\,P(d\omega) = \int_{-\infty}^{\infty} \Lambda_\mu(\omega)\varphi(\omega)\,d\omega = \int_{-\infty}^{\infty} \varphi(\omega-\mu)\,d\omega = 1, \]
since φpω ´ µq is the density of a N pµ, 1q random variable. Hence Λµ is an L1 pΩ, Pq random variable, and we can define a new measure Q on Ω by
Qpdωq “ Λµ pωqPpdωq.
This means that for any random variable X on Ω the expected value with respect to Q, denoted by EQ , is defined by
EQ rXs “ EP rXΛµ s .
Furthermore, observe that for any bounded f : R Ñ R,
\[ E_Q f(Z) = E_P[f(Z)\Lambda_\mu] = \int_{-\infty}^{\infty} f(Z(\omega))\Lambda_\mu(\omega)\varphi(\omega)\,d\omega = \int_{-\infty}^{\infty} f(\omega)\varphi(\omega-\mu)\,d\omega = \int_{-\infty}^{\infty} f(\omega+\mu)\varphi(\omega)\,d\omega = E_P f(\tilde Z), \]
which implies that the distribution of Z under the measure Q is the same as the distribution of Z̃ under P.
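These identities are easy to sanity-check numerically (an illustrative sketch, not part of the notes; the test function cos is an arbitrary choice). Sampling ω from N(0, 1), the weights Λµ(ω) should average to 1, and the Λµ-weighted average of f(Z) should match the plain average of f(Z̃) = f(Z + µ).

```python
import math
import random

def change_of_measure_check(mu=0.7, n=200000, seed=2):
    """Check E_P[Lambda_mu] = 1 and E_Q[f(Z)] = E_P[f(Z + mu)] by sampling."""
    random.seed(seed)
    f = math.cos                        # arbitrary bounded test function
    sum_w = sum_wf = sum_shift = 0.0
    for _ in range(n):
        omega = random.gauss(0.0, 1.0)  # omega ~ P = N(0, 1)
        lam = math.exp(omega * mu - mu * mu / 2.0)  # Lambda_mu(omega)
        sum_w += lam
        sum_wf += lam * f(omega)        # estimates E_Q[f(Z)] = E_P[f(Z) Lambda]
        sum_shift += f(omega + mu)      # estimates E_P[f(Z tilde)]
    return sum_w / n, sum_wf / n, sum_shift / n

w_mean, q_mean, shifted_mean = change_of_measure_check()
```

Both q_mean and shifted_mean approximate E cos(N(µ, 1)) = cos(µ) e^{-1/2}, while w_mean approximates 1.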
109
Example 1.1 (Importance sampling). Let f : R Ñ R, and let X be distributed N pµ, 1q for some µ P R. We have that
\[ E f(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-\frac{(x-\mu)^2}{2}}\,dx. \]
For n large and tXi uni“1 iid N pµ, 1q, we estimate the above expected value by sampling, i.e.,
\[ E[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(X_i). \]
The problem with this method is that for moderately large values of µ (e.g., µ ą 6), taking for example f “ 1xă0 , we would need a very large number of samples before hitting the relevant tail of N pµ, 1q, i.e., the elements that actually contribute to our estimate.
However, let Y be distributed N p0, 1q. Then by the procedure outlined above we have
\[ E[f(X)] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,\frac{e^{-\frac{(x-\mu)^2}{2}}}{e^{-\frac{x^2}{2}}}\,e^{-\frac{x^2}{2}}\,dx = E\Big[f(Y)\,e^{\mu Y - \frac{\mu^2}{2}}\Big] \approx \frac{1}{n}\sum_{i=1}^n f(Y_i)\,e^{\mu Y_i - \frac{\mu^2}{2}} \]
for tYi uni“1 iid N p0, 1q. Under this new sampling distribution the indicator function frequently contributes nonzero terms to the average, and we need significantly fewer samples to obtain an accurate estimate of the expectation.
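The effect can be demonstrated with a short simulation (an illustrative sketch, not part of the notes): for µ = 3 and f = 1_{x<0}, the target value is P(X < 0) = Φ(−µ) ≈ 1.35 × 10⁻³. The naive estimator rarely sees the event, while the reweighted estimator pins it down with a modest number of samples.

```python
import math
import random

def naive_estimate(mu, n, seed=0):
    """Sample X_i ~ N(mu, 1) directly and average the indicator 1_{X_i < 0}."""
    random.seed(seed)
    hits = sum(1 for _ in range(n) if random.gauss(mu, 1.0) < 0.0)
    return hits / n

def importance_estimate(mu, n, seed=0):
    """Sample Y_i ~ N(0, 1) and reweight by exp(mu*Y_i - mu^2/2)."""
    random.seed(seed)
    acc = 0.0
    for _ in range(n):
        y = random.gauss(0.0, 1.0)
        if y < 0.0:                      # f = indicator of {x < 0}
            acc += math.exp(mu * y - mu * mu / 2.0)
    return acc / n

mu, n = 3.0, 100000
exact = 0.5 * math.erfc(mu / math.sqrt(2.0))   # Phi(-mu)
is_est = importance_estimate(mu, n)
```

With these parameters roughly half the Y samples land in {y < 0}, so the importance-sampling estimator has a small relative error, whereas the naive estimator would need on the order of 1/Φ(−µ) samples just to see the event a few times.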

Multidimensional Gaussian case. Now let us consider a higher-dimensional version of the above example. Let Ω “ Rn and let P be the n-dimensional Gaussian probability measure with covariance σ 2 I, where σ ą 0 and I is the n ˆ n identity matrix. In analogy to before, we define for ω “ pω1 , . . . , ωn q P Rn
\[ \varphi(\omega) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^n \omega_i^2\Big), \]
and for µ “ pµ1 , . . . , µn q P Rn
\[ \Lambda_\mu(\omega) = \frac{\varphi(\omega-\mu)}{\varphi(\omega)} = \exp\Big(\frac{1}{\sigma^2}\sum_{i=1}^n \omega_i\mu_i - \frac{1}{2\sigma^2}\sum_{i=1}^n \mu_i^2\Big). \]
We then define the Rn -valued random variables Zpωq “ pZ1 pωq, . . . , Zn pωqq “ ω and Z̃pωq “ pZ̃1 pωq, . . . , Z̃n pωqq “ Zpωq ` µ. If we define Qpdωq “ Λµ pωqPpdωq, then by the same reasoning as before the distribution of Z under Q is the same as the distribution of Z̃ under P.
2. Tilted Brownian motion
Consider the tilted Brownian motion process
dXt “ µ dt ` dBt ,
where Bt is standard Brownian motion and µ P R. Furthermore, let 0 “ t0 ă t1 ă ¨ ¨ ¨ ă tn ď T , and let f, g : Rn Ñ R be such that f px1 , x2 , . . . , xn q “ gpx1 , x2 ´ x1 , . . . , xn ´ xn´1 q.
110
Then, to compute the expectation of f , we write (with x0 “ 0):
\[ E\big[g(X_{t_1}, X_{t_2}-X_{t_1}, \dots, X_{t_n}-X_{t_{n-1}})\big] = \int_{\Omega\times\cdots\times\Omega} \frac{g(x_1, x_2-x_1, \dots, x_n-x_{n-1})}{(2\pi)^{n/2}\, t_1^{1/2}(t_2-t_1)^{1/2}\cdots(t_n-t_{n-1})^{1/2}} \prod_{i=1}^n e^{-\frac{[(x_i-x_{i-1})-\mu(t_i-t_{i-1})]^2}{2(t_i-t_{i-1})}}\,dx_i. \]
In light of what has been discussed in the previous section, we transform the above into iid Gaussian distributions:
\[ \prod_{i=1}^n e^{-\frac{[(x_i-x_{i-1})-\mu(t_i-t_{i-1})]^2}{2(t_i-t_{i-1})}} = \prod_{i=1}^n e^{-\frac{(x_i-x_{i-1})^2}{2(t_i-t_{i-1})}}\;\prod_{i=1}^n e^{\mu(x_i-x_{i-1})-\frac12\mu^2(t_i-t_{i-1})} = e^{\mu x_n - \frac12\mu^2 t_n}\,\prod_{i=1}^n e^{-\frac{(x_i-x_{i-1})^2}{2(t_i-t_{i-1})}}. \]

Now we can regard the remaining product as the desired Gaussian density and the prefactor as the random variable Λµ pω, tq:
\[ E[f(X_{t_1},\dots,X_{t_n})] = E\big[f(B_{t_1},\dots,B_{t_n})\,e^{\mu B_{t_n}-\frac12\mu^2 t_n}\big] = E\big[f(B_{t_1},\dots,B_{t_n})\,\Lambda_\mu(\omega,t)\big]. \]
We note en passant that the “coefficient” Λµ pω, tq can be written as a martingale Mt pωq, more precisely the exponential martingale \(M_t = e^{\mu B_t - \frac12\mu^2 t}\) (we are going to define this concept more precisely in the next section).
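The reweighting identity can be checked by simulation (an illustrative sketch, not part of the notes; since the test function below depends only on the terminal value, it suffices to sample B_T ~ N(0, T)). Taking f(x) = x, both the direct average over drifted samples and the reweighted average over driftless samples should be close to E[X_T] = µT.

```python
import math
import random

def tilted_vs_weighted(mu=0.5, T=1.0, n_paths=200000, seed=3):
    """Compare E[X_T] for X_T = mu*T + B_T with the reweighted average
    E[B_T * exp(mu*B_T - mu^2*T/2)] over driftless samples."""
    random.seed(seed)
    direct = weighted = 0.0
    for _ in range(n_paths):
        b = random.gauss(0.0, math.sqrt(T))   # B_T ~ N(0, T)
        direct += mu * T + b                  # sample of X_T
        weighted += b * math.exp(mu * b - 0.5 * mu * mu * T)
    return direct / n_paths, weighted / n_paths

direct_mean, weighted_mean = tilted_vs_weighted()
```

The same reweighting works for functions of the whole discretized path, at the cost of simulating all the increments; only the terminal value B_{t_n} enters the weight.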
3. Girsanov’s Theorem for sdes
We now introduce some notation to generalize the above observations to the framework of measure theory. Let pΩ, Fq be a measurable space. Then:
Definition 3.1. Given two measures µ, ν, we say that ν is absolutely continuous with respect to µ (denoted by ν ! µ) if
µpAq “ 0 ñ νpAq “ 0 for all measurable sets A.
Provided that a measure Q is absolutely continuous wrt another measure P, the following
theorem from measure theory ensures that it is possible to perform the changes of measure that we
carried out in the previous section, i.e., it is possible to define a random variable Λ (the reweighting
factor) that compensates for such change of measure.
Theorem 3.2 (Radon–Nikodym). Let P, Q be two probability measures on pΩ, Fq such that Q ! P. Then there exists a measurable function Λ : Ω Ñ R (a random variable) such that EP rΛs “ 1 and
\[ Q[A] = E_P[1_A\,\Lambda] = \int_A \Lambda(\omega)\,dP(\omega) \qquad \forall A \in \mathcal{F}. \]
We denote this function by
\[ \Lambda(\omega) = \frac{dQ}{dP}(\omega) \]
and we refer to it as the Radon–Nikodym derivative.
111
The assumption of absolute continuity guarantees that the Radon–Nikodym derivative is well defined. Indeed, in the case where both probability measures have densities ρP , ρQ , we have Λ “ ρQ {ρP , and absolute continuity guarantees that this ratio is well defined (i.e., it does not explode).
We now present, without proof, a lemma from measure theory that allows to obtain most of the
results in this chapter.
Lemma 3.3 (General Bayes rule). Let µ and ν be probability measures on pΩ, Fq with dνpωq “ f pωqdµpωq for some f P L1 pµq. Let X be a random variable with
\[ E_\nu|X| = \int |X(\omega)|\,d\nu(\omega) = \int |X(\omega)|\,f(\omega)\,d\mu(\omega) < \infty. \]
If G Ă F is a σ-algebra, then
Eν rX|Gs Eµ rf |Gs “ Eµ rf X|Gs .
Before using the above theorem in the context of stochastic processes, we recall the concept of the stochastic exponential of a process Xt , given by
\[ \mathcal{E}(X)_t = \exp\Big(X_t - X_0 - \frac12 [X]_t\Big). \]
Recall that the stochastic exponential of a process Xt is defined as the solution to the sde
dUt “ Ut dXt . (9.1)
When Xt is a local martingale with respect to the Brownian filtration, we know by the Martingale Representation Theorem (Theorem 8.3) that we can express dXt “ Ct dBt for a predictable process Ct . Therefore, by (9.1), stochastic exponentials of local martingales are local martingales themselves, as summarized in the following theorem. This result also gives a sufficient condition (called the Novikov condition) for the stochastic exponential of a (local) martingale to be a true martingale.
Theorem 3.4 (Exponential Martingale). If Mt is a local martingale with M0 “ 0 (like, for instance, every \(\int_0^t a_s\,dB_s\) with \(P[\int_0^t a_s^2\,ds < \infty] = 1\)), then the stochastic exponential EpM qt is a continuous positive local martingale, and hence a supermartingale. Furthermore, if
\[ E\Big[\exp\Big(\frac12 [M]_T\Big)\Big] < \infty, \tag{Novikov} \]
then EpM qt is a martingale on r0, T s with E pEpM qt q “ 1 .
Remark 3.5. Other conditions guaranteeing that the stochastic exponential of a local martingale is a true martingale exist. Some of them are summarized in [9, Thm. 8.14 – 8.17]. Furthermore, if Mt has the form Mt “ \(\int_0^t a_s\,dB_s\), then the condition as ď cpsq ă 8 for all s P p0, T q is a sufficient condition for EpM qt to be a martingale.
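A minimal numerical illustration of the conclusion E[E(M)_t] = 1 (a sketch, not part of the notes): for M_t = aB_t with constant a, Novikov's condition holds trivially, and the sample average of exp(aB_t − a²t/2) should be close to 1.

```python
import math
import random

def stochastic_exponential_mean(a=1.0, t=1.0, n=200000, seed=4):
    """Monte Carlo average of E(M)_t = exp(a*B_t - a^2*t/2), B_t ~ N(0, t)."""
    random.seed(seed)
    acc = 0.0
    for _ in range(n):
        b_t = random.gauss(0.0, math.sqrt(t))
        acc += math.exp(a * b_t - 0.5 * a * a * t)
    return acc / n

mean_exp = stochastic_exponential_mean()
```

Without the −a²t/2 correction the average would drift to e^{a²t/2} instead of 1, which is one way to remember where the quadratic-variation term in the stochastic exponential comes from.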
We finally come to the first version of Girsanov’s theorem. This result allows us to do something very similar to what was done in the first section of this chapter: switching to a new probability measure so that an “unnatural” random variable becomes a normally distributed one. This idea generalizes to the framework of stochastic processes: Girsanov’s theorem allows, under some conditions summarized below, to transform an Itô process
dYt “ at pωq dt ` dBt (9.2)
on a given probability space pΩ, F, Pq into the “simplest” stochastic process we encountered in this course, i.e., Brownian motion, by changing the measure on that space.
112
Theorem 3.6 (Girsanov I). Let Yt be defined as in (9.2) with Bt a Brownian motion under P. Assume that \(\int_0^t a_s\,dB_s\) is well defined, define the stochastic exponential
\[ \Lambda_t = \exp\Big[-\int_0^t a_s\,dB_s - \frac12\int_0^t a_s^2\,ds\Big], \]
and assume that Λt is a martingale on r0, T s with respect to P (i.e., a P-martingale). Then under the (equivalent) probability measure defined by
\[ \frac{dQ}{dP}(\omega) = \Lambda_T(\omega) \tag{9.3} \]
the process Yt is a Brownian motion B̂t on r0, T s.
Proof. We want to show that Yt is a standard Brownian motion with respect to Q. To do so, by Lévy’s characterization of Brownian motion (Theorem 5.2), it is sufficient to show that
i) Yt is a local martingale wrt Q ,
ii) rY st “ t ,
provided that Y0 “ 0 (which we assume without loss of generality).
Part ii) follows from the computation
\[ d[Y]_t = (a_t\,dt + dB_t)\cdot(a_t\,dt + dB_t) = d[B]_t = dt, \]
provided that quadratic variations of processes are unchanged by absolutely continuous changes of probability measure such as the one defined by ΛT . To show this, because the quadratic variation process is defined as a limit in probability, it is enough to show that for a sequence of random variables tXn u, if limnÑ8 Xn “ X in probability under P, then the same holds under Q. To this aim, let An :“ t|Xn ´ X| ą εu and assume P rAn s Ñ 0. Then by integrability of ΛT we can apply the dominated convergence theorem and obtain that
QrAn s “ EP r1An ΛT s Ñ 0 .
For part i) we apply Itô’s product rule to Kt “ Yt Λt and obtain dKt “ Λt dYt ` Yt dΛt ` dYt dΛt . Combining this with the sde for Λt ,
dΛt “ ´Λt at dBt ,
we obtain
\[ dK_t = \Lambda_t(a_t\,dt + dB_t) - Y_t\Lambda_t a_t\,dB_t - \Lambda_t a_t\,(dB_t)^2 = \Lambda_t(a_t\,dt + dB_t) - Y_t\Lambda_t a_t\,dB_t - \Lambda_t a_t\,dt = \Lambda_t(1 - Y_t a_t)\,dB_t, \]
and so Kt is a martingale wrt P. Now, by the general Bayes rule (Lemma 3.3), we have that
\[ E_Q[Y_t\,|\,\mathcal{F}_s] = \frac{E_P[\Lambda_t Y_t\,|\,\mathcal{F}_s]}{E_P[\Lambda_t\,|\,\mathcal{F}_s]} = \frac{K_s}{\Lambda_s} = Y_s, \]
implying that Yt is a martingale wrt Q. □
Remark 3.7. We note that instead of proving part ii) of the above theorem one could also have applied Theorem 5.1, i.e., one could have shown that Yt2 ´ t is a martingale with respect to Q. The proof of this claim follows the same lines as that of part i) above.
113
Example 3.8 (Brownian motion tracking a continuous function). We would like to estimate the probability that during the interval r0, 1s Brownian motion Bt stays in a “tube” of radius ε around a given differentiable function hptq P C 1 pRq with hp0q “ 0, i.e., the probability
\[ P\Big(\sup_{0\le t\le 1}|B_t - h(t)| < \varepsilon\Big). \]
Let the event G be given by
G “ t|Bs ´ hpsq| ă ε, s P r0, 1su “ t|Xs | ă ε, s P r0, 1su
for the process Xs “ Bs ´ hpsq, which has differential
dXs “ ´h1 psqds ` dBs .
Then by the above theorem we define the change of measure
\[ \Lambda_t = \exp\Big(\int_0^t h'(s)\,dB_s - \frac12\int_0^t |h'(s)|^2\,ds\Big). \]
Because h1 psq is continuous on a compact interval, it is uniformly bounded and the Novikov condition holds. Hence we can define the measure dQ “ Λ1 dP; by the above theorem, under Q the process Xt is a standard Brownian motion. Therefore we can write
\[ Q(G) = \int 1_G\,\frac{dQ}{dP}\,dP \le \Big(\int \Big(\frac{dQ}{dP}\Big)^2 dP\Big)^{1/2} P(G)^{1/2}, \]
where in the last step we have used the Cauchy–Schwarz inequality, and so
\[ P\Big(\sup_{0\le t\le 1}|B_t - h(t)| < \varepsilon\Big) \ge \frac{Q\big(\sup_{0\le s\le 1}|\hat B_s| < \varepsilon\big)^2}{\int \frac{dQ}{dP}\,dQ}. \]
Looking at the above inequality, we see that we have reduced the estimation of the relevant probability to the estimation of the probability of Brownian motion exiting an interval and of the expected value of the random variable Λ1 .
The above result can be extended to the d-dimensional setting with a nontrivial diffusion coefficient σpXt q. Furthermore, we may be interested in transforming Yt (in the distributional sense) into an Itô process Xt other than Brownian motion. Conditions to do this are summarized in the following more general theorem:
Theorem 3.9 (Girsanov II). Let Xt , Yt P Rd be processes satisfying
dXt “ µpXt , tq dt `σpXt , tq dBt ,
dYt “ pµpYt , tq ` γpω, tqq dt `σpYt , tq dBt ,
with Y0 “ X0 “ x for an m-dimensional P-Brownian motion Bt on t P r0, T s. Suppose that there exists a process upω, tq such that
σpYt qupω, tq “ γpω, tq .
Furthermore, let
\[ \Lambda_t := \exp\Big[-\int_0^t u(\omega,s)\,dB_s - \frac12\int_0^t |u(\omega,s)|^2\,ds\Big]. \tag{9.4} \]
Then if Λt is a P-martingale on r0, T s and Q is defined as in (9.3) we have that
dYt “ µpYt , tq dt ` σpYt , tq dB̂t ,
114
for a Q-Brownian motion
\[ \hat B_t = \int_0^t u(\omega,s)\,ds + B_t. \]
Proof. It follows from Theorem 3.6 that B̂t is a Brownian motion wrt Q. Furthermore we
observe that
dYt “ pµpYt , tq ` γpω, tqq dt ` σpYt , tqp dB̂t ´ upω, tq dtq
“ pµpYt , tq ` γpω, tqq dt ` σpYt , tq dB̂t ´ γpω, tq dt
“ µpYt , tq dt ` σpYt , tq dB̂t
as desired. 
We note that the above result can be added to our arsenal of methods for finding weak solutions to sdes! Indeed, let Xt , Yt be defined by
pAq dXt “ µ1 pXt q dt ` σpXt qdBt ,
pBq dYt “ µ2 pYt q dt ` σpYt qdBt ,
with X0 “ Y0 “ x, and assume that we cannot solve pAq but have an idea of how to solve pBq. Then we can define upyq by
σpyqupyq “ µ2 pyq ´ µ1 pyq
and set, as in Theorem 3.9,
\[ \Lambda_t = e^{-\int_0^t u(Y_s)\,dB_s - \frac12\int_0^t |u(Y_s)|^2\,ds}, \]
which allows us to define the measure dQ “ ΛT dP. Then by Theorem 3.9 we have that
\[ \hat B_t = B_t + \int_0^t u(Y_s)\,ds \]
is a standard Brownian motion under Q, and
dYt “ µ1 pYt q dt ` σpYt qdB̂t
“ µ1 pYt q dt ` σpYt q rupYt q dt ` dBt s
“ µ1 pYt q dt ` µ2 pYt q dt ´ µ1 pYt q dt ` σpYt qdBt
“ µ2 pYt q dt ` σpYt qdBt
Hence, under Q, Yt solves the same sde as Xt does under P, but driven by a different Brownian motion. This implies that the law of Yt on Cp0, T ; Rd q under Q (and therefore all of its marginals) is the same as the law of Xt on Cp0, T ; Rd q under P. Hence we can write the unknown marginals of Xt under P as
EP rf pXt qs “ EQ rf pYt qs “ EP rf pYt qΛT s ,
i.e., as an expectation of a process that we know, multiplied by a weighting factor that can be estimated or computed.
115
CHAPTER 10

One Dimensional sdes

1. Natural Scale and Speed measure


We now want to consider sdes which do not satisfy the Lipschitz assumptions of Chapter 3. Let
b and σ be bounded, continuous real-valued functions with σ uniformly bounded from below by a
positive constant. Consider the sde
dXt “ bpXt q dt ` σpXt q dBt (10.1)
We want to find a function φ : R Ñ R so that if we define Yt “ φpXt q then Yt is a martingale.
Applying Itô’s formula gives
dYt “ pLφqpXt q dt ` φ1 pXt qσpXt q dBt . (10.2)
where L is the generator of the process Xt , defined by
\[ (L\phi)(x) = b(x)\phi'(x) + \frac12\sigma^2(x)\phi''(x). \]
Assuming that our choice of φ is such that φ1 is bounded, Yt will be a martingale if pLφqpxq “ 0. This implies that
\[ (\log\phi')' = \frac{\phi''}{\phi'} = -\frac{2b}{\sigma^2}, \qquad\text{which implies}\qquad \phi(x) = \int_\alpha^x \exp\Big(-\int_\beta^y \frac{2b}{\sigma^2}(z)\,dz\Big)\,dy \]
for any choice of α and β. Notice that by construction φ is a twice-differentiable, strictly increasing function of R onto R. Hence φ is invertible, and we can understand φ as a warping of R under which Xt becomes a martingale. For this reason, the function φ is called the natural scale for the process Xt .
In light of (10.2), Yt “ φpXt q satisfies
dYt “ pφ1 σqpφ´1 pYt qq dBt (10.3)
which shows that Yt is not only a martingale but again solves an sde.
In the discussion of random time changes, we saw that when a martingale Mt solves the sde
dMt “ gpMt q dBt (10.4)
and we consider Mt on the time scale
\[ \tau(t) = \int_0^t \frac{1}{g^2(M_s)}\,ds, \]
then Bt “ Mτ ptq is a Brownian motion. Since the rate at which randomness is injected into the system, as measured by the quadratic variation, equals one for Brownian motion, this time change is given a distinguished status. The measure on R which produces this time change when integrated along
117
the trajectory is called the speed measure. In the setting of (10.4), the speed measure, denoted mpxqdx, is
\[ m(x) = \frac{1}{g^2(x)}. \]
Returning to the setting with a drift term (10.1), we look for the time change of the resulting martingale after the system has been put on its natural scale. Looking at (10.3), we see that
\[ \frac{1}{\big[(\phi'\sigma)(\phi^{-1}(y))\big]^2}\,dy \tag{10.5} \]
is the speed measure for the system expressed in the variable y “ φpxq. Undoing this transformation using dy “ φ1 pxqdx shows the speed measure in the original variable to be
\[ m(x)\,dx = \frac{1}{(\phi'\sigma^2)(x)}\,dx. \]
2. Existence of Weak Solutions
In the previous section we saw how to transform the one-dimensional sde (10.1) into a Brownian motion by warping space and changing time. Noticing that each of these operations is reversible/invertible, we now reverse our steps to turn a Brownian motion into a solution of (10.1). Let Bt be a standard Brownian motion. Looking back at (10.3) and (10.5), we define Yt by
dYt “ pφ1 σqpφ´1 pYt qqdBt .
The equation has a weak solution given by Yt “ BTt where
\[ T_t = \int_0^t \big[(\phi'\sigma)(\phi^{-1}(B_s))\big]^2\,ds. \]
Next we define Xt “ ψpYt q, where for notational compactness we have set ψ “ φ´1 . Then Itô’s formula tells us that
\[ dX_t = \psi'(Y_t)\,dY_t + \frac12\psi''(Y_t)\,d[Y]_t. \]
To finish the argument, note that ψ 1 “ 1{pφ1 ˝ ψq and, differentiating this identity, ψ 2 “ ´φ2 pψq{φ1 pψq3 . Since the natural scale satisfies φ2 “ ´p2b{σ 2 qφ1 , this gives ψ 2 pyq “ p2b{σ 2 qpψpyqq{φ1 pψpyqq2 . Using dYt “ pφ1 σqpXt q dBt and drY st “ rpφ1 σqpXt qs2 dt, the two terms above become
\[ \psi'(Y_t)\,dY_t = \sigma(X_t)\,dB_t, \qquad \frac12\psi''(Y_t)\,d[Y]_t = b(X_t)\,dt, \]
so that dXt “ bpXt q dt ` σpXt q dBt , i.e., Xt is a weak solution of (10.1).
3. Exit From an Interval


Let Mt “ φpXt q, where φ is the natural scale and Xt solves (10.1), and define the hitting time
τy “ inftt ě 0 : Xt “ yu .
Assuming that X0 “ x P pa, bq, we define the exit time of the interval by
τpa,bq “ τa ^ τb .
By the construction of φ, Mt is a martingale. Hence, since τpa,bq ^ t is a bounded stopping time, the Optional Stopping lemma says that
Ex Mτpa,bq ^t “ Ex M0 “ φpxq .
If we assume that σpyq ą 0 for all y P ra, bs, then it is possible to show that
Ex τpa,bq ă 8 .
This in turn implies that τpa,bq ^ t Ñ τpa,bq as t Ñ 8. Hence we have that
\[ \phi(x) = E_x M_{\tau_{(a,b)}} = P_x(\tau_a \le \tau_b)\,\phi(a) + \big(1 - P_x(\tau_a \le \tau_b)\big)\,\phi(b). \]
118
Rearranging produces
\[ P_x(\tau_a \le \tau_b) = \frac{\phi(b)-\phi(x)}{\phi(b)-\phi(a)}. \tag{10.6} \]
Another way to find this formula is to set upxq “ Px pτa ď τb q. Then upxq solves the pde
pLuqpxq “ 0 for x P pa, bq, with upaq “ 1 and upbq “ 0 ,
and it is not hard to see that the above formula solves this pde. (Try the case when Xt is a standard Brownian motion to get started.)
Now we derive a formula for vpxq “ Ex τpa,bq . Since it is a solution to
pLvqpxq “ ´1 for x P pa, bq with vpaq “ vpbq “ 0 ,
one finds
\[ v(x) = E_x\tau_{(a,b)} = 2\,\frac{\phi(x)-\phi(a)}{\phi(b)-\phi(a)}\int_x^b \big[\phi(b)-\phi(z)\big]\,m(z)\,dz + 2\,\frac{\phi(b)-\phi(x)}{\phi(b)-\phi(a)}\int_a^x \big[\phi(z)-\phi(a)\big]\,m(z)\,dz. \]
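For standard Brownian motion (natural scale φ(x) = x, speed density m ≡ 1) one gets P_x(τ_a ≤ τ_b) = (b − x)/(b − a) and E_x τ_(a,b) = (x − a)(b − x). Both identities also hold exactly for the simple random walk on the integers, which the following sketch (illustrative, not part of the notes) checks by simulation.

```python
import random

def srw_exit(x=0, a=-2, b=3, n_sims=20000, seed=5):
    """Simple random walk started at x: empirical probability of hitting a
    before b, and the mean number of steps to leave (a, b)."""
    random.seed(seed)
    hit_a = 0
    total_steps = 0
    for _ in range(n_sims):
        pos, steps = x, 0
        while a < pos < b:
            pos += random.choice((-1, 1))
            steps += 1
        hit_a += (pos == a)
        total_steps += steps
    return hit_a / n_sims, total_steps / n_sims

p_hit_a, mean_exit_time = srw_exit()
# theory for these parameters: (b - x)/(b - a) = 0.6 and (x - a)(b - x) = 6
```

The agreement is exact in expectation because position and squared position (minus time) are discrete martingales for the walk, mirroring the optional-stopping argument above.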
4. Recurrence
Definition 4.1. A one-dimensional diffusion is recurrent if for all x, Ppτx ă 8q “ 1.
Theorem 4.2. If a ă x ă b then
i) Px pτa ă 8q “ 1 if and only if φp8q “ 8.
ii) Px pτb ă 8q “ 1 if and only if φp´8q “ ´8.
iii) Xt is recurrent if and only if φpRq “ R, i.e., if and only if both φp8q “ 8 and φp´8q “ ´8.
proof of Theorem 4.2. 

5. Intervals with Singular End Points


Let rα, βs be an interval such that on any rr, ls Ă pα, βq the coefficients bpxq and σpxq are bounded, with σpxq positive on rr, ls. Under these assumptions, the only points where σ can vanish, or where b and σ can become infinite, are α and β. Without loss of generality, we assume that 0 P pα, βq. If we define
\[ I_\alpha = \int_\alpha^0 \big[\phi(0)-\phi(z)\big]\,m(z)\,dz, \qquad I_\beta = \int_0^\beta \big[\phi(z)-\phi(0)\big]\,m(z)\,dz, \]
\[ J_\alpha = \int_\alpha^0 \big[M(0)-M(z)\big]\,\phi'(z)\,dz, \qquad J_\beta = \int_0^\beta \big[M(z)-M(0)\big]\,\phi'(z)\,dz, \]
where M denotes an antiderivative of the speed density m, then we have the following result.
Theorem 5.1. Let γ P tα, βu, then
i) Iγ ă 8 if and only if Xt can reach the point γ.
ii) Jγ ă 8 if and only if Xt can escape the point γ.
Following Feller, we have the following boundary point classification.

    Iγ        Jγ        boundary type of γ
    ă 8      ă 8      regular point
    ă 8      “ 8      absorbing point
    “ 8      ă 8      entrance point
    “ 8      “ 8      natural point
119
Bibliography

1. Leo Breiman, Probability, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992,
Corrected reprint of the 1968 original. MR 93d:60001
2. J. L. Doob, Stochastic processes, John Wiley & Sons Inc., New York, 1953. MR 15,445b
3. Richard Durrett, Stochastic calculus, Probability and Stochastics Series, CRC Press, Boca Raton, FL, 1996, A
practical introduction. MR 1398879 (97k:60148)
4. , Stochastic calculus, a practical introduction, CRC Press, 1996.
5. John Guckenheimer and Philip Holmes, Nonlinear oscillations, dynamical systems, and bifurcations of vector
fields, Applied Mathematical Sciences, vol. 42, Springer-Verlag, New York, 1990, Revised and corrected reprint of
the 1983 original.
6. Philip Hartman, Ordinary differential equations, second ed., Birkhäuser, Boston, Mass., 1982.
7. Ioannis Karatzas and Steven E. Shreve, Brownian motion and stochastic calculus, second ed., Graduate Texts in
Mathematics, vol. 113, Springer-Verlag, New York, 1991. MR 1121940 (92h:60127)
8. , Brownian motion and stochastic calculus, second ed., Springer-Verlag, New York, 1991. MR 92h:60127
9. Fima C. Klebaner, Introduction to stochastic calculus with applications, third ed., Imperial College Press, London,
2012. MR 2933773
10. N. V. Krylov, Introduction to the theory of random processes, Graduate Studies in Mathematics, vol. 43, American
Mathematical Society, Providence, RI, 2002. MR 1885884 (2003d:60001)
11. Hui-Hsiung Kuo, Introduction to stochastic integration, Universitext, Springer, New York, 2006. MR 2180429
(2006e:60001)
12. H. P. McKean, Stochastic integrals, Academic Press, New York-London, 1969, Probability and Mathematical
Statistics, No. 5.
13. Peter Mörters and Yuval Peres, Brownian motion, Cambridge Series in Statistical and Probabilistic Mathematics,
Cambridge University Press, Cambridge, 2010, With an appendix by Oded Schramm and Wendelin Werner.
MR 2604525 (2011i:60152)
14. Bernt Øksendal, Stochastic differential equations, fifth ed., Universitext, Springer-Verlag, Berlin, 1998, An
introduction with applications. MR 1619188 (99c:60119)
15. Philip Protter, Stochastic integration and differential equations: a new approach, Springer-Verlag, 1990.
16. Philip E. Protter, Stochastic integration and differential equations, Stochastic Modelling and Applied Probability,
vol. 21, Springer-Verlag, Berlin, 2005, Second edition. Version 2.1, Corrected third printing. MR 2273672
(2008e:60001)
17. Daniel Revuz and Marc Yor, Continuous martingales and Brownian motion, second ed., Grundlehren der
Mathematischen Wissenschaften, vol. 293, Springer-Verlag, Berlin, 1994.
18. Walter A. Strauss, Partial differential equations, John Wiley & Sons Inc., New York, 1992, An introduction.
MR 92m:35001
19. Daniel W. Stroock, Probability theory, an analytic view, Cambridge University Press, Cambridge, 1993.
MR 95f:60003
20. S. J. Taylor, Exact asymptotic estimates of brownian path variation, Duke Mathematical Journal 39 (1972), no. 2,
219–241, Mathematical Reviews number (MathSciNet) MR0295434, Zentralblatt MATH identifier0241.60069.

121
APPENDIX A

Some Results from Analysis

Recall that, given a probability space pΩ, Σ, Pq and a random variable X on such a space, we define the expectation of a function f of X as the integral
\[ E[f(X)] = \int_\Omega f(X(\omega))\,P(d\omega), \]
where P denotes the (probability) measure against which we are integrating. The following results are stated for a general measure µ (i.e., not necessarily a probability measure).
Theorem 0.1 (Hölder inequality). Let pΩ, Σ, µq be a measure space and let p, q P r1, 8s with 1{p ` 1{q “ 1. Then, for all measurable real- or complex-valued functions f and g on Ω,
\[ \int_\Omega |f(x)g(x)|\,d\mu(x) \le \Big(\int_\Omega |f(x)|^p\,d\mu(x)\Big)^{\frac1p}\Big(\int_\Omega |g(x)|^q\,d\mu(x)\Big)^{\frac1q}. \]
Theorem 0.2 (Lebesgue’s Dominated Convergence theorem). Let tfn u be a sequence of measurable functions on a measure space pΩ, Σ, µq. Suppose that the sequence converges pointwise to a function f and is dominated by some integrable function g, in the sense that
|fn pxq| ď gpxq
for all n in the index set of the sequence and all points x P Ω. Then f is integrable and
\[ \lim_{n\to\infty}\int_\Omega |f_n - f|\,d\mu = 0, \]
which also implies
\[ \lim_{n\to\infty}\int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu. \]
Theorem 0.3 (Fatou’s Lemma). Given a measure space pΩ, Σ, µq and a set X P Σ, let tfn u be a sequence of pΣ, BpRě0 qq-measurable non-negative functions fn : X Ñ r0, `8s. Define the function f : X Ñ r0, `8s by setting
f pxq “ lim inf nÑ8 fn pxq
for every x P X. Then f is pΣ, BpRě0 qq-measurable, and
\[ \int_\Omega f\,d\mu \le \liminf_{n\to\infty}\int_\Omega f_n\,d\mu, \]
where the integrals may be finite or infinite.
Remark 0.4. The above theorem can in particular be applied when fn is the indicator function 1An of a sequence of sets tAn u Ă Σ, obtaining
\[ \mu\Big(\liminf_{n\to\infty} A_n\Big) = \int_\Omega \liminf_{n\to\infty} 1_{A_n}\,d\mu \le \liminf_{n\to\infty}\int_\Omega 1_{A_n}\,d\mu = \liminf_{n\to\infty}\mu(A_n). \]

123
APPENDIX B

Exponential Martingales and Hermite Polynomials


Let σpt, ωq be a bounded adapted stochastic process and define \(I(t,\omega) = \int_0^t \sigma_s(\omega)\,dB(s,\omega)\). We showed that
\[ \mathcal{E}_I(t,\omega) := e^{I(t,\omega)-\frac12[I](t,\omega)} = \exp\Big(\int_0^t \sigma_s(\omega)\,dB(s,\omega) - \frac12\int_0^t \sigma_s(\omega)^2\,ds\Big) \]
is a martingale. EI pt, ωq is often called the exponential martingale of I. This is reasonable because of the following analogy. In the standard ODE setting we have
\[ dY(t) = Y(t)a(t)\,dt \implies Y(t) = Y(0)\exp\Big(\int_0^t a(s)\,ds\Big). \]
The analogous sde is
\[ dZ(t,\omega) = Z(t,\omega)\,dI(t,\omega) = Z(t,\omega)\sigma(t,\omega)\,dB(t,\omega), \]
or
\[ Z(t) = Z(0) + \int_0^t Z(s,\omega)\sigma_s(\omega)\,dB(s,\omega). \]
The solution to this is Zpt, ωq “ EI pt, ωq. Hence it is reasonable to call it the stochastic exponential. From the sde representation it is clear that EI pt, ωq is a martingale, assuming Ipt, ωq is a nice process (bounded, for example). (The Novikov condition is another criterion; see [8] or [17] for example.)
Just as the exponential can be expanded in a basis of homogeneous polynomials, it is reasonable to ask if something similar can be done with the stochastic exponential. (A function f pxq is homogeneous of degree n if for all γ P R, f pγxq “ γ n f pxq.) For the regular exponential, we have
\[ e^{\gamma X} = \sum_{n=0}^{\infty} \gamma^n \frac{X^n}{n!}. \]
Hence we look for Hn pI, rIsq such that
\[ \mathcal{E}_{\gamma I}(t,\omega) = e^{\gamma I(t,\omega)-\gamma^2\frac12[I](t,\omega)} = \sum_{n=0}^{\infty} \gamma^n\,\frac{H_n\big(I(t,\omega),[I](t,\omega)\big)}{n!}. \]
Since the stochastic exponential is a martingale, it is reasonable to expect that the Hn pIt , rIst q should be martingales. In fact, by varying γ you can argue that the Hn must be mean-zero martingales. Recall that from its definition rγIspt, ωq “ γ 2 rIspt, ωq. Hence if we want Hn pγIt , rγIst q “ γ n Hn pIt , rIst q, we are led to look for polynomials of the form
\[ H_n(x,y) = \sum_{0\le m\le \lfloor n/2\rfloor} C_{n,m}\,x^{n-2m} y^m. \]
125
In homework 2, you found the conditions on the Cn,m so that Hn pI, rIsq is a martingale. In fact, these polynomials are well known in many areas of math and engineering: they are the Hermite polynomials. They can also be defined by the expressions
\[ H_n(x,y) = y^{n/2}\,\bar H_n\Big(\frac{x}{\sqrt{y}}\Big), \qquad \bar H_n(z) = (-1)^n e^{\frac{z^2}{2}}\,\frac{d^n}{dz^n}\,e^{-\frac{z^2}{2}}. \]
Here the H̄n are the standard Hermite polynomials (possibly with a different normalization than you are used to).
We now have two different expressions for the stochastic exponential of γIpt, ωq with Zp0q “ 1. Namely, setting Zpt, ωq “ EγI pt, ωq, we have
\[ Z(t,\omega) = 1 + \gamma\int_0^t Z(s,\omega)\sigma_s(\omega)\,dB(s,\omega) \]
and
\[ Z(t,\omega) = \sum_{n=0}^{\infty} \gamma^n\,\frac{H_n\big(I(t,\omega),[I](t,\omega)\big)}{n!}. \]
The first expression has Z on the right-hand side. At least formally, we can repeatedly insert the expression for Zps, ωq. Suppressing the ω dependence, we obtain
\[
\begin{aligned}
Z(t) &= 1 + \gamma\int_0^t Z(s_1)\sigma(s_1)\,dB(s_1) \\
&= 1 + \gamma\int_0^t \sigma(s_1)\,dB(s_1) + \gamma^2\int_0^t\!\int_0^{s_1} Z(s_2)\sigma(s_2)\,dB(s_2)\,\sigma(s_1)\,dB(s_1) \\
&= 1 + \gamma\int_0^t \sigma(s_1)\,dB(s_1) + \cdots + \gamma^n\int_0^t\!\int_0^{s_1}\!\cdots\!\int_0^{s_{n-1}} Z(s_n)\sigma(s_n)\,dB(s_n)\cdots\sigma(s_1)\,dB(s_1) \\
&= \sum_{k=0}^{\infty} \gamma^k \int_0^t\!\int_0^{s_1}\!\cdots\!\int_0^{s_{k-1}} \sigma(s_k)\,dB(s_k)\cdots\sigma(s_1)\,dB(s_1).
\end{aligned}
\]
Now if we equate like powers of γ, we obtain
\[ H_n\big(I(t),[I](t)\big) = H_n\Big(\int_0^t \sigma\,dB,\ \int_0^t \sigma^2\,ds\Big) = n!\int_0^t\!\int_0^{s_1}\!\cdots\!\int_0^{s_{n-1}} \sigma(s_n)\,dB(s_n)\cdots\sigma(s_1)\,dB(s_1). \]
From this expression, it is again clear that Hn pIptq, rIsptqq is a martingale.
For more information along the lines of this section first see [12] and then see [19, 17].
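The expansion can be verified numerically (an illustrative sketch, not part of the notes). Using the standard three-term recurrence H̄_{n+1}(z) = z H̄_n(z) − n H̄_{n−1}(z) for the probabilists' Hermite polynomials and setting H_n(x, y) = y^{n/2} H̄_n(x/√y), the partial sums of Σ γⁿ H_n(x, y)/n! reproduce e^{γx − γ²y/2}.

```python
import math

def hermite_H(n_max, x, y):
    """H_n(x, y) = y^{n/2} * Hbar_n(x / sqrt(y)) for n = 0..n_max, where
    Hbar_n satisfies Hbar_{n+1}(z) = z*Hbar_n(z) - n*Hbar_{n-1}(z)."""
    z = x / math.sqrt(y)
    hbar = [1.0, z]
    for n in range(1, n_max):
        hbar.append(z * hbar[n] - n * hbar[n - 1])
    return [y ** (n / 2.0) * hbar[n] for n in range(n_max + 1)]

def exponential_partial_sum(gamma, x, y, n_max=25):
    """Partial sum of sum_n gamma^n H_n(x, y) / n!."""
    H = hermite_H(n_max, x, y)
    return sum(gamma ** n * H[n] / math.factorial(n) for n in range(n_max + 1))

gamma, x, y = 0.3, 0.5, 1.2
lhs = math.exp(gamma * x - gamma ** 2 * y / 2.0)
rhs = exponential_partial_sum(gamma, x, y)
```

For instance H_2(x, y) = x² − y, so the n = 2 term corresponds to the familiar martingale B_t² − t from Lévy's characterization.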

126