Introduction To Stochastic Calculus
Chapter 1. Introduction
1. Motivations
2. Outline For a Course
Chapter 2. Probabilistic Background
1. Countable Probability Spaces
2. Uncountable Probability Spaces
3. General Probability Spaces and Sigma Algebras
4. Distributions and Convergence of Random Variables
Chapter 3. Brownian Motion and Stochastic Processes
1. An Illustrative Example: A Collection of Random Walks
2. General Stochastic Processes
3. Definition of Brownian Motion (Wiener Process)
4. Constructive Approach to Brownian Motion
5. Brownian Motion has Rough Trajectories
6. More Properties of Random Walks
7. More Properties of General Stochastic Processes
8. A Glimpse of the Connection with PDEs
Chapter 4. Itô Integrals
1. Properties of the Noise Suggested by Modeling
2. Riemann–Stieltjes Integral
3. A Motivating Example
4. Itô Integrals for a Simple Class of Step Functions
5. Extension to the Closure of Elementary Processes
6. Properties of Itô Integrals
7. A Continuous-in-Time Version of the Itô Integral
8. An Extension of the Itô Integral
9. Itô Processes
Chapter 5. Stochastic Calculus
1. Itô’s Formula for Brownian Motion
2. Quadratic Variation and Covariation
3. Itô’s Formula for an Itô Process
4. Full Multidimensional Version of the Itô Formula
5. Collection of the Formal Rules for Itô’s Formula and Quadratic Variation
Chapter 6. Stochastic Differential Equations
1. Definitions
2. Examples of SDEs
3. Existence and Uniqueness for SDEs
4. Weak Solutions to SDEs
5. Markov Property of Itô Diffusions
Chapter 7. PDEs and SDEs: The Connection
1. Infinitesimal Generators
2. Martingales Associated with Diffusion Processes
3. Connection with PDEs
4. Time-Homogeneous Diffusions
5. Stochastic Characteristics
6. A Fundamental Example: Brownian Motion and the Heat Equation
Chapter 8. Martingales and Localization
1. Martingales & Co.
2. Optional Stopping
3. Localization
4. Quadratic Variation for Martingales
5. Lévy–Doob Characterization of Brownian Motion
6. Random Time Changes
7. Martingale Inequalities
8. Martingale Representation Theorem
Chapter 9. Girsanov’s Theorem
1. An Illustrative Example
2. Tilted Brownian Motion
3. Girsanov’s Theorem for SDEs
Chapter 10. One-Dimensional SDEs
1. Natural Scale and Speed Measure
2. Existence of Weak Solutions
3. Exit From an Interval
4. Recurrence
5. Intervals with Singular End Points
Bibliography
Appendix A. Some Results from Analysis
Appendix B. Exponential Martingales and Hermite Polynomials
CHAPTER 1
Introduction
1. Motivations
Evolutions in time with random influences/random dynamics. Let $N(t)$ be the
“number of rabbits in some population” or “the price of a stock”. Then one might want to make a
model of the dynamics which includes “random influences”. A (very) simple example is
$$\frac{dN(t)}{dt} = a(t)\,N(t) \qquad\text{where}\qquad a(t) = r(t) + \text{“noise”}. \tag{1.1}$$
Making sense of “noise” and learning how to make calculations with it is one of the principal
objectives of this course. This will allow us to predict, in a probabilistic sense, the behavior of $N(t)$.
Examples of situations like the one introduced above are ubiquitous in nature:
i) The gambler’s ruin problem We play the following game: we start with $3 in our
pocket and we flip a coin. If the result is tails we lose one dollar, while if the result is
heads we win one dollar. We stop when we have no money left to bet, or when we reach
$9. We may ask: what is the probability that we end up broke?
ii) Population dynamics/Infectious diseases As anticipated, (1.1) can be used to model
the evolution of the number of rabbits in some population. Similar models are used to
describe the number of genetic mutations in an animal species. We may also think of
$N(t)$ as the number of sick individuals in a population. Reasonable and widely applied
models for the spread of infectious diseases are obtained by modifying (1.1) and observing
its behavior. In all these cases, one may be interested in knowing whether it is likely for the
disease/mutation to take over the population, or rather to go extinct.
iii) Stock prices We may think of a set of $M$ risky investments (e.g. stocks), where the
price $N_i(t)$ per unit at time $t$, for $i\in\{1,\dots,M\}$, evolves according to (1.1). In this case,
one would like to optimize one's choice of stocks to maximize the total value $\sum_{i=1}^{M}\alpha_i N_i(t)$
at a later time $T$.
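The ruin question in i) can already be explored by simulation, before any theory is developed. The sketch below is our own minimal Monte Carlo (the function name and parameters are illustrative, not from the text): it estimates the probability of going broke when starting with $3, with absorbing barriers at $0 and $9 and a fair coin, for which the classical answer is $1 - 3/9 = 2/3$.

```python
import random

def ruin_probability(start=3, target=9, trials=20_000, seed=0):
    """Estimate P(hit 0 before target) for a fair-coin random walk."""
    rng = random.Random(seed)
    broke = 0
    for _ in range(trials):
        x = start
        while 0 < x < target:
            x += 1 if rng.random() < 0.5 else -1
        broke += (x == 0)
    return broke / trials

estimate = ruin_probability()
# for a fair coin the exact ruin probability is 1 - 3/9 = 2/3
```

With 20,000 trials the standard error is about 0.003, so the estimate should land well within a few percent of 2/3.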
Connections with diffusion theory and PDEs. There exists a deep connection between
noisy processes such as the one introduced above and the deterministic theory of partial differential
equations. This startling connection will be explored and expanded upon during the course, but we
anticipate some examples below:
i) Dirichlet problem Let $u(x)$ be the solution to the PDE given below with the noted
boundary conditions. Here $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$. The amazing fact is the following: if we start a
Brownian motion diffusing from a point $(x_0, y_0)$ inside the domain, then the probability
that it first hits the boundary in the darker region is given by $u(x_0, y_0)$.
ii) Black–Scholes Equation Suppose that at time $t = 0$ the person in iii) is offered the
right (without obligation) to buy one unit of the risky asset at a specified price $S$ and at
a specified future time $t = T$. Such a right is called a European call option. How much
should the person be willing to pay for such an option? This question can be answered by
solving the famous Black–Scholes equation, which gives, for any stock price $N(t)$, the fair
value of the European option.
[Figure: the Dirichlet problem — $\tfrac{1}{2}\Delta u = 0$ inside the domain, with boundary values $u = 0$ and $u = 1$ on the two portions of the boundary.]
CHAPTER 2
Probabilistic Background
Let $\Omega$ be the set of all such sequences of length $N$ (i.e. $\Omega = \{-1,1\}^N$), and consider now the
sequence of functions $\{X_n : \Omega\to\mathbb{Z}\}$ where
$$X_0(\omega) = 0, \qquad X_n(\omega) = \sum_{i=1}^{n}\omega_i \tag{2.1}$$
for $n\in\{1,\dots,N\}$. This sequence is a biased random walk (we call it unbiased, or simply a random
walk, if $p = 1/2$) of length $N$: a simple example of a stochastic process. We can compute its
expectation:
$$E[X_2] = \sum_{i\in\{-2,0,2\}} i\,P[X_2 = i] = 2p^2 - 2(1-p)^2 = 2(2p-1).$$
This expectation changes if we assume that we have some information on the state of the random
walk at an earlier time:
$$E[X_2 \mid X_1 = 1] = \sum_{i\in\{-2,0,2\}} i\,P[X_2 = i \mid X_1 = 1] = 2p + 0\cdot(1-p) = 2p.$$
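Both expectations can be verified by brute-force enumeration of the four outcomes in $\{-1,1\}^2$. The sketch below is our own (the function name is illustrative) and works for a generic bias $p$.

```python
from itertools import product

def walk_expectations(p):
    """Enumerate omega in {-1,1}^2; return E[X2] and E[X2 | X1 = 1]."""
    e_x2 = 0.0
    cond_num = cond_mass = 0.0
    for omega in product([-1, 1], repeat=2):
        prob = 1.0
        for step in omega:
            prob *= p if step == 1 else (1 - p)
        x1, x2 = omega[0], omega[0] + omega[1]
        e_x2 += x2 * prob
        if x1 == 1:  # condition on the event {X1 = 1}
            cond_num += x2 * prob
            cond_mass += prob
    return e_x2, cond_num / cond_mass

e_x2, e_x2_given = walk_expectations(0.7)
# matches 2(2p-1) = 0.8 and 2p = 1.4
```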
We now recall some basic definitions from the theory of probability which will allow us to put
this example on solid ground.
In the above example, the set $\Omega$ is called the sample space (or outcome space). Intuitively, each
$\omega\in\Omega$ is a possible outcome of all of the randomness in our system. The subsets of $\Omega$ (the sets of
outcomes we want to compute the probability of) are referred to as the events, and the measure
given by $P$ on subsets $A\subseteq\Omega$ is called the probability measure, giving the chance of the various
outcomes. Finally, each $X_n$ is an example of an integer-valued random variable. We will refer to
this collection of random variables as a random walk.
In the above setting, where the outcome space $\Omega$ consists of a finite number of elements, we are
able to define everything in a straightforward way. We begin by quickly recalling a number of
definitions in the countable (finite or countably infinite) setting.
If $\Omega$ is countable it is enough to define the probability of each element of $\Omega$. That is to say, give
a function $p : \Omega\to[0,1]$ with $\sum_{\omega\in\Omega} p(\omega) = 1$ and define
$$P[\omega] = p(\omega)$$
for each $\omega\in\Omega$. An event $A$ is just a subset of $\Omega$. We naturally extend the definition of $P$ to an
event $A$ by
$$P[A] := \sum_{\omega\in A} P[\omega].$$
Observe that this definition has a number of consequences. In particular, if the $A_i$ are disjoint events,
that is to say $A_i\subset\Omega$ and $A_i\cap A_j = \emptyset$ if $i\neq j$, then
$$P\Big[\bigcup_{i=1}^{\infty} A_i\Big] = \sum_{i=1}^{\infty} P[A_i].$$
Here we have used the convention that $\{X = x\}$ is shorthand for $\{\omega\in\Omega : X(\omega) = x\}$, and the
definition $\mathrm{Range}(X) = \{x : \exists\,\omega \text{ with } X(\omega) = x\} = X(\Omega)$. We can further define the covariance
of two random variables $X, Y$ on the same space as $\mathrm{Cov}[X,Y] = E\big[(X - E[X])(Y - E[Y])\big]$ and
$$\mathrm{Var}[X] := \mathrm{Cov}[X,X] = E[X^2] - E[X]^2.$$
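As a concrete check of these formulas on the coin-flip space: since $X_2 = X_1 + \omega_2$ and the steps are independent, $\mathrm{Cov}[X_1, X_2] = \mathrm{Var}[\omega_1] = 1 - (2p-1)^2$. The sketch below (our own helper names) verifies this by exact enumeration.

```python
from itertools import product

def cov_x1_x2(p):
    """Exact Cov[X1, X2] for the length-2 biased walk, by enumeration."""
    probs = {}
    for omega in product([-1, 1], repeat=2):
        w = 1.0
        for s in omega:
            w *= p if s == 1 else 1 - p
        probs[omega] = w

    def E(f):  # expectation of f(omega) under the product measure
        return sum(f(om) * w for om, w in probs.items())

    ex1 = E(lambda om: om[0])
    ex2 = E(lambda om: om[0] + om[1])
    return E(lambda om: (om[0] - ex1) * (om[0] + om[1] - ex2))

cov = cov_x1_x2(0.6)
# equals Var[omega_1] = 1 - (2p-1)^2 = 0.96 for p = 0.6
```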
Two events $A$ and $B$ are independent if $P[A\cap B] = P[A]\,P[B]$. Two random variables are
independent if $P[X = x, Y = y] = P[X = x]\,P[Y = y]$ for all $x$ and $y$. Of course this implies that for any
sets $A$ and $B$ one has $P[X\in A, Y\in B] = P[X\in A]\,P[Y\in B]$ and that $E[XY] = E[X]\,E[Y]$, so
that $\mathrm{Cov}[X,Y] = 0$ (note that $\mathrm{Cov}[X,Y] = 0$ is a necessary but not sufficient condition for
independence). A collection of events $A_1,\dots,A_n$ is said to be mutually independent if
$$P[A_1\cap\dots\cap A_n] = \prod_{i=1}^{n} P[A_i]$$
(and similarly for every subcollection). Similarly, a collection of random variables $X_i$ is mutually independent if for any collection of sets
$A_i$ from their ranges the collection of events $\{X_i\in A_i\}$ is mutually independent. As
before, as a consequence one has that
$$E[X_1\cdots X_n] = \prod_{i=1}^{n} E[X_i].$$
Given two X-valued random variables $Y$ and $Z$, for any $z\in\mathrm{Range}(Z)$ we define the conditional
expectation of $Y$ given $\{Z = z\}$ as
$$E[Y\mid Z = z] := \sum_{y\in\mathrm{Range}(Y)} y\,P[Y = y\mid Z = z], \tag{2.4}$$
which is to say that $E[Y\mid Z = z]$ is just the expected value of $Y$ under the probability measure
$P[\,\cdot\mid Z = z]$.
In general, for any event $A$ we can define the conditional expectation of $Y$ given $A$ as
$$E[Y\mid A] := \sum_{y\in\mathrm{Range}(Y)} y\,P[Y = y\mid A]. \tag{2.5}$$
We can extend the definition of $E[Y\mid Z = z]$ to $E[Y\mid Z]$, which we understand to be a function of $Z$ which
takes the value $E[Y\mid Z = z]$ when $Z = z$. More formally, $E[Y\mid Z] := h(Z)$ where $h : \mathrm{Range}(Z)\to X$ is
given by $h(z) = E[Y\mid Z = z]$.
Example 1.2 (Example 1.1 continued). By clever rearrangement one does not always have to
calculate the function $E[Y\mid Z]$ explicitly:
$$E[X_7\mid X_6] = E\Big[\sum_{i=1}^{7}\omega_i\,\Big|\,X_6\Big] = E\Big[\sum_{i=1}^{6}\omega_i + \omega_7\,\Big|\,X_6\Big]
= E[X_6 + \omega_7\mid X_6] = E[X_6\mid X_6] + E[\omega_7\mid X_6] = X_6 + E[\omega_7] = X_6 + (2p-1),$$
since $\omega_7$ is independent of $\{\omega_i\}_{i=1}^{6}$ (and therefore of $X_6$) and we have $E[\omega_7] = 2p-1$.
Example 1.3 (again, Example 1.1 continued). Setting $p = 1/2$ we consider the random variable
$(X_3)^2$ and we see that
$$E[(X_3)^2\mid X_2 = 2] = \sum_{i\in\mathbb{N}} i\,P[(X_3)^2 = i\mid X_2 = 2]
= (1)^2\,P[X_3 = 1\mid X_2 = 2] + (3)^2\,P[X_3 = 3\mid X_2 = 2] = 5.$$
Of course, $X_2$ can also take the values $-2$ and $0$. For these values of $X_2$ we have
$$E[(X_3)^2\mid X_2 = -2] = (-1)^2\,P[X_3 = -1\mid X_2 = -2] + (-3)^2\,P[X_3 = -3\mid X_2 = -2] = 5,$$
$$E[(X_3)^2\mid X_2 = 0] = (-1)^2\,P[X_3 = -1\mid X_2 = 0] + (1)^2\,P[X_3 = 1\mid X_2 = 0] = 1.$$
Hence $E[(X_3)^2\mid X_2] = h(X_2)$ where
$$h(x) = \begin{cases} 5 & \text{if } x = \pm 2\\ 1 & \text{if } x = 0 \end{cases} \tag{2.6}$$
Again, note that we can arrive at the same result by cleverly rearranging the terms involved in the
computation:
$$E[X_3^2\mid X_2] = E[(X_2+\omega_3)^2\mid X_2] = E[X_2^2 + 2\omega_3 X_2 + \omega_3^2\mid X_2]
= X_2^2 + 2X_2\,E[\omega_3] + E[\omega_3^2] = X_2^2 + 1,$$
since $E[\omega_3] = 0$ and $E[\omega_3^2] = 1$. Compare this to the definition of $h$ given in (2.6) above.
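The function $h$ in (2.6) can also be recovered mechanically by enumerating all eight outcomes of the fair length-3 walk; a small sketch (our own code, $p = 1/2$):

```python
from itertools import product

def cond_exp_x3sq_given_x2():
    """Compute E[(X3)^2 | X2 = x] for the fair walk by enumeration."""
    outcomes = list(product([-1, 1], repeat=3))  # each has probability 1/8
    h = {}
    for x2 in (-2, 0, 2):
        sel = [om for om in outcomes if om[0] + om[1] == x2]
        # conditioning on {X2 = x2}: average (X3)^2 over the selected outcomes
        h[x2] = sum((om[0] + om[1] + om[2]) ** 2 for om in sel) / len(sel)
    return h

h = cond_exp_x3sq_given_x2()
# h == {-2: 5.0, 0: 1.0, 2: 5.0}, matching (2.6)
```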
2. Uncountable Probability Spaces
If we consider Example 1.1 in the case $N = \infty$ (or, even worse, if we imagine our stochastic
process to live on the continuous interval $[0,1]$) we need to consider sample spaces $\Omega$ which have uncountably
many points. To illustrate the difficulties one can encounter in this setting, let us consider the
following example:
Example 2.1. Consider $\Omega = [0,1]$ and let $P$ be the uniform probability distribution on $\Omega$, i.e.,
the measure that associates the same probability to each of the points in $\Omega$. We immediately see
that in order for $P[\Omega]$ to be finite we must have $P[\omega] = 0$ for all $\omega\in\Omega$, as otherwise
$$P[\Omega] = P\Big[\bigcup_{\omega\in\Omega}\{\omega\}\Big] = \sum_{\omega\in\Omega} P[\omega] = \infty.$$
For this reason it is not sufficient anymore to simply assign a probability to each point $\omega\in\Omega$ as we
did before. We have to assign a probability to sets:
$$P[(a,b)] = b - a \qquad\text{for } 0\le a\le b\le 1.$$
To handle settings such as the one introduced above completely rigorously we need ideas
from basic measure theory. However, if one is willing to accept a few formal rules of manipulation,
we can proceed with learning basic stochastic calculus without needing to distract ourselves with
too much measure theory.
As we did in the previous section, we can define a real random variable as a function $X : \Omega\to\mathbb{R}$.
To define the measure associated by $P$ to the values of this random variable, we specify its Cumulative
Distribution Function (CDF) $F(x)$, defined by $P[X\le x] = F(x)$. We say that an $\mathbb{R}$-valued random
variable $X$ is a continuous random variable if there exists an (absolutely continuous) density function
$\rho : \mathbb{R}\to\mathbb{R}$ so that
$$P[X\in[a,b]] = \int_a^b \rho(x)\,dx$$
for any $[a,b]\subset\mathbb{R}$. By the fundamental theorem of calculus we see that $\rho$ satisfies $\rho(x) = F'(x)$.
More generally, an $\mathbb{R}^n$-valued random variable $X$ is called a continuous random variable if there exists
a density function $\rho : \mathbb{R}^n\to\mathbb{R}_{\ge 0}$ so that
$$P[X\in[a,b]] = \int_{a_1}^{b_1}\!\!\cdots\int_{a_n}^{b_n}\rho(x_1,\dots,x_n)\,dx_1\cdots dx_n
= \int_{[a,b]}\rho(x)\,dx = \int_{[a,b]}\rho(x)\,\mathrm{Leb}(dx)$$
for any $[a,b] = \prod_{i=1}^{n}[a_i,b_i]\subset\mathbb{R}^n$. The last two expressions are just different ways of writing the same
thing. Here we have introduced the notation $\mathrm{Leb}(dx)$ for the standard Lebesgue measure on $\mathbb{R}^n$
given by $dx_1\cdots dx_n$.
If $X$ and $Y$ are $\mathbb{R}^n$-valued and $\mathbb{R}^m$-valued random variables, respectively, then the vector $(X,Y)$
is an $\mathbb{R}^{n+m}$-valued random variable; when it is again continuous, its density is called the joint
probability density function (joint density for short) of $X$ and $Y$. If $Y$ has density $\rho_Y$ and $\rho_{XY}$ is
the joint density of $X$ and $Y$, we can define
$$P[X\in A\mid Y = y] = \int_A \frac{\rho_{XY}(x,y)}{\rho_Y(y)}\,dx. \tag{2.7}$$
Hence $X$ given $Y = y$ is a new continuous random variable with density $x\mapsto\rho_{XY}(x,y)/\rho_Y(y)$ for a fixed $y$.
Finally, analogously to the countable case, we define the expectation of a continuous random
variable with density $\rho$ by
$$E[h(X)] = \int_{\mathbb{R}^n} h(x)\,\rho(x)\,dx. \tag{2.8}$$
The conditional expectation is defined using the density (2.7).
Definition 2.2. A real-valued random variable $X$ is Gaussian with mean $\mu$ and variance $\sigma^2$ if
$$P[X\in A] = \int_A \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx.$$
If a random variable has this distribution we will write $X\sim\mathcal{N}(\mu,\sigma^2)$. More generally, we say
that an $\mathbb{R}^n$-valued random variable $X$ is Gaussian with mean $\mu\in\mathbb{R}^n$ and symmetric positive-definite
covariance matrix $\Sigma\in\mathbb{R}^{n\times n}$ if
$$P[X\in A] = \int_A \frac{1}{\sqrt{(2\pi)^n\det(\Sigma)}}\,
\exp\left[-\frac{(x-\mu)^{\top}\Sigma^{-1}(x-\mu)}{2}\right] dx.$$
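In one dimension the integral in Definition 2.2 can be evaluated with the error function and checked against sampling. The sketch below uses our own illustrative parameters ($\mu = 1$, $\sigma = 2$, $A = [0,3]$).

```python
import math
import random

def gaussian_prob(a, b, mu, sigma):
    """P[a <= X <= b] for X ~ N(mu, sigma^2), via the error function."""
    z = lambda x: (x - mu) / (sigma * math.sqrt(2))
    return 0.5 * (math.erf(z(b)) - math.erf(z(a)))

def gaussian_prob_mc(a, b, mu, sigma, n=200_000, seed=1):
    """The same probability estimated by Monte Carlo sampling."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if a <= rng.gauss(mu, sigma) <= b)
    return hits / n

exact = gaussian_prob(0.0, 3.0, 1.0, 2.0)
approx = gaussian_prob_mc(0.0, 3.0, 1.0, 2.0)
```

The two numbers should agree to roughly the Monte Carlo standard error, about $10^{-3}$ here.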
While many calculations can be handled satisfactorily at this level, we will soon see that we
need to consider random variables on much more complicated spaces such as the space of real-valued
continuous functions on the time interval r0, T s which will be denoted Cpr0, T s; Rq. To give all of
the details in such a setting would require a level of technical detail which we do not wish to enter
into on our first visit to the subject of stochastic calculus. If one is willing to “suspend a little
disbelief” one can learn the formal rules of manipulation, much as one did when one first learned
regular calculus. The technical details are important but better appreciated after one first has the
big picture.
ii) $A\in\mathcal{F} \implies A^c = \Omega\setminus A\in\mathcal{F}$,
iii) given $\{A_i\}$ a countable collection of sets in $\mathcal{F}$, we have $\bigcup_{i=1}^{\infty} A_i\in\mathcal{F}$.
Thus $\mathcal{F}_Y$ consists of exactly four sets, namely $\{\emptyset, \Omega, Y^{-1}(-1), Y^{-1}(1)\}$. For a function $f : \Omega\to\mathbb{R}$
to be measurable with respect to the $\sigma$-algebra $\mathcal{F}_Y$, the inverse image of any set $B\in\mathcal{B}(\mathbb{R})$ must be one
of the four sets in $\mathcal{F}_Y$. This is another way of saying that $f$ must be constant on both $Y^{-1}(-1)$ and
$Y^{-1}(1)$. Note that $Y^{-1}(-1)\cup Y^{-1}(1) = \Omega$.
Definition 3.13. Given a probability space $(\Omega,\mathcal{F},P)$ and $A, B\in\mathcal{F}$, we say that $A$ and $B$ are
independent ($A \perp\!\!\!\perp B$) if $P[A\cap B] = P[A]\,P[B]$.
Furthermore, random variables $\{X_i\}$ are jointly independent if for all $C_i$,
$$P[X_1\in C_1,\dots,X_n\in C_n] = \prod_{i=1}^{n} P[X_i\in C_i]. \tag{2.12}$$
We recall below some properties of the expected value:
Note that the above is a measurable function. Fixing a probability space pΩ, F, Pq, we define the
conditional expectation:
Proposition 3.17. If $X$ is a random variable on $(\Omega,\mathcal{F},P)$ with $E[|X|] < \infty$, and $\mathcal{G}\subset\mathcal{F}$ is a
$\sigma$-algebra, then there is a unique random variable $Y$ on $(\Omega,\mathcal{G},P)$ such that
i) $E[|Y|] < \infty$,
ii) $E[\mathbf{1}_A Y] = E[\mathbf{1}_A X]$ for all $A\in\mathcal{G}$.
Definition 3.18. We define the conditional expectation with respect to a $\sigma$-algebra $\mathcal{G}$ as the
unique random variable $Y$ from Proposition 3.17, i.e., $E[X\mid\mathcal{G}] := Y$.
The intuition behind Definition 3.18 is that the conditional expectation with respect to a $\sigma$-algebra $\mathcal{G}\subset\mathcal{F}$
of a random variable $X$ is the $\mathcal{G}$-measurable random variable $Y$ that is equivalent (in terms
of expected value, or predictive power) to $X$ given the information contained in $\mathcal{G}$. In other words,
$Y = E[X\mid\mathcal{G}]$ is that random variable that is
²A function $g$ is convex on $I\subseteq\mathbb{R}$ if for all $x,y\in I$ with $[x,y]\subseteq I$ and for all $\lambda\in[0,1]$ one has
$g(\lambda x + (1-\lambda)y) \le \lambda g(x) + (1-\lambda)g(y)$.
The previous definition of conditional expectation with respect to a fixed set of events is obtained by evaluating
the random variable $E[X\mid\mathcal{G}]$ on the events of interest, i.e., by fixing the events in $\mathcal{G}$ that may have
occurred.
When we condition on a random variable we are really conditioning on the information that
random variable is giving to us. In other words, we are conditioning on the $\sigma$-algebra generated by
that random variable:
$$E[X\mid Z] := E[X\mid\sigma(Z)].$$
As in the discrete case, one can show that there exists a function $h : \mathrm{Range}(Z)\to X$ such that
$$E[X\mid Z](\omega) = h(Z(\omega)),$$
and hence we can think about the conditional expectation as a function of $Z(\omega)$. In particular, this
allows us to define
$$E[X\mid Z = z] := h(z).$$
Example 3.19 (Example 3.15 continued). Consider $\Omega = [0,1]$, $P\sim\mathrm{Unif}(\Omega)$ and the real random
variable $X(\omega) = e^{\omega}$. We further define $A_1 = [0,\tfrac13]$, $A_2 = (\tfrac13,\tfrac23]$, $A_3 = (\tfrac23,1]$ and $\mathcal{G} = \sigma(\{A_1,A_2,A_3\})$.
We want to find $E[X\mid\mathcal{G}]$. By definition (point i) above) the random variable $Y = E[X\mid\mathcal{G}]$ must be
measurable on $\mathcal{G}$, i.e., it must assign a unique value to all the outcomes $\omega$ in each of the intervals
$A_1, A_2, A_3$. Therefore, we can write such a random variable $Y$ as
$$Y(\omega) = \sum_{i\in I} a_i\,\mathbf{1}_{A_i}(\omega) \tag{2.15}$$
for $I = \{1,2,3\}$ and real numbers $\{a_i\}_{i\in I}$.³ It therefore only remains to specify the values $\{a_i\}_{i\in I}$ so
that $Y$ (which is now measurable with respect to $\mathcal{G}$) is the best approximation to the original random variable
$X$. We do so by enforcing condition ii) from Proposition 3.17:
$$E[Y\mathbf{1}_{A_j}] = E\Big[\sum_{i\in I} a_i\,\mathbf{1}_{A_i}\mathbf{1}_{A_j}\Big]
= \sum_{i\in I} a_i\,E[\mathbf{1}_{A_i\cap A_j}] = a_j\,P[\omega\in A_j] = \frac{a_j}{3},$$
$$E[X\mathbf{1}_{A_j}] = E[e^{\omega}\mathbf{1}_{A_j}] = \int_{A_j} e^{\omega}\,d\omega.$$
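Combining the two displays gives $a_j = 3\int_{A_j} e^{\omega}\,d\omega$, which can be evaluated in closed form and sanity-checked (for instance, the tower-type identity $E[Y] = E[X] = e - 1$ must hold). A short sketch of our own:

```python
import math

# a_j = 3 * integral of e^w over A_j, with A_1=[0,1/3], A_2=(1/3,2/3], A_3=(2/3,1]
endpoints = [0.0, 1 / 3, 2 / 3, 1.0]
a = [3 * (math.exp(endpoints[j + 1]) - math.exp(endpoints[j])) for j in range(3)]

# Consistency check: E[Y] = sum_j a_j * P[A_j] should equal E[X] = e - 1.
ey = sum(ai / 3 for ai in a)
ex = math.e - 1
```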
‚ Tower property: if $\mathcal{G}$ and $\mathcal{H}$ are both $\sigma$-algebras with $\mathcal{G}\subset\mathcal{H}$, then
$$E\big[E[X\mid\mathcal{H}]\,\big|\,\mathcal{G}\big] = E\big[E[X\mid\mathcal{G}]\,\big|\,\mathcal{H}\big] = E[X\mid\mathcal{G}].$$
Since $\mathcal{G}$ is a smaller $\sigma$-algebra, the functions which are measurable with respect to it are
contained in the space of functions measurable with respect to $\mathcal{H}$. More intuitively, being
measurable with respect to $\mathcal{G}$ means that only the information contained in $\mathcal{G}$ is left free
to vary. $E[E[X\mid\mathcal{H}]\mid\mathcal{G}]$ means: first give me your best guess given only the information
contained in $\mathcal{H}$ as input, and then reevaluate this guess making use of only the information
in $\mathcal{G}$, which is a subset of the information in $\mathcal{H}$. Limiting oneself to the information in $\mathcal{G}$ is
the bottleneck, so in the end it is the only effect one sees. In other words, once one takes
the conditional expectation with respect to a smaller $\sigma$-algebra one loses information.
Therefore, in computing $E[E[X\mid\mathcal{G}]\mid\mathcal{H}]$ one loses information (in the innermost expectation)
that cannot be recovered by the outer one.
‚ Optimal approximation: the conditional expectation with respect to a $\sigma$-algebra $\mathcal{G}\subset\mathcal{F}$
is the $\mathcal{G}$-measurable random variable $Y$ such that
$$E[X\mid\mathcal{G}] = \operatorname*{argmin}_{Y \text{ meas.\ w.r.t.\ }\mathcal{G}} E\big[(X - Y)^2\big]. \tag{2.16}$$
This should be thought of as the best guess of the value of $X$ given the information in $\mathcal{G}$.
Example 3.20 (Example 3.12 continued). In the previous example, $E[X\mid\mathcal{F}_Y]$ is the best
approximation to $X$ which is measurable with respect to $\mathcal{F}_Y$, that is, constant on $Y^{-1}(-1)$ and
$Y^{-1}(1)$. In other words, $E[X\mid\mathcal{F}_Y]$ is the random variable built from a function $h_{\min}$ composed with
the random variable $Y$ such that the expression
$$E\big[(X - h_{\min}(Y))^2\big]$$
is minimized. Since $Y(\omega)$ takes only two values in our example, the only details of $h_{\min}$ which matter
are its values at $1$ and $-1$. Furthermore, since $h_{\min}(Y)$ only depends on the information in $Y$, it
is measurable with respect to $\mathcal{F}_Y$. If by chance $X$ is measurable with respect to $\mathcal{F}_Y$, then the best
approximation to $X$ is $X$ itself. So in that case $E[X\mid\mathcal{F}_Y](\omega) = X(\omega)$.
In light of (2.16), we see that
$$E[X\mid Y_1,\dots,Y_k] = E[X\mid\sigma(Y_1,\dots,Y_k)].$$
This fits with our intuitive idea that $\sigma(Y_1,\dots,Y_k)$ embodies the information contained in the
random variables $Y_1,\dots,Y_k$ and that $E[X\mid\sigma(Y_1,\dots,Y_k)]$ is our best guess at $X$ if we only know
the information in $\sigma(Y_1,\dots,Y_k)$.
ii) $P[X < x] = P[Y < x]$ for all $x\in\mathbb{R}$.⁴
As in the case of functions, there are many ways a sequence of random variables $\{X_n\}_{n\in\mathbb{N}}$ can
converge to another random variable $X$:
Definition 4.3. Let $\{X_n\}_{n\in\mathbb{N}}$ be a sequence of random variables on a probability space $(\Omega,\mathcal{F},P)$,
and let $X$ be a random variable on the same space. Then
‚ almost sure convergence: $\{X_n\}$ converges to $X$ almost surely if
$$P\big[\{\omega\in\Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\big] = 1,$$
‚ weak convergence: $\{X_n\}$ converges weakly (or in distribution) to $X$ if
$$\lim_{n\to\infty} P[X_n < x] = P[X < x]$$
for every $x\in\mathbb{R}$ at which $x\mapsto P[X < x]$ is continuous.
Remark 4.4. The above notions of convergence can be ordered by strength: we have the following implications
almost sure convergence $\Rightarrow$ convergence in probability $\Rightarrow$ weak convergence,
and, for $1\le q\le p < \infty$,
convergence in $L^p$ $\Rightarrow$ convergence in $L^q$ $\Rightarrow$ convergence in probability.
Moreover, we note that in order to have convergence in distribution the two random variables do
not need to live on the same probability space.
A useful method of showing that the distribution of a sequence of random variables converges to
another is to consider the associated sequence of Fourier transforms, or the characteristic function
of a random variable as it is called in probability theory.
Definition 4.5. The characteristic function (or Fourier transform) of a random variable $X$ is
defined as
$$\psi(t) = E[\exp(itX)]$$
for all $t\in\mathbb{R}$.
It is a basic fact that the characteristic function of a random variable uniquely determines its
distribution. Furthermore, the following convergence theorem is a classical theorem from probability
theory.
Theorem 4.6. Let $X_n$ be a sequence of real-valued random variables and let $\psi_n$ be the associated
characteristic functions. Assume that there exists a function $\psi$ so that for each $t\in\mathbb{R}$,
$$\lim_{n\to\infty}\psi_n(t) = \psi(t).$$
If $\psi$ is continuous at zero, then there exists a random variable $X$ so that the distribution of $X_n$
converges to the distribution of $X$. Furthermore, the characteristic function of $X$ is $\psi$.
⁴Or, equivalently, if $P[X\in A] = P[Y\in A]$ for all continuity sets $A$.
Example 4.7. Note that if $X\sim\mathcal{N}(m,\sigma^2)$, a Fourier transform computation gives
$$E[e^{i\lambda X}] = e^{i\lambda m - \frac{\sigma^2\lambda^2}{2}}$$
for all $\lambda$. Using this, we say that $X = (X_1,\dots,X_k)$ is a $k$-dimensional Gaussian if there exist
$m\in\mathbb{R}^k$ and a positive definite symmetric $k\times k$ matrix $R$ so that for all $\lambda\in\mathbb{R}^k$ we have
$$E[e^{i\lambda\cdot X}] = e^{i\lambda\cdot m - \frac{(R\lambda)\cdot\lambda}{2}}.$$
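Theorem 4.6 is the mechanism behind the central limit theorem: for $S_n$ a sum of $n$ fair $\pm 1$ steps, independence gives $E[e^{itS_n/\sqrt{n}}] = \cos(t/\sqrt{n})^n$, which converges pointwise to the standard Gaussian characteristic function $e^{-t^2/2}$. A small numerical sketch (our own parameter choices):

```python
import math

def char_fn_walk(t, n):
    """Characteristic function of S_n / sqrt(n) for a fair +-1 walk:
    E[exp(i t S_n / sqrt(n))] = cos(t / sqrt(n))**n (real by symmetry)."""
    return math.cos(t / math.sqrt(n)) ** n

def char_fn_gauss(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2)

# pointwise convergence at t = 1.0 as n grows
vals = [char_fn_walk(1.0, n) for n in (10, 100, 10_000)]
limit = char_fn_gauss(1.0)
```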
CHAPTER 3
Brownian Motion and Stochastic Processes
For intermediate times $t\in[0,1]$ not of the form $k2^{-n}$ for some $k$, we define the function as the linear
function connecting the two nearest points of the form $k2^{-n}$. In other words, if $t\in[s,r]$ where
$s = k2^{-n}$ and $r = (k+1)2^{-n}$, then
$$B^{(n)}(t) = \frac{r-t}{2^{-n}}\,B^{(n)}(s) + \frac{t-s}{2^{-n}}\,B^{(n)}(r).$$
We will see momentarily that $B^{(n)}$ has the following properties, independent of $n$:
i) $B^{(n)}(0) = 0$.
ii) $E[B^{(n)}(t)] = 0$ for all $t\in[0,1]$.
iii) $E[|B^{(n)}(t) - B^{(n)}(s)|^2] = t - s$ for $0\le s < t\le 1$ of the form $k2^{-n}$.
iv) The distribution of $B^{(n)}(t) - B^{(n)}(s)$ is Gaussian for $0\le s < t\le 1$ of the form $k2^{-n}$.
v) The collection of random variables $B^{(n)}(t_i) - B^{(n)}(t_{i-1})$ over disjoint time intervals is mutually independent.
The first property is clear since the sum in (3.1) is empty. The second property for $t = k2^{-n}$
follows from
$$E[B^{(n)}(t)] = \sum_{j=1}^{k} E[\xi_j^{(n)}] = 0,$$
since $E[\xi_j^{(n)}] = 0$ for all $j\in\{1,\dots,2^n\}$. For general $t$, we have $t\in(s,r) = (k2^{-n},(k+1)2^{-n})$ for some
$k\in\{1,\dots,2^n\}$, so that
$$E[B^{(n)}(t)] = \frac{r-t}{2^{-n}}\,E[B^{(n)}(s)] + \frac{t-s}{2^{-n}}\,E[B^{(n)}(r)] = 0.$$
To see the second moment calculation, take $s = m2^{-n}$ and $t = k2^{-n}$ and observe that
$$E\big[|B^{(n)}(t) - B^{(n)}(s)|^2\big]
= E\Big[\Big(\sum_{j=m+1}^{k}\xi_j^{(n)}\Big)\Big(\sum_{\ell=m+1}^{k}\xi_\ell^{(n)}\Big)\Big]
= \sum_{j=m+1}^{k}\sum_{\ell=m+1}^{k} E\big[\xi_j^{(n)}\xi_\ell^{(n)}\big]$$
$$= \sum_{j=m+1}^{k} E\big[(\xi_j^{(n)})^2\big]
+ \sum_{j=m+1}^{k}\sum_{\substack{\ell=m+1\\ \ell\neq j}}^{k} E\big[\xi_j^{(n)}\big]\,E\big[\xi_\ell^{(n)}\big]
= \sum_{j=m+1}^{k} 2^{-n} = k2^{-n} - m2^{-n} = t - s,$$
since $\xi_j^{(n)}$ and $\xi_\ell^{(n)}$ are independent if $j\neq\ell$, while by definition $E[\xi_j^{(n)}] = 0$ and $E[(\xi_j^{(n)})^2] = 2^{-n}$.
Since $B^{(n)}(t)$ is just a sum of independent Gaussians, it is also Gaussian, with mean
and variance given by the sums of the individual means and variances, respectively.
Because for disjoint time intervals the differences $B^{(n)}(t_i) - B^{(n)}(t_{i-1})$ are sums over disjoint
collections of the $\xi_i^{(n)}$'s, they are mutually independent.
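These identities are easy to watch numerically: summing $2^n$ independent Gaussian steps with mean $0$ and variance $2^{-n}$, the endpoint $B^{(n)}(1)$ should have mean $0$ and variance $1 = t - s$ (with $s = 0$, $t = 1$). A quick simulation sketch of our own:

```python
import random

def sample_endpoint(n, rng):
    """B^(n)(1): sum of 2**n independent N(0, 2^-n) Gaussian steps."""
    sd = 2 ** (-n / 2)  # standard deviation of each step
    return sum(rng.gauss(0.0, sd) for _ in range(2 ** n))

rng = random.Random(42)
n = 6                      # 64 steps per path
paths = 20_000
samples = [sample_endpoint(n, rng) for _ in range(paths)]
mean = sum(samples) / paths
var = sum((x - mean) ** 2 for x in samples) / paths
# mean should be near 0 and var near 1
```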
Since all of these properties are independent of $n$, it is tempting to think about the limit as
$n\to\infty$, with the mesh becoming increasingly fine. It is not clear that such a limit exists, as the
curves $B^{(n)}$ become increasingly “wiggly.” We will see that in fact it does. We begin by taking
an abstract perspective in the next sections, though we will return to a more concrete perspective at
the end.
iii) The increments $B_t - B_s$ are Gaussian random variables with mean $0$ and variance given
by the length of the interval:
$$\mathrm{Var}(B_t - B_s) = |t - s|.$$
iv) The paths $t\mapsto B_t(\omega)$ are continuous with probability one.
In particular, we call a process satisfying assumption iv) above continuous:
Definition 3.2. A stochastic process is continuous if its paths $t\mapsto X_t(\omega)$ are continuous with
probability one.
This allows us to define Brownian motion succinctly as:
Brownian motion is a continuous stochastic process with independent increments $B_t - B_s \sim \mathcal{N}(0, t-s)$.
The first two points of the above definition specify the distribution of the increments of Brownian
motion, which in turn defines the distribution of the marginals (3.2). Indeed, we can “separate” the
event of a path running through two sets $A_1, A_2$ at times $t_1 < t_2$ by considering the events of
i) the path arriving at $y\in A_1$ at time $t_1$, and
ii) the path arriving at $z\in A_2$ conditioned on starting at $y$.
For a fixed $y$, the second event depends exclusively on the increment $B_{t_2} - B_{t_1}$, and by the definition
of Brownian motion we can write:
$$P[B_{t_1}\in A_1, B_{t_2}\in A_2] = P\big[B_{t_1} - B_0\in A_1,\; B_{t_1} + (B_{t_2} - B_{t_1})\in A_2\big]$$
$$= \int_{A_1} P\big[y + (B_{t_2} - B_{t_1})\in A_2 \,\big|\, B_{t_1} = y\big]\; P[B_{t_1} - B_0\in dy]$$
$$= \int_{A_1}\int_{A_2} P\big[y + (B_{t_2} - B_{t_1})\in dz\big]\; P[B_{t_1} - B_0\in dy]$$
$$= \int_{A_1}\int_{A_2} \rho(y,z,t_2 - t_1)\,dz\;\rho(0,y,t_1)\,dy \tag{3.3}$$
where in the third line we have used the independence of the increments, and in the last line we
have used that the increments are normal random variables. Furthermore, we have defined
$$\rho(x_1,x_2,\Delta t) = \frac{1}{\sqrt{2\pi\Delta t}}\, e^{-\frac{(x_1-x_2)^2}{2\Delta t}} \tag{3.4}$$
which can be interpreted as the probability density of a transition from $x_1$ to $x_2$ in a time interval
$\Delta t$. The above conditioning procedure extends directly to any finite number of marginals.
The family of probability distributions that this process generates is compatible in the sense of the
compatibility definition referenced above, and therefore Theorem 2.6 guarantees the existence of the process we have described.
More precisely, Theorem 2.6 guarantees the existence of a process with properties i) and ii) from
above.
Notice that the above definition makes no mention of continuity. It turns out that the finite
dimensional distributions cannot guarantee that a stochastic process is almost surely continuous
(though they can imply that it is possible for a given process to be continuous, as we see below). Indeed,
defining the distribution of the marginals still leaves us with some freedom in the choice of the
process. This is captured by the following definition:
Definition 3.3. A stochastic process $\{X_t\}$ is a version (or modification) of a second stochastic
process $\{Y_t\}$ if for all $t$, $P[X_t = Y_t] = 1$. Notice that this is a symmetric relation.
We now give an example showing that different versions of a process, despite having by definition
the same distribution, can have different continuity properties:
Example 3.4. We consider the probability space $(\Omega,\mathcal{F},P) = ([0,1],\mathcal{B},\mathrm{unif}([0,1]))$. On this
space we define two processes:
$$X_t(\omega) = 0 \qquad\text{and}\qquad Y_t(\omega) = \begin{cases} 0 & \text{if } t\neq\omega\\ 1 & \text{else}\end{cases}$$
We immediately see that for any $t\in[0,1]$,
$$P[X_t\neq Y_t] = P[\omega = t] = 0,$$
so that one process is a version of the other. However, all of the paths of $X_t$ are
continuous, while none of the paths of $Y_t$ are.
The previous example showcases a family of marginal distributions that is trivially compatible
with the continuity of the process it represents (the process can have a continuous version). The
following theorem gives sufficient conditions on the distribution of the marginals guaranteeing that the
corresponding process has a continuous version. We use this result to prove that, in particular, the
marginals defined by points i) and ii) in Def. 3.1 are compatible with iii), i.e., there exists a process
satisfying all such conditions.
Theorem 3.5 (Kolmogorov Continuity Theorem (a version)). Suppose that a stochastic process
$\{X_t\}$, $t\ge 0$, satisfies the following estimate: for all $T > 0$ there exist positive constants $\alpha, \beta, D$ so that
$$E[|X_t - X_s|^{\alpha}] \le D\,|t - s|^{1+\beta} \qquad \forall\, t,s\in[0,T]. \tag{3.5}$$
Then there exists a version of $X_t$ which is continuous.
Remark 3.6. The estimate (3.5) holds for Brownian motion. We give the details in
one dimension. First recall that if $X$ is a Gaussian random variable with mean $0$ and variance $\sigma^2$,
then $E[X^4] = 3\sigma^4$. Applying this to Brownian motion we have $E[|B_t - B_s|^4] = 3|t - s|^2$ and conclude
that (3.5) holds with $\alpha = 4$, $\beta = 1$, $D = 3$. Hence it is not incompatible with the already assumed
properties of Brownian motion to assume that $B_t$ is continuous almost surely.
Remark 3.6 shows that continuity is a fundamental attribute of Brownian motion. In fact we have the
following second (and equivalent) definition of Brownian motion which assumes a form of continuity as a
basic assumption, replacing other assumptions.
Theorem 3.7. Let $B_t$ be a stochastic process such that the following conditions hold:
i) $E(B_1^2) = $ constant,
ii) $B_0 = 0$ almost surely,
iii) $B_{t+h} - B_t$ is independent of $\{B_s : s\le t\}$,
iv) the distribution of $B_{t+h} - B_t$ is independent of $t\ge 0$ (stationary increments),
v) (continuity in probability) for all $\delta > 0$,
$$\lim_{h\to 0} P[|B_{t+h} - B_t| > \delta] = 0.$$
iii) The increments $B_t - B_s$ are Gaussian random variables with mean $0$ and variance given
by the length of the interval: denoting by $(B_t)_i$ the $i$-th component of $B_t$,
$$\mathrm{Cov}\big((B_t - B_s)_i, (B_t - B_s)_j\big) = \begin{cases} |t - s| & \text{if } i = j\\ 0 & \text{else}\end{cases}$$
and at intermediate times as the value of the line connecting the two nearest points, then $W^{(n)}$ has
the same distribution as $B^{(n)}$ from Section 1.
Theorem 4.1. With probability one, the sequence of functions $W^{(n)}(t)(\omega)$ converges
uniformly to a continuous function $B_t(\omega)$ as $n\to\infty$, and the process $B_t(\omega)$ is a Brownian motion
on $[0,1]$.
Proof. Now for $n\ge 0$ and $k\in\{1,\dots,2^n\}$ define
$$Z_k^{(n)} = \sup_{t\in[(k-1)2^{-n},\,k2^{-n}]} \big|W^{(n)}(t) - W^{(n+1)}(t)\big|.$$
A fourth-moment Chebyshev-type estimate then gives the bound
$$P\Big[\sup_{t\in[0,1]}\big|W^{(n)}(t) - W^{(n+1)}(t)\big| > \delta\Big]
\le 2^n\,\frac{3\cdot 2^{-2(n+2)}}{\delta^4} =: \psi(n,\delta).$$
Since $\psi(n, 2^{-n/5}) \sim c\,2^{-n/5}$ for some $c > 0$, we have that
$$\sum_{n=1}^{\infty} P\Big[\sup_{t\in[0,1]}\big|W^{(n)}(t) - W^{(n+1)}(t)\big| > 2^{-n/5}\Big] < \infty.$$
Hence the Borel–Cantelli lemma implies that with probability one there exists a random $k(\omega)$ so
that if $n\ge k(\omega)$ then
$$\sup_{t\in[0,1]}\big|W^{(n)}(t) - W^{(n+1)}(t)\big| \le 2^{-n/5}.$$
In other words, with probability one the $\{W^{(n)}\}$ form a Cauchy sequence. Let $B_t$ denote the limit.
It is not hard to see that $B_t$ has the properties that define Brownian motion. Furthermore, since
each $W^{(n)}$ is uniformly continuous and the sequence converges in the supremum norm to $B_t$, we conclude that
with probability one $B_t$ is also uniformly continuous.
we see that
$$V_1[f](0,t) \le \Big(\sup_{s\in[0,t]}|f'(s)|\Big)\,t.$$
in $L^2(\Omega,P)$.
Corollary 5.4. Under the conditions of the above theorem we have $\lim_{N\to\infty} Q_N(T) = T$ in
probability.
Proof. By Chebyshev's inequality, for any $\varepsilon > 0$,
$$P\big[\omega : |Z_N(\omega) - T| > \varepsilon\big] \le \frac{E[|Z_N(\omega) - T|^2]}{\varepsilon^2} \to 0
\qquad\text{as } |\Gamma^{(N)}|\to 0.$$
Proof of Theorem 5.3. Fix any sequence of partitions
$$\Gamma^{(N)} := \big\{t_k^{(N)} : 0 = t_1^{(N)} < t_2^{(N)} < \dots < t_N^{(N)} = T\big\}$$
of $[0,T]$ with mesh $|\Gamma^{(N)}|\to 0$ as $N\to\infty$. Defining
$$Z_N := \sum_{k=1}^{N-1}\big[B(t_{k+1}^{(N)}) - B(t_k^{(N)})\big]^2,$$
we need to show that
$$E\big[(Z_N - T)^2\big] \to 0 \qquad\text{as } N\to\infty.$$
We have, since $E[Z_N] = T$,
$$E\big[(Z_N - T)^2\big] = E[Z_N^2] - 2T\,E[Z_N] + T^2 = E[Z_N^2] - T^2.$$
Using the convenient notation $\Delta_k B := B(t_{k+1}^{(N)}) - B(t_k^{(N)})$ and $\Delta_k t^{(N)} := |t_{k+1}^{(N)} - t_k^{(N)}|$, we have that
\[ E[Z_N^2] = E\Big[ \sum_n \sum_k (\Delta_n B)^2 (\Delta_k B)^2 \Big] = E\Big[ \sum_n (\Delta_n B)^4 \Big] + E\Big[ \sum_{n \ne k} (\Delta_k B)^2 (\Delta_n B)^2 \Big] = 3 \sum_n (\Delta_n t^{(N)})^2 + \sum_{n \ne k} (\Delta_k t^{(N)})(\Delta_n t^{(N)}), \]
since $E(\Delta_k B)^2 = \Delta_k t^{(N)}$ and $E(\Delta_k B)^4 = 3(\Delta_k t^{(N)})^2$, because $\Delta_k B$ is a Gaussian random variable with mean zero and variance $\Delta_k t^{(N)}$.
The first term tends to 0 as the maximum partition spacing goes to zero, since
\[ 3 \sum_n (\Delta_n t^{(N)})^2 \le 3\, \sup_n (\Delta_n t^{(N)})\, T. \]
Returning to the remaining term,
\[ \sum_{n \ne k} (\Delta_k t^{(N)})(\Delta_n t^{(N)}) = \sum_{k=1}^{N} \Delta_k t^{(N)} \Big[ \sum_{n=1}^{k-1} \Delta_n t^{(N)} + \sum_{n=k+1}^{N} \Delta_n t^{(N)} \Big] = \sum_{k=1}^{N} \Delta_k t^{(N)} \big( T - \Delta_k t^{(N)} \big) = T \sum_k \Delta_k t^{(N)} - \sum_k (\Delta_k t^{(N)})^2 \longrightarrow T^2 - 0. \]
Summarizing, we have shown that
\[ E[(Z_N - T)^2] \to 0 \quad \text{as } N \to \infty. \qquad \square \]
Corollary 5.5. Under the conditions of Theorem 5.3, if the partitions are nested, i.e. $\Gamma^{(N)} \subset \Gamma^{(N+1)}$, then $\lim_{N \to \infty} Q_N(T) = T$ almost surely.
Proof. You can prove the above result as an exercise. To do so, repeat the proof of the main theorem and apply the Borel--Cantelli lemma where necessary. $\square$
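The concentration of $Z_N$ around $T$ is easy to see numerically. The following sketch (Python with NumPy, an illustration rather than a proof) computes the sum of squared Brownian increments over a uniform partition of $[0,T]$:

```python
import numpy as np

# Monte Carlo check of Theorem 5.3 / Corollary 5.4: the sum of squared Brownian
# increments over a partition of [0, T] concentrates around T as the mesh -> 0.
rng = np.random.default_rng(1)
T = 2.0

def quadratic_variation(N):
    """Z_N = sum of squared increments over the uniform N-interval partition of [0, T]."""
    dB = rng.normal(0.0, np.sqrt(T / N), N)
    return np.sum(dB ** 2)

Z_coarse = quadratic_variation(10)
Z_fine = quadratic_variation(100_000)
# For the uniform partition E[(Z_N - T)^2] = 2 T^2 / N, so Z_fine is very close to T.
assert abs(Z_fine - T) < 0.1
```

The variance formula $E[(Z_N - T)^2] = 2T^2/N$ used in the comment is exactly the uniform-partition case of the computation in the proof above.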
since $B^{(n)}_{t_k} - B^{(n)}_{t_\ell}$ is independent of $B^{(n)}_{t_\ell}$ and $E[B^{(n)}_{t_k} - B^{(n)}_{t_\ell}] = 0$. We will choose to view this single fact as the result of two finer-grained facts. The first is that the distribution of the walk at time $t_k$ given the values $\{B^{(n)}_s : s < t_\ell\}$ is the same as the conditional distribution of the walk at time $t_k$ given only $B^{(n)}_{t_\ell}$. In light of Definition 4.1, we can state this more formally by saying that for all functions $f$
\[ E\big[ f(B^{(n)}_{t_k}) \,\big|\, \mathcal{F}_{t_\ell} \big] = E\big[ f(B^{(n)}_{t_k}) \,\big|\, B^{(n)}_{t_\ell} \big], \]
where $\mathcal{F}_t = \sigma(B^{(n)}_s : s \le t)$. This property is called the Markov property: the distribution of the future depends on the past only through the present value of the process.
There is a stronger version of this property, called the strong Markov property, which states that one can in fact stop the process, restart it from the current (random) value, run it for the remaining amount of time, and obtain the same answer. To state this more precisely, let us introduce the process $X(t) = x + B^{(n)}(t)$, the random walk starting from the point $x$, and let $P_x$ be the probability distribution induced on $C([0,1], \mathbb{R})$ by the trajectory of $X(t)$ for fixed initial $x$. Let $E_x$ be the expected value associated to $P_x$. Of course, $P_0$ is simply the law of the random walk starting from 0 that we have previously been considering. Then the strong Markov property states that for any function $f$
\[ E_0 f(X_{t_k}) = E_0 F(X_{t_\ell}, t_k - t_\ell) \quad \text{where } F(x,t) = E_x f(X_t). \]
Neither of these Markov properties alone is enough to produce (3.10). We also need a fact about the mean of the process given the past. Again defining $\mathcal{F}_t = \sigma(B^{(n)}_s : s \le t)$, we can rewrite (3.10) as
\[ E\big[ B^{(n)}_{t_k} \,\big|\, \mathcal{F}_{t_\ell} \big] = B^{(n)}_{t_\ell} \]
by using the Markov property. This equality is the principal fact that makes a process what is called a martingale.
We now revisit these ideas, making more general definitions which abstract these properties so we can talk about and use them in broader contexts.
Expanding the exponent of the above expression, or even better applying (3.11), we see that Brownian motion has mean $\mu_t = 0$ for all $t > 0$ and covariance
\[ \mathrm{Cov}(B_t, B_s) = \mathrm{Cov}\big( B_s + (B_t - B_s), B_s \big) = E[B_s^2] + E[(B_t - B_s) B_s] = E[B_s^2] + E[B_t - B_s]\, E[B_s] = s. \]
Here we have assumed without loss of generality that $s < t$, and in the third identity we have used the independence of increments of Brownian motion. This shows that for general $t, s \ge 0$ we have
\[ \mathrm{Cov}(B_t, B_s) = \min\{t, s\}. \tag{3.12} \]
In fact, because the mean and covariance structure of a Gaussian process completely determine the properties of its marginals, if a Gaussian process has the same covariance and mean as a Brownian motion, then it is a Brownian motion.
Theorem 7.3. A Brownian motion is a Gaussian process with zero mean function and covariance function $\min(t,s)$. Conversely, a Gaussian process with zero mean function and covariance function $\min(t,s)$ is a Brownian motion.
Proof. Example 7.2 proves the forward direction. To prove the reverse direction, assume that $X_t$ is a Gaussian process with zero mean and $\mathrm{Cov}(X_t, X_s) = \min(t,s)$. Then the increments of the process, given by $(X_t, X_{t+s} - X_t)$, are Gaussian random variables with mean 0. The variance of the increment $X_{t+s} - X_t$ is given by
\[ \mathrm{Var}(X_{t+s} - X_t) = \mathrm{Cov}(X_{t+s}, X_{t+s}) - 2\,\mathrm{Cov}(X_t, X_{t+s}) + \mathrm{Cov}(X_t, X_t) = (t+s) - 2t + t = s. \]
The independence of $X_t$ and $X_{t+s} - X_t$ follows immediately from
\[ \mathrm{Cov}(X_t, X_{t+s} - X_t) = \mathrm{Cov}(X_t, X_{t+s}) - \mathrm{Cov}(X_t, X_t) = t - t = 0, \]
since jointly Gaussian random variables with zero covariance are independent. $\square$
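The covariance structure (3.12) is easy to check empirically. The sketch below (Python with NumPy; an illustration, not part of the theory) simulates many independent Brownian paths and compares the sample covariance at two fixed times with $\min(t,s)$:

```python
import numpy as np

# Empirical check of (3.12): Cov(B_t, B_s) = min(t, s). The sampling error is
# governed by 1/sqrt(n_paths), so we use a loose tolerance.
rng = np.random.default_rng(2)
n_paths, n_steps, T = 50_000, 100, 1.0
dt = T / n_steps
increments = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
paths = np.cumsum(increments, axis=1)   # paths[:, k] has the law of B_{(k+1) dt}

s_idx, t_idx = 29, 79                   # times s = 0.30 and t = 0.80
Bs, Bt = paths[:, s_idx], paths[:, t_idx]
cov_hat = np.mean(Bs * Bt) - np.mean(Bs) * np.mean(Bt)
assert abs(cov_hat - 0.30) < 0.02       # min(0.30, 0.80) = 0.30
```

The same experiment with $s$ and $t$ swapped gives the identical answer, reflecting the symmetry of $\min(t,s)$.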
Martingales. In order to introduce the concept of a martingale, we first adapt the concept of a $\sigma$-algebra to the framework of stochastic processes. In particular, in the case of a stochastic process we would like to encode the idea of the history of a process: by observing a process up to a time $t > 0$ we have all the information on the behavior of the process before that time, but none after it. Furthermore, as $t$ increases, the amount of information we have on the process increases. This idea underlies the concept of filtration:
Definition 7.4. Given an index set $T$, a filtration of $\sigma$-algebras is a collection of $\sigma$-algebras $\{\mathcal{F}_t\}_{t \in T}$ such that for all $t_1 < \cdots < t_m \in T$ we have
\[ \mathcal{F}_{t_1} \subset \cdots \subset \mathcal{F}_{t_m}. \]
We now define in which sense a filtration contains the information associated to a certain process.
Definition 7.5. A stochastic process $\{X_t\}$ is adapted to a filtration $\{\mathcal{F}_t\}$ if its marginals are measurable with respect to the corresponding $\sigma$-algebras, i.e., if $\sigma(X_t) \subseteq \mathcal{F}_t$ for all $t \in T$. In this case we say that the process is adapted to the filtration $\{\mathcal{F}_t\}$.
We also extend the concept of the $\sigma$-algebra generated by a random variable to the case of a filtration. In this case, the filtration generated by a process $\{X_t\}$ is the smallest filtration containing enough information about $\{X_t\}$.
Definition 7.6. Let $\{X_t\}$ be a stochastic process on $(\Omega, \mathcal{F}, P)$. Then the filtration $\{\mathcal{F}_t^X\}$ generated by $X_t$ is given by
\[ \mathcal{F}_t^X = \sigma(X_s \mid 0 \le s \le t), \]
which means the smallest $\sigma$-algebra with respect to which the random variables $X_s$ are measurable for all $s \in [0,t]$. Thus $\mathcal{F}_t^X$ contains $\sigma(X_s)$ for all $0 \le s \le t$.
The canonical example of this is the filtration generated by a discrete random process (i.e., $T = \mathbb{N}$):
Example 7.7 (Example 1.1 continued). The filtration generated by the $N$-coin-flip process for $m < N$ is
\[ \mathcal{F}_m := \sigma(X_0, \dots, X_m). \]
Intuitively, we will think of $\mathcal{F}_t^X$ as the history of $\{X_t\}$ up to time $t$, or the "information" about $\{X_t\}$ up to time $t$. Roughly speaking, an event $A$ is in $\mathcal{F}_t^X$ if its occurrence can be determined by knowing $\{X_s\}$ for all $s \in [0,t]$.
Example 7.8. Let $B_t$ be a Brownian motion and consider the event
\[ A = \Big\{ \omega \in \Omega : \max_{s \in (0, 1/2)} |B_s(\omega)| \le 2 \Big\}. \]
It is clear that $A \in \mathcal{F}_{1/2}$, as the history of $B_t$ up to time $t = 1/2$ determines whether $A$ has occurred or not. However, $A \notin \mathcal{F}_{1/3}$, as the process may not yet have exceeded 2 in absolute value by time $t = 1/3$ but may do so before $t = 1/2$.
We now have all the tools to define the concept of a martingale:
Definition 7.9. $\{X_t\}$ is a martingale with respect to a filtration $\{\mathcal{F}_t\}$ if for all $t > s$ we have
i) $X_t$ is $\mathcal{F}_t$-measurable,
ii) $E[|X_t|] < \infty$,
iii) $E[X_t \mid \mathcal{F}_s] = X_s$.
Condition iii) in Def. 7.9 involves a conditional expectation with respect to the $\sigma$-algebra $\mathcal{F}_s$. Recall that $E[X_{t+s} \mid \mathcal{F}_t]$ is an $\mathcal{F}_t$-measurable random variable which approximates $X_{t+s}$ in a certain optimal way (and it is uniquely defined). Then Def. 7.9 states that, given the history of $X_t$ up to time $t$, our best estimate of $X_{t+s}$ is simply $X_t$, the value of $\{X_t\}$ at the present time $t$. In a certain sense, a martingale is the stochastic-calculus analogue of a constant function.
Example 7.10. Brownian motion is a martingale wrt $\{\mathcal{F}_t^B\}$. Indeed, we have that
\[ E[B_{t+s} \mid \mathcal{F}_t^B] = E[B_t + (B_{t+s} - B_t) \mid \mathcal{F}_t^B] = E[B_t \mid \mathcal{F}_t^B] + E[B_{t+s} - B_t \mid \mathcal{F}_t^B] = B_t + 0, \]
by the independence-of-increments property.
The above strategy can be extended to general functions $g$, as the only property that was used is the independence of the increments: since $B_{t+s} - B_t$ is independent of $\mathcal{F}_t^B$, this property implies that
\[ E\big[ g(B_{t+s} - B_t) \mid \mathcal{F}_t^B \big] = E\big[ g(B_{t+s} - B_t) \big]. \tag{3.13} \]
Example 7.11. The process $X_t := B_t^2 - t$ is a martingale wrt $\{\mathcal{F}_t^B\}$. Indeed, the process is obviously measurable wrt $\{\mathcal{F}_t^B\}$, and $E[|B_t|^2] = t < \infty$, verifying i) and ii) from Def. 7.9. For iii) we have
\[ E[B_{t+s}^2 \mid \mathcal{F}_t^B] = E[(B_t + B_{t+s} - B_t)^2 \mid \mathcal{F}_t^B] = E[B_t^2 \mid \mathcal{F}_t^B] + 2 B_t\, E[B_{t+s} - B_t \mid \mathcal{F}_t^B] + E[(B_{t+s} - B_t)^2 \mid \mathcal{F}_t^B] = B_t^2 + s. \]
Subtracting $t + s$ on both sides of the above equation we obtain $E[B_{t+s}^2 - (t+s) \mid \mathcal{F}_t^B] = B_t^2 - t$.
Example 7.12. The process $Y_t := \exp[\lambda B_t - \lambda^2 t/2]$ is a martingale wrt $\{\mathcal{F}_t^B\}$. Again, the process is obviously measurable wrt $\{\mathcal{F}_t^B\}$ and, computing the moment generating function of a Gaussian random variable, we have $E[\exp(\lambda B_t)] = \exp(t \lambda^2/2) < \infty$, verifying i) and ii) from Def. 7.9. For iii) we have
\[ E[\exp(\lambda B_{t+s}) \mid \mathcal{F}_t^B] = E[\exp(\lambda (B_t + B_{t+s} - B_t)) \mid \mathcal{F}_t^B] = \exp(\lambda B_t)\, E[\exp(\lambda (B_{t+s} - B_t)) \mid \mathcal{F}_t^B] = \exp(\lambda B_t) \exp(s \lambda^2/2). \]
Multiplying both sides of the above equation by $\exp[-(t+s)\lambda^2/2]$ we obtain
\[ E\big[ \exp\big( \lambda B_{t+s} - \lambda^2 (t+s)/2 \big) \mid \mathcal{F}_t^B \big] = \exp\big( \lambda B_t - \lambda^2 t/2 \big). \]
Markov processes. We now turn to the general idea of a Markov process. As we have seen in the examples above, this family of processes has the "memoryless" property, i.e., their future depends on their past (their history, their filtration) only through their present (their state at the present time, or the $\sigma$-algebra generated by the random variable of the process at that time).
In the discrete-time and countable sample space setting, this holds if, given $t_1 < \cdots < t_m < t$, the distribution of $X_t$ given $(X_{t_1}, \dots, X_{t_m})$ equals the distribution of $X_t$ given $X_{t_m}$. This is the case if for all $A \in \mathcal{B}(X)$ and $s_1, \dots, s_m \in X$ we have
\[ P[X_t \in A \mid X_{t_1} = s_1, \dots, X_{t_m} = s_m] = P[X_t \in A \mid X_{t_m} = s_m]. \]
This property can be stated in more general terms as follows:
Definition 7.13. A random process $\{X_t\}$ is called Markov with respect to a filtration $\{\mathcal{F}_t\}$ when $X_t$ is adapted to the filtration and, for any $s > t$, $X_s$ is independent of $\mathcal{F}_t$ given $X_t$.
The above definition can be restated in terms of Brownian motion as follows: for any set $A \in \mathcal{B}(\mathbb{R})$ we have
\[ P[B_t \in A \mid \mathcal{F}_s^B] = P[B_t \in A \mid B_s] \quad \text{a.s.} \tag{3.14} \]
Remember that conditional probabilities with respect to a $\sigma$-algebra are really random variables, in the same way that a conditional expectation with respect to a $\sigma$-algebra is a random variable. That is,
\[ P[B_t \in A \mid \mathcal{F}_s^B] = E[\mathbf{1}_{B_t \in A}(\omega) \mid \mathcal{F}_s^B] \quad \text{and} \quad P[B_t \in A \mid B_s] = E[\mathbf{1}_{B_t \in A}(\omega) \mid \sigma(B_s)]. \]
Example 7.14. The fact that (3.14) holds can be shown directly by using characteristic functions. Indeed, to show that the distributions on the right- and left-hand sides of (3.14) coincide, it is enough to identify their characteristic functions. We compute
\[ E\big[ e^{i\vartheta B_t} \mid \mathcal{F}_s^B \big] = E\big[ e^{i\vartheta (B_s + B_t - B_s)} \mid \mathcal{F}_s^B \big] = e^{i\vartheta B_s}\, E\big[ e^{i\vartheta (B_t - B_s)} \mid \mathcal{F}_s^B \big] = e^{i\vartheta B_s}\, E\big[ e^{i\vartheta (B_t - B_s)} \big], \]
and similarly
\[ E\big[ e^{i\vartheta B_t} \mid B_s \big] = E\big[ e^{i\vartheta (B_s + B_t - B_s)} \mid B_s \big] = e^{i\vartheta B_s}\, E\big[ e^{i\vartheta (B_t - B_s)} \mid B_s \big] = e^{i\vartheta B_s}\, E\big[ e^{i\vartheta (B_t - B_s)} \big]. \]
Stopping times and Strong Markov property. We now introduce the concept of a stopping time. As the name suggests, a stopping time is a time at which one can stop the process. The accent in this sentence should be put on can, in the following sense: if someone is observing the process as it evolves and is given instructions on when to stop, they can stop the process based on their observations alone. In other words, the observer does not need future information to know whether the event triggering the stop of the process has occurred or not. We now define this concept formally:
Definition 7.15. For a measurable space $(\Omega, \mathcal{F})$ and a filtration $\{\mathcal{F}_t\}$ with $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t \in T$, a random variable $\tau$ is a stopping time wrt $\{\mathcal{F}_t\}$ if $\{\omega \in \Omega : \tau \le t\} \in \mathcal{F}_t$ for all $t \in T$.
Classical examples of stopping times are hitting times, such as the one defined in the following example.
Example 7.16. The random time $\tau_1 := \inf\{s > 0 : B_s \ge 1\}$ is a stopping time wrt the natural filtration $\{\mathcal{F}_t^B\}$ of Brownian motion. Indeed, at any time $t$ one can tell whether $\tau_1 \le t$ by looking at the past history of $B$. However, the random time $\tau_0 := \sup\{s \in (0, t^*) : B_s = 0\}$ is NOT a stopping time wrt $\{\mathcal{F}_t^B\}$ for $t < t^*$, as before $t^*$ we cannot know for sure whether the process will reach 0 again.
The strong Markov property introduced below is a generalization of the Markov property to stopping times (as opposed to the fixed times of Def. 7.13). More specifically, we say that a stochastic process has the strong Markov property if its future after any stopping time depends on its past only through the present (i.e., its state at the stopping time).
Definition 7.17. The stochastic process $\{X_t\}$ has the strong Markov property if for all finite stopping times $\tau$ one has
\[ P[X_{\tau + s} \in A \mid \mathcal{F}_\tau] = P[X_{\tau + s} \in A \mid X_\tau], \]
where $\mathcal{F}_\tau := \{ A \in \mathcal{F} : \{\tau \le t\} \cap A \in \mathcal{F}_t,\ \forall t > 0 \}$.
In the above definition, the $\sigma$-algebra $\mathcal{F}_\tau$ can be interpreted as "all the information we have on the process up to time $\tau$".
Theorem 7.18. Brownian motion has the strong Markov property.
We now use the above result to investigate some of the properties of Brownian motion:
Example 7.19. For any $t > 0$ define the maximum of Brownian motion on the interval $[0,t]$ as $M_t := \max_{s \in [0,t]} B_s$. Similarly, for any $m > 0$ define the hitting time of $m$ as $\tau_m := \inf\{s \ge 0 : B_s \ge m\}$. Then, using $B_{\tau_m} = m$, we write
\[ P[M_t \ge m] = P[\tau_m \le t] = P[\tau_m \le t, B_t \ge m] + P[\tau_m \le t, B_t < m] = P[\tau_m \le t, B_t - B_{\tau_m} \ge 0] + P[\tau_m \le t, B_t - B_{\tau_m} < 0]. \]
Using the strong Markov property of Brownian motion, we have that $B_t - B_{\tau_m}$ is independent of $\mathcal{F}_{\tau_m}$ and is the increment of a Brownian motion. So by the symmetry of Brownian motion we have that
\[ P[\tau_m \le t, B_t - B_{\tau_m} < 0] = P[\tau_m \le t, B_t - B_{\tau_m} \ge 0] = P[\tau_m \le t, B_t \ge m] = P[B_t \ge m], \]
where the last equality holds because $\{B_t \ge m\} \subset \{\tau_m \le t\}$. We conclude that
\[ P[M_t \ge m] = 2\, P[B_t \ge m] = \frac{2}{\sqrt{2\pi t}} \int_m^\infty e^{-\frac{y^2}{2t}}\, dy. \]
This argument is called the reflection principle for Brownian motion. From the above argument one can also extract that
\[ \lim_{T \to \infty} P[\tau_m < T] = 1, \]
i.e., that the hitting times of Brownian motion are almost surely finite.
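The identity $P[M_t \ge m] = 2P[B_t \ge m]$ can be checked by simulation. In the sketch below (Python with NumPy; the discrete-time maximum slightly undershoots the true maximum, hence the loose tolerance), we compare the empirical tail of the running maximum with twice the Gaussian tail:

```python
import numpy as np
from math import erf, sqrt

# Numerical illustration of the reflection principle: P[M_t >= m] = 2 P[B_t >= m].
rng = np.random.default_rng(4)
n_paths, n_steps, t, m = 20_000, 1_000, 1.0, 1.0
dt = t / n_steps
B = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
np.cumsum(B, axis=1, out=B)               # turn increments into paths in place
M = B.max(axis=1)                         # running maximum on the grid
p_emp = np.mean(M >= m)
p_exact = 2.0 * 0.5 * (1.0 - erf(m / sqrt(2.0 * t)))   # 2 P[B_t >= m] ~ 0.3173
assert abs(p_emp - p_exact) < 0.03
```

The systematic part of the error comes from only observing the path on a grid; it shrinks like $\sqrt{\Delta t}$, consistent with the roughness of Brownian trajectories discussed in Chapter 3.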
From the above example we can also derive the following formula.
Example 7.20. We will compute the probability density $\rho_{\tau_m}(s)$ of the hitting time of level $m$ by the Brownian motion, defined by $P[\tau_m \le t] = \int_0^t \rho_{\tau_m}(s)\, ds$. To do so we write
\[ P[\tau_m \le t] = P[M_t \ge m] = 2\, P[B_t \ge m] = \sqrt{\frac{2}{\pi t}} \int_m^\infty e^{-\frac{y^2}{2t}}\, dy = \sqrt{\frac{2}{\pi}} \int_{m/\sqrt{t}}^\infty e^{-\frac{u^2}{2}}\, du, \tag{3.15} \]
where in the last equality we made the change of variables $u = y/\sqrt{t}$. Now, differentiating (3.15) wrt $t$ we obtain, by the Leibniz rule,
\[ \rho_{\tau_m}(t) = \frac{d}{dt} P[\tau_m \le t] = \frac{|m|}{\sqrt{2\pi}}\, t^{-3/2}\, e^{-\frac{m^2}{2t}}. \tag{3.16} \]
We immediately see from (3.16) that
\[ E[\tau_m] = \frac{|m|}{\sqrt{2\pi}} \int_0^\infty s^{-1/2} e^{-\frac{m^2}{2s}}\, ds = \infty. \]
Example 7.21. From the above computations we derive the distribution of the zeros of Brownian motion in the interval $[a,b]$ for $0 < a < b$. We start by computing the desired quantity on the interval $[0,t]$ for an initial condition $x$, which we assume wlog to satisfy $x < 0$:
\[ P[B_s = 0 \text{ for some } s \in [0,t] \mid B_0 = x] = P\Big[ \max_{s \in [0,t]} B_s \ge 0 \,\Big|\, B_0 = x \Big] = P\Big[ \max_{s \in [0,t]} B_s \ge -x \,\Big|\, B_0 = 0 \Big] = P[\tau_{-x} \le t] = \frac{|x|}{\sqrt{2\pi}} \int_0^t s^{-3/2} e^{-\frac{x^2}{2s}}\, ds. \tag{3.17} \]
Since the above expression holds for all $x$, we obtain the distribution of zeros in the interval $[a,b]$ by integrating (3.17) over all possible $x$, weighted by the probability of reaching $x$ at time $a$ (using the Markov property to shift time by $a$):
\[ P[B_s = 0 \text{ for some } s \in [a,b] \mid B_0 = 0] = \int_{-\infty}^\infty P[B_s = 0 \text{ for some } s \in [0, b-a] \mid B_0 = x]\, P[B_a \in dx] \]
\[ = \int_{-\infty}^\infty \Big( \frac{|x|}{\sqrt{2\pi}} \int_0^{b-a} s^{-3/2} e^{-\frac{x^2}{2s}}\, ds \Big) \frac{e^{-\frac{x^2}{2a}}}{\sqrt{2\pi a}}\, dx = \frac{2}{\pi} \arccos\Big( \sqrt{\frac{a}{b}} \Big). \]
By taking the complement of the above we also obtain the probability that Brownian motion has no zeros in the interval $[a,b]$:
\[ P[B_s \ne 0\ \forall s \in [a,b]] = 1 - P[B_s = 0 \text{ for some } s \in [a,b]] = \frac{2}{\pi} \arcsin\Big( \sqrt{\frac{a}{b}} \Big). \]
The above result is referred to as the arcsine law for Brownian motion.
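The arcsine law makes a pleasant simulation target. With $a = 1/4$ and $b = 1$ the probability of no zero in $[a,b]$ is $\frac{2}{\pi}\arcsin(1/2) = 1/3$ exactly. The sketch below (Python with NumPy; checking sign changes on a grid misses some zeros, hence the loose tolerance) estimates this probability:

```python
import numpy as np

# Monte Carlo illustration of the arcsine law: with a = 0.25, b = 1 the
# probability of no zero of B in [a, b] is (2/pi) arcsin(sqrt(a/b)) = 1/3.
rng = np.random.default_rng(5)
n_paths, n_steps, a, b = 20_000, 1_000, 0.25, 1.0
dt = b / n_steps
B = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
np.cumsum(B, axis=1, out=B)
k_a = int(a / dt)                        # grid index near time a
window = B[:, k_a:]
no_zero = np.all((window > 0) == (window[:, :1] > 0), axis=1)
p_emp = np.mean(no_zero)
p_exact = (2.0 / np.pi) * np.arcsin(np.sqrt(a / b))   # = 1/3
assert abs(p_emp - p_exact) < 0.05
```

Zeros that occur and vanish between grid points are invisible to the sign test, so `p_emp` systematically overshoots; refining the grid shrinks this bias.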
This is the definition of the "delta function" $\delta(x)$. (If this is uncomfortable to you, look at [18].) Hence we see that $\rho(t,x)$ is the (weak) solution to
\[ \frac{\partial \rho}{\partial t} = \frac12 \frac{\partial^2 \rho}{\partial x^2}, \qquad \rho(0,x) = \delta(x). \]
To see the connection to probability, we set $p(t,x,y) = \rho(t, x-y)$ and observe that for any function $f$ we have
\[ E\big\{ f(B_t) \mid B_0 = x \big\} = \int_{-\infty}^\infty f(y)\, p(t,x,y)\, dy. \]
ˇ (
We will write Ex f pBt q for E f pBt qˇB0 “ x . Now notice that if upt, xq “ Ex f pBt q then upt, xq solves
Bu 1 B2 u
“
Bt 2 Bx2
up0, xq “ f pxq
This is Kolmogorov Backward equation. We can also write it in terms of the transition density ppt, x, yq
Bp 1 B2 p
“
Bt 2 Bx2
pp0, x, yq “ δpx ´ yq
From this we see why it is called the “backwards” equation. It is a differential equation in the x variable. This is the
“backwards” equation in ppt, x, yq in that it gives the initial point. This begs a question. Yes, there is also a forward
equation. It is written in terms of the forward variable y.
Bp 1 B2 p
“
Bt 2 By 2
pp0, x, yq “ δpx ´ yq
In this case it is identical to the backwards equation. In general it will not be.
We make one last observation: the ppt, s, x, yq “ ppt ´ s, x, yq “ ρpt ´ s, x ´ yq satisfy the Chapman-Kolmogorov
equation (the semi-group property). Namely, for any s ă r ă t and any x, y we have
ż8
pps, t, x, yq “ pps, r, x, zqppr, t, z, yqdz
´8
(Figure: a path from $x$ at time $s$ passing through an intermediate point $z$ at time $r$ before reaching $y$ at time $t$.)
This also suggests the following form for the Kolmogorov forward equation. If we write an equation for $p(s,t,x,y)$ evolving in $s$ and $y$, then we get an equation with a final condition instead of an initial condition. Namely, for $s \le t$,
\[ \frac{\partial p}{\partial s} = -\frac12 \frac{\partial^2 p}{\partial y^2}, \qquad p(t,t,x,y) = \delta(x-y). \]
Hence, we are solving backwards in time.
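The probabilistic representation $u(t,x) = E_x f(B_t)$ gives an immediate Monte Carlo method for the heat equation. In the sketch below (Python with NumPy; the choice $f(y) = \cos(y)$ is ours, picked because the exact solution $u(t,x) = e^{-t/2}\cos(x)$ is known in closed form), we compare the sample mean against it:

```python
import numpy as np

# Monte Carlo solution of the backward equation u_t = u_xx / 2, u(0, x) = cos(x),
# via u(t, x) = E_x[f(B_t)]. For f = cos the exact solution is exp(-t/2) cos(x).
rng = np.random.default_rng(6)
t, x, n_samples = 0.7, 0.3, 400_000
B_t = x + rng.normal(0.0, np.sqrt(t), n_samples)   # B_t under P_x
u_mc = np.mean(np.cos(B_t))
u_exact = np.exp(-t / 2.0) * np.cos(x)
assert abs(u_mc - u_exact) < 0.01
```

No time-stepping is needed here because $B_t$ under $P_x$ is exactly $N(x, t)$; for general diffusions one would have to simulate the path, as in later chapters.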
CHAPTER 4
Itô Integrals
2. Riemann–Stieltjes Integral
Before we try to understand how to integrate against Brownian motion, we recall the classical Riemann–Stieltjes integration theory. Given two continuous functions $f$ and $g$, we want to define
\[ \int_0^T f(t)\, dg(t). \tag{4.2} \]
We begin by considering a piecewise constant function $\varphi$ defined by
\[ \varphi(t) = \begin{cases} a_0 & \text{for } t \in [t_0, t_1] \\ a_k & \text{for } t \in (t_k, t_{k+1}],\ k = 1, \dots, n-1 \end{cases} \tag{4.3} \]
for some partition
\[ 0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = T \]
and constants $a_k \in \mathbb{R}$. For such a function $\varphi$ it is intuitively clear that
\[ \int_0^T \varphi(t)\, dg(t) = \sum_{k=0}^{n-1} \int_{t_k}^{t_{k+1}} \varphi(s)\, dg(s) = \sum_{k=0}^{n-1} a_k \int_{t_k}^{t_{k+1}} dg(s) = \sum_{k=0}^{n-1} a_k \big[ g(t_{k+1}) - g(t_k) \big], \tag{4.4} \]
because $\int_{t_k}^{t_{k+1}} dg(s) = g(t_{k+1}) - g(t_k)$ by the fundamental theorem of calculus (since $\int_{t_k}^{t_{k+1}} dg(s) = \int_{t_k}^{t_{k+1}} g'(s)\, ds$ if $g$ is differentiable).
The basic idea for defining (4.2) is to approximate $f$ by a sequence of step functions $\{\varphi_n(t)\}$, each of the form given in (4.3), so that
\[ \sup_{t \in [0,T]} |f(t) - \varphi_n(t)| \to 0 \quad \text{as } n \to \infty. \tag{4.5} \]
A natural choice of partition at the $n$th level is $t_k^{(n)} = T k 2^{-n}$ for $k = 0, \dots, 2^n$; we then define the $n$th approximating function by
\[ \varphi_n(t) = \begin{cases} f(t_k^{(n)}) & \text{if } t \in [t_k^{(n)}, t_{k+1}^{(n)}) \\ f(T) & \text{if } t = T. \end{cases} \]
If $f$ is continuous, it is easy to see that (4.5) holds.
We are then left to show that there exists a constant $\alpha$ so that
\[ \Big| \int_0^T \varphi^{(n)}(t)\, dg(t) - \alpha \Big| \longrightarrow 0 \quad \text{as } n \to \infty. \]
We would then define the integral $\int_0^T f(t)\, dg(t)$ to be equal to $\alpha$. One of the keys to proving this convergence is a uniform bound on the approximating integrals $\int_0^T \varphi^{(n)}(t)\, dg(t)$. Observe that
\[ \Big| \int_0^T \varphi^{(n)}(t)\, dg(t) \Big| \le \|f\|_\infty \sum_{k=0}^{n-1} |g(t_{k+1}) - g(t_k)| \le \|f\|_\infty\, V_1[g](0,T), \]
where
\[ \|f\|_\infty = \sup_{t \in [0,T]} |f(t)|. \]
This uniform bound implies that the $\int_0^T \varphi^{(n)}(t)\, dg(t)$ stay in a compact subset of $\mathbb{R}$. Hence there must be a limit point $\alpha$ of the sequence and a subsequence which converges to it. It is not then hard to show that this limit point is unique; that is to say, if any other subsequence converges, it must also converge to $\alpha$. We define the value of $\int_0^T f(t)\, dg(t)$ to be $\alpha$. Hence $V_1[g](0,T) < \infty$ is sufficient when $g$ and $f$ are continuous. It can also be shown to be necessary for a reasonable class of $f$.
It is further possible to show, using essentially the same calculations, that the limit $\alpha$ is independent of the sequence of partitions as long as the maximal spacing goes to zero, and independent of the choice of point at which to evaluate the integrand $f$. In the above discussion we chose the left endpoint $t_k$ of the interval $[t_k, t_{k+1}]$; however, we were free to choose any point in the interval.
While the compactness argument above is a standard path in mathematics, it is often more satisfying to explicitly show that the integrals of the $\varphi^{(n)}$ form a Cauchy sequence, by showing that for any $\varepsilon > 0$ there exists an $N$ so that if $n, m > N$ then
\[ \Big| \int_0^T \varphi^{(m)}(t)\, dg(t) - \int_0^T \varphi^{(n)}(t)\, dg(t) \Big| < \varepsilon. \]
Since $\varphi^{(m)} - \varphi^{(n)}$ is again a step function of the form (4.3), the integral $\int_0^T [\varphi^{(m)} - \varphi^{(n)}](t)\, dg(t)$ is well defined and given by a sum of the form (4.4). Hence we have
\[ \Big| \int_0^T \varphi^{(m)}(t)\, dg(t) - \int_0^T \varphi^{(n)}(t)\, dg(t) \Big| = \Big| \int_0^T [\varphi^{(m)} - \varphi^{(n)}](t)\, dg(t) \Big| \le \| \varphi^{(m)} - \varphi^{(n)} \|_\infty\, V_1[g](0,T). \]
Since $f$ is continuous and the partition spacing goes to zero, it is not hard to see that the $\varphi^{(n)}$ form a Cauchy sequence in the $\|\cdot\|_\infty$ norm, which completes the proof that the integrals of the step functions form a Cauchy sequence.
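The construction above translates directly into code. The sketch below (Python with NumPy; the test functions $f(t) = t^2$ and $g(t) = \sin t$ are our choices, for which the exact value follows from $\int_0^T f\, dg = \int_0^T f(t) g'(t)\, dt$ since $g$ is smooth) evaluates the left-endpoint sums on dyadic partitions:

```python
import numpy as np

# Left-endpoint Riemann-Stieltjes sums on dyadic partitions of [0, T].
# With f(t) = t^2, g(t) = sin(t) on [0, 1], the limit is
# int_0^1 t^2 cos(t) dt = 2 cos(1) - sin(1).
def stieltjes_sum(f, g, T, n):
    tk = T * np.arange(2 ** n + 1) / 2 ** n
    return np.sum(f(tk[:-1]) * (g(tk[1:]) - g(tk[:-1])))

f = lambda t: t ** 2
g = np.sin
approx = [stieltjes_sum(f, g, 1.0, n) for n in (4, 8, 12)]
exact = 2.0 * np.cos(1.0) - np.sin(1.0)
assert abs(approx[-1] - exact) < 1e-3
```

Because $g$ here has finite first variation, the sums converge regardless of where in each interval $f$ is evaluated, in sharp contrast with the Brownian case of the next section.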
3. A motivating example
We begin by considering the example
\[ \int_0^T B_s\, dB_s \tag{4.6} \]
where $B$ is a standard Brownian motion. Since $V_1[B](0,T) = \infty$ almost surely, we cannot entirely follow the prescription of the Riemann–Stieltjes integral given in Section 2. However, it still seems reasonable to approximate the integrand $B_s$ by a sequence of step functions of the form (4.3). Since $B$ is random, however, the $a_k$ from (4.3) will have to be random variables.
Fixing a partition $0 = t_0 < t_1 < \cdots < t_N = T$, we define two different sequences of step-function approximations of $B$. For $t \in [0,T]$, we define
\[ \varphi^N(t) = B(t_k) \quad \text{if } t \in [t_k, t_{k+1}), \qquad \hat\varphi^N(t) = B(t_{k+1}) \quad \text{if } t \in (t_k, t_{k+1}]. \]
Just as in the Riemann–Stieltjes setting (see (4.4)), for such step functions it is clear that one should define the respective integrals in the following manner:
\[ \int_0^T \varphi^N(t)\, dB_s = \sum_{k=0}^{N-1} B(t_k) \big[ B(t_{k+1}) - B(t_k) \big], \qquad \int_0^T \hat\varphi^N(t)\, dB_s = \sum_{k=0}^{N-1} B(t_{k+1}) \big[ B(t_{k+1}) - B(t_k) \big]. \]
In the Riemann–Stieltjes setting, the two are the same. But in this case, the two have very different properties, as the following calculation shows. Since $B(t_k)$ and $B(t_{k+1}) - B(t_k)$ are independent, we have $E\big[ B(t_k)(B(t_{k+1}) - B(t_k)) \big] = E[B(t_k)]\, E[B(t_{k+1}) - B(t_k)] = 0$. So
\[ E\Big[ \int_0^T \varphi^N\, dB_s \Big] = \sum_{k=0}^{N-1} E\big[ B(t_k)(B(t_{k+1}) - B(t_k)) \big] = 0. \]
We first observe that $I$ satisfies one of the standard properties of an integral, in that it is a linear functional. In other words, if $\lambda \in \mathbb{R}$, and $\varphi$ and $\psi$ are elementary stochastic processes, then
\[ I(\lambda \varphi) = \lambda I(\varphi) \quad \text{and} \quad I(\psi + \varphi) = I(\psi) + I(\varphi). \tag{4.7} \]
Thanks to our requirement that the $\alpha_k$ are measurable with respect to the filtration associated to the left endpoint of the interval $[t_k, t_{k+1})$, we have the following properties, which will play a central role in what follows and should be compared to the calculations in Section 3.
Lemma 4.4. If $\varphi$ is an elementary stochastic process then
\[ E\, I(\varphi) = 0 \quad \text{(mean zero)} \]
\[ E\big[ I(\varphi)^2 \big] = \int_0^\infty E|\varphi(t)|^2\, dt \quad \text{(Itô isometry)} \]
Remark 4.5. An isometry is a map between two spaces which preserves distance (i.e., the norm). If we consider
\[ I(S_2) = \{ I(\varphi) : \varphi \in S_2 \}, \]
then according to Lemma 4.4 the map $\varphi \mapsto I(\varphi)$ is an isometry between the space of elementary stochastic processes $L^2(S_2, P[d\omega] \times dt)$ equipped with the norm
\[ \|\varphi\| = \Big( \int_0^\infty E \varphi^2(t,\omega)\, dt \Big)^{1/2} \]
and the space of random variables
\[ L^2\big( I(S_2), P \big) = \big\{ X \in I(S_2) : \|X\| = \sqrt{E(X^2)} < \infty \big\}. \]
Proof of Lemma 4.4. We begin by showing that $I(\varphi)$ is mean zero:
\[ E[I(\varphi)] = \sum_k E\big[ \alpha_k (B(t_{k+1}) - B(t_k)) \big] = \sum_k E\Big[ E\big[ \alpha_k (B(t_{k+1}) - B(t_k)) \mid \mathcal{F}_{t_k} \big] \Big] = \sum_k E\Big[ \alpha_k\, E\big[ B(t_{k+1}) - B(t_k) \mid \mathcal{F}_{t_k} \big] \Big] = 0. \]
Turning to the Itô isometry,
\[ E[I(\varphi)^2] = E\Big[ \Big( \sum_k \alpha_k (B(t_{k+1}) - B(t_k)) \Big) \Big( \sum_j \alpha_j (B(t_{j+1}) - B(t_j)) \Big) \Big] = \sum_{j,k} E\big[ \alpha_k \alpha_j (B(t_{k+1}) - B(t_k))(B(t_{j+1}) - B(t_j)) \big] \]
\[ = 2 \sum_{k < j} E\big[ \alpha_k \alpha_j (B(t_{k+1}) - B(t_k))(B(t_{j+1}) - B(t_j)) \big] + \sum_k E\big[ \alpha_k^2 (B(t_{k+1}) - B(t_k))^2 \big]. \]
Next, we examine each component separately. Recall that for $t_{k+1} \le t_j$,
\[ E\big[ \alpha_k \alpha_j (B(t_{k+1}) - B(t_k))(B(t_{j+1}) - B(t_j)) \big] = E\Big[ E\big[ \alpha_k \alpha_j (B(t_{k+1}) - B(t_k))(B(t_{j+1}) - B(t_j)) \mid \mathcal{F}_{t_j} \big] \Big] = E\Big[ \alpha_k \alpha_j (B(t_{k+1}) - B(t_k))\, E\big[ B(t_{j+1}) - B(t_j) \mid \mathcal{F}_{t_j} \big] \Big] = 0, \]
since $E[B(t_{j+1}) - B(t_j) \mid \mathcal{F}_{t_j}] = 0$. Similarly,
\[ \sum_k E\big[ \alpha_k^2 (B(t_{k+1}) - B(t_k))^2 \big] = \sum_k E\Big[ E\big[ \alpha_k^2 (B(t_{k+1}) - B(t_k))^2 \mid \mathcal{F}_{t_k} \big] \Big] = \sum_k E\big[ \alpha_k^2 \big] (t_{k+1} - t_k). \]
Hence, we have
\[ E[I(\varphi)^2] = 0 + \sum_k E[\alpha_k^2]\, (t_{k+1} - t_k) = \int_0^\infty E[\varphi^2(s)]\, ds. \qquad \square \]
So far we have just defined the Itô integral on the whole positive half-line $[0,\infty)$. For any $0 \le s < t \le \infty$, we make the following definition:
\[ \int_s^t \varphi_r\, dB_r = I(\varphi 1_{[s,t)}). \]
We can now talk about the stochastic process
\[ M_t = I(\varphi 1_{[0,t)}) = \int_0^t \varphi_s\, dB_s \tag{4.8} \]
associated to a given elementary stochastic process $\varphi_t$, where the last two expressions are just different notation for the same object.
We now state a few simple consequences of our definitions.
Lemma 4.6. Let $\varphi \in S_2$ and $0 < s < t$. Then
\[ \int_0^t \varphi_r\, dB_r = \int_0^s \varphi_r\, dB_r + \int_s^t \varphi_r\, dB_r \]
and
\[ M_t = \int_0^t \varphi_s\, dB_s \]
is measurable with respect to $\mathcal{F}_t$.
Proof of Lemma 4.6. Clearly $M_t$ is measurable with respect to $\mathcal{F}_t$, since the $\{\varphi_s : s \le t\}$ are by assumption and the construction of the integral only uses the information from $\{B_s : s \le t\}$. The first property follows from $\varphi 1_{[0,t)} = \varphi 1_{[0,s)} + \varphi 1_{[s,t)}$ and hence
\[ I(\varphi 1_{[0,t)}) = I(\varphi 1_{[0,s)}) + I(\varphi 1_{[s,t)}) \]
by (4.7). $\square$
Lemma 4.7. Let $M_t$ be as in (4.8) for an elementary process $\varphi$ and a Brownian motion $B_t$, both adapted to a filtration $\{\mathcal{F}_t : t \ge 0\}$. Then $M_t$ is a martingale with respect to the filtration $\mathcal{F}_t$.
Proof of Lemma 4.7. Looking at Definition 7.9, there are three conditions we need to verify. The measurability is contained in Lemma 4.6. The fact that $E[M_t^2] < \infty$ follows from the Itô isometry, since
\[ E[M_t^2] = \int_0^t E|\varphi_s|^2\, ds \le \int_0^\infty E|\varphi_s|^2\, ds, \]
because the last integral is assumed to be finite in the definition of an elementary stochastic process. All that remains is to verify that for $s < t$,
\[ E[M_t(\varphi) - M_s(\varphi) \mid \mathcal{F}_s] = 0. \]
There are a few cases. Taking one case, say $s$ and $t$ are in the disjoint intervals $[t_k, t_{k+1})$ and $[t_j, t_{j+1})$, respectively. We have that
\[ M_t(\varphi) - M_s(\varphi) = \alpha_k \big[ B(t_{k+1}) - B_s \big] + \Big( \sum_{n=k+1}^{j-1} \alpha_n \big[ B(t_{n+1}) - B(t_n) \big] \Big) + \alpha_j \big[ B_t - B(t_j) \big]. \]
Next, take iterated conditional expectations with respect to the filtrations $\{\mathcal{F}_a\}$, where $a \in \{t_j, t_{j-1}, \dots, t_{k+1}, s\}$. Each term then vanishes because, for each left endpoint $a$,
\[ E\big[ \alpha_a (B(t_{a+1}) - B(t_a)) \mid \mathcal{F}_a \big] = \alpha_a\, E\big[ B(t_{a+1}) - B(t_a) \mid \mathcal{F}_a \big] = \alpha_a\, E\big[ B(t_{a+1}) - B(t_a) \big] = 0. \]
Hence $E[M_t(\varphi) - M_s(\varphi) \mid \mathcal{F}_s] = 0$. The other cases can be done similarly, and the conclusion immediately follows. $\square$
Lemma 4.8. In the same setting as Lemma 4.7, $M_t$ is a continuous stochastic process. (That is to say, with probability one the map $t \mapsto M_t$ is continuous for all $t \in [0,\infty)$.)
Proof of Lemma 4.8. We begin by noticing that if $\varphi(t,\omega)$ is a simple process with
\[ \varphi(t,\omega) = \sum_{k=1}^{N-1} \alpha_k(\omega)\, 1_{[t_k, t_{k+1})}(t), \]
then for $t \in (t_{k^*}, t_{k^*+1})$,
\[ M_t(\varphi, \omega) = \sum_{k=1}^{k^*-1} \alpha_k(\omega) \big[ B(t_{k+1}, \omega) - B(t_k, \omega) \big] + \alpha_{k^*}(\omega) \big[ B(t, \omega) - B(t_{k^*}, \omega) \big]. \]
Hence it is clear that $M_t(\varphi, \omega)$ is continuous if $\varphi$ is a simple function, since the Brownian motion $B_t$ is continuous.
Schematically, the extension works as follows: if $E \int_0^\infty (\varphi_n - f)^2\, dt \to 0$ as $n \to \infty$ for $\{f(t)\}_{t \in T} \in \bar S_2$, then $E[(I(\varphi_n) - X)^2] \to 0$ for some $X \in L^2(\Omega)$, and we set $\int f\, dB_s := X$.
Theorem 5.3. For every $f \in \bar S_2$, there exists a random variable $X \in L^2(\Omega)$ so that if $\{\varphi_n : n = 1, 2, \dots\}$ is a sequence of elementary stochastic processes (i.e., elements of $S_2$) converging to the stochastic process $f$ in $L^2(\Omega \times [0,\infty))$, then $I(\varphi_n)$ converges to $X$ in $L^2(\Omega)$.
Definition 5.4. For any $f \in \bar S_2$, we define the Itô integral
\[ I(f) = \int_0^\infty f_t\, dB_t. \]
Since the sequence $\varphi_n$ converges to $f$ in $L^2(\Omega \times [0,\infty))$, we know that it is a Cauchy sequence in $L^2(\Omega \times [0,\infty))$. By the above calculations we hence have that $\{I(\varphi_n)\}$ is a Cauchy sequence in $L^2(\Omega)$. It is a classical fact from real analysis that this space is complete, which implies that every Cauchy sequence converges to a point in the same space. Let $X \in L^2(\Omega)$ denote the limit.
To see that $X$ does not depend on the sequence $\{\varphi_n\}$, let $\{\tilde\varphi_n\}$ be another sequence converging to $f$. The same reasoning as above ensures the existence of an $\tilde X \in L^2(\Omega)$ so that $I(\tilde\varphi_n) \to \tilde X$ in $L^2(\Omega)$. On the other hand,
\[ E\big[ (I(\varphi_n) - I(\tilde\varphi_n))^2 \big] = E\big[ (I(\varphi_n - \tilde\varphi_n))^2 \big] = \int_0^\infty E\big[ (\varphi_n(t) - \tilde\varphi_n(t))^2 \big]\, dt \le 2 \int_0^\infty E\big[ (\varphi_n(t) - f)^2 \big]\, dt + 2 \int_0^\infty E\big[ (f - \tilde\varphi_n(t))^2 \big]\, dt, \]
where in the inequality we have again used the fact that $(\varphi_n - \tilde\varphi_n)^2 = [(\varphi_n - f) + (f - \tilde\varphi_n)]^2 \le 2(\varphi_n - f)^2 + 2(f - \tilde\varphi_n)^2$. Since both $\varphi_n$ and $\tilde\varphi_n$ converge to $f$ in $L^2(\Omega \times [0,\infty))$, the last two terms on the right-hand side go to 0. This in turn implies that $E(X - \tilde X)^2 = 0$ and that the two random variables are the same.
Remark 5.5. While the above construction might seem like magic since it is so soft, it is an application of a general principle in mathematics: if one can define a linear map on a dense subset of elements of a space in such a way that the map is an isometry, then the map can be uniquely extended to the whole space. This approach is beautifully presented in our current context in [10], using the following lemma.
Lemma 5.6 (Extension Theorem). Let $B_1$ and $B_2$ be two Banach spaces and let $B_0 \subset B_1$ be a linear subspace. If $L : B_0 \to B_2$ is defined for all $b \in B_0$ and $\|Lb\|_{B_2} = \|b\|_{B_1}$ for all $b \in B_0$, then there exists a unique extension of $L$ to $\bar B_0$ (the closure of $B_0$), called $\bar L$, with $\bar L b = L b$ for all $b \in B_0$.
Example 5.7. We use the above theorem to show that
\[ I_T(B_s) = \int_0^T B_s\, dB_s = \frac12 \big( B_T^2 - T \big). \]
To do so, we show that the sequence $\varphi_n = \sum_{j=1}^{N} 1_{[t_j^{(n)}, t_{j+1}^{(n)})}\, B_{t_j^{(n)}}$ with $\Delta t_j^n = t_{j+1}^{(n)} - t_j^{(n)} \to 0$ converges in $L^2(\Omega \times [0,T))$ to $\{B_t\}$. Indeed, we have that
\[ E \int_0^T |\varphi_n - B_s|^2\, ds = \sum_{j=1}^N \int_{t_j^{(n)}}^{t_{j+1}^{(n)}} E\big( B_{t_j^{(n)}} - B_s \big)^2\, ds = \sum_{j=1}^N \int_{t_j^{(n)}}^{t_{j+1}^{(n)}} \big( s - t_j^{(n)} \big)\, ds = \frac12 \sum_{j=1}^N \big( t_{j+1}^{(n)} - t_j^{(n)} \big)^2 \to 0. \]
Then, by Theorem 5.3 we have that
\[ \int_0^T B_s\, dB_s = \lim_{n \to \infty} I(\varphi_n) = \lim_{n \to \infty} \sum_j B_{t_j^{(n)}}\, \Delta B_j^n. \]
Now, writing $\Delta B_j^n := B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}$, we have
\[ \Delta(B_j^2) := B_{t_{j+1}^{(n)}}^2 - B_{t_j^{(n)}}^2 = \big( B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}} \big)^2 + 2 B_{t_j^{(n)}} \big( B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}} \big) = (\Delta B_j^n)^2 + 2 B_{t_j^{(n)}}\, \Delta B_j^n \]
and therefore
\[ \sum_j B_{t_j^{(n)}}\, \Delta B_j^n = \frac12 \Big( \sum_j \Delta(B_j^2) - \sum_j (\Delta B_j^n)^2 \Big) = \frac12 \Big( B_T^2 - \sum_j (\Delta B_j^n)^2 \Big). \]
The term on the right-hand side converges to $(B_T^2 - T)/2$ in $L^2(\Omega)$, which is therefore the value of $\int_0^T B_s\, dB_s$.
şT
Most of the previous properties hold. In particular 0 fs pωqdBs pωq is a perfectly fine random
şT
variable which is almost surely finite. However, it is not necessarily true that Er 0 fs pωqdBs pωqs “ 0.
şT
Which in turn means that 0 fs pωqdBs pωq need not be a martingale. (In fact it is what is called a
local martingale.) By obvious reasons, the Itô isometry property (4.12) may not hold in this case.
Example 8.1.
9. Itô Processes
Let $(\Omega, \mathcal{F}, P)$ be the canonical probability space and let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the Brownian motion $B_t(\omega)$.
Definition 9.1. $X_t(\omega)$ is an Itô process if there exist stochastic processes $f(t,\omega)$ and $\sigma(t,\omega)$ such that
i) $f(t,\omega)$ and $\sigma(t,\omega)$ are $\mathcal{F}_t$-measurable,
ii) $\int_0^t |f|\, ds < \infty$ and $\int_0^t |\sigma|^2\, ds < \infty$ almost surely,
iii) $X_0(\omega)$ is $\mathcal{F}_0$-measurable,
iv) with probability one the following holds:
\[ X_t(\omega) = X_0(\omega) + \int_0^t f_s(\omega)\, ds + \int_0^t \sigma_s(\omega)\, dB_s(\omega). \tag{4.13} \]
The processes $f(t,\omega)$ and $\sigma(t,\omega)$ are referred to as the drift and diffusion coefficients of $X_t$.
For brevity, one often writes (4.13) as
\[ dX_t(\omega) = f_t(\omega)\, dt + \sigma_t(\omega)\, dB_t(\omega). \]
But this is just notation for the integral equation above!
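The differential notation suggests a natural simulation scheme, usually called Euler–Maruyama (it is not developed in these notes; the sketch below, in Python with NumPy, is an illustration under our own choice of coefficients): replace $dt$ and $dB_t$ by small increments. For geometric Brownian motion, $dX = \mu X\, dt + s X\, dB$, the exact mean $E[X_t] = X_0 e^{\mu t}$ gives something to check against.

```python
import numpy as np

# Euler-Maruyama sketch for the Ito process dX = mu X dt + s X dB (geometric
# Brownian motion), checked against the exact mean E[X_T] = X_0 exp(mu T).
rng = np.random.default_rng(9)
mu, s, X0, T, N, n_paths = 0.5, 0.3, 1.0, 1.0, 500, 100_000
dt = T / N
X = np.full(n_paths, X0)
for _ in range(N):
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    X = X + mu * X * dt + s * X * dB      # one Euler-Maruyama step per path
assert abs(X.mean() - X0 * np.exp(mu * T)) < 0.02
```

Note that each step uses the left-endpoint value of $X$, mirroring the left-endpoint evaluation that defines the Itô integral in the previous sections.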
CHAPTER 5
Stochastic Calculus
This section introduces the fundamental tools for the computation of stochastic integrals. Indeed, similarly to what is done in classical calculus, stochastic integrals are rarely computed by applying the definition of the Itô integral from the previous chapter. In the case of classical calculus, instead of applying the definition of the Riemann integral, one usually computes $\int f(x)\, dx$ by applying the fundamental theorem of calculus: choosing $F(x)$ such that $\frac{d}{dx} F(x) = f(x)$, we have
\[ \int f(x)\, dx = \int \frac{d}{dx} F(x)\, dx = F(x). \tag{5.1} \]
Even though, as we have seen in the previous section, differentiation in this framework is not possible, it is possible to obtain a similar result for Itô integrals. In what follows we will introduce such a formula (called the Itô formula), allowing for rapid computation of stochastic integrals.
in $L^2$ as $n \to \infty$. Combining this with the result of Lemma 1.2 (proven below), applied with $g = f''$ to the second term on the right-hand side of (5.4), concludes the proof.
Proof of Lemma 1.2. We want to show that
$$A_n := \sum_{k=1}^{N(n)} g(\xi_k)\big(B_{t_k} - B_{t_{k-1}}\big)^2 \longrightarrow \int_0^t g(B_s)\,ds$$
as $n \to \infty$. We therefore obtain (5.5) by showing that $|C_n - D_n|$ converges to 0 in $L^2(\Omega)$, as this directly implies convergence in probability. To that end observe that
$$\mathbb{E}\big[(D_n - C_n)^2\big] = \mathbb{E}\Big[\sum_{k=1}^{N(n)} g(B_{t_{k-1}})^2\big(\Delta_k t - \Delta_k^2 B\big)^2\Big] + 2\,\mathbb{E}\Big[\sum_{j<k} g(B_{t_{k-1}})\,g(B_{t_{j-1}})\big(\Delta_k t - \Delta_k^2 B\big)\big(\Delta_j t - \Delta_j^2 B\big)\Big] \quad (5.6)$$
as n Ñ 8 and |ΓN | Ñ 0, the product goes to zero. All that remains is the second sum in (5.6).
Since
1.1. A second look at Itô's Formula. Looking back at (5.4), one sees that the Itô integral term in Itô's formula comes from the sum against the increments of Brownian motion. This term results directly from the first-order Taylor expansion of $f$ and can be identified with the first-order derivative term that we are used to seeing in the fundamental theorem of calculus (5.1). The second sum
$$\sum_{k=1}^{N(n)} f''(\xi_k)\big(B_{t_k} - B_{t_{k-1}}\big)^2,$$
which contains the squares of the increments of Brownian motion, results from the second-order term in the Taylor expansion and is absent in the classical calculus formulation. However, since the sum
$$\sum_{k=1}^{N(n)} \big(B_{t_k} - B_{t_{k-1}}\big)^2$$
converges in probability to the quadratic variation of the Brownian motion $B_t$, which according to Lemma 5.3 is simply $t$, this term gives a nonzero contribution in the limit $n \to \infty$ and must be kept in this framework. We refer to this term as the Itô correction term. In light of this remark, if we let $[B]_t$ denote the quadratic variation of $B_t$, then one can reinterpret the Itô correction term
$$\frac{1}{2}\int_0^t f''(B_s)\,ds \quad\text{as}\quad \frac{1}{2}\int_0^t f''(B_s)\,d[B]_s. \quad (5.7)$$
We wish to derive a more general version of Itô's formula for a general Itô process $X_t$ defined by
$$X_t = X_0 + \int_0^t f_s\,ds + \int_0^t g_s\,dB_s.$$
Beginning in the same way as before, we write the expression analogous to (5.4), namely
$$f(X_t) - f(X_0) = \sum_{k=1}^{N(n)} f'(X_{t_{k-1}})\big(X_{t_k} - X_{t_{k-1}}\big) + \frac{1}{2}\sum_{k=1}^{N(n)} f''(\xi_k)\big(X_{t_k} - X_{t_{k-1}}\big)^2 \quad (5.8)$$
for some twice continuously differentiable function $f$ and some partition $\{t_k\}$ of $[0,t]$. It is reasonable to expect the first and second sums to converge respectively to the integrals
$$\int_0^t f'(X_s)\,dX_s \quad\text{and}\quad \frac{1}{2}\int_0^t f''(X_s)\,d[X]_s. \quad (5.9)$$
The first is the Itô stochastic integral with respect to an Itô process $X_t$, while the second is an integral with respect to the differential of the quadratic variation $[X]_t$ of the process $X_t$. So far this discussion has proceeded mainly by analogy with the simple Brownian motion case. To make sense of the two terms in (5.9) we need to better understand the quadratic variation of an Itô process $X_t$ and to define the concept of a stochastic integral against $X_t$. While the former point is covered in the following section, the latter is quickly clarified by the following intuitive definition:
Definition 1.3. Given an Itô process $\{X_t\}$ with differential $dX_t = f_t\,dt + \sigma_t\,dB_t$ and an adapted stochastic process $\{h_t\}$ such that
$$\int_0^\infty |h_s f_s|\,ds < \infty \quad\text{and}\quad \int_0^\infty (h_s\sigma_s)^2\,ds < \infty \quad\text{a.s.},$$
we define the integral of $h_t$ against $X_t$ as
$$\int_0^t h_s\,dX_s := \int_0^t h_s f_s\,ds + \int_0^t h_s\sigma_s\,dB_s. \quad (5.10)$$
$$\Gamma_N := \{t_j^N : 0 = t_0^N < t_1^N < \cdots < t_{j_N}^N = t\} \quad (5.11)$$
with $|\Gamma_N| := \sup_j |t_{j+1}^N - t_j^N| \to 0$ as $N \to \infty$. Furthermore, we define the quadratic variation of $X_t$ as
$$[X]_t := [X,X]_t. \quad (5.12)$$
We can also speak of the quadratic variation on an interval other than $[0,t]$. For $0 \le s < t$ we will write $[X]_{s,t}$ and $[X,Y]_{s,t}$ respectively for the quadratic variation and cross-quadratic variation on the interval $[s,t]$.
Just from the algebraic form of the pre-limiting object, the quadratic variation satisfies a number of properties.

Lemma 2.2. Assuming all of the objects are defined, for any adapted, continuous stochastic processes $X_t$, $Y_t$:
i) for any constant $c \in \mathbb{R}$ we have
$$[cX]_t = c^2[X]_t ;$$
ii) for $0 < s < t$ we have
$$[X]_{0,s} + [X]_{s,t} = [X]_{0,t} ; \quad (5.13)$$
iii) we have that
$$0 \le [X]_{0,s} \le [X]_{0,t} \quad (5.14)$$
for $t > s \ge 0$; in other words, the map $t \mapsto [X]_t$ is nondecreasing a.s.;
iv) we can write
$$[X \pm Y]_t = [X]_t + [Y]_t \pm 2[X,Y]_t. \quad (5.15)$$
Consequently, quadratic covariations can be written in terms of quadratic variations as
$$[X,Y]_t = \frac12\big([X+Y]_t - [X]_t - [Y]_t\big) = \frac14\big([X+Y]_t - [X-Y]_t\big). \quad (5.16)$$
Proof. Parts i) and ii) are a direct consequence of Def. 2.1, while part iii) follows from ii) and the fact that $[X]_{s,t}$ is defined as a limit of sums of squares and is therefore nonnegative: $[X]_{0,t} = [X]_{0,s} + [X]_{s,t} \ge [X]_{0,s}$. Part iv) is obtained by noticing that
$$[X \pm Y]_t = \lim_{N\to\infty}\sum_{j=0}^{j_N}\Big(\big(X_{t_{j+1}^N} - X_{t_j^N}\big) \pm \big(Y_{t_{j+1}^N} - Y_{t_j^N}\big)\Big)^2$$
$$= \lim_{N\to\infty}\sum_{j=0}^{j_N}\Big(\big(X_{t_{j+1}^N} - X_{t_j^N}\big)^2 \pm 2\big(X_{t_{j+1}^N} - X_{t_j^N}\big)\big(Y_{t_{j+1}^N} - Y_{t_j^N}\big) + \big(Y_{t_{j+1}^N} - Y_{t_j^N}\big)^2\Big)$$
$$= [X]_t \pm 2[X,Y]_t + [Y]_t,$$
while (5.16) is obtained by rearranging the terms of the above result (for the first equality) and by using it to compute $[X+Y]_t + [X-Y]_t$ (for the second).
In the following sections we will see that the quadratic variation of Itô processes takes a particularly simple form. We will show this by first computing the quadratic variation of Itô integrals and then extending the result to Itô processes.
2.1. Quadratic Variation of an Itô Integral.
Lemma 2.3. Let $\sigma_t$ be a process adapted to the filtration $\{\mathcal{F}_t^B\}$ and such that $\int_0^\infty \sigma_s^2\,ds < \infty$ a.s. Then, defining $M_t := I_t(\sigma) = \int_0^t \sigma_s\,dB_s$, we have
$$[M]_t = \int_0^t \sigma_s^2\,ds, \quad (5.17)$$
or, in differential notation, $d[M]_t = \sigma_t^2\,dt$.
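A quick numerical illustration of (5.17), with the adapted integrand $\sigma_s = B_s$ chosen purely for demonstration (this choice is ours, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 500_000
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

sigma = B[:-1]                # adapted integrand sigma_s = B_s (illustrative)
dM = sigma * dB               # increments of M_t = int_0^t sigma_s dB_s
lhs = np.sum(dM**2)           # discrete quadratic variation [M]_t
rhs = np.sum(sigma**2) * dt   # Riemann sum for int_0^t sigma_s^2 ds
```

On a fine grid the two quantities agree closely, path by path.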
Proof of Lemma 2.3. It is enough to prove (5.17) when $\sigma_s$ is an elementary stochastic process in $S^2$; the general case can then be handled by approximation, as in the proof of the Itô isometry. Hence, we assume that
$$\sigma_t = \sum_{j=1}^K \alpha_{j-1}\,\mathbf{1}_{[t_{j-1},t_j)}(t), \quad (5.18)$$
where the $\alpha_k$ satisfy the properties required by $S^2$ and $K$ is some integer. Without loss of generality we can assume that $t$ is the right endpoint of our interval, so that the partition takes the form
$$0 = t_0 < t_1 < \cdots < t_K = t.$$
Now observe that if $[s,r] \subset [t_{j-1},t_j]$ then
$$\int_s^r \sigma_\tau\,dB_\tau = \alpha_{j-1}\big(B_r - B_s\big).$$
Hence, let $\{s_\ell^{(n)}\}$ be a sequence of partitions of the interval $[t_{j-1},t_j]$ so that
$$t_{j-1} = s_0^{(n)} < s_1^{(n)} < \cdots < s_{N(n)}^{(n)} = t_j$$
and $|\Gamma_n| = \sup_\ell |s_\ell^{(n)} - s_{\ell-1}^{(n)}| \to 0$ as $n \to \infty$. Then the quadratic variation of $M_t$ on the interval $[t_{j-1},t_j]$ is the limit as $n \to \infty$ of
$$\sum_{\ell=1}^{N(n)} \big(M_{s_\ell} - M_{s_{\ell-1}}\big)^2 = \alpha_{j-1}^2 \sum_{\ell=1}^{N(n)} \big(B_{s_\ell} - B_{s_{\ell-1}}\big)^2.$$
Since the summation on the right-hand side converges to the quadratic variation of the Brownian motion $B$ on the interval $[t_{j-1},t_j]$, which we know to be $t_j - t_{j-1}$, we conclude that
$$[M]_{t_{j-1},t_j} = \alpha_{j-1}^2\,(t_j - t_{j-1}).$$
Since the quadratic variation on disjoint intervals adds, we have
$$[M]_t = \sum_{j=1}^K [M]_{t_{j-1},t_j} = \sum_{j=1}^K \alpha_{j-1}^2\,(t_j - t_{j-1}) = \int_0^t \sigma_s^2\,ds,$$
where the last equality follows from the fact that $\sigma_s$ takes the form (5.18). As mentioned at the start, the general case follows from this calculation by approximation by processes in $S^2$.
Remark 2.4. The proof of the above result in Klebaner (Theorem 4.14 on pp. 106) has a subtle issue. When bounding
$$\mathbb{E}\Big[2\sum_{i=0}^{n-1} g^2(B_{t_i^n})\,(t_{i+1}^n - t_i^n)^2\Big] \le 2\delta_n \sum_{i=0}^{n-1} \mathbb{E}\big[g^2(B_{t_i^n})\big]\,(t_{i+1}^n - t_i^n),$$
Klebaner asserts that, as $n \to \infty$ and $\delta_n = |\Gamma_n| \to 0$, the sum $\sum_{i=0}^{n-1} \mathbb{E}[g^2(B_{t_i^n})](t_{i+1}^n - t_i^n)$ would stay finite, and thus the product would go to 0. However, the finiteness of this sum is unjustified. In fact, if it were finite, it would have to converge to $\int_0^t \mathbb{E}[g^2(B_s)]\,ds$ (a Riemann sum). However, this integral might be infinite for certain choices of $g$, for example $g(x) = e^{x^2}$ (see Example 4.5 on pp. 99 of [Klebaner]). The proof here uses the same computation of the second moment but only for "nice" functions (i.e., those with compact support). The convergence in probability (note: this is weaker than convergence in $L^2$) for general continuous functions is established using approximation. The stopping rules for the Itô integral are needed here, but we defer them to a later part of the course.
We now consider the quadratic covariation of two Itô integrals with respect to independent Brownian motions.

Lemma 2.5. Let $B_t$, $W_t$ be two independent Brownian motions, and $f_s$, $g_s$ two stochastic processes, all adapted to the underlying filtration $\mathcal{F}_t$ and such that $\int_0^\infty f_s^2\,ds,\ \int_0^\infty g_s^2\,ds < \infty$ almost surely. We define
$$M_t := \int_0^t f_s\,dB_s \quad\text{and}\quad N_t := \int_0^t g_s\,dW_s.$$
Then, for all $t \ge 0$, one has
$$[N,M]_t = 0. \quad (5.19)$$
Proof of (5.19). Again, without loss of generality it is enough to prove the result for $f_s$ and $g_s$ in $S^2$. We can further assume that both are defined with respect to the same partition
$$0 = t_0 < t_1 < \cdots < t_K = t.$$
Since, as observed in (5.13), the quadratic variation on disjoint intervals adds, we need only show that $[N,M]_{t_{j-1},t_j} = 0$ on each partition interval $[t_{j-1},t_j]$.
Fixing such an interval, we see that $[N,M]_{t_{j-1},t_j} = f_{t_{j-1}}\,g_{t_{j-1}}\,[W,B]_{t_{j-1},t_j}$. The easiest way to see that this vanishes is to use the "polarization" equality (5.16):
$$2[W,B]_{t_{j-1},t_j} = \Big[\tfrac{W+B}{\sqrt{2}},\tfrac{W+B}{\sqrt{2}}\Big]_{t_{j-1},t_j} - \Big[\tfrac{W-B}{\sqrt{2}},\tfrac{W-B}{\sqrt{2}}\Big]_{t_{j-1},t_j} = (t_j - t_{j-1}) - (t_j - t_{j-1}) = 0,$$
since $\frac{W+B}{\sqrt{2}}$ and $\frac{W-B}{\sqrt{2}}$ are standard Brownian motions and hence have quadratic variation equal to the length of the time interval.
Remark 2.6. One can also prove the above result more directly by following the same argument as in Theorem 5.3 of Chapter 3. The key calculation is to show that the expected value of the approximating quadratic covariation is 0, and not the length of the time interval as in the proof of Theorem 5.3 of Chapter 3. For any partition $\{s_\ell\}$ of $[t_{j-1},t_j]$, by the independence of $B$ and $W$ we have
$$\mathbb{E}\sum_\ell \big(B_{s_\ell} - B_{s_{\ell-1}}\big)\big(W_{s_\ell} - W_{s_{\ell-1}}\big) = \sum_\ell \mathbb{E}\big(B_{s_\ell} - B_{s_{\ell-1}}\big)\,\mathbb{E}\big(W_{s_\ell} - W_{s_{\ell-1}}\big) = 0.$$
2.2. Quadratic Variation of an Itô Process. In this section, using the results presented
above, we finally obtain a simple expression for the quadratic variation of an Itô process.
Lemma 2.7. If $X_t$ is an Itô process with differential $dX_t = \mu_t\,dt + \sigma_t\,dB_t$, then
$$[X]_t = \big[I(\sigma)\big]_t = \int_0^t \sigma_s^2\,ds. \quad (5.20)$$
The summation on the right hand side is bounded from above by the first variation of Yt which by
assumption is finite a.s. On the other hand, as n Ñ 8 the supremum goes to zero since |ΓN | Ñ 0
and Xt is a.s. continuous.
Remark 2.9. Similarly to the formal considerations in Section 1.1, we may think of the differential of the quadratic variation process $d[X]_t$ as the limit of the difference term $(X_{t_{k+1}^n} - X_{t_k^n})^2$, which in turn is formally the square $(dX_t)^2$ of the differential of the process. Therefore, formally speaking, we can obtain the result of the previous lemma by writing
$$d[X]_t = (dX_t)^2 = (\mu_t\,dt + \sigma_t\,dB_t)^2 = \mu_t^2(dt)^2 + 2\mu_t\sigma_t(dt)(dB_t) + \sigma_t^2(dB_t)^2 = \sigma_t^2\,dt,$$
where we have applied $(dt)^2 = (dt)(dB_t) = 0$ (cf. Lemma 2.8) and $(dB_t)^2 = dt$ (cf. Lemma 2.3). These formal multiplication rules are summarized in the following table:

×       dt    dB_t
dt      0     0
dB_t    0     dt

By the same formal arguments, such rules apply to the computation of the quadratic covariation of two Itô processes, say $X_t$ as above and $dY_t = \mu'_t\,dt + \sigma'_t\,dB_t$:
$$d[X,Y]_t = (dX_t)(dY_t) = (\mu_t\,dt + \sigma_t\,dB_t)(\mu'_t\,dt + \sigma'_t\,dB_t) = \mu_t\mu'_t(dt)^2 + \mu_t\sigma'_t(dt)(dB_t) + \mu'_t\sigma_t(dt)(dB_t) + \sigma_t\sigma'_t(dB_t)^2 = \sigma_t\sigma'_t\,dt. \quad (5.21)$$
This result can be verified by going through the steps of the proof of the above lemmas.
we need only prove Itô's formula for $f(X_{t_\ell}) - f(X_{t_{\ell-1}})$. Now, since the coefficients $\mu_t$ and $\sigma_t$ are constant for $t \in [t_{\ell-1}, t_\ell)$, for $[r,s] \subset [t_{\ell-1}, t_\ell]$ we have
$$X_s - X_r = \int_r^s \mu_\tau\,d\tau + \int_r^s \sigma_\tau\,dB_\tau = \mu_{t_{\ell-1}}(s-r) + \sigma_{t_{\ell-1}}\big(B_s - B_r\big).$$
Let $\{s_i^{(n)} : i = 0,\dots,K(n)\}$ be a sequence of partitions of $[t_{\ell-1}, t_\ell]$ such that $|\Gamma_n| = \sup_i |s_i^{(n)} - s_{i-1}^{(n)}| \to 0$ as $n \to \infty$. Now, using Taylor's theorem, we have
$$f(X_{s_j}) - f(X_{s_{j-1}}) = f'(X_{s_{j-1}})\big(X_{s_j} - X_{s_{j-1}}\big) + \frac12 f''(\xi_j)\big(X_{s_j} - X_{s_{j-1}}\big)^2$$
for some $\xi_j \in (X_{s_{j-1}}, X_{s_j})$. Hence we have
$$f(X_{t_\ell}) - f(X_{t_{\ell-1}}) = \sum_{j=1}^{K(n)} \big(f(X_{s_j}) - f(X_{s_{j-1}})\big) = \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})\big(X_{s_j} - X_{s_{j-1}}\big) + \frac12\sum_{j=1}^{K(n)} f''(\xi_j)\big(X_{s_j} - X_{s_{j-1}}\big)^2 = (I) + \tfrac12 (II).$$
Since
$$\big(X_{s_j} - X_{s_{j-1}}\big)^2 = \mu_{t_{\ell-1}}^2(s_j - s_{j-1})^2 + 2\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}}(s_j - s_{j-1})\big(B_{s_j} - B_{s_{j-1}}\big) + \sigma_{t_{\ell-1}}^2\big(B_{s_j} - B_{s_{j-1}}\big)^2,$$
we have
$$(I) = \mu_{t_{\ell-1}}\sum_{j=1}^{K(n)} f'(X_{s_{j-1}})(s_j - s_{j-1}) + \sigma_{t_{\ell-1}}\sum_{j=1}^{K(n)} f'(X_{s_{j-1}})\big(B_{s_j} - B_{s_{j-1}}\big) = (Ia) + (Ib)$$
and
$$(II) = \mu_{t_{\ell-1}}^2\sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})^2 + 2\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}}\sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})\big(B_{s_j} - B_{s_{j-1}}\big) + \sigma_{t_{\ell-1}}^2\sum_{j=1}^{K(n)} f''(\xi_j)\big(B_{s_j} - B_{s_{j-1}}\big)^2 = (IIa) + (IIb) + (IIc).$$
As $n \to \infty$, it is clear that
$$(Ia) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} f'(X_s)\,\mu_s\,ds \quad\text{and}\quad (Ib) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} f'(X_s)\,\sigma_s\,dB_s.$$
Using the same arguments as in Theorem 1.1 we see that
$$(IIc) \longrightarrow \sigma_{t_{\ell-1}}^2\int_{t_{\ell-1}}^{t_\ell} f''(X_s)\,ds = \int_{t_{\ell-1}}^{t_\ell} \sigma_s^2 f''(X_s)\,ds.$$
All that remains is to show that $(IIa)$ and $(IIb)$ converge to zero as $n \to \infty$. Observe that
$$|(IIa)| \le \mu_{t_{\ell-1}}^2\,|\Gamma_n|\,\Big|\sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})\Big| \quad\text{and}\quad |(IIb)| \le 2\big|\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}}\big|\,|\Gamma_n|\,\Big|\sum_{j=1}^{K(n)} f''(\xi_j)\big(B_{s_j} - B_{s_{j-1}}\big)\Big|.$$
Since the two sums converge to $\int f''(X_s)\,ds$ and $\int f''(X_s)\,dB_s$ respectively, the fact that $|\Gamma_n| \to 0$ implies that $(IIa)$ and $(IIb)$ converge to zero as $n \to \infty$. Putting all of these results together produces the quoted result.
Remark 3.2. Notice that the "multiplication table" given in Remark 2.9 is reflected in the details of the proof of Theorem 3.1. Each of the terms in $(dX_t)^2$ corresponds to one of the terms labeled $(II)$, which came from $(X(s_j) - X(s_{j-1}))^2$ in the Taylor expansion. The term $(IIa)$, which corresponds to $(dt)^2$, limits to zero, as the multiplication table indicates. The term $(IIb)$, which corresponds to $(dt)(dB_t)$, also tends to zero, again as the table indicates. Lastly, $(IIc)$, which corresponds to $(dB_t)^2$, limits to an integral against $dt$, as indicated in the table.
Example 3.3. Consider the stochastic process with differential
$$dX_t = \frac12 X_t\,dt + X_t\,dB_t.$$
The above process is an example of a geometric Brownian motion, a process widely used in finance to model the price of a stock. We apply Itô's formula to the function $f(x) := \log x$. Using that $\partial_x f(x) = x^{-1}$ and $\partial_{xx}^2 f(x) = -x^{-2}$, we obtain
$$d\log X_t = \frac{1}{X_t}\,dX_t - \frac{1}{2X_t^2}\,d[X]_t = \frac{1}{X_t}\Big(\frac12 X_t\,dt + X_t\,dB_t\Big) - \frac{1}{2X_t^2}\,X_t^2\,dt = dB_t.$$
In integral form the above can be written as $\log X_t = \log X_0 + B_t$, and therefore $X_t = X_0 e^{B_t}$.
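One can check this computation numerically: an Euler discretization of $dX_t = \frac12 X_t\,dt + X_t\,dB_t$, driven by one sampled Brownian path, should land close to $X_0 e^{B_t}$ evaluated on the same path. A short sketch (step count and seed are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
t, n = 1.0, 100_000
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)

# Euler scheme for dX = (1/2) X dt + X dB with X_0 = 1 on the sampled path.
x = 1.0
for db in dB:
    x += 0.5 * x * dt + x * db

exact = np.exp(np.sum(dB))    # X_t = X_0 * exp(B_t) on the same path
```

The relative gap between `x` and `exact` shrinks as the grid is refined, consistent with the Itô-formula calculation above.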
If we collect the Brownian motions into one $m$-dimensional Brownian motion $B_t = (B_1(t),\dots,B_m(t))$ and define the $\mathbb{R}^d$-valued process $\mu_t = (\mu_1(t),\dots,\mu_d(t))$ and the matrix-valued process $\sigma_t$ whose matrix elements are the $\sigma_{ij}(t)$, then we can write
$$dX_t = \mu_t\,dt + \sigma_t\,dB_t. \quad (5.24)$$
While this is nice and compact, it is perhaps more suggestive to define the $\mathbb{R}^d$-valued processes $\sigma^{(j)} = (\sigma_{1,j},\dots,\sigma_{d,j})$ for $j = 1,\dots,m$ and write
$$dX_t = \mu_t\,dt + \sum_{j=1}^m \sigma_t^{(j)}\,dB_j(t). \quad (5.25)$$
This emphasizes that the process $X_t$ at each moment of time is pushed in the direction in which $\mu_t$ points and given $m$ random kicks, in the directions the $\sigma_t^{(j)}$ point, whose magnitude and sign are dictated by the Brownian motions $B_j(t)$.
We now want to derive the process which describes the evolution of $F(X_t)$ where $F: \mathbb{R}^d \to \mathbb{R}$; in other words, the multidimensional Itô formula.
We begin by developing some intuition. Recall Lemma 2.5, which states that the cross-quadratic variation of independent Brownian motions is zero. Hence, if $B_t$ and $W_t$ are independent standard Brownian motions and $X_t$, $Y_t$ are two Itô processes driven by them, the multiplication table for $(dX_t)^2$ and $(dY_t)(dX_t)$ is given by the following table:

×       dt    dB_t   dW_t
dt      0     0      0
dB_t    0     dt     0
dW_t    0     0      dt

Table 1. Formal multiplication rules for the differentials of two independent Brownian motions

where
$$a_{ik}(t) = \sum_{j=1}^m \sigma_{ij}(t)\,\sigma_{kj}(t).$$
The matrix $a(t)$ can be written compactly as $\sigma(t)\sigma(t)^T$. The matrix $a$ is often called the diffusion matrix.
We will only sketch the proof of this version of Itô's formula, since it follows the same logic as the others already proven. Proofs can be found in many places, including [14, 7, 3].
Sketch of proof. Similarly to the proof of Theorem 3.1, we introduce a family of partitions $\Gamma_N$ of the interval $[0,t]$ as in (8.8) with $\lim_{N\to\infty}|\Gamma_N| = 0$ and Taylor expand the function $F$ over each of the subintervals:
$$F(X_t) - F(X_0) = \sum_{\ell=1}^{N}\Big\{\sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_{s_{\ell-1}})\big(X_i(s_\ell) - X_i(s_{\ell-1})\big) + \frac12\sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i\partial x_j}(\xi_\ell)\big(X_i(s_\ell) - X_i(s_{\ell-1})\big)\big(X_j(s_\ell) - X_j(s_{\ell-1})\big)\Big\} = \sum_{\ell=1}^N\Big\{(I)_\ell + \frac12 (II)_\ell\Big\},$$
for some $\xi_\ell \in \prod_{i=1}^d \big[X_i(s_{\ell-1}), X_i(s_\ell)\big]$. For the first-order term, it is straightforward to generalize the proof of Theorem 3.1 to obtain
$$\lim_{N\to\infty}\sum_{\ell=1}^N (I)_\ell = \int_0^t \sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_s)\,dX_i(s).$$
We formally recover the expression for the second-order term by combining (5.21) with the rules of Table 1:
$$\lim_{N\to\infty}\sum_{\ell=1}^N (II)_\ell = \int_0^t \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i\partial x_j}(X_s)\,\big(dX_i(s)\big)\big(dX_j(s)\big) = \int_0^t \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i\partial x_j}(X_s)\sum_{k,l=1}^m \big(\sigma_{ik}(s)\,dB_k(s)\big)\big(\sigma_{jl}(s)\,dB_l(s)\big) = \int_0^t \sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i\partial x_j}(X_s)\sum_{k=1}^m \sigma_{ik}(s)\,\sigma_{jk}(s)\,ds,$$
where in the second equality we have used that $(dt)(dB_j(t)) = 0$ and in the third that $(dB_i(t))(dB_j(t)) = 0$ for $i \ne j$. Note that one should also check that, when taking the limit in the first equality, the second derivatives evaluated at $\xi_\ell$ can be replaced by their values at $X_{s_\ell}$; this can be done by reproducing the proof of Lemma 1.2.
Remark 4.3. The fact that (5.27) holds requires that $B_i$ and $B_j$ be independent for $i \ne j$. However, (5.26) holds even when they are not independent.
Theorem 4.2 is for a scalar-valued function $F$. However, by applying the result to each coordinate function $F_k: \mathbb{R}^d \to \mathbb{R}$ of a function $F: \mathbb{R}^d \to \mathbb{R}^p$ with $F = (F_1,\dots,F_p)$, we obtain the full multidimensional version. Instead of writing it again in coordinates, we take the opportunity to write a version aligned with the perspective and notation of (5.26). Recall that the directional derivative of $F$ in the direction $\nu \in \mathbb{R}^d$ at the point $x \in \mathbb{R}^d$ is
$$DF(x)[\nu] = \lim_{\varepsilon\to 0}\frac{F(x+\varepsilon\nu) - F(x)}{\varepsilon} = (\nabla F\cdot\nu)(x) = \sum_{k=1}^p\sum_{i=1}^d \frac{\partial F_k}{\partial x_i}\,\nu_i\, e_k,$$
where $e_k$ is the $k$-th unit vector of $\mathbb{R}^p$. Similarly, the second directional derivative at the point $x \in \mathbb{R}^d$ in the directions $\nu, \eta \in \mathbb{R}^d$ is given by
$$D^2F(x)[\nu,\eta] = \lim_{\varepsilon\to 0}\frac{DF(x+\varepsilon\eta)[\nu] - DF(x)[\nu]}{\varepsilon} = \sum_{k=1}^p\sum_{i=1}^d\sum_{j=1}^d \frac{\partial^2 F_k}{\partial x_i\partial x_j}\,\nu_i\eta_j\, e_k.$$
If the $B_i$ are assumed to be mutually independent, then
$$dF(X(t)) = DF(X(t))[f(t)]\,dt + \sum_{i=1}^m DF(X(t))\big[\sigma^{(i)}(t)\big]\,dB_i(t) + \frac12\sum_{i=1}^m D^2F(X(t))\big[\sigma^{(i)}(t),\sigma^{(i)}(t)\big]\,dt.$$
We now consider special cases of Theorem 4.2 that will be helpful in practice. The first describes the evolution of a function $F(x,t)$ that depends explicitly on time:

Corollary 4.4. Let $F: \mathbb{R}^d \times [0,\infty) \to \mathbb{R}$ be such that $F(x,t)$ is $C^2$ in $x \in \mathbb{R}^d$ and $C^1$ in $t \in [0,\infty)$. Furthermore, let $X_t$ be a $d$-dimensional Itô process as in (5.23). Then
$$dF(X_t,t) = \frac{\partial F}{\partial t}(X_t,t)\,dt + \sum_{i=1}^d \frac{\partial F}{\partial x_i}(X_t,t)\,dX_i(t) + \frac12\sum_{i=1}^d\sum_{k=1}^d \frac{\partial^2 F}{\partial x_i\partial x_k}(X_t,t)\,d[X_i,X_k]_t. \quad (5.28)$$
Note that no second derivative in $t$ appears in the above formula: since $t$ has zero quadratic variation, $C^1$ regularity in time suffices.
Corollary 4.5. Let $X_t$, $Y_t$ be two Itô processes. Then
$$d(X_tY_t) = Y_t\,dX_t + X_t\,dY_t + d[X,Y]_t. \quad (5.29)$$
This result is known as the stochastic integration by parts formula.
Proof. Let $F: \mathbb{R}^2 \to \mathbb{R}$ with $F(x,y) = xy$. Since
$$\partial_x F(x,y) = y,\quad \partial_y F(x,y) = x,\quad \partial_{xx}^2 F(x,y) = \partial_{yy}^2 F(x,y) = 0,\quad \partial_{xy}^2 F = 1,$$
Itô's formula (5.26) gives
$$d(X_tY_t) = dF(X_t,Y_t) = Y_t\,dX_t + X_t\,dY_t + d[X,Y]_t. \quad (5.30)$$
Example 4.6. We compute the stochastic integral
$$\int_0^t s\,dB_s.$$
Applying the integration by parts formula (5.29) with $dX_t = dB_t$ and $dY_t = dt$, we obtain
$$d(tB_t) = t\,dB_t + B_t\,dt + d[B,Y]_t.$$
Since $Y_t = t$ is of finite variation we have $d[B,Y]_t = 0$, and by integrating and rearranging the terms we obtain
$$\int_0^t s\,dB_s = \int_0^t d(sB_s) - \int_0^t B_s\,ds = tB_t - \int_0^t B_s\,ds.$$
Example 4.7. Assume that $f(x,t) \in C^{2,1}(\mathbb{R}\times\mathbb{R}_+)$ satisfies the pde
$$\frac{\partial}{\partial t}f(x,t) + \frac12\frac{\partial^2}{\partial x^2}f(x,t) = 0,$$
and $\mathbb{E}\big[f(B_t,t)^2\big] < \infty$. Then we have
$$df(B_t,t) = \partial_t f(B_t,t)\,dt + \partial_x f(B_t,t)\,dB_t + \frac12\partial_{xx}^2 f(B_t,t)\,d[B]_t = \Big(\partial_t + \frac12\partial_{xx}^2\Big)f(B_t,t)\,dt + \partial_x f(B_t,t)\,dB_t = \partial_x f(B_t,t)\,dB_t.$$
Therefore $f(B_t,t) = f(0,0) + \int_0^t \partial_x f(B_s,s)\,dB_s$ is a martingale.
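A concrete instance is $f(x,t) = x^2 - t$, which satisfies $\partial_t f + \frac12\partial_{xx}^2 f = -1 + 1 = 0$, so $B_t^2 - t$ should be a martingale with constant mean $f(0,0) = 0$. A quick Monte Carlo check (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
t, paths = 3.0, 400_000
B_t = rng.normal(0.0, np.sqrt(t), size=paths)   # B_t ~ N(0, t)

# f(x, t) = x^2 - t solves f_t + (1/2) f_xx = -1 + 1 = 0,
# so E[f(B_t, t)] should stay at f(0, 0) = 0 for every t.
mean_f = np.mean(B_t**2 - t)
```

The sample mean of $B_t^2 - t$ stays near zero for any $t$, as the martingale property predicts.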
5. Collection of the Formal Rules for Itô's Formula and Quadratic Variation
We now recall some of the formal calculations, bringing them all together in one place. We consider a probability space $(\Omega,\mathcal{F},\mathbb{P})$ with a filtration $\mathcal{F}_t$. We assume that $B_t$ and $\tilde B_t$ are independent standard Brownian motions adapted to the filtration $\mathcal{F}_t$.
For any $\rho \in [0,1]$, $Z_t = \rho B_t + \sqrt{1-\rho^2}\,\tilde B_t$ is again a standard Brownian motion. Furthermore
$$[Z]_t = [Z,Z]_t = \rho^2[B,B]_t + 2\rho\sqrt{1-\rho^2}\,[B,\tilde B]_t + (1-\rho^2)[\tilde B,\tilde B]_t = \rho^2 t + 0 + (1-\rho^2)t = t,$$
or, in formal differential notation, $d[Z]_t = dt$. This result can be understood by using the formal multiplication table for differentials, which formally states:
$$d[Z]_t = (dZ_t)^2 = \big(\rho\,dB_t + \sqrt{1-\rho^2}\,d\tilde B_t\big)^2 = \rho^2(dB_t)^2 + 2\rho\sqrt{1-\rho^2}\,(dB_t)(d\tilde B_t) + (1-\rho^2)(d\tilde B_t)^2 = \rho^2\,dt + 0 + (1-\rho^2)\,dt = dt.$$
Similarly, one has
$$d[Z,B]_t = (dZ_t)(dB_t) = \rho(dB_t)^2 + \sqrt{1-\rho^2}\,(d\tilde B_t)(dB_t) = \rho\,dt + 0 = \rho\,dt,$$
$$d[Z,\tilde B]_t = (dZ_t)(d\tilde B_t) = \rho(dB_t)(d\tilde B_t) + \sqrt{1-\rho^2}\,(d\tilde B_t)^2 = 0 + \sqrt{1-\rho^2}\,dt = \sqrt{1-\rho^2}\,dt.$$
Now let $\sigma_t$ and $g_t$ be adapted stochastic processes (adapted to $\mathcal{F}_t$) with
$$\int_0^t \sigma_s^2\,ds < \infty \quad\text{and}\quad \int_0^t g_s^2\,ds < \infty$$
almost surely, and set $M_t = \int_0^t \sigma_s\,dB_s$, $N_t = \int_0^t g_s\,dB_s$, and $U_t = \int_0^t \sigma_s\,d\tilde B_s$. Then the cross-quadratic variations satisfy
$$d[M,N]_t = (dM_t)(dN_t) = \sigma_t g_t\,(dB_t)^2 = \sigma_t g_t\,dt,$$
$$d[M,U]_t = (dM_t)(dU_t) = \sigma_t^2\,(dB_t)(d\tilde B_t) = 0,$$
$$d[U,Z]_t = (dU_t)(dZ_t) = \rho\,\sigma_t\,(d\tilde B_t)(dB_t) + \sqrt{1-\rho^2}\,\sigma_t\,(d\tilde B_t)^2 = \sqrt{1-\rho^2}\,\sigma_t\,dt.$$
Next we define
$$dH_t = \mu_t\,dt \quad\text{and}\quad dK_t = f_t\,dt$$
and observe that, since $H_t$ and $K_t$ have finite first variation, we have
$$d[H]_t = (dH_t)^2 = \mu_t^2(dt)^2 = 0 \quad\text{and}\quad d[K]_t = (dK_t)^2 = f_t^2(dt)^2 = 0.$$
Furthermore, if $X_t = H_t + M_t$ and $Y_t = K_t + N_t$ then, using the previous calculations,
$$d[X]_t = d[X,X]_t = d[H+M,H+M]_t = d[H]_t + d[M]_t + 2\,d[H,M]_t = \sigma_t^2\,dt,$$
$$d[X,Y]_t = d[H+M,K+N]_t = d[H,K+N]_t + d[M,K+N]_t = d[M,N]_t = \sigma_t g_t\,dt,$$
or, using the formal algebra,
$$d[X]_t = (dX_t)^2 = \mu_t^2(dt)^2 + 2\mu_t\sigma_t(dt)(dB_t) + \sigma_t^2(dB_t)^2 = 0 + 0 + \sigma_t^2\,dt = \sigma_t^2\,dt,$$
$$d[X,Y]_t = (dX_t)(dY_t) = \mu_t f_t(dt)^2 + \mu_t g_t(dt)(dB_t) + f_t\sigma_t(dt)(dB_t) + \sigma_t g_t(dB_t)^2 = 0 + 0 + 0 + \sigma_t g_t\,dt = \sigma_t g_t\,dt.$$
CHAPTER 6

Stochastic Differential Equations
1. Definitions
Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space equipped with a filtration $\{\mathcal{F}_t\}_{t\le T}$. Let $B_t = (B_1(t),\dots,B_m(t)) \in \mathbb{R}^m$ be an $m$-dimensional Brownian motion, with $\{B_j(t)\}_{j=1}^m$ a collection of mutually independent Brownian motions such that $B_j(t) \in \mathcal{F}_t$ and, for any $0 \le s < t$, $B_j(t) - B_j(s)$ is independent of $\mathcal{F}_s$. Obviously, these conditions are satisfied by the natural filtration $\{\mathcal{F}_t^B\}_{t\le T}$.
where $X_t$ is an unknown process, is a Stochastic Differential Equation (sde) driven by the Brownian motion $\{B_t\}$. The functions $\mu(x)$, $\sigma(x)$ are called the drift and diffusion coefficients, respectively. It is more compact to introduce the matrix
$$\sigma(x) = \begin{pmatrix} | & & | \\ \sigma_1(x) & \cdots & \sigma_m(x) \\ | & & | \end{pmatrix} \in \mathbb{R}^{d\times m}$$
and write
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t. \quad (6.3)$$
There are different concepts of solution for an sde. The most natural is that of a strong solution:

Definition 1.2. A stochastic process $\{X_t\}$ is a strong solution to the sde (6.1) driven by the Brownian motion $B_t$ with (possibly random) initial condition $X_0$ if the following hold:
i) $\{X_t\}$ is adapted to $\{\mathcal{F}_t\}$,
ii) $\{X_t\}$ is continuous,
iii) $X_t = X_0 + \int_0^t \mu(X_s)\,ds + \sum_{j=1}^m \int_0^t \sigma_j(X_s)\,dB_j(s)$ almost surely.

Remark 1.3. Often, the choice of Brownian motion in the above definition is implicit. However, it is important to note that the strong solution of an sde depends on the chosen Brownian motion driving it. A conceptually useful way to restate strong existence, say for all $t \ge 0$, is that there exists a measurable map $\Phi: (t,B) \mapsto X_t(B)$ from $[0,\infty)\times C([0,\infty),\mathbb{R}^m) \to \mathbb{R}^d$ such that $X_t = \Phi(t,B)$ solves (6.2) and $X_t$ is measurable with respect to the filtration generated by $B_t$.
Definition 1.4. We say that a strong solution to (6.1) (driven by a Brownian motion $B_t$) is strongly unique if for any two solutions $X_t$, $Y_t$ of (6.1) with the same initial condition $X_0$ we have
$$\mathbb{P}\big[X_t = Y_t \text{ for all } t \in [0,T]\big] = 1.$$

Remark 1.5. By definition, the strong solution of an sde is continuous. For this reason, to prove strong uniqueness it is sufficient to prove that two solutions $X_t$, $Y_t$ satisfy
$$\mathbb{P}[X_t = Y_t] = 1 \quad\text{for all } t \in [0,T].$$
Indeed, assuming that $X_t$ is a version of $Y_t$, by countable additivity the set $A = \{\omega : X_t(\omega) = Y_t(\omega) \text{ for all } t \in \mathbb{Q}^+\}$ has probability one. By right-continuity (resp. left-continuity) of the sample paths, it follows that $X$ and $Y$ have the same paths for all $\omega \in A$.
2. Examples of SDEs
We now consider a few useful examples of sdes that have a strong solution.

2.1. Geometric Brownian motion. The geometric Brownian motion, or Black–Scholes model in finance, is a stochastic process $X_t$ that solves the sde
$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t, \quad (6.4)$$
where $\mu, \sigma \in \mathbb{R}$ are constants. This model can be used to describe the evolution of the price $X_t$ of a stock, whose mean increase and fluctuation variance are both assumed to depend linearly on the stock price $X_t$. The coefficients $\mu$, $\sigma$ are called the percentage drift and percentage volatility, respectively. We see immediately that (6.4) has a solution by Itô's formula: letting $f(x) = \log x$ we have
$$d\log X_t = \frac{1}{X_t}\big(\mu X_t\,dt + \sigma X_t\,dB_t\big) - \frac{1}{2X_t^2}\,\sigma^2X_t^2\,dt = \Big(\mu - \frac{\sigma^2}{2}\Big)dt + \sigma\,dB_t.$$
Therefore, by integrating and exponentiating, the solution of the equation reads
$$X_t = X_0\exp\Big[\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t\Big].$$
The uniqueness of this solution will be proven shortly.
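The explicit solution makes Monte Carlo checks easy; for instance, $\mathbb{E}[X_t] = X_0 e^{\mu t}$, which the sketch below verifies for one arbitrary choice of parameters (the parameter values are ours, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, x0, t = 0.1, 0.4, 1.0, 1.0           # illustrative parameters
paths = 500_000
B_t = rng.normal(0.0, np.sqrt(t), size=paths)

# Exact solution X_t = x0 * exp((mu - sigma^2 / 2) t + sigma B_t).
X_t = x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B_t)
mean_est = X_t.mean()                            # should match x0 * exp(mu t)
```

The sample mean matches $X_0 e^{\mu t}$: the $-\sigma^2/2$ correction in the exponent is exactly what compensates for the convexity of the exponential.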
2.2. Stochastic Exponential. Let $X_t$ be an Itô process with differential $dX_t = \mu_t\,dt + \sigma_t\,dB_t$. We consider the following sde:
$$dU_t = U_t\,dX_t \quad (6.5)$$
with initial condition $U_0 = u_0 \in \mathbb{R}$. Note that one often chooses $u_0 = 1$. Since the above sde is analogous to the ode $df = f\,dt$, whose solution is given by the exponential function $f(t) = \exp(t)$, the process $U_t$ solving (6.5) is often called the stochastic exponential and one writes $U_t = \mathcal{E}(X)_t$. The following result ensures that this process exists and is unique:

Proposition 2.1. The sde (6.5) has a unique strong solution, given by
$$U_t = \mathcal{E}(X)_t := U_0\exp\Big[X_t - X_0 - \frac12[X]_t\Big] = U_0\exp\Big[\int_0^t\Big(\mu_s - \frac12\sigma_s^2\Big)ds + \int_0^t \sigma_s\,dB_s\Big]. \quad (6.6)$$
We notice that, repeating the above procedure $k$ times, we obtain the first $k$ terms of the Taylor expansion of $Ae^{Dt}$ plus a remainder term resulting from an integral iterated $k+1$ times. For finite $T$ we can bound this integral by defining the constants
$$C := \int_0^T y(s)\,ds < \infty \quad\text{and}\quad G := A + DC,$$
so that $y(t) \le G$. Consequently, we can bound the remainder term by $G\,t^{k+1}D^{k+1}/k!$, which vanishes in the limit $k \to \infty$, uniformly in $t \in [0,T]$.
Alternative proofs, assuming the existence and uniqueness of solutions to odes, can be found in any good ode or dynamics book, for instance [6] or [5].

Lemma 3.5. Let $\{y_n(t)\}$ be a sequence of functions satisfying
• $y_0(t) \le A$,
• $y_{n+1}(t) \le D\int_0^t y_n(s)\,ds < \infty$ for all $n \ge 0$ and $t \in [0,T]$.
Then $y_n(t) \le A\,(Dt)^n/n!$ for all $n \ge 0$ and $t \in [0,T]$.
Hence, by Gronwall's inequality (Lemma 3.4), we conclude that $\mathbb{E}|X_1(t) - X_2(t)|^2 = 0$ for all $t \in [0,T]$, so $X_1(t)$ and $X_2(t)$ are identical almost surely.
Existence for sde. The existence of solutions is proved by a variant of Picard iteration. Fix an initial value $x$ and define a sequence of processes $X_n(t)$ as follows; by induction, the processes have continuous paths and are adapted:
$$X_0(t) = x,$$
$$X_1(t) = x + \int_0^t \mu(s,x)\,ds + \int_0^t \sigma(s,x)\,dB_s,$$
$$\vdots$$
$$X_{n+1}(t) = x + \int_0^t \mu(s,X_n(s))\,ds + \int_0^t \sigma(s,X_n(s))\,dB_s.$$
Fix $t \ge 0$; we will show that $X_n(t)$ converges in $L^2$, so that there is a random variable $X(t) \in L^2(\Omega,\mathcal{F},\mathbb{P})$ with $X_n(t) \xrightarrow{L^2} X(t)$. Let $y_n(t) = \mathbb{E}\big[(X_{n+1}(t) - X_n(t))^2\big]$; we will verify the two conditions of Lemma 3.5. First, for $n = 0$ and any $t \in [0,T]$,
$$y_0(t) = \mathbb{E}\big[(X_1(t) - X_0(t))^2\big] \le 2\,\mathbb{E}\Big[\Big(\int_0^t \mu(s,x)\,ds\Big)^2\Big] + 2\,\mathbb{E}\Big[\Big(\int_0^t \sigma(s,x)\,dB_s\Big)^2\Big] \le 2\,\mathbb{E}\Big[\Big(\int_0^t K(1+|x|)\,ds\Big)^2\Big] + 2\,\mathbb{E}\Big[\Big(\int_0^t K(1+|x|)\,dB_s\Big)^2\Big] \le C < \infty,$$
where the second inequality uses the fact that the coefficients grow no faster than linearly. Second, a computation similar to the one for uniqueness yields
$$y_{n+1}(t) \le 2K^2(1+T)\int_0^t y_n(s)\,ds \qquad \forall\, t \in [0,T],\ n = 0,1,2,\dots,$$
which is finite by induction. Lemma 3.5 then implies
$$y_n(t) = \mathbb{E}\big[(X_{n+1}(t) - X_n(t))^2\big] \le C\,\frac{\big(2K^2(1+T)\,t\big)^n}{n!},$$
which goes to zero uniformly for $t \in [0,T]$. We thus conclude that $X_n(t)$ converges in $L^2$, uniformly in $t$, and denote its $L^2$-limit by $X(t) \in L^2(\Omega,\mathcal{F},\mathbb{P})$.
It remains to show that the limit process $X(t)$ solves (6.13). Since $X_n \xrightarrow{L^2} X$, we have
$$\mathbb{E}\big[\mu(t,X_n(t)) - \mu(t,X(t))\big]^2 + \mathbb{E}\big[\sigma(t,X_n(t)) - \sigma(t,X(t))\big]^2 \le 2K^2\,\mathbb{E}\big[(X_n(t) - X(t))^2\big] \longrightarrow 0, \quad\text{uniformly in } t.$$
By Itô's isometry and Fubini:
$$\mathbb{E}\Big[\Big(\int_0^t \sigma(s,X_n(s))\,dB_s - \int_0^t \sigma(s,X(s))\,dB_s\Big)^2\Big] = \mathbb{E}\Big[\Big(\int_0^t \big(\sigma(s,X_n(s)) - \sigma(s,X(s))\big)\,dB_s\Big)^2\Big] = \int_0^t \mathbb{E}\big[\big(\sigma(s,X_n(s)) - \sigma(s,X(s))\big)^2\big]\,ds \xrightarrow{\,n\to\infty\,} 0.$$
Similarly, by the Cauchy–Schwarz inequality we have
$$\mathbb{E}\Big[\Big(\int_0^t \mu(s,X_n(s))\,ds - \int_0^t \mu(s,X(s))\,ds\Big)^2\Big] = \mathbb{E}\Big[\Big(\int_0^t \big(\mu(s,X_n(s)) - \mu(s,X(s))\big)\,ds\Big)^2\Big] \le t\int_0^t \mathbb{E}\big[\big(\mu(s,X_n(s)) - \mu(s,X(s))\big)^2\big]\,ds \xrightarrow{\,n\to\infty\,} 0.$$
Passing to the limit in the iteration, we thus have
$$X(t) = x + \int_0^t \mu(s,X(s))\,ds + \int_0^t \sigma(s,X(s))\,dB_s,$$
i.e., $X(t)$ solves (6.13).
Remark 3.6. Looking through the proof of Theorem 3.1 we see that the assumption of global Lipschitz continuity can be weakened to the following assumptions:
i) $|\mu(t,x)| + |\sigma(t,x)| \le C(1+|x|)$ (used for existence),
ii) $|\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le C|x-y|$ (used for uniqueness).
Example 4.3. Consider the sde $dY_t = dB_t$ with initial condition $Y_0 = 0$. This sde clearly has a strong solution, namely $Y_t = B_t$. If we let $W_t$ be another Brownian motion (possibly defined on another probability space) then $W_t$ will not, in general, be a strong solution to the above sde (in the case that the two probability spaces are different, the two solutions cannot even be compared). It will, however, be a weak solution to the sde, as being a Brownian motion completely determines the marginals of the process.
We will now consider an example for which there exists a weak solution, but not a strong
solution:
Example 4.4 (Tanaka's sde). For certain $\mu$ and $\sigma$, solutions to (6.1) may exist for some Brownian motions and some admissible filtrations but not for others. Consider the sde
$$dX_t = \operatorname{sign}(X_t)\,dB_t, \qquad X_0 = 0, \quad (6.15)$$
where $\sigma(t,x) = \operatorname{sign}(x)$ is the sign function
$$\operatorname{sign}(x) = \begin{cases} +1, & \text{if } x \ge 0, \\ -1, & \text{if } x < 0. \end{cases}$$
The function $\sigma(x)$ is not continuous and thus not Lipschitz. A strong solution does not exist for this sde when the filtration $\mathcal{F} = (\mathcal{F}_t)$ is chosen to be $\mathcal{F}_t := \sigma(B_s,\, 0 \le s \le t)$. Suppose $X_t$ is a strong solution to Tanaka's sde; then we must have
$$\tilde{\mathcal{F}}_t := \sigma(X_s,\, 0 \le s \le t) \subseteq \mathcal{F}_t. \quad (6.16)$$
Notice that for any $T \ge 0$, $\int_0^T \mathbb{E}\big[\operatorname{sign}(X_s)^2\big]\,ds < \infty$, so the Itô integral $\int_0^t \operatorname{sign}(X_s)\,dB_s$ is well defined and $X_t$ is a martingale. Moreover, the quadratic variation of $X_t$ is
$$[X]_t = \int_0^t \big[\operatorname{sign}(X_s)\big]^2\,ds = \int_0^t 1\,ds = t,$$
thus $X_t$ must be a Brownian motion (by Lévy's characterization, to be proved later). We may denote $X_t = \tilde B_t$ to emphasize that it is a Brownian motion. Now, multiplying both sides of (6.15) by $\operatorname{sign}(X_t)$, we obtain
$$dB_t = \operatorname{sign}(\tilde B_t)\,d\tilde B_t \quad (6.17)$$
and thus $B_t = \int_0^t \operatorname{sign}(\tilde B_s)\,d\tilde B_s$. By Tanaka's formula (to be shown later), we then have
$$B_t = |\tilde B_t| - \tilde L_t,$$
where $\tilde L_t$ is the local time of $\tilde B_t$ at zero. It follows that $B_t$ is $\sigma(|\tilde B_s|,\, 0 \le s \le t)$-measurable. This leads to a contradiction with (6.16), because it would imply that
$$\mathcal{F}_t \subseteq \sigma(|\tilde B_s|,\, 0 \le s \le t) \subsetneq \sigma(\tilde B_s,\, 0 \le s \le t) = \tilde{\mathcal{F}}_t.$$
Still, as we have seen above, choosing $X_t = \tilde B_t$ there exists a Brownian motion $B_t$ such that Tanaka's sde holds. Such a pair of processes forms a weak solution to Tanaka's equation.
CHAPTER 7

PDEs and SDEs: The Connection

Throughout this chapter, except when specified otherwise, we let $\{X_t\}$ be a solution to the sde
$$dX_t = \mu(t,X_t)\,dt + \sigma(t,X_t)\,dB_t. \quad (7.1)$$
As we have seen in the last chapter, solutions to the above equation are referred to as diffusion
processes. This name comes from the fact that Brownian motion, the archetypal diffusion process,
was invented to model the diffusion of a dust particle in water. Similarly, in the world of partial
differential equations, diffusion equations model precisely the same type of phenomenon. In this
chapter we will see that the correspondence between these two domains goes well beyond this point.
1. Infinitesimal generators
Having seen in the previous chapter that solutions to sdes possess the strong markov property,
we introduce the following operator to study the evolution of their finite-dimensional distributions:
Definition 1.1. The infinitesimal generator of a continuous-time Markov process $X_t$ is an operator $\mathcal A_t$ such that for any function $f$,
\[ \mathcal A_t f(x) := \lim_{dt \downarrow 0} \frac{\mathbb E[f(X_{t+dt}) \mid X_t = x] - f(x)}{dt}, \tag{7.2} \]
provided the limit exists. The set of functions for which the above limit exists is called the domain $\mathcal D(\mathcal A_t)$ of the generator.
This operator encodes the infinitesimal change in the probability distribution of the process $X_t$. One way of seeing this is by choosing $f(x) = \mathbf 1_A(x)$ for a measurable set $A \subseteq \mathbb R^d$.
We now look at some examples where we find the explicit form of the generator for Itô diffusions:
Example 1.2. The infinitesimal generator of a standard one-dimensional Brownian motion $B_t$ is
\[ \mathcal A = \frac12 \frac{d^2}{dx^2} \]
for all $f$ that are $C^2$ with compact support. To derive this, we first apply Itô's formula to any $f \in C^2$ and write
\[ f(B_t) = f(B_0) + \int_0^t \frac{d}{dx} f(B_s)\,dB_s + \int_0^t \frac12 \frac{d^2}{dx^2} f(B_s)\,ds. \]
Applying this formula at the two time points $t$ and $t + r$, we have
\[ f(B_{t+r}) = f(B_t) + \int_t^{t+r} \frac{d}{dx} f(B_s)\,dB_s + \int_t^{t+r} \frac12 \frac{d^2}{dx^2} f(B_s)\,ds. \]
When $f$ has compact support, $f'(x)$ is bounded, say $|f'(x)| \le K$, and thus for each $t$,
\[ \int_0^t \mathbb E\big[ f'(B_s)^2 \big]\,ds \le \int_0^t K^2\,ds = K^2 t < \infty. \]
Hence the first integral has expectation zero, due to Itô's isometry. It follows that
\begin{align*}
\mathbb E[f(B_{t+r}) \mid B_t = x] &= f(x) + \mathbb E\Big[ \int_t^{t+r} f'(B_s)\,dB_s + \int_t^{t+r} \frac12 f''(B_s)\,ds \,\Big|\, B_t = x \Big] \\
&= f(x) + \mathbb E\Big[ \int_t^{t+r} f'(B_s)\,dB_s + \int_t^{t+r} \frac12 f''(B_s)\,ds \Big] \\
&= f(x) + \mathbb E\Big[ \int_t^{t+r} \frac12 f''(B_s)\,ds \Big],
\end{align*}
where the second equality is due to the independence of the post-$t$ process $B_{t+s} - B_t$ from $B_t$. Subtracting $f(x)$, dividing by $r$ and letting $r \to 0$ on both sides, we obtain
\[ \mathcal A f(x) = \lim_{r \downarrow 0} \frac{\mathbb E[f(B_{t+r}) \mid B_t = x] - f(x)}{r}
= \lim_{r \downarrow 0} \frac{1}{r} \int_t^{t+r} \frac12\, \mathbb E\big[ f''(B_s) \,\big|\, B_t = x \big]\,ds
= \frac12\, \mathbb E\big[ f''(B_t) \,\big|\, B_t = x \big] = \frac12 f''(x). \]
In the above calculation we inverted the order of expectation and time integration using the Fubini–Tonelli theorem.
Remark 1.3. Here we omit the subscript $t$ in the generator $\mathcal A$ because Brownian motion is time-homogeneous, i.e.,
\[ \frac{\mathbb E[f(B_{t+dt}) \mid B_t = x] - f(x)}{dt} = \frac{\mathbb E[f(B_{s+dt}) \mid B_s = x] - f(x)}{dt}, \]
and thus $\mathcal A_t f(x) = \mathcal A_s f(x)$. The generator $\mathcal A = \frac12 \frac{d^2}{dx^2}$ does not change with time.
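The defining limit (7.2) can also be checked numerically. The sketch below (the function name, the test function $f = \sin$, the point $x$ and the step $dt$ are all our illustrative choices) approximates $\mathbb E[f(x + B_{dt})]$ by integrating $f$ against the $N(0, dt)$ density on a grid, and compares the difference quotient with $\frac12 f''(x)$.

```python
import numpy as np

def generator_estimate(f, x, dt, n_grid=200_001):
    """Approximate (E[f(x + B_dt)] - f(x)) / dt for Brownian motion
    by numerically integrating f against the N(0, dt) density."""
    z = np.linspace(-6.0 * np.sqrt(dt), 6.0 * np.sqrt(dt), n_grid)
    dz = z[1] - z[0]
    pdf = np.exp(-z**2 / (2.0 * dt)) / np.sqrt(2.0 * np.pi * dt)
    Ef = np.sum(f(x + z) * pdf) * dz   # Riemann sum for E[f(x + B_dt)]
    return (Ef - f(x)) / dt

x = 0.7
est = generator_estimate(np.sin, x, dt=1e-4)
exact = -0.5 * np.sin(x)               # (1/2) f''(x) for f = sin
print(est, exact)
```

For $f = \sin$ the expectation is even available in closed form, $\mathbb E[\sin(x + B_{dt})] = \sin(x) e^{-dt/2}$, so the quality of the difference quotient can be assessed directly.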
The procedure to obtain the infinitesimal generator of Brownian motion can be straightforwardly
generalized to the case of Itô diffusions:
Example 1.4. Assume that $X_t$ satisfies the sde (7.1); then its generator $\mathcal A_t$ is
\[ \mathcal A_t f(x) = \mu(t,x)\,\frac{d}{dx} f(x) + \frac{\sigma^2(t,x)}{2}\,\frac{d^2}{dx^2} f(x) \tag{7.3} \]
for all $f \in C^2$ with compact support. The computation is similar to the Brownian motion case. First apply Itô's formula to $f(X_t)$ and get
\[ f(X_{t+r}) = f(X_t) + \int_t^{t+r} \Big\{ \mu(s, X_s)\,\frac{d}{dx} f(X_s) + \frac{\sigma^2(s, X_s)}{2}\,\frac{d^2}{dx^2} f(X_s) \Big\}\,ds + \int_t^{t+r} \sigma(s, X_s)\,\frac{d}{dx} f(X_s)\,dB_s. \]
Then, using the fact that $f \in C^2$ with compact support, the last integral has expectation zero. Conditioning on $X_t = x$, computing $\frac{\mathbb E[f(X_{t+r}) \mid X_t = x] - f(x)}{r}$, exchanging the integrals by Fubini–Tonelli and taking $r \to 0$, we conclude that the generator has the form (7.3).
The above example can be further generalized to the case when the function f also depends on time:
Example 1.5. Consider the two-dimensional process $(t, X_t)$, where the first coordinate is deterministic and the second coordinate $X_t$ satisfies (7.1). We treat it as a process $Y_t = (t, X_t) \in \mathbb R^2$. In this case, the generator of $Y_t$, according to the definition in (7.2), is given by
\begin{align*}
\mathcal A_t f(t, x) &:= \lim_{dt \downarrow 0} \frac{\mathbb E[f(Y_{t+dt}) \mid Y_t = (t,x)] - f(t,x)}{dt} \\
&= \mu(t,x)\,\frac{\partial}{\partial x} f(t,x) + \frac{\sigma^2(t,x)}{2}\,\frac{\partial^2}{\partial x^2} f(t,x) + \frac{\partial}{\partial t} f(t,x) \tag{7.4}
\end{align*}
for any $f \in C^{1,2}$ that has compact support.
Formally speaking, if $\mathcal A_t$ is the generator of $X_t$, what $\mathcal A_t$ does to $f$ is to map it to the "drift coefficient" in the stochastic differential of $f(X_t)$, i.e.,
\[ df(X_t) = \mathcal A_t f(X_t)\,dt + (\text{something})\,dB_t. \]
Remark 1.6. The notation here is slightly different from [Klebaner], where Klebaner always uses $L_t$ to denote the operator on functions $f \in C^{1,2}$ given by
\[ L_t f(t,x) = \mu(t,x)\,\frac{\partial}{\partial x} f(t,x) + \frac{\sigma^2(t,x)}{2}\,\frac{\partial^2}{\partial x^2} f(t,x), \tag{7.5} \]
and calls such $L_t$ the "generator of $X_t$". Comparing this form with (7.3) and (7.4), and since $L_t$ acts on $C^{1,2}$ functions, we can relate $L_t$ to the generator $\mathcal A_t$ of $(t, X_t)$:
\[ L_t f(t,x) + \frac{\partial}{\partial t} f(t,x) = \mathcal A_t f(t,x). \]
When we look at martingales constructed from the generators, $\mathcal A_t$ will give a more compact (and perhaps more intuitive) notation.
Exercise. Find the generator $\mathcal A_t$ of $(X_t, Y_t)$, where $X_t$ and $Y_t$ satisfy
\begin{align*}
dX_t &= \mu(t, X_t)\,dt + \sigma(t, X_t)\,dB_t \\
dY_t &= \alpha(t, Y_t)\,dt + \beta(t, Y_t)\,dB_t.
\end{align*}
What if $X_t$ and $Y_t$ are driven by two independent Brownian motions?
Example 3.4. Letting $g(x) = \mathbf 1_A(x)$, being able to solve (7.9) is equivalent to knowing
\[ \mathbb E[\mathbf 1_A(X_T) \mid X_t = x] = \mathbb P[X_T \in A \mid X_t = x]. \]
Example 3.5. Letting $X_t$ be the solution to the Black–Scholes model $dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$ and $g(x) = V(x)$ some value function of an option at time $T$, being able to solve (7.9) is equivalent to knowing the expected value of that option at expiration: $\mathbb E[V(X_T) \mid X_t = x]$.
We now state an extension of Theorem 3.1 which deals with the case where the right side of the
pde is nonzero.
Theorem 3.6. Under the standing assumptions, suppose $f(t,x)$ solves the pde
\[ \begin{cases} \mathcal A_t f(t,x) = -\varphi(x) & \text{for all } t \in (0,T) \\ f(T,x) = g(x) \end{cases} \]
for some bounded function $\varphi : \mathbb R \to \mathbb R$ and $g$ such that $\mathbb E[|g(X_T)|] < \infty$. Then
\[ f(t,x) = \mathbb E\Big[ g(X_T) + \int_t^T \varphi(X_s)\,ds \,\Big|\, X_t = x \Big] \quad \text{for all } t \in [0,T]. \]
Example 3.8 (Example 3.5 continued). Let us consider the Black–Scholes model, i.e., $dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$ for $\sigma, \mu \in \mathbb R$. We consider the case where one can cash in one's option and obtain risk-free interest, i.e., hold an asset satisfying the ode
\[ dX_t = r X_t\,dt, \]
for a positive constant $r \in \mathbb R$. Then one needs to factor such possible risk-free earnings into the value $V(t, X_t)$ of the asset $X_t$ (the underlying), i.e., compare the expected value at the future time $T$, $V(X_T) = V^*(X_T)$, with the projected risk-free value today:
\[ e^{r(T-t)}\, V(t, X_t) = \mathbb E[ V^*(X_T) \mid X_t = x ], \]
or, in other words,
\[ V(t, X_t) = \mathbb E\big[ e^{-r(T-t)}\, V^*(X_T) \,\big|\, X_t = x \big]. \]
The above is an example of the expected value in Theorem 3.7, and therefore obeys the pde
\[ \begin{cases} \partial_t V(t,x) + \mu x\, \partial_x V(t,x) + \frac12 \sigma^2 x^2\, \partial^2_{xx} V(t,x) - r\, V(t,x) = 0 & \text{for all } t \in [0,T] \\ V(T,x) = V^*(x), \end{cases} \]
which is called the Black–Scholes equation.
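The representation $V(t,x) = \mathbb E[e^{-r(T-t)} V^*(X_T) \mid X_t = x]$ can be checked by Monte Carlo. In the sketch below we take the risk-neutral case $\mu = r$ (so that the discounted expectation coincides with the classical Black–Scholes call price, which we also compute in closed form); all parameter values and function names are our illustrative choices, not from the text.

```python
import numpy as np
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(x, K, r, sigma, tau):
    """Closed-form Black-Scholes price of a call with strike K."""
    d1 = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return x * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

def mc_call(x, K, r, sigma, tau, n=400_000, seed=1):
    """Discounted expectation E[e^{-r tau} (X_T - K)^+ | X_t = x],
    sampling X_T from the exact GBM law with drift mu = r."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n)
    XT = x * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * Z)
    return exp(-r * tau) * np.maximum(XT - K, 0.0).mean()

x, K, r, sigma, tau = 1.0, 1.0, 0.05, 0.2, 1.0
mc_price = mc_call(x, K, r, sigma, tau)
cf_price = bs_call(x, K, r, sigma, tau)
print(mc_price, cf_price)
```

The two numbers should agree up to Monte Carlo error, illustrating the probabilistic representation of the pde solution.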
4. Time-homogeneous Diffusions
In this section we now consider a class of diffusion processes whose drift and diffusion coefficients
do not depend explicitly on time:
Definition 4.1. If $X_t$ solves
\[ dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t \tag{7.11} \]
then $X_t$ is a time-homogeneous Itô diffusion process.
Intuitively, the evolution of such processes does not depend on the time at which the process is started, i.e., $\mathbb P[X_t \in A \mid X_0 = x_0] = \mathbb P[X_{t+s} \in A \mid X_s = x_0]$ for all $A \in \mathcal B(\mathbb R)$ and $s \in T$ such that $t + s \in T$. In other words, their evolution is invariant with respect to translations in time, whence the name time-homogeneous.
Definition 4.2. Given an sde with a unique solution, we define the associated Markov semigroup $P_t$ by
\[ (P_t \varphi)(x) = \mathbb E^x \varphi(X_t). \]
To see that this definition satisfies the semigroup property, observe that the Markov property gives
\[ (P_{t+s} \varphi)(x) = \mathbb E^x \varphi(X_{t+s}) = \mathbb E^x\, \mathbb E^{X_s} \varphi(X_t) = \mathbb E^x (P_t \varphi)(X_s) = (P_s P_t \varphi)(x). \]
Note that for the class of processes introduced above the infinitesimal generator is time-independent, i.e., we have $\mathcal A_t f = \mathcal A f$. As a further consequence of the translation invariance (in time) of the sde (7.11), the fact that the final condition of the backward Kolmogorov equation is at a specific time $T$ is not relevant in this framework. This enables us to "store" the time reversal in the function itself and to view the backward Kolmogorov equation as a forward equation, as we explain below.
Let $f_-(x,t)$ be a bounded $C^{2,1}$ function satisfying
\[ \begin{cases} \dfrac{\partial f_-}{\partial t} = L f_- \\[4pt] f_-(x, 0) = g(x), \end{cases} \tag{7.12} \]
where $L$ is the generator defined in (7.5). For simplicity we also assume that $g$ is bounded and continuous. Then we have the analogue of Theorem 3.1:
Theorem 4.3. $M_t = f_-(X_t, T - t)$ is a martingale for $t \in [0, T)$.
Proof. The proof is identical to the Brownian case. We start by applying Itô's formula:
\begin{align*}
f_-(X_s, T - s) - f_-(X_0, T) &= \int_0^s \Big[ -\frac{\partial f_-}{\partial t}(X_\gamma, T - \gamma) + \big(L f_-\big)(X_\gamma, T - \gamma) \Big]\,d\gamma \\
&\quad + \int_0^s \sigma(X_\gamma)\, \frac{\partial f_-}{\partial x}(X_\gamma, T - \gamma)\,dB_\gamma.
\end{align*}
As before, the integrand of the first integral is identically zero because $\frac{\partial f_-}{\partial t} = L f_-$. Hence only the stochastic integral is left on the right-hand side. □
And as before we have
Corollary 4.4. In the above setting,
\[ f_-(x, t) = \mathbb E[\, g(X_t) \mid X_0 = x \,]. \]
The restriction to bounded and continuous $g$ is not needed.
5. Stochastic Characteristics
To better understand Theorem 4.3 and Corollary 4.4, we begin by considering the deterministic case
\[ \begin{cases} \dfrac{\partial f_-}{\partial t} = (b \cdot \nabla) f_- \\[4pt] f_-(x, 0) = f(x). \end{cases} \tag{7.13} \]
We want to draw an analogy between the method of characteristics used to solve (7.13) and the results in Theorem 4.3 and Corollary 4.4. The method of characteristics is a way of solving (7.13) which, in this simple setting, amounts to finding a collection of curves ("characteristic curves") along which the solution is constant. Let us call these curves $\xi(t)$, where $t$ is the parametrizing variable. Mathematically, we want $f_-(\xi(t), T - t)$ to be a constant independent of $t \in [0,T]$ for some fixed $T$; the constant then depends only on the choice of $\xi(t)$. We will look for curves $\xi(t) = (\xi_1(t), \dots, \xi_d(t))$ which solve an ODE, so that we can parametrize them by their initial condition $\xi(0) = x$. It may seem odd (and unneeded) to introduce the final time $T$. This is done so that $f_-(x, T) = f_-(\xi(0), T)$ and to keep the analogy close to what is traditionally done for sdes. Differentiating $f_-(\xi(t), T - t)$ with respect to $t$, we see that keeping it constant amounts to
\[ \sum_{i=1}^d \frac{\partial f_-}{\partial x_i}(\xi(t), T - t)\, \frac{d\xi_i}{dt}(t) = \frac{\partial f_-}{\partial t}(\xi(t), T - t) = \sum_{i=1}^d b_i(\xi(t))\, \frac{\partial f_-}{\partial x_i}(\xi(t), T - t), \]
where the last equality follows from (7.13). We conclude that for this equality to hold in general we need
\[ \frac{d\xi}{dt} = b(\xi(t)) \quad \text{and} \quad \xi(0) = x. \]
Since $f_-(\xi(t), T - t)$ is constant, we have
\[ f_-(\xi(0), T) = f_-(\xi(T), 0) \implies f_-(x, T) = f(\xi(T)), \tag{7.14} \]
which provides a solution to (7.13) at all points which can be reached by the curves $\xi(T)$. Under mild assumptions this is all of $\mathbb R^d$.
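The recipe above is short enough to sketch in code. The following (function names and the constant-drift test case are our illustrative choices) integrates the characteristic ODE $d\xi/dt = b(\xi)$, $\xi(0) = x$ with forward Euler and returns $f_-(x, T) = f(\xi(T))$ as in (7.14); for a constant drift $b$ the exact solution is $f_-(x, T) = f(x + bT)$, which we use as a check.

```python
import numpy as np

def solve_transport(x, T, b, f, n_steps=10_000):
    """Method of characteristics for  df_-/dt = (b . grad) f_-,
    f_-(x, 0) = f(x):  integrate d xi/dt = b(xi), xi(0) = x with
    forward Euler, then return f_-(x, T) = f(xi(T))  (cf. (7.14))."""
    dt = T / n_steps
    xi = np.array(x, dtype=float)
    for _ in range(n_steps):
        xi = xi + dt * b(xi)
    return f(xi)

# Constant drift: the exact solution is f_-(x, T) = f(x + b T).
b_vec = np.array([0.5, -1.0])
b = lambda xi: b_vec
f = lambda xi: np.sin(xi[0]) + xi[1]**2
x, T = np.array([0.2, 0.3]), 2.0
val = solve_transport(x, T, b, f)
ref = f(x + b_vec * T)
print(val, ref)
```

For a non-constant drift $b$ the Euler step incurs an $O(dt)$ error, but the structure of the computation, flow along a single deterministic curve and read off the initial datum, is exactly what the stochastic version replaces by an average over random curves.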
Looking back at Theorem 4.3, we notice that, differently from the ode case, we did not find an sde $X_t$ which keeps $f_-(X_t, T - t)$ constant in the fully fledged sense. However, we have obtained something very close to it: the process $t \mapsto f_-(X_t, T - t)$ is a martingale, i.e., a process that is constant on average! This is the content of Theorem 4.3 and Corollary 4.4 (putting the accent on the expectation part of the result), which mimics the result of (7.14), only with the addition of expected values. Hence we might be provoked to make the following fanciful statement.

Stochastic differential equations are the method of characteristics for diffusions. Rather than follow a single characteristic back to the initial value to find the current value, we trace an infinite collection of stochastic curves, each back to its own initial value, which we then average, weighting with the probability of the curve.
6. A fundamental example: Brownian motion and the Heat Equation
We now consider the simple but fundamental case of standard Brownian motion.
Let us consider a compact subset $D \subset \mathbb R^2$ with a smooth boundary $\partial D$ and a function $f(x)$ defined on $\partial D$.
The Dirichlet problem: we are looking for a function $u(y)$ such that
\[ \Delta u = \frac{\partial^2 u}{\partial y_1^2} + \frac{\partial^2 u}{\partial y_2^2} = 0 \quad \text{for } y = (y_1, y_2) \text{ inside } D, \]
\[ \lim_{y \to x} u(y) = f(x) \quad \text{for all } x \in \partial D. \]
Let $B(t, \omega) = (B_1(t, \omega), B_2(t, \omega))$ be a two-dimensional Brownian motion. Define the stopping time
\[ \tau = \inf\{ t > 0 : B(t) \notin D \}. \]
Let $\mathbb E^y$ be the expectation with respect to the Wiener measure for a Brownian motion starting from $y$ at time $t = 0$, and define $\varphi(y) = \mathbb E^y f(B(\tau))$. We are going to show that $\varphi$ solves the Dirichlet problem.
Lemma 6.1. With probability 1, $\tau < \infty$. In fact, $\mathbb E \tau^r < \infty$ for all $r > 0$.

Proof.
\begin{align*}
\mathbb P\{\tau \ge n\} &\le \mathbb P\{ |B(1) - B(0)| \le \operatorname{diam} D,\ |B(2) - B(1)| \le \operatorname{diam} D,\ \dots,\ |B(n) - B(n-1)| \le \operatorname{diam} D \} \\
&\le \prod_{k=1}^n \mathbb P\{ |B(k) - B(k-1)| \le \operatorname{diam} D \} = \alpha^n, \quad \text{where } \alpha \in (0,1).
\end{align*}
Hence $\sum_{n=1}^\infty \mathbb P\{\tau \ge n\} < \infty$, and the Borel–Cantelli lemma says that $\tau$ is almost surely finite. Now let us look at the moments:
\[ \mathbb E \tau^r = \int x^r\, \mathbb P\{\tau \in dx\} \le \sum_{n=1}^\infty n^r\, \mathbb P\{ \tau \in (n-1, n] \} \le \sum_{n=1}^\infty n^r\, \mathbb P\{ \tau \ge n-1 \} \le \sum_{n=1}^\infty n^r \alpha^{n-1} < \infty. \] □
Let us fix a point $y$ in the interior of $D$ and put a small circle of radius $\rho$ around $y$, so that the circle is contained completely in $D$. Let $\tau_{\rho, y}$ be the first time $B(t)$ hits the circle of radius $\rho$ centered at $y$. Because the law of Brownian motion is invariant under rotations, we see that $B(\tau_{\rho, y})$ is distributed uniformly on the circle centered at $y$. (Let us call this circle $S_\rho(y)$.)
Theorem 6.2. $\varphi$ solves the Laplace equation.

Proof. i) We start by proving the mean value property. To do so we invoke the strong Markov property of $B(t)$. Let $\tau_S = \inf\{ t : B(t) \in S_\rho(y) \}$ and let $z_\vartheta = y + (\rho \cos\vartheta, \rho \sin\vartheta)$ be the point on $S_\rho(y)$ at angle $\vartheta$. We notice that any path from $y$ to the boundary of $D$ must pass through $S_\rho(y)$. Thus we can think of $\varphi(y)$ as the weighted average of $\mathbb E\big[ f(B(\tau)) \,\big|\, B(\tau_S) = z_\vartheta \big]$ as $\vartheta$ moves around the circle $S_\rho(y)$. Each entry in this average is weighted by the chance of hitting that point on the circle starting from $y$. Since this chance is uniform (all points are equally likely), we simply get the factor of $\frac{1}{2\pi}$ that normalizes things to a probability measure:
\[ \varphi(y) = \frac{1}{2\pi} \int_0^{2\pi} \mathbb E\big\{ f(B(\tau)) \,\big|\, B(\tau_S) = z_\vartheta \big\}\, d\vartheta = \frac{1}{2\pi} \int_0^{2\pi} \varphi(z_\vartheta)\, d\vartheta. \tag{7.15} \]
ii) $\varphi$ is infinitely differentiable. This can easily be shown, but let us just assume it, since we are doing this exercise in explicit calculation to improve our understanding, not to prove every detail of the theorems.
iii) Now we see that $\varphi$ satisfies $\Delta\varphi = \frac{\partial^2 \varphi}{\partial y_1^2} + \frac{\partial^2 \varphi}{\partial y_2^2} = 0$. We expand about a point $y$ in the interior of $D$:
\begin{align*}
\varphi(z) = \varphi(y) &+ (z_1 - y_1)\frac{\partial \varphi}{\partial y_1} + (z_2 - y_2)\frac{\partial \varphi}{\partial y_2} \\
&+ \frac12 \Big[ (z_1 - y_1)^2 \frac{\partial^2 \varphi}{\partial y_1^2} + (z_2 - y_2)^2 \frac{\partial^2 \varphi}{\partial y_2^2} + 2 (z_1 - y_1)(z_2 - y_2) \frac{\partial^2 \varphi}{\partial y_1 \partial y_2} \Big] + O(|z - y|^3).
\end{align*}
Now we integrate this over a circle $S_\rho(y)$ centered at $y$ of radius $\rho$, taking $\rho$ sufficiently small that the entire disk lies in the domain $D$. By direct calculation we have
\[ \int_{S_\rho(y)} (z_1 - y_1)\, dz = 0, \qquad \int_{S_\rho(y)} (z_2 - y_2)\, dz = 0, \qquad \int_{S_\rho(y)} (z_1 - y_1)(z_2 - y_2)\, dz = 0, \]
and
\[ \int_{S_\rho(y)} (z_1 - y_1)^2\, dz = (\text{const})\,\rho^2, \qquad \int_{S_\rho(y)} (z_2 - y_2)^2\, dz = (\text{const})\,\rho^2. \]
Since by the mean value property
\[ \varphi(y) = (\text{const}) \int_{S_\rho(y)} \varphi(z)\, dz, \]
we see that
\[ 0 = (\text{const})\,\rho^2 \Big( \frac{\partial^2 \varphi}{\partial y_1^2} + \frac{\partial^2 \varphi}{\partial y_2^2} \Big) + O(\rho^3), \]
and thus
\[ \Delta\varphi = \frac{\partial^2 \varphi}{\partial y_1^2} + \frac{\partial^2 \varphi}{\partial y_2^2} = 0. \] □
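The uniform-exit property used in part i) also yields a practical algorithm, often called "walk on spheres": instead of simulating the whole Brownian path, jump directly to a uniform point on the largest circle around the current position contained in $D$, and repeat until near the boundary. The sketch below (domain, boundary data, tolerance and sample count are our illustrative choices) estimates $\varphi(y) = \mathbb E^y f(B(\tau))$ on the unit disk with $f(x_1, x_2) = x_1$, whose harmonic extension is $u(y) = y_1$.

```python
import numpy as np

rng = np.random.default_rng(2)

def walk_on_spheres(y, n_samples=5000, eps=1e-3):
    """Estimate phi(y) = E^y f(B(tau)) for the unit disk D:
    repeatedly jump to a uniform point on the largest circle centered
    at the current point and contained in D (the uniform-exit property
    proved above), until within eps of the boundary, then evaluate the
    boundary data f(x1, x2) = x1 at the nearest boundary point."""
    total = 0.0
    for _ in range(n_samples):
        p = np.array(y, dtype=float)
        while 1.0 - np.linalg.norm(p) > eps:
            rho = 1.0 - np.linalg.norm(p)      # distance to the boundary
            theta = rng.uniform(0.0, 2.0 * np.pi)
            p = p + rho * np.array([np.cos(theta), np.sin(theta)])
        p = p / np.linalg.norm(p)              # project onto the circle
        total += p[0]                           # f(p) = p_1
    return total / n_samples

y = (0.3, 0.2)
est = walk_on_spheres(y)
print(est, y[0])   # estimate vs the exact harmonic value u(y) = y_1
```

Each sample needs only $O(\log 1/\varepsilon)$ jumps on average, so this is far cheaper than discretizing the Brownian path step by step.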
CHAPTER 8

Martingales and Localization
This chapter is dedicated to a more in-depth study of martingales and their properties. Some of the results presented here are fairly general, and their proofs in full generality require tools that are more advanced than the ones we have at our disposal. For this reason, some of the proofs will be given in a simplified setting or under stronger assumptions, together with a reference for the more general result.
Proof. We only prove the first result, for which we have
\[ \lim_{n \to \infty} \sup_{t \in T} \mathbb E\big[ |X_t| \mathbf 1_{|X_t| > n} \big] \le \lim_{n \to \infty} \mathbb E\big[ |Y| \mathbf 1_{|Y| > n} \big] = 0. \] □
We say that a martingale such as the one in (8.3) is closed by the random variable $Y$. In particular, on any finite time interval $[0, T]$, every martingale is by definition closed by its value at $T$, since $\mathbb E[M_T \mid \mathcal F_t] = M_t$, and we have the following corollary.
Corollary 1.7. Any martingale Mt on a finite time interval is uniformly integrable.
The above results can be extended to infinite time intervals.
Theorem 1.8 (Martingale convergence theorem). Let $M_t$ on $T = [0, \infty)$ be a [sub/super-]martingale with $\sup_{t \in T} \mathbb E[|M_t|] < \infty$. Then there exists an almost sure (i.e., pointwise) limit $\lim_{t \to \infty} M_t = Y$, and $Y$ is an integrable random variable.
The above theorem does not establish a correspondence between the random variables in terms of expected values. In particular, we may have cases where the theorem above applies but $\lim_{t \to \infty} \mathbb E[M_t] \ne \mathbb E[Y]$:
Example 1.9. Consider the martingale $M_t = \exp[B_t - t/2]$. Because it is positive, we have
\[ \sup_{t \in T} \mathbb E[|M_t|] = \sup_{t \in T} \mathbb E[M_t] = \mathbb E[M_0] = 1 < \infty, \]
so it converges almost surely to a random variable $Y$ by Theorem 1.8. However, by the law of large numbers for Brownian motion, $B_t / t \to 0$ a.s., and therefore
\[ \mathbb E[Y] = \mathbb E\big[ \lim_{t \to \infty} M_t \big] = \mathbb E\big[ \lim_{t \to \infty} e^{t (B_t / t - 1/2)} \big] = 0, \]
while $\mathbb E[M_t] = 1$ for every $t$.
2. Optional stopping
After studying martingales per se, we consider their relation with stopping times. In particular, we will see that martingales behave nicely with respect to stopping times. To be more explicit, given a stochastic process $X_t$ and recalling the definition Def. 7.15 of a stopping time $\tau$, we write $\tau \wedge t = \min(\tau, t)$ and define the stopped process
\[ X_t^\tau := X_{\tau \wedge t} = \begin{cases} X_t & \text{if } t < \tau \\ X_\tau & \text{else.} \end{cases} \]
The following theorem gives an example of the nice relationship between martingales and stopping
times: it says that the martingale property is maintained by a process when such process is stopped.
Theorem 2.1. For an $\mathcal F_t$-martingale $M_t$ and any stopping time $\tau$, the process $M_{\tau \wedge t}$ is an $\mathcal F_t$-martingale (and therefore an $\mathcal F_{\tau \wedge t}$-martingale), so
\[ \mathbb E[M_{\tau \wedge t}] = \mathbb E[M_0] \quad \text{for all } t > 0. \tag{8.4} \]
Martingales are often thought of as fair games because of their property of conserving their expected value: it is impossible, on average, to make positive gains by playing such a game. Under this interpretation, Theorem 2.1 states that even if a player is given the possibility of quitting the game or of using any betting strategy, he or she will not be able to make net gains at time $t$, provided that the strategy only depends on past information (cf. Def. 7.15 of stopping time). However, the above property is lost if the player is patient enough, as the following example shows:
Example 2.2. Let $B_t$ be a standard Brownian motion (a martingale, hence an example of a "fair game": you can think of it as a continuous version of betting one dollar on every coin flip) and define $\tau_1 := \inf\{ t : B_t \ge 1 \}$ (the strategy of stopping as soon as you have a net gain of one dollar). Then by definition we have $B_{\tau_1} = 1 \ne 0 = \mathbb E[B_0]$.
A similar situation to the one described above holds when considering the "martingale" betting strategy of doubling your bet every time you lose a coin flip. This strategy leads to an almost sure net win of one dollar if one is patient enough (and has enough money to bet). As the examples above show, stopped martingales may lose the property of conserving the expected value in the limit $t \to \infty$. The following theorem gives sufficient conditions for the martingale property to hold in this limit, i.e., for the expected value of a game to be conserved at a stopping time $\tau$:
Theorem 2.3 (Optional stopping theorem). Let $M_t$ be a martingale and $\tau$ a stopping time. Then $\mathbb E[M_\tau] = \mathbb E[M_0]$ if any of the following conditions holds:
‚ the stopping time $\tau$ is bounded a.s., i.e., $\exists K < \infty$ such that $\tau \le K$;
‚ the martingale $M_t$ is uniformly integrable;
‚ the stopping time is finite a.s. (i.e., $\mathbb P[\tau = \infty] = 0$), $M_t$ is integrable, and $\lim_{t \to \infty} \mathbb E[M_t \mathbf 1_{\tau > t}] = 0$.
Proof.
Under the gaming interpretation above, we see that a game is "fair", i.e., it is impossible to make net gains on average using only past information, if any of the conditions i)-iii) holds. In particular, in the case of coin-flip games (or casino games) we see that a winning strategy does not exist because condition ii) holds: there is only a finite amount of money in the world, so the martingale is uniformly bounded, and in particular uniformly integrable. A simplified example of such a situation is given next:
Example 2.4. Let $B_t$ be a standard Brownian motion, let $a < 0 < b$, and define the stopping time $\tau = \tau_{ab} = \inf\{ t \in [0, \infty) : B_t \notin (a, b) \}$. The stopped process $B_{\tau \wedge t}$ is uniformly bounded and in particular uniformly integrable. Hence, by Theorem 2.3 we have $\mathbb E[B_\tau] = \mathbb E[B_0] = 0$. However, we also have that $B_\tau = b$ with probability $p$ and $B_\tau = a$ with probability $1 - p$; therefore
\[ 0 = \mathbb E[B_\tau] = a (1 - p) + b\, p \implies \mathbb P[B_\tau = b] = p = \frac{-a}{b - a}, \]
a conclusion based only on the martingale property of $B_t$, which therefore extends to any martingale for which $\tau_{ab}$ is finite a.s.
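The exit probability $p = -a/(b-a)$ is easy to confirm by simulation. The sketch below (step size, interval and sample counts are our illustrative choices; the small discretization bias from overshooting the barrier is ignored) runs Brownian paths from 0 until they leave $(a, b)$ and records how often they exit at the top.

```python
import numpy as np

rng = np.random.default_rng(3)

def exit_prob_top(a=-1.0, b=2.0, n_paths=3000, dt=1e-3):
    """Simulate Brownian paths started at 0 until they leave (a, b)
    and estimate P[B_tau = b]; optional stopping predicts -a/(b-a)."""
    B = np.zeros(n_paths)
    active = np.ones(n_paths, dtype=bool)
    hit_top = np.zeros(n_paths, dtype=bool)
    while active.any():
        n = active.sum()
        B[active] += rng.normal(0.0, np.sqrt(dt), size=n)
        top = active & (B >= b)
        bot = active & (B <= a)
        hit_top |= top
        active &= ~(top | bot)
    return hit_top.mean()

p_hat = exit_prob_top()
print(p_hat, 1.0 / 3.0)   # -a/(b-a) = 1/3 for a = -1, b = 2
```

The same simulation, applied to any other continuous martingale, would give the same exit probabilities, which is exactly the point of the martingale argument.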
We conclude the chapter by presenting the converse of Theorem 2.3:
Proposition 2.5. Let $X_t$ be a stochastic process such that for any stopping time $\tau$, $X_\tau$ is integrable and $\mathbb E[X_0] = \mathbb E[X_\tau]$. Then $X_t$ is a martingale.
Proof. We refer to Klebaner [9, Proof of Thm. 7.17].
3. Localization
This section is devoted to the use of stopping times for the study of the properties of stochastic processes. As we have seen, the stopped process may have some properties that the original process did not have (e.g., uniform integrability on $[0, \infty)$ in Example 2.4). One can generalize such a situation to a sequence of stopping times, as in the following example:
Example 3.1. Consider, similarly to Example 2.4, a standard Brownian motion $B_t$ on the interval $(-n, n)$ for $n \in \mathbb N$, and define the stopping times $\tau_n := \inf\{ t : B_t \notin (-n, n) \}$. For each $n > 0$, the stopped process is uniformly integrable.
In the above example, by taking the limit $n \to \infty$ one approaches the original setting of unbounded Brownian motion by approximating it with uniformly bounded stopped processes. This procedure can be extremely useful for obtaining stronger results than the ones obtained previously in the course, as we will see later in this section, and it justifies the following definition:
Definition 3.2. A property of a stochastic process $X_t$ is said to hold locally if there exists a sequence $\{\tau_n\}$ of stopping times with $\lim_{n \to \infty} \tau_n(\omega) = \infty$ a.s. such that the stopped process $X_{\tau_n \wedge t}$ has that property. In this case, the sequence $\{\tau_n\}$ is called a localizing sequence.
A particularly useful example is the one of the martingale property:
Definition 3.3. An adapted process $M_t$ is a local martingale if there exists a sequence of stopping times $\{\tau_n\}$ such that $\lim_{n \to \infty} \tau_n(\omega) = \infty$ a.s. and the stopped process $M_{\tau_n \wedge t}$ is a martingale for every $n$.
It is clear that if a property holds in the original sense, then it also holds locally: one just has to take $\tau_n = n$. Conversely, a local martingale is in general not a martingale:
Example 3.4. Consider the Itô integral $M_t = \int_0^t \exp[B_s^2]\,dB_s$, for $t < 1/4$, and define $\tau_n := \inf\{ t > 0 : \exp[B_t^2] = n \}$. The process $M_{\tau_n \wedge t}$ is a martingale, since we can write it as
\[ M_{\tau_n \wedge t} = U_t = \int_0^t \exp[B_s^2]\, \mathbf 1_{s \le \tau_n}\, dB_s, \]
which is square integrable by the Itô isometry. However, we have
\[ \mathbb E\big[ \exp(2 B_t^2) \big] = \frac{1}{\sqrt{2 \pi t}} \int_{-\infty}^{\infty} e^{2 x^2} e^{-x^2 / (2t)}\, dx, \]
which diverges for $t > 1/4$, implying that the integrand of $M_t$ is not square integrable there.
We now list some results that, besides providing practice with localization methods, give sufficient conditions for a local martingale to be a martingale.

Proposition 3.5. Let $M_t$ be a local martingale such that $|M_t| \le Y$ for an integrable random variable $Y$. Then $M_t$ is a uniformly integrable martingale.
Proof. Let $\tau_n$ be a localizing sequence; then for any $n$ and $s < t$ we have
\[ \mathbb E[ M_{t \wedge \tau_n} \mid \mathcal F_s ] = M_{s \wedge \tau_n}. \]
Because $\tau_n \uparrow \infty$ a.s., we have the pointwise convergence $\lim_{n \to \infty} M_{s \wedge \tau_n} = M_s$. Furthermore, by our assumptions $M_t$ is integrable, and we can apply the Dominated Convergence Theorem² to obtain
\[ \mathbb E[ M_t \mid \mathcal F_s ] = \mathbb E\big[ \lim_{n \to \infty} M_{t \wedge \tau_n} \mid \mathcal F_s \big] = \lim_{n \to \infty} \mathbb E[ M_{t \wedge \tau_n} \mid \mathcal F_s ] = \lim_{n \to \infty} M_{s \wedge \tau_n} = M_s, \]
showing that $M_t$ is a martingale. By Proposition 1.5 we then establish uniform integrability of $M_t$. □
² A version of this theorem is presented in the appendix.
Proposition 3.6. A non-negative local martingale $M_t$, for $t \in [0, T]$, is a supermartingale.

Proof. Let $\{\tau_n\}$ be a localizing sequence for $M_t$. Then for any $t$ we have $\lim_{n \to \infty} \tau_n \wedge t = t$ a.s. and therefore $\lim_{n \to \infty} M_{\tau_n \wedge t} = M_t$. Consequently, by Fatou's lemma for conditional expectations we have
\[ \mathbb E[ M_t \mid \mathcal F_s ] = \mathbb E\big[ \liminf_{n \to \infty} M_{\tau_n \wedge t} \mid \mathcal F_s \big] \le \liminf_{n \to \infty} \mathbb E[ M_{\tau_n \wedge t} \mid \mathcal F_s ] = \liminf_{n \to \infty} M_{\tau_n \wedge s} = M_s \quad \text{a.s.}, \]
where in the second equality we have used that the limit exists. In particular, we have $\mathbb E[M_t] \le \mathbb E[M_0] < \infty$. □
Corollary 3.7. A non-negative local martingale $M_t$ on $T = [0, T]$ for $T < \infty$ is a martingale if and only if $\mathbb E[M_T] = \mathbb E[M_0]$.

Proof. This is a direct result of Proposition 1.2 and Proposition 3.6. □
Remark 3.8. As explained in [9], there exists a necessary and sufficient condition for a local martingale to be a martingale: that the local martingale be of "Dirichlet class", i.e., such that the collection of random variables
\[ \mathcal X = \{ X_\tau : \tau \text{ is a finite stopping time} \} \]
is uniformly integrable, i.e., $\lim_{n \to \infty} \sup_{X \in \mathcal X} \mathbb E\big[ |X| \mathbf 1_{|X| > n} \big] = 0$.
We now give some slightly more advanced examples of the localization procedure. We begin by revisiting the problem of proving moment bounds for Itô integrals.

Moment Bounds for Itô Integrals. Let $I_t = \int_0^t \sigma_s\, dB_s$. We want to prove the moment bound
\[ \mathbb E |I_t|^{2p} \le (2p-1)(2p-3) \cdots 3 \cdot 1 \cdot M^{2p} t^p, \]
under the assumption that $|\sigma_s| \le M$ a.s.
The case $p = 1$ follows from the Itô isometry. Therefore, we now proceed to prove the induction step: assume the inequality for $p - 1$ and use it to prove the inequality for $p$. For any $N > 0$, we define
\[ \tau_N = \inf\Big\{ t \ge 0 : \int_0^t |I_s|^{4p-2} \sigma_s^2\, ds \ge N \Big\}. \]
Applying Itô's formula to $x \mapsto |x|^{2p}$ and evaluating at the time $t \wedge \tau_N$ produces
\[ |I_{t \wedge \tau_N}|^{2p} = p(2p-1) \int_0^{t \wedge \tau_N} |I_s|^{2(p-1)} \sigma_s^2\, ds + 2p \int_0^{t \wedge \tau_N} |I_s|^{2p-1} \sigma_s\, dB_s = (\mathrm I) + (\mathrm{II}). \]
Now, by the induction hypothesis,
\begin{align*}
\mathbb E (\mathrm I) \le p(2p-1) \int_0^t \mathbb E |I_s|^{2(p-1)} \sigma_s^2\, ds &\le p(2p-1)(2p-3) \cdots 3 \cdot 1 \cdot M^{2p} \int_0^t s^{p-1}\, ds \\
&= (2p-1)(2p-3) \cdots 3 \cdot 1 \cdot M^{2p} t^p.
\end{align*}
If we define
\[ U_t = \int_0^t |I_s|^{2p-1} \sigma_s \mathbf 1_{s \le \tau_N}\, dB_s, \]
then $U_t$ is a martingale, since
\[ \int_0^t |I_s|^{4p-2} |\sigma_s|^2 \mathbf 1_{s \le \tau_N}\, ds = \int_0^{t \wedge \tau_N} |I_s|^{4p-2} |\sigma_s|^2\, ds \le N. \]
Since $t \wedge \tau_N$ is a bounded stopping time, the optional stopping theorem says that $\mathbb E U_{t \wedge \tau_N} = 0$. However, as noted above, $\mathbb E (\mathrm{II}) = 2p\, \mathbb E U_{t \wedge \tau_N} = 0$, so one obtains
\[ \mathbb E |I_{t \wedge \tau_N}|^{2p} \le (2p-1)(2p-3) \cdots 3 \cdot 1 \cdot M^{2p} t^p. \]
Since $\int_0^t |I_s|^{4p-2} \sigma_s^2\, ds$ is almost surely finite for each $t$, we know that $\tau_N \to \infty$ with probability one as $N \to \infty$. Hence $|I_{t \wedge \tau_N}|^{2p} \to |I_t|^{2p}$ almost surely, and by Fatou's lemma we have
\[ \mathbb E |I_t|^{2p} \le \liminf_{N \to \infty} \mathbb E |I_{t \wedge \tau_N}|^{2p} \le (2p-1)(2p-3) \cdots 3 \cdot 1 \cdot M^{2p} t^p. \tag{8.5} \]
SDEs with Superlinear Coefficients. Let $b : \mathbb R^d \to \mathbb R^d$ and $\sigma^{(i)} : \mathbb R^d \to \mathbb R^d$ be such that for any $R > 0$ there exists a $C$ such that
\[ |b(x) - b(y)| + \sum_{i=1}^m |\sigma^{(i)}(x) - \sigma^{(i)}(y)| \le C |x - y|, \qquad |b(x)| + |\sigma(x)| \le C, \]
for any $x, y \in B_0(R)$, where $B_0(R) := \{ x \in \mathbb R^d : \|x\|_2 < R \}$.
Consider the sde
\[ dX_t = b(X_t)\, dt + \sum_{i=1}^m \sigma^{(i)}(X_t)\, dB_t^{(i)}. \tag{8.6} \]
For any $R$, let $b_R$ and $\sigma_R^{(i)}$ be globally bounded and globally Lipschitz functions on $\mathbb R^d$ such that $b_R(x) = b(x)$ and $\sigma_R(x) = \sigma(x)$ on $B_0(R)$.
Since $b_R$ and $\sigma_R$ satisfy the existence and uniqueness assumptions of Chapter 6.3, there exists a solution $X_t^{(R)}$ to the equation
\[ dX_t^{(R)} = b_R(X_t^{(R)})\, dt + \sum_{i=1}^m \sigma_R^{(i)}(X_t^{(R)})\, dB_t^{(i)}. \tag{8.7} \]
For any $R > 0$ we define the stopping time
\[ \tau_R := \inf\{ t \ge 0 : |X_t^{(R)}| > R \}. \]
Theorem 3.9. If
\[ \mathbb P\Big[ \lim_{R \to \infty} \tau_R = \infty \Big] = 1, \]

\[ \Gamma_N := \big\{ \{t_j^N\} : 0 = t_0^N < t_1^N < \cdots < t_{j_N}^N = t \big\} \tag{8.8} \]
with $|\Gamma_N| := \sup_j |t_{j+1}^N - t_j^N| \to 0$ as $N \to \infty$.
The process defined above is a sum of positive contributions and is therefore nondecreasing in t
a.s..
Now let $M_t$ be a [local] martingale. In light of Remark 1.3 we know that $M_t^2$ is a [local] submartingale. Hence, we would like to know whether we can transform $M_t^2$ back into a martingale, for example by subtracting a "compensation process" removing the nondecreasing part of the squared process. It turns out that such a process exists and is precisely the quadratic variation process. The
intuition behind this result comes from the following computation: assume that $s < t$; then we have
\[ \mathbb E[M_t M_s] = \mathbb E\big[ M_s\, \mathbb E[M_t \mid \mathcal F_s] \big] = \mathbb E[M_s^2], \]
where in the second equality we have used the martingale property. As a consequence of this we can write
\[ \mathbb E[(M_t - M_s)^2] = \mathbb E[M_t^2] - 2\, \mathbb E[M_t M_s] + \mathbb E[M_s^2] = \mathbb E[M_t^2] - \mathbb E[M_s^2]. \tag{8.9} \]
In particular, this implies that the summands in the definition of the quadratic variation can be expressed, in expectation, as differences of expected values that cancel telescopically, leading to (part of) the following theorem.
Theorem 4.2. This theorem can be stated in a martingale and a local martingale version:
i) Let $M_t$ be a square-integrable martingale; then the quadratic variation process $[M]_t$ exists and $M_t^2 - [M]_t$ is a martingale.
ii) Let $M_t$ be a local martingale; then the quadratic variation process $[M]_t$ exists and $M_t^2 - [M]_t$ is a local martingale.
Proof. We only prove point i) of the theorem above. Point ii) follows for locally square-integrable martingales by localization, i.e., by substituting $t \to \tau_n \wedge t$, where $\tau_n$ is the localizing sequence. Repeating the calculation leading to (8.9) with conditional expectations, we obtain
\[ \mathbb E\big[ M_t^2 - M_s^2 \mid \mathcal F_s \big] = \mathbb E\Big[ \sum_{j=1}^{j_N} \big( M_{t_j^N} - M_{t_{j-1}^N} \big)^2 \,\Big|\, \mathcal F_s \Big]. \]
Now, taking the limit in probability of the right-hand side (we do not prove that such a limit exists here, but refer to [16]) and rearranging, we obtain $\mathbb E\big[ M_t^2 - [M]_t \mid \mathcal F_s \big] = M_s^2 - [M]_s$, as desired. □
We conclude this section by proving a surprising result about martingales with finite first
variation.
Lemma 4.3. Let Mt be a continuous local martingale with finite first variation. Then Mt is
almost surely constant.
The intuition behind the above result is quite simple: on a continuum time interval, constraining a continuous martingale to behave "nicely" enough to have finite first variation (for example, monotonically or differentiably) would force the martingale to be consistent with its trend at $t^-$ (except, of course, on a set of measure 0), and it could therefore not respect the constant conditional expectation property. In other words, martingales with finite first variation are too "stiff" to be anything other than constant.
Remark 4.4. Note that continuity is a key requirement in the above result: jump processes
(constant between jumps, discontinuous when jumps occur) give an example of martingales that are
not constant but that have finite first variation.
Proof of Lemma 4.3. We assume for this proof that $M_t$ is a [locally] bounded martingale and, without loss of generality, that $M_0 = 0$. We will eventually show that the variance of $M_t$ is zero and hence $M_t$ is constant. Picking some partition of time $0 = t_0 < t_1 < \cdots < t_k = t$ and recalling (8.9), we consider the variance at time $t$:
\[ \mathbb E M_t^2 = \sum_n \mathbb E\big( M_{t_n}^2 - M_{t_{n-1}}^2 \big) = \sum_n \mathbb E \big( M_{t_n} - M_{t_{n-1}} \big)^2 \le \mathbb E\Big[ \sup_{t_n} |M_{t_n} - M_{t_{n-1}}| \sum_n |M_{t_n} - M_{t_{n-1}}| \Big]. \]
Since the first variation $V(t) = \lim_{\Delta T \to 0} \sum |M_{t_n} - M_{t_{n-1}}|$ was assumed to be finite, we obtain
\[ \mathbb E M_t^2 \le (\text{const})\, \mathbb E \lim_{\Delta T \to 0} \sup_{t_n} |M_{t_n} - M_{t_{n-1}}|. \]
If there is no such time $s$, set $\tau = \infty$. Fix a time $t$. We want to show that the random variable $X(t)$ has the same Gaussian distribution as $B_t$. To do this it is enough to show that $\mathbb E e^{i \alpha X(t)} = e^{-\alpha^2 t/2}$, that is, to show that they both have the same characteristic function (Fourier transform). It is a standard result in basic probability that if a sequence of random variables has characteristic functions which converge, for each $\alpha$, to $e^{-\alpha^2 t/2}$, then the sequence of random variables has a limit in distribution and it is Gaussian. See [1] for a nice discussion of characteristic functions and convergence of probability measures. Hence, we will show that
\[ \mathbb E\big\{ e^{i \alpha X(t \wedge \tau)} \big\} \to e^{-\frac{\alpha^2 t}{2}} + O(\varepsilon) \quad \text{as } N \to \infty, \text{ for any } \varepsilon > 0. \]
Since $\varepsilon$ will be arbitrary and the left-hand side is independent of $\varepsilon$, this will imply the result.
Partition the interval $[0, t]$ with points $t_k = kt/N$. Set
\[ I = \bigg| \mathbb E\Big\{ \prod_{j=1}^N e^{i \alpha (X_j - X_{j-1})} - \prod_{j=1}^N e^{-\frac{\alpha^2}{2}(t_j - t_{j-1})} \Big\} \bigg|. \]
All of the terms in the first product have modulus one and all of the terms in the second product are at most one. Hence
\[ I \le \mathbb E\bigg\{ \sum_{k=1}^N \Big| \mathbb E\Big\{ e^{i \alpha (X_{N-k+1} - X_{N-k})} - e^{-\frac{\alpha^2}{2}(t_{N-k+1} - t_{N-k})} \,\Big|\, \mathcal F_{t_{N-k} \wedge \tau} \Big\} \Big| \bigg\}.
\]
And thus,
\begin{align*}
I &\le \mathbb E\bigg\{ \sum_{k=1}^N \Big| \mathbb E\{ i \alpha \Delta_k X \mid \mathcal F_{t_{N-k}} \} - \frac{\alpha^2}{2} \mathbb E\{ (\Delta_k X)^2 \mid \mathcal F_{t_{N-k}} \} + \frac{\alpha^2}{2} \Delta_k t + C (\Delta_k t)^2 + C\, \mathbb E\{ |\Delta_k X|^3 \mid \mathcal F_{t_{N-k} \wedge \tau} \} \Big| \bigg\} \\
&\le N \Big( \frac{t}{N} \Big) \frac{\alpha^2}{2} - N \Big( \frac{t}{N} \Big) \frac{\alpha^2}{2} + C N \Big( \frac{t}{N} \Big)^2 + C \varepsilon N \Big( \frac{t}{N} \Big) = C \frac{t^2}{N} + \varepsilon C t.
\end{align*}
Observe that $\tau \to \infty$ as $N \to \infty$ for any fixed $\varepsilon$. Hence we have
\[ \Big| \mathbb E\big\{ e^{i \alpha X(t)} \big\} - e^{-\frac{\alpha^2 t}{2}} \Big| \le \lim_{N \to \infty} I \le \varepsilon C t. \]
Notice that the left-hand side is independent of $\varepsilon$. Since $C$ and $t$ are fixed and $\varepsilon$ was an arbitrary number in $(0, \varepsilon_0]$, we conclude that
\[ \mathbb E\big\{ e^{i \alpha X(t)} \big\} = e^{-\frac{\alpha^2 t}{2}}. \]
We now give a slightly different formulation of the Lévy–Doob theorem. Let $M_t$ be a continuous martingale. Then by Theorem 4.2, if $[M]_t = t$, condition ii) of Theorem 5.1 is satisfied and we obtain the following result.

Theorem 5.2 (Lévy–Doob theorem). If $M_t$ is a continuous martingale with $[M]_t = t$ and $M_0 = 0$, then $M_t$ is a standard Brownian motion.
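As a quick numerical illustration of the Lévy–Doob theorem (a sketch; the choice of martingale, the number of paths, the step count, and the frequency $\alpha$ are all illustrative), consider $M_t = \int_0^t \operatorname{sign}(B_s)\,dB_s$. This is a continuous martingale with $[M]_t = t$, so the theorem says it is itself a standard Brownian motion; below we check that its empirical characteristic function at time $t = 1$ matches the Gaussian value $e^{-\alpha^2 t/2}$.

```python
import cmath
import math
import random

# Monte Carlo check: M_t = \int_0^t sign(B_s) dB_s has [M]_t = t, so by
# the Levy-Doob theorem it should be a standard Brownian motion.  We
# estimate E[exp(i*alpha*M_1)] and compare with exp(-alpha^2/2).

random.seed(0)
n_paths, n_steps, t, alpha = 10000, 300, 1.0, 1.0
dt = t / n_steps

acc = 0.0 + 0.0j
for _ in range(n_paths):
    b, m = 0.0, 0.0
    for _ in range(n_steps):
        db = random.gauss(0.0, math.sqrt(dt))
        s = 1.0 if b >= 0 else -1.0   # sign(B_s) at the left endpoint (Ito sum)
        m += s * db
        b += db
    acc += cmath.exp(1j * alpha * m)

char_fn = acc / n_paths
print(abs(char_fn - math.exp(-alpha**2 * t / 2)))  # should be small
```

Note that in this discretization each increment $s\,\Delta B$ is conditionally Gaussian, so the discrete sums are exactly $N(0,t)$ and the only error is Monte Carlo noise.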
In other words, we are now in the same setting as in the previous section, where $f(t) = G^{-1}(t)$. At the same time, by the inverse function theorem we obtain
\[
f'(t) \;=\; \partial_t G^{-1}(t) \;=\; \frac{1}{G'(G^{-1}(t))} \;=\; \Bigg( \frac{1}{\sigma\big(\hat B(G^{-1}(t))\big)^2} \Bigg)^{-1} \;=\; \sigma\big(\hat B(\tau_t)\big)^2.
\]
7. Martingale inequalities
We now present some very useful inequalities that allow one to control the fluctuations of martingales. The first result is due to Doob and controls the probability distribution of the maximum of a martingale on a given time interval. For this reason these inequalities are sometimes called Doob's maximal inequalities. The first one bounds from above the probability that the supremum of a martingale on an interval exceeds a given value $\lambda$, while the second bounds the first moment of this supremum, i.e., the expected value of the supremum on the given interval.
Theorem 7.1 (Doob's Martingale Inequality). Let $M_t$ be a martingale (or a positive submartingale) with respect to the filtration $\mathcal{F}_t$. Then for $T > 0$ and for all $\lambda > 0$
\[
\mathbb{P}\Big[ \sup_{0\le t\le T} |M_t| \ge \lambda \Big] \;\le\; \frac{\mathbb{E}[|M_T|^p]}{\lambda^p} \quad\text{for all } p \ge 1,
\]
and
\[
\mathbb{E}\Big[ \sup_{0\le t\le T} |M_t|^p \Big] \;\le\; \Big( \frac{p}{p-1} \Big)^p \mathbb{E}[|M_T|^p] \quad\text{for all } p > 1.
\]
Before turning to the proof, we remark on the similarity of the first inequality with Markov's inequality: given a random variable $X$, for every $p \ge 1$ we have
\[
\mathbb{P}[|X| > \lambda] \;\le\; \frac{\mathbb{E}[|X|^p]}{\lambda^p}.
\]
The difference between the two inequalities is the supremum inside the probability, which Doob's inequality allows under the condition that the process $M_t$ is a martingale.
Proof. First of all we note that by convexity of $|x|$ and of $x^p$ on $\mathbb{R}_+$, the process $|M_t|^p$ is a submartingale. Consequently, defining the stopping time
\[
\tau_\lambda := \inf\{ t : |M_t| > \lambda \},
\]
we have by Doob's optional stopping theorem
\[
\mathbb{E}[ |M_{\tau_\lambda\wedge t}|^p ] \;\le\; \mathbb{E}[ |M_t|^p ]. \tag{8.17}
\]
At the same time, we have that
\[
\mathbb{E}[ |M_{\tau_\lambda\wedge t}|^p ] \;=\; \mathbb{E}[ |M_{\tau_\lambda\wedge t}|^p \mathbf{1}_{\tau_\lambda\le t} ] + \mathbb{E}[ |M_{\tau_\lambda\wedge t}|^p \mathbf{1}_{\tau_\lambda> t} ] \;=\; \lambda^p\,\mathbb{P}[\tau_\lambda \le t] + \mathbb{E}[ |M_t|^p \mathbf{1}_{\tau_\lambda> t} ]. \tag{8.18}
\]
Combining (8.17) and (8.18) we finally obtain
\[
\mathbb{P}\Big[ \sup_{s\in[0,t]} |M_s| \ge \lambda \Big] \;=\; \mathbb{P}[\tau_\lambda \le t] \;\le\; \frac{\mathbb{E}[ |M_t|^p \mathbf{1}_{\tau_\lambda\le t} ]}{\lambda^p} \;\le\; \frac{\mathbb{E}[ |M_t|^p ]}{\lambda^p},
\]
where in the last passage we have used the nonnegativity of $|M_t|^p$.
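A quick numerical sanity check of Theorem 7.1 (a sketch; the martingale, horizon, threshold, and sample sizes below are illustrative choices): for $M_t = B_t$ and $p = 2$ the two inequalities read $\mathbb{P}(\sup_{t\le T}|B_t| \ge \lambda) \le \mathbb{E}[B_T^2]/\lambda^2 = T/\lambda^2$ and $\mathbb{E}[\sup_{t\le T}|B_t|^2] \le 4\,\mathbb{E}[B_T^2] = 4T$.

```python
import math
import random

# Monte Carlo check of Doob's maximal inequalities for M_t = B_t, p = 2:
#   P(sup_{t<=T} |B_t| >= lam) <= T / lam^2
#   E[sup_{t<=T} |B_t|^2]      <= 4 T
random.seed(1)
n_paths, n_steps, T, lam = 10000, 400, 1.0, 2.0
dt = T / n_steps

exceed, sup_sq = 0, 0.0
for _ in range(n_paths):
    b, m = 0.0, 0.0
    for _ in range(n_steps):
        b += random.gauss(0.0, math.sqrt(dt))
        m = max(m, abs(b))       # running maximum of |B_t| on the grid
    if m >= lam:
        exceed += 1
    sup_sq += m * m

prob = exceed / n_paths
mean_sup_sq = sup_sq / n_paths
print(prob, T / lam**2)      # first inequality
print(mean_sup_sq, 4 * T)    # second inequality
```

Both bounds hold with a comfortable margin here, which is expected: Doob's inequalities are not sharp for Brownian motion.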
The above result is key to deriving numerous results in stochastic calculus. We have seen one example in the proof of Theorem 7.2. We can also use it to bound the supremum of Itô integrals:

Example 7.2. Under the assumption that $\sigma_s \le M < \infty$ we have shown in Section 3 that $\mathbb{E}\big|\int_0^t \sigma_s\,dB_s\big|^p < \infty$. Consequently, by Doob's inequality (recall that for a martingale $M_t$, $|M_t|^p$ is a positive submartingale for $p \ge 1$) we have
\[
\mathbb{E}\Big[ \sup_{t\in(0,T)} \Big| \int_0^t \sigma_s\,dB_s \Big|^p \Big] \;\le\; C_2\, \mathbb{E}\Big[ \Big| \int_0^T \sigma_s\,dB_s \Big|^{2p} \Big] \;<\; \infty.
\]
By the definition of the stopping time everything is finite, hence we can divide through by the first term on the right to obtain
\[
\mathbb{E}\Big[ \sup_{0\le s\le t\wedge\tau_N} |X_s|^{2p} \Big]^{\frac1p} \;\le\; C\, \mathbb{E}\big[ [X]^p_{t\wedge\tau_N} \big]^{\frac1p}.
\]
We realize that both the right- and left-hand sides are uniformly bounded, under our assumption, by (8.5) and by $\int_0^t f_s\,ds < t M^2$ respectively. The proof is concluded by raising both sides to the power $p$ and, by means of the dominated convergence theorem, removing the stopping time by taking the limit as $N \to \infty$.
CHAPTER 9
Girsanov’s Theorem
1. An illustrative example
We begin with a simple example. We will frame it in a rather formal way, as this will make the analogies with later examples clearer.

One-dimensional Gaussian case. Let us consider the probability space $(\Omega, \mathcal{F}, \mathbf{P})$ where $\Omega = \mathbb{R}$ and $\mathbf{P}$ is the standard Gaussian measure with mean zero and variance one. (For completeness let $\mathcal{F}$ be the Borel $\sigma$-algebra on $\mathbb{R}$.) We define two random variables $Z$ and $\tilde Z$ on this probability space. As always, a real-valued random variable is a function from $\Omega$ into $\mathbb{R}$. Let us define
\[
Z(\omega) = \omega \qquad\text{and}\qquad \tilde Z(\omega) = \omega + \mu
\]
for some fixed constant $\mu$. Since $\omega$ is drawn under $\mathbf{P}$ with respect to the $N(0,1)$ measure on $\mathbb{R}$, we have that $Z$ is distributed $N(0,1)$ and $\tilde Z$ is distributed $N(\mu,1)$.
Now let us introduce the density function associated to $\mathbf{P}$,
\[
\varphi(\omega) = \frac{1}{\sqrt{2\pi}} \exp\Big( -\frac{\omega^2}{2} \Big),
\]
and the function
\[
\Lambda_\mu(\omega) = \frac{\varphi(\omega - \mu)}{\varphi(\omega)} = \exp\Big( \omega\mu - \frac{\mu^2}{2} \Big).
\]
Since $\Lambda_\mu$ is a function from $\Omega$ to $\mathbb{R}$, it can be viewed as a random variable, and we have
\[
\mathbb{E}_{\mathbf{P}} \Lambda_\mu = \int_\Omega \Lambda_\mu(\omega)\,\mathbf{P}(d\omega) = \int_{-\infty}^{\infty} \Lambda_\mu(\omega)\varphi(\omega)\,d\omega = \int_{-\infty}^{\infty} \varphi(\omega-\mu)\,d\omega = 1,
\]
since $\varphi(\omega - \mu)$ is the density of a $N(\mu,1)$ random variable. Hence $\Lambda_\mu$ is an $L^1(\Omega, \mathbf{P})$ random variable, and we can define a new measure $\mathbf{Q}$ on $\Omega$ by
\[
\mathbf{Q}(d\omega) = \Lambda_\mu(\omega)\,\mathbf{P}(d\omega).
\]
This means that for any random variable $X$ on $\Omega$, the expected value with respect to $\mathbf{Q}$, denoted by $\mathbb{E}_{\mathbf{Q}}$, is defined by
\[
\mathbb{E}_{\mathbf{Q}}[X] = \mathbb{E}_{\mathbf{P}}[X\Lambda_\mu].
\]
Furthermore, observe that for any bounded $f : \mathbb{R} \to \mathbb{R}$,
\[
\mathbb{E}_{\mathbf{Q}} f(Z) = \mathbb{E}_{\mathbf{P}}[f(Z)\Lambda_\mu] = \int_{-\infty}^{\infty} f(Z(\omega))\Lambda_\mu(\omega)\varphi(\omega)\,d\omega = \int_{-\infty}^{\infty} f(\omega)\varphi(\omega-\mu)\,d\omega = \int_{-\infty}^{\infty} f(\omega+\mu)\varphi(\omega)\,d\omega = \mathbb{E}_{\mathbf{P}} f(\tilde Z),
\]
which implies that the distribution of $Z$ under the measure $\mathbf{Q}$ is the same as the distribution of $\tilde Z$ under $\mathbf{P}$.
Example 1.1 (Importance sampling). Let $f : \mathbb{R} \to \mathbb{R}$, and let $X$ be distributed $N(\mu,1)$ for some $\mu \in \mathbb{R}$. We have that
\[
\mathbb{E} f(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-\frac{(x-\mu)^2}{2}}\,dx.
\]
For $n$ large and $\{X_i\}_{i=1}^n$ iid $N(\mu,1)$, we can estimate the above expected value by sampling, i.e.,
\[
\mathbb{E}[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(X_i).
\]
The problem with this method is that for even moderately large values of $\mu$ (e.g., $\mu > 6$), taking for example $f = 1_{x<0}$, we would need a very large number of samples before sampling the tail of $N(\mu,1)$, i.e., the elements that are relevant for our estimation.
However, let $Y$ be distributed $N(0,1)$. Then by the procedure outlined above we have
\[
\mathbb{E}[f(X)] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,\frac{e^{-\frac{(x-\mu)^2}{2}}}{e^{-\frac{x^2}{2}}}\, e^{-\frac{x^2}{2}}\,dx = \mathbb{E}\Big[ f(Y)\, e^{\mu Y - \frac{\mu^2}{2}} \Big] \approx \frac{1}{n}\sum_{i=1}^n f(Y_i)\, e^{\mu Y_i - \frac{\mu^2}{2}}
\]
for $\{Y_i\}_{i=1}^n$ iid $N(0,1)$. Under this new distribution the samples frequently contribute to the indicator function, and we need significantly fewer samples to obtain an accurate estimate of the expectation.
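The reweighted estimator can be sketched as follows (the values of $\mu$ and $n$ are illustrative choices, and $\mu = 3$ is used so the answer $\Phi(-\mu)$ can be checked against `math.erfc`):

```python
import math
import random

# Importance sampling for Example 1.1: estimate E[f(X)] with X ~ N(mu,1)
# and f = 1_{x<0}.  Instead of sampling N(mu,1) (which almost never lands
# below 0), sample Y ~ N(0,1) and weight by exp(mu*Y - mu^2/2).
random.seed(2)
mu, n = 3.0, 200000

est = 0.0
for _ in range(n):
    y = random.gauss(0.0, 1.0)
    if y < 0:                                  # f(y) = 1_{y<0}
        est += math.exp(mu * y - mu * mu / 2)  # likelihood-ratio weight
est /= n

truth = 0.5 * math.erfc(mu / math.sqrt(2))     # P(N(mu,1) < 0) = Phi(-mu)
print(est, truth)
```

Roughly half of the $N(0,1)$ samples land in the relevant region $\{y < 0\}$, whereas a naive $N(\mu,1)$ sampler would see that region with probability $\Phi(-\mu) \approx 1.3\times 10^{-3}$ already at $\mu = 3$.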
Multidimensional Gaussian case. Now let us consider a higher-dimensional version of the above example. Let $\Omega = \mathbb{R}^n$ and let $\mathbf{P}$ be the $n$-dimensional Gaussian probability measure with covariance $\sigma^2 I$, where $\sigma > 0$ and $I$ is the $n \times n$ identity matrix. In analogy to before, we define for $\omega = (\omega_1, \dots, \omega_n) \in \mathbb{R}^n$
\[
\varphi(\omega) = \frac{1}{(2\pi\sigma^2)^{\frac n2}} \exp\Big( -\frac{1}{2\sigma^2}\sum_{i=1}^n \omega_i^2 \Big)
\]
and, for a fixed vector $\mu \in \mathbb{R}^n$, $\Lambda_\mu(\omega) = \varphi(\omega-\mu)/\varphi(\omega)$ as before. Now define the $\mathbb{R}^n$-valued random variables $Z(\omega) = (Z_1(\omega), \dots, Z_n(\omega)) = \omega$ and $\tilde Z(\omega) = (\tilde Z_1(\omega), \dots, \tilde Z_n(\omega)) = Z(\omega) + \mu$. If we define $\mathbf{Q}(d\omega) = \Lambda_\mu(\omega)\mathbf{P}(d\omega)$, then following the same reasoning as before, the distribution of $Z$ under $\mathbf{Q}$ is the same as the distribution of $\tilde Z$ under $\mathbf{P}$.
Brownian motion with drift. Consider now the process
\[
dX_t = \mu\,dt + dB_t.
\]
In light of what has been discussed in the previous section, we factor the joint density of the increments into iid Gaussian factors:
\[
\prod_{i=1}^{n} e^{-\frac{[(x_i - x_{i-1}) - \mu(t_i - t_{i-1})]^2}{2(t_i - t_{i-1})}} \;=\; \prod_{i=1}^{n} e^{-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}} \;\prod_{i=1}^{n} e^{\mu(x_i - x_{i-1}) - \frac12\mu^2(t_i - t_{i-1})}.
\]
Now we can consider the first product as the density of the desired Gaussian distribution and the remaining factor as the random variable $\Lambda_\mu(\omega, t)$:
\[
\mathbb{E}[f(X_{t_1}, \dots, X_{t_n})] = \mathbb{E}\Big[ f(B_{t_1}, \dots, B_{t_n})\, e^{\mu B_{t_n} - \frac12\mu^2 t_n} \Big] = \mathbb{E}[ f(B_{t_1}, \dots, B_{t_n})\, \Lambda_\mu(\omega, t) ].
\]
We note en passant that the "coefficient" $\Lambda_\mu(\omega, t)$ can be written as a martingale $M_t(\omega)$, more precisely the exponential martingale $M_t = e^{\mu B_t - \frac12\mu^2 t}$ (we are going to define this concept more precisely in the next section).
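The finite-dimensional identity above can be checked numerically (a sketch; the drift, horizon, grid, and sample sizes are illustrative, and the path functional chosen here is the running maximum, to emphasize that $f$ may depend on the whole grid):

```python
import math
import random

# Check E[f(X)] = E[f(B) * exp(mu*B_T - mu^2*T/2)] for X_t = mu*t + B_t,
# with f = running maximum over the time grid.  Both estimators use the
# same simulated Brownian increments, so they differ only by MC noise.
random.seed(3)
mu, T, n_steps, n_paths = 1.0, 1.0, 200, 20000
dt = T / n_steps

direct, weighted = 0.0, 0.0
for _ in range(n_paths):
    b, max_b, max_x = 0.0, 0.0, 0.0
    for k in range(1, n_steps + 1):
        b += random.gauss(0.0, math.sqrt(dt))
        max_b = max(max_b, b)               # running max of the BM path
        max_x = max(max_x, b + mu * k * dt) # running max of the drifted path
    direct += max_x                                           # f applied to X
    weighted += max_b * math.exp(mu * b - mu * mu * T / 2)    # f applied to B, reweighted

est_direct = direct / n_paths
est_weighted = weighted / n_paths
print(est_direct, est_weighted)  # the two estimates should agree
```

The identity is exact for any functional of the grid values, so the agreement here is limited only by sample size, not by discretization.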
If $\mathcal{G} \subset \mathcal{F}$ is a $\sigma$-algebra and $\nu(d\omega) = X(\omega)\mu(d\omega)$, then for any bounded $f$:
\[
\mathbb{E}_\nu[f \mid \mathcal{G}]\; \mathbb{E}_\mu[X \mid \mathcal{G}] \;=\; \mathbb{E}_\mu[f X \mid \mathcal{G}].
\]
Before using the above theorem in the context of stochastic processes, we recall the concept of the stochastic exponential of a process $X_t$, given by
\[
\mathcal{E}(X)_t = \exp\Big( X_t - X_0 - \frac12 [X]_t \Big).
\]
Recall that the stochastic exponential of a process $X_t$ is defined as the solution to the sde
\[
dU_t = U_t\,dX_t. \tag{9.1}
\]
When $X_t$ is a local martingale we know, by the martingale representation theorem (Theorem 8.3), that we can express $dX_t = C_t\,dB_t$ for a predictable process $C_t$. Therefore, by (9.1), stochastic exponentials of local martingales are local martingales themselves, as summarized in the following theorem. This result also gives a sufficient condition (called the Novikov condition) for the stochastic exponential of a (local) martingale to be a true martingale.
Theorem 3.4 (Exponential Martingale). If $M_t$ is a local martingale with $M_0 = 0$ (like, for instance, every $\int_0^t a_s\,dB_s$ with $\mathbf{P}\big[\int_0^t a_s^2\,ds < \infty\big] = 1$), then the stochastic exponential $\mathcal{E}(M)_t$ is a continuous positive local martingale, and hence a supermartingale. Furthermore, if
\[
\mathbb{E}\Big[ \exp\Big( \frac12 [M]_T \Big) \Big] < \infty, \tag{Novikov}
\]
then $\mathcal{E}(M)_t$ is a martingale on $[0,T]$ with $\mathbb{E}\big(\mathcal{E}(M)_t\big) = 1$.
Remark 3.5. Other conditions guaranteeing that the stochastic exponential of a local martingale is a true martingale exist. Some of them are summarized in [9, Thm. 8.14–8.17]. Furthermore, if $M_t$ has the form $M_t = \int_0^t a_s\,dB_s$, then the condition $a_s \le c(s) < \infty$ for all $s \in (0,T)$ is a sufficient condition for $\mathcal{E}(M)_t$ to be a martingale.
We finally come to the first version of Girsanov's theorem. This result allows us to do something very similar to what was done in the first section of this chapter: switching to a new probability measure so that an "unnatural" random variable becomes a normally distributed one. This idea generalizes to the framework of stochastic processes: Girsanov's theorem allows, under some conditions summarized below, to transform an Itô process
\[
dY_t = a_t(\omega)\,dt + dB_t \tag{9.2}
\]
on a given probability space $(\Omega, \mathcal{F}, \mathbf{P})$ into the "simplest" stochastic process we encountered in this course, i.e., Brownian motion, by changing the measure on that space.
Theorem 3.6 (Girsanov I). Let $Y_t$ be defined as in (9.2) with $B_t$ a Brownian motion under $\mathbf{P}$. Assume that $\int_0^t a_s\,dB_s$ is well defined, define the stochastic exponential
\[
\Lambda_t = \exp\Big( -\int_0^t a_s\,dB_s - \frac12 \int_0^t a_s^2\,ds \Big),
\]
and assume that $\Lambda_t$ is a martingale on $[0,T]$ with respect to $\mathbf{P}$ (i.e., a $\mathbf{P}$-martingale). Then under the (equivalent) probability measure defined by
\[
\frac{d\mathbf{Q}}{d\mathbf{P}}(\omega) = \Lambda_T(\omega) \tag{9.3}
\]
the process $Y_t$ is a Brownian motion $\hat B_t$ on $[0,T]$.
Proof. We want to show that $Y_t$ is a standard Brownian motion with respect to $\mathbf{Q}$. To do so, by Lévy's characterization of Brownian motion (Theorem 5.2), it is sufficient to show that
i) $Y_t$ is a local martingale with respect to $\mathbf{Q}$,
ii) $[Y]_t = t$,
provided that $Y_0 = 0$ (which we assume without loss of generality).
Part ii) follows from the following computation:
Because $h'(s)$ is continuous on a compact interval, it is uniformly bounded and the Novikov condition holds. Hence we can define the measure $d\mathbf{Q} = \Lambda_t\,d\mathbf{P}$; by the above theorem, under $\mathbf{Q}$, $X_t$ is a standard Brownian motion. Therefore we can write
\[
\mathbf{Q}(G) = \int 1_G\,\frac{d\mathbf{Q}}{d\mathbf{P}}\,d\mathbf{P} \;\le\; \Big( \int \Big(\frac{d\mathbf{Q}}{d\mathbf{P}}\Big)^2 d\mathbf{P} \Big)^{\frac12} \mathbf{P}(G)^{\frac12},
\]
where in the inequality we have used the Cauchy–Schwarz inequality, and so:
\[
\mathbf{P}\Big( \sup_{0\le t\le 1} |B_t - h(t)| < \varepsilon \Big) \;\ge\; \frac{\mathbf{Q}\big( \sup_{(0,1)} |\hat B_s| < \varepsilon \big)^2}{\int \frac{d\mathbf{Q}}{d\mathbf{P}}\,d\mathbf{Q}}.
\]
Looking at the above inequality, we see that we have reduced the estimation of the relevant probability to the estimation of the probability of Brownian motion exiting an interval and the expected value of the random variable $\Lambda_1$.
The above result can be extended to the $d$-dimensional setting with a nontrivial diffusion coefficient $\sigma(X_t)$. Furthermore, we may be interested in transforming $Y_t$ (in the distributional sense) into an Itô process $X_t$ different from Brownian motion. Conditions for doing this are summarized in the following, more general theorem:
Theorem 3.9 (Girsanov II). Let $X_t, Y_t \in \mathbb{R}^d$ be processes satisfying
\[
dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t,
\]
\[
dY_t = \big( \mu(Y_t, t) + \gamma(\omega, t) \big)\,dt + \sigma(Y_t, t)\,dB_t,
\]
with $Y_0 = X_0 = x$, for an $m$-dimensional $\mathbf{P}$-Brownian motion $B_t$ on $t \in [0,T]$. Suppose that there exists a process $u(\omega, t)$ such that
\[
\sigma(Y_t)\,u(\omega, t) = \gamma(\omega, t).
\]
Furthermore let
\[
\Lambda_t := \exp\Big( -\int_0^t u(\omega, s)\,dB_s - \frac12 \int_0^t u(\omega, s)^2\,ds \Big). \tag{9.4}
\]
Then, if $\Lambda_t$ is a $\mathbf{P}$-martingale on $[0,T]$ and $\mathbf{Q}$ is defined as in (9.3), we have that
\[
dY_t = \mu(Y_t, t)\,dt + \sigma(Y_t, t)\,d\hat B_t
\]
Proof. It follows from Theorem 3.6 that $\hat B_t$ is a Brownian motion with respect to $\mathbf{Q}$. Furthermore we observe that
\[
dY_t = \big( \mu(Y_t,t) + \gamma(\omega,t) \big)\,dt + \sigma(Y_t,t)\big( d\hat B_t - u(\omega,t)\,dt \big) = \big( \mu(Y_t,t) + \gamma(\omega,t) \big)\,dt + \sigma(Y_t,t)\,d\hat B_t - \gamma(\omega,t)\,dt = \mu(Y_t,t)\,dt + \sigma(Y_t,t)\,d\hat B_t,
\]
as desired.
We note that the above result can be added to our arsenal of methods to find weak solutions to sdes! Indeed, let $X_t$, $Y_t$ be defined by
\[
dX_t = \mu_1(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad dY_t = \mu_2(Y_t)\,dt + \sigma(Y_t)\,dB_t, \qquad X_0 = Y_0 = x,
\]
and assume that we cannot solve the equation for $X_t$ but have an idea of how to solve the one for $Y_t$. Then we can define $u(y)$ by
\[
\sigma(y)\,u(y) = \mu_2(y) - \mu_1(y)
\]
and set, as in Theorem 3.9,
\[
\Lambda_t = e^{-\int_0^t u(Y_s)\,dB_s - \frac12 \int_0^t |u(Y_s)|^2\,ds},
\]
which allows us to define the measure $d\mathbf{Q} = \Lambda_t\,d\mathbf{P}$. Then by Theorem 3.9 we have that
\[
\hat B_t = B_t + \int_0^t u(Y_s)\,ds
\]
is a standard Brownian motion under $\mathbf{Q}$, and
\[
dY_t = \mu_1(Y_t)\,dt + \sigma(Y_t)\,d\hat B_t = \mu_1(Y_t)\,dt + \sigma(Y_t)\big[ u(Y_t)\,dt + dB_t \big] = \mu_1(Y_t)\,dt + \mu_2(Y_t)\,dt - \mu_1(Y_t)\,dt + \sigma(Y_t)\,dB_t = \mu_2(Y_t)\,dt + \sigma(Y_t)\,dB_t.
\]
Hence, under $\mathbf{Q}$, $Y_t$ solves the same sde as $X_t$ does under $\mathbf{P}$, but with a different Brownian motion. This implies that the law of $Y_t$ on $C(0,T;\mathbb{R}^d)$ (and therefore all of its marginals) is equivalent to the law of $X_t$ on $C(0,T;\mathbb{R}^d)$. Hence we can write the unknown marginals of the process $X_t$ under $\mathbf{P}$ as
\[
\mathbb{E}_{\mathbf{P}}[f(X_t)] = \mathbb{E}_{\mathbf{Q}}[f(Y_t)] = \mathbb{E}_{\mathbf{P}}[f(Y_t)\Lambda_T],
\]
i.e., as an expectation of a process that we know, multiplied by a weighting factor that can be estimated/computed.
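This recipe can be sketched numerically (an illustrative sketch, not part of the notes: the "hard" drift is taken to be $\mu_1(x) = -x$, an OU process, precisely because its mean $\mathbb{E}[X_t] = x_0 e^{-t}$ is known and gives us something to check; the "solvable" equation has $\mu_2 = 0$, so $Y_t = x_0 + B_t$, $u(y) = y$, and $\Lambda_t = \exp(-\int_0^t Y_s\,dB_s - \frac12\int_0^t Y_s^2\,ds)$):

```python
import math
import random

# Weak solution by Girsanov reweighting: estimate E[X_T] for
# dX = -X dt + dB by simulating only Y_t = x0 + B_t and weighting with
# Lambda_T = exp(-\int Y dB - 1/2 \int Y^2 ds) (Euler sums below).
random.seed(4)
x0, T, n_steps, n_paths = 0.5, 0.5, 200, 20000
dt = T / n_steps

est = 0.0
for _ in range(n_paths):
    y, log_lam = x0, 0.0
    for _ in range(n_steps):
        db = random.gauss(0.0, math.sqrt(dt))
        log_lam += -y * db - 0.5 * y * y * dt  # Euler sums for both integrals
        y += db                                # dY = dB
    est += y * math.exp(log_lam)               # f(y) = y, reweighted
est /= n_paths

print(est, x0 * math.exp(-T))  # estimate vs exact OU mean
```

Only the easy process $Y$ is ever simulated; all the information about the drift $\mu_1$ enters through the weight $\Lambda_T$.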
CHAPTER 10
for any choice of α and β. Notice that by construction φ is twice differentiable, has positive derivative, and is a monotone increasing function from $\mathbb{R}$ onto $\mathbb{R}$. Hence φ is invertible, and we can understand φ as a warping of $\mathbb{R}$ under which $X_t$ becomes a martingale. For this reason, the function φ is called the natural scale for the process $X_t$.
In light of (10.2), $Y_t = \varphi(X_t)$ satisfies
\[
dY_t = (\varphi'\sigma)(\varphi^{-1}(Y_t))\,dB_t, \tag{10.3}
\]
which shows that $Y_t$ is not only a martingale but again solves an sde, now without a drift term.
In the discussion of random time changes, we saw that when the martingale $M_t$ solves the sde
\[
dM_t = g(M_t)\,dB_t \tag{10.4}
\]
and we consider $M_t$ on the time scale
\[
\tau(t) = \int_0^t \frac{1}{g^2(M_s)}\,ds,
\]
then $B_t = M_{\tau(t)}$ is a Brownian motion. Since the rate at which randomness is injected into the system, as measured by the quadratic variation, is one for a Brownian motion, this time change is given a distinguished status. The measure on $\mathbb{R}$ which gives this time change is $\frac{1}{g^2(x)}$ when integrated along the trajectory, and it is called the speed measure. In the setting of (10.4), the speed measure, denoted $m(x)dx$, would be
\[
m(x) = \frac{1}{g^2(x)}.
\]
Returning to the setting with a drift term (10.1), we look for the time change of the resulting martingale after the system has been put on its natural scale. Looking at (10.3), we see that
\[
\frac{1}{\big[ (\varphi'\sigma)(\varphi^{-1}(y)) \big]^2}\,dy \tag{10.5}
\]
is the speed measure for the system expressed in the $y$ variable, where $y = \varphi(x)$. Undoing this transformation using $dy = \varphi'(x)\,dx$ shows the speed measure in the original variable to be
\[
m(x)\,dx = \frac{1}{(\varphi'\sigma^2)(x)}\,dx.
\]
2. Existence of Weak Solutions
In the previous section we saw how to transform the one-dimensional sde (10.1) into a Brownian motion by warping space and changing time. Noticing that each of these operations is reversible/invertible, we now reverse our steps to turn a Brownian motion into a solution of (10.1). Let $B_t$ be a standard Brownian motion. Looking back at (10.3) and (10.5), we define $Y_t$ by
\[
dY_t = (\varphi'\sigma)(\varphi^{-1}(Y_t))\,dB_t.
\]
The equation has a weak solution given by $Y_t = B_{T_t}$ where
\[
T_t = \int_0^t \big[ (\varphi'\sigma)(\varphi^{-1}(B_s)) \big]^2\,ds.
\]
Next we define $X_t = \psi(Y_t)$, where for notational compactness we have set $\psi = \varphi^{-1}$; then Itô's formula tells us that
\[
dX_t = \psi'(Y_t)\,dY_t + \frac12 \psi''(Y_t)\,d[Y]_t.
\]
Need to finish argument
Rearranging produces
\[
\mathbf{P}_x(\tau_a \le \tau_b) = \frac{\varphi(b) - \varphi(x)}{\varphi(b) - \varphi(a)}. \tag{10.6}
\]
Another way to find this formula is to set $u(x) = \mathbf{P}_x(\tau_a \le \tau_b)$. Then $u(x)$ solves the pde
\[
(Lu)(x) = 0 \quad x \in (a,b), \qquad u(a) = 1, \quad u(b) = 0.
\]
It is not hard to see that the above formula solves this pde. (Try the case when $X_t$ is a standard Brownian motion to get started.)
Now we derive a formula for $v(x) = \mathbb{E}_x \tau_{(a,b)}$. Since it is a solution to
\[
(Lv)(x) = -1 \quad x \in (a,b), \qquad v(a) = v(b) = 0,
\]
one finds
\[
v(x) = \mathbb{E}_x \tau_{(a,b)} = 2\,\frac{\varphi(x) - \varphi(a)}{\varphi(b) - \varphi(a)} \int_x^b \big[ \varphi(b) - \varphi(z) \big] m(z)\,dz + 2\,\frac{\varphi(b) - \varphi(x)}{\varphi(b) - \varphi(a)} \int_a^x \big[ \varphi(z) - \varphi(a) \big] m(z)\,dz.
\]
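The exit probability $\mathbf{P}_x(\tau_a < \tau_b) = \frac{\varphi(b)-\varphi(x)}{\varphi(b)-\varphi(a)}$ can be checked by simulation (a sketch with illustrative parameters: for $dX = \mu\,dt + dB$ the natural scale may be taken to be $\varphi(x) = -e^{-2\mu x}$, since any affine transform of φ gives the same ratio):

```python
import math
import random

# Monte Carlo check of the exit formula for dX = mu dt + dB on (a, b).
random.seed(5)
mu, a, b, x0 = 0.5, 0.0, 1.0, 0.5
n_paths, dt = 10000, 5e-4
sqdt = math.sqrt(dt)

def phi(x):
    return -math.exp(-2.0 * mu * x)  # natural scale: (1/2)phi'' + mu*phi' = 0

hit_a = 0
for _ in range(n_paths):
    x = x0
    while a < x < b:
        x += mu * dt + random.gauss(0.0, sqdt)  # Euler step
    if x <= a:
        hit_a += 1

est = hit_a / n_paths
exact = (phi(b) - phi(x0)) / (phi(b) - phi(a))
print(est, exact)
```

The positive drift makes hitting the lower boundary less likely than the symmetric value 1/2, and the scale-function formula quantifies exactly how much less.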
4. Recurrence
Definition 4.1. A one-dimensional diffusion is recurrent if for all starting points $x$ and all $y$, $\mathbf{P}_x(\tau_y < \infty) = 1$.
Theorem 4.2. If $a < x < b$ then
i) $\mathbf{P}_x(T_a < \infty) = 1$ if and only if $\varphi(\infty) = \infty$.
ii) $\mathbf{P}_x(T_b < \infty) = 1$ if and only if $\varphi(-\infty) = -\infty$.
iii) $X_t$ is recurrent if and only if $\varphi(\mathbb{R}) = \mathbb{R}$, if and only if both $\varphi(\infty) = \infty$ and $\varphi(-\infty) = -\infty$.
Proof of Theorem 4.2.
1. Leo Breiman, Probability, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992,
Corrected reprint of the 1968 original. MR 93d:60001
2. J. L. Doob, Stochastic processes, John Wiley & Sons Inc., New York, 1953. MR 15,445b
3. Richard Durrett, Stochastic calculus, Probability and Stochastics Series, CRC Press, Boca Raton, FL, 1996, A
practical introduction. MR 1398879 (97k:60148)
4. , Stochastic calculus, a practical introduction, CRC Press, 1996.
5. John Guckenheimer and Philip Holmes, Nonlinear oscillations, dynamical systems, and bifurcations of vector
fields, Applied Mathematical Sciences, vol. 42, Springer-Verlag, New York, 1990, Revised and corrected reprint of
the 1983 original.
6. Philip Hartman, Ordinary differential equations, second ed., Birkhäuser, Boston, Mass., 1982.
7. Ioannis Karatzas and Steven E. Shreve, Brownian motion and stochastic calculus, second ed., Graduate Texts in
Mathematics, vol. 113, Springer-Verlag, New York, 1991. MR 1121940 (92h:60127)
8. , Brownian motion and stochastic calculus, second ed., Springer-Verlag, New York, 1991. MR 92h:60127
9. Fima C. Klebaner, Introduction to stochastic calculus with applications, third ed., Imperial College Press, London,
2012. MR 2933773
10. N. V. Krylov, Introduction to the theory of random processes, Graduate Studies in Mathematics, vol. 43, American
Mathematical Society, Providence, RI, 2002. MR 1885884 (2003d:60001)
11. Hui-Hsiung Kuo, Introduction to stochastic integration, Universitext, Springer, New York, 2006. MR 2180429
(2006e:60001)
12. H. P. McKean, Stochastic integrals, Academic Press, New York-London, 1969, Probability and Mathematical
Statistics, No. 5.
13. Peter Mörters and Yuval Peres, Brownian motion, Cambridge Series in Statistical and Probabilistic Mathematics,
Cambridge University Press, Cambridge, 2010, With an appendix by Oded Schramm and Wendelin Werner.
MR 2604525 (2011i:60152)
14. Bernt Øksendal, Stochastic differential equations, fifth ed., Universitext, Springer-Verlag, Berlin, 1998, An
introduction with applications. MR 1619188 (99c:60119)
15. Philip Protter, Stochastic integration and differential equations: a new approach, Springer-Verlag, 1990.
16. Philip E. Protter, Stochastic integration and differential equations, Stochastic Modelling and Applied Probability,
vol. 21, Springer-Verlag, Berlin, 2005, Second edition. Version 2.1, Corrected third printing. MR 2273672
(2008e:60001)
17. Daniel Revuz and Marc Yor, Continuous martingales and Brownian motion, second ed., Grundlehren der
Mathematischen Wissenschaften, vol. 293, Springer-Verlag, Berlin, 1994.
18. Walter A. Strauss, Partial differential equations, John Wiley & Sons Inc., New York, 1992, An introduction.
MR 92m:35001
19. Daniel W. Stroock, Probability theory, an analytic view, Cambridge University Press, Cambridge, 1993.
MR 95f:60003
20. S. J. Taylor, Exact asymptotic estimates of Brownian path variation, Duke Mathematical Journal 39 (1972), no. 2, 219–241. MR 0295434, Zentralblatt MATH 0241.60069.
APPENDIX A
Recall that, given a probability space $(\Omega, \Sigma, \mathbf{P})$ and a random variable $X$ on such a space, we define the expectation of a function $f$ of $X$ as the integral
\[
\mathbb{E}[f(X)] = \int_\Omega f(X(\omega))\,\mathbf{P}(d\omega),
\]
where $\mathbf{P}$ denotes the (probability) measure against which we are integrating. The following results are stated for a general measure μ (i.e., not necessarily a probability measure).
Theorem 0.1 (Hölder inequality). Let $(\Omega, \Sigma, \mu)$ be a measure space and let $p, q \in [1,\infty]$ with $1/p + 1/q = 1$. Then, for all measurable real- or complex-valued functions $f$ and $g$ on $\Omega$,
\[
\int_\Omega |f(x) g(x)|\,d\mu(x) \;\le\; \Big( \int_\Omega |f(x)|^p\,d\mu(x) \Big)^{\frac1p} \Big( \int_\Omega |g(x)|^q\,d\mu(x) \Big)^{\frac1q}.
\]
Theorem 0.2 (Lebesgue’s Dominated Convergence theorem). Let tfn u be a sequence of mea-
surable functions on a measure space pΩ, Σ, µq. Suppose that the sequence converges pointwise to a
function f and is dominated by some integrable function g in the sense that
|fn pxq| ď gpxq
for all numbers n in the index set of the sequence and all points x P S. Then f is integrable and
ż
lim |fn ´ f | dµ “ 0
nÑ8 Ω
APPENDIX B
or
\[
Z(t) = Z(0) + \int_0^t Z(s,\omega)\sigma_s(\omega)\,dB(s,\omega).
\]
The solution to this is $Z(t,\omega) = \mathcal{E}_I(t,\omega)$. Hence it is reasonable to call it the stochastic exponential. From the sde representation it is clear that $\mathcal{E}_I(t,\omega)$ is a martingale, assuming $I(t,\omega)$ is a nice process (bounded, for example). (The Novikov condition is another criterion; see [8] or [17], for example.)
Just as the exponential can be expanded in a basis of homogeneous polynomials, it is reasonable to ask if something similar can be done with the stochastic exponential. (A function $f(x)$ is homogeneous of degree $n$ if for all $\gamma \in \mathbb{R}$, $f(\gamma x) = \gamma^n f(x)$.) For the regular exponential, we have
\[
e^{\gamma X} = \sum_{n=0}^{\infty} \gamma^n \frac{X^n}{n!}.
\]
In homework 2, you found the conditions on the $C_{n,m}$ so that $H_n(I, [I])$ was a martingale. In fact, these polynomials are well known in many areas of mathematics and engineering. They are the Hermite polynomials. They can also be defined by the following expressions:
\[
H_n(x, y) = y^{\frac n2}\, \bar H_n\Big( \frac{x}{\sqrt y} \Big),
\]
\[
\bar H_n(z) = (-1)^n e^{\frac{z^2}{2}} \frac{d^n}{dz^n} e^{-\frac{z^2}{2}}.
\]
Here the $\bar H_n$ are the standard Hermite polynomials (possibly with a different normalization than you are used to).
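For this (probabilists') normalization, the two-variable polynomials satisfy the three-term recurrence $H_0 = 1$, $H_1 = x$, $H_{n+1}(x,y) = x H_n(x,y) - n\,y\,H_{n-1}(x,y)$, and since $H_n(B_t, t)$ is a martingale started at $H_n(0,0) = 0$ for $n \ge 1$, we have $\mathbb{E}[H_n(B_t, t)] = 0$. A sketch checking this by Monte Carlo (sample size and time are illustrative):

```python
import math
import random

# Evaluate H_n(x, y) via the recurrence and check E[H_n(B_1, 1)] = 0
# for n = 1..4 (martingale property of the Hermite polynomials).
random.seed(6)
n_samples, t = 200000, 1.0

def hermite(n, x, y):
    """H_n(x, y) by the three-term recurrence H_{k+1} = x H_k - k y H_{k-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * y * h_prev
    return h

means = [0.0] * 5
for _ in range(n_samples):
    b = random.gauss(0.0, math.sqrt(t))
    for n in range(1, 5):
        means[n] += hermite(n, b, t)
means = [m / n_samples for m in means]

print(means[1:])  # each entry should be close to 0
```

For instance the recurrence gives $H_2(x,y) = x^2 - y$ and $H_3(x,y) = x^3 - 3xy$, matching the familiar martingales $B_t^2 - t$ and $B_t^3 - 3tB_t$.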
We now have two different expressions for the stochastic exponential of $\gamma I(t,\omega)$ with $Z(0) = 1$. Namely, setting $Z(t,\omega) = \mathcal{E}_{\gamma I}$, we have
\[
Z(t,\omega) = 1 + \gamma \int_0^t Z(s,\omega)\sigma_s(\omega)\,dB(s,\omega)
\]
and
\[
Z(t,\omega) = \sum_{n=0}^{\infty} \gamma^n \frac{H_n\big( I(t,\omega), [I](t,\omega) \big)}{n!}.
\]
The first expression has $Z$ on the right-hand side. At least formally, we can repeatedly insert the expression for $Z(s,\omega)$. Suppressing the $\omega$ dependence, we obtain
\[
Z(t) = 1 + \gamma \int_0^t Z(s_1)\sigma(s_1)\,dB(s_1)
= 1 + \gamma \int_0^t \sigma(s_1)\,dB(s_1) + \gamma^2 \int_0^t \int_0^{s_1} Z(s_2)\sigma(s_2)\,dB(s_2)\,\sigma(s_1)\,dB(s_1)
\]
\[
= 1 + \gamma \int_0^t \sigma(s_1)\,dB(s_1) + \cdots + \gamma^n \int_0^t \int_0^{s_1} \cdots \int_0^{s_{n-1}} Z(s_n)\sigma(s_n)\,dB(s_n)\cdots\sigma(s_1)\,dB(s_1)
\]
\[
= \sum_{k=0}^{\infty} \gamma^k \int_0^t \int_0^{s_1} \cdots \int_0^{s_{k-1}} \sigma(s_k)\,dB(s_k)\cdots\sigma(s_1)\,dB(s_1).
\]