Markov Chains
Markov chains and processes are fundamental modeling tools in applications. The reason for their use is that they provide natural ways of introducing dependence into a stochastic process, and are thus more general than i.i.d. models. Moreover, the analysis of these processes is often very tractable. But perhaps the overwhelming reason for their importance is that they can quite accurately model a wide variety of physical phenomena. They play an essential role in modeling telecommunication systems, service systems, and even signal processing applications. In this chapter we will focus on the discrete-time, discrete-valued case, which leads to the appellation Markov chains.
We restrict ourselves to the discrete-time case. Markov chains (M.C.s) can be seen as a first attempt to impose a structure of dependence on a sequence of random variables that is rich enough to model many observed phenomena and yet leads to a tractable structure from which we can perform calculations. Suppose we had a sequence of r.v.s $\{X_i\}$ and we know, say, $X_5$. If the $X_i$'s are independent, then this information says nothing about a future value, say $X_{10}$, other than the a priori assumptions that we have on their distribution. On the other hand, if they were dependent, then unless we precisely specify how the probability distributions at one time depend on the distributions at other times, there is very little we could do, because to specify a stochastic process we need to specify the joint distributions. Markov chains (or Markov processes in general) are stochastic processes whose future evolution depends only on the current value and not on how the process reached there. We formalize this idea below.
Definition 1.1. Let $\{X_n\}$ be a discrete-time stochastic process which takes its values in a space $E$. Let $A \subset E$. If
$$P\{X_{n+1} \in A \mid X_0, X_1, \ldots, X_n\} = P\{X_{n+1} \in A \mid X_n\},$$
then $\{X_n\}$ is said to be a Markov chain. Equivalently, if $A$ is an event determined by the past $\{X_0, \ldots, X_{n-1}\}$ and $B$ an event determined by the future $\{X_{n+1}, \ldots\}$, then
$$P(A \cap B \mid X_n) = P(A \mid X_n)\, P(B \mid X_n),$$
i.e., the past and the future are conditionally independent given the present.
Example: Consider the three-state chain with transition probability matrix
$$P = \{P_{ij}\} = \begin{pmatrix} 1 & 0 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 1/3 & 2/3 \end{pmatrix}.$$
We can pictorially represent the chain as follows:
In words: if at time 0 the process starts in state 0, then it will stay in state 0 for all time, since $P_{00} = 1$ (state 0 is absorbing).

Throughout we write
$$P_{ij}^{(k)} = P\{X_k = j \mid X_0 = i\},$$
i.e., the conditional probability that the chain is in state $j$ after $k$ transitions given that it starts in state $i$; similarly
$$\pi_j^{(k)} = P\{X_k = j\}$$
and
$$P = \{P_{ij}\}_{i,j\in E}.$$
We now state the first fundamental result obeyed by the conditional probabilities of a Markov chain. The equation is called the Chapman-Kolmogorov equation. We saw (in Chapter 2) that any Markov chain must obey this equation.
Theorem 1.1. (Chapman-Kolmogorov Equations)
$$P_{ij}^{(k+l)} = \sum_{m\in E} P_{im}^{(k)}\, P_{mj}^{(l)}.$$

Proof:
$$P\{X_{k+l} = j \mid X_0 = i\} = \sum_{m\in E} P\{X_{k+l} = j, X_k = m \mid X_0 = i\}$$
$$\text{(conditional probabilities)} \quad = \sum_{m\in E} P\{X_{k+l} = j \mid X_k = m, X_0 = i\}\, P\{X_k = m \mid X_0 = i\}$$
$$\text{(Markov property)} \quad = \sum_{m\in E} P\{X_{k+l} = j \mid X_k = m\}\, P\{X_k = m \mid X_0 = i\} = \sum_{m\in E} P_{im}^{(k)}\, P_{mj}^{(l)}.$$
In the above proof we used the fact that the chain is homogeneous. Another way of stating this
result is in matrix notation.
Note that by definition
$$P_{ij}^{(n)} = (P^n)_{ij}.$$
Hence
$$P^{k+l} = P^k\, P^l.$$
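As a quick numerical sanity check (a sketch added here for illustration; the three-state matrix is the example chain from above), the matrix form of the Chapman-Kolmogorov equation can be verified directly:

```python
import numpy as np

# Three-state example chain from above (rows sum to 1).
P = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1/3, 2/3]])

k, l = 3, 4
lhs = np.linalg.matrix_power(P, k + l)                             # P^(k+l)
rhs = np.linalg.matrix_power(P, k) @ np.linalg.matrix_power(P, l)  # P^k P^l

assert np.allclose(lhs, rhs)   # Chapman-Kolmogorov in matrix form
print(lhs)
```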
There are two sub-cases of the Chapman-Kolmogorov equation that are important:
$$P_{ij}^{(k+1)} = \sum_{m\in E} P_{im}\, P_{mj}^{(k)} \quad : \text{Backward equation}$$
$$P_{ij}^{(k+1)} = \sum_{m\in E} P_{im}^{(k)}\, P_{mj} \quad : \text{Forward equation}$$
What they state is that to reach the state $j$ after $k+1$ steps, the chain starts in state $i$, goes to some state $m$ after the first transition, and then goes to $j$ from state $m$ after another $k$ transitions, or vice versa.
In a similar vein, the distributions $\pi^{(n)}$ satisfy
$$\pi_j^{(k+1)} = \sum_{i\in E} \pi_i^{(k)}\, P_{ij},$$
and more generally
$$\pi_j^{(k+l)} = \sum_{i\in E} \pi_i^{(k)}\, P_{ij}^{(l)}.$$
Example: For the two-state chain on $E = \{0, 1\}$ with
$$P = \begin{pmatrix} P_{00} & 1-P_{00} \\ 1-P_{11} & P_{11} \end{pmatrix},$$
computing $P^2$ directly and proceeding by induction one obtains
$$P^n = \frac{1}{2-P_{00}-P_{11}} \begin{pmatrix} 1-P_{11} & 1-P_{00} \\ 1-P_{11} & 1-P_{00} \end{pmatrix} + \frac{(P_{00}+P_{11}-1)^n}{2-P_{00}-P_{11}} \begin{pmatrix} 1-P_{00} & -(1-P_{00}) \\ -(1-P_{11}) & 1-P_{11} \end{pmatrix}.$$
Provided $|P_{00}+P_{11}-1| < 1$,
$$\lim_{n\to\infty} P^n = \frac{1}{2-P_{00}-P_{11}} \begin{pmatrix} 1-P_{11} & 1-P_{00} \\ 1-P_{11} & 1-P_{00} \end{pmatrix},$$
which means
$$\lim_{n\to\infty} P\{X_n = 1 \mid X_0 = 0\} = \frac{1-P_{00}}{2-P_{00}-P_{11}} = \lim_{n\to\infty} P\{X_n = 1 \mid X_0 = 1\},$$
etc.; the chain has the same limiting probabilities irrespective of the initial state, i.e., it forgets which state it started in.
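The closed form for $P^n$ is easy to check numerically; the following sketch (with arbitrary illustrative values for $P_{00}$ and $P_{11}$, not taken from the notes) compares it against a direct matrix power:

```python
import numpy as np

p00, p11 = 0.9, 0.7
P = np.array([[p00, 1 - p00],
              [1 - p11, p11]])

n = 25
denom = 2 - p00 - p11
A = np.array([[1 - p11, 1 - p00],
              [1 - p11, 1 - p00]]) / denom          # limiting part
B = np.array([[1 - p00, -(1 - p00)],
              [-(1 - p11), 1 - p11]]) / denom        # geometrically vanishing part
closed_form = A + (p00 + p11 - 1) ** n * B

assert np.allclose(np.linalg.matrix_power(P, n), closed_form)
print(closed_form[0, 1], (1 - p00) / denom)  # both rows approach the same limit
```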
Another interesting property can also be seen. Note that $P^\infty = \lim_{n\to\infty} P^n$ has columns with identical elements. Also $P^\infty = P^\infty P = P P^\infty$. The elements of the columns of $P^\infty$ are the so-called stationary probabilities of the chain, which we will study in detail.
Definition 1.2. The probability vector $\pi = \{\pi_i\}_{i\in E}$ (so $\pi_i \ge 0$ and $\sum_{i\in E} \pi_i = 1$) is said to be the stationary (or invariant) distribution of the chain if
$$\pi = \pi P,$$
or, componentwise,
$$\pi_j = \sum_{i\in E} \pi_i P_{ij}.$$
The reason that the vector $\pi$ is called the stationary distribution can be seen from the following. Suppose we start the chain with initial distribution
$$\pi_i^{(0)} = \pi_i = P\{X_0 = i\}.$$
Then from the Chapman-Kolmogorov equation
$$\pi_j^{(n+1)} = \sum_{i\in E} \pi_i^{(n)}\, P_{ij},$$
or in matrix form
$$\pi_j^{(n+1)} = (\pi P^{n+1})_j = ((\pi P)P^n)_j = (\pi P^n)_j = \cdots = (\pi P)_j = \pi_j;$$
in other words, the probability of being in a given state $j$ at any time remains the same, implying that the distribution is time-invariant, and in fact the process $\{X_n\}$ is stationary.
To show that $\{X_n\}$ is stationary if it is started with the initial distribution $\pi$, we need to show the following property:
$$P(X_{m_1} = i_1, X_{m_2} = i_2, \ldots, X_{m_p} = i_p) = P(X_{N+m_1} = i_1, X_{N+m_2} = i_2, \ldots, X_{N+m_p} = i_p)$$
for all integers $N$, $p$, $m_1 < m_2 < \cdots < m_p$ and $i_1, i_2, \ldots, i_p$. This follows readily from the fact that the chain is homogeneous and $P(X_n = i_1) = \pi_{i_1}$ for all $n$ from above. Indeed, by the multiplication rule both the l.h.s. and the r.h.s. are given by
$$\pi_{i_1}\, P_{i_1 i_2}^{(m_2 - m_1)} \cdots P_{i_{p-1} i_p}^{(m_p - m_{p-1})},$$
showing that the process is stationary.
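Computationally, a stationary distribution can be found by solving the linear system $\pi = \pi P$, $\sum_i \pi_i = 1$; the sketch below (with illustrative two-state values) also checks the time-invariance just proved:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
n = P.shape[0]

# Solve pi = pi P together with sum(pi) = 1 as a least-squares system.
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
b = np.zeros(n + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                                  # [0.75 0.25] for these values

# Started from pi, the marginal distribution never changes:
mu = pi.copy()
for _ in range(10):
    mu = mu @ P
    assert np.allclose(mu, pi)             # pi P^n = pi for all n
```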
Through the examples we have considered we have already seen two important aspects of
Markov chains: how the structure of the matrix P determines the behavior of the chain both from
the time evolution as well as the existence of stationary distributions.
We will now study these issues in greater generality.
Let us first begin by considering the finite-state case; these are referred to as finite-state Markov chains. Here $|E| < \infty$ (the cardinality of the state space is finite, i.e., the chain can only take a finite number of values).
Theorem 2.1. (Ergodicity of Markov chains, $|E| < \infty$)
Let $(E, P)$ denote a finite state Markov chain with $|E| = N+1$.

1. If there exists $n_0$ such that
$$\min_{i,j} P_{ij}^{(n_0)} > 0,$$
then there exists $\pi = (\pi_0, \pi_1, \ldots, \pi_N)$ with $\pi_i > 0$ and $\sum_{i=0}^N \pi_i = 1$, such that:

2. $\lim_{n\to\infty} P_{ij}^{(n)} = \pi_j$ for all $i \in E$, and

3. $\pi_j = \sum_{k=0}^N \pi_k P_{kj}$.
Proof: Let
$$m_j^{(n)} = \min_i P_{ij}^{(n)} \qquad \text{and} \qquad M_j^{(n)} = \max_i P_{ij}^{(n)}.$$
By definition $m_j^{(n)} \le M_j^{(n)}$. Since
$$P_{ij}^{(n+1)} = \sum_k P_{ik}\, P_{kj}^{(n)},$$
we have
$$m_j^{(n+1)} = \min_i \sum_k P_{ik}\, P_{kj}^{(n)} \ge \min_i \sum_k P_{ik}\, m_j^{(n)} = m_j^{(n)},$$
so $m_j^{(n)}$ is nondecreasing in $n$; in the same way $M_j^{(n+1)} \le M_j^{(n)}$, so $M_j^{(n)}$ is nonincreasing. Both sequences are bounded and monotonic, so the limits $\lim_n m_j^{(n)}$ and $\lim_n M_j^{(n)}$ will exist.

Let
$$\varepsilon = \min_{i,j} P_{ij}^{(n_0)} > 0.$$
Then
$$P_{ij}^{(n_0+n)} = \sum_k P_{ik}^{(n_0)}\, P_{kj}^{(n)} = \sum_k \Big[ P_{ik}^{(n_0)} - \varepsilon P_{jk}^{(n)} \Big] P_{kj}^{(n)} + \varepsilon \sum_k P_{jk}^{(n)} P_{kj}^{(n)} = \sum_k \Big[ P_{ik}^{(n_0)} - \varepsilon P_{jk}^{(n)} \Big] P_{kj}^{(n)} + \varepsilon P_{jj}^{(2n)}.$$
The term in brackets is nonnegative (since $\varepsilon P_{jk}^{(n)} \le \varepsilon \le P_{ik}^{(n_0)}$) and sums over $k$ to $1-\varepsilon$. Hence
$$P_{ij}^{(n_0+n)} \ge m_j^{(n)}(1-\varepsilon) + \varepsilon P_{jj}^{(2n)},$$
and therefore
$$m_j^{(n_0+n)} \ge m_j^{(n)}(1-\varepsilon) + \varepsilon P_{jj}^{(2n)}.$$
In a similar way
$$M_j^{(n_0+n)} \le M_j^{(n)}(1-\varepsilon) + \varepsilon P_{jj}^{(2n)}.$$
Subtracting,
$$M_j^{(n_0+n)} - m_j^{(n_0+n)} \le (1-\varepsilon)\big(M_j^{(n)} - m_j^{(n)}\big),$$
and consequently
$$M_j^{(kn_0+n)} - m_j^{(kn_0+n)} \le (1-\varepsilon)^k \big(M_j^{(n)} - m_j^{(n)}\big) \to 0 \quad \text{as } k \to \infty.$$
Define
$$\pi_j = \lim_{n\to\infty} m_j^{(n)} = \lim_{n\to\infty} M_j^{(n)}.$$
Then, since $m_j^{(n)} \le P_{ij}^{(n)} \le M_j^{(n)}$ and $m_j^{(n)} \le \pi_j \le M_j^{(n)}$, we have
$$\big| P_{ij}^{(n)} - \pi_j \big| \le M_j^{(n)} - m_j^{(n)} \le (1-\varepsilon)^{\lfloor n/n_0 \rfloor - 1},$$
so $P_{ij}^{(n)} \to \pi_j$ geometrically in $n$. Since $m_j^{(n_0)} \ge \varepsilon > 0$ and $m_j^{(n)}$ is nondecreasing, $\pi_j > 0$. Finally, since
$$P_{ij}^{(n+1)} = \sum_k P_{ik}^{(n)}\, P_{kj},$$
letting $n\to\infty$ on both sides (the sum is finite) gives
$$\pi_j = \sum_k \pi_k P_{kj},$$
and passing to the limit in $\sum_j P_{ij}^{(n)} = 1$ gives $\sum_j \pi_j = 1$.
Let us note in conclusion that the theorem is a sufficiency theorem, i.e., there may exist stationary distributions even though there may be no $n_0$ s.t. $\min_{i,j} P_{ij}^{(n_0)} > 0$.
Here is an example: let
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Then
$$P^n = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \ \text{if } n \text{ is odd}, \qquad P^n = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \ \text{if } n \text{ is even}.$$
Hence
$$\min_{i,j} P_{ij}^{(n)} = 0 \qquad \forall n.$$
But
$$\pi = \Big(\frac12,\ \frac12\Big)$$
satisfies
$$\pi_i = \sum_j \pi_j P_{ji},$$
and moreover it agrees with the two-state formulas above:
$$\pi_0 = \frac{1-P_{11}}{2-P_{00}-P_{11}}, \qquad \pi_1 = \frac{1-P_{00}}{2-P_{00}-P_{11}}.$$
This chain is however not ergodic. We will see what this means a little later on.
2.1 Geometric Convergence and Ergodicity
Suppose $\min_{i,j} P_{ij} = \varepsilon > 0$ (i.e., $n_0 = 1$). From the proof of the main result above we have
$$\sum_j \big| \pi_j^{(n)} - \pi_j \big| \le 2(1-\varepsilon)^n.$$

Remark: The quantity $\sum_j |\pi_j^{(n)} - \pi_j|$ is referred to as the total variation norm. It measures how different $\pi^{(n)}$ and $\pi$ are. This is a useful metric between two probability measures defined on the same space. This property is often called geometric ergodicity of Markov chains.
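The geometric decay of the total variation distance is easy to observe numerically; a sketch (the chain below is an arbitrary example with $\min_{i,j} P_{ij} > 0$, not taken from the notes):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
eps = P.min()                       # epsilon = min_ij P_ij > 0 here

# stationary distribution via the left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

mu = np.array([1.0, 0.0, 0.0])      # start concentrated at state 0
for n in range(1, 11):
    mu = mu @ P                     # mu^(n) = mu^(0) P^n
    tv = np.abs(mu - pi).sum()
    print(n, tv, (1 - eps) ** n)    # TV distance decays at least geometrically
```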
Let us see what this has to do with ergodicity. Recall we usually use the term ergodicity to mean that the SLLN holds for any bounded or integrable function of the process. More precisely, we use the term ergodic to imply:
$$\lim_{M\to\infty} \frac{1}{M} \sum_{n=1}^M f(X_n) = E[f(X_0)]$$
where the expectation on the r.h.s. is taken under the stationary distribution of the process. Let us see how the geometric convergence implies this. Note that in this finite state setting it is enough to show that the process $\{X_n\}$ satisfies the SLLN. Recall, a stationary process $\{X_n\}$ satisfies the SLLN if $\sum_{k=0}^\infty |R(k)| < \infty$ where $R(k)$ is the covariance function. Without loss of generality let us take $\min_{i,j} P_{ij} = \varepsilon > 0$.
Let us first compute:
$$R(n, n+m) = E[X_n X_{n+m}] - E[X_n]E[X_{n+m}] = \sum_{i\in E}\sum_{j\in E} i\,j\,\pi_i^{(n)} P_{ij}^{(m)} - \sum_{i\in E} i\,\pi_i^{(n)} \sum_{j\in E} j\,\pi_j^{(n+m)}.$$
Now taking limits as $n\to\infty$ and noting that under the conditions of the Theorem $\pi_j^{(n)} \to \pi_j$, we have
$$\lim_{n\to\infty} R(n, n+m) = \sum_{i,j\in E} i\,j\,\pi_i\, P_{ij}^{(m)} - (E[X_0])^2 =: R(m),$$
and the r.h.s. is just a function of $m$. Moreover
$$\sum_{k=0}^\infty |R(k)| \le |R(0)| + \sum_{k=1}^\infty \sum_{i,j\in E} i\,j\,\pi_i\, \big| P_{ij}^{(k)} - \pi_j \big| \le C\,(N+1) \sum_{k=0}^\infty (1-\varepsilon)^k < \infty$$
for a finite constant $C$ (recall $|E| = N+1$ is finite). Therefore $\{X_n\}$, and hence $\{f(X_n)\}$ for all bounded functions $f(\cdot)$, will obey the SLLN, establishing ergodicity.
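The ergodic property can be checked by simulation: the time average of $f(X_n)$ along a single long trajectory should approach $\sum_i f(i)\pi_i$. A sketch (the chain and $f$ below are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, 5.0, -2.0])     # an arbitrary bounded function on E = {0,1,2}

# stationary distribution (left eigenvector of P for eigenvalue 1)
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

M = 200_000
x, total = 0, 0.0
for _ in range(M):
    x = rng.choice(3, p=P[x])      # one step of the chain
    total += f[x]

print(total / M, f @ pi)           # time average ~= stationary expectation
```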
Let us now study some further probabilistic characteristics of Markov chains.
So far we have only considered the case where $|E| < \infty$. For this case we saw that if there exists an $n_0$ such that $\min_{i,j} P_{ij}^{(n_0)} > 0$, then $P_{ij}^{(n)} \to \pi_j$, which does not depend on $i$, the state the chain started out in.

Our interest is to develop results that are also valid when $|E| = \infty$, i.e., when the M.C. can take a countably infinite number of values. In this case the simple argument used to show that $P_{ij}^{(n)} \to \pi_j$ cannot be carried out, since $P$ will now be a matrix with infinitely many rows and columns; and since $\sum_{j\in E} P_{ij}^{(n)} = 1$ for all $n$, this necessarily implies $\inf_{i,j} P_{ij}^{(n)} = 0$ for all $n$, so our previous arguments do not go through. However, all is not lost: we can show some interesting properties of the type above, but for this we need to undertake a more thorough study of the structure of the underlying chain.
Specifically, in the case $|E| = \infty$ we will study the existence and uniqueness of stationary distributions and the limiting behavior of $P_{ij}^{(n)}$.

To do so we will begin with a description of the states of a Markov chain. The classification will then enable us to conclude some general properties about states which are members of a class. We will classify states according to two criteria:

1. Classification of states in terms of the arithmetic (or structural) properties of the transition probabilities $P_{ij}^{(n)}$.

2. Classification of states in terms of the asymptotic properties of $P_{ij}^{(n)}$ as $n\to\infty$.

Let us begin with the classification of the states of a M.C. based on the arithmetic properties of $P_{ij}^{(n)}$. Throughout the discussion we will assume that $|E| = \infty$, although this is not strictly necessary.
Definition 3.1. A state $i \in E$ is said to be inessential if, with positive probability, it is possible to escape from it in a finite number of transitions without ever returning to it, i.e., there exist $m$ and $j$ such that $P_{ij}^{(m)} > 0$ but $P_{ji}^{(n)} = 0$ for all $n$.

Let us delete all inessential states from $E$. Then the remaining states are called essential. The essential states have the property that the M.C., once it enters the essential states, does not leave them.
Let us now assume that E consists only of essential states.
Definition 3.2. We say that a state $j$ is accessible from state $i$ if there exists $m \ge 0$ s.t. $P_{ij}^{(m)} > 0$ (note that by definition $P_{ij}^{(0)} = 1$ if $j = i$, and $= 0$ otherwise). If $j$ is accessible from $i$ and $i$ is accessible from $j$, we say that $i$ and $j$ communicate, written $i \leftrightarrow j$; the communication relation partitions the essential states into communicating classes.
Consider, for example, a chain with transition probability matrix
$$P = \begin{pmatrix} P_1 & 0 & 0 \\ 0 & P_2 & 0 \\ 0 & 0 & P_3 \end{pmatrix}$$
where the $P_i$ are state transition probability matrices of appropriate dimensions. In this case there are 3 communicating classes. In such a case, since the evolution of the states governed by $P_i$ is not influenced by the states in $P_j$, $j \ne i$, the M.C. can be analyzed as 3 separate M.C.s.
Definition 3.3. A M.C. is said to be indecomposable or irreducible if E consists of a single indecomposable class (of communicating states).
Now let us restrict ourselves to a chain which is irreducible (has only one class of communicating
states). Even so, there can be a special structure associated with the class.
Consider for example a chain whose transition probability matrix is given by
$$P = \begin{pmatrix} 0 & P_1 & 0 & 0 & 0 \\ 0 & 0 & P_2 & 0 & 0 \\ 0 & 0 & 0 & P_3 & 0 \\ 0 & 0 & 0 & 0 & P_4 \\ P_5 & 0 & 0 & 0 & 0 \end{pmatrix}$$
This chain is indecomposable but has the particular property that if the chain starts in the states corresponding to the first block then it moves to the states defined by $P_1$ at the next transition, and so on. So the chain returns to a given set of states only after 5 transitions. This is a so-called cyclic property associated with the states. We can hence sub-classify the states according to such a structure. This is related to the period of a state, which we now define formally below.
Definition 3.4. The period $d = d(j)$ of a state $j$ is
$$d(j) = \gcd\{n \ge 1 : P_{jj}^{(n)} > 0\}.$$
In other words, $d$ is the GCD (greatest common divisor) of the times $n$ for which $P_{jj}^{(n)} > 0$. If $P_{jj}^{(n)} = 0$ for all $n \ge 1$, then we put $d = 0$.

Proposition 3.5. If $i \leftrightarrow j$ then $d(i) = d(j)$, i.e., the period is a class property.

Proof: Choose $k$ and $l$ such that $P_{ij}^{(k)} > 0$ and $P_{ji}^{(l)} > 0$. Hence
$$P_{ii}^{(k+l)} \ge P_{ij}^{(k)}\, P_{ji}^{(l)} > 0,$$
so $k+l$ is divisible by $d(i)$. Now suppose $P_{jj}^{(n)} > 0$ for some $n$ not divisible by $d(i)$. Then $n+k+l$ is not divisible by $d(i)$, hence $P_{ii}^{(k+n+l)} = 0$. But
$$P_{ii}^{(k+n+l)} \ge P_{ij}^{(k)}\, P_{jj}^{(n)}\, P_{ji}^{(l)} > 0,$$
a contradiction. Hence $k+l+n$ must be divisible by $d(i)$, and therefore $n$ must be divisible by $d(i)$. This gives $d(i) \le d(j)$. By symmetry $d(j) \le d(i)$. Hence $d(i) = d(j)$.
Definition 3.6. A M.C. is said to be aperiodic if it is irreducible and the period of the states is 1.

We will assume that the M.C. is irreducible and aperiodic from now on.
If $d > 1$ then the class of states can be subdivided into cyclic subclasses, as we saw in our example where $d = 5$. To show this, select any state $i \in E$ and introduce the following subclasses:
$$C_0 = \big\{ j \in E : P_{ij}^{(n)} > 0 \Rightarrow n \equiv 0 \ (\mathrm{mod}\ d) \big\}$$
$$C_1 = \big\{ j \in E : P_{ij}^{(n)} > 0 \Rightarrow n \equiv 1 \ (\mathrm{mod}\ d) \big\}$$
$$\vdots$$
$$C_{d-1} = \big\{ j \in E : P_{ij}^{(n)} > 0 \Rightarrow n \equiv (d-1) \ (\mathrm{mod}\ d) \big\}.$$
Then $E = C_0 + C_1 + \cdots + C_{d-1}$. In particular, if $i' \in C_p$ and $P_{i'j} > 0$, then necessarily $j \in C_{(p+1) \bmod d}$: the chain moves through the cyclic subclasses in order.
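The period can be computed directly from its definition as $\gcd\{n : P_{jj}^{(n)} > 0\}$; the sketch below (checking return times only up to a cutoff, which suffices for small finite chains) recovers $d = 5$ for a cyclic chain like the example above:

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, j, n_max=50):
    """gcd of the return times n <= n_max with P_jj^(n) > 0 (0 if none)."""
    returns = []
    Pn = np.eye(P.shape[0])
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[j, j] > 0:
            returns.append(n)
    return reduce(gcd, returns, 0)

# 5-state cyclic chain: i -> i+1 (mod 5) deterministically
P = np.roll(np.eye(5), 1, axis=1)
print(period(P, 0))   # 5, matching the cyclic example above
```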
We will now study the second classification of states, in terms of the asymptotic properties of $P_{ij}^{(n)}$ as $n\to\infty$. Throughout we will assume that the M.C. is irreducible and aperiodic.
3.1 The Strong Markov Property
Before we begin our study of the classification based on the asymptotic properties, we will discuss
the issue of the strong Markov property.
The strong Markov property implies that a M.C. continues to inherit its Markov structure when
viewed at instants beyond a random time instant.
Of course the above is a very imprecise statement and so let us try to understand what it means.
Let us begin by considering a simple example.
Let
$$E = \{0, 1\}, \qquad P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix}$$
with $0 < P_{00} < 1$ and $0 < P_{11} < 1$.

Let us define the following random time $\tau$:
$$\tau = \min\{n > 0 : X_{n+1} = 0\}.$$
That is, $\tau$ is the time instant just before the time at which the chain first reaches 0. Then, for any initial distribution $\pi^{(0)} = \big(\pi_0^{(0)}, \pi_1^{(0)}\big)$,
$$P\{X_{\tau+1} = 0 \mid X_m, m < \tau, X_\tau = 1\} = 1 \ne P_{10}.$$
What this means is that the Markov transition structure is not inherited by $\{X_n\}$ after the random time $\tau$.

A natural question is: when does $P\{X_{\tau+1} = j \mid X_\tau = i\} = P_{ij}$ hold if $\tau$ is random? It turns out it holds when $\tau$ is a so-called Markov or stopping time, which we define below.
Definition 3.7. A random time $\tau$ is said to be a Markov or stopping time if the event $\{\tau = n\}$ can be completely determined by knowing $\{X_0, X_1, \ldots, X_n\}$; equivalently,
$$P\{\tau = n \mid X_m, m \ge 0\} = P\{\tau = n \mid X_m, m \le n\}.$$
Example: Let $\{X_n\}$ be a M.C. Define
$$\tau = \min\{n > 0 : X_n = i \mid X_0 = i\},$$
the first return time to $i$. Clearly, by observing $\{X_0, \ldots, X_n\}$ we can determine whether $\{\tau = n\}$ has occurred, so $\tau$ defined as above is a stopping time. An example of a $\tau$ which is not a stopping time is the random time we considered earlier, because to determine that $\tau$ we need to know the value of the process beyond $\tau$.
We now state the strong Markov property and give a proof of it.
Theorem 3.1 (Strong Markov Property). Let $\tau$ be a stopping time of the M.C. $\{X_n\}$. Then for all $i, j \in E$:
$$P\{X_{\tau+1} = j \mid X_\tau = i, X_m, m < \tau\} = P\{X_{\tau+1} = j \mid X_\tau = i\} = P_{ij}.$$

Proof: By definition,
$$P\{X_{\tau+1} = j \mid X_\tau = i, X_m, m < \tau\} = \frac{P\{X_{\tau+1} = j, X_\tau = i, X_m, m < \tau\}}{P\{X_\tau = i, X_m, m < \tau\}},$$
and
$$P\{X_{\tau+1} = j, X_\tau = i, X_m, m < \tau\} = \sum_{\nu\ge0} P\{X_{\nu+1} = j, X_\nu = i, X_m, m < \nu, \tau = \nu\}.$$
Now we will use the following result, which follows from the definition of conditional probabilities:
$$P(A \cap B \cap C) = P(A)\, P(B \mid A)\, P(C \mid A \cap B),$$
to write each term as
$$P\{X_\nu = i, X_m, m < \nu\}\; P\{\tau = \nu \mid X_\nu = i, X_m, m < \nu\}\; P\{X_{\nu+1} = j \mid \tau = \nu, X_\nu = i, X_m, m < \nu\}.$$
Since $\tau$ is a stopping time, the event $\{\tau = \nu\}$ is determined by $X_0, \ldots, X_\nu$, so by the Markov property the last factor equals $P_{ij}$. Summing over $\nu$ and dividing by $P\{X_\tau = i, X_m, m < \tau\}$ shows that
$$P\{X_{\tau+1} = j \mid X_\tau = i, X_m, m < \tau\} = P\{X_{\tau+1} = j \mid X_\tau = i\} = P_{ij},$$
and iterating the argument, $\{X_{\tau+k}\}_{k\ge0}$ is Markov with the same transition matrix $P$.
Examples of stopping times
1. All constant (non-random) times are stopping times.
2. First entrance times, such as
$$\tau_F = \inf\{n \ge 0 : X_n \in F\}.$$

An example of a random time which is not a stopping time is a last exit time of the type
$$\sigma_F = \sup\{n \ge 0 : X_n \in F\} \quad (\text{given } X_0 \in F).$$
Stopping times play a very important role in the analysis of Markov chains. They also play an
important role in some practical situations where we can only observe certain transitions such as
the so-called M.C. watched on a set, which is the following. Define
$$\tau_0 = \inf\{n \ge 0 : X_n \in Y\}$$
and, recursively, $\tau_{k+1} = \inf\{n > \tau_k : X_n \in Y\}$; the watched chain is $\{X_{\tau_k}\}_{k\ge0}$, which, by the strong Markov property, is again a M.C.
Define
$$T_i = \inf\{n \ge 1 : X_n = i\},$$
the first return time to state $i$, with $T_i = \infty$ if $X_n \ne i$ for all $n$. Note
$$\{T_i = k\} = \{X_1 \ne i, X_2 \ne i, \ldots, X_{k-1} \ne i, X_k = i\},$$
so $T_i$ is a stopping time.

Define $f_{ij} = P\{T_j < \infty \mid X_0 = i\}$; $f_{ij}$ denotes the probability that the process starting in $i$ enters state $j$ at some finite time. Let
$$N_i = \sum_{n\ge1} 1_{\{X_n = i\}};$$
$N_i$ just counts the number of times the chain visits state $i$ in an infinite sequence of moves. Define $f_{ij}^{(k)} = P\{T_j = k \mid X_0 = i\}$.
Then we have the following result which is a direct consequence of the strong Markov property.
Lemma 3.2.
$$P_{ij}^{(n)} = \sum_{k=1}^n f_{ij}^{(k)}\, P_{jj}^{(n-k)}.$$

Proof: Note
$$P_{ij}^{(n)} = P\{X_n = j \mid X_0 = i\} = \sum_{1\le k\le n} P\{X_n = j, T_j = k \mid X_0 = i\} = \sum_{1\le k\le n} P\{X_n = j \mid T_j = k, X_0 = i\}\, P\{T_j = k \mid X_0 = i\}.$$
But
$$\{T_j = k\} = \{X_1 \ne j, X_2 \ne j, \ldots, X_{k-1} \ne j, X_k = j\},$$
so by the strong Markov property
$$P\{X_n = j \mid T_j = k, X_0 = i\} = P\{X_n = j \mid X_k = j\} = P_{jj}^{(n-k)},$$
and, with $f_{ij}^{(k)} = P\{T_j = k \mid X_0 = i\}$,
$$P_{ij}^{(n)} = \sum_{k=1}^n f_{ij}^{(k)}\, P_{jj}^{(n-k)}.$$
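Lemma 3.2 can be verified numerically by computing $f_{ij}^{(k)}$ via the first-passage recursion $f_{ij}^{(1)} = P_{ij}$, $f_{ij}^{(k)} = \sum_{m\ne j} P_{im} f_{mj}^{(k-1)}$; a sketch with an arbitrary three-state chain (values illustrative):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
N = P.shape[0]
i, j, n = 0, 2, 8

# f[k][a] = P{first visit to j at time k | X_0 = a}:
# f^(1)_aj = P_aj,  f^(k)_aj = sum_{m != j} P_am f^(k-1)_mj
f = np.zeros((n + 1, N))
f[1] = P[:, j]
for k in range(2, n + 1):
    for a in range(N):
        f[k, a] = sum(P[a, m] * f[k - 1, m] for m in range(N) if m != j)

Pjj = [np.linalg.matrix_power(P, m)[j, j] for m in range(n + 1)]  # P_jj^(m), P^0 = I
lhs = np.linalg.matrix_power(P, n)[i, j]
rhs = sum(f[k, i] * Pjj[n - k] for k in range(1, n + 1))
print(lhs, rhs)              # equal, by Lemma 3.2
assert np.isclose(lhs, rhs)
```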
By the same reasoning, the distribution of $N_i$ follows from the strong Markov property. First,
$$P\{N_i \ge r \mid X_0 = j\} = f_{ji}\, f_{ii}^{\,r-1}, \qquad r \ge 1.$$
Indeed, let $\tau^m$ denote the $m$th return time to $i$. Then
$$P\{N_i = m+1 \mid X_0 = j\} = P\{\tau^{m+2} - \tau^{m+1} = \infty,\ X_{\tau^{m+1}} = i \mid X_0 = j\}$$
$$= P\{\tau^{m+2} - \tau^{m+1} = \infty \mid X_{\tau^{m+1}} = i\}\, P\{X_{\tau^{m+1}} = i \mid X_0 = j\}$$
$$= P\{T_i = \infty \mid X_0 = i\}\, f_{ji}\, f_{ii}^{\,m} = (1-f_{ii})\, f_{ji}\, f_{ii}^{\,m}.$$
Now, in particular,
$$P\{N_i = k \mid X_0 = i\} = f_{ii}^{\,k}\,(1-f_{ii}).$$
Let $P_i(N_i = k) = P(N_i = k \mid X_0 = i)$. Therefore, if $f_{ii} = 1$ then
$$P\{N_i = k \mid X_0 = i\} = 0 \quad \text{for every finite } k.$$
Hence
$$P\{N_i = \infty \mid X_0 = i\} = 1.$$
On the other hand, if $f_{ii} < 1$ then
$$E[N_i \mid X_0 = i] = \sum_{k=0}^\infty k\, f_{ii}^{\,k}\,(1-f_{ii}) = \frac{f_{ii}}{1-f_{ii}} < \infty$$
and $P_i\{N_i = \infty\} = 0$. So
$$P_i\{N_i = \infty\} = 1 \iff f_{ii} = 1, \qquad f_{ii} < 1 \iff E_i[N_i] < \infty.$$
These two quantities define a class of properties called recurrence associated with the states of a
M.C.
Definition 3.8. A state $i$ is said to be recurrent if $N_i = \infty$ a.s. Let $T_i$ be the first return time to $i$. If $E_i[T_i] < \infty$ then the state is said to be positive recurrent, while if $E_i[T_i] = \infty$ then $i$ is said to be null recurrent. A state that is not recurrent is said to be transient.
Let us see one of the implications of the property $N_i = \infty$ a.s. Define $\tau^1 = T_i$ and
$$\tau^{n+1} = \inf\{m > \tau^n : X_m = i\},$$
the successive return times to $i$.

Lemma 3.3. Suppose $N_i = \infty$ a.s. Then the pieces of trajectory between successive returns to $i$ are independent and identically distributed.

Proof: This is just a consequence of the strong Markov property: the process after $\tau^k$ and the process before $\tau^k$ are independent, the $\tau^k$ being the return times to the state $i$. We know $X_{\tau^k+n}$ has the same distribution as $X_n$ given $X_0 = i$ by the strong Markov property. Also $\tau^{k+1} - \tau^k$ has the same distribution as $T_i$, since the chain starts off afresh in state $i$.

Remark 3.1. Such pieces $\{X_{\tau^k}, X_{\tau^k+1}, \ldots, X_{\tau^{k+1}-1}\}$ are called regenerative cycles, and the $\tau^k$ the regeneration times or epochs.
Remark 3.2. A consequence of these results is that if a M.C. is irreducible (indecomposable) (all
states form a single communicating class), then all states are either transient or recurrent.
Later on we will show that positive and null recurrence, i.e., whether or not the return times have finite mean, are also class properties.
The next result establishes the limiting behavior of $P_{ij}^{(n)}$ when the states are transient.

Lemma 3.4. If $j$ is transient then for every $i$
$$\sum_{n\ge1} P_{ij}^{(n)} < \infty.$$

Proof: Note
$$\sum_{n=1}^\infty P_{ij}^{(n)} = E_i[N_j],$$
and so the sum being finite means that on the average the chain visits $j$ only a finite number of times. Now
$$\sum_{n=1}^\infty P_{ij}^{(n)} = \sum_{n=1}^\infty \sum_{k=1}^n P\{T_j = k \mid X_0 = i\}\, P_{jj}^{(n-k)} \le \Big( \sum_{k=1}^\infty P(T_j = k \mid X_0 = i) \Big)\Big( 1 + \sum_{m=1}^\infty P_{jj}^{(m)} \Big) = f_{ij}\Big( 1 + \sum_{m=1}^\infty P_{jj}^{(m)} \Big),$$
and since $j$ is transient, $f_{jj} < 1$ and $\sum_m P_{jj}^{(m)} = E_j[N_j] = \frac{f_{jj}}{1-f_{jj}} < \infty$. Hence $\sum_n P_{ij}^{(n)} < \infty$, and in particular $P_{ij}^{(n)} \to 0$ as $n\to\infty$ if $j$ is transient.
Thus, with this partition of states into recurrent or transient, we now show that recurrent states can be further decomposed into those where the expected return time is finite, called positive recurrent, and those whose expected return time is infinite, called null recurrent. Positive and null recurrence are closely associated with ergodicity of a M.C.
The following figure summarizes a classification of states based on the temporal behavior.
We saw that recurrence is a property which depends on whether $f_{ii} = 1$ or $f_{ii} < 1$, where $f_{ii} = P\{T_i < \infty \mid X_0 = i\}$. This is usually not easy to calculate, so we seek an alternative criterion. To do so let us define the so-called potential matrix
$$G = \sum_{n=0}^\infty P^{(n)}.$$
Then
$$g_{ij} = \sum_{n=0}^\infty P_{ij}^{(n)} = \sum_{n=0}^\infty P(X_n = j \mid X_0 = i) = E_i\Big[ \sum_{n=0}^\infty 1_{\{X_n = j\}} \Big],$$
i.e., $g_{ij}$ is the expected number of visits to $j$ (counting time 0) starting from $i$.
Proposition 4.1. A state $i$ is recurrent if and only if
$$\sum_{n=0}^\infty P_{ii}^{(n)} = \infty.$$

Proof: This is just equivalent to stating $E_i[N_i] = \infty$, or the fact that the chain visits $i$ an infinite number of times a.s.

With this equivalent condition we can now show that recurrence is a class property, i.e., if $i \leftrightarrow j$ (they belong to the same class) and $i$ is recurrent then $j$ is recurrent.
Proposition 4.2. Let $j$ be recurrent and $i \leftrightarrow j$; then $i$ is recurrent.

Proof: Since $i \leftrightarrow j$ there exist $s$ and $t$ such that
$$P_{ij}^{(s)} > 0, \qquad P_{ji}^{(t)} > 0.$$
Hence
$$P_{ii}^{(s+n+t)} \ge P_{ij}^{(s)}\, P_{jj}^{(n)}\, P_{ji}^{(t)},$$
so if
$$\sum_n P_{jj}^{(n)} = \infty \quad \text{then} \quad \sum_n P_{ii}^{(n)} = \infty,$$
i.e., $i$ is recurrent. Reversing the arguments shows the reverse implication.
4.1 Invariant Measures
As we have seen, if a M.C. is irreducible then either all states are recurrent or all are transient. Let us now study conditions for recurrence without calculating $f_{ii}$.

To do so we now introduce the notion of invariant measures: a nonnegative vector $\mu = \{\mu_i\}_{i\in E}$, not identically zero and not necessarily summable, is called an invariant measure if $\mu = \mu P$. Invariant measures extend the notion of stationary distributions: a M.C. can have invariant measures even when no stationary distribution exists. An example we have seen is the M.C. with
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Here $\big(\frac12, \frac12\big)$ is an invariant measure, even though $P^n$ does not converge in this case.
Let us now define a canonical invariant measure for $\{X_n\}$.

Proposition 4.3. Let $P$ be the transition matrix of a M.C. $\{X_n\}$. Assume $\{X_n\}$ is irreducible and recurrent. Let 0 be an arbitrary state and $T_0$ the return time to 0. For each $i \in E$, define
$$\mu_i = E_0\Big[ \sum_{n\ge1} 1_{\{X_n = i\}}\, 1_{\{n \le T_0\}} \Big]$$
(this is the expected number of visits to state $i$ before returning to 0). Then for all $i \in E$
$$0 < \mu_i < \infty,$$
and $\mu = \{\mu_i\}$ is an invariant measure of $P$.

Before we give the proof a few comments are in order.

Remark 4.1. Note by definition $\mu_0 = 1$, since for $n \in [1, T_0]$, $X_n = 0$ if and only if $n = T_0$. Also, since
$$\sum_{i\in E} \sum_{n\ge1} 1_{\{X_n = i\}}\, 1_{\{n \le T_0\}} = \sum_{n\ge1} \Big( \sum_{i\in E} 1_{\{X_n = i\}} \Big) 1_{\{n \le T_0\}} = \sum_{n\ge1} 1_{\{n \le T_0\}} = T_0,$$
we have
$$\sum_{i\in E} \mu_i = E_0[T_0].$$
Proof: Suppose for the moment that we have already shown $\mu = \mu P$, so that for every $n$
$$\mu_i = \sum_{j\in E} \mu_j\, P_{ji}^{(n)}.$$
Now since $\mu_0 = 1$ we have $\mu_i \ge P_{0i}^{(n)}$. If $\mu_i = 0$ then $P_{0i}^{(n)} = 0$ for all $n$; hence 0 cannot communicate with $i$, which contradicts the hypothesis that the chain is irreducible. Similarly, $1 = \mu_0 \ge \mu_i P_{i0}^{(n)}$; if $\mu_i = \infty$ this can only happen if $P_{i0}^{(n)} = 0$ for all $n$, which once again contradicts the irreducibility hypothesis. Hence $0 < \mu_i < \infty$ for all $i \in E$.
Let us now show that $\mu_i$ as defined is an invariant measure. By the definition of $\mu_i$ we have
$$\mu_i = \sum_{k\ge1} G_{0,i}^{(k)},$$
where
$$G_{0,i}^{(k)} = P(X_k = i,\ T_0 \ge k \mid X_0 = 0).$$
Conditioning on the state at time $k-1$ (which is $\ne 0$ on $\{T_0 \ge k\}$ for $k \ge 2$), we obtain for all $k \ge 2$
$$G_{0,i}^{(k)} = \sum_{j\ne0} G_{0,j}^{(k-1)}\, P_{ji}, \qquad G_{0,i}^{(1)} = P_{0i}.$$
Summing over $k$:
$$\mu_i = G_{0,i}^{(1)} + \sum_{j\ne0} \sum_{k\ge2} G_{0,j}^{(k-1)}\, P_{ji} = \mu_0 P_{0i} + \sum_{j\ne0} \mu_j P_{ji} = \sum_{j\in E} \mu_j P_{ji},$$
i.e., $\mu = \mu P$, as required.

It remains to show uniqueness (up to a multiplicative constant). Let $y = \{y_i\}$ be any invariant measure with $0 < y_i < \infty$. Define
$$q_{ji} = \frac{y_i}{y_j}\, P_{ij}.$$
Then $Q = \{q_{ji}\}$ is a transition matrix, since
$$\sum_i q_{ji} = \frac{1}{y_j} \sum_i y_i P_{ij} = \frac{y_j}{y_j} = 1,$$
and iterating the definition gives
$$q_{ji}^{(n)} = \frac{y_i}{y_j}\, P_{ij}^{(n)}$$
($Q$ is the chain reversed in time with respect to $y$). In particular
$$\sum_{n\ge0} q_{ii}^{(n)} = \sum_{n\ge0} P_{ii}^{(n)} = \infty,$$
so $Q$ is also irreducible and recurrent. Let
$$g_{ji}^{(n)} = P\{\text{the chain defined by } Q \text{ returns for the first time to state } i \text{ at time } n, \text{ starting from } j\}.$$
Then, by first-step analysis,
$$g_{i0}^{(n+1)} = \sum_{j\ne0} q_{ij}\, g_{j0}^{(n)},$$
and hence, multiplying by $y_i$ and using $y_i q_{ij} = y_j P_{ji}$,
$$y_i\, g_{i0}^{(n+1)} = \sum_{j\ne0} \big( y_j\, g_{j0}^{(n)} \big)\, P_{ji}.$$
Thus $y_i\, g_{i0}^{(n)}$ and $y_0\, G_{0,i}^{(n)}$ satisfy the same recurrence, with the same initial condition $y_i g_{i0}^{(1)} = y_i q_{i0} = y_0 P_{0i} = y_0 G_{0,i}^{(1)}$. Therefore $y_i g_{i0}^{(n)} = y_0 G_{0,i}^{(n)}$ for all $n$, and summing over $n$, using that $Q$ is recurrent (so $\sum_n g_{i0}^{(n)} = 1$),
$$y_i = y_0 \sum_{n\ge1} G_{0,i}^{(n)} = y_0\, \mu_i,$$
i.e., $\mu_i = y_i / y_0$: every invariant measure is proportional to $\mu$.
Remark: Noting that
$$\pi_j = \frac{\mu_j}{\sum_k \mu_k},$$
we see that $\pi_j$, when defined, is unique, since the multiplicative factors cancel out. We state this as a theorem.

Theorem 4.2. An irreducible M.C. is positive recurrent if and only if a stationary distribution exists. Moreover the stationary distribution is unique.
Proof: The first part follows from the previous proposition and the remark above. To rule out transience when a stationary distribution $\pi$ exists, note that for every $n$
$$\pi_i = \sum_j \pi_j\, P_{ji}^{(n)}.$$
If the chain were transient then $P_{ji}^{(n)} \to 0$ for all $j, i$, so for each finite truncation
$$\lim_{n\to\infty} \sum_{j=0}^K \pi_j\, P_{ji}^{(n)} = 0,$$
while the tail $\sum_{j>K} \pi_j P_{ji}^{(n)} \le \sum_{j>K} \pi_j$ can be made arbitrarily small; hence $\pi_i = 0$ for all $i$, which is a contradiction. So the chain is recurrent. On the other hand, if it is recurrent it possesses an invariant measure $\{\mu_i\}$ with $0 < \mu_i < \infty$ and $\sum_i \mu_i = E_0[T_0]$; if a stationary distribution exists then by uniqueness of the invariant measure (up to constants) $E_0[T_0] = \sum_i \mu_i < \infty$, so the chain is positive recurrent.
We can now show the following result, which shows the importance of the mean return time w.r.t. the stationary distribution.

Theorem 4.3. Let $\pi$ be the unique stationary distribution of a positive recurrent chain, and let $T_i$ be the return time to state $i$. Then
$$\pi_i = \frac{1}{E_i[T_i]}.$$

Proof: Since in the definition of $\mu_i$ we considered an arbitrary state 0 for which $\mu_0 = 1$, we know $\sum_i \mu_i < \infty$ and
$$\pi_i = \frac{\mu_i}{\sum_j \mu_j}.$$
Taking $i = 0$ we obtain
$$\pi_0 = \frac{\mu_0}{\sum_j \mu_j} = \frac{1}{E_0[T_0]}.$$
Since the state 0 was arbitrary, $\pi_i = 1/E_i[T_i]$ for every $i \in E$.
An interesting consequence: suppose the chain is stationary and let $\tau$ be the first return time to the (random) initial state. Suppose $|E| = N < \infty$. Then:
$$E[\tau] = \sum_i E[\tau \mid X_0 = i]\, \pi_i = \sum_i E_i[T_i]\, \pi_i = N,$$
since $E_i[T_i] = \frac{1}{\pi_i}$. Hence if the cardinality of $E$ is infinite then $E[\tau] = \infty$. Does this contradict positive recurrence? It does not, since $X_0$ can be any one of the states; all the statement says is that the M.C. on average cycles through all the states before returning to the state it started out in. If we condition on a particular state, the average return time is finite.
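The identity $\pi_i = 1/E_i[T_i]$ lends itself to a simulation check: estimate the mean return time to a state and compare it with the stationary probability. A sketch (chain values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# stationary distribution (left eigenvector of P for eigenvalue 1)
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

i, trials = 0, 50_000
lengths = []
for _ in range(trials):
    x, t = i, 0
    while True:                     # run until the first return to i
        x = rng.choice(3, p=P[x]); t += 1
        if x == i:
            break
    lengths.append(t)

print(np.mean(lengths), 1 / pi[i])  # E_i[T_i] ~= 1 / pi_i
```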
So far we have only discussed the positive recurrent case and the transient case. The null recurrent case corresponds to the case when
$$\sum_i \mu_i = \infty.$$
In this case it can be shown that $P_{ij}^{(n)} \to 0$ as $n\to\infty$ for all $i$ if $j$ is null recurrent. The direct proof of this is much more technical, and so we approach it differently.
An alternate approach to showing conditions for positive recurrence and null recurrence is as follows. Recall
$$P_{ij}^{(n)} = \sum_{k=1}^n f_{ij}^{(k)}\, P_{jj}^{(n-k)}.$$
Now
$$\lim_{n\to\infty} P_{ij}^{(n)} = \lim_{n\to\infty} \sum_{k=1}^n f_{ij}^{(k)}\, P_{jj}^{(n-k)},$$
so the limiting behavior of $P_{ij}^{(n)}$ is governed by that of $P_{jj}^{(n)}$, which satisfies a renewal equation. Let
$$u_0 = 1, \qquad \sum_{k=1}^\infty f_k = 1, \qquad f_0 = 0,$$
and
$$u_n = \sum_{k=1}^n f_k\, u_{n-k}.$$
Then
$$\lim_{n\to\infty} u_n = \frac{1}{\sum_{k=1}^\infty k f_k}.$$

Proof: Define the generating functions
$$U(z) = \sum_{n\ge0} u_n z^n, \qquad F(z) = \sum_{k\ge1} f_k z^k.$$
Then the recursion gives
$$U(z) - 1 = U(z)\, F(z), \qquad \text{i.e.,} \qquad U(z) = \frac{1}{1-F(z)}.$$
Hence (by an Abelian argument whose details we omit)
$$\lim_{n\to\infty} u_n = \lim_{z\uparrow1} (1-z)\, U(z) = \lim_{z\uparrow1} \frac{1-z}{1-F(z)} = \frac{1}{F'(1)} = \frac{1}{\sum_{k=1}^\infty k f_k},$$
where we used $F(1) = \sum_{k=1}^\infty f_k = 1$.

Taking
$$u_n = P_{jj}^{(n)}, \qquad f_n = f_{jj}^{(n)},$$
we obtain
$$\lim_{n\to\infty} P_{jj}^{(n)} = \frac{1}{\sum_{n\ge1} n\, f_{jj}^{(n)}} = \frac{1}{E_j[T_j]},$$
and so if $j$ is null recurrent, $E_j[T_j] = \infty$ and $\lim_n P_{jj}^{(n)} = 0$, whence $\lim_n P_{ij}^{(n)} = 0$ as well. On the other hand, if $j$ is positive recurrent then
$$E_j[T_j] < \infty$$
and
$$\lim_{n\to\infty} P_{ij}^{(n)} = f_{ij}\, \frac{1}{E_j[T_j]} = \pi_j$$
(for an irreducible recurrent chain $f_{ij} = 1$).
In the above result we have shown that if $j$ is recurrent then $\lim_{n\to\infty} P_{ij}^{(n)}$ exists: the limit is 0 if $j$ is null recurrent, and the limit is $\pi_j$ if $j$ is positive recurrent.
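The renewal recursion $u_n = \sum_k f_k u_{n-k}$ is easy to iterate numerically; with an (artificial, purely illustrative) first-return distribution $\{f_k\}$ one can watch $u_n$ approach $1/\sum_k k f_k$:

```python
import numpy as np

# an illustrative first-return distribution f_1, f_2, f_3 (sums to 1, aperiodic)
f = np.array([0.0, 0.2, 0.5, 0.3])      # f[0] = 0 by convention
mean_T = sum(k * f[k] for k in range(len(f)))

n_max = 60
u = np.zeros(n_max + 1); u[0] = 1.0
for n in range(1, n_max + 1):
    u[n] = sum(f[k] * u[n - k] for k in range(1, min(n, len(f) - 1) + 1))

print(u[n_max], 1 / mean_T)              # u_n -> 1 / E[T]
```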
Actually, we can show that if the chain is aperiodic and irreducible (i.e., one class of communicating states), then if $i \leftrightarrow j$ and $i$ is positive recurrent, $j$ is positive recurrent.

Let us show this. Suppose $i$ is positive recurrent and $j$ is not. Since $i$ and $j$ communicate, there exist $n, m \ge 1$ such that
$$P_{ij}^{(n)} > 0, \qquad P_{ji}^{(m)} > 0.$$
Now
$$P_{jj}^{(n+m+k)} \ge P_{ji}^{(m)}\, P_{ii}^{(k)}\, P_{ij}^{(n)}.$$
Hence
$$\lim_{k\to\infty} P_{jj}^{(n+m+k)} \ge P_{ji}^{(m)}\, \pi_i\, P_{ij}^{(n)} > 0,$$
which is a contradiction, since $j$ null recurrent or transient implies $P_{jj}^{(l)} \to 0$.

This establishes the class property of positive and null recurrent states.
So far we have concentrated on understanding how a M.C. behaves in the long term. We identified these properties as related to how the return times behave. A natural question is whether there is a simple way of determining whether a chain is ergodic.

Let us consider some simple examples.

Examples:
1. (Random Walk). This is a 1-dimensional process constructed as follows:
$$X_{n+1} = X_n + Z_n$$
where $\{Z_n\}$ is an i.i.d. sequence taking values in $\{-1, 1\}$ with $P(Z_n = 1) = p = 1 - P(Z_n = -1)$.

Clearly, since the chain can only return to 0 in an even number of steps, $P_{00}^{(2n+1)} = 0$ and
$$P_{00}^{(2n)} = \binom{2n}{n} p^n (1-p)^n.$$
Hence if $p = 0.5$ we see $\sum_n P_{00}^{(n)} = \infty$ (by Stirling's formula $P_{00}^{(2n)} \sim 1/\sqrt{\pi n}$), implying 0 is recurrent. With some further analysis it can be shown that the process is actually null recurrent.

Now if $p \ne 0.5$ and $q = 1-p$, it is easy to see that $4pq < 1$, and using Stirling's formula for large $n$,
$$P_{00}^{(2n)} \sim \frac{(4pq)^n}{\sqrt{\pi n}},$$
so $\sum_n P_{00}^{(n)} < \infty$ and all states are transient.
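Before moving to further examples, the recurrence/transience dichotomy can be seen numerically in the partial sums $\sum_n P_{00}^{(2n)}$, which keep growing for $p = 1/2$ but stabilize for $p \ne 1/2$. A sketch using the exact binomial formula (computed by a ratio recursion to avoid overflow):

```python
def partial_sum(p, n_terms):
    """Sum of P_00^(2n) = C(2n,n) (p(1-p))^n over n = 1..n_terms,
    using the ratio C(2n,n)/C(2n-2,n-1) = (2n)(2n-1)/n^2."""
    term, total = 1.0, 0.0
    for n in range(1, n_terms + 1):
        term *= (2 * n) * (2 * n - 1) / (n * n) * p * (1 - p)
        total += term
    return total

for p in (0.5, 0.6):
    print(p, partial_sum(p, 1000), partial_sum(p, 4000))
# p = 0.5: the partial sums keep growing (~ sqrt(n)): state 0 is recurrent.
# p = 0.6: they stabilize at a finite value: state 0 is transient.
```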
2. (Reflected random walk). Consider the same walk reflected at 0, so that $E = \mathbb{Z}_+$. It is easy to see that the period is 2 and that $f_{i0} = (q/p)^i < 1$ if $q < p$; hence all states are transient. On the other hand, if $p < q$ it is easy to see that $f_{i0} = 1$, implying 0 is recurrent; moreover it can be shown that solving $\pi = \pi P$ gives
$$\pi_j = C\Big(\frac{p}{q}\Big)^j, \qquad \frac{p}{q} < 1,$$
which is summable, so the chain is in fact positive recurrent.

3. (Success runs). Let $E = \mathbb{Z}_+$ and suppose that from state $j$ the chain moves to $j+1$ with probability $q_j$ and back to 0 with probability $p_j = 1 - q_j$; in particular $f_{00}^{(1)} = p_0$ and
$$f_{00}^{(n)} = p_{n-1} \prod_{j=0}^{n-2} q_j, \qquad n \ge 2,$$
since to return to 0 for the first time at time $n$ the chain must move up $n-1$ times and then drop back. Thus
$$P_0(T_0 < m) = 1 - U_m, \qquad U_m = \prod_{i=1}^{m-1} q_i.$$
Hence 0 is recurrent iff $\lim_{m\to\infty} \prod_j q_j = 0$, which holds iff $\sum_j p_j = \infty$. Consider the special case $p_j = p = 1 - q_j = 1 - q$ for all $j$. In this case we can see $E_0[T_0] < \infty$, establishing positive recurrence.
We now state the general ergodic theorem for M.C.s and provide a proof.

Theorem 4.4. Let $\{X_n\}$ be a homogeneous, irreducible and recurrent M.C. Let $\mu$ denote the invariant measure and let $f : E \to \mathbb{R}$ be such that $\sum_{i\in E} |f(i)|\, \mu_i < \infty$. Then:
$$\lim_{n\to\infty} \frac{1}{\nu(n)} \sum_{k=1}^n f(X_k) = \sum_{i\in E} f(i)\, \mu_i \tag{4.1}$$
where
$$\mu_i = E_0\Big[ \sum_{k=1}^{T_0} 1_{[X_k = i]} \Big]$$
and
$$\nu(n) = \sum_{k=1}^n 1_{[X_k = 0]}.$$
Proof: It is sufficient to show the proof for positive functions. We now exploit the regenerative property of the M.C. to prove this.

Let $\tau_p$ be the successive return times to 0. Define:
$$Y_p = \sum_{k=\tau_p+1}^{\tau_{p+1}} f(X_k).$$
Then from the strong Markov property the $\{Y_p\}$ are i.i.d. and
$$E[Y_p] = E\Big[ \sum_{k=\tau_p+1}^{\tau_{p+1}} f(X_k) \Big] = E_0\Big[ \sum_{k=1}^{\tau_1} f(X_k) \Big] = E_0\Big[ \sum_{k=1}^{\tau_1} \sum_{i\in E} f(i)\, 1_{[X_k=i]} \Big] = \sum_{i\in E} f(i)\, E_0\Big[ \sum_{k=1}^{\tau_1} 1_{[X_k=i]} \Big] = \sum_{i\in E} f(i)\, \mu_i,$$
where we have used the definition $\mu_i = E_0[\sum_{k=1}^{\tau_1} 1_{[X_k=i]}]$. Therefore, by the SLLN:
$$\lim_{n\to\infty} \frac1n \sum_{p=1}^n Y_p = E[Y_1] = \sum_{i\in E} f(i)\, \mu_i. \tag{4.2}$$
Now note that, since $f \ge 0$ and $\nu(n)$ is the number of visits to 0 up to time $n$,
$$\sum_{p=1}^{\nu(n)-1} Y_p \le \sum_{k=1}^n f(X_k) \le \sum_{p=1}^{\nu(n)} Y_p,$$
and $\nu(n) \to \infty$ a.s. by recurrence. Dividing by $\nu(n)$ and applying (4.2) along the subsequences $\nu(n)-1$ and $\nu(n)$ gives
$$\lim_{n\to\infty} \frac{1}{\nu(n)} \sum_{k=1}^n f(X_k) = \sum_{i\in E} f(i)\, \mu_i.$$
In the positive recurrent case, taking $f \equiv 1$ shows $n/\nu(n) \to E_0[T_0] = \sum_i \mu_i$, and hence
$$\lim_{n\to\infty} \frac1n \sum_{k=1}^n f(X_k) = \frac{\sum_{i\in E} f(i)\, \mu_i}{\sum_{i\in E} \mu_i} = \sum_{i\in E} f(i)\, \pi_i = E_\pi[f(X_0)].$$
We now give a simple sufficient condition for positive recurrence: the Foster-Lyapunov drift criterion.

First a preliminary observation. Let $F \subset E$, let $\tau_F = \inf\{n \ge 1 : X_n \in F\}$ and $m(i) = E_i[\tau_F]$. By first-step analysis,
$$m(i) = 1 + \sum_{j\notin F} P_{ij}\, m(j), \qquad i \notin F.$$

Theorem (Foster-Lyapunov criterion). Suppose there exist a function $h : E \to \mathbb{R}_+$, a finite set $F \subset E$ and $\varepsilon > 0$ such that
$$\sum_{j\in E} P_{ij}\, h(j) \le h(i) - \varepsilon, \qquad i \notin F, \tag{4.3}$$
which is equivalent to
$$E[h(X_{n+1}) - h(X_n) \mid X_n = j] \le -\varepsilon < 0,$$
or the conditional drift in each state $j \notin F$ is negative. Then the chain is positive recurrent (ergodic).

Proof: Let $\tau_F = \inf\{n \ge 1 : X_n \in F\}$ and define $Y_n = h(X_n)\, 1_{[n < \tau_F]}$. Note $\tau_F$ is a stopping time. Let $i \notin F$ and let $E_i[\cdot]$ denote $E[\cdot \mid X_0 = i]$. Then:
$$E_i[Y_{n+1} \mid X_0, X_1, \ldots, X_n] = E_i[Y_{n+1}\, 1_{[n < \tau_F]} \mid X_0, \ldots, X_n] + E_i[Y_{n+1}\, 1_{[\tau_F \le n]} \mid X_0, \ldots, X_n]$$
$$= E_i[Y_{n+1}\, 1_{[n<\tau_F]} \mid X_0, \ldots, X_n] \le E_i[h(X_{n+1})\, 1_{[n<\tau_F]} \mid X_0, \ldots, X_n]$$
$$= 1_{[n<\tau_F]}\, E_i[h(X_{n+1}) \mid X_n] \le 1_{[n<\tau_F]}\, \big( h(X_n) - \varepsilon \big),$$
where we used the fact that $1_{[n<\tau_F]}$ is completely determined given $X_0, \ldots, X_n$, and that if $n < \tau_F$ then $X_n \notin F$. So, taking expectations w.r.t. $E_i$ once more,
$$0 \le E_i[Y_{n+1}] \le E_i[Y_n] - \varepsilon\, P_i(\tau_F > n),$$
and iterating,
$$0 \le E_i[Y_{n+1}] \le E_i[Y_0] - \varepsilon \sum_{k=0}^n P_i(\tau_F > k).$$
But we know $\sum_{k=0}^\infty P_i(\tau_F > k) = E_i[\tau_F]$, and $E_i[Y_0] = h(i)$; hence
$$E_i[\tau_F] \le \frac{h(i)}{\varepsilon} < \infty.$$
On the other hand, using the preliminary observation, we have for $j \in F$:
$$E_j[\tau_F] \le 1 + \sum_{i\notin F} P_{ji}\, E_i[\tau_F],$$
and hence
$$E_j[\tau_F] \le 1 + \frac{1}{\varepsilon} \sum_{i\notin F} P_{ji}\, h(i) < \infty. \tag{4.4}$$
Since $F$ is finite, the return times to $F$ have uniformly bounded means, and it follows that the chain is positive recurrent.
Corollary (Pakes' Lemma). Let $\{X_n\}$ be an irreducible, aperiodic M.C. on $E = \mathbb{Z}_+$ and let $r_i = E[X_{n+1} - X_n \mid X_n = i]$ denote the mean drift in state $i$. Suppose:

i) $\sup_{i\in E} |r_i| < \infty$;

ii) there exists $i_0 < \infty$ such that for all $i \ge i_0$, $r_i \le -\varepsilon$ for some $\varepsilon > 0$.

Then the chain is ergodic.

Proof: This just follows from above by taking $h(X_n) = X_n$ and $F = \{i \in \mathbb{Z}_+ : i \le i_0 - 1\}$. Then all conditions of the Foster-Lyapunov theorem are satisfied.
We conclude our discussion by showing how these results apply to a canonical example that represents a discrete-time queue.

Example: Let
$$X_{n+1} = (X_n - 1)^+ + \xi_{n+1}$$
where $\{\xi_{n+1}\}$ is an i.i.d. sequence with $0 < E[\xi_{n+1}] < 1$. Then applying Pakes' lemma we see that for all $i \ge 1$:
$$E[X_{n+1} - X_n \mid X_n = i] = -1 + E[\xi_{n+1}] < 0,$$
implying that the chain is ergodic.
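A quick simulation of this queue illustrates the stability guaranteed by Pakes' lemma; here the arrivals $\xi_n$ are taken Bernoulli with mean $\rho < 1$, an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8                               # E[xi] < 1, so the drift condition holds
n_steps = 200_000

x, visits = 0, {}
for _ in range(n_steps):
    xi = rng.binomial(1, rho)           # i.i.d. arrivals with mean rho
    x = max(x - 1, 0) + xi              # X_{n+1} = (X_n - 1)^+ + xi_{n+1}
    visits[x] = visits.get(x, 0) + 1

# the empirical distribution concentrates near 0: the chain is stable
for state in sorted(visits)[:5]:
    print(state, visits[state] / n_steps)
```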
In the next section we study the convergence to the stationary regime a bit further, as in the finite state case.
4.2 Convergence Rates and Coupling
When $|E| < \infty$ we actually showed that $P_{ij}^{(n)} \to \pi_j$ as $n\to\infty$, independent of $i$, and that the convergence rate was geometric. This convergence is actually related to the notion of stability. We discuss this issue in detail now. Specifically:

How does $P_{ij}^{(n)} \to \pi_j$? In the finite case we have seen the convergence is geometric. Under what conditions is this true for infinite chains?

We can show something stronger: $X_n$ converges to a stationary process in a finite but random time. This is called the coupling time. The ramification of this is that when we try to simulate a stationary M.C. we need not wait for an infinite time for the chain to become stationary; we can observe certain events, and once they occur we can conclude that after that time the chain is stationary.
But first let us recall the result we showed for finite state Markov chains. Let $P$ be an $N \times N$ transition matrix and let
$$\min_{i,j} P_{ij} = \varepsilon > 0.$$
Let
$$\pi_i^{(n)} = P\{X_n = i\}$$
and let $\pi$ be the stationary distribution, i.e.,
$$\pi_i = \sum_j \pi_j P_{ji}.$$
Define
$$\|\pi^{(n)} - \pi\| = \frac12 \sum_i \big| \pi_i^{(n)} - \pi_i \big|.$$
This is called the total variation metric, and convergence under it is called total variation convergence. The factor $\frac12$ is just to normalize the metric.
Now in the proof of Theorem 2.1 we saw that
$$m_j^{(n)} \le \pi_j \le M_j^{(n)}, \qquad m_j^{(n)} \le \pi_j^{(n)} = \sum_i \pi_i^{(0)} P_{ij}^{(n)} \le M_j^{(n)}.$$
Hence
$$\big| \pi_j^{(n)} - \pi_j \big| \le \big| M_j^{(n)} - m_j^{(n)} \big| \le (1-\varepsilon)^n,$$
and summing over $j$,
$$\sum_{j\in E} \big| \pi_j^{(n)} - \pi_j \big| \le \sum_{j\in E} \big| M_j^{(n)} - m_j^{(n)} \big| \le 2(1-\varepsilon)^n.$$
Hence
$$\|\pi^{(n)} - \pi\| \le 2(1-\varepsilon)^n.$$
This convergence is geometric.
Indeed, $(1-\varepsilon)^n$ is related to the tail distribution of a fundamental quantity associated with the convergence: a so-called coupling time, which we now discuss. Coupling is a powerful technique which can be used to establish existence of stationary distributions, rates of convergence, etc.

The basic approach is the following: suppose $\{X_n^1\}$ and $\{X_n^2\}$ are two independent homogeneous, irreducible and aperiodic M.C.s with the same transition matrix $P$. Define $Z_n = (X_n^1, X_n^2)$ on $E \times E$. Then $Z_n$ is a M.C. with transition probability matrix
$$\tilde P_{ij,kl} = P\{Z_{n+1} = (k,l) \mid Z_n = (i,j)\} = P_{ik}\, P_{jl}.$$
Suppose the chain $\{Z_n\}$ is positive recurrent. Then there exists a finite random time $\tau$ such that, starting from any states $i$ and $j$, the chain reaches a diagonal state where the two coordinates are equal, i.e.,
$$X_\tau^1 = X_\tau^2.$$
Define
$$X_n = \begin{cases} X_n^1 & n \le \tau \\ X_n^2 & n > \tau. \end{cases}$$
Then we can show the following.

Proposition: $\{X_n\}$ is a positive recurrent M.C. with the transition probability matrix $P$ defined above.

Proof: This follows directly from the strong Markov property.

Let us formally define coupling.
Definition 4.3. Two stochastic processes $\{X_n^1\}$ and $\{X_n^2\}$ with values in $E$ are said to couple if there exists a random time $\tau(\omega) < \infty$ a.s. such that
$$X_n^1 = X_n^2 \quad \text{for all } n \ge \tau.$$
Lemma 4.2. (The coupling inequality)
Let $\{X_n^1\}$ and $\{X_n^2\}$ be two processes defined on $(\Omega, \mathcal{F}, P)$ and let $\tau$ be a coupling time. Then for any $A \subset E$ we have:
$$\big| P(X_n^1 \in A) - P(X_n^2 \in A) \big| \le P(\tau > n). \tag{4.5}$$

Proof:
$$P(X_n^1 \in A) = P(X_n^1 \in A, \tau \le n) + P(X_n^1 \in A, \tau > n) = P(X_n^2 \in A, \tau \le n) + P(X_n^1 \in A, \tau > n) \le P(X_n^2 \in A) + P(\tau > n),$$
and the symmetric argument with the roles of $X^1$ and $X^2$ interchanged gives the result.
Theorem. Let $\{X_n\}$ be an irreducible, aperiodic, positive recurrent M.C. Then
$$\lim_{n\to\infty} P_{ij}^{(n)} = \pi_j$$
for all $i, j \in E$.

Proof: Construct two independent M.C.s on $E \times E$ with transition probability $\tilde P$ as above. Let $\tau$ be a coupling time, i.e., the first time the chains meet: $X_\tau^1 = X_\tau^2$. Let $X_0^1 = i$ and let $X_n^2$ have initial distribution $\pi$, so that $P\{X_n^2 = j\} = \pi_j$ for all $n$. Using the coupling inequality,
$$\sum_j \big| P\{X_n^1 = j\} - \pi_j \big| \le 2\, P(\tau > n).$$
Therefore, since $\tau < \infty$ a.s., $P(\tau > n) \to 0$ as $n\to\infty$. So
$$\big| P_{ij}^{(n)} - \pi_j \big| \to 0,$$
and from this we see $\sum_{j\in E} |P_{ij}^{(n)} - \pi_j| \to 0$ as $n\to\infty$.

In fact, after $\tau$ the chain can be considered to have converged to the stationary distribution.
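The coupling construction can be watched in simulation: run two independent copies of the chain from different initial states until they first meet; the meeting time $\tau$ is finite with light tails, matching the geometric convergence. A sketch (chain values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

def coupling_time(x, y):
    """Steps until two independent copies started at x, y first coincide."""
    t = 0
    while x != y:
        x = rng.choice(3, p=P[x])   # the two chains move independently...
        y = rng.choice(3, p=P[y])   # ...with the same transition matrix P
        t += 1
    return t

taus = [coupling_time(0, 2) for _ in range(20_000)]
print(np.mean(taus), np.quantile(taus, 0.99))  # tau is finite with light tails
```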
Remark 4.4. The aperiodicity and irreducibility assumptions are important. Otherwise it is very easy to construct periodic chains that never meet at a diagonal state, especially if they start out in different cyclic subclasses. The periodic case can be treated by considering the transition probability matrices $P^d$.
How do we get convergence rates from these results?

Lemma 4.3. Suppose $E[\phi(\tau)] < \infty$ for a non-decreasing function $\phi(\cdot)$. Then
$$\sum_j \big| P_{ij}^{(n)} - \pi_j \big| = o\Big( \frac{1}{\phi(n)} \Big).$$

Proof: Since $\phi(\cdot)$ is non-decreasing,
$$\phi(\tau)\, 1_{\{\tau > n\}} \ge \phi(n)\, 1_{\{\tau > n\}}.$$
So
$$\phi(n)\, P(\tau > n) \le E\big[ \phi(\tau)\, 1_{\{\tau > n\}} \big].$$
Now since $E[\phi(\tau)] < \infty$, $E[\phi(\tau) 1_{\{\tau > n\}}] \to 0$ as $n\to\infty$; hence $\phi(n) P(\tau > n) \to 0$, i.e., $P(\tau > n) = o(1/\phi(n))$, and the result follows from the coupling inequality.
Of course, depending on the M.C., we need to establish that $E[\phi(\tau)] < \infty$. When $|E| < \infty$ it is easy to show the following:

Lemma 4.4. Let $\{X_n\}$ be a finite state M.C. on $(E, P)$. Then there exists $\alpha > 0$ s.t.
$$E[e^{\alpha\tau}] < \infty.$$

Proof: The proof follows from taking $\phi(\tau) = e^{\alpha\tau}$; since the M.C. is finite, the hitting time of the diagonal can be shown to have a geometric tail distribution. Hence convergence of the distribution to the steady state is geometric.
With this we conclude our study of discrete-time Markov chains. In the next part we will study
continuous-time Markov chains where these results will play an important part.