Quick Start: Resolving a Markov Decision Process Problem Using the MDPtoolbox in Matlab
January 2014
1 MDP framework
(From Wikipedia, the free encyclopedia with minor changes)
The probability that the process moves into its new state s' is influenced by the chosen action. Specifically, it is given by the state transition function P(s, s', a). Thus, the next state s' depends on the current state s and the decision maker's action a. But given s and a, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP possess the Markov property.
Definition
In its typical definition, a Markov decision process is a 4-tuple < S, A, P, R >,
where:
- S is a finite set of states,
- A is a finite set of actions,
- P(s, s', a) is the probability that choosing action a in state s leads to state s' at the next time step,
- R(s, s', a) is the immediate reward received after transitioning from state s to state s' under action a.
The core problem of MDPs is to find a "policy" for the decision maker: a function π that specifies the action π(s) that the decision maker will choose when in state s. The goal is to choose a policy π that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon:
∑_{t=0}^{∞} γ^t R(s_t, s_{t+1}, π(s_t))
Algorithms
MDPs can be solved by linear programming or dynamic programming. In what follows we present the latter approach. The standard family of algorithms to calculate this optimal policy requires storage for two arrays indexed by state: value V, which contains real values, and policy π, which contains actions. At the end of the algorithm, π will contain the solution and V(s) will contain the discounted sum of the rewards to be earned (on average) by following that solution from state s.
The algorithm has the following two kinds of steps, which are repeated in some order for all the states until no further changes take place. They are defined recursively as follows:

π(s) := argmax_a { ∑_{s'} P(s, s', a) ( R(s, s', a) + γ V(s') ) }

V(s) := ∑_{s'} P(s, s', π(s)) ( R(s, s', π(s)) + γ V(s') )
Their order depends on the variant of the algorithm; one can also do them
for all states at once or state by state, and more often to some states than
others. As long as no state is permanently excluded from either of the steps,
the algorithm will eventually arrive at the correct solution.
Notable variants: Value iteration, Policy iteration, Modified policy iteration.
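As an illustration of these two steps, here is a minimal value iteration sketch in Matlab. It is not the toolbox implementation: the function name simple_value_iteration, the S×S×A shape assumed for P and R, and the stopping threshold epsilon are choices made only for this example.

function [V, policy] = simple_value_iteration(P, R, gamma, epsilon)
% P(s,s',a): transition probabilities, R(s,s',a): rewards,
% gamma: discount factor, epsilon: stopping threshold.
S = size(P,1); A = size(P,3);
V = zeros(S,1); policy = ones(S,1);
delta = Inf;
while delta > epsilon
    Vold = V;
    Q = zeros(S,A);
    for a = 1:A
        % Q(s,a) = sum over s' of P(s,s',a) * ( R(s,s',a) + gamma*V(s') )
        Q(:,a) = sum( P(:,:,a) .* ( R(:,:,a) + gamma*repmat(Vold',S,1) ), 2 );
    end
    [V, policy] = max(Q, [], 2);   % value update and greedy policy improvement
    delta = max(abs(V - Vold));
end
end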
2 MDPtoolbox
The MDPtoolbox (http://www7.inra.fr/mia/T/MDPtoolbox) provides functions for solving discrete-time Markov Decision Processes: value iteration, policy iteration and linear programming algorithms, with several variants.
It is currently available for several environments: MATLAB, GNU Octave, Scilab and R.
To use the toolbox, start Matlab and add the MDPtoolbox directory to the search path. For example, go to the MDPtoolbox directory, start Matlab and execute:
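A minimal possibility, assuming Matlab was started from within the MDPtoolbox directory (so that pwd is the toolbox directory), is:

>> addpath(pwd)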
In this Matlab session, it is then possible to use all the MDPtoolbox functions. To access the HTML documentation, open the local file MDPtoolbox/documentation/DOCUMENTATION.html with a browser.
After a wildfire, the stand returns to the youngest age-class (state 1). Let p = 0.1 be the probability of wildfire occurrence during a time period. The problem is how to manage this stand over the long term so as to maximize the γ-discounted reward, with γ = 0.95.
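The transition matrix P and reward matrix R are assumed below to have already been built. One possible way to obtain matrices of this kind is the toolbox generator mdp_example_forest; in the call below, the 3 states, the reward of 4 for conserving the oldest age-class and the wildfire probability 0.1 follow the values used in this example, while the cutting reward of 2 is an extra assumption.

>> [P, R] = mdp_example_forest(3, 4, 2, 0.1);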
>> mdp_check(P, R)
ans =
''
When the output is empty, no error was detected.
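One possible way to obtain the policy and value function discussed below is a sketch like the following, assuming the P and R built above and the discount factor γ = 0.95; mdp_policy_iteration and mdp_LP are two of the toolbox solvers.

>> [V, policy] = mdp_policy_iteration(P, R, 0.95)
>> [V, policy] = mdp_LP(P, R, 0.95)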
The optimal policy found is to choose action Wait (1) in the 3 defined states, that is, in plainer terms, 'never cut'.
Note that the mdp_LP function provides the exact expected value function. For instance, V(2) = 61.9020 is the expected value when starting in state 2 (age-class 21-40 years for trees). For the other algorithms, some functions are available in the toolbox to better apprehend V. Let us call them. Note that mdp_eval_policy_matrix also provides the exact expected value function.
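A possible sketch of such calls, assuming the P, R, discount factor 0.95 and policy obtained above, is:

>> Vpolicy = mdp_eval_policy_matrix(P, R, 0.95, policy)
>> Vpolicy = mdp_eval_policy_iterative(P, R, 0.95, policy)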
function [mu, is_OK_mu] = get_stationary_distribution( p )
% Computes the stationary distribution mu of a Markov chain
% described by p (stochastic matrix, i.e. sum(p,2)=1).
% Input
%   p : transition matrix associated with a policy, p(s,s')
% Output
%   mu : stationary distribution over the states ( mu*p=mu, sum(mu)=1 )
%   is_OK_mu : false if p is not a stochastic matrix, else true
s=size(p,1);
mu=zeros(1,s);
is_OK_mu=false;
if any(abs(sum(p,2)-1)>10^-4) || (size(p,2)~=s)
    disp('ERROR in get_stationary_distribution: argument p must be a stochastic matrix')
else
    % mu satisfies transpose(p)*mu'=mu' and mu sums to one
    A=transpose(p)-eye(s);
    A(s,:)=ones(1,s);   % replace the last equation by the normalization sum(mu)=1
    b=zeros(s,1);
    b(s)=1;
    mu=transpose(A\b);
    is_OK_mu = ~isempty(mu) && all(mu>-10^-4) && (abs(sum(mu)-1)<10^-4);
    if ~is_OK_mu; mu=[]; end
end
Then call this function and plot (Figure 1) the stationary distribution in
age-classes.
>> mu = get_stationary_distribution( mdp_computePpolicyPRpolicy(P, R, policy) )
mu =
0.1000 0.0900 0.8100
>> bar(mu,0.4); ylim([0 1]);
>> xlabel('age-class'); ylabel('percentage of time in age-class');
Beware that when copying the Matlab code from this document, the character ’ must be replaced by '.
First, what happens with a lower incentive to conserve the oldest age-class (reward 0.4 instead of 4)?
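Assuming the matrices were generated with mdp_example_forest as sketched earlier, the modified problem can be set up and solved again with:

>> [P, R] = mdp_example_forest(3, 0.4, 2, 0.1);
>> [V, policy] = mdp_LP(P, R, 0.95)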
The policy found now asks to cut the oldest age-class.
Let us compute the state stationary distribution and plot it (Figure 2).
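Following the same pattern as for Figure 1, a sketch is:

>> mu = get_stationary_distribution( mdp_computePpolicyPRpolicy(P, R, policy) )
>> bar(mu,0.4); ylim([0 1]);
>> xlabel('age-class'); ylabel('percentage of time in age-class');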
Beware that when copying the Matlab code from this document, the character ’ must be replaced by '.
As expected, the policy found changes and now requires cutting at the second age-class, with lower expected values.
Let us compute the state stationary distribution and plot it (Figure 3).
>> mu = get_stationary_distribution( mdp_computePpolicyPRpolicy(P, R, policy) )
mu =
    0.8333    0.1667         0
>> bar(mu,0.4); ylim([0 1]);
>> xlabel('age-class'); ylabel('percentage of time in age-class');
Beware that when copying the Matlab code from this document, the character ’ must be replaced by '.