Dynamic programming
1 8/16/2017
Introduction
In most practical problems, decisions have to be made
sequentially at different points in time, at different
points in space, and at different levels, say, for a system
and/or a subsystem.
The problems in which the decisions are to be made
sequentially are called sequential decision problems.
Since these decisions are to be made at a number of
stages, they are also referred to as multistage decision
problems.
2 8/16/2017
As applied to dynamic programming, a multistage
decision process is one in which a number of single-stage
processes are connected in series so that the output of one
stage is the input of the succeeding stage.
Strictly speaking, this type of process should be called a
serial multistage decision process since the individual
stages are connected head to tail with no recycle.
3 8/16/2017
Example of A serial Multistage decision problem
The stage numbers may be assigned in increasing order
either in the forward direction or in the backward direction.
4 8/16/2017
Multi-stage decision making
5
The objective of a multistage decision problem
is to find x1, x2, x3, ,…., xn to maximize a
function of return ( the sum of returns over all
stages) while satisfying the stage
transformation equation.
Note: A multistage problem may consist of more
than one state variables.
8/16/2017
Dynamic programming (DP) is a useful mathematical
technique ideally suited for the optimization of sequential
decision problems (multistage decision problems).
Dynamic programming (DP) provides a systematic
procedure for determining the optimal combination of
sequential (multi-stage) decisions.
Example: Reservoir operation problem: release decisions have to be
made sequentially across time periods based on the inflow
and the available storage in each period.
This technique was developed by Richard Bellman in the
early 1950s.
6 8/16/2017
Dynamic programming is an approach that divides
the original optimization problem into a set of
smaller optimization problems, each of which needs
to be solved before the overall optimum solution to
the original problem can be identified.
DP transforms a complicated n-variable decision
process into a series of n-stages with a single
decision at each stage i.e. the dynamic
programming technique decomposes a multistage
decision problem as a sequence of single-stage
decision problems.
In contrast to linear programming, DP problems
are not amenable to a single, standard algebraic
7 formulation 8/16/2017
The optimization problem is transformed as a sequence
of n decision processes or stages
8 8/16/2017
Example of A single stage decision problem
9 8/16/2017
Example
For instance, take the case of maximizing a monthly
(variable) yield for a reservoir of known capacity.
Assume further that there are n-months of historic (or
synthetic) flow records.
Typically this is an optimization problem in which
the n-months’ releases are to be optimized.
In a LP formulation, the optimization algorithm
attempts to maximize the releases by formulating
an appropriate objective function that involves the
releases of all the months.
10 8/16/2017
DP, on the other hand, approaches the problem from a
different angle.
In DP the optimization problem is transformed as a
sequence of n decision processes, or stages, at which the
single monthly release, say at month i, is “optimally”
selected to maximize the total release over the entire
length of record.
11 8/16/2017
Thus, decision is made at each stage; in the present
example, at each month.
This sequential decision process can be shown as in the
following figure:
r1 r2 ri rN-1 rN
S3 Si+1
S1 S2 Si
1 2 i N-1 N
d1 d2 di dN-1 dN
12 8/16/2017
Terminologies
Stages: are the points, in time or space, where decision
are made.
Decision Variables (X) : are courses of action to be
taken for each stage. e.g. the release of a reservoir at
each stage (i.e. month).
State variables (Si): are variables that describe the
state of the system at any stage i e.g., the amount of
water available in the reservoir at the end of each stage.
13 8/16/2017
Terminologies
Stage return (R): is a scalar measure of the
effectiveness of the decision making at each stage. It is a
function of (S, X).
Stage transformation or state transition (T): is a
single-valued transformation which relates the input, the
output state, and the decision.
14 8/16/2017
Types of Multistage Decision Problems
1. Initial value problem. If the value of the initial state
variable, sn+1, is prescribed, the problem is called an
initial value problem.
2. Final value problem. If the value of the final state
variable, s1 is prescribed, the problem is called a final
value problem. Notice that a final value problem can be
transformed into an initial value problem by reversing
the directions of si ,i = 1, 2, . . . , n + 1.
3. Boundary value problem. If the values of both the input
and output variables are specified, the problem is
called a boundary value problem.
8/16/2017 15
Types of multistage problems: (a) initial value problem;
(b) final value problem; (c) boundary value problem.
16 8/16/2017
DP solution is based on Bellman’s principle
of optimality
Bellman’s principle of optimality state that
Given the current state of a system, the optimal policy
(Sequence of decisions) for the remaining stages is independent
of the policy adopted in the previous stages.
The principle implies that, given the state Si of the system at a
stage i, one must proceed optimally till the last stage,
irrespective of how one arrived at the state Si.
The solution begins by finding the optimal decision for each
possible state using the backward recursion or the Forward
recursion.
17 8/16/2017
Bellman’s principle of optimality
8/16/2017 18
Recursive nature of computations in DP
Computations in DP are done recursively, so that the optimum
solution of one sub problem is used as an input to the next
sub problem.
By the time the last sub problem is solved, the optimum
solution for the entire problem is at hand.
The manner in which the recursive computations are
carried out depends on how we decompose the original
problem i.e. on the type of DP problem.
The recursive computations can be done both in forward and
backward directions. Both recursions yield same solutions.
8/16/2017 19
Characteristics of a DP problem
A single n-variable problem is divided into n number
of single variable problems.
A problem is divided into stages, with a policy
decision required at each stage.
Each stage has a number of possible states associated
with it.
The policy decision transforms the current state into a
state associated with the next stage.
A recursive relationship identifies the optimal
decision at a given stage for a specific state, given the
optimal decision for each state at the previous stage.
A solution moves backward or forward, stage by stage
20 till optimal decision for the last stage is found. 8/16/2017
21 8/16/2017
22 8/16/2017
Example 1. Shortest route problem
Suppose that you want to select the shortest highway route
between two cities. The network in the Figure below shows
the possible routes between the starting city at node 1 and
the destination city at node 7. The routes pass through
intermediate cities designated by nodes 2 to 6.
8/16/2017 23
Solution
1. We can solve this problem by exhaustively
enumerating all the routes between node 1 and 7 (there
are five such routes). But in a large network exhaustive
enumeration may be difficult computationally. Thus,
another method of solution is required. DP is the most
suitable technique for such type of problems.
2. To solve the problem by DP, we first decompose it into
stages as delineated by the vertical dashed lines in the
next Figure. Next, we carry out the computations for
each stage separately.
8/16/2017 24
Decomposition of shortest route problem into stages
8/16/2017 25
Solution using DP : Points that should be noted
1. The problem is decomposed into three stages
2. At each stage a decision is made.
3. The solution to the problem is found recursively. there
are two ways – beginning at the left most node and
moving from left to right, called the forward-moving
(but backward-looking) algorithm, or beginning at the
most right column of nodes or states and moving from
right to left, called the backward-moving (but
forward-looking) algorithm.
First we will see the forward recursion and then we will
see the backward recursion.
Go to solution
8/16/2017 26
Example 2. Water Allocation problem
A total of 6 unit of water is to be allocated optimally to
three users. The allocation is made in discrete steps of one
unit ranging from 0 to 6. With the three users denoted as
user 1, User 2, and User 3 respectively, the returns obtained
from the users for a given allocation are given in the table
shown in the next slide.
The problem is to find allocations to the three users such
that the total return is maximized.
8/16/2017 27
28 8/16/2017
Go to solution
Exercises
1.
8/16/2017 29
2. A minimum-cost pipeline is to be laid
between points (towns) A and E. The
pipeline is required to pass through one
node out of B1, B2, and B3, one out of
C1, C2, and C3, and one out of D1, D2,
and D3 (see Fig. below). The costs
associated with the various segments of
the pipeline are given below:
8/16/2017 30
8/16/2017 31
8/16/2017 32