Markov Decision Processes For Optimizing Human Workflows
Service Science 2(4), pp. 245-269, © 2010 SSG
Enrique D. Espinosa
University of Waterloo
295 Hagey Blvd., Suite 240, Waterloo, Ontario, Canada, N2L 6R5
eespinos@engmail.uwaterloo.ca
Juan Frausto
Instituto Tecnológico y de Estudios Superiores de Monterrey
Campus Cuernavaca
Autopista del Sol km. 104, Colonia Real del Puente, Xochitepec, Morelos, México, 62790
juan.frausto@itesm.mx
Ernesto J. Rivera
Instituto Tecnológico y de Estudios Superiores de Monterrey
Campus Cuernavaca
Autopista del Sol km. 104, Colonia Real del Puente, Xochitepec, Morelos, México, 62790
a01126754@itesm.mx
Workflows are used by domain analysts as a tool to describe the synchronization of activities in a business domain. The Business Process Modeling Notation (BPMN) has become a standard for characterizing workflows. Nevertheless, BPMN alone does not provide tools for aligning business models and IT architectures. Currently, there is no method that supports decision making on a probabilistic basis for providing financial value to a [human] workflow. Such a method would help the domain analyst understand which sections of the business and IT architecture could be re-engineered to add value. Markov Decision Processes (MDPs) can be the centerpiece of such a method. MDPs are introduced as a means to pinpoint assets to be designed, managed, and continuously improved, while enhancing business agility and operational performance.
Key words: workflow; human workflow; optimization; business process; BPMN; Markov; decision making
History: Received Feb. 5, 2010; Received in revised form Apr. 26, 2010; Accepted Apr. 27, 2010; Online
first publication Sept. 07, 2010
1. Introduction
A workflow can be defined as a collection of ordered activities that accomplish some business process (Georgakopoulos et al 1995). It is a tool to integrate the supply chain, and it can act as a bridge between two or more organizations (Cardoso et al 2004). It involves "the movement of documents and tasks through an automated work process in order to achieve certain goals" (Tsai et al 2007). Workflows help to model the behavior of diverse processes, characterized as sequences of activities. Therefore, they can be seen as decision-making problem solvers, in which material, human, and machine resources should be optimized.
Workflows were used to model supply chains throughout the 20th century. However, the formal name "workflow" was coined in 1985 at FileNet, a provider of Enterprise Content Management and Business Process Management solutions. Domain analysts model business units using a process-oriented approach, which can be expressed by a ubiquitous description language; the Business Process Modeling Notation (BPMN) (OMG 2008, White 2004, Ouyang et al 2006) has become a common representational device for this purpose.
Nevertheless, the alignment between business objectives and IT architectures is still hard to achieve. BPMN per se cannot model goals, objectives, strategies, or tactics (common assets in business models). Added value is desired in any enterprise architecture, but it is typically hard for IT staff to devise. Structured methods for finding value, such as balanced scorecards, are still not widely used (Malinverno and Hill 2007). In any BPM project, management needs a clear idea of what performance element will improve as a result of the BPM effort (Rosser 2007). For this to happen, strategies must be filtered down into goals for marketing, development, R&D, operations, the IT organization, and the IT infrastructure. Most organizations have already detected these operational inefficiencies and are becoming aware of process modeling as a way to reach a certain level of maturity in BPM (Melenovsky 2006).
It would help if humans were part of the assessment, that is, if human workflows appeared in the overall design. Assessing the impact of human behavior on the overall business model with a probabilistic method could provide a financial benefit, since it would use statistical data to support decision making.
In order to optimize business processes, several efforts have been made in different areas: Business Process Management (BPM), Business Intelligence (BI), and Business Rules (BR) have all approached the desired optimization. Nevertheless, Forrester Research strongly suggests combining the adequate foundations from these three areas in order to find a definitive solution to this challenge (Evelson et al 2008).
Markov Decision Processes (MDPs) are commonly used in BI as a tool for decision support and have been widely used to model systems that require decision making. For example, they are used in queueing theory for evaluating performance and for improving system operation while optimizing its performance (Hillier and Lieberman 2004).
In this paper, we propose modeling business processes using BPMN workflows and MDPs. The goal is to provide mathematical support to decisions in which human behavior plays a role. These decisions are often made in an empirical fashion. In contrast, we introduce a method for computing the maximum net gain obtained from the evaluation of employees' performance. Based on these calculations, several workflows can be evaluated until the optimum is obtained; therefore, time and money can be saved by applying this model.
Section two reviews relevant work in this field of study. Section three presents the proposal for modeling workflows with MDPs. Section four shows the results of some preliminary implementations comparing the efficiency of workflows analyzed with MDPs. Section five introduces a second scenario in order to test the method with different data. Finally, section six gives an overview of future work, and section seven presents the conclusions.
2. Related Work
Several research teams have aimed at improving workflows through MDPs.
Doshi et al. (2004) present an approach to a model for launching Web Services based on policies. Through this work, the researchers try to build an "abstract workflow," avoiding implementation details. The authors associate a quality index with each workflow in order to select the best one. This approach has a specific objective: they select a Web Service through the use of MDPs, in order to make the workflow orchestration faster and more efficient (Estublier and Anlaville 2005, Shing 2006, Digiampietri et al 2005). An orchestration is defined as "an executable business process from the perspective and under the control of one single end-point" (Shing 2006). This end-point coordinates Web Services in order to define the business process (Digiampietri et al 2005).
Magnani and Montesi (2007) report that they can associate a "cost" with the elements of the BPMN notation, which allows them to apply Markov Decision Processes in order to obtain a policy based on the total minimum computed cost of executing a business process. From this angle, they implement a mechanism for choosing the workflow that implies the minimum computational cost. This process can be linked with the general mechanism for the execution of the workflow.
Van der Aalst and Van Hee (2002) have also applied Markov techniques to establish the chances of a case taking a particular route through a process.
The models developed by the aforementioned authors try to evaluate the best workflow in terms of computational effort (or cost) and efficiency, and they help to choose the "fastest" workflow. In order to accomplish this objective, they [generally] assign a certain execution cost to each element of the workflow. Later, when different execution alternatives exist, they apply MDPs in order to obtain the action sequence (i.e., policy) that allows them to choose the shortest workflow among a group of candidates (Doshi et al 2004, Magnani and Montesi 2007).
These investigations attempt to minimize the computational cost of workflow execution. In contrast, this paper proposes a new approach in which an MDP seeks to minimize the financial cost of executing an entire workflow. In our research, we optimize the workflow design in order to obtain a financial advantage, based on human behavior and/or capacities.
3. Modeling Workflows with MDPs
The proposed contribution focuses on providing a strategic boost to the enactment of a workflow that includes MDPs. Its components can be differentiated by exhibiting different levels of performance and different maintenance costs. An action is defined as the execution of a task under the constraints imposed by this differentiation. As an example, in a stock exchange transaction, a task (e.g., a stock quotation) may be executed by a rookie, or a more specialized professional with a higher academic level and more experience can be hired. Similarly, the job may be assigned to two assistant brokers, or perhaps to one assistant broker plus one specialized staff member. This will result in different execution costs, due to the salary gap between the different employee profiles, as well as in different effectiveness in solving tasks, not to mention quality discrepancies among them. The strategic challenge is to choose the form of execution that gives the best cost/benefit ratio. The decision will be probabilistic, as output by executing an MDP.
There is a significant difference between an assistant and a broker: the broker will have more formal studies (for example, a college degree). As a consequence, he will have more competences and will be able to handle larger responsibilities. Because of this, he will receive a higher salary than the assistant. This may be a drawback for operating costs, especially if these added skills could be spared from the process. His assets would also be larger: a better capacity for finishing his tasks, which results in more stock quotes per day.
So, which is the better staff selection? A basis and a mechanism are needed to make the decision. The proposed mechanism is a Markov Decision Process (MDP) driven by a human workflow.
We represent the MDP as an addition to the current BPMN notation, without drastically changing it. Graphically, the MDP representation can be seen as the addition of a sub-process with the legend "[MDP]" included in its name. Inside the sub-process specification, a flow split with a distinctive two-arrow icon pinpoints the different roads to take, as shown in Figures 3 and 4.
Figure 4 MDP Sub-process Specification with Added Flow Split and Proposed Notation
In order to use an MDP, it is necessary to provide it with information such as the transition probabilities and the cost/reward matrix. The information needed by the MDP can be found in a well-known, but not fully utilized, administrative tool: the Balanced Scorecard (Kaplan and Norton 1996). The Scorecard provides a measurement tool for staff performance. It allows one to see what the difference in performance (plus the resulting economic benefit/cost) would be between employees with different profiles. A simple Scorecard looks similar to Figure 5; it is the basis for obtaining the aforementioned information after a mapping process.
Figure 5 shows two different profile evaluations: the current staff and the broker skills. Three main skills are characterized: formal studies, competences, and responsibilities. A weight is given to estimate the impact of each skill on the overall qualification. The punctuation column shows the skill performance level for the described profile. For this example, levels are given according to the following scale:
1 = poor
2 = marginal
3 = average
4 = good
5 = excellent
Actor #1 — Profile: stock exchange staff member

                      current                           broker
Skill                 Weight   Punctuation   Score      Weight   Punctuation   Score
Formal studies        0.4      2             0.8        0.4      4             1.6
Competences           0.4      3             1.2        0.4      4             1.6
Responsibilities      0.2      3             0.6        0.2      3             0.6
Total Score                                  2.6                               3.8
Punctuation describes an employee's performance level: the bigger the number, the better the performance. Finally, a score is calculated for each skill by multiplying the weight and punctuation columns. The scores are summed, and the result is multiplied by the base salary in order to obtain a weighted salary according to the skill levels.
As a first step, the Markovian states are determined. This is an intuitive process which requires a sound analysis. For the example above, the Markovian states are set according to the quantity of stock quotes the staff may produce in a given timeframe:
• low performance
• average performance
• high performance
Next, we need to represent a transition schema between the Markovian states. For this purpose, we use a Transition Matrix. In MDPs, a Transition Matrix is calculated from statistical data; its structure is obtained from the Scorecard, and the transition probabilities for each of the states are thus obtained. For this example, some values have been mapped from the Balanced Scorecard and used in order to propose a method for their conversion to probabilistic values.
It is necessary to establish "ideal values". These are the expected values for the current employees' performance; in other words, the quantity of stocks quoted in a day for the different productivity levels. For this example, 15 Stocks Quoted (SQ) per day have been assigned as the ideal value for the low performance level, 50 SQ for the average performance level, and 90 SQ for the high performance level. Then the [real] number of quotes per day is obtained. It is assumed that the employee has [actually] quoted 16 Stocks Per Day (SPD) while having a low performance, 24 SPD while having an average performance, and 32 SPD while having a high performance. Next, the difference between the ideal and the real values for each performance level of the different profiles is calculated. Following the example, when the employee makes a transition from low performance to high performance, the difference between the ideal and the real value (90-16=74) is obtained. Once the calculations are made, the values shown in Table 1 are obtained. Values are then normalized to probabilities. The procedure is as follows: the absolute values in each row are summed; for the 'low' row, the result is 109. Each absolute value is then subtracted from this sum (109-1=108, 109-34=75, 109-74=35). The results are shown in Table 2.
Table 1 Differences Between Ideal and Real Values for State 'low'

        low   average   high
low     -1    34        74
Finally, the preprocessed values are converted to probabilities: the values in the corresponding row are summed, and each cell's value is divided by this sum. For example, the value for cell (low, low) is 108/(108+75+35)=0.4954, which is rounded to 0.5 for easy handling. The final Transition Matrix with values formatted as probabilities for this example is shown in Table 3.
The cost/reward matrix (see Table 4) is calculated from the differential in financial revenue provided by each of the proposed actions. In turn, this differential is provided by strategic alignment with company goals using scorecards. The Balanced Scorecard provides the weights given to individual competences, as reflected in the [personal] evaluation of an employee's credentials. To obtain these data, it is necessary to calculate the time per unit (of work) that each type of employee needs to complete the task at low, average, and high productivity. The cost per day is then obtained, as well as the net income. The difference between income and cost gives the net gain. The difference between the net gains of the states is then placed in the cost/reward matrix at the corresponding state transition; this calculation is needed because moving from one state to another implies computing the difference between the gains obtained in each case. Values in this matrix are divided by 1,000,000 [Mexican pesos (MXN)].
Once all the necessary data for the Markov Decision Process have been determined, one of the methods for solving MDPs is applied: complete enumeration, dynamic programming, policy iteration, etc. In addition, it is possible to plan over a finite or an infinite horizon.
For this problem we have experimented with all the available techniques for solving MDPs, and we have found that some techniques perform better than others; the results are shown in Section 4. The next step is to implement a DSS using the Markovian transition and cost/reward matrices, in order to allow decision making (with MDP support) while modeling BPMN workflows.
4. Preliminary Results
[Figure 6: Maximum income (×1,000,000 pesos) obtained by each solution method: CE-F, DP-F, CE-I, PI-I, PID-I.]
The method that presented the highest maximum income is Policy Iteration with Discount Factor for an infinite horizon; Complete Enumeration for an infinite horizon predicted the lowest income. Notice that these methods have been provided with the information about [efficiency] benefit and cost (Table 4), as well as the probabilities for state changes (Table 3). These data describe the individual performance (and cost) of each employee. As a consequence, the MDP methods select the best action sequence to optimize benefits, obtaining the maximum income based on data from the Balanced Scorecard (Figure 5). Given these data, the best policy for each possible decision is obtained.
Figure 7 The DP Algorithm (pseudocode)

Begin
  k = 1, i = 1
  do
    v_i = accumulateIncome(trans_mat, costRew_mat)
    k++
  while (k < policies_numb)
  do
    i = k, j = epochs
    do
      f_n(i) = max_k { v_i^k + sum_{j=1..m} p_ij^k * f_{n+1}(j) }
      optimal_policy = obtainPolicy(f_n)
      j--
    while (j > 0)
    i--
  while (i > 0)
End
The DP algorithm (see Figure 7) delivers the expected optimum income (f_n) and the optimal policy for a finite horizon. The algorithm first calculates the vector v_i, which contains the cumulative income. Then the Bellman equation (1) is applied recursively in order to obtain the optimal policy and the expected optimum income (Hillier and Lieberman 2004, Taha 2007, Winston 2004).
f_n(i) = \max_k \left\{ v_i^k + \sum_{j=1}^{m} p_{ij}^k f_{n+1}(j) \right\}    (1)
The algorithm for Policy Iteration with Discount factor (PID) is shown in Figure 8. It turned out to be the best method for solving MDPs with an infinite horizon in small BPMN workflows.
Figure 8 The PID Algorithm (pseudocode)

Begin
  k = random_select_policy()
  do
    v_i = accumulateIncomeForPolicy(trans_mat, costRew_mat, k)
    π = iterationEquations(trans_mat, discount_factor, k)
    f(i) = v_i^k + sum_{j=1..m} p_ij^k * f(j)
    newPolicy = obtainPolicy(f)
  while (newPolicy != k)
  optimal_policy = newPolicy
End
Outputs: expected_optimum_income, optimal_policy
The PID algorithm chooses a random policy 'k'. Then the accumulated income vector (v_i) for the selected policy is calculated (Taha 2007). This method needs the transition probabilities in the steady state, which are not known in advance, so it calculates the equilibrium transition probabilities through the Chapman-Kolmogorov equations (Hillier and Lieberman 2004, Taha 2007).
\pi_j = \sum_{i=1}^{m} \pi_i \, p_{ij}, \qquad \sum_{j=1}^{m} \pi_j = 1    (2)
Once the equilibrium transition probabilities have been determined, policy iteration is performed, based on a modified Bellman equation (3).
f(i) = v_i^k + \sum_{j=1}^{m} p_{ij}^k f(j)    (3)
Finally, the new policy is compared with policy k. If the two policies are not identical, the new policy becomes policy k and the process is performed again; otherwise, the new policy is declared optimal and the process ends (Hillier and Lieberman 2004, Taha 2007, Winston 2004).
Applying MDPs to the workflow yields a more efficient process insofar as the obtained policy is applied. The results of the finite-horizon algorithm (DP) applied to the example from Section 3 are summarized in Table 5, with an optimum income of 5.468 million MXN. The PID algorithm points out that a broker and an assistant should be hired while the state is low or average; on the other hand, when the state is high, the current staff should be enough. This algorithm determines a maximum income of 5.79 million MXN.
The goal of this method comparison is to determine the method with the best performance when applied to BPM workflows, so that it can be combined with the model proposed in Section 3. This combination aligns MDP-based decision making and BPMN workflow modeling with one of every organization's objectives: enterprise growth through a larger economic benefit (Rosser 2007). In other words, as much money as possible is saved by optimizing the decisions taken throughout the complete workflow.
5. A Second Scenario
In order to test the effectiveness of the method, a second scenario has been proposed. Based on the workflow in Figure 1, we focus on the fragment shown in Figure 9, which depicts the initial process in the stockbrokerage. This process starts when a quote is received and the future market value must be calculated. Another Markov Decision Process has been implemented, since it can help decide what type of broker to hire in order to maximize the expected income.
This time it is necessary to know what type of employee will be in charge of calculating the future market value (someone will certainly be hired for the task). Brokers with different skills, at different skill levels, are available. According to HowToAll (2006), a broker needs four elemental skills: communication, mathematics, mathematics + statistics, and portfolio handling experience. These skills determine the type of broker needed to optimize the expected benefit from the transaction: brokers with different skills receive different pay, deliver different quality in their evaluations, and may need different amounts of time to do their job.
Following Section 3, the resulting model for the MDP sub-process can be seen as a four-way flow division (see Figure 10). The reason for this division is that an employee who has all four skills (probably at different levels, [determined by the MDP]) is required. In order to find the adequate profile, four Markovian processes are built (one for each skill).
In order to implement the MDPs, we need certain data to build the different matrices needed for the Markovian processes. First we again work with the Balanced Scorecard (see Section 3). At this part of the process, we need the Scorecard for this employee, shown in Figure 11.
[Figure 11: Scorecard for the broker profile (skill weights)]

Skill                            Weight
Communication                    0.3
Math skills                      0.2
Math+statistics                  0.4
Portfolio handling experience    0.1
Total Score
The necessary skills for a broker can be seen in the Scorecard (Figure 11), as well as the weight of each skill in the profile. A table (derived from the scorecard book) can also be calculated showing the cost related to each of these skills; in other words, it is necessary to know how expensive it is to obtain each skill in an employee. For this example, the tables shown in Figure 12 have been built.
These tables show the relation between the skill level and its impact on the final employee performance, as well as the cost of this relation. In these tables, a value closer to one means a higher pay, while a value closer to zero means a lower pay.
In addition, other values are needed to determine the cost/reward matrix. These include the cost of the stock (c) (a cost of 50,000 MXN is assumed) and the estimated profit (p) from buying the stock. This profit has been estimated for three scenarios: good, average, and low, directly related to the degree of confidence that the broker exhibits when making quotes to the client (which can also be high, medium, or low). The base salary (b) and the weight (w) of the analyzed skill are also needed; these last two values are easily obtained from the Scorecard for the employee's profile (see Figure 11).
Remember that one MDP is designed for each of the employee's skills; as a consequence, four different transition matrices and four different cost/reward matrices are built. The states of each MDP correspond to the performance of the broker: for this example, high, medium, and low [performance] states have been defined.
To obtain each position of a cost/reward matrix (M), the skill-cost (S) tables are used (see Figure 12), applying equation (4).
Finally, the transition matrices are built. Once more, these transition matrices should be based on statistical data about the performance of the employee for the different skills. The transition matrices for the Markovian process corresponding to the communication skill are shown in Table 6, and the corresponding cost/reward matrices in Table 7. Tables 8 and 9 show the transition and cost/reward matrices for the mathematics skill, Tables 10 and 11 the corresponding matrices for the math+statistics skill, and Tables 12 and 13 those for the portfolio handling experience skill.
Once the necessary data have been obtained, software tools are used to solve the MDPs, as described in Section 4. As a result, the most beneficial policy is obtained; this policy states the type of employee to hire. Again, we have experimented by running the decision process for a finite horizon with 3 stages and for an infinite horizon.
The resulting policy is summarized in Table 14. Notice that the policy obtained with an infinite horizon is exactly the same as that obtained with a finite horizon, and it recommends hiring an employee who is good at all four necessary skills. The maximum expected income obtained for the finite horizon is summarized in the graph in Figure 13.
From these results it can be concluded that (because of the low cost/productivity ratio) it is necessary to hire an employee who is good in all the skills, but a bigger profit can be obtained if he is not given too many responsibilities: if he has a low performance, his salary decreases, which helps to keep a large profit margin. The results for the infinite horizon (see Figure 14) confirm the same tendency, showing curves similar to those obtained for the finite horizon. The new conclusion from these last results is that the longer the planning horizon, the bigger the profit that can be obtained.
Table 14 Resulting Policy from the MDP for Finite and Infinite Horizon

Skill                            State     Stage 1   Stage 2   Stage 3
Communication                    High      Good      Good      Good
Communication                    Average   Good      Good      Good
Communication                    Low       Good      Good      Good
Mathematics skills               High      Good      Good      Good
Mathematics skills               Average   Good      Good      Good
Mathematics skills               Low       Good      Good      Good
Mathematics+statistics skills    High      Good      Good      Good
Mathematics+statistics skills    Average   Good      Good      Good
Mathematics+statistics skills    Low       Good      Good      Good
Portfolio handling experience    High      Good      Good      Good
Portfolio handling experience    Average   Good      Good      Good
Portfolio handling experience    Low       Good      Good      Good
A singular phenomenon can be noticed in these results: the obtained policy shows a tendency to hire an "ideal employee". This is because the data used in this case are based on experimental data. The same example has been tested using variations of the data, which result in different policies and expected values. These variations include a bigger base salary for the employee and a smaller future stock value (a smaller profit); in addition, variations to the transition matrices have been made. The resulting profiles and expected values are shown next.
[Figure 13: Maximum expected income (×1,000 MXN) for the finite horizon, by performance state (High, Average, Low), for communication, mathematics, math+statistics, and portfolio handling experience; values fall roughly between 44 and 50.]
[Figure 14: Maximum expected income (×1,000 MXN) for the infinite horizon, by performance state, for the four skills; values fall roughly between 140 and 175.]
The first variation consisted of a reduced future stock value. This reduction translated directly into a smaller profit, reflected in the cost/reward matrices. The resulting profile for a finite horizon with three stages is shown in Table 15, and the new maximum expected income in Figure 15. The same data for the infinite horizon are summarized in Table 16 and Figure 16.
Table 15 Resulting Policy from the First Data Variation, Finite Horizon
Figure 15 Maximum Net Income for First Data Variation, Finite Horizon
[Chart: maximum net income (×1,000 MXN) by performance state (High, Average, Low) for each skill; values fall roughly between 8 and 14.]
From these figures it can be noticed that a more flexible profile has been obtained, varying the employees' skill level between average and good. Nevertheless, the maximum net income is reduced. This reduction is caused by the future value of the stock, which is lower than the initial one, giving a smaller profit from the stock purchase. The profit reduction is reflected most drastically in the infinite-horizon results, which drop from values between 155,000 and 175,000 to values between 11,000 and 14,000.
Figure 16 Maximum Net Income for First Data Variation, Infinite Horizon
[Chart: maximum net income (×1,000 MXN) by performance state for each skill; values fall roughly between 10 and 17.]
A second data variation was made in order to see whether it produced a different behavior: it included a small increment in the employees' base salary. In addition, the transition probabilities were modified, giving larger transition probabilities toward the medium performance state. The policy obtained for the finite horizon (see Table 17) showed a more balanced profile, with an average level in the mathematics skill; on the other hand, portfolio handling experience did not show any variation at all. The policy for the infinite horizon was exactly the same as in the first data variation (see Table 16). The maximum net income (see Figures 17 and 18) also resulted in values and curves similar to those of the first data variation.
Figure 17 Maximum Net Income for Second Data Variation, Finite Horizon
[Chart: maximum net income (×1,000 MXN) by performance state for each skill; values fall roughly between 8 and 14.]
Figure 18 Maximum Net Income for Second Data Variation, Infinite Horizon
[Chart: maximum net income (×1,000 MXN) by performance state for each skill; values fall roughly between 9 and 16.]
The analyzed case is valid for demonstration purposes; nevertheless, it is not designed strictly according to reality. For example, in reality there are usually more than three employee profiles. In order to show a more realistic view of this case, a last case has been run involving a variation: five skill levels have been introduced (Expert, Competent, Medium, Marginal, and Low), replacing the three previous levels (High, Medium, and Low). These skill levels represent five different actions (policies). The Markovian process must choose one among these actions for every system state.
Table 20 Transition Matrices for the Five-Policies Case, Mathematics Skill

Action 1: Hire an expert mathematics skill broker
            High   Average   Low
  High      0.6    0.3       0.1
  Average   0.3    0.6       0.1
  Low       0.1    0.1       0.8

Action 2: Hire a competent mathematics skill broker
            High   Average   Low
  High      0.55   0.3       0.15
  Average   0.25   0.4       0.35
  Low       0.05   0.3       0.65

Action 3: Hire a medium mathematics skill broker
            High   Average   Low
  High      0.7    0.2       0.1
  Average   0.6    0.2       0.2
  Low       0.5    0.3       0.2

Action 4: Hire a marginal mathematics skill broker
            High   Average   Low
  High      0.4    0.3       0.3
  Average   0.2    0.4       0.4
  Low       0.1    0.4       0.5

Action 5: Hire a low mathematics skill broker
            High   Average   Low
  High      0.2    0.4       0.4
  Average   0.1    0.3       0.6
  Low       0.01   0.09      0.9

Table 21 Cost/Reward Matrices for the Five-Policies Case, Mathematics Skill

Action 1: Hire an expert mathematics skill broker
            High    Average   Low
  High      24.5    25.5      19.75
  Average   25.25   25        23.23
  Low       24.75   22.25     20.75

Action 2: Hire a competent mathematics skill broker
            High    Average   Low
  High      8.5     9.25      12.75
  Average   9.25    9         11.25
  Low       10.75   10.25     10.75

Action 3: Hire a medium mathematics skill broker
            High    Average   Low
  High      21.2    19.1      24.7
  Average   22.2    22.4      23.7
  Low       25.5    25        25.14

Action 4: Hire a marginal mathematics skill broker
            High    Average   Low
  High      13.4    11.3      13.1
  Average   10.9    9.9       13.9
  Low       12.3    13        11.1

Action 5: Hire a low mathematics skill broker
            High    Average   Low
  High      -1.5    -0.75     -0.25
  Average   -0.75   -1        -0.75
  Low       -0.25   -0.75     -0.25
Table 22 Transition Matrices for the Five-Policies Case, Math + Statistics Skill

Action 1: Hire an expert math + statistics skill broker
            High   Average   Low
  High      0.6    0.3       0.1
  Average   0.3    0.6       0.1
  Low       0.1    0.1       0.8

Action 2: Hire a competent math + statistics skill broker
            High   Average   Low
  High      0.55   0.3       0.15
  Average   0.25   0.4       0.35
  Low       0.05   0.3       0.65

Action 3: Hire a medium math + statistics skill broker
            High   Average   Low
  High      0.7    0.2       0.1
  Average   0.6    0.2       0.2
  Low       0.5    0.3       0.2

Action 4: Hire a marginal math + statistics skill broker
            High   Average   Low
  High      0.4    0.3       0.3
  Average   0.2    0.4       0.4
  Low       0.1    0.4       0.5

Action 5: Hire a low math + statistics skill broker
            High   Average   Low
  High      0.2    0.4       0.4
  Average   0.1    0.3       0.6
  Low       0.01   0.09      0.9

Table 23 Cost/Reward Matrices for the Five-Policies Case, Math + Statistics Skill

Action 1: Hire an expert math + statistics skill broker
            High    Average   Low
  High      40      38        39.5
  Average   38.7    39        41
  Low       35.1    37.4      39.9

Action 2: Hire a competent math + statistics skill broker
            High     Average   Low
  High      47.75    48.875    49.625
  Average   48.875   48.5      48.875
  Low       49.625   48.875    49.625

Action 3: Hire a medium math + statistics skill broker
            High    Average   Low
  High      25.4    22.2      26.9
  Average   27.3    21.1      24
  Low       25.5    24        23.3

Action 4: Hire a marginal math + statistics skill broker
            High     Average   Low
  High      17.75    18.875    19.625
  Average   18.875   18.5      18.875
  Low       19.625   18.875    19.625

Action 5: Hire a low math + statistics skill broker
            High      Average   Low
  High      -12.25    -11.125   -10.375
  Average   -11.125   -11.5     -11.125
  Low       -10.375   -11.125   -10.375
Table 24 Transition Matrices for the Five-Policies Case, Portfolio Handling Experience

Action 1: Hire an expert portfolio handling experience skill broker
            High   Average   Low
  High      0.3    0.4       0.3
  Average   0.1    0.5       0.4
  Low       0.2    0.5       0.3

Action 2: Hire a competent portfolio handling experience skill broker
            High   Average   Low
  High      0.2    0.6       0.2
  Average   0.1    0.5       0.4
  Low       0.2    0.4       0.4

Action 3: Hire a medium portfolio handling experience skill broker
            High   Average   Low
  High      0.25   0.35      0.4
  Average   0.35   0.45      0.2
  Low       0.5    0.45      0.05

Action 4: Hire a marginal portfolio handling experience skill broker
            High   Average   Low
  High      0.3    0.4       0.3
  Average   0.2    0.3       0.5
  Low       0.2    0.4       0.4

Action 5: Hire a low portfolio handling experience skill broker
            High   Average   Low
  High      0.3    0.4       0.3
  Average   0.2    0.3       0.5
  Low       0.2    0.4       0.4
Table 25 Cost/Reward Matrices for the Five-Policies Case, Portfolio Handling Experience

Action 1: Hire an expert portfolio handling experience skill broker
            High    Average   Low
  High      14.1    14.1      14.8
  Average   14.44   14.2      14.36
  Low       14.46   14.7      13.84

Action 2: Hire a competent portfolio handling experience skill broker
            High   Average   Low
  High      14.0   13.8      14.8
  Average   14.3   14.2      14.5
  Low       14.5   14.7      15.0

Action 3: Hire a medium portfolio handling experience skill broker
            High     Average   Low
  High      13.64    13.64     13.92
  Average   13.776   13.68     13.744
  Low       13.784   13.88     13.536

Action 4: Hire a marginal portfolio handling experience skill broker
            High   Average   Low
  High      14.0   14.5      14.4
  Average   15.0   14.5      14.7
  Low       14.9   14.1      14.9

Action 5: Hire a low portfolio handling experience skill broker
            High     Average   Low
  High      -0.18    -0.18     -0.04
  Average   -0.112   -0.16     -0.128
  Low       -0.108   -0.06     -0.232
For this case, five matrices are needed for each skill in the Markovian process. The Markovian matrices for the four skills are shown in Tables 18 to 25. These values were again set based on the experience of the authors. Once the matrices have been designed, the Markovian process is performed. The resulting policy of the MDP is shown in Table 26, and the maximum net income in Figure 19.
[Figure 19: Maximum net income (×1,000 MXN) by performance state (High, Average, Low) for communication, mathematics, math+statistics, and portfolio handling experience; values fall roughly between 40 and 90.]
A different behavior in the policy can be seen from this experiment, as well as in the different curves of Figure 19. This behavior is caused by the values in the corresponding matrices carrying more weight in different actions than in the first experiments, which produced the difference in the curves and in the policies, which look more varied. In general terms, a bigger profit could be obtained if an average state in mathematics skills is reached and maintained; similarly, reaching and maintaining an average state of performance in portfolio handling experience is recommended. According to the curves in Figure 19, a bigger profit can be obtained if a low performance in communication and statistics is maintained; this last assertion is possibly explained by salary increments or other [cost] factors that affect profit and losses.
The behavior shown in the different scenarios leads to an important conclusion: when a Markovian process for decision making in a business process is modeled, it is necessary to obtain data at as fine a grain as possible, in order to make a more accurate prediction. It is also quite important to get data from adequate sources (such as the Balanced Scorecard): the more trustworthy the source, the more adequate the resulting policy.
6. Further Work
Further work consists of designing larger cases, including a complete organizational structure and business process. The addition of historical data to the MDP model is proposed, in order to make the design correspond strictly to reality.
The resulting policy from MDPs improves the performance of our BPMN workflow in different ways. In the particular case previously analyzed, the time required to quote stocks is minimized. Figure 20 is a fragment of the workflow corresponding to the stockbrokerage (Sections 3 and 4), modified according to the "five policies case" (Section 5) shown in Figure 10. Figure 20 details the communication skill, showing the MDP application for selection among five skill levels.
The modification or addition to the notation to include the MDP impact is not the main focus of this paper, but it will be addressed in future work. As a preview, it is possible to visualize the decision taken and its consequences in a scheme added to the process model (see Figure 21). Figure 21 shows an important addition: the proposed icon for MDP representation in BPMN, which was discussed in Section 3.
Not only the graphical notation should be addressed; the BPMN XML schema (Ouyang et al 2006), which is used to save the diagrams in an XML-formatted file, should also be modified. A preliminary proposal could look as shown in Figure 22. This XML document includes a fragment of a possible schema for including Markov Decision Process information for the skill "Communication" in the "five policies case" (Section 5); the new proposed section for MDP modeling is outlined in bold. These last proposals are merely approaches needing further development, and they are placed here only to show the possible direction of our research.
Once the business process has been modeled in BPMN notation, it is necessary to translate this notation to the execution language BPEL. Some efforts have already been made in this direction (Asztalos et al 2009, García-Bañuelos 2009, Mazanek and Minas 2009, Ouyang et al 2006). Nevertheless, according to our research, we still need a way to include the impact of the MDP execution, with the selection of the best options for profiles and skills. The MDP results should then be included as task assignments in the XML that defines the BPEL process.
Finally, a tool capable of joining the classic modeling technique for business processes with the Markovian model introduced in this paper is currently under development. This development would result in a tool capable of decision making for business process execution improvement. It could also make it possible to identify optimization parameters and to produce an execution report.
7. Conclusions
A new application of Markov Decision Processes in the field of Business Process Management has been introduced. A general method has been described for building the elements of a Markov Decision Process from elements commonly used in Business Process Management.
This paper proposes a method for decision support based on observable data (Markov Decision Processes propose a policy based on the analysis of these data). The approach differs from those described in Section 2: through this method, BPMN can give an extra economic profit to most organizations by including human performance in workflow modeling. In this approach, foundations from Business Process Management (BPM) and Business Intelligence (BI) have been combined in order to optimize workflows (Evelson et al 2008).
Figure 22 A Possible Modification to the BPMN XML Schema for Saving and Characterizing
Markov Decision Processes and Their Result and Impact over the Model
Regarding Markov Decision Processes, this research has compared the performance of different methods for solving MDPs, providing BPMN workflows with a mechanism for decision making while optimizing different resources. Since MDPs are divided by horizon (finite and infinite), it has been found that the best MDP method for solving BPMN workflows with a finite horizon is dynamic programming, while Policy Iteration with Discount Factor is the best for an infinite horizon.
In addition, different scenarios with data variations have been tested in order to feed the MDP with different data, and the behavior of the decision making in each case has been analyzed. From these modifications to the case study, it has been seen how differences in the matrices produce a different prediction in the resulting policy, as well as a visible difference in the predicted maximum net income. Because of this, we strongly suggest care when obtaining statistical and historical data about people's performance (the Scorecard book) and its influence over the final workflow and the decisions taken with the Markovian processes, so that more exact predictions can be made when this method is applied in real scenarios. From these experiments it can be concluded that a more fine-grained design of the Markovian states observed in the process flow, together with a good method for data collection, will result in a trustworthy policy.
References
Asztalos, M., T. Mészáros, L. Lengyel (2009), "Generating Executable BPEL Code from BPMN Models". Unpublished. http://is.tm.tue.nl/staff/pvgorp/events/grabats2009/submissions/grabats2009_submission_16-final.pdf. (Last accessed on October 14, 2009).
Cardoso, J., R. Bostrom, A. Sheth (2004), "Workflow Management Systems and ERP Systems: Differences, Commonalities and Applications". Information Technology and Management. Vol. 1, No. 3-4. pp. 319-338.
Digiampietri, L. A., C. B. Medeiros, J. C. Setubal (2005), “A framework based on Web service orchestration for
bioinformatics workflow management”. Genetics and Molecular Research. Vol. 3, No. 4. pp. 535-542.
Doshi, P., K. Verma, R. Goodwin, R. Akkiraju (2004), "Dynamic Workflow Composition using Markov Decision Processes". Proceedings of the IEEE International Conference on Web Services (ICWS '04). pp. 576-582.
Estublier, J. and S. Anlaville (2005), “Business Process and Workflow Coordination of Web Services”. Proceedings
of the IEEE International Conference on e-Technology, e-Commerce and e-Service, 2005. pp. 85-88.
Evelson, B., C. Teubner and J. Rymer (2008). “How The Convergence Of Business Rules, BPM and BI Will Drive
Business Optimization”. Forrester Research, Inc. May 14, 2008.
García-Bañuelos, L. (2009). “Translating BPMN Models to BPEL Code”. Unpublished.
http://is.tm.tue.nl/staff/pvgorp/events/grabats2009/submissions/grabats2009_submission_22banuelos.pdf. (Last
accessed on October 14, 2009).
Georgakopoulos, D., M. Hornick, A. Sheth (1995), "An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure". Distributed and Parallel Databases. Vol. 3, No. 2. pp. 119-153.
Hillier, F. and G. Lieberman (2004), Introduction to Operations Research, McGraw-Hill, New York, USA.
HowToAll (2006), "How to become a stockbroker". http://www.howtoall.com/Educationfiles/Howtoallbecomestockbroker.htm. (Last accessed on April 10, 2008).
Kaplan, R. S. and D. P. Norton (1996), The Balanced Scorecard: Translating Strategy into Action. Harvard Business
School Press. 322 p. USA.
Magnani, M. and D. Montesi (2007), “Computing the cost of BPMN diagrams”. Technical Report UBLCS-07-17.
University of Bologna. Italy.
Malinverno, P. and J. B. Hill. (2007), “SOA and BPM Are Better Together”. Gartner Research. ID Number:
G00145586.
Mazanek, S. and M. Minas (2009), "Transforming BPMN to BPEL Using Parsing and Attribute Evaluation with Respect to a Hypergraph Grammar". Online: http://is.tm.tue.nl/staff/pvgorp/events/grabats2009/submissions/grabats2009_submission_8.pdf. (Last accessed on October 14, 2009).
Melenovsky, M. J. and J. Sinur (2006). BPM Maturity Model Identifies Six Phases for Successful BPM Adoption.
Gartner Research. ID Number: G00142643.
OMG, Object Management Group (2008), “Business Process Modeling Notation, V1.1: OMG Available
Specification”. http://www.bpmn.org/Documents/BPMN 1-1 Specification.pdf. (Last accessed on April 29,
2009).
Ouyang, C., M. Dumas, A. Hofstede, W.M.P. Van der Aalst (2006), “From BPMN Process Models to BPEL Web
Services”. IEEE International Conference on Web Services, 2006. pp. 285 – 292.
Rosser, B. (2007), "How to Achieve BPM Alignment With Enterprise Goals". Gartner Research. ID Number: G00152759.
Shing, S. (2006) “BPEL: Building Standards-Based Business Processes with Web Services”. Online course.
http://www.javapassion.com/webservices/BPELOverview.pdf. Last accessed on March 2, 2009.
Taha, H. A. (2007), Operations Research: An introduction. Pearson-Prentice Hall. Upper Saddle River, N.J. USA.
Tsai, C.H., H.J. Luo, F.J. Wang (2007), “Constructing a BPMN Environment with BPMN*”. Proceedings of the 11th
IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS’07). pp 164-172.
Van der Aalst, W. M. P. and K. Van Hee (2002). Workflow Management: Models, Methods and Systems. MIT Press.
Juan Frausto, Ph.D., holds a bachelor's degree in Electrical Engineering (1974) from the Mexican National Polytechnic Institute, where he also received a Ph.D. for research on computing methods in Electrical Engineering (1981). He has directed more than 80 M.Sc. and Ph.D. theses and has published more than 100 articles in international conferences and journals (http://campus.cva.itesm.mx/juanfrausto). His publications cover Petri Nets, Markov Decision Processes, Linear Optimization, Simulated Annealing, Neural Networks, Decision Trees, and Genetic Algorithms. His research interests include Combinatorial Optimization applied to Artificial Intelligence for Engineering, Bioinformatics, Planning, and Scheduling.