Qualitative Process Analysis
Quality is free, but only to those who are willing to pay heavily
for it.
Tom DeMarco (1940–)
Analyzing business processes is both an art and a science. In this respect, qualitative
analysis is the artistic side of process analysis. Like fine arts, such as painting, there
is not a single way of producing a good process analysis, but rather a range of
principles and techniques that tell us which practices typically lead to a “good”
process analysis.
In this chapter, we introduce a selected set of principles and techniques for
qualitative process analysis. First, we present two techniques aimed at identifying
unnecessary steps of the process (value-added analysis) and sources of waste (waste
analysis). Next, we present techniques to identify and document issues in a process
from multiple perspectives and to analyze the root causes of these issues.
6.1 Value-Added Analysis

A first prerequisite for value-added analysis is to decompose each task of the process into steps. Where checklists or standard operating procedures exist, the analyst can use them to decompose tasks into steps. Unfortunately, such checklists are not
always available. In many cases, process participants have an implicit understanding
of the steps in a task because they perform the task day in and day out. But
this implicit understanding is not documented anywhere. In the absence of such
documentation, the process analyst needs to decompose each task into steps by
means of observation and interviewing.
Having decomposed the process into steps, a second prerequisite for value-added
analysis is to identify who the customer of the process is and what positive
outcomes the customer seeks from it. These outcomes are said to add
value to the customer, in the sense that fulfilling them is in the interest of,
or for the benefit of, the customer.
Having decomposed the process into steps and having identified the positive
outcomes of a process, we can then analyze each step in terms of the value it adds.
Steps that directly contribute to positive outcomes are called Value Adding (VA)
steps. For example, consider a process for repairing a washing machine or other
appliance. The steps where the technician diagnoses the problem with the machine
are value adding, as they directly contribute to the outcome the customer wishes to
see, which is that the machine is repaired. Also, the steps related to repairing the
machine are value adding.
Some steps do not directly add value to the customer but they are necessary
for the business. Consider again the example of a process for repairing a washing
machine. Imagine that this process includes a step “Record defect” in which the
technician enters data into an information system about the washing machine and
an explanation of the defect found in it. This step per se is not value adding for
the customer. The customer wishes the machine to be fixed and does not get value
by the fact that the defect in their machine was recorded in an information system.
However, recording defects and their resolution helps the company to build up a
knowledge base of typical defects and their resolution, which is valuable when new
technicians are recruited. Also, such data allows the company to detect frequent
defects and to report such defects to the manufacturer or distributor of the washing
machine. Steps such as “Record defect” are termed Business Value Adding (BVA)
steps. BVA steps are those that the customer is neither willing to pay for, nor gains
satisfaction from (so they are not value adding), but they are necessary or useful to
the company where the process is performed.
Steps that are neither VA nor BVA are called Non-Value Adding (NVA).
In summary, value-added analysis consists in breaking down each task in a
process into steps, such as a preparation step, an execution step, and a handoff step.
We then classify each step into one of three categories, namely:
• Value Adding (VA): This is a step that produces value or satisfaction for the
customer. When determining whether or not a step is VA, it may help to ask the
following questions: Would the customer be willing to pay for this step? Does
the customer value this step enough to keep conducting business with us? And
conversely, if we remove this step, would the customer perceive that the outcome
of the process is less valuable?
• Business Value Adding (BVA): The step is necessary or useful for the business
to run smoothly, to collect revenue, or it is required due to the regulatory
environment of the business. When determining whether or not a step is BVA,
it may help to ask the following questions: Is this step required in order to collect
revenue, to improve or grow the business? Would the business (potentially) suffer
in the long term if this step were removed? Does it reduce risk of business losses?
Is this step required in order to comply with regulatory requirements?
• Non-Value Adding (NVA): The step does not fall into any of the other two
categories.
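The three-way classification can be sketched as a simple decision rule. Below is a minimal Python sketch, assuming each step record already captures the answers to the guiding questions; the step names and flag values are illustrative, not part of any formal method:

```python
from dataclasses import dataclass

@dataclass
class Step:
    """A process step, with the answers to the guiding questions as flags."""
    name: str
    adds_customer_value: bool  # would the customer pay for or miss this step?
    needed_by_business: bool   # revenue, risk reduction, or regulatory need?

def classify(step: Step) -> str:
    """Classify a step as VA, BVA, or NVA following the three-way scheme."""
    if step.adds_customer_value:
        return "VA"
    if step.needed_by_business:
        return "BVA"
    return "NVA"

steps = [
    Step("Fill in rental request", adds_customer_value=True, needed_by_business=False),
    Step("Record defect", adds_customer_value=False, needed_by_business=True),
    Step("Forward request to works engineer", adds_customer_value=False, needed_by_business=False),
]
for s in steps:
    print(f"{s.name}: {classify(s)}")
```

The order of the checks matters: a step that adds customer value is VA even if it also benefits the business, which mirrors the definitions above.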
Example 6.1 We consider the equipment rental process described in Example 1.1
(page 3). The customer of this process is the site engineer who submits an equipment
rental request. From the perspective of the site engineer, the positive outcome of the
process is that the required piece of equipment is available at the construction site
when needed. Let us analyze the fragment of this process described in Figure 1.6,
which we reproduce as Figure 6.1 for convenience. To identify the relevant steps,
we walk through this model task by task, and we classify each step into VA, BVA,
and NVA.
• The first task in the process model is the one where the engineer lodges the
request. From the description in Example 1.1, we observe there are three steps in
this task:
1. Site engineer fills in the request.
2. Site engineer sends the request to the clerk via email (handoff step).
3. Clerk opens and reads the request (handoff step).
Fig. 6.1 Process model for the initial fragment of the equipment rental process
Arguably, filling in the request is VA insofar as the site engineer cannot expect the
equipment to be rented if they do not ask for it. In one way or another, the site
engineer has to request the equipment in order to obtain it. On the other hand,
the site engineer does not get value out of sending the request to the clerk by
email, nor out of the clerk having to open and read the request.
More generally, steps associated with handoffs between process participants, such
as sending and receiving internal messages, are NVA.
• The second task is the one where the clerk selects a suitable piece of equipment
from the supplier’s catalog. We can treat this task as a single step. This step is VA
insofar as it contributes to identifying a suitable piece of equipment that fulfills
the needs of the site engineer.
• In the third task, the clerk calls the supplier to check the availability of the
selected equipment. This “call supplier” step is value adding insofar as it
contributes to identifying a suitable and available piece of equipment. If the equipment
is available, the clerk recommends that this equipment be rented. To this end, the
clerk adds the details of the recommended equipment and supplier to the rental
request form and forwards the form to the works engineer for approval. Thus we
have two more steps: (i) adding the details to the rental request and (ii) forwarding
the rental request to the works engineer. The first of these steps is BVA since it
helps the company to keep track of the equipment they rent and the suppliers they
rent from. Maintaining this information is valuable when it comes to negotiating
or re-negotiating bulk agreements with suppliers. On the other hand, the handoff
between the clerk and the works engineer (i.e. the “forwarding” step) is not value
adding.
• Next, the works engineer examines the rental request in view of approving it
or rejecting it. We can treat this examination as one step. This step is a control
step, that is, a step where a process participant or a software application checks
that something has been done correctly. In this case, this control step helps the
company to ensure that equipment is only rented when it is needed and that the
expenditure for equipment rental in a given construction project stays within the
project’s budget. Control steps are generally BVA.
• If the works engineer has an issue with the rental request, the works engineer
communicates it to the clerk or the site engineer. This communication is
another step and it is BVA since it contributes to identifying and avoiding
misunderstandings within the company. If approved, the request is sent back to
the clerk; this is a handoff step and it is thus NVA.
• Finally, assuming the request is approved, the clerk creates and sends the PO.
Here we can identify two more steps: creating the PO and sending the PO to
the corresponding supplier. Creating the PO is BVA. It is necessary in order to
ensure that the rental request cost is correctly accounted for and eventually paid
for. Sending the PO is value adding: it is this act that informs the supplier that
the equipment has to be delivered on a given date. If the supplier did not
get this information, the equipment would not be delivered. Note, however, that
what is value adding is the fact that the supplier is explicitly requested by the
construction company to deliver the equipment on a given date. The fact that this
request is made by sending a PO is secondary in terms of adding value to the site
engineer.
The identified steps and their classification are summarized in Table 6.1.
One may wonder whether creating the PO is a VA step or a BVA step. Arguably,
in order for the equipment to be available, the supplier needs to have an assurance
that the equipment rental fee will be paid. So one could say that the creation of
the PO contributes to the rental of the equipment since the PO serves to assure
the supplier that the payment for the rental equipment will be made. However, as
mentioned above, what adds value to the site engineer is the fact that the supplier
is notified that the equipment should be delivered at the required date. Whether
this notification is done by means of a PO or by means of a simple electronic
message sent to the supplier is irrelevant, so long as the equipment is delivered.
Thus, producing a formal document (a formal PO) is arguably not value adding. It
is rather a mechanism to ensure that the construction company’s financial processes
run smoothly and to avoid disputes with suppliers, e.g., avoiding the situation where
a supplier delivers a piece of equipment that is not needed and then asks for payment
of the rental fee. More generally, we will take the convention that documentation and
control steps imposed by accounting or legal requirements are BVA.
Exercise 6.1 Consider the process for university admission described in Exer-
cise 1.1 (page 5) and modeled in Figure 5.4 (page 197). What steps can you extract
from this process? Classify these steps into VA, BVA, and NVA.
Having identified and classified the steps of the process as discussed above,
one can then proceed to determining how to minimize or eliminate NVA steps.
Some NVA steps can be eliminated by means of automation. This is the case of
handoff steps, which can be eliminated by automating the flow of work between
process participants.
6.2 Waste Analysis

Waste analysis can be seen as the reverse of value-added analysis. In value-added
analysis, we look at the process from a positive angle: we try to identify value-adding
steps and then classify the remaining steps into business value adding
and non-value adding. Waste analysis takes the negative angle: it tries to find waste
everywhere in the process. Some of these wastes can be traced down to specific steps
in the process, but others, as we will see, are hidden in between steps or sometimes
throughout the process.
Waste analysis is one of the key techniques of the Toyota Production System
(TPS), developed by Taiichi Ohno and colleagues in the 1970s. TPS distinguishes
seven types of waste, which we group below into three categories: move
(transportation and motion), hold (inventory and waiting), and overdo (defect,
overprocessing, and overproduction).
6.2.1 Move
The first and perhaps most pervasive source of waste is transportation. In a
manufacturing process, transportation means moving materials from one location to another,
such as from a warehouse to a production facility. In a business process, physical
transportation occurs, for example, when documents are sent from one process
participant to another—often signaling a handoff of work between participants—or
when physical documents are exchanged with an external party. In modern business
processes, physical document exchanges have been largely replaced with electronic
exchanges. For example, purchase orders, shipment notifications, delivery receipts
and invoices are often exchanged via Electronic Data Interchange (EDI) channels.
Meanwhile, internal handoffs between process participants are generally automated
by means of Process-Aware Information Systems, as we will discuss in Chapter 9.
But despite the replacement of physical document flows with electronic ones, which
we have witnessed in the past decades, transportation remains nonetheless a source
of waste. Indeed, every time that a handoff occurs between participants, this handoff
entails some delay, as the participant who needs to take the relay is likely to be busy
with other work when the handoff occurs.
A process model with lanes and pools can help us to identify transportation
waste. Typically, there is transportation waste wherever a sequence flow goes from
one lane to another in a pool. Such a sequence flow represents a handoff. In a similar
vein, if the process model has multiple pools, every message flow is a potential
transportation waste.
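The rule above can also be checked mechanically when the model is available in machine-readable form. The sketch below assumes a hypothetical dictionary representation of the model (a real implementation would parse a BPMN file); the task names follow the rental example:

```python
# A minimal sketch: flag potential transportation waste in a process model.
# The dict structure below is a hypothetical stand-in for a parsed BPMN model.
model = {
    "lane_of": {  # task -> lane that performs it
        "Lodge request": "Site engineer",
        "Select equipment": "Clerk",
        "Check availability": "Clerk",
        "Approve request": "Works engineer",
        "Create PO": "Clerk",
    },
    "sequence_flows": [
        ("Lodge request", "Select equipment"),
        ("Select equipment", "Check availability"),
        ("Check availability", "Approve request"),
        ("Approve request", "Create PO"),
    ],
    "message_flows": [("Create PO", "Supplier")],
}

def transportation_wastes(model):
    """Every sequence flow crossing lanes and every message flow between
    pools is a potential transportation waste (a handoff)."""
    wastes = [
        (src, tgt) for src, tgt in model["sequence_flows"]
        if model["lane_of"][src] != model["lane_of"][tgt]
    ]
    wastes += model["message_flows"]
    return wastes

print(transportation_wastes(model))
```

On this toy model the function flags four handoffs: engineer to clerk, clerk to works engineer, works engineer back to clerk, and the message flow to the supplier, matching the wastes identified in the example that follows.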
Example 6.2 Let us consider the equipment rental process model introduced in
Example 1.1 (page 3). The fragment of the process from the creation of the
rental request up to its approval is shown in Figure 6.2.

Fig. 6.2 Fragment of the equipment rental process from creation of rental request up to creation
of the PO

The figure highlights four transportation wastes. The first three come from handoffs between process
participants: (i) from the site engineer to the clerk; (ii) from the clerk to the works
engineer; and (iii) from the works engineer back to the clerk. The fourth transportation
waste occurs when the clerk sends the purchase order to the supplier.
Later in the process, we can note two other transportation events: the delivery
of the equipment to the construction site and its subsequent removal at the end of
the renting period. One might argue that these two transportation events are value
adding, since the delivery of the equipment is precisely what the site engineer seeks.
Still, the rental company would strive to minimize this transportation, for example,
by optimizing the placement of equipment so that it is close to the construction sites
where it is likely to be used.
A final transportation waste occurs when the supplier submits the invoice.
This example shows that not all transportation waste in a process can be
eliminated. In particular, the transportation of equipment cannot be fully eliminated.
But we can strive to reduce it or we can strive to reduce its cost. For example, to
reduce the cost of physical transportation of goods, we can batch together several
deliveries. Similarly, transportation of physical documents can in some cases be
replaced by electronic exchanges. In other situations, it may be possible to reduce
the number of handoffs, so as to reduce the waiting times and context switches that
these exchanges create.
The second type of waste related to movement is motion. Motion refers to process
participants moving from one place to another during the execution of a process.
Motion is common in a manufacturing process where workers move pieces from
one place to another in the production line. In the field of business processes, motion
wastes are less common than in manufacturing processes, but they are nonetheless
present.
Consider, for example, a vehicle inspection process where customers have their
vehicles inspected in order to assess their roadworthiness and their compliance with
respect to gas emission requirements. In this process, vehicles have to go through
different inspection bases in order to undergo different tests. Oftentimes, process
participants have to move equipment or tools from one inspection base to another to
perform certain tests. This is a motion waste.
Another form of motion waste—which can be found in digitized processes—
arises when a process participant has to switch from one application to another
during the performance of a task. For example, when doing a vehicle inspection
booking for a new customer, the receptionist may need to record the customer details
in one application, and then schedule the inspection in another application. The
movement between these two applications is motion waste. A set of techniques and
tools known under the name of Robotic Process Automation (RPA) [15] aim at
reducing this type of motion waste. RPA will be further discussed in Chapter 9.
6.2.2 Hold
The hold category covers wastes arising when work items are held up: inventory
waste and waiting waste. In a business process, waiting waste occurs when a task
waits for a process participant to become
available. Waiting waste can also occur in the opposite direction: Instead of the task
waiting for a resource, we might have a resource waiting for a task. This sub-type
of waiting waste is called idleness.
Let us consider again the vehicle inspection process. At some point in time, it
might be that there is a technician at an inspection base waiting for the next car,
because the next car is still being inspected at the previous base. This is an example
of idleness.
On the other hand, consider the case of a travel request that has undergone one
approval, but needs to undergo a second one. This second approval is a task. If
the participant responsible for the second approval is not available when the first
approval is completed, the request is put on hold. The time that the request spends
on hold is waiting waste.
Transportation waste often implies waiting waste. For example, the three flows
going across lanes in Figure 6.2 induce waiting waste because process participants
who receive the rental request will often not be available when the request is handed
off to them.
6.2.3 Overdo
The third category of waste comprises wastes related to overdoing. The first type of overdo
waste is called defect waste. Defect waste refers to all work performed in order to
correct, repair, or compensate for a defect in a process. Defect waste encompasses
rework, meaning situations where we have to perform again a task that we have
previously executed in the same case, because of a defect the first time the task was
performed. In a travel requisition process, an example of a defect waste is when a
travel requisition request is sent back by the approver to the requestor for revision
because some data was missing.
Another type of waste in this category is called overprocessing. Overprocessing
refers to work that is performed unnecessarily given the outcome of a process
instance. It includes unnecessary perfectionism, but it also includes tasks that are
performed and later found not to be necessary.
Coming back to the vehicle inspection process, let us assume that the technicians
take a lot of time to measure the vehicle emissions with a higher degree of accuracy
than required. This perfectionism is waste. If, in addition, we later find out that
the vehicles whose emissions were measured so accurately fail to meet the required
emission levels by a wide margin, then all this accuracy was unnecessary.
Consider now the example of the travel approval process and assume that about 10% of
the requests are rejected, after several tasks have been completed, simply because
there is not enough budget for the travel. These unnecessary task executions could
be avoided by performing a budget check earlier in the process, so as to avoid wasting the
approvers’ time. This example illustrates that simple verification steps early in the
process can prevent overprocessing waste later on.
Exercise 6.2 Identify wastes in the university admission process of Exercise 1.1
(page 5) and classify them according to the seven types of waste. Consider the
following additional information.
Each year, the university receives in total 3,000 online applications. There are 10 study
programs. Each study program has 30 study places. The top-5 applicants in each study
program are offered scholarships in addition to a study place. Applicants initially ranked in
positions 6 to 30 in their study program are offered a study place but without a scholarship.
After the committee has examined the applications, each application is either: (i) accepted
with a scholarship, (ii) accepted without scholarship, (iii) admissible but not accepted unless
a study place is freed up by a higher-ranked applicant, or (iv) rejected due to low scores or
plagiarism.
Successful applicants must accept or decline the offer at most two weeks after notification.
If an applicant declines the offer, his or her study place is allocated to the next admissible
non-admitted applicant in the ranking of his or her study program. If an applicant with an
allocated scholarship rejects his or her study place, the scholarship is allocated to the next
applicant in the corresponding ranking who does not yet have an allocated scholarship.
Applications are rejected or discarded for the following reasons:
• 20% of applications are rejected initially due to deficiencies in the online application
form (e.g., missing documents). In half of the cases, the applicant manages to fix the
identified issues and the application passes the administrative check after the second try.
• 10% of applications are rejected because the hard copy is not received on time.
• 3% rejected due to a negative advice from the academic recognition agency.
• 2% rejected due to invalid English language test.
• 5% rejected due to plagiarized motivation letter.
• 5% rejected due to poorly written motivation letters.
• 15% rejected due to low GPA.
• 20% of applicants are offered a place but decline it. In 60% of these cases, the applicant
declines because he or she expected to get a scholarship, but his or her score was
insufficient. In another 30% of cases, applicants decline because they had already
accepted an offer elsewhere. The rest of cases where applicants decline an offer are
due to personal reasons.
• 20% of applicants are declared admissible but do not receive an offer due to lack of study
places.
The admissions office handles circa 10,000 emails from applicants concerning the
application process, including questions about the application form, the required documents,
the eligibility conditions, the application status, etc.
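As a starting point for this exercise, the drop-out figures above can be tabulated. The sketch below assumes, per the exercise text, that half of the 20% of applications with form deficiencies recover on the second try:

```python
# Tabulate how the 3,000 applications are lost at each stage of the process,
# using the percentages given in the exercise text.
total = 3000
lost = {
    "form deficiencies (unrecovered)": 0.20 / 2,  # half of the 20% recover
    "hard copy not received on time": 0.10,
    "negative recognition advice": 0.03,
    "invalid English language test": 0.02,
    "plagiarized motivation letter": 0.05,
    "poorly written motivation letter": 0.05,
    "low GPA": 0.15,
    "offer declined": 0.20,
    "admissible but no study place": 0.20,
}
counts = {reason: round(frac * total) for reason, frac in lost.items()}
joining = total - sum(counts.values())

for reason, count in counts.items():
    print(f"{reason}: {count}")
print("applicants joining:", joining)  # 300, i.e., 10% of applications
```

The tally is consistent with Exercise 6.3 below, which states that only 300 of the 3,000 applicants end up joining a study program.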
6.3 Stakeholder Analysis and Issue Documentation

When analyzing a business process, it is worth keeping in mind that “even a good
process can be made better” [61]. Experience shows that any non-trivial business
process, no matter how much improvement it has undergone, suffers from a number
of issues. There are always errors, misunderstandings, incidents, unnecessary steps
and other waste when a business process is performed on a day-to-day basis.
Part of the job of a process analyst is to identify and to document the issues
that affect the performance of a process. To this end, an analyst will typically
gather data from multiple sources and will interview several stakeholders, chiefly
the process participants but also the process owner and managers of organizational
units involved in the process. Each stakeholder has a different view on the process
and will naturally have a tendency to raise issues from his or her own perspective.
The same issue may be perceived differently by two stakeholders. For example,
an executive manager or a process owner will typically see issues in terms of
performance objectives not being met or in terms of constraints imposed, for example,
by external pressures (e.g., regulatory or compliance issues). Meanwhile, process
participants might complain about insufficient resources, hectic timelines as well
as errors or exceptions perceived to be caused by other process participants or by
customers.
Below, we introduce three complementary techniques to collect, document and
analyze issues in a process:
1. Stakeholder analysis, which allows us to collect issues from complementary
perspectives.
2. Issue register, which allows us to document issues in a structured manner.
3. Pareto analysis and PICK charts, which allow us to select a subset of issues for
further analysis and redesign.
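The third of these techniques, Pareto analysis, ranks issues by their impact and computes each issue’s cumulative share of the total, so as to isolate the few issues responsible for most of the impact. A minimal sketch, with made-up issue names and impact figures (e.g., yearly cost in euros):

```python
# Illustrative Pareto analysis: rank issues by impact and compute cumulative
# shares of the total impact. The issues and figures below are made up.
issues = {
    "Equipment kept longer than needed": 50_000,
    "Penalties for late invoice payments": 25_000,
    "Rejected equipment deliveries": 15_000,
    "Clarification requests and emails": 10_000,
}
total = sum(issues.values())
ranked = sorted(issues.items(), key=lambda kv: kv[1], reverse=True)

cumulative = 0
for name, impact in ranked:
    cumulative += impact
    print(f"{name}: {impact / total:.0%} of impact "
          f"(cumulative {cumulative / total:.0%})")
```

With these figures, the top two issues already account for 75% of the total impact, which is the kind of observation a Pareto chart makes visible at a glance.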
Customers see the process from the outside, whereas process participants see the
process from the inside. Specifically, process participants are
likely to see defects arising from handoffs in the process, whereas customers do not
necessarily see these defects if they are internally fixed. More generally, process
participants are usually able to provide insights regarding wastes in the process:
not only defects, but also transportation, motion, and waiting waste (arising from
handoffs), as well as overprocessing.
External parties can have a variety of concerns, depending on their role in the
process. Suppliers and sub-contractors (which are a common type of external party)
are generally concerned about having a steady or growing stream of work from the
process, being able to plan their work ahead, and being able to meet contractual
requirements. In other words, they are concerned about predictability, and they
might see opportunities to improve the interactions between their own process and
the processes with which they are integrated.
The process owner and the operational managers are likely to be concerned with
the performance measures of the process, be it high cycle times or high processing
times. Indeed, processing times are directly related to labor costs and hence they
affect the efficiency of the process. A process owner might also be concerned about
common defects in the process, as well as overproduction. The process owner and
other managers are generally also concerned about compliance with internal policy
and external regulations.
The sponsor and other high-level managers are generally concerned with the
strategic alignment of the process and the contribution of the process to the key
performance measures of the organization. They might also be concerned about
the ability of the process to adapt to evolving customer expectations, external
competition, and changing market conditions. Sponsors and high-level managers
might also raise opportunities (as opposed to issues), for example opportunities to
attract additional customers, to expand into a new market segment, or to cross-sell
or up-sell products or services to existing customers.
When a process improvement effort starts, the sponsor and the process owner
will generally put forward a set of objectives and targets to be achieved by the
improvement effort. They may also put forward one or more hypotheses regarding
the main bottlenecks and issues in the process. The analyst takes this initial set
of objectives, targets, and perceived issues as a starting point. The analyst then
identifies and conducts interviews with stakeholders in each of the above categories
in order to collect additional perceived issues. By cross-checking the perceived
issues raised during these interviews, and by validating them via additional data
collection, the analyst identifies a set of validated issues from the perspective of
each category of stakeholder. These validated issues are the input to construct an
issue register as discussed below.
Example 6.4 Consider the equipment rental process discussed in previous examples.
The owner of this process is BuildIT’s purchasing manager. The purchasing
manager is concerned by the growing volume of equipment rental expenses. In
the past year, these expenses have grown by 12% whereas the overall volume of
construction activity (measured by revenue) has grown by only 8%. The purchasing
manager launches an improvement effort to bring down the rental expenses by 5%.
This objective is in line with the CFO’s overall target of 5% company-wide
cost reduction.
An analyst is asked to review the rental process. The analyst identifies the
following stakeholders:
• Customer: the site engineers.
• Process participants: the clerks, the works engineers and the accounts payable
team at the financial department (who handle the invoice).
• Process owner and operational managers: purchasing manager, construction
project managers, accounts payable team lead.
• Upper management: the CFO, acting as the business sponsor as part of the
broader mandate for cost reduction.
• External party: the equipment rental suppliers.
After interviewing the process owner, the analyst notes two perceived issues in
the process:
• Equipment is often hired for longer than needed, leading to inventory waste.
• Penalties are often being paid to the suppliers due to: (i) equipment being
returned upon receipt because it was not suitable for the job; and (ii) late invoice
payments. In both cases, these penalties arise from wastes of type defect.
The above observations illustrate that oftentimes, issues raised during the
stakeholder analysis (particularly issues raised by the process owner and the process
participants) are associated with wastes. Hence, the output of waste analysis can be
helpful when engaging in a stakeholder analysis.
The analyst decides to start by gathering data from the site engineer, the clerk
and the works engineer, given their central role in the process. He proceeds with
interviews in order to derive deeper qualitative insights. The interviews are partly
driven by the waste analysis—in particular transportation, waiting, and defects, as
well as the inventory waste raised by the process owner.
After interviewing three site engineers, the analyst retains that the main concern
of the site engineers is the delay between the moment they submit an equipment
rental request and the moment the corresponding equipment arrives. The analyst determines
that this delay is 3.5 working days on average (sometimes three days, sometimes
four, rarely less or more). The site engineers also confirmed that they sometimes
have to reject equipment upon delivery because it is not suitable
for the job, even though they claim that their requests clearly indicate what
type of equipment is needed and for what purpose.
On the other hand, the clerks’ main concerns are:
• Lack of clarity in the requirements they receive from the site engineers, which
somehow contradicts the viewpoint of the latter.
• Inaccurate and incomplete equipment descriptions in the suppliers’ catalogs.
• Slow turnaround times when asking the works engineers to approve the rental
requests.
The works engineers echo the concern of the purchasing manager that site
engineers sometimes retain the rented equipment for longer than strictly
needed (inventory waste). They are aware that sometimes the delivered equipment
does not match the requirements of the site engineers and is hence returned, but they
do not perceive that this is a major issue.
The accounts payable team members state that they are aware that penalties
are being paid for late invoice payments. However, they claim that this is not their
fault: in 98% of cases, invoices are paid at most three working days after their
internal approval. The accounts payable team claims that it is not possible to
pay faster and that, in any case, the late-payment penalties would still occur even
if the payment time were reduced from three days to two.
The analyst also interviewed two suppliers, who echoed the fact that sometimes
the delivered equipment was rejected by the site engineer, and that invoices took
too long to be paid. The suppliers additionally perceived that there is a lack of
integration between their systems and the ones used internally at BuildIT. A supplier
commented that this lack of integration could be one of the reasons why mistakes
were being made along the way.
The analyst notes that the issues raised by the process owner are echoed in
several ways by other stakeholders. The analyst also takes note of the slow cycle
times reported by the site engineers, and of the misunderstandings and data quality
issues raised by the clerks.
Exercise 6.3 Let us consider again the university admission process for interna-
tional students described in Exercise 1.1 (page 5). The owner of this process is
the Head of the Admissions Office, who reports to the university’s Deputy Vice-
Chancellor for Student Affairs. The Head of the Admissions Office is concerned
both about the cost of running the admissions process and, equally, about the
fact that the university is losing talented admission candidates to competing
universities.
Regarding the costs, the process owner reports that each instance of the admission
process generates €100 in labor cost, including the time spent by the
admissions office as well as the time spent by the academic committees responsible
for assessing and ranking the applications. The admissions office additionally pays
a fee of €30 to an external agency to verify the validity and equivalence of each
submitted diploma, plus €20 per submitted application to the provider of the
online application system that students use to submit and track their applications.
The university's marketing office additionally spends €100 in marketing per
application. The university charges the applicant a non-reimbursable application
fee of €100 per application. As discussed in Exercise 6.2 (page 224), out of 3,000
applications only 300 applicants end up joining a study program. The remaining
students drop out during the process for the reasons enumerated in Exercise 6.2.
Regarding the loss of candidates during the admission process, the process owner
is particularly concerned about the relatively high number of applicants who receive
an admission offer but do not accept it. Specifically, 30% of applicants who receive
6.3 Stakeholder Analysis and Issue Documentation 229
an offer reject it, as mentioned in Exercise 6.2 (page 224), and 30% of those who
reject the offer do so in favor of a competing university.
You are tasked with doing an analysis of the above process in order to come up
with a list of issues. Given the description of the process in Exercise 1.1 (page 5)
and the information given above, prepare a plan including:
• The list of stakeholders you would interview (justify your choice).
• For each stakeholder, discuss what types of issues you would expect them to
raise (hypothesized issues) and what questions you could ask each stakeholder
in order to determine whether these hypothesized issues indeed exist and, if so,
what their impact is.
1 In this issue register, we do not use one column per field, but rather one row per field. This is a
pragmatic choice to better fit the issue register within the width of the page.
engineer rejects the equipment creates a direct impact for BuildIT, for example in
terms of delays in the construction schedule. Meanwhile, the issues listed above
have an indirect business impact, in the sense that they lead to the equipment being
rejected and the needed equipment not being available on time, which in turn leads
to delays in the construction schedule.
When an issue register contains a combination of issues and factors, it may be
useful to add two fields to the register, namely “caused by” and “is cause of”, that
indicate for a given issue, which other issues in the register are related to it via a
cause-effect relation. This way, it becomes easier to identify which issues are
related to each other, so that related issues can be analyzed together. Also, when an issue
X is a factor of an issue Y, instead of analyzing the impact of both X and Y, we can
analyze the impact of Y and in the qualitative and quantitative impact fields of X we
can simply refer to the impact of Y. For example, in the impact field of issue “Clerk
misunderstood the site engineer’s requirements” we can simply refer to the impact
of “Equipment is rejected by the site engineer”.
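To illustrate the "caused by"/"is cause of" fields, here is a minimal sketch in Python. The issue identifiers and the €75,000 impact figure are hypothetical; the point is only that a factor's impact is resolved by following its links up to a top-level issue, as suggested in the text:

```python
# Minimal issue register with cause-effect links between entries.
# Identifiers and the impact figure below are illustrative, not from Table 6.2.
issues = {
    "equipment-rejected": {
        "description": "Equipment is rejected by the site engineer",
        "is_cause_of": [],               # top-level issue with direct impact
        "quantitative_impact": 75000,    # hypothetical yearly loss in EUR
    },
    "clerk-misunderstanding": {
        "description": "Clerk misunderstood the site engineer's requirements",
        "is_cause_of": ["equipment-rejected"],  # factor of the issue above
        "quantitative_impact": None,     # impact deferred to the top-level issue
    },
}

def impact_of(issue_id: str) -> float:
    """Follow 'is cause of' links until reaching a top-level issue whose
    impact has been quantified, as the text suggests for factor entries."""
    issue = issues[issue_id]
    if issue["quantitative_impact"] is not None:
        return issue["quantitative_impact"]
    # A factor inherits the impact of the issue(s) it contributes to.
    return sum(impact_of(effect) for effect in issue["is_cause_of"])

print(impact_of("clerk-misunderstanding"))  # same impact as the rejection issue
```

The "caused by" field is simply the inverse of "is cause of" and could be derived from the same structure.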
Alternatively, we can adopt the convention of including in the issue register only
top-level issues, meaning issues that have a direct business impact, and separately,
we can use why-why diagrams and cause-effect diagrams to document the factors
underpinning these top-level issues. This convention is followed in the rest of this
chapter, meaning that the issue registers shown below only contain top-level issues
rather than factors. The analysis and documentation of the causes of each issue is
undertaken outside the issue register by means of root cause analysis techniques,
which we will discuss later in this chapter. Hence, for each issue we put in an issue
register, there is at least one stakeholder who is directly impacted by the issue, and
hence we can do an impact analysis of each issue.
In the above example, the number of issues is small. In a large organization,
a stakeholder analysis of a core process can lead to dozens of issues. Moreover,
when we engage in a BPM program covering many processes, the number of issues
across all processes can be in the order of hundreds. In these cases, it pays off to use an
issue tracking system to maintain the issue register. An issue tracking system is a
collaboration tool that allows its users (among other things) to create, document,
edit, and comment on issues, and to generate filtered and sorted lists of issues
according to a range of criteria.
Exercise 6.4 We consider again the university admission process. As discussed in
Exercise 6.3 (page 228), the process owner is concerned by the costs of the process
and by the fact that good candidates are being lost to competing universities during
the admission process. Concretely, we saw in Exercise 6.2 (page 224) that 30% of
students who receive an offer reject it, and that, of those, 30% reject the
offer because they received an offer from a competing university. The interviews
as well as data from applicant surveys reveal that one of the issues faced by the
university is that students have to wait too long to know the outcome of their
application. It often happens that by the time a student is admitted, the student has
decided to go to another university instead. Write an issue register to document this
issue (only this issue). Take into account the data in Exercises 6.2 and 6.3.
The impact assessment conducted while building the issue register can serve as
input for Pareto analysis. The aim of Pareto analysis is to identify which issues
or which causal factors of an issue should be given priority. Pareto analysis rests on
the principle that a small number of factors are responsible for the largest share of a
given effect. In other words:
• A small subset of issues in the issue register are likely responsible for the largest
share of impact.
• For a given issue, a small subset of factors behind this issue are likely responsible
for the largest share of occurrences of this issue.
Sometimes this principle is also called the 80-20 principle, meaning that 20% of
issues are responsible for 80% of the effect. One should keep in mind, however,
that the specific proportions are only indicative. It may be, for example, that 30%
of issues are responsible for 70% of the effect.
A typical approach to conduct Pareto analysis is as follows:
1. Define the effect to be analyzed and the measure via which this effect will be
quantified. The measure might be for example:
• Financial loss for the customer or for the business.
• Time loss by the customer or by the process participants.
• Number of occurrences of a negative outcome, such as number of unsatisfied
customers due to errors made when handling their case.
2. Identify all relevant issues that contribute to the effect to be analyzed.
3. Quantify each issue according to the chosen measure. This step can be done on
the basis of the issue register, in particular, the quantitative impact column of the
register.
4. Sort the issues according to the chosen measure (from highest to lowest impact)
and draw a so-called Pareto chart. A Pareto chart consists of two components:
a. A bar chart where each bar corresponds to an issue and the height of the bar
is proportional to the impact of the issue or factor.
b. A curve that plots the cumulative percentage impact of the issues. For
example, if the issue with the highest impact is responsible for 40% of the
impact, this curve will have a point with a y-coordinate of 0.4 and an x-
coordinate positioned so as to coincide with the first bar in the bar chart.
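Steps 3 and 4 of this procedure are mechanical and easy to script. The following is a sketch in Python (issue labels paraphrase the register entries; the impact figures are those used in Example 6.6):

```python
# Steps 3-4 of Pareto analysis: sort issues by impact and compute the
# cumulative-percentage curve that is plotted over the bar chart.
issues = [
    ("Late payment fees", 2400),
    ("Slow rental approval", 60000),
    ("Wrong equipment delivered", 15000),
]

def pareto_table(issues):
    """Return (name, impact, cumulative % of total impact), sorted by impact."""
    ordered = sorted(issues, key=lambda pair: pair[1], reverse=True)
    total = sum(impact for _, impact in ordered)
    table, cumulative = [], 0
    for name, impact in ordered:
        cumulative += impact
        table.append((name, impact, round(100 * cumulative / total)))
    return table

for name, impact, cum_pct in pareto_table(issues):
    print(f"{name:30s} {impact:>7d} {cum_pct:>4d}%")
```

Running this on the three issues yields cumulative percentages of 78%, 97%, and 100%, matching the curve of the Pareto chart discussed in Example 6.6.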
Example 6.6 Consider again the equipment rental process described in Example 1.1
(page 3) and the issue register in Example 6.5. All three issues in this register have
in common that they are responsible for unnecessary rental expenditure, which is a
form of financial loss. From the data in the impact column of the register, we can
plot the Pareto chart in Figure 6.3.
This Pareto chart shows that the issue "Slow rental approval" alone is responsible
for 78% of unnecessary rental expenditure. Given that in this example there are only
three issues, one could have come to this conclusion without conducting Pareto
analysis. In practice though, an issue register may contain dozens or hundreds of
issues, making Pareto analysis a useful tool to summarize the data in the issue
register and to focus the analysis and redesign efforts on the set of issues that would
lead to the most visible impact.
Exercise 6.5 Let us consider again the equipment rental process. This time we take
the perspective of the site engineer, whose goal is to have the required equipment
available on site when needed. From this perspective, the main issue is that in about
10% of cases, the requested equipment is not available on site the day when it is
required. When this happens, the site engineer contacts the suppliers directly to
resolve the issue, but still, resolving the issue may take several days. It is estimated
that each such delay costs €400 per day to BuildIT. By inspecting a random sample
[Figure 6.3: Pareto chart of the three issues. Bars: Slow approval (€60,000), Wrong equipment (€15,000), Late payment fees (€2,400); cumulative percentages: 78%, 97%, 100%.]
of delayed equipment deliveries during a one-year period and investigating the cause
of each occurrence, an analyst found that:
1. In total, 5 occurrences were due to the site engineer not having ordered
the equipment with sufficient advance notice: the site engineers ordered the
equipment the day before it was needed, when at least 2 days are needed. These
cases cause delays of 1 day on average.
2. 9 occurrences were due to the fact that none of BuildIT's suppliers had the
required type of equipment available on the requested day. These cases cause
delays of 1 to 4 days (3 days on average).
3. 13 occurrences were due to the approval process taking too long (more than a
day) due to mistakes or misunderstandings. For these cases, the delay was 1 day
on average.
4. 27 occurrences were due to the equipment having been delivered on time, but the
equipment was not suitable and the site engineer rejected it. These cases cause
delays of 2 days on average.
5. 4 occurrences were due to mistakes or delays attributable entirely to the
supplier. These cases lead to delays of 1 day. However, in these cases, the
supplier compensated BuildIT by providing the equipment free of charge for
2 days (the remaining days are still charged). Recall that the average cost of an
equipment rental per day is €100.
6. For 2 occurrences, the analyst did not manage to determine the cause of the
delay (the process participants could not recall the details). The delays in these
cases were 2 days per occurrence.
The sample of analyzed occurrences represents around 20% of all occurrences of
the issue during a one-year period.
• Kill (low payoff, hard to do): issues that are probably not worth addressing or at
least not to their full extent.
Root cause analysis is a family of techniques that helps analysts to identify and
understand the root cause of issues or undesirable events. Root cause analysis is not
confined to business process analysis. In fact, root cause analysis is commonly used
in the context of accident or incident analysis as well as in manufacturing processes
where it is used to understand the root cause of defects in a product. In the context of
business process analysis, root cause analysis is helpful to identify and to understand
the issues that prevent a process from having a better performance.
Root cause analysis encompasses a variety of techniques. In general, these
methods include guidelines for interviewing and conducting workshops with rel-
evant stakeholders, as well as techniques to organize and to document the ideas
generated during these interviews or workshops. Below, we will discuss two of these
techniques, namely cause-and-effect diagrams and why-why diagrams.
Cause-effect diagrams depict the relationship between a given negative effect and
its potential causes. In the context of process analysis, a negative effect is usually
either a recurrent issue or an undesirable level of process performance. Potential
causes can be divided into causal and contributing factors (hereby called factors) as
explained in the box below.
6.4 Root Cause Analysis 237
interface does not provide a calendar widget so that the user can easily select
the date. This deficiency in the user interface may increase the chances that
the user enters the wrong date. In other words, this deficiency contributes to
the issue “Incorrect claim data entry”.
While the distinction between causal and contributing factor is generally
useful when investigating specific incidents (for example investigating the
causes of a given road accident), the distinction is often not relevant or not
sufficiently sharp in the context of business process analysis. Accordingly, in
this chapter we will use the term factor to refer to causal and contributing
factors collectively.
In a cause-effect diagram, factors are grouped into categories and possibly also
sub-categories. These categories help to guide the search for potential causes.
Concretely, when organizing a brainstorming session for root cause analysis, one
way to structure the session is to first go around the table asking all participants
to give their opinion on the potential causal or contributing factors of the issue at
hand. These potential factors are listed in no particular order. Next, the potential
factors are classified according to certain categories and the discussion continues
in a more structured way using these categories as a framework. The outcome
of this discussion is a list of potential (or hypothesized) factors. Each of these
hypothesized factors should be validated subsequently by collecting data from the
relevant information systems or by observing executions of the process during a
period of time in order to determine if occurrences of the negative effect can indeed
be traced back to occurrences of the potential factor.
A well-known categorization for cause-effect analysis is the so-called 6 M's,
described below together with possible sub-categorizations.
1. Machine (technology)—factors pertaining to the technology used, such as
software failures, hardware failures, network failures, or system crashes
that may occur in the information systems that support a business process. A
useful sub-categorization of Machine factors is the following:
a. Lack of functionality in application systems.
b. Redundant storage of data across systems, leading for example to double data
entry (same data entered twice in different systems) and data inconsistencies
across systems.
c. Low performance of IT or network systems, leading for example to slow
response times for customers and process participants.
d. Poor user interface design, leading for example to erroneous data entry, or
to customers or process participants not realizing that some data is missing
or that some data is provided but not easily visible.
e. Lack of integration between multiple systems within the enterprise or with
external systems such as a supplier’s information system or a customer’s
information system.
238 6 Qualitative Process Analysis
These categories are meant as a guideline for brainstorming during root cause
analysis rather than as gospel to be followed to the letter. Other ways of
categorizing factors may be equally useful. For example, one alternative catego-
rization is known as the 4 P’s (Policies, Procedures, People and Plant). Also, it is
sometimes useful to classify factors according to the tasks in the process from where
they originate (i.e., one category per major task in the process).
The above categories are useful not only as a guide for brainstorming during
root cause analysis, but also as a basis for documenting the potential root causes in
the form of a cause-effect diagram. Concretely, given a categorization of potential
causes, such as the 6 M’s above, a cause-effect diagram consists of a main horizontal
line (the trunk) from which a number of branches stem (see Figure 6.5). At one end
of the trunk is a box containing the negative effect that is being analyzed (in our case
the issue being analyzed). The trunk has a number of main branches corresponding
to the categories of factors (e.g., the 6 M’s). The root causes are written in the sub-
branches. Sometimes, it is relevant to distinguish primary factors, meaning
factors that have a direct impact on the issue at hand, from secondary factors, which
are factors that have an impact on the primary factors. For example, in the context
of an insurance claims handling process, an inaccurate estimation of the damages
leads to a miscalculation of the amount to be paid for a given claim. This inaccurate
estimation of the damages may itself stem from a lack of incentive from the repairer
to accurately calculate the cost of repairs. Thus, “Inaccurate damage estimation” can
be seen as a primary factor for “Liability miscalculation”, while “Lack of incentive
to calculate repair costs accurately” is a secondary factor behind the “Inaccurate
damage estimation”. The distinction between primary and secondary factors is a
[Figure 6.5: Template of a cause-effect diagram, with secondary factors branching off the primary factors that lead to the issue.]
first step towards identifying chains of factors behind an issue. We will see later
in this chapter that why-why diagrams allow us to dig deeper into such chains of
factors.
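The data content of such a diagram (the issue, the categories, and the primary and secondary factors) can be captured in a simple nested structure. Below is a sketch in Python, with factor names loosely based on Example 6.7 (the exact wording is illustrative):

```python
# A cause-effect (fishbone) diagram reduced to its data: factors grouped by
# category, with secondary factors nested under primary ones.
# Factor names are illustrative, loosely based on Example 6.7.
fishbone = {
    "issue": "Equipment rejected by the site engineer",
    "categories": {
        "Material": [
            ("Clerk selected equipment with incorrect specs",
             ["Inaccurate equipment descriptions in supplier's catalog"]),
        ],
        "Method": [
            ("Site engineer does not review the clerk's choice", []),
        ],
    },
}

def print_fishbone(diagram):
    """Render the diagram as an indented text outline."""
    print(f"Issue: {diagram['issue']}")
    for category, factors in diagram["categories"].items():
        print(f"  [{category}]")
        for primary, secondaries in factors:
            print(f"    - {primary}")           # primary factor
            for secondary in secondaries:
                print(f"      * {secondary}")   # secondary factor

print_fishbone(fishbone)
```

Such a structure also makes it straightforward to check, for each of the 6 M's, whether the brainstorming session produced at least one hypothesized factor.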
Because of their visual appearance, cause-effect diagrams are also known as
fishbone diagrams. Another common name for such diagrams is Ishikawa diagrams,
in allusion to one of their proponents, Kaoru Ishikawa, a pioneer of the field of
quality management.
Example 6.7 We consider again the equipment rental process described in Exam-
ple 1.1 (page 3) and the issue register in Table 6.2 (page 231). One of the issues
identified in the issue register is that sometimes, the delivered equipment is rejected
by the site engineer. We can see three primary causes of this issue, which are
summarized in the cause-effect diagram in Figure 6.6. The diagram also shows
secondary causes underpinning each of the primary causes. Note that the factor
“clerk selected equipment with incorrect specs” has been classified under the
Material category because this factor stems from incorrect input data. A defect in
input data used by a process falls under the Material category.
Exercise 6.6 As discussed in Exercise 6.4 (page 232), one of the main issues of
the university admission process is that students have to wait too long to know the
outcome of their application. It often happens that by the time a student is admitted,
the student has decided to go to another university instead. Analyze the possible
causes of this issue using a cause-effect diagram.
[Figure 6.6: Cause-effect diagram of the issue "Equipment rejected at delivery"; among the secondary factors shown is "Inaccurate equipment description in provider's catalogue".]
equipment rental requests approved took a couple of days or more, and the larger the
cost and the longer the duration of the rental, the slower it was to get it approved.
As a workaround, site engineers were renting equipment several days before the
date when they actually needed it. Also, they were specifying short periods in their
equipment rental requests in order to get them approved quicker. When the deadline
for returning the equipment approached, they simply called the supplier to keep the
equipment for a longer period.
The analyst then had a closer look at the second issue (equipment being rejected).
The initial interviews with the clerks had already provided some hints as to the
causes of this issue. The clerks often misunderstood the site engineers' equipment
requirements. They also found that the data in the suppliers' catalogs was
inaccurate and incomplete. Further interviews with the site engineers also revealed
that the site engineers often did not know what equipment had been ordered as
a result of their rental request. Had they known it, they could have rectified the
mistakes before the equipment reached the construction site.
Finally, the analyst took a closer look at the issue of penalties for late payment of
invoices. Again, by taking some concrete examples and talking about them with the
clerks, the analyst found that the issue partially came from the fact that clerks were
having a hard time getting the site engineers to confirm that the data in the invoices
are correct. Site engineers did not feel that verifying the invoices was a priority for
them. The clerks also pointed out that there were often inconsistencies between the
PO and the invoice. One of the causes for these inconsistencies was that, to avoid the
hassle of taking back the equipment and exchanging it for another one, some of the
suppliers had developed a workaround: Every time the supplier received a PO, the
supplier contacted directly the site engineer to negotiate exactly which equipment
should be delivered. As a result of this negotiation, very often the equipment that
was actually delivered differed from the one specified in the PO.
Based on the above observations and others made during the interviews, the
analyst wrote the following why-why diagrams (represented as nested bullet-point
lists).
Issue 1 Site engineers sometimes reject delivered equipment, why?
• wrong equipment is delivered, why?
– miscommunication between site engineer and clerk, why?
◦ site engineer provides an incomplete or inaccurate description of what they
want.
◦ site engineer does not always see the supplier catalogs when making a
request and does not communicate with the supplier, why?
· site engineer generally does not have Internet connectivity.
◦ site engineer does not check the choice of equipment made by the clerk.
– equipment descriptions in supplier’s catalog not accurate.
Issue 2 Site engineers keep equipment longer than needed via deadline extensions,
why?
• site engineer fears that equipment will not be available later when needed,
why?
– time between request and delivery too long, why?
◦ excessive time spent in finding suitable equipment and approving the
request, why?
· time spent by clerk contacting possibly multiple suppliers sequen-
tially;
· time spent waiting for works engineer to check the requests;
Issue 3 BuildIT often has to pay late payment fees to suppliers, why?
• Time between invoice received by clerk and confirmation is too long, why?
– clerk needs confirmation from site engineer, why?
◦ clerk cannot assert when the equipment was delivered and picked up,
why?
· delivery and pick-up of equipment are not recorded in a shared
information system;
· site engineer can extend the equipment rental period without
informing the clerk;
◦ site engineer takes too long to confirm the invoice, why?
· confirming invoices is not a priority for site engineer;
Exercise 6.7 Consider again the process for university admission described in
Exercise 1.1 (page 5) and the issue described in Exercise 6.6 above. Analyze this
issue using a why-why diagram.
6.5 Recap
6.6 Solutions to Exercises
Solution 6.1
• VA: receive online application, evaluate academic admissibility, send notification
to student
• BVA: check completeness, academic recognition agency check, English test
check
• NVA: receive physical documents from students, forward documents to commit-
tee, notify students service of outcomes of academic admissibility.
Note. In this solution we treat the entire agency check as BVA. Part of this agency
check consists in the admissions office sending the documents to the agency and
the agency sending back the documents and their assessment to the admissions
office. These two sub-steps could be treated as NVA. However, if we assume that
6.6 Solutions to Exercises 245
the agency requires the documents to be sent by post to them, these sub-steps
cannot be easily separated from the agency check itself. In other words, it would not
be possible to eliminate these handoff steps without eliminating the entire agency
check. Thus the entire agency check should arguably be treated as a single step.
Solution 6.2
Transportation. Right from the start of the process, we can spot transportation
waste in the form of physical documents sent by the student to the admissions
office, emails from the admission office to the student, and further documents
sent by the applicant if the initial application is incomplete. The latter events
can also be seen as defect waste. We also note that there is a handoff from the
admissions office to the committee and back. These handoffs are transportation
wastes too. Other transportation wastes come from the interactions between the
admissions office and the external academic recognition agency.
Waiting. When the admissions office finds that an application is incomplete, an
email is sent to the student asking for the missing information or documents.
The fact that the application is put on hold until additional input is received
from the candidate is waiting waste. Later in the process, the committee batches
the applications and examines them every three months. This batching generates
waiting waste. There may also be idleness waste during the period when the
admissions office is waiting for the decisions of the academic committee, but
without further information, it is not possible to assert that this idleness indeed
occurs in practice (it may be that the office handles other work in the meantime).
Inventory. Given that the committee meets every three months, we can hypothesize
that at any given point in time there are several hundred applications in a pending
state. This constitutes inventory waste.
Defect. When an incomplete application is sent back to the applicant, the
application needs to be checked again after the student resubmits a revised
application. This second verification of completeness is rework, hence defect
waste.
Overprocessing. Officers in the admissions office spend time verifying the
authenticity of around 3,000 diplomas and language test results submitted by the
applicants. In the end, however, only 5% of cases reveal any issues. Later on in
the process, three quarters of the applications are passed on to the admission
committees. The university ends up making a study place offer to only 20%
of the applications that they receive. The fact that the document authenticity
was verified for all the applications rejected by the committee is an example
of overprocessing.
Overproduction. We can see two sources of overproduction waste: cases where
an applicant rejects the admission offer he or she receives (20% of cases) and
cases where the applicant is declared admissible but does not receive a study
offer due to lack of places (20%).
Solution 6.3 The customer of the admissions process is the applicant. We distin-
guish between applicants who do not get an admission offer and those who get one.
Among those who get an admission offer, we distinguish between those who accept
the admission offer and those who reject it. Additionally, we could distinguish
those students who get an admission offer with scholarship from those who do
not. Based on this classification, we may wish to interview at least one applicant
whose application was rejected; one whose application who was accepted without
scholarship and did not accept the offer; one whose application was accepted with
scholarship and did not accept the offer; and one whose application was accepted
without scholarship and accepted the offer. We could also interview one who was
admitted with scholarship and accepted the offer but it is unclear if this would bring
additional insights given the objectives of the process owner.
The main process participants are the admission officers and the admission com-
mittees (we assume there is one committee per curriculum). We should interview
at least one representative from each of these two groups, and in the case of the
admission committees, we should consider interviewing representatives of at least
two committees.
There is one external party in the process (the academic recognition agency).
However, given that its role in the process is occasional, and given the objectives
of the process owner, it does not seem necessary to get input from the academic
recognition agency.
Given the description of the process, we hypothesize that the students are
concerned by the complexity and slowness of the process, and hence we should
be prepared to ask questions about the amount of effort that the process requires
from them, and their perception of the response times in the process. They might
also raise issues (defects) encountered during the process, such as wrong answers to
their questions, and hence questions could be prepared to explore this possibility.
The admission officers have a large number of applications to handle, so they
might be concerned about excessive workload. But they could also be concerned
about the number of inquiries they get from students. There is a back-and-forth
handoff with the admission committee in the process, and hence defects and waiting
times associated with these handoffs. More generally, we could take each of the wastes
identified in Solution 6.2 and prepare questions to shed light into the magnitude and
impact of each waste.
The admission committee has to examine the applications from multiple per-
spectives and has to deal with discrepancies between different grading systems, as
students come from different countries. They also have to assess the motivation
letters and reference letters, which are free-form. One can expect them to raise issues
about the complexity of tasks they perform. Questions could also be prepared to
shed further light into the wastes identified in Solution 6.2, particularly with regard
to handoffs.
Solution 6.4 In the following issue register, we only analyze the issue described in
this chapter, namely that the admission process takes too long. In practice, the issue
register would include multiple issues.
In the above issue analysis, the effort required to deal with enquiries during the
pre-admission period is listed in the qualitative impact field. If it were possible (with
a reasonable amount of effort) to estimate how many such enquiries arrive and how
much time they consume, it would be possible to turn this qualitative impact into a
quantitative one.
Solution 6.5 First, we analyze the cost incurred by each type of occurrence (i.e.,
each causal factor) in the sample:
1. Last-minute request: 1 day delay (because normally 2 days' advance notice is
needed), thus €400 cost × 5 = €2,000.
2. Equipment out-of-stock: 3 days delay = €1,200 × 9 = €10,800.
3. Approval delay: 1 day delay = €400 × 13 = €5,200.
4. Rejected equipment: 2 days delay = €800 × 27 = €21,600. Note that in
Example 6.5 we mentioned that when a piece of equipment is rejected, a fee of €100 (on
average) has to be paid to the supplier for taking back the equipment. However,
we do not include this fee here because we are interested in analyzing the costs
stemming from equipment not being available on the required day, as opposed to
other costs incurred by rejecting equipment.
5. Supplier mistake: 1 day delay = €400 minus €200 in rental cost savings = €200
× 4 = €800.
6. Undetermined: 2 days delay = €800 × 2 = €1,600.
Since the sample represents 20% of occurrences of the issue over a year, we
multiply the above numbers by 5 in order to estimate the total yearly loss attributable
to each causal factor. The resulting Pareto chart is given in Figure 6.8.
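The arithmetic behind this solution can be sketched in a few lines of Python. The factor names, the €400 cost per day of delay, and the sample counts are taken from the solution above; the script itself is illustrative, not part of the book:

```python
# Illustrative sketch: yearly loss per causal factor and the cumulative
# percentages behind the Pareto chart. A day of delay costs EUR 400.
DAY_COST = 400

# (causal factor, days of delay per occurrence, occurrences in the sample)
sample = [
    ("Last-minute request", 1, 5),
    ("Equipment out-of-stock", 3, 9),
    ("Approval delay", 1, 13),
    ("Rejected equipment", 2, 27),
    ("Supplier mistake", 1, 4),   # EUR 200 rental saving handled below
    ("Undetermined", 2, 2),
]

losses = {}
for factor, days, count in sample:
    per_occurrence = days * DAY_COST
    if factor == "Supplier mistake":
        per_occurrence -= 200     # rental cost saving per occurrence
    losses[factor] = per_occurrence * count * 5   # sample covers 20% of a year

# Rank factors by decreasing loss and accumulate percentages
total = sum(losses.values())
cumulative = 0
for factor, loss in sorted(losses.items(), key=lambda kv: -kv[1]):
    cumulative += loss
    print(f"{factor}: EUR {loss} ({100 * cumulative / total:.0f}% cumulative)")
```

Running this reproduces the ranking in Figure 6.8: rejected equipment alone accounts for over half of the estimated €210,000 yearly loss.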
Solution 6.6 The cause-effect diagram corresponding to this exercise should
include at least the name of the issue (e.g., “Student waiting time too long”)
and the following factors:
Fig. 6.8 Pareto chart of causal factors of issue “Equipment not available when needed”
• Process stalls due to agency check. This is a Method issue, since the issue stems
from the fact that the process essentially stalls until a response is received from
the agency. One could argue that to some extent this is a Milieu issue. But while
the slowness of the agency check is a Milieu issue, the fact that the process stalls
until a response is received from the agency is a Method issue.
• Agency check takes too long. This is a Milieu issue since the agency is a separate
entity that imposes its own limitations.
• Academic committee assessment takes too long. This is a Method issue since
the process imposes that the academic committee only assesses applications at
certain times (when it meets), rather than when applications are ready to be
evaluated.
• Physical documents take too long to be received. This is a Milieu issue for two
reasons. First, the physical documents are needed for the purpose of the agency
check. Second, the delays in the arrival of the physical documents are caused by the
applicants themselves and by postal service delays.
• Admission office delays the notification after academic assessment. This seems
to be a Method issue, but the description of the process does not give us sufficient
information to state this conclusively. Here, a process analyst would need to
gather more information in order to understand this issue in further detail.
Solution 6.7
• Admission process takes too long, why?
– Process stalls until physical documents arrive, why?
· Agency check requires physical documents;
· Other tasks are performed only after agency check, why?
conducting interviews and surveys with current and past corporate customers and also by
gathering customer feedback data that the travel agency has recorded over time. About 2%
of customers complained about errors that had been made in their bookings. On one occasion,
a customer had requested a change to a flight booking. The travel agent wrote an email to the
customer suggesting that the change had been made and attached a modified travel itinerary.
However, it later turned out that the modified booking had not been confirmed in the flight
reservation system. As a result, the customer was not allowed to board the flight and this led
to a series of severe inconveniences for the customer. Similar problems had occurred when
booking a flight initially: the customer had asked for certain dates, but the flight tickets
had been issued for different dates. Additionally, customers complained about the long time it
took to get responses to their requests for quotes and itineraries. In most cases, employees
of the travel agency replied to requests for quotes within 2–4 working hours, but in the case
of some complicated itinerary requests (about 10% of the requests), it took them up to 2
days. Finally, about 5% of customers also complained that the travel agents did not find the
best flight connections and prices for them. Several customers reported that they had found
better itineraries and prices on the Web by searching by themselves.
1. Document the issues in the form of an issue register. To this end, you may assume
that the travel agency receives around 100 itinerary requests per day and that the
agency makes 50 bookings per day. Each booking brings a gross profit of €100
to the agency.
2. Analyze the issues described above using root cause analysis techniques.
Exercise 6.13 Consider the booking-to-cash process at a photography company
(Fotof) described in Exercise 4.31 (page 155) as well as the following data:
• Fotof has 25 photo studios and its latest annual turnover from photography
services is 17.6 million, out of which 25% comes from sales to corporate customers
and the rest from private customers.
• The company makes an additional 5 million in revenue from sales of photography
equipment and accessories at its studios.
• There are on average 3.5 photographers and 2 technicians per studio.
• On average, an in-studio session lasts 45 min, while an on-location session lasts
3.5 h (including transportation time).
• 20% of private customer shootings and 100% of corporate customer shootings
are on-location. The remaining ones are in-studio.
An analyst conducted a stakeholder analysis focusing on three types of
stakeholders: the customer, the process participants, and the management (process
owner and business sponsor). The main findings of this analysis are summarized
below.
Customer According to the latest customer survey, customer satisfaction stands at
80% (declining from 85% in the previous year) and net promoter score at 70%
(declining from 80% in the previous year). Common customer complaints exist
with regard to: (i) turnaround times between the photo shooting session and the
availability of pictures for review, as well as the turnaround times for delivery of
digital copies and printouts; (ii) turnaround times for resolving customer complaints
particularly with regard to perceived defects in the delivered digital and printed
copies; (iii) mishandled or forgotten orders or special requests. Customers often
make changes to their orders or additional special requests via phone or email and
these changes/requests are sometimes not recorded (or recorded incorrectly) in the
order management system. Changes to orders are currently handled manually.
Process Participants Staff satisfaction is also low. Over 60% of customer service
staff consider that their job is stressful. The overall staff turnover rate is at an
all-time high: 10% of staff left the company in the previous year and had to be
replaced, up from 6% the year before. The average Cost-To-Company (CTC) of
a photographer at a Fotof studio is 41K per year (35K for technicians and 37K
for customer service staff). The CTC at Fotof is generally in line with industry
averages. The company additionally employs 20 staff at the company headquarters
at an average CTC of 46K. Interviews with staff have highlighted the following
issues in the process:
• Customer service staff perceive that appointment management is too time-
consuming. Customers sometimes call or email multiple times to find a suitable
appointment time. Customers also call frequently to change their appointments
for shooting sessions or to cancel their session. About 1% of corporate orders
result in a cancelation prior to the shooting, while 5% of private orders are
canceled prior to the shooting.
• The late-show and no-show rates for appointments are high: 10% late-shows for
in-studio sessions, 2% for on-location sessions, 3% no-shows for in-studio
sessions, and 1% for on-location sessions.
• There are numerous customer enquiries via phone and email (on average three
per order, in addition to booking-related calls or emails), be it to enquire about
the status of orders or deliveries, to make changes to the order, to discuss special
requests, to ask pricing questions, or to report complaints about received pictures.
Management Fotof’s three-year strategy is focused on revenue growth. The
company seeks to achieve a revenue increase of 50% by end of 2018 organically,
meaning via growth of the existing business, without company acquisitions and
without opening additional retail outlets. To achieve this goal, Fotof’s management
is receptive to ideas to improve customer service and to expand the range of
value-added services. Fotof’s management perceives that additional revenue could come
in great part from wedding photos, parties and ceremonies. At present only the
customer who initiates the booking can place orders. But in the case of personal
events and if the customer consents, there is an opportunity to sell to other event
participants. Fotof’s management also perceives that faster cycle times could also
help to enhance sales.
Write an issue register based on the above information.
Exercise 6.14 Write an issue register for the pharmacy prescription fulfillment
process described in Exercise 1.6 (page 30). Analyze at least the following issues:
• Sometimes, a prescription cannot be filled because one or more drugs in the
prescription are not in stock. Customers only learn this when they come to pick
up their prescription.
• Oftentimes, when customers arrive to pick up the drugs, they find out that they
have to pay more than what they expected because their insurance policy does
not cover the drugs in the prescription, or because the insurance company covers
only a small percentage of the cost of the drugs.
• In a very small number of cases, the prescription cannot be filled because there is
a potentially dangerous interaction between one of the drugs in the prescription
and other drugs that the customer has been given in the past. Customers only find
out about this issue when they arrive to pick up the prescription.
• Some prescriptions can be filled multiple times. This is called a “refill”. Each
prescription explicitly states whether a refill is allowed and if so how many refills
are allowed. Sometimes, a prescription cannot be filled because the number of
allowed refills has been reached. The pharmacist then tries to call the doctor who
issued the prescription to check if the doctor would allow an additional refill.
Sometimes, however, the doctor is unreachable or the doctor does not authorize
the refill. The prescription is then left unfilled and customers will only find out
when they arrive to pick up the prescription.
• Oftentimes, especially during peak time, customers have to wait for more than
10 min to pick up their prescription due to queues. Customers find this annoying
because they find that having to come twice to the pharmacy (once for drop-
off and once for pick-up) should allow the pharmacy ample time to avoid such
queues at pick-up.
• Sometimes, the customer arrives at the scheduled time, but the prescription is not
yet filled due to delays in the prescription fulfillment process.
When making assumptions to analyze these issues, you may choose to equate
“oftentimes” with “20% of prescriptions”, “sometimes” with “5% of prescriptions”
and “very small number of cases” with “1% of prescriptions”. You may also assume
that the entire chain of pharmacies consists of 200 pharmacies that serve 4 million
prescriptions a year and that the annual revenue of the pharmacy chain attributable
to prescriptions is €200 million. You may also assume that every time a customer
is dissatisfied when picking up a prescription, the probability that this customer will
not come back after this experience is 20%. You may also assume that on average a
customer requires 5 prescriptions per year.
Based on the issue register, apply Pareto Analysis to determine a subset of issues
that should be addressed to reduce the customer churn due to dissatisfaction by at
least 70%. Customer churn is the number of customers who stop consuming services
offered by a company at a given point in time. In this context, this means the number
of customers who stop coming to the pharmacy due to a bad customer experience.
Exercise 6.15 Write an issue register for the procure-to-pay process described in
Exercise 1.7 (page 31).
Exercise 6.16 Consider the pharmacy prescription fulfillment process described in
Exercise 1.6 (page 30) and the following issue:
• Sometimes, the customer arrives at the scheduled time, but the prescription is not
yet filled due to delays in the prescription fulfillment process.
Analyze the possible causes of this issue using a cause-effect diagram or why-
why diagram.
2 http://www.isixsigma.com/tools-templates/.
3 http://www.syque.com/improvement/a_encyclopedia.htm.
4 http://www.syque.com/quality_tools/toolbook/toolbook.htm.
5 The sub-categorization of the 6 M’s given in Section 6.4.1 also comes from Schwegmann &
Laske [159].
is that during the workshops organized for the purpose of process discovery (see
Chapter 5), workshop participants will often feel compelled to voice out issues
related to different parts of the process. Therefore, process discovery is an occasion
to start listing issues.
Another framework commonly used for qualitative process analysis is Theory
of Constraints (TOC) [56]. TOC is especially useful when the goal is to trace
weaknesses in the process to bottlenecks. The application of TOC to business
process analysis and redesign is discussed by Laguna & Marklund [85, Chapter 5]
and by Rhee et al. [140].
Chapter 7
Quantitative Process Analysis
We recall that the cycle time of a process is the average time it takes between the
moment the process starts and the moment it completes. By extension, we say that
the cycle time of a task is the average time it takes between the moment the task
starts and the moment it completes.
To understand how flow analysis works it is useful to start with an example of a
purely sequential process as in Figure 7.1. The cycle time of each task is indicated
between brackets. Since the two tasks in this process are performed one after the
other, we can conclude that the cycle time of this process is 20 + 10 = 30 h.
More generally, we can state that the cycle time of a sequential fragment of a
process is the sum of the cycle times of the tasks in the fragment. Writing Ti for the
cycle time of the i-th of the n tasks in the fragment, we define:

CT = Σ_{i=1}^{n} Ti    (7.1)
Fig. 7.1 Fully sequential process model (durations of tasks in hours are shown between brackets)
Fig. 7.2 Process model with an XOR-block: task A (10) is followed by either task B (20) or task C (10)
Let us now consider a process that includes an XOR-block, as sketched in Figure 7.2.
We can observe that the cycle time of this process is
not 40 (the sum of the task cycle times). Indeed, in a given instance of this process,
either task B or task C is performed. If B is performed, the cycle time is 30 h, while
if C is performed, the cycle time is 20 h. So we can conclude that the average cycle
time must be somewhere between 20 and 30 h.
Whether the cycle time of this process is closer to 20 h or closer to 30 h depends
on how frequently each branch of the XOR-split is taken. For instance, if in 50% of
instances the upper branch is taken and the remaining 50% of instances the lower
branch is taken, the overall cycle time of the process is 25 h. On the other hand, if
the upper branch is taken 90% of the times and the lower branch is taken 10% of
the times, the cycle time should be intuitively closer to 30 h. Generally speaking,
the cycle time of the fragment of the process between the XOR-split and the XOR-
join is the weighted average of the cycle times of the branches in-between. Thus,
if the upper branch has a frequency of 90% and the lower branch a frequency of
10%, the cycle time of the fragment between the XOR-split and the XOR-join is:
0.9 × 20 + 0.1 × 10 = 19 h. We then need to add the cycle time of task A in order
to obtain the total cycle time, that is, 10 + 19 = 29 h. In the rest of this chapter, we
will use the term branching probability to denote the frequency with which a given
branch of a decision gateway is taken.
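As a quick sanity check, the weighted-average computation just described can be reproduced in a short Python sketch (the probabilities and task times are those of the running example; the code is illustrative):

```python
# Sketch: XOR-block cycle time as a weighted average of its branches.
branches = [(0.9, 20), (0.1, 10)]         # (branching probability, cycle time in h)
xor_ct = sum(p * t for p, t in branches)  # 0.9 * 20 + 0.1 * 10 = 19 h
total_ct = 10 + xor_ct                    # plus task A before the XOR-block
print(xor_ct, total_ct)                   # 19.0 29.0
```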
In more general terms, the cycle time of a fragment of a process model with the
structure shown in Figure 7.3 is:
CT = Σ_{i=1}^{n} pi × Ti    (7.2)
In Figure 7.3, p1, p2, etc. are the branching probabilities. Each cloud represents a
fragment that has a single entry flow and a single exit flow. The cycle times of these
nested fragments are T1, T2, etc. This type of fragment is called an XOR-block.
Let us now consider the case where parallel gateways are involved as illustrated
in Figure 7.4. Again, we can observe that the cycle time of this process cannot be
40 (the sum of the task cycle times). Instead, since tasks B and C are executed in
parallel, their combined cycle time is determined by the slowest of the two tasks, that
is, by B. Thus, the cycle time of the process shown in Figure 7.4 is 10 + 20 = 30 h.
Fig. 7.3 XOR-block with branching probabilities p1, …, pn and nested fragments T1, …, Tn
Fig. 7.4 Process model with an AND-block: task A (10) followed by tasks B (20) and C (10) in parallel
Fig. 7.5 AND-block with nested fragments T1, …, Tn
More generally, the cycle time of an AND-block such as the one shown in
Figure 7.5 is:
CT = Max(T1, T2, …, Tn)    (7.3)
Example 7.1 Let us consider the credit application process model in Figure 7.6 and
the task cycle times given in Table 7.1. Let us also assume that in 60% of the cases
the credit is granted.
To calculate the cycle time of this process, we first note that the cycle time of
the AND-block is 3 days (the cycle time of the slowest branch). Next, we calculate
the cycle time of the fragment between the XOR-split and the XOR-join using
Eq. (7.2), that is, 0.6 × 1 + 0.4 × 2 = 1.4 days. The total cycle time is then:
1 + 3 + 3 + 1.4 = 8.4 days.
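A short sketch of this calculation in Python (the 3-day AND-block and the other summary values are taken from the text; Table 7.1 itself is not reproduced here):

```python
# Sketch of the Example 7.1 arithmetic (times in days, as given in the text).
and_block = 3                           # cycle time of the slowest AND branch
xor_block = 0.6 * 1 + 0.4 * 2           # Eq. (7.2): weighted average = 1.4
total = 1 + and_block + 3 + xor_block   # completeness check + checks + assessment + offer
print(round(total, 1))                  # 8.4
```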
Exercise 7.1 Consider the process model given in Figure 3.8 (page 86). Calculate
the cycle time under the following assumptions:
• Each task in the process takes 1 h on average.
• In 40% of the cases the order contains only Amsterdam products.
• In 40% of the cases the order contains only Hamburg products.
• In 20% of the cases the order contains products from both warehouses.
Compare the process model in Figure 3.8 (page 86) with the one in Figure 3.10
(page 88). Does this comparison give you an idea of how to calculate cycle times
for process models with OR gateways?
Another recurrent pattern is the one where a fragment of a process is repeated
any number of times. This pattern is illustrated in Figure 7.7. In this figure, the
decimal numbers attached to the flows denote the probability that the flow will be
taken whenever the XOR-split gateway is reached. Looking at the figure, we can
say for sure that task B will be executed once. Next, we can say that task B may
be repeated once (i.e., executed a second time) with a probability of 20% (i.e., 0.2),
which is the probability of going back from the XOR-split gateway to the XOR-join
gateway. If we continue this reasoning, we can conclude that the probability that
task B is repeated twice (in addition to the first execution) is 0.2 × 0.2 = 0.04.
More generally, the probability that task B is repeated N times (in addition to the
first execution) is 0.2^N.
If we sum up the probability that B is executed (which is 1), plus the probabilities
that B is repeated once, twice, three times, etc., we get the following
summation: Σ_{n=0}^{∞} 0.2^n. This is the expected number of executions of task B. If we
replace 0.2 with a variable r, this summation is a well-known series, known as the
geometric series, and it can be shown that this series is equivalent to 1/(1 − r).
If the fragment inside the loop has cycle time T, the cycle time of the whole
fragment, including its repetitions, is therefore:

CT = T / (1 − r)    (7.4)
In this formula, the parameter r is called the rework probability, that is, the
probability that the fragment inside the cycle will need to be reworked. This type of
block is called a rework block or repetition block.
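The convergence of this series can be checked numerically with a small sketch (the value r = 0.2 is from the running example):

```python
# Sketch: the expected number of executions of the repeated fragment is
# sum_{n>=0} r^n, which converges to 1 / (1 - r).
r = 0.2
expected = sum(r ** n for n in range(100))   # truncated geometric series
print(round(expected, 6))                    # 1.25
```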
In some scenarios, a task is reworked at most once. This situation would be
modeled as shown in Figure 7.9. Using what we have seen, we can already calculate
the cycle time of this example. First, we observe that the cycle time of the fragment
between the XOR-split and the XOR-join is 0.2 × 20 + 0.8 × 0 = 4. Here, the
zero comes from the fact that one of the branches between the XOR-split and the
XOR-join is empty and, therefore, does not contribute to the cycle time. To this, we
have to add the cycle time of the preceding tasks, giving us a total cycle time of 34.
In summary, we have seen that the cycle time of a process can be calculated using
the following four equations:
• The cycle time CT of a sequence of fragments with cycle times CT1, …, CTn is
the sum of the cycle times of these fragments: CT = Σ_{i=1}^{n} CTi.
• The cycle time CT of an XOR-block is the weighted average of the cycle times
of its branches (CTi), using the branching probabilities pi as the weights:
CT = Σ_{i=1}^{n} pi × CTi.
Fig. 7.9 Situation where a fragment (task) is reworked at most once: task A (10) is followed by task B (20); with probability 0.2, B (20) is executed a second time
• The cycle time CT of an AND-block is the cycle time of its slowest branch
or, in other words, the maximum of the cycle times of its branches:
CT = Max(CT1, …, CTn).
• The cycle time CT of a rework block with a rework probability r is the cycle
time of each iteration of the loop (let us call it CTb) divided by (1 − r):
CT = CTb/(1 − r).
Example 7.2 Let us consider the credit application process model in Figure 7.10
and the cycle times previously given in Table 7.1. Let us assume that after
each execution of “Check completeness”, in 20% of the cases the application is
incomplete. And let us also assume that in 60% of the cases the credit is granted.
The cycle time of the rework block is 1/(1 − 0.2) = 1.25 days. The cycle time
of the AND-block is 3 days and that of the XOR-block is 1.4 days as discussed in
Example 7.1. Thus the total cycle time is 1.25 + 3 + 3 + 1.4 = 8.65 days.
The cycle time of a task or of a process can be divided into waiting time and
processing time. Waiting time is the portion of the cycle time where no work is
being done to advance the process. Processing time, on the other hand, refers to
the time that participants spend doing actual work. In many, if not most processes,
a considerable proportion of the overall cycle time is waiting time. Waiting time
typically arises when there is a handoff between two participants. In this case,
there is usually a waiting time between the moment the first participant finishes
their task and the moment when the next participant starts the next task. This waiting
time may become relatively long if the process participants perform their work in
batches. For example, in a purchase requisition process, the supervisor responsible
for purchase approvals might choose to batch all purchase requisitions that arrive
during a given day and approve them all at once at the end of the working day. Also,
sometimes time is spent waiting for an external party to provide input for a task.
For example, in the context of fulfilling a medical prescription, a pharmacist may
require a clarification from the doctor. To do so, the pharmacist would try to call the
doctor. But the doctor might be unavailable such that the pharmacist needs to put
the prescription aside and wait until the doctor returns the call.
When analyzing a process with the aim of addressing issues related to cycle time,
it may be useful to start by evaluating the ratio of overall processing time relative
to the overall cycle time. This ratio is called cycle time efficiency. A cycle time
efficiency close to 1 indicates that there is little room for improving the cycle time
unless relatively radical changes are introduced in the process. A ratio close to zero
indicates that there is a significant amount of room for improving cycle time by
reducing the waiting time.
Concretely, the cycle time efficiency of a process is calculated as follows. First,
we need to determine the cycle time and the processing time of each task. Given
this information, we then calculate the cycle time CT of the process using the four
equations we saw above. Next, we calculate the so-called Theoretical Cycle
Time (TCT) of the process. The TCT is the average amount of time that a case
would take if there was no waiting time at all. The TCT is calculated using the same
four equations we introduced above, but using the processing time of each task
instead of its cycle time. Once we have calculated the
TCT, we calculate the Cycle Time Efficiency (CTE) as follows:
CTE = TCT / CT    (7.5)
Example 7.3 Let us consider the credit application process model in Figure 7.10
and the processing times given in Table 7.2. The task cycle times (including both
waiting and processing time) are those previously given in Table 7.1. We assume
again that in 20% of the cases the application is incomplete and in 60% of the cases
the credit is granted. Let us additionally assume that 1 day is equal to 8 working
hours.
We have seen in Example 7.2 that the total cycle time of this process is 8.65 days,
which translates to 69.2 working hours. We now calculate the theoretical cycle time
using the processing times given in Table 7.2. This gives us: 2/(1 − 0.2) + 3 +
2 + 0.6 × 2 + 0.4 × 0.5 = 8.9 working hours. The cycle time efficiency is thus
8.9/69.2 = 12.9%.
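The numbers in Example 7.3 can be checked with a short sketch (assuming, as in the text, 8 working hours per day):

```python
# Sketch of the Example 7.3 numbers, assuming 8 working hours per day.
ct = 8.65 * 8                                       # cycle time: 69.2 working hours
tct = 2 / (1 - 0.2) + 3 + 2 + 0.6 * 2 + 0.4 * 0.5   # theoretical cycle time
cte = tct / ct                                      # Eq. (7.5)
print(f"TCT = {tct:.1f} h, CTE = {cte:.1%}")        # TCT = 8.9 h, CTE = 12.9%
```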
Exercise 7.2 Calculate the overall cycle time, theoretical cycle time, and cycle time
efficiency of the ministerial enquiry process introduced in Example 3.7 (page 90).
Assume that the rework probability is 0.2 and the waiting times and processing times
are those given in Table 7.3.
Table 7.3 Task cycle times and processing times for the ministerial enquiry process

Task                              Cycle time   Processing time
Register ministerial enquiry      2 days       30 min
Investigate ministerial enquiry   8 days       12 h
Prepare ministerial response      4 days       4 h
Review ministerial response       4 days       2 h
A low cycle time efficiency raises the question of which parts of the process should
be improved. In order to answer that question, we need to more precisely understand
which tasks contribute to the theoretical cycle time. The Critical Path Method
(CPM) is a well-known method for addressing this question in the context of project
planning. This method can be applied to process models that do not contain decision
gateways. This means that if the process model contains XOR or OR gateways, we
need to simplify it by removing all such gateways, before we can apply the CPM
method. We can do this by either replacing every XOR, OR, and loop block with
a single task, or by considering only specific paths of our process and focusing on
those ones. For example, we can eliminate the branches of an XOR-split that lead
to an early completion of the case as well as those gateways associated with rework
loops (like the rework loop in Figure 7.10). Indeed, if we optimize the theoretical
cycle time of the process without its rework loops, this would contribute to also
optimizing the process with the rework loop.
Let us consider again the credit application process without repetition and with
a positive credit assessment as shown in Figure 7.11. The theoretical cycle time is
determined by the processing time of tasks “Check completeness”, “Check income
sources”, “Assess Application”, and “Make credit offer”, which all take 2, 3, 2 and
2 h, respectively. These tasks are part of the critical path of this process (highlighted
in grey). The critical path of a process is the sequence of tasks that determines the
theoretical cycle time of the process. When optimizing a process with respect to
theoretical cycle time, one should focus the attention on the processing times of the
tasks that belong to the critical path.
CPM identifies the critical path based on the notions of early start (ES), early
finish (EF), late start (LS), and late finish (LF) of each task of the process. Early
start and early finish are determined in a forward pass over the process. We start
with time zero at the start event. Each task is assigned as early start the early finish
time of its predecessor. Its early finish is its early start time plus its processing
time. If this predecessor is the entry (split) gateway of an AND-block, it is assigned
the early finish time of the preceding fragment. If it is the exit (join) gateway of
an AND-block, it is assigned the maximum of the early finish times of all its parallel
branches. Using this procedure, we can determine at what time each task has to start
and finish such that the cycle time is equal to the theoretical cycle time.
Not all tasks are equally critical for finishing the process within the theoretical
cycle time. Consider the two verification (check) tasks of the credit application
process. They are part of an AND-block. If the more time-consuming task “Check
income sources” (3 h) gets delayed, this will delay the process altogether. If “Check
credit history” (taking 30 min) is delayed, this will only delay the process if it takes
longer than the processing time of the more time-consuming task “Check income
sources”. For this reason, we also have to determine the late start and late finish of
all tasks in a backward pass over the process. Now, we start from the end event with
the time set to the theoretical cycle time. For each task, its late finish is assigned the
late start of its successor. Its late start is the late finish minus its processing time. We
continue our pass from right to left of the process, now taking the earlier (i.e.,
minimum) late start at the entry (split) gateway of an AND-block.
Example 7.4 Let us now apply these calculation steps for the credit application
process shown in Figure 7.11 and the processing times given in Table 7.2. We start
with calculating the early start and early finish times (ES and EF ).
• The start event “Application received” gets zero assigned (ES = EF = 0).
• Early start of “Check completeness” is the same as the early finish of its
predecessor. This means ES = 0. We calculate EF = ES + processing time = 2.
• Each task after the AND-split gets the same ES as the preceding task. Its EF
is this ES plus the respective processing time. This means, for “Check credit
history” we get ES = 2 and EF = 2.5 and for “Check income sources” we
have ES = 2 and EF = 5.
• At the AND-join, we have to determine the maximum EF of its preceding tasks.
This is Max(2.5, 5) = 5.
• The subsequent task “Assess application” gets this maximum as its ES = 5.
Considering the processing time, its EF = 7.
• For “Make credit offer” we get ES = 7 and EF = 9.
• Therefore, ES = EF = 9 holds for the end event “Application processed” and
for the overall process.
With this value of EF = 9, we start our pass backwards to calculate late start and
late finish (LS and LF ).
• For the task “Make credit offer”, we assign the late finish time from the end event
(LF = 9) and subtract the processing time to get LS = 7.
• In the same way, we first obtain LF = 7 and then LS = 5 for “Assess
application”.
• For the tasks preceding the AND-join, we obtain their late finish from the late
start of the task after it. Therefore, both “Check credit history” and “Check
income sources” have LF = 5. We subtract their respective processing times
to get LS = 4.5 and LS = 2.
• At the AND-split, we determine the minimum LS of its successor tasks. This is
Min(4.5, 2) = 2.
• The preceding task “Check completeness” gets this minimum as LF = 2.
Considering its processing time, we get LS = 0.
• Therefore, we also have LS = LF = 0 for the start event.
In this example, we observe that the early and late start and finish times are the same for
most tasks. The critical path is the set of tasks for which these values are equal.
Those tasks with a late start greater than the early start (LS > ES) or late finish
greater than early finish (LF > EF ) have slack. This means that even when they
start or complete later, it might still be possible to finish the process without delay.
This is the case of “Check credit history” in our example. Slack tasks are typically
the less time-consuming tasks in AND-blocks.
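The forward and backward passes above can be sketched in a few lines of Python. The graph encoding is our own; the task names and processing times are those of the example.

```python
# Forward/backward pass for the critical-path calculation described above.
tasks = {  # task -> (processing time in hours, list of predecessors)
    "Check completeness":   (2.0, []),
    "Check credit history": (0.5, ["Check completeness"]),
    "Check income sources": (3.0, ["Check completeness"]),
    "Assess application":   (2.0, ["Check credit history", "Check income sources"]),
    "Make credit offer":    (2.0, ["Assess application"]),
}
order = list(tasks)  # already topologically sorted

# Forward pass: ES = max EF of predecessors; EF = ES + processing time.
ES, EF = {}, {}
for t in order:
    pt, preds = tasks[t]
    ES[t] = max((EF[p] for p in preds), default=0.0)
    EF[t] = ES[t] + pt

# Backward pass: LF = min LS of successors; LS = LF - processing time.
LS, LF = {}, {}
for t in reversed(order):
    succs = [s for s in order if t in tasks[s][1]]
    LF[t] = min((LS[s] for s in succs), default=EF[order[-1]])
    LS[t] = LF[t] - tasks[t][0]

critical = [t for t in order if ES[t] == LS[t]]
print(critical)  # every task except "Check credit history"
```

Running this reproduces the walkthrough: EF = 9 for "Make credit offer", LS = 4.5 for "Check credit history" (slack of 2.5 h), and a critical path that skips the less time-consuming branch of the AND-block.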
266 7 Quantitative Process Analysis
Exercise 7.4 Consider the process model shown in Figure 7.4 with the processing
times indicated in the brackets of the tasks. What is its critical path? How much
slack is there on the task that is not on the critical path?
Cycle time is directly related to two measures that play an important role when
analyzing a process, namely arrival rate and Work-In-Process (WIP).
The arrival rate of a process is the average number of new instances of the
process that are created per time unit. For example, in a credit application process,
the arrival rate is the number of credit applications received per day (or any other
time unit we choose). Similarly, in an order-to-cash process, the arrival rate is
the average number of new orders that arrive per day. Traditionally, the symbol
λ (lambda) is used to refer to the arrival rate.1
The Work-In-Process (WIP) is the average number of instances of a process that
are active at a given point in time, meaning the average number of instances that
have not yet completed. For example, in a credit application process, the WIP is the
average number of credit applications that have been submitted and not yet granted
or rejected. Similarly, in an order-to-cash process, the WIP is the average number of
orders that have been received, but not yet delivered and paid for.
Cycle time (CT), arrival rate (λ), and WIP are related by a fundamental law
known as Little’s law, which states that:
WIP = λ × CT (7.6)
1 A related concept is that of throughput, which is the average number of instances completed per
time unit. In a stable system and over long periods of time, the throughput should be equal to the
arrival rate (otherwise it means that we are not able to handle all the workload).
Although simple, Little’s law can be an interesting tool for what-if analysis. We
can also use Little’s law as an alternative way of calculating the total cycle time of
a process if we know the arrival rate and WIP. This is useful because determining
the arrival rate and WIP is sometimes easier than determining the cycle time. For
example, in the case of the credit application process, the arrival rate can be easily
calculated if we know the total number of applications processed over a period of
time. For example, if we assume there are 250 business days per year and we know
the total number of credit applications over the last year is 2,500, we can infer that
the average number of applications per business day is 10. WIP on the other hand
can be calculated by means of sampling. We can ask how many applications are
active at a given point in time, then ask this question again one week later and again
two weeks later. Let us assume that on average we observe that 200 applications are
active at the same time. The cycle time is then WIP/λ = 200/10 = 20 business
days.
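As a quick sketch, the rearrangement CT = WIP/λ used in this example is:

```python
# Little's law, WIP = λ × CT, rearranged to recover the cycle time from
# the arrival rate and the WIP, using the figures of the example above.
def cycle_time(wip, arrival_rate):
    return wip / arrival_rate

arrival_rate = 2500 / 250   # 10 applications per business day
print(cycle_time(200, arrival_rate))  # 20.0 business days
```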
Exercise 7.5 A restaurant receives on average 1,200 customers per day (between
10 a.m. and 10 p.m.). During peak times (12 p.m. to 3 p.m. and 6 p.m. to 9 p.m.), the
restaurant receives around 900 customers in total and, on average, 90 customers can
be found in the restaurant at a given point in time. At non-peak times, the restaurant
receives 300 customers in total and, on average, 30 customers can be found in the
restaurant at a given point in time.
• What is the average time that a customer spends in the restaurant during peak
times?
• What is the average time that a customer spends in the restaurant during non-peak
times?
• The restaurant’s premises have a maximum capacity of 110 customers. This
maximum capacity is sometimes reached during peak times. The restaurant
manager expects that the number of customers during peak times will increase
slightly in the coming months. What can the restaurant do to address this issue
without investing in extending its building?
The calculations of Little’s law rely on the assumption that the process is stable.
In order to assess whether or not this assumption is applicable, we have to know
the theoretical capacity of the process and the resource utilization of the resources
involved in the process.2
The theoretical capacity of a process is the maximum number of instances that
can be completed per time unit given a set of resources. The theoretical capacity is
reached when a subset of the resources is working at full capacity (no idle time)
2 The term occupation rate is sometimes used as a synonym for resource utilization.
and the other resources cannot help them due to the existing division of labor in the
process. When this limit is reached, the resources who are at full capacity cannot
handle more work per time unit.
To better understand the notion of theoretical capacity, we need to introduce the
notion of resource pool. A resource pool R is a set of interchangeable resources who
are responsible for executing a set of tasks in a process. For example, let us assume
that in the loan application process, tasks “Check completeness”, “Check credit
history”, and “Check income sources” are performed by clerks, whereas “Assess
application”, “Make credit offer”, and “Notify rejection” are performed by credit
officers. Therefore, this process has two resource pools (clerks and credit officers).
Most likely, there are multiple clerks and multiple credit officers. This means that
each resource pool has a given size.
Each instance of the process demands a given amount of time (on average) from
each resource pool. The amount of time that a resource pool p needs to spend on
one instance of the process is called the pool’s unit load (ul). Each task assigned to
a resource pool adds to its unit load, and the more times a task is executed per
instance, the more it contributes to the load of its resource pool. For example, if
two tasks a1 and a2 have the same processing time, but a1 is executed on average
0.5 times per instance, while a2 is executed on average twice per instance, then a2
contributes four times more to the unit load than a1 . Hence, to calculate the unit load,
we add up the processing times of each task a assigned to resource pool p, taking
into account how many times each task is executed per instance of the process.
Below we present a two-step method to calculate ul (for a given pool p) using
equations similar to the ones we used to calculate cycle time and theoretical cycle
time. In the first step, we assign a unit load to each task with respect to pool p as
follows:
• For each task assigned to resource pool p, its unit load is equal to its processing
time.
• For each task not assigned to resource pool p, its unit load is zero.
In the second step, we use the following equations to calculate ul for a pool:
• The unit load ul of a sequence of fragments with unit loads ul_1, . . . , ul_n is the
sum of the unit loads of the fragments: ul = Σ_{i=1}^{n} ul_i.
• The unit load ul of an XOR-block is the weighted average of the unit loads
of its branches (ul_i), using the branching probabilities p_i as the weights: ul =
Σ_{i=1}^{n} p_i × ul_i.
• The unit load ul of an AND-block is the sum of the unit loads of its branches:
ul = Σ_{i=1}^{n} ul_i. This is the same equation as for sequences of fragments. The
reason is that if a resource pool is involved in multiple branches of an AND-block,
each of these branches adds to the load of this resource pool (i.e., each branch
requires some effort and these efforts have to be added up).
• The unit load ul of a rework block with a rework probability r is the unit load of
one iteration of the loop (let us call it ul_b) divided by 1 − r: ul = ul_b/(1 − r).
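These four equations can be implemented as a small recursive evaluator. The fragment encoding below (nested tuples) is our own illustration; the usage example applies it to the clerk pool of the credit application example.

```python
# Recursive evaluation of the unit-load equations above. A fragment is
# either a task's unit load (a plain number) or a tagged tuple.
def unit_load(frag):
    if not isinstance(frag, tuple):          # a single task
        return frag
    kind = frag[0]
    if kind in ("seq", "and"):               # both sum their parts
        return sum(unit_load(b) for b in frag[1])
    if kind == "xor":                        # weighted average of branches
        return sum(p * unit_load(b) for p, b in frag[1])
    if kind == "loop":                       # rework block with probability r
        _, body, r = frag
        return unit_load(body) / (1 - r)
    raise ValueError(f"unknown fragment kind: {kind}")

# Clerk pool: "Check completeness" (2 h, rework probability 0.2),
# followed by the two checks in an AND-block (0.5 h and 3 h).
clerk = ("seq", [("loop", 2.0, 0.2), ("and", [0.5, 3.0])])
print(unit_load(clerk))  # 6.0 working hours
```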
Example 7.5 Let us consider the credit application process model in Figure 7.10
and the processing times given in Table 7.2. Tasks “Check completeness”, “Check
credit history”, and “Check income sources” are performed by clerks, whereas
“Assess application”, “Make credit offer”, and “Notify rejection” are performed by
credit officers.
Using the above equations, the unit loads of the three tasks assigned to the
clerk pool are 2 h (“Check completeness”), 0.5 h (“Check credit history”), and 3 h
(“Check income sources”). The remaining tasks, after the AND-join gateway, do
not contribute to the unit load of the clerk pool. Hence, the unit load of the clerk pool is
2/(1 − 0.2) + 0.5 + 3 = 6 working hours. This means that each loan application
takes 6 h of time from the clerk pool.
Meanwhile, the unit loads of the tasks assigned to the credit officer are 2 h
for both “Assess application” and for “Make credit offer”, but 0.5 h for “Notify
rejection”. The first three tasks do not contribute to the unit load of credit officers.
Taking into account the branching probabilities of the XOR-split, the unit load of the
credit officer pool is 2 + 0.6 × 2 + 0.4 × 0.5 = 5.2 working hours. Again, this means
that each loan application takes on average 5.2 h from the credit officers pool.
We have seen how to calculate the unit load ul of a resource pool p, which means
the amount of time that a resource pool p spends per instance of the process. To
calculate the theoretical capacity, we now need to determine how much time each
resource pool can deliver per time unit. This is called the unit capacity of the pool.
To do this, it is convenient to lift the temporal granularity by one level. Since we
have been working in terms of hours above, we will now move to the next higher
granularity, which is the working day. Let us assume that a working day is 8 h. This
means that one resource in a pool can deliver 8 h of work per day. This is called the
unit capacity of a resource. By extension, the unit capacity uc of a resource pool is
the size of the pool times the unit capacity of one resource.
Given the above, the theoretical capacity μ_p of pool p is:

μ_p = uc/ul (7.7)
Example 7.6 Continuing the previous example, let us assume that the size of the
clerk resource pool is 3, while the same holds for the credit officer pool. Let us also
assume that 1 day is equal to 8 working hours. The unit capacity of a clerk (and
same for a credit officer) is 8 h/day. The unit capacity of the clerk pool is 24 h/day
(same for the credit officer pool). This means that the clerks can dedicate up to
24 h of effort per business day. Since each instance takes 6 h from them, they can
collectively handle 24/6 = 4 applications per day (i.e., μ = 4 instances/day for the
clerk pool). Similarly, three credit officers can dedicate up to 24 h/day. And since
each instance requires 5.2 h from them, their theoretical capacity is 24/5.2 = 4.62
loan applications per day (i.e., μ = 4.62 for the credit officer pool).
We see that the clerks can handle fewer loan applications per day than the credit
officers (4 versus 4.62). So, if we start receiving a lot of loan applications, the clerks
will be the first ones to reach their theoretical capacity. We say that the clerk pool
is the bottleneck of the process. More generally, the bottleneck is the resource pool
with the minimum theoretical capacity among all pools in a process.
The theoretical capacity of a process is the theoretical capacity of its bottleneck
pool. In our example, this is four instances per day. In the long term, there is no way
we can deliver more instances per day unless something is changed, like for example
if we add more resources to the resource pool or reduce the processing time of the
tasks in which they are involved.
Another useful and related concept is that of resource utilization. The resource
utilization ρ_p of a pool p is the arrival rate λ of instances of the process (which we
saw in Little’s law) divided by the theoretical capacity μ_p of the pool, i.e.

ρ_p = λ/μ_p (7.8)
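Putting the last few definitions together, theoretical capacity, bottleneck, and utilization can be computed in a few lines. Pool sizes and unit loads below are those of Example 7.6; the arrival rate of 3 applications per day is our own illustrative assumption.

```python
# Theoretical capacity μ_p = uc/ul and resource utilization ρ_p = λ/μ_p.
HOURS_PER_DAY = 8
pools = {"clerk": (3, 6.0), "credit officer": (3, 5.2)}  # (size, unit load in h)
lam = 3.0  # hypothetical arrival rate, instances per day

capacity = {name: size * HOURS_PER_DAY / ul          # μ_p = uc / ul
            for name, (size, ul) in pools.items()}
bottleneck = min(capacity, key=capacity.get)         # pool with minimum μ_p
utilization = {name: lam / mu for name, mu in capacity.items()}
print(bottleneck, capacity[bottleneck], utilization["clerk"])  # clerk 4.0 0.75
```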
As mentioned earlier, flow analysis can also be used to calculate other performance
measures besides cycle time. For example, assuming we know the average cost of
each task, we can calculate the cost of a process more or less in the same way as we
calculate cycle time. In particular, the cost of a sequence of tasks is the sum of the
costs of these tasks. Similarly, the cost of an XOR-block is the weighted average of
the cost of the branches of the XOR-block and the cost of a rework pattern, such as
the one shown in Figure 7.8, is the cost of the body of the loop divided by 1 − r.
The only difference between calculating cycle time and calculating cost relates to
the treatment of AND-blocks. The cost of an AND-block, such as the one shown
in Figure 7.5, is not the maximum of the cost of the branches of the AND-block.
Instead, the cost of such a block is the sum of the costs of the branches. This is
because after the AND-split is traversed, every branch in the AND-join is executed.
Therefore the costs of these branches add up to one another.
Example 7.8 Let us consider again the credit application process model in
Figure 7.10 and the processing times given in Table 7.2. As previously, we assume
that in 20% of cases the application is incomplete and in 60% of cases the credit
is granted. We further assume that the tasks “Check completeness”, “Check credit
history” and “Check income sources” are performed by a clerk, while “Assess
application”, “Make credit offer” and “Notify rejection” are performed by a credit
officer. The hourly cost of a clerk is € 25 while the hourly cost of a credit officer
is € 50. Performing a credit history check requires that the bank submit a query to an
external system. The bank is charged € 1 per query by the provider of this external
system.
From this scenario, we can see that the cost of each task can be split into two
components: the labor cost and other costs. The labor cost is the cost of the human
resource that performs the task. This can be calculated as the product of the hourly
cost of the resource and the processing time (in hours) of the task. Other costs
correspond to costs that are incurred by an execution of a task, but are not related
to the time spent by human resources on the task. In this example, the cost per
query to the external system would be classified as “other costs” for the task “Check
credit history”. The remaining tasks do not have an “other costs” component. For the
example at hand, the breakdown of resource cost, other cost and total cost per task is
given in Table 7.5. Given this input, we can calculate the total cost-per-execution of
the process as follows: 50/(1−0.2)+13.5+75+100+0.6×100+0.4×25 = 321.
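The arithmetic of Example 7.8 can be checked directly with the per-task costs derived in the text (e.g., € 13.5 for “Check credit history” = 0.5 h × € 25 plus the € 1 query fee):

```python
# Total cost-per-execution of the credit application process, following
# Example 7.8: rework on "Check completeness" (r = 0.2), an AND-block
# (costs summed), and an XOR-block (costs weighted by probabilities).
cost = {"Check completeness": 50.0, "Check credit history": 13.5,
        "Check income sources": 75.0, "Assess application": 100.0,
        "Make credit offer": 100.0, "Notify rejection": 25.0}
r = 0.2  # rework probability

total = (cost["Check completeness"] / (1 - r)
         + cost["Check credit history"] + cost["Check income sources"]  # AND-block
         + cost["Assess application"]
         + 0.6 * cost["Make credit offer"] + 0.4 * cost["Notify rejection"])  # XOR
print(total)  # 321.0
```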
Exercise 7.7 Calculate the cost-per-execution of the ministerial enquiry process
introduced in Exercise 3.7 (page 90). Assume that the rework probability is 0.2
and the times as given in Table 7.3. The task “Register ministerial enquiry” is
performed by a clerk, task “Investigate ministerial enquiry” is performed by an
adviser, “Prepare ministerial response” is performed by a senior adviser, and
“Review ministerial response” is performed by a minister counselor. The hourly
resource cost of a clerk, adviser, senior adviser, and minister counselor are € 25,
€ 50, € 75, and € 100, respectively. There are no other costs attached to these tasks
besides the resource costs.
7.2 Queues
In basic queueing theory, a queueing system consists of one or multiple queues and
a service that is provided by one or multiple servers. The elements inside a queue
are called jobs or customers, depending on the specific context. In the following, we
stick to the generic term process instance. For example, in the case of a supermarket,
the service is that of checking out. This service is provided by multiple cashiers (the
servers). Meanwhile, in the case of a bank office, the service is to perform a banking
transaction, the servers are tellers, and there is generally a single queue that leads to
multiple servers (the tellers). These two examples illustrate an important distinction
between multi-line (i.e., multi-queue) queueing systems (like the supermarket) and
single-line queueing systems (like the bank office).
Queueing theory provides a very broad set of techniques. Instead of trying to
present everything that queueing theory has to offer, we will present two queueing
theory models that are relatively simple, yet useful when analyzing business
processes or tasks within a process.
In the two models we will be presenting there is a single queue (single-line
queueing system). Instances arrive at a given average arrival rate λ. This is the same
concept of arrival rate that we discussed above when presenting Little’s law. For
example, we can say that customers arrive to the bank office at a mean rate of 20
per hour. This implies that, on average, one customer arrives every 3 min (1/20 h).
This latter number is called the mean inter-arrival time. We observe that if λ is the
arrival rate per time unit, then 1/λ is the mean inter-arrival time.
It would be illusory to think that the time between the arrival of two customers at
the bank office is always 3 min. This is just the mean value. In practice, customers
arrive independently from one another, so the time between the arrival of one
customer and the arrival of the next customer is completely random. Moreover, let
us say the time between the arrival of the first customer and the arrival of the second
customer is 1 min. This observation does not tell us anything about the time between
the arrival of the second customer and the arrival of the third customer. It might be
that the third customer arrives 1 min after the second, 5 min, or 10 min. We will not
know until the third customer arrives.
Such an arrival process is called a Poisson process. In this case, the inter-arrival
times follow a so-called exponential distribution (specifically, a negative
exponential distribution) with a mean of 1/λ. In a nutshell, this means that the
probability that the inter-arrival time is exactly equal to t (where t is a positive
number) decreases in an exponential manner when t increases. For instance, the
probability of an inter-arrival time of 10 min is considerably smaller than the
probability of the inter-arrival time being 1 min. Hence, shorter inter-arrival times
are much more probable than longer ones, but there is always a probability (perhaps
a very small one) that the inter-arrival time will be large.
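For intuition, sampling a Poisson arrival process is a one-liner in most languages. This sketch (our own, with λ = 20 customers per hour) confirms that the sample mean of the inter-arrival times approaches 1/λ:

```python
# Sampling exponentially distributed inter-arrival times with rate λ = 20
# per hour; the sample mean should be close to 1/λ = 0.05 h (3 minutes).
import random

random.seed(42)
lam = 20
inter_arrivals = [random.expovariate(lam) for _ in range(100_000)]
mean = sum(inter_arrivals) / len(inter_arrivals)
print(mean * 60)  # close to 3 minutes
```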
In practice, the Poisson process and the exponential distribution describe a large
class of arrival processes. So, we will be using them to capture the arrival of jobs
or customers into a business process or a task in a business process. The Poisson
process can also be observed when we examine how often cars enter a given segment
of a highway or how often calls go through a telephone exchange.
Having said this, one must always cross-check that cases indeed arrive at a given
process or task with exponentially distributed inter-arrival times. This cross-check
can be done by recording the inter-arrival times for a given period of time and then
feeding these numbers into a statistical tool such as R, MathWorks’ Statistics
Toolbox, or EasyFit. These tools take a set of observed inter-arrival times as input
and check whether it follows a negative exponential distribution.
Exponential distributions are not only useful when modeling the inter-arrival
time. They are also in some cases useful when describing the processing time of
a task. In queueing theory, the term service time is often used instead of processing
time. In the case of tasks that require a diagnosis, a non-trivial verification, or
some non-trivial decision making, it is often the case that the processing time
is exponentially distributed. Take, for example, the amount of time it takes for
a mechanic to make a repair on a car. Most repairs are fairly standard and the
mechanics might take 1 h to do them. However, some repairs are complex and in
such cases it can take the mechanic several hours to complete. A similar remark can
be made of a doctor receiving patients in an emergency room. A large number of
emergencies are quite standard and can be dispatched in less than an hour, but some
emergencies are extremely complicated and can take hours to deal with. So, it is
likely that such tasks will follow an exponential distribution. As mentioned above,
when making such a hypothesis, it is important to verify it by taking a random
sample of processing times and feeding them to a statistical tool.
In the queueing theory field, a single-queue system is called an M/M/1 queue
if the inter-arrival times of customers follow an exponential distribution, the
processing times follow an exponential distribution, there is one single server, and
instances are served on a First-In-First-Out (FIFO) basis. In the case of an M/M/1
queue, we also assume that when an instance arrives it enters the queue and stays
there until it is taken on by the server.
If the above conditions are satisfied, but there are multiple servers instead of a
single server, the queueing system is said to be M/M/c, where c is the number of
servers. For example, a system is M/M/5 if the inter-arrival times of customers
follow an exponential distribution, the processing times follow an exponential
distribution, and there are 5 servers at the end of the queue. The “M” in this
denomination stands for “Markovian”, which is the name given to the assumptions
that inter-arrival times and processing times follow an exponential distribution.
Other queueing models exist that make different assumptions. Each such model
is different, so the results we will obtain for an M/M/1 or M/M/c queue are quite
different from those we would obtain under other distributional assumptions.
Summarizing the previous discussion, an M/M/1 or M/M/c queue can be defined by means
of the following parameters:
• λ is the mean arrival rate per time unit. The mean inter-arrival time is then 1/λ.
For example, λ = 5 means that there are 5 arrivals per hour and this entails that
the mean inter-arrival time between two consecutive instances is 1/5 h, that is
12 min.
• μ is the theoretical capacity per server (i.e., theoretical capacity per resource) or
in other words, the number of instances that a server can execute per time unit.
For example, μ = 6 means that 6 instances are served per hour, which means that
one instance is served in 10 min (on average).6
• In the case of M/M/c, the number of servers is c.
Given parameters λ and μ, we defined in Section 7.1.5 the resource utilization
ρ = λ/μ. In the above example, the resource utilization is 5/6 ≈ 83.33%. It
should be noted that this is a relatively high resource utilization. A system with
a resource utilization of more than 100% is unstable, which means that the queue
will become longer and longer forever because the server cannot cope with all the
demand. In fact, even a system with a resource utilization close to 100% is unstable
because of the randomness with which new instances arrive and the variability in
the processing times per instance. To understand why this is the case, just imagine
a doctor receiving patients at a rate of 6 per hour for 8 h, knowing that every patient
takes 10 min on average to be treated (sometimes less but sometimes more). Without
any slack, the doctor will end up with a tremendous backlog at the end of the day.
In the case of an M/M/c system, the resource utilization is λ/(c × μ), since the system
consists of a pool of resources that can collectively handle instances at a rate of c × μ.
For example, if the system has 2 servers and each server can handle 2 instances per
hour, the system can handle 4 instances per hour. If instances arrive at a mean rate
of 3 per hour, the resource utilization of the system is 3/4 = 75%.
Given an M/M/1 or M/M/c system, queueing theory allows us to calculate the
following parameters:
• Lq is the average number of instances in the queue.
• Wq is the average time one instance spends in the queue.
• W is the average time one instance spends in the entire system. This includes
both the time the instance spends in the queue but also the time it spends being
serviced.
• L is the average number of instances in the system (i.e., the Work-In-Process
referenced in Little’s law).
6 In Section 7.1.5, we used symbol μ to refer to the theoretical capacity of a resource pool, while
here we are using symbol μ to refer to the theoretical capacity of each individual resource (or
server) in a pool. This is because the size of the pool is handled separately using parameter c.
For an M/M/1 queue, these parameters can be calculated as follows:

Lq = ρ²/(1 − ρ) (7.9)

Wq = Lq/λ (7.10)

W = Wq + 1/μ (7.11)

L = λ × W (7.12)
Formulas (7.10), (7.11) and (7.12) can be applied to M/M/c models as well. The
only parameter that needs to be calculated differently in the case of M/M/c models
is Lq . For M/M/c models, Lq is given by the following formula:
Lq = ((λ/μ)^c × ρ) / (c! × (1 − ρ)² × [(λ/μ)^c/(c!(1 − ρ)) + Σ_{n=0}^{c−1} (λ/μ)^n/n!]) (7.13)
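Formula (7.13) is easier to digest as code. The sketch below is a direct transcription; with c = 1 it reduces to the M/M/1 formula (7.9).

```python
# Lq for an M/M/c queue, transcribing formula (7.13); with c = 1 this
# reduces to the M/M/1 formula Lq = ρ²/(1 − ρ).
from math import factorial

def mmc_lq(lam, mu, c):
    rho = lam / (c * mu)                       # resource utilization
    a = lam / mu                               # offered load λ/μ
    denom = factorial(c) * (1 - rho) ** 2 * (
        a ** c / (factorial(c) * (1 - rho))
        + sum(a ** n / factorial(n) for n in range(c)))
    return a ** c * rho / denom

print(mmc_lq(0.05, 0.1, 1))  # 0.5, same as ρ²/(1 − ρ) with ρ = 0.5
```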
Some tools additionally support models in which the queue has a maximum length k:
instances that arrive when the queue length is k are rejected (and may come back later).
Other tools for analyzing queueing systems include QSim9 and PDQ.10
systems include QSim9 and PDQ.10
Example 7.9 A company designs customized electronic hardware for a range of
customers in the high-tech electronics industry. The company receives orders for
designing a new circuit every 20 working days on average. It takes a team of
engineers on average 10 working days to design a hardware device.
This problem can be mapped to an M/M/1 model, assuming that the arrival of
designs follows a Poisson process, that the distribution of times for designing a
circuit follows an exponential distribution, and that new design requests are handled
in a FIFO manner. Note that even though the team includes several people, they act
as a monolithic entity and therefore should be treated as a single server. Let us take
the working day as a time unit. On average, 0.05 orders are received per day (λ =
0.05), and 0.1 orders are fulfilled per day (μ = 0.1). Thus, the resource utilization
of this system ρ = 0.05/0.1 = 0.5. Using the formulas for M/M/1 models, we can
deduce that the average length of the queue Lq is 0.52 /(1 − 0.5) = 0.5 orders.
From there we can conclude that the average time an order spends in the queue is
Wq = 0.5/0.05 = 10 days. Thus, it takes on average W = 10 + 1/0.1 = 20
working days for an order to be fulfilled.
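The chain of calculations in Example 7.9 can be scripted with formulas (7.9)–(7.12):

```python
# M/M/1 performance measures, formulas (7.9)-(7.12), applied to the
# hardware design example (λ = 0.05, μ = 0.1 orders per working day).
def mm1(lam, mu):
    rho = lam / mu
    Lq = rho ** 2 / (1 - rho)   # avg. number of instances in the queue
    Wq = Lq / lam               # avg. time spent in the queue
    W = Wq + 1 / mu             # avg. time in the system (queue + service)
    L = lam * W                 # avg. number of instances in the system
    return {"rho": rho, "Lq": Lq, "Wq": Wq, "W": W, "L": L}

print(mm1(0.05, 0.1)["W"])  # 20.0 working days
```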
Exercise 7.8 Consider now the case where the engineering team in the previous
example requires 16 working days to design a hardware device. What is then the
average amount of time an order takes to be fulfilled?
Exercise 7.9 An insurance company receives 220 calls per day from customers
who lodge insurance claims. The call center is open from 8 a.m. to 5 p.m. The
arrival of calls follows a Poisson process. Looking at the intensity of arrival of calls,
we can distinguish three periods during the day: the period 8 a.m. to 11 a.m., the
period 11 a.m. to 2 p.m. and the period 2 p.m. to 5 p.m. During the first period,
around 60 calls are received. During the 11 a.m. to 2 p.m. period, 120 calls are
received, and during the 2 p.m. to 5 p.m. period, 40 calls are received. A customer
survey has shown that customers tend to call between 11 a.m. and 2 p.m. because
during this time they have a break at work.
Statistical analysis shows that the durations of calls follow an exponential
distribution. According to the company’s customer service charter, customers
should not wait more than 1 min on average for their call to be answered.
• Assume that the call center can handle 70 calls per hour using 7 call center agents.
Is this enough to meet the 1-min constraint set in the customer service charter?
Please explain your answer by showing how you calculate the average length of
the queue and the average waiting time.
• What happens if the call center’s capacity is increased so that it can handle 80
calls per hour (using 8 call center agents)?
9 http://www.stat.auckland.ac.nz/~stats255/qsim/qsim.html.
10 http://www.perfdynamics.com/Tools/PDQ.html.
• The call center manager has a mandate to cut costs by at least 20%. Give at least
two ideas to achieve this cut without reducing the salaries of the call center agents
and while keeping an average waiting time below or close to 1 min.
The basic queueing analysis techniques presented above allow us to estimate waiting
times and queue lengths based on the assumptions that inter-arrival times and
processing times follow an exponential distribution. When these parameters follow
different distributions, one needs to use different queueing models. Fortunately,
queueing theory tools nowadays support a broad range of queueing models and of
course they can do the calculations for us. The discussion above must be seen as an
overview of single-queue models, with the aim of providing a starting point from
where you can learn more about this family of techniques.
A more fundamental limitation of the techniques introduced in this section is that
they only deal with one task at a time. When we have to analyze an entire process
that involves several tasks, events, and resources, these basic techniques are not
sufficient. There are many other queueing analysis techniques that could be used for
this purpose, like for example queueing networks. Essentially, queueing networks
are systems consisting of multiple inter-connected queues. However, the maths
behind queueing networks can become quite complex, especially when the process
includes concurrent tasks. A more popular approach for quantitative analysis of
process models under varying levels of resource contention is process simulation,
as discussed below.
7.3 Simulation
Process simulation is arguably the most popular and widely supported technique for
quantitative analysis of process models. The essential idea underpinning process
simulation is to use the process simulator for generating a large number of
hypothetical instances of a process, executing these instances step-by-step, and
recording each step in this execution. The output of a simulator then includes the
logs of the simulation as well as statistics of cycle times, average waiting times, and
average resource utilization.
During a process simulation, the tasks in the process are not actually executed.
Instead, the simulation of a task proceeds as follows. When a task is ready to be
executed, a so-called work item is created and the simulator first tries to find a
resource to which it can assign this work item. If no resource able to perform the
work item is available, the simulator puts the work item in waiting mode until a
suitable resource becomes available. Once a resource is assigned to a work item, the
simulator determines the duration of the work item by drawing a random number
according to the probability distribution of the task processing time. This probability
distribution and the corresponding parameters need to be defined in the simulation
model.
Once the simulator has determined the duration of a work item, it puts the work
item in sleeping mode for that duration. This sleeping mode simulates the fact that
the task is being executed. Once the time interval has passed (according to the
simulation clock), the work item is declared to be completed and the resource that
was assigned to it becomes available.
In reality, the simulator does not actually wait for tasks to come back from their
sleeping mode. For example, if the simulator determines that the duration of a work
item is 2 days and 2 h, it will not wait for this amount of time to pass by. You can
imagine how long a simulation would take if that were the case. Instead, simulators
use smart algorithms to complete the simulation as fast as possible. Modern business
process simulators can effectively simulate thousands of process instances and tens
of thousands of work items in a matter of seconds.
For each work item created during a simulation, the simulator records the
identifier of the resource that was assigned to this work item as well as three
timestamps:
• The time when the task was ready to be executed.
• The time when the task was started, meaning that it was assigned to a resource.
• The time when the task completed.
Using the collected data, the simulator can compute the average waiting time for
each task. These measures are quite important when we try to identify bottlenecks
in the process. Indeed, if a task has a high average waiting time, it means that there
is a bottleneck at the level of this task. The analyst can then consider several options
for addressing this bottleneck.
Additionally, since the simulator records which resources perform which work
items and it knows how long each work item takes, the simulator can find out the
total amount of time during which a given resource is busy handling work items. By
dividing the amount of time that a resource was busy during a simulation by the total
duration of the simulation, we obtain the resource utilization, that is, the percentage
of time that the resource is busy on average.
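The work-item lifecycle just described can be sketched as a small discrete-event simulation. The following Python fragment is an illustrative sketch only (all function and variable names are ours): it simulates a single task served by a pool of interchangeable resources with FIFO assignment and exponential distributions, records the ready, start, and completion timestamps of each work item, and derives the average waiting time and resource utilization as described above.

```python
import random

def simulate_task(n_items, arrival_rate, service_rate, n_resources, seed=1):
    """Simulate one task served by a pool of interchangeable resources.

    Work items arrive according to a Poisson process; each one waits for
    the earliest-available resource, then "sleeps" for a sampled duration.
    Returns (average waiting time, resource utilization)."""
    rng = random.Random(seed)
    free = [0.0] * n_resources      # time at which each resource is free again
    total_wait = total_busy = last_completion = ready = 0.0
    for _ in range(n_items):
        ready += rng.expovariate(arrival_rate)       # work item becomes ready
        i = min(range(n_resources), key=lambda r: free[r])
        start = max(ready, free[i])                  # wait if no resource is free
        duration = rng.expovariate(service_rate)     # sampled processing time
        free[i] = start + duration                   # completion timestamp
        total_wait += start - ready
        total_busy += duration
        last_completion = max(last_completion, free[i])
    return total_wait / n_items, total_busy / (n_resources * last_completion)

# Example: 3 arrivals per hour, mean processing time 15 min, one resource.
avg_wait, util = simulate_task(10_000, arrival_rate=3/60, service_rate=1/15,
                               n_resources=1)
```

With these illustrative numbers the estimated utilization should come out near ρ = 0.75 and the average waiting time near the M/M/1 prediction of 45 min, up to stochastic variation.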
From the above description of how a simulation works, we can see that the following
information needs to be specified for each task in the process model in order to
simulate it:
7.3 Simulation 281
11 http://bimp.cs.ut.ee.
simulator takes as input a BPMN process model. We provide the following inputs
for the simulation.
• Three loan applications arrive per hour on average, meaning an inter-arrival time
of 20 min. Loan applications arrive only from 9 a.m. to 5 p.m. during weekdays.
• The tasks “Check credit history” and “Check income sources” are performed by
clerks.
• The tasks “Notify rejection”, “Make credit offer”, and “Assess application” are
performed by credit officers.
• The task “Receive customer feedback” is in fact an event. It takes zero time and
it only involves the credit information system (no human resources involved). To
capture this, the task is assigned to a special “System” role.
• There are two clerks and two credit officers. The hourly cost of a clerk is € 25
while that of a credit officer is € 50.
• Clerks and credit officers work from 9 a.m. to 5 p.m. during weekdays.
• The cycle time of the task “Assess application” follows an exponential distribu-
tion with a mean of 20 min.
• Cycle times of all other tasks follow a normal distribution. The tasks “Check
credit history”, “Notify rejection”, and “Make credit offer” have a mean cycle
time of 10 min with a 20% standard deviation, while “Check income sources”
has a cycle time of 20 min with a 20% standard deviation as well.
• The probability that an application is accepted is 80%.
• The probability that a customer, whose application was rejected, asks that the
application be re-assessed is 20%.
We run a simulation with 2,400 instances, which means 100 working days given
that 24 loan applications arrive per day. The simulation gives an average cycle time
of around 7.5 h if we count the time outside working hours (cycle time including
off-timetable hours in BIMP). If we count only working hours, the cycle time is 2 h.
The latter is called the cycle time excluding off-timetable hours in BIMP. These cycle
time measurements may vary by about ± 10% when we run the simulation multiple
times. These variations are expected due to the stochastic nature of the simulation.
For this reason, we recommend running the simulation multiple times and to take
averages of the simulation results.
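To illustrate why averaging over multiple runs matters, the sketch below runs several independent replications of a deliberately simplified process (a hypothetical two-task sequence with a 20% rework loop, not the credit application model above; names and parameters are ours) and reports the mean and the spread of the estimated cycle time.

```python
import random
import statistics

def one_instance(rng):
    """Cycle time of one instance of a hypothetical two-task process:
    task A ~ Normal(10, 2) min, task B ~ Exponential(mean 20) min,
    and after B the instance is reworked (back to B) with probability 0.2."""
    ct = max(0.0, rng.gauss(10, 2))        # task A
    while True:
        ct += rng.expovariate(1 / 20)      # task B (possibly repeated)
        if rng.random() >= 0.2:            # 80% of the time: no rework
            return ct

def replicate(n_reps, n_instances, seed=7):
    """Run independent simulation replications and average their results."""
    rng = random.Random(seed)
    means = [statistics.fmean(one_instance(rng) for _ in range(n_instances))
             for _ in range(n_reps)]
    return statistics.fmean(means), statistics.pstdev(means)

avg_ct, spread = replicate(n_reps=10, n_instances=2_000)
# The analytical mean is 10 + 20/0.8 = 35 min; individual replications
# scatter around it, which is why we average over several runs.
```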
Figure 7.14 shows the histograms for process cycle times (both including and
excluding off-timetable hours), waiting times (excluding off-timetable hours), and
costs. It can be seen that the waiting times are relatively low. This is because the
resource utilization of clerks and credit officers is around 76–80%.
Exercise 7.10 The insurance company called Cetera is facing the following prob-
lem: Whenever there is a major event (e.g., a storm), their claim-to-resolution
process is unable to cope with the ensuing spike in demand. During normal times,
the insurance company receives about 9,000 calls per week, but during a storm
scenario the number of calls per week doubles.
The claim-to-resolution process model of Cetera is presented in Figure 7.15. The
process starts when a call related to lodging a claim is received. The call is routed
Fig. 7.14 Histograms produced by simulating the credit application process with BIMP (panels include the process cycle times both including and excluding off-timetable hours)
to one of two call centers depending on the location of the caller. Each call center
receives approximately the same amount of calls (50–50) and has the same number
of operators (40 per call center). The process for handling calls is identical across
both call centers. When a call is received at a call center, the call is picked up by
a call center operator. The call center operator starts by asking a standard set of
questions to the customer to determine if the customer has the minimum information
required to lodge a claim (e.g., insurance policy number). If the customer has enough
information, the operator then goes through a questionnaire with the customer,
enters all relevant details, checks the completeness of the claim, and registers the
claim.
Once a claim has been registered, it is routed to the claims handling office, where
all remaining steps are performed. There is one single claims handling office, so
regardless of the call center agent where the claim is registered, the claim is routed
to the same office. In this office, the claim goes through a two-stage evaluation
process. First of all, the liability of the customer is determined. Secondly, the claim
is assessed in order to determine if the insurance company has to cover this liability
and to what extent. If the claim is accepted, payment is initiated and the customer is
advised of the amount to be paid. The tasks of the claims handling department are
performed by claims handlers. There are 150 claims handlers in total.
The mean cycle time of each task (in seconds) is indicated in Figure 7.15. For
every task, the cycle time follows an exponential distribution. The hourly cost of a
call center agent is € 30, while the hourly cost of a claims handler is € 50.
Describe the input that should be given to a simulator in order to simulate this
process in the normal scenario and in the storm scenario. Using a simulation tool,
encode the normal and the storm scenarios, and run a simulation in order to compare
these two scenarios.
[Figure 7.15: Claim-to-resolution process model of Cetera. The recoverable annotations are: “Determine likelihood of the claim” (120 seconds; in 15% of cases the insured could not be liable, leading to the end event “Case closed”), “Assess claim” (1,200 seconds; in 20% of cases the claim is rejected, leading to “Claim rejected”), “Initiate payment” (120 seconds), “Advise claimant of reimbursement” (240 seconds), and “Close claim” (60 seconds, leading to “Claim closed”).]
entire resource pool. Other tools support a more fine-grained specification of the
resources of a pool one by one with specific cost rates for each created resource
(e.g., create 10 clerks one by one, each with its name and hourly cost).
The above discussion illustrates some of the nuances found across simulation
tools. In order to avoid diving straight away into the numerous details of a tool,
it may be useful for beginners to take their first steps using the BIMP simulator
referred to in Example 7.10. BIMP is a rather simple BPMN process model
simulator that provides the core functionality found in commercial business process
simulation tools.
One should keep in mind that the quantitative analysis techniques we have seen in
this chapter, and simulation in particular, are based on models and on simplifying
assumptions. The reliability of the output produced by these techniques largely
depends on the accuracy of the numbers that are given as input. Additionally,
simulation assumes that process participants work mechanically. However, process
participants are not robots. They are subject to unforeseen interruptions, they display
varying performance depending on various factors, and they may adapt differently
to new ways of working.
It is good practice whenever possible to derive the input parameters of a
simulation from actual observations, meaning from historical process execution
data. This is possible when simulating an as-is process that is being executed in the
company, but not necessarily when simulating a to-be process. In a similar spirit, it
is recommended to cross-check simulation outputs against expert advice. This can
be achieved by presenting the simulation results to process stakeholders (including
process participants). The process stakeholders are usually able to provide feedback
on the credibility of the resource utilization levels calculated via simulation and
the bottlenecks put into evidence by the simulation. For instance, if the simulation
points to a bottleneck in a given task, while the stakeholders and participants
perceive this task to be uncritical, there is an indication that incorrect assumptions
have been made. Feedback from stakeholders and participants helps to reconfigure
the parameters such that the results are closer to matching the actual behavior. In
other words, process simulation is an iterative analysis technique.
Finally, it is advisable to perform sensitivity analysis of the simulation. Con-
cretely, this means observing how the output of the simulation changes when adding
one resource to or removing one resource from a resource pool, or when changing
the processing times by ± 10%, for example. If such small changes in the simulation
input parameters significantly affect the conclusions drawn from the simulation
outputs, one must be careful when interpreting the simulation results.
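As a minimal illustration of such a sensitivity analysis, the sketch below (our own, not part of any simulation tool) takes a single-server M/M/1 task with an arrival rate of 0.05 jobs per day and a mean processing time of 16 days, scales the processing time by ±10%, and recomputes the average waiting time.

```python
def mm1_wq(lam, mu):
    """Average waiting time Wq = Lq / lambda for an M/M/1 queue."""
    rho = lam / mu
    return (rho ** 2 / (1 - rho)) / lam

# Sensitivity: scale the mean processing time (1/mu = 16 days) by -10%, 0%, +10%.
wqs = {scale: mm1_wq(0.05, 0.0625 / scale) for scale in (0.9, 1.0, 1.1)}
```

Note how, at this utilization level, a 10% increase in processing time roughly doubles the waiting time: exactly the kind of non-linearity that sensitivity analysis is meant to expose.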
7.4 Recap
In this chapter we saw three quantitative process analysis techniques, namely flow
analysis, queueing theory, and simulation. These techniques allow us to derive
process performance measures, such as cycle time or cost, and to understand how
different tasks and resource pools contribute to the overall performance of a process.
Flow analysis allows us to calculate performance measures from a process model
and performance data pertaining to each task in the model. We also analyzed the
critical path of a process using the Critical Path Method. Finally, we studied the
capacity of a process and defined the notion of resource utilization. The waiting
times of a process are highly dependent on resource utilization—the busier the
resources are, the longer the waiting times.
Basic queueing theory models, such as the M/M/1 model, allow us to calculate
waiting times for individual tasks given data about the number of resources and their
processing times. Other queueing theory models such as queueing networks can be
used to perform fine-grained analysis at the level of entire processes. However, in
practice it is convenient to use process simulation for fine-grained analysis. Process
simulation allows us to derive process performance measures (e.g., cycle time or
cost) given data about the tasks (e.g., processing times) and data about the resources
involved in the process. Process simulation is a versatile technique supported by a
range of process modeling and analysis tools.
Solution 7.1 First we observe that the cycle time of the AND-block is 1 h. Next, we
calculate the cycle time of the XOR-block as follows: 0.4 × 1 + 0.4 × 1 + 0.2 × 1 = 1 h.
The total cycle time is thus: 1 + 1 + 1 = 3 h.
Solution 7.2 The cycle time of the process is 2 + 8 + (4 + 4)/(1 − 0.2) = 20 days. Assuming
8 working hours per day, this translates to 160 working hours. The theoretical cycle
time is 0.5 + 12 + (4 + 2)/(1 − 0.2) = 20 h. Hence, cycle time efficiency is 20/160 = 12.5%.
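Both calculations apply the rework-loop pattern, in which a loop body repeated with probability r contributes CT(body)/(1 − r) to the cycle time. A quick computational check (written by us for illustration):

```python
# Rework-loop pattern: the expected cycle time of a loop body that is
# repeated with probability r is CT(body) / (1 - r).
r = 0.2
cycle_time_days = 2 + 8 + (4 + 4) / (1 - r)            # 20 days
cycle_time_hours = cycle_time_days * 8                 # 8 working hours per day
theoretical_ct_hours = 0.5 + 12 + (4 + 2) / (1 - r)    # 20 hours
efficiency = theoretical_ct_hours / cycle_time_hours   # 0.125, i.e., 12.5%
```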
Solution 7.3 It can be expected that the average cycle times of the reported
process have generally improved. Since 1992, several technological advancements
have drastically improved white-collar work productivity. These relate to a bet-
ter coordination and routing of tasks using information technology including
office applications, enterprise systems, and Internet technology. In Chapter 9, we
will discuss how Business Process Management Systems and different types of
Process-Aware Information Systems have contributed to better coordination and
task automation. These advancements have likely reduced waiting times in many
business processes. Therefore, also cycle time efficiency should have improved
since 1992.
Solution 7.4 The process model shown in Figure 7.4 has three tasks with the
following ES, EF, LS, and LF:
7.5 Solutions to Exercises 289
• Start event: ES = EF = LS = LF = 0.
• Task A: ES = LS = 0 and EF = LF = 10.
• Task B: ES = LS = 10 and EF = LF = 30.
• Task C: ES = 10 and EF = 20. There is slack here, because LS = 20 and LF = 30.
• End event: ES = EF = LS = LF = 30.
Task C has slack of 10. The critical path includes all tasks except C.
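The forward and backward passes behind these numbers can be written down compactly. The sketch below is a simplified Critical Path Method for acyclic task graphs (function and dictionary names are ours); it reproduces the ES, EF, LS, and LF values listed above.

```python
def critical_path(durations, preds):
    """Forward/backward pass of the Critical Path Method.

    durations: task -> duration; preds: task -> list of predecessor tasks.
    Assumes the tasks are listed in topological order. Tasks with
    ES == LS (zero slack) lie on the critical path."""
    order = list(durations)
    es, ef = {}, {}
    for t in order:                                   # forward pass
        es[t] = max((ef[p] for p in preds[t]), default=0)
        ef[t] = es[t] + durations[t]
    end = max(ef.values())
    succs = {t: [u for u in order if t in preds[u]] for t in order}
    ls, lf = {}, {}
    for t in reversed(order):                         # backward pass
        lf[t] = min((ls[s] for s in succs[t]), default=end)
        ls[t] = lf[t] - durations[t]
    return es, ef, ls, lf

# Numbers consistent with Solution 7.4: A (10) precedes B (20) and C (10),
# which run in parallel.
es, ef, ls, lf = critical_path({"A": 10, "B": 20, "C": 10},
                               {"A": [], "B": ["A"], "C": ["A"]})
```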
Solution 7.5 Little's law tells us that CT = WIP/λ. At peak time, there are 900
customers distributed across 6 h, so the mean arrival rate λ = 150 customers per
hour. On the other hand, WIP = 90 during peak time. Thus, CT = 90/150 = 0.6 h
(i.e., 36 min). During non-peak time, λ = 300/6 = 50 customers per hour while
WIP = 30, thus CT = 30/50 = 0.6 h (again 36 min). If the number of customers
per hour during peak times is expected to go up, but the WIP has to remain constant,
we need to reduce the cycle time per customer. This may be achieved by shortening
the serving time, the interval between the moment a customer enters the restaurant
and the moment he or she places an order, or the time it takes for the customer
to pay. In other words, the process for order taking and payment may need to be
redesigned.
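This solution is a direct application of Little's law; as a quick illustrative check (code and names are ours):

```python
def cycle_time(wip, arrival_rate):
    """Little's law: CT = WIP / lambda."""
    return wip / arrival_rate

peak_ct = cycle_time(wip=90, arrival_rate=900 / 6)       # 150 customers/hour
off_peak_ct = cycle_time(wip=30, arrival_rate=300 / 6)   # 50 customers/hour
# Both come out at 0.6 h, i.e., 36 min.
```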
Solution 7.6
• A call center agent spends 60 + 0.9 × 540 = 546 s per instance.
• One call center agent can deliver 3,600 s per hour, hence 7 agents can deliver
25,200 s per hour.
• μ = 25,200/546 = 46.15 calls per hour.
• For convenience, we use the hour as the time unit. Hence, λ = 24.44 and μ =
46.15, and therefore ρ = 24.44/46.15 = 0.53.
Solution 7.7 Given that there are no other costs, we calculate the cost of the process
by aggregating the resource costs as follows: 0.5 × €25 + 12 × €50 + (4 × €75 +
2 × €100)/(1 − 0.2) = €1,237.50.
Solution 7.8 On average, 0.05 orders are received per day (λ = 0.05), and 0.0625
orders are fulfilled per day (μ = 0.0625). Thus, the resource utilization of this
system ρ = 0.05/0.0625 = 0.8. Using the formulas for M/M/1 models, we can
deduce that the average length of the queue Lq is: 0.8² / (1 − 0.8) = 3.2 orders.
From this, we can conclude that the average time an order spends on the queue is
Wq = 3.2/0.05 = 64 days. Thus, it takes on average W = 64 + 16 = 80 working
days for an order to be fulfilled.
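The M/M/1 formulas used in this solution translate directly into code (an illustrative sketch, with names of our choosing):

```python
def mm1_metrics(lam, mu):
    """Basic M/M/1 measures for arrival rate lam and service rate mu."""
    rho = lam / mu                  # resource utilization
    lq = rho ** 2 / (1 - rho)       # average number of jobs in the queue
    wq = lq / lam                   # average waiting time (Little's law)
    w = wq + 1 / mu                 # average total time in the system
    return rho, lq, wq, w

# Solution 7.8's numbers: 0.05 orders/day arrive, 0.0625 orders/day fulfilled.
rho, lq, wq, w = mm1_metrics(lam=0.05, mu=0.0625)
```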
Solution 7.9 Strictly speaking, we should analyze this problem using an M/M/c
queueing model. However, the formulas for M/M/c are quite complex to show the
calculations in detail. For this reason, we will assume in this solution that the entire
call center behaves as a single monolithic team, so that we can use an M/M/1
queueing model to analyze the problem. Because of this assumption, the results
will not be exact.
If we only had 7 call center agents, then the resource utilization ρ = 40/70 =
0.57, Lq = ρ² / (1 − ρ) = 0.57² / (1 − 0.57) = 0.76, and Wq = Lq/λ = 0.76/40 =
0.0189 h = 1.13 min. So we cannot meet the customer service charter.
If we can handle 80 calls per hour (8 call center agents), then the resource
utilization ρ = 40/80 = 0.5, Lq = ρ² / (1 − ρ) = 0.5² / (1 − 0.5) = 0.5, and
Wq = Lq/λ = 0.5/40 = 0.0125 h = 45 s, so we meet the customer service charter.
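For comparison, the exact M/M/c model mentioned at the start of this solution can be computed with the Erlang C formula. The sketch below (our own code) uses λ = 40 calls per hour and μ = 10 calls per hour per agent, consistent with the utilization figures above. The exact waiting times come out noticeably shorter than the M/M/1 approximation (around 16 s for 7 agents and 5 s for 8 agents), which illustrates why the approximation above is only indicative.

```python
from math import factorial

def mmc_waiting_time(lam, mu, c):
    """Average waiting time Wq in an M/M/c queue (Erlang C formula)."""
    a = lam / mu                       # offered load
    rho = a / c                        # per-server utilization, must be < 1
    last = a ** c / factorial(c)
    p_wait = last / ((1 - rho) * sum(a ** k / factorial(k) for k in range(c))
                     + last)           # probability that a job has to wait
    return p_wait / (c * mu - lam)

# 40 calls/hour; each agent handles 10 calls/hour (6-minute calls).
wq7 = mmc_waiting_time(lam=40, mu=10, c=7) * 60   # in minutes
wq8 = mmc_waiting_time(lam=40, mu=10, c=8) * 60
```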
Ways to reduce costs while staying as close as possible to the customer service
charter are:
• We could reduce the number of call center agents to 7 and still have an average
waiting time of 1.13 min. That reduces costs by 12.5% (one call center agent
less).
• We could introduce a self-service system, whereby people lodge their application
online (at least for simple claims).
• We could extend the call center working times (e.g., work until 6 p.m. or 7 p.m.
instead of 5 p.m.), so that people can call after work. In this way, we might ease
the call center load during its peak time.
• Reduce the time of each call by providing better training to call center agents.
Solution 7.10 For this problem, we will reason exclusively in terms of working
hours as a unit of time, as opposed to calendar hours. We assume that a week consists
of 40 working hours. Calls arrive only during these 40 working hours and call center
operators and claims handlers work only during these 40 h. By taking working hours
as a time unit, we avoid the need to attach calendars to resources.
In the normal scenario (no storm), the arrival rate is 9,000 cases per week, that
is one case every 16 s (this is the inter-arrival time). In the storm scenario the inter-
arrival time is 8 s. In both cases, we use an exponential distribution for the inter-
arrival time. We run simulations corresponding to 1 week of work, which means
9,000 cases for the normal scenario and 18,000 cases for the storm scenario.
In order to distinguish between the two call centers, we define two separate
resource pools called “Call Center Operator 1” and “Call Center Operator 2” each
one with 40 resources at an hourly cost of 30, plus a resource pool “Claims Handler”
with 150 resources. We assign tasks to resource pools as indicated in the scenario
and we use the cycle times indicated in the process model as input for the simulation.
Running the simulation using the BIMP simulator gives us the following outputs.
In the normal scenario, we obtain a resource utilization of around 48% for claims
handlers and 34–36% for call center operators. The average cycle time (excluding
off-timetable hours) is around 0.5 working hours and the maximum observed cycle
time is around 3.3 working hours. In other words, the resources are under-utilized
and, thus, the cycle time is low.
In the storm season, resource utilization of claims handlers is above 95% and
around 78% for the call center agents. The average cycle time is 2 h while
the maximum is around 7.5 h (excluding off-timetable time). The high resource
utilization indicates that the claims handling office is overloaded during storm
season. On the other hand, the call center has sufficient capacity. The average
waiting time in the call center is in the order of seconds.
7.6 Further Exercises 291
A BPMN model of this process together with the simulation parameters (in the
format required by BIMP) can be found in the book’s companion website.12
Exercise 7.11 Calculate the cycle time, cycle time efficiency, and cost of the
university admission process described in Exercise 1.1 (page 5), assuming that:
• The process starts when an online application is submitted.
• It takes on average 2 weeks (after the online application is submitted) for the
documents to arrive to the students service by post.
• The check for completeness of documents takes about 10 min. In 20% of the
cases, the completeness check reveals that some documents are missing. In this
case, an email is sent to the student automatically by the university admission
management system based on the input provided by the international students
officer during the completeness check.
• A student services officer spends on average 10 min to put the degrees and
transcripts in an envelope and to send them to the academic recognition agency.
The time it takes to send the degrees and transcripts to the academic recognition
agency and to receive back a response is 2 weeks on average.
• About 10% of applications are rejected after the academic recognition assess-
ment.
• The university pays a fee of € 5 each time it requests the academic recognition
agency to accept an application.
• Checking the English language test results takes 1 day on average, but the officer
who performs the check only spends 10 min on average per check. This language
test check is free.
• About 10% of applications are rejected after the English language test.
• It takes on average 2 weeks between the time students service sends the copy of
an application to the committee members and the moment the committee makes
a decision (accept or reject). On average, the committee spends 1 h examining
each application.
• It takes on average 2 days (after the decision is made by the academic committee)
for the students service to record the academic committee’s decision in the
university admission management system. Recording a decision takes on average
2 min. Once a decision is recorded, a notification is automatically sent to the
student.
• The hourly cost of the officers at the international students office is € 50.
• The hourly cost of the academic committee (as a whole) is € 200.
12 http://fundamentals-of-bpm.org/supplementary-material/.
Calculate the cycle time efficiency and the cost-per-execution of the as-is process
assuming that:
• Submitting and registering a new request takes 5 min on average.
• Requests spend on average 1 h waiting for a Level-1 staff member to check them.
This applies both to new requests and to resubmitted requests.
• Checking if a new request is known takes on average 10 min. In 20% of the cases,
the request is known. In this case, it takes between 2 and 10 min (average 5 min)
for the Level-1 staff member to communicate the resolution to the client. Once
this is done, the request is marked as “closed”. On the other hand, if the request
is not known, the request is automatically forwarded to Level-2.
• New requests spend on average 2 h waiting for a Level-2 staff member to evaluate
them. Level-2 staff take on average 20 min to evaluate a new request.
• Level-2 staff take 5 min to prioritize a request.
• The time between the moment a request has been prioritized and the moment the
request is picked up by a Level-2 staff member is 20 h.
• The time required to research and resolve a request is on average 2 h.
€ 250,000 in revenue, if they do not reduce their order fulfillment time below 40
working days. Adding one engineer to the existing team would reduce the time
to design the hardware down to 14 working days (from 16 days). An additional
engineer would cost the company € 50,000. On the other hand, hiring a second
engineering team would cost € 250,000. Analyze these two scenarios and formulate
a recommendation to the company.
Exercise 7.15 We consider a Level-2 IT service desk with 2 staff members. Each
staff member can handle one service request in 4 working hours on average. Service
times are exponentially distributed. Requests arrive at a mean rate of one request
every 3 h according to a Poisson process. What is the average time between the
moment a service request arrives to this desk and the moment it is fulfilled?
Exercise 7.16 Consider the Level-2 IT service desk described in Exercise 7.15.
Let us assume that the number of requests is one per hour. How many level-2 staff
members are required in order to ensure that the mean waiting time of a request is
less than two working hours?
Exercise 7.17 Consider again the IT helpdesk process described in Exercise 7.12
(page 291). Model and simulate it assuming that cases arrive at a rate of 50 per day
according to an exponential distribution. Assume that all the task cycle times follow
an exponential distribution with the average given in Exercise 7.12.
Note When modeling the process, do not model the waiting times between tasks,
only the tasks themselves.
Exercise 7.18 Consider the process model in Figure 7.17. This model captures
a simplified process for handling mortgage applications. There are two checks
involved. CT1 deals with a check of the financial coverage of the mortgage
application. The second check CT2 concerns the verification of the property that is
to be mortgaged. If the result of both checks is positive, the application is accepted
(task AC). On average, after the execution of task CT1, 20% of all applications
are rejected. Meanwhile, task CT2 leads to 30% of further rejections. [Figure 7.17
shows the model; the recoverable annotations indicate that 80% of applications pass
CT1 and 70% pass CT2, with rejections routed to task “AW: Reject mortgage
application” and the end event “Mortgage application rejected”.] If either of the
checks has an unsatisfactory result, the application is rejected (task AW). The arrival
process is Poisson with an average arrival of 5 cases per hour during business hours.
For each task, exactly one dedicated resource is available. The processing time of
every task follows an exponential distribution. The mean processing times for tasks
CT1, CT2, AC, and AW are respectively 5, 4, 3, and 3 min. The wage of each
resource is € 20 per hour. Business hours are from Monday to Friday from 9 a.m.
to 5 p.m. Resources are only available during these hours.
a. Determine the resource utilization of each resource.
b. Determine the average cycle time of the process.
c. Determine the cycle time efficiency of the process.
d. Determine the average number of mortgage applications that are being handled
at any given point in time.
Hint For this exercise, it might be convenient to use a combination of process
simulation, Little’s law, and flow analysis.
In Section 7.1, we showed how flow analysis techniques can be used to calculate
cycle time and cost. Laguna & Marklund [85] discuss flow analysis in detail.
Another possible application of flow analysis is to estimate the error rate of the
process, meaning the number of cases that will end up in a negative outcome. This
latter application of flow analysis is discussed for example by Yang et al. [196].
Yang et al. also present a technique for flow analysis that is applicable not only to
block-structured process models but also to a much broader class of process models.
As mentioned in Section 7.2, the formula for determining the average queue
length in the context of the M/M/c model is particularly complicated. Laguna &
Marklund [85, Chapter 6] analyze the M/M/c model (including the formula for
average queue length) and its application to process analysis. They also analyze the
M/M/c/K model, where an upper-bound to the length of the queue is imposed (this
is parameter K in the model). The M/M/c/K model is suitable for example when
there is a maximum length of queue beyond which customers are rejected from the
queue. Adan & Resing [3] give detailed introductions to M/M/1, M/M/c, M/M/c/K
and other queueing theory models.
As stated in Section 7.3, business process simulation is a versatile approach
for quantitative process analysis. Numerous case studies illustrating the use of
process simulation in various domains can be found in the literature. For example,
Greasley [58] illustrates the use of business process simulation for redesigning a
process for road traffic accident reporting. In a similar vein, Van der Aalst et al. [178]
discuss the use of business process simulation to evaluate different strategies to
avoid or to mitigate deadline violations in the context of an insurance claims
handling process in an insurance company. Exercise 7.10 is based on this latter
paper.
Current tools for business process simulation have various limitations. Several
of these limitations are discussed at length by Van der Aalst et al. [177]. Research
by Martin et al. discusses how data about previous executions of the process can
be used to build more accurate simulation models [104, 105]. Van der Aalst et
al. [177] propose to use more sophisticated tools for process simulation, namely
Discrete-Event Simulation (DES) tools. They specifically put forward CPN Tools
as a possible DES that can be used for business process simulation. CPN Tools
is based on Colored Petri Nets—a language that extends Petri nets. Other DES
tools that can be used for business process simulation include ExtendSim [85] and
Arena [75]. For example, Arena is used in the aforementioned case study of a road
traffic reporting process [58]. DES tools are clearly more powerful than specialized
business process simulation tools. However, the choice of a DES tool means that one
cannot directly use a BPMN model for simulation. Instead the model has to be re-
encoded in another notation. Moreover, the use of DES tools requires more technical
background from the analyst. These trade-offs should be considered when choosing
between DES tools and specialized business process simulation tools based for
example on BPMN.
We saw throughout the chapter that quantitative analysis techniques allow us to
identify critical paths and bottlenecks. These are essentially paths in the process
that require special attention if the goal is to reduce cycle time. Anupindi et al. [9]
offer detailed advice on how to deal with critical paths and bottlenecks in business
processes as well as how to reduce waste and repetition. The following chapter will
discuss some of these insights.