ProM 4.0: Comprehensive Support for Real
Process Analysis
W.M.P. van der Aalst¹, B.F. van Dongen¹, C.W. Günther¹, R.S. Mans¹, A.K.
Alves de Medeiros¹, A. Rozinat¹, V. Rubin²,¹, M. Song¹, H.M.W. Verbeek¹,
and A.J.M.M. Weijters¹
¹ Eindhoven University of Technology, Eindhoven, The Netherlands
{w.m.p.v.d.aalst}@tue.nl
² University of Paderborn, Paderborn, Germany
Abstract. This tool paper describes the functionality of ProM. Version
4.0 of ProM was released at the end of 2006 and reflects
recent achievements in process mining. Process mining techniques attempt to extract non-trivial and useful information from so-called “event
logs”. One element of process mining is control-flow discovery, i.e., automatically constructing a process model (e.g., a Petri net) describing the
causal dependencies between activities. Control-flow discovery is an interesting and practically relevant challenge for Petri-net researchers and
ProM provides an excellent platform for this. For example, the theory
of regions, genetic algorithms, free-choice-net properties, etc. can be exploited to derive Petri nets based on example behavior. However, as we
will show in this paper, the functionality of ProM 4.0 is not limited to
control-flow discovery. ProM 4.0 also allows for the discovery of other
perspectives (e.g., data and resources) and supports related techniques
such as conformance checking, model extension, model transformation,
verification, etc. This makes ProM a versatile tool for process analysis which is not restricted to model analysis but also includes log-based
analysis.
1 Introduction
The first version of ProM was released in 2004. The initial goal of ProM was to
unify process mining efforts at Eindhoven University of Technology and other
cooperating groups [4]. Traditionally, most analysis tools focusing on processes
are restricted to model-based analysis, i.e., a model is used as the starting point
of analysis. For example, the alternating-bit protocol can be modeled as a Petri
net and verification techniques can then be used to check the correctness of the
protocol while simulation can be used to estimate performance aspects. Such
analysis is only useful if the model reflects reality. Process mining techniques use
event logs as input, i.e., information recorded by systems ranging from information systems to embedded systems. Hence the starting point is not a model
but the observed reality. Therefore, we use the phrase real process analysis to
position process mining with respect to classical model-based analysis. Note that
ProM also uses models (e.g., Petri nets). However, these models (1) are discovered from event logs, (2) are used to reflect on the observed reality (conformance
checking), or (3) are extended based on information extracted from logs.
Process mining is relevant since more and more information about processes
is collected in the form of event logs. The widespread use of information systems,
e.g., systems constructed using ERP, WFM, CRM, SCM, and PDM software,
resulted in the omnipresence of vast amounts of event data. Events may be
recorded in the form of audit trails, transactions logs, or databases and may
refer to patient treatments, order processing, claims handling, trading, travel
booking, etc. Moreover, recently, more and more devices started to collect data
using TCP/IP, GSM, Bluetooth, and RFID technology (cf. high-end copiers,
wireless sensor networks, medical systems, etc.).
Table 1. Comparing ProM 1.1 presented in [7] with ProM 4.0.

Version                     ProM 1.1   ProM 4.0
Mining plug-ins                    6         27
Analysis plug-ins                  7         35
Import plug-ins                    4         16
Export plug-ins                    9         28
Conversion plug-ins                3         22
Log filter plug-ins                0         14
Total number of plug-ins          29        142
At the Petri net conference in 2005, Version 1.1 of ProM was presented [7].
In the last two years ProM has been extended dramatically and currently dozens
of researchers are developing plug-ins for ProM. ProM is open source and uses
a pluggable architecture, e.g., people can add new process mining techniques
by adding plug-ins without spending any effort on the loading and filtering of
event logs and the visualization of the resulting models. An example is the plug-in
implementing the α-algorithm [5], i.e., a technique to automatically derive Petri
nets from event logs. The version of ProM presented at the Petri net conference
in 2005 (Version 1.1) contained only 29 plug-ins. Version 4.0 provides 142 plug-ins, i.e., there are almost five times as many plug-ins. Moreover, there have been
spectacular improvements in the quality of mining algorithms and the scope
of ProM has been extended considerably. This is illustrated by Table 1 which
compares the version presented in [7] with the current version. To facilitate the
understanding of Table 1, we briefly describe the six types of plug-ins:
– Mining plug-ins implement some mining algorithm, e.g., the α-miner to discover a Petri net [5] or the social network miner to discover a social network
[1].
– Export plug-ins implement some “save as” functionality for specific objects
in ProM. For example, there are plug-ins to save Petri nets, EPCs, social
networks, YAWL, spreadsheets, etc. often also in different formats (PNML,
CPN Tools, EPML, AML, etc.).
– Import plug-ins implement an “open” functionality for specific objects, e.g.,
load instance-EPCs from ARIS PPM or BPEL models from WebSphere.
– Analysis plug-ins typically implement some property analysis on some
mining result. For example, for Petri nets there is a plug-in which constructs
place invariants, transition invariants, and a coverability graph. However,
there are also analysis plug-ins to compare a log and a model (i.e., conformance checking) or a log and an LTL formula. Moreover, there are analysis
plug-ins related to performance measurement (e.g., projecting waiting times
onto a Petri net).
– Conversion plug-ins implement conversions between different data formats,
e.g., from EPCs to Petri nets or from Petri nets to BPEL.
– Log filter plug-ins implement different ways of “massaging” the log before
applying process mining techniques. For example, there are plug-ins to select
different parts of the log, to abstract from infrequent behavior, clean the log
by removing incomplete cases, etc.
In this paper we do not elaborate on the architecture and implementation framework for plug-ins (for this we refer to [7]). Instead we focus on the functionality
provided by the many new plug-ins in ProM 4.0.
The remainder of this paper is organized as follows. Section 2 provides an
overview of process mining and briefly introduces the basic concepts. Section 3
describes the “teleclaims” process of an Australian insurance company. A log of
this process is used as a running example and is used to explain the different
types of process mining: Discovery (Section 4), Conformance (Section 5), and
Extension (Section 6). Section 7 briefly mentions additional functionality such
as verification and model transformation. Section 8 concludes the paper.
2 Overview
The idea of process mining is to discover, monitor and improve real processes
(i.e., not assumed processes) by extracting knowledge from event logs. Today
many of the activities occurring in processes are either supported or monitored
by information systems. Consider for example ERP, WFM, CRM, SCM, and
PDM systems to support a wide variety of business processes while recording
well-structured and detailed event logs. However, process mining is not limited to
information systems and can also be used to monitor other operational processes
or systems. For example, we have applied process mining to complex X-ray
machines, high-end copiers, web services, wafer steppers, careflows in hospitals,
etc. All of these applications have in common that there is a notion of a process
and that occurrences of activities are recorded in so-called event logs.
Assuming that we are able to log events, a wide range of process mining
techniques comes within reach. The basic idea of process mining is to learn from
observed executions of a process; this knowledge can be used to (1) discover new models
(e.g., constructing a Petri net that is able to reproduce the observed behavior),
(2) check the conformance of a model by checking whether the modeled behavior
matches the observed behavior, and (3) extend an existing model by projecting
information extracted from the logs onto some initial model (e.g., show bottlenecks in a process model by analyzing the event log). All three types of analysis
have in common that they assume the existence of some event log. Figure 1 shows
the three types of process mining. Each of these is supported by ProM through
various plug-ins as will be shown in the remainder using a running example.
Fig. 1. Overview showing three types of process mining supported by ProM: (1) Discovery, (2) Conformance, and (3) Extension.
3 Running Example
As a working example, we consider the “teleclaims” process of an Australian
insurance company described in [2]. This process deals with the handling of
inbound phone calls, whereby different types of insurance claims (household, car,
etc.) are lodged over the phone. The process is supported by two separate call
centres operating for two different organizational entities (Brisbane and Sydney).
Both centres are similar in terms of incoming call volume (approx. 9,000 per
week) and average total call handling time (550 seconds), but different in the
way call centre agents are deployed, underlying IT systems, etc. The teleclaims
process model is shown in Figure 2. The two highlighted boxes at the top show
the subprocesses in both call centres. The lower part describes the process in the
back-office.
This process model is expressed in terms of an Event-Driven Process Chain
(EPC) (see [8] for a discussion on the semantics of EPCs). For the purpose of
the paper it is not necessary to understand the process and EPC notation in
any detail. However, for a basic understanding, consider the subprocess corresponding to the call centre in Brisbane. The process starts with event “Phone
call received”. This event triggers function “Check if sufficient information is
available”. This function is executed by a “Call Center Agent”. Then a choice is
made. The circle represents a so-called connector. The “x” inside the connector
and the two outgoing arcs indicate that it is an exclusive OR-split (XOR). The
XOR connector results in event “Sufficient information is available” or event
“Sufficient information is not available”. In the latter case the process ends. If
the information is available, the claim is registered (cf. function “Register claim”
also executed by a “Call Center Agent”) resulting in event “Claim is registered”.
The call centre in Sydney has a similar subprocess and the back-office process
should be self-explanatory after this short introduction to EPCs. Note that there
are three types of split and join connectors: AND, XOR, and OR, e.g., in the
back-office process there is one AND-split (∧) indicating that the last part is
executed in parallel.
[Figure 2: EPC diagram with the subprocesses “Call Centre Brisbane / 24x7” and “Call Centre Sydney / 5 days, 9-5” (frequency, weekly: 9,000 each) at the top and the back-office process performed by the Claims Handler below.]
...
<ProcessInstance id="3055" description="Claim being handled">
  <AuditTrailEntry>
    <Data>
      <Attribute name="call centre">Sydney</Attribute>
    </Data>
    <WorkflowModelElement>incoming claim</WorkflowModelElement>
    <EventType>complete</EventType>
    <Timestamp>2006-12-01T07:51:05.000+01:00</Timestamp>
    <Originator>customer</Originator>
  </AuditTrailEntry>
  <AuditTrailEntry>
    <Data>
      <Attribute name="location">Sydney</Attribute>
    </Data>
    <WorkflowModelElement>check if sufficient information is available</WorkflowModelElement>
    <EventType>start</EventType>
    <Timestamp>2006-12-01T07:51:05.000+01:00</Timestamp>
    <Originator>Call Centre Agent Sydney</Originator>
  </AuditTrailEntry>
  <AuditTrailEntry>
    <Data>
      <Attribute name="location">Sydney</Attribute>
    </Data>
    <WorkflowModelElement>check if sufficient information is available</WorkflowModelElement>
    <EventType>complete</EventType>
    <Timestamp>2006-12-01T07:51:25.000+01:00</Timestamp>
    <Originator>Call Centre Agent Sydney</Originator>
  </AuditTrailEntry>
  ...
  <AuditTrailEntry>
    <Data>
      <Attribute name="outcome">processed</Attribute>
      <Attribute name="duration">1732</Attribute>
    </Data>
    <WorkflowModelElement>end</WorkflowModelElement>
    <EventType>complete</EventType>
    <Timestamp>2006-12-01T08:19:57.000+01:00</Timestamp>
    <Originator>Claims handler</Originator>
  </AuditTrailEntry>
</ProcessInstance>
...
Fig. 2. Insurance claim handling EPC [2].
Fig. 3. Fragment of the MXML log containing 3512 cases (process instances) and 46138 events (audit trail entries).
Figure 3 shows a fragment of the log in MXML format, the format used
by ProM. In this case, the event log was obtained from a simulation using
CPN Tools. Using ProMimport one can extract logs from a wide variety of systems, e.g., workflow management systems like Staffware, case handling systems
like FLOWer, ERP components like PeopleSoft Financials, simulation tools like
ARIS and CPN Tools, middleware systems like WebSphere, BI tools like ARIS
PPM, etc., and it has also been used to develop many organization/system-specific conversions (e.g., hospitals, banks, governments, etc.). Figure 3 illustrates the typical data present in most event logs, i.e., a log is composed of
process instances (i.e., cases) and within each instance there are audit trail entries (i.e., events) with various attributes. Note that it is not required that systems log all of this information, e.g., some systems do not record transactional
information (e.g., just the completion of activities is recorded), related data, or
timestamps. In the MXML format only the ProcessInstance (i.e., case) field and
the WorkflowModelElement (i.e., activity) field are obligatory, i.e., any event
needs to be linked to a case (process instance) and an activity. All other fields
(data, timestamps, resources, etc.) are optional.
Fig. 4. A Petri net discovered using ProM based on an analysis of the 3512 cases.
For control-flow discovery, e.g., deriving a Petri net model from an MXML
file, we often focus on the ordering of activities within individual cases. In this
context, a single case σ can be described by a sequence of activities, i.e., a trace
σ ∈ A∗ where A is the set of activities. Consequently, such an abstraction of the
log can be described by a multiset of traces.
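The abstraction of a log into a multiset of traces can be sketched in a few lines of Python. This is only an illustration and not ProM code: the function name traces_from_mxml, the helper functions, and the toy log are hypothetical, and the choices to keep only "complete" events and to treat a missing EventType as "complete" are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def traces_from_mxml(xml_text, event_type="complete"):
    """Return a multiset (Counter) mapping each trace to its frequency."""
    root = ET.fromstring(xml_text)
    traces = Counter()
    for instance in root.iter("ProcessInstance"):
        trace = []
        for entry in instance.iter("AuditTrailEntry"):
            # Assumption: a missing or empty EventType counts as "complete".
            etype = (entry.findtext("EventType") or "complete").strip()
            if etype == event_type:
                activity = entry.findtext("WorkflowModelElement", default="")
                # Normalize whitespace (activity names may span lines).
                trace.append(" ".join(activity.split()))
        traces[tuple(trace)] += 1
    return traces


def make_entry(activity, etype):
    """Build one audit trail entry (hypothetical helper for the toy log)."""
    return (f"<AuditTrailEntry><WorkflowModelElement>{activity}"
            f"</WorkflowModelElement><EventType>{etype}</EventType>"
            f"</AuditTrailEntry>")


def make_instance(case_id, activities):
    """Build one case in which every activity has a start and a complete event."""
    body = "".join(make_entry(a, "start") + make_entry(a, "complete")
                   for a in activities)
    return f'<ProcessInstance id="{case_id}">{body}</ProcessInstance>'


# Hypothetical toy log: two cases share a trace, one case ends early.
toy_log = "<Process>" + "".join([
    make_instance("1", ["check", "register claim", "end"]),
    make_instance("2", ["check", "register claim", "end"]),
    make_instance("3", ["check", "end"]),
]) + "</Process>"

multiset = traces_from_mxml(toy_log)
for trace, frequency in multiset.items():
    print(frequency, trace)
```

Representing each trace as a tuple makes it hashable, so the Counter directly plays the role of the multiset over A∗.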
4 Discovery
Process mining techniques supporting discovery do not assume an a-priori model,
i.e., based on an event log, some model is constructed (cf. Figure 1). ProM 4.0
offers 27 mining plug-ins able to construct a wide variety of models. One of the
first plug-ins was the α-miner [5] which constructs a Petri net model from an
MXML log, i.e., based on an analysis of the log which does not contain any
explicit process information (e.g., AND/XOR-splits/joins), a process model is
derived. However, the α-miner is unable to discover complex process models.
For example, it is unable to correctly discover the teleclaims process illustrated
in Figure 2. However, ProM 4.0 has several new mining plug-ins that are able
to correctly discover this process using various approaches (regions, heuristics,
genetic algorithms, etc.) and representations (Petri nets, EPCs, transition systems, heuristic nets).
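To give an impression of how control-flow information can be derived from a log without explicit AND/XOR information, the following sketch computes the ordering relations on which the α-algorithm [5] is based: causality, parallelism, and absence of any relation, all derived from the "directly follows" relation. It covers only this first step and does not construct a Petri net. The function names and the toy log are hypothetical and not taken from ProM.

```python
from collections import Counter
from itertools import product


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    df = Counter()
    for trace, freq in traces.items():
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += freq
    return df


def footprint(traces):
    """Classify every pair of activities as '->', '<-', '||' or '#'."""
    df = directly_follows(traces)
    activities = sorted({a for trace in traces for a in trace})
    relation = {}
    for a, b in product(activities, repeat=2):
        ab, ba = df[(a, b)] > 0, df[(b, a)] > 0
        if ab and not ba:
            relation[(a, b)] = "->"   # a causes b
        elif ba and not ab:
            relation[(a, b)] = "<-"   # b causes a
        elif ab and ba:
            relation[(a, b)] = "||"   # a and b occur in either order
        else:
            relation[(a, b)] = "#"    # a and b never follow each other
    return relation


# Hypothetical toy log: b and c are executed in parallel between a and d.
toy_log = Counter({("a", "b", "c", "d"): 2, ("a", "c", "b", "d"): 1})
relations = footprint(toy_log)
print(relations[("a", "b")], relations[("b", "c")], relations[("a", "d")])
```

Because these relations only look at direct succession, a single noisy trace can change them, which is one reason why more robust approaches are needed for real-life logs.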
Figure 4 shows a Petri net discovered by ProM. The top window shows
the overall process while the second window zooms in on the first part of the
discovered model. This model is behaviorally equivalent to the EPC model in
Figure 2 and has been obtained using an approach which first builds a transition
system (see Figure 5) and then uses extensions of the classical theory of regions
[6] to construct a Petri net. ProM provides various ways to extract transition
systems from logs, a plug-in to construct regions on-the-fly, and an import and
export plug-in for Petrify [6] (see [3] for details).
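The first step of this two-step approach can be sketched as follows. The state abstraction used here, the set of activities that have occurred so far in a case, is just one possible choice and is an assumption of this sketch; the function name transition_system and the toy log are hypothetical. The second step (synthesizing a Petri net using regions) is not shown.

```python
def transition_system(traces):
    """Build (states, transitions) using 'set of past activities' as state."""
    initial = frozenset()
    states = {initial}
    transitions = set()
    for trace in traces:
        state = initial
        for activity in trace:
            # frozenset | set yields a frozenset, so states stay hashable.
            next_state = state | {activity}
            transitions.add((state, activity, next_state))
            states.add(next_state)
            state = next_state
    return states, transitions


# Hypothetical toy log: b and c may occur in either order between a and d.
toy_traces = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
states, transitions = transition_system(toy_traces)
for source, activity, target in sorted(
        transitions, key=lambda t: (len(t[0]), t[1])):
    print(sorted(source), activity, sorted(target))
```

For the toy log the two interleavings of b and c form a "diamond" in the transition system, which is the kind of structure that region-based synthesis can turn into concurrency in the resulting Petri net.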
Process mining is not limited to process models (i.e., control flow). ProM
also allows for the discovery of models related to data, time, transactions, and
resources. As an example, Figure 6 shows the plug-in to extract social networks
from event logs using the technique presented in [1]. The social network shown in
Figure 6 is constructed based on frequencies of work being transferred from one
resource class to another. The diagram adequately shows that work is generated
by customers and then flows via the call centre agents to the claims handlers in
the back office.
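A frequency-based handover-of-work count of the kind described above can be sketched as follows. This is a simplified illustration and not the implementation of [1]: only direct successions within a case are counted, handovers from a resource to itself are ignored (an assumption), and the function name and toy cases are hypothetical.

```python
from collections import Counter


def handover_of_work(cases):
    """Count transfers of work between consecutive originators in each case.

    cases: iterable of event lists, each event being (activity, originator).
    """
    handovers = Counter()
    for events in cases:
        originators = [originator for _, originator in events]
        for source, target in zip(originators, originators[1:]):
            if source != target:  # assumption: ignore self-handovers
                handovers[(source, target)] += 1
    return handovers


# Hypothetical toy cases using resource classes in the style of Figure 3.
toy_cases = [
    [("incoming claim", "customer"),
     ("check", "Call Centre Agent Sydney"),
     ("register claim", "Call Centre Agent Sydney"),
     ("assess claim", "Claims handler")],
    [("incoming claim", "customer"),
     ("check", "Call Centre Agent Brisbane")],
]

network = handover_of_work(toy_cases)
for (source, target), weight in network.items():
    print(f"{source} -> {target}: {weight}")
```

The resulting weighted pairs can be read as the arcs of a social network in which nodes are resources or resource classes.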
It is impossible to provide an overview of all the discovery algorithms supported. However, of the 27 mining plug-ins we would like to mention the heuristics miner (Figure 7) able to discover processes in the presence of noise and
the multi-phase miner using an EPC representation. Both approaches are more
robust than the region-based approach and the classical α-algorithm. It is also
possible to convert models of one type to another. For example, Figure 8 shows
the EPC representation of the Petri net in Figure 4.
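The robustness to noise of a heuristic approach can be illustrated with a frequency-based dependency measure. The formula used below, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), is the measure commonly given for the heuristics miner in the literature; it is not spelled out in this paper, so it should be read as an assumption of this sketch, as should the threshold of 0.8, the function names, and the toy log.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    df = Counter()
    for trace, freq in traces.items():
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += freq
    return df


def dependency(df, a, b):
    """Dependency measure between two different activities a and b."""
    ab, ba = df[(a, b)], df[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical toy log: 20 regular cases and one noisy case.
toy_log = Counter({("a", "b", "c"): 20, ("a", "c", "b"): 1})
df = directly_follows(toy_log)

THRESHOLD = 0.8  # hypothetical cut-off for accepting a dependency
kept = {pair for pair in list(df) if dependency(df, *pair) >= THRESHOLD}
for a, b in sorted(kept):
    print(f"{a} => {b}: {dependency(df, a, b):.3f}")
```

In the toy log the noisy trace introduces the successions a>c and c>b, but their dependency values stay below the threshold, so only a => b and b => c are kept.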
5 Conformance
Conformance checking requires, in addition to an event log, some a-priori model.
This model may be handcrafted or obtained through process discovery. Whatever its source, ProM provides various ways of checking whether reality conforms
to such a model. For example, there may be a process model indicating that purchase orders of more than one million Euro require two checks. Another example
is the checking of the so-called “four-eyes principle”. Conformance checking may
be used to detect deviations, to locate and explain these deviations, and to
measure the severity of these deviations. ProM 4.0 also supports conformance
checking, i.e., comparing an a-priori model with the observed reality stored in
some MXML log. For example, we could take the discovered model shown in
Figure 4 and compare it with the log shown in Figure 3 using the conformance
checking plug-in in ProM. Figure 9 shows the result. This analysis shows that the
fitness of the model is 1.0, i.e., the model is able to “parse” all cases. The conformance checker also calculates metrics such as behavioral appropriateness (i.e.,
precision) and structural appropriateness [9] all indicating that the discovered
model is indeed a good reflection of reality. Note that, typically, conformance
checking is done not with respect to a discovered model, but with respect to
some normative/descriptive hand-crafted model. For example, given an event
log obtained from the real teleclaims process it would be interesting to detect
potential deviations from the process model in Figure 2. If there is no
complete a-priori process model but just a set of requirements (e.g., business
rules), ProM’s LTL checker can be used.
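The idea of measuring fitness by letting a model "parse" the log can be sketched with a simple token replay on a Petri net. The exact metric computed by ProM's conformance checker is not given in this paper; the formula below, which penalizes missing and remaining tokens relative to consumed and produced tokens, is an assumption in the spirit of token-based replay. The net encoding, the function name, and the toy data are hypothetical, and every activity in the log is assumed to correspond to exactly one transition.

```python
from collections import Counter


def replay_fitness(net, initial_place, final_place, traces):
    """Token-replay fitness of a log with respect to a Petri net.

    net maps each activity to (input_places, output_places).
    """
    produced = consumed = missing = remaining = 0
    for trace, freq in traces.items():
        marking = Counter({initial_place: 1})
        p, c, m = 1, 0, 0  # the initial token counts as produced
        for activity in trace:
            inputs, outputs = net[activity]
            for place in inputs:
                if marking[place] == 0:
                    m += 1  # token is missing and created artificially
                else:
                    marking[place] -= 1
                c += 1
            for place in outputs:
                marking[place] += 1
                p += 1
        # At the end of the case the token in the final place is consumed.
        if marking[final_place] == 0:
            m += 1
        else:
            marking[final_place] -= 1
        c += 1
        r = sum(marking.values())  # tokens left behind
        produced += freq * p
        consumed += freq * c
        missing += freq * m
        remaining += freq * r
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Hypothetical net: check, then either register followed by close, or stop.
toy_net = {
    "check": (["start"], ["p1"]),
    "register": (["p1"], ["p2"]),
    "close": (["p2"], ["end"]),
    "stop": (["p1"], ["end"]),
}
fitting_log = Counter({("check", "register", "close"): 9, ("check", "stop"): 1})
deviating_log = Counter({("check", "close"): 1})

fit_ok = replay_fitness(toy_net, "start", "end", fitting_log)
fit_bad = replay_fitness(toy_net, "start", "end", deviating_log)
print(fit_ok, fit_bad)
```

A fitness of 1.0 means that all cases can be replayed without missing or remaining tokens; the deviating case skips the registration step and therefore yields a lower value.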
Fig. 5. Transition system used to construct the Petri net in Figure 4.
Fig. 6. Social network obtained using the “handover of work” metric.
Fig. 7. Heuristics net obtained by applying the heuristics miner to the log of Figure 3.
Fig. 8. EPC discovered from the log in Figure 3.
Fig. 9. Conformance checker.
Fig. 10. Performance analyzer.
6 Extension
For model extension it is also assumed that there is an initial model (cf. Figure 1).
This model is extended with a new aspect or perspective, i.e., the goal is not
to check conformance but to enrich the model with performance/time aspects,
organizational/resource aspects, and data/information aspects. Consider for example a Petri net (either discovered, hand-crafted, or resulting from some model
transformation) describing a process which is also logged. It is possible to enrich
the Petri net using information in the log. Most logs also contain information
about resources, data, and time. ProM 4.0 supports for example decision mining,
i.e., by analyzing the data attached to events and using classical decision tree
analysis, it is possible to add decision rules to the Petri net (represented as conditions on arcs). Information about resources (Originator field in the MXML log)
can be analyzed and used to add allocation rules to a Petri net. Figure 10 shows
a performance analysis plug-in which projects timing information on places and
transitions. It graphically shows the bottlenecks and all kinds of performance indicators, e.g., average/variance of the total flow time or the time spent between
two activities. The information coming from all kinds of sources can be stitched
together and exported to CPN Tools, i.e., ProM is able to turn MXML logs into
colored Petri nets describing all perspectives (control-flow, data, time, resources,
etc.). CPN Tools can then be used to simulate the process without adding any
additional information to the generated model.
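The kind of timing information that is projected onto a model can be derived directly from the timestamps in the log. The following sketch computes the average service time per activity (from matching start and complete events) and the total flow time of a case, using the timestamps of the case shown in Figure 3. The function names are hypothetical, the sketch assumes that the start and complete events of an activity do not interleave within a case, and it does not reflect how the ProM plug-in is implemented.

```python
from collections import defaultdict
from datetime import datetime

# Events of case 3055 as (activity, event type, timestamp), see Figure 3.
events = [
    ("incoming claim", "complete", "2006-12-01T07:51:05.000+01:00"),
    ("check if sufficient information is available", "start",
     "2006-12-01T07:51:05.000+01:00"),
    ("check if sufficient information is available", "complete",
     "2006-12-01T07:51:25.000+01:00"),
    ("end", "complete", "2006-12-01T08:19:57.000+01:00"),
]


def service_times(case_events):
    """Average time in seconds between start and complete per activity."""
    started = {}
    durations = defaultdict(list)
    for activity, event_type, timestamp in case_events:
        moment = datetime.fromisoformat(timestamp)
        if event_type == "start":
            started[activity] = moment
        elif event_type == "complete" and activity in started:
            elapsed = moment - started.pop(activity)
            durations[activity].append(elapsed.total_seconds())
    return {a: sum(d) / len(d) for a, d in durations.items()}


def flow_time(case_events):
    """Time in seconds between the first and the last event of a case."""
    moments = [datetime.fromisoformat(ts) for _, _, ts in case_events]
    return (max(moments) - min(moments)).total_seconds()


averages = service_times(events)
total = flow_time(events)
print(averages, total)
```

For this case the check takes 20 seconds and the flow time is 1732 seconds, which agrees with the "duration" attribute recorded in the last event of Figure 3.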
7 Additional Functionality
It is not possible to give a complete overview of all 142 plug-ins. The figures
shown in previous sections reflect only the functionality of 7 plug-ins. However,
it is important to note that the functionality of ProM is not limited to process
mining. ProM also allows for model conversion. For example, a model discovered
in terms of a heuristic net can be mapped onto an EPC which can be converted
into a Petri net which is saved as a YAWL file that can be uploaded in the
workflow system YAWL thereby directly enacting the discovered model. For
some of the models, ProM also provides analysis plug-ins. For example, the basic
Petri net analysis techniques (invariants, reachability graphs, reduction rules, S-components, soundness checks, etc.) are supported. There are also interfaces
to different analysis (e.g., Petrify, Fiona, and Woflan) and visualization (e.g.,
FSMView and DiaGraphica) tools.
8 Conclusion
ProM 4.0 consolidates the state-of-the-art of process mining. It provides a pluggable environment for process mining offering a wide variety of plug-ins for process
discovery, conformance checking, model extension, model transformation, etc.
ProM is open source and can be downloaded from www.processmining.org. Many
of its plug-ins work on Petri nets, e.g., there are several plug-ins to discover Petri
nets using techniques ranging from genetic algorithms and heuristics to regions
and partial orders. Moreover, Petri nets can be analyzed in various ways using
the various analysis plug-ins.
Acknowledgements. The development of ProM is supported by EIT, NWO-EW, the Technology Foundation STW, and the IOP program of the Dutch Ministry of Economic Affairs.
References
1. W.M.P. van der Aalst, H.A. Reijers, and M. Song. Discovering Social Networks
from Event Logs. Computer Supported Cooperative Work, 14(6):549–593, 2005.
2. W.M.P. van der Aalst, M. Rosemann, and M. Dumas. Deadline-based Escalation
in Process-Aware Information Systems. Decision Support Systems, 43(2):492–511,
2007.
3. W.M.P. van der Aalst, V. Rubin, B.F. van Dongen, E. Kindler, and C.W. Günther.
Process Mining: A Two-Step Approach using Transition Systems and Regions. BPM
Center Report BPM-06-30, BPMcenter.org, 2006.
4. W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and
A.J.M.M. Weijters. Workflow Mining: A Survey of Issues and Approaches. Data
and Knowledge Engineering, 47(2):237–267, 2003.
5. W.M.P. van der Aalst, A.J.M.M. Weijters, and L. Maruster. Workflow Mining:
Discovering Process Models from Event Logs. IEEE Transactions on Knowledge
and Data Engineering, 16(9):1128–1142, 2004.
6. J. Cortadella, M. Kishinevsky, L. Lavagno, and A. Yakovlev. Deriving Petri Nets
from Finite Transition Systems. IEEE Transactions on Computers, 47(8):859–882,
August 1998.
7. B.F. van Dongen, A.K. Alves de Medeiros, H.M.W. Verbeek, A.J.M.M. Weijters,
and W.M.P. van der Aalst. The ProM framework: A New Era in Process Mining
Tool Support. In G. Ciardo and P. Darondeau, editors, Application and Theory of
Petri Nets 2005, volume 3536 of Lecture Notes in Computer Science, pages 444–454.
Springer-Verlag, Berlin, 2005.
8. E. Kindler. On the Semantics of EPCs: A Framework for Resolving the Vicious
Circle. Data and Knowledge Engineering, 56(1):23–40, 2006.
9. A. Rozinat and W.M.P. van der Aalst. Conformance Testing: Measuring the Fit
and Appropriateness of Event Logs and Process Models. In C. Bussler et al., editor,
BPM 2005 Workshops (Workshop on Business Process Intelligence), volume 3812 of
Lecture Notes in Computer Science, pages 163–176. Springer-Verlag, Berlin, 2006.