Upload
others
View
17
Download
0
Embed Size (px)
Citation preview
Mediating Between Modeled
and Observed Behavior: The
Quest for the "Right" Process
prof.dr.ir. Wil van der Aalst
Outline
PAGE 1
Introduction
to Process
Mining (short)
Importance of
alignments to relate
observed and
modeled behavior The 4+
dimensions of
conformance
Representational
bias
Mediating between
a reference model
and observed
behavior (model
repair)
Discovering
configurable
process models
Decomposing process
mining problems to
deal with Big Data
Positioning Process Mining
2
process
mining
data-oriented analysis
(data mining, machine learning, business intelligence)
process model analysis
(simulation, verification, etc.)pe
rform
an
ce
-orie
nte
d q
ue
stio
ns
,
pro
ble
ms
an
d s
olu
tion
s
co
mp
lian
ce
-orie
nte
d q
ue
stio
ns
,
pro
ble
ms
an
d s
olu
tion
s
Moore's Law
3
2013
2060
Three Types of Process Mining
software
system
(process)
model
event
logs
models
analyzes
discovery
records
events, e.g.,
messages,
transactions,
etc.
specifies
configures
implements
analyzes
supports/
controls
enhancement
conformance
“world”
people machines
organizations
components
business
processes
Play-Out
PAGE 5
event logprocess model
A
B
C
DE
p2
end
p4
p3p1
start
Play-Out (Classical use of models)
PAGE 6
A B C D
A C B D A B C D
A E D
A C B D
A C B D
A E D
A E D
Play-In
PAGE 7
event log process model
A
B
C
DE
p2
end
p4
p3p1
start
Play-In
PAGE 8
A C B D A B C D
A E D
A C B D
A C B D
A E D
A E D A B C D
Example Process Discovery (Vestia, Dutch housing agency, 208 cases, 5987 events)
PAGE 9
Example Process Discovery (ASML, test process lithography systems, 154966 events)
PAGE 10
Example Process Discovery (AMC, 627 gynecological oncology patients, 24331 events)
PAGE 11
Replay
PAGE 12
event log process model
· extended model
showing times,
frequencies, etc.
· diagnostics
· predictions
· recommendations
A
B
C
DE
p2
end
p4
p3p1
start
Replay
PAGE 13
A B C D
A
B
C
DE
p2
end
p4
p3p1
start
Replay
PAGE 14
A E D
A
B
C
DE
p2
end
p4
p3p1
start
Replay can detect problems
PAGE 15
A C D
Problem!
missing token
Problem!
token left behind
Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)
PAGE 16
A
B
C
DE
p2
end
p4
p3p1
start
Replay can extract timing information
PAGE 17
A5 B8 C9 D13
5
8
9
13
3
4
5
4 3
2 6 5
8
7 6 4
7
7 4
3
PAGE 18
Performance Analysis Using Replay (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)
Hundreds of plug-ins available covering
the whole process mining spectrum
PAGE 19 Download from: www.processmining.org
open-source (L-GPL)
Commercial Alternatives
• Disco (Fluxicon)
• Perceptive Process Mining (before Futura Reflect and BPM|one)
• ARIS Process Performance
Manager
• QPR ProcessAnalyzer
• Interstage Process Discovery
(Fujitsu)
• Discovery Analyst (StereoLOGIC)
• XMAnalyzer (XMPro)
• …
20
PAGE 21
Three Key Observations
PAGE 22
• conformance checking to diagnose deviations
• squeezing reality into the model to do model-based
analysis
move on
model
move on
model move on model (harmless)
move on
log
#1 Alignments are essential!
PAGE 23
#2 Models are like the glasses required to
see and understand event data!
PAGE 24
#3 Process models as maps:
Breathing life into models
PAGE 25
Alignments
Joint work with Arya Adriansyah, Boudewijn van Dongen, Elham
Ramezani, Dirk Fahland, Massimiliano de Leoni, et al.
astart register
request
bexamine
thoroughly
cexamine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
Replaying trace “abeg”
26
m=1 r=1
a b e g
6
1 1
6
= 0.83333
From “playing the token game” to
optimal alignments …
a b » e g
a b d e g
27
astart register
request
bexamine
thoroughly
cexamine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
observed trace: “abeg”
move in
model only
Another alignment
a b c d e g
a b » d e g
28
astart register
request
bexamine
thoroughly
cexamine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
observed trace: “abcdeg”
move in
log only
Moves have costs
• Standard cost function:
− c(x,») = 1
− c(»,y) = 1
− c(x,y) = 0, if x=y
− c(x,y) = ∞, if x≠y
29
… a …
… » …
… » …
… a …
… a …
… a …
… a …
… b …
Any cost structure is possible
30
… send-letter(John,2
weeks, $400)
…
… send-email(Sue,3
weeks,$500)
…
• Similar activities (more similarity implies lower
costs).
• Resource conformance (done by someone that
does not have the specified role).
• Data conformance (path is not possible for this
customer).
• Time conformance (missed the legal deadline)
Using Alignments
• An optimal alignment has the
lowest possible costs.
• If multiple alignments are
optimal, pick one or use all.
• Like an "oracle" revealing paths
in the model.
• These paths can be used for
further analysis!
PAGE 31
• Can be used for quantifying
various types conformance
(possibly using a different
cost function).
Some pointers
• Wil M. P. van der Aalst, Arya Adriansyah, Boudewijn F. van Dongen: Replaying
history on process models for conformance checking and performance analysis.
Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery 2(2): 182-192 (2012)
• Arya Adriansyah, Jorge Munoz-Gama, Josep Carmona, Boudewijn F. van Dongen,
Wil M. P. van der Aalst: Alignment Based Precision Checking. Business Process
Management Workshops 2012: 137-149
• Arya Adriansyah, Boudewijn F. van Dongen, Wil M. P. van der Aalst: Conformance
Checking Using Cost-Based Fitness Analysis. EDOC 2011: 55-64
• Massimiliano de Leoni, Wil M. P. van der Aalst, Boudewijn F. van Dongen: Data-
and Resource-Aware Conformance Checking of Business Processes. BIS 2012: 48-
59
• Joos C. A. M. Buijs, Boudewijn F. van Dongen, Wil M. P. van der Aalst: On the Role
of Fitness, Precision, Generalization and Simplicity in Process Discovery. OTM
Conferences (1) 2012: 305-322
• Elham Ramezani, Dirk Fahland, Wil M. P. van der Aalst: Where Did I Misbehave?
Diagnostic Information in Compliance Checking. BPM 2012: 262-278
• A. Adriansyah, B.F. van Dongen, W.M.P. van der Aalst. Memory-Efficient Alignment
of Observed and Modeled Behavior. BPM Center Report BPM-13-03,
BPMcenter.org, 2013
PAGE 32
PAGE 33
Conformance: Taking a Step Back
Conventional Conformance Notions
PAGE 34
fitness
simplicity
generalization
precision
Process
Mining
ability to explain observed behavior
avoiding underfitting
Occam’s Razor
avoiding overfitting
lift
gravity
thrust drag
Leaving out one of these dimensions during
discovery will lead to degenerate cases!
Problem
PAGE 35
real process
event data
process model
record
process discovery
conformance checking
process discovery
conformance checking
“unknown”“only examples”
real process
is unknown
event logs covers only a fraction
of all possible behavior
model needs to provide an abstraction:
Murphy's Law of Process Mining
Traditional Notions such as Precision and
Recall do NOT Apply
PAGE 36
M0
M0
M1
ideal or desired model based on perfect knowledge of real process
FN
TP
FPTN
descriptive or normative model (man-made or discovered)
L
event log
regular behavior in log covered by model
exceptional behavior in log covered by model
exceptional behavior in log not covered by the model
regular behavior in log not covered by model
Problem II: in practice it is unclear where this line is
Problem I: event log does not provide information
about the whole universe of traces only a selected part
Operationalizing the Four
Conformance dimensions
• Fitness (fraction of observed behavior possible according to the model).
− Measure at the case or event level.
− How to continue after a deviation (where to put the "blame"), cf.
duplicate and silent activities.
• Precision (avoiding underfitting; fraction of allowed behavior never
observed).
− Log only contains examples.
− Metrics are e.g. based on escaping edges.
• Generalization (avoiding overfitting; probability that the next unseen
case will not fit).
− Reasoning about unseen behavior, strongly related to log
completeness.
• Simplicity (Occam's Razor: the simplest of two or more competing
theories is preferable)
− Easy to operationalize (e.g., number of nodes or arcs).
− Often subjective (valuation of AND/XOR/OR-split/joins). PAGE 37
fitness
simplicity
generalization
precision
Process
Mining
ability to explain observed behavior
avoiding underfitting
Occam’s Razor
avoiding overfitting
lift
gravity
thrust drag
Some pointers
• Wil M. P. van der Aalst: Mediating Between Modeled and Observed
• Behavior: The Quest for the “Right” Process. Seventh IEEE International
Conference on Research Challenges in Information Science (RCIS 2013), (2013)
• Wil M. P. van der Aalst, Arya Adriansyah, Boudewijn F. van Dongen: Replaying
history on process models for conformance checking and performance
analysis. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery 2(2):
182-192 (2012)
• Joos C. A. M. Buijs, Boudewijn F. van Dongen, Wil M. P. van der Aalst: On the
Role of Fitness, Precision, Generalization and Simplicity in Process Discovery.
OTM Conferences (1) 2012: 305-322
• Wil M. P. van der Aalst: Process Mining - Discovery, Conformance and
Enhancement of Business Processes. Springer 2011, isbn 978-3-642-19344-6,
pp. I-XVI, 1-352
PAGE 38
PAGE 39
Representational Bias
in Process Mining (not about visualization)
Joint work with Joos Buijs, Sander Leemans, and Boudewijn van Dongen.
Typical Representational Bias
• (Labeled) Petri Nets, WF-
nets, etc.
• Subsets of
• BPMN diagrams,
• UML Activity Diagrams,
• Event-Driven Process
Chains (EPCs),
• YAWL,
• etc.
• Transition Systems
• (Hidden) Markov Models
• …
PAGE 40
register
request
examine casually
examine thoroughly
check ticket
decide
pay compensation
reject request
reinitiate request
start end
start
register
request
examinethoroughly
examine casually
checkticket
decide
pay compensation
reject request
end
e1
AND
OR
XOR
OR
AND
XOR end
e2
e3
e4 e5
e6
start
register
request
examine thoroughly
examine casually
checkticket
decide
pay compensation
reject request
new information
end
c1
c2
OR-split OR-join
c3
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
c1
c2
c3
c4
c5
[start]
register
request
examine thoroughly
examine casually
checkticket
decide
pay compensation
reject request
reinitiate request
[end][c1,c2]
[c1,c4]
[c2,c3]
[c3,c4]
checkticket
examine casually
examine thoroughly
[c5]
Huge Search Space When Discovering a
Petri Net, BPMN model, and the like …
PAGE 41
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M
M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M
M
M
M
M
M M M
M M M
M M
M
M
M M M M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M
M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M
M
M
M
M
M M M
M M M
M M
M
M
M M M M
M
M
M M M
M
M M
M
M
M
M
M
M
M M
M
M
M
M M M
M
M M
M
M
M
M
M
M M
M M
M
M M M
M
M M
M M M
M
M M
M
M
M
M
M
M M
M M
M
M M
M
M M M
M
M M
M
M
M
M
M
M M
M M
M
M
M
M
M
M
M
M
M
M
M
M
M
M
M
M M
M
M
M
M
M
M
M M
M
M
M
M M
M M
M
M
M
M
M
M
M
M
M
M
M
M
M
M M
M
M M M M
M M
… with just a few interesting candidates
PAGE 42
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M
M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M M
M
M M
M M
M
M
M M
M
M
M
M
M
M
M M M
M M M
M M
M
M
M M M M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M
M
M M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M M M M
M
M M
M
M
M
M
M
M
M M
M M
M
M
M M
M
M
M
M
M
M
M M
M M M M
M M
M
M
M M M M
M
M
M M M
M
M M
M
M
M
M
M
M
M M
M
M
M
M M M
M
M M
M
M
M
M
M
M M
M M
M
M M M
M
M M
M M M
M
M M
M
M
M
M
M
M M
M M
M
M M
M
M M M
M
M M
M
M
M
M
M
M M
M M
M
M
M
M
M
M
M
M
M
M
M
M
M
M
M
M M
M
M
M
M
M
M
M M
M
M
M
M M
M M
M
M
M
M
M
M
M
M
M
M
M
M
M
M M
M
M M M M
M M
Alternative Representational Bias
1. C-nets (XOR/AND/OR-
split/join graphs; more
likely to be sound due to
declarative semantics).
2. Declare models
(constraint based,
grounded in LTL;
anything is possible
unless forbidden)
3. Process Trees (similar to
subsets of various
process algebras; sound
by structure)
PAGE 43
a
register
request
b
examine thoroughly
c
examine casually
d
checkticket
decide
pay compensation
reject request
e
g
h
f
end
reinitiate request
z
register
request
decide
pay compensation
reject request
g
h
a e
today's
focus
Another Representational Bias:
Process Trees
• Always sound because of the
block structure
• Also Loop and OR operator
A (BA)*
PAGE 58
A
D C
E
→
D X
E ∧
A →
B
C B B A
Petri Net Semantics (used for comparison and conformance checking only)
PAGE 59
A
B
B
A
A
B
A
B
Sequence
Exclusive Choice
Loop
Parallellism
Or Choice
A B
C
… and BPMN.
PAGE 60
Exclusive Choice
Parallellism
Loop
Or Choice
Sequence
A Discovery Algorithm Using Process
Trees: Evolutionary Tree Miner (ETM)
• Process trees as representation (= limit search space
to "good" models).
• Genetic approach (= very flexible)
• Fitness function uses all four criteria (= seamlessly
balance the different "forces")
PAGE 61
Create
Initial
Population
Return
Best
Individual
Measure
Quality
Change
Population
Stop
? Yes
No
Population Change
PAGE 62
Population i Population i+1
Elite
Crossover Mutation
Selection
Replace
Example
PAGE 64
B
C
D
E
F
A G
A = send e-mail, B = check credit,
C = calculate capacity, D = check system,
E = accept, F = reject, G = send e-mail
Conventional Algorithms (1/3) ("best effort" mapping to process trees to allow for comparison)
PAGE 65
alpha miner
ILP miner
language-based region miner
low fitness
low precision
low fitness
Conventional Algorithms (2/3)
PAGE 66
heuristic miner
multi-phase miner
Conventional Algorithms (3/3)
PAGE 67
state-based region miner
genetic miner
Often unsound result and no mechanism
to seamlessly balance the four criteria
PAGE 68
Genetic Mining (ETM) While Considering
Only One Criterion
PAGE 69
best value
possible
for this log
ETM with weight zero to three out of four perspectives.
Considering Replay Fitness and One
Other Criterion
PAGE 70 ETM with weight zero to two out of four perspectives.
Considering 3 of 4 Criteria
PAGE 71 replay fitness needs to have a larger weight
Considering All Four Criteria with
Emphasis on Fitness
PAGE 72
fitness has weight 10
Initial Model Versus Discovered Model
PAGE 73
B
C
D
E
F
A G
discovered by ETM simulated
PAGE 74
1) Carefully choose your representational bias during
discovery: Unrelated to presentation/visualization!
2) Consider all conformance dimensions (replay
fitness, precision, generalization, and simplicity)!
Some pointers
• Wil M. P. van der Aalst, Arya Adriansyah, Boudewijn F. van
Dongen: Causal Nets: A Modeling Language Tailored towards
Process Discovery. CONCUR 2011: 28-42
• Joos C. A. M. Buijs, Boudewijn F. van Dongen, Wil M. P. van der
Aalst: On the Role of Fitness, Precision, Generalization and
Simplicity in Process Discovery. OTM Conferences (1) 2012:
305-322
• Joos C. A. M. Buijs, Boudewijn F. van Dongen, Wil M. P. van der
Aalst: A genetic algorithm for discovering process trees. IEEE
Congress on Evolutionary Computation 2012: 1-8
• S.J.J. Leemans, D. Fahland, W.M.P. van der Aalst. Discovering
Block-Structured Process Models From Event Logs – A
Constructive Approach. BPM Center Report BPM-13-06,
BPMcenter.org, 2013
PAGE 75
PAGE 76
Mediating Between a
Reference Model and Real
Observed Behavior
Joint work with Joos Buijs, Boudewijn van Dongen, and Dirk Fahland.
Compromise Based on Two Main Forces
PAGE 77
fitness
simplicity
generalization
precision
Process
Mining
ability to explain observed behavior
avoiding underfitting
Occam’s Razor
avoiding overfitting
lift
gravity
thrust drag
astart register
request
bexamine
thoroughly
cexamine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
Various techniques to
compare graphs, e.g.,
edit distance notions
(add, remove, replace).
Force Between Reference Model and Candidate
Model Can be Viewed as a 5th Conformance Notion
PAGE 78
Optimal
Process Model
Reference
Process Model
Similarity
Boundary
Replay
FitnessSimplicity
GeneralizationPrecision
Candidate
Process Model
Example From CoSeLoG Project
PAGE 79
Some pointers
• J.C.A.M. Buijs ,M. La Rosa, H.A. Reijers, B.F. van Dongen, and
W.M.P. van der Aalst: Improving Business Process Models using
Observed Behavior, SIMPDA 2012 post-proceedings, Lecture Notes in
Business Information Processing, 2013.
• Joos C. A. M. Buijs, Boudewijn F. van Dongen, Wil M. P. van der Aalst:
On the Role of Fitness, Precision, Generalization and Simplicity in
Process Discovery. OTM Conferences (1) 2012: 305-322
• Dirk Fahland, Wil M. P. van der Aalst: Repairing Process Models to
Reflect Reality. BPM 2012: 229-245.
• Dirk Fahland, Wil M. P. van der Aalst: Simplifying discovered process
models in a controlled manner. Inf. Syst. 38(4): 585-605 (2013).
PAGE 80
PAGE 81
Mining Configurable
Process Models
Joint work with Joos Buijs, Boudewijn van Dongen, and Florian Gottschalk.
Two variants of the same process …
PAGE 82
Configurable Process Model
PAGE 83
Variants of the same process
PAGE 84
aa bb
dd
ee
gg hh
cc
ff
aa bb
dd
gg hh
ff
aa
dd
ee
gg hh
cc
ff
Approach 1: Merge Separately Mined
Models
PAGE 85
Step 2b:
Process
Configuration
Step 2b:
Process
Configuration
event
log 1
event
log 2
event
log n
Configurable
Process model
C1
C2
Cn
...
Step 2a:
Process
Model
Merging
Step 2a:
Process
Model
Merging
Step 1:
Process
Mining
Step 1:
Process
Mining
Process
model 1
Process
model 2
Process
model n
"normal"
discovery
pure
model-
based
merging
pure model-
based
configuration
Results Approach 1 for running example
PAGE 86
Results Approach 1 for running example
PAGE 87
poor
generalization poor
similarity
Approach 2
PAGE 88
Step 1c:
Process
Individual-
ization
Step 1c:
Process
Individual-
ization
event
log 1
event
log 2
event
log n
Common
Process model
C1
C2
Cn
...
Step 1b:
Process
Mining
Step 1b:
Process
Mining
Step 1a:
Merge
Event
Logs
Step 1a:
Merge
Event
Logs
Merged
event log
Process
model 1
Process
model 2
Process
model n
Step 1d:
Process
Model
Merging
Step 1d:
Process
Model
Merging
Step 2:
Process
Configuration
Step 2:
Process
ConfigurationConfigurable
Process model
discovery
based on
whole
event log
discovery based on local
event log and common
model (edit distance as
5th dimension)
pure
model-
based
merging
pure model-
based
configuration
Results Approach 2 for running example
PAGE 89
Results Approach 2 for running example
PAGE 90
poor
generalization
poor
similarity
Approach 3
PAGE 91
Step 2:
Process
Configuration
Step 2:
Process
Configuration
event
log 1
event
log 2
event
log n
Configurable
Process model
C1
C2
Cn
...
Step 1b:
Process
Mining
Step 1b:
Process
Mining
Step 1a:
Merge
Event
Logs
Step 1a:
Merge
Event
Logs
Merged
event log
discovery
based on
whole
event log
configuration based on
common model and local
event log (always limiting
behavior, not extending it)
Results Approach 3 for running example
PAGE 92
better
generalization
better
similarity
Approach 4
PAGE 93
event
log 1
event
log 2
event
log n
Configurable
Process model
C1
C2
Cn
...
Step 1&2:
Process
Mining
&
Process
Configuration
Step 1&2:
Process
Mining
&
Process
Configuration
discovery of the process model and
the configuration is combined into
one algorithm (configuration costs are
added to overall fitness)
Results Approach 4 for running example
PAGE 94
even better
generalization
even better
similarity better
simplicity
small drop in
replay fitness
Genetic algorithms are versatile and can
consider different forces at the same time
PAGE 95
… but often not fast enough
PAGE 96
Some pointers
• J.C.A.M. Buijs, B.F. van Dongen, and W.M.P. van der Aalst:
Mining Configurable Process Models from Collections of Event
Logs, under review, 2013.
• Florian Gottschalk, Teun A. C. Wagemakers, Monique H.
Jansen-Vullers, Wil M. P. van der Aalst, Marcello La Rosa:
Configurable Process Models: Experiences from a Municipality
Case Study. CAiSE 2009: 486-500
• Florian Gottschalk, Wil M. P. van der Aalst, Monique H. Jansen-
Vullers: Merging Event-Driven Process Chains. OTM
Conferences (1) 2008: 418-426
• Wil M. P. van der Aalst: Business Process Configuration in the
Cloud: How to Support and Analyze Multi-tenant Processes?
ECOWS 2011: 3-10
PAGE 97
PAGE 98
Decomposing Process Mining
Problems
Joint work with Eric Verbeek, Jorge Munoz , and Josep Carmona.
Big Data: Opportunities and Challenges
PAGE 99
System Net
• Labeled Petri net (P,T,F,l)
• Silent transitions and visible transitions (unique or not),
• One initial marking Minit, one final marking Mfinal
PAGE 100
a
start
a = register request
b = examine file
c = check ticket
d = decide
e = reinitiate request
f = send acceptance letter
g = pay compensation
h = send rejection letter
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
Traces and Logs
• A system net SN has a set of possible visible traces
ϕ(SN) starting in Minit and ending Mfinal only showing
the visible steps.
• An event log L is a multiset of traces.
• Two main process mining problems:
1. Conformance checking: Given L and SN, evaluate the
"conformance" (e.g., fitness, precision, generalization,
etc.) of L and ϕ(SN)
2. Process discovery: Given L, create SN such that the
conformance of L and ϕ(SN) is "as good as possible"
PAGE 101
Valid Decomposition
• Union of subnets is original net
• No shared places
• Shared transitions are visible and have unique label
PAGE 102
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
d
g
h
e
endc5
f
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start
b
c
d
e
c1
c2
c3
c4
t1
t2
t3
t4
t5
t6
Another Decomposition
Requirement
• Union of subnets is original net
• No shared places
• Shared transitions are visible and unique
PAGE 103
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
d
g
h
e
endc5
f
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
c
d
e
c2c4
t1
t4
t5
a
start t1
ab
d
e
c1 c3
t1
t2
t3
t5
t6
t6
Maximal Valid Decomposition
PAGE 104
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
Maximal Decomposition
• Construction: group arcs iteratively
• Maximal decomposition is unique
• There is always a valid decomposition
PAGE 105
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
y
y
x
Non-unique visible labels
• Union of subnets is original net
• No shared places
• Shared transitions are visible and unique PAGE 106
a
start
b
b
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
b
e
c2
t1
t4
t6
ab
d
e
c1 c3
t1
t2
t3
t5
t6
b
d
c4
t4
t5
SN7
ab
b
d
e
c1
c2
c3
c4
t1
t2
t3
t4
t5
t6
Conformance checking can
be decomposed !!!
• Let L be an event log, SN a system net, and
D={SN1,SN2, … SNn} a valid decomposition
• Li is the sublog of SNi (L projected onto visible
transitions of SNi)
PAGE 107
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
L is perfectly fitting SN
if and only if
each projected log is Li is perfectly fitting SNi
Example of alignment for observed trace
a,b,c,d,e,c,d,g,f
PAGE 108
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
Etc.
a,b,c,d,e,c,d,g,f
So What?
Quantifying Conformance
• Exact result: Fraction of cases perfectly fitting
SN equals the fraction of cases for which each
projected log is Li is perfectly fitting SNi.
• Bound: For fitness at the event level (costs
based alignment) it is possible to compute an
optimistic value.
PAGE 109
Discovery can also be distributed!
1. Assume the set of activities
is split in overlapping sets.
2. Split log L in sublogs Li
based on these sets.
3. Discover a model SNi per
sublog.
4. Merge the models SNi into
SN and return result.
PAGE 111
All the earlier guarantees hold, e.g., L is
perfectly fitting SN if and only if each
projected log is Li is perfectly fitting SNi
Example
PAGE 112
• Let's use the Alpha algorithm
• Alpha algorithm is not very
suitable for discovering
transition bordered subnets.
• By adding start and end
activities we get (in this case)
perfectly fitting subnets.
Discover model per sublog
(Alpha algorithm)
PAGE 113
a
i
b
e
d
start end
a c
e
d
start end
d
j
h
e
start end
j
start endg
f
k
SN1
SN2
SN3
SN4
h
Merge models
PAGE 114
a
i
b
c
d
g
h
e
j f
k
t1 t2
t3
t4
t5
t6
t7
t8
t9
t10
t11
t12
t13
p5
p1
p2
p3
p4
p6
{1,2,3,4}{1,2}
{1}
{1}
{2}
{1,2,3}
{1,2,3}
p7
p8
p9
p10
p11
p12
{3,4}
{3,4}
p13
p14
p15
p16
p17
p18
p19
p20p21
p22
p23
p24
p25
{4}
{4} {4}
{1,2,3,4}
a
i
b
e
d
start end
a c
e
d
start end
d
j
h
e
start end
j
start endg
f
k
SN1
SN2
SN3
SN4
h
L is perfectly fitting SN because each
projected log is Li is perfectly fitting SNi
Simplify by removing redundant
places and initialization
PAGE 115
a
i
b
c
d
g
h
e
j f
k
t1 t2
t3
t4
t5
t6
t7
t8
t9
t10
t11
t12
t13
p5
p1
p2
p3
p4
p6
{1,2,3,4}{1,2}
{1}
{1}
{2}
{1,2,3}
{1,2,3}
p7
p8
p9
p10
p11
p12
{3,4}
{3,4}
p13
p14
p15
p16
p17
p18
p19
p20p21
p22
p23
p24
p25
{4}
{4} {4}
{1,2,3,4}
a
start
i
b
c
d
g
h
e
end
j
f
k
Log is perfectly fitting this model !
PAGE 116
SESEs, Passages, and general result all
define trees on process graph/event log
• Few large process
mining tasks or
many smaller
process mining
tasks.
• Choice of level is
driven by
computation time
and desired
diagnostics!
• All implemented in
ProM.
PAGE 117
Diagnosis
PAGE 118
Example: Conformance Checking based
on SESEs
PAGE 119
k = maximal number of arcs in one SESE, P = # places,
T = # transitions, f = fitness, t = time in seconds, S = #
transition bounded SESEs, B = # bridges, >5 = # more
than 5 arcs, nf = # non-fitting parts.
k=inf (no
decompostion)
some overhead
for easy
problems
stopped after
10 hours problems are
often local
suitable k-value
depends on
model/log
intractable
problems become
tractable
Some pointers
• Wil M.P. van der Aalst. Decomposing Petri Nets for Process Mining: A
Generic Approach. BPM Center Report BPM-12-20, BPMcenter.org,
2012
• Wil M. P. van der Aalst: Distributed Process Discovery and
Conformance Checking. FASE 2012: 1-25
• Wil M. P. van der Aalst: Decomposing Process Mining Problems Using
Passages. Petri Nets 2012: 72-91
• H. M. W. (Eric) Verbeek, Wil M. P. van der Aalst: An Experimental
Evaluation of Passage-Based Process Discovery. Business Process
Management Workshops 2012: 205-210
• Wil M.P. van der Aalst and Eric Verbeek. Process Discovery and
Conformance Checking Using Passages. BPM Center Report BPM-12-
21, BPMcenter.org, 2012
• J. Munoz-Gama, J. Carmona, W. van der Aalst: Hierarchical
Conformance Checking of Process Models Based on Event Logs. In:
Applications and Theory of Petri Nets, 2013
PAGE 120
PAGE 121
Conclusion
Conclusion (1/2)
• Alignments are essential for relating observed
and modeled behavior!
• Conformance has (at least) four dimensions!
• Representational bias is important (and should
not be confused with visualization)!
• New questions are emerging:
− mediating between a reference model and
observed behavior
− discovering configurable process models
• Decomposing process mining problems to deal
with Big Data. PAGE 122
Conclusion (2/2)
PAGE 123
process
mining
data-oriented analysis
(data mining, machine learning, business intelligence)
process model analysis
(simulation, verification, etc.)pe
rform
an
ce
-orie
nte
d q
ue
stio
ns
,
pro
ble
ms
an
d s
olu
tion
s
co
mp
lian
ce
-orie
nte
d q
ue
stio
ns
,
pro
ble
ms
an
d s
olu
tion
s
Still many challenging and
highly relevant open problems
in process mining!