Chapter 6 Advanced Process Discovery Techniques
prof.dr.ir. Wil van der Aalst
www.processmining.org


Overview

Chapter 1 Introduction

Part I: Preliminaries

Chapter 2 Process Modeling and Analysis

Chapter 3 Data Mining

Part II: From Event Logs to Process Models

Chapter 4 Getting the Data

Chapter 5 Process Discovery: An Introduction

Chapter 6 Advanced Process Discovery Techniques

Part III: Beyond Process Discovery

Chapter 7 Conformance Checking

Chapter 8 Mining Additional Perspectives

Chapter 9 Operational Support

Part IV: Putting Process Mining to Work

Chapter 10 Tool Support

Chapter 11 Analyzing “Lasagna Processes”

Chapter 12 Analyzing “Spaghetti Processes”

Part V: Reflection

Chapter 13 Cartography and Navigation

Chapter 14 Epilogue

Process discovery

Figure: the “world” (business processes, people, machines, components, organizations) is supported/controlled by a software system, which records events (e.g., messages, transactions) in event logs. A (process) model models and analyzes the world, and specifies, configures, implements, and analyzes the software system. Event logs and models are related through discovery, conformance, and enhancement.

Challenge

Process discovery has to balance four quality dimensions:

• Fitness: “able to replay event log”
• Simplicity: “Occam’s razor”
• Precision: “not underfitting the log”
• Generalization: “not overfitting the log”

Observing a stable process infinitely long

Figure legend: trace in event log; frequent behavior; all behavior (including noise).

Target model


Non-fitting model


Overfitting model


Underfitting model


Characteristics of process discovery algorithms

• Representational bias
  − Inability to represent concurrency
  − Inability to deal with (arbitrary) loops
  − Inability to represent silent actions
  − Inability to represent duplicate actions
  − Inability to model OR-splits/joins
  − Inability to represent non-free-choice behavior
  − Inability to represent hierarchy
• Ability to deal with noise
• Completeness notion assumed
• Approach used (direct algorithmic approaches, two-phase approaches, computational intelligence approaches, partial approaches, etc.)

Examples

• Algorithmic techniques
  − Alpha miner
  − Alpha+, Alpha++, Alpha#
  − FSM miner
  − Fuzzy miner
  − Heuristic miner
  − Multi phase miner
• Genetic process mining
  − Single/duplicate tasks
  − Distributed GM
• Region-based process mining
  − State-based regions
  − Language-based regions
• Classical approaches not dealing with concurrency
  − Inductive inference (Mark Gold, Dana Angluin et al.)
  − Sequence mining

Heuristic mining

• To deal with noise and incompleteness.
• To have a better representational bias than the α algorithm (AND/XOR/OR/skip).
• Uses C-nets.

Figure: example C-net over the activities a, b, c, d, e (labels in the figure: register claim, check policy, check damage, consult expert, close case).

Example log; problem α algorithm

Figure: Petri net with transitions a, b, c, d, e and places start, p1, p2, p3, p4, p5, end.

Taking into account frequencies


Dependency measure
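The formula on this slide did not survive extraction. As a minimal sketch, the measure commonly used in heuristic mining divides the difference between "a directly followed by b" and "b directly followed by a" by their sum plus one, with a separate case for length-one loops. The log below is an assumption (it is not given on the slides); it was chosen because its counts agree with the arc annotations on the threshold slides, e.g. 11 (0.92) and 4 (0.80). The names `LOG`, `direct_succession`, and `dependency` are illustrative.

```python
from collections import Counter

# Assumed example log as (trace, frequency) pairs; not given on the slides.
LOG = [
    ("ae", 5), ("abce", 10), ("acbe", 10), ("abe", 1),
    ("ace", 1), ("ade", 10), ("adde", 2), ("addde", 1),
]

def direct_succession(log):
    """Count how often x is directly followed by y, weighted by trace frequency."""
    counts = Counter()
    for trace, freq in log:
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    return counts

def dependency(counts, x, y):
    """Dependency measure between x and y; values close to 1 suggest x causes y."""
    xy, yx = counts[(x, y)], counts[(y, x)]
    if x == y:
        return xy / (xy + 1)            # length-one loop
    return (xy - yx) / (xy + yx + 1)    # +1 damps low-frequency pairs

counts = direct_succession(LOG)
print(counts[("a", "b")], round(dependency(counts, "a", "b"), 2))  # 11 0.92
print(counts[("d", "d")], round(dependency(counts, "d", "d"), 2))  # 4 0.8
```

Because b and c follow each other equally often in this log, `dependency(counts, "b", "c")` is 0, which is how the measure separates concurrency from causality.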


Example


Lower threshold (2 direct successions and a dependency of at least 0.7)

Figure: dependency graph over a, b, c, d, e with eight arcs, each annotated with its frequency and dependency measure: 11 (0.92) four times, 13 (0.93) twice, 5 (0.83), and 4 (0.80).

Higher threshold (5 direct successions and a dependency of at least 0.9)

Figure: dependency graph over a, b, c, d, e with six arcs, each annotated with its frequency and dependency measure: 11 (0.92) four times and 13 (0.93) twice.
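The effect of the two threshold settings can be sketched as a filter over direct-succession counts. This is an illustration, not the author's implementation: the log is an assumption (not given on the slides), and `dependency_graph`, `min_count`, and `min_dep` are hypothetical names.

```python
from collections import Counter

# Assumed example log as (trace, frequency) pairs; not given on the slides.
LOG = [
    ("ae", 5), ("abce", 10), ("acbe", 10), ("abe", 1),
    ("ace", 1), ("ade", 10), ("adde", 2), ("addde", 1),
]

def dependency_graph(log, min_count, min_dep):
    """Keep arcs with at least min_count direct successions and dependency >= min_dep."""
    counts = Counter()
    for trace, freq in log:
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    graph = {}
    for (x, y), n in counts.items():
        if x == y:
            dep = n / (n + 1)
        else:
            back = counts.get((y, x), 0)
            dep = (n - back) / (n + back + 1)
        if n >= min_count and dep >= min_dep:
            graph[(x, y)] = (n, round(dep, 2))
    return graph

low = dependency_graph(LOG, min_count=2, min_dep=0.7)    # lower threshold
high = dependency_graph(LOG, min_count=5, min_dep=0.9)   # higher threshold
print(sorted(set(low) - set(high)))  # arcs that only survive the lower threshold
```

With this assumed log, the lower threshold keeps eight arcs and the higher threshold keeps six; the two arcs that disappear are the ones annotated 5 (0.83) and 4 (0.80).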

Learning splits and joins

Figure: dependency graph over a, b, c, d, e annotated with frequencies. Activity frequencies: a 40, b 21, c 21, d 17, e 40. The split and join bindings are annotated with the frequencies 20, 13, 5, and 4.

Alternative visualization

Figure: the same annotated model (activity frequencies a 40, b 21, c 21, d 17, e 40; binding frequencies 20, 13, 5, and 4) next to an alternative notation over a, b, c, d, e in which two connectors are marked AND.

Characteristics of heuristic mining

• Can deal with noise and is therefore quite robust.
• Improved representational bias.
• Split and join rules are only considered locally (therefore most of the discovered models are not sound and require repair actions).

Genetic process mining

Figure: genetic process mining loop. Starting from the event log, create an initial population and compute the fitness of each individual. Until termination, build the next generation: the best individuals are carried over unchanged (elitism), parents are chosen by tournament, crossover produces children, mutation alters them, and the remaining individuals are “dead”. On termination, select the best individual.

Design decisions

• Representation of individuals
• Initialization
• Fitness function
• Selection strategy (tournament and elitism)
• Crossover
• Mutation
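The design decisions above can be illustrated with a deliberately small sketch of the loop. Everything specific here is an assumption made for illustration: individuals are plain sets of causal arcs rather than full process models, the fitness function only compares arcs with the direct successions in a toy log, and termination is a fixed number of generations. Real genetic miners replay the log on each candidate model.

```python
import random

ACTIVITIES = "abcde"
LOG = ["abcd", "acbd", "aed"]  # assumed toy log
TARGET = {(x, y) for t in LOG for x, y in zip(t, t[1:])}
ALL_ARCS = [(x, y) for x in ACTIVITIES for y in ACTIVITIES]

def fitness(ind):
    """Toy fitness: reward arcs seen in the log, penalize arcs never seen."""
    return len(ind & TARGET) / len(TARGET) - len(ind - TARGET) / len(ALL_ARCS)

def tournament(pop, rng, k=3):
    """Pick k random individuals and return the fittest one."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(p1, p2, rng):
    """For every possible arc, inherit its presence from a randomly chosen parent."""
    return frozenset(a for a in ALL_ARCS if a in (p1 if rng.random() < 0.5 else p2))

def mutate(ind, rng, rate=0.05):
    """Flip (add or remove) each possible arc with a small probability."""
    flipped = {a for a in ALL_ARCS if rng.random() < rate}
    return frozenset(ind ^ flipped)

def evolve(generations=50, size=30, elite=2, seed=0):
    rng = random.Random(seed)
    # Initialization: random arc sets.
    pop = [frozenset(a for a in ALL_ARCS if rng.random() < 0.3) for _ in range(size)]
    history = []
    for _ in range(generations):            # termination: fixed generation budget
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        nxt = pop[:elite]                    # elitism: best individuals survive unchanged
        while len(nxt) < size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            nxt.append(mutate(child, rng))
        pop = nxt                            # the rest of the old population is "dead"
    return max(pop, key=fitness), history

best, history = evolve()
print(round(history[0], 2), round(fitness(best), 2))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which is the property worth checking when experimenting with the other parameters.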

Example: crossover

Figure: two parent models over the activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), and h (reject request), each with places start and end. Crossover swaps parts of the two parents to produce two child models.

Example: mutation

Figure: a model over the activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), and h (reject request), before and after mutation. The mutation removes one place and adds one arc.

Characteristics of genetic process mining

• Requires a lot of computing power.
• Can be distributed easily.
• Can deal with noise, infrequent behavior, duplicate tasks, invisible tasks, etc.
• Allows for incremental improvement and combinations with other approaches (heuristics, post-optimization, etc.).

Region-based mining

• Two types of region theory:
  − State-based regions
  − Language-based regions
• All about discovering places (like in the α algorithm)!

Figure: a place p(A,B) with input transitions A = {a1, a2, …, am} and output transitions B = {b1, b2, …, bn}.

State-based regions

Two steps:
1. Discover a transition system (different abstractions are possible).
2. Convert the transition system into an “equivalent” Petri net.

Step 1: learning a transition system

• past, future, past+future
• sequence, multiset, set abstraction
• limited horizon to abstract further
• filtering, e.g., based on transaction type, names, etc.
• labels based on activity name or other features

Figure: the trace a b c d c d c d e f a g h h h i, split at the current state into a past and a future; a state can be based on the past, the future, or past and future.
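As a minimal sketch of this step, the function below builds a transition system from the past of each event, with a choice of sequence, multiset, or set abstraction and an optional horizon. The log consists of the three traces that appear in the transition-system figures of this chapter; the function names and parameters are illustrative assumptions.

```python
LOG = ["abcd", "acbd", "aed"]  # traces appearing in the transition-system figures

def abstract(events, kind="sequence", horizon=None):
    """Map a prefix (the 'past') to a state under the chosen abstraction."""
    if horizon is not None:
        events = events[max(0, len(events) - horizon):]  # keep only the last events
    if kind == "sequence":
        return tuple(events)
    if kind == "multiset":
        return tuple(sorted(events))        # order dropped, multiplicity kept
    if kind == "set":
        return tuple(sorted(set(events)))   # order and multiplicity dropped
    raise ValueError(f"unknown abstraction: {kind}")

def transition_system(log, kind="sequence", horizon=None):
    """Return (states, transitions); each transition is (source, event, target)."""
    states, transitions = set(), set()
    for trace in log:
        for i, event in enumerate(trace):
            src = abstract(trace[:i], kind, horizon)
            dst = abstract(trace[:i + 1], kind, horizon)
            states.update((src, dst))
            transitions.add((src, event, dst))
    return states, transitions

for kind, horizon in [("sequence", None), ("multiset", None), ("sequence", 1)]:
    states, _ = transition_system(LOG, kind, horizon)
    print(kind, horizon, len(states))
```

For this log the full-sequence abstraction yields 10 states, the multiset abstraction 8, and a horizon of 1 (only the last event matters) 6, so stronger abstractions merge more states.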

Past without abstraction (full sequence)

Figure: transition system with states ‹›, ‹a›, ‹a,b›, ‹a,c›, ‹a,e›, ‹a,b,c›, ‹a,c,b›, ‹a,e,d›, ‹a,b,c,d›, ‹a,c,b,d› and transitions labeled a, b, c, d, e.

Future without abstraction

Figure: transition system with states ‹a,b,c,d›, ‹a,c,b,d›, ‹a,e,d›, ‹b,c,d›, ‹c,b,d›, ‹e,d›, ‹c,d›, ‹b,d›, ‹d›, ‹› and transitions labeled a, b, c, d, e.

Past with multiset abstraction

Figure: transition system with states [ ], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and transitions labeled a, b, c, d, e.

Only last event matters for state

Figure: transition system with states ‹›, ‹a›, ‹b›, ‹c›, ‹d›, ‹e› and transitions labeled a, b, c, d, e.

Step 2: constructing a Petri net using regions

Figure: a region R in a transition system and the corresponding place pR in the Petri net.

• a = enter
• b = enter
• c = exit
• d = exit
• e = do not cross
• f = do not cross
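The enter / exit / do not cross classification can be checked mechanically: a set of states is a region only if all transitions with the same label behave the same way with respect to it. The sketch below assumes the multiset transition system for the traces abcd, acbd, and aed, with states written as sorted strings; `classify` is a hypothetical helper name.

```python
# Multiset transition system for the traces abcd, acbd, aed (states as sorted strings).
TS = {
    ("", "a", "a"), ("a", "b", "ab"), ("a", "c", "ac"), ("a", "e", "ae"),
    ("ab", "c", "abc"), ("ac", "b", "abc"), ("abc", "d", "abcd"), ("ae", "d", "ade"),
}

def classify(transitions, region):
    """Return {event: 'enter' | 'exit' | 'do not cross'}, or None if not a region."""
    kinds = {}
    for src, event, dst in transitions:
        if src not in region and dst in region:
            kind = "enter"
        elif src in region and dst not in region:
            kind = "exit"
        else:
            kind = "do not cross"
        # All transitions with the same label must agree.
        if kinds.setdefault(event, kind) != kind:
            return None
    return kinds

print(classify(TS, {"a", "ac"}))  # a enters; b and e exit; c and d do not cross
print(classify(TS, {"a"}))        # None: b and c each exit once and do not cross once
```

A region translates into a place whose input transitions are the entering events and whose output transitions are the exiting events; here the region {[a], [a,c]} gives a place between a and {b, e}.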

Example

Figure: the transition system with states [ ], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and the Petri net constructed from it, with transitions a, b, c, d, e and places start, p1, p2, p3, p4, end.

Language-based regions

Figure: place pR with input transitions X and output transitions Y, shown among the transitions a1, a2, b1, b2, c1, c, d, e, f.

Region R = (X,Y,c) corresponding to place pR: X = {a1,a2,c1} = transitions producing a token for pR, Y = {b1,b2,c1} = transitions consuming a token from pR, and c is the initial marking of pR.

Basic idea: enough tokens should be present when consuming

Figure: place pR with input transitions X and output transitions Y, shown among the transitions a1, a2, b1, b2, c1, c, d, e, f.

A place is feasible if it can be added without disabling any of the traces in the event log.
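The feasibility condition can be sketched as a token-counting replay: while replaying every trace, the candidate place must never be asked for a token it does not hold. The log and the function name `is_feasible` are assumptions for illustration; `producers`, `consumers`, and `initial` play the roles of X, Y, and c.

```python
LOG = ["abcd", "acbd", "aed"]  # assumed toy log

def is_feasible(log, producers, consumers, initial=0):
    """True if place (X=producers, Y=consumers, c=initial) blocks no trace in the log."""
    for trace in log:
        tokens = initial
        for event in trace:
            if event in consumers:
                tokens -= 1           # consume first, so self-loops need a token too
                if tokens < 0:
                    return False      # the place would disable this trace
            if event in producers:
                tokens += 1
    return True

print(is_feasible(LOG, {"a"}, {"b", "e"}))          # True
print(is_feasible(LOG, {"b"}, {"d"}))               # False: d in "aed" finds no token
print(is_feasible(LOG, {"b"}, {"a"}, initial=1))    # True
```

Feasibility alone does not make a place useful: a place with no consumers is always feasible but constrains nothing, which is one reason classical region approaches need to be adapted for process mining.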

Example


Regions


Model

Figure: Petri net with transitions a, b, c, d, e and places p1, p2, p3, p4, p5, p6.

Characteristics of region-based mining

• Can be used to discover more complex control-flow structures.

• Classical approaches need to be adapted (overfitting!).

• Representational bias can be parameterized (e.g., free-choice nets, label splitting, etc.).

• Problems dealing with noise.


Other approaches, e.g. fuzzy mining


Evaluating the discovered process


Structure: Is this the simplest model (Occam's Razor)?

Fitness: Is the event log possible according to the model?

Precision: Is the model not underfitting (allowing for too much behavior)?

Generalization: Is the model not overfitting (allowing only for the “accidental” examples in the log)?
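To make two of these questions concrete, the sketch below uses deliberately crude trace-level proxies. It assumes a model can be written down as a finite set of allowed traces, which is not true of models with loops, and it says nothing about structure or generalization. The log, the three example models, and the function names are illustrative assumptions, not established metrics.

```python
from itertools import permutations

LOG = ["abcd", "acbd", "aed"]  # assumed toy log

def trace_fitness(log, model):
    """Fraction of log traces that the model allows (1.0 = log can be replayed)."""
    return sum(1 for t in log if t in model) / len(log)

def trace_precision(log, model):
    """Fraction of the model's traces that were observed (low = allows too much)."""
    return len(model & set(log)) / len(model)

non_fitting = {"abcd", "aed"}   # cannot replay "acbd"
enumerating = set(LOG)          # allows exactly the observed traces
# Allows a, then any one or two distinct activities from {b, c, e}, then d.
broad = {"a" + "".join(p) + "d" for r in (1, 2) for p in permutations("bce", r)}

for name, model in [("non-fitting", non_fitting),
                    ("enumerating", enumerating),
                    ("broad", broad)]:
    print(name, round(trace_fitness(LOG, model), 2),
          round(trace_precision(LOG, model), 2))
```

The enumerating model scores perfectly on both proxies, which illustrates why generalization has to be judged separately: a model that only lists the observed traces is likely to reject the next, slightly different, trace.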