Process-oriented System Analysis

Preview:

DESCRIPTION

Process-oriented System Analysis. Process Mining. BPM Lifecycle. Motivation. Up until now : Designed or pre-defined models Assumption that they are appropriate Process Mining Consideration of information from the execution of proceses This is covered in log data Logs - PowerPoint PPT Presentation

Citation preview

Process-oriented System AnalysisProcess Mining

BPM Lifecycle

Motivation

Up until now:Designed or pre-defined modelsAssumption that they are appropriate

Process MiningConsideration of information from the

execution of procesesThis is covered in log data

LogsSequence of log entries, which capture

events in a company that relate to processes

Log entries

Examples of log entriesCheck Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“)

completed on 12.11.2010 at 9:22:24Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18Function ContactCustomer(c1987, PromoMailing) completed on

12.11.2010 at 9:24:10Function StoreCustomerData(„Miller“, c1988, „Osnabrück“) completed

on 12.11.2010 at 9:26:08Check Invoice for Invoice No. 4568 completed on 12.11.2010 at 9:26:38Function ContactCustomer(c1988, PromoMailing) completed on

12.11.2010 at Send 9:27:32

Logs bear valuable information

Logs bear valuable information to answer questions likeWhen and how many process instances have been executed?Are there recurring patterns in the execution of activities?Can process models be derived from the data?Which paths of execution are used how often in the process

models?Are there paths which are never taken?

Process Discovery

Process Discovery is a technique for deriving a process model from log data

Input: execution logs as ordered lists of activities with time stamp and case id

Output: process model which could have generated the execution logs

The case id is often not directly covered in the data, and needs to be generated in pre-processing

Process Conformance

Process Conformance is a technique to analyze the relationship between log data and process models

Input: Logs and process modelOutput: information on the relationship, e.g. fitness

Overview

Execution Logs

AssumptionExecution log defines complete order of events, which can all be

related to process activitiesAll events in the execution log relate to process instances of the

considered processHint

Often log entries refer to different process modelsThis warrants filtering activities

AbstractionTechniques often work on abstraction of logsFocus on case id and activities

Execution Log Format

Log format(caseID, activity)

ExampleCheck Invoice for Invoice No. 4567 completed on 12.11.2010 at

9:19:57Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“)

completed on 12.11.2010 at 9:22:24Send Invoice for Invoice No. 4567 completed on 12.11.2010 at

9:23:18

Resulting Log(4567, Check Invoice), (c1987, StoreCustomerData), (4567, Send

Invoice), etc.

Execution Log

Further abstractionA‘s and B‘s(case id, task id)

Additional informationEvent type, time, resource,

dataNot considered here

AssumptionActivity execution captured by

one eventNo intermediate activities

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

The Alpha Algorithm

Process Discovery Algorithms

Simplest Algorithm: The α – AlgorithmRelatively simple, some properties can be proofedAffected by Noise, therefore not first choice in practice

Noise refers to incomplete or erroneous logsFurthermore, the α+(+) – Algorithms

α+ and α++ are extensions to the α – Algorithm for recognizing more fine-granular structure in the process model

Also affected by NoiseFinally, techniques for dealing with Noise

Definitions

Let T be a set of activities (Tasks) and T * the set of all sequences of arbitrary length over T, then we have:σ T * is called execution sequence, if all activities in σ belong to

the same process instanceW T * is called execution log (workflow log)

AssumptionsIn each process model, each activity appears at most onceEach direct neighbor relation between activities is represented at least

once

Execution Logs

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

Execution Logs

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

Execution sequences:Case 1: ABCDCase 2: ACBDCase 3: ABCDCase 4: ACBDCase 5: EFResultingworkflow log: W = {ABCD, ACBD, EF}

Order relations

Log based order relations for pairs of activities a, b T in a workflow log W:Direct successor

a >w b i.e. in an execution sequence b directly follows aCausality

a w b i.e. a >w b and not b >w a

Concurrency a ║w b i.e. a >w b and b >w a

Exclusivenessa w b i.e. not a >w b and not b >w aActivity pairs which never succeed each other

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

W = {ABCD, ACBD, EF}• Direct successor• Causality• Concurrency

Execution log analysis

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

A>BA>CB>CB>DC>BC>DE>F

AB

AC

BD

CD

EF

B||CC||B

1) 2) 3)

• W = {ABCD, ACBD, EF}• Direct successor• Causality• Concurrency

Execution log analysis

α-Algorithm

The idea is to utilize order relations for deriving a workflow net that is compliant with these relations

Precisely, each order relation results in a petri net fragment, which imposes the respective relationship

α-Algorithm

Idea (a)

a b

α-Algorithm

Idea (b)

a b, a c and b # c

α-Algorithm

Idea (c)

b d, c d and b # c

α-Algorithm

Idea (d)

a b, a c and b || c

α-Algorithm

Idea (e)

b d, c d and b || c

The Alpha-Algorithm (simplified)

1. Identify the set of all tasks in the log as TL.2. Identify the set of all tasks that have been observed as the first

task in some case as TI.3. Identify the set of all tasks that have been observed as the last

task in some case as TO.4. Identify the set of all connections to be potentially represented in

the process model as a set XL. Add the following elements to XL:a. Pattern (a): all pairs for which hold a→b.b. Pattern (b): all triples for which hold a→(b#c).c. Pattern (c): all triples for which hold (b#c)→d.

Note that triples for which Pattern (d) a→(b||c) or Pattern (e) (b||c)→d hold are not included in XL.

The Alpha-Algorithm (cont.)

5. Construct the set YL as a subset of XL by:a. Eliminating a→b and a→c if there exists some a→(b#c).b. Eliminating b→c and b→d if there exists some (b#c)→d.

6. Connect start and end events in the following way:a. If there are multiple tasks in the set TI of first tasks, then draw a start

event leading to an XOR-split, which connects to every task in TI. Otherwise, directly connect the start event with the only first task.

b. For each task in the set TO of last tasks, add an end event and draw an arc from the task to the end event.

The Alpha-Algorithm (cont.)

7. Construct the flow arcs in the following way:a. Pattern (a): For each a→b in YL, draw an arc a to b.

b. Pattern (b): For each a→(b#c) in YL, draw an arc from a to an XOR-split, and from there to b and c.

c. Pattern (c): For each (b#c)→d in YL, draw an arc from b and c to an XOR-join, and from there to d.

d. Pattern (d) and (e): If a task in the so constructed process model has multiple incoming or multiple outgoing arcs, bundle these arcs with an AND-split or AND-join, respectively.

8. Return the newly constructed process model.

α-Algorithm Example

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

α-Algorithm Example

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

a(W):

α-Algorithm

Log Completeness

Level of completeness required for a logAssume for the execution sequence EF, there is a log missingThen, the correct process model cannot be derived

Basic assumption: each execution sequence must be part of the logConsequence: the complete behaviour is visibleProblem: amount of required instances grows dramaticallyExample:

10 activities are executed in parallelAmount of potential execution sequences:

10! = 3.628.800

Log Completeness

ResultFor the α-Algorithm it is sufficient to have completeness in terms of

the successor relationship (>w)Reason

All other relations are derived from direct successorshipInterpretation

Each time two activities may succeed each other, this must be visible in at least one execution sequence

HintIn case of highly concurrent process models, this reduces the amount

of required execution sequences dramatically

Summary

• Execution Logs• Process Mining using the Alpha-Algorithm

Recommended