Process Mining for ERP Systems

Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland

Process Discovery

c1: A B C D E

c2: A C B D E

c3: A F D E

→ process discovery algorithm → process

assumptions

• case = sequence of events of this case

• cases are isolated: event A in c1 happens only in c1 (and not in c2)

• cases of the same process

• one unique case id

• each event associated to exactly one case id
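The assumptions above can be made concrete with a small sketch. It computes the directly-follows relation, a common starting point for discovery algorithms, from the three example traces. This is an illustration only and not the specific algorithm used in the talk; the function name `directly_follows` is hypothetical.

```python
from collections import defaultdict

# Event log from the slide: one trace (sequence of activities) per case id.
log = {
    "c1": ["A", "B", "C", "D", "E"],
    "c2": ["A", "C", "B", "D", "E"],
    "c3": ["A", "F", "D", "E"],
}

def directly_follows(log):
    """Count how often activity x is immediately followed by y within one case."""
    counts = defaultdict(int)
    for trace in log.values():
        # Pairs are formed inside a trace only: cases are isolated,
        # so the last event of c1 never "precedes" the first event of c2.
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += 1
    return dict(counts)

dfg = directly_follows(log)
start_activities = {trace[0] for trace in log.values()}
end_activities = {trace[-1] for trace in log.values()}
```

Because B and C follow each other in both orders across c1 and c2, a discovery algorithm would typically treat them as concurrent, with F as an alternative branch.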

Typical Process in an ERP System

Build to Order

[Figure: build-to-order example. Customers such as Alice order products X and Y from the Manufacturer; the Manufacturer orders materials A, B and C from the suppliers ACME Inc. and Mega Corp.]

[Figure labels: n-to-m relations; database]

ProductOrder

poID  cust.  …  created      processed    built        shipped
po1   Alice  …  30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob    …  30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

OrderedMaterial

poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

MaterialOrder

moID  suppl.  …  completed    sent         received
mo3   ACME    …  30-08 13:15  30-08 14:15  01-09 9:05
mo4   MEGA    …  30-08 13:17  30-08 16:12  01-09 10:13

Customer

cust.  address  …
Alice  …        …
Bob    …        …

legend: id attributes, time-stamp attributes, data attributes, relations

MaterialOrder

- moID

- supplier

- completed

- sent

- received

OrderedMat.

- poID

- moID

- type

- added

Customer

- cust

ProductOrder

- poID

- cust

- created

- processed

- built

- shipped

Process Discovery for ERP Systems

reality: data in a relational DB

• events stored as time-stamped attributes in tables

• multiple primary keys → multiple notions of case

• tables are related → one event related to multiple cases

1..* 1
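A short sketch of why related tables break the "one case per event" assumption, using the OrderedMaterial rows of the example. The helper `cases_by` is hypothetical and only groups the `added` events by one of the two id columns.

```python
from collections import defaultdict

# Rows of the OrderedMaterial table from the example: (poID, moID, type, added).
ordered_material = [
    ("po1", "mo3", "B", "30-08 13:13"),
    ("po1", "mo4", "A", "30-08 13:14"),
    ("po2", "mo3", "B", "30-08 13:15"),
    ("po2", "mo4", "C", "30-08 13:16"),
]

def cases_by(rows, key_index):
    """Group the 'added' events by the id found in column key_index."""
    cases = defaultdict(list)
    for row in rows:
        cases[row[key_index]].append(("added", row[3]))
    return dict(cases)

# Two notions of case for the same events: by product order or by material order.
by_product_order = cases_by(ordered_material, 0)
by_material_order = cases_by(ordered_material, 1)
```

The event `("added", "30-08 13:13")` ends up both in case po1 and in case mo3, so no single case id covers it.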

Outline

[Diagram: the process is decomposed by primary keys into one log per key (e.g. log f. MO); discovery yields one model per log (model f. …); the models are related by primary/foreign-key relations]

Find Artifact Schemas

documented schema vs. actual schema: identify

• column types (esp. time-stamped columns)

• primary keys

• foreign keys

various (non-trivial) techniques available

key discovery is NP-complete in the size of the table(s)
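To illustrate what data-driven key discovery involves, here is a naive sketch: a column combination is a candidate key if its values are unique over all rows, and a column is a candidate foreign key if all its values occur in the referenced column. It is not one of the techniques the slide refers to; the function names and the `max_size` bound are assumptions, and the exhaustive search over combinations is exactly what becomes expensive on real tables.

```python
from itertools import combinations

def candidate_keys(rows, columns, max_size=2):
    """Minimal column combinations (up to max_size) whose values are unique per row."""
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            if any(set(key) <= set(combo) for key in found):
                continue  # a subset is already a key, so combo is not minimal
            values = [tuple(row[c] for c in combo) for row in rows]
            if len(set(values)) == len(values):
                found.append(combo)
    return found

def is_foreign_key(child_rows, child_col, parent_rows, parent_col):
    """Inclusion check: every child value must occur in the parent column."""
    parent_values = {row[parent_col] for row in parent_rows}
    return all(row[child_col] in parent_values for row in child_rows)

# Example rows from the slides (time-stamp columns left out of the key search).
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B"},
    {"poID": "po1", "moID": "mo4", "type": "A"},
    {"poID": "po2", "moID": "mo3", "type": "B"},
    {"poID": "po2", "moID": "mo4", "type": "C"},
]
material_order = [{"moID": "mo3"}, {"moID": "mo4"}]

keys = candidate_keys(ordered_material, ["poID", "moID", "type"])
mo_reference = is_foreign_key(ordered_material, "moID", material_order, "moID")
```

On these four rows the search reports `("poID", "moID")` but also the accidental `("poID", "type")`, which shows why small samples can suggest keys that do not hold in general.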

result:

Step 0: discover database schema

= schema summarization

Step 1: decompose schema into processes

ProductOrder MaterialOrder

1. sets of corresponding tables

2. links between

Automatic Schema Summarization

= group similar tables

through clustering

define a distance between

any 2 tables

• by relations

• by information content

tables that are close to each other → same cluster

# of clusters: user input

1. structural distance between tables

• fanout ~ avg. # of child records related to the same parent record

  e.g. fanout: 1 = (2+0)/2

• matched fraction ~ 1 / (fraction of records in parent with matching child record)

  e.g. m.fr: 2 = 1 / (1/2)

[Figure: example relations annotated with fanout 1, 2, 1 and m.fr 1, 2]
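The two structural measures can be sketched directly from the definitions on the slide. The function names and the handling of the no-match case (returning infinity) are assumptions; the exact definitions in the underlying summarization technique may differ.

```python
from collections import Counter

def fanout(parent_keys, child_foreign_keys):
    """Average number of child records per parent record (childless parents count as 0)."""
    children = Counter(child_foreign_keys)
    return sum(children[k] for k in parent_keys) / len(parent_keys)

def matched_fraction(parent_keys, child_foreign_keys):
    """1 / (fraction of parent records with at least one matching child record)."""
    referenced = set(child_foreign_keys)
    matched = sum(1 for k in parent_keys if k in referenced)
    if matched == 0:
        return float("inf")  # assumption: no match at all means maximal distance
    return len(parent_keys) / matched

# Slide example: two parents, one with two children and one with none.
slide_fanout = fanout(["p1", "p2"], ["p1", "p1"])          # (2+0)/2
slide_mfr = matched_fraction(["p1", "p2"], ["p1", "p1"])   # 1 / (1/2)

# ProductOrder (parent) and OrderedMaterial (child) from the example tables.
po_fanout = fanout(["po1", "po2"], ["po1", "po1", "po2", "po2"])
po_mfr = matched_fraction(["po1", "po2"], ["po1", "po1", "po2", "po2"])
```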

Grouping by Clustering

2. information distance

• importance of each table = entropy (is maximal if all records are different)

• distance: 2 tables with high entropies → large distance

3. weighted distance by structure + information

4. k-means clustering: k clusters based on weighted distance

most important table of cluster = table with least distance to all others

→ key attribute of the cluster
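Two of the ingredients above lend themselves to a short sketch: the entropy of a table, and picking the most important table of a cluster. The sketch assumes Shannon entropy over the distribution of distinct records and reads "least distance to all" as the smallest summed distance within the cluster. The distance values are invented for illustration; how structure and information are weighted, and the clustering step itself, are not shown.

```python
import math
from collections import Counter

def table_entropy(rows):
    """Shannon entropy (in bits) of the distribution of distinct records."""
    counts = Counter(rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def most_important_table(cluster, distance):
    """Table of the cluster with the least total distance to the other tables in it."""
    return min(cluster, key=lambda t: sum(distance[t][u] for u in cluster if u != t))

# Four different records give the maximal entropy log2(4) = 2 bits.
all_different = table_entropy([("po1", "mo3"), ("po1", "mo4"), ("po2", "mo3"), ("po2", "mo4")])
all_equal = table_entropy([("x",), ("x",), ("x",), ("x",)])

# Hypothetical weighted distances between the tables of one cluster.
distance = {
    "ProductOrder": {"OrderedMaterial": 1.0, "Customer": 1.5},
    "OrderedMaterial": {"ProductOrder": 1.0, "Customer": 3.0},
    "Customer": {"ProductOrder": 1.5, "OrderedMaterial": 3.0},
}
center = most_important_table(["ProductOrder", "OrderedMaterial", "Customer"], distance)
```

With these made-up distances the center is ProductOrder, so its primary key would serve as the key attribute of the cluster.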

Artifact Schema → Artifact Log

Log Extraction

OrderedMaterial

po1 mo3 B 30-08 13:13

po1 mo4 A 30-08 13:14

po2 mo3 B 30-08 13:15

po2 mo4 C 30-08 13:16

ProductOrder

po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15

po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18

• cluster = set of related tables + primary key of most important table → case id

• time-stamped attribute → event

• related attributes → event attributes

events of case po1:

(created, poID=po1, time=30-08 9:22, cust.=Alice, …)

(processed, poID=po1, time=30-08 13:12, …)

(added, poID=po1, time=30-08 13:13, moID=mo3, …)

refers to artifact “MaterialOrder”
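The extraction rules can be sketched on the example rows: every time-stamped attribute becomes an event, the remaining columns become event attributes, and events are grouped by the case id of the cluster. This is a simplified illustration, not the prototype's implementation; the function names and the timestamp parsing for the "DD-MM H:MM" strings are assumptions.

```python
product_order = [
    {"poID": "po1", "cust": "Alice", "created": "30-08 9:22", "processed": "30-08 13:12",
     "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust": "Bob", "created": "30-08 10:15", "processed": "30-08 13:14",
     "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]

def parse(timestamp):
    """Turn 'DD-MM H:MM' into a sortable tuple (month, day, hour, minute)."""
    date, clock = timestamp.split()
    day, month = date.split("-")
    hour, minute = clock.split(":")
    return (int(month), int(day), int(hour), int(minute))

def extract_events(rows, case_id, timestamp_columns):
    """One event per filled time-stamped attribute; other columns become event attributes."""
    events = []
    for row in rows:
        data = {k: v for k, v in row.items() if k not in timestamp_columns}
        for column in timestamp_columns:
            if row.get(column):
                events.append({"activity": column, "case": row[case_id],
                               "time": row[column], **data})
    return events

def extract_log(tables, case_id):
    """Group the events of all tables in the cluster by case id, sorted by time."""
    log = {}
    for rows, timestamp_columns in tables:
        for event in extract_events(rows, case_id, timestamp_columns):
            log.setdefault(event["case"], []).append(event)
    for trace in log.values():
        trace.sort(key=lambda e: parse(e["time"]))
    return log

log = extract_log(
    [(product_order, ["created", "processed", "built", "shipped"]),
     (ordered_material, ["added"])],
    case_id="poID",
)
```

The `added` events carry `moID` as an attribute, which is the reference from the ProductOrder artifact to the MaterialOrder artifact.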

Outline

[Diagram: one log per key (e.g. log f. order) → discovery → one model per log (model f. …); the models are composed into the process by primary/foreign-key relations]

Resulting Model(s)

Product Order: create, processed, shipped

Material Order: completed, received

(added, poID=po1, …, moID=mo3)

Implementation & Evaluation

prototype tool

• input: relational database (via JDBC), .csv tables

• steps

− discover database schema (types, keys, relations)

− discover artifact schema: by k-means clustering, or by user picking tables

− extract logs → ProM

Evaluation: SAP System of Sligro

> 300 tables, > 40 GiB of data

schema extraction

• time-stamp attributes: 15 hrs

• primary keys: 4 hrs

• foreign keys: 5 hrs (single col.) / 6 days (double col.)

clustering

• entropies: 17 hrs

• table distances: 5 hrs

• clustering: a few seconds

• ~20 different artifacts found

• largest: 47 tables, 869 columns

log extraction

• extract 1000 traces of > 246,000 events

• query database: 1 hr

• write log file: 32 hrs

Sligro: Artikel lifecycle model

Open issues

performance

• key discovery: NP-complete in R (# of records)

• foreign key discovery: NP-complete in R²

• problem is in the “hard part” of NP

• sampling of data, domain knowledge, semi-automatic

requires good database structure

• proper relations, proper keys

• otherwise wrong clusters are formed

• events don’t get right attributes

• semi-automatic approach

events shared by multiple cases… working on it…
