RuleML 2015: When Processes Rule Events

Lecture Outline

Big Data: the New Playground

Events, Processes, and Anything in Between

Complex Event Processing Optimization

Process Mining with Schedules

When Processes Rule Events

Avigdor Gal, Technion – Israel Institute of Technology

Presentation Outline

Big data: the New Playground

Events, Processes, and Anything in Between

Complex Event Processing Optimization

Process Mining with Schedules

Big Data: is it a Storm in a Teacup?

Big data is a game changer

From Theory to Systems: empirical evaluation counts

From Systems to Data: large-scale empirical evaluation counts

Who is a Data Scientist?

The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades. (Hal Varian, Google’s Chief Economist)

Data Volume: No Longer the Size of a Teacup

Volume

Table: Big Data Cross Table

Big data may be a single dataset with a lot of data

Data Velocity: Replacing a Teacup with a Tea Hose

Volume

Velocity

Table: Big Data Cross Table

Big data may be data that rapidly changes

Data Variety: When One Tea Type is Just Not Enough

Volume

Velocity

Variety

Table: Big Data Cross Table

Big data may be a small dataset with many different schemata

Data Veracity: Is it Coffee or Black Tea with Milk?

Volume

Velocity

Variety

Veracity

Table: Big Data Cross Table

Big data may be data with varying levels of trustworthiness

Data Gathering: where and when to expect the fountain to burst

Gathering

Volume

Velocity

Variety

Veracity

Signal and Event Processing

Table: Big Data Cross Table

Data Management: Not your typical DBA anymore

Gathering Managing

Volume

Velocity

Variety

Veracity

Cloud Computing, NoSQL, NewSQL

Table: Big Data Cross Table

Data Analytics: When Data Analysis Explodes Multi-Dimensionally

Gathering Managing Analyzing

Volume

Velocity

Variety

Veracity

Data & Process Mining, ML, IR, NLP

Table: Big Data Cross Table

Data Visualization: The Machine Offering to Mankind

Gathering Managing Analyzing Visualizing

Volume

Velocity

Variety

Veracity

User Experience

Table: Big Data Cross Table

Big Data Cross Table

Columns: Gathering, Managing, Analyzing, Visualizing

Rows: Volume, Velocity, Variety, Veracity

Events and Processes cut across all four rows.

Table: Big Data Cross Table

Event Processing

Events

An event $e$ is an occurrence within a particular system or domain.

It is something that has happened, or is contemplated as having happened, in that domain. [Etzion and Niblett, 2010]

Point-based semantics.

An event type $E \in \mathcal{E}$ is a specification for a set of events that share the same semantic intent and structure.
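The definitions above can be sketched as two small data types. This is a minimal illustration, assuming an event type is captured by a name (its semantic intent) plus the attribute names every such event must carry (its structure), and that point-based semantics means each event has exactly one timestamp. The class names and the `BusAtStop` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class EventType:
    """Specification shared by a set of events: a name (semantic intent)
    and the attribute names each such event must carry (structure)."""
    name: str
    attributes: Tuple[str, ...]


@dataclass
class Event:
    """One occurrence with point-based semantics: a single timestamp."""
    event_type: EventType
    timestamp: float
    payload: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject events that lack an attribute their type requires.
        missing = set(self.event_type.attributes) - set(self.payload)
        if missing:
            raise ValueError(f"missing attributes: {sorted(missing)}")


# Hypothetical event type for the bus scenario discussed later.
bus_at_stop = EventType("BusAtStop", ("bus_id", "stop_id"))
e = Event(bus_at_stop, timestamp=1_000.0, payload={"bus_id": 33, "stop_id": "s"})
print(e.event_type.name, e.timestamp)  # BusAtStop 1000.0
```

Keeping the type separate from its instances mirrors the distinction between an event type and the set of events it specifies.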

Complex Event Processing

Systems: Amit [Adi and Etzion, 2004], SASE [Wu et al., 2006], Cayuga [Demers et al., 2007], CEDR [Barga et al., 2007], ESPER [].

DEBS 2016: Orange County, California

Event Processing

Urban Traffic Management

Traffic Flow

Bus Log

Events and Big Data

Volume: 23 million records per month (∼4 GB)

Velocity: 770,000 new records per day (an event every 2–6 seconds)

Variety: Homogeneous

Veracity: GPS locations
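A quick consistency check on the volume and velocity figures above. This sketch assumes a 30-day month and reads "4GB" as decimal gigabytes; both are assumptions. It shows that the monthly volume and the daily velocity agree, and that the whole log produces roughly nine records per second, so the "every 2–6 seconds" interval presumably refers to individual sources (e.g., a single bus) rather than to the log as a whole.

```python
# Back-of-the-envelope rates from the bus-log figures on the slide.
RECORDS_PER_MONTH = 23_000_000
BYTES_PER_MONTH = 4 * 10**9        # assumption: "~4GB" as decimal gigabytes
RECORDS_PER_DAY = 770_000
DAYS_PER_MONTH = 30                # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60

implied_daily = RECORDS_PER_MONTH / DAYS_PER_MONTH      # ~766,667 records/day
fleet_rate = RECORDS_PER_DAY / SECONDS_PER_DAY          # ~8.9 records/s overall
bytes_per_record = BYTES_PER_MONTH / RECORDS_PER_MONTH  # ~174 bytes/record

print(f"{implied_daily:,.0f} records/day implied by the monthly volume")
print(f"{fleet_rate:.1f} records/s across the whole log")
print(f"{bytes_per_record:.0f} bytes per record on average")
```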

Processes

Processes

Process models describe time dependencies among activities:

Business processes

Scheduled activities

Used as a template for execution by a process engine.

A process model can be represented as a graph containing activity nodes and control nodes:

Petri nets [Reisig, 1985]

BPMN [bpm, 2011]
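To make the graph view concrete, here is a minimal place/transition net sketch. In this sketch, transitions play the role of activity nodes and tokens on places encode the control state; this mapping, the `PetriNet` class, and the three-stop bus trip are illustrative assumptions, not the model used in the talk.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


class PetriNet:
    """Minimal place/transition net: transitions stand for activities, and
    the marking (tokens on places) records where control currently is."""

    def __init__(self, marking: Dict[str, int]) -> None:
        self.marking: Dict[str, int] = dict(marking)
        self.transitions: Dict[str, Tuple[List[str], List[str]]] = {}

    def add_transition(self, name: str, inputs: List[str], outputs: List[str]) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self) -> Set[str]:
        # A transition is enabled if each input place holds enough tokens.
        return {
            name
            for name, (inputs, _) in self.transitions.items()
            if all(self.marking.get(p, 0) >= k for p, k in Counter(inputs).items())
        }

    def fire(self, name: str) -> None:
        # Firing consumes a token per input arc and produces one per output arc.
        if name not in self.enabled():
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical bus trip s -> w2 -> d, one transition per segment.
net = PetriNet({"at_s": 1})
net.add_transition("drive_s_w2", ["at_s"], ["at_w2"])
net.add_transition("drive_w2_d", ["at_w2"], ["at_d"])
print(net.enabled())   # {'drive_s_w2'}
net.fire("drive_s_w2")
net.fire("drive_w2_d")
print(net.marking)     # {'at_s': 0, 'at_w2': 0, 'at_d': 1}
```

Because the stop order is encoded in the net, only the next segment of the trip is ever enabled; this is the kind of ordering knowledge a process model contributes.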

Process Models

Bus Log

Bus Model

[Figure: bus model with stops labeled s, $\omega_2$, $\omega_3$, $\omega_i$, $\omega_{n-1}$, d]

Traveling Time = Drive Time + Delay Time + Stop Time

Between Events and Processes

Given processes, detect (complex) events

Given events, discover processes

From Processes to CEP

Optimisation of event pattern matching on three levels

Approach based on domain knowledge

Results taken from: M. Weidlich, H. Ziekow, A. Gal, J. Mendling, M. Weske. Optimising Event Pattern Matching using Business Process Models. IEEE Transactions on Knowledge and Data Engineering (TKDE), accepted for publication, 2015.

From Processes to CEP

(Thanks to Matthias Weidlich for the slides)

Optimization by Transformation

Sequentialization Rule

Optimization by Plan Selection

Sequentialization Rule

Optimization by Early Termination

Sequentialization Rule

Performance Analysis

Datasets

Publicly available process log that contains recorded execution sequences of a paper reviewing process (http://www.processmining.org/logs/start).

The model defines 20 activities.

The log comprises 3730 events that are related to 100 process instances.

Each event is associated with a timestamp and a reference to an activity of the process model.

Process models of a German insurance company.

1021 process models, ranging from 4 to 339 nodes.

The average size of the process models is around 23 nodes.

The log was simulated using annotations of the process models.

Complex Events Processing with Processes

Gathering ...

Volume

Velocity Optimization

Variety Optimisation in event processing networks

Veracity

Table: Big Data Cross Table

Complex Events Processing with Processes

... Analysis

Volume Mining of constraints

Velocity

Variety

Veracity Probabilistic mining of constraints

Table: Big Data Cross Table

From Events to Processes

Online Traveling Time Prediction: when Processes Rule Events

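Using information on bus stops, the prediction of the journey traveling time $T(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1})$ is decomposed into the sum of traveling times per segment:

$$T(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1}) = T(\langle \omega_1, \omega_2 \rangle, t_{\omega_1}) + \cdots + T(\langle \omega_{n-1}, \omega_n \rangle, t_{\omega_{n-1}})$$

where each segment is evaluated at the time the bus reaches its first stop; for the last segment,

$$t_{\omega_{n-1}} = t_{\omega_1} + T(\langle \omega_1, \ldots, \omega_{n-1} \rangle, t_{\omega_1}).$$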
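The decomposition can be sketched as a loop that walks the stops in order and advances the clock by each segment's predicted time, so every segment is evaluated at the (predicted) arrival time at its first stop. The segment-level predictor is left abstract; `journey_time`, `SegmentPredictor`, and the peak-hour `toy_predictor` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Segment = Tuple[str, str]
# Hypothetical interface: (segment, time the bus enters it) -> seconds.
SegmentPredictor = Callable[[Segment, float], float]


def journey_time(stops: Sequence[str], t_w1: float,
                 segment_time: SegmentPredictor) -> float:
    """T(<w1, ..., wn>, t_w1) as the sum of per-segment traveling times."""
    t = t_w1                              # time at which the bus reaches stop w_i
    for segment in zip(stops, stops[1:]):
        t += segment_time(segment, t)     # t becomes t_{w_{i+1}}
    return t - t_w1


def toy_predictor(segment: Segment, t: float) -> float:
    # Hypothetical: every segment takes 120 s before t = 200 s, 180 s after.
    return 120.0 if t < 200.0 else 180.0


print(journey_time(["s", "w2", "w3", "d"], 0.0, toy_predictor))  # 420.0
```

In the toy run the third segment starts at t = 240 s, past the threshold, so it contributes 180 s instead of 120 s; this is why the start time of each segment matters.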

[Figure: bus model with stops labeled s, $\omega_2$, $\omega_3$, $\omega_i$, $\omega_{n-1}$, d]

Traveling Time = Drive Time + Delay Time + Stop Time

(Thanks to Arik Senderovich for the slides)

Prediction: The Snapshot Principle in Single-Station Queues

The snapshot principle stems from a heavy-traffic approximation of a queueing system, taken in the limit of its parameters as the workload converges to capacity.

[Figure: a single-station queue, Station1]

The principle states that the total time in the station (waiting + service) remains essentially constant: customers who pass through the station in close succession experience about the same total time.

In our context, a bus that passes through a segment, e.g., $\langle \omega_i, \omega_{i+1} \rangle \in S \times S$, is assumed to have the same traveling time as another bus that has just passed through that segment (not necessarily of the same type, line, etc.).

The Snapshot Principle in Single-Station Queues

Based on the above, we define a single-segment snapshot predictor, Last-Bus-to-Travel-Segment (LBTS), denoted by $\theta_{\mathrm{LBTS}}(\langle \omega_i, \omega_{i+1} \rangle, t_{\omega_1})$.

In real-life settings, the applicability of snapshot-principle predictors should be tested case by case.

The snapshot principle has been shown to be of empirical value in previous research, where queueing techniques were applied to predict delays.
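A minimal sketch of an LBTS-style predictor follows. It assumes "the last bus to travel the segment" means the bus that most recently *completed* the segment at or before the prediction time, and it returns `None` when no such bus exists (as for the first trip of a day). The class and method names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Segment = Tuple[str, str]


class LastBusToTravelSegment:
    """Sketch of theta_LBTS: the predicted traveling time of a segment is the
    traveling time of the last bus (any line or type) that completed that
    segment at or before the prediction time."""

    def __init__(self) -> None:
        # Per segment: completion times (kept sorted) and matching durations.
        self._leave_times: Dict[Segment, List[float]] = defaultdict(list)
        self._durations: Dict[Segment, List[float]] = defaultdict(list)

    def record(self, segment: Segment, enter: float, leave: float) -> None:
        """Register one completed traversal of `segment`."""
        i = bisect.bisect_right(self._leave_times[segment], leave)
        self._leave_times[segment].insert(i, leave)
        self._durations[segment].insert(i, leave - enter)

    def predict(self, segment: Segment, t: float) -> Optional[float]:
        """Duration of the latest traversal completed at or before t,
        or None when no bus has completed the segment yet."""
        i = bisect.bisect_right(self._leave_times[segment], t)
        return self._durations[segment][i - 1] if i > 0 else None


lbts = LastBusToTravelSegment()
seg = ("w2", "w3")
print(lbts.predict(seg, 100.0))    # None: no completed traversal yet
lbts.record(seg, enter=0.0, leave=90.0)
lbts.record(seg, enter=60.0, leave=170.0)
print(lbts.predict(seg, 120.0))    # 90.0: only the first bus has left
print(lbts.predict(seg, 200.0))    # 110.0: the most recent bus
```

Keeping completion times sorted makes each lookup a binary search, which suits a stream where traversals are recorded continuously and predictions are requested at arbitrary times.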

Snapshot Principle in a Network

In our case, the LBTS predictor needs to be lifted to a network setting.

The snapshot principle holds for networks of queues when the routing through the network is known in advance.

In scheduled transportation such as buses, this is the case, as the order of stops (and segments) is predefined:

[Figure: a network of stations, Station1 through Station7]

Snapshot Principle in a Network

We define a multi-segment (network) snapshot predictor thatwe refer to as the Last-Bus-to-Travel-Network orθLBTN (〈ω1, ..., ωn〉, tω1), given a sequence of stops (with ω1

being the start stop and ωn being the end stop).

According to the snapshot principle in networks we get that:

θLBTN (〈ω1, ..., ωn〉, tω1) =

n∑i=1

θLBTS(〈ωi, ωi+1〉, tω1).
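The network predictor can be sketched as a sum over the $n-1$ consecutive segments, all queried at $t_{\omega_1}$. This standalone sketch assumes the segment history is a plain dict of (completion time, duration) pairs and that the whole prediction is undefined if any segment has no completed traversal yet; `theta_lbts`, `theta_lbtn`, and the sample history are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Segment = Tuple[str, str]
# Completed traversals per segment as (leave_time, duration) pairs.
History = Dict[Segment, List[Tuple[float, float]]]


def theta_lbts(history: History, segment: Segment, t: float) -> Optional[float]:
    """Duration of the last traversal of `segment` completed at or before t."""
    done = [(leave, d) for leave, d in history.get(segment, []) if leave <= t]
    return max(done)[1] if done else None


def theta_lbtn(history: History, stops: Sequence[str], t_w1: float) -> Optional[float]:
    """Sum of theta_LBTS over the n-1 segments <w_i, w_{i+1}>, all evaluated
    at the prediction time t_w1."""
    total = 0.0
    for segment in zip(stops, stops[1:]):
        d = theta_lbts(history, segment, t_w1)
        if d is None:      # no bus has completed this segment yet
            return None
        total += d
    return total


history: History = {
    ("s", "w2"): [(100.0, 80.0), (400.0, 95.0)],
    ("w2", "d"): [(300.0, 150.0)],
}
print(theta_lbtn(history, ["s", "w2", "d"], 500.0))  # 245.0
print(theta_lbtn(history, ["s", "w2", "d"], 200.0))  # None
```

Unlike the segment-by-segment decomposition of the journey time, nothing is propagated forward here: the snapshot at $t_{\omega_1}$ is reused for every segment.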

Performance Analysis

Data

8 days of bus data, between September and October of 2014.

Each day: approximately 11,500 traveled segments.

First trip of each day: no associated last travel time.

Prediction for line 046A.

Data comes from all buses that share segments with line 046A.

Performance Analysis

[Figure: sample square estimation error and root mean square error versus the index of the segment in the trip]
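The figure relates prediction error to the position of the segment in the trip. A minimal sketch of how such a per-index RMSE curve could be computed, assuming each prediction is stored together with its segment index and the observed traveling time; the function name, row layout, and numbers are hypothetical, not data from the talk.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rmse_by_segment_index(rows: Iterable[Tuple[int, float, float]]) -> Dict[int, float]:
    """Group (segment index, predicted, observed) triples by segment index and
    return the root mean square error of each group."""
    squared: Dict[int, List[float]] = defaultdict(list)
    for index, predicted, observed in rows:
        squared[index].append((predicted - observed) ** 2)
    return {index: math.sqrt(sum(errs) / len(errs))
            for index, errs in sorted(squared.items())}


# Toy numbers (seconds), not data from the talk.
rows = [(1, 100.0, 110.0), (1, 90.0, 80.0), (2, 200.0, 230.0)]
print(rmse_by_segment_index(rows))  # {1: 10.0, 2: 30.0}
```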

Process Mining with Schedules

... Analysis

Volume Better prediction

Velocity Segmentation

Variety

Veracity

Table: Big Data Cross Table

Process Mining with Schedules

... Management ...

Volume

Velocity

Variety

Veracity Event Cleaning

Table: Big Data Cross Table

Thank You

Avigdor Gal, Technion – Israel Institute of Technology

A. Adi and O. Etzion. Amit - the situation manager. The International Journal on Very Large Data Bases, 13(2):177–203, May 2004.

Roger S. Barga, Jonathan Goldstein, Mohamed H. Ali, and Mingsheng Hong. Consistent streaming through time: A vision for event stream processing. In CIDR [DBL, 2007], pages 363–374.

Business Process Model and Notation (BPMN) Version 2.0. Technical report, Object Management Group (OMG), January 2011.

CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings. www.cidrdb.org, 2007.

Alan J. Demers, Johannes Gehrke, Biswanath Panda, Mirek Riedewald, Varun Sharma, and Walker M. White. Cayuga: A general purpose event monitoring system. In CIDR [DBL, 2007], pages 412–422.

Opher Etzion and Peter Niblett. Event Processing in Action. Manning Publications Company, 2010.

Wolfgang Reisig. Petri Nets: An Introduction, volume 4 of Monographs in Theoretical Computer Science. An EATCS Series. Springer, 1985.

Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex event processing over streams. In SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 407–418, New York, NY, USA, 2006. ACM.