Predictive Process Monitoring Framework with Hyperparameter Optimization
Chiara Di Francescomarino, Chiara Ghidini (Fondazione Bruno Kessler)
Marlon Dumas, Fabrizio Maria Maggi (University of Tartu)
Marco Federici, Williams Rizzi (University of Trento)
2
Predictive Business Process Monitoring
[Diagram: historical execution traces, a running trace, and a prediction problem are used to produce a prediction, e.g. "Does Alice need a given exam?"]
3
Predictive Process Monitoring Frameworks
• Framework instance or configuration: combination of techniques and their input parameters (hyperparameters).
• No unique framework instance for all prediction problems and datasets.
[Diagram: a Predictive Process Monitoring Framework instance combines a control-flow encoding (frequency-based or sequence-based), a clustering technique (K-means, DBScan, or agglomerative clustering), a classification technique (decision tree or random forest), and voting, each with its own hyperparameters (cluster number, seed, minpoints, epsilon, number of voters). Inputs: historical execution traces, a running trace, and a prediction problem; the chosen combination is a framework instance.]
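To make the notion of a framework instance concrete, here is a minimal sketch assuming a Python/scikit-learn setting (the actual framework is implemented on top of ProM); the dictionary layout and the chosen hyperparameter values are illustrative only.

```python
# A minimal sketch of one framework instance: one encoding, one clustering
# technique, and one classifier, each with its own hyperparameters.
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

framework_instance = {
    "encoding": "frequency",  # or "sequence"
    # clustering technique with its hyperparameters (cluster number / seed,
    # or epsilon / minpoints for DBSCAN):
    "clustering": KMeans(n_clusters=10, random_state=42),
    # alternatives: DBSCAN(eps=0.5, min_samples=5), AgglomerativeClustering(n_clusters=10)
    # classification technique with its hyperparameters (voters = number of trees, seed):
    "classifier": RandomForestClassifier(n_estimators=50, random_state=42),
    # alternative: DecisionTreeClassifier(random_state=42)
}
```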
In the “Real” World
Does Alice need the exams “tumor marker CA-19.9” or “CA-125 using MEIA”?
Which framework instance best suits my dataset and problem? Which one should I choose if I am only interested in accurate predictions?
4
5
The Existing Landscape
• Approaches exist for:
– the selection of machine learning techniques
– the tuning of their hyperparameters
– the combined optimization of machine learning techniques and their hyperparameters
• Challenge: we need to deal with a combination of several machine learning techniques that depend on one another.
6
How to Avoid Users’ Panic?
• A Predictive Process Monitoring Framework enhanced with technique and hyperparameter optimization:
1. An exhaustive exploration of a set of framework configurations (see the enumeration sketch below). How can this be done efficiently?
2. Comparison and analysis of the results. How can users be supported in this?
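As a rough illustration of step 1, the sketch below enumerates a small grid of techniques and hyperparameter values with itertools.product. The names and value ranges are hypothetical and much smaller than the 160 configurations explored in the evaluation.

```python
# Enumerate every combination of encoding, clustering technique, and classifier
# from a small, purely illustrative grid of hyperparameter values.
from itertools import product

grid = {
    "encoding": ["frequency", "sequence"],
    "clustering": [("kmeans", {"n_clusters": k}) for k in (5, 10, 20)]
                + [("dbscan", {"eps": e, "min_samples": 5}) for e in (0.3, 0.5)],
    "classifier": [("decision_tree", {}), ("random_forest", {"n_estimators": 50})],
}

configurations = [
    {"encoding": enc, "clustering": clu, "classifier": clf}
    for enc, clu, clf in product(grid["encoding"], grid["clustering"], grid["classifier"])
]
print(len(configurations), "framework instances to evaluate")  # 2 * 5 * 2 = 20 here
```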
7
The Enhanced Framework
[Diagram: the Predictive Process Monitoring Framework takes historical execution traces, a running trace, and a prediction problem and returns a prediction. A Technique and Hyperparameter Tuner replays validation execution traces with a Replayer, has an Evaluator compute aggregated metrics per framework instance, and feeds the selected framework instance back into the framework.]
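The tuner loop implied by the diagram can be read as: replay each framework instance on the validation traces, aggregate the resulting metrics, and keep the instance that best matches the user's choice criterion. A minimal sketch, assuming the evaluation step is supplied as a callable (the real Replayer and Evaluator live in ProM):

```python
# Pick the framework instance whose aggregated metrics best match a choice criterion.
# `evaluate` stands in for the Replayer/Evaluator pair and must return a metrics dict.
def tune(configurations, validation_traces, evaluate, choice_criterion):
    scored = [(cfg, evaluate(cfg, validation_traces)) for cfg in configurations]
    return max(scored, key=lambda pair: choice_criterion(pair[1]))

# Example criterion: "I only care about accurate predictions".
# best_cfg, best_metrics = tune(configurations, traces, evaluate, lambda m: m["accuracy"])
```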
8
The Predictive Process Monitoring Framework
[Diagram of the framework pipeline. Pre-processing: trace prefixes are extracted from the historical execution traces, the control flow is encoded and clustered, and for each cluster the encoded data (with a labeling function) is used to train a supervised learning classifier. Runtime (Predictive Monitoring): the running trace is encoded (control flow and data), the relevant cluster(s) are identified, and classification yields the prediction for the given prediction problem.]
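A minimal sketch of the pipeline above, assuming traces are lists of activity names with one boolean label per trace, frequency-based encoding of the control flow, K-means clustering, and one decision tree per cluster (scikit-learn names; illustrative only, not the ProM implementation):

```python
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def frequency_encode(prefix, activities):
    # Frequency-based encoding: one occurrence count per activity in the alphabet.
    counts = Counter(prefix)
    return [counts[a] for a in activities]

def preprocess(traces, labels, prefix_len=5, n_clusters=3):
    # Prefix extraction, encoding, clustering, and one classifier per cluster.
    activities = sorted({a for t in traces for a in t})
    X = [frequency_encode(t[:prefix_len], activities) for t in traces]
    clustering = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
    classifiers = {}
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(clustering.labels_) if lab == c]
        classifiers[c] = DecisionTreeClassifier(random_state=0).fit(
            [X[i] for i in idx], [labels[i] for i in idx])
    return activities, clustering, classifiers

def predict(running_trace, activities, clustering, classifiers, prefix_len=5):
    # Runtime: encode the running trace, identify its cluster, classify.
    x = frequency_encode(running_trace[:prefix_len], activities)
    cluster = clustering.predict([x])[0]
    return classifiers[cluster].predict([x])[0]
```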
9
The Predictive Process Monitoring Framework Instances
• Each technique has its own hyperparameters.
• Other framework parameters:
– trace prefix size
– voting mechanism (a minimal sketch follows below)
– interval choice in the case of interval time predictions
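One possible reading of the voting mechanism (an assumption for illustration, not the paper's definition): when several classifiers, e.g. trained on different prefix lengths or clusters, give a prediction for the same running trace, a simple majority vote decides.

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: list of predicted labels, e.g. [True, False, True] -> True
    return Counter(predictions).most_common(1)[0][0]
```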
10
Technique and Hyperparameter Tuning
• A trace is replayed until an evaluation point with a prediction confidence above a given threshold is reached.
• Three metrics/evaluation dimensions:
– accuracy
– failure rate
– earliness
[Diagram: the Technique and Hyperparameter Tuner (Configuration Sender, Replayer, Evaluator) replays validation execution traces against the Predictive Monitor running in the ProM Operational Support Service 2.0 and returns aggregated metrics for each framework instance.]
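A sketch of how the three dimensions could be aggregated, assuming each replayed trace yields either None (no prediction ever reached the confidence threshold, counted as a failure) or a tuple (predicted label, true label, evaluation point, trace length); these exact definitions are assumptions for illustration.

```python
def aggregate_metrics(replay_results):
    failures = [r for r in replay_results if r is None]
    predicted = [r for r in replay_results if r is not None]
    accuracy = (sum(pred == true for pred, true, _, _ in predicted) / len(predicted)
                if predicted else 0.0)
    # Earliness: average position (relative to trace length) at which a confident
    # prediction was produced; lower means earlier.
    earliness = (sum(point / length for _, _, point, length in predicted) / len(predicted)
                 if predicted else 0.0)
    return {"accuracy": accuracy,
            "failure_rate": len(failures) / len(replay_results),
            "earliness": earliness}
```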
11
Improving Efficiency
• Scheduling mechanism for parallel replayers (sketched below)
• Reuse of data structures
[Architecture diagram: a GUI-driven Configuration Sender passes configurations (with run IDs) to a Replayer Scheduler, which dispatches <Run ID, Trace> pairs to N parallel replayers (Replayer 1 … Replayer N) talking to the Predictive Monitor in the ProM Operational Support Service 2.0; an Unfolding Module and a shared data-structure repository allow data structures to be reused across runs.]
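A minimal sketch of the scheduling idea: a pool of N replayer workers consumes <Run ID, Trace> pairs in parallel. The `replay` stub stands in for the real Replayer; the default pool size mirrors the 8 replayers used in the evaluation.

```python
from concurrent.futures import ThreadPoolExecutor

def replay(run_id, trace):
    # Placeholder for replaying one trace under the configuration identified by run_id.
    return run_id, len(trace)

def schedule(jobs, n_replayers=8):
    # jobs: iterable of (run_id, trace) pairs, as produced by the Configuration Sender.
    with ThreadPoolExecutor(max_workers=n_replayers) as pool:
        return list(pool.map(lambda job: replay(*job), jobs))
```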
12
Supporting Users in the Analysis of the Results
13
Evaluation
• Does the framework identify a suitable configuration for the prediction problem and dataset in practice?
1. Does it return a set of configurations suitable for the prediction problem?
2. Does the selected configuration meet the choice criteria?
3. Does it require a reasonable amount of time?
14
Experimental Settings
• Two datasets and two prediction problems:
– BPI Challenge 2011
– BPI Challenge 2015
• Dataset preparation: training set (70%), validation set (20%), testing set (10%) (a split sketch follows below)
• Identification of the most suitable configurations (among 160)
• Evaluation of the identified configurations (with the testing set)
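A minimal sketch of the 70/20/10 split, assuming `traces` is a list of historical execution traces; the exact splitting procedure (e.g. chronological vs. random) is not stated on the slide, so a simple order-preserving split is shown.

```python
def split(traces):
    n = len(traces)
    train = traces[: int(0.7 * n)]
    validation = traces[int(0.7 * n): int(0.9 * n)]
    test = traces[int(0.9 * n):]
    return train, validation, test
```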
15
Configuration Set Variability
• Higher variability for the first dataset → tuning depends on users’ needs
• Lower variability for the second dataset → configurations do not change that much
16
Configuration Selection
• No unique best configuration.
• Evaluation values on the testing set are aligned with the tuning values obtained on the validation set.
17
Computation Time
• Computation time can depend on the trace length.
• Data structure reuse → 20% time reduction
• 8 replayers → 13% time reduction
18
Summing up & Looking Ahead
• A predictive monitoring framework enhanced with technique and hyperparameter optimization
• Three directions:
– increase user support
– optimize the exhaustive search
– prescriptive process monitoring
THANK YOU!!