Mining System Logs to Learn Error Predictors
A Case Study of a Telemetry System
Barbara Russo L.E.S.E.R.
Faculty of Computer Science, Free University of Bozen-Bolzano, Italy [email protected]
Universität Stuttgart - June 9th, 2015
A collaboration between
Free University of Bozen-Bolzano, Italy
and
University of Alberta, Canada
Barbara Russo, Giancarlo Succi, Witold Pedrycz (2015). Mining system logs to learn error predictors: a case study of a telemetry system. Empirical Software Engineering 20(4), pp. 879–927.
System events
• Events describe behaviour within and across subsystems or components as the system changes over time
• Logs track events
The value of logs
• Log events carry information on
– the software application that generated the event and its state,
– the task and the user whose interaction with the system triggered the event, and
– the time-stamp at which the event was generated.
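As a minimal sketch, a log line carrying these fields could be parsed as below. The pipe-delimited `app|state|task|user|timestamp` layout is a hypothetical format for illustration, not the one used in the case study.

```python
from datetime import datetime

def parse_log_line(line):
    """Split a pipe-delimited log line into a dict of event fields."""
    app, state, task, user, ts = line.strip().split("|")
    return {
        "app": app,
        "state": state,          # e.g. "OK" or "ERROR"
        "task": task,
        "user": user,
        "timestamp": datetime.fromisoformat(ts),
    }

event = parse_log_line("telemetry-ui|ERROR|upload|alice|2015-06-09T10:15:00")
print(event["state"], event["user"])  # ERROR alice
```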
Errors
• Some behaviours are desirable and some are not
• Undesirable behaviours are referred to as system errors
– crashes that immediately stop the system and are easily identifiable
– deviations from the expected output that let the system keep running and reveal themselves only at completion of system tasks
Meaning of errors
• Events in error state (errors) act as alerts
– Manifestations of system failures?
– Originated from a series of preceding events?
– Must immediate action be taken?
– Indication of an underlying problem?
Goal
• Analysing the behaviour of a (composite) system by mining logs of events and predicting future system misbehaviour
• Composite: many applications or subsystems
Method
• Solve a classification problem with SVM
• Build a sequence abstraction by mining logs
• Integrate several statistical techniques to control for data brittleness and accuracy of model selection and validation
• Discuss the classification problem at different degrees of defectiveness
Sequences
• A single event may not suffice to predict system failures
• An event sequence is a timestamp-ordered list of events occurring within a given time window
• A sequence abstraction is a representation of the identified sequences in a formal, machine-readable format
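The windowing step can be sketched in a few lines; the fixed-width windows and the `(timestamp, event_type, user, is_error)` tuple layout are illustrative assumptions, not the study's actual representation.

```python
def build_sequences(events, window):
    """events: list of (timestamp, event_type, user, is_error) sorted by
    timestamp; window: width in the same time units.
    Returns one sequence per non-empty window."""
    if not events:
        return []
    start = events[0][0]
    sequences, current = [], []
    for ev in events:
        while ev[0] >= start + window:   # event falls past the current window
            if current:
                sequences.append(current)
            current, start = [], start + window
        current.append(ev)
    if current:
        sequences.append(current)
    return sequences

events = [(0, "A", "u1", 0), (3, "B", "u1", 1), (7, "A", "u2", 0)]
print(len(build_sequences(events, 5)))  # 2 windows: [0,5) and [5,10)
```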
Research question
• Are the amount and type of information carried by a sequence enough to predict errors?
Abstracting sequences
[Diagram: several sequences (s2, s7, s10, s14, s30) with the same length and the same event types all map into one multiplicity vector µ1 … µn]
Sequence type
• µi – number of events of type i in a sequence
• sv = [µ1, …, µn] – vector of event multiplicities
• ρ(sv) – total number of errors over all sequences mapping into sv
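These definitions translate directly into code. In this sketch a sequence is a list of `(event_type, is_error)` pairs and the universe of event types is fixed; both are illustrative assumptions.

```python
EVENT_TYPES = ["A", "B", "C", "D"]   # illustrative universe of n = 4 types

def multiplicity_vector(seq):
    """sv = [µ1, …, µn]: µi counts the events of type i in the sequence."""
    return tuple(sum(1 for t, _ in seq if t == et) for et in EVENT_TYPES)

def rho(sequences):
    """ρ(sv): total number of errors over all sequences mapping into each sv."""
    totals = {}
    for seq in sequences:
        sv = multiplicity_vector(seq)
        totals[sv] = totals.get(sv, 0) + sum(err for _, err in seq)
    return totals

# both sequences map into the same vector sv = (1, 1, 0, 0)
seqs = [[("A", 0), ("B", 1)], [("B", 0), ("A", 1)]]
print(rho(seqs))  # {(1, 1, 0, 0): 2}
```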
Features to feed SVM
• v = [sv, µ(sv), ν(sv)] – feature
– µ(sv) = number of sequences mapping into sv
– ν(sv) = average number of users in the sequences mapping into sv
• v is a faulty feature if at least one event in one of its sequences is in error state
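Assembling v = [sv, µ(sv), ν(sv)] can be sketched as below; here a sequence is a list of `(event_type, user, is_error)` tuples, an illustrative layout rather than the study's own encoding.

```python
from collections import defaultdict

def build_features(sequences, event_types):
    """Group sequences by their multiplicity vector sv and attach
    µ(sv), ν(sv), and ρ(sv) (ρ determines the faulty/non-faulty label)."""
    groups = defaultdict(list)
    for seq in sequences:
        sv = tuple(sum(1 for t, _, _ in seq if t == et) for et in event_types)
        groups[sv].append(seq)
    features = {}
    for sv, seqs in groups.items():
        mu = len(seqs)                                           # µ(sv)
        nu = sum(len({u for _, u, _ in s}) for s in seqs) / mu   # ν(sv)
        errors = sum(e for s in seqs for _, _, e in s)           # ρ(sv)
        features[sv] = (mu, nu, errors)
    return features

seqs = [[("A", "u1", 0), ("B", "u2", 1)], [("A", "u1", 0)]]
print(build_features(seqs, ["A", "B"]))
# {(1, 1): (1, 2.0, 1), (1, 0): (1, 1.0, 0)}
```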
Sequence vector semantic
• Patterns of system behaviour
– If µ > 1 and ρ > 0, such sequences denote a recurring reliability problem
• Distributed teams
– If ν > 1, comparing features with ρ > 0 against those with ρ = 0 tells whether errors originate from multiple users working on the same tasks
Example - features
• v1= [0,1,0,1;1,1], sv1=[0,1,0,1] – µ(sv1) =1, ν(sv1)=1, ρ(sv1)=0
• v2 = [2,1,1,0;1,2], sv2=[2,1,1,0] – µ(sv2) =1, ν(sv2)=2, ρ(sv2)=2
The classification problem
[Diagram: features flow from the data sets into the classifier; the ex-ante distributions (G1 = Faulty, G2 = Non-Faulty) differ, and the ex-post classification differs across the classifier's thresholds]
Classification
• False positive = feature v predicted faulty that contains no error, ρ(sv) = 0
• True positive = feature v predicted faulty that contains error(s), ρ(sv) > 0
• False negative = feature v predicted non-faulty that contains error(s), ρ(sv) > 0
• True negative = feature v predicted non-faulty that contains no error, ρ(sv) = 0
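The four outcomes above follow mechanically from the predicted labels and ρ(sv), as this small sketch shows (faulty means ρ(sv) > 0):

```python
def confusion_counts(predicted_faulty, rho_values):
    """Count (TP, FP, FN, TN) from predicted labels and the ρ(sv) values."""
    tp = fp = fn = tn = 0
    for pred, rho in zip(predicted_faulty, rho_values):
        if pred and rho > 0:
            tp += 1          # predicted faulty, contains error(s)
        elif pred:
            fp += 1          # predicted faulty, error-free
        elif rho > 0:
            fn += 1          # predicted non-faulty, contains error(s)
        else:
            tn += 1          # predicted non-faulty, error-free
    return tp, fp, fn, tn

preds = [True, True, False, False]
rhos = [2, 0, 1, 0]
print(confusion_counts(preds, rhos))  # (1, 1, 1, 1)
```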
Build classifiers on historical data
[Diagram: historical data split into a Training Set and a Test Set feeding the classifier]
1. To tune the classifier's parameters
2. To compute the classifier's fitting performance
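The two-step protocol can be sketched as follows. A one-parameter threshold rule stands in for the SVM used in the study, and the scores and labels are made-up data.

```python
def split(data, frac=0.7):
    """Split the data into a training set and a test set."""
    cut = int(len(data) * frac)
    return data[:cut], data[cut:]

def accuracy(threshold, data):
    """data: list of (score, is_faulty); predict faulty when score >= threshold."""
    return sum((s >= threshold) == y for s, y in data) / len(data)

data = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.7, 1),
        (0.8, 1), (0.3, 0), (0.9, 1), (0.5, 1), (0.2, 0)]
train, test_set = split(data)
# 1. tune the classifier's parameter on the training set
best = max((accuracy(t / 10, train), t / 10) for t in range(11))[1]
# 2. compute its fitting performance on the held-out test set
print(best, accuracy(best, test_set))
```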
Validating sequence abstraction
• Did we put too much information in our features?
– Information Gain selects the features that contribute most to the information of a given classification category
– Classification category here: sequences with a given number of error events
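Information Gain for a single binary feature is the entropy of the class labels minus the conditional entropy after splitting on that feature; a minimal sketch, with made-up labels:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(feature, labels):
    """feature: list of 0/1 values aligned with the class labels."""
    total = entropy(labels)
    cond = 0.0
    for val in (0, 1):
        subset = [y for f, y in zip(feature, labels) if f == val]
        if subset:
            cond += len(subset) / len(labels) * entropy(subset)
    return total - cond

labels = [1, 1, 0, 0]     # e.g. sequence has >= c errors or not
feature = [1, 1, 0, 0]    # perfectly informative feature
print(information_gain(feature, labels))  # 1.0
```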
Control the effect of the dataset nature
• Does set balancing increase the quality of prediction?
– If classification categories are not equally represented in datasets, classifiers may have low precision even when the true positive rate is high and the false positive rate is low
– Such imbalanced data sets are very frequent in software engineering data
Parametric classification
• The problem varies depending on how many errors we allow in the system
• c – cut-off value, i.e., number of errors in a sequence vector
• Categories: – G1(c)={v = [sv, µ(sv),ν(sv)] | ρ(sv)≥c}
– G2(c)={v = [sv, µ(sv),ν(sv)] | ρ(sv)<c}
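The partition into G1(c) and G2(c) is a one-liner over the ρ(sv) values; a sketch with made-up features:

```python
def partition(features, c):
    """features: dict mapping sv -> ρ(sv). Returns (G1, G2):
    G1(c) holds the sv with ρ(sv) >= c, G2(c) the rest."""
    g1 = [sv for sv, rho in features.items() if rho >= c]
    g2 = [sv for sv, rho in features.items() if rho < c]
    return g1, g2

feats = {(0, 1): 0, (2, 1): 2, (1, 1): 3}
g1, g2 = partition(feats, c=2)
print(g1, g2)  # [(2, 1), (1, 1)] [(0, 1)]
```

Raising the cut-off c shrinks G1(c), so the same feature set yields a family of progressively stricter classification problems.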
Business Questions
• In our case study:
– Can we use Support Vector Machines to build suitable predictors?
– Is there a Support Vector Machine that performs best for all system applications?
– Is there one that performs best at the different levels of reliability required of the system?
Descriptive analysis across apps
54 datasets, 25 of which contain faulty features
Across system applications
[Chart: percentage of faulty features per application, with applications ordered by the size of their feature set]
Splitting data
• Three approaches to control for artificial assumptions
– Varying the size of the split ("t-splitting")
– Reducing features with IG and varying the split size ("t-splitting reduced")
– Balancing sets ("k-splitting"), i.e., manipulating sets so that the numbers of instances in the two categories are balanced
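Balancing by undersampling the majority category can be sketched in a few lines; the fixed seed keeps the example reproducible, and the integer "features" are placeholders.

```python
import random

def balance(faulty, non_faulty, seed=0):
    """Undersample the larger category so both have equal counts."""
    rng = random.Random(seed)
    k = min(len(faulty), len(non_faulty))
    return rng.sample(faulty, k), rng.sample(non_faulty, k)

faulty = list(range(5))             # 5 faulty features
non_faulty = list(range(100, 120))  # 20 non-faulty features
f, nf = balance(faulty, non_faulty)
print(len(f), len(nf))  # 5 5
```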
Types of SVM
• Different kernels – Multilayer perceptron
– Linear
– Radial Basis Function
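Written out as functions on plain vectors, the three kernels look as follows. The parameter values (gamma, kappa, theta) are illustrative; the multilayer-perceptron kernel is the sigmoid/tanh kernel.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def k_linear(x, y):
    """Linear kernel: plain inner product."""
    return dot(x, y)

def k_rbf(x, y, gamma=0.5):
    """Radial Basis Function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def k_mlp(x, y, kappa=1.0, theta=0.0):
    """Multilayer-perceptron (sigmoid) kernel: tanh(kappa * x.y + theta)."""
    return math.tanh(kappa * dot(x, y) + theta)

x = [1.0, 0.0]
print(k_linear(x, x), k_rbf(x, x))  # 1.0 1.0
```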
Fitting performance across applications
Number of applications for which a classifier outperforms (with MR) the others in quality of fit
Prediction
[Charts: prediction performance without filtering and filtered with IG]
• Models with high fitting performance (bal > 0.73)
• Prediction performance averaged across t-splittings and models
Findings
• Results are better with IG filtering; MP is best across applications, but not uniquely so (cluster the applications?)
• Artificial balancing does not help to identify a single classifier, but it helps to increase convergence in those classifiers that are not reduced with IG
Findings (superior to the literature)
• Best performance for an individual application (MP, c=3):
– 1% false positive rate, 94% true positive rate, and 95% precision
• Best performance across applications, averaged over models for c=2:
– 9% false positive rate, 78% true positive rate, and 95% precision
What predictions can tell managers
• An application that manages the software tools of cars
– Pervasive in the telemetry system
• 106 distinct features over 10 different event types; 18% multiple sequences, and 89% with more than one user
• c=1
• IG reduces the feature set from 12 to 7, still including µ and ν
Prediction - assumptions
• Behaviour stays the same over the next three months
• 1000 features
• Category balance equals that of the test set used for fitting (39%)
– 390 faulty features and 610 non-faulty features
In numbers
• We have 390 faulty features, 610 non-faulty features, and 450 predicted faulty features
• Predicted faulty features that contain no error: 67 = 11% * 610
• Faulty features we fail to predict: 70 = 18% * 390
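The arithmetic above, spelled out: expected false positives from the 11% false positive rate, expected misses from the 18% miss rate (1 - TPr), under the assumed 1000 features.

```python
faulty, non_faulty = 390, 610
fp = round(0.11 * non_faulty)   # predicted faulty but error-free
fn = round(0.18 * faulty)       # faulty features we fail to predict
print(fp, fn)  # 67 70
```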
        Pred pos   Pred neg   Total
Pos        82%        18%     100%
Neg        11%        89%     100%
Total      45%        54%     100%
Cost of prediction
• Inspection cost. Wasted time ≥ 67 * average cost to fix one error
– There might be more than one error per sequence on average
• Cost of undiscovered errors. Defect slippage ≥ 70
– A measure of system unreliability
– Cost to repair errors at late stages (inaccuracy, higher cost due to pressure, not being able to fix)
[ROC plot: True Positive Rate vs. False Positive Rate for the best prediction models (MP, RBF, and Linear kernels), with the diagonal marking equal chance; moving along the curve trades a higher cost to fix undiscovered errors against higher inspection costs. Best prediction: FPr = 11%, TPr = 82%]
Recommendations
• Select models that first accurately fit historical data before using them for predictions
– The best models for quality of fit are not always the best predictors for all splitting sizes of a feature set
• Reduce information redundancy
Recommendations
• Report fitting accuracy
• Use parametric classification
– The parameter is the number of errors a sequence must contain in order to be classified as defective/faulty
• Study prediction at different cut-off values, splitting sizes, or balances, so that the prediction problem is solved independently of the level of reliability required of the system and of the nature of the data
With artificial balance
• It does not help to identify a single classifier
• It helps to increase convergence in those classifiers that are not reduced with IG