
Mining System Logs to Learn Error Predictors

A Case Study of a Telemetry System

Barbara Russo L.E.S.E.R.

Faculty of Computer Science, Free University of Bozen-Bolzano, Italy [email protected]

Universität Stuttgart - June 9th, 2015

A collaboration between

Free University of Bozen-Bolzano, Italy

and

University of Alberta, Canada

Barbara Russo, Giancarlo Succi, Witold Pedrycz (2015). Mining system logs to learn error predictors: a case study of a telemetry system. Empirical Software Engineering, 20(4), 879-927


System events

•  Events describe the behaviour within and across subsystems or components – i.e., how the system changes over time

•  Logs track events


The value of logs

•  Log events carry information on

–  the software application that generated the event and its state,

–  the task and the user whose interaction with the system triggered the event, and

–  the time-stamp at which the event is generated.


Logs can be cryptic


Errors

•  Some behaviours are desirable and some are not

•  Undesirable behaviours are referred to as system errors

–  crashes that immediately stop the system and are easily identifiable

–  deviations from the expected output that let the system keep running and become apparent only at the completion of system tasks


Meaning of errors

•  Events in error state (errors) act as alerts

–  Are they manifestations of system failures?

–  Do they originate from a series of preceding events?

–  Must immediate action be taken?

–  Are they indications of an underlying problem?


Goal

•  Analysing the behaviour of a (composite) system by mining logs of events and predicting future system misbehaviour

•  Composite: many applications or subsystems


Method

•  Solve a classification problem with SVM

•  Build a sequence abstraction by mining logs

•  Integrate several statistical techniques to control for data brittleness and to ensure the accuracy of model selection and validation

•  Discuss the classification problem at different degrees of defectiveness


Sequences

•  A single event may not suffice to predict system failures

•  An event sequence is a set of events, ordered by timestamp, that occur within a given time window

•  A sequence abstraction is a representation of the identified sequences in a formal, machine-readable format


Research question

•  Is the amount and type of information carried by a sequence enough to predict errors?


Isolating sequences


[Figure: sequences isolated from the log – different lengths, different event types]
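A minimal sketch of how sequences could be isolated from a log, assuming each parsed record is a dict with a 'timestamp' field and that fixed, non-overlapping time windows are used (the study's exact windowing may differ):

from collections import defaultdict
from datetime import timedelta

def isolate_sequences(records, window=timedelta(minutes=5)):
    # Group time-ordered log records into sequences: all events that fall
    # into the same fixed, non-overlapping time window form one sequence.
    if not records:
        return []
    records = sorted(records, key=lambda r: r["timestamp"])
    start = records[0]["timestamp"]
    buckets = defaultdict(list)
    for rec in records:
        idx = int((rec["timestamp"] - start) / window)  # index of the window
        buckets[idx].append(rec)
    return [buckets[i] for i in sorted(buckets)]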

Abstracting sequences


[Figure: isolated sequences s7, s30, s2, s14, s10 are abstracted into multiplicity vectors µ1 … µn – same length, same event types]

Example – sequence type

•  sv1=[0,1,0,1]

•  sv2=[2,1,1,0]


Sequence type

•  µi – number of events of type i in a sequence

•  sv = [µ1, …, µn] – vector of event multiplicities

•  ρ(sv) – total number of errors over all sequences mapping into sv
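A minimal sketch of this abstraction step, under the same assumptions as above (records carry a 'type' and a boolean 'error' field; names are illustrative):

from collections import defaultdict

def sequence_vector(sequence, event_types):
    # sv = [mu_1, ..., mu_n]: mu_i counts the events of type i in the sequence.
    return tuple(sum(1 for rec in sequence if rec["type"] == t) for t in event_types)

def error_counts(sequences, event_types):
    # rho(sv): total number of error events over all sequences mapping into sv.
    rho = defaultdict(int)
    for seq in sequences:
        sv = sequence_vector(seq, event_types)
        rho[sv] += sum(1 for rec in seq if rec["error"])
    return rho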


Features to feed SVM

•  v = [sv, µ(sv), ν(sv)] – feature

–  µ(sv) = number of sequences mapping into sv

–  ν(sv) = average number of users in the sequences mapping into sv

•  v is a faulty feature if at least one event in one of its sequences is in error state
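A sketch of how the features could be assembled, continuing the previous sketches; ν(sv) is computed here as the mean number of distinct users per sequence, which is an assumption about the paper's exact definition:

from collections import defaultdict

def build_features(sequences, event_types):
    # For each distinct sv collect mu(sv), nu(sv) and rho(sv), and flag the
    # feature as faulty when at least one mapped sequence contains an error.
    mu, user_counts, rho = defaultdict(int), defaultdict(list), defaultdict(int)
    for seq in sequences:
        sv = sequence_vector(seq, event_types)        # from the previous sketch
        mu[sv] += 1
        user_counts[sv].append(len({rec["user"] for rec in seq}))
        rho[sv] += sum(1 for rec in seq if rec["error"])
    features = []
    for sv in mu:
        nu = sum(user_counts[sv]) / mu[sv]
        features.append({"v": list(sv) + [mu[sv], nu],
                         "rho": rho[sv],
                         "faulty": rho[sv] > 0})
    return features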


Sequence vector semantics

•  Patterns of system behaviour

–  If µ(sv) > 1 and ρ(sv) > 0, such sequences denote a recurring reliability problem

•  Distributed teams

–  If ν(sv) > 1, comparing features with ρ(sv) > 0 against those with ρ(sv) = 0 tells whether errors originate from multiple users working on the same tasks


Example - features

•  v1 = [0,1,0,1; 1,1], sv1 = [0,1,0,1]

–  µ(sv1) = 1, ν(sv1) = 1, ρ(sv1) = 0

•  v2 = [2,1,1,0; 1,2], sv2 = [2,1,1,0]

–  µ(sv2) = 1, ν(sv2) = 2, ρ(sv2) = 2


The classification problem

[Diagram: features drawn from data sets with different ex-ante distributions of faulty (G1) and non-faulty (G2) instances are fed to the classifier; the ex-post classification differs with the classifier's threshold]

Classification

•  False positive – a feature v that is predicted faulty but does not contain errors, ρ(sv) = 0

•  True positive – a feature v that is predicted faulty and contains errors, ρ(sv) > 0

•  False negative – a feature v that is predicted non-faulty but contains errors, ρ(sv) > 0

•  True negative – a feature v that is predicted non-faulty and does not contain errors, ρ(sv) = 0
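The accuracy measures on the next slide can be derived from these counts. A minimal sketch, assuming the 'bal' value quoted later is the usual balance measure combining TPr and FPr:

import math

def accuracy_measures(tp, fp, fn, tn):
    # Standard measures derived from the confusion matrix.
    tpr = tp / (tp + fn) if tp + fn else 0.0        # true positive rate (recall)
    fpr = fp / (fp + tn) if fp + tn else 0.0        # false positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Balance: normalised distance from the ideal ROC point (FPr = 0, TPr = 1).
    balance = 1 - math.sqrt(fpr ** 2 + (1 - tpr) ** 2) / math.sqrt(2)
    return {"TPr": tpr, "FPr": fpr, "precision": precision, "balance": balance}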


Measures of accuracy


Build classifiers on historical data


[Diagram: historical data are split into a training set and a test set to build the classifier]

1.  Training set: to tune the classifier's parameters

2.  Test set: to compute the classifier's fitting performance
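A minimal sketch of this step with scikit-learn; the parameter grid, split size, and scoring choice are illustrative, not the study's actual settings:

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

def fit_classifier(X, y):
    # 1. Tune the classifier's parameters on the training set,
    # 2. then compute its fitting performance on the held-out test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": ["scale", 0.1, 0.01]},
                        scoring="recall", cv=5)
    grid.fit(X_train, y_train)
    fitting_performance = grid.score(X_test, y_test)
    return grid.best_estimator_, fitting_performance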

Compare prediction performance

[Diagram: Classifier1, Classifier2, …, Classifiern are compared on a common validation set]
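A sketch of the comparison, ranking already-fitted classifiers by the balance measure from the earlier sketch (labels assumed to be 0 = non-faulty, 1 = faulty):

def compare_classifiers(classifiers, X_val, y_val):
    # Rank fitted classifiers by their prediction performance on the validation set.
    ranking = []
    for name, clf in classifiers.items():
        pred = clf.predict(X_val)
        tp = sum(p == 1 and t == 1 for p, t in zip(pred, y_val))
        fp = sum(p == 1 and t == 0 for p, t in zip(pred, y_val))
        fn = sum(p == 0 and t == 1 for p, t in zip(pred, y_val))
        tn = sum(p == 0 and t == 0 for p, t in zip(pred, y_val))
        ranking.append((name, accuracy_measures(tp, fp, fn, tn)["balance"]))
    return sorted(ranking, key=lambda item: item[1], reverse=True)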

Validating sequence abstraction

•  Did we put too much information in our features?

–  Information Gain selects the features that contribute most to the information of a given classification category

–  Classification category: sequences with a given number of error events
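A minimal sketch of such a filter with scikit-learn, using mutual information as a stand-in for Information Gain (an approximation; the paper's exact IG computation may differ). k = 7 mirrors the case-study slide later on:

from sklearn.feature_selection import SelectKBest, mutual_info_classif

def reduce_features(X, y, k=7):
    # Keep the k feature columns carrying the most information about the class label.
    selector = SelectKBest(mutual_info_classif, k=k)
    X_reduced = selector.fit_transform(X, y)
    return X_reduced, selector.get_support(indices=True)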


Control the effect of the dataset nature

•  Does set balancing increase the quality of prediction?

–  If the classification categories are not equally represented in the datasets, classifiers may have low precision even though the true positive rate is high and the false positive rate is low.

–  Such imbalanced data sets are very frequent in software engineering data
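A sketch of one way to balance the sets by undersampling the majority class (oversampling the minority class would be an equally valid choice; the study's "k-splitting" manipulation is not necessarily this one):

import numpy as np
from sklearn.utils import resample

def balance_by_undersampling(X, y, random_state=0):
    # Equalise the two categories by sampling each class down to the minority size.
    X, y = np.asarray(X), np.asarray(y)
    n_min = min(np.sum(y == c) for c in np.unique(y))
    parts_X, parts_y = [], []
    for c in np.unique(y):
        Xc, yc = X[y == c], y[y == c]
        if len(yc) > n_min:
            Xc, yc = resample(Xc, yc, n_samples=int(n_min), replace=False,
                              random_state=random_state)
        parts_X.append(Xc)
        parts_y.append(yc)
    return np.concatenate(parts_X), np.concatenate(parts_y)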


Parametric classification

•  The problem varies depending on how many errors we allow in the system

•  c – cut-off value, i.e., the minimum number of errors a sequence vector must contain for the feature to be classified as faulty

•  Categories:

–  G1(c) = {v = [sv, µ(sv), ν(sv)] | ρ(sv) ≥ c}

–  G2(c) = {v = [sv, µ(sv), ν(sv)] | ρ(sv) < c}
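A one-line sketch of the parametric labelling, given the ρ(sv) value of each feature:

def label_with_cutoff(feature_rhos, c=1):
    # G1(c): features whose sequence vector has rho(sv) >= c (faulty); G2(c): the rest.
    return [1 if r >= c else 0 for r in feature_rhos]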


The case study


Business Questions

•  In our case study:

–  Can we use Support Vector Machines to build suitable predictors?

–  Is there any Support Vector Machine that performs best for all system applications?

–  Is there any machine that performs best across the different levels of reliability required of the system?


Descriptive analysis across apps


54 datasets, 25 of which contain some faulty features

Across system applications


[Plot: percentage of faulty features per application; applications ordered by the size of their feature set]

Effects of Information Gain


Splitting data

•  Three approaches to control for artificial assumptions

–  Varying the size of the split ("t-splitting")

–  Reducing features with IG and varying the split size ("t-splitting reduced")

–  Balancing the sets ("k-splitting"), i.e., manipulating the sets so that the numbers of instances in the two categories are balanced


Types of SVM

•  Different kernels –  Multilayer perceptron

–  Linear

–  Radial Basis Function
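A sketch of the three kernels with scikit-learn; scikit-learn has no dedicated multilayer-perceptron kernel, so the sigmoid kernel (sometimes called the MLP kernel) is used as a stand-in here, which is an assumption:

from sklearn.svm import SVC

def make_classifiers():
    # One SVM per kernel family considered in the study.
    return {
        "MP (sigmoid)": SVC(kernel="sigmoid"),
        "Linear": SVC(kernel="linear"),
        "RBF": SVC(kernel="rbf"),
    }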


Fitting performance across applications


Number of applications for which a classifier outperforms the others (by MR) in quality of fit

Prediction


[Plots: prediction performance without filtering vs. filtered with IG]

•  Models with high fitting performance (bal > 0.73)

•  Prediction performance averaged across t-splittings and models

Findings

•  Prediction is better with IG filtering; MP is best across applications, but it is not the only good choice (should applications be clustered?)

•  Artificial balancing does not help to identify a single classifier, but it increases convergence in the classifiers that are not reduced with IG


Findings (better than the literature)

•  Best performance for an individual application (MP, c=3):

–  1% false positive rate, 94% true positive rate, and 95% precision

•  Best performance across applications, averaged over models, for c=2:

–  9% false positive rate, 78% true positive rate, and 95% precision


What predictions can tell managers

•  An application that manages the software tools of cars

–  Pervasive in the telemetry system

•  106 distinct features over 10 different event types; 18% with multiple sequences and 89% with more than one user

•  c = 1

•  IG reduces the feature dimensions from 12 to 7, still including µ and ν


Confusion matrix: prediction - MP


Prediction - assumptions

•  Behaviour stays the same over the next three months

•  1000 features

•  Category balance is that of the (fitting) test set: 39%

–  390 faulty features and 610 non-faulty features


In numbers

•  We have 390 faulty features, 610 non-faulty features, and 450 features predicted faulty

•  Predicted faulty features that contain no error: 67 = 11% × 610

•  Faulty features we fail to predict: 70 = 18% × 390
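The same arithmetic as a small worked example, with values taken from the assumptions above and the MP confusion matrix:

# 1000 features, 39% faulty; MP rates: FPr = 11%, FNr = 18% (TPr = 82%)
faulty, non_faulty = 390, 610
fpr, fnr = 0.11, 0.18

wasted_inspections = round(fpr * non_faulty)   # 67 features predicted faulty with no error
missed_faulty = round(fnr * faulty)            # 70 faulty features predicted non-faulty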


          Pred pos   Pred neg   Total
Pos          82%        18%      100%
Neg          11%        89%      100%
Total        45%        54%      100%

Cost of prediction

•  Inspection cost: wasted time ≥ 67 × average cost to fix one error

–  There might be more than one error per sequence on average

•  Cost of undiscovered errors: defect slippage ≥ 70

–  A measure of system unreliability

–  Cost of repairing errors at late stages (inaccuracy, higher cost due to pressure, errors that can no longer be fixed)


Prediction

[ROC plot: true positive rate vs. false positive rate for the MP, RBF, and L classifiers, with the equal-chance diagonal. Higher false positive rates imply higher inspection costs; lower true positive rates imply a higher cost to fix undiscovered errors. The best prediction models lie in the upper-left corner; the selected MP model achieves FPr = 11%, TPr = 82%.]

Recommendations

•  Select models that accurately fit historical data before using them for prediction

–  The best models for quality of fit are not always the best predictors for all splitting sizes of a feature set

•  Reduce information redundancy


Recommendations

•  Report fitting accuracy

•  Use parametric classification

–  The parameter is the number of errors a sequence must contain in order to be classified as defective/faulty

•  Study prediction at different cut-off values, splitting sizes, and balances, so that the prediction problem is solved independently of the level of reliability required of the system and of the nature of the data


Thank you


With artificial balance

•  It does not help to identify a single classifier

•  It helps to increase convergence in those classifiers that are not reduced with IG


With IG filter


Best classifiers across different t-splittings; classifiers with bal < 0.73 are not reported