On the Evaluation of Unsupervised Outlier Detection
Arthur Zimek
Ludwig-Maximilians-Universität München
Talk at TU Vienna, 18.05.2016
On the evaluation of unsupervised outlier detection
Campos, Zimek, Sander, Campello, Micenková, Schubert, Assent, and Houle: On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 2016. http://doi.org/10.1007/s10618-015-0444-8
Online repository with complete material (methods, datasets, results, analysis): http://www.dbs.ifi.lmu.de/research/outlier-evaluation/
What is an Outlier?
The intuitive definition of an outlier would be "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism".
[Hawkins, 1980]
Simple model: take the kNN distance of a point as its outlier score [Ramaswamy et al., 2000]
(Figure: example points p1, p2, p3 with their kNN distances)
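A minimal sketch of this score (illustrative only; the study itself uses the ELKI implementations): brute-force kNN distances in numpy, with k and the toy data chosen arbitrarily.

import numpy as np

def knn_outlier_scores(X, k):
    """kNN outlier score of each point: distance to its k-th nearest
    neighbor (excluding the point itself), as in Ramaswamy et al. [2000]."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    dist.sort(axis=1)                          # ascending per row
    return dist[:, k - 1]                      # distance to the k-th neighbor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])  # one planted outlier
scores = knn_outlier_scores(X, k=5)
print(scores.argmax())  # 50: the planted outlier gets the largest score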
Motivation
- many new outlier detection methods developed every year
- some studies about efficiency [Orair et al., 2010, Kriegel et al., 2016]
- specializations for different areas [Chandola et al., 2009, Zimek et al., 2012, Schubert et al., 2014a, Akoglu et al., 2015]
- evaluation of effectiveness remains notoriously challenging:
  - characterisation of outlierness differs from method to method
  - lack of commonly agreed upon benchmark data
  - measure of success? (most commonly: ROC)
Outline
Outlier Detection Methods
Evaluation Measures
Datasets
Experiments
Conclusions
Selected Methods
- kNN [Ramaswamy et al., 2000], kNN-weight [Angiulli and Pizzuti, 2005]
- ODIN [Hautamäki et al., 2004] (related to low hubness outlierness [Radovanovic et al., 2014])
- LOF [Breunig et al., 2000]
- SimplifiedLOF [Schubert et al., 2014a], COF [Tang et al., 2002], INFLO [Jin et al., 2006], LoOP [Kriegel et al., 2009]
- LDOF [Zhang et al., 2009], LDF [Latecki et al., 2007], KDEOS [Schubert et al., 2014b]
- FastABOD [Kriegel et al., 2008]
Discussion
- all these methods have a common parameter, the neighborhood size k
- this family of kNN-based methods is popular and contains both classic and recent methods
- nevertheless, the parameter k has different interpretations and impact among the selected methods
- the selected methods comprise both 'global' and 'local' methods [Schubert et al., 2014a]
- included variants of LOF vary different components of the typical local outlier model: notion of neighborhood, distance, density estimates, model comparison
- first study to compare all these methods
- all methods are implemented in a common framework, ELKI [Schubert et al., 2015]
Evaluation of Rankings
- methods return a full ranking of database objects
- user interested in the top-ranked objects
Two example outlier score rankings over the same 615 objects:

Rank   Score (ranking 1)   Score (ranking 2)
1      0.996               56.069
2      0.995               47.862
3      0.994               43.726
4      0.994               43.449
5      0.994               41.938
6      0.993               37.283
7      0.993               36.444
8      0.992               32.193
9      0.991               27.095
10     0.989               23.429
11     0.985               16.629
12     0.974               9.339
13     0.891               4.029
14     0.877               3.369
15     0.854               3.208
16     0.803               3.091
17     0.680               3.091
18     0.573               3.020
19     0.568               3.020
20     0.562               2.988
...    ...                 ...
76     ...                 2.268
151    ...                 2.017
285    ...                 1.687
407    ...                 1.251
...    ...                 ...
615    0.000               0.171
examples taken from Campello et al. [2015]
Precision at n
- If the number of outlier candidates n is specified, the simplest measure of performance is the precision at n (P@n), i.e., the proportion of correct results in the top n ranks [Craswell, 2009a].
Precision at n (P@n): Given a database D of size N, outliers O ⊂ D and inliers I ⊆ D (D = O ∪ I), we have

P@n = |{o ∈ O | rank(o) ≤ n}| / n
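A small sketch of P@n, assuming a method's output is a list of object ids sorted by decreasing outlier score (the toy ranking and ground-truth set below are made up):

def precision_at_n(ranking, outliers, n):
    """P@n: fraction of true outliers among the top-n ranked objects.
    `ranking` lists object ids by decreasing outlier score;
    `outliers` is the set of ground-truth outlier ids."""
    return sum(1 for o in ranking[:n] if o in outliers) / n

ranking = [7, 3, 9, 1, 4]                              # highest score first
print(precision_at_n(ranking, outliers={3, 4}, n=3))   # -> 1/3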
Precision at n — Properties
- how to fairly choose the parameter n of P@n?
- n = |O| yields the popular R-Precision measure [Craswell, 2009b]
- n = |O| ≪ N ⇒ typically obtained values of P@n can be deceptively low, and not very informative as such
- n = |O| relatively large (of the same order as N) ⇒ deceptively high values of P@n can be obtained simply due to the relatively small number of inliers available
Precision at n — Adjustment for Chance
- general procedure due to Hubert and Arabie [1985]:

Adjusted Index = (Index − Expected Index) / (Maximum Index − Expected Index)

- maximum P@n: |O|/n if n > |O|, and 1 otherwise
- expected value (random outlier ranking): |O|/N

Precision at n (P@n), adjusted for chance: if n ≤ |O|,

Adjusted P@n = (P@n − |O|/N) / (1 − |O|/N)

For larger n, the maximum |O|/n must be used instead of 1.
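Continuing the sketch above, the chance adjustment is a one-liner; the helper name adjusted_precision_at_n is illustrative, not from the paper:

def adjusted_precision_at_n(p_at_n, n_outliers, N, n):
    """Chance-adjusted P@n via Hubert & Arabie's scheme:
    (index - expected) / (maximum - expected)."""
    expected = n_outliers / N               # random-ranking baseline
    maximum = min(1.0, n_outliers / n)      # |O|/n if n > |O|, else 1
    return (p_at_n - expected) / (maximum - expected)

# toy check: N=1000, |O|=10, a perfect top 10 (P@10 = 1) adjusts to 1.0
print(adjusted_precision_at_n(1.0, n_outliers=10, N=1000, n=10))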
Imbalance
- problem: imbalance between the numbers of inliers and outliers: |I| ≫ |O|, |I| ≈ N
- P@n and Adjusted P@n measures are easily interpreted
- but they are sensitive to the choice of n, particularly when n is small
- example: for a dataset with 10 outliers and 1 million inliers, a result with (quite high) ranks 11–20 for the true outliers yields a P@10 of 0 but a P@20 of 0.5
- the Adjusted P@n measure can be seen to suffer from a similar sensitivity with respect to the choice of n
Average Precision
- even in good results, outliers typically do not form a larger fraction among the top |O| ranks
- use measures that aggregate performance over a wide range of possible choices of n
- for example average precision [Zhang and Zhang, 2009] (popular in information retrieval)

Average Precision (AP):

AP = (1/|O|) · Σ_{o ∈ O} P@rank(o)
Average Precision — Adjustment for Chance
- perfect ranking yields a maximum value of 1
- expected value of a random ranking is |O|/N

Average Precision (AP), adjusted for chance:

Adjusted AP = (AP − |O|/N) / (1 − |O|/N)
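A sketch of AP and its chance adjustment on the same toy ranking as before; function names are illustrative:

def average_precision(ranking, outliers):
    """AP = mean over true outliers o of P@rank(o)."""
    hits, precisions = 0, []
    for rank, obj in enumerate(ranking, start=1):
        if obj in outliers:
            hits += 1
            precisions.append(hits / rank)     # P@rank(o)
    return sum(precisions) / len(outliers)

def adjusted_average_precision(ap, n_outliers, N):
    """Chance-adjusted AP: maximum is 1, expectation is |O|/N."""
    expected = n_outliers / N
    return (ap - expected) / (1.0 - expected)

ranking = [7, 3, 9, 1, 4]
ap = average_precision(ranking, outliers={3, 4})    # (1/2 + 2/5) / 2 = 0.45
print(ap, adjusted_average_precision(ap, n_outliers=2, N=5))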
When Should we Adjust for Chance?
for P@n and AP, adjustment for chance is:
- not necessary, when the performance of two methods on the same dataset (that is, with the same proportion of outliers) is compared in relative terms
- helpful, if the measure is to be interpreted in absolute terms
- strictly necessary, if the performance is to be compared over different datasets with different proportions of outliers
Receiver Operating Characteristic (ROC)
- plots, for all possible choices of n, the true positive rate (the proportion of outliers correctly ranked among the top n) versus the false positive rate (the proportion of inliers ranked among the top n)
(Figure: example ROC curve, true positive rate vs. false positive rate; ROC AUC: 0.92684211)
- based on rates, ROC inherently adjusts for the imbalance of class sizes typical of outlier detection tasks
Area under the ROC curve (ROC AUC)
- A ROC curve can be summarized by a single value known as ROC AUC, defined as the area under the ROC curve (AUC)
- 0 ≤ ROC AUC ≤ 1
- interpretation: average of the recall at n, with n taken over the ranks of all inlier objects
- probabilistic interpretation [Hanley and McNeil, 1982]: probability of a pair (o, i), where o is some true outlier and i is some inlier, being ordered correctly in the evaluated ranking:

ROC AUC := mean over o ∈ O, i ∈ I of:
  1    if score(o) > score(i)
  1/2  if score(o) = score(i)
  0    if score(o) < score(i)
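This probabilistic interpretation translates directly into a quadratic-time sketch (for large datasets one would use an equivalent rank-based formula instead):

import numpy as np

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random outlier/inlier pair is
    ordered correctly by the scores; ties count 1/2."""
    out = scores[labels == 1]
    inl = scores[labels == 0]
    greater = (out[:, None] > inl[None, :]).sum()   # correctly ordered pairs
    ties = (out[:, None] == inl[None, :]).sum()
    return (greater + 0.5 * ties) / (len(out) * len(inl))

scores = np.array([0.9, 0.8, 0.7, 0.7, 0.2])
labels = np.array([1, 0, 1, 0, 0])                  # 1 = outlier, 0 = inlier
print(roc_auc(scores, labels))                      # -> 0.75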
Discussion
- all these evaluation measures are external: they require annotated ground truth (outlier/inlier)
- useful for competitive evaluation of algorithms where the outliers in a dataset are actually known
- for evaluation in practical applications of methods we would need internal evaluation measures
- so far, only one paper in the literature describes an internal evaluation procedure, the work of Marques et al. [2015]
Ground Truth for Outlier Detection?
- no commonly agreed upon and frequently used benchmark data available
- UCI datasets etc.: ground truth by class labels, not readily usable for outlier evaluation
- papers on outlier detection prepare some datasets ad hoc or reuse some datasets that have been prepared ad hoc by others
- preparation involves decisions that are often not sufficiently documented
- we follow the common practice of downsampling some class in a classification dataset to produce an outlier class
Datasets Used in the Literature
Dataset       Preprocessing                                         N      |O|   Attr. (num/cat)  Version used by
ALOI          50000 images, 27 attr.                                50000  1508  27 / -           Kriegel et al. [2011], Schubert et al. [2012]
              24000 images, 27648 attr.                                                           de Vries et al. [2012]
Glass         Class 6 (out.) vs. others (in.)                       214    9     7 / -            Keller et al. [2012]
Ionosphere    Class 'b' (out.) vs. class 'g' (in.)                  351    126   32 / -           Keller et al. [2012]
KDDCup99      U2R (out.) vs. Normal (in.)                           60632  246   38 / 3           Nguyen and Gopalkrishnan [2010], Nguyen et al. [2010], Kriegel et al. [2011], Schubert et al. [2012]
Lymphography  Classes 1 and 4 (out.) vs. others (in.)               148    6     3 / 16           Lazarevic and Kumar [2005], Nguyen et al. [2010], Zimek et al. [2013]
PenDigits     Downsampling class '4' to 20 objects (out.)           9868   20    16 / -           Kriegel et al. [2011], Schubert et al. [2012]
              Downsampling class '0' to 10% (out.)                                                Keller et al. [2012]
Shuttle       Classes 2, 3, 5, 6, 7 (out.) vs. class 1 (in.)                                      Lazarevic and Kumar [2005], Abe et al. [2006], Nguyen et al. [2010]
              Class 2 (out.) vs. downs. others to 1000 obj. (in.)   1013   13    9 / -            Zhang et al. [2009]
              Downs. classes 2, 3, 5, 6, 7 (out.) vs. others (in.)                                Gao and Tan [2006]
Waveform      Downsampling class '0' to 100 objects (out.)          3443   100   21 / -           Zimek et al. [2013]
WBC           'malignant' (out.) vs. 'benign' (in.)                                               Gao and Tan [2006]
              Downs. class 'malignant' to 10 objects (out.)         454    10    9 / -            Kriegel et al. [2011], Schubert et al. [2012], Zimek et al. [2013]
WDBC          Downs. class 'malignant' to 10 objects (out.)         367    10    30 / -           Zhang et al. [2009]
              'malignant' (out.) vs. 'benign' (in.)                                               Keller et al. [2012]
WPBC          Class 'R' (out.) vs. class 'N' (in.)                  198    47    33 / -           Keller et al. [2012]
Semantically Meaningful Outlier Datasets
Dataset           Semantics                                   N     |O|   Attributes (num./binary)
Annthyroid        2 types of hypothyroidism vs. healthy       7200  534   21
Arrhythmia        12 types of cardiac arrhythmia vs. healthy  450   206   259
Cardiotocography  pathologic, suspect vs. healthy             2126  471   21
HeartDisease      heart problems vs. healthy                  270   120   13
Hepatitis         survival vs. fatal                          80    13    19
InternetAds       ads vs. other images                        3264  454   1555
PageBlocks        non-text vs. text                           5473  560   10
Parkinson         healthy vs. Parkinson                       195   147   22
Pima              diabetes vs. healthy                        768   268   8
SpamBase          non-spam vs. spam                           4601  1813  57
Stamps            genuine vs. forged                          340   31    9
Wilt              diseased trees vs. other                    4839  261   5
Dataset Preparation I
downsampling: randomly downsample one class (as outlier class) while retaining all instances from other classes (as inlier class)
- great variation in the nature of outliers produced
- we repeat the downsampling ten times, resulting in ten different datasets
- we adopt different downsample rates where applicable (resulting in datasets with 20%, 10%, 5%, or 2% outliers); see the sketch below

duplicates: duplicates can be problematic (e.g., for density estimates); for datasets containing duplicates, we generate two variants, one with the original duplicates, and one without duplicates
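A possible sketch of the downsampling step (the study's exact sampling procedure may differ; the function name and the proportion arithmetic are assumptions):

import numpy as np

def downsample_outlier_class(X, y, outlier_class, outlier_rate, rng):
    """Keep all objects of the other classes as inliers, and randomly keep
    just enough objects of `outlier_class` that they make up `outlier_rate`
    of the resulting dataset. Returns data and binary labels (1 = outlier)."""
    inl_idx = np.flatnonzero(y != outlier_class)
    out_idx = np.flatnonzero(y == outlier_class)
    n_out = int(round(outlier_rate * len(inl_idx) / (1.0 - outlier_rate)))
    keep = rng.choice(out_idx, size=min(n_out, len(out_idx)), replace=False)
    sel = np.concatenate([inl_idx, keep])
    return X[sel], (y[sel] == outlier_class).astype(int)

# ten repetitions, e.g. at 5% outliers, would loop over ten different seeds:
rng = np.random.default_rng(0)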
Dataset Preparation II
categorical attributes: three variants of handling categorical attributes:
- categorical attributes are removed
- 1-of-n encoding: a categorical attribute with n possible values is mapped into n binary attributes (presence or absence of the corresponding categorical value)
- IDF: a categorical attribute value is encoded as its inverse document frequency

IDF(t) = ln(N / f_t)

(N: total number of instances, f_t: frequency of the attribute value t)
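A minimal sketch of the IDF encoding as defined above:

import math
from collections import Counter

def idf_encode(column):
    """Replace each categorical value t by IDF(t) = ln(N / f_t), where N is
    the number of instances and f_t the frequency of value t in the column."""
    N = len(column)
    freq = Counter(column)
    return [math.log(N / freq[t]) for t in column]

print(idf_encode(["a", "a", "a", "b"]))  # rare value "b" gets the larger weight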
Dataset Preparation III
normalization: Normalization of datasets is expected to have considerable impact on the results, but is rarely discussed in the literature. Two variants for each dataset that does not already have normalized attributes:
- unnormalized
- attribute-wise linear normalization to the range [0, 1]

missing values: If an attribute has fewer than 10% of instances with missing values, those instances are removed. Otherwise, the attribute itself is removed. (See the sketch below.)
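Minimal sketches of the two rules, assuming NaN marks missing entries (numpy-based illustrations, not the study's actual preprocessing scripts):

import numpy as np

def normalize_01(X):
    """Attribute-wise linear normalization to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant attributes
    return (X - lo) / span

def drop_missing(X):
    """If fewer than 10% of instances miss a value in an attribute, drop
    those instances; otherwise drop the attribute."""
    miss_frac = np.isnan(X).mean(axis=0)
    X = X[:, miss_frac < 0.10]               # drop heavily affected attributes
    return X[~np.isnan(X).any(axis=1)]       # then drop affected instances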
These procedures, applied to most of the 23 datasets (some are taken from the literature unchanged, where they are available), result in about 1000 dataset variants.
Outline
Outlier Detection Methods
Evaluation Measures
Datasets
Experiments
  Evaluation Measures
  Characterization of the Methods
  Characterization of the Datasets
Conclusions
Example: Annthyroid
(Figures: P@n, Adjusted P@n, AP, Adjusted AP, and ROC AUC as functions of the neighborhood size k = 1–100 on Annthyroid_withoutdupl_norm_07, for kNN, kNNW, LOF, SimplifiedLOF, LoOP, LDOF, ODIN, KDEOS, COF, FastABOD, LDF, and INFLO)
Example: HeartDisease
(Figures: P@n, Adjusted P@n, AP, Adjusted AP, and ROC AUC as functions of the neighborhood size k = 1–100 on HeartDisease_withoutdupl_norm_44, for the same 12 methods)
Example: Wilt
(Figures: P@n, Adjusted P@n, AP, Adjusted AP, and ROC AUC as functions of the neighborhood size k = 1–100 on Wilt_withoutdupl_norm_02_v01, for the same 12 methods)
Observations
all results available in the web repository: http://www.dbs.ifi.lmu.de/research/outlier-evaluation/
- performance trends differ across algorithms, datasets, parameters, and evaluation methods
- ROC AUC is less sensitive to the number of true outliers
- ROC AUC scores across the datasets are typically reasonably high
- P@n scores are considerably lower for datasets with smaller proportions of outliers
- AP resembles ROC AUC, assessing the ranks of all outliers, but tends to be lower with stronger imbalance
- P@n can discriminate between methods that perform more or less equally well in terms of ROC AUC [Davis and Goadrich, 2006]
Average ROC AUC per Method
aggregated over all datasets (without duplicates, normalized)

(Figures: mean ROC AUC per method (kNN, kNNW, LOF, SimplifiedLOF, LoOP, LDOF, ODIN, KDEOS, COF, FastABOD, LDF, INFLO), aggregated per dataset over the best k, over the best k ± 5, over the best k ± 10, and over all k)
Average ROC AUC per Method
aggregated over all datasets (without duplicates, normalized, at most 5% outliers)

(Figures: mean ROC AUC per method on this dataset selection, again aggregated over the best k, the best k ± 5, the best k ± 10, and all k per dataset)
Statistical Test
Nemenyi post-hoc test (normalized datasets without duplicates, ALOI and KDDCup99 removed, best achieved quality in terms of ROC AUC chosen for each dataset independently; results for those datasets with multiple subsampled variants were grouped by averaging the best results over all variants for each method): column method is better/worse than row method at the 90% ('+'/'−') and 95% ('++'/'−−') confidence levels.

(Table: 12 × 12 pairwise significance matrix over kNN, kNNW, LOF, SimplifiedLOF, LoOP, LDOF, ODIN, KDEOS, COF, FastABOD, LDF, and INFLO; the column alignment was lost in extraction. The KDEOS row carries eight '++' entries, i.e., eight of the other methods are significantly better than KDEOS at the 95% level, while most remaining pairwise differences are not significant.)
Best Results per Dataset
Average best performance of all methods, per dataset (without duplicates, normalized). Best results chosen by ROC AUC performance.

(Figures: average best P@n, Adjusted P@n, AP, Adjusted AP, and ROC AUC per dataset, over all 23 datasets and their downsampled variants, from WPBC, Wilt, and WDBC through Annthyroid and ALOI)
Difficulty and Dimensionality
(Figure: ROC AUC scores, for each method using the best k, on the datasets with 3 to 5% of outliers, averaged over the different dataset variants where available. The datasets are arranged on the x-axis from left to right in order of increasing dimensionality: Wilt, Glass, Pima, Stamps, WBC, PageBlocks, HeartDisease, Hepatitis, Lymphography, Annthyroid, Cardiotocography, Waveform, Parkinson, ALOI, WDBC, SpamBase, Arrhythmia, InternetAds.)
Diversity of Outlier Ranks
datasets without duplicates, normalized, with 3-5% outliers
(Figures: pairwise comparisons of the outlier rankings of the 12 methods (A: kNN, B: kNNW, C: LOF, D: SimplifiedLOF, E: LoOP, F: LDOF, G: ODIN, H: KDEOS, I: COF, J: FastABOD, K: LDF, L: INFLO), one matrix each for ALOI, Annthyroid, Arrhythmia, Cardiotocography, Glass, HeartDisease, Hepatitis, InternetAds, Lymphography, PageBlocks, Parkinson, Pima, SpamBase, Stamps, Waveform, WBC, WDBC, and Wilt)
Suitability of Ground Truth Outlier Labels
Difficulty for given labels vs. random labels
(Figures: Diversity Score vs. Difficulty Score for all variants of ALOI, Annthyroid, Arrhythmia, Cardiotocography, Glass, HeartDisease, Hepatitis, InternetAds, Lymphography, PageBlocks, Parkinson, Pima, SpamBase, Stamps, Waveform, WBC, WDBC, and Wilt, with reference points for a perfect result, independent random rankers, and identical random rankers; and per-dataset Difficulty Scores for the given labels vs. random labels)
Conclusions
- we discussed evaluation measures for outlier rankings: P@n, AP, and ROC (AUC)
- we proposed an adjustment for chance for P@n and for AP
- we discussed preprocessing issues for the preparation of outlier datasets with annotated ground truth and provide 23 datasets in about 1000 variants
- we tested 12 outlier detection methods on these datasets with a range of choices for the neighborhood parameter k ∈ [1, . . . , 100]
Conclusions
- we aggregate and analyse the resulting more than 1.3 million experiments and
  - summarize the effectiveness of the 12 methods
  - study the suitability of the datasets for the evaluation of outlier detection
- we offer all results and analyses together with source code online: http://www.dbs.ifi.lmu.de/research/outlier-evaluation/
- experiments can be easily repeated and extended for other methods and other datasets
Thank you for your attention!
And many thanks to my collaborators:
- Guilherme O. Campos
- Jörg Sander
- Ricardo J. G. B. Campello
- Barbora Micenková
- Erich Schubert
- Ira Assent
- Mike E. Houle
References I
N. Abe, B. Zadrozny, and J. Langford. Outlier detection by active learning. In Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, PA, pages 504–509, 2006. doi: 10.1145/1150402.1150459.

L. Akoglu, H. Tong, and D. Koutra. Graph-based anomaly detection and description: A survey. Data Mining and Knowledge Discovery, 29(3):626–688, 2015. doi: 10.1007/s10618-014-0365-y.

F. Angiulli and C. Pizzuti. Outlier mining in large high-dimensional data sets. IEEE Transactions on Knowledge and Data Engineering, 17(2):203–215, 2005. doi: 10.1109/TKDE.2005.31.

M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas, TX, pages 93–104, 2000. doi: 10.1145/342009.335388.

R. J. G. B. Campello, D. Moulavi, A. Zimek, and J. Sander. Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(1):5:1–51, 2015. doi: 10.1145/2733381.
References II
G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent, and M. E. Houle. On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 2016. doi: 10.1007/s10618-015-0444-8.

V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):Article 15, 1–58, 2009. doi: 10.1145/1541880.1541882.

N. Craswell. Precision at n. In L. Liu and M. T. Özsu, editors, Encyclopedia of Database Systems, pages 2127–2128. Springer, 2009a. doi: 10.1007/978-0-387-39940-9_484.

N. Craswell. R-precision. In L. Liu and M. T. Özsu, editors, Encyclopedia of Database Systems, page 2453. Springer, 2009b. doi: 10.1007/978-0-387-39940-9_486.

J. Davis and M. Goadrich. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, pages 233–240, 2006.

T. de Vries, S. Chawla, and M. E. Houle. Density-preserving projections for large-scale local anomaly detection. Knowledge and Information Systems (KAIS), 32(1):25–52, 2012. doi: 10.1007/s10115-011-0430-4.
References III
J. Gao and P.-N. Tan. Converting output scores from outlier detection algorithms into probability estimates. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), Hong Kong, China, pages 212–221, 2006. doi: 10.1109/ICDM.2006.43.

J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29–36, 1982.

V. Hautamäki, I. Kärkkäinen, and P. Fränti. Outlier detection using k-nearest neighbor graph. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), Cambridge, England, UK, pages 430–433, 2004. doi: 10.1109/ICPR.2004.1334558.

D. Hawkins. Identification of Outliers. Chapman and Hall, 1980.

L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985.

W. Jin, A. K. H. Tung, J. Han, and W. Wang. Ranking outliers using symmetric neighborhood relationship. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Singapore, pages 577–593, 2006. doi: 10.1007/11731139_68.
References IV
F. Keller, E. Müller, and K. Böhm. HiCS: High contrast subspaces for density-based outlier ranking. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, pages 1037–1048, 2012. doi: 10.1109/ICDE.2012.88.

H.-P. Kriegel, M. Schubert, and A. Zimek. Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Las Vegas, NV, pages 444–452, 2008. doi: 10.1145/1401890.1401946.

H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. LoOP: Local outlier probabilities. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China, pages 1649–1652, 2009. doi: 10.1145/1645953.1646195.

H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ, pages 13–24, 2011. doi: 10.1137/1.9781611972818.2.

H.-P. Kriegel, E. Schubert, and A. Zimek. The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Submitted, 2016.
References V
L. J. Latecki, A. Lazarevic, and D. Pokrajac. Outlier detection with kernel density functions. In Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM), Leipzig, Germany, pages 61–75, 2007. doi: 10.1007/978-3-540-73499-4_6.

A. Lazarevic and V. Kumar. Feature bagging for outlier detection. In Proceedings of the 11th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, IL, pages 157–166, 2005. doi: 10.1145/1081870.1081891.

H. O. Marques, R. J. G. B. Campello, A. Zimek, and J. Sander. On the internal evaluation of unsupervised outlier detection. In Proceedings of the 27th International Conference on Scientific and Statistical Database Management (SSDBM), San Diego, CA, pages 7:1–12, 2015. doi: 10.1145/2791347.2791352.

H. V. Nguyen and V. Gopalkrishnan. Feature extraction for outlier detection in high-dimensional spaces. Journal of Machine Learning Research, Proceedings Track 10:66–75, 2010.

H. V. Nguyen, H. H. Ang, and V. Gopalkrishnan. Mining outliers with ensemble of heterogeneous detectors on random subspaces. In Proceedings of the 15th International Conference on Database Systems for Advanced Applications (DASFAA), Tsukuba, Japan, pages 368–383, 2010. doi: 10.1007/978-3-642-12026-8_29.
References VI
G. H. Orair, C. Teixeira, Y. Wang, W. Meira Jr., and S. Parthasarathy. Distance-based outlier detection: Consolidation and renewed bearing. Proceedings of the VLDB Endowment, 3(2):1469–1480, 2010.

M. Radovanovic, A. Nanopoulos, and M. Ivanovic. Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Transactions on Knowledge and Data Engineering, 2014. doi: 10.1109/TKDE.2014.2365790.

S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas, TX, pages 427–438, 2000. doi: 10.1145/342009.335437.

E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, pages 1047–1058, 2012. doi: 10.1137/1.9781611972825.90.

E. Schubert, A. Zimek, and H.-P. Kriegel. Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection. Data Mining and Knowledge Discovery, 28(1):190–237, 2014a. doi: 10.1007/s10618-012-0300-z.
References VII
E. Schubert, A. Zimek, and H.-P. Kriegel. Generalized outlier detection with flexible kernel density estimates. In Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA, pages 542–550, 2014b. doi: 10.1137/1.9781611973440.63.

E. Schubert, A. Koos, T. Emrich, A. Züfle, K. A. Schmid, and A. Zimek. A framework for clustering uncertain data. Proceedings of the VLDB Endowment, 8(12):1976–1979, 2015. doi: 10.14778/2824032.2824115.

J. Tang, Z. Chen, A. W.-C. Fu, and D. W. Cheung. Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, pages 535–548, 2002. doi: 10.1007/3-540-47887-6_53.

E. Zhang and Y. Zhang. Average precision. In L. Liu and M. T. Özsu, editors, Encyclopedia of Database Systems, pages 192–193. Springer, 2009. doi: 10.1007/978-0-387-39940-9_482.

K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Bangkok, Thailand, pages 813–822, 2009. doi: 10.1007/978-3-642-01307-2_84.
References VIII
A. Zimek, E. Schubert, and H.-P. Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5):363–387, 2012. doi: 10.1002/sam.11161.

A. Zimek, M. Gaudet, R. J. G. B. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles. In Proceedings of the 19th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, IL, pages 428–436, 2013. doi: 10.1145/2487575.2487676.