Anomaly detection Mass Volume curve Extreme regions
The Mass Volume curve, a performance metric for unsupervised anomaly detection
Rare Events, Extremes and Machine Learning Workshop
May 24th, 2018
Albert Thomas
Huawei Technologies - Telecom ParisTech - Airbus Group Innovations
Joint work with Stephan Clemencon, Alexandre Gramfort, Vincent
Feuillard and Anne Sabourin.
Outline
1 Unsupervised anomaly detection
2 The Mass Volume curve
3 Anomaly detection in extreme regions
Unsupervised anomaly detection
1. Find anomalies in an unlabeled data set X1, . . . ,Xn
2. Anomalies are assumed to be rare events
Unsupervised anomaly detection
$X_1, \ldots, X_n \in \mathbb{R}^d$: unlabeled data set, i.i.d. with distribution $P$
density $f$ w.r.t. the Lebesgue measure $\lambda$
anomalies are rare events
Estimate a density level set
Minimum Volume set (Polonik, 1997)
A density level set is a Minimum Volume set, i.e. a solution of
$$\min_{\Omega \in \mathcal{B}(\mathbb{R}^d)} \lambda(\Omega) \quad \text{such that} \quad P(\Omega) \geq \alpha$$
Minimum volume sets and density level sets
1. A density level set is always a minimum volume set
2. If f has no flat parts, a minimum volume set is a density level set.
(Einmahl and Mason, 1992), (Polonik, 1997), (Nunez-Garcia et al., 2003)
Unsupervised anomaly detection algorithms
Common approach for most algorithms
1. Learn a scoring function $s : x \in \mathbb{R}^d \mapsto \mathbb{R}$ such that the smaller $s(x)$, the more abnormal $x$ is.
2. Threshold $s$ at an offset $q$ such that $\widehat{\Omega}_\alpha = \{x, s(x) \geq q\}$ is an estimate of the Minimum Volume set with mass $\alpha$.
→ density estimation (Cadre et al., 2013), One-Class SVM/Support Vector Data Description (Scholkopf et al., 2001), (Vert and Vert, 2006), (Tax et al., 2004), k-NN (Sricharan & Hero, 2011)
→ Isolation Forest (Liu et al., 2008) and Local Outlier Factor (Breunig et al., 2000)
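The thresholding step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scoring function `score` (negative distance to the origin) and the helper name `threshold_for_mass` are hypothetical, and the offset is taken as an empirical quantile of the scores so that a fraction of about alpha of the sample falls inside the estimated set.

```python
import math
import random

def threshold_for_mass(scores, alpha):
    """Offset q such that about a fraction alpha of the scores satisfy score >= q."""
    ordered = sorted(scores, reverse=True)
    # Keep the ceil(alpha * n) highest scores inside the estimated set.
    n_inside = max(1, math.ceil(alpha * len(ordered)))
    return ordered[n_inside - 1]

def score(x):
    # Hypothetical scoring function: the smaller the score, the more abnormal x.
    return -abs(x)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
q = threshold_for_mass([score(x) for x in sample], alpha=0.95)

# Estimated Minimum Volume set: points whose score is at least q.
inside = [x for x in sample if score(x) >= q]
mass = len(inside) / len(sample)
```

Here `mass` is the empirical mass of the estimated set; points with a score below `q` would be flagged as anomalies.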
Ideal scoring functions
Ideal scoring functions s preserve the order induced by density f
$s(x_1) < s(x_2) \iff f(x_1) < f(x_2)$, i.e. $s$ is a strictly increasing transform of $f$.
[Figure: density $f$ on $[0, 1]$ plotted together with $f(x - 0.05)$ and $f(x) + 2$]
$s$ does not need to be close to $f$ in the sense of an $L_p$ norm
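A small check makes the point concrete. The sketch below assumes a hypothetical density (the standard Gaussian) and compares the ordering induced by two candidates: f(x) + 2, which is far from f in any L_p sense but is a strictly increasing transform of it, and f(x - 0.05), which is close to f but is not.

```python
import math

def f(x):
    # Hypothetical density used for illustration: standard Gaussian.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def shifted_up(x):
    # Strictly increasing transform of f.
    return f(x) + 2.0

def shifted_right(x):
    # Small translation of f: close to f, but not a transform of f.
    return f(x - 0.05)

def same_order(s, g, points):
    """True if s and g rank every pair of points in the same way."""
    return all((s(a) < s(b)) == (g(a) < g(b)) for a in points for b in points)

points = [i / 10 for i in range(-30, 31)]
keeps_order = same_order(shifted_up, f, points)
breaks_order = not same_order(shifted_right, f, points)
```

The translated density disagrees with f on symmetric pairs such as -0.1 and 0.1, which f ranks equally but the translation does not.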
Scoring function of the One-Class SVM
[Figure: Gaussian mixture density $f$ and the corresponding One-Class SVM scoring functions]
Asymptotically constant near the modes and proportional to the density in the low density regions (Vert and Vert, 2006)
Problem
One-Class SVM with Gaussian kernel $k_\sigma$
The user needs to choose the bandwidth $\sigma$ of the kernel
[Figure: estimated vs. true contours for two bandwidths. $\sigma = 0.5$: overfitting; $\sigma = 10$: underfitting]
How to automatically choose σ?
Problem
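Given a data set $S_n = (X_1, \ldots, X_n)$, a hyperparameter space $\Theta$ and an unsupervised anomaly detection algorithm
$$\mathcal{A} : S_n \times \Theta \to \mathbb{R}^{\mathbb{R}^d}, \quad (S_n, \theta) \mapsto s_\theta$$
How to assess the performance of $s_\theta$ without a labeled data set?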
(Thomas, Clemencon, Feuillard, Gramfort, Anomaly Detection Workshop, ICML 2016)
Mass Volume curve
$X \sim P$, scoring function $s : \mathbb{R}^d \to \mathbb{R}$,
$t$-level set of $s$: $\{x, s(x) \geq t\}$
$\alpha_s(t) = P(s(X) \geq t)$: mass of the $t$-level set
$\lambda_s(t) = \lambda(\{x, s(x) \geq t\})$: volume of the $t$-level set.
Mass Volume curve
Mass Volume curve $MV_s$ of a scoring function $s$ (Clemencon and Jakubowicz, 2013), (Clemencon and Thomas, 2017)
$$t \in \mathbb{R} \mapsto (\alpha_s(t), \lambda_s(t))$$
[Figure: two panels. Scoring functions: $f$ and $s$ (score vs. $x$). Mass Volume curves: $MV_f$ and $MV_s$ (volume vs. mass)]
Mass Volume curve
Easier to work with the following definition: $MV_s$ is defined as the plot of the function
$$MV_s : \alpha \in (0, 1) \mapsto \lambda_s(\alpha_s^{-1}(\alpha)) = \lambda(\{x, s(x) \geq \alpha_s^{-1}(\alpha)\})$$
where $\alpha_s^{-1}$ is the generalized inverse of $\alpha_s$.
Property
(Clemencon and Jakubowicz, 2013), (Clemencon and Thomas, 2017)
Assume that the underlying density $f$ has no flat parts, then for all scoring functions $s$,
$$\forall \alpha \in (0, 1), \quad MV^*(\alpha) \overset{\text{def}}{=} MV_f(\alpha) \leq MV_s(\alpha)$$
The lower $MV_s$, the better $s$
Learning hyperparameters
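Consider $s_\theta = \mathcal{A}(S_n, \theta)$
Choose $\theta$ minimizing the area under $MV_{s_\theta}$
As $MV_{s_\theta}$ depends on $P$, use the empirical MV curve estimated on a test set
$$\widehat{MV}_s : \alpha \in [0, 1) \mapsto \lambda_s(\widehat{\alpha}_s^{-1}(\alpha))$$
where $\widehat{\alpha}_s(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{\{x, s(x) \geq t\}}(X_i)$.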
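One possible way to compute the empirical MV curve is sketched below in plain Python. Several choices are assumptions made for the illustration rather than taken from the slides: the volume of a level set is approximated by Monte Carlo with points drawn uniformly in a box enclosing the data, the generalized inverse is taken as an empirical quantile of the test scores, and the names `empirical_mv_curve` and `score` are hypothetical.

```python
import math
import random

def empirical_mv_curve(score, test_sample, uniform_sample, box_volume, alphas):
    """Empirical MV curve of `score` evaluated on a grid of masses `alphas`."""
    test_scores = sorted((score(x) for x in test_sample), reverse=True)
    unif_scores = [score(u) for u in uniform_sample]
    n = len(test_scores)
    curve = []
    for alpha in alphas:
        # Empirical generalized inverse: largest threshold t whose level set
        # {s >= t} contains at least a fraction alpha of the test points.
        k = max(1, math.ceil(alpha * n))
        t = test_scores[k - 1]
        # Monte Carlo estimate of the Lebesgue volume of {x : s(x) >= t}.
        inside = sum(1 for u in unif_scores if u >= t)
        curve.append(box_volume * inside / len(unif_scores))
    return curve

def score(x):
    # Hypothetical scoring function; it orders points like the N(0, 1) density.
    return -abs(x)

rng = random.Random(0)
test_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
low, high = -5.0, 5.0  # assumed box enclosing the data
uniform_sample = [rng.uniform(low, high) for _ in range(20000)]
alphas = [0.5, 0.8, 0.9, 0.95]
mv = empirical_mv_curve(score, test_sample, uniform_sample, high - low, alphas)
```

For this one-dimensional Gaussian example the optimal curve is known in closed form (the Minimum Volume set of mass 0.9 is an interval of length about 3.29), which gives a sanity check for `mv`.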
Algorithm
Choose $\theta^\dagger$ minimizing the area under the Mass Volume curve, $AMV_{s_\theta}$.
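The selection rule can be sketched as follows, under assumptions that are not in the slides: the algorithm is a one-dimensional Gaussian kernel smoothing score whose bandwidth `h` plays the role of the hyperparameter, the data are a hypothetical two-component Gaussian mixture, volumes are estimated by Monte Carlo in an assumed box, and the area under the MV curve is replaced by the average of the curve over a grid of masses in [0.9, 0.99] (which yields the same minimizer).

```python
import math
import random

def kde_score(train, h):
    """Gaussian kernel smoothing score built on `train` with bandwidth h."""
    def score(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train) / (len(train) * h)
    return score

def area_under_mv(score, test, uniform, box_volume, alphas):
    """Average of the empirical MV curve of `score` over the grid `alphas`."""
    test_scores = sorted((score(x) for x in test), reverse=True)
    unif_scores = [score(u) for u in uniform]
    total = 0.0
    for alpha in alphas:
        k = max(1, math.ceil(alpha * len(test_scores)))
        t = test_scores[k - 1]  # empirical quantile of the test scores
        inside = sum(1 for u in unif_scores if u >= t)
        total += box_volume * inside / len(unif_scores)
    return total / len(alphas)

rng = random.Random(0)
# Hypothetical data: mixture of N(-3, 0.5^2) and N(3, 0.5^2).
data = [rng.gauss(rng.choice([-3.0, 3.0]), 0.5) for _ in range(400)]
train, test = data[:200], data[200:]
low, high = -6.0, 6.0  # assumed box enclosing the data
uniform = [rng.uniform(low, high) for _ in range(2000)]
alphas = [(90 + i) / 100 for i in range(10)]  # 0.90, 0.91, ..., 0.99

# Evaluate each candidate bandwidth on the held-out part and keep the best one.
amv = {h: area_under_mv(kde_score(train, h), test, uniform, high - low, alphas)
       for h in (0.05, 0.3, 5.0)}
best_h = min(amv, key=amv.get)
```

With an oversmoothing bandwidth such as 5.0 the two modes are merged, so the level sets must cover the gap between them and the resulting volumes are larger.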
Aggregation
For each random split we may obtain a different $\theta^\dagger$
→ to reduce the variance of the estimator consider $B$ random splits of the data set
for each random split $b$ we get $\theta^\dagger_b$ and $s^b_{\theta^\dagger_b}$
final scoring function
$$\widehat{S} = \frac{1}{B} \sum_{b=1}^B s^b_{\theta^\dagger_b}$$
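The aggregation step amounts to a pointwise average of the B scoring functions. The sketch below is a simplified illustration: the per-split scoring function (negative distance to the mean of the training part) is hypothetical, and the hyperparameter selection that would take place on each split is omitted.

```python
import random

def aggregate(scoring_functions):
    """Return the pointwise average of the given scoring functions."""
    def averaged(x):
        return sum(s(x) for s in scoring_functions) / len(scoring_functions)
    return averaged

def make_score(center):
    # Hypothetical scoring function learned on one split.
    return lambda x: -abs(x - center)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

B = 5
scorers = []
for _ in range(B):
    shuffled = data[:]
    rng.shuffle(shuffled)
    train = shuffled[:160]  # 80/20 random split; the remaining 20% is unused here
    scorers.append(make_score(sum(train) / len(train)))

final_score = aggregate(scorers)
```

`final_score` can then be thresholded like any other scoring function.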
Toy example
Minimum Volume set estimation with One-Class SVM and $B = 50$ ($\alpha = 0.95$)
[Figure: estimated vs. true Minimum Volume set]
Experiments
Given an anomaly detection algorithm A, compare
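our approach → $s_{\text{tuned}}$
a priori fixed hyperparameters → $s_{\text{fixed}}$
Performance criterion: relative gain
$$G_{\mathcal{A}}(s_{\text{tuned}}, s_{\text{fixed}}) = \frac{AMV_I(s_{\text{fixed}}) - AMV_I(s_{\text{tuned}})}{AMV_I(s_{\text{fixed}})}$$
where $AMV_I$ is the area under the MV curve over the interval $I = [0.9, 0.99]$, computed on left-out data.
If $G_{\mathcal{A}}(s_{\text{tuned}}, s_{\text{fixed}}) > 0$ then $s_{\text{tuned}}$ is better than $s_{\text{fixed}}$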
Results
For $s_{\text{tuned}}$ we consider 50 random splits (80/20)
[Figure: relative gain (%) of aKLPE, KLPE, OCSVM, KS and iForest on the data sets GM ($d = 2$), GM ($d = 4$), Banana and HTTP; two panels, $n = 100$ and $n = 500$]
KLPE (Sricharan & Hero, 2011) — aKLPE: Average KLPE (Qian & Saligrama,
2012) — OCSVM: One-Class SVM (Scholkopf et al., 2001) — iForest:
Isolation Forest (Liu et al., 2008) — KS: Kernel Smoothing
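Consistency of $\widehat{MV}_s$
We used $\widehat{MV}_s$ as an estimate of $MV_s$ where
$$\forall \alpha \in [0, 1), \quad \widehat{MV}_s(\alpha) = \lambda(\{x, s(x) \geq \widehat{\alpha}_s^{-1}(\alpha)\})$$
with $\widehat{\alpha}_s^{-1}$ the generalized inverse of $\widehat{\alpha}_s$.
2 questions
Consistency of $\widehat{MV}_s$ as $n \to \infty$?
How to build confidence regions?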
Consistency
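Let $s$ be a scoring function and $\varepsilon \in (0, 1]$.
Assumptions on the random variable $s(X)$ and its distribution $F_s$
$\lambda_s$ is $C^2$
Theorem (Clemencon and Thomas, 2017)
(i) Consistency. With probability one,
$$\sup_{\alpha \in [0, 1 - \varepsilon]} |\widehat{MV}_s(\alpha) - MV_s(\alpha)| \xrightarrow[n \to \infty]{} 0$$
(ii) Functional CLT. There exists a sequence of Brownian bridges $\{B_n(\alpha)\}_{\alpha \in [0, 1]}$ such that, almost surely, uniformly over $[0, 1 - \varepsilon]$, as $n \to \infty$,
$$\sqrt{n} \left( \widehat{MV}_s(\alpha) - MV_s(\alpha) \right) = \frac{\lambda_s'(\alpha_s^{-1}(\alpha))}{f_s(\alpha_s^{-1}(\alpha))} B_n(\alpha) + O(n^{-1/2} \log n)$$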
Confidence bands using smoothed bootstrap
[Figure: $MV_f$ (volume vs. mass) with its 90% confidence band]
Up to $\log n$ factors (Clemencon and Thomas, 2017),
Functional CLT: rate in $O(n^{-1/2})$ but requires knowledge of $f_s$
Naive (non-smoothed) bootstrap: rate in $O(n^{-1/4})$
Smoothed bootstrap: rate in $O(n^{-2/5})$
Anomaly detection in extreme regions
Goal of multivariate Extreme Value Theory: model the tail of the distribution $F$ of a multivariate random variable $X = (X^{(1)}, \ldots, X^{(d)})$
Motivation for unsupervised anomaly detection
Anomalies are likely to be located in extreme regions, i.e. regions far from the mean $E[X]$
Lack of data in these regions makes it difficult to distinguish between large normal instances and anomalies
Relying on multivariate Extreme Value Theory we suggest an algorithm to detect anomalies in extreme regions.
Multivariate Extreme Value Theory
Common approach
Standardization to unit Pareto: $T(X) = V \in \mathbb{R}^d$ where
$$V^{(j)} = \frac{1}{1 - F_j(X^{(j)})}, \quad \forall j \in \{1, \ldots, d\},$$
$F_j$, $1 \leq j \leq d$, being the margins. Note that $X$ and $V$ share the same dependence structure/copula.
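In practice the margins are usually unknown. A common workaround, assumed here for illustration and not stated on the slide, is to replace each margin by a rank-based empirical version, rank / (n + 1), which keeps the denominator away from zero. The function name `to_unit_pareto` is hypothetical.

```python
def to_unit_pareto(sample):
    """Rank-transform each coordinate: V_ij = 1 / (1 - rank_ij / (n + 1))."""
    n = len(sample)
    d = len(sample[0])
    transformed = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Indices of the observations sorted by their j-th coordinate.
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            transformed[i][j] = 1.0 / (1.0 - rank / (n + 1))
    return transformed

sample = [[1.0, 10.0], [3.0, 5.0], [2.0, 7.0]]
v = to_unit_pareto(sample)
```

Only the ranks within each coordinate matter, so any increasing transformation of a coordinate leaves `v` unchanged, in line with the remark that X and V share the same copula.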
Multivariate Extreme Value Theory
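$(r(v), \varphi(v)) = (\|v\|_\infty, v / \|v\|_\infty)$: polar coordinates
$S_{d-1}$: positive orthant of the unit hypercube
Theorem (Resnick, 1987)
With mild assumptions on the distribution $F$, there exists a finite (angular) measure $\Phi$ on $S_{d-1}$ such that for all $\Omega \subset S_{d-1}$,
$$\Phi_t(\Omega) \overset{\text{def}}{=} t \, P(r(V) > t, \varphi(V) \in \Omega) \xrightarrow[t \to \infty]{} \Phi(\Omega)$$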
Main idea
Anomalies in extreme regions: observations that deviate fromthe dependence structure of the tail
[Figure: two panels, Dependence and Independence]
To find the most likely directions of the extreme observations weestimate a Minimum Volume set of the angular measure Φ
Minimum volume set estimation on the sphere
Solve empirical optimization problem (Scott and Nowak, 2006)
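$$\min_{\Omega \in \mathcal{G}} \lambda_d(\Omega) \quad \text{subject to} \quad \widehat{\Phi}_{n,k}(\Omega) \geq \alpha - \psi_k(\delta)$$
where $\widehat{\Phi}_{n,k}$ is estimated from $t \, P(r(V) > t, \varphi(V) \in \Omega)$ with $t = n/k$:
$$\widehat{\Phi}_{n,k}(\Omega) = \frac{1}{k} \sum_{i=1}^n \mathbb{1}\{r(V_i) \geq n/k, \ \varphi(V_i) \in \Omega\}$$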
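The empirical angular measure is a normalized count of the observations whose radius exceeds n/k and whose angle falls in the set of interest. A minimal sketch, assuming the observations are already standardized to unit Pareto margins and using the sup norm for the polar decomposition; the helper names and the membership test `in_first_face` are hypothetical.

```python
def polar(v):
    """Polar decomposition with the sup norm: radius and angular component."""
    r = max(abs(c) for c in v)
    return r, [c / r for c in v]

def empirical_angular_measure(pareto_sample, k, in_omega):
    """(1/k) * number of points with radius >= n/k and angle in Omega."""
    n = len(pareto_sample)
    threshold = n / k
    count = 0
    for v in pareto_sample:
        r, phi = polar(v)
        if r >= threshold and in_omega(phi):
            count += 1
    return count / k

def in_first_face(phi):
    # Hypothetical set Omega: angles whose first coordinate is the largest one.
    return phi[0] == 1.0

pareto_sample = [[4.0, 1.0], [1.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
measure_face = empirical_angular_measure(pareto_sample, k=2, in_omega=in_first_face)
measure_all = empirical_angular_measure(pareto_sample, k=2, in_omega=lambda phi: True)
```

With n = 4 and k = 2 the radial threshold is 2, so three of the four points are counted as extreme and two of them lie on the chosen face.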
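Theorem
$\Omega^*_\alpha$ being the true Minimum Volume set,
$$R(\Omega) = (\lambda_d(\Omega) - \lambda_d(\Omega^*_\alpha))_+ + (\alpha - \Phi_{n/k}(\Omega))_+$$
Theorem (Thomas, Clemencon, Gramfort, Sabourin, 2017)
Assume that
the margins $F_j$ are known
the class $\mathcal{G}$ is of finite VC dimension
common assumptions for existence and uniqueness of MV sets are fulfilled by $\Phi_{n/k}$
Then there exists a constant $C > 0$ such that
$$E[R(\widehat{\Omega}_\alpha)] \leq \left( \inf_{\Omega \in \mathcal{G}_\alpha} \lambda_d(\Omega) - \lambda_d(\Omega^*_\alpha) \right) + C \sqrt{\frac{\log k}{k}}$$
where $\mathcal{G}_\alpha = \{\Omega \in \mathcal{G}, \Phi_{n/k}(\Omega) \geq \alpha\}$.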
Numerical experiments
Global scoring function
$$s(r(v), \varphi(v)) = \frac{1}{r(v)^2} \cdot s_\varphi(\varphi(v)).$$
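The global score combines a radial factor with an angular score. The sketch below assumes the sup-norm polar decomposition and uses a purely hypothetical angular score (the smallest angular coordinate, passed as Python's built-in `min`); in the actual method the angular score would come from the Minimum Volume set estimation on the sphere.

```python
def global_score(v, angular_score):
    """Radial factor 1 / r(v)^2 times the angular score of the direction of v."""
    r = max(abs(c) for c in v)   # sup-norm radius
    phi = [c / r for c in v]     # angular component
    return angular_score(phi) / r ** 2

# Hypothetical angular score: smallest coordinate of the angle.
score_near_diagonal = global_score([2.0, 2.0], min)
score_near_axis = global_score([4.0, 1.0], min)
```

The radial factor makes the score decrease as observations move further into the tail, so a larger radius pushes a point toward the abnormal end of the ranking.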
Data set      OCSVM   Isolation Forest   Score s
shuttle       0.981   0.963              0.987
SF            0.478   0.251              0.660
http          0.997   0.662              0.964
ann           0.372   0.610              0.518
forestcover   0.540   0.516              0.646
ROC-AUC are computed on test sets made of normal and abnormal instances and restricted to the extreme region ($k = \sqrt{n}$).
References
Mass Volume curves and anomaly ranking. S. Clemencon, A. Thomas. Submitted. 2017
Learning hyperparameters for unsupervised anomaly detection. A. Thomas, S. Clemencon, V. Feuillard, A. Gramfort. Anomaly Detection Workshop, ICML 2016.
Anomaly detection in extreme regions via empirical MV sets on the sphere. A. Thomas, S. Clemencon, A. Gramfort, A. Sabourin. AISTATS 2017.
Code for hyperparameter tuning available on GitHub: https://github.com/albertcthomas/anomaly_tuning
Thank you