
The Mass Volume curve, a performance metric for unsupervised anomaly detection

Rare Events, Extremes and Machine Learning Workshop

May 24th, 2018

Albert Thomas

Huawei Technologies - Telecom ParisTech - Airbus Group Innovations

Joint work with Stephan Clemencon, Alexandre Gramfort, Vincent Feuillard and Anne Sabourin.


Outline

1 Unsupervised anomaly detection

2 The Mass Volume curve

3 Anomaly detection in extreme regions


Unsupervised anomaly detection

1. Find anomalies in an unlabeled data set $X_1, \dots, X_n$

2. Anomalies are assumed to be rare events


Unsupervised anomaly detection

$X_1, \dots, X_n \in \mathbb{R}^d$: unlabeled data set, i.i.d. with distribution $P$

density $f$ w.r.t. Lebesgue measure $\lambda$

anomalies = rare events

Estimate a density level set


Minimum Volume set (Polonik, 1997)

A density level set is a Minimum Volume set, i.e. a solution of

$$\min_{\Omega \in \mathcal{B}(\mathbb{R}^d)} \lambda(\Omega) \quad \text{such that} \quad P(\Omega) \geq \alpha$$


Minimum volume set and density level set

1. A density level set is always a minimum volume set

2. If f has no flat parts, a minimum volume set is a density level set.

(Einmahl and Mason, 1992), (Polonik, 1997), (Nunez-Garcia et al., 2003)


Unsupervised anomaly detection algorithms

Common approach for most algorithms

1 Learn a scoring function

$$s : x \in \mathbb{R}^d \mapsto \mathbb{R}$$

such that the smaller $s(x)$, the more abnormal $x$ is.

2 Threshold $s$ at an offset $q$ such that $\hat{\Omega}_\alpha = \{x, s(x) \geq q\}$ is an estimate of the Minimum Volume set with mass $\alpha$.

→ density estimation (Cadre et al., 2013), One-Class SVM / Support Vector Data Description (Scholkopf et al., 2001), (Vert and Vert, 2006), (Tax et al., 2004), k-NN (Sricharan & Hero, 2011)

→ Isolation Forest (Liu et al., 2008) and Local Outlier Factor (Breunig et al., 2000)
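The thresholding step can be sketched as follows. This is a minimal illustration, not code from the talk: `toy_score` is a hypothetical scoring function and `threshold_at_mass` is a name introduced here. The offset is taken as an empirical quantile of the scores so that a fraction of about alpha of the sample lies in the estimated set.

```python
import random

def threshold_at_mass(scores, alpha):
    """Return an offset q such that about a fraction alpha of the scores is >= q."""
    ordered = sorted(scores, reverse=True)        # largest (most normal) scores first
    k = max(1, int(round(alpha * len(ordered))))  # number of points kept in the set
    return ordered[k - 1]

def toy_score(x):
    # Hypothetical scoring function: large near 0, small far from 0.
    return -abs(x)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
q = threshold_at_mass([toy_score(x) for x in sample], alpha=0.95)

# Estimated Minimum Volume set with mass 0.95: {x : toy_score(x) >= q}
inside = [x for x in sample if toy_score(x) >= q]
print(q, len(inside) / len(sample))
```

For standard Gaussian data the resulting set is an interval around 0 whose half-width is close to the 97.5% Gaussian quantile.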


Ideal scoring functions

Ideal scoring functions s preserve the order induced by density f

$$s(x_1) < s(x_2) \iff f(x_1) < f(x_2)$$

i.e. $s$ is a strictly increasing transform of $f$.

[Figure: density $f$ on [0, 1], together with $f(x - 0.05)$ and $f(x) + 2$]

$s$ does not need to be close to $f$ in the sense of an $L^p$ norm
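A small check of this point, under assumptions: the bimodal function `f` below is a hypothetical (unnormalised) density chosen for illustration, and `same_ordering` is a helper introduced here. Adding a constant moves the function far from `f` but keeps the ordering, while a small shift stays closer to `f` and breaks the ordering.

```python
import math

def f(x):
    # Hypothetical unnormalised bimodal density on [0, 1], for illustration only.
    return math.exp(-((x - 0.3) / 0.1) ** 2) + 0.5 * math.exp(-((x - 0.7) / 0.1) ** 2)

def shifted_up(x):
    return f(x) + 2       # far from f in L^p norm, same ordering as f

def shifted_right(x):
    return f(x - 0.05)    # closer to f in L^p norm, different ordering

def same_ordering(s, g, points):
    """True if s and g rank every pair of points in the same strict order."""
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            if (s(a) < s(b)) != (g(a) < g(b)):
                return False
    return True

grid = [i / 40 for i in range(41)]
print(same_ordering(shifted_up, f, grid))
print(same_ordering(shifted_right, f, grid))
```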


Scoring function of the One-Class SVM

[Figure: Gaussian mixture density and One-Class SVM scoring functions]

Asymptotically constant near the modes and proportional to the density in the low density regions (Vert and Vert, 2006)


Problem

One-Class SVM with Gaussian kernel $k_\sigma$

The user needs to choose the bandwidth $\sigma$ of the kernel

[Figure: estimated vs. true sets for different bandwidths (one panel with $\sigma = 0.5$); panels labeled "Overfitting" and "Underfitting"]

How to automatically choose $\sigma$?


Problem

Given a data set $S_n = (X_1, \dots, X_n)$, a hyperparameter space $\Theta$ and an unsupervised anomaly detection algorithm

$$\mathcal{A} : S_n \times \Theta \to \mathbb{R}^{\mathbb{R}^d}, \qquad (S_n, \theta) \mapsto s_\theta$$

How to assess the performance of $s_\theta$ without a labeled data set?

(Anomaly Detection Workshop, Thomas, Clemencon, Feuillard, Gramfort, ICML 2016)


Mass Volume curve

$X \sim P$, scoring function $s : \mathbb{R}^d \to \mathbb{R}$,

$t$-level set of $s$: $\{x, s(x) \geq t\}$

$\alpha_s(t) = P(s(X) \geq t)$: mass of the $t$-level set

$\lambda_s(t) = \lambda(\{x, s(x) \geq t\})$: volume of the $t$-level set.


Mass Volume curve

Mass Volume curve $MV_s$ of a scoring function $s$ (Clemencon and Jakubowicz, 2013), (Clemencon and Thomas, 2017)

$$t \in \mathbb{R} \mapsto (\alpha_s(t), \lambda_s(t))$$

[Figure: two panels, "Scoring functions" and "Mass Volume curves" ($MV_f$ and $MV_s$ plotted against mass)]


Mass Volume curve

Easier to work with the following definition: $MV_s$ is defined as the plot of the function

$$MV_s : \alpha \in (0, 1) \mapsto \lambda_s(\alpha_s^{-1}(\alpha)) = \lambda(\{x, s(x) \geq \alpha_s^{-1}(\alpha)\})$$

where $\alpha_s^{-1}$ is the generalized inverse of $\alpha_s$.

Property

(Clemencon and Jakubowicz, 2013), (Clemencon and Thomas, 2017)

Assume that the underlying density $f$ has no flat parts, then for all scoring functions $s$,

$$\forall \alpha \in (0, 1), \quad MV^*(\alpha) \overset{\text{def}}{=} MV_f(\alpha) \leq MV_s(\alpha)$$

The lower $MV_s$ is, the better $s$ is.
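A worked example that is not in the original slides may help fix ideas. Take $d = 1$, $P = \mathcal{N}(0, 1)$ and the scoring function equal to the density $f$. The symbol $G$ (standard normal distribution function) and the half-width $z \geq 0$, defined by $f(z) = t$, are notation introduced here.

```latex
\{x : f(x) \geq t\} = [-z, z], \qquad
\alpha_f(t) = 2\,G(z) - 1, \qquad
\lambda_f(t) = 2z,
\qquad \text{so} \qquad
MV_f(\alpha) = 2\, G^{-1}\!\left(\frac{1 + \alpha}{2}\right).
```

For $\alpha = 0.95$ this gives a volume of about $2 \times 1.96 = 3.92$. Any scoring function whose level sets are not intervals centred at 0 needs a larger volume to capture the same mass.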


Learning hyperparameters

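Consider $s_\theta = \mathcal{A}(S_n, \theta)$

Choose $\theta$ minimizing the area under $MV_{s_\theta}$

As $MV_{s_\theta}$ depends on $P$, use the empirical MV curve estimated on a test set

$$\widehat{MV}_s : \alpha \in [0, 1) \mapsto \lambda_s(\hat{\alpha}_s^{-1}(\alpha))$$

where $\hat{\alpha}_s(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{\{x, s(x) \geq t\}}(X_i)$.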
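The empirical MV curve can be sketched in one dimension as below. This is an illustration under assumptions, not the authors' implementation: the volume of each level set is approximated by uniform Monte Carlo sampling in a bounding interval `box` that is assumed to contain the level sets of interest, and the function and variable names are introduced here.

```python
import math
import random

def empirical_mv(score, test_points, alphas, box, n_mc=20000, seed=0):
    """Empirical MV curve of `score` at the mass levels in `alphas` (1-D data)."""
    rng = random.Random(seed)
    low, high = box
    mc_scores = [score(rng.uniform(low, high)) for _ in range(n_mc)]
    ordered = sorted((score(x) for x in test_points), reverse=True)
    n = len(ordered)
    curve = []
    for alpha in alphas:
        # Generalized inverse of the empirical mass: largest threshold t such
        # that at least a fraction alpha of the test scores is >= t.
        k = min(n, max(1, math.ceil(alpha * n - 1e-9)))
        t = ordered[k - 1]
        # Monte Carlo estimate of the Lebesgue volume of {x : score(x) >= t}.
        frac = sum(1 for v in mc_scores if v >= t) / n_mc
        curve.append(frac * (high - low))
    return curve

def good_score(x):
    return -abs(x)        # same ordering as the N(0, 1) density

def bad_score(x):
    return -abs(x - 1.0)  # level sets centred at the wrong place

rng = random.Random(42)
test_points = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mv_good = empirical_mv(good_score, test_points, [0.95], box=(-6.0, 6.0))[0]
mv_bad = empirical_mv(bad_score, test_points, [0.95], box=(-6.0, 6.0))[0]
print(mv_good, mv_bad)
```

On standard Gaussian data the first value should be near the optimal volume at mass 0.95 and the second one noticeably larger.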


Algorithm

Choose $\theta^\dagger$ minimizing the area under the Mass Volume curve, $AMV_{s_\theta}$.
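A minimal sketch of this selection step, assuming a one-dimensional Gaussian kernel density score as the algorithm and its bandwidth as the only hyperparameter. The data, the bandwidth grid, the bounding interval and all function names are hypothetical choices made for illustration; the authors' code is the one linked in the references.

```python
import bisect
import math
import random

def kde_score(train, h):
    """Gaussian kernel density estimate, up to a constant, with bandwidth h."""
    def score(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train) / h
    return score

def area_under_mv(score, test, box, alphas, n_mc=2000, seed=0):
    """Trapezoidal area under the empirical MV curve over the grid `alphas`."""
    rng = random.Random(seed)
    low, high = box
    mc = sorted(score(rng.uniform(low, high)) for _ in range(n_mc))
    ordered = sorted((score(x) for x in test), reverse=True)
    n = len(ordered)
    volumes = []
    for a in alphas:
        t = ordered[min(n, max(1, math.ceil(a * n - 1e-9))) - 1]
        count = n_mc - bisect.bisect_left(mc, t)  # Monte Carlo scores >= t
        volumes.append(count / n_mc * (high - low))
    return sum(0.5 * (volumes[i] + volumes[i + 1]) * (alphas[i + 1] - alphas[i])
               for i in range(len(alphas) - 1))

# Hypothetical bimodal data, split into a training half and a test half.
rng = random.Random(1)
data = [rng.gauss(rng.choice([-3.0, 3.0]), 0.5) for _ in range(400)]
train, test = data[:200], data[200:]

alphas = [0.90 + 0.01 * i for i in range(10)]  # mass levels 0.90, ..., 0.99
bandwidths = [0.05, 0.5, 50.0]                 # hypothetical hyperparameter grid
areas = {h: area_under_mv(kde_score(train, h), test, (-8.0, 8.0), alphas)
         for h in bandwidths}
best = min(areas, key=areas.get)               # bandwidth with the smallest area
print(best, areas)
```

With the very large bandwidth the level sets become a single interval covering the gap between the two modes, so its area is expected to be larger than with the moderate bandwidth.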


Aggregation

For each random split we may obtain a different $\theta^\dagger$

→ to reduce the variance of the estimator consider $B$ random splits of the data set

for each random split $b$ we get $\theta^\dagger_b$ and $s^b_{\theta^\dagger_b}$

final scoring function: $\frac{1}{B} \sum_{b=1}^B s^b_{\theta^\dagger_b}$
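The aggregation over random splits can be sketched as follows, assuming that the scoring functions obtained on the splits are combined by a plain average. `fit_toy_scorer` is a hypothetical stand-in for "tune the hyperparameter on this split and return the resulting scoring function"; none of these names come from the talk.

```python
import random

def random_splits(data, n_splits, test_fraction=0.2, seed=0):
    """Yield (train, test) pairs obtained by reshuffling `data` n_splits times."""
    rng = random.Random(seed)
    n_test = max(1, int(len(data) * test_fraction))
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]

def fit_toy_scorer(train, test):
    # Hypothetical placeholder: a real version would select the hyperparameter
    # using `test` and fit the anomaly detection algorithm on `train`.
    center = sum(train) / len(train)
    return lambda x: -abs(x - center)

def aggregate(scorers):
    """Average several scoring functions into a single scoring function."""
    def averaged(x):
        return sum(s(x) for s in scorers) / len(scorers)
    return averaged

data = [float(i) for i in range(10)]
scorers = [fit_toy_scorer(train, test)
           for train, test in random_splits(data, n_splits=5)]
final_score = aggregate(scorers)
print(final_score(4.5), final_score(100.0))
```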


Toy example

Minimum Volume set estimation with One-Class SVM and $B = 50$ ($\alpha = 0.95$)

[Figure: estimated vs. true Minimum Volume set]


Experiments

Given an anomaly detection algorithm A, compare

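our approach → $s_{\text{tuned}}$

a priori fixed hyperparameters → $s_{\text{fixed}}$

Performance criterion: relative gain

$$G_{\mathcal{A}}(s_{\text{tuned}}, s_{\text{fixed}}) = \frac{AMV_I(s_{\text{fixed}}) - AMV_I(s_{\text{tuned}})}{AMV_I(s_{\text{fixed}})}$$

where $AMV_I$ is the area under the MV curve over the interval $I = [0.9, 0.99]$, computed on left-out data.

If $G_{\mathcal{A}}(s_{\text{tuned}}, s_{\text{fixed}}) > 0$ then $s_{\text{tuned}}$ is better than $s_{\text{fixed}}$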
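The relative gain is a one-line computation once the two areas are available. The sketch below uses made-up area values purely to show the sign convention; `relative_gain` is a name introduced here.

```python
def relative_gain(amv_tuned, amv_fixed):
    """Relative gain of the tuned scoring function over the fixed one.

    Both arguments are areas under the MV curve over the same interval,
    computed on left-out data. A positive value favours the tuned function.
    """
    return (amv_fixed - amv_tuned) / amv_fixed

# Made-up areas, for illustration only.
gain = relative_gain(amv_tuned=0.3, amv_fixed=0.4)
print(gain)
```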


Results

For $s_{\text{tuned}}$ we consider 50 random splits (80/20)

[Figure: relative gain on the GM (d=2), GM (d=4), Banana and HTTP data sets for aKLPE, KLPE, OCSVM, KS and iForest (two panels)]

KLPE (Sricharan & Hero, 2011); aKLPE: Average KLPE (Qian & Saligrama, 2012); OCSVM: One-Class SVM (Scholkopf et al., 2001); iForest: Isolation Forest (Liu et al., 2008); KS: Kernel Smoothing


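Consistency of $\widehat{MV}_s$

We used $\widehat{MV}_s$ as an estimate of $MV_s$ where

$$\forall \alpha \in [0, 1), \quad \widehat{MV}_s(\alpha) = \lambda(\{x, s(x) \geq \hat{\alpha}_s^{-1}(\alpha)\})$$

with $\hat{\alpha}_s^{-1}$ the generalized inverse of $\hat{\alpha}_s$.

2 questions

Consistency of $\widehat{MV}_s$ as $n \to \infty$?

How to build confidence regions?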


Consistency

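Let $s$ be a scoring function and $\varepsilon \in (0, 1]$.

Assumptions on the random variable $s(X)$ and its distribution $F_s$

$\lambda_s$ is $C^2$

Theorem (Clemencon and Thomas, 2017)

(i) Consistency. With probability one,

$$\sup_{\alpha \in [0, 1 - \varepsilon]} |\widehat{MV}_s(\alpha) - MV_s(\alpha)| \xrightarrow[n \to \infty]{} 0$$

(ii) Functional CLT. There exists a sequence of Brownian bridges $\{B_n(\alpha)\}_{\alpha \in [0, 1]}$ such that, almost surely, uniformly over $[0, 1 - \varepsilon]$, as $n \to \infty$,

$$\sqrt{n}\left(\widehat{MV}_s(\alpha) - MV_s(\alpha)\right) = \frac{\lambda_s'(\alpha_s^{-1}(\alpha))}{f_s(\alpha_s^{-1}(\alpha))} B_n(\alpha) + O(n^{-1/2} \log n)$$

where $f_s$ denotes the density of $s(X)$.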


Confidence bands using smoothed bootstrap

[Figure: 90% confidence band of the MV curve as a function of mass]

Up to $\log n$ factors (Clemencon and Thomas, 2017),

Functional CLT: rate in $O(n^{-1/2})$ but requires knowledge of $f_s$

Naive (non-smoothed) bootstrap: rate in $O(n^{-1/4})$

Smoothed bootstrap: rate in $O(n^{-2/5})$


Anomaly detection in extreme regions

Goal of multivariate Extreme Value Theory: model the tail of the distribution $F$ of a multivariate random variable $X = (X^{(1)}, \dots, X^{(d)})$

Motivation for unsupervised anomaly detection

Anomalies are likely to be located in extreme regions, i.e. regions far from the mean $E[X]$

Lack of data in these regions makes it difficult to distinguish between large normal instances and anomalies

Relying on multivariate Extreme Value Theory we suggest an algorithm to detect anomalies in extreme regions.


Multivariate Extreme Value Theory

Common approach

Standardization to unit Pareto: $T(X) = V \in \mathbb{R}^d$ where

$$V^{(j)} = \frac{1}{1 - F_j(X^{(j)})}, \quad \forall j \in \{1, \dots, d\},$$

$F_j$, $1 \leq j \leq d$, being the margins. Note that $X$ and $V$ share the same dependence structure/copula.
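When the margins are unknown, one common workaround is to plug in empirical distribution functions. The sketch below does this with ranks divided by n + 1, which keeps the denominator strictly positive. This plug-in choice and the name `to_unit_pareto` are assumptions made here, not something stated on the slide.

```python
def to_unit_pareto(sample):
    """Rank-based standardization of a list of d-dimensional points.

    Each margin F_j is replaced by the empirical estimate rank / (n + 1)
    (an assumption of this sketch). Ties are broken by input order.
    """
    n = len(sample)
    d = len(sample[0])
    transformed = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            transformed[i][j] = 1.0 / (1.0 - rank / (n + 1))
    return transformed

points = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
print(to_unit_pareto(points))
```

Only the ranks within each coordinate matter, so any increasing transformation of a margin leaves the output unchanged, in line with the remark that the dependence structure is preserved.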


Multivariate Extreme Value Theory

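$(r(v), \varphi(v)) = (\|v\|_\infty, v / \|v\|_\infty)$: polar coordinates

$S_{d-1}$: positive orthant of the unit hypercube

Theorem (Resnick, 1987)

With mild assumptions on the distribution $F$, there exists a finite (angular) measure $\Phi$ on $S_{d-1}$ such that for all $\Omega \subset S_{d-1}$,

$$\Phi_t(\Omega) \overset{\text{def}}{=} t \, P(r(V) > t, \varphi(V) \in \Omega) \xrightarrow[t \to \infty]{} \Phi(\Omega)$$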


Main idea

Anomalies in extreme regions: observations that deviate from the dependence structure of the tail

[Figure: two panels, "Dependence" and "Independence"]

To find the most likely directions of the extreme observations we estimate a Minimum Volume set of the angular measure $\Phi$


Minimum volume set estimation on the sphere

Solve empirical optimization problem (Scott and Nowak, 2006)

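$$\min_{\Omega \in \mathcal{G}} \lambda_d(\Omega) \quad \text{subject to} \quad \hat{\Phi}_{n,k}(\Omega) \geq \alpha - \psi_k(\delta)$$

where $\hat{\Phi}_{n,k}$ is estimated from $t \, P(r(V) > t, \varphi(V) \in \Omega)$ with $t = n/k$:

$$\hat{\Phi}_{n,k}(\Omega) = \frac{1}{k} \sum_{i=1}^n \mathbb{1}_{\{r(V_i) \geq n/k, \; \varphi(V_i) \in \Omega\}}$$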
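The empirical angular measure can be sketched as a simple count over the points whose sup-norm radius exceeds n/k. The set Omega is represented by a predicate on angles, which is a modelling choice made here; `polar` and `empirical_angular_measure` are names introduced for the illustration, and the input points are assumed to be already standardized to unit Pareto margins.

```python
def polar(v):
    """Sup-norm polar coordinates: radius ||v||_inf and angle v / ||v||_inf."""
    r = max(abs(c) for c in v)
    return r, [c / r for c in v]

def empirical_angular_measure(points, k, in_region):
    """(1 / k) * number of points with radius >= n / k and angle in Omega.

    `in_region` is a predicate on angles that encodes the set Omega.
    """
    n = len(points)
    count = 0
    for v in points:
        r, phi = polar(v)
        if r >= n / k and in_region(phi):
            count += 1
    return count / k

# Toy standardized points, for illustration only (n = 4, k = 2, threshold 2).
points = [[1.0, 1.0], [4.0, 1.0], [1.0, 8.0], [2.0, 2.0]]
whole_sphere = empirical_angular_measure(points, 2, lambda phi: True)
near_first_axis = empirical_angular_measure(points, 2, lambda phi: phi[0] >= 0.5)
print(whole_sphere, near_first_axis)
```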


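Theorem

$\Omega^*_\alpha$ being the true Minimum Volume set,

$$R(\Omega) = (\lambda_d(\Omega) - \lambda_d(\Omega^*_\alpha))_+ + (\alpha - \Phi_{n/k}(\Omega))_+$$

Theorem (Thomas, Clemencon, Gramfort, Sabourin, 2017)

Assume that

the margins $F_j$ are known

the class $\mathcal{G}$ is of finite VC dimension

common assumptions for existence and uniqueness of MV sets are fulfilled by $\Phi_{n/k}$

Then there exists a constant $C > 0$ such that

$$E[R(\hat{\Omega}_\alpha)] \leq \left(\inf_{\Omega \in \mathcal{G}_\alpha} \lambda_d(\Omega) - \lambda_d(\Omega^*_\alpha)\right) + C \sqrt{\frac{\log k}{k}}$$

where $\mathcal{G}_\alpha = \{\Omega \in \mathcal{G}, \Phi_{n/k}(\Omega) \geq \alpha\}$.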

Numerical experiments

Global scoring function

$$s(r(v), \varphi(v)) = \frac{1}{r(v)^2} \cdot s_\varphi(\varphi(v)).$$
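The global score combines a radial factor with an angular score. In the sketch below the angular scoring function is passed in as an argument; the one used in the example (`toy_angular_score`) is a hypothetical placeholder, not the estimator from the talk.

```python
def global_score(v, angular_score):
    """Score of a standardized point v: angular score of v / ||v||_inf divided by ||v||_inf squared."""
    r = max(abs(c) for c in v)
    phi = [c / r for c in v]
    return angular_score(phi) / r ** 2

def toy_angular_score(phi):
    # Hypothetical angular score: larger when all coordinates are large together.
    return min(phi)

print(global_score([4.0, 2.0], toy_angular_score))
print(global_score([8.0, 4.0], toy_angular_score))
```

Two points in the same direction get scores that differ only through the radius, so the more extreme one receives the lower score.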

Data set      OCSVM   Isolation Forest   Score s
shuttle       0.981   0.963              0.987
SF            0.478   0.251              0.660
http          0.997   0.662              0.964
ann           0.372   0.610              0.518
forestcover   0.540   0.516              0.646

ROC-AUC values are computed on test sets made of normal and abnormal instances and restricted to the extreme region (k = √…)


References

Mass Volume curves and anomaly ranking. S. Clemencon, A. Thomas. Submitted. 2017.

Learning hyperparameters for unsupervised anomaly detection. A. Thomas, S. Clemencon, V. Feuillard, A. Gramfort. Anomaly Detection Workshop, ICML 2016.

Anomaly detection in extreme regions via empirical MV sets on the sphere. A. Thomas, S. Clemencon, A. Gramfort, A. Sabourin. AISTATS 2017.

Code for hyperparameter tuning available on GitHub: https://github.com/albertcthomas/anomaly_tuning


Thank you
