EVALUATING REAL-TIME ANOMALY DETECTION: THE NUMENTA ANOMALY BENCHMARK MLCONF San Francisco November 13, 2015 Subutai Ahmad [email protected]

Subutai Ahmad, VP of Research, Numenta at MLconf SF - 11/13/15



REAL-TIME ANOMALY DETECTION

• Exponential growth in IoT, sensors and real-time data collection is driving an explosion of streaming data
• The biggest application for machine learning is anomaly detection

• Monitoring IT infrastructure
• Uncovering fraudulent transactions
• Tracking vehicles
• Real-time health monitoring
• Monitoring energy consumption

Detection is necessary, but prevention is often the goal


EXAMPLE: PREVENTATIVE MAINTENANCE

Planned shutdown

Behavioral change preceding failure

Catastrophic failure

YET ANOTHER BENCHMARK?

• A benchmark consists of:
  • Labeled data sets
  • Scoring mechanism
  • Versioning system

• Most existing benchmarks are designed for batch data, not streaming data

• Hard to find benchmarks containing real world data labeled with anomalies

• We saw a need for a benchmark that is designed to test anomaly detection algorithms on real-time, streaming data

• A standard community benchmark could spur innovation in real-time anomaly detection algorithms


NUMENTA ANOMALY BENCHMARK (NAB)

• NAB: a rigorous benchmark for anomaly detection in streaming applications

• Real-world benchmark data set
  • 58 labeled data streams (47 real-world, 11 artificial streams)
  • Total of 365,551 data points

• Scoring mechanism
  • Reward early detection
  • Anomaly windows
  • Scoring function
  • Different “application profiles”

• Open resource
  • AGPL repository contains data, source code, and documentation
  • github.com/numenta/NAB

EXAMPLE: LOAD BALANCER HEALTH

Unusually high load balancer latency

EXAMPLE: HOURLY SERVICE DEMAND

Spike in demand

Unusually low demand

EXAMPLE: PRODUCTION SERVER CPU

Spiking behavior becomes the new norm

Spike anomaly


HOW SHOULD WE SCORE ANOMALIES?

• The perfect detector
  • Detects every anomaly
  • Detects anomalies as soon as possible
  • Provides detections in real time
  • Triggers no false alarms
  • Requires no parameter tuning
  • Automatically adapts to changing statistics

• Scoring methods in traditional benchmarks are insufficient
  • Precision/recall does not incorporate importance of early detection
  • Artificial separation into training and test sets does not handle continuous learning
  • Batch data files allow look ahead and multiple passes through the data
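The point about precision/recall can be made concrete with a small sketch. The anomaly windows, detection times, and helper names below are hypothetical and are not taken from NAB; they only show that two detectors can have identical precision and recall while differing sharply in how early they fire.

```python
def precision_recall(detections, windows):
    """Window-level precision and recall; detection timing is ignored."""
    hits = [t for t in detections if any(s <= t <= e for s, e in windows)]
    found = [w for w in windows if any(w[0] <= t <= w[1] for t in detections)]
    precision = len(hits) / len(detections) if detections else 0.0
    recall = len(found) / len(windows) if windows else 0.0
    return precision, recall


def mean_delay(detections, windows):
    """Average gap between a window's start and the first detection inside it."""
    delays = []
    for start, end in windows:
        inside = [t for t in detections if start <= t <= end]
        if inside:
            delays.append(min(inside) - start)
    return sum(delays) / len(delays) if delays else None


# Hypothetical anomaly windows, expressed as (start, end) time steps.
windows = [(100, 140), (300, 340)]
early = [102, 305]   # detector that fires near the start of each window
late = [139, 338]    # detector that fires near the end of each window

print(precision_recall(early, windows), mean_delay(early, windows))
print(precision_recall(late, windows), mean_delay(late, windows))
```

Both detectors score perfectly on precision and recall, yet the second one reports each anomaly dozens of steps later, which is exactly the difference a streaming benchmark needs to reward.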


WHERE IS THE ANOMALY?


NAB DEFINES ANOMALY WINDOWS


SCORING FUNCTION

• Effect of each detection is scaled relative to position within window:

• Detections outside window are false positives (scored low)

• Multiple detections within window are ignored (use earliest one)

• Total score is sum of scaled detections + weighted sum of missed detections:
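The scoring rules listed on this slide can be sketched in a few lines. This is a simplified illustration, not NAB's implementation: it assumes a sigmoid-shaped weighting over the relative position of a detection, and the function names, the steepness constant, the cutoff, and the weights `w_tp`, `w_fp`, and `w_fn` are all illustrative assumptions. The repository and paper give the exact definition.

```python
import math


def scaled_sigmoid(rel_pos):
    """Weight for a detection at a relative position.

    rel_pos is -1.0 at the window start, 0.0 at the window end, and
    positive after the window. Early detections approach +1; detections
    past the window go negative. Constants here are assumptions.
    """
    if rel_pos > 3.0:          # far past the window: full penalty
        return -1.0
    return 2.0 / (1.0 + math.exp(5.0 * rel_pos)) - 1.0


def score_stream(detections, windows, w_tp=1.0, w_fp=0.5, w_fn=1.0):
    """Sum of scaled detections minus a weighted penalty for missed windows.

    Assumes non-overlapping windows given as (start, end) with end > start.
    """
    windows = sorted(windows)
    score = 0.0
    credited = set()           # indices of windows that already got credit
    for t in sorted(detections):
        hit = next((i for i, (s, e) in enumerate(windows) if s <= t <= e), None)
        if hit is not None:
            if hit not in credited:      # only the earliest detection counts
                s, e = windows[hit]
                score += w_tp * scaled_sigmoid((t - e) / (e - s))
                credited.add(hit)
            continue
        # False positive: scale by distance past the most recent window.
        past = [(s, e) for s, e in windows if e < t]
        if past:
            s, e = past[-1]
            score += w_fp * scaled_sigmoid((t - e) / (e - s))
        else:
            score -= w_fp
    missed = len(windows) - len(credited)
    return score - w_fn * missed


windows = [(100, 140)]                          # hypothetical anomaly window
print(score_stream([100], windows))             # early detection
print(score_stream([140], windows))             # detection at the window end
print(score_stream([], windows))                # missed anomaly
print(score_stream([100, 500], windows))        # early detection + false alarm
```

Under these assumptions an early detection earns nearly full credit, a detection at the very end of the window earns none, a missed window costs `w_fn`, and a false alarm subtracts up to `w_fp`. Changing the three weights is one way to express the different application profiles.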


OTHER DETAILS

• Application profiles
  • Three application profiles assign different weightings based on the tradeoff between false positives and false negatives: standard, favor low false positives, and favor low false negatives.
  • EKG monitoring of a cardiac patient tolerates false positives, since a missed anomaly (false negative) is the costlier error.
  • IT / DevOps professionals hate false positives.

• NAB emulates practical real-time scenarios
  • Look-ahead is not allowed for algorithms. Detections must be made on the fly.
  • No separation between training and test files. Invoke model, start streaming, and go.
  • No batch, per-dataset parameter tuning. Detectors must be fully automated with a single set of parameters across datasets. Any further parameter tuning must be done on the fly.
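The real-time constraints above can be illustrated with a toy detector. This is a minimal sketch, not one of the algorithms evaluated in NAB and not NAB's detector interface: the class name, the `handle_record` method, the threshold, the warm-up length, and the sample stream are all hypothetical. It scores each value using only past data, then updates its statistics online (Welford's method), so there is no look-ahead, no separate training pass, and one fixed parameter set.

```python
import math


class RunningZScoreDetector:
    """Hypothetical streaming detector: one record in, one decision out."""

    def __init__(self, threshold=4.0, warmup=10):
        self.threshold = threshold   # fixed across all streams
        self.warmup = warmup         # records to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                # running sum of squared deviations

    def handle_record(self, value):
        """Score the value against past data only, then learn from it."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(value - self.mean) / std > self.threshold
        # Online update: continuous learning, no separate training phase.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return flagged


def run_stream(detector, values):
    """Feed values strictly in order; the detector never sees future records."""
    return [i for i, v in enumerate(values) if detector.handle_record(v)]


# Hypothetical stream: steady readings followed by one spike.
stream = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.2, 25.0, 10.1]
print(run_stream(RunningZScoreDetector(), stream))
```

A detector this simple would not catch the temporal anomalies discussed later in the talk, but it shows the shape of the contract: records arrive one at a time, and every decision is final when it is made.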


TESTING ALGORITHMS WITH NAB

• NAB is a community effort
  • The goal is to have researchers independently evaluate a large number of algorithms
  • Very easy to plug in and test new algorithms

• Seed results with three algorithms:
  • Hierarchical Temporal Memory
    • Numenta’s open source streaming anomaly detection algorithm
    • Models temporal sequences in data, continuously learning
  • Etsy Skyline
    • Popular open source anomaly detection technique
    • Mixture of statistical experts, continuously learning
  • Twitter ADVec
    • Open source anomaly detection released earlier this year
    • Robust outlier statistics + piecewise approximation


NAB V1.0 RESULTS (58 FILES)

DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER

Simple spike, all 3 algorithms detect

Shift in usage

Key: Etsy Skyline, Numenta HTM, Twitter ADVec. Red denotes false positive.

DETECTION RESULTS: MACHINE TEMPERATURE READINGS

HTM detects purely temporal anomaly

All 3 detect catastrophic failure

Key: Etsy Skyline, Numenta HTM, Twitter ADVec. Red denotes false positive.

DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT

HTM detects anomaly 3 hours earlier

Key: Etsy Skyline, Numenta HTM, Twitter ADVec. Red denotes false positive.


SUMMARY

• Anomaly detection is the most common application for streaming analytics

• NAB is a community benchmark for streaming anomaly detection
  • Includes a labeled dataset with real data
  • Scoring methodology designed for practical real-time applications
  • Fully open source codebase

• What’s next for NAB?
  • We hope to see researchers test additional algorithms
  • We hope to spark improved algorithms for streaming
  • More data sets!
    • Could incorporate UC Irvine dataset, Yahoo labs dataset (not open source)
    • Would love to get more labeled streaming datasets from you
  • Add support for multivariate anomaly detection

NAB RESOURCES

Table 12 at MLConf

Repository: https://github.com/numenta/NAB

Paper: A. Lavin and S. Ahmad, “Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark,” to appear in 14th International Conference on Machine Learning and Applications (IEEE ICMLA’15), 2015. Preprint available: http://arxiv.org/abs/1510.03336

Contact info: [email protected], [email protected]


THANK YOU!

QUESTIONS?