Watch everything, Watch anything

@nathanielvcook

WATCH ANYTHING, WATCH EVERYTHING: ANOMALY DETECTION BY NATHANIEL COOK

In DevOps we are good at collecting metrics

Why? Because the tooling makes it easy and it's in our culture.

It is not hard to collect millions of unique metrics at tens of terabytes a month.

The Problem - Scalability

● Dashboarding doesn’t scale
● Static thresholds don’t scale
● Tooling isn’t easy enough

We need to automate watching metrics, aka anomaly detection.

How many anomalies does this graph have?

TICK Stack

Ways we can “watch” metrics

● With our eyes
● Static Thresholds
● Machine Learning / Statistical models

Machine Learning 101

1. Get a set of training data
2. Create a model from the data
3. Compare new raw metrics to the model
4. (If you are cool, update the model again)

Standard Deviation Model

1. Training data: yesterday’s data at the same time of day.
2. Compute the mean and standard deviation of the training data.
3. The current data is anomalous if: abs(data - mean) > (threshold * stddev)

Threshold -- the number of standard deviations to allow around the mean. Typically it’s greater than 2.
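
The three steps above can be sketched in a few lines of Python. This is only an illustration of the rule, not Kapacitor code: the function name `is_anomalous` and the sample latencies are hypothetical, and `statistics.stdev` (the sample standard deviation) is an assumption about which estimator to use.

```python
from statistics import mean, stdev


def is_anomalous(yesterday_window, today_window, threshold=3.5):
    """Compare today's mean against yesterday's mean and standard deviation.

    yesterday_window: training values from the same time of day yesterday.
    today_window: current values for the matching window.
    threshold: allowed number of standard deviations (assumed default 3.5).
    """
    baseline_mean = mean(yesterday_window)
    baseline_std = stdev(yesterday_window)  # sample standard deviation
    return abs(mean(today_window) - baseline_mean) > threshold * baseline_std


# Hypothetical request latencies in milliseconds.
yesterday = [100, 102, 98, 101, 99]
print(is_anomalous(yesterday, [101, 103, 102]))  # small shift
print(is_anomalous(yesterday, [110, 112, 111]))  # large shift
```

With these made-up numbers the first window stays inside the band and the second falls outside it. A larger threshold widens the band, trading fewer false alarms for more missed anomalies.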

Visualizing error bands. How would you express this process in code?

var yesterday = batch
    |query('SELECT mean(value), stddev(value) FROM request_latency')
        .offset(1d)
        .period(1h)
        .every(5m)
        .align()
    |shift(1d)

var today = batch
    |query('SELECT mean(value) FROM request_latency')
        .period(1h)
        .every(5m)
        .align()

yesterday
    |join(today)
        .as('yesterday', 'today')
    |alert()
        .crit(lambda: abs("today.mean" - "yesterday.mean") > (3.5 * "yesterday.stddev"))

This code is TICKscript, the DSL Kapacitor uses to define tasks.

Predictive Model

Holt-Winters: A forecasting method from the 60s.

Find anomalies by predicting a trend for our current data.

1. Get the previous 30 days of data.
2. Using Holt-Winters, forecast today.
3. If the predicted values differ significantly from the real values, we found an anomaly.
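
To make the idea concrete, here is a minimal additive Holt-Winters sketch in plain Python. It is not the implementation Kapacitor or InfluxDB uses: the function names, the hand-picked smoothing parameters (`alpha`, `beta`, `gamma`) and the toy series are all assumptions for illustration. The 20% relative-error check is likewise just one possible reading of "differ significantly".

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` points ahead with additive Holt-Winters smoothing."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    first = series[:season_len]
    second = series[season_len:2 * season_len]
    # Initial level: mean of the first season.
    level = sum(first) / season_len
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(second) - sum(first)) / season_len ** 2
    # Initial seasonal offsets: deviation of each point from the level.
    seasonal = [x - level for x in first]

    for i in range(season_len, len(series)):
        x = series[i]
        s = seasonal[i % season_len]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[i % season_len] = gamma * (x - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + (h + 1) * trend + seasonal[(n + h) % season_len]
        for h in range(horizon)
    ]


def is_anomalous(predicted, actual, max_relative_error=0.2):
    """Flag the point when the relative prediction error exceeds the limit."""
    if predicted == 0:
        # Relative error is undefined; treat any nonzero value as anomalous.
        return actual != 0
    return abs(predicted - actual) / abs(predicted) > max_relative_error


# Toy series with a season of length 3 and no trend (hypothetical data).
history = [10, 20, 30, 10, 20, 30, 10, 20, 30]
forecast = holt_winters_additive(history, 3, 0.5, 0.5, 0.5, horizon=3)
print(forecast)
print(is_anomalous(forecast[0], 15))
```

In practice the smoothing parameters are usually fitted to the training data rather than fixed by hand, and the season length should match the metric's real cycle (for example 7 for daily points with a weekly pattern).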

Predictive model for detecting unexpected data.

var training = batch
    |query('SELECT max(value) FROM request_count')
        .offset(1d)
        .groupBy(time(1d))
        .period(30d)
        .every(1d)

var predicted = training
    |holtWinters('max', 1, 7, 1d)
    |last('max')
        .as('value')

var current = batch
    |query('SELECT max(value) FROM request_count')
        .period(1d)
        .every(1d)
    |last('max')
        .as('value')

predicted
    |join(current)
        .as('predicted', 'current')
    |alert()
        .crit(lambda: abs("predicted.value" - "current.value") / "predicted.value" > 0.2)

Custom Model

Morgoth: An unsupervised anomaly detection framework.

Find anomalies by using a custom anomaly detection framework.

1. Training data: not needed.
2. Give each window an anomaly score via Morgoth.
3. Check the anomaly score.

Custom algorithm

stream
    |from()
        .measurement('request_count')
    |window()
        .period(5m)
        .every(5m)
    @morgoth()
        .field('value')
        .scoreField('anomaly_score')
        .sigma(3.5)
    |alert()
        .crit(lambda: "anomaly_score" > 0.9)

How do you pick a model?

● This is the golden question.
● No one model does best.
● Simple is better; start with something simple.
● Let data help you choose a model.

Properties of an Anomaly Detection Method:

● False Positive Rate (FPR) -- Boy who cried wolf
● False Negative Rate (FNR) -- Missed anomalies
● Detection Delay (DD)

Ask yourself: What is the cost of each?
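
One way to make these three properties measurable is sketched below in Python. It assumes you have one true/false label per time step saying whether that step was really anomalous, plus the alerts the model raised. The function names and the sample data are hypothetical, and the detection delay is counted in time steps from the start of each labeled anomaly.

```python
def error_rates(labels, alerts):
    """Return (FPR, FNR) for equal-length lists of booleans."""
    false_pos = sum(1 for l, a in zip(labels, alerts) if a and not l)
    false_neg = sum(1 for l, a in zip(labels, alerts) if l and not a)
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def detection_delays(labels, alerts):
    """Steps from the start of each labeled anomaly to the first alert in it.

    An anomaly that never triggers an alert is reported as None.
    """
    delays = []
    i, n = 0, len(labels)
    while i < n:
        if labels[i]:
            start = i
            while i < n and labels[i]:
                i += 1
            hit = next((j for j in range(start, i) if alerts[j]), None)
            delays.append(None if hit is None else hit - start)
        else:
            i += 1
    return delays


# Hypothetical ground truth and alerts, one entry per time step.
labels = [bool(x) for x in [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]]
alerts = [bool(x) for x in [0, 1, 0, 1, 1, 0, 0, 0, 0, 0]]
print(error_rates(labels, alerts))
print(detection_delays(labels, alerts))
```

Here one of five normal steps raised an alert, three of five anomalous steps were missed, the first anomaly was caught one step late, and the second was never caught.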

Try it out

1. Pick a metric
2. Pick a model
3. Evaluate the model on a set of historical data
4. Rate the model based on its FPR, FNR and DD values.

If the model isn’t good enough, try a different one or improve your existing one.
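
The loop of evaluating and improving can itself be automated. The Python sketch below backtests a simple standard-deviation model on labeled historical data and picks the threshold with the lowest total cost. Everything in it is an assumption for illustration: the function names, the sample data, and the cost weights (a missed anomaly is taken to cost five times a false alarm).

```python
from statistics import mean, stdev


def backtest(values, labels, train_size, threshold):
    """Return (false_positives, false_negatives) on the non-training points.

    The first `train_size` values are the training data; every later value
    is flagged when it lies more than `threshold` standard deviations from
    the training mean.
    """
    baseline = values[:train_size]
    mu, sigma = mean(baseline), stdev(baseline)
    false_pos = false_neg = 0
    for x, is_anomaly in zip(values[train_size:], labels[train_size:]):
        alert = abs(x - mu) > threshold * sigma
        if alert and not is_anomaly:
            false_pos += 1
        elif is_anomaly and not alert:
            false_neg += 1
    return false_pos, false_neg


def pick_threshold(values, labels, train_size, candidates,
                   fp_cost=1.0, fn_cost=5.0):
    """Choose the candidate threshold with the lowest weighted error cost."""
    def cost(threshold):
        fp, fn = backtest(values, labels, train_size, threshold)
        return fp_cost * fp + fn_cost * fn
    return min(candidates, key=cost)


# Hypothetical history: five training points, then five labeled points.
values = [100, 102, 98, 101, 99, 100, 104, 120, 99, 108]
labels = [False] * 5 + [False, False, True, False, True]
print(pick_threshold(values, labels, 5, [2.0, 3.5, 6.0]))
```

On this made-up data the lowest threshold raises a false alarm, the highest misses an anomaly, and the middle one does neither. Changing the cost weights is how the answer to "what is the cost of each?" feeds back into the choice of model.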

Kapacitor makes this easy

● Select historical data and replay it against your task:

kapacitor replay-live batch -task request_count_alert -past 180d -rec-time

● Save static data sets to use as test fixtures.

kapacitor record batch -task request_count_alert -past 180d

● Store anomalies back into InfluxDB to compute FPR and FNR.

Automate “watching” your metrics

Q&A / More Resources:

● Anomaly Detection 101 -- Elizabeth (Betsy) Nichols Ph.D. https://www.youtube.com/watch?v=5vrY4RbeWkM

● Kapacitor is open source; check it out on GitHub: https://github.com/influxdata/kapacitor

● Wikipedia is your friend. There are many good explanations of how to employ various anomaly detection techniques.
