Quantifying Abnormal Behavior Baron Schwartz • VividCortex

Page 1: Quantifying Abnormal Behavior

Quantifying Abnormal Behavior

Baron Schwartz • VividCortex

Page 2: Quantifying Abnormal Behavior

Me

Baron Schwartz

baron - at - vividcortex.com

@xaprb

High Performance MySQL, 3rd Edition (Covers Version 5.5): Optimization, Backups, Replication, and more

Baron Schwartz, Peter Zaitsev & Vadim Tkachenko

Page 3: Quantifying Abnormal Behavior

The Goal

Is the system in trouble?

Find problems early & small

Prevent problems from growing

Page 4: Quantifying Abnormal Behavior

Primitive Health Checks

System is dead/down

Metric exceeds threshold

Page 5: Quantifying Abnormal Behavior

Threshold Pain

False alarms

Missed alarms

Decisions, decisions

Page 7: Quantifying Abnormal Behavior

How Do Systems Fail?

Down/dead/unavailable is “rare”

Partial failures are common

Failures escalate over time

Page 8: Quantifying Abnormal Behavior

Abnormality Detection

Hey, it’s a starting point.

Page 9: Quantifying Abnormal Behavior

Look For Improbable Events?

Page 10: Quantifying Abnormal Behavior

Statistics Refresher

Variance: the mean of the squares minus the square of the mean

Standard Deviation: √variance

Z-Score: how many standard deviations a measurement is from the mean

The distribution of sample means is asymptotically Gaussian, regardless of the underlying distribution.
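The definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming population (not sample) variance; the function names and the sample data are hypothetical and not from the talk.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def variance(values):
    """Population variance: mean of the squares minus the square of the mean."""
    m = mean(values)
    return mean([v * v for v in values]) - m * m

def stddev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

def z_score(x, values):
    """How many standard deviations x lies from the mean of values."""
    return (x - mean(values)) / stddev(values)

# Hypothetical measurements, chosen so the arithmetic is easy to follow.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
```

With these samples the mean is 5, the variance is 4, and the standard deviation is 2, so a measurement of 9 has a Z-Score of 2.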

Page 11: Quantifying Abnormal Behavior

Page 12: Quantifying Abnormal Behavior

Is It Really Unlikely?

Page 13: Quantifying Abnormal Behavior

Systems Are Continually Abnormal

Page 14: Quantifying Abnormal Behavior

Statistical Process Control

Page 15: Quantifying Abnormal Behavior

Shewhart Control Chart
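The slide shows only a chart, so the following is a rough sketch of the idea rather than the speaker's method: compute a center line and control limits from a baseline, then flag points outside the limits. It assumes limits at the mean plus or minus k standard deviations with k = 3, estimated directly from the baseline; practical control charts often estimate sigma from subgroups or moving ranges instead. All names are hypothetical.

```python
import math

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from baseline measurements.

    Assumption: sigma is the population standard deviation of the baseline.
    """
    n = len(baseline)
    center = sum(baseline) / n
    sigma = math.sqrt(sum((v - center) ** 2 for v in baseline) / n)
    return center - k * sigma, center, center + k * sigma

def out_of_control(baseline, new_points, k=3.0):
    """Return the new points that fall outside the control limits."""
    lower, _, upper = control_limits(baseline, k)
    return [p for p in new_points if p < lower or p > upper]

# Hypothetical baseline with mean 5 and standard deviation 2.
baseline = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
flagged = out_of_control(baseline, [5.0, 12.0, -2.0, 10.0])
```

Here the limits are -1 and 11, so only 12 and -2 are flagged.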

Page 16: Quantifying Abnormal Behavior

Sliding Windows

Tail Wags Dog?

Page 17: Quantifying Abnormal Behavior

Holt-Winters Forecasting

Page 18: Quantifying Abnormal Behavior

We’re Doing It Wrong.

Page 19: Quantifying Abnormal Behavior

Measuring What Matters

What matters is whether the system is getting its work done.

Measure work, not just status or activity.

Know the meaning of the metrics.

Did you attend Brendan Gregg’s talk?

Page 20: Quantifying Abnormal Behavior

Little’s Law: N=XR

Page 21: Quantifying Abnormal Behavior

Utilization Law: U=SX

Page 22: Quantifying Abnormal Behavior

Universal Scalability Law
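The slide names the law without stating it. For reference, a commonly cited form of Neil Gunther's Universal Scalability Law is given below. The symbols σ (contention) and κ (coherency delay) are not from the slides; N is concurrency, as elsewhere in this deck, and C(N) is throughput relative to the throughput at N = 1.

```latex
C(N) = \frac{N}{1 + \sigma (N - 1) + \kappa N (N - 1)}
```

With σ = κ = 0 this reduces to linear scaling, C(N) = N; a nonzero κ makes throughput eventually decline as N grows.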

Page 23: Quantifying Abnormal Behavior

Work-Related Metrics

N: Concurrency

X: Throughput

R: Response Time

U: Utilization

S: Service Time
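As a small worked example of how these metrics relate through Little’s Law (N=XR) and the Utilization Law (U=SX), here is a sketch in Python. The function names and the observed numbers are hypothetical; times are assumed to be in seconds and throughput in completions per second.

```python
def throughput(completions, seconds):
    """X: completed requests per second over an observation interval."""
    return completions / seconds

def concurrency(x, r):
    """Little's Law: N = X * R (average number of requests in the system)."""
    return x * r

def utilization(s, x):
    """Utilization Law: U = S * X (fraction of time the resource is busy)."""
    return s * x

# Hypothetical observation: 6000 requests completed in 60 seconds,
# mean response time 50 ms, mean service time 4 ms.
X = throughput(6000, 60.0)   # 100 requests per second
N = concurrency(X, 0.050)    # about 5 requests in flight on average
U = utilization(0.004, X)    # about 0.4, i.e. 40% busy
```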

Page 24: Quantifying Abnormal Behavior

Realtime

At scale, in-memory operation is helpful

Rolling windows are less practical

CPU-intensive operations are impractical

The distant past has little relevance

Page 25: Quantifying Abnormal Behavior

Exponentially Weighted Moving Average

Define a decay factor α between 0 and 1, then:

avg = avg•(1-α) + sample•α
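A minimal sketch of the update rule in Python follows: the old average gets weight (1 - α) and the new sample gets weight α. The class name and the choice to seed the average with the first sample are assumptions for illustration, not something the slides specify.

```python
class EWMA:
    """Exponentially weighted moving average with decay factor alpha."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.avg = None  # no samples seen yet

    def update(self, sample):
        """Fold one sample into the average and return the new average."""
        if self.avg is None:
            # Assumption: seed with the first sample instead of zero.
            self.avg = float(sample)
        else:
            self.avg = self.avg * (1.0 - self.alpha) + sample * self.alpha
        return self.avg

ewma = EWMA(alpha=0.5)
history = [ewma.update(s) for s in (10, 20, 20)]
```

Only the current average is stored, so memory use is constant no matter how many samples arrive, which fits the in-memory, low-CPU constraints listed on the Realtime slide.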

Page 26: Quantifying Abnormal Behavior

How To Choose α

Moving Window: age = N/2

EWMA: α = 2/(N+1)

Ex: α = 2/61 ≈ .032786885 for a “60-second window” with an average age of about 30 seconds
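The relationship between window size and α can be checked numerically. The sketch below assumes one sample per second and uses the standard result that an EWMA's weights α(1-α)^k over sample age k give a mean age of (1-α)/α; the function names are hypothetical.

```python
def alpha_for_window(n):
    """Decay factor whose average data age matches an n-sample moving window."""
    return 2.0 / (n + 1)

def average_age(alpha):
    """Mean age (in samples) of the data an EWMA represents: (1 - alpha) / alpha."""
    return (1.0 - alpha) / alpha

alpha_60 = alpha_for_window(60)   # 2/61
alpha_30 = alpha_for_window(30)   # 2/31
age_60 = average_age(alpha_60)    # (60 - 1) / 2
```

Under these assumptions N = 60 gives α ≈ 0.0328 with an average age of 29.5 samples, and N = 30 gives α ≈ 0.0645.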

Page 27: Quantifying Abnormal Behavior

[Two charts; each has a vertical axis from 0 to 400 and a horizontal axis from 1 to 5]

Page 28: Quantifying Abnormal Behavior

Exponentially Weighted Moving Statistics

Variance = EWMA of squares minus squared EWMA

Standard deviation = √EWMVar

Z-Score = Number of EWMStddev from the EWMA
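These three statistics can be maintained with two running values: the EWMA of the samples and the EWMA of their squares. The sketch below is one possible arrangement, not the speaker's implementation. The class name, seeding with the first sample, and scoring each sample before folding it in are all assumptions.

```python
import math

class EWMStats:
    """Tracks an EWMA and an EWMA of squares to derive variance and Z-Score."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.ewma = None      # EWMA of the samples
        self.ewma_sq = None   # EWMA of the squared samples

    def variance(self):
        """EWMA of squares minus squared EWMA, clamped at zero for rounding."""
        return max(self.ewma_sq - self.ewma ** 2, 0.0)

    def z_score(self, sample):
        """Number of EWM standard deviations the sample is from the EWMA."""
        if self.ewma is None or self.variance() == 0.0:
            return 0.0  # not enough history to judge
        return (sample - self.ewma) / math.sqrt(self.variance())

    def update(self, sample):
        """Fold one sample into both running averages."""
        if self.ewma is None:
            self.ewma = float(sample)
            self.ewma_sq = float(sample) ** 2
        else:
            a = self.alpha
            self.ewma = self.ewma * (1.0 - a) + sample * a
            self.ewma_sq = self.ewma_sq * (1.0 - a) + sample * sample * a

stats = EWMStats(alpha=0.5)
stats.update(10)
stats.update(20)
score = stats.z_score(25)  # scored against the state before 25 is folded in
```

After the two updates the EWMA is 15 and the EWMA of squares is 250, so the variance is 25 and a sample of 25 scores 2 standard deviations above the average.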

Page 29: Quantifying Abnormal Behavior

One Feasible Normality Metric

Track EWMA and EWMASoS (the EWMA of the squared samples); compute Z-Score!

Or, use your imagination. Ideas:

Variance-to-mean ratio (index of dispersion)

http://en.wikipedia.org/wiki/Index_of_dispersion

Follow links on that page ;-)
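The variance-to-mean ratio is simple to compute. Below is a minimal sketch over a plain list of samples, assuming population variance and a nonzero mean; the function name and data are hypothetical. The same ratio could be formed from the EWMA-based variance and mean instead.

```python
def index_of_dispersion(values):
    """Variance-to-mean ratio of a non-empty sequence with nonzero mean."""
    n = len(values)
    m = sum(values) / n
    if m == 0:
        raise ValueError("index of dispersion is undefined for a zero mean")
    var = sum(v * v for v in values) / n - m * m
    return var / m

# Hypothetical per-second counts with mean 5 and variance 4.
counts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ratio = index_of_dispersion(counts)
```

For Poisson-distributed counts the ratio is 1, so values well above 1 suggest bursty, clustered behavior and values below 1 suggest unusually regular behavior.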

Page 30: Quantifying Abnormal Behavior

Questions?

@xaprb • linkedin.com/in/xaprb

baron - at - vividcortex.com

Page 31: Quantifying Abnormal Behavior

Photo Credits

http://www.flickr.com/photos/exquisitur/3502317741/
http://www.flickr.com/photos/conorkeller/3424910997/
http://www.flickr.com/photos/zooboing/5394322517/
http://www.flickr.com/photos/robbn1/4114136177/
http://www.flickr.com/photos/nathaninsandiego/5054092761/
http://www.flickr.com/photos/ericmay/4817484054/
http://www.flickr.com/photos/hktang/4243300265/
http://www.flickr.com/photos/marceau_r/5445398067/
http://www.flickr.com/photos/domesticat/2963393184/
http://www.flickr.com/photos/amattox/3206367817/
http://www.flickr.com/photos/rawhead/4617769266/
http://www.flickr.com/photos/23737778@N00/7115229223/
http://www.flickr.com/photos/sprengben/4419536377/
http://www.flickr.com/photos/nickpix2008/2588993907/
http://www.flickr.com/photos/kevineddy/1796490978/
http://www.flickr.com/photos/asphericlens/5661878892/
http://www.flickr.com/photos/dexxus/3031015377/
http://www.flickr.com/photos/dexxus/5791228117/