Benchmarking: You’re Doing It Wrong Aysylu Greenberg @aysylu22

Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

DESCRIPTION

Knowing how to set up good benchmarks is invaluable for understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we will discuss what you need to know to write better benchmarks. We will look at examples of bad benchmarks and learn which biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.

Page 1: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Benchmarking: You’re Doing It Wrong

Aysylu Greenberg @aysylu22

Page 2: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

In Memes…

Page 3: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

To Write Good Benchmarks…

Need to be Full Stack

Page 4: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

What’s a Benchmark

How fast?

• Your process vs Goal
• Your process vs Best Practices

Page 5: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Today

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - Wrong about the machine
  - Wrong about stats
  - Wrong about what matters
• Becoming Less Wrong

Page 6: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

HOW NOT TO WRITE BENCHMARKS

Page 7: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache

Page 8: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

WHAT’S WRONG WITH THIS BENCHMARK?

Page 9: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You’re wrong about the machine

Page 10: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About the Machine

• Cache, cache, cache, cache!

Page 11: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

It’s Caches All The Way Down

Page 13: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Caches in Benchmarks

Prof. Saman Amarasinghe, MIT 2009

Page 18: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache
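One way to see why accessing a single image 1000 times says little about the cache is to compare the hit rate it produces with the hit rate under a wider spread of keys. The following is a minimal sketch, assuming a small LRU cache in front of slow storage. `LRUCache`, `hit_rate`, the capacity of 100 and the image names are all hypothetical and not taken from the talk.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for the cache in front of S3 (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = b"image-bytes"  # pretend we fetched from storage
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[key]


def hit_rate(keys, capacity=100):
    """Replay an access sequence and return the fraction served from cache."""
    cache = LRUCache(capacity)
    for key in keys:
        cache.get(key)
    return cache.hits / (cache.hits + cache.misses)


# The benchmark from the slide: one image, 1000 accesses.
single_image = ["cat.jpg"] * 1000
# A wider spread: 500 distinct images requested round-robin.
many_images = [f"img-{i % 500}.jpg" for i in range(1000)]

print(hit_rate(single_image))  # 0.999: everything after the first access is a hit
print(hit_rate(many_images))   # 0.0: the working set never fits in the cache
```

Neither extreme is likely to match production traffic. The point is only that the access pattern chosen for the benchmark decides how much of the measurement is cache and how much is storage.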

Page 19: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
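A common way to handle warmup is to run the operation a number of times before any timing starts, and only then collect samples. This is a minimal sketch of that idea; `measure` is a hypothetical helper, and the warmup count of 100 is an arbitrary assumption that would need tuning for a real system.

```python
import time


def measure(operation, warmup=100, samples=1000):
    """Run `operation` untimed for `warmup` calls, then time `samples` calls."""
    for _ in range(warmup):
        operation()  # results discarded: caches, pools and JITs settle here
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


latencies = measure(lambda: sum(range(1000)))
print(len(latencies))  # 1000 timed samples, none taken during warmup
```

If cold-start latency matters to users, it is worth reporting the warmup samples separately instead of throwing them away.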

Page 20: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache

Page 21: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference

Page 22: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache

Page 23: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
• Different specs in test vs prod machines

Page 24: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache

Page 25: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
• Different specs in test vs prod machines
• Power mode changes

Page 26: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You’re wrong about the stats

Page 27: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About Stats

• Too few samples

Page 28: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About Stats

[Figure: "Convergence of Median on Samples". Latency (y-axis, 0 to 120) against Time (x-axis, 0 to 60), with four series: Stable Samples, Stable Median, Decaying Samples, Decaying Median.]
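The convergence idea in the figure can be sketched by tracking the median of all samples seen so far and checking whether it has stopped moving. This is a rough illustration only: `running_median` and `has_converged` are hypothetical helpers, the window and tolerance are arbitrary assumptions, and the two series are synthetic, not the data behind the slide.

```python
import statistics


def running_median(samples):
    """Median of the first k samples, for every k from 1 to len(samples)."""
    return [statistics.median(samples[: k + 1]) for k in range(len(samples))]


def has_converged(medians, window=10, tolerance=1.0):
    """True if the last `window` running medians stay within `tolerance`."""
    if len(medians) < window:
        return False
    tail = medians[-window:]
    return max(tail) - min(tail) <= tolerance


# Synthetic latencies (made up for illustration): one steady, one still decaying.
stable = [50 + (-3, 0, 3)[i % 3] for i in range(60)]
decaying = [20 + 100 * 0.95 ** i for i in range(60)]

print(has_converged(running_median(stable)))    # True
print(has_converged(running_median(decaying)))  # False
```

When the running median is still drifting, as in the decaying series, collecting more samples (or a longer warmup) is safer than reporting the current value.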

Page 29: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache

Page 30: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About Stats

• Too few samples
• Non-Gaussian
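Latency samples are often skewed, so a mean on its own can describe a request that almost nobody experienced. The sketch below reports the median and a high percentile next to the mean. `percentile` uses the nearest-rank definition, `summarize` is a hypothetical helper, and the sample data is made up.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the mean alongside statistics that do not assume a bell curve."""
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Made-up skewed sample: 95 fast requests and 5 slow ones.
latencies_ms = [10] * 95 + [200] * 5
summary = summarize(latencies_ms)
print(summary["mean"], summary["median"], summary["p99"])
```

Here the mean is 19.5 ms even though 95% of requests took 10 ms and the rest took 200 ms, so no request was anywhere near the mean.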

Page 31: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request

Server

S3 Cache

Page 32: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About Stats

• Too few samples
• Non-Gaussian
• Multimodal distribution

Page 33: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

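Multimodal Distribution

[Figure: histogram of # occurrences (y-axis) against Latency (x-axis), with latency marks at 5 ms and 10 ms and percentile labels 50% and 99%.]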
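A histogram is usually the quickest way to notice that a latency distribution has more than one mode. This sketch buckets samples by whole milliseconds and prints a text histogram. The helpers `histogram` and `render` are hypothetical, and the 5 ms and 10 ms sample data is invented to echo the figure, not taken from it.

```python
from collections import Counter
import statistics


def histogram(latencies_ms):
    """Count samples per whole-millisecond bucket (values are truncated)."""
    return Counter(int(x) for x in latencies_ms)


def render(hist):
    """Plain-text histogram, one row per bucket between the min and max bucket."""
    rows = []
    for bucket in range(min(hist), max(hist) + 1):
        rows.append(f"{bucket:>4} ms | {'#' * hist[bucket]}")
    return "\n".join(rows)


# Made-up bimodal sample: a fast path around 5 ms and a slow path around 10 ms.
latencies_ms = [5] * 60 + [10] * 40
hist = histogram(latencies_ms)

print(render(hist))
print(statistics.mean(latencies_ms))  # falls in a bucket that holds no samples
```

With two modes, a single summary number hides that there are two kinds of request. Reporting each mode, or the percentiles, keeps that visible.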

Page 34: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About Stats

• Too few samples
• Non-Gaussian
• Multimodal distribution
• Outliers
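The slide only names outliers as a pitfall. One possible way to surface them, so they can be investigated instead of being silently averaged in or dropped, is to flag samples that sit far from the median in units of the median absolute deviation (MAD). This is a sketch under that assumption: `find_outliers` is a hypothetical helper, the cutoff `k=5.0` is arbitrary, and the data is made up.

```python
import statistics


def find_outliers(samples, k=5.0):
    """Return (index, value) pairs further than k MADs from the median."""
    median = statistics.median(samples)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median(abs(x - median) for x in samples)
    return [(i, x) for i, x in enumerate(samples) if abs(x - median) > k * mad]


# Made-up latencies in ms with one obvious spike.
latencies_ms = [10, 11, 9, 10, 250, 10, 12]
print(find_outliers(latencies_ms))  # [(4, 250)]
```

Keeping the index makes it possible to line the spike up with whatever else was happening at that moment, such as a garbage collection pause or a background job.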

Page 35: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You’re wrong about what matters

Page 36: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About What Matters

• Premature optimization

Page 37: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

-- Donald Knuth

Page 38: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative Workloads

Page 39: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative Workloads
• Memory pressure

Page 40: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

BECOMING LESS WRONG

The How

Page 41: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Becoming Less Wrong

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Page 42: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Becoming Less Wrong

• Profiling
• Code Instrumentation
• Aggregate Over Logs
• Traces
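Code instrumentation can be as light as wrapping the sections of interest in a timer and aggregating the recorded durations afterwards. The sketch below shows one way to do that. `timed`, `aggregate` and the in-memory `timings` store are hypothetical names; a real system would more likely write to logs or a metrics library.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # section name -> list of durations in seconds


@contextmanager
def timed(section, sink=timings):
    """Record how long the wrapped block took under the given section name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are not invisible.
        sink[section].append(time.perf_counter() - start)


def aggregate(sink=timings):
    """Summarize recorded timings: (call count, total seconds) per section."""
    return {name: (len(durations), sum(durations)) for name, durations in sink.items()}


for _ in range(3):
    with timed("resize"):
        sum(range(10_000))  # stand-in for real work

print(aggregate()["resize"][0])  # 3 recorded calls
```

Measurements taken this way come from the real workload, which makes them a useful cross-check on what a synthetic benchmark reports.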

Page 43: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Microbenchmarking: Blessing & Curse

+ Quick & cheap
+ Answers narrow ?s well
- Often misleading results
- Not representative of the program

Page 44: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely

Page 45: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Choose Your N Wisely

Prof. Saman Amarasinghe, MIT 2009
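One way to check whether a chosen N is hiding something is to sweep several sizes and compare the cost per item. If it changes sharply between sizes, a single N would have told only part of the story. This is a minimal sketch: `seconds_per_item` is a hypothetical helper, the sizes are arbitrary, and it does not reproduce the graph from the original slide.

```python
import time


def seconds_per_item(operation, sizes, repeats=5):
    """For each input size N, return the best observed time divided by N."""
    results = {}
    for n in sizes:
        data = list(range(n))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            operation(data)
            best = min(best, time.perf_counter() - start)
        results[n] = best / n
    return results


for n, cost in seconds_per_item(sum, [1_000, 10_000, 100_000]).items():
    print(f"N={n:>7}: {cost * 1e9:.1f} ns per item")
```

The sizes worth sweeping are the ones that bracket what production actually sees.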

Page 46: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects

Page 47: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
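If the operation is faster than the clock can resolve, individual timings are mostly noise. A rough way to check is to look at the smallest step the clock actually takes, and to time a batch of calls instead of a single one. The sketch below assumes Python's `time.perf_counter` family; `observed_tick_ns` and `time_batched` are hypothetical helpers and the batch size is arbitrary.

```python
import time


def observed_tick_ns(clock=time.perf_counter_ns, trials=1000):
    """Smallest non-zero step seen between consecutive clock readings."""
    smallest = None
    for _ in range(trials):
        first = clock()
        second = clock()
        while second == first:  # spin until the clock visibly advances
            second = clock()
        step = second - first
        if smallest is None or step < smallest:
            smallest = step
    return smallest


def time_batched(operation, batch=10_000):
    """Time `batch` calls together and return the mean seconds per call."""
    start = time.perf_counter()
    for _ in range(batch):
        operation()
    return (time.perf_counter() - start) / batch


print(time.get_clock_info("perf_counter").resolution)  # resolution the platform reports
print(observed_tick_ns())                              # smallest step actually observed
print(time_batched(lambda: None))                      # per-call cost, loop overhead included
```

Batching trades per-call detail for a usable signal: the result is an average, so it cannot show the distribution of individual calls.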

Page 48: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
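Dead code elimination is mainly a concern with optimizing compilers and JITs. CPython does little of it, but its compiler does fold constant expressions, which has a similar effect on a benchmark: the work being timed is gone before the loop runs. The sketch below illustrates constant folding, not dead code elimination proper, and assumes CPython (other interpreters may behave differently). `stored_constants` is a hypothetical helper.

```python
def stored_constants(source):
    """Compile a snippet and return the constants kept in its code object."""
    return compile(source, "<benchmark>", "exec").co_consts


# The multiplication is done at compile time, so a loop timing this
# statement would measure loading a constant, not multiplying.
print(stored_constants("x = 1000 * 1000"))

# With names instead of literals the work stays in the bytecode.
print(stored_constants("x = a * b"))
```

The usual defence is the same in any language: feed the benchmark inputs the compiler cannot predict, and use the result so it cannot be discarded.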

Page 49: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration

Page 50: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Non-Constant Work Per Iteration
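A classic case of non-constant work is sorting the same list in place on every iteration: the first iteration sorts random data, and every later one sorts data that is already in order. The sketch below counts comparisons instead of timing, so the effect is deterministic. `Counted` and `comparisons_per_iteration` are hypothetical names, and the example is an illustration, not the one from the slide.

```python
import random


class Counted:
    """Wraps a value and counts every comparison made during sorting."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def comparisons_per_iteration(n=1000, iterations=3, fresh_copy=False, seed=0):
    """Sort `iterations` times and return the comparison count of each sort."""
    rng = random.Random(seed)
    original = [Counted(rng.random()) for _ in range(n)]
    data = list(original)
    counts = []
    for _ in range(iterations):
        if fresh_copy:
            data = list(original)  # restore the unsorted input every time
        Counted.comparisons = 0
        data.sort()
        counts.append(Counted.comparisons)
    return counts


print(comparisons_per_iteration())                 # first count is much larger than the rest
print(comparisons_per_iteration(fresh_copy=True))  # every iteration does the same work
```

Copying the input inside the loop keeps the work constant, but the copy must then stay outside the timed region or be accounted for.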

Page 51: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Follow-up Material

• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Robust Java benchmarking by Brent Boyer
  – http://www.ibm.com/developerworks/library/j-benchmark1/
  – http://www.ibm.com/developerworks/library/j-benchmark2/
• Benchmarking articles by Aleksey Shipilëv
  – http://shipilev.net/#benchmarking

Page 52: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Takeaway #1: Cache

Page 53: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Takeaway #2: Outliers

Page 54: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Takeaway #3: Workload

Page 55: Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

Benchmarking: You’re Doing It Wrong

Aysylu Greenberg @aysylu22