DESCRIPTION
Knowledge of how to set up good benchmarks is invaluable in understanding performance of the system. Writing correct and useful benchmarks is hard, and verification of the results is difficult and prone to errors. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse performing application, upset customers or worse! In this talk, we will discuss what you need to know to write better benchmarks for distributed systems. We will look at examples of bad benchmarks and learn about what biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.
Benchmarking: You’re Doing It Wrong
Aysylu Greenberg @aysylu22
To Write Good Benchmarks…
Need to be Full Stack
your process vs. Goal
your process vs. Best Practices
Benchmark = How Fast?
Today
• How Not to Write Benchmarks • Benchmark Setup & Results: - You’re wrong about machines - You’re wrong about stats - You’re wrong about what matters
• Becoming Less Wrong • Having Fun with Riak
HOW NOT TO WRITE BENCHMARKS
Website Serving Images
• Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment
Web Request
Server
S3 Cache
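The setup above can be sketched in Python to make the later pitfalls concrete. This is a hypothetical stand-in, not code from the talk: `fetch_image` and its in-process dict "cache" merely simulate the layered request path in the diagram.

```python
import statistics
import time

def fetch_image(url, cache={}):
    # Stand-in for the real request path. The mutable default dict acts
    # like the layered caches in the diagram: it persists across calls
    # AND across benchmark runs.
    if url not in cache:
        time.sleep(0.001)  # cold path: pretend to fetch from S3
        cache[url] = b"image-bytes"
    return cache[url]

def naive_benchmark(runs=3, accesses=1000):
    # The benchmark exactly as described on the slide: one image,
    # no warmup, measurement starts immediately, mean of 3 runs.
    run_means = []
    for _ in range(runs):
        latencies = []
        for _ in range(accesses):
            start = time.perf_counter()
            fetch_image("/img/cat.png")
            latencies.append(time.perf_counter() - start)
        run_means.append(statistics.mean(latencies))
    return statistics.mean(run_means)
```

After the first access, every request is served from the dict, so the reported mean says almost nothing about S3 or the network, which is exactly the problem the following slides take apart.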
WHAT’S WRONG WITH THIS BENCHMARK?
YOU’RE WRONG ABOUT THE MACHINE
Wrong About the Machine
• Cache, cache, cache, cache!
It’s Caches All The Way Down
Web Request
Server
S3 Cache
Caches in Benchmarks (Prof. Saman Amarasinghe, MIT, 2009)
Wrong About the Machine
• Cache, cache, cache, cache! • Warmup & timing
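One hedge against the warmup problem can be sketched as follows (the `benchmark` helper and its parameters are illustrative, not from the talk): run the operation unmeasured until caches, JITs, and connection pools settle, and only then record samples.

```python
import statistics
import time

def benchmark(fn, warmup=100, iters=1000):
    # Unmeasured warmup phase: lets caches, JITs, and connection pools
    # reach steady state so the measured iterations reflect it.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    # Median rather than mean: less sensitive to periodic interference
    # from other processes on the machine.
    return statistics.median(samples)
```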
Wrong About the Machine
• Cache, cache, cache, cache! • Warmup & timing • Periodic interference
Wrong About the Machine
• Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod
Wrong About the Machine
• Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod • Power mode changes
YOU’RE WRONG ABOUT THE STATS
Wrong About Stats
• Too few samples
Wrong About Stats
Convergence of Median on Samples
[Chart: latency vs. time for a stable and a decaying stream of samples, each plotted with its running median]
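The chart's point, that a median which looks settled early can still be drifting, suggests a simple stopping rule for sample collection. A minimal sketch, where the window-and-tolerance heuristic is my own illustration rather than anything from the talk:

```python
import statistics

def median_converged(samples, window=50, tol=0.05):
    # Treat the median as converged when the medians of the last two
    # windows of samples agree to within tol (relative). Keep sampling
    # until this holds; a decaying workload keeps failing the check.
    if len(samples) < 2 * window:
        return False
    m1 = statistics.median(samples[-2 * window:-window])
    m2 = statistics.median(samples[-window:])
    return abs(m2 - m1) <= tol * max(abs(m1), 1e-12)
```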
Wrong About Stats
• Too few samples • Gaussian (not)
Wrong About Stats
• Too few samples • Gaussian (not) • Multimodal distribution
Multimodal Distribution
[Chart: histogram of occurrences vs. latency with modes at 5 ms and 10 ms; the 50th percentile falls near 5 ms and the 99th near 10 ms]
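With two modes, the mean lands between them and describes neither. A sketch of reporting percentiles instead (the `summarize` helper is illustrative; `statistics.quantiles` requires Python 3.8+):

```python
import statistics

def summarize(latencies):
    # For a multimodal distribution, the mean falls between the modes
    # and matches neither; percentiles keep both modes visible
    # (e.g. p50 and p99 separate the 5 ms and 10 ms modes on the slide).
    cuts = statistics.quantiles(sorted(latencies), n=100)
    return {
        "p50": cuts[49],
        "p99": cuts[98],
        "mean": statistics.mean(latencies),
    }
```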
Wrong About Stats
• Too few samples • Gaussian (not) • Multimodal distribution • Outliers
YOU’RE WRONG ABOUT WHAT MATTERS
Wrong About What Matters
• Premature optimization
“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs … Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
-- Donald Knuth
Wrong About What Matters
• Premature optimization • Unrepresentative workloads
Wrong About What Matters
• Premature optimization • Unrepresentative workloads • Memory pressure
Wrong About What Matters
• Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing
Wrong About What Matters
• Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing • Reproducibility of measurements
BECOMING LESS WRONG
User Actions Matter
X > Y for workload Z with trade-offs A, B, and C
- http://www.toomuchcode.org/
Profiling • Code instrumentation • Aggregate over logs • Traces
Microbenchmarking: Blessing & Curse
+ Quick & cheap + Answers narrow ?s well - Often misleading results - Not representative of the program
Microbenchmarking: Blessing & Curse
• Choose your N wisely
Choose Your N Wisely (Prof. Saman Amarasinghe, MIT, 2009)
Microbenchmarking: Blessing & Curse
• Choose your N wisely • Measure side effects
Microbenchmarking: Blessing & Curse
• Choose your N wisely • Measure side effects • Beware of clock resolution
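The clock-resolution caveat can be checked directly. A small sketch (my own illustration): estimate the smallest observable tick of the timer; any measurement within a couple of orders of magnitude of that tick is mostly noise.

```python
import time

def clock_resolution(clock=time.perf_counter, trials=1000):
    # Smallest nonzero delta between consecutive reads of the clock.
    # Spin until the clock actually advances, then record the step.
    deltas = []
    for _ in range(trials):
        t0 = clock()
        t1 = clock()
        while t1 == t0:
            t1 = clock()
        deltas.append(t1 - t0)
    return min(deltas)
```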
Microbenchmarking: Blessing & Curse
• Choose your N wisely • Measure side effects • Beware of clock resolution • Dead code elimination
Microbenchmarking: Blessing & Curse
• Choose your N wisely • Measure side effects • Beware of clock resolution • Dead code elimination • Constant work per iteration
Non-Constant Work Per Iteration
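The last two pitfalls can be sketched together (the `sink` variable and both step functions are illustrative, not from the talk): consume results so an optimizing runtime cannot dead-code-eliminate the work, and keep per-iteration work constant so the average is meaningful.

```python
import time

sink = 0  # "blackhole": consuming results keeps an optimizing runtime
          # from dead-code-eliminating the work under test

def per_iteration_cost(step, n=2000):
    # Times n calls of step(i) and reports the average per-iteration
    # cost. That average only means something if each iteration does a
    # constant amount of work.
    global sink
    t0 = time.perf_counter()
    for i in range(n):
        sink += step(i)
    return (time.perf_counter() - t0) / n

xs, ys = [], []
constant = per_iteration_cost(lambda i: xs.append(i) or 0)    # O(1) amortized
growing = per_iteration_cost(lambda i: ys.insert(0, i) or 0)  # O(len(ys)) each
# "growing" is NOT a per-iteration cost: early inserts are cheap and
# late ones expensive, so the reported number depends entirely on n.
```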
Follow-up Material
• How NOT to Measure Latency by Gil Tene – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg – http://www.brendangregg.com/methodology.html
• Silverman’s Mode Detection Method by Matt Adereth – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
HAVING FUN WITH RIAK
Setup
• SSD 30 GB • M3 large • Riak version 1.4.2-0-g61ac9d8 • Ubuntu 12.04.5 LTS • 4-byte keys, 10 KB values
Get Latency
[Chart: get latency (usec), roughly 1850–2350, vs. number of keys; annotated at the L3 cache boundary]
Takeaway #1: Cache
Takeaway #2: Outliers
Takeaway #3: Workload
Benchmarking: You’re Doing It Wrong
Aysylu Greenberg @aysylu22