
What’s Your Shape? 5 Steps to Understanding Your Virtual Workload

Irfan Ahmad CTO

CloudPhysics


What’s Your Shape? 5 Steps to Understanding Your Virtual Workload © 2014 Storage Networking Industry Association and CloudPhysics, Inc. All Rights Reserved.

SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:

• Any slide or slides used must be reproduced in their entirety without modification
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.


Abstract

Is your Virtual Machine (VM) rightsized? Is your VM on the right datastore?

Is your VM IO bound?

These are the kinds of questions that come up frequently when you’re doing capacity management, solving performance problems, and making procurement decisions. The root of the answer to all of them is the shape of your workload. Learn how to find the answers to workload shape questions. This tutorial delves into some of the challenges inherent in right-sizing virtual workloads by discovering the shape of your workload and applying that knowledge to capacity and performance decisions.

Is the disk workload sequential or random? How much parallelism is there? How do we figure out the shape of a workload? What are the tools and techniques we can use? What is the bottleneck resource? How do we map workloads to the right mix of storage, CPU, memory and network?


The Virtualization Promise


…No Longer Delivering


A New Set of Headaches

• Will I really benefit from costly SSD cache?

• Will performance suffer if I consolidate more?

• How do I properly plan my IT budget for next year?

• How do I ensure that we meet our SLAs?


Can’t Datacenters Be…?

Other large operations have powerful tools to design and manage, but datacenters do not.

• Designed Like Civil Engineering: CAD software helps design infrastructure and model costs before building

• Operated Like Airlines: Logistics management software allows for maximizing efficiency

• Predicted Like Chip Design: Design automation software allows for testing before costly manufacturing


…Sure They Can!

Example: Storage performance

• Predict how an SSD design will perform?

• Model the cost of operations and ROI?

Tools now exist to design, predict and operate data centers. And workload shapes are the key!


Flash Overtakes In Mindshare

Interest in flash memory has risen greatly, but it is costly and doesn’t benefit everyone.


SSD has garnered a lot of hype

But does reality live up to the hype?

[Chart: search interest over time, legend “Solid-state drive” and “SSD”. Source: Google Trends, March 2014]

“The economics of flash memory are staggering. If you’re not using SSD, you are doing it wrong.” – High Scalability


SSDs: A cautionary tale

• Large SSD caching project: POC completed, production deployment completed

• But back-of-the-envelope VM selection: VMs selected solely on application identity

• Project was a DISASTER: VMs couldn’t possibly benefit, a tremendous waste

Company Quick Facts

• Light Vehicle Automotive • Publicly traded • Established 1950s

• 4,000+ employees • $3bln+ revenue


Key questions

Will SSDs benefit my datacenter?

Which of my VMs / applications?

How much cache do I need?


Do VMs benefit from SSDs? Depends…

Is the disk even a bottleneck (or is it CPU, memory)?

How do you determine if a VM will benefit from caching? Detailed workload characterization:

• Outstanding IOs analysis
• Read/write ratio analysis
• Latency analysis
• Cache hit ratio analysis

No simple rule of thumb! No one size fits all.


[Figure: made-up example histogram. x-axis: latency of an operation (microseconds), 1 to 10; y-axis: frequency, 0 to 2500]

Histograms are much more informative than single numbers like mean, median, and standard deviations from the mean

• e.g., multimodal behaviors are easily identified by plotting a histogram, but obfuscated by a mean

Histograms can actually be calculated efficiently online

Why take one number if you can have a distribution?

Mean is 5.3!
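To make the point concrete, here is a minimal Python sketch. The sample data is invented for illustration (it is not the data behind the slide's chart) and the function name `summarize` is hypothetical. It builds a histogram and a mean in one streaming pass, which is the sense in which a histogram can be maintained online.

```python
from collections import Counter

def summarize(latencies_us):
    """Build a histogram and a mean in one streaming pass over the samples."""
    histogram = Counter()
    total = 0
    count = 0
    for value in latencies_us:
        histogram[value] += 1  # constant-time update per IO
        total += value
        count += 1
    return histogram, total / count

# Invented bimodal workload: half the operations take 2 us, half take 9 us.
samples = [2] * 50 + [9] * 50
histogram, mean_us = summarize(samples)

for bucket in range(1, 11):
    print(f"{bucket:>2} us | {'#' * (histogram[bucket] // 5)}")
print(f"mean = {mean_us:.1f} us")
```

In this toy data the mean falls between the two modes, at a latency that no operation actually experienced, while the histogram shows both modes directly.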

Workload Shapes Technique


Workload Shapes Technique

The ESX disk IO workload characterization is on a per-virtual-disk basis

Allows us to separate out each different type of workload into its own container and observe trends

Technique: For each virtual machine IO request in ESX, we insert some values into histograms

E.g., size of IO request → 4KB

[Figure: two example IO size histograms with bins at 1024, 2048, 4096, and 8192 bytes. Data is collected per virtual disk]
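The technique above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(virtual_disk, op, size_bytes)`, the disk names, and the function `record_io` are hypothetical and are not an ESX interface.

```python
from collections import Counter, defaultdict

# One set of histograms per virtual disk, split into all IOs, reads, and writes.
histograms = defaultdict(
    lambda: {"all": Counter(), "read": Counter(), "write": Counter()}
)

def record_io(virtual_disk, op, size_bytes):
    """Insert one IO request's size into that virtual disk's histograms."""
    per_disk = histograms[virtual_disk]
    per_disk["all"][size_bytes] += 1
    per_disk[op][size_bytes] += 1

# Hypothetical requests: a 4KB read and an 8KB write on one disk,
# and a 4KB write on another.
record_io("vm1-disk0", "read", 4096)
record_io("vm1-disk0", "write", 8192)
record_io("vm1-disk1", "write", 4096)

for disk, per_disk in sorted(histograms.items()):
    print(disk, dict(per_disk["all"]))
```

Keeping a separate container per virtual disk is what lets each workload type be observed on its own rather than as a blended aggregate.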


Workload Shapes Technique

Read/Write Distributions are available for our histograms

Overall Read/Write ratio?

Are Writes smaller or larger than Reads in this workload?

Are Reads more sequential than Writes?

Which type of IO is incurring more latency?

• IO Size: All, Reads, Writes

• Seek Distance: All, Reads, Writes

• Seek Distance: Shortest Among Last 16

• Outstanding IOs: All, Reads, Writes

• IO Interarrival Times: All, Reads, Writes

• Latency: All, Reads, Writes


IO Length: Filebench OLTP

[Figure: I/O Length Histograms for UFS and ZFS. x-axis: Length (bytes), bins 512, 1024, 2048, 4095, 4096, 8191, 8192, 16383, 16384, 32768, 49152, 65535, 65536, 81920, 131072, 262144, 524288, >524288; y-axis: Frequency]

4K and 8K IO transformed into 128K by ZFS?


Seek Distance: Filebench OLTP

[Figure: Seek Distance Histograms for UFS and ZFS. x-axis: Distance (sectors), bins -500000, -50000, -5000, -500, -64, -16, -6, -2, 0, 2, 6, 16, 64, 500, 5000, 50000, 500000; y-axis: Frequency]

Seek distance: measure of sequentiality versus randomness in a workload

Somehow a random workload is transformed into a sequential one by ZFS!

More details needed ...
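As a rough illustration of how seek distance separates sequential from random access, the Python sketch below computes it from a list of IOs. The input layout `(start_sector, length_sectors)` and both traces are invented, and the convention used here (distance from the end of the previous IO to the start of the current one, so back-to-back IOs score 0) is an assumption; the exact convention used by the tool is not stated in the slides.

```python
def seek_distances(ios):
    """Return signed seek distances, in sectors, between consecutive IOs.

    ios: iterable of (start_sector, length_sectors) in issue order.
    """
    distances = []
    prev_end = None
    for start, length in ios:
        if prev_end is not None:
            # Negative values mean the IO seeks backwards on the disk.
            distances.append(start - prev_end)
        prev_end = start + length
    return distances

# Invented traces: four back-to-back 8-sector IOs, then four scattered ones.
sequential = [(0, 8), (8, 8), (16, 8), (24, 8)]
scattered = [(0, 8), (500000, 8), (1200, 8), (90000, 8)]

print(seek_distances(sequential))
print(seek_distances(scattered))
```

A histogram of these values piles up near zero for a sequential stream and at the far left and right edges for a random one, which is the shape difference the slides read off the charts.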


Seek Distance: Filebench OLTP, More Detailed

[Figure: Seek Distance Histograms split into Writes and Reads, for UFS and ZFS. x-axis: Distance (sectors); y-axis: Frequency]

Split out reads & writes

• Transformation from Random to Sequential: primarily for Writes

• Reads: Seek distance is reduced (look at histogram shape & scales)


Filebench OLTP Summary

So, what have we learnt about Filebench OLTP?

• IO is primarily 4K but 8K isn’t uncommon (~30%)
• Access pattern is mostly random
  • Reads are entirely random
  • Writes do have a forward-leaning pattern
• ZFS is able to transform random Writes into sequential:
  • Aggressive IO scheduling
  • Copy-on-write (COW) technique (blocks on disk not modified in place)
  • Changes to blocks from app writes are written to alternate locations
  • Stream otherwise random data writes to a sequential pattern on disk

Performed this detailed analysis in just a few minutes


OSDL Database Test 2 (Linux 2.6.17-10) Analysis

• Workload is primarily random (big spikes towards the right and left edges of the graph)

• Still, many IOs are within 500 sectors (20%) or within 5,000 sectors (33%) of the previous command

• The workload is almost exclusively 8K for both reads and writes

[Figure: Seek Distance Histogram (Writes). x-axis: Distance (sectors); y-axis: Frequency]

[Figure: I/O Length Histogram. x-axis: Length (bytes); y-axis: Frequency]


OSDL Database Test 2 (Linux 2.6.17-10) Analysis (2)

• The number of outstanding IOs is very different between reads and writes in this workload

• PostgreSQL almost always issues 32 write IOs simultaneously

• IO rate from this workload varies over time, by as much as 15% over a 2 min period

[Figure: Outstanding I/Os Histogram (Reads, Writes). x-axis: I/Os Outstanding at Arrival time, bins 1, 2, 4, 6, 8, 12, 16, 20, 24, 28, 32, 64, >64; y-axis: Frequency]

[Figure: Outstanding I/Os Histogram over Time. Axes: I/Os Outstanding at Arrival time, Time (in 6 sec intervals), Frequency]


OSDL Database Test 2 (Linux 2.6.17-10) Summary

• On the aggregate the workload appears random
  • But 20% of IOs are within 250KB and 33% are within 2.4MB!

• IO size is 8K for both reads and writes

• Outstanding IOs very different between reads and writes
  • PostgreSQL almost always issues 32 write IOs simultaneously

• IO rate varies over time (up to 15%)

• Don’t assume that every database workload behaves the same; measure and determine for yourself
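The 250KB and 2.4MB figures correspond to the 500-sector and 5,000-sector bins quoted in the analysis. The short check below assumes 512-byte sectors, which the slides do not state explicitly; `sectors_to_bytes` is a hypothetical helper.

```python
SECTOR_BYTES = 512  # assumption: the slides do not state the sector size

def sectors_to_bytes(sectors):
    """Convert a seek distance in sectors to bytes."""
    return sectors * SECTOR_BYTES

near = sectors_to_bytes(500)   # 256,000 bytes
far = sectors_to_bytes(5000)   # 2,560,000 bytes

print(f"500 sectors  = {near / 1024:.0f} KiB")
print(f"5000 sectors = {far / 1024**2:.2f} MiB")
```

Under that assumption, 500 sectors is exactly 250 KiB and 5,000 sectors is about 2.44 MiB, consistent with the rounded values on the slide.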


Use Cases for Workload Shapes

• Analyzing a new disk performance sensitive workload

• Tuning of underlying disk subsystem

How to interpret
• Pay attention to changes in distribution shape as well as magnitude

Which metrics to start with
• IO Size
• Read/Write Ratios
• Outstanding IOs

Corrective actions
• Tune disk subsystem and re-measure; pay attention to latency histogram


Workload shape matters

Bursty writes

Steady read traffic

The read/write ratio is highly biased towards reads.

8K reads and writes

Bimodal spatial locality

Understanding application IO patterns is the first step in predicting SSD benefits.


Limitation of Simple Shape Analysis

Strength: allows deep analysis of separate flavors of workloads in each VM by splitting workloads by virtual disk
• E.g., place DB redo logs on a different virtual disk from the DB tablespaces

Weakness: doesn’t give a complete picture of IO going to a storage array
• Many VMs might be doing IO from the same ESX host
• VMs from different ESX hosts might be doing IO
• In general, it is a hard problem to figure out
• Rule of thumb: IO to a LUN from different apps is effectively random
• Still: storage arrays are rather smart at pulling out individual sequential streams and scheduling IO per stream


Disk Access Traces Matter Even More

(Source: USENIX ’06)

Knowing patterns isn’t enough. Exact IO sequences are required.


Algorithms to the rescue

[Diagram labels: Algorithms, Simulation, Prediction, Hit Ratio Curves]

Data access patterns, IO sequences and complex analytics allow for maximizing ROI of SSD cache.

Big gains at ~500MB and 2200MB, but little in between.
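One well-known way to derive a hit ratio curve from an exact IO sequence is to compute LRU reuse (stack) distances, in the style of Mattson's stack algorithm. The Python sketch below is an illustrative assumption, not the algorithm behind the results in this presentation; the function name `lru_hit_ratio_curve` and the toy trace are hypothetical, and cache sizes are counted in blocks rather than bytes.

```python
def lru_hit_ratio_curve(trace, cache_sizes):
    """Return {cache_size: hit_ratio} for an LRU cache, in one pass over trace."""
    stack = []       # LRU stack, most recently used block at the end
    distances = []   # reuse distance per access, None for a first touch
    for block in trace:
        if block in stack:
            position = stack.index(block)
            # Depth from the top of the stack: 1 means the block was just used.
            distances.append(len(stack) - position)
            stack.pop(position)
        else:
            distances.append(None)
        stack.append(block)
    total = len(trace)
    return {
        size: sum(1 for d in distances if d is not None and d <= size) / total
        for size in cache_sizes
    }

# Toy trace of block addresses: a loop over three blocks, then a new block.
trace = [1, 2, 3, 1, 2, 3, 4, 1]
curve = lru_hit_ratio_curve(trace, cache_sizes=[1, 2, 3, 4])
for size, ratio in curve.items():
    print(f"cache of {size} blocks: hit ratio {ratio:.3f}")
```

Even in this toy trace the curve is a step function: caches of one or two blocks gain nothing, and the benefit appears only once the cache holds the whole three-block loop. That is the kind of plateau-and-jump shape the slide describes, and it is why an exact access sequence is needed rather than summary statistics.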


SSDs: A success story

• 100s of VMs analyzed

• 16% showed benefit from server-side SSD cache

• Improvement ranged from 50% to 200% better response times

• Hit Ratio Curve derived cache size recommendations ranged from 1GB - 512GB (VM-by-VM basis)

Company Quick Facts

• Boston-area hedge fund • International operations • Assets >$20B

• Established 1980s • 50+ employees

Successfully identified VMs that benefit from SSD cache, SSDs in 16% of VMs.


More success: SSDs in 3% of VMs

• 100s of virtual machines simulated

• 3% of VMs had 50% or higher improvement

• Hit Ratio Curves derived recommendations ranged from 1GB – 512GB (VM-by-VM basis)

• Customer installed 2 PCIe Flash cards to get maximum benefit via a strategic installation

Company Quick Facts

• Public University (Boston area)

• Established 1850s

• 10,000 students • 1300+ employees

Even smaller deployments benefit.

APPENDIX


Attribution & Feedback


Please send any questions or comments regarding this SNIA Tutorial to [email protected]

The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.

Authorship History
Original Author: Irfan Ahmad, CloudPhysics, April 17, 2014

Additional Contributors


Workload Characterization Technique

To make the histograms practical, bin sizes are on rather irregular scales

• E.g., the IO length histogram bin ranges look like this: …, 2048, 4095, 4096, 8191, 8192, …
  • Rather odd: some buckets are big and others are as small as just 1

• Certain block sizes are really special since the underlying storage subsystems may optimize for them; single those out from the start (else lose that precise information)

• E.g., important to know if the IO was 16KB or some other size in the interval (8KB, 16KB)

[Figure: excerpt of the IO length histogram x-axis, bins 2048, 4095, 4096, 8191, 8192, 16383, 16384, 32768]
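A minimal sketch of this binning scheme in Python, using the bucket limits that appear in the `vscsiStats -p iolength` output elsewhere in this deck. The function name `bucket_label` is hypothetical. Because each special size has its own upper limit directly after the limit one byte below it, an exact 4096-byte IO lands in a different bucket from, say, a 3000-byte IO.

```python
from bisect import bisect_left

# Upper limits of the IO length buckets, in bytes; the last bucket is "> 524288".
LIMITS = [512, 1024, 2048, 4095, 4096, 8191, 8192, 16383, 16384, 32768,
          49152, 65535, 65536, 81920, 131072, 262144, 524288]

def bucket_label(size_bytes):
    """Return the label of the bucket that an IO of this size falls into."""
    index = bisect_left(LIMITS, size_bytes)  # first limit >= size_bytes
    if index == len(LIMITS):
        return f"> {LIMITS[-1]}"
    return f"<= {LIMITS[index]}"

for size in (3000, 4096, 5000, 8192, 16384, 600000):
    print(f"{size:>7} bytes -> bucket {bucket_label(size)}")
```

The lookup is a binary search over a short sorted list, so recording an IO stays cheap even with irregular bucket widths.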


Windows File Copy

[Figure: I/O Length Histogram, Vista Enterprise vs. XP Pro. x-axis: Length (bytes); y-axis: Frequency]

[Figure: Seek Distance Histogram, Vista Enterprise vs. XP Pro. x-axis: Distance (bytes); y-axis: Frequency]

XP issues 64KB IOs

IOs are largely sequential.

Vista is issuing very large IOs (1MB)

Number of commands is lower

IOs are very sequential

Latency is higher

Vista enables large IOs to be issued; file copy is just an example

Keep an eye out for increasing IO sizes in future workloads


How to Run

$ /usr/lib/vmware/bin/vscsiStats -p iolength
Histogram: IO lengths of commands {
  min : 512
  max : 32768
  mean: 11731
  count : 241
  {
    5 (<= 512)
    14 (<= 1024)
    5 (<= 2048)
    17 (<= 4095)
    76 (<= 4096)
    1 (<= 8191)
    20 (<= 8192)
    18 (<= 16383)
    36 (<= 16384)
    49 (<= 32768)
    0 (<= 49152)
    0 (<= 65535)
    0 (<= 65536)
    0 (<= 81920)
    0 (<= 131072)
    0 (<= 262144)
    0 (<= 524288)
    0 (> 524288)
  }
}

$ /usr/lib/vmware/bin/vscsiStats -p latency
Histogram: latency of IOs in Microseconds (us) {
  min : 191
  max : 13391
  mean: 598
  count : 288
  {
    0 (<= 1)
    0 (<= 10)
    0 (<= 100)
    248 (<= 500)
    28 (<= 1000)
    4 (<= 5000)
    8 (<= 15000)
    0 (<= 30000)
    0 (<= 50000)
    0 (<= 100000)
    0 (> 100000)
  }
}

Bin Ranges (Bucket Limits). Think x-axis of histogram plots
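For readers who want to post-process this output, here is a small Python sketch that parses the bucket lines of the IO length histogram shown above. It assumes the text layout is exactly as printed on this slide; the names `parse_bins` and `BIN_RE` are hypothetical, and the sample text is the slide's own iolength output.

```python
import re

SAMPLE = """\
min : 512
max : 32768
mean: 11731
count : 241
5 (<= 512)
14 (<= 1024)
5 (<= 2048)
17 (<= 4095)
76 (<= 4096)
1 (<= 8191)
20 (<= 8192)
18 (<= 16383)
36 (<= 16384)
49 (<= 32768)
0 (<= 49152)
0 (<= 65535)
0 (<= 65536)
0 (<= 81920)
0 (<= 131072)
0 (<= 262144)
0 (<= 524288)
0 (> 524288)
"""

# Matches lines such as "76 (<= 4096)" or "0 (> 524288)".
BIN_RE = re.compile(r"^\s*(\d+)\s+\((<=|>)\s*(\d+)\)\s*$")

def parse_bins(text):
    """Return a list of (bucket_label, frequency) pairs in output order."""
    bins = []
    for line in text.splitlines():
        match = BIN_RE.match(line)
        if match:
            frequency, operator, limit = match.groups()
            bins.append((f"{operator} {limit}", int(frequency)))
    return bins

bins = parse_bins(SAMPLE)
total = sum(frequency for _, frequency in bins)
top_label, top_count = max(bins, key=lambda item: item[1])
print(f"{total} IOs; most common bucket {top_label} with {top_count} IOs")
```

The bucket counts sum to the reported `count` of 241, and the most populated bucket is the exact-4096 one, which is the kind of quick sanity check that is easy to script once the output is parsed.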


Performance Overhead of Stats Collection

Overhead is negligible (tested on internal build)

Used iometer to generate 4KB Sequential Reads

• 16 outstanding IOs on a Windows 2003 Enterprise Edition 64-bit VM

• 4KB is the most realistic worst-case scenario for overheads

Online Histo Service             Disabled   Enabled
IOps                             8187       8137
IOps Std. Dev.                   6.5        200
MBps                             35.1       34.8
CPU (out of 800)                 106.0      108.0
CPU Std. Dev.                    2.7        4.8
CPU Efficiency (UsedSec/IOps)    0.0417     0.0424
Latency (ms)                     1.6        1.6

Table 2. Microbenchmark Performance