Cake: Enabling High-level SLOs on Shared Storage Systems Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica University of California, Berkeley SOCC 2012


Cake: Enabling High-level SLOs on Shared Storage Systems

Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica

University of California, Berkeley

SOCC 2012


Content

Introduction

Problem And Challenge

Solutions

System Design

Implementation

Evaluation

Conclusion

Future work


Introduction

Rich web applications

A single slow storage request can dominate the overall response time

High percentile latency SLOs

Deal with the latency present at the 95th or 99th percentile


Introduction

Datacenter applications

Latency-sensitive

Throughput-oriented

Accessing distributed storage systems

Applications don’t share storage systems

Service-level objectives on throughput or latency


Introduction

SLOs

Reflect the performance expectations

Amazon, Google, and Microsoft have identified SLO violations as a major cause of user dissatisfaction

For example

A web client might require a 99th percentile latency SLO of 100ms

A batch job might require a throughput SLO of 100 scan requests per second
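
The two example SLOs above can be checked against measurements roughly as follows. This is a minimal sketch, not Cake's code: the function names and the nearest-rank percentile definition are assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms, p, target_ms):
    """True if the p-th percentile latency is within the target."""
    return percentile(latencies_ms, p) <= target_ms


def meets_throughput_slo(completed_requests, window_s, target_per_s):
    """True if the average request rate over the window reaches the target."""
    return completed_requests / window_s >= target_per_s


# Hypothetical measurements: 98 fast requests and 2 slow ones.
latencies = [10] * 98 + [150, 400]
print(meets_latency_slo(latencies, 99, 100))  # 99th percentile is 150 ms
print(meets_throughput_slo(3000, 30, 100))    # 3000 scans in 30 s
```

With these made-up samples the 95th percentile is well under 100ms while the 99th is not, which is why a high-percentile SLO is stricter than an average or median target.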


Problem And Challenge

Physically separating storage systems

Each system must be provisioned for its individual peak load

Segregation of data leads to degraded user experience

Operational complexity

Require additional maintenance staff

More software bugs and configuration errors


Problem And Challenge

Focusing solely on controlling disk-level resources

High-level storage SLOs require consideration of resources beyond the disk

Disconnect between the high-level SLOs and performance parameters like MB/s

Requires tedious, manual translation by the programmer or system operator


Solutions

Cake

A coordinated, multi-resource scheduler for shared distributed storage environments with the goal of achieving both high throughput and bounded latency.


System Design

Architecture


System Design

First-level schedulers as a client

Provide mechanisms for differentiated scheduling

Split large requests into smaller chunks

Limit the number of outstanding device requests


System Design

Cake’s second-level scheduler as a feedback loop

While attempting to increase utilization

Continually adjusts resource allocation at each of the first-level schedulers

Maximize SLO compliance of the system


First-level Resource Scheduling

Differentiated scheduling (figure panels a and b)


First-level Resource Scheduling

Split large requests

Control number of outstanding requests

(figure panels c and d)
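
The two remaining first-level mechanisms can be sketched in a few lines: cutting a large request into bounded chunks, and capping how many requests are outstanding at a device at once. The helper names, the chunk size, and the use of a semaphore are illustrative assumptions, not Cake's implementation.

```python
import threading


def split_into_chunks(offset, length, chunk_size):
    """Yield (offset, length) pieces that cover the request, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    end = offset + length
    while offset < end:
        size = min(chunk_size, end - offset)
        yield offset, size
        offset += size


class OutstandingLimiter:
    """Allow at most max_outstanding device calls to be in flight at once."""

    def __init__(self, max_outstanding):
        self._slots = threading.BoundedSemaphore(max_outstanding)

    def run(self, device_call, *args):
        # Blocks until a slot is free, then issues the call and releases the slot.
        with self._slots:
            return device_call(*args)


# A hypothetical 10-unit read split into chunks of 4 units.
print(list(split_into_chunks(0, 10, 4)))
```

Smaller chunks give the scheduler more points at which it can switch to a latency-sensitive request, and a lower outstanding limit keeps less work queued below the scheduler where it can no longer be reordered.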


Second-level Scheduling

Multi-resource Request Lifecycle

Request processing in a storage system involves far more than just accessing disk

Necessitating a coordinated, multi-resource approach to scheduling



Second-level Scheduling

High-level SLO Enforcement

Cake’s second-level scheduler

Satisfy the latency requirements of latency-sensitive front-end clients

Maximize the throughput of throughput-oriented batch clients

Two phases of second-level scheduling decisions

For disk in the SLO compliance-based phase

For non-disk resources in the queue occupancy-based phase


Second-level Scheduling

The initial SLO compliance-based phase

Decide on disk allocations based on client performance

The queue occupancy-based phase

Balance allocation in the rest of the system to keep the disk utilized and improve overall performance
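
The two phases can be sketched as two small adjustment steps that a feedback loop would call on each interval. Everything concrete here is an assumption for illustration: the percentage shares, the step size, the 0.8 slack factor, and the rule that compares queue occupancy at the two layers are not taken from the slides.

```python
from dataclasses import dataclass, replace

MIN_SHARE, MAX_SHARE = 5, 95  # assumed bounds on a share, in percent


@dataclass(frozen=True)
class Allocation:
    """Front-end client's share (percent) at the disk layer and at the non-disk layer."""
    disk_share: int
    non_disk_share: int


def _clamp(share):
    return max(MIN_SHARE, min(MAX_SHARE, share))


def slo_compliance_step(alloc, observed_ms, target_ms, step=5, slack=0.8):
    """Phase 1: adjust the disk share based on the client's measured latency."""
    if observed_ms > target_ms:
        # SLO missed: give the front-end client more of the disk.
        return replace(alloc, disk_share=_clamp(alloc.disk_share + step))
    if observed_ms < slack * target_ms:
        # Comfortably within the SLO: hand disk share back to the batch client.
        return replace(alloc, disk_share=_clamp(alloc.disk_share - step))
    return alloc


def queue_occupancy_step(alloc, disk_occupancy, non_disk_occupancy, step=5):
    """Phase 2: rebalance the non-disk share so requests keep reaching the disk."""
    if non_disk_occupancy > disk_occupancy:
        # Requests wait longer upstream than at the disk: upstream is the bottleneck.
        return replace(alloc, non_disk_share=_clamp(alloc.non_disk_share + step))
    if non_disk_occupancy < disk_occupancy:
        # Upstream is over-provisioned relative to the disk.
        return replace(alloc, non_disk_share=_clamp(alloc.non_disk_share - step))
    return alloc


alloc = Allocation(disk_share=50, non_disk_share=50)
alloc = slo_compliance_step(alloc, observed_ms=120, target_ms=100)
alloc = queue_occupancy_step(alloc, disk_occupancy=0.2, non_disk_occupancy=0.6)
print(alloc)
```

Small fixed steps trade convergence speed for stability; a real controller would also need to decide which phase runs on a given interval.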


Implementation

Chunking Large Requests


Implementation

Number of Outstanding Requests


Implementation

Cake Second-level Scheduler: SLO Compliance-based Scheduling


Implementation

Cake Second-level Scheduler: Queue Occupancy-based Scheduling


Evaluation

Proportional Shares and Reservations

When the front-end client is sending low throughput, reservations are an effective way of reducing queue time at HDFS


Evaluation

Proportional Shares and Reservations

When the front-end is sending high throughput, proportional share is an effective mechanism for reducing latency


Evaluation

Single vs. Multi-resource Scheduling

CPU contention within HBase when running many concurrent threads and without separate queues and differentiated scheduling


Evaluation

Single vs. Multi-resource Scheduling

Thread-per-request displays greatly increased latency with chunked request sizes


Evaluation

Convergence Time

Diurnal Workload

Spike Workload

Latency Throughput Trade-off

Quantifying Benefits of Consolidation


Conclusion

Coordinating resource allocation across multiple software layers

Allowing application programmers to specify high-level SLOs directly to the storage system

Allowing consolidation of latency-sensitive and throughput-oriented workloads


Conclusion

Allowing users to flexibly move within the storage latency vs. throughput trade-off by choosing different high-level SLOs

Using Cake has concrete economic and business advantages


Future work

SLO admission control

Influence of DRAM and SSDs

Composable application-level SLOs

Automatic parameter tuning

Generalization to multiple SLOs


Thank You