Managing Cloud Resources: Distributed Rate Limiting


Alex C. Snoeren, Kevin Webb, Bhanu Chandra Vattikonda, Barath Raghavan,
Kashi Vishwanath, Sriram Ramabhadran, and Kenneth Yocum

Building and Programming the Cloud Workshop, 13 January 2010

Centralized Internet services

Hosting with a single physical presence
However, clients are across the Internet

Cloud-based services

Resources and clients distributed across the world
Often incorporate resources from multiple providers (e.g., Windows Live)

Resources in the Cloud

Distributed resource consumption
Clients consume resources at multiple sites
Metered billing is the state of the art
Services are "punished" for popularity
» Those unable to pay are disconnected
No control over the resources used to serve increased demand
Overprovision and pray
Application designers typically cannot describe their needs
Individual service bottlenecks are varied but severe
» IOPS, network bandwidth, CPU, RAM, etc.
» Need a way to balance resource demand

Two lynchpins for success

Need a way to control and manage distributed resources as if they were centralized
» All current models from the OS scheduling and provisioning literature assume full knowledge and absolute control
» (This talk focuses specifically on network bandwidth)
Must be able to efficiently support rapidly evolving application demand
» Balance resource needs against the hardware realization automatically, without application-designer input
» (Another talk if you're interested)

Ideal: Emulate a single limiter

Make distributed feel centralized
Packets should experience the same limiter behavior
[Figure: flows from sources (S) to destinations (D) passing through a set of limiters, each path labeled 0 ms]

Engineering tradeoffs

Accuracy (how close to K Mbps is delivered; flow-rate fairness)
+ Responsiveness (how quickly demand shifts are accommodated)
vs.
Communication efficiency (how much and how often rate limiters must communicate)

An initial architecture

Each limiter runs the same loop: on an estimate-interval timer it estimates local demand, gossips that estimate to the other limiters, combines the gossiped estimates into a view of global demand, and sets a local allocation; the limit is enforced on each packet arrival.
[Figure: four limiters (Limiter 1-4) exchanging gossip messages]
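
As an illustration only (not the authors' implementation), a minimal Python sketch of one limiter's loop might look like this; the class, helper names, constants, and the proportional "set allocation" policy are all hypothetical, and the designs discussed on the following slides fill in that allocation step differently.

```python
ESTIMATE_INTERVAL = 0.5          # seconds (hypothetical value)
GLOBAL_LIMIT = 5 * 125_000       # 5 Mbps expressed in bytes/second


class Limiter:
    """One node of a distributed rate limiter (illustrative sketch only)."""

    def __init__(self, name, num_limiters):
        self.name = name
        self.local_bytes = 0                           # demand observed this interval
        self.peer_demands = {}                         # latest gossiped demand per peer
        self.allocation = GLOBAL_LIMIT / num_limiters  # start with an even split
        self.budget = self.allocation * ESTIMATE_INTERVAL

    def on_packet(self, length):
        """Packet arrival: record local demand and enforce the current allocation.
        (A real limiter would enforce with a token bucket, as on the next slide.)"""
        self.local_bytes += length
        if self.budget >= length:
            self.budget -= length
            return True                                # forward
        return False                                   # drop

    def receive_gossip(self, peer, demand):
        """Record a peer's most recent demand estimate (bytes/second)."""
        self.peer_demands[peer] = demand

    def estimate_tick(self, send_gossip):
        """Estimate-interval timer: estimate local demand, gossip it, update the
        global demand estimate, and reset the local allocation."""
        local_demand = self.local_bytes / ESTIMATE_INTERVAL
        self.local_bytes = 0
        send_gossip(self.name, local_demand)
        global_demand = local_demand + sum(self.peer_demands.values())
        if global_demand > 0:
            # Hypothetical policy for illustration: allocate in proportion to
            # measured demand; the schemes on later slides define this step.
            self.allocation = GLOBAL_LIMIT * local_demand / global_demand
        self.budget = self.allocation * ESTIMATE_INTERVAL
```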

Token bucket limiters

[Figure: a token bucket with fill rate K Mbps; an arriving packet is forwarded only if enough tokens are available]
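
As a point of reference, a token-bucket limiter can be sketched in a few lines of Python; the class and the example rate and burst values below are illustrative, not taken from the talk.

```python
import time


class TokenBucket:
    """Token bucket with fill rate `rate` (bytes/second) and capacity `burst` (bytes)."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def admit(self, length):
        """Forward a packet of `length` bytes if enough tokens have accumulated."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= length:
            self.tokens -= length
            return True      # forward
        return False         # drop


# Example (illustrative values): K = 5 Mbps fill rate, 15 KB burst
bucket = TokenBucket(rate=5 * 125_000, burst=15_000)
print(bucket.admit(1500))    # admit a 1500-byte packet
```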

A global token bucket (GTB)?

Idea: each limiter keeps a token bucket for the full global limit and gossips demand info (bytes/sec), so reported remote arrivals drain tokens just as local packets do.
[Figure: Limiter 1 and Limiter 2 exchanging demand info (bytes/sec)]

A baseline experiment

Limiter 1: 3 TCP flows (S → D)
Limiter 2: 7 TCP flows (S → D)
Reference: a single token bucket carrying all 10 TCP flows (S → D)

GTB performance

[Plots: flow rates under the single token bucket vs. the global token bucket, showing the 7-flow and 3-flow aggregates against the 10-flow reference]
Problem: GTB requires near-instantaneous arrival info

Take 2: Global Random Drop

Limiters send and collect global rate info from one another
Case 1: below the global limit (e.g., a 4 Mbps global arrival rate against a 5 Mbps limit), forward the packet

Global Random Drop (GRD)

Case 2: above the global limit (e.g., a 6 Mbps global arrival rate against a 5 Mbps limit), drop with probability
Excess / Global arrival rate = 1 Mbps / 6 Mbps = 1/6
The drop probability is the same at all limiters
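
A minimal sketch of the GRD drop decision, assuming each limiter already holds a gossiped estimate of the global arrival rate; the function and parameter names are hypothetical.

```python
import random


def grd_forward(global_rate, limit):
    """GRD drop decision: below the limit, always forward; above it, drop with
    probability excess / global arrival rate (identical at every limiter)."""
    if global_rate <= limit:
        return True
    drop_prob = (global_rate - limit) / global_rate
    return random.random() >= drop_prob


# Example from the slides: 5 Mbps limit, 6 Mbps global arrival rate,
# so every limiter drops each packet with probability 1/6.
forward = grd_forward(global_rate=6e6, limit=5e6)
```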

GRD baseline performance

[Plots: flow rates under GRD compared with the single token bucket, showing the 7-flow and 3-flow aggregates against the 10-flow reference]
Delivers flow behavior similar to a central limiter

GRD under dynamic arrivals

[Plot: GRD behavior as flows arrive and depart, 50-ms estimate interval]

Returning to our baseline

Limiter 1: 3 TCP flows (S → D)
Limiter 2: 7 TCP flows (S → D)

Basic idea: flow counting

Goal: provide inter-flow fairness for TCP flows
Count flows at each limiter ("3 flows" at Limiter 1, "7 flows" at Limiter 2)
Enforce the resulting share with local token buckets

Estimating TCP demand

Example: two sources, 1 TCP flow each, through one limiter
Local token rate (limit) = 10 Mbps
Flow A = 5 Mbps
Flow B = 5 Mbps
Flow count = 2 flows

FPS (Flow Proportional Share) under dynamic arrivals

[Plot: FPS behavior as flows arrive and depart, 500-ms estimate interval]

Comparing FPS to GRD

Both are responsive and provide similar utilization
GRD requires accurate estimates of the global rate at all limiters
[Plots: GRD with a 50-ms estimate interval vs. FPS with a 500-ms estimate interval]

Estimating skewed demand

Limiter 1: two sources, 1 TCP flow each (S → D)
Limiter 2: 3 TCP flows (S → D)

Estimating skewed demand

Key insight: use a TCP flow's rate to infer demand
Local token rate (limit) = 10 Mbps
Flow A = 8 Mbps
Flow B = 2 Mbps (bottlenecked elsewhere)
Flow count ≠ demand

Estimating skewed demand

Local token rate (limit) = 10 Mbps
Flow A = 8 Mbps
Flow B = 2 Mbps (bottlenecked elsewhere)
Local limit / Largest flow's rate = 10 / 8 = 1.25 flows

FPS example

Global limit = 10 Mbps
Limiter 1: 1.25 flows
Limiter 2: 3 flows
Set local token rate = Global limit × local flow count / Total flow count
= 10 Mbps × 1.25 / (1.25 + 3) = 2.94 Mbps
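
A small sketch, for illustration, of the FPS arithmetic from the last two slides; the function names are hypothetical, and the numbers reproduce the 1.25-flow estimate and the 2.94 Mbps local rate.

```python
def fps_flow_count(local_limit, largest_flow_rate):
    """Effective local flow count when the largest flow does not fill the
    local limit (some flows bottlenecked elsewhere): limit / largest rate."""
    return local_limit / largest_flow_rate


def fps_local_rate(global_limit, local_flows, total_flows):
    """Local token rate = global limit * local flow count / total flow count."""
    return global_limit * local_flows / total_flows


# Numbers from the slides: 10 Mbps local limit, largest flow at 8 Mbps
local_flows = fps_flow_count(10.0, 8.0)                    # 1.25 "flows"
rate = fps_local_rate(10.0, local_flows, local_flows + 3)  # Limiter 2 has 3 flows
print(local_flows, round(rate, 2))                         # 1.25 2.94 (Mbps)
```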

FPS bottleneck example

Initially a 3:7 split between 10 un-bottlenecked flows
At 25 s, the 7-flow aggregate is bottlenecked to 2 Mbps
At 45 s, an un-bottlenecked flow arrives: a 3:1 split of the remaining 8 Mbps

Real-world constraints

Resources spent tracking usage are pure overhead
» Efficient implementation (<3% CPU, sample and hold)
» Modest communication budget (<1% bandwidth)
The control channel is slow and lossy
» Need to extend gossip protocols to tolerate loss
» An interesting research problem on its own…
The nodes themselves may fail or partition
» In an asynchronous system, you cannot tell the difference
» Need a mechanism that deals gracefully with both

Robust control communication

7 limiters enforcing a 10 Mbps limit
Demand fluctuates every 5 seconds between 1 and 100 flows
Varying loss on the control channel

Handling partitions

Failsafe operation: each disconnected group of k limiters enforces a k/n share of the global limit
Ideally: the Bank-o-mat problem (a credit/debit scheme)
Challenge: group membership with asymmetric partitions
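
The failsafe rule is simple enough to state as code; this sketch only restates the k/n policy above, and the names and example numbers are illustrative.

```python
def failsafe_limit(global_limit_mbps, group_size, total_limiters):
    """On a partition, a disconnected group of k limiters falls back to
    enforcing k/n of the global limit."""
    return global_limit_mbps * group_size / total_limiters


# Example: 3 of 7 limiters are cut off while enforcing a 10 Mbps limit
print(failsafe_limit(10.0, 3, 7))   # ~4.29 Mbps for the disconnected group
```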

Following PlanetLab demand

Apache Web servers on 10 PlanetLab nodes
5-Mbps aggregate limit
Shift load over time from 10 nodes to 4

Current limiting options

[Plot: demands at 10 Apache servers on PlanetLab; demand shifts to just 4 nodes, leaving wasted capacity under current limiting options]

Applying FPS on PlanetLab

[Plot: the same demand-shift workload under FPS]

Hierarchical limiting

A sample use case

T = 0: A: 5 flows at L1
T = 55: A: 5 flows at L2
T = 110: B: 5 flows at L1
T = 165: B: 5 flows at L2

Worldwide flow join

8 nodes split between UCSD and Polish Telecom
5 Mbps aggregate limit
A new flow arrives at each limiter every 10 seconds

Worldwide demand shift

Same demand-shift experiment as before
At 50 seconds, the Polish Telecom demand disappears; it reappears at 90 seconds

Where to go from here

Need to "let go" of full control and make decisions with only a "cloudy" view of actual resource consumption
» Distinguish between what you know and what you don't know
» Operate efficiently when you know you know
» Have failsafe options when you know you don't
Moreover, we cannot rely on application/service designers to understand their resource demands
» The system needs to adjust dynamically to shifts
We've started to manage the demand side of the equation; we're now focusing on the supply side: custom-tailored resource provisioning
