
Page 1: Managing Cloud Resources:  Distributed Rate Limiting

Managing Cloud Resources: Distributed Rate Limiting

Alex C. Snoeren, Kevin Webb, Bhanu Chandra Vattikonda, Barath Raghavan,

Kashi Vishwanath, Sriram Ramabhadran, and Kenneth Yocum

Building and Programming the Cloud Workshop, 13 January 2010

Page 2: Managing Cloud Resources:  Distributed Rate Limiting

Centralized Internet services

Hosting with a single physical presence
However, clients are across the Internet

Page 3: Managing Cloud Resources:  Distributed Rate Limiting

Cloud-based services (e.g., Windows Live)

Resources and clients distributed across the world
Often incorporates resources from multiple providers

Page 4: Managing Cloud Resources:  Distributed Rate Limiting

Resources in the Cloud

Distributed resource consumption
Clients consume resources at multiple sites
Metered billing is state-of-the-art
Service “punished” for popularity
» Those unable to pay are disconnected
No control of resources used to serve increased demand
Overprovision and pray
Application designers typically cannot describe needs
Individual service bottlenecks varied but severe
» IOps, network bandwidth, CPU, RAM, etc.
» Need a way to balance resource demand

Page 5: Managing Cloud Resources:  Distributed Rate Limiting

Two lynchpins for success

Need a way to control and manage distributed resources as if they were centralized
All current models from the OS scheduling and provisioning literature assume full knowledge and absolute control
(This talk focuses specifically on network bandwidth)

Must be able to efficiently support rapidly evolving application demand
Match resource needs to the hardware realization automatically, without application-designer input
(Another talk if you’re interested)

Page 6: Managing Cloud Resources:  Distributed Rate Limiting

Ideal: Emulate a single limiter

[Figure: traffic sources (S) and destinations (D) behind a set of limiters, drawn 0 ms apart — behaving as one limiter]

Make distributed feel centralized
Packets should experience the same limiter behavior

Page 7: Managing Cloud Resources:  Distributed Rate Limiting

Engineering tradeoffs

Accuracy (how close to K Mbps is delivered; flow-rate fairness)
+ Responsiveness (how quickly demand shifts are accommodated)
vs.
Communication efficiency (how much and how often rate limiters must communicate)

Page 8: Managing Cloud Resources:  Distributed Rate Limiting

An initial architecture

[Figure: four limiters exchanging gossip messages. On each packet arrival a limiter enforces its local limit; on an estimate-interval timer it estimates local demand, gossips it, combines it with the gossiped demands into a global demand, and sets its local allocation.]
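The allocation step is exactly where the designs that follow (GTB, GRD, FPS) differ. As a rough Python sketch of the per-limiter loop in the figure — with a hypothetical demand-proportional allocation standing in for the real policy, and direct method calls standing in for the gossip channel:

```python
import threading

class Limiter:
    """Sketch of one limiter's control loop: estimate local demand each
    interval, gossip it to peers, recompute the local allocation, and
    enforce that allocation on packet arrival."""

    def __init__(self, global_limit_bps, peers, interval=0.5):
        self.global_limit = global_limit_bps   # aggregate limit across all limiters
        self.peers = peers                     # other Limiter objects (gossip targets)
        self.interval = interval               # estimate-interval timer, in seconds
        self.bytes_this_interval = 0           # local arrivals since the last estimate
        self.peer_demand = {}                  # latest demand heard from each peer (bps)
        self.allocation = global_limit_bps     # this limiter's share of the global limit

    def on_estimate_timer(self):
        # 1. Estimate local demand from traffic seen during the last interval.
        local_demand = 8 * self.bytes_this_interval / self.interval
        self.bytes_this_interval = 0
        # 2. Gossip the estimate (a direct call here; really a lossy control channel).
        for peer in self.peers:
            peer.peer_demand[id(self)] = local_demand
        # 3. Combine into global demand and set the local allocation.  A plain
        #    demand-proportional split is shown; GTB/GRD/FPS replace this step.
        global_demand = local_demand + sum(self.peer_demand.values())
        if global_demand > 0:
            self.allocation = self.global_limit * local_demand / global_demand
        threading.Timer(self.interval, self.on_estimate_timer).start()

    def on_packet_arrival(self, length_bytes):
        # 4. Account for the packet and enforce the current allocation,
        #    e.g. with a local token bucket (see the next slide's sketch).
        self.bytes_this_interval += length_bytes
```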

Page 9: Managing Cloud Resources:  Distributed Rate Limiting

Token bucket limiters

[Figure: a token bucket with fill rate K Mbps; arriving packets consume tokens]
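A minimal Python sketch of the token-bucket limiter each node runs locally; the byte-level accounting and the burst parameter are illustrative choices, not details from the talk:

```python
import time

class TokenBucket:
    """Token bucket with fill rate K Mbps: tokens accrue at the fill rate up
    to a fixed burst size; a packet is forwarded only if it can pay for
    itself in tokens, otherwise it is dropped."""

    def __init__(self, fill_rate_mbps, burst_bytes):
        self.rate = fill_rate_mbps * 1e6 / 8   # fill rate, bytes per second
        self.burst = burst_bytes               # bucket depth
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def admit(self, packet_bytes):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True    # forward
        return False       # drop
```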

Page 10: Managing Cloud Resources:  Distributed Rate Limiting

A global token bucket (GTB)?

[Figure: Limiter 1 and Limiter 2 exchange demand info (bytes/sec) to emulate one shared token bucket]

Page 11: Managing Cloud Resources:  Distributed Rate Limiting

A baseline experiment

Limiter 1: 3 TCP flows (S → D)
Limiter 2: 7 TCP flows (S → D)
Reference: a single token bucket carrying all 10 TCP flows (S → D)

Page 12: Managing Cloud Resources:  Distributed Rate Limiting

GTB performance

[Figure: flow-rate plots for the single token bucket (10 TCP flows) versus the global token bucket (3 and 7 TCP flows at the two limiters)]

Problem: GTB requires near-instantaneous arrival info

Page 13: Managing Cloud Resources:  Distributed Rate Limiting

Take 2: Global Random Drop

Limiters send and collect global rate info from the others
Example: 5 Mbps limit, 4 Mbps global arrival rate
Case 1: Below the global limit, forward the packet

Page 14: Managing Cloud Resources:  Distributed Rate Limiting

Global Random Drop (GRD)

Example: 5 Mbps limit, 6 Mbps global arrival rate
Case 2: Above the global limit, drop with probability Excess / Global arrival rate = (6 − 5) / 6 = 1/6
Same drop probability at all limiters
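The forwarding decision above fits in a few lines. A sketch, where global_arrival_bps stands for each limiter's current estimate of the aggregate arrival rate learned from its peers:

```python
import random

def grd_forward(global_limit_bps, global_arrival_bps):
    """Global Random Drop: below the limit, always forward (Case 1); above
    it, drop with probability excess / global arrival rate (Case 2) --
    e.g. (6 - 5) / 6 = 1/6 in the example above, identical at every limiter."""
    if global_arrival_bps <= global_limit_bps:
        return True
    drop_prob = (global_arrival_bps - global_limit_bps) / global_arrival_bps
    return random.random() >= drop_prob
```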

Page 15: Managing Cloud Resources:  Distributed Rate Limiting

GRD baseline performance

[Figure: flow-rate plots comparing the single token bucket (10 TCP flows) against the distributed limiters (3 and 7 TCP flows)]

Delivers flow behavior similar to a central limiter

Page 16: Managing Cloud Resources:  Distributed Rate Limiting

GRD under dynamic arrivals (50-ms estimate interval)

Page 17: Managing Cloud Resources:  Distributed Rate Limiting

Returning to our baseline

Limiter 1: 3 TCP flows (S → D)
Limiter 2: 7 TCP flows (S → D)

Page 18: Managing Cloud Resources:  Distributed Rate Limiting

Basic idea: flow counting

Goal: Provide inter-flow fairness for TCP flows
Limiter 1 reports “3 flows”, Limiter 2 reports “7 flows”
Local token-bucket enforcement

Page 19: Managing Cloud Resources:  Distributed Rate Limiting

Estimating TCP demand

Two sources, 1 TCP flow each
Local token rate (limit) = 10 Mbps
Flow A = 5 Mbps, Flow B = 5 Mbps
Flow count = 2 flows

Page 20: Managing Cloud Resources:  Distributed Rate Limiting

FPS under dynamic arrivals (500-ms estimate interval)

Page 21: Managing Cloud Resources:  Distributed Rate Limiting

Comparing FPS to GRD

GRD (50-ms estimate interval) vs. FPS (500-ms estimate interval)
Both are responsive and provide similar utilization
GRD requires accurate estimates of the global rate at all limiters

Page 22: Managing Cloud Resources:  Distributed Rate Limiting

Estimating skewed demand

Limiter 1: two sources, 1 TCP flow each (S → D)
Limiter 2: 3 TCP flows (S → D)

Page 23: Managing Cloud Resources:  Distributed Rate Limiting

Estimating skewed demand

Local token rate (limit) = 10 Mbps
Flow A = 8 Mbps
Flow B = 2 Mbps (bottlenecked elsewhere)
Flow count ≠ demand
Key insight: Use a TCP flow’s rate to infer demand

Page 24: Managing Cloud Resources:  Distributed Rate Limiting

Estimating skewed demand

Local token rate (limit) = 10 Mbps
Flow A = 8 Mbps
Flow B = 2 Mbps (bottlenecked elsewhere)

Local Limit / Largest Flow’s Rate = 10 / 8 = 1.25 flows
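A sketch of that demand estimate in Python. It covers only the case shown on the slide, where the largest flow is not consuming the whole local limit (flows bottlenecked elsewhere); the full FPS estimator also handles the saturated case, which is not shown here.

```python
def fps_flow_weight(local_limit_mbps, flow_rates_mbps):
    """Infer demand ("flow count") from observed TCP flow rates rather than
    counting flows: local limit / largest flow's rate."""
    largest = max(flow_rates_mbps)
    return local_limit_mbps / largest

# Slide's example: 10 Mbps local limit, flows at 8 and 2 Mbps -> 1.25 "flows"
print(fps_flow_weight(10, [8, 2]))   # 1.25
```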

Page 25: Managing Cloud Resources:  Distributed Rate Limiting

FPS example

Global limit = 10 Mbps
Limiter 1: 1.25 flows; Limiter 2: 3 flows

Set local token rate = Global limit × local flow count / Total flow count
                     = 10 Mbps × 1.25 / (1.25 + 3)
                     = 2.94 Mbps
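The allocation step follows directly from the formula above; a small sketch reproducing the slide’s numbers:

```python
def fps_local_rate(global_limit_mbps, local_flow_count, remote_flow_counts):
    """FPS: set the local token rate to the global limit scaled by this
    limiter's share of the total (estimated) flow count."""
    total = local_flow_count + sum(remote_flow_counts)
    return global_limit_mbps * local_flow_count / total

# Limiter 1 has 1.25 flows, Limiter 2 has 3 flows, under a 10 Mbps global limit
print(round(fps_local_rate(10, 1.25, [3]), 2))   # 2.94 Mbps
```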

Page 26: Managing Cloud Resources:  Distributed Rate Limiting

FPS bottleneck example

Initially a 3:7 split between 10 un-bottlenecked flows
At 25 s, the 7-flow aggregate is bottlenecked to 2 Mbps
At 45 s, an un-bottlenecked flow arrives: a 3:1 split of the remaining 8 Mbps

Page 27: Managing Cloud Resources:  Distributed Rate Limiting

Real-world constraints

Resources spent tracking usage are pure overhead
Efficient implementation (<3% CPU, sample & hold)
Modest communication budget (<1% bandwidth)

Control channel is slow and lossy
Need to extend gossip protocols to tolerate loss
An interesting research problem on its own…

The nodes themselves may fail or partition
In an asynchronous system, you cannot tell the difference
Need a mechanism that deals gracefully with both

Page 28: Managing Cloud Resources:  Distributed Rate Limiting

Robust control communication

7 limiters enforcing a 10 Mbps limit
Demand fluctuates every 5 seconds between 1 and 100 flows
Varying loss on the control channel

Page 29: Managing Cloud Resources:  Distributed Rate Limiting

Handling partitions

Failsafe operation: each disconnected group of k of the n limiters enforces k/n of the limit
Ideally: Bank-o-mat problem (credit/debit scheme)
Challenge: group membership with asymmetric partitions
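A sketch of the failsafe fallback only (the credit/debit scheme is not shown): when the control channel partitions, a group that can still reach k of the n limiters enforces k/n of the global limit, so the aggregate never exceeds the limit.

```python
def failsafe_limit(global_limit_mbps, reachable_limiters, total_limiters):
    """Partition failsafe: a disconnected group of k limiters falls back to
    enforcing k/n of the global limit."""
    return global_limit_mbps * reachable_limiters / total_limiters

# A group of 3 out of 7 limiters enforces 3/7 of a 10 Mbps limit
print(round(failsafe_limit(10, 3, 7), 2))   # 4.29 Mbps
```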

Page 30: Managing Cloud Resources:  Distributed Rate Limiting

Following PlanetLab demand

Apache Web servers on 10 PlanetLab nodes
5 Mbps aggregate limit
Shift load over time from 10 nodes to 4

Page 31: Managing Cloud Resources:  Distributed Rate Limiting

Current limiting options

[Figure: demands at 10 Apache servers on PlanetLab; when demand shifts to just 4 nodes, static per-node limits leave wasted capacity]

Page 32: Managing Cloud Resources:  Distributed Rate Limiting

Applying FPS on PlanetLab

Page 33: Managing Cloud Resources:  Distributed Rate Limiting

Hierarchical limiting

Page 34: Managing Cloud Resources:  Distributed Rate Limiting

A sample use case

T = 0: A: 5 flows at L1
T = 55: A: 5 flows at L2
T = 110: B: 5 flows at L1
T = 165: B: 5 flows at L2

Page 35: Managing Cloud Resources:  Distributed Rate Limiting

Worldwide flow join

8 nodes split between UCSD and Polish Telecom
5 Mbps aggregate limit
A new flow arrives at each limiter every 10 seconds

Page 36: Managing Cloud Resources:  Distributed Rate Limiting

Worldwide demand shift

Same demand-shift experiment as before
At 50 seconds, Polish Telecom demand disappears
It reappears at 90 seconds

Page 37: Managing Cloud Resources:  Distributed Rate Limiting

Where to go from here

Need to “let go” of full control and make decisions with only a “cloudy” view of actual resource consumption
Distinguish between what you know and what you don’t know
Operate efficiently when you know you know; have failsafe options when you know you don’t

Moreover, we cannot rely upon application/service designers to understand their resource demands
The system needs to dynamically adjust to shifts
We’ve started to manage the demand equation
We’re now focusing on the supply side: custom-tailored resource provisioning
