Managing Cloud Resources: Distributed Rate Limiting


Alex C. Snoeren, Kevin Webb, Bhanu Chandra Vattikonda, Barath Raghavan,
Kashi Vishwanath, Sriram Ramabhadran, and Kenneth Yocum

Building and Programming the Cloud Workshop, 13 January 2010

Centralized Internet services

Hosting with a single physical presence
However, clients are across the Internet

Cloud-based services

Resources and clients distributed across the world
Often incorporate resources from multiple providers (e.g., Windows Live)

Resources in the Cloud

Distributed resource consumption
Clients consume resources at multiple sites
Metered billing is the state of the art
Services are "punished" for popularity
» Those unable to pay are disconnected
No control over the resources used to serve increased demand
Overprovision and pray
Application designers typically cannot describe their needs
Individual service bottlenecks are varied but severe
» IOPS, network bandwidth, CPU, RAM, etc.
» Need a way to balance resource demand

Two lynchpins for success

Need a way to control and manage distributed resources as if they were centralized
» All current models from the OS scheduling and provisioning literature assume full knowledge and absolute control
» (This talk focuses specifically on network bandwidth)
Must be able to efficiently support rapidly evolving application demand
» Balance resource needs against the hardware realization automatically, without application-designer input
» (Another talk if you're interested)

Ideal: Emulate a single limiter

Make distributed feel centralized
Packets should experience the same limiter behavior
[Figure: flows from sources (S) to destinations (D) passing through a set of limiters, each path labeled 0 ms]

Engineering tradeoffs

Accuracy (how close to K Mbps is delivered; flow-rate fairness)
+ Responsiveness (how quickly demand shifts are accommodated)
vs.
Communication efficiency (how much and how often rate limiters must communicate)

An initial architecture

Each limiter runs the same loop: on an estimate-interval timer it estimates local demand, gossips that estimate to the other limiters, combines the gossiped estimates into a view of global demand, and sets a local allocation; the limit is enforced on each packet arrival.
[Figure: four limiters (Limiter 1-4) exchanging gossip messages]
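
As an illustration only (not the authors' implementation), a minimal Python sketch of one limiter's loop might look like this; the class, helper names, constants, and the proportional "set allocation" policy are all hypothetical, and the designs discussed on the following slides fill in that allocation step differently.

```python
ESTIMATE_INTERVAL = 0.5          # seconds (hypothetical value)
GLOBAL_LIMIT = 5 * 125_000       # 5 Mbps expressed in bytes/second


class Limiter:
    """One node of a distributed rate limiter (illustrative sketch only)."""

    def __init__(self, name, num_limiters):
        self.name = name
        self.local_bytes = 0                           # demand observed this interval
        self.peer_demands = {}                         # latest gossiped demand per peer
        self.allocation = GLOBAL_LIMIT / num_limiters  # start with an even split
        self.budget = self.allocation * ESTIMATE_INTERVAL

    def on_packet(self, length):
        """Packet arrival: record local demand and enforce the current allocation.
        (A real limiter would enforce with a token bucket, as on the next slide.)"""
        self.local_bytes += length
        if self.budget >= length:
            self.budget -= length
            return True                                # forward
        return False                                   # drop

    def receive_gossip(self, peer, demand):
        """Record a peer's most recent demand estimate (bytes/second)."""
        self.peer_demands[peer] = demand

    def estimate_tick(self, send_gossip):
        """Estimate-interval timer: estimate local demand, gossip it, update the
        global demand estimate, and reset the local allocation."""
        local_demand = self.local_bytes / ESTIMATE_INTERVAL
        self.local_bytes = 0
        send_gossip(self.name, local_demand)
        global_demand = local_demand + sum(self.peer_demands.values())
        if global_demand > 0:
            # Hypothetical policy for illustration: allocate in proportion to
            # measured demand; the schemes on later slides define this step.
            self.allocation = GLOBAL_LIMIT * local_demand / global_demand
        self.budget = self.allocation * ESTIMATE_INTERVAL
```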

Token bucket limiters

[Figure: a token bucket with fill rate K Mbps; an arriving packet is forwarded only if enough tokens are available]
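
As a point of reference, a token-bucket limiter can be sketched in a few lines of Python; the class and the example rate and burst values below are illustrative, not taken from the talk.

```python
import time


class TokenBucket:
    """Token bucket with fill rate `rate` (bytes/second) and capacity `burst` (bytes)."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def admit(self, length):
        """Forward a packet of `length` bytes if enough tokens have accumulated."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= length:
            self.tokens -= length
            return True      # forward
        return False         # drop


# Example (illustrative values): K = 5 Mbps fill rate, 15 KB burst
bucket = TokenBucket(rate=5 * 125_000, burst=15_000)
print(bucket.admit(1500))    # admit a 1500-byte packet
```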

A global token bucket (GTB)?

Idea: each limiter keeps a token bucket for the full global limit and gossips demand info (bytes/sec), so reported remote arrivals drain tokens just as local packets do.
[Figure: Limiter 1 and Limiter 2 exchanging demand info (bytes/sec)]

A baseline experiment

Limiter 1: 3 TCP flows (S → D)
Limiter 2: 7 TCP flows (S → D)
Reference: a single token bucket carrying all 10 TCP flows (S → D)

GTB performance

[Plots: flow rates under the single token bucket vs. the global token bucket, showing the 7-flow and 3-flow aggregates against the 10-flow reference]
Problem: GTB requires near-instantaneous arrival info

Take 2: Global Random Drop

Limiters send and collect global rate info from one another
Case 1: below the global limit (e.g., a 4 Mbps global arrival rate against a 5 Mbps limit), forward the packet

Global Random Drop (GRD)

Case 2: above the global limit (e.g., a 6 Mbps global arrival rate against a 5 Mbps limit), drop with probability
Excess / Global arrival rate = 1 Mbps / 6 Mbps = 1/6
The drop probability is the same at all limiters
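
A minimal sketch of the GRD drop decision, assuming each limiter already holds a gossiped estimate of the global arrival rate; the function and parameter names are hypothetical.

```python
import random


def grd_forward(global_rate, limit):
    """GRD drop decision: below the limit, always forward; above it, drop with
    probability excess / global arrival rate (identical at every limiter)."""
    if global_rate <= limit:
        return True
    drop_prob = (global_rate - limit) / global_rate
    return random.random() >= drop_prob


# Example from the slides: 5 Mbps limit, 6 Mbps global arrival rate,
# so every limiter drops each packet with probability 1/6.
forward = grd_forward(global_rate=6e6, limit=5e6)
```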

GRD baseline performance

[Plots: flow rates under GRD compared with the single token bucket, showing the 7-flow and 3-flow aggregates against the 10-flow reference]
Delivers flow behavior similar to a central limiter

GRD under dynamic arrivals

[Plot: GRD behavior as flows arrive and depart, 50-ms estimate interval]

Returning to our baseline

Limiter 1: 3 TCP flows (S → D)
Limiter 2: 7 TCP flows (S → D)

Basic idea: flow counting

Goal: provide inter-flow fairness for TCP flows
Count flows at each limiter ("3 flows" at Limiter 1, "7 flows" at Limiter 2)
Enforce the resulting share with local token buckets

Estimating TCP demand

Example: two sources, 1 TCP flow each, through one limiter
Local token rate (limit) = 10 Mbps
Flow A = 5 Mbps
Flow B = 5 Mbps
Flow count = 2 flows

FPS (Flow Proportional Share) under dynamic arrivals

[Plot: FPS behavior as flows arrive and depart, 500-ms estimate interval]

Comparing FPS to GRD

Both are responsive and provide similar utilization
GRD requires accurate estimates of the global rate at all limiters
[Plots: GRD with a 50-ms estimate interval vs. FPS with a 500-ms estimate interval]

Estimating skewed demand

Limiter 1: two sources, 1 TCP flow each (S → D)
Limiter 2: 3 TCP flows (S → D)

Estimating skewed demand

Key insight: use a TCP flow's rate to infer demand
Local token rate (limit) = 10 Mbps
Flow A = 8 Mbps
Flow B = 2 Mbps (bottlenecked elsewhere)
Flow count ≠ demand

Estimating skewed demand

Local token rate (limit) = 10 Mbps
Flow A = 8 Mbps
Flow B = 2 Mbps (bottlenecked elsewhere)
Local limit / Largest flow's rate = 10 / 8 = 1.25 flows

FPS example

Global limit = 10 Mbps
Limiter 1: 1.25 flows
Limiter 2: 3 flows
Set local token rate = Global limit × local flow count / Total flow count
= 10 Mbps × 1.25 / (1.25 + 3) = 2.94 Mbps
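
A small sketch, for illustration, of the FPS arithmetic from the last two slides; the function names are hypothetical, and the numbers reproduce the 1.25-flow estimate and the 2.94 Mbps local rate.

```python
def fps_flow_count(local_limit, largest_flow_rate):
    """Effective local flow count when the largest flow does not fill the
    local limit (some flows bottlenecked elsewhere): limit / largest rate."""
    return local_limit / largest_flow_rate


def fps_local_rate(global_limit, local_flows, total_flows):
    """Local token rate = global limit * local flow count / total flow count."""
    return global_limit * local_flows / total_flows


# Numbers from the slides: 10 Mbps local limit, largest flow at 8 Mbps
local_flows = fps_flow_count(10.0, 8.0)                    # 1.25 "flows"
rate = fps_local_rate(10.0, local_flows, local_flows + 3)  # Limiter 2 has 3 flows
print(local_flows, round(rate, 2))                         # 1.25 2.94 (Mbps)
```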

FPS bottleneck example

Initially a 3:7 split between 10 un-bottlenecked flows
At 25 s, the 7-flow aggregate is bottlenecked to 2 Mbps
At 45 s, an un-bottlenecked flow arrives: a 3:1 split of the remaining 8 Mbps

Real-world constraints

Resources spent tracking usage are pure overhead
» Efficient implementation (<3% CPU, sample and hold)
» Modest communication budget (<1% bandwidth)
The control channel is slow and lossy
» Need to extend gossip protocols to tolerate loss
» An interesting research problem on its own…
The nodes themselves may fail or partition
» In an asynchronous system, you cannot tell the difference
» Need a mechanism that deals gracefully with both

Robust control communication

7 limiters enforcing a 10 Mbps limit
Demand fluctuates every 5 seconds between 1 and 100 flows
Varying loss on the control channel

Handling partitions

Failsafe operation: each disconnected group of k limiters enforces a k/n share of the global limit
Ideally: the Bank-o-mat problem (a credit/debit scheme)
Challenge: group membership with asymmetric partitions
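
The failsafe rule is simple enough to state as code; this sketch only restates the k/n policy above, and the names and example numbers are illustrative.

```python
def failsafe_limit(global_limit_mbps, group_size, total_limiters):
    """On a partition, a disconnected group of k limiters falls back to
    enforcing k/n of the global limit."""
    return global_limit_mbps * group_size / total_limiters


# Example: 3 of 7 limiters are cut off while enforcing a 10 Mbps limit
print(failsafe_limit(10.0, 3, 7))   # ~4.29 Mbps for the disconnected group
```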

Following PlanetLab demand

Apache Web servers on 10 PlanetLab nodes
5-Mbps aggregate limit
Shift load over time from 10 nodes to 4

Current limiting options

[Plot: demands at 10 Apache servers on PlanetLab; demand shifts to just 4 nodes, leaving wasted capacity under current limiting options]

Applying FPS on PlanetLab

[Plot: the same demand-shift workload under FPS]

Hierarchical limiting

A sample use case

T = 0: A: 5 flows at L1
T = 55: A: 5 flows at L2
T = 110: B: 5 flows at L1
T = 165: B: 5 flows at L2

Worldwide flow join

8 nodes split between UCSD and Polish Telecom
5 Mbps aggregate limit
A new flow arrives at each limiter every 10 seconds

Worldwide demand shift

Same demand-shift experiment as before
At 50 seconds, the Polish Telecom demand disappears; it reappears at 90 seconds

Where to go from here

Need to "let go" of full control and make decisions with only a "cloudy" view of actual resource consumption
» Distinguish between what you know and what you don't know
» Operate efficiently when you know you know
» Have failsafe options when you know you don't
Moreover, we cannot rely on application/service designers to understand their resource demands
» The system needs to adjust dynamically to shifts
We've started to manage the demand side of the equation; we're now focusing on the supply side: custom-tailored resource provisioning
