43
Should a load-balancer choose the path as well as the server? Nikhil Handigol Stanford University Joint work with Nick McKeown and Ramesh Johari

Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Embed Size (px)

Citation preview

Page 1: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Should a load-balancer choose the path

as well as the server?

Nikhil HandigolStanford University

Joint work with Nick McKeown and Ramesh Johari

Page 2: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Datacenter

Wide-area Enterprise

Page 3: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Can’t choose path :’(

LOAD-BALANCER

Client

Servers

Page 4: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Outline and goals

A new architecture for distributed load-balancing joint (server, path) selection

Demonstrate a nation-wide prototype

Interesting preliminary results

Page 5: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

I’m here to ask for your help!

Page 6: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Data Path (Hardware)

Control Path

OpenFlow

OpenFlow Controller

OpenFlow Protocol (SSL)

Control Path

Page 7: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Custom Hardware

Custom Hardware

Custom Hardware

Custom Hardware

Custom Hardware

OS

OS

OS

OS

OS

Network OS

Feature

Feature

Software Defined Networking

Feature Feature

Feature Feature

Feature Feature

Feature Feature

Feature Feature

7

Page 8: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Load Balancing is just

Smart Routing

Page 9: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Custom Hardware

Custom Hardware

Custom Hardware

Custom Hardware

Custom Hardware

Network OS

Load-balancing logic

Load-balancing as a network primitive

Load-balancing decision

Load-balancing decision

Load-balancing decision

Load-balancing decision

Load-balancing decision

9

Page 10: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Aster*x Controller

Page 11: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers
Page 12: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

http://www.openflow.org/videos

Page 13: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

So far…

A new architecture for distributed load-balancing joint (server, path) selection

Aster*x – a nation-wide prototype

Promising results that joint (server, path) selection might have great benefits

Page 14: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

What next?

Page 15: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

How big is the pie?

Characterizing and quantifying the performance of joint (server, path) selection

Page 16: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Load-balancing Controller

MININET-RT

Page 17: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Load-balancing Controller

Page 18: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Clients CDNISP

Model

Page 19: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

ParametersTopology

Intra-AS topologies

BRITE (2000 topologies)

CAIDA (1000 topologies)

Rocketfuel (~100 topos.)

20-50 nodesUniform link capacity

Page 20: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

ParametersServers

5-10 serversRandom placement

ServiceSimple HTTP serviceServing 1 MB fileAdditional server-side computation

Page 21: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

ParametersClients

3-5 client locationsRandom placement

Request patternPoisson process

Mean rate: 5-10 req/sec

Page 22: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Load-balancing strategies?

Page 23: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Simple but suboptimal

Complex but optimal

Design space

Disjoint-Shortest-Path

Joint

Disjoint-Traffic-Engineering

Page 24: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Anatomy of a request-response

Client Load-Balancer ServerR

esp

onse

Tim

e

Deliver

Retrieve

Choose

Request

Response 1st byte

Response last byte

Last byte ack

Page 25: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Disjoint-Shortest-Path

CDN selects the least loaded server

Load = retrieve + deliver

ISP independently selects the shortest path

Page 26: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Disjoint-Traffic-Engineering

CDN selects the least loaded server

Load = retrieve + deliver

ISP independently selects path to minimize max load

Max bandwidth headroom

Page 27: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Joint

Single controller jointly selects the best (server, path) pair

Total latency = retrieve + estimated deliver

Page 28: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Disjoint-Shortest-Path vs Joint

Disjoint-Shortest-Path performs ~2x worse than Joint

Page 29: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Disjoint-Traffic-Engg. vs Joint

Disjoint-Traffic-Engineering performs almost as well as Joint

Page 30: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Is Disjoint truly disjoint?Client Load-Balancer Server

Resp

onse

Tim

e

Deliver

Retrieve

Choose

Request

Response 1st byte

Response last byte

Last byte ack

Server response time contains network information

Page 31: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

The bottleneck effect

A single bottleneck resource along the path determines the performance.

Page 32: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Clients CDNISP

The CDN-ISP game

Page 33: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

The CDN-ISP game

System load monotonically decreases

Both push system in the same direction

Page 34: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Summary of observations

Disjoint-SP is ~2x worse than Joint

Disjoint-TE performs almost as well as Joint (despite decoupling of server selection

and traffic engineering)

Game theoretic analysis supports the empirical observation

Page 35: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

How we could collaborate

Netflix video - ~30% Internet traffic

Important to efficiently utilize the available resources

I want to apply my research work to Netflix’s service “How can we jointly optimize (server, path) selection to

achieve near-optimal performance?”

How can we work together on this?

Page 36: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Can you share video streaming data? How can I model the “Netflix network”? Topology? B/W? Where is the bottleneck? Servers? Network? Where are the servers located? How many? What is the client request pattern? What is the video stream size distribution? Duration?

Bandwidth? How do you choose a server for a given request? How do you choose a path for a given request?

Questions – Video Streaming

Page 37: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Can you share video streaming data? Cost structure – What is the cost

model? Why do you outsource video streaming to CDNs?

How do you deal with non-streaming part of the service (UI)?

Questions – Video Streaming

Page 38: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Questions – AWS Deployment

Can we work together to characterize the AWS deployment? E.g., Size of deployment, incoming request pattern, inter-VM traffic

Are there web-level SLAs? Does AWS pose challenges?

What are the scaling bottlenecks? CPU? Network? Other?

Page 39: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Let’s chat more!

Page 40: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Conclusion

A new architecture for distributed load-balancing joint (server, path) selection

Aster*x - a nation-wide prototype

Interesting preliminary results

Future – application to streaming media services

Page 41: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Extra slides…

Page 42: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Questions – AWS Deployment

Can you share Netflix AWS deployment data? How many VMs? What size? What is the service structure? How many tiers of

services? Do you have any SLAs to meet? Any problems there? Would joint VM placement + routing help? What is the avg. NIC/CPU utilization on the VMs? Is the network ever a bottleneck? Do you do any MapReduce-style computation?

Page 43: Datacenter Wide-areaEnterprise LOAD-BALANCER Client Servers

Sample topologies

BRITE CAIDA