52
VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Embed Size (px)

Citation preview

Page 1: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2: A Scalable and Flexible data Center Network

CS53810/23/2014

Presentation by: Soteris DemetriouScribe: Cansu Erdogan

Page 2: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Credits

• Some of the slides were used in their original form or adjusted from – Assistant Professor’s Hakim Weatherspoon

(Cornell)

• Those slides are annotated with a * on the top right hand side of the slide.

Page 3: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Paper Details

• Title– VL2: A Scalable and Flexible Data Center Network

• Authors – Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth

Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Paveen Patel, Sudipta Sengupta

– Microsoft Research• Venue

– Proceedings of the ACM SIGCOMM 2009 conference on Data communication

• Citations– 918

Page 4: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Overview

• Problem: Conventional Data Center Networks do not provide agility. I.e assigning any service to any server efficiently is challenging.

• Approach: Merge layer 2 and layer 3 into a virtual layer 2 (VL2). How?– Use of flat addressing to provide Layer-2 semantics, Valiant

Load Balancing for uniform high capacity between servers and TCP to ensure performance isolation.

• Findings: VL2 can provide uniform high capacity between any 2 servers, performance isolation between services and agility through layer-2 semantics

Page 5: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2• Evaluation• Summary and Discussion

Page 6: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2• Evaluation• Summary and Discussion

Page 7: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Clos Topology

• Multistage circuit switching network• Usage: when the physical network exceeds the

capacity of the largest crossbar switch (MxN)

Page 8: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Clos TopologyIngress Stage Middle Stage Egress Stage

Page 9: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Clos Topology

n

r

m

Page 10: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Traffic Matrix

• Traffic rate between a node and every other node– E.g network with N nodes– Each node N connects to every other node

• NxN flows NxN representative matrix

• Valid: a valid Traffic Matrix is one that ensures no node is oversubscribed

Page 11: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Valiant Load Balancing

• Keslassy et al. proved that uniform Valiant load-balancing is the unique architecture which requires the minimum node capacity in interconnecting a set of identical nodes

• Zhang-Shen et al used it to design a predictable network backbone

• Intuition: It is much easier to estimate the aggregate traffic entering and leaving a node than to estimate a complete traffic matrix (traffic rate from every node to every other node)

• A valid traffic matrix approach where an ingress to egress path is used, requires link capacity = node capacity (= r)

• VLB load balances traffic among any two-hop paths– Link capacity among any 2 nodes: r/N + r/N = r * 2/N

Zhang-Shen, Rui, and Nick McKeown. "Designing a predictable Internet backbone network." HotNets, 2004.

Page 12: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Valiant Load Balancing

Backbone network

Access Network

Point of Presence

Zhang-Shen, Rui, and Nick McKeown. "Designing a predictable Internet backbone network." HotNets, 2004.

Page 13: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2• Evaluation• Summary and Discussion

Page 14: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Conventional Data Center Network Architecture

*

Page 15: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

DCN Problems

• Static network assignment• Fragmentation of resource

• Poor server to server connectivity• Traffics affects each other• Poor reliability and utilization

CR CR

AR AR AR AR

SS

SS

A AA …

SS

A AA …

. . .

SS

SS

A AA …

SS

A AA …

I want moreI have spare ones,

but…1:5

1:80

1:240

*

Page 16: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

End Result

16

The Illusion of a Huge L2 Switch

1. L2 semantics

2. Uniform high capacity

3. Performance isolation

A AA … A AA …

. . .

A AA … A AA …

CR CR

AR AR AR AR

SS

SS SS

SS

SS SS

AAAA AAAA AAAA A A A A AA A AA AA AA

. . .

*

Page 17: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Objectives

• Uniform high capacity:– Maximum rate of server to server traffic flow should be limited

only by capacity on network cards– Assigning servers to service should be independent of network

topology

• Performance isolation:– Traffic of one service should not be affected by traffic of other

services

• Layer-2 semantics:– Easily assign any server to any service– Configure server with whatever IP address the service expects– VM keeps the same IP address even after migration

*

Page 18: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2• Evaluation• Summary and Discussion

Page 19: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Methodology

• Setting design objectives– Interview stakeholders to derive a set of

objectives• Deriving typical workloads

– Measurement study on traffic patterns• Data-center traffic analysis• Flow distribution analysis• Traffic matrix analysis• Failure characteristics

Page 20: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Measurement Study

• 2 main questions– Who sends how much data, to whom and when– How often does the state of the network change

due to changes in demand, or switch/link failures and recoveries

• Studied production data centers of a large cloud provider

Page 21: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Data-Center Traffic Analysis

• Setting– Instrumentation of a highly utilized cluster in a

data-center– Cluster of 15,000 nodes– Center supports data mining on PB of data– Servers are distributed roughly evenly among 75

ToR (Top of Rack) switches which are connected hierarchically

Page 22: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Data-Center Traffic Analysis

• Ratio of traffic volume between servers in the data-center with traffic entering/leaving the data-center is 4:1

• Bandwidth demand between servers inside a data-center shows grows faster than bandwidth demand to external hosts

• Network is the bottleneck of computation

Page 23: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Flow Distribution Analysis

• Majority of flows are small (few KB) in par with Internet flows– Why?

• Mostly hellos and meta-data requests to the distributed file system

– Almost all bytes (>90%) are transported in flows of 100MB to 1GB size

• Mode is around 100MB– Why?

» The distributed file system breaks long files into 100-MB chunks

– Flows over a few GB are rear

Page 24: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Flow Distribution Analysis

• 2 modes– >50% of the time, an average machine has ~10

concurrent flows– At least 5% of the time it has >80 concurrent flows

• Implies that randomizing path selection at flow granularity will not cause perpetual congestion in case of unlucky placement of flows

Page 25: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Traffic Matrix Analysis

• Poor summarizibility of traffic patterns– Even when approximating with 50-60 clusters,

fitting error remains high (60%)– Engineering for just a few traffic matrices is

unlikely to work well for “real” traffic in data centers

• Instability of traffic patterns– Traffic pattern shows no periodicity that can be

exploited for prediction

Page 26: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Failure Characteristics 1/2

• Failure definition– The event that occurs when a system or component is

unable to perform its required function for more than 30s• Most failures are small in size

– 50% of network failures involve < 4 devices– 95% of network failures involve < 20 devices

• Downtimes can be significant– 95% are resolved in 10 min– 98% in < 1 hour– 99.6% in < 1 day– 0.09% last > 10 days

Page 27: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Failure Characteristics 2/2

• In 0.3% of failures all redundant components in a network device group became unavailable

• Main causes of downtimes– Network misconfigurations– Firmware bugs– Faulty components

• No obvious way to eliminate all failures from the top of the hierarchy

Page 28: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2 • Evaluation• Summary and Discussion

Page 29: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Objectives• Methodology: Interviews with architects, developers and operators

• Uniform high capacity:– Maximum rate of server to server traffic flow should be limited only by

capacity on network cards– Assigning servers to service should be independent of network topology

• Performance isolation:– Traffic of one service should not be affected by traffic of other services

• Layer-2 semantics:– Easily assign any server to any service– Configure server with whatever IP address the service expects– VM keeps the same IP address even after migration

*

Page 30: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Design Overview

Layer-2 semantics

Performance Isolation

Uniform high-capacity

Flat Addressing

Enforce hose model using

existing mechanisms

Guarantee bandwidth for

hose-model traffic

Directory System (resolution service)

Name – Location Separation

TCP

Valiant Load Balancing

Scale-out CLOS topology

Solution Approach Objective

Page 31: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Design Overview

• Randomizing to cope with unpredictability and volatility– Valiant Load Balancing:

• Destination independent traffic spreading across multiple intermediate nodes

• Clos topology is used to support randomization• Proposal of a flow spreading mechanism

Page 32: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Design Overview

• Building on proven technologies• VL2 is based on IP routing and forwarding

technologies that are available in commodity switches

• Link state routing– Maintains switch-level topology– Does not disseminate end hosts’ info

• Equal-Cost Multi-Path forwarding with anycast addresses

– Enables VLB with minimal control plane messaging

Page 33: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Design Overview

• Separating names from locators– Enables agility– rapid VM migration– Use of

• Application addresses (AA)• Location Addresses (LA)• Directory System for name resolution

Page 34: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2 Components

• Scale-out CLOS topology• Addressing and Routing – VLB• Directory System

Page 35: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Scale-out topology

. . .

. . .

TOR

20 Servers

Int

. . . . . . . . .

Aggr

. . . . . . . . . . .

VL2CLOS

•Bipartite graph •Graceful degradation of bandwidth if an IS fails

*

Page 36: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Scale-out topology

• Clos very suitable for VLB– By indirectly forwarding traffic through an IS at the

top, the network can provide bandwidth guarantees for any traffic matrices subject to the hose model

• Routing is simple and resilient– Need a random path up to an IS and a random

path down to a destination ToR

Page 37: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2 Addressing and Routing: name-location separation

payloadToR3

. . . . . .

yx

Servers use flat names

Switches run link-state routing and maintain only switch-level topology

y zpayloadToR4 z

ToR2 ToR4ToR1 ToR3

y, zpayloadToR3 z

. . .

DirectoryService

…x ToR2

y ToR3

z ToR4

Lookup &Response

…x ToR2

y ToR3

z ToR3

• Allows usage of low-cost switches• Protects network and hosts from host-state churn

VL2

*

Page 38: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2 Addressing and Routing: VLB indirection

x y

payloadT3 y

z

payloadT5 z

IANYIANYIANY

IANY

Links used for up pathsLinks usedfor down paths

T1 T2 T3 T4 T5 T6

[ ECMP + IP Anycast ]• Harness huge bisection bandwidth• Avoid esoteric traffic engineering or optimization• Ensure robustness to failures• Work with switch mechanisms available today

1. Must spread traffic2. Must ensure dst independence

Equal Cost Multi Path Forwarding

*

Page 39: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2 Directory System

• Three key functions– Lookups– Updates

• AA to LA mapping

– Reactive cache update• For latency sensitive updates• E.g VM during migration

• Goals– Scalability– Reliability for updates– High lookup performance– Eventual consistency (like ARP)

Page 40: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

VL2 Directory System

RSM

DS

RSM

DS

RSM

DS

Agent

. . .

Agent

. . .. . . DirectoryServers

RSMServers

2. Reply2. Reply1. Lookup

“Lookup”

5. Ack

2. Set 4. Ack(6. Disseminate)

3. Replicate

1. Update

“Update”

Page 41: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2 • Evaluation• Summary and Discussion

Page 42: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Evaluation• Uniform high capacity:

– All-to-all data shuffle traffic matrix:– 75 servers – Each delivers 500MB to all others (for a total of 2.7TB shuffle from

memory to memory)– VL2 completes the shuffle in 395s– Aggregate goodput: 58.8Gbps

– 10x times better than their current data center network– Maximal achievable goodput over all flows is 62.3Gbps– VL2 network efficiency as 58.8/62.3 = 94%

Page 43: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Evaluation• VLB Fairness:

– 75 node testbed– Traffic characteristics as per measurement study– All flows pass through the Aggregation switches sufficient to check there

for the split ratio among the links to the Intermediate switches– Plot Jain’s fairness index for traffics to intermediate switches

– VLB split ratio fairness index, averages >0.98% for all ASsTime (s)

0 100 200 300 400 500

1.00

0.98

0.96

0.94Fair

nes

s In

dex

Aggr1 Aggr2 Aggr3

Page 44: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Evaluation• Performance isolation:

– Added two types of services to the network:• Service one: 18 servers do single TCP transfer to another server, starting at time

0 and lasting throughout the experiment• Service two:

• Start one server at 60s and assigns a new server every 2s for a total of 19servers

• Each one starts a 8GB transfer over TCP as soon as it starts up (every 2 seconds)

No perceptible change in Service 1, as servers start up on Service 2

Page 45: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Evaluation• Performance isolation (cnt’d):

– To evaluate how mice flows (large number of short TCP connections) – common in DC – affect performance on other services:

– Service 2:– Servers create successively more bursts of short TCP connections (1 to 20

KB)

No perceptible change in Service 1

• TCP’s natural enforcement of the hose model is sufficient to provide performance isolation when combined with VLB and no oversubscription

Page 46: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Evaluation• Convergence after link failures

– 75 servers– All-to-all data shuffle– Disconnect links between intermediate and aggregation switches

– Maximum capacity of the network, degrades gracefully – Restoration is delayed– VL2 fully uses a link, roughly 50s after it is restored– Restoration does not interfere with traffic and the aggregate throughput

eventually returns to its initial level.

Page 47: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Outline

• Background• Motivation• Measurement Study• VL2 • Evaluation• Summary and Discussion

Page 48: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

That’s a lot to take in! Take Aways please!

Problem: Over-subscription in data-centers and lack of agility

Page 49: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Overall ApproachMeasurement Study

Stakeholder InterviewsDatacenter workload

measurements

Design

Objectives ArchitectureApplication of

known techniques when possible

Evaluation

Testbed: includes all design components

Evaluation with respect to objectives

Page 50: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Design Overview

Layer-2 semantics

Performance Isolation

Uniform high-capacity

Flat Addressing

Enforce hose model using

existing mechanisms

Guarantee bandwidth for

hose-model traffic

Directory System (resolution service)

Name – Location Separation

TCP

Valiant Load Balancing

Scale-out CLOS topology

Solution Approach Objective

Page 51: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

End Result

52

The Illusion of a Huge L2 Switch

1. L2 semantics

2. Uniform high capacity

3. Performance isolation

A AA … A AA …

. . .

A AA … A AA …

CR CR

AR AR AR AR

SS

SS SS

SS

SS SS

AAAA AAAA AAAA A A A A AA A AA AA AA

. . .

*

Page 52: VL2: A Scalable and Flexible data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Discussion

• What is the impact of VL2 on Data-Center power consumption?

• Security– The Directory System can perform access control

• What are the challenges with that?

– Other issues?• Flat vs Hierarchical addressing• VL2 has issues with large flows. How can we

address that challenge?