Sharing the Datacenter Network - Seawall
Alan Shieh (Cornell University), Srikanth Kandula, Albert Greenberg, Changhoon Kim, Bikas Saha (Microsoft Research, Azure, Bing)
Presented by WANG Ting
Ability to multiplex is a key driver for the datacenter business
- Diverse applications, jobs, and tenants share common infrastructure
- The de-facto way to share the network is congestion control at flow granularity (TCP)
Problem: Performance interference
Monopolize the shared resource:
- Use many TCP flows
- Use more aggressive variants of TCP
- Do not react to congestion (UDP)
Denial-of-service attack on a VM or rack:
- Place a malicious VM on the same machine (rack) as the victim
- Flood traffic to that VM
(Figure: normal traffic vs. a malicious or selfish tenant)
Problem: Hard to achieve cluster objectives
Even with well-behaved applications, there is no good way to:
- Allocate disjoint resources coherently: a Reduce slot != a Map slot due to differing numbers of flows
- Adapt the allocation as needed: boost a task that is holding back its job due to congestion
Goal: Decouple network allocation from the application's traffic profile
- Datacenters give us the freedom to do this
Requirements
- Provide a simple, flexible service interface for tenants
  - Support any protocol or traffic pattern
  - Need not specify bandwidth requirements
- Scale to datacenter workloads
  - O(10^5) VMs and tasks, O(10^4) tenants
  - O(10^5) new tasks per minute, O(10^3) deployments per day
- Use the network efficiently (e.g., work conserving)
- Operate with commodity network devices
Existing mechanisms are insufficient
- In-network queuing and rate limiting (rate limit < x Mbps): not scalable; slow and cumbersome to reconfigure switches
- End-host rate limits (< x Mbps): do not provide end-to-end protection; wasteful in the common case
- Reservations: hard to specify; overhead; wasteful in the common case
Basic ideas in Seawall
- Leverage congestion control loops to adapt the network allocation
  - Utilizes the network efficiently
  - Can control allocations based on policy
  - Needs no central coordination
- Implemented in the hypervisor to enforce policy
  - Isolated from tenant code
  - Avoids the scalability, churn, and reconfiguration limitations of hardware
Weights: Simple, flexible service model
- Every VM is associated with a weight; Seawall allocates bandwidth share in proportion to weight
- Example small VM: CPU = 1 core, Memory = 1 GB, Network weight = 1
- Weights enable high-level policies:
  - Performance isolation
  - Differentiated provisioning models
  - Increase the priority of stragglers
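As a toy illustration (not code from the talk), weighted proportional sharing of a single link can be sketched as weight divided by the sum of active weights, times the link capacity:

```python
def weighted_shares(capacity_mbps, weights):
    """Split a link's capacity among active VMs in proportion to their weights.

    weights: dict mapping VM name -> network weight (illustrative names).
    """
    total = sum(weights.values())
    return {vm: capacity_mbps * w / total for vm, w in weights.items()}

# Two VMs with equal weight each get half of a 1 Gb/s link.
shares = weighted_shares(1000, {"vm_a": 1, "vm_b": 1})
```

A VM with weight 3 sharing the link with a weight-1 VM would get 750 Mbps under this model, regardless of how many flows or destinations either VM uses.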
Components of Seawall
To control the network usage of endpoints:
- Shims on the forwarding paths at the sender and receiver (in the hypervisor)
- One tunnel per VM <source, destination> pair
- Periodic congestion feedback (% lost, ECN marked, ...), once every 50 ms
- A rate controller adapts the allowed rate on each tunnel
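A minimal sketch of the receiver-to-sender feedback described above; the field names and thresholds are illustrative assumptions, not the paper's wire format:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """Per-tunnel congestion report sent by the receiver shim every 50 ms."""
    tunnel_id: int
    bytes_received: int
    loss_fraction: float   # fraction of packets lost in the interval
    ecn_fraction: float    # fraction of packets ECN-marked in the interval

def is_congested(fb, loss_thresh=0.01, ecn_thresh=0.05):
    """Sender-side check: treat the tunnel's path as congested if either
    the loss rate or the ECN-mark rate is high (thresholds are made up)."""
    return fb.loss_fraction > loss_thresh or fb.ecn_fraction > ecn_thresh
```

The sender's rate controller would consume one such report per tunnel per interval and feed the congestion bit into its control loop.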
Path-oriented congestion control is not enough
- TCP (path-oriented congestion control): with equal weights, a sender that opens more tunnels captures a larger share of the link (figure: 75% vs. 25%); the effective share increases with the number of tunnels
- Seawall (link-oriented congestion control): the split stays at 50% / 50%; no change in effective weight
Seawall = Link-oriented congestion control
- Builds on standard congestion control loops (AIMD, CUBIC, DCTCP, MulTCP, MPAT, ...), run in rate-limit mode
- Extends the congestion control loops to accept a weight parameter
- Allocates bandwidth according to per-link weighted fair share
- Works on commodity hardware
Will show that the combination achieves our goal.
For every source VM:
1. Run a separate distributed control loop instance (e.g., AIMD) for every active link, to generate a per-link rate limit
2. Convert the per-link rate limits to per-tunnel rate limits (greedy + exponential smoothing)
(Figure: two weight-1 VMs converging to a 50% / 50% split of the bottleneck link)
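A hedged sketch of the two steps above. The constants and the demand-proportional conversion are assumptions for illustration; the paper's actual control law and smoothing details differ:

```python
def aimd_step(rate_mbps, weight, congested, incr=1.0, decr=0.5):
    """Weighted AIMD in rate-limit mode: additive increase scaled by the
    VM's weight, multiplicative decrease on congestion (constants made up)."""
    if congested:
        return rate_mbps * decr
    return rate_mbps + incr * weight

def per_tunnel_limits(link_rate_mbps, tunnel_demands):
    """Greedily divide one link's rate limit among the tunnels crossing it,
    here in proportion to each tunnel's recent demand (illustrative policy)."""
    total = sum(tunnel_demands.values()) or 1.0
    return {t: link_rate_mbps * d / total for t, d in tunnel_demands.items()}
```

Because the loop's increase step is scaled by weight, two competing loops converge to rates in proportion to their weights, independent of how many tunnels each VM opens.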
Achieving the link-oriented control loop
1. How to map paths to links? It is easy to get the topology in the datacenter; changes are rare and easy to disseminate
2. How to obtain link-level congestion feedback? Such feedback requires switch modifications that are not yet available, so use path-congestion feedback instead (e.g., ECN, losses)
Implementation
- Userspace rate controller
- Kernel datapath shim (NDIS filter)
- Prototype runs on the Microsoft Hyper-V root partition and native Windows
Achieving line-rate performance
How to add a congestion control header to packets?
- Naïve approach: use encapsulation, but this poses problems: more code in the shim; breaks hardware optimizations that depend on the header format
- Bit-stealing: reuse redundant/predictable parts of existing headers (other protocols might need paravirtualization)
(Figure: reused header fields. The IP ID field and the TCP timestamp option (kind 0x08, length 0x0a, TSval, TSecr) carry the Seawall sequence number and packet count; constant and unused bits are stolen.)
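An illustrative sketch of the bit-stealing idea: pack a small sequence number into the 16-bit IP ID field and recover the full value at the receiver. This encoding is an assumption for illustration, not the paper's actual layout:

```python
def steal_ip_id(seq):
    """Pack the low 16 bits of a Seawall-style sequence number into
    the IP ID field (which is 16 bits wide)."""
    return seq & 0xFFFF

def recover_seq(ip_id, last_seq):
    """Reconstruct the full sequence number from the 16-bit value,
    assuming fewer than 2**16 packets arrive between observations."""
    base = last_seq & ~0xFFFF
    candidate = base | ip_id
    if candidate < last_seq:  # the 16-bit counter wrapped around
        candidate += 1 << 16
    return candidate
```

The same idea extends to the TCP timestamp option, whose TSval/TSecr fields offer more predictable bits to reuse.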
Evaluation
1. Evaluate performance
2. Examine protection in the presence of malicious nodes
Testbed: Xeon L5520 2.26 GHz (4-core Nehalem), 1 Gb/s access links; IaaS model: entities = VMs
Performance
- Minimal overhead beyond a null NDIS filter (metrics = CPU, memory, throughput)
- Measured at the sender
Protection against DoS/selfish traffic
Strategy: UDP flood (red) vs. TCP (blue). Equal weights, so the ideal share is 50/50.
The UDP flood is contained (figure values: 1000 Mbps and 1.5 Mbps without protection; about 430 Mbps with Seawall).
Protection against DoS/selfish traffic
Strategy: open many TCP connections.
With Seawall, the attacker sees little increase in share with the number of flows.
Protection against DoS/selfish traffic
Strategy: open connections to many destinations.
With Seawall, the allocation sees little change with the number of destinations.
Related work
- (Datacenter) transport protocols: DCTCP, ICTCP, XCP, CUBIC
- Network sharing systems: SecondNet, Gatekeeper, CloudPolice
- NIC- and switch-based allocation mechanisms: WFQ, DRR, MPLS, VLANs
- Industry efforts to improve network/vswitch integration
- Congestion Manager
Conclusion
- Shared datacenter networks are vulnerable to selfish, compromised, and malicious tenants
- Seawall uses hypervisor rate limiters + an end-to-end rate controller to provide performance isolation while achieving high performance and efficient network utilization
- We develop link-oriented congestion control: use parameterized control loops and compose congestion feedback from many destinations
Thank You!