Data centre networking at London School of Economics and Political Science - Networkshop44


Protocol Hamburger

Data centre networking at LSE

Matt Bernstein, LSE

New DCI requirements

• Encrypt everything
• VLANs
• Availability: 100%?
• Bandwidth: lots?
• Latency: none?
• Scale
  – 1600 VMs?
  – “everything in Azure”?

[Diagram: the encrypted datacentre interconnect. A “magic router” in London and a “magic router” in Slough are joined by an encrypted tunnel over Janet, extending the same 171 VLANs to both sites. Each site keeps its local DC VLANs alongside the shared DC VLANs; the campus/halls zones and the Internet attach on the London side.]

Janet offerings for Slough

• high-capacity, low-latency IPv4/IPv6 network
• L2 (“JPWS”), but 802.1q tagged on the L3 link
  – unless you have Ciena light-path kit on campus
  – even JPWS relies on Janet routing protocols
• 9000-byte MTU (see the sketch below)
  – 9192 bytes on interface, but 9000 in protocols
  – GÉANT run 9000, Janet not minded to change
• no out-of-band access
  – tenants have all done their own thing
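On a Junos box that interface-versus-protocol distinction is two separate knobs. A minimal sketch, assuming the Janet-facing port is a 10G interface like the xe-1/0/4 ports in the diagrams later on (the 9192/9000 values are from the slide):

# physical interface MTU: what the Janet service presents on the wire
set interfaces xe-1/0/4 mtu 9192
# protocol (family) MTU: what the routing protocols and path-MTU logic use
set interfaces xe-1/0/4 unit 0 family inet mtu 9000
set interfaces xe-1/0/4 unit 0 family inet6 mtu 9000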

LSE #SDC

sloudc-ban1sloudc-ban2

londpg-ban1londtw-ban1

londpg-sbr1londtw-sbr1 londic-rbr2londsh-rbr2

LSE #1

LSE #2

High level diagram LSE, SDC and London Region

J6 Core J6 Core

“Just Say No”: problems with L2

• Broadcast domains / fault domains
• Routing across two locations harder
• Spanning Tree
• No hop count for simple loop detection
• MAC address limit of switching hardware
• Mixed MTU on the same segment

BUM traffic

• “Normal” switches flood BUM frames
• So do most “L2VPN” technologies
  – VPLS / EoMPLS (including JPWS)
  – VXLAN…
• Exacerbated by virtual servers – every VLAN on every port means every BUM frame spammed to every hypervisor

Technology selection

• Cisco
  – vs Juniper (vs HP vs Arista vs …)
• OTV
  – vs EVPN (vs EVI vs VXLAN vs …)
• Nexus L2 fabric
  – vs Virtual Chassis Fabric (…)
• FEX blades
  – vs HP PassThru blades
• DACs, DACBOs
• FCoE?

Juniper selected

• Metafabric reference architecture
  – MX240 routers
  – SRX5400 firewalls
  – QFX5100 switches
• EVPN for L2 DCI, bypassing firewalls
• Enough grunt to last more than five years
• Can do so much more than just VLANs

[Diagram: MX routers “bacon” (158.143.220.253, 2001:630:9:f220::2) and “onion” (158.143.220.254, 2001:630:9:f220::1). Each faces Janet on xe-1/0/4 (146.97.129.42/30 with 2001:630:0:9001::2a/126, and 146.97.129.46/30 with 2001:630:0:9001::2e/126). The two MXes interconnect on ae0 (xe-1/0/{5,6}; 158.143.220.1/30 and .2/30, 2001:630:9:f220::1:1/126 and ::1:2/126) and carry Layer 2 on ae1 (xe-1/0/{0,1}). irb.482 (xe-1/0/{2,3}; 158.143.220.5/30 and 158.143.220.9/30) connects to the SRX cluster sdc-ban1/sdc-ban2 (node0/node1) on reth0 (xe-2/2/6, xe-5/2/7; 158.143.220.6/30) and reth1 (xe-2/2/7, xe-5/2/6; 158.143.220.10/30), with reth2 (xe-{2,5}/2/{0,1,8,9}) carrying the routed VLANs.]
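As an aside on the reth interfaces above: on an SRX chassis cluster a reth is a redundant Ethernet bundle with one member link per node. A minimal sketch for reth0, using the member ports and address from the diagram (the redundancy-group numbering is assumed):

set chassis cluster reth-count 3
# one member per cluster node
set interfaces xe-2/2/6 gigether-options redundant-parent reth0
set interfaces xe-5/2/7 gigether-options redundant-parent reth0
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 158.143.220.6/30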

Traffic flow through Janet

Ethernet VPN

• (MP-)BGP control plane for MAC addresses
  – FabricPath and VCF are both IS-IS for MAC addresses

mb@press> show route table bgp.evpn.0 evpn-mac-address 00:50:56:91:3e:ca

bgp.evpn.0: 882 destinations, 882 routes (882 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2:158.143.220.253:1::1133::00:50:56:91:3e:ca/304
                   *[BGP/170] 1w4d 07:39:40, localpref 100, from 158.143.220.253
                      AS path: I, validation-state: unverified
                    > via gr-1/3/0.2, label-switched-path press-to-bacon
                      to 158.143.221.0 via ae0.0, label-switched-path press-to-bacon
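For context, the configuration behind routes like that one is fairly compact. A minimal sketch, assuming an IBGP session between loopbacks and reusing the instance name, VLAN and neighbour visible in the outputs; the route-distinguisher address, route-target value and bridge-domain naming are invented for illustration:

set protocols bgp group dci type internal
set protocols bgp group dci family evpn signaling
# far-end loopback (bacon), as seen in the route above
set protocols bgp group dci neighbor 158.143.220.253
set routing-instances DATACENTRE-EVPN1 instance-type virtual-switch
# RD convention loopback:1 (this router's loopback; address illustrative)
set routing-instances DATACENTRE-EVPN1 route-distinguisher 192.0.2.1:1
# route-target value invented for illustration
set routing-instances DATACENTRE-EVPN1 vrf-target target:64512:1
set routing-instances DATACENTRE-EVPN1 protocols evpn extended-vlan-list 1133
set routing-instances DATACENTRE-EVPN1 bridge-domains bd1133 vlan-id 1133
# ae1 is the L2-facing LAG in the diagram; unit numbering assumed
set routing-instances DATACENTRE-EVPN1 bridge-domains bd1133 interface ae1.1133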

Ethernet VPN

• MPLS forwarding plane
  – using RSVP for fast convergence
  – MPLS first to be standardised [RFC7432]; VXLAN becoming increasingly popular

mb@press> show evpn instance extensive | match "VLAN|1133"
  VLAN  VNI   Intfs / up  IRB intf  Mode      MAC sync  IM route label
  1133  None  1       1             Extended  Enabled   371904

mb@press> show route label 371904

mpls.0: 705 destinations, 877 routes (705 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

371904             *[EVPN/7] 4w5d 12:31:51, routing-instance DATACENTRE-EVPN1, route-type Ingress-IM, vlan-id 1133
                      to table DATACENTRE-EVPN1.evpn-mac.0
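The label-switched path named in these outputs is plain RSVP-TE signalled over the GRE tunnel. A minimal sketch of the plumbing, assuming the far end of the LSP is bacon's loopback (158.143.220.253) and that OSPF provides loopback reachability inside the overlay:

# OSPF runs inside the tunnel to carry loopback reachability
set protocols ospf area 0.0.0.0 interface gr-1/3/0.2
set protocols ospf area 0.0.0.0 interface lo0.0 passive
# RSVP and MPLS are enabled on the tunnel interface
set protocols rsvp interface gr-1/3/0.2
set protocols mpls interface gr-1/3/0.2
# the LSP that EVPN traffic rides, as seen in the route output
set protocols mpls label-switched-path press-to-bacon to 158.143.220.253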

Our stack (top to bottom)

L2
EVPN (BGP)
RSVP
MPLS
OSPF
GRE
IPsec
BGP
IPv6
L2

• 158-byte hit per packet
  – more than 10% overhead for 1500-byte frames
  – less than 2% overhead for 9000-byte frames

set interfaces xe-1/0/4.0 family inet6 mtu 9000
set services ipsec-vpn rule X term 1 then tunnel-mtu 8910
set interfaces ms-1/2/0 mtu 8910
set interfaces gr-1/3/0 mtu 8886
set interfaces ae1 mtu 8842
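Walking back through those values accounts for the 158 bytes. The arithmetic is exact; the attribution of each delta to a layer is my reading of the config rather than anything stated here:

9000 - 8910 =  90 bytes   IPsec (ESP) encapsulation
8910 - 8886 =  24 bytes   GRE (20-byte IPv4 header + 4-byte GRE header)
8886 - 8842 =  44 bytes   EVPN/MPLS labels and inner Ethernet framing
total       = 158 bytes   the per-packet hit quoted above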

It Works

~ 3Gb/s throughput (IMIX, 1500-byte MTU)

~ 9Gb/s throughput (single TCP stream, 9000-byte MTU)

Latency (RTTs from London MX to Slough MX):
Raw: 2.7ms (small packets) / 4.8ms (8000-byte)
Burgered: 3.3ms (small packets) / 5.9ms (8000-byte)

~ 1-2s to re-converge in the event of a single failure

The Bad News

• we found some new bugs in Junos
• routing protocols within Janet are a SPoF for our DCI
• layering is not as strict as I would like (too much in inet.0)
• we're not yet running any L3 on the DC networks in Slough
  – partly time constraints, partly a few glitches
• the existing firewall adds another 1ms to the RTT, if crossing subnets
  – round-trip between two VMs on different VLANs in Slough is 7ms
  – web servers making lots of DB queries to render a web page are slow

What might we have done differently?

• EVPN is now available on the QFX5100 switches
  – with a VXLAN forwarding plane (see the sketch below)
• OTV is simpler to configure, less bleeding edge
  – but even Cisco seem not to be releasing new OTV hardware (ASR1k and N7k are both old, and expensive)
• EVPN/VXLAN appearing on platforms like Juniper MX, Cisco ASR9k, Cisco N9k, Arista
  – all three vendors have VMs for testing
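For comparison, an EVPN instance with a VXLAN forwarding plane on a QFX5100 looks something like this sketch; only the platform support is from the slide, and the VNI, route-distinguisher and route-target values are invented for illustration:

set protocols bgp group overlay type internal
set protocols bgp group overlay family evpn signaling
# VXLAN replaces MPLS as the forwarding plane
set protocols evpn encapsulation vxlan
set protocols evpn extended-vni-list 11133
set switch-options vtep-source-interface lo0.0
set switch-options route-distinguisher 192.0.2.2:1
set switch-options vrf-target target:64512:2
# map a VLAN to its VNI (VNI value illustrative; VLAN 1133 from the EVPN slides)
set vlans v1133 vlan-id 1133
set vlans v1133 vxlan vni 11133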

Questions

jisc.ac.uk

Thank you

Matt Bernstein, LSE
m.bernstein@lse.ac.uk