VXLAN Distributed Service Node

David Lapsley (@devlaps), Chet Burgess (@cfbIV), Kahou Lei (@kahou82)

May 20, 2015

OpenStack Vancouver Summit


Traditional Data Centers

Virtualization in the data center has changed network requirements:
• Number of end hosts
• Number of networks
• Bandwidth requirements

This is a problem for traditional data center networks:
• L2 access with L3 aggregation
• Wasted capacity: STP blocks ports to prevent loops
• VLAN exhaustion: only 4K VLANs with the 12-bit 802.1Q VLAN ID
• ToR scalability: hardware tables need to scale with endpoints

L3

L3 to the edge can help
• L3 is scalable
• Well known and supported
• Equal-Cost Multi-Path (ECMP) routing
  • Each link active at all times

VXLAN

How do we scope tenants/projects?
• MAC-over-UDP/IP overlay
• Reuses the existing IP core (L3 ECMP, no STP)
• Reduces pressure on ToR L2 tables
• Supports 16M+ virtual networks (versus 4K VLANs)
• Maintains L2 bridging semantics
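The jump from roughly 4K VLANs to 16M+ segments follows from the width of the identifier field. A quick check, assuming the 12-bit 802.1Q VLAN ID and the 24-bit VXLAN Network Identifier (the variable names are illustrative only):

```python
# Identifier space of 802.1Q VLANs versus VXLAN segments.
VLAN_ID_BITS = 12  # width of the 802.1Q VLAN ID field
VNI_BITS = 24      # width of the VXLAN Network Identifier field

vlan_ids = 2 ** VLAN_ID_BITS
vnis = 2 ** VNI_BITS

print(f"802.1Q VLAN IDs: {vlan_ids}")
print(f"VXLAN VNIs:      {vnis}")
print(f"Ratio:           {vnis // vlan_ids}x")
```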

VXLAN Encapsulation

VXLAN Components

• Virtual Network Identifier (VNI)
  • 24 bits: 16+ million
• VXLAN Tunnel End Point (VTEP)
  • Encapsulation, decapsulation
  • Listens on UDP port 4789 (IANA) or 8472 (Linux default) for incoming VXLAN packets
  • VNI to VTEP IP mapping

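VXLAN Example Deployment

[Figure: Hypervisor 1 and Hypervisor 2 connected through their eth0 interfaces over an L3 network. Each hypervisor runs two tenant bridges, br100 and br101, each attached to its own VTEP (vxlan100 and vxlan101). VMs labeled VM1 and VM2 attach to the bridges on Hypervisor 1; VM3 and VM4 attach to the bridges on Hypervisor 2. VXLAN 100 links the br100 bridges and VXLAN 101 links the br101 bridges.]

[Figure: VXLAN packet format]
Original frame: DMAC | SMAC | 802.1Q | EType | Payload | CRC
Encapsulated frame: Outer MAC | Outer IP | Outer UDP | VXLAN | Payload | CRC
VXLAN header: VXLAN Flags (8 bits) | Reserved (24 bits) | VXLAN Network Identifier (24 bits) | Reserved (8 bits)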
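The VXLAN header fields listed above (8 flag bits, 24 reserved bits, a 24-bit VNI, 8 reserved bits) fit in two 32-bit words. The following is a minimal sketch of packing and unpacking that header, assuming the RFC 7348 layout in which flag value 0x08 marks the VNI as valid. The function names are made up for illustration and are not taken from the VDSN code.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_header(data: bytes) -> tuple:
    """Return (flags, vni) read from the first 8 bytes of a VXLAN datagram."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for a VXLAN header")
    word1, word2 = struct.unpack("!II", data[:8])
    return word1 >> 24, word2 >> 8


header = pack_vxlan_header(100)
print(header.hex())
print(unpack_vxlan_header(header))
```

The encapsulated Ethernet frame would follow these 8 bytes inside the outer UDP payload.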

Broadcast, Unknown and Multicast Packets

• Broadcast, unknown-unicast, and multicast packets (e.g. ARP, DHCP) are flooded to all VTEPs for the given VNI
• Two mechanisms are used:
  • Multicast
    • Multicast address and VNI configured for each VXLAN segment
    • VTEP sends IGMP join/leave as VMs spin up/down
    • Broadcast domain implemented using multicast
  • Service Node
    • Use a “central” service node to maintain the mapping of VNIs to VTEP IPs
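To make the three flooded categories concrete, here is a small sketch that classifies a frame by its destination MAC address. It assumes the usual Ethernet convention that the least-significant bit of the first octet marks a group (multicast) address, and it models the forwarding table as a plain set of known MAC addresses. `classify_frame` and `known_macs` are hypothetical names, not part of any tool mentioned in the talk.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def classify_frame(dst_mac: str, known_macs: set) -> str:
    """Classify a frame as broadcast, multicast, unknown, or known unicast.

    known_macs stands in for the forwarding table and is expected to hold
    lowercase MAC strings.
    """
    dst = dst_mac.lower()
    if dst == BROADCAST_MAC:
        return "broadcast"
    # Group bit: least-significant bit of the first octet.
    if int(dst.split(":")[0], 16) & 0x01:
        return "multicast"
    if dst not in known_macs:
        return "unknown"
    return "known"


def needs_flooding(dst_mac: str, known_macs: set) -> bool:
    """Everything except known unicast is flooded to the VTEPs of the VNI."""
    return classify_frame(dst_mac, known_macs) != "known"
```

For example, an ARP request addressed to ff:ff:ff:ff:ff:ff is classified as broadcast and therefore flooded.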

Service Node

[Figure: the example deployment with per-VTEP addresses. Hypervisor 1 hosts vxlan100 (1.1.1.1) and vxlan101 (3.3.3.3); Hypervisor 2 hosts vxlan100 (2.2.2.2) and vxlan101 (4.4.4.4). Both hypervisors reach the L3 network through eth0 and carry VXLAN 100 and VXLAN 101 between their tenant bridges br100 and br101.]

Service node table:

VNI    Remote VTEPs
100    1.1.1.1, 2.2.2.2
101    3.3.3.3, 4.4.4.4
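The service node's VNI to VTEP mapping can be sketched as a dictionary lookup. The table values below come from the slide. Skipping the VTEP that sent the packet is an assumption made here to avoid echoing traffic back to its source, and `flood_targets` is a hypothetical helper name.

```python
# VNI -> VTEP IPs, as listed in the service node table.
VNI_TO_VTEPS = {
    100: ["1.1.1.1", "2.2.2.2"],
    101: ["3.3.3.3", "4.4.4.4"],
}


def flood_targets(vni: int, source_vtep: str, table: dict = VNI_TO_VTEPS) -> list:
    """Return the VTEPs that should receive a flooded packet for this VNI.

    Assumption: the originating VTEP is excluded from the result.
    Unknown VNIs yield an empty list.
    """
    return [ip for ip in table.get(vni, []) if ip != source_vtep]


print(flood_targets(100, "1.1.1.1"))
```

With only two VTEPs per VNI the result is a single address; the same lookup applies unchanged when a VNI spans many hypervisors.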

Central Service Node

Distributed Service Node

VXLAN Distributed Service Node

Design

[Figure: Controller 1, Controller 2 and Controller 3 sit on the L3 network alongside the hypervisors, with three mcrouter/memcache pairs shown, matching the three controllers. Hypervisor 1 through Hypervisor 500 each run tenant bridges br100 and br101 with VMs VM1 and VM2, VTEPs vxlan100 and vxlan101, an eth0 uplink, and a Distributed VXLAN Service Node.]


VXLAN Distributed Service Node

• Multi-threaded Python program (multiprocessing module)
• Runs on every hypervisor
• Shares state using a distributed cache
  • FB Mcrouter: memcached protocol router (5B requests/second at peak!)
• Listens for new VTEP registrations
  • Forwards new mappings to the distributed cache
• Listens for broadcast, unknown, and multicast packets
  • Floods to all VTEPs in the virtual network
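One way to keep a per-VNI set of VTEPs in a memcached-style cache is to append entries to a single key and de-duplicate on read. The sketch below illustrates that idea only. `FakeCache` is an in-memory stand-in that mimics memcached's `add` (store only if the key is absent) and `append` (extend only if the key exists), and the `vni:<id>` key format is invented for this example. The slides do not state how VDSN encodes its mappings.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def add(self, key: str, value: bytes) -> bool:
        # Like memcached "add": store only when the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def append(self, key: str, value: bytes) -> bool:
        # Like memcached "append": extend only when the key already exists.
        if key not in self._data:
            return False
        self._data[key] += value
        return True

    def get(self, key: str):
        return self._data.get(key)


def vni_key(vni: int) -> str:
    return f"vni:{vni}"  # assumed key format


def register_vtep(cache, vni: int, vtep_ip: str) -> None:
    """Record that vtep_ip participates in the given VNI."""
    entry = f"{vtep_ip} ".encode()
    if not cache.add(vni_key(vni), entry):
        cache.append(vni_key(vni), entry)


def lookup_vteps(cache, vni: int) -> list:
    """Return the registered VTEPs for a VNI, without duplicates, in first-seen order."""
    raw = cache.get(vni_key(vni))
    if raw is None:
        return []
    return list(dict.fromkeys(raw.decode().split()))


cache = FakeCache()
register_vtep(cache, 100, "1.1.1.1")
register_vtep(cache, 100, "2.2.2.2")
print(lookup_vteps(cache, 100))
```

A real deployment would also have to handle removals and key eviction, which this sketch leaves out.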

Configuring VXLAN

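Creating VXLAN interfaces

# Create VXLAN device vxlan1 (VNI 1) on eth0; frames whose destination is not in the forwarding database go to 169.254.1.1
ip link add vxlan1 type vxlan id 1 remote 169.254.1.1 dev eth0
# Assign the overlay IP address to the VXLAN device
ip addr add 172.16.1.1 dev vxlan1
# Lower the MTU to leave room for the VXLAN encapsulation overhead
ip link set dev vxlan1 mtu 1450
# Bring the device up
ip link set dev vxlan1 up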
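The `mtu 1450` value can be derived from the encapsulation overhead. The sketch below assumes a 1500-byte MTU on eth0, an IPv4 underlay, and an inner frame without an 802.1Q tag; none of these are stated on the slide.

```python
# Bytes added in front of the tenant's IP packet by VXLAN encapsulation.
UNDERLAY_MTU = 1500          # assumed MTU of eth0
OUTER_IPV4_HEADER = 20
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14   # MAC header of the encapsulated frame, untagged

overhead = (OUTER_IPV4_HEADER + OUTER_UDP_HEADER
            + VXLAN_HEADER + INNER_ETHERNET_HEADER)
vxlan_mtu = UNDERLAY_MTU - overhead

print(f"Encapsulation overhead: {overhead} bytes")
print(f"MTU for vxlan1:         {vxlan_mtu} bytes")
```

The outer Ethernet header is not counted because the interface MTU already excludes it.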

Configured VXLAN Interface

root@mhv2:~# ip addr show vxlan1
4: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether f2:af:3f:62:cf:65 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.5/24 scope global vxlan1
       valid_lft forever preferred_lft forever
    inet6 fe80::f0af:3fff:fe62:cf65/64 scope link
       valid_lft forever preferred_lft forever

The @cfbIV rule

iptables -t nat -A OUTPUT -d 169.254.1.1/32 -p udp -m udp --dport 8472 -j DNAT --to-destination 127.0.0.1:8473
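The rule rewrites locally generated UDP packets addressed to 169.254.1.1:8472 so that they arrive at 127.0.0.1:8473 instead, which is presumably where the local service node listens. A minimal sketch of such a listener, assuming the redirected datagrams start with the standard 8-byte VXLAN header, is shown below. `open_listener` and `receive_vxlan` are hypothetical names, and the code is not the VDSN implementation.

```python
import socket
import struct

SERVICE_NODE_ADDR = ("127.0.0.1", 8473)  # DNAT target taken from the rule


def open_listener(addr=SERVICE_NODE_ADDR, timeout=5.0):
    """Open a UDP socket bound to the address the DNAT rule redirects to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # avoid blocking forever if nothing arrives
    sock.bind(addr)
    return sock


def receive_vxlan(sock):
    """Receive one datagram and return (vni, inner_frame)."""
    data, _sender = sock.recvfrom(65535)
    if len(data) < 8:
        raise ValueError("datagram too short for a VXLAN header")
    # Second 32-bit word: 24-bit VNI followed by 8 reserved bits.
    _flags_word, vni_word = struct.unpack("!II", data[:8])
    return vni_word >> 8, data[8:]
```

A caller would create the socket once with `open_listener()` and call `receive_vxlan()` in a loop, handing each inner frame to whatever flooding logic the service node uses.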


Demo

Demo Setup

[Figure: the design diagram applied to the demo environment. Controller 1, Controller 2 and Controller 3 and the hypervisors (labeled Hypervisor 1 and Hypervisor 500) share an L3 network; host addresses shown are 192.168.225.226, 192.168.225.227, 192.168.225.228, 192.168.225.231 and 192.168.225.232. VTEP addresses shown are VTEP1 (172.16.1.4), VTEP1 (172.16.1.5), VTEP (172.16.3.4) and VTEP (172.16.3.6). The hypervisors run the VXLAN Distributed Service Node, and three mcrouter/memcache pairs are shown.]

Future work

• Open-source the VDSN source code
• Integration with Neutron (if there is community interest)
• Performance and scalability testing

References

• Presentation slides: http://bit.ly/vdsn-presentation
• VDSN source code and Ansible playbooks: http://bit.ly/vdsn-ansible
  • Simple, accessible model, horizontal scaling
  • VDSN code coming soon (@devlaps, #devlaps)
• Production code: http://bit.ly/multi-area-vxlan
  • Multi-area VXLAN! Highly optimized, requires expertise to configure/troubleshoot
• C. Burgess, N. Leake, L3 + VXLAN Made Practical, OpenStack Summit Spring 2014.
• M. Mahalingam et al., Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks, https://tools.ietf.org/html/rfc7348
• Sanjay K. Hooda, Shyam Kapadia, Padmanabhan Krishnan, Using TRILL, FabricPath, and VXLAN: Designing Massively Scalable Data Centers (MSDC) with Overlays, Cisco Press, 2014.
• Introducing McRouter, http://bit.ly/introducing-mcrouter
• McRouter on GitHub, https://github.com/facebook/mcrouter
• Pyroute2, https://pypi.python.org/pypi/pyroute2
• Maintaining a set in Memcached, http://bit.ly/memcache-sets
• Ansible, http://docs.ansible.com

Thank You

@devlaps