97
Overlay Networks in the Datacenter Craig Johnson, Network Consulting Engineer CCIE #6965 – Datacenter, Storage, R&S @crajohnson

Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

  • Upload
    others

  • View
    7

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Overlay Networks in the Datacenter

Craig Johnson, Network Consulting Engineer

CCIE #6965 – Datacenter, Storage, R&S

@crajohnson

Page 2: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

© 2014 Cisco and/or its affiliates. All rights reserved.BRKDCT-2328 Cisco Public

Agenda

• Why overlays?

• What problems to they solve in the datacenter

• Evolving technologies in overlay networking

2

Page 3: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Problems in Network Design

• Simplified Workload Provisioning / Automation• Simplified deployment

• Fast Provisioning of Virtual Workload by consuming the L2 network from network pool

• Without changing the physical network

• Multitenant Scale• Provide Layer 2 networks for tenants

• Workload anywhere (Mobility/Reachability)• Optimally use server resources by placing the workload anywhere

• Yet provide Layer 2 connectivity to Workloads

3

Page 4: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Typical Data Center Design

Layer 2 benefits limited to a POD

POD B POD CPOD A

L3

L2

Page 5: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Possible Solution for End-to-End L2?

Just extend STP to the whole network (!?)

STP

L3

L2

Page 6: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Limitations of Traditional Layer 2

• Local problems have network-wide impact

• Tree topology provides limited bandwidth• vPC/Mlag can help a bit

• Tree topology introduces sub-optimal paths

• Flooding

• MAC address tables don’t scale

• Slow convergence

• Only 12 bit namespace for L2 domains

STP

L3

L2

Page 7: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

How network engineers want to build DC networks

Layer 3 all the way to the ToR

L3

L2

Page 8: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Advantages of a pure L3 network

• Extremely scalable

• Limited fault domain

• Localized BUM traffic

• Small MAC table sizes

• Optimized traffic flow

Page 9: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

FabricPath

Easy Configuration Plug-and-Play Flexible Provisioning

Stable and Scalable Multipathing (ECMP) Fast Convergence

Switching Routing

FabricPath

Cisco FabricPath Goal

FabricPath combines benefits of Layer 3 routing with simplicity of Layer 2 switching

Page 10: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Limitations of Fabricpath

• Cisco Proprietary• TRILL went nowhere

• Improved scale over STP, but still limited

• BUM traffic still a problem

• Egress routing & ingress tromboning

FabricPath

Page 11: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Why Overlays?

Page 12: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Why Overlays?

12

Flexible Overlay Virtual Network• Mobility – Track end-point attach at edges

• Scale – Reduce core state• Distribute and partition state to network edge

• Flexibility/Programmability• Reduced number of touch points

Robust Underlay/Fabric• High Capacity Resilient Fabric

• Intelligent Packet Handling

• Programmable & Manageable

Seek well integrated best in class Overlays and Underlays

Page 13: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Types of Overlay Service

13

Layer 2 Overlays

• Emulate a LAN segment

• Transport Ethernet Frames (IP and non-IP)

• Single subnet mobility (L2 domain)

• Exposure to open L2 flooding

• Useful in emulating physical topologies

Layer 3 Overlays

• Abstract IP based connectivity

• Transport IP Packets

• Full mobility regardless of subnets

• Contain network related failures (floods)

• Useful in abstracting connectivity and policy

Hybrid L2/L3 Overlays offer the best of both domains

Page 14: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Layer 2 Overlay Considerations

• Scale of the edge devices• L2 addresses in Ethernet (MACs) use a flat space

which cannot be summarized

• L2/L3 boundary scaling• Large L2 domains require a large capacity L3

gateway to handle large ARP and MAC tables at a frequent rate of refresh

• Multi-homing sites can induce loops in the network

• Flooding of L2 protocols, unknown unicasts and broadcast in general can propagate failures across the entire L2 domain

14

Page 15: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-homing in L2 OverlaysSource learning assumes single attached sites

But network overlays involve edge resiliency

Enhancements are required to address:

• Loop resolution

• Multi-pathing

• Broadcast/Multicast de-duplication

Two Approaches:

• Active-Standby (Data Plane or Control Plane)• One active device per VLAN (single attached site)• VLAN based load balancing

• Active-Active (Control Plane only)• One active device for multi-destination traffic• Intra-VLAN load balancing for unicast

15

• Loop resolution

• Multi-pathing

• Broadcast/Multicast de-duplication

Core

Core

Core

✗ ✗

Page 16: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Flooding in L2 OverlaysControl Plane Signalling eliminates the need for floods

16

Pre-set flood facility

MAC learning based on flooding

Flood L2 protocols and unknown unicast

Failure propagation

Fail Open

Suitable for small domains (failure scope)

No predetermined flood tree

MAC learning by control protocol

Contain Failures and L2 protocols

Rich information

Fail Closed

Better suited for broad scope

BA C D BA C D

L2

L3

DC-1

DC-2

Data Plane Learning Control Protocol

BA C D BA C D

L2

L3

DC-1

DC-2

Flooded L2 Overlays MAC Routing

Page 17: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

L2 Overlay Evolution

17

L2 L2Intra-DC

(Fabric)

Inter-DC (DCI)

Backbone Network

Fabric Path VXLAN

Underlay Control Plane IS-IS Any IP routing protocol

Overlay Control Plane Flood and Learn Flood and Learn BGP

Encapsulation MAC in MAC MAC in IP

Locator Access Switch-ID Access IP

VPLS OTV

Underlay Control Plane MPLS Any IP routing protocol

Overlay Control Plane Flood and Learn IS-IS

Encapsulation MAC in MPLS MAC in IP

Locator MPLS PE NV Edge IP

Page 18: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

L2 Overlay Flood/Learn Implementations

18

Fabric Path L2

Switch-IDs

1. Underlay Control Plane: IS-IS calculates all possible paths between switch-IDs (Locators)

2. IS-IS calculates a multicast distribution tree for floods

3. BUM traffic flooded over multicast tree

4. Locators for each host learnt by gleaning Floods

VXLAN L2

NVE-IPs

1. Underlay Control Plane: IP calculates all possible paths between NVE-IPs (Locators)

2. IP multicast distribution tree for floods

3. BUM traffic flooded over multicast tree

4. Locators for each host learnt by gleaning Floods

VPLS

1. Underlay Control Plane: MPLS calculates all possible LSPs between PEs

2. Pre-determined group of pseudo-wires for flooding

3. BUM traffic flooded over multicast tree

4. PEs for each host gleaned from Floods

MPLS

PE

PEPE

Page 19: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

L2 Overlay Control Plane Implementations

19

1. Underlay Control Plane: IP calculates all possible paths between Edge Devices (Locators)

2. Overlay Control Plane: IS-IS adjacencies amongst Edge Devices

3. Locators for each host advertised in IS-IS

4. No Floods, integrated multi-homing

OTV IP

ED

EDED

1. Underlay Control Plane: IMPLS calculates all possible LSPs between PEs

2. Overlay Control Plane: BGP adjacencies amongst Edge Devices

3. Locators for each host advertised in BGP

4. No Floods, integrated multi-homing

EVPN MPLS

PE

PEPE

Page 20: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Layer 3 Overlay Considerations

• Scale of the edge devices• Can be improved further by using an on-demand

pull model

• IP Mobility for subnet disaggregation• Members of a subnet may be distributed across

locations

• Any host anywhere

• Broadcast & Link-local multicast traffic to be handled as a special case

• Potentially without even learning MAC addresses

20

Page 21: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

L3 Overlay Evolution

Push Protocol Model

• IP/BGP MPLS VPNs are highly scalable today

• PE routers must:• Hold a large number of prefixes• Maintain multiple routing protocol

adjacencies

• Mobility and cloud will add pressure in terms of:

• Prefix granularity and volume• Increased number of PEs

Edge Device Scale

21

Pull Protocol (on-demand) Model

LISP deployments and footprint are increasing rapidly

On-demand caching models ease the requirements on the edge devices:– Only prefixes being utilized are cached

– No routing adjacencies are maintained

A pull model is expected to provide global scalability to enable pervasive cloud models

Page 22: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Seminal Idea: Location and Identity Separation

22

IP core

Device IPv4 or IPv6

Address Represents

Identity and Location

Traditional BehaviourLoc/ID “Overloaded” Semantic

10.1.0.1 When the Device Moves, It Gets

a New IPv4 or IPv6 Address for

Its New Identity and Location20.2.0.9

Device IPv4 or IPv6

Address Represents

Identity Only.

When the Device Moves, Keeps

Its IPv4 or IPv6 Address.

It Has the Same Identity

Overlay BehaviourLoc/ID “Split”

IP core

1.1.1.1

2.2.2.2

Only the Location Changes

10.1.0.1

10.1.0.1

Its Location Is Here!

Page 23: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

L3 Overlay Implementations

23

1. Underlay Control Plane: IP calculates all possible paths between Edge Devices (Locators)

2. Overlay Control Plane: All mappings registered with Mapping System by xTRs

3. xTRs “pull” mappings on demand

LISP

(pull)IP

xTR

xTRxTR

1. Underlay Control Plane: IMPLS calculates all possible LSPs between PEs

2. Overlay Control Plane: BGP adjacencies amongst PEs

3. Locators for each host pushed in BGP to all PEs

MPLS VPNs

(push)MPLS

PE

PEPE

Map System

Page 24: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Combined L2/L3 Overlays• Route all IP traffic including Intra-

subnet

• Bridge only: • Non-IP

• Broadcast

• Link-local multicast

• Assumption is that most traffic is IP

24

L2/L3

Fabric

Page 25: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Distributed Gateway Function in L3 Overlays

25

Traditional L2 - centralised L2/L3 boundary

• Always bridge, route only at an aggregation point

• Large amounts of state converge

• Scale problem for large# of L2 segments

• Traditional L2 and L2 overlays

L2/L3 fabric (or overlay)

• Always route (at the leaves), bridge when necessary

• Distribute and disaggregate necessary state

• Optimal scalability

• Enhanced forwarding and L3 overlays

App

OS

App

OS

Virtual Physical

L3 Boundary

L3 Boundary

App

OS

App

OS

Virtual Physical

L2/L3 Fabric

Page 26: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

IP Mobility with L3 Overlays• Granular location information (host

routes)• Allow subnet members to move anywhere

• Layer 2 semantics• ARP proxy

• Consistent default Gateway presence

• L3 at the Access• Access switch replies to all ARPs with the

same MAC address

• Host routing for all traffic within the fabric

• Summary prefix outside the fabric

26

L3 Fabric

Page 27: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Access

Aggregation

Core

L3

L2

H1

10.1.1.10/24H3

10.1.2.10/24H2

10.1.1.20/24

SVI IP AddressMAC: 0000.dead.beef

IP: 10.1.1.1

SVI IP AddressMAC:

0000.dead.beef

IP: 10.1.1.1

vSwitchvSwitch

L2/L3 Overlay First Hop Routing

• A leaf switch is assigned an IP address and a gateway MAC address for each locally defined subnet with a connected host IP address of the SVIs

• The same anycast IP address is assigned to all leaves supporting attached hosts in the same subnet

• The same gateway MAC address can be used across all subnets supported on all the leaves

Routing on the Leaf Nodes

27

Page 28: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

H1

10.1.1.10

1. H1 sends an ARP request for H2 –10.1.1.20

2. The ARP request is intercepted at the leaf L1 and punted to the Sup

3. A few options:

1. If L1 has a valid route to H2, L1 may ARP reply with its own G_MAC

2. If L1 has a MAC-IP binding for H2, L1 may ARP-reply on behalf of H2 with H2’s MAC

3. L1 may unicast the ARP reply to the leaf where H1 is attached

1

H1 ARP Cache

10.1.1.20 G_MAC

3

L2/L3 Overlays – ARP and Intra-subnet ForwardingARP Handling

H2

10.1.1.20

vSwitch

2 CPUAccess

Aggregation

Core

L1 RIB

10.1.1.20/32 NH L4_IP

L1 ARP TableL4_IP L4_MAC

28

Page 29: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Aggregation

Core

• If H1 generates a data packet destined to G_MAC, then a MAC re-write, TTL decrement and host IP forwarding takes place

• If H1 generates a data packet destined to H2_MAC, then overlay forwarding can be done without TTL decrement based on either H2_MAC or H2_IP depending on the overlay implementation.

L2/L3 Overlays – ARP and Intra-subnet ForwardingIP Forwarding within the Same Subnet

4

SMAC→ H1_MAC

DMAC→ G_MAC

SIP→ 10.1.1.10

DIP→ 10.1.1.20

H110.1.1.10

H1 ARP Cache

10.1.1.20 G_MAC

e1/1

H210.1.1.20

vSwitch

6SMAC→ L1_MAC

DMAC→ L4_MAC

SIP→ 10.1.1.10

DIP→ 10.1.1.20

SSID→ L1

DSID→ L4

SMAC→ G_MAC

DMAC→ H2_MAC

SIP→ 10.1.1.10

DIP→ 10.1.1.20

7

L4 RIB10.1.1.20/32 e1/1L1 RIB

10.1.1.20/32 NH L4_IP

L1 ARP TableL4_IP L4_MAC

5

29

Page 30: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Combined L2/L3 Overlay Service Implementations

30

Cisco Nexus Unified Fabric

30

L2

Switch-IDs

1. Underlay Control Plane: IS-IS calculates all possible paths between switch-IDs (Locators)

2. L2: Flood and Learn

3. L3: MP-BGP advertisement of host locations

4. Forward on L3 unless data is non-IP

Application Centric

Infrastructure

L2

NVE-IPs

1. Underlay Control Plane: IP calculates all possible paths between NVE-IPs (Locators)

2. Overlay Control Plane: COOP – Demand protocol

3. Register both IP and MACs for every host with COOP

4. Leaf nodes “pull” IP and/or MAC mappings on demand

5. Forward on L3 information unless data is non-IP

Page 31: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN/NVGRE

Page 32: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN Packet StructureEthernet in IP with a shim for scalable segmentation

32

Original L2 FrameVXLAN Header F C

S

Allows for 16M possible segmentsUDP 4789

Hash of the inner L2/L3/L4 headers of the original frame.

Enables entropy for ECMP Load balancing in the Network.

Src and Dst addresses of the VTEPs

Src VTEP MAC Address

Next-Hop MAC Address

50 Bytes of overhead

Large scale segmentation

Tunnel Entropy

Ethernet Payload

Page 33: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

NVGRE Frame Format

Outer

MAC

DA

Outer

MAC

SA

Outer

802.1Q

Outer IP

DA

Outer IP

SAGRE

VS ID (24 bits)Inner

MAC DA

InnerMA

C

SA

Optional

Inner 802.1Q

Original

Ethernet

PayloadCRC

Original Ethernet Frame

Allows for possible

16M segments

IP header, allowing transport

across any IP network

Transport VLAN

NVGRE is a simple MAC-in-GRE (MAC-in-IP)

33

Page 34: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Overlay Taxonomy

34

Overlay Control Plane

Encapsulation

Service = Virtual Network Instance (VNI)

Identifier = VN Identifier (VNID)

NVE = Network Virtualization Edge

VTEP = VXLAN Tunnel End-Point

Underlay Control Plane

Underlay Network

Hosts

(end-points)

Edge Devices (NVE)Edge Device (NVE)

VTEPs

Page 35: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN is an Overlay Encapsulation

35

Overlay Control Plane

Encapsulation

VXLAN

Data Plane Learning

Flood and Learn over a multidestinationdistribution tree joined by all edge devices

Protocol Learning

Advertise hosts in a protocol amongst edge devices

t

Page 36: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Data Plane LearningDedicated Multicast Distribution Tree per VNI

36

VTEP VTEP VTEP

PIM Join for Multicast Group 239.1.1.1

PIM Join for Multicast Group 239.1.1.1

PIM Join for Multicast Group 239.2.2.2

PIM Join for Multicast Group 239.2.2.2

WebVM

WebVM

DBVM

DBVM

Multicast-enabled Transport

Page 37: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Data Plane LearningLearning on Broadcast Source - ARP Request Example

37

VM 1 VM 3VM 2

VTEP 11.1.1.1

VTEP 33.3.3.3

VTEP 22.2.2.2

IP A GARP Req

MAC IP AddrVM 1 VTEP 1

MAC IP AddrVM 1 VTEP 1

ARP Req

IP A GARP Req

ARP Req ARP Req

Multicast-enabled Transport

Page 38: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Data Plane LearningLearning on Unicast Source - ARP Response Example

38

VM 1 VM 3VM 2

VTEP 11.1.1.1

VTEP 33.3.3.3

VTEP 22.2.2.2

ARP Resp

MAC IP AddrVM 2 VTEP 2

Multicast-enabled Transport

VTEP 2 VTEP 1ARP Resp

ARP Resp

MAC IP AddrVM 1 VTEP 1

Page 39: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Data Plane LearningSharing Multicast Groups across VNIs

39

WebVM

WebVM

DBVM

DBVM

VTEP 11.1.1.1

VTEP 33.3.3.3

VTEP 22.2.2.2

Blue VNI onGroup G

Purple VNI onGroup G

IP A GOrg Frame

IP A GOrg Frame

Page 40: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Destination is in another segment.Packet is routed to the new segment

VXLANORANGE VXLANBLUE

Ingress VXLAN packet on Orange segment

VXLAN Router

VXLAN L2 and L3 GatewaysConnecting VXLAN to the broader network

40

L2 Gateway: VXLAN to VLAN BridgingVXLANORANGE

Ingress VXLAN packet on Orange segment

Egress interface chosen (bridge may .1Q tag the packet)

VXLAN L2 Gateway

SVI

Egress interface chosen (bridge may .1Q tag the packet)

L3 Gateway: VXLAN to X Routing • VXLAN • VLAN VLAN100 VLAN200

N1KV N7K w/F3 LC Nexus 3K N5K/6K N9K CSR 1000V ASR1K/ASR9K

L2 Gateway VXGW on 1110 Yes Yes Yes Yes N/A Yes

L3 Gateway CSR 1000V Yes No Roadmap Roadmap Yes Yes

Page 41: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN Evolution

• Head-end replication enables unicast-only mode

• Control Plane provides dynamic VTEP discoveryMulticast Independent

• Workload MAC addresses learnt by VXLAN NVEs

• Advertise L2/L3 address-to-VTEP association information in a protocol

Protocol Learning prevents floods

• VXLAN HW Gateways to other encaps/networks

• VXLAN HW Gateway redundancy

• Enable hybrid overlaysExternal Connectivity

• VXLAN Routing

• Distributed IP GatewaysIP Services

41

Page 42: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Unicast-OnlyTransport

POD2

POD3

VTEP

VXLAN Unicast ModeHead-end replication

POD1 VXLAN Encap4

3 VTEP performs Head-End Replication

VTEP

VTEP

Overlay Neighbors

POD3 , IP C

POD2 , IP B

2VTEP has static configuration of Overlay Neighbors

BUM Frame

1

A host sends a L2 BUM* frame

IP A IP BBUM Frame

IP AIP B

IP C

IP A IP CBUM Frame

*Broadcast, Unknown Unicast or Multicast

5 Frames are unicast to the neighbors

Page 43: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN Evolution: Using a Control ProtocolVTEP Discovery

43

POD2

POD3

VTEP

POD1

BGP consolidates and propagates VTEP list for VNI

2

4 VTEP can perform Head-End Replication

VTEP

VTEP

Overlay Neighbors

POD3 , IP C

POD2 , IP B

3VTEP obtains list of VTEP neighbors for each VNI

1

VTEPs advertise their VNI membership in BGP

IP AIP B

IP C

BGP Route

Reflector

1

1

Page 44: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN Evolution: Using a Control ProtocolProtocol Learning

4444

POD2

POD3

VTEP

POD1

BGP propagates routes for the host to all other VTEPs

2

VTEP

VTEP

Overlay Forwarding Table

Host1 <MAC,IP> , VTEP IP AHost2 <MAC,IP> , VTEP IP B

3

1

VTEPs advertise host routes (IP+MAC) to local hosts

IP AIP B

IP C

BGP Route

Reflector

Overlay Forwarding Table

Host1 <MAC,IP> , VTEP IP A

3VTEPs obtain host routes for remote hosts and install in RIB/FIB

Page 45: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Host Route Distribution decoupled from the Underlay protocol

Use MP-BGP on the leaf nodes to distribute internal host/subnet routes and external reachability information

Route-Reflectors deployed for scaling purposes

iBGP Adjacencies

Access

AggregationRR RR

VXLAN EVPN Control PlaneHost and Subnet Route Distribution

45

Page 46: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

1. Host Attaches2. Attachment NVE advertises host’s MAC (+IP) through BGP RR3. Choice of encapsulation is also advertised

Access

AggregationRR RR

NLRI:Host MAC1, IP1 NVE IP 1VNI 5000

Ext.Community:Encapsulation: VXLAN, NVGRECost/Sequence

Host 1VLAN 10

VNI 5000

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP1 VXLAN 0

VXLAN EVPN Control PlaneHost Advertisement

46

Page 47: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

1. Host Moves to NVE32. NVE3 detects Host1 and advertises H1 with seq#13. NVE1 sees more recent route and withdraws its advertisement

Access

AggregationRR RR

NLRI:Host MAC1, IP1 NVE IP 3VNI 5000

Ext.Community:Encapsulation: VXLAN, NVGRECost/Sequence 1

Host 1VLAN 10

VNI 5000

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP1 VXLAN 0

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP3 VXLAN 1

VXLAN EVPN Control PlaneHost Moves

47

Page 48: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

1. Host Attaches2. Attachment NVE advertises host’s MAC (+IP) through BGP RR

Access

Aggregation

Core

RR RR

NLRI:Host MAC1, IP1 NVE IP 1VNI 5000

Ext.Community:Encapsulation: Overlay, NVGRECost/Sequence

Host 1VLAN 10

VNI 5000

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP1 Overlay 0

Overlay Control Plane IP Mobility ConsiderationsHost Advertisement

48

Page 49: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

1. Host Moves to NVE32. NVE3 detects Host1 and advertises H1 with higher sequence3. NVE1 sees more recent route and withdraws its advertisement

Access

Aggregation

Core

RR RR

NLRI:Host MAC1, IP3 NVE IP 1VNI 5000

Ext.Community:Encapsulation: Overlay, NVGRECost/Sequence 1

Host 1VLAN 10

VNI 5000

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP1 Overlay 0

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP3 Overlay 1

Overlay Control Plane IP Mobility considerationsHost Moves

49

Page 50: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

DC-Fabric: Normalized L2/L3 Network Overlays

VM

OS

VM

OS

VM

OS

VM

OS

NVGRE VXLAN

VXLAN

Terminate the encapsulation from the host overlay

Translate to a normalized encapsulation in the fabric

Seamlessly allow physical and virtual to connect to the fabric

Fabric overlay provides L2 and L3 services with mobility and segmentation

Physical

50

Page 51: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Evolution of the VXLAN Data PlaneBeyond Ethernet in IP GPE: Generic Protocol Encapsulation

51

Original L2 FrameVXLAN Header F C

S

Allows for 16M possible segments

UDP 4789

Hash of the inner L2/L3/L4 headers of the original frame.

Enables entropy for ECMP Load balancing in the Network.

Src and Dst addresses of the VTEPs

Src VTEP MAC Address

Next-Hop MAC Address

50 Bytes of overhead 24 bit Protocol Type field (previously

reserved)

Payload: • IP• Ethernet • other

Page 52: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Overlay Deployment Considerations

52

Page 53: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VTEP Redundancy in a VXLAN Fabric

53

L3 Fabric

L2 Gateway redundancy based on vPC (anycast VTEP address)

vMAC Emulated VTEPVM

OS

VM

OS

vPC provides MAC state synchronization

Redundant VTEPs share anycast VTEP IP address in the underlay

VXLAN L3

Gateway

VXLAN L3

Gateway

Page 54: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Distributed IP Anycast Gateway

54

VXLAN L3

Gateway

L3 Fabric

VXLAN L3

Gateway

VM

OS

VM

OS

VXLAN L3

Gateway

VXLAN L3

Gateway

VXLAN L3

Gateway

The same“Anycast” SVI IP/MAC is used at all VTEPs/ToRs

A host will always find its SVI anywhere it moves

SVI IP AddressMAC: 0000.dead.beef

IP: 10.1.1.1

SVI IP AddressMAC: 0000.dead.beef

IP: 10.1.2.1

Page 55: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Distributed IP Anycast GatewayDetailed View

55

L3 Fabric

VXLAN L3

Gateway

VXLAN L3

GatewayVXLAN L3

Gateway

SVI A

Underlay / IP Core

VLAN A'

VLAN B'

VTEP L2 GWY

L3 GWYSVI B SVI A

Underlay / IP Core

VLAN A

VLAN B

VNI A

VNI B

VTEPL2 GWY

L3 GWYSVI B

Consistent Anycast SVI IP / MAC address at all leaves

VLAN-IDs are locally significant

Page 56: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Distributed IP Anycast Gateway

Combined L2/L3 Overlays

• H2 H4: Bridged over VNI A

• H1 H4: Routed via SVI B (VTEP1) SVI A (VTEP1) Bridged over VNI A

• H4 H1: Routed via SVI A (VTEP2) SVI B (VTEP2) Bridged over VNI B

Packet Flows - Pervasive VNIs/SVIs

56

SVI B

VLAN A'

VLAN B'

VTEP1

SVI A SVI B

VLAN A

VLAN B

VNI A

VNI B

VTEP2

SVI A

H2 H4H1

Page 57: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Distributed IP Anycast GatewayPacket Flows – Scoped VNIs/SVIs

57

SVI B

VLAN A'

VLAN B'

VTEP

SVI A

VLAN AVNI A

VTEP

SVI A

H2 H4H1

• H1 H4: Routed via SVI B (VTEP1) SVI A (VTEP1) Bridged over VNI A

• H4 H1: Routed via SVI A (VTEP2) VNI X SVI B (VTEP1) VLAN B’

• Simple rule: A local route is preferred over a BGP learnt route. If there is a local SVI for the destination subnet, the SVI is the next hop

Page 58: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

SVI/VNI/VLAN Scoping and Provisioning

All VNIs/SVIs everywhere

• Umbrella catch-all provisioning

• Full ARP state on all Leaf Nodes

• Can be manually provisioned up-front

• Open to L2 Flooding everywhere

Orchestration leads to scale optimization

VNIs/SVIs scoped as hosts attach

• Provision on host attach/policy

• ARP state only for local subnets

• Requires orchestration

• L2 Flooding is scoped

58

L3 Fabric

L3 GWY L3 GWY L3 GWY L3 GWYL3 GWY L3 GWY

L3 Fabric

L3 GWY L3 GWY L3 GWY L3 GWYL3 GWY L3 GWY

Mgmt

Page 59: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Optimizing ARP behavior

• IP and MAC addresses host information distributed by control protocol

• NVEs (Leaf Switches) create an ARP cache for remote hosts

• NVEs reply locally to ARP requests for remote hosts• Avoid ARP request broadcast flooding

Minimizing Flooding in the Fabric with ARP suppression

59

switch# sh ip arp suppression-cache detail

Flags: + - Adjacencies synced via CFSoE/vPC peerR – Remote AdjacencyL2 – Leant over l2 interface

Total number of entries: 2Address Age MAC Address Vlan Physical Interface Flags100.1.1.2 00:01:02 0026.980c.1ec2 100 Ethernet2/6 100.1.1.3 00:01:03 0026.980c.1ec3 100 R

interface nve 1source-interface loopback 0member vni 100 mcast-group 239.0.0.1member vni 1000

end-host-reachability control protocol bgpsuppress-arp

ingress-replication control protocol bgp

Page 60: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Use of /32 host routes may lead to scaling issues if all the routes are installed in the HW tables of all leaf nodes

Host routes associated to remote endpoints can be installed into the HW FIB (from the SW RIB) upon detection of an active conversation with a local endpoint

vSwitch

H110.1.1.10

H310.1.2.10

L1 L2 L3

H210.1.1.20

L1 RIB10.1.1.20/32 NH L2_IP10.1.2.10/32 NH L3_IP

L1 FIB10.1.1.20/32 NH L2_IP10.1.2.10/32 NH L3_IP

Default Behavior (No L3 Conversational Learning)

vSwitch

H110.1.1.10

H310.1.2.10

L1 L2 L3

H210.1.1.20

L1 RIB10.1.1.20/32 NH L2_IP10.1.2.10/32 NH L3_IP

L1 FIB10.1.1.20/32 NH L2_IP

After Enabling L3 Conversational Learning

VTEP ScalingSelective FIB Download

60

Page 61: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VTEP ScalingPull Control Protocols

61

IP Fabric

Location Data Base

Complete Host-Leaf mappings

Forwarding Table

Cache only destinations for

active flowsTarget

Page 62: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Push Protocol

iBGP Adjacencies

Access

Aggregation

Core

RR RR

Pull Protocol

Communication with Mapping System

Access

Aggregation

Core

MS MS

Overlay Control PlaneHost and Subnet Route Distribution

62

Page 63: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-Data Center ConnectivityLAN Extensions

63

N7K/ASR

VNI 5000 VLAN 20VLAN 30

L3 Fabric

N7K/ASR

VNI 5000

VLAN 20VLAN 30

L3 Fabric

DC1 DC2

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

L2 GatewayL2 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

L2 GatewayL2 Gateway

L3 DomainVXLAN OTV/EVPN VXLAN

VXLAN✗

Domain Boundary:Failure and Event Containment Clear Administrative Delineation

Page 64: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-Data Center ConnectivityL2 Handoff

CRS-1

WAN

Terminate VXLAN segments at DC-edge

- Per VNI load balancing- Optimized failure containment

Access

Aggregation

Core

CRS-1

OTV/VPLS/EVPN

VXLAN YVXLAN X

64

Page 65: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN Versus LISP

65

Similarities• Same UDP based encapsulation header

• VXLAN does not use the control flag bits or Nonce/Map-Version field

• 24 Bit Segment ID

Differences

• LISP carries IP packets, while VXLAN carries Ethernet frames

• Forwarding Logic• VXLAN: Flooding/Learning

• LISP: Uses a mapping system to register/resolve inner IP to outer IP mappings

• For LISP, IP Multicast is only required to carry host IP multicast traffic

• LISP is designed to give IP address (Identifier) mobility / multi-homing and IP core route scalability

• LISP can provide optimal traffic routing when Identifier IP addresses move to a different location

Locator / ID Separation Protocol

Page 66: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN Versus OTVOverlay Transport Virtualization

66

Similarities• Both carry Ethernet frames

• Same UDP based encapsulation header• VXLAN does not use the OTV Overlay ID field

• Both can use IP Multicast• For broadcast and multicast frames (optional

for OTV)

Differences• Forwarding Logic

• VXLAN: Flooding/Learning

• OTV: Uses the IS-IS protocol to advertise the MAC address to IP bindings

• OTV can locally terminate ARP and doesn’t flood unknown MACs

• OTV can use an adjacency server to eliminate the need for IP multicast

• OTV is optimized for Data Center Interconnect to extend VLANs between or across data centers

• VXLAN is optimized for intra-DC and multi-tenancy

Overlay Transport VirtualizationVXLAN Versus OTV

Page 67: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

OTV & VXLANL2 Handoff

L3 Fabric

VXLAN L2

Gateway

VXLAN L2

GatewayL2 internal interface

Join-interface

Peer-link

OTV OTV

VXLAN

OTV

VLAN

IP Core

67

Page 68: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-Data Center ConnectivityLAN Extensions and IP mobility

68

Ethernet extensions between independent fabricsIP traffic is forwarded via the optimal path (no hair-pinning)

N7K/ASR

VNI 5000

VLAN 20VLAN 30

L3 Fabric

L3 Domain

N7K/ASR

VxLAN

VLAN

untagged

VNI 5000

VLAN 20VLAN 30

L3 Fabric

Data Center 1 Data Center 2

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L2/L3 Gateway

VXLAN L2/L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L2/L3 Gateway

VXLAN L2/L3 Gateway

LAN Extensions (OTV/EVPN)

IP Mobility (BGP/LISP)

Page 69: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-Data Center ConnectivityIP Mobility for optimized routing

69

N7K/ASR

L3 Fabric

N7K/ASR

VNI 5000

L3 Fabric

DC1 DC2

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L2/L3 Gateway

VXLAN L2/L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L3 Gateway

VXLAN L2/L3 Gateway

VXLAN L2/L3 Gateway

L3 Domain

LISP/BGP Signalling:Relay mobility state between sites

Fabric Mobile Host Detection

LISP/BGP Signalling

Host routes Host routes

LISP Map System / BGP RR

Direct Path Forwarding

Moving Hosts

Page 70: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Data Center 2

End-to-end IP mobility with LISP & BGPL3-VXLAN & LISP IP Mobility

70

CRS-1

WAN

Fabric Mobile Host Detection

LISP Mobility:LISP encapsulation from client sitesNo host routing in the IP coreDirect Path Forwarding

LISP Host MobilityRedistributionBGP LISP

LISP Map System

BGP Host Routes

Access

Aggregation

Core

LISP Host Mobility

Core

CRS-1

Data Plane

Control Plane

LISP Enabled Router

MAC IP VNI Next-Hop

Encap Seq

1 1 5000 IP1 VXLAN 0

EID VNI Loc Seq

1 5000 DC1 0

Access

Host 1VLAN 10

VNI 5000

BGP host route

BGP event = Move signal to LISP

LISP registration

LISP NotificationMAC IP VNI Next-

HopEncap Seq

1 1 5000 XTRs VXLAN 1

EID VNI Loc Seq

1 5000 DC2 1

Data Center 1

BGP host route

Page 71: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Underlay Deployment Considerations

71

Page 72: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Fabric Relevance to a Hybrid Overlay

DCIWAN Integration

L2/L3

Fabric

VM

OS

VM

OS

VirtualPhysical

ECMP

Multicast ServicesFast Re-route

Overlay HW GWYs & TEPs

Site Demarcation

IP Mobility ServicesL2 ServicesOverlay aware instrumentation

Distributed Routing Services

72

Page 73: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multicast Enabled Underlay

• May use PIM-ASM or PIM-BiDir (Different hardware has different capabilities)

• Spine and Aggregation switches make good RP locations in clos and traditional topologies respectively

• Reserve a range of multicast channels to service the overlay and optimize for diverse VNIs

• In clos topologies with lean spine, using multiple RPs across the multiple spines and mapping different VNIs to different RPs will provide a simple load balancing measure

Network Overlay

73

N1KV Nexus 7K with F3 LC

Nexus 3K Nexus 5K/6KNexus 9K

StandaloneCSR 1000V

ASR1KASR9K

Mcast mode IGMP v2/v3PIM-ASM & Bidir-PIM

PIM-ASM (Bidir– Future)

Bidir-PIM PIM-ASMBidir-PIM

(ASM –Future)PIM-ASM & Bidir-PIM

Page 74: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-Pathing and Entropy

• Symmetric Underlay Network topologies facilitate ECMP routing: • Multi-path load balancing

• Fast Re-convergence on link Failures

• Polarization: Encapsulated flows appear as a single flow which hashes to a single path

• Entropy in the encapsulation header to depolarize tunnels• Variable UDP source port in VXLAN outer header

• Underlay must support ECMP hashing on L4 port numbers

74

NV-edge NV-edge

Page 75: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Folded Clos Topology

• Fully Symmetric, BW rich topology, Optimized for East-West traffic

• Lean Spine does not do any VXLAN termination/gateway

• Access to other networks through border leaf block

Providing Topology Symmetry

75

L3 Fabric

WAN/DCI

VXLAN L2/3

Gateway

VXLAN L2/3

GatewayVXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

Spine

Border LeafLeaf

Page 76: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Access/Aggregation/Core Wide BW Topology

76

Access

Aggregation

Core

WAN/DCI

• Fully Symmetric, BW rich topology, Optimized for North-South traffic

• VXLAN termination/gateway @ Access and Core (or Aggregation)

• Access to other networks through Core

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

VXLAN L2/3

Gateway

Page 77: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Instrumentation and Overlay Awareness

• Infrastructure awareness of encapsulated traffic:• Outer/Encapsulation header

• Overlay shim header

• Internal/Payload header

• Payload

• Overlay aware Switching & Routing infrastructure: • ACLs, QoS, Netflow

• Inspection of encapsulated traffic77

NV-edge NV-edge

Page 78: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Over-speed, Encapsulation & Effective Throughput

• Encapsulation adds bits to the traffic being sent

• When receiving traffic at full line rate, the encapsulated traffic will exceed the line-rate BW of the egress interface

• Packet drops• Diminished effective throughput

• The uplink BW should be greater than the downlink BW to avoid congestion by encapsulation• This is naturally done in the network

78

encap

1500bytes/packet (10Gbps) 1542 bytes/packet (10.1 Gbps)

64bytes/packet (10Gbps) 106 bytes/packet (10.3 Gbps)

10GE 40GE

Page 79: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

MTU Issues: Overlay PMTUD

• Encapsulated traffic may exceed max MTU of the path

• When traffic is encapsulated with the Don’t Fragment (DF) bit set:• If MTU is exceeded: IGMP unreachable message (datagram-too-big) is sent back to the

encapsulating NV-edge• Encapsulating NV-edge will lower the tunnel MTU accordingly• Subsequent packets from the source will trigger an ICMP unreachable message from

the NV-edge back to the server (if the traffic from the source has the DF bit set)

• If the DF bit is not set, the device sensing the MTU is exceeded should attempt to fragment the traffic

79

NV-edge NV-edge

MTU backpressureMTU

backpressure

Flow of Traffic

Page 80: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Encapsulation HW Offload

Host Overlays

• Current forwarding penalty for SW encap is about 50% throughput

• STT trick leverages TCP offload engine in existing NICs

• TCP violation, short lived workaround

• P2P only, no routing of flows

• VXLAN/NVGRE offload on NICs

• The way forward for host overlays

• Disruptive, many touch points

• Static as ASICs: headers still in flux

• Cisco 3rd Gen VIC 2HCY14 (stateless offload)

Network Overlays

• ASIC acceleration of overlay encapsulations

• Fast enablement of incremental functions in header reserved fields without replacing HW

• Minimal disruption at the network access

• Manageable number of touch points

• Encapsulation Normalization

• Maximize throughput

80

Page 81: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Multi-tenancy models

81

To VLANs or Hosts• BD Pink, VNID 1

• BD Blue, VNID 2Default

• Single Underlay namespace• Default table or other VRF

Shared Underlay namespace To Underlay Network

To VLANs or Hosts• BD Pink, VNID 1

• BD Blue, VNID 2Default

Pink

Blue

To Virtualized Underlay Network

• Blue Underlay Segment

• Pink Underlay Segment

Shared Mode: Segmented Overlay, Single Underlay

Parallel Mode: Segmented Overlay and Segmented Underlay

Page 82: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Summary recommendations & takeaways

• Optimize the location of L2 and L3 GWYs to optimize routing and minimize failure exposure

• Leverage L3 VXLAN services enabled by control protocols as the main service and L2 extensions as the exception

• Design the underlay with the VXLAN overlay in mind

• A combination of pull protocols and push protocols may render optimal scale and resiliency

• Design the network hierarchically: both the underlay as well as the overlay

• L3 Gateways are key to a sound overlay design

• Link the provisioning of the overlay and scoping of VNIs to the host orchestration system for optimal scale

82

Page 83: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

VXLAN EVPN Configuration

Page 84: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

Tenant Red (VRF Red)

VLAN A

Layer-2 VNI A’

VLAN B

Layer-2 VNI B’

SVIA SVI B

VLAN X

Layer-3 VNI X’

SVIX

• 1 Layer-3 VNI per Tenant (VRF) for routing

• VNI X’ is used for routed packets

• 1 Layer-2 VNI per Layer-2 segment

• Multiple Layer-2 VNIs per tenant

• VNI A’ and B’ are used for bridged packets

Page 85: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

feature nv overlay

feature vn-segment-vlan-based

feature bgp

nv overlay evpn

Initial configuration – Per Switch

Enable VXLAN

Enable VLAN-based VXLAN (the currently only mode)

Enable OSPF if it’s chosen to be the underlay IGP routing protocol

Enable VLAN SVI interfaces if the VTEP needs to be IP gateway and route for the VXLAN VLAN IP subnet.

Enable EVPN control plane for VXLAN

feature ospf

feature pim

feature interface-vlan

Other features may need to be anabled

Enable BGP

Enable IP PIM multicast routing in the underlay network

Page 86: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

vrf context evpn-tenant-1

vni 39000

rd auto

address-family ipv4 unicast

route-target import 39000:39000

route-target export 39000:39000

route-target both auto evpn

EVPN Tenant VRF ConfigurationCreate VXLAN tenant VRF

Create a VXLAN Tenant VRF named “evpn-tenant-1

Layer-3 VNI for VXLAN routing within the tenant

Define VRF Route Target and import/export policies in address-family ipv4 unicast

Define VRF RD (route distinguisher)

vrf context evpn-tenant-2

vni 39010

rd auto

address-family ipv4 unicast

route-target import 39010:39010

route-target export 39010:39010

route-target both auto evpn

Example to create a 2nd tenant VRF following the above steps

Page 87: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

vlan 3900name l3-vni-vlan-for-tenant-1vn-segment 39000

interface Vlan3900description l3-vni-for-tenant-1-routingno shutdownvrf member evpn-tenant-1

vrf context evpn-tenant-1vni 39000rd autoaddress-family ipv4 unicastroute-target import 39000:39000route-target export 39000:39000route-target both auto evpn

Layer-3 VNI Per Tenant for EVPN RoutingConfigure Layer-3 VNI per EVPN Tenant VRF Routing Instant

Create VLAN 3900/VNI 39000 for Layer-3 VNI for tenant evpn-tenant-1 vrf routing instance

Create the SVI interface for VLAN 3900/VNI 39000 for VXLAN routing. Put this SVI interface into the tenant VRF context

Associate VNI 39000 with the tenant VRF evpn-tenant-1 as its routing Layer-3 VNI

Page 88: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

vlan 3901

name l3-vni-vlan-for-tenant-2

vn-segment 39010

interface Vlan3901

description l3-vni-for-tenant-2-routing

no shutdown

vrf member evpn-tenant-2

vrf context evpn-tenant-2

vni 39010

rd auto

address-family ipv4 unicast

route-target import 39010:39010

route-target export 39010:39010

route-target both auto evpn

EVPN Layer-3 VNI Per Tenant for Routing Instance

Configure Layer-3 VNI per EVPN Tenant VRF Routing Instant

Define Layer-3 VNI for a 2nd tenant following the same steps in the previous slide

Page 89: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

vlan 200

vn-segment 20000

vlan 210

vn-segment 21000

EVPN Layer-2 VNI ConfigurationMap VLANs to VXLAN L2 VNIs and Configure their MP-BGP EVPN Parameters

Map VLAN to VXLAN VNI

evpn

vni 20000 l2

rd auto

route-target import auto

route-target export auto

vni 21000 l2

rd auto

route-target import auto

route-target export auto

Under EVPN configuration, define RD and RT import/export policies for each Layer-2 VNIs

Page 90: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

interface Vlan200

no shutdown

vrf member evpn-tenant-1

ip address 20.1.1.1/8

fabric forwarding mode anycast-gateway

interface Vlan210

no shutdown

vrf member evpn-tenant-2

ip address 21.1.1.1/8

fabric forwarding mode anycast-gateway

EVPN Layer-2 Network VXLAN VLAN SVI Interface

Configure SVI interfaces for L2 VNIs, enabling anycast-gateway

Create SVI interface vlan 200Associate it with tenant VRF evpn-tenant-1

Enable distributed anycast gateway for vlan 200 subnet

Create SVI interface vlan 210Associate it with tenant VRF evpn-tenant-2

Enable distributed anycast gateway for vlan 210 subnet

All VTEPs for this VLAN should have the same SVI interface IP address as the distributed IP gateway

All VTEPs for this VLAN should have the same SVI interface IP address as the distributed IP gateway

Page 91: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

interface Vlan200

no shutdown

vrf member evpn-tenant-1

ip address 20.1.1.1/8

interface Vlan210

no shutdown

vrf member evpn-tenant-2

ip address 21.1.1.1/8

EVPN Layer-2 Network VXLAN VLAN SVI Interface

Configure SVI interfaces for L2 VNIs, enabling anycast-gateway

Create SVI interface vlan 200Associate it with tenant VRF evpn-tenant-1

Create SVI interface vlan 210Associate it with tenant VRF evpn-tenant-2

Page 92: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

EVPN Distributed Gateway

Configure anycast-gateway virtual IP address for VLAN 200. All VTEPs for VLAN 200 should have the same virtual IP address

Configure distributed gateway virtual MAC addressOne virtual MAC per VTEP. All VTEPs should have the same virtual MAC address

fabric forwarding anycast-gateway-mac 0002.0002.0002

interface Vlan200

no shutdown

vrf member evpn-tenant-1

ip address 20.1.1.1/8

fabric forwarding mode anycast-gateway

interface Vlan210

no shutdown

vrf member evpn-tenant-2

ip address 21.1.1.1/8

fabric forwarding mode anycast-gateway

Enable anycast-gateway fabric forwarding for VLAN 200

Configure anycast-gateway virtual IP address for VLAN 210.All VTEPs for VLAN 210 should have the same virtual IP

address

Enable anycast-gateway fabric forwarding for VLAN 210

Page 93: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

interface nve1

no shutdown

source-interface loopback0

host-reachability protocol bgp

member vni 20000

suppress-arp

mcast-group 239.1.1.1

member vni 21000

suppress-arp

mcast-group 239.1.1.2

member vni 39000 associate-vrf

member vni 39010 associate-vrf

VXLAN Tunnel Interface Configuration Configure VXLAN tunnel interface nve1

Specify loopback0 as the source interface

Define BGP as the mechanism for host reachability advertisement

Add Layer-3 VNIs, one per tenant VRF

Associate tenant Layer-2 VNIs to the tunnel interface nve1Define the mcast group on a per-VNI basisEnable arp suppression on a per-VNI basis

interface loopback0

ip address 10.1.1.11/32

ip ospf network point-to-point

ip router ospf 1 area 0.0.0.0

ip pim sparse-mode

Loopback0 interface configurationThe /32 IP address is used as the VTEP address

Page 94: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

router bgp 100

router-id 10.1.1.11

log-neighbor-changes

address-family ipv4 unicast

address-family l2vpn evpn

neighbor 10.1.1.1 remote-as 100

update-source loopback0

address-family ipv4 unicast

address-family l2vpn evpn

send-community extended

neighbor 10.1.1.2 remote-as 100

update-source loopback0

address-family ipv4 unicast

address-family l2vpn evpn

send-community extended

vrf evpn-tenant-1

address-family ipv4 unicast

advertise l2vpn evpn

vrf evpn-tenant-2

address-family ipv4 unicast

advertise l2vpn evpn

MP-BGP Configuration on VTEP

Address-family ipv4 unicast for prefix-based routing

Define MP-BGP neighbors. Under each neighbor define address-family ipv4 unicast and l2vpn evpn parameters

Under address-family ipv4 unicast of each tenant VRF instance, enable advertising EVPN routes

Send extended community for both MAC and Host IP reachability information

Address-family ipv4 evpn for vxlan host-based routing

Page 95: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

router bgp 100

router-id 10.1.1.1

log-neighbor-changes

address-family ipv4 unicast

address-family l2vpn evpn

retain route-target all

template peer vtep-peer

remote-as 100

update-source loopback0

address-family ipv4 unicast

send-community both

route-reflector-client

address-family l2vpn evpn

send-community both

route-reflector-client

neighbor 10.1.1.11

inherit peer vtep-peer

neighbor 10.1.1.12

inherit peer vtep-peer

neighbor 10.1.1.13

inherit peer vtep-peer

neighbor 10.1.1.14

inherit peer vtep-peer

MP-BGP Configuration on iBGP Route Reflector

Address-family ipv4 unicast for prefix-based routing

iBGP RR client peer template

Ssnd both standard and extended community in address-family l2vpn evpn

Ssnd both standard and extended community in address-family ipv4 unicast

Address-family ipv4 evpn for vxlan host-based routing

Page 96: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig

ip pim ssm range 232.0.0.0/8

ip pim bsr listen forward

interface loopback0

ip address 10.1.1.11/32

ip ospf network point-to-point

ip router ospf 1 area 0.0.0.0

ip pim sparse-mode

interface Ethernet2/1

description TME-1-9508-1 Eth1/1

no switchport

ip address 192.167.11.2/30

ip ospf network point-to-point

ip router ospf 1 area 0.0.0.0

ip ospf bfd

ip pim sparse-mode

no shutdown

interface Ethernet2/2

description TME-1-9508-2 Eth1/1

no switchport

ip address 192.168.11.2/30

ip ospf network point-to-point

ip router ospf 1 area 0.0.0.0

ip ospf bfd

ip pim sparse-mode

no shutdown

Underlay Multicast Configuration onVTEP

This example uses BSR for RP

Page 97: Overlay Networks in the Datacenter - DFW Cisco Users Groupdfw.cisco-users.org/zips/20150204_DFWCUG_Overlay_Networks_In_… · 4/2/2015  · Overlay Networks in the Datacenter Craig