Confidential. Copyright © Arista 2017. All rights reserved.
VXLAN for IXPs
VXLAN for IXPs: First, a little context from the DC
Data Center – L3 leaf-spine Architecture
- Traffic routed at the leaf layer
- Each leaf connects at Layer 3 to each spine
- Routing protocol of choice: BGP
- eBGP connection on the physical link between leaf and spine
- Provides control and scale
[Figure: two leaves (AS 65001 and AS 65010) connected to the spine (AS 65000) over eBGP on Eth-4; Layer 3 between leaf and spine, Layer 2 below the leaf]
The DC topology has evolved to a Layer 3 leaf spine architecture
Data Center – L3 leaf-spine Architecture
- ECMP for traffic load-balancing
- Leaf learns the same prefix from each spine
- Prefix has a next-hop to each spine
- Traffic is L3 load-balanced across the uplinks
- 5-tuple flow based hash
- Configurable hash seed for multi-stage designs
[Figure: leaves (AS 65001, AS 65010) with eBGP sessions to the spines (AS 65000)]
Equal Cost Multi-Pathing (ECMP) to accommodate bandwidth growth
Routing Table
10.10.11.0/24 -> spine-1
              -> spine-2
              -> spine-3
              -> spine-4
[Figure: Flow-1, Flow-2, Flow-3 and Flow-4 towards 10.10.11.0/24]
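The 5-tuple hash with a configurable seed can be illustrated with a minimal Python sketch. It is not the switch's hardware hash: the use of SHA-256, the function name `ecmp_next_hop` and the `seed` parameter are assumptions made for illustration only.

```python
import hashlib
import struct
from ipaddress import ip_address

# Next hops taken from the routing table on the slide.
SPINES = ["spine-1", "spine-2", "spine-3", "spine-4"]


def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, seed=0, next_hops=SPINES):
    """Pick one next hop for a flow from its 5-tuple and a per-switch seed.

    Assumption: SHA-256 stands in for the (vendor-specific) hardware hash.
    """
    key = struct.pack(
        "!I4s4sBHH",
        seed,
        ip_address(src_ip).packed,
        ip_address(dst_ip).packed,
        proto,
        src_port,
        dst_port,
    )
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[bucket]


if __name__ == "__main__":
    flow = ("10.10.1.5", "10.10.11.7", 6, 40000, 443)
    # Every packet of the same flow hashes to the same spine.
    print(ecmp_next_hop(*flow))
    # A different seed on another stage may pick a different member.
    print(ecmp_next_hop(*flow, seed=7))
```

Using a different seed at each stage of a multi-stage design keeps successive stages from making correlated choices for the same flow.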
Data Center – L3 leaf-spine Architecture
• Resilient ECMP to minimize traffic redistribution
- Minimise traffic disruption during a link failure
- On a link failure, only flows on the affected link are redistributed
- Flows on the remaining active paths are not redistributed, and are thus unaffected by the failure
[Figure: leaves (AS 65001, AS 65010) with eBGP uplinks to the spines; each of the four uplinks carries 25% of leaf BW]
Next-hop table (before the failure)
1 - 1.1.1.1 - Fail
2 - 1.1.1.2
3 - 1.1.1.3
4 - 1.1.1.4
5 - 1.1.1.1 - Fail
6 - 1.1.1.2
7 - 1.1.1.3
8 - 1.1.1.4

New next-hop table (after the failure)
1 - 1.1.1.2 - NEW
2 - 1.1.1.2 - no change
3 - 1.1.1.3 - no change
4 - 1.1.1.4 - no change
5 - 1.1.1.2 - NEW
6 - 1.1.1.2 - no change
7 - 1.1.1.3 - no change
8 - 1.1.1.4 - no change
The number of next-hop entries (N) remains the same regardless of the number of active next-hops
ip hardware fib ecmp capacity 4 redundancy 2
The number of next-hop entries (N) remains the same (eight) even after the failure
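The fixed-size table behaviour can be sketched in Python. This is an illustrative model, not the switch implementation: `build_table` and `fail_next_hop` are hypothetical names, and the sketch spreads the failed slots round-robin over the survivors, whereas the slide's example rewrites both failed slots to 1.1.1.2.

```python
def build_table(next_hops, redundancy):
    """Build a table of len(next_hops) * redundancy slots (capacity 4, redundancy 2 -> 8)."""
    return [nh for _ in range(redundancy) for nh in next_hops]


def fail_next_hop(table, failed, survivors):
    """Rewrite only the slots that pointed at the failed next hop.

    All other slots keep their next hop, so flows hashed to them are not moved.
    """
    new_table = []
    replaced = 0
    for nh in table:
        if nh == failed:
            # Assumption: failed slots are handed out round-robin to survivors.
            new_table.append(survivors[replaced % len(survivors)])
            replaced += 1
        else:
            new_table.append(nh)
    return new_table


if __name__ == "__main__":
    hops = ["1.1.1.1", "1.1.1.2", "1.1.1.3", "1.1.1.4"]
    before = build_table(hops, redundancy=2)
    after = fail_next_hop(before, "1.1.1.1", ["1.1.1.2", "1.1.1.3", "1.1.1.4"])
    for slot, (old, new) in enumerate(zip(before, after), start=1):
        print(slot, old, "->", new, "" if old == new else "NEW")
```

Because the table length never changes, the hash-to-slot mapping is stable and only the flows that used the failed next hop are rehomed.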
Data Center – L3 leaf-spine Architecture
• BGP Unequal Cost Multi-Pathing
- Weight the traffic load-balancing across unequal link speeds
- Utilizes the BGP link-bandwidth extended community to advertise BW to the peer
- Use case – dual internet BGP peers at the DC edge, connecting at different speeds
- Use case – migrating the network from 10G to 100G, with a mix of 10G and 100G in the path
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth-06
[Figure: 10.10.100.0/24 reached over a 100G path and a 10G path]
P-1(config-router-bgp)#show ip route bgp
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B I - iBGP, B E - eBGP,

 B E    10.10.100.0/24 [200/0] via 172.168.2.2, Ethernet50/5 label 3, weight 10/11
                               via 172.168.1.2, Ethernet49/9 label 3, weight 1/11

P-3(config-router-bgp)#show ip bgp neighbors 1.1.1.2 advertised-routes de
Router identifier 1.1.1.5, local AS number 65005
BGP routing table entry for 10.10.100.0/24
 Paths: 1 available
  65005 65001
    1.1.1.5 from 1.1.1.4 (1.1.1.4)
      Origin IGP, metric -, localpref -, IGP metric 20, weight -, valid, external, best
      Extended Community: Link-Bandwidth-AS:65005:10000000000.000000
      Rx SAFI: Unicast
[Figure: routers P-1 (1.1.1.2), P-3 (1.1.1.5) and P-5 connected by 10G and 100G links; P-1 uplinks on Ethernet50/5 and Ethernet49/9]
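The 10/11 and 1/11 weights in the route output follow from the advertised bandwidths. The Python sketch below shows the arithmetic, assuming each path's weight is simply its share of the total advertised bandwidth; `ucmp_weights` is a hypothetical helper name, and bandwidths are given in Gb/s for readability.

```python
from fractions import Fraction


def ucmp_weights(link_bandwidth):
    """Return each next hop's share of traffic, proportional to its bandwidth."""
    total = sum(link_bandwidth.values())
    return {nh: Fraction(bw, total) for nh, bw in link_bandwidth.items()}


if __name__ == "__main__":
    # Next hops from the "show ip route bgp" output: one 100G and one 10G path.
    weights = ucmp_weights({"172.168.2.2": 100, "172.168.1.2": 10})
    for next_hop, weight in weights.items():
        print(next_hop, weight)
```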
Data Center – L3 leaf-spine Architecture
• BGP for graceful, automated network upgrades
- Automated snapshot of the switch state prior to the upgrade
- Gracefully drain the traffic away from the switch via BGP route maps
- Upgrade the switch with no code dependency concern, compare the post and prior snapshots, and on success reintroduce the upgraded switch
1. Snapshot
   • Neighbors (BGP & LLDP)
   • Routes
2. Graceful removal + upgrade
   • Automated route-map deployed
   • AS prepend
3. Graceful insertion
   • Snapshot pre == post
   • Automated route-map removal
[Figure: uplinks each labeled 25% of leaf bandwidth]
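The "snapshot pre == post" check can be sketched as a set comparison. The snapshot layout below (a dict of named collections) and the function name `snapshot_diff` are assumptions for illustration; the slides do not describe the actual snapshot format.

```python
def snapshot_diff(pre, post):
    """Compare two snapshots and report what went missing or appeared.

    An empty result means the switch can be reinserted.
    """
    diff = {}
    for key in sorted(set(pre) | set(post)):
        before = set(pre.get(key, ()))
        after = set(post.get(key, ()))
        missing, extra = before - after, after - before
        if missing or extra:
            diff[key] = {"missing": sorted(missing), "extra": sorted(extra)}
    return diff


if __name__ == "__main__":
    # Hypothetical snapshot contents, for illustration only.
    pre = {
        "bgp_neighbors": ["1.1.1.2", "1.1.1.3"],
        "lldp_neighbors": ["spine-1", "spine-2"],
        "routes": ["10.10.11.0/24", "10.10.100.0/24"],
    }
    post = dict(pre, bgp_neighbors=["1.1.1.2"])
    print(snapshot_diff(pre, pre))
    print(snapshot_diff(pre, post))
```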
Data Center – L3 leaf-spine Architecture
• Layer 3 infrastructure for scale
• Requirement for L2 between leaf nodes
- VM mobility, app dependencies
• VXLAN - development of a MAC in IP encapsulation standard
• Standard IP header, so no change to the L3 leaf-spine
• Can utilize the ECMP bandwidth of the leaf-spine architecture
Still a requirement for layer 2 stretch - VXLAN adoption
VXLAN for IXPs: VXLAN Overview
VXLAN Overview
• VXLAN is an IETF-defined standard, RFC 7348
- Co-authors include VMware, Arista, Cisco and Broadcom
- Defines a MAC in UDP/IP encapsulation, for Layer 2 transport across a Layer 3 network
- VXLAN header added with a 24-bit Layer 2 domain identifier
- Provides the ability to build scalable Layer 2 networks across an IP Layer 3 transport network
[Figure: Rtr-1 (10.10.10.1/24) and Rtr-2 (10.10.10.2/24) share a Layer 2 domain across a Layer 3 IP transport; the VTEP at each edge carries the Ethernet frame as VXLAN/IP]
VXLAN Overview - Terminology
• VXLAN Terminology (RFC 7348)
- VXLAN Tunnel End Point (VTEP): Node providing the VXLAN encapsulation functionality
- Virtual Tunnel Identifier (VTI): Source IP address of the VTEP, used for the VXLAN encapsulation
- VXLAN Network Identifier (VNI): 24-bit field in the VXLAN header defining the Layer 2 domain of the packet
[Figure: Rtr-1 and Rtr-2 attach on 802.1Q VLAN 10 to VTEP-1 (VTI IP-1) and VTEP-2 (VTI IP-2) in one Layer 2 domain. VXLAN encapsulation: VLAN mapped to VNI and VXLAN header added to the Ethernet frame. VXLAN de-encapsulation: VXLAN header removed from the frame.]
VXLAN Overview – Frame format
• VXLAN Frame Format – MAC in IP encapsulation
- Outer Ethernet header: local VTEP MAC (src) and next-hop MAC (dst)
- Outer IP header: src/dest IP addresses are the VTIs of the local and remote VTEPs
- Allows ECMP load-balancing across a network core that is VXLAN unaware
- 8-byte VXLAN header carries a 24-bit VNI, scaling to 16 million Layer 2 domains/vWires
[Figure: VXLAN frame layout]
Outer fields: Src. MAC addr. (MAC of VTEP), Dest. MAC addr. (MAC of next-hop), 802.1Q, Dest. IP (remote VTEP), Src. IP (local VTEP), UDP, VNI (24 bits), Payload, FCS
Payload (original Ethernet frame): Src. MAC addr., Dest. MAC addr., optional 802.1Q, original Ethernet payload (including any IP headers etc.)
50 bytes of VXLAN encapsulation overhead
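The 8-byte VXLAN header and the 50-byte figure can be checked with a short Python sketch. The header layout follows RFC 7348 (8 flag bits with the I flag set, 24 reserved bits, 24-bit VNI, 8 reserved bits); the function names are illustrative, and the overhead table assumes an untagged outer Ethernet header and an outer IPv4 header.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag in RFC 7348


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header):
    """Extract the VNI from a VXLAN header, checking that the I flag is set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


# Assumption: untagged outer Ethernet and outer IPv4 (no options).
OVERHEAD_BYTES = {"outer Ethernet": 14, "outer IPv4": 20, "UDP": 8, "VXLAN": 8}

if __name__ == "__main__":
    header = vxlan_header(1010)
    print(header.hex(), parse_vni(header))
    print("total overhead:", sum(OVERHEAD_BYTES.values()), "bytes")
    print("distinct VNIs:", 2 ** 24)
```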
VXLAN Overview – Frame format
• To provide entropy in the ECMP network
- UDP source port is a hash of the inner encapsulated frame
- Which inner fields are hashed is not defined in the standard
- The silicon vendor defines the level of entropy that can be achieved
- UDP destination port is predefined in the standard as 4789
[Figure: VXLAN frame with the outer source MAC (interface MAC to spine), outer destination MAC (MAC of next-hop spine), UDP source port (hash of the inner frame) and UDP destination port (standard = 4789); 50 bytes of VXLAN encapsulation overhead]
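A minimal Python sketch of deriving the UDP source port from the inner frame follows. Since the standard does not define which fields are hashed, the choice here (CRC-32 over the inner Ethernet header) is purely an assumption; the 49152-65535 range is the dynamic port range that RFC 7348 recommends for the source port.

```python
import zlib

VXLAN_UDP_DEST_PORT = 4789  # fixed by the standard


def vxlan_udp_source_port(inner_frame, low=49152, high=65535):
    """Derive the outer UDP source port from a hash of the inner frame.

    Assumption: hash only the first 14 bytes (dst MAC, src MAC, EtherType).
    Real silicon may hash more inner fields, giving more entropy.
    """
    digest = zlib.crc32(inner_frame[:14])
    return low + digest % (high - low + 1)


if __name__ == "__main__":
    frame_a = bytes.fromhex("005056869401" "005056869402" "0800") + b"payload"
    frame_b = bytes.fromhex("005056869403" "005056869402" "0800") + b"payload"
    # Same inner flow -> same source port; other flows may land on other ports.
    print(vxlan_udp_source_port(frame_a), vxlan_udp_source_port(frame_b))
```

The VXLAN-unaware core then includes this source port in its ordinary 5-tuple hash, which is how different inner flows between the same pair of VTEPs can be spread across ECMP paths.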
VXLAN Overview – Service interfaces
• VLAN to VNI mapping
- 1:1 mapping between the VLAN-ID and the VXLAN VNI
- Mapping is locally significant; the VLAN tag is not carried in the VXLAN header
• S-VLAN to VNI mapping
- Outer S-tag of a Q-in-Q frame maps to the VXLAN VNI
- C-tags are bundled in a single VXLAN Layer 2 (VNI) domain
• N VLAN-IDs to a VNI mapping (future)
- Map multiple VLAN-IDs to a single shared VXLAN VNI
- Provide support for overlapping VLAN-IDs on the same physical VTEP
[Figure: three VTEPs illustrating the mappings, with labels VLAN 10 –> VNI 1010, VLAN 20 –> VNI 1020 (Q-in-Q, C-tag and S-tag) and VLAN 30 –> VNI 1030]
VXLAN Overview – MAC security
• VXLAN service interface ensures standard VLAN behaviour on the ingress interface
• Requirement for a finite set of MAC addresses permitted per edge port
• Standard L2 Access Control List (ACL) restricts ingress to traffic from approved members
• Policy could be generated automatically from a database (e.g. IXP-Manager)
• JSON API call to the switch
P-1(config-if-Et1)#show mac access-lists
MAC Access List Test-1
        10 permit 00:aa:aa:aa:aa:aa 00:00:00:00:00:00 any
P-1(config-if-Et1)#show run int eth 1
interface Ethernet1
   shutdown
   mac access-group Test-1 in
P-1(config-if-Et1)#
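Generating such a policy from a member database could look like the Python sketch below. It only renders configuration text; the function name `mac_acl_config` and the input list are hypothetical, the rule syntax is copied from the `show mac access-lists` output above, and how the text is pushed to the switch (for example via the JSON API) is left out.

```python
import re

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def mac_acl_config(acl_name, interface, allowed_macs):
    """Render a per-port MAC ACL that permits only the listed source MACs."""
    lines = [f"mac access-list {acl_name}"]
    for index, mac in enumerate(allowed_macs, start=1):
        mac = mac.lower()
        if not MAC_RE.match(mac):
            raise ValueError(f"not a MAC address: {mac}")
        # Sequence numbers step by 10, as in the example output.
        lines.append(f"   {index * 10} permit {mac} 00:00:00:00:00:00 any")
    lines.append(f"interface {interface}")
    lines.append(f"   mac access-group {acl_name} in")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical member record, e.g. exported from a member database.
    print(mac_acl_config("Test-1", "Ethernet1", ["00:AA:AA:AA:AA:AA"]))
```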
VXLAN Overview – Broadcast Control (ARPs etc)
• The Layer 2 ACLs already limit traffic to only approved speakers
• Storm-control to restrict broadcast (ARP) and multicast (IPv6 ND) traffic
• Statistics can be retrieved via APIs, etc. for automated behaviors
• EVPN control plane adds value with support for snooping, damping and ARP suppression
interface Ethernet1
storm-control broadcast level pps 5000
storm-control multicast level pps 5000
VXLAN for IXPs: VXLAN Control Plane
VXLAN Control Plane - Options
• The VXLAN control plane is used for MAC learning and packet flooding
- Learning which remote VTEP a host resides behind
- Allowing the mapping of remote MACs to their associated remote VTEP
- Mechanism for forwarding broadcast and multicast traffic within the Layer 2 segment (VNI)
IP Multicast Control Plane
• VTEPs join the associated IP multicast group(s) for their VNI(s)
• Unknown unicasts forwarded to the VTEPs in the VNI via IP multicast
• Flood and learn; requires IP multicast support in the underlay
• Limited deployments
Head-End Replication (HER)
• BUM traffic replicated to each remote VTEP in the VNI
• Unicast Replication carried out on the ingress VTEP
• MAC learning still via flood and learn, but no requirement for IP multicast
EVPN Model
• BGP used to distribute local MAC to IP bindings between VTEPs
• Broadcast traffic handled via IP multicast or HER models
• Dynamic MAC distribution and VNI learning, configuration can be BGP intensive
VXLAN Control Plane – Head-End Replication
• Head-end Replication (HER)
- VTEP configured with a flood-list containing the IPs of the remote VTEPs in the VNI
- Any BUM traffic received on the VTEP is replicated to the VTEPs in the flood-list
- Remote VTEPs receiving flooded traffic learn the inner source MAC to VTEP binding
- This creates a remote MAC to outer source IP (VTEP) mapping in the Layer 2 table
[Figure: VXLAN VNI 1010 with RTR-1 (MAC-A) behind VTEP-1, and RTR-2 and RTR-3 behind VTEP-2 and VTEP-3. VTEP-1 flood-list: VNI 1010 -> VTEP-2, VTEP-3. An unknown unicast from RTR-1 is replicated (unicast) to the VTEPs in the flood-list; each receiving VTEP learns MAC-A and maps it to VTEP-1.]
Simple flood and learn forwarding plane
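The flood-and-learn behaviour can be modelled in a few lines of Python. This is a toy model under stated assumptions, not switch code: the `Vtep` class, its method names and the MAC names are all hypothetical.

```python
class Vtep:
    """Toy model of a VTEP doing head-end replication with flood and learn."""

    def __init__(self, ip):
        self.ip = ip
        self.flood_list = {}  # vni -> list of remote VTEP IPs
        self.mac_table = {}   # (vni, mac) -> ("port", name) or ("vtep", ip)

    def from_local_port(self, vni, port, src_mac, dst_mac):
        """Handle a frame from an edge port; return the remote VTEPs to send to."""
        self.mac_table[(vni, src_mac)] = ("port", port)
        kind, where = self.mac_table.get((vni, dst_mac), ("flood", None))
        if kind == "vtep":
            return [where]  # known remote MAC: one unicast copy
        if kind == "port":
            return []       # known local MAC: no VXLAN copy needed
        return list(self.flood_list.get(vni, []))  # unknown: head-end replicate

    def from_vxlan(self, vni, outer_src_ip, inner_src_mac):
        """Learn the inner source MAC against the outer source IP (the sending VTEP)."""
        self.mac_table[(vni, inner_src_mac)] = ("vtep", outer_src_ip)


if __name__ == "__main__":
    vtep1, vtep2, vtep3 = Vtep("VTEP-1"), Vtep("VTEP-2"), Vtep("VTEP-3")
    vtep1.flood_list[1010] = ["VTEP-2", "VTEP-3"]
    vtep2.flood_list[1010] = ["VTEP-1", "VTEP-3"]
    targets = vtep1.from_local_port(1010, "Et1", "MAC-A", "MAC-B")
    print("flooded to", targets)
    for remote in (vtep2, vtep3):
        remote.from_vxlan(1010, vtep1.ip, "MAC-A")
    print("reply goes to", vtep2.from_local_port(1010, "Et1", "MAC-B", "MAC-A"))
```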
VXLAN Control Plane – Head-End Replication
• Simple configuration…
[Figure: VTEP-1 (2.2.2.1), VTEP-2 (2.2.2.2) and VTEP-3 (2.2.2.3) with routers RTR-1 to RTR-4 attached on Eth 6 and Eth 7; VLAN 10 maps to VNI 1000 and VLAN 20 to VNI 2000]
! VTEP-1 (2.2.2.1)
interface Loopback2
   ip address 2.2.2.1/32
!
interface Ethernet6
   switchport mode access
   switchport access vlan 10
!
interface Ethernet7
   switchport mode access
   switchport access vlan 10
!
interface Vxlan1
   vxlan source-interface Loopback2
   vxlan udp-port 4789
   vxlan vlan 10 vni 1000
   vxlan vlan 20 vni 2000
   vxlan vlan 10 flood vtep 2.2.2.3
   vxlan vlan 20 flood vtep 2.2.2.2
!
! VTEP-2 (2.2.2.2)
interface Loopback2
   ip address 2.2.2.2/32
!
interface Ethernet6
   switchport mode access
   switchport access vlan 20
!
interface Vxlan1
   vxlan source-interface Loopback2
   vxlan udp-port 4789
   vxlan vlan 20 vni 2000
   vxlan vlan 20 flood vtep 2.2.2.1
!
! VTEP-3 (2.2.2.3)
interface Loopback2
   ip address 2.2.2.3/32
!
interface Ethernet6
   switchport mode access
   switchport access vlan 10
!
interface Vxlan1
   vxlan source-interface Loopback2
   vxlan udp-port 4789
   vxlan vlan 10 vni 1000
   vxlan vlan 10 flood vtep 2.2.2.1
!
VXLAN for IXPs: EVPN - BGP based Control Plane
What is EVPN?
• EVPN standard, RFC 7432
- Standard defines a BGP control plane with an MPLS data plane
- New EVPN address family to advertise MAC/IP and IP prefixes
- Provides Layer 2 and Layer 3 VPN services on a single interface
• Multiple forwarding plane options, same control plane
- RFC 7432 – MPLS forwarding plane
- NVO draft – VXLAN, NVGRE, MPLSoGRE – Data Center focus
- PBB draft – Metro Ethernet focus
Control Plane: EVPN MP-BGP (RFC 7432)
Data Plane: MPLS (RFC 7432) | Provider Backbone Bridging (Draft) | NVO: NVGRE, VXLAN (Draft)
EVPN Terminology
• EVPN Terminology – VXLAN forwarding plane
- NVE/VTEP: Node providing the encapsulation functionality (MPLS/VXLAN)
- EVPN Instance (EVI): Logical switch in the EVPN domain; spans VTEPs to provide L2 and L3 connectivity
- MAC-VRF: A VRF storing the MAC addresses of a specific tenant, used for delivering Layer 2 VPNs
- IP-VRF: A VRF table for announcing tenant prefixes between VTEPs, used for delivering Layer 3 VPNs
- Ethernet Segment ID (ESI): Identifies an Ethernet segment shared between VTEPs, for supporting multi-homing
BGP EVPN
EVPN Terminology – New Address Family
• New BGP NLRI for the EVPN routes
- Address Family Identifier (AFI) = 25 (L2VPN)
- Subsequent Address Family Identifier (SAFI) = 70 (EVPN)
• To support multi-tenancy, traditional VPN methods are used
- Route Target (RT): controls the import and export of routes across VRFs
- Route Distinguisher (RD): prepended to the advertised address to support overlapping address space
Route Type
1 - Ethernet A-D Route
2 - MAC-advertisement Route
3 - Inclusive Multicast Route
4 - Ethernet Segment Route
5 - IP prefix route (optional)

NLRI: AFI = 25 (L2VPN), SAFI = 70 (EVPN)
Next-hop IP for the prefix = PE

Type-2 Route fields
- Route Distinguisher (RD)
- Ethernet Segment Identifier = 0
- Ethernet Tag ID
- MAC address length (48 bits)
- MAC address
- IP address length [optional]
- IP address [optional]
- Label (VXLAN VNI)
- Extended Community: Route Target
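The Type-2 field list maps onto a byte layout that can be sketched in Python. The sketch follows the RFC 7432 MAC/IP Advertisement route encoding, with the VNI carried in the 3-byte label field as is usual for VXLAN; the function names are illustrative, and extended communities (such as the Route Target) travel as separate path attributes that are not encoded here.

```python
import struct
from ipaddress import ip_address


def encode_rd_type1(ip, number):
    """Encode a type-1 Route Distinguisher (IPv4 address : 2-byte number)."""
    return struct.pack("!H4sH", 1, ip_address(ip).packed, number)


def encode_mac_ip_route(rd, mac, vni, ip=None, ethernet_tag=0, esi=bytes(10)):
    """Encode an EVPN route type 2 (MAC/IP advertisement) NLRI."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace(".", ""))
    ip_bytes = ip_address(ip).packed if ip else b""
    body = (
        rd                                  # Route Distinguisher, 8 bytes
        + esi                               # Ethernet Segment Identifier, 10 bytes
        + struct.pack("!I", ethernet_tag)   # Ethernet Tag ID, 4 bytes
        + bytes([48]) + mac_bytes           # MAC length in bits, then the MAC
        + bytes([len(ip_bytes) * 8]) + ip_bytes  # IP length in bits, optional IP
        + vni.to_bytes(3, "big")            # label field carrying the VNI
    )
    return bytes([2, len(body)]) + body     # route type, length, body


if __name__ == "__main__":
    rd = encode_rd_type1("1.1.1.2", 1010)
    nlri = encode_mac_ip_route(rd, "0050.5686.94d2", 1010, ip="10.10.10.102")
    print(len(nlri), nlri.hex())
```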
EVPN Terminology – EVPN route types
• Standard defines five new route types
Type 1 – Auto-Discovery segment route: used in multi-homing deployments to allow dynamic discovery. Use case: EVPN multi-homing
Type 2 – MAC address route: advertisement of locally learnt/provisioned MAC addresses and, optionally, IP addresses. Use case: L2 VPNs and IRB
Type 3 – Inclusive Multicast Ethernet route: used to advertise EVI/VNI membership for the creation of ingress replication lists. Use case: L2 VPNs
Type 4 – Ethernet Segment route: used in multi-homing deployments to allow the dynamic discovery of shared Ethernet segments. Use case: EVPN multi-homing
Type 5 – IP prefix route: advertisement of an IP prefix and next-hop; no MAC address is advertised for the route. Use case: L3 VPNs
EVPN Operation - Layer 2 VPNs
• Layer 2 VPN with EVPN
- VTEPs sharing the same MAC-VRF (L2 domain) have a BGP EVPN peering session
- Locally learnt MACs in the MAC-VRF are advertised with a Type-2 route carrying the RT of the MAC-VRF, NH = VTEP
- Advertised VNI label identifies the Layer 2 domain in the data plane
- Option to advertise the host IP address for ARP suppression on the remote VTEP
EVPN Operation - Layer 2 VPNs
• Sequence number for MAC mobility
- After a MAC move, a new sequence number is added to the Type-2 route announcement
- Higher sequence number allows VTEPs to determine the most up-to-date MAC advertisement – fast MAC refresh
• MAC address damping
- After advertisement of a locally learnt MAC, a timer of M seconds (default 180s) is started
- If N (5) MAC mobility events are detected within the 180s (M) window, updates stop
[Figure: MAC move between VTEP-1 (Et-1) and VTEP-2 (Et-2) across the Layer 3 IP core, subnet 10.10.4.0/24. MAC-1 is advertised with sequence = 003, then with sequence = 004 after the move. MAC table entries shown: MAC-1 -> IP-1, MAC-1 -> IP-2, MAC-1 -> Eth1. Higher sequence number: MAC table updated.]
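The sequence-number and damping rules can be sketched as a small Python model. It is a simplification under stated assumptions: the class name `MacMobilityTable` is hypothetical, the window is measured between move events, and sequence-number wraparound and recovery from the damped state are not modelled.

```python
class MacMobilityTable:
    """Toy model of Type-2 route handling with MAC mobility and damping."""

    def __init__(self, window_s=180.0, max_moves=5):
        self.window_s = window_s  # M on the slide
        self.max_moves = max_moves  # N on the slide
        self.entries = {}     # mac -> (vtep, sequence)
        self.move_times = {}  # mac -> timestamps of recent moves
        self.damped = set()

    def advertise(self, mac, vtep, sequence, now):
        """Process an advertisement; return True if the MAC table was updated."""
        if mac in self.damped:
            return False
        current = self.entries.get(mac)
        if current is not None and sequence <= current[1]:
            return False  # stale or duplicate advertisement
        if current is not None and current[0] != vtep:
            recent = [t for t in self.move_times.get(mac, []) if now - t < self.window_s]
            recent.append(now)
            self.move_times[mac] = recent
            if len(recent) >= self.max_moves:
                self.damped.add(mac)  # too many moves in the window: stop updates
                return False
        self.entries[mac] = (vtep, sequence)
        return True


if __name__ == "__main__":
    table = MacMobilityTable()
    table.advertise("MAC-1", "IP-1", 3, now=0.0)
    table.advertise("MAC-1", "IP-2", 4, now=10.0)  # higher sequence wins
    print(table.entries["MAC-1"])
    print(table.advertise("MAC-1", "IP-1", 3, now=11.0))  # stale, ignored
```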
EVPN Operation - Layer 2 VPNs
• Layer 2 VPN, support for multi-homing for CPE resiliency
- Multi-home CPEs to multiple PE nodes, not limited to a pair of PEs
- Provides both active-active and active-standby forwarding models
- Multi-homed PE nodes do not need to be interconnected via a “peer” link
- Standard LACP LAG connection to the PE nodes
[Figure: three multi-homing models, each with an EVI per PE and no interlink required: Active-standby (Active-PE, Standby-PE), Active-Active (two Active-PEs), Multi-node Active-Active (several Active-PEs)]
Multi-homing for resiliency and active-active forwarding
EVPN - Configuration
[Figure: VTEP-1 and VTEP-2, each with an EVI and MAC VRF-1, joined by the VXLAN data plane; subnet 10.10.4.0/24; hosts MAC-1 and MAC-2. EVPN Type 2 route: learnt MAC + IP [optional], RT and RD, VPN label = 1010, NH VTEP-2.]
! VTEP-2 (1.1.1.2)
interface Loopback1
   ip address 1.1.1.2/32
!
interface Ethernet1
   switchport mode trunk
   switchport trunk allowed vlan 10
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan vlan 10 vni 1010
!
router bgp 65001
   neighbor VTEP1 remote-as 65001
   !
   address-family evpn
      neighbor VTEP1 activate
   !
   vlan 10
      rd 1.1.1.2:1010
      route-target both 1010:1010
      redistribute learned
! VTEP-1 (1.1.1.1)
interface Loopback1
   ip address 1.1.1.1/32
!
interface Ethernet1
   switchport mode trunk
   switchport trunk allowed vlan 10
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan vlan 10 vni 1010
!
router bgp 65001
   neighbor VTEP2 remote-as 65001
   !
   address-family evpn
      neighbor VTEP2 activate
   !
   vlan 10
      rd 1.1.1.1:1010
      route-target both 1010:1010
      redistribute learned
EVPN – Routing Table
• Type-2 MAC and IP route learnt from VTEP-2 (1.1.1.2)
[Figure: VTEP-1 and VTEP-2, each with an EVI and MAC VRF-1, joined by the VXLAN data plane; subnet 10.10.4.0/24; hosts MAC-1 (10.10.10.101) and 0050.5686.94d2 (10.10.10.102). EVPN Type 2 route: learnt MAC + IP [optional], RT and RD, VPN label = 1010, NH VTEP-2.]
BGP routing table entry for mac-ip 0050.5686.94d2 10.10.10.102, Route Distinguisher: 1.1.1.2:1010
 Paths: 2 available
  65001
    1.1.1.2 from 1.1.1.2 (1.1.1.2)
      Origin IGP, metric -, localpref 100, weight 0, valid, external, ECMP head, best, ECMP contributor
      Extended Community: Route-Target-AS:1010:1010 TunnelEncap:tunnelTypeVxlan
      VNI: 1010 ESI: 0000:0000:0000:0000:0000

VTEP-1(config)#show vxlan address-table
          Vxlan Mac Address Table
----------------------------------------------------------------------
VLAN  Mac Address     Type  Prt  VTEP     Moves  Last Move
----  -----------     ----  ---  ----     -----  ---------
  10  0050.5686.94d2  EVPN  Vx1  1.1.1.2  1      0:04:35 ago
Summary
• VXLAN is an adopted standard, supported in merchant silicon
- Simple, lean, Layer 2 MAC in IP encapsulation protocol
- Jericho/Jericho+, Trident2, Trident2+, Tomahawk, XP etc
- Standard VLAN based interfaces – for security and broadcast control
Control Plane Options
• Head-End Replication
- Flood and learn control plane
- Lightweight, low-touch configuration
- Ideal for small to medium IXP environments
• BGP based EVPN control plane
- Controlled learning via BGP, BGP-centric config
- ARP suppression and broadcast optimization
- Ideal for medium and larger IXP environments
www.arista.com
Thank You