Multi-site Datacenter Network Infrastructures
Petr Grygárek
© 2009 Petr Grygarek, Advanced Computer Networks Technologies


Why Multisite Datacenters?
• Resiliency against large-scale site failures (geodiversity)
    ● fire, wide-area power outages, political reasons, legal regulations, ...
• Disaster recovery
• Easier handling of planned outages
    ● Workload migration to unaffected site
• Traffic optimization
    ● choose ingress point closer to requesting client


Interconnection of DC Sites: Traditional Requirements & Architecture
• L3
    ● IP or MPLS
• Optionally L2
    ● traditional design
Technically, L2 and L3 interconnection can be implemented on a single set of network devices (MPLS, EVPN, ...)


Multi-site PoDs


Options of L3 extension between sites
• Dedicated core (IP-only or MPLS)
    ● MPLS/VPN is beneficial for separating multiple tenants
• DMVPN over shared core (or Internet)
    ● Multiple VRF instances, in tenants' VRFs
• EVPN (L3 encapsulation)
    ● Tenants are separated using L3 VNID
• ...


Why extend L2 between sites?
• Server admins like transparent VM mobility
• Distributed clusters for better resilience
    ● FWs, LBs, NASs, ...
• Server clusters
    ● e.g. using Windows NLB
Be aware that using technologies originally developed as „local“ in multi-site environments always needs careful consideration
• Timers built into application software or hardware appliances (e.g. storage clusters)


L2 Extensions between DC Sites
● Dual-site (special, simplified case)
    ● P2P virtual links: QinQ, EoMPLS/AToM, …
    ● Multichassis EtherChannel/Virtual chassis P2P L2 technologies
        – Cisco vPC/VSS, Dell VLT, Juniper VC, ...
● General topology:
    ● Redundant switched network with STP (not recommended!), also includes QinQ
    ● Distributed virtual chassis
        – if latency between sites fits within the solution's limits
    ● TRILL/FabricPath
    ● VPLS
    ● Cisco OTV
    ● VxLANs (advantage: can solve both L2 and L3 extension)
    ● Various SDN interconnect „clouds“ (mostly IP overlay)


© 2005 Petr Grygarek, VSB-TU Ostrava, Routed and Switched Networks

Transparent Interconnection of Lots of Links (TRILL)
• IETF standard (RFC 6325); the comparable IEEE standard is 802.1aq (Shortest Path Bridging)
• L2 multipath solution
    ● eliminates Spanning Tree, no stability issues
    ● no blocked ports
    ● reduced latency – shortest path always used
    ● alternative active paths (equal-cost)
        – path selection based on data packet header hash ensures ordered delivery
• L2 frame encapsulation („L2 over L3“)
    ● new header carries egress switch identity
• ISIS-like internal routing of encapsulated frames
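The hash-based choice among equal-cost paths can be sketched as follows. This is a minimal illustration, not TRILL's actual hashing: the function name `pick_path`, the CRC32 hash and the choice of source/destination MAC as hash input are all assumptions made for the example.

```python
import zlib

def pick_path(src_mac: str, dst_mac: str, paths: list[str]) -> str:
    """Pick one equal-cost path from a hash of frame header fields.

    Hypothetical helper: real switches hash in hardware over
    implementation-specific header fields.
    """
    key = f"{src_mac}->{dst_mac}".encode()
    return paths[zlib.crc32(key) % len(paths)]

paths = ["via-rbridge-A", "via-rbridge-B", "via-rbridge-C"]
first = pick_path("00:11:22:33:44:55", "66:77:88:99:aa:bb", paths)
# All frames of one flow hash to the same path, so they cannot overtake each other.
same_flow = {pick_path("00:11:22:33:44:55", "66:77:88:99:aa:bb", paths) for _ in range(100)}
print(first, len(same_flow))
```

Different flows may land on different paths, which spreads the load while each individual flow keeps ordered delivery.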


TRILL Principles
• Rbridge – TRILL-capable bridge
    ● Ingress, egress, TRILL cloud internal
• Switches have identities, ISIS calculates shortest paths between switches
    ● ISIS chosen as it runs directly on L2 (no TCP/IP) and is generic enough (new TLVs)
• 2-level switching hierarchy
    ● Only Rbridge addresses have to be known in TRILL core
    ● Smaller MAC address tables, better scalability
• Data-plane MAC address learning:
    ● Conversational learning used to save space in MAC tables
        – Only src MACs of unicast frames to destinations known to be local are learned
    ● Backward learning still used to learn addresses from outside of TRILL cloud
    ● Ingress Rbridge maintains <MAC, egress Rbridge> or <MAC, local port> records (VLANs also supported)
• Optional control-plane MAC address learning
    ● End Station Address Distribution Information (ESADI)
    ● Proactive MAC address propagation (configurable per VLAN), cryptographically secured, fast location updates
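Conversational learning as described above can be sketched roughly as below. The function and parameter names are hypothetical, and the MAC table is simplified to a dictionary mapping a remote MAC to the Rbridge it was seen behind.

```python
def conversational_learn(mac_table: dict, local_macs: set, src_mac: str,
                         dst_mac: str, ingress_rbridge: str) -> bool:
    """Learn a remote source MAC only if the frame is addressed to a local host.

    Simplified model: VLANs, ageing and multicast handling are ignored.
    """
    if dst_mac not in local_macs:
        return False  # not part of a local conversation: do not spend a table entry
    mac_table[src_mac] = ingress_rbridge
    return True

table = {}
local = {"aa:aa:aa:aa:aa:01"}
conversational_learn(table, local, "bb:bb:bb:bb:bb:02", "aa:aa:aa:aa:aa:01", "RB2")
conversational_learn(table, local, "cc:cc:cc:cc:cc:03", "dd:dd:dd:dd:dd:04", "RB3")
print(table)  # only the sender talking to a local host is recorded
```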


TRILL vs. Cisco FabricPath
• TRILL Rbridges may be interconnected via legacy Ethernet clouds
    ● Not meaningful in DC environment, Cisco FabricPath (TRILL alternative) does not support this
• Next-hop header allows passing of TRILL frame over each legacy internal Ethernet segment (even VLAN-based), if any
    ● DST MAC in outer header specifies next-hop Rbridge
    ● For each legacy Ethernet interconnect segment a single Rbridge is elected (per VLAN) to avoid looping/frame duplication
• Inner header allows routing of TRILL frame to egress Rbridge
• Loop protection: Hop count (TTL) in TRILL header


TRILL Multidestination Frame Forwarding
• For broadcasts, unknown unicasts and multicasts (BUM)
• One or more distribution trees covering all egress Rbridges are calculated
• The distribution tree used to distribute a particular data frame (destination root switch) is chosen based on the „destination Rbridge address“ field of the encapsulating frame


VxLANs


VxLANs - Usage and Principles
• Virtual eXtensible LANs
• Layer 2 overlay over a Layer 3 network
    ● Stateless tunnelling between VTEPs
    ● Various payload-to-VTEP address mapping mechanisms
    ● UDP encapsulation
        – well-known destination port 4789, „random“ src port to support ECMP hashing
            ● Src port is actually a hash of the data frame header
        – MTU increase in transport network is needed
            ● fragmentation not desirable
• Originally intended for inter-hypervisor traffic via L3 infrastructure
• Multitenant overlay switching/routing fabrics based on IP ECMP underlay and VxLANs are commonly seen today
    ● Traditional VLANs limited to rack with particular leaf
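The source-port hashing and the MTU increase mentioned above can be illustrated with a short sketch. The assumptions are: an IPv4 underlay, no inner 802.1Q tag, CRC32 as a stand-in for the vendor-specific hash, and the dynamic port range 49152-65535 that RFC 7348 recommends. The function names are hypothetical.

```python
import zlib

VXLAN_UDP_DST_PORT = 4789

# Bytes added in front of the tenant IP packet inside the underlay IP packet:
# outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet header (14).
ENCAP_OVERHEAD_IPV4 = 20 + 8 + 8 + 14

def vxlan_src_port(inner_header: bytes, low: int = 49152, high: int = 65535) -> int:
    """Derive the outer UDP source port from a hash of the inner frame header."""
    return low + zlib.crc32(inner_header) % (high - low + 1)

def required_underlay_mtu(tenant_ip_mtu: int = 1500) -> int:
    """Underlay IP MTU needed to carry tenant packets without fragmentation."""
    return tenant_ip_mtu + ENCAP_OVERHEAD_IPV4

print(required_underlay_mtu(1500))            # 1550
print(vxlan_src_port(b"example-flow-header"))  # stable per flow, varies across flows
```

Because the source port depends only on the inner header, the underlay ECMP hash sees different 5-tuples for different tenant flows while keeping each flow on one path.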


VTEPs, bridging & routing
• VXLAN Tunnel End Point (VTEP)
• Each VXLAN segment (VNID) is mapped to an IP multicast group in the transport IP network to carry BUM traffic
    ● alternatively, unicast replication can be used
• VLAN-VxLAN bridging
    ● hardware or software gateway
    ● combines traditional L2 VLAN and VxLAN segment into a single broadcast domain
• Inter-VxLAN routing
    ● also routing between VxLAN VTEP and traditional L3 interfaces
• Anycast gateway – same GW IP address „present“ on all VxLAN fabric leafs


VXLAN header
(http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-729383.html)
• VxLAN header
    ● 24b segment ID
    ● Reserved/Flags – for future extensibility
        – utilized by various „SDN“ technologies (e.g. Cisco APIC)
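As an illustration of the header layout, the following sketch packs and parses the 8-byte VXLAN header as laid out in RFC 7348: 8 flag bits (the I flag, 0x08, marks a valid VNI), 24 reserved bits, the 24-bit segment ID and 8 more reserved bits. The helper names are hypothetical.

```python
import struct

VNI_VALID_FLAG = 0x08  # "I" flag: the VNI field is valid

def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit segment ID."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits. Word 2: VNI, 8 reserved bits.
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)

def unpack_vni(header: bytes) -> int:
    """Extract the segment ID, rejecting headers without the I flag."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VNI_VALID_FLAG:
        raise ValueError("VNI flag not set")
    return word2 >> 8

header = pack_vxlan_header(5000)
print(header.hex())        # 0800000000138800
print(unpack_vni(header))  # 5000
```

The 24-bit field allows about 16 million segments, compared with 4094 usable IDs in a 12-bit 802.1Q VLAN tag.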


VxLANs implementation scenarios
• MAC address learning options
    ● From data plane (+ multicast BUM traffic)
    ● Separate control plane
        – EVPN
            ● Proactive MAC propagation using BGP (dedicated AF) – limits or eliminates flooding
            ● automatic VTEP discovery via BGP updates
            ● also handles redundant attachment of the same switched segment
        – SDN
            ● SDN controller / virtual machine manager has full knowledge of which VMs reside on individual hypervisors/VTEPs
        – any other defined in future


MP-BGP EVPN
• Standards-based
    ● MP-BGP control plane (VxLAN or MPLS data encapsulation)
• Combines intra-VxLAN bridging with inter-VxLAN routing (integrated routing and bridging - IRB)
    ● Optimal for both east-west and north-south traffic (host routing)
• BGP propagates host routes (MACs and IP /32s) and site IP prefixes (/nn)
    ● VxLAN dataplane flooding is still used for BUM traffic
        – /nn prefixes may help to route to IP hosts
• Knowledge of MAC/IP pairs can also limit ARP flooding (ARP suppression)
• Supports multiple attachments of the same L2 segment


EVPN Terminology
• EVPN Instance (EVI) = virtual switch (with VLANs)


Address Learning
• Local learning
    ● MAC addresses - from source MAC of data frames (mandatory)
    ● IP addresses: learning from DHCP messages (optional)
• Remote learning
    ● BGP MAC+IP advertisements (type 2 routes)


EVPN (Host) Propagation
• Proactive address propagation via MP-BGP
    ● L2vpn-evpn address family
        – L3 address + length, L2 address + length
        – L2 VNID (VxLAN ID), L3 VNID (VRF membership)
    ● Site IP prefixes are used to limit unknown unicast flooding for silent destination hosts
    ● BGP authentication mechanisms can prevent rogue VTEPs
        – Additional data plane anti-spoofing mechanism: VxLAN tunnel source (outer) header must correspond to the respective source's BGP next hop


EVPN Multi-Tenancy Support
• VxLAN (L3) interfaces in separate VRFs
• Route Distinguisher makes host routes with overlapping address spaces (IP, MAC) unique in BGP
• Route Targets regulate who can talk with whom
    ● Route visibility, even between different VRFs
    ● Same concept as in L3 MPLS/VPN
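The roles of the Route Distinguisher and Route Targets can be sketched with a toy model. The `HostRoute` class, the `import_routes` helper and all RD/RT values below are hypothetical and only illustrate the concept; real BGP implementations carry these as NLRI fields and extended communities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HostRoute:
    rd: str                   # Route Distinguisher: makes the prefix unique in BGP
    prefix: str               # tenant address, may overlap between tenants
    route_targets: frozenset  # export RTs attached by the advertising VTEP
    next_hop: str             # advertising VTEP

def import_routes(routes, import_rts: set) -> list:
    """A VRF imports every route that carries at least one of its import RTs."""
    return [r for r in routes if r.route_targets & import_rts]

routes = [
    HostRoute("65000:1", "10.0.0.5/32", frozenset({"65000:100"}), "leaf1"),
    HostRoute("65000:2", "10.0.0.5/32", frozenset({"65000:200"}), "leaf2"),
]
# Same tenant address twice, yet two distinct BGP entries thanks to the RD.
print(len({(r.rd, r.prefix) for r in routes}))                   # 2
print([r.next_hop for r in import_routes(routes, {"65000:100"})])  # ['leaf1']
```

A VRF that lists both RTs as import targets would see both routes, which is how controlled visibility between different VRFs is achieved.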


Host/site multihoming using LAG
• Ethernet segment (ES) = set of links to the same customer site/host
    ● downlink LAG from the respective EVIs of 2 different leaf switches
    ● Identified by Ethernet Segment Identifier (ESI)
        – 0 = single-homed


EVPN BGP routes
• Type 1 – Ethernet Auto-Discovery
    ● Propagates ESI of each leaf SW
    ● Allows leaf SWs connected to the same site (L2 network) to find each other
• Type 2 – MAC/IP Advertisement
    ● also handles multihomed hosts/sites
    ● Propagates ESI
• Type 3 – Inclusive Multicast Ethernet Tag Route
    ● Signalling of multicast tunnel
    ● Multicast group IP address
• Type 4 – Ethernet Segment Route
    ● Used to elect designated forwarder for multicast on each ES
• Type 5 – IP Prefix Route


All-Active Multipath
• By combining Type 2 and Type 1 routes, the source EVI can load-balance between multiple egress leafs with downlinks to the same ES
    ● Even if all traffic from destination to source was always hashed to a single leaf SW, which learnt its MAC address, while the other leaves had never seen it
• If a downlink from a leaf switch to an ES goes down, the leaf sends a BGP withdrawal of the Type 1 route (whole ESI) instead of withdrawing all individual host routes (Type 2) => faster reconvergence after failure
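How Type 1 and Type 2 routes combine can be sketched with two small tables. The data layout and the function name `egress_vteps` are assumptions made for illustration: `type2_routes` maps a MAC to the ESI and the leaf that advertised it, and `type1_routes` maps an ESI to the set of leafs currently advertising that segment.

```python
ESI_SINGLE_HOMED = 0

def egress_vteps(mac: str, type2_routes: dict, type1_routes: dict) -> set:
    """Return every leaf that can deliver traffic for the given MAC."""
    esi, learned_from = type2_routes[mac]
    if esi == ESI_SINGLE_HOMED:
        return {learned_from}
    # Multihomed: all leafs advertising the ES are usable, even those that
    # never learned the MAC themselves.
    return set(type1_routes.get(esi, set()))

type2_routes = {"aa:bb:cc:00:00:01": (10, "leaf1")}
type1_routes = {10: {"leaf1", "leaf2"}}
print(sorted(egress_vteps("aa:bb:cc:00:00:01", type2_routes, type1_routes)))

# leaf1 loses its downlink: one Type 1 withdrawal covers all hosts on the ES.
type1_routes[10].discard("leaf1")
print(sorted(egress_vteps("aa:bb:cc:00:00:01", type2_routes, type1_routes)))
```

A single table update removes leaf1 for every MAC behind ESI 10, which is where the faster reconvergence comes from.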


Broadcast & Unknown Unicast Traffic Flooding
• Unknown hosts shouldn't exist, as BGP advertises all (non-silent) hosts proactively
• ARP broadcasts should not be needed because of ARP proxying (info from BGP host routes)
• Flooding of unknown and broadcast traffic can be enabled/disabled administratively
    ● Ingress replication / underlay (multicast) replication


Multicast routing
• Split horizon replication
    ● From local source to other local segments and to remote VTEPs
    ● From remote source to local segments only
• Designated forwarder on each ES has to be established to avoid multiple copies of the same frame for multihomed sites
• Prevention against a multihomed source hearing its own multicast traffic back over the EVPN core
    ● if the source VTEP IP address in a received VxLAN packet advertises connectivity to the same ES as the local destination ES, the packet is dropped
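The two filters applied to a frame arriving from the EVPN core (designated forwarder and the same-ES check) can be expressed as one decision function. This is a simplified sketch with hypothetical names; the real designated-forwarder election and per-VLAN details are left out.

```python
def should_forward_to_es(local_vtep: str, source_vtep: str,
                         es_members: set, designated_forwarder: str) -> bool:
    """Decide whether local_vtep copies a multidestination frame received
    from the core onto one of its multihomed Ethernet segments."""
    if source_vtep in es_members:
        # The sending VTEP attaches to the same ES, so the site has already
        # seen the frame locally: drop it to avoid an echo.
        return False
    # Only the designated forwarder delivers, so the site gets one copy.
    return local_vtep == designated_forwarder

es_members = {"leaf1", "leaf2"}
print(should_forward_to_es("leaf2", "leaf1", es_members, "leaf2"))  # False
print(should_forward_to_es("leaf2", "leaf7", es_members, "leaf2"))  # True
print(should_forward_to_es("leaf1", "leaf7", es_members, "leaf2"))  # False
```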


MAC Moves
• Leaf1 does not know that MACx has moved until it hears a Type 2 route from Leaf2
    ● Initiates triggered Type 2 route withdrawal
    ● Sequence numbers are used to avoid race conditions in case of multiple moves
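The effect of the sequence numbers can be sketched as follows: an advertisement replaces the current entry only if it carries a higher sequence number, so a delayed update from an earlier location cannot override a newer one. The table layout and function name are hypothetical, and tie-breaking between equal sequence numbers is deliberately omitted.

```python
def accept_advertisement(table: dict, mac: str, vtep: str, seq: int) -> bool:
    """Install a Type 2 advertisement only if it is newer than what is known."""
    current = table.get(mac)
    if current is None or seq > current[1]:
        table[mac] = (vtep, seq)
        return True
    return False  # stale or duplicate advertisement

table = {}
accept_advertisement(table, "MACx", "leaf1", 0)  # first seen behind leaf1
accept_advertisement(table, "MACx", "leaf2", 1)  # host moved to leaf2
accept_advertisement(table, "MACx", "leaf1", 0)  # late copy of the old route: ignored
print(table["MACx"])  # ('leaf2', 1)
```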


EVPN: Inter-VxLAN routing options
• Distributed anycast gateway
    ● L3 VxLAN interface on multiple switches
    ● same IP and MAC on multiple switches
• Asymmetric mode: Inter-VxLAN routing on source VTEP
    ● All VxLANs' L3 interfaces must be present on all switches
    ● after the packet is routed, it is carried in the destination VxLAN to the egress VTEP (egress VTEP does only L2 lookup)
    ● Unscalable (source VTEP must maintain host routes of all VxLANs)
• Symmetric mode (preferred): Inter-VxLAN routing using (dynamic) virtual L3 P2P segment between source and destination VTEP switch
    ● Source and destination MAC on virtual L3 interconnect segment are rewritten according to well-known stateless procedure (corresponds to standard behaviour on L3 interconnect segment)
    ● Packet arrives at destination switch from virtual L3 segment and is then routed to destination VxLAN
    ● Both ingress and egress VTEP must do L3 lookup (+ L2 lookup)


EVPN External Traffic
• BGP may convert between L2VPN (fabric internal) and IPv4 unicast routes (external connections)
    ● or, for per-VRF external routes, the VPNv4 AF


See other presentations for other inter-DC L2 technologies


Multisite DC without L2 extension
● VM migration across different subnets (keeping original IP address) is often required
    ● Keeping VM identity (including FW openings) and established sessions
● Potential alternative solutions
    ● Load-balancer frontend (L4 session termination or NAT)
    ● Mobile IP (not actually used today)
    ● LISP


LISP
Locator/ID Separation Protocol


LISP Motivation
• Identity and Location are mixed together in today's IP routing scheme
• Advent of server virtualization made servers (VMs) mobile
    • Mobility is advantageous for various operational reasons
• VMs are decoupled from physical infrastructure (IP address does not imply physical location anymore)
• VM still has to keep its identity, even if moved between subnets
• VLAN extension over the whole world is not a solution
• Same principle may apply to Internet multihoming with provider-independent addressing


LISP Principles (1)
• Location / IP address separation
    • Endpoint Identifier (EID)
    • Routing Locator (RLOC)
• Separate address spaces for IDs and Locators
    • arbitrary values (e.g. MAC + GPS) or IP addresses (endpoints) + IP addresses (tunnel)
    • RIPE currently provides experimental registry for EID prefixes (/48)
    • additional level of indirection
• Mapping of EIDs to RLOCs is needed
• Traffic is tunnelled to the current position of the EID (identified by one or multiple alternative RLOCs)
    • Ingress Tunnel Router (ITR): source site → LISP
    • Egress Tunnel Router (ETR): LISP → destination site
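The EID-to-RLOC lookup an ITR performs before tunnelling can be sketched as a longest-prefix match over a local map cache. The cache layout, the function name `lookup_rlocs` and all addresses (documentation ranges) are assumptions for illustration; a miss would in practice trigger a query to the mapping system.

```python
import ipaddress

def lookup_rlocs(map_cache: dict, eid: str):
    """Return the RLOC list of the most specific EID prefix covering eid."""
    addr = ipaddress.ip_address(eid)
    matches = [net for net in map_cache if addr in net]
    if not matches:
        return None  # cache miss: the ITR would ask the mapping system
    return map_cache[max(matches, key=lambda net: net.prefixlen)]

map_cache = {
    ipaddress.ip_network("10.1.0.0/16"): ["192.0.2.1"],      # whole site
    ipaddress.ip_network("10.1.2.3/32"): ["198.51.100.7"],   # VM that moved away
}
print(lookup_rlocs(map_cache, "10.1.2.3"))    # ['198.51.100.7']
print(lookup_rlocs(map_cache, "10.1.9.9"))    # ['192.0.2.1']
print(lookup_rlocs(map_cache, "172.16.0.1"))  # None
```

The host entry overriding the site prefix is what lets a migrated VM keep its EID while traffic is tunnelled to its new location.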


LISP Principles (2)
• Various mapping mechanisms were proposed
    • Mapping server(s)/resolver(s)
    • LISP-ALT topology
        • BGP over GRE between LISP routers (normal IPv4/v6 AFs for EIDs)
        • Map requests and responses are routed over it
    • DDNS-like structure
• Standardized Query/Reply messages
    • Resolver proxies info from mapping mechanism for clients
    • Client can find suitable resolver e.g. using anycasting
• EIDs may be advertised as whole subnets or per host
• Multiple RLOCs may be advertised for a single EID
    • Priority differentiates between alternative ETRs
    • Weight defines load share between ETRs of same priority
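The interplay of priority and weight can be sketched as below. In LISP a lower priority value is preferred and the value 255 marks a locator that must not be used; the tuple layout, the function name and the assumption that weights are positive are specific to this example.

```python
import random

def choose_rloc(rlocs, rng=random):
    """Pick an RLOC from (address, priority, weight) tuples.

    The best (lowest) priority group is selected first, then traffic is
    shared inside that group in proportion to the weights.
    """
    usable = [r for r in rlocs if r[1] != 255]
    best = min(priority for _, priority, _ in usable)
    group = [r for r in usable if r[1] == best]
    weights = [weight for _, _, weight in group]
    return rng.choices(group, weights=weights, k=1)[0][0]

rlocs = [
    ("192.0.2.1", 1, 75),     # preferred ETR, about 75 % of flows
    ("192.0.2.2", 1, 25),     # preferred ETR, about 25 % of flows
    ("198.51.100.1", 2, 100), # backup, used only if priority 1 disappears
]
print(choose_rloc(rlocs))
```

Removing both priority-1 entries from the list makes the same function fall back to the backup ETR.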


LISP Principles (3)
• Preconfigured EID (subnet) to RLOC mapping is useful e.g. for subnet multihoming
    • cases where EID does not move
• New EID mapping may be published dynamically if ETR detects arrival of VM to its subnet
    • ETR informs other ITR/ETRs (multicast group) and registers new RLOC(s) with mapping server


LISP VM moves between L3 segments (distributed DC case)
• Outgoing traffic uses original VM's default GW
    • VM does not know that it has been migrated
• Proxy ARP is used for cold migration scenario
    • VM's ARP cache is empty
• Manual synchronization of gateway HSRP/VRRP MAC addresses between DC sites (different IP segments) can solve hot migration case
    • in addition to proxy ARP