Introduction to the Cloud Computing Network Control Plane
Architecture and Protocols for Data Center Networks
Outline
❒ Data Center Networking Basics
❒ Problem Solving with Traditional Design Techniques
❒ Virtual/Overlay Network Functional Architecture
❒ Virtual/Overlay Network Design and Implementation
Data Center Networking Basics
Lots of Servers
❒ Data centers consist of massive numbers of servers
❍ Up to 100,000's
❒ Each server has multiple processors
❍ 8 or more
❒ Each processor has multiple cores
❍ 32 max for commodity processors, more coming
❒ Each server has multiple NICs
❍ Usually at least 2 for redundancy
❍ 1G common, 10G on the upswing
Source: http://img.clubic.com/05468563-photo-google-datacenter.jpg
Mostly Virtualized
❒ Hypervisor provides a compute abstraction layer
❍ Looks like hardware to the operating system
❍ OSes run as multiple Virtual Machines (VMs) on a single server
❒ Hypervisor maps VMs to processors
❍ Virtual cores (vCores)
❒ Virtual switch provides networking between VMs and to the DC network
❍ Virtual NICs (vNICs)
❒ Without oversubscription, usually as many VMs as cores
❍ Up to 256 for 8p x 32c
❍ Typical is 32 for 4p x 8c
❒ VMs can be moved from one machine to another
[Figure: server hardware running a hypervisor that hosts VM1-VM4; each VM attaches through a vNIC to a virtual switch, which connects to the physical NIC1 and NIC2]
Data Center Network Problem
❒ For a single virtualized data center built with cheap commodity servers:
❍ 32 VMs per server
❍ 100,000 servers
❍ 32 x 100,000 = 3.2 million VMs!
❒ Each VM needs a MAC address and an IP address
❒ Infrastructure needs IP and MAC addresses too
❍ Routers, switches
❍ Physical servers for management
❒ Clearly a scaling problem!
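The arithmetic above is worth making concrete; a minimal back-of-the-envelope sketch using only the figures on this slide (the IPv4 prefix comparison is an added illustration, not from the slide):

```python
# Back-of-the-envelope scale of the addressing problem (figures from the slide)
servers = 100_000
vms_per_server = 32            # e.g. 4 processors x 8 cores, one VM per core
vms = servers * vms_per_server
print(f"{vms:,} VMs, each needing a MAC and an IP")   # 3,200,000

# For comparison: a single IPv4 /10 holds ~4.2M addresses, a /8 ~16.8M,
# so host addressing alone pushes well past one comfortable subnet.
print(f"/10 capacity: {2**22:,}  /8 capacity: {2**24:,}")
```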
Common Data Center Network Architectures: Three Tier
❒ Server NICs connected directly to edge switch ports
❒ Aggregation layer switches connect multiple edge switches
❒ Top layer switches connect aggregation switches
❍ Top layer can also connect to the Internet
❒ Usually some redundancy
❒ Pluses
❍ Common
❍ Simple
❒ Minuses
❍ Top layer massively over-subscribed
❍ Reduced cross-sectional bandwidth
• 4:1 oversubscription means only 25% of bandwidth available (see the sketch below)
❍ Scalability at top layer requires expensive enterprise switches
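A quick illustration of the oversubscription figure, assuming 10 Gb/s server NICs (the NIC rate is an illustrative assumption, not from the slide):

```python
# Cross-sectional bandwidth per server under N:1 oversubscription at the top tier,
# assuming each server has a 10 Gb/s NIC (illustrative value).
nic_gbps = 10
for oversub in (1, 4, 8):
    print(f"{oversub}:1 -> {nic_gbps / oversub:.2f} Gb/s per server "
          f"({100 / oversub:.0f}% of NIC rate) when all servers transmit at once")
```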
Source: K. Bilal, S. U. Khan, L. Zhang, H. Li, K. Hayat, S. A. Madani, N. Min-Allah, L. Wang, D. Chen, M. Iqbal, C.-Z. Xu, and A. Y. Zomaya, "Quantitative Comparisons of the State of the Art Data Center Architectures," Concurrency and Computation: Practice and Experience,
vol. 25, no. 12, pp. 1771-1783, 2013.
[Figure labels: Top of Rack (ToR) Switch; End of Row Switch (sometimes); the top-tier devices can be IP routers (for more €s)]
Common Data Center Network Architectures: Fat Tree
❒ CLOS network, with origins in the 1950's telephone network
❒ Data center divided into k pods
❒ Each pod has 2·(k/2) = k switches
❍ k/2 access, k/2 aggregation
❒ Core has (k/2)² switches
❒ 1:1 oversubscription ratio and full bisection bandwidth
❒ Pluses
❍ No oversubscription
❍ Full bisection bandwidth
❒ Minuses
❍ Needs a specialized routing and addressing scheme
❍ Number of pods limited to the number of ports on a switch
❍ Maximum # of pods = # switch ports
Source: Bilal, et al.
k=4 Example
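A quick worked sizing of the fat tree using the pod/core formulas above; the k = 48 row is an added illustration for a commodity 48-port switch, not from the slide:

```python
# Fat-tree sizing: k pods, k switches per pod (k/2 edge + k/2 aggregation),
# (k/2)^2 core switches, and k^3/4 hosts (k/2 hosts per edge switch).
def fat_tree(k):
    return {
        "pods": k,
        "switches_per_pod": k,
        "core_switches": (k // 2) ** 2,
        "hosts": k ** 3 // 4,
    }

print(fat_tree(4))    # the k=4 example: 4 pods, 4 core switches, 16 hosts
print(fat_tree(48))   # 48-port switches: 576 core switches, 27,648 hosts
```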
Problem Solving with Traditional Design Techniques
Problem #1: ARP/ND Handling
❒ IP nodes use ARP (IPv4) and Neighbor Discovery (ND, IPv6) to resolve IP addresses to MAC addresses
❍ Broadcast (ARP) and multicast (ND)
❒ Problem:
❍ Broadcast forwarding load on large, flat L2 networks can be overwhelming
Source: http://www.louiewong.com/wp-content/uploads/2010/09/ARP.jpg
Problem #2: VM Movement
❒ Data center operators need to move VMs around
❍ Reasons: server maintenance, server optimization for energy use, performance improvement, etc.
❍ MAC address can stay fixed (provided it is unique in the data center)
❍ If the subnet changes, the IP address must change because it is bound to the VM's location in the topology
• For "hot" migration, the IP address cannot change
❒ Problem:
❍ How broadcast domains are provisioned affects where VMs can be moved
Source: http://www.freesoftwaremagazine.com/files/nodes/1159/slide4.jpg
Solutions Using Traditional Network Design Principles: IP Subnets
❒ ToR == last hop router
❍ Subnet (broadcast domain) limited to the rack
❍ Good broadcast/multicast limitation
❍ Poor VM mobility
❒ Aggregation switch == last hop router
❍ Subnet limited to the racks controlled by the aggregation switch
❍ Complex configuration
• Subnet VLAN to all access switches and servers on the served racks
❍ Moderate broadcast/multicast limitation
❍ Moderate VM mobility
• To any rack covered
❒ Core switch/router == last hop router
❍ Poor broadcast/multicast limitation
❍ Good VM mobility
Note: These solutions only work if the data center is single tenant!
Source: Bilal, et al.
Where to put the last hop router?
Problem #3: Dynamic Provisioning of Tenant Networks
❒ Virtualized data centers enable renting infrastructure to outside parties (aka tenants)
❍ Infrastructure as a Service (IaaS) model
❍ Amazon Web Services, Microsoft Azure, Google Compute Engine, etc.
❒ Customers get dynamic server provisioning through VMs
❍ Expect the same dynamic "as a service" provisioning for networks too
❒ Characteristics of a tenant network
❍ Traffic isolation
❍ Address isolation
• From other tenants
• From infrastructure
Solution Using Traditional Network Design Principles
❒ Use a different VLAN for each tenant network
❒ Problem #1
❍ There are only 4096 VLAN tags for 802.1q VLANs* (see the arithmetic below)
❍ Forces tenant network provisioning along physical network lines
❒ Problem #2
❍ For fully dynamic VM placement, each ToR-server link must be dynamically configured as a trunk
❒ Problem #3
❍ Can only move VMs to servers where the VLAN tag is available
• Ties VM movement to the physical infrastructure
*except for carrier Ethernet, about which more shortly
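The tag-space arithmetic behind Problem #1, contrasted with the 24-bit virtual network contexts used by the overlay encapsulations discussed later (the VxLAN/NVGRE field sizes come from their specifications and are added here only for comparison):

```python
# 802.1Q VLAN ID is 12 bits; VxLAN VNI and NVGRE VSID are 24 bits.
vlan_ids = 2 ** 12           # 4096 (two values are reserved in practice)
overlay_contexts = 2 ** 24   # ~16.7 million tenant networks
print(vlan_ids, overlay_contexts, overlay_contexts // vlan_ids)
# 4096 16777216 4096  -> 4096x more tenant networks than 802.1Q tags
```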
Summary
❒ Configuring subnets based on a hierarchical switch architecture always results in a tradeoff between broadcast limitation and VM movement freedom
❍ On top of which, can't achieve traffic isolation for multitenant networks
❒ Configuring multitenant networks with VLAN tags for traffic isolation ties tenant configuration to the physical data center layout
❍ Severely limits where VMs can be provisioned and moved
❍ Requires complicated dynamic trunking
❒ For multitenant, virtualized data centers, there is no good solution using traditional techniques!
Virtual/Overlay Network Functional Architecture
Virtual Networks through Overlays
Source: Bilal, et al. [Figure: a blue tenant network and a yellow tenant network overlaid on the physical topology]
❒ Basic idea of an overlay:
❍ Tunnel tenant packets through the underlying physical Ethernet or IP network
❍ The overlay forms a conceptually separate network providing a separate service from the underlay
❒ L2 service like VPLS or EVPN
❍ Overlay spans a separate broadcast domain
❒ L3 service like BGP IP VPNs
❍ Different tenant networks have separate IP address spaces
❒ Dynamically provision and remove overlays as tenants need network service
❒ Multiple tenants with separate networks on the same server
Advantages of Overlays
❒ Tunneling is used to aggregate traffic
❒ Addresses in the underlay are hidden from the tenant
❍ Inhibits unauthorized tenants from accessing data center infrastructure
❒ Tenant addresses in the overlay are hidden from the underlay and from other tenants
❍ Multiple tenants can use the same IP address space
❒ Overlays can potentially support large numbers of tenant networks
❒ Virtual network state and end node reachability are handled in the end nodes
Challenges of Overlays
❒ Management tools to co-ordinate overlay and underlay
❍ Overlay networks probe for bandwidth and packet loss, which can lead to inaccurate information
❍ Lack of communication between overlay and underlay can lead to inefficient usage of network resources
❍ Lack of communication between overlays can lead to contention and other performance issues
❒ Overlay packets may fail to traverse firewalls
❒ Path MTU limits may cause fragmentation
❒ Efficient multicast is challenging
Functional Architecture: Definitions
❒ Virtual Network
❍ Overlay network defined over the Layer 2 or Layer 3 underlay (physical) network
❍ Provides either a Layer 2 or a Layer 3 service to the tenant
❒ Virtual Network Instance (VNI) or Tenant Network
❍ A specific instance of a virtual network
❒ Virtual Network Context (VNC)
❍ A tag or field in the encapsulation header that identifies the specific tenant network
Functional Architecture: More Definitions
❒ Network Virtualization Edge (NVE)
❍ Data plane entity that sits at the edge of an underlay network and implements L2 and/or L3 network virtualization functions
• Example: virtual switch aka Virtual Edge Bridge (VEB)
❍ Terminates the virtual network towards the tenant VMs and towards outside networks
❒ Network Virtualization Authority (NVA)
❍ Control plane entity that provides information about reachability and connectivity for all tenants in the data center
Overlay Network Architecture
[Figure: overlay network architecture — three NVEs at the edge of the Data Center L2/L3 network attach Tenant Systems via LAN links, point-to-point links, or end-system integration; the NVA reaches the NVEs over the control plane, while tenant traffic flows over the data plane]
Virtual/Overlay Network Design and Implementation
Implementing Overlays: Tagging or Encapsulation?
❒ At or above Layer 2 but below Layer 3:
❍ Insert a tag at a standards-specified place in the pre-Layer 3 header
❒ At Layer 3:
❍ Encapsulate the tenant packet with an encapsulation protocol header and an IP header
❒ Tenant network identified by the Virtual Network Context
❍ The tag, for tagging
❍ A context identifier in the protocol header, for encapsulation
L2 Virtual Networks: Tagging Options
❒ Simple 802.1q VLANs
❍ 4096 limit problem
❍ Trunking complexity
❒ MPLS
❍ Nobody uses MPLS directly on the switching hardware
• One experimental system (Zepplin)
❍ Switches are perceived to be too expensive
❒ TRILL
❍ IETF standard for L2 encapsulation
❍ Not widely adopted
• Brocade and Cisco implement it
❒ Collection of enhancements to 802.1 since 2000
❍ 802.1qbg Virtual Edge Bridging (VEB) and Virtual Ethernet Port Aggregation (VEPA) (data plane)
❍ 802.1qbc Provider Bridging (data plane)
❍ 802.1qbf Provider Backbone Bridging (data plane)
• Also does MAC'nMAC encapsulation
❍ 802.1aq Shortest-Path Bridging (control plane)
❍ Note: These are also used by carriers for wide area networks (Carrier Ethernet)
802.1qbg: Standard Virtual Switch/VEB
❒ Virtual switch software sits in the hypervisor and switches packets between VMs
❒ Every time a packet arrives for a VM, the hypervisor takes an interrupt
❍ Potential performance issue
Source: D. Kamath, et al., "Edge Virtual Bridge Proposal Version 0, Rev. 0.1", March 2010.
802.1qbg: Hardware Supported VEB
❒ SR-IOV is a PCI Express bus standard that allows VMs to communicate directly with the NIC
❍ No hypervisor interrupt
❒ Improves performance of virtual switching
❒ Downsides
❍ More expensive NIC hardware
❍ More complex virtual switch
❍ Constrains VM movement
802.1qbg: VEB Forwarding
❒ At 1, VEB forwards between VM and outside network via an external physical bridge (e.g. ToR)
❒ At 2, VEB forwards between two VMs belonging to the blue tenant on the same hypervisor
❒ At 3, forwarding between two logical uplink ports is not allowed
802.1qbg: VEB Characteristics
❒ Works in the absence of any ToR switch support
❒ Only supports a single physical uplink
❒ VEB does not participate in spanning tree calculations
❒ Maximizes bandwidth
❍ As opposed to VEPA, which uses trombone forwarding (as we will shortly see)
❒ Minimizes latency for co-located VMs because there is no external network to cross
❒ Migration of VMs between servers is straightforward
❍ If both servers support SR-IOV, for hardware-supported VEBs
802.1qbg: VEB Drawbacks (as of 2010)
❒ Limited additional packet processing (ACLs, etc.)
❒ Limited security features
❒ Limited monitoring (Netflow, etc.)
❒ Limited support for 802.1 protocols (802.1X authentication, etc.)
❒ Limited support for promiscuous mode
❒ All of these are supported in the ToR
❒ Assumption: the only way to get support for these is to forward frames to the ToR before sending them to the VM
802.1qbg: Virtual Edge Port Aggregation (VEPA)
❒ Firmware upgrade to the switch to allow forwarding out of the same physical port the packet arrived on, under certain conditions
❒ VMs send all packets to the switch
❍ Packets to VMs on VLANs on the same machine are turned around and sent back
❒ Trombone routing halves the capacity of the ToR-server link
❒ Open vSwitch (OVS) supports ACLs
❒ OVS supports Netflow
❒ The VMware virtual switch supports promiscuous mode, and OVS supports it if the NIC is in promiscuous mode
❒ OVS doesn't support 802.1X
❒ Conclusion: programming support into software is a much better solution than making a hardware standard that reduces performance
5 Years Later: VEBs support most of these
Ethernet Data Plane Evolution: Not Your Father's Ethernet Anymore
[Timeline: 1990 802.1D bridging → 1999 802.1Q VLANs → 2005 802.1Qbc Provider Bridging → 2008 802.1Qbf Provider Backbone Bridging]
Source: evolutionanimation.wordpress.com
Source: P. Thaler, N. Finn, D. Fedyk, G. Parsons, and E. Gray, "IEEE 802.1Q: Media Access Control Bridges and Virtual Bridged Local Area Networks", IETF-86 Tutorial, March 19, 2013
Ethernet Control Plane Evolution
❒ Rapid Spanning Tree Protocol (RSTP): single spanning tree for all traffic
❒ Multiple Spanning Tree Protocol (MSTP): different VLANs can be mapped to different spanning trees, and hence different paths
❒ Shortest Path Bridging (SPB): uses a routing protocol (IS-IS) to give each node its own shortest-path tree
Source: P. Thaler, et al., 2013
SPB Data Center Virtualization
[Figure: SPB data center virtualization over a Data Center L2 network, with a hybrid centralized/distributed control plane — the NVA (e.g. a Software Defined Network controller) (1) creates the red tenant network (I-SID1) on the NVEs (Edge Switches 1-3), then (2) IS-IS distributes shortest path routes; tenant VMs on VN-1 are carried over B-VID1/I-SID1 through Central Switch 1]
L2 Virtualization: Challenges Handled
❒ "Hot" VM movement
❍ IP address space configured on the I-SID
❍ But only within the data center
❒ ARP containment
❍ Limit the broadcast domain to the I-SID
❒ Firewall traversal
❍ No firewalls at L2
❒ Path MTU
❍ Handled by the IP layer
❒ Multicast
❍ IS-IS handles it
❒ Management
❍ Whole suite of management tools for 802.1 networks
L2 Virtualization Summary
❒ Possible to virtualize a data center with standardized L2 overlays
❍ Advances in the 802.1Q data plane provide one layer of MAC'nMAC encapsulation and an extra layer of VLAN tags
❍ Centralized, decentralized, or hybrid control plane
❒ But most existing deployments use proprietary extensions
❍ Cisco UCS uses TRILL
❒ But using IP overlays is cheaper
❍ Switches supporting carrier Ethernet extensions and TRILL are more expensive than simple 802.1Q switches
L3 Virtual Networks: Advantages
❒ Easy IP provisioning through the hypervisor/virtual switch
❍ End host provisioning
❍ No need for a distributed control plane
❒ Cheap NICs and switching hardware
❒ Support in the hypervisor/virtual switch
❒ No limitation on the number and placement of virtual networks
❍ A virtual network can even extend into the WAN
L3 Virtual Networks: Challenges
❒ Path MTU limits may cause fragmentation
❒ Lack of tools for management
❒ Some performance hit
❍ Encapsulation/decapsulation
❍ Lack of NIC hardware support
But the low cost of NICs and switching hardware trumps all!!
L3 Virtual Networks: Encapsulation Options
❒ IP in IP
❍ Use the IP address as the VNC
❍ Problem for IPv4: lack of address space
❒ IPsec in infrastructure mode
❍ Provides additional confidentiality
❍ Problem: key distribution complexity
❍ Problem: larger performance hit, even with hardware encryption assist
❒ In practice:
❍ STT
• Proprietary VMware/NSX protocol
• Designed to leverage TCP segmentation offload hardware support on NICs
❍ GRE and NVGRE
❍ VxLAN
❒ Coming
❍ GENEVE
• Proposed unified protocol framework for encapsulation headers
NVGRE: Network Virtualization using Generic Routing Encapsulation
❒ Microsoft-proposed GRE extension built on:
❍ RFC 2784 GRE
❍ RFC 2890 GRE Key Extension
❒ Provides a Layer 2 service tunneled over IP
❍ No VLAN id!
❒ VNC is a Virtual Subnet Identifier (VSID)
❍ 24-bit Key
• Each VSID constitutes a separate broadcast domain
– Like a VLAN
❍ 8-bit FlowID
• Adds entropy for Equal Cost Multipath (ECMP) routing
[Packet format: outer Ethernet (P-DMAC, P-SMAC, P-VID, Ethertype=0x0800), outer IP (P-SIP, P-DIP, Protocol=0x2F GRE), GRE header with the Key bit set and the Checksum/Sequence bits clear, Protocol=0x6558 indicating Transparent Ethernet Bridging, 24-bit VSID plus 8-bit FlowID, then inner Ethernet (C-DMAC, C-SMAC, Ethertype=0x0800 — NO VID!) and inner IP (C-SIP, C-DIP, Protocol=<payload>)]
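A minimal sketch of packing the keyed GRE header fields shown above into bytes; the helper name is hypothetical and this is an illustration of the field layout, not a full NVGRE implementation:

```python
import struct

def nvgre_gre_header(vsid: int, flow_id: int) -> bytes:
    """Pack the 8-byte keyed GRE header used by NVGRE.

    Flags: Checksum=0, Key=1, Sequence=0, Version=0 (i.e. 0x2000).
    Protocol type 0x6558 = Transparent Ethernet Bridging.
    The 32-bit Key field carries the 24-bit VSID plus the 8-bit FlowID.
    """
    assert 0 <= vsid < 2 ** 24 and 0 <= flow_id < 2 ** 8
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", 0x2000, 0x6558, key)

print(nvgre_gre_header(5001, 0x7A).hex())   # '200065580013897a'
```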
NVGRE Characteristics
❒ Path MTU discovery must be performed by the originating NVE
❒ Encapsulated MAC header VLAN tag handling
❍ Originating NVE must strip out any 802.1Q VLAN tag
❍ Receiving NVE must add the required 802.1Q VLAN tag back
❍ Requires the NVA to maintain and provision the VLAN tag to VN Key mapping
❒ Multicast handling
❍ Multicast routing deployed in the infrastructure
• Provider provisions a multicast address per VSID
• The address takes all multicast and broadcast traffic originating in the VSID
❍ No multicast routing deployed in the infrastructure
• N-way unicast by NVEs, or a dedicated VM multicast router (sketched below)
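A minimal sketch of the "N-way unicast" option, assuming the NVA has already pushed the list of remote NVE underlay addresses that host members of the VSID; the function and parameter names are hypothetical:

```python
# N-way unicast: the ingress NVE replicates a tenant broadcast/multicast frame
# as one encapsulated unicast copy per remote NVE in the same VSID.
from typing import Callable, Iterable

def replicate_bum_frame(frame: bytes, vsid: int,
                        remote_nves: Iterable[str],
                        send_encapsulated: Callable[[str, int, bytes], None]) -> None:
    for nve_ip in remote_nves:
        send_encapsulated(nve_ip, vsid, frame)   # one NVGRE copy per remote NVE

# Example wiring with a stand-in sender:
replicate_bum_frame(b"\xff" * 6 + b"\x00" * 54, vsid=5001,
                    remote_nves=["10.0.1.1", "10.0.2.1"],
                    send_encapsulated=lambda ip, v, f: print(ip, v, len(f)))
```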
VxLAN: Virtual eXtensible Local Area Network
❒ RFC 7348
❍ Consortium led by Intel, VMware, and Cisco
❒ Full Layer 2 service provided over IP
❍ VLAN id OK
❍ VxLAN segments constitute a broadcast domain
❒ VNC is the VxLAN Network Identifier (VNI)
❍ 24-bit VxLAN Segment Identifier
❒ Recommended that the UDP source port be randomized to provide entropy for ECMP routing (see the sketch below)
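A minimal sketch of that recommendation: RFC 7348 suggests deriving the UDP source port from a hash of the inner frame's headers so the underlay's ECMP hashing spreads tenant flows; the hash function and flow-key encoding here are illustrative assumptions:

```python
import zlib

def vxlan_source_port(inner_flow_key: bytes) -> int:
    """Pick a stable UDP source port in the dynamic range 49152-65535
    from a hash of the inner headers, so ECMP spreads flows."""
    span = 65535 - 49152 + 1
    return 49152 + (zlib.crc32(inner_flow_key) % span)

print(vxlan_source_port(b"c-smac,c-dmac,c-sip,c-dip,proto"))
```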
[Packet format: outer Ethernet (P-DMAC, P-SMAC, P-VID, Ethertype=0x0800), outer IP (P-SIP, P-DIP, Protocol=17 UDP), UDP (Source Port=<random>, Dest. Port=4789, UDP Length, UDP Checksum=0), VxLAN header (I flag set to 1 for a valid VNI, other bits reserved and ignored, 24-bit VNI, reserved byte), then inner Ethernet (C-DMAC, C-SMAC, C-VID, Ethertype=0x0800) and inner IP (C-SIP, C-DIP, Protocol=<payload>)]
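A minimal sketch of packing the 8-byte VxLAN header shown above; the helper name is illustrative, with the layout per RFC 7348 (flags byte with the I bit, 24 reserved bits, 24-bit VNI, one reserved byte):

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VxLAN header: flags=0x08 (VNI valid), reserved
    bits zero, 24-bit VNI in bits 32..55, last byte reserved."""
    assert 0 <= vni < 2 ** 24
    return struct.pack("!II", 0x08 << 24, vni << 8)

print(vxlan_header(1).hex())   # '0800000000000100' for VNI-1
```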
VxLAN Characteristics
❒ Problem: IP multicast control plane required by RFC 7348
❍ An IP multicast address is allocated per VNI for determining the IP unicast address to MAC address mapping
❍ Multicast routing is not widely deployed in data centers
• Most VxLAN deployments use an NVA/SDN controller instead
❒ Solution: VxLAN just used as an encapsulation format
❒ A UDP endpoint constitutes a VxLAN Tunnel End Point (VTEP)
❍ Handled at the application layer
❒ Path MTU discovery performed by the VTEP
❒ Multicast handling like NVGRE
❍ Can be handled using underlay multicast
❍ Mostly handled using N-way unicast
VxLAN Data Center Virtualization
[Figure: VxLAN data center virtualization over a Data Center L3 network, with a centralized control plane — the NVA (e.g. a Software Defined Network controller) creates the red tenant network (VNI-1) on the NVEs/VTEPs at ToR1, ToR2, and ToR3; tenant traffic is carried in VNI-1 tunnels between the VTEPs]
L3 Virtual Networks Summary
❒ Despite the challenges with IP overlays, they are widely deployed
❍ Usually with workarounds for the challenges
❒ Software availability
❍ Lots of open source software
❍ Proprietary solutions also available
❒ Can extend the overlay into the WAN
❍ Between data centers
❍ Between an enterprise network and the data center
❒ Deployments almost exclusively use centralized control
❍ NVA implemented using an SDN controller