
Introduction to the Cloud Computing Network Control Plane

Architecture and Protocols for Data Center Networks

Outline

❒ Data Center Networking Basics
❒ Problem Solving with Traditional Design Techniques
❒ Virtual/Overlay Network Functional Architecture
❒ Virtual/Overlay Network Design and Implementation

Data Center Networking Basics

Lots of Servers

❒ Data centers consist of massive numbers of servers
❍ Up to 100,000s
❒ Each server has multiple processors
❍ 8 or more
❒ Each processor has multiple cores
❍ 32 max for commodity processors, more coming
❒ Each server has multiple NICs
❍ Usually at least 2 for redundancy
❍ 1G common, 10G on the upswing

Source: http://img.clubic.com/05468563-photo-google-datacenter.jpg

Mostly Virtualized

❒ Hypervisor provides a compute abstraction layer
❍ Looks like hardware to the operating system
❍ OSes run as multiple Virtual Machines (VMs) on a single server
❒ Hypervisor maps VMs to processors
❍ Virtual cores (vCores)
❒ Virtual switch provides networking between VMs and to the DC network
❍ Virtual NICs (vNICs)
❒ Without oversubscription, usually as many VMs as cores
❍ Up to 256 for 8p x 32c
❍ Typical is 32 for 4p x 8c
❒ VMs can be moved from one machine to another

[Figure: server hardware with NIC1 and NIC2; a hypervisor hosts VM1-VM4, each attached through a vNIC to a virtual switch]

Data Center Network Problem

❒ For a single virtualized data center built with cheap commodity servers:
❍ 32 VMs per server
❍ 100,000 servers
❍ 32 x 100,000 = 3.2 million VMs!
❒ Each VM needs a MAC address and an IP address
❒ Infrastructure needs IP and MAC addresses too
❍ Routers, switches
❍ Physical servers for management
❒ Clearly a scaling problem!
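A quick back-of-the-envelope check in Python (a minimal sketch using the example figures above; the infrastructure node count is an assumed round number, not from the slides):

# Rough scale of the addressing problem, using the example numbers above.
vms_per_server = 32
servers = 100_000
infrastructure_nodes = 10_000      # assumed rough count of switches, routers, management servers

vms = vms_per_server * servers     # 3,200,000 VMs
addresses = 2 * (vms + infrastructure_nodes)   # each node needs one MAC and one IP address

print(f"{vms:,} VMs, roughly {addresses:,} addresses to assign and track")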

Common Data Center Network Architectures: Three Tier

❒ Server NICs connected directly to edge switch ports
❒ Aggregation layer switches connect multiple edge switches
❒ Top layer switches connect the aggregation switches
❍ Top layer can also connect to the Internet
❒ Usually some redundancy
❒ Pluses
❍ Common
❍ Simple
❒ Minuses
❍ Top layer massively oversubscribed
❍ Reduced cross-sectional bandwidth
• 4:1 oversubscription means only 25% of bandwidth available
❍ Scalability at top layer requires expensive enterprise switches

Source: K. Bilal, S. U. Khan, L. Zhang, H. Li, K. Hayat, S. A. Madani, N. Min-Allah, L. Wang, D. Chen, M. Iqbal, C.-Z. Xu, and A. Y. Zomaya, "Quantitative Comparisons of the State of the Art Data Center Architectures," Concurrency and Computation: Practice and Experience, vol. 25, no. 12, pp. 1771-1783, 2013.

[Figure annotations: Top of Rack (ToR) switch; End of Row switch (sometimes); these can be IP routers (for more €s)]

Common Data Center Network Architectures: Fat Tree

❒ Clos network, with origins in the 1950s telephone network
❒ Data center divided into k pods
❒ Each pod has (k/2) x 2 = k switches
❍ k/2 access, k/2 aggregation
❒ Core has (k/2)² switches
❒ 1:1 oversubscription ratio and full bisection bandwidth
❒ Pluses
❍ No oversubscription
❍ Full bisection bandwidth
❒ Minuses
❍ Needs a specialized routing and addressing scheme
❍ Number of pods limited to the number of ports on a switch
❍ Maximum # of pods = # of switch ports

Source: Bilal, et al.

k=4 Example
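For concreteness, a small Python sketch (not from the slides) that applies these formulas, shown here with the k=4 example:

def fat_tree_sizes(k: int) -> dict:
    """Component counts for a k-ary fat tree built from k-port switches."""
    return {
        "pods": k,                               # max pods = switch port count
        "edge_switches_per_pod": k // 2,
        "agg_switches_per_pod": k // 2,
        "core_switches": (k // 2) ** 2,
        "hosts": k * (k // 2) * (k // 2),        # each edge switch serves k/2 hosts
    }

print(fat_tree_sizes(4))   # k=4: 4 pods, 2+2 switches per pod, 4 core switches, 16 hosts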

Problem Solving with Traditional Design Techniques

Problem #1: ARP/ND Handling

❒ IP nodes use ARP (IPv4) and Neighbor Discovery (ND, IPv6) to resolve an IP address to a MAC address
❍ Broadcast (ARP) and multicast (ND)
❒ Problem:
❍ Broadcast forwarding load on large, flat L2 networks can be overwhelming

Source: http://www.louiewong.com/wp-content/uploads/2010/09/ARP.jpg

Problem #2: VM Movement

❒ Data center operators need to move VMs around
❍ Reasons: server maintenance, server optimization for energy use, performance improvement, etc.
❍ The MAC address can stay fixed (provided it is unique in the data center)
❍ If the subnet changes, the IP address must change because it is bound to the VM's location in the topology
• For "hot" migration, the IP address cannot change
❒ Problem:
❍ How broadcast domains are provisioned affects where VMs can be moved

Source: http://www.freesoftwaremagazine.com/files/nodes/1159/slide4.jpg


Solutions Using Traditional Network Design Principles: IP Subnets

Where to put the last-hop router?

❒ ToR == last-hop router
❍ Subnet (broadcast domain) limited to the rack
❍ Good broadcast/multicast limitation
❍ Poor VM mobility
❒ Aggregation switch == last-hop router
❍ Subnet limited to the racks controlled by the aggregation switch
❍ Complex configuration
• Subnet VLAN to all access switches and servers on the served racks
❍ Moderate broadcast/multicast limitation
❍ Moderate VM mobility
• To any rack covered
❒ Core switch/router == last-hop router
❍ Poor broadcast/multicast limitation
❍ Good VM mobility

Note: These solutions only work if the data center is single tenant!

Source: Bilal, et al.

Problem #3: Dynamic Provisioning of Tenant Networks

❒ Virtualized data centers enable renting infrastructure to outside parties (aka tenants)
❍ Infrastructure as a Service (IaaS) model
❍ Amazon Web Services, Microsoft Azure, Google Compute Engine, etc.
❒ Customers get dynamic server provisioning through VMs
❍ They expect the same dynamic "as a service" provisioning for networks too
❒ Characteristics of a tenant network
❍ Traffic isolation
❍ Address isolation
• From other tenants
• From the infrastructure

Solution Using Traditional Network Design Principles

❒ Use a different VLAN for each tenant network
❒ Problem #1
❍ There are only 4096 VLAN tags for 802.1Q VLANs*
❍ Forces tenant network provisioning along physical network lines
❒ Problem #2
❍ For fully dynamic VM placement, each ToR-server link must be dynamically configured as a trunk
❒ Problem #3
❍ Can only move VMs to servers where the VLAN tag is available
• Ties VM movement to the physical infrastructure

*Except for carrier Ethernet, about which more shortly

Summary

❒ Configuring subnets based on the hierarchical switch architecture always results in a tradeoff between broadcast limitation and VM movement freedom
❍ On top of which, traffic isolation cannot be achieved for multitenant networks
❒ Configuring multitenant networks with VLAN tags for traffic isolation ties tenant configuration to the physical data center layout
❍ Severely limits where VMs can be provisioned and moved
❍ Requires complicated dynamic trunking
❒ For multitenant, virtualized data centers, there is no good solution using traditional techniques!

Virtual/Overlay Network Functional Architecture

Virtual Networks through Overlays

Source: Bilal, et al.

[Figure: Blue Tenant Network and Yellow Tenant Network overlaid on the same physical network]

❒ Basic idea of an overlay:
❍ Tunnel tenant packets through the underlying physical Ethernet or IP network
❍ The overlay forms a conceptually separate network providing a separate service from the underlay
❒ L2 service like VPLS or EVPN
❍ The overlay spans a separate broadcast domain
❒ L3 service like BGP IP VPNs
❍ Different tenant networks have separate IP address spaces
❒ Dynamically provision and remove overlays as tenants need network service
❒ Multiple tenants with separate networks on the same server

Advantages of Overlays

❒ Tunneling is used to aggregate traffic
❒ Addresses in the underlay are hidden from the tenant
❍ Inhibits unauthorized tenants from accessing data center infrastructure
❒ Tenant addresses in the overlay are hidden from the underlay and from other tenants
❍ Multiple tenants can use the same IP address space
❒ Overlays can potentially support large numbers of tenant networks
❒ Virtual network state and end node reachability are handled in the end nodes

Challenges of Overlays

❒ Management tools to coordinate overlay and underlay
❍ Overlay networks probe for bandwidth and packet loss, which can lead to inaccurate information
❍ Lack of communication between overlay and underlay can lead to inefficient usage of network resources
❍ Lack of communication between overlays can lead to contention and other performance issues
❒ Overlay packets may fail to traverse firewalls
❒ Path MTU limits may cause fragmentation
❒ Efficient multicast is challenging

Functional Architecture: Definitions

❒ Virtual Network
❍ Overlay network defined over the Layer 2 or Layer 3 underlay (physical) network
❍ Provides either a Layer 2 or a Layer 3 service to the tenant
❒ Virtual Network Instance (VNI) or Tenant Network
❍ A specific instance of a virtual network
❒ Virtual Network Context (VNC)
❍ A tag or field in the encapsulation header that identifies the specific tenant network

Functional Architecture: More Definitions

❒ Network Virtualization Edge (NVE)
❍ Data plane entity that sits at the edge of an underlay network and implements L2 and/or L3 network virtualization functions
• Example: virtual switch, aka Virtual Edge Bridge (VEB)
❍ Terminates the virtual network towards the tenant VMs and towards outside networks
❒ Network Virtualization Authority (NVA)
❍ Control plane entity that provides information about reachability and connectivity for all tenants in the data center
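As a rough illustration of the NVE/NVA split (a minimal sketch; the table layout and all names are invented for illustration and not taken from any standard), the NVA holds per-tenant reachability and each NVE consults it to decide which remote NVE to tunnel a frame to:

# Hypothetical NVA reachability table: (virtual network, tenant MAC) -> underlay IP of the hosting NVE.
nva_table = {
    ("VN-1", "00:00:5e:00:53:01"): "10.0.1.11",
    ("VN-1", "00:00:5e:00:53:02"): "10.0.2.12",
    ("VN-2", "00:00:5e:00:53:01"): "10.0.3.13",   # same MAC, different tenant: contexts keep them apart
}

def lookup_nve(vn: str, dest_mac: str):
    """An NVE asks the NVA (control plane) where to tunnel a tenant frame (data plane)."""
    return nva_table.get((vn, dest_mac))

print(lookup_nve("VN-1", "00:00:5e:00:53:02"))   # -> 10.0.2.12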

Overlay Network Architecture

[Figure: Tenant Systems attach to NVEs over LAN links, point-to-point links, or through end-system integration; the NVEs carry data plane traffic across the Data Center L2/L3 network and exchange control plane information with the NVA]

Virtual/Overlay Network Design and Implementation

Implementing Overlays: Tagging or Encapsulation?

❒ At or above Layer 2 but below Layer 3:
❍ Insert a tag at a standards-specified place in the pre-Layer 3 header
❒ At Layer 3:
❍ Encapsulate the tenant packet with an encapsulation protocol header and an IP header
❒ Tenant network identified by the Virtual Network Context
❍ The tag, for tagging
❍ A context identifier in the protocol header, for encapsulation
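To make the tagging option concrete, here is a minimal Python sketch (assuming untagged Ethernet frames handled as byte strings; not part of any standard API) that inserts an 802.1Q tag between the source MAC and the original EtherType. The encapsulation option is sketched later with the NVGRE and VxLAN headers.

import struct

def add_dot1q_tag(frame: bytes, vlan_id: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag (TPID 0x8100 + TCI) after the two MAC addresses
    of an untagged Ethernet frame; vlan_id is the 12-bit Virtual Network Context."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("only 4096 VLAN IDs exist (the scaling problem above)")
    tci = (pcp << 13) | vlan_id               # PCP(3) | DEI(1)=0 | VID(12)
    return frame[:12] + struct.pack("!HH", 0x8100, tci) + frame[12:]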

L2 Virtual Networks: Tagging Options

❒ Simple 802.1Q VLANs
❍ 4096 limit problem
❍ Trunking complexity
❒ MPLS
❍ Nobody uses MPLS directly on the switching hardware
• One experimental system (Zepplin)
❍ Switches are perceived to be too expensive
❒ TRILL
❍ IETF standard for L2 encapsulation
❍ Not widely adopted
• Brocade and Cisco implement it
❒ Collection of enhancements to 802.1 since 2000
❍ 802.1Qbg Virtual Edge Bridging (VEB) and Virtual Ethernet Port Aggregation (VEPA) (data plane)
❍ 802.1Qbc Provider Bridging (data plane)
❍ 802.1Qbf Provider Backbone Bridging (data plane)
• Also does MAC'nMAC encapsulation
❍ 802.1aq Shortest Path Bridging (control plane)
❍ Note: These are also used by carriers for wide area networks (Carrier Ethernet)

802.1Qbg: Standard Virtual Switch/VEB

❒ Virtual switch software sits in the hypervisor and switches packets between VMs
❒ Every time a packet arrives for a VM, the hypervisor takes an interrupt
❍ Potential performance issue

Source: D. Kamath, et al., "Edge Virtual Bridge Proposal, Version 0, Rev. 0.1", March 2010.

802.1Qbg: Hardware-Supported VEB

❒ SR-IOV is a PCI Express bus standard for allowing VMs to communicate directly with the NIC
❍ No hypervisor interrupt
❒ Improves performance of virtual switching
❒ Downsides
❍ More expensive NIC hardware
❍ More complex virtual switch
❍ Constrains VM movement

802.1Qbg: VEB Forwarding

❒ At 1, the VEB forwards between a VM and the outside network via an external physical bridge (e.g. the ToR)
❒ At 2, the VEB forwards between two VMs belonging to the blue tenant on the same hypervisor
❒ At 3, forwarding between two logical uplink ports is not allowed
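A minimal sketch of those three rules (the port model and names are invented for illustration; a real VEB forwards on 802.1Q tables, not Python dicts):

def veb_may_forward(src_port: str, dst_port: str, ports: dict) -> bool:
    """ports maps a port name to 'vm' or 'uplink'.
    Case 1: VM <-> uplink (via an external bridge, e.g. the ToR) is allowed.
    Case 2: VM <-> VM on the same hypervisor is allowed (same tenant assumed).
    Case 3: uplink <-> uplink is never allowed."""
    return not (ports[src_port] == "uplink" and ports[dst_port] == "uplink")

ports = {"vnic1": "vm", "vnic2": "vm", "uplink0": "uplink", "uplink1": "uplink"}
print(veb_may_forward("vnic1", "uplink0", ports), veb_may_forward("uplink0", "uplink1", ports))  # True False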

802.1Qbg: VEB Characteristics

❒ Works in the absence of any ToR switch support
❒ Only supports a single physical uplink
❒ VEB does not participate in spanning tree calculations
❒ Maximizes bandwidth
❍ As opposed to VEPA, which uses trombone forwarding (as we will shortly see)
❒ Minimizes latency for co-located VMs because there is no external network to cross
❒ Migration of VMs between servers is straightforward
❍ If both servers support SR-IOV, for the hardware-supported case

802.1Qbg: VEB Drawbacks (as of 2010)

❒ Limited additional packet processing (ACLs, etc.)
❒ Limited security features
❒ Limited monitoring (NetFlow, etc.)
❒ Limited support for 802.1 protocols (802.1X authentication, etc.)
❒ Limited support for promiscuous mode
❒ All of these are supported in the ToR
❒ Assumption: the only way to get support for these is to forward frames to the ToR before sending them to the VM

802.1Qbg: Virtual Edge Port Aggregation (VEPA)

❒ Firmware upgrade to the switch to allow forwarding out of the same physical port the packet arrived on, under certain conditions
❒ VMs send all packets to the switch
❍ Packets to VMs on VLANs on the same machine are turned around and sent back
❒ Trombone routing halves the capacity of the ToR-server link

5 Years Later: VEBs Support Most of These

❒ Open vSwitch (OVS) supports ACLs
❒ OVS supports NetFlow
❒ The VMware virtual switch supports promiscuous mode, and OVS supports it if the NIC is in promiscuous mode
❒ OVS doesn't support 802.1X
❒ Conclusion: programming support into software is a much better solution than making a hardware standard that reduces performance

Ethernet Data Plane Evolution: Not Your Father's Ethernet Anymore

[Timeline: 1990 802.1D; 1999 802.1Q VLANs; 2005 802.1Qbc Provider Bridging; 2008 802.1Qbf Provider Backbone Bridging]

Source: evolutionanimation.wordpress.com
Source: P. Thaler, N. Finn, D. Fedyk, G. Parsons, and E. Gray, "IEEE 802.1Q: Media Access Control Bridges and Virtual Bridged Local Area Networks", IETF-86 Tutorial, March 19, 2013.

Ethernet Control Plane Evolution

❒ Rapid Spanning Tree Protocol (RSTP): single spanning tree for all traffic
❒ Multiple Spanning Tree Protocol (MSTP): different VLANs can take separate paths
❒ Shortest Path Bridging (SPB): uses a routing protocol (IS-IS) to give each node its own spanning tree

Source: P. Thaler, et al., 2013

SPB Data Center Virtualization

[Figure: Hybrid centralized/distributed control plane over the Data Center L2 Network. The NVA (e.g. a Software Defined Network controller) 1) creates the Red tenant network (I-SID1) on the NVEs (Edge Switches 1-3); 2) IS-IS then distributes shortest path routes. Tenant VMs attach to VN-1, and their traffic crosses Central Switch 1 carrying B-VID1/I-SID1 encapsulation.]

L2 Virtualization: Challenges Handled

❒ "Hot" VM movement
❍ IP address space configured on the I-SID
❍ But only within the data center
❒ ARP containment
❍ Limits the broadcast domain to the I-SID
❒ Firewall traversal
❍ No firewalls at L2
❒ Path MTU
❍ Handled by the IP layer
❒ Multicast
❍ IS-IS handles it
❒ Management
❍ Whole suite of management tools for 802.1 networks

L2 Virtualization Summary

❒ It is possible to virtualize a data center with standardized L2 overlays
❍ Advances in the 802.1Q data plane provide a layer of MAC'nMAC encapsulation and an extra layer of VLAN tags
❍ Centralized, decentralized, or hybrid control plane
❒ But most existing deployments use proprietary extensions
❍ Cisco UCS uses TRILL
❒ But using IP overlays is cheaper
❍ Switches supporting carrier Ethernet extensions and TRILL are more expensive than simple 802.1Q switches

L3 Virtual Networks: Advantages

❒ Easy IP provisioning through the hypervisor/virtual switch
❍ End host provisioning
❍ No need for a distributed control plane
❒ Cheap NICs and switching hardware
❒ Support in the hypervisor/virtual switch
❒ No limitation on the number and placement of virtual networks
❍ A virtual network can even extend into the WAN

L3 Virtual Networks: Challenges

❒ Path MTU limitation may cause fragmentation
❒ Lack of tools for management
❒ Some performance hit
❍ Encapsulation/decapsulation
❍ Lack of NIC hardware support

But the low cost of NICs and switching hardware trumps all!

L3 Virtual Networks: Encapsulation Options

❒ IP in IP
❍ Use the IP address as the VNC
❍ Problem for IPv4: lack of address space
❒ IPsec in infrastructure mode
❍ Provides additional confidentiality
❍ Problem: key distribution complexity
❍ Problem: larger performance hit, even with hardware encryption assist
❒ In practice:
❍ STT
• Proprietary VMware/NSX protocol
• Designed to leverage TCP segmentation offload (TSO) hardware support on NICs
❍ GRE and NVGRE
❍ VxLAN
❒ Coming
❍ Geneve
• Proposed unified protocol framework for encapsulation headers

NVGRE: Network Virtualization Generic Routing Encapsulation

❒ Microsoft-proposed GRE extension built on:
❍ RFC 2784 GRE
❍ RFC 2890 GRE Key Extension
❒ Provides a Layer 2 service tunneled over IP
❍ No VLAN id!
❒ VNC is a Virtual Subnet Identifier (VSID)
❍ 24-bit Key
• Each VSID constitutes a separate broadcast domain
– Like a VLAN
❍ 8-bit FlowID
• Adds entropy for Equal Cost Multipath (ECMP) routing

[Packet format: outer Ethernet header (P-DMAC, P-SMAC, P-VID, Ethertype=0x0800); outer IP header (P-SIP, P-DIP, Protocol=0x2F GRE); GRE header with the Key bit set and the Checksum and Sequence # bits clear, Ver=0, Protocol=0x6558 (indicates Transparent Ethernet Bridging), 24-bit VSID, 8-bit FlowID; inner Ethernet header (C-DMAC, C-SMAC, Ethertype=0x0800, NO VID!); inner IP header (C-SIP, C-DIP, Protocol=<Payload>)]
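As an illustration of the encapsulation (a sketch that builds only the 8-byte GRE portion shown above; the outer Ethernet/IP headers and the captured tenant frame are left out):

import struct

def nvgre_header(vsid: int, flow_id: int) -> bytes:
    """8-byte GRE header as NVGRE uses it: Key bit set, Checksum/Sequence bits clear,
    Protocol 0x6558 (Transparent Ethernet Bridging), Key field = 24-bit VSID + 8-bit FlowID."""
    flags_and_ver = 0x2000                     # K=1, C=0, S=0, Ver=0
    key = (vsid << 8) | (flow_id & 0xFF)
    return struct.pack("!HHI", flags_and_ver, 0x6558, key)

# The outer IP header's Protocol field is 0x2F (GRE); the tenant Ethernet frame follows this header.
print(nvgre_header(vsid=0x123456, flow_id=0x9A).hex())   # -> '200065581234569a'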

NVGRE Characteristics

❒ Path MTU discovery must be performed by the originating NVE
❒ Encapsulated MAC header VLAN tag handling
❍ The originating NVE must strip out any 802.1Q VLAN tag
❍ The receiving NVE must add the required 802.1Q VLAN tag back
❍ Requires the NVA to maintain and provision a VLAN tag to VSID (Key) mapping
❒ Multicast handling
❍ Multicast routing deployed in the infrastructure
• The provider provisions a multicast address per VSID
• That address carries all multicast and broadcast traffic originating in the VSID
❍ No multicast routing deployed in the infrastructure
• N-way unicast by the NVEs, or a dedicated VM multicast router
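A minimal sketch of the N-way unicast fallback (the membership table and send_unicast callback are hypothetical; in practice membership would come from the NVA), reusing the nvgre_header sketch above:

# Hypothetical per-VSID membership: VSID -> underlay IPs of all NVEs hosting that virtual subnet.
vsid_members = {0x123456: ["10.0.1.11", "10.0.2.12", "10.0.3.13"]}

def flood_bum_frame(vsid: int, frame: bytes, local_nve: str, send_unicast) -> None:
    """Replicate a broadcast/unknown/multicast tenant frame as N separate unicast tunnels,
    one NVGRE-encapsulated copy to every other NVE that hosts the VSID."""
    for nve_ip in vsid_members.get(vsid, []):
        if nve_ip != local_nve:
            send_unicast(nve_ip, nvgre_header(vsid, flow_id=0) + frame)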

VxLAN: Virtual eXtensible Local Area Network

❒ RFC 7348
❍ Consortium led by Intel, VMware, and Cisco
❒ Full Layer 2 service provided over IP
❍ VLAN id OK
❍ VxLAN segments constitute a broadcast domain
❒ VNC is the VxLAN Network Identifier (VNI)
❍ 24-bit VxLAN Segment Identifier
❒ Recommended that the UDP source port be randomized to provide entropy for ECMP routing

[Packet format: outer Ethernet header (P-DMAC, P-SMAC, P-VID, Ethertype=0x0800); outer IP header (P-SIP, P-DIP, Protocol=0x11 UDP); UDP header (Source Port=<Random>, Dest. Port=4789, UDP Length, UDP Checksum=0); VxLAN header (flags with the I bit set to 1 for a valid VNI, other bits reserved/ignored, 24-bit VNI, Reserved); inner Ethernet header (C-DMAC, C-SMAC, C-VID, Ethertype=0x0800); inner IP header (C-SIP, C-DIP, Protocol=<Payload>)]
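A matching sketch for the VxLAN header (assuming the RFC 7348 layout above; the UDP and IP headers are left to the socket layer, and the source port is randomized here only to illustrate the ECMP-entropy recommendation):

import random
import struct

VXLAN_DEST_PORT = 4789   # IANA-assigned VxLAN port

def vxlan_header(vni: int) -> bytes:
    """8-byte VxLAN header: flags byte with the I bit set (valid VNI),
    24-bit VNI in the upper bits of the second word, reserved bits zero."""
    flags_word = 0x08000000            # I bit; all other flag/reserved bits are 0
    return struct.pack("!II", flags_word, (vni & 0xFFFFFF) << 8)

def vxlan_encap(inner_frame: bytes, vni: int):
    """Return (randomized UDP source port, UDP payload) for the encapsulated frame."""
    src_port = random.randint(49152, 65535)   # dynamic/private range, per RFC 7348's recommendation
    return src_port, vxlan_header(vni) + inner_frame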

VxLAN Characteristics

❒ Problem: IP multicast control plane required by RFC 7348
❍ An IP multicast address is allocated per VNI for determining the IP unicast address to MAC address mapping
❍ Multicast routing is not widely deployed in data centers
• Most VxLAN deployments use an NVA/SDN controller instead
❒ Solution: VxLAN is just used as an encapsulation format
❒ A UDP endpoint constitutes a VxLAN Tunnel End Point (VTEP)
❍ Handled at the application layer
❒ Path MTU discovery is performed by the VTEP
❒ Multicast handling is like NVGRE
❍ Can be handled by using underlay multicast
❍ Mostly handled using N-way unicast

VxLAN Data Center Virtualization

[Figure: Centralized control plane over the Data Center L3 Network. The NVA (e.g. a Software Defined Network controller) creates the Red tenant network (VNI-1) by configuring the NVEs/VTEPs on ToR1, ToR2, and ToR3.]

L3 Virtual Networks Summary

❒ Despite the challenges with IP overlays, they are widely deployed
❍ Usually with workarounds for the challenges
❒ Software availability
❍ Lots of open source software
❍ Proprietary solutions also available
❒ Can extend the overlay into the WAN
❍ Between data centers
❍ Between an enterprise network and the data center
❒ Deployments almost exclusively use centralized control
❍ The NVA is implemented using an SDN controller
