Windows Server 2012 Hyper-V Networking Evolved


More info on http://techdays.be.


Didier Van Hoye


Technical Architect – FGIA

Microsoft MVP & MEET Member

http://workinghardinit.wordpress.com | @workinghardinit

What We’ll Discuss

Windows Server 2012 Networking

Changed & Improved features

New features

Relationship to Hyper-V

Why We’ll Discuss This

We face many network challenges

Keep systems & services running

High to continuous availability

High reliability & reducing complexity

Security, multitenancy, extensibility

Cannot keep throwing money at it (CAPEX)

Network virtualization, QoS, bandwidth management in box

Performance (latency, throughput, scalability)

Leverage existing hardware

Control operational cost (OPEX), reduce complexity

Eternal Challenge = Balanced Design

[Diagram: balancing CPU, memory, storage and network against availability, capacity, cost and performance]

Network Bottlenecks

In the host networking stack

In the NICs

In the switches

[Diagram: Dell PowerEdge M1000e blade chassis]

Socket, NUMA, Core, K-Group

Processor: One physical processor, which can consist of one or more NUMA nodes. Today a physical processor ≈ a socket, with multiple cores.

Non-uniform memory architecture (NUMA) node: A set of logical processors and cache that are close to one another.

Core: One processing unit, which can consist of one or more logical processors.

Logical processor (LP): One logical computing engine from the perspective of the operating system, application or driver. In effect, a logical processor is a thread (think hyper-threading).

Kernel Group (K-Group): A set of up to 64 logical processors.

Advanced Network Features (1)

Receive Side Scaling (RSS)

Receive Segment Coalescing (RSC)

Dynamic Virtual Machine Queuing (DVMQ)

Single Root I/O Virtualization (SR-IOV)

NIC TEAMING

RDMA/Multichannel support for virtual machines on SMB 3.0

Windows Server 2012 scales RSS to the next generation of servers & workloads

Spreads interrupts across all available CPUs

Even for those very large scale hosts

RSS now works across K-Groups

RSS is even NUMA-aware to optimize performance

Now load balances UDP traffic across CPUs

40% to 100% more throughput (backups, file copies, web)

Receive Side Scaling (RSS)

[Diagram: RSS NIC with 8 queues spreading incoming packets across NUMA nodes 0 to 3]

RSS improves scalability on multiple processors / NUMA nodes by distributing TCP/UDP receive traffic across the cores in different nodes / K-Groups.
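As an illustrative PowerShell sketch (the adapter name "Ethernet 1" is a placeholder), RSS can be inspected and enabled per adapter:

  # Inspect RSS settings: queues, processor range, NUMA node assignment
  Get-NetAdapterRss -Name "Ethernet 1"
  # Enable RSS on the adapter ("Ethernet 1" is a placeholder name)
  Enable-NetAdapterRss -Name "Ethernet 1"
  # RSS can also be toggled host-wide
  Set-NetOffloadGlobalSetting -ReceiveSideScaling Enabled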

Receive Segment Coalescing (RSC)

Coalesces packets in the NIC so the stack processes fewer headers

Multiple packets belonging to a connection are coalesced by the NIC into a larger packet (max of 64 KB) and processed within a single interrupt

10 to 20% improvement in throughput & CPU workload; offloaded to the NIC

Enabled by default on all 10 Gbps adapters

[Diagram: NIC with RSC coalescing incoming packets into a larger buffer]

RSC helps by coalescing multiple inbound packets into a larger buffer or "packet", which reduces per-packet CPU costs as fewer headers need to be processed.
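A quick PowerShell check of RSC state, assuming a placeholder adapter name:

  # Show RSC state for IPv4/IPv6 on the adapter ("Ethernet 1" is a placeholder)
  Get-NetAdapterRsc -Name "Ethernet 1"
  # Enable RSC if it was switched off
  Enable-NetAdapterRsc -Name "Ethernet 1"
  # Host-wide RSC state
  Get-NetOffloadGlobalSetting | Select-Object ReceiveSegmentCoalescing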

Dynamic Virtual Machine Queue (DVMQ)

VMQ is to virtualization what RSS is to native workloads.

It makes sure that routing, filtering, etc. is done by the NIC in queues and that the interrupts for those queues are not all handled by a single processor (CPU 0).

Most inbox 10Gbps Ethernet adapters support this.

Enabled by default.

[Diagram: network I/O path without VMQ vs. with VMQ. Without VMQ, traffic through the physical NIC is handled by one CPU in the root partition; with VMQ, the NIC's queues spread interrupts across CPU 0 to CPU 3]

Dynamic Virtual Machine Queue (DVMQ)

Adaptive optimal performance across changing workloads

[Diagram: No VMQ vs. Static VMQ vs. Dynamic VMQ. With Dynamic VMQ, queue-to-CPU assignments in the root partition adapt as workloads change]
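To see and control VMQ from PowerShell (illustrative; the adapter name is a placeholder):

  # List VMQ capability and state for all adapters
  Get-NetAdapterVmq
  # Enable VMQ on a specific adapter ("Ethernet 2" is a placeholder)
  Enable-NetAdapterVmq -Name "Ethernet 2"
  # Show which queues are allocated and to which VM network adapters
  Get-NetAdapterVmqQueue -Name "Ethernet 2"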

Network I/O path without SR-IOV vs. with SR-IOV

Single-Root I/O Virtualization (SR-IOV)

Reduces CPU utilization for processing network traffic

Reduces latency path

Increases throughput

Requires:

Chipset: interrupt & DMA remapping

BIOS support

CPU: hardware virtualization, EPT or NPT

[Diagram: without SR-IOV, traffic flows from the physical NIC through the Hyper-V switch (routing, VLAN, filtering, data copy) and the VMBus to the virtual NIC in the virtual machine; with SR-IOV, a Virtual Function on the SR-IOV physical NIC is mapped straight into the virtual machine]

SR-IOV Enabling & Live Migration

Turn On IOV:

Enable IOV (VM NIC property)

Virtual Function is "assigned"

"NIC" automatically created

Traffic flows through the VF; the software path is not used

Live Migration:

Switch back to the software path

Remove the VF from the VM

Migrate as normal

Post Migration:

Reassign a Virtual Function, assuming resources are available

The VM has connectivity even if the switch is not in IOV mode, an IOV physical NIC is not present, the NIC vendor is different, or the NIC firmware is different

[Diagram: the VM's network stack switching between the Virtual Function "NIC" and the software NIC on the software switch (IOV mode) during live migration]
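A hedged PowerShell sketch of checking and enabling SR-IOV (switch, adapter and VM names are placeholders):

  # Verify the host meets the chipset/BIOS/CPU requirements
  Get-VMHost | Format-List IovSupport, IovSupportReasons
  # SR-IOV must be enabled when the external switch is created
  New-VMSwitch -Name "IOV-Switch" -NetAdapterName "Ethernet 3" -EnableIov $true
  # Request a Virtual Function for the VM's network adapter
  Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100
  # Check the adapter's IOV weight and status afterwards
  Get-VMNetworkAdapter -VMName "VM01" | Format-List IovWeight, Status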

NIC TEAMING

Customers are dealing with way too many issues.

NIC vendors would like to get rid of supporting this.

Microsoft needs this to be competitive & complete the solution stack + reduce support issues.

Teaming modes: switch dependent, switch independent

Load balancing: address hash, Hyper-V port

Hashing modes: 4-tuple, 2-tuple, MAC address

Active/Active & Active/Standby

Vendor Agnostic
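An illustrative team created with the in-box PowerShell cmdlets (team and NIC names are placeholders):

  # Switch-independent team with Hyper-V port load balancing
  New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort
  # Optional: make one member standby for an active/standby setup
  Set-NetLbfoTeamMember -Name "NIC2" -AdministrativeMode Standby
  # Review the team and its members
  Get-NetLbfoTeam
  Get-NetLbfoTeamMember -Team "Team1"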

[Diagram: NIC Teaming (LBFO) architecture. A user-mode LBFO admin GUI and configuration DLL talk to the kernel-mode teaming components over WMI and IOCTL; the protocol edge / IM MUX exposes virtual miniports and ports to the Hyper-V Extensible Switch, while the LBFO provider handles frame distribution/aggregation, failure detection and control protocol implementation across NIC 1, NIC 2 and NIC 3 connected to the network switch]

NIC TEAMING (LBFO)

[Diagram: Parent NIC Teaming vs. Guest NIC Teaming. With parent teaming, two SR-IOV NICs form an LBFO teamed NIC below the Hyper-V virtual switch, so SR-IOV is not exposed to the VM (guest running any OS). With guest teaming, each SR-IOV NIC gets its own Hyper-V virtual switch and the LBFO team is created inside the VM (guest running Windows Server 2012)]

[Diagram: SMB Direct architecture. The SMB Client (user/kernel stack) and the SMB Server (NTFS, SCSI) each use an R-NIC and communicate over a network with RDMA support]

SMB Direct (SMB over RDMA)

What

Addresses congestion in network stack by offloading the stack to the network adapter

Advantages

Scalable, fast and efficient storage access

High throughput, low latency & minimal CPU utilization

Load balancing, automatic failover & bandwidth aggregation via SMB Multichannel

Scenarios

High performance remote file access for application servers like Hyper-V, SQL Server, IIS and HPC

Used by the File Server role and Cluster Shared Volumes (CSV) for storage communications within a cluster

Required hardware

RDMA-capable network interface (R-NIC)

Three types: iWARP, RoCE & InfiniBand

[Diagram: an application on the SMB Client accessing a disk on the SMB Server over a network with RDMA support]
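SMB Direct needs no configuration, but you can verify RDMA is available (illustrative PowerShell):

  # RDMA capability and state per physical adapter
  Get-NetAdapterRdma
  # Interfaces SMB sees on the client and server side, including RDMA capability
  Get-SmbClientNetworkInterface
  Get-SmbServerNetworkInterface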

SMB Multichannel

Multiple connections per SMB session

Full Throughput

Bandwidth aggregation with multiple NICs

Multiple CPUs cores engaged when using Receive Side Scaling (RSS)

Automatic Failover

SMB Multichannel implements end-to-end failure detection

Leverages NIC teaming if present, but does not require it

Automatic Configuration

SMB detects and uses multiple network paths
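Multichannel is on by default; an illustrative way to confirm it is active:

  # Active multichannel connections, including RSS/RDMA capability per path
  Get-SmbMultichannelConnection
  # Disable it only for troubleshooting, then re-enable
  Set-SmbClientConfiguration -EnableMultiChannel $false
  Set-SmbClientConfiguration -EnableMultiChannel $true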

SMB Multichannel Single NIC Port

1 session, without Multichannel:

No failover

Can't use the full 10 Gbps

Only one TCP/IP connection

Only one CPU core engaged

1 session, with Multichannel:

No failover

Full 10 Gbps available

Multiple TCP/IP connections

Receive Side Scaling (RSS) helps distribute the load across CPU cores

[Diagram: SMB Client and SMB Server, each with a single RSS-capable 10GbE NIC connected through a 10GbE switch; CPU utilization per core shown for cores 1 to 4]

SMB Multichannel Multiple NIC Ports

1 session, without Multichannel:

No automatic failover

Can't use the full bandwidth

Only one NIC engaged

Only one CPU core engaged

1 session, with Multichannel:

Automatic NIC failover

Combined NIC bandwidth available

Multiple NICs engaged

Multiple CPU cores engaged

[Diagram: SMB Clients and SMB Servers with two RSS-capable 10GbE NICs each, connected through two 10GbE switches]

SMB Multichannel & NIC Teaming

1 session, NIC Teaming without Multichannel:

Automatic NIC failover

Can't use the full bandwidth

Only one NIC engaged

Only one CPU core engaged

1 session, NIC Teaming with Multichannel:

Automatic NIC failover (faster with NIC Teaming)

Combined NIC bandwidth available

Multiple NICs engaged

Multiple CPU cores engaged

[Diagram: SMB Clients and SMB Servers with teamed 1GbE and 10GbE RSS-capable NICs connected through redundant switches]

SMB Direct & Multichannel

1 session, without Multichannel:

No automatic failover

Can't use the full bandwidth

Only one NIC engaged

RDMA capability not used

1 session, with Multichannel:

Automatic NIC failover

Combined NIC bandwidth available

Multiple NICs engaged

Multiple RDMA connections

[Diagram: SMB Clients and SMB Servers with dual R-NICs (10GbE or 54Gb InfiniBand) connected through redundant switches]

SMB Multichannel Auto Configuration

Auto configuration looks at NIC type/speed => the same NICs are used for RDMA/Multichannel (doesn't mix 10 Gbps/1 Gbps, RDMA/non-RDMA)

Let the algorithms work before you decide to intervene

Choose adapters wisely for their function

[Diagram: auto configuration examples with pairs of 1GbE NICs, 10GbE NICs, 10GbE R-NICs, InfiniBand R-NICs and wireless NICs; mismatched combinations (such as 1GbE with wireless, or RDMA with non-RDMA) are not mixed]
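If you really must override the automatic selection for one file server, a constraint can pin SMB to specific interfaces (illustrative; server and interface names are placeholders):

  # Only use these client interfaces when talking to FileServer1
  New-SmbMultichannelConstraint -ServerName "FileServer1" -InterfaceAlias "Ethernet 1","Ethernet 2"
  # Review constraints later
  Get-SmbMultichannelConstraint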

Networking Features Cheat Sheet

[Table: compares Large Send Offload (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queues (VMQ), Remote DMA (RDMA) and Single Root I/O Virtualization (SR-IOV) against the metrics lower latency, higher scalability, higher throughput and lower path length]

Advanced Network Features (2)

Consistent Device Naming

DCTCP/DCB/QoS

DHCP Guard/Router Guard/Port Mirroring

Port ACLs

IPsec Task Offload for Virtual Machines (IPsecTOv2)

Network virtualization & Extensible Switch

Consistent Device Naming

DCTCP Requires Less Buffer Memory

A 1 Gbps flow controlled by TCP needs 400 to 600 KB of buffer memory; the TCP sawtooth is visible.

A 1 Gbps flow controlled by DCTCP requires 30 KB of buffer memory; the flow is smooth.

Windows Server 2012 deals with network congestion by reacting to the degree, and not merely the presence, of congestion.

DCTCP aims to achieve low latency, high burst tolerance and high throughput with small-buffer switches.

Requires Explicit Congestion Notification (ECN, RFC 3168) capable switches.

The algorithm is enabled when it makes sense (low round-trip times, i.e. in the data center).

Datacenter TCP (DCTCP)
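An illustrative look at DCTCP from PowerShell; the transport filter below is a sketch that maps TCP port 445 traffic to the Datacenter template, which uses the DCTCP congestion provider (the port is only an example):

  # Show the TCP setting templates and their congestion providers
  Get-NetTCPSetting | Select-Object SettingName, CongestionProvider
  # Map a port range to the Datacenter template
  New-NetTransportFilter -SettingName Datacenter -LocalPortStart 445 -LocalPortEnd 445 -RemotePortStart 0 -RemotePortEnd 65535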

Running out of buffer in a switch gets you into stop/go hell: a boatload of green, orange & red lights along your way.

Big buffers mitigate this but are very expensive.

You want to be in a green wave: Windows Server 2012 & ECN provide network traffic control by default.

Data Center Bridging (DCB)

Prevents congestion in the NIC & network by reserving bandwidth for particular traffic types

Windows Server 2012 provides support & control for DCB and tags packets by traffic type

Provides lossless transport for mission-critical workloads

DCB is like a car pool lane …

DCB Requirements

1. Enhanced Transmission Selection (IEEE 802.1Qaz)

2. Priority Flow Control (IEEE 802.1Qbb)

3. (Optional) Data Center Bridging Exchange protocol

4. (Not required) Congestion Notification (IEEE 802.1Qau)
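A minimal DCB sketch for tagging and reserving bandwidth for SMB Direct traffic (the percentage and adapter name are placeholders):

  # Install the DCB feature
  Install-WindowsFeature Data-Center-Bridging
  # Tag SMB Direct (TCP 445) traffic with 802.1p priority 3
  New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
  # Reserve 50% for that priority via ETS and make it lossless with PFC
  New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
  Enable-NetQosFlowControl -Priority 3
  # Apply DCB on the converged adapter ("Ethernet 4" is a placeholder)
  Enable-NetAdapterQos -Name "Ethernet 4"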

[Diagram: converged networking. Two 10 GbE physical NICs form an LBFO teamed NIC under the Hyper-V virtual switch, carrying VM 1 to VM n traffic plus management OS traffic for live migration, storage and management]

Hyper-V QoS Beyond the VM

Manage the network bandwidth with a Maximum (value) and/or a Minimum (value or weight)

Default Flow per Virtual Switch

Customers may group a number of VMs that each don't have a minimum bandwidth assigned. They will be bucketed into a default flow, which has a minimum weight allocation. This is to prevent starvation.

[Diagram: Hyper-V Extensible Switch on a 1 Gbps link with a Gold Tenant VM (VM1, weight 10) and VM2 sharing the default flow]

Maximum Bandwidth for Tenants

One common customer pain point is that WAN links are expensive.

Cap VM throughput to the Internet to avoid bill shock.

[Diagram: VM traffic through the Hyper-V Extensible Switch; intranet traffic is unlimited (∞) while Internet traffic through the Unified Remote Access Gateway is capped at <100 Mb]

Bandwidth Network Management

Manage the network bandwidth with a Maximum and a Minimum value

SLAs for hosted virtual machines

Control per VM and not per host
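A hedged sketch of weight-based Hyper-V QoS (switch, adapter and VM names plus the numbers are placeholders):

  # The switch must be created in weight-based minimum bandwidth mode
  New-VMSwitch "TenantSwitch" -NetAdapterName "Ethernet 5" -MinimumBandwidthMode Weight
  # Give VMs without their own setting a default flow weight to prevent starvation
  Set-VMSwitch "TenantSwitch" -DefaultFlowMinimumBandwidthWeight 10
  # Guarantee a relative share and cap the absolute maximum (value is in bits per second)
  Set-VMNetworkAdapter -VMName "VM01" -MinimumBandwidthWeight 10
  Set-VMNetworkAdapter -VMName "VM01" -MaximumBandwidth 1GB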

DHCP & Router Guard, Port Mirroring
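These are per-VM network adapter settings; an illustrative PowerShell sketch with placeholder VM names:

  # Stop a VM from answering DHCP requests or advertising itself as a router
  Set-VMNetworkAdapter -VMName "VM01" -DhcpGuard On -RouterGuard On
  # Mirror the VM's traffic to a monitoring VM on the same virtual switch
  Set-VMNetworkAdapter -VMName "VM01" -PortMirroring Source
  Set-VMNetworkAdapter -VMName "MonitorVM" -PortMirroring Destination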

IPsec Task Offload

IPsec is CPU intensive => offload it to the NIC

In demand due to compliance (SOX, HIPAA, etc.)

IPsec is required for secure operations

Only available to host/parent workloads in W2K8R2

Now extended to virtual machines; managed by the Hyper-V switch
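An illustrative sketch; the VM name and SA count are placeholders:

  # Allow the VM to offload up to 512 IPsec security associations to the physical NIC
  Set-VMNetworkAdapter -VMName "VM01" -IPsecOffloadMaximumSecurityAssociation 512
  # A value of 0 disables IPsec task offload for that VM
  Set-VMNetworkAdapter -VMName "VM01" -IPsecOffloadMaximumSecurityAssociation 0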

Port ACL

Allow/Deny/Counter

MAC, IPv4 or IPv6 addresses

Wildcards allowed in IP addresses

ACLs are the basic building blocks of virtual switch security functions

Note: counters are implemented as ACLs

– Count packets to an address/range

– Read via WMI/PowerShell

– Counters tie into the resource metering you can do for charge/show back, capacity planning, etc.
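An illustrative sketch of port ACLs in PowerShell (the counter action is called Meter in the cmdlets; VM name and addresses are placeholders):

  # Block a VM from reaching a management subnet in both directions
  Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress 10.0.0.0/8 -Direction Both -Action Deny
  # Meter outbound traffic to a range for charge/show back
  Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress 192.168.1.0/24 -Direction Outbound -Action Meter
  # Read the ACLs (and metered values) back
  Get-VMNetworkAdapterAcl -VMName "VM01"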

http://workinghardinit.wordpress.com | @workinghardinit

Questions & Answers
