
Processor or Socket NUMA Node Core LP Processor or Socket NUMA Node Core LP Processor or Socket NUMA Node Core LP Processor or Socket NUMA Node Core


Network Tuning for Specific Workloads

Gabriel Silva, Don Stanwyck

DCIM-B344


MICROSOFT CONFIDENTIAL – INTERNAL ONLY

Agenda

• Terminology and Basic Architecture

• NIC Features for Networking


Let’s first cover some terminology

• Processor. One physical processor, which can consist of one or more nodes. A physical processor is the same as a package, a socket, or a CPU.

• Non-uniform memory architecture (NUMA) node. A set of logical processors and cache that are close to one another.

• Core. One processing unit, which can consist of one or more logical processors.

• Logical processor (LP). One logical computing engine from the perspective of the host operating system, application, or driver. In effect, an LP is a thread.

• Virtual Processor (VP). One logical computing engine from the perspective of a virtual operating system. VPs are specific to a virtual machine and do not map 1:1 to LPs on the physical system.

• Affinity. A preference indicated by a thread, process, or interrupt for operation on a particular processor, node, or group.

[Diagram: each Processor or Socket contains a NUMA Node, each NUMA Node contains a Core, and each Core contains two LPs]


Basic Architecture

[Diagram labels: Parent Partition (LP0 to LP3), Hyper-V Extensible Switch (Extensions, Routing, Filtering, ACLs), Network Adapter, VM BUS, Virtual machine (VP0 to VP3), VM NIC]


NIC Performance Features for Networking


Problem

• An Enterprise Web Server has a large volume of incoming packets spanning a large number of incoming connections. A small set of logical processors runs at 100% utilization. An IT Admin wants to scale the workload across many logical processors and get more parallelization in servicing requests.

• Examples: Web Server and File Server


Receive Side Scaling (RSS)

• Why do I need RSS?
  • Network traffic is no longer limited to one core
  • RSS distributes receive and send networking traffic to multiple processors for the host

• When should I enable RSS?
  • Enabled by default on Windows Server machines without a vSwitch created
  • Useful when processing large amounts of networking traffic


[Diagram labels: Physical Machine with CPU0 to CPU5, RSS Queue1 to RSS Queue4, RSS Logic, Network Packet]

RSS Step by Step

1. NIC performs the spreading of the network traffic by TCP/UDP flows

2. Flows are then placed into RSS Queues on the NIC

3. Networking traffic inside the RSS Queue is then indicated to the affinitized processor in the host

4. RSS CPUs are configured by the administrator
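The steps above can be sketched in a few lines of Python. This is an illustrative model only, not the NIC's actual implementation: it hashes a TCP/UDP 4-tuple with a Toeplitz hash (the hash named later in the vRSS diagram) and uses the low bits of the result to index an indirection table of CPUs. The key `DEMO_KEY`, the CPU list `RSS_CPUS`, the table size of 128, and all function names are assumptions made for the example.

```python
import ipaddress
import struct

# Arbitrary 40-byte demo key (an assumption); a real adapter uses a key set by the OS.
DEMO_KEY = bytes(range(1, 41))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window that starts at each set bit of the input."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 31:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack the 4-tuple in network byte order: addresses first, then ports."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


def build_indirection_table(rss_cpus, size=128):
    """Step 4: the administrator picks the RSS CPUs; fill the table round-robin."""
    return [rss_cpus[i % len(rss_cpus)] for i in range(size)]


def select_cpu(table, flow) -> int:
    """Steps 1 to 3: hash the flow, index the table, return the affinitized CPU."""
    h = toeplitz_hash(DEMO_KEY, flow_bytes(*flow))
    return table[h % len(table)]


RSS_CPUS = [0, 1, 2, 3]  # hypothetical choice of RSS processors
TABLE = build_indirection_table(RSS_CPUS)

flow_a = ("10.0.0.1", "10.0.0.2", 50000, 80)
flow_b = ("10.0.0.1", "10.0.0.2", 50001, 80)
print(select_cpu(TABLE, flow_a), select_cpu(TABLE, flow_b))
```

Every packet of one flow lands on the same CPU, which preserves in-order processing, while different flows can land on different CPUs.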


RSS Configuration

• PowerShell Configuration
  • Get-NetAdapterRss
  • Set-NetAdapterRss
    • -Profile, -BaseProcessorNumber, -MaxProcessorNumber, -MaxProcessors
  • Enable-NetAdapterRss
  • Disable-NetAdapterRss

• Profiles
  • Closest (like WS08 R2)
  • Closest Static
  • NUMA Static (default)
  • NUMA Dynamic
  • Conservative

Profile matrix:

              Dynamic                   Static
NUMA aware    NUMA Dynamic              NUMA Static (default)
Non-NUMA      Closest (like WS08 R2)    Closest Static


Demo

RSS


Problem

• An IT Pro is unable to run more VMs on a deployment server because incoming packet processing is saturating a limited set of logical processors on the physical host.

• Examples: Private Cloud Deployments


Virtual Machine Queue (VMQ)

• Why do I need VMQ?
  • RSS spreads traffic for one physical host and by TCP/UDP flows
  • Multiple guests on a system break that model
  • VMQ spreads traffic per vNIC
  • Each VMQ can use at most one logical CPU in the host

• When should I enable VMQ?
  • Enabled by default on Windows Server machines with a vSwitch created
  • Useful when hosting many VMs on the same host


VMQ

[Diagram labels: Parent Partition, Hyper-V Extensible Switch (Routing, Filtering, Port1, Port2), VM BUS, two Virtual machines each with a Network stack and a VM NIC, VP0 to VP5, MAC Address Filter, two VM Queues, Default Queue]


VMQ Step-by-Step

1. NIC performs the spreading of the network traffic by MAC address

2. Flows are then placed into the appropriate VMQ on the NIC

3. Networking traffic inside the VMQ is then indicated to the affinitized processor in the host
   • VMQ indication saves CPU by bypassing the routing and filtering in the vSwitch

4. Once vSwitch processing is completed, vSwitch passes the traffic to the correct VM

5. VMQ CPUs are configured by the administrator
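As a rough illustration of the steps above, the following Python sketch models a MAC address filter that places traffic into a per-vNIC queue, with each queue affinitized to exactly one host LP and a default queue for unmatched traffic. The class `VmqNic`, its method names, the MAC addresses, and the LP numbers are all hypothetical; the real filtering happens in NIC hardware.

```python
DEFAULT_QUEUE = "DefaultQueue"


class VmqNic:
    """Toy model of VMQ: destination MAC -> queue -> one affinitized host LP."""

    def __init__(self, default_lp: int):
        self.mac_filter = {}                       # destination MAC -> queue name
        self.queue_lp = {DEFAULT_QUEUE: default_lp}  # queue name -> host LP

    def add_vm_queue(self, queue: str, mac: str, lp: int) -> None:
        """Administrator step: give a vNIC its own queue and pin it to one LP."""
        self.mac_filter[mac.lower()] = queue
        self.queue_lp[queue] = lp

    def classify(self, dst_mac: str):
        """Steps 1 to 3: filter by MAC, pick the queue, return (queue, host LP)."""
        queue = self.mac_filter.get(dst_mac.lower(), DEFAULT_QUEUE)
        return queue, self.queue_lp[queue]


nic = VmqNic(default_lp=0)
nic.add_vm_queue("VMQ1", "00-15-5D-00-00-01", lp=2)  # hypothetical vNIC of VM 1
nic.add_vm_queue("VMQ2", "00-15-5D-00-00-02", lp=4)  # hypothetical vNIC of VM 2

print(nic.classify("00-15-5d-00-00-01"))
print(nic.classify("00-15-5D-00-00-99"))
```

All traffic for one vNIC lands on a single LP in this model, which is the limit that motivates vRSS later in the deck.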


VMQ Configuration

• PowerShell Configuration
  • Get-NetAdapterVmq
  • Set-NetAdapterVmq
    • -BaseProcessorNumber, -MaxProcessorNumber, -MaxProcessors
  • Enable-NetAdapterVmq
  • Disable-NetAdapterVmq
  • Get-NetAdapterVmqQueue


Demo

VMQ


Problem

• An IT Pro is hosting an Enterprise Web Server in a VM for high availability and to leverage existing hardware. Like the server on a physical machine, this server has a large volume of incoming packets spanning a large number of incoming connections. The Pro notices that the workload used to scale natively, but in a VM the processing is bound to one VP and a single LP in the root. The Pro wants to scale the workload across many logical and virtual processors to get more parallelization in servicing requests.

• Examples: File Servers, Web Server, Gateways


Virtual Receive Side Scaling (vRSS)

• Why do I need vRSS?
  • Remember this from VMQ: each VMQ can use at most one logical CPU in the host
  • One guest with high networking traffic is capped at a single CPU for network processing
    • Example: File Server, SQL Server
  • vRSS enables RSS inside of a VM and spreading inside the host for vSwitch processing

• When should I enable vRSS?
  • Disabled by default on Windows Server 2012 R2 machines
  • Useful when hosting one or two large VMs with high networking traffic


vRSS

[Diagram labels: Parent Partition, Hyper-V Extensible Switch (Extensions, ACLs, Routing, Filtering, Port1, Port2), VM BUS, two Virtual machines each with VP0 to VP3, a VM NIC, and an Indirection Table, host LP0 to LP5, MAC Address Filter, Toeplitz Hash, Indirection Table, two VM Queues, Default Queue]


vRSS Step-by-Step

1. NIC performs the spreading of the network traffic by MAC address

2. Flows are then placed into the appropriate VMQ on the NIC

3. Networking traffic inside the VMQ is then indicated to the affinitized processor in the host
   • VMQ indication saves CPU by bypassing the routing and filtering in the vSwitch

4. NIC applies RSS logic for the flow based on the VMQ it is indicated on to spread traffic further for vSwitch processing, if needed

5. Once vSwitch processing is completed, vSwitch passes the traffic to the correct VM

6. The VM applies another level of RSS logic to now spread networking traffic inside the VM

7. VMQ CPUs are configured by the administrator in the host, and RSS CPUs on the guest side are configured by the VM's administrator
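The two levels of spreading described above can be sketched as follows. This is a simplified model under stated assumptions: `zlib.crc32` stands in for the Toeplitz hash, the tables are plain Python lists, and the names `vrss_dispatch`, `VMQ_BY_MAC`, `HOST_TABLE`, and `GUEST_TABLE` are hypothetical. It only shows that the host picks an LP for vSwitch processing and the guest independently picks a VP for the same flow.

```python
import zlib


def flow_hash(flow) -> int:
    """Stand-in for the Toeplitz hash: any stable per-flow hash works for the sketch."""
    return zlib.crc32(repr(flow).encode("ascii"))


def vrss_dispatch(dst_mac, flow, vmq_by_mac, host_table, guest_table):
    """Return (VMQ, host LP, guest VP) for one packet of a flow."""
    queue = vmq_by_mac[dst_mac]                   # steps 1-2: MAC filter selects the VMQ
    h = flow_hash(flow)
    host_lp = host_table[queue][h % len(host_table[queue])]  # steps 3-4: spread in host
    guest_vp = guest_table[queue][h % len(guest_table[queue])]  # step 6: spread in VM
    return queue, host_lp, guest_vp


# Hypothetical configuration: host side set by the host administrator,
# guest side set by the VM's administrator (step 7).
VMQ_BY_MAC = {"00-15-5D-00-00-01": "VMQ1"}
HOST_TABLE = {"VMQ1": [2, 3, 4, 5]}    # host LPs used for vSwitch processing
GUEST_TABLE = {"VMQ1": [0, 1, 2, 3]}   # VPs inside the VM

flows = [("10.0.0.1", "10.0.0.2", port, 445) for port in range(50000, 50008)]
for f in flows:
    print(vrss_dispatch("00-15-5D-00-00-01", f, VMQ_BY_MAC, HOST_TABLE, GUEST_TABLE))
```

Without the second table the whole VM would stay on one LP and one VP, which is the VMQ-only behavior the problem statement describes.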


vRSS Configuration

• PowerShell Configuration (Host Side)
  • VMQ PowerShell Cmdlets

• PowerShell Configuration (VM)
  • RSS PowerShell Cmdlets


Demo

vRSS


IT Pro’s Problem

• An IT Pro has a VM in a private cloud that hosts applications that receive data from the network, make some decisions based on the data, and then send a response. The workload is very latency sensitive, and the IT Pro needs to minimize the round-trip time, down to microseconds, between receiving the data and sending a response back out.

• Examples: Financial Applications


Single Root I/O Virtualization (SR-IOV)

• Why do I need SR-IOV?
  • SR-IOV distributes networking traffic by VM and delivers directly to the VM from the NIC
  • *Some adapters also have RSS for SR-IOV from the NIC

• When should I enable SR-IOV?
  • Disabled by default on Windows Server machines
  • Useful when hosting VMs with low latency requirements in a trusted environment


SR-IOV

[Diagram labels: Parent Partition, Hyper-V Extensible Switch (Extensions, Routing, Filtering, ACLs), VM BUS, SR-IOV network adapter with VF, one Virtual machine with a Network stack, VM NIC, and Virtual function (VF), one Virtual machine with a Network stack and VM NIC]


SR-IOV Step-by-Step

1. NIC performs the spreading of the network traffic by MAC address

2. Flows are then placed into the appropriate VF on the NIC

3. Networking traffic inside the VF is then indicated to the correct VM
   • SR-IOV saves CPU and decreases latency by bypassing the host and vSwitch altogether


SR-IOV Configuration

• Hyper-V Extensible Switch Configuration
  • Get-NetAdapterSriov

• PowerShell Configuration
  • Get-NetAdapterSriov
  • Set-NetAdapterSriov
    • -NumVFs
  • Enable-NetAdapterSriov
  • Disable-NetAdapterSriov
  • Set-NetAdapterAdvancedProperty


NIC Offloads Key Takeaways

• RSS provides high scalability of network traffic on a single host

• VMQ distributes traffic equally amongst multiple guests on a single host with a vSwitch (1 core per vmNIC/vNIC)

• vRSS provides high scalability of network traffic on a single host to a few large VMs

• SR-IOV is a low latency path for network traffic that bypasses the host and vSwitch in secure environments


NIC Offloads Key Takeaways, cont…

[Table: columns RSS and VMQ; rows Host (no vSwitch bound to this NIC), vSwitch (bound to the NIC), and VM (vRSS). For the VM (vRSS) row: RSS inside the VM, VMQ inside the host for vSwitch processing]


NIC Offloads altogether

[Diagram labels: Parent Partition, Hyper-V Extensible Switch (Extensions, ACLs, Port1, Port2), VM BUS, two Virtual machines each with VP0 to VP3 and a VM NIC, Indirection Table, host LP0 to LP5, two VMQs, RSS]


NIC Teaming in WS 2012

• All Static Distribution
  • Different hashes (address elements, switch ports)

• Hashes were static until failure or reboot

• No explicit attempt to balance loads, just to distribute them


WS 2012 R2: New Dynamic mode

• Continuously monitors distribution of traffic

• Actively adjusts traffic based on observed load

• Will move TCP streams from team member to team member during breaks in the stream
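One way to picture the dynamic mode is the sketch below. It is a toy model, not the teaming driver's algorithm: a flow stays on its current team member while packets arrive back to back, and when a gap longer than a threshold is seen (a new flowlet), the flow is moved to the member that has carried the least traffic so far. The class `DynamicTeam`, the `flowlet_gap` threshold of 1 ms, and the byte-count load metric are assumptions made for the example.

```python
class DynamicTeam:
    """Toy flowlet balancer: re-pick the least loaded NIC only at breaks in a stream."""

    def __init__(self, members, flowlet_gap=0.001):
        self.load = {m: 0 for m in members}  # bytes sent per team member
        self.flows = {}                      # flow id -> (member, time of last packet)
        self.flowlet_gap = flowlet_gap       # seconds of silence that end a flowlet

    def send(self, flow_id, size, now):
        entry = self.flows.get(flow_id)
        if entry is None or now - entry[1] > self.flowlet_gap:
            # New flow or new flowlet: choose the least used team member.
            member = min(self.load, key=self.load.get)
        else:
            # Same flowlet: keep the path so packets are not reordered.
            member = entry[0]
        self.load[member] += size
        self.flows[flow_id] = (member, now)
        return member


team = DynamicTeam(["NIC1", "NIC2"])
first = team.send("A", 1000, now=0.0)       # new flow
second = team.send("A", 1000, now=0.0005)   # same flowlet, gap below threshold
third = team.send("A", 1000, now=0.0100)    # break in the stream, may move
print(first, second, third)
```

The static WS 2012 behavior corresponds to never taking the first branch after the flow's initial packet, so a flow keeps its original member until failure or reboot.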


NIC TEAM: How it works in WS 2012…

[Diagram labels: TCP/IP Native Stack, tNIC, three NICs, interleaved packets of flows A, B, C, and D]

Each arrow represents a flowlet. In WS2012 each flowlet always follows the same path the previous flowlet from that flow did because flowlets aren’t detected and rebalancing isn’t performed.


NIC TEAM: How it works in WS 2012 R2…

[Diagram labels: TCP/IP Native Stack, tNIC, three NICs, interleaved packets of flows A, B, C, and D]

Each arrow represents a flowlet. In WS2012 R2 each flowlet is independently routed to the least used NIC in the team. With MAC address rewrite the adjacent switches are unaware that flows are moving around.


For More Information

Windows Server

Windows Server 2012 R2: http://technet.microsoft.com/en-US/evalcenter/dn205286

Microsoft Azure

Microsoft Azure: http://azure.microsoft.com/en-us/

System Center

System Center 2012 R2: http://technet.microsoft.com/en-US/evalcenter/dn205295

Azure Pack

Azure Pack: http://www.microsoft.com/en-us/server-cloud/products/windows-azure-pack


Resources

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

msdn

Resources for Developers

http://microsoft.com/msdn

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Sessions on Demand

http://channel9.msdn.com/Events/TechEd


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.