
AZURE ACCELERATED NETWORKING: SMARTNICS IN THE PUBLIC CLOUD

Firestone, D. et al.

NETWORK VIRTUALIZATION

Virtualization of OSI Layers 2 – 7

Routers and switches implemented in software emulate their physical counterparts

Provides features similar to those found on physical networking hardware, including:

Route Tables

Subnets

DHCP

DNS

Firewalls

Access Control Lists (ACLs)

Network Address Translation (NAT)
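As a rough illustration of what "a router implemented in software" can mean, the sketch below performs a longest-prefix-match lookup against a route table. The prefixes and next-hop labels are hypothetical and are not taken from Azure or from the paper; only the Python standard library is used.

```python
import ipaddress

# Hypothetical route table: (prefix, next-hop label). Entries are illustrative only.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/16"), "vnet-local"),
    (ipaddress.ip_network("10.0.1.0/24"), "subnet-a"),
    (ipaddress.ip_network("0.0.0.0/0"), "internet-gateway"),
]

def lookup(dst):
    """Return the next hop of the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        return None
    # The longest prefix wins, as in a physical router's forwarding table.
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

An address inside 10.0.1.0/24 resolves to the more specific subnet route even though it also matches the /16 and the default route.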

SOFTWARE DEFINED NETWORKING (SDN)

Cloud providers (AWS, Azure) enable customers to configure virtual networks through software

Such networks run on the cloud platform hypervisor (NOT within the customer’s virtual machines)

Cloud providers generally provide SDN free of charge to end customers

Customers pay only for outbound data, virtual network peering (per GB), and public IPv4 addresses

https://azure.microsoft.com/en-us/pricing/details/virtual-network/

AZURE VIRTUAL FILTERING PLATFORM (VFP)

THE HIDDEN COST OF NETWORKING

[Figure: Physical Server with ten CPUs; Virtual Machine 1, Virtual Machine 2, Virtual Machine 3; Hypervisor processes, including SDN]

Each rented CPU generates ~$900/year in revenue

THE HIDDEN COST OF NETWORKING

[Figure: Physical Server; Virtual Machine 1, Virtual Machine 2, Virtual Machine 3 (CPU ×7); Virtual Machine 4 (CPU ×3); Hypervisor processes, including SDN]

Can we get here?
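A back-of-the-envelope sketch of what is at stake, using the ~$900/year per rented CPU figure from the slide. It assumes the figures are read as three of the ten CPUs moving from hypervisor work to a fourth rentable VM. The fleet size in the usage note is purely hypothetical.

```python
REVENUE_PER_CPU_PER_YEAR = 900  # USD, approximate figure quoted on the slide

def reclaimed_revenue(cpus_freed, servers=1):
    """Yearly revenue recovered if `cpus_freed` CPUs per server become rentable."""
    return cpus_freed * servers * REVENUE_PER_CPU_PER_YEAR
```

Under these assumptions, `reclaimed_revenue(3)` is $2,700 per server per year; for a hypothetical fleet of 100,000 servers, `reclaimed_revenue(3, servers=100_000)` is $270 million per year.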

HOST SDN

[Figure: Virtual Machines; Host with SDN; Physical NIC]

SR-IOV (SINGLE ROOT I/O VIRTUALIZATION)

COMPARISON

Host SDN

Host SDN is full-featured

It is slow and expensive!

SR-IOV

SR-IOV is fast

It cannot apply all SDN rules

Can we get the best of both?

GENERIC FLOW TABLE (GFT) OFFLOAD

Just-In-Time Compilation

For the first packet in each flow, route it through Host SDN to apply policies

Store the result in a hash table (keyed on L2/L3/L4 properties) in the SR-IOV NIC

The SR-IOV NIC routes all subsequent packets over the VF directly to the VM
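The three steps above can be sketched as a cache in front of a slow path. This is a minimal model, not the actual GFT implementation: the `FlowKey` fields cover only L3/L4 properties (L2 fields are omitted for brevity), and `host_sdn_slow_path` with its single rule is a hypothetical stand-in for full policy evaluation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    """Hashable flow identifier (hypothetical; L2 fields omitted for brevity)."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int

def host_sdn_slow_path(key):
    """Stand-in for host SDN policy evaluation; the rule here is made up."""
    if key.dst_port == 23:
        return "drop"
    return "forward"

class FlowTable:
    """Models the per-flow hash table held in the NIC."""

    def __init__(self):
        self.entries = {}
        self.slow_path_hits = 0

    def process(self, key):
        action = self.entries.get(key)
        if action is None:
            # First packet of the flow: evaluate policy once, then cache it.
            self.slow_path_hits += 1
            action = host_sdn_slow_path(key)
            self.entries[key] = action
        # Subsequent packets of the flow are served from the table alone.
        return action
```

Sending several packets with the same `FlowKey` through `process` invokes the slow path only once; every later packet is a plain dictionary lookup, which is the part the hardware takes over.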

DESIGN GOALS

1. Minimize burning of CPU cores

2. Maintain programmability

3. Be as good as SR-IOV

4. Be extensible for new features

5. Be deployable across the entire fleet

6. Achieve high single-connection performance

7. Scale to 100GbE+

8. Be serviceable

Overall focus on cost minimization

SR-IOV WITH GFT VS SNAP?

How do the design goals of SR-IOV with GFT compare to those of SNAP?

Consider the following dimensions:

CPU Utilization

Programmability

Throughput

Future-proof

Compatibility

Single-connection performance

100GbE

Serviceability

Security

THE DESIGN OF THE SMARTNIC

Multiple options to consider:

ASIC-based NICs

Multicore SoC NICs

FPGA NICs

Burn host cores

ASIC

Application Specific Integrated Circuit

High performance

1-2 years development-to-manufacturing lead time

Cannot be changed after manufacturing

For well-specified, non-evolving applications, such as Bitcoin Mining (pictured), ASICs offer maximum performance for cost

MULTICORE SOC

System on Chip

Provides Linux environment, where one can run DPDK

Works well up to 40GbE

FPGA

Field Programmable Gate Array

Like an ASIC but can be reprogrammed

Already used within Microsoft for Bing (Catapult)

WHICH IS BEST?

What are the advantages and disadvantages of these three hardware options along with using the host CPU?

DETAILED COMPARISON

ASICs SoCs FPGAs Host CPUs

Host CPU Utilization ✅ ✅ ✅ ❌

Programmability ❌ ✅ ✅ ✅

Throughput ✅ ❌ ✅ ❌

Future-proof ❌ ✅ ✅ ✅

Compatibility ✅ ✅ ✅

Single Connection Performance ✅ ❌ ✅ ❌

100 GbE ✅ ❌ ✅ ❌

Serviceability ❌ ✅ ✅ ✅

Ease of Development ❌ ✅ ✅ ✅

Power Efficiency ✅ ❌ ✅ ❌

SYSTEM DESIGN

ACCELNET ARCHITECTURE

SERVICEABILITY

Question: If all traffic goes through AccelNet, and AccelNet is exposed as a VF directly into the VM, how does Azure maintain uptime while servicing the FPGA or GFT?

SERVICEABILITY

KERNEL-BYPASS PROTOCOLS

DPDK: Poll Mode Driver (PMD) transparently binds between the VMBus and the VF

Exposes all DPDK APIs
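One way to picture "transparently binds between the VMBus and the VF" is a port object that prefers the VF and falls back to the VMBus path when the VF is unavailable. The sketch below is an assumption about the general idea only: `FailoverPort` and its methods are hypothetical names and do not correspond to any DPDK or Azure API.

```python
class FailoverPort:
    """Hypothetical port: prefers the VF, falls back to the VMBus path."""

    def __init__(self):
        self.vf_available = True
        self.sent = []  # (path, packet) pairs, kept for inspection

    def send(self, packet):
        # The caller never chooses a path; the port picks one per packet.
        path = "vf" if self.vf_available else "vmbus"
        self.sent.append((path, packet))
        return path

    def revoke_vf(self):
        """Model the VF being taken away, e.g. for servicing."""
        self.vf_available = False

    def restore_vf(self):
        """Model the VF being handed back after servicing."""
        self.vf_available = True
```

The application keeps calling `send` throughout; only the path underneath changes, which is the sense in which the switch is transparent.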

RDMA: Fallback to TCP – open area of research

PERFORMANCE

BATTLE OF THE CLOUDS

DISCUSSION QUESTIONS

1. What are the security concerns of AccelNet?

2. What are the drawbacks of AccelNet?

3. Which is the better approach to take – AccelNet vs SNAP?

4. Should public clouds charge customers for the complexity of their networks?
