Mario Smarduch, Senior Virtualization Engineer from the Samsung OSG, gives his perspective on Network Function Virtualization, including discussions of the Xen platform.
© 2013 SAMSUNG Electronics Co.
State of the Union: Open Source Network Function Virtualization
Mario Smarduch
Senior Virtualization Architect
Open Source Group
Samsung Research America (Silicon Valley)
m.smarduch@samsung.com
Talk Description
One of the hottest developments today for Fixed and Mobile Networks is
'Network Function Virtualization', headed by the ETSI (European Telecommunications
Standards Institute) ISG, which managed to become the largest ISG in a matter of six months
with close to 70 members and 90 participants. The goals of NFV are to eliminate proprietary
hardware appliances; to reduce energy, space, and hardware turnover cost; to leverage IT
virtualization benefits like consolidation, time to market, multi-tenancy of heterogeneous
applications, and scaling out and in; and to encourage an open eco-system not tied to any
specific hardware. However, IT virtualization is currently not fit for some NFV scenarios,
Network Elements, and User Equipment. Proprietary vendors and chip manufacturers are
rushing to close this gap.
This presentation focuses on open source virtualization technology, primarily KVM-ARM, to
contrast these gaps and identify the low-level enhancements required in the hypervisor and
guest; ongoing community development to address these gaps is also presented. Real use
cases are presented to illustrate why IT virtualization is not always a fit for many NFV
scenarios. A brief overview of ARM-KVM virtualization and hardware extensions is also
covered.
Agenda
General Public Clouds
NFV Introduction, Status
Cloud RAN NFV use case
KVM (ARM) – limitations/required enhancements
Public Cloud Control
Focus on IaaS – PaaS and SaaS are layered on top
- NFV has PaaS and SaaS use cases as well, and powerful ones (see NFV use case document)
- IaaS to grow from $4.2B in 2011 to $24B in 2016 (Source: Gartner)
• IaaS owner issues a new VM request via the portal server, with params: # of cores, memory, storage, image to load/install
• Scheduler views the physical server/storage/network DB, selects the optimal server, loads the image, creates the RAID, creates the VM in the Compute cloud
o May need to migrate load, create NAT entries
o For KVM, issues commands through libvirt → qemu
• Update DB to maintain availability
• IaaS owner is unaware of physical topology, migration, and other management, i.e. the cloud infrastructure control plane
• OpenStack equivalent components – Dashboard, Network, Compute, Image, Block Storage, …
[Diagram: IaaS owners, Portal, Scheduler, Database; agents in the Compute Cloud and the Storage & Image Cloud; RAID; VMs; QEMU with vCPU/IO threads, qcow2 image, virtio-blk, virtio-net; vSW under OpenFlow/SDN control (also VLANs, GRE, ...); SSH, VNC]
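The scheduler step above can be sketched in a few lines. This is a minimal illustration, not how any particular cloud scheduler works: the `Server` and `VmRequest` types, their fields, and the best-fit rule used for "optimal" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical row of the physical server DB.
    name: str
    free_cores: int
    free_mem_gb: int
    free_storage_gb: int


@dataclass
class VmRequest:
    # Parameters the IaaS owner submits through the portal.
    cores: int
    mem_gb: int
    storage_gb: int
    image: str


def pick_server(servers, req):
    """Return the best-fit server for req and update its free resources.

    "Optimal" is assumed to mean best fit: the candidate left with the
    fewest spare cores, then the least spare memory. Returns None when
    no server can hold the request.
    """
    candidates = [
        s for s in servers
        if s.free_cores >= req.cores
        and s.free_mem_gb >= req.mem_gb
        and s.free_storage_gb >= req.storage_gb
    ]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda s: (s.free_cores - req.cores, s.free_mem_gb - req.mem_gb),
    )
    # "Update DB to maintain availability": record what was consumed.
    best.free_cores -= req.cores
    best.free_mem_gb -= req.mem_gb
    best.free_storage_gb -= req.storage_gb
    return best
```

Calling `pick_server` repeatedly with the same server list shows how the DB update steers later requests to other machines. A real scheduler would also weigh network placement, migration, and NAT setup, which this sketch leaves out.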
Public Cloud Network
L3 & L2 in public cloud – Scaling the Cloud
- Public clouds: 40,000 physical machines, possibly up to 1,000,000 VMs
∙ 2011 Gartner: 8 VMs/server; a 30:1 ratio is probable
- IaaS typically doesn't require an L2 broadcast domain; it scales through multiple VMs
- VMs placed on unique subnets – isolated for security
- Very few apps require L2 in the cloud (broadcast, multicast – discovery of services)
∙ Large cloud providers support L2 subnets
∙ Some client/server architectures, e.g. front end/back end processing
- Large cloud scaling is achieved through hierarchical, aggregated L3 routes
Public Cloud Network
Hierarchical route aggregation
- 10.20.0.0/16 – bits [17-24] select the subnet (10.20.0.0/24 … 10.20.255.0/24): aggregate of 256 subnets * 256 addresses
- 10.20.254.0/27 – 10.20.254.224/27 – bits [25-27]: aggregate of 8 subnets * 32 addresses
- 10.20.254.0/30 – 10.20.254.28/30 (addresses .0 to .31) – bits [28-30]: aggregate of 8 subnets * 4 addresses, each with 1 IP, 1 GW
L2 overlay on L3, to support isolated L2 subnets for IaaS VMs
- vSW 192.168.x.x
SDN orchestration
- OpenFlow: switch/route on any fields
SDN – Open vSwitch with KVM
  # ovs-vsctl add-br br0
  # ovs-vsctl add-port br0 <phys-intfc>
- qemu.ifup – ovs-vsctl adds the tap to br0
- ovs-ofctl – control flows
DNS load balancing
- For example, an A record mapping to multiple IPs
- Scaling via L3
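The aggregation arithmetic on this slide can be checked with Python's standard `ipaddress` module. This is only a sketch of the three levels named above; the helper name `split` is made up for the example.

```python
import ipaddress


def split(network, new_prefix):
    """List the subnets of `network` at the given longer prefix length."""
    return list(ipaddress.ip_network(network).subnets(new_prefix=new_prefix))


# Level 1: the /16 aggregate carved into /24 subnets (bits 17-24).
level1 = split("10.20.0.0/16", 24)

# Level 2: one /24 carved into /27 subnets (bits 25-27).
level2 = split("10.20.254.0/24", 27)

# Level 3: one /27 carved into /30 subnets (bits 28-30).
level3 = split("10.20.254.0/27", 30)

for name, level in (("/24", level1), ("/27", level2), ("/30", level3)):
    print(name, len(level), "subnets of", level[0].num_addresses, "addresses,",
          "first", level[0], "last", level[-1])

# A /30 has two usable host addresses: one for the VM, one for the gateway.
print([str(h) for h in level3[0].hosts()])
```

Only the top-level /16 route has to be advertised upstream; each lower tier routes on the next few bits, which is what keeps the routing tables small as the cloud grows.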
Public Cloud Characteristics (IaaS)
Workloads
Web front end, SQL database back end – eCommerce
Social Networking
SaaS apps like email, Content Backup
High Performance Computing in the cloud
Characteristics
• Resources – traditional compute – CPU, RAM, storage, network
• Response – not real-time – response driven by user perception (web interface)
• Scalability – out, in – add/remove VMs
- Front-end server or load balancer distributes load
• I/O – primarily virtualized – storage, network
• Overcommit – current average as much as 8:1, future 30:1 per server [Source: cloudscaling]
• Orchestration – spans few VM types, small geographic area – same Pod
Introduction to NFV
Mobile Network – LTE EUTRAN/EPC
• EUTRAN – eNodeB, UE
- Radio – bearer, admission, mobility, scheduling; dynamic radio resource allocation for uplink and downlink
• EPC
- PCRF = Policy and Charging Rules Function
∙ Determines QoS Class Identifier for data flow
∙ QoS – GBR/non-GBR, priority, delay, packet error loss rate – RT gaming, voice, live streaming most demanding
- HSS = Home Subscriber Server
∙ Subscriber profile – QoS, APN (PDN), current user MME
- P-GW = Packet Data Network Gateway
∙ UE IP alloc, enforce PCRF QCI, map to DL bearers
- S-GW = Serving Gateway
∙ UE anchor for all IP traffic as UE roams through eNodeBs; retains bearer info for UE in idle
- MME = Mobility Management Entity
∙ Control node: UE attachment, bearer setup, UE context management from HSS,
processes Tracking Area Update, paging, UE IDLE to CONNECTED state
Establishing Bearers
Data Plane – three traffic pipes (call setup)
- QCI1 GBR Delay 100ms voice
- QCI2 GBR Delay 100ms video
- QCI4 GBR Delay 50ms RT gaming
Control Plane – idea of the messaging in bearer setup (mobile initiated)
- LTE supports Public Safety – call setup time < 300ms – supports group calls
- Other procedures – Sys Info Bcast, UE Rand Access Proc., UE Attach/Detach, TAU, Call Term., Paging – MME 500-800 UE msgs/hr, heavy load 1500 msgs/hr
- Example call setup – ranges from 2-3 sec
[Diagram: bearer path UE – eNodeB – S-GW – P-GW made of Radio Bearer, S1-U Bearer, S5-U Bearer. Protocol stacks: IP/PDCP/RLC/MAC/PHY on the radio side, IP/GTP-U/UDP/IP/L2/L1 on the network side. Identifier mapping: UL-TFT → RB-ID (UE), RB-ID → S1 TEID (eNodeB), S1 TEID → S5 TEID (S-GW), QCI, DL-TFT → S5 TEID (P-GW)]
Message sequence between UE, eNodeB, MME, HSS, SGW, P-GW:
- RRC Conn Establish (several msgs)
- Bearer Resource Allocation Req
- Identification & Authorization Request (long procedure)
- Modify Bearer Request (two hops), Modify Bearer Response (two hops)
- Bearer Resource Cmd (two hops)
- Create Bearer Rqst (two hops)
- Determine E2E resources for bearer; allocate TFT, map to QCI and GTP-U TEID
- Activate bearer @ eNodeB, piggybacks UE message
- eNodeB tells UE on RRC allocation
……..
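The identifier mapping in the bearer diagram is a chain of per-node table lookups. The sketch below follows one uplink packet through that chain. All table contents, the bearer names, and the use of a (destination IP, port) pair as the packet filter are made-up simplifications for illustration.

```python
# Per-node lookup tables; every value here is hypothetical.

# UE: the UL-TFT is a set of packet filters that selects a radio bearer.
UE_UL_TFT = {("10.0.0.5", 5060): "RB-3"}

# eNodeB: radio bearer ID -> S1-U tunnel endpoint ID (TEID).
ENB_RB_TO_S1_TEID = {"RB-3": 0x1001}

# S-GW: S1-U TEID -> S5-U TEID towards the P-GW.
SGW_S1_TO_S5_TEID = {0x1001: 0x2001}


def uplink_path(flow):
    """Follow one uplink flow: UL-TFT -> RB-ID -> S1 TEID -> S5 TEID."""
    rb_id = UE_UL_TFT[flow]
    s1_teid = ENB_RB_TO_S1_TEID[rb_id]
    s5_teid = SGW_S1_TO_S5_TEID[s1_teid]
    return rb_id, s1_teid, s5_teid
```

Bearer setup is the control-plane work of filling these tables on every node along the path. That is part of why scaling one network element ripples through the others.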
LTE EUTRAN/EPC Load Characteristics
Resources
- Radio BW, Network (CN), CPU, Memory, Storage (varies on NE like HSS).
Response - State Machine driven
- Attachment, idle-connect, bearer setup – associated with timers/states
- Real-time sensitive – various parameters can be tuned – but User Experience Suffers
- User perception still all important – but hard deadlines exist
- Near native scheduling
Scalability & Orchestration
- Network tightly coupled – scaling out – ripples through NEs
- Unlike Public Cloud just adding new VMs will not do it
- Orchestration for scale out/in extremely complex
I/O – Need near native
- RAN – massive device pass-through BBU accelerators, EPC NIC device pass-through
Overcommit
- Delicate load calculation required for PLMN to scale on demand where needed
- Can’t apply Cloud 8:1, 30:1 ratios
Current State of NFV
NFV ETSI ISG
- Initial White Paper published Oct 2012
- Spans Mobile and Fixed Networks
- First serious attempt to virtualize Mobile/Fixed networks
∙ Members: Service Providers and all eco-system players
Proof of Concepts – Cloud RAN, Migration with Dev Pass-through, Cloud rGW
- Network Function Virtualization Infrastructure as a Service (NFVIaaS)
∙ Target Big Telco/Small Telco – lease NFVI as IaaS for VNF and Cloud
- VNFaaS – move enterprise CPE into SP cloud, and later PE, to simplify Opex/Capex
∙ AR, NG-FW, QoS/DPI owned/provisioned by SP
- VNPaaS – Platform as a Service, for example DNS, DHCP, email, FW
∙ Bring closer to APN – no tunneling back to central IT infrastructure – total control
∙ SP provides bare services, and Enterprise gets config tools to manage the service
- VNF Forwarding Graphs
∙ Essentially SDN – in a multi-tenant environment, OpenFlow-capable config is required to host, e.g., a small telco in the NFVI
∙ Needs SDN orchestration – OpenStack is enhancing Quantum for SDN – to span VNFs and physical network functions
- Mobile Core Virtualization – goes along with NFVIaaS (to some extent)
∙ Improves Self Optimizing Networks – delivers performance where needed
- Cloud-RAN – key features for SON, on-demand radio BW, Opex/Capex savings
- Virtualizing home – vSTB, vGW – Fixed Network video/internet delivery to home
NFV Cloud-RAN Use Case
Evolution of Radio Access Network
Co-located BBU & RRU
• Single mode – 2G, 3G – combined BBU & RRU
• Scaled to maximum peak – waste of resources
• Base Band Processing co-located with Remote Radio Unit
o Hard access, power an issue in some locations
BBU with distributed RRUs
• Remote Radio Units distributed via fiber links
• Base Band Processing supports multiple technologies
• BBU can be housed indoors, RRUs strategically distributed
Pooled vBBU
• Pooling of radio Base Band Unit processing
• Capacity dynamically adjusted – example: sports event
• Resources maximized – delivered on demand
• Several technologies supported
[Diagram: vBBU LTE, vBBU LTE, vBBU UMTS; PHY accelerators; MME/SGW, SGSN]
New Virtualization HYP Mode
Virtualization MMU Extensions
Interrupt Virtualization Extensions
Device Pass-through
Architecture/cost of interrupts
- BBU cloud has hundreds of devices passed through – small cells, many RRUs and fiber links
- Libvirt, qemu not ready for such massive pass-through; fault handling is another issue
- RRU to/from BBU: PHY OFDMA (channels, framing, FEC) to MAC – L2 logical channels
- L3 – RRC, NAS, IP
• MMU pass-through – to user
o Emulated devices trap to QEMU – not this type
o GVA → IPA → HPA – direct access to HW regs
∙ PCI – looks up target BARs for HPA, QEMU selects IPA
∙ DT – device node with HPA, QEMU selects IPA
o No performance penalty for MMU pass-through
• Cost of Exit/Enter – executed in HYP – optimized assembler
o Similar to a process switch, but a Guest switch is very costly
∙ More so – OS system registers saved/restored
∙ All banked regs (dabt, iabt, irq, …)
o No concept of a light-weight context switch like threads
o Goal: avoid at all costs
[Diagram: MMIO device pass-through (PHY to UE/NIC). QEMU and Guest (user, kernel, memory, drivers, tasks; GVA → IPA → HPA) over KVM/Host, HYP mode (PL2), and hardware (PHY to UE, NIC to CN) with the T&E device. IRQ path: guest exit → return to host → IRQ to host → inject to guest → EOI exit → vGICD update. Each switch saves one side's state and loads the other's: ~80 regs (GP regs, VFP/SIMD, CP15 regs) for both guest and host.]
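The GVA → IPA → HPA chain can be modelled as two table lookups: stage 1 is owned by the guest OS, stage 2 by the hypervisor. The sketch below uses flat dictionaries keyed by page number in place of real multi-level page tables; the 4 KiB page size and all addresses are assumptions for the example.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(addr, table):
    """Translate one address through a flat page-number -> page-number map."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        # In hardware this would be a translation fault.
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


def gva_to_hpa(gva, stage1, stage2):
    """Stage 1 (guest tables): GVA -> IPA. Stage 2 (hypervisor tables): IPA -> HPA."""
    ipa = translate(gva, stage1)
    return translate(ipa, stage2)


# Hypothetical mappings: the guest maps a device register page at GVA page
# 0x10 to IPA page 0x200; QEMU/KVM map that IPA page to the device's HPA page.
stage1 = {0x10: 0x200}
stage2 = {0x200: 0x9000}
print(hex(gva_to_hpa(0x10010, stage1, stage2)))
```

Once both stages are populated, the MMU resolves every access to the device registers without an exit, which matches the slide's point that MMU pass-through carries no performance penalty. Only a missing stage 2 entry would trap to the hypervisor.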
Device Pass-through
IRQ overhead and optimizations
[Diagram: MMIO device pass-through (PHY to UE/NIC). QEMU and Guest over KVM/Host, HYP mode (PL2), and hardware (PHY to UE, NIC to CN) with the T&E device. IRQ path: guest exit → return to host → IRQ to host → inject to guest → EOI exit → vGICD update.]
1. Guest executes – exit to HYP mode – save guest/restore host
2. Host enables interrupts – IRQ delivered to host – 1st complete IRQ OS path
3. Inject to Guest – save host, restore guest – 2nd complete IRQ OS path
4. Guest EOI – exit, save guest/restore host
5. Update virtual distributor
6. Resume Guest – save host/restore guest
Note: assumes the most direct injection – no irqfd and additional threads
Testing by Virtual Open Systems reveals at least 5x delay
Optimization 1
• ARM supports priority drop/deactivation: after IRQ ack the priority drops, and the Guest can deactivate during EOI with no exit
• ARM can inject hwirq
• Eliminates steps 4-6 (experimenting)
Optimization 2
• Process interrupts directly from HYP mode
o Currently HYP mode is limited to low-level code
• Build hwirq, inject to Guest
• Eliminates steps 2-6; HOWEVER requires C code, more overhead in HYP mode
In Addition
• IRQ CPU affinity must match vCPU affinity – either bind or follow vCPU; otherwise you need IPIs – very slow
• Preferable: vCPU in idle does not exit, wait for event does not exit
• Future GIC versions: per-IRQ direct delivery – still must handle sleeping guests
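The effect of the two optimizations can be tallied with a small model of the six-step path. It is a counting aid only: which steps each optimization removes is taken from the slide, while treating steps 1, 3, 4 and 6 as the ones that carry a full guest/host register switch is a reading of the step descriptions. No timings are implied.

```python
# Baseline pass-through IRQ path, numbered as in the list above.
BASELINE_STEPS = {
    1: "guest exits to HYP mode: save guest, restore host",
    2: "host enables interrupts, IRQ runs the host's complete IRQ path",
    3: "inject to guest: save host, restore guest, guest's complete IRQ path",
    4: "guest EOI exits: save guest, restore host",
    5: "host updates the virtual distributor",
    6: "resume guest: save host, restore guest",
}

# Steps whose description includes a guest/host state save and restore.
WORLD_SWITCH_STEPS = {1, 3, 4, 6}

# Steps each variant is described as eliminating.
ELIMINATED = {
    "baseline": set(),
    "optimization 1": {4, 5, 6},
    "optimization 2": {2, 3, 4, 5, 6},
}


def remaining_steps(variant):
    """Step numbers still executed per IRQ under the given variant."""
    gone = ELIMINATED[variant]
    return [n for n in sorted(BASELINE_STEPS) if n not in gone]


def world_switches(variant):
    """How many of the remaining steps involve a full register switch."""
    return sum(1 for n in remaining_steps(variant) if n in WORLD_SWITCH_STEPS)


for variant in ELIMINATED:
    print(variant, remaining_steps(variant), world_switches(variant))
```

Under this model Optimization 1 halves the number of register switches per interrupt. Optimization 2 leaves only the initial exit, at the cost of running more code in HYP mode.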
Dynamic Load Balancing
Cloud RAN – dynamic load balancing between VMs
- Cell sites exhibit various loads throughout the day
- vCPU hotplug
∙ Unplug/plug vCPUs – dynamically scale to demand
[Diagram: multicore platform with 16 cores hosting three vBBU instances; vCPUs are unplugged/plugged per vBBU; power management for idling vCPUs]
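A hotplug policy for a vBBU could be as simple as sizing the vCPU count to the current cell load. The sketch below is one hypothetical policy, not something the talk specifies: the load unit, the per-vCPU capacity, and the limits of 1 and 16 vCPUs (16 matching the cores drawn in the diagram) are assumptions.

```python
import math


def vcpus_needed(load, capacity_per_vcpu, min_vcpus=1, max_vcpus=16):
    """vCPUs required to carry `load`, clamped to the platform limits."""
    wanted = math.ceil(load / capacity_per_vcpu)
    return max(min_vcpus, min(max_vcpus, wanted))


def hotplug_delta(current_vcpus, load, capacity_per_vcpu):
    """Positive: vCPUs to plug in. Negative: vCPUs to unplug. Zero: no change."""
    return vcpus_needed(load, capacity_per_vcpu) - current_vcpus
```

For example, `hotplug_delta(4, 950, 100)` asks for six more vCPUs during a load peak, and the same call with a low night-time load returns a negative number, freeing cores for other vBBUs or for power management.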
Fast Path between Radio/CN
Zero copy message passing – Guest/Guest, Guest to Host
- ivshmem is one example; add enhancements
• BBU needs fast switching between radio and core network
• Can't have full stack with expensive IPC
• Want to separate Radio and Core functions
• Radio
o Dedicate CPUs – poll – or – optimized dev pass-through
∙ Dedicating is never good.
o PHY device pass-through
∙ Pull off the wire directly to user space + MAC + L3 packet
∙ Zero copy to inter-guest shared memory
• Core Network
o Pull packet from shared memory ring buffer
o Tx/Rx to Core Network over SCTP or GTP-U
• Issues:
o Signaling – signaling/interrupt path too long (red lines)
∙ Guest via UIO writes to IRQ reg, exit, MMIO to QEMU
∙ QEMU 'event' to peer QEMU
∙ Emulated device on peer injects interrupt to Guest
∙ Solution: interrupt in HYP mode, coalesce
o Discovery – to pair, Guests must discover shared memory segments dynamically
∙ Many vBBU clouds created/destroyed on demand
∙ Solution: shared memory discovery protocol via emulated device through QEMU (green line)
[Diagram: vBBU instance with a RAN guest (PHY via pass-through, SHM driver, tasks) and a core-network guest (NIC, SHM driver, tasks), each under QEMU with an SHM device and joined through shared memory; host/KVM and HYP mode with optimized interrupt signaling; hardware (PHY to UE, NIC to CN) with the T&E device]
RT-Scheduling
Network stack time sensitive – requirements
- High-res timers a must
- Preemptibility – even PREEMPT_RT a must – prevents interrupt inversion
- Scheduling at several levels – host and guest threads
Timers
- Arch-timers improvement: no exit on reg updates
- But still exit on timer fire – need injection
- Issue for high-res timers in Guest
- Again, near-native IRQ pass-through important
IRQs & page faults
- Any host IRQ can prevent guest from running
- PFRA as well (if so, most likely not tuned for RT)
PREEMPT_RT
- Not really tested with virtualization
[Diagram: IRQ sources, hardware, Linux host & KVM, guests (vMME, vS-GW). Timer events add latency; VFIO, protocol FSM, …. With PREEMPT_RT, spin lock = mutex: on interrupt the higher-priority thread executes, with no interrupt/priority inversion.]
RT-Scheduling
Possible Optimizations – area of research
Host PREEMPT_RT
- Eliminate spinlocks, replace with mutexes
- Prioritize interrupts – non-VM targeted IRQs
- vCPUs – prioritize at higher priority
- VM IRQs don't run as threads – timers, dev-passthrough IRQs
- Use Priority Drop/Deactivation to schedule
highest priority interrupts for VMs
Guests most likely PREEMPT only
Challenges
- Multiple VMs sharing a CPU
∙ Priority between them
∙ Priority of their IRQs
∙ Context switching an issue, depends on load
- OS periodic tick work – CONFIG_NO_HZ_FULL
∙ Promising; for a vCPU dedicated to a core it reduces tick overhead
∙ Helps with multiple vCPUs as well (tick rate)
[Diagram: IRQ sources, hardware, Linux host & KVM, guests (vMME, vS-GW); same layout as the requirements slide, with the guests marked PREEMPT and the host PREEMPT_RT]
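One reading of the interrupt prioritization bullets is that IRQs targeted at a VM (guest timers, passed-through devices) should be serviced ahead of host-only IRQs. The sketch below models that ordering with a priority queue. The two priority levels and the IRQ names are assumptions, and a real host would express this through GIC priorities and thread priorities rather than a Python heap.

```python
import heapq
import itertools

VM_TARGETED = 0  # lower number is served first
HOST_ONLY = 1


class IrqQueue:
    """Pending IRQs ordered by class, then by arrival order within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def raise_irq(self, name, vm_targeted):
        priority = VM_TARGETED if vm_targeted else HOST_ONLY
        heapq.heappush(self._heap, (priority, next(self._arrival), name))

    def next_irq(self):
        """Name of the next IRQ to service, or None when nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With a host disk IRQ raised before a guest timer IRQ, `next_irq()` still returns the guest timer first. The open question from the Challenges list remains how to rank IRQs of several VMs that share one CPU.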
Thank you.
Mario Smarduch
Senior Virtualization Architect
Open Source Group
Samsung Research America (Silicon Valley)
m.smarduch@samsung.com