EXTERNAL USE1
Agenda
• Virtualization Introduction
− KVM/QEMU
− Containers
− libvirt
• I/O in KVM Environments
− Device Virtualization - virtio
− Device Direct-Assignment - VFIO
including a whirlwind introduction to the VFIO integration of the
QorIQ LS-series Data Path Acceleration Architecture (DPAA2)
• Q&A
VIRTUALIZATION INTRODUCTION
What is … ?
Virtualization … Hardware and software technologies that provide an abstraction layer that enables running multiple operating systems on a single computer system
[Figure: several virtual machines (Linux and RTOS guests, each running apps) on top of a hypervisor]
A hypervisor … is a software component that
creates and manages virtual machines which
can run operating systems
Basic Use Cases
• Consolidation / Migration
− Consolidate separate (legacy) systems onto one hardware platform
− Multiple operating systems/partitions on a single multi-core chip
− Multiple homogeneous operating systems on multiple cores
− Preserve investment in software
− Run legacy software alongside new software
− Add Linux services to a non-Linux platform
• Divided workload (e.g. control plane, data plane)
− Multiple operating systems, possibly heterogeneous, need to work securely and seamlessly together
− Isolation mechanisms are needed for safety, robustness
− Efficient inter-partition communication mechanisms are needed for cooperation
Basic Use Cases
• Isolate or sandbox untrusted software
− Isolate untrusted operating systems: proprietary OS + open OS (e.g. Linux)
− Isolate end-user installed software
− Software under test
• Security
− Secure partition for sensitive security tasks (e.g. access rights control, rule definitions, key storage/management)
• High availability
− Active/standby configuration without additional hardware
SDN/NFV Use Cases
• Specialized processing functions (firewall, …) are now commonly implemented in virtual OS instances called Virtual Network Functions (VNFs)
• A full processing sequence (e.g. the data plane) is implemented by "service chaining" multiple VMs
• This requires efficient, high-performance I/O between VMs (network) or between VMs and peripherals (storage, PCIe)
• Originated in cloud and data center, now strongly expanding in networking
Virtualization Technology Approaches
Linux Containers (OS Virtualization)
• Low Overhead
• Isolation and Resource Control in Linux
• Decreased Isolation (Kernel sharing)
[Figure: containers (LXC) running apps on a single Linux® kernel on multicore hardware]
Embedded Hypervisor
• Lightweight Hypervisor
• Resource Partitioning
• Para-Virtualization
• Failover support
• 3rd Party OSs
[Figure: VMs, each with its own OS and apps, on an embedded hypervisor on multicore hardware]
KVM
• Linux® Hypervisor
• Resource Virtualization
• Resource Oversubscription
• 3rd Party OSs
[Figure: VMs, each with its own OS and apps, running next to native apps on Linux with KVM on multicore hardware]
KVM/QEMU – Overview
• KVM/QEMU: open source virtualization technology based on the Linux kernel
• KVM is a Linux kernel module
• QEMU is a user space emulator that uses KVM for acceleration
• Run virtual machines alongside Linux applications
• No or minimal OS changes required
• Virtual I/O capabilities
• Direct/pass-through I/O: assign I/O devices to VMs
[Figure: two virtual machines, each a QEMU process containing a guest OS and apps, running next to native apps on Linux with KVM on multicore hardware]
KVM/QEMU
• QEMU is a user space emulator that uses KVM for acceleration
− Uses dedicated threads for vCPUs and I/O
− KVM leverages hardware virtualization to run the guest with higher privileges
− Virtual chip emulation in kernel
− I/O
Provides dedicated virtio I/O devices and standard drivers in the Linux kernel
Uses the VFIO Linux framework to directly assign physical PCI devices
Direct notifications between I/O threads and KVM using eventfds
vhost provides in-kernel virtio emulation
Multi-queue virtio devices connected to multi-queue tap devices
− Provides services for console, debug, reset, watchdog, etc.
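The I/O items above map onto QEMU command-line options. The following is a minimal sketch, not a reference configuration: it assumes an Arm host using the `virt` machine, a tap interface created by QEMU, and hypothetical names (`build_qemu_args`, `net0`, `guest.img`). The `vectors = 2 * queues + 2` sizing is the commonly documented rule for multi-queue virtio-net on PCI; exact options vary with QEMU version and board.

```python
def build_qemu_args(image, vcpus=2, mem_mib=1024, queues=4):
    """Return a QEMU argv using KVM, vhost-net and multi-queue virtio-net."""
    if queues < 1:
        raise ValueError("queues must be >= 1")
    # One RX and one TX vector per queue pair, plus config and control vectors.
    vectors = 2 * queues + 2
    netdev = f"tap,id=net0,vhost=on,queues={queues}"
    device = f"virtio-net-pci,netdev=net0,mq=on,vectors={vectors}"
    return [
        "qemu-system-aarch64",      # assumption: Arm host; use the binary for your target
        "-machine", "virt",
        "-cpu", "host",
        "-enable-kvm",              # let KVM run the vCPU threads
        "-smp", str(vcpus),
        "-m", str(mem_mib),
        "-drive", f"file={image},if=virtio",   # virtio block device
        "-netdev", netdev,          # multi-queue tap backend with in-kernel vhost
        "-device", device,          # multi-queue virtio-net front end
    ]

args = build_qemu_args("guest.img", queues=4)
print(" ".join(args))
```

Inside the guest, the additional queues usually still have to be enabled on the interface (for example with `ethtool -L`), which this sketch does not cover.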
Linux Containers
• LinuX Containers: low overhead, lightweight, secure partitioning of Linux applications into different domains
• Guest kernel == host kernel … but OS appears isolated (OS-level virtualization)
• Based on a collection of technologies including kernel components (cgroups, namespaces) and user-space tools (LXC)
• Can control resource utilization of domains: CPU, memory, I/O bandwidth
• Not platform dependent
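The resource control mentioned above is exposed by cgroups as plain files. A minimal sketch of how CPU and memory limits translate into control-file contents, assuming the cgroup v2 unified hierarchy (`cpu.max`, `memory.max`); the function name and the example limits are hypothetical, and nothing is written to the filesystem here.

```python
def cgroup_v2_limits(cpu_fraction, memory_bytes, period_us=100_000):
    """Map resource limits to cgroup v2 control-file contents."""
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    # cpu.max holds "<quota> <period>" in microseconds:
    # the group may run for <quota> out of every <period>.
    quota_us = int(cpu_fraction * period_us)
    return {
        "cpu.max": f"{quota_us} {period_us}",
        "memory.max": str(memory_bytes),   # hard memory limit in bytes
    }

# Half a CPU and 256 MiB of memory for one container.
limits = cgroup_v2_limits(cpu_fraction=0.5, memory_bytes=256 * 1024 * 1024)
for name, value in limits.items():
    print(f"{name}: {value}")
```

A container tool would write these values into the container's cgroup directory; I/O bandwidth limits follow the same file-based pattern and are left out of this sketch.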
[Figure: three containers, each with its own set of numbered processes, running apps on one shared Linux® kernel. Annotations: close to 0% performance overhead; process-level virtualization.]
Container Technologies
• Linux Kernel: namespaces, cgroups, seccomp
• Low-level API: liblxc, libcontainer, libvirt_lxc
• Container Engine (Daemon): LXD, Docker, libvirtd
• Client: LXC, docker, virsh
• Container Distribution: Flockport, DockerHub
• Migration: CRIU
• Other Technologies: FreeBSD Jails, Solaris Zones, OpenVZ, Linux VServer, Google Containers
Libvirt
• A toolkit to interact with the virtualization capabilities of Linux (and other OSes / hypervisors)
• Goal: to provide a common and stable layer sufficient to securely manage domains on a node, possibly remote
• Has drivers for KVM/QEMU and Linux containers
• Many management applications supported
• http://libvirt.org/
[Figure: libvirtd on Linux exposes the libvirt API and uses the QEMU driver and the LXC driver to manage KVM and LXC domains on multicore hardware]
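libvirt describes each domain in XML. A minimal sketch of generating such a description for a KVM domain with a virtio network interface follows; the element names follow the libvirt domain XML format, while the function name, the domain name `demo-vm`, the `aarch64`/`virt` machine choice and the `default` network are assumptions for illustration. It is not a complete bootable definition (no disk, kernel or firmware).

```python
import xml.etree.ElementTree as ET

def kvm_domain_xml(name, memory_mib, vcpus, network="default"):
    """Build a skeletal libvirt domain description for a KVM guest."""
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)
    os_el = ET.SubElement(domain, "os")
    # assumption: Arm guest on the generic "virt" machine
    ET.SubElement(os_el, "type", arch="aarch64", machine="virt").text = "hvm"
    devices = ET.SubElement(domain, "devices")
    iface = ET.SubElement(devices, "interface", type="network")
    ET.SubElement(iface, "source", network=network)
    ET.SubElement(iface, "model", type="virtio")   # para-virtualized NIC
    return ET.tostring(domain, encoding="unicode")

xml_text = kvm_domain_xml("demo-vm", memory_mib=1024, vcpus=2)
print(xml_text)
```

A completed description of this kind is what management tools hand to libvirtd, for example through `virsh define`.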
DEVICE AND I/O VIRTUALIZATION
Device Usage in Virtual Environments
Direct Access
• Fast native performance
• Direct access to hardware
Emulated HW
• Driver in Hypervisor
• Emulation in Hypervisor
• Unmodified Drivers in Guest
Para-Virtualized
• Driver in Hypervisor
• Modified Drivers in Guest
Partitionable HW
• Hardware partitioned
• One hardware block
I/O Virtualization - Performance vs Flexibility
[Figure: scalability and performance (vertical axis) plotted against flexibility (horizontal axis). Options shown: emulated I/O (no guest modifications), virtio (para-virtualized), vhost, optimized vhost, vhost-user, direct assignment (VFIO) and bare metal, with an arrow marking the trend.]
DEVICE VIRTUALIZATION
VIRTIO
virtio
• Device abstraction layer of para-virtualized hypervisor
− Standard for VMs/VNFs
− Appearance as physical devices
− Uses standard virtual drivers and discovery mechanisms
virtio-net : Ethernet virtual driver
vhost-net : optimizes Ethernet virtual driver by eliminating QEMU context switch
virtio-pci
• Backend drivers are vendor specific in host Linux; transparent to VM/VNFs
Sources: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Tuning_and_Optimization_Guide
[Figure: virtio front-end drivers in guest Linux (virtio-console, virtio-blk, virtio-net, virtio-balloon, virtio-scsi over virtio-pci) connected through the virtio transport to virtio back-end drivers in host Linux / QEMU]
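"Appearance as physical devices" and "standard discovery mechanisms" can be made concrete for virtio-pci: the guest finds virtio devices by their PCI IDs, like any other PCI device. A minimal sketch, using the vendor ID, device-type numbers and device-ID ranges from the virtio specification; the function and table names are hypothetical.

```python
VIRTIO_PCI_VENDOR_ID = 0x1AF4

# Device type numbers from the virtio specification (subset shown on the slide).
VIRTIO_DEVICE_TYPES = {
    1: "virtio-net",
    2: "virtio-blk",
    3: "virtio-console",
    5: "virtio-balloon",
    8: "virtio-scsi",
}

# Transitional (legacy-compatible) PCI device IDs and their device types.
TRANSITIONAL_IDS = {0x1000: 1, 0x1001: 2, 0x1002: 5, 0x1003: 3, 0x1004: 8}

def identify_virtio_pci(vendor_id, device_id):
    """Return a virtio device name for a PCI ID pair, or None if not virtio."""
    if vendor_id != VIRTIO_PCI_VENDOR_ID:
        return None
    if device_id in TRANSITIONAL_IDS:
        dev_type = TRANSITIONAL_IDS[device_id]
    elif 0x1040 <= device_id <= 0x107F:
        # Modern devices: PCI device ID = 0x1040 + virtio device type.
        dev_type = device_id - 0x1040
    else:
        return None
    return VIRTIO_DEVICE_TYPES.get(dev_type, f"virtio device type {dev_type}")

for vendor, device in [(0x1AF4, 0x1041), (0x1AF4, 0x1001), (0x8086, 0x1000)]:
    print(hex(vendor), hex(device), identify_virtio_pci(vendor, device))
```

Because the IDs are fixed by the standard, the same guest driver works regardless of which back end the host uses.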
DIRECT-ASSIGNMENT
VFIO
Direct-Assignment of I/O Devices
• Device driver access from user space
− Device pass-through (libusb, libscsi)
− Map /dev/mem (not recommended)
• UIO (User-space I/O)
Device access (mmap device MMIO regions)
Interrupt support
No isolation or translation
• VFIO (Virtual Function IO)
Linux user space driver infrastructure for DMA devices
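The UIO model listed above (mmap of device MMIO regions plus interrupt support, with no isolation or translation) can be sketched as follows. This is a minimal illustration assuming the standard UIO conventions: map N of `/dev/uioX` is selected with an mmap offset of N pages, its size is published in sysfs, and a 4-byte read blocks until the next interrupt. The function names are hypothetical, and only `uio_map_plan` runs without real hardware.

```python
import mmap
import os
import struct

def uio_map_plan(uio_index, map_index):
    """Return the paths and mmap offset used to reach one UIO memory map."""
    sysfs = f"/sys/class/uio/uio{uio_index}/maps/map{map_index}"
    return {
        "device": f"/dev/uio{uio_index}",
        "size_file": f"{sysfs}/size",
        # UIO selects map N through an offset of N pages.
        "mmap_offset": map_index * mmap.PAGESIZE,
    }

def map_uio_region(uio_index, map_index=0):
    """mmap() one MMIO region (needs a real UIO device and permissions)."""
    plan = uio_map_plan(uio_index, map_index)
    with open(plan["size_file"]) as f:
        size = int(f.read().strip(), 16)       # sysfs reports e.g. "0x1000"
    fd = os.open(plan["device"], os.O_RDWR | os.O_SYNC)
    region = mmap.mmap(fd, size, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE,
                       offset=plan["mmap_offset"])
    return fd, region

def wait_for_irq(fd):
    """Block until the next interrupt; returns the running interrupt count."""
    (count,) = struct.unpack("=I", os.read(fd, 4))
    return count

plan = uio_map_plan(0, 1)
print(plan)
```

Nothing here constrains the DMA a device performs, which is the gap VFIO closes by programming the IOMMU.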
VFIO
• VFIO (Virtual Function IO) :
− Device access : mmap() device MMIO regions
− Enforces IOMMU protection/translation/isolation (IOVA to physical address)
− IOMMU programming interface
− High performance interrupt support (INTx, MSIs & MSI-X)
− Bus support : PCI, platform devices, LS2 MC bus
• VFIO PCI - abstracts devices as :
− Regions :
PCI configuration space
MMIO and I/O port BAR spaces
MMIO PCI ROM access
− IRQs include : INTx (legacy interrupts), Message Signaled Interrupts (MSI & MSI-X)
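The region abstraction above is visible in the VFIO user-space API: each region is described by a `struct vfio_region_info` returned from an ioctl on the device file descriptor. A minimal sketch, assuming the generic Linux ioctl encoding (as on x86 and Arm, where `_IO()` carries no direction bits) and the structure layout from `linux/vfio.h`; the helper names are hypothetical and the buffer decoded at the end is synthetic, so no device is needed.

```python
import struct

VFIO_TYPE = ord(";")
VFIO_BASE = 100

def vfio_io(nr):
    """_IO(VFIO_TYPE, VFIO_BASE + nr) with the generic Linux ioctl encoding."""
    return (VFIO_TYPE << 8) | (VFIO_BASE + nr)

# Typical call order: container, then group, then device.
VFIO_GET_API_VERSION = vfio_io(0)         # on /dev/vfio/vfio
VFIO_GROUP_SET_CONTAINER = vfio_io(4)     # on /dev/vfio/<group>
VFIO_SET_IOMMU = vfio_io(2)               # pick the IOMMU model
VFIO_GROUP_GET_DEVICE_FD = vfio_io(6)     # obtain the device fd
VFIO_DEVICE_GET_REGION_INFO = vfio_io(8)  # describe one region
VFIO_IOMMU_MAP_DMA = vfio_io(13)          # map IOVA to process memory

# struct vfio_region_info: argsz, flags, index, cap_offset (u32), size, offset (u64)
REGION_INFO = struct.Struct("=IIIIQQ")
REGION_FLAGS = {1: "read", 2: "write", 4: "mmap"}

# vfio-pci region indexes: BAR0..BAR5 are 0..5, then ROM, then config space.
VFIO_PCI_ROM_REGION_INDEX = 6
VFIO_PCI_CONFIG_REGION_INDEX = 7

def decode_region_info(buf):
    """Decode the fixed part of a vfio_region_info buffer."""
    argsz, flags, index, cap_offset, size, offset = REGION_INFO.unpack(
        buf[:REGION_INFO.size])
    access = [name for bit, name in REGION_FLAGS.items() if flags & bit]
    return {"index": index, "size": size, "offset": offset, "access": access}

# Synthetic example: a 16 KiB BAR0 that may be read, written and mmap()ed.
sample = REGION_INFO.pack(REGION_INFO.size, 7, 0, 0, 0x4000, 0)
info = decode_region_info(sample)
print(hex(VFIO_DEVICE_GET_REGION_INFO), info)
```

With a real device, the returned `offset` and `size` are what user space passes to `mmap()` or `pread()` on the device file descriptor to reach that region.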
Source: www.linux-kvm.org/wiki/images/e/ed/Kvm-forum-2013-VFIO-VGA.pdf
[Figure: multicore hardware with a device assigned to a VM (guest OS and apps) next to the host OS. Labels: IOMMU (DMA), MMU (MMIO), IRQs.]
VFIO for PCI Bus
[Figure: layers HW, Kernel (KVM, VFIO), QEMU and Guest. The guest PCI device driver and interrupt controller driver reach a PCI SR-IOV device (PF and VF, each with CFG/BAR spaces and an ICID) through QEMU PCI emulation, VFIO MMIO, the emulated interrupt controller, irqfd and the IOMMU. Legend: Control Path, Data Path, IRQ Path, Kick Path; steps numbered 1 to 5.]
NXP LS-Series - DPAA 2 Secure Direct Assignment
• Management Complex (MC) hardware and firmware create objects (network interfaces, switches) from hardware sub-components (NI, SW, MAC, MUX, …), which appear on a Linux MC bus.
• The MC bus is conceptually similar to a PCIe bus (enumerable, hot-plug capable, …)
• DPAA2 objects can be directly assigned to VMs using VFIO
• Supports IOMMU translation and protection for user space (ODP, DPDK and QEMU)
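Because the MC bus is enumerable like PCIe, its objects show up in sysfs and can be handed to VFIO by rebinding their driver. A minimal sketch, assuming the mainline Linux naming (`/sys/bus/fsl-mc`, object names such as `dprc.2` or `dpni.3`, and the `vfio-fsl-mc` driver); the SDK kernels contemporary with these slides may differ, the function names are hypothetical, and the steps are only printed, never written to sysfs.

```python
import re

FSL_MC_DEVICES = "/sys/bus/fsl-mc/devices"   # assumption: mainline fsl-mc sysfs path
_OBJECT_NAME = re.compile(r"^(dp[a-z]+)\.(\d+)$")

def parse_mc_object(name):
    """Split a DPAA2 object name such as 'dpni.3' into (type, id)."""
    m = _OBJECT_NAME.match(name)
    if m is None:
        raise ValueError(f"not a DPAA2 object name: {name!r}")
    return m.group(1), int(m.group(2))

def vfio_bind_steps(container):
    """List the (sysfs file, value) writes that hand a DPRC to vfio-fsl-mc."""
    obj_type, _ = parse_mc_object(container)
    if obj_type != "dprc":
        raise ValueError("expected a resource container (dprc) object")
    return [
        (f"{FSL_MC_DEVICES}/{container}/driver_override", "vfio-fsl-mc"),
        ("/sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind", container),
    ]

print(parse_mc_object("dpni.3"))
for path, value in vfio_bind_steps("dprc.2"):
    print(f"echo {value} > {path}")
```

The resource container (DPRC) is the unit of assignment in this sketch: the objects it holds, such as network interfaces, travel with it into the VM or user-space application.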
DPAA2 Management Complex
VFIO for MC Bus
[Figure: layers HW, Kernel (KVM, VFIO), QEMU and Guest. The guest Ethernet device driver and interrupt controller driver reach DPAA2 objects (DPRC, DPNI, …) behind the MC I/O portal, each with an ICID, through a QEMU-provided device tree, VFIO MMIO, the emulated interrupt controller, irqfd and the IOMMU. Legend: Control Path, Data Path, IRQ Path, Kick Path; steps numbered 1 to 4.]
virtio vs Direct Assignment (VFIO)
Aspect | virtio | Direct Assignment
Flexibility | High | Medium
Guest driver | Generic | HW dependent
Device sharing | Yes | No
Live migration | Yes | PoC
Performance | Medium | High
Processing | Backend is SW emulated in host or in firmware | Reduced processing in host
HW support for isolation | No | Required (SMMU)
Licensing | Open Source* | Open Source
Upstreamable? | Firmware accelerations: no | Yes
History | Started as a software implementation in Linux; the API is now standardized (OASIS). Standard add-ons may not be accepted in Linux upstream. | Framework implemented in Linux for PCI devices and extended to platform devices.
Q & A