Neo Jia & Kirti Wankhede, 08/25/2016 VGPU ON KVM VFIO BASED MEDIATED DEVICE FRAMEWORK


AGENDA

Background / Motivation

Mediated Device Framework – Overview

Mediated Device Framework – Deep-Dive

Current Status

Demo

Future work

TODAY: HOW A GPU IS PRESENTED INSIDE A KVM VM

VFIO device pass-through [1]

Great performance

Full API compatibility: the GPU vendor driver runs inside the virtual machine

Poor density: limited by PCI-E resources

Minimal visibility of the device on the host: the generic vfio_pci driver owns the device and only enables/disables it, routes interrupts, and resets it

Difficult to cover all graphics workloads: the assigned GPU is either underutilized or too small to scale

[Figure: GPU pass-through. Virtual machines (guest OS, apps, remote graphics, NVIDIA or vendor driver) on a KVM host with VFIO_PCI, a DMA remapper, and the GPU.]

WHAT IS VGPU?

High level overview

A physical GPU shared among multiple virtual machines

Great performance, suitable for different workloads

Full API compatibility: the GPU vendor driver runs inside the virtual machine

Full device visibility to the hypervisor/host: allows device-specific features such as dynamic performance monitoring and tuning, detailed error reporting, etc.

[Figure: vGPU overview. Virtual machines (guest OS, apps, remote graphics, NVIDIA or vendor driver), each with a vGPU, on a KVM host with a vendor driver, a DMA remapper, and the GPU.]

I/O VIRTUALIZATION

SR-IOV and mediated solutions

SR-IOV devices: supported by standard VFIO PCI (direct assignment) today

  Established QEMU VFIO/PCI driver, KVM agnostic, well-defined UAPI

  Virtualized PCI config/MMIO space access and interrupt delivery

  Modular IOMMU; pins and maps memory for DMA

Mediated devices: non-SR-IOV, require vendor-specific drivers to mediate sharing

  Leverage the existing VFIO framework and UAPI

  The vendor driver mediates the device and manages its internal I/O resources

MEDIATED DEVICE FRAMEWORK

A common framework for mediated I/O devices

Mediated core module (new)

  Mediated bus driver; creates mediated devices

  Physical device interface for vendor driver callbacks

  Generic mediated device management user interface (sysfs)

Mediated device module (new)

  Manages created mediated devices; fully compatible with the VFIO user API

VFIO IOMMU driver (enhancement)

  VFIO IOMMU API TYPE1 compatible; easy to extend to non-TYPE1

MEDIATED DEVICE FRAMEWORK

[Figure: framework block diagram. Blocks shown: VFIO MDEV; VFIO TYPE1 IOMMU; Mediated Core (mdev driver register interface, device register interface, mediated bus driver, mediated CBs); user interfaces (mdev sysfs, VFIO UAPI, TYPE1 IOMMU UAPI); vendor drivers; GPUs (PCIe devices); RAM, IOMMU, MMU.]

MEDIATED DEVICE FRAMEWORK - INITIALIZATION

MEDIATED DEVICE FRAMEWORK

With QEMU: Driver Initialization

VFIO MDEV is registered as a driver

Vendor driver registers devices

Vendor driver registers mediated callbacks (CBs)

[Figure: framework block diagram with QEMU and the VM (guest OS, guest RAM) on top, showing VFIO MDEV attached to the mdev driver register interface, the vendor driver attached to the device register interface and mediated CBs, and PIN/UNPIN between the vendor driver and the TYPE1 IOMMU.]

MEDIATED DEVICE FRAMEWORK

Mediated Device sysfs

After vendor driver device registration, under the physical device's sysfs node:

  mdev_create: create a virtual device (aka mdev device)

  mdev_destroy: destroy an mdev device

  mdev_supported_types: supported mdev types and configurations of this device

Mdev node: /sys/bus/mdev/devices/$mdev_UUID/

  online: start and stop the virtual device
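A minimal sketch of driving this sysfs interface from a script. The helper names `create_mdev` and `mdev_node` are hypothetical, and the sketch assumes that `mdev_create` accepts a bare UUID string, as the `echo UUID` example later in this deck suggests. The physical device directory is passed in by the caller because its path is not spelled out here.

```python
import os
import tempfile
import uuid


def create_mdev(device_dir, mdev_uuid=None):
    """Write a UUID to <device_dir>/mdev_create and return the UUID string.

    device_dir is the physical device's sysfs directory (caller-supplied;
    the slides do not give the full path).
    """
    mdev_uuid = str(mdev_uuid or uuid.uuid4())
    with open(os.path.join(device_dir, "mdev_create"), "w") as attr:
        attr.write(mdev_uuid)
    return mdev_uuid


def mdev_node(mdev_uuid):
    """Return the mdev node path named on the slide for this UUID."""
    return "/sys/bus/mdev/devices/{}/".format(mdev_uuid)


if __name__ == "__main__":
    # Dry run against a scratch directory instead of real sysfs.
    with tempfile.TemporaryDirectory() as scratch:
        created = create_mdev(scratch)
        print(created, "->", mdev_node(created))
```

On a real host the write needs root privileges and only succeeds once the vendor driver has registered the physical device.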

MEDIATED DEVICE FRAMEWORK

With QEMU: Device Initialization

VFIO MDEV is registered as a driver

Vendor driver registers devices

Vendor driver registers mediated callbacks (CBs)

User writes to mdev sysfs to create an mdev device: echo UUID > /sys/bus/…/<device>/mdev_create

QEMU calls the VFIO API to add the VFIO device to an IOMMU container and group, and gets a device fd back (the mdev fd for /sys/bus/mdev/devices/$mdev_UUID)

QEMU accesses the device fd and presents the device to the VM

[Figure: framework block diagram showing the sysfs write, the resulting /sys/bus/mdev/devices/$mdev_UUID node, the mdev fd held by QEMU, and the PCIe MDEV device with its vendor driver inside the VM.]
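The container, group, and device fd step follows the standard VFIO user API. As a hedged sketch, the code below computes the relevant ioctl numbers from `<linux/vfio.h>` (assuming the generic Linux ioctl encoding used on x86, where `_IO` carries no size or direction bits) and lists the calls in the order the kernel VFIO documentation describes. The function `device_open_plan` is hypothetical and only returns a description of the calls; passing the mdev UUID as the device name is an assumption, not something these slides state.

```python
VFIO_TYPE = ord(";")   # ioctl "type" byte from <linux/vfio.h>
VFIO_BASE = 100        # first ioctl number used by VFIO


def vfio_io(offset):
    """Equivalent of _IO(VFIO_TYPE, VFIO_BASE + offset) in the generic encoding."""
    return (VFIO_TYPE << 8) | (VFIO_BASE + offset)


VFIO_GET_API_VERSION = vfio_io(0)
VFIO_SET_IOMMU = vfio_io(2)
VFIO_GROUP_SET_CONTAINER = vfio_io(4)
VFIO_GROUP_GET_DEVICE_FD = vfio_io(6)
VFIO_TYPE1_IOMMU = 1


def device_open_plan(group_path, device_name):
    """Return the ordered (target, operation, argument) steps to get a device fd.

    group_path is the /dev/vfio/<group> node and device_name the name handed
    to VFIO_GROUP_GET_DEVICE_FD; both are caller-supplied assumptions.
    """
    return [
        ("container", "open", "/dev/vfio/vfio"),
        ("container", VFIO_GET_API_VERSION, None),
        ("group", "open", group_path),
        ("group", VFIO_GROUP_SET_CONTAINER, "container fd"),
        ("container", VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU),
        ("group", VFIO_GROUP_GET_DEVICE_FD, device_name),
    ]


if __name__ == "__main__":
    for target, op, arg in device_open_plan("/dev/vfio/0", "<mdev UUID>"):
        print(target, hex(op) if isinstance(op, int) else op, arg)
```

The last step is the one that hands QEMU the mdev fd; everything QEMU later does with the device goes through that fd.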

MEDIATED DEVICE ACCESS - EMULATED

MEDIATED DEVICE ACCESS

Emulated vs Passthrough

Virtual device memory regions are presented inside the guest so that the vendor driver sees a consistent view

Accesses to emulated regions are redirected to the mediated vendor driver for virtualization support

Accesses to passthrough regions are sent directly to the corresponding device region for maximum performance

  The first access is redirected to the mediated vendor driver for CPU page table setup
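The difference between the two region kinds can be illustrated with a toy model. This is not kernel code: `Region` and `ToyMediatedDevice` are hypothetical names, and the "page table setup" is reduced to a boolean flag. It only shows which accesses reach the vendor driver and which go straight to the device.

```python
class Region:
    """A virtual device memory region as seen by the guest."""

    def __init__(self, name, passthrough):
        self.name = name
        self.passthrough = passthrough
        self.mapped = False   # True once CPU page tables point at the device


class ToyMediatedDevice:
    def __init__(self, regions):
        self.regions = {region.name: region for region in regions}
        self.trapped = []     # accesses that reached the vendor driver

    def access(self, region_name, offset):
        region = self.regions[region_name]
        if region.passthrough and region.mapped:
            # Later passthrough accesses bypass the vendor driver entirely.
            return "direct"
        self.trapped.append((region_name, offset))
        if region.passthrough:
            # First touch: vendor driver sets up the mapping, then steps aside.
            region.mapped = True
            return "fault-then-map"
        # Emulated regions are handled by the vendor driver on every access.
        return "emulated"


if __name__ == "__main__":
    dev = ToyMediatedDevice([Region("regs", False), Region("fb", True)])
    for name, off in [("fb", 0), ("fb", 64), ("regs", 0), ("regs", 4)]:
        print(name, off, dev.access(name, off))
```

The model makes the cost trade-off visible: an emulated region traps on every access, while a passthrough region traps only once.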

MEDIATED DEVICE ACCESS

Emulated

QEMU gets region info via the VFIO UAPI from the vendor driver, through VFIO MDEV and the mediated CBs

The vendor driver in the guest accesses the trapped MDEV MMIO region backed by the mdev fd, which triggers an EPT violation

KVM services the EPT violation and forwards it to the QEMU VFIO PCI driver

QEMU converts the request from KVM into a read/write access on the MDEV fd

The read/write is handled by the vendor driver via the mediated CBs and VFIO MDEV

[Figure: framework block diagram showing the guest vendor driver's MMIO access to the PCIe MDEV device, the EPT violation handled by KVM, and the mdev fd path from QEMU through the VFIO UAPI, VFIO MDEV, and mediated CBs to the host vendor driver.]
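The "convert to R/W access on the fd" step amounts to a positioned read or write at the region's file offset plus the faulting address within the region. A minimal sketch, assuming little-endian registers and using a temporary file as a stand-in for the mdev fd; `mmio_read` and `mmio_write` are hypothetical helper names, and `region_offset` stands for the offset that region info would report.

```python
import os
import tempfile


def mmio_read(device_fd, region_offset, addr, size):
    """Read `size` bytes at `addr` within a region and decode as little-endian."""
    data = os.pread(device_fd, size, region_offset + addr)
    return int.from_bytes(data, "little")


def mmio_write(device_fd, region_offset, addr, value, size):
    """Encode `value` as `size` little-endian bytes and write it at `addr`."""
    os.pwrite(device_fd, value.to_bytes(size, "little"), region_offset + addr)


if __name__ == "__main__":
    # A scratch file plays the role of the mdev fd for this dry run.
    with tempfile.TemporaryFile() as stand_in:
        fd = stand_in.fileno()
        mmio_write(fd, 0x1000, 0x10, 0xDEADBEEF, 4)
        print(hex(mmio_read(fd, 0x1000, 0x10, 4)))
```

With a real mdev fd, each of these calls would end up in the vendor driver's read or write callback rather than in a file.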

MEDIATED DMA TRANSLATION

With QEMU: Memory Tracking

QEMU starts

Memory regions get added by QEMU

QEMU calls VFIO_DMA_MAP via its memory listener

TYPE1 IOMMU tracks <VA, GFN>

[Figure: framework block diagram showing QEMU VAs and guest RAM GFNs, with <VA, GFN> pairs recorded in the TYPE1 IOMMU.]
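At the UAPI level, the DMA map call is the `VFIO_IOMMU_MAP_DMA` ioctl, whose argument is a `struct vfio_iommu_type1_dma_map` (argsz, flags, vaddr, iova, size). The sketch below only packs that structure; it does not issue the ioctl. The 4 KiB page size and the helper name `pack_dma_map` are assumptions, with the IOVA taken to be the guest physical address of the first GFN.

```python
import struct

VFIO_DMA_MAP_FLAG_READ = 1 << 0
VFIO_DMA_MAP_FLAG_WRITE = 1 << 1
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
DMA_MAP_FORMAT = "=IIQQQ"            # argsz, flags, vaddr, iova, size


def pack_dma_map(vaddr, first_gfn, npages,
                 flags=VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE):
    """Pack a vfio_iommu_type1_dma_map for a run of guest pages.

    vaddr is the QEMU virtual address backing the guest pages; the IOVA is
    the guest physical address of the first GFN.
    """
    argsz = struct.calcsize(DMA_MAP_FORMAT)
    iova = first_gfn << PAGE_SHIFT
    size = npages << PAGE_SHIFT
    return struct.pack(DMA_MAP_FORMAT, argsz, flags, vaddr, iova, size)


if __name__ == "__main__":
    payload = pack_dma_map(0x7F0000000000, first_gfn=0x100, npages=16)
    print(len(payload), struct.unpack(DMA_MAP_FORMAT, payload))
```

Each such call is what gives the TYPE1 IOMMU one <VA, GFN> range to track.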

MEDIATED DMA TRANSLATION

With QEMU: Runtime Memory Pinning

QEMU starts

Memory regions get added by QEMU

QEMU calls VFIO_DMA_MAP via its memory listener

TYPE1 IOMMU tracks <VA, GFN>

Vendor driver pins/translates GFNs through the TYPE1 IOMMU to get PFNs

Vendor driver calls pci_map_sg to map the PFNs to BFNs (bus frame numbers) and programs DMA

[Figure: framework block diagram showing <VA, GFN> pairs in the TYPE1 IOMMU resolved to PFNs through PIN/UNPIN, and the BFNs used by the GPU for DMA.]
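The translation chain GFN -> VA -> PFN -> BFN can be walked through with a toy model. Everything here is illustrative: `ToyType1Iommu` and `toy_pci_map` are hypothetical names, the host page table is a plain dictionary, and the bus mapping is a fixed offset standing in for whatever pci_map_sg would return.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages


class ToyType1Iommu:
    def __init__(self, host_page_table):
        self.host_page_table = host_page_table   # VA page number -> PFN
        self.ranges = []                          # (first_gfn, npages, first_va_page)
        self.pin_count = {}                       # PFN -> number of pins

    def map_dma(self, first_gfn, npages, vaddr):
        """Record a <VA, GFN> range, as done when QEMU maps guest memory."""
        self.ranges.append((first_gfn, npages, vaddr >> PAGE_SHIFT))

    def pin_pages(self, gfns):
        """Translate each GFN to a PFN and count the pin."""
        pfns = []
        for gfn in gfns:
            pfn = self.host_page_table[self._gfn_to_va_page(gfn)]
            self.pin_count[pfn] = self.pin_count.get(pfn, 0) + 1
            pfns.append(pfn)
        return pfns

    def _gfn_to_va_page(self, gfn):
        for first_gfn, npages, first_va_page in self.ranges:
            if first_gfn <= gfn < first_gfn + npages:
                return first_va_page + (gfn - first_gfn)
        raise KeyError("GFN %#x is not tracked" % gfn)


def toy_pci_map(pfns, bus_offset):
    """Stand-in for pci_map_sg: here a BFN is just PFN + a fixed offset."""
    return [pfn + bus_offset for pfn in pfns]


if __name__ == "__main__":
    iommu = ToyType1Iommu({0x7F000: 0x1234, 0x7F001: 0x9ABC})
    iommu.map_dma(first_gfn=0x100, npages=2, vaddr=0x7F000 << PAGE_SHIFT)
    pfns = iommu.pin_pages([0x101, 0x100])
    print([hex(p) for p in pfns], [hex(b) for b in toy_pci_map(pfns, 0x10)])
```

The point of the model is that only pages the vendor driver asks for get pinned, instead of pinning all guest memory up front.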

MEDIATED DEVICE - INTERRUPT

MEDIATED DEVICE FRAMEWORK

With QEMU: Interrupt Setup

QEMU queries the interrupt types supported by the MDEV, as provided by the vendor driver

QEMU sets up the KVM IRQFD

QEMU notifies the vendor driver of the IRQFD via the VFIO PCI UAPI

With QEMU: Interrupt Injection at Runtime

Vendor driver injects an interrupt by signaling the eventfd, which triggers the guest ISR

[Figure: framework block diagram showing the irqfd between KVM and the vendor driver, the mdev fd path through the VFIO UAPI, and the ISR of the guest vendor driver for the PCIe MDEV device.]
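In the VFIO UAPI, handing an eventfd to the driver is done with the `VFIO_DEVICE_SET_IRQS` ioctl and a `struct vfio_irq_set` whose data area carries the eventfd. The sketch below packs that structure for a single interrupt; it does not open a device or issue the ioctl. The helper name `pack_irq_set_eventfd` is hypothetical, and the MSI index value follows the vfio-pci convention (INTx 0, MSI 1, MSI-X 2).

```python
import struct

VFIO_IRQ_SET_DATA_EVENTFD = 1 << 2
VFIO_IRQ_SET_ACTION_TRIGGER = 1 << 5
VFIO_PCI_MSI_IRQ_INDEX = 1
IRQ_SET_HEADER = "=IIIII"            # argsz, flags, index, start, count


def pack_irq_set_eventfd(irq_index, eventfd):
    """Pack a vfio_irq_set that wires one eventfd to one interrupt vector."""
    argsz = struct.calcsize(IRQ_SET_HEADER) + struct.calcsize("=i")
    flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER
    # start=0, count=1: a single vector, followed by its eventfd as int32.
    return struct.pack(IRQ_SET_HEADER + "i", argsz, flags, irq_index, 0, 1, eventfd)


if __name__ == "__main__":
    payload = pack_irq_set_eventfd(VFIO_PCI_MSI_IRQ_INDEX, eventfd=42)
    print(len(payload), struct.unpack(IRQ_SET_HEADER + "i", payload))
```

Once the vendor driver holds the eventfd, signaling it is all that is needed at runtime; KVM's irqfd turns the signal into a guest interrupt without a round trip through QEMU.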

MEDIATED DEVICE - CURRENT STATUS

CURRENT STATUS

Upstream

[PATCH v7] was sent out by Kirti Wankhede on 08/24/2016

  vfio: Mediated device Core driver

  vfio: VFIO driver for mediated devices

  vfio iommu: Add support for mediated devices

  docs: Add Documentation for Mediated devices

Tested with Linux kernel 4.7

  Multiple mediated devices per VM

  Multiple VFIO passthru devices per VM

  Mixed mediated devices and VFIO passthru devices


DEMO: NVIDIA VGPU

MEDIATED DEVICE FRAMEWORK - FUTURE WORK

POWER support: extend pin/unpin to the SPAPR TCE v2 IOMMU

Libvirt integration

[Figure: framework block diagram showing the SPAPR TCE v2 IOMMU in the position of the TYPE1 IOMMU.]

REFERENCE

[1] An Introduction to PCI Device Assignment with VFIO - Alex Williamson, Red Hat

[2] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support, https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03798.html

[3] [libvirt] [RFC] libvirt vGPU QEMU integration, https://www.redhat.com/archives/libvir-list/2016-August/msg00939.html

QUESTIONS?