Device Assignment with Nested Guest and DPDK
Peter Xu <[email protected]>
Red Hat Virtualization Team
2018-11-15


Page 1

Device Assignment with Nested Guest and DPDK

Peter Xu <[email protected]>
Red Hat Virtualization Team

Page 2

2

Agenda

● Problems
  ● Unsafe userspace device drivers
  ● Device assignment for nested guests

● Solution
● Status update

Page 3

BACKGROUND

Page 4

4

Background

● What is this talk about?
  ● DMA of assigned devices (not PCI configuration, IRQs, MMIOs…)
  ● vIOMMU (QEMU, x86_64/Intel)

● These two features could not work together (until now)…
  ● The guest IOMMU page table is only visible to the guest
  ● An assigned hardware device cannot see the guest IOMMU page table

● Do we need it?

Page 5

PROBLEMS

Page 6

6

Problem 1: Userspace Drivers

● More and more userspace drivers!
  ● A VFIO/UIO driver can pass a device through to userspace
  ● DPDK/SPDK use PMDs (poll-mode drivers) to drive devices

● However, userspace drivers are not trusted
  ● The MMU protects CPU accesses (CPU instructions)
  ● The IOMMU protects device accesses (DMA)

● What if we want to “assign” an assigned device to DPDK in the guest?
  ● No vIOMMU means no device DMA protection
  ● The guest kernel is at risk: as long as a userspace driver is used, the kernel is tainted!

Page 7

7

Problem 2: Device Assignment for Nested Guests

● How does device assignment work for an L1 guest?
  ● The device is seen by the L1 guest
  ● The guest uses L1GPA as DMA addresses
  ● The host IOMMU maps L1GPA → HPA before the guest starts

● What if we assign the hardware once more, into a nested (L2) guest?
  ● The device is seen by both the L1 and the L2 guest
  ● The L2 guest uses L2GPA as DMA addresses
  ● We need the host IOMMU to map L2GPA → HPA… but how?

Page 8

8

Problem 2: Device Assignment for Nested Guests (cont.)

[Diagram: three stacked layers, with the same PCI device visible at each. In the L2 guest, the device DMAs into L2 guest memory. In the L1 guest, the VFIO driver and the L1 guest IOMMU provide the L2GPA → L1GPA mapping over L1 guest memory. On the host, the VFIO driver and the host IOMMU provide the L1GPA → HPA mapping over host memory.]

Page 9

SOLUTION

Page 10

WHAT WE HAVE

Page 11

11

DMA for Emulated Device, w/o vIOMMU

[Diagram: inside QEMU, the vCPU issues an IO request to the emulated device (e1000/virtio), which accesses guest memory directly through the memory core API.]

(1) IO request
(2) Allocate DMA buffer
(3) DMA request (GPA)
(4) Memory access (GPA)

Page 12

12

DMA for Emulated Device, w/ vIOMMU

[Diagram: inside QEMU, the vCPU issues an IO request to the emulated device (e1000/virtio); the device's DMA is translated by the vIOMMU before guest memory is accessed through the memory core API.]

(1) IO request
(2) Allocate DMA buffer, set up the device page table (IOVA → GPA)
(3) DMA request (IOVA)
(4) Page translation request (IOVA)
(5) Look up the device page table (IOVA → GPA)
(6) Get translation result (GPA)
(7) Complete translation request (GPA)
(8) Memory access (GPA)

Page 13

13

DMA of Assigned Devices, w/o vIOMMU

[Diagram: the vCPU issues an IO request to the assigned PCI device via QEMU; the physical device's DMA goes through the hardware IOMMU, whose device page table (GPA → HPA) is set up before the guest starts.]

(1) IO request
(2) Allocate DMA buffer
(3) Virtual DMA request (using GPA)
(4) DMA request (using GPA)
(5) Memory access (using HPA)

Page 14

WHAT WE NEED

Page 15

15

DMA of Assigned Devices, w/ vIOMMU

[Diagram: when the guest sets up the vIOMMU page table, QEMU receives a MAP notification and synchronizes a device shadow page table (IOVA → HPA) into the hardware IOMMU; the assigned device then issues DMA with IOVAs that the hardware IOMMU translates directly to HPAs.]

(1) IO request
(2) Allocate DMA buffer, set up the device page table (IOVA → GPA)
(3) Send MAP notification
(4) Sync shadow page table (IOVA → HPA)
(5) Sync complete
(6) MAP notification complete
(7) Virtual DMA request (using IOVA)
(8) DMA request (using IOVA)
(9) Memory access (using HPA)

Page 16

16

IOMMU Shadow Page Table
Hardware IOMMU page tables without/with a vIOMMU in the guest
(GPA → HPA is the original page table; IOVA → HPA is the shadow page table)

[Diagram: two parallel two-level page-table walks. Without a vIOMMU, the device page table root pointer (GPA → HPA) is walked using GPA[31:22], then GPA[21:12], with GPA[11:0] as the offset into the data page. With a vIOMMU, the device shadow page table root pointer (IOVA → HPA) is walked the same way using IOVA[31:22], IOVA[21:12], and IOVA[11:0]. Every table entry holds an HPA.]

Without vIOMMU: GPA → HPA
With vIOMMU: IOVA → HPA

Page 17

17

Synchronizing Shadow Page Tables

● Solution 1 (not used): write-protect the guest page table
  ● Complicated; possibly needs a new KVM interface to report the event

● Solution 2 (being used): VT-d caching mode
  ● “Any page entry update will require explicit invalidation of caches” (VT-d spec, chapter 6.1)
  ● No KVM change needed
  ● Existing Linux guest driver support
  ● Intel-only solution; PV-like, but also applies to hardware

Page 18

18

Shadow Page Tables: MMU vs IOMMU

TYPE                        | MMU                           | IOMMU
----------------------------|-------------------------------|------------------------------
Target                      | Processors                    | Devices
Allow page faults?          | Yes (of course!)              | No [*]
Trigger mode (shadow sync)  | Page fault                    | Explicit message (caching mode)
Page table format           | 32-bit, 64-bit, PAE, …        | 64-bit
Cost (shadow sync)          | Small, relatively (KVM only)  | Huge (long code path [**])
Need previous state?        | No                            | Yes [***]

[*]: Upstream work is ongoing to enable Intel IOMMU page faults
[**/***]: See the follow-up slides for more information

Page 19

19

Shadow Sync: Costly for IOMMU!
(Example: when the L2 guest maps one page)

[Diagram: one MAP operation from the L2 guest travels through the whole stack: the L2 guest's IOMMU driver traps into KVM, reaches the vIOMMU and VFIO inside QEMU (the L2 instance), then the L1 kernel's IOMMU driver, traps into KVM again, reaches the vIOMMU and VFIO inside QEMU (the L1 instance), then the host kernel's IOMMU driver, and finally the host IOMMU.]

Page 20

20

Shadow Sync: About State Cache

● MMU shadow sync
  ● Talks to page tables: PGD, PUD, PMD, PTE, …
  ● Does set() on page table entries
  ● No need to cache the previous state

● IOMMU shadow sync
  ● Talks to the vfio-pci driver: VFIO_IOMMU_MAP_DMA, VFIO_IOMMU_UNMAP_DMA (no direct access to page tables; the same is true even for the vfio-pci driver underneath)
  ● Does add()/remove() on page table entries
  ● We can either create a new entry (which must not exist before) or delete an old one
  ● The previous state matters, since otherwise we can’t tell which pages have already been mapped

Page 21

STATUS UPDATE

Page 22

22

Some Facts… and TBDs

● Emulated vs. assigned devices, from the IOMMU perspective
  ● Emulated: fast mapping (no sync), slow IO (needs guest translation)
  ● Assigned: slow mapping (needs sync), fast IO (no guest translation)

● Some performance numbers (Intel ixgbe, 10 Gbps NIC)
  ● Kernel ixgbe driver: very slow (~80% degradation on L1)
  ● Userspace DPDK driver: very fast (close to line speed, on both L1 and L2)

● Future work?
  ● Reduce context switches when syncing shadow pages? (vhost-iommu?)
  ● Nested page tables? (needs hardware support, like EPT compared to softmmu)
  ● Share the state cache? (e.g. vfio-pci has a similar state cache; see “vfio_iommu.dma_list”)

Page 23

23

Wanna try?

● QEMU command line to try this out:

● Versions:
  ● QEMU: v3.0 or newer
  ● Linux: v4.18-rc1 or newer

● For more information, please visit:
  ● https://wiki.qemu.org/Features/VT-d

qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split -m 2G \
  -device intel-iommu,intremap=on,caching-mode=on \
  -device vfio-pci,host=XX:XX:XX


Page 24

THANK YOU

plus.google.com/+RedHat

youtube.com/user/RedHatVideos

facebook.com/redhatinc

twitter.com/RedHat
linkedin.com/company/red-hat