27
KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and SMP enhancement SMP enhancement SMP enhancement SMP enhancement SMP enhancement SMP enhancement SMP enhancement SMP enhancement Eddie Dong, Yunfeng Zhao, Xin Li Eddie Dong, Yunfeng Zhao, Xin Li

KVM tuning and testing, and SMP enhancementKVM...KVM tuning and testing, and SMP enhancement Eddie Dong, Yunfeng Zhao, Xin Li 2 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED

  • Upload
    others

  • View
    74

  • Download
    0

Embed Size (px)

Citation preview

KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and SMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancement

Eddie Dong, Yunfeng Zhao, Xin LiEddie Dong, Yunfeng Zhao, Xin Li

2

Legal DisclaimerLegal Disclaimer

�� INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEINFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTELL®® PRODUCTS. NO LICENSE, PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAEXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS L PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTELGRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’’S TERMS AND CONDITIONS OF SALE FOR S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTELIMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL®® PRODUCTS INCLUDING LIABILITY OR PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANWARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR TABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPINFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL ERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIPRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. FE SUSTAINING APPLICATIONS.

�� Intel may make changes to specifications and product descriptionIntel may make changes to specifications and product descriptions at any time, without notice.s at any time, without notice.

�� All products, dates, and figures specified are preliminary basedAll products, dates, and figures specified are preliminary based on current expectations, and are subject on current expectations, and are subject to change without notice.to change without notice.

�� Intel, processors, chipsets, and desktop boards may contain desiIntel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, gn defects or errors known as errata, which may cause the product to deviate from published specificatwhich may cause the product to deviate from published specifications. Current characterized errata are ions. Current characterized errata are available on request.available on request.

�� Intel and the Intel logo are trademarks or registered trademarksIntel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in of Intel Corporation or its subsidiaries in the United States and other countries. the United States and other countries.

�� *Other names and brands may be claimed as the property of others*Other names and brands may be claimed as the property of others..

�� Copyright Copyright ©© 2007 Intel Corporation.2007 Intel Corporation.

Throughout this presentation:

VT-x refers to Intel® VT for IA-32 and Intel® 64

VT-i refers to the Intel® VT for IA-64, and

VT-d refers to Intel® VT for Directed I/O

3

AgendaAgenda

�� Performance tuningPerformance tuning

�� Kernel interrupt controller statusKernel interrupt controller status

�� SMP supportSMP support

�� TestingTesting

4

Back to KVMBack to KVM--1818

�� Kernel build performance was only Kernel build performance was only 1/3 of 1/3 of XenXen

–– We suspected shadow page table may We suspected shadow page table may

not be optimizednot be optimized

�� We used We used oprofileoprofile to analyze the to analyze the overheadoverhead

––Anthony & Avi started looking at Anthony & Avi started looking at

performance issues at same timeperformance issues at same time

5

Top 5 findings from Top 5 findings from oprofileoprofile

�� Guest only get ~Guest only get ~25%25% cyclescycles

�� Excessive MSR save/restoreExcessive MSR save/restore

–– Such as SYSCALL_MASK, LSTAR, CSTAR, Such as SYSCALL_MASK, LSTAR, CSTAR, KERNEL_GS_BASE, EFER, and K6_STARKERNEL_GS_BASE, EFER, and K6_STAR

–– load_msrsload_msrs costs ~costs ~7%7%

–– save_msrssave_msrs costs ~costs ~3.7%3.7%

–– kvm_vmx_returnkvm_vmx_return costs ~costs ~6.1%6.1%

–– Hardware VM Exit does save/load for some of the Hardware VM Exit does save/load for some of the MSRsMSRs

�� vmx_vcpu_runvmx_vcpu_run costs ~costs ~3.2%3.2%

––Most time is spent in HOST_FS/GS_BASE write Most time is spent in HOST_FS/GS_BASE write and and fxfx save/restoresave/restore

6

LightLight--weight vs. heavyweight vs. heavy--weight VM Exitweight VM Exit

�� A lightA light--weight VM Exit is handled in weight VM Exit is handled in KVM and returned to guest directly, KVM and returned to guest directly, without host context switchwithout host context switch

––Mostly for shadow page faultMostly for shadow page fault

––Cover 93% of all VM Exits in KVMCover 93% of all VM Exits in KVM--1818

�� A heavyA heavy--weight VM exit involves weight VM exit involves host context switch or transition to host context switch or transition to QemuQemu

––Such as I/O or when signal is pendingSuch as I/O or when signal is pending

––Require save/restore of Require save/restore of MSRsMSRs

7

Reduce VM ExitsReduce VM Exits

�� Improve shadow page table codeImprove shadow page table code

––Combine guest PTE update with shadow Combine guest PTE update with shadow

PTE update (Avi Kivity, PTE update (Avi Kivity, QumranetQumranet))

–– Increase shadow page table size Increase shadow page table size

(Avi Kivity, (Avi Kivity, QumranetQumranet))

�� Misc.Misc.

––Port 0x80 access goes to hardware Port 0x80 access goes to hardware

directly (Qing He, Intel)directly (Qing He, Intel)

8

Shorten VM Exit handlingShorten VM Exit handling�� Provide quick path for lightProvide quick path for light--weight VM weight VM ExitExit––Minimize context save/restore for lightMinimize context save/restore for light--weight weight VM Exit (Eddie Dong, Intel)VM Exit (Eddie Dong, Intel)

–– Avoid hardware MSR save/restore (Eddie Dong, Avoid hardware MSR save/restore (Eddie Dong, Intel)Intel)

–– Lazy MSR_EFER save/restore (Eddie Dong, Intel)Lazy MSR_EFER save/restore (Eddie Dong, Intel)

�� Fine tune heavyFine tune heavy--weight VM Exit path to weight VM Exit path to save/restore necessary context onlysave/restore necessary context only–– Lazy FP (Anthony Liguori, IBM)Lazy FP (Anthony Liguori, IBM)

–– Some Some MSRsMSRs are not changed in certain are not changed in certain environment (Anthony, Avi and etc.)environment (Anthony, Avi and etc.)

–– UnbundleUnbundle fsfs from from gsgs reload for better SMP reload for better SMP support (Laurent support (Laurent VivierVivier, Bull), Bull)

9

Kernel BuildKernel Build

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

18 19 20 21 22 23 24 25 26 27 28 29 30 31 33 35

release #

vs. native

Guest time Net time

10

AgendaAgenda

�� Performance tuningPerformance tuning

�� Kernel interrupt controller statusKernel interrupt controller status

�� SMP supportSMP support

�� TestingTesting

11

Where to Where to virtualizevirtualize interrupt controller ?interrupt controller ?

�� User levelUser level–– Pro: Can reuse Pro: Can reuse QemuQemu device modeldevice model

–– Con: Performance concern for kernel devicesCon: Performance concern for kernel devices

�� Mixed mode (APIC in kernel, PIC/IOAPIC in Mixed mode (APIC in kernel, PIC/IOAPIC in user level initially)user level initially)–– ProsPros

–– Flexible code structureFlexible code structure

–– Better SMP supportBetter SMP support

–– ConsCons

–– ComplexityComplexity

–– Performance concern if IOAPIC is in user levelPerformance concern if IOAPIC is in user level

�� Kernel level (lapic5 branch)Kernel level (lapic5 branch)–– Pro: Better SMP support, better performancePro: Better SMP support, better performance

–– Con: Kernel is subject to device model failureCon: Kernel is subject to device model failure

12

Kernel interrupt controller I/FsKernel interrupt controller I/Fs

PIC IOAPIC

VP0

APIC0

VP1

APIC1

User level device model

(Qemu)

Host Kernel

KVM

KVM_IRQ_LINE

Signal an IRQ line level

KVM_GET_IRQCHIP

Save interrupt controller state

KVM_SET_IRQCHIPRestore interrupt controller state

Convert vector based VCPU ops to gsi based VM opsConvert vector based VCPU ops to Convert vector based VCPU ops to gsigsi based VM opsbased VM ops

13

Lapic5 statusLapic5 status

�� Current StatusCurrent Status

––PIC/IOAPIC/APIC are implementedPIC/IOAPIC/APIC are implemented

––Live migration is supportedLive migration is supported

––SMP Windows/Linux worksSMP Windows/Linux works

�� TODOTODO

––Merge with master branchMerge with master branch

––StabilizeStabilize

––Guest MSIGuest MSI

14

Lapic5 Quality StatusLapic5 Quality StatusFailFail

N/AN/A

Can boot, but has issuesCan boot, but has issues

PassPass

Vista SMP ACPI HALVista SMP ACPI HAL

Vista UP ACPI HALVista UP ACPI HAL

WinXPWinXP UP ACPI HALUP ACPI HAL

WinXPWinXP NoNo--ACPI HALACPI HAL

Win2k Win2k SrvSrv MP ACPI HALMP ACPI HAL

Win2k Win2k SrvSrv UP ACPI HALUP ACPI HAL

Win2k Win2k SrvSrv NoNo--ACPI HALACPI HAL

Win2k3 R2 MP ACPI HALWin2k3 R2 MP ACPI HAL

Win2k3 R2 UP ACPI HALWin2k3 R2 UP ACPI HAL

Win2k3 R2 NoWin2k3 R2 No--ACPI HALACPI HAL

Linux 2.6.22 SMPLinux 2.6.22 SMP

Linux 2.6.22 UPLinux 2.6.22 UP

Linux 2.6.18 SMPLinux 2.6.18 SMP

Linux 2.6.18 UPLinux 2.6.18 UP

Linux 2.6.9 SMPLinux 2.6.9 SMP

Linux 2.6.9 UPLinux 2.6.9 UP

64/6464/6432p/6432p/6432p/32p32p/32p32/32p32/32p

Guest/HostGuest/HostGuest OSGuest OS

Lapic5 has same functionality with master branchLapic5 has same functionality with master branchLapic5 has same functionality with master branch

15

AgendaAgenda

�� Performance tuningPerformance tuning

�� Kernel interrupt controller statusKernel interrupt controller status

�� SMP supportSMP support

�� TestingTesting

16

KVM SMP developmentKVM SMP development

�� Originally enabled in June based on Originally enabled in June based on GregGreg’’s in kernel APIC V09 (Xin Li, s in kernel APIC V09 (Xin Li, Intel)Intel)––Each vCPU has a dedicated threadEach vCPU has a dedicated thread

�� User level SMP is enabled in KVMUser level SMP is enabled in KVM--29 29 ((Avi Kivity, Avi Kivity, QumranetQumranet))

�� In kernel interrupt controller (lapic5 In kernel interrupt controller (lapic5 branch) based SMP is enabled (Xin branch) based SMP is enabled (Xin Li, Intel)Li, Intel)

17

Linux Kernel

Guest

KVM SMPKVM SMP

Hardware

Device

Driver

Device

Driver

KVMmodule

QEMU DMRegularLinux

Application

VMX/SVMmodule

vCPU vCPU

Local

APIC

PIC/IOAPIC

Local

APICIPI

18

KVM SMPKVM SMP

�� N model: each vCPU has a dedicated N model: each vCPU has a dedicated threadthread

––Global lock to DMGlobal lock to DM

–– May use device locks insteadMay use device locks instead

–– Each Each vCPUvCPU thread need to handle signalsthread need to handle signals

–– Asynchronous events may be delivered to any threadAsynchronous events may be delivered to any thread

�� N+1 Model: to add a dedicated thread to N+1 Model: to add a dedicated thread to handle asynchronous eventshandle asynchronous events

–– Such as DMA/AIOSuch as DMA/AIO

–– Simplify vCPU thread logicSimplify vCPU thread logic

19

AgendaAgenda

�� Performance tuningPerformance tuning

�� Kernel interrupt controller statusKernel interrupt controller status

�� SMP supportSMP support

�� TestingTesting

20

KVM TestKVM Test

GoalsGoals�� Ensure KVM works well on all Intel Ensure KVM works well on all Intel

PlatformsPlatforms

–– functionality and performancefunctionality and performance

ActivitiesActivities�� Test KVM master branch dailyTest KVM master branch daily

�� Report issues and regressions ASAPReport issues and regressions ASAP

�� Track issues and help community Track issues and help community developers to fix issuesdevelopers to fix issues

�� Develop test cases for new KVM featuresDevelop test cases for new KVM features

21

KVM Test SuitesKVM Test Suites

Linux: CPU2K, Kernel build, Lmbench, Iometer, SpecJBB, Sysbench,Linux: CPU2K, Kernel build, Lmbench, Iometer, SpecJBB, Sysbench,

Byte, NetPerfByte, NetPerf

Windows: Windows: SysmarkSysmark, CPU2k, , CPU2k, SpecJBBSpecJBB, , PCmarkPCmark

PerformancePerformance

Basic test cases for KVM main features, like Save/Restore, SMP Basic test cases for KVM main features, like Save/Restore, SMP

Windows/Linux, live migration, and basic virtual devicesWindows/Linux, live migration, and basic virtual devicesNightly TestNightly Test

Linux: LTP stress, Crashme, misc workloadsLinux: LTP stress, Crashme, misc workloads

Windows: HCT StressWindows: HCT StressStressStress

Specific tests for previous failuresSpecific tests for previous failuresRegression Regression

teststests

LTP, kernel parameters, Windows (HCT,DTM), Guest OS installationLTP, kernel parameters, Windows (HCT,DTM), Guest OS installation(RHEL5, FC6, RHEL4U3, SLES10, OpenSuse10, SLES9, Windows (RHEL5, FC6, RHEL4U3, SLES10, OpenSuse10, SLES9, Windows XP/2k/2k3/Vista)XP/2k/2k3/Vista)

Guest OSGuest OS

Disk, NIC, VGA, Timer, Keyboard, MouseDisk, NIC, VGA, Timer, Keyboard, MouseDevice ModelDevice Model

Create/destroy different guest configurations (memory, #VCPU, ACCreate/destroy different guest configurations (memory, #VCPU, ACPI, PI, 32/64), Save & Restore, Live Migration etc.32/64), Save & Restore, Live Migration etc.

VM VM

ManagementManagement

Test ScopeTest ScopeTest SuiteTest Suite

22

Test FrequencyTest Frequency

On demandOn demand

MonthlyMonthly

DailyDaily

RegressionRegression

Guest/Guest Guest/Guest

Installation Installation

StressStress

PerformancePerformance

Device ModelDevice Model

Nightly TestNightly Test

64/6464/6432p/6432p/6432/6432/6432p/32p32p/32p32/32p32/32pGuest/HostGuest/Host

23

Test InfrastructureTest Infrastructure

�� Common interface for easy test case Common interface for easy test case developmentdevelopment

�� Run tests according to predefined Run tests according to predefined configuration and scenarioconfiguration and scenario

�� Can handle host hang/crash/reboot Can handle host hang/crash/reboot situationssituations

�� Automatic report generationAutomatic report generation–– Outputs a journal file in a standard wellOutputs a journal file in a standard well--defined formatdefined format

24

Sample Test ResultSample Test ResultIssue listIssue list

================================================================================================

1. Could not create kvm guest with memory >=2040 1. Could not create kvm guest with memory >=2040

DetailsDetails

================================================================================================

PAE:PAE:

1. boot guest with 256M memory1. boot guest with 256M memory PASSPASS

2. boot two windows 2. boot two windows xpxp guestguest PASSPASS

……

IA32e:IA32e:

1. boot 4 321. boot 4 32--bits guest in parallelbits guest in parallel PASSPASS

2. boot 4 642. boot 4 64--bits guest in parallelbits guest in parallel FAILFAIL

……

Test LogTest Log

25

Current StatusCurrent StatusFailFail

N/AN/A

Can boot, but has issuesCan boot, but has issues

PassPass

Vista SMP ACPI HALVista SMP ACPI HAL

Vista UP ACPI HALVista UP ACPI HAL

WinXPWinXP UP ACPI HALUP ACPI HAL

WinXPWinXP NoNo--ACPI HALACPI HAL

Win2k Win2k SrvSrv MP ACPI HALMP ACPI HAL

Win2k Win2k SrvSrv UP ACPI HALUP ACPI HAL

Win2k Win2k SrvSrv NoNo--ACPI HALACPI HAL

Win2k3 R2 MP ACPI HALWin2k3 R2 MP ACPI HAL

Win2k3 R2 UP ACPI HALWin2k3 R2 UP ACPI HAL

Win2k3 R2 NoWin2k3 R2 No--ACPI HALACPI HAL

Linux 2.6.22 SMPLinux 2.6.22 SMP

Linux 2.6.22 UPLinux 2.6.22 UP

Linux 2.6.18 SMPLinux 2.6.18 SMP

Linux 2.6.18 UPLinux 2.6.18 UP

Linux 2.6.9 SMPLinux 2.6.9 SMP

Linux 2.6.9 UPLinux 2.6.9 UP

64/6464/6432p/6432p/6432p/32p32p/32p32/32p32/32p

Guest/HostGuest/HostGuest OSGuest OS

26

We need your helpWe need your help

�� Use Bug Tracker instead of email to track issuesUse Bug Tracker instead of email to track issues

–– submit issuesubmit issue

–– assign owner assign owner

–– update bug status update bug status

�� Give us feedback on our test and its resultsGive us feedback on our test and its results

�� What more can we do for KVM?What more can we do for KVM?

27