Upload
others
View
74
Download
0
Embed Size (px)
Citation preview
KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and KVM tuning and testing, and SMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancementSMP enhancement
Eddie Dong, Yunfeng Zhao, Xin LiEddie Dong, Yunfeng Zhao, Xin Li
2
Legal DisclaimerLegal Disclaimer
�� INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEINFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTELL®® PRODUCTS. NO LICENSE, PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAEXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS L PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTELGRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’’S TERMS AND CONDITIONS OF SALE FOR S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTELIMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL®® PRODUCTS INCLUDING LIABILITY OR PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANWARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR TABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPINFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL ERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIPRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. FE SUSTAINING APPLICATIONS.
�� Intel may make changes to specifications and product descriptionIntel may make changes to specifications and product descriptions at any time, without notice.s at any time, without notice.
�� All products, dates, and figures specified are preliminary basedAll products, dates, and figures specified are preliminary based on current expectations, and are subject on current expectations, and are subject to change without notice.to change without notice.
�� Intel, processors, chipsets, and desktop boards may contain desiIntel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, gn defects or errors known as errata, which may cause the product to deviate from published specificatwhich may cause the product to deviate from published specifications. Current characterized errata are ions. Current characterized errata are available on request.available on request.
�� Intel and the Intel logo are trademarks or registered trademarksIntel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in of Intel Corporation or its subsidiaries in the United States and other countries. the United States and other countries.
�� *Other names and brands may be claimed as the property of others*Other names and brands may be claimed as the property of others..
�� Copyright Copyright ©© 2007 Intel Corporation.2007 Intel Corporation.
Throughout this presentation:
VT-x refers to Intel® VT for IA-32 and Intel® 64
VT-i refers to the Intel® VT for IA-64, and
VT-d refers to Intel® VT for Directed I/O
3
AgendaAgenda
�� Performance tuningPerformance tuning
�� Kernel interrupt controller statusKernel interrupt controller status
�� SMP supportSMP support
�� TestingTesting
4
Back to KVMBack to KVM--1818
�� Kernel build performance was only Kernel build performance was only 1/3 of 1/3 of XenXen
–– We suspected shadow page table may We suspected shadow page table may
not be optimizednot be optimized
�� We used We used oprofileoprofile to analyze the to analyze the overheadoverhead
––Anthony & Avi started looking at Anthony & Avi started looking at
performance issues at same timeperformance issues at same time
5
Top 5 findings from Top 5 findings from oprofileoprofile
�� Guest only get ~Guest only get ~25%25% cyclescycles
�� Excessive MSR save/restoreExcessive MSR save/restore
–– Such as SYSCALL_MASK, LSTAR, CSTAR, Such as SYSCALL_MASK, LSTAR, CSTAR, KERNEL_GS_BASE, EFER, and K6_STARKERNEL_GS_BASE, EFER, and K6_STAR
–– load_msrsload_msrs costs ~costs ~7%7%
–– save_msrssave_msrs costs ~costs ~3.7%3.7%
–– kvm_vmx_returnkvm_vmx_return costs ~costs ~6.1%6.1%
–– Hardware VM Exit does save/load for some of the Hardware VM Exit does save/load for some of the MSRsMSRs
�� vmx_vcpu_runvmx_vcpu_run costs ~costs ~3.2%3.2%
––Most time is spent in HOST_FS/GS_BASE write Most time is spent in HOST_FS/GS_BASE write and and fxfx save/restoresave/restore
6
LightLight--weight vs. heavyweight vs. heavy--weight VM Exitweight VM Exit
�� A lightA light--weight VM Exit is handled in weight VM Exit is handled in KVM and returned to guest directly, KVM and returned to guest directly, without host context switchwithout host context switch
––Mostly for shadow page faultMostly for shadow page fault
––Cover 93% of all VM Exits in KVMCover 93% of all VM Exits in KVM--1818
�� A heavyA heavy--weight VM exit involves weight VM exit involves host context switch or transition to host context switch or transition to QemuQemu
––Such as I/O or when signal is pendingSuch as I/O or when signal is pending
––Require save/restore of Require save/restore of MSRsMSRs
7
Reduce VM ExitsReduce VM Exits
�� Improve shadow page table codeImprove shadow page table code
––Combine guest PTE update with shadow Combine guest PTE update with shadow
PTE update (Avi Kivity, PTE update (Avi Kivity, QumranetQumranet))
–– Increase shadow page table size Increase shadow page table size
(Avi Kivity, (Avi Kivity, QumranetQumranet))
�� Misc.Misc.
––Port 0x80 access goes to hardware Port 0x80 access goes to hardware
directly (Qing He, Intel)directly (Qing He, Intel)
8
Shorten VM Exit handlingShorten VM Exit handling�� Provide quick path for lightProvide quick path for light--weight VM weight VM ExitExit––Minimize context save/restore for lightMinimize context save/restore for light--weight weight VM Exit (Eddie Dong, Intel)VM Exit (Eddie Dong, Intel)
–– Avoid hardware MSR save/restore (Eddie Dong, Avoid hardware MSR save/restore (Eddie Dong, Intel)Intel)
–– Lazy MSR_EFER save/restore (Eddie Dong, Intel)Lazy MSR_EFER save/restore (Eddie Dong, Intel)
�� Fine tune heavyFine tune heavy--weight VM Exit path to weight VM Exit path to save/restore necessary context onlysave/restore necessary context only–– Lazy FP (Anthony Liguori, IBM)Lazy FP (Anthony Liguori, IBM)
–– Some Some MSRsMSRs are not changed in certain are not changed in certain environment (Anthony, Avi and etc.)environment (Anthony, Avi and etc.)
–– UnbundleUnbundle fsfs from from gsgs reload for better SMP reload for better SMP support (Laurent support (Laurent VivierVivier, Bull), Bull)
9
Kernel BuildKernel Build
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
18 19 20 21 22 23 24 25 26 27 28 29 30 31 33 35
release #
vs. native
Guest time Net time
10
AgendaAgenda
�� Performance tuningPerformance tuning
�� Kernel interrupt controller statusKernel interrupt controller status
�� SMP supportSMP support
�� TestingTesting
11
Where to Where to virtualizevirtualize interrupt controller ?interrupt controller ?
�� User levelUser level–– Pro: Can reuse Pro: Can reuse QemuQemu device modeldevice model
–– Con: Performance concern for kernel devicesCon: Performance concern for kernel devices
�� Mixed mode (APIC in kernel, PIC/IOAPIC in Mixed mode (APIC in kernel, PIC/IOAPIC in user level initially)user level initially)–– ProsPros
–– Flexible code structureFlexible code structure
–– Better SMP supportBetter SMP support
–– ConsCons
–– ComplexityComplexity
–– Performance concern if IOAPIC is in user levelPerformance concern if IOAPIC is in user level
�� Kernel level (lapic5 branch)Kernel level (lapic5 branch)–– Pro: Better SMP support, better performancePro: Better SMP support, better performance
–– Con: Kernel is subject to device model failureCon: Kernel is subject to device model failure
12
Kernel interrupt controller I/FsKernel interrupt controller I/Fs
PIC IOAPIC
VP0
APIC0
VP1
APIC1
User level device model
(Qemu)
Host Kernel
KVM
KVM_IRQ_LINE
Signal an IRQ line level
KVM_GET_IRQCHIP
Save interrupt controller state
KVM_SET_IRQCHIPRestore interrupt controller state
Convert vector based VCPU ops to gsi based VM opsConvert vector based VCPU ops to Convert vector based VCPU ops to gsigsi based VM opsbased VM ops
13
Lapic5 statusLapic5 status
�� Current StatusCurrent Status
––PIC/IOAPIC/APIC are implementedPIC/IOAPIC/APIC are implemented
––Live migration is supportedLive migration is supported
––SMP Windows/Linux worksSMP Windows/Linux works
�� TODOTODO
––Merge with master branchMerge with master branch
––StabilizeStabilize
––Guest MSIGuest MSI
14
Lapic5 Quality StatusLapic5 Quality StatusFailFail
N/AN/A
Can boot, but has issuesCan boot, but has issues
PassPass
Vista SMP ACPI HALVista SMP ACPI HAL
Vista UP ACPI HALVista UP ACPI HAL
WinXPWinXP UP ACPI HALUP ACPI HAL
WinXPWinXP NoNo--ACPI HALACPI HAL
Win2k Win2k SrvSrv MP ACPI HALMP ACPI HAL
Win2k Win2k SrvSrv UP ACPI HALUP ACPI HAL
Win2k Win2k SrvSrv NoNo--ACPI HALACPI HAL
Win2k3 R2 MP ACPI HALWin2k3 R2 MP ACPI HAL
Win2k3 R2 UP ACPI HALWin2k3 R2 UP ACPI HAL
Win2k3 R2 NoWin2k3 R2 No--ACPI HALACPI HAL
Linux 2.6.22 SMPLinux 2.6.22 SMP
Linux 2.6.22 UPLinux 2.6.22 UP
Linux 2.6.18 SMPLinux 2.6.18 SMP
Linux 2.6.18 UPLinux 2.6.18 UP
Linux 2.6.9 SMPLinux 2.6.9 SMP
Linux 2.6.9 UPLinux 2.6.9 UP
64/6464/6432p/6432p/6432p/32p32p/32p32/32p32/32p
Guest/HostGuest/HostGuest OSGuest OS
Lapic5 has same functionality with master branchLapic5 has same functionality with master branchLapic5 has same functionality with master branch
15
AgendaAgenda
�� Performance tuningPerformance tuning
�� Kernel interrupt controller statusKernel interrupt controller status
�� SMP supportSMP support
�� TestingTesting
16
KVM SMP developmentKVM SMP development
�� Originally enabled in June based on Originally enabled in June based on GregGreg’’s in kernel APIC V09 (Xin Li, s in kernel APIC V09 (Xin Li, Intel)Intel)––Each vCPU has a dedicated threadEach vCPU has a dedicated thread
�� User level SMP is enabled in KVMUser level SMP is enabled in KVM--29 29 ((Avi Kivity, Avi Kivity, QumranetQumranet))
�� In kernel interrupt controller (lapic5 In kernel interrupt controller (lapic5 branch) based SMP is enabled (Xin branch) based SMP is enabled (Xin Li, Intel)Li, Intel)
17
Linux Kernel
Guest
KVM SMPKVM SMP
Hardware
Device
Driver
Device
Driver
KVMmodule
QEMU DMRegularLinux
Application
VMX/SVMmodule
vCPU vCPU
Local
APIC
PIC/IOAPIC
Local
APICIPI
18
KVM SMPKVM SMP
�� N model: each vCPU has a dedicated N model: each vCPU has a dedicated threadthread
––Global lock to DMGlobal lock to DM
–– May use device locks insteadMay use device locks instead
–– Each Each vCPUvCPU thread need to handle signalsthread need to handle signals
–– Asynchronous events may be delivered to any threadAsynchronous events may be delivered to any thread
�� N+1 Model: to add a dedicated thread to N+1 Model: to add a dedicated thread to handle asynchronous eventshandle asynchronous events
–– Such as DMA/AIOSuch as DMA/AIO
–– Simplify vCPU thread logicSimplify vCPU thread logic
19
AgendaAgenda
�� Performance tuningPerformance tuning
�� Kernel interrupt controller statusKernel interrupt controller status
�� SMP supportSMP support
�� TestingTesting
20
KVM TestKVM Test
GoalsGoals�� Ensure KVM works well on all Intel Ensure KVM works well on all Intel
PlatformsPlatforms
–– functionality and performancefunctionality and performance
ActivitiesActivities�� Test KVM master branch dailyTest KVM master branch daily
�� Report issues and regressions ASAPReport issues and regressions ASAP
�� Track issues and help community Track issues and help community developers to fix issuesdevelopers to fix issues
�� Develop test cases for new KVM featuresDevelop test cases for new KVM features
21
KVM Test SuitesKVM Test Suites
Linux: CPU2K, Kernel build, Lmbench, Iometer, SpecJBB, Sysbench,Linux: CPU2K, Kernel build, Lmbench, Iometer, SpecJBB, Sysbench,
Byte, NetPerfByte, NetPerf
Windows: Windows: SysmarkSysmark, CPU2k, , CPU2k, SpecJBBSpecJBB, , PCmarkPCmark
PerformancePerformance
Basic test cases for KVM main features, like Save/Restore, SMP Basic test cases for KVM main features, like Save/Restore, SMP
Windows/Linux, live migration, and basic virtual devicesWindows/Linux, live migration, and basic virtual devicesNightly TestNightly Test
Linux: LTP stress, Crashme, misc workloadsLinux: LTP stress, Crashme, misc workloads
Windows: HCT StressWindows: HCT StressStressStress
Specific tests for previous failuresSpecific tests for previous failuresRegression Regression
teststests
LTP, kernel parameters, Windows (HCT,DTM), Guest OS installationLTP, kernel parameters, Windows (HCT,DTM), Guest OS installation(RHEL5, FC6, RHEL4U3, SLES10, OpenSuse10, SLES9, Windows (RHEL5, FC6, RHEL4U3, SLES10, OpenSuse10, SLES9, Windows XP/2k/2k3/Vista)XP/2k/2k3/Vista)
Guest OSGuest OS
Disk, NIC, VGA, Timer, Keyboard, MouseDisk, NIC, VGA, Timer, Keyboard, MouseDevice ModelDevice Model
Create/destroy different guest configurations (memory, #VCPU, ACCreate/destroy different guest configurations (memory, #VCPU, ACPI, PI, 32/64), Save & Restore, Live Migration etc.32/64), Save & Restore, Live Migration etc.
VM VM
ManagementManagement
Test ScopeTest ScopeTest SuiteTest Suite
22
Test FrequencyTest Frequency
On demandOn demand
MonthlyMonthly
DailyDaily
RegressionRegression
Guest/Guest Guest/Guest
Installation Installation
StressStress
PerformancePerformance
Device ModelDevice Model
Nightly TestNightly Test
64/6464/6432p/6432p/6432/6432/6432p/32p32p/32p32/32p32/32pGuest/HostGuest/Host
23
Test InfrastructureTest Infrastructure
�� Common interface for easy test case Common interface for easy test case developmentdevelopment
�� Run tests according to predefined Run tests according to predefined configuration and scenarioconfiguration and scenario
�� Can handle host hang/crash/reboot Can handle host hang/crash/reboot situationssituations
�� Automatic report generationAutomatic report generation–– Outputs a journal file in a standard wellOutputs a journal file in a standard well--defined formatdefined format
24
Sample Test ResultSample Test ResultIssue listIssue list
================================================================================================
1. Could not create kvm guest with memory >=2040 1. Could not create kvm guest with memory >=2040
DetailsDetails
================================================================================================
PAE:PAE:
1. boot guest with 256M memory1. boot guest with 256M memory PASSPASS
2. boot two windows 2. boot two windows xpxp guestguest PASSPASS
……
IA32e:IA32e:
1. boot 4 321. boot 4 32--bits guest in parallelbits guest in parallel PASSPASS
2. boot 4 642. boot 4 64--bits guest in parallelbits guest in parallel FAILFAIL
……
Test LogTest Log
25
Current StatusCurrent StatusFailFail
N/AN/A
Can boot, but has issuesCan boot, but has issues
PassPass
Vista SMP ACPI HALVista SMP ACPI HAL
Vista UP ACPI HALVista UP ACPI HAL
WinXPWinXP UP ACPI HALUP ACPI HAL
WinXPWinXP NoNo--ACPI HALACPI HAL
Win2k Win2k SrvSrv MP ACPI HALMP ACPI HAL
Win2k Win2k SrvSrv UP ACPI HALUP ACPI HAL
Win2k Win2k SrvSrv NoNo--ACPI HALACPI HAL
Win2k3 R2 MP ACPI HALWin2k3 R2 MP ACPI HAL
Win2k3 R2 UP ACPI HALWin2k3 R2 UP ACPI HAL
Win2k3 R2 NoWin2k3 R2 No--ACPI HALACPI HAL
Linux 2.6.22 SMPLinux 2.6.22 SMP
Linux 2.6.22 UPLinux 2.6.22 UP
Linux 2.6.18 SMPLinux 2.6.18 SMP
Linux 2.6.18 UPLinux 2.6.18 UP
Linux 2.6.9 SMPLinux 2.6.9 SMP
Linux 2.6.9 UPLinux 2.6.9 UP
64/6464/6432p/6432p/6432p/32p32p/32p32/32p32/32p
Guest/HostGuest/HostGuest OSGuest OS
26
We need your helpWe need your help
�� Use Bug Tracker instead of email to track issuesUse Bug Tracker instead of email to track issues
–– submit issuesubmit issue
–– assign owner assign owner
–– update bug status update bug status
�� Give us feedback on our test and its resultsGive us feedback on our test and its results
�� What more can we do for KVM?What more can we do for KVM?