Xen Virtual Machine Monitor: Performance Isolation
E0397 Lecture, 17/8/2010
Many slides based verbatim on “Xen Credit Scheduler Wiki”




Page 1: Xen Virtual Machine Monitor Performance Isolation

Xen Virtual Machine Monitor: Performance Isolation

E0397 Lecture, 17/8/2010

Many slides based verbatim on “Xen Credit Scheduler Wiki”

Page 2: Xen Virtual Machine Monitor Performance Isolation

Recall: Xen Architecture

Page 3: Xen Virtual Machine Monitor Performance Isolation

Hypervisor Core Functions

– Scheduling domains
– Allocating memory
– Driver Domain (Domain 0): all access to devices needs to go through this domain

Page 4: Xen Virtual Machine Monitor Performance Isolation

CPU Sharing between Domains: Credit Scheduler

Proportional fair-share CPU scheduler – work-conserving on SMP hosts.

Definitions:

– Each domain (including the host OS) is assigned a weight and a cap.
– Weight: a domain with a weight of 512 will get twice as much CPU as a domain with a weight of 256 on a contended host. Legal weights range from 1 to 65535; the default is 256.
– Cap: optionally fixes the maximum amount of CPU a domain will be able to consume, even if the host system has idle CPU cycles. The cap is expressed as a percentage of one physical CPU: 100 is 1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc. The default, 0, means there is no upper cap.
– VCPUs: the number of virtual CPUs given to a domain – exactly replaces the concept of the number of CPUs in a physical machine.
  – More VCPUs let threads run in parallel (only if physical CPUs > 1).
  – The number of VCPUs should equal the number of physical CPUs; fewer will limit performance.
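The weight/cap arithmetic above can be sketched in a few lines. This is an illustrative model, not Xen's implementation; the function name and the iterative cap-redistribution loop are my own assumptions:

```python
def cpu_shares(domains, total_cpu_pct=100):
    """Proportional fair-share allocation under a weight/cap model.

    domains: {name: (weight, cap)} where cap is a percent of one
    physical CPU and cap == 0 means uncapped.
    Returns {name: percent of total_cpu_pct allotted}.
    """
    shares = {}
    remaining = dict(domains)
    budget = total_cpu_pct
    while remaining:
        total_w = sum(w for w, _ in remaining.values())
        # Domains whose weighted share would exceed their cap get
        # clipped to the cap; the leftover is redistributed among
        # the rest (caps apply even when idle cycles exist).
        capped = {n: c for n, (w, c) in remaining.items()
                  if c != 0 and budget * w / total_w > c}
        if not capped:
            for n, (w, _) in remaining.items():
                shares[n] = budget * w / total_w
            break
        for n, c in capped.items():
            shares[n] = c
            budget -= c
            del remaining[n]
    return shares
```

For example, weights 512 and 256 on one fully contended physical CPU yield roughly a 2:1 split (about 66.7% vs 33.3%); adding a cap of 50 to the heavier domain clips it to half a CPU, leaving the rest for the other domain.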

Page 5: Xen Virtual Machine Monitor Performance Isolation

…Credit Scheduler

– SMP load balancing: the credit scheduler automatically load-balances guest VCPUs across all available physical CPUs on an SMP host. The administrator does not need to manually pin VCPUs to load balance the system. However, she can restrict which CPUs a particular VCPU may run on using the generic vcpu-pin interface.

Page 6: Xen Virtual Machine Monitor Performance Isolation

Usage

The xm sched-credit command may be used to tune the scheduler parameters of each guest VM.

xm sched-credit -d <domain> lists weight and cap

xm sched-credit -d <domain> -w <weight> sets the weight

xm sched-credit -d <domain> -c <cap> sets the cap

Page 7: Xen Virtual Machine Monitor Performance Isolation

Credit Scheduler Algorithm

Each CPU manages a local run queue of runnable VCPUs.

– The queue is sorted by VCPU priority.
– A VCPU's priority can be one of two values: over or under.
  – Over: the VCPU has exceeded its fair share of CPU resource in the ongoing accounting period.
  – Under: it has not.
– When inserting a VCPU onto a run queue, it is put after all other VCPUs of equal priority.

As a VCPU runs, it consumes credits.
– Every so often (10 ms), 100 credits are deducted.
– Unless its credits go negative, a VCPU gets three rounds to run (30 ms).
– Negative credits imply a priority of over. Until a VCPU consumes its allotted credits, its priority is under.
– Credits are refreshed periodically (every 30 ms).

Active vs. inactive VMs:
– Credit deduction/refreshing happens only for active VMs (those that keep using their credits).
– VMs not using their credits are marked “inactive”.
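The credit accounting just described can be modeled in miniature. The class names and the 300-credit refresh figure (100 credits per 10 ms tick × a 30 ms period, split by weight) are illustrative assumptions, not Xen source:

```python
class VCPU:
    """Toy VCPU carrying a weight and a credit balance."""
    def __init__(self, name, weight):
        self.name, self.weight, self.credits = name, weight, 0

    @property
    def priority(self):
        # Non-negative credits -> UNDER; negative -> OVER.
        return "UNDER" if self.credits >= 0 else "OVER"

def refresh(vcpus, credits_per_period=300):
    """Periodic (30 ms) refresh: credits granted in proportion to weight."""
    total_w = sum(v.weight for v in vcpus)
    for v in vcpus:
        v.credits += credits_per_period * v.weight / total_w

def tick(running):
    """One 10 ms accounting tick: debit the running VCPU 100 credits."""
    running.credits -= 100
```

With weights 512 and 256, a refresh grants credits 2:1 (200 vs 100); after three 10 ms ticks the heavier VCPU has burned 300 credits, gone negative, and flipped to over, while the idle one stays under.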

Page 8: Xen Virtual Machine Monitor Performance Isolation

…Credit Scheduler Algorithm

On each CPU, at every scheduling decision (when a VCPU blocks, yields, completes its time slice, or is awakened), the next VCPU to run is picked off the head of the run queue (UNDER).

When a CPU doesn't find a VCPU of priority under on its local run queue, it will look on other CPUs for one. This load balancing guarantees each VM receives its fair share of CPU resources system-wide.

Before a CPU goes idle, it will look on other CPUs to find any runnable VCPU. This guarantees that no CPU idles when there is runnable work in the system.

Page 9: Xen Virtual Machine Monitor Performance Isolation

…Credit Scheduler

[Diagram: run queues on CPU1 holding VCPUs V1–V5, each marked UNDER (U) or OVER (O). At the end of V5's time slice, or when its domain blocks on I/O, credits are recalculated for V5: with positive credits it is reinserted as UNDER, ahead of the OVER VCPUs; with negative credits it is reinserted as OVER, at the tail of the queue.]

Page 10: Xen Virtual Machine Monitor Performance Isolation

…Credit Scheduler

SMP load balancing – find the next runnable VCPU in the following order:
– an UNDER domain in the local run queue
– an UNDER domain in the run queue of another CPU
– an OVER domain in the local run queue
– an OVER domain in the run queue of another CPU
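The four-step search order above can be expressed directly. This is a sketch; real Xen run queues are priority-sorted per-CPU structures rather than plain lists:

```python
def pick_next(local_queue, remote_queues):
    """Pick the next VCPU per the credit scheduler's search order.

    Queues are lists of (vcpu, priority) pairs, priority being
    "UNDER" or "OVER". UNDER is preferred everywhere before any
    OVER VCPU is considered, and local before remote at each level.
    """
    for wanted in ("UNDER", "OVER"):
        # 1st/3rd choice: a matching VCPU on the local run queue.
        for vcpu, prio in local_queue:
            if prio == wanted:
                return vcpu
        # 2nd/4th choice: steal a matching VCPU from another CPU.
        for queue in remote_queues:
            for vcpu, prio in queue:
                if prio == wanted:
                    return vcpu
    return None  # nothing runnable anywhere: the CPU idles
```

Note that an UNDER VCPU on a remote CPU beats an OVER VCPU sitting on the local queue, which is what makes the fair share hold system-wide.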

Page 11: Xen Virtual Machine Monitor Performance Isolation

Performance Studies

– CPU sharing is predictable
– I/O sharing is not
– See:

Page 12: Xen Virtual Machine Monitor Performance Isolation

Study 1: Cherkasova paper

Applications running in Domain 1 (one of):

– Web server: requests for 10 KB files. Measure: requests/sec
– iperf: measures maximum achievable network throughput, in Mbps
– dd: reads 1000 1 KB blocks. Measure: disk throughput in MB/s

Page 13: Xen Virtual Machine Monitor Performance Isolation

Schedulers with three workloads

• Performance varies between schedulers
• Sensitive to Dom0 weight
• Disk reads are least sensitive to both the scheduler and the Dom0 weight

Page 14: Xen Virtual Machine Monitor Performance Isolation

Study 2: Ongaro paper
Experiment 3: Burn x 7, ping x 1

• Processor sharing is fair for all schedulers
• SEDF provides low latency
• Boost, sort, and ticking together work best

Page 15: Xen Virtual Machine Monitor Performance Isolation

Study 3: Sujesha et al. (IITB)
Impact of Resource Allocation on App Performance

Response time vs. resource (CPU) allocated, for different loads

Desired operation: at the “knee” of the curve.

Page 16: Xen Virtual Machine Monitor Performance Isolation

Study 4 (Gundecha/Apte): Performance of multiple domains vs. scheduler parameters

Is resource management fair to every domain, irrespective of
– the type of load
– the scheduling parameters of the domain?

Scheduler used: the default credit scheduler. Two virtual machines, one with a CPU-intensive workload and the other with an I/O-intensive workload (Apache webserver).

Page 17: Xen Virtual Machine Monitor Performance Isolation

Results

Scheduler parameters and webserver statistics:

Experiment 1: one VM with webserver running

VM    weight  cap  load       %CPU usage  requests/sec  time per request  transfer rate (KB/s)
vm1   256     400  -          -           -             -                 NA
vm2   256     400  webserver  180         797.61        2.507             1035.17

Experiment 2: mixed load – one VM with CPU load, one VM with webserver

vm1   256     400  CPU        100         -             -                 NA
vm2   256     400  webserver  180         607.43        3.293             788.35

Table 1: Webserver performance stats with and without CPU load on the other VM

Page 18: Xen Virtual Machine Monitor Performance Isolation

Study 4: Conclusions

Performance isolation is imperfect when high-level performance measures (application throughput) are considered.

Page 19: Xen Virtual Machine Monitor Performance Isolation

Live Migration in Xen VMM

Page 20: Xen Virtual Machine Monitor Performance Isolation

Live Migration: Advantages

Avoids difficulties of process migration:
– “Residual dependencies”* – the original host has to remain available and network-connected to service some calls from the migrated OS

In-memory state can be transferred consistently:
– TCP state
– “this means that we can migrate an on-line game server or streaming media server without requiring clients to reconnect: ....”
– “Allows a separation of concerns between the users and operator of a data center or cluster. Users need not provide the operator with any OS-level access at all (e.g. a root login to quiesce processes or I/O prior to migration).”

Page 21: Xen Virtual Machine Monitor Performance Isolation

*Residual dependencies

“The problem of the residual dependencies that a migrated process retains on the machine from which it migrated. Examples of residual dependencies include open file descriptors, shared memory segments, and other local resources. These are undesirable because the original machine must remain available, and because they usually negatively impact the performance of migrated processes.”

Page 22: Xen Virtual Machine Monitor Performance Isolation

In summary: live migration is

an extremely powerful tool for cluster administrators:
– allowing separation of hardware and software considerations
– consolidating clustered hardware into a single coherent management domain
– if a host needs to be removed from service: move the guest domains, then shut down the machine
– relieving bottlenecks: if a host is overloaded, move guest domains to idle hosts

“virtualization + live migration = improved manageability”

Page 23: Xen Virtual Machine Monitor Performance Isolation

Goals of live migration

– Reduce the time during which services are totally unavailable
– Reduce the total migration time – the time during which synchronization is happening (which might be unreliable)
– Migration should not create unnecessary resource contention (CPU, disk, network, etc.)

Page 24: Xen Virtual Machine Monitor Performance Isolation

Xen live migration highlightsXen live migration highlights

SPECweb benchmark migrated with 210ms SPECweb benchmark migrated with 210ms unavailabilityunavailability

Quake 3 server migrated with 60ms Quake 3 server migrated with 60ms downtimedowntime

Can maintain network connections and Can maintain network connections and application stateapplication state

Seamless migrationSeamless migration

Page 25: Xen Virtual Machine Monitor Performance Isolation

Xen live migration: Key ideas

Pre-copy approach:
– Pages of memory are iteratively copied without stopping the VM
– Page-level protection hardware is used to ensure a consistent snapshot
– A rate-adaptive algorithm controls the impact of migration traffic
– The VM pauses only in the final copy phase

Page 26: Xen Virtual Machine Monitor Performance Isolation

Details: Time definitions

Downtime: the period during which the service is unavailable because there is no currently executing instance of the VM; this period is directly visible to clients of the VM as a service interruption.

Total migration time: the duration between when migration is initiated and when the original VM may finally be discarded – and hence the source host may potentially be taken down for maintenance, upgrade, or repair.

Page 27: Xen Virtual Machine Monitor Performance Isolation

Memory transfer phasesMemory transfer phases

PushPush– VM keeps sending pages from source to dest VM keeps sending pages from source to dest

while running (modified pages have to be while running (modified pages have to be resent)resent)

Stop-and-CopyStop-and-Copy– Halt VM- copy entire image, start VM on destHalt VM- copy entire image, start VM on dest

PullPull– Start VM on dest. Pages that are not found Start VM on dest. Pages that are not found

locally are retrieved from sourcelocally are retrieved from source

Page 28: Xen Virtual Machine Monitor Performance Isolation

Migration approaches

Combinations of phases:
– Pure stop-and-copy: downtime and total migration time are proportional to physical memory size (we want downtime to be smaller)
– Pure demand-migration: move the minimal data required, then start the VM on the destination. The VM will page-fault heavily at first – the total migration time will be long, and performance may be unacceptable
– This paper: “bounded push phase followed by stop-and-copy”

Page 29: Xen Virtual Machine Monitor Performance Isolation

Xen “pre-copy” key ideas

Pre-copying (push) happens in rounds:
– In round n, pages that changed in round n−1 are copied
– Pages that are written to frequently are known to be poor candidates for pre-copy
  – “Writable working set” (WWS) analysis for server workloads
– Short stop-and-copy phase
– Awareness of impact on actual services: control the resources used by migration (e.g. network bandwidth used, CPU used, etc.)
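The round structure above can be modeled as a toy simulation. The page counts are invented, and a fixed number of pages re-dirtied each round stands in for the WWS; this is not the paper's algorithm, only its shape:

```python
def precopy_rounds(total_pages, dirty_per_round, max_rounds=30,
                   stop_threshold=64):
    """Simulate iterative pre-copy.

    Round 1 sends the whole image; each later round resends the
    pages dirtied during the previous round. Pre-copy stops when
    the copy set falls below stop_threshold pages (or the round
    budget runs out); whatever is still dirty then goes in the
    final stop-and-copy.
    Returns (pages_sent_per_round, stop_and_copy_pages).
    """
    to_send = total_pages
    sent_rounds = []
    for _ in range(max_rounds):
        sent_rounds.append(to_send)
        # Pages the guest re-dirtied while this round was copying.
        to_send = min(dirty_per_round, total_pages)
        if to_send <= stop_threshold:
            break
    return sent_rounds, to_send
```

With a small hot set (50 pages) one pre-copy round suffices and the stop-and-copy phase is tiny; with a hot set larger than the threshold (say 500 pages) the rounds never converge and pre-copy must be cut off by the round limit, which is exactly why frequently written pages are poor pre-copy candidates.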

Page 30: Xen Virtual Machine Monitor Performance Isolation

Decoupling local resourcesDecoupling local resources

Network: Assume source, dest VM on same Network: Assume source, dest VM on same LANLAN– Migrating VM will move with TCP/IP state and Migrating VM will move with TCP/IP state and

IP addressIP address– Generate Generate unsolicited ARP reply unsolicited ARP reply – ARP reply – ARP reply

includes IP address mapping to MAC address. includes IP address mapping to MAC address. All receiving hosts will update their mapping All receiving hosts will update their mapping Few packets in transit may be lost, very minimal Few packets in transit may be lost, very minimal

impact expected impact expected

Page 31: Xen Virtual Machine Monitor Performance Isolation

…Decoupling local resources

Network (continued). Problem: routers may not accept broadcast ARP replies (note: ARP requests are expected to be broadcast – not replies).
– Send ARP replies to the addresses in the host's own ARP cache
– On a switched network, the migrating OS keeps its original Ethernet MAC and depends on the network switch to detect that the MAC has moved to a new port

Page 32: Xen Virtual Machine Monitor Performance Isolation

…Decoupling local resources

Disk:
– Assume network-attached storage
– Source and destination machines are on the same storage network
– The problem is not directly addressed

Page 33: Xen Virtual Machine Monitor Performance Isolation

Migration Steps

[Flowchart of the migration protocol between source host A and destination host B:
– If there are not enough resources on the destination, no migration occurs.
– At the end of the pre-copy phase, there are two consistent copies of the VM.
– An OK message from B to A and an acknowledgment from A to B allow the VM on A to stop.
– Failure management: one consistent VM is active at all times. If the “commit” step does not happen, the migration is aborted and the original VM continues.]

Page 34: Xen Virtual Machine Monitor Performance Isolation

Writable Working SetsWritable Working Sets

Copying VM memory the largest migration Copying VM memory the largest migration overheadoverhead

Stop-and-copy simplest approachStop-and-copy simplest approach– Downtime unacceptableDowntime unacceptable

Pre-copy migration may reduce downtimePre-copy migration may reduce downtime– What about pages that are dirtied while copying?What about pages that are dirtied while copying?– What if rate of dirtying > rate of copying?What if rate of dirtying > rate of copying?

Key observation: most VMs will be such that large Key observation: most VMs will be such that large parts of memory are not modified, small part is parts of memory are not modified, small part is (called WWS)(called WWS)
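A back-of-envelope calculation shows why the WWS observation matters. The 512 MB image, 8 MB hot set, and gigabit link below are illustrative numbers, not figures from the paper:

```python
def downtime_ms(pages_to_copy, page_kb=4, link_mbit_s=1000):
    """Time to transfer the given pages over the link, in ms."""
    bits = pages_to_copy * page_kb * 1024 * 8
    return bits / (link_mbit_s * 1e6) * 1000

# Pure stop-and-copy must ship the whole image during downtime;
# pre-copy only ships what is still dirty (roughly the WWS).
full = downtime_ms(512 * 256)  # 512 MB = 131072 4 KB pages: ~4.3 s
hot = downtime_ms(8 * 256)     # 8 MB WWS: ~67 ms
```

Shipping the whole image during the pause costs seconds of downtime; if pre-copy shrinks the final copy set to the WWS, the pause drops by two orders of magnitude.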

Page 35: Xen Virtual Machine Monitor Performance Isolation

WWS Measurements

Different for different workloads

Page 36: Xen Virtual Machine Monitor Performance Isolation

Estimated Effect on Downtime

Each successive line (top to bottom) shows increasing pre-copy rounds. Pre-copy reduces downtime (confirmed by many experiments).

Page 37: Xen Virtual Machine Monitor Performance Isolation

Details: Managed Migration (migration by Domain 0)

– First round: all pages are copied
– Subsequent rounds: only dirtied pages are copied (a dirty bitmap is maintained). How:
  – Insert shadow page tables, populated by translating sections of the guest OS page tables
  – Page-table entries are initially read-only mappings
  – If the guest tries to modify a page, a fault is created and trapped by Xen
  – If write access is permitted by the original page table, permit it in the shadow table too – and set the bit in the dirty bitmap
– At the start of each pre-copy round, the dirty bitmap is given to the control software, Xen's bitmap is cleared, and the shadow page tables are destroyed and recreated – all write permissions are lost
– When pre-copy is to stop, a message is sent to the guest OS to suspend itself; the dirty bitmap is checked and the remaining pages are synchronized
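The read-only-then-fault bookkeeping can be sketched as follows. This is a deliberate simplification: real shadow page tables live inside the hypervisor and track permissions per page-table entry, not in Python lists:

```python
class DirtyTracker:
    """Toy model of dirty-page tracking via write protection."""

    def __init__(self, n_pages):
        # Every round starts with all pages write-protected.
        self.writable = [False] * n_pages
        self.dirty = [False] * n_pages

    def write(self, page):
        """A guest store to `page`."""
        if not self.writable[page]:
            # First write this round: the "fault" is trapped, write
            # access is restored, and the page is marked dirty.
            self.writable[page] = True
            self.dirty[page] = True
        # Later writes to the same page proceed without faulting.

    def start_round(self):
        """Hand the dirty set to the control software and reset.

        Returns the pages dirtied last round; clears the bitmap and
        revokes all write permissions for the new round.
        """
        dirtied = [p for p, d in enumerate(self.dirty) if d]
        self.dirty = [False] * len(self.dirty)
        self.writable = [False] * len(self.writable)
        return dirtied
```

Note the cost model this implies: only the first write to a page per round pays the fault, so hot pages are trapped once and then written freely until the next round resets the protections.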

Page 38: Xen Virtual Machine Monitor Performance Isolation

Self-Migration

Major difficulty: the OS itself has to run in order to transfer itself – what is the correct state to transfer? Solution:
– Suspend all activities except migration-related ones
– Scan for dirty pages
– Copy the dirty pages to a “shadow buffer”
– Transfer the shadow buffer
– Page dirtying during this time is ignored

Page 39: Xen Virtual Machine Monitor Performance Isolation

Dynamic Rate-Limiting AlgorithmDynamic Rate-Limiting Algorithm

Select minimum and maximum bandwidth limits Select minimum and maximum bandwidth limits First pre-copy round transfers pages minimum b/wFirst pre-copy round transfers pages minimum b/w Dirty page rate calculates (number/duration of Dirty page rate calculates (number/duration of

round)round) B/w limit of next round = dirty rate + 50 Mbit/sec B/w limit of next round = dirty rate + 50 Mbit/sec

(higher b/w if dirty rate high)(higher b/w if dirty rate high) Terminate when calculated rate > max or less than Terminate when calculated rate > max or less than

256KB remaining256KB remaining Stop-and-copy done at max bandwitdhStop-and-copy done at max bandwitdh
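The per-round bandwidth rule reduces to a few lines. The 50 Mbit/s headroom is from the slide; the 100/500 Mbit/s min/max limits are assumed values for illustration, and the separate “less than 256 KB remaining” termination check is left to the caller:

```python
def next_bandwidth(dirty_rate_mbit, min_bw=100, max_bw=500,
                   headroom=50):
    """Compute the next pre-copy round's bandwidth limit (Mbit/s).

    Returns (bw_limit, stop_precopy). The limit tracks the observed
    dirty rate plus a fixed headroom, floored at min_bw; once the
    computed limit would exceed max_bw, pre-copy terminates and the
    final stop-and-copy runs at max_bw.
    """
    bw = max(min_bw, dirty_rate_mbit + headroom)
    if bw > max_bw:
        return max_bw, True   # go to stop-and-copy at max bandwidth
    return bw, False
```

The design intent is visible in the formula: a mostly idle guest is migrated gently near the minimum limit, while a guest dirtying memory faster than the link can drain is cut off quickly rather than letting pre-copy run forever.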

Page 40: Xen Virtual Machine Monitor Performance Isolation

Live Migration: Some Results

Page 41: Xen Virtual Machine Monitor Performance Isolation

…Results

– Downtime: 210 ms
– Total migration time: 71 secs
– No perceptible impact on performance during uptime