Research on Embedded Hypervisor Scheduler Techniques
Midterm Report
2014/06/25
Background
Asymmetric multi-core systems are becoming increasingly popular compared with homogeneous multi-core systems.
◦An asymmetric multi-core platform consists of cores with different capabilities, for example, the ARM big.LITTLE architecture.
ARM big.LITTLE Core
Developed by ARM in Oct. 2011.
Combines two kinds of architecturally compatible cores.
Creates a multi-core processor that can adjust better to dynamic computing needs and use less power than clock scaling alone.
big cores are more powerful but power-hungry, while LITTLE cores are low-power but (relatively) slower.
Three Types of Models
Cluster migration
CPU migration (In-Kernel Switcher)
Heterogeneous multi-processing (global task scheduling)
Motivation
Traditional schedulers for homogeneous multi-core platforms focus on load balancing.
◦Each core has the same computing ability, so workloads are distributed evenly in order to obtain maximum performance.
Motivation (Cont.)
New scheduling strategies are needed for asymmetric multi-core platforms.
◦Cores have different power and computing characteristics.
Current Hypervisor Architecture and Problem
[Figure: GUEST1 and GUEST2 each run an OS kernel whose scheduler places Task 1 to Task 4 on two vCPUs. The hypervisor's vCPU scheduler maps the vCPUs onto ARM Cortex-A15 (performance) and ARM Cortex-A7 (power-saving) cores. Tasks are marked as having either a low or a high computing resource requirement.]
If the guest OS scheduler is not big.LITTLE-aware, it will assign tasks to vCPUs evenly in order to achieve load balancing.
The hypervisor vCPU scheduler will assign vCPUs evenly to physical ARM cores since it is not big.LITTLE-aware.
The system cannot take advantage of the big.LITTLE core architecture.
Current Hypervisor Architecture and Problem (Cont.)
[Figure: the same architecture, with the vCPUs of GUEST1 and GUEST2 mapped onto ARM Cortex-A15 (performance) and ARM Cortex-A7 (power-saving) cores by the hypervisor vCPU scheduler. The vCPUs are labelled “Waste energy” and “Performance degradation”.]
Assume that the scheduler in the guest OS is big.LITTLE-aware, so each vCPU is either big or little.
The hypervisor vCPU scheduler will still assign vCPUs evenly to physical ARM cores in order to achieve load balancing.
The system cannot take advantage of the big.LITTLE core architecture, which leads to wasted energy and performance degradation.
Project Goal
Research the current scheduling algorithms.
Design and tune the hypervisor scheduler on an asymmetric multi-core platform.
Assign virtual cores to physical cores for execution.
Minimize power consumption while guaranteeing performance.
Challenge
The hypervisor scheduler cannot take advantage of the big.LITTLE architecture if the scheduler inside the guest OS is not big.LITTLE-aware.
Current Hypervisor Architecture and Problem (Cont.)
[Figure: GUEST1 and GUEST2 each run an Android framework on an OS kernel whose scheduler places Task 1 to Task 4 on two vCPUs. A b-L vCPU scheduler in the hypervisor maps the vCPUs onto ARM Cortex-A15 (performance) and ARM Cortex-A7 (power-saving) cores.]
If the guest OS scheduler is not big.LITTLE-aware, it will assign tasks to vCPUs evenly in order to achieve load balancing.
Even if the hypervisor vCPU scheduler is big.LITTLE-aware, it will schedule these vCPUs either both on big cores or both on LITTLE cores, since they have the same loading.
The system cannot take advantage of the big.LITTLE core architecture.
Possible Solution
Apply VM introspection (VMI) to retrieve the process list in a VM.
◦VMI is a technique that allows the hypervisor to inspect the contents of the VM in real time.
Modify the CPU masks of tasks in the VM in order to create an illusion of a “big vCPU” and a “LITTLE vCPU”.
The hypervisor scheduler can then assign each vCPU to the corresponding big or LITTLE cores.
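The CPU-mask idea can be sketched as follows. This is only an illustration, assuming a VM with two vCPUs, that vCPU 0 is designated the “big vCPU” and vCPU 1 the “LITTLE vCPU”, and that a simple load threshold separates high-demand from low-demand tasks. The `Task` type, the load values, the threshold, and the function names are hypothetical and do not belong to any hypervisor or VMI API.

```python
from dataclasses import dataclass

BIG_VCPU = 0      # assumption: vCPU 0 plays the role of the "big vCPU"
LITTLE_VCPU = 1   # assumption: vCPU 1 plays the role of the "LITTLE vCPU"


@dataclass
class Task:
    name: str
    load: float   # hypothetical CPU demand in [0, 1], as observed through VMI


def cpu_mask_for(task: Task, threshold: float = 0.5) -> list[int]:
    """Return a per-vCPU mask [big, LITTLE]; 1 means the task may run there."""
    if task.load >= threshold:
        return [1, 0]   # high demand: restrict the task to the big vCPU
    return [0, 1]       # low demand: restrict the task to the LITTLE vCPU


def map_tasks(tasks: list[Task], threshold: float = 0.5) -> dict[str, list[int]]:
    """Build the task-to-mask table that a task-to-vCPU mapper would apply."""
    return {t.name: cpu_mask_for(t, threshold) for t in tasks}


# Hypothetical loads for the four tasks of one guest.
tasks = [Task("Task 1", 0.9), Task("Task 2", 0.1),
         Task("Task 3", 0.8), Task("Task 4", 0.2)]
masks = map_tasks(tasks)
```

With masks like these, every task on the big vCPU is demanding and every task on the LITTLE vCPU is light, so the hypervisor can tell the two vCPUs apart by their load.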
Hypervisor Architecture with VMI
[Figure: GUEST1 and GUEST2 each run an Android framework on an OS kernel (labelled Linaro Linux Kernel for one guest) with two vCPUs. The hypervisor contains a VM Introspector, a Task-to-vCPU Mapper, and a b-L vCPU scheduler on top of ARM Cortex-A15 (performance) and ARM Cortex-A7 (power-saving) cores. Task 1 to Task 4 have either a low or a high computing resource requirement and carry CPU masks [1|0] or [0|1].]
The VM Introspector gathers task information from the guest OS.
The Task-to-vCPU Mapper modifies the CPU mask of each task according to the task information from VMI.
A vCPU is treated as a LITTLE core when tasks with low computing requirements are scheduled on it.
The hypervisor vCPU scheduler will schedule the big vCPU to the A15 and the LITTLE vCPU to the A7.
Hypervisor Architecture with VMI (Cont.)
[Figure: the same architecture with the VM Introspector, the Task-to-vCPU Mapper, and the b-L vCPU scheduler. Tasks carry CPU masks [1|0], [0|1], or [1|1].]
Guest OS 1 has two tasks with high computing requirements and two tasks with low computing requirements.
Guest OS 2 has two tasks with low computing requirements.
The hypervisor vCPU scheduler will schedule the big vCPU to the A15 and the LITTLE vCPU to the A7.
Hypervisor Scheduler
Schedules the virtual cores to physical cores for execution.
◦Decides the execution order and the amount of time assigned to each virtual core according to some scheduling policies.
◦Xen: credit-based scheduler
◦KVM: completely fair scheduler
Credit-Based Scheduler
Each domain (OS) is assigned a weight and a cap.
◦The weight decides the amount of time a domain will get in a time interval.
◦The cap optionally fixes the maximum amount of CPU a domain will be able to consume.
Credit-Based Scheduler (Cont.)
Each CPU manages a local run queue of runnable virtual cores.
The queue is sorted by virtual core priority.
◦A virtual core's priority is either over or under, depending on whether or not it has exceeded its fair share of CPU resource in a time interval.
When a virtual core is inserted, it is put after all other virtual cores of equal priority.
Credit-Based Scheduler (Cont.)
As a virtual core runs, it consumes credits.
The next virtual core to run is picked from the head of the run queue.
A CPU will look on other CPUs for runnable virtual cores before going idle.
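The run-queue behaviour described on these slides can be illustrated with a toy model. It is a deliberately simplified sketch, not Xen's implementation: it keeps only the two priorities named above, charges a fixed, made-up credit cost per time slice, and omits periodic credit refills and work stealing between CPUs. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

UNDER, OVER = 0, 1   # UNDER (credits left) sorts ahead of OVER (fair share used up)


@dataclass
class VCore:
    name: str
    credits: int

    @property
    def priority(self) -> int:
        return UNDER if self.credits > 0 else OVER


class RunQueue:
    def __init__(self) -> None:
        self.queue: list[VCore] = []

    def insert(self, vcore: VCore) -> None:
        """Put vcore after all queued virtual cores of equal priority."""
        pos = len(self.queue)
        for idx, other in enumerate(self.queue):
            if other.priority > vcore.priority:
                pos = idx
                break
        self.queue.insert(pos, vcore)

    def pick_next(self) -> Optional[VCore]:
        """Take the virtual core at the head of the queue, if any."""
        return self.queue.pop(0) if self.queue else None


def run_slice(rq: RunQueue, cost: int = 10) -> Optional[str]:
    vcore = rq.pick_next()
    if vcore is None:
        return None          # a real scheduler would look at other CPUs here
    vcore.credits -= cost    # running consumes credits
    rq.insert(vcore)         # re-queue with the possibly changed priority
    return vcore.name


rq = RunQueue()
for vc in (VCore("a", 10), VCore("b", 20), VCore("c", 0)):
    rq.insert(vc)
order = [run_slice(rq) for _ in range(4)]
```

In this run, virtual core `c` starts with no credits, so it waits behind `a` and `b` until both have used up theirs.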
Credit-Based Scheduler on Asymmetric Multi-core
The credit-based scheduler considers only a “fair share” of time slices.
Assigning the same amount of time slices on big and little cores results in different performance and power consumption.
Virtual Core Scheduling Problem
For every time period, the hypervisor scheduler is given a set of virtual cores.
Given the operating frequency of each virtual core, the scheduler will generate a scheduling plan, such that the power consumption is minimized, and the performance is guaranteed.
Core Models
There are two types of cores: virtual cores and physical cores.
◦$v_j$: frequency of the virtual core
◦$f_i$: frequency of the physical core
◦$t_i$, $t_j$: type of the core
$$vC_j = (v_j, t_j)$$
$$pC_i = (f_i, t_i)$$
Power Model
To decide the power model, we have done some preliminary experiments to measure the power consumption of cores.
◦On an ODROID-XU board
Result – bzip2
[Figure: power consumption (Watt) versus loading (%), measured at 250MHz, 600MHz, 800MHz, and 1600MHz, with a linear fit for each frequency.]
Power Model (Cont.)
The power consumption of a physical core is a function of core type, core frequency, and the load of the core.
◦The load of a core is the percentage of time a core is executing virtual cores.
$$Power_i = p(f_i, t_i, load_i)$$
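One way to realise such a power function is a linear model in the load for each (core type, frequency) pair, in the spirit of the linear fits in the bzip2 measurement. The sketch below assumes that form. Every coefficient in it is a made-up placeholder, not a value measured on the ODROID-XU board, and the function names are hypothetical.

```python
# (core type, frequency in MHz) -> (idle power in W, extra power in W at 100% load)
# All coefficients are hypothetical placeholders, not measured values.
POWER_COEFFS = {
    ("big", 1600): (0.50, 1.50),
    ("little", 600): (0.10, 0.20),
}


def power(freq_mhz: int, core_type: str, load: float) -> float:
    """Power of one physical core, modelled as linear in load (an assumption)."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be a fraction between 0 and 1")
    idle, slope = POWER_COEFFS[(core_type, freq_mhz)]
    return idle + slope * load


def total_power(cores: list[tuple[int, str, float]]) -> float:
    """Sum the modelled power over (frequency, type, load) triples."""
    return sum(power(f, t, load) for f, t, load in cores)


half_loaded_big = power(1600, "big", 0.5)
platform = total_power([(1600, "big", 1.0), (600, "little", 0.0)])
```

Replacing the placeholder table with fitted coefficients for each measured frequency would give a usable lookup for the scheduler.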
Performance
The ratio of the computing resource assigned to the computing resource requested.
◦Ex: a virtual core running at 800MHz runs on a physical core of 1200MHz for 60% of a time interval.
◦The performance of this virtual core is 0.6*1200/800 = 0.9.
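The performance ratio can be written as a small helper that also covers a virtual core running on several physical cores within one interval. The function name and argument layout are illustrative choices, not part of the original design.

```python
def performance(v_mhz: float, shares: list[tuple[float, float]]) -> float:
    """Ratio of computing resource assigned to computing resource requested.

    shares holds (fraction of the interval, physical core frequency in MHz)
    pairs, one pair per physical core the virtual core ran on.
    """
    assigned = sum(fraction * f_mhz for fraction, f_mhz in shares)
    return assigned / v_mhz


# The example from the slide: 60% of the interval on a 1200MHz physical core.
example = performance(800, [(0.6, 1200)])

# A hypothetical split across two cores: 25% on 1600MHz plus 50% on 600MHz.
split = performance(800, [(0.25, 1600), (0.5, 600)])
```

A value below 1 means the virtual core received less computing resource than it requested.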
Objective Function
Generate a scheduling plan such that the power consumption is minimized and the performance is guaranteed.
◦Assume there are n physical cores.
$$\min\left(\sum_{i=1}^{n} Power_i\right)$$
Scheduling Plan
A set of $a_{i,j}$, each of which indicates the amount of time spent executing virtual core j on physical core i in a time interval.
A feasible scheduling plan must satisfy some constraints.
Constraints
Each virtual core should be assigned sufficient computing resources in order to meet the performance guarantee.
$$\text{computing resource} = \sum_{i=1}^{n} a_{i,j} f_i$$
$$\frac{\sum_{i=1}^{n} a_{i,j} f_i}{v_j} \ge \text{performance guarantee}$$
$$\sum_{i=1}^{n} a_{i,j} \le 1$$
Constraints (Cont.)
A physical core has a fixed amount of computing resources in a time interval.
◦According to its frequency
$$0 \le a_{i,j} \le 1$$
$$load_i = \sum_{j=1}^{m} a_{i,j} \le 1$$
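The constraints on a scheduling plan can be checked mechanically. The sketch below assumes that each entry of the plan is a fraction of the time interval, that no physical core and no virtual core may be busy for more than the whole interval, and that the performance ratio of each virtual core must reach a given guarantee. The function name, the tolerance `eps`, and the sample frequencies are illustrative choices.

```python
def is_feasible(a, f, v, guarantee=1.0, eps=1e-9):
    """Check a plan a[i][j]: fraction of the interval vcore j runs on pcore i.

    f[i] is the frequency of physical core i, v[j] the frequency requested
    by virtual core j, and guarantee the required performance ratio.
    """
    n, m = len(f), len(v)
    # Every entry is a fraction of the time interval.
    if any(not (0 <= a[i][j] <= 1) for i in range(n) for j in range(m)):
        return False
    # A physical core cannot be busy for more than the whole interval.
    if any(sum(a[i]) > 1 + eps for i in range(n)):
        return False
    for j in range(m):
        # A virtual core cannot run for more than the whole interval in total.
        if sum(a[i][j] for i in range(n)) > 1 + eps:
            return False
        # The computing resource received must meet the performance guarantee.
        resource = sum(a[i][j] * f[i] for i in range(n))
        if resource / v[j] < guarantee - eps:
            return False
    return True


f = [1600, 600]   # one big and one little physical core (MHz)
v = [1200, 600]   # requested virtual core frequencies (MHz)
good = is_feasible([[0.75, 0.0], [0.0, 1.0]], f, v)
bad = is_feasible([[0.5, 0.5], [0.0, 0.0]], f, v)
```

The second plan fails because half an interval on the 1600MHz core provides only 800MHz worth of computing resource to a virtual core that requested 1200MHz.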
Current Solution
Given the objective function and the constraints, we can use integer programming to find a feasible scheduling plan.
◦Divide each time interval into 100 time slices.
◦Each $a_{i,j}$ in the scheduling plan can be transformed into the number of time slices that virtual core j runs on physical core i.
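To make the optimisation concrete, the sketch below replaces the integer-programming solver with a plain exhaustive search over a tiny instance with one big and one little core. It searches in steps of 10 time slices to keep the search space small, and it uses a linear power model with made-up coefficients. The step size, the coefficients, and all names are assumptions for illustration only.

```python
from itertools import product

SLICES = 100   # time slices per interval, as on the slide
STEP = 10      # assumption: search in steps of 10 slices to stay small

F = [1600, 600]                          # physical core frequencies (MHz)
COEFFS = [(0.50, 1.50), (0.10, 0.20)]    # hypothetical (idle W, extra W at full load)


def plan_power(plan):
    """Total power of a plan; plan[i][j] = slices of vcore j on pcore i."""
    total = 0.0
    for i, (idle, slope) in enumerate(COEFFS):
        load = sum(plan[i]) / SLICES
        total += idle + slope * load
    return total


def feasible(plan, v):
    n, m = len(F), len(v)
    if any(sum(plan[i]) > SLICES for i in range(n)):
        return False    # physical core over-committed
    for j in range(m):
        if sum(plan[i][j] for i in range(n)) > SLICES:
            return False    # vcore would have to run on two cores at once
        if sum(plan[i][j] * F[i] for i in range(n)) < v[j] * SLICES:
            return False    # performance ratio below 1
    return True


def best_plan(v):
    """Exhaustively search for the feasible plan with the lowest power."""
    n, m = len(F), len(v)
    choices = range(0, SLICES + 1, STEP)
    best = None
    for flat in product(choices, repeat=n * m):
        plan = [list(flat[i * m:(i + 1) * m]) for i in range(n)]
        if feasible(plan, v) and (best is None or plan_power(plan) < plan_power(best)):
            best = plan
    return best


# Two light virtual cores requesting 250MHz each.
best = best_plan([250, 250])
```

Under these placeholder coefficients the search places both light virtual cores on the little core and leaves the big core idle. A real solver would handle the full 100-slice granularity and more cores.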
Assign Virtual Cores to Physical Cores
The scheduling plan from integer programming alone is not enough.
We need a way to assign the virtual cores to physical cores according to the scheduling plan.
◦A virtual core cannot appear on two or more physical cores at the same time.
Example – 3 vCPUs, 2 Physical Cores
Time slices assigned to each vCPU on physical cores (0, 1) over one interval, from t=0 to t=100:
vCPU0: (60, 20)
vCPU1: (0, 50)
vCPU2: (20, 30)
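The requirement that a vCPU never runs on two physical cores at once can be checked against a concrete timeline. The sketch below reads each pair in the example as (slices on physical core 0, slices on physical core 1). The timeline itself is one hand-made arrangement, not the one drawn on the slide, and the `validate` helper is hypothetical.

```python
SLICES = 100

# Plan from the example: vCPU -> (slices on physical core 0, slices on core 1).
plan = {"vCPU0": (60, 20), "vCPU1": (0, 50), "vCPU2": (20, 30)}

# One possible timeline (an assumption, not taken from the slide):
# each physical core lists (vCPU, start, end) segments, end exclusive.
timeline = [
    [("vCPU0", 0, 60), ("vCPU2", 60, 80)],
    [("vCPU2", 0, 30), ("vCPU1", 30, 80), ("vCPU0", 80, 100)],
]


def validate(plan, timeline):
    """True if the timeline realises the plan with no vCPU on two cores at once."""
    busy = {name: [False] * SLICES for name in plan}
    given = {name: [0] * len(timeline) for name in plan}
    for core, segments in enumerate(timeline):
        used = [False] * SLICES
        for name, start, end in segments:
            for t in range(start, end):
                if used[t] or busy[name][t]:
                    return False   # core double-booked, or vCPU already running elsewhere
                used[t] = True
                busy[name][t] = True
            given[name][core] += end - start
    return all(tuple(given[name]) == tuple(plan[name]) for name in plan)


ok = validate(plan, timeline)
```

Moving vCPU0's 20 slices on core 1 to the start of the interval would make it overlap with its run on core 0, and `validate` would reject that timeline.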
Assign Virtual Cores to Physical Cores (Cont.)
Given a feasible scheduling plan, we can schedule the virtual cores to physical cores without violating the constraints.
◦Consider n physical cores with m virtual cores, n < m.
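Example
Time slices assigned to each vCPU on each of the four physical cores over one interval, from t=0 to t=100:
vCPU0: (50, 40, 0, 0)
vCPU1: (20, 20, 20, 20)
vCPU2: (10, 10, 20, 20)
vCPU3: (10, 10, 20, 20)
vCPU4: (10, 10, 10, 10)
vCPU5: (0, 0, 10, 10)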
Flow of Each Interval
Tasks running in guest OSes produce loading and/or QoS information.
Guest OSes schedule tasks and adjust the virtual core frequencies.
The hypervisor scheduler generates a scheduling plan from the virtual core frequencies.
Virtual cores are executed on physical cores according to the scheduling plan, which triggers the DVFS mechanism on physical cores.
The execution affects the performance of the tasks running in the guest OSes.
Simulation
Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Simulation Environment
Two types of physical cores
◦Power-hungry “big” cores, frequency: 1600MHz
◦Power-efficient “little” cores, frequency: 600MHz
◦The DVFS mechanism is disabled.
Scenario I – 2 Big and 2 Little
Each VM has two virtual cores.
Two sets of input:
◦Case 1: Both VMs with light workloads. 250MHz for each virtual core.
◦Case 2: One VM with heavy workloads, the other with modest workloads. Heavy: 1200MHz for each virtual core. Modest: 600MHz for each virtual core.
Scenario I - Results
◦Case 1: the power consumption of the asymmetry-aware method is about 43.2% of that of the credit-based method.
◦Case 2: the asymmetry-aware method uses 95.6% of the energy used by the credit-based method.

Case     Method            Power (Watt)
Case 1   Asymmetry-aware   0.295
Case 1   Credit-based      0.683
Case 2   Asymmetry-aware   2.382
Case 2   Credit-based      2.491
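The percentages quoted for Scenario I follow directly from the power figures in the table. A short check, with dictionary keys chosen only for illustration:

```python
# Power in watts, taken from the Scenario I results.
results = {
    "Case 1": {"asymmetry_aware": 0.295, "credit_based": 0.683},
    "Case 2": {"asymmetry_aware": 2.382, "credit_based": 2.491},
}


def relative_power(case: dict) -> float:
    """Power of the asymmetry-aware method as a percentage of credit-based."""
    return 100.0 * case["asymmetry_aware"] / case["credit_based"]


ratios = {name: round(relative_power(c), 1) for name, c in results.items()}
# ratios == {"Case 1": 43.2, "Case 2": 95.6}
```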
Scenario 2 – 4 Big and 4 Little
The hardware specification of the ARM 64-bit board
Each case has three quad-core VMs:

         VM1                      VM2                       VM3
Case 1   All 250 MHz              All 250 MHz               All 250 MHz
Case 2   All 600 MHz              All 600 MHz               All 250 MHz
Case 3   All 1600 MHz             All 1600 MHz              All 1600 MHz
Case 4   800, 800, 400, 400 MHz   1000, 800, 600, 400 MHz   600, 600, 250, 250 MHz
Scenario 2 - Results
In case 3, the loading of the physical cores is 100% using both methods. Power cannot be saved if the computing resources are not enough.

Case      Method            Power (Watt)   Savings
Case 1    Asymmetry-aware   1.205          41.2%
Case 1    Credit-based      2.049
Case 2    Asymmetry-aware   3.524          11.1%
Case 2    Credit-based      3.960
Case 3*   Asymmetry-aware   6.009          0%
Case 3*   Credit-based      6.009
Case 4    Asymmetry-aware   4.435          6%
Case 4    Credit-based      4.711
Summary
We develop an energy-efficient asymmetry-aware scheduler for asymmetric multi-core platforms.
The goal is to generate an energy-efficient scheduling plan with a performance guarantee.
Our simulation results show that the asymmetry-aware strategy saves up to 56.8% energy compared with the credit-based method, while still providing a performance guarantee.
Q&A