open-kernel-labs
ARM TechCon session "Virtualization as the Nexus of Multicore Power Management", Thursday, November 11, 2010. Adoption of multicore technology for desktop, data center and embedded designs responds to comparable needs: to scale compute capacity without stepping up system clocks and to attain more MIPS per watt for devices and applications. Multicore for the desktop and data center enjoys mature support from deployed OSes. Even as embedded OSes become more adept at running on multicore CPUs, applications and middleware still face challenges of thread safety, concurrency and load balancing. Mobile virtualization is a means to get maximum value from multicore ARM designs, at both the architectural and application levels. The session examines multicore use cases for virtualization, and how virtualization brings superior CPU utilization, greater security, smoother legacy migration, and smarter energy management to multicore designs.
November 9-11, 2010
The Santa Clara Convention Center
www.armtechcon.com
Energy Management for Mobile Devices
Power to the Microvisor!
• Energy-management basics
• Virtualization
• Enter multicore
• Summary
Overview
Device uses energy
• Drains battery
Goal of energy management:
• Maximize battery life
Energy in Mobile Devices
Dynamic voltage and frequency scaling
CMOS power consumption:
• $P = P_\mathrm{dyn} + P_\mathrm{stat}$
• $P_\mathrm{dyn} \propto f V^2$
• $V_\mathrm{min} \propto f$ (very approximately)
Assuming execution time $T \propto 1/f$
• $E_\mathrm{dyn} = P_\mathrm{dyn} T \propto f V^2 / f = V^2 \propto f^2$
• lower frequency ⇒ lower dynamic energy
Energy-Management Mechanisms: DVFS
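The scaling argument on this slide can be checked numerically. The following is a minimal sketch, assuming the idealized relations from the slide hold exactly (dynamic power proportional to f V², voltage proportional to f, execution time equal to cycles divided by f). The function name and the constants `k` and `c` are hypothetical and carry no measured meaning.

```python
def dynamic_energy(freq_hz, cycles, k=1e-9, c=1e-9):
    """Dynamic energy under the slide's idealized model.

    Assumptions (not measured values): P_dyn = k * f * V**2, V = c * f,
    and execution time T = cycles / f. k and c are made-up constants.
    """
    voltage = c * freq_hz                # V_min roughly proportional to f
    power = k * freq_hz * voltage ** 2   # P_dyn proportional to f * V^2
    time = cycles / freq_hz              # T proportional to 1/f
    return power * time                  # E_dyn = P_dyn * T, proportional to f^2


full = dynamic_energy(1.0e9, cycles=1e9)
half = dynamic_energy(0.5e9, cycles=1e9)
print(half / full)  # about 0.25: half the frequency, a quarter of the dynamic energy
```

The same work at half the frequency costs roughly a quarter of the dynamic energy in this model, which is the basis of the "lowest frequency" approach discussed next.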
When CPU is idle, turn clock off
• $P_\mathrm{dyn} = 0 \Rightarrow P = P_\mathrm{stat}$
Sleep states reduce power further:
• $P_\mathrm{sleep} < P_\mathrm{stat}$
Typically have multiple sleep states
• shallow sleep states save some energy but are fast to enter/exit
• deep sleep states save more energy but lose state and are expensive to enter/exit
Complex tradeoff
Mechanisms: Sleep States
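The tradeoff between shallow and deep sleep states can be illustrated by comparing the energy each state would spend over a predicted idle period. This is a sketch only: the `SleepState` fields, the three example states and all numbers are hypothetical, and it assumes the idle duration is known in advance.

```python
from dataclasses import dataclass


@dataclass
class SleepState:
    name: str
    power_w: float        # power drawn while in the state
    transition_j: float   # energy cost of entering plus exiting
    transition_s: float   # time needed to enter plus exit


def idle_energy(state, idle_s):
    """Energy spent over an idle period, or None if the state does not fit."""
    if idle_s < state.transition_s:
        return None  # not enough time to enter and leave the state
    return state.transition_j + state.power_w * (idle_s - state.transition_s)


def best_state(states, idle_s):
    """Pick the state with the lowest energy for the predicted idle period."""
    candidates = [(idle_energy(s, idle_s), s) for s in states]
    feasible = [(e, s) for e, s in candidates if e is not None]
    return min(feasible, key=lambda pair: pair[0])[1]


# Hypothetical numbers, for illustration only.
STATES = [
    SleepState("clock-gated", power_w=0.100, transition_j=0.0, transition_s=0.0),
    SleepState("shallow", power_w=0.020, transition_j=0.001, transition_s=0.001),
    SleepState("deep", power_w=0.001, transition_j=0.050, transition_s=0.100),
]

for idle in (0.005, 0.05, 10.0):
    print(idle, best_state(STATES, idle).name)
```

With these made-up numbers, very short idle periods favour plain clock gating, medium ones the shallow state, and long ones the deep state. In practice the idle duration has to be predicted, which is part of what makes the tradeoff complex.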
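$E_\mathrm{dyn} \propto f^2$ ⇒ lowest frequency is best
Ignores static energy!
• $E = E_\mathrm{dyn} + E_\mathrm{stat}$
• $E_\mathrm{dyn} \propto f^2$
• $E_\mathrm{stat} = P_\mathrm{stat} T \propto 1/f$
Low $f$ increases execution time ⇒ $E_\mathrm{stat}$ increases at low $f$!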
Popular Approach: Lowest Frequency
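Once static energy is included, the model on this slide has an interior optimum rather than favouring the lowest frequency. A short derivation, assuming $E_\mathrm{dyn} = a f^2$ and $E_\mathrm{stat} = b/f$ with positive constants $a$ and $b$ (both introduced here purely for illustration):

```latex
\begin{align*}
E(f) &= a f^{2} + \frac{b}{f} \\
\frac{dE}{df} &= 2 a f - \frac{b}{f^{2}} = 0
\quad\Longrightarrow\quad
f^{*} = \left(\frac{b}{2a}\right)^{1/3} \\
\frac{d^{2}E}{df^{2}} &= 2a + \frac{2b}{f^{3}} > 0
\end{align*}
```

The second derivative is positive, so $f^{*}$ is a minimum: below it static energy dominates, above it dynamic energy dominates. A platform with high static power (large $b$) pushes the optimum towards higher frequencies.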
Run at maximum $f$, then go to sleep
• Tries to minimize static power, but:
– dynamic power isn't irrelevant (yet)
– $T \propto 1/f$ isn't correct either: it ignores memory!
• Effect of memory stalls
– $T = T_\mathrm{CPU} + T_\mathrm{mem}$
– $T_\mathrm{CPU} \propto 1/f$
– $T_\mathrm{mem} = \mathrm{const}$
– $E_\mathrm{stat} \propto T$, i.e. a $1/f$ term plus a constant
Ignores sleep energy!
Other Approach: “Race to Halt”
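The effect of memory stalls on execution time can be sketched directly from T = T_CPU + T_mem. The function names and the two example workloads below are hypothetical; the only modelling assumption is the one on the slide, namely that the memory part does not scale with CPU frequency.

```python
def execution_time(freq_hz, cpu_cycles, mem_time_s):
    """T = T_CPU + T_mem, with T_CPU = cycles / f and T_mem independent of f."""
    return cpu_cycles / freq_hz + mem_time_s


def speedup(f_low, f_high, cpu_cycles, mem_time_s):
    """How much faster the workload finishes when moving from f_low to f_high."""
    return (execution_time(f_low, cpu_cycles, mem_time_s)
            / execution_time(f_high, cpu_cycles, mem_time_s))


# Hypothetical workloads: both take 1 s at 500 MHz.
cpu_bound = dict(cpu_cycles=5.0e8, mem_time_s=0.0)
mem_bound = dict(cpu_cycles=1.0e8, mem_time_s=0.8)

print(speedup(0.5e9, 1.0e9, **cpu_bound))  # doubling f halves the run time
print(speedup(0.5e9, 1.0e9, **mem_bound))  # doubling f helps far less
```

For the memory-bound example, doubling the clock shortens the run by only about 10 percent, so racing to halt buys little extra sleep time while still paying for the higher frequency.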
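Run at maximum $f$, then go to sleep
Earlier completion ⇒ longer sleep
• $E = E_\mathrm{dyn} + E_\mathrm{stat} + E_\mathrm{sleep}$
• $E_\mathrm{sleep} = P_\mathrm{sleep} T_\mathrm{sleep}$
• $T_\mathrm{sleep} = T_0 - T$
• $E_\mathrm{sleep} = P_\mathrm{sleep} (T_0 - T)$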
Still ignores dynamic energy!
Other Approach: “Race to Halt” (2)
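Putting the three terms together shows why neither "lowest frequency" nor "race to halt" wins in general. The sketch below assumes that T0 is a fixed period within which the task must finish, that dynamic power is `k_dyn * f**3` while the CPU computes (from P_dyn ∝ f V² with V ∝ f), and that no dynamic energy is spent during memory stalls. All parameter names and numbers are hypothetical.

```python
def total_energy(freq_hz, cpu_cycles, mem_time_s, period_s,
                 k_dyn, p_stat_w, p_sleep_w):
    """E = E_dyn + E_stat + E_sleep over a fixed period T0 = period_s."""
    t_cpu = cpu_cycles / freq_hz
    t_run = t_cpu + mem_time_s                # T = T_CPU + T_mem
    if t_run > period_s:
        raise ValueError("task does not finish within the period")
    e_dyn = k_dyn * freq_hz ** 3 * t_cpu      # assumed: P_dyn = k_dyn * f^3
    e_stat = p_stat_w * t_run                 # E_stat = P_stat * T
    e_sleep = p_sleep_w * (period_s - t_run)  # E_sleep = P_sleep * (T0 - T)
    return e_dyn + e_stat + e_sleep


def best_frequency(freqs_hz, **model):
    """Return the candidate frequency with the lowest total energy."""
    return min(freqs_hz, key=lambda f: total_energy(f, **model))


# Hypothetical platform and workload.
FREQS = [0.25e9, 0.5e9, 1.0e9]
WORKLOAD = dict(cpu_cycles=2.5e8, mem_time_s=0.0, period_s=2.0,
                k_dyn=1e-27, p_stat_w=1.0)

print(best_frequency(FREQS, p_sleep_w=0.01, **WORKLOAD))  # low-power sleep state
print(best_frequency(FREQS, p_sleep_w=0.80, **WORKLOAD))  # high-power sleep state
```

With these made-up numbers, a low-power sleep state favours the highest frequency (race to halt), while a high-power sleep state shifts the optimum to the middle setpoint. The depth of the available sleep state changes the answer.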
Real Data: Total Energy (Measured)
[Figure: measured total energy, with curves labeled CPU-bound, Memory-bound, and Naïve model]
Real Data: Including Sleep Energy
[Figure: curves labeled High-power sleep state and Low-power sleep state]
Energy management is complex!
Optimal setting depends on:
• Workload
– memory-bound vs CPU-bound vs in-between
• Hardware platform
– static vs dynamic energy
– CPU vs memory power
– depth of sleep states and cost of entering
Simple models don’t work!
Summary: Energy-Management Basics
How to establish memory-boundedness?
Easy way out: pre-characterization
• measure behavior off-line
• determine optimal power setting by model or trial-and-error
Ok-ish for pre-defined workloads
Unsuitable for open systems
• ... such as phones
Tricky with apps which change behavior
Characterizing Workloads
Need to observe app and adjust setting
• works for any app
• adjusts to changing behavior
Solution by [Snowdon et al., EuroSys'09]
Performance counters are your friends!
• e.g. cache misses indicate memory access
Can systematically select best counters
• build model of platform
• linear combination of performance-counter readings
• pre-characterize hardware
• pick counters which provide most accurate model
• using sound statistical methods
Better Way: On-Line Characterization
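The idea of choosing counters by model accuracy can be sketched with a deliberately reduced example: fit a one-counter linear model per candidate counter against calibration data and keep the counter with the best fit. This is not the method of the cited paper, which combines several counters; the counter names, the calibration values and the use of R² as the selection criterion are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def best_counter(counters, target):
    """Return the counter whose linear model explains the target best."""
    scores = {}
    for name, readings in counters.items():
        intercept, slope = fit_line(readings, target)
        scores[name] = r_squared(readings, target, intercept, slope)
    return max(scores, key=scores.get)


# Hypothetical calibration runs: fraction of run time that does not scale with f.
MEM_FRACTION = [0.05, 0.20, 0.40, 0.60, 0.80]
COUNTERS = {
    "l2_misses_per_kcycle": [1.0, 4.0, 8.0, 12.0, 16.0],
    "branches_per_kcycle": [120.0, 90.0, 130.0, 100.0, 110.0],
}

print(best_counter(COUNTERS, MEM_FRACTION))
```

In this toy data set the cache-miss counter tracks memory-boundedness almost perfectly while the branch counter does not, so the miss counter is selected. Once fitted off-line, such a model can be evaluated on-line from live counter readings.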
Model predicts energy consumption and relative execution speed
• at present setpoint
• at different setpoints
Accurately predicts energy and performance response to DVFS
• within a few %
Can use this for informed energy-management decisions
On-Line Characterization & Modeling
What is "best"?
• Maximal performance?
• Minimal energy?
• Minimal power?
Depends...
May change
• battery depletes
Need flexible policies
Energy Management Policies
[Diagram: blocks labeled Workload Statistics, Workload Prediction, Energy/Performance Models, Candidate Setpoints, QoS Info, Selection Policy, and Setting]
Generalized Energy-Delay Policy
[Figure: Performance and Energy plots, each with CPU-bound and Memory-bound curves]
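The slides name the generalized energy-delay policy without giving its formula. One formulation consistent with the three goals listed under "What is best?" is to minimize η = P^(1−α) · T^(1+α) for a tunable α in [−1, 1]. That formula is assumed here and should be checked against the cited paper. The setpoint names and predicted values are hypothetical.

```python
def energy_delay_metric(power_w, time_s, alpha):
    """eta = P**(1 - alpha) * T**(1 + alpha), with alpha in [-1, 1] (assumed form)."""
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    return power_w ** (1.0 - alpha) * time_s ** (1.0 + alpha)


def select_setpoint(predictions, alpha):
    """predictions maps a setpoint name to (predicted power, predicted time)."""
    return min(predictions,
               key=lambda sp: energy_delay_metric(*predictions[sp], alpha))


# Hypothetical model predictions for three setpoints.
PREDICTIONS = {
    "200 MHz": (0.30, 4.0),   # energy 1.20 J
    "400 MHz": (0.50, 2.2),   # energy 1.10 J
    "800 MHz": (1.20, 1.5),   # energy 1.80 J
}

print(select_setpoint(PREDICTIONS, alpha=1.0))    # T^2: maximal performance
print(select_setpoint(PREDICTIONS, alpha=0.0))    # P * T: minimal energy
print(select_setpoint(PREDICTIONS, alpha=-1.0))   # P^2: minimal power
```

A single parameter then covers the whole range of policies, so the management module can move α as conditions change, for example as the battery depletes.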
Implementation of power model and policies
• once for platform vs once for each guest
• no guest has global view, hypervisor does
• integration with other cores: DSPs, baseband processor
• policy-mechanism separation
Why do it outside the OS?
Controls all resources
• CPU, memory, devices
De-privileged guest OSes
• execute in user mode
• prevents interference with hypervisor or with other guests
• ensures hypervisor retains control over resources
The Hypervisor
Subsystems compete for it
Cannot let subsystems manage it
• just as with memory, CPU
Needs trusted, central authority
Needs to be done in virtualization layer
Energy is a Global Resource
Mechanisms in hypervisor
Policies in isolated management module
Keep hypervisor policy-free
• HW-like
Policy-Mechanism Separation
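The separation described here can be sketched as two components with a narrow interface between them. None of the class or method names below correspond to an actual OKL4 interface; they are hypothetical and only illustrate the split between a mechanism that applies settings and a replaceable policy that chooses them.

```python
class Mechanisms:
    """Stand-in for the hypervisor side: applies settings, makes no decisions."""

    def __init__(self, setpoints_hz):
        self.setpoints_hz = tuple(setpoints_hz)
        self.current_hz = self.setpoints_hz[-1]

    def set_frequency(self, freq_hz):
        if freq_hz not in self.setpoints_hz:
            raise ValueError("unsupported setpoint")
        self.current_hz = freq_hz


class LowestSufficientFrequency:
    """Example policy: slowest setpoint that still covers the demand."""

    def choose(self, setpoints_hz, demand_hz):
        adequate = [f for f in sorted(setpoints_hz) if f >= demand_hz]
        return adequate[0] if adequate else max(setpoints_hz)


class PolicyModule:
    """Isolated management module: decides, then asks the mechanism to act."""

    def __init__(self, mechanisms, policy):
        self.mechanisms = mechanisms
        self.policy = policy

    def on_demand_update(self, demand_hz):
        choice = self.policy.choose(self.mechanisms.setpoints_hz, demand_hz)
        self.mechanisms.set_frequency(choice)
        return choice


mech = Mechanisms([200e6, 400e6, 800e6])
manager = PolicyModule(mech, LowestSufficientFrequency())
print(manager.on_demand_update(300e6))
```

Swapping in a different policy object changes the behaviour without touching `Mechanisms`, which is the point of keeping the hypervisor policy-free.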
Additional degree of freedom
• DVFS + sleep states + core shutdown
• Hypervisor supports transparent, temporary consolidation of cores
• Unneeded cores turned off to reduce power
Different tradeoffs
• Performance vs power close to linear
Important to manage cores globally
• On average more cores off than with per-guest management
• Can use deeper sleep state
• Less overall energy use
Enter Multicore
[Diagram: OKL4 Microvisor with Subsystem #1 and Subsystem #2, their VCPUs, and the physical CPUs]
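The claim that global management keeps more cores off than per-guest management can be illustrated with a simple count. The sketch assumes each VCPU's load is a known fraction of one core, that any VCPU can run on any core, and that every guest managing its own cores keeps at least one online. The function names and loads are hypothetical, and the count is a lower bound that ignores packing and migration costs.

```python
import math


def cores_needed(vcpu_loads):
    """Lower bound on the cores required to carry the given total load."""
    return max(1, math.ceil(sum(vcpu_loads) - 1e-9))


def cores_on_global(guests):
    """Hypervisor consolidates the VCPUs of all guests onto shared cores."""
    return cores_needed([load for loads in guests.values() for load in loads])


def cores_on_per_guest(guests):
    """Each guest manages only its own cores and cannot share with others."""
    return sum(cores_needed(loads) for loads in guests.values())


# Hypothetical VCPU loads, as fractions of one core.
GUESTS = {
    "Subsystem #1": [0.3, 0.2],
    "Subsystem #2": [0.1, 0.1],
}

print(cores_on_global(GUESTS), cores_on_per_guest(GUESTS))
```

With these loads one core suffices under global consolidation, whereas per-guest management keeps two cores on. On a four-core device that is three cores off instead of two.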
Cache coherency couples clock frequencies of multiple cores
OSes running on different cores cannot adjust clock independently
Requires entity with global view
Enter Multicore: Architectural Constraints
Cores have same ISA but different clock rates
Hypervisor can determine optimal mapping of subsystems to cores
• Using same infrastructure as for DVFS
• Integrate with temporary core consolidation
Asymmetric Multicore
[Diagram: OKL4 Microvisor with a CPU-bound subsystem and a memory-bound subsystem, their VCPUs, a fast CPU, and a slow CPU]
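Mapping subsystems to asymmetric cores can reuse the same time model as DVFS: only the CPU part of a workload benefits from a faster core. The sketch below tries every one-to-one assignment and keeps the one with the lowest total predicted run time. The objective, the names and the numbers are assumptions for illustration; a real policy would more likely weigh energy as well.

```python
from itertools import permutations


def predicted_time(cpu_cycles, mem_time_s, core_hz):
    """T = T_CPU + T_mem; only the CPU part scales with the core's clock."""
    return cpu_cycles / core_hz + mem_time_s


def best_mapping(subsystems, cores_hz):
    """Exhaustively search one-to-one assignments of subsystems to cores."""
    names = list(subsystems)
    best = None
    for order in permutations(cores_hz, len(names)):
        total = sum(predicted_time(*subsystems[name], cores_hz[core])
                    for name, core in zip(names, order))
        if best is None or total < best[0]:
            best = (total, dict(zip(names, order)))
    return best[1]


# Hypothetical workloads: (CPU cycles, memory time in seconds).
SUBSYSTEMS = {
    "cpu_bound": (8.0e8, 0.0),
    "memory_bound": (1.0e8, 0.9),
}
CORES_HZ = {"fast": 1.0e9, "slow": 0.5e9}

print(best_mapping(SUBSYSTEMS, CORES_HZ))
```

The CPU-bound subsystem lands on the fast core and the memory-bound one on the slow core, since the latter loses little from the lower clock. Exhaustive search is only practical for a handful of subsystems and cores.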
Move subsystems between cores
• including temporary consolidation of different subsystems on common core
Architectural inter-core dependencies
• cannot manage core clocks independently
Requires global control
• ... outside individual OSes
• indirection layer between OS and hardware
No practical alternative to virtualization!
The Future is Multicore
[Diagram: OKL4 Microvisor with Subsystem #1 and Subsystem #2, their VCPUs, and the physical CPUs]
Virtualization is unavoidable long-term
... but provides other benefits short-term
Early uptake maximises benefits
Future-proof your designs!
Summary
Thank You!