Model predictive control and self-learning of thermal models for multi-core platforms*
Luca Benini, Luca.benini@unibo.it
*Work supported by Intel Labs Braunschweig
The Power Crisis
[Figure: CO2 emissions (%, 0% to 100%) split into transport, production, and use for Macbook, Macbook Air, iPhone-3G, and MacPro-Desktop; labels: mobile clients; data center, HPC, server. Source: M. Pedram [CADS10]]
Run-time power: ~50% of ICT environmental impact and cost!
The Thermal Crisis
• Never-ending shrinking: smaller, faster… Free lunch?! Not really…
• CMOS 65nm, CMOS 45nm, CMOS 32nm
• Multi-Processor SoCs become possible: “cool” chips, “hot” applications
• Thermal issues: hot-spots, thermal gradients…
• System wear-out and lifetime reliability degradation!
• 3D-SoCs are even worse
[Figures: Sun 1.8 GHz Sparc v9 microprocessor; Sun Niagara broadband processor; Cell multi-processor; Coskun et al. ’07, UCSD]
A System-level View
• Heat density trend 2005-2010 (systems) [Uptime Institute]
• Increasing power density
• Thermal issues at multiple levels (a multi-scale problem)
– Chip / component level
– Server / board level
– Rack level
– Room level
Cooling and hot-spot avoidance is an open issue!
Today’s focus: chip level
Thermal Management
Causes:
• Spatial and temporal workload variation (software)
• High power densities
• Technology scaling
• High performance requirements
• Limited dissipation capabilities
• System integration
• Costs
Effects: power, temperature, and performance are NON UNIFORM
• Hot spots, thermal gradients and cycles
• Leakage current
• Reliability loss, aging
Dynamic approach: on-line tuning of system performance and temperature through closed-loop control
Management Loop: Holistic view
[Figure: management loop. SW: control algorithm, migration policy, task scheduling and migration, using system information from the OS (workload, CPU utilization, queue status). HW: introspection through monitors (power, reliability alarms, temperature) and self-calibration through knobs (Vdd, Vb, f, on/off). Multicore platform: Proc. 1, Proc. 2, … Proc. N (Core 1 … Core N), each with private memory, connected by the INTERCONNECT]
Outline
• Introduction
• Energy Controller
• Thermal Controller architecture
• Learning (self-calibration)
• Scalability
• Simulation Infrastructure
• Results
• Conclusion
DRM - General Architecture
• System (chip scale)
• Sensors
– Performance counters - PMU
– Core temperature
• Actuators - knobs
– ACPI states
– P-State: DVFS
– C-State: PGATING
– Task allocation
• Controller
– Reactive: threshold/heuristic, control theory
– Proactive: predictors
[Figure: simulation snapshot. SW: App.1 … App.N, each with Thread 1 … Thread N, on top of the O.S. HW: CPU1, CPU2, … CPUN with L1 caches, L2, network, DRAM. The CONTROLLER reads TCPU, #L1MISS, #BUSACCESS, CYCLEACTIVE, … and drives f, v and PGATING]
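CONTROLLER
• Controller goals
– Minimize energy
– Bound the CPU temperature
• Our approach: a stack of controllers
– Energy controller
– Thermal controller
[Figure: controller stack for core i. The energy controller (ECi) takes the workload (WL) and outputs fECi; the thermal controller (TCi) takes fECi and Ti and outputs fTCi]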
Energy Controller
[Figure: bar charts of power, execution time, and energy (normalized, 0 to 1) at high and low frequency; the CPU-bound task spends its time in cpu and cache, the memory-bound task in cpu, cache, and dram]
CPU BOUND TASK, from high to low frequency:
• Performance loss
• Power reduction
• Energy efficiency loss!
MEMORY BOUND TASK, from high to low frequency:
• Same performance
• Power reduction
• Energy efficiency gain!
OUR SOLUTION (CPU-bound task at high frequency, memory-bound task at low frequency):
• Power saving
• No performance loss
• Higher energy efficiency
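The trade-off above can be made concrete with a toy calculation. This is only a sketch under loud assumptions: the function `run_task`, the split of power into a frequency-independent part and a part linear in frequency, and every constant are illustrative and do not come from the slides.

```python
def run_task(freq_ratio, cpu_share, p_static=0.5, p_dynamic=0.5):
    """Return normalized (time, power, energy) at freq_ratio = f / f_high.

    cpu_share is the fraction of the high-frequency execution time that
    scales with 1/f; the rest (memory stalls) is assumed not to depend on
    the core frequency. All constants are illustrative assumptions.
    """
    time = cpu_share / freq_ratio + (1.0 - cpu_share)
    power = p_static + p_dynamic * freq_ratio
    return time, power, time * power


if __name__ == "__main__":
    # Halve the frequency for a CPU-bound and a memory-bound task.
    for name, share in (("CPU bound", 1.0), ("memory bound", 0.1)):
        t, p, e = run_task(0.5, share)
        print(f"{name}: time={t:.2f} power={p:.2f} energy={e:.2f}")
```

With these assumed numbers, halving the frequency raises the energy of the CPU-bound task above 1 and lowers the energy of the memory-bound task below 1, which is the qualitative behavior the slide describes.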
Thermal Controller
[Figure: die image, Intel®, ISSCC 2007; controllers ordered by increasing COMPLEXITY]
Threshold-based controller
• T > Tmax: low frequency
• T < Tmin: high frequency
• Cannot prevent overshoot
• Causes thermal cycles
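A minimal sketch of the threshold rule, assuming a toy first-order thermal model for the plant. The names `threshold_step` and `simulate`, the thresholds, and the model coefficients are all hypothetical; they are chosen only to show the overshoot and the thermal cycle named in the bullets.

```python
def threshold_step(temp, freq, t_min, t_max, f_low, f_high):
    """Threshold rule: throttle above t_max, speed up below t_min."""
    if temp > t_max:
        return f_low
    if temp < t_min:
        return f_high
    return freq  # inside the band: keep the current frequency


def simulate(steps=200, t_min=60.0, t_max=80.0, f_low=1.0, f_high=3.0):
    """Drive an assumed toy plant T(k+1) = 0.9*T(k) + 2.0*f(k) + 0.1*30."""
    temp, freq = 30.0, f_high
    temps, freqs = [], []
    for _ in range(steps):
        freq = threshold_step(temp, freq, t_min, t_max, f_low, f_high)
        temp = 0.9 * temp + 2.0 * freq + 0.1 * 30.0
        temps.append(temp)
        freqs.append(freq)
    return temps, freqs
```

Because the rule reacts only after the threshold has been crossed, the simulated temperature exceeds `t_max` before the frequency drops, and the frequency keeps toggling between the two levels.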
Classical feedback controller
• PID controllers
• Better than the threshold-based approach
• Cannot prevent overshoot
Model Predictive Controller
• Internal prediction: avoids overshoot
• Optimization: maximizes performance
• Centralized: aware of the thermal influence of neighbor cores
• All at once: MIMO controller
• Complexity!!!
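[Figure: MPC block diagram for the multicore (Core1 … Corei … CoreN). Blocks: thermal model (past input & output, future input, future output), comparison with the target frequency (future error), and optimizer (cost function, constraint) producing the future input]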
MPC Robustness
MPC needs a thermal model
• Accurate, with low complexity
• Must be known a priori
• Depends on user configuration
• Changes with system ageing
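To illustrate why the controller needs a model, here is a deliberately small predictive step. It is a sketch under assumptions, not the controller from the talk: the plant is an assumed scalar model T(k+1) = a*T(k) + b*P + (1-a)*Tamb, the optimizer is replaced by a search over a few discrete power levels, and all names and numbers are hypothetical.

```python
def predict(temp, power, a, b, t_amb, steps):
    """Simulate T(k+1) = a*T(k) + b*P + (1-a)*t_amb with constant power."""
    trace = []
    for _ in range(steps):
        temp = a * temp + b * power + (1.0 - a) * t_amb
        trace.append(temp)
    return trace


def mpc_step(temp, p_target, levels, t_max,
             a=0.9, b=0.5, t_amb=30.0, horizon=5):
    """Pick the highest power level not above p_target whose predicted
    temperature stays within t_max over the whole horizon."""
    feasible = [p for p in levels
                if p <= p_target
                and max(predict(temp, p, a, b, t_amb, horizon)) <= t_max]
    # If nothing is feasible, fall back to the lowest level.
    return max(feasible) if feasible else min(levels)
```

For example, with `levels = [0, 1, 2, 3, 4]` and `t_max = 45.0`, a core at 40 °C is granted the requested level 4, while a core at 44.5 °C is limited to level 3 before any overshoot occurs.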
“In field” Self-Calibration
• Force test workloads
• Measure core temperatures
• System identification
[Figure: training tasks are executed on Core1 … Corei … CoreN; the workload, power, and temperature traces feed system identification, which yields the identified state-space thermal model]
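The three steps above can be sketched for a single core. This is an assumption-laden illustration: it fits a scalar model T(k+1) = a*T(k) + b*P(k) + c by ordinary least squares through the normal equations, whereas the talk identifies a multi-core state-space model. The function names are hypothetical.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_thermal_model(temps, powers):
    """Least-squares fit of T(k+1) = a*T(k) + b*P(k) + c.

    temps has one more sample than powers. Returns [a, b, c].
    """
    rows = [[temps[k], powers[k], 1.0] for k in range(len(powers))]
    targets = temps[1:len(powers) + 1]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve(ata, atb)
```

Feeding the function a temperature trace recorded while a test workload toggles the power returns the three coefficients; nothing in this unconstrained fit forces them to be physically meaningful.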
Thermal Model & Power Model
• Power model: P = g(task, f); task j dissipates the power Pj
• Thermal model: maps the powers Pn,j and the environment temperature Tenv to the temperatures Tn,j
Model Structure
[Figure: silicon (Si) and copper (Cu) cells coupled to their neighbors. Matrix A: diagonal entries Si 00, Si 11, Cu 22, Cu 33 and couplings 01, 10, 02, 20, 13, 31, 23, 32. Matrix B: inputs P0, P1]
Parametric optimization
• A test pattern is applied and the system response is recorded as data
• Least-squares optimization of the model returns the optimal parameters A, B
• Error function used in the cost function: e = Tmodel - Treal
LS System Identification
• Pattern generator: PRBS.csv (…0,0,1,1,1,…)
• Workloader on core0, core1, … coreN (…idle,idle,run,run,run,…)
• HW (Ts=1/10ms) traces logged to DATA.csv (XTS):
– Temperature
– Frequency
– Workload
• PREPROCESSING, then LS MODEL
• Tools:
– C/FORTRAN (SLICOT, MINPACK)
– Matlab System Identification: N4SID, PEM, LS (Levenberg-Marquardt)
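A pseudorandom binary sequence such as the one stored in PRBS.csv can be generated with a linear-feedback shift register. The sketch below uses the common PRBS7 polynomial x^7 + x^6 + 1; the slides do not say which sequence length was used, so the polynomial, the seed, and the function names are assumptions.

```python
def prbs7(n, seed=0x7F):
    """Return n bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1).

    The 7-bit register must start from a nonzero seed; the sequence
    repeats every 127 bits.
    """
    state = seed & 0x7F
    bits = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def to_workload(bits):
    """Map each bit to the task state forced on a core."""
    return ["run" if bit else "idle" for bit in bits]
```

Calling `to_workload(prbs7(127))` gives one full period of idle/run commands that a workloader could replay on a core while temperature, frequency, and workload are logged.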
Experimental setup
SUN FIRE X4270
[Figure: server layout along the air flow: fan board, storage drives, RAM, chipset, CPU1, CPU2]
• Intel Nehalem
• 8 cores / 16 threads
• 2.9 GHz
• 95 W TDP
• IPMI
Workload & Temperature
[Figure: temperature trace Tcore2 (°C, 32 to 44) and pseudorandom workload pattern Pcore2 (0 to 1) versus time (0 to 35 seconds)]
Identification based on pure LS fitting
MEASURED vs. SIMULATED TEMPERATURE
[Figure: Tcore2 (°C, 34 to 45) versus seconds (25 to 33), measured and thermal model]
Black-box Identification
[Figure: Tcore2 (°C) versus seconds (13.6 to 14.1), measured and thermal model; FREQUENCY f [Hz] (×10^9) and WORKLOAD W versus time (ms, ×10^4)]
PROBLEM 1: the “WORKLOAD” signal does not include the power variation due to frequency changes.
SOLUTION: learn the power model too!
Multi-step Identification
Partially unobservable model: the power model P = g(w, f) is initially unknown.
[Figure: f and w enter the power model (LUT), which outputs P; P enters the thermal model (A, B), which outputs T]
1st STEP: set f = const and set w as a [0|1]^N sequence, so that P is [P1|P0]^N with P1, P0 pre-measured in steady state; measure T and obtain A0 by LS.
2nd STEP: A is known; set f and w, measure T, invert A, and obtain P.
3rd STEP: P is known; generate a richer sequence of w, f and re-calibrate A by LS.
Iterate until convergence.
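The second step can be sketched for one core. Assuming a scalar thermal model T(k+1) = a*T(k) + b*P(k) + (1-a)*Tamb that is already identified, the power at each step is recovered by inverting the model and then averaged per (w, f) pair to fill a look-up table. The function names, the scalar model, and the averaging are assumptions made for illustration.

```python
from collections import defaultdict


def recover_power(temps, a, b, t_amb):
    """Invert T(k+1) = a*T(k) + b*P(k) + (1-a)*t_amb to obtain P(k)."""
    return [(temps[k + 1] - a * temps[k] - (1.0 - a) * t_amb) / b
            for k in range(len(temps) - 1)]


def build_power_lut(temps, workloads, freqs, a, b, t_amb):
    """Average the recovered power for every (workload, frequency) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    powers = recover_power(temps, a, b, t_amb)
    for power, w, f in zip(powers, workloads, freqs):
        sums[(w, f)] += power
        counts[(w, f)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The resulting table plays the role of the power model P = g(w, f) in the third step, where richer (w, f) sequences are used to re-fit the thermal model.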
[Figure: Tcore2 (°C) versus seconds (13.6 to 14.1), measured and thermal model, with f [Hz] (×10^9) and W versus time (ms, ×10^4); a second panel shows Tcore2 (°C, 40.5 to 42.5) versus seconds (20.5 to 23), measured and thermal model]
Problem 2: Unstable model
[Figure: Tcore and Tamb versus t]
Tcore differs from Tamb with P = 0
Problem 3: Model is not physical
The identification algorithm must be aware of physical properties to avoid over-fitting:
• Validation
• Constraint on the initial condition
• Linear constraint
CONSTRAINED LEAST SQUARES
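One way to picture such constraints on a single core is sketched below. It assumes the scalar model T(k+1) - Tamb = a*(T(k) - Tamb) + b*P(k), which forces Tcore = Tamb in steady state when P = 0, and it bounds the pole to 0 <= a <= a_max so the fitted model stays stable. The function name, the scalar model, and the bound are illustrative assumptions; the constraints used in the talk are not spelled out on the slide.

```python
def fit_constrained(temps, powers, t_amb, a_max=0.999):
    """Least-squares fit of T(k+1)-t_amb = a*(T(k)-t_amb) + b*P(k)
    subject to 0 <= a <= a_max. Returns (a, b)."""
    x = [t - t_amb for t in temps]
    n = len(powers)
    sxx = sum(x[k] * x[k] for k in range(n))
    sxp = sum(x[k] * powers[k] for k in range(n))
    spp = sum(p * p for p in powers)
    sxy = sum(x[k] * x[k + 1] for k in range(n))
    spy = sum(powers[k] * x[k + 1] for k in range(n))
    # Unconstrained solution of the 2x2 normal equations.
    det = sxx * spp - sxp * sxp
    a = (sxy * spp - spy * sxp) / det
    b = (sxx * spy - sxp * sxy) / det
    if not 0.0 <= a <= a_max:
        # The bound is active: fix a on the bound and re-solve for b alone.
        a = min(max(a, 0.0), a_max)
        b = (spy - a * sxp) / spp
    return a, b
```

Because the ambient offset is built into the model structure, the fitted model cannot settle at a temperature different from Tamb when the power is zero.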
[Figure: Tcore2 (°C, 37 to 44) versus seconds (26 to 27.2), measured and thermal model]
Constrained Identification
[Figure: Tcore0, Tcore1, Tcore2, and Tcore3 (°C) versus seconds (100 to 600), measured and thermal model]
Addressing model stiffness
Identification with a pseudorandom trace:
• Too many samples, huge LS computation
• Large time constant
• Quasi-steady-state accuracy
Possible causes:
• Package thermal inertia?
• Environment inertia (air)?
• PLEAK temperature dependency?
• Modelling the third time constant as heat sink temperature variation
• One-pole model identification
[Figure: environment thermal model and CPU thermal model, with signals PT, Tenv, and Theatsink]
[Figure: four panels of Tcore (°C, 30 to 40) versus seconds (100 to 600), measured and thermal model]
MPC Scalability
[Figure: die images, Intel, ISSCC 2007]
MPC complexity
• Implicit, a.k.a. on-line: computational burden
• Explicit, a.k.a. off-line: high memory occupation
# CORES | # explicit regions
4 | 36
8 | 1296
16 | Out of memory
Complexity grows superlinearly with the number of cores!!
Addressing Scalability
On a short time window, power has a local thermal effect!
Distributed Control
• One controller for each core
• Each controller uses:
– Local power & thermal model
– Neighbors’ temperatures
• Fully distributed
• Complexity scales linearly with #cores
[Figure: # explicit regions versus # cores for the multicore (Core1 … Corei … CoreN): 36 at 4 cores, 1296 at 8 cores, out of memory at 16 cores; a second series reads 8, 16, 32]
Distributed and hierarchical controllers:
– Energy Controller (EC)
• Output: frequency fEC
• Minimize power, CPI based
• Performance degradation < 5%
– Temperature Controller (TC)
• Distributed MPC
• Inputs: fEC, TCORE, TNEIGHBOURS
• Output: core frequency (fTC)
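A sketch of how a CPI-based energy controller could pick fEC under the 5% bound. The timing model is an assumption introduced here: the time per instruction is split into a core part that scales with 1/f and a memory part (in ns) that does not. The names `time_per_instruction` and `ec_frequency` and all numbers are hypothetical.

```python
def time_per_instruction(freq_ghz, cpi_core, mem_ns):
    """Assumed model: core cycles scale with 1/f, memory stalls do not."""
    return cpi_core / freq_ghz + mem_ns


def ec_frequency(cpi_core, mem_ns, levels_ghz, max_loss=0.05):
    """Return the lowest frequency whose predicted slowdown relative to
    the highest level stays within max_loss."""
    f_max = max(levels_ghz)
    t_ref = time_per_instruction(f_max, cpi_core, mem_ns)
    for freq in sorted(levels_ghz):
        loss = time_per_instruction(freq, cpi_core, mem_ns) / t_ref - 1.0
        if loss <= max_loss:
            return freq
    return f_max
```

With levels from 1.0 to 3.0 GHz, a task with no memory stalls stays at 3.0 GHz, while a task dominated by memory stalls can drop to a much lower level for the same bound on performance degradation.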
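[Figure: controller nodes. For core i, ECi turns CPI i,k+1 into fEC i,k+1, and TCi combines it with T i,k and the neighbor temperatures Ti-1,k and Ti+1,k into fTC i,k+1; the same structure holds for core N (ECN, TCN, CPI N,k+1, fEC N,k+1, fTC N,k+1, TN-1,k)]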
High Level Architecture
Distributed Controller
[Figure: the energy controller receives the CPI of each core and outputs the EC frequency; the thermal controller of each core (Core 1 … Core 4) takes that frequency together with its own and its neighbors’ temperatures (Ti + Tneigh) from the PLANT and outputs fi,TC]
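Distributed Thermal Controller
MPC Controller Core 1
[Figure: g(·), nonlinear (frequency to power), maps CPI and f1,EC to P1,EC. The MPC controller uses a linear model (power to temperature) with 2 states per core, TENV as input, and a QP optimization s.t. constraints; a classic Luenberger state observer estimates x1 from T1. g-1(·) maps P1,TC and CPI back to f1,TC]
Implicit formulation
[Figure: explicit formulation. The shifted state x1SHIFTED is built from x1, A1-1·B1, and [x1 ; TENV ; P1,EC]; its region number selects the gain matrices (region 1: F1, G1; region 2: F2, G2; …; region nr: Fnr, Gnr); the output u(k) = ∆P1 is added to P1,EC to give P1,TC]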
Explicit Distributed Controller
Our aim is to minimize the difference between the input P1,TC (also called the manipulated variable, MV) and the reference P1,EC. Our controller can only take into account a constant reference. To overcome this limitation, we reformulate the tracking problem as a regulation problem that consists of driving ∆P1 (the new MV) to 0. The regulated power is P1,TC = P1,EC + ∆P1.
At each time instant the system belongs to a region according to its current state. In each region the explicit controller executes a linear control law with that region’s gain matrices.
The prediction evaluated by our explicit controller cannot take into account the measured disturbances (uMD = [Tenv, P1, Tneigh]). Thus we exploit the superposition principle of linear systems: to remap the effect of these elements, we use the model to modify the state (x(k) → xSHIFTED(k)), projecting the effects of the MDs one step forward.
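The run-time part of an explicit controller can be sketched as a region lookup followed by an affine law. The region description as a list of half-spaces h·x <= k, the affine form ∆P = F·x + G, and the two example regions are assumptions made for illustration; the actual regions and gains come from the off-line MPC solution.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def explicit_control(x, regions):
    """Find the region containing x and apply its affine law F.x + G.

    Each region is (halfspaces, f_gain, g_offset), where halfspaces is a
    list of (h, k) pairs meaning h.x <= k.
    """
    for halfspaces, f_gain, g_offset in regions:
        if all(dot(h, x) <= k for h, k in halfspaces):
            return dot(f_gain, x) + g_offset
    raise ValueError("state lies outside all regions")


def regulated_power(x_shifted, p_ec, regions):
    """P_TC = P_EC + delta_P, with delta_P from the explicit law."""
    return p_ec + explicit_control(x_shifted, regions)


# Hypothetical two-region controller on a 2-state core model.
EXAMPLE_REGIONS = [
    ([([1.0, 0.0], 10.0)], [0.0, 0.0], 0.0),    # x[0] <= 10: no correction
    ([([-1.0, 0.0], -10.0)], [-0.5, 0.0], 5.0),  # x[0] >= 10: reduce power
]
```

With these example regions, `regulated_power([8.0, 0.0], 10.0, EXAMPLE_REGIONS)` leaves the EC request untouched, while a state with `x[0] = 14.0` lowers it by 2.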
O.S. Implementation – Linux SMP
• Controller routines
– Scheduler routine extension
• Is distributed
• Executes on the core it relies on
– Timing: scheduler tick (1-10 ms)
• CPI estimation
– Performance counters:
• Clock expired
• Instructions retired
• Energy Controller
– Look-up table:
• fEC = LuT [ CPI ]
• Thermal Controller
– Core temperature sensors
– Matrix multiplication & look-up table:
• fTC = LuT [ M*[TCORE, TNEIGHBOURS] ]
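The per-tick CPI estimation and the fEC look-up can be sketched as follows. The counter snapshots are plain dictionaries standing in for the real performance counters, and the bin edges and frequencies in the table are invented for illustration; none of these names or values come from the slides.

```python
import bisect


def estimate_cpi(prev, curr):
    """CPI over one scheduler tick from two counter snapshots.

    Each snapshot is a dict with 'cycles' (clock expired) and
    'instructions' (instructions retired).
    """
    d_cycles = curr["cycles"] - prev["cycles"]
    d_instr = curr["instructions"] - prev["instructions"]
    return d_cycles / d_instr if d_instr > 0 else float("inf")


# Hypothetical table: the higher the CPI, the lower the EC frequency (GHz).
CPI_EDGES = [1.0, 2.0, 4.0]
F_EC_TABLE = [3.0, 2.5, 2.0, 1.5]


def f_ec_from_lut(cpi):
    """fEC = LuT[CPI]: pick the entry of the bin that contains cpi."""
    return F_EC_TABLE[bisect.bisect_right(CPI_EDGES, cpi)]
```

A scheduler-tick hook would take a snapshot, call `estimate_cpi` against the previous one, and pass the result to `f_ec_from_lut`; the thermal controller would then cap that frequency.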
Model Learning Scalability
MPC Weaknesses – 2nd: the internal thermal model
• Accurate, with low complexity
• Must be known a priori
• Depends on user configuration
• Changes with system ageing
System identification: “in field” self-calibration
• Force test workloads
• Measure core temperatures
• System identification
Complexity issue
• State of the art is centralized
• Least-squares based: relies on matrix inversion (cubic in #cores)
[Figure: identification time (s) versus # cores: 105 s at 4 cores, 1230 s at 8 cores, > 1 h at 16 cores; a second series reads 1, 2, 4]
Distributed approach: each core identifies its local thermal model; complexity scales linearly with #cores
Simulation Strategy
Trace-driven simulator [1]:
• Not suitable for full-system simulation (how to simulate the O.S.?)
• Loses information on cross-dependencies, resulting in degraded simulation accuracy
[Figure: trace-driven flow with workload set, multicore simulator, execution trace database, power model, temperature model, and control strategy]
Closed-loop simulator:
• Cycle-accurate simulators [2]:
– High modeling accuracy
– Support well-established power and temperature co-simulation based on analytical models and system micro-architectural knowledge
– Low simulation speed
– Not suitable for full-system simulation
• Functional and instruction-set simulators:
– Allow full-system simulation
– Less internal precision
– Less detailed data
– Introduce the challenge of having accurate power and temperature physical models with no micro-architectural model
[Figure: closed-loop flow with workload, multicore simulator, power model, temperature model, and control strategy]
[1] P. Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007
[2] L. Benini et al. MPARM: Exploring the multi-processor SoC design space with SystemC. 2005
Virtual Platform
[Figure: virtual platform simulator. SW: App.1 … App.N with threads T1 … TN on the O.S. HW, modeled by SIMICS and RUBY: CPU1, CPU2, … CPUN with L1 caches, L2, network, DRAM; the core stalls on memory accesses]
Simics by Virtutech:
• Full-system functional simulator
• Models the entire system: peripherals, BIOS, network interfaces, cores, memories
• Allows booting a full OS, such as Linux SMP
• Supports different target CPUs (arm, sparc, x86)
• x86 model:
– In-order
– All instructions are retired in 1 cycle
– Does not account for memory latency
Memory timing model
• RUBY – GEMS (University of Wisconsin) [1]
– Public cycle-accurate memory timing model
– Different target memory architectures
– Fully integrated with Virtutech Simics
– Written in C++
– We use it as a skeleton to apply our add-ons (as C++ objects)
[1] M. M. K. Martin et al. Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. 2005
Virtual Platform
[Figure: virtual platform extended with the DVFS module (fi, VDD; per-core f, v) and the performance counter (PC) module (#hlt, stall, active, cycles; #L1MISS, #BUSACCESS, CYCLEACTIVE, …)]
Performance knobs (DVFS) module:
• Virtutech Simics supports frequency changes at run-time
• RUBY does not support them: it has no internal knowledge of frequency
• We add a new DVFS module to support them:
– Ensures that the L2 cache and DRAM have a constant clock frequency
– L1 latency scales with the Simics processor clock frequency
Performance counters module:
• Needed by the performance control policy
• We add a new Performance Counter module to support it
– Exports different quantities to the O.S. and applications: the number of instructions retired, clock cycles and stall cycles expired, halt instructions, …
Virtual Platform
[Figure: virtual platform extended with the POWER MODEL (PCORE, PL1, PL2)]
Power model module:
• At run-time, estimates the power consumption of the target architecture
• Core model: PT = [PD(f, CPI) + PS(T, VDD)] * (1 − idleness) + idleness * PIDLE
• PD: experimentally calibrated analytical power model
• Cache and memory power: access cost estimated with CACTI [1]
[1] S. Thoziyoor et al. A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies. 2008
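The core model PT = [PD + PS] * (1 − idleness) + idleness * PIDLE can be written directly as a function. In this sketch the dynamic and static terms are passed in as already evaluated numbers, because the slide does not give the form of PS; the function name and the example values are assumptions.

```python
def core_power(p_dynamic, p_static, p_idle, idleness):
    """Total core power PT for a given idle fraction.

    p_dynamic: PD(f, CPI) evaluated at the current operating point (W)
    p_static:  PS(T, VDD) evaluated at the current temperature (W)
    p_idle:    power drawn while the core is idle (W)
    idleness:  fraction of the interval spent idle, between 0 and 1
    """
    if not 0.0 <= idleness <= 1.0:
        raise ValueError("idleness must lie in [0, 1]")
    return (p_dynamic + p_static) * (1.0 - idleness) + idleness * p_idle
```

For instance, `core_power(10.0, 2.0, 1.0, 0.25)` weights 12 W of active power by 75% and 1 W of idle power by 25%.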
Power Model
Power model interface
[Figure: simulation snapshot of Ruby & Simics (CPU1, CPU2, … CPU N with L1 caches, L2, network, DRAM), with f1 = k1 * fnom, f2 = k2 * fnom, … fN = kN * fnom. Inputs to the POWER MODEL: for the i-th CPU, # instructions retired, # stall cycles, # HLT instructions, and f, Vdd, T; for the i-th L1, # line & WD reads and # line & WD writes; for the i-th L2, # line reads and # line writes; for the DRAM, # burst reads and # burst writes. Output: POWER & ENERGY]
Modeling Real Platform – Power
$P_D = k_A \cdot V_{DD}^2 \cdot f_{CK} + k_B + (k_C + k_D \cdot f_{CK}) \cdot CPI^{k_E}$
Real power measurement
• Intel server system S7000FC4UR
– 16 cores: 4 quad-core Intel® Xeon® X7350, 2.93 GHz
– 16 GB FBDIMMs
– Intel® Core™ 2 Duo architecture
• At-the-wall power consumption
• Test:
– Set of synthetic benchmarks with different memory access patterns
– Forcing all the cores to run at different performance levels
– For each benchmark we extract the clocks-per-instruction metric (CPI) and correlate it with the power consumption
• We relate the static power to the operating point by using an analytical model
High accuracy at high and low CPI
Virtual Platform
[Figure: virtual platform extended with the TEMPERATURE MODEL, which receives the power from the POWER MODEL (PCORE, PL1, PL2) and returns T (TCPU, …)]
Temperature model module:
• We integrate our virtual platform with a thermal simulator [1]
• Input: power dissipated by the main functional units composing the target platform
• Output: temperature distribution across the simulated multicore die area
[1] G. Paci et al. Exploring “temperature-aware” design in low-power MPSoCs
Thermal Model
Methods to solve temperature
[Figure: the POWER MODEL (POWER & ENERGY) supplies PCPU1, PCPU2, … PCPUn, PL1, L2, and network power to the thermal model, which returns Ti. The floorplan (CPU1, CPU2, … CPUn, L1, L2) is divided into silicon (Si) and copper (Cu) cells. Package cross-section: IC die, heat spreader, IC package, package pin, PCB]
Modeling Real Platform – Thermal
• Thermal model calibration:
– Derived from the Intel® Core™ 2 Duo layout
– We calibrate the model parameters to simulate the real HW transient
– High accuracy (error < 1%) and same transient behavior
Virtual Platform Performance
• Host:
– Intel® Core™ 2 Duo
– 2.4 GHz
– 2 GB RAM
• Target:
– 4-core Pentium® 4
– 2 GB RAM
– 32 KB private L1 cache
– 4 MB shared L2 cache
– Linux OS
Simulation time for 1 billion instructions:
Configuration | Tsim
Simics + Ruby | 1040 s
Simics + Ruby + DVFS | 1045 s
Simics + Ruby + DVFS + Power | 1110 s (+7%)
Simics + Ruby + DVFS + Power + Thermal interface | 1160 s
Simics + Ruby + DVFS + Power + Thermal Model | 1240 s (+19.2%)
Thermal model: 68 cells, T = 100 ns, computed every 13 us
Mathworks Matlab/Simulink
• Numerical computing environment developed to design, implement, and test numerical algorithms
• Mathworks Simulink, for simulation of dynamic systems: simplifies and speeds up the development cycle of control systems
• Can be called as a computational engine by writing C and Fortran programs that use the Mathworks Matlab engine library
• Controller design in two steps:
– Developing the control algorithm that optimizes the system performance
– Implementing it in the system
We allow a Mathworks Matlab/Simulink description of the controller to directly drive, at run-time, the performance knobs of the emulated system
Virtual Platform
[Figure: virtual platform (Virtutech Simics, RUBY) extended with the Controller module. The MATLAB interface passes CPI, P, and T to the Mathworks Matlab controller and returns f, v and PGATING to the DVFS module (fi, VDD)]
Mathworks Matlab interface:
• New module named Controller in RUBY
• Initialization: starts the Mathworks Matlab engine as a concurrent process
• Every N cycles, wake-up:
– Send the current performance monitor output to the Mathworks Simulink model
– Execute one step of the controller Mathworks Simulink model
– Propagate the Mathworks Simulink controller decision to the DVFS module
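The wake-up loop can be sketched in a few lines. This is a Python stand-in, not the C++ module from the talk: the Matlab/Simulink model is replaced by a plain callable, and `DvfsModule`, `run_cosimulation`, and their parameters are hypothetical names.

```python
class DvfsModule:
    """Stand-in for the DVFS module: stores one frequency per core."""

    def __init__(self, n_cores, f_nom):
        self.freqs = [f_nom] * n_cores

    def set_frequency(self, core, freq):
        self.freqs[core] = freq


def run_cosimulation(total_cycles, period, read_monitors,
                     controller_step, dvfs):
    """Wake the controller every `period` cycles.

    read_monitors(cycle) returns the current monitor outputs;
    controller_step(measurements) returns one frequency per core.
    Returns the number of wake-ups.
    """
    wakeups = 0
    for cycle in range(period, total_cycles + 1, period):
        measurements = read_monitors(cycle)       # send monitor output
        decisions = controller_step(measurements)  # one controller step
        for core, freq in enumerate(decisions):    # propagate to DVFS
            dvfs.set_frequency(core, freq)
        wakeups += 1
    return wakeups
```

Any function that maps measurements to per-core frequencies can be plugged in as `controller_step`, which mirrors the idea of swapping controller descriptions without touching the simulator.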
CONTROL-STRATEGIES DEVELOPMENT CYCLE
1. Controller design in the Mathworks Matlab/Simulink framework
• System represented by a simplified model
• Obtained by physical considerations and identification techniques
2. Set of simulation tests and design adjustments done in Simulink
3. Evaluation of the tuned controller with an accurate model of the plant, done in the virtual platform
4. Performance analysis, by simulating the overall system
Results
Energy Controller (EC)
– Performance loss < 5%
– Energy minimization
Temperature Controller (TC)
– Complexity reduction
• 2 explicit regions per controller
– Performs as the centralized controller
• Thermal capping
[Figure annotations: < 0.3°, < 3%, < 3%]
Next Steps
• Now working on the embedded implementation
– Server multicore platform and Intel® SCC
• Explore thermal-aware scheduler solutions
– Co-operate with the presented solution
• Develop a distributed + multi-scale solution for data centers
Thermal-aware task scheduling
[Figure: schematic view of thermal management in a datacenter, with six numbered steps. Labels: onsite survey; collecting environmental data and load information from sensors; sensor data database (history sensor data, current sensor data); map load to power consumption (correlation of load & power); CFD simulation software; abstract heat model; cost analysis; other impact factors; policy controller (scheduling policy, control policy); scheduler; incoming task]
Thank you!