May 23-24, 2000, Scottsdale, AZ (Kickoff_may_2000.ppt)
Morphable Computer Architectures for Highly Energy Aware Systems
PACC Kickoff: May 23-24, 2000; Scottsdale, AZ
Peter M. Kogge: CSE Dept., University of Notre Dame
Kanad Ghose: CS Dept., SUNY-Binghamton
Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM), Jet Propulsion Lab
MORPH: Dynamic Low Energy Architectures
[Schedule chart: Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval, over a 0 to 2 year timeline]
New Ideas
• Multi-cluster microarchitecture to allow dynamic changes in energy expended per cycle
• ISA extensions to process data more energy-efficiently
• Energy-efficient morphable memory hierarchies
• Adaptive algorithms to select the best configuration
• Energy-aware run-time that can reconfigure the system
MORPH adds an “Energy Gear” to embedded systems
IMPACT
• Changes the focus to energy management, not power management
• Adds extra degrees of freedom to dynamic energy control
• Provides an inherently more energy-efficient architecture
• Designed with real embedded missions in mind
Why is PACC Important?
• Real world: limited energy sources
  - Renewable energy: 12-15 watts at high noon
  - Fixed-capacity batteries for off-peak sunlight or emergencies in shade
• Multiple operational modes, all compute/energy constrained
  - Movement: collision avoidance
  - Spectroscopy: data gathering vs. analysis
  - Communication: compression vs. transmission
• Today:
  - Computers are selected for peak performance needs
  - Limited ability to “downshift”
The Future at the Low End: Microexplorers
[Figure: microexplorer mass roadmap, 10 kg (1997), 1 kg (2002?), 100 gm (2007?), 10 gm (2012?); enabling technologies: µcommunication, µcomputing, µsensors, advanced mobility, µpower, µnavigation, µstructure, temperature control]
Extremely limited energy sources => Peak computing only when absolutely necessary
“Larger” Systems Have More Diverse Energy/Performance Profiles
[Figure: example missions: distributed sensors, penetrators, integrated inflatable sailcraft, nano-rovers, nano-spacecraft, hydrobot, RLV, atmospheric probes]
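Recasting the Classical Power Equation
$$P = \tfrac{1}{2}\, C\, T\, V^2$$
where P is power (energy/sec) and T is the number of logic transitions/sec. Writing each rate as a per-cycle quantity times the clock rate F, $P = \mathrm{EPC} \times F$ (energy/cycle × cycles/sec) and $T = N_\alpha \times F$ (transitions/cycle × cycles/sec), so
$$\mathrm{EPC} = \tfrac{1}{2}\, C\, N_\alpha\, V^2$$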
EPC is independent of clock rate!
Lowering EPC is our focus!
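As a rough numeric illustration of the EPC expression, the Python sketch below computes EPC from assumed values of C, N_α and V (none of these numbers come from the slides) and then the power at three clock rates. EPC stays constant while power scales linearly with F.

```python
# Sketch: EPC from the recast power equation, using hypothetical component values.

def energy_per_cycle(c_farads: float, n_alpha: float, v_volts: float) -> float:
    """EPC = 1/2 * C * N_alpha * V^2, in joules per cycle."""
    return 0.5 * c_farads * n_alpha * v_volts ** 2


def power_watts(epc_joules: float, f_hz: float) -> float:
    """Power = EPC * F, in watts."""
    return epc_joules * f_hz


# Hypothetical values, for illustration only.
C = 10e-15      # average switched capacitance per transition: 10 fF (assumed)
N_ALPHA = 1e6   # logic transitions per cycle (assumed)
V = 1.5         # supply voltage in volts (assumed)

epc = energy_per_cycle(C, N_ALPHA, V)
for f in (50e6, 100e6, 200e6):
    # EPC does not depend on F; only power does.
    print(f"F = {f / 1e6:.0f} MHz: EPC = {epc * 1e9:.2f} nJ, P = {power_watts(epc, f):.3f} W")
```

Note that V is held fixed here; in practice V and F are often scaled together, which is the voltage-scaling knob the slides mention separately.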
Why is This Important?
• Power = EPC × F, where F = cycles/second
• Performance = IPC × F
• Today’s designs: Performance/Power = IPC/EPC
  - EPC & IPC are fixed at design time (other than voltage scaling)
  - Thus the ratio is fixed at design time
  - The only runtime “knobs” are V and F
• Real embedded scenarios:
  - Short periods of very high peak performance need => high IPC
  - Followed by long periods of much lower performance need
• Result: long periods of lower performance still run at an inefficient EPC!
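A minimal sketch of why F alone is a weak energy knob, assuming V (and therefore EPC) is held fixed: running a fixed task at a lower clock lowers power but stretches run time by the same factor, so the energy for the task is unchanged. The IPC, EPC and instruction count below are hypothetical.

```python
# Sketch: with EPC fixed, lowering F trades time for power but not energy.

def task_energy_joules(instructions: float, ipc: float, epc: float, f_hz: float) -> float:
    power = epc * f_hz                        # Power = EPC x F
    seconds = instructions / (ipc * f_hz)     # time = work / (IPC x F)
    return power * seconds                    # F cancels: energy = instructions * EPC / IPC


def instructions_per_joule(ipc: float, epc: float) -> float:
    """Performance / Power = (IPC x F) / (EPC x F) = IPC / EPC."""
    return ipc / epc


# Hypothetical fixed design: IPC = 2, EPC = 10 nJ, task of 1e9 instructions.
IPC, EPC = 2.0, 10e-9
for f in (200e6, 20e6):
    e = task_energy_joules(1e9, IPC, EPC, f)
    print(f"F = {f / 1e6:.0f} MHz: {1e9 / (IPC * f):.1f} s, {e:.2f} J")
print(f"{instructions_per_joule(IPC, EPC):.3g} instructions per joule at any F")
```

Only changing IPC/EPC itself, which the slides argue is fixed at design time today, changes the energy per unit of work.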
This Project: A “Morphable” System Architecture
• Today’s microarchitectures: EPC = IPC^k, where k > 1
• Our approach:
  - Inherently lower EPC (lower k)
  - With variable IPC (in turn varying EPC)
• Thus IPC/EPC can be varied dynamically
  - Lowering IPC lowers EPC even more
• Result: additional runtime “knobs” for energy management
  - Adjust the configuration so IPC × F matches performance needs
  - Reap the energy savings of lower EPC
Allow systems to change the “Energy Gear” on demand!
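The energy payoff of a lower gear follows from the relation EPC = IPC^k: energy per instruction is EPC/IPC = IPC^(k-1), which falls as IPC falls whenever k > 1. The sketch below is an idealized, normalized model; both exponents are hypothetical choices, not measured values.

```python
# Sketch: idealized model EPC = IPC**k, normalized so EPC = 1 at IPC = 1.
# Energy per instruction is EPC / IPC = IPC**(k - 1), so for k > 1,
# dropping to a lower-IPC "gear" cuts the energy spent per instruction.

def epc_model(ipc: float, k: float) -> float:
    return ipc ** k


def energy_per_instruction(ipc: float, k: float) -> float:
    return epc_model(ipc, k) / ipc


for k in (1.9, 1.3):  # hypothetical: a conventional design vs. a lower-k design
    values = ", ".join(f"{energy_per_instruction(ipc, k):.2f}" for ipc in (1, 2, 4))
    print(f"k = {k}: energy/instruction at IPC 1, 2, 4 = {values}")
```

The lower-k design both costs less at high IPC and gains less from downshifting, which is why the slides pair a lower k with a variable IPC.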
The Team
SUNY-Binghamton (Kanad Ghose)
• Morphable caches, RFs
• Energy-efficient VLIW architectures
• Supporting compiler techniques
University of Notre Dame (Peter Kogge, Vincent Freeh, Jay Brockman)
• Morphable multi-cluster architecture
• “At the sense amps” ISA extension
• Runtime with hooks for dynamic morphing control
Jet Propulsion Laboratory (Nikzad Toomarian, Mohammed Mojarradi, Savio Chau)
• Scenarios & benchmarks
• Baseline characterizations
• Runtime adaptation algorithms
Energy-Aware Data Placement
Overall Goals:
• Architectures with variable IPC, EPC
• Tools & S/W to manage morphing
• Realistic demonstrations
Project Components
• Morphable, inherently low-EPC design
• Memory system allowing both width and placement shaping
• Dynamic algorithms to select the best “shape” for the current energy/performance profile
• Augmented run-time to allow dynamic reconfiguration
Our Background
• NSF MIPS: Inherently Low Power Architectures
  - The multi-cluster microarchitecture
  - Cache-In-Memory
  - Energy-efficient caches
• IEEC Binghamton: reducing power on interconnects
• DARPA Processing-In-Memory projects: HTMT & DIVA
  - Utilizing wide-bandwidth on-chip storage macros
  - Data placement in deep memory hierarchies
  - Multi-threading
• NASA
  - X2000: highly scalable low power systems for deep space missions
  - Evolvable Computing Program: adaptive algorithms to select system parameters to meet a mission objective
How Power Explodes with Conventional Designs
Starting a Solution: Multi-Cluster Architecture
[Figure: (a) Simple Pipeline: Fetch, Decode, Register File, Data Cache. (b) Classical Superscalar: Fetch, Decode, Rename, Issue Window, Register File, Bypass, Data Cache, memory disambiguation. (c) New Multi Cluster: shared Fetch, Decode, Rename and steering, then per cluster an Issue Window, Register File, Bypass, Data Cache, RAW, RAB and memory disambiguation (one cluster outlined).]
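Problem: single large centralized register files with many ports. Solution: multiple smaller register files with few ports.
For a design of issue width IW:
$$\frac{\mathrm{EPC}}{\mathrm{IPC}} \sim (IW)^k, \quad k \text{ as high as } 1.9$$
With w clusters, each of issue width IW/w:
$$w\left(\frac{IW}{w}\right)^k = w^{1-k}\,(IW)^k \ll (IW)^k$$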
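Plugging numbers into the scaling relation shows the benefit: splitting issue width IW across w clusters multiplies EPC/IPC by w^(1-k). This sketch assumes IW = 8 and uses k = 1.9 from the slide; it deliberately ignores inter-cluster communication and steering overheads, which a real design must pay.

```python
# Sketch: EPC/IPC scaling of one wide core vs. w narrower clusters.
# Units are arbitrary; only the ratio between the two designs is meaningful.

def conventional_cost(iw: float, k: float) -> float:
    """EPC/IPC ~ IW**k for a single centralized design of issue width IW."""
    return iw ** k


def clustered_cost(iw: float, w: int, k: float) -> float:
    """w clusters, each of issue width IW/w: w * (IW/w)**k."""
    return w * (iw / w) ** k


IW, K = 8, 1.9   # IW = 8 is an assumption; K = 1.9 is the upper value quoted above
for w in (1, 2, 4):
    ratio = clustered_cost(IW, w, K) / conventional_cost(IW, K)
    print(f"w = {w}: clustered / conventional = {ratio:.3f} (w**(1-k) = {w ** (1 - K):.3f})")
```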
Multi-Cluster vs. Conventional Results
[Chart: conventional design vs. multi-cluster configurations 1x4, 1x6, 1x8, 2x4, 2x6, 4x2, 4x4]
Up to 1/2 the energy at the same IPC, or 20% better IPC at the same energy
Insertion into PACC
• Implement the CPU as a nominal 4-cluster configuration
• Modify instruction issue to target a variable number of clusters
  - Equivalent need for separating memory disambiguation units
• Make this a runtime-settable parameter
  - Unused clusters turned off
• Additional CPU options
  - Implement a selected subset of “wide word” & VLIW-like operations within a cluster
  - Utilize unused clusters for additional concurrent threads
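A toy model of making the cluster count a runtime-settable parameter. The class name, the round-robin steering policy and the on/off representation are all hypothetical; the slides do not specify how instructions are steered, and a real steering unit would follow register dependences rather than rotate blindly.

```python
from dataclasses import dataclass, field


@dataclass
class MorphableCPU:
    """Toy model of a nominal 4-cluster CPU whose active-cluster count is set at run time."""
    total_clusters: int = 4
    active: int = 4
    _next: int = field(default=0, repr=False)   # next cluster for round-robin steering

    def set_active_clusters(self, n: int) -> None:
        if not 1 <= n <= self.total_clusters:
            raise ValueError(f"active clusters must be in 1..{self.total_clusters}")
        self.active = n
        self._next = 0

    def powered(self) -> list[bool]:
        """Clusters at or beyond the active count are modeled as turned off."""
        return [i < self.active for i in range(self.total_clusters)]

    def steer(self, n_instructions: int) -> list[int]:
        """Assign each instruction to an active cluster, round-robin (hypothetical policy)."""
        assignment = []
        for _ in range(n_instructions):
            assignment.append(self._next)
            self._next = (self._next + 1) % self.active
        return assignment


cpu = MorphableCPU()
cpu.set_active_clusters(2)      # downshift: two clusters on, two off
assignments = cpu.steer(5)
print(assignments, cpu.powered())
```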
Another Starting Point: Low Energy Caches & Register Files
• Approach: exploit locality to reduce the energy requirements of on-chip storage resources
• Example: multiple line buffers
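The multiple-line-buffer idea can be sketched as follows, assuming a small set of fully associative, LRU-managed buffers that each hold one recently accessed cache line: an access that hits a buffer avoids reading the cache array. The function name, 32-byte line size and buffer count are assumptions, and the sketch counts events rather than joules.

```python
from collections import OrderedDict


def line_buffer_stats(addresses, line_bytes=32, n_buffers=4):
    """Count (buffer_hits, array_reads) when n_buffers line buffers front the cache array.

    A hit in a line buffer is served without reading the cache array; a miss reads
    the array and the line replaces the least recently used buffer.
    """
    buffers = OrderedDict()   # line number -> None, least recently used first
    hits = reads = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in buffers:
            hits += 1
            buffers.move_to_end(line)          # mark as most recently used
        else:
            reads += 1                         # full cache-array access
            buffers[line] = None
            if len(buffers) > n_buffers:
                buffers.popitem(last=False)    # evict the least recently used line
    return hits, reads


# Two interleaved streams (e.g. two arrays walked in one loop), 4-byte words.
stream = [a for i in range(0, 32, 4) for a in (i, 4096 + i)]
for n in (1, 2):
    print(n, "buffer(s):", line_buffer_stats(stream, n_buffers=n))
```

With one buffer the two streams evict each other on every access; with two buffers almost every access is served from a buffer, which is the locality the approach exploits.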
Storage System Morphs
• Exploit locality to reduce dynamic AND static energy dissipation of on-chip storage resources:
  - Selective substrate biasing to reduce leakage: reverse body bias removed when the storage component is accessed
  - Clustered data placement to maximize accesses to each partition within on-chip and off-chip RAMs
  - Compiler/OS prefetching to avoid/reduce turn-on delay
• Changeable widths of interconnect & storage resources
  - Sub-banking for caches and on-chip/off-chip RAM
  - FU-driven selection of the activation width of the dispatch buffer, reservation stations and data register files
  - Operand-width driven activation of FU slices
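Operand-width driven activation of FU slices can be illustrated with a hypothetical 64-bit adder built from four 16-bit slices: only the low-order slices needed to hold the operands, plus a possible carry, are switched on. Slice width, slice count and the unsigned-only restriction are assumptions of this sketch.

```python
SLICE_BITS = 16   # assumed slice width
N_SLICES = 4      # assumed 64-bit datapath


def active_slices_for_add(a: int, b: int) -> int:
    """Number of low-order slices to enable for an unsigned add of a and b."""
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("this sketch only handles unsigned 64-bit operands")
    width = max(a.bit_length(), b.bit_length()) + 1   # +1 for a possible carry out
    slices = -(-width // SLICE_BITS)                   # ceiling division
    return min(slices, N_SLICES)                       # a full-width add wraps mod 2**64


for a, b in [(3, 5), (0xFFFF, 1), (2**40, 7)]:
    print(f"{a} + {b}: {active_slices_for_add(a, b)} slice(s) active")
```

The +1 bit is conservative: it enables an extra slice whenever a carry is possible, even if none occurs. Signed operands would need sign-extension detection instead.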
ISA Extensions with Energy Reduction Potential
• VLIW-like multiple-move instructions
  - Use the compiler to optimize the number of moves/energy
  - Useful for many signal processing loops and numerical computations
• “Wide word” multiple operations per instruction
  - Utilize existing bandwidth more completely
• Inclusion of simultaneous multi-threading extensions
  - Allow for pipelines without costly hazard detection/forwarding
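As a software analogy for a “wide word” operation, the sketch below uses SWAR (SIMD within a register) to perform four 16-bit additions with a single 64-bit add, so the full datapath width does useful work. The lane layout and helper names are illustrative; the slides do not define the actual instruction encoding.

```python
MASK64 = (1 << 64) - 1
HIGH = 0x8000_8000_8000_8000   # top bit of each 16-bit lane
LOW = HIGH ^ MASK64            # low 15 bits of each lane


def swar_add16(a: int, b: int) -> int:
    """Add four packed 16-bit lanes (each mod 2**16) with one 64-bit add."""
    partial = (a & LOW) + (b & LOW)                 # 15-bit lane sums cannot carry out of a lane
    return (partial ^ ((a ^ b) & HIGH)) & MASK64    # fix up each lane's top bit


def pack(lanes):
    """Pack four 16-bit values into one 64-bit word; lanes[0] is least significant."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFFFF) << (16 * i)
    return word


def unpack(word):
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


result = unpack(swar_add16(pack([1, 0xFFFF, 300, 40000]), pack([2, 1, 700, 40000])))
print(result)   # [3, 0, 1000, 14464]: lanes wrap independently, no carry crosses lanes
```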
Run-Time Considerations
• Applications must have the freedom to provide:
  - Expected energy/performance of their code
  - Requests for levels of service
• But only the run-time sees the global picture:
  - All currently running applications & their requests
  - Existing energy/power resources and mission profiles
  - Measurements of current activities
• Run-time modifications: changing the “energy gear”
  - Number of clusters per thread
  - Number of threads
  - Active width of on-chip storage resources & substrate biases
  - Active width of off-chip memory & interfaces
  - Placement of data within the hierarchy
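The run-time’s gear choice can be sketched as: sum the applications’ service requests, keep the configurations whose peak IPC × F can meet that demand, pick the one with the lowest energy per instruction (EPC/IPC), and clock it just fast enough. The gear table, clock limit and function names are hypothetical, and the sketch ignores reconfiguration and wake-up costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gear:
    clusters: int
    ipc: float      # sustained instructions per cycle (hypothetical)
    epc_nj: float   # energy per cycle in nanojoules (hypothetical)


# Hypothetical gear table; real values would come from baseline characterization.
GEARS = [Gear(1, 0.9, 2.0), Gear(2, 1.6, 4.5), Gear(4, 2.6, 10.0)]
F_MAX_HZ = 100e6    # assumed maximum clock rate


def select_gear(requests_ips):
    """Return (gear, clock_hz, power_w) for the summed demand in instructions/second."""
    demand = sum(requests_ips)
    feasible = [g for g in GEARS if g.ipc * F_MAX_HZ >= demand]
    if not feasible:
        raise RuntimeError("total demand exceeds peak capacity")
    best = min(feasible, key=lambda g: g.epc_nj / g.ipc)   # lowest energy per instruction
    clock_hz = demand / best.ipc                             # IPC x F just matches demand
    return best, clock_hz, best.epc_nj * 1e-9 * clock_hz     # Power = EPC x F


for requests in ([30e6, 20e6], [120e6], [200e6]):
    gear, f, p = select_gear(requests)
    print(f"demand {sum(requests) / 1e6:.0f} MIPS -> {gear.clusters} cluster(s), "
          f"{f / 1e6:.1f} MHz, {p:.3f} W")
```

With this particular table the narrow gear wins whenever it can keep up, and wider gears are engaged only when demand requires them.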
Determining the Gear: Reconfiguration Algorithms
• Approach:
  - Use powerful parallel searches (e.g., genetic algorithms, neural nets), possibly including hardware, to determine the best-performing configuration
• Payoff:
  - Achieve high autonomy on board spacecraft
  - Find the best schedule for the highest science return with the lowest power consumption
  - Maintain functionality under changes in operating conditions
• Objective:
  - Develop a reconfigurable computing capability that will allow:
    - Self-reconfiguration and adaptation to unforeseen conditions
    - Faster, cheaper development cycles
Outgrowth of JPL’s Evolvable Computing Program
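To make the search concrete, here is a small genetic algorithm (one of the search methods named above) over a hypothetical configuration space of clusters, active cache ways and supply voltage. The performance and power model is invented for illustration; the search minimizes power subject to a performance target, with every configuration that misses the target ranked below every one that meets it.

```python
import random

# Hypothetical configuration knobs ("genes"); values are illustrative only.
CLUSTERS = [1, 2, 3, 4]
CACHE_WAYS = [1, 2, 4]          # active cache ways
VOLTAGES = [1.0, 1.2, 1.5]      # supply levels, volts
GENES = (CLUSTERS, CACHE_WAYS, VOLTAGES)


def evaluate(cfg):
    """Made-up (performance, power) model for a (clusters, ways, volts) configuration."""
    clusters, ways, volts = cfg
    perf = clusters ** 0.8 * (1 + 0.1 * ways) * volts
    power = (clusters ** 1.2 + 0.3 * ways) * volts ** 2
    return perf, power


def fitness(cfg, perf_needed):
    """Higher is better: minimize power; configs missing the target rank below all that meet it."""
    perf, power = evaluate(cfg)
    if perf >= perf_needed:
        return -power
    return -1e3 - (perf_needed - perf)


def random_cfg(rng):
    return tuple(rng.choice(options) for options in GENES)


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(cfg, rng):
    """Re-draw one randomly chosen gene."""
    genes = list(cfg)
    i = rng.randrange(len(GENES))
    genes[i] = rng.choice(GENES[i])
    return tuple(genes)


def genetic_search(perf_needed, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_cfg(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, perf_needed), reverse=True)
        parents = pop[: pop_size // 2]              # truncation selection keeps the best half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, perf_needed))


best = genetic_search(perf_needed=2.0)
print("best config:", best, "-> (perf, power) =", evaluate(best))
```

This toy space has only 36 points and could be searched exhaustively; a GA only pays off when the configuration space is too large to enumerate.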
Program Plan
[Schedule chart: Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval, over a 0 to 2 year timeline]
Optional 3rd year:
• High-level design & demo on FPLA or MOSIS
• Prototype of run-time
• Investigation of needed program development environment
• Demo in JPL test bed
• Analysis for insertion into a real JPL mission
Expected Deliverables
• Benchmark suite & corresponding mission energy profiles
• Detailed morphable architecture
• System simulator with energy & performance projections, and evaluation against profiles
• Demonstration of data placement & architectural adaptation algorithms
• Specification of energy-aware run-time & API
Some Recent References
• V. Zyuban and P. M. Kogge, “Inherently Lower-Power High-Performance Superscalar Architectures,” submitted to IEEE Trans. on Computers.
• V. Zyuban and P. M. Kogge, “Optimization of High-Performance Superscalar Architectures for Energy-Delay Product,” accepted for ISLPED 2000.
• K. Ghose, “Reducing Energy Requirements for Instruction Issue and Dispatch in Superscalar Processors,” accepted for ISLPED 2000.
• K. Ghose and M. B. Kamble, “Reducing Power in Superscalar Caches Using Subbanking, Multiple Line Buffers and Bit-Line Segmentation,” ISLPED’99, pp. 70-75.
• V. Zyuban and P. M. Kogge, “The Energy Complexity of Register Files,” ISLPED’98, pp. 305-310.
• K. Ghose and M. B. Kamble, “Energy-Efficient Cache Organizations for Superscalar Processors,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• V. Zyuban and P. M. Kogge, “Split Register File Architecture for Inherently Lower Power Architectures,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• J. T. Zawodny, J. B. Brockman, P. M. Kogge and E. Johnson, “Cache-In-Memory: A Lower Power Alternative,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• M. B. Kamble and K. Ghose, “Analytical Energy Dissipation Models for Low Power Caches,” ISLPED’97, pp. 143-148.
• M. B. Kamble and K. Ghose, “Energy-Efficiency of VLSI Caches: A Comparative Study,” IEEE 10th Int’l Conf. on VLSI Design, Jan. 1997, pp. 261-267.
“Just enough energy”