Power Awareness through Selective Dynamically Optimized Traces
Roni Rosner, Yoav Almog, Micha Moffie, Naftali Schwartz and Avi Mendelson – Intel Labs, Haifa, Israel
Presenter: Ioana Burcea
Agenda
Motivation for PARROT = Power-Aware aRchitecture Running Optimized Traces
PARROT Concept and Architecture
Performance and Energy Results
Discussion
– What makes PARROT a power-aware architecture?
– What is new about this paper? / What are the contributions of this paper?
Motivation
We pay more energy per task
– Poor scaling of performance with power consumption
PARROT tries to change the balance
– Filtering Techniques to Improve Trace-Cache Efficiency (PACT 2001)
– Selecting Long Atomic Traces for High Coverage (ICS 2003)
– Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture (CGO 2004)
PARROT Concepts – The Big Picture
Based on the well-known cold/hot (10/90) paradigm
PARROT Principles
– Reuse: trace-cache centric
– Dynamic optimizations: more performance with less energy
– Focus: invest where it pays
– Pipeline decoupling: hybrid front-end, cold and hot execution pipelines
– Transparency: immune to s/w compatibility
Traces and Trace Selection
Decoded atomic traces
– Complex retirement & recovery in case of misprediction
– More aggressive optimizations
Trace Selection: deterministic criteria
– Capacity limitation: 64 uops
– Complete basic blocks
– Terminating CTI (control-transfer instructions): indirect jumps, software exceptions, backward taken branches
– Return instructions: procedure inlining
– Trace join
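The selection criteria above can be illustrated with a minimal sketch. This is not the paper's implementation. The `BasicBlock` type, its fields and the CTI labels are hypothetical names introduced here. The sketch assumes every basic block fits in 64 uops, treats returns as non-terminating (to mimic procedure inlining), and does not model trace join.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_UOPS = 64  # capacity limit stated on the slide

# CTI kinds that terminate a trace, per the slide (label strings are hypothetical)
TERMINATING_CTIS = {"indirect_jump", "sw_exception", "backward_taken"}


@dataclass
class BasicBlock:
    uops: int  # number of uops in the block (assumed <= MAX_UOPS)
    cti: str   # kind of control transfer that ends the block


def select_traces(blocks: Iterable[BasicBlock]) -> List[List[BasicBlock]]:
    """Group a dynamic stream of basic blocks into traces."""
    traces: List[List[BasicBlock]] = []
    current: List[BasicBlock] = []
    used = 0
    for bb in blocks:
        # Only complete basic blocks are allowed: if this block would
        # overflow the trace, close the trace before adding it.
        if current and used + bb.uops > MAX_UOPS:
            traces.append(current)
            current, used = [], 0
        current.append(bb)
        used += bb.uops
        # A terminating CTI closes the trace; anything else, including a
        # return, lets the trace continue into the next block.
        if bb.cti in TERMINATING_CTIS:
            traces.append(current)
            current, used = [], 0
    if current:
        traces.append(current)
    return traces
```

Because the criteria depend only on the instruction stream, the same stream always produces the same traces, which is what makes the scheme deterministic.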
Microarchitecture
Split-execution vs. unified-execution
– Foreground phase: fetch-to-execution pipeline
– Background phase (post-processing): trace selection and optimization
Microarchitecture (cont’d)
• Two predictors: GHR = Global History Register
• Branch predictor
• Trace predictor
• Deterministic trace build scheme
• Filtering mechanisms:
• The hot filter selects frequent traces from those executed on the cold pipeline
• The blazing filter selects for optimization the hottest traces
• Dynamic optimizations
• generic and core specific optimizations
• gradually applied (?)
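The two filtering stages can be sketched as a pair of execution counters. This is only an illustration under stated assumptions. The slides do not say how the filters are implemented or what thresholds they use, so the counter scheme, the `TraceFilters` class and both threshold values are hypothetical.

```python
from collections import Counter

HOT_THRESHOLD = 4       # hypothetical: cold executions before a trace becomes hot
BLAZING_THRESHOLD = 16  # hypothetical: hot executions before a trace is optimized


class TraceFilters:
    """Counter-based sketch of the hot and blazing filters."""

    def __init__(self, hot_threshold=HOT_THRESHOLD,
                 blazing_threshold=BLAZING_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.blazing_threshold = blazing_threshold
        self.cold_counts = Counter()  # executions seen on the cold pipeline
        self.hot_counts = Counter()   # executions seen on the hot pipeline
        self.hot = set()              # traces promoted by the hot filter
        self.optimized = set()        # traces selected by the blazing filter

    def execute(self, trace_id):
        """Record one execution and return the pipeline that ran it."""
        if trace_id in self.hot:
            self.hot_counts[trace_id] += 1
            # Blazing filter: only the hottest traces go to the optimizer.
            if self.hot_counts[trace_id] >= self.blazing_threshold:
                self.optimized.add(trace_id)
            return "hot"
        self.cold_counts[trace_id] += 1
        # Hot filter: frequent cold-pipeline traces are promoted.
        if self.cold_counts[trace_id] >= self.hot_threshold:
            self.hot.add(trace_id)
        return "cold"
```

The point of the two stages is to spend optimization effort only on traces that have already proven frequent twice, in line with the "invest where it pays" principle.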
Simulation framework
An “in-house” proprietary performance and power simulator
Optimizations applied as different passes
– Optimization delay for one trace ~ 100 cycles
Energy simulation
– Power consumption matrix for each operation on each hardware unit
– Leakage: uniform in space (over the processor core and L2 cache) and in time, modeling a high temperature
LE = PMAX * (0.05 * M + 0.4 * K) * CYC
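As a worked example of the leakage formula, the sketch below evaluates LE for made-up inputs. The slides do not define M and K or give units, so the function treats them as opaque parameters. The function and argument names are hypothetical and the numbers are illustrative only.

```python
def leakage_energy(p_max, m, k, cycles):
    """Evaluate LE = PMAX * (0.05 * M + 0.4 * K) * CYC as written on the slide.

    m and k are passed through as given; their meaning is not stated
    in the slides, so no interpretation is assumed here.
    """
    return p_max * (0.05 * m + 0.4 * k) * cycles


# Illustrative inputs only: PMAX = 10, M = 1, K = 1, CYC = 1000
example = leakage_energy(10.0, 1.0, 1.0, 1000)
```

Since the expression is linear in CYC, a configuration that finishes the same task in fewer cycles pays proportionally less leakage energy.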
Configuration Space
Experimental Evaluation
Metrics
– IPC
– Total energy
– Cubic-MIPS-per-WATT (CMPW): a measure of the design tradeoffs between power and performance
Benchmarks
– SpecInt2000
– SpecFP2000
– Office
– Multimedia
– DotNet
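The CMPW metric can be made concrete with a small sketch. It assumes the usual reading of the name, MIPS cubed divided by average power in watts. The slides do not spell out the formula, and the function and argument names are hypothetical.

```python
def cmpw(instructions, seconds, joules):
    """Cubic-MIPS-per-Watt, assuming CMPW = MIPS**3 / average watts."""
    mips = instructions / seconds / 1e6  # millions of instructions per second
    watts = joules / seconds             # average power over the run
    return mips ** 3 / watts


# Illustrative numbers only: the same power budget at twice the performance
baseline = cmpw(2e9, 1.0, 10.0)
faster = cmpw(4e9, 1.0, 10.0)
```

Under this reading the metric weights performance heavily. Doubling MIPS at constant power raises CMPW eightfold, while halving power at constant MIPS only doubles it.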
Performance and Power Awareness
Extreme Microarchitectural Alternatives
Hot Code Predictability
Trace-cache Fetch Coverage
Optimizer Capabilities
Energy Breakdown
Their Conclusions…
Our Conclusions
What makes PARROT a power-aware architecture?
What is new about this paper? / What are the contributions of this paper?
– rePlay (?)