Individual Voltage Scaling in Logic and Memory Circuits towards Runtime Energy Optimization in Processors
Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera
Graduate School of Informatics, Kyoto University, Japan
Energy Reduction by Dynamic Voltage Scaling
𝑉DD- and 𝑉th-tuning techniques for energy minimization
[Figure: energy vs. delay curves, split into dynamic and static energy, for supply voltage (𝑉DD) tuning and threshold voltage (𝑉th) tuning]
Supply voltage tuning (𝑉DD) by DVFS: Dynamic Voltage and Frequency Scaling
Threshold voltage tuning (𝑉th) by ABB: Adaptive Body Biasing
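The trade-off on this slide can be reproduced with a toy model. The sketch below assumes an alpha-power-law delay and an exponential subthreshold leakage; the constants (`ALPHA_POWER`, `SWING`, `I0`, `C_EFF`) are hypothetical, normalized values, not SOTB 65-nm parameters, and each cycle is assumed to run at Fmax so the cycle time equals the circuit delay.

```python
import math

# Hypothetical, normalized toy constants (not fitted to any real process).
ALPHA_POWER = 1.3   # alpha-power-law exponent
SWING = 0.04        # n * kT/q in volts, sets the leakage slope
I0 = 100.0          # leakage prefactor
C_EFF = 1.0         # effective switched capacitance


def delay(vdd, vth):
    """Circuit delay (requires vdd > vth); rises as VDD drops or Vth rises."""
    return vdd / (vdd - vth) ** ALPHA_POWER


def energy_per_cycle(vdd, vth, activity=1.0):
    """Return (dynamic, static) energy of one cycle run at Fmax = 1 / delay."""
    dynamic = activity * C_EFF * vdd ** 2
    static = vdd * I0 * math.exp(-vth / SWING) * delay(vdd, vth)
    return dynamic, static


# Supply voltage tuning (DVFS-like): sweep VDD at a fixed Vth of 0.3 V.
vdd_sweep = [(v, delay(v, 0.3), *energy_per_cycle(v, 0.3)) for v in (1.2, 1.0, 0.8, 0.6)]
# Threshold voltage tuning (ABB-like): sweep Vth at a fixed VDD of 1.0 V.
vth_sweep = [(t, delay(1.0, t), *energy_per_cycle(1.0, t)) for t in (0.2, 0.3, 0.4, 0.5)]

for name, sweep in (("VDD", vdd_sweep), ("Vth", vth_sweep)):
    for x, d, e_dyn, e_stat in sweep:
        print(f"{name}={x:.1f} V  delay={d:.3f}  E_dyn={e_dyn:.3f}  E_stat={e_stat:.5f}")
```

In this model, lowering 𝑉DD cuts dynamic energy at the cost of delay, and raising 𝑉th cuts static energy at the cost of delay, which is the qualitative behavior the two plots sketch.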
Minimum Energy Point Tracking (MEP Tracking)
Energy minimization by voltage scaling under a given frequency
Minimum Energy Point: MEP (best combination of 𝑉DD and 𝑉th)
MEPT example: Renesas SOTB 65-nm cell-based memory
Target: MEP tracking technique for processors
[Figure: supply voltage [V] (0.4 to 1.2) vs. body bias [V] (0 to −2.0, from small 𝑉th to large 𝑉th), showing the performance contour and energy contours labeled 140 pJ and 90 pJ]
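Under a given frequency, the MEP can be located by scanning 𝑉th and, for each value, lowering 𝑉DD until the delay just meets the cycle-time target, i.e. staying on the performance contour. The sketch below does this by grid search plus bisection on a hypothetical toy model; it uses 𝑉th directly as the knob (abstracting away the body-bias-to-𝑉th mapping), and the 𝑉th range, the target `D0`, and all constants are assumptions.

```python
import math

ALPHA_POWER, SWING, I0, C_EFF = 1.3, 0.04, 100.0, 1.0  # hypothetical toy constants
VTH_GRID = [0.1 + 0.01 * i for i in range(51)]         # assumed Vth range 0.1..0.6 V


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA_POWER


def energy(vdd, vth, t_cycle, activity=1.0):
    """Dynamic energy plus leakage accumulated over a fixed cycle time."""
    return activity * C_EFF * vdd ** 2 + vdd * I0 * math.exp(-vth / SWING) * t_cycle


def min_vdd(vth, d_target, vdd_max=2.0):
    """Bisection for the lowest VDD with delay(vdd, vth) <= d_target."""
    # lo violates the target; hi meets it (true for this toy model's Vth range)
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= d_target:
            hi = mid
        else:
            lo = mid
    return hi


def find_mep(d_target, activity=1.0):
    """Scan Vth; for each, set VDD on the performance contour; keep the minimum."""
    best = None
    for vth in VTH_GRID:
        vdd = min_vdd(vth, d_target)
        e = energy(vdd, vth, d_target, activity)
        if best is None or e < best[2]:
            best = (vth, vdd, e)
    return best


D0 = delay(0.8, 0.3)   # assumed cycle time fixed by the target frequency
vth, vdd, e = find_mep(D0)
print(f"MEP: Vth={vth:.2f} V, VDD={vdd:.3f} V, E={e:.3f}")
```

Because delay falls monotonically as 𝑉DD rises in this model, bisection finds the lowest feasible 𝑉DD; an on-chip tracker would read delay and energy monitors instead of evaluating formulas.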
Activity Factor Dependency of MEP Curves (Activity 100% → 10%)
Activity factor: an important parameter determining MEPs
Issue: MEPs heavily depend on activity factors (toggle rates)
[Figure: supply voltage [V] (0.4 to 1.2) vs. body bias [V] (0 to −2.0, from small 𝑉th to large 𝑉th) with the performance contour; 20 pJ at the unoptimized point vs. 12 pJ at the optimized point]
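The activity dependency can be illustrated with the same kind of toy model: only the dynamic term scales with the activity factor, so a lower activity makes leakage relatively more important and pushes the MEP toward a larger 𝑉th (and hence a higher 𝑉DD on the performance contour). In the sketch below, the activities 1.0 and 0.1 echo the 100% → 10% case on the slide, while every constant is a hypothetical assumption.

```python
import math

ALPHA_POWER, SWING, I0, C_EFF = 1.3, 0.04, 100.0, 1.0  # hypothetical toy constants
VTH_GRID = [0.1 + 0.01 * i for i in range(51)]         # assumed Vth range 0.1..0.6 V


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA_POWER


def energy(vdd, vth, t_cycle, activity):
    """Only the dynamic term is scaled by the activity factor."""
    return activity * C_EFF * vdd ** 2 + vdd * I0 * math.exp(-vth / SWING) * t_cycle


def min_vdd(vth, d_target, vdd_max=2.0):
    """Lowest VDD meeting the delay target (delay falls as VDD rises)."""
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= d_target:
            hi = mid
        else:
            lo = mid
    return hi


def find_mep(d_target, activity):
    """Return (vth, vdd) minimizing energy on the performance contour."""
    vth = min(VTH_GRID, key=lambda t: energy(min_vdd(t, d_target), t, d_target, activity))
    return vth, min_vdd(vth, d_target)


D0 = delay(0.8, 0.3)                       # assumed cycle-time target
vth_hi, vdd_hi = find_mep(D0, activity=1.0)
vth_lo, vdd_lo = find_mep(D0, activity=0.1)

e_unopt = energy(vdd_hi, vth_hi, D0, 0.1)  # keep the 100%-activity MEP at 10% activity
e_opt = energy(vdd_lo, vth_lo, D0, 0.1)    # re-track the MEP for 10% activity
print(f"MEP @ 100%: Vth={vth_hi:.2f} V, VDD={vdd_hi:.3f} V")
print(f"MEP @ 10%:  Vth={vth_lo:.2f} V, VDD={vdd_lo:.3f} V")
print(f"Energy @ 10%: unoptimized={e_unopt:.3f}, optimized={e_opt:.3f}")
```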
Overview of This Work
Individual voltage scaling problem in logic and memory circuits
Heuristic algorithm for runtime optimization
[Figure: supply voltage [V] (0.4 to 1.2) vs. body bias [V] (0 to −2.0, from small 𝑉th to large 𝑉th) with the performance contour; the MEP with 100% activity ≅ logic circuits, the MEP with 10% activity ≅ on-chip memory]
Outline
• Background
• Individual Voltage Scaling Problem
• Silicon Measurement
• Conclusion
(Existing) Uniform Voltage Scaling Problem
$$\min_{V_{\mathrm{DD}},\,V_{\mathrm{th}} \in \mathbb{R}} E \quad \text{s.t.} \quad D \le D_0$$
[Figure: MEP curve in the (𝑉DD, 𝑉th) plane; starting from an initial point, the search finishes at the solution on the performance contour 𝐷 = 𝐷0]
• Existing approach: runtime MEP tracking [5]
- Tunes 𝑉DD and 𝑉th iteratively, guided by energy & delay monitoring (MEP check) of the circuit energy and circuit delay against the target performance
- Requires only simple circuits
- Can track MEPs at runtime even if the target performance, temperature, or activity changes dynamically
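The iterative tuning in runtime MEP tracking can be pictured as a local search that needs only energy and delay readings. The sketch below is a generic hill-climbing loop written under that assumption; it is not the circuit-level procedure of [5]. The "monitors" are hypothetical toy formulas, 𝑉DD is re-solved after each 𝑉th move so the delay stays on the performance contour, and the step sizes and 𝑉th limits are arbitrary choices.

```python
import math

ALPHA_POWER, SWING, I0, C_EFF = 1.3, 0.04, 100.0, 1.0  # hypothetical toy constants
VTH_MIN, VTH_MAX = 0.1, 0.6                            # assumed Vth tuning range [V]


def delay(vdd, vth):
    """Stand-in for a delay monitor (alpha-power-law model)."""
    return vdd / (vdd - vth) ** ALPHA_POWER


def energy(vdd, vth, t_cycle, activity=1.0):
    """Stand-in for an energy monitor: dynamic + static energy per cycle."""
    return activity * C_EFF * vdd ** 2 + vdd * I0 * math.exp(-vth / SWING) * t_cycle


def min_vdd(vth, d_target, vdd_max=2.0):
    """Lowest VDD that still meets the target delay (stay on the contour)."""
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= d_target:
            hi = mid
        else:
            lo = mid
    return hi


def track_mep(d_target, vth=0.5, step=0.05, min_step=1e-3, activity=1.0):
    """Hill climbing on Vth with VDD re-solved after every move.

    A move is kept only if the monitored energy drops; otherwise the search
    reverses direction and halves the step, so the loop always terminates.
    """
    vdd = min_vdd(vth, d_target)
    e = energy(vdd, vth, d_target, activity)
    history = [e]
    direction = -1.0
    while step > min_step:
        cand_vth = vth + direction * step
        if VTH_MIN <= cand_vth <= VTH_MAX:
            cand_vdd = min_vdd(cand_vth, d_target)
            cand_e = energy(cand_vdd, cand_vth, d_target, activity)
            if cand_e < e:
                vth, vdd, e = cand_vth, cand_vdd, cand_e
                history.append(e)
                continue
        direction = -direction
        step *= 0.5
    return vth, vdd, e, history


D0 = delay(0.8, 0.3)   # assumed target cycle time
vth, vdd, e, history = track_mep(D0)
print(f"tracked MEP: Vth={vth:.3f} V, VDD={vdd:.3f} V, E={e:.3f} after {len(history) - 1} moves")
```

Re-running `track_mep` with a new target or activity mimics how a tracker follows a moving MEP; only relative energy comparisons are needed, not absolute accuracy.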
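Individual Voltage Scaling Problem
This work: heuristic algorithm for runtime voltage scaling
$$\min_{V_{\mathrm{DD,L}},\,V_{\mathrm{th,L}},\,V_{\mathrm{DD,M}},\,V_{\mathrm{th,M}} \in \mathbb{R}} E_{\mathrm{L}} + E_{\mathrm{M}} \quad \text{s.t.} \quad D_{\mathrm{L}} + D_{\mathrm{M}} \le D_0$$
[Figure: logic (delay 𝐷L, voltages 𝑉DD,L and 𝑉th,L) and memory (delay 𝐷M, voltages 𝑉DD,M and 𝑉th,M) share the delay constraint 𝐷0; power vs. delay curves show that voltage scaling in logic combined with a voltage boost in memory can bring a huge energy saving]
• Issue: no runtime algorithms exist, due to the complex delay assignment between 𝐷L and 𝐷M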
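One way to see why the delay assignment matters is to solve a small instance by brute force: sweep the split of 𝐷0 between 𝐷L and 𝐷M, find each block's minimum energy under its own delay budget, and keep the best sum. The sketch below does this on a hypothetical toy model in which logic and memory share one device model but have activities 1.0 and 0.1 (an assumption echoing the 100%/10% activities on the earlier slides); this exhaustive sweep is only an offline reference point, not a runtime algorithm.

```python
import math

ALPHA_POWER, SWING, I0, C_EFF = 1.3, 0.04, 100.0, 1.0  # hypothetical toy constants
VTH_GRID = [0.1 + 0.01 * i for i in range(51)]         # assumed Vth range 0.1..0.6 V
K_L, K_M = 1.0, 1.0   # hypothetical delay scales of the logic and memory paths
A_L, A_M = 1.0, 0.1   # hypothetical activity factors of logic and memory


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA_POWER


D0 = 2.0 * delay(0.8, 0.3)   # assumed total delay constraint D_L + D_M <= D0


def energy(vdd, vth, activity):
    """Per-cycle energy; leakage flows for the whole cycle time D0."""
    return activity * C_EFF * vdd ** 2 + vdd * I0 * math.exp(-vth / SWING) * D0


def min_vdd(vth, d_target, vdd_max=2.0):
    """Lowest VDD with delay <= d_target, or None if unreachable."""
    if delay(vdd_max, vth) > d_target:
        return None
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= d_target:
            hi = mid
        else:
            lo = mid
    return hi


def block_min_energy(budget, k, activity):
    """Minimum energy of one block whose delay k * delay(...) must fit in budget."""
    best = None
    for vth in VTH_GRID:
        vdd = min_vdd(vth, budget / k)
        if vdd is not None:
            e = energy(vdd, vth, activity)
            best = e if best is None else min(best, e)
    return best


# Individual scaling: sweep the delay split D_L = frac * D0, D_M = (1 - frac) * D0.
individual = None
for i in range(5, 16):
    frac = i / 20
    e_l = block_min_energy(frac * D0, K_L, A_L)
    e_m = block_min_energy((1 - frac) * D0, K_M, A_M)
    if e_l is not None and e_m is not None:
        if individual is None or e_l + e_m < individual[0]:
            individual = (e_l + e_m, frac)


def uniform_energy(vth):
    """Uniform scaling: one (VDD, Vth) pair shared by both blocks."""
    vdd = min_vdd(vth, D0 / (K_L + K_M))
    return energy(vdd, vth, A_L) + energy(vdd, vth, A_M)


uniform = min(uniform_energy(t) for t in VTH_GRID)
print(f"uniform: {uniform:.3f}  individual: {individual[0]:.3f} (D_L = {individual[1]:.2f} * D0)")
```

Because the sweep includes the split that uniform scaling implies (here 0.5 with equal delay scales), its result can never be worse than the uniform solution in this model; the cost is one block-level search per candidate split.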
Various Strategies in Uniform Voltage Scaling
[Figure: delay contour 𝐷L + 𝐷M = 𝐷0 in the (𝑉DD, 𝑉th) plane with three candidate operating points: logic MEP (𝐸L min.), memory MEP (𝐸M min.), and processor MEP (𝐸L + 𝐸M min.)]
• Logic MEP: 𝐸L optimized, but 𝐸M NOT optimized
• Processor MEP: 𝐸L and 𝐸M balanced ⇒ solution in uniform voltage scaling
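The three operating points can be located on the uniform delay contour with the same hypothetical toy model. Assuming shared voltages and equal delay scales for logic and memory, staying on 𝐷L + 𝐷M = 𝐷0 fixes the common delay, so a single scan over 𝑉th yields 𝐸L, 𝐸M, and their sum; activities of 1.0 (logic) and 0.1 (memory) are assumed.

```python
import math

ALPHA_POWER, SWING, I0, C_EFF = 1.3, 0.04, 100.0, 1.0  # hypothetical toy constants
VTH_GRID = [0.1 + 0.01 * i for i in range(51)]         # assumed Vth range 0.1..0.6 V
A_L, A_M = 1.0, 0.1                                    # assumed activities


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA_POWER


D0 = 2.0 * delay(0.8, 0.3)   # assumed constraint; with equal scales D_L = D_M


def energy(vdd, vth, activity):
    """Per-cycle energy; leakage accumulates over the whole cycle D0."""
    return activity * C_EFF * vdd ** 2 + vdd * I0 * math.exp(-vth / SWING) * D0


def min_vdd(vth, d_target, vdd_max=2.0):
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= d_target:
            hi = mid
        else:
            lo = mid
    return hi


# Walk the uniform delay contour D_L + D_M = D0 (VDD,L = VDD,M, Vth,L = Vth,M).
contour = []
for vth in VTH_GRID:
    vdd = min_vdd(vth, D0 / 2)
    contour.append((vth, vdd, energy(vdd, vth, A_L), energy(vdd, vth, A_M)))

logic_mep = min(contour, key=lambda p: p[2])              # E_L min.
memory_mep = min(contour, key=lambda p: p[3])             # E_M min.
processor_mep = min(contour, key=lambda p: p[2] + p[3])   # E_L + E_M min.

for name, (vth, vdd, e_l, e_m) in (("logic", logic_mep), ("memory", memory_mep),
                                   ("processor", processor_mep)):
    print(f"{name:9s} MEP: Vth={vth:.2f} V, VDD={vdd:.3f} V, E_L+E_M={e_l + e_m:.3f}")
```

In this toy setting the processor MEP falls between the logic and memory MEPs in 𝑉th, since its effective activity lies between the two blocks' activities.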
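Concept of the Proposed Heuristic Algorithm
[Figure: logic voltages (𝑉DD,L, 𝑉th,L) and memory voltages (𝑉DD,M, 𝑉th,M) on the delay contour 𝐷L + 𝐷M = 𝐷0 in the (𝑉DD, 𝑉th) plane]
• Enables local minimum energy point operation
• Key point: 𝐷L and 𝐷M are each constant over the delay contour, so the logic and memory voltages can move along it independently while 𝐷L + 𝐷M = 𝐷0 still holds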
Simple Heuristic Algorithm for Individual Voltage Scaling
Step 1: Logic energy optimization (𝑉DD,L = 𝑉DD,M & 𝑉th,L = 𝑉th,M)
1. Uniform voltage tuning in logic & mem., which allows existing techniques to be applied
2. Find the logic MEP
Step 2: Memory energy optimization (𝑉DD,L ≠ 𝑉DD,M & 𝑉th,L ≠ 𝑉th,M)
1. Fix the logic voltages and tune only the mem. voltages (𝑉DD,M & 𝑉th,M)
2. Find the memory MEP
⇒ Enables runtime energy optimization through local minimum energy point operation
[Figure: starting from the initial point on the delay contour 𝐷L + 𝐷M = 𝐷0, Step 1 reaches the logic MEP; Step 2 keeps the logic voltages fixed and tunes only the memory voltages to reach the memory MEP]
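The two steps can be sketched on the same hypothetical toy model. Step 1 scans the uniform contour for the logic MEP (minimum 𝐸L); Step 2 freezes the logic voltages, which fixes 𝐷L, and searches only the memory voltages under the remaining budget 𝐷0 − 𝐷L. Grid search stands in for the runtime tracking loop, and the constants, equal delay scales, and activities (1.0 for logic, 0.1 for memory) are assumptions.

```python
import math

ALPHA_POWER, SWING, I0, C_EFF = 1.3, 0.04, 100.0, 1.0  # hypothetical toy constants
VTH_GRID = [0.1 + 0.01 * i for i in range(51)]         # assumed Vth range 0.1..0.6 V
A_L, A_M = 1.0, 0.1                                    # assumed activities


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA_POWER


D0 = 2.0 * delay(0.8, 0.3)   # assumed constraint D_L + D_M <= D0
D_UNIFORM = D0 / 2           # per-block delay on the uniform contour


def energy(vdd, vth, activity):
    return activity * C_EFF * vdd ** 2 + vdd * I0 * math.exp(-vth / SWING) * D0


def min_vdd(vth, d_target, vdd_max=2.0):
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= d_target:
            hi = mid
        else:
            lo = mid
    return hi


# Step 1: uniform voltage tuning, then pick the logic MEP (E_L min.).
vth_l = min(VTH_GRID, key=lambda t: energy(min_vdd(t, D_UNIFORM), t, A_L))
vdd_l = min_vdd(vth_l, D_UNIFORM)
d_l = delay(vdd_l, vth_l)                 # logic delay, frozen from here on

# Step 2: fix logic voltages; tune only memory voltages within D0 - D_L.
d_m_budget = D0 - d_l
vth_m = min(VTH_GRID, key=lambda t: energy(min_vdd(t, d_m_budget), t, A_M))
vdd_m = min_vdd(vth_m, d_m_budget)

e_step1 = energy(vdd_l, vth_l, A_L) + energy(vdd_l, vth_l, A_M)
e_step2 = energy(vdd_l, vth_l, A_L) + energy(vdd_m, vth_m, A_M)
print(f"logic:  VDD={vdd_l:.3f} V, Vth={vth_l:.2f} V")
print(f"memory: VDD={vdd_m:.3f} V, Vth={vth_m:.2f} V")
print(f"E after step 1: {e_step1:.3f}, after step 2: {e_step2:.3f}")
```

In this model Step 2 can only lower the total energy relative to the Step 1 point, because the memory keeps at least the delay budget it had on the uniform contour while its voltages are re-optimized.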
Case Study: 32-bit RISC Processor
• Renesas SOTB 65-nm
• On-chip memory (standard-cell based)
- 4 kB I-cache + TAG
- 8 kB I-SPM
- 16 kB D-SPM
• Target: supply voltage & body bias, set individually in logic and mem.
- Body bias for nMOSFETs in logic circuits is fixed at GND
• No level converters between logic and memory
[Figure: chip layout with I/O, logic (𝑉DD,L), memory (𝑉DD,M), body biases 𝑉BP,L, 𝑉BN,M, and 𝑉BP,M, and main memory (DCT loop)]
Activity Factor Dependency of Memory MEPs (𝑉DD,L = 𝑉DD,M & 𝑉BB,L = 𝑉BB,M)
[Figure: logic MEP and memory MEPs for 𝛼M = 1, 0.1, and 0.01, plotted as supply voltage [V] (0.4 to 1.2) vs. body bias [V] (0 to −2.0, from small 𝑉th to large 𝑉th), with the Fmax contour of the fabricated processor [MHz]]
• MEPs move to the upper right (higher 𝑉DD, larger 𝑉th) as activity 𝛼M decreases
𝛼M: memory activity factor
- 𝛼M = 1: activated in each clock cycle
- 𝛼M = 0.1: activated once in 10 clock cycles
- 𝛼M = 0.01: activated once in 100 clock cycles
Measurement Results of the Proposed Algorithm (𝛼M = 0.01)
Step 1
1. Uniform voltage scaling
2. Find the logic MEP
Step 2
1. Fix the logic voltages at the logic MEP
2. Tune only the memory voltages & find the memory MEP
⇒ Individual voltage tuning achieved by the proposed algorithm
[Figure: measured logic MEP and memory MEP plotted as supply voltage [V] (0.4 to 1.2) vs. body bias [V] (0 to −2.0, from small 𝑉th to large 𝑉th), with the Fmax contour of the fabricated processor [MHz]]
Energy Reduction by Individual Voltage Scaling (𝛼M = 0.01)
[Figure: total energy consumption [pJ/cycle] (0 to 100), broken down into logic dynamic, logic static, memory dynamic, and memory static energy, at Fmax = 4, 8, 20, and 29 MHz; reduction labels −15%, −16%, −13%, −10%]
• Up to 16% energy reduction by individual voltage scaling
Conclusion & Future Work
Conclusion
• Individual voltage scaling problem in logic and memory presented
- Key: activity factor gap between logic and memory circuits
• A heuristic algorithm proposed for runtime energy optimization
• Case study using a 32-bit RISC processor in a 65-nm process
- Up to 16% energy reduction compared with uniform voltage scaling
Future work
• Energy overhead compared with the global solution
• Energy overhead introduced by fine-grained voltage tuning, etc.
Energy Reduction by Individual Voltage Scaling (𝛼M = 0.1)
[Figure: total energy consumption [pJ/cycle] (0 to 100), broken down into logic dynamic, logic static, memory dynamic, and memory static energy, at Fmax = 4, 8, 20, and 29 MHz; reduction labels −11%, −9%, −7%, −5%]
• No energy improvement when 𝛼M = 1
Definition of 𝛼M
On-chip memory property
• No clock gating circuits
• Dynamic energy consumed in each clock cycle
⇒ The implemented on-chip memory has a large activity factor
• Parameter 𝛼M introduced to scale the activity factor
[Figure: the measured memory energy is split into dynamic energy (measured memory energy minus leakage energy) and static energy (leakage energy); the evaluated value is the dynamic energy × 𝛼M plus the static energy]
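The scaling on this slide reduces to a one-line formula: keep the measured static (leakage) energy and multiply only the measured dynamic part by 𝛼M. The sketch below assumes the dynamic part equals the measured memory energy minus the measured leakage energy; the function name and the example numbers are hypothetical.

```python
def evaluated_memory_energy(measured_total, measured_leakage, alpha_m):
    """Emulate a memory activity factor alpha_m from a measurement at alpha_m = 1.

    Assumes the memory (no clock gating) was active in every measured cycle, so
    dynamic energy = measured total - measured leakage.
    """
    if not 0.0 <= alpha_m <= 1.0:
        raise ValueError("alpha_m must lie in [0, 1]")
    dynamic = measured_total - measured_leakage
    return alpha_m * dynamic + measured_leakage


# Hypothetical per-cycle numbers in pJ, not measured data.
print(f"{evaluated_memory_energy(10.0, 2.0, 0.1):.2f}")  # prints 2.80
```

With 𝛼M = 1 the function returns the measured energy unchanged, and with 𝛼M = 0 only the leakage remains.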
System-Level Optimization Problem
The problem can be abstracted to system-level optimization
[Figure: a low-activity CPU (≃ memory) and a high-activity DSP (≃ logic), each with its own 𝑉DD and 𝑉th; CPU execution time (≃ 𝐷M) and DSP execution time (≃ 𝐷L) must together meet the deadline (≃ 𝐷0)]
Future work: applying the heuristic to system-level optimization