Sp12 CMPEN 411 L14 S.1
CMPEN 411 VLSI Digital Circuits
Spring 2012
Lecture 14: Designing for Low Power
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Reminders
Next lecture
Dynamic logic
- Reading assignment – Rabaey et al., 6.3
Review: CMOS Power Equations
$P = C_L V_{DD}^2 f + t_{sc} V_{DD} I_{peak} f + V_{DD} I_{leak}$
The three terms are, in order, the dynamic power, the short-circuit power, and the leakage power.
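To make the three terms concrete, here is a minimal Python sketch that evaluates each one separately. The operating point (load capacitance, supply, switching frequency, short-circuit window, peak and leakage currents) is entirely hypothetical and chosen only for illustration; it is not data from the lecture.

```python
def cmos_power(c_load, vdd, f, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage, total) power in watts.

    Implements P = C_L*VDD^2*f + t_sc*VDD*I_peak*f + VDD*I_leak,
    where f is the rate of 0->1 output transitions.
    """
    dynamic = c_load * vdd ** 2 * f
    short_circuit = t_sc * vdd * i_peak * f
    leakage = vdd * i_leak
    return dynamic, short_circuit, leakage, dynamic + short_circuit + leakage


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    p_dyn, p_sc, p_leak, p_tot = cmos_power(
        c_load=10e-15,  # 10 fF load
        vdd=1.2,        # 1.2 V supply
        f=500e6,        # 500 MHz effective 0->1 rate
        t_sc=20e-12,    # 20 ps short-circuit window
        i_peak=50e-6,   # 50 uA peak short-circuit current
        i_leak=10e-9,   # 10 nA leakage current
    )
    print(f"dynamic={p_dyn:.3e} W  short-circuit={p_sc:.3e} W  "
          f"leakage={p_leak:.3e} W  total={p_tot:.3e} W")
```

Only the dynamic and short-circuit terms scale with f; the leakage term is present even when the node never switches.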
Power and Energy Design Space
Columns: Design Time and Non-active Modules (constant throughput/latency); Run Time (variable throughput/latency)
Active (Dynamic) energy
- Design Time: Logic design, Reduced Vdd, Transistor sizing, Multi-Vdd
- Non-active Modules: Clock gating
- Run Time: DFS, DVS (Dynamic Frequency and Voltage Scaling)
Leakage (Standby) energy
- Design Time: Multi-VT, Stack effect, Pin ordering
- Non-active Modules: Sleep transistors, Multi-Vdd, Variable VT, Input control
- Run Time: Variable VT
Transistor Sizing for Minimum Energy
Device sizing COMBINED with supply voltage reduction is a very effective way to reduce the energy consumption of a logic network.
- Device sizing affects dynamic energy consumption.
- The gain is largest for networks with a large overall effective fan-out, $F = C_L / C_{g,1}$, where $C_{g,1}$ is the input capacitance of the first gate.
Dynamic Power Consumption is Data Dependent
2-input NOR gate:
A B | Out
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
With input signal probabilities $P_{A=1} = 1/2$ and $P_{B=1} = 1/2$.
Static transition probability:
$P_{0\to1} = P_{out=0} \times P_{out=1} = P_0 \times (1 - P_0)$
Switching activity, $P_{0\to1}$, has two components:
- A static component, a function of the logic topology
- A dynamic component, a function of the timing behavior (glitching)
NOR static transition probability: the output is 1 only for A = B = 0, so $P_{out=1} = 1/4$ and $P_{0\to1} = 3/4 \times 1/4 = 3/16$.
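The static transition probability can be checked mechanically by enumerating the truth table. The sketch below assumes, as the slide does, that the inputs are independent and uncorrelated in time; the helper names `nor` and `static_p01` are illustrative, not from the lecture.

```python
from fractions import Fraction
from itertools import product


def nor(a: int, b: int) -> int:
    return int(not (a or b))


def static_p01(gate, p_inputs):
    """Static 0->1 transition probability of a gate output.

    Sums the probability of every truth-table row that yields 1, assuming
    independent inputs, then returns P(out=0) * P(out=1).
    """
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(p_inputs)):
        weight = Fraction(1)
        for bit, p in zip(bits, p_inputs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one


print(static_p01(nor, [Fraction(1, 2), Fraction(1, 2)]))  # prints 3/16
```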
NOR Gate Transition Probabilities
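[Figure: 2-input NOR gate (inputs A, B) driving load C_L, and a plot of $P_{0\to1}$ versus $P_A$ and $P_B$, each from 0 to 1]
$P_{0\to1} = P_0 \times P_1 = \left(1 - (1 - P_A)(1 - P_B)\right)(1 - P_A)(1 - P_B)$
Switching activity is a strong function of the input signal statistics.
$P_A$ and $P_B$ are the probabilities that inputs A and B are one.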
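A quick way to see how strongly the activity depends on input statistics is to sweep the NOR formula over a grid of input probabilities. This is a sketch assuming independent inputs; the 0.1 grid step is an arbitrary choice.

```python
def nor_p01(pa: float, pb: float) -> float:
    """Static P(0->1) of a 2-input NOR, assuming A and B are independent."""
    p_out1 = (1 - pa) * (1 - pb)  # output is 1 only when A = B = 0
    return (1 - p_out1) * p_out1


# Sample P_A and P_B on a grid from 0.0 to 1.0 in steps of 0.1.
grid = [i / 10 for i in range(11)]
best = max((nor_p01(pa, pb), pa, pb) for pa in grid for pb in grid)
print(f"max P01 on grid = {best[0]:.4f} at PA = {best[1]}, PB = {best[2]}")
print(f"P01 at PA = PB = 0.5: {nor_p01(0.5, 0.5):.4f}")
```

Because $P_{0\to1} = P_1(1 - P_1)$, the activity peaks at 1/4 wherever the output probability reaches 1/2, and drops to zero when either input is stuck at 1 or both are stuck at 0.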
Transition Probabilities for Some Basic Gates
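$P_{0\to1} = P_{out=0} \times P_{out=1}$
Gate | $P_{0\to1}$
NOR | $(1 - (1 - P_A)(1 - P_B)) \times (1 - P_A)(1 - P_B)$
OR | $(1 - P_A)(1 - P_B) \times (1 - (1 - P_A)(1 - P_B))$
NAND | $P_A P_B \times (1 - P_A P_B)$
AND | $(1 - P_A P_B) \times P_A P_B$
XOR | $(1 - (P_A + P_B - 2 P_A P_B)) \times (P_A + P_B - 2 P_A P_B)$
[Figure: X is derived from input A alone; X and B drive a 2-input AND gate with output Z; $P_A = P_B = 0.5$]
For X: $P_{0\to1} = P_0 \times P_1 = (1 - P_A) P_A = 0.5 \times 0.5 = 0.25$
For Z: $P_{0\to1} = P_0 \times P_1 = (1 - P_X P_B) P_X P_B = (1 - (0.5 \times 0.5)) \times (0.5 \times 0.5) = 3/16$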
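The table can be encoded directly: each row is just $P_1(1 - P_1)$ with $P_1$ the gate's output signal probability. The sketch below assumes independent inputs and, for the example, treats X as a function of A alone with $P_{X=1} = 0.5$ (an assumption about the figure); `P_ONE` and `p01` are illustrative names.

```python
from fractions import Fraction as F

# P(out = 1) for independent inputs with P(A=1) = pa and P(B=1) = pb.
P_ONE = {
    "NOR": lambda pa, pb: (1 - pa) * (1 - pb),
    "OR": lambda pa, pb: 1 - (1 - pa) * (1 - pb),
    "NAND": lambda pa, pb: 1 - pa * pb,
    "AND": lambda pa, pb: pa * pb,
    "XOR": lambda pa, pb: pa + pb - 2 * pa * pb,
}


def p01(p_one):
    """Static transition probability P(0->1) = P(out=0) * P(out=1)."""
    return (1 - p_one) * p_one


half = F(1, 2)
for name, p_one in P_ONE.items():
    print(name, p01(p_one(half, half)))

p_x = half                     # assumed: X depends on A alone, P(X=1) = 0.5
p_z = P_ONE["AND"](p_x, half)  # Z = X AND B
print("X:", p01(p_x))          # (1 - PA) PA
print("Z:", p01(p_z))          # (1 - PX PB) PX PB
```

Note that what propagates from gate to gate is the signal probability $P_1$, not the transition probability.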
Another Example
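[Figure: A and B ($P_A = P_B = 0.5$) drive a 2-input OR gate with output X; X and B drive a 2-input AND gate with output Z]
For X: $P_{0\to1} = (1 - 0.5)(1 - 0.5) \times (1 - (1 - 0.5)(1 - 0.5)) = 3/16$, with signal probability $P_{X=1} = 3/4$
For Z (treating X and B as independent): $P_{Z=1} = P_{X=1} P_B = 3/4 \times 0.5 = 3/8$, so $P_{0\to1} = (1 - 3/8) \times 3/8 = 15/64 \approx 0.234$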
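The gate-by-gate calculation under the independence assumption can be sketched as follows, taking X = A OR B and Z = X AND B. The key step is that the signal probability $P_{X=1} = 3/4$, not the transition probability 3/16, is what feeds the next gate.

```python
from fractions import Fraction as F


def p01(p_one):
    """Static transition probability P(0->1) = P(out=0) * P(out=1)."""
    return (1 - p_one) * p_one


pa = pb = F(1, 2)
p_x = 1 - (1 - pa) * (1 - pb)  # X = A OR B, so P(X=1) = 3/4
p_z = p_x * pb                 # Z = X AND B, ASSUMING X and B independent
print("X:", p01(p_x))
print("Z:", p01(p_z), "~", float(p01(p_z)))
```

This propagation is exact only when the gate inputs are statistically independent, which is not the case here because B feeds both gates.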
Inter-signal Correlations
[Figure: the same circuit, X = A OR B and Z = X AND B; input B reconverges at the AND gate; $P_A = P_B = 0.5$]
Treating X and B as independent: $P_{X=1} = 3/4$, $P_{Z=1} = 3/4 \times 0.5 = 3/8$, and $P_{0\to1} = (1 - 3/8) \times 3/8 = 15/64 \approx 0.234$
Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
- reconvergent fan-out
Have to use conditional probabilities: $P(Z=1) = P(B=1) \times P(X=1 \mid B=1) = 0.5 \times 1 = 0.5$
Notice that Z = (A or B) and B = AB + B = B, so the 0 -> 1 probability should be (and is) $1/2 \times 1/2 = 1/4$
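One way to handle reconvergent fan-out exactly is to recompute every internal signal from the primary inputs, which are independent, and enumerate all input combinations. This is a brute-force sketch; its cost grows as $2^n$ in the number of primary inputs, so it only suits small logic cones. The function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product


def exact_p_one(func, p_inputs):
    """P(func = 1), enumerating all combinations of independent primary inputs.

    Internal signals are recomputed inside func, so correlations created by
    reconvergent fan-out are captured automatically.
    """
    total = F(0)
    for bits in product((0, 1), repeat=len(p_inputs)):
        weight = F(1)
        for bit, p in zip(bits, p_inputs):
            weight *= p if bit else 1 - p
        if func(*bits):
            total += weight
    return total


def z(a, b):
    x = a | b     # X = A OR B
    return x & b  # Z = X AND B (B reconverges here)


p_z = exact_p_one(z, [F(1, 2), F(1, 2)])
print("P(Z=1) =", p_z, " P01 =", (1 - p_z) * p_z)
```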
Logic Restructuring
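Chain implementation has a lower overall switching activity than the tree implementation for random inputs.
Logic restructuring: changing the topology of a logic network to reduce transitions
[Figure: a 4-input AND function F of A, B, C, D built as a chain and as a balanced tree of 2-input AND gates; every primary input has signal probability 0.5]
AND: $P_{0\to1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
A first-level gate with two 0.5 inputs: $(1 - 0.25) \times 0.25 = 3/16$
Chain, gate by gate: $P_{0\to1}$ = 3/16, 7/64, 15/256 (total 91/256 ≈ 0.36)
Tree, gate by gate: $P_{0\to1}$ = 3/16, 3/16, 15/256 (total 111/256 ≈ 0.43)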
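The chain and tree totals can be reproduced by propagating signal probabilities through 2-input AND gates. This sketch assumes all four inputs are independent with probability 0.5; neither structure has reconvergent fan-out, so gate-by-gate propagation is exact here.

```python
from fractions import Fraction as F


def and_gate(pa, pb):
    """Return (P(out=1), P(0->1)) of a 2-input AND with independent inputs."""
    p1 = pa * pb
    return p1, (1 - p1) * p1


half = F(1, 2)

# Chain: ((A.B).C).D
w1, a1 = and_gate(half, half)
w2, a2 = and_gate(w1, half)
_, a3 = and_gate(w2, half)
chain = [a1, a2, a3]

# Tree: (A.B).(C.D)
l1, b1 = and_gate(half, half)
r1, b2 = and_gate(half, half)
_, b3 = and_gate(l1, r1)
tree = [b1, b2, b3]

print("chain:", [str(a) for a in chain], "total", sum(chain))
print("tree: ", [str(a) for a in tree], "total", sum(tree))
```

The chain wins because its second gate already sees a low-probability input (1/4), which suppresses its activity, whereas both first-level gates of the tree switch at the maximum rate for 0.5 inputs.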
Input Ordering
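Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
[Figure: two orderings of a 3-input AND built from two 2-input AND gates (internal node X, output F). Left: A (0.5) and B (0.2) feed the first gate and C (0.1) joins at the second. Right: B (0.2) and C (0.1) feed the first gate and A (0.5) joins at the second]
Left: $P_{0\to1}(X) = (1 - 0.5 \times 0.2) \times (0.5 \times 0.2) = 0.09$
Right: $P_{0\to1}(X) = (1 - 0.2 \times 0.1) \times (0.2 \times 0.1) = 0.0196$
Which is better with respect to transition probabilities? The output F has the same signal probability (0.01) in both cases, so the right-hand ordering, which introduces A last, has the lower switching activity.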
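Both orderings can be compared numerically. This sketch assumes independent inputs and 2-input AND gates (consistent with the AND formula used on the slide); `activity` is a hypothetical helper returning the activity of the internal node X and of the output F.

```python
def and_p01(pa, pb):
    """Return (P(out=1), P(0->1)) of a 2-input AND with independent inputs."""
    p1 = pa * pb
    return p1, (1 - p1) * p1


def activity(order):
    """P(0->1) of internal node X and output F for a two-gate AND chain.

    order = (first input, second input, late input) signal probabilities.
    """
    p_first, p_second, p_last = order
    px, ax = and_p01(p_first, p_second)
    _, af = and_p01(px, p_last)
    return ax, af


ax1, af1 = activity((0.5, 0.2, 0.1))  # A, B first; C last
ax2, af2 = activity((0.2, 0.1, 0.5))  # B, C first; A last
print(f"X: {ax1:.4f} vs {ax2:.4f};  F: {af1:.4f} vs {af2:.4f}")
```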
Glitching in Static CMOS Networks
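[Figure: A and B drive a gate with output X; X and C drive a second gate with output Z; unit gate delay; inputs ABC switch from 101 to 000, with the resulting waveforms of X and Z]
Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards).
Glitch: a node exhibits multiple transitions in a single cycle before settling to the correct logic value.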
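A unit-delay simulation shows how such a glitch arises. The gate types are not recoverable from the figure, so this sketch ASSUMES both gates are 2-input NORs (X = NOR(A, B), Z = NOR(X, C)); with that choice the 101 to 000 input change makes Z pulse even though its initial and final values are both 0.

```python
def nor(a, b):
    return int(not (a or b))


def simulate(old, new, steps=4):
    """Unit-delay simulation of X = NOR(A, B), Z = NOR(X, C) (assumed gates).

    Starts from the steady state for the `old` inputs, applies `new` at t = 0,
    and computes each gate output at time t from its input values at t - 1.
    """
    a0, b0, c0 = old
    x = nor(a0, b0)
    z = nor(x, c0)
    a, b, c = new
    trace = [(0, x, z)]
    for t in range(1, steps + 1):
        x, z = nor(a, b), nor(x, c)  # right side uses the old value of x
        trace.append((t, x, z))
    return trace


for t, x, z in simulate((1, 0, 1), (0, 0, 0)):
    print(f"t={t}: X={x} Z={z}")
```

The glitch comes from C reaching the second gate one gate delay before the updated X does.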
Glitching in an RCA
[Figure: 16-bit ripple-carry adder (sum outputs S0 to S15, carry input Cin) and simulated output voltage (V) versus time (ps) for Cin, S0 to S5, S10, and S15]
Balanced Delay Paths to Reduce Glitching
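Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs.
So, equalize the lengths of the timing paths through the logic.
[Figure: unbalanced chain F1, F2, F3, where the inputs arrive at F1 at times 0 and 0, at F2 at 0 and 1, and at F3 at 0 and 2; balanced version, where the inputs arrive at F1 and F2 at time 0 and at F3 at times 1 and 1]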
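Path balancing can be checked with a simple unit-delay arrival-time pass: a gate whose inputs arrive at different times can glitch. This sketch assumes 2-input gates, unit delay, and hypothetical primary inputs a, b, c, d all arriving at time 0.

```python
def arrival_times(netlist, primary):
    """Unit-delay arrival time of every net.

    netlist maps gate name -> (input1, input2) and must be in topological
    order; skew is the spread of a gate's input arrival times.
    """
    t = dict.fromkeys(primary, 0)
    skew = {}
    for gate, ins in netlist.items():
        arr = [t[i] for i in ins]
        t[gate] = max(arr) + 1
        skew[gate] = max(arr) - min(arr)  # nonzero skew -> possible glitch
    return t, skew


chain = {"F1": ("a", "b"), "F2": ("F1", "c"), "F3": ("F2", "d")}
balanced = {"F1": ("a", "b"), "F2": ("c", "d"), "F3": ("F1", "F2")}
for name, net in (("chain", chain), ("balanced", balanced)):
    t, skew = arrival_times(net, "abcd")
    print(name, t, skew)
```

In the balanced version every gate sees zero input skew, and the critical path also shortens from three to two gate delays.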
Dynamic Power as a Function of VDD
Decreasing VDD
- decreases dynamic energy consumption (quadratically)
- but increases gate delay (decreases performance)
[Figure: gate delay (normalized, 1 to 5.5) versus VDD (V) from 0.8 V to 2.4 V]
Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
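The trade-off can be quantified with a short sketch. The energy ratio follows directly from $C_L V_{DD}^2$; for the delay, the sketch swaps in the Sakurai-Newton alpha-power law, $t_p \propto V_{DD}/(V_{DD} - V_T)^{\alpha}$, with hypothetical values VT = 0.4 V and alpha = 1.3 and a hypothetical 10 fF load.

```python
def dynamic_energy(c_load, vdd):
    """Energy drawn from the supply per 0->1 output transition: C_L * VDD^2."""
    return c_load * vdd ** 2


def delay_alpha_power(vdd, vt=0.4, alpha=1.3):
    """Relative gate delay from the alpha-power law, t_p ~ VDD / (VDD - VT)^alpha.

    vt and alpha are hypothetical values chosen for illustration.
    """
    return vdd / (vdd - vt) ** alpha


vdd_h, vdd_l = 2.4, 1.2
c = 10e-15  # hypothetical 10 fF load
print("energy ratio VDDL/VDDH:", dynamic_energy(c, vdd_l) / dynamic_energy(c, vdd_h))
print("delay ratio  VDDL/VDDH:", delay_alpha_power(vdd_l) / delay_alpha_power(vdd_h))
```

Halving VDD cuts the switching energy to a quarter, but the delay penalty grows rapidly as VDD approaches VT, which is why the lower supply is reserved for gates with timing slack.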
Multiple VDD Considerations
How many VDDs? Two is becoming common.
- Many chips already have two supplies (one for the core and one for I/O)
When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
- If a gate supplied with VDDL drives a gate at VDDH, the PMOS of the VDDH gate never turns off (with a high input its VSG is still VDDH - VDDL), causing static current
- The cross-coupled PMOS transistors do the level conversion
- The NMOS transistors operate on a reduced supply
Level converters are not needed for a step-down change in voltage
Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flip-flop (see Figure 11.47)
[Figure: level converter with supplies VDDH and VDDL, input Vin, and output Vout]
Dual-Supply Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Clustered voltage scaling
- Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
- Level conversion is done in the flip-flops at the end of the paths
Stack Effect
Subthreshold leakage is a function of the circuit topology and the value of the inputs
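$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$
where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$; $V_{SB}$ is the source-bulk (substrate) voltage; $\phi_F$ is the Fermi potential; $\gamma$ is the body-effect coefficient
[Figure: 2-input NAND gate with the NMOS pull-down stack driven by A and B; VX is the internal node between the two series NMOS transistors]
Leakage is least when A = B = 0: both stacked NMOS transistors are off and VX settles slightly above ground. The top transistor then sees a negative VGS (= -VX) and a higher VT through the body effect (VSB = VX), while the bottom transistor sees only VDS = VX; all of these reduce the subthreshold current.
Leakage reduction due to stacked transistors is called the stack effect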
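The body-effect part of the stack effect can be estimated with the threshold equation above plus a simplified subthreshold model, $I \propto 10^{(V_{GS} - V_T)/S}$ with S = 90 mV/decade. Everything numeric here is an assumption: VT0 = 0.43 V, gamma = 0.4 V^0.5, |-2 phi_F| = 0.6 V, and VX = 0.1 V. The model ignores DIBL and the reduced VDS of the bottom device, so it only sketches the top transistor's contribution.

```python
import math


def vt_body(vsb, vt0=0.43, gamma=0.4, two_phi_f=0.6):
    """NMOS threshold with body effect (hypothetical parameters).

    VT = VT0 + gamma * (sqrt(|-2 phi_F| + VSB) - sqrt(|-2 phi_F|)).
    """
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def subthreshold_rel(vgs, vt, s=0.09):
    """Relative subthreshold current ~ 10^((VGS - VT)/S), S in V/decade."""
    return 10 ** ((vgs - vt) / s)


vx = 0.1  # hypothetical internal-node voltage with A = B = 0
single = subthreshold_rel(0.0, vt_body(0.0))  # lone off transistor
top = subthreshold_rel(-vx, vt_body(vx))      # top of stack: VGS = -VX, VSB = VX
print(f"VT shift: {vt_body(vx) - vt_body(0.0):.3f} V, "
      f"leakage reduction ~{single / top:.0f}x")
```

In this model most of the reduction comes from the negative VGS; the body-effect VT shift adds a smaller factor on top.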
Leakage as a Function of Design Time VT
Reducing VT
- increases the subthreshold leakage current (exponentially): a 90 mV reduction in VT increases leakage by an order of magnitude
- but decreases gate delay (increases performance)
[Figure: ID (A) versus VGS (V), 0 to 1 V, for VT = 0.4 V and VT = 0.1 V]
Determine the critical path(s) at design time and use low-VT devices for the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
A careful assignment of VTs can reduce the leakage by as much as 80%.
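The "order of magnitude per 90 mV" rule translates directly into a leakage multiplier. This sketch assumes leakage scales as $10^{-V_T/S}$ with S = 90 mV/decade and ignores all other dependencies.

```python
def leakage_ratio(delta_vt, s=0.090):
    """Factor by which subthreshold leakage grows when VT is lowered by delta_vt (V).

    Assumes leakage ~ 10^(-VT/S) with S = 90 mV/decade.
    """
    return 10 ** (delta_vt / s)


for dvt in (0.090, 0.300):
    print(f"VT lowered by {dvt * 1000:.0f} mV -> leakage x{leakage_ratio(dvt):.0f}")
```

Going from VT = 0.4 V to VT = 0.1 V (a 300 mV drop) multiplies leakage by roughly 2000 under this model, while a 90 mV drop, such as ~300 mV to ~210 mV, gives about 10x.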
Dual-Thresholds Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Use lower threshold on timing-critical paths
Assignment can be done on a per-gate or per-transistor basis; no clustering of the logic is needed
No level converters are needed
IBM Cu11/Cu08 Blue Logic Library
ASIC Cu11 (130 nm) library: dual-VT library
- 2690 total cells in the standard cell library
- Nominal VT level (~300 mV) and low VT level (~210 mV)
- Low-VT version has the same physical footprint, ~15% improvement in gate delay, and ~10x increase in leakage power
ASIC Cu08 (90 nm) library: multi-VT library
- 2118 total cells in the standard cell library
- Intermediate-VT (AVT) and low-VT (LVT) version of each cell
- Two more VT levels being planned (very low VT and high VT)
An example to summarize all design-time techniques
[Figure: example logic circuit with its critical path marked]
Design Time Low Power Techniques
[Figure: the example circuit annotated with Lower Vdd and Higher Vdd regions and a level converter]
Design Time Low Power Techniques
[Figure: the example circuit annotated with Higher Vth and Lower Vth regions]
Design Time Low Power Techniques
Stack Forcing
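[Figure: stack forcing; an inverter (In, Out) with transistors of width W, and the stack-forced version in which each transistor is replaced by two series transistors of width 1/2 W]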
Low Power Techniques – Interaction with Each Other
[Figure: the example circuit annotated with Higher Vth and Lower Vth regions]
Apply high Vth and size up the transistors to recover the lost speed