Spring 07, Feb 20 ELEC 7770: Advanced VLSI Design (Agrawal)
ELEC 7770 Advanced VLSI Design
Spring 2007
Reducing Power through Multicore Parallelism
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Approximate shares of the three terms, in order: 75% (switching of the load capacitance $C_L$), 5% (short-circuit current), 20% (leakage).

Low-Power Datapath Architecture
Lower supply voltage
  This slows down circuit speed
  Use parallel computing to gain the speed back
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

A Reference Datapath
[Figure: Input, register, combinational logic, register, Output; both registers clocked by CK; switched capacitance Cref]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: the input feeds N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), each with its own register clocked at f/N; an N-to-1 multiplexer and an output register clocked at f drive the output. A multiphase clock generator and mux control block is driven by CK.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism
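The round-robin behavior described on this slide ("each copy processes every Nth input") can be illustrated with a small behavioral model. This is only a sketch: the function name `parallel_datapath` and the example logic function are hypothetical, and timing, registers, and voltage are not modeled.

```python
from typing import Callable, List


def parallel_datapath(inputs: List[int], n_copies: int,
                      logic: Callable[[int], int]) -> List[int]:
    """Behavioral model: copy k handles inputs k, k+N, k+2N, ..."""
    # Each copy sees only every Nth input, so it can run at f/N.
    per_copy = [[logic(x) for x in inputs[k::n_copies]]
                for k in range(n_copies)]
    # The N-to-1 multiplexer selects the copies in the same rotating order,
    # so results leave the datapath in the original input order at rate f.
    outputs = []
    for i in range(len(inputs)):
        copy_index = i % n_copies
        outputs.append(per_copy[copy_index][i // n_copies])
    return outputs


def square(x: int) -> int:
    """Stand-in for the combinational logic (an assumption for illustration)."""
    return x * x


result = parallel_datapath(list(range(10)), 4, square)
```

The output sequence is the same as that of a single copy processing every input, which is why the parallel version preserves throughput while each copy runs N times slower.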

Level Converter: L to H
[Figure: level-converter circuit with input Vin_L, output Vout_H, and supplies VDDL and VDDH]
Transistors with thicker oxide and longer channels
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Level Converter: H to L
[Figure: level-converter circuit with input Vin_H, output Vout_L, and supply VDDL]
Transistors with thicker oxide and longer channels
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Control Signals, N = 4
[Figure: timing waveforms for CK and Phase 1 through Phase 4]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
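As a quick numerical illustration of the final ratio on this slide, the sketch below evaluates $P_N/P_1 = [1 + \delta(N-1)] V_N^2 / V_{ref}^2$. The function name is hypothetical, and the overhead factor δ = 0.1 used in the example call is an assumed value, not one given on this slide.

```python
def power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """Return P_N / P_1 = [1 + delta*(N - 1)] * (V_N / V_ref)**2."""
    if n < 1:
        raise ValueError("degree of parallelism N must be at least 1")
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


# Example: two copies at 2.9 V against a 5 V reference,
# with an assumed overhead factor delta = 0.1.
ratio = power_ratio(n=2, v_n=2.9, v_ref=5.0, delta=0.1)
print(f"P_2 / P_1 = {ratio:.3f}")
```

With N = 1 and $V_N = V_{ref}$ the ratio is exactly 1, which is a useful sanity check on the formula.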

Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where $I$ is the saturation current, $k$ is a technology parameter, $W/L$ is the width-to-length ratio of the transistor, and $V_t$ is the threshold voltage.
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS. Marked points: N=1 at Vref = 5V, N=2 at V2 = 2.9V, N=3 at V3.]
Voltage reduction slows down as we get closer to Vt.
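The reduced supply $V_N$ for N copies is the voltage at which the gate delay has grown to N times its value at $V_{ref}$. A minimal sketch of that calculation, using the first-order delay model $T \propto V/(V - V_t)^2$ from this slide and a bisection search, follows. The threshold voltage of 0.8 V is an assumed value for illustration, and because the model is first-order the results need not match the values read off the plotted curve.

```python
def gate_delay(v: float, v_t: float) -> float:
    """Gate delay up to a constant factor: T ~ V / (V - Vt)^2, for V > Vt."""
    return v / (v - v_t) ** 2


def reduced_supply(n: int, v_ref: float, v_t: float, tol: float = 1e-9) -> float:
    """Bisect for V_N in (Vt, Vref] such that T(V_N) = N * T(Vref)."""
    target = n * gate_delay(v_ref, v_t)
    lo, hi = v_t + tol, v_ref  # delay falls monotonically as V rises above Vt
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, v_t) > target:
            lo = mid  # still too slow: the supply must be higher
        else:
            hi = mid  # fast enough: try a lower supply
    return hi


V_T_ASSUMED = 0.8  # volts; hypothetical threshold voltage
for n in (1, 2, 3):
    print(n, round(reduced_supply(n, 5.0, V_T_ASSUMED), 3))
```

Each additional copy buys a smaller voltage step than the previous one, which is the diminishing return the slide points out as the supply approaches $V_t$.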

Increasing Multiprocessing
[Figure: $P_N/P_1$ (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0V (extreme case), Vt = 0.4V, and Vt = 0.8V]

Extreme Cases: $V_t = 0$
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = NT → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which falls off as 1/N for large N (it approaches $\delta/N$).
For negligible overhead, $\delta \to 0$:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For $V_t > 0$, power reduction is less and there will be an optimum value of N.
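The existence of an optimum N for $V_t > 0$ can be checked numerically. The sketch below combines the delay model $T \propto V/(V - V_t)^2$ from the Voltage vs. Speed slide with the power ratio $[1 + \delta(N-1)] V_N^2/V_{ref}^2$. Setting $T(V_N) = N\,T(V_{ref})$ gives a quadratic in $u = V_N - V_t$, which is solved in closed form. The values δ = 0.1 and $V_t$ = 0.8 V are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def reduced_supply(n: int, v_ref: float, v_t: float) -> float:
    """Closed-form V_N solving V/(V - Vt)^2 = N * Vref/(Vref - Vt)^2."""
    a = n * v_ref / (v_ref - v_t) ** 2
    # With u = V_N - Vt the condition becomes a*u^2 - u - Vt = 0;
    # the positive root is the physical one.
    u = (1.0 + math.sqrt(1.0 + 4.0 * a * v_t)) / (2.0 * a)
    return v_t + u


def power_ratio(n: int, v_ref: float, v_t: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta*(N - 1)] * (V_N / Vref)^2."""
    v_n = reduced_supply(n, v_ref, v_t)
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


def optimum_n(v_ref: float, v_t: float, delta: float, n_max: int = 32) -> int:
    """Degree of parallelism in 1..n_max with the lowest power ratio."""
    return min(range(1, n_max + 1),
               key=lambda n: power_ratio(n, v_ref, v_t, delta))


# Assumed illustration values: Vref = 5 V, Vt = 0.8 V, delta = 0.1.
n_opt = optimum_n(5.0, 0.8, 0.1)
print(n_opt, round(power_ratio(n_opt, 5.0, 0.8, 0.1), 4))
```

For $V_t = 0$ the same search simply returns `n_max`, since the ratio $[1 + \delta(N-1)]/N^2$ keeps decreasing with N; a finite optimum appears only once $V_N$ is bounded below by a nonzero threshold.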

Example: Multiplier Core
Specification:
  200MHz clock
  15W dissipation @ 5V
  Low voltage operation, $V_{DD} \ge 1.5$ volts
  Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
Problem:
  Integrate multiplier core on a SOC
  Power budget for multiplier ~ 5W

A Multicore Design
[Figure: the input feeds five multiplier cores (Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5), each with its own register clocked at 40MHz; a 5-to-1 mux and an output register clocked at 200MHz drive the output. A multiphase clock generator and mux control block is driven by the 200MHz CK.]
Core clock frequency = 200/N MHz; N should divide 200.

How Many Cores?
For N cores:
  Clock frequency = 200/N MHz
  Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  Assuming 10% overhead per core,
  Power dissipation = $15 [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts

Design Tradeoffs
Number of cores N   Clock (MHz)   Core supply VDDN (Volts)   Total Power (Watts)
1                   200           5.00                       15.0
2                   100           3.68                       8.94
4                   50            2.75                       5.90
5                   40            2.51                       5.29
8                   25            2.10                       4.50
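The tradeoff table can be regenerated from the two formulas on the "How Many Cores?" slide. The sketch below does so for every N that divides 200 and keeps the supply at or above the 1.5 V limit from the specification, then picks out the designs within the roughly 5 W budget. Function names are hypothetical. The computed values agree with the table to within about 0.05 W; the small differences come from rounding.

```python
import math


def core_supply(n: int) -> float:
    """V_DDN = 0.5 + sqrt(20.25 / N), in volts."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int, p_ref: float = 15.0, v_ref: float = 5.0,
                overhead: float = 0.1) -> float:
    """Power = 15 * [1 + 0.1*(N - 1)] * (V_DDN / 5)^2, in watts."""
    return p_ref * (1.0 + overhead * (n - 1)) * (core_supply(n) / v_ref) ** 2


def candidate_designs(f_mhz: int = 200, v_min: float = 1.5):
    """Rows (N, core clock in MHz, V_DDN, power) for each usable N."""
    rows = []
    for n in range(1, f_mhz + 1):
        if f_mhz % n != 0:
            continue  # N should divide the 200 MHz clock
        v = core_supply(n)
        if v < v_min:
            break  # supply only drops further as N grows
        rows.append((n, f_mhz // n, v, total_power(n)))
    return rows


designs = candidate_designs()
within_budget = [row for row in designs if row[3] <= 5.0]
for n, clock, v, p in designs:
    print(f"N={n:2d}  clock={clock:3d} MHz  VDD={v:.2f} V  power={p:.2f} W")
```

Under these formulas the smallest N that meets a 5 W budget is 8, and the 1.5 V supply limit caps N at 20.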

Power Reduction in Processors
Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

Parallel Architecture
[Figure: a single processor clocked at f, compared with two processors clocked at f/2 each, with input and output running at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396\,CV^2f$

Pipeline Architecture
[Figure: a single processor with registers, clocked at f, compared with two half-processor stages separated by a pipeline register, clocked at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\,CV^2f$
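The power figures on the Parallel Architecture and Pipeline Architecture slides follow directly from $P = CV^2f$ with each quantity scaled relative to the single-processor design. A short check, with a hypothetical helper name:

```python
def relative_power(cap_factor: float, volt_factor: float,
                   freq_factor: float) -> float:
    """Power relative to C*V^2*f when C, V and f are each scaled."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Two parallel processors: 2.2C, 0.6V, 0.5f
parallel = relative_power(2.2, 0.6, 0.5)
# Two-stage pipeline: 1.2C, 0.6V, f
pipeline = relative_power(1.2, 0.6, 1.0)
print(f"parallel: {parallel:.3f}  pipeline: {pipeline:.3f}")
```

In both cases the quadratic dependence on voltage dominates: the 0.6 voltage factor alone contributes 0.36, which outweighs the extra capacitance.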

Approximate Trend
              n-parallel proc.   n-stage pipeline proc.
Capacitance   nC                 C
Voltage       V/n                V/n
Frequency     f/n                f
Power         CV²f/n²            CV²f/n²
Chip area     n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
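Both columns of the trend table arrive at the same power, $CV^2f/n^2$, by different routes. The sketch below verifies this with exact rational arithmetic for an arbitrary n; the function names are hypothetical, and the scaling factors are taken straight from the table rows.

```python
from fractions import Fraction


def scaled_power(cap, volt, freq):
    """Power relative to C*V^2*f for the given scaling factors."""
    return cap * volt ** 2 * freq


def trend(n: int):
    """Relative power of n-parallel and n-stage pipeline designs."""
    n = Fraction(n)
    # n-parallel: capacitance nC, voltage V/n, frequency f/n
    parallel = scaled_power(n, 1 / n, 1 / n)
    # n-stage pipeline: capacitance C, voltage V/n, frequency f
    pipeline = scaled_power(Fraction(1), 1 / n, Fraction(1))
    return parallel, pipeline


for n in (2, 4, 8):
    print(n, trend(n))
```

The parallel design pays for its n-fold capacitance with an n-fold lower clock, while the pipeline keeps both unchanged, so the difference between them shows up in chip area rather than in power.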

Multicore Processors
[Figure: performance, based on SPECint2000 and SPECfp2000 benchmarks, versus year (2000, 2004, 2008) for multicore and single-core processors. Source: Computer, May 2005, p. 12]

Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

Cell - Cell Broadband Engine Architecture
[Photo, L to R: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzuoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

Cell’s Nine-Processor Chip
[Figure © IEEE Spectrum, January 2006]
Eight identical processors, f = 5.6GHz (max), 44.8 Gflops