Lecture 21:
Packaging, Power, & Clock
CMOS VLSI DesignCMOS VLSI Design 4th Ed.21: Package, Power, and Clock 2
Outline
Packaging
Power Distribution
Clock Distribution
Packages
Package functions
– Electrical connection of signals and power from chip to board
– Little delay or distortion
– Mechanical connection of chip to board
– Removes heat produced on chip
– Protects chip from mechanical damage
– Compatible with thermal expansion
– Inexpensive to manufacture and test
Package Types
Through-hole vs. surface mount
Chip-to-Package Bonding
Traditionally, chip is surrounded by pad frame
– Metal pads on 100 – 200 µm pitch
– Gold bond wires attach pads to package
– Lead frame distributes signals in package
– Metal heat spreader helps with cooling
Advanced Packages
Bond wires contribute parasitic inductance
Fancy packages have many signal, power layers
– Like tiny printed circuit boards
Flip-chip places connections across surface of die rather than around periphery
– Top level metal pads covered with solder balls
– Chip flips upside down
– Carefully aligned to package (done blind!)
– Heated to melt balls
– Also called C4 (Controlled Collapse Chip Connection)
LGA Package 1
1366 gold-plated pads
Package Parasitics
[Figure: package parasitics model. Labels: Chip, Signal Pads, Signal Pins, Chip VDD, Chip GND, Board VDD, Board GND, Bond Wire, Lead Frame, Package Capacitor, Package]
Use many VDD, GND in parallel
– Inductance, IDD
Heat Dissipation
60 W light bulb has surface area of 120 cm²
Itanium 2 die dissipates 130 W over 4 cm²
– Chips have enormous power densities
– Cooling is a serious challenge
Package spreads heat to larger surface area
– Heat sinks may increase surface area further
– Fans increase airflow rate over surface area
– Liquid cooling used in extreme cases ($$$)
Thermal Resistance
$\Delta T = \theta_{ja} P$
– $\Delta T$: temperature rise on chip
– $\theta_{ja}$: thermal resistance of chip junction to ambient
– $P$: power dissipation on chip
Thermal resistances combine like resistors
– Series and parallel
$\theta_{ja} = \theta_{jp} + \theta_{pa}$
– Series combination
Example
Your chip has a heat sink with a thermal resistance to the package of 4.0 °C/W.
The resistance from chip to package is 1 °C/W.
The system box ambient temperature may reach 55 °C.
The chip temperature must not exceed 100 °C.
What is the maximum chip power dissipation?
(100 – 55 °C) / (4 + 1 °C/W) = 9 W
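The example above is a series thermal-resistance calculation. A minimal Python sketch of the same arithmetic follows; the function name `max_power_w` and its parameter names are illustrative and do not come from the lecture.

```python
def max_power_w(t_max_c, t_ambient_c, thetas_c_per_w):
    """Largest power (W) that keeps the chip at or below t_max_c.

    thetas_c_per_w lists thermal resistances in series (degrees C per W),
    so they simply add, like series resistors.
    """
    theta_total = sum(thetas_c_per_w)
    return (t_max_c - t_ambient_c) / theta_total


# Numbers from the example: heat sink 4 C/W, chip-to-package 1 C/W,
# 55 C ambient, 100 C maximum chip temperature.
p_max = max_power_w(100.0, 55.0, [4.0, 1.0])
print(f"Maximum chip power: {p_max:.1f} W")
```

Passing the resistances as a list makes it easy to try a better heat sink or an added interface layer without changing the function.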
Temperature Sensor
Monitor die temperature and throttle performance if it gets too hot
Use a pair of pnp bipolar transistors
– Vertical pnp available in CMOS
Voltage difference is proportional to absolute temp
– Measure with on-chip A/D converter
$I_c = I_s e^{qV_{BE}/kT} \;\rightarrow\; V_{BE} = \frac{kT}{q} \ln \frac{I_c}{I_s}$
$\Delta V_{BE} = V_{BE1} - V_{BE2} = \frac{kT}{q}\left(\ln \frac{I_{c1}}{I_s} - \ln \frac{I_{c2}}{I_s}\right) = \frac{kT}{q} \ln \frac{I_{c1}}{I_{c2}} = \frac{kT}{q} \ln m$
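The sensor relies on the base-emitter voltage difference being (kT/q) ln m, where m is taken here to be the ratio of the two collector currents. A small Python sketch, assuming ideal transistors with identical saturation current; the function names are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def delta_vbe(temp_k, m):
    """Voltage difference (V) between two junctions whose currents differ by m."""
    return (K_BOLTZMANN * temp_k / Q_ELECTRON) * math.log(m)


def temp_from_delta_vbe(dv, m):
    """Invert the relation: absolute temperature (K) from a measured difference."""
    return dv * Q_ELECTRON / (K_BOLTZMANN * math.log(m))


# Assumed current ratio of 8, chosen only for illustration.
dv = delta_vbe(300.0, 8.0)
print(f"delta V_BE at 300 K: {dv * 1e3:.2f} mV")
print(f"recovered temperature: {temp_from_delta_vbe(dv, 8.0):.1f} K")
```

Because the saturation current cancels in the difference, the reading depends only on temperature and the chosen current ratio.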
Power Distribution
Power Distribution Network functions
– Carry current from pads to transistors on chip
– Maintain stable voltage with low noise
– Provide average and peak power demands
– Provide current return paths for signals
– Avoid electromigration & self-heating wearout
– Consume little chip area and wire
– Easy to lay out
Power Requirements
VDD = VDDnominal – Vdroop
Want Vdroop < +/- 10% of VDD
Sources of Vdroop
– IR drops
– L di/dt noise
IDD changes on many time scales
[Figure: power vs. time, with max, average, and min levels marked and clock gating labeled]
IR Drop
A chip draws 24 W from a 1.2 V supply. The power supply impedance is 5 mΩ. What is the IR drop?
IDD = 24 W / 1.2 V = 20 A
IR drop = (20 A)(5 mΩ) = 100 mV
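The same IR-drop arithmetic as a short Python sketch, which also compares the result against the 10% droop budget mentioned earlier in the lecture. The function name `ir_drop_v` is illustrative.

```python
def ir_drop_v(power_w, vdd_v, r_supply_ohm):
    """IR drop (V) for a chip drawing power_w from vdd_v through r_supply_ohm."""
    i_dd = power_w / vdd_v  # supply current in amperes
    return i_dd * r_supply_ohm


# Numbers from the example: 24 W, 1.2 V, 5 milliohm supply impedance.
drop = ir_drop_v(24.0, 1.2, 5e-3)
fraction = drop / 1.2
print(f"IR drop: {drop * 1e3:.0f} mV ({fraction:.1%} of VDD)")
```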
IR Introduced Noise
Power Distribution
Power Distribution
Low level distribution is in metal 1.
Power has to be strapped in higher layers of metal.
The spacing is set by IR drop, electromigration, and inductive effects.
Always use multiple contacts on straps.
Power and Ground Distribution
3 Metal Layers (EV4)
4 Metal Layers (EV5)
6 Metal Layers (EV6)
Power Supply Droop
L di/dt Noise
A 1.2 V chip switches from an idle mode consuming 5 W to a full-power mode consuming 53 W. The transition takes 10 clock cycles at 1 GHz. The supply inductance is 0.1 nH. What is the L di/dt droop?
ΔI = (53 W – 5 W)/(1.2 V) = 40 A
Δt = 10 cycles * (1 ns / cycle) = 10 ns
L di/dt droop = (0.1 nH) * (40 A / 10 ns) = 0.4 V
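A short Python sketch of the L di/dt calculation above, assuming the current ramps linearly over the transition. The function and parameter names are illustrative.

```python
def ldidt_droop_v(p_idle_w, p_full_w, vdd_v, cycles, f_clk_hz, l_supply_h):
    """Inductive droop (V) for a linear current ramp between two power modes."""
    delta_i = (p_full_w - p_idle_w) / vdd_v  # change in supply current (A)
    delta_t = cycles / f_clk_hz              # transition time (s)
    return l_supply_h * delta_i / delta_t


# Numbers from the example: 5 W to 53 W at 1.2 V, 10 cycles at 1 GHz, 0.1 nH.
droop = ldidt_droop_v(5.0, 53.0, 1.2, 10, 1e9, 0.1e-9)
print(f"L di/dt droop: {droop:.2f} V")
```

Doubling the number of transition cycles halves the droop, which is why slowing or scheduling large current transitions appears among the countermeasures.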
Dealing with L di/dt
Separate power pins for I/O pads and chip core.
Multiple power and ground pins.
Careful selection of positions of power and ground pins on package.
Increase rise and fall times as much as possible.
Schedule current consuming transitions.
Use advanced packaging technologies.
Use decoupling capacitances on the board.
Use decoupling capacitances on chip.
Choosing the Right Pin
Decoupling Capacitance
Bypass Capacitors
Need low supply impedance at all frequencies
Ideal capacitors have impedance decreasing with ω
Real capacitors have parasitic R and L
– Leads to resonant frequency of capacitor
[Figure: impedance (10^-2 to 10^2) vs. frequency (10^4 to 10^10 Hz), log-log, for a capacitor modeled as 1 µF with 0.03 Ω and 0.25 nH parasitics]
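A real bypass capacitor can be sketched as a series R-L-C branch. The Python snippet below evaluates its impedance magnitude and self-resonant frequency, assuming the figure's values are 1 µF, 0.03 Ω, and 0.25 nH (the units are partly lost in this transcript, so treat them as an assumption). Function names are illustrative.

```python
import math


def cap_impedance(f_hz, c_f, esr_ohm, esl_h):
    """|Z| of a capacitor with series resistance (ESR) and inductance (ESL)."""
    w = 2.0 * math.pi * f_hz
    return abs(complex(esr_ohm, w * esl_h - 1.0 / (w * c_f)))


def self_resonant_freq(c_f, esl_h):
    """Frequency (Hz) where the inductive and capacitive reactances cancel."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * c_f))


C, ESR, ESL = 1e-6, 0.03, 0.25e-9  # assumed values read from the figure
f0 = self_resonant_freq(C, ESL)
print(f"self-resonant frequency: {f0 / 1e6:.1f} MHz")
for f in (1e4, 1e6, f0, 1e8, 1e10):
    print(f"{f:10.3g} Hz  |Z| = {cap_impedance(f, C, ESR, ESL):.3g} ohm")
```

Below resonance the impedance falls with frequency as for an ideal capacitor; above it the parasitic inductance dominates and the impedance rises again, with the minimum set by the series resistance.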
De-coupling Capacitor Ratios
EV4
– total effective switching capacitance = 12.5 nF
– 128 nF of de-coupling capacitance
– de-coupling/switching capacitance ~ 10x
EV5
– 13.9 nF of switching capacitance
– 160 nF of de-coupling capacitance
EV6
– 34 nF of effective switching capacitance
– 320 nF of de-coupling capacitance -- not enough!
Source: B. Herrick (Compaq)
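The de-coupling to switching capacitance ratios implied by these figures can be tabulated with a few lines of Python. The dictionary layout and names are illustrative; the capacitance values are the ones quoted above.

```python
# (switching capacitance in nF, de-coupling capacitance in nF)
chips = {
    "EV4": (12.5, 128.0),
    "EV5": (13.9, 160.0),
    "EV6": (34.0, 320.0),
}


def decap_ratio(switching_nf, decap_nf):
    """Ratio of de-coupling capacitance to switching capacitance."""
    return decap_nf / switching_nf


ratios = {name: decap_ratio(*values) for name, values in chips.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

EV6 comes out slightly below the roughly 10x seen on EV4, consistent with the "not enough" remark.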
EV6 De-coupling Capacitance
Design for Idd = 25 A @ Vdd = 2.2 V, f = 600 MHz
– 0.32-µF of on-chip de-coupling capacitance was added
• Under major busses and around major gridded clock drivers
• Occupies 15-20% of die area
– 1-µF 2-cm² Wirebond Attached Chip Capacitor (WACC) significantly increases "Near-Chip" de-coupling
• 160 Vdd/Vss bondwire pairs on the WACC minimize inductance
Source: B. Herrick (Compaq)
EV6 WACC
[Figure: EV6 package with WACC. Labels: 587 IPGA, Microprocessor, WACC, Heat Slug, 389 Signal - 198 VDD/VSS Pins, 389 Signal Bondwires, 395 VDD/VSS Bondwires, 320 VDD/VSS Bondwires]
Source: B. Herrick (Compaq)
Power System Model
Power comes from regulator on system board
– Board and package add parasitic R and L
– Bypass capacitors help stabilize supply voltage
– But capacitors also have parasitic R and L
Simulate system for time and frequency responses
[Figure: power system model. Labels: Voltage Regulator, Printed Circuit Board Planes, Package and Pins, Solder Bumps, Bulk Capacitor, Ceramic Capacitor, Package Capacitor, On-Chip Capacitor, On-Chip Current Demand, VDD; regions: Board, Package, Chip]
Frequency Response
Multiple capacitors in parallel
– Large capacitor near regulator has low impedance at low frequencies
– But also has a low self-resonant frequency
– Small capacitors near chip and on chip have low impedance at high frequencies
Choose caps to get low impedance at all frequencies
[Figure: impedance vs. frequency (Hz)]
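To illustrate how parallel capacitors cover different frequency ranges, the sketch below combines two series R-L-C branches in parallel. All component values are hypothetical, picked only to show the trend; they are not taken from the lecture.

```python
import math


def branch_impedance(f_hz, c_f, esr_ohm, esl_h):
    """Complex impedance of one capacitor with its series parasitics."""
    w = 2.0 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def parallel_impedance(f_hz, caps):
    """|Z| of several capacitor branches in parallel (admittances add)."""
    y_total = sum(1.0 / branch_impedance(f_hz, *cap) for cap in caps)
    return abs(1.0 / y_total)


# Hypothetical parts: (capacitance F, ESR ohm, ESL H)
bulk = (100e-6, 0.01, 5e-9)       # large capacitor near the regulator
ceramic = (1e-6, 0.03, 0.25e-9)   # small capacitor near the chip

for f in (1e3, 1e5, 1e7, 1e9):
    z = parallel_impedance(f, [bulk, ceramic])
    print(f"{f:8.0e} Hz  |Z| = {z:.3g} ohm")
```

Between the two self-resonant frequencies the branches can interact and produce an impedance peak, which is one reason the full system is simulated rather than sized by hand.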
Example: Pentium 4
Power supply impedance for Pentium 4
– Spike near 100 MHz caused by package L
Step response to sudden supply current change
– 1st droop: on-chip bypass caps
– 2nd droop: package capacitance
– 3rd droop: board capacitance
[Xu08] [Wong06]
Distributed Model
Charge Pumps
Sometimes a different supply voltage is needed but little current is required
– 20 V for Flash memory programming
– Negative body bias for leakage control during sleep
Generate the voltage on-chip with a charge pump
Energy Scavenging
Ultra-low power systems can scavenge their energy from the environment rather than needing batteries
– Solar calculator (solar cells)
– RFID tags (antenna)
– Tire pressure monitors powered by vibrational energy of tires (piezoelectric generator)
Thin film microbatteries deposited on the chip can store energy for times of peak demand
Capacitive Cross Talk
Capacitive Cross Talk Dynamic Node
3 x 1 µm overlap: 0.19 V disturbance
[Figure: dynamic gate with a neighboring wire. Labels: VDD, CLK, PDN, In1, In2, In3, Y, X, CY, CXY, 2.5 V, 0 V]
Capacitive Cross Talk Driven Node
$\tau_{XY} = R_Y (C_{XY} + C_Y)$
Keep time-constant smaller than rise time
[Figure: V (Volt), 0 to 0.5, vs. t (nsec), 0 to 1, for increasing rise time tr; circuit labels: X, Y, VX, RY, CXY, CY]
Dealing with Capacitive Cross Talk
Avoid floating nodes
Protect sensitive nodes
Make rise and fall times as large as possible
Differential signaling
Do not run wires together for a long distance
Use shielding wires
Use shielding layers
Shielding
[Figure: shielding. Labels: Shielding wire, Shielding layer, Substrate (GND), GND, VDD]
Cross Talk and Performance
[Figure: neighboring wires coupled through Cc]
- When neighboring lines switch in opposite direction of victim line, delay increases
DELAY DEPENDENT UPON ACTIVITY IN NEIGHBORING WIRES
Miller Effect
- Both terminals of capacitor are switched in opposite directions (0 → Vdd, Vdd → 0)
- Effective voltage is doubled and additional charge is needed (from Q=CV)
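The doubled effective voltage is often modeled as a multiplier on the coupling capacitance. The sketch below uses factors of 0, 1, and 2 for a neighbor switching in the same direction, staying quiet, or switching in the opposite direction. Only the doubled case is stated in the slide; the other two factors and all names and values here are assumptions for illustration.

```python
# Assumed multipliers on the coupling capacitance Cc.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_gnd, c_couple, neighbor):
    """Capacitance seen by the victim driver for a given neighbor activity."""
    return c_gnd + MILLER_FACTOR[neighbor] * c_couple


def switching_charge(c_gnd, c_couple, vdd, neighbor):
    """Charge the driver must supply for a full swing, from Q = C * V."""
    return effective_capacitance(c_gnd, c_couple, neighbor) * vdd


# Hypothetical wire: 1.0 fF to ground, 0.5 fF to the neighbor, 1.0 V swing.
for activity in ("same", "quiet", "opposite"):
    q = switching_charge(1.0, 0.5, 1.0, activity)
    print(f"{activity:9s} charge = {q:.2f} fC")
```

Since delay scales with the capacitance being charged, the spread between the "same" and "opposite" cases is the activity-dependent delay variation the slide warns about.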
Impact of Cross Talk on Delay
r is ratio between capacitance to GND and to neighbor
Dealing with Cross-Talk
Evaluate and improve
Constructive layout generation
Predictable structures
Avoid worst case patterns
Structured Predictable Interconnect
[Figure: wire fabric with a repeating pattern of wires labeled S, V, and G]
Example: Dense Wire Fabric ([Sunil Kathri])
Trade-off:
• Cross-coupling capacitance 40x lower, 2% delay variation
• Increase in area and overall capacitance
Also: FPGAs, VPGAs
Clock Distribution
On a small chip, the clock distribution network is just a wire
– And possibly an inverter for clkb
On practical chips, the RC delay of the wire resistance and gate load is very long
– Variations in this delay cause clock to get to different elements at different times
– This is called clock skew
Most chips use repeaters to buffer the clock and equalize the delay
– Reduces but doesn't eliminate skew
Example
Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally arrive at the same time.
– But power supply noise changes buffer delays
– clk2 and clk3 will always see RC skew
[Figure: gclk distributed to clk1, clk2, and clk3. Labels: 3 mm, 3.1 mm, 0.5 mm, 1.3 pF, 0.4 pF, 0.4 pF]
Clock Uncertainties
Clock Nonidealities
Clock skew
– Spatial variation in temporally equivalent clock edges; deterministic + random, tSK
Clock jitter
– Temporal variations in consecutive edges of the clock signal; modulation + random noise
– Cycle-to-cycle (short-term) tJS
– Long term tJL
Variation of the pulse width
– Important for level sensitive clocking
Review: Skew Impact
[Figure: flip-flops F1 and F2 clocked by clk with combinational logic (CL) between Q1 and D2, and timing diagrams over Tc. Labels: tpcq, tpdq, tsetup, tskew, tccq, tcd, thold]
$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$
$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$
Ideally full cycle is available for work
Skew adds sequencing overhead
Increases hold time too
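The flip-flop constraints with skew, read here as t_pd ≤ T_c − (t_pcq + t_setup + t_skew) and t_cd ≥ t_hold − t_ccq + t_skew, can be checked numerically. A minimal Python sketch with hypothetical delays in picoseconds; the function names and numbers are illustrative only.

```python
def max_logic_delay(t_c, t_pcq, t_setup, t_skew):
    """Largest propagation delay the logic may have (setup constraint)."""
    return t_c - (t_pcq + t_setup + t_skew)


def min_contamination_delay(t_hold, t_ccq, t_skew):
    """Smallest contamination delay the logic must have (hold constraint)."""
    return t_hold - t_ccq + t_skew


# Hypothetical flip-flop and clock numbers, all in ps.
T_C, T_PCQ, T_SETUP, T_HOLD, T_CCQ = 1000, 80, 50, 30, 40

for skew in (0, 50):
    t_pd_max = max_logic_delay(T_C, T_PCQ, T_SETUP, skew)
    t_cd_min = min_contamination_delay(T_HOLD, T_CCQ, skew)
    print(f"skew={skew:3d} ps  t_pd <= {t_pd_max} ps  t_cd >= {t_cd_min} ps")
```

Each picosecond of skew is removed from the time available for logic and added to the minimum delay needed to avoid hold violations.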
Solutions
Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
Analyze clock skew
– Only budget actual, not worst case skews
– Local vs. global skew budgets
Tolerate clock skew
– Choose circuit structures insensitive to skew
Clock Dist. Networks
Ad hoc
Grids
H-tree
Hybrid
H-Trees
Fractal structure
– Gets clock arbitrarily close to any point
– Matched delay along all paths
Delay variations cause skew
A and B might see big skew
[Figure: H-tree with points A and B marked]
More realistic H-tree
[Restle98]
Itanium 2 H-Tree
Four levels of buffering:
– Primary driver
– Repeater
– Second-level clock buffer
– Gater
Route around obstructions
[Figure: Itanium 2 clock tree. Labels: Primary Buffer, Repeaters, Typical SLCB Locations]
Itanium 2 Repeaters
Spines
Pentium IV Clock Spines
Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die
The Grid System
[Figure: clock grid. Labels: Driver (four), GCLK (four)]
• No rc-matching
• Large power
Alpha Clock Grids
[Figure: gclk grid and PLL for Alpha 21064, Alpha 21164, and Alpha 21264]
Example: DEC Alpha 21164
Clock Frequency: 300 MHz - 9.3 Million Transistors
Total Clock Load: 3.75 nF
Power in Clock Distribution network : 20 W (out of 50)
Uses Two Level Clock Distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left and right side clock grid in Metal3 and Metal4
Total driver size: 58 cm!
21164 Clocking
2 phase single wire clock, distributed globally
2 distributed driver channels
– Reduced RC delay/skew
– Improved thermal distribution
– 3.75 nF clock load
– 58 cm final driver width
Local inverters for latching
Conditional clocks in caches to reduce power
More complex race checking
Device variation
[Figure: clock waveform with trise = 0.35 ns, tskew = 150 ps, tcycle = 3.3 ns; location of clock driver on die, showing pre-driver and final drivers]
Clock Drivers
Clock Skew in Alpha Processor
EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS
2 Phase, with multiple conditional buffered clocks
– 2.8 nF clock load
– 40 cm final driver width
Local clocks can be gated "off" to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking
[Figure: global clock waveform with trise = 0.35 ns, tskew = 50 ps, tcycle = 1.67 ns; PLL]
21264 Clocking
EV6 Clock Results
[Figure: GCLK skew (at Vdd/2 crossings), scale 5 to 50 ps; GCLK rise times (20% to 80% extrapolated to 0% to 100%), scale 300 to 345 ps]
EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
[Figure: clock hierarchy. Labels: SYSCLK, PLL, DLL (three), GCLK (CPU Core), NCLK (Mem Ctrl), L2L_CLK (L2 Cache), L2R_CLK (L2 Cache)]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks
Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
– H-tree drives 16-64 sector buffers
– Buffers drive total of 1024 points
– All points shorted together with grid
Clock Gaters
Adaptive Deskewing
Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol
Self-Timed Pipelined Datapath
[Figure: pipeline In, R1, F1 (tpF1), R2, F2 (tpF2), R3, F3 (tpF3), Out; each function block has Start and Done signals; handshake (HS) blocks exchange Req and Ack between stages]
Completion Signal Generation
Using Delay Element (e.g. in memories)
[Figure: logic network from In to Out alongside a delay module from Start to Done]
Completion Signal Generation
Using Redundant Signal Encoding
Completion Signal in DCVSL
Self-Timed Adder
[Figure: (a) Differential carry generation. Labels: VDD, Start, P0 to P3, G0 to G3, K0 to K3, C0 to C4. (b) Completion signal. Labels: VDD, Start, Done, C1 to C4]
Completion Signal Using Current Sensing