Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
R®
Page 1
Design of Clock Distribution in High Performance Processors
Ian Young
Intel Senior Fellow and Director, Advanced Circuits and Technology Integration (ACTI)
Technology Manufacturing Group (TMG)Intel Corporation, Hillsboro, Oregon
R®
Page 2Clock Distribution in Microprocessors I. Young 3/30/2005
Desktop Clock Frequency Trend
10 MHz
100 MHz
1985 1990 1995
Year
1 GHz
2000 2005
10 GHz
Clo
ck F
requ
ency
PIIPentium
PIII
P4
XX
X
486 X
X
R®
Page 3Clock Distribution in Microprocessors I. Young 3/30/2005
Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
R®
Page 4Clock Distribution in Microprocessors I. Young 3/30/2005
Synchronous Logic
Logic progresses at a rate controlled by the clock— Retiming removes the effects of different logic and wire delays— Slows down signals that arrive too fast
Requires a state element— Latch stores Input when clock is low— Flip-Flop stores Input when clock rises
Requires a precise clock
Enables CPU pipelining and high through-put CPUs
R®
Page 5Clock Distribution in Microprocessors I. Young 3/30/2005
Flip-Flop: Set-up and Hold Times
Setup Time:— time before the clock signal, that a data signal must be valid in
order to be stored.
Hold Time:— time after the clock signal, that a data signal must be valid in
order to be stored.
Clock
Data In
Data Out
Setup time
Hold time
R®
Page 6Clock Distribution in Microprocessors I. Young 3/30/2005
The Clock Distribution Problem
Deliver the clock signal from the source (PLL) to all the receivers with the best timing precision. CLOCK SKEW is the inaccuracy of the same clock edge arriving at various locations in the chip (spatial separation)CLOCK JITTER is the inaccuracy of consecutive clock edges arriving at the same location (temporal separation)
Skew
AB
JitterC
A
B
C
PLL
R®
Page 7Clock Distribution in Microprocessors I. Young 3/30/2005
Clock and Logic Structure & Operation
Clock
Data OutData Out
U1 U2
Data InData In
C
Buffer controlling U1 data capture & transmit
ComputationCircuitry
Data path
Buffer Controlling U2 data capture & transmit
B1
B2Clock path 2
CLOCK
Data OutData Out
FFU1
FFU2
Data InData In
C
Buffer controlling U1 data capture & transmit
ComputationCircuitry
Clock path 1
Data path
Buffer Controlling U2 data capture & transmit
B1
Clock path 2
Clock Delay 1 = delay from Clock to the Clock input of U1.Clock Delay 2 = delay from Clock to the Clock input of U2.Clock Skew = Clock Delay 2 – Clock Delay 1 (should be zero).
R®
Page 8Clock Distribution in Microprocessors I. Young 3/30/2005
SETUP violation: Data arrives at U2 too late, anddoesn’t get captured by U2 clock cycle.
Combinational logic
CU1 U2DATA IN DATA OUT
DATA IN
CLOCK
0
0
1
1
1
1
DATA IN Data ArrivesToo Late
CLOCK
DATA OUT
DATA IN
DATA OUT
CLOCK
U1
bU2
CLOCK SKEWCLOCK SKEW
R®
Page 9Clock Distribution in Microprocessors I. Young 3/30/2005
HOLD violation: Data arrives at U2 too soon, not held long enough to be captured by U2 clock cycle.
CCombinational logic
U1 U2DATA IN DATA OUT
CLOCK
0
0
1
1
1
1
DATA IN Data ArrivesToo SoonU1 CLOCK
DATA OUT
DATA IN
U2DATA OUT
CLOCK
CLOCK SKEWCLOCK SKEW
R®
Page 10Clock Distribution in Microprocessors I. Young 3/30/2005
Challenges for Microprocessor Clock Design
Rapid increases in Core Clock Frequencies— 1991: 100 MHz (0.8mm)— 1997: 400 MHz (0.35mm)— 2001: 2.0GHz (0.13mm)— 2005: 3.8GHz (90nm)
Increasing Clock Load— as indicated by total transistors/die— 1991: 1.2 million transistors (0.8mm) — 1997: 7.5 million transistors (0.35mm)— 2001: 42 million transistors (0.13mm) — 2005: 1.7 billion transistor (90nm)
Worsening within-die process variations— Lithography and Etch — Supply Noise— Hot Spots
R®
Page 11Clock Distribution in Microprocessors I. Young 3/30/2005
Clock Distribution H-Tree (2 level)Global / LocalSkew
L1Ext Clk PLL
(100MHz)
(2 GHz)
L2
L3
R®
Page 12Clock Distribution in Microprocessors I. Young 3/30/2005
Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
R®
Page 13Clock Distribution in Microprocessors I. Young 3/30/2005
Poly Gate Transistor Length Variation Sources
Long-range Within-die— Stepper lens aberrations
Proximity effects (systematic)— Nest or isolated
Random component— Stepper lens— Poly gate line edge roughness— Threshold voltage Lgate
Source Drain
R®
Page 14Clock Distribution in Microprocessors I. Young 3/30/2005
Transistor Vth Variation Sources
Die-to-die
Random component— From Random Dopant Fluctuations f (W, L)
Short channel component.
Well and Halo doping
R®
Page 15Clock Distribution in Microprocessors I. Young 3/30/2005
Vth Variation Model
Model relates variability to device size:
Where We and Le are the effective device width and length. C1 and C2 relate to some physical device parameters such as Tox, junction depth, etc.
WeZeCCVt 2
1)( +=∆σ
R®
Page 16Clock Distribution in Microprocessors I. Young 3/30/2005
Interconnect Variability
Conductor width and space
Conductor Thickness
Dielectric Thickness
Inductance — needs to be analyzed and modeled for busses
R®
Page 17Clock Distribution in Microprocessors I. Young 3/30/2005
Interconnect Variability
Conductor Thickness Dielectric Thickness Power and Global SignalInductance (needs to be analyzed and modeled)
Metal LineVia
Insulating Dielectric
M1M2
M5
M4
M3
M7
M6
R®
Page 18Clock Distribution in Microprocessors I. Young 3/30/2005
Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
R®
Page 19Clock Distribution in Microprocessors I. Young 3/30/2005
The 50MHz 80486 Microprocessor (1991)
PLL
• 3 Layer Metal• 2 Phase Clocking • RC clock skew• On-chip PLL
R®
Page 20Clock Distribution in Microprocessors I. Young 3/30/2005
Clock Buffers80486 Microprocessor: PLL
PhaseFreqDet
CP VCOVcntl Div
2
Clkin
Clkout
Ph1/Ph2Ext Clk
Fb Clk
50% duty-cycle 2φ clocks, Clkin, Clkout compatible with prior gen.Internal Clock Skews Between Chips Reduced by ~2nsEnables 0-1ns Hold Time for frequency ScalabilityVCO Designed with: Wide Frequency Range (5-130MHz)
Supply Voltage Noise Rejection
Conventional With PLLSetup 3ns 1.5nsHold 2.5ns 1.0nsOutput Valid 9.0ns 7.5ns
R®
Page 21Clock Distribution in Microprocessors I. Young 3/30/2005
Pentium Microprocessor Clock DistributionSingle 50% Duty-cycle Clock66-133MHz Internal Frequency Internal clock freq. vs. External bus freq. Ratios of 2:1, 3:2, 1:1
To Global Clock Driver
PLL Clock Generator
To Global Clock Driver
Local Clock Enable
Local Clock Enable
T=0ns
Global Clock DriversLocated in the Pad Ring
Serpentines for RC matching
R®
Page 22Clock Distribution in Microprocessors I. Young 3/30/2005
Pentium II (P6) Microprocessor Clock Distribution
Buffer Network designed for > 300 MHzMinimized the Propagation Delay Minimized Global Clock Skew Global Clock Power DownSupply di/dt noise reduction
— Vdd / Vss decoupling capacitance— Minimize Vdd / Vss DC Resistance (IR drop) — Minimize Vdd / Vss AC Resistance and Inductance
R®
Page 23Clock Distribution in Microprocessors I. Young 3/30/2005
P L L
C lo c k _ e nL o c a l C lo c k D r iv e rs
G lo b a l C lo c k D riv e rs
L o c a l C lo c k D r iv e rs
L o c a l C lo c k D r iv e rs
1 .2 n s
C lo c k D is t r ib u tion N e tw o rk
g c lk # c lk
2 n s
Global
R®
Page 24Clock Distribution in Microprocessors I. Young 3/30/2005
Pentium II Global Clock Skew measured test chip results
SK= -592ps
SK= -460ps
SK = -424ps SK = -548ps
SK= -564ps
SK= -476ps with M4 Metal Strapping Ring
SK= -488ps
Input Point to Local Buffers
5 Level Driver for 500pF load
with clock gating
Floor Plan of measured skew SK = Skew relative to feedback point from local bufferSkew across M4 Global Distribution = 140ps
R®
Page 25Clock Distribution in Microprocessors I. Young 3/30/2005
Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
R®
Page 26Clock Distribution in Microprocessors I. Young 3/30/2005
Itanium Processor Clocking
IA-64 architecture0.18µm CMOS6 metal layers25.4M transistors800MHz frequency
FPUIA-32Control
Instr.Fetch &Decode
Cache
Cache
TLB
Integer Units
IA-64 Control
Bus
Ref: S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, I. Young JSSC Nov 2000
R®
Page 27Clock Distribution in Microprocessors I. Young 3/30/2005
Clock Distribution Hierarchy
GlobalDistribution
VCC/2
DSK
DSK
DSKPLL
CLKPCLKN
RCD
RCDDLCLK
OTB
RegionalDistribution
LocalDistribution
ReferenceClock
MainClock
R®
Page 28Clock Distribution in Microprocessors I. Young 3/30/2005
Itanium Global Clock Distribution
Balanced H-tree routed in M5 and M6Lateral shieldingDistributes both main and reference clockOptimized to account for inductive effectsPLL
DSK
DSK
DSK
DSKDSK
DSKDSK
DSK
R®
Page 29Clock Distribution in Microprocessors I. Young 3/30/2005
Itanium Regional Clock Distribution
Distributed array of deskew buffers to reduce process related skew
— 8 deskew clusters each holding up to 4 buffers
Regional clock grids driven by modular Regional Clock Drivers
— M4-M5 grid tailored for the clock load densityof the underlying block
— Full support for scan and clock gating
DSK = Cluster of 4 deskew buffers
DSK
DSK
DSK
DSKDSK
DSKDSK
DSK
CDC
CDC = Central Deskew Controller
R®
Page 30Clock Distribution in Microprocessors I. Young 3/30/2005
Deskew Buffer: Block Diagram
Regional Feedback Clock
Ref. ClockLocal Controller
RCD
RCD
Global Clock
TAP I/F
Deskew Buffer
RegionalClock Grid
DelayCircuit
Deskew covers the entire clock distribution up to the input of the local clock buffer
R®
Page 31Clock Distribution in Microprocessors I. Young 3/30/2005
Deskew Buffer: Delay Circuit
Enable
20-bit Delay Control Register
Output
Input
TAP I/F Step size = 8.5psDeskew range = 170ps
Small step size enables fine granularity skew control over a wide rangeTAP read/write access to Control Register enables faster timing debug and performance tuning
R®
Page 32Clock Distribution in Microprocessors I. Young 3/30/2005
Deskew Buffer: Local Controller
Referenceclock 16-to-1
CounterEnable
Phase Detector
Digital Low-Pass Filter
To Deskew Buffer Register
Feedbackclock
Lead/Lag
Phase detector output sampled every 16 core cycles6-tap digital low pass filter reduces comparison noiseLocal controller ensures stable deskew operation
R®
Page 33Clock Distribution in Microprocessors I. Young 3/30/2005
Itanium Skew Measurements
0102030405060708090
100110120
R01 R03 R11 R20 R22 R31 R33 R41 R43 R51 R53 R63 R71
Regional Clock
Dis
trib
utio
n Sk
ew (p
s) Projected max skew without deskew mechanism = 110ps
Max skew with deskew mechanism = 28ps
R®
Page 34Clock Distribution in Microprocessors I. Young 3/30/2005
Clock Skew Timing Budgets
Common LCB Ts1
Common RCD Ts2
Common Reference Ts3
Common Main Clock Ts4
Category Skew Budget
Ts1 < Ts2 < Ts3 < Ts4
DSK
CommonReference
LCBRCD
CB
CB
CB
MainClock
CB
DSK
DSK
RCD
RCD
LCB
CommonReference
Multiple skew budgets minimize the skew penaltyand enable timing optimization
R®
Page 35Clock Distribution in Microprocessors I. Young 3/30/2005
Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
R®
Page 36Clock Distribution in Microprocessors I. Young 3/30/2005
Pentium®4 Clock Challenges
Enable Netburst™ micro-architecture for 0.18um technology
≥ 2GHz clock for Hyper Pipelined Technology core≥ 4GHz clock for Rapid Execution Engine≥ 400MHz I/O clock for fast data transfer
— < 10% clock inaccuracy
Enable clock gating for low powerClock observability and controllability for fast debug
Reference: Kurd, Barkatullah, Dizon, Fletcher, Madland JSSC Nov 2001
R®
Page 37Clock Distribution in Microprocessors I. Young 3/30/2005
Clock Generation & Distribution
I/OPLL
Clock Dist
CorePLL
Clock Dist LCDs
systemclocks
Local ClockDrivers
Cor
e C
lock
s
LCDs
LCDs
LCDs
LCDs
LCDs I/O C
loc k
s
2x
1x
½x
1x
2x
4x
4GHz
2GHz
1GHz100MHz
100MHz
400MHz
200MHz
R®
Page 38Clock Distribution in Microprocessors I. Young 3/30/2005
Clock Distribution
I/OPLL
Clock Dist
CorePLL
Clock Dist LCDs
systemclocks
Local ClockDrivers
Cor
e C
lock
s
LCDs
LCDs
LCDs
LCDs
LCDs I/O C
loc k
s
2x
1x
½x
1x
2x
4x
R®
Page 39Clock Distribution in Microprocessors I. Young 3/30/2005
Binary distribution tree in three spines
F r o m P L L
R®
Page 40Clock Distribution in Microprocessors I. Young 3/30/2005
Triple Clock Spines
Top Spine
Middle Spine
Bottom Spine PLL
BinaryDist.Tree
SkewOptimizer
To LocalClock Drivers
From PLLSp
ine
R®
Page 41Clock Distribution in Microprocessors I. Young 3/30/2005
Skew Optimization Scheme
PLL
binarytreeof
clockrepea-
ters
DB47
DB1
DB2
DB3
PDDB46
PD
To TestAccess Port Local Clock
Macro
LCDPD
PD Local ClockMacro
Local ClockMacroLCD
LCD
LCD
LCD
SE
SE
SE
SE
SE
AdjustableDomain Buffers
PhaseDetectors
LocalClock
Drivers
SequentialElements
FilteredVCC
Skew Optimizer
R®
Page 42Clock Distribution in Microprocessors I. Young 3/30/2005
Skew Profile Graph
-40.0
-30.0
-20.0
-10.0
0.0
10.0
20.0
30.0
40.0
Left Right
Rel
ativ
e Ti
min
g (p
s)
TopMiddleBottom
REFERENCEAFTER
BEFORE
R®
Page 43Clock Distribution in Microprocessors I. Young 3/30/2005
Skew Control Scheme
For the particular die example shown— Pre-compensation max skew ~ ±32ps— Post-compensation max skew ±8ps
Side benefits— Provides a within-die skew profile— Deliberately skew clocks for performance
– 200MHz frequency increase obtained
R®
Page 44Clock Distribution in Microprocessors I. Young 3/30/2005
Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
R®
Page 45
Montecito Clock System Floorplan
PLL / PLL / translatiotranslation table / n table / clock clock controlcontrol
FSB FSB DFDsDFDs
Foxton Controller DFDFoxton Controller DFD
FSB FSB DFDsDFDs
Core Core DFDsDFDs
RVDsRVDs
CORE 1CORE 1
CORE CORE 00
Bus Logic DFDBus Logic DFD
References: ISSCC 2005 Paper 16.1, T. Fischer et al, ISSCC 2005 Paper 16.2, P. Mahoney et al
R®
Page 46Clock Distribution in Microprocessors I. Young 3/30/2005
Clock SystemArchitecture
FixedSupply
VariableSupply
Bus Clock
I/Os
Foxton
Bus Logic
Core0
Core1
1/N
CVD
RAD
Gater
1/1
GaterCVD
RVD
DFD
DFD
SLCB CVD GaterPLL
DFD
SLCB CVD Gater
SLCB
DFD
CVD Gater
1/M
DFD
DFD
1/NRAD
SLCB
SLCB
Frequency Translation Table
DivisorsFuses
PinsBalanced
Tree ClockDistribution
Phase Aligner
R®
Page 47Clock Distribution in Microprocessors I. Young 3/30/2005
Montecito Clock Distribution Summary (1)
Core Clock Frequency controlled real-time based upon DC and transient power supply voltage
— Montecito/Foxton Power delivery sets DC supply voltage based upon power dissipation (temp. sensors around the chip) of 100W. No worst case power.
— Ldi/dt supply noise transients slow critical paths and reduce the operating frequency for a few cycles. Constantly varying frequency responds to the core supply voltage transient behavior.
Regional Active Deskewing system reduces the process voltage and temperature sources of skew across the 21.5mm x 27.7 mm die.Clock Venier Devices (CVD) inserted at each local clock buffer allow 70ps of adjustment via Scan control.The clock distribution system consumes less than 25W for the 30mm route from PLL through the clock tree to all the Latches.
R®
Page 48
Montecito Clock System Floorplan
PLL / PLL / translation translation table / table / clock clock controlcontrol
FSB FSB DFDsDFDs
Foxton Controller DFDFoxton Controller DFD
FSB FSB DFDsDFDs
Core Core DFDsDFDs
RVDsRVDs
CORE 1CORE 1
CORE 0CORE 0
Bus Logic DFDBus Logic DFD
R®
Page 49Clock Distribution in Microprocessors I. Young 3/30/2005
Overview with block diagram
Bus Clock
IOs
Foxton
Bus Logic
REPEATERS
core0
core1
SLCB
RAD
RAD
GATERSCVDSLCB
DFD
DFD
PLL
DFD
DFD
DFD
DFD
L0 route L1 route L2 route
Latches
Latches
Latches
Latches
Latches
Latches
Latches
SLCB
SLCB
SLCB
GATERSCVD
GATERSCVD
GATERSCVD
GATERSCVD
L3 Route
Fixed frequencyLow Voltage Swings
Variable Frequency Full Rail Transitions
Differential Single Ended
R®
Page 50Clock Distribution in Microprocessors I. Young 3/30/2005
Montecito Clock Distribution Summary (2)
The LO clock route is differential with 400mV swig and resistive load at the end of each line (length = 20mm)
— Line width tapering is used— A self-biased differential amplifier is the repeater
L1 route (length = 2mm) from the Digital Frequency Divider (DFD) to the Second Level Clock Buffer (SLCB) is distributed as a half frequency 0o / 90o clocks that are XOR’d in the SLCB and not duty cycle sensitiveL2 route (length=3mm) from SLCBs to the Clock Venier Device has < 6ps skew using optimization with a CAD timing tool.L3 route (length = 2mm), from the CVDs through the clock gaters to the Latches, provides an overall clock skew adjustment to < 10ps to the Latches controlled by test scan
R®
Page 51Clock Distribution in Microprocessors I. Young 3/30/2005
Digital Frequency Divider (DFD) Block Diagram
PLL CLOCK DIFFERENTIAL INPUT
64 PHASES
STATE MACHINE
FULL FREQUENCY DIFFERENTIAL “UTILITY”CLOCK ROUTES TO CLOCK SYSTEM
PCSM
PERIOD ADJUST +2 TO -1
16-PHASE DLL AND INTERPOLATION
TO / FROM SAME-CORE PCSMS
STARTUP CONTROL
RVD UP / DOWN REQUESTS
DIVIDE BY 2
DIVIDE BY 2
ODCS CONTROL
SCAN AND TRIGGERS
½ FREQUENCY QUADRATURE DIFFERENTIAL CLOCK ROUTES TO SLCBS
R®
Page 52Clock Distribution in Microprocessors I. Young 3/30/2005
Voltage-to-Frequency Conversion (VFC)Per Core
DFD1 L1 Clock Route To SLCBS
L0 Clock Route from PLL
22
4
RVD4 RVD5 RVD6 RVD7
2
2
DFD2
RVD11
L1 Clock Route To SLCBS
4
RVD8 RVD9 RVD10
2
8
8
DFD0
RVD3
L1 Clock Route To SLCBS
2
4
RVD0 RVD1 RVD2
2
8
Utility Clocks
Utility Clocks
Utility Clocks
R®
Page 53Clock Distribution in Microprocessors I. Young 3/30/2005
Variable Frequency Mode:CMOS Critical Path Scaling
0.5
0.7
0.91.1
1.3
1.5
1.7
1.92.1
2.3
2.5
600 800 1000 1200
Supply Voltage (mV)
Del
ay (n
orm
to 1
100
mV)
circuit1circuit2circuit3circuit4circuit5
R®
Page 54Clock Distribution in Microprocessors I. Young 3/30/2005
RVD Coarse Delay Element
I
O
I
I
I
I I
I
Metal 1 Serpentine Resistor
out
VDD
nout
nrun
run
nrunfbn
runfbp
GND
VDD
even
clear
GND
nrun fetconfig_fet nfet
nrunrun
GND
VDDfbp
nclear
nodd
innfet
R®
Page 55Clock Distribution in Microprocessors I. Young 3/30/2005
RVD Block Diagram
Delay Line 0A
Delay Line 1A
Delay Line 0B
Delay Line 1B
RV
D F
SM
dly0in
dly1in
dly0out
dly1out
HOLD
eval0
DOWN
eval1eval0
eval1
clk
additional delay creates deadzone
R®
Page 56Clock Distribution in Microprocessors I. Young 3/30/2005
Example VFC Supply Droop Response
DFD Output Clock
Vcore
RVD Delay Line Clock
Period “UP” to DFD
Clock period increasedNo Adjust needed this cycle
Droop increases RVD delay line delay
Increased delay asserts period “UP” for one cycle
3 4 51 2
R®
Page 57Clock Distribution in Microprocessors I. Young 3/30/2005
SLCB Block Diagram
SLCBO
PCb
Setback dutycycle
inainbincind
PCa
128 Bit Delay Element
128 Bit Thermometer
ShifterShifter ControlFilter
Zone Summer
Output Buffer &Duty Cycle Control
PRESET
DutyCycleScan Registers
Set-backScanRegisters
R®
Page 58Clock Distribution in Microprocessors I. Young 3/30/2005
CVD Circuit and Operation
I O
I I I
SLCBOx cvdoSLCBO
mid
highlow Drive fight with feedback is
attenuated with pass gate settings change the delay as desired
SPICE simulationsshowing low, mid andhigh delay settingsfor SLCBOx (top graph)and CVDO (bottom graph)
R®
Page 59Clock Distribution in Microprocessors I. Young 3/30/2005
Route Statistics and Power
Route Terminals Distance Delay
L0 14 20mm 640ps
L1 71 5mm 215ps
L2 14500 2-3.3mm 60ps
L3 ~5 million 0-1.5mm 12ps
Power dissipation contribution by route
Total CPU
L3
L2L1L0
Route statistics
•Highest load and most power dissipated in the L3 route.•Future research into low-power clock distribution should focus on last section of route.
R®
Page 60Clock Distribution in Microprocessors I. Young 3/30/2005
Evolution in Clock Distribution• 486: PLL on-chip to remove the large clock distribution delay (zero
delay buffer). Clock RC skew minimized across chip with metal.• Pentium: Clock tree with length “tuning” for skew balancing.• Pentium II: Clock Binary Tree in center Spine with branch length
“tuning” to local clock buffer for skew balancing.• Itanium I: Lightly loaded “balanced” reference clock routed with the
highly loaded “unbalanced” clock tree - actively adjust clock buffer delay for low skew (at product test).
• Pentium IV: Three binary tree Spines with “tunable delays” and Phase Detectors distributed across the die. Blow fuses (based upon compare algorithm at test).
• Itanium (Next Gen):- Differential global clock distribution (20mm). - Digital Frequency Divider (synthesizer) adjusts frequency in 1.6% steps within 2 cycles based upon measured local supply.
- Regional Active Deskew adjusts Second Level Clock Buffer delay for low skew ( done during test)
- Local clock buffer variable delay adjust to load (flip-flops) by design (time borrowing)
1991
1993
1995
1998
2000
2004
Complexity
R®
Page 61Clock Distribution in Microprocessors I. Young 3/30/2005
Summary / Future Directions
Clocking systems have evolved with even more complex electrical methods
— Trimming and active feedback de-skewing circuits developed— Transient Frequency adjust based upon local supply voltage
Design the micro-architecture with interconnect delay in mind Exploit locality for frequency scaling
— Logic / clock domains
Clock Distribution Power will take a larger % of the total chip power
R®
Page 62Clock Distribution in Microprocessors I. Young 3/30/2005
Acknowledge the contributions of
Keng Wong, TMG/LTD Design Simon Tam and the Itanium clock design teamNasser Kurd and the Pentium 4 clock design team Patrick Mahoney, Tim Fischer and Montecito clock team