Clock Distribution on a Dual-core Multi-threaded Itanium™-Family Microprocessor
Topics
• Overview of distribution
• L0 route – from PLL to Digital Frequency Divider (DFD)
• L1 route – from DFD to 2nd Level Clock Buffer (SLCB)
• Regional Active Deskewing (RAD) system
• L2 route – from SLCB to CVDs
• Clock Vernier Device (CVD) description
• L3 route – from CVDs to gaters to latches
• Summary
Overview with block diagram
[Block diagram: the PLL drives the DFDs over the L0 route (fixed frequency, low voltage swings, with repeaters). The DFDs drive the SLCBs over the L1 route, the SLCBs drive the CVDs over the L2 route, and the CVDs drive gaters and latches over the L3 route (variable frequency, full-rail transitions). Labels mark the differential and single-ended portions of the path. Other blocks shown: core0, core1, RAD, Foxton, bus logic, bus clock, IOs.]
Level 0 Route: PLL to DFD
• Connects PLL to the DFDs
• 400mV low-voltage swing, constant frequency
• 20mm long, repeated every 5mm
• Balanced, tapered h-tree route with matched impedance, resistively terminated
• No duty cycle correction in repeaters; DFD not sensitive
[Wire cross-section: GND, CLK−, CLK+, GND; GND and VDD wires on the N-2 layer.]
L0 Repeater Circuit
[Circuit schematic: differential repeater with inputs ina, inb and outputs outa, outb; internal nodes ax, ay, nouta, noutb. Annotation: feedback improves edge rates.]
Level 1 Route: DFD to SLCB
• Connects DFD to the Second Level Clock Buffer (SLCB)
• Full-rail signal with varying supply and varying frequency
• Runs at half core frequency using differential clocks
• Balanced, tapered h-tree route with matched impedance
[Wire cross-section: CLK+, CLK−, CLK90+, CLK90− with GND shields; GND and VDD wires on the N-2 layer.]
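The L1 route carries a half-rate clock and a 90-degree shifted copy (CLK and CLK90), and the SLCB diagram includes an XOR output generator. A minimal sketch of why XOR-ing two quadrature half-rate clocks yields a full-rate clock follows; the sample counts and function names are hypothetical and ideal 50% duty cycles are assumed, so this illustrates the general technique rather than the actual circuit.

```python
def square_wave(n_samples, period, phase_shift=0):
    """Ideal 0/1 square wave with 50% duty cycle, one value per time step."""
    return [1 if ((t - phase_shift) % period) < period // 2 else 0
            for t in range(n_samples)]


def count_rising_edges(wave):
    """Count 0 -> 1 transitions between consecutive samples."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


PERIOD = 8   # samples per half-rate clock period (hypothetical resolution)
N = 64       # number of samples simulated

clk = square_wave(N, PERIOD)                             # half-rate clock
clk90 = square_wave(N, PERIOD, phase_shift=PERIOD // 4)  # quadrature copy
full_rate = [a ^ b for a, b in zip(clk, clk90)]          # XOR of the two

print("CLK      ", clk[:16])
print("CLK90    ", clk90[:16])
print("CLK^CLK90", full_rate[:16])
print("rising edges:", count_rising_edges(clk), "vs", count_rising_edges(full_rate))
```

The XOR output repeats every PERIOD / 2 samples, which is twice the input frequency. Any duty-cycle or phase error on the inputs would show up as duty-cycle error on the output, which may be one reason the SLCB includes duty cycle control.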
SLCB Block Diagram
[Block diagram: inputs ina, inb, inc, ind; signals PCa, PCb; output SLCBO. Blocks: 128-bit delay element, 128-bit thermometer shifter, shifter control, filter, zone summer, output buffer and duty cycle control, PRESET, duty-cycle scan registers, set-back scan registers. Input-stage labels: differential to single-ended converter, XOR output generator, route feedback ("feedback turns on").]
L1 Route Topology
[Circuit schematic: inputs ina, inb, inc, ind; internal nodes p12, p12b; outputs outa, outb. Waveform labels: ina/inb and inc/ind.]
SLCB Delay Operation
[Plot: measured silicon data showing adjusted delay (ps, axis 0 to 200) vs. trim setting (ticks, axis 0 to 150).]
[Delay line diagram: delay lines feeding a 4:1 mux.]
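The SLCB block diagram pairs a 128-bit thermometer shifter with a 128-bit delay element, and the plot shows delay rising with trim setting. The sketch below models a thermometer-coded trim register that moves one tick per update. The bit ordering, the saturation at both ends and the function names are assumptions made for illustration, not details taken from the slides.

```python
WIDTH = 128  # bits in the thermometer register, per the block diagram


def thermometer(setting, width=WIDTH):
    """Thermometer code: the first `setting` bits are 1, the rest are 0."""
    if not 0 <= setting <= width:
        raise ValueError("setting out of range")
    return [1] * setting + [0] * (width - setting)


def shift(code, up):
    """Move the trim one tick up or down, saturating at the ends (assumed)."""
    setting = sum(code)
    setting = min(setting + 1, len(code)) if up else max(setting - 1, 0)
    return thermometer(setting, len(code))


code = thermometer(WIDTH // 2)   # start at mid-range
code = shift(code, up=True)      # one tick more delay
print("trim setting:", sum(code))
```

A thermometer code changes exactly one bit per tick, so a delay element controlled bit-by-bit changes monotonically and never passes through an unrelated intermediate setting.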
RAD System Overview
[Zone diagram: numbered clock zones (11 through 62) linked by arrows. The reference zone has its delay centered.]
Each arrow in the zone diagram represents a phase comparator.
[Phase comparator schematic: inputs ina and inb taken from two CVD outputs (cvda, cvdb); outputs up and down.]
RAD Circuit Operation
[Two figure panels: RAD system disabled; RAD system enabled.]
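The RAD slides describe phase comparators with up/down outputs that compare neighboring zones, and a reference zone whose delay is centered. One way to picture the resulting behavior is a bang-bang loop that steps a zone's trim one tick at a time toward the reference. Everything numeric here (tick size, arrival times, step count) is hypothetical, and the real system also includes a filter and zone summer that this sketch leaves out.

```python
TICK_PS = 1.0    # hypothetical delay change per trim tick
MAX_TRIM = 128   # trim range, matching the 128-bit delay element


def phase_compare(ref_arrival_ps, zone_arrival_ps):
    """Return 'up' if the zone clock is early (needs more delay), else 'down'."""
    return "up" if zone_arrival_ps < ref_arrival_ps else "down"


def deskew(ref_arrival_ps, zone_base_ps, trim=MAX_TRIM // 2, steps=200):
    """Step the zone trim one tick per comparison; start from a centered delay."""
    for _ in range(steps):
        zone_arrival = zone_base_ps + trim * TICK_PS
        if phase_compare(ref_arrival_ps, zone_arrival) == "up":
            trim = min(trim + 1, MAX_TRIM)
        else:
            trim = max(trim - 1, 0)
    return trim


REF_PS = 100.0        # hypothetical reference-zone arrival time
ZONE_BASE_PS = 30.5   # hypothetical zone arrival time with zero trim

final_trim = deskew(REF_PS, ZONE_BASE_PS)
residual_ps = ZONE_BASE_PS + final_trim * TICK_PS - REF_PS
print("final trim:", final_trim, "residual skew (ps):", residual_ps)
```

Because the comparator output is binary, the loop never settles exactly; it dithers by one tick around the aligned point, so the residual skew is bounded by the tick size.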
L2 route: SLCB to CVD
• Connects SLCB to Clock Vernier Device (CVD)
• Full-rail signal with varying supply and varying frequency
• Runs at full core frequency and is single-ended
• H-tree route 2.1-3.3mm in length, matched in RLC delay
• Extracted RLC simulation with a target skew of 6ps
• Each SLCB typically drives several hundred CVDs
• Route tapered to reduce power and RC flight time
[Wire cross-section: GND, CLK, VDD.]
L2 Clock Routing Tool
• In-house tool called EZRoute used for balanced h-tree creation
• Uses two methods of modeling: Elmore delay and SPICE
• Uses a recursive algorithm to adjust wire widths to obtain the lowest skew for a given target
• Tool inputs are the locations of the clock inputs and their loading. The direct output is the layout mask of a clock tree.
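To make the Elmore-delay modeling and width adjustment concrete, here is a minimal sketch that computes the Elmore delay of a wire modeled as an RC ladder and picks, from a few candidate widths, the one that best matches a longer branch to a shorter one. It is not EZRoute's recursive algorithm. The sheet resistance, capacitance values, load and candidate widths are all hypothetical; only the 2.1mm and 3.3mm lengths echo the L2 route range.

```python
# Hypothetical per-unit wire parameters, chosen only for illustration.
R_SHEET_OHM = 0.05       # ohms per square
C_AREA_F = 0.04e-15      # farads per square micron
C_FRINGE_F = 0.08e-15    # farads per micron of length


def wire_segments(length_um, width_um, n=20):
    """Split a uniform wire into n lumped (R, C) segments."""
    seg_len = length_um / n
    r = R_SHEET_OHM * seg_len / width_um
    c = (C_AREA_F * width_um + C_FRINGE_F) * seg_len
    return [(r, c)] * n


def elmore_delay(segments, load_cap):
    """Elmore delay of an RC ladder: each R times all capacitance downstream."""
    downstream = load_cap + sum(c for _, c in segments)
    delay = 0.0
    for r, c in segments:
        delay += r * downstream
        downstream -= c
    return delay


def best_width(length_um, load_cap, target_delay, candidates):
    """Pick the candidate width whose Elmore delay is closest to the target."""
    return min(
        candidates,
        key=lambda w: abs(
            elmore_delay(wire_segments(length_um, w), load_cap) - target_delay
        ),
    )


LOAD_F = 50e-15  # hypothetical load at each branch end
short_delay = elmore_delay(wire_segments(2100, 1.0), LOAD_F)
widths = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
w = best_width(3300, LOAD_F, short_delay, widths)
skew = abs(elmore_delay(wire_segments(3300, w), LOAD_F) - short_delay)
print(f"chosen width {w} um, residual skew {skew * 1e12:.2f} ps")
```

Widening a wire lowers its resistance but raises its capacitance, so the best width depends on the load. Elmore delay also ignores inductance, which is presumably why the slides pair it with SPICE on extracted RLC networks.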
Die micrograph showing L2 grid
Clock Vernier Devices (CVD’s)
• CVDs act as local clock buffers and allow scannable delay insertion
• 14500 individually controllable CVDs on die for localized clock skew adjustment - RAD CVDs for unit-level adjustments
• Skew can be added to the clock
– Automate speedpath detection
– Find and fix speedpaths in software via scan
– Find and fix races in software via scan
– In combination with scan latches and on-die clock shrink, a powerful tool for rapid speedpath debug
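The idea of fixing speedpaths and races by adding clock skew can be illustrated with the standard setup and hold slack relations: delaying the capture clock adds setup margin and removes hold margin by the same amount. The sketch below searches for the smallest added skew that satisfies both. The path delays, setup and hold times, and the 10ps step are hypothetical; only the 500ps period follows from the 2GHz figure in the summary.

```python
def setup_slack(period_ps, skew_ps, max_path_ps, t_setup_ps):
    """Setup slack when the capture clock arrives skew_ps after the launch clock."""
    return period_ps + skew_ps - max_path_ps - t_setup_ps


def hold_slack(skew_ps, min_path_ps, t_hold_ps):
    """Hold slack for the same capture-clock skew."""
    return min_path_ps - skew_ps - t_hold_ps


def smallest_fixing_skew(period_ps, max_path_ps, min_path_ps,
                         t_setup_ps, t_hold_ps, step_ps, max_steps):
    """Smallest skew (in whole steps) meeting setup and hold, or None."""
    for n in range(max_steps + 1):
        skew = n * step_ps
        if (setup_slack(period_ps, skew, max_path_ps, t_setup_ps) >= 0
                and hold_slack(skew, min_path_ps, t_hold_ps) >= 0):
            return skew
    return None


# 2GHz clock gives a 500ps period; the remaining numbers are made up.
fix = smallest_fixing_skew(period_ps=500, max_path_ps=510, min_path_ps=80,
                           t_setup_ps=20, t_hold_ps=15, step_ps=10, max_steps=8)
print("added capture-clock skew (ps):", fix)
```

When the fast path is too short, no skew satisfies both checks and the function returns None, which corresponds to a path that cannot be repaired by skew alone.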
CVD Circuit and Operation
[Circuit schematic: CVD with input SLCBO, internal node SLCBOx and output cvdo. Annotations: drive fight with feedback is attenuated with a pass gate; settings change the delay as desired.]
[SPICE simulations showing low, mid and high delay settings for SLCBOx (top graph) and CVDO (bottom graph).]
Level 3 Route: CVDs to latches
• Connects CVDs to the clock-gating buffers and then latches
• Routed by individual circuit designers, not the clock team
• Clocks are gated by logic wherever possible
• Up to 1.5mm long in the extreme case, but typically a few hundred microns
• Shielded route with a worst-case delay of 12ps
• Routes not correctable by RAD system
• Routes and loads are extracted and simulated in SPICE using the custom “picoskew” tool; CVD and gater sizing is determined by loading
Route Statistics and Power
Route statistics

Route   Terminals    Distance   Delay
L0      14           20mm       640ps
L1      71           5mm        215ps
L2      14500        2-3.3mm    60ps
L3      ~5 million   0-1.5mm    12ps

[Pie chart: power dissipation contribution by route. Labels: Total CPU, L3, L2, L1, L0.]
• Highest load and most power dissipated in the L3 route.
• Future research into low-power clock distribution should focus on the last section of the route.
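A few derived numbers can be read off the route statistics table. The sketch below sums the four route delays and expresses the result in 2GHz clock cycles, then computes delay per millimeter at each level using the longest quoted distance. It assumes the table delays add in series along a PLL-to-latch path and does not account for any delay in the DFD, SLCB, CVD or gater circuits that the table may not include.

```python
CORE_FREQ_GHZ = 2.0  # operating frequency quoted in the summary

# (terminals, longest route in mm, delay in ps), copied from the table
ROUTES = {
    "L0": (14, 20.0, 640),
    "L1": (71, 5.0, 215),
    "L2": (14500, 3.3, 60),
    "L3": (5_000_000, 1.5, 12),
}

total_delay_ps = sum(delay for _, _, delay in ROUTES.values())
cycle_ps = 1000.0 / CORE_FREQ_GHZ
ps_per_mm = {name: delay / length
             for name, (_, length, delay) in ROUTES.items()}

print("total route delay (ps):", total_delay_ps)
print("in clock cycles:", total_delay_ps / cycle_ps)
for name, value in ps_per_mm.items():
    print(f"{name}: {value:.1f} ps/mm")
```

Under these assumptions the four routes add up to 927ps, a little under two cycles at 2GHz.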
Summary
• Four-level clock distribution system operating in a variable-frequency, variable-voltage domain at 2GHz.
• Deskewing system continuously holds routing skew to 10ps or less across PVT.
• CVDs allow software detection and repair of setup or hold timing violations.
• Distribution system dissipates about 25W of power.
Acknowledgements
The authors wish to recognize the efforts of Steve Hall and Erin Francom and the others whose contributions made this design possible.