Upload
ngohanh
View
214
Download
1
Embed Size (px)
Citation preview
1
Digital System Clocking:Digital System Clocking:Digital System Clocking:HighHigh--Performance and LowPerformance and Low--Power AspectsPower Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Wiley-Interscience and IEEE Press, January 2003
Chapter 9: Microprocessor Examples
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 2
Microprocessor Examples• Clocking for Intel® Microprocessors
• IA-32 Pentium® Pro• First IA-64 Microprocessor• Pentium 4
• Sun Microsystems UltraSPARC-III® Clocking• Clocking and CSEs
• Alpha® Clocking: A Historical Overview• Clocking and CSEs
• IBM® Microprocessors• Level-Sensitive Scan Design• Examples of CSEs
2
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 3
Microprocessor Examples• Clocking for Intel® Microprocessors
• IA-32 Pentium® Pro• First IA-64 Microprocessor• Pentium 4
• Sun Microsystems UltraSPARC-III® Clocking• Clocking and CSEs
• Alpha® Clocking: A Historical Overview• Clocking and CSEs
• IBM® Microprocessors• Level-Sensitive Scan Design• Examples of CSEs
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 4
Source: Microprocessor Report Journal
67W23W27WMax Power
0.18µm, 6M0.18µm, 6M0.28µm, 4MIC Process
217mm2106mm2203mm2Die Size
12K/8K/256K16K/16K/256K16k/16K/-Cache (I/D/L2)
42M24M7.5MTransistors
22/2412/1412/14Pipeline Stages
2GHz1GHz266 MHzClock Speed
Dec 2001April 2000June 1997MPR Issue
Pentium 4Pentium IIIPentium II
Intel® Microprocessor Features
3
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 5
Clock distribution network with deskewing circuit(Geannopoulos and Dai 1998), Copyright © 1998 IEEE
IA-32 Pentium® Pro
Delay Line
Delay SRDeskewControl
Delay Line
Delay SR
CLK Gen
Left Spine
Right S
pine
PD
FB Clk
Ext Clk
Core
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 6
Adaptive Deskewing Technique• Equalization of two clock distribution spines
by compensating for delay mismatch • Delay lines• Phase detector• Controller
• Result: global clock skew of only 15ps• 0.25µm technology• 7.5M transistors
4
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 7
Delay shift register(Geannopoulos and Dai 1998), Copyright © 1998 IEEE
IA-32 Pentium® ProOutIn
Load<1:15,2> Load<0:14,2>
<1:15,2> <0:14,2>
Delay Shift Register
Delay Line
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 8
Phase detector(Geannopoulos and Dai 1998), Copyright © 1998 IEEE
IA-32 Pentium® Pro
Left Clk
PhaseDetector 1
PhaseDetector 2
Delay = n
Delay = n
Right Clk
BandwidthControl
Left Leads
Right Leads
5
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 9
Clock distribution topology(Rusu and Tam 2000), Copyright © 2000 IEEE
First IA-64 Microprocessor
PLL
PLLCore Clock
Reference Clock
DeskewCluster
RCDs
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 10
Programmable Deskew Units• Strategy similar to that in IA-32• External differential clock
• System bus frequency
• PLL generates internal clock• 2x frequency
• Clock distribution architecture• Balanced global clock tree• Multiple deskew buffers• Multiple local clock buffers
6
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 11
Deskew buffer architecture(Rusu and Tam 2000), Copyright © 2000 IEEE
First IA-64 Microprocessor
RCD
RCD
RegionalClockGrid
Digital FilterControl FSM
Deskew Settings
Global Clock
TAP Interface
Reference ClockPhase
Detector
Regional Feedback Clock
Deskew Buffer
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 12
Digitally controlled delay line(Rusu and Tam 2000), Copyright © 2000 IEEE
First IA-64 Microprocessor
Output
Input
Delay Control Register
Enable
7
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 13
Simulated regional clock-grid skew(Rusu and Tam 2000), Copyright © 2000 IEEE
First IA-64 Microprocessor
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 14
Measured regional clock skew(Rusu and Tam 2000), Copyright © 2000 IEEE
First IA-64 Microprocessor
8
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 15
Core and I/O clock generation(Kurd et al. 2001), Copyright © 2001 IEEE
Pentium® 42x-Clk
enablesaddr. busoutboundclocks
MACRO
clock enablegenerator
clock enabledistribution
& sync
clock enabledistribution
& sync
1x-Clkenable
MACRO
CorePLL
core Clkdistribution
core clock
I/OPLL
bus clock
bus clock#I/O data Clkdistribution
data busoutbound clocks
divide by 4
outbounddeskew state
machine
core clockI/O feedback clock
data from coreD Q
MSFF
inputbuffer
inboundbuffers
data
DQ
MSFF
core clock
inboundclocks gen
state machine
strobe glitchprotection and
detection
inputbuffers
strobes
inbound latchingclocks
data to core
data clock
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 16
Multi-GHz Clock Network in Pentium 4• Three core and three I/O frequencies
(total 6 frequencies running concurrently)• Differential off-chip reference clock
• PLL synthesizes core and I/O clocks
• Global core clock distribution• 47 independent clock domains• Each domain has 5-bit deskew control register
• Clock skew < 20ps
9
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 17
Logical diagram of core clock distribution(Kurd et al. 2001), Copyright © 2001 IEEE
Pentium® 4
Domain Buffer1
Domain Buffer2
Domain Buffer3
Domain Buffer46
Domain Buffer47
PhaseDetector
PhaseDetector
PhaseDetector
PhaseDetector
To TestAccess Port
Local ClockMacro
Local ClockMacro
Local ClockMacro
SequentialElements
SequentialElements
SequentialElements
SequentialElements
SequentialElements
Local ClockMacro
Local ClockMacro
33-stage binary
tree ofclock
repeaters
PLL
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 18
Example of local clock buffers generating various frequency, phase and types of clocks(Kurd et al. 2001), Copyright © 2001 IEEE
Pentium® 4
Gclk
Enable 1
AdjustableDelay Buffer
Stretch 1
Stretch 0
Enable 2
fast freq.pulse clk
ClkBuf Type 2
Gclk
Enable 2
AdjustableDelay Buffer
Stretch 1
Stretch 0
mediumfreq. pulseclk phase 1
ClkBuf Type 1
Enable 1
Enable
Gclk mediumfreq. normal
clk phase 1ClkBuf Type 3
GclkEnable 2
Stretch 0
Enable 1
ClkBufType 1
Stretch 0
Stretch 1Stretch 1
Enable 1
Enable 2Gclk
mediumfreq. pulseclk phase 2
Gclk
SlowClkSync
Stretch 0
Enable 1
ClkBufType 1
Stretch 0
Stretch 1Stretch 1
Enable 1
Enable 2Gclk
slow freq.pulse clkphase 1
10
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 19
Intel Clocking: Summary• Increasing clock speeds and die size
• Balancing the clock skew in large designs using simple RC trees is becoming less effective
• Insertion delay 7-8FO4 due to increased die• Comparable to the clock period
• Clock skew control has been getting harder to due to increased PVT variations
• Inductive effects at multi-GHz rates• Use of active deskewing circuits
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 20
Microprocessor Examples• Clocking for Intel® Microprocessors
• IA-32 Pentium® Pro• First IA-64 Microprocessor• Pentium 4
• Sun Microsystems UltraSPARC-III® Clocking• Clocking and CSEs
• Alpha® Clocking: A Historical Overview• Clocking and CSEs
• IBM® Microprocessors• Level-Sensitive Scan Design• Examples of CSEs
11
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 21
<80W<30W<30WPower consumption
7 (Al)5 (Al)4 (Al)Metal layers
0.15µm CMOS0.35µm CMOS0.5µm CMOSProcess
1.6V2.5V3.3VSupply voltage
1GHz330MHz167MHzClock Frequency
23M5.4M5.2M# of transistors
15x15.5mm212.5x12.5mm217.7x17.8mm2Die size
SPARC V9, 4-issueSPARC V9, 4-issueSPARC V9, 4-issueArchitecture
200019971995Year
UltraSPARC-III®UltraSPARC-II®UltraSPARC-I®
UltraSPARC® Family Characteristics
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 22
UltraSPARC-III: Clocking
• Performance-driven high-power clock distribution
• Eight logic gates per cycle• High-speed semi-dynamic flip-flops with
logic embedding • Large hold time mandates use of advanced
tools for fixing fast-path violations
12
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 23
Clock distribution delay in UltraSPARC-III(Heald et al. 2000), Copyright © 2000 IEEE
UltraSPARC-III®: Clocking
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 24
Semidynamic flip-flop(Klass 1998), Copyright © 1998 IEEE
Vdd
Vdd
Clk
D
MN3
MN1MN4
MP2
MP1
Inv1
Inv2 Inv3
NAND
MN2
MN5Q
Inv4
Inv6
Inv5Q
S
Clk1
UltraSPARC-III: Clock Storage Elements
• Single-ended dynamic structure with use of keepers for static operation and use of clock pulsing
• Positive feedback (NAND) improves low-to-high setup time• Fast, at the price of high internal and clock power
13
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 25
UltraSPARC-III: Clock Storage ElementsVdd
Vdd
Clk
D1
QMN3
MN1
MN5
MN4
MP2
MP1
Inv1
Inv5Inv4
Inv2
Q
Inv3Inv6
NAND
D2
MN2a MN2c
MN2b MN2d
D2
D1
SVdd
Vdd
Clk
D1
QMN3
MN1
MN5
MN4
MP2
MP1
Inv1
Inv5Inv4
Inv2
Q
Inv3NMOSnetwork
D2
DN
Inv6
NAND
S
Logic embedding in a semi-dynamic flip-flop Two-input XOR function(Klass, 1998), Copyright © 1998 IEEE
• A non-inverting logic function can be embedded by replacing the input D transistor with an n-MOS logic network
• Necessary for fitting 8 logic stages in cycle time, also used for scan• Complexity of embedded logic limited by the n-MOS stack depth
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 26
UltraSPARC-III: Clock Storage ElementsVdd
Clk
MN3
MN1
MP1
Inv1-2
Inv5
Inv3-4
D
Vdd
QInv6
MN2 MN4
MN5
MP2 MP4 MP3
MN7MN6
D
QS R
Vdd
Clk
MN3
MN1
MP1
Inv1
Inv5
Inv4
Inv2
Q
Inv3D
NAND
MN2
S
Single-ended dynamic SDFF Differential dynamic SDFF(Klass, 1998), Copyright © 1998 IEEE
• Dynamic version of SDFF used in dynamic logic paths • Outputs exercise precharge-evaluate sequence to ensure
monotonicity
14
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 27
UltraSPARC-III flip-flop(Heald et al. 2000), Copyright © 2000 IEEE
Vdd
Vdd
Clk
D
QMN3
MN1MN5
MN4
MP5
MP1
Inv1
Inv5
Inv2 Inv3
Inv4NAND
DVdd
Vdd
Vdd
MN2
MN6
MN7
MP2
MP3
MP4MP6
MP7
Q
S
UltraSPARC-III: Clock Storage Elements
• Final UltraSPARC-III flip-flop modified by decoupling keepers to increase immunity to α-particles
• Somewhat degraded speed and logic embedding property
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 28
Microprocessor Examples• Clocking for Intel® Microprocessors
• IA-32 Pentium® Pro• First IA-64 Microprocessor• Pentium 4
• Sun Microsystems UltraSPARC-III® Clocking• Clocking and CSEs
• Alpha® Clocking: A Historical Overview• Clocking and CSEs
• IBM® Microprocessors• Level-Sensitive Scan Design• Examples of CSEs
15
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 29
12121416Gates/Cycle
1200600300200Clk Freq. [MHz]
125725030Power [W]
1.52.23.33.3Supply [V]0.18µm0.35µm0.5µm0.75µmProcess
21.1x18.816.7x18.818.1x16.516.8x13.9Die Size [mm2]
15215.29.31.68# transistors [M]
21364212642116421064
Alpha® Microprocessor Features
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 30
Alpha microprocessor final clock driver location:(a) 21064, (b) 21164, (c) 21264
(a) (b) (c)
Alpha® Microprocessors: Clocking
clockgrid
16
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 31
21064 clock skew(Gronowski et al. 1998), Copyright © 1998 IEEE
Alpha® Microprocessors: Clocking
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 32
21164 clock skew(Gronowski et al. 1998), Copyright © 1998 IEEE
Alpha® Microprocessors: Clocking
17
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 33
21264 clock hierarchy(Gronowski et al. 1998), Copyright © 1998 IEEE
Alpha® Microprocessors: Clocking
PLLext. clk
D
Clk
D
Clk
D
Clkcond
local clk
cond.local clk
D
Clk
D
Clk
D
Clk
cond
local clk
cond.local clk
GCLKGrid Box
ClkGrid
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 34
21264 clock skew(Gronowski et al. 1998), Copyright © 1998 IEEE
Alpha® Microprocessors: Clocking
18
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 35
21364 major clock domains(Xanthopoulos et al. 2001), Copyright © 2001
Alpha® Microprocessors: Clocking
GCLKgrid
DLL
DLL DLL
L2LClk L2RClk
NCLK
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 36
21364, NCLK clock skew(Xanthopoulos et al. 2001), Copyright © 2001 IEEE
Alpha® Microprocessors: Clocking
19
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 37
21064 modified TSPC latches(Gronowski et al. 1998), Copyright © 1998
D
Clk
X Q
1N 2N
3N4N
2P1P
5PD
ClkX Q
1N 2N
3P
4P
2P1P
5N
Alpha® µP: Clock Storage Elements
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 38
21164: (a) phase-A latch, (b) phase-B latch(Gronowski et al. 1998), Copyright © 1998 IEEE
(a) (b)
Alpha® µP: Clock Storage Elements
D Q
Clk
XD Q
Clk
X
20
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 39
Embedding of logic into a latch:(a) 21064 TSPC latch, one level of logic;
(b) 21164 latch, two levels of logic.(Gronowski et al. 1998), Copyright © 1998 IEEE
(a) (b)
Alpha® µP: Clock Storage Elements
Q
X2
Clk
Q
X1
Clk
Clk
X
1D
2D
1D
2D
3D4D
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 40
21264 flip-flop(Gronowski et al. 1998), Copyright © 1998 IEEE
Alpha® µP: Clock Storage ElementsQQ
Clk
D
21
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 41
Critical-path and race analysis for clock buffering and conditioning(Gronowski et al. 1998), Copyright © 1998 IEEE
Alpha® Microprocessors: Timing
GCLK
D Q Logic D Q
R
Critical Path Definition and Criteria- Identify common clock, D and R- Maximize D- Minimize R
D+U−R ≤ Tcycle
GCLK
D Q Logic
D R
Race Definition and Criteria- Identify common clock, D and R- Minimize D- Maximize R
D ≥ R+H
cond
D
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 42
Microprocessor Examples• Clocking for Intel® Microprocessors
• IA-32 Pentium® Pro• First IA-64 Microprocessor• Pentium 4
• Sun Microsystems UltraSPARC-III® Clocking• Clocking and CSEs
• Alpha® Clocking: A Historical Overview• Clocking and CSEs
• IBM® Microprocessors• Level-Sensitive Scan Design• Examples of CSEs
22
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 43
Eichelberger 1983
Hazard-Free Level-Sensitive Polarity-Hold Latch
Out
+Clock
-Clock
Data
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 44
General LSSD Configuration
Sn+1 = f {Sn, X}
Present StateSn
Next State Sn+1
Clocked StorageElements
Clock
Outputs (Y)
Y=Y(X, Sn)
Inputs (X)Combinational
Logic
Scan-In
Scan-Out
Scan-Out
23
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 45
LSSD Shift Register Latch
-Scan_In
+A Clk
-Data
-C Clk
+B Clk
-L2
+L2
L2 Latch
L1 Latch-L1 +L1
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 46
LSSD Double Latch Design
PrimaryInputs X
L1
L1
L1
L1
L2
L2
L2
L2
CombinationalLogic
X1
X2
X3
Xn
Primary Outputs Z
Sn
State Sn
C1
A ShiftScan InB Shift
or Scan In
Scan Out
24
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 47
LSSD SRL with multiplexer used in the IBM S/390 G4 processor(Sigal et al. 1997), reproduced by permission
IBM® S/390 Parallel Server Processor
Q
(SCAN_OUT)
L1
A_CLK
SCAN_INL2
B_CLK
CLKL
CLKG
SELECT_N
TEST_DISABLE
SELECT_ACLKL
CLK_ENABLE
CLKG IN_A
IN_B
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 48
Static multiplexer version of the SRL used in the IBM S/390 G4(Sigal et al. 1997), reproduced by permission
IBM® S/390 Parallel Server Processor
(SCAN_OUT)
SELECT_N
SELECT_A
TEST_DISABLE
Q
Q
B_CLKA_CLK
SCAN_IN
IN_AIN_BIN_C
IN_MIN_N
CLKL
mux_A
mux_M_N
25
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 49
A clocked storage element is used in the non-timing-critical timing macros of the IBM S/390 G4 processor
(Sigal et al. 1997), reproduced by permission
IBM® S/390 Parallel Server Processor
Q(S
CA
N_O
UT)
L1
A_CLK
SCAN_IN
IN L2
CLKG
C1
C2
B_CLKCLKG
C2_ENABLE
C1_DISABLE C1
C2
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 50
The clock-generation element used to detect problems created with fast paths:IBM S/390 G4 processor
(Sigal et al. 1997), reproduced by permission
IBM® S/390 Parallel Server Processor
UNOVERLAP
C1
C2B_CLK
CLKG
C1_DISABLE
C2_ENABLE
CLKG
C1
C2
26
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 51
The experimental IBM PowerPC processor(Silberman et al. 1998), reproduced by permission
IBM® PowerPC Processor
(a)
(b)
SEL0
SEL0
D0
CLK
SELn-1
Dn-1
SELn-1
CLK
CLK
OC
OT
Slave Latch
SR Master Latch
Complement Mux
True Mux
SO
SEL_EXTi
SG
SELi
CLK
SCAN_GATE
NCLK
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 52
The PowerPC 603 MSL(Gerosa et al. 1994), Copyright © 1994 IEEE
IBM® PowerPC 603: Master-Slave Latch
Din
C1
C1
VDD
ACLK
ACLK
ACLK
SCANin
C2
C2
C2
Dout
27
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 53
The PowerPC 603 local clock regenerator(Gerosa et al. 1994), Copyright © 1994 IEEE
IBM® PowerPC 603: Local Clk Generator
C1_TESTSCAN_C1
GCLK
WAITCLKOVERRIDE
C2_TEST
C1_FREEZE
C2_FREEZE
ACLK
C1
C2
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic 54
Summary• Intel® Microprocessors
• Active clock deskewing in Pentium® processors
• Sun Microsystems® Processors• Semidynamic flip-flop (one of the fastest
single-ended flip-flops today, “soft-edge”)
• Alpha® Processors• Performance leader in the ‘90s• Incorporating logic into CSEs
• IBM® Processors• Design for testability techniques• Low-power champion PowerPC 603