Upload
satheesh-kumar
View
220
Download
0
Embed Size (px)
Citation preview
7/28/2019 Lec 19 Design for Skew
1/29
Introduction to
CMOS VLSIDesign
Design for Skew
7/28/2019 Lec 19 Design for Skew
2/29
CMOS VLSI DesignDesign for Skew Slide 2
Outline
Clock Distribution
Clock Skew
Skew-Tolerant Static Circuits
Traditional Domino Circuits
Skew-Tolerant Domino Circuits
7/28/2019 Lec 19 Design for Skew
3/29
CMOS VLSI DesignDesign for Skew Slide 3
Clocking
Synchronous systems use a clock to keep
operations in sequence
Distinguish this fromprevious ornext
Determine speed at which machine operates
Clock must be distributed to all the sequencing
elements
Flip-flops and latches
Also distribute clock to other elements Domino circuits and memories
7/28/2019 Lec 19 Design for Skew
4/29
CMOS VLSI DesignDesign for Skew Slide 4
Clock Distribution
On a small chip, the clock distribution network is just
a wire
And possibly an inverter for clkb
On practical chips, the RC delay of the wire
resistance and gate load is very long
Variations in this delay cause clock to get to
different elements at different times
This is called clock skew
Most chips use repeaters to buffer the clock and
equalize the delay
Reduces but doesnt eliminate skew
7/28/2019 Lec 19 Design for Skew
5/29
CMOS VLSI DesignDesign for Skew Slide 5
Example
Skew comes from differences in gate and wire delay
With right buffer sizing, clk1 and clk2 could ideally
arrive at the same time.
But power supply noise changes buffer delays
clk2 and clk3 will always see RC skew
3 mm
1.3 pF
3.1 mmgclk
clk1
0.5 mm
clk2
clk3
0.4 pF 0.4 pF
7/28/2019 Lec 19 Design for Skew
6/29
CMOS VLSI DesignDesign for Skew Slide 6
Review: Skew Impact
F1
F2
clk
clk clk
Combinational Logic
Tc
Q1 D2
Q1
D2
tskew
CL
Q1
D2
F1
clk
Q1
F2
clk
D2
clk
tskew
tsetup
tpcq
tpdq
tcd
thold
tccq
setup skewsequencing overhead
hold skew
pd c pcq
cd ccq
t T t t t
t t t t
Ideally full cycle is
available for work
Skew adds sequencing
overhead Increases hold time too
7/28/2019 Lec 19 Design for Skew
7/29
CMOS VLSI DesignDesign for Skew Slide 7
Cycle Time Trends
Much of CPU performance comes from higherf
fis improving faster than simple process shrinks
Sequencing overhead is bigger part of cycle
0.01
0.1
1
10
100
8038680486PentiumPentium II / III
SpecInt95
1985 1988 1991 1994 1997 2000
1.2 0.8 0.6 0.352.0
Process
100
200
500VDD = 5
VDD = 3.3
VDD = 2.5
50Fanout-of-4(FO4)InverterD
elay(ps)
0.25
10
100
1000
8038680486PentiumPentium II / III
MHz
1988 1991 1994 1997 20001985
10
100
1985 1988 1991 1994 1997
8038680486PentiumPentium II / IIIF
O4inverterdelays/cycle
50
20
2000
7/28/2019 Lec 19 Design for Skew
8/29
CMOS VLSI DesignDesign for Skew Slide 8
Solutions
Reduce clock skew
Careful clock distribution network design
Plenty of metal wiring resources
Analyze clock skew Only budget actual, not worst case skews
Local vs. global skew budgets
Tolerate clock skew
Choose circuit structures insensitive to skew
7/28/2019 Lec 19 Design for Skew
9/29
CMOS VLSI DesignDesign for Skew Slide 9
Clock Dist. Networks
Ad hoc
Grids
H-tree
Hybrid
7/28/2019 Lec 19 Design for Skew
10/29
CMOS VLSI DesignDesign for Skew Slide 10
Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die
7/28/2019 Lec 19 Design for Skew
11/29
CMOS VLSI DesignDesign for Skew Slide 11
Alpha Clock Grids
PLL
gclk grid
Alpha 21064 Alpha 21164 Alpha 21264
gclk grid
Alpha 21064 Alpha 21164 Alpha 21264
7/28/2019 Lec 19 Design for Skew
12/29
CMOS VLSI DesignDesign for Skew Slide 12
H-Trees
Fractal structure
Gets clock arbitrarily close to any point
Matched delay along all paths
Delay variations cause skew
A and B might see big skew A B
7/28/2019 Lec 19 Design for Skew
13/29
CMOS VLSI DesignDesign for Skew Slide 13
Itanium 2 H-Tree
Four levels of buffering:
Primary driver
Repeater
Second-levelclock buffer
Gater
Route around
obstructionsPrimary Buffer
Repeaters
Typical SLCB
Locations
7/28/2019 Lec 19 Design for Skew
14/29
CMOS VLSI DesignDesign for Skew Slide 14
Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC H-tree drives 16-64 sector buffers
Buffers drive total of 1024 points
All points shorted together with grid
7/28/2019 Lec 19 Design for Skew
15/29
CMOS VLSI DesignDesign for Skew Slide 15
Skew Tolerance
Flip-flops are sensitive to skew because ofhard edges
Data launches at latest rising edge of clock
Must setup before earliest next rising edge of clock
Overhead would shrink if we can soften edge
Latches tolerate moderate amounts of skew
Data can arrive anytime latch is transparent
7/28/2019 Lec 19 Design for Skew
16/29
CMOS VLSI DesignDesign for Skew Slide 16
Skew: Latches
Q1L1
1
2
L2
L3
1
1
2
Combinational
Logic 1
Combinational
Logic 2
Q2 Q3D1 D2 D3
sequencing overhead
1 2 hold nonoverlap skew
borrow setup nonoverlap skew
2
,
2
pd c pdq
cd cd ccq
c
t T t
t t t t t t
Tt t t t
2-Phase Latches
setup skew
sequencing overhead
hold skew
borrow setup skew
max ,pd c pdq pcq pw
cd pw ccq
pw
t T t t t t t
t t t t t
t t t t
Pulsed Latches
7/28/2019 Lec 19 Design for Skew
17/29
CMOS VLSI DesignDesign for Skew Slide 17
Dynamic Circuit Review
Static circuits are slow because fat pMOS load input
Dynamic gates use precharge to remove pMOS
transistors from the inputs
Precharge: = 0 output forced high
Evaluate: = 1 output may pull low
A B
Y
C DY
A B C D
A
B
CD
7/28/2019 Lec 19 Design for Skew
18/29
CMOS VLSI DesignDesign for Skew Slide 18
Domino Circuits
Dynamic inputs must monotonically rise during
evaluation
Place inverting stage between each dynamic gate
Dynamic / static pair called domino gate
Domino gates can be safely cascaded
A
W
B
X
domino AND
dynamic
NAND
static
inverter
7/28/2019 Lec 19 Design for Skew
19/29
CMOS VLSI DesignDesign for Skew Slide 19
Domino Timing
Domino gates are 1.5 2x faster than static CMOS
Lower logical effort because of reduced Cin
Challenge is to keep precharge off critical path
Look at clocking schemes for precharge and eval Traditional schemes have severe overhead
Skew-tolerant domino hides this overhead
7/28/2019 Lec 19 Design for Skew
20/29
CMOS VLSI DesignDesign for Skew Slide 20
Traditional Domino Ckts
Hide precharge time by ping-ponging between half-
cycles
One evaluates while other precharges
Latches hold results during precharge
Tc
Static
Dynamic
Latch
clk
Static
Dynamic
clk
Static
Dynamic
clk
Dynamic
clk clk
Static
Dynamic
Latch
Static
Dynamic
Static
Dynamic
Dynamic
clk clk clk clk clk
clk
clk
tpdq
tpdq
2pd c pdqt T t
7/28/2019 Lec 19 Design for Skew
21/29
CMOS VLSI DesignDesign for Skew Slide 21
Clock Skew
Skew increases sequencing overhead
Traditional domino has hard edges
Evaluate at latest rising edge
Setup at latch by earliest falling edge
Static
Dynamic
Latch
clk
Static
Dynamic
clk
Dynamic
clk clk
Static
Dynamic
Latch
Static
Dynamic
Dynamic
clk clk clk clk
clk
clk
tskew
tsetup
setup skew2 2pd ct T t t
7/28/2019 Lec 19 Design for Skew
22/29
CMOS VLSI DesignDesign for Skew Slide 22
Time Borrowing
Logic may not exactly fit half-cycle
No flexibility to borrow time to balance logic
between half cycles
Traditional domino sequencing overhead is about
25% of cycle time in fast systems!
Static
Dynamic
Latch
clk
Static
Dynamic
clk clk
Static
Dynamic
Latch
Static
Dynamic
clk clk clk
clk
clk
tskew
tsetup
7/28/2019 Lec 19 Design for Skew
23/29
CMOS VLSI DesignDesign for Skew Slide 23
Relaxing the Timing
Sequencing overhead caused by hard edges
Data departs dynamic gate on late rising edge
Must setup at latch on early falling edge
Latch functions
Prevent glitches on inputs of domino gates
Holds results during precharge
Is the latch really necessary?
No glitches if inputs come from other domino Can we hold the results in another way?
7/28/2019 Lec 19 Design for Skew
24/29
CMOS VLSI DesignDesign for Skew Slide 24
Skew-Tolerant Domino
Use overlapping clocks to eliminate latches at phase
boundaries.
Second phase evaluates using results of first
a
Static
Dynamic1
Static
Dynamic2
b c d
a
1
2
b
c
a
1
2
b
c
No latch at
phase boundary
7/28/2019 Lec 19 Design for Skew
25/29
CMOS VLSI DesignDesign for Skew Slide 25
Full Keeper
After second phase evaluates, first phase precharges
Input to second phase falls
Violates monotonicity?
But we no longer need the value
Now the second gate has a floating output
Need full keeper to hold it either high or low
weak full
keeper
transistors
f
X
H
7/28/2019 Lec 19 Design for Skew
26/29
CMOS VLSI DesignDesign for Skew Slide 26
Time Borrowing
Overlap can be used to
Tolerate clock skew
Permit time borrowing
No sequencing overhead
tskew
Static
Dynamic
Static
Dynamic
Static
Dynamic
Dynamic
Static
Dynamic
Static
Dynamic
Static
Dynamic
Dynamic
Static
Static
1
2
1
1
1
1
1
2
2
2
Phase 1 Phase 2
toverlap
tborrow
pd ct T
7/28/2019 Lec 19 Design for Skew
27/29
CMOS VLSI DesignDesign for Skew Slide 27
Multiple Phases
With more clock phases, each phase overlaps more
Permits more skew tolerance and time borrowing
Static
Dynamic
Static
Dynamic
Static
Dynamic
Dynamic
Static
Dynamic
Static
Dynamic
Static
Dynamic
Dynamic
Static
Static
3
4
1 1 2 2 3 3 4 4
Phase 1 Phase 2 Phase 3 Phase 4
1
2
7/28/2019 Lec 19 Design for Skew
28/29
CMOS VLSI DesignDesign for Skew Slide 28
Clock Generation
clken
1
2
3
4
7/28/2019 Lec 19 Design for Skew
29/29
CMOS VLSI DesignDesign for Skew Slide 29
Summary
Clock skew effectively increases setup and holdtimes in systems with hard edges
Managing skew
Reduce: good clock distribution network
Analyze: local vs. global skew
Tolerate: use systems with soft edges
Flip-flops and traditional domino are costly
Latches and skew-tolerant domino perform at fullspeed even with moderate clock skews.