Sp12 CMPEN 411 L17 S.1
CMPEN 411 VLSI Digital Circuits
Spring 2012
Lecture 17: Dynamic Sequential Circuits and Timing Issues
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L17 S.2
This Lecture
Reading
Dynamic sequential circuits
- Reading assignment – Rabaey, et al, 7.3, 7.7
Timing issues, Intro to datapath design
- Reading assignment – Rabaey, et al, 10.1-10.3.3; 11.1-11.2
Next lecture
Intro to datapath design
- Reading assignment – Rabaey, et al, 11.1-11.2
Adder design
- Reading assignment – Rabaey, et al, 11.3
Sp12 CMPEN 411 L17 S.3
Last Lecture: Static MS ET Implementation
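[Figure: static master-slave edge-triggered flip-flop. Master and slave stages built from transmission gates T1-T4 and inverters I1-I6; input D, master output QM, output Q; clocked by clk and !clk.]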
Sp12 CMPEN 411 L17 S.4
Dynamic ET Flipflop
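[Figure: dynamic edge-triggered flip-flop. Master: transmission gate T1, storage capacitor C1, inverter I1, output QM. Slave: transmission gate T2, storage capacitor C2, inverter I2, output Q. Input D; clocked by clk and !clk. The clock waveform alternates between two phases: master transparent with slave hold, and master hold with slave transparent.]
tsu = tpd_tx
thold = zero
tc-q = 2 tpd_inv + tpd_tx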
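The three timing expressions above can be evaluated directly. The following is a minimal sketch; the function name and the delay values (in picoseconds) are hypothetical and chosen only for illustration.

```python
def dynamic_ff_timing(tpd_tx, tpd_inv):
    """Return (t_su, t_hold, t_cq) for the dynamic ET flip-flop,
    using the expressions stated on the slide."""
    t_su = tpd_tx                  # setup: data must pass the first transmission gate
    t_hold = 0.0                   # hold time is taken as zero
    t_cq = 2 * tpd_inv + tpd_tx    # two inverters plus the slave transmission gate
    return t_su, t_hold, t_cq


# Assumed example delays in ps (not from the lecture)
t_su, t_hold, t_cq = dynamic_ff_timing(tpd_tx=40.0, tpd_inv=25.0)
print(t_su, t_hold, t_cq)  # 40.0 0.0 90.0
```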
Sp12 CMPEN 411 L17 S.5
Pseudostatic Dynamic Latch
Robustness considerations limit the use of dynamic FF’s
coupling between signal nets and internal storage nodes can inject significant noise and destroy the FF state
leakage currents cause state to leak away with time
internal dynamic nodes don’t track fluctuations in VDD, which reduces noise margins
A simple fix is to make the circuit pseudostatic
[Figure: pseudostatic latch with nodes QM and Q, clocked by clk and !clk]
The above logic is added to all dynamic latches
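To get a feel for how quickly leakage erodes a dynamically stored value, a first-order estimate divides the charge margin by the leakage current. This is a rough sketch only: the constant-current model, the function name, and all numeric values are assumptions, not data from the lecture.

```python
def retention_time(c_store, delta_v, i_leak):
    """First-order time for a storage node to droop by delta_v volts,
    assuming a constant leakage current: t = C * dV / I."""
    return c_store * delta_v / i_leak


# Assumed values: 5 fF storage node, 0.5 V tolerable droop, 1 pA leakage
t = retention_time(c_store=5e-15, delta_v=0.5, i_leak=1e-12)
print(f"{t * 1e3:.2f} ms")  # 2.50 ms
```

With these assumed numbers the state survives for a few milliseconds, which is why a dynamic node needs periodic refresh or a pseudostatic feedback path.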
Sp12 CMPEN 411 L17 S.6
Dynamic ET FF Race Conditions
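[Figure: dynamic ET flip-flop (T1, C1, I1, QM, T2, C2, I2, Q; input D) with clk and !clk waveforms showing the intervals where both clocks are 0 (0-0 overlap) or both are 1 (1-1 overlap)]
0-0 overlap race condition
$t_{overlap0-0} < t_{T1} + t_{I1} + t_{T2}$
1-1 overlap race condition
$t_{overlap1-1} < t_{hold}$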
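The two overlap constraints above translate into a simple check. This is a minimal sketch with a hypothetical function name and assumed delay values in picoseconds.

```python
def overlap_is_safe(t_overlap_00, t_overlap_11, t_t1, t_i1, t_t2, t_hold):
    """Return (safe_00, safe_11): whether each overlap interval is short
    enough to avoid the corresponding race condition."""
    # 0-0 overlap: data must not race through T1, I1 and T2 during the overlap
    safe_00 = t_overlap_00 < t_t1 + t_i1 + t_t2
    # 1-1 overlap: the overlap must be shorter than the hold time
    safe_11 = t_overlap_11 < t_hold
    return safe_00, safe_11


# Assumed example values in ps
print(overlap_is_safe(50, 10, t_t1=30, t_i1=25, t_t2=30, t_hold=20))   # (True, True)
print(overlap_is_safe(100, 10, t_t1=30, t_i1=25, t_t2=30, t_hold=20))  # (False, True)
```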
Sp12 CMPEN 411 L17 S.7
Fix 1: Dynamic Two-Phase ET FF
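[Figure: dynamic ET flip-flop (T1, C1, I1, QM, T2, C2, I2, Q; input D) with the master clocked by clk1/!clk1 and the slave by clk2/!clk2. The clk1 and clk2 waveforms are separated by tnon_overlap and alternate between two phases: master transparent with slave hold, and master hold with slave transparent.]
Keep the clock non-overlap time large enough to avoid races; the cost is four clock signals to route (clk1, !clk1, clk2, !clk2)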
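A two-phase non-overlapping clock can be sketched as two pulse trains separated by a gap of tnon_overlap. The generator below is illustrative only: it works in integer time steps, and the function name, period, and gap are assumptions.

```python
def two_phase_clocks(period, t_non_overlap, n_samples):
    """Sample clk1 and clk2 at integer time steps.
    Each phase is high for (period // 2 - t_non_overlap) steps, and the two
    high pulses are separated by t_non_overlap steps on both sides."""
    high = period // 2 - t_non_overlap
    if high <= 0:
        raise ValueError("non-overlap time leaves no room for the clock pulses")
    clk1, clk2 = [], []
    for t in range(n_samples):
        phase = t % period
        clk1.append(1 if phase < high else 0)
        clk2.append(1 if period // 2 <= phase < period // 2 + high else 0)
    return clk1, clk2


clk1, clk2 = two_phase_clocks(period=20, t_non_overlap=2, n_samples=40)
# The two phases are never high at the same time
print(any(a and b for a, b in zip(clk1, clk2)))  # False
```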
Sp12 CMPEN 411 L17 S.8
Fix 2: C2MOS (Clocked CMOS) ET Flipflop
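[Figure: C2MOS master-slave flip-flop. Master stage: transistors M1-M4 with storage node QM (capacitor C1). Slave stage: transistors M5-M8 with output Q (capacitor C2). Input D; clocked by clk and !clk.]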
A clock-skew insensitive FF
Sp12 CMPEN 411 L17 S.9
C2MOS (Clocked CMOS) ET Flipflop
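[Figure: C2MOS master-slave flip-flop (master M1-M4 with node QM and C1; slave M5-M8 with output Q and C2; input D; clocks clk and !clk). The clocked transistors are annotated on/off for the two phases: master transparent with slave hold, and master hold with slave transparent.]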
A clock-skew insensitive FF
Sp12 CMPEN 411 L17 S.10
C2MOS FF 0-0 Overlap Case
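[Figure: C2MOS flip-flop during a 0-0 overlap (clk = !clk = 0). The schematic is reduced to transistors M1, M2, M4, M5, M6, M8, with nodes D, QM, Q and capacitors C1, C2; clk and !clk waveforms shown.]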
Clock-skew insensitive as long as the rise and fall times of the clock edges are sufficiently small
Sp12 CMPEN 411 L17 S.11
C2MOS FF 1-1 Overlap Case
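[Figure: C2MOS flip-flop during a 1-1 overlap (clk = !clk = 1). The schematic is reduced to transistors M1, M2, M3, M5, M6, M7, with nodes D, QM, Q and capacitors C1, C2; clk and !clk waveforms shown.]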
Sp12 CMPEN 411 L17 S.12
Fix 3: True Single Phase Clocked (TSPC) Latches
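[Figure: TSPC positive latch and negative latch schematics, each with input In, output Q, and a single clock clk]
Positive latch
- transparent when clk = 1
- hold when clk = 0
Negative latch
- hold when clk = 1
- transparent when clk = 0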
Sp12 CMPEN 411 L17 S.14
TSPC ET FF
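[Figure: TSPC edge-triggered flip-flop. Master and slave stages in series with input D, internal node QM, and output Q, all driven by the single clock clk. Transistors are annotated on/off for the two phases: master transparent with slave hold, and master hold with slave transparent.]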
Sp12 CMPEN 411 L17 S.15
Choosing a Clocking Strategy
Choosing the right clocking scheme affects the functionality, speed, and power of a circuit
Two-phase designs
+ robust and conceptually simple
- need to generate and route two clock signals
- have to design to accommodate possible skew between the two clock signals
Single phase designs
+ only need to generate and route one clock signal
+ supported by most automated design methodologies
+ don’t have to worry about skew between the two clocks
- have to have guaranteed slopes on the clock edges
Sp12 CMPEN 411 L17 S.16
Review: Sequential Definitions
Use two level-sensitive latches of opposite type to build one master-slave flipflop that changes state on a clock edge (when the slave is transparent)
Static storage
static uses a bistable element with feedback to store its state and thus preserves state as long as the power is on
- Loading new data into the element: 1) cutting the feedback path (mux based); 2) overpowering the feedback path (SRAM based)
Dynamic storage
dynamic stores state on parasitic capacitors, so the state is held for only a limited time (milliseconds) and requires periodic refresh
dynamic is usually simpler (fewer transistors), faster, and lower power, but because of noise immunity issues the circuit is always modified (by adding a feedback loop on the output) so that it is pseudostatic
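The master-slave construction from two latches of opposite type can be illustrated with a small behavioral model. This is a logic-level sketch only (no delays, no storage-node effects); the class names are hypothetical.

```python
class Latch:
    """Level-sensitive latch: follows d while clk equals transparent_level,
    otherwise holds its last value."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0

    def update(self, clk, d):
        if clk == self.transparent_level:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Positive-edge-triggered flip-flop: negative-latch master followed by
    a positive-latch slave."""

    def __init__(self):
        self.master = Latch(transparent_level=0)
        self.slave = Latch(transparent_level=1)

    def update(self, clk, d):
        qm = self.master.update(clk, d)      # master samples D while clk = 0
        return self.slave.update(clk, qm)    # slave passes QM while clk = 1


ff = MasterSlaveFF()
stimulus = [(0, 1), (1, 0), (0, 0), (1, 1), (0, 1), (1, 1)]  # (clk, d) pairs
print([ff.update(clk, d) for clk, d in stimulus])  # [0, 1, 1, 0, 0, 1]
```

Q changes only when clk goes from 0 to 1, and it takes the value D had just before that edge.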
Sp12 CMPEN 411 L17 S.17
Timing Classifications
Synchronous systems
All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal)
Functionality is ensured by strict constraints on the clock signal generation and distribution to minimize
- Clock skew (spatial variations in clock edges)
- Clock jitter (temporal variations in clock edges)
Asynchronous systems
Self-timed (controlled) systems
No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.)
Hybrid systems
Synchronization between different clock domains
Interfacing between asynchronous and synchronous domains
Sp12 CMPEN 411 L17 S.18
Review: Synchronous Timing Basics
Under ideal conditions (i.e., when tclk1 = tclk2)
$T \ge t_{c-q} + t_{plogic} + t_{su}$
$t_{hold} \le t_{cdlogic} + t_{cdreg}$
Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations
skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions)
jitter causes T to change on a cycle-by-cycle basis
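[Figure: register R1 (input In) feeds combinational logic, which feeds register R2. Both registers are clocked by clk, which arrives at times tclk1 and tclk2. Register parameters: tc-q, tsu, thold, tcdreg. Logic parameters: tplogic, tcdlogic.]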
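The two ideal-case constraints on this slide (minimum clock period and hold time) can be checked numerically. The sketch below uses hypothetical function names and assumed delays in picoseconds.

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest period that satisfies T >= t_cq + t_plogic + t_su."""
    return t_cq + t_plogic + t_su


def hold_met(t_hold, t_cdlogic, t_cdreg):
    """True if t_hold <= t_cdlogic + t_cdreg (no race through the fast path)."""
    return t_hold <= t_cdlogic + t_cdreg


# Assumed example values in ps
t_min = min_clock_period(t_cq=100, t_plogic=800, t_su=50)
print(t_min)                                            # 950
print(hold_met(t_hold=30, t_cdlogic=40, t_cdreg=60))    # True
```

Note that the period constraint uses the worst-case (propagation) delays, while the hold constraint uses the best-case (contamination) delays.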
Sp12 CMPEN 411 L17 S.19
Sources of Clock Skew and Jitter in Clock Network
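[Figure: clock network from the PLL through the clock drivers and interconnect to the loads, with seven numbered sources of variation: clock generation, clock drivers, power supply, interconnect, capacitive load, capacitive coupling, and temperature]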
Skew
manufacturing device variations in clock drivers
interconnect variations
environmental variations (power supply and temperature)
Jitter
clock generation
capacitive loading and coupling
environmental variations (power supply and temperature)
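Given clock edge arrival times at two points in the network, skew and jitter can be estimated separately: skew as the average spatial offset between the two points, jitter as the cycle-to-cycle deviation of the period at one point. The sketch below is one simple way to do this; the function names, the choice of peak deviation from the mean period as the jitter measure, and the sample timestamps are all assumptions.

```python
from statistics import mean


def clock_skew(edges_a, edges_b):
    """Average arrival-time difference (b - a) between two clock sinks."""
    return mean(b - a for a, b in zip(edges_a, edges_b))


def period_jitter(edges):
    """Peak deviation of the measured period from its mean at one sink."""
    periods = [t1 - t0 for t0, t1 in zip(edges, edges[1:])]
    nominal = mean(periods)
    return max(abs(p - nominal) for p in periods)


# Assumed rising-edge timestamps in ps at two sinks
edges_a = [0, 1000, 2010, 2990, 4000]
edges_b = [50, 1050, 2060, 3040, 4050]
print(clock_skew(edges_a, edges_b))  # 50
print(period_jitter(edges_a))        # 20
```

In this example the offset between the sinks is the same in every cycle (skew), while the period at a single sink varies from cycle to cycle (jitter).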
Sp12 CMPEN 411 L17 S.20
Positive Clock Skew
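[Figure: register R1 (input In) feeds combinational logic, which feeds register R2. clk reaches R1 at tclk1 and, after a wire delay, R2 at tclk2. The clock waveforms mark edges 1-4, the period T, the interval T + δ, and the interval δ + thold.]
Clock and data flow in the same direction
δ = tclk2 − tclk1 is the clock skew
δ > 0: Improves performance, but makes thold harder to meet. If thold is not met (race conditions), the circuit malfunctions independent of the clock period!
T : $T + \delta \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$
thold : $t_{hold} + \delta \le t_{cdlogic} + t_{cdreg}$, so $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$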
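The trade-off for positive skew can be made concrete: the minimum period shrinks by the skew, while the hold margin shrinks by the same amount, which bounds how much positive skew a path can tolerate. The following is a sketch with hypothetical function names and assumed delays in picoseconds.

```python
def min_period_with_skew(t_cq, t_plogic, t_su, skew):
    """Minimum period: T >= t_cq + t_plogic + t_su - skew."""
    return t_cq + t_plogic + t_su - skew


def hold_slack_with_skew(t_hold, t_cdlogic, t_cdreg, skew):
    """Hold margin: (t_cdlogic + t_cdreg - skew) - t_hold; negative means a race."""
    return t_cdlogic + t_cdreg - skew - t_hold


def max_positive_skew(t_hold, t_cdlogic, t_cdreg):
    """Largest skew for which the hold constraint is still met."""
    return t_cdlogic + t_cdreg - t_hold


# Assumed example values in ps
print(min_period_with_skew(100, 800, 50, skew=50))  # 900
print(hold_slack_with_skew(30, 40, 60, skew=50))    # 20
print(max_positive_skew(30, 40, 60))                # 70
print(hold_slack_with_skew(30, 40, 60, skew=80))    # -10
```

The last case is a hold violation: no choice of clock period appears in the hold expression, so slowing the clock cannot repair it.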
Sp12 CMPEN 411 L17 S.21
Negative Clock Skew
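[Figure: register R1 (input In) feeds combinational logic, which feeds register R2. The clock is routed against the data, reaching R2 at tclk2 before it reaches R1 at tclk1 after a wire delay. The clock waveforms mark edges 1-4, the period T, and the interval T + δ with δ < 0.]
Clock and data flow in opposite directions
δ = tclk2 − tclk1 is the clock skew
δ < 0: Degrades performance, but thold is easier to meet (eliminating race conditions)
T : $T + \delta \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$
thold : $t_{hold} + \delta \le t_{cdlogic} + t_{cdreg}$, so $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$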
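As a worked example of the negative-skew case, assume (purely for illustration) tc-q = 100 ps, tplogic = 800 ps, tsu = 50 ps, thold = 30 ps, tcdlogic = 40 ps, tcdreg = 60 ps, and a skew of δ = −50 ps. None of these values come from the lecture.

```latex
\begin{align*}
T &\ge t_{c-q} + t_{plogic} + t_{su} - \delta
   = 100 + 800 + 50 - (-50) = 1000~\text{ps} \\
t_{hold} &\le t_{cdlogic} + t_{cdreg} - \delta
   = 40 + 60 - (-50) = 150~\text{ps}
\end{align*}
```

Compared with zero skew (T ≥ 950 ps, thold ≤ 100 ps), the minimum period grows by 50 ps while the hold margin grows from 70 ps to 120 ps.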
Sp12 CMPEN 411 L17 S.22
Clock Jitter
Jitter causes T to vary on a cycle-by-cycle basis
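[Figure: register R1 (input In) driving combinational logic, clocked by clk at tclk. Each clock edge can move between −tjitter and +tjitter around its nominal position; nominal edges are one period T apart.]
T : $T - 2t_{jitter} \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} + 2t_{jitter}$
Jitter directly reduces the performance of a sequential circuit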
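The cost of jitter follows directly from the period constraint: in the worst case one edge arrives late and the next arrives early, so 2·tjitter must be added to the period. The sketch below uses a hypothetical function name and assumed values in picoseconds.

```python
def min_period_with_jitter(t_cq, t_plogic, t_su, t_jitter):
    """Minimum period: T >= t_cq + t_plogic + t_su + 2 * t_jitter."""
    return t_cq + t_plogic + t_su + 2 * t_jitter


# Assumed example values in ps
t_ideal = min_period_with_jitter(100, 800, 50, t_jitter=0)
t_real = min_period_with_jitter(100, 800, 50, t_jitter=25)
print(t_ideal, t_real)                                         # 950 1000
print(f"{(1 - t_ideal / t_real) * 100:.1f}% lower max frequency")  # 5.0% lower max frequency
```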
Sp12 CMPEN 411 L17 S.23
Combined Impact of Skew and Jitter
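[Figure: register R1 (input In) feeds combinational logic, which feeds register R2; the registers are clocked at tclk1 and tclk2. The clock waveforms show the period T, the interval T + δ with δ > 0, and a ±tjitter uncertainty on the clock edges.]
Constraints on the minimum clock period and hold time (δ > 0)
$T \ge t_{c-q} + t_{plogic} + t_{su} - \delta + 2t_{jitter}$
$t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta - 2t_{jitter}$
δ > 0 with jitter: Degrades performance, and makes thold even harder to meet. (The acceptable skew is reduced by jitter.)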
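Both combined constraints can be checked for a single register-to-register path. This is a minimal sketch; the `PathTiming` container, the function names, and all numeric values (in picoseconds) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    t_cq: float       # register clock-to-Q propagation delay
    t_plogic: float   # worst-case logic delay
    t_su: float       # setup time
    t_hold: float     # hold time
    t_cdlogic: float  # logic contamination (minimum) delay
    t_cdreg: float    # register contamination (minimum) delay


def check_path(p, period, skew, t_jitter):
    """Return (setup_slack, hold_slack); a negative value is a violation."""
    setup_slack = period - (p.t_cq + p.t_plogic + p.t_su - skew + 2 * t_jitter)
    hold_slack = (p.t_cdlogic + p.t_cdreg - skew - 2 * t_jitter) - p.t_hold
    return setup_slack, hold_slack


def max_skew_with_jitter(p, t_jitter):
    """Largest skew that still meets the hold constraint under jitter."""
    return p.t_cdlogic + p.t_cdreg - p.t_hold - 2 * t_jitter


path = PathTiming(t_cq=100, t_plogic=800, t_su=50, t_hold=30, t_cdlogic=40, t_cdreg=60)
print(check_path(path, period=1000, skew=50, t_jitter=25))  # (50, -30)
print(max_skew_with_jitter(path, t_jitter=0))               # 70
print(max_skew_with_jitter(path, t_jitter=25))              # 20
```

With these assumed numbers, 25 ps of jitter cuts the acceptable skew from 70 ps to 20 ps, so a 50 ps skew that was safe without jitter now causes a hold violation.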
Sp12 CMPEN 411 L17 S.24
Clock Distribution Networks
Clock skew and jitter can ultimately limit the performance of a digital system, so designing a clock network that minimizes both is important
In many high-speed processors, a majority of the dynamic power is dissipated in the clock network.
To reduce dynamic power, the clock network must support clock gating (disabling the clock to idle units)
Clock distribution techniques
Balanced paths (H-tree network, matched RC trees)
- In the ideal case, can eliminate skew
- Could take multiple cycles for the clock signal to propagate to the leaves of the tree
Clock grids
- Typically used in the final stage of the clock distribution network
- Minimizes absolute delay (not relative delay)
Sp12 CMPEN 411 L17 S.25
H-Tree Clock Network
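[Figure: H-tree clock network distributing Clock from the center to the leaves; an inset shows a gated clock formed from Clock and an idle condition]
Can insert clock gating at multiple levels in the clock tree
Can shut off an entire subtree if all gating conditions are satisfied
If the paths are perfectly balanced, clock skew is zero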
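The balanced-path property of an H-tree can be verified geometrically: every leaf is reached through the same sequence of segment lengths, so the wire length from the root is identical for all leaves. The sketch below builds an idealized H-tree recursively; the function name, the coordinates, and the rule of halving the segment length every second level are assumptions for illustration, and real skew also depends on buffer and load matching.

```python
def h_tree_leaves(x, y, seg, levels, horizontal=True, dist=0.0):
    """Return a list of (leaf_x, leaf_y, wire_length_from_root).
    Each level branches in two directions; the branch direction alternates
    between horizontal and vertical, and the segment length is halved
    after each vertical level."""
    if levels == 0:
        return [(x, y, dist)]
    next_seg = seg if horizontal else seg / 2
    leaves = []
    for sign in (-1, 1):
        nx = x + sign * seg if horizontal else x
        ny = y if horizontal else y + sign * seg
        leaves += h_tree_leaves(nx, ny, next_seg, levels - 1,
                                not horizontal, dist + seg)
    return leaves


leaves = h_tree_leaves(0.0, 0.0, seg=4.0, levels=4)
lengths = {d for _, _, d in leaves}
print(len(leaves), lengths)  # 16 {12.0}
```

All 16 leaves sit at the same wire distance from the root, which is the geometric basis for zero skew in the perfectly balanced case.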
Sp12 CMPEN 411 L17 S.26
Clock Grid Network
Distributed buffering reduces absolute delay and makes clock gating easier, but is sensitive to variations in the buffer delay
[Figure: clock grid network; a main clock buffer drives secondary clock buffers, which drive the clock across the local logic area]
The secondary buffers isolate the local clock nets from the upstream load and amplify the clock signals degraded by the RC network
decreases absolute skew
gives steeper clocks
Only have to bound the skew within the local logic area
Sp12 CMPEN 411 L17 S.27
DEC Alpha 21164 (EV5) Example
300 MHz clock (9.3 million transistors on a 16.5x18.1 mm die in 0.5 micron CMOS technology)
single phase clock
3.75 nF total clock load
Extensive use of dynamic logic
20 W (out of 50) in clock distribution network
Two level clock distribution
Single 6 inverter stage main clock buffer at the center of the chip
Secondary clock buffers drive the left and right sides of the clock grid in m3 and m4
Total equivalent driver size of 58 cm !!
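The switching power needed to drive the stated clock load can be estimated with P = C·VDD²·f. The sketch below uses the 3.75 nF load and 300 MHz frequency from this slide; the 3.3 V supply voltage is an assumption that the slide does not state, and the function name is hypothetical.

```python
def dynamic_power(c_load, vdd, freq):
    """Switching power of a capacitive load toggled every cycle: P = C * Vdd^2 * f."""
    return c_load * vdd ** 2 * freq


C_CLOCK = 3.75e-9   # total clock load from the slide, in farads
F_CLOCK = 300e6     # clock frequency from the slide, in hertz
VDD = 3.3           # ASSUMED supply voltage in volts (not given on the slide)

p = dynamic_power(C_CLOCK, VDD, F_CLOCK)
print(f"{p:.1f} W")  # 12.3 W
```

Under this assumption the clock load alone accounts for roughly 12 W. The estimate covers only the stated load capacitance, so it should not be expected to reproduce the full 20 W quoted for the clock distribution network.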
Sp12 CMPEN 411 L17 S.29
Clock Skew in Alpha Processor
Absolute skew smaller than 90 ps
The critical instruction and execution units all see the clock within 65 ps
Sp12 CMPEN 411 L17 S.32
Dealing with Clock Skew and Jitter
To minimize skew, balance clock paths using H-tree or matched-tree clock distribution structures.
If possible, route data and clock in opposite directions; eliminates races at the cost of performance.
The use of gated clocks to reduce dynamic power consumption makes jitter worse.
Shield clock wires (route power lines – VDD or GND – next to clock lines) to minimize/eliminate coupling with neighboring signal nets.
Use dummy fills to reduce skew by reducing variations in interconnect capacitances due to interlayer dielectric thickness variations.
Beware of temperature and supply rail variations and their effects on skew and jitter. Power supply noise fundamentally limits the performance of clock networks.
Sp12 CMPEN 411 L17 S.34
Clock Skew Scheduling
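[Figure: clock skew scheduling example. Registers i, j, k with path delays of 16 and 12 between them (labels "C", "12 max 16", and "tight"); clock period T = 15; the clock pulse at j is drawn separately from the pulse at i and k.]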
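One plausible reading of this example is a two-stage pipeline i → j → k with logic delays of 16 and 12 and a target period of 15. With no skew the 16-unit stage does not fit, but delaying the clock at j lends time from the shorter stage to the longer one. The sketch below computes the range of clock offsets at j that satisfies both stages; it is an interpretation rather than the lecture's own derivation, and it ignores register overhead (tc-q, tsu) and hold constraints. The function name is hypothetical.

```python
def feasible_skew_range(d_ij, d_jk, period):
    """Return (lo, hi) bounds on the clock offset s at register j (relative
    to registers i and k) so that both stages meet the period, or None.
    Stage i->j sees period + s, stage j->k sees period - s."""
    lo = d_ij - period   # need period + s >= d_ij
    hi = period - d_jk   # need period - s >= d_jk
    return (lo, hi) if lo <= hi else None


print(feasible_skew_range(16, 12, period=15))  # (1, 3)
print(feasible_skew_range(16, 12, period=13))  # None
```

Under these simplifying assumptions, any offset between 1 and 3 at j makes T = 15 work, and the smallest period achievable by scheduling is the average of the two delays, 14.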