
EE371 Lecture 6

Clocked storage elements

Vladimir Stojanovic

Material in this presentation is adapted from “Digital System Clocking: High-Performance and Low Power Aspects”, V.G.Oklobdzija, V.M.Stojanovic, D.M.Markovic, N.M.Nedovic, © 2003 J.Wiley & IEEE

Outline

• Latch and Flip-Flop
• Timing and Power Metrics
• High-Performance Issues
• Low-Energy Issues
• State-of-the-art circuits
• Microprocessor Examples


Recent Interest in Flip-Flops

• Trends in high-performance systems
  » Higher clock frequency
  » More transistors on chip
• Consequences
  » Increased flip-flop overhead relative to cycle time
    – Cycle time 10-20 FO4 delays, flop overhead 2-4 FO4
  » Difficult to control both edges of the clock
  » Higher impact of clock skew
  » Higher crosstalk and substrate coupling
  » Higher power consumption
    – Expensive packages and cooling systems
    – Limit in performance
  » Clock burns up to 40%, flops up to 20% of total power

Why are clocked storage elements important?

• Cycle time ~ 12-20 FO4 delays
• Flip-flop overhead 2-3 FO4 (20% of cycle time!)

[Figure: two flip-flops (D, Q, Clk) separated by a block of logic (labeled N); the cycle time T spans TClk-Q, TLogic, and TSetup]

$T_{D\text{-}Q} = T_{Clk\text{-}Q} + T_{Setup}$


Latch and Flip-Flop

[Figure: block symbols (Data, Clock, Q) and Clock/Data/Q waveforms for a flip-flop (F-F) and a latch]

• Latch – “soft” edge clocking: the latch is “transparent” (clock-level sensitive)
• Flip-Flop – “hard” edge clocking: after the transition of the clock, a data change does not affect the output (clock-edge sensitive)
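The level-sensitive versus edge-sensitive distinction can be illustrated with a small behavioral model. This is only a sketch on sampled 0/1 waveforms: the function name `simulate`, the initial states, and the example waveforms are assumptions made for illustration, not part of the lecture.

```python
def simulate(clk, data):
    """Return (latch_q, ff_q) traces for sampled clock and data waveforms."""
    latch_q, ff_q = [], []
    lq = fq = 0      # assumed initial state of both elements
    prev_clk = 0     # assumed clock level before the first sample
    for c, d in zip(clk, data):
        if c == 1:                      # latch: transparent while the clock is high
            lq = d
        if prev_clk == 0 and c == 1:    # flip-flop: samples only on the rising edge
            fq = d
        latch_q.append(lq)
        ff_q.append(fq)
        prev_clk = c
    return latch_q, ff_q


# Hypothetical waveforms: data keeps changing while the clock is high.
clk = [0, 1, 1, 1, 0, 0, 1, 1]
data = [1, 1, 0, 1, 0, 0, 0, 1]
latch_q, ff_q = simulate(clk, data)
print("latch:", latch_q)
print("ff:   ", ff_q)
```

In this model the latch output follows every data change during the high phase, while the flip-flop output changes only at the two rising edges.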

Flip-Flop and Master-Slave Latch

Operational behavior appears the same…

How can one recognize the difference without knowing what is inside the “black-box”?

[Figure: Clock/Data/Q waveforms for a flip-flop (F-F) and for a master-slave (M-S) latch built from master latch L1 (Clk1) and slave latch L2 (Clk2)]


Flip-Flop and M-S Latch: Structural Difference

[Figure: Flip-Flop: a pulse generator driven by Clock produces S and R signals for a pulse-capturing S-R latch (the slave latch has no clock). M-S Latch: a master latch (L1) clocked by Φ1 followed by a slave latch (L2) clocked by Φ2, with outputs Q1 and Q2]

T-G Master-Slave Latch

• PowerPC 603 (Gerosa, JSSC 12/94)

[Figure: transmission-gate master-slave latch schematic with input D, output Q, supply Vdd, and clocks Clk and Clkb]


Flip-Flop Example 1: SAFF

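(Sense-Amplifier-Based Flip-Flop)

SAFF, DEC Alpha 21264 (Madden & Bowhill, 1990; Matsui, 1994)

[Figure: SAFF schematic with clock Clk and differential data inputs D; a pulse generator stage produces a pulse on S or R for D=1 or D=0, and a capturing latch stage drives the outputs Q]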

Flip-Flop Example 2: HLFF

(Hybrid Latch Flip-Flop)

[Figure: HLFF schematic (Vdd, D, Clk, Q) with a pulse generator stage and a second-stage latch; waveforms of the signal at node X for D=1 and D=0]


Logic Diagram of HLFF

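[Figure: logic diagram of the HLFF with inputs Clk and D, an Enable signal, a pulse generator stage, and a second-stage latch driving Q; waveforms of the signal at node X for D=1 and D=0]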

Performance Metrics: Timing and Power

Timing Parameters in Latches

[Figure: latch symbol (D, Q, Clk) and timing waveforms for (a) early data arrival (D1, Q1) and (b) late data arrival (D2, Q2), marking setup time U, hold time H, clock width tCW, and delays tCQ and tDQ]

Setup and Hold Times are defined relative to the closing edge of the clock signal

Timing Parameters in Flip-Flops

[Figure: flip-flop symbol (D, Q, Clk) and timing waveforms marking setup time U, hold time H, clock width tCW, Clk-Q delay tCQ, and the sampling window]

Setup and Hold Times are defined relative to the rising edge of the clock


Data-to-Output Delay

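• Sum of setup time and Clk-Q delay is the only true measure of performance w.r.t. system speed
• $T = T_{Clk\text{-}Q} + T_{Logic} + T_{Setup} + T_{Skew}$
• $T_{D\text{-}Q} = T_{Clk\text{-}Q} + T_{Setup}$

[Figure: two flip-flops (D, Q, Clk) separated by a block of logic (labeled N); the cycle time T spans TClk-Q, TLogic, and TSetup]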
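The cycle-time budget can be checked with a few lines of arithmetic. The sketch below simply evaluates the sum of Clk-Q delay, logic delay, setup time, and skew; the function names and the FO4 values are hypothetical and chosen only to fall inside the ranges quoted earlier in the lecture.

```python
def cycle_time(t_clk_q, t_logic, t_setup, t_skew):
    """Minimum cycle time: T = TClk-Q + TLogic + TSetup + TSkew."""
    return t_clk_q + t_logic + t_setup + t_skew


def cse_overhead_fraction(t_clk_q, t_setup, period):
    """Fraction of the cycle spent in the storage element (TD-Q / T)."""
    return (t_clk_q + t_setup) / period


# Hypothetical values, all in FO4 delays.
t = cycle_time(t_clk_q=2.0, t_logic=12.0, t_setup=1.0, t_skew=1.0)
overhead = cse_overhead_fraction(t_clk_q=2.0, t_setup=1.0, period=t)
print(f"T = {t} FO4, CSE overhead = {overhead:.1%}")
```

With these assumed numbers the storage element takes just under a fifth of the cycle, which is the order of magnitude the lecture warns about.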

Clk-Q Delay is a function of D-Clk

[Figure: Clk-Output delay [ps] (0 to 350) versus Data-Clk [ps] (-200 to 200), with the Setup and Hold regions and the Sampling Window marked]


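Setup Time vs. Data-to-Output (D-Q) Delay

[Figure: D-Q and Clk-Q delay curves (Data to Output Delay) versus Data to Clock Delay, from “data arrives early” to “data arrives late”. Regions: Constant Clk-Q Region, Variable Clk-Q Region, Failure Region. A 45° tangent marks the optimal setup time Uopt, where the D-Q delay reaches its minimum DDQm]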
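The optimal setup point can be located numerically by minimizing D-Q = setup + Clk-Q over the data arrival time. The sketch below assumes a purely hypothetical exponential Clk-Q curve (`t0`, `a`, and `tau` are made-up fitting parameters, not measured data); a real characterization would use simulated delays instead.

```python
import math


def clk_q_delay(setup_ps, t0=100.0, a=100.0, tau=20.0):
    """Hypothetical Clk-Q delay [ps] vs. how early data arrives before the clock."""
    return t0 + a * math.exp(-setup_ps / tau)


def optimal_setup(candidates):
    """Return (setup, D-Q delay) that minimizes D-Q = setup + Clk-Q."""
    return min(((s, s + clk_q_delay(s)) for s in candidates), key=lambda p: p[1])


# Sweep the data arrival time in 1 ps steps.
u_opt, d_dq_min = optimal_setup(range(0, 201))
print(f"Uopt = {u_opt} ps, minimum D-Q = {d_dq_min:.1f} ps")
```

At the minimum, moving the data 1 ps closer to the clock costs about 1 ps of extra Clk-Q delay, which is the 45° tangent condition on the Clk-Q curve.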

Power Consumption

• Power related to a CSE can be divided into:
  » Input power
    – Data power (PD)
    – Clock power (PCLK)
  » Internal power (PINT)
    – Depends on data activity and glitching activity
  » Load power (PLOAD)
    – Can be merged into PINT

[Figure: CSE with inputs D and CLK and outputs Q and Qb; the VDD supplies of the input drivers and of the CSE are annotated with PD, PCLK, PINT, and PLOAD]

$P_{tot} = P_{internal\&load} + \sum_{inputs\,(D,\,CLK)} P_{driver}$


High-Performance Issues

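Clock Skew and Jitter

[Figure: Ref_Clock with period T and the Received Clock; the skew t_skew separates the driven clock edge (t_DRVCLK) from the received clock edge (t_RCVCLK), with jitter bounds t_jit- and t_jit+ around the received edge]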


The Idea of Clock Uncertainty Absorption

Change in D-Q delay is much smaller than the clock uncertainty (CSE absorbs a part of the uncertainty)

[Figure: D-Q delay [ps] (200 to 300) versus Clk arrival time [ps] (-30 to 60) around the nominal Clk; over the clock uncertainty window tCU the D-Q delay varies only between DDQm and DDQM]

Clock Uncertainty Absorption

[Figure: D, Clk, and Q waveforms with clock uncertainty tCU; early, nominal (TNominal = 0), and late D-Clk delays DD-Clk, the resulting D-Q delays DDQm and DDQM, and the worst-case DDQ]
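One way to put a number on the absorption is to compare the spread of the D-Q delay with the width of the uncertainty window. The sketch below assumes the metric 1 - (DDQM - DDQm) / tCU and a hypothetical exponential Clk-Q curve; both the metric's exact form and every parameter value are assumptions for illustration, not taken from the slides.

```python
import math


def d_q_delay(setup_ps, t0=100.0, a=100.0, tau=20.0):
    """Hypothetical D-Q delay [ps]: setup offset plus a made-up Clk-Q curve."""
    clk_q = t0 + a * math.exp(-setup_ps / tau)
    return setup_ps + clk_q


def absorption(nominal_setup_ps, t_cu_ps):
    """Return (DDQm, DDQM, absorbed fraction) over a window of width tCU."""
    half = t_cu_ps // 2
    delays = [d_q_delay(nominal_setup_ps + off) for off in range(-half, half + 1)]
    d_dq_min, d_dq_max = min(delays), max(delays)
    return d_dq_min, d_dq_max, 1.0 - (d_dq_max - d_dq_min) / t_cu_ps


# Hypothetical: nominal setup of 32 ps, clock uncertainty of 30 ps.
d_dq_m, d_dq_M, alpha = absorption(nominal_setup_ps=32, t_cu_ps=30)
print(f"DDQm = {d_dq_m:.1f} ps, DDQM = {d_dq_M:.1f} ps, absorbed = {alpha:.0%}")
```

In this made-up case a 30 ps uncertainty window changes the D-Q delay by under 8 ps, so roughly three quarters of the uncertainty is absorbed.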


Example: HLFF

[Partovi et al, ISSCC’96]


State-of-the-Art CSEs in CMOS Technology


Requirements in the Flip-Flop Design

• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  » Typical flip-flop load in a 0.18 µm CMOS ranges from 50 fF to over 200 fF, with typical values of 100-150 fF in critical paths
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity: dynamic/high-impedance nodes are affected

Low-Energy Designs: Master-Slave Latch Examples

• Master-Slave Latch (MSL) [G. Gerosa et al, JSSC 1994]
• C2MOS Latch [Suzuki et al, JSSC 1973]

[Figure: schematics of the MSL and the C2MOS latch with input D, output Q, clocks Clk and Clk1, and internal nodes QM, SM, and SS]

• Feedback for pseudo-static operation
• State node SS protection in PPC (decoupled Q)


Master-Slave Latches

• Positive setup times
• Two clock phases:
  » distributed globally
  » generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay

High-Performance Designs: Flip-Flop Examples 1/2

• HLFF (Hybrid Latch-Flip-Flop) [Partovi et al, JSSC 1996]
• SDFF (Semi-Dynamic Flip-Flop) [F. Klass et al, JSSC 1998]

[Figure: HLFF and SDFF schematics with input D, clocks Clk and Clk1, internal node S, and output Q]

• Pulse-generating first stage (precharge-evaluate)
• Keepers for pseudo-static operation
• Output load decoupled from internal nodes


High-Performance Designs: Flip-Flop Examples 2/2

SAFF (Sense-Amplifier-Based Flip-Flop) [Matsui et al, 1994]

• Fully-differential circuit
• First stage sense-amp can take reduced-swing inputs
• 2nd stage is capturing latch
• Delay to Q and !Q not equal

[Figure: SAFF schematic with clock Clk, differential data inputs D, S and R nodes, and outputs Q]

SAFF with Improved S-R Latch

• The first stage is an unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions

[Nikolic & Stojanovic ISSCC ’99]


Flip-Flops

• First stage is a pulse generator
  » generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch
  » captures the pulse generated in the first stage
• Pulse generation potentially results in a negative setup time and soft-edge property
• Must check for hold time violations

Note: power is always consumed in the clocked pulse generator
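The two-stage structure above can be mimicked with a behavioral sketch: a pulse derived from the clock and a delayed copy of it, followed by a latch that is transparent only during the pulse. The function names, the sample-based delay, and the waveforms are all hypothetical; real pulse generators are transistor-level circuits, not logic on sampled lists.

```python
def pulse_from_clock(clk, width):
    """Pulse that is high for `width` samples after each rising clock edge."""
    delayed = [0] * width + clk[:len(clk) - width]   # clock delayed by `width` samples
    return [c & (1 - d) for c, d in zip(clk, delayed)]


def capture(pulse, data, q0=0):
    """Second stage: a latch that is transparent only while the pulse is high."""
    q, out = q0, []
    for p, d in zip(pulse, data):
        if p:
            q = d
        out.append(q)
    return out


# Hypothetical waveforms: data changes one sample after each rising clock edge.
clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
data = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]

narrow = capture(pulse_from_clock(clk, width=1), data)
wide = capture(pulse_from_clock(clk, width=2), data)
print("narrow pulse:", narrow)
print("wide pulse:  ", wide)
```

With the wider pulse the latch also picks up data that changed just after the clock edge, which is the soft-edge behavior and the reason hold time has to be checked.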

Delay Comparison: M-S Latches and Flip-Flops

0.18 µm, high load (14 min inverters)

[Figure: bar chart “Min D-Q Delay Comparison”, delay [FO4] (0.0 to 5.0) for MSL, C2MOS, HLFF, SDFF, SAFF, and M-SAFF]

Flip-Flops are faster


Energy Comparison: M-S Latches and Flip-Flops

0.18 µm, high load (14 min inverters)

[Figure: bar chart “Energy breakdown (50% activity)”, energy [fJ] (0 to 120) for MSL, C2MOS, HLFF, SDFF, SAFF, and M-SAFF, split into Ext. clock, Ext. data, Int. clock, and Internal non-clk]

Latches are lower energy

Summary

• CSE topology depends on target application
  » Master-Slave Latches for low-energy
  » Flip-Flops & Pulsed latches for high-performance
• Delay is critical in high-speed systems, although minimizing Clk energy is of increasing importance
• Methods for reducing Clk energy
  » Clock gating (more effective in high-performance than in low-energy designs)
  » Reduced-swing clocking
  » Dual-edge clocking


Microprocessor Examples

Sun UltraSPARC-III

Basic Flip-Flop (SDFF) [Klass, 1998]

[Figure: SDFF schematic with inputs D and Clk, delayed clock Clk1, a NAND gate, inverters Inv1-Inv6, transistors MN1-MN5 and MP1-MP2, internal node S, and outputs Q]


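Sun UltraSPARC-III Dynamic Flip-Flops

[Figure: two dynamic flip-flop schematics. Single-Ended: inputs D and Clk, a NAND gate, inverters Inv1-Inv5, transistors MN1-MN3 and MP1, internal node S, output Q. Differential: differential inputs D, clock Clk, inverters Inv1-2, Inv3-4, Inv5, Inv6, transistors MN1-MN7 and MP1-MP4, internal nodes S and R, outputs Q]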

Sun UltraSPARC-III

Flip-Flops with Logic Embedding

2-input XOR

[Figure: SDFF-style flip-flop schematic with an embedded 2-input XOR of D1 and D2 (transistors MN2a-MN2d), clock Clk, a NAND gate, inverters Inv1-Inv6, transistors MN1, MN3-MN5 and MP1-MP2, internal node S, and outputs Q]


Latches used in 21064 Alpha

[Gronowski et al, 1998]

21064 modified TSPC Latches

[Figure: two modified TSPC latch schematics, each with input D, clock Clk, internal node X, and output Q]

Logic Embedding in 21064/21164 Alpha

[Figure: latches with embedded logic. 1 level of logic (21064 Alpha): inputs D1 and D2, clock Clk, internal node X, output Q. 2 levels of logic (21164 Alpha): inputs D1-D4, clock Clk, internal nodes X1 and X2, output Q]


Flip-Flop used in 21264 Alpha

[Figure: differential flip-flop schematic with input D, clock Clk, and outputs Q]

IBM Processors: Level-Sensitive Scan Design (LSSD)

[Eichelberger 1983]

[Figure: gate-level latch with inputs Data, +Clock, and -Clock and output Out]

Hazard-free level-sensitive polarity-hold latch
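A polarity-hold latch passes Data while the clock is active and holds its output otherwise. The sketch below assumes the common sum-of-products form with a redundant consensus term (Data AND Out) that keeps the output covered while the clock switches; the slide's gate-level drawing did not survive extraction, so this equation and the function name `polarity_hold_next` are assumptions, not a transcription of the figure.

```python
from itertools import product


def polarity_hold_next(data, clock, out):
    """Next output of a polarity-hold latch (assumed sum-of-products form)."""
    pass_term = data & clock          # clock active: follow Data
    hold_term = out & (1 - clock)     # clock inactive: keep the stored value
    consensus = data & out            # redundant term that covers the clock transition
    return pass_term | hold_term | consensus


# Exhaustive check of the functional behavior.
for data, out in product((0, 1), repeat=2):
    assert polarity_hold_next(data, 1, out) == data   # transparent
    assert polarity_hold_next(data, 0, out) == out    # holding

# With Data = Out = 1 the consensus term alone keeps the output high,
# whatever the clock is doing.
print([polarity_hold_next(1, c, 1) for c in (0, 1)])
```

The consensus term does not change the logic function; its only role in this sketch is to show why the output cannot glitch when Data and Out are both 1 and the two clock polarities switch at slightly different times.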


The PowerPC 603 Master-Slave Latch

[Figure: master-slave latch schematic with data input Din, scan input SCANin, clocks C1, C2, and ACLK, supply VDD, and output Dout]

[Gerosa et al, 1994]

IBM Power4™ processor

Scannable Split Latch with LSSD Capability [Warnock et al, 2002]

[Figure: split latch with stages L1 and L2, Data_in and Data_out, clocks c1 and c2, scan input Scan_in clocked by scan_clk, and an attached logic gate]


Low-Energy Issues: Clock & Clocked elements burn more than 60% of the processor power

Low-Swing Clocking: Clock Driver Re-design

[Figure: clock driver schematic (capacitances Cn1, Cp1, Cn2, Cp2, CA, CB) generating the clock signals CPT, CNT, CPB, and CNB from Clk; waveforms between GND, H-VDD, and VDD with the thresholds Vthp and Vthn marked]

50% power reduction with half-swing clock (minus some penalty in clock drivers)

[H. Kojima, JSSC, April 1995]


Low-Swing Clocking: CSE Re-design

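[Figure: CSE with input D, reduced-swing clock Clk, and outputs Q, with the PMOS well biased at Vwell > VDD; three clock driver options (a), (b), (c) producing Clk(VDD-nVth), Clk(VDD-Vth), and Clk(VDD-Low), the last from a separate VDD-Low supply]

$E_{(a),(b)} \sim V_{DD}(V_{DD} - V_{th})$

$E_{(c)} \sim (V_{DD} - V_{th})^2$

PMOS does not fully turn off

[H. Kawaguchi and T. Sakurai, JSSC, May 1998]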
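The two energy proportionalities can be compared against a full-swing clock, whose energy scales as VDD squared. The sketch below is only that ratio; the load capacitance cancels out, and the supply and threshold voltages are hypothetical values chosen for illustration.

```python
def clock_energy_ratio(vdd, v_swing, from_full_supply):
    """Clock energy relative to a full-swing clock (the load capacitance cancels)."""
    if from_full_supply:
        # Charge is drawn from VDD: E ~ VDD * Vswing, as for drivers (a) and (b).
        return (vdd * v_swing) / (vdd * vdd)
    # Charge is drawn from a reduced supply: E ~ Vswing^2, as for driver (c).
    return (v_swing * v_swing) / (vdd * vdd)


# Hypothetical voltages: VDD = 1.8 V, Vth = 0.45 V, so the swing is VDD - Vth.
vdd, vth = 1.8, 0.45
ratio_ab = clock_energy_ratio(vdd, vdd - vth, from_full_supply=True)
ratio_c = clock_energy_ratio(vdd, vdd - vth, from_full_supply=False)
print(f"(a),(b): {ratio_ab:.2f} of full-swing energy, (c): {ratio_c:.2f}")
```

The quadratic case saves more for the same swing because both the charge and the voltage it is drawn from are reduced.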

Low-Swing Clocking: N-only CSEs

[Figure: M-S latch schematic with input D, output Q, clock Clk, internal nodes QM, SM, and SS, and transistors N1-N4]

N-only clocked transistors, M-S Latch Example (N1 and N2 improve pull-up on SM)

[D. Markovic, J. Tschanz, V. De, 2001, patent pending]


Clock Gating: Global Clock Gating

Used to save clocking energy when data activity is low

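[Figure: (a) Time-mux (no gating!): register REG clocked by Clk, with a 2:1 mux (inputs 0/1, select S driven by Load) in front of D choosing between In and the register output. (b) Global Clk Gating: register REG with data input In, clocked by Clk gated with EN]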
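The difference between the two schemes shows up when counting how many clock edges actually reach the register. The sketch below is a cycle-level model with hypothetical function names and input sequences; it treats one list entry as one clock cycle and ignores the gating logic's own energy.

```python
def time_mux_register(inputs, load, q0=0):
    """Register with an input mux: clocked every cycle, recirculates Q when load is 0."""
    q, trace, clock_edges = q0, [], 0
    for d, ld in zip(inputs, load):
        q = d if ld else q     # mux selects new data or the old output
        clock_edges += 1       # the register sees a clock edge every cycle
        trace.append(q)
    return trace, clock_edges


def gated_register(inputs, enable, q0=0):
    """Register with a gated clock: a clock edge arrives only when enable is 1."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(inputs, enable):
        if en:
            q = d
            clock_edges += 1
        trace.append(q)
    return trace, clock_edges


# Hypothetical low-activity pattern: new data is loaded in 3 of 8 cycles.
inputs = [1, 0, 1, 1, 0, 0, 1, 0]
enable = [1, 0, 0, 1, 0, 0, 0, 1]
mux_trace, mux_edges = time_mux_register(inputs, enable)
gated_trace, gated_edges = gated_register(inputs, enable)
print(mux_trace == gated_trace, mux_edges, gated_edges)
```

Both registers hold the same values, but the gated one is clocked 3 times instead of 8, which is where the clocking energy is saved when data activity is low.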

Clock Gating: Local Clock Gating

Used to save clocking energy when data activity is low

[Figure: CSE with input D, output Q, and internal node QM, clocked by a locally generated pulse CP; blocks: Data-Transition Look-Ahead, Clock Control, and Pulse Generator; signals Clk, CP, CPI, and P1]

[M. Nogawa and Y. Ohtomo, JSSC, May 1998]


Dual-Edge Triggering: Latch-Mux

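Used to save clocking energy regardless of data activity!

[Figure: Concept: two latches (D, Q, C) sharing input D and clock Clk, with a 2:1 mux (inputs 0/1, select S) driving the output Q. Circuit Example: transistor-level implementation with input D, clock Clk, and output Q]

[R.P. Llopis and M. Sachdev, ISLPED Aug. 1996]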
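The latch-mux concept can be sketched behaviorally: one latch is transparent while the clock is low, the other while it is high, and the mux always selects the latch that is currently holding. The function name and waveforms are hypothetical, and the assumption that the two latches run on opposite clock phases is inferred from the concept rather than read off the lost figure.

```python
def dual_edge_latch_mux(clk, data):
    """Dual-edge-triggered element built from two latches and a mux."""
    low_latch = high_latch = 0     # assumed initial states
    out = []
    for c, d in zip(clk, data):
        if c == 0:
            low_latch = d          # transparent while Clk is low
        else:
            high_latch = d         # transparent while Clk is high
        # The mux selects the latch that is opaque (holding) in this phase.
        out.append(low_latch if c == 1 else high_latch)
    return out


# Hypothetical waveforms: one sample per quarter clock period.
clk = [0, 0, 1, 1, 0, 0, 1, 1]
data = [1, 1, 1, 0, 0, 0, 0, 1]
q = dual_edge_latch_mux(clk, data)
print(q)
```

The output takes a new sample at the rising edge, at the falling edge, and at the next rising edge, so the same data rate is reached with half the clock frequency.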

Dual-Edge Triggering: Pulsed-Latch

[Figure: Concept: a latch (D, Q, C) clocked by pulse generators (PulseGen) driven by Clk. Circuit Example: (a) Single-Edge and (b) Dual-Edge pulsed-latch schematics with input D, output Q, clock Clk, and the delayed clocks Clk1 and Clk2]


Dual-Edge Triggered Flip-Flop

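[Figure: Concept: a 1st stage of two pulse-generating latches, PG Latch X and PG Latch Y (inputs D and C, outputs S and R), clocked by Clk and feeding a capturing latch CL that drives Q. Circuit Example: schematic with inputs D and Clk, delayed clocks Clk1 and Clk2, internal nodes SX and SY, and output Q]

Pulse-generating latches trigger capturing latch

[N. Nedovic, V.G. Oklobdzija, ESSCIRC 2002]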

Design goals

• Apply
  » Small clock load
  » Short direct path
  » Reduced node swing
  » Low-power feedback
  » Pulsed design
  » Optimization of both Master and Slave latch
• Avoid
  » Positive setup time
  » Sensitivity to clock slope and skew
  » Dynamic (floating) nodes
  » Dynamic Master latch

Conduct Energy-Delay optimizations

Take into account all sources of power dissipation

ALWAYS use Clk-Q + setup time for max delay

For more details on storage elements check Prof. Oklobdzija’s ISSCC’02 workshop:

http://www.ece.ucdavis.edu/acsel under Presentations


What to Expect in the Future?

• Incorporating logic into the CSE
• Absorbing clock skew
• Pipeline boundaries will start to blur – pulsed latches
• Latch-less domino style clocking, signals used to clock
• Synchronous design only in a limited domain
• Asynchronous communication between synchronous domains