
Synchronization Ideas

Charles E. Dike

Intel Corporation


Introduction

• Tutorial

• Share some ideas about synchronization and metastability

• Introduce NEW, IMPROVED theory on metastability

• Charles Dike ([email protected])

Why and where synchronize?

• Reduce latency between independent clock domains.

• Asynchronous domain to synchronous clock.

• Synchronous clock to an independent synchronous clock.

• Benefit - higher performance in critical circuits.

[Figure: an asynchronous circuit, a pausable clock at 1.8 GHz, a synchronous clock at 3.0 GHz, and two synchronous clocks at 1.5 GHz]

Design Direction

[Figure: MEM, FPU, and ALU blocks shown across three eras: 80s, towards 100 MHz; 90s, towards 1 GHz; 00s, multi-GHz. Label: VALUE ADDED]

Chip Area Networks

[Figure: chip area network. Label: late 00s, multi-GHz]


I believe….

• We must be able to synchronize all domains to a PLL controlled clock

• Interconnect on chip will be asynchronous (GALS)

• We need to minimize latency

• There will be two basic synchronizer uses - near neighbor and the chip net

Topics of Discussion

• Generic synchronizer of the type used in the TeraFlops computer

• Simple synchronizer of the type used in StrongARM

• The Myrinet pipeline synchronization scheme

• Latest understanding of metastability

Generic Synchronizer

• Handles self-timed to synchronous interfaces and vice versa

• Supports synchronous to synchronous interfaces

• Can handle streaming data

• Adaptable to any speed range

• Possibly used over the chip network

Two flop synch

[Figure: two D flops, #1 and #2, in series on CLK. Signal label: VALID]
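The behavior of the two flop arrangement can be sketched in a few lines of Python. This is a minimal model with ideal flops, so it shows only the two-edge latency and not metastability itself. It assumes VALID is the asynchronous input sampled on each rising CLK edge; the function name `two_flop_sync` is made up for this sketch.

```python
def two_flop_sync(samples):
    """Pass a sampled asynchronous input through two D flops on one clock.

    `samples` holds the value seen at the D input of flop #1 on each
    rising CLK edge. Returns the output of flop #2 after each edge.
    """
    q1 = 0  # output of flop #1 (the one that may go metastable in hardware)
    q2 = 0  # output of flop #2 (what the synchronous logic consumes)
    out = []
    for d in samples:
        # Both flops update on the same edge, so #2 captures the OLD q1.
        q1, q2 = d, q1
        out.append(q2)
    return out


synced = two_flop_sync([0, 1, 1, 1, 0, 0])
print(synced)
```

Flop #2 sees each change one edge after flop #1 captures it. That extra clock period is the time given to flop #1 to resolve.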

Single latch synch

[Figure: D flop pairs on CLK1 and CLK2 around an SR latch. Signal labels: REQ, ACK, Write Valid, Read Valid. Waveforms: LATCH OUTPUT, RECEIVER CLOCK, SENDER CLOCK]

Multi latch synch

[Figure: two copies of the single latch structure, each with D flop pairs on CLK1 and CLK2 around an SR latch. Signal labels: REQ, ACK, Write Valid, Read Valid]

General Case

[Figure: WRITE POINTER, READ POINTER, and STATUS REGISTER with example bit patterns 1000000000, 0000010000, and 1111100000. Other labels: SYNCHRONIZERS, SYNC, EMPTY, FULL, LATENCY PADDING, EN, Write Clock, Write Enable, Read Clock]

Empty case

[Figure: WRITE POINTER, READ POINTER, and STATUS REGISTER feeding a SYNCHRONIZER built from resettable D flops. One side takes Write Pointer a and Read Pointer b and produces EMPTY and REN on the Read Clock. The other takes Write Pointer b and Read Pointer a on the Write Clock with Write Enable]


Topics of Discussion

• Generic synchronizer of the type used in the TeraFlops computer

• Simple synchronizer of the type used in StrongARM processor

• The Myrinet pipeline synchronization scheme

• Latest understanding of metastability

Simple Synchronizer

• Constrained by frequency ratio

• Supports synchronous to synchronous interfaces

• Does it support asynch to synch? Yes, with restrictions.

• Possibly used in local neighbor synchronizers

Simple Synchronizer

[Figure: chain of four D flops carrying A, A1, A2, A3, with a divide by 2 stage, SLOW CLK, FAST CLK, and SYNC. Node labels: w, x, y, z. MI* = Metastable Immune]

Timing 1

[Figure: simple synchronizer schematic (divide by 2, SLOW, FAST, SYNC, MI**, nodes A, A1, A2, A3) with waveforms over fast clock cycles 1 to 6: FAST CLOCK, SLOW CLOCK, A, A1, A2, A3, SYNC]

Timing 2

[Figure: simple synchronizer schematic (divide by 2, SLOW, FAST, SYNC, MI**, nodes A, A1, A2, A3) with waveforms over fast clock cycles 1 to 6: FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK]

Timing 3

[Figure: simple synchronizer schematic (divide by 2, SLOW, FAST, SYNC, MI**, nodes A, A1, A2, A3) with waveforms over fast clock cycles 1 to 6: FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK]

Timing 4

[Figure: two synchronizer schematics (divide by 2, SLOW, FAST, SYNC, MI**, nodes A, A1, A2, A3) with waveforms over fast clock cycles 1 to 6: FAST CLOCK, SYNC, SLOW CLOCK, SLOW CLOCK#, SYNC]

Transfers

[Figure: waveforms over fast clock cycles 1 to 6 (FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK) and two D flop pairs using SYNC, FAST CLOCK, and SLOW CLOCK. Panel titles: FAST TO SLOW TRANSFER, SLOW TO FAST TRANSFER]

Topics of Discussion

• Generic synchronizer of the type used in the TeraFlops computer

• Simple synchronizer of the type used in StrongARM

• The Myrinet pipeline synchronization scheme

• Latest understanding of metastability

Pipeline Synchronizer

• Supports synchronous to synchronous interfaces

• Supports asynch to synch and vice versa

• Possibly used in local neighbor synchronizers

• Essentially a distributed FIFO and synchronizer

Pipeline Synchronizer

[Figure: three stages in series, each with a block S and ports Ri, Ai, Di, Ro, Ao, Do]

ME element

[Figure: ME block with signals R1, R0, A1, A0. Other labels: S, XREQ]

FIFO element

[Figure: FIFO stage with ports Ri, Ai, Di, Ro, Ao, Do, and its implementation using C elements on Ri, Ai, Ro, Ao plus a Data path]

Async to sync

[Figure: three pipeline stages in series, each with a block S and ports Ri, Ai, Di, Ro, Ao, Do. Side labels: Synchronous, Asynchronous]

Sync to async

[Figure: three pipeline stages in series, each with ports Ri, Ai, Di, Ro, Ao, Do, and three S blocks. Side labels: Synchronous, Asynchronous]

Points to ponder #1

• All synchronizing interfaces have one thing in common - a latching element that holds data while metastabilities are being resolved.

• There is no way to avoid the latency which is required to resolve metastabilities.

• To minimize latency the latching element characteristics can be improved.

• We will be required to understand and use this knowledge. This is the future of digital design.

Topics of Discussion

• Generic synchronizer of the type used in the TeraFlops computer

• Simple synchronizer of the type used in StrongARM

• The Myrinet pipeline synchronization scheme

• Latest understanding of metastability


Role of the Synchronizing Flop

• Reorients incoming information to a clock edge

• Its performance determines system failure rate or latency

Real Life

• There is no magic bullet

• There is a lot of misinformation on metastability around

• To date many circuits have been overdesigned through planning and luck

• Whenever a circuit fails because its frequency is too high, the ultimate cause of failure is metastability

• There is no way to synchronize a signal faster than about the time it takes to pass a signal through six static gates

Metastability is....

[Figure: latch with SET and RESET inputs and two OUT outputs. Internal labels: NODE A, NODE B]

Technical terms

• Tw (window size) - likelihood of entering a metastable state - in units of time

• Tau (τ) - rate at which metastability resolves - in units of time

• MTBF (Mean Time Between Failures)

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$

$$\langle V_n^2 \rangle = 4kT/C \quad \text{(thermal noise)}$$
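To get a feel for how steeply the MTBF grows with the allowed resolution time, the expression can be evaluated numerically. The sketch below reads the slide's formula as MTBF = exp(t/τ) / (Tw·fd·fc), with fd taken as the data transition rate and fc as the clock frequency (the usual meaning of these symbols, not spelled out on the slide). All numeric values are hypothetical and chosen only for illustration.

```python
import math


def mtbf_seconds(t, tau, tw, fd, fc):
    """Basic MTBF = exp(t / tau) / (Tw * fd * fc), all arguments in SI units."""
    return math.exp(t / tau) / (tw * fd * fc)


# Hypothetical values, for illustration only.
tau = 20e-12  # resolution time constant: 20 ps
tw = 15e-12   # window size: 15 ps
fd = 100e6    # data transition rate: 100 MHz
fc = 1e9      # clock frequency: 1 GHz

for n_tau in (10, 20, 30, 40):
    # Each additional tau of settling time multiplies the MTBF by e.
    print(n_tau, "tau ->", mtbf_seconds(n_tau * tau, tau, tw, fd, fc), "s")
```

Because t appears in the exponent, a few extra τ of settling time buys orders of magnitude in MTBF, whereas Tw, fd, and fc only scale it linearly.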

Simple jamb latch

[Figure: jamb latch with DATA, CLOCK, and RESET inputs, OUT, NODE A, and NODE B, beside a plot of propagation delay versus time of data after clock]

Simple jamb latch

[Figure: jamb latch with DATA, CLOCK, and RESET inputs, OUT, NODE A, and NODE B, beside a plot of propagation delay versus time of data after clock. Annotation: ~RC time constant]

Rough Histogram

[Figure: two plots of propagation delay versus time of data after clock, the second on a log scale. Annotations: Tw; the slope is the τ]

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$

Why is the theory a problem?

• It assumes a uniform distribution of data about the clock

– What happens when data always violates the setup/hold window?

• It is not detailed enough

– Doesn’t consider a deterministic region

– Doesn’t account for thermal noise

• People tend to extrapolate the theory improperly

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$


Overview of refined theory

• Not everything past a normal propagation is a metastable event

• The Tw window can’t be improved by input edge rates

• Tw has a complex relationship to τ based on load

• The MTBF formula needs to be modified due to non-uniform distribution of data about the clock input


Schematic

Simulation of a typical latching device

tau = 29.9 ps, Tw = 211.9 ps, normal prop = 189.2 ps

[Figure: window width in ps (log scale, 0.1 to 1000) versus propagation delay in ns (0.15 to 0.35). Marked points: 0.8 ps, 1.8 ps, 2.8 ps, 4.8 ps]

Test case

[Figure: test setup with a resettable D flop, a PC, PULSE GENERATOR #1, PULSE GENERATOR #2, two DELAY elements, and a TEK 11801-B oscilloscope with TRIGGER and INPUT connections]

Measuring real data

[Figure: measured data (Series1) on a log scale from 0.1 to 10,000,000, against time from -3.00E-10 to 1.00E-10 s. Annotation: advancing time]

Histogram

[Figure: histogram against time. Annotations: inflection point; 0.6 mV / 0.1 ps]


Measured versus Basic

[Figure: measured propagation delay curve compared with the basic model of propagation delay versus time of data after clock (log scale). Annotations: Tw; the slope is the τ; 0.6 mV / 0.1 ps]

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$

Simulated....

[Figure: simulation setup with a battery and a voltage controlled switch, labeled R1 = 100 and R1 = 100M]

Tau Simulated 2

$$\tau = \frac{|t_1 - t_2|}{\ln(V_2 / V_1)}$$

Where: V1 = voltage at time t1, V2 = voltage at time t2

[Figure: two plots over 1.0 to 1.4 ns. Top: latch outputs at nodes 1 and 2, 0.0 to 1.5 volts. Bottom: semilog difference between latch outputs, 10^-6 to 10^0 volts, with t1 and t2 marked]
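The two-point extraction of tau from a simulated waveform can be sketched as follows. The code assumes the difference between the latch outputs grows exponentially between t1 and t2, which is the straight-line portion of the semilog plot. The function name and the synthetic test values are illustrative only. The absolute value is applied to the logarithm as well, so the order of the two sample points does not matter.

```python
import math


def tau_from_two_points(t1, v1, t2, v2):
    """Estimate tau = |t1 - t2| / |ln(V2 / V1)| from two (time, voltage) samples.

    v1 and v2 are the differences between the two latch outputs at t1 and t2,
    taken where the semilog plot is a straight line.
    """
    return abs(t1 - t2) / abs(math.log(v2 / v1))


# Synthetic check: build two points from a known tau and recover it.
true_tau = 30e-12                  # hypothetical 30 ps time constant
t1, t2 = 1.0e-9, 1.2e-9            # sample times in seconds
v1 = 1e-6                          # 1 uV difference at t1
v2 = v1 * math.exp((t2 - t1) / true_tau)

print(tau_from_two_points(t1, v1, t2, v2))
```

Picking t1 and t2 well inside the exponential region matters: early samples are dominated by the initial transient and late samples by saturation at the supply rails.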

$$\langle V_n^2 \rangle = 4kT/C = 4kTBR$$

k = 1.38 × 10^-23 J/K

B = 1/τ = 5 × 10^10 Hz

R ≈ 400 Ω, T = 300 K

τ = 20 picoseconds

Vn ≈ 0.6 mV
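The arithmetic on this slide can be checked directly. The sketch below plugs in the slide's values, taking the garbled symbols as τ = 20 ps and R ≈ 400 Ω, and assuming C = τ/R so that the 4kT/C and 4kTBR forms agree.

```python
import math

k = 1.38e-23   # Boltzmann constant, J/K
T = 300.0      # temperature, K
R = 400.0      # resistance, ohms
tau = 20e-12   # time constant, s

B = 1.0 / tau  # bandwidth, Hz (5e10 Hz)
C = tau / R    # capacitance implied by tau = R * C (an assumption of this sketch)

vn_from_bandwidth = math.sqrt(4 * k * T * B * R)
vn_from_capacitance = math.sqrt(4 * k * T / C)

print(f"B  = {B:.1e} Hz")
print(f"Vn = {vn_from_bandwidth * 1e3:.2f} mV (4kTBR form)")
print(f"Vn = {vn_from_capacitance * 1e3:.2f} mV (4kT/C form)")
```

Both forms give an RMS noise voltage a little under 0.6 mV, in line with the rounded figure on the slide.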

Putting it all together

[Figure, panel A: log scale from 0.18 fs to 1.80 ns in decades, against a horizontal axis from -50 to 250 picoseconds. Region marked: normal]

Putting it all together

[Figure, panel B: log scale from 0.18 fs to 1.80 ns in decades, against a horizontal axis from -50 to 250 picoseconds. Regions marked: deterministic, "?"]

Putting it all together

[Figure, panel C: log scale from 0.18 fs to 1.80 ns in decades, with a second scale from 1.80 µV to 1.80 V, against a horizontal axis from -50 to 250 picoseconds. Annotations: thermal noise point; deterministic]

Putting it all together

[Figure, panel D: log scale from 0.18 fs to 1.80 ns in decades, against a horizontal axis from -50 to 250 picoseconds. Annotation: T = 19 ps. Regions marked: deterministic, true metastability]

Putting it all together

[Figure, panel E: log scale from 0.18 fs to 1.80 ns in decades, against a horizontal axis from -50 to 250 picoseconds. Annotations: Tw = 15 ps, T = 19 ps. Regions marked: deterministic, true metastability]

Worst case:

$$\mathrm{MTBF} = \frac{e^{(t - t_{deter})/\tau}}{T_w f_d f_c}$$

Simple case:

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$

Expected:

$$\mathrm{MTBF} = \frac{e^{(t - 0.5\, t_{deter})/\tau}}{T_w f_d f_c}$$
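The three variants differ only in how much of the deterministic region is subtracted from the available resolution time. A minimal sketch of that comparison follows, reading the exponent as (t - weight·deter)/τ with weight 1.0, 0.0, and 0.5 for the worst, simple, and expected cases. The `weight` parameter, the function name, and all numeric values are assumptions made for illustration.

```python
import math


def mtbf_with_deterministic_region(t, tau, tw, fd, fc, deter, weight):
    """MTBF = exp((t - weight * deter) / tau) / (Tw * fd * fc).

    weight = 0.0 gives the simple case, 0.5 the expected case,
    and 1.0 the worst case.
    """
    return math.exp((t - weight * deter) / tau) / (tw * fd * fc)


# Hypothetical values, for illustration only.
params = dict(t=600e-12, tau=20e-12, tw=15e-12, fd=100e6, fc=1e9, deter=100e-12)

simple = mtbf_with_deterministic_region(weight=0.0, **params)
expected = mtbf_with_deterministic_region(weight=0.5, **params)
worst = mtbf_with_deterministic_region(weight=1.0, **params)

print(f"simple:   {simple:.3e} s")
print(f"expected: {expected:.3e} s")
print(f"worst:    {worst:.3e} s")
```

With these numbers the deterministic region spans 5 τ, so the worst case MTBF is lower than the simple case by a factor of e^5.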

Points to ponder #2

Jakov Seizovic postulated a “malicious” asynchronous signal: no matter how we position the sampling window, and no matter how small we make the sampling window, the asynchronous transition will appear in that window.

This case has to be assumed when interfacing to a signal of unknown probability distribution.

We know something about just how malicious a signal can be.


Exploring


Worst case bound

Not worst case bound

[Figure annotations: uniform distribution; 12 ps jitter; < 0.1 ps]

Final comments

• With the proper synchronizing device it may be possible to synchronize a signal within a single clock cycle. The constraints are:

– You require about 35 τ in order to get the MTBF out to about 1 century.

– Each typical static gate delay is equivalent to about 5 τ in a properly designed synchronizing flop.

– The metastability MTBF of a device should probably be an order of magnitude better than the mechanical MTBF.

– You must assume a ‘malicious’ input to the synchronizer. Nevertheless, this only adds about 5 τ to the delay.

– Standard flop designs are generally very poor synchronizers. Use a jamb structure. It has the best transconductance.

– You should never require more than two synchronizing flops in series.
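The figure of roughly 35 time constants for a one-century MTBF can be sanity-checked by inverting the basic MTBF expression: t/τ = ln(MTBF · Tw · fd · fc). The sketch below does this for one hypothetical operating point. The Tw, fd, and fc values are assumptions rather than the author's, so the result only needs to land in the same neighborhood.

```python
import math


def settling_time_in_tau(target_mtbf_s, tw, fd, fc):
    """Number of tau needed so that exp(n) / (Tw * fd * fc) reaches the target MTBF."""
    return math.log(target_mtbf_s * tw * fd * fc)


SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600

# Hypothetical operating point, for illustration only.
n_tau = settling_time_in_tau(SECONDS_PER_CENTURY, tw=15e-12, fd=100e6, fc=1e9)
print(f"{n_tau:.1f} tau")
```

For this operating point the answer is about 36 τ. Because the dependence is logarithmic, a tenfold change in any of Tw, fd, fc, or the target MTBF shifts it by only about 2.3 τ.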

Conclusion

• There are several ways to communicate between independent domains

• I believe more asynchronous domains will appear that are embedded within synchronous designs

– Latency must be reduced to maximize the use of asynchronous designs.

– This is a burden that asynch designers must bear

– We need to know the limitations of synchronization and metastability

• Chip area networks are coming and they will open up opportunities for asynchronous design


References

• T. Sakurai, “Optimization of CMOS Arbiter and Synchronizer Circuits with Submicrometer MOSFET’s,” IEEE J. Solid State Circuits, vol. 23, no. 4, pp. 901-906, Aug 1988.

• L. Kleeman and A. Cantoni, “Metastable Behavior in Digital Systems,” IEEE Design & Test of Computers, pp. 4-19, Dec 1987.

• I. E. Sutherland, “Micropipelines,” Turing Award Lecture, Communications of the ACM, 32(6), pp. 720-738, 1989.

• J. N. Seizovic, “Pipeline Synchronization,” Proc. Int’l Symp. Advanced Research in Asynchronous Circuits and Systems, CS Press, 1994.

• C. Dike and E. Burton, “Miller and Noise Effects in a Synchronizing Flip-Flop,” IEEE J. Solid State Circuits, vol. 34, no. 6, pp. 849-855, June 1999.

• A. Van der Ziel, Noise in Measurements. New York: Wiley, 1976.

Overview of present theory

• Everything past a normal propagation is considered a metastable event

• A deterministic region doesn’t exist

• Tw has no fixed relationship to τ

• The MTBF formula assumes a uniform distribution of data about the clock input

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$