Synchronizers, Arbiters, GALS and Metastability€™s the problem: The digital IP world and the rest of the world Your system The synchronizer . is the guy that allows. ... metastability

TUBS 1 July 2009 1

David KinnimentUniversity of Newcastle, UK

Based on contributions from:Alex Bystrov, Keith Heron, Nikolaos Minas,

Gordon Russell, Alex Yakovlev, and Jun Zhou

Synchronizers, Arbiters, GALS and Metastability

TUBS 1 July 2009 2

Outline

What’s the problem? Why does it matter?

Time and distributions Noise – decisions are not

deterministic. Synchronizer latency, can it be

avoided? Measuring and some interesting

effects

TUBS 1 July 2009 3

What’s the problem: The digital IP world and the rest of the world

Your system

The synchronizer is the guy that allowstiming flexibility

Everything else, or Reality

TUBS 1 July 2009 4

Synchronizers and arbiters

Your system

Input

Your system

Input 1

Input 2

SynchronizerDecides which clockcycle to use for the input data

Asynchronous arbiterDecides the order of inputs

TUBS 1 July 2009 5

Time Comparison Hardware Digital comparison hardware

(which compares integers) is easy– Fast– Bounded time

Analog comparison hardware (which compares reals like time) is hard– Normally fast, but takes longer as the difference becomes smaller– Can take forever, (Buridan’s Ass ~1340)

Synchronization and arbitration involve comparison of time

Known to early computer designers:– Lubkin 1952, Catt 1966– Chaney and Littlefield 1966/72

TUBS 1 July 2009 6

Asynchronous Network (Sparsø, ASYNC 2005)

Sparsø

Synchronization requiredMultiple Clocks

AsynchronousArbitration required

TUBS 1 July 2009 7

Synchronization of data

Non pausible clocks require data synchronization You have a limited time to synchronize. Synchronizer circuits may fail to work in that time System sometimes fails (you fly into a mountain) Synchronization time = latency

D Q D Q

CLK a

VALID

#1 #2

CLK b

TUBS 1 July 2009 8

Synchronization of clocks

Clock paused to prevent contention Wait for MUTEX output to resolve Can take forever (with decreasing probability) This may not be acceptable (you fly into a mountain)

C

MU

TEX

Request Grant

Clock

Ack

Run

Delay

TUBS 1 July 2009 9

Trends

Communications are limiting performance Difficulty (Impossibility?) in maintaining one

clock per chip Need reduce wasted power in clocks Timing closure problems

– Interconnect delay times long– Variability increases

Move towards asynchronous networks

TUBS 1 July 2009 10

Metastability is....

Not being able to decide…

Q

Q

Clock

D

∆tin

∆tin -> 0

D

Clock

Request

Processor Clock

Set-up time violated

TUBS 1 July 2009 11

Metastability in a Latch

V1 V2

I1 I2

V1

V2

Stable points

Metastable Point

V1

V2

I1

V2

V1

I2

TUBS 1 July 2009 12

Simple Linear Model

Simple linear model leads to two exponentials

τa is convergent, τb is divergent

ba

t

b

t

a eKeKV ττ ..1 +=−0

111 2

21

21 2 1

2 1= ++

+ −τ ττ τ

. .( )

. ( ).d Vdt A

dVdt A

V

τ τ11 1

22 2= =

C RA

C RA

.,

.Q1 Q2

-A*V1

R1

C1

+- V2

V2

V1

V1

V2

V1

RAgm =

TUBS 1 July 2009 13

Synchronizer

t is time allowed for the Q to change between CLK a and CLK b τ is the recovery time constant, usually the gain-bandwidth of the circuit Tw is the “metastability window” τ and Tw depend on the circuit We assume that all values of ∆tin are equally probable

D Q D Q

CLK a

VALID#1 #2

dcw

t

ffTeMTBF

..

/τ

=CLK b

TUBS 1 July 2009 14

Typical responses

All starting points are equally probable Most are a long way from the “balance point” A few are very close and take a long time to resolve

Clock

Q Output

TUBS 1 July 2009 15

Event Histogram

- Propagation delay

Events

The slope is -1/τ

Log Probability of eventdepends on ∆ time

Propagation delay

Normal delay

The intercept is ~Tw

TUBS 1 July 2009 16

Synchronizer state of the art

You require about 35 τ s in order to get the MTBF out to about 1 century. (That’s for 1 synchronizer)

There is nothing else you can do while synchronizing

Each typical static gate delay is equivalent to about 5 τ Synchronizers are analog devices, so worse affected by scaling

Bigger SoCs, in future systems so more synchronizers, worse reliability

Inputs can be ‘malicious’ i.e. always causing metastability.

TUBS 1 July 2009 17

Outline





effects

TUBS 1 July 2009 18

The arbiter (MUTEX)

Grant 2

Grant 1

Request 1

Gnd

Request 2

•Asynchronous arbitration,No time bound•Seitz metastability filter•Grants cannot occur until after the latch resolves any metastability

TUBS 1 July 2009 19

Arbitration time

Unlike a synchronizer, an arbiter may take for ever. It usually doesn’t, long responses are rare. On average the time should only be τ longer than the

normal response, so DEAD time ought to be low Outputs are always monotonic

Request 1

Request 2

Grant 1

Grant 2

tm

TUBS 1 July 2009 20

Gate metastability filter

Half levels due to metastability need to be removed– Low (or high) threshold inverters– Measure divergence

Filters define the time to reach a stable state

Vt =Vdd/4

Vdd/2

Vdd/2

0

0

Vdd/2

TUBS 1 July 2009 21

Resolution time can be affected

• The start/finish points make a difference• Grant appears when trajectory crosses the filter threshold• Gate metastability filter shows more variation• Seitz filter shows more delay

1.1

1.3

1.5

1.7

1.9

2.1

0 50 100 150 200 250 300 350 400 450 500

ps

Volts

Low Start1.55VHigh Start

TUBS 1 July 2009 22

Event distribution

Assumption: Time between two events (R1 and R2) is evenly distributed

Typically it is not. Can be as much

as 7:1 variation. Here there are

more cases where R1 is just after R2

Arbitration may not be fair

Distribution of event times

TUBS 1 July 2009 23

Non uniform probabilities and time Time for metastability averaged over a distribution T in a

MUTEX is:

in

T

in

w tdt

TtimeAverage ∆

∆

= ∫ .ln_0

τ

If T = ∞, Average_time = τBUTFor a malicious input, distribution limited by noise If T = 4ps, Tw =25ps, Average_time = 4τ Noise can sometimes make average time faster

TUBS 1 July 2009 24

Predicting performance

MUTEX circuits are fast

BUT delay times may not be easy to predict– Cannot rely on ALL times being fast

Synchronizers are known to be unreliable

TUBS 1 July 2009 25

Future processes

Synchronizers and arbiters don’t work well in nanometre technologies, especially at low Vdd

Worse that gates! Why? A gate input is either HIGH

– Output pulled down Or LOW

– Output pulled up A metastable gate is neither

– Both transistors can be off Vdd decreases with process shrink,

– gm very low Synchronization time constant τ = C/gm

Vdd

Ground

IdsVdd

IdsGround

Ids

Ids

Vdd/2

TUBS 1 July 2009 26

Robust synchronizer

Data Reset

Clock

Vdd

0V

• Jamb latch synchronizer slow for low Vdd, low temp

• In metastability, both outputs are the same

• Extra p-types are switched fully on, so gm increases, and τ improves

TUBS 1 July 2009 27

Results

τ (metastability time constant) vs Vdd

Measurement Results (ps)Vdd(v) Jamb Latch B Robust Synchronizer

1.8 35.55 34.92

1.7 37.29 35.76

1.6 40.93 38.25

1.5 52.36 43.07

1.4 66.17 50.36

1.35 75.35 58.19

TUBS 1 July 2009 28

Outline




avoided Measuring and some interesting

effects

TUBS 1 July 2009 29

Does noise affect τ ?

Probability of escape from metastability does not change with gaussian noise (Couranz and Wann 1975)

Trajectories

-0.7

-0.5

-0.3

-0.1

0.1

0.3

0.5

0.7

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

Time

Vol

ts

TUBS 1 July 2009 30

Does noise affect τ ?

Probability of escape from metastability does not change with gaussian noise (Couranz and Wann 1975)

Trajectories

-0.7

-0.5

-0.3

-0.1

0.1

0.3

0.5

0.7

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

Time

Vol

ts

TUBS 1 July 2009 31

Can noise change the failure rate?

Or maybe not..…

Q

Q

Clock

D

∆tin

∆tin -> 0

D

Clock

TUBS 1 July 2009 32

The normal case

Probability of initial difference due to noise component P1(v)

Probability of initial difference due to input clock data overlap P0(v)

Convolution

Result of convolution P(v)

Probability

T >> tn

tn

Time

TUBS 1 July 2009 33

The malicious input

Probability of initial difference due to noise component P1(v)

Probability of initial difference with zero input clock data overlap P0(v)

Result of convolution P(v)

Probability

T << tn

tn

Time

TUBS 1 July 2009 34

Non-determinism

Synchronizers and arbiters can produce non-deterministic outcomes

Some noise is deterministic (5-50ps)– Power supply, crosstalk

Some noise is non-deterministic (typically 0.1ps -0.5ps). – Thermal noise, which increases as dimensions reduce.

Sequence and time of an individual computation paths is unpredictable

System performance can only be predicted probabilistically

TUBS 1 July 2009 35

Outline





effects

TUBS 1 July 2009 36

Request and Acknowledge

D Q D Q

Read Clocks

REQ

DQ DQ

Write Clocks

Data Available

Read done

ACK

DATA

TUBS 1 July 2009 37

Latency

It takes one - two receive clocks to synchronise the request

Then one – two write clocks to acknowledge it

Significant latency (1-3 clocks) Poor data rate (2 – 6 Clocks)

TUBS 1 July 2009 38

FIFO Can improve data rate by using a FIFO But not latency (which gets worse) FIFO is asynchronous (usually RAM + read and write

pointers)

D Q D Q

Read Clock 2

Data Available

WRITE

FIFO

DQ DQ

Write clock 1

Write Data Read done

Free to writeFull Not Empty

READ

DATA DATA

Write clock 2 Read Clock 1

TUBS 1 July 2009 39

Timing regions can have predictable relationships

Phase locked (mesochronous)– Timing of input data is constant, therefore

predictable Same or related frequencies but phase

difference can drift in an unbounded manner. (plesiochronous) – Timing is not constant but is still predictable

Unrelated frequencies (Heterochronous)– No assumptions about timing can be made– Need a synchronizer

TUBS 1 July 2009 40

Don’t synchronise when you don’t need to

If the two clocks are locked together, you don’t need a synchroniser, just an asynchronous FIFO big enough to accommodate any jitter/skew

FIFO must never overflow/underflow, so there is latency

REQ IN

Write Data AvailableRead done

ACK IN REQ OUT

ACK OUT

FIFODATA DATA

TUBS 1 July 2009 41

Mesochronous data exchange Intermediate X register used to retime data Need to find a place where write data is stable, and read register

available. There is always a place which can be found at start up– Chakraborty and Greenstreet ASYNC 2003

Controller

DATA In DATA Out

Write Clock Read Clock

RW X

TUBS 1 July 2009 42

Clock delay synchronizer (Ginosar AINT 2000)

conflict region

DATA

REG

RCLKtKO

0 1conflictdetector

WCLK

RCLK d dtKO

SYNC

TUBS 1 July 2009 43

Delay synchronizer latency

Nominally 0 – 1 clock cycle Relies on accurately predicting conflicts Clocks must remain stable over

synchronisation time. Always lose tko of next computation stage Alternative: shift all conflicts to next read

cycle– On average this loses 2d– 2d must be big enough to cover any clock

drift/jitter over synchronization time

TUBS 1 July 2009 44

Speculation

Mostly, the synchronizer does not need 30τ to settle

Only e-13 (0.00023%) need more than 13τ

Why not go ahead anyway, and try again if more time was needed

TUBS 1 July 2009 45

Low latency synchronization Data Available, or Free to write are produced early. If they prove to be in error, synchronization failed. Read Fail or Write Fail flag is then raised and the

action can be repeated.

Read Fail

Data Available

WRITE

FIFO

Write Fail

Write Data Read done

Free to writeFull Not Empty

READ

DATA DATA

Write clock Read Clock

Speculativesynchronizer

Speculativesynchronizer

TUBS 1 July 2009 46

When to recover

1.Early

Half Cycle - 2τ

2. SpeculativeHalf Cycle

FinalEnd of Cycle

Fail1 & 2 differentEnd of Cycle

Comment

? ? metastable? metastable? Unrecoverable error,Probability low.

0 0 0 0 No data was available

0 1 1 1 Stable at the end of the cycle, but the speculative output may have been metastable. Fail

1 1 1 0 Normal data Transfer

Early Data Available is set after a half cycle – 2 inverter delays Speculative Data Available after a half cycleIf these two are different at end of cycle, set Fail

TUBS 1 July 2009 47

Synchronizer latency reduction

1. Only have one clock for the whole system2. Use clocks with a predictable relationship3. Speculate4. Synchronise at the start of a burst transfer,

the data rate is predictable during the burst

TUBS 1 July 2009 48

Outline





effects

TUBS 1 July 2009 49

What we know about metastability

Things we know– Synchronizers are unreliable, the more there are

the more unreliable the system– How to measure reliability up to a few hours

Things we know we don’t know– What reliability is at 3 years– How to measure it– Complex circuits give complex results, the simple

MTBF formula may not apply

Things we don’t know we don’t know– What happens on the back edge of the clock

TUBS 1 July 2009 50

74F5074 Histogram

Slope, τ, is about 120ps (in fast region) Typical delay time (most events) is 4ns 99.9% of clock cycles do not cause useful events To get 1 event at 7ns requires hours

-4ns-7ns

TUBS 1 July 2009 51

10 MHz TestFFD Q

Integrator

Variable Delay

SlaveFFD Q

Increasing the number of events

Test FF is driven to metastability Every clock produces a metastable response Integrator ensures half outputs high, half low

TUBS 1 July 2009 52

What you get

Clock to D (Input) histogram

Q to Clock (Output) histogram

200ps

3ns

TUBS 1 July 2009 53

Total input events normalized

0

0.2

0.4

0.6

0.8

1

200 250 300 350

Input time, ps

Total output events normalized

0.0

0.2

0.4

0.6

0.8

1.0

3.50 4.50 5.50

Output time, ns

Interpreting results

05000

10000150002000025000300003500040000

50 150 250 350D to Clock, ps

Even

ts

Input time distribution is not flat

Proportion of total inputs causing events vs input time

Mapping output times to input times

Proportion of total output events vs output time

0 < Balance point > 1

TUBS 1 July 2009 54

Results

∆t is the time from the “balance point” of ~200ps

Similar to original graph BUT not events

Orders of magnitude quicker to gather data

Reliability for days not minutes Only one oscillator, so no

distribution issues ∆t does not depend on fc and fd or

measurement time. Events do

1.00E-17

1.00E-16

1.00E-15

1.00E-14

1.00E-13

1.00E-12

1.00E-11

1.00E-10

1.00E-09

3.00 5.00 7.00

Q time, ns

Delta

t

dct ffMTBF

∆=

1τt

wt eT−

=∆

TUBS 1 July 2009 55

When the clock goes low

Clock goes high, master goes metastable

1E-18

1E-17

1E-16

1E-15

1E-14

1E-13

5.0 6.0 7.0 8.0 9.0 10.0

ns

Delta

t

No Back Edge4.5 Back Edge5.5 Back Edge

D QS

Slave latch

D QM

Master Latch

Clock

Clock Inverse Clock

D QM

Back edge of clock causes increased delay

Master output arrives at slave– Before slave clock high: transparent

gate delay td– As slave clock goes high:

metastable, slightly longer delay

TUBS 1 July 2009 56

Effect of clock low on 74F5074

1 – 3 ns additional delay

1.00E-21

1.00E-20

1.00E-19

1.00E-18

1.00E-17

1.00E-16

1.00E-15

1.00E-14

1.00E-13

1.00E-12

1.00E-115.00E-

096.00E-

097.00E-

098.00E-

099.00E-

091.00E-

081.10E-

081.20E-

08

Output time

Inpu

t tim

e

5ns pulse 4ns pulse No back edge

6 ns pulse

4 ns pulse

TUBS 1 July 2009 57

Measurement results

Reliability measurements extended from– 10-15 s or MTBF = 16 min at 10MHz, to– 10-22 s or MTBF = 3 years

We can see variations in τ not previously seen

Measurement is statistical, not affected by noise

Not affected by oscillator linking

Back edge of clock pulse is seen to be an important effect, can be 0 – 15τ

TUBS 1 July 2009 58

Conclusions

Synchronization/arbitration requires special circuit elements

They’re not digital! Well known models of synchronizers and arbiters

exist Design gets more difficult with small dimensions Synchronizers and arbiters are not deterministic. Both circuits and systems may not conform to

idealized models Differences can lead to poor reliability, unfair

arbitration, and unpredictable performance

Documents

Synchronizers, Arbiters, GALS and Metastability€™s the problem: The digital IP world and the rest of the world Your system The synchronizer . is the guy that allows. ... metastability