Upload
lykhanh
View
216
Download
4
Embed Size (px)
Citation preview
TUBS 1 July 2009 1
David KinnimentUniversity of Newcastle, UK
Based on contributions from:Alex Bystrov, Keith Heron, Nikolaos Minas,
Gordon Russell, Alex Yakovlev, and Jun Zhou
Synchronizers, Arbiters, GALS and Metastability
TUBS 1 July 2009 2
Outline
What’s the problem? Why does it matter?
Time and distributions Noise – decisions are not
deterministic. Synchronizer latency, can it be
avoided? Measuring and some interesting
effects
TUBS 1 July 2009 3
What’s the problem: The digital IP world and the rest of the world
Your system
The synchronizer is the guy that allowstiming flexibility
Everything else, or Reality
TUBS 1 July 2009 4
Synchronizers and arbiters
Your system
Input
Your system
Input 1
Input 2
SynchronizerDecides which clockcycle to use for the input data
Asynchronous arbiterDecides the order of inputs
TUBS 1 July 2009 5
Time Comparison Hardware Digital comparison hardware
(which compares integers) is easy– Fast– Bounded time
Analog comparison hardware (which compares reals like time) is hard– Normally fast, but takes longer as the difference becomes smaller– Can take forever, (Buridan’s Ass ~1340)
Synchronization and arbitration involve comparison of time
Known to early computer designers:– Lubkin 1952, Catt 1966– Chaney and Littlefield 1966/72
TUBS 1 July 2009 6
Asynchronous Network (Sparsø, ASYNC 2005)
Sparsø
Synchronization requiredMultiple Clocks
AsynchronousArbitration required
TUBS 1 July 2009 7
Synchronization of data
Non pausible clocks require data synchronization You have a limited time to synchronize. Synchronizer circuits may fail to work in that time System sometimes fails (you fly into a mountain) Synchronization time = latency
D Q D Q
CLK a
VALID
#1 #2
CLK b
TUBS 1 July 2009 8
Synchronization of clocks
Clock paused to prevent contention Wait for MUTEX output to resolve Can take forever (with decreasing probability) This may not be acceptable (you fly into a mountain)
C
MU
TEX
Request Grant
Clock
Ack
Run
Delay
TUBS 1 July 2009 9
Trends
Communications are limiting performance Difficulty (Impossibility?) in maintaining one
clock per chip Need reduce wasted power in clocks Timing closure problems
– Interconnect delay times long– Variability increases
Move towards asynchronous networks
TUBS 1 July 2009 10
Metastability is....
Not being able to decide…
Q
Q
Clock
D
∆tin
∆tin -> 0
D
Clock
Request
Processor Clock
Set-up time violated
TUBS 1 July 2009 11
Metastability in a Latch
V1 V2
I1 I2
V1
V2
Stable points
Metastable Point
V1
V2
I1
V2
V1
I2
TUBS 1 July 2009 12
Simple Linear Model
Simple linear model leads to two exponentials
τa is convergent, τb is divergent
ba
t
b
t
a eKeKV ττ ..1 +=−0
111 2
21
21 2 1
2 1= ++
+ −τ ττ τ
. .( )
. ( ).d Vdt A
dVdt A
V
τ τ11 1
22 2= =
C RA
C RA
.,
.Q1 Q2
-A*V1
R1
C1
+- V2
V2
V1
V1
V2
V1
RAgm =
TUBS 1 July 2009 13
Synchronizer
t is time allowed for the Q to change between CLK a and CLK b τ is the recovery time constant, usually the gain-bandwidth of the circuit Tw is the “metastability window” τ and Tw depend on the circuit We assume that all values of ∆tin are equally probable
D Q D Q
CLK a
VALID#1 #2
dcw
t
ffTeMTBF
..
/τ
=CLK b
TUBS 1 July 2009 14
Typical responses
All starting points are equally probable Most are a long way from the “balance point” A few are very close and take a long time to resolve
Clock
Q Output
TUBS 1 July 2009 15
Event Histogram
- Propagation delay
Events
The slope is -1/τ
Log Probability of eventdepends on ∆ time
Propagation delay
Normal delay
The intercept is ~Tw
TUBS 1 July 2009 16
Synchronizer state of the art
You require about 35 τ s in order to get the MTBF out to about 1 century. (That’s for 1 synchronizer)
There is nothing else you can do while synchronizing
Each typical static gate delay is equivalent to about 5 τ Synchronizers are analog devices, so worse affected by scaling
Bigger SoCs, in future systems so more synchronizers, worse reliability
Inputs can be ‘malicious’ i.e. always causing metastability.
TUBS 1 July 2009 17
Outline
What’s the problem? Why does it matter?
Time and distributions Noise – decisions are not
deterministic. Synchronizer latency, can it be
avoided? Measuring and some interesting
effects
TUBS 1 July 2009 18
The arbiter (MUTEX)
Grant 2
Grant 1
Request 1
Gnd
Request 2
•Asynchronous arbitration,No time bound•Seitz metastability filter•Grants cannot occur until after the latch resolves any metastability
TUBS 1 July 2009 19
Arbitration time
Unlike a synchronizer, an arbiter may take for ever. It usually doesn’t, long responses are rare. On average the time should only be τ longer than the
normal response, so DEAD time ought to be low Outputs are always monotonic
Request 1
Request 2
Grant 1
Grant 2
tm
TUBS 1 July 2009 20
Gate metastability filter
Half levels due to metastability need to be removed– Low (or high) threshold inverters– Measure divergence
Filters define the time to reach a stable state
Vt =Vdd/4
Vdd/2
Vdd/2
0
0
Vdd/2
TUBS 1 July 2009 21
Resolution time can be affected
• The start/finish points make a difference• Grant appears when trajectory crosses the filter threshold• Gate metastability filter shows more variation• Seitz filter shows more delay
1.1
1.3
1.5
1.7
1.9
2.1
0 50 100 150 200 250 300 350 400 450 500
ps
Volts
Low Start1.55VHigh Start
TUBS 1 July 2009 22
Event distribution
Assumption: Time between two events (R1 and R2) is evenly distributed
Typically it is not. Can be as much
as 7:1 variation. Here there are
more cases where R1 is just after R2
Arbitration may not be fair
Distribution of event times
TUBS 1 July 2009 23
Non uniform probabilities and time Time for metastability averaged over a distribution T in a
MUTEX is:
in
T
in
w tdt
TtimeAverage ∆
∆
= ∫ .ln_0
τ
If T = ∞, Average_time = τBUTFor a malicious input, distribution limited by noise If T = 4ps, Tw =25ps, Average_time = 4τ Noise can sometimes make average time faster
TUBS 1 July 2009 24
Predicting performance
MUTEX circuits are fast
BUT delay times may not be easy to predict– Cannot rely on ALL times being fast
Synchronizers are known to be unreliable
TUBS 1 July 2009 25
Future processes
Synchronizers and arbiters don’t work well in nanometre technologies, especially at low Vdd
Worse that gates! Why? A gate input is either HIGH
– Output pulled down Or LOW
– Output pulled up A metastable gate is neither
– Both transistors can be off Vdd decreases with process shrink,
– gm very low Synchronization time constant τ = C/gm
Vdd
Ground
IdsVdd
IdsGround
Ids
Ids
Vdd/2
TUBS 1 July 2009 26
Robust synchronizer
Data Reset
Clock
Vdd
0V
• Jamb latch synchronizer slow for low Vdd, low temp
• In metastability, both outputs are the same
• Extra p-types are switched fully on, so gm increases, and τ improves
TUBS 1 July 2009 27
Results
τ (metastability time constant) vs Vdd
Measurement Results (ps)Vdd(v) Jamb Latch B Robust Synchronizer
1.8 35.55 34.92
1.7 37.29 35.76
1.6 40.93 38.25
1.5 52.36 43.07
1.4 66.17 50.36
1.35 75.35 58.19
TUBS 1 July 2009 28
Outline
What’s the problem? Why does it matter?
Time and distributions Noise – decisions are not
deterministic. Synchronizer latency, can it be
avoided Measuring and some interesting
effects
TUBS 1 July 2009 29
Does noise affect τ ?
Probability of escape from metastability does not change with gaussian noise (Couranz and Wann 1975)
Trajectories
-0.7
-0.5
-0.3
-0.1
0.1
0.3
0.5
0.7
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
Time
Vol
ts
TUBS 1 July 2009 30
Does noise affect τ ?
Probability of escape from metastability does not change with gaussian noise (Couranz and Wann 1975)
Trajectories
-0.7
-0.5
-0.3
-0.1
0.1
0.3
0.5
0.7
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
Time
Vol
ts
TUBS 1 July 2009 31
Can noise change the failure rate?
Or maybe not..…
Q
Q
Clock
D
∆tin
∆tin -> 0
D
Clock
TUBS 1 July 2009 32
The normal case
Probability of initial difference due to noise component P1(v)
Probability of initial difference due to input clock data overlap P0(v)
Convolution
Result of convolution P(v)
Probability
T >> tn
tn
Time
TUBS 1 July 2009 33
The malicious input
Probability of initial difference due to noise component P1(v)
Probability of initial difference with zero input clock data overlap P0(v)
Result of convolution P(v)
Probability
T << tn
tn
Time
TUBS 1 July 2009 34
Non-determinism
Synchronizers and arbiters can produce non-deterministic outcomes
Some noise is deterministic (5-50ps)– Power supply, crosstalk
Some noise is non-deterministic (typically 0.1ps -0.5ps). – Thermal noise, which increases as dimensions reduce.
Sequence and time of an individual computation paths is unpredictable
System performance can only be predicted probabilistically
TUBS 1 July 2009 35
Outline
What’s the problem? Why does it matter?
Time and distributions Noise – decisions are not
deterministic. Synchronizer latency, can it be
avoided? Measuring and some interesting
effects
TUBS 1 July 2009 36
Request and Acknowledge
D Q D Q
Read Clocks
REQ
DQ DQ
Write Clocks
Data Available
Read done
ACK
DATA
TUBS 1 July 2009 37
Latency
It takes one - two receive clocks to synchronise the request
Then one – two write clocks to acknowledge it
Significant latency (1-3 clocks) Poor data rate (2 – 6 Clocks)
TUBS 1 July 2009 38
FIFO Can improve data rate by using a FIFO But not latency (which gets worse) FIFO is asynchronous (usually RAM + read and write
pointers)
D Q D Q
Read Clock 2
Data Available
WRITE
FIFO
DQ DQ
Write clock 1
Write Data Read done
Free to writeFull Not Empty
READ
DATA DATA
Write clock 2 Read Clock 1
TUBS 1 July 2009 39
Timing regions can have predictable relationships
Phase locked (mesochronous)– Timing of input data is constant, therefore
predictable Same or related frequencies but phase
difference can drift in an unbounded manner. (plesiochronous) – Timing is not constant but is still predictable
Unrelated frequencies (Heterochronous)– No assumptions about timing can be made– Need a synchronizer
TUBS 1 July 2009 40
Don’t synchronise when you don’t need to
If the two clocks are locked together, you don’t need a synchroniser, just an asynchronous FIFO big enough to accommodate any jitter/skew
FIFO must never overflow/underflow, so there is latency
REQ IN
Write Data AvailableRead done
ACK IN REQ OUT
ACK OUT
FIFODATA DATA
TUBS 1 July 2009 41
Mesochronous data exchange Intermediate X register used to retime data Need to find a place where write data is stable, and read register
available. There is always a place which can be found at start up– Chakraborty and Greenstreet ASYNC 2003
Controller
DATA In DATA Out
Write Clock Read Clock
RW X
TUBS 1 July 2009 42
Clock delay synchronizer (Ginosar AINT 2000)
conflict region
DATA
REG
RCLKtKO
0 1conflictdetector
WCLK
RCLK d dtKO
SYNC
TUBS 1 July 2009 43
Delay synchronizer latency
Nominally 0 – 1 clock cycle Relies on accurately predicting conflicts Clocks must remain stable over
synchronisation time. Always lose tko of next computation stage Alternative: shift all conflicts to next read
cycle– On average this loses 2d– 2d must be big enough to cover any clock
drift/jitter over synchronization time
TUBS 1 July 2009 44
Speculation
Mostly, the synchronizer does not need 30τ to settle
Only e-13 (0.00023%) need more than 13τ
Why not go ahead anyway, and try again if more time was needed
TUBS 1 July 2009 45
Low latency synchronization Data Available, or Free to write are produced early. If they prove to be in error, synchronization failed. Read Fail or Write Fail flag is then raised and the
action can be repeated.
Read Fail
Data Available
WRITE
FIFO
Write Fail
Write Data Read done
Free to writeFull Not Empty
READ
DATA DATA
Write clock Read Clock
Speculativesynchronizer
Speculativesynchronizer
TUBS 1 July 2009 46
When to recover
1.Early
Half Cycle - 2τ
2. SpeculativeHalf Cycle
FinalEnd of Cycle
Fail1 & 2 differentEnd of Cycle
Comment
? ? metastable? metastable? Unrecoverable error,Probability low.
0 0 0 0 No data was available
0 1 1 1 Stable at the end of the cycle, but the speculative output may have been metastable. Fail
1 1 1 0 Normal data Transfer
Early Data Available is set after a half cycle – 2 inverter delays Speculative Data Available after a half cycleIf these two are different at end of cycle, set Fail
TUBS 1 July 2009 47
Synchronizer latency reduction
1. Only have one clock for the whole system2. Use clocks with a predictable relationship3. Speculate4. Synchronise at the start of a burst transfer,
the data rate is predictable during the burst
TUBS 1 July 2009 48
Outline
What’s the problem? Why does it matter?
Time and distributions Noise – decisions are not
deterministic. Synchronizer latency, can it be
avoided? Measuring and some interesting
effects
TUBS 1 July 2009 49
What we know about metastability
Things we know– Synchronizers are unreliable, the more there are
the more unreliable the system– How to measure reliability up to a few hours
Things we know we don’t know– What reliability is at 3 years– How to measure it– Complex circuits give complex results, the simple
MTBF formula may not apply
Things we don’t know we don’t know– What happens on the back edge of the clock
TUBS 1 July 2009 50
74F5074 Histogram
Slope, τ, is about 120ps (in fast region) Typical delay time (most events) is 4ns 99.9% of clock cycles do not cause useful events To get 1 event at 7ns requires hours
-4ns-7ns
TUBS 1 July 2009 51
10 MHz TestFFD Q
Integrator
Variable Delay
SlaveFFD Q
Increasing the number of events
Test FF is driven to metastability Every clock produces a metastable response Integrator ensures half outputs high, half low
TUBS 1 July 2009 52
What you get
Clock to D (Input) histogram
Q to Clock (Output) histogram
200ps
3ns
TUBS 1 July 2009 53
Total input events normalized
0
0.2
0.4
0.6
0.8
1
200 250 300 350
Input time, ps
Total output events normalized
0.0
0.2
0.4
0.6
0.8
1.0
3.50 4.50 5.50
Output time, ns
Interpreting results
05000
10000150002000025000300003500040000
50 150 250 350D to Clock, ps
Even
ts
Input time distribution is not flat
Proportion of total inputs causing events vs input time
Mapping output times to input times
Proportion of total output events vs output time
0 < Balance point > 1
TUBS 1 July 2009 54
Results
∆t is the time from the “balance point” of ~200ps
Similar to original graph BUT not events
Orders of magnitude quicker to gather data
Reliability for days not minutes Only one oscillator, so no
distribution issues ∆t does not depend on fc and fd or
measurement time. Events do
1.00E-17
1.00E-16
1.00E-15
1.00E-14
1.00E-13
1.00E-12
1.00E-11
1.00E-10
1.00E-09
3.00 5.00 7.00
Q time, ns
Delta
t
dct ffMTBF
∆=
1τt
wt eT−
=∆
TUBS 1 July 2009 55
When the clock goes low
Clock goes high, master goes metastable
1E-18
1E-17
1E-16
1E-15
1E-14
1E-13
5.0 6.0 7.0 8.0 9.0 10.0
ns
Delta
t
No Back Edge4.5 Back Edge5.5 Back Edge
D QS
Slave latch
D QM
Master Latch
Clock
Clock Inverse Clock
D QM
Back edge of clock causes increased delay
Master output arrives at slave– Before slave clock high: transparent
gate delay td– As slave clock goes high:
metastable, slightly longer delay
TUBS 1 July 2009 56
Effect of clock low on 74F5074
1 – 3 ns additional delay
1.00E-21
1.00E-20
1.00E-19
1.00E-18
1.00E-17
1.00E-16
1.00E-15
1.00E-14
1.00E-13
1.00E-12
1.00E-115.00E-
096.00E-
097.00E-
098.00E-
099.00E-
091.00E-
081.10E-
081.20E-
08
Output time
Inpu
t tim
e
5ns pulse 4ns pulse No back edge
6 ns pulse
4 ns pulse
TUBS 1 July 2009 57
Measurement results
Reliability measurements extended from– 10-15 s or MTBF = 16 min at 10MHz, to– 10-22 s or MTBF = 3 years
We can see variations in τ not previously seen
Measurement is statistical, not affected by noise
Not affected by oscillator linking
Back edge of clock pulse is seen to be an important effect, can be 0 – 15τ
TUBS 1 July 2009 58
Conclusions
Synchronization/arbitration requires special circuit elements
They’re not digital! Well known models of synchronizers and arbiters
exist Design gets more difficult with small dimensions Synchronizers and arbiters are not deterministic. Both circuits and systems may not conform to
idealized models Differences can lead to poor reliability, unfair
arbitration, and unpredictable performance