A systems approach to building modern high-speed links
Vladimir Stojanović
Integrated Systems Group, Massachusetts Institute of Technology
Course Map
Goal: show a modern, systems approach to link design
Day 1: Link environment as communication channel; links as bandlimited communication systems
Day 2: Link implementations (signal processing, synchronization); Lab 1
Day 3: System modeling (noise sources, ISI); Lab 2
Link System Implementations
Signal processing – Tx, Rx
  Equalization
  Modulation
Synchronization
  CDR
  CDR and Eq interaction
Modern Link Architecture
Equalization, multi-level signaling, partial response
Sophisticated timing recovery: multi-level, second-order
Signaling: Current vs. Voltage Mode
[Figure: current-mode vs. voltage-mode output drivers [Hatamkhani 2004], [Palmer 2007]]
Voltage mode reduces supply current by a factor of 4
Not as clear when pre-driver and regulator power are added
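The factor of 4 follows from the termination arithmetic. The sketch below is a back-of-the-envelope check, assuming ideal switches, 50 Ω single-ended (100 Ω differential) termination at both ends of the line and a series-terminated voltage-mode driver; pre-driver and regulator power, the caveat above, are deliberately left out.

```python
R_TERM = 50.0  # assumed single-ended termination at Tx and Rx (ohms)


def current_mode_supply_current(v_diff_peak, r_term=R_TERM):
    """Tail current of a differential current-mode driver.

    Steering the tail current I fully into one leg loads it with the Tx and
    Rx terminations in parallel (r_term / 2), so the peak differential swing
    is I * r_term / 2.
    """
    return 2.0 * v_diff_peak / r_term


def voltage_mode_supply_current(v_diff_peak, r_term=R_TERM):
    """Supply current of a series-terminated voltage-mode driver.

    Current flows supply -> r_term -> line -> 2 * r_term (Rx differential
    termination) -> line -> r_term -> ground; the Rx sees I * 2 * r_term.
    """
    return v_diff_peak / (2.0 * r_term)


if __name__ == "__main__":
    swing = 0.25  # hypothetical 250 mV peak differential swing
    i_cm = current_mode_supply_current(swing)
    i_vm = voltage_mode_supply_current(swing)
    print(f"current mode {i_cm * 1e3:.1f} mA, voltage mode {i_vm * 1e3:.1f} mA, "
          f"ratio {i_cm / i_vm:.1f}")
```

For a 250 mV peak differential swing this gives 10.0 mA versus 2.5 mA. The difference comes from the load each driver sees: the current-mode tail current works into the parallel Tx/Rx terminations, while the voltage-mode driver sets the line voltage directly from the supply through the full series path.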
Transmitter: Output Drivers
On-chip clock period limited to about 6-8 FO4
Need to send more bits per clock: multiplex data
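As a rough illustration of why multiplexing is needed: if the fastest on-chip clock has a period of about 6-8 FO4 delays, each clock cycle must carry the clock period divided by the bit time. The 30 ps FO4 delay used below is a hypothetical placeholder, not a figure from the slides.

```python
import math


def bits_per_clock(data_rate_gbps, fo4_ps, fo4_per_cycle=8):
    """Minimum number of bits to multiplex per on-chip clock cycle.

    The clock period is limited to about `fo4_per_cycle` FO4 delays, so each
    cycle must carry ceil(clock period / bit time) bits.
    """
    clock_period_ps = fo4_per_cycle * fo4_ps
    bit_time_ps = 1000.0 / data_rate_gbps
    return math.ceil(clock_period_ps / bit_time_ps)


if __name__ == "__main__":
    # Hypothetical FO4 delay of 30 ps -> 240 ps (about 4.2 GHz) clock period
    for rate in (5, 10, 20):
        print(f"{rate} Gb/s: {bits_per_clock(rate, fo4_ps=30.0)} bits per clock")
```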
Transmitter: Time-Interleaved DACs
[Figure: eight current DACs (data0 to data7) drive a shared 50 Ω output; DAC i is enabled between clkStart_i and clkEnd_i]
DACs enabled by overlap of two clocks
Need precise clocks
Fast clocks limit interleaving
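A behavioral sketch of the clock-overlap enable, assuming eight 50%-duty clock phases spaced one bit time apart and taking clkEnd_i to be the next phase, clkStart_{i+1} (an assumption; the slide does not say how clkEnd is generated). Each DAC is then on for exactly one bit slot, so the bit time is set by the spacing between clock phases, which is why precise clocks are needed.

```python
N_DACS = 8  # eight interleaved DACs (data0 to data7), as in the figure


def clk_start(i, t, n=N_DACS):
    """Phase i of an n-phase, 50%-duty clock; t is in bit times and
    phase i rises at t = i (mod n)."""
    return ((t - i) % n) < n / 2


def dac_enabled(i, t, n=N_DACS):
    """DAC i drives the line while clkStart_i is high and clkEnd_i is still
    low; clkEnd_i is assumed to be the next phase, clkStart_{i+1}."""
    return clk_start(i, t, n) and not clk_start((i + 1) % n, t, n)


def active_dacs(t, n=N_DACS):
    """Indices of the DACs driving the output at time t."""
    return [i for i in range(n) if dac_enabled(i, t, n)]


if __name__ == "__main__":
    for slot in range(N_DACS):
        print(slot, active_dacs(slot + 0.5))  # probe the middle of each bit slot
```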
DAC Output Circuitry
Capacitance of the DACs loads the output
  RC_out = 25 Ω × 4.3 pF → 1.5 GHz bandwidth
7 thermometer-coded outputs of size 32 (3-bit thermometer) plus binary-weighted outputs of size 16, 8, 4, 2, 1 (5-bit binary)
[Figure: segmented output driver with low-fanout pre-drivers gated by clkStart/clkEnd within a symbol time and supplied from VddReg]
Predriver VddReg controls output current
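A sketch of the segmentation and output-pole arithmetic, reading the figure as a 3-bit thermometer field (seven segments of weight 32) plus a 5-bit binary field (segments of 16, 8, 4, 2 and 1), i.e. an 8-bit DAC; that reading and the single-pole output model are assumptions.

```python
import math

N_THERM = 7    # thermometer-coded segments of weight 32 (3-bit MSB field)
LSB_BITS = 5   # binary-weighted segments 16, 8, 4, 2, 1


def segment_code(code):
    """Split an 8-bit code into 7 thermometer enables and a 5-bit binary word."""
    if not 0 <= code < 2 ** (3 + LSB_BITS):
        raise ValueError("code out of range")
    msb = code >> LSB_BITS                         # 0..7 segments on
    therm = [1 if k < msb else 0 for k in range(N_THERM)]
    lsb = code & (2 ** LSB_BITS - 1)               # 0..31
    return therm, lsb


def output_units(therm, lsb):
    """Number of unit current cells switched on."""
    return 32 * sum(therm) + lsb


def rc_bandwidth_hz(r_ohm, c_farad):
    """-3 dB bandwidth of a single-pole RC output node."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


if __name__ == "__main__":
    print(segment_code(181))                        # 5 segments on, LSB word 21
    print(f"{rc_bandwidth_hz(25.0, 4.3e-12) / 1e9:.2f} GHz")
```

The 25 Ω is the 50 Ω line in parallel with the 50 Ω back termination; with 4.3 pF this lands at about 1.48 GHz, matching the slide's 1.5 GHz.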
Transmit Equalizers: Analog FIR
[Figure: simple PAM2/PAM4 Tx and equalizing 5-tap 2P/4P Tx, built from W/L-scaled driver legs and 1/z delay elements]
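Functionally, the equalizing transmitter is a short FIR filter on the symbol stream: each tap drives a weighted copy of a delayed symbol onto the line. A minimal behavioral sketch, assuming ideal summing and a peak-swing constraint that forces the absolute tap weights to sum to 1; the tap values are hypothetical.

```python
def tx_fir(symbols, taps):
    """Transmit FIR: y[n] = sum_k taps[k] * symbols[n - k].
    Symbols before the start of the list are taken as 0."""
    return [sum(c * symbols[n - k] for k, c in enumerate(taps) if n - k >= 0)
            for n in range(len(symbols))]


def normalize_peak(taps):
    """Scale taps so sum(|taps|) == 1: the output never exceeds the swing of
    an unequalized driver, modeling a fixed maximum output current."""
    total = sum(abs(c) for c in taps)
    return [c / total for c in taps]


if __name__ == "__main__":
    # Hypothetical 5-tap setting: 1 pre-cursor, main, 3 post-cursor taps
    taps = normalize_peak([-0.1, 1.0, -0.3, -0.1, 0.0])
    pam2 = [1, 1, 1, 1, -1, 1, -1, -1]
    pam4 = [1, 1 / 3, -1 / 3, -1, 1, -1]   # PAM4 levels scaled to +/-1
    print([round(v, 3) for v in tx_fir(pam2, taps)])
    print([round(v, 3) for v in tx_fir(pam4, taps)])
```

The same filter serves both PAM2 and PAM4; only the symbol alphabet changes. Long runs settle to sum(taps), well below the full swing reached right after a transition, which is the de-emphasis that compensates the channel's low-pass loss.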
Transmit Equalizers: Analog FIR
[Figure: original 5-tap PAM2/PAM4 equalizing transmitter with dedicated tap drivers (5), and shared equalizing transmitter with shared driver segments (7) [Zerbe 2003]]
Linear CT receiver equalizer: analog high-pass
Tunable RC tail degeneration
Sensitive to common mode [Farjad-Rad 2003]
Rx Continuous Time Linear Equalizer
Source-connected RC pole: at high frequencies the capacitor becomes a short, increasing gain
NOTE: Max gain still limited by the gain-bandwidth product of the source-coupled differential pair
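A commonly used first-order model of the source-degenerated differential pair (degeneration R_S and C_S between the sources, load R_D, output capacitance C_P) places a zero at 1/(R_S C_S), a pole at (1 + g_m R_S/2)/(R_S C_S) and an output pole at 1/(R_D C_P). The sketch evaluates that model with hypothetical component values: the DC gain is cut to g_m R_D/(1 + g_m R_S/2), while the high-frequency gain is still bounded by g_m R_D and the output pole, which is the gain-bandwidth limit in the note above.

```python
import math


def ctle_gain(f_hz, gm, r_d, r_s, c_s, c_p):
    """|H(j 2 pi f)| of a source-degenerated differential pair:

    H(s) = (gm / c_p) * (s + wz) / ((s + wp1) * (s + wp2))
    wz = 1 / (r_s c_s), wp1 = (1 + gm r_s / 2) / (r_s c_s), wp2 = 1 / (r_d c_p)
    """
    s = 2j * math.pi * f_hz
    w_zero = 1.0 / (r_s * c_s)
    w_pole1 = (1.0 + gm * r_s / 2.0) / (r_s * c_s)
    w_pole2 = 1.0 / (r_d * c_p)
    return abs((gm / c_p) * (s + w_zero) / ((s + w_pole1) * (s + w_pole2)))


if __name__ == "__main__":
    # Hypothetical: gm = 20 mS, R_D = 500 ohm, R_S = 400 ohm, C_S = 200 fF, C_P = 20 fF
    params = dict(gm=20e-3, r_d=500.0, r_s=400.0, c_s=200e-15, c_p=20e-15)
    for f in (1e8, 1e9, 2e9, 5e9, 1e10, 2e10):
        print(f"{f / 1e9:5.1f} GHz: {20 * math.log10(ctle_gain(f, **params)):5.1f} dB")
```

With these values the DC gain is 2 (6 dB) and the response peaks at several GHz before the output pole rolls it off. Tuning C_S moves the zero, the knob the next slide uses to absorb process variation.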
Variable Cap Rx CTLE Implementation
[Figure: |H| (dB) vs. f (log): tuning C adjusts the zero frequency]
Deal with process variation by tuning C, and therefore the zero location
Rx CTLE Implementation: Getting More Gain
Using L can boost gain significantly
Build peaking amplifier using inductors
  Area intensive [Partovi 2003]
Bandwidth peaking, Tx and Rx
[Figure: time-interleaved Tx (transmit memory, DACs) and Rx (ADCs, Rx memory, timing recovery) with a 4-stage PLL and per-stage adjust controls [Ellersick 2001]]
Insert inductors between interleaved stages
  Approximate lumped LC transmission line
  Phase adjustment compensates delay
Nice idea, hard to do
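The inductor value follows from treating each stage's capacitance as the shunt element of a lumped LC line. A minimal sketch using the standard constant-k ladder relations (Z0 = √(L/C), delay √(LC) per section, cutoff ω_c = 2/√(LC)); the 200 fF stage capacitance is hypothetical.

```python
import math


def lc_ladder(z0_ohm, c_stage_farad):
    """Series inductance, per-section delay and cutoff of a constant-k LC ladder."""
    l_series = z0_ohm ** 2 * c_stage_farad          # from Z0 = sqrt(L / C)
    delay_s = math.sqrt(l_series * c_stage_farad)   # sqrt(L C) per section
    f_cutoff_hz = 1.0 / (math.pi * delay_s)         # omega_c = 2 / sqrt(L C)
    return l_series, delay_s, f_cutoff_hz


if __name__ == "__main__":
    # Hypothetical: 200 fF per interleaved stage on a 50-ohm line
    l, t, fc = lc_ladder(50.0, 200e-15)
    print(f"L = {l * 1e9:.2f} nH, delay = {t * 1e12:.1f} ps/section, "
          f"cutoff = {fc / 1e9:.1f} GHz")
```

The bandwidth gain comes with a delay that accumulates along the ladder (10 ps per section here), which is why each interleaved stage needs its own phase adjustment.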
Rx CTLE Implementation Issues
Linearity a challenge
  Especially when input swings vary greatly in amplitude
Limited by gain-bandwidth of the diff-pair stage
  Tuned to a data rate with limited range; channel match
  Sensitive to PVT variations
  Sensitive to device mismatch, linearity
  Difficult to offset-cancel, calibrate
Multi-stage issues
  High gain can lead to clipping in multi-stage design
  Original design issues become even more difficult
  Tuning is tricky
Performance results
  In 90nm, can generally get ~4-6 dB of gain/stage at 10 Gb/s if you do things right
  Not a lot of gain: leads to multi-stage and inductively peaked designs
Receiver with Analog FIR DFE
Loop latency an issue [Zerbe 2003]
Use for tap 2 and beyond, sometimes just for reflections
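A behavioral sketch of decision feedback for PAM2, assuming the tap weights already match the channel's post-cursors (the channel values are hypothetical). It ignores timing, but the structure shows why loop latency matters: decision d[n-1] is needed to form the very next slicer input, all within one UI.

```python
import random


def dfe(samples, taps):
    """Decision-feedback equalizer for PAM2 (+1/-1) symbols.

    The ISI predicted from past decisions is subtracted before slicing:
    z[n] = y[n] - sum_k taps[k] * d[n-1-k].
    """
    past = [0.0] * len(taps)            # most recent decision first; unknown = 0
    decisions = []
    for y in samples:
        z = y - sum(w * d for w, d in zip(taps, past))
        d = 1.0 if z >= 0 else -1.0
        decisions.append(d)
        past = [d] + past[:-1]
    return decisions


if __name__ == "__main__":
    rng = random.Random(0)
    h = [1.0, 0.7, 0.4]                 # hypothetical channel: closed eye without DFE
    x = [rng.choice((-1.0, 1.0)) for _ in range(1000)]
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
    plain = [1.0 if v >= 0 else -1.0 for v in y]
    print("slicer errors:", sum(a != b for a, b in zip(plain, x)))
    print("DFE errors:   ", sum(a != b for a, b in zip(dfe(y, h[1:]), x)))
```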
DFE Feedback Timing: Splitting Edge & Data Gives Some Relief
[Figure: normal DFE vs. XDFE; input sampled as E0 D0 E1 D1 E2 D2 E3 D3 …, with data-clock and transition-clock flip-flops producing data and transition outputs]
Separate data & edge feedback & summers help
Still need to make 1UI data timing: still very tight
Allows separate coefficients & ‘edge-DFE’
Courtesy E. Chen, K. Yang, UCLA
Complete Di-Bit (Half-rate) DFE
[Figure: di-bit DFE with fully split even (dClkE) and odd (dClkO) paths, each with summers weighted by α1, α2, α3 and SR latches]
Combine prDFE and fully split even and odd paths
  Improved tolerance to DCD errors
  Can potentially use separate even/odd coefficients
Courtesy J. Zerbe, Rambus
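A behavioral sketch of a half-rate DFE with the first tap unrolled, assuming the sample stream is y[n] = x[n] + w1·x[n-1] with x in {-1, +1} and a known first tap w1 (the individual α coefficients of the figure are not modeled). Both paths slice speculatively against +w1 and -w1; the only serial step left is the mux, and the decision feeding each mux always comes from the other path.

```python
def half_rate_unrolled_dfe(samples, w1, first_prev=-1):
    """Half-rate DFE with the first tap unrolled (prDFE).

    Even and odd samples go to separate paths. Each path slices every
    sample against +w1 and -w1 (the two possible first-tap ISI values);
    a mux then picks one result using the previous decision, which was
    produced by the opposite path.
    """
    paths = {"even": samples[0::2], "odd": samples[1::2]}
    # Speculative slicing, done independently (in parallel) in each path
    spec = {name: [(1 if y > w1 else -1, 1 if y > -w1 else -1) for y in ys]
            for name, ys in paths.items()}
    decisions, prev = [], first_prev
    for n in range(len(samples)):
        name = "even" if n % 2 == 0 else "odd"
        if_prev_plus, if_prev_minus = spec[name][n // 2]
        d = if_prev_plus if prev > 0 else if_prev_minus   # mux select
        decisions.append(d)
        prev = d                                          # handed to the other path
    return decisions


if __name__ == "__main__":
    bits = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1]
    y = [bits[n] + 0.5 * (bits[n - 1] if n else -1) for n in range(len(bits))]
    print(half_rate_unrolled_dfe(y, 0.5) == bits)
```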
One phase of double-data rate Rx
[Figure: one phase (even path) of the double-data-rate receiver: prDFE for tap 1 with ±w1 thresholds and SR latches, DFE tap 2 (w2) and DFE taps 3-10 (w3-w10) summed at the input, with connections to and from the odd path]
First tap unrolled [Leibowitz 2007]
Second tap: current-mode (still a speed issue)
Third and later taps: current-mode DAC
Reducing the critical path: Merge sampler & mux
[Figure 12.4.1: StrongArm data sampler driving PMOS-domino and NMOS-domino multiplexers, with PrDFE select to and from the odd path and an SR latch]
StrongArm sampler directly drives domino mux to reduce critical path on first tap
Receiver front-end is becoming complicated
[Figure: channel input terminated with R_TERM to VTT feeds DATA(+), DATA(0), DATA(-), EDGE(+), EDGE(0), EDGE(-) and adaptive (A-) AFE slices clocked on even and odd phases; 9-bit delta DACs, 8-bit threshold DACs and current switches set the levels]
Lots of samplers
  Data even, odd: multi-level
  Edge even, odd: multi-level
  Adaptive sampler
Lots of DACs
  Multi-level thresholds
  Offset calibration
  Adaptation, on-chip scope
Hardware re-use: Dual-mode receiver
[Figure, shown in four builds (PAM4; pre-amp with offset comparator; PAM2; PAM2 with loop-unrolled DFE tap): samplers at thresh(+), 0 and thresh(-) produce lsb(+), msb and lsb(-), and muxes controlled by prDFE enable select between them]
PAM2 with loop-unrolled DFE tap
  Leverage multi-level properties of signals in loop-unrolling
  Re-use PAM4 receiver hardware (slicers and CDR)
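A behavioral sketch of the re-use idea, assuming PAM4 levels -3, -1, +1, +3 with binary (not Gray) bit mapping: the same three comparators and the same mux serve both modes. In PAM4 the mux that picks lsb(+) or lsb(-) is driven by the msb; in PAM2 the outer thresholds move to ±w1 and the same mux is driven by the previous bit, which is exactly a loop-unrolled first DFE tap. The function names and the PAM4 mux-select assignment are illustrative assumptions.

```python
def slicers(x, th_pos, th_neg):
    """The three comparators shared by both modes: thresh(+), 0, thresh(-)."""
    return x > th_pos, x > 0.0, x > th_neg


def pam4_decode(x, th=2.0):
    """PAM4 levels -3, -1, +1, +3 with thresholds +th, 0, -th.
    msb comes from the zero-threshold slicer; the mux picks lsb(+) or
    lsb(-) according to the msb (binary mapping assumed)."""
    hi, mid, lo = slicers(x, th, -th)
    msb = mid
    lsb = hi if msb else lo
    return int(msb), int(lsb)


def pam2_unrolled(samples, w1, prev_bit=0):
    """PAM2 with the first DFE tap unrolled on the same slicers: the
    thresholds move to +/-w1 and the mux is driven by the previous bit."""
    bits = []
    for x in samples:
        hi, _, lo = slicers(x, w1, -w1)
        bit = int(hi if prev_bit else lo)
        bits.append(bit)
        prev_bit = bit
    return bits


if __name__ == "__main__":
    print([pam4_decode(v) for v in (-3, -1, 1, 3)])
    bits = [1, 0, 0, 1, 1, 1, 0, 1]
    x = [2 * b - 1 for b in bits]
    y = [x[n] + 0.4 * (x[n - 1] if n else -1) for n in range(len(x))]
    print(pam2_unrolled(y, 0.4) == bits)
```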
Adaptation with minimum overhead
[Figure: Tx data passes through the channel to the Rx data macro (dClk) and the CDR edge samplers (eClk); the adaptive sampler (aClk, reference dLev) produces the error used by the adaptive macro to set thresholds and tap updates]
Adaptive sampler
  Generates the error signal at reference level dLev
  Monitors the link
    Adjustable voltage and time reference
    On-chip sampling scope
  Can replace any other sampler: calibration
V. Stojanović et al., “Adaptive Equalization and Data Recovery in Dual-Mode (PAM2/4) Serial Link Transceiver,” IEEE Symposium on VLSI Circuits, June 2004.
Dual-loop adaptive algorithm
Data level reference loop:
$$dLev_{n+1} = dLev_n - step_{dataLev} \cdot \mathrm{sign}(e_n), \qquad \hat{x}_n > 0$$
[Figure: initial eye, mid-way equalized eye and equalized eye, with reference levels dLev_init, dLev_mid, dLev_end and the initial peak-to-peak error]
Equalizer loop:
$$w_{n+1} = w_n + step_w \cdot \mathrm{sign}(e_n)\,\mathrm{sign}(\hat{x}_n)$$
Scale the equalizer: output Tx constraint
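A behavioral sketch of the two sign-sign loops running together, assuming the error is e_n = dLev·x̂_n - y_n (the slides do not define it), a hypothetical channel with a single post-cursor of 0.3, and a Tx FIR with a main tap plus three post-cursor taps. After each update the main tap is rescaled so that the sum of |w| stays 1, a stand-in for the Tx output constraint.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def dual_loop_adapt(h=(1.0, 0.3), n_taps=4, step_w=0.002, step_dlev=0.002,
                    n_iter=20000, seed=1):
    """Sign-sign dual-loop adaptation of a Tx FIR (main + post-cursor taps).

    e_n = dLev * x_hat_n - y_n                      (assumed error definition)
    data-level loop: dLev -= step_dlev * sign(e_n), only when x_hat_n > 0
    equalizer loop:  w_k  += step_w * sign(e_n) * sign(x_hat_{n-k}), k >= 1
    Tx constraint:   main tap rescaled so that sum(|w|) == 1
    """
    rng = random.Random(seed)
    w = [1.0] + [0.0] * (n_taps - 1)
    dlev = 0.5
    n_p = n_taps + len(h) - 1          # length of the pulse response w * h
    sent = [1] * n_p                   # transmitted symbols, newest first
    dec = [1] * n_taps                 # receiver decisions, newest first
    for _ in range(n_iter):
        sent = [rng.choice((-1, 1))] + sent[:-1]
        # Pulse response of the Tx FIR cascaded with the channel
        p = [sum(w[i] * h[j - i] for i in range(n_taps) if 0 <= j - i < len(h))
             for j in range(n_p)]
        y = sum(pj * s for pj, s in zip(p, sent))
        x_hat = 1 if y > 0 else -1
        dec = [x_hat] + dec[:-1]
        e = dlev * x_hat - y
        if x_hat > 0:
            dlev -= step_dlev * sign(e)
        for k in range(1, n_taps):
            w[k] += step_w * sign(e) * dec[k]
        w[0] = 1.0 - sum(abs(t) for t in w[1:])   # output swing constraint
    return w, dlev


if __name__ == "__main__":
    taps, level = dual_loop_adapt()
    print([round(t, 3) for t in taps], round(level, 3))
```

At convergence the post-cursor taps approach the zero-forcing values -0.3, 0.09 and -0.027 relative to the main tap, and dLev settles at the equalized eye level, which here equals the main tap.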
Dual loop convergence – 4 tap example
[Figure: PAM2, 5 Gb/s, 4-tap Tx equalization: tap weights (main, post1, post2, pre1) and dLev, in mV, vs. number of updates]
Hard to estimate analytically
Experimental results show both loops are stable within a wide range (0.1-10x) of relative speeds
Partial response adaptation - start
Extend data filter by one bit (msb_n, msb_{n-1})
(a) Update loops only on (msb_n, msb_{n-1}) = (1,1): finds dLev(1,1), “1+α”
(b) Update loops only on (msb_n, msb_{n-1}) = (0,1): finds dLev(0,1), “1-α”
Partial response adaptation - end
2α = dLev(1,1) - dLev(0,1)
Iterate α finding and equalization loops
The (msb_n, msb_{n-1}) filter tolerates one tap of post-cursor ISI α
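A sketch of the two conditioned data-level loops, assuming a channel with one post-cursor α, small Gaussian noise and correct decisions. It also assumes dLev(0,1) is the level reached when the current and previous bits differ; both loops use the sign-sign data-level update with the assumed error e = dLev - y.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def estimate_alpha(alpha=0.3, noise_rms=0.02, n=20000, step=0.001, seed=2):
    """Estimate one-tap post-cursor ISI from two conditioned dLev loops."""
    rng = random.Random(seed)
    dlev_same = 0.0   # trained when current and previous bits agree: 1 + alpha
    dlev_diff = 0.0   # trained when they differ: 1 - alpha
    prev = 1
    for _ in range(n):
        cur = rng.choice((-1, 1))
        y = cur + alpha * prev + rng.gauss(0.0, noise_rms)
        if cur > 0:                       # data-level loops run on positive decisions only
            if prev > 0:
                dlev_same -= step * sign(dlev_same - y)
            else:
                dlev_diff -= step * sign(dlev_diff - y)
        prev = cur
    return (dlev_same - dlev_diff) / 2.0  # 2 alpha = dLev(1,1) - dLev(0,1)


if __name__ == "__main__":
    print(round(estimate_alpha(), 3))
```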
Continuous adaptation with unknown data
[Figure: data snapshot d_{n-10}…d_{n+3} captured from the deserializer outputs (dE', dO') on dssClk; correlation terms are counted up/down in a 9-bit accumulator followed by Abs(x), cleared by dssReset]
Stop adaptation when data is not random enough
  Use a cross-correlator circuit
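A sketch of the gating idea, assuming the cross-correlator accumulates d[n]·d[n-k] (up for equal bits, down for different bits) over a window and that adaptation is frozen whenever any monitored lag exceeds a threshold. The lags and the window/4 threshold are hypothetical choices, not values from the slides.

```python
import random


def lag_correlation(bits, lag):
    """Up/down accumulation of d[n] * d[n-lag] over the window (bits are +1/-1)."""
    return sum(bits[n] * bits[n - lag] for n in range(lag, len(bits)))


def adaptation_allowed(bits, lags=(1, 2, 3, 9), threshold=None):
    """Freeze adaptation when any monitored lag is strongly correlated,
    i.e. the data is not random enough for sign-sign LMS updates."""
    if threshold is None:
        threshold = len(bits) // 4
    return all(abs(lag_correlation(bits, k)) < threshold for k in lags)


if __name__ == "__main__":
    rng = random.Random(5)
    print(adaptation_allowed([rng.choice((-1, 1)) for _ in range(512)]))  # random data
    print(adaptation_allowed([1] * 512))                                 # static data
    print(adaptation_allowed([1, -1] * 256))                             # clock pattern
```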
Spectral gating results
[Figure: DFE tap weights w3, w4 and w9 vs. adaptation iteration, without and with spectral gating, as the input switches from PRBS-31 to static data and back to PRBS-31]
Noise detection
[Figure, panels (a)-(d): noise sources (Tx-induced VDD noise through the current DACs, VTT impedance, sampler noise, input common-mode noise) and measured probability of sampling a 1 @ 9UI vs. Vin (mV), for a zero threshold after offset cancellation and for a large threshold]
On-chip scope can analyze system noise
Multi-level samplers especially vulnerable
  Differential properties degrade with threshold offset
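One way the on-chip scope turns probability measurements into noise numbers: sweep Vin (or the sampler threshold), record the probability of sampling a 1, and read the offset from the 50% point and the rms noise from the 15.9%-84.1% spread. The sketch assumes Gaussian noise, which the bounded noise sources above may violate; the test data is synthetic.

```python
import math


def p_one(vin_mv, offset_mv, sigma_mv):
    """Probability that a sampler with Gaussian input-referred noise outputs 1."""
    return 0.5 * (1.0 + math.erf((vin_mv - offset_mv) / (sigma_mv * math.sqrt(2.0))))


def crossing(vins, probs, target):
    """Linearly interpolate the Vin at which the measured probability crosses target."""
    for (v0, p0), (v1, p1) in zip(zip(vins, probs), zip(vins[1:], probs[1:])):
        if p0 <= target <= p1 and p1 > p0:
            return v0 + (target - p0) * (v1 - v0) / (p1 - p0)
    raise ValueError("target probability not bracketed")


def estimate_offset_sigma(vins, probs):
    """Offset = 50% point; sigma = half the 15.87%-84.13% spread."""
    mean = crossing(vins, probs, 0.5)
    lo = crossing(vins, probs, 0.158655)
    hi = crossing(vins, probs, 0.841345)
    return mean, (hi - lo) / 2.0


if __name__ == "__main__":
    vins = [-20 + 0.1 * i for i in range(401)]         # 0.1 mV sweep
    probs = [p_one(v, 2.0, 3.0) for v in vins]          # synthetic: 2 mV offset, 3 mV rms
    print(estimate_offset_sigma(vins, probs))
```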
Link System Implementations
Signal processing – Tx, Rx
  Equalization
  Modulation
Synchronization
  CDR
  CDR and Eq interaction
Clocking: Terminology
[Figure: clocking taxonomy [Poulton ’99]; some schemes can do with or without CDR, others need CDR]
Clock and Data Recovery
Recovering clock from the data
  Can recover clock completely, or just phase
  Just phase: need a reference clock
Why?
  Allows separate xtals on different boards
  Don't have to match trace lengths, delays
  Easier system design / clock distribution
Why not?
  Expensive: takes area, power
  Requires coding or transition density, or at a minimum a training sequence
  8b10b coding uses 10b to transfer 8b of info: 20% BW loss
Example CDR: PLL Technique
Simple bang-bang PLL
  Observe data with phase detector
  Filter early/late & drive VCO
Advantages
  Good frequency range
  Low jitter
Challenges
  Phase offset
  Lock time: startup sequence
  Loss of lock: coding dependent
  How to integrate multiple PLLs?
  Harmonic locking problems
Dual PLL Problem: Harmonic Locking
Potentially serious in highly integrated plesiochronous systems where residual phase error is close to noise injection in magnitude
2-PAM Eye With Density
Dual-Loop CDR
Combination of
  Core PLL provides multiple phases at frequency
  Periphery DLL mixes them to make the desired phase
[Figure: RefClk → PD → low-pass filter → VCO with ÷N feedback (core PLL); PLL phases → phase mixers → RX, with CDR logic closing the loop on Data/RClk]
Advantages
  Avoids harmonic locking
  Easy to integrate many
  Rapid CDR lock time
  CDR very stable
  Digital = flexible filtering, control
  Can even ‘hold’ phase state
Challenges
  Limited frequency offset from PLL
  Jitter not as low as PLL
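A sketch of why mixed phases are not perfectly linear, assuming two sinusoidal clock phases 90° apart are summed with weights (1 - a) and a. The output phase is atan2(a, 1 - a), so a uniform weight sweep gives a worst-case phase error of about 4°; real mixers with finite edge rates behave differently.

```python
import math


def mixed_phase_deg(a):
    """Phase of (1 - a) * cos(wt) + a * sin(wt) relative to cos(wt), in degrees.

    The sum equals R * cos(wt - phi) with phi = atan2(a, 1 - a).
    """
    return math.degrees(math.atan2(a, 1.0 - a))


def worst_phase_error_deg(steps=64):
    """Largest deviation from the ideal linear phase 90 * a over a weight sweep."""
    return max(abs(mixed_phase_deg(k / steps) - 90.0 * k / steps)
               for k in range(steps + 1))


if __name__ == "__main__":
    for a in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"a = {a:.2f}: {mixed_phase_deg(a):6.2f} deg (ideal {90 * a:5.1f})")
    print(f"worst-case error: {worst_phase_error_deg():.2f} deg")
```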
CDR Issues: Transitions & Tracking
CDRs require transition density
  PLL-based CDRs require it to keep lock, even in mesochronous systems
  DLL-based CDRs require it for plesiochronous tracking
Often coding is used to guarantee it
  Can alternately use a scrambler + XOR
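A minimal additive scrambler sketch, assuming a PRBS7 sequence (the slide does not name a polynomial): XORing even an all-zero payload with the PRBS bounds the run length at 7, so the CDR keeps seeing transitions. Transmitter and receiver must start from the same LFSR state for descrambling to work.

```python
def prbs7(state=0x7F):
    """Maximal-length 7-bit LFSR (period 127); yields one bit per step."""
    while True:
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        yield new


def scramble(bits, seed=0x7F):
    """Additive scrambler: XOR data with PRBS7. Applying it again with the
    same seed descrambles."""
    gen = prbs7(seed)
    return [b ^ next(gen) for b in bits]


def longest_run(bits):
    """Length of the longest run of identical bits (no transitions)."""
    best = run = 1
    for a, b in zip(bits, bits[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    payload = [0] * 1000                      # worst case: static data
    line = scramble(payload)
    print(longest_run(payload), longest_run(line), scramble(line) == payload)
```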
CDR Issues: Jitter
[Figure: CDR phase vs. time, converging to lock and then dithering]
CDR jitter starts out worse than PLL jitter
Also can have ‘dither jitter’: phase wander when locked
CDR Dither Jitter
Caused by the need to track plesiochronous differences
Dither jitter set by
  Latency of the loop: usually 10-20 cycles
  Step size: usually 1% T_symbol
  Number of averages: usually 16+
The last two, along with transition density, set the tracking rate (CDR loop is first order)
Conflicting requirements between tracking and jitter
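A simplified estimate of the tracking limit of the first-order loop, assuming the phase moves at most one step per update and each update needs a fixed number of early/late votes that only arrive on transitions; loop latency is ignored.

```python
def max_tracking_ppm(step_ui=0.01, n_avg=16, transition_density=0.5):
    """Largest frequency offset a first-order bang-bang CDR can follow.

    One phase update needs n_avg votes, and votes only arrive on
    transitions, so an update takes n_avg / transition_density bit times
    on average. The result is the phase slew in UI per UI, in ppm.
    """
    bits_per_update = n_avg / transition_density
    return step_ui / bits_per_update * 1e6


if __name__ == "__main__":
    print(f"random data:        {max_tracking_ppm():.1f} ppm")
    print(f"8b10b (2 per 10 b): {max_tracking_ppm(transition_density=0.2):.1f} ppm")
```

With the typical values above (1% UI steps, 16 averages) and random data, the loop follows about 312 ppm; with only 2 transitions every 10 bits it drops to 125 ppm. Larger steps track faster but dither more, which is the conflict noted above.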
2x Oversampling
[Figure: slicers sample the data on data Clk and the edges on edge Clk; deserializers feed a phase detector (d_n, d_{n-1}, e_n) whose early/late output sets the phase control of a phase mixer driven by the PLL (ref Clk)]
Generate early/late from d_n, d_{n-1}, e_n
Simple 1st-order loop, cancels receiver setup time
Jitter on data Clk ≠ PLL output
  Base is linear PLL jitter
  Can add non-linear phase-selector noise from CDR
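A behavioral sketch of the early/late logic and the first-order loop, assuming noiseless edges at a fixed phase and a phase mixer that moves in fixed steps. The phase detector is the usual bang-bang (Alexander) rule on d_{n-1}, e_n and d_n; phases are in UI.

```python
import random


def alexander_pd(d_prev, edge, d_cur):
    """Bang-bang phase detector from d[n-1], e[n], d[n].

    +1: clock early (edge sample still equals the old bit)
    -1: clock late (edge sample already equals the new bit)
     0: no transition, no information
    """
    if d_prev == d_cur:
        return 0
    return 1 if edge == d_prev else -1


def run_first_order_cdr(phi_true=0.3, phi_start=-0.2, step=0.01, n_bits=2000, seed=3):
    """Data edges sit at phi_true; the edge clock starts at phi_start."""
    rng = random.Random(seed)
    phi = phi_start
    d_prev = rng.choice((0, 1))
    for _ in range(n_bits):
        d_cur = rng.choice((0, 1))
        # Ideal edge: sampling before the transition still returns the old bit
        edge = d_prev if phi < phi_true else d_cur
        phi += step * alexander_pd(d_prev, edge, d_cur)   # early -> move later
        d_prev = d_cur
    return phi


if __name__ == "__main__":
    print(round(run_first_order_cdr(), 3), round(run_first_order_cdr(phi_start=0.8), 3))
```

Once locked, the phase toggles between the two steps straddling the true edge; that bounded toggling is the dither jitter discussed above, and its size is set by the step.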
Baud-Rate CDR
[Figure: sampled waveform around the data level dlev at d_n]
Use information from the data-level sampler (already there for adaptation)
  Eliminate edge sampler, eClk
  Curvature of the waveform here
Use a comparator to differentiate between dlev & signal
  Decide later if 0 or 1 from the comparator means early or late
Even more transition-like level behavior in 4-PAM mid-data levels
PAM4: Multiple Transitions
Many transition types → PAM2 CDR unusable
4-PAM Eye With Density
Offset transitions clearly visible
But good transitions exist…
PAM4 Edges & CDR Approach #1: Mid-level
[Figure: PAM4 eye with levels 00, 01, 11, 10, the MSB threshold and the MSB data levels]
Offset edge sampler to data level to get mid-levels
Best available edge rate in FSE system
Requires edge samplers placed accurately on data level
PAM4 Edges & CDR Approach #2: Minor and Major Transitions
[Figure: PAM4 eye with levels 00, 01, 11, 10, the MSB threshold and the LSB thresholds]
Use all minor transitions and one major transition
More transitions to choose from, no voltage offset required
Poorer edge rate from minor transitions
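A sketch of the edge filtering behind approach #2, assuming PAM4 levels -3, -1, +1, +3 and symmetric, linear edges, so that a transition crosses the midpoint of its two levels at the nominal edge time. Only transitions whose midpoint lands on a threshold are kept.

```python
from itertools import permutations

LEVELS = (-3, -1, 1, 3)


def usable_transitions(thresholds):
    """Transitions whose midpoint lands exactly on an edge-sampler threshold.

    Any other transition crosses the threshold away from the nominal edge
    time and would bias or dither the CDR.
    """
    return [(a, b) for a, b in permutations(LEVELS, 2) if (a + b) / 2 in thresholds]


def sign_changing(levels=LEVELS):
    """All transitions that cross zero at some time (seen by a plain 2-PAM CDR)."""
    return [(a, b) for a, b in permutations(levels, 2) if a * b < 0]


if __name__ == "__main__":
    print(len(usable_transitions((-2, 0, 2))), "clean transitions with LSB + MSB thresholds")
    print(len(usable_transitions((0,))), "clean of", len(sign_changing()), "zero crossings")
```

With thresholds at -2, 0 and +2, eight of the twelve transitions qualify: the six minor (adjacent-level) transitions plus the two full-swing ones. A plain 2-PAM CDR with only the zero threshold sees eight zero crossings, but only four at the nominal time; the others (for example -3 to +1) cross at an offset, which is consistent with its higher measured dither.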
Measured 4-PAM CDR Performance
2-PAM CDR on 4-PAM data: 60 ps p-p @ 8 Gb/s
4-PAM CDR uses only minor transitions: lower dither jitter, 35 ps p-p @ 8 Gb/s
[Figure: recovered phase vs. cycle for both CDRs]
Partial response CDR
[Figure: partial-response eye with one post-cursor tap α: four signal levels +1+α, +1-α, -1+α, -1-α; transitions labeled (0,0), (0,1), (1,0), (1,1) cross at +α, 0 or -α]
Four signal levels
Offset edge samplers for transitions with ISI
  Otherwise timing error
Need to filter edges, similar to PAM4
Dual-mode CDR
[Figure: edge samplers at thresh(+), 0 and thresh(-), clocked by eClk, produce edge_n(+), edge_n(0) and edge_n(-); they are filtered with msb_n, msb_{n-1}, msb_{n-2} (PAM4) or lsb_n(±), lsb_{n-1}(±) (PAM2 with prDFE) to give filtered early/late]
J. Zerbe et al., “Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell,” IEEE Journal of Solid-State Circuits, Dec. 2003.
PAM2 with loop-unrolled DFE tap
  Leverage multi-level properties of signals in loop-unrolling
  Re-use PAM4 receiver hardware (slicers and CDR)
CDR In a Plesiochronous System
[Figure: chip 1: 8 bits @ f1 → 8b/10b encode → serializer (PLL, f1) → Tx; chip 2: Rx → CDR → deserializer (10 bits @ f1) → elastic buffer (FIFO) → 10 bits @ f2 → 10b/8b decode → 8 bits @ f2]
Goal is to transfer 8 bits @ f1 on chip 1 to 8 bits @ f2 on chip 2
  First encode and transfer data based on local clock f1
  Then recover data and clock (f1) on chip 2
  Elastic buffer (FIFO) used to transfer data from f1 to f2
  Finally, decode to get 8 bits @ f2
Plesiochronous System Impact
Packets must have appropriate slack time
  How else to recover the timing difference?
  Idle characters must be recognized for slack
Need FIFOs deep enough
  Set by maximum frequency difference & maximum data length
CDR track rate
  Must be able to track maximum difference, including dither
Frequency difference, protocol and maximum data packet size differ between system components
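The FIFO-depth and slack requirements reduce to one number: how many symbols the two clocks can drift apart during the longest packet. A sketch of that arithmetic; the ppm tolerances and packet lengths are hypothetical.

```python
import math


def idle_slack_symbols(max_packet_symbols, ppm_a, ppm_b):
    """Worst-case clock drift, in symbols, over one maximum-length packet.

    ppm_a and ppm_b are the +/- tolerances of the two reference clocks, so
    the worst-case difference is their sum. The gap between packets must
    hold at least this many idle characters that the elastic buffer can
    drop or repeat, and the FIFO needs this much headroom above and below
    its nominal half-full level.
    """
    drift = max_packet_symbols * (ppm_a + ppm_b) / 1e6
    return math.ceil(drift)


if __name__ == "__main__":
    # Hypothetical: +/-100 ppm crystals on both chips, 10,000-symbol packets
    print(idle_slack_symbols(10_000, 100, 100))
```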
Second-order loops – PI control
Solves frequency tracking vs. jitter problem
[Lee04]
I branch tracks the constant frequency difference
  Very low tracking rate & bandwidth (thus low jitter)
  Sub-1ppm tracking capability
P branch tracks normal phase shifts
  Small step size & dither
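A behavioral sketch of a bang-bang CDR with PI control tracking a 500 ppm offset, assuming random data (a transition half the time), no loop latency and no phase wrapping; all gains are hypothetical. With ki = 0 it reduces to the first-order loop, whose small proportional step can no longer keep up.

```python
import random


def simulate_bang_bang_cdr(kp, ki, f_offset=5e-4, n_bits=20000, seed=4):
    """Return (final phase error in UI, final frequency estimate in UI/bit).

    kp: proportional phase step per early/late vote (P branch)
    ki: integral step per vote (I branch); ki = 0 gives a first-order loop
    f_offset: frequency difference in UI per bit (5e-4 = 500 ppm)
    Cycle slips / phase wrapping are not modeled.
    """
    rng = random.Random(seed)
    phi_data = phi_clk = freq = 0.0
    for _ in range(n_bits):
        phi_data += f_offset
        if rng.random() < 0.5:                       # a transition gives one vote
            vote = 1 if phi_data > phi_clk else -1   # +1: clock early
            freq += ki * vote                        # I branch
            phi_clk += kp * vote                     # P branch
        phi_clk += freq                              # integrated frequency, every bit
    return phi_data - phi_clk, freq


if __name__ == "__main__":
    print("first order :", simulate_bang_bang_cdr(kp=5e-4, ki=0.0))
    print("second order:", simulate_bang_bang_cdr(kp=5e-4, ki=1e-6))
```

The first-order loop can slew only kp × 0.5 = 250 ppm, so its error grows without bound; the integrator learns the 500 ppm offset, leaving the P branch to handle small phase moves with a small step and little dither.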
CDR: Coding & Transition Density
CDR requires transition density to keep lock
AC coupling requires DC balance
Plesiochronous operation requires packets & null characters…
Coding as solution
  Typical code: 8b10b from IBM
  8 bits into the link => 10 bits on the wires
  Raw data rate must be 25% faster than effective data rate: 6.25 Gb/s raw for 5 Gb/s effective
8b10b code guarantees
  DC balance
  Transition density: 2 transitions every 10 bits
  Reserved codes, control characters
8b10b Code Overview
• DC balanced: every code word has a disparity of 0 or ±2, and running disparity keeps the line balanced
64b/66b Code Overview
Much lower overhead
Poorer DC balance & transition properties
[Walker, IEEE 802.3 HSSG]
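A quick check of the coding arithmetic for the two codes, using exact fractions.

```python
from fractions import Fraction

CODES = {"8b10b": (8, 10), "64b/66b": (64, 66)}   # (payload bits, line bits)


def raw_rate_gbps(effective_gbps, code):
    """Line rate needed to carry effective_gbps of payload."""
    payload, line = CODES[code]
    return Fraction(effective_gbps) * line / payload


def overhead_percent(code):
    """How much faster the line must run than the payload rate."""
    payload, line = CODES[code]
    return float(Fraction(line - payload, payload) * 100)


if __name__ == "__main__":
    for code in CODES:
        print(f"{code}: {float(raw_rate_gbps(5, code))} Gb/s raw for 5 Gb/s, "
              f"{overhead_percent(code)}% overhead")
```

This reproduces the 6.25 Gb/s raw rate for 5 Gb/s effective quoted for 8b10b; 64b/66b needs only 5.15625 Gb/s (3.125% overhead), at the cost of the weaker DC-balance and transition guarantees noted above.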
Link System Implementations
Signal processing – Tx, Rx
  Equalization
  Modulation
Synchronization
  CDR
  CDR and Eq interaction
CDR & EQ – General Issues
[Figure: Tx data → Tx linear Eq → channel → DFE; data sampled on dClk, edges sampled on eClk feed the CDR]
Fundamental issue: conditioning signal edges affects CDR edge position… and CDR edge position affects observed ISI
Can affect both Tx & Rx coefficients
What is the best solution for lowest BER?
Courtesy J. Zerbe, Rambus
TxEQ Pulse-Shaping Effects on CDR Lock Position
[Figure: single-bit response; blue: raw, purple: final result with adaptation & CDR locking; initial and final locking points, pre-ISI and post-ISI marked [Ren 2007]]
Effect depends on TxEQ taps
  1 pre-cursor tap, 1 main, 2 post-cursor taps
Courtesy J. Zerbe, Rambus
TxEQ Pulse-Shaping & CDR: Precursors Only
3 pre-cursor taps, 1 main, 0 post-cursor taps
  Significant phase shift in lock position
Courtesy J. Zerbe, Rambus
CDR & DFE: Feedback Pulse Timing
[Figure: channel single-bit response φ(t) with DFE feedback pulses of heights dfe1 and dfe2, and the half-way values dfe1/2 and (dfe1 + dfe2)/2]
Edge/DFE interaction with first post-cursor tap will move lock point
  Proper prDFE implementation can avoid this entirely
Unlike TxEq, taps 2-N are not convolved with the channel, so they do not interact with the CDR
Courtesy J. Zerbe, Rambus
Conclusions
Backplane links limited by the channel
  ISI is large in baseband links
  Can't completely compensate (at least not with reasonable area/power)
  Residual ISI also increases CDR jitter
Generally have low BER requirements
  Accurate noise statistics important
  Many of the large noise sources are bounded
Power-constrained transmitter
  PAM4 and PAM2 with simple DFE are attractive solutions
What to do above 10Gb/s?
Next Challenges
More efficient spectral usage
  Multi-level, multi-tone modulations
Need improved PSR of all circuits in the path
  Control/calibrate offset and mismatch
Coding
  Density issues
Improve energy efficiency
Control of complex architectures
  Deal with crosstalk
Lots of opportunity for design!
Bridging the gap: Multi-tone link
[Figure: multi-tone data rates with thermal noise: #bits/Hz vs. frequency [GHz], 0-14 GHz; Nelco 64 Gb/s, FR4 38 Gb/s]
A. Amirkhany, V. Stojanovic, M.A. Horowitz, “Multi-tone Signaling for High-speed Backplane Electrical Links,” IEEE Global Telecommunications Conference, November 2004.
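The data-rate curves come from treating each tone as a separate channel. A minimal sketch of that sum, using the Shannon rate per sub-band with an optional SNR gap; the channel and noise model behind the plotted curves is not reproduced here, and the SNR values are hypothetical.

```python
import math


def multitone_rate_gbps(snrs, band_ghz, gap=1.0):
    """Sum of band_ghz * log2(1 + SNR / gap) over sub-bands.

    snrs are linear (not dB) signal-to-noise ratios, one per tone;
    gap = 1 gives the Shannon bound, gap > 1 models a practical modulation.
    """
    return sum(band_ghz * math.log2(1.0 + snr / gap) for snr in snrs)


if __name__ == "__main__":
    # Hypothetical: four 1-GHz tones whose SNR falls with frequency
    print(multitone_rate_gbps([255, 63, 15, 3], band_ghz=1.0))
```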
[Figure: multi-tone transceiver: data0 is sent at baseband through LPFs; data1 to dataN are modulated by e^{jω_k t}, band-pass filtered, sent over the channel, then mixed back down and low-pass filtered at the receiver]
Challenge: balancing the inter-symbol and inter-channel interference
  Microwave filter techniques
  Custom signal processing
Works well on channels with notches
[Figure: frequency response (0 to -50 dB, 0-20 GHz) of chip-to-chip, backplane (daughter cards) and multi-drop (memory) channels; topologies: chip A to chip B over PCB with packages, backplane with daughter cards, memory controller driving DRAM memory cards]
Return to bus architecture
  Can arrange stubs to create controlled notches
Digital Implementation
[Figure: pattern generators PN 1&2 to PN 7&8 feed equalizer and DAC paths for phases 1-4 (3 GHz I/Q carriers, 6 GS/s DACs) combined at 12 GS/s; baseband 3 GHz (4PAM) mode [Amirkhany 08]]
Oversampled Tx filter controls the whole channel band while shaping sub-channels
Digital Tx FIR implementation
$$y = \sum_{k=1}^{16} W_k x_k$$
As technology scales, digital FIRs become more attractive
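A sketch of the 16-tap sum in fixed point, the form a digital FIR takes in hardware, assuming PAM4 symbols as small integers and tap weights rounded to a hypothetical number of fractional bits. Each rounded weight is off by at most half an LSB, which bounds the output error.

```python
def quantize(weights, frac_bits):
    """Round real tap weights to signed fixed-point integers."""
    scale = 1 << frac_bits
    return [round(w * scale) for w in weights]


def fir_output(x, q_weights, frac_bits):
    """y = sum_k W_k x_k with integer multiply-accumulate, rescaled once at
    the end. x holds the symbols currently in the 16-tap window."""
    if len(x) != len(q_weights):
        raise ValueError("symbol window and tap count differ")
    acc = sum(w * s for w, s in zip(q_weights, x))   # integer accumulate
    return acc / (1 << frac_bits)


if __name__ == "__main__":
    frac_bits = 6                                                  # hypothetical precision
    weights = [0.8 if k == 2 else -0.04 * (k % 3) for k in range(16)]   # hypothetical taps
    window = [3, -1, 1, -3] * 4                                    # PAM4 symbols
    exact = sum(w * s for w, s in zip(weights, window))
    approx = fir_output(window, quantize(weights, frac_bits), frac_bits)
    bound = 16 * 3 * 0.5 / (1 << frac_bits)                        # taps * max|x| * LSB/2
    print(exact, approx, abs(exact - approx) <= bound)
```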
Multi-Tone Measured Eye Diagrams
[Figure: baseband modes (2PAM un-equalized, 2PAM equalized, 4PAM equalized) and 4-channel AMT mode (Ch1-Ch4, equalized, post-processed); rates shown: 12 Gb/s, 12 Gb/s, 24 Gb/s, 18 Gb/s]
Tx sampled with oscilloscope
With Rx mixing and integration in Matlab
Some cited references
J.L. Zerbe, C.W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W.F. Stonecypher, A. Ho, T.P. Thrush, R.T. Kollipara, M.A. Horowitz, K.S. Donnelly, “Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell,” IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2121-2130, Dec. 2003.
H.-T. Ng, R. Farjad-Rad, M.-J.E. Lee, W.J. Dally, T. Greer, J. Poulton, J.H. Edmondson, R. Rathi, R. Senthinathan, “A second-order semidigital clock recovery circuit based on injection locking,” IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2101-2110, Dec. 2003.
R. Farjad-Rad, H.-T. Ng, M.-J.E. Lee, R. Senthinathan, W.J. Dally, A. Nguyen, R. Rathi, J. Poulton, J. Edmondson, J. Tran, H. Yazdanmehr, “0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization,” VLSI Circuits Symposium, pp. 63-66, June 2003.
C.-K.K. Yang, V. Stojanovic, S. Modjtahedi, M.A. Horowitz, W.F. Ellersick, “A serial-link transceiver based on 8-GSamples/s A/D and D/A converters in 0.25-μm CMOS,” IEEE Journal of Solid-State Circuits, vol. 36, no. 11, pp. 1684-1692, Nov. 2001.
H. Partovi, B. Evans, T. Wilson, S. Shelton, E. Naviasky, E. Sanjeevi, Y. Wen, K. Gopalakrishnan, S. Chokkalingam, H. Thompson, M. Casas, L. Ye, M. Hufford, Y. Qiu, M. Williams, J. James, A. Baldiserotto, S. White, S. Williams, D. Georgantas, T. Gray, “A 62.5 Gb/s multi-standard SerDes IC,” Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 585-588, Sept. 2003.
G. Sheets, J. D’Ambrosia, “The Impact of Environmental Conditions on Channel Performance,” DesignCon 2004.
To probe further
B. Widrow et al., “Stationary and nonstationary learning characteristics of the LMS adaptive filter,” Proc. IEEE, vol. 64, no. 8, pp. 1151-1162, 1976.
V. Stojanović, G. Ginis, M.A. Horowitz, “Transmit Pre-emphasis for High-Speed Time-Division-Multiplexed Serial Link Transceiver,” IEEE International Conference on Communications, pp. 1934-1939, May 2002.
J.T. Stonick et al., “An adaptive PAM-4 5-Gb/s backplane transceiver in 0.25-μm CMOS,” IEEE Journal of Solid-State Circuits, vol. 38, no. 3, pp. 436-443, March 2003.
C.-C. Yeh, J.R. Barry, “Adaptive Minimum Bit-Error Rate Equalization for Binary Signaling,” IEEE Transactions on Communications, vol. 48, no. 7, July 2000.
A. Amirkhany, V. Stojanovic, M.A. Horowitz, “Multi-tone signaling for high-speed backplane electrical links,” GLOBECOM '04, IEEE, vol. 2, pp. 1111-1117, 2004.
E. Alon, V. Stojanovic, M.A. Horowitz, “Circuits and techniques for high-resolution measurement of on-chip power supply noise,” IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 820-828, 2005.
V. Stojanovic, A. Ho, B.W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R.T. Kollipara, C.W. Werner, J.L. Zerbe, M.A. Horowitz, “Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery,” IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 1012-1026, 2005.
If you have any questions, send me mail: vlada@mit.edu