A CMOS Multi-Gb/s 4-PAM Serial Link Transceiver*
Ramin Farjad-Rad
Center for Integrated Systems
Stanford University
Stanford, CA 94305
March 11, 1999
*Funding from LSI Logic, SUN Microsystems, and Powell Foundation
• Networking high-speed (5-10 Gbps) systems over ranges up to 10 meters at lower cost and complexity
→ Parallel buses are costly for long distances.
→ Optical fibers are not beneficial for such short ranges.
→ Serial links on copper cables are an attractive solution for this kind of application.
• Push the bandwidth limits of CMOS serial links
→ CMOS technology is getting cheaper, faster, and more available.
→ Integrate more digital functions on-chip.
Goals
[Figure: Tx and Rx (with timing recovery) connected by a copper cable terminated with RTERM at both ends]
Challenges
System Architecture
Circuit Implementation
Test Results
Conclusion
Outline
• Frequency-dependent attenuation in electrical links due to skin-effect resistance and dielectric loss.
→ The -3dB BW of 10-meter PE-142 coax is ~1.0GHz.
Challenges: Interconnection Bandwidth
[Figure: Frequency Response. Amplitude (V) versus Frequency (Hz), from 10^8 to 10^10 Hz]
• Frequency-dependent attenuation causes ISI.
→ Only channel eigen-waveforms result in no ISI.
→ Generation and detection of true eigen-waveforms is not feasible due to circuit limitations at high frequencies.
→ Trapezoidal pulses are used instead as basis waveforms.
→ Higher symbol rate results in more ISI.
[Figure: Ideal transmitted pulse and cable pulse response, Amplitude (V) versus Time (ns); the tails of the cable pulse response are marked as ISI]
Challenges: Interconnection Bandwidth
Challenges: Data Generation/Detection
• Hard to operate CMOS circuits directly at multi-GHz speeds
→ Better to reduce the on-chip frequency
• Recovering embedded timing information from the serial data
→ Large frequency variations of on-chip oscillators and small frequency capture range of phase detectors
• Data detection at high speeds
→ Input voltage offset, cross-talk, and signal reflection (reflection ISI) limit the minimum detectable signal.
[Figure: serial data stream]
Reduction of On-Chip Frequency
• Multiplexing (N:1) and demultiplexing (1:N) the high-speed data at the transmission line*
→ Reduces the on-chip frequency by a factor of N
For example: N = 5 => (for 5 Gsym/s) $f_{ck} = \frac{5\,\mathrm{G}}{5} = 1\,\mathrm{GHz}$
→ Max switching speed of the process is the limit
CMOS provides high-speed transistor switches
[Figure: N:1 multiplexer of transmitter drivers (clock phases ckt0-ckt4, each at fck) driving high-speed data onto the transmission line at N x fck, and 1:N demultiplexer of receiver detectors (clock phases ckr0-ckr4, each at fck)]
*C-K. Yang, R. Farjad, M. Horowitz, VLSI Symp. 97
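The N = 5 arithmetic above can be checked with a short sketch. The function name `mux_clocking` and the idea of spacing the clock phases exactly one symbol time apart are illustrative assumptions, not details taken from the slides.

```python
def mux_clocking(symbol_rate_hz, n_phases):
    """Return (on-chip clock frequency, phase offsets in seconds).

    Assumes an N:1 multiplexer in which each of the N clock phases
    runs at symbol_rate / N and successive phases are one symbol
    time (Ts) apart. Hypothetical helper for illustration only.
    """
    if n_phases < 1:
        raise ValueError("n_phases must be a positive integer")
    fck = symbol_rate_hz / n_phases          # on-chip clock frequency
    ts = 1.0 / symbol_rate_hz                # symbol time
    offsets = [k * ts for k in range(n_phases)]
    return fck, offsets


fck, offsets = mux_clocking(5e9, 5)
print(f"fck = {fck / 1e9:.1f} GHz")
print("phase offsets (ps):", [round(t * 1e12) for t in offsets])
```

With a 5 Gsym/s link and five phases, each phase runs at 1 GHz and the phases are spaced by the 200 ps symbol time.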
• 4-PAM is used for data communication in the serial link
→ Symbol rate reduces to half that of binary transmission.
→ Lower symbol rate reduces ISI and on-chip clock frequency.
→ Higher-level PAM was not used because of limited transmitter swing, minimum detectable signal, and reflection ISI.
• 4Sym-->5Sym conversion guarantees clock recovery
• Gray code mapping of levels reduces BER by 25% vs. linear mapping
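Proposed Modulation
[Figure: 4-PAM level mapping, listed from top level to bottom level
Linear: 11, 10, 01, 00 (an error between the two middle levels causes 2 bit errors)
Gray: 10, 11, 01, 00 (an error between adjacent levels causes only 1 bit error)]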
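The 25% figure can be reproduced with a small counting sketch. It assumes that symbol errors occur only between adjacent levels and that all three level boundaries are equally likely to be crossed; both are simplifying assumptions made here, not statements from the slides.

```python
# Level-to-bits mappings, listed from bottom level to top level.
LINEAR = ["00", "01", "10", "11"]
GRAY = ["00", "01", "11", "10"]


def hamming(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_bit_errors(mapping):
    """Total bit errors summed over all adjacent-level confusions."""
    return sum(hamming(mapping[i], mapping[i + 1])
               for i in range(len(mapping) - 1))


linear_errors = adjacent_bit_errors(LINEAR)   # 1 + 2 + 1
gray_errors = adjacent_bit_errors(GRAY)       # 1 + 1 + 1
reduction = 1 - gray_errors / linear_errors
print(linear_errors, gray_errors, reduction)
```

Under these assumptions the linear mapping yields 4 bit errors across the three boundaries and the Gray mapping yields 3, a reduction of one quarter.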
• To cancel the long tail of the pulse response, a pre-emphasis symbol-spaced 2-tap FIR filter is implemented at the transmitter:
$V_o(n) = V_i(n) - a \cdot V_i(n-1) - b \cdot V_i(n-2)$
• To sharpen the signal transition edges, a half-symbol-spaced 1-tap high-pass equalizer is implemented at the receiver:
$V_{eq}(n) = V_i(n) - \alpha \cdot V_i\left(n - \tfrac{1}{2}\right)$
Proposed Architecture to Combat ISI
[Figure: Amplitude (V) versus Time (ns), showing the shaped (pre-emphasized) transmitted wave, the received signal, the equalized signal, and the pulse response with no filtering; the pre-emphasis taps a and b are marked]
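A minimal sketch of the transmitter pre-emphasis filter, Vo(n) = Vi(n) - a·Vi(n-1) - b·Vi(n-2), is given here. The tap weights a = 0.25 and b = 0.125 are hypothetical values chosen for the example, and samples before the start of the sequence are assumed to be zero.

```python
def pre_emphasize(symbols, a, b):
    """Apply Vo(n) = Vi(n) - a*Vi(n-1) - b*Vi(n-2) to a symbol sequence.

    Symbols before the start of the sequence are treated as 0
    (an assumption of this sketch).
    """
    out = []
    for n, v in enumerate(symbols):
        prev1 = symbols[n - 1] if n >= 1 else 0.0
        prev2 = symbols[n - 2] if n >= 2 else 0.0
        out.append(v - a * prev1 - b * prev2)
    return out


# A single isolated pulse: the filter adds two negative post-cursor
# taps that counteract the long tail of the cable pulse response.
pulse = [0.0, 1.0, 0.0, 0.0]
print(pre_emphasize(pulse, a=0.25, b=0.125))
```

For the isolated pulse, the two symbols following the main pulse carry the values -a and -b, which is the shape the taps in the figure are meant to produce.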
Receiver Equalizer
• The half-symbol-spaced filter boosts frequency components up to $f = \frac{1}{T_s}$
(Ts = 200ps, R = 5Gsym/s, fmax = 5GHz)
• Sharpening the transitions increases the eye opening width.
=> Less sensitive to sampling phase errors.
[Figure: slow transition versus sharp transition; equalizer response spanning -fmax to fmax]
Top Level Architecture
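[Figure: block diagram. Transmitter: 10-bit data bus -> 5:1 Multiplexer/Serializer with 2-b DAC & Filter, clocked by the Analog TX PLL -> Serial 4-PAM. Receiver: Serial 4-PAM -> Sampler & Equalizer, 1:5 Demultiplexer, 2-b ADC Bank -> 10-bit data bus; a Ph/Fr Detector and Filter generate Vctl for the Analog RX VCO]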
5:1 Multiplexing Transmitter
[Figure: ring oscillator generating clock phases ck0-ck4; symbols Sym0-Sym4, each of width Ts; data D[0:1] (valid for Sym0) is gated by Clk0 and Clk1]
• Each symbol is generated by the rising and falling edges of two clock phases
[Figure: differential output stage (Out, Outb) with RTERM loads; driver replicated x5]
2-bit Output Driver
[Figure: 2-bit DAC module built from an x2 leg and an x1 leg driven by D1 and D0; each differential drive leg is gated by Clk0 and Clk1 and steers its tail current to Dout or its complement]
4-PAM Preshaping 5:1 Multiplexer
• Each driver generates a filtered symbol independent of other drivers.
• Simple architecture to implement the filter.
Multiplexer Drivers & 2-Tap Filters
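[Figure: each multiplexer driver consists of a main 2-b DAC plus two tap DACs (T1, T2) fed with delayed copies of the same data and clocked by successive phases (clk0, clk1, clk2 for one driver; clk1, clk2, clk3 for the next). Tap weights a and b are set by external sources shared with the 3 other drivers]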
Symbol Generation
[Figure: tap-driver timing.
Main stream: symbols Sym0-Sym4, with D[0:1] valid for Sym0 between CLK1 and CLK2 (main pulse)
Tap1 stream (S0,T1 ... S4,T1): D[0:1] delayed by Ts
Tap2 stream (S0,T2 ... S4,T2): D[0:1] delayed by 2Ts
The three streams are summed at the output]
Symbol-Width Problem
• Variations in PMOS to NMOS strength ratio result in duty cycle error inthe clocks (unbalanced falling & rising times)
→ The effective width of the final output symbol decreases.
Pulse-Width Control Loop
[Figure: pulse-width control loop. Dummy drivers produce Va and Vb, whose difference indicates wide or narrow pulses relative to Ts; a control voltage (Ctl) adjusts the clock buffers of driver 1 and the 4 other drivers]
Receiver Timing Recovery
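[Figure: receiver PLL. Input data passes through a pre-amplifier to the receiver (Din, Dout), which is clocked by samp_ck; a phase detector, filter, and VCO/clock generator close the loop (ctrl). The sample clock is placed within the data timing margin]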
• Oversampling phase detection
→ Many input samplers, Phase quantization error, Complex logic.
• Tracking phase detection
→ Conventional bang-bang control: Low loop bandwidth and capture range
→ Proportional control: Desirable
Timing Recovery: Front-end
[Figure: Simplified receiver front-end (x2 oversampling). The differential 4-PAM input is sampled on multi-phase clocks Ck0-Ck3 (samples So0-So3); data samples feed 2-bit ADCs producing d0,1 and d2,3, and edge samples Se1 and Se3 are taken through linear amplifiers]
Proposed Proportional Phase Detection
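[Figure: edge sample Se taken at a data transition, with Se = k.∆φ. When the clock lags, a 1-->0 transition gives -(Se) > 0 and a 0-->1 transition gives +(Se) > 0, so the loop speeds up the clock]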
• Advantages:
→ Larger PLL bandwidth and stability compared to bang-bang PLLs.
→ Zero systematic phase offset (same detection mechanism for edges and data).
→ Zero ripple on control voltage of PLL (unlike bang-bang).
• Disadvantage:
→ Voltage offsets of edge samplers translate into phase error.
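The difference between proportional and bang-bang detection can be sketched as follows. The sketch uses a two-level signal for simplicity, takes the polarity rule from the slide (+Se on a 0-->1 transition, -Se on a 1-->0 transition), and assumes the edge sample is already available as a number; the function names are hypothetical.

```python
def proportional_phase_error(prev_bit, next_bit, edge_sample):
    """Signed phase-error estimate from one edge sample.

    The edge sample Se is proportional to the phase error (Se = k * dphi).
    Its polarity is corrected by the transition direction:
    +Se for a 0->1 transition, -Se for a 1->0 transition.
    With no transition there is no timing information, so return 0.
    """
    if prev_bit == next_bit:
        return 0.0
    return edge_sample if next_bit > prev_bit else -edge_sample


def bang_bang_phase_error(prev_bit, next_bit, edge_sample):
    """Bang-bang version: keeps only the sign of the same estimate."""
    e = proportional_phase_error(prev_bit, next_bit, edge_sample)
    return (e > 0) - (e < 0)


# A small and a large phase error look identical to the bang-bang detector,
# while the proportional detector scales its output with the error.
for se in (0.01, 0.2):
    print(proportional_phase_error(0, 1, se), bang_bang_phase_error(0, 1, se))
```

Because the proportional output shrinks to zero as the loop approaches lock, it does not toggle around the lock point the way a sign-only detector does, which is consistent with the zero-ripple advantage listed above.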
Three 4-PAM Transitions
1) Right crossing 2) Misplaced crossing 3) No crossing
[Figure: differential 4-PAM input with transitions of type1, type2, and type3, and the sampling edges when in lock]
• Only type1 transitions are used for clock recovery
• Transitions type2 and type3 are ignored by a decision logic
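One way to picture the decision logic is the sketch below. It is an interpretation, not the circuit from the slides: it assumes normalized differential levels of -3, -1, +1, +3, treats a transition between opposite levels (which crosses zero at mid-transition) as type1, a zero crossing between unequal magnitudes as type2, and anything that stays on one side of zero as type3.

```python
LEVELS = (-3, -1, 1, 3)  # assumed normalized differential 4-PAM levels


def classify_transition(prev_level, next_level):
    """Return 1, 2, or 3 for the transition type between two symbols.

    type1: symmetric about zero, zero crossing at the right place
    type2: crosses zero, but the crossing is misplaced in time
    type3: no zero crossing (including no transition at all)
    """
    if prev_level not in LEVELS or next_level not in LEVELS:
        raise ValueError("levels must be one of -3, -1, 1, 3")
    if prev_level * next_level > 0:
        return 3
    if prev_level == -next_level:
        return 1
    return 2


def usable_for_clock_recovery(prev_level, next_level):
    """Only type1 transitions contribute to the phase detector."""
    return classify_transition(prev_level, next_level) == 1


print(classify_transition(3, -3), classify_transition(3, -1),
      classify_transition(3, 1))
```

Under this assumption, 4 of the 16 possible symbol pairs are type1, so only a fraction of the symbols carry timing information to the loop.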
Data Phase Detector
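[Figure: Data phase detector. Decision logic uses adjacent data pairs (d0,1 and d2,3 for Se1; d2,3 and d4,5 for Se3; ... ; d8,9 and d0,1 for Se9) to select the polarity of each edge sample; the results are summed into VP, which drives the charge pump, loop filter, and analog RX VCO, whose phases go to the input samplers]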
• Edge sample values, Se, of type1 are summed with correct polarity atphase detector output (VP)
Frequency Acquisition
• A frequency acquisition aid solves the small capture range problem of the data-recovery phase detector
→ The frequency acquisition circuit sets the proper oscillation frequency before phase locking starts
[Figure: dual-loop PLL. The phase detector (Data versus fCK, output VP, filter LPF1) and the frequency detector (CKref versus fCK, filter LPF2) both drive the VCO; a frequency monitor observes |fdata-fck| and selects between them with VQ:
VQ = 0 ; |fdata-fck| > ∆f
VQ = 1 ; |fdata-fck| < ∆f << fcapture]
Frequency Monitor
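[Figure: frequency monitor. An edge detector and D flip-flop operate on Vp to detect the beat frequency |fdata-fck|; a one-shot with RC reset produces VO and the indicator VQ]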
1:5 Demultiplexing Samplers and Equalizers
• 5 Samplers for the symbol centers and 5 samplers for transitions (x2 oversampling).
• Each equalizer uses the present and half a symbol earlier sample (half symbol spaced)
[Figure: the differential 4-PAM input is sampled on multi-phase clocks Ck0-Ck3 (samples S0-S3); each sample is scaled by the tap weight and subtracted from the next one through linear amplifiers and summers to give So0-So3, which feed the 2-bit ADCs.
1-tap Equalizer: So1 = S1 - α*S0]
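The 1-tap relation So1 = S1 - α*S0 can be applied to a stream of half-symbol-spaced samples as in this sketch. The sample values and α = 0.5 are hypothetical, and the first sample is dropped because it has no predecessor.

```python
def equalize_half_symbol(samples, alpha):
    """Apply So[k] = S[k] - alpha * S[k-1] to half-symbol-spaced samples.

    `samples` is the x2-oversampled stream (alternating edge and center
    samples). The output has one fewer element than the input because
    the first sample has no earlier neighbor.
    """
    return [samples[k] - alpha * samples[k - 1]
            for k in range(1, len(samples))]


# A slow rising edge that settles at 1.0: the sample taken during the
# transition is boosted relative to the settled level (1 - alpha).
slow_edge = [0.0, 0.5, 1.0, 1.0]
print(equalize_half_symbol(slow_edge, alpha=0.5))
```

In the example the output peaks during the transition and then drops to the settled value, which is the high-pass behavior that sharpens the edges.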
Half-symbol-Spaced 1-tap Equalizer
• Equalization function should be performed very fast
→ Subtraction is done by summing the currents, which are proportional to the sampled values with opposite polarity.
• Differential pairs should have a large linear range for proper operation ofanalog equalization.
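[Figure: Analog 1-tap equalizer. Two differential pairs (inputs Sp1/Sn1 and Sp0/Sn0) share the output nodes SOp1/SOn1; the tap weight (α) scales the current of the second pair, so that IO1 = I1 - α*I0 and SO1 = S1 - α*S0]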
Input Preamplifier
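• Short-channel MOS has a linear ID-VGS characteristic in the saturation region: ID = k.(VGS-Vt)
• “Vsrc” should be set such that the Vo-Vi characteristic is linear for all values of Vi.
[Figure: differential preamplifier with inputs Vip/Vin, outputs Vop/Von, and source node Vsrc; plots of Vo versus (Vi-Vsrc) and of ID versus VGS showing the linear region]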
Differential 4-PAM Level detection
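• Flash detection: Three comparators to detect the 4 levels.
• Differential signaling: Only one reference voltage is required
[Figure: data levels with decision thresholds at +Vref, 0, and -Vref; three comparators with outputs a2, a1, a0 compare the differential input (Vip, Vin) against Vref]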
Input 2-bit ADC
[Figure: Input 2-bit ADC. The preamplifier & equalizer output feeds three comparators (each a regenerative amplifier clocked by clk0 followed by an SR-latch clocked by clk1, with reference Vref), producing a2, a1, a0; Gray decoder logic maps a2, a1, a0 (and a2b) to the output bits A1 and A0]
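The flash detection and Gray decoding can be sketched in a few lines. The assignment of comparators to thresholds (a0 at -Vref, a1 at 0, a2 at +Vref) and the decode equations A1 = a1, A0 = a0 AND NOT a2 are assumptions chosen to match the Gray mapping shown earlier (10, 11, 01, 00 from top to bottom); the slides do not spell them out.

```python
def flash_2bit(v_diff, v_ref):
    """Detect a differential 4-PAM level and return Gray-coded bits (A1, A0).

    Three comparators produce a thermometer code:
      a0 = v_diff > -v_ref, a1 = v_diff > 0, a2 = v_diff > +v_ref
    Because the signal is differential, the single reference v_ref
    provides both the +v_ref and -v_ref thresholds.
    """
    a0 = v_diff > -v_ref
    a1 = v_diff > 0.0
    a2 = v_diff > v_ref
    big_a1 = a1                 # MSB: sign of the differential input
    big_a0 = a0 and not a2      # LSB: set for the two inner levels
    return int(big_a1), int(big_a0)


# Four nominal levels at -0.3, -0.1, +0.1, +0.3 V with v_ref = 0.2 V
# (hypothetical numbers for illustration).
for v in (-0.3, -0.1, 0.1, 0.3):
    print(v, flash_2bit(v, 0.2))
```

A decision error at any one threshold changes exactly one comparator output and therefore exactly one output bit, which is the property the Gray mapping is chosen for.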
Modeling The Cable
[Figure: modeling flow. Oscilloscope TDR (tr ~ 40 ps) measures the non-ideal pulse response => DSP => ideal impulse response => convolution with the transmitted symbols => real cable symbol response. Tools: SPICE (Xmitter), Matlab (Cable), Matlab (Equalizer)]
• SPICE models for skin effect are not ideal, need a better model:
→ Directly measure the cable impulse response (time domain)
→ Convolve it with the transmitted symbols
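The convolution step can be illustrated with a direct-form sketch. The impulse response values below are made up for the example and are not measured cable data; in the flow described above they would come from the time-domain measurement.

```python
def convolve(x, h):
    """Discrete linear convolution of sequence x with impulse response h."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical sampled impulse response with a decaying tail,
# and a short sequence of transmitted symbol amplitudes.
impulse_response = [0.5, 0.25, 0.125]
symbols = [1.0, 0.0, 0.0, 1.0]

received = convolve(symbols, impulse_response)
print(received)
```

Each transmitted symbol contributes a scaled, shifted copy of the impulse response, so overlapping tails in the result show directly how much ISI one symbol leaves on its neighbors.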
Simulated Eye Diagrams
[Figure: simulated eye diagrams, Amplitude (V) versus time (ns), with eye openings ∆τ and ∆v marked
(a) Eye diagram after cable (Without pre-emphasis)
(b) Eye diagram after cable (With pre-emphasis)
(c) Eye diagram after cable (With pre-emphasis/equalization)]
0.35-µm Transmitter Die Photo
[Die photo labels: Analog TX PLL, Ck Buf, PRBS Enc., SRAM, 4-PAM FIR Xmitter, Resync.]
Total die area: 2mm x 1.5mm
4-PAM FIR transmitter: 0.8mm x 0.3mm
Measured Eye Diagrams
[Figure: measured eye diagrams
(a) 10Gb/s eye diagram at source (No pre-emphasis)
(b) 10Gb/s eye diagram after the cable (With pre-emphasis)
(c) 8Gb/s eye diagram after the cable (With pre-emphasis)]
0.35-µm Transmitter Performance
Transmitter Data Rate               Eye Height   Eye Width
10Gb/s, 10 meter, W/ Pre-emphasis   200mV        90ps - 70ps
10Gb/s, 10 meter, No Pre-emphasis   0            0
8Gb/s, 10 meter, W/ Pre-emphasis    350mV        110ps - 90ps
8Gb/s, 10 meter, No Pre-emphasis    < 60mV       < 50ps

Transmitter Output Jitter:
Peak to peak    32ps
RMS             8ps

Power @ 10Gb/s (5Gsym/s):
Analog          0.7 watts
Output Driver   0.5 watts
Sync/Logic      0.3 watts
-------------------------
Total           1.5 watts
0.3-µm Full Transceiver Die Photo
[Die photo labels: 4-PAM FIR Xmitter, Resync, PRBS Enc., SRAM, TX PLL, RX PLL, Samplers & Phase detector, Resync, PRBS dec., Analog bypass cap, Digital bypass cap]
Total die area: 2mm x 2mm
Total receiver data recovery section: 0.85mm x 0.43mm
0.3-µm Transceiver Performance
Maximum link speed                    8Gbps @ 3V
Transmitter output jitter @ 8Gbps     11ps (peak-peak), 2ps (rms)
Receiver PLL jitter @ 8Gbps           28ps (peak-peak), 4ps (rms)
Receiver PLL dynamics                 BW > 30MHz, Ph.m. > 48°
Receiver PLL capture range            ~ 20MHz
Min. input swing to capture lock      ±400mV (diff.)
Min. input swing to maintain lock     ±300mV (diff.)

Power (8Gb/s, 3V):
Analog          750mW
4-PAM driver    220mW
Other           130mW
Total           1.1W
BER Measurements
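[Figure: BER test setup. On-chip transmitter: PRBS generator, encoder, driver. On-chip receiver: sampler, frame detect, PRBS decode. A clock generator supplies Ck1 and Ck2, and a frequency counter is attached to the receiver output]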
→ At 8Gbps: BER ~ 10^-7, Window = 50ps
(almost no improvement with input equalizer)
→ At 6Gbps: BER ~ 10^-14, Window = 150ps
(30ps improvement with input equalizer)
• Possible factors for the higher BER at high speeds:
- Line reflection: Bad on-chip terminations, package/bondwire
- EMI from neighboring high-speed bondwires
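To put the two BER figures in perspective, the expected error rate follows from bit rate times BER. This is a small arithmetic sketch with hypothetical function names, assuming errors are independent and the link runs continuously.

```python
def errors_per_second(bit_rate_bps, ber):
    """Expected number of bit errors per second."""
    return bit_rate_bps * ber


def mean_seconds_between_errors(bit_rate_bps, ber):
    """Average time between bit errors, in seconds."""
    return 1.0 / errors_per_second(bit_rate_bps, ber)


# 8 Gbps at BER ~ 1e-7: on the order of hundreds of errors per second.
print(errors_per_second(8e9, 1e-7))

# 6 Gbps at BER ~ 1e-14: hours between errors on average.
print(mean_seconds_between_errors(6e9, 1e-14) / 3600.0)
```

At 8 Gbps the link would see roughly 800 errors per second, while at 6 Gbps the mean time between errors is about 4.6 hours, so confirming a BER near 10^-14 requires a long measurement run.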
A solution for multi-Gbps transmission over bandwidth-limited cables in standard CMOS technology:
• Transmitter:
→ A high-speed 4-PAM DAC design to reduce the symbol rate to half (vs. 2-PAM).
→ An FIR preshaping filter that operates at multi-Gbps rates with very low complexity.
→ A control circuit to optimize the width of the transmitted symbols.
• Receiver:
→ An analog FIR equalizer effective up to multi-GHz ranges in CMOS technology.
→ A new proportional data-recovery phase detector for detecting 4-PAM serial data.
→ A new frequency-acquisition technique for data-recovery PLLs.
A 4-PAM transceiver capable of data transmission up to 8 Gbps over 10-m copper cable with BW ~ 1GHz in 0.3-µm CMOS technology.
Contributions
Acknowledgments
Future Work
• Use higher-level N-PAM modulation
→ Challenge: Very fast ADCs and DACs with higher resolution.
→ Explore general methods of N-PAM data recovery.
• Advanced communication methods for narrow-band channels:
→ Channel eigenfunction as transmission symbol.
→ Maximum likelihood (ML) detection
→ Multi-carrier techniques (e.g., DMT in ADSL)
→ Use coding methods to reduce BER.
Frequency Monitor
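[Figure: frequency monitor (detailed). Differential inputs in+/in- pass through a level converter (ECL to CMOS) and a delay to form a hysteresis edge detector; an edge detector and D flip-flop operate on Vp to detect |fdata-fck|; a one-shot with RC reset produces VO and VQ; device sizes W and 2W are marked]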
Analog Supply Drop
• The on-chip VCO speed was limited to 800MHz (8Gbps) due to analog supply drop
→ Analog supply traces have ~1.8Ω resistance in series.
- Only one pin for “Vdda” or “Gnda”
→ 250mA analog supply current at 8Gbps => ~0.45V drop on analog supply!
[Figure: chip supply layout showing Gnd, Vdd, Vdda, Gnda, and the analog section]