A CMOS Multi-Gb/s 4-PAM Serial Link Transceiver*

Ramin Farjad-Rad
Center for Integrated Systems, Stanford University, Stanford, CA 94305
March 11, 1999

*Funding from LSI Logic, SUN Microsystems, and Powell foundation

Goals

• Network high-speed (5-10 Gb/s) systems over ranges of up to 10 meters at lower cost and complexity
→ Parallel buses are costly over long distances.
→ Optical fibers offer little benefit over such short ranges.
→ Serial links on copper cables are an attractive solution for this kind of application.

• Push the bandwidth limits of CMOS serial links
→ CMOS technology is getting cheaper, faster, and more widely available.
→ More digital functions can be integrated on-chip.

[Figure: Link overview. A transmitter (Tx) and a receiver (Rx) with timing recovery, each with an RTERM termination, connected by a copper cable.]

Outline

Challenges

System Architecture

Circuit Implementation

Test Results

Conclusion

Challenges: Interconnection Bandwidth

• Frequency-dependent attenuation in electrical links is caused by skin-effect resistance and dielectric loss.
→ The -3 dB bandwidth of a 10-meter PE-142 coax cable is ~1.0 GHz.

[Figure: Frequency response of the cable, amplitude (V) vs. frequency (Hz), 10^8 to 10^10 Hz.]

Challenges: Interconnection Bandwidth

• Frequency-dependent attenuation causes intersymbol interference (ISI).
→ Only the channel's eigen-waveforms are free of ISI.
→ Generating and detecting true eigen-waveforms is not feasible because of circuit limitations at high frequencies, so trapezoidal pulses are used as basis waveforms instead.
→ A higher symbol rate results in more ISI.

[Figure: Ideal transmitted pulse and cable pulse response, amplitude (V) vs. time (ns), 0 to 1.6 ns; the spread-out tail of the cable response produces ISI on neighboring symbols.]

Challenges: Data Generation/Detection

• CMOS circuits are hard to operate directly at multi-GHz speeds
→ It is better to reduce the on-chip frequency.

• Recovering the timing information embedded in the serial data stream
→ On-chip oscillators have large frequency variations, and phase detectors have a small frequency capture range.

• Data detection at high speeds
→ Input voltage offset, crosstalk, and signal reflections (reflection ISI) limit the minimum detectable signal.

Reduction of On-Chip Frequency

• Multiplexing (N:1) and demultiplexing (1:N) the high-speed data at the transmission line* reduces the on-chip frequency by a factor of N
For example: N = 5 at 5 Gsym/s gives fck = 5 GHz / 5 = 1 GHz.
→ The limit then becomes the maximum switching speed of the process, and CMOS provides high-speed transistor switches.

[Figure: N:1 multiplexer of transmitter drivers clocked by phases ckt0-ckt4 and 1:N demultiplexer of receiver detectors clocked by phases ckr0-ckr4; each side runs at fck while the transmission line carries the high-speed data at N×fck.]

*C-K. Yang, R. Farjad, M. Horowitz, VLSI Symp. 97
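The factor-of-N saving is plain arithmetic. The sketch below (function and parameter names are hypothetical) also folds in the 2 bits per symbol of the 4-PAM signaling used in this link, so it reproduces the 1 GHz example and shows how the on-chip clock scales with the bit rate.

```python
def link_clocking(bit_rate_bps: float, bits_per_symbol: int, mux_ratio: int):
    """Return (symbol rate, on-chip clock frequency, symbol period) for an N:1 multiplexed link."""
    symbol_rate = bit_rate_bps / bits_per_symbol   # 4-PAM carries 2 bits per symbol
    clock_freq = symbol_rate / mux_ratio           # each of the N clock phases owns one symbol slot
    symbol_period = 1.0 / symbol_rate
    return symbol_rate, clock_freq, symbol_period


if __name__ == "__main__":
    for bit_rate in (10e9, 8e9):
        sym_rate, fck, ts = link_clocking(bit_rate, bits_per_symbol=2, mux_ratio=5)
        print(f"{bit_rate / 1e9:.0f} Gb/s -> {sym_rate / 1e9:.1f} Gsym/s, "
              f"fck = {fck / 1e6:.0f} MHz, Ts = {ts * 1e12:.0f} ps")
```

For 8 Gb/s the same arithmetic gives 4 Gsym/s and an 800 MHz clock, consistent with the 800 MHz VCO limit reported for the 8 Gb/s transceiver.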

Proposed Modulation

• 4-PAM is used for data communication in the serial link
→ The symbol rate is half that of binary transmission.
→ The lower symbol rate reduces ISI and the on-chip clock frequency.
→ Higher-level PAM was not used because of the limited transmitter swing, the minimum detectable signal, and reflection ISI.

• 4Sym→5Sym conversion guarantees clock recovery.

• Gray-code mapping of the levels reduces BER by 25% vs. linear mapping.

[Figure: Level mappings from the highest to the lowest level. Linear: 11, 10, 01, 00. Gray: 10, 11, 01, 00. With Gray coding, an error into an adjacent level causes only 1 bit error; with linear mapping, an error across the middle threshold (10 ↔ 01) causes 2 bit errors.]
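The 25% figure follows from counting bit errors per threshold crossing. The sketch assumes equiprobable, equally spaced levels and high SNR, so every symbol error lands on an adjacent level and each of the three thresholds is crossed equally often; the mapping tables are read bottom-to-top from the figure, and the helper names are hypothetical.

```python
# Bit labels of the four levels, listed from the lowest to the highest level
# (the slide lists them top to bottom, i.e. highest first).
LINEAR = ["00", "01", "10", "11"]
GRAY = ["00", "01", "11", "10"]


def bit_errors(a: str, b: str) -> int:
    """Number of bit positions in which two labels differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_bit_errors_per_symbol_error(levels) -> float:
    """Average bit errors when a symbol is detected as an adjacent level,
    assuming each decision threshold is crossed equally often."""
    per_threshold = [bit_errors(lo, hi) for lo, hi in zip(levels, levels[1:])]
    return sum(per_threshold) / len(per_threshold)


if __name__ == "__main__":
    lin = mean_bit_errors_per_symbol_error(LINEAR)   # (1 + 2 + 1) / 3
    gray = mean_bit_errors_per_symbol_error(GRAY)    # (1 + 1 + 1) / 3
    print(f"linear {lin:.3f}, Gray {gray:.3f}, BER reduction {1 - gray / lin:.0%}")
```

Linear mapping averages 4/3 bit errors per symbol error and Gray mapping exactly 1, which is the 25% reduction quoted above.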

Proposed Architecture to Combat ISI

• To cancel the long tail of the pulse response, a symbol-spaced 2-tap pre-emphasis FIR filter is implemented at the transmitter:

$$V_o(n) = V_i(n) - a\,V_i(n-1) - b\,V_i(n-2)$$

• To sharpen the signal transition edges, a half-symbol-spaced 1-tap high-pass equalizer is implemented at the receiver:

$$V_{eq}(n) = V_i(n) - \alpha\,V_i\!\left(n - \tfrac{1}{2}\right)$$

[Figure: Pulse response without filtering, pre-emphasized (shaped) transmitted wave with tap weights a and b, received signal, and equalized signal; amplitude (V) vs. time (ns), 0.8 to 2 ns.]
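A minimal sketch of the transmit FIR: the pre-emphasis equation is applied to an isolated symbol, and the result is passed through a hypothetical symbol-spaced pulse response with a geometric tail (illustrative numbers, not measured cable data). With a = 0.5 and b = 0 the first tap cancels the tail except for a small truncation residue; the second tap b would help when the tail does not decay geometrically. Function names are hypothetical.

```python
def preemphasis(symbols, a, b):
    """Transmit pre-emphasis: Vo(n) = Vi(n) - a*Vi(n-1) - b*Vi(n-2).
    Symbols before the start of the sequence are taken as 0."""
    out = []
    for n, v in enumerate(symbols):
        prev1 = symbols[n - 1] if n >= 1 else 0.0
        prev2 = symbols[n - 2] if n >= 2 else 0.0
        out.append(v - a * prev1 - b * prev2)
    return out


def convolve(x, h):
    """Direct discrete convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


if __name__ == "__main__":
    # Hypothetical symbol-spaced channel pulse response with a slowly decaying tail.
    pulse = [0.6, 0.3, 0.15, 0.075]
    # Shaping an isolated symbol (a unit impulse) gives the effective pulse response.
    shaped = convolve(preemphasis([1.0, 0.0, 0.0], a=0.5, b=0.0), pulse)
    print("without pre-emphasis:", pulse)
    print("with pre-emphasis:   ", [round(v, 4) for v in shaped])
```

A real transmitter has a limited output swing, so the pre-emphasized waveform must also be scaled to fit the driver range; the sketch ignores that constraint.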

Receiver Equalizer

• The half-symbol-spaced filter boosts frequency components up to $f = 1/T_s$; for R = 5 Gsym/s, Ts = 200 ps and fmax = 5 GHz.

• Sharpening the transitions increases the width of the eye opening, making the receiver less sensitive to sampling phase errors.

[Figure: Slow vs. sharp transitions, and the equalizer frequency response from -fmax to fmax.]
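Taking the half-symbol delay literally, the receive equalizer has the transfer function $H(f) = 1 - \alpha e^{-j\pi f T_s}$, whose magnitude $\sqrt{1 + \alpha^2 - 2\alpha\cos(\pi f T_s)}$ rises from $1-\alpha$ at DC to $1+\alpha$ at $f = 1/T_s$. The sketch evaluates it for Ts = 200 ps; the tap weight α = 0.3 and the function name are hypothetical, not the chip's settings.

```python
import cmath
import math


def half_symbol_eq_gain(f_hz, ts_s, alpha):
    """|H(f)| for Veq(n) = Vi(n) - alpha * Vi(n - 1/2), i.e. a Ts/2 delay:
    H(f) = 1 - alpha * exp(-j * 2*pi * f * Ts/2)."""
    return abs(1 - alpha * cmath.exp(-1j * 2 * math.pi * f_hz * ts_s / 2))


if __name__ == "__main__":
    ts = 200e-12          # 5 Gsym/s
    alpha = 0.3           # hypothetical tap weight
    for f_ghz in (0.0, 1.0, 2.5, 5.0):
        gain = half_symbol_eq_gain(f_ghz * 1e9, ts, alpha)
        print(f"{f_ghz:3.1f} GHz: |H| = {gain:.3f} ({20 * math.log10(gain):+.2f} dB)")
```

The gain ratio between 1/Ts and DC, (1 + α)/(1 - α), is the high-frequency boost that sharpens the transitions.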

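Top Level Architecture

[Figure: Top-level block diagram. Transmitter: an analog TX PLL clocks a 5:1 multiplexer/serializer with 2-bit DACs and FIR filter, fed by 10 parallel bits. Receiver: an analog RX VCO clocks a 1:5 demultiplexer of samplers and equalizers followed by a 2-bit ADC bank, delivering 10 parallel bits; a phase/frequency detector and loop filter generate the VCO control voltage Vctl. The serial link carries 4-PAM symbols.]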

5:1 Multiplexing Transmitter

• Each symbol is generated by the rising and falling edges of two clock phases.

[Figure: A ring oscillator supplies five clock phases ck0-ck4 to five driver slices (×5) that share the RTERM-terminated differential outputs Out/Outb. Each of Sym0-Sym4 occupies one symbol time Ts; the 2-bit data D[0:1] for Sym0 is gated onto the line by CLK0 and CLK1.]

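2-bit Output Driver

[Figure: 2-bit DAC module built from a ×2 differential drive leg and a ×1 differential drive leg, driven by the data bits D1 and D0 and their complements. In each leg the data transistor is stacked with clock transistors Clk0 and Clk1 on a tail current source, steering current between Dout and its complement.]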

4-PAM Preshaping 5:1 Multiplexer

• Each driver generates a filtered symbol independently of the other drivers.

• The architecture makes the filter simple to implement.

[Figure: Multiplexer drivers with 2-tap filters. Each driver contains a main 2-bit DAC and two tap DACs (T1, T2) fed with the data delayed by one and two symbols, clocked by successive phases (clk0, clk1, clk2 for one driver; clk1, clk2, clk3 for the next). The tap DACs drive with opposite polarity to the main DAC (+ - -), their weights a and b are set by external sources, and the same structure is repeated for the three other drivers.]

Symbol Generation

[Figure: Tap-driver timing. The main stream (Sym0-Sym4, each Ts wide, with D[0:1] valid for Sym0 between two clock phases) and the tap-1 and tap-2 streams (S0,T1 ... and S0,T2 ..., generated from versions of D[0:1] delayed by Ts and 2Ts) are summed at the output.]

Symbol-Width Problem

• Variations in the PMOS-to-NMOS strength ratio cause duty-cycle error in the clocks (unbalanced rise and fall times)
→ The effective width of the final output symbol decreases.

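Pulse-Width Control Loop

[Figure: A dummy driver, fed by the same clock buffers (Ck1,2) and data as driver 1, generates pulses whose outputs Va and Vb are compared; the resulting control voltage Ctl adjusts the clock buffers of the dummy and the four other drivers. Waveforms of Va and Vb are shown for wide and narrow pulses relative to the symbol time Ts.]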

Receiver Timing Recovery

• Oversampling phase detection
→ Requires many input samplers, suffers from phase quantization error, and needs complex logic.

• Tracking phase detection
→ Conventional bang-bang control: low loop bandwidth and small capture range.
→ Proportional control: desirable.

[Figure: Data-recovery PLL. A preamplifier and receiver sample the input Din with samp_ck to produce Dout; a phase detector/filter drives the control input (ctrl) of the VCO/clock generator, which places the sample clock in the data eye to maximize the timing margin.]

Timing Recovery: Front-end

[Figure: Simplified receiver front-end with ×2 oversampling. Multi-phase clocks Ck0-Ck3 sample the differential 4-PAM input; the symbol-center samples (So0, So2) go to 2-bit ADCs that output the data bits d0,1 and d2,3, while the transition samples (So1, So3) pass through linear amplifiers to give the edge samples Se1 and Se3.]

Proposed Proportional Phase Detection

• The edge sample is proportional to the phase error: $S_e = k\,\Delta\phi$. When the clock lags, +Se > 0 for a 0→1 transition and -Se > 0 for a 1→0 transition, so the clock is sped up.

• Advantages:
→ Larger PLL bandwidth and better stability than bang-bang PLLs.
→ Zero systematic phase offset (the same detection mechanism is used for edges and data).
→ Zero ripple on the PLL control voltage (unlike bang-bang).

• Disadvantage:
→ Voltage offsets of the edge samplers translate into phase error.
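The ripple argument can be seen in a toy first-order phase-tracking loop (no noise, no VCO integration, phase error in UI; names and gains are hypothetical). A proportional detector lets the error decay geometrically to zero, while a bang-bang detector, which only reports the sign of the error, keeps toggling the correction and settles into a limit cycle whose peak-to-peak size equals its step.

```python
def track_phase(initial_error, gain, steps, proportional):
    """Toy first-order phase-tracking loop: each update subtracts gain * PD(error).

    PD is the error itself for a proportional detector, or only its sign
    for a bang-bang detector. Returns the phase error after every update."""
    err = initial_error
    history = []
    for _ in range(steps):
        if proportional:
            pd = err                      # output scales with the phase error
        else:
            pd = (err > 0) - (err < 0)    # bang-bang: -1, 0 or +1
        err -= gain * pd
        history.append(err)
    return history


if __name__ == "__main__":
    prop = track_phase(0.23, gain=0.5, steps=60, proportional=True)
    bang = track_phase(0.23, gain=0.05, steps=60, proportional=False)
    print("proportional, last 4:", [f"{e:+.2e}" for e in prop[-4:]])
    print("bang-bang,    last 4:", [f"{e:+.3f}" for e in bang[-4:]])
```

In a real PLL the bang-bang toggling appears as ripple on the control voltage, which is the drawback the slide contrasts with proportional detection.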

Three 4-PAM Transitions

• Only type-1 transitions are used for clock recovery.

• Type-2 and type-3 transitions are ignored by the decision logic.

[Figure: Differential 4-PAM input with the sampling edges in lock, showing the three transition types: (1) right crossing, (2) misplaced crossing, and (3) no crossing.]

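Data Phase Detector

[Figure: Each edge sample (Se1, Se3, ..., Se9) passes through a polarity switch set by decision logic from the adjacent data decisions (d0,1 and d2,3 for Se1; d2,3 and d4,5 for Se3; d8,9 and d0,1 for Se9). The switched samples are summed into VP, which drives a charge pump and loop filter controlling the analog RX VCO; the VCO clocks go to the input samplers.]

• The edge sample values Se of type-1 transitions are summed with the correct polarity at the phase detector output (VP).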
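The decision logic and polarity switching can be sketched in a few lines. The sketch assumes normalized levels ±1 and ±3 and a linear transition between symbols (both hypothetical simplifications); a type-1 transition is one whose end levels are symmetric about zero, so its crossing sits on the nominal edge-sampling instant. Function names are hypothetical.

```python
def transition_type(prev, nxt):
    """1 = right crossing (symmetric sign change, e.g. +3 -> -3 or -1 -> +1),
    2 = misplaced crossing (asymmetric sign change, e.g. +3 -> -1),
    3 = no crossing (same sign)."""
    if (prev > 0) == (nxt > 0):
        return 3
    return 1 if prev == -nxt else 2


def edge_sample(prev, nxt, dphi):
    """Toy model: linear transition over one symbol time, sampled dphi UI after
    the nominal mid-point (dphi > 0 means the clock lags)."""
    return (prev + nxt) / 2 + (nxt - prev) * dphi


def phase_detector_output(symbols, edge_samples):
    """Sum type-1 edge samples with the transition polarity: +Se for rising,
    -Se for falling. A positive result means the clock lags (speed up the VCO)."""
    vp = 0.0
    for prev, nxt, se in zip(symbols, symbols[1:], edge_samples):
        if transition_type(prev, nxt) != 1:
            continue                      # decision logic ignores type-2 and type-3
        vp += se if nxt > prev else -se
    return vp


if __name__ == "__main__":
    symbols = [3, -3, -1, 1, 3, 1, -1, -3, 3]
    for dphi in (-0.05, 0.0, 0.05, 0.10):
        se = [edge_sample(p, n, dphi) for p, n in zip(symbols, symbols[1:])]
        print(f"dphi = {dphi:+.2f} UI -> VP = {phase_detector_output(symbols, se):+.2f}")
```

A type-2 transition such as +3 → -1 would give an edge sample of +1 even at zero phase error, so summing it would bias the loop; discarding it, as the decision logic does, avoids that systematic offset.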

Frequency Acquisition

• A frequency-acquisition aid solves the small-capture-range problem of the data-recovery phase detector
→ The frequency-acquisition circuit sets the proper oscillation frequency before phase locking starts.

[Figure: The VCO is steered either by a frequency detector comparing fCK with a reference clock CKref or by the data phase detector output VP, through loop filters LPF1 and LPF2. A frequency monitor observes |fdata - fck| and produces VQ, which selects the active loop.]

$$V_Q = \begin{cases} 0, & |f_{data} - f_{ck}| > \Delta f \\ 1, & |f_{data} - f_{ck}| < \Delta f \ll f_{capture} \end{cases}$$
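The hand-over between the two loops can be pictured with a toy model. Only the VQ condition comes from the slide; the first-order frequency-loop update, its gain, the step limit, and all frequencies are hypothetical, and the phase-locking stage after hand-over is not modeled.

```python
def frequency_monitor(f_data, f_ck, delta_f):
    """VQ = 1 when |f_data - f_ck| < delta_f, otherwise 0."""
    return 1 if abs(f_data - f_ck) < delta_f else 0


def acquire(f_data, f_ref, f_vco, delta_f, fd_gain=0.5, max_steps=200):
    """Toy acquisition sequence (frequencies in Hz).

    While VQ = 0 a frequency loop pulls the VCO toward the reference clock CKref
    (first-order update with gain fd_gain). Once VQ = 1, control would pass to the
    data phase detector, which this sketch does not model.
    Returns (number of frequency-loop updates, VCO frequency at hand-over)."""
    for step in range(max_steps):
        if frequency_monitor(f_data, f_vco, delta_f):
            return step, f_vco
        f_vco += fd_gain * (f_ref - f_vco)
    raise RuntimeError("VCO never came within delta_f of the data rate")


if __name__ == "__main__":
    # Hypothetical numbers: 800 MHz data-related clock, reference 100 kHz off,
    # VCO starting at 700 MHz, hand-over window 2 MHz (well inside a ~20 MHz capture range).
    steps, f_final = acquire(f_data=800e6, f_ref=800.1e6, f_vco=700e6, delta_f=2e6)
    print(f"hand-over to phase locking after {steps} updates at {f_final / 1e6:.3f} MHz")
```

Choosing Δf much smaller than the capture range is what guarantees the phase detector can finish the job once VQ switches to 1; if the reference were farther from the data rate than Δf, the hand-over would never happen.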

Frequency Monitor

[Figure: Frequency monitor circuit: an edge detector on the beat signal |fdata - fck|, a D flip-flop producing VQ, and a one-shot with reset, driven by Vp.]

1:5 Demultiplexing Samplers and Equalizers

• Five samplers for the symbol centers and five samplers for the transitions (×2 oversampling).

• Each equalizer uses the present sample and the sample taken half a symbol earlier (half-symbol spaced):

$$S_{o1} = S_1 - \alpha\,S_0$$

[Figure: Multi-phase clocks Ck0-Ck3 sample the 4-PAM input into S0-S3; at each summing node the tap-weighted (α) earlier sample is subtracted before the linear amplifiers and 2-bit ADCs.]
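A sketch of the equalizer on the ×2-oversampled stream, applying $S_{o,k} = S_k - \alpha S_{k-1}$ to every sample so each output uses the sample taken half a symbol earlier. Treating even-indexed outputs as symbol centers (to the ADCs) and odd-indexed outputs as edge samples (to the phase detector) is an assumption based on the sampler labels, as are the function names and sample values.

```python
def equalize_oversampled(samples, alpha, prev_sample=0.0):
    """Half-symbol-spaced 1-tap equalizer: So[k] = S[k] - alpha * S[k-1].

    `samples` are taken every half symbol (x2 oversampling); `prev_sample` is
    the raw sample that preceded samples[0] (e.g. the last one of the previous frame)."""
    out = []
    earlier = prev_sample
    for s in samples:
        out.append(s - alpha * earlier)   # subtract the tap-weighted half-symbol-earlier sample
        earlier = s                       # the raw (unequalized) sample feeds the next tap
    return out


def split_centers_and_edges(equalized):
    """Assumed ordering: even indices are symbol centers (to the 2-bit ADCs),
    odd indices are transition samples Se (to the phase detector)."""
    return equalized[0::2], equalized[1::2]


if __name__ == "__main__":
    alpha = 0.25                                        # hypothetical tap weight
    frame1 = [1.0, 2.0, 3.0, 0.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0]
    frame2 = [3.0, 0.0, -3.0, -2.0, -1.0, 0.0, 1.0, 0.0, -1.0, -2.0]
    eq1 = equalize_oversampled(frame1, alpha)
    eq2 = equalize_oversampled(frame2, alpha, prev_sample=frame1[-1])
    for name, eq in (("frame 1", eq1), ("frame 2", eq2)):
        centers, edges = split_centers_and_edges(eq)
        print(name, "centers:", centers, "edges:", edges)
```

Across 10-sample frames (five center and five transition samplers), passing the last raw sample of the previous frame as `prev_sample` lets the first equalizer in each frame still see its half-symbol-earlier neighbor.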

Half-Symbol-Spaced 1-Tap Equalizer

• The equalization function must be performed very fast
→ The subtraction is done by summing currents that are proportional to the sampled values, with opposite polarity.

• The differential pairs need a large linear range for the analog equalization to work properly.

[Figure: Analog 1-tap equalizer. Differential pairs driven by the samples Sp1/Sn1 and Sp0/Sn0 produce currents I1 and -α·I0 (α set by the tap weight); summing them at the outputs SOp1/SOn1 gives $I_{O1} = I_1 - \alpha I_0$, i.e. $S_{O1} = S_1 - \alpha S_0$.]

Input Preamplifier

• A short-channel MOS transistor has a linear ID-VGS characteristic in the saturation region: $I_D = k\,(V_{GS} - V_t)$.

• Vsrc should be set so that Vo - Vi is linear for all values of Vi.

[Figure: Differential preamplifier with inputs Vip/Vin, outputs Vop/Von, and source node Vsrc, with the transfer from (Vi - Vsrc) to Vo; ID vs. VGS plot showing the linear region.]

Differential 4-PAM Level Detection

• Flash detection: three comparators detect the four levels.

• Differential signaling: only one reference voltage is required.

[Figure: The four data levels relative to the thresholds -Vref, 0, and +Vref; comparators a2, a1, and a0 compare the differential input Vip/Vin with these thresholds.]

Input 2-bit ADC

[Figure: Each of the three comparators (a2, a1, a0) consists of a preamplifier (with reference Vref and source node Vsrc), a regenerative amplifier, and an SR latch, clocked by clk0 and clk1. Gray decode logic (drawn together with the equalizer) produces A1 from a1, and A0 from a0 and a2b (the complement of a2).]
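The flash detector and Gray decoder reduce to three comparisons and two gates. In the sketch, which comparator is called a2, a1, or a0, the normalized levels ±1 and ±3, Vref = 2, and the function names are assumptions; with a2 at +Vref, a1 at 0 and a0 at -Vref, the decode A1 = a1, A0 = a0 AND a2b reproduces the Gray mapping 00, 01, 11, 10 from the lowest to the highest level.

```python
def flash_compare(v_diff, v_ref):
    """Thermometer code from three comparators on the differential input:
    a2 compares with +Vref, a1 with 0, a0 with -Vref (assumed assignment)."""
    a2 = int(v_diff > v_ref)
    a1 = int(v_diff > 0.0)
    a0 = int(v_diff > -v_ref)
    return a2, a1, a0


def gray_decode(a2, a1, a0):
    """Gray-coded output bits: A1 = a1, A0 = a0 AND a2b (a2b = NOT a2)."""
    a2b = 1 - a2
    return a1, a0 & a2b


if __name__ == "__main__":
    v_ref = 2.0                       # thresholds at -2, 0, +2 for levels -3, -1, +1, +3
    for level in (-3.0, -1.0, 1.0, 3.0):
        A1, A0 = gray_decode(*flash_compare(level, v_ref))
        print(f"level {level:+.0f} -> {A1}{A0}")
```

The printed codes run 00, 01, 11, 10 from the lowest to the highest level, matching the Gray column of the modulation slide.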

Modeling the Cable

[Figure: Modeling flow. A TDR oscilloscope (tr ~ 40 ps) measures the non-ideal pulse response; DSP extracts the ideal impulse response; convolving it with the transmitter symbol (SPICE) gives the real cable symbol response (Matlab), which feeds the equalizer model (Matlab).]

• SPICE models of the skin effect are not accurate enough, so a better model is needed:
→ Directly measure the cable impulse response (time domain).
→ Convolve it with the transmitted symbols.
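The measure-and-convolve flow can be sketched as a discrete approximation of $y(t) = \int x(\tau)\,h(t-\tau)\,d\tau$. A single-pole impulse response stands in for the measured, DSP-corrected TDR data, and the time step, pulse shape, and function names are hypothetical; in the actual flow h would come from the measurement and the transmitted symbol from SPICE.

```python
import math


def convolve(x, h):
    """Direct discrete convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cable_output(tx_samples, impulse_response, dt):
    """Approximate y(t) = integral of x(tau) * h(t - tau) dtau on a uniform grid of step dt."""
    return [v * dt for v in convolve(tx_samples, impulse_response)]


if __name__ == "__main__":
    dt = 10e-12                                   # 10 ps grid (assumed)
    tau = 160e-12                                 # hypothetical single-pole stand-in, not a cable model
    h = [math.exp(-k * dt / tau) / tau for k in range(200)]   # impulse response samples, units 1/s
    # One trapezoidal symbol: 40 ps edges, 1 V peak, 200 ps wide at half amplitude.
    tx = [min(1.0, k / 4, (24 - k) / 4) for k in range(25)]
    rx = cable_output(tx, h, dt)
    print(f"peak in: {max(tx):.2f} V, peak out: {max(rx):.2f} V, "
          f"output length: {len(rx) * dt * 1e9:.2f} ns")
```

Because convolution is linear, the response to a full symbol sequence is the sum of shifted copies of this single-symbol response, which is what the eye-diagram simulations build on.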

Simulated Eye Diagrams

[Figure: Simulated eye diagrams after the cable, amplitude (V) vs. time (ns), 0 to 0.4 ns: (a) without pre-emphasis; (b) with pre-emphasis; (c) with pre-emphasis/equalization. The eye width Δτ and eye height Δv are marked in (b) and (c).]

0.35-µm Transmitter Die Photo

[Figure: Die photo showing the analog TX PLL, clock buffers, PRBS encoder, SRAM, resynchronization logic, and 4-PAM FIR transmitter.]

Total die area: 2 mm × 1.5 mm

4-PAM FIR transmitter: 0.8 mm × 0.3 mm

Measured Eye Diagrams

[Figure: (a) 10 Gb/s eye diagram at the source (no pre-emphasis); (b) 10 Gb/s eye diagram after the cable (with pre-emphasis); (c) 8 Gb/s eye diagram after the cable (with pre-emphasis).]

0.35-µm Transmitter Performance

| Data rate (10-meter cable) | Pre-emphasis | Eye height | Eye width |
|---|---|---|---|
| 10 Gb/s | With | 200 mV | 70-90 ps |
| 10 Gb/s | None | 0 | 0 |
| 8 Gb/s | With | 350 mV | 90-110 ps |
| 8 Gb/s | None | < 60 mV | < 50 ps |

Transmitter output jitter: 32 ps peak-to-peak, 8 ps RMS

Power at 10 Gb/s (5 Gsym/s): analog 0.7 W, output driver 0.5 W, sync/logic 0.3 W; total 1.5 W

0.3-µm Full Transceiver Die Photo

[Figure: Die photo showing the 4-PAM FIR transmitter with resync and PRBS encoder, SRAM, TX PLL, RX PLL, samplers and resync, PRBS decoder, phase detector, and analog and digital bypass capacitors.]

Total die area: 2 mm × 2 mm

Receiver data-recovery section: 0.85 mm × 0.43 mm

0.3-µm Transceiver Performance

| Parameter | Value |
|---|---|
| Maximum link speed | 8 Gb/s at 3 V |
| Transmitter output jitter at 8 Gb/s | 11 ps peak-to-peak, 2 ps RMS |
| Receiver PLL jitter at 8 Gb/s | 28 ps peak-to-peak, 4 ps RMS |
| Receiver PLL dynamics | BW > 30 MHz, phase margin > 48° |
| Receiver PLL capture range | ~20 MHz |
| Min. input swing to capture lock | ±400 mV (differential) |
| Min. input swing to maintain lock | ±300 mV (differential) |
| Power (8 Gb/s, 3 V) | analog 750 mW, 4-PAM driver 220 mW, other 130 mW; total 1.1 W |

BER Measurements

[Figure: BER test setup. On the transmitter chip, a PRBS generator and encoder feed the driver; on the receiver chip, the sampler output passes through frame detection and decoding to a PRBS checker. A clock generator supplies Ck1 and Ck2, and a frequency counter is part of the setup.]

→ At 8 Gb/s: BER ~10^-7, window = 50 ps (almost no improvement with the input equalizer)

→ At 6 Gb/s: BER ~10^-14, window = 150 ps (30 ps improvement with the input equalizer)

• Possible causes of the higher BER at high speeds:
- Line reflections: poor on-chip terminations, package/bondwires
- EMI from neighboring high-speed bondwires
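The slide does not say which PRBS pattern was used; the sketch assumes PRBS-7 (polynomial x^7 + x^6 + 1) and a checker that is already aligned with the received stream, and all function names are hypothetical. The last helper converts a BER into a mean time between errors.

```python
def prbs7(seed: int = 0x7F):
    """Generator for the PRBS-7 sequence (x^7 + x^6 + 1, period 127), one bit per step."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # feedback taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_errors(received_bits, seed: int = 0x7F) -> int:
    """Compare received bits with a locally generated, already aligned PRBS-7 reference."""
    ref = prbs7(seed)
    return sum(b != next(ref) for b in received_bits)


def mean_time_between_errors(bit_rate_bps: float, ber: float) -> float:
    """Average seconds between bit errors: 1 / (bit rate * BER)."""
    return 1.0 / (bit_rate_bps * ber)


if __name__ == "__main__":
    gen = prbs7()
    tx = [next(gen) for _ in range(1000)]
    rx = list(tx)
    for i in (100, 500, 900):                          # inject three bit errors
        rx[i] ^= 1
    errors = count_errors(rx)
    print(f"{errors} errors in {len(rx)} bits, BER = {errors / len(rx):.1e}")
    for rate, ber in ((8e9, 1e-7), (6e9, 1e-14)):
        print(f"{rate / 1e9:.0f} Gb/s, BER {ber:.0e}: mean time between errors "
              f"{mean_time_between_errors(rate, ber):.3g} s")
```

At 8 Gb/s and BER ~10^-7 an error occurs about every 1.25 ms on average, whereas at 6 Gb/s and BER ~10^-14 the mean spacing is about 1.7 × 10^4 s (roughly 4.6 hours), so confirming very low BERs takes long error-free runs.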

Contributions

A solution for multi-Gb/s transmission over bandwidth-limited cables in standard CMOS technology:

• Transmitter:
→ A high-speed 4-PAM DAC design that halves the symbol rate (vs. 2-PAM).
→ An FIR pre-shaping filter that operates at multi-Gb/s rates with very low complexity.
→ A control circuit that optimizes the width of the transmitted symbols.

• Receiver:
→ An analog FIR equalizer effective up to multi-GHz frequencies in CMOS technology.
→ A new proportional data-recovery phase detector for 4-PAM serial data.
→ A new frequency-acquisition technique for data-recovery PLLs.

A 4-PAM transceiver capable of transmitting data at up to 8 Gb/s over a 10-m copper cable with BW ~1 GHz in 0.3-µm CMOS technology.

Acknowledgments

Future Work

• Use higher-level N-PAM modulation
→ Challenge: very fast ADCs and DACs with higher resolution.
→ Explore general methods of N-PAM data recovery.

• Advanced communication methods for narrow-band channels:
→ Channel eigenfunctions as transmission symbols.
→ Maximum-likelihood (ML) detection.
→ Multi-carrier techniques (e.g., DMT in ADSL).
→ Coding methods to reduce BER.

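Frequency Monitor

[Figure: Detailed frequency monitor. The differential beat input (in+/in-) passes through a hysteresis edge detector built around a delay and an ECL-to-CMOS level converter, then drives the edge detector, the D flip-flop (output VQ), and the one-shot with reset; device sizes W and 2W are marked.]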

Analog Supply Drop

• The speed of the on-chip VCOs was limited to 800 MHz (8 Gb/s) by the drop on the analog supply
→ The analog supply traces have ~1.8 Ω of series resistance, with only one pin each for Vdda and Gnda.
→ 250 mA of analog supply current at 8 Gb/s produces a ~0.45 V drop on the analog supply!

[Figure: Separate analog (Vdda/Gnda) and digital (Vdd/Gnd) supplies.]