
Ultra Low Power 10- to 25-Gb/s CMOS-Driven VCSEL Links

Jonathan E. Proesel, Clint L. Schow, and Alexander V. Rylyakov IBM T. J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598

[email protected]

Abstract: Two optical links sharing a TX but using different RXs explore the speed/power/area tradeoff. Circuits are 90nm CMOS. Record full-link power efficiencies of 1.37pJ/bit at 15Gb/s and 3.6pJ/bit at 25Gb/s are achieved.

OCIS codes: (060.2360) Fiber optics links and subsystems; (060.2380) Fiber optics sources and detectors; (200.4650) Optical interconnects.

1. Introduction

Power-efficient optical links are required for high-bandwidth communications in future high-performance computing systems. Hundreds of Gb/s to Tb/s must be supplied between microprocessor cores and between cores and memory. To meet tightly constrained power budgets, the power efficiency of links in exascale systems must be on the order of 2pJ/bit [1]. Previous work has demonstrated 15Gb/s VCSEL-based optical links with 4.6pJ/bit power efficiency [2]. A 10Gb/s silicon photonics transceiver was reported [3] with less than 0.6pJ/bit power efficiency, but this number does not include the external laser power or the power used for thermal tuning of the ring resonators.

This paper describes two VCSEL-based multimode (MM) fiber optical links achieving sub-2pJ/bit power efficiency from 15Gb/s to 22Gb/s. The TX and RX circuitry are realized in 90nm CMOS. The links share the same TX but use two different RX designs that explore different aspects of the speed/power/area tradeoff.

2. Link Description

A block diagram of the optical links is shown in Fig. 1a. The TX consists of a laser diode driver (LDD) chip wirebonded to a 5µm diameter VCSEL [4] (Fig. 1b). The LDD consists of a 2-stage Cherry-Hooper (CH) preamplifier (PA) and main driver (DRV). The DRV is an inductively-peaked differential amplifier with one output used to modulate the VCSEL and the other tied to an on-chip resistor of similar series resistance to the VCSEL. The LDD occupies 80µm×170µm and is powered by VDD_PA = 1.6V nominal and VDD_DRV = 1V nominal.

Fig. 1. a) Block diagram of the optical links. b) Photograph of the wirebonded TX test site.

The RX consists of a 25µm diameter photodiode (PD) [4] wirebonded to one of two RX chips. The two RX designs are called the T-coil RX and the CMOS inverter RX. The T-coil RX targets high speed and low power at the expense of area. The CMOS inverter RX targets low power and small area for high integration density at the expense of speed. In addition, the CMOS inverter RX is self-biasing and scales into nanometer digital CMOS nodes.

The T-coil RX block diagram is shown in Fig. 2a. The transimpedance amplifier (TIA) is realized as a pair of CMOS inverters with resistive feedback, one active and one a replica. A differential pair steers current into the TIA input nodes for DC offset compensation. The limiting amplifier (LA) is a 5-stage differential amplifier using T-coils for bandwidth (BW) extension. The low pass filter (LPF) amplifies the difference between the DC levels at the LA output and returns it to the TIA for DC offset compensation. The output stage (OUT) is an inductively-peaked differential amplifier. The nominal supply voltage for the TIA, LA, OUT, and LPF is 1.2V. The T-coil RX has 8.2kΩ gain, 20GHz BW, and 8MHz low frequency cutoff in simulation. The T-coil RX occupies 250µm×390µm.

OW4I.3, OFC/NFOEC Technical Digest © 2012 OSA

The CMOS inverter RX block diagram is shown in Fig. 3a. The TIA is a CMOS inverter with resistive feedback, while a second inverter sinks or sources current at the input for self-biasing. The LA is a 4-stage amplifier, each stage realized as two inverters with resistive feedback around the second inverter to boost the speed, similar to the CH amplifier. The LPF provides feedback for self-biasing and is realized as a single-pole RC filter with a Miller-boosted capacitor. The output stage is an NMOS source follower (SF) driving an off-chip 50Ω load. The nominal supply voltage for the CMOS inverters and the SF is 1.2V. The CMOS inverter RX has 6.3kΩ gain, 12GHz BW, and 2MHz low frequency cutoff in simulation. The CMOS inverter RX occupies 40µm×95µm.
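The footprints quoted in this section make the area side of the tradeoff concrete. The short Python sketch below only multiplies the stated dimensions; the dictionary keys and function name are illustrative labels, not names from the paper.

```python
# Chip footprints as stated in the text, in micrometers (width, height).
footprints_um = {
    "LDD": (80, 170),
    "T-coil RX": (250, 390),
    "CMOS inverter RX": (40, 95),
}


def area_um2(dims):
    """Return the rectangular area in square micrometers."""
    width, height = dims
    return width * height


areas = {name: area_um2(dims) for name, dims in footprints_um.items()}
# How many CMOS inverter RX footprints fit in one T-coil RX footprint.
ratio = areas["T-coil RX"] / areas["CMOS inverter RX"]

for name, area in areas.items():
    print(f"{name}: {area} um^2")
print(f"T-coil RX / CMOS inverter RX area ratio: {ratio:.1f}x")
```

On these numbers the T-coil RX occupies roughly 26 times the area of the CMOS inverter RX, which is the cost of its T-coil bandwidth extension.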

Fig. 2. a) Block diagram of the T-coil RX. b) Photograph of the wirebonded T-coil RX test site.

Fig. 3. a) Block diagram of the CMOS inverter RX. b) Photograph of the wirebonded CMOS inverter RX test site.

3. Experimental Results

The TX and RX sites are wirebonded to a high-speed custom PCB for testing (Fig. 1b, 2b, and 3b). Lensed 50µm MM fiber probes optically couple the VCSEL and PD. A 30GHz BW sampling oscilloscope is used for eye diagram measurements and a 17GHz BW Newport D-25xr PD is used to capture TX optical eye diagrams. The test pattern is 2^7-1 PRBS. At nominal supply voltages, the TX uses 36.2mW, the T-coil RX uses 44.4mW, and the CMOS inverter RX uses 25.2mW of power. The VCSEL carries a DC bias current of 4.2mA and emits 0.73mW average optical power with 5.1dB extinction ratio (ER), giving 0.77mW (-1.1dBm) optical modulation amplitude (OMA).
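The OMA figure follows from the average power and the ER. As a minimal sketch, assuming the usual definitions ER = P1/P0 and P_avg = (P1 + P0)/2 for the one and zero levels, OMA = P1 - P0 = 2·P_avg·(ER - 1)/(ER + 1). The function names below are illustrative only.

```python
import math


def oma_mw(p_avg_mw, er_db):
    """OMA in mW from average optical power (mW) and extinction ratio (dB).

    Assumes ER = P1/P0 and P_avg = (P1 + P0) / 2.
    """
    er_linear = 10 ** (er_db / 10)
    return 2 * p_avg_mw * (er_linear - 1) / (er_linear + 1)


def mw_to_dbm(power_mw):
    """Convert a power in mW to dBm."""
    return 10 * math.log10(power_mw)


oma = oma_mw(p_avg_mw=0.73, er_db=5.1)
print(f"OMA = {oma:.2f} mW ({mw_to_dbm(oma):.1f} dBm)")
```

With the values quoted above (0.73mW, 5.1dB) this reproduces the stated 0.77mW (-1.1dBm).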

Fig. 4 shows the TX and RX eye diagrams; all eyes have BER < 10^-12 in the center. The CMOS inverter RX eye is distinctly lopsided, an expected result due to the single-ended SF output, which is much faster on the falling edge. For 10Gb/s to 22Gb/s, all supply voltages are nominal and the RXs are connected to the TXs through 4m of OM2 MM fiber and an adjustable optical attenuator. At 25Gb/s, the optical attenuator is removed, VDD_PA is increased to 2V, and the VCSEL is biased at 4.5mA, resulting in 46mW TX power. The CMOS inverter RX eye is closed above 22Gb/s, reflecting the lower BW of that RX. The T-coil RX horizontal eye opening at 25Gb/s is 0.22UI at BER = 10^-10.
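Power efficiency in pJ/bit is total link power divided by data rate, and mW divided by Gb/s gives pJ/bit directly. The sketch below applies this to the powers quoted in the text. It assumes the T-coil RX remains at its nominal 44.4mW at 25Gb/s, which the text does not state explicitly; under that assumption the result agrees with the 3.6pJ/bit quoted in the abstract.

```python
def energy_per_bit_pj(total_power_mw, data_rate_gbps):
    """Energy per bit in pJ: (mW) / (Gb/s) = pJ/bit."""
    return total_power_mw / data_rate_gbps


# Powers quoted in the text (mW).
TX_NOMINAL_MW = 36.2
TX_25G_MW = 46.0           # VDD_PA raised to 2V, VCSEL biased at 4.5mA
TCOIL_RX_NOMINAL_MW = 44.4
CMOS_RX_NOMINAL_MW = 25.2

# Assumption: the T-coil RX stays at its nominal power at 25Gb/s.
tcoil_link_25g = energy_per_bit_pj(TX_25G_MW + TCOIL_RX_NOMINAL_MW, 25.0)

# Nominal (not power-optimized) CMOS inverter RX link at 15Gb/s.
cmos_link_15g_nominal = energy_per_bit_pj(TX_NOMINAL_MW + CMOS_RX_NOMINAL_MW, 15.0)

print(f"T-coil link at 25Gb/s: {tcoil_link_25g:.2f} pJ/bit")
print(f"CMOS inverter link at 15Gb/s, nominal supplies: {cmos_link_15g_nominal:.2f} pJ/bit")
```

The nominal-supply figure at 15Gb/s is well above the record value reported for the power-optimized link, which is why the supply and bias reductions matter.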

Fig. 4. Eye diagrams. All eyes have BER < 10^-12 in the center. (Panels: TX optical eye, T-coil RX eye, CMOS inv. RX eye; data rates: 10Gb/s, 15Gb/s, 20Gb/s, 22Gb/s, 25Gb/s.)

Sensitivity measurements (Fig. 5) were taken with 4m OM2 MM fiber and an adjustable optical attenuator. The approximately 1dB lower sensitivity from 10Gb/s to 20Gb/s in the T-coil RX compared to the CMOS inverter RX can be attributed to the larger T-coil RX BW, which integrates more high-frequency noise and degrades sensitivity. Sensitivity losses of 4dB and 2dB are observed moving from 2^7-1 PRBS to 2^31-1 PRBS at 15Gb/s for the T-coil RX and CMOS inverter RX, respectively. A low transmission penalty of only 1dB is observed for both RXs after 100m of OM3 MM fiber at 20Gb/s.
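To illustrate how the two test patterns differ, the sketch below generates a 2^7-1 PRBS with a linear-feedback shift register and reports its period and longest run of identical bits. The paper does not give the generator polynomials or state the cause of the sensitivity loss; the polynomial x^7 + x^6 + 1 used here is the commonly used PRBS7 generator and is an assumption, as are all function names.

```python
def lfsr_sequence(order, taps, count, seed=1):
    """Return `count` output bits of a Fibonacci LFSR.

    `taps` are 1-based register positions XORed to form the feedback bit.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be nonzero")
    bits = []
    for _ in range(count):
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & mask
        bits.append(feedback)
    return bits


def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = current = 1
    for previous, bit in zip(bits, bits[1:]):
        current = current + 1 if bit == previous else 1
        best = max(best, current)
    return best


ORDER = 7
PERIOD = 2**ORDER - 1  # 127 bits for a maximal-length sequence
# Two periods, so a run that straddles the wrap-around is counted whole.
prbs7 = lfsr_sequence(ORDER, taps=(7, 6), count=2 * PERIOD)

print("repeats after 127 bits:", prbs7[:PERIOD] == prbs7[PERIOD:])
print("longest run in PRBS7:", longest_run(prbs7), "bits")
# PRBS31 is not simulated here; a maximal-length sequence of order n
# has period 2**n - 1 and a longest run of n bits.
print("PRBS31 period (by formula):", 2**31 - 1, "bits")
```

The longer pattern contains runs of up to 31 identical bits instead of 7, so it carries more low-frequency content and exercises the RX low-frequency cutoff more heavily.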

Fig. 5. Sensitivity measurements at nominal supply voltages for both links. (Panels: T-coil RX, CMOS inv. RX; bit error rate versus OMA in dBm at 10Gb/s, 15Gb/s, 20Gb/s, and 22Gb/s.)

The TX is a limiting factor in the link performance. TX eye closure at 22Gb/s and 25Gb/s is reflected in the RX eye diagrams. Horizontal eye opening (BER = 10^-10) at 20Gb/s is 0.5UI and 0.47UI and at 22Gb/s is 0.32UI and 0.35UI for the T-coil and CMOS inverter RXs, respectively. The similarity between the horizontal eye openings despite the higher BW of the T-coil RX indicates that the TX limits the overall link performance.
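Eye openings quoted in unit intervals (UI) can be converted to absolute time, since one UI is one bit period, i.e. 1000/(data rate in Gb/s) picoseconds. The sketch below applies this to the openings quoted in the text; the function and variable names are illustrative.

```python
def ui_to_ps(opening_ui, data_rate_gbps):
    """Convert an eye opening in unit intervals to picoseconds."""
    bit_period_ps = 1000.0 / data_rate_gbps
    return opening_ui * bit_period_ps


# (data rate in Gb/s, receiver, horizontal opening in UI at BER = 10^-10)
openings = [
    (20.0, "T-coil RX", 0.50),
    (20.0, "CMOS inverter RX", 0.47),
    (22.0, "T-coil RX", 0.32),
    (22.0, "CMOS inverter RX", 0.35),
    (25.0, "T-coil RX", 0.22),
]

for rate, receiver, opening_ui in openings:
    opening_ps = ui_to_ps(opening_ui, rate)
    print(f"{rate:g}Gb/s {receiver}: {opening_ui:.2f}UI = {opening_ps:.1f}ps")
```

In absolute terms the opening shrinks from about 25ps at 20Gb/s to under 10ps at 25Gb/s for the T-coil RX.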

To determine the best achievable power efficiency versus data rate, both links were power optimized at data rates ranging from 10Gb/s to 25Gb/s. Power was optimized by reducing supply voltages and bias currents while maintaining BER < 10^-12 in the center of the eye and RX single-ended output voltage ≥100mVpp [2]. The optical portion of the links consists only of 4m OM2 MM fiber. The results are plotted in Fig. 6. The full link with CMOS inverter RX achieves a record low power efficiency of 1.37pJ/bit at 15Gb/s.

The record power efficiencies are a result of the low supply voltages tolerated by the RX circuits. At the operating points with the lowest energy per bit, VDD_TIA = 1.0V and VDD_LA = 0.7V for the T-coil RX and VDD_CMOS = VDD_SF = 0.7V for the CMOS inverter RX. Reducing the power supplies trades RX gain, RX speed, and link margin for ultra low power.

Fig. 6. Full link power efficiency curves for T-coil RX and CMOS inv. RX. Insets show eye diagrams at lowest power points. (Panels: T-coil RX, CMOS inv. RX; power efficiency in pJ/bit versus data rate in Gb/s for TX, RX, and total; annotated values: 1.78pJ/bit, 1.37pJ/bit.)

4. Conclusions

This work demonstrates the first optical links with sub-2pJ/bit power efficiency at 15Gb/s to 22Gb/s. This is the first demonstration of full-link power efficiency on the order of exascale system requirements.

5. Acknowledgements

The authors thank C. Baks for high-speed custom PCB design, Y. Vlasov for management support, and Emcore Corp. for the VCSELs and PDs.

6. References

[1] J. Kash et al., “Optical interconnects in future servers,” in Proc. Optical Fiber Communications Conf. (OFC), paper OWQ1, Mar. 2011.
[2] C. Schow et al., “Transmitter pre-distortion for simultaneous improvements in bit-rate, sensitivity, jitter, and power efficiency in 20 Gb/s CMOS-driven VCSEL Links,” in Proc. Optical Fiber Communications Conf. (OFC), paper PDPC6, Mar. 2011.
[3] X. Zheng et al., “Ultra-low power arrayed CMOS silicon photonic transceivers for an 80Gbps WDM optical link,” in Proc. Optical Fiber Communications Conf. (OFC), paper PDPA1, Mar. 2011.
[4] N. Li et al., “High-performance 850 nm VCSEL and photodetector arrays for 25 Gb/s parallel optical interconnects,” in Proc. Optical Fiber Communications Conf. (OFC), paper OTuP2, Mar. 2010.
