Digital Link Pre-emphasis with
Dynamic Driver Impedance
Modulation
Ranko Sredojević and Vladimir
Stojanović
RLE/MTL, Department of EECS,
Massachusetts Institute of Technology,
Cambridge, MA
Need for energy-efficient links
• Links consume a significant portion of SoC power
– ~100pJ/op for compute in modern processors
– ~8b/op for data transfers (DRAM, coherency)
– At 10pJ/bit, links spend 80pJ/op, i.e. 80% of the computing power!
– Challenge: also need to increase link data rates
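The 80% figure follows directly from the three numbers on this slide. A minimal back-of-envelope sketch (the constant and function names are illustrative, not from the slides):

```python
# Back-of-envelope check of the link power share quoted on the slide.
COMPUTE_PJ_PER_OP = 100.0   # ~100 pJ/op for compute (slide value)
BITS_PER_OP = 8.0           # ~8 b/op of data movement (slide value)
LINK_PJ_PER_BIT = 10.0      # link efficiency considered on the slide


def link_energy_per_op(bits_per_op, pj_per_bit):
    """Energy spent on data movement per operation, in pJ."""
    return bits_per_op * pj_per_bit


link_pj = link_energy_per_op(BITS_PER_OP, LINK_PJ_PER_BIT)
ratio = link_pj / COMPUTE_PJ_PER_OP
print(f"link energy: {link_pj:.0f} pJ/op = {ratio:.0%} of compute energy")
```

Lowering `LINK_PJ_PER_BIT` to 2.0 (the efficiency reported later in the deck) shows how the share scales with link efficiency.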
[Figure labels: Cell, Rainbow Falls]
Use nonlinearities to improve links
• Challenge common wisdom
– Do we really need 50Ω TX/RX (VM, CML drivers)?
• Be smarter about signal processing
– Use higher impedance and compensate (CMOS+DSP)
– Use NLEQ to push performance as well: new EQ algorithms and opportunities at Rx
[Figure: high-speed link block diagram with sections labeled Transmitter, Clocking and Receiver. Block labels include TX Data, Parallel to Serial, TX EQ, TXP/TXN, Vtt, RefClk, SysClk, PLL, Phase Control, Phase Mixer, Tclk, Rclk, RXP/RXN, RX Equalizer, Tap Weights, Tap Selection, Serial to Parallel, RX Clk, RX Data]
Asymmetric Links
RC-dominated channels (On-chip, cables, …)
Memory interfaces
Robert Palmer, et al., “A 4.3Gb/s
Mobile Memory Interface With
Power-efficient Bandwidth
Scaling,” VLSI Symposium 2009
Byungsub Kim, et al., “A 4Gb/s
356fJ/bit 10mm Equalized On-chip
with Nonlinear Charge-injection
Transmit Filter in 90nm CMOS,”
ISSCC 2009
[Figures: link with static pre-distorted coefficients; link with pre-distortion 3-tap FFE Tx, TIA and 1-tap DFE Rx (slicer & DFE) over a 1cm channel]
Current Mode (CM) EQs
• Constant current
consumption
• Half of the current
leaked in termination
Amplitude levels: |a[0]|+|a[1]| and |a[0]|-|a[1]|
Bn[k] = ~B[k], D[k] = 2B[k]-1
Vout = D[0]*a[0] + D[-1]*a[1]
[Figure labels: switches B[0], Bn[0], Bn[-1], B[-1]; taps a[0], a[1]]
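The two-tap relation Vout = D[0]*a[0] + D[-1]*a[1] can be checked numerically. The sketch below is only an illustration: the function name is made up, the tap values (0.75, -0.25) are borrowed from the equalization slide later in the deck, and the line is assumed to idle at the first symbol's value before the sequence starts.

```python
def two_tap_output(bits, a0, a1):
    """Return Vout[k] = D[k]*a0 + D[k-1]*a1 for each bit, with D = 2B - 1."""
    symbols = [2 * b - 1 for b in bits]   # map bits {0, 1} to symbols {-1, +1}
    out = []
    prev = symbols[0]                     # assumption: idle at the first symbol
    for d in symbols:
        out.append(d * a0 + prev * a1)
        prev = d
    return out


bits = [0, 0, 1, 1, 0]
levels = two_tap_output(bits, a0=0.75, a1=-0.25)
print(levels)  # [-0.5, -0.5, 1.0, 0.5, -1.0]
```

With a negative post-cursor tap, a transition bit is driven at |a[0]|+|a[1]| = 1.0 and a repeated bit at |a[0]|-|a[1]| = 0.5, which are the two amplitude levels marked on the slide.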
Voltage Mode (VM) EQs
• Voltage modulation by shunt resistive divider
– Side-effect: direct path from Vdd to GND
• VM drivers lose efficiency when equalizing
– Often used just as drivers without equalization
Amplitude levels: |a[0]|+|a[1]| and |a[0]|-|a[1]|
Voltage Mode* (VM*) EQs
• Additional switches shorting the channel
• Constant current from supply possible
• Higher parasitic cap
W.D. Dettloff, et al., “A 32mW
7.4Gb/s Protocol Agile
Source-series Terminated
Transmitter in 45nm CMOS
SOI,” ISSCC 2010
The cost of impedance matching
• Enforces the change of the equivalent Thevenin source
• If abandoned: simpler to implement modulation of the termination
– Hard to analyze (inherently nonlinear)
– Intuition, simulation and experiment
Practical problems in channel
termination
• Even lean ESD can result in ~400fF driver output capacitance
– (50 + j 130)Ω termination impedance at Nyquist frequency for 6.25Gb/s
• Channel characteristic impedance is not a tightly controlled parameter in fabrication
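As a rough check on the reactive part quoted above, the sketch below computes the reactance magnitude of a 400fF capacitance at the Nyquist frequency of a 6.25Gb/s link. The names are illustrative, and how the slide combines this reactance with the 50Ω resistance is not stated, so only the magnitude is computed here.

```python
import math


def capacitive_reactance_ohms(cap_farads, freq_hz):
    """Magnitude of a capacitor's reactance, |X_C| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farads)


DATA_RATE_BPS = 6.25e9             # slide value
NYQUIST_HZ = DATA_RATE_BPS / 2.0   # Nyquist frequency is half the bit rate
C_OUT = 400e-15                    # ~400 fF driver output capacitance (slide value)

x_c = capacitive_reactance_ohms(C_OUT, NYQUIST_HZ)
print(f"|X_C| at {NYQUIST_HZ / 1e9:.3f} GHz: {x_c:.0f} ohm")
```

The result is roughly 127Ω, in line with the ~130Ω reactive term on the slide.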
Static termination effects
RM Tx as segmented CMOS-inverter
• Most energy-efficient but nonlinear – so
compensate
[Figure: static transfer curve [Vdd] (axis -0.8 to 0.8) vs. memory code (axis -60 to 60)]
[Figure: driver schematic with N segments (Nx) enabled by E[0] … E[N-1]; labels VDD, outP, outN, S, _S, E[x]]
Pattern-Lookup LUTs to compensate
and do EQ
[Figure labels: Pattern-dependent Code, Output Voltage]
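One way to picture a pattern-lookup transmitter is a table indexed by the most recent bits, where each entry holds the level to drive for that pattern. The sketch below fills the table from a two-tap FIR for illustration only: the function names, the table organization, the idle assumption and the tap values (taken from the equalization slide) are assumptions, not the authors' implementation. In a real LUT each entry could be adjusted independently, for example to absorb driver nonlinearity.

```python
from itertools import product


def build_pattern_lut(taps):
    """Map each bit pattern (newest bit first) to its FIR target level."""
    lut = {}
    for pattern in product((0, 1), repeat=len(taps)):
        symbols = [2 * b - 1 for b in pattern]   # bits {0, 1} -> {-1, +1}
        lut[pattern] = sum(a * d for a, d in zip(taps, symbols))
    return lut


def transmit_levels(bits, lut, depth):
    """Look up the drive level for each bit from the last `depth` bits."""
    window = [bits[0]] * depth   # assumption: line idles at the first bit
    out = []
    for b in bits:
        window = [b] + window[:-1]   # shift in the newest bit
        out.append(lut[tuple(window)])
    return out


lut = build_pattern_lut(taps=(0.75, -0.25))
levels = transmit_levels([0, 0, 1, 1, 0], lut, depth=2)
print(levels)  # [-0.5, -0.5, 1.0, 0.5, -1.0]
```

Because the output is a pure table lookup, replacing any single entry changes the level for that pattern only. This is what lets one structure perform equalization and compensation together.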
Driver implementation
• Segmentation – linearity/complexity tradeoff
– Use partial-thermometer encoding
• 6b+1b-> 8b+2b before E/O mux
• 8b+2b -> 64b+2b after E/O mux – 6 logic stages
Chip layout & die
• Test the RM equalization approach
• Test LUT flexibility in compensating
– Driver nonlinearity, timing path mismatches, …
Transfer curve and equalization
• Resistance 25Ω - 1.5kΩ (single ended)
• Use to calculate LUT codes
– Achieving predistortion and linearizing the transmitter
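Calculating LUT codes from a transfer curve amounts to inverting it: for each desired output level, pick the driver code whose output is closest. The sketch below uses a synthetic compressive curve as a stand-in. The tanh shape, the ±63 code range and the ±0.8 Vdd full scale are assumptions chosen to resemble the plotted axes, not measured data, and the function names are hypothetical.

```python
import math


def synthetic_transfer_curve(code, full_scale=0.8, max_code=63):
    """Hypothetical static curve (output in units of Vdd); NOT measured data."""
    return full_scale * math.tanh(2.0 * code / max_code) / math.tanh(2.0)


def nearest_code(target, curve, codes):
    """Return the code whose curve output is closest to the target level."""
    return min(codes, key=lambda c: abs(curve(c) - target))


codes = range(-63, 64)
targets = [-0.6, -0.3, 0.3, 0.6]   # desired output levels, in units of Vdd
lut_codes = {t: nearest_code(t, synthetic_transfer_curve, codes) for t in targets}

for t, c in lut_codes.items():
    print(f"target {t:+.2f} Vdd -> code {c:+d} "
          f"(actual {synthetic_transfer_curve(c):+.3f} Vdd)")
```

With a measured curve in place of the synthetic one, the same nearest-code search would yield predistorted codes. The residual error is bounded by the local step size of the curve.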
Equalization at 4Gb/s
• 26" FR4: -9dB at 2GHz Nyquist
• 2pJ/bit efficiency
• 2^31-1 PRBS (DDR)
Unequalized and equalized eye: (a[0], a[1]) = (0.75, -0.25)
Duty cycle correction with
independent LUTs
• Timing noise is converted to voltage by
the channel
– We can use voltage to correct for timing
– 292mV/226ps : 128mV/113ps uncorrected
– 207mV/171ps : 198mV/178ps corrected
Power/jitter for CLK pattern @ 4GHz
• Driver power linear with amplitude
[Figure annotation: ~1.5mW for 2nd order filter]
Power efficiency measurements
Power comparison example
[Equations garbled in extraction: power of CM, VM, VM* and RM drivers at a 100mV eye. Recoverable fragments: RM factors 0.7 (ideal), 0.98 (Rogers NCB) and 0.92 (FR4 CB); the term (1 + 1.84)/2 = 1.42]
Power comparison CM/RM
• Constant Vdd for CM and RM
• Linear regulator for VM assumed
• VM operates from a lower supply
[Figure annotations: ~3.5x, ~5.5x]
Performance comparison with
regulated supply (projection)
• Underestimates capacitance of VM/VM*
Conclusions & Future work
• Working TxEQ modulating transmitter resistance in 90nm CMOS
– 2pJ/bit @ 4Gb/s and 100mV receiver eye
– Small footprint (150µm x 130µm) and fully digital
• Flexible LUT-based filter
– Linearizes driver, equalizes channel, timing correction
• Impedance matching is not a fundamental constraint
– Trade-off: signal quality vs. power efficiency
• Opportunity for new NLEQ, and TX-Channel-RX codesign for power minimization
Acknowledgement
• Fred Chen, Byungsub Kim
• C2S2 and FCRP Centers
• Center for Integrated Circuits and
Systems (CICS) at MIT
• IBM TAPO for chip fabrication