Digital Link Pre-emphasis with Dynamic Driver Impedance Modulation
Ranko Sredojević and Vladimir Stojanović
RLE/MTL, Department of EECS, Massachusetts Institute of Technology, Cambridge, MA


  • Need for energy-efficient links

    • Links consume a significant portion of SoC power
    – ~100pJ/op for compute in modern processors
    – ~8b/op for data transfers (DRAM, coherency)
    – @ 10pJ/bit, link power is 80% of compute power!
    – Challenge: link data rates also need to increase
    [Figure: Cell and Rainbow Falls processors]
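The 80% figure follows from per-operation arithmetic. Here is a minimal Python sketch that uses only the slide's numbers. The 10% budget case is an illustrative extrapolation and does not come from the slide.

```python
# Energy per operation: compute vs. data transfer, using the slide's figures.
COMPUTE_PJ_PER_OP = 100.0  # ~100 pJ/op for compute
BITS_PER_OP = 8            # ~8 b/op moved over links (DRAM, coherency)


def link_to_compute_ratio(link_pj_per_bit: float) -> float:
    """Link energy per op divided by compute energy per op."""
    return BITS_PER_OP * link_pj_per_bit / COMPUTE_PJ_PER_OP


def link_budget_pj_per_bit(target_ratio: float) -> float:
    """Link energy/bit that keeps link power at target_ratio of compute power."""
    return target_ratio * COMPUTE_PJ_PER_OP / BITS_PER_OP


print(link_to_compute_ratio(10.0))   # 8 b x 10 pJ/b = 80 pJ vs. 100 pJ -> 0.8
print(link_budget_pj_per_bit(0.1))   # illustrative 10% budget -> 1.25 pJ/bit
```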

  • Use nonlinearities to improve links
    • Challenge common wisdom
    – Do we really need 50Ω TX/RX (VM, CML drivers)?
    • Be smarter about signal processing
    – Use higher impedance and compensate (CMOS+DSP)
    – Use NLEQ to push performance further: new EQ algorithms, with opportunities at the Rx as well
    [Figure: Serial link block diagram. Transmitter: parallel-to-serial, TX EQ, TXP/TXN. Clocking: RefClk, PLL, phase control, phase mixers. Receiver: RX equalizer with tap weights and tap selection, serial-to-parallel, RXP/RXN]

  • Asymmetric Links

    • RC-dominated channels (on-chip, cables, …)
    – Byungsub Kim, et al., “A 4Gb/s 356fJ/bit 10mm Equalized On-chip Interconnect with Nonlinear Charge-injection Transmit Filter in 90nm CMOS,” ISSCC 2009
    – [Figure: 3-tap FFE Tx with static pre-distorted coefficients; TIA receiver with slicer and 1-tap DFE]
    • Memory interfaces
    – Robert Palmer, et al., “A 4.3Gb/s Mobile Memory Interface With Power-efficient Bandwidth Scaling,” VLSI Symposium 2009

  • Current Mode (CM) EQs

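    • Constant current consumption
    • Half of the current is lost in the driver-side termination
    • Bn[k] = ~B[k], D[k] = (2B[k]-1)
    • Vout = D[0]*a[0] + D[-1]*a[1]
    • Output level magnitudes: |a[0]|+|a[1]| and |a[0]|-|a[1]|
    [Figure: CM driver; switch inputs B[0], Bn[0], Bn[-1], B[-1]; tap currents a[0], a[1]]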
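The CM equations above can be checked numerically. The sketch below is minimal and makes several assumptions: currents are unit-normalized, the tap weights (0.75, -0.25) are borrowed from the 4Gb/s measurement, and the source and load terminations are both 50Ω. It enumerates the four output levels, shows that the supply draw does not depend on data, and shows the current split with the driver-side termination.

```python
from itertools import product


def cm_output(b0: int, b1: int, a0: float, a1: float) -> float:
    """2-tap CM FIR output: Vout = D[0]*a[0] + D[-1]*a[1], with D[k] = 2*B[k] - 1."""
    d0, d1 = 2 * b0 - 1, 2 * b1 - 1
    return d0 * a0 + d1 * a1


def cm_supply_current(a0: float, a1: float) -> float:
    """Each tap steers a fixed tail current, so supply draw is |a0| + |a1| for every pattern."""
    return abs(a0) + abs(a1)


def cm_load_fraction(r_src: float, r_load: float) -> float:
    """Fraction of the steered current reaching the load with a source-terminated driver."""
    return r_src / (r_src + r_load)


a0, a1 = 0.75, -0.25
levels = sorted({cm_output(b0, b1, a0, a1) for b0, b1 in product((0, 1), repeat=2)})
print(levels)                         # [-1.0, -0.5, 0.5, 1.0]
print(cm_supply_current(a0, a1))      # 1.0 for every data pattern
print(cm_load_fraction(50.0, 50.0))   # 0.5: the other half flows in the Tx termination
```

The level magnitudes 1.0 and 0.5 correspond to |a[0]|+|a[1]| and |a[0]|-|a[1]|.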

  • Voltage Mode (VM) EQs
    • Voltage modulation by a shunt resistive divider
    – Side-effect: direct current path from Vdd to GND
    • VM drivers lose efficiency when equalizing
    – Often used just as drivers, without equalization
    [Figure: output levels |a[0]|+|a[1]| and |a[0]|-|a[1]|]
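The efficiency loss comes from the divider's direct Vdd-to-GND path. The sketch below models a hypothetical single-ended VM driver with 4 equal slices, a 50Ω output and a 50Ω load to ground, all of which are assumptions. Because the slices stay split between pull-up and pull-down, the Thevenin resistance holds at 50Ω for any split while part of the supply current bypasses the load.

```python
def vm_currents(vdd: float, k: int, n: int, r0: float, rt: float):
    """Hypothetical single-ended VM driver split into n equal slices.

    Each slice has resistance n*r0, so the Thevenin output resistance is r0 for any k.
    k slices pull up to vdd, n-k pull down to ground; the load is rt to ground.
    Returns (supply current, pull-down shunt current, load current).
    """
    rs = n * r0
    vth = vdd * k / n                  # Thevenin voltage set by the divider ratio
    vout = vth * rt / (r0 + rt)
    i_up = k * (vdd - vout) / rs       # drawn from the supply
    i_down = (n - k) * vout / rs       # direct Vdd-to-GND path through the divider
    i_load = vout / rt
    return i_up, i_down, i_load


for k in (4, 3, 2):
    i_up, i_down, i_load = vm_currents(vdd=1.0, k=k, n=4, r0=50.0, rt=50.0)
    p_supply = 1.0 * i_up
    p_load = i_load ** 2 * 50.0
    print(f"k={k}: supply {p_supply * 1e3:.2f} mW, load {p_load * 1e3:.2f} mW, "
          f"efficiency {p_load / p_supply:.0%}")
```

With all slices pulling up (k=4), no current flows through the pull-down side. Equalized levels (k=3, k=2) draw less supply power, but a growing share of it flows through the shunt path instead of the load.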

  • Voltage Mode* (VM*) EQs

    • Additional switches shorting the channel
    • Constant current from the supply possible
    • Higher parasitic capacitance
    W.D. Dettloff, et al., “A 32mW 7.4Gb/s Protocol Agile Source-series Terminated Transmitter in 45nm CMOS SOI,” ISSCC 2010

  • The cost of impedance matching

    • Enforces changes to the equivalent Thevenin source voltage (source impedance stays fixed)
    • If abandoned: modulating the termination is simpler to implement
    – Hard to analyze (inherently nonlinear)
    – Rely on intuition, simulation and experiment

  • The cost of impedance matching


  • Practical problems in channel termination
    • Even lean ESD can result in ~400fF of driver output capacitance
    – At the 3.125GHz Nyquist frequency for 6.25Gb/s, the termination is 50Ω in parallel with about -j130Ω
    • Channel characteristic impedance is not a tightly controlled parameter in fabrication
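The effect of the ESD capacitance can be quantified with complex impedances. This minimal sketch takes the slide's 400fF, 50Ω and 6.25Gb/s (Nyquist = half the bit rate). It treats the capacitance as a shunt element across the termination and measures the reflection against an ideal 50Ω line, which is a simplification.

```python
import math


def parallel(z1: complex, z2: complex) -> complex:
    """Impedance of two elements in parallel."""
    return z1 * z2 / (z1 + z2)


bit_rate = 6.25e9
f_nyq = bit_rate / 2                  # 3.125 GHz
c_out = 400e-15                       # ~400 fF driver output capacitance
r_term = 50.0

z_cap = 1 / (1j * 2 * math.pi * f_nyq * c_out)   # capacitive reactance, about -j127 ohm
z_term = parallel(r_term, z_cap)                # termination seen by the channel
gamma = (z_term - r_term) / (z_term + r_term)   # reflection vs. a 50 ohm line

print(f"Z_C = {z_cap.imag:.1f}j ohm")
print(f"Z_T = {z_term.real:.1f} {z_term.imag:+.1f}j ohm")
print(f"|Gamma| = {abs(gamma):.3f} ({20 * math.log10(abs(gamma)):.1f} dB)")
```

The shunt capacitance alone pulls the termination to roughly 43Ω - j17Ω, so about a fifth of the incident wave is reflected at Nyquist.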

  • Static termination effects


  • RM Tx as segmented CMOS-inverter

    • Most energy-efficient, but nonlinear, so compensate
    [Figure: Static transfer curve (output in units of Vdd) vs. memory code]
    [Figure: Segmented driver of N CMOS-inverter slices, slice x enabled by E[x] (E[0] … E[N-1]), outputs outP/outN]
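A simple resistor model shows why the segmented driver is nonlinear. In this hypothetical model, |code| slices are enabled on each side, with outP pulled to Vdd and outN to GND through R_SLICE/|code|, into a 100Ω differential termination. The values N_SLICES = 60 and R_SLICE = 1.5kΩ are assumptions chosen so that the resistance spans 25Ω to 1.5kΩ; they are not the chip's actual parameters.

```python
N_SLICES = 60        # assumed slice count -> signed codes -60..60
R_SLICE = 1500.0     # assumed per-slice resistance: 1 slice = 1.5 kOhm, 60 slices = 25 Ohm
R_DIFF = 100.0       # assumed differential termination at the receiver (ohm)


def rm_transfer(code: int) -> float:
    """Differential output in units of Vdd for a signed code.

    |code| = number of enabled slices (driver resistance R_SLICE/|code| on each side);
    sign(code) = data polarity. Code 0 leaves all slices off, so the output is 0.
    """
    n = abs(code)
    if n == 0:
        return 0.0
    r_drv = R_SLICE / n                       # one side's driver resistance
    v = R_DIFF / (R_DIFF + 2 * r_drv)         # divider: Vdd -> r_drv -> R_DIFF -> r_drv -> GND
    return v if code > 0 else -v


full = rm_transfer(N_SLICES)
for code in (15, 30, 45, 60):
    print(code, round(rm_transfer(code) / full, 3))
```

The normalized outputs come out to 0.5, 0.75, 0.9 and 1.0, whereas a linear driver would give 0.25, 0.5, 0.75 and 1.0. This compressive curve is what the LUT has to compensate.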

  • Pattern-lookup LUTs to compensate and do EQ
    [Figure: Output voltage vs. pattern-dependent code]
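A LUT addressed by the recent bit pattern can replace the FIR multiply-add entirely. This minimal sketch assumes a 2-bit address (B[0], B[-1]), the tap weights (0.75, -0.25) from the 4Gb/s measurement, and a hypothetical full-scale code of 60. It stores ideal linear codes only and does not yet compensate the driver nonlinearity.

```python
from itertools import product


def build_fir_lut(a0: float, a1: float, full_scale: int) -> dict:
    """LUT addressed by (B[0], B[-1]); entry = round(full_scale * (a0*D[0] + a1*D[-1]))."""
    lut = {}
    for b0, b1 in product((0, 1), repeat=2):
        d0, d1 = 2 * b0 - 1, 2 * b1 - 1
        lut[(b0, b1)] = round(full_scale * (a0 * d0 + a1 * d1))
    return lut


def drive(bits, lut, prev_bit=0):
    """Stream bits through the LUT: each output code depends on the current and previous bit."""
    codes = []
    for b in bits:
        codes.append(lut[(b, prev_bit)])
        prev_bit = b
    return codes


lut = build_fir_lut(0.75, -0.25, full_scale=60)
print(lut)                            # {(0, 0): -30, (0, 1): -60, (1, 0): 60, (1, 1): 30}
print(drive([1, 1, 0, 0, 1], lut))    # [60, 30, -60, -30, 60]
```

Widening the address (more previous bits, or a phase bit) adds taps or pattern-specific corrections without adding arithmetic in the data path.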

  • Driver implementation

    • Segmentation: linearity/complexity tradeoff
    – Use partial-thermometer encoding
    • 6b+1b -> 8b+2b before the E/O mux
    • 8b+2b -> 64b+2b after the E/O mux (6 logic stages)
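The slide does not spell out the exact 6b+1b -> 8b+2b -> 64b+2b mapping. As a generic illustration of partial-thermometer (segmented) encoding, a 6-bit magnitude can be split into 3 MSBs that drive 7 equal unary segments and 3 binary-weighted LSBs. The 3/3 split and the function names are assumptions and do not describe the chip's encoder.

```python
def partial_thermometer(value: int, msb_bits: int = 3, lsb_bits: int = 3):
    """Encode value as thermometer-coded MSBs plus binary-weighted LSBs."""
    if not 0 <= value < 1 << (msb_bits + lsb_bits):
        raise ValueError("value out of range")
    msb = value >> lsb_bits
    lsb = value & ((1 << lsb_bits) - 1)
    n_unary = (1 << msb_bits) - 1                          # 7 equal segments of weight 2**lsb_bits
    therm = [1 if i < msb else 0 for i in range(n_unary)]
    binary = [(lsb >> i) & 1 for i in range(lsb_bits)]     # LSB first
    return therm, binary


def segment_weight(therm, binary, lsb_bits: int = 3) -> int:
    """Total enabled weight; equals the original value for a correct encoding."""
    return sum(therm) * (1 << lsb_bits) + sum(b << i for i, b in enumerate(binary))


therm, binary = partial_thermometer(45)
print(therm, binary)   # [1, 1, 1, 1, 1, 0, 0] [1, 0, 1]
```

Unary MSB segments switch monotonically, which helps linearity. Binary LSBs keep the number of segments and control lines small.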

  • Chip layout & die

    • Test the RM equalization approach

    • Test LUT flexibility in compensating

    – Driver nonlinearity, timing path mismatches, …


  • Transfer curve and equalization

    • Resistance range 25Ω to 1.5kΩ (single-ended)
    • Use the measured curve to calculate LUT codes
    – Achieves predistortion and linearizes the transmitter
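Calculating LUT codes from a transfer curve amounts to inverting it: for each bit pattern, pick the code whose output is closest to the desired FIR level. This minimal sketch uses a hypothetical resistor model in place of the measured curve. Its slice count and slice resistance are assumptions chosen to span the slide's 25Ω to 1.5kΩ range. The tap weights (0.75, -0.25) are taken from the 4Gb/s measurement.

```python
from itertools import product

N_SLICES = 60      # assumed slice count -> codes -60..60
R_SLICE = 1500.0   # assumed per-slice resistance: 60 slices = 25 ohm, 1 slice = 1.5 kohm
R_DIFF = 100.0     # assumed differential termination (ohm)


def rm_transfer(code: int) -> float:
    """Hypothetical normalized differential output (units of Vdd) for a signed slice code."""
    n = abs(code)
    if n == 0:
        return 0.0
    v = R_DIFF / (R_DIFF + 2 * R_SLICE / n)
    return v if code > 0 else -v


def predistorted_lut(a0: float, a1: float) -> dict:
    """For each (B[0], B[-1]) pick the code whose output is closest to the FIR target."""
    codes = range(-N_SLICES, N_SLICES + 1)
    scale = rm_transfer(N_SLICES) / (abs(a0) + abs(a1))   # map |a0|+|a1| to full scale
    lut = {}
    for b0, b1 in product((0, 1), repeat=2):
        target = scale * ((2 * b0 - 1) * a0 + (2 * b1 - 1) * a1)
        lut[(b0, b1)] = min(codes, key=lambda c: abs(rm_transfer(c) - target))
    return lut


print(predistorted_lut(0.75, -0.25))   # {(0, 0): -15, (0, 1): -60, (1, 0): 60, (1, 1): 15}
```

A linear mapping would use ±30 for the half-amplitude levels. Inverting this compressive curve yields ±15 instead, and that difference is the predistortion.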

  • Equalization at 4Gb/s

    • 26" FR4: -9dB at the 2GHz Nyquist frequency
    • 2pJ/bit efficiency
    • 2^31-1 PRBS (DDR)
    [Figure: Unequalized and equalized eyes, (a[0], a[1]) = (0.75, -0.25)]
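The tap weights and efficiency figures above support a quick check. This sketch computes the 2-tap filter's high-frequency boost relative to DC, assuming 1-UI tap spacing, and the link power implied by 2pJ/bit at 4Gb/s. It ignores the RM driver nonlinearity and the channel's DC loss.

```python
import cmath
import math


def fir_gain(a0: float, a1: float, f_over_baud: float) -> float:
    """|H(f)| of a 2-tap FIR with 1-UI tap spacing: H = a0 + a1*exp(-j*2*pi*f*T)."""
    return abs(a0 + a1 * cmath.exp(-2j * math.pi * f_over_baud))


a0, a1 = 0.75, -0.25
dc = fir_gain(a0, a1, 0.0)        # a0 + a1 = 0.5
nyquist = fir_gain(a0, a1, 0.5)   # a0 - a1 = 1.0 at f = 1/(2T)
boost_db = 20 * math.log10(nyquist / dc)

bit_rate = 4e9                    # 4 Gb/s
energy_per_bit = 2e-12            # 2 pJ/bit
power_w = energy_per_bit * bit_rate

print(f"Tx boost at Nyquist: {boost_db:.2f} dB (channel loss: 9 dB)")
print(f"Link power: {power_w * 1e3:.1f} mW")
```

With these weights, the transmitter provides about 6dB of peaking against the 9dB Nyquist loss, and 2pJ/bit at 4Gb/s corresponds to about 8mW.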

  • Duty cycle correction with independent LUTs
    • Timing noise is converted to voltage by the channel
    – We can use voltage to correct for timing
    – 292mV/226ps : 128mV/113ps uncorrected
    – 207mV/171ps : 198mV/178ps corrected
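To compare the measurements above, this sketch assumes that each height/width pair describes one of the two alternating (even/odd) eyes created by a duty-cycle error. It reports the worst-case height and width separately, relative to the 250ps UI at 4Gb/s.

```python
UI_PS = 250.0  # one unit interval at 4 Gb/s

# (eye height mV, eye width ps) for the two alternating eyes, as listed on the slide
uncorrected = [(292, 226), (128, 113)]
corrected = [(207, 171), (198, 178)]


def worst_case(eyes):
    """Smallest height and smallest width across the eyes (taken independently)."""
    return min(h for h, _ in eyes), min(w for _, w in eyes)


for name, eyes in (("uncorrected", uncorrected), ("corrected", corrected)):
    h, w = worst_case(eyes)
    print(f"{name}: worst eye {h} mV x {w} ps ({w / UI_PS:.0%} of UI)")
```

Under this reading, correction trades some margin from the large eye for the small one. The worst-case eye improves from 128mV/113ps to 198mV/171ps.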

  • Power/jitter for CLK pattern @ 4GHz

    • Driver power is linear with amplitude
    – ~1.5mW for the 2nd-order filter

  • Power efficiency measurements


  • Power comparison example

    [Figure: Power comparison of ideal, CM, VM and RM drivers @ 100mV, FR4 and Rogers boards]

  • Power comparison CM/RM

    • Constant Vdd for CM and RM
    • Linear regulator assumed for VM
    – VM operates from a lower supply
    [Figure annotations: ~3.5x, ~5.5x]

  • Performance comparison with regulated supply (projection)
    • Underestimates the capacitance of VM/VM*

  • Conclusions & Future work

    • Working TxEQ modulating transmitter resistance in 90nm CMOS
    – 2pJ/bit @ 4Gb/s and 100mV receiver eye
    – Small footprint (150µm x 130µm) and fully digital
    • Flexible LUT-based filter
    – Linearizes the driver, equalizes the channel, corrects timing
    • Impedance matching is not a fundamental constraint
    – Trade-off: signal quality vs. power efficiency
    • Opportunity for new NLEQ, and TX-channel-RX co-design for power minimization

  • Acknowledgement

    • Fred Chen, Byungsub Kim

    • C2S2 and FCRP Centers

    • Center for Integrated Circuits and Systems (CICS) at MIT

    • IBM TAPO for chip fabrication