
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS

Data-Dependent Selection of Amplitude and Phase Equalization in a Quarter-Rate Transmitter for Memory Interfaces

Joo-Hyung Chae, Member, IEEE, Yong-Un Jeong, Student Member, IEEE, and Suhwan Kim, Senior Member, IEEE

Abstract— We combine 2-tap feed-forward amplitude equalization with phase equalization by 4-tap integrated pulse-width modulation. In a VSS-terminated transmitter, amplitude equalization is selected for pull-up data transmission, and phase equalization for pull-down data transmission, and the strength of equalization can be controlled depending on channel losses. This combines the strength of amplitude equalization with the energy efficiency of phase equalization. A prototype quarter-rate transmitter for memory interfaces, fabricated in a 65nm CMOS process, performed single-ended signaling at a data-rate of 16Gb/s/pin over a channel with a loss of 14.7dB. Its energy efficiency is 1.04pJ/bit/pin and the figure-of-merit is 0.070pJ/bit/pin/dB.

Index Terms— Quarter-rate transmitter, memory interface, amplitude equalization, phase equalization, data-dependent equalization.

I. INTRODUCTION

DEVELOPMENTS in virtual reality, high-performance computing, big data processing, artificial intelligence, and autonomous vehicles have brought increasing demand for high-bandwidth dynamic random-access memory (DRAM) [1], including double data-rate (DDR), graphics DDR (GDDR), and low-power DDR (LPDDR) memory, and also high-bandwidth memory (HBM). Increasing the data-rate per pin is one of the commonly used methods to increase memory bandwidth, reaching 2.6Gb/s/pin in HBM [2], 6.4Gb/s/pin in LPDDR and DDR memory [3], and 18Gb/s/pin in GDDR memory [4].

As the data-rate per pin increases, so does the channel loss, which degrades signal integrity, mandating the use of equalization to counter inter-symbol interference. But equalization requires extra power, and thus it exacerbates power-supply-induced jitter (PSIJ), which also affects signal integrity. Therefore, a proper balance between the increase of power consumption and signal integrity improvement is important when equalization is used. Noise, area, and timing issues [5], [6] force the equalization to be performed in the transmitter of a memory interface, while the receiver has a simple structure and performs little or no equalization [7].

Manuscript received December 28, 2019; revised March 25, 2020; accepted April 7, 2020. This work was supported by the Future Semiconductor Device Technology Development Program through the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Semiconductor Research Consortium (KSRC), under Grant 10080570. This article was recommended by Associate Editor Robert Rieger. (Corresponding author: Suhwan Kim.)

Joo-Hyung Chae was with the Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea. He is now with SK Hynix, Icheon 17336, South Korea.

Yong-Un Jeong and Suhwan Kim are with the Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea, and also with the Inter-University Semiconductor Research Center (ISRC), Seoul National University, Seoul 08826, South Korea (e-mail: [email protected]).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2020.2987052

Amplitude [8]–[11] and phase equalization [7], [12]–[15] are commonly used methods of transmitter equalization. Amplitude equalization can compensate for more data-dependent jitter than phase equalization, but requires more power. To alleviate these issues, we aim for a beneficial combination of amplitude and phase equalization in transmitter design. The advantages of these two types of equalization can be combined if an interface is able to switch between them as the data-bits change value. LPDDR memory and HBM both have a VSS termination. Therefore, current only flows through the VSS termination during pull-up data transmission, and there is no current when pull-down data is transmitted. Amplitude equalization creates a short current path during pull-down transmission, which increases current consumption. Thus, in a VSS-terminated transmitter, we show how energy-efficient pull-down (PD) phase equalization can be used for successive data ‘0’s, while more effective pull-up (PU) amplitude equalization is used for successive data ‘1’s. DDR and GDDR memories both have a VDDQ termination, and so current only flows during pull-down data transmission. Thus, a reversed scheme, using pull-up phase equalization and pull-down amplitude equalization, would be suitable for memory interfaces with VDDQ terminations.

The remainder of this paper is organized as follows: in Section II we describe amplitude and phase equalization, how we combine them, and the effectiveness of this combination; in Section III we introduce a quarter-rate transmitter based on this scheme; in Section IV we present experimental results; and in Section V we draw conclusions.

II. DATA-DEPENDENT AMPLITUDE AND PHASE EQUALIZATION

A. Overview of Amplitude and Phase Equalization

Amplitude equalization is the most commonly used method of equalization in a memory interface transmitter, because it provides high resolution, and is simple to implement and test [9], [10]. However, amplitude equalization requires significantly more signaling and switching power in both the pre-driver and the output driver of the transmitter, increasing simultaneous switching noise [12]. The consequent reduction in signal-to-noise ratio, which is particularly undesirable because single-ended signaling is used in memory interfaces, hampers efforts to reduce the supply voltage and hence power consumption. These issues have motivated the introduction of phase equalization, which can reduce data-dependent jitter to improve signal integrity of the output signal without increasing current consumption or simultaneous switching noise. However, the main effect of most phase equalization schemes is to improve the timing margin only, whereas amplitude equalization improves both the timing margin and the voltage margin [15], and is therefore more effective. The main characteristics of amplitude and phase equalization are compared in Table I.

TABLE I. COMPARISON OF AMPLITUDE AND PHASE EQUALIZATION

1549-8328 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

https://orcid.org/0000-0001-9107-2963

https://orcid.org/0000-0001-6354-5612

B. Amplitude Equalization

Amplitude equalization techniques, such as feed-forward equalization (FFE), involve increasing the amplitude of transition bits or decreasing the amplitude of non-transition bits in a bitstream. Amplitude equalization can be incorporated in a voltage-mode driver without sacrificing channel impedance matching, by means of a high-pass filter. The driver output YFFE[n] of 2-tap FFE can be expressed as follows:

YFFE[n] = X[n] − α · X[n − 1], (1)

where X[n − 1] and X[n] are successive identical input bits, and α is the tap coefficient. Fig. 1(a) shows how successive bits of pull-up data undergo amplitude equalization in an impedance-matched voltage-mode driver having a VSS termination. When a transition bit is transmitted, the driver outputs data at a voltage of VOH,main. During transmission of a non-transition bit, the output voltage level is reduced by the tap coefficient α, producing data having a voltage of VOH,post. When this operation happens, some of the pull-up and pull-down drivers turn on or off, which creates a short current path from the supply to the ground [16], increasing the power requirement. This also occurs in pull-down data transmission.
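As an illustration of (1), the following minimal Python sketch evaluates the 2-tap FFE output for a short logical bit sequence. The function name `ffe_2tap`, the value α = 0.25, and the assumption that the bit preceding the sequence is ‘0’ are ours, chosen for illustration only; the outputs are normalized values rather than driver voltages.

```python
def ffe_2tap(bits, alpha):
    """Evaluate Y[n] = X[n] - alpha * X[n-1] for a list of 0/1 input bits.

    Assumption of this sketch: the bit before the first one is 0.
    """
    out = []
    prev = 0
    for x in bits:
        out.append(x - alpha * prev)
        prev = x
    return out


# Three successive '1's framed by '0's; alpha = 0.25 is an arbitrary example.
levels = ffe_2tap([0, 1, 1, 1, 0], alpha=0.25)
print(levels)
```

In this example the first ‘1’ (a transition bit) is output at the full level 1.0, while the following ‘1’s (non-transition bits) are output at 1 − α = 0.75, mirroring the relationship between VOH,main and VOH,post described above.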

Fig. 1. Transmission of a transition bit and a non-transition bit by the voltage-mode driver with amplitude equalization with (a) impedance matching and (b) relaxed impedance matching.

In contrast to the above method, if only some pull-up drivers are turned on or off when transferring successive data ‘1’s, the output voltage VOH,post of non-transition bits can be adjusted by the tap coefficient α to equalize the amplitude of the pull-up data bits, as shown in Fig. 1(b). This can have a deleterious effect on channel impedance matching, but signal reflections are somewhat attenuated as the channel loss is increased [17]. In addition, the short current path is removed. Overall, relaxing impedance matching [18] in pull-up data equalization can balance simultaneous switching noise and signal reflections, while equalizing the output amplitude and reducing current consumption. We adopt this approach for the pull-up data transmission of our design.

C. Phase Equalization

Phase equalization advances or delays the rising or falling edges of data bits, depending on previous data patterns. Integrated pulse-width modulation (iPWM) [7], [14] is an established method of phase equalization, and Fig. 2 shows its effect on several successive identical bits. In the case of two successive identical bits of pull-down data, the timing of the output diPWM(t) of 2-tap iPWM can be expressed as follows:

diPWM(t) = −[u(t) − u(t − 2T + β)] + [u(t − 2T + β) − u(t − 2T)], (2)

where β is the tap coefficient.

The low-frequency spectrum of the signal that occurs during successive identical bit transmission has a low channel loss, so that the voltage level approaches close to the ideal level before a transition. However, the frequency spectrum differs with the number of successive identical bits, producing a different peak voltage for each data pattern, which causes data-dependent jitter. This can be reduced by advancing the rising or falling edges by a time that depends on the number of successive identical bits. This also opens the eye more. Unlike amplitude equalization, phase equalization does not require segmentation of the output driver, nor an additional output driver specially for equalization. This reduces the input and output capacitance (CIO), which improves the signal integrity of both the transmitter (TX) and the receiver (RX), in memory interfaces where the TX and RX share a single pad. Because phase equalization does not produce the short current path in the output driver, it uses less power than amplitude equalization and reduces simultaneous switching noise. Therefore, we use phase equalization for transmitting pull-down data.

Fig. 2. Concept and effect of the iPWM [7], [14] method of phase equalization.

Fig. 3. (a) An output driver with data-dependent selection between amplitude and phase equalization, and (b) the effect of this equalization.
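Equation (2) can be evaluated directly to check the edge timing it describes. The sketch below is illustrative only: the helper names `unit_step` and `d_ipwm`, and the values T = 100 and β = 10 (arbitrary time units), are assumptions made for this example and are not parameters of the design.

```python
def unit_step(t):
    """Unit step u(t): 1 for t >= 0, otherwise 0."""
    return 1.0 if t >= 0 else 0.0


def d_ipwm(t, T, beta):
    """Evaluate equation (2) for two successive pull-down bits at time t."""
    low = unit_step(t) - unit_step(t - 2 * T + beta)            # pulled low
    high = unit_step(t - 2 * T + beta) - unit_step(t - 2 * T)   # early rise
    return -low + high


# Hypothetical values: bit period T = 100, tap coefficient beta = 10.
for t in (0, 189, 190, 199, 200):
    print(t, d_ipwm(t, T=100, beta=10))
```

With these values the output stays at −1 until t = 2T − β = 190 and is +1 from there until t = 2T = 200, so the rising edge arrives β earlier than the nominal bit boundary.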

D. Data-Dependent Selection of Amplitude and Phase Equalization

Fig. 3(a) shows a simplified block diagram of our output driver which selects pull-up amplitude and pull-down phase equalization, depending on the pattern of the data being transmitted. The 1UI pulse for amplitude equalization is provided by the pulse generator (Pulse Gen) when a rising transition is detected by the pattern detector. During pull-down data transmission, the extent to which the rising edge is advanced is determined by the number of successive ‘0’ bits when seen from the driver output node. A pattern detector counts the number of identical bits in a run. Phase equalization needs edge shifters to adjust the timing of transition edges [14]. UI-shifted amplitude equalization [9] needs many clock synchronous flip-flops and latches to shift data before the output driver stage, and thus it is more burdensome than phase equalization in terms of area, power, and switching noise. Asynchronous time-delayed amplitude equalization [19], which shifts data using logic delay units such as inverters, has similar requirements to phase equalization, and the main difference between them is the area and power of the output driver.

Fig. 4. Memory interface configuration.

Fig. 5. Waveforms of the transmitter output OUTTX, the micro-bump output OUTBump, and the receiver input INRX of Fig. 4: (a) without equalization, (b) with amplitude equalization, (c) with phase equalization, and (d) with data-dependent selection between amplitude and phase equalization.
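The behaviour of the pattern detector described above can be sketched in a few lines of Python. This is a behavioural illustration under our own assumptions, not the circuit of Fig. 3(a): the function name `rising_edge_events` is hypothetical, and only transitions observed inside the given sequence are reported.

```python
def rising_edge_events(bits):
    """List (bit_index, zeros_before) for every 0 -> 1 transition.

    bit_index marks the '1' that would receive the 1UI amplitude pulse;
    zeros_before is the run of '0's that sets how far the rising edge
    would be advanced by phase equalization.
    """
    events = []
    zeros = 0
    for i, b in enumerate(bits):
        if b == 0:
            zeros += 1
        else:
            if zeros > 0:
                events.append((i, zeros))
            zeros = 0
    return events


print(rising_edge_events([1, 1, 0, 0, 0, 1, 0, 1, 1]))
```

Each reported event carries both pieces of information used by the scheme: where a rising transition occurs, and how many successive ‘0’ bits preceded it.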

Fig. 3(b) shows the DQ waveform through the channel that results from data-dependent selection between amplitude and phase equalization: the voltage is equalized during pull-up data transmission, and the phase is equalized during pull-down data transmission. This increases the eye opening. Single-ended signaling requires a reference voltage VREF to determine the correct data in the receiver, and thus, after the memory has powered up or while it is in operation, the memory interface performs VREF training to find the best reference voltage VREF [20]. This training compensates for process, voltage, and temperature (PVT) variation, and also prevents a data-dependent offset or DC wander, which may occur in data-dependent equalization.

The role of equalization in the configuration of a memory interface [21] is shown in Fig. 4. Fig. 5 shows how the various types of equalization introduced earlier affect the waveforms of the output of the transmitter (OUTTX), the micro-bump output (OUTBump), and the input of the receiver (INRX). The transmitter output signal passes through the pad, the package, and the micro-bump, and then along the channel to the receiver stage, with signal distortion occurring along the way (Fig. 5(a)). The three forms of equalization can reduce data-dependent jitter in the amplitude (Fig. 5(b)) or the phase domain (Fig. 5(c)), or in both domains (Fig. 5(d)), depending on the data pattern; this improves signal integrity at the receiver.

Fig. 6. Effect of tap coefficient on data-dependent jitter of the falling edges of random data, using 2-tap pull-up amplitude equalization at 10Gb/s/pin.

E. Analysis of Data-Dependent Jitter in Pull-Up Amplitude Equalization, Pull-Down Phase Equalization, and Combined Equalization

To compare each form of equalization in a simple way, a 1st-order low-pass filter with a single pole is used as the channel model. We set the loss of this channel model to 10dB at 5GHz.
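For a single-pole low-pass filter with magnitude response |H(f)| = 1/sqrt(1 + (f/fp)²), the loss specification fixes the pole frequency fp. The short Python sketch below works this out; the function names are ours, and the magnitude formula is the standard first-order response rather than a detail given in the text.

```python
import math


def pole_frequency(loss_db, f_hz):
    """Pole frequency of a single-pole low-pass with loss_db of loss at f_hz."""
    return f_hz / math.sqrt(10 ** (loss_db / 10) - 1)


def loss_db_at(f_hz, f_pole_hz):
    """Loss in dB of a single-pole low-pass filter at frequency f_hz."""
    return 10 * math.log10(1 + (f_hz / f_pole_hz) ** 2)


f_pole = pole_frequency(10, 5e9)      # 10 dB loss at 5 GHz
tau = 1 / (2 * math.pi * f_pole)      # corresponding time constant
print(f"pole = {f_pole / 1e9:.3f} GHz, tau = {tau * 1e12:.1f} ps")
```

Under this model a 10dB loss at 5GHz corresponds to a pole at 5/3 GHz (about 1.67GHz), since 10^(10/10) − 1 = 9 and sqrt(9) = 3.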

When 2-tap pull-up amplitude equalization, as formulated in (1), is applied to two successive identical pull-up data bits, the output data dAEQ(t) can be represented in the time domain by

dAEQ(t) = u(t) + (1 − α) · u(t − TB) − α · u(t − 2TB), (3)

where u(t) is a unit step function, α is the tap coefficient, and TB is the 1UI period of the data. We calculated the crossing time of half of the output swing at the falling edge of the Nth data bit [22], when random data is being transmitted through the channel. These calculated crossing time values can be used as a measure of the data-dependent jitter of the falling edges. Fig. 6 shows the data-dependent jitter (DDJ) in random data at 10Gb/s/pin when 2-tap pull-up amplitude equalization is applied with an increasing tap coefficient. Without equalization, data-dependent jitter is 46ps; with pull-up amplitude equalization, by means of a 300mV tap coefficient, it is 14ps.
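The crossing-time idea can be illustrated with a simplified Python sketch. It is not the calculation of [22] and does not reproduce the values in Fig. 6: it assumes normalized drive levels (1.0 for a transition ‘1’, 1 − α for a non-transition ‘1’, 0 for a ‘0’), a fixed threshold of 0.5, a line that idles low, and a deterministic bit pattern standing in for random data. The function name `falling_edge_ddj` and the value α = 0.3 are likewise our own choices.

```python
import math


def falling_edge_ddj(bits, alpha, ui, tau, threshold=0.5):
    """Peak-to-peak spread of falling-edge threshold-crossing times.

    Channel: single-pole low-pass filter with time constant tau.
    Drive levels are normalized (an assumption of this sketch).
    """
    decay = math.exp(-ui / tau)
    v = 0.0      # channel output at the start of the current bit
    prev = 0     # assumed idle level before the first bit
    crossings = []
    for b in bits:
        if b == 1:
            target = 1.0 if prev == 0 else 1.0 - alpha
        else:
            target = 0.0
            if prev == 1 and v > threshold:
                # Driving 0: v(t) = v * exp(-t / tau); solve v(t) = threshold.
                t_cross = tau * math.log(v / threshold)
                if t_cross <= ui:
                    crossings.append(t_cross)
        # Exact first-order response to a constant level over one UI.
        v = target + (v - target) * decay
        prev = b
    return max(crossings) - min(crossings)


# Channel with 10 dB loss at 5 GHz; data at 10 Gb/s, so 1 UI = 100 ps.
f_pole = 5e9 / math.sqrt(10 ** (10 / 10) - 1)
tau = 1 / (2 * math.pi * f_pole)
ui = 100e-12

# Deterministic stand-in for random data: every 8-bit pattern, LSB first.
bits = [(p >> i) & 1 for p in range(256) for i in range(8)]

ddj_no_eq = falling_edge_ddj(bits, alpha=0.0, ui=ui, tau=tau)
ddj_eq = falling_edge_ddj(bits, alpha=0.3, ui=ui, tau=tau)
print(f"no EQ: {ddj_no_eq * 1e12:.1f} ps, pull-up AEQ: {ddj_eq * 1e12:.1f} ps")
```

In this toy model, lowering the level of non-transition ‘1’s brings the voltage reached after a long run of ‘1’s closer to that reached by an isolated ‘1’, so the falling edges cross the threshold at more similar times and the spread shrinks.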

We have seen the effect of 2-tap iPWM on two successive identical pull-down data bits diPWM(t) in (2). We can find the data-dependent jitter of the rising edges when using pull-down phase equalization by calculating the crossing time [22], which corresponds to half of the output swing at the rising edge in the Nth data bit. Fig. 7 shows how the data-dependent jitter at 10Gb/s/pin is affected by sweeping the tap coefficient of 2-tap and 4-tap pull-down phase equalization, with a range of channel losses. We see that 4-tap phase equalization is better than 2-tap equalization when the channel loss is greater than 10dB.

Fig. 7. Effect of tap coefficient on data-dependent jitter of the rising edges of random data at 10Gb/s/pin, using (a) 2-tap and (b) 4-tap pull-down phase equalization.

Fig. 8. Calculated eye diagrams showing data-dependent jitter at 10Gb/s/pin after the 1st-order low-pass filter, (a) without and (b) with data-dependent equalization.

By combining 2-tap pull-up amplitude equalization and 4-tap pull-down phase equalization, we can compensate for a lossy channel. Fig. 8 shows eye diagrams with and without data-dependent equalization at 10Gb/s/pin, over a channel with a 10dB loss. Amplitude and phase equalization with tap coefficients of 25mV and 10ps reduce the data-dependent jitter from 46ps to 15ps.

F. Assessment of Combined Equalization in a Memory Interface

We simulated the application of several equalization schemes, including our combination of pull-up amplitude and pull-down phase equalization, to a single-ended output driver, with a data-rate of 10Gb/s/pin and a supply voltage of 1.0V. The insertion loss of the simulated channel is 9.5dB at a Nyquist frequency of 5GHz. A pad capacitance and a bonding wire model were included to model the effect of the simultaneous switching noise produced by the current consumption of each scheme. Fig. 9 shows simulated eye diagrams of the output from the driver with no equalization, with conventional amplitude equalization (2-tap FFE), with phase equalization (4-tap iPWM), and with our data-dependent equalization which uses relaxed impedance matching in pull-up amplitude equalization. Table II gives the sizes of the eyes shown in Fig. 9. Without any equalization (EQ), the eye is small due to inter-symbol interference, and all the equalization schemes increase its height and width significantly. Amplitude equalization produces the largest eye, which is approximately 86.9% wider and 84.2% higher than the unequalized eye; phase equalization produces an eye which is about 73.9% wider and 64.6% higher; and data-dependent equalization produces an eye which is 82.9% wider and 77.5% higher. It is necessary to change the reference voltage VREF to suit the equalization method. In our simulation, the required reference voltage VREF ranged from 153mV to 210mV, to accommodate PVT variations as well as different equalization schemes. VREF training of memory interfaces [20] addresses this issue.

Fig. 9. Simulated eye diagrams of the driver output at 10Gb/s/pin (a) without equalization, (b) with amplitude equalization (2-tap FFE), (c) with phase equalization (4-tap iPWM), and (d) with data-dependent equalization.

TABLE II. WIDTHS AND HEIGHTS OF THE EYES IN FIG. 9

To assess the effect of our equalization scheme on power consumption, we calculated the amount of current flowing through the output driver when transmitting a PRBS-7 data pattern, using the different types of equalization, with a strength of 2.5dB. The current in mA through the 16 output drivers required to transmit two-byte data can be expressed as follows:

NMain,0 · IMain,0 + NMain,1 · IMain,1 + NPost,0 · IPost,0 + NPost,1 · IPost,1, (4)

where the variables N are numbers of bits, and I are currents. The current is derived from the ratio of the value of the termination resistor to the on-resistance of the pull-up and pull-down drivers. The subscripts ‘0’ and ‘1’ indicate the values of the bits, and the subscripts ‘Main’ and ‘Post’ refer to transition and non-transition bits. Fig. 10 shows the currents through two-byte sections of the output driver, for the different types of equalization, with and without data bus inversion (DBI).
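The bookkeeping behind (4) can be sketched as follows. The sketch assumes one current value per bit category (transition or non-transition, ‘0’ or ‘1’); the function names and the per-category currents are hypothetical values chosen only to make the arithmetic easy to follow, not figures from Fig. 10 or Table III.

```python
def classify_pins(prev_word, curr_word):
    """Count transition ('main') and non-transition ('post') bits per value."""
    counts = {"main0": 0, "main1": 0, "post0": 0, "post1": 0}
    for p, c in zip(prev_word, curr_word):
        kind = "main" if p != c else "post"
        counts[f"{kind}{c}"] += 1
    return counts


def total_current_ma(counts, currents_ma):
    """Sum N * I over the four bit categories."""
    return sum(counts[k] * currents_ma[k] for k in counts)


# Two consecutive 16-bit words (one bit per output driver).
prev_word = [0] * 8 + [1] * 8
curr_word = [0, 1] * 8

# Hypothetical per-driver currents in mA (illustrative only).
currents_ma = {"main0": 0.0, "main1": 5.0, "post0": 0.0, "post1": 4.0}

counts = classify_pins(prev_word, curr_word)
print(counts, total_current_ma(counts, currents_ma))
```

Here each category contains four of the 16 drivers, so the total is 4 × 5.0 + 4 × 4.0 = 36.0 in the units of the assumed currents.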

Data bus inversion is a technique used in memory interfaces to reduce the DC current consumption of a data bus [23]. Fig. 11 shows an example in which data bus inversion is applied to a VSS-terminated transmitter. The logic circuitry which realizes data bus inversion counts the number of ‘1’s in each byte. If there are more than four ‘1’s, then the logic inverts the byte and brings the DBI_n signal low, to inform the receiver that the transmitted data has been inverted. Since no current flows during pull-down data transmission by a VSS-terminated transmitter, this technique can save considerable power.

Fig. 10. Current consumption per two-byte section of the output driver transmitting a PRBS-7 data pattern (a) without equalization or with phase equalization, (b) with conventional amplitude equalization, (c) with conventional amplitude equalization and DBI, (d) with relaxed impedance matching and data-dependent equalization, and (e) with relaxed impedance matching, data-dependent equalization, and DBI.

Fig. 11. Example of data bus inversion in a VSS-terminated transmitter when a byte contains (a) fewer than or equal to four ‘1’s and (b) more than four ‘1’s.
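The inversion rule described above is simple enough to express as a short Python sketch. The function names `dbi_encode` and `dbi_decode` are ours, and the sketch assumes that DBI_n is held high whenever the byte is sent without inversion, which the text implies but does not state.

```python
def dbi_encode(byte_bits):
    """Return (transmitted_bits, dbi_n) for one byte on a VSS-terminated bus."""
    if len(byte_bits) != 8:
        raise ValueError("expected exactly 8 bits")
    if sum(byte_bits) > 4:
        # More than four '1's: send the inverted byte and pull DBI_n low.
        return [1 - b for b in byte_bits], 0
    # Four or fewer '1's: send unchanged (DBI_n assumed high).
    return list(byte_bits), 1


def dbi_decode(bits, dbi_n):
    """Recover the original byte at the receiver."""
    return [1 - b for b in bits] if dbi_n == 0 else list(bits)


data = [1, 1, 1, 1, 1, 0, 1, 0]          # six '1's
sent, dbi_n = dbi_encode(data)
print(sent, dbi_n, dbi_decode(sent, dbi_n) == data)
```

After encoding, no transmitted byte contains more than four ‘1’s, which bounds the number of data pins drawing DC current through the VSS termination at any time.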

The resulting average currents in Table III show that data bus inversion is moderately effective with amplitude equalization, but produces a remarkably large current saving with data-dependent equalization. This is because current consumption can be further reduced by eliminating the short current path that occurs when transmitting ‘0’s, in addition to the reduction in DC current achieved by data bus inversion. Table III also shows that relaxed impedance matching of the pull-up driver can save current.

III. TRANSMITTER DESIGN

A. Architecture

To assess the effectiveness of our equalization scheme, we incorporated this scheme into a quarter-rate transmitter with a VSS termination. A quarter-rate architecture is suitable for memory interfaces which employ single-ended signaling because a lower clock frequency reduces simultaneous switching noise and relaxes the timing margin [24]. Fig. 12 shows our transmitter design, together with its clock path and the ZQ path that determines the channel impedance termination at the output driver.

TABLE III. AVERAGE CURRENT CONSUMPTION ACROSS ALL PRBS-7 DATA BITS SHOWN IN FIG. 10

Fig. 12. Architecture of our quarter-rate transmitter with data-dependent equalization, together with its clock and ZQ paths.

In the clock path, the four quadrature clock signals CKI, CKQ, CKIB, and CKQB are generated from the incoming differential clock signals CK and CKB in the IQ generator, and their duty-cycle and quadrature phase errors are corrected by separate duty-cycle and quadrature phase correctors.

The 32-bit pseudo-random binary sequence (PRBS) generator produces parallel PRBS-7 data, which is serialized by the 32:4 serializer and then retimed by the data aligner. The output driver, constructed from power-isolated low-voltage swing terminated logic (LVSTL) [25], performs 4:1 serialization using an overlapped time-division multiplexing (OVTDM) scheme [19]. A serializer (SER) timing adjuster controls each phase of the quadrature clock signals to ensure timing margins between the data and clock signals during the final 4:1 serialization by the output driver. The type of equalization performed by the OVTDM driver is determined by the data pattern in the amplitude equalization (AEQ) and phase equalization (PEQ) control blocks. Amplitude equalization is realized by the amplitude equalization pulse generator, and phase equalization by the data and clock delay lines (DLs). The amplitude equalization control block generates the 4-bit control signals PU<3:0>, which are sent to the amplitude equalization pulse generator, where they determine the strength of amplitude equalization. The phase equalization control block generates the 4-bit control signals PD<3:0>, which control the rising and falling edges of the data and clock signals using delay lines. The signals CoAEQ and CoPEQ adjust the strength of each equalization. Power is supplied to the output driver from VDDQ, which is separate from the VDD supply that powers the other blocks.

Fig. 13. Block diagram and operation of the 4:1 OVTDM LVSTL driver.
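The data path described above can be mimicked in software for experimentation. The following Python sketch generates a PRBS-7 stream and splits it into four quarter-rate lanes before interleaving them again. It makes two assumptions that are not stated in the text: the feedback polynomial x^7 + x^6 + 1 (a common PRBS-7 choice), and a lane order in which lane k carries bits k, k + 4, k + 8, and so on. All function names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of PRBS-7 with a 7-bit Fibonacci LFSR.

    Assumed polynomial: x^7 + x^6 + 1 (not specified in the source).
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out


def split_into_lanes(bits, lanes=4):
    """Distribute a serial stream over quarter-rate lanes (assumed order)."""
    return [bits[k::lanes] for k in range(lanes)]


def serialize_4to1(lane_data):
    """Interleave the lanes back into a full-rate stream."""
    return [b for group in zip(*lane_data) for b in group]


stream = prbs7(128)
lanes = split_into_lanes(stream)
print(serialize_4to1(lanes) == stream)
```

The sequence repeats every 127 bits, so any window of that length contains the full PRBS-7 pattern regardless of the non-zero seed used.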

B. 4:1 Overlapped Time-Division Multiplexing Driver Using Power-Isolated Low-Voltage Swing Terminated Logic

A memory interface with a high data-rate per pin must have low input and output capacitance in order to maintain signal integrity. These capacitances are largely made up of the package and pad capacitances, the input capacitance of the receiver, and the output capacitance of the transmitter.

An OVTDM driver [19] has a smaller area and a lower output capacitance than previous multiplexing drivers. Fig. 13 shows how our 4:1 OVTDM driver allows data signals to overlap. Instead of using a pseudo-open drain (POD) output driver [19], with a PMOS transistor for pull-up and an NMOS transistor for pull-down driving, we use a power-isolated LVSTL output driver [25]. This uses NMOS transistors for both pull-up and pull-down driving, and its supply voltage is isolated from and lower than that of the pre-driver. This isolates simultaneous switching noise from the output driver, improving the performance of the pre-driver. An LVSTL driver has better energy efficiency and lower output capacitance than a POD driver.

We simulated the rise and fall times of the output signal for the POD and power-isolated LVSTL output driver, with the results shown in Fig. 14. Since each driver has the same output resistance, to match the channel impedance, we expect the rise and fall times to be primarily determined by output capacitance. The 4:1 OVTDM LVSTL driver has the lower rise and fall time, and we can, therefore, conclude that it has a lower output capacitance and hence a higher signal bandwidth.

Fig. 14. Simulated (a) rise time and (b) fall time, of a 4:1 OVTDM POD (blue curves), and a 4:1 OVTDM LVSTL driver (red curves).

Fig. 15. (a) The pull-up amplitude equalization control block, (b) its four states, which depend on pull-up data pattern, and (c) an example of waveforms in this control block, which generates the control signal PU<0>.

C. Pull-Up Amplitude Equalization (2-Tap FFE)

The control signals for pull-up and pull-down equalization generated by the amplitude equalization and phase equalization control blocks determine which sort of equalization will take place. To secure the timing margin and prevent malfunction due to noise and inter-symbol interference, these control blocks use quarter-rate data and clock signals. This also makes the control logic less sensitive to technology and frequency scaling.

Fig. 15(a) shows the pull-up amplitude equalization control block. The four states of this control block, which depend on the pull-up data pattern, are shown in Fig. 15(b). Since our transmitter turns on two output drivers to overlap the output data when performing 4:1 serialization, two control signals are needed. When the data changes from 0 to 1, two of the control signals PU<3:0>, corresponding to the pull-up data pattern, are turned on and transmitted to the amplitude equalization pulse generator to provide 2-tap FFE. These signals remain on for a single period of the quarter-rate clock, which is equal to 4 UIs of the full-rate DQ output signal. These control signals are turned off for other data patterns. Fig. 15(c) shows how the control block detects pull-up transition bits and generates the control signal PU<0>.

Fig. 16. (a) The amplitude equalization pulse generator and (b) example signal waveforms during pull-up amplitude equalization.

Fig. 17. (a) The pull-down phase equalization control block, (b) its five states, which depend on the pull-down data pattern, and (c) an example of waveforms in this control block, which generates control signal PD<0>.




Fig. 18. (a) The data and clock delay line and (b) example signal waveforms during pull-down phase equalization.

Fig. 19. Measurement setup, together with die micrograph.

The amplitude equalization pulse generator, shown in Fig. 16(a), creates the AEQ pulse signal, consisting of a single pulse, in response to the control signals PU<3:0>. The width Δt of this pulse is determined by the asynchronous delay of the inverters, INVPulse, which approximately equals 1UI at a data-rate of 16Gb/s. The AEQ pulse signal is added to the DQ output to perform 2-tap FFE. The CoAEQ<3:0> signals adjust the strength of amplitude equalization to cope with different channel conditions.

Fig. 16(b) shows an example of amplitude equalization. The control signals PU<1> and PU<2> are turned on when the data bits D0 and D1 change from ‘0’ to ‘1’, and the control signals PU<2> and PU<3> are turned on when D1 and D2 change from ‘0’ to ‘1’. These signals are then delivered to the amplitude equalization pulse generator, which adds a pulse to the DQ signal when D1 and D2 are output, producing 2-tap FFE with an amplitude VAEQ that is determined by the CoAEQ signal that is turned on.

D. Pull-Down Phase Equalization (4-Tap iPWM)

Fig. 17(a) shows the pull-down phase equalization control block. Depending on the pull-down data pattern, the control signals can be in one of five states, as shown in Fig. 17(b). The four bits of the control signal PD<3:0> are turned on one by one by successive identical ‘0’ bits, and then all reset when a ‘1’ arrives. PD<3:0> adjusts the data and clock delay lines to vary the timing of the rising and falling edges of the data and clock signals. Fig. 17(c) shows an example in which the control block detects successive identical ‘0’ bits and generates the control signal PD<0>.

Fig. 20. (a) Measured channel loss characteristics, and the output of the transmitter, measured after passing through the channel, using a fixed data pattern at 16Gb/s/pin: (b) without equalization and (c) with data-dependent equalization.

Fig. 21. Measured eye diagrams and simulated shmoo plots for the PRBS-7 pattern at 16Gb/s/pin, (a) without and with duty-cycle and quadrature phase correction, and (b) without and with correction of 4:1 serialization timing by the serializer timing adjuster.
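The five states of the control signals can be modelled as a saturating counter, as in the Python sketch below. This is an abstraction under our own assumptions: it tracks only how many of the four PD bits are asserted, and the thermometer-style bit pattern it prints is a hypothetical encoding, since the text does not say which individual bit turns on first. The function names are ours.

```python
def pd_asserted_counts(bits, taps=4):
    """Number of asserted PD bits after each transmitted bit.

    Each successive '0' asserts one more bit, up to `taps`; a '1' resets all.
    """
    counts = []
    count = 0
    for b in bits:
        count = min(count + 1, taps) if b == 0 else 0
        counts.append(count)
    return counts


def thermometer(count, taps=4):
    """Hypothetical thermometer-coded view of the control word."""
    return format((1 << count) - 1, f"0{taps}b")


for bit, n in zip([1, 0, 0, 0, 0, 0, 1, 0],
                  pd_asserted_counts([1, 0, 0, 0, 0, 0, 1, 0])):
    print(bit, n, thermometer(n))
```

The counter passes through five values (0 to 4), matching the five states mentioned above, and saturates when a run of ‘0’s is longer than the number of taps.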

Fig. 18(a) is a block diagram of the data and clock delay lines, each of which contains four delay-line units. PD<3:0> controls the number of delay-line units that operate. The total width of the operating transistor in each unit is determined by the CoPEQ<4:0> signals, and this width determines the time resolution, ΔtPEQ, of that unit. Therefore, the extent to which the timing of the data and clock edges is controlled is determined by the number of bits in CoPEQ and the control signals PD<3:0> that are turned on.

Fig. 22. Measured eye diagrams before and after the channel, (a) without equalization, (b) with pull-up amplitude equalization, (c) with pull-down phase equalization, and (d) with our data-dependent equalization; also (e) the corresponding simulated shmoo plots after the channel.

Fig. 18(b) shows an example of pull-down phase equalization. When three successive identical ‘0’ bits D1, D2, and D3 arrive, the rising edge of the output signal DQ from D3 to D0 is pulled forward by 2 × ΔtPEQ. Then, two consecutive ‘0’ bits D2 and D3 arrive, and the control signal PD<2> is asserted, pulling the rising edge of the output signal DQ forward by ΔtPEQ. However, the provision of a delay of ΔtPEQ at the output of the transmitter does not mean that the same delay will occur at the far end of the channel, because the channel has a non-linear characteristic. To alleviate this issue, the delay ΔtPEQ is adjusted by the control signals CoPEQ<4:0>, and then realized by both data and clock delay lines.

IV. EXPERIMENTAL RESULTS

A prototype chip was fabricated in a 65nm CMOS technology. Fig. 19 shows the measurement setup, together with a micrograph of the die, which has an area of 0.25mm², including a decoupling capacitor. The differential clock signals CK and CKB were supplied by a clock generator, and the eye diagram of the DQ signal passing through the lossy channel was displayed by an oscilloscope.

The measured insertion loss of the channel, which is a 12.5 inch FR4 trace, is shown in Fig. 20(a). At a Nyquist frequency of 8GHz, the insertion loss of the channel is 14.7dB. Fig. 20(b) and (c) show the measured transmitter output using a fixed data pattern at 16Gb/s/pin, without any equalization and with data-dependent equalization, which is shown to equalize the voltage swing.

In a quarter-rate transmitter, the duty-cycle and quadrature phase error can distort the output eye. Fig. 21(a) shows measured eye diagrams and simulated shmoo plots for a PRBS-7 data pattern at 16Gb/s/pin before a channel, without correction, and with duty-cycle and quadrature phase correction. We can see how correction improves the regularity of the diagram. Fig. 21(b) shows measured eye diagrams and simulated shmoo plots after a channel, without and with timing control by the serializer timing adjuster. We see that it corrects the timing of 4:1 serialization, reducing inter-symbol interference and significantly opening the eye of the output signal.

Fig. 22 shows the effect of pull-up amplitude equalization, pull-down phase equalization, and data-dependent equalization, with a VDD of 1V and a VDDQ of 0.6V, using a PRBS-7 data pattern at 16Gb/s/pin, before and after a channel with a loss of 14.7dB. Fig. 22(a) shows that the eye is closed when there is no equalization. Fig. 22(b) shows that the inter-symbol interference for transitions from ‘1’ to ‘0’ is reduced by pull-up amplitude equalization. Fig. 22(c) shows that pull-down phase equalization reduces the inter-symbol interference for ‘0’ to ‘1’ transitions. Fig. 22(d) shows that data-dependent equalization reduces the inter-symbol interference for both types of transition, producing the largest eye by opening it both vertically and horizontally, to 25 mV and 0.49UI. The simulated shmoo plots in Fig. 22(e) confirm this trend.

Fig. 23. Measured eye diagram for the PRBS-7 data pattern at 16Gb/s/pin: (a) without equalization, duty-cycle correction, and quadrature phase correction, when VDD = 1.0V and VDDQ = 0.6V, and (b) measured eye diagrams with data-dependent equalization, duty-cycle correction, and quadrature phase correction, at the same voltages, and (c) at a reduced VDD of 0.9V and VDDQ of 0.3V. Simulated shmoo plots (e) when VDD = 1.0V and VDDQ = 0.6V and (f) at a reduced VDD of 0.9V and VDDQ of 0.3V.

Fig. 24. Power breakdown of our quarter-rate transmitter at 16Gb/s/pin, using a VDD of 0.9V and a VDDQ of 0.3V.

Fig. 25. Area breakdown of our quarter-rate transmitter.

Fig. 23 shows measured eye diagrams and simulated shmoo plots of the transmitter output for a PRBS-7 data pattern at 16Gb/s/pin. With neither equalization nor clock correction, the eye is closed (Fig. 23(a)). With data-dependent equalization and clock correction, the eye has a vertical opening of 25 mV and a horizontal opening of 0.49UI, for a VDD of 1.0V and a VDDQ of 0.6V (Fig. 23(b)). The eye is still open at a reduced VDD of 0.9V and VDDQ of 0.3V, when the vertical opening of the eye is 13mV and its horizontal opening is 0.47UI (Fig. 23(c)). Although the reference voltage VREF of the eye changes as VDDQ changes, the memory interface compensates for this variation, as a result of VREF training.
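The horizontal eye openings above are quoted in unit intervals (UI). As a rough aid to reading them, the following sketch converts them to picoseconds at 16Gb/s/pin and also recovers the 8GHz Nyquist frequency used for the channel-loss figure. It assumes two-level (NRZ) signaling, so that one UI equals one bit period; the function names are illustrative only.

```python
def unit_interval_ps(data_rate_gbps):
    """Bit period in picoseconds for a two-level (NRZ) data stream."""
    return 1000.0 / data_rate_gbps


def nyquist_ghz(data_rate_gbps):
    """Nyquist frequency of an NRZ stream: half the data rate."""
    return data_rate_gbps / 2.0


def eye_width_ps(width_ui, data_rate_gbps):
    """Convert a horizontal eye opening from UI to picoseconds."""
    return width_ui * unit_interval_ps(data_rate_gbps)


if __name__ == "__main__":
    rate = 16.0  # Gb/s/pin, from the measurements above
    print(f"UI      = {unit_interval_ps(rate):.1f} ps")
    print(f"Nyquist = {nyquist_ghz(rate):.1f} GHz")
    for ui in (0.49, 0.47):  # measured horizontal openings
        print(f"{ui} UI = {eye_width_ps(ui, rate):.1f} ps")
```

At this data rate one UI is 62.5 ps, so the two measured openings correspond to roughly 30.6 ps and 29.4 ps.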

Fig. 24 provides a power breakdown of our quarter-rate transmitter at a data-rate of 16Gb/s/pin, when VDD is 0.9V and VDDQ is 0.3V; its total power consumption is 16.68mW. The biggest consumer of power is the pre-driver, and the 4:1 OVTDM LVSTL driver uses 19.3%. The power consumption of the amplitude equalization control block and 1UI pulse generator accounts for 16.5%, and the phase equalization control block and delay line use 7.2%. In both the pull-up and pull-down paths of this transmitter, less power is required for data-dependent equalization than for amplitude equalization alone using the pulse generator. Although the transmitter uses more power for data-dependent equalization than for phase equalization alone, this produces definite performance benefits.
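To make the percentages above concrete, the short sketch below converts them to milliwatts using the stated total of 16.68mW. Only the three shares quoted in the text are used; the remainder is reported as a single figure that covers the pre-driver and any other blocks not itemized here, since the text does not give the pre-driver share on its own. The dictionary keys and function names are labels chosen for this illustration.

```python
TOTAL_POWER_MW = 16.68  # total transmitter power at 16Gb/s/pin (from the text)

# Shares quoted in the text, in percent of total power.
QUOTED_SHARES = {
    "4:1 OVTDM LVSTL driver": 19.3,
    "amplitude EQ control + 1UI pulse generator": 16.5,
    "phase EQ control + delay line": 7.2,
}


def block_power_mw(total_mw, shares_percent):
    """Convert percentage shares into absolute power in mW."""
    return {name: total_mw * pct / 100.0 for name, pct in shares_percent.items()}


def remaining_share_percent(shares_percent):
    """Share left over for blocks that are not itemized in the text."""
    return 100.0 - sum(shares_percent.values())


if __name__ == "__main__":
    for name, mw in block_power_mw(TOTAL_POWER_MW, QUOTED_SHARES).items():
        print(f"{name}: {mw:.2f} mW")
    rest = remaining_share_percent(QUOTED_SHARES)
    print(f"pre-driver and other blocks: {rest:.1f}% "
          f"({TOTAL_POWER_MW * rest / 100.0:.2f} mW)")
```

Under these figures the two equalization paths together draw about 4 mW, and a little over half of the total is left for the pre-driver and the remaining circuitry.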

Fig. 25 is an area breakdown of our transmitter, which has a total active area of 0.0191mm². The pre-driver is the largest block and the 4:1 OVTDM LVSTL driver only takes up 10.9% of the total area. Circuits related to the amplitude and phase equalization respectively account for 12.0% and 6.0% of the total area.

Table IV summarizes the performance of our transmitter. We also compared it with seven state-of-the-art designs using other methods of equalization, listed in Table III, using two figures of merit which are measures of energy efficiency. FoM1, which is pJ/bit/pin, is normalized energy consumption; and FoM2 [26], which is pJ/bit/pin/dB, additionally factors in channel loss. On the basis of these figures of merit, our design outperforms six of the others. Kim and Oh’s design [27] achieves better values of both FoMs, but they only present an output driver.
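The two figures of merit can be checked against the numbers reported in this section. The sketch below assumes that FoM1 is total power divided by data rate, and that FoM2 is FoM1 divided by the channel loss in dB, which is how the units pJ/bit/pin and pJ/bit/pin/dB read; the function names are illustrative.

```python
def fom1_pj_per_bit(power_mw, data_rate_gbps):
    """Energy per bit: mW / (Gb/s) is numerically equal to pJ/bit."""
    return power_mw / data_rate_gbps


def fom2_pj_per_bit_per_db(power_mw, data_rate_gbps, channel_loss_db):
    """Energy per bit normalized by channel loss in dB."""
    return fom1_pj_per_bit(power_mw, data_rate_gbps) / channel_loss_db


if __name__ == "__main__":
    power_mw = 16.68   # total power at VDD = 0.9V, VDDQ = 0.3V
    rate_gbps = 16.0   # data rate per pin
    loss_db = 14.7     # channel loss at the 8GHz Nyquist frequency

    print(f"FoM1 = {fom1_pj_per_bit(power_mw, rate_gbps):.4f} pJ/bit/pin")
    print(f"FoM2 = {fom2_pj_per_bit_per_db(power_mw, rate_gbps, loss_db):.4f} "
          f"pJ/bit/pin/dB")
```

With these inputs FoM1 comes out at about 1.04pJ/bit/pin and FoM2 at about 0.07pJ/bit/pin/dB, in line with the reported 1.04 and 0.070.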




TABLE IV

PERFORMANCE SUMMARY AND COMPARISON WITH OTHER STATE-OF-THE-ART TRANSMITTER DESIGNS

V. CONCLUSION

We have presented a 16Gb/s/pin quarter-rate transmitter for memory interfaces which uses data-dependent selection of amplitude and phase equalization. This approach compensates for channel losses without a significant cost in terms of power consumption, because there is no current consumption by the output driver in pull-down data transmission and the relaxed impedance matching saves current during pull-up data transmission. A prototype chip fabricated in a 65nm CMOS process performed single-ended signaling at a data-rate of 16Gb/s/pin over a channel with a loss of 14.7dB. At 16Gb/s/pin, its energy efficiency is 1.04pJ/bit/pin or 0.070pJ/bit/pin/dB. A quarter-rate architecture and single-ended signaling make this transmitter suitable for memory interfaces.

REFERENCES

[1] J.-H. Chae et al., “266–2133 MHz phase shifter using all-digital delay-locked loop and triangular-modulated phase interpolator for LPDDR4X interface,” Electron. Lett., vol. 53, no. 12, pp. 766–768, Jun. 2017.

[2] J. H. Cho et al., “A 1.2 V 64Gb 341GB/s HBM2 stacked DRAM with spiral point-to-point TSV structure and improved bank group data control,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 208–210.

[3] D. Kim et al., “23.2 A 1.1 V 1ynm 6.4Gb/s/pin 16Gb DDR5 SDRAM with a phase-rotator-based DLL, high-speed SerDes and RX/TX equalization scheme,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2019, pp. 380–382.

[4] Y.-J. Kim et al., “A 16Gb 18Gb/s/pin GDDR6 DRAM with per-bit trainable single-ended DFE and PLL-less clocking,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 204–206.

[5] J. E. Proesel and T. O. Dickson, “A 20-Gb/s, 0.66-pJ/bit serial receiver with 2-stage continuous-time linear equalizer and 1-tap decision feedback equalizer in 45nm SOI CMOS,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2011, pp. 206–207.

[6] M. Kossel et al., “A 10 Gb/s 8-tap 6b 2-PAM/4-PAM Tomlinson–Harashima precoding transmitter for future memory-link applications in 22-nm SOI CMOS,” IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 3268–3284, Dec. 2013.

[7] A. Ramachandran, A. Natarajan, and T. Anand, “29.4 A 16Gb/s 3.6pJ/b wireline transceiver with phase domain equalization scheme: Integrated pulse width modulation (iPWM) in 65nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2017, pp. 488–490.

[8] J. M. Wilson et al., “A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2018, pp. 276–278.

[9] Y.-H. Song and S. Palermo, “A 6-Gbit/s hybrid voltage-mode transmitter with current-mode equalization in 90-nm CMOS,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 8, pp. 491–495, Aug. 2012.

[10] S. Kim, Y. Jeong, M. Lee, K.-W. Kwon, and J.-H. Chun, “A 5.2-Gb/s low-swing voltage-mode transmitter with an AC-/DC-coupled equalizer and a voltage offset generator,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 1, pp. 213–225, Jan. 2014.

[11] Y. Chen, P.-I. Mak, Z. Yang, C. C. Boon, and R. P. Martins, “A 0.0071-mm² 10.8pspp-jitter 4 to 10-Gb/s 5-tap current-mode transmitter using a hybrid delay line for sub-1-UI fractional de-emphasis,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 10, pp. 3991–4004, Oct. 2019.

[12] W.-J. Su and S.-I. Liu, “A 5 Gb/s voltage-mode transmitter using adaptive time-based de-emphasis,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 4, pp. 959–968, Apr. 2017.

[13] S. Saxena, R. K. Nandwana, and P. K. Hanumolu, “A 5 Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis,” IEEE J. Solid-State Circuits, vol. 49, no. 8, pp. 1827–1836, Aug. 2014.

[14] A. Ramachandran, A. Natarajan, and T. Anand, “Line coding techniques for channel equalization: Integrated pulse-width modulation and consecutive digit chopping,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 3, pp. 1192–1204, Mar. 2019.

[15] J. Park, J.-H. Chae, Y.-U. Jeong, J.-W. Lee, and S. Kim, “A 2.1-Gb/s 12-channel transmitter with phase emphasis embedded serializer for 55-in UHD intra-panel interface,” IEEE J. Solid-State Circuits, vol. 53, no. 10, pp. 2878–2888, Oct. 2018.

[16] K.-L.-J. Wong, H. Hatamkhani, M. Mansuri, and C.-K.-K. Yang, “A 27-mW 3.6-Gb/s I/O transceiver,” IEEE J. Solid-State Circuits, vol. 39, no. 4, pp. 602–612, Apr. 2004.

[17] B. Kim, Y. Liu, T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, “A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3526–3538, Dec. 2009.

[18] M. Choi et al., “An FFE transmitter which automatically and adaptively relaxes impedance matching,” IEEE J. Solid-State Circuits, vol. 53, no. 6, pp. 1780–1792, Jun. 2018.




[19] J.-H. Chae, H. Ko, J. Park, and S. Kim, “A 12.8-Gb/s quarter-rate transmitter using a 4:1 overlapped multiplexing driver combined with an adaptive clock phase aligner,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 66, no. 3, pp. 372–376, Mar. 2019.

[20] (2020). JEDEC Solid State Technology Association: DDR4 SDRAM Standard. [Online]. Available: http://www.jedec.org/standards-documents/docs/jesd79-4a

[21] M. Kim et al., “A 4266 Mb/s/pin LPDDR4 interface with an asynchronous feedback CTLE and an adaptive 3-step eye detection algorithm for memory controller,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 12, pp. 1894–1898, Dec. 2018.

[22] Y. Chen, P.-I. Mak, C. C. Boon, and R. P. Martins, “A 36-Gb/s 1.3-mW/Gb/s duobinary-signal transmitter exploiting power-efficient cross-quadrature clocking multiplexers with maximized timing margin,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 9, pp. 3014–3026, Sep. 2018.

[23] (2017). JEDEC Solid State Technology Association: GDDR6 SGRAM Standard. [Online]. Available: http://www.jedec.org/standards-documents/docs/jesd250

[24] X. Zheng et al., “A 40-Gb/s quarter-rate SerDes transmitter and receiver chipset in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, no. 11, pp. 2963–2978, Nov. 2017.

[25] C.-K. Lee et al., “23.2 A 5Gb/s/pin 8Gb LPDDR4X SDRAM with power-isolated LVSTL and split-die architecture with 2-die ZQ calibration scheme,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2017, pp. 390–392.

[26] K.-Y. Chen, W.-Y. Chen, and S.-I. Liu, “A 0.31-pJ/bit 20-Gb/s DFE with 1 discrete tap and 2 IIR filters feedback in 40-nm-LP CMOS,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 64, no. 11, pp. 1282–1286, Nov. 2017.

[27] E. Kim and T. Oh, “Single-ended 2 ch. × 3.4 Gbit/s dual-mode near-ground transmitter IO driver in 45 nm CMOS process,” Electron. Lett., vol. 53, no. 5, pp. 308–310, Mar. 2017.

Joo-Hyung Chae (Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Seoul National University, Seoul, South Korea, in 2012 and 2019, respectively.

In summer 2013, he was an Intern at the Department of LPDDR Memory Design, SK Hynix, Icheon, South Korea. In 2019, he joined SK Hynix, where he worked on GDDR memory design. His research interests are the design of high-speed I/O circuits, clock generation circuits, and memory interface.

Yong-Un Jeong (Student Member, IEEE) received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2013. He is currently pursuing the Ph.D. degree with Seoul National University, Seoul.

His research interests include the design of high-speed I/O circuits, clock generation circuits, display interface, and memory interface.

Suhwan Kim (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering and computer science from Korea University, Seoul, South Korea, in 1990 and 1992, respectively, and the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, MI, USA, in 2001.

From 1993 to 1999, he was with LG Electronics, Seoul. From 2001 to 2004, he was a Research Staff Member with the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. In 2004, he joined Seoul National University, Seoul, where he is currently a Professor of electrical and computer engineering. His research interests include analog and mixed-signal integrated circuits, high-speed I/O circuits, low-power sensor readout circuits, and silicon-photonic integrated circuits.

Dr. Kim has received the 1991 Best Student Paper Award of the IEEE Korea Section and the First Prize (Operational Category) in the VLSI Design Contest of the 2001 ACM/IEEE Design Automation Conference, the Best Paper Award of the 2009 Korean Conference on Semiconductors, and the 2011 Best Paper Award of the International Symposium on Low-Power Electronics and Design. He has served as the Organizing Committee Chair for IEEE Asian Solid State Conference and the General Co-Chair and the Technical Program Chair for the IEEE International System-on-Chip (SoC) Conference. He has participated multiple times on the Technical Program Committee of the IEEE International SOC Conference, the International Symposium on Low-Power Electronics and Design, the IEEE Asian Solid-State Circuits Conference, and the IEEE International Solid-State Circuits Conference. He has also served as a Guest Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS special issue on the IEEE Asian Solid-State Circuits Conference.

