28
Energy Recycling from Multi-GHz Clocks using Fully Integrated Switching Converters M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, W. Dunford, P. Palmer University of British Columbia, Vancouver, BC, Canada Abstract Large digital chips use a significant amount of energy to distribute a multi-GHz clock. By discharging the clock network to ground every cycle, the energy stored in this large capacitor is wasted. Instead, this energy can be recycled to another part of the chip using an on-chip DC-DC converter. This paper investigates energy recycling by integrating three DC-DC converter topologies, namely buck, boost and buck-boost, with a high-speed clock driver. The high operating frequency significantly shrinks the required size of the L and C components so they can be placed on-chip; typical converters place them off-chip. The clock driver and DC-DC converter are able to share the entire tapered buffer chain, including the widest drive transistors in the final stage. The clock duty cycle must be modulated to achieve voltage regulation, implying only single-edge-triggered flops should be used. However, this minor drawback is eclipsed by the benefits: by recovering energy from the clock, the output power can actually exceed the incremental input power needed to operate the converter circuitry, resulting in an effective efficiency greater than 100%. Circuit operation is validated using a 90nm prototype chip. Finally, the converter output enables additional power-saving features such as low-voltage islands or body bias voltages. Keywords Switching DC-DC converter, integrated DC-DC converter, integrated output filter, charge recycling, energy recovery, energy recycling, integrated clock/converter, multi-GHz clock.

Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

Energy Recycling from Multi-GHz Clocks using Fully Integrated Switching Converters

M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, W. Dunford, P. Palmer University of British Columbia, Vancouver, BC, Canada

Abstract

Large digital chips use a significant amount of energy to distribute a multi-GHz clock. By

discharging the clock network to ground every cycle, the energy stored in this large capacitor is

wasted. Instead, this energy can be recycled to another part of the chip using an on-chip DC-DC

converter. This paper investigates energy recycling by integrating three DC-DC converter

topologies, namely buck, boost and buck-boost, with a high-speed clock driver. The high

operating frequency significantly shrinks the required size of the L and C components so they can

be placed on-chip; typical converters place them off-chip. The clock driver and DC-DC converter

are able to share the entire tapered buffer chain, including the widest drive transistors in the final

stage. The clock duty cycle must be modulated to achieve voltage regulation, implying only

single-edge-triggered flops should be used. However, this minor drawback is eclipsed by the

benefits: by recovering energy from the clock, the output power can actually exceed the

incremental input power needed to operate the converter circuitry, resulting in an effective

efficiency greater than 100%. Circuit operation is validated using a 90nm prototype chip. Finally,

the converter output enables additional power-saving features such as low-voltage islands or

body bias voltages.

Keywords

Switching DC-DC converter, integrated DC-DC converter, integrated output filter, charge

recycling, energy recovery, energy recycling, integrated clock/converter, multi-GHz clock.

Page 2: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

1

1. Introduction

The rapid increase in energy consumption of large digital circuits has been predominantly due to

an increase in total gate capacitance and an increase in operating frequency [1]. As a result, a

large fraction of the total energy budget is used by the high-frequency clock network. While the

clock energy is obviously proportional to frequency, additional energy is also expended in

modern designs to reduce clock skew and to drive increased capacitance caused by the more

closely spaced wires and thinner gate oxides. Most of this energy is consumed in the final drive

stage where a large load capacitance is charged and discharged every cycle [2]. For example, the

5+ GHZ IBM POWER6 processor clock network consumes 22W at 1V in a 341mm2 die [3],

suggesting it can be roughly modeled by an equivalent capacitance of 13pF/mm2.

There are several methods used to reduce clock energy, such as gating the clock, low-

swing signals, double-edge triggered flip-flops, adiabatic switching [4], and resonant clocking

[5][6]. All of these previous techniques have attempted to reduce the power consumption of

operating the clock, but have various issues. For example, resonant clocking shows promise for

up to 80% energy reduction in clock distribution, but it cannot be applied at the end loads where

most of the power is dissipated. The reason is that the sinusoidal waveform produced by

resonance does not have steep edges right at the critical switching time, which is needed for all

the flops to agree to the same, precise switching moment.

This paper investigates a new method for reducing clock energy, summarized in Figure 1,

where the clock energy is not reduced directly but is instead recovered using a DC-DC converter

and redeployed to another circuit in a regulated fashion. This redeployment reduces the total

current draw from the primary supply. We call this concept energy recycling [7].

Page 3: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

2

One of the main advantages of energy recycling is the efficient generation of an on-chip

voltage supply which differs from the level offered by the primary supply. Since the on-chip DC-

DC converter is small, many can be deployed across the chip to produce independent, regional

power supplies. This allows for several different regulated voltages to be on-chip at the same

time, all powered from a single primary supply. Various power-saving techniques such as mixed-

voltage islands and adaptive body biasing (ABB) [8] can utilize these additional supply voltages.

An on-chip DC-DC converter can power these schemes without the need for external pins,

external components, or board design effort. Another advantage of on-chip converters is the

ability to respond quickly to dynamic load conditions in many-core processors, a key requirement

for achieving the savings promised by dynamic voltage and frequency scaling (DVFS) [9].

The main drawback of fully integrated switching converters is their low efficiency. High-

quality discrete converters operate at above 80% efficiency. It can be difficult to reach this level

without off-chip components because a low switching frequency (hence, large LC components) is

often used to reduce switching loss. For example, the converter described in [10] uses 27mm2 at

45MHz to reach an efficiency of 65%. The DC-DC converters described in this paper operate at

nearly 100× higher frequencies to shrink LC area by 99%, but switching losses do increase. To

mitigate the switching loss, energy stored in the clock capacitance is recycled. This extra energy

is available for free, since it would normally be wasted every cycle when the clock node is

discharged. As a result, the power needed to operate the DC-DC converter is greatly reduced.

To keep things simple, the new converters in this paper are presented in an ‘open loop’

mode with no voltage regulation capability. A more complete design, as presented in [10], would

include a feedback controller to regulate voltage by modulating the clock pulse width. However,

Page 4: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

3

the operation of such a controller is not under investigation here. Also, in all designs, the power

converters target an output voltage ripple of less than 5% peak-to-peak.

The remainder of this paper is organized as follows. In the next section, we describe the

compatibility of merging a clock driver with a power converter and the resulting impact on

power conversion efficiency. Sections 3 to 5 present a clock driver integrated with buck, boost,

and buck-boost power converter topologies, respectively. Section 6 gives simulation results,

while Section 7 discusses layout implementation issues and Section 8 presents results from a

working proof-of-concept buck-converter prototype manufactured in 90nm CMOS. Section 9

concludes this paper.

2. Similarity of Clock Driver and Power Converter Circuits

Integrating a clock driver with a DC-DC converter merges several compatible concepts. First,

using the same high switching frequency (e.g., 3GHz) for the clock and converter reduces the

size of on-chip inductor and capacitor needed by the converter. Second, the final clock drivers

and the DC-DC power transistors are both very wide to improve switching time of the clock and

reduce output losses of the converter. These large, low-impedance transistors need to be driven

by a tapered inverter chain to keep up with the high frequency. Third, the power used by this

chain should be minimized in both cases. Fourth, high-efficiency DC-DC converters employ an

energy-saving concept known as zero-voltage switching (ZVS), where a power transistor is

turned on only after a “dead-time” delay when the source-drain voltage reaches 0V. During the

dead time, energy stored in the drain node capacitance is removed by the inductor and delivered

to the load instead of being wasted. Applying this ZVS technique to the large clock capacitance

is the key to energy recycling. Since the inductor is not used in a resonant fashion, clock edges

Page 5: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

4

are kept fast. Lastly, many DC-DC converters use pulse width modulation (with fast edge rates)

for output regulation, a scheme compatible with single-edge triggered clocking.

Figure 2 shows how combining the clock driver circuit with the power converter circuit

helps to increase the overall efficiency. The integrated clock driver/power converter in Figure

2(a) receives Pin1 input power and provides Pout output power, resulting in a raw efficiency of

η = Pout /Pin1 < 1. However, part of Pin1 is required to operate the clock network. If a dedicated

clock driver was constructed, this power consumption would be Pin2. Hence, an integrated clock

driver/power converter circuit uses Pin1 − Pin2 additional power to operate the power converter

portion and recycle energy from the clock driver. As shown in Figure 2(b), if the clock driver

power (and circuitry) was removed from the integrated design and replaced with a standalone

converter, it would need to provide Pout using just the incremental power, Pin1 − Pin2. Thus,

recycling the clock power increases the overall efficiency.

To compare the dual-purpose circuit with a traditional stand-alone power converter, we

use a new metric called effective efficiency (ηeff), which is defined in Equation (1) as the output

power divided by the incremental input power needed to operate the converter. Effective

efficiency captures how efficient a traditional stand-alone converter needs to be to supply the

same output power using just the incremental input power used by the dual-purpose circuit.

10021

×−

=inin

outeff PP

Pη . (1)

Due to the poor efficiency typically associated with very high switching rates, we

consider an effective efficiency greater than 50% to be good for an on-chip converter.1 However,

in some cases, circuits in this paper provide effective efficiency greater than 100%. This is not

1 The threshold value of 50% effective efficiency was chosen to match the raw efficiency of the integrated clock driver/buck

converter chip presented in this paper. Achieving anything better than the raw efficiency should be considered good.

Page 6: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

5

evidence of the perpetual motion fallacy; it is proof that otherwise wasted energy is being

recycled from the clock tree and delivered to the converter output.

3. Integrated Clock Driver/Buck Converter

This section describes the operation of a buck converter and how it can be integrated with a

traditional tapered clock driver chain. This combined circuit is used to drive a large clock

capacitance and output a voltage lower than VDD which can be used as a supply for low-voltage

islands to further reduce energy consumption. This circuit was initially presented in [11], but is

described here in greater detail.

3.1. Simplified Circuit

A buck converter operates by averaging a square-wave signal through a low-pass filter as shown

in Figure 3(a). The output voltage of an ideal converter is the average or DC value of its input,

D×VDD, which implies that it is a function of the magnitude and duty cycle of the input, but not

the frequency.

A simple integrated clock driver/buck converter circuit is shown in Figure 3(b) where a

chain of cascaded inverters (not shown) drives node Vclk-in. Capacitance Cclk is the overall

capacitance at the clock node and includes all transistor and wiring capacitances at this node. The

operation of this circuit is summarized by the idealized timing diagram in Figure 3(c), where D,

Tsw, and Tdelay represent clock duty cycle, switching (clock) period, and ZVS dead-time,

respectively. There are three modes of operation:

• Mode 1 (time 0 to D×Tsw) is intended to drive the load and charge Cclk through Mp.

Inductor current increases linearly since the voltage across it is constant.

• Mode 2 (time D×Tsw to D×Tsw+Tdelay) is intended for energy recycling. During this time,

both Mn and Mp are off and the charge stored in Cclk is moved to the output circuit

Page 7: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

6

through the inductor, as the inductor current can not be disrupted abruptly. This results in

a rapid drop of Vclk, which is intended. In this short period of time, the inductor current

can be assumed somewhat constant. It is worth mentioning that if no delay is present, Cclk

would be discharged to ground at time D×Tsw through Mn, wasting the energy.

• Mode 3 (time D×Tsw+Tdelay to Tsw) starts when the voltage across Mn is close to zero. At

this time Mn is turned on to provide a low-resistance path for the inductor current. At this

point, no further energy is supplied to the inductor and the voltage across it is constant, so

inductor current decreases linearly. ZVS operation occurs when Mn is turned on while its

source-drain voltage is close to zero, thereby reducing dynamic power loss.

Theoretically, in Mode 3, when the falling inductor current crosses zero, Mn could be turned off

to allow charging Cclk with the negative inductor current. Then, at the beginning of the next

switching cycle, Mp would be turned on with 0V across it (i.e., ZVS operation for Mp). In

practice, this increases the output voltage ripple, as CF must provide the required charge for the

large Cclk. In this design, no ZVS operation is implemented for Mp.

3.2. Complete Circuit

Measuring the performance of the new circuit requires constructing a reference clock driver

circuit as well as the integrated clock driver/buck converter itself. These two circuits are shown

in Figures 4 and 5, respectively. The reference clock driver has the same transistor sizes and load

as the integrated design to facilitate experimental measurements with a fabricated prototype.

In both circuits, PMOS transistors are three times wider than NMOS transistors, except

for the last inverter stage in which the PMOS is four times wider to reduce the voltage drop

across Mp while Vclk is high and the current is building up in the inductor LF (Mode 1). A tapering

factor equivalent to a fan-out of four is used for the inverter chain, which minimizes clock

Page 8: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

7

latency from the source. To reduce front-end energy and improve overall conversion efficiency,

this tapering factor could be increased. NMOS transistor gate capacitance is used to implement

the converter filter capacitor, CF, while the gate capacitance of a large dummy inverter is used to

represent the parasitic and load capacitance at the clock node, Cclk. The value of Cclk is estimated

to be 12pF, roughly equivalent to a 1mm2 region of the IBM POWER6 processor.

To control the exact on/off timing of Mn and Mp shown in Figure 5, the inverter driving

those transistors is replaced with two separate inverters, with the same total transistor sizes as the

original single driver. To implement a delay for the ZVS dead-time, the gate of M1 is connected

to Vclk instead of being connected to the gate of M2. Therefore, compared to Vp, the rising edge of

Vn is delayed by Tdelay, a duration which depends on how quickly LF drains Cclk and how fast M1

turns on to raise Vn. A drop in Vclk will result in M1 and then Mn to turn on and consequently Vclk

is dropped faster. Since the gate of M2 is connected to Vm, no falling edge delay is observed for

Vn. To prevent M1 and M2 from being on concurrently at the rising edge of Vm, the source of M1 is

connected to Vp instead of VDD and uses its charge. Therefore, Vn falls at the falling edge of Vp.

The duty cycle observed at Vclk, which determines the output voltage, is influenced by

how quickly Cclk is drained by the load. The output voltage is given by ineffout VDV ×= where

sw

falldelayeff T

TTDD

−⋅+=

2

1. (2)

Here, Tfall is the fall time of Vclk if there is no ZVS delay (reference clock driver circuit), while

Tdelay is the fall time of Vclk in the presence of ZVS (integrated driver/converter circuit).

Equation (2) suggests that if Tdelay is equal to Tfall, duty cycle remains unchanged. Any Tdelay

larger than Tfall increases the effective duty cycle accordingly. Tdelay can be calculated using the

simplified circuit model given in Figure 6. At the time t = 0 when Mp turns off,

Page 9: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

8

( ) PMOSonaxLmDDclk RIVV −⋅−=0 . During clock fall off, ILmax can be assumed to be constant,

therefore, ( ) ( ) tIC

VtV axLmclk

clkclk ⋅⋅−= 10 . The time that takes for Vclk to reach zero is

−= −PMOSon

axLm

DDclkdelay R

I

VCT . (3)

Timing uncertainty is an important issue in a clock distribution network. In Figure 5, as

Cclk is charged and discharged through non-similar circuit routes, rising/falling edges of the Vclk

are not similar. The most significant concern is rise- and fall-time dependence on a dynamic load.

When the clock is rising, different load currents result in different voltage drops across the Mp

on-resistance, Ron-pmos. As a result, the rising edge of the clock is slightly modified based on the

load current. For the falling edge, the problem is more severe, as the load current solely

determines the falling slope of the clock signal before the ZVS delay circuit is triggered.

It is also worth noting that no oscillation happens at the Vclk node: as Vclk approaches 0V,

Mn turns on and prevents the oscillation by keeping Vclk at 0V until the next Mp turn-on phase.

4. Integrated Clock Driver/Boost Converter

The buck converter just described cannot output a voltage close to VDD because the PWM clock

signal must remain low for a significant time period. Instead, a boost converter can be used to

provide increased output voltage range. Originally presented in [12], this paper provides

additional circuit design and implementation details. This boost converter design also led to the

development of a new low-power clock driver [13].

4.1. Simplified Circuit

In the typical boost converter of Figure 7(a), when the switch is closed, voltage Vin is across the

inductor LF and it builds up current. In the next phase, when the switch is open, inductor current

finds its way through the diode and charges the output capacitor.

Page 10: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

9

A simple integrated clock driver/boost converter circuit is shown in Figure 7(b). It uses a

switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the PMOS

transistor in place of the power diode. In addition to providing output voltage levels higher than

VDD, the circuit also produces a buffered version of the clock, Vclk_scaled, at the same magnitude as

Vout. This clock signal can be used in the circuitry powered by the converter, but allowances for

clock skew and level-conversion will need to be made in the data path logic. There are two

modes of operation: Mode 1, where Vclk goes high and Mn turns on to build up inductor current,

and Mode 2 where inductor current finds its way through Mp to charge the output capacitor CF.

The output voltage of this boost converter would be inout VD

V ×−

=1

1DDV

D

D ×−

=1

,

where DDclkin VDVmeanV ×== )( . Ideally when D > 50%, the output voltage will be higher than

VDD.

4.2. Complete Circuit

The complete integrated clock driver/boost converter circuit is shown in Figure 8. Mn2 and Mp2

introduce the turn-on delay to implement ZVS for Mn1 as in the buck circuit. The drain node of

Mn3 and Mp3, denoted as Vclk_scaled, swings from zero to Vout. The value of the scaled clock

capacitor is assumed to be 2.2pF. Some of the recovered energy is subsequently lost when this

capacitor is discharged, so it should be kept small.

In Figure 8, the gating signal for Mn3 changes from VDD to zero. However, the appropriate

gating signal for Mp3 should instead change from Vout to Vout – VDD. The combination of diodes

Dshift, capacitor Cshift, and transistors Mn1 and Mp1 perform as a switched-capacitor voltage shifter.

Except for Dshift and Cshift, all transistor body terminals are connected to their source pins.

The body terminals of Dshift and Cshift are connected to ground instead. This prevents forward

Page 11: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

10

biasing of the body-drain intrinsic diode, in case drain voltage goes lower than the source

voltage. Also, this makes the layout implementation easier as well, since no deep n-well structure

is required. Finally, a 1kΩ resistor is added in parallel to Cshift to bias the Dshift diodes and provide

a DC current path to avoid floating nodes when the Dshift is off.

5. Integrated Clock Driver/Buck-Boost Converter

This section integrates a buck-boost converter with a clock driver to produce a negative output

voltage. Originally presented in [12], this paper provides circuit implementation details.

5.1. Simplified Circuit

In the typical buck-boost converter of Figure 7(c), when the switch is on, voltage Vin will be

across the inductor LF and current will build up in the inductor. In the next phase, when the

switch is off, inductor current goes through the diode and charges the output capacitor. The diode

prevents shorting Vout to VDD when the switch is on.

A simple integrated clock driver/buck-boost converter circuit is shown in Figure 7(d). It

uses a switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the NMOS

used in place of the power diode. An extra switch Sclk is also added between nodes Vclk and Vinv.

This switch prevents Vclk from becoming negative as Vinv goes below zero when Mn is on.

Similar to the boost converter, the buck-boost converter also has two modes of operation.

In Mode 1, current builds up in the inductor, while in Mode 2 the inductor current absorbs the

energy in Cclk then delivers its stored energy to CF. The output voltage of this buck-boost

converter is calculated by DDinout VD

DV

D

DV ×

−−=×

−−=

11

2

, which is negative.

5.2. Complete Circuit

The complete integrated clock driver/buck-boost converter circuit is shown in Figure 9. Many of

Page 12: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

11

the changes are similar in nature to those used to implement the boost circuit, i.e., the addition of

Mp3 and Mn3 to delay the energy-wasting discharge of Cclk. Also, in Figure 9, the gating signal for

Mp1 changes from zero to VDD but Mn1 needs a voltage-shifted value produced by the

combination of diodes Dshift, capacitor Cshift, and transistors Mn3 and Mp3.

There are three design decisions in Figure 9 that warrant further discussion. First,

transistors Mp2 and Mn2 are added to protect Mp1 and Mn1 from potentially large voltage drops

across them. Second, transistor Mp4 acts as the switch to prevent Vclk from going negative. The

gate of Mp4 is connected to Vbias, which is set at the threshold voltage of PMOS transistor Mp5.

Third, the body of all NMOS transistors needs to be connected to their sources or the most

negative voltage in the system to prevent forward biasing of body-source intrinsic diodes.

6. Simulation Results

All three integrated clock-driver/switching converters were designed and simulated in 90nm

CMOS technology. In the buck design, all transistors are standard-Vt (standard-threshold) type,

but the other two use only low-Vt type transistors to facilitate operation at lower VDD levels. To

provide greater opportunity for energy recycling, the boost and buck-boost designs use a larger

Cclk estimated to be 25pF.

6.1. Buck Simulation

Simulated waveforms for the integrated buck converter are shown in Figure 10. The circuit is

simulated with a 50% duty cycle and 70mA load current. The inductor current shown as Lf in

Figure 10(b) exhibits a triangular shape as expected, with minimum and maximum values of

around -50mA and 190mA, respectively. In the first half cycle of the clock, Mp source current

provides the energy to charge up Cclk as well as LF. Because of the high current, there is a voltage

drop of ~0.1V across Mp as suggested by the droop of Vclk to ~0.9V in Figure 10(a). In this figure,

Page 13: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

12

the reference clock circuit output is shown as Vclk-ref. Both clocks have similar edge slopes. In the

second half cycle of the clock, inductor current discharges Cclk. As can be seen in Figure 10(b),

Mn source current is always positive, which means that all the charge in Cclk is delivered to the

load instead of the ground.

The simulated output voltage and effective efficiency of the buck converter at different

duty cycles and output currents are given in Figure 11. The output voltage increases as D is

increased and, at the same time, the effective efficiency decreases because more of the output

power comes from conduction via Mp1 rather than energy recycling of Cclk. For example, at

70mA output current, by varying the duty cycle from 30% to 70%, the effective efficiency ranges

from 286% down to 135%. For the reference circuit (the clock driver alone), simulations

determined its power consumption, Pin2, is 41mW.

6.2. Boost and Buck-boost Simulations

Figures 12 and 13 show the output voltage and the effective efficiency of the integrated boost

converter and buck-boost converter, respectively The output voltage increases in magnitude as D

is increased and, at the same time, the effective efficiency decreases. For the boost converter, a

maximum effective efficiency of 111% is achieved at D = 40% with Iout = 30mA. As mentioned

earlier, achieving an effective efficiency above 100% is definitive proof that “free energy” is

being recovered from the clock tree. For the buck-boost converter, effective efficiency reaches a

maximum of 66% at D = 20% with Iout = 50mA. The lower efficiency is a result of more

transistors in the main current path.

For the reference circuit, simulations determined its power consumption, Pin2, was

100mW. This is higher than the buck circuit, primarily because of a larger Cclk.

Page 14: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

13

7. Prototype Implementation

Layout design and fabrication for all three designs are performed in 90nm CMOS. This section

describes layout issues for the inductor, capacitor, and three converter circuits.

7.1. Implementation of the Inductor

The use of magnetic materials to increase inductance [14] is desirable, but this work assumes

only conventional CMOS processes and, consequently, coreless inductors. These integrated

inductors induce eddy currents in the substrate, which can be mitigated by placing a metal

patterned-ground shield (PGS) in between the inductor coil and the substrate [15] [16] [17] [18].

The inductor in buck design uses a single-turn octagon, placing metal layers 6 and 7 in parallel to

reduce series resistance. To further reduce resistance, the inductor in the other two designs places

metal layers 4 through 7 and one extra aluminum (ALUCAP) layer in parallel resulting in a

slightly different estimated inductance. In all cases, the PGS is implemented in metal 1 to keep it

as far as possible from the inductor. ASITIC [19] was used to extract the inductor characteristics

shown in Figure 14(a) and the simplified π model of Figure 14(b) for the inductor used in the

buck converter chip. The inductor layout area is 0.1mm2.

7.2. Implementation of the Bulk Capacitor

In CMOS technology MOSFET gate capacitors have the highest capacitance density, but they are

nonlinear [20]. This nonlinear behavior of gate capacitance is not significant in power converter

applications such as this work because capacitance is used only for energy storage. An array of

hundreds of NMOS devices in parallel is used to accommodate for the high capacitance needed.

The internal resistance of a MOS gate capacitor, known as equivalent series resistance (ESR), is

reduced by using a transistor W/L ratio of 10 [21]. A reduced ESR not only decreases power

dissipation in the capacitor, but also lowers the voltage ripple across it.

Page 15: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

14

7.3. Implementation of Three Complete Designs

The block diagram and chip micrograph of the buck design are shown in Figure 15. The area of

the buck design is 0.27mm2, and the total die area is 1mm2. A second chip, with a total die area

of 2mm2, contains the boost and buck-boost designs as well as the circuit described in [13]. The

boost design, including LF, is 0.26mm2. Due to increased layout effort, the buck-boost design

requires only 0.2mm2. In comparison, the area of the reference clock driver is 0.03mm2.

The layout of all three circuits is organized for probe station testing. Paths that carry high

currents are wide, slotted, and use many parallel vias. We are unable to monitor the internal

waveforms of the chip without being invasive, so waveform measurements are not available.

8. Prototype Measurement Results

After verifying that the circuit simulated successfully, the buck converter was manufactured and

tested first. For precise power measurement, all the parasitic resistances in the test setup of the

buck circuit were accounted for through measurement and calibration. As a result, a supply

voltage of 1.0V was applied at the chip probe pads. An external signal generator provides a clock

signal to the chip under test with a 50% duty cycle.2

The converter output voltage vs. the output current is plotted in Figure 16(a). For each

curve, the input duty cycle is kept constant at 50% and the switching frequency is changed. As

expected, the output voltage does not vary much with frequency. The output voltage increases as

output current decreases because of Deff and less resistive voltage drop in the circuit.

Similarly, Figure 16(b) plots the input power vs. output current. The input power to the

integrated clock driver/buck converter is plotted as Pin1, and the input power to the reference

clock circuit as Pin2. The input power increases with frequency due to switching activity.

Page 16: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

15

Raw and effective efficiency is plotted in Figure 17. Raw efficiency does not change

much at different frequencies. Effective efficiency is higher at lower output currents. Since the

available energy in Cclk is constant with respect to output current, at low outputs a greater

proportion of the output energy comes from recycling. However, higher Fsw results in more

energy being stored in the capacitor per second. Hence, ηeff benefits from increasing the

frequency and lowering the output current. Achieving an effective efficiency above 100% is

definitive proof that energy is being recovered from the clock. At 3GHz, Pin2 of the reference

clock is measured to be 43mW. At the same time Pin1 is measured to be 58mW for an output

voltage of 0.75V at 37mA load current (Pout1=27mW). Thus, η = 47% and ηeff = 180%.

Table 1 provides a summary of performance comparison between this work and two other

previously published buck converters. The output voltage ripple is part of the design

specification. Among the published on-chip DC-DC converters, [22] has the highest reported

switching frequency at 480MHz, but it uses on-package inductors.3 In contrast, [10] implemented

a fully on-chip buck converter in 0.18µm SiGe RF BiCMOS technology that was 65% efficient.

It also used an area of 27mm2 to fit the large passive components. The buck converter in this

work achieves a much higher effective efficiency using only 1/100th of the area.

After testing the buck converter successfully, the boost and buck-boost designs were

manufactured together on a shared die and tested. Unfortunately, the boost and buck-boost circuit

prototypes were not functional. For these circuits, measurements indicate an output voltage of a

few hundreds of millivolts, which is lower than expected. There are several suspected issues,

mostly arising from common elements of the layout or design, but we were unable to identify a

2 The circuit also behaves as expected at a 66% duty cycle [21]. Generator limits prevented testing other duty cycles. 3 The design later appeared in [23], switching at 233MHz to further boost converter efficiency.

Page 17: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

16

specific cause. The most likely cause is a faulty voltage shifter circuit, leading to the inability to

turn off the associated power transistor and unintentional draining of CF.

9. Conclusions

This paper investigates energy recovery from a clock load in high-speed digital circuits by

exploring the integration of three switching DC-DC converter topologies with a clock driver.

These integrated driver/converter circuits recycle the energy (in the form of charge stored) from

the clock capacitance. Several characteristics make integration of these two different circuits very

promising: the multi-GHz system clock used for switching significantly reduces the size of the

output filter components, making their integration feasible; the clock driver and power converter

can share the tapered buffer chain, which can be optimized for low power; and energy wasted

during clock discharge can be recovered using ZVS technique. Unlike resonant schemes which

normally produce sinusoidal clock waveforms, simulations show this approach produces good-

quality, quasi-square clocks. In addition, the power converter output enables further power-

saving features to be employed such as low-voltage islands or body bias voltages.

In this work, the buck converter has the simplest implementation. A proof-of-concept

fully integrated clock driver/buck converter test chip was fabricated in a standard 90nm CMOS

technology. Measurements indicate the chip operates with an effective efficiency up to 180%.

This is able to exceed 100% by recycling the “free energy” that is available from the clock.

Going forward, the circuits described in this work leave significant room for further

optimization. For example, this paper focuses on using ZVS to recycle the back-end energy of the

final clock driver where most of the energy is lost. However, it is possible to lower switching

losses even further with front-end energy recycling [24] or gate charge recycling [25].

Alternatively, [26] suggests a way to shrink passive components by employing two inductors to

Page 18: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

17

cancel output ripple. Future work can improve upon the converters in this paper by improving

their design and combining them with other recent advances.

In addition, there are a number of concerns that still need to be addressed before this work

can be made practical. First, clock jitter may be increased as a result of the pulse-width

modulation required for voltage regulation. As well, there may be issues with stopping the clock,

such as possible oscillations due to the LC elements. Integration with the layout of a real high-

speed microprocessor, or a reasonable proxy, is needed to verify practical layout issues with the

power grid and low-skew clock distribution network. Also, the new clock waveform is only

suitable for edge-triggered digital flip-flops, whereas high-performance microprocessors

frequently employ transparent latches. Future work is needed to address these concerns.

Acknowledgements

The authors would like to thank several people who have provided feedback for this work,

including Daryl van Vorst and Professors David Pulfrey, Resve Saleh, and K.C. Smith. We

gratefully acknowledge the contributions of CMC Microsystems for providing CAD tools and

chip fabrication resources, as well as funding from the Natural Sciences and Engineering

Research Council of Canada (NSERC).

References

[1] E. G. Friedman, “Clock Distribution Networks in Synchronous Digital Integrated Circuits,” Proc. of the IEEE, vol. 89, no. 5, pp. 665-689, May 2001.

[2] N. Ranganathan and N. Jouppi, “Evaluating the Potential of Future On-Chip Clock Distribution using Optical Interconnects,” Hewlett-Packard Company Labs Technical Report HPL-2007-163, 2007.

Page 19: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

18

[3] J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, G. Mittal, E. Chan, Y. Chan, D. Plass, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M. Lanzerotti, “Design of the Power6 Microprocessor,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 96-97, 2007.

[4] M. Stan and W. Burleson, “Low-power CMOS Clock Drivers,” Proc. ACM/IEEE Int. Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, pp. 149-156, 1995.

[5] S. C. Chan, K. L. Shepard, and P. J. Restle, “Uniform-Phase Uniform-Amplitude Resonant-Load Global Clock Distributions,” IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 102-109, Jan. 2005.

[6] S. C. Chan, K. L. Shepard, and P. J. Restle, “Distributed Differential Oscillators for Global Clock Networks,” IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2083-2094, Sept. 2006.

[7] G. Lemieux, M. Alimadadi, S. Sheikhaei, S. Mirabbasi, and P. Palmer “SoC Energy Savings = Reduce + Reuse + Recycle: A Case Study Using a 660MHz DC-DC Converter with Integrated Output Filter,” Proc. IEEE Canadian Conf. on Electrical and Computer Engineering, pp. 947-950, 2008.

[8] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, “Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1396-1402, Nov. 2002.

[9] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, “System Level Analysis of Fast, per-core DVFS using On-chip Switching Regulators,” IEEE Int. Symposium on High Performance Computer Architecture, pp. 123-134, 2008.

[10] S. Abedinpour, B. Bakkaloglu, and S. Kiaei, “A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 356-357, 2006.

[11] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, and P. Palmer, “A 3GHz Switching DC-DC Converter Using Clock-Tree Charge-Recycling in 90nm CMOS with Integrated Output Filter,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 532-533, 2007.

[12] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “Energy Recovery from High-frequency Clocks using DC-DC Converters,” Proc. IEEE Int. Symposium on Very Large Scale Integration, pp. 162-167, 2008.

[13] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “A 4GHz Non-resonant Clock Driver with Power-grid Energy Return,” submitted to IEEE Trans. Circuits and Systems II, 2008.

[14] S. C. O. Mathuna, T. O'Donnell, W. Ningning, and K. Rinne, “Magnetics on Silicon: an Enabling Technology for Power Supply on Chip,” IEEE Trans. Power Electronics, vol. 20, no. 3, pp. 585-592, May 2005.

[15] C. Yue and S. Wong, “On-chip Spiral Inductors with Patterned Ground Shields for Si-based RF ICs,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 743-752, 1998.

Page 20: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

19

[16] J. N. Burghartz, “Progress in RF inductors on silicon-understanding substrate losses,” IEEE Int. Electron Devices Meeting Dig. Tech. Papers, pp. 523-526, 1998.

[17] J. N. Burghartz, D. C. Edelstein, M. Soyuer, H. A. Ainspan, and K. A. Jenkins, “RF circuit design aspects of spiral inductors on silicon,” IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 2028-2034, Dec. 1998.

[18] J. Gil and S. Hyungcheol, “A Simple Wide-band On-chip Inductor Model for Silicon-based RF ICs,” IEEE Trans. Microwave Theory and Techniques, vol. 51, no. 9, pp. 2023-2028, Sep. 2003.

[19] A. M. Niknejad and R. G. Meyer, “Analysis, Design, and Optimization of Spiral Inductors and Transformers for Si RF IC’s,” IEEE J. Solid-State Circuits, vol. 33, no. 10, pp. 1470-1481, Oct. 1998.

[20] G. Villar, E. Alarcon, F. Guinjoan, and A. Poveda, “Optimized Design of MOS Capacitors in Standard CMOS Technology and Evaluation of their Equivalent Series Resistance for Power Applications,” Proc. IEEE Int. Symposium Circuits and Systems, pp. 451-454, 2003.

[21] M. Alimadadi, “Recycling Clock Network Energy in High-performance Digital Designs using On-chip DC-DC Converters”, Ph.D. Thesis, Dept. of Electrical and Computer Engineering, The University of British Columbia, 2008.

[22] G. Schrom, P. Hazucha, J. Hahn, D. S. Gardner, B. A. Bloechel, G. Dermer, S. G. Narendra, T. Karnik, and V. De, “ A 480-MHz, Multi-phase Interleaved Buck DC-DC Converter with Hysteretic Control,” Proc. IEEE Power Electronics Specialists Conf., pp. 4702-4707, 2004.

[23] P. Hazucha, G. Schrom, J. Hahn, B. A. Bloechel, P. Hack, G. E. Dermer, S. Narendra, D. Gardner, T. Karnik, V. De, and S. Borkar, “A 233MHz 80%-87% Efficient Four-phase DC-DC Converter Utilizing Air-core Inductors on Package,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 838-845, Apr. 2005.

[24] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “A 660MHz ZVS DC-DC Converter Using Gate-driver Charge-recycling in 0.18µm CMOS with an Integrated Output Filter,” Proc. IEEE Power Electronics Specialists Conf., 2008.

[25] M. D. Mulligan, B. Broach, and T. H. Lee, “A 3MHz Low-voltage Buck Converter with Improved Light Load Efficiency,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 528-529, 2007.

[26] J. Wibben and R. Harjani, “A High-efficiency DC-DC Converter Using 2nH Integrated Inductors,” IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 844-854, 2008.

Page 21: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

20

Figure 1. Recycling clock energy with a DC-DC converter

(a) Raw efficiency (b) Effective efficiency

Figure 2. Power dissipation block diagram

Page 22: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

21

(a) A typical buck converter

(b) Simplified circuit diagram (c) Idealized timing diagrams Figure 3. Integrated clock driver/buck converter

Figure 4. Reference clock driver for the integrated clock driver/buck converter

Page 23: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

22

Figure 5. Integrated clock driver/buck converter

Figure 6. Simplified circuit model for analyzing Vclk during Tfall

Page 24: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

23

(a) A typical boost converter (c) A typical buck-boost converter

(b) Simple clock driver/boost converter (d) Simple clock driver/buck-boost converter Figure 7. Integrated clock driver converter circuits

Page 25: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

24

Figure 8. Circuit diagram of the integrated clock driver/boost converter

Figure 9. Circuit diagram of the integrated clock driver/buck-boost converter

Page 26: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

25

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

0 0.2 0.4 0.6 0.8 1

Time (nSec)

Vo

ltag

e (V

)

VclkVclk-refVload

-0.1

-0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0 0.2 0.4 0.6 0.8 1

Time (nSec)

Cur

rent

(A)

LfMnMp

(a) Voltage waveforms (b) Current waveforms

Figure 10. Simulated waveforms of points of interest for the integrated clock driver/buck converter

0

0.25

0.5

0.75

1

10 20 30 40 50 60 70 80

Duty Ratio (%)

Vo

ut (V

)

Iout=30

Iout=50

Iout=70

Iout=100

0

50

100

150

200

250

300

40 50 60 70 80 90 100

Iout (mA)

Eff

ecti

ve E

ffic

ien

cy (%

)

D=30%D=40%D=50%D=60%D=70%

(a) Output voltage vs. duty cycle (b) Effective efficiency vs. output current

Figure 11. Simulation results of the integrated clock driver/buck converter

0

0.5

1

1.5

2

2.5

30 40 50 60 70 80

Duty Ratio (%)

Vou

t (V

)

Iout=10mAIout=30mAIout=50mAIout=70mAIout=100mA

0

25

50

75

100

125

0 20 40 60 80 100

Iout (mA)

Eff

ectiv

e E

ffic

ienc

y (%

)

D=40%D=50%D=60%D=70%D=80%

(a) Output voltage vs. duty cycle (b) Effective efficiency vs. output current

Figure 12. Simulation results of the integrated clock driver/boost converter

Page 27: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

26

-2

-1.6

-1.2

-0.8

-0.4

0

10 20 30 40 50 60 70

Duty Ratio (%)

Vou

t (V

)

Iout=10mAIout=30mAIout=50mA

Iout=70mAIout=90mA

0

20

40

60

80

100

0 20 40 60 80 100

Iout (mA)

Eff

ectiv

e E

ffic

ienc

y (%

)

D=20%D=30%D=40%D=50%D=60%D=70%

(a) Output voltage vs. duty cycle (b) Effective efficiency vs. output current Figure 13. Simulation results of the integrated clock driver/buck-boost converter

0

50

100

150

200

250

300

350

400

0.1 1 10 100Fsw (GHz)

Ls

(pH

), R

s (m

Oh

m),

Q x

10

Ls (pH)

Rs (mOhm)

Q x 10

Fsw = 3GHz, Lseries = 320pH, Rseries = 260mΩ, Cs1 ≅ Cs2 ≅ 140fF, Rs1 ≅ Rs2 ≅ 280Ω, Q = 20

(a) Lseries, Rseries and Q vs. Fsw (b) Simplified π model Figure 14. Modeling of the on-chip inductor for the buck converter chip

(a) Chip block diagram (b) Chip micrograph Figure 15. Implementation of the integrated clock driver/buck converter

Page 28: Energy Recycling from Multi-GHz Clocks using Fully ...lemieux/publications/alimadadi-jssc2008draft.pdf · Large digital chips use a significant amount of energy to distribute a multi-GHz

27

Fsw Sweep (D=50%)

0

0.2

0.4

0.6

0.8

1

1.2

30 40 50 60 70 80 90 100 110

Iout (mA)

Vou

t (V

)3GHz2.5GHz

Fsw Sweep (D=50%)

0

20

40

60

80

100

120

30 40 50 60 70 80 90 100 110

Iout (mA)

Pow

er (m

W)

3GHz2.5GHz

(a) Output voltage (b) Input power

Figure 16. Measured results indicating the effect of Fsw on (a) Vout and (b) Pin1 and Pin2

Fsw Sweep (D=50%)

0

20

40

60

80

100

120

30 40 50 60 70 80 90 100 110

Iout (mA)

Raw

Eff

icie

ncy

(%

)

3GHz2.5GHz

Fsw Sweep (D=50%)

0

40

80

120

160

200

240

30 40 50 60 70 80 90 100 110

Iout (mA)

Eff

ectiv

e E

ffic

ienc

y (%

)

3GHz2.5GHz

(a) Raw efficiency (b) Effective efficiency

Figure 17. Measured results showing the effect of Fsw on η and ηeff

Table 1. Summary of performance comparison between integrated switching converters Previous Work This Work Converter type

4-Phase Buck [22] 2-Phase Buck [10] Buck

Technology 90nm CMOS

0.18µm SiGe RF BiCMOS

90nm CMOS

Layout Area (mm2) 0.14 * (excludes L) 27 0.27

Switching frequency, Fsw (MHz) 480 45 3 000

Inductor, LF (pH) 3 600 (per phase)

11 000 (per phase) 320

Capacitor, CF (pF) 2 500 6 000 350 Supply Voltage, Vin (V) 1.8 2.8 1.0 Output Voltage, Vout (V) 0.9 1.5 ~ 2 0.53 ~ 0.75 Output Voltage Ripple < 5% Output Current, Iout (mA) 500 200 40 ~ 100

Effective Efficiency, ηeff (%) 72 65 184 (Vout=0.75V) 102 (Vout=0.63V) 74 (Vout=0.53V)

* Layout area was reported in [23]

Pin1

Pin2