Starlee Contents

8/13/2019 Starlee Contents


CHAPTER 1

INTRODUCTION

In current microprocessors, the number of wires used for inter-module

communication has skyrocketed. Furthermore, the increased complexity and high level

of integration require higher wire densities, and coupling capacitance has dominated
total wire capacitance for several technologies already. A high coupling capacitance
ratio is unfavorable in conventional busses because adjacent wires can switch
in opposite directions, yielding a worst-case Miller capacitance factor
(MCF) of 2. For example, when MCF = 2 the coupling capacitance accounts for over 80% of the total
interconnect capacitance for a minimum-pitch intermediate metal layer in
65 nm [5]. It is possible to reduce coupling capacitance by increasing spacing or
introducing shielding, but this comes at the cost of significant area penalties [1]. Hence,

a key challenge in interconnect design is to reduce the worst-case MCF while

maintaining the same physical footprint of the interconnect, thereby reducing the

effective wire capacitance and interconnect energy consumption.

In this project, a new encoding technique is presented that achieves a worst-case MCF of 1
while preserving the best-case MCF of 0. This is done by controlling
the timing of rising and falling transitions, namely by always performing rising
transitions on the negative edge of the clock and falling transitions on the positive edge of

the clock (or vice versa). Since the worst-case switching is separated by as much as one

phase (half clock cycle), this technique remains robust against process variation. Hence,

both the average and worst-case energy can be reduced without impacting the sensitivity

to process variation. Average energy savings will aid battery life and typical energy costs,

but worst-case energy is also a meaningful metric in terms of thermal management and

peak demand for power grids and decoupling capacitance [13]. These savings are
accomplished at the expense of minimal encoder logic with half-cycle latency and
additional clocking. However, we find that the logic and clocking overhead is small in

long interconnects where interconnect power consumption is dominant, and also show



that the potential latency overhead can be eliminated or minimized in multi-cycle

interconnects.
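The effect of separating rising and falling transitions in time can be illustrated with a small enumeration. This is only a sketch under a simplified model: each wire has a single neighbor, a neighbor that switches half a cycle away is treated as quiet, and the function names are hypothetical rather than taken from this project.

```python
from itertools import product

# Transition of a wire in one cycle: +1 rising, -1 falling, 0 no change.
TRANSITIONS = (+1, 0, -1)


def mcf_conventional(victim, neighbor):
    """MCF seen by the victim when all wires switch on the same clock edge."""
    if victim == 0:
        return 0  # a quiet wire charges no coupling capacitance
    if neighbor == 0:
        return 1  # the neighbor holds still
    return 0 if neighbor == victim else 2  # same direction / opposite direction


def mcf_edge_encoded(victim, neighbor):
    """MCF when rising and falling transitions use different clock phases."""
    if victim == 0:
        return 0
    # An opposite-direction neighbor switches half a cycle away, so it is
    # quiet while the victim switches. Only a same-direction neighbor
    # switches together with the victim.
    return 0 if neighbor == victim else 1


def worst_case(mcf):
    """Largest MCF over all victim/neighbor transition pairs."""
    return max(mcf(v, n) for v, n in product(TRANSITIONS, repeat=2))


print(worst_case(mcf_conventional))  # 2
print(worst_case(mcf_edge_encoded))  # 1
```

Under this model the best case (same-direction switching, MCF of 0) is unchanged, while the opposite-direction case drops from 2 to 1.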

1.1 SELF-TRANSITIONS AND COUPLING TRANSITIONS

Self-transitions are defined as transitions on the capacitance between a bus

line and the substrate (ground) while coupling transitions are defined as transitions

on the capacitance between adjacent lines. Figure 1.1 shows a simplified bus model

with coupling (ignoring all the resistances). Cs is the self-capacitance from each

bus line to ground; Cc is the coupling-capacitance between two adjacent lines. The

average power consumption on the bus is given by:

$$P_{avg} = (\alpha_s C_s + \alpha_c C_c)\, V_{dd}^2\, f \tag{1.1}$$

where $\alpha_s$ is the average number of self-transitions per bus cycle, $\alpha_c$ is the
average number of coupling transitions per bus cycle, $V_{dd}$ is the supply voltage,
and $f$ is the bus clock frequency; only the charging transitions
that require current flow from the power supply are counted. For example, only
the 0 → 1 transitions are counted as self-transitions for power consumption.
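The power model can be sketched in a few lines. The code below assumes the usual form of the bus power expression, (alpha_s * Cs + alpha_c * Cc) * Vdd^2 * f, and counts only 0 → 1 self-transitions as described above. The function names, the example word sequence and the capacitance values are illustrative assumptions, not data from this project.

```python
def count_self_transitions(words, width):
    """Average number of 0 -> 1 transitions per bus cycle for a word sequence."""
    mask = (1 << width) - 1
    rising = 0
    for prev, curr in zip(words, words[1:]):
        # Bits that were 0 in the previous word and are 1 in the current one.
        rising += bin(~prev & curr & mask).count("1")
    return rising / (len(words) - 1)


def average_bus_power(alpha_s, alpha_c, c_self, c_couple, vdd, freq):
    """Average bus power in watts (assumed form of the power equation)."""
    return (alpha_s * c_self + alpha_c * c_couple) * vdd ** 2 * freq


# Hypothetical 4-bit bus traffic.
words = [0b0000, 0b1010, 0b0101, 0b1111]
alpha_s = count_self_transitions(words, width=4)
print(alpha_s)  # 2.0

# alpha_c = 1.0 and the capacitances are placeholder values.
power = average_bus_power(alpha_s, 1.0, 50e-15, 100e-15, vdd=1.0, freq=1e9)
print(power)
```

Counting coupling transitions depends on how charging of Cc between adjacent lines is defined, so the coupling activity is passed in directly here instead of being derived from the word sequence.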

Figure 1.1 Bus model with self and coupling capacitances.

1.2 MILLER-COUPLING CAPACITANCE

A simple way of performing timing analysis in the presence of crosstalk noise is

to replace the coupling capacitance by an equivalent Miller capacitance to account for

the slow down or the speed up of the victim transition. Hence, we are effectively



modeling the delay noise on the victim by simply scaling the coupling capacitance with

the appropriate Miller-coupling factor (MCF). The simple approximation leads to a

decoupling between the aggressor-victim interconnects (as shown in Figure 1.2) and

allows us to compute the signal arrival times using the existing STA framework.

Figure 1.2 Decoupling of the aggressor-victim nets by using Miller-coupling

capacitance

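The exact value of the Miller-coupling factor for the victim (MCFvictim) depends
on the mutual switching directions and the ratio of the slopes of the aggressor and victim
waveforms:

$$\mathrm{MCF}_{victim} = 1 \pm \frac{a}{v} \tag{1.2}$$

where a and v are the slopes of the aggressor and victim waveforms, respectively.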

If the aggressor-victim nets switch in mutually opposite directions (as shown in
Figure 1.2), then MCFvictim is obtained by adding 1 to the ratio of the
slopes (a/v). Conversely, if the aggressor-victim nets switch in the same direction,
then MCFvictim is obtained by subtracting (a/v) from 1. Suppose the aggressor-victim
nets switch in opposite directions with equal slopes (i.e., a = v). Using
Equation 1.2, we obtain the commonly used MCFvictim value of 2. In such cases,
the Miller capacitance used to decouple the aggressor-victim interconnects is twice the
size of the original coupling capacitance. Similarly, if the aggressor-victim nets switch
in the same direction with equal slopes, then we obtain an MCFvictim value of 0.
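The rule described above can be written as a small helper. This is a minimal sketch: the function names are hypothetical, and a and v are taken as slope magnitudes of the aggressor and victim waveforms.

```python
def mcf_victim(a, v, same_direction):
    """Miller-coupling factor for the victim.

    a, v: slope magnitudes of the aggressor and victim waveforms.
    same_direction: True if both nets switch in the same direction.
    """
    ratio = a / v
    return 1 - ratio if same_direction else 1 + ratio


def effective_capacitance(c_ground, c_couple, mcf):
    """Victim load after replacing the coupling capacitance by its Miller equivalent."""
    return c_ground + mcf * c_couple


print(mcf_victim(1.0, 1.0, same_direction=False))  # 2.0
print(mcf_victim(1.0, 1.0, same_direction=True))   # 0.0
```

With equal slopes the helper reproduces the two commonly quoted values, 2 for opposite switching and 0 for switching in the same direction.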

1.3 FLIP-FLOP

In electronics, a flip-flop or latch is a circuit that has two stable states and can be

used to store state information. The circuit can be made to change state by signals

applied to one or more control inputs and will have one or two outputs. It is the basic

storage element in sequential logic. Flip-flops and latches are a fundamental building

block of digital electronics systems used in computers, communications, and many

other types of systems.

Flip-flops and latches are used as data storage elements. Such data storage can

be used for storage of state, and such a circuit is described as sequential logic. When
used in a finite-state machine, the output and next state depend not only on its current

input, but also on its current state (and hence, previous inputs). It can also be used for

counting of pulses, and for synchronizing variably-timed input signals to some

reference timing signal.

Flip-flops can be either simple (transparent or opaque) or clocked (synchronous

or edge-triggered); the simple ones are commonly called latches. The word latch is

mainly used for storage elements, while clocked devices are described as flip-flops.

1.3.1 Dual Edge Flip-Flops

The edge encoding technique requires dual-edge flip-flops and hence the

number of flip-flops placed in multi-cycle interconnects is inevitably increased

compared to conventional multi-cycle interconnects with single-edge flip-flops.



Along the critical path of multi-cycle interconnects, both the total setup time

and CLK-Q delay in flip-flops increase as well. Hold time is not a concern even when

using dual-edge flip-flops because the interconnect paths are well-defined with several

repeaters and large wire load, and thus short paths between flip-flops do not exist.

Time-borrowing flip-flops have zero or negative setup time, providing

performance benefits over master-slave flip-flops. Therefore, in the proposed edge

encoding technique, time-borrowing flip-flops are considered for the inserted flip-flops
to mitigate the increase in total setup time and variation.

Several types of time-borrowing dual-edge flip-flops from [18] were considered to absorb the setup time
and mismatch between interconnect paths.
We selected the pulse-triggered dual-edge flip-flop in Figure 1.3 for the proposed

edge encoding because it is more area-efficient and the D-Q path is shorter.

Figure 1.3 Schematic of time borrowing pulsed dual edge flip-flop.

1.4 TIME BORROWING WITH LATCHES

Time borrowing relies on a property of latches: a latch is transparent for the duration
of the active clock pulse. This relaxes the normal edge-to-edge timing requirements of
synchronous designs, so that a path that is too long, and would otherwise determine the
maximum frequency, can borrow time from a shorter path in subsequent latch-to-latch
stages.



Figure 1.4 Timing paths in a sample circuit.

Figure 1.4 has two timing paths: Path 1 ends at a negative-level latch (2),
while Path 2 ends at the edge-triggered register (3). Depending on
the delay through logic cloud A in Path 1, there are two timing
scenarios (Figures 1.6 and 1.7).

Figure 1.5 Timing relationships set by the system clock.

In Case A, data from Logic A arrives at Latch 2 before the latch is enabled. In this case,
the latch behaves like an edge-triggered register, and there is no need to borrow any
time to achieve the timing goal.

Figure 1.6 Logic A is fast enough, so no borrowing is necessary.

In Case B, the negative clock edge enables the latch before valid data from Logic A is
present at the input of the latch. The latch becomes transparent, and when the data from
Logic A arrives it passes straight through the latch to Logic B. Logic A thus uses
some of the time reserved for Logic B, and the circuit works as long as Logic B can spare
this extra time and still complete its propagation. Timing analysis treats the end of the
borrowed time as the start of the following path.

Figure 1.7 Logic A borrows time from Logic B.

During STA, the timing reports treat the enabled latch as a simple delay
element (Figure 1.8).

Figure 1.8 When the latch is enabled, it becomes a passive delay.

1.5 REPEATER INSERTION

Repeater insertion is a technique for reducing the time delay associated with

long wire lines in integrated circuits. The technique involves cutting the long wire into
two or more short wires and inserting a repeater between each new pair of short wires.

The time it takes for a signal to travel from one end of a wire to the other end is

known as wire-line delay or just delay. In an integrated circuit, this delay is

characterized by RC, the resistance of the wire (R) multiplied by the wire's capacitance (C).
Thus, if the wire's resistance is 100 ohms and its capacitance is 0.01 microfarad
(µF), the wire's delay is one microsecond (µs).

The resistance of a wire on an integrated circuit is directly proportional, or

linear, according to the wire's length. If a 1 mm length of the wire has 100 ohms

resistance, then a 2 mm length will have 200 ohms resistance.

The capacitance of a wire also increases linearly along its length. If a 1 mm

length of the wire has 0.01 µF capacitance, a 2 mm length of the wire will have 0.02 µF,
a 3 mm wire will have 0.03 µF, and so on as shown in Table 1.1.

Thus, the time delay through a wire increases with the square of the wire's

length. This is true, to first order, for any wire whose cross-section remains constant

along the length of the wire.

Wire length   Resistance   Capacitance   Time delay
1 mm          100 ohm      0.01 µF       1 µs
2 mm          200 ohm      0.02 µF       4 µs
3 mm          300 ohm      0.03 µF       9 µs

Table 1.1 Analysis of wire delay with wire length.

The interesting consequence of this behavior is that, while a single 2 mm length
of wire has a delay of 4 µs, two separate 1 mm wires have a delay of only 1 µs each, or 2 µs in total.
The two separate wires cover the same distance in half the time. By cutting the wire in
half, we can double its speed.
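The quadratic growth of wire delay, and the gain from cutting the wire, can be checked numerically. The sketch below uses the per-millimetre values quoted in the text; the function names and the optional repeater delay parameter are illustrative assumptions.

```python
R_PER_MM = 100.0     # ohms per mm (value used in the text)
C_PER_MM = 0.01e-6   # farads per mm (0.01 microfarad, value used in the text)


def wire_delay(length_mm):
    """First-order RC delay in seconds of an unbroken wire."""
    return (R_PER_MM * length_mm) * (C_PER_MM * length_mm)


def segmented_delay(length_mm, segments, repeater_delay=0.0):
    """Delay of the same wire cut into equal segments joined by repeaters."""
    segment_length = length_mm / segments
    wires = segments * wire_delay(segment_length)
    repeaters = (segments - 1) * repeater_delay
    return wires + repeaters


for length in (1, 2, 3):
    print(length, "mm:", wire_delay(length), "s")

print("2 mm in two segments:", segmented_delay(2, 2), "s")
```

With an ideal (zero-delay) repeater the 2 mm wire takes about 2 µs instead of 4 µs. A non-zero repeater_delay shows how the repeater's own delay eats into that gain.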

An active circuit must be placed between the two separate wires so as to move
the signal from one to the next. An active circuit used for such a purpose is known as a
repeater. In a CMOS integrated circuit, the repeater is often a simple inverter.

Reducing the delay of a wire by cutting it in half and inserting a repeater is

known as repeater insertion. The cost of this procedure is the additional new delay

through the repeater itself, plus power cost because the repeater is an active circuit that

must be powered, whereas the plain unrepeated wire was originally an unpowered

passive component.

1.6 BASIC LOW POWER DIGITAL DESIGN

Moore's law, which states that the number of transistors that can be placed
inexpensively on an integrated circuit will double approximately every two years, has
often been subject to the following criticism: while it boldly states the blessing of
technology scaling, it fails to expose its bane. A direct consequence of Moore's law is

that the power density of the integrated circuit increases exponentially with every

technology generation. This implicit trend has arguably brought about some of the

most important changes in electronic and computer designs. In the 1970s, most
popular electronics manufacturing technologies used bipolar and nMOS transistors.
However, bipolar and nMOS transistors consume energy even in a stable combinational
state, and consequently, by the 1980s, the power density of bipolar designs was considered
too high to be sustainable.

1.6.1 CMOS Transistor Power Consumption

The power consumption of a CMOS transistor can be divided into three different

components: dynamic, static (or leakage) and short circuit power consumption.

Figure 1.9 illustrates the three components of power consumption. Dynamic and

short circuit power are also collectively known as switching power, and are consumed

when transistors change their logic state, but leakage power is consumed merely

because the circuit is powered-on.

Switching power, which includes both dynamic power and short-circuit power,

is consumed when signals through CMOS circuits change their logic state, resulting in

the charging and discharging of load capacitors. Leakage power is primarily due to the

sub-threshold currents and reverse biased diodes in a CMOS transistor.

$$P_{total} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage} \tag{1.3}$$

Figure 1.9 The dynamic, short circuit and leakage power components



1.6.2 Switching Power

When signals change their logic state in a CMOS transistor, energy is drawn

from the power supply to charge up the load capacitance from 0 to Vdd. For the inverter

example in Figure 1.9, part of the energy drawn from the power supply is dissipated as heat in
the pMOS transistor during the charging process. Energy is needed whenever charge is

moved against some potential. Thus, dE = d(QV). When the output of the inverter

transitions from logical 0 to 1, the load capacitance is charged. The energy drawn from

the power supply during the charging process is given by,

$$dE_P = d(VQ) = V_{dd}\, dQ_L$$

since the power supply provides power at a constant voltage $V_{dd}$. Now, since $Q_L = C_L V_L$,
we have:

$$dQ_L = C_L\, dV_L$$

Therefore,

$$dE_P = V_{dd}\, C_L\, dV_L$$

Integrating for full charging of the load capacitance,

$$E_P = \int_0^{V_{dd}} V_{dd}\, C_L\, dV_L = C_L V_{dd}^2 \tag{1.4}$$

Thus a total of $C_L V_{dd}^2$ energy is drawn from the power source. The energy $E_L$ stored
in the capacitor at the end of the transition can be computed as follows:

$$dE_L = d(VQ) = V_L\, dQ_L$$

where $V_L$ is the instantaneous voltage across the load capacitance, and $Q_L$ is the
instantaneous charge of the load capacitance during the charging process. Therefore,

$$dE_L = V_L\, C_L\, dV_L$$

Integrating for full charging of the load capacitance,

$$E_L = \int_0^{V_{dd}} C_L V_L\, dV_L = \frac{C_L V_{dd}^2}{2} \tag{1.5}$$
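The two integrals can be verified numerically. The following sketch integrates both expressions with a midpoint rule; the load capacitance and supply voltage are arbitrary example values, and the function name is hypothetical.

```python
def charging_energies(c_load, vdd, steps=1000):
    """Numerically integrate the supply energy and the stored energy.

    Returns (e_supply, e_stored) in joules for charging c_load from 0 to vdd.
    """
    dv = vdd / steps
    e_supply = 0.0
    e_stored = 0.0
    for i in range(steps):
        v_load = (i + 0.5) * dv            # midpoint of the voltage step
        e_supply += vdd * c_load * dv      # dE_P = Vdd * C_L * dV_L
        e_stored += v_load * c_load * dv   # dE_L = V_L * C_L * dV_L
    return e_supply, e_stored


c_load, vdd = 10e-15, 1.2  # example values: 10 fF load, 1.2 V supply
e_supply, e_stored = charging_energies(c_load, vdd)
print(e_supply, c_load * vdd ** 2)       # both close to C_L * Vdd^2
print(e_stored, c_load * vdd ** 2 / 2)   # both close to C_L * Vdd^2 / 2
```

The difference between the two results, half of the energy drawn from the supply, is what is dissipated in the pull-up path during charging.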



As the input voltage rises, at time t0 we have Vin > VTn, i.e., the input voltage
becomes higher than the threshold voltage of the nMOS transistor. At this time a
short-circuit current path is established. The short-circuit current first increases as the nMOS
transistor turns on and then decreases
until, after t1, we have Vin > (Vdd - VTp), i.e., the pMOS transistor turns off, signaling the

end of short-circuit current. Therefore, in the duration when (VTn < Vin < (Vdd - VTp))

holds, there will be a conductive path open between Vdd and GND because both the

nMOS and pMOS devices will be simultaneously on. Short-circuit power is typically

estimated as:

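$$P_{short\text{-}circuit} = \frac{1}{12}\, k\, \tau\, F_{clk}\, (V_{dd} - 2V_t)^3 \tag{1.6}$$

where k is the transistor gain factor, τ is the rise/fall time of the input signal, Fclk is the
clock frequency, and Vt is the threshold voltage.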
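As a rough illustration, the estimate can be evaluated directly. The sketch assumes the standard form of this expression, k * tau * Fclk * (Vdd - 2*Vt)^3 / 12, with equal nMOS and pMOS threshold magnitudes. The function name and all numeric values are illustrative assumptions.

```python
def short_circuit_power(k, tau, f_clk, vdd, vt):
    """Estimated short-circuit power in watts.

    k: transistor gain factor (A/V^2), tau: input rise/fall time (s),
    f_clk: clock frequency (Hz), vdd: supply voltage (V), vt: threshold (V).
    """
    overdrive = vdd - 2 * vt
    if overdrive <= 0:
        # The window VTn < Vin < Vdd - VTp is empty, so the nMOS and pMOS
        # devices are never on at the same time.
        return 0.0
    return k * tau * f_clk * overdrive ** 3 / 12


# Placeholder device and clock values.
print(short_circuit_power(k=1e-4, tau=100e-12, f_clk=1e9, vdd=1.2, vt=0.3))
print(short_circuit_power(k=1e-4, tau=100e-12, f_clk=1e9, vdd=0.5, vt=0.3))  # 0.0
```

The cubic dependence on (Vdd - 2Vt) is why lowering the supply voltage toward twice the threshold voltage sharply reduces this component.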

1.6.4 Leakage Power

The third component of power dissipation in CMOS circuits, as shown in
Equation 1.3, is the static or leakage power. Even though a transistor is in a stable logic

state, just because it is powered-on, it continues to leak small amounts of power at

almost all junctions due to various effects.

1.6.4.1 Reverse Biased Diode Leakage

The reverse biased diode leakage is due to the reverse bias current in the

parasitic diodes that are formed between the diffusion region of the transistor and

substrate. It results from minority carrier diffusion and drift near the edge of depletion

regions, and also from the generation of electron hole pairs in the depletion regions of

reverse-bias junctions. As shown in Figure 1.11, when the input of the inverter in Figure 1.9
is high, a reverse potential difference of Vdd is established between the drain and the n-well,
which causes diode leakage through the drain junction. In addition, the n-well
region of the pMOS transistor is also reverse biased with respect to the p-type substrate.

This also leads to reverse bias leakage at the n-well junction.



The reverse bias leakage current is typically expressed as,

$$I_{rbdl} = A\, J_s \left(e^{qV_{bias}/kT} - 1\right) \tag{1.7}$$

where A is the area of the junction, Js is the reverse saturation current density,
Vbias is the reverse bias voltage across the junction, and Vth = kT/q is the thermal
voltage. Reverse biased diode leakage will become even more important as we continue to

heavily dope the n- and p-regions. As a result, zener and band-to-band tunneling will

also become contributing factors to the reverse bias current.
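The diode equation above is easy to evaluate. The sketch below assumes the sign convention that a reverse bias is entered as a negative Vbias; the junction area and saturation current density are placeholder values, and the function names are hypothetical.

```python
import math

Q_ELECTRON = 1.602176634e-19   # elementary charge in coulombs
K_BOLTZMANN = 1.380649e-23     # Boltzmann constant in J/K


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def diode_current(area, js, v_bias, temp_k=300.0):
    """Junction current A * Js * (exp(q*Vbias/kT) - 1) in amperes.

    Assumption: v_bias is negative for a reverse-biased junction.
    """
    return area * js * (math.exp(v_bias / thermal_voltage(temp_k)) - 1.0)


print(thermal_voltage(300.0))  # roughly 0.0259 V at room temperature

# Placeholder junction: 1e-12 m^2 area, 1e-3 A/m^2 saturation current density.
print(diode_current(area=1e-12, js=1e-3, v_bias=-1.0))
```

For a reverse bias of more than a few thermal voltages the exponential term vanishes, so the leakage saturates at about A * Js regardless of the exact bias.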

Figure 1.11 Components of Leakage Power

1.7 CROSSTALK NOISE

As the feature sizes have been shrinking with process-technology scaling, the

spacing between adjacent interconnect wires keeps decreasing in every process

technology. Also, while the lateral width of interconnect wires has been scaled down
significantly, their vertical height has not been scaled in proportion (as shown in Figure

1.12). Both these trends lead to a very rapid increase in the amount of coupling

capacitance (essentially like parallel-plate capacitors) between the wires. We know that

coupling capacitance accounts for more than 85% of the total interconnect capacitance

in the 90nm technology node.

Figure 1.12 Shrinking of wire geometries in the nanometer process technology
leads to an increase in the amount of coupling capacitance.

More aggressive technology scaling will only lead to an increase in the overall

contribution of the coupling capacitances to the total interconnect capacitance.

Therefore, signal-integrity issues such as crosstalk noise have become important when

performing timing verification of VLSI chips. Due to capacitive coupling, the switching

characteristic of a net is affected by simultaneous switching of nets that are in close

physical proximity. The net under analysis which suffers from coupling noise is referred

to as victim, and all neighboring nets which contribute to coupling noise on the victim

are termed as aggressors. Figure 1.13 illustrates coupling noise injected on the victim

due to the rising transition of its aggressor.

As the aggressor transition occurs, the voltage at the victim gets pulled up due to

the AC current flowing through the coupling capacitance. The resulting glitch in the

victim voltage due to the aggressor transition is referred to as the coupling-noise pulse. The

peak of the coupling-noise pulse usually occurs when the aggressor transition is

completed and there is no more coupling current flowing through the coupling

capacitance. Finally, the coupling-noise pulse gradually dies down once the aggressor

transition is completed and the victim node discharges the accumulated charge.

Figure 1.13 The aggressor transition results in a coupling-noise pulse on the
victim.

It has become imperative to model the signal-integrity issues that can arise due

to the charge transfer through the coupling capacitances for VLSI chips in the

nanometer process technology.

1.7.1 Preliminaries of Crosstalk Effects

Crosstalk can affect signal delays by changing the times at which signal

transitions occur. For example, consider the signal waveforms on the cross-coupled nets

A, B, and C in Figure 1.14. Because of capacitive cross-coupling, the transitions on net

A and net C can affect the time at which the transition occurs on net B.

A rising-edge transition on net A at the time shown in Figure 1.14 can cause the

transition to occur later on net B, possibly contributing to a setup violation for a path

containing B.



Figure 1.14 Transition Slowdown or Speedup Caused by Crosstalk.

Similarly, a falling-edge transition on net C can cause the transition to occur

earlier on net B, possibly contributing to a hold violation for a path containing B. The

logic effects of crosstalk on steady-state nets are due to the cross-coupled nets as shown

in Figure 1.15.

Figure 1.15 Glitch Due to Crosstalk.

Net B should be constant at logic zero, but the rising edge on net A causes a

noise bump or glitch on net B. If the bump is sufficiently large and wide, it can cause an

incorrect logic value to be propagated to the next gate in the path containing net B.



A net that receives undesirable cross-coupling effects from a nearby net is called

a victim net. A net that causes these effects in a victim net is called an aggressor net.

Note that an aggressor net can itself be a victim net; and a victim net can also be an

aggressor net. The terms aggressor and victim refer to the relationship between two nets

being analyzed. The timing impact of an aggressor net on a victim net depends on

several factors:

The amount of cross-coupled capacitance

The relative times and slew rates of the signal transitions

The switching directions (rising, falling)

The combination of effects from multiple aggressor nets on a single victim net

Figure 1.16 illustrates the importance of timing considerations for calculating

crosstalk effects.

Figure 1.16 Effects of Crosstalk at Different Arrival Times.

The aggressor signal A has a range of possible arrival times, from early to late.

If the transition on A occurs at about the same time as the transition on B, it could cause

the transition on B to occur later as shown in the figure, possibly contributing to a setup

violation, or it could cause the transition to occur earlier, possibly contributing to a hold



violation. If the transition on A occurs at an early time, it induces an upward bump or

glitch on net B before the transition on B, which has no effect on the timing of signal B.

However, a sufficiently large bump can cause unintended current flow by forward-

biasing a pass transistor. Similarly, if the transition on A occurs at a late time, it induces

a bump on B after the transition on B, also with no effect on the timing of signal B.

However, a sufficiently large bump can cause a change in the logic value of the net,

which can be propagated down the timing path.



CHAPTER 2

LITERATURE REVIEW

K. Nose and T. Sakurai proposed a new buffer insertion scheme for bi-directional
buses, the dual-rail bus (DRB) scheme, which does not have
noise problems [3]. They also proposed a high-speed buffer insertion scheme for
uni-directional buses that makes use of staggered firing; the staggered firing bus (SFB)
was proposed and measured. As device dimensions are scaled down, interconnect RC
delay becomes the dominant performance limiter in high-performance VLSIs.

Another issue in the submicron interconnects is a drastic increase of coupling

capacitance due to the higher aspect ratio to reduce the interconnect resistance. The

increase of the coupling capacitance degrades signal integrity, inducing noise problems

and delay fluctuation problems. The original buffer insertion, however, cannot be

applied to bi-directional buses because the buffer is uni-directional in nature. These

circuits turn out to be prone to malfunctions when there is a noise from adjacent lines in

scaled down interconnect systems where capacitive coupling is large.

B. Victor and K. Keutzer proposed employing data encoding to eliminate
crosstalk delay within a bus. They present a rigorous analysis of the theory behind
"self-shielding codes" and give the fundamental theoretical limits on the
performance of codes with and without memory [9]. Their paper
introduced the concept of using data encoding to mitigate crosstalk delay on buses.
Extensions would involve codes that eliminate crosstalk delay as well as reduce
average power consumption or perform error detection or correction. The propagation
delay across long on-chip buses is increasingly becoming a limiting factor in high-speed
designs. Crosstalk between adjacent wires on the bus may create a significant
portion of this delay. Placing a shield wire between each signal wire alleviates the
crosstalk problem but doubles the area used by the bus.



M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De proposed Static
Pulsed Bus (SPB) driver, repeater and receiver techniques that reduce the worst-case
CCM value to 1 [11]. The RC delay of long on-chip interconnects continues to be
a key limiter to performance and power of microprocessors. Coupling
capacitance (Cc) between neighbouring lines remains a large fraction (50%) of the
total line capacitance (CINT) with technology scaling.

SPB offers significant advantages over SB in delay, energy, total

device width and peak Vcc current. These improvements are due to reduction

in CCM and repeater skewing enabled by monotonic signal transition. Unlike

dynamic schemes, energy savings of SPB are maintained across all activity factors

without any clock power or routing overhead.

R. Arunachalam, E. Acar, and S. Nassif observed that common methods
to decrease coupling noise include shielding and buffering, both of which can increase
overall power dissipation [1]. An alternative method is spacing, which has the added

benefit of improving the manufacturability (i.e. defect insensitivity) of the design.

Capacitive coupling is recognized as one of the most critical problems that designers

need to address for deep submicron technologies.

A commonly used technique to avoid coupling is to shield signal lines from each

other by inserting power/ground lines in between them. Although shielding practically

eliminates coupling between signal lines, it does result in increased power and area.

Their results demonstrate that spacing is a viable and more effective option than

shielding for low power designs, even after budgeting for noise and delay increase due

to coupling. Excessive (unnecessary) shielding may significantly increase the total

capacitance of the signal line, which dissipates more dynamic power in operation.

M. Khellah, M. Ghoneima, J. Tschanz, Y. Ye, N. Kurd, J. Barkatullah, S.
Nimmagadda and Y. Ismail presented a bus architecture called Skewed Repeater Bus
(SRB) for reducing on-chip interconnect energy in microprocessors [5]. By introducing
relative delay between neighbouring bus lines, SRB reduces both average and worst-case
coupling capacitance between those lines. On-chip interconnect RC delay is not
scaling with technology.



Therefore, the longest interconnect that can be accommodated in a single clock

cycle between two flip-flops (flop distance) reduces, and the number of clock cycles

needed to propagate a signal from one point of the chip to another increases. This

adversely impacts overall performance of the processor.

H. Deogun, R. Senger, D. Sylvester, R. Brown, and K. Nowka presented a

new dual-VDD bus technique that is well suited for low power operation[12]. This

technique adapts static pulsed bus architecture to use dual-VDD power supplies. During

quiescent periods, the bus system idles at the lower of the two VDD supplies, thereby

lowering static power dissipation. When actively transitioning, the inverters in the bus

system are temporarily boosted to the higher VDD supply to provide the needed drive

strength for performance. Minimizing power consumption is a problem that has been,

and will continue to be, attacked on a variety of design fronts.

A first-order analysis projects that the fraction of total cells in a

functional logic block used for repeaters will grow from 6% at the 90nm node to 70% at

the 32nm node. This rapid increase in repeaters, which are generally sized aggressively

to maintain delay and signal skew, dramatically increases the total power of the

integrated circuit. They have introduced a novel dual-VDD boosted pulsed bus

technique for total power reduction. They have shown that this technique maintains

noise margins very close to those found in a traditional bus design. They showed that

the achievable savings are consistent through a wide range of data switching rates.

H. Kaul, J. Seo, M. Anders, D. Sylvester, and R. Krishnamurthy described
an alternate repeater insertion technique that uses correct-by-construction polarities to
reduce the worst-case Miller coupling factor (MCF) across any multiple segmented
portion of a repeated bus [7]. The increased integration of multiple cores and large
shared caches in microprocessors requires improved energy efficiency for core-to-core,
core-to-cache and even intra-core communication to sustain the required performance
benefits within shrinking power envelopes.

For multi-core chips, improved bus designs need to satisfy the constraints of

drop-in replacement for minimal change in design methodology, robust operation

and gains across process corners and design space, and the ability to improve power-performance
of shared busses with multiple driver and receiver points along the bus.
The technique is robust across process corners and for bus designs with non-equidistant
repeater placement.

C. J. Akl and M. A. Bayoumi proposed a hybrid polarity repeater insertion

technique that combines inverting and non-inverting repeater insertion to achieve

constant average effective coupling capacitance per wire transition for all possible

switching patterns[8]. A simple yet effective hybrid polarity repeater insertion technique

is used to minimize delay uncertainty and reduce delay and/or energy dissipation and

buffers area of on-chip buses and adjacent signal wires. The reduction in worst case

capacitive coupling reduces peak energy which is a critical factor for thermal regulation

and packaging. With the continuous scaling of CMOS technology, interconnect delays

dominate gate delays and become the major limiter to high performance systems.

One of the major causes of interconnect delay degradation is the high cross-coupling
capacitance between adjacent signal wires in deep sub-micrometer (DSM)

technologies. This is mainly due to the dense wiring employed to achieve high

integration density, and the increased aspect ratio used to lower interconnect resistance.

The proposed technique has negligible overhead and it has regular and uniform layout.

It can be easily integrated into current automated layout tools, which can be a possible

extension of this work.

K. Hirose and H. Yasuura proposed a technique for reducing the maximum bus
delay caused by crosstalk: simultaneous opposite transitions are prevented by skewing the signal
transition timing of adjacent wires [4]. As CMOS technology scales down, the
horizontal coupling capacitance makes crosstalk interference between adjacent wires a
serious problem in VLSI design.

The bus wires are placed so that normal-timing and shifted-timing wires alternate,
ensuring that no adjacent wires switch at the same time.
The time shift can be produced by a delay line made of an inverter chain or by a two-phase clock. An
approximate equation for the bus delay shows that the technique is effective for
repeater-inserted buses.


K. Bernstein, C.-T. Chuang, R. Joshi, and R. Puri described CMOS scaling challenges for sub-90 nm designs and the CAD challenges of supporting energy, parameter variation, and micro-architectural requirements [12]. Design practice will have to change from today's deterministic design to probabilistic and statistical design. Future high-performance microprocessor design with technology scaling beyond 90 nm will pose two major challenges: (1) energy and power, and (2) parameter variations.

Excessive sub-threshold and gate-oxide leakage are emerging as serious problems. The active power density of memory, such as on-chip SRAM cache, is an order of magnitude smaller than that of logic. Therefore, overall processor performance can be improved in a more energy-efficient manner by using more memory than logic, which effectively reduces the overall activity factor of the chip. Dual-Vt designs can reduce leakage power during active operation, burn-in and standby.


CHAPTER 3

DESIGN METHODOLOGY

3.1 EXISTING METHOD

The hybrid polarity repeater insertion technique is shown in Figure 3.1. The inverting repeaters at the midpoint of alternate bus lines are replaced with non-inverting repeaters, so that for an n-bit bus, n/2 inverting repeaters are substituted with non-inverting ones. As a result, when a worst-case delay transition occurs on the first half of a line, a best-case delay transition occurs on the second half, and vice versa.

Figure 3.1 Proposed hybrid polarity repeater insertion with non-inverting

repeaters inserted at the midpoint of alternate bus wires.

This leads to a constant average coupling capacitance factor for all possible input transitions, as shown in Table 3.1. As a result, both the worst-case delay and the delay uncertainty are reduced. The technique has negligible overhead and a regular, uniform layout. It is useful in many applications and can be easily integrated into current automated layout tools, which is a possible extension of that work.
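The constant average can be checked with a short calculation. The notation here is an assumption made for illustration and is not taken from the cited work: let v = ±1 be the direction of the transition on the switching wire, and let a ∈ {−1, 0, +1} be the transition on one neighbour (0 meaning the neighbour is quiet), so that the per-neighbour MCF on a wire segment is 1 − va. On the second half of the line exactly one of the two wires has passed through an inverting repeater, so the relative polarity is flipped:

$$
\begin{aligned}
\mathrm{MCF}_{\text{first half}}  &= 1 - v\,a, \\
\mathrm{MCF}_{\text{second half}} &= 1 - (-v)\,a = 1 + v\,a, \\
\mathrm{MCF}_{\text{avg}} &= \tfrac{1}{2}\bigl[(1 - v\,a) + (1 + v\,a)\bigr] = 1 .
\end{aligned}
$$

Under this model the average is 1 for every neighbour transition, including the same-direction case a = v that would give 0 on a conventional bus.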


Table 3.1 Average Coupling Capacitance Factor of the middle switching line

for all possible input transition patterns

3.1.1 Drawbacks of Existing Method

This technique reduces the overall worst-case MCF of an interconnect to 1, but it also eliminates the best-case MCF of 0 (all adjacent wires switching in the same direction), which limits the savings in average energy consumption.

3.2 PROPOSED METHOD

The proposed method combines an edge encoding technique with a coding scheme to reduce the power consumed by coupling capacitance and by self transitions, respectively. The edge encoding technique achieves a worst-case MCF of 1 while preserving the best-case MCF of 0. This is done by controlling the timing of rising and falling transitions, namely by always performing rising transitions on the negative edge of the clock and falling transitions on the positive edge of the clock (or vice versa).

3.2.1 Edge Encoding Technique

In a multi-cycle bus structure, the transitions on neighboring wires are synchronized at every flip-flop as the signal propagates down the bus. This often causes adjacent wires to switch simultaneously in the opposite or the same direction. Figure 3.2 shows the worst-case and best-case switching of the conventional bus. The MCF=2 case, in which every other wire switches in the opposite direction, generates the worst-case delay, which defines the clock frequency, and also consumes the worst-case energy because the coupling capacitance is at its maximum. To avoid this, we propose to selectively shift rising and falling edges and separate them by as much as a half cycle. For example, as seen in Figure 3.2 (b), if only the rising transitions are delayed by a half cycle and the falling transitions are left unaltered, the worst-case MCF is reduced from 2 to 1. We refer to this selective edge shifting as edge encoding.

(a) Conventional wire switching

(b) Proposed wire switching

Figure 3.2 Conventional and proposed wire switching scenario in adjacent wires
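The bound on the worst case can be stated compactly. The notation is an assumption made for illustration and does not come from the figure: let v = ±1 be the transition on a switching wire, let a ∈ {−1, 0, +1} be the transition on a neighbour at the same clock edge, and take the per-neighbour MCF as 1 − va. In the conventional bus a neighbour may switch in the opposite direction, whereas with edge encoding all transitions at a given clock edge share one direction:

$$
\begin{aligned}
\text{conventional:} \quad & a \in \{-v,\,0,\,v\} \;\Rightarrow\; 1 - v\,a \in \{2,\,1,\,0\}, \\
\text{edge-encoded:} \quad & a \in \{0,\,v\} \;\Rightarrow\; 1 - v\,a \in \{1,\,0\}.
\end{aligned}
$$

The value 2 disappears from the edge-encoded set while the value 0 remains, which matches the reduction of the worst case from 2 to 1 with the best case unchanged.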


Because edge encoding shifts like transitions together, the advantage of best-case switching is maintained, which most other approaches cannot achieve. Because the edge-encoded signal transitions on both the positive and negative edges of the clock, we use dual-edge triggered flip-flops to propagate the signal along long multi-cycle interconnects. The signals must be synchronized back to positive-edge triggered flip-flops after the long interconnect, so the number of dual-edge flip-flops within the multi-cycle interconnect should be even.
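For simulation purposes, a dual-edge triggered flip-flop can be modeled behaviorally as sketched below. The entity name de_ff and its port names are hypothetical, and this is only a behavioral model: many synthesis tools do not accept a process that is sensitive to both clock edges, so a real implementation would normally use a dedicated dual-edge cell or a circuit-level design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Behavioral (simulation-oriented) model of a dual-edge triggered flip-flop.
-- Names are illustrative assumptions, not taken from the report.
entity de_ff is
    Port ( clk : in  STD_LOGIC;
           d   : in  STD_LOGIC;
           q   : out STD_LOGIC );
end de_ff;

architecture Behavioral of de_ff is
begin
    process(clk)
    begin
        -- Capture the input on both the rising and the falling clock edge.
        if rising_edge(clk) or falling_edge(clk) then
            q <= d;
        end if;
    end process;
end Behavioral;
```

Each such stage delays the signal by one half cycle, which is why an even number of stages returns the signal to positive-edge alignment.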

The objective of the edge encoder is to shift the rising and falling transitions by different amounts. The encoding is done simply by performing an AND operation between the original signal and a half-cycle delayed version of itself. In this way, only the rising edge is delayed by a half cycle, which separates simultaneous rising and falling transitions by a half cycle. Because the encoder logic is very simple, the encoding overhead in power and area is very small, which makes edge encoding a highly practical approach.
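A minimal sketch of such an encoder is given below. It assumes that enc_in changes only on rising clock edges and that the half-cycle delayed copy is produced by a falling-edge flip-flop; the entity name edge_encoder and the internal signal delayed are hypothetical, and the actual circuit in the figures may differ.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Edge encoder sketch: enc_out = enc_in AND (enc_in delayed by half a cycle).
entity edge_encoder is
    Port ( clk     : in  STD_LOGIC;
           enc_in  : in  STD_LOGIC;   -- assumed to change on rising clk edges only
           enc_out : out STD_LOGIC );
end edge_encoder;

architecture Behavioral of edge_encoder is
    signal delayed : STD_LOGIC := '0';  -- half-cycle delayed copy of enc_in
begin
    -- Sample the input on the falling edge to obtain the half-cycle delay.
    process(clk)
    begin
        if falling_edge(clk) then
            delayed <= enc_in;
        end if;
    end process;

    -- A rising input passes only after 'delayed' has also risen (falling clk
    -- edge); a falling input forces the output low immediately (rising clk edge).
    enc_out <= enc_in and delayed;
end Behavioral;
```

With this arrangement rising transitions of enc_out occur on the negative clock edge and falling transitions on the positive clock edge.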

We propose two schemes to effectively use the edge-encoding technique in

multi-cycle interconnect. The two methods differ in the procedure to cope with the

initial half cycle latency required for edge encoding and to address the issue of aligning

back to the positive-edge triggered signal at the far end of the wire.

3.2.1.1 Zero Latency (ZL) Scheme

The ZL scheme reduces energy consumption in multi-cycle interconnects without any latency overhead, even though encoding requires a half-cycle delay at the near end of the wire. This scheme exploits the fact that signal propagation in the edge-encoded bus is faster than in the conventional bus because of the reduced coupling capacitance.

The block diagram of a multi-cycle interconnect with simple encoder logic is shown in Figure 3.3.


Figure 3.3 Encoder logic and block diagram of ZL scheme.

The encoding procedure and the propagation of the encoded signal are shown in Figure 3.3. When the data toggles every cycle, the encoder generates a half-cycle pulse (enc_out). As this half-cycle pulse propagates through an even number of dual-edge flip-flops, it automatically aligns back to a positive-edge triggered signal (ff4_out) at the far end. Therefore, no decoder circuit is needed.

Figure 3.4. Timing diagram of ZL scheme


To achieve overall zero latency, the interconnect system is set up as shown in Figure 3.5. If the conventional scheme requires n cycles to propagate through the entire interconnect, the edge-encoded signal must propagate through it in (2n-1) half cycles, since one half cycle is spent on encoding and the signal must still be synchronized at the far end of the wire within the n cycles. In Figure 3.5, L1 is the distance between positive-edge triggered flip-flops in the conventional bus, and L2 is the distance between dual-edge triggered flip-flops in the edge-encoded bus. Overall zero latency is achievable if L2 is defined by L1 and n as follows:

$$L_2 = \frac{n\,L_1}{2n - 1} \qquad (3.1)$$

Figure 3.5 Flip-flop placement in conventional and ZL edge encoding scheme.

For example, in a 9 mm interconnect with n=3 and L1=3 mm, the edge-encoded signal propagates 1.8 mm every half cycle, while the conventional signal propagates 3 mm every cycle. Effectively, the edge-encoded signal travels 20% farther (1.8 mm versus 1.5 mm) in the same time period, which is possible when the edge-encoded bus achieves at least a 17% speedup from the coupling capacitance reduction.
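The numbers in this example can be reproduced step by step. The calculation assumes that the total wire length n·L1 is covered by (2n−1) dual-edge segments of length L2, and it interprets the required speedup as the reduction in delay per unit length; that interpretation is an assumption, although it matches the 17% figure.

$$
\begin{aligned}
n\,L_1 = (2n-1)\,L_2 \;&\Rightarrow\; L_2 = \frac{n\,L_1}{2n-1} = \frac{3 \times 3\ \text{mm}}{5} = 1.8\ \text{mm}, \\
\text{conventional distance per half cycle} &= \frac{L_1}{2} = 1.5\ \text{mm}, \\
\frac{L_2}{L_1/2} = \frac{1.8}{1.5} &= 1.2 \quad (20\%\ \text{farther}), \\
1 - \frac{1.5}{1.8} &\approx 0.167 \quad (\text{about } 17\%\ \text{less delay per mm}).
\end{aligned}
$$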

3.2.1.2 One Cycle Latency (OCL) Scheme

In multi-cycle interconnects, multiple cycles are required to propagate across the entire wire. In these cases, one additional cycle of latency may be acceptable if a clock frequency increase or aggressive energy reduction is a higher design priority. The OCL scheme is intended to achieve further performance improvement and energy reduction for a fixed throughput at the expense of one cycle of latency.

After the encoding, the data must eventually align to the positive edge of the

clock at the far end of the wire. To achieve this, we can align the transition at the near

end to the positive edge of the clock by encoding with a full one cycle delay, and then

allow for normal signal propagation along the wire. The one-cycle latency is therefore

introduced once at the beginning of the wire and the throughput is not hampered.

The block diagram of the OCL edge encoding scheme is shown in Figure 3.6.

Figure 3.6. Encoder logic and block diagram of OCL scheme.

The encoder in Figure 3.6 differs from the ZL encoder in that a dual-edge flip-flop is added at the output to intentionally delay enc_in by one cycle and align the rising edge of enc_out to the positive edge of the clock. The timing diagram of the OCL edge encoding scheme is shown in Figure 3.7.

Figure 3.7. Timing diagram of OCL scheme

The corresponding flip-flop placement in the OCL edge encoding scheme is

shown in Figure 3.8. Dual-edge flip-flops are placed at intervals equal to half the flop

distance of the conventional bus. In the OCL edge-encoded bus, since the worst-case

wire delay is reduced due to MCF reduction, we can either increase the clock frequency

for high-performance busses or downsize the repeaters for iso-performance to the

conventional bus for aggressive energy reduction.


Figure 3.8 Flip-flop placement in conventional and OCL edge encoding scheme

3.2.2 Coding Scheme

This scheme minimizes self-transition activity on the bus lines. The present data is converted into four candidate patterns: inversion of the odd lines, inversion of the even lines, swapping of adjacent bits, and the unchanged data. The number of self transitions between each of these four patterns and the previously coded data is calculated by an energy estimator. A comparator then finds the pattern with the fewest self transitions, and that pattern is transmitted on the bus.

Let the data on an n-bit wide bus at time instant t be denoted as At = {a^t_{n-1}, a^t_{n-2}, ..., a^t_1, a^t_0}. The data transmitted on the bus is denoted as A(t)enc. The function calculateST_n(data1, data2) finds the number of self transitions between data1 and data2. Here, data1 and data2 should be n bits wide. The function swapAdj_n(At) swaps the adjacent bits in At and gives the output Asw(t) = {a^t_{n-2}, a^t_{n-1}, ..., a^t_0, a^t_1}.

The coding scheme is as follows:

1) Let A(t-1)enc be the previously coded data that was transmitted on the bus, and let At be the present data to be encoded and transmitted.

2) Invert the odd lines in At and suffix the result with 00 for decoding purposes. Let this new data be denoted as At(odd). Evaluate st_odd = calculateST_n(At(odd), A(t-1)enc).

3) Similarly, invert the even lines in At and suffix the result with 01. Let this new data be denoted as At(eve). Evaluate st_eve = calculateST_n(At(eve), A(t-1)enc).

4) Let Asw(t) = swapAdj_n(At). Suffix Asw(t) with 10 and let this new data be denoted as At(swp). Evaluate st_swp = calculateST_n(At(swp), A(t-1)enc).

5) Suffix At with 11 and let this new data be denoted as At(unc). Evaluate st_unc = calculateST_n(At(unc), A(t-1)enc).

6) Find min(st_odd, st_eve, st_swp, st_unc).

7) The coded pattern corresponding to the minimum value in step 6 is transmitted.
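The steps above can be sketched in VHDL for an 8-bit input and a 10-bit coded output, the widths used in the appendix. This is an illustrative sketch and not the appendix code: the entity name st_encoder, the port prev_enc (the previously transmitted pattern, assumed to be held in a register outside this block), the choice of bit indices 1, 3, 5, 7 as the "odd lines", and the tie-breaking order (the first minimum in the order odd, even, swap, unchanged wins) are all assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity st_encoder is
    Port ( a        : in  STD_LOGIC_VECTOR(7 downto 0);    -- present data At
           prev_enc : in  STD_LOGIC_VECTOR(9 downto 0);    -- A(t-1)enc (external register)
           enc      : out STD_LOGIC_VECTOR(9 downto 0) );  -- A(t)enc
end st_encoder;

architecture Behavioral of st_encoder is
    -- Assumption: odd lines are bits 1,3,5,7 and even lines are bits 0,2,4,6.
    constant ODD_MASK  : STD_LOGIC_VECTOR(7 downto 0) := "10101010";
    constant EVEN_MASK : STD_LOGIC_VECTOR(7 downto 0) := "01010101";

    -- calculateST_n: number of bit positions that differ between x and y.
    function calculate_st(x, y : STD_LOGIC_VECTOR(9 downto 0)) return natural is
        variable count : natural := 0;
    begin
        for i in x'range loop
            if x(i) /= y(i) then
                count := count + 1;
            end if;
        end loop;
        return count;
    end function;

    -- swapAdj_n: exchange each pair of adjacent bits (1,0), (3,2), ...
    function swap_adj(x : STD_LOGIC_VECTOR(7 downto 0)) return STD_LOGIC_VECTOR is
        variable r : STD_LOGIC_VECTOR(7 downto 0);
    begin
        for i in 0 to 3 loop
            r(2*i)     := x(2*i + 1);
            r(2*i + 1) := x(2*i);
        end loop;
        return r;
    end function;
begin
    process(a, prev_enc)
        type cand_array is array (0 to 3) of STD_LOGIC_VECTOR(9 downto 0);
        variable cand : cand_array;
        variable best : natural range 0 to 3;
    begin
        cand(0) := (a xor ODD_MASK)  & "00";  -- step 2: odd lines inverted
        cand(1) := (a xor EVEN_MASK) & "01";  -- step 3: even lines inverted
        cand(2) := swap_adj(a)       & "10";  -- step 4: adjacent bits swapped
        cand(3) := a                 & "11";  -- step 5: unchanged

        -- Steps 6 and 7: select the candidate with the fewest self transitions.
        best := 0;
        for k in 1 to 3 loop
            if calculate_st(cand(k), prev_enc) < calculate_st(cand(best), prev_enc) then
                best := k;
            end if;
        end loop;
        enc <= cand(best);
    end process;
end Behavioral;
```

The two suffix bits occupy the least significant positions of enc, so the receiver can identify the applied transformation from them.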

The block diagram of the proposed encoder is given in Figure 3.9.

Figure 3.9. Block diagram of the encoder of coding scheme

The encoder takes an n-bit input. The calculateST_n functions in steps 2 to 5 of the coding scheme above are evaluated by the Energy Estimator block. The outputs of the Energy Estimator block (st_odd, st_eve, st_swp and st_unc) are compared with one another to find the minimum. The encoded data pattern corresponding to the minimum number of self transitions is transmitted on the bus.

The decoder to be used at the receiving end is shown in Figure 3.10.


Figure 3.10. Block diagram of the decoder of coding scheme

The two least significant bits of the transmitted pattern are fed to a 2-to-4 decoder, and the decoding procedure selected by these two bits is applied to the remaining bits.
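A possible decoder is sketched below for a 10-bit coded pattern carrying 8 data bits. The entity name st_decoder and the assignment of "odd lines" to bit indices 1, 3, 5, 7 are assumptions made for illustration; the suffix codes 00, 01, 10 and 11 follow the coding scheme described above. Each transformation is its own inverse, so the decoder simply reapplies it.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity st_decoder is
    Port ( enc : in  STD_LOGIC_VECTOR(9 downto 0);    -- received pattern A(t)enc
           a   : out STD_LOGIC_VECTOR(7 downto 0) );  -- recovered data At
end st_decoder;

architecture Behavioral of st_decoder is
    -- Assumption: odd lines are bits 1,3,5,7 and even lines are bits 0,2,4,6.
    constant ODD_MASK  : STD_LOGIC_VECTOR(7 downto 0) := "10101010";
    constant EVEN_MASK : STD_LOGIC_VECTOR(7 downto 0) := "01010101";

    -- Exchange each pair of adjacent bits (1,0), (3,2), ...
    function swap_adj(x : STD_LOGIC_VECTOR(7 downto 0)) return STD_LOGIC_VECTOR is
        variable r : STD_LOGIC_VECTOR(7 downto 0);
    begin
        for i in 0 to 3 loop
            r(2*i)     := x(2*i + 1);
            r(2*i + 1) := x(2*i);
        end loop;
        return r;
    end function;
begin
    process(enc)
        variable data : STD_LOGIC_VECTOR(7 downto 0);
        variable sel  : STD_LOGIC_VECTOR(1 downto 0);
    begin
        data := enc(9 downto 2);  -- coded data bits
        sel  := enc(1 downto 0);  -- two least significant bits select the procedure
        case sel is
            when "00"   => a <= data xor ODD_MASK;   -- undo odd-line inversion
            when "01"   => a <= data xor EVEN_MASK;  -- undo even-line inversion
            when "10"   => a <= swap_adj(data);      -- undo adjacent-bit swap
            when others => a <= data;                -- "11": data was unchanged
        end case;
    end process;
end Behavioral;
```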

3.3 SOFTWARE DESCRIPTION

VHDL is a versatile and powerful hardware description language which is useful

for modeling electronic systems at various levels of design abstraction.

3.3.1 VHDL

VHDL (VHSIC hardware description language) is a hardware description language used in electronic design automation to describe digital and mixed-signal systems such as field-programmable gate arrays and integrated circuits.

3.3.1.1 Design

VHDL is commonly used to write text models that describe a logic circuit. Such

a model is processed by a synthesis program, only if it is part of the logic design. A

simulation program is used to test the logic design using simulation models to represent

the logic circuits that interface to the design. This collection of simulation models is

commonly called a test bench.

VHDL has constructs to handle the parallelism inherent in hardware designs, but

these constructs (processes) differ in syntax from the parallel constructs in Ada (tasks).

Like Ada, VHDL is strongly typed and is not case sensitive. In order to directly

represent operations which are common in hardware, there are many features of VHDL

which are not found in Ada, such as an extended set of Boolean operators including

nand and nor. VHDL also allows arrays to be indexed in either ascending or descending

direction; both conventions are used in hardware, whereas in Ada and most

programming languages only ascending indexing is available.
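The following small sketch illustrates the nand and nor operators and the two index directions. The entity name vhdl_features and all signal and constant names are hypothetical and serve only as an example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity vhdl_features is
    Port ( a, b     : in  STD_LOGIC;
           y_nand   : out STD_LOGIC;
           y_nor    : out STD_LOGIC;
           left_dn  : out STD_LOGIC;
           left_up  : out STD_LOGIC );
end vhdl_features;

architecture Behavioral of vhdl_features is
    -- Descending range: the leftmost element of the literal has index 3.
    constant WORD_DN : STD_LOGIC_VECTOR(3 downto 0) := "1000";
    -- Ascending range: the leftmost element of the literal has index 0.
    constant WORD_UP : STD_LOGIC_VECTOR(0 to 3)     := "1000";
begin
    y_nand  <= a nand b;     -- built-in NAND operator
    y_nor   <= a nor b;      -- built-in NOR operator
    left_dn <= WORD_DN(3);   -- leftmost element of the descending vector
    left_up <= WORD_UP(0);   -- leftmost element of the ascending vector
end Behavioral;
```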

VHDL has file input and output capabilities, and can be used as a general-

purpose language for text processing, but files are more commonly used by a simulation

test bench for stimulus or verification data. There are some VHDL compilers which

build executable binaries. In this case, it might be possible to use VHDL to write a test

bench to verify the functionality of the design using files on the host computer to define

stimuli, to interact with the user, and to compare results with those expected. However,

most designers leave this job to the simulator.

It is relatively easy for an inexperienced developer to produce code that simulates successfully but that cannot be synthesized into a real device, or is too large to be practical. One particular pitfall is the accidental production of transparent latches rather than D-type flip-flops as storage elements.
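The latch pitfall can be illustrated with two processes placed side by side. The entity name storage_styles and its ports are hypothetical; the first process is the kind of code that typically infers a transparent latch, and the second describes a D-type flip-flop.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity storage_styles is
    Port ( clk     : in  STD_LOGIC;
           en      : in  STD_LOGIC;
           d       : in  STD_LOGIC;
           q_latch : out STD_LOGIC;
           q_ff    : out STD_LOGIC );
end storage_styles;

architecture Behavioral of storage_styles is
begin
    -- No clock edge and no else branch: q_latch must hold its value while
    -- en = '0', so synthesis typically infers a transparent latch.
    process(en, d)
    begin
        if en = '1' then
            q_latch <= d;
        end if;
    end process;

    -- Assignment only on the rising clock edge: a D-type flip-flop.
    process(clk)
    begin
        if rising_edge(clk) then
            q_ff <= d;
        end if;
    end process;
end Behavioral;
```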

One can design hardware in a VHDL IDE (for FPGA implementation such as

Xilinx ISE, Altera Quartus, Synopsys Synplify or Mentor Graphics HDL Designer) to

produce the RTL schematic of the desired circuit. After that, the generated schematic

can be verified using simulation software which shows the waveforms of inputs and

outputs of the circuit after generating the appropriate test bench. To generate an

appropriate test bench for a particular circuit or VHDL code, the inputs have to be

defined correctly. For example, for clock input, a loop process or an iterative statement

is required.
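A clock for a test bench is commonly generated by a process that repeats indefinitely, as in the sketch below. The entity name tb_clock and the 10 ns period are assumptions chosen for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Test bench skeleton: no ports, only an internally generated clock.
entity tb_clock is
end tb_clock;

architecture sim of tb_clock is
    constant CLK_PERIOD : time := 10 ns;   -- assumed clock period
    signal clk : STD_LOGIC := '0';
begin
    -- A process without a sensitivity list restarts after its last statement,
    -- so this produces a free-running clock for the whole simulation.
    clk_gen : process
    begin
        clk <= '0';
        wait for CLK_PERIOD / 2;
        clk <= '1';
        wait for CLK_PERIOD / 2;
    end process clk_gen;
end sim;
```

The design under test would be instantiated in the same architecture with its clock port connected to clk.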

A final point is that when a VHDL model is translated into the "gates and wires" that are mapped onto a programmable logic device such as a CPLD or FPGA, it is the actual hardware that is being configured, rather than the VHDL code being "executed" as if on some form of processor chip.

3.3.1.2 Advantages

The key advantage of VHDL, when used for systems design, is that it allows the

behavior of the required system to be described (modeled) and verified (simulated)

before synthesis tools translate the design into real hardware (gates and wires).

Another benefit is that VHDL allows the description of a concurrent system. VHDL is a dataflow language, unlike procedural computing languages such as BASIC, C, and assembly code, which all run sequentially, one instruction at a time.

A VHDL project is multipurpose. Once created, a calculation block can be reused in many other projects, and many of its structural and functional parameters can still be tuned (capacity parameters, memory size, element base, block composition and interconnection structure).

A VHDL project is portable. A computing device project created for one element base can be ported to another element base, for example VLSI with various technologies.

CHAPTER 4

SIMULATION RESULTS

4.1 POWER ANALYSIS OF ENCODER IN CODING SCHEME

Figure 4.1 shows a dynamic power of 0.068 W and a quiescent (leakage) power of 0.010 W, which together give a total on-chip power of 0.078 W.

Figure 4.1 Power analysis of encoder in coding scheme


CHAPTER 5

CONCLUSION AND FUTURE ENHANCEMENTS

5.1 CONCLUSION


APPENDIX

CODING

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

entity self is

Port ( a : in STD_LOGIC_VECTOR (7 downto 0);

encout : out STD_LOGIC_VECTOR (9 downto 0) );

end self;

architecture Behavioral of self is

signal odd,even,swap,append :STD_LOGIC_VECTOR (9 downto 0);

component energy is

Port ( ina : in STD_LOGIC_vector(9 downto 0);

bout :out std_logic_vector(3 downto 0));

end component;

signal egy1,egy2,egy3,egy4 : std_logic_vector(3 downto 0);

begin

p1:process(a)

begin

odd


end process p1;

egymet1 : energy PORT MAP(ina=>odd,bout=>egy1);

egymet2 : energy PORT MAP(ina=>even,bout=>egy2);

egymet3 : energy PORT MAP(ina=>swap,bout=>egy3);

egymet4 : energy PORT MAP(ina=>append,bout=>egy4);

library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.numeric_std.all;

use IEEE.std_logic_unsigned.all;

entity energy is

Port ( ina : in STD_LOGIC_vector(9 downto 0);

bout :out std_logic_vector(3 downto 0));

end energy;

architecture Behavioral of energy is

signal atp,exr : std_logic_vector(9 downto 0) :="0000000000";

component FA is

Port ( a,b,cin : in STD_LOGIC;

s : out std_logic;

c : out STD_LOGIC);

end component FA;

component HA is

Port ( a,b : in STD_LOGIC;

s : out std_logic;

c : out STD_LOGIC);


end component;

signal c1 : std_logic_vector(7 downto 1);

signal s1 : std_logic_vector(5 downto 1);

begin

exr

FA1 : FA PORT MAP(a=>exr(1),b=>exr(2),cin=>exr(3),s =>s1(1),c=>c1(1));

FA2 : FA PORT MAP(a=>exr(4),b=>exr(5),cin=>exr(6),s =>s1(2),c=>c1(2));

FA3 : FA PORT MAP(a=>exr(7),b=>exr(8),cin=>exr(9),s =>s1(3),c=>c1(3));

FA4 : FA PORT MAP(a=>c1(1),b=>c1(2),cin=>c1(3),s =>s1(4),c=>c1(4));

FA5 : FA PORT MAP(a=>s1(1),b=>s1(2),cin=>s1(3),s =>s1(5),c=>c1(5));

HA1 : HA PORT MAP(a=>exr(0),b=>s1(5),s =>bout(0),c=>c1(6));

FA6 : FA PORT MAP(a=>s1(4),b=>c1(6),cin=>c1(5),s =>bout(1),c=>c1(7));

HA2 : HA PORT MAP(a=>c1(7),b=>c1(4),s =>bout(2),c=>bout(3));

end Behavioral;

library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.numeric_std.all;

use IEEE.std_logic_unsigned.all;

entity HA is

Port ( a,b : in STD_LOGIC;

s : out std_logic;

c : out STD_LOGIC);

end HA;

architecture Behavioral of HA is

begin


s <= a xor b;

c <= a and b;

end Behavioral;

REFERENCES

[1] A. B. Kahng, S. Muddu, and E. Sarto, (1999) "Interconnect optimization strategies for high-performance VLSI designs," in Proc. Int. Conf. VLSI Des., pp. 464-469.

[2] A. B. Kahng, S. Muddu, and E. Sarto, (2000) "On switch factor based analysis of coupled RC interconnects," in Proc. Des. Autom. Conf. (DAC), pp. 79-84.

[3] B. Victor and K. Keutzer, (2001) "Bus encoding to prevent crosstalk delay," in Proc. Int. Conf. Comput.-Aided Des. (ICCAD), pp. 57-63.

[4] C. J. Akl and M. A. Bayoumi, (Sep. 2008) "Reducing interconnect delay uncertainty via hybrid polarity repeater insertion," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 9, pp. 1230-1239.

[5] H. Deogun, R. Senger, D. Sylvester, R. Brown, and K. Nowka, (2006) "A dual-Vdd boosted pulsed bus technique for low power and low leakage operation," in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 73-78.

[6] H. Kaul, D. Sylvester, M. Anders, and R. Krishnamurthy, (Nov. 2005) "Design and analysis of spatial encoding circuits for peak power reduction in on-chip buses," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 11, pp. 1225-1238.

[7] H. Kaul, J. Seo, M. Anders, D. Sylvester, and R. Krishnamurthy, (2008) "A robust alternate repeater technique for high performance busses in the multi-core era," in Proc. Int. Symp. Circuits Syst., pp. 372-375.

[8] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, (1996) "Flow-through latch and edge-triggered flip-flop hybrid elements," in Dig. Tech. Papers IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 138-139.

[9] J. Eble, V. De, D. Wills, and J. Meindl, (1998) "Minimum repeater count, size, and energy dissipation for gigascale integration (GSI) interconnects," in Proc. Int. Interconnect Technol. Conf., pp. 56-58.

[10] J. Seo, D. Sylvester, D. Blaauw, H. Kaul, and R. Krishnamurthy, (2007) "A robust edge encoding technique for energy-efficient multi-cycle interconnect," in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 68-73.

[11] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, (2001) "Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors," in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 147-152.

[12] K. Bernstein, C.-T. Chuang, R. Joshi, and R. Puri, (2003) "Design and CAD challenges in sub-90 nm CMOS technologies," in Proc. Int. Conf. Comput.-Aided Des. (ICCAD), pp. 129-136.

[13] K. Bowman, J. Tschanz, M. Khellah, M. Ghoneima, Y. Ismail, and V. De, (2006) "Time-borrowing multi-cycle on-chip interconnects for delay variation tolerance," in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 79-84.

[14] K. Hirose and H. Yasuura, (2000) "A bus delay reduction technique considering crosstalk," in Proc. Des., Autom. Test Eur. (DATE), pp. 441-445.

[15] K. Nose and T. Sakurai, (2001) "Two schemes to reduce interconnect delays in bi-directional and uni-directional buses," in Symp. VLSI Circuits Dig. Tech. Papers, pp. 193-194.

[16] M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De, (2002) "Static pulsed bus for on-chip interconnects," in Symp. VLSI Circuits Dig. Tech. Papers, pp. 78-79.

[17] M. Khellah, M. Ghoneima, J. Tschanz, Y. Ye, N. Kurd, J. Barkatullah, S. Nimmagadda, and Y. Ismail, (2005) "A skewed repeater bus architecture for on-chip energy reduction in microprocessors," in Proc. Int. Conf. Comput. Des. (ICCD), pp. 253-257.

[18] P. P. Sotiriadis, A. Wang, and A. Chandrakasan, (2000) "Transition pattern coding: An approach to reduce energy in interconnect," in Proc. Euro. Solid-State Circuits Conf. (ESSCIRC), pp. 348-351.

[19] R. Arunachalam, E. Acar, and S. Nassif, (2003) "Optimal shielding/spacing metrics for low power design," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, pp. 167-172.

[20] S.-C. Wong, T. G.-Y. Lee, D.-J. Ma, and C.-J. Chao, (May 2000) "An empirical three-dimensional crossover capacitance model for multilevel interconnect VLSI circuits," IEEE Trans. Semiconductor Manufacturing, vol. 13, no. 2, pp. 219-227.
