8/13/2019 Starlee Contents
CHAPTER 1
INTRODUCTION
In current microprocessors, the number of wires used for inter-module
communication has skyrocketed. Furthermore, the increased complexity and high level
of integration require higher wire densities, and coupling capacitance has dominated
total wire capacitance for several technology generations already. A high coupling
capacitance ratio is unfavorable in conventional buses because adjacent wires may
switch in opposite directions, yielding a worst-case Miller capacitance factor
(MCF) of 2. For example, when MCF = 2, coupling capacitance accounts for over 80% of
the total interconnect capacitance for a minimum-pitch intermediate metal layer in
65-nm technology [5]. It is possible to reduce coupling capacitance by increasing spacing or
introducing shielding, but this comes at the cost of significant area penalties [1]. Hence,
a key challenge in interconnect design is to reduce the worst-case MCF while
maintaining the same physical footprint of the interconnect, thereby reducing the
effective wire capacitance and interconnect energy consumption.
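As a rough illustration of why the worst-case MCF matters, the following sketch computes the effective capacitance of one wire with two neighbours for different MCF values. The function name and the capacitance values are hypothetical and chosen only for illustration; they are not taken from any process data.

```python
def effective_capacitance(c_ground, c_couple, mcf_left, mcf_right):
    """Effective capacitance (F) seen by one wire with two neighbours.

    c_ground: capacitance from the wire to ground (F)
    c_couple: coupling capacitance to each neighbour (F)
    mcf_left, mcf_right: Miller capacitance factor toward each neighbour
    """
    return c_ground + (mcf_left + mcf_right) * c_couple


# Hypothetical values, used only to show the trend.
C_G = 20e-15  # 20 fF to ground
C_C = 40e-15  # 40 fF to each neighbour

worst_conventional = effective_capacitance(C_G, C_C, 2, 2)  # neighbours switch opposite
worst_mcf_one = effective_capacitance(C_G, C_C, 1, 1)       # worst-case MCF limited to 1
best_case = effective_capacitance(C_G, C_C, 0, 0)           # neighbours switch together
```

With these assumed values, limiting the worst-case MCF to 1 lowers the worst-case effective capacitance from 180 fF to 100 fF without changing the wire geometry.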
In this project, a new encoding technique that achieves a worst-case MCF of 1
while preserving the best-case MCF of 0 is presented. This is done by controlling
the timing of rising and falling transitions, namely always performing rising
transitions on the negative edge of the clock and falling transitions on the positive edge of
the clock (or vice versa). Since the worst-case switching is separated by as much as one
phase (half clock cycle), this technique remains robust against process variation. Hence,
both the average and worst-case energy can be reduced without impacting the sensitivity
to process variation. Average energy savings will aid battery life and typical energy costs,
but worst-case energy is also a meaningful metric in terms of thermal management and
peak demand for power grids and decoupling capacitance [13]. These savings are
accomplished at the expense of minimal encoder logic with half-cycle latency and
additional clocking. However, we find that the logic and clocking overhead is small in
long interconnects where interconnect power consumption is dominant, and also show
that the potential latency overhead can be eliminated or minimized in multi-cycle
interconnects.
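The effect of separating rising and falling transitions in time can be sketched with a small behavioural model. This is a simplified illustration, not the encoder circuit itself: the function names are hypothetical, and the model assumes that two transitions interact only when they occur on the same clock edge.

```python
from itertools import product


def transition(prev_bit, next_bit):
    """Return +1 for a rising, -1 for a falling, 0 for no transition."""
    return next_bit - prev_bit


def victim_mcf(victim, aggressor, edge_encoded):
    """Miller capacitance factor seen by a switching victim wire.

    victim, aggressor: (prev_bit, next_bit) tuples for one bus cycle.
    edge_encoded: if True, rising and falling transitions occur on
    different clock edges, so opposite transitions never overlap in time.
    """
    tv, ta = transition(*victim), transition(*aggressor)
    if tv == 0:
        return None  # victim does not switch; no factor is reported
    if ta == 0:
        return 1     # quiet neighbour
    if ta == tv:
        return 0     # same direction on the same clock edge
    # Opposite directions: simultaneous in a conventional bus,
    # half a cycle apart with edge encoding.
    return 1 if edge_encoded else 2


def worst_case_mcf(edge_encoded):
    """Largest factor over all transition pairs of two adjacent wires."""
    cases = product(product((0, 1), repeat=2), repeat=2)
    factors = [victim_mcf(v, a, edge_encoded) for v, a in cases]
    return max(f for f in factors if f is not None)
```

Under these assumptions, `worst_case_mcf(False)` models the conventional bus and `worst_case_mcf(True)` the edge-encoded bus; the best case of 0 for same-direction switching is unchanged in both.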
1.1 SELF-TRANSITIONS AND COUPLING TRANSITIONS
Self-transitions are defined as transitions on the capacitance between a bus
line and the substrate (ground) while coupling transitions are defined as transitions
on the capacitance between adjacent lines. Figure 1.1 shows a simplified bus model
with coupling (ignoring all the resistances). Cs is the self-capacitance from each
bus line to ground; Cc is the coupling-capacitance between two adjacent lines. The
average power consumption on the bus is given by:
$$P = (s\, C_s + c\, C_c)\, V_{dd}^2\, f \qquad (1.1)$$
where s is the average number of self-transitions per bus cycle, c is the
average number of coupling transitions per bus cycle, Vdd is the supply voltage, and
f is the bus clock frequency; only the charging transitions that require current flow
from the power supply are counted. For example, only the 0 → 1 transitions are
counted as self-transitions for power consumption.
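A minimal sketch of this power estimate follows. It assumes the average bus power takes the common form P = (s·Cs + c·Cc)·Vdd²·f, and it counts only the 0 → 1 self-transitions from a bus trace; the function names and the trace format are hypothetical.

```python
def average_self_transitions(trace):
    """Average number of 0 -> 1 transitions per bus cycle.

    trace: list of bus words, each a tuple of bits (one per bus line).
    Only charging (0 -> 1) transitions are counted.
    """
    cycles = len(trace) - 1
    rising = sum(
        1
        for prev, cur in zip(trace, trace[1:])
        for p, c in zip(prev, cur)
        if p == 0 and c == 1
    )
    return rising / cycles


def bus_power(s, c, c_self, c_couple, vdd, freq):
    """Average bus power (W), assuming P = (s*Cs + c*Cc) * Vdd^2 * f."""
    return (s * c_self + c * c_couple) * vdd ** 2 * freq
```

For example, the trace `[(0, 0), (1, 0), (0, 1), (1, 1)]` contains three charging self-transitions over three cycles, so `average_self_transitions` returns 1.0.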
Figure 1.1 Bus model with self and coupling capacitances.
1.2 MILLER-COUPLING CAPACITANCE
A simple way of performing timing analysis in the presence of crosstalk noise is
to replace the coupling capacitance by an equivalent Miller capacitance to account for
the slow down or the speed up of the victim transition. Hence, we are effectively
modeling the delay noise on the victim by simply scaling the coupling capacitance with
the appropriate Miller-coupling factor (MCF). The simple approximation leads to a
decoupling between the aggressor-victim interconnects (as shown in Figure 1.2) and
allows us to compute the signal arrival times using the existing STA framework.
Figure 1.2 Decoupling of the aggressor-victim nets by using Miller-coupling
capacitance
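The exact value of the Miller-coupling factor for the victim (MCFvictim) depends
on the mutual switching directions and the ratio of the slopes of the aggressor and
victim waveforms (a/v):
$$\mathrm{MCF}_{victim} = \begin{cases} 1 + a/v, & \text{opposite switching directions} \\ 1 - a/v, & \text{same switching direction} \end{cases} \qquad (1.2)$$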
If the aggressor-victim nets switch in mutually opposite directions (as shown in
Figure 1.2), then MCFvictim is obtained by adding 1 to the ratio of the
slopes (a/v). Conversely, if the aggressor-victim nets switch in the same direction,
then MCFvictim is obtained by subtracting (a/v) from one. Suppose the aggressor-victim
nets switch in opposite directions with equal slopes (i.e., a = v). Using
Equation 1.2, we obtain the commonly used MCFvictim value of two. In such cases,
the Miller capacitance used to decouple the aggressor-victim interconnects is twice the
size of the original coupling capacitance. Similarly, if the aggressor-victim nets switch
in the same direction with equal slopes, then we obtain an MCFvictim value of zero.
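The rule described above can be written as a short function. This is a sketch under the stated slope-ratio model; the function and parameter names are hypothetical.

```python
def miller_coupling_factor(aggressor_slope, victim_slope, same_direction):
    """Miller-coupling factor for the victim from the slope ratio a/v.

    Opposite switching directions add the ratio to one;
    the same switching direction subtracts it from one.
    """
    ratio = aggressor_slope / victim_slope
    return 1 - ratio if same_direction else 1 + ratio


def decoupled_capacitance(c_couple, mcf):
    """Equivalent grounded capacitance that replaces the coupling capacitance."""
    return mcf * c_couple
```

With equal slopes, the function returns 2 for opposite directions and 0 for the same direction, matching the two commonly used values.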
1.3 FLIP-FLOP
In electronics, a flip-flop or latch is a circuit that has two stable states and can be
used to store state information. The circuit can be made to change state by signals
applied to one or more control inputs and will have one or two outputs. It is the basic
storage element in sequential logic. Flip-flops and latches are a fundamental building
block of digital electronics systems used in computers, communications, and many
other types of systems.
Flip-flops and latches are used as data storage elements. Such data storage can
be used for storage of state, and such a circuit is described as sequential logic. When
used in a finite-state machine, the output and next state depend not only on its current
input, but also on its current state (and hence, previous inputs). It can also be used for
counting of pulses, and for synchronizing variably-timed input signals to some
reference timing signal.
Flip-flops can be either simple (transparent or opaque) or clocked (synchronous
or edge-triggered); the simple ones are commonly called latches. The word latch is
mainly used for storage elements, while clocked devices are described as flip-flops.
1.3.1 Dual Edge Flip-Flops
The edge encoding technique requires dual-edge flip-flops and hence the
number of flip-flops placed in multi-cycle interconnects is inevitably increased
compared to conventional multi-cycle interconnects with single-edge flip-flops.
Along the critical path of multi-cycle interconnects, both the total setup time
and CLK-Q delay in flip-flops increase as well. Hold time is not a concern even when
using dual-edge flip-flops because the interconnect paths are well-defined with several
repeaters and large wire load, and thus short paths between flip-flops do not exist.
Time-borrowing flip-flops have zero or negative setup time, providing
performance benefits over master-slave flip-flops. Therefore, in the proposed edge
encoding technique, time-borrowing flip-flops are considered for the inserted flip-flops
to mitigate the increase in total setup time and variation.
Several types of time-borrowing dual-edge flip-flops from [18] were considered to
absorb the setup time and the mismatch between interconnect paths.
We selected the pulse-triggered dual-edge flip-flop in Figure 1.3 for the proposed
edge encoding because it is more area-efficient and its D-Q path is shorter.
Figure 1.3 Schematic of time borrowing pulsed dual edge flip-flop.
1.4 TIME BORROWING WITH LATCHES
The unique property which enables above advent transparent for the duration
of an active clock pulse normal edge-to-edge timing requirements of sync long
enough and is determining the maximum freq a shorter path in subsequent latch-to-
latch stages
Figure 1.4 Timing paths in a sample circuit.
The figure 1.4 has 2 timing paths: Path 1 from the p a negative-level latch (2),
while Path 2 is from the triggered register (3). Let us examine this simple compensate
for the delay through the logic cloud A in Path 1, we can have two scenarios of
timing (Figures 1.6 and 1.7.)
Figure 1.5 Timing relationships set by the system clock.
In Case A, data arrives from logic A at Latch 2 be this case, the behavior of the
latch is similar to the need to borrow any time to achieve our timing go.
Figure 1.6 Logic A is fast enough, so no borrowing is necessary.
In Case B, the negative clock edge enables the la at the input of the latch. So the
latch will go to tra from Logic A through to Register B for a while. Bu from Logic A
reaches Logic B and passes through Register 2. So if the propagation delay of Logic
B, some of the time reserved for Logic B, and the cir this extra time in order to
complete its propagation analysis will consider the end of the borrowed time delay.
Figure 1.7 Logic A borrows time from Logic B.
While doing STA, Timing reports will be generate the timing when the latch is
enabled is the same element (Figure 1.8)
Figure 1.8 When the latch is enabled, it becomes a passive delay.
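The two cases discussed in this section can be sketched as a simple check. This is a simplified model under stated assumptions: the latch is transparent between `latch_open` and `latch_close`, setup time is ignored, and the function and parameter names are hypothetical.

```python
def time_borrowed(arrival, latch_open, latch_close):
    """Time borrowed by data arriving at a level-sensitive latch.

    Returns 0.0 when data arrives before the latch opens (Case A),
    the amount borrowed when it arrives while the latch is transparent
    (Case B), and raises ValueError when it arrives after the latch closes.
    Setup time is ignored in this simplified model.
    """
    if arrival <= latch_open:
        return 0.0
    if arrival <= latch_close:
        return arrival - latch_open
    raise ValueError("data arrives after the latch closes")
```

In Case B the borrowed amount is taken out of the time available to the next logic stage, so a timing analysis would start that stage at the end of the borrowed time.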
1.5 REPEATER INSERTION
Repeater insertion is a technique for reducing the time delay associated with
long wire lines in integrated circuits. The technique involves cutting the long wire into
two or more short wires and inserting a repeater between each new pair of short wires.
The time it takes for a signal to travel from one end of a wire to the other end is
known as wire-line delay or just delay. In an integrated circuit, this delay is
characterized by RC, the resistance of the wire (R) multiplied by the wire's
capacitance (C). Thus, if the wire's resistance is 100 ohms and its capacitance is
0.01 microfarad (µF), the wire's delay is one microsecond (µs).
The resistance of a wire on an integrated circuit is directly proportional to the
wire's length. If a 1 mm length of the wire has 100 ohms
resistance, then a 2 mm length will have 200 ohms resistance.
The capacitance of a wire also increases linearly along its length. If a 1 mm
length of the wire has 0.01 µF capacitance, a 2 mm length of the wire will have 0.02 µF,
a 3 mm wire will have 0.03 µF, and so on, as shown in Table 1.1.
Thus, the time delay through a wire increases with the square of the wire's
length. This is true, to first order, for any wire whose cross-section remains constant
along the length of the wire.
Wire length    Resistance    Capacitance    Time delay
1 mm           100 ohm       0.01 µF        1 µs
2 mm           200 ohm       0.02 µF        4 µs
3 mm           300 ohm       0.03 µF        9 µs
Table 1.1 Analysis of wire delay with wire length.
The interesting consequence of this behavior is that, while a single 2 mm length
of wire has a delay of 4 µs, two separate 1 mm wires have a delay of only 1 µs each.
The two separate wires cover the same distance in 2 µs, half the time. By cutting the
wire in half, we can double its speed.
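The quadratic growth of delay with length, and the benefit of splitting the wire, can be checked with a short calculation. This sketch uses the per-millimetre values from the example above (100 ohms and 0.01 µF per mm); the function names and the optional repeater delay parameter are hypothetical.

```python
def wire_delay(length_mm, r_per_mm=100.0, c_per_mm=0.01e-6):
    """RC delay (s) of an unrepeated wire; R and C both scale with length."""
    return (r_per_mm * length_mm) * (c_per_mm * length_mm)


def repeated_wire_delay(length_mm, segments, repeater_delay=0.0):
    """Total delay (s) when the wire is cut into equal segments joined by repeaters."""
    segment_delay = wire_delay(length_mm / segments)
    return segments * segment_delay + (segments - 1) * repeater_delay
```

With an ideal (zero-delay) repeater, `repeated_wire_delay(2, 2)` gives 2 µs against 4 µs for `wire_delay(2)`. A nonzero `repeater_delay` reduces this gain, which is the cost discussed later in this section.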
An active circuit must be placed between the two separate wires so as to move
the signal from one to the next. An active circuit used for such a purpose is known as a
repeater. In a CMOS integrated circuit, the repeater is often a simple inverter.
Reducing the delay of a wire by cutting it in half and inserting a repeater is
known as repeater insertion. The cost of this procedure is the additional new delay
through the repeater itself, plus power cost because the repeater is an active circuit that
must be powered, whereas the plain unrepeated wire was originally an unpowered
passive component.
1.6 BASIC LOW POWER DIGITAL DESIGN
Moore's law, which states that the number of transistors that can be placed
inexpensively on an integrated circuit will double approximately every two years, has
often been subject to the following criticism: while it boldly states the blessing of
technology scaling, it fails to expose its bane. A direct consequence of Moore's law is
that the power density of the integrated circuit increases exponentially with every
technology generation. This implicit trend has arguably brought about some of the
most important changes in electronic and computer designs. In the 1970s, the most
popular electronics manufacturing technologies used bipolar and nMOS transistors.
However, bipolar and nMOS transistors consume energy even in a stable combinational
state, and consequently, by the 1980s, the power density of bipolar designs was considered
too high to be sustainable.
1.6.1 CMOS Transistor Power Consumption
The power consumption of a CMOS transistor can be divided into three different
components: dynamic, static (or leakage) and short circuit power consumption.
Figure 1.9 illustrates the three components of power consumption. Dynamic and
short circuit power are also collectively known as switching power, and are consumed
when transistors change their logic state, but leakage power is consumed merely
because the circuit is powered-on.
Switching power, which includes both dynamic power and short-circuit power,
is consumed when signals through CMOS circuits change their logic state, resulting in
the charging and discharging of load capacitors. Leakage power is primarily due to the
sub-threshold currents and reverse biased diodes in a CMOS transistor.
$$P_{total} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage} \qquad (1.3)$$
Figure 1.9 The dynamic, short circuit and leakage power components
1.6.2 Switching Power
When signals change their logic state in a CMOS transistor, energy is drawn
from the power supply to charge up the load capacitance from 0 to Vdd. For the inverter
example in Figure 1.9, the power drawn from the power supply is dissipated as heat in
pMOS transistor during the charging process. Energy is needed whenever charge is
moved against some potential. Thus, dE = d(QV). When the output of the inverter
transitions from logical 0 to 1, the load capacitance is charged. The energy drawn from
the power supply during the charging process is given by,
$$dE_P = d(VQ) = V_{dd}\, dQ_L$$
since the power supply provides power at a constant voltage Vdd. Now, since $Q_L = C_L V_L$,
we have:
$$dQ_L = C_L\, dV_L$$
Therefore,
$$dE_P = V_{dd}\, C_L\, dV_L$$
Integrating for full charging of the load capacitance,
$$E_P = \int_0^{V_{dd}} V_{dd}\, C_L\, dV_L = C_L V_{dd}^2 \qquad (1.4)$$
Thus a total of $C_L V_{dd}^2$ energy is drawn from the power source. The energy $E_L$ stored
in the capacitor at the end of the transition can be computed as follows:
$$dE_L = d(VQ) = V_L\, dQ_L$$
where $V_L$ is the instantaneous voltage across the load capacitance, and $Q_L$ is the
instantaneous charge of the load capacitance during the charging process. Therefore,
$$dE_L = V_L\, C_L\, dV_L$$
Integrating for full charging of the load capacitance,
$$E_L = \int_0^{V_{dd}} C_L V_L\, dV_L = \frac{C_L V_{dd}^2}{2} \qquad (1.5)$$
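The two integrals can be checked numerically. The sketch below steps the load voltage from 0 to Vdd and accumulates the energy delivered by the supply and the energy stored on the capacitor; the function name, step count, and example values are assumptions made for illustration.

```python
def charging_energies(c_load, vdd, steps=10_000):
    """Integrate supply energy and stored energy (J) while the load
    voltage rises from 0 to vdd, using the midpoint rule."""
    dv = vdd / steps
    e_supply = 0.0
    e_stored = 0.0
    for i in range(steps):
        v_load = (i + 0.5) * dv    # load voltage at the midpoint of the step
        dq = c_load * dv           # charge delivered during this step
        e_supply += vdd * dq       # supply delivers the charge at constant Vdd
        e_stored += v_load * dq    # capacitor stores it at its own voltage
    return e_supply, e_stored
```

For a 1 pF load and Vdd = 1.2 V, the supply energy comes out as twice the stored energy; the difference is the energy dissipated in the pull-up path during charging.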
As the input voltage rises, at time t0 we have Vin > VTn, i.e., the input voltage
becomes higher than the threshold voltage of the nMOS transistor. At this time a
short-circuit current path is established. The short-circuit current first increases as
the nMOS transistor turns on and then decreases until, after t1, we have
Vin > (Vdd - VTp), i.e., the pMOS transistor turns off, signaling the end of the
short-circuit current. Therefore, for as long as VTn < Vin < (Vdd - VTp) holds, there
is a conductive path between Vdd and GND because both the nMOS and pMOS
devices are simultaneously on. Short-circuit power is typically estimated as:
$$P_{short\text{-}circuit} = \frac{1}{12}\, k\, \tau\, F_{clk}\, (V_{dd} - 2V_t)^3 \qquad (1.6)$$
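The conduction window and the estimate can be sketched as follows. This assumes the estimate has the form (1/12)·k·τ·Fclk·(Vdd − 2Vt)³, with k taken as the transistor gain factor and τ as the input rise/fall time; both interpretations, and all function and parameter names, are assumptions of this sketch.

```python
def short_circuit_path_open(v_in, vdd, vt_n, vt_p):
    """True while both transistors conduct: VTn < Vin < Vdd - VTp.

    vt_p is the magnitude of the pMOS threshold voltage.
    """
    return vt_n < v_in < vdd - vt_p


def short_circuit_power(k, tau, f_clk, vdd, vt):
    """Estimate (1/12) * k * tau * f_clk * (Vdd - 2*Vt)^3.

    Clamped to zero when Vdd <= 2*Vt, where the two devices
    are never on at the same time.
    """
    overdrive = max(vdd - 2.0 * vt, 0.0)
    return k * tau * f_clk * overdrive ** 3 / 12.0
```

The cubic dependence on (Vdd − 2Vt) means that lowering the supply voltage toward twice the threshold voltage sharply reduces this component.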
1.6.4 Leakage Power
The third component of power dissipation in CMOS circuits, as shown in
Equation 1.3, is the static or leakage power. Even when a transistor is in a stable logic
state, simply because it is powered on, it continues to leak small amounts of power at
almost all junctions due to various effects.
1.6.4.1 Reverse Biased Diode Leakage
The reverse biased diode leakage is due to the reverse bias current in the
parasitic diodes that are formed between the diffusion region of the transistor and
substrate. It results from minority carrier diffusion and drift near the edge of depletion
regions, and also from the generation of electron hole pairs in the depletion regions of
reverse-biased junctions. As shown in Figure 1.11, when the input of the inverter in Figure 1.9
is high, a reverse potential difference of Vdd is established between the drain and the
n-well, which causes diode leakage through the drain junction. In addition, the n-well
region of the pMOS transistor is also reverse biased with respect to the p-type substrate.
This also leads to reverse bias leakage at the n-well junction.
The reverse bias leakage current is typically expressed as,
$$I_{rbdl} = A\, J_s \left(e^{q V_{bias}/kT} - 1\right) \qquad (1.7)$$
where A is the area of the junction, Js is the reverse saturation current density,
Vbias is the reverse bias voltage across the junction, and Vth = kT/q is the thermal
voltage. Reverse biased diode leakage will become increasingly important as we continue to
heavily dope the n- and p-regions. As a result, Zener and band-to-band tunneling will
also become contributing factors to the reverse bias current.
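A small numerical sketch of the diode leakage expression follows. It assumes the sign convention that the bias voltage is negative under reverse bias, so the current saturates at about −A·Js; the function names and example values are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temperature_k):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temperature_k / ELECTRON_CHARGE


def diode_leakage(area, j_s, v_bias, temperature_k=300.0):
    """I = A * Js * (exp(Vbias / Vth) - 1).

    v_bias is negative for a reverse-biased junction (assumed convention).
    """
    v_th = thermal_voltage(temperature_k)
    return area * j_s * (math.exp(v_bias / v_th) - 1.0)
```

At room temperature the thermal voltage is about 26 mV, so for a reverse bias of even a few hundred millivolts the exponential term is negligible and the leakage is set by the junction area and the saturation current density.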
Figure 1.11 Components of Leakage Power
1.7 CROSSTALK NOISE
As the feature sizes have been shrinking with process-technology scaling, the
spacing between adjacent interconnect wires keeps decreasing in every process
technology. Also, while the lateral width of interconnect wires has been scaled down
significantly, their vertical height has not been scaled in proportion (as shown in Figure
1.12). Both these trends lead to a very rapid increase in the amount of coupling
capacitance (essentially like parallel-plate capacitors) between the wires. Coupling
capacitance accounts for more than 85% of the total interconnect capacitance
in the 90nm technology node.
Figure 1.12 Shrinking of wire geometries in nanometer process technologies
leads to an increase in the amount of coupling capacitance.
More aggressive technology scaling will only lead to an increase in the overall
contribution of the coupling capacitances to the total interconnect capacitance.
Therefore, signal-integrity issues such as crosstalk noise have become important when
performing timing verification of VLSI chips. Due to capacitive coupling, the switching
characteristic of a net is affected by simultaneous switching of nets that are in close
physical proximity. The net under analysis which suffers from coupling noise is referred
to as victim, and all neighboring nets which contribute to coupling noise on the victim
are termed as aggressors. Figure 1.13 illustrates coupling noise injected on the victim
due to the rising transition of its aggressor.
As the aggressor transition occurs, the voltage at the victim gets pulled up due to
the AC current flowing through the coupling capacitance. The resulting glitch in the
victim voltage due to the aggressor transition is referred to as the coupling-noise pulse. The
peak of the coupling-noise pulse usually occurs when the aggressor transition is
completed and there is no more coupling current flowing through the coupling
capacitance. Finally, the coupling-noise pulse gradually dies down once the aggressor
transition is completed and the victim node discharges the accumulated charge.
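The shape of the coupling-noise pulse can be reproduced with a small simulation. This is a sketch under simplifying assumptions that are not part of the original text: the aggressor is an ideal linear ramp, the victim is held at ground through a single driver resistance, and all component values and names are hypothetical. The victim node obeys (Cc + Cg)·dVv/dt = Cc·dVa/dt − Vv/R.

```python
def simulate_noise_pulse(vdd=1.0, t_rise=100e-12, r_victim=1e3,
                         c_couple=50e-15, c_ground=50e-15,
                         t_end=1e-9, dt=1e-12):
    """Forward-Euler simulation of the victim voltage.

    Returns a list of (time, victim_voltage) samples.
    """
    c_total = c_couple + c_ground
    steps = int(round(t_end / dt))
    ramp_steps = int(round(t_rise / dt))
    v_victim = 0.0
    samples = [(0.0, v_victim)]
    for n in range(steps):
        # Aggressor slope: constant during the ramp, zero afterwards.
        dva_dt = vdd / t_rise if n < ramp_steps else 0.0
        dvv_dt = (c_couple * dva_dt - v_victim / r_victim) / c_total
        v_victim += dvv_dt * dt
        samples.append(((n + 1) * dt, v_victim))
    return samples
```

In this model the victim voltage rises for as long as the aggressor is ramping, peaks when the aggressor transition completes, and then decays with the time constant R·(Cc + Cg) as the victim driver removes the injected charge.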
Figure 1.13 The aggressor transition results in a coupling-noise pulse on the
victim
It has become imperative to model the signal-integrity issues that can arise due
to the charge transfer through the coupling capacitances for VLSI chips in the
nanometer process technology.
1.7.1 Preliminaries of Crosstalk Effects
Crosstalk can affect signal delays by changing the times at which signal
transitions occur. For example, consider the signal waveforms on the cross-coupled nets
A, B, and C in Figure 1.14. Because of capacitive cross-coupling, the transitions on net
A and net C can affect the time at which the transition occurs on net B.
A rising-edge transition on net A at the time shown in Figure 1.14 can cause the
transition to occur later on net B, possibly contributing to a setup violation for a path
containing B.
Figure 1.14 Transition Slowdown or Speedup Caused by Crosstalk.
Similarly, a falling-edge transition on net C can cause the transition to occur
earlier on net B, possibly contributing to a hold violation for a path containing B. The
logic effects of crosstalk on steady-state nets are due to the cross-coupled nets as shown
in Figure 1.15.
Figure 1.15 Glitch Due to Crosstalk.
Net B should be constant at logic zero, but the rising edge on net A causes a
noise bump or glitch on net B. If the bump is sufficiently large and wide, it can cause an
incorrect logic value to be propagated to the next gate in the path containing net B.
A net that receives undesirable cross-coupling effects from a nearby net is called
a victim net. A net that causes these effects in a victim net is called an aggressor net.
Note that an aggressor net can itself be a victim net; and a victim net can also be an
aggressor net. The terms aggressor and victim refer to the relationship between two nets
being analyzed. The timing impact of an aggressor net on a victim net depends on
several factors:
- The amount of cross-coupled capacitance
- The relative times and slew rates of the signal transitions
- The switching directions (rising, falling)
- The combination of effects from multiple aggressor nets on a single victim net
Figure 1.16 illustrates the importance of timing considerations for calculating
crosstalk effects.
Figure 1.16 Effects of Crosstalk at Different Arrival Times.
The aggressor signal A has a range of possible arrival times, from early to late.
If the transition on A occurs at about the same time as the transition on B, it could cause
the transition on B to occur later as shown in the figure, possibly contributing to a setup
violation, or it could cause the transition to occur earlier, possibly contributing to a hold
violation. If the transition on A occurs at an early time, it induces an upward bump or
glitch on net B before the transition on B, which has no effect on the timing of signal B.
However, a sufficiently large bump can cause unintended current flow by forward-
biasing a pass transistor. Similarly, if the transition on A occurs at a late time, it induces
a bump on B after the transition on B, also with no effect on the timing of signal B.
However, a sufficiently large bump can cause a change in the logic value of the net,
which can be propagated down the timing path.
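The dependence on arrival time described above can be sketched as a simple classification. This is an illustrative model only: the `overlap_window` parameter, the function name, and the returned labels are hypothetical, and a real analysis would use switching windows computed by the timing tool.

```python
def crosstalk_effect(aggressor_time, victim_time, overlap_window, same_direction):
    """Classify the effect of one aggressor transition on a victim transition.

    Times are arrival times in the same unit; overlap_window is the
    assumed maximum separation at which the two transitions interact.
    """
    if abs(aggressor_time - victim_time) <= overlap_window:
        # Overlapping transitions change the victim's delay.
        if same_direction:
            return "speedup (hold risk)"
        return "slowdown (setup risk)"
    if aggressor_time < victim_time:
        return "bump before victim transition (no timing effect)"
    return "bump after victim transition (no timing effect)"
```

Bumps outside the overlap window do not change the victim's delay, but as noted above they can still cause functional problems if they are large enough.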
CHAPTER 2
LITERATURE REVIEW
K. Nose and T. Sakurai proposed a new buffer insertion scheme for
bi-directional buses, namely the dual-rail bus (DRB) scheme, which does not have
noise problems [3]. They also proposed a high-speed buffer insertion scheme for
uni-directional buses that makes use of staggered firing; the staggered firing bus (SFB)
is proposed and measured. As device dimensions are scaled down, interconnect RC
delay becomes the dominant performance limiter in high-performance VLSIs.
Another issue in the submicron interconnects is a drastic increase of coupling
capacitance due to the higher aspect ratio to reduce the interconnect resistance. The
increase of the coupling capacitance degrades signal integrity, inducing noise problems
and delay fluctuation problems. The original buffer insertion, however, cannot be
applied to bi-directional buses because the buffer is uni-directional in nature. These
circuits turn out to be prone to malfunctions when there is a noise from adjacent lines in
scaled down interconnect systems where capacitive coupling is large.
B. Victor and K. Keutzer proposed to employ data encoding to eliminate
crosstalk delay within a bus. They present a rigorous analysis of the theory behind
"self-shielding codes" and give the fundamental theoretical limits on the
performance of codes with and without memory [9]. In this paper, they
introduced the concept of using data encoding to mitigate crosstalk delay on buses.
The latter would involve codes that eliminate crosstalk delay as well as reduce
average power consumption or perform error detection or correction. The propagation
delay across long on-chip buses is increasingly becoming a limiting factor in
high-speed designs. Crosstalk between adjacent wires on the bus may create a significant
portion of this delay. Placing a shield wire between each signal wire alleviates the
crosstalk problem but doubles the area used by the bus.
M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De proposed Static
Pulsed Bus (SPB) driver, repeater and receiver techniques that reduce the
worst-case CCM value to 1 [11]. RC delay of long on-chip interconnects continues to be
a key limiter to performance and power of microprocessors. Coupling
capacitance (Cc) between neighbouring lines remains a large fraction (50%) of the
total line capacitance (CINT) with technology scaling.
SPB offers significant advantages over SB in delay, energy, total
device width and peak Vcc current. These improvements are due to reduction
in CCM and repeater skewing enabled by monotonic signal transition. Unlike
dynamic schemes, energy savings of SPB are maintained across all activity factors
without any clock power or routing overhead.
R. Arunachalam, E. Acar, and S. Nassif observed that common current methods
to decrease coupling noise include shielding and buffering, both of which can increase
overall power dissipation [1]. An alternative method is spacing, which has the added
benefit of improving the manufacturability (i.e. defect insensitivity) of the design.
Capacitive coupling is recognized as one of the most critical problems that designers
need to address for deep submicron technologies.
A commonly used technique to avoid coupling is to shield signal lines from each
other by inserting power/ground lines in between them. Although shielding practically
eliminates coupling between signal lines, it does result in increased power and area.
Their results demonstrate that spacing is a viable and more effective option than
shielding for low power designs, even after budgeting for noise and delay increase due
to coupling. Excessive (unnecessary) shielding may significantly increase the total
capacitance of the signal line, which dissipates more dynamic power in operation.
M. Khellah, M. Ghoneima, J. Tschanz, Y. Ye, N. Kurd, J. Barkatullah, S.
Nimmagadda and Y. Ismail presented a bus architecture called Skewed Repeater Bus
(SRB) for reducing on-chip interconnect energy in microprocessors [5]. By introducing
relative delay between neighbouring bus lines, SRB reduces both average and
worst-case coupling capacitance between those lines. On-chip interconnect RC delay is not
scaling with technology.
Therefore, the longest interconnect that can be accommodated in a single clock
cycle between two flip-flops (flop distance) reduces, and the number of clock cycles
needed to propagate a signal from one point of the chip to another increases. This
adversely impacts overall performance of the processor.
H. Deogun, R. Senger, D. Sylvester, R. Brown, and K. Nowka presented a
new dual-VDD bus technique that is well suited for low power operation [12]. This
technique adapts static pulsed bus architecture to use dual-VDD power supplies. During
quiescent periods, the bus system idles at the lower of the two VDD supplies, thereby
lowering static power dissipation. When actively transitioning, the inverters in the bus
system are temporarily boosted to the higher VDD supply to provide the needed drive
strength for performance. Minimizing power consumption is a problem that has been,
and will continue to be, attacked on a variety of design fronts.
A first-order analysis projects that the fraction of total cells in a
functional logic block used for repeaters will grow from 6% at the 90nm node to 70% at
the 32nm node. This rapid increase in repeaters, which are generally sized aggressively
to maintain delay and signal skew, dramatically increases the total power of the
integrated circuit. They have introduced a novel dual-VDD boosted pulsed bus
technique for total power reduction. They have shown that this technique maintains
noise margins very close to those found in a traditional bus design. They showed that
the achievable savings are consistent through a wide range of data switching rates.
H. Kaul, J. Seo, M. Anders, D. Sylvester, and R. Krishnamurthy described
an alternate repeater insertion technique that uses correct-by-construction polarities to
reduce the worst-case Miller coupling factor (MCF) across any multi-segment
portion of a repeated bus [7]. The increased integration of multiple cores and large
shared caches in microprocessors requires improved energy efficiency for core-to-core,
core-to-cache and even intra-core communication to sustain the required performance
benefits within shrinking power envelopes.
For multi-core chips, improved bus designs need to satisfy the constraints of
drop-in replacement for minimal change in design methodology, robust operation
and gains across process corners and design space, and the ability to improve the
power-performance of shared buses with multiple driver and receiver points along the bus.
The technique is robust across process corners and for bus designs with non-equidistant
repeater placement.
C. J. Akl and M. A. Bayoumi proposed a hybrid polarity repeater insertion
technique that combines inverting and non-inverting repeater insertion to achieve
constant average effective coupling capacitance per wire transition for all possible
switching patterns [8]. A simple yet effective hybrid polarity repeater insertion technique
is used to minimize delay uncertainty and reduce delay and/or energy dissipation and
buffer area of on-chip buses and adjacent signal wires. The reduction in worst-case
capacitive coupling reduces peak energy, which is a critical factor for thermal regulation
and packaging. With the continuous scaling of CMOS technology, interconnect delays
dominate gate delays and become the major limiter to high performance systems.
One of the major causes of interconnect delay degradation is the high
cross-coupling capacitance between adjacent signal wires in deep sub-micrometer (DSM)
technologies. This is mainly due to the dense wiring employed to achieve high
integration density, and the increased aspect ratio used to lower interconnect resistance.
The proposed technique has negligible overhead and it has regular and uniform layout.
It can be easily integrated into current automated layout tools, which can be a possible
extension of this work.
K. Hirose and H. Yasuura proposed a technique that reduces the maximum bus
delay caused by crosstalk by skewing the signal transition timing of adjacent wires so
that simultaneous opposite transitions are prevented[14]. As CMOS technology scales
down, the horizontal coupling capacitance makes crosstalk interference between
adjacent wires a serious problem in VLSI design.
The bus wires are placed so that normal-timing wires alternate with shifted-timing
wires, which ensures that no two adjacent wires switch at the same transition timing.
The timing shift can be produced by a delay line built from an inverter chain or by a
two-phase clock. An approximate equation for the bus delay shows that the technique
is also effective for repeater-inserted buses.
K. Bernstein, C.-T. Chuang, R. Joshi, and R. Puri described CMOS scaling
challenges for sub-90 nm designs and the CAD support needed to address energy,
parameter variation, and micro-architectural challenges[12]. Design practice will have
to change from today's deterministic design to probabilistic and statistical design.
Future high-performance microprocessor design with technology scaling beyond 90 nm
will pose two major challenges: (1) energy and power, and (2) parameter variations.
Excessive subthreshold and gate-oxide leakage are emerging as serious
problems. The active power density of memory, such as on-chip SRAM cache, is an
order of magnitude smaller than that of logic. Therefore, the overall processor
performance can be improved in a more energy-efficient manner by using more memory
than logic. This effectively reduces the overall activity factor of the chip. Dual-Vt
designs can reduce leakage power during active operation, burn-in and standby.
CHAPTER 3
DESIGN METHODOLOGY
3.1 EXISTING METHOD
The hybrid polarity repeater insertion technique is shown in Figure 3.1. The
inverting repeaters at the midpoint of alternate bus lines are replaced with non-inverting
repeaters. For an n-bit bus, n/2 inverting repeaters are substituted with non-inverting
ones. When a worst-delay transition occurs in the first half of a line, a best-delay
transition occurs in the second half, and vice versa.
Figure 3.1 Proposed hybrid polarity repeater insertion with non-inverting
repeaters inserted at the midpoint of alternate bus wires.
This leads to a constant average coupling capacitance factor for all possible input
transitions, as shown in Table 3.1. As a result, both the worst-case delay and the
delay uncertainty are reduced. This technique has negligible overhead and a regular
and uniform layout. It is useful in many applications and can be easily integrated
into current automated layout tools, which is a possible extension of this work.
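The constant average can be checked with a small behavioural model. The sketch below is illustrative only: it assumes the usual per-neighbour switch-factor model (0 for a neighbour switching in the same direction, 1 for a quiet neighbour, 2 for the opposite direction) and assumes that the midpoint repeater swap simply reverses the relative polarity between a line and both of its neighbours in the second half. The function names are hypothetical and do not come from the cited work.

```python
from itertools import product

def coupling_factor(victim, neighbour):
    """Per-neighbour switch factor: 0 same direction, 1 quiet, 2 opposite.

    Transitions are encoded as +1 (rising), -1 (falling) and 0 (quiet).
    """
    return 1 - victim * neighbour

def hybrid_polarity_average(victim, left, right):
    """Average factor seen by the middle line over the two halves of the wire."""
    first_half = coupling_factor(victim, left) + coupling_factor(victim, right)
    # Past the midpoint the middle line and its neighbours have opposite
    # repeater polarity, so every relative transition direction is reversed.
    second_half = coupling_factor(victim, -left) + coupling_factor(victim, -right)
    return (first_half + second_half) / 2

averages = {
    (left, right): hybrid_polarity_average(+1, left, right)
    for left, right in product((-1, 0, +1), repeat=2)
}
for pattern, average in averages.items():
    print(pattern, average)
```

Under these assumptions every neighbour pattern yields the same average (two coupling capacitances for the two neighbours, that is, a factor of 1 per neighbour), so neither the factor of 2 nor the factor of 0 survives over the full line.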
Table 3.1 Average Coupling Capacitance Factor of the middle switching line
for all possible input transition patterns
3.1.1 Drawbacks of Existing Method
This technique reduces the overall worst-case MCF of an interconnect to 1, but
it also eliminates the best-case MCF of 0 (all adjacent wires switching in the same
direction), which reduces its advantage in average energy consumption.
3.2 PROPOSED METHOD
The edge encoding technique and the coding scheme are used to reduce the power
consumption due to coupling capacitance and self transitions, respectively. In the edge
encoding technique, a worst-case MCF of 1 is achieved while preserving the best-case
MCF of 0. This is done by controlling the edges of rising and falling transitions in time,
namely always performing rising transitions on the negative edge of the clock and
falling transitions on the positive edge of the clock (or vice versa).
3.2.1 Edge Encoding Technique
In a multi-cycle bus structure, the transitions between neighboring wires are
synchronized at every flip-flop as the signal propagates down the bus. This often
generates simultaneous switching of adjacent wires in the opposite or same direction. In
Figure 3.2, the worst-case and best-case switching of the conventional bus are shown.
The MCF=2 case, where every other wire switches in the opposite direction, generates
the worst-case delay, which defines the clock frequency, and also consumes the
worst-case energy due to maximum coupling capacitance. To avoid this, we propose to
selectively shift rising and falling edges and separate them by as much as a half cycle. For
example, as seen in Figure 3.2 (b), if we selectively delay only the rising transitions by
a half cycle and keep the falling transitions unaltered, the worst-case MCF is reduced
from 2 to 1. We refer to this selective edge shifting as edge encoding.
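The effect on the Miller capacitance factor can be illustrated with a short model. This is a sketch under stated assumptions, not a circuit simulation: it treats a neighbour that switches in a different half cycle as static while the victim switches, and the names `conventional_mcf`, `edge_encoded_mcf` and `mcf_range` are hypothetical.

```python
from itertools import product

RISE, FALL, QUIET = +1, -1, 0

def conventional_mcf(victim, neighbour):
    """All transitions share one clock edge: 0 same, 1 quiet, 2 opposite."""
    return 1 - victim * neighbour

def edge_encoded_mcf(victim, neighbour):
    """Rising and falling edges are half a cycle apart.

    A neighbour switching in the same direction is shifted together with the
    victim (factor 0). An opposite-going neighbour switches in the other half
    cycle, so it is static while the victim switches (factor 1, not 2).
    """
    return 0 if neighbour == victim else 1

def mcf_range(mcf):
    """Best and worst per-neighbour factor over all switching patterns."""
    factors = [mcf(v, n) for v, n in product((RISE, FALL), (RISE, FALL, QUIET))]
    return min(factors), max(factors)

print("conventional:", mcf_range(conventional_mcf))
print("edge encoded:", mcf_range(edge_encoded_mcf))
```

In this model the conventional bus spans factors 0 to 2, while the edge-encoded bus spans 0 to 1: the worst case is halved and the best case is kept.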
(a) Conventional wire switching
(b) Proposed wire switching
Figure 3.2 Conventional and proposed wire switching scenario in adjacent wires
Since edge encoding shifts the same transitions together, the advantage of
best-case switching is still maintained, which is unachievable in most other approaches.
Since the edge-encoded signal transitions at both positive and negative edges of the
clock, we use dual-edge triggered flip-flops to propagate the signal along long
multi-cycle interconnects. Since the signals must be synchronized back to positive-edge
triggered flip-flops after long interconnects, the number of dual-edge flip-flops within
the multi-cycle interconnect should be even.
The objective of the edge encoder is to selectively shift the rising and falling
transition by different amounts. This encoding is done simply by performing an AND
operation between the original signal and the half-cycle delayed version of itself. In this
way, only the rising edge is delayed by a half cycle, separating simultaneous rising and
falling transition by a half cycle. Since the encoder logic is very simple, the encoding
overhead in terms of power and area is very small. This makes the edge encoding
technique a highly practical approach.
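The behaviour of this AND-based encoder can be sketched at half-cycle resolution. The model below is an illustration, assuming that the input changes only on positive clock edges (even half-cycle indices) and that the delayed copy starts at 0; the helper names are hypothetical.

```python
def to_half_cycles(bits):
    """Hold each per-cycle bit for two half cycles (data changes on positive edges)."""
    return [b for bit in bits for b in (bit, bit)]

def edge_encode(signal, initial=0):
    """AND the signal with a copy of itself delayed by one half cycle."""
    delayed = [initial] + signal[:-1]
    return [s & d for s, d in zip(signal, delayed)]

def edges(signal, initial=0):
    """Return (rising, falling) lists of half-cycle indices where the signal changes."""
    rising, falling = [], []
    previous = initial
    for k, value in enumerate(signal):
        if value > previous:
            rising.append(k)
        elif value < previous:
            falling.append(k)
        previous = value
    return rising, falling

enc_in = to_half_cycles([0, 1, 0, 1, 1, 0])
enc_out = edge_encode(enc_in)
print(enc_in)
print(enc_out)
print(edges(enc_in), edges(enc_out))
```

For this input the rising edges move from half-cycle indices 2 and 6 to 3 and 7 (negative clock edges), while the falling edges stay at indices 4 and 10 (positive clock edges). A data bit that is high for a single cycle therefore becomes a half-cycle pulse.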
We propose two schemes to effectively use the edge-encoding technique in
multi-cycle interconnect. The two methods differ in the procedure to cope with the
initial half cycle latency required for edge encoding and to address the issue of aligning
back to the positive-edge triggered signal at the far end of the wire.
3.2.1.1 Zero Latency (ZL) Scheme
The ZL scheme reduces energy consumption in multi-cycle interconnects
without any latency overhead, although encoding requires a half-cycle delay at the near
end of the wire. This scheme exploits the fact that signal propagation in the
edge-encoded bus is faster than in the conventional bus due to reduced coupling
capacitance.
The block diagram of a multi-cycle interconnect with simple encoder logic is
shown in Figure 3.3.
Figure 3.3 Encoder logic and block diagram of ZL scheme.
The encoding procedure and the propagation of the encoded signal are shown in
Figure 3.3. When data toggles every cycle, the encoder generates a half-cycle pulse
(enc_out). As this half-cycle pulse propagates through an even number of dual-edge
flip-flops, it automatically aligns back to a positive-edge triggered signal (ff4_out) at the
far end. Therefore, there is no need for any decoder circuit.
Figure 3.4. Timing diagram of ZL scheme
To achieve overall zero latency, the interconnect system is set up as shown in
Figure 3.5. If the conventional scheme requires n cycles to propagate through the entire
interconnect, the edge-encoded bus must propagate through it in (2n-1) half cycles,
considering that the encoding takes one half cycle and the signal must still be
synchronized at the far end of the wire. In Figure 3.5, L1 is the distance between
positive-edge triggered flip-flops in the conventional bus, and L2 is the distance between
dual-edge triggered flip-flops in the edge-encoded bus. Since both buses span the same
total length n L1, overall zero latency is achievable if L2 is defined by L1 and n as
follows:
$$ L_2 = \frac{n}{2n - 1} \, L_1 \qquad (3.1) $$
Figure 3.5 Flip-flop placement in conventional and ZL edge encoding scheme.
For example, in a 9 mm interconnect with n=3 and L1=3 mm, L2 is 1.8 mm: the
edge-encoded signal propagates 1.8 mm every half cycle, while the conventional signal
propagates 3 mm every cycle (1.5 mm every half cycle). Effectively, the edge-encoded
signal travels 20% farther (1.8 versus 1.5 mm) during the same time period, which is
possible when at least a 17% speedup is achieved in the edge-encoded bus due to
coupling capacitance reduction.
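The numbers in this example follow from dividing the total length n L1 over (2n-1) half cycles, that is, L2 = n L1 / (2n-1). The sketch below reproduces them; it assumes that wire delay scales linearly with length (a reasonable approximation for a repeated wire), and the function names are hypothetical.

```python
def zl_flop_spacing(n_cycles, l1_mm):
    """Dual-edge flip-flop spacing L2 that keeps the overall latency at n cycles."""
    return n_cycles * l1_mm / (2 * n_cycles - 1)

def required_delay_reduction(n_cycles, l1_mm):
    """Fractional reduction in delay per unit length needed by the ZL scheme.

    Assumes delay is proportional to wire length, so covering a longer
    distance in the same half cycle needs a proportionally faster wire.
    """
    conventional_per_half_cycle = l1_mm / 2
    encoded_per_half_cycle = zl_flop_spacing(n_cycles, l1_mm)
    return 1 - conventional_per_half_cycle / encoded_per_half_cycle

l2 = zl_flop_spacing(3, 3.0)
print(f"L2 = {l2:.2f} mm")
print(f"extra distance per half cycle = {l2 / 1.5 - 1:.1%}")
print(f"required delay reduction = {required_delay_reduction(3, 3.0):.1%}")
```

For n=3 and L1=3 mm this gives L2=1.8 mm, 20% more distance per half cycle, and a required delay reduction of about 16.7%, which matches the 17% speedup quoted above. Under the linear-delay assumption the required reduction simplifies to 1/(2n), so longer interconnects need a smaller speedup.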
3.2.1.2 One Cycle Latency (OCL) Scheme
In multi-cycle interconnects, multiple cycles are required to propagate across
the entire wire. In these cases, one additional cycle latency may be acceptable if a clock
frequency increase or aggressive energy reduction is a higher design priority. The OCL
scheme is intended to achieve further performance improvement and energy reduction
for a fixed throughput at the expense of one-cycle latency.
After the encoding, the data must eventually align to the positive edge of the
clock at the far end of the wire. To achieve this, we can align the transition at the near
end to the positive edge of the clock by encoding with a full one cycle delay, and then
allow for normal signal propagation along the wire. The one-cycle latency is therefore
introduced once at the beginning of the wire and the throughput is not hampered.
The block diagram of the OCL edge encoding scheme is shown in Figure 3.6.
Figure 3.6. Encoder logic and block diagram of OCL scheme.
The difference between the encoder in Figure 3.6 and that of the ZL scheme is that a
dual-edge flip-flop is added at the output to intentionally delay enc_in by one cycle and
align the rising edge of enc_out with the positive edge of the clock, as shown in
Figure 3.6. The timing diagrams of the OCL edge encoding scheme are shown in Figure 3.7.
Figure 3.7. Timing diagram of OCL scheme
The corresponding flip-flop placement in the OCL edge encoding scheme is
shown in Figure 3.8. Dual-edge flip-flops are placed at intervals equal to half the flop
distance of the conventional bus. In the OCL edge-encoded bus, since the worst-case
wire delay is reduced due to MCF reduction, we can either increase the clock frequency
for high-performance busses or downsize the repeaters for iso-performance to the
conventional bus for aggressive energy reduction.
Figure 3.8 Flip-flop placement in conventional and OCL edge encoding scheme
3.2.2. Coding Scheme
This scheme minimizes self-transition activity on the bus lines. The present
data is converted into four candidate patterns: inversion of the odd lines, inversion of
the even lines, swapping of adjacent bits, and the data left unchanged. The number of
self transitions between each of these four patterns and the previously coded data is
calculated by an energy estimator, a comparator finds the pattern with the fewest self
transitions, and that pattern is transmitted on the bus.
Let the data on an n-bit wide bus at time instant t be denoted as
At = $\{a^t_{n-1}, a^t_{n-2}, \ldots, a^t_1, a^t_0\}$. The data transmitted on the bus is
denoted as A(t)enc. The function calculateST_n(data1, data2) finds the number of self
transitions between data1 and data2. Here, data1 and data2 should be n bits wide. The
function swapAdj_n(At) swaps the adjacent bits in At and gives the output
Asw(t) = $\{a^t_{n-2}, a^t_{n-1}, \ldots, a^t_0, a^t_1\}$.
The coding scheme is as follows:
1) Let A(t-1)enc be the previously coded data which was transmitted on the bus and let
At be the present data which should be encoded and transmitted.
2) Invert the odd lines in At and suffix the result with 00 for decoding purposes. Let this
new data be denoted as At(odd). Evaluate st_odd = calculateST_n(At(odd), A(t-1)enc).
3) Similarly, invert the even lines in At and suffix the result with 01. Let this new data be
denoted as At(eve). Evaluate st_eve = calculateST_n(At(eve), A(t-1)enc).
4) Let Asw(t) = swapAdj_n(At). Suffix Asw(t) with 10 and let this new data be denoted
as At(swp). Evaluate st_swp = calculateST_n(At(swp), A(t-1)enc).
5) Suffix At with 11 and let this new data be denoted as At(unc). Evaluate
st_unc = calculateST_n(At(unc), A(t-1)enc).
6) Find min(st_odd, st_eve, st_swp, st_unc).
7) The coded pattern corresponding to the minimum value in step 6 is transmitted.
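The seven steps can be sketched as a short behavioural model. This is an illustration rather than the implemented design. It assumes an 8-bit bus as in the appendix, that line 0 is the least significant bit (so the odd lines are bits 1, 3, 5 and 7), that the 2-bit suffix occupies the two least significant bits of the coded word, and that ties are resolved in the order odd, even, swap, unchanged. None of these choices is stated in the text, and the Python names are hypothetical counterparts of calculateST_n and swapAdj_n.

```python
N = 8  # bus width (assumed even)
ODD_MASK = int("10" * (N // 2), 2)   # lines 1, 3, 5, ... (line 0 is the LSB)
EVEN_MASK = int("01" * (N // 2), 2)  # lines 0, 2, 4, ...

def swap_adj(data):
    """Swap each pair of adjacent bits (line 0 with 1, line 2 with 3, ...)."""
    return ((data & ODD_MASK) >> 1) | ((data & EVEN_MASK) << 1)

def calculate_st(data1, data2):
    """Number of self transitions: lines whose value differs between two words."""
    return bin(data1 ^ data2).count("1")

def candidates(data):
    """Return the four coded patterns, each carrying its 2-bit suffix as the LSBs."""
    return [
        ((data ^ ODD_MASK) << 2) | 0b00,   # step 2: odd lines inverted
        ((data ^ EVEN_MASK) << 2) | 0b01,  # step 3: even lines inverted
        (swap_adj(data) << 2) | 0b10,      # step 4: adjacent bits swapped
        (data << 2) | 0b11,                # step 5: unchanged
    ]

def encode(data, previous_coded):
    """Steps 6 and 7: pick the candidate with the fewest self transitions.

    min() returns the first minimal element, so ties go to the earlier step.
    """
    return min(candidates(data), key=lambda c: calculate_st(c, previous_coded))

previous = 0
for word in (0xAA, 0x55, 0x00):
    coded = encode(word, previous)
    print(f"{word:08b} -> {coded:010b} ({calculate_st(coded, previous)} transitions)")
    previous = coded
```

Counting transitions over all n+2 lines, suffix included, mirrors the 10-bit input of the energy component in the appendix.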
The block diagram of the proposed encoder is given in Figure 3.9.
Figure 3.9. Block diagram of the encoder of coding scheme
The encoder takes an n-bit input. The calculateST_n functions in steps
2, 3, 4 and 5 of the coding scheme above are evaluated by the Energy Estimator block.
The outputs of the Energy Estimator block (st_odd, st_eve, st_swp and st_unc)
are compared among themselves to find the minimum. The encoded data pattern
corresponding to the minimum number of self transitions is transmitted on the bus.
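A self-transition count is the number of ones in the bitwise XOR of the candidate and the previously transmitted pattern. The sketch below models one way to count those ones with full and half adders, mirroring the arrangement of FA and HA instances in the energy entity listed in the appendix. The XOR with the previous word and the Python function names are assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))

def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b

def energy_estimator(candidate, previous):
    """Count the differing bits of two 10-bit words with an FA/HA tree."""
    diff = candidate ^ previous
    exr = [(diff >> i) & 1 for i in range(10)]
    s1, c1 = full_adder(exr[1], exr[2], exr[3])
    s2, c2 = full_adder(exr[4], exr[5], exr[6])
    s3, c3 = full_adder(exr[7], exr[8], exr[9])
    s4, c4 = full_adder(c1, c2, c3)   # adds the three weight-2 carries
    s5, c5 = full_adder(s1, s2, s3)   # adds the three weight-1 sums
    b0, c6 = half_adder(exr[0], s5)   # result bit 0
    b1, c7 = full_adder(s4, c6, c5)   # result bit 1
    b2, b3 = half_adder(c7, c4)       # result bits 2 and 3
    return b0 | (b1 << 1) | (b2 << 2) | (b3 << 3)

print(energy_estimator(0b1111111111, 0))
print(energy_estimator(0b0000000101, 0b0000000110))
```

Checking the tree against a direct bit count for all 1024 possible difference patterns is a quick way to confirm that the wiring is a correct ones counter.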
The decoder to be used at the receiving end is shown in Figure 3.10.
Figure 3.10. Block diagram of the decoder of coding scheme
The two least significant bits in the transmitted pattern are given to a 2 to 4
decoder and depending on these two bits, the appropriate decoding procedure is done.
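The decoding step can be sketched in the same behavioural style. The model below is illustrative and assumes an 8-bit bus, line 0 as the least significant bit, and the suffix assignment 00 (odd lines inverted), 01 (even lines inverted), 10 (adjacent bits swapped) and 11 (unchanged) from the coding scheme; the function names are hypothetical.

```python
N = 8  # bus width (assumed even)
ODD_MASK = int("10" * (N // 2), 2)   # lines 1, 3, 5, ... (line 0 is the LSB)
EVEN_MASK = int("01" * (N // 2), 2)  # lines 0, 2, 4, ...

def swap_adj(data):
    """Swap each pair of adjacent bits; applying it twice restores the word."""
    return ((data & ODD_MASK) >> 1) | ((data & EVEN_MASK) << 1)

def decode(coded):
    """Recover the original n-bit word from an (n + 2)-bit coded pattern."""
    suffix = coded & 0b11   # the two least significant bits select the operation
    payload = coded >> 2
    if suffix == 0b00:
        return payload ^ ODD_MASK    # undo inversion of the odd lines
    if suffix == 0b01:
        return payload ^ EVEN_MASK   # undo inversion of the even lines
    if suffix == 0b10:
        return swap_adj(payload)     # undo the adjacent-bit swap
    return payload                   # suffix 11: data was sent unchanged

word = 0b11010010
for coded in (
    ((word ^ ODD_MASK) << 2) | 0b00,
    ((word ^ EVEN_MASK) << 2) | 0b01,
    (swap_adj(word) << 2) | 0b10,
    (word << 2) | 0b11,
):
    print(f"{coded:010b} -> {decode(coded):08b}")
```

All three transformations are their own inverses, so the decoder applies the same operation that the encoder selected.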
3.3 SOFTWARE DESCRIPTION
VHDL is a versatile and powerful hardware description language which is useful
for modeling electronic systems at various levels of design abstraction.
3.3.1 VHDL
VHDL (VHSIC hardware description language) is a hardware description
language used in electronic design automation to describe digital and mixed-signal
systems such as field-programmable gate arrays and integrated circuits.
3.3.1.1 Design
VHDL is commonly used to write text models that describe a logic circuit. Such
a model is processed by a synthesis program, only if it is part of the logic design. A
simulation program is used to test the logic design using simulation models to represent
the logic circuits that interface to the design. This collection of simulation models is
commonly called a test bench.
VHDL has constructs to handle the parallelism inherent in hardware designs, but
these constructs (processes) differ in syntax from the parallel constructs in Ada (tasks).
Like Ada, VHDL is strongly typed and is not case sensitive. In order to directly
represent operations which are common in hardware, there are many features of VHDL
which are not found in Ada, such as an extended set of Boolean operators including
nand and nor. VHDL also allows arrays to be indexed in either ascending or descending
direction; both conventions are used in hardware, whereas in Ada and most
programming languages only ascending indexing is available.
VHDL has file input and output capabilities, and can be used as a general-
purpose language for text processing, but files are more commonly used by a simulation
test bench for stimulus or verification data. There are some VHDL compilers which
build executable binaries. In this case, it might be possible to use VHDL to write a test
bench to verify the functionality of the design using files on the host computer to define
stimuli, to interact with the user, and to compare results with those expected. However,
most designers leave this job to the simulator.
It is relatively easy for an inexperienced developer to produce code that
simulates successfully but that cannot be synthesized into a real device, or is too large to
be practical. One particular pitfall is the accidental production of transparent latches
rather than D-type flip-flops as storage elements.
One can design hardware in a VHDL IDE (for FPGA implementation such as
Xilinx ISE, Altera Quartus, Synopsys Synplify or Mentor Graphics HDL Designer) to
produce the RTL schematic of the desired circuit. After that, the generated schematic
can be verified using simulation software which shows the waveforms of inputs and
outputs of the circuit after generating the appropriate test bench. To generate an
appropriate test bench for a particular circuit or VHDL code, the inputs have to be
defined correctly. For example, for clock input, a loop process or an iterative statement
is required.
A final point is that when a VHDL model is translated into the "gates and wires"
that are mapped onto a programmable logic device such as a CPLD or FPGA, then it is
the actual hardware being configured, rather than the VHDL code being "executed" as if
on some form of a processor chip.
3.3.1.2 Advantages
The key advantage of VHDL, when used for systems design, is that it allows the
behavior of the required system to be described (modeled) and verified (simulated)
before synthesis tools translate the design into real hardware (gates and wires).
Another benefit is that VHDL allows the description of a concurrent system.
VHDL is a dataflow language, unlike procedural computing languages such as BASIC,
C, and assembly code, which all run sequentially, one instruction at a time.
A VHDL project is multipurpose. Being created once, a calculation block can be
used in many other projects. However, many formational and functional block
parameters can be tuned (capacity parameters, memory size, element base, block
composition and interconnection structure).
A VHDL project is portable. Being created for one element base, a computing
device project can be ported to another element base, for example VLSI with various
technologies.
CHAPTER 4
SIMULATION RESULTS
4.1 POWER ANALYSIS OF ENCODER IN CODING SCHEME:
Figure 4.1 reports a dynamic power of 0.068 W and a quiescent (leakage) power of
0.010 W, which together give a total on-chip power of 0.078 W for the encoder.
Figure 4.1 Power analysis of encoder in coding scheme
CHAPTER 5
CONCLUSION AND FUTURE ENHANCEMENTS
5.1 CONCLUSION
APPENDIX
CODING
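library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Encoder of the coding scheme (Section 3.2.2) for an 8-bit input.
entity self is
    Port ( a      : in  STD_LOGIC_VECTOR (7 downto 0);
           encout : out STD_LOGIC_VECTOR (9 downto 0) );
end self;

architecture Behavioral of self is
    signal odd, even, swap, append : STD_LOGIC_VECTOR (9 downto 0);
    component energy is
        Port ( ina  : in  STD_LOGIC_VECTOR (9 downto 0);
               bout : out STD_LOGIC_VECTOR (3 downto 0) );
    end component;
    signal egy1, egy2, egy3, egy4 : STD_LOGIC_VECTOR (3 downto 0);
begin
    -- Candidate patterns of steps 2 to 5. These assignments are an assumed
    -- form that follows Section 3.2.2 and takes line 0 as the LSB of a.
    p1 : process(a)
    begin
        odd    <= (a xor "10101010") & "00";  -- odd lines inverted
        even   <= (a xor "01010101") & "01";  -- even lines inverted
        swap   <= a(6) & a(7) & a(4) & a(5) & a(2) & a(3) & a(0) & a(1) & "10";
        append <= a & "11";                   -- data unchanged
    end process p1;

    egymet1 : energy PORT MAP(ina=>odd,bout=>egy1);
    egymet2 : energy PORT MAP(ina=>even,bout=>egy2);
    egymet3 : energy PORT MAP(ina=>swap,bout=>egy3);
    egymet4 : energy PORT MAP(ina=>append,bout=>egy4);

    -- Steps 6 and 7 (assumed form): transmit the pattern with the fewest
    -- self transitions; earlier candidates win ties.
    encout <= odd  when unsigned(egy1) <= unsigned(egy2) and
                        unsigned(egy1) <= unsigned(egy3) and
                        unsigned(egy1) <= unsigned(egy4) else
              even when unsigned(egy2) <= unsigned(egy3) and
                        unsigned(egy2) <= unsigned(egy4) else
              swap when unsigned(egy3) <= unsigned(egy4) else
              append;
end Behavioral;
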
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use IEEE.std_logic_unsigned.all;
entity energy is
Port ( ina : in STD_LOGIC_vector(9 downto 0);
bout :out std_logic_vector(3 downto 0));
end energy;
architecture Behavioral of energy is
signal atp,exr : std_logic_vector(9 downto 0) :="0000000000";
component FA is
Port ( a,b,cin : in STD_LOGIC;
s : out std_logic;
c : out STD_LOGIC);
end component FA;
component HA is
Port ( a,b : in STD_LOGIC;
s : out std_logic;
c : out STD_LOGIC);
end component;
signal c1 : std_logic_vector(7 downto 1);
signal s1 : std_logic_vector(5 downto 1);
begin
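-- Assumed operation: XOR with the stored pattern atp marks the lines that toggle.
exr <= ina xor atp;
FA1 : FA PORT MAP(a=>exr(1),b=>exr(2),cin=>exr(3),s =>s1(1),c=>c1(1));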
FA2 : FA PORT MAP(a=>exr(4),b=>exr(5),cin=>exr(6),s =>s1(2),c=>c1(2));
FA3 : FA PORT MAP(a=>exr(7),b=>exr(8),cin=>exr(9),s =>s1(3),c=>c1(3));
FA4 : FA PORT MAP(a=>c1(1),b=>c1(2),cin=>c1(3),s =>s1(4),c=>c1(4));
FA5 : FA PORT MAP(a=>s1(1),b=>s1(2),cin=>s1(3),s =>s1(5),c=>c1(5));
HA1 : HA PORT MAP(a=>exr(0),b=>s1(5),s =>bout(0),c=>c1(6));
FA6 : FA PORT MAP(a=>s1(4),b=>c1(6),cin=>c1(5),s =>bout(1),c=>c1(7));
HA2 : HA PORT MAP(a=>c1(7),b=>c1(4),s =>bout(2),c=>bout(3));
end Behavioral;
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use IEEE.std_logic_unsigned.all;
entity HA is
Port ( a,b : in STD_LOGIC;
s : out std_logic;
c : out STD_LOGIC);
end HA;
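architecture Behavioral of HA is
begin
    s <= a xor b;   -- half-adder sum
    c <= a and b;   -- half-adder carry
end Behavioral;

-- Full adder matching the FA component declared in energy
-- (standard sum and carry equations, given here as an assumed form).
library IEEE;
use IEEE.std_logic_1164.all;
entity FA is
    Port ( a,b,cin : in STD_LOGIC;
           s : out std_logic;
           c : out STD_LOGIC);
end FA;
architecture Behavioral of FA is
begin
    s <= a xor b xor cin;
    c <= (a and b) or (cin and (a xor b));
end Behavioral;
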
REFERENCES
[1] A. B. Kahng, S. Muddu, and E. Sarto, (1999) "Interconnect optimization
strategies for high-performance VLSI designs," in Proc. Int. Conf. VLSI Des.,
pp. 464-469.
[2] A. B. Kahng, S. Muddu, and E. Sarto, (2000) "On switch factor based analysis of
coupled RC interconnects," in Proc. Des. Autom. Conf. (DAC), pp. 79-84.
[3] B. Victor and K. Keutzer, (2001) "Bus encoding to prevent crosstalk delay," in
Proc. Int. Conf. Comput.-Aided Des. (ICCAD), pp. 57-63.
[4] C. J. Akl and M. A. Bayoumi, (Sep. 2008) "Reducing interconnect delay
uncertainty via hybrid polarity repeater insertion," IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 16, no. 9, pp. 1230-1239.
[5] H. Deogun, R. Senger, D. Sylvester, R. Brown, and K. Nowka, (2006)
"A dual-Vdd boosted pulsed bus technique for low power and low leakage
operation," in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 73-78.
[6] H. Kaul, D. Sylvester, M. Anders, and R. Krishnamurthy, (Nov. 2005) "Design
and analysis of spatial encoding circuits for peak power reduction in on-chip
buses," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 11,
pp. 1225-1238.
[7] H. Kaul, J. Seo, M. Anders, D. Sylvester, and R. Krishnamurthy, (2008) "A
robust alternate repeater technique for high performance busses in the multi-core
era," in Proc. Int. Symp. Circuits Syst., pp. 372-375.
[8] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, (1996)
"Flow-through latch and edge-triggered flip-flop hybrid elements," in Dig. Tech.
Papers IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 138-139.
[9] J. Eble, V. De, D. Wills, and J. Meindl, (1998) "Minimum repeater count, size,
and energy dissipation for gigascale integration (GSI) interconnects," in Proc. Int.
Interconnect Technol. Conf., pp. 56-58.
[10] J. Seo, D. Sylvester, D. Blaauw, H. Kaul, and R. Krishnamurthy, (2007) "A
robust edge encoding technique for energy-efficient multi-cycle interconnect,"
in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 68-73.
[11] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, (2001)
"Comparative delay and energy of single edge-triggered and dual edge-triggered
pulsed flip-flops for high-performance microprocessors," in Proc. Int. Symp. Low
Power Electron. Des. (ISLPED), pp. 147-152.
[12] K. Bernstein, C.-T. Chuang, R. Joshi, and R. Puri, (2003) "Design and CAD
challenges in sub-90 nm CMOS technologies," in Proc. Int. Conf. Comput.-Aided
Des. (ICCAD), pp. 129-136.
[13] K. Bowman, J. Tschanz, M. Khellah, M. Ghoneima, Y. Ismail, and V. De, (2006)
"Time-borrowing multi-cycle on-chip interconnects for delay variation tolerance,"
in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 79-84.
[14] K. Hirose and H. Yasuura, (2000) "A bus delay reduction technique considering
crosstalk," in Proc. Des., Autom. Test Eur. (DATE), pp. 441-445.
[15] K. Nose and T. Sakurai, (2001) "Two schemes to reduce interconnect delays in
bi-directional and uni-directional buses," in Symp. VLSI Circuits Dig. Tech.
Papers, pp. 193-194.
[16] M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De, (2002) "Static pulsed bus
for on-chip interconnects," in Symp. VLSI Circuits Dig. Tech. Papers, pp. 78-79.
[17] M. Khellah, M. Ghoneima, J. Tschanz, Y. Ye, N. Kurd, J. Barkatullah,
S. Nimmagadda, and Y. Ismail, (2005) "A skewed repeater bus architecture for
on-chip energy reduction in microprocessors," in Proc. Int. Conf. Comput. Des.
(ICCD), pp. 253-257.
[18] P. P. Sotiriadis, A. Wang, and A. Chandrakasan, (2000) "Transition pattern
coding: An approach to reduce energy in interconnect," in Proc. Euro. Solid-State
Circuits Conf. (ESSCIRC), pp. 348-351.
[19] R. Arunachalam, E. Acar, and S. Nassif, (2003) "Optimal shielding/spacing
metrics for low power design," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI,
pp. 167-172.
[20] S.-C. Wong, T. G.-Y. Lee, D.-J. Ma, and C.-J. Chao, (May 2000) "An empirical
three-dimensional crossover capacitance model for multilevel interconnect VLSI
circuits," IEEE Trans. Semiconductor Manufacturing, vol. 13, no. 2, pp. 219-227.