Starlee Contents

Embed Size (px)

Citation preview

  • 8/13/2019 Starlee Contents

    1/45

    1

    CHAPTER 1

    INTRODUCTION

    In current microprocessors, the number of wires used for inter-module

    communication has skyrocketed. Furthermore, the increased complexity and high level

    of integration requires higher wire densities, and coupling capacitance has dominated

    total wire capacitance for several technologies already.A high coupling capacitance

    ratio is not favorable in conventional bussesdue to the possibility of adjacent wiresswitching in the opposite direction, yielding a worst-case Miller capacitance factor

    (MCF) of 2. For example, when MCF=2 the coupling capacitance ratio over the total

    interconnect capacitance is over 80% for a minimum pitch intermediate metal layer in

    65-nm[5].It is possible to reduce coupling capacitance by increasing spacing or

    introducing shielding, but this comes at the cost of significant area penalties[1]. Hence,

    a key challenge in interconnect design is to reduce the worst-case MCF while

    maintaining the same physical footprint of the interconnect, thereby reducing the

    effective wire capacitance and interconnect energy consumption.

    In this project, a new encoding technique that achieves a worst-case MCF of 1,

    while preserving the best-case MCF of 0 has been presented. This is done by controlling

    the edges of rising and falling transition in time, namely always performing rising

    transitions on the negative edge of the clock and falling transition on the positive edge of

    the clock (or vice versa). Since the worst-case switching is separated by as much as one

    phase (half clock cycle), this technique remains robust against process variation. Hence,

    both the average and worst-case energy can be reduced without impacting the sensitivity

    to process variation. Average energy savings will aid battery life and typical energy costs,

    but worst-case energy is also a meaningful metric in terms of thermal management and

    peak demand for power grids and decoupling capacitance.[13] These savings are

    accomplished at the expense of minimal encoder logic with half cycle latency and

    additional clocking. However, we findthat the logic and clocking overhead is small in

    long interconnects where interconnect power consumption is dominant, and also show

  • 8/13/2019 Starlee Contents

    2/45

    2

    that the potential latency overhead can be eliminated or minimized in multi-cycle

    interconnects.

    1.1SELF-TRANSITIONS AND COUPLING TRANSITIONSSelf-transitions are defined as transitions on the capacitance between a bus

    line and the substrate (ground) while coupling transitions are defined as transitions

    on the capacitance between adjacent lines. Figure 1.1 shows a simplified bus model

    with coupling (ignoring all the resistances). Cs is the self-capacitance from each

    bus line to ground; Cc is the coupling-capacitance between two adjacent lines. The

    average power consumption on the bus is given by:

    ( ) (1.1)

    where s is the number of average self-transitions per bus cycle and c is the

    number of average coupling-transitions per bus cycle; only the charging transitions

    that require current flow from the power supply are being counted. For example only

    the 0 1 transitions are counted as self-transitions for power consumption.

    Figure 1.1 Bus model with self and coupling capacitances.

    1.2MILLER-COUPLING CAPACITANCEA simple way of performing timing analysis in the presence of crosstalk noise is

    to replace the coupling capacitance by an equivalent Miller capacitance to account for

    the slow down or the speed up of the victim transition. Hence, we are effectively

  • 8/13/2019 Starlee Contents

    3/45

    3

    modeling the delay noise on the victim by simply scaling the coupling capacitance with

    the appropriate Miller-coupling factor (MCF). The simple approximation leads to a

    decoupling between the aggressor-victim interconnects (as shown in Figure 1.2) and

    allows us to compute the signal arrival times using the existing STA framework.

    Figure 1.2 Decoupling of the aggressor-victim nets by using Miller-coupling

    capacitance

    The exact value of the Miller-coupling factors for the victim (MCFvictim

    ) depends

    on the mutual switching directions and the ratio of the slopes of the aggressor-victim

    waveforms

    (

    ).(1.2)

    If the aggressor-victim nets switch in mutually opposite directions (as shown in

    Figure 1.2), then MCFvictim is obtained by adding 1 to the relative ratio of the

    slopes(a/v) Conversely, if the aggressor-victim nets switch in the same direction,

  • 8/13/2019 Starlee Contents

    4/45

    4

    then MCFvictim is obtained by subtracting (a/v) from one. Suppose the aggressor-

    victim nets switch in the opposite directions with equal slopes (i.e., a = v). Using

    Equation 1.2, we can obtain the commonly used MCFvictimvalue of two. In such cases,

    the Miller capacitance used to decouple the aggressor-victim interconnects is twice the

    size of the original coupling capacitance. Similarly, if the aggressor-victim nets switch

    in the same direction with equal slopes, then we obtain an MCFvictimvalue of zero.

    1.3 FLIP-FLOP

    In electronics, a flip-flopor latchis a circuit that has two stable states and can be

    used to store state information. The circuit can be made to change state by signals

    applied to one or more control inputs and will have one or two outputs. It is the basic

    storage element in sequential logic. Flip-flops and latches are a fundamental building

    block of digital electronics systems used in computers, communications, and many

    other types of systems.

    Flip-flops and latches are used as data storage elements. Such data storage can

    be used for storage of state, and such a circuit is described as sequential logic. Whenused in a finite-state machine, the output and next state depend not only on its current

    input, but also on its current state (and hence, previous inputs). It can also be used for

    counting of pulses, and for synchronizing variably-timed input signals to some

    reference timing signal.

    Flip-flops can be either simple (transparent or opaque) or clocked (synchronous

    or edge-triggered); the simple ones are commonly called latches. The word latch is

    mainly used for storage elements, while clocked devices are described as flip-flops.

    1.3.1 Dual Edge Flip-Flops

    The edge encoding technique requires dual-edge flip-flops and hence the

    number of flip-flops placed in multi-cycle interconnects is inevitably increased

    compared to conventional multi-cycle interconnects with single-edge flip-flops.

  • 8/13/2019 Starlee Contents

    5/45

    5

    Along the critical path of multi-cycle interconnects, both the total setup time

    and CLK-Q delay in flip-flops increase as well. Hold time is not a concern even when

    using dual-edge flip-flops because the interconnect paths are well-defined with several

    repeaters and large wire load, and thus short paths between flip-flops do not exist.

    Time-borrowing flip-flops have zero or negative setup time, providing

    performance benefits over master-slave flip-flops. Therefore, in the proposed edge

    encoding techniques,time-borrowing flip-flops are considered for the inserted flip-flops

    to mitigate the increase in total setup time and variation.

    Several types of flip-flops from [18] were considered to absorb the setup time

    and mismatch between interconnect paths, for the time-borrowing dual-edge flip-flops.

    We selected the pulsed triggered dual-edge flip-flop in Figure 1.3 for the proposed

    edge encoding because it is more area-efficient and the D-Q path is shorter.

    Figure 1.3 Schematic of time borrowing pulsed dual edge flip-flop.

    1.4 TIME BORROWING WITH LATCHES

    The unique property which enables above advent transparent for the duration

    of an active clock pulse normal edge-to-edge timing requirements of sync long

    enough and is determining the maximum freq a shorter path in subsequent latch-to-

    latch stages

  • 8/13/2019 Starlee Contents

    6/45

    6

    Figure 1.4 Timing paths in a sample circuit.

    The figure 1.4 has 2 timing paths: Path 1 from the p a negative-level latch (2),

    while Path 2 is from the triggered register (3). Let us examine this simple compensate

    for the delay through the logic cloud A in Path 1, we can have two scenarios of

    timing (Figures 1.6 and 1.7.)

    Figure 1.5 Timing relationships set by the system clock.

    In Case A, data arrives from logic A at Latch 2 be this case, the behavior of the

    latch is similar to the need to borrow any time to achieve our timing go.

  • 8/13/2019 Starlee Contents

    7/45

    7

    Figure 1.6 Logic A is fast enough, so no borrowing is necessary.

    In Case B, the negative clock edge enables the la at the input of the latch. So the

    latch will go to tra from Logic A through to Register B for a while. Bu from Logic A

    reaches Logic B and passes through Register 2. So if the propagation delay of Logic

    B, some of the time reserved for Logic B, and the cir this extra time in order to

    complete its propagation analysis will consider the end of the borrowed time delay.

    Figure 1.7 Logic A borrows time from Logic B.

    While doing STA, Timing reports will be generate the timing when the latch is

    enabled is the same element (Figure 1.8)

  • 8/13/2019 Starlee Contents

    8/45

    8

    Figure 1.8 When the latch is enabled, it becomes a passive delay.

    1.5 REPEATER INSERTION

    Repeater insertion is a technique for reducing the time delay associated with

    long wire lines in integrated circuits. The technique involves cutting the long wire into

    one or more short wires and inserting arepeaterbetween each new pair of short wires.

    The time it takes for a signal to travel from one end of a wire to the other end is

    known as wire-line delay or just delay. In an integrated circuit, this delay is

    characterized byRC,theresistance of the wire (R

    ) multiplied by the wire'scapacitance(C). Thus, if the wire's resistance is 100 ohms and its capacitance is 0.01 microfarad

    (F), the wire's delay is onemicrosecond (s).

    The resistance of a wire on an integrated circuit is directly proportional, or

    linear, according to the wire's length. If a 1 mm length of the wire has 100 ohms

    resistance, then a 2 mm length will have 200 ohms resistance.

    The capacitance of a wire also increases linearly along its length. If a 1 mm

    length of the wire has 0.01 F capacitance, a 2 mm length of the wire will have 0.02 F,

    a 3 mm wire will have 0.03 F, and so on as shown in table 1.1.

    Thus, the time delay through a wire increases with the square of the wire's

    length. This is true, to first order, for any wire whose cross-section remains constant

    along the length of the wire.

    http://en.wikipedia.org/wiki/Repeaterhttp://en.wikipedia.org/wiki/RC_circuithttp://en.wikipedia.org/wiki/Electrical_resistancehttp://en.wikipedia.org/wiki/Capacitancehttp://en.wikipedia.org/wiki/Ohmhttp://en.wikipedia.org/wiki/Faradhttp://en.wikipedia.org/wiki/Microsecondhttp://en.wikipedia.org/wiki/Integrated_circuithttp://en.wikipedia.org/wiki/Directly_proportionalhttp://en.wikipedia.org/wiki/Time_delayhttp://en.wikipedia.org/wiki/Time_delayhttp://en.wikipedia.org/wiki/Directly_proportionalhttp://en.wikipedia.org/wiki/Integrated_circuithttp://en.wikipedia.org/wiki/Microsecondhttp://en.wikipedia.org/wiki/Faradhttp://en.wikipedia.org/wiki/Ohmhttp://en.wikipedia.org/wiki/Capacitancehttp://en.wikipedia.org/wiki/Electrical_resistancehttp://en.wikipedia.org/wiki/RC_circuithttp://en.wikipedia.org/wiki/Repeater
  • 8/13/2019 Starlee Contents

    9/45

    9

    Wire Length Resistance Capacitance Time delay

    1 mm 100 ohm 0.01F 1s

    2 mm 200 ohm 0.02 F 4 s

    3 mm 300 ohm 0.03 F 9 s

    Table 1.1 Analysis of wire delay with wire length.

    The interesting consequence of this behavior is that, while a single 2 mm length

    of wire has a delay of 4 s, two separate 1 mm wires only have a delay of 1 s each.

    The two separate wires cover the same distance in half the time. By cutting the wire in

    half, we can double its speed.

    Anactivecircuit must be placed between the two separate wires so as to move

    the signal from one to the next. An active circuit used for such a purpose is known as a

    repeater.In a CMOS integrated circuit, the repeater is often a simple inverter.

    Reducing the delay of a wire by cutting it in half and inserting a repeater is

    known as repeater insertion. The cost of this procedure is the additional new delay

    through the repeater itself, plus power cost because the repeater is an active circuit that

    must be powered, whereas the plain unrepeated wire was originally an unpowered

    passive component.

    1.6 BASIC LOW POWER DIGITAL DESIGN

    Moores law states that the number of transistors that can be placed

    inexpensively on an integrated circuit will double approximately every two years,hasoften been subject to the following criticism: while it boldly states the blessing of

    technology scaling, it fails to expose its bane. A direct consequence of Moores law is

    that the power density of the integrated circuit increases exponentially with every

    technology generation. This implicit trend has arguably brought about some of the

    most important changes in electronic and computer designs. Since the 1970s, most

    popular electronics manufacturing technologies used bipolar and nMOS transistors.

    However, bipolar and nMOS transistors consume energy even in a stable combinatorial

    http://en.wikipedia.org/wiki/Active_componenthttp://en.wikipedia.org/wiki/Electronic_circuithttp://en.wikipedia.org/wiki/Repeaterhttp://en.wikipedia.org/wiki/Repeaterhttp://en.wikipedia.org/wiki/Passive_componenthttp://en.wikipedia.org/wiki/Passive_componenthttp://en.wikipedia.org/wiki/Repeaterhttp://en.wikipedia.org/wiki/Electronic_circuithttp://en.wikipedia.org/wiki/Active_component
  • 8/13/2019 Starlee Contents

    10/45

    10

    state, and consequently, by 1980s, the power density of bipolar designs was considered

    too high to be sustainable.

    1.6.1 CMOS Transistor Power Consumption

    The power consumption of a CMOS transistor can be divided into three different

    components: dynamic, static (or leakage) and short circuit power consumption.

    Figure 1.9 illustrates the three components of power consumption. Dynamic and

    short circuit power are also collectively known as switching power, and are consumed

    when transistors change their logic state, but leakage power is consumed merely

    because the circuit is powered-on.

    Switching power, which includes both dynamic power and short-circuit power,

    is consumed when signals through CMOS circuits change their logic state, resulting in

    the charging and discharging of load capacitors. Leakage power is primarily due to the

    sub-threshold currents and reverse biased diodes in a CMOS transistor.

    total dynamic shortcircuit leaage..(1.3)

    Figure 1.9 The dynamic, short circuit and leakage power components

  • 8/13/2019 Starlee Contents

    11/45

    11

    1.6.2 Switching Power

    When signals change their logic state in a CMOS transistor, energy is drawn

    from the power supply to charge up the load capacitance from 0 to Vdd. For the inverter

    example in Figure 1.9, the power drawn from the power supply is dissipated as heat in

    pMOS transistor during the charging process. Energy is needed whenever charge is

    moved against some potential. Thus, dE = d(QV). When the output of the inverter

    transitions from logical 0 to 1, the load capacitance is charged. The energy drawn from

    the power supply during the charging process is given by,

    dEP= d(VQ) = V

    dd.dQ

    L

    since the power supply provides power at a constant voltage Vdd. Now, since QL=CL.VL,

    we have:

    dQL= CL.dVL

    Therefore,

    dEP= Vdd. CL. dVL

    Integrating for full charging of the load capacitance,

    = CL. Vdd2I ...(1.4)

    Thus a total of CL.Vdd2 energy is drawn from the power source. The energy EL stored

    in the capacitor at the end of transition can be computed as follows:

    dEL = d(VQ) = VL.dQL

    where VL is the instantaneous voltage across the load capacitance, and QL is the

    instantaneous charge of the load capacitance during the loading process. Therefore,

    dEL = VL. CLdVL

    Integrating for full charging of the load capacitance,

    =CL. Vdd2

    /2

    (1.5)

  • 8/13/2019 Starlee Contents

    12/45

  • 8/13/2019 Starlee Contents

    13/45

    13

    As the input voltage rises, at time t0 , we have Vin >VTn, i.e., the input voltage

    become higher than the threshold voltage of the nMOS transistor. At this time a short-

    circuit current path is established. This short circuit current increases as the nMOS

    transistor turns on. Thereafter, the short circuit current first increases and then decreases

    until, after t1, we have Vin >(Vdd-VTp),i.e., the pMOS transistor turns off, signaling the

    end of short-circuit current. Therefore, in the duration when (VTn < Vin < (Vdd - VTp))

    holds, there will be a conductive path open between Vdd and GND because both the

    nMOS and pMOS devices will be simultaneously on. Short-circuit power is typically

    estimated as:

    Pshort-circuit = 1/12.k..Fclk.( Vdd -2 Vt )3 .(1.6)

    1.6.4 Leakage Power

    The third component of power dissipation in CMOS circuits, as shown in

    Equation 1.5 is the static or leakage power. Even though a transistor is in a stable logic

    state, just because it is powered-on, it continues to leak small amounts of power at

    almost all junctions due to various effects.

    1.6.4.1 Reverse Biased Diode Leakage

    The reverse biased diode leakage is due to the reverse bias current in the

    parasitic diodes that are formed between the diffusion region of the transistor and

    substrate. It results from minority carrier diffusion and drift near the edge of depletion

    regions, and also from the generation of electron hole pairs in the depletion regions of

    reverse-bias junctions. As shown in Figure 1.11, when the input of inverter in Figure 1.9

    is high, a reverse potential difference of Vddis established between the drain and the n-

    well, which causes diode leakage through the drain junction. In addition, the n- well

    region of the pMOS transistor is also reverse biased with respect to the p-type substrate.

    This also leads to reverse bias leakage at the n-well junction.

  • 8/13/2019 Starlee Contents

    14/45

    14

    The reverse bias leakage current is typically expressed as,

    Irbdl= A.Js.(eq.Vbias/kT-1).(1.7)

    where A is the area of the junction, Js is the reverse saturation current density,

    and Vbias is the reverse bias voltage across the junction, and V th= KT/ q is the thermal

    Voltage. Reverse biased diode leakage will further become important as we continue to

    heavily dope the n- and p-regions. As a result, zener and band-to-band tunneling will

    also become contributing factors to the reverse bias current.

    Figure 1.11 Components of Leakage Power

    1.7 CROSSTALK NOISE

    As the feature sizes have been shrinking with process-technology scaling, the

    spacing between adjacent interconnect wires keeps decreasing in every process

    technology. Also, while the lateral width of interconnect wires has been scaled down

  • 8/13/2019 Starlee Contents

    15/45

    15

    significantly their vertical height has not been scaled in proportion (as shown in Figure

    1.12). Both these trends lead to a very rapid increase in the amount of coupling

    capacitance (essentially like parallel-plate capacitors) between the wires. We know that

    coupling capacitance accounts for more than 85% of the total interconnect capacitance

    in the 90nm technology node.

    Figure 1.12 Shrinking of wire geometries in the nanometer process technologyleads to an increase in the amount of coupling capacitance.

    More aggressive technology scaling will only leads to an increase in the overall

    contribution of the coupling capacitances to the total interconnect capacitance.

    Therefore, signal-integrity issues such as crosstalk noise have become important when

    performing timing verification of VLSI chips. Due to capacitive coupling, the switching

    characteristic of a net is affected by simultaneous switching of nets that are in close

    physical proximity. The net under analysis which suffers from coupling noise is referred

    to as victim, and all neighboring nets which contribute to coupling noise on the victim

    are termed as aggressors. Figure 1.13 illustrates coupling noise injected on the victim

    due to the rising transition of its aggressor.

    As the aggressor transition occurs, the voltage at the victim gets pulled up due to

    the AC current flowing through the coupling capacitance. The resulting glitch in the

    victim voltage due to the aggressor transition is referred to as coupling- noise pulse. The

  • 8/13/2019 Starlee Contents

    16/45

    16

    peak of the coupling-noise pulse usually occurs when the aggressor transition is

    completed and there is no more coupling current flowing through the coupling

    capacitance. Finally, the coupling-noise pulse gradually dies down once the aggressor

    transition is completed and the victim node discharges the accumulated charge.

    Figure 1.13 The aggressor transition resulting coupling-noise pulse on the

    victim

    It has become imperative to model the signal-integrity issues that can arise due

    to the charge transfer through the coupling capacitances for VLSI chips in the

    nanometer process technology.

    1.7.1 Preliminaries of Crosstalk Effects

    Crosstalk can affect signal delays by changing the times at which signal

    transitions occur. For example, consider the signal waveforms on the cross-coupled nets

    A, B, and C in Figure 1.14. Because of capacitive cross-coupling, the transitions on net

    A and net C can affect the time at which the transition occurs on net B.

    A rising-edge transition on net A at the time shown in Figure 1.14 can cause the

    transition to occur later on net B, possibly contributing to a setup violation for a path

    containing B.

  • 8/13/2019 Starlee Contents

    17/45

    17

    Figure 1.14 Transition Slowdown or Speedup Caused by Crosstalk.

    Similarly, a falling-edge transition on net C can cause the transition to occur

    earlier on net B, possibly contributing to a hold violation for a path containing B. The

    logic effects of crosstalk on steady-state nets are due to the cross-coupled nets as shown

    in Figure 1.15.

    Figure 1.15 Glitch Due to Crosstalk.

    Net B should be constant at logic zero, but the rising edge on net A causes a

    noise bump or glitch on net B. If the bump is sufficiently large and wide, it can cause an

    incorrect logic value to be propagated to the next gate in the path containing net B.

  • 8/13/2019 Starlee Contents

    18/45

    18

    A net that receives undesirable cross-coupling effects from a nearby net is called

    a victim net. A net that causes these effects in a victim net is called an aggressor net.

    Note that an aggressor net can itself be a victim net; and a victim net can also be an

    aggressor net. The terms aggressor and victim refer to the relationship between two nets

    being analyzed. The timing impact of an aggressor net on a victim net depends on

    several factors:

    The amount of cross-coupled capacitance

    The relative times and slew rates of the signal transitions

    The switching directions (rising, falling)

    The combination of effects from multiple aggressor nets on a single victim net

    Figure 1.16 illustrates the importance of timing considerations for calculating

    crosstalk effects.

    Figure 1.16 Effects of Crosstalk at Different Arrival Times.

    The aggressor signal A has a range of possible arrival times, from early to late.

    If the transition on A occurs at about the same time as the transition on B, it could cause

    the transition on B to occur later as shown in the figure, possibly contributing to a setup

    violation, or it could cause the transition to occur earlier, possibly contributing to a hold

  • 8/13/2019 Starlee Contents

    19/45

    19

    violation. If the transition on A occurs at an early time, it induces an upward bump or

    glitch on net B before the transition on B, which has no effect on the timing of signal B.

    However, a sufficiently large bump can cause unintended current flow by forward-

    biasing a pass transistor. Similarly, if the transition on A occurs at a late time, it induces

    a bump on B after the transition on B, also with no effect on the timing of signal B.

    However, a sufficiently large bump can cause a change in the logic value of the net,

    which can be propagated down the timing path.

  • 8/13/2019 Starlee Contents

    20/45

    20

    CHAPTER 2

    LITERATURE REVIEW

    K. Nose and T. Sakurai proposed a new buffer insertion scheme for bi-

    directional buses, namely dual-rail bus (DRB) scheme, which does not have

    noise problems [3]. One more proposal is on a high-speed buffer insertion scheme for

    uni-directional buses by making use of staggered firing. The staggered firing bus (SFB)

    is proposed and measured. As the device dimension is scaled down, interconnect RCdelay becomes dominant performance limiter in high-performance VLSI's.

    Another issue in the submicron interconnects is a drastic increase of coupling

    capacitance due to the higher aspect ratio to reduce the interconnect resistance. The

    increase of the coupling capacitance degrades signal integrity, inducing noise problems

    and delay fluctuation problems. The original buffer insertion, however, cannot be

    applied to bi-directional buses because the buffer is uni-directional in nature. These

    circuits turn out to be prone to malfunctions when there is a noise from adjacent lines in

    scaled down interconnect systems where capacitive coupling is large.

    B. Victor and K. Keutzer proposed to employ data encoding to eliminate

    crosstalk delay within a bus. They present a rigorous analysis of the theory behind

    "self-shielding codes", and gives the fundamental theoretical limits on the

    performance of codes with and without memory [9]. In this paper, they have

    introduced the concept of using data encoding to mitigate crosstalk delay on buses.

    The latter would involve codes that eliminate crosstalk delay as well as reduce

    average power consumption, perform error detection or correction. The propagation

    delay across long on-chip buses is increasingly becoming a limiting factor in high-

    speed designs. Crosstalk between adjacent wires on the bus may create a significant

    portion of this delay Placing a shield wire between each signal wire alleviates the

    crosstalk problem but doubles the area used by the bus.

  • 8/13/2019 Starlee Contents

    21/45

    21

    M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De proposed Static

    Pulsed Bus (SPB) driver, repeater and receiver techniques that reduce the worst-

    case CCM value to 1[11]. RC delay of long on-chip interconnects continues to be

    a key limiter to performance and power of microprocessors. Coupling

    capacitance (Cc) between neighbouring lines remains a large fraction (50%) of the

    total line capacitance (C1NT) with technology scaling.

    SPB offers significant advantages over SB in delay, energy, total

    device width and peak Vcc current. These improvements are due to reduction

    in CCM and repeater skewing enabled by monotonic signal transition. Unlike

    dynamic schemes, energy savings of SPB are maintained across all activity factors

    without any clock power or routing overhead.

    R. Arunachalam, E. Acar, and S. Nassif presented common current methods

    to decrease coupling noise include shielding and buffering, both of which can increase

    overall power dissipation[1]. An alternative method is spacing, which has the added

    benefit of improving the manufacturability (i.e. defect insensitivity) of the design.

    Capacitive coupling is recognized as one of the most critical problems that designers

    need to address for deep submicron technologies.

    A commonly used technique to avoid coupling is to shield signal lines from each

    other by inserting power/ground lines in between them. Although shielding practically

    eliminates coupling between signal lines, it does result in increased power and area.

    Their results demonstrate that spacing is a viable and more effective option than

    shielding for low power designs, even after budgeting for noise and delay increase due

    to coupling. Excessive (unnecessary) shielding may significantly increase the total

    capacitance of the signal line, which dissipates more dynamic power in operation.

    M. Khellah, M. Ghoneima, J. Tschanz, Y. Ye, N. Kurd, J. Barkatullah, S.

    Nimmagadda and Y. Ismailpresented a bus architecture called Skewed Repeater Bus

    (SRB) for reducing on-chip interconnects energy in microprocessors[5]. By introducing

    relative delay between neighbouring bus lines, SRB reduces both average and worst-

    case coupling capacitance between those lines On-chip interconnect RC delay is not

    scaling with technology.

  • 8/13/2019 Starlee Contents

    22/45

    22

    Therefore, the longest interconnect that can be accommodated in a single clock

    cycle between two flip-flops (flop distance) reduces, and the number of clock cycles

    needed to propagate a signal from one point of the chip to another increases. This

    adversely impacts overall performance of the processor.

    H. Deogun, R. Senger, D. Sylvester, R. Brown, and K. Nowka presented a

    new dual-VDD bus technique that is well suited for low power operation[12]. This

    technique adapts static pulsed bus architecture to use dual-VDD power supplies. During

    quiescent periods, the bus system idles at the lower of the two VDD supplies, thereby

    lowering static power dissipation. When actively transitioning, the inverters in the bus

    system are temporarily boosted to the higher VDD supply to provide the needed drive

    strength for performance. Minimizing power consumption is a problem that has been,

    and will continue to be, attacked on a variety of design fronts.

    A first-order analysis in projects that the fraction of total cells in a

    functional logic block used for repeaters will grow from 6% at the 90nm node to 70% at

    the 32nm node. This rapid increase in repeaters, which are generally sized aggressively

    to maintain delay and signal skew, dramatically increases the total power of the

    integrated circuit. They have introduced a novel dual-VDD boosted pulsed bus

    technique for total power reduction. They have shown that this technique maintains

    noise margins very close to those found in a traditional bus design. They showed that

    the achievable savings are consistent through a wide range of data switching rates.

    H. Kaul, J. Seo, M. Anders, D. Sylvester, and R. Krishnamurthy described

    an alternate repeater insertion technique that uses correct-by-construction polarities to

    reduce worst-case miller coupling factor (MCF) across any multiple segmented

    portion of a repeated bus[7]. The increased integration of multiple-cores and large

    shared caches in microprocessors requires improved energy- efficiency for core-to-core,

    core-to-cache and even intra-core communication to sustain the required performance

    benefits within shrinking power envelopes.

    For multi-core chips, improved bus designs need to satisfy the constraints of

    drop-in replacement for minimal change in design methodology, robust operation

    and gains across process corners and design space and the ability to improve power-

  • 8/13/2019 Starlee Contents

    23/45

    23

    performance of shared busses with multiple driver and receiver points along the bus.

    The technique is robust across process corners and for bus designs with non- equidistant

    repeater placement.

    C. J. Akl and M. A. Bayoumi proposed a hybrid polarity repeater insertion

    technique that combines inverting and non-inverting repeater insertion to achieve

    constant average effective coupling capacitance per wire transition for all possible

    switching patterns[8]. A simple yet effective hybrid polarity repeater insertion technique

    is used to minimize delay uncertainty and reduce delay and/or energy dissipation and

    buffers area of on-chip buses and adjacent signal wires. The reduction in worst case

    capacitive coupling reduces peak energy which is a critical factor for thermal regulation

    and packaging. With the continuous scaling of CMOS technology, interconnect delays

    dominate gate delays and become the major limiter to high performance systems.

    One of the major causes of interconnect delay degradation is the high cross-

    coupling capacitance between adjacent signal wires in deep sub- micrometer (DSM)

    technologies. This is mainly due to the dense wiring employed to achieve high

    integration density, and the increased aspect ratio used to lower interconnect resistance.

    The proposed technique has negligible overhead and it has regular and uniform layout.

    It can be easily integrated into current automated layout tools, which can be a possible

    extension of this work.

    K. Hirose and H. Yasuuraproposed a technique for reduction of maximum bus

    delay caused by crosstalk to prevent simultaneous opposite transition by skewing signal

    transition timing of adjacent wires[4].As the CMOS technology scaled down, the

    horizontal coupling capacitance makes crosstalk interference between adjacent wires aserious problem in VLSI design.

    The bus wires are placed with alternation of normal timing wire and shifted

    timing wire for the purpose of no adjacent wires switch at the same transition timing.

    The shifting time can be made by a delay line of inverter chain or two phase clock. By

    approximated equation of bus delay, it becomes clear that our technique is effective for

    repeater-inserted bus.

  • 8/13/2019 Starlee Contents

    24/45

    24

    K. Bernstein, C.-T. Chuang, R. Joshi, and R. Puri described CMOS scaling

    challenges for sub-9Onm designs and CAD challenges to support energy, parameter

    variation, and micro architectural challenges[20]. Design practice will have to change

    from todays deterministic design to probabilistic and statistical design. Future high

    performance microprocessor design with technology scaling beyond 90nm will pose

    two major challenges: (1) energy and power, and (2) parameter variations.

    Excessive sub threshold and gate oxide leakage are emerging as serious

    problem. Active power density of memory, such as on-chip SRAM cache, is an order

    of magnitude smaller than logic. Therefore, the overall processor performance can be

    improved in a more energy-efficient manner by using more memory than logic. This

    effectively reduces the overall activity factor of the chip. Dual-Vt designs can reduce

    leakage power during active operation, burn-in and standby.

  • 8/13/2019 Starlee Contents

    25/45

    25

    CHAPTER 3

    DESIGN METHODOLOGY

    3.1 EXISTING METHOD

    The hybrid polarity repeater insertion technique is shown in Figure. 3.1. The

    inverting repeaters at the midpoint of alternate bus lines are replaced with non-inverting

    repeaters. For an n-bit bus, n/2 inverting repeaters are substituted with non-inverting

    ones. When a worst delay transition occurs at the first half of a line, a best delay

    transition occurs at the second half and vice versa.

    Figure 3.1 Proposed hybrid polarity repeater insertion with non-inverting

    repeaters inserted at the midpoint of alternate bus wires.

    This leads to constant average coupling capacitance factor for all possible input

    transitions as shown in Table 3.1. As a result, reduction of both worst case delay and

    delay uncertainty is achieved. This technique has negligible overhead and it has regular

    and uniform layout. This technique is useful in many applications and it can be easily

    integrated into current automated layout tools, which can be a possible extension of this

    work.

  • 8/13/2019 Starlee Contents

    26/45

    26

    Table 3.1 Average Coupling Capacitance Factor of the middle switching line

    for all possible input transition patterns

    3.1.1 Drawbacks of Existing Method

    This technique reduces the overall worst-case MCF of an interconnect to 1, but

    also eliminate the best-case MCF of 0 (all adjacent wires switching in the same

    direction), leading to less advantage in average energy consumption.

    3.2 PROPOSED METHOD

    The edge encoding technique and coding scheme are used to reduce the power

    consumption due to coupling capacitance and self transitions respectively. In edge

    encoding technique, worst-case MCF of 1 is achieved, while preserving the best-case

    MCF of 0.This is done by controlling the edges of rising and falling transition in time,

    namely always performing rising transitions on the negative edge of the clock and

    falling transition on the positive edge of the clock (vice versa).

    3.2.1 Edge Encoding Technique

    In a multi-cycle bus structure, the transitions between neighboring wires are

    synchronized at every flip-flop as the signal propagates down the bus. This often

    generates simultaneous switching of adjacent wires in the opposite or same direction. In

  • 8/13/2019 Starlee Contents

    27/45

    27

    Figure. 3.2, the worst-case and best-case switching of the conventional bus are shown.

    The MCF=2 case, where every other wire switches in the opposite direction, generates

    the worst-case delay, which defines the clock frequency and also consumes the worst-

    case energy due to maximum coupling capacitance. To avoid this, we propose to

    selectively shift rising and falling edges and separate them by as much as half cycle. For

    example, as seen in Figure. 3.2 (b), if we selectively delay only the rising transitions by

    a half cycle and keep the falling transitions unaltered, the worst-case MCF is reduced

    from 2 to 1. We refer to this selective edge shifting as edge encoding.

    (a)Conventional wire switching

    (b) Proposed wire switching

    Figure 3.2 Conventional and proposed wire switching scenario in adjacent wires

  • 8/13/2019 Starlee Contents

    28/45

    28

    Since edge encoding shifts the same transitions together, the advantage of best-

    case switching is still maintained, this is unachievable in most other approaches. Since

    the edge-encoded signal transitions at both positive and negative edges of the clock, we

    use dual-edge triggered flip flops to propagate the signal along long multi-cycle

    interconnects. Since the signals must be synchronized back to positive edge triggered

    flip-flops after long interconnects, the number of dual-edge flip-flops within the multi-

    cycle interconnect should be even.

    The objective of the edge encoder is to selectively shift the rising and falling

    transition by different amounts. This encoding is done simply by performing an AND

    operation between the original signal and the half-cycle delayed version of itself. In this

    way, only the rising edge is delayed by a half cycle, separating simultaneous rising and

    falling transition by a half cycle. Since the encoder logic is very simple, the encoding

    overhead in terms of power and area is very small. This makes the edge encoding

    technique a highly practical approach.

    We propose two schemes to effectively use the edge-encoding technique in

    multi-cycle interconnect. The two methods differ in the procedure to cope with the

    initial half cycle latency required for edge encoding and to address the issue of aligning

    back to the positive-edge triggered signal at the far end of the wire.

    3.2.1.1 Zero Latency (ZL) Scheme

    The ZL scheme reduces energy consumption in multi-cycle interconnects

    without any latency overhead although encoding requires a half-cycle delay at the near

    end of the wire. This scheme exploits the fact the signal propagation in the edge-encoded bus is faster than that in the conventional bus due to reduced coupling

    capacitance.

    The block diagram of a multi-cycle interconnect with simple encoder logic is

    shown in Figure. 3.3.

  • 8/13/2019 Starlee Contents

    29/45

    29

    Figure 3.3 Encoder logic and block diagram of ZL scheme.

    The encoding procedure and the propagation of the encoded signal are shown in

    Figure. 3.3. When data toggles every cycle, the encoder generates a half-cycle pulse

    (enc_out). As this half-cycle pulse propagates through an even number of dual-edge

    flip-flops, it automatically aligns back to a positive edge triggered signal (ff4_out) at thefar end. Therefore, there is no need for any decoder circuit.

    Figure 3.4. Timing diagram of ZL scheme

  • 8/13/2019 Starlee Contents

    30/45

    30

    To achieve overall zero latency, the interconnect system is set up as shown in

    Figure. 3.5. If the conventional scheme requires n cycles to propagate through the entire

    interconnect, the edge-encoded bus must propagate through in (2n-1) half cycles,

    considering that the encoding takes one half cycle to synchronize at the far end of wire.

    In Figure. 3.5, L1 is the distance between positive-edge triggered flip-flops in the

    conventional bus, and L2 is the distance between dual-edge triggered flip-flops in the

    edge-encoded bus. Overall zero latency is achievable if L2 is defined by L1 and as

    follows:

    () ...................................................................... (3.1)

    Figure. 3.5 Flip-flop placement in conventional and ZL edge encoding scheme.

    For example, in a 9 mm interconnect, when n=3 and L1=3, the edge-encoded

    signal will propagate 1.8 mm every half cycle while the conventional signal will

    propagate 3 mm every cycle.

  • 8/13/2019 Starlee Contents

    31/45

    31

    Effectively, the edge-encoded signal is traveling 20% longer (1.8 versus 1.5

    mm) during the same time period, which is possible when at least a 17% speedup is

    achieved in the edge encoded bus due to coupling capacitance reduction.

    3.2.1.2 One Cycle Latency (OCL) Scheme

    In multi-cycle interconnects, multiple cycles are required to propagate across

    the entire wire. In these cases, one additional cycle latency may be acceptable if a clock

    frequency increase or aggressive energy reduction is a higher design priority. The OCL

    scheme is intended to achieve further performance improvement and energy reduction

    for a fixed throughput at the expense of one-cycle latency.

    After the encoding, the data must eventually align to the positive edge of the

    clock at the far end of the wire. To achieve this, we can align the transition at the near

    end to the positive edge of the clock by encoding with a full one cycle delay, and then

    allow for normal signal propagation along the wire. The one-cycle latency is therefore

    introduced once at the beginning of the wire and the throughput is not hampered.

    The block diagram of the OCL edge encoding scheme are shown in Figure. 3.6.

    Figure 3.6. Encoder logic and block diagram of OCL scheme.

    The difference in the encoder in Figure 3.6 compared to ZL is that a dual-edge

    flip-flop is added at the output to intentionally delay enc_in by one cycle and align the

  • 8/13/2019 Starlee Contents

    32/45

    32

    rising edge of enc_out at the positive edge of the clock as shown in Figure 3.6.The

    timing diagrams of the OCL edge encoding scheme are shown in Figure 3.7.

    Figure 3.7. Timing diagram of OCL scheme

    The corresponding flip-flop placement in the OCL edge encoding scheme is

    shown in Figure 3.8. Dual-edge flip-flops are placed at intervals equal to half the flop

    distance of the conventional bus. In the OCL edge-encoded bus, since the worst-case

    wire delay is reduced due to MCF reduction, we can either increase the clock frequency

    for high-performance busses or downsize the repeaters for iso-performance to the

    conventional bus for aggressive energy reduction.

  • 8/13/2019 Starlee Contents

    33/45

    33

    Figure 3.8 Flip-flop placement in conventional and OCL edge encoding scheme

    3.2.2. Coding Scheme

    This scheme minimizes self transition activities in the bus lines. Using this

    scheme, the present data can be converted into four types. They are inversion of odd

    lines, inversion of even lines, swapping adjacent bits and keeping the data unchanged.

    Then the number of self transitions between these four data patterns and the previously

    coded data can be calculated using energy estimator and the data pattern which has the

    less number of self transitions can be found using comparator and this corresponding

    data pattern can be transmitted on the bus.

    Let the data on an n bit wide bus, at time instant t be denoted as At = {atn-1, atn-

    2,....at1, at0}. The data transmitted on the bus is denoted as A(t)enc. The function

    calculateST_n(data1, data2) finds the number of self transitions between (data1,data2).

    Here, data1 and data2 should be n bits wide. The function swapAdj_n (At) swaps the

    adjacent bits in At and gives the output Asw(t) = {atn-1,atn-2,....at0,at1}.

    The coding scheme is as follows:

    1) Let A(t-1)enc be the previously coded data which was transmitted on the bus and let

    At be the present data which should be encoded and transmitted.

    2) Invert the odd lines in At and affix it with 00, for decoding purposes. Let this new

    data be denoted as At(odd) . Evaluate st_odd=calculateST_n(At(odd) ,A(t-1)enc).

    3) Similarly, invert the even lines in At and affix it with 01.Let this new data be denoted

    as At(eve).Evaluate st_eve=calculateST_n(At(eve) ,A(t-1)enc).

  • 8/13/2019 Starlee Contents

    34/45

    34

    4) Let Asw(t) =swapAdj_n(At).Suffix Asw(t) with 10 and let this new data be denoted

    as At(swp) .Evaluate st_swp=calculateST_n(At(swp) ,A(t-1)enc).

    5) Suffix At with 11 and let this new data be denoted as At(unc) .Evaluate

    st_unc=calculateST_n (At(unc) ,A(t-1)enc).

    6) Find min (st_odd,st_eve,st_swp,st_unc).

    7) The coded pattern corresponding to the minimum value in step 6 is transmitted.

    The block diagram of the proposed encoder is given in Figure 3.9

    Figure 3.9. Block diagram of the encoder of coding scheme

    The encoder takes an n bit input. The calculateST_n functions in the steps

    2,3,4,5 in the coding scheme above are evaluated by the Energy Estimator block.

    The outputs of the Energy Estimator block, min(st_odd,st_eve,st_swp,st_unc)

    are compared among themselves and the minimum amongst them is found out. The

    encode data pattern corresponding to the minimum value of self transitions is

    transmitted on the bus.

    The decoder to be used at the receiving end is shown in Figure 3.10.

  • 8/13/2019 Starlee Contents

    35/45

    35

    Figure 3.10. Block diagram of the decoder of coding scheme

    The two least significant bits in the transmitted pattern are given to a 2 to 4

    decoder and depending on these two bits, the appropriate decoding procedure is done.

    3.3 SOFTWARE DESCRIPTION

    VHDL is a versatile and powerful hardware description language which is useful

    for modeling electronic systems at various levels of design abstraction.

    3.3.1 VHDL

    VHDL (VHSIC hardware description language) is a hardware description

    language used in electronic design automation to describe digital and mixed-signal

    systems such asfield-programmable gate arrays andintegrated circuits

    3.3.1.1 Design

    VHDL is commonly used to write text models that describe a logic circuit. Such

    a model is processed by a synthesis program, only if it is part of the logic design. A

    simulation program is used to test the logic design using simulation models to represent

    the logic circuits that interface to the design. This collection of simulation models is

    commonly called a test bench.

    http://en.wikipedia.org/wiki/VHSIChttp://en.wikipedia.org/wiki/Hardware_description_languagehttp://en.wikipedia.org/wiki/Hardware_description_languagehttp://en.wikipedia.org/wiki/Electronic_design_automationhttp://en.wikipedia.org/wiki/Digital_electronicshttp://en.wikipedia.org/wiki/Mixed-signal_integrated_circuithttp://en.wikipedia.org/wiki/Field-programmable_gate_arrayhttp://en.wikipedia.org/wiki/Integrated_circuithttp://en.wikipedia.org/wiki/Integrated_circuithttp://en.wikipedia.org/wiki/Field-programmable_gate_arrayhttp://en.wikipedia.org/wiki/Mixed-signal_integrated_circuithttp://en.wikipedia.org/wiki/Digital_electronicshttp://en.wikipedia.org/wiki/Electronic_design_automationhttp://en.wikipedia.org/wiki/Hardware_description_languagehttp://en.wikipedia.org/wiki/Hardware_description_languagehttp://en.wikipedia.org/wiki/VHSIC
  • 8/13/2019 Starlee Contents

    36/45

    36

    VHDL has constructs to handle theparallelism inherent in hardware designs, but

    these constructs (processes) differ in syntax from the parallel constructs in Ada (tasks).

    Like Ada, VHDL is strongly typed and is not case sensitive. In order to directly

    represent operations which are common in hardware, there are many features of VHDL

    which are not found in Ada, such as an extended set of Boolean operators including

    nand and nor. VHDL also allows arrays to be indexed in either ascending or descending

    direction; both conventions are used in hardware, whereas in Ada and most

    programming languages only ascending indexing is available.

    VHDL has file input and output capabilities, and can be used as a general-

    purpose language for text processing, but files are more commonly used by a simulation

    test bench for stimulus or verification data. There are some VHDL compilers which

    build executable binaries. In this case, it might be possible to use VHDL to write a test

    bench to verify the functionality of the design using files on the host computer to define

    stimuli, to interact with the user, and to compare results with those expected. However,

    most designers leave this job to the simulator.

    It is relatively easy for an inexperienced developer to produce code that

    simulates successfully but that cannot be synthesized into a real device, or is too large tobe practical. One particular pitfall is the accidental production of transparent latches

    rather thanD-type flip-flops as storage elements.

    One can design hardware in a VHDL IDE (for FPGA implementation such as

    Xilinx ISE, Altera Quartus, Synopsys Synplify or Mentor Graphics HDL Designer) to

    produce the RTL schematic of the desired circuit. After that, the generated schematic

    can be verified using simulation software which shows the waveforms of inputs and

    outputs of the circuit after generating the appropriate test bench. To generate an

    appropriate test bench for a particular circuit or VHDL code, the inputs have to be

    defined correctly. For example, for clock input, a loop process or an iterative statement

    is required.

    A final point is that when a VHDL model is translated into the "gates and wires"

    that are mapped onto a programmable logic device such as aCPLD orFPGA,then it is

    http://en.wikipedia.org/wiki/Parallel_computinghttp://en.wikipedia.org/wiki/Strongly_typedhttp://en.wikipedia.org/wiki/Case_sensitivityhttp://en.wikipedia.org/wiki/Transparent_latchhttp://en.wikipedia.org/wiki/Flip-flop_(electronics)http://en.wikipedia.org/wiki/Register-transfer_levelhttp://en.wikipedia.org/wiki/CPLDhttp://en.wikipedia.org/wiki/FPGAhttp://en.wikipedia.org/wiki/FPGAhttp://en.wikipedia.org/wiki/CPLDhttp://en.wikipedia.org/wiki/Register-transfer_levelhttp://en.wikipedia.org/wiki/Flip-flop_(electronics)http://en.wikipedia.org/wiki/Transparent_latchhttp://en.wikipedia.org/wiki/Case_sensitivityhttp://en.wikipedia.org/wiki/Strongly_typedhttp://en.wikipedia.org/wiki/Parallel_computing
  • 8/13/2019 Starlee Contents

    37/45

    37

    the actual hardware being configured, rather than the VHDL code being "executed" as if

    on some form of a processor chip.

    3.3.1.2 Advantages

    The key advantage of VHDL, when used for systems design, is that it allows the

    behavior of the required system to be described (modeled) and verified (simulated)

    before synthesis tools translate the design into real hardware (gates and wires).

    Another benefit is that VHDL allows the description of a concurrent system.

    VHDL is adataflow language,unlike procedural computing languages such as BASIC,

    C, and assembly code, which all run sequentially, one instruction at a time.

    A VHDL project is multipurpose. Being created once, a calculation block can be

    used in many other projects. However, many formational and functional block

    parameters can be tuned (capacity parameters, memory size, element base, block

    composition and interconnection structure).

    A VHDL project is portable. Being created for one element base, a computing

    device project can be ported on another element base, for example VLSI with various

    technologies.

    http://en.wikipedia.org/wiki/Concurrent_systemhttp://en.wikipedia.org/wiki/Dataflow_programminghttp://en.wikipedia.org/wiki/Dataflow_programminghttp://en.wikipedia.org/wiki/Concurrent_system
  • 8/13/2019 Starlee Contents

    38/45

    38

    CHAPTER 4

    SIMULATION RESULTS

    4.1 POWER ANALYSIS OF ENCODER IN CODING SCHEME:

    The dynamic power of 0.068W and quiescent (leakage) power of 0.010 that

    together contribute a total on-chip power of 0.078W is given in this figure.

    Figure 4.1 Power analysis of encoder in coding scheme

  • 8/13/2019 Starlee Contents

    39/45

    39

    CHAPTER 5

    CONCLUSION AND FUTURE ENHANCEMENTS

    5.1 CONCLUSION

  • 8/13/2019 Starlee Contents

    40/45

    40

    APPENDIX

    CODING

    library IEEE;

    use IEEE.STD_LOGIC_1164.ALL;

    entity self is

    Port ( a : in STD_LOGIC_VECTOR (7 downto 0);

    encout : out STD_LOGIC_VECTOR (9 downto 0) );

    end self;

    architecture Behavioral of self is

    signal odd,even,swap,append :STD_LOGIC_VECTOR (9 downto 0);

    component energy is

    Port ( ina : in STD_LOGIC_vector(9 downto 0);

    bout :out std_logic_vector(3 downto 0));

    end component;

    signal egy1,egy2,egy3,egy4 : std_logic_vector(3 downto 0);

    begin

    p1:process(a)

    begin

    odd

  • 8/13/2019 Starlee Contents

    41/45

    41

    end process p1;

    egymet1 : energy PORT MAP(ina=>odd,bout=>egy1);

    egymet2 : energy PORT MAP(ina=>even,bout=>egy2);

    egymet3 : energy PORT MAP(ina=>swap,bout=>egy3);

    egymet4 : energy PORT MAP(ina=>append,bout=>egy4);

    library IEEE;

    use IEEE.std_logic_1164.all;

    use IEEE.numeric_std.all;

    use IEEE.std_logic_unsigned.all;

    entity energy is

    Port ( ina : in STD_LOGIC_vector(9 downto 0);

    bout :out std_logic_vector(3 downto 0));

    end energy;

    architecture Behavioral of energy is

    signal atp,exr : std_logic_vector(9 downto 0) :="0000000000";

    component FA is

    Port ( a,b,cin : in STD_LOGIC;

    s : out std_logic;

    c : out STD_LOGIC);

    end component FA;

    component HA is

    Port ( a,b : in STD_LOGIC;

    s : out std_logic;

    c : out STD_LOGIC);

  • 8/13/2019 Starlee Contents

    42/45

    42

    end component;

    signal c1 : std_logic_vector(7 downto 1);

    signal s1 : std_logic_vector(5 downto 1);

    begin

    exr exr(1),b=>exr(2),cin=>exr(3),s =>s1(1),c=>c1(1));

    FA2 : FA PORT MAP(a=>exr(4),b=>exr(5),cin=>exr(6),s =>s1(2),c=>c1(2));

    FA3 : FA PORT MAP(a=>exr(7),b=>exr(8),cin=>exr(9),s =>s1(3),c=>c1(3));

    FA4 : FA PORT MAP(a=>c1(1),b=>c1(2),cin=>c1(3),s =>s1(4),c=>c1(4));

    FA5 : FA PORT MAP(a=>s1(1),b=>s1(2),cin=>s1(3),s =>s1(5),c=>c1(5));

    HA1 : HA PORT MAP(a=>exr(0),b=>s1(5),s =>bout(0),c=>c1(6));

    FA6 : FA PORT MAP(a=>s1(4),b=>c1(6),cin=>c1(5),s =>bout(1),c=>c1(7));

    HA2 : HA PORT MAP(a=>c1(7),b=>c1(4),s =>bout(2),c=>bout(3));

    end Behavioral;

    library IEEE;

    use IEEE.std_logic_1164.all;

    use IEEE.numeric_std.all;

    use IEEE.std_logic_unsigned.all;

    entity HA is

    Port ( a,b : in STD_LOGIC;

    s : out std_logic;

    c : out STD_LOGIC);

    end HA;

    architecture Behavioral of HA is

    begin

  • 8/13/2019 Starlee Contents

    43/45

    43

    s

  • 8/13/2019 Starlee Contents

    44/45

    44

    REFERENCES

    [1] A. B. Kahng, S. Muddu, and E. Sarto, (1999) Interconnect optimization

    strategies for high-performance VLSI designs, in Proc. Int. Conf. VLSI Des., pp.464469.

    [2] A. B. Kahng, S. Muddu, and E. Sarto, (2000) On switch factor based analysis of

    coupled RC interconnects, in Proc. Des. Autom. Conf. (DAC), pp. 7984.

    [3] B. Victor and K. Keutzer, (2001 )Bus encoding to prevent crosstal delay, in

    Proc. Int. Conf. Comput.-Aided Des. (ICCAD), pp. 5763.

    [4] C. J. Akl and M. A. Bayoumi, (Sep. 2008)Reducing interconnect delay

    uncertainty via hybrid polarity repeater insertion, IEEE Trans. Very Large Scale

    Integr. (VLSI) Syst., vol. 16, no. 9, pp. 12301239.

    [5] H. Deogun, R. Senger, D. Sylvester, R. Brown, and K. Nowka, (2006)

    A dual-Vdd boosted pulsed bus technique for low power and low leakage

    operation, in Proc. Int. Symp. Low Power Electron. Des. (ISLPED),pp. 7378.

    [6] H. Kaul, D. Sylvester, M. Anders, and R. Krishnamurthy, (Nov. 2005) Design

    and analysis of spatial encoding circuits for peak power reduction in on-chip

    buses, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.13, no. 11, pp.

    12251238.

    [7] H. Kaul, J. Seo, M. Anders, D. Sylvester, and R. Krishnamurthy, (2008) A

    robust alternate repeater technique for high performance busses in the multi-core

    era, in Proc. Int. Symp. Circuits Syst., pp. 372375.

    [8] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, (1996)

    Flow-through latch and edge-triggered flip-flop hybrid elements, in Dig. Tech.

    Papers IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 138139.

    [9] J. Eble, V. De, D. Wills, and J. Meindl,( 1998) Minimum repeater count, size,

    and energy dissipation for gigascale integration (GSI) interconnects, in Proc. Int.

    Interconnect Technol. Conf.,pp. 5658.

    [10] J. Seo, D. Sylvester, D. Blaauw, H. Kaul, and R. Krishnamurthy, (2007) A

    robust edge encoding technique for energy-efficient multi-cycle intercon- nect,

    in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 6873.

    [11] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, (2001)

    Comparative delay and energy of single edge-triggered and dual edge-triggered

    pulsed flip-flops for high-performance microprocessors, in Proc. Int. Symp. Low

    Power Electron. Des. (ISLPED), pp. 147152.

  • 8/13/2019 Starlee Contents

    45/45

    [12] K. Bernstein, C.-T. Chuang, R. Joshi, and R. Puri, (2003)Design and CAD

    challenges in sub-90 nm CMOS Technologies, in Proc. Int. Conf. Comput.-

    Aided Des. (ICCAD), pp. 129136.

    [13] K. Bowman, J. Tschanz, M. Khellah, M. Ghoneima, Y. Ismail, and V. De, (2006)Time-borrowing multi-cycle on-chip interconnects for delay variation tolerance,

    in Proc. Int. Symp. Low Power Electron. Des. (ISLPED), pp. 7984.

    [14] K. Hirose and H. Yasuura, (2000) A bus delay reduction technique considering

    crosstal, in Proc. Des., Autom. Test Eur. (DATE), pp.441445.

    [15] K. Nose and T. Saurai, (2001) Two schemes to reduce interconnect delays in

    bi-directional and uni-directional buses, in Symp. VLSI Circuits Dig. Tech.

    Papers, pp. 193194.

    [16] M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De, (2002) Static pulsed busfor on-chip interconnects, in Symp. VLSI Circuits Dig. Tech. Pa- pers, pp. 78

    79.

    [17] M. Khellah, M. Ghoneima, J. Tschanz, Y. Ye, N. Kurd, J. Barkatullah,

    S. Nimmagadda, and Y. Ismail, (2005) A sewed repeater bus architecture for

    on-chip energy reduction in microprocessors, in Proc. Int. Conf. Comput. Des.

    (ICCD), pp. 253257.

    [18] P. P. Sotiriadis, A. Wang, and A. Chandraasan, (2000) Transition pattern

    coding: An approach to reduce energy in interconnect, in Proc. Euro. Solid-State

    Circuits Conf. (ESSCIRC), pp. 348351.

    [19] R. Arunachalam, E. Acar, and S. Nassif, (2003) Optimal shielding/spacing

    metrics for low power design, in Proc. IEEE Comput. Soc. Annu. Symp. VLSI,

    pp. 167172.

    [20] S.-C. Wong, T. G.-Y. Lee, D.-J. Ma, and C.-J. Chao, (May 2000) An empirical

    three-dimensional crossover capacitance model for multilevel interconnect VLSI

    circuits, IEEE Trans. Semiconductor Manufacturing, vol.13, no. 2, pp. 219227.