4
Set of self-timed latches for high-speed VLSl B.-S.Kong and Y.-H.Jun Abstract: A set of novel self-timed latches is introduccd and analysed. These latches have no back-to- back connection as in conventional self-timed latches, and both inverting and noninverting outputs are evaluated simultaneously leading to higher operating frequencies. A novel type of cross-coupled inverter used in the proposed circuits implements static operation without thc signal fighting with the main driver during signal transition. The power consumption of these latches is also comparable to, or lcss than, that of conventional circuits. The proposed latches are designed using a 0.35 pm CMOS technology. The conipaiison results indicate that the proposed active-low self-timed latch (ALSTL) improves speed by 22-34% over the conventional NAND SR latch, while for the active-high self- timed latch (AHSTL) the speed improvements are 2&35% with less power as compared to the corresponding NOR SR latch. 1 Introduction Latches and flip-flops (FFs), which are usually used to store data in VLSl systems, are important design elements, as they can determine the overall perfotmance of the sys- tem [I]. Most latches and FFs are synchronously operated by a clock signal that indicates the data valid time. In this case, the transition of a clock signal must occur at the same moment of time at all synchronising points of the system where storage elements are placed. In a practical situation, however, clock signals arc routed along several different wiring paths with different loads, and thus rnay reach at each point with different delays. This variation on timing delay during clock distribution results in clock skew, and can cause serious problems such as false output latching. In a multiphase clocking scheme, a nonoverlapping period is introduced as a margm to prevent problems caused by the skew between clock phases. Even in a single-phase clocking scheme, although there is no skew between phases, similar problems can occur, owing to clock delay between remotely located FFs or latches [2]. The delay must be accommo- dated by lengthening the clock period to provide a suffi- cient timing margin. In the moderate frequency range, the portion taken by the margin is negligibly small, as com- pared to the clock period. But, as the clock frequency increases, this cannot be ignored because it occupies a con- siderable portion in a given clock period, and causes diffi- culty in increasing clock speed. Moreover, the distribution of clock signal unifoimly throughout a system with tight skew control requires increased design cost. Recently, a method to overcome the skew problem to a large extent in synchronous design was proposed in [3]. But the cost for the distribution of the clock signal throughout the system still exists. Another way of solving the skew problem is to use the concept of self-timing. A selr-timing circuit needs no Thc auihon arc with LG Semicon Co. Ltd., 16, Wooinyeondong, Scochwu, Seoul, 137.124, Knrea explicit timing control. Control information is encoded into the input itself and becomes available as soon as an input change occurs. Therefore, no clock skew-related problems occur and clock distribution cost is avoided. The simplest foim of self-timed latch is the conventional set-reset (SR) latch which is shown in Fig. I, and is popularly used in synchronous systems to improve peiformance. For exam- ple, the PHIMOS [4] technique combines an SR latch with differential cascode voltage switch (DCVS) logic [SI to implement a race-free single-phase clocked pipcline struc- ture. This latch is also used as a slave stage of a high per- foimance differential edge-triggered F F to reduce latency and clock load [6]. However, the conventional SR latch has several disadvantages, due to back-to-bdck connection between logic gates. A ci.itical path established through the connection makes the latch inherently slow with high power consumption. In addition, serial output evaluation makes the inverting and noninverting output transitions unsymmetrical to each other. 0 b To overcome these drawbacks, we introduce B set of novel self-timed latches in this paper. 2 Circuit architecture and operation The schematic diagrams for the proposed self-timed latches are depicted in Fig. 2. The circuit shown in Fig. 2r1, which is called active-low self-timed latch (ALSTL), consists of two input inverters, two main drivers, and two cross- coupled inverters. Input inverters, INV1, and 1NV2, are used to generate Sb and Rb for use as inputs to the main driver. Trdiisistors, MPI, MP2, MNI, and MN2, form 341

Set of self-timed latches for high-speed VLSI

  • Upload
    y-h

  • View
    218

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Set of self-timed latches for high-speed VLSI

Set of self-timed latches for high-speed VLSl

B.-S.Kong and Y.-H.Jun

Abstract: A set of novel self-timed latches is introduccd and analysed. These latches have no back-to- back connection as in conventional self-timed latches, and both inverting and noninverting outputs are evaluated simultaneously leading to higher operating frequencies. A novel type of cross-coupled inverter used in the proposed circuits implements static operation without thc signal fighting with the main driver during signal transition. The power consumption of these latches is also comparable to, or lcss than, that of conventional circuits. The proposed latches are designed using a 0.35 pm CMOS technology. The conipaiison results indicate that the proposed active-low self-timed latch (ALSTL) improves speed by 22-34% over the conventional NAND SR latch, while for the active-high self- timed latch (AHSTL) the speed improvements are 2&35% with less power as compared to the corresponding NOR SR latch.

1 Introduction

Latches and flip-flops (FFs), which are usually used to store data in VLSl systems, are important design elements, as they can determine the overall perfotmance of the sys- tem [I]. Most latches and FFs are synchronously operated by a clock signal that indicates the data valid time. In this case, the transition of a clock signal must occur at the same moment of time at all synchronising points of the system where storage elements are placed. In a practical situation, however, clock signals arc routed along several different wiring paths with different loads, and thus rnay reach at each point with different delays. This variation on timing delay during clock distribution results in clock skew, and can cause serious problems such as false output latching. In a multiphase clocking scheme, a nonoverlapping period is introduced as a margm to prevent problems caused by the skew between clock phases. Even in a single-phase clocking scheme, although there is no skew between phases, similar problems can occur, owing to clock delay between remotely located FFs or latches [2]. The delay must be accommo- dated by lengthening the clock period to provide a suffi- cient timing margin. In the moderate frequency range, the portion taken by the margin is negligibly small, as com- pared to the clock period. But, as the clock frequency increases, this cannot be ignored because it occupies a con- siderable portion in a given clock period, and causes diffi- culty in increasing clock speed. Moreover, the distribution of clock signal unifoimly throughout a system with tight skew control requires increased design cost. Recently, a method to overcome the skew problem to a large extent in synchronous design was proposed in [3]. But the cost for the distribution of the clock signal throughout the system still exists.

Another way of solving the skew problem is to use the concept of self-timing. A selr-timing circuit needs no

Thc auihon arc with LG Semicon Co. Ltd., 16, Wooinyeondong, Scochwu, Seoul, 137.124, Knrea

explicit timing control. Control information is encoded into the input itself and becomes available as soon as an input change occurs. Therefore, no clock skew-related problems occur and clock distribution cost is avoided. The simplest foim of self-timed latch is the conventional set-reset (SR) latch which is shown in Fig. I, and is popularly used in synchronous systems to improve peiformance. For exam- ple, the PHIMOS [4] technique combines an SR latch with differential cascode voltage switch (DCVS) logic [SI to implement a race-free single-phase clocked pipcline struc- ture. This latch is also used as a slave stage of a high per- foimance differential edge-triggered FF to reduce latency and clock load [6]. However, the conventional SR latch has several disadvantages, due to back-to-bdck connection between logic gates. A ci.itical path established through the connection makes the latch inherently slow with high power consumption. In addition, serial output evaluation makes the inverting and noninverting output transitions unsymmetrical to each other.

0 b

To overcome these drawbacks, we introduce B set of novel self-timed latches in this paper.

2 Circuit architecture and operation

The schematic diagrams for the proposed self-timed latches are depicted in Fig. 2. The circuit shown in Fig. 2r1, which is called active-low self-timed latch (ALSTL), consists of two input inverters, two main drivers, and two cross- coupled inverters. Input inverters, INV1, and 1NV2, are used to generate Sb and Rb for use as inputs to the main driver. Trdiisistors, MPI, MP2, MNI, and MN2, form

341

Page 2: Set of self-timed latches for high-speed VLSI

main drivers in which p-type transistors are connected to the external inputs, S, and R, while n-type transistors are connected to Sb and Rb. The source terminals of MNI and MN2 are connected to Sb and Rb. The cross-coupled inverters are constructed using MP3, MP4, MN3, and MN4. The source terminals of these transistors are con- nected to the external inputs or the outputs of input invert- ers. The major function of these inverters is to compensate leakage current at output nodes; thus the transistor width of these inverters can be smaller than that of other transis- tors in the circuit. These inverters preserve logic states stati- cally when no input is applied, and cause no signal fighting during logic transitions as shown later in this Section. The circuit structure addressed above, with predefined input combinations, operates safely as a self-timed latch. ALSTL operates with both S and R inputs normally at ‘one’, unless the state of the circuit has to be changed. If we assume that the initial states of Q and Qh are, respectively, low and high with both S and R being high, all the transistors in the main drivers and the transistors MN3 and MP4 are off, and transistors MP3 and MN4 are on. Thus, output nodes are connected to the respective supply lines preserving the output logic states. When a ‘zero’ is applied to the input node S, transistors MPI and MN2 become on, and this makes Q and Qb high and low regardless of their previous states. Once S and Sb change their values, transistors MP3 and MN4 no longer drive old values because the source nodes change their values. Instead, they drive new logic val- ues, and thus help speed up the logic transition by provid- ing another driving path to output nodes. On subsequent return of the input, both MPI and MN2 become off, and the output values are sustained by the cross-coupled invert- ers. Similarly, the application of ‘zero’ to the R input niakes Q low and Qb high, and these values are preserved after the input returns to ‘one’. When both inputs go low, all the transistors in the main driver become active, and the logic ‘one’ is driven to both the outputs. This corresponds to the ‘set’ state of the latch.

S R

R s MN3 MNL

R

lNV l INV2

R s I I MN3 MNL ,/ MNZ M N I

I A I b

Fig2 ~’~‘uvei.~~~in,ieilbiri, ( i y p I) o Aaive-low selilrimed lalch (ALSIL I) h Active-high self-limed l a a h (AHSTL I)

The circuit structure of active-high self-timed latch (AHSTL) shown in Fig. 2b is similar to that of ALSTL, except for some connections, namely, the external inputs, S and R, which are connected to the n-type transistors of the main drivers, while thep-type transistors receive Sb and Rb signals as inputs. The operation principle of the circuit is

342

exactly the same as that of ALSTL except for the input signal polarity. That is, the inputs for AHSTL normally stay at ‘zero’, and a high pulse on S or R causes the corre- sponding output change. Simultaneous high values for both inputs make the ‘set’ state of the latch as in the case of ALSTL.

The circuits shown in Fig. 3 have improved the speed performance over the circuits in Fig. 2 by changing some connections in the main drivers: the sources of MN1 and MN2 are connected to the ground instead of being con- nected to inteinal signals. The modification eliminates the stacked transistor connection in the main driver, and thus leads to higher speed. However, active pulses on both S and R inputs simultaneously must not be allowed. If this happens, all the transistors in the main driver become on, and a substantial short-circuit current flows, resulting in an increased power consumption. Therefore, in the case of using these batches, applying active signals on both inputs simultaneously even due to noise must be prohibited.

S R

INV2

I I MN3 MNL S R

M N I MN2 h

Fig.3 a Acme-low mlf-iimal latch (ALSTL 11) h Active-high self-limed latch (AHSTL 11)

Now/ .sel/+med blch (Type 11)

R S a

MP3 MPL

Q

The circuits shown in Fig. 4 show the implementation of the latch with a reduced number of transistors. In these cir- cuits, input inverters are eliminated, and the connection and type of some devices in the main driver are changed. In the circuit shown in Fig. 4u, MNI and MN2 receive inputs through the sources, with their gate terminals being con- nected to the output. The sources of MN3 and MN4 are connected to ground instead of inputs. The circuit shown in

mi: Proc.-Circ~cirs D e w r s y . ~ Voi 146. A’”. 6, Decmher lo99

Page 3: Set of self-timed latches for high-speed VLSI

Fig. 46 uses tt-channel devices as the pull-up transistors in the main drivers. The transistors MP3 and MP4 in the cross-coupled inverters are connected to the power supply. The operation of these circuits is similar to that of the respective circuits in Fig. 2. Simultaneous low or high pulses are not allowed to happen for the same reason as in the circuits in Fig. 3 due to their unsafe behaviour.

The proposed self-timed latches described above have advantages provided by the elimination of the clock signal such as the immunity to clock skew problems and no global clock distribution cost, as shown in Section 1. More- over, they have additional merits over conventional self- timed latches. First of all, the speed of the proposed circuits is supeiior to that of conventional circuits, due to novel structure that has no speed-limiting back-to-back connec- tion. In conventional self-timed latches, serial logic evalua- tions through the logic gates connected back-to-back make the speed inherently slow. Secondly, these latches have no crossover current in the main driver during signal trdnsi- tion. While one transistor in a main driver is being turned on, the other transistor in the same main diiver is always in the cut-off state, as is verified by the operation principle of the circuits described above. Thus, the crossover current during transition is totally eliminated, and all the current being provided is used to charge or discharge the output node. This lellds to improvement in operating frequency and reduction in power consumption. Another factor con- tributing to higher speed and lower power comes from the behaviour of the proposed cross-coupled inverter. A con- ventional cross-coupled inverter as a means of obtaining static operation causes signal fighting against the old value duting logic transition leading to the degradation in speed and power. This can be somewhat relieved by reducing the WIL ratio of the devices, but this can cause an increased parasitic component and the circuit still tends to be slow. On the other hand_ the cross-coupled inverter in the pro- posed self-timed latch causes no signal fighting and helps speed up the signal transition as is explained earlier in this Section.

3r

I

3r 3 L 5 6 7

3 L 5 6 7 t ime ns

3 Comparison and discussion

To evaluate the perFoimance OF the proposed circuits, these latches are designed using a 0.35 pn CMOS technology. The circuit simulations are done with the power supply of 2.5 V using the netlists extracted from the layout with all

IEE Proc.-Circuiir Det.ieu8 S.n.. Vol. 146, iVo. 6. Deterriber 1999

the parasitic components such as the capacitance and the resistance, and the waveforms of the circuits are illustrated in Fig. 5. For a fair comparison, the output inverters in the conventional latches in Fig. 1 and the main drivers of the proposed circuits are made to he the same size. Further- more, each input is generated by two stages of inverters with the first stage being the same size for all cases. The size of the second stage is optimised according to the amount of input capacitance of each circuit. The power consumed by these inverters is included in the comparison. Tables 1 and 2 summarise the comparison results of the conventional and the proposed latches. They list the number of devices, total gate width of the transistors used, worst-case propaga- tion delay, average power consumption. The propagation

Table 1: Summary of comparison: NAND STL and ALSTL

Device Total gate Delay Power count width lprn) Ins) IpWJ

Conventional 12 68 0.93 213 ALSTL-I 12 68 0.73 223 ALSTL-II 12 50 0.61 166 ALSTL-Ill 8 40 0.65 187

Table 2 Summary of comparison: NOR STL and AHSTL

Device Total gate Delay Power count width(prn1 (ns) (pW1

Conventional 12 76 0.92 230 ALSTL~I 12 48 0.74 206 ALSTL-II 12 52 0.60 173 ALSTL-Ill 8 40 0.65 148

1.ar

01 0 200 LOO 600 800

capacitive lood,fF a

0 200 LOO 600 800 copocitive load,fF

h

:4:

Page 4: Set of self-timed latches for high-speed VLSI

1.25 1.5 1.15 2 0 2.25 2.5 supply vol1oge.V

0

1.25 1.5 1.75 2.0 2 .25 2.5 supply v0 i toge.v

b

delay is measured with 200fF load per output node, while power consumption is measured with no load capacitance when the input toggles at the frequency of 100MHa. When it comes to device count, novel circuits require the same or a smaller number of devices. Furthermore, total gate width of the proposed circuit is smaller for almost all cases. The gatc width of ALSTL I is comparable to that of conven- tional circuit due to input invcrlcrs with larger sue. As far

as the performance is concerned, ALSTL latches improve speed by 22-34%, and the power consumption is compara- ble to, or less than, a conventional NAND latch. Medn- while, the improvement in speed of AHSR latches is 2 6 35% as compared to the corresponding NOR latch. Power consumption in this case is also smaller. To investigate delay variation on different output load, Fig. 6 plots the propagation delay of novel and conventional circuits with changing load capacitance from l O O f F to 800fF. According to the simulation result, type 111 latches are the best when the amount of load is small, but their performance becomes degraded as the load increases. Types I and I1 have lower latency than the conventional counterparts for all the range of comparison. Among them, type I1 latches show the best overall performance. Thus, type I1 is recommended for use in hgh speed circuit design with the condition that prohib- ited input combinations be completely prevented from hap- pening. Type 111 latches are useful only when the area and the power are critical and the amount of load is small.

4 Conclusions

In this paper, a set of novel self-timed latches is introduced and analysed. These latches have no back-to-back connec- tion that exists in conventional latches, and both the invert- ing and noninverting outputs are simultaneously evaluated, leading to higher operating frequencies. A new type of cross-coupled inverter used in the proposed latches imple- ments static operation without signal fighting with the main driver during signal ~ transitions. Power consumption of these latches is also comparable to, or less than, that of conventional circuits.

5 References

I MEAD, C., and CONWAY, L.: 'Introduction to VLSI systems' (Addison-Wesley, 1980)

2 AFGHAHI, M., and SVENSSON, C.: 'A unified single-phase clock- ing scheme Cor V U 1 systems', IEEE J Solid-Slate Circuits, 1990, 25, ( I ) , pp. 22S232 HARRIS, D., and HOROWITZ, M.A.: 'Skew-tolerant Domino cir- cuits', IEEE J. Solid-State Circuits, 1997, 32, ( I I), pp. 1702-1711 RENSHAW, D., and LAU, C.H.: 'Racefree clocking of CMOS pipe- lines using a single global clack', IEEE J. Solid-State Civcuits, 1990, 25, (3), pp. 766769

5 HELLER, L.G., GRIFFIN, W.R., DAVIS, J.W., and THOMA, N.G.: 'Cascode voltage switch logic: a differential CMOS logic fdmily'. ISSCC digest technical papers, San Franscisco, USA, 1984, pp. 1 6 1 7 MONTANARO, J.: 'A 160 MHz 32 b 0.5 W CMOS RISC micro- processor'. ISSCC digest technical papers, 1996, pp. 21M15

3

4

6

344 IEE Prac:Clrcuirr Devices Sysl.. Vol. 146, No. 6, December 1999