High-Performance and Power-efficient CMOS Comparators

254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 2, FEBRUARY 2003


Chung-Hsun Huang, Student Member, IEEE, and Jinn-Shyan Wang, Member, IEEE

Abstract: Several design techniques for high-performance and power-efficient CMOS comparators are proposed. First, the comparator is based on the priority-encoding (PE) algorithm, and the dynamic circuit technique developed specifically for the priority encoder can be applied. Second, the PE function and the subsequent logic functions are merged and efficiently realized in the multiple output domino logic (MODL) to result in a shortened logic depth. The circuit in MODL CMOS is also compact and power efficient because few transistors are needed. Third, the multilevel look-ahead technique is used to shorten the path of priority-token propagation. Finally, the circuit is realized with a latch-based two-stage pipelined structure, and the comparison function is partitioned into two parts, with each part executed in each half of the clock cycle in a delay-balanced manner. Post-layout simulation results show that a 64-b comparator designed with the proposed techniques in a 3-V 0.6-μm CMOS technology is 16% faster, is 50% smaller, and consumes 79% less power than the all-n-transistor comparator, which is the fastest among the conventional comparators. Measurement results of the test chip conform with simulation results and prove the feasibility of the proposed techniques.

Index Terms: CMOS dynamic circuit, comparator, priority encoding, multilevel lookahead, multiple output domino logic (MODL).

I. INTRODUCTION

THE COMPARATOR is a very basic and useful arithmetic component of digital systems. There are several approaches to designing CMOS comparators, each with different operating speed, power consumption, and circuit complexity. One can implement the comparator by flattening the logic function directly [1]. This approach is only suitable for comparators with short inputs. For comparators with longer inputs, circuit complexity increases drastically, and the operating speed is degraded accordingly. Another way to design the comparator is to employ a parallel adder [2]. In this approach, the adder becomes the major factor limiting the operating speed. However, a very high-speed adder often requires thousands of transistors [11]–[13].

Recently, Wang et al. [3] proposed to construct the comparator in a tree structure with the all-n-transistor (ANT) dynamic CMOS logic [3] in order to improve the operating speed. The ANT logic is derived from the all-n-logic (ANL) [4]. Both ANT and ANL logic circuits can only be implemented with heavy pipelining. In [3], a 64-b comparator is designed as

Manuscript received March 26, 2002; revised September 6, 2002. This work was supported by the National Science Council of Taiwan under Research Grant NSC 90-2215-E-194-019 and Grant NSC 91-2215-E-194-007.

The authors are with the Institute of Electrical Engineering, National Chung-Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSSC.2002.807409

Fig. 1. Numerical example of 4-b priority-encoding-based comparison.

a six-pipeline circuit, and each comparison operation through these six pipelines is finished in three clock cycles. Although such a heavily pipelined design achieves high throughput, it may not be suitable for all applications. For example, some popular microprocessors such as the ARM microprocessor [5] often need to execute a comparison instruction within a single clock cycle. Moreover, the latches used to form the pipelines increase the circuit complexity and power consumption of the ANT comparator.

In this paper, we propose several design techniques for high-performance and power-efficient CMOS comparators. The proposed techniques span from the microarchitecture to logic and circuit design levels. In the microarchitecture design, the priority-encoding algorithm is adopted to efficiently implement each comparison operation in one clock cycle. The critical path is effectively shortened using the multilevel look-ahead technique that we proposed in [6] for the priority encoder. Furthermore, for long comparators, a two-stage pipelined architecture is used to partition and balance the logic functions into each half of the clock cycle. In the logic design, the priority-encoding function and some logic functions are merged in one complex CMOS gate called the magnitude decision module. Such a design not only improves the operating speed but also makes the circuit more compact and power efficient. In the circuit design, the dynamic technique with serially connected structure is applied to produce high performance with low switching activity. Also, a technique similar to the multiple output domino logic (MODL) [7] is applied to the magnitude decision module so that the circuit complexity is reduced further.

The rest of this paper is organized as follows. Section II describes the design principles of the new priority-encoding-based comparator. Basic design techniques used to design new comparators are described in Section III, while the microarchitecture improvement together with modified circuits for long comparators is described in Section IV. Performance evaluation and experimental results are given in Section V, and the conclusion is given in Section VI.

0018-9200/03$17.00 © 2003 IEEE

Fig. 2. Conceptual block diagram of the priority-encoding-based 4-b comparator.

II. DESIGN PRINCIPLES OF THE PRIORITY-ENCODING-BASED COMPARATOR

Let the two inputs of the comparator be A and B, both with n bits counted from bit 0 to bit n−1. One binary output indicates that A is larger than B, and a second binary output indicates that B is larger than A. Another binary signal EQUAL indicates that A is equal to B. A 4-b numerical example, as shown in Fig. 1, is used to demonstrate the design concept of the proposed comparator.

Assume the two operands A and B are 4'b1011 and 4'b1000, respectively. By inspection, the A-larger and B-larger outputs should be 1 and 0, respectively, and EQUAL should be 0. The magnitude comparison is divided into four steps, and the number in each shaded oval in Fig. 1 stands for the sequence number of each step. The first step is to determine whether each corresponding bit of A and B is equal or not using XOR gates. If A is equal to B, all the output bits of the XOR gates will be 0. On the other hand, if A is not equal to B, there is at least one 1 bit in the result. In this numerical example, the result is 4'b0011, reflecting that A is not equal to B.

There are two operations in the second step. The first operation is performed by NORing the result of the first step, which is 4'b0011, to generate the output signal EQUAL. The second operation actually determines which input is larger. Observe that in the result of the first step (4'b0011), the most significant 1 bit, which will be called the most significant unequal bit (MSUB) hereafter, is at bit 1. Meanwhile, the bit at the MSUB of A is 1, while the bit at the MSUB of B is 0. The MSUB immediately shows which operand is larger. In order to quickly find the MSUB, we employ the priority encoder proposed in [6] and [8] (details of the circuit will be described in a later section). For this numerical case, the priority encoder takes the output of the first step (4'b0011) and generates 4'b0010. There is only one 1 bit in the output, which is exactly at the MSUB.

In the third step, AND operations are used to find out from which operand the 1 bit (MSUB) comes. Each operand is ANDed bitwise with the output of the priority encoder. The masked results for A and B are then 4'b0010 and 4'b0000, respectively. The nonzero masked result for A immediately shows that A is larger than B. Finally, the A-larger and B-larger outputs can be generated by ORing the bits of the masked results for A and B, respectively. As expected, they are 1 and 0 in this example.

Fig. 3. (a) Block diagram of a 4-b comparator. (b) Schematic diagram of a 4-b MDM.
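The four steps above can be sketched in software as follows. This is a minimal functional model only, not the circuit of Fig. 2; the function name `pe_compare` and the variable names are illustrative assumptions, and the priority encoder is modeled with an integer bit-length trick rather than with gates.

```python
def pe_compare(a: int, b: int, n: int = 4):
    """Priority-encoding-based comparison of two n-bit unsigned values.

    Returns (a_larger, b_larger, equal) as 0/1 flags.
    All names here are hypothetical; they are not taken from the paper.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask

    # Step 1: bitwise XOR marks every position where the operands differ.
    diff = a ^ b

    # Step 2, first operation: EQUAL is the NOR of all XOR outputs.
    equal = int(diff == 0)

    # Step 2, second operation: priority encoding keeps only the most
    # significant 1 bit of the XOR result (the MSUB) as a one-hot word.
    msub = 1 << (diff.bit_length() - 1) if diff else 0

    # Step 3: AND each operand with the one-hot MSUB word.
    a_masked = a & msub
    b_masked = b & msub

    # Step 4: OR-reduce each masked word to a single flag.
    a_larger = int(a_masked != 0)
    b_larger = int(b_masked != 0)
    return a_larger, b_larger, equal


# The 4-b example from the text: 4'b1011 versus 4'b1000.
print(pe_compare(0b1011, 0b1000))  # (1, 0, 0)
```

For the example operands the intermediate values are `diff = 0b0011` and `msub = 0b0010`, matching the XOR and priority-encoder outputs quoted in the text.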

The above operations can be realized by the circuit shown in Fig. 2. The priority encoder implements the following equations [6].

(1)
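For orientation, a 4-b priority encoder that isolates the most significant 1 bit is commonly written as below. This is a generic sketch in assumed notation, where $D_i$ stands for the XOR output at bit $i$, $EP_i$ for the encoder output at bit $i$, and bit 3 is taken to have the highest priority; it may differ in form and in symbol names from the equations of [6].

```latex
\begin{aligned}
D_i  &= A_i \oplus B_i, \qquad i = 0,\dots,3 \\
EP_3 &= D_3 \\
EP_2 &= \overline{D_3}\, D_2 \\
EP_1 &= \overline{D_3}\,\overline{D_2}\, D_1 \\
EP_0 &= \overline{D_3}\,\overline{D_2}\,\overline{D_1}\, D_0
\end{aligned}
```

With the XOR result 4'b0011 of the numerical example, only $EP_1$ evaluates to 1, which gives the encoder output 4'b0010 quoted earlier.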


Fig. 4. Schematic diagram of a 4-b priority-encoding-based CMOS comparator.

Note that although the above design concept is similar to that described in [1], the implementation details are quite different. We will elaborate on these details shortly.

III. BASIC DESIGN TECHNIQUES

When implementing the comparator in CMOS technology, we found that the priority encoder and those AND gates in Fig. 2 can be merged into a functional block, called the magnitude decision module (MDM). With the MDM, the block diagram of a 4-b comparator is revised as shown in Fig. 3(a). The circuit for generating EQUAL will not be shown hereafter because it is not in the critical path. The MDM implements the functions listed below, and it is designed as the circuit shown in Fig. 3(b).

(2)

The circuit in Fig. 3(b) is derived from the priority encoder we proposed in [6]. We also adapt the MODL style [7] to reduce circuit complexity and increase operating speed.

The circuit in Fig. 3(b) operates as follows. When the clock signal clk goes low, the circuit enters the precharging phase and the output nodes and are precharged to 0. When clk goes high, the circuit enters the evaluation phase. For , the priority descends from to . For example, if , then the signal will be used to turn off the discharging paths for and . Therefore, outputs and will be kept at 0. The values of and depend on and . For example, if , nodes and will be evaluated as logic 0 and 1, while nodes and will be kept at logic 1 and 0, respectively. On the other hand, if , neither node nor node has a discharging path because transistor is turned off. Then, outputs and stay in the precharged state. At the same time, relinquishes the control and the rest of the circuit functions as if there are only three inputs, , , and .
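The precharge and evaluation behavior described above can be mimicked with a small behavioral model. This is a sketch under stated assumptions, not a transcription of Fig. 3(b): it assumes bit 3 has the highest priority, it models only the logic values of the outputs (0 during precharge), and the names `mdm4` and `compare4` are hypothetical.

```python
def mdm4(clk, a, b):
    """Behavioral model of a 4-b magnitude decision module.

    a, b: lists of four bits, index 0 = bit 0, index 3 = bit 3.
    Assumption: bit 3 has the highest priority.
    Returns two 4-bit lists, one per operand.
    """
    a_out = [0, 0, 0, 0]
    b_out = [0, 0, 0, 0]
    if clk == 0:
        # Precharge phase: all outputs sit at 0.
        return a_out, b_out
    # Evaluation phase: walk from the highest-priority bit downward.
    for i in (3, 2, 1, 0):
        if a[i] != b[i]:
            # The first unequal bit decides the result and blocks
            # every lower-priority position.
            a_out[i] = a[i]
            b_out[i] = b[i]
            break
    return a_out, b_out


def compare4(clk, a, b):
    """OR-reduce the MDM outputs into two 'larger' flags."""
    a_out, b_out = mdm4(clk, a, b)
    return int(any(a_out)), int(any(b_out))


a = [1, 1, 0, 1]  # 4'b1011, listed from bit 0 to bit 3
b = [0, 0, 0, 1]  # 4'b1000
print(mdm4(1, a, b))      # ([0, 1, 0, 0], [0, 0, 0, 0])
print(compare4(1, a, b))  # (1, 0)
```

When the highest-priority pair of bits is equal, the loop simply moves on, which mirrors the statement that the remaining circuit then behaves as if it had only three inputs.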

The schematic of the 4-b comparator [Fig. 3(a)] is shown in Fig. 4. The circuit follows the domino logic style [9] and, hence, the necessary inversion function is moved to the input terminal and implemented via static CMOS circuits. On the other hand, the OR function is implemented by a dynamic NOR gate plus a NOT gate, and placed after the dynamic MDM circuit.

Although we can derive an MDM with more than four inputs in the same way as (2), the circuit becomes too complicated to achieve high speed. Thus, instead, we employ the concept of multilevel lookahead proposed for the priority encoder [6] to design comparators with more than four input bits. The concept of multilevel lookahead is illustrated with the aid of the block diagram of a 16-b comparator in Fig. 5(a), and the schematic diagram of the modified 4-b comparator macro PEBCLA4b is shown in Fig. 5(b).

Fig. 5. (a) Block diagram of a 16-b comparator. (b) Schematic diagram of the macro PEBCLA4b.

In addition to the input/output (I/O) signals shown in Fig. 4, the new 4-b comparator macro needs an extra input look-ahead signal and an extra output look-ahead signal . As illustrated in Fig. 5(a), the in the th macro is connected to the in the th macro, except that the in the least significant macro should be tied to directly. The following equations describe the functions of Fig. 5(b).

(3)

As described in [6], and in (3) realize the first-level look-ahead mechanism because all these functions are flattened without iteration and finished with one gate delay. On the other hand, the circuits enclosed in the gray areas of Fig. 5(b) realize the second-level look-ahead mechanism because the signal is generated only with a domino-gate delay. The look-ahead signals are used to connect different macros to shorten the critical path.
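The role of the look-ahead signals between macros can be illustrated functionally. The sketch below is an assumption-laden model, not the PEBCLA4b circuit: it assumes that a macro's look-ahead output is 1 only when its look-ahead input is 1 and its own four bit pairs are all equal, that the first macro in the chain has its look-ahead input at logic 1, and that the chain starts at the most significant nibble. The names `peb_la4` and `compare16` are hypothetical. It captures the logic only; the circuit's one-gate-delay generation of the look-ahead signal is not modeled.

```python
def peb_la4(a4, b4, la_in):
    """4-b comparator macro with look-ahead (functional model).

    a4, b4: 4-bit integers. la_in = 1 means every higher-priority
    macro saw equal inputs, so this macro is allowed to decide.
    Returns (a_larger, b_larger, la_out).
    """
    diff = (a4 ^ b4) & 0xF
    msub = 1 << (diff.bit_length() - 1) if diff else 0
    a_larger = int(bool(la_in) and (a4 & msub) != 0)
    b_larger = int(bool(la_in) and (b4 & msub) != 0)
    # Pass the token on only if this macro also found no difference.
    la_out = int(bool(la_in) and diff == 0)
    return a_larger, b_larger, la_out


def compare16(a, b):
    """16-b comparison built from four look-ahead macros."""
    la = 1  # assumed constant for the first macro in the chain
    a_larger = b_larger = 0
    for shift in (12, 8, 4, 0):
        al, bl, la = peb_la4((a >> shift) & 0xF, (b >> shift) & 0xF, la)
        a_larger |= al
        b_larger |= bl
    return a_larger, b_larger


print(compare16(0x1234, 0x1235))  # (0, 1)
```

At most one macro can assert an output, because the look-ahead token is withdrawn as soon as a macro finds an unequal bit.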

IV. LONG PRIORITY-ENCODING-BASED COMPARATORS

When the size of the comparator grows larger, the third- and even the fourth-level look-ahead circuit structures, which are similar to that used in the priority encoder [6], can be used to shorten the critical path further. However, not only does the structure of a single gate become more complex, but the propagation delay also grows linearly with the number of cascaded macros. Therefore, for a longer comparator, we propose a two-stage pipelined structure to enhance the performance with little increase in circuit complexity.

Fig. 6. Block diagram of a high-performance 64-b comparator.

The previous design approach needs a precharge phase and an evaluation phase to finish one comparison operation. Thus, the precharging time is wasted from the viewpoint of logic operation. Furthermore, the duty cycle of a system clock is usually set to 50% even though the required precharging time is typically shorter than the evaluation time. Taking these factors into consideration, we partition the logic functions of the comparator into each half of the clock cycle to form a two-stage pipeline. Such a design not only makes each pipeline shorter but also fully utilizes the clock cycle if the circuit is implemented in the dynamic CMOS logic. When the first pipeline stage enters the evaluation phase, the second pipeline stage enters the precharge phase. After the first pipeline stage turns to precharge and latches the results, the second pipeline stage begins to evaluate. Although the new architecture needs more transistors for pipeline latches, it can effectively shorten the clock cycle to improve the operating speed. Furthermore, because the circuit is implemented with dynamic CMOS circuits, the comparator can still finish each comparison in one clock cycle.

Let us take the 64-b comparator as an example. The block diagram of the new design is shown in Fig. 6. The 64 input bits are partitioned into eight small groups, each having eight input bits. In the first pipeline stage, eight comparators process the eight groups of inputs, respectively, producing eight pairs of outputs. After latching, these outputs are sent to the second stage, which is another 8-b comparator, to perform the remaining operations.
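The two-stage partition can be checked with a short functional model. This is a sketch of the data flow only, under the assumption that each first-stage comparator emits one "A larger" bit and one "B larger" bit for its group and that the second stage compares the two resulting 8-b words; the names `compare8` and `compare64` are hypothetical, and clocking and latching are not modeled.

```python
def compare8(a8, b8):
    """8-b priority-encoding comparison; returns (a_larger, b_larger)."""
    diff = (a8 ^ b8) & 0xFF
    msub = 1 << (diff.bit_length() - 1) if diff else 0
    return int((a8 & msub) != 0), int((b8 & msub) != 0)


def compare64(a, b):
    """64-b comparison split into two stages of 8-b comparators."""
    # Stage 1: eight 8-b comparators, one per group of eight input bits.
    a_word = 0
    b_word = 0
    for g in range(8):
        al, bl = compare8((a >> (8 * g)) & 0xFF, (b >> (8 * g)) & 0xFF)
        a_word |= al << g
        b_word |= bl << g
    # (In hardware the eight output pairs would be latched here.)
    # Stage 2: one more 8-b comparator decides among the group results.
    return compare8(a_word, b_word)


print(compare64(2**63, 2**63 - 1))  # (1, 0)
```

The second stage works because the two 8-b words first differ at the most significant group whose inputs differ, and that group's pair of flags already says which operand is larger.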

The 8-b macro cell PEB8b shown in Fig. 7 implements the following equations.

(4)

The circuit structure is derived from that of Fig. 5(b) and is described as follows.

1) Two 4-b comparators are used to construct the 8-b macro.
2) The least significant macro of PEB8b uses an AND gate to generate the lookahead signal for the second macro of PEB8b. The second macro does not need to generate because there is no connection between different 8-b macros.
3) Those transistors controlled by in the original 4-b macro are removed from the least significant macro of PEB8b because PEB8b does not need .
4) The and signals in the two 4-b macros are combined together by two eight-input dynamic NOR gates, respectively, and the results are latched by two N-C²MOS latches.

The detailed operations of the 64-b comparator are described briefly as follows.

1) The first and second pipeline stages of the 64-b comparator utilize the same 8-b macro PEB8b. However, the macros in the first pipeline stage accept one clock signal, while the macro in the second pipeline stage accepts the complementary clock signal. Therefore, when the first-stage clock goes high, the macro cells in the first pipeline stage enter the evaluation phase and the macro cell in the second pipeline stage enters the precharge phase.
2) When the first-stage clock goes low, the macro cells in the first pipeline stage enter the precharge phase and the evaluated results are latched in the N-C²MOS latches. These outputs are also fed into the corresponding inputs of the macro in the second pipeline stage for obtaining the final comparison result.
3) Both stages have the same critical path, i.e., the 8-b comparator. Because the critical paths of both stages are shortened and balanced, the operation speed of the comparator is improved significantly.

Fig. 7. Schematic diagram of the 8-b macro cell PEB8b.

Fig. 8. Layouts of (a) Wang et al.'s comparator [3]. (b) Proposed comparator.

V. PERFORMANCE EVALUATION AND EXPERIMENTAL RESULTS

In order to verify the proposed techniques, a two-stage pipelined 64-b comparator is realized. To minimize the layout effort and layout area, we give all N-type transistors in the pull-down network the same transistor width instead of using a ratioed design. We also enlarge the width of these transistors up to 5 μm to reduce the pull-down delay. For example, the channel widths of the transistors in Fig. 7 are all 5 μm. The design is implemented based on a 3-V 0.6-μm CMOS technology [10], which is the same as that used in the ANT comparator [3]. The 64-b comparator based on Wang et al.'s approach is also resimulated with the transistor sizes reported in [3]. For a fair comparison, however, the performance comparison is based on the results of post-layout simulations running at a 3-V supply voltage. The layouts of both designs are shown in Fig. 8, and the complexity information is listed in Table I. We found that the transistor count of the new design is less than that required in the conventional design, while the layout area of the new design is only about half that of the conventional design. This is mainly because the transistor size used in the new design is typically much smaller than that used in the previous design.

Before reporting the timing information, timing characterization methods for both designs will be described. For the new design, both stages have the same critical path, i.e., the 8-b comparator. Then, we only need to characterize the critical path delay of the 8-b comparator macro, which is the summation of the delay of the static XOR gate and the evaluation delay of the dynamic gate. Note that the output of the static XOR gate must be stable before the dynamic gate enters the evaluation phase. This means that the delay of the static XOR gate can be viewed as the setup time of the dynamic circuit. The minimal cycle time will be twice , and the maximal operating frequency will be . Analysis shows that we can apply the pattern ( , ) to trigger the longest signal propagation path. The timing chart of the 8-b macro PEB8b is illustrated in Fig. 9.

TABLE I
COMPLEXITY COMPARISON OF TWO 64-b COMPARATORS

TABLE II
POST-LAYOUT SIMULATION RESULTS OF TWO DIFFERENT 64-b COMPARATORS

As mentioned above, the new comparator finishes each comparison in just one clock cycle, while the conventional 64-b comparator takes three clock cycles to finish the task. Similar to the new design, all stages in the conventional design also have the same critical path, but each pipeline is a 2-b comparator in this case. Then, we only need to characterize the critical path delay of the 2-b comparator macro. For a fair comparison, we define the equivalent total delay time for each operation to be six times , and the equivalent maximal operating frequency is defined to be . The pattern ( ) is applied to trigger the longest signal propagation path.
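The timing definitions in the two preceding paragraphs can be summarized compactly. The symbols below are assumptions introduced for this summary ($T_{XOR}$ for the static XOR delay, $T_{dyn}$ for the evaluation delay of the dynamic gate, $T_{8b}$ and $T_{2b}$ for the critical-path delays of the 8-b and 2-b macros), and the relations assume that the minimal cycle time is twice the critical-path delay of one stage.

```latex
\begin{aligned}
T_{8b} &= T_{XOR} + T_{dyn} \\
T_{cycle,\min} &= 2\,T_{8b}, \qquad f_{\max} = \frac{1}{2\,T_{8b}} \\
T_{eq} &= 6\,T_{2b}, \qquad f_{eq,\max} = \frac{1}{6\,T_{2b}}
\end{aligned}
```

The factor of six follows from the conventional design needing three clock cycles, each made of two half-cycle pipeline stages of delay $T_{2b}$.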

Post-layout simulation results are summarized in Table II. Power consumption listed in Table II is evaluated at the maximum clock frequency. The results show that the proposed comparator is 16% faster and consumes 79% less power than Wang et al.'s comparator [3]. For the new design, it is possible to trade the layout area and the power consumption for more speed advantages.

Fig. 9. Timing chart of the critical path of PEB8b.

Fig. 10. Test chip architecture.

The proposed 64-b comparator has been fabricated for performance verification. Fig. 10 shows the test chip architecture used to measure the delay time of the dynamic circuit. This measurement method is commonly used in measuring delay time of dynamic circuits [13], [14]. The input clock signal goes through the clock buffer first, and then proceeds in two paths. One goes through the comparator core and the output buffer, and reaches the output pad. The other only goes through the output buffer to reach the output pad. The only difference between these two paths is the comparator core. Therefore, we can measure the time between the clock output signal Clk and the comparator output and get the delay time. The photograph of the test chip is shown in Fig. 11(a) and measured waveforms with 160- and 50-MHz clocks are shown in Fig. 11(b) and (c), respectively. Measured chip features and post-layout simulation results are summarized in Table III. The measured waveforms indicate that the delay time of the dynamic gate in the 8-b macro is 2.2 ns no matter which clock rate is used, which matches the simulation result. We cannot measure the delay of the static XOR gate directly on the chip because it is a set-up time in nature. However, according to the above measurement result, we have confidence that the experimental result is very close to the simulation result. The maximal operating frequency is measured around 180 MHz (not shown), which again agrees with the simulation. The measured power consumption is also very close to the simulated result.
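As a rough consistency check on the two measured figures quoted above, one can ask what set-up time they would imply. The sketch below assumes that the minimal cycle time is twice the sum of the XOR set-up delay and the dynamic-gate delay; the function names are hypothetical, and the result is an inference from the quoted 180 MHz and 2.2 ns, not a value reported in the paper.

```python
def max_frequency_mhz(t_xor_ns, t_dyn_ns):
    """Maximal clock frequency if the cycle is twice the stage delay."""
    t_stage_ns = t_xor_ns + t_dyn_ns
    return 1e3 / (2.0 * t_stage_ns)


def implied_setup_ns(f_max_mhz, t_dyn_ns):
    """Set-up (XOR) delay implied by a given f_max and dynamic delay."""
    half_cycle_ns = 1e3 / (2.0 * f_max_mhz)
    return half_cycle_ns - t_dyn_ns


# Measured values quoted in the text: about 180 MHz and 2.2 ns.
print(round(implied_setup_ns(180.0, 2.2), 2))  # 0.58
```

Under that assumption, a half cycle of about 2.78 ns leaves roughly 0.58 ns for the static XOR gate, which is at least a plausible magnitude for a set-up delay next to a 2.2 ns dynamic evaluation.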

VI. CONCLUSION

Design techniques for high-performance and power-efficient CMOS comparators are proposed. The design is based on the priority-encoding algorithm and utilizes the dynamic CMOS circuit technique to result in a compact comparator with high performance. In implementation, the priority-encoding function and the subsequent AND function are merged as an MDM, which is realized in the MODL. Such a design not only improves the operating speed due to the reduced logic depth, but also makes the circuit compact and power efficient because fewer transistors are used. To efficiently shorten the critical path that lies in the MDM, the multilevel look-ahead technique is adopted. To enhance the operating speed further, the circuit is realized with a latch-based two-stage pipelined structure, and the logic functions are partitioned into two parts, with each part executed in half of the clock cycle in a delay-balanced manner. Post-layout simulation results show that a 64-b comparator designed with the proposed techniques in a 3-V 0.6-μm CMOS technology is 16% faster, is 50% smaller, and consumes 79% less power than the fastest conventional design. Measurement results of the test chip conform with simulation results and prove the feasibility of the proposed techniques.


Fig. 11. (a) Photograph of the fabricated chip. (b) Measured waveforms with 160-MHz clock. (c) Measured waveforms with 50-MHz clock.

TABLE III
EXPERIMENTAL AND POST-LAYOUT SIMULATION RESULTS

ACKNOWLEDGMENT

The authors would like to thank Prof. C. Yeh of the Department of Electrical Engineering, National Chung Cheng University, for improving the language and presentation of this paper. They would also like to thank the Chip Implementation Center, Taiwan, for supporting the fabrication of the test chip.

REFERENCES

[1] M. M. Mano, Digital Design. Englewood Cliffs, NJ: Prentice-Hall, 1991, ch. 5.

[2] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison-Wesley, 1993, ch. 8.

[3] C.-C. Wang, C.-F. Wu, and K.-C. Tsai, "1-GHz 64-b high-speed comparator using ANT dynamic logic with two-phase clocking," Proc. Inst. Elect. Eng. Comput. Digital Techn., vol. 145, no. 6, pp. 433–436, Nov. 1998.

[4] R. X. Gu and M. I. Elmasry, "All-N-Logic high-speed true-single-phase dynamic CMOS logic," IEEE J. Solid-State Circuits, vol. 31, pp. 221–229, Feb. 1996.

[5] S. Furber, ARM System Architecture. Reading, MA: Addison-Wesley, 1997.

[6] J.-S. Wang and C.-H. Huang, "High-speed and low-power CMOS priority encoders," IEEE J. Solid-State Circuits, vol. 35, pp. 1511–1514, Oct. 2000.

[7] I. S. Hwang and A. L. Fisher, "Ultrafast compact 32-b CMOS adders in multiple-output domino logic," IEEE J. Solid-State Circuits, vol. 24, pp. 358–369, Apr. 1989.

[8] J.-S. Wang and C.-S. Huang, "A high-speed single-phase-clocked CMOS priority encoder," in Proc. IEEE Int. Symp. Circuit and Systems, vol. 5, May 2000, pp. 537–540.

[9] R. W. Krambeck, C. M. Lee, and H.-F. S. Law, "High-speed compact circuits with CMOS," IEEE J. Solid-State Circuits, vol. SC-17, pp. 614–619, June 1982.

[10] "0.6-μm CMOS ASIC process digests," Taiwan Semiconductor Manufacturing Corp., Hsinchu, Taiwan, R.O.C., 1996.

[11] J. Park, H. C. Ngo, J. A. Silberman, and S. H. Dhong, "470 ps 64-b parallel binary adder [for CPU chip]," in Symp. VLSI Circuits Dig. Tech. Papers, 2000, pp. 192–193.

[12] S. Naffziger, "A sub-nanosecond 0.5-μm 64-b adder design," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1996, pp. 362–363.

[13] R. Woo, S.-J. Lee, and H.-J. Yoo, "A 670-ps 64-b dynamic low-power adder design," in Proc. IEEE Int. Symp. Circuit and Systems, vol. 1, May 2000, pp. 28–31.

[14] G. Yee and C. Sechen, "Clock-delayed domino for dynamic circuit design," IEEE Trans. VLSI Syst., vol. 8, pp. 425–430, Aug. 2000.

Chung-Hsun Huang (S'00) was born in Taiwan, R.O.C., in 1977. He received the B.S. and M.S. degrees in electrical engineering from National Chung-Cheng University, Chia-Yi, Taiwan, in 1999 and 2000, respectively. He is currently working toward the Ph.D. degree at the Institute of Electrical Engineering, National Chung-Cheng University.

His research interests include high-speed and low-power digital integrated circuits, microprocessor design, SOC design methodology, and high-speed analog-to-digital converter design.

Jinn-Shyan Wang (S'85–M'88) was born in Taiwan, R.O.C., in 1959. He received the B.S. degree in electrical engineering from the National Cheng-Kung University, Tainan, Taiwan, in 1982 and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 1984 and 1988, respectively.

He was with Industrial Technology Research Institute (ITRI) from 1988 to 1995, engaged in ASIC circuit and system design, and became the Manager of the Department of VLSI Design. He joined the Department of Electrical Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, in 1995, where he is currently a full Professor. His research interests are in low-power and high-speed digital integrated circuits and systems, analog integrated circuits, IP and SOC design, and CMOS image sensors. He has published over 20 journal papers and 40 conference papers and holds over 20 patents on VLSI circuits and architectures.
