
[IEEE 2012 International Conference on Communication, Information & Computing Technology (ICCICT) - Mumbai, India (2012.10.19-2012.10.20)] 2012 International Conference on Communication,


Reversible Implementation of Novel Multiply Accumulate (MAC) Unit

Swaraj Raman M, Arun Kumar K, Srinivas Reddy K EEE Department, Birla Institute of Technology and Science,

Pilani, India [email protected], [email protected], [email protected]

Abstract—In almost all the Digital Signal Processing (DSP) applications, the vital operations involve multiplications and accumulations. Consequently, there is a demand for dedicated hardware in processors to enhance the speed with which these multiplications and accumulations are performed. In the present world of irreversible circuits, the Multiply Accumulate Unit multiplies the two operands, adds the product to the previously accumulated result and stores back the new result in the Accumulator all in a single clock cycle. On the other hand, implementation of digital circuits in reversible logic is gaining popularity with the arrival of quantum computing and reversible logic. In this paper, we propose a novel Reversible Multiply Accumulate (MAC) unit. We also build a Reversible Vedic MAC unit and compare various possible implementations of the reversible MAC unit in terms of Quantum Cost, number of Garbage Outputs and Depth.

Keywords- reversible; multiply; accumulation; MAC; vedic; quantum cost; garbage outputs; depth; DSP

I. INTRODUCTION

In irreversible logic, according to Landauer's research, heat dissipation of kT ln 2 joules takes place on erasing a bit (k is Boltzmann's constant and T is the absolute temperature of the environment) [1]. A one-to-one mapping exists between the input and output vectors in reversible logic and, according to Bennett, operations performed in a reversible manner need not dissipate kT ln 2 joules of heat energy [2]. Reversible logic gates constitute reversible logic circuits, and their major application can be seen in quantum computing [3], [4]. Each quantum logic gate performs an elementary unitary operation on one, two or more two-state quantum systems called qubits. In the design of reversible circuits, fan-out and loops are not permitted.

From the point of view of reversible circuit design, there are three parameters for determining the complexity and performance of circuits [5]: Quantum cost (QC): the number of 1x1 or 2x2 reversible gates used in the circuit. Garbage outputs (GO): the number of dummy (unused) outputs which are added in order to make the circuit reversible. Depth: the number of 1x1 or 2x2 reversible gates on the longest path from input to output. A reversible circuit designer always looks to minimize these three parameters.

Digital Signal Processing (DSP) algorithms make extensive use of the multiply-accumulate (MAC) operation in high performance digital processing systems. This operation eases the computation of convolution, which is needed in filters, Fourier analyzers, etc. A multiply-accumulate (MAC) unit comprises a multiplier, an adder and an accumulator. The multiplier multiplies the inputs and gives the result to the adder, which adds the multiplier result to the previously accumulated result. In this paper, a novel reversible Multiply Accumulate (MAC) unit is proposed. The reversible multiplier is implemented by a combination of reversible half adders (HAs), full adders (FAs) and Peres gates [6]. A reversible ripple carry adder is used as the adder, and the reversible Accumulator is designed using reversible sequential blocks [7]. A reversible Vedic MAC unit is also built and compared with other possible implementations of the reversible MAC unit in terms of QC, GO and Depth of the circuit.

The rest of the paper is organized as follows: Section II explains the background of the reversible gates and circuits which constitute the MAC unit. Section III presents the proposed design of the reversible MAC unit. Section IV shows the design of the reversible Vedic MAC unit. In Section V, the performance of the proposed MAC unit, the Vedic MAC unit and other possible MAC implementations is discussed and compared. Section VI provides the conclusions.

II. REVERSIBLE GATES

An n x n (n inputs and n outputs) reversible circuit has each input assignment mapped to a unique output assignment and vice versa. The quantum cost of every 1x1 and 2x2 reversible gate is taken as unity [3]. The quantum cost of a larger reversible gate is the number of 1x1 and 2x2 reversible gates needed to design it.
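The one-to-one mapping requirement can be checked exhaustively for small gates. The following is a minimal sketch, not part of the original design flow; the helper name `is_reversible` and the counterexample block `and_with_passthrough` are hypothetical and introduced only for illustration.

```python
from itertools import product

def is_reversible(gate, n):
    """Return True if `gate` maps the 2**n input vectors to 2**n distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n

def feynman(a, b):
    # 2x2 Feynman (controlled-NOT) gate: P = A, Q = A xor B
    return (a, a ^ b)

def and_with_passthrough(a, b):
    # Hypothetical 2-input, 2-output block that is NOT reversible:
    # inputs (0, 0) and (0, 1) both map to (0, 0).
    return (a, a & b)

print(is_reversible(feynman, 2))
print(is_reversible(and_with_passthrough, 2))
```

The second block shows why an AND function needs an extra line (and hence a garbage output) before it can be embedded in a reversible circuit.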

A. Elementary Gates:

The elementary quantum logic gates shown in fig. 1 are (a) the 1x1 NOT gate, (b) the 2x2 Feynman gate (Controlled-NOT gate), (c) the 2x2 Controlled-V gate and (d) the 2x2 Controlled-V+ gate (V is a square root of the NOT gate and V+ is its Hermitian conjugate) [8]. Two V gates or two V+ gates in series behave as a NOT gate. A V gate in series with a V+ gate, or vice versa, is an identity. All the elementary gates have a quantum cost of unity.
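The V and V+ identities can be verified numerically. The sketch below assumes the commonly used matrix form of the square root of NOT, V = ((1+i)/2)[[1, -i], [-i, 1]]; this matrix is not given in the text above, and the helper names are illustrative.

```python
def matmul(x, y):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(x, y, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))

s = (1 + 1j) / 2
V = [[s, s * -1j],
     [s * -1j, s]]                        # assumed square-root-of-NOT matrix
V_DAG = [[V[j][i].conjugate() for j in range(2)] for i in range(2)]  # Hermitian conjugate

NOT = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

print(close(matmul(V, V), NOT))            # two V gates in series
print(close(matmul(V_DAG, V_DAG), NOT))    # two V+ gates in series
print(close(matmul(V, V_DAG), IDENTITY))   # V followed by V+
```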

2012 International Conference on Communication, Information & Computing Technology (ICCICT), Oct. 19-20, Mumbai, India

978-1-4577-2078-9/12/$26.00 ©2011 IEEE


Figure 1. Elementary quantum logic gates

B. Fredkin Gate:

The 3x3 Fredkin gate shown in fig. 2(a) passes the first input unaltered and swaps the second and third inputs if the first input is '1' [9]. The Fredkin gate has a quantum cost QC=5 and a number of garbage outputs GO=1, as can be seen from fig. 2(b).
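The behaviour described above can be written as a three-line function. This is a behavioural sketch only; it does not model the elementary-gate realization of fig. 2(b).

```python
from itertools import product

def fredkin(a, b, c):
    # The first input passes through; the second and third swap when a == 1.
    return (a, c, b) if a else (a, b, c)

for bits in product((0, 1), repeat=3):
    out = fredkin(*bits)
    # Applying the gate twice restores the input, and the number of 1s
    # is preserved (the gate is conservative).
    print(bits, "->", out, fredkin(*out) == bits, sum(out) == sum(bits))
```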

Figure 2. Fredkin Gate and its quantum gate implementation

C. Peres Gate:

Peres gates are often used in the design of reversible arithmetic circuits. The input-output equations of this gate are shown in fig. 3(a) [6]. The implementation of this gate using elementary quantum gates, shown in fig. 3(b), has a quantum cost of 4 and 1 garbage output. A half-adder can be realized using this Peres gate by setting C=0. The last two outputs are then the sum and the carry.
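The half-adder configuration can be illustrated as follows, assuming the standard Peres gate equations P = A, Q = A xor B, R = AB xor C (the text refers to fig. 3(a) for them, so they are restated here as an assumption).

```python
def peres(a, b, c):
    # Assumed standard Peres equations: P = A, Q = A xor B, R = (A and B) xor C
    return (a, a ^ b, (a & b) ^ c)

def half_adder(a, b):
    # C is tied to 0; P is the single garbage output, Q is the sum, R the carry.
    _garbage, total, carry = peres(a, b, 0)
    return total, carry

for a in (0, 1):
    for b in (0, 1):
        total, carry = half_adder(a, b)
        print(a, b, "sum =", total, "carry =", carry)
```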

Figure 3. Peres Gate and its quantum gate implementation

D. Full Adder:

The implementation of the reversible full adder is shown in fig. 4. Its quantum cost is 6 and its number of garbage outputs is 2 [10].
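A functional view of a reversible full adder with two garbage outputs is sketched below as two cascaded Peres-type stages. This is an assumption made for illustration: it reproduces the logic function and the garbage count, but it is not claimed to be the gate-level circuit of fig. 4, whose quantum cost of 6 comes from an optimized elementary-gate realization.

```python
def peres(a, b, c):
    # Assumed standard Peres equations: P = A, Q = A xor B, R = (A and B) xor C
    return (a, a ^ b, (a & b) ^ c)

def full_adder(a, b, carry_in):
    # Stage 1: half-add A and B (ancilla input fixed to 0).
    g1, a_xor_b, a_and_b = peres(a, b, 0)
    # Stage 2: fold in the carry; the third output becomes the carry out.
    g2, total, carry_out = peres(a_xor_b, carry_in, a_and_b)
    # g1 (= A) and g2 (= A xor B) are the two garbage outputs.
    return total, carry_out

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```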

Figure 4. Reversible implementation of Full-Adder

E. Fanout Gate:

Fan-out is not permitted in reversible circuits; hence a Feynman gate with its second input held at constant 0 is used to duplicate a signal, as shown in fig. 5.
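A minimal sketch of the copying behaviour, assuming the second Feynman input is tied to constant 0:

```python
def feynman(a, b):
    # 2x2 Feynman (controlled-NOT) gate: P = A, Q = A xor B
    return (a, a ^ b)

def fanout(a):
    # A constant 0 on the target line turns the Feynman gate into a copier.
    return feynman(a, 0)

print(fanout(0), fanout(1))
```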

Figure 5. Fanout Gate

III. PROPOSED REVERSIBLE MAC UNIT

The Multiply Accumulate (MAC) unit consists of a multiplier, an adder and an accumulator as shown in fig. 6.

Figure 6. 4x4 Multiply Accumulate (MAC) Unit

For the 4x4 MAC unit to be reversible, each of these arithmetic and sequential circuits is designed to be reversible as follows:

A. 4x4 Multiplier:

The reversible 4x4 multiplier consists of a partial product generator and a 4-operand adder [11]. The partial product generator shown in fig. 7 is made of Peres gates, which yield the required partial products. The 4-operand adder is implemented using half adders and full adders as shown in fig. 8. The multiplier has a total quantum cost (QC) of 152, a number of garbage outputs (GO) of 52 and a Depth of 29.

B. 8-bit Ripple Carry Adder:

An 8-bit Ripple Carry Adder is designed using one Half Adder (HA) and 7 Full Adders (FA) as shown in fig. 9 [12] [13]. The adder has a quantum cost QC=46, a number of garbage outputs GO=15 and a Depth of 39.
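The QC and GO figures follow directly from the per-gate values of Section II (Peres half adder: QC 4, GO 1; full adder: QC 6, GO 2). The sketch below tallies them; the function name is hypothetical, and Depth is left out because it depends on the critical path of fig. 9 rather than on a simple sum.

```python
HALF_ADDER = {"qc": 4, "go": 1}   # Peres gate with C = 0 (Section II-C)
FULL_ADDER = {"qc": 6, "go": 2}   # Section II-D

def ripple_carry_cost(width):
    """QC and GO of a `width`-bit adder built from 1 HA and (width - 1) FAs."""
    n_fa = width - 1
    return {key: HALF_ADDER[key] + n_fa * FULL_ADDER[key] for key in ("qc", "go")}

print(ripple_carry_cost(8))
```

The same tally with `width=4` gives the figures quoted for the 4-bit ripple carry adder used inside the Vedic multiplier.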


Figure 7. Reversible Partial Product Generator using Peres Gates

C. 8-bit Accumulator:

The basic element constituting an accumulator or a register is a flip-flop, which can store one bit of information. Reversible master-slave flip-flops are made using Feynman and Fredkin gates. A single reversible master-slave D flip-flop is shown in fig. 10, and an 8-bit reversible accumulator is built by cascading 8 such reversible D flip-flops as shown in fig. 11. The total quantum cost (QC) of the 8-bit reversible accumulator is 96, the number of garbage outputs is 24 and the Depth is 96.

Figure 8. Reversible four-operand Adder

D. Feedback Fanout:

The accumulation operation requires the output of the accumulator to be fed back to the adder; hence 8 fanout gates are required at the output of the accumulator. The quantum cost of the feedback fanout is thus QC=8, with the number of garbage outputs and the Depth both being 0.

Thus, the proposed 4x4 reversible Multiply Accumulate (MAC) unit has a quantum cost of QC=302, number of garbage outputs GO=91 and Depth=164.

Figure 9. Reversible 8-bit Ripple Carry Adder

Figure 10. Reversible Master-Slave D flip-flop using Fredkin and Feynman Gates

Figure 11. 8-bit Reversible Accumulator

IV. REVERSIBLE VEDIC MAC UNIT

In the present world of irreversible circuits, Vedic Sutras have simplified and even optimised the implementation of arithmetic circuits [14]. The reversible Vedic Multiply Accumulate (MAC) unit has a reversible Vedic multiplier. The Adder and Accumulator architectures remain the same as the ones proposed earlier. The Vedic multiplier uses the Vedic Sutra "Urdhva Thiryagbhyam" (vertically and crosswise) [15] to implement the 4x4 multiplication for binary numbers. The challenge lies in making the hardware reversible and reducing the quantum cost, number of garbage outputs and Depth.

Fig. 12 summarizes the steps needed to obtain the final product, aided by the expressions shown for each step.

Figure 12. Line diagram for two 4-bit binary numbers

R0 = B0A0 (1)

C1R1 = B0A1 + B1A0 (2)

C2R2 = C1 + B0A2 + B1A1 + B2A0 (3)

C3R3 = C2 + B0A3 + B1A2 + B2A1 + B3A0 (4)

C4R4 = C3 + B1A3 + B2A2 + B3A1 (5)

C5R5 = C4 + B2A3 + B3A2 (6)

C6R6 = C5 + B3A3 (7)

The final product is C6R6R5R4R3R2R1R0.
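Expressions (1) to (7) can be exercised in software as column sums with carry propagation. The sketch below is a behavioural model, not the reversible circuit; the function name is hypothetical, and the carry C_k is treated as whatever remains of the column sum after the result bit R_k is removed (it can exceed one bit in the middle columns).

```python
def urdhva_multiply_4bit(a, b):
    """Multiply two 4-bit numbers column by column, following (1)-(7)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    result_bits = []
    carry = 0
    for k in range(7):
        # Column k collects every partial product A_i * B_j with i + j == k.
        column = carry + sum(a_bits[i] * b_bits[k - i]
                             for i in range(4) if 0 <= k - i < 4)
        result_bits.append(column & 1)   # R_k
        carry = column >> 1              # C_k
    result_bits.append(carry)            # C6 becomes the most significant bit
    return sum(bit << k for k, bit in enumerate(result_bits))

# Exhaustive check against ordinary multiplication.
print(all(urdhva_multiply_4bit(a, b) == a * b
          for a in range(16) for b in range(16)))
```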

The partial products can be calculated in parallel using the reversible partial product generator of fig. 7, with QC=64, GO=32 and Depth=6. Apart from the reversible partial product generator, the Vedic multiplier consists of the 4x4 Vedic multiplier module shown in fig. 13. It consists of four 2x2 multiplier modules, a 4-bit carry save adder and a 4-bit ripple carry adder as shown, all of which are reversible and are implemented as follows:

A. 2x2 reversible multiplier module:

The 2x2 reversible multiplier module is implemented using two reversible half-adders (HA) as shown in fig. 14. The implementation equations are:

r0 (1 bit) = b0a0 (8)

r1 (1 bit) = b0a1 + b1a0 (9)

r2 (2 bits) = b1a1 + c1 (10)

product (4 bits) = {r2, r1, r0} (11)

where c1 is the carry generated in (9) and {,} represents concatenation.
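Equations (8) to (11) can be checked with a small behavioural model built from two half adders. The helper names are illustrative, and c1 is taken to be the carry of the first half adder, which the equations imply but do not state.

```python
def half_adder(x, y):
    # Returns (sum, carry) of two single bits.
    return x ^ y, x & y

def multiply_2x2(a, b):
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    r0 = b0 & a0                                # (8)
    r1, c1 = half_adder(b0 & a1, b1 & a0)       # (9), c1 is its carry
    r2_low, r2_high = half_adder(b1 & a1, c1)   # (10), a 2-bit result
    # (11): product = {r2, r1, r0}
    return (r2_high << 3) | (r2_low << 2) | (r1 << 1) | r0

print(all(multiply_2x2(a, b) == a * b for a in range(4) for b in range(4)))
```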

A single 2x2 reversible multiplier module has a quantum cost QC of 8, 2 garbage outputs and a depth of 8.

B. 4-bit reversible carry save adder:

The reversible carry save adder shown in fig. 15 is used to add three 4-bit operands, yielding a 4-bit sum and a 4-bit carry (output of the first row). The carry bits are then added to the succeeding sum bits by a reversible 4-bit ripple carry adder (second row), producing the final 6-bit sum (the least significant bit of the result is the least significant bit of the 4-bit carry save sum, which makes the final result 6 bits wide). The quantum cost QC of the carry save adder is 44, the number of garbage outputs GO is 14 and the depth is 23.
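The two-row structure can be modelled behaviourally as follows. This is a sketch under the usual carry-save definition (per-bit XOR for the sum row, majority for the carry row); the function names are hypothetical, and the second row is modelled with ordinary integer addition rather than a gate-level ripple carry adder.

```python
def carry_save_add(x, y, z, width=4):
    """First row: reduce three operands to a sum word and a carry word."""
    sum_bits = 0
    carry_bits = 0
    for i in range(width):
        xi, yi, zi = (x >> i) & 1, (y >> i) & 1, (z >> i) & 1
        sum_bits |= (xi ^ yi ^ zi) << i
        carry_bits |= ((xi & yi) | (yi & zi) | (xi & zi)) << i
    return sum_bits, carry_bits

def three_operand_sum(x, y, z):
    """Second row: each carry bit is added to the next higher sum bit."""
    s, c = carry_save_add(x, y, z)
    return s + (c << 1)   # bit 0 of the result is bit 0 of the carry save sum

print(all(three_operand_sum(x, y, z) == x + y + z
          for x in range(16) for y in range(16) for z in range(16)))
```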

C. 4-bit reversible ripple carry adder:

A 4-bit Ripple Carry Adder is designed using one Half Adder (HA) and 3 Full Adders (FA). The quantum cost QC of the adder is 22, the number of garbage outputs GO is 7 and the Depth is 19.

Figure 13. Reversible Vedic Multiplier

Figure 14. 2x2 Reversible Multiplier Module

Figure 15. 4-bit Reversible Carry Save Adder with Ripple Carry Adder

The partial product generator generates the partial products, which are fed to the four 2x2 multiplier modules. The two least significant bits of the rightmost 2x2 multiplier module are the two least significant bits of the final product, P1P0. The 4-bit carry save adder adds three 4-bit operands: the concatenated 4-bit word {0, 0, two most significant output bits of the rightmost 2x2 multiplier module} and the output bits of the two middle 2x2 multiplier modules. The two least significant bits of the 6-bit sum of the carry save adder represent the middle part of the product, P3P2. The 4-bit output of the leftmost 2x2 multiplier module and the remaining four most significant bits of the 6-bit sum of the carry save adder are fed into the 4-bit ripple carry adder. The resulting sum represents P7P6P5P4.
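The data flow just described can be sketched as a behavioural model. The assignment of operand halves to the two middle modules is an assumption (the text does not specify it, and the result does not depend on it), ordinary multiplication and addition stand in for the 2x2 modules and adders, and the function name is hypothetical.

```python
def vedic_multiply_4x4(a, b):
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2

    right = a_lo * b_lo    # rightmost 2x2 module (4-bit output)
    mid1 = a_hi * b_lo     # middle 2x2 modules (assumed operand pairing)
    mid2 = a_lo * b_hi
    left = a_hi * b_hi     # leftmost 2x2 module

    p_low = right & 0b11                      # P1P0
    csa_sum = (right >> 2) + mid1 + mid2      # three-operand add, 6-bit sum
    p_mid = csa_sum & 0b11                    # P3P2
    p_high = (left + (csa_sum >> 2)) & 0b1111 # 4-bit ripple carry add, P7..P4
    return (p_high << 4) | (p_mid << 2) | p_low

print(all(vedic_multiply_4x4(a, b) == a * b
          for a in range(16) for b in range(16)))
```

The exhaustive check confirms that a 4-bit final adder is sufficient: the upper part of the carry save sum plus the leftmost module output never exceeds 15.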

The total quantum cost of the reversible Vedic multiplier QC is 186, the number of garbage outputs GO is 61 and Depth is 56. The 8-bit Ripple Carry Adder, 8-bit Accumulator and feedback fanout are the same as explained in Section-III B, C and D respectively.

Thus, the 4x4 reversible Vedic Multiply Accumulate (MAC) unit has a quantum cost QC of 336, number of garbage outputs GO=100 and Depth=191.

V. COMPARISON

Table I gives the quantum cost (QC), number of garbage outputs (GO) and depth of different 4x4 multiplier implementations, the 8-bit Ripple Carry Adder and the 8-bit Accumulator. The 4x4 reversible MAC unit can be implemented using a combination of different 4x4 reversible multipliers and 8-bit reversible adders.

Table II compares the quantum cost (QC), number of garbage outputs (GO) and depth of 4x4 reversible MAC units implemented using different 4x4 reversible multipliers. The proposed novel MAC unit of Section III has the lowest depth (164) of all the units compared. Its quantum cost (302) and number of garbage outputs (91) are lower than those of the Vedic MAC unit discussed in Section IV and of the MAC units built on the TSG [16] and MKG [17] multipliers, and equal to those of the MAC unit built on the HNG multiplier [18].

TABLE I. QUANTUM COST (QC), NUMBER OF GARBAGE OUTPUTS (GO) AND DEPTH OF REVERSIBLE MULTIPLIERS, ADDER AND ACCUMULATOR

| Circuit | QC | GO | Depth |
| --- | --- | --- | --- |
| TSG Multiplier [16] | 182 | 58 | 42 |
| MKG Multiplier [17] | 160 | 56 | 41 |
| HNG Multiplier [18] | 152 | 52 | 40 |
| Multiplier in [11] | 152 | 52 | 29 |
| Vedic Multiplier | 186 | 61 | 56 |
| 8-bit Adder | 46 | 15 | 39 |
| 8-bit Accumulator | 96 | 24 | 96 |

TABLE II. QUANTUM COST (QC), NUMBER OF GARBAGE OUTPUTS (GO) AND DEPTH OF DIFFERENT REVERSIBLE MULTIPLY ACCUMULATE (MAC) UNITS

| MAC unit | QC | GO | Depth |
| --- | --- | --- | --- |
| MAC using Vedic Multiplier | 336 | 100 | 191 |
| MAC using TSG Multiplier | 332 | 97 | 177 |
| MAC using MKG Multiplier | 310 | 95 | 176 |
| MAC using HNG Multiplier | 302 | 91 | 175 |
| Proposed Novel MAC | 302 | 91 | 164 |
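The entries of Table II can be reproduced from Table I by adding the multiplier, the 8-bit adder, the 8-bit accumulator and the feedback fanout (QC=8, GO=0, Depth=0) of Section III-D. The sketch below assumes that the three metrics simply add across blocks, which matches the reported totals; the dictionary keys and function name are illustrative.

```python
# (QC, GO, Depth) triples taken from Table I and Section III-D.
MULTIPLIERS = {
    "Vedic": (186, 61, 56),
    "TSG [16]": (182, 58, 42),
    "MKG [17]": (160, 56, 41),
    "HNG [18]": (152, 52, 40),
    "Proposed (multiplier in [11])": (152, 52, 29),
}
ADDER = (46, 15, 39)
ACCUMULATOR = (96, 24, 96)
FANOUT = (8, 0, 0)

def mac_cost(multiplier):
    """Sum each metric over the four blocks of the MAC unit."""
    return tuple(sum(parts) for parts in zip(multiplier, ADDER, ACCUMULATOR, FANOUT))

for name, cost in MULTIPLIERS.items():
    print(name, mac_cost(cost))
```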

VI. CONCLUSIONS

At present, the Multiply Accumulate operation is used extensively in many Digital Signal Processing algorithms for high performance digital processing systems. The trend is believed to continue in Quantum Digital Signal Processing as well. A novel reversible MAC unit was proposed in this paper. Various implementations of the reversible MAC unit were discussed and compared in terms of quantum cost, number of garbage outputs and Depth.

The prospect for further research includes the reversible implementation of complex filters and other complex arithmetic circuits using the proposed reversible Multiply Accumulate Unit.

REFERENCES

[1] R. Landauer, “Irreversibility and heat generation in the computational process”, IBM J. Research and Development, vol. 5, pp. 183–191, Dec. 1961.

[2] C.H. Bennett, “Logical reversibility of computation”, IBM J. Research and Development, vol. 17, pp. 525–532, Nov. 1973.

[3] W. N. Hung, X. Song, G.Yang, J.Yang, and M. Perkowski, “Optimal synthesis of multiple output boolean functions using a set of quantum gates by symbolic reachability analysis”, IEEE Trans. Computer-Aided Design, vol. 25, no. 9, pp. 1652–1663, Sept. 2006.

[4] V. Vedral, A. Barenco, and A. Ekert, “Quantum networks for elementary arithmetic operations”, Phys. Rev. A, vol. 54, no. 1, pp. 147–153, Jul 1996.

[5] P. Kaye, R. Laflamme, and M. Mosca, “An Introduction to Quantum Computing”, Oxford University Press, January 2007.

[6] A. Peres, “Reversible logic and quantum computers”, Physical Review A, Vol 32, pp. 3266-3276, 1985.

[7] Hari, Siva Kumar Sastry, Shroff, Shyam, Mahammad, SK.Noor, Kamakoti V, "Efficient Building Blocks for Reversible Sequential Circuit Design", Proceedings of the 2006, 49th Midwest Symposium on Circuits and Systems.

[8] W. N. N. Hung, X. Song, G. Yang, J. Yang and M. A Perkowski, “Quantum Logic Synthesis by Symbolic Reachability Analysis”, Proc. 41st annual conference on Design automation, pp.838-841, Jan. 2004.

[9] E. Fredkin and T. Toffoli, "Conservative logic", Int. J. Theoretical Physics, Vol. 21, No. 3/4, pp. 219–253, 1982.

[10] D. Maslov, C. Young, D. M. Miller, and G. W. Dueck, “Quantum Circuit Simplification Using Templates”, Proc. Design Automation and Test in Europe (DATE), Vol 2, pp. 1208-1213, March 2005.

[11] Fateme Naderpour, Abbas Vafaei, "Reversible Multipliers: Decreasing the Depth of the Circuit", 5th International Conference on Electrical and Computer Engineering ICECE December 2008, Dhaka, Bangladesh.

[12] J.W Bruce, M.A. Thornton, L Shivakumaraiah, P.S Kokate and X. Li, "Efficient Adder Circuits Based on a Conservative Reversible Logic Gate", Proc. IEEE Computer Society Annual Symposium on VLSI 2002.

[13] Matthew Morrison, Matthew Lewandowski, Richard Meana and Nagarajan Ranganathan, "Design of a Novel Reversible ALU using an Enhanced Carry Look-Ahead Adder", 2011 11th IEEE International Conference on Nanotechnology.

[14] Devika Jaina, Kabiraj Sethi and Rutuparna Panda, "Vedic Mathematics based Multiply Accumulate Unit", Computational Intelligence and Communication Networks (CICN), 2011 International Conference.

[15] Maharaja, J.S.S.B.K.T, "Vedic mathematics", Motilal Banarsidass Publishers Pvt. Ltd, Delhi, 2009.

[16] H. Thapliyal and M.B. Srinivas, “Novel Reversible Multiplier Architecture Using Reversible TSG Gate”, Proc. IEEE International Conf. on Computer Systems and Applications, pp. 100-103, Mar. 2006.

[17] M. Shams, M. Haghparast and K. Navi, “Novel Reversible Multiplier Circuit in Nanotechnology”, World Applied Science Journal Vol. 3, No. 5, pp. 806-810, 2008.

[18] M. Haghparast, S. Jafarali Jassbi, K. Navi and O. Hashemipour, “Design of a Novel Reversible Multiplier Circuit Using HNG Gate in Nanotechnology”, World Applied Science Journal Vol. 3 No. 6, pp. 974-978, 2008.
