Johansson 2005

A Detailed Complexity Model for Multiple Constant Multiplication and an Algorithm to Minimize the Complexity

Kenny Johansson, Oscar Gustafsson, and Lars Wanhammar*

Abstract — Multiple constant multiplication (MCM) has been an active research area for the last decade. Most work so far has only considered the number of additions required to realize a number of constant multiplications with the same input. In this work we consider the number of full and half adder cells required to realize those additions, and a novel complexity measure is proposed. The proposed complexity measure can be utilized for all types of constant operations based on shifts, additions, and subtractions. Based on the proposed complexity measure, a novel MCM algorithm is presented. Simulations show that, compared with previous algorithms, the proposed MCM algorithm has a similar number of additions while the number of full adder cells is significantly reduced.

1 INTRODUCTION

In many implementations of DSP algorithms the multiplier coefficients are constant. This can be utilized to express the multiplications using shifts, additions, and subtractions. Sometimes this is referred to as a multiplierless implementation. For bit-parallel arithmetic the shifts can be hardwired, and, hence, usually only the number of additions and subtractions is taken into account. As the complexities of an adder and a subtractor are similar, we will refer to both as adders.
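As a small illustration of a multiplierless realization, the sketch below multiplies an integer input by the constant 1717 (the coefficient studied in Section 4) using only shifts and four additions/subtractions, following the decomposition 1717 = (1 + 2 + 8)(64 − 1) + 1024. The function name `times_1717` is hypothetical; in hardware the shifts would be hardwired, so only the four adders carry a cost.

```python
def times_1717(x: int) -> int:
    """Multiply x by the constant 1717 using shifts and four adders only."""
    t3 = (x << 1) + x        # adder 1: 3x = 2x + x
    t11 = (x << 3) + t3      # adder 2: 11x = 8x + 3x
    t693 = (t11 << 6) - t11  # adder 3 (a subtractor): 693x = 64*11x - 11x
    return (x << 10) + t693  # adder 4: 1717x = 1024x + 693x


# Exhaustive check over all 16-bit two's-complement inputs.
assert all(times_1717(x) == 1717 * x for x in range(-2**15, 2**15))
```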

For example, in transposed direct form FIR filters one input sample is multiplied by several constant coefficients, as illustrated in Fig. 1 (a). It is then possible to utilize redundancy between the coefficients to reduce the number of adders. This is referred to as the multiple constant multiplication (MCM) problem and has been extensively investigated during the last decade [1]–[7]. Note that transposing an MCM block results in a sum of products, as illustrated by the direct form FIR filter in Fig. 1 (b). This approach has been extended to include the delays inherent in an FIR filter in the redundancy utilization [3] and to matrix multiplications [8].

Most of these algorithms consider the number of adders as the only cost. For some algorithms the maximum number of cascaded adders, the adder depth, has also been considered [5], [6]. This is partly motivated by the power consumption, which in general is lower for a smaller adder depth [5], [9]. However, a more detailed complexity measure can be used.
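The adder depth can be computed directly from the graph of an adder realization. The sketch below, with a hypothetical `adder_depth` helper, assumes each fundamental is given together with the two fundamentals it is computed from (the input node has value 1 and depth 0). The two example graphs are realizations of 1717 whose operand pairs are inferred from the fundamental values listed in Section 4 for Fig. 7 (b) and (g).

```python
def adder_depth(graph: dict[int, tuple[int, int]]) -> int:
    """Maximum number of cascaded adders in a multiplier-block graph.

    graph maps each fundamental to the pair (f_j, f_k) it is computed from;
    the input node, value 1, has depth 0.
    """
    depth = {1: 0}

    def node_depth(f: int) -> int:
        if f not in depth:
            fj, fk = graph[f]
            depth[f] = 1 + max(node_depth(fj), node_depth(fk))
        return depth[f]

    return max(node_depth(f) for f in graph)


# 3 = 2*1 + 1, 11 = 8*1 + 3, 693 = 64*11 - 11, 1717 = 1024*1 + 693
chain_b = {3: (1, 1), 11: (1, 3), 693: (11, 11), 1717: (1, 693)}
# 63 = 64*1 - 1, 1087 = 1024*1 + 63, 315 = 4*63 + 63, 1717 = 2*315 + 1087
chain_g = {63: (1, 1), 1087: (1, 63), 315: (63, 63), 1717: (1087, 315)}
print(adder_depth(chain_b), adder_depth(chain_g))  # 4 3
```

Both graphs use four adders, but the second one has one cascaded stage less.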

In [10] a cost corresponding to the required number of full adders was introduced. This was referred to as adder-bit cost, and an improved approach was used in [9]. Here, we refine this method further and present an MCM algorithm that minimizes our proposed complexity measure. Furthermore, we show how the use of half adders can be avoided if the sign of the coefficient may be changed.

In the next section the proposed complexity measure is described. Then, in Section 3, the proposed algorithm, a modification of the RAG-n algorithm from [2], is presented. In Section 4, some examples of using the proposed cost measure are presented, while in Section 5 the results of the proposed algorithm are given. Finally, in Section 6, some concluding remarks are given.

2 PROPOSED COMPLEXITY MODEL

The idea of the adder-bit cost is to count the number of full adder cells required to realize a wordlevel adder. With the proposed cost function, all nodes are explicitly scaled using safe scaling [11], i.e., there will never be an overflow and the output wordlength is just enough to represent all possible outputs. Quantization is not considered, and, hence, full precision is kept throughout the MCM block.
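The safe-scaling property can be checked numerically. The sketch below assumes a W0-bit two's-complement input and a positive integer constant f, and uses the wordlength rule W0 + ⌈log2 f⌉ that the model adopts for each fundamental. It verifies that every product fits in that many bits and that one bit fewer would not always suffice. The helper names are hypothetical.

```python
def ceil_log2(n: int) -> int:
    """Exact ceil(log2 n) for an integer n >= 1 (avoids floating point)."""
    return (n - 1).bit_length()


def output_wordlength(w0: int, f: int) -> int:
    """Safe-scaled wordlength of f*x for a w0-bit two's-complement x."""
    return w0 + ceil_log2(f)


def fits(value: int, bits: int) -> bool:
    """True if value is representable in `bits`-bit two's complement."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def safe_and_tight(w0: int, f: int) -> bool:
    x_min, x_max = -(1 << (w0 - 1)), (1 << (w0 - 1)) - 1
    extremes = (x_min * f, x_max * f)  # f > 0, so these bound every product
    w = output_wordlength(w0, f)
    return all(fits(p, w) for p in extremes) and not all(
        fits(p, w - 1) for p in extremes)


# Every odd constant up to 2047 and input wordlengths 2..16.
assert all(safe_and_tight(w0, f) for w0 in range(2, 17) for f in range(1, 2049, 2))
```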

We will utilize the directed acyclic graph representation of multiplication introduced in [1] and used in, e.g., [2], [8], [10], and [12]. Here, each node except the input node corresponds to an adder. Each edge corresponds to a shift (multiplication by a power of two) and a possible negation (realized by changing the adder to a subtractor). Intermediate values realized in nodes are called fundamentals. Only odd fundamentals must be considered, as even coefficients can be realized by an appropriate shift of an odd fundamental. In the same way, fractional coefficients can be derived from integer fundamentals by shifting.
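The reduction to odd fundamentals can be sketched as a normalization of each coefficient into a sign, an odd positive integer, and a power-of-two shift. The helper `odd_fundamental` below is hypothetical and assumes the coefficient is a nonzero dyadic rational (an integer, or an integer divided by a power of two), so that only the odd part has to be realized by adders and the rest is a hardwired shift or negation.

```python
from fractions import Fraction


def odd_fundamental(c) -> tuple[int, int, int]:
    """Write c as sign * f * 2**s with f odd and positive; return (sign, f, s)."""
    c = Fraction(c)
    if c == 0:
        raise ValueError("zero needs no fundamental")
    den = c.denominator
    if den & (den - 1):
        raise ValueError(f"{c} is not an integer divided by a power of two")
    sign = -1 if c < 0 else 1
    f = abs(c.numerator)
    s = -(den.bit_length() - 1)  # right shift for fractional coefficients
    while f % 2 == 0:            # strip factors of two from even coefficients
        f //= 2
        s += 1
    return sign, f, s


print(odd_fundamental(6868))             # (1, 1717, 2)
print(odd_fundamental(Fraction(-3, 8)))  # (-1, 3, -3)
```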

* Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, e-mail: [kennyj, oscarg, larsw]@isy.liu.se, tel: +46 13 284059, fax: +46 13 139282

Figure 1: (a) Transposed direct form and (b) direct form FIR filter realization, with input x(n), output y(n), coefficients h0, h1, h2, …, hN−1, hN, and delay elements T.

Figure 2: (a) Utilized graph representation for shift and addition or subtraction and (b) corresponding operation.

As shown in Fig. 2, each fundamental, $f_i$, has the value

$$f_i = e_j f_j + e_k f_k. \tag{1}$$

There are two possibilities associated with the edge values, $e_j$ and $e_k$, which are powers of two as they correspond to shifts. In the first case the value at one of the input nodes is left shifted at least once while the significance of the other value is unchanged (otherwise the result would not be odd). For simplicity, the shift operation is always associated with the input node $f_j$. The second case occurs when the magnitudes of both edge values are less than one. In order to obtain an odd fundamental, the edge values must then be of equal significance. Hence, we have

$$|e_j| > 1,\ |e_k| = 1 \quad \text{(first case)} \qquad \text{or} \qquad |e_j| = |e_k| < 1 \quad \text{(second case)}. \tag{2}$$
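Relations (1) and (2) can be expressed as a small validity check for one adder node, which also rejects the case with both edge values negative that the model excludes. The sketch below is an illustration under stated assumptions: edge values are passed as exact `Fraction`s (so second-case values such as 1/2 are represented exactly), and the function names are hypothetical.

```python
from fractions import Fraction


def is_power_of_two(x: Fraction) -> bool:
    """True if |x| = 2**s for some integer s (s may be negative)."""
    n, d = abs(x.numerator), x.denominator
    return n > 0 and n & (n - 1) == 0 and d & (d - 1) == 0


def edge_case(ej, ek) -> str:
    """Classify (e_j, e_k) as the first or second case of (2)."""
    ej, ek = Fraction(ej), Fraction(ek)
    if not (is_power_of_two(ej) and is_power_of_two(ek)):
        raise ValueError("edge values must be signed powers of two")
    if ej < 0 and ek < 0:
        raise ValueError("both edges negative: apply the sign change of Fig. 3")
    if abs(ej) > 1 and abs(ek) == 1:
        return "first"
    if abs(ej) == abs(ek) < 1:
        return "second"
    raise ValueError("edge values match neither case of (2)")


def fundamental(fj: int, ej, fk: int, ek) -> int:
    """f_i = e_j*f_j + e_k*f_k as in (1); the result must be an odd integer."""
    edge_case(ej, ek)  # validates the edge values first
    fi = Fraction(ej) * fj + Fraction(ek) * fk
    if fi.denominator != 1 or fi.numerator % 2 == 0:
        raise ValueError(f"{fi} is not an odd integer fundamental")
    return fi.numerator


print(fundamental(1, 8, 3, 1))                             # 11 (first case)
print(fundamental(11, 64, 11, -1))                         # 693 (first case, subtractor)
print(fundamental(3, Fraction(1, 2), 11, Fraction(1, 2)))  # 7 (second case)
```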

For simplicity, we eliminate the case when both edge values are negative. A transformation that can be used to avoid this is shown in Fig. 3. Hence, the requirement for the signs of the edge values is stated as

$$\left(\operatorname{sign}\{e_j\}, \operatorname{sign}\{e_k\}\right) \in \{(1, 1), (-1, 1), (1, -1)\}. \tag{3}$$

This leads to the different cases illustrated in Fig. 4. The number of output bits, $W_i$, from the add operation associated with the fundamental $f_i$ is

$$W_i = W_0 + \left\lceil \log_2 f_i \right\rceil, \tag{4}$$

where $W_0$ is the wordlength at the input to the multiplier(s). For simplicity, we assume that the number of shifts on an edge is smaller than the wordlength for the opposite edge. Hence, for the first case, we have

$$\log_2 |e_j| < W_0 + \left\lceil \log_2 f_k \right\rceil = W_k. \tag{5}$$

The required number of full adders, $n_{FA,i}$, to perform the add operation can then be defined as

$$n_{FA,i} = W_i - \log_2 |e_j|. \tag{6}$$

This means that the number of full adders, in the first case, is $W_i$ minus the number of shifts associated with the edge value $e_j$, as illustrated in Fig. 5 (a)–(c). In the second case, on the other hand, the magnitudes of the edge values are less than one, so $\log_2 |e_j| < 0$ and (6) gives more full adders than $W_i$. Although the sum bits corresponding to the fractional bits are known to be zero, the carry-in bit to the full adder corresponding to the least significant output bit, $s_0$, is computed using full adders, as shown in Fig. 6.

The number of overhead full adders is defined as the difference between the number of full adders according to (6) and the input wordlength, $W_0$, i.e.,

$$n_{FA,i}^{OH} = \left\lceil \log_2 f_i \right\rceil - \log_2 |e_j|. \tag{7}$$

If the edge without any shift operation, in the first case, is negative, i.e., $e_k = -1$, overhead half adders are required to compute the least significant output bits, as illustrated in Fig. 5 (c). Hence, we have

$$n_{HA,i}^{OH} = \begin{cases} \log_2 |e_j|, & e_k = -1 \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

The use of overhead half adders can be eliminated by the transformation shown in Fig. 3. A condition for this transformation is usually that it must be possible to compensate for the sign of the coefficient in subsequent steps of the algorithm, which is the case in most applications and certainly in FIR filters.

Note that there are several cases where a complete full adder is not required, as can be seen, e.g., for the LSBs in Fig. 6. Such design improvements have not been considered here; hence, all full adders are assumed to be complete.

If the fundamental is a function of several input signals, as in a matrix multiplication, the output wordlength is

$$W_i = W_0 + \left\lceil \log_2 \sum |f_i| \right\rceil, \tag{9}$$

where in this case there are several $f_i$, each one being the coefficient corresponding to one input.

In a similar way it is possible to derive the output wordlengths in FIR MCM blocks where the delays are taken into the redundancy utilization (sometimes referred to as vertical subexpressions), as in [3]. The output wordlength for each node, $W_i$, is also identical to the number of register bits required if that node is to be delayed. Hence, it is possible to use this model to compare the number of register bits for FIR filter realizations.

Figure 3: Changing the sign of the edge weights.

Figure 4: Alternatives for the (a)–(c) first case and (d)–(e) second case with corresponding graphs.

Figure 5: Adding the least significant bits for the adders in Figs. 4 (a)–(c), respectively, with n = 2.

Figure 6: Adding the least significant bits for the adders in Figs. 4 (d) and (e), respectively, with n = 2.

3 PROPOSED ALGORITHM

The proposed algorithm is based on the RAG-n algorithm from [2]. The input to this algorithm is a set of coefficients, $C$, and the output is a graph, $G$, where each node has a value corresponding to a coefficient or to an extra fundamental that is required to realize the coefficients. Initially the only available node value in $G$ is 1. As stated in (1), all coefficients $f_i$ that can be obtained from any $f_j$ and $f_k$ available in $G$ are removed from $C$ and added to $G$. This procedure is iterated until either $C$ is empty or it is not possible to realize any more required coefficients. In the latter case, one or more extra fundamentals that make it possible to realize one of the coefficients in $C$ are added to $G$. Then all available values are again combined according to (1).
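The iterative part of this procedure, before any extra fundamentals are needed, can be sketched as follows. This is a simplified illustration, not the full RAG-n of [2]. `one_adder_values` enumerates the odd positive values reachable with one adder from two available values (first case: one operand shifted left; second case: equal significance followed by normalization to an odd value). Shifts are limited by a hypothetical `max_shift` parameter, and the heuristic that inserts extra fundamentals is omitted, so coefficients that cannot be realized directly are simply returned.

```python
def odd_part(n: int) -> int:
    """Remove all factors of two from a positive integer."""
    while n % 2 == 0:
        n //= 2
    return n


def one_adder_values(u: int, v: int, max_shift: int) -> set[int]:
    """Odd positive values obtainable from u and v with a single adder."""
    out = set()
    for s in range(max_shift + 1):
        for a, b in ((u << s, v), (u, v << s)):
            for r in (a + b, abs(a - b)):
                if r > 0:
                    out.add(odd_part(r))
    return out


def realize_directly(coeffs, max_shift: int = 12):
    """Repeatedly move coefficients from C to G while one adder suffices."""
    todo = {odd_part(abs(c)) for c in coeffs if c != 0} - {1}
    graph = {1}
    progress = True
    while todo and progress:
        progress = False
        for fi in sorted(todo):
            if any(fi in one_adder_values(fj, fk, max_shift)
                   for fj in graph for fk in graph):
                graph.add(fi)
                todo.discard(fi)
                progress = True
    return graph, todo  # todo is non-empty if extra fundamentals are needed


g, rest = realize_directly([3, 11, 693, 1717])
print(sorted(g), sorted(rest))  # [1, 3, 11, 693, 1717] []
g, rest = realize_directly([1717])
print(sorted(g), sorted(rest))  # [1] [1717]
```

In the second call, 1717 cannot be reached from 1 with one adder, which is exactly the situation where an extra fundamental must be added.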

The proposed algorithm, denoted RFAG-n, only adds one coefficient at a time, namely the one that requires the smallest number of overhead full adders. Furthermore, RFAG-n selects the extra fundamentals leading to the smallest number of overhead full adders instead of those with the smallest fundamental values.
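The selection criterion can be sketched with the cost expressions of Section 2, i.e., the overhead full adders (7) and half adders (8). In the sketch below, an adder is described by the triple (f_i, e_j, e_k); the operands f_j and f_k do not enter the cost and are therefore omitted. The helper names are hypothetical, and only the cost evaluation and the choice of the cheapest candidate are shown, not the complete RFAG-n algorithm. As a check, the four adders of the realization in Fig. 7 (b) reproduce the overhead of 7 full adders and 6 half adders listed in Table 1.

```python
from fractions import Fraction


def ceil_log2(n: int) -> int:
    """Exact ceil(log2 n) for an integer n >= 1."""
    return (n - 1).bit_length()


def log2_abs(e) -> int:
    """Exact log2|e| for a signed power of two e, e.g. 64 -> 6, 1/2 -> -1."""
    e = abs(Fraction(e))
    n, d = e.numerator, e.denominator
    if n == 0 or n & (n - 1) or d & (d - 1):
        raise ValueError(f"{e} is not a power of two")
    return n.bit_length() - 1 if d == 1 else -(d.bit_length() - 1)


def overhead_fa(fi: int, ej) -> int:
    """Overhead full adders, (7): ceil(log2 f_i) - log2|e_j| (both cases)."""
    return ceil_log2(fi) - log2_abs(ej)


def overhead_ha(ej, ek) -> int:
    """Overhead half adders, (8): log2|e_j| if e_k = -1, else 0."""
    return log2_abs(ej) if ek == -1 else 0


def cheapest(candidates):
    """Pick the candidate adder (f_i, e_j, e_k) with the fewest overhead FAs."""
    return min(candidates, key=lambda c: overhead_fa(c[0], c[1]))


# Fig. 7 (b): 3 = 2*1 + 1, 11 = 8*1 + 3, 693 = 64*11 - 11, 1717 = 1024*1 + 693
adders_b = [(3, 2, 1), (11, 8, 1), (693, 64, -1), (1717, 1024, 1)]
print(sum(overhead_fa(fi, ej) for fi, ej, _ in adders_b))  # 7
print(sum(overhead_ha(ej, ek) for _, ej, ek in adders_b))  # 6
print(cheapest([(1717, 2, 1), (1717, 1024, 1)]))           # (1717, 1024, 1)
```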

As a result, RAG-n is more likely to reuse extra fundamentals, owing to its selection of smaller values, and thereby to reduce the number of wordlevel adders, whereas RFAG-n is more likely to reduce the number of overhead full adders.

4 EXAMPLES

4.1 Single Coefficient Case

In [12] a simplified representation of the constant coefficient multipliers introduced in [10] was proposed. It was shown that several of the different cases in [10] could be considered as the same case during the design process. However, when realizing the multiplier, one of the several possible cases must be selected.

Consider the coefficient 1717, computed as

$$1717 = 11 \cdot 63 + 1024 = (1 + 2 + 8)(64 - 1) + 1024. \tag{10}$$

The corresponding simplified (vertex reduced [12]) graph is shown in Fig. 7 (a). Note that this is only one out of several possible graphs and edge weight sets that can be used to realize the coefficient 1717 with four adders.

When realizing this multiplication, one of the six associated fully-specified graphs shown in Figs. 7 (b)–(g) can be used. Note that the realizations in Figs. 7 (d)–(g) correspond to the transposed simplified graph.

The results in terms of overhead full adders are shown in Table 1, where the associated extra fundamental values are also given. If the transformation in Fig. 3 is applied, the use of half adders can be eliminated, resulting in the coefficient −1717 for all realizations. Hence, the cost in half adders will be ignored.

| Realization | Fundamental 1 value | Fundamental 1 overhead | Fundamental 2 value | Fundamental 2 overhead | Fundamental 3 value | Fundamental 3 overhead | Output overhead | Total overhead |
|---|---|---|---|---|---|---|---|---|
| Fig. 7 (b) | 3 | 1 | 11 | 1 | 693 | 4 (6) | 1 | 7 (6) |
| Fig. 7 (c) | 3 | 1 | 11 | 1 | 1013 | 0 (10) | 5 | 7 (10) |
| Fig. 7 (d) | 63 | 0 (6) | 189 | 7 | 693 | 7 | 1 | 15 (6) |
| Fig. 7 (e) | 63 | 0 (6) | 189 | 7 | 1213 | 1 | 8 | 16 (6) |
| Fig. 7 (f) | 63 | 0 (6) | 1087 | 1 | 1213 | 10 | 8 | 19 (6) |
| Fig. 7 (g) | 63 | 0 (6) | 1087 | 1 | 315 | 7 | 10 | 18 (6) |

Table 1. Extra fundamental values and the associated number of overhead full adders required, in addition to the wordlength of the input, $W_0$, for the multiplier graphs in Fig. 7 (b)–(g). The number of overhead half adders is given in parentheses.

Figure 7: (a) Vertex reduced graph for the coefficient 1717. (b)–(g) Fully-specified graphs for the coefficient 1717. The node symbols (●■▲) indicate the relation between nodes in the vertex reduced graph and nodes in the fully-specified graphs.

Assume that the input wordlength, $W_0$, is 16 bits. If the realizations in Figs. 7 (b) or (c) are used, $4 \cdot 16 + 7 = 71$ full adders are required. Using the realization in Fig. 7 (g), which has the lowest adder depth, $4 \cdot 16 + 18 = 82$ full adders are required. Hence, in this case the total number of full adders is decreased by more than 13% by choosing the realization with the lowest complexity.

4.2 Multiple Coefficient Case

For the MCM case we will consider the FIR filter used as an example in [6]. Table 2 shows the results in terms of adders and of overhead full and half adder cells for various algorithms. The total number of full adders is computed using an input wordlength of 16 bits. From this it is clear that the proposed algorithm provides the minimum number of full adder cells for the considered example.

For this particular example the number of adders was also smallest using RFAG-n. However, this is not generally the case, as will be seen in the next section. The main advantage of RFAG-n is instead the reduced number of overhead full adders, while the number of adders is kept small. Here, overhead full adders corresponding to almost two wordlevel adders are saved.

| Algorithm | Adders | Overhead full adders | Overhead half adders | Total full adders |
|---|---|---|---|---|
| RAG-n [2] | 18 | 78 | 20 | 366 |
| BHM [2] | 20 | 99 | 19 | 419 |
| Pasko [4] | 23 | 68 | 68 | 436 |
| C1 [6] | 19 | 70 | 34 | 374 |
| DA-MST [7] | 19 | 88 | 22 | 392 |
| RFAG-n | 17 | 48 | 57 | 320 |

Table 2. Results for the FIR filter in [6].
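The totals in Table 2, as well as the 71 and 82 full adders in Section 4.1, follow from counting one $W_0$-bit row of full adders per adder plus the overhead full adders. A short check with $W_0 = 16$ is sketched below; the table values are copied from Table 2, while the dictionary layout and the function name are only illustrative.

```python
W0 = 16  # input wordlength used in Section 4

# (adders, overhead full adders, total full adders) as listed in Table 2
table2 = {
    "RAG-n": (18, 78, 366),
    "BHM": (20, 99, 419),
    "Pasko": (23, 68, 436),
    "C1": (19, 70, 374),
    "DA-MST": (19, 88, 392),
    "RFAG-n": (17, 48, 320),
}


def total_full_adders(adders: int, overhead_fa: int, w0: int = W0) -> int:
    """Total full adder cells: one w0-bit adder per node plus the overhead."""
    return adders * w0 + overhead_fa


for name, (adders, oh_fa, total) in table2.items():
    assert total_full_adders(adders, oh_fa) == total, name

# Single coefficient 1717 with four adders, Table 1: 7 vs. 18 overhead FAs
best, low_depth = total_full_adders(4, 7), total_full_adders(4, 18)
print(best, low_depth, f"{1 - best / low_depth:.1%}")  # 71 82 13.4%
# Overhead FAs saved by RFAG-n relative to RAG-n, in wordlevel adders
print((78 - 48) / W0)  # 1.875
```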

5 RESULTS

To study the properties of the proposed MCM algorithm in Section 3, simulations with random coefficients have been performed. For each combination of coefficient wordlength and number of coefficients, 100 random coefficient sets have been used. The proposed algorithm is compared with the RAG-n algorithm in [2], which is generally regarded as one of the algorithms giving the best results in terms of adders.

Considering sets of 25 coefficients and a varying coefficient wordlength, the results in Fig. 8 are obtained. From Fig. 8 (a) it is clear that the number of overhead full adders is reduced for our proposed algorithm compared with RAG-n. The overhead in terms of adders is shown in Fig. 8 (b), where a small overhead is obtained for longer coefficient wordlengths. This is because of the different strategies used when extra fundamentals must be added, as discussed in Section 3. However, a careful analysis shows that in some cases RFAG-n actually requires fewer adders than RAG-n. When the overhead full adders are converted into equivalent wordlevel adders, Fig. 8 (c) shows that the savings in overhead full adders more than compensate for the slight overhead in wordlevel adders. The relative savings are shown in Fig. 8 (d).

Varying instead the coefficient set size with a fixed coefficient wordlength of 10 bits gives the results in Fig. 9. Similar conclusions can be drawn from these results.

6 CONCLUSIONS

In this work a detailed complexity model for multiple constant multiplication (MCM) blocks was proposed. The model counts the number of full and half adder cells required to realize an MCM block. A transformation that can be used to eliminate the use of half adders, at no extra cost, was introduced. Based on the proposed model, a novel algorithm for the MCM problem was proposed. It was shown that the proposed algorithm provides significantly improved results in terms of full adder cells compared with previous algorithms. The proposed complexity model can also be utilized for single constant coefficient multipliers, constant matrix multipliers, and FIR filters.

References

[1] D. R. Bull and D. H. Horrocks, “Primitive operator digital filters,” IEE Proc. G, vol. 138, pp. 401–412, June 1991.

[2] A. G. Dempster and M. D. Macleod, “Use of minimum-adder multiplier blocks in FIR digital filters,” IEEE Trans. Circuits Syst.–II, vol. 42, no. 9, pp. 569–577, Sept. 1995.

[3] R. I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Trans. Circuits Syst.–II, vol. 43, pp. 677–688, Oct. 1996.

[4] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, “A new algorithm for elimination of common subexpressions,” IEEE Trans. Computer-Aided Design, vol. 18, no. 1, pp. 58–68, Jan. 1999.

[5] M. Martínez-Peiró, E. Boemo, and L. Wanhammar, “Design of high speed multiplierless filters using a nonrecursive signed common subexpression algorithm,” IEEE Trans. Circuits Syst.–II, vol. 49, no. 3, pp. 196–203, Mar. 2002.

[6] A. G. Dempster, S. S. Demirsoy, and I. Kale, “Designing multiplier blocks with low logic depth,” in Proc. IEEE Int. Symp. Circuits Syst., Phoenix, AZ, May 26–29, 2002, vol. 5, pp. 773–776.

[7] O. Gustafsson, H. Ohlsson, and L. Wanhammar, “Improved multiple constant multiplication using minimum spanning trees,” in Proc. Asilomar Conf. Signals, Syst., Comp., Pacific Grove, CA, Nov. 7–10, 2004, pp. 63–66.

[8] A. G. Dempster, O. Gustafsson, and J. O. Coleman, “Towards an algorithm for matrix multiplier blocks,” in Proc. European Conf. Circuit Theory Design, Kraków, Poland, Sept. 1–4, 2003.

[9] S. S. Demirsoy, A. G. Dempster, and I. Kale, “Power analysis of multiplier blocks,” in Proc. IEEE Int. Symp. Circuits Syst., Phoenix, AZ, May 26–29, 2002, vol. 1, pp. 297–300.

[10] A. G. Dempster and M. D. Macleod, “Constant integer multiplication using minimum adders,” IEE Proc. Circuits Devices Syst., vol. 141, no. 6, pp. 407–413, Oct. 1994.

[11] L. Wanhammar, DSP Integrated Circuits, Academic Press, 1999.

[12] O. Gustafsson, A. G. Dempster, and L. Wanhammar, “Extended results for minimum-adder constant integer multipliers,” in Proc. IEEE Int. Symp. Circuits Syst., Phoenix, AZ, May 26–29, 2002, vol. 1, pp. 73–76.

Figure 8: Comparison results for sets of 25 coefficients, plotted against the number of coefficient bits (6 to 12). (a) Number of overhead full and half adders (FA/HA) for RAG-n and RFAG-n. (b) Savings in adders using RFAG-n over RAG-n. (c) Corresponding savings in wordlevel adders considering both adders and overhead full adders, for $W_0$ = 14 and $W_0$ = 18. (d) Relative savings in full adder cells (%) for RFAG-n over RAG-n, for $W_0$ = 14 and $W_0$ = 18.

Figure 9: Comparison results for 10 bit coefficients, plotted against the number of coefficients (10 to 40). (a) Number of overhead full and half adders (FA/HA) for RAG-n and RFAG-n. (b) Savings in adders using RFAG-n over RAG-n. (c) Corresponding savings in wordlevel adders considering both adders and overhead full adders, for $W_0$ = 14 and $W_0$ = 18. (d) Relative savings in full adder cells (%) for RFAG-n over RAG-n, for $W_0$ = 14 and $W_0$ = 18.
