Design of new $AB^2$ multiplier over $GF(2^m)$ using cellular automata


K.M. Ku, K.J. Ha and K.Y. Yoo

Abstract: $AB^2$ multiplication over $GF(2^m)$ is an essential operation in modular exponentiation, which is the basic computation for most public key cryptosystems. The authors present a new architecture that can perform $AB^2$ multiplication over $GF(2^m)$ in $m$ clock cycles using cellular automata. The proposed cellular automata architecture is also well suited to VLSI implementation because it is simple, regular, modular, and cascadable.

1 Introduction

With the ever-increasing growth in data communications, it is now possible to readily obtain various forms of information and different information services. Unfortunately, potentially dangerous and destructive malfunctions also accompany such convenience and profitability. This has increased the need for information protection, resulting in the development of many types of security technologies and an increased public interest in cryptosystems. Cryptography is an essential requirement for communication privacy or the concealment of data in a data bank.

Finite field $GF(2^m)$ arithmetic is fundamental to the implementation of a number of modern cryptographic systems [1]. Most public key cryptosystems, such as the Diffie-Hellman key exchange and the ElGamal system, are based on a modular exponentiation computation in a finite field [2, 3]. The algorithm for exponentiation over $GF(2^m)$ ($A(x)^E \bmod P(x)$, where $A(x)$ is an element of $GF(2^m)$, $P(x)$ is an irreducible polynomial and $E = [e_{m-1}, e_{m-2}, \ldots, e_1, e_0]$) is divided into the LSB-first method and the MSB-first method according to the method of processing the exponent $E$ [4].

In the LSB-first method, modular multiplication and squaring are the basic computations. The algorithms involved in the implementation of the multiplier include the LSB-first multiplication algorithm [5], the MSB-first multiplication algorithm [6], and the Montgomery algorithm [7]. Related work on these basic computations, based on systolic arrays, is presented in [6, 8]. However, these papers proposed only the multiplier. In [9], a parallel architecture that simultaneously processes the modular multiplication and squaring is proposed for effective exponentiation.

In the MSB-first method, $AB^2$ multiplication is the basic computation. Most previous research related to $AB^2$ multiplication has focused on systolic arrays. In the present study, $AB^2$ multiplication using cellular automata is attempted for the first time. A cellular automaton, particularly a three-neighbour additive cellular automaton, matches the VLSI criteria of simplicity, regularity, modularity, and cascadability very well and has already been applied to many areas, including the encryption and decryption stages of cryptosystems [10, 11]. In [12], $AB^2$ multiplication is performed on a systolic array in $3m$ clock cycles using $m \times m$ cells over $GF(2^m)$, while [13] presents an algorithm that can perform $AB^2$ multiplication in $2m + m/2$ clock cycles using $(m \times m)/2$ cells over $GF(2^m)$.

The goal of the current paper is to investigate and develop a simple, regular, modular, and cascadable architecture for the VLSI implementation of exponentiation on $GF(2^m)$, which is the basic computation for most public key cryptosystems.

Accordingly, a new structure is proposed which facilitates $AB^2$ multiplication for effective exponentiation on $GF(2^m)$ using cellular automata. The current paper presents an architecture that can perform $AB^2$ multiplication in $m$ clock cycles using $m$ cells, $(3m-2)$ AND gates, $m$ 2-input XOR gates, $(m-1)$ 3-input XOR gates, one $(m-1)$-bit register, and two $m$-bit registers by applying cellular automata. Compared with [12] and [13], it needs fewer clock cycles ($m$ against $3m$ and $2m + m/2$) and a number of gates that grows linearly rather than quadratically with $m$. The proposed scheme works by implementing a two-bit left-shift operation every clock cycle through the use of cellular automata in the $AB^2$ multiplication based on the MSB-first method.

2 Cellular automata (CA)

A CA is a finite state machine composed of many cells that are regularly interconnected and operated synchronously at discrete time steps by a common clock [10, 11]. Each cell of the CA is in state ‘0’ or state ‘1’ at any given time. The next state of a cell depends on the present states of its neighbours, where a neighbour is a cell that affects the state of the cell being updated. An example of a two-state three-neighbour one-dimensional CA is shown in Table 1.

Here, the states of the neighbours refer to the eight available states of three neighbours at time $t$. Among the three bits used to indicate the states, the middle bit shows the cell's own state, while the left and right bits show the states of the left and right neighbours, respectively. Rule 60 shows the state of the $i$th cell at time $t+1$, where 60 is the decimal value of the eight next-state bits. As such, in rule 60, the next state of the middle bit is the XOR of the state values of the left neighbour and the cell itself. Therefore, if $q_i(t)$ is the state value of the $i$th cell at time $t$, rule 60 can be expressed as $q_i(t+1) = q_{i-1}(t) \oplus q_i(t)$, where $\oplus$ represents the XOR computation, and $q_{i-1}$ and $q_{i+1}$ represent the left and right neighbours of $q_i$, respectively.

K.M. Ku is with Mobilab. Co. Ltd, Plus B/D 4F 952-3, Dongcheon-dong, Bukgu, Daegu 702-250, Korea

K.J. Ha is with the Department of Information Processing, Kyungsan University, 75 San, JumChon-Dong Kyungsan, KyungPook 712-715, Korea

K.Y. Yoo is with the Department of Computer Engineering, Kyungpook National University, Taegu 702-701, Korea

© IEE, 2004

IEE Proceedings online no. 20040161

doi:10.1049/ip-cds:20040161

Paper first received 1st October 2001 and in revised form 13th May 2003

88 IEE Proc.-Circuits Devices Syst., Vol. 151, No. 2, April 2004

The present state of a CA with $n$ cells can be shown in terms of the $n$-vector $x = (x_0 x_1 \ldots x_{n-1})$, where $x_i$ is the value of cell $i$, and $x_i$ is an element of $GF(2)$. The next state of a CA can be determined by multiplying a characteristic matrix with the present state vector, where the characteristic matrix encodes all the rules of the CA. Assuming that $x_t$ is the state of the CA at time $t$ (regarded as a column vector) and the characteristic matrix is $T$, the state of the CA at time $t+1$ can be expressed as $x_{t+1} = T x_t$, where the arithmetic is performed over $GF(2)$. An example of a characteristic matrix is as follows:

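$$
\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 1 \\
1 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & 1 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
1 & 0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}
$$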

In the above example, an element ‘1’ in the $i$th row and $j$th column of the matrix indicates that the next state of the $i$th cell depends on the state of the $j$th cell.
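The relationship between a local rule and its characteristic matrix can be checked with a short sketch. The code below is illustrative only: the function names are hypothetical, and a periodic boundary (the left neighbour of cell 0 is cell $n-1$) is an assumption that the text does not state for rule 60. It applies rule 60 cell by cell, applies the corresponding matrix over $GF(2)$, and recovers the rule number from the truth table of Table 1.

```python
def rule60_step(cells):
    """Apply rule 60 once: q_i(t+1) = q_{i-1}(t) XOR q_i(t)."""
    n = len(cells)
    # Assumption: periodic boundary, so the left neighbour of cell 0 is cell n-1.
    return [cells[(i - 1) % n] ^ cells[i] for i in range(n)]


def rule60_matrix(n):
    """Characteristic matrix (n >= 2): T[i][j] = 1 iff cell i depends on cell j."""
    T = [[0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 1              # the cell itself
        T[i][(i - 1) % n] = 1    # its left neighbour
    return T


def apply_matrix(T, x):
    """Return T x over GF(2), treating x as a column vector."""
    return [sum(t & v for t, v in zip(row, x)) % 2 for row in T]


def rule_number():
    """Read the rule number off the truth table, neighbourhoods 111 down to 000."""
    bits = []
    for code in range(7, -1, -1):
        left, centre = (code >> 2) & 1, (code >> 1) & 1
        bits.append(left ^ centre)   # rule 60 ignores the right neighbour
    return int("".join(map(str, bits)), 2)


state = [1, 0, 0, 1, 1]
print(rule60_step(state))                              # cell-by-cell update
print(apply_matrix(rule60_matrix(len(state)), state))  # same update via T
print(rule_number())
```

Both update paths give the same next state, which is the sense in which the matrix "shows the entire rules" of an additive CA.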

3 Algorithm for exponentiation on $GF(2^m)$

This Section presents the general algorithms for obtaining $A(x)^E \bmod P(x)$ on $GF(2^m)$ [4]. A polynomial $P(x)$ of arbitrary degree with coefficients from $GF(2)$ is called an irreducible polynomial if $P(x)$ is indivisible by any polynomial over $GF(2)$ of a degree greater than 0 yet less than the degree of $P(x)$ [1]. Let $P(x) = x^m + p_{m-1}x^{m-1} + \cdots + p_1 x + p_0$ be an irreducible polynomial over $GF(2)$. Any element of $GF(2^m)$ can be represented in a standard basis, such as $A = a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \cdots + a_1 x + a_0$, where $a_i \in GF(2)$ for $0 \le i \le m-1$. $\{1, x, x^2, \ldots, x^{m-2}, x^{m-1}\}$ is a standard basis for $GF(2^m)$.

Let us suppose that $A(x)$ and $B(x)$ are elements of $GF(2^m)$. Then the two polynomials $A(x)$ and $B(x)$ are as follows:

$$A(x) = a_{m-1}x^{m-1} + \cdots + a_1 x + a_0 \tag{1}$$

$$B(x) = b_{m-1}x^{m-1} + \cdots + b_1 x + b_0 \tag{2}$$

First, the computation of $A(x)^E \bmod P(x)$ is divided into the LSB-first method and the MSB-first method according to the method of processing the exponent $E = [e_{m-1}, e_{m-2}, \ldots, e_1, e_0]$, where the method of computation is as follows:

LSB-first exponentiation is computed in the order from $e_0$ to $e_{m-1}$:

$$A(x)^E = A(x)^{e_0} (A(x)^2)^{e_1} (A(x)^4)^{e_2} \cdots (A(x)^{2^{m-1}})^{e_{m-1}}$$

MSB-first exponentiation is computed in the order from $e_{m-1}$ to $e_0$:

$$A(x)^E = (\cdots((A(x)^{e_{m-1}})^2 A(x)^{e_{m-2}})^2 \cdots A(x)^{e_1})^2 A(x)^{e_0}$$

The current Section reviews the general algorithm for MSB-first exponentiation [4]; the structure of the proposed multiplier using a CA is presented in the next Section.

Algorithm 1: MSB-first exponentiation algorithm

Input: $A(x)$, $E$, $P(x)$

Output: $B(x) = A(x)^E \bmod P(x)$

Step 1: if $e_{m-1} = 1$ then $B(x) = A(x)$ else $B(x) = 1$

Step 2: for $i = m-2$ down to 0

Step 3: if $e_i = 1$ then $B(x) = A(x)B(x)^2 \bmod P(x)$ else $B(x) = B(x)^2 \bmod P(x)$

The general method for implementing algorithm 1 is to design a computing machine for $AB^2$ multiplication based on step 3 and then use it to implement the exponentiation. To this end, the next Section proposes a structure for the efficient implementation of such $AB^2$ multiplication in $m$ clock cycles using CA.
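Algorithm 1 can be sketched in software as follows. This is a minimal illustration rather than the authors' implementation: polynomials over $GF(2)$ are assumed to be stored as Python integers (bit $k$ is the coefficient of $x^k$, and `p` includes the $x^m$ term), and the helper names `gf2_mulmod` and `msb_first_pow` are hypothetical. Step 3 is realised here with two ordinary modular multiplications instead of a dedicated $AB^2$ unit.

```python
def gf2_mulmod(a, b, p, m):
    """Return a(x) * b(x) mod p(x) over GF(2); a and b must have degree < m."""
    result = 0
    for i in range(m - 1, -1, -1):   # scan the bits of b from MSB to LSB
        result <<= 1                 # multiply the partial result by x
        if (result >> m) & 1:        # degree reached m: subtract (XOR) p(x)
            result ^= p
        if (b >> i) & 1:
            result ^= a
    return result


def msb_first_pow(a, e, p, m):
    """Algorithm 1: A(x)^E mod P(x) for an m-bit exponent, processed MSB first."""
    bits = [(e >> i) & 1 for i in range(m)]       # bits[i] = e_i
    b = a if bits[m - 1] else 1                   # step 1
    for i in range(m - 2, -1, -1):                # step 2
        squared = gf2_mulmod(b, b, p, m)          # B(x)^2 mod P(x)
        b = gf2_mulmod(a, squared, p, m) if bits[i] else squared   # step 3
    return b


# Example in GF(2^4) with P(x) = x^4 + x + 1 (an assumed, illustrative choice).
P, M = 0b10011, 4
print(bin(msb_first_pow(0b0010, 4, P, M)))   # x^4 reduces to x + 1
```

Every iteration has the form $A(x)B(x)^2$ or $1 \cdot B(x)^2$, which is why a single $AB^2$ multiplier is sufficient to drive the whole loop.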

4 $AB^2$ multiplier using cellular automata

This Section presents a structure whereby the computation of $AB^2$ (step 3 of algorithm 1), an essential element for obtaining $A(x)^E \bmod P(x)$ in the MSB-first method on $GF(2^m)$, can be accomplished in $m$ clock cycles using CA.

First, the equations shown below can be derived from (1) and (2) in Section 3. Squaring over $GF(2)$ leaves only the terms $b_i x^{2i}$, because every cross term occurs twice and cancels, and $b_i^2 = b_i$.

$$B(x)^2 \bmod P(x) = (b_{m-1}x^{2m-2} + b_{m-2}x^{2m-4} + \cdots + b_1 x^2 + b_0) \bmod P(x) \tag{3}$$

$$
\begin{aligned}
A(x)B(x)^2 \bmod P(x) &= A(x)(b_{m-1}x^{2m-2} + b_{m-2}x^{2m-4} + \cdots + b_1 x^2 + b_0) \bmod P(x) \\
&= (A(x)b_{m-1}x^{2m-2} + A(x)b_{m-2}x^{2m-4} + \cdots + A(x)b_1 x^2 + A(x)b_0) \bmod P(x) \\
&= [A(x)b_{m-1}x^{2m-4} + A(x)b_{m-2}x^{2m-6} + \cdots + A(x)b_1]\,x^2 \bmod P(x) + A(x)b_0 \\
&= \{\cdots[A(x)b_{m-1}x^2 \bmod P(x) + A(x)b_{m-2}]\,x^2 \bmod P(x) + \cdots + A(x)b_1\}\,x^2 \bmod P(x) + A(x)b_0
\end{aligned}
\tag{4}
$$

A concrete algorithm for implementing (4) is algorithm 2.

Algorithm 2: $AB^2$ multiplication algorithm

Input: $A(x)$, $B(x)$, $P(x)$

Output: $A(x)B(x)^2 \bmod P(x)$

Step 1: $M(x) = 0$

Step 2: for $i = m-1$ down to 0

Step 3: $M(x) = M(x)x^2 \bmod P(x) + A(x)b_i$

The $M(x)x^2 \bmod P(x)$ operation and the $A(x)b_i$ ($m-1 \ge i \ge 0$) operation can be performed simultaneously in step 3 of algorithm 2, where the basic operations for implementing the above are as follows:

Operation 1: $M(x)x^2$

Operation 2: modular reduction

Operation 3: $A(x)b_i$ ($m-1 \ge i \ge 0$)
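The three operations of algorithm 2 can be sketched at the algorithm level as follows. The integer encoding of polynomials (bit $k$ is the coefficient of $x^k$, with `p` including the $x^m$ term) and all function names are assumptions made for this illustration. The helpers `clmul` and `poly_mod` provide an independent schoolbook reference against which the Horner-style loop can be compared.

```python
def ab2_mulmod(a, b, p, m):
    """Algorithm 2: M = M * x^2 mod P + A * b_i for i = m-1 down to 0."""
    acc = 0                                   # step 1: M(x) = 0
    for i in range(m - 1, -1, -1):            # step 2
        for _ in range(2):                    # operations 1 and 2: x^2 as two
            acc <<= 1                         # single shifts, each followed
            if (acc >> m) & 1:                # by a conditional reduction
                acc ^= p
        if (b >> i) & 1:                      # operation 3: A(x) * b_i
            acc ^= a
    return acc


def clmul(a, b):
    """Carry-less (GF(2)) polynomial product, no reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a, p):
    """Remainder of a(x) divided by p(x) over GF(2)."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


# Compare with the schoolbook result over all of GF(2^4), P(x) = x^4 + x + 1
# (an assumed example field).
P, M = 0b10011, 4
ok = all(ab2_mulmod(a, b, P, M) == poly_mod(clmul(a, clmul(b, b)), P)
         for a in range(16) for b in range(16))
print(ok)
```

Splitting the multiplication by $x^2$ into two single-bit shifts mirrors the two modular reductions that the architecture performs in each clock cycle.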

Table 1: Two-state three-neighbour 1-D CA

| State of neighbours | 111 | 110 | 101 | 100 | 011 | 010 | 001 | 000 |
|---|---|---|---|---|---|---|---|---|
| Next state (rule 60) | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |


First, to perform operation 1, which requires a two-bit left-shift to implement $x^2$, a CA with $m$ cells and an initial value of 0 is used. The next state of each cell is defined as the state of its second right neighbour. Here, the leftmost cell and rightmost cell of the CA are adjacent. The $m \times m$ characteristic matrix $T$ describing such a CA can be expressed as follows:

$$
T = \begin{bmatrix}
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0
\end{bmatrix}
$$
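The effect of $T$ can be illustrated with a small sketch. It assumes that cell 0 is the leftmost cell and holds the most significant coefficient $m_{m-1}$, which is an interpretation of the description above rather than something the text states explicitly; the function names are hypothetical.

```python
def shift2_matrix(m):
    """T[i][j] = 1 iff j == (i + 2) mod m: cell i copies its second right neighbour."""
    return [[1 if j == (i + 2) % m else 0 for j in range(m)] for i in range(m)]


def apply_matrix(T, x):
    """Return T x over GF(2), treating x as a column vector."""
    return [sum(t & v for t, v in zip(row, x)) % 2 for row in T]


# Assumed layout: cells listed left to right as m_4 m_3 m_2 m_1 m_0.
state = [1, 0, 1, 1, 0]
next_state = apply_matrix(shift2_matrix(len(state)), state)
print(next_state)   # the two leftmost bits wrap around into the two rightmost cells
```

Under this layout one CA step is a cyclic two-bit left shift, so the two bits shifted out of the top reappear in the two least significant cells, where they serve as the carries for the modular reductions.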

The structure of the CA reflecting the characteristic matrix $T$ is shown in Fig. 1. To perform operation 2, which is the modular reduction, two modular reduction operations are required due to the two-bit left-shift resulting from operation 1. The order of the two modular operations is shown in Fig. 2.

Since the resultant value obtained from the CA has been shifted to the left by two bits as a result of operation 1, two modular reductions are implemented, as shown in Fig. 3, provided that the first carry obtained as a result of the shifting is stored in the second LSB. The modular reduction is performed when the second LSB of the CA result is 1. No operation with $p_0$ is needed because $p_0$ is always 1: when the carry is 0, no modular reduction occurs, and when the carry is 1, the result bit must be $0 \oplus 1\,(= p_0) = 1$. The carry value can therefore be used directly.
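As a worked instance of the two reductions, the following sketch assumes $m = 4$, $P(x) = x^4 + x + 1$ and $M(x) = x^3 + x^2$; these values are chosen purely for illustration and do not come from the paper. Each step multiplies by $x$ and, when the carry out of the top coefficient is 1, adds $P(x)$.

```latex
\begin{aligned}
\text{1st reduction } (m_3 = 1):\quad & M(x)\,x = x^4 + x^3 \equiv x^3 + x + 1 = M'(x) \pmod{P(x)} \\
\text{2nd reduction } (m'_3 = 1):\quad & M'(x)\,x = x^4 + x^2 + x \equiv x^2 + 1 = M''(x) \pmod{P(x)} \\
\text{check:}\quad & M(x)\,x^2 = x^5 + x^4 \equiv (x^2 + x) + (x + 1) = x^2 + 1 \pmod{P(x)}
\end{aligned}
```

In both steps the constant term of the result equals the carry, which is the observation that allows the gate for $p_0$ to be omitted.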

Operation 3 can be easily obtained using $m$ AND gates, since each element of $A(x)$ must be multiplied by the element $b_{m-i-1}$ in the $i$th ($0 \le i \le m-1$) clock. This structure is shown in Fig. 4.

Fig. 1 Cellular automata structure reflecting characteristic matrix $T$

Fig. 2 Order of the two modular operations (1st modular reduction: if $m_{m-1}$ is 1 then $m_{i-1}$ XOR $p_i$ ($0 \le i \le m-1$), where $m_{-1} = 0$; 2nd modular reduction: the same rule applied to the output $m'$ of the first)

Fig. 3 Structure of two modular reduction operations

Fig. 4 Structure of $A(x)b_i$

Next, a method is presented for obtaining $AB^2$ based on algorithm 2 using operations 1, 2, and 3. Operations 2 and 3 are performed simultaneously. The entire structure is shown in Fig. 5. Prior to the first operation, initial values are loaded into the CA reflecting the characteristic matrix $T$ and into the A and P registers. Initial values are as follows:

Initial values of CA: all 0

Initial values of A register: $A(x) = a_{m-1} \ldots a_2 a_1 a_0$

Initial values of P register: $P(x) = p_{m-1} \ldots p_2 p_1$

At the $i$th ($0 \le i \le m-1$) clock, the value obtained after the two modular reductions is XORed with $A(x)b_{m-i-1}$, so the final result is held in the CA after $m$ iterations. As such, it is possible to perform $AB^2$ multiplication in $m$ clock cycles using $(3m-2)$ AND gates, $m$ 2-input XOR gates, $(m-1)$ 3-input XOR gates, one $(m-1)$-bit register, and two $m$-bit registers if the structure shown in Fig. 5 is used.
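The clock-by-clock behaviour described above can be mimicked with a bit-level sketch. It is a behavioural model under stated assumptions, not a gate-accurate description of Fig. 5: each list index $i$ holds the coefficient of $x^i$, the reduction step follows the rule quoted for Fig. 2 ($m'_i = m_{i-1} \oplus (m_{m-1} \wedge p_i)$ with $m_{-1} = 0$), `pbits` includes $p_0 = 1$ even though the P register omits it, and all function names are hypothetical.

```python
def to_bits(value, m):
    """Integer -> list of m coefficients; index i is the coefficient of x^i."""
    return [(value >> i) & 1 for i in range(m)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def reduce_step(mbits, pbits):
    """One shift-and-reduce: m'_i = m_{i-1} XOR (m_{m-1} AND p_i), m_{-1} = 0."""
    m = len(mbits)
    carry = mbits[m - 1]
    return [(mbits[i - 1] if i > 0 else 0) ^ (carry & pbits[i]) for i in range(m)]


def ab2_cycle_sim(abits, bbits, pbits):
    """Run m clock cycles; the returned state is A(x)B(x)^2 mod P(x)."""
    m = len(abits)
    state = [0] * m                                    # initial value of CA: all 0
    for clock in range(m):
        b = bbits[m - clock - 1]                       # b_{m-i-1} enters at clock i
        state = reduce_step(reduce_step(state, pbits), pbits)   # x^2, two reductions
        state = [s ^ (a & b) for s, a in zip(state, abits)]     # XOR with A(x) * b
    return state


# Example in GF(2^4) with P(x) = x^4 + x + 1, so p_3 p_2 p_1 p_0 = 0011 (assumed).
pbits = to_bits(0b0011, 4)
result = ab2_cycle_sim(to_bits(0b1001, 4), to_bits(0b1000, 4), pbits)
print(bin(from_bits(result)))
```

The loop body runs exactly $m$ times, one pass per clock cycle, which matches the execution time claimed for the structure.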

5 Analysis

This Section compares the performance of the proposed $AB^2$ multiplier with that of previous studies. Table 2 shows the comparison of $AB^2$ multipliers. The time and space complexities consider the entire circuit. The proposed structure performs $AB^2$ multiplication in $m$ clock cycles using $(3m-2)$ AND gates, $m$ 2-input XOR gates, $(m-1)$ 3-input XOR gates, one $(m-1)$-bit register, and two $m$-bit registers. The proposed structure is much more efficient in terms of space and time than those of [12] and [13].

We generally consider construction simplicity, defined by the number of transistors needed for the construction, and the time needed for a signal change to propagate through the gates [14]. Table 3 shows the comparison of the area-time product. In this comparison, a 1-bit register is counted as 1 FF (flip-flop) [14]. As shown in Table 3, the proposed structure has $O(m)$ area complexity whereas the others have $O(m^2)$ area complexity. In terms of time, the proposed structure is also more efficient than those of [12] and [13]: in the case of $m = 512$, it is roughly twice as fast. In terms of the area-time product, the complexity is $O(m^2)$, whereas the others are $O(m^3)$.

Fig. 5 Structure of $AB^2$ multiplication using cellular automata

Table 2: Comparison of $AB^2$ multipliers

| Structure | Systolic array (2-dimensional): Wei [12] | Systolic array (2-dimensional): Wang [13] | This paper |
|---|---|---|---|
| Operation | $AB^2$ | $AB^2$ | $AB^2$ |
| No. of AND gates | $3m \times m$ | $3(m \times m)$ | $3m-2$ |
| No. of XOR gates | $3m \times m$ | $3(m \times m)$ | 2-input: $m$; 3-input: $m-1$ |
| No. of latches | $10m \times m$ | $17(m \times m)/2$ | 0 |
| No. of registers | 0 | 0 | $(m-1)$-bit: 1; $m$-bit: 2 |
| Execution time (clock cycles) | $3m$ | $2m + m/2$ | $m$ |

Table 3: Area-time product for $AB^2$ multiplier

| | Wei [12] | Wang [13] | This paper |
|---|---|---|---|
| Area (A) | $3m^2 A_{2AND} + 3m^2 A_{2XOR} + 10m^2 A_{1LATCH} = 140m^2 F$ | $3m^2 A_{2AND} + 3m^2 A_{2XOR} + 8.5m^2 A_{1LATCH} = 128m^2 F$ | $(3m-2)A_{2AND} + (3m-2)A_{2XOR} + (3m-1)A_{1FF} = (114m-58)F$ |
| Time (T) | $3m(T_{2AND} + 2T_{2XOR}) = 32.4mD$ | $(2m + m/2)(T_{2AND} + 3T_{2XOR}) = 37.5mD$ | $m(T_{2AND} + 3T_{2XOR} + T_{1FF}) = 18.8mD$ |
| AT product | $4536m^3 FD$ | $4800m^3 FD$ | $(2143.2m^2 - 1090.4m)FD$ |
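The figures in Table 3 can be evaluated for a concrete field size with a short sketch. It uses only the closed-form totals printed in the table (area in units of $F$, time in units of $D$); the function name `area_time` and the dictionary layout are assumptions of this illustration.

```python
def area_time(m):
    """Return {design: (area, time, area * time)} from the totals in Table 3."""
    designs = {
        "Wei [12]": (140 * m**2, 32.4 * m),
        "Wang [13]": (128 * m**2, 37.5 * m),
        "This paper": (114 * m - 58, 18.8 * m),
    }
    return {name: (area, time, area * time) for name, (area, time) in designs.items()}


table = area_time(512)
ours_area, ours_time, ours_at = table["This paper"]
for name, (area, time, at) in table.items():
    # Ratios are relative to the proposed structure.
    print(f"{name}: time x{time / ours_time:.2f}, area x{area / ours_area:.1f}, "
          f"AT x{at / ours_at:.1f}")
```

The time ratio does not depend on $m$, since all three execution times are linear in $m$, whereas the area and area-time ratios grow linearly with $m$.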

6 Conclusion

This paper presented a new structure in which $AB^2$ multiplication for effective exponentiation on $GF(2^m)$ can be performed using cellular automata. The proposed structure performs $AB^2$ multiplication in $m$ clock cycles using $(3m-2)$ AND gates, $m$ 2-input XOR gates, $(m-1)$ 3-input XOR gates, one $(m-1)$-bit register, and two $m$-bit registers. This is accomplished by implementing a two-bit left-shift operation every clock cycle using cellular automata in the $AB^2$ multiplication based on the MSB-first method.

The proposed $AB^2$ multiplier can be used for more efficient implementation of exponentiation, division, and inversion.

7 Acknowledgments

The authors thank the anonymous reviewers for their comments and suggestions. This research was supported by University IT Research Center Project.

8 References

1 McEliece, R.J.: ‘Finite fields for computer scientists and engineers’ (Kluwer Academic, New York, USA, 1987)

2 Diffie, W., and Hellman, M.E.: ‘New directions in cryptography’, IEEE Trans. Inf. Theory, 1976, 22, pp. 644–654

3 ElGamal, T.: ‘A public key cryptosystem and a signature scheme based on discrete logarithms’, IEEE Trans. Inf. Theory, 1985, 31, (4), pp. 469–472

4 Knuth, D.E.: ‘The art of computer programming: seminumerical algorithms’ (Addison Wesley, 1998, 2nd edn.)

5 Yeh, C.S., Reed, I.S., and Truong, T.K.: ‘Systolic multipliers for finite fields $GF(2^m)$’, IEEE Trans. Comput., 1984, C-33, (4), pp. 357–360

6 Wang, C.L., and Lin, J.L.: ‘Systolic array implementation of multipliers for finite fields $GF(2^m)$’, IEEE Trans. Circuits Syst., 1991, 38, (7), pp. 796–800

7 Montgomery, P.L.: ‘Modular multiplication without trial division’, Math. Comput., 1985, 44, (170), pp. 519–521

8 Yeh, C.S., Reed, I.S., and Truong, T.K.: ‘Systolic multipliers for finite fields $GF(2^m)$’, IEEE Trans. Comput., 1984, C-33, (4), pp. 357–360

9 Ku, K.M., Ha, K.J., Kim, H.S., and Yoo, K.Y.: ‘New parallel architecture for modular multiplication and squaring based on cellular automata’, Lect. Notes Comput. Sci., 2002, 2367, pp. 359–369

10 Delorme, M., and Mazoyer, J.: ‘Cellular automata’ (Kluwer Academic Publishers, 1999)

11 Wolfram, S.: ‘Cellular automata and complexity’ (Addison-Wesley Publishing Company, 1994)

12 Wei, S.W.: ‘A systolic power-sum circuit for $GF(2^m)$’, IEEE Trans. Comput., 1994, 43, (2), pp. 226–229

13 Wang, C.L., and Guo, J.H.: ‘New systolic arrays for $C+AB^2$, inversion, and division in $GF(2^m)$’, IEEE Trans. Comput., 2000, 49, (10), pp. 1120–1125

14 Gajski, D.D.: ‘Principles of digital design’ (Prentice-Hall International, Inc., 1997)

