10-3-8

8/12/2019 10-3-8


Finite Field Polynomial Multiplier with

Linear Feedback Shift Register

Che-Wun Chiou^1*, Chiou-Yng Lee^2 and Jim-Min Lin^3

^1 Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li, Taiwan 320, R.O.C.

^2 Department of Computer Information and Network Engineering, Lung Hwa University of Science & Technology, Taoyuan, Taiwan 333, R.O.C.

^3 Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan 407, R.O.C.

Abstract

We present a one-dimensional polynomial basis array multiplier for performing multiplications in the finite field $GF(2^m)$. A linear feedback shift register is employed in the proposed multiplier to reduce space complexity. Compared with existing two-dimensional polynomial basis multipliers, the proposed linear array multiplier reduces the space complexity from $O(m^2)$ to $O(m)$. A new two-dimensional systolic array version of the proposed array multiplier is also included in this paper. The proposed two-dimensional systolic array multiplier saves about 30% of space complexity and 27% of time complexity compared with other two-dimensional systolic array multipliers.

Key Words: Finite Field, Multiplication, Polynomial Basis, Systolic Array, Cryptography

1. Introduction

Arithmetic operations in a finite field play an increasingly important role in error-correcting codes [1], cryptography [2], digital signal processing [3,4], and pseudorandom number generation [5]. The two principal arithmetic operations over finite fields are addition and multiplication. Addition is simple, whereas multiplication requires more computational time and higher circuit complexity. Many other complex arithmetic operations, such as exponentiation, division, and multiplicative inversion, can be performed by applying multiplication repeatedly. Hence, it is important in a practical sense to develop fast multiplication algorithms for these complex arithmetic operations. In recent years, the realization of multiplication in finite fields has received wide attention, and several approaches have been presented [6-36]. The complexity of implementing multiplication depends on the representation of the field elements. There are three main types of bases over $GF(2^m)$ fields, namely, normal basis (NB), dual basis (DB), and polynomial basis (PB). The major advantage of NB multipliers [6-8] is that the squaring of an element can be computed simply by a cyclic shift of its binary representation. Thus, normal basis multipliers can be applied very effectively to inversion, squaring, and exponentiation. DB multipliers [9-13] require less chip area than the other two types. However, the former two types need basis conversion, while the PB type does not [36]. The polynomial basis representation has been widely used and has led to many efficient multiplier implementations. Compared with multipliers in the other two bases, polynomial basis multipliers have lower design complexity, and their sizes can be easily extended to meet various applications because of their simple, regular, and modular architecture.

Tamkang Journal of Science and Engineering, Vol. 10, No. 3, pp. 253-264 (2007)

*Corresponding author. E-mail: [email protected]


Numerous architectures for PB multipliers have been presented [14-35]. The first parallel PB multiplier was suggested by Bartee and Schneider [14]. PB multiplication in $GF(2^m)$ is often accomplished in two steps: polynomial multiplication and modular reduction. In practice, both steps are usually combined for performance reasons. Mastrovito [15,16] first proposed an architecture for performing the combined operation. Recently, several bit-parallel PB multipliers have been proposed for VLSI implementation by using specific classes of polynomials, such as trinomials [17-23], all-one polynomials (AOP) and equally spaced polynomials (ESP) [24-26], and composite fields [27,28]. Yet these architectures still have shortcomings for cryptographic applications because of their high circuit complexity and long latency. As the size of the finite field grows, the design of modular multipliers requires much more attention. To alleviate the long latency problem, most existing PB multipliers employ XOR trees to minimize time complexity. Unfortunately, these circuits are not well suited to VLSI systems because of the irregular and non-modular structure of XOR trees. To overcome this problem, Lee [22] proposed a regular and modular PB multiplier using irreducible trinomials with a space complexity of $O(m^2)$ and a time complexity of $O(m)$. This multiplier can be easily extended and implemented using VLSI technologies.
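The two-step view of PB multiplication (polynomial multiplication followed by modular reduction) can be illustrated with a minimal Python sketch. It is not one of the cited architectures. It assumes that a field element and the modulus are stored as integers whose bit i holds the coefficient of x^i, and the function names `poly_mul`, `poly_reduce`, and `gf_mul` are hypothetical.

```python
def poly_mul(a: int, b: int) -> int:
    """Step 1: carry-less product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1
            result ^= a    # add (XOR) the shifted copy of a
        a <<= 1            # multiply a by x
        b >>= 1
    return result


def poly_reduce(v: int, modulus: int, m: int) -> int:
    """Step 2: reduce v modulo a degree-m polynomial (modulus includes x^m)."""
    for degree in range(v.bit_length() - 1, m - 1, -1):
        if (v >> degree) & 1:
            # cancel the term x^degree using x^(degree-m) * modulus
            v ^= modulus << (degree - m)
    return v


def gf_mul(a: int, b: int, modulus: int, m: int) -> int:
    """Multiply two elements of GF(2^m) in the polynomial basis."""
    return poly_reduce(poly_mul(a, b), modulus, m)
```

For instance, with the modulus 1 + x + x^3 + x^4 + x^8 (the integer 0x11B), `gf_mul(0x57, 0x83, 0x11B, 8)` evaluates to 0xC1. Combined architectures interleave the two steps instead of forming the full double-length product first.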

In this article, we present a linear parallel-in parallel-out PB array multiplier that uses general irreducible polynomials and a linear feedback shift register. The proposed PB multiplier requires a space complexity of $O(m)$. To demonstrate that the proposed multiplier is superior to existing two-dimensional systolic array multipliers, a new two-dimensional systolic array version of the multiplier is also presented. We show that the proposed two-dimensional systolic array multiplier saves both space and time complexity compared with other existing two-dimensional systolic array multipliers.

The organization of this paper is as follows. In Section 2, we provide some basic definitions and preliminaries. In Section 3, we derive the one-dimensional parallel-in parallel-out PB multiplication algorithm using general irreducible polynomials and a linear feedback shift register. The two-dimensional systolic array version of the proposed algorithm is then described in Section 4. The space and time complexities are discussed in Section 5. Finally, a brief conclusion is given in Section 6.

2. Preliminaries

It is assumed that the reader is familiar with the basic concepts of finite fields. Their properties are covered in detail in [1,2] and are reviewed only briefly, as required, in the following paragraphs.

The finite field $GF(2^m)$ can be viewed as a vector space of dimension $m$ over $GF(2)$. Suppose that $GF(2^m)$ is generated by the irreducible polynomial $P(x) = p_0 + p_1x + \cdots + p_{m-1}x^{m-1} + x^m$ of degree $m$ over $GF(2)$, where $p_0 = 1$. Then any element $A$ in the Galois field $GF(2^m)$ can be represented as $A(x) = a_0 + a_1x + a_2x^2 + \cdots + a_{m-1}x^{m-1}$, where $x$ is an indeterminate over $GF(2)$. The basis $\{1, x, x^2, \ldots, x^{m-1}\}$ is known as the standard basis and is often referred to as the polynomial basis, conventional basis, or canonical basis. Since $P(x) = 0$, the identity $x^m = p_0 + p_1x + \cdots + p_{m-1}x^{m-1}$ can be used to reduce a high-order term $x^p$, $p \ge m$, to a polynomial of degree less than $m$. Thus, $xB(x) \bmod P(x)$ can be reduced by

$$xB(x) \bmod P(x) = (b_0x + b_1x^2 + \cdots + b_{m-1}x^m) \bmod P(x) = b_{m-1}p_0 + (b_{m-1}p_1 + b_0)x + \cdots + (b_{m-1}p_{m-1} + b_{m-2})x^{m-1}.$$

Let

$$B(x)^{(1)} = xB(x) \bmod P(x). \tag{1}$$

Therefore, $x^iB(x) \bmod P(x)$ can be obtained by the following formula:

$$B(x)^{(i)} = xB(x)^{(i-1)} \bmod P(x). \tag{2}$$

Note that $B(x)^{(0)} = B(x)$.

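Let the PB representation of $B(x)^{(i)}$ be

$$B(x)^{(i)} = b_{i,0} + b_{i,1}x + b_{i,2}x^2 + b_{i,3}x^3 + \cdots + b_{i,m-1}x^{m-1}, \quad \text{where } b_{i,j} \in \{0,1\} \text{ for } 0 \le j \le m-1.$$

According to $x^m = p_0 + p_1x + p_2x^2 + \cdots + p_{m-1}x^{m-1}$, the relation between $B(x)^{(i+1)}$ and $B(x)^{(i)}$ is as follows:

$$b_{i+1,0} = b_{i,m-1}p_0, \qquad b_{i+1,j} = b_{i,j-1} + b_{i,m-1}p_j \ \text{ for } 1 \le j \le m-1.$$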
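The reduction step $B(x)^{(i)} = xB(x)^{(i-1)} \bmod P(x)$ is one clock of a linear feedback shift register, and a minimal Python sketch may make this concrete. It assumes coefficient lists ordered from $b_0$ to $b_{m-1}$, with `p` holding $p_0, \ldots, p_{m-1}$ and the leading term $x^m$ implicit. The function names are hypothetical, and the example polynomial $1 + x + x^3 + x^4 + x^8$ is the one used in Figure 1.

```python
def mul_x_mod(b, p):
    """Return the coefficients of x*B(x) mod P(x) over GF(2).

    b: [b_0, ..., b_{m-1}], p: [p_0, ..., p_{m-1}] (x^m term implicit).
    """
    m = len(b)
    feedback = b[m - 1]                 # coefficient that overflows into x^m
    shifted = [feedback & p[0]]         # new coefficient of x^0
    for j in range(1, m):
        # new b_j = old b_{j-1} + old b_{m-1} * p_j
        shifted.append(b[j - 1] ^ (feedback & p[j]))
    return shifted


def nth_shift(b, p, i):
    """Return the coefficients of x^i * B(x) mod P(x) by repeating the step."""
    for _ in range(i):
        b = mul_x_mod(b, p)
    return b


# P(x) = 1 + x + x^3 + x^4 + x^8
P = [1, 1, 0, 1, 1, 0, 0, 0]
```

Calling `nth_shift([1, 0, 0, 0, 0, 0, 0, 0], P, 8)` returns `[1, 1, 0, 1, 1, 0, 0, 0]`, which is the statement $x^8 = 1 + x + x^3 + x^4$ for this polynomial.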


The shift registers S and E are initially loaded with $B(X)$ and $A(X)$ as follows:

$S_j^{(0)} = b_j$ and $E_j^{(0)} = a_j$ for $0 \le j \le 7$.

The detailed circuit of each cell $U_j$ is shown in Figure 2. The cell $U_j$ realizes the following function:

$v_{out} = (h_{in} \wedge v1_{in}) \oplus v2_{in}$, and $h_{out} = h_{in}$.

The output $v_{out}$ is computed and the result is then latched in the 1-bit latch at each clock. All 1-bit latches in the U cells are initially reset to 0. The symbol L in Figure 2 represents a 1-bit latch. The detailed circuits for cells $S_j$ and $E_j$ can be found in Figure 3 and Figure 4, respectively. The shift registers S and E can be loaded in parallel.

The procedure for computing $C(X) = A(X)B(X)$ in Figure 1 is described in the Appendix.
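The behaviour of the linear array can be approximated by a short register-level simulation in Python. This is a sketch under stated assumptions, not the circuit of Figure 1 itself: it assumes that register E shifts one position toward $E_0$ on every clock (so that $E_0$ presents $a_k$ at clock cycle $k$, as the Appendix suggests), that register S advances as a linear feedback shift register on every clock, and that each cell $U_j$ accumulates $E_0 S_j$ into its latch. All function names are hypothetical.

```python
def linear_array_multiply(a, b, p):
    """Simulate m clock cycles of the one-dimensional array.

    a, b: coefficient lists [x^0 .. x^(m-1)] of A(X) and B(X).
    p:    [p_0 .. p_{m-1}] of P(X); the X^m term is implicit.
    Returns the latch contents U_0 .. U_{m-1}, i.e. the coefficients of C(X).
    """
    m = len(a)
    S = list(b)        # linear feedback shift register, loaded with B(X)
    E = list(a)        # shift register, loaded with A(X); E[0] drives all cells
    U = [0] * m        # 1-bit latches of the U cells, reset to 0
    for _ in range(m):
        h = E[0]
        # every cell computes v_out = (h_in AND S_j) XOR latch
        U = [(h & S[j]) ^ U[j] for j in range(m)]
        # S <- X * S mod P(X)
        feedback = S[m - 1]
        S = [feedback & p[0]] + [S[j - 1] ^ (feedback & p[j]) for j in range(1, m)]
        # E shifts so that the next coefficient of A(X) reaches E[0]
        E = E[1:] + [0]
    return U


def to_bits(value, m):
    """Integer to coefficient list, least significant bit first."""
    return [(value >> i) & 1 for i in range(m)]


def from_bits(bits):
    """Coefficient list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

With $P(X) = 1 + X + X^3 + X^4 + X^8$, i.e. `p = to_bits(0x1B, 8)`, multiplying 0x57 by 0x83 yields 0xC1. Only three m-bit registers are kept, which is the $O(m)$ storage the paper attributes to the linear array.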


Figure 1. Hardware implementation of $C(X) = A(X)B(X) \bmod (1 + X + X^3 + X^4 + X^8)$.

Figure 2. The detailed circuit of the cell $U_j$.

Figure 3. The detailed circuit of the cell $S_j$.


4. Implementation with Semi-Systolic Two-Dimensional Array

A semi-systolic two-dimensional array implementation of the proposed array multiplier structure is discussed in this section. As mentioned above, the results in Eqs. (2) and (4) are rewritten as follows:

$$C(X) = a_0B(X)^{(0)} + a_1B(X)^{(1)} + a_2B(X)^{(2)} + a_3B(X)^{(3)} + \cdots + a_{m-1}B(X)^{(m-1)},$$

and $B(X)^{(i)}$ for $0 \le i \le m-1$ is represented by

$$B(X)^{(i)} = b_{i,0} + b_{i,1}X + b_{i,2}X^2 + b_{i,3}X^3 + \cdots + b_{i,m-1}X^{m-1}, \quad \text{where } b_{i,j} \in \{0,1\} \text{ for } 0 \le j \le m-1.$$

The initial value of $B(X)^{(i)}$ is assigned as follows:

$$B(X)^{(0)} = B(X),$$

and the relation between the coefficients of $B(X)^{(i+1)}$ and $B(X)^{(i)}$ is as follows:

$$b_{i+1,0} = b_{i,m-1}p_0, \qquad b_{i+1,j} = b_{i,j-1} + b_{i,m-1}p_j \ \text{ for } 1 \le j \le m-1. \tag{5}$$

Each coefficient of the product $C(X) = \sum_{j=0}^{m-1} c_jX^j$ is computed as follows:

$$c_j = \sum_{i=0}^{m-1} a_ib_{i,j} \ \text{ for } 0 \le j \le m-1,$$

or

$$\begin{aligned}
c_j &= a_0b_{0,j} + a_1b_{1,j} + a_2b_{2,j} + a_3b_{3,j} + a_4b_{4,j} + \cdots + a_{m-1}b_{m-1,j} \\
&= a_0b_{0,j} + a_1(b_{0,j-1} + b_{0,m-1}p_j) + a_2(b_{1,j-1} + b_{1,m-1}p_j) + a_3(b_{2,j-1} + b_{2,m-1}p_j) + \cdots + a_{m-1}(b_{m-2,j-1} + b_{m-2,m-1}p_j) \\
&= a_0b_j + (a_1b_{j-1} + a_1b_{0,m-1}p_j) + (a_2b_{1,j-1} + a_2b_{1,m-1}p_j) + (a_3b_{2,j-1} + a_3b_{2,m-1}p_j) + \cdots + (a_{m-1}b_{m-2,j-1} + a_{m-1}b_{m-2,m-1}p_j).
\end{aligned} \tag{6}$$

The following algorithm can be used for computing the coefficient $c_j$ based on Eq. (6).

Algorithm A: (Using traditional method)

$c_j := 0$;

$b_{-1,j-1} := b_j$;

$b_{-1,m-1} := 0$;

For $i = 0$ to $m-1$

Begin

    $c_j := c_j + a_ib_{i-1,j-1}$;

    $c_j := c_j + a_ib_{i-1,m-1}p_j$;

End

If Algorithm A is realized in hardware, a propagation delay of one AND gate delay and two XOR gate delays is needed. To shorten this propagation delay, a parallel version of Algorithm A, called Algorithm B, is given as follows.

Algorithm B: (Using parallel method)

$z_{-1} := b_j$;

$b_{-1,m-1} := b_{m-1}$;

For $i = 0$ to $m-1$

Begin

    Cobegin

        $z_i := z_{i-1} + b_{i-1,m-1}p_j$;

        $c_j := c_j + a_iz_{i-1}$;

    Coend

End

Figure 4. The detailed circuit of the cell $E_j$.


Based on Eqs. (5) and (6) and Algorithm B, the semi-systolic two-dimensional array for realizing the product $C(X) = A(X)B(X)$ is shown in Figure 5. The circuit for the processing element $V_{i,j}$ is shown in Figure 6.
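The data flow behind Eqs. (5) and (6) can be sketched in Python by materializing the whole $m \times m$ table of coefficients $b_{i,j}$, one row per $B(X)^{(i)}$, and then summing one column per product coefficient. This is an illustration of the equations only. It does not model the timing or the cell wiring of Figure 5, and the function names are hypothetical.

```python
def coefficient_table(b, p):
    """Rows i = 0..m-1 hold the coefficients b_{i,j} of B(X)^(i), built by Eq. (5)."""
    m = len(b)
    rows = [list(b)]                      # B(X)^(0) = B(X)
    for _ in range(m - 1):
        prev = rows[-1]
        msb = prev[m - 1]                 # b_{i,m-1}
        nxt = [msb & p[0]]                # b_{i+1,0} = b_{i,m-1} p_0
        for j in range(1, m):
            nxt.append(prev[j - 1] ^ (msb & p[j]))   # b_{i+1,j}
        rows.append(nxt)
    return rows


def product_coefficient(a, rows, j):
    """c_j = sum over i of a_i * b_{i,j} in GF(2), as in Eq. (6)."""
    c_j = 0
    for i in range(len(a)):
        c_j ^= a[i] & rows[i][j]
    return c_j


def array_multiply(a, b, p):
    """All coefficients c_0 .. c_{m-1} of C(X) = A(X)B(X) mod P(X)."""
    rows = coefficient_table(b, p)
    return [product_coefficient(a, rows, j) for j in range(len(a))]
```

The table has $m^2$ entries, which mirrors the $O(m^2)$ cell count of a two-dimensional array, whereas the linear array recomputes one row per clock and keeps only that row.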

5. Complexity

In CMOS VLSI technology, a 2-input AND gate, a 2-input XOR gate, and a 1-bit latch are composed of 6, 6, and 8 transistors, respectively [37]. Suppose that a 3-input XOR gate and a 4-input XOR gate are constructed from two 2-input XOR gates and three 2-input XOR gates, respectively. Thus, the propagation delays through a 3-input XOR gate and a 4-input XOR gate are the same (two 2-input XOR gate delays, provided that the 4-input gate is arranged as a balanced tree). A comparison of the space and area-time complexities of various PB bit-parallel multipliers is given in Table 1.
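The transistor counts in Table 1 follow from the per-gate figures above by simple arithmetic. The following Python sketch reproduces the counts for the two proposed designs from the gate counts listed in Table 1. The function names are hypothetical, and `k` denotes the number of terms of $P(X)$.

```python
# Transistors per component, as stated in the text (from [37]).
TRANSISTORS_AND2 = 6
TRANSISTORS_XOR2 = 6
TRANSISTORS_LATCH = 8


def transistor_count(and2, xor2, latches):
    """Total transistors for a design with the given component counts."""
    return (TRANSISTORS_AND2 * and2
            + TRANSISTORS_XOR2 * xor2
            + TRANSISTORS_LATCH * latches)


def fig1_transistors(m, k):
    """Linear array (Fig. 1): 7m AND2, m + k XOR2, 3m latches -> 72m + 6k."""
    return transistor_count(7 * m, m + k, 3 * m)


def fig5_transistors(m):
    """Two-dimensional array (Fig. 5): 2m^2 AND2, 2m^2 XOR2, 3m^2 latches -> 48m^2."""
    return transistor_count(2 * m * m, 2 * m * m, 3 * m * m)
```

For $m = 8$ and a five-term polynomial, the linear array needs $72 \cdot 8 + 6 \cdot 5 = 606$ transistors, against $48 \cdot 64 = 3072$ for the two-dimensional version.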

Suppose that the generating polynomial $P(X)$ has $k$ terms. Most existing PB multipliers using XOR binary trees require a space complexity of $O(m^2)$ and a time complexity of $O(\log_2 m)$ [24,25]. However, such multipliers are not regular and, because of their tree structures, are not suitable for VLSI implementation. To overcome this problem, many systolic array structures, which are regular and modular and therefore well suited to VLSI implementation, have been presented. However, most existing systolic array multipliers need a space complexity of $O(m^2)$. Our proposed linear systolic array multiplier in Figure 1, using an irreducible polynomial, requires a space complexity of only $O(m)$. Nevertheless, two-dimensional systolic array multipliers are useful when many successive multiplications are to be performed, as in exponentiation. Thus, a two-dimensional systolic array version of the proposed multiplier is shown in Figure 5. Compared with the multiplier proposed by Wei [33], the proposed two-dimensional semi-systolic array multiplier in Figure 5 saves about 30% of space complexity.

Figure 5. The proposed semi-systolic two-dimensional systolic array over $GF(2^m)$.

Figure 6. The detailed circuit for the cell $V_{i,j}$.

Comparisons of the time complexities of various PB bit-parallel multipliers are given in Table 2. Let $T_A$, $T_X$, $T_L$, and $T_{3X}$ represent the gate delays of a 2-input AND gate, a 2-input XOR gate, a 1-bit latch, and a 3-input XOR gate, respectively. We assume that real circuits such as the M74HC86 (STMicroelectronics, XOR gate, $t_{PD}$ = 12 ns (TYP.)) [38], the M74HC08 (STMicroelectronics, AND gate, $t_{PD}$ = 7 ns (TYP.)) [39], and the M74HC279 (STMicroelectronics, latch, $t_{PD}$ = 13 ns (TYP.)) [40] are employed. The proposed multiplication architectures in Figure 1 and Figure 5 save about 27% of time complexity compared with the multiplier in [33]. Although the developed multiplier in Figure 5 increases the space complexity compared with Lee's multiplier [22] for all trinomials, it saves about 33% of area-time complexity ($1536m^3$ versus $2304m^3$ in Table 1).
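The percentages quoted above can be checked with a few lines of Python. The sketch assumes, as the complexity discussion does, that a 3-input XOR gate costs two 2-input XOR delays, and it takes the leading coefficients of the transistor counts and area-time products from Table 1. The helper name `saving` is hypothetical.

```python
# Typical propagation delays in ns, as quoted from the data sheets [38-40].
T_A = 7    # 2-input AND gate
T_X = 12   # 2-input XOR gate
T_L = 13   # 1-bit latch
T_3X = 2 * T_X   # assumption: 3-input XOR built from two 2-input XOR gates


def saving(new, old):
    """Relative saving of `new` with respect to `old`."""
    return 1 - new / old


proposed_delay_per_m = T_A + T_X + T_L    # Figs. 1 and 5: 32 ns per unit of m
wei_delay_per_m = T_A + T_3X + T_L        # Wei [33]: 44 ns per unit of m

time_saving = saving(proposed_delay_per_m, wei_delay_per_m)   # about 0.27
space_saving = saving(48, 68)             # 48m^2 vs 68m^2 transistors, about 0.29
area_time_saving = saving(1536, 2304)     # Fig. 5 vs Lee [22], about 0.33
```

The space figure comes out at roughly 29.4%, which the paper rounds to "about 30%".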

Table 1. Comparison of various PB bit-parallel multipliers

| Multiplier | Generating polynomial | Function | Space complexity: gate count | Space complexity: transistor count | Latency | Area-time complexity (ns) |
|---|---|---|---|---|---|---|
| Yeh et al. [34] | General form | AB + C | #AND2: $2m^2$; #XOR2: $2m^2$; #L: $7m^2$ | $80m^2$ | $3m$ | $7680m^3$ |
| Wang-Lin [31] | General form | AB + C | #AND2: $2m^2$; #XOR3: $m^2$; #L: $7m^2$ | $76m^2$ | $3m$ | $10032m^3$ |
| Wei [33] | General form | AB + C | #AND2: $3m^2$; #XOR2: $m^2$; #XOR3: $m^2$; #L: $4m^2$ | $68m^2$ | $m$ | $2992m^3$ |
| Lee [22] | Trinomials | AB + C | #AND2: $m^2$; #XOR2: $m^2 + m - 1$; #L: $3m^2 - 2m - 2$ | $36m^2 + 24m - 24$ | $2m - 1$ | $2304m^3$ |
| Our proposal in Fig. 1 | General form | AB + C | #AND2: $7m$; #XOR2: $m + k$; #L: $3m$ | $72m + 6k$ | $m$ | $2304m^2$ |
| Our proposal in Fig. 5 | General form | AB + C | #AND2: $2m^2$; #XOR2: $2m^2$; #L: $3m^2$ | $48m^2$ | $m$ | $1536m^3$ |

6. Conclusion

In this study, we have presented a one-dimensional array multiplier for performing multiplications in the finite field $GF(2^m)$ with the PB representation. A linear feedback shift register is employed in the proposed multiplier. The proposed linear array multiplier requires only $O(m)$ space complexity, while other existing two-dimensional systolic array multipliers need $O(m^2)$ space complexity. Such a low-complexity multiplier is very attractive for mobile platforms such as PDAs and smart phones. A new two-dimensional systolic array version of the proposed multiplier has also been included. The proposed two-dimensional systolic array multiplier saves about 30% of space complexity and 27% of time complexity compared with the existing two-dimensional systolic array multiplier in [33].

Appendix: Procedure-A

Procedure-A:

/* Let $C(X) = A(X)B(X) \bmod P(X)$, and */

/* $C(X) = c_0 + c_1X + c_2X^2 + c_3X^3 + c_4X^4 + c_5X^5 + c_6X^6 + c_7X^7$, */

/* $A(X) = a_0 + a_1X + a_2X^2 + a_3X^3 + a_4X^4 + a_5X^5 + a_6X^6 + a_7X^7$, */

/* $B(X) = b_0 + b_1X + b_2X^2 + b_3X^3 + b_4X^4 + b_5X^5 + b_6X^6 + b_7X^7$, */

/* $P(X) = 1 + X + X^3 + X^4 + X^8$. */

Step 0: Initial condition;

(a) $B(X)$ is loaded into the linear feedback shift register S as follows: $S_i^{(0)} = b_i$ for $0 \le i \le 7$.

(b) $A(X)$ is loaded into the shift register E as follows: $E_i^{(0)} = a_i$ for $0 \le i \le 7$.

(c) All 1-bit latches in cells $U_i$ for $0 \le i \le 7$ are initially reset to zero.

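Step 1: At clock cycle 0; cells $U_0$~$U_7$ perform the following operations:

$U_0 = 0 \oplus E_0^{(0)}S_0^{(0)} = a_0b_0$,
$U_1 = 0 \oplus E_0^{(0)}S_1^{(0)} = a_0b_1$,
$U_2 = 0 \oplus E_0^{(0)}S_2^{(0)} = a_0b_2$,
$U_3 = 0 \oplus E_0^{(0)}S_3^{(0)} = a_0b_3$,
$U_4 = 0 \oplus E_0^{(0)}S_4^{(0)} = a_0b_4$,
$U_5 = 0 \oplus E_0^{(0)}S_5^{(0)} = a_0b_5$,
$U_6 = 0 \oplus E_0^{(0)}S_6^{(0)} = a_0b_6$,
$U_7 = 0 \oplus E_0^{(0)}S_7^{(0)} = a_0b_7$.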


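Step 2: At clock cycle 1; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(1)}S_0^{(1)} = a_0b_0 \oplus a_1b_7$,
$U_1 = U_1 \oplus E_0^{(1)}S_1^{(1)} = a_0b_1 \oplus a_1(b_0 \oplus b_7)$,
$U_2 = U_2 \oplus E_0^{(1)}S_2^{(1)} = a_0b_2 \oplus a_1b_1$,
$U_3 = U_3 \oplus E_0^{(1)}S_3^{(1)} = a_0b_3 \oplus a_1(b_2 \oplus b_7)$,
$U_4 = U_4 \oplus E_0^{(1)}S_4^{(1)} = a_0b_4 \oplus a_1(b_3 \oplus b_7)$,
$U_5 = U_5 \oplus E_0^{(1)}S_5^{(1)} = a_0b_5 \oplus a_1b_4$,
$U_6 = U_6 \oplus E_0^{(1)}S_6^{(1)} = a_0b_6 \oplus a_1b_5$,
$U_7 = U_7 \oplus E_0^{(1)}S_7^{(1)} = a_0b_7 \oplus a_1b_6$.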


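Step 3: At clock cycle 2; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(2)}S_0^{(2)} = a_0b_0 \oplus a_1b_7 \oplus a_2b_6$,
$U_1 = U_1 \oplus E_0^{(2)}S_1^{(2)} = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6)$,
$U_2 = U_2 \oplus E_0^{(2)}S_2^{(2)} = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7)$,
$U_3 = U_3 \oplus E_0^{(2)}S_3^{(2)} = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6)$,
$U_4 = U_4 \oplus E_0^{(2)}S_4^{(2)} = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6)$,
$U_5 = U_5 \oplus E_0^{(2)}S_5^{(2)} = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7)$,
$U_6 = U_6 \oplus E_0^{(2)}S_6^{(2)} = a_0b_6 \oplus a_1b_5 \oplus a_2b_4$,
$U_7 = U_7 \oplus E_0^{(2)}S_7^{(2)} = a_0b_7 \oplus a_1b_6 \oplus a_2b_5$.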


Table 2. Comparisons of time complexities of various PB bit-parallel multipliers

| Multiplier | Generating polynomial | Function | Latency (clock cycles) | Throughput | Propagation through one cell | Total propagation delay (ns) |
|---|---|---|---|---|---|---|
| Yeh et al. [34] | General form | AB + C | $3m$ | 1 | $T_A + T_X + T_L$ | $3m(T_A + T_X + T_L)$ ($96m$ ns) |
| Wang-Lin [31] | General form | AB + C | $3m$ | 1 | $T_A + T_{3X} + T_L$ | $3m(T_A + T_{3X} + T_L)$ ($132m$ ns) |
| Wei [33] | General form | AB + C | $m$ | 1 | $T_A + T_{3X}$ | $m(T_A + T_{3X} + T_L)$ ($44m$ ns) |
| Lee [22] | Trinomials | AB + C | $2m - 1$ | 1 | $T_A + T_X + T_L$ | $(2m - 1)(T_A + T_X + T_L)$ ($64m$ ns) |
| Our proposal in Fig. 1 | General form | AB + C | $m$ | $1/m$ | $T_A + T_X + T_L$ | $m(T_A + T_X + T_L)$ ($32m$ ns) |
| Our proposal in Fig. 5 | General form | AB + C | $m$ | 1 | $T_A + T_X + T_L$ | $m(T_A + T_X + T_L)$ ($32m$ ns) |



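Step 4: At clock cycle 3; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(3)}S_0^{(3)} = a_0b_0 \oplus a_1b_7 \oplus a_2b_6 \oplus a_3b_5$,
$U_1 = U_1 \oplus E_0^{(3)}S_1^{(3)} = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6) \oplus a_3(b_6 \oplus b_5)$,
$U_2 = U_2 \oplus E_0^{(3)}S_2^{(3)} = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7) \oplus a_3(b_7 \oplus b_6)$,
$U_3 = U_3 \oplus E_0^{(3)}S_3^{(3)} = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6) \oplus a_3(b_0 \oplus b_7 \oplus b_5)$,
$U_4 = U_4 \oplus E_0^{(3)}S_4^{(3)} = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6) \oplus a_3(b_1 \oplus b_6 \oplus b_5)$,
$U_5 = U_5 \oplus E_0^{(3)}S_5^{(3)} = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7) \oplus a_3(b_2 \oplus b_7 \oplus b_6)$,
$U_6 = U_6 \oplus E_0^{(3)}S_6^{(3)} = a_0b_6 \oplus a_1b_5 \oplus a_2b_4 \oplus a_3(b_3 \oplus b_7)$,
$U_7 = U_7 \oplus E_0^{(3)}S_7^{(3)} = a_0b_7 \oplus a_1b_6 \oplus a_2b_5 \oplus a_3b_4$.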


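Step 5: At clock cycle 4; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(4)}S_0^{(4)} = a_0b_0 \oplus a_1b_7 \oplus a_2b_6 \oplus a_3b_5 \oplus a_4b_4$,
$U_1 = U_1 \oplus E_0^{(4)}S_1^{(4)} = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6) \oplus a_3(b_6 \oplus b_5) \oplus a_4(b_5 \oplus b_4)$,
$U_2 = U_2 \oplus E_0^{(4)}S_2^{(4)} = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7) \oplus a_3(b_7 \oplus b_6) \oplus a_4(b_6 \oplus b_5)$,
$U_3 = U_3 \oplus E_0^{(4)}S_3^{(4)} = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6) \oplus a_3(b_0 \oplus b_7 \oplus b_5) \oplus a_4(b_7 \oplus b_6 \oplus b_4)$,
$U_4 = U_4 \oplus E_0^{(4)}S_4^{(4)} = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6) \oplus a_3(b_1 \oplus b_6 \oplus b_5) \oplus a_4(b_0 \oplus b_7 \oplus b_5 \oplus b_4)$,
$U_5 = U_5 \oplus E_0^{(4)}S_5^{(4)} = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7) \oplus a_3(b_2 \oplus b_7 \oplus b_6) \oplus a_4(b_1 \oplus b_6 \oplus b_5)$,
$U_6 = U_6 \oplus E_0^{(4)}S_6^{(4)} = a_0b_6 \oplus a_1b_5 \oplus a_2b_4 \oplus a_3(b_3 \oplus b_7) \oplus a_4(b_2 \oplus b_7 \oplus b_6)$,
$U_7 = U_7 \oplus E_0^{(4)}S_7^{(4)} = a_0b_7 \oplus a_1b_6 \oplus a_2b_5 \oplus a_3b_4 \oplus a_4(b_3 \oplus b_7)$.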


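Step 6: At clock cycle 5; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(5)}S_0^{(5)} = a_0b_0 \oplus a_1b_7 \oplus a_2b_6 \oplus a_3b_5 \oplus a_4b_4 \oplus a_5(b_3 \oplus b_7)$,
$U_1 = U_1 \oplus E_0^{(5)}S_1^{(5)} = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6) \oplus a_3(b_6 \oplus b_5) \oplus a_4(b_5 \oplus b_4) \oplus a_5(b_4 \oplus b_3 \oplus b_7)$,
$U_2 = U_2 \oplus E_0^{(5)}S_2^{(5)} = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7) \oplus a_3(b_7 \oplus b_6) \oplus a_4(b_6 \oplus b_5) \oplus a_5(b_5 \oplus b_4)$,
$U_3 = U_3 \oplus E_0^{(5)}S_3^{(5)} = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6) \oplus a_3(b_0 \oplus b_7 \oplus b_5) \oplus a_4(b_7 \oplus b_6 \oplus b_4) \oplus a_5(b_6 \oplus b_5 \oplus b_3 \oplus b_7)$,
$U_4 = U_4 \oplus E_0^{(5)}S_4^{(5)} = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6) \oplus a_3(b_1 \oplus b_6 \oplus b_5) \oplus a_4(b_0 \oplus b_7 \oplus b_5 \oplus b_4) \oplus a_5(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7)$,
$U_5 = U_5 \oplus E_0^{(5)}S_5^{(5)} = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7) \oplus a_3(b_2 \oplus b_7 \oplus b_6) \oplus a_4(b_1 \oplus b_6 \oplus b_5) \oplus a_5(b_0 \oplus b_7 \oplus b_5 \oplus b_4)$,
$U_6 = U_6 \oplus E_0^{(5)}S_6^{(5)} = a_0b_6 \oplus a_1b_5 \oplus a_2b_4 \oplus a_3(b_3 \oplus b_7) \oplus a_4(b_2 \oplus b_7 \oplus b_6) \oplus a_5(b_1 \oplus b_6 \oplus b_5)$,
$U_7 = U_7 \oplus E_0^{(5)}S_7^{(5)} = a_0b_7 \oplus a_1b_6 \oplus a_2b_5 \oplus a_3b_4 \oplus a_4(b_3 \oplus b_7) \oplus a_5(b_2 \oplus b_7 \oplus b_6)$.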


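Step 7: At clock cycle 6; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(6)}S_0^{(6)} = a_0b_0 \oplus a_1b_7 \oplus a_2b_6 \oplus a_3b_5 \oplus a_4b_4 \oplus a_5(b_3 \oplus b_7) \oplus a_6(b_2 \oplus b_7 \oplus b_6)$,
$U_1 = U_1 \oplus E_0^{(6)}S_1^{(6)} = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6) \oplus a_3(b_6 \oplus b_5) \oplus a_4(b_5 \oplus b_4) \oplus a_5(b_4 \oplus b_3 \oplus b_7) \oplus a_6(b_3 \oplus b_7 \oplus b_2 \oplus b_7 \oplus b_6)$,
$U_2 = U_2 \oplus E_0^{(6)}S_2^{(6)} = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7) \oplus a_3(b_7 \oplus b_6) \oplus a_4(b_6 \oplus b_5) \oplus a_5(b_5 \oplus b_4) \oplus a_6(b_4 \oplus b_3 \oplus b_7)$,
$U_3 = U_3 \oplus E_0^{(6)}S_3^{(6)} = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6) \oplus a_3(b_0 \oplus b_7 \oplus b_5) \oplus a_4(b_7 \oplus b_6 \oplus b_4) \oplus a_5(b_6 \oplus b_5 \oplus b_3 \oplus b_7) \oplus a_6(b_5 \oplus b_4 \oplus b_2 \oplus b_7 \oplus b_6)$,
$U_4 = U_4 \oplus E_0^{(6)}S_4^{(6)} = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6) \oplus a_3(b_1 \oplus b_6 \oplus b_5) \oplus a_4(b_0 \oplus b_7 \oplus b_5 \oplus b_4) \oplus a_5(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7) \oplus a_6(b_6 \oplus b_5 \oplus b_3 \oplus b_7 \oplus b_2 \oplus b_7 \oplus b_6)$,
$U_5 = U_5 \oplus E_0^{(6)}S_5^{(6)} = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7) \oplus a_3(b_2 \oplus b_7 \oplus b_6) \oplus a_4(b_1 \oplus b_6 \oplus b_5) \oplus a_5(b_0 \oplus b_7 \oplus b_5 \oplus b_4) \oplus a_6(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7)$,
$U_6 = U_6 \oplus E_0^{(6)}S_6^{(6)} = a_0b_6 \oplus a_1b_5 \oplus a_2b_4 \oplus a_3(b_3 \oplus b_7) \oplus a_4(b_2 \oplus b_7 \oplus b_6) \oplus a_5(b_1 \oplus b_6 \oplus b_5) \oplus a_6(b_0 \oplus b_7 \oplus b_5 \oplus b_4)$,
$U_7 = U_7 \oplus E_0^{(6)}S_7^{(6)} = a_0b_7 \oplus a_1b_6 \oplus a_2b_5 \oplus a_3b_4 \oplus a_4(b_3 \oplus b_7) \oplus a_5(b_2 \oplus b_7 \oplus b_6) \oplus a_6(b_1 \oplus b_6 \oplus b_5)$.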


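Step 8: At clock cycle 7; cells $U_0$~$U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0^{(7)}S_0^{(7)} = a_0b_0 \oplus a_1b_7 \oplus a_2b_6 \oplus a_3b_5 \oplus a_4b_4 \oplus a_5(b_3 \oplus b_7) \oplus a_6(b_2 \oplus b_7 \oplus b_6) \oplus a_7(b_1 \oplus b_6 \oplus b_5)$,
$U_1 = U_1 \oplus E_0^{(7)}S_1^{(7)} = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6) \oplus a_3(b_6 \oplus b_5) \oplus a_4(b_5 \oplus b_4) \oplus a_5(b_4 \oplus b_3 \oplus b_7) \oplus a_6(b_3 \oplus b_7 \oplus b_2 \oplus b_7 \oplus b_6) \oplus a_7(b_2 \oplus b_7 \oplus b_6 \oplus b_1 \oplus b_6 \oplus b_5)$,
$U_2 = U_2 \oplus E_0^{(7)}S_2^{(7)} = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7) \oplus a_3(b_7 \oplus b_6) \oplus a_4(b_6 \oplus b_5) \oplus a_5(b_5 \oplus b_4) \oplus a_6(b_4 \oplus b_3 \oplus b_7) \oplus a_7(b_3 \oplus b_7 \oplus b_2 \oplus b_7 \oplus b_6)$,
$U_3 = U_3 \oplus E_0^{(7)}S_3^{(7)} = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6) \oplus a_3(b_0 \oplus b_7 \oplus b_5) \oplus a_4(b_7 \oplus b_6 \oplus b_4) \oplus a_5(b_6 \oplus b_5 \oplus b_3 \oplus b_7) \oplus a_6(b_5 \oplus b_4 \oplus b_2 \oplus b_7 \oplus b_6) \oplus a_7(b_4 \oplus b_3 \oplus b_7 \oplus b_1 \oplus b_6 \oplus b_5)$,
$U_4 = U_4 \oplus E_0^{(7)}S_4^{(7)} = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6) \oplus a_3(b_1 \oplus b_6 \oplus b_5) \oplus a_4(b_0 \oplus b_7 \oplus b_5 \oplus b_4) \oplus a_5(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7) \oplus a_6(b_6 \oplus b_5 \oplus b_3 \oplus b_7 \oplus b_2 \oplus b_7 \oplus b_6) \oplus a_7(b_5 \oplus b_4 \oplus b_2 \oplus b_7 \oplus b_6 \oplus b_1 \oplus b_6 \oplus b_5)$,
$U_5 = U_5 \oplus E_0^{(7)}S_5^{(7)} = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7) \oplus a_3(b_2 \oplus b_7 \oplus b_6) \oplus a_4(b_1 \oplus b_6 \oplus b_5) \oplus a_5(b_0 \oplus b_7 \oplus b_5 \oplus b_4) \oplus a_6(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7) \oplus a_7(b_6 \oplus b_5 \oplus b_3 \oplus b_7 \oplus b_2 \oplus b_7 \oplus b_6)$,
$U_6 = U_6 \oplus E_0^{(7)}S_6^{(7)} = a_0b_6 \oplus a_1b_5 \oplus a_2b_4 \oplus a_3(b_3 \oplus b_7) \oplus a_4(b_2 \oplus b_7 \oplus b_6) \oplus a_5(b_1 \oplus b_6 \oplus b_5) \oplus a_6(b_0 \oplus b_7 \oplus b_5 \oplus b_4) \oplus a_7(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7)$,
$U_7 = U_7 \oplus E_0^{(7)}S_7^{(7)} = a_0b_7 \oplus a_1b_6 \oplus a_2b_5 \oplus a_3b_4 \oplus a_4(b_3 \oplus b_7) \oplus a_5(b_2 \oplus b_7 \oplus b_6) \oplus a_6(b_1 \oplus b_6 \oplus b_5) \oplus a_7(b_0 \oplus b_7 \oplus b_5 \oplus b_4)$.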

The final result $C(X)$ is obtained from the outputs of $U_i$ for $0 \le i \le 7$.
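The contents of register S at each clock cycle of Procedure-A can be reproduced symbolically with a short Python sketch. Each cell of S is represented as the set of indices $i$ whose $b_i$ are XORed together, so a symmetric difference of sets plays the role of $\oplus$. Unlike the expansions listed in the steps above, the sets cancel repeated terms (for example, $b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7$ becomes $\{3, 4, 6\}$). The function name and the `taps` parameter are hypothetical.

```python
def symbolic_lfsr_states(m, taps, steps):
    """Track S_0..S_{m-1} symbolically for `steps` clock cycles.

    taps: set of j with p_j = 1 (the X^m term is implicit).
    Returns a list of states; states[k][j] is the set of b-indices in S_j^(k).
    """
    state = [{j} for j in range(m)]          # S_j^(0) = b_j
    states = [state]
    for _ in range(steps):
        feedback = state[m - 1]              # old S_{m-1}
        nxt = []
        for j in range(m):
            term = set(state[j - 1]) if j > 0 else set()
            if j in taps:
                term ^= feedback             # XOR in the feedback term
            nxt.append(term)
        state = nxt
        states.append(state)
    return states


# P(X) = 1 + X + X^3 + X^4 + X^8
STATES = symbolic_lfsr_states(8, {0, 1, 3, 4}, 7)
```

Here `STATES[7][7]` is `{0, 4, 5, 7}`, matching the term $a_7(b_0 \oplus b_7 \oplus b_5 \oplus b_4)$ accumulated by $U_7$ at clock cycle 7.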

Acknowledgments

The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it.

References

[1] MacWilliams, F. J. and Sloane, N. J. A., The Theory of Error-Correcting Codes, Amsterdam: North-Holland (1977).

[2] Lidl, R. and Niederreiter, H., Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, U.S.A. (1994).

[3] Blahut, R. E., Fast Algorithms for Digital Signal Processing, Reading, Mass.: Addison-Wesley (1985).

[4] Reed, I. S. and Truong, T. K., "The Use of Finite Fields to Compute Convolutions," IEEE Trans. Information Theory, Vol. IT-21, pp. 208-213 (1975).

[5] Wang, C. C. and Pei, D., "A VLSI Design for Computing Exponentiation in $GF(2^m)$ and its Application to Generate Pseudorandom Number Sequences," IEEE Trans. Computers, Vol. 39, pp. 258-262 (1990).

[6] Omura, J. and Massey, J., "Computational Method and Apparatus for Finite Field Arithmetic," U.S. Patent Number 4,587,627 (1986).

[7] Wang, C. C., Truong, T. K., Shao, H. M., Deutsch, L. J., Omura, J. K. and Reed, I. S., "VLSI Architectures for Computing Multiplications and Inverses in $GF(2^m)$," IEEE Trans. Computers, Vol. C-34, pp. 709-717 (1985).

[8] Reyhani-Masoleh, A. and Hasan, M. A., "A New Construction of Massey-Omura Parallel Multiplier Over $GF(2^m)$," IEEE Trans. Computers, Vol. 51, pp. 511-520 (2002).

[9] Berlekamp, E. R., "Bit-Serial Reed-Solomon Encoder," IEEE Trans. Inform. Theory, Vol. IT-28, pp. 869-874 (1982).

[10] Wu, H., Hasan, M. A. and Blake, I. F., "New Low-Complexity Bit-Parallel Finite Field Multipliers Using Weakly Dual Bases," IEEE Trans. Computers, Vol. 47, pp. 1223-1234 (1998).

[11] Wu, H. and Hasan, M. A., "Low Complexity Bit-Parallel Multipliers for a Class of Finite Fields," IEEE

[12] Lee, C. Y., Chiou, C. W. and Lin, J. M., "Low-Complexity Bit-Parallel Dual Basis Multipliers Using the Modified Booth's Algorithm," Computers & Electrical Engineering, Vol. 31, pp. 444-459 (2005).

[13] Lee, C. Y. and Chiou, C. W., "Efficient Design of Low-Complexity Bit-Parallel Systolic Hankel Multipliers to Implement Multiplication in Normal and Dual Bases of $GF(2^m)$," IEICE Trans. on Fundamentals of Electronics, Communications and Computer Science, Vol. E88-A, pp. 3169-3179 (2005).

[14] Bartee, T. C. and Schneider, D. J., "Computation with Finite Fields," Information and Computing, Vol. 6, pp. 79-98 (1963).

[15] Mastrovito, E. D., "VLSI Architectures for Multiplication Over Finite Field $GF(2^m)$," Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, Proc. Sixth Int'l Conf., AAECC-6, T. Mora, ed., Rome, pp. 297-309 (1988).

[16] Mastrovito, E. D., VLSI Architectures for Computations in Galois Fields, Ph.D. thesis, Linköping Univ., Dept. of Electrical Eng., Linköping, Sweden (1991).

[17] Koç, Ç. K. and Sunar, B., "Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields," IEEE Trans. Computers, Vol. 47, pp. 353-356 (1998).

[18] Sunar, B. and Koç, Ç. K., "Mastrovito Multiplier for All Trinomials," IEEE Trans. Computers, Vol. 48, pp. 522-527 (1999).

[19] Wu, H., "Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial Basis," IEEE Trans. Computers, Vol. 51, pp. 750-758 (2002).

[20] Elia, M., Leone, M. and Visentin, C., "Low Complexity Bit-Parallel Multipliers for $GF(2^m)$ with Generator Polynomial $x^m + x^k + 1$," Electronics Letters, Vol. 35, pp. 551-552 (1999).

[21] Wu, H., "Montgomery Multiplier and Squarer for a Class of Finite Fields," IEEE Trans. Computers, Vol. 51, pp. 521-529 (2002).

[22] Lee, C. Y., "Low Complexity Bit-Parallel Systolic Multiplier Over $GF(2^m)$ Using Irreducible Trinomials," IEE Proc.-Comput. Digit. Tech., Vol. 150, pp. 39-42 (2003).

[23] Chiou, C. W., Lin, L. C., Chou, F. H. and Shu, S. F., "Low Complexity Finite Field Multiplier Using Irreducible Trinomials," IEE Electronics Letters, Vol. 39, pp. 1709-1711 (2003).

[24] Itoh, T. and Tsujii, S., "Structure of Parallel Multipliers for a Class of Fields $GF(2^m)$," Information and Computation, Vol. 83, pp. 21-40 (1989).

[25] Hasan, M. A., Wang, M. and Bhargava, V. K., "Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields $GF(2^m)$," IEEE Trans. Computers, Vol. 41, pp. 962-971 (1992).

[26] Lee, C. Y., Lu, E. H. and Lee, J. Y., "Bit-Parallel Systolic Multipliers for $GF(2^m)$ Fields Defined by All-One and Equally-Spaced Polynomials," IEEE Trans. Computers, Vol. 50, pp. 385-393 (2001).

[27] Paar, C., "A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields," IEEE Trans. Computers, Vol. 45, pp. 856-861 (1996).

[28] Paar, C., Fleischmann, P. and Roelse, P., "Efficient Multiplier Architectures for Galois Fields $GF(2^{4n})$," IEEE Trans. Computers, Vol. 47, pp. 162-170 (1998).

[29] Kim, N.-Y., Kim, H.-S. and Yoo, K.-Y., "Computation of $AB^2$ Multiplication in $GF(2^m)$ Using Low-Complexity Systolic Architecture," IEE Proc.-Circuits, Devices and Systems, Vol. 150, pp. 119-123 (2003).

[30] Drolet, G., "A New Representation of Elements of Finite Fields $GF(2^m)$ Yielding Small Complexity Arithmetic Circuits," IEEE Trans. Computers, Vol. 47, pp. 938-946 (1998).

[31] Wang, C.-L. and Lin, J.-L., "Systolic Array Implementation of Multipliers for Finite Fields $GF(2^m)$," IEEE Trans. Circuits and Systems, Vol. 38, pp. 796-800 (1991).

[32] Wei, S.-W., "A Systolic Power-Sum Circuit for $GF(2^m)$,"

[33] Wei, S.-W., "VLSI Architectures for Computing Exponentiations, Multiplicative Inverses, and Divisions in $GF(2^m)$," IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 44, pp. 847-855 (1997).

[34] Yeh, C.-S., Reed, I. S. and Truong, T. K., "Systolic Multipliers for Finite Fields $GF(2^m)$," IEEE Trans. Computers, Vol. C-33, pp. 357-360 (1984).

[35] Wang, C. L. and Guo, J. H., "New Systolic Arrays for $C + AB^2$, Inversion, and Division in $GF(2^m)$," IEEE

[36] Hsu, I. S., Truong, T. K., Deutsch, L. J. and Reed, I. S., "A Comparison of VLSI Architecture of Finite Field Multipliers Using Dual, Normal, or Standard Bases,"

[37] Weste, N. and Eshraghian, K., Principles of CMOS VLSI Design: A System Perspective, Reading, Mass.: Addison-Wesley (1985).

[38] http://www.st.com/stonline/books/pdf/docs/2006.pdf

[39] http://www.st.com/stonline/books/pdf/docs/1885.pdf

[40] Http://www.stm.com/stonline/products/literature/ds/1937/m74hc279.pdf

Manuscript Received: Dec. 1, 2005

Accepted: Oct. 2, 2006

