
Finite Field Polynomial Multiplier with Linear Feedback Shift Register

Che-Wun Chiou¹*, Chiou-Yng Lee² and Jim-Min Lin³

¹Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li, Taiwan 320, R.O.C.
²Department of Computer Information and Network Engineering, Lung Hwa University of Science & Technology, Taoyuan, Taiwan 333, R.O.C.
³Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan 407, R.O.C.

    Abstract

We present a one-dimensional polynomial basis array multiplier for performing multiplications in the finite field GF($2^m$). A linear feedback shift register is employed in the proposed multiplier to reduce space complexity. Compared with existing two-dimensional polynomial basis multipliers, the proposed linear array multiplier drastically reduces the space complexity from $O(m^2)$ to $O(m)$. A new two-dimensional systolic array version of the proposed array multiplier is also included in this paper. The proposed two-dimensional systolic array multiplier saves about 30% of space complexity and 27% of time complexity compared with other two-dimensional systolic array multipliers.

    Key Words: Finite Field, Multiplication, Polynomial Basis, Systolic Array, Cryptography

    1. Introduction

Arithmetic operations in a finite field play an increasingly important role in error-correcting codes [1], cryptography [2], digital signal processing [3,4], and pseudorandom number generation [5]. The two primary arithmetic operations over finite fields are addition and multiplication. Addition is simple, whereas multiplication requires more computation time and higher circuit complexity. Many other complex arithmetic operations, such as exponentiation, division, and multiplicative inversion, can be performed by applying multiplication repeatedly. Hence, it is important in practice to develop fast multiplication algorithms for these complex arithmetic operations. In recent years, the realization of multiplication in finite fields has received wide attention, and several approaches have been presented [6–36]. The complexity of implementing multiplication depends on the representation of the field elements. There are three main types of bases over GF($2^m$) fields, namely, normal basis (NB), dual basis (DB), and polynomial basis (PB). The major advantage of NB multipliers [6–8] is that the squaring of an element can be computed simply by a cyclic shift of its binary representation. Thus, normal basis multipliers can be applied very effectively to inversion, squaring, and exponentiation. DB multipliers [9–13] require less chip area than the other two types. However, NB and DB multipliers need basis conversion, while PB multipliers do not [36]. The polynomial basis representation has been widely used and leads to many efficient multiplier implementations. Compared with multipliers in the other two bases, polynomial basis multipliers have lower design complexity, and their sizes can easily be extended to meet various applications owing to the simplicity, regularity, and modularity of their architecture.

Tamkang Journal of Science and Engineering, Vol. 10, No. 3, pp. 253–264 (2007)

    *Corresponding author. E-mail: [email protected]


Numerous architectures for PB multipliers have been presented [14–35]. The first parallel PB multiplier was suggested by Bartee and Schneider [14]. PB multiplication in GF($2^m$) is often accomplished in two steps: polynomial multiplication and modular reduction. In practice, both steps are usually combined for performance reasons. Mastrovito [15,16] first proposed an architecture for performing such a combined operation. Recently, several bit-parallel PB multipliers have been proposed for VLSI implementation using specific classes of polynomials, such as trinomials [17–23], all-one polynomials (AOP) and equally spaced polynomials (ESP) [24–26], and composite fields [27,28]. Yet these architectures still have shortcomings for cryptographic applications because of their high circuit complexity and long latency. As the size of the finite field grows, the design of modular multipliers requires much more attention. To alleviate the long latency problem, most existing PB multipliers employ XOR trees to minimize time complexity. Unfortunately, these circuits are not well suited to VLSI systems because of the irregular and non-modular structure of XOR trees. To overcome this problem, Lee [22] proposed a regular and modular PB multiplier using irreducible trinomials with space complexity $O(m^2)$ and time complexity $O(m)$. This multiplier can be easily extended and implemented using VLSI technologies.

In this article, we present a linear parallel-in parallel-out PB array multiplier using general irreducible polynomials with a linear feedback shift register. The proposed PB multiplier requires space complexity $O(m)$. To demonstrate that the proposed multiplier is superior to other existing two-dimensional systolic array multipliers, a new two-dimensional systolic array version of the multiplier is also presented. We show that the proposed two-dimensional systolic array multiplier also saves both space and time complexity compared with other existing two-dimensional systolic array multipliers.

The organization of this paper is as follows. In Section 2, we provide some basic definitions and preliminaries. In Section 3, we derive the one-dimensional parallel-in parallel-out PB multiplication algorithm using general irreducible polynomials and a linear feedback shift register. The two-dimensional systolic array version of the proposed algorithm is then described in Section 4. The space and time complexities are discussed in Section 5. Finally, a brief conclusion is given in Section 6.

    2. Preliminaries

It is assumed that the reader is familiar with the basic concepts of finite fields. The properties of finite fields are covered in detail in [1,2] and are reviewed briefly, as required, in the following paragraphs.

The finite field GF($2^m$) can be viewed as a vector space of dimension $m$ over GF(2). Suppose that the finite field GF($2^m$) is generated by the irreducible polynomial $P(x) = p_0 + p_1x + \cdots + p_{m-1}x^{m-1} + x^m$ of degree $m$ over GF(2), where $p_0 = 1$. Then any element $A$ in the Galois field GF($2^m$) can be represented as $A(x) = a_0 + a_1x + a_2x^2 + \cdots + a_{m-1}x^{m-1}$, where $x$ is an indeterminate over GF(2). The basis $\{1, x, x^2, \ldots, x^{m-1}\}$ is known as the standard basis and is often referred to as the polynomial basis, conventional basis, or canonical basis. Since $P(x) = 0$, the identity $x^m = p_0 + p_1x + \cdots + p_{m-1}x^{m-1}$ can be used to reduce a high-order term $x^p$, $p \ge m$, to a polynomial of degree less than $m$. Thus, $xB(x) \bmod P(x)$ can be reduced by

$$
\begin{aligned}
xB(x) \bmod P(x) &= \left(b_0x + b_1x^2 + \cdots + b_{m-1}x^m\right) \bmod P(x) \\
&= b_{m-1}p_0 + (b_{m-1}p_1 + b_0)x + \cdots + (b_{m-1}p_{m-1} + b_{m-2})x^{m-1}.
\end{aligned}
$$

Let

$$B(x)^{(1)} = xB(x) \bmod P(x). \qquad (1)$$

Therefore, $x^iB(x) \bmod P(x)$ can be obtained from the following formula:

$$B(x)^{(i)} = xB(x)^{(i-1)} \bmod P(x). \qquad (2)$$

Note that $B(x)^{(0)} = B(x)$.
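The reduction in Eqs. (1) and (2) is a single shift with conditional feedback, which can be checked in software. The following Python sketch is illustrative only: the function names `mul_x` and `x_power_times`, and the packing of coefficients into an integer (bit $j$ holds $b_j$), are assumptions made for this example and are not part of the proposed hardware.

```python
def mul_x(b, p_low, m):
    """Return x*B(x) mod P(x) over GF(2), as in Eq. (1).

    b     : int whose bit j holds the coefficient b_j of B(x)
    p_low : int whose bit j holds p_j for 0 <= j <= m-1 (the x^m term is implicit)
    m     : degree of the irreducible polynomial P(x)
    """
    carry = (b >> (m - 1)) & 1            # b_{m-1}: the coefficient pushed into x^m
    shifted = (b << 1) & ((1 << m) - 1)   # b_{j-1} moves to position j
    # x^m is replaced by p_0 + p_1 x + ... + p_{m-1} x^{m-1} when b_{m-1} = 1
    return shifted ^ (p_low if carry else 0)


def x_power_times(b, i, p_low, m):
    """Return B(x)^(i) = x^i * B(x) mod P(x) by applying Eq. (2) i times."""
    for _ in range(i):
        b = mul_x(b, p_low, m)
    return b
```

As a usage example under the same assumptions, take $P(x) = 1 + x + x^3 + x^4 + x^8$, so that `p_low = 0x1B` and `m = 8`. Then `mul_x(0x80, 0x1B, 8)` returns `0x1B`, which is the reduction $x \cdot x^7 = x^8 \equiv 1 + x + x^3 + x^4$.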

Let the PB representation of $B(x)^{(i)}$ be

$$B(x)^{(i)} = b_{i,0} + b_{i,1}x + b_{i,2}x^2 + b_{i,3}x^3 + \cdots + b_{i,m-1}x^{m-1}, \quad \text{where } b_{i,j} \in \{0,1\} \text{ for } 0 \le j \le m-1.$$

According to $x^m = p_0 + p_1x + p_2x^2 + \cdots + p_{m-1}x^{m-1}$, the relation between $B(X)^{(i+1)}$ and $B(X)^{(i)}$ is depicted as follows:


The shift registers S and E are initially loaded with $B(X)$ and $A(X)$ as follows:

$S_j(0) = b_j$ and $E_j(0) = a_j$ for $0 \le j \le 7$.

The detailed circuit of each cell $U_j$ is shown in Figure 2. The cell $U_j$ realizes the following function:

$v_{out} = (h_{in} \cdot v1_{in}) \oplus v2_{in}$, and $h_{out} = h_{in}$.

The output $v_{out}$ is computed and the result is then latched in the 1-bit latch at each clock. All 1-bit latches in the U cells are initially reset to 0. The symbol L in Figure 2 represents a 1-bit latch. The detailed circuits for cells $S_j$ and $E_j$ can be found in Figure 3 and Figure 4, respectively. The shift registers S and E can be loaded in parallel.

The procedure for computing $C(X) = A(X)B(X)$ in Figure 1 is described in the Appendix.
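The clocked behaviour described above can be imitated with a small register-level simulation. The Python sketch below is a software model written for illustration, not the authors' circuit: the names `lfsr_multiply`, `to_bits`, and `from_bits` are hypothetical, the E register is assumed to shift toward $E_0$ with zeros shifted in, and the S register is assumed to apply the feedback of Eq. (2) once per clock.

```python
def to_bits(value, m):
    """Coefficient list [v_0, ..., v_{m-1}] of an integer (helper for this example)."""
    return [(value >> j) & 1 for j in range(m)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << j for j, bit in enumerate(bits))


def lfsr_multiply(a_bits, b_bits, p_bits):
    """Model m clock cycles of the linear array: returns the coefficients of A*B mod P.

    a_bits, b_bits : coefficients a_j, b_j for 0 <= j <= m-1
    p_bits         : coefficients p_j of P(X) for 0 <= j <= m-1 (X^m term implicit)
    """
    m = len(a_bits)
    S = list(b_bits)   # linear feedback shift register, loaded with B(X)
    E = list(a_bits)   # shift register, loaded with A(X)
    U = [0] * m        # accumulator latches, reset to 0
    for _ in range(m):
        h = E[0]                                    # bit a_i broadcast to all cells
        U = [U[j] ^ (h & S[j]) for j in range(m)]   # cell U_j: multiply and accumulate
        top = S[m - 1]                              # feedback bit of the LFSR
        S = [top & p_bits[0]] + [S[j - 1] ^ (top & p_bits[j]) for j in range(1, m)]
        E = E[1:] + [0]                             # next coefficient moves into E_0
    return U
```

With $P(X) = 1 + X + X^3 + X^4 + X^8$ (`p_bits = to_bits(0x1B, 8)`), this model reproduces the familiar GF($2^8$) product `0x57 * 0x83 = 0xC1`, for example via `from_bits(lfsr_multiply(to_bits(0x57, 8), to_bits(0x83, 8), to_bits(0x1B, 8)))`.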


Figure 1. Hardware implementation of $C(X) = A(X)B(X) \bmod (1 + X + X^3 + X^4 + X^8)$.

Figure 2. The detailed circuit of the cell $U_j$.

Figure 3. The detailed circuit of the cell $S_j$.


4. Implementation with Semi-Systolic Two-Dimensional Array

A semi-systolic two-dimensional systolic array implementation of the proposed array multiplier structure is discussed in this section. As mentioned above, the results in Eqs. (2) and (4) are rewritten as follows:

$$C(X) = a_0B(X)^{(0)} + a_1B(X)^{(1)} + a_2B(X)^{(2)} + a_3B(X)^{(3)} + \cdots + a_{m-1}B(X)^{(m-1)},$$

and $B(X)^{(i)}$ for $0 \le i \le m-1$ is represented by

$$B(X)^{(i)} = b_{i,0} + b_{i,1}x + b_{i,2}x^2 + b_{i,3}x^3 + \cdots + b_{i,m-1}x^{m-1}, \quad \text{where } b_{i,j} \in \{0,1\} \text{ for } 0 \le j \le m-1.$$

The initial value of $B(X)^{(i)}$ is assigned as follows:

$$B(X)^{(0)} = B(X),$$

and the relation between the coefficients of $B(X)^{(i+1)}$ and $B(X)^{(i)}$ is as follows:

$$
\begin{aligned}
b_{i+1,0} &= b_{i,m-1}\,p_0, \\
b_{i+1,j} &= b_{i,j-1} + b_{i,m-1}\,p_j \quad \text{for } 1 \le j \le m-1.
\end{aligned}
\qquad (5)
$$

Each coefficient of the product $C(X) = \sum_{j=0}^{m-1} c_jx^j$ is computed as follows:

$$c_j = \sum_{i=0}^{m-1} a_i\,b_{i,j} \quad \text{for } 0 \le j \le m-1, \text{ or}$$

$$
\begin{aligned}
c_j &= a_0b_{0,j} + a_1b_{1,j} + a_2b_{2,j} + a_3b_{3,j} + a_4b_{4,j} + \cdots + a_{m-1}b_{m-1,j} \\
&= a_0b_{0,j} + a_1(b_{0,j-1} + b_{0,m-1}p_j) + a_2(b_{1,j-1} + b_{1,m-1}p_j) \\
&\quad + a_3(b_{2,j-1} + b_{2,m-1}p_j) + \cdots + a_{m-1}(b_{m-2,j-1} + b_{m-2,m-1}p_j) \\
&= a_0b_{0,j} + (a_1b_{0,j-1} + a_1b_{0,m-1}p_j) + (a_2b_{1,j-1} + a_2b_{1,m-1}p_j) \\
&\quad + (a_3b_{2,j-1} + a_3b_{2,m-1}p_j) + \cdots + (a_{m-1}b_{m-2,j-1} + a_{m-1}b_{m-2,m-1}p_j).
\end{aligned}
\qquad (6)
$$

The following algorithm can be used for computing the coefficient $c_j$ based on Eq. (6).

Algorithm A: (Using traditional method)

    c_j := 0;
    b_{-1,j-1} := b_j;
    b_{-1,m-1} := 0;
    For i = 0 to m-1
    Begin
        c_j := c_j + a_i b_{i-1,j-1};
        c_j := c_j + a_i b_{i-1,m-1} p_j;
    End

If Algorithm A is realized as a hardware circuit, the propagation delay is one AND gate delay plus two XOR gate delays. To shorten this propagation delay, a parallel version of Algorithm A, Algorithm B, is given as follows.

Algorithm B: (Using parallel method)

    z_{-1} := b_j;
    b_{-1,m-1} := b_{m-1};
    For i = 0 to m-1
    Begin
        Cobegin
            z_i := z_{i-1} + b_{i-1,m-1} p_j;
            c_j := c_j + a_i z_{i-1};
        Coend
    End

Figure 4. The detailed circuit of the cell $E_j$.


Based on Eqs. (5) and (6) and Algorithm B, the semi-systolic two-dimensional systolic array for realizing the product $C(X) = A(X)B(X)$ is shown in Figure 5. The circuit for the processing element $V_{i,j}$ is shown in Figure 6.
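Eqs. (5) and (6) of this section can be cross-checked with a direct software evaluation in which row $i$ of a table plays the role of row $i$ of the two-dimensional array. The Python sketch below is a minimal illustration under stated assumptions: the function names `coefficient_table` and `product_coefficients` are hypothetical, coefficients are stored as lists indexed by $j$, and the code follows Eqs. (5) and (6) literally rather than the cell-level timing of Algorithm B or Figure 6.

```python
def coefficient_table(b_bits, p_bits):
    """Rows [b_{i,0}, ..., b_{i,m-1}] of B(X)^(i) for 0 <= i <= m-1, built with Eq. (5)."""
    m = len(b_bits)
    rows = [list(b_bits)]                      # B(X)^(0) = B(X)
    for _ in range(m - 1):
        prev = rows[-1]
        top = prev[m - 1]                      # b_{i,m-1}
        row = [top & p_bits[0]]                # b_{i+1,0} = b_{i,m-1} p_0
        for j in range(1, m):
            row.append(prev[j - 1] ^ (top & p_bits[j]))   # b_{i,j-1} + b_{i,m-1} p_j
        rows.append(row)
    return rows


def product_coefficients(a_bits, b_bits, p_bits):
    """Coefficients c_j = sum_i a_i b_{i,j} over GF(2), as in Eq. (6)."""
    m = len(a_bits)
    rows = coefficient_table(b_bits, p_bits)
    c = [0] * m
    for j in range(m):
        for i in range(m):
            c[j] ^= a_bits[i] & rows[i][j]     # addition in GF(2) is XOR
    return c
```

For instance, in GF($2^2$) with $P(X) = 1 + X + X^2$, calling `product_coefficients([0, 1], [0, 1], [1, 1])` evaluates $X \cdot X$ and returns `[1, 1]`, that is $1 + X$.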

    5. Complexity

In CMOS VLSI technology, a 2-input AND gate, a 2-input XOR gate, and a 1-bit latch are composed of 6, 6, and 8 transistors, respectively [37]. Suppose that a 3-input XOR gate and a 4-input XOR gate are constructed from two 2-input XOR gates and three 2-input XOR gates, respectively. Thus, the propagation delays through a 3-input XOR gate and a 4-input XOR gate are the same. A comparison of the space and area-time complexities of various PB bit-parallel multipliers is given in Table 1.

Figure 5. The proposed semi-systolic two-dimensional systolic array over GF($2^m$).

Figure 6. The detailed circuit for the cell $V_{i,j}$.

Suppose that the generating polynomial $P(X)$ has $k$ terms. Most existing PB multipliers using XOR binary trees require space complexity $O(m^2)$ and time complexity $O(\log_2 m)$ [24,25]. However, such multipliers are not regular and are therefore not suitable for VLSI implementation because of their tree structures. To overcome this problem, many systolic array structures, which are regular and modular and well suited to VLSI implementation, have been presented. However, most existing systolic array multipliers need space complexity $O(m^2)$. The proposed linear systolic array multiplier in Figure 1, using an irreducible polynomial, requires only $O(m)$ space complexity. Nevertheless, two-dimensional systolic array multipliers are useful when many successive multiplications are to be performed, as in exponentiation. Thus, a two-dimensional systolic array version of the proposed multiplier is shown in Figure 5. Compared with the multiplier proposed by Wei [33], the proposed two-dimensional semi-systolic array multiplier in Figure 5 saves about 30% of space complexity.

Comparisons of the time complexities of various PB bit-parallel multipliers are given in Table 2. Let $T_A$, $T_X$, $T_L$, and $T_{3X}$ represent the delays of a 2-input AND gate, a 2-input XOR gate, a 1-bit latch, and a 3-input XOR gate, respectively. We assume that real circuits such as the M74HC86 (STMicroelectronics, XOR gate, $t_{PD}$ = 12 ns (TYP.)) [38], the M74HC08 (STMicroelectronics, AND gate, $t_{PD}$ = 7 ns (TYP.)) [39], and the M74HC279 (STMicroelectronics, latch, $t_{PD}$ = 13 ns (TYP.)) [40] are employed. The proposed multiplication architectures in Figure 1 and Figure 5 save about 27% of time complexity compared with the multiplier in [33]. Although the proposed multiplier in Figure 5 increases the space complexity compared with Lee's multiplier [22] for all trinomials, it saves about 33% of area-time complexity.
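The percentages quoted in this section follow from the per-gate transistor counts and the typical gate delays given above. The short Python sketch below reworks that arithmetic for the leading $m^2$ terms; the dictionary and function names are hypothetical, the gate counts are the coefficients listed in Table 1, and a 3-input XOR gate is counted as two 2-input XOR gates as assumed in the text.

```python
# Transistors per gate from Section 5; XOR3 is modelled as two XOR2 gates.
TRANSISTORS = {"AND2": 6, "XOR2": 6, "XOR3": 12, "L": 8}

# Typical gate delays in ns quoted in Section 5.
T_A, T_X, T_L = 7, 12, 13
T_3X = 2 * T_X


def transistor_coefficient(gate_counts):
    """Transistor count per m^2 for a multiplier given its gate-count coefficients."""
    return sum(TRANSISTORS[gate] * count for gate, count in gate_counts.items())


def saving(proposed, reference):
    """Fractional saving of `proposed` relative to `reference`."""
    return 1.0 - proposed / reference


# Gate-count coefficients of m^2 taken from Table 1.
WEI_GATES = {"AND2": 3, "XOR2": 1, "XOR3": 1, "L": 4}
FIG5_GATES = {"AND2": 2, "XOR2": 2, "L": 3}

wei_area = transistor_coefficient(WEI_GATES)     # transistors per m^2
fig5_area = transistor_coefficient(FIG5_GATES)   # transistors per m^2
wei_delay = T_A + T_3X + T_L                     # ns per clock, latency m cycles
fig5_delay = T_A + T_X + T_L                     # ns per clock, latency m cycles

space_saving = saving(fig5_area, wei_area)       # about 0.29
time_saving = saving(fig5_delay, wei_delay)      # about 0.27
lee_area_time = 2304                             # coefficient of m^3 from Table 1
area_time_saving = saving(fig5_area * fig5_delay, lee_area_time)   # about 0.33
```

Under these assumptions the three ratios come out at roughly 29%, 27%, and 33%, in line with the figures stated in the text.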

    6. Conclusion

In this study, we have presented a one-dimensional array multiplier for performing multiplications in the finite field GF($2^m$) with the PB representation. A linear feedback shift register is employed in the proposed multiplier. The proposed linear array multiplier requires only $O(m)$ space complexity, while other existing two-dimensional systolic array multipliers need $O(m^2)$ space complexity. Such a low-complexity multiplier is very attractive for mobile platforms such as PDAs and smart phones. A new two-dimensional systolic array version of the proposed multiplier has also been included. The proposed two-dimensional systolic array multiplier saves about 30% of space complexity and 27% of time complexity compared with the existing two-dimensional systolic array multiplier in [33].

Table 1. Comparison of various PB bit-parallel multipliers

| Multiplier | Generating polynomial | Function | Gate count | Transistor count | Latency (clock cycles) | Area-time complexity (transistor count × ns) |
|---|---|---|---|---|---|---|
| Yeh et al. [34] | General form | AB + C | #AND2: $2m^2$; #XOR2: $2m^2$; #L: $7m^2$ | $80m^2$ | $3m$ | $7680m^3$ |
| Wang-Lin [31] | General form | AB + C | #AND2: $2m^2$; #XOR3: $m^2$; #L: $7m^2$ | $76m^2$ | $3m$ | $10032m^3$ |
| Wei [33] | General form | AB + C | #AND2: $3m^2$; #XOR2: $m^2$; #XOR3: $m^2$; #L: $4m^2$ | $68m^2$ | $m$ | $2992m^3$ |
| Lee [22] | Trinomials | AB + C | #AND2: $m^2$; #XOR2: $m^2 + m - 1$; #L: $3m^2 - 2m - 2$ | $36m^2 + 24m - 24$ | $2m - 1$ | $2304m^3$ |
| Our proposal in Fig. 1 | General form | AB + C | #AND2: $7m$; #XOR2: $m + k$; #L: $3m$ | $72m + 6k$ | $m$ | $2304m^2$ |
| Our proposal in Fig. 5 | General form | AB + C | #AND2: $2m^2$; #XOR2: $2m^2$; #L: $3m^2$ | $48m^2$ | $m$ | $1536m^3$ |

    Appendix: Procedure-A

    Procedure-A:

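/* Let $C(X) = A(X)B(X) \bmod P(X)$, and */
/* $C(X) = c_0 + c_1X + c_2X^2 + c_3X^3 + c_4X^4 + c_5X^5 + c_6X^6 + c_7X^7$, */
/* $A(X) = a_0 + a_1X + a_2X^2 + a_3X^3 + a_4X^4 + a_5X^5 + a_6X^6 + a_7X^7$, */
/* $B(X) = b_0 + b_1X + b_2X^2 + b_3X^3 + b_4X^4 + b_5X^5 + b_6X^6 + b_7X^7$, */
/* $P(X) = 1 + X + X^3 + X^4 + X^8$. */

Step 0: Initial condition:

(a) $B(X)$ is loaded into the linear feedback shift register S as follows: $S_i(0) = b_i$ for $0 \le i \le 7$.

(b) $A(X)$ is loaded into the shift register E as follows: $E_i(0) = a_i$ for $0 \le i \le 7$.

(c) All 1-bit latches in cells $U_i$ for $0 \le i \le 7$ are initially reset to zero.

Step 1: At clock cycle 0, cells $U_0$ to $U_7$ perform the following operations:

$U_0 = 0 \oplus E_0(0)S_0(0) = a_0b_0$,
$U_1 = 0 \oplus E_0(0)S_1(0) = a_0b_1$,
$U_2 = 0 \oplus E_0(0)S_2(0) = a_0b_2$,
$U_3 = 0 \oplus E_0(0)S_3(0) = a_0b_3$,
$U_4 = 0 \oplus E_0(0)S_4(0) = a_0b_4$,
$U_5 = 0 \oplus E_0(0)S_5(0) = a_0b_5$,
$U_6 = 0 \oplus E_0(0)S_6(0) = a_0b_6$,
$U_7 = 0 \oplus E_0(0)S_7(0) = a_0b_7$.

Step 2: At clock cycle 1, cells $U_0$ to $U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0(1)S_0(1) = a_0b_0 \oplus a_1b_7$,
$U_1 = U_1 \oplus E_0(1)S_1(1) = a_0b_1 \oplus a_1(b_0 \oplus b_7)$,
$U_2 = U_2 \oplus E_0(1)S_2(1) = a_0b_2 \oplus a_1b_1$,
$U_3 = U_3 \oplus E_0(1)S_3(1) = a_0b_3 \oplus a_1(b_2 \oplus b_7)$,
$U_4 = U_4 \oplus E_0(1)S_4(1) = a_0b_4 \oplus a_1(b_3 \oplus b_7)$,
$U_5 = U_5 \oplus E_0(1)S_5(1) = a_0b_5 \oplus a_1b_4$,
$U_6 = U_6 \oplus E_0(1)S_6(1) = a_0b_6 \oplus a_1b_5$,
$U_7 = U_7 \oplus E_0(1)S_7(1) = a_0b_7 \oplus a_1b_6$.

Step 3: At clock cycle 2, cells $U_0$ to $U_7$ perform the following operations:

$U_0 = U_0 \oplus E_0(2)S_0(2) = a_0b_0 \oplus a_1b_7 \oplus a_2b_6$,
$U_1 = U_1 \oplus E_0(2)S_1(2) = a_0b_1 \oplus a_1(b_0 \oplus b_7) \oplus a_2(b_7 \oplus b_6)$,
$U_2 = U_2 \oplus E_0(2)S_2(2) = a_0b_2 \oplus a_1b_1 \oplus a_2(b_0 \oplus b_7)$,
$U_3 = U_3 \oplus E_0(2)S_3(2) = a_0b_3 \oplus a_1(b_2 \oplus b_7) \oplus a_2(b_1 \oplus b_6)$,
$U_4 = U_4 \oplus E_0(2)S_4(2) = a_0b_4 \oplus a_1(b_3 \oplus b_7) \oplus a_2(b_2 \oplus b_7 \oplus b_6)$,
$U_5 = U_5 \oplus E_0(2)S_5(2) = a_0b_5 \oplus a_1b_4 \oplus a_2(b_3 \oplus b_7)$,
$U_6 = U_6 \oplus E_0(2)S_6(2) = a_0b_6 \oplus a_1b_5 \oplus a_2b_4$,
$U_7 = U_7 \oplus E_0(2)S_7(2) = a_0b_7 \oplus a_1b_6 \oplus a_2b_5$.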

Table 2. Comparisons of time complexities of various PB bit-parallel multipliers

| Multiplier | Generating polynomial | Function | Latency (clock cycles) | Throughput | Propagation through one cell | Total propagation delay (ns) |
|---|---|---|---|---|---|---|
| Yeh et al. [34] | General form | AB + C | $3m$ | 1 | $T_A + T_X + T_L$ | $3m(T_A + T_X + T_L)$ ($96m$ ns) |
| Wang-Lin [31] | General form | AB + C | $3m$ | 1 | $T_A + T_{3X} + T_L$ | $3m(T_A + T_{3X} + T_L)$ ($132m$ ns) |
| Wei [33] | General form | AB + C | $m$ | 1 | $T_A + T_{3X}$ | $m(T_A + T_{3X} + T_L)$ ($44m$ ns) |
| Lee [22] | Trinomials | AB + C | $2m - 1$ | 1 | $T_A + T_X + T_L$ | $(2m - 1)(T_A + T_X + T_L)$ (≈ $64m$ ns) |
| Our proposal in Fig. 1 | General form | AB + C | $m$ | $1/m$ | $T_A + T_X + T_L$ | $m(T_A + T_X + T_L)$ ($32m$ ns) |
| Our proposal in Fig. 5 | General form | AB + C | $m$ | 1 | $T_A + T_X + T_L$ | $m(T_A + T_X + T_L)$ ($32m$ ns) |

    Step 4: At clock cycle 3; Cells U0~U7do following op-

    erations:

    U0=U0E0 (3)S0 (3) =a0 b0a1b7a2b6a3b5,U1=U1E0 (3)S1(3) =a0 b1a1(b0b7)

    a2(b7b6)a3(b6b5),U2=U2E0 (3)S2(3) =a0 b2a1b1a2(b0b7)

    a3(b7b6),U3=U3E0 (3)S3(3) =a0 b3a1(b2b7)

    a2(b1b6)a3(b0b7b5),U4=U4E0 (3)S4(3) =a0 b4a1(b3b7)

    a2(b2b7b6)a3(b1b6b5),U5=U5E0 (3)S5(3) =a0 b5a1b4a2(b3b7)

    a3(b2b7b6),

    U6=U6E0 (3)S6(3) =a0 b6a1b5a2b4a3(b3b7),U7=U7E0 (3)S7(3) =a0 b7a1b6a2b5a3b4.

    Step 5: At clock cycle 4; Cells U0~U7do following op-

    erations:

    U0=U0E0 (4)S0 (4) =a0 b0a1b7a2b6a3b5a4b4,

    U1=U1E0 (4)S1(4) =a0 b1a1(b0b7)a2(b7b6)a3(b6b5)a4(b5b4),

    U2=U2E0 (4)S2(4) =a0 b2a1b1a2(b0b7)

    a3(b7b6)a4(b6+b5),U3=U3E0 (4)S3(4) =a0 b3a1(b2b7)a2(b1

    b6)a3(b0b7b5)a4(b7b6b4),U4=U4E0 (4)S4(4) =a0 b4a1(b3b7)

    a2(b2b7b6)a3(b1b6b5)a4(b0b7b5b4),

    U5=U5E0 (4)S5(4) =a0 b5a1b4a2(b3b7)a3(b2b7b6)a4(b1b6b5),

    U6=U6E0 (4)S6(4) =a0 b6a1b5a2b4a3(b3b7)a4(b2b7b6),

    U7=U7E0 (4)S7(4) =a0 b7a1b6a2b5a3b4a4(b3b7).

    Step 6: At clock cycle 5; Cells U0~U7do following op-

    erations:

    U0=U0E0 (5)S0 (5) =a0 b0a1b7a2b6a3b5a4b4a5(b3b7),

    U1=U1E0 (5)S1(5) =a0 b1a1(b0b7)a2(b7b6)a3(b6b5)a4(b5b4)a5(b4b3b7),

    U2=U2E0 (5)S2(5) =a0 b2a1b1a2(b0b7)

    a3(b7b6)a4(b6b5)a5(b5b4),

    U3=U3E0 (5)S3(5) =a0 b3a1(b2b7)a2(b1b6)a3(b0b7b5)a4(b7b6b4)a5(b6b5b3b7),

    U4=U4E0 (5)S4(5) =a0 b4a1(b3b7)a2(b2b7b6)a3(b1b6b5)a4(b0b7b5b4)a5(b7b6b4b3b7),

    U5=U5E0 (5)S5(5) =a0 b5a1b4a2(b3b7)a3(b2b7b6)a4(b1b6b5)a5(b0b7b5b4),

    U6=U6E0 (5)S6(5) =a0 b6a1b5a2b4a3(b3b7)a4(b2b7b6)a5(b1b6b5),

    U7=U7E0 (5)S7(5) =a0 b7a1b6a2b5a3b4a4(b3b7)a5(b2b7b6).

    Step 7: At clock cycle 6; Cells U0~U7do following op-

    erations:

    U0=U0E0 (6)S0 (6) =a0 b0a1b7a2b6a3b5a4b4a5(b3b7)a6(b2b7b6),

    U1=U1E0 (6)S1(6) =a0 b1a1(b0b7)a2(b7b6)a3(b6b5)a4(b5b4)a5(b4b3b7) +a6(b3b7b2b7b6),

    U2=U2E0 (6)S2(6) =a0 b2a1b1a2(b0b7)a3(b7b6)a4(b6b5)a5(b5b4)a6(b4b3b7),

    U3=U3E0 (6)S3(6) =a0 b3a1(b2b7)a2(b1b6)a3(b0b7b5)a4(b7b6b4)a5(b6b5b3b7)a6(b5b4b2b7b6),

    U4=U4E0 (6)S4(6) =a0 b4a1(b3b7)a2(b2b7b6)a3(b1b6b5)a4(b0b7b5b4)a5(b7b6b4b3b7)a6(b6b5b3b7b2b7b6),

    U5=U5E0 (6)S5(6) =a0 b5a1b4a2(b3b7)a3(b2b7b6)a4(b1b6b5)a5(b0

    b7b5b4)a6(b7b6b4b3b7),U6=U6E0(6)S6(6) =a0 b6a1b5a2b4a3(b3

    b7)a4(b2b7b6)a5(b1b6b5)a6(b0b7b5b4),

    U7= U7E0 (6)S7(6) =a0 b7a1b6a2b5a3b4 a4(b3 b7) a5(b2 b7 b6) a6(b1 b6+ b5).

    Step 8: At clock cycle 7; Cells U0~U7do following op-

    erations:

    U0=U0E0 (7)S0 (7) =a0 b0a1b7a2b6a3b5a4b4a5(b3b7)a6(b2b7b6)

    a7(b1b6b5),

    Finite Field Polynomial Multiplier with Linear Feedback Shift Register 261

  • 8/12/2019 10-3-8

    10/12

    U1=U1E0 (7)S1(7) =a0 b1a1(b0b7)a2(b7b6)a3(b6b5)a4(b5b4)a5(b4b3b7)a6(b3b7b2b7b6)a7(b2b7b6b1b6b5),

    U2=U2E0 (7)S2(7) =a0 b2a1b1a2(b0b7)a3(b7b6)a4(b6b5)a5(b5b4)a6(b4b3b7)a7(b3b7b2b7b6),

    U3=U3E0 (7)S3(7) =a0 b3a1(b2b7)a2(b1b6)a3(b0b7b5)a4(b7b6b4)a5(b6b5b3 b7)a6(b5b4b2b7b6)a7(b4b3b7b1b6b5),

    U4=U4E0 (7)S4(7) =a0 b4a1(b3b7)a2(b2b7b6)a3(b1b6b5)a4(b0b7b5

    b4)a5(b7b6b4b3b7)a6(b6b5b3b7b2b7b6)a7(b5b4b2b7b6b1b6b5),

    U5=U5E0 (7)S5(7) =a0 b5a1b4a2(b3b7)a3(b2b7b6)a4(b1b6b5)a5(b0b7b5b4)a6(b7b6b4b3b7)a7(b6b5b3b7b2b7b6),

    U6=U6E0 (7)S6(7) =a0 b6a1b5a2b4a3(b3b7)a4(b2b7b6)a5(b1b6b5)a6(b0b7b5b4)a7(b7b6b4b3b7),

    U7=U7E0 (7)S7(7) =a0 b7a1b6a2b5a3b4a4(b3b7)a5(b2b7b6)a6(b1b6b5)a7(b0b7b5b4).

    The final result C(X) is obtained from the outputs of U i

    for 0 i 7.
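The bracketed sums in Procedure-A are the successive contents of the register S, and they can be regenerated symbolically. The Python sketch below is an illustrative aid rather than part of the procedure: the function name `symbolic_lfsr_states` is hypothetical, each $S_j(i)$ is represented as the set of indices $k$ of the $b_k$ it contains, and XOR is modelled as symmetric difference, so repeated terms such as $b_7 \oplus b_7$ cancel instead of being written out as in the steps above.

```python
def symbolic_lfsr_states(m, p_bits, cycles):
    """Return states[i][j] = set of indices k such that b_k appears in S_j(i).

    p_bits : coefficients p_0 .. p_{m-1} of P(X) (the X^m term is implicit)
    cycles : number of clock cycles to trace
    """
    state = [frozenset([j]) for j in range(m)]   # S_j(0) = b_j
    states = [state]
    for _ in range(cycles):
        top = state[m - 1]                       # feedback term S_{m-1}(i)
        nxt = []
        for j in range(m):
            term = state[j - 1] if j > 0 else frozenset()
            if p_bits[j]:
                term = term ^ top                # XOR as symmetric difference
            nxt.append(term)
        state = nxt
        states.append(state)
    return states
```

For $P(X) = 1 + X + X^3 + X^4 + X^8$, calling `symbolic_lfsr_states(8, [1, 1, 0, 1, 1, 0, 0, 0], 7)` gives, for example, `states[1][1] == {0, 7}`, matching the factor $(b_0 \oplus b_7)$ of $a_1$ in $U_1$, and `states[5][4] == {3, 4, 6}`, the simplified form of $(b_7 \oplus b_6 \oplus b_4 \oplus b_3 \oplus b_7)$ in Step 6.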

    Acknowledgments

The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it.

    References

[1] MacWilliams, F. J. and Sloane, N. J. A., The Theory of Error-Correcting Codes, Amsterdam: North-Holland (1977).

[2] Lidl, R. and Niederreiter, H., Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, U.S.A. (1994).

[3] Blahut, R. E., Fast Algorithms for Digital Signal Processing, Reading, Mass.: Addison-Wesley (1985).

[4] Reed, I. S. and Truong, T. K., "The Use of Finite Fields to Compute Convolutions," IEEE Trans. Information Theory, Vol. IT-21, pp. 208–213 (1975).

[5] Wang, C. C. and Pei, D., "A VLSI Design for Computing Exponentiation in GF($2^m$) and its Application to Generate Pseudorandom Number Sequences," IEEE Trans. Computers, Vol. 39, pp. 258–262 (1990).

[6] Omura, J. and Massey, J., "Computational Method and Apparatus for Finite Field Arithmetic," U.S. Patent Number 4,587,627 (1986).

[7] Wang, C. C., Truong, T. K., Shao, H. M., Deutsch, L. J., Omura, J. K. and Reed, I. S., "VLSI Architectures for Computing Multiplications and Inverses in GF($2^m$)," IEEE Trans. Computers, Vol. C-34, pp. 709–717 (1985).

[8] Reyhani-Masoleh, A. and Hasan, M. A., "A New Construction of Massey-Omura Parallel Multiplier Over GF($2^m$)," IEEE Trans. Computers, Vol. 51, pp. 511–520 (2002).

[9] Berlekamp, E. R., "Bit-Serial Reed-Solomon Encoder," IEEE Trans. Inform. Theory, Vol. IT-28, pp. 869–874 (1982).

[10] Wu, H., Hasan, M. A. and Blake, I. F., "New Low-Complexity Bit-Parallel Finite Field Multipliers Using Weakly Dual Bases," IEEE Trans. Computers, Vol. 47, pp. 1223–1234 (1998).

[11] Wu, H. and Hasan, M. A., "Low Complexity Bit-Parallel Multipliers for a Class of Finite Fields," IEEE Trans. Computers, Vol. 47, pp. 883–887 (1998).

[12] Lee, C. Y., Chiou, C. W. and Lin, J. M., "Low-Complexity Bit-Parallel Dual Basis Multipliers Using the Modified Booth's Algorithm," Computers & Electrical Engineering, Vol. 31, pp. 444–459 (2005).

[13] Lee, C. Y. and Chiou, C. W., "Efficient Design of Low-Complexity Bit-Parallel Systolic Hankel Multipliers to Implement Multiplication in Normal and Dual Bases of GF($2^m$)," IEICE Trans. on Fundamentals of Electronics, Communications and Computer Science, Vol. E88-A, pp. 3169–3179 (2005).

[14] Bartee, T. C. and Schneider, D. J., "Computation with Finite Fields," Information and Computing, Vol. 6, pp. 79–98 (1963).

[15] Mastrovito, E. D., "VLSI Architectures for Multiplication Over Finite Field GF($2^m$)," Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, Proc. Sixth Int'l Conf., AAECC-6, T. Mora, ed., Rome, pp. 297–309 (1988).

[16] Mastrovito, E. D., VLSI Architectures for Computations in Galois Fields, Ph.D. thesis, Linköping Univ., Dept. of Electrical Eng., Linköping, Sweden (1991).

[17] Koç, Ç. K. and Sunar, B., "Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields," IEEE Trans. Computers, Vol. 47, pp. 353–356 (1998).

[18] Sunar, B. and Koç, Ç. K., "Mastrovito Multiplier for All Trinomials," IEEE Trans. Computers, Vol. 48, pp. 522–527 (1999).

[19] Wu, H., "Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial Basis," IEEE Trans. Computers, Vol. 51, pp. 750–758 (2002).

[20] Elia, M., Leone, M. and Visentin, C., "Low Complexity Bit-Parallel Multipliers for GF($2^m$) with Generator Polynomial $x^m + x^k + 1$," Electronics Letters, Vol. 35, pp. 551–552 (1999).

[21] Wu, H., "Montgomery Multiplier and Squarer for a Class of Finite Fields," IEEE Trans. Computers, Vol. 51, pp. 521–529 (2002).

[22] Lee, C. Y., "Low Complexity Bit-Parallel Systolic Multiplier Over GF($2^m$) Using Irreducible Trinomials," IEE Proc.-Comput. Digit. Tech., Vol. 150, pp. 39–42 (2003).

[23] Chiou, C. W., Lin, L. C., Chou, F. H. and Shu, S. F., "Low Complexity Finite Field Multiplier Using Irreducible Trinomials," IEE Electronics Letters, Vol. 39, pp. 1709–1711 (2003).

[24] Itoh, T. and Tsujii, S., "Structure of Parallel Multipliers for a Class of Fields GF($2^m$)," Information and Computation, Vol. 83, pp. 21–40 (1989).

[25] Hasan, M. A., Wang, M. and Bhargava, V. K., "Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields GF($2^m$)," IEEE Trans. Computers, Vol. 41, pp. 962–971 (1992).

[26] Lee, C. Y., Lu, E. H. and Lee, J. Y., "Bit-Parallel Systolic Multipliers for GF($2^m$) Fields Defined by All-One and Equally-Spaced Polynomials," IEEE Trans. Computers, Vol. 50, pp. 385–393 (2001).

[27] Paar, C., "A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields," IEEE Trans. Computers, Vol. 45, pp. 856–861 (1996).

[28] Paar, C., Fleischmann, P. and Roelse, P., "Efficient Multiplier Architectures for Galois Fields GF($2^{4n}$)," IEEE Trans. Computers, Vol. 47, pp. 162–170 (1998).

[29] Kim, N.-Y., Kim, H.-S. and Yoo, K.-Y., "Computation of $AB^2$ Multiplication in GF($2^m$) Using Low-Complexity Systolic Architecture," IEE Proc.-Circuits, Devices and Systems, Vol. 150, pp. 119–123 (2003).

[30] Drolet, G., "A New Representation of Elements of Finite Fields GF($2^m$) Yielding Small Complexity Arithmetic Circuits," IEEE Trans. Computers, Vol. 47, pp. 938–946 (1998).

[31] Wang, C.-L. and Lin, J.-L., "Systolic Array Implementation of Multipliers for Finite Fields GF($2^m$)," IEEE Trans. Circuits and Systems, Vol. 38, pp. 796–800 (1991).

[32] Wei, S.-W., "A Systolic Power-Sum Circuit for GF($2^m$)," IEEE Trans. Computers, Vol. 43, pp. 226–229 (1994).

[33] Wei, S.-W., "VLSI Architectures for Computing Exponentiations, Multiplicative Inverses, and Divisions in GF($2^m$)," IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 44, pp. 847–855 (1997).

[34] Yeh, C.-S., Reed, I. S. and Truong, T. K., "Systolic Multipliers for Finite Fields GF($2^m$)," IEEE Trans. Computers, Vol. C-33, pp. 357–360 (1984).

[35] Wang, C. L. and Guo, J. H., "New Systolic Arrays for $C + AB^2$, Inversion, and Division in GF($2^m$)," IEEE Trans. Computers, Vol. 49, pp. 1120–1125 (2000).

[36] Hsu, I. S., Truong, T. K., Deutsch, L. J. and Reed, I. S., "A Comparison of VLSI Architecture of Finite Field Multipliers Using Dual, Normal, or Standard Bases," IEEE Trans. Computers, Vol. 37, pp. 735–739 (1988).

[37] Weste, N. and Eshraghian, K., Principles of CMOS VLSI Design: A System Perspective, Reading, Mass.: Addison-Wesley (1985).

[38] http://www.st.com/stonline/books/pdf/docs/2006.pdf

[39] http://www.st.com/stonline/books/pdf/docs/1885.pdf

[40] Http://www.stm.com/stonline/products/literature/ds/1937/m74hc279.pdf

    Manuscript Received: Dec. 1, 2005

    Accepted: Oct. 2, 2006
