
  • Some new results on binary polynomial multiplication

    Murat Cenk

    Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey

    joint work with Anwar Hasan

    April 10, 2015

    Murat Cenk New results on binary polynomial multiplication 1 / 36

  • Outline

    1 Why do we need efficient multiplication algorithms in $\mathbb{F}_{2^n}$?

    2 Known methods

    3 Complexity tables for high speed cryptography

    4 Improving complexities further

    5 New results

  • Motivation: Why do we need efficient multiplication algorithms?

    Cryptographic systems must be efficient.

    $\mathbb{F}_{2^n}$ is well suited to implementation.

    In practical elliptic curve cryptography, $n$ ranges from 163 to 571, and one scalar multiplication requires several hundred field multiplications, so it is not efficient unless careful designs and efficient algorithms are used.

  • Example

    Bernstein [Crypto 2009]

    A binary Edwards curve over $\mathbb{F}_{2^{251}} = \mathbb{F}_2[t]/(t^{251} + t^7 + t^4 + t^2 + 1)$ is used. A single scalar multiplication requires 1266 field multiplications.

    Each multiplication needs 33974 bit operations: 33096 for the 251-bit polynomial multiplication and 878 for reducing the 501-bit product modulo the defining polynomial.

    In total, this is $1266 \times 33974 = 43011084$ bit operations. The other operations, such as additions, squarings, multiplications by a fixed field element, and conditional swaps, require 1668531 bit operations in total, which is small compared to multiplication.
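
    The arithmetic on this slide can be re-checked with a few lines of Python. This is only a sanity check of the quoted figures; the variable names are illustrative and not taken from Bernstein's paper.

```python
# Figures quoted on the slide (Bernstein, Crypto 2009).
poly_mul_ops = 33096    # 251-bit polynomial multiplication
reduction_ops = 878     # reduction of the 501-bit product
field_muls = 1266       # field multiplications per scalar multiplication
other_ops = 1668531     # additions, squarings, constant multiplications, swaps

per_field_mul = poly_mul_ops + reduction_ops
mul_total = field_muls * per_field_mul
share_other = other_ops / (mul_total + other_ops)

print(per_field_mul)    # 33974
print(mul_total)        # 43011084
print(f"share of the other operations: {share_other:.1%}")
```

    The last line shows that the non-multiplication operations account for only a few percent of the total, which is why the talk concentrates on the polynomial multiplication itself.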

  • Notation and model of computation

    $\mathbb{F}_{q^n}$ denotes the finite field with $q^n$ elements (where $q$ is a prime power), and $\mathbb{F}_q[X]$ denotes the ring of polynomials over $\mathbb{F}_q$.

    $M_q(n)$ is the minimum number of bit operations required to compute the product of two polynomials of degree less than $n$ over $\mathbb{F}_q$.

    $D_A$ and $D_X$ denote the delay of a bit-level multiplication and a bit-level addition, respectively.

    The cost metric for polynomial multiplication is the number of bit operations (bit additions and bit multiplications) required to multiply polynomials over $\mathbb{F}_2$ or $\mathbb{F}_4$. Since the computations are over fields of characteristic two, addition and subtraction coincide.

  • Two measures of the complexity of an algorithm.

    Arithmetic complexity: the total number of operations required to multiply polynomials, denoted by $M(n)$.

    Delay complexity: the depth of the corresponding arithmetic circuit, i.e., the length of its longest path, denoted by $D(n)$.

    [Figure: data-flow diagram of an arithmetic circuit.]


  • The computational complexity of multiplication

    Polynomial multiplication: Consider two polynomials of degree $n-1$,

    $$A(x) = \sum_{i=0}^{n-1} a_i x^i, \qquad B(x) = \sum_{i=0}^{n-1} b_i x^i.$$

    Schoolbook multiplication gives the product $C(x)$ of $A(x)$ and $B(x)$ as

    $$C(x) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j x^{i+j}.$$

    This algorithm requires $n^2$ multiplications and $(n-1)^2$ additions.

    Reduction: This step is generally easy, and its cost is less than $5n$.
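
    The operation counts above can be observed directly. The following is a minimal sketch, assuming coefficients over $\mathbb{F}_2$ stored as a Python list with the lowest degree first; the function name and the counters are illustrative.

```python
def schoolbook_mul_gf2(a, b):
    """Multiply two n-term polynomials over F_2 given as coefficient lists.

    Returns (product coefficients, number of ANDs, number of XORs).
    """
    n = len(a)
    assert len(b) == n
    c = [None] * (2 * n - 1)    # None marks a coefficient not yet written
    ands = xors = 0
    for i in range(n):
        for j in range(n):
            term = a[i] & b[j]  # bit multiplication
            ands += 1
            if c[i + j] is None:
                c[i + j] = term         # first term: no addition needed
            else:
                c[i + j] ^= term        # bit addition
                xors += 1
    return c, ands, xors


# (1 + x + x^3) * (1 + x^2 + x^3) over F_2
product, ands, xors = schoolbook_mul_gf2([1, 1, 0, 1], [1, 0, 1, 1])
print(product, ands, xors)
```

    For $n = 4$ the counters come out as $n^2 = 16$ multiplications and $(n-1)^2 = 9$ additions, matching the formula on the slide.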

  • Karatsuba Algorithm

    The Karatsuba algorithm has better complexity. For example, consider two 2-term polynomials,

    $$A(x) = a_0 + a_1 x, \qquad B(x) = b_0 + b_1 x.$$

    The Karatsuba algorithm computes the product $C(x) = A(x)B(x)$ as

    $$C(x) = a_1 b_1 x^2 + [(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1]x + a_0 b_0.$$

    Only three multiplications, $a_0 b_0$, $(a_0 + a_1)(b_0 + b_1)$ and $a_1 b_1$, and four additions are needed.

  • Asymptotic complexity of Karatsuba Algorithm

    Now let the polynomials have four terms (degree three):

    $$A(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 = \underbrace{a_0 + a_1 x}_{A_0} + \underbrace{x^2}_{y}\,\underbrace{(a_2 + a_3 x)}_{A_1} = A_0 + yA_1,$$

    $$B(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 = \underbrace{b_0 + b_1 x}_{B_0} + \underbrace{x^2}_{y}\,\underbrace{(b_2 + b_3 x)}_{B_1} = B_0 + yB_1.$$

    $$A(x)B(x) = A_1 B_1 y^2 + [(A_0 + A_1)(B_0 + B_1) - A_0 B_0 - A_1 B_1]y + A_0 B_0.$$

    For $2n$-term polynomials, we have

    $$M(2n) \le 3M(n) + 8n - 4, \qquad M(1) = 1,$$

    $$M(n) \le 7n^{\log_2 3} + 4n - 4 \approx 7n^{1.585} + 4n - 4.$$
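
    To see how the recurrence behaves, it can be unrolled numerically for powers of two and compared with the schoolbook count $n^2 + (n-1)^2$ and with the bound stated on the slide. This is a small sketch; the function name is illustrative.

```python
from math import log2


def karatsuba_cost(n):
    """Cost from M(2n) = 3 M(n) + 8n - 4 with M(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    half = n // 2
    return 3 * karatsuba_cost(half) + 8 * half - 4


rows = []
for k in range(9):
    n = 2 ** k
    schoolbook = n * n + (n - 1) ** 2
    slide_bound = 7 * n ** log2(3) + 4 * n - 4
    # Unrolling the recurrence exactly gives 7 * 3**k - 8 * 2**k + 2.
    rows.append((n, karatsuba_cost(n), schoolbook, slide_bound))

for n, cost, schoolbook, slide_bound in rows:
    print(n, cost, schoolbook, round(slide_bound))
```

    For small sizes the schoolbook count is lower; the recurrence value drops below it from $n = 16$ on in this table.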

  • Karatsuba algorithm (with Bernstein’s improvement)

    $$A(X) = A_0 + X^n A_1, \qquad B(X) = B_0 + X^n B_1,$$

    $$A_0 = \sum_{i=0}^{n-1} a_i X^i, \qquad A_1 = \sum_{i=0}^{n-1} a_{i+n} X^i,$$

    $$B_0 = \sum_{i=0}^{n-1} b_i X^i, \qquad B_1 = \sum_{i=0}^{n-1} b_{i+n} X^i.$$

    $$(A_0 + X^n A_1)(B_0 + X^n B_1) = (1 + X^n)(A_0 B_0 + X^n A_1 B_1) + X^n (A_0 + A_1)(B_0 + B_1).$$

  • The arithmetic and delay complexities of the algorithm are as follows:

    $$M_2(n+k) \le 2M_2(n) + M_2(k) + 3n + 4k - 3, \qquad n/2 \le k \le n,$$

    $$D_2(2n) \le D_2(n) + 3D_X,$$

    $$M_2(n) \le 6.5n^{1.58} - 7n + 1.5,$$

    $$D_2(n) \le 3\log_2(n) D_X + D_A.$$
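
    One level of the refined Karatsuba identity can be checked in a few lines. The sketch below assumes polynomials over $\mathbb{F}_2$ are stored as Python integers (bit $i$ is the coefficient of $X^i$); `clmul` and `refined_karatsuba` are illustrative names, and the three sub-products are computed with a plain shift-and-XOR loop rather than recursively.

```python
def clmul(a, b):
    """Carry-less product in F_2[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def refined_karatsuba(a, b, n):
    """Multiply two 2n-term polynomials over F_2 with one level of the identity."""
    mask = (1 << n) - 1
    a0, a1 = a & mask, a >> n
    b0, b1 = b & mask, b >> n
    inner = clmul(a0, b0) ^ (clmul(a1, b1) << n)    # A0*B0 + X^n * A1*B1
    middle = clmul(a0 ^ a1, b0 ^ b1)                # (A0 + A1)(B0 + B1)
    return inner ^ (inner << n) ^ (middle << n)     # (1 + X^n)*inner + X^n*middle


a, b = 0b101101, 0b110011
print(bin(refined_karatsuba(a, b, 3)), bin(clmul(a, b)))
```

    The saving comes from forming $A_0B_0 + X^nA_1B_1$ once and reusing it in both the low and the shifted position.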

  • Karatsuba-like improved 3-way split algorithm

    This algorithm was obtained by C., Negre and Hasan in 2012 using a technique similar to that employed in [Zhou-Michalik].

    $$P_0 = A_0B_0 = P_{0L} + P_{0H}X^n, \quad P_1 = A_1B_1 = P_{1L} + P_{1H}X^n, \quad P_2 = A_2B_2 = P_{2L} + P_{2H}X^n,$$

    $$P_3 = (A_1 + A_2)(B_1 + B_2) = P_{3L} + P_{3H}X^n,$$

    $$P_4 = (A_0 + A_1)(B_0 + B_1) = P_{4L} + P_{4H}X^n,$$

    $$P_5 = (A_0 + A_2)(B_0 + B_2) = P_{5L} + P_{5H}X^n,$$

    $$R_0 = P_{0H} + P_{1L}, \quad R_1 = R_0 + P_{0L}, \quad R_2 = R_1 + P_{4L}, \quad R_3 = P_{1H} + P_{2L}, \quad R_4 = R_1 + R_3,$$

    $$R_5 = P_{4H} + P_{5L}, \quad R_6 = R_4 + R_5, \quad R_7 = R_3 + P_{2H}, \quad R_8 = R_7 + R_0,$$

    $$R_9 = R_8 + P_{3L}, \quad R_{10} = R_9 + P_{5H}, \quad R_{11} = R_7 + P_{3H},$$

    $$C = P_{0L} + R_2X^n + R_6X^{2n} + R_{10}X^{3n} + R_{11}X^{4n} + P_{2H}X^{5n}.$$

    $$M_2(3n) \le 6M_2(n) + 18n - 6,$$

    $$M_2(2n+k) \le 5M_2(n) + M_2(k) + 12n + 6k - 6, \qquad n/2 < k \le n,$$

    $$M_2(2n+k) \le 5M_2(n) + M_2(k) + 13n + 4k - 5, \qquad k \le n/2,$$

    $$D_2(3n) \le D_2(n) + 4D_X,$$

    $$M_2(n) \le 5.8n^{1.63} - 6n + 1.2,$$

    $$D_2(n) \le 4\log_3(n)D_X + D_A.$$
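
    The reconstruction sequence $R_0, \dots, R_{11}$ can be checked mechanically. The sketch below assumes the split $A = A_0 + A_1X^n + A_2X^{2n}$ (and likewise for $B$), stores polynomials over $\mathbb{F}_2$ as Python integers, and takes $P_{iL}$ as the low $n$ coefficients and $P_{iH}$ as the remaining high coefficients of $P_i$. The function names are illustrative.

```python
def clmul(a, b):
    """Carry-less product in F_2[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def cnh_three_way(a, b, n):
    """One level of the Karatsuba-like 3-way split for 3n-term inputs over F_2."""
    mask = (1 << n) - 1
    a0, a1, a2 = a & mask, (a >> n) & mask, a >> (2 * n)
    b0, b1, b2 = b & mask, (b >> n) & mask, b >> (2 * n)

    def halves(p):
        return p & mask, p >> n     # (low n coefficients, high coefficients)

    p0l, p0h = halves(clmul(a0, b0))
    p1l, p1h = halves(clmul(a1, b1))
    p2l, p2h = halves(clmul(a2, b2))
    p3l, p3h = halves(clmul(a1 ^ a2, b1 ^ b2))
    p4l, p4h = halves(clmul(a0 ^ a1, b0 ^ b1))
    p5l, p5h = halves(clmul(a0 ^ a2, b0 ^ b2))

    r0 = p0h ^ p1l
    r1 = r0 ^ p0l
    r2 = r1 ^ p4l
    r3 = p1h ^ p2l
    r4 = r1 ^ r3
    r5 = p4h ^ p5l
    r6 = r4 ^ r5
    r7 = r3 ^ p2h
    r8 = r7 ^ r0
    r9 = r8 ^ p3l
    r10 = r9 ^ p5h
    r11 = r7 ^ p3h
    return (p0l ^ (r2 << n) ^ (r6 << (2 * n)) ^ (r10 << (3 * n))
            ^ (r11 << (4 * n)) ^ (p2h << (5 * n)))


a, b = 0b110101101, 0b101110011
print(cnh_three_way(a, b, 3) == clmul(a, b))
```

    Every $R_i$ is an $n$-coefficient value, so the final assembly of $C$ is a concatenation of blocks and needs no further additions.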

  • Bernstein 4-way split algorithm

    $$A = A_0 + A_1X^n + A_2X^{2n} + A_3X^{3n}, \qquad B = B_0 + B_1X^n + B_2X^{2n} + B_3X^{3n},$$

    where $A_j = \sum_{i=0}^{n-1} a_{i+nj}X^i$ and $B_j = \sum_{i=0}^{n-1} b_{i+nj}X^i$ for $j = 0, 1, 2, 3$. Bernstein's 4-way algorithm is the following:

    $$\begin{aligned} AB = {} & (1 + X^{2n})\bigl((1 + X^n)(A_0B_0 + X^nA_1B_1 + X^{2n}A_2B_2 + X^{3n}A_3B_3) \\ & + X^n(A_0 + A_1)(B_0 + B_1) + X^{3n}(A_2 + A_3)(B_2 + B_3)\bigr) \\ & + X^{2n}(A_0 + A_2 + (A_1 + A_3)X^n)(B_0 + B_2 + (B_1 + B_3)X^n). \end{aligned}$$

    $$M_2(4n) \le M_2(2n) + 6M_2(n) + 27n - 8,$$

    $$M_2(3n+k) \le M_2(2n) + 5M_2(n) + M_2(k) + 19n + 8k - 8, \qquad n/2 \le k \le n,$$

    $$D_2(4n) \le D_2(n) + 5D_X,$$

    $$M_2(n) \le 6.425n^{1.58} - 6.8n + 1.375,$$

    $$D_2(n) \le 5\log_4(n)D_X + D_A.$$
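
    The 4-way identity uses six products of $n$-term operands and one product of $2n$-term operands, which is where the terms $6M_2(n)$ and $M_2(2n)$ in the recurrence come from. A minimal check, assuming polynomials over $\mathbb{F}_2$ stored as Python integers and illustrative function names:

```python
def clmul(a, b):
    """Carry-less product in F_2[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def bernstein_four_way(a, b, n):
    """One level of the 4-way split for 4n-term inputs over F_2."""
    mask = (1 << n) - 1
    a0, a1, a2, a3 = (a & mask, (a >> n) & mask, (a >> (2 * n)) & mask, a >> (3 * n))
    b0, b1, b2, b3 = (b & mask, (b >> n) & mask, (b >> (2 * n)) & mask, b >> (3 * n))

    # A0B0 + X^n A1B1 + X^2n A2B2 + X^3n A3B3: four n-term products
    inner = (clmul(a0, b0) ^ (clmul(a1, b1) << n)
             ^ (clmul(a2, b2) << (2 * n)) ^ (clmul(a3, b3) << (3 * n)))
    inner ^= inner << n                                   # times (1 + X^n)
    inner ^= clmul(a0 ^ a1, b0 ^ b1) << n                 # fifth n-term product
    inner ^= clmul(a2 ^ a3, b2 ^ b3) << (3 * n)           # sixth n-term product
    outer = inner ^ (inner << (2 * n))                    # times (1 + X^2n)

    # single 2n-term product
    cross = clmul((a0 ^ a2) ^ ((a1 ^ a3) << n), (b0 ^ b2) ^ ((b1 ^ b3) << n))
    return outer ^ (cross << (2 * n))


a, b = 0xBEEF, 0xCAFE
print(bernstein_four_way(a, b, 4) == clmul(a, b))
```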

  • Interpolation method

    Let $C(x) = A(x)B(x)$ in $\mathbb{F}_{q^n}$ and $2n-1 \le q$, where $q$ is a prime power. Set $d = 2n-1$.

    Step 1 (Selection) Choose $d$ distinct points $w_0, w_1, \dots, w_{d-1}$.

    Step 2 (Evaluation) For $i = 0, 1, \dots, d-1$:
    (i) Compute $A(w_i)$ and $B(w_i)$.
    (ii) Compute the product $A(w_i) \cdot B(w_i)$.

    Step 3 (Interpolation) Compute the $(2n-1)$-term product polynomial $C(x)$ of $A(x)$ and $B(x)$ such that $C(w_i) = A(w_i) \cdot B(w_i)$.

  • Step 3 can be done explicitly by the following matrix equation.

    $$\underbrace{\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{d-1} \end{pmatrix}}_{C} = V^{-1} \cdot \underbrace{\begin{pmatrix} A(w_0)B(w_0) \\ A(w_1)B(w_1) \\ \vdots \\ A(w_{d-1})B(w_{d-1}) \end{pmatrix}}_{A(w_i)B(w_i)} \qquad (1)$$

    where

    $$V = \begin{pmatrix} 1 & w_0 & w_0^2 & \cdots & w_0^{d-1} \\ 1 & w_1 & w_1^2 & \cdots & w_1^{d-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & w_{d-1} & w_{d-1}^2 & \cdots & w_{d-1}^{d-1} \end{pmatrix}.$$

    The matrix $V$ is called the interpolation matrix. Since $V$ is a Vandermonde matrix built from distinct points, it is invertible.
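
    The three steps and equation (1) can be tried out over a small prime field. The sketch below assumes $q = p$ is prime (the slides allow any prime power), uses the points $0, 1, \dots, d-1$, and solves the Vandermonde system by Gauss-Jordan elimination instead of forming $V^{-1}$ explicitly. All function names are illustrative, and `pow(x, -1, p)` needs Python 3.8 or later.

```python
def poly_eval(coeffs, x, p):
    """Evaluate a polynomial (lowest degree first) at x modulo p by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def solve_vandermonde(points, values, p):
    """Solve V c = values modulo a prime p, where V[i][j] = points[i] ** j."""
    d = len(points)
    rows = [[pow(w, j, p) for j in range(d)] + [v] for w, v in zip(points, values)]
    for col in range(d):
        pivot = next(r for r in range(col, d) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)            # modular inverse of the pivot
        rows[col] = [(x * inv) % p for x in rows[col]]
        for r in range(d):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p for x, y in zip(rows[r], rows[col])]
    return [rows[i][d] for i in range(d)]


def interpolation_mul(a, b, p):
    """Product of two n-term polynomials over F_p via evaluation and interpolation."""
    n = len(a)
    d = 2 * n - 1
    assert len(b) == n and d <= p
    points = list(range(d))                                   # Step 1: selection
    values = [poly_eval(a, w, p) * poly_eval(b, w, p) % p     # Step 2: evaluation
              for w in points]
    return solve_vandermonde(points, values, p)               # Step 3: interpolation


# (1 + 2x + 3x^2)(4 + 5x + 6x^2) modulo 7
print(interpolation_mul([1, 2, 3], [4, 5, 6], 7))
```

    Over the integers this product is $4 + 13x + 28x^2 + 27x^3 + 18x^4$, so modulo 7 the coefficients are 4, 6, 0, 6, 4.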

  • Bernstein’s 3-way split formula

    $$A = A_0 + A_1Y + A_2Y^2, \qquad B = B_0 + B_1Y + B_2Y^2, \qquad Y = X^{n/3}.$$

    Bernstein used the five evaluation points $0$, $1$, $X$, $X+1$ and $\infty$.

    Evaluations

    $$P_0 = A_0B_0,$$

    $$P_1 = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2),$$

    $$P_2 = (A_0 + A_1X + A_2X^2)(B_0 + B_1X + B_2X^2),$$

    $$P_3 = \bigl((A_0 + A_1 + A_2) + (A_1X + A_2X^2)\bigr)\bigl((B_0 + B_1 + B_2) + (B_1X + B_2X^2)\bigr),$$

    $$P_4 = A_2B_2.$$

    Reconstruction

    $$C = U + P_4(X^{4n/3} + X^{n/3}) + \frac{W}{X^2 + X},$$

    $$U = P_0 + (P_0 + P_1)X^{n/3}, \qquad V = P_2 + (P_2 + P_3)(X^{n/3} + X),$$

    $$W = (U + V + P_4(X^4 + X))(X^{2n/3} + X^{n/3}).$$
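
    The evaluation and reconstruction formulas can be exercised on small inputs. In the sketch below, polynomials over $\mathbb{F}_2$ are Python integers, `m` plays the role of $n/3$ on the slide, and the five products are computed with a plain carry-less multiplication instead of recursively. The helper names are illustrative, and the division by $X^2 + X$ is done by ordinary long division.

```python
def clmul(a, b):
    """Carry-less product in F_2[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_divmod(num, den):
    """Quotient and remainder of polynomial division in F_2[X]."""
    quotient = 0
    den_bits = den.bit_length()
    while num.bit_length() >= den_bits:
        shift = num.bit_length() - den_bits
        quotient ^= 1 << shift
        num ^= den << shift
    return quotient, num


def bernstein_three_way(a, b, m):
    """One level of the 3-way formula for 3m-term inputs (m stands for n/3)."""
    mask = (1 << m) - 1
    a0, a1, a2 = a & mask, (a >> m) & mask, a >> (2 * m)
    b0, b1, b2 = b & mask, (b >> m) & mask, b >> (2 * m)
    a_one, b_one = a0 ^ a1 ^ a2, b0 ^ b1 ^ b2                   # values at Y = 1
    a_x = a0 ^ (a1 << 1) ^ (a2 << 2)                            # value at Y = X
    b_x = b0 ^ (b1 << 1) ^ (b2 << 2)

    p0 = clmul(a0, b0)
    p1 = clmul(a_one, b_one)
    p2 = clmul(a_x, b_x)
    p3 = clmul(a_one ^ (a1 << 1) ^ (a2 << 2),                   # values at Y = X + 1
               b_one ^ (b1 << 1) ^ (b2 << 2))
    p4 = clmul(a2, b2)

    t = 1 << m                                                  # X^(n/3)
    u = p0 ^ ((p0 ^ p1) << m)
    v = p2 ^ clmul(p2 ^ p3, t ^ 0b10)                           # times (X^(n/3) + X)
    w = clmul(u ^ v ^ clmul(p4, 0b10010), (t << m) ^ t)         # 0b10010 is X^4 + X
    quotient, remainder = gf2_divmod(w, 0b110)                  # 0b110 is X^2 + X
    assert remainder == 0                                       # the division is exact
    return u ^ clmul(p4, (1 << (4 * m)) ^ t) ^ quotient


a, b = 0b101100111, 0b111010001
print(bernstein_three_way(a, b, 3) == clmul(a, b))
```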


  • Complexities

    $$M(n) \le 3M(n/3) + 2M(n/3 + 2) + 35n/3 - 12,$$

    $$M(n/3 + 2) \le M(n/3) + 8n/3 + 4,$$

    $$M(n) \le 25.5n^{\log_3 5} - 25.5n + 1,$$

    $$M(n) = O(n^{1.46}).$$

  • Multi-evaluation and reconstruction data flow

    [Figure: data flow from the inputs $A_0$, $A_1$, $A_2$ ($n/3$ terms each) through the products $P_0, \dots, P_4$ and the division by $X^2 + X$ to the output $C$.]

  • Delay evaluations

    Reconstruction

    $$C = U + P_4(X^{4n/3} + X^{n/3}) + \frac{W}{X^2 + X}.$$

    Division by $X^2 + X$

    Divide $W$ by $X$, which is a shift of the coefficients of $W$.

    Divide $W/X$ by $X + 1$. The coefficients of $W/(X^2 + X)$ are

    $$w'_{n-j} = w_n + w_{n-1} + \cdots + w_{n-j+2}.$$

    The corresponding delay is $(n-2)D_\oplus$, where $D_\oplus$ is the delay of a bit addition.

    Delay complexity

    $$D(n) = \left(\frac{3n}{2} + 8\log_3(n) - \frac{3}{2}\right)D_\oplus + D_\otimes.$$
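
    The coefficient formula says that each quotient coefficient is a running sum of the higher coefficients of $W$, which is why the additions form a chain of length $n-2$. A minimal sketch, assuming $W$ has degree at most $n$, is exactly divisible by $X^2 + X$, and is stored as a coefficient list with the lowest degree first (the function name is illustrative):

```python
def divide_by_x2_plus_x(w):
    """Exact division of W by X^2 + X over F_2 using a running XOR."""
    n = len(w) - 1                      # degree bound of W
    assert w[0] == 0                    # W must be divisible by X
    shifted = w[1:]                     # W / X is a shift of the coefficients
    quotient = [0] * (n - 1)
    running = 0
    for k in range(n - 2, -1, -1):      # from the top coefficient downwards
        running ^= shifted[k + 1]       # adds w_{k+2} to the running sum
        quotient[k] = running           # w'_k = w_n + ... + w_{k+2}
    assert running ^ shifted[0] == 0    # remainder of the division by X + 1
    return quotient


# W = (X^2 + X)(1 + X + X^3) = X + X^3 + X^4 + X^5
print(divide_by_x2_plus_x([0, 1, 0, 1, 1, 1]))
```

    Each loop iteration depends on the previous one, so in a circuit these XORs cannot be done in parallel without extra gates; this is the source of the linear term in the delay.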

  • Three-way formula based on field extension

    C., Negre and Hasan proposed a different approach.

    $\mathbb{F}_4 = \mathbb{F}_2[\alpha]/(\alpha^2 + \alpha + 1) = \{0, 1, \alpha, \alpha + 1\}$. Evaluate the polynomials at $0$, $1$, $\alpha$, $\alpha + 1$ and $\infty$.

    Evaluations

    $$P_0 = A_0B_0,$$

    $$P_1 = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2),$$

    $$P_2 = (A_0 + A_2 + \alpha(A_1 + A_2))(B_0 + B_2 + \alpha(B_1 + B_2)),$$

    $$P_3 = (A_0 + A_1 + \alpha(A_1 + A_2))(B_0 + B_1 + \alpha(B_1 + B_2)),$$

    $$P_4 = A_2B_2.$$

    Reconstruction

    $$\begin{aligned} C = {} & (P_0 + X^{n/3}P_4)(1 + X^n) + (P_1 + (1 + \alpha)(P_2 + P_3))(X^{n/3} + X^{2n/3} + X^n) \\ & + \alpha(P_2 + P_3)X^n + P_2X^{2n/3} + P_3X^{n/3}. \end{aligned}$$

  • Complexities

    $$M_{\mathbb{F}_2}(n) \le 2M_{\mathbb{F}_4}(n/3) + 3M_{\mathbb{F}_2}(n/3) + 29n/3 - 12,$$

    $$M_{\mathbb{F}_4}(n) \le 5M_{\mathbb{F}_4}(n/3) + 58n/3 - 21,$$

    $$M_{\mathbb{F}_4}(n) \le 30.75n^{\log_3 5} - 29n + 5.25,$$

    $$M_{\mathbb{F}_2}(n) \le 30.75n^{\log_3 5} - 9.67n\log_3(n) - 30.5n + 0.75.$$

  • Multi-evaluation and reconstruction data flow

    [Figure: data flow from the inputs $A_0$, $A_1$, $A_2$ ($n/3$ terms each) through multiplications by $\alpha$ and $1 + \alpha$ and the products $P_0, \dots, P_4$ to the output $C$.]

  • Delay evaluations

    $$D_{\mathbb{F}_2}(n) \le 7D_\oplus + D_{\mathbb{F}_4}(n/3),$$

    $$D_{\mathbb{F}_4}(n) \le 9D_\oplus + D_{\mathbb{F}_4}(n/3),$$

    $$D_{\mathbb{F}_4}(n) \le 9\log_3(n)D_\oplus + D_\otimes,$$

    $$D_{\mathbb{F}_2}(n) \le (9\log_3(n) - 2)D_\oplus + D_\otimes.$$

  • Complexity comparisons

    CNH complexities

    $$M(n) \le 30.75n^{\log_3 5} - 9.67n\log_3(n) - 30.5n + 0.75,$$

    $$D(n) \le (9\log_3(n) - 2)D_\oplus + D_\otimes.$$

    Bernstein's complexities

    $$M(n) \le 25.5n^{\log_3 5} - 25.5n + 1,$$

    $$D(n) \le \left(\frac{3n}{2} + 8\log_3(n) - \frac{3}{2}\right)D_\oplus + D_\otimes.$$


  • A new split method for Bernstein’s 3-way split algorithm

    We compute $(XA(X))(XB(X))$ instead of $A(X)B(X)$ by using Bernstein's 3-way split algorithm.

    $$XA(X) = A_0 + A_1X^{n+1} + A_2X^{2n+2},$$

    $$XB(X) = B_0 + B_1X^{n+1} + B_2X^{2n+2}.$$

    This method splits $3n$-term polynomials as $(n, n+1, n-1)$ rather than $(n, n, n)$.

    $$M_2(3n) \le M_2(n) + 2M_2(n+1) + M_2(n+2) + M_2(n-1) + 35n - 12,$$

    $$M_2(3n-2) \le 2M_2(n) + M_2(n+1) + 2M_2(n-1) + 35n - 13.$$

  • Improved 5-way split algorithm

    $$A = A_0 + A_1X^n + A_2X^{2n} + A_3X^{3n} + A_4X^{4n},$$

    $$B = B_0 + B_1X^n + B_2X^{2n} + B_3X^{3n} + B_4X^{4n}.$$

    $$m_1 = A_0B_0, \quad m_2 = A_1B_1, \quad m_3 = A_2B_2, \quad m_4 = A_3B_3, \quad m_5 = A_4B_4,$$

    $$m_6 = (A_0 + A_1)(B_0 + B_1), \quad m_7 = (A_0 + A_2)(B_0 + B_2), \quad m_8 = (A_2 + A_4)(B_2 + B_4),$$

    $$m_9 = (A_3 + A_4)(B_3 + B_4), \quad m_{10} = (A_0 + A_2 + A_3)(B_0 + B_2 + B_3),$$

    $$m_{11} = (A_1 + A_2 + A_4)(B_1 + B_2 + B_4),$$

    $$m_{12} = (A_0 + A_3 + A_1 + A_4)(B_0 + B_3 + B_1 + B_4),$$

    $$m_{13} = (A_0 + A_1 + A_2 + A_3 + A_4)(B_0 + B_1 + B_2 + B_3 + B_4).$$

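  • Let $C = \sum_{i=1}^{10} U_i X^{(i-1)n}$. Here $p_{2i-1}$ and $p_{2i}$ presumably denote the low and high halves of the product $m_i$, so that $m_i = p_{2i-1} + p_{2i}X^n$.

    $$t_1 = p_1 + p_2, \quad t_2 = t_1 + p_3, \quad t_3 = t_2 + p_{11}, \quad t_4 = p_4 + p_5, \quad t_5 = p_{12} + p_{13},$$

    $$t_6 = t_4 + t_5, \quad t_7 = t_2 + t_6, \quad t_8 = t_1 + t_4, \quad t_9 = p_6 + p_7, \quad t_{10} = t_8 + t_9,$$

    $$t_{11} = t_{10} + p_9, \quad t_{12} = p_{14} + p_{15}, \quad t_{13} = t_{11} + t_{12}, \quad t_{14} = p_{19} + p_{23}, \quad t_{15} = t_{14} + p_{25},$$

    $$t_{16} = t_{13} + t_{15}, \quad t_{17} = p_8 + p_9, \quad t_{18} = t_{17} + p_{10}, \quad t_{19} = t_{18} + p_{18}, \quad t_{20} = p_6 + p_7,$$

    $$t_{21} = t_{18} + t_{20}, \quad t_{22} = p_{16} + p_{17}, \quad t_{23} = t_{21} + t_{22}, \quad t_{24} = t_{23} + t_3, \quad t_{25} = p_{20} + p_{21},$$

    $$t_{26} = p_{25} + p_{26}, \quad t_{27} = p_{19} + p_{24}, \quad t_{28} = t_{25} + t_{26}, \quad t_{29} = t_{28} + t_{27}, \quad t_{30} = t_{29} + t_{24},$$

    $$t_{31} = t_7 + t_{19}, \quad t_{32} = t_{28} + t_{31}, \quad t_{33} = p_{22} + p_{23}, \quad t_{34} = t_{32} + t_{33}, \quad t_{35} = t_{11} + p_1,$$

    $$t_{36} = t_{35} + p_{10}, \quad t_{37} = t_{36} + t_{12}, \quad t_{38} = t_{37} + p_{22}, \quad t_{39} = t_{38} + p_{24}, \quad t_{40} = t_{39} + p_{26},$$

    $$U_1 = p_1, \quad U_2 = t_3, \quad U_3 = t_7, \quad U_4 = t_{16}, \quad U_5 = t_{30}, \quad U_6 = t_{34}, \quad U_7 = t_{40},$$

    $$U_8 = t_{23}, \quad U_9 = t_{19}, \quad U_{10} = p_{10}.$$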

  • The asymptotic complexities of this algorithm are the following:

    $$M_2(n) \le 13M_2(n/5) + 56n/5 - 18, \qquad M_2(1) = 1,$$

    $$M_2(n) \le 6.5n^{\log_5 13} - 7n + 1.5 \approx 6.5n^{1.59} - 7n + 1.5,$$

    $$D_2(n) \le D_2(n/5) + 12D_X, \qquad D_2(1) = D_A,$$

    $$D_2(n) \le 12\log_5(n)D_X + D_A.$$

  • New improved 3-way algorithm

    $$P_0 = A_0B_0, \qquad P_1 = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2), \qquad P_4 = A_2B_2,$$

    $$P_2 = (A_0 + A_2 + \alpha(A_1 + A_2))(B_0 + B_2 + \alpha(B_1 + B_2)) = P_{2,0} + \alpha P_{2,1},$$

    $$C = P_4X^{4n} + (P_0 + P_1 + P_{2,1})X^{3n} + (P_{2,0} + P_1 + P_{2,1})X^{2n} + (P_4 + P_1 + P_{2,0})X^n + P_0.$$

    The asymptotic complexities of this algorithm are the following:

    $$M_2(n) \le 3M_2(n/3) + M_4(n/3) + 20n/3 - 5, \qquad M_2(1) = 1,$$

    $$M_2(n) \le 15.125n^{1.46} - 14.25n - 2.4274\log_3(n) + 0.125,$$

    $$D_2(n) \le D_4(n/3) + 8D_X, \qquad D_2(1) = D_A,$$

    $$D_2(n) \le 10\log_3(n)D_X + D_A.$$
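
    The reconstruction needs only one product over $\mathbb{F}_4$, because the inputs have coefficients in $\mathbb{F}_2$. The sketch below checks one level of the formula for $3n$-term inputs stored as Python integers. It assumes an element $u_0 + \alpha u_1$ of $\mathbb{F}_4[X]$ is held as the pair $(u_0, u_1)$ with $\alpha^2 = \alpha + 1$, and it computes $P_2$ with four plain carry-less products for clarity rather than with a recursive $\mathbb{F}_4$ multiplier. The function names are illustrative.

```python
def clmul(a, b):
    """Carry-less product in F_2[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def ch_three_way(a, b, n):
    """One level of the improved 3-way formula for 3n-term inputs over F_2."""
    mask = (1 << n) - 1
    a0, a1, a2 = a & mask, (a >> n) & mask, a >> (2 * n)
    b0, b1, b2 = b & mask, (b >> n) & mask, b >> (2 * n)

    p0 = clmul(a0, b0)
    p1 = clmul(a0 ^ a1 ^ a2, b0 ^ b1 ^ b2)
    p4 = clmul(a2, b2)

    # P2 = (u0 + alpha*u1)(v0 + alpha*v1) with alpha^2 = alpha + 1
    u0, u1 = a0 ^ a2, a1 ^ a2
    v0, v1 = b0 ^ b2, b1 ^ b2
    u1v1 = clmul(u1, v1)
    p20 = clmul(u0, v0) ^ u1v1                       # part without alpha
    p21 = clmul(u0, v1) ^ clmul(u1, v0) ^ u1v1       # coefficient of alpha

    return ((p4 << (4 * n)) ^ ((p0 ^ p1 ^ p21) << (3 * n))
            ^ ((p20 ^ p1 ^ p21) << (2 * n)) ^ ((p4 ^ p1 ^ p20) << n) ^ p0)


a, b = 0b110100101, 0b100111011
print(ch_three_way(a, b, 3) == clmul(a, b))
```

    Compared with the field-extension formula that evaluates at both $\alpha$ and $\alpha + 1$, the product $P_3$ never has to be computed here, which matches the single $M_4(n/3)$ term in the recurrence.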

  • Comparison of complexities

    Table: Cost of multiplication

    | Algorithm | Split | $M(n)$ | Delay |
    |---|---|---|---|
    | Bernstein | 2 | $6.5n^{1.58} + O(n)$ | $3\log_2(n)D_X$ |
    | Bernstein | 3 | $25.5n^{1.46} + O(n)$ | $(1.5n + O(\log_3(n)))D_X$ |
    | CNH | 3 | $5.8n^{1.63} + O(n)$ | $4\log_3(n)D_X$ |
    | CNH | 3 | $30.25n^{1.46} + O(n)$ | $10\log_3(n)D_X$ |
    | CH | 3 | $15.125n^{1.46} + O(n)$ | $10\log_3(n)D_X$ |
    | Bernstein | 4 | $6.425n^{1.58} + O(n)$ | $5\log_4(n)D_X$ |
    | CH | 5 | $6.5n^{1.59} + O(n)$ | $11\log_5(n)D_X$ |


  • New results

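    Number of bit operations for multiplying two $n$-bit binary polynomials:

    | $n$ | Previous | New |
    |---|---|---|
    | 9 | 132 | 126 |
    | 15 | 329 | 317 |
    | 17 | 414 | 407 |
    | 18 | 456 | 438 |
    | 19 | 502 | 498 |
    | 21 | 602 | 596 |
    | 22 | 641 | 632 |
    | 23 | 678 | 676 |
    | 24 | 704 | 702 |
    | 25 | 800 | 791 |
    | 26 | 856 | 853 |
    | 27 | 922 | 912 |
    | 163 | 16923 | 16828 |
    | 233 | 29354 | 29156 |
    | 251 | 33096 | 32604 |
    | 256 | 34079 | 33397 |
    | 283 | 38735 | 38432 |
    | 407 | 67374 | 66931 |
    | 408 | 67582 | 67137 |
    | 409 | 67753 | 67284 |
    | 571 | 112569 | 111621 |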

  • Thank you for your attention.
