Efficient FPGA Modular Multiplication and Exponentiation Architectures Using Digit Serial Computation

Gustavo Sutter, Jean-Pierre Deschamps, José Luis Imaña


Agenda

• Introduction

– Modular exponentiation

• Background

– Montgomery multiplication and exponentiation

• The proposed architecture

– Precomputing q, digit serial and carry save adder

• FPGA Results

– multiplication and exponentiation

• Result comparison

– For multiplication and exponentiation

• Conclusions


Introduction

• Modular exponentiation is the core operation of many public-key cryptosystems.

• Montgomery's modular multiplication algorithm is normally used since no trial division is necessary and the critical path is reduced by using carry-save addition (CSA).

• In this paper, the Montgomery multiplication is optimized and architectures are proposed to perform the Least-Significant-Bit (LSB) first and the Most-Significant-Bit (MSB) first algorithms.


Introduction (II)

• The architecture presented here has the following distinctive characteristics:

– Use of a digit-serial approach for Montgomery multiplication.

– Conversion of the carry-save representation of the intermediate multiplication result using carry-skip addition, which reduces the critical path with a small area-speed penalty.

– Precomputation of the quotient value in each Montgomery iteration in order to increase the operating frequency.


Background: Montgomery’s algorithm

• The Montgomery product computes Z = X·Y·R^(-1) mod M instead of Z = X·Y mod M. The drawback is the need to convert operands into and out of Montgomery’s domain, which is almost negligible in some particular applications such as exponentiation.


Algorithm 1 – modified Montgomery product

    p := 0;
    for i in 0 .. k-1 loop
        q(i) := (p(0) + x(i)*y(0)) mod 2;
        p := (p + x(i)*y + q(i)*m)/2;
    end loop;
    if p >= m then z := p-m; else z := p; end if;
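A minimal Python sketch of Algorithm 1, checked against the definition Z = X·Y·R^(-1) mod M with R = 2^k. The function name and the test values are illustrative only; the sketch assumes m is odd, m < 2^k and 0 <= x, y < m.

```python
def montgomery_product(x, y, m, k):
    """Bit-serial Montgomery product: x * y * 2^(-k) mod m.

    Assumes m is odd, m < 2**k and 0 <= x, y < m.
    """
    p = 0
    for i in range(k):
        xi = (x >> i) & 1
        q = (p + xi * y) & 1            # q(i): chosen so that the sum below is even
        p = (p + xi * y + q * m) >> 1   # exact division by 2
    # p stays below 2*m, so one conditional subtraction is enough
    return p - m if p >= m else p


# Illustrative check against the definition (pow with a negative exponent
# needs Python 3.8 or later).
k, m = 8, 239
x, y = 100, 200
r_inv = pow(2, -k, m)
assert montgomery_product(x, y, m, k) == (x * y * r_inv) % m
```

Because m is odd, adding q·m flips the least significant bit whenever needed, which is why the division by 2 never loses information.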


Background: Montgomery’s algorithm (II)

• In the previous algorithm the main contributing factor to the delay is the carry propagation resulting from the very large operand additions. This can be avoided by using Carry Save Adders (CSA).


Algorithm 2 – Montgomery product, carry-save addition

    pc := 0; ps := 0;
    for i in 0 .. k-1 loop
        q := (pc(0) + ps(0) + x(i)*y(0)) mod 2;
        (pc, ps) := (pc + ps + x(i)*y + q*m)/2;
    end loop;
    p := pc + ps;
    if p >= m then z := p-m; else z := p; end if;
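The carry-save variant can be modelled in Python with bitwise operations on integers. This is only a sketch: the helper `csa` and the use of two chained 3:2 compressors per iteration are assumptions made for illustration, not a description of the authors' circuit. The point it shows is that no carry is propagated inside the loop; the only full addition is the final pc + ps.

```python
def csa(a, b, c):
    """3:2 carry-save adder: returns (carry, sum) with carry + sum == a + b + c."""
    return ((a & b) | (a & c) | (b & c)) << 1, a ^ b ^ c


def montgomery_product_csa(x, y, m, k):
    """Montgomery product keeping the accumulator as a (pc, ps) pair.

    Assumes m is odd, m < 2**k and 0 <= x, y < m.
    """
    pc = ps = 0
    for i in range(k):
        a = y if (x >> i) & 1 else 0       # x(i)*y
        q = (pc ^ ps ^ a) & 1              # (pc(0) + ps(0) + x(i)*y(0)) mod 2
        c1, s1 = csa(pc, ps, a)            # first compressor level
        c2, s2 = csa(c1, s1, m if q else 0)  # second level adds q*m
        # c2 + s2 is even and c2 has a zero LSB, so both halves shift exactly
        pc, ps = c2 >> 1, s2 >> 1
    p = pc + ps                            # single carry-propagate addition
    return p - m if p >= m else p


k, m = 8, 239
assert montgomery_product_csa(100, 200, m, k) == (100 * 200 * pow(2, -k, m)) % m
```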


Background: The Exponentiation

• Modular exponentiation (Y^X mod M) is usually done with repeated modular multiplications (MSB or LSB first).

• If the operands are in Montgomery’s domain, then additional pre- and post-processing steps are needed.


Algorithm 3 – base 2 mod m exponentiation, MSB-first using Montgomery product

    e := exp_k;
    ty := mp(y, exp_2k);
    for i in 1 .. ke loop
        e := mp(e, e);
        if x(ke-i) = 1 then e := mp(e, ty);
        end if;
    end loop;
    z := mp(e, 1);

Algorithm 4 – base 2 mod m exponentiation, LSB-first using Montgomery product

    e := exp_k;
    ty := mp(y, exp_2k);
    for i in 0 .. ke-1 loop
        if x(i) = 1 then e := mp(e, ty);
        end if;
        ty := mp(ty, ty);
    end loop;
    z := mp(e, 1);
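Both exponentiation schemes can be sketched in Python on top of a bit-serial Montgomery product. The sketch assumes exp_k = 2^k mod m (the value 1 in Montgomery's domain) and exp_2k = 2^(2k) mod m (used to move y into the domain); the slides name these constants but do not define them. Function names are illustrative, and in both variants the accumulator e is the value converted back at the end.

```python
def mp(a, b, m, k):
    """Bit-serial Montgomery product a*b*2^(-k) mod m (m odd, a, b < m < 2**k)."""
    p = 0
    for i in range(k):
        ai = (a >> i) & 1
        q = (p + ai * b) & 1
        p = (p + ai * b + q * m) >> 1
    return p - m if p >= m else p


def mod_exp_msb(y, x, m, k, ke):
    """y^x mod m, scanning the ke exponent bits from the most significant one."""
    e = pow(2, k, m)                      # assumed exp_k
    ty = mp(y, pow(2, 2 * k, m), m, k)    # assumed exp_2k; ty = y in Montgomery's domain
    for i in range(1, ke + 1):
        e = mp(e, e, m, k)                # always square
        if (x >> (ke - i)) & 1:
            e = mp(e, ty, m, k)           # multiply only when the bit is 1
    return mp(e, 1, m, k)                 # convert e back out of Montgomery's domain


def mod_exp_lsb(y, x, m, k, ke):
    """y^x mod m, scanning the ke exponent bits from the least significant one."""
    e = pow(2, k, m)
    ty = mp(y, pow(2, 2 * k, m), m, k)
    for i in range(ke):
        if (x >> i) & 1:
            e = mp(e, ty, m, k)           # independent of the squaring below
        ty = mp(ty, ty, m, k)
    return mp(e, 1, m, k)


assert mod_exp_msb(7, 45, 239, 8, 6) == pow(7, 45, 239)
assert mod_exp_lsb(7, 45, 239, 8, 6) == pow(7, 45, 239)
```

In the LSB-first loop the two products of an iteration read the same old value of ty and do not depend on each other, which is what allows them to run on two multipliers at once.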


THE PROPOSED ARCHITECTURE: Modular Multiplication

• To speed up the Montgomery product, precompute q(i+1) and use Carry Save Adders.


Algorithm 6 – modified Montgomery product, carry-save addition, q precomputed

    pc := 0; ps := 0;
    q := x(0)*y(0);
    for i in 0 .. k-1 loop
        qn := ((pc(1:0) + ps(1:0) + x(i)*y(1:0)
               + q*m(1:0))/2 + x(i+1)*y(0)) mod 2;
        (pc, ps) := (pc + ps + x(i)*y + q*m)/2;
        q := qn;
    end loop;
    p := pc + ps;
    if p >= m then z := p-m; else z := p; end if;
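The quotient precomputation can be checked with a small Python model. The sketch below is an illustration under the same assumptions as before (m odd, m < 2^k, 0 <= x, y < m); the helper `csa` and the two-level compression are hypothetical modelling choices. The next quotient bit is derived from only the two low-order bits of the current operands, so it no longer depends on the full update of (pc, ps).

```python
def csa(a, b, c):
    """3:2 carry-save adder: carry + sum == a + b + c."""
    return ((a & b) | (a & c) | (b & c)) << 1, a ^ b ^ c


def montgomery_product_q_pre(x, y, m, k):
    """Carry-save Montgomery product with the next q computed one step ahead."""
    pc = ps = 0
    q = (x & 1) & (y & 1)                  # q(0) = x(0)*y(0), since p starts at 0
    for i in range(k):
        a = y if (x >> i) & 1 else 0       # x(i)*y
        b = m if q else 0                  # q*m
        x_next = (x >> (i + 1)) & 1        # x(i+1); zero on the last iteration
        # Two-bit slice of the sum: its bit 1 is the LSB of the next p
        low = (pc & 3) + (ps & 3) + (a & 3) + (b & 3)
        qn = ((low >> 1) + x_next * (y & 1)) & 1
        c1, s1 = csa(pc, ps, a)
        c2, s2 = csa(c1, s1, b)
        pc, ps = c2 >> 1, s2 >> 1
        q = qn
    p = pc + ps
    return p - m if p >= m else p


k, m = 8, 239
assert montgomery_product_q_pre(100, 200, m, k) == (100 * 200 * pow(2, -k, m)) % m
```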

[Figure: Montgomery cell. Two rows of half adders (HA) and full adders (FA) combine pc, ps, y and m into new_pc(k..0) and new_ps(k..0) through the intermediate signals bc and bs; a small XOR/FA block labelled "next q computation" derives q(i+1) from x(i), x(i+1), q(i), y(1:0), m1 and the two low-order bits of pc and ps.]


THE PROPOSED ARCHITECTURE: Modular Multiplication (II)

• To further optimize:

– Use digit serial computation.

– Use a carry-skip adder for the final addition.
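As a rough illustration of digit-serial operation, the Python sketch below consumes d bits of x per modelled clock cycle by chaining d bit-level Montgomery steps. How the actual d-digit cell is built is not specified in the slides, so the chaining is an assumption; the function name is hypothetical, and k is assumed to be a multiple of d.

```python
def montgomery_product_digit_serial(x, y, m, k, d):
    """Montgomery product processing d bits of x per cycle.

    Returns (z, cycles). Assumes m odd, m < 2**k, 0 <= x, y < m, k % d == 0.
    """
    assert k % d == 0
    p = 0
    cycles = 0
    for base in range(0, k, d):
        # In hardware these d steps would be combinational logic in one clock period
        for j in range(d):
            xi = (x >> (base + j)) & 1
            q = (p + xi * y) & 1
            p = (p + xi * y + q * m) >> 1
        cycles += 1
    z = p - m if p >= m else p
    return z, cycles


z, cycles = montgomery_product_digit_serial(100, 200, 239, 8, 4)
assert z == (100 * 200 * pow(2, -8, 239)) % 239
assert cycles == 2
```

The loop takes k/d cycles, which is consistent with the main-cycle counts of 512, 256, 128 and 64 reported for k = 512 and d = 1, 2, 4, 8; the price is a longer clock period per cycle.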


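[Figure: digit-serial multiplier datapath. A k-bit shift-d register (load, shift) supplies the digits x(d.(i+1)+1 .. d.i) to a d-digit Montgomery cell fed by y and m; two (k+1)-bit registers and a one-bit register (clear, ce) hold pc, ps and q(i); a final-additions block (load, ce_p) combines pc, ps and m to produce p.]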


THE PROPOSED ARCHITECTURE: Modular Multiplication (III)

• The carry-skip adder is much faster than a carry-propagate adder, but its delay can still exceed the clock period of the datapath. The solution used is to wait w = T/ad cycles for this final step to finish.


TABLE I. DELAY IN NS AND AREA IN LUTS FOR CARRY SKIP COMPARED AGAINST RIPPLE CARRY ADDERS IN VIRTEX 5

        ripple-carry     S=32             S=64
Bits    Delay   Area     Delay   Area     Delay   Area     SpeedUp   Area Overhead
512     11.8    512      4.4     716      5.5     644      267%      40%
1024    26.8    1024     5.3     1452     5.9     1332     505%      42%
2048    56.5    2048     6.6     2924     6.3     2708     896%      32%
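A behavioural Python model may help to see why a carry-skip adder shortens the critical path. It is a sketch of the general technique with block size s, not the authors' FPGA implementation, and the function name is illustrative. When every bit position of a block propagates, the block's carry-out equals its carry-in, so the carry can bypass the block instead of rippling through it.

```python
def carry_skip_add(a, b, n, s):
    """Add two n-bit integers block by block (block size s, n % s == 0)."""
    assert n % s == 0
    mask = (1 << s) - 1
    carry = 0
    result = 0
    for lo in range(0, n, s):
        ab = (a >> lo) & mask
        bb = (b >> lo) & mask
        total = ab + bb + carry
        result |= (total & mask) << lo
        if (ab ^ bb) == mask:
            # All bits propagate: the incoming carry skips the block unchanged
            carry_out = carry
        else:
            # Otherwise the carry-out is fixed by the block's own operand bits
            carry_out = total >> s
        carry = carry_out
    return result | (carry << n)


assert carry_skip_add(0x00FF, 0x0001, 16, 4) == 0x00FF + 0x0001
assert carry_skip_add(0xFFFF, 0xFFFF, 16, 4) == 0xFFFF + 0xFFFF
```

With the delays listed for 512 bits, 11.8 ns against 4.4 ns is a ratio of about 2.7, in line with the 267% speed-up shown for that row.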


THE PROPOSED ARCHITECTURE: Modular Exponentiation

• We have used the traditional MSB-first and LSB-first algorithms.

– In MSB-first, the average number of Montgomery products (MP) performed per exponent bit is around 1.5, and the worst case is 2.

– LSB-first, in turn, includes at most two Montgomery products per exponent bit. In this case both products can be executed in parallel, so the computation time per bit is that of a single product.

– The values exp_k and exp_2k required by both algorithms are computed using an SRT reducer.
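The product counts above can be made concrete with a short Python sketch. It counts sequential Montgomery-product steps in the main loop only, ignoring the pre- and post-processing products, and assumes that the LSB-first variant has two multipliers available; the function name is illustrative.

```python
def sequential_mp_steps(x, ke):
    """Return (msb_first_steps, lsb_first_steps) for a ke-bit exponent x."""
    ones = bin(x).count("1")
    msb_first = ke + ones   # one squaring per bit plus one multiply per 1 bit
    lsb_first = ke          # multiply and squaring of one bit run side by side
    return msb_first, lsb_first


assert sequential_mp_steps(0b1011, 4) == (7, 4)
assert sequential_mp_steps(0b1111, 4) == (8, 4)   # worst case: 2 products per bit
```

For a random exponent about half of the bits are 1, which gives the average of roughly 1.5 products per bit for MSB-first.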


FPGA Implementation Results

• The design entry is behavioral VHDL, except for the FPGA carry-skip adder.


TABLE II. VIRTEX 5 IMPLEMENTATION RESULTS OF PROPOSED DIGIT SERIAL MONTGOMERY’S MULTIPLIERS

k      d    FF      6-LUTs   main cycles   w cycles   Period (ns)   Total Time (ns)
512    1    2581    4130     512           4          1.7           920.5
512    2    2583    6178     256           3          2.6           663.8
512    4    2584    10276    128           2          4.5           585.0
512    8    2584    18494    64            1          8.4           549.3
1024   1    5142    8227     1024          4          1.8           1936.8
1024   2    5144    12323    512           3          2.6           1319.9
1024   4    5145    20527    256           2          4.5           1161.0
1024   8    5145    36937    128           1          8.5           1090.1
2048   1    10263   16417    2048          5          1.8           3867.9
2048   2    10265   24613    1024          4          2.5           2634.8
2048   4    10266   41007    512           2          4.5           2313.0
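The total time appears to follow from the other columns as (main cycles + w cycles) × period. This relation is inferred from the table rather than stated in the slides; it reproduces the d = 4 rows exactly, while the remaining rows differ slightly, presumably because the listed periods are rounded. A small Python check, with an illustrative function name:

```python
def total_time_ns(main_cycles, w_cycles, period_ns):
    """Assumed relation: every main and wait cycle lasts one clock period."""
    return (main_cycles + w_cycles) * period_ns


# d = 4 rows for k = 512, 1024 and 2048
assert total_time_ns(128, 2, 4.5) == 585.0
assert total_time_ns(256, 2, 4.5) == 1161.0
assert total_time_ns(512, 2, 4.5) == 2313.0
```

The check also makes the trade-off visible: larger digits cut the cycle count by d but lengthen the period, so the gain in total time flattens as d grows.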


FPGA Implementation Results (II)


TABLE III. VIRTEX 5 IMPLEMENTATION OF EXPONENTIATIONS

k = ke   Meth   d    FF      LUTs    Period (ns)   avg T (ms)   Thrg (kb/s)
512      MSB    1    4144    5696    1.8           0.72         713.6
512      MSB    2    4145    7745    2.5           0.50         1023.6
512      MSB    4    4145    11845   4.5           0.45         1133.0
512      MSB    8    4145    20041   8.5           0.43         1199.6
512      LSB    2    6728    13923   2.5           0.33         1535.4
1024     MSB    1    8242    11330   1.9           2.98         343.2
1024     MSB    2    8243    15427   2.6           2.03         503.6
1024     MSB    4    8243    23623   4.5           1.79         572.5
1024     MSB    8    8243    40011   8.4           1.68         608.7
1024     LSB    2    13387   27750   2.6           1.38         744.6
2048     MSB    1    16436   22595   1.9           12.00        170.7
2048     MSB    2    16437   30790   2.5           7.91         259.0
2048     MSB    4    16437   47176   4.5           7.12         287.8
2048     LSB    1    26699   39012   2.5           10.53        194.6


Performance Comparison: Modular Multipliers

• The multiplier circuits were reimplemented in Virtex 2 devices using Xilinx ISE 10.1.03.


TABLE V. COMPARISON OF MODULAR MULTIPLIERS IN FPGAS

k      Circuit        Device     Slices   T (ns)   Time (µs)   Thrg (Mb/s)   AxD
512    [9]            Virtex E   2972     10.5     16.17       31.7          48.1
512    [3] (5 to 2)   Virtex 2   5170     7.9      4.06        126.2         21.0
512    [3] (4 to 2)   Virtex 2   5782     8.2      4.21        121.6         24.4
512    [6]            Virtex 2   2902     8.2      4.26        120.3         12.3
512    [4]            Virtex 2   4029     4.5      2.33        220.2         9.4
512    Prop D=1       Virtex 2   2469     3.6      1.89        270.5         4.7
512    Prop D=2       Virtex 2   3497     4.8      1.25        409.3         4.4
512    Prop D=4       Virtex 2   5538     8.6      1.13        452.2         6.3
512    Prop D=8       Virtex 2   9446     15.6     1.03        497.4         9.7
512    Prop D=4       Virtex 5   2936     4.5      0.59        862.0         -
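The throughput and area-delay columns can be approximately reproduced from the slice count and the multiplication time. The two relations below are inferred from the table values, not stated in the slides: throughput as operand bits divided by time, and AxD as slices times time in µs, scaled by 1/1000. Function names are illustrative.

```python
def throughput_mbps(k_bits, time_us):
    """Bits per microsecond, which is numerically Mb/s."""
    return k_bits / time_us


def area_delay(slices, time_us):
    """Assumed AxD figure: slices x microseconds, divided by 1000."""
    return slices * time_us / 1000.0


# Proposed multiplier, k = 512, on Virtex 2
assert round(area_delay(2469, 1.89), 1) == 4.7                # D=1
assert round(area_delay(3497, 1.25), 1) == 4.4                # D=2
assert abs(throughput_mbps(512, 1.89) - 270.5) < 1.0   # table lists 270.5
```

Small differences against the table are expected because the listed times are rounded to two decimals.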


Performance Comparison: Modular Multipliers (II)


TABLE V. COMPARISON OF MODULAR MULTIPLIERS IN FPGAS (continued)

k      Circuit        Device     Slices   T (ns)   Time (µs)   Thrg (Mb/s)   AxD
1024   [9]            Virtex E   5706     10.5     32.17       31.8          183.6
1024   [3] (5 to 2)   Virtex 2   10332    9.8      10.09       101.5         104.2
1024   [3] (4 to 2)   Virtex 2   11520    9.0      9.22        111.1         106.2
1024   [6]            Virtex 2   4512     8.8      9.03        113.4         40.7
1024   [4]            Virtex 2   8000     4.5      4.63        221.1         37.1
1024   Prop D=1       Virtex 2   4923     3.7      3.88        262.7         19.2
1024   Prop D=2       Virtex 2   6982     4.8      2.48        410.8         17.4
1024   Prop D=4       Virtex 2   11079    8.4      2.19        471.7         24.1
1024   Prop D=8       Virtex 2   19247    15.5     2.02        508.2         38.8
1024   Prop D=4       Virtex 5   5702     4.5      1.18        868.5         -
2048   [3] (5 to 2)   Virtex 2   20986    11.1     22.76       90.0          477.5
2048   [3] (4 to 2)   Virtex 2   23108    11.0     22.59       90.6          522.1
2048   Prop D=1       Virtex 2   9831     3.8      7.79        263.0         76.6
2048   Prop D=2       Virtex 2   13954    4.8      4.94        414.8         68.9
2048   Prop D=4       Virtex 2   22201    8.4      4.34        471.3         95.7
2048   Prop D=2       Virtex 5   6837     2.56     2.63        777.3         -


Performance Comparison: Modular Exponentiators


TABLE VII. COMPARISON FOR 1024 BITS EXPONENTIATORS.

Ref            Meth   FPGA       Area (slices)   Period (ns)   w   avg C (x1000)   avg T (ms)   Thrg (kb/s)
[10] (r2)      LSB    XC4K       4865            19.2          -   2122            40.74        25.1
[10] (r16)     LSB    XC4K       6683            21.9          -   546             11.95        85.7
[3] (4 to 2)   MSB    Virtex 2   26136           10.3          -   1054            10.85        94.3
[4]            LSB    Virtex 2   12537           6.6           -   1579            10.35        98.9
Prop D=2       LSB    Virtex 2   9298            4.8           6   798             3.83         267.3
Prop D=4       LSB    Virtex 2   13346           8.4           3   399             3.35         305.5
Prop D=2       MSB    Virtex 2   16280           4.8           6   532             2.55         401.0
Prop D=4       LSB    Virtex 5   6217            4.5           2   397             1.79         572.5
Prop D=2       MSB    Virtex 5   7303            2.6           3   529             1.38         744.6


Conclusions

• The key point for exponentiation is an efficient multiplication. Montgomery's multiplication is widely used since it avoids trial division.

• The distinctive characteristics of the present work are:

– Precomputation of the quotient value (q) in each Montgomery iteration in order to increase the operating frequency.

– Use of a digit-serial computation approach for Montgomery's multiplication.

– Intermediate exponentiation values are kept in binary format instead of carry-save.

– Final conversion of the carry-save representation of each intermediate MP using carry-skip addition.


Conclusions (II)

• Result comparisons show that, to the authors' knowledge, the proposed architecture outperforms all previously published results in terms of throughput and also in area-delay product.

• For 512-, 1024- and 2048-bit multipliers, the proposed design is about twice as fast as the fastest reported result. The comparison for 1024-bit exponentiation in FPGA also shows a factor-of-two improvement for similar or less area.


Questions…