10
Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić [email protected] Advisors: Veljko Milutinović, Jelena Popović-Božović, and Ivan Popović School of Electrical Engineering, University of Belgrade, 2013.

Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić [email protected] Advisors: Veljko Milutinović, Jelena Popović-Božović,

Embed Size (px)

Citation preview

Page 1: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Implementation of the RSA Algorithm

on a Dataflow Architecture

Nikola Bežanić[email protected]

Advisors: Veljko Milutinović, Jelena Popović-Božović, and Ivan Popović

School of Electrical Engineering, University of Belgrade, 2013.

Page 2: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Introduction

Case study area: Public key cryptography acceleration

Problem: RSA implementation on MaxelerExisting problem solutions: NoneSummary: Under review (IPSI) Approach:

Accelerate multiplications Analyze usability

Conclusions: Multiplication speedup: 70% (28% total) Usability: Picture encryption

2

2/10

Page 3: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

RSA

Montgomery method: n -> rr=2sw -> power of 2Montgomery product

(MonPro): modulo r arithmetic

------------------------------------

3

3/10

bits

nMC emod

m1. . .ms-1 m0

32w function ModExp(M, e, n) {n is odd} Step 1. Compute n’. Step 2. Mm := M ∙ r mod n

Step 3. xm := 1 ∙ r mod n Step 4. for i = k – 1 down to 0 do Step 5. xm := MonPro(xm, xm)

Step 6. if ei = 1 then xm := MonPro(Mm, xm)

Step 7. x := MonPro(xm, 1) Step 8. return x

swr 21'1 nnrr

e1. . .ek-1 e0

1 bit

Page 4: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Montgomery product

a and b are big numbers

Breaking them to digits: bs-1…b1b0

as-1…a1a0

Processing on a word basis

4

4/10

function MonPro(a, b)Step 1. t := a ∙ bStep 2. m := t ∙ n’ mod rStep 3. u := (t + m ∙ n) / rStep 4. if u ≥ n then return u – n else return u

for i = 0 to s-1 C := 0 for j = 0 to s-1 (C, S) := t[i + j] + a[j]∙b[i] + C t[i + j] := S t[i + S] := C

Page 5: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Montgomery product: Step 15

5/10

for i = 0 to s-1 C := 0 for j = 0 to s-1 (C, S) := t[i + j] + a[j]∙b[i] + C t[i + j] := S t[i + S] := C

X

a[j] b[i]

lowhiProduct:

32 bits

CPU

Page 6: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Dataflow multiplier6

6/10

X

a[j] b[i]

lowhiProduct:

32 bits

X

Stream a Constant b0

Stream x

CPU

Dataflow engine (DFE)

Stream y

Page 7: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Dataflow multiplier: Pipeline problem

Next iteration (next constant b1) => new DFE runNew DFE run => new pipeline fill-up overhead1024-bits key requires only 32 digits (32 bits

each)Not enough to fill-up the pipelineResult: CPU time < DFE time !Solution:

Work on blocks of data Do not use constants, rather use a stream Stream has redundant values: acts as a const.

7

7/10

X

a0

b0

Stream xStream y

a1a2a3

b1

a0a1a2a3

Page 8: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Dataflow multiplier: Blocks of data8

8/10

b0 x a < = >

Block 0

Block 1

Block z-1

Big streams for each run

Page 9: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

Results

Using blocks pipeline is fullUsing one multiplier speed up is 10% for RSASpeedup is 70% for multiplication using 4

multiplersIt leads to 28% for complete RSA (Amdahl’s law)Future work

Deal with carry at DFE or Overlap carry propagation at CPU and multiplication at

DFE

9

9/10

Page 10: Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović,

10

10/10

The End

function ModExp(M, e, n) { n is odd } Step 1. Compute n’. Step 2. Mm := M ∙ r mod n

Step 3. xm := 1 ∙ r mod n Step 4. for i = k – 1 down to 0 do Step 5. xm := MonPro(xm, xm)

Step 6. if ei = 1 then xm := MonPro(Mm, xm)

Step 7. x := MonPro(xm, 1) Step 8. return x