An Expandable Montgomery Modular Multiplication Processor
Adnan Abdul-Aziz Gutub, Alaaeldin A. M. Amin
Computer Engineering Department
King Fahd University of Petroleum & Minerals
Dhahran, SAUDI ARABIA
Presentation Outline
• Introduction (RSA cryptographic system)
• The Systolic Multiplier
• The Basic Cell
• Montgomery Product (MP) Algorithm
• Expandability of the Parallel Design
• The Expandable MP Hardware
• Conclusion
RSA Public Key Cryptosystem
• Developed in 1978 by Rivest, Shamir & Adleman
• Its security is based on the integer factoring problem
• The most popular method:
– simple to understand & implement
– same algorithm for encryption & decryption
– can also be used for digital signatures
[Figure: RSA concept. A plaintext message is encrypted by RSA with the encryption key, producing ciphertext; RSA decryption with a different decryption key recovers the plaintext.]
RSA Algorithm
For encryption: $C = M^{E} \bmod N$
For decryption: $M = C^{D} \bmod N$
M is the message, C is the ciphertext, (E,N) is the public encryption key, and (D,N) is the private decryption key.
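As a minimal illustration, the two formulas can be exercised with Python's built-in three-argument `pow`. The toy parameters p = 61, q = 53 and E = 17 are assumptions chosen for readability and are far too small to be secure; `pow(E, -1, phi)` for the modular inverse needs Python 3.8 or later.

```python
# Toy RSA with tiny, insecure illustrative parameters (not from the slides).
p, q = 61, 53
N = p * q                      # modulus N = 3233
phi = (p - 1) * (q - 1)        # Euler's totient of N = 3120
E = 17                         # public exponent, gcd(E, phi) = 1
D = pow(E, -1, phi)            # private exponent: E*D = 1 (mod phi)

def encrypt(M: int) -> int:
    """C = M^E mod N using the public key (E, N)."""
    return pow(M, E, N)

def decrypt(C: int) -> int:
    """M = C^D mod N using the private key (D, N)."""
    return pow(C, D, N)

M = 65
C = encrypt(M)
print(C, decrypt(C))           # 2790 65
```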
RSA Security
• Security depends on the key size: a larger key size gives a more secure system.
[Figure: modular exponentiation is computed by repeated squaring, which runs at slow speed in software; each step needs a modular multiplication, which can be done in hardware by multiply/divide, by add/subtract, by logarithmic-speed methods, or by Montgomery's method.]
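A minimal sketch of modular exponentiation by repeated squaring (the left-to-right binary method, assumed here as one common variant) makes the split concrete: the loop itself is simple control, while every step is a modular multiplication. `mod_mul` is a hypothetical stand-in for the hardware multiplier.

```python
def mod_mul(a: int, b: int, n: int) -> int:
    """Hypothetical stand-in for the hardware modular multiplier."""
    return (a * b) % n

def mod_exp(m: int, e: int, n: int) -> int:
    """m^e mod n by left-to-right repeated squaring."""
    result = 1 % n
    for bit in bin(e)[2:]:                       # most significant bit first
        result = mod_mul(result, result, n)      # square
        if bit == "1":
            result = mod_mul(result, m, n)       # multiply
    return result

print(mod_exp(65, 17, 3233) == pow(65, 17, 3233))  # True
```

Every iteration issues one or two modular multiplications, so the speed of `mod_mul` dominates the cost of the exponentiation.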
RSA Implementations
[Figure: a taxonomy of modular-arithmetic approaches to RSA: lookup tables, residue number systems, non-systolic hardware designs, systolic arrays, Montgomery's modular multiplication algorithm, and a logarithmic-speed modular multiplication algorithm. Works cited in the figure: G. Alia 1991; C. Wu 1994; E. F. Brickell 1983; S. E. Eldridge & C. D. Walter 1993; C. D. Walter 1993, 1994 and 1995. Some approaches are annotated "lower speed, not very useful for our problem".]
Montgomery's Method
• Introduced by P. Montgomery in 1985
• Modular multiplication without trial division
• Can be implemented in VLSI
• Requires some pre-computations
• Suitable for large-number multiplication
Montgomery Modular Multiplication
To compute $Z = XY \bmod N$:
1. Pre-computation: $R$, $R^{-1}$, $N'$
2. Mapping X and Y to Montgomery's domain: $x = XR \bmod N$, $y = YR \bmod N$
3. Montgomery product (the OBJECTIVE): $z = MP(x,y) = xyR^{-1} \bmod N$
4. Mapping z from Montgomery's domain back to normal: $Z = MP(1,z)$
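The four steps can be checked with a small reference model. This sketch assumes an odd N and $R = 2^k$ with k the bit length of N, and computes MP directly from its definition $xyR^{-1} \bmod N$ (so, unlike the hardware, it still divides), only to confirm that mapping in, multiplying, and mapping out with $MP(1,z)$ returns $XY \bmod N$. `mp_reference` and `modmul_via_montgomery` are hypothetical names.

```python
def mp_reference(x: int, y: int, N: int, R: int) -> int:
    """MP(x, y) = x*y*R^-1 mod N, computed straight from the definition."""
    R_inv = pow(R, -1, N)              # exists because gcd(R, N) = 1
    return (x * y * R_inv) % N

def modmul_via_montgomery(X: int, Y: int, N: int) -> int:
    """Z = X*Y mod N via the four steps (N odd)."""
    R = 1 << N.bit_length()            # step 1: R = 2^k > N
    x = (X * R) % N                    # step 2: x = X R mod N
    y = (Y * R) % N                    #         y = Y R mod N
    z = mp_reference(x, y, N, R)       # step 3: z = X Y R mod N
    return mp_reference(1, z, N, R)    # step 4: Z = MP(1, z) = X Y mod N

print(modmul_via_montgomery(1234, 2345, 3233) == (1234 * 2345) % 3233)  # True
```

The factor R introduced by the two mappings is cancelled once by step 3 and once by step 4, which is why the result leaves the Montgomery domain intact.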
Montgomery's Algorithm
To compute: $XY \bmod N$
Pre-computations (performed by software):
• choose $R = 2^k$, where k is the number of bits of N, so that $R > N$ and $\gcd(R,N) = 1$ (N odd)
• compute $R^{-1}$ such that $R^{-1}R \bmod N = 1$ and $0 < R^{-1} < N$
• compute $N'$ such that $N' = -N^{-1} \bmod R$ and $0 < N' < R$
• compute $x = X \cdot R \bmod N$
• compute $y = Y \cdot R \bmod N$
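The software pre-computations can be sketched as follows, assuming an odd modulus N > 1 and Python 3.8+ for modular inverses via `pow(a, -1, m)`; `precompute` and `to_montgomery` are illustrative names, not from the slides.

```python
def precompute(N: int):
    """Software pre-computations for Montgomery multiplication modulo N."""
    if N < 3 or N % 2 == 0:
        raise ValueError("need an odd N > 1 so that gcd(R, N) = 1 for R = 2^k")
    k = N.bit_length()               # k = number of bits of N, so R = 2^k > N
    R = 1 << k
    R_inv = pow(R, -1, N)            # R^-1 * R mod N = 1, 0 < R^-1 < N
    N_prime = (-pow(N, -1, R)) % R   # N' = -N^-1 mod R, 0 < N' < R
    return k, R, R_inv, N_prime

def to_montgomery(X: int, R: int, N: int) -> int:
    """Map X into the Montgomery domain: x = X * R mod N."""
    return (X * R) % N

k, R, R_inv, N_prime = precompute(3233)
x = to_montgomery(1234, R, 3233)
y = to_montgomery(2345, R, 3233)
print(k, R)                          # 12 4096
```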
Montgomery's Algorithm: $MP(x,y) = xyR^{-1} \bmod N$
Montgomery's modular multiplication, MP(x,y):
• $P = x \cdot y$
• $U = P + N \cdot (P \cdot N' \bmod R)$
• $S = U / R$
• $MP = S$ if $S < N$, else $MP = S - N$
[Figure: with $R = 2^k$, a number A is split into a high part $A_2$ and a low k-bit part $A_1$, so that $A \bmod R = A_1$ and $A / R = A_2$.]
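Because R is a power of two, MP needs no division by N: "mod R" keeps the low k bits and "/R" is a right shift. A minimal sketch, assuming $0 \le x, y < N < R$ and a precomputed $N'$ (the function name is hypothetical). One final subtraction suffices because $U < N^2 + RN < 2RN$, so $S < 2N$.

```python
def montgomery_product(x: int, y: int, N: int, k: int, N_prime: int) -> int:
    """MP(x, y) = x*y*R^-1 mod N with R = 2^k, assuming 0 <= x, y < N < R."""
    mask = (1 << k) - 1                    # "mod R": keep the low k bits (A1)
    P = x * y                              # P = x.y
    U = P + N * ((P * N_prime) & mask)     # U = P + N.(P.N' mod R), divisible by R
    S = U >> k                             # S = U/R: keep the high bits (A2), exact
    return S - N if S >= N else S          # S < 2N, so one subtraction is enough

N = 3233
k = N.bit_length()                         # k = 12, R = 4096
R = 1 << k
N_prime = (-pow(N, -1, R)) % R             # N' = -N^-1 mod R
print(montgomery_product(1234, 2345, N, k, N_prime)
      == (1234 * 2345 * pow(R, -1, N)) % N)  # True
```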
Numbers Representation
A k-bit number A is stored as l words of b bits each ($k = l \cdot b$), written from the most significant word down: $A_{l-1}, A_{l-2}, \ldots, A_2, A_1, A_0$.
$$A = A_0 + A_1 2^{b} + A_2 2^{2b} + \cdots + A_{l-2} 2^{(l-2)b} + A_{l-1} 2^{(l-1)b} = \sum_{i=0}^{l-1} A_i 2^{ib}$$
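The word representation maps directly onto shifts and masks. This sketch (hypothetical helper names, words stored least significant first) splits A into l words of b bits and rebuilds it from the sum above.

```python
def to_words(A: int, l: int, b: int) -> list:
    """Split a k = l*b bit number into words A_0 ... A_{l-1}, least significant first."""
    if A < 0 or A >> (l * b):
        raise ValueError("A must be a non-negative number of at most l*b bits")
    mask = (1 << b) - 1
    return [(A >> (i * b)) & mask for i in range(l)]

def from_words(words: list, b: int) -> int:
    """A = sum over i of A_i * 2^(i*b)."""
    return sum(w << (i * b) for i, w in enumerate(words))

print(to_words(0xBEEF, l=4, b=4))  # [15, 14, 14, 11]
```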
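The Systolic Multiplier
[Figure: block diagram of the systolic multiplier computing $p = x \cdot y + q$ under a common clock. Input streams: x = 0, ..., 0, x_{l-1}, x_{l-2}, ..., x_1, x_0; y = 0, ..., 0, y_{l-1}, y_{l-2}, ..., y_1, y_0; q = 0, q_{2l-1}, q_{2l-2}, ..., q_1, q_0. Control input z = 0, ..., 0, 1. Output: p_0, p_1, ..., p_{2l-1}, p_{2l}, beginning with the first product digit.]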
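Functionally, the multiplier emits the 2l + 1 digits of $p = x \cdot y + q$ one per step, least significant first; 2l + 1 digits are enough because $xy + q < 2^{2lb+1}$. The sketch below is a behavioral model only, under the assumption that digit order is what matters: it does not model the individual cells, their timing, or the z control stream, and `serial_mult_add` is a hypothetical name. Each step sums one column of partial products plus the incoming q digit and the carry, emits the low b bits, and carries the rest.

```python
def serial_mult_add(x_digits, y_digits, q_digits, b):
    """Behavioral model of p = x*y + q with base-2^b digits, LSD first.

    x_digits and y_digits hold l digits, q_digits holds 2l digits. Returns the
    2l + 1 output digits p_0 ... p_2l in the order they would be emitted.
    Not cycle-accurate: cells, delays and the z control stream are not modeled.
    """
    l = len(x_digits)
    mask = (1 << b) - 1
    p_digits, carry = [], 0
    for j in range(2 * l + 1):                      # one output digit per step
        column = carry + (q_digits[j] if j < len(q_digits) else 0)
        for i in range(max(0, j - l + 1), min(j, l - 1) + 1):
            column += x_digits[i] * y_digits[j - i]  # partial products of weight 2^(j*b)
        p_digits.append(column & mask)               # emit p_j
        carry = column >> b                          # pass the rest to digit j + 1
    return p_digits

# 0x2F * 0x13 + 0x5 with l = 2 digits of b = 4 bits
print(serial_mult_add([0xF, 0x2], [0x3, 0x1], [0x5, 0, 0, 0], 4))  # [2, 8, 3, 0, 0]
```

The printed digits read 0x382 = 898 = 47 * 19 + 5.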
Building the Systolic Multiplier
[Figure: the multiplier as a linear chain of cells (cell 1, cell 2, ..., cell l/2+1) on a common clock. The streams x, y, q and the control stream z enter at the ports x_in, y_in, q_in and z_in, the product leaves at p_out, and a constant 0 feeds the far end of the chain.]
• (l/2 + 1) cells are required for l-digit multiplication
Expandable Systolic Multiplier
[Figure: an expandable multiplier for l digits (cell 1 to cell l/2+1) exposes, at its last cell, the ports z_out, x_out, y_out, q_out and p_in. Connecting these to the ports z_in, x_in, y_in, q_in and p_out of a second l-digit multiplier on the same clock, with the last p_in tied to 0, forms a multiplier for 2l digits.]
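A quick count based only on the (l/2 + 1) cell figure, with l assumed even: chaining m chips of l digits supplies m(l/2 + 1) cells, while a dedicated ml-digit multiplier needs ml/2 + 1, so the chain has m - 1 cells to spare. The helper names are hypothetical, and the slides do not say how the spare cells are used.

```python
def cells_required(l: int) -> int:
    """Cells for an l-digit systolic multiplier, per the slide: l/2 + 1 (l even)."""
    if l % 2:
        raise ValueError("the l/2 + 1 formula assumes an even digit count l")
    return l // 2 + 1

def cells_in_chain(l: int, chips: int) -> int:
    """Total cells when `chips` l-digit multipliers are chained (hypothetical helper)."""
    return chips * cells_required(l)

# Chaining m chips of l = 8 digits targets 8m digits.
for m in range(1, 5):
    print(m, cells_in_chain(8, m), cells_required(8 * m))
```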
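Systolic Montgomery Reduction (J. Sauerbrey 1992)
$N'_0 = -N^{-1} \bmod 2^b$;
$p = x \cdot y$;
for i = 0 to l-1
  $v_i = p_i \cdot N'_0 \bmod 2^b$;
  $p = p + v_i N 2^{bi}$;
end for;
return $p / R$;
Note that $x, y < N < R$, where $R = 2^{l \cdot b}$ and $\gcd(R, N) = 1$; $p_i$ denotes the i-th b-bit digit of the current p.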
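A word-level sketch of the loop above, assuming Python integers for p and an odd N < R. It uses only the one-word constant $N'_0$: adding $v_i N 2^{bi}$ zeroes digit i of p because $N'_0 N \equiv -1 \pmod{2^b}$, so after l steps p is an exact multiple of R. The slide returns p/R, which lies in [0, 2N); the final conditional subtraction is added here (an assumption, mirroring the MP step) to land in [0, N). The function name is hypothetical.

```python
def systolic_montgomery_reduction(x: int, y: int, N: int, l: int, b: int) -> int:
    """Return x*y*R^-1 mod N with R = 2^(l*b), one b-bit digit per iteration."""
    base = 1 << b
    mask = base - 1
    R = 1 << (l * b)
    if N % 2 == 0 or not (0 <= x < N < R and 0 <= y < N):
        raise ValueError("need odd N with 0 <= x, y < N < R")
    N0_prime = (-pow(N, -1, base)) % base   # N'_0 = -N^-1 mod 2^b
    p = x * y                               # p = x.y
    for i in range(l):
        p_i = (p >> (i * b)) & mask          # i-th b-bit digit of the current p
        v_i = (p_i * N0_prime) & mask        # v_i = p_i . N'_0 mod 2^b
        p += (v_i * N) << (i * b)            # p = p + v_i N 2^(bi); digit i becomes 0
    t = p >> (l * b)                         # p/R, exact because the low l digits are 0
    return t - N if t >= N else t            # t < 2N; assumed final subtraction

R = 1 << 16
print(systolic_montgomery_reduction(1234, 2345, 3233, l=2, b=8)
      == (1234 * 2345 * pow(R, -1, 3233)) % 3233)   # True
```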
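[Figure: the reduction mapped onto the systolic multiplier ($p = x \cdot y + q$), used l times. Input streams: x = 0, ..., 0, N_{l-1}, ..., N_0; y = X, 0, ..., 0, N'_0, ..., N'_0; q = 0, p_{2l-1}, p_{2l-2}, ..., p_1, p_0; control z = 0, ..., 0, 1. Output: 0, ..., 0, t_0, t_1, ..., t_{l-1}.]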
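VHDL Implementation of the Systolic Montgomery Reduction for l = 4
[Figure: signal flow graph for l = 4 built from systolic multiplier cells, each computing x·y + q and passing x and y on, plus cells producing x·y mod 2^b (2^b is the base of the numbers x and y), linked by delay elements T and 2T (a delay of two clock cycles). Inputs: N, the stream 0, 0, 0, N'_0, and p(0); output: p(4). The slide labels this design "Correct".]
Clarification for l = 4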
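• $N'_0 = -N^{-1} \bmod 2^b$;
• $p^{(0)} = x \cdot y$;
• for i = 0 to l-1
• $v_i = p^{(i)}_i \cdot N'_0 \bmod 2^b$;
• $p^{(i+1)} = p^{(i)} + v_i N 2^{bi}$;
• end for;
• return $p^{(l)} / R$;
Here $p^{(i)}$ is the value of p after i iterations and $p^{(i)}_i$ is its i-th b-bit digit.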
[Figure: the l = 4 signal flow graph annotated with the intermediate values p(0), p(1), p(2), p(3), p(4) and the digits v_0, v_1, v_2, v_3 computed along it.]
p(0) and N'_0 are precomputed.
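To see what each iteration contributes, this sketch traces $v_i$ and $p^{(i+1)}$ for l = 4 and b = 4, using illustrative values (N = 0xC351 is an arbitrary odd modulus below R = 2^16, not from the slides). After iteration i the low i + 1 digits of $p^{(i+1)}$ are zero, which is why $p^{(l)}/R$ is exact; `reduction_trace` is a hypothetical helper.

```python
def reduction_trace(x: int, y: int, N: int, l: int, b: int):
    """Return [(i, v_i, p^(i+1)), ...] for the reduction loop (hypothetical helper)."""
    mask = (1 << b) - 1
    N0_prime = (-pow(N, -1, 1 << b)) % (1 << b)         # N'_0 = -N^-1 mod 2^b
    p = x * y                                           # p^(0), precomputed
    trace = []
    for i in range(l):
        v = (((p >> (i * b)) & mask) * N0_prime) & mask  # v_i = p_i^(i) N'_0 mod 2^b
        p = p + ((v * N) << (i * b))                     # p^(i+1) = p^(i) + v_i N 2^(bi)
        trace.append((i, v, p))
    return trace

# l = 4 words of b = 4 bits; N odd and below R = 2^16 (illustrative values).
N, x, y = 0xC351, 0x1234, 0xABCD
for i, v, p in reduction_trace(x, y, N, l=4, b=4):
    # True when p^(i+1) is divisible by 2^(b(i+1)), i.e. its low i+1 digits are zero
    print(i, v, hex(p), p % (1 << (4 * (i + 1))) == 0)
```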
Expandability of the Parallel Implementation
[Figure: the basic design for l digits and the expanded designs for 2l and 3l digits.]
Projection
[Figure: projection of the l = 4 signal flow graph of systolic multiplier cells and T/2T delays (inputs N, 0, 0, 0, N'_0 and p(0); output p(4)).]
The Serial MP Design
[Figure: a single systolic multiplier ($p = xy + q$) used serially. In iteration i it receives z(i), N(i), v(i) and p(i) and produces p(i+1), z(i+1) and N(i+1), using registers of 2l+1 and 2l digits, a 2T delay, and a multiplexer selecting between N'_0 and 0.]
LOOP: i = 0 to l-1; p(0) is precomputed.
For Expandability
• Allow input data to have more digits
• Allow the systolic multiplier to be expandable
• Allow registers to be expandable
• Multiplexing
The Expandable MP System
[Figure: a basic chip for l digits takes the input data and produces the results; adding one chip for each additional l digits gives designs for 2l, 3l and 4l digits.]
VHDL Modeling
• All three designs were modeled in VHDL.
• Structural level: similar to real hardware.
• The designs are fully parametrized in terms of:
– 'l': number of words
– 'b': number of bits in each word
– 't': time delay of each gate
Conclusion
An expandable Montgomery modular multiplication processor was designed, modeled in VHDL, and analyzed.
Systolic Montgomery Reduction: signal flow graph for l = 4
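[Figure: input timing for the l = 4 signal flow graph over times 0 to 6: one stream carries ..., 0, 0, 0, 0, N'_0 and another carries ..., 0, 0, N_3, N_2, N_1, N_0, applied to systolic multiplier cells computing x·y + q and x·y mod 2^b (2^b is the base of x and y).]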
Montgomery's Algorithm: $MP(x,y) = xyR^{-1} \bmod N$
Loop i = 0: $v_0 = p^{(0)}_0 \cdot N'_0 \bmod 2^b$; $p^{(1)} = p^{(0)} + v_0 N 2^{0}$
Loop i = 1: $v_1 = p^{(1)}_1 \cdot N'_0 \bmod 2^b$; $p^{(2)} = p^{(1)} + v_1 N 2^{b}$
Loop i = 2: $v_2 = p^{(2)}_2 \cdot N'_0 \bmod 2^b$; $p^{(3)} = p^{(2)} + v_2 N 2^{2b}$
RSA Designs
[Figure: RSA designs surveyed. Not enough information, or not practical for expandability: The RSA cryptographic processor (H. Sedlak 1988); VLSI implementation of public key encryption algorithms (G. Orton 1987); Fast RSA-Hardware (F. Hoornaert 1988). Well-defined proposed implementations: VICTOR: An Efficient RSA Hardware (H. Orup 1990); A High Speed RSA Processor (F. Al-Tuwaijry 1991); A Modular Systolic Exponentiation Unit (J. Sauerbrey 1992), marked as suitable for expandability and the logical start.]