Generating RSA Primes
Jim Townsend
CSE633
Final Results
Fall 2010
Importance
• Encryption is harder to secure than ever
• RSA is an important standard in public-key encryption
• Developed in 1977, it began with relatively small keys: 128- and 256-bit keys
• Current standard: 1024-bit keys (roughly 310 decimal digits)
• Math on numbers this large is very CPU intensive
How Keys are Generated
• Use the Miller-Rabin algorithm
• Tests the candidate against a small number of bases
• Only a probabilistic method
• Probability that a single pass exposes a composite number: at least .75
• Repeated passes are used to eliminate false positives
• 16 repetitions: chance of a false positive is at most (1-.75)^16 = 2^-32, about 2.3e-10
• Runtime: O(ln(N)^4)
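A minimal Python sketch of the repeated Miller-Rabin test described above. The slides do not show the original code, so the function name `is_probable_prime`, the `rounds` parameter, and the choice of random bases are assumptions made for illustration.

```python
import random

def is_probable_prime(n, rounds=16, rng=random):
    """Miller-Rabin test: False means composite, True means probably prime."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)   # assumed: random base in [2, n - 2]
        x = pow(a, d, n)              # modular exponentiation
        if x in (1, n - 1):
            continue                  # this base does not expose n as composite
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a is a witness: n is definitely composite
    return True                       # n passed every round
```

A `False` result is always correct. Only a `True` result carries the small error probability, which shrinks by a factor of at least 4 with every extra round.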
Sieve of Eratosthenes
• Decided to implement a small sieve on the numbers before using the Miller-Rabin algorithm
• Using all the prime numbers less than 1000 (168 numbers), see if any of those evenly divide the number first
• Decreased serial runtime by more than half
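The pre-filter can be sketched in Python as follows. The names `primes_below`, `SMALL_PRIMES`, and `survives_small_primes` are hypothetical. This is one plausible reading of the slide, not the original implementation.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly below limit."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"          # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            # Cross off p*p, p*p + p, ... in one slice assignment.
            flags[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, f in enumerate(flags) if f]

SMALL_PRIMES = primes_below(1000)     # the 168 primes below 1000

def survives_small_primes(n):
    """False if a prime below 1000 evenly divides n (unless n is that prime)."""
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    return True
```

A number that survives is not necessarily prime (1009 * 1013 passes, for instance), so it still has to go through Miller-Rabin. The filter only removes cheap-to-detect composites before the expensive test runs.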
Current Program
• The program takes in two strings: a starting value and a range
• Runs a sieve on the range with the first 168 primes
• Tests each remaining number with the Miller-Rabin algorithm, up to 16 times
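Put together, the program flow might look like the Python sketch below. It assumes the two string arguments are a decimal start value and a decimal range length. `count_primes`, `small_primes`, and `miller_rabin` are illustrative names, and the sieve here marks multiples across the whole range rather than dividing each candidate individually.

```python
import random

def small_primes(limit=1000):
    """Primes below limit by simple trial division (168 of them for limit=1000)."""
    return [p for p in range(2, limit)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def miller_rabin(n, rounds=16):
    """False means composite; True means n passed every round."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # exits on the first failed round
    return True

def count_primes(start_text, range_text, rounds=16):
    """Count probable primes in [start, start + range)."""
    start, length = int(start_text), int(range_text)
    alive = bytearray([1]) * length   # alive[i] refers to the number start + i
    for p in small_primes():
        # First multiple of p inside the range, skipping p itself.
        first = max(p * p, -(-start // p) * p)
        for m in range(first, start + length, p):
            alive[m - start] = 0
    return sum(1 for i in range(length)
               if alive[i] and miller_rabin(start + i, rounds))
```

Calling `count_primes("0", "1000")` counts the primes below 1000. For starting values hundreds of digits long, only the Miller-Rabin calls on the surviving candidates dominate the runtime.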
Serial Results
• Finding small numbers was relatively fast
• Found 2263 primes 20 digits long in just .68 seconds
• Large numbers are a different story:
• 310 digits (current RSA standard) took 27.01 seconds to find only 118 primes
Parallel Algorithm
• Divided the range among the processors
• Each node checked its set and reported the number of primes it found
• Final reduction to sum up the count
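The decomposition above can be sketched in Python. This is an illustration only: threads stand in for the nodes, `sum` stands in for the final reduction, and a naive primality check replaces the sieve plus Miller-Rabin so the block stays short. All function names are hypothetical, and the equal-sized contiguous blocks are an assumption about how the range was divided.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(start, length, workers):
    """Split [start, start + length) into contiguous blocks of near-equal size."""
    base, extra = divmod(length, workers)
    blocks, lo = [], start
    for rank in range(workers):
        size = base + (1 if rank < extra else 0)   # first `extra` ranks get one more
        blocks.append((lo, lo + size))
        lo += size
    return blocks

def is_prime_naive(n):
    """Stand-in primality check by trial division (small inputs only)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_block(block):
    """Work done by one node: count the primes in its own block."""
    lo, hi = block
    return sum(1 for n in range(lo, hi) if is_prime_naive(n))

def parallel_count(start, length, workers):
    blocks = split_range(start, length, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_counts = list(pool.map(count_block, blocks))  # one count per node
    return sum(local_counts)                                # final sum reduction
```

Because each worker only returns a single integer, communication is limited to the final reduction. CPython threads will not show a real speedup on this CPU-bound work, so the sketch shows the structure of the decomposition, not its performance.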
Gains
• Saw incredible speedup due to the minimal communication needed
• Most of the real gains came from tweaking the serial algorithm
• Using the sieve and only checking odd numbers
• Would see much more by load balancing with OpenMP
[Figure: Single Parallel vs Serial Algorithm. Series: Single, Serial. X axis: Number of Decimal Digits (10 to 310). Y axis: Time (s).]
[Figure: All Parallel Test Runs. Series: 8, 16, 24, 32, 40, 48, 56, and 64 Cores. X axis: Number of Decimal Digits. Y axis: Time (s).]
[Figure: Total Speedup. Series: 310, 240, 180, 120, and 60 Digits. X axis: Number of Cores (1 to 64). Y axis: Speedup Factor.]
[Figure: Efficiency: Ts/(P*Tp). Series: 310, 240, 180, 120, and 60 Digits. X axis: Cores (1 to 64). Y axis: Percent.]
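Written out, the efficiency measure from the chart title relates to speedup as follows. Ts is the serial time, Tp the parallel time on P cores; the labels S and E are introduced here for convenience and do not appear on the slides.

```latex
S(P) = \frac{T_s}{T_p}, \qquad
E(P) = \frac{S(P)}{P} = \frac{T_s}{P \, T_p}
```

Ideal linear speedup corresponds to S(P) = P, which gives E(P) = 1.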
Future Work
• Could be improved further by load balancing the test with OpenMP
• Exit on first failed test
• Much better synchronization would be possible
• Could also use this to divide the test into smaller pieces
• Implementation in CUDA using GPGPUs
Any Questions?