Lucas-Lehmer Primality Tester

Team: W-4

Nathan Stohs W4-1

Brian Johnson W4-2

Joe Hurley W4-3

Marques Johnson W4-4

Design Manager: Prateek Goenka

Agenda

• Background (Marques)
• Project Description (Marques)
• Algorithmic Description (Joe)
• Data Flow/Block Diagram (Joe)
• Design Process (Nathan)
• Simulations (Nathan)
• Floorplan/Layout (Brian)
• Conclusions (Brian)

History of $2^P - 1$

• In the 16th century it was believed that $2^P - 1$ was prime for all prime P

• In 1536 Hudalricus Regius proved that $2^{11} - 1$ was not prime

• French monk Marin Mersenne published Cogitata Physica-Mathematica, where he stated that $2^P - 1$ was prime for P = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127 and 257

Lucas-Lehmer

• François Edouard Anatole Lucas

• In 1876 he proved that the number $2^{127} - 1$ is prime using his own methods

• Derrick Lehmer refined Lucas's method in 1930

Make History

• December 2005
• 43rd Known Mersenne Prime Found!!
• Dr. Curtis Cooper and Dr. Steven Boone
• Professors at Central Missouri State University
• $2^{30,402,457} - 1$

Prime Number Competitions

• Electronic Frontier Foundation

• $50,000 to the first individual or group who discovers a prime number with at least 1,000,000 decimal digits (awarded Apr. 6, 2000)

• $100,000 to the first individual or group who discovers a prime number with at least 10,000,000 decimal digits

• $150,000 to the first individual or group who discovers a prime number with at least 100,000,000 decimal digits

• $250,000 to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits

rank  prime                digits   who  when  reference
1     2^30402457-1         9152052  G9   2005  Mersenne 43
2     2^25964951-1         7816230  G8   2005  Mersenne 42
3     2^24036583-1         7235733  G7   2004  Mersenne 41
4     2^20996011-1         6320430  G6   2003  Mersenne 40
5     2^13466917-1         4053946  G5   2001  Mersenne 39
6     27653·2^9167433+1    2759677  SB8  2005
7     28433·2^7830457+1    2357207  SB7  2004
8     2^6972593-1          2098960  G4   1999  Mersenne 38
9     5359·2^5054502+1     1521561  SB6  2003
10    4847·2^3321063+1     999744   SB9  2005

Mersenne Prime Algorithm

• Only used for numbers of the form $2^P - 1$

• For P > 2

• $2^P - 1$ is prime if and only if $S_{P-2}$ is zero in this sequence:

• $S_0 = 4$

• $S_N = (S_{N-1}^2 - 2) \bmod (2^P - 1)$

Example to Show $2^7 - 1$ is Prime

• $2^7 - 1 = 127$

• S0 = 4

• S1 = (4 * 4 - 2) mod 127 = 14

• S2 = (14 * 14 - 2) mod 127 = 67

• S3 = (67 * 67 - 2) mod 127 = 42

• S4 = (42 * 42 - 2) mod 127 = 111

• S5 = (111 * 111 - 2) mod 127 = 0
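
The recurrence and the worked example above can be sketched in C as follows. This is only an illustration of the math, not the team's implementation: the function name `lucas_lehmer` is hypothetical, it uses the `%` operator (which the hardware design avoids), and it assumes P is an odd prime no larger than 31 so that the square fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if 2^p - 1 is prime, 0 otherwise, using the recurrence
 * S_0 = 4, S_N = (S_{N-1}^2 - 2) mod (2^p - 1).
 * Assumption: p is an odd prime with 3 <= p <= 31. */
int lucas_lehmer(unsigned p)
{
    assert(p >= 3 && p <= 31);
    uint64_t m = (UINT64_C(1) << p) - 1;   /* candidate 2^p - 1 */
    uint64_t s = 4;                        /* S_0 */
    for (unsigned n = 1; n <= p - 2; n++) {
        /* Add m before subtracting 2 so the value never goes negative. */
        s = (s * s + m - 2) % m;
    }
    return s == 0;                         /* prime iff S_{p-2} == 0 */
}
```

Calling `lucas_lehmer(7)` walks through the same values as the example (14, 67, 42, 111, 0) and reports a prime.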

Computations needed:
- Squaring (not a problem…)
- Add/Subtract (not a problem…)
- Modulo ($2^n - 1$) multiplication (?)

Algorithmic description

We knew the necessary computations, but how to translate that to gates?

Mechanisms behind the math

• If done with brute force, modulo $2^n - 1$ could have been ugly.
  – Would need to square and find the remainder via division.

• Luckily, for that specific computation, math is on our side: the $2^n - 1$ constraint saves us from division, as will be seen.

• A quick search on www.ieee.org produced inspiration.

• Reto Zimmermann. Efficient VLSI Implementation of Modulo ($2^n \pm 1$) Addition and Multiplication. Computer Arithmetic, 1999; pp. 158-167.
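
To illustrate why the $2^n - 1$ constraint removes the need for division: since $2^n \equiv 1 \pmod{2^n - 1}$, the bits above position n can simply be added back onto the low n bits. The sketch below is a software illustration under that identity, not the circuit from the paper or from this design; the name `reduce_mersenne` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce x modulo 2^p - 1 using only shifts, masks and adds.
 * Because 2^p = 1 (mod 2^p - 1), the bits above position p are
 * folded back onto the low p bits until the value fits. */
uint32_t reduce_mersenne(uint64_t x, unsigned p)
{
    assert(p >= 2 && p <= 31);
    uint64_t m = (UINT64_C(1) << p) - 1;
    while (x > m)
        x = (x & m) + (x >> p);            /* low p bits + high bits */
    return (x == m) ? 0 : (uint32_t)x;     /* all ones represents 0 */
}
```

For example, with p = 7 the value 111 * 111 - 2 = 12319 folds to 31 + 96 = 127, which is the all-ones pattern and therefore reduces to 0.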

Useful Math: Multiplication

Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products.

So modulo multiplication is multiplication using a modulo adder.
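
A minimal sketch of that idea in C, assuming operands already reduced below $2^p - 1$ and p of at most 16 bits. The names `mod_add_sketch` and `mod_mul_sketch` are hypothetical and this is not the Zimmermann circuit; it only shows that each partial product modulo $2^p - 1$ is a rotation of the multiplicand, and that the products are combined with a modulo adder.

```c
#include <assert.h>
#include <stdint.h>

/* Add two residues modulo m = 2^p - 1 (both inputs assumed < m). */
static uint32_t mod_add_sketch(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t s = a + b;                    /* at most 2m - 2 */
    return (s >= m) ? s - m : s;
}

/* Multiply x * y modulo 2^p - 1 by modulo-summing partial products.
 * The i-th partial product y * 2^i (mod 2^p - 1) is y rotated left
 * by i within a p-bit word, so no wide product or division is needed. */
uint32_t mod_mul_sketch(uint32_t x, uint32_t y, unsigned p)
{
    assert(p >= 2 && p <= 16);
    uint32_t m = (UINT32_C(1) << p) - 1;
    assert(x < m && y < m);
    uint32_t acc = 0;
    for (unsigned i = 0; i < p; i++) {
        if ((x >> i) & 1u) {
            /* rotate-left by i inside a p-bit word */
            uint32_t pp = ((y << i) & m) | (y >> (p - i));
            acc = mod_add_sketch(acc, pp, m);
        }
    }
    return acc;
}
```

With p = 7, `mod_mul_sketch(14, 14, 7)` sums the partial products 28, 56 and 112 and returns 69, which is 196 mod 127.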

From the Zimmermann paper

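Block Diagram

[Figure: block diagram. Labeled blocks: Next Partial Product, Mod Calc, Mod add, Subtract 2, Register (two), Counter, Compare. Loop annotations: "Loop x16" and "Loop x(P-2)". Example annotations for p = 7: S1 = (4 * 4) mod 127 - 2 = 14, S2 = (14 * 14) mod 127 - 2 = 67, ..., S5 = (111 * 111 - 2) mod 127 = 0.]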

Design Process

The process so far:

- Found Mathematical Means (core algorithm)

- Found Computational Means (modulo multiplier, adder)

From the above, a high-level C program was written in a manner that would translate easily to Verilog and gates, or at least to more standard operations.

    /* Returns (value * value + offset) mod (2^p - 1) for inputs already
       below 2^p - 1, using only shifts, masks, adds and one conditional
       subtract per step. */
    int mod_square_minus(int value, int p, int offset)
    {
        int acc, i;
        int mod = (1 << p) - 1;

        for (acc = offset, i = 0; i < (int)(sizeof(int) * 8 - 1); i++) {
            int a = (value >> i) & 1;      /* i-th bit of the multiplier */
            int temp;

            if (a) {
                /* bits of (value << i) at or above position p wrap around */
                if (i - p > 0)
                    temp = value << (i - p);
                else
                    temp = value >> (p - i);
                acc = acc + temp + ((value << i) & ((1 << p) - 1));
            }
            if (acc >= mod)
                acc = acc - mod;           /* keep the running sum below mod */
        }
        return acc;
    }

This translated easily into behavioral Verilog and was readily turned into a gate-level implementation. Essentially, it was written in a low-level manner.

Design Process

The rest of the design can simply be thought of as a wrapper for the modulo multiplier.
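
As a rough software picture of that wrapper, the sketch below drives a square-and-offset core for P-2 rounds and then compares with zero. It rests on assumptions the slides do not state: the "subtract 2" is modeled by starting the accumulator at mod - 2, the core is a simplified restatement of the C routine (looping over p bits only), and the name `is_mersenne_prime` is hypothetical.

```c
#include <assert.h>

/* Simplified restatement of the square-and-offset core:
 * returns (value * value + offset) mod (2^p - 1) for inputs below 2^p - 1. */
static int mod_square_minus(int value, int p, int offset)
{
    int mod = (1 << p) - 1;
    int acc = offset;
    for (int i = 0; i < p; i++) {
        if ((value >> i) & 1) {
            /* value * 2^i mod (2^p - 1): high bits wrap to the bottom */
            int hi = value >> (p - i);
            int lo = (value << i) & mod;
            acc = acc + hi + lo;
        }
        if (acc >= mod)
            acc -= mod;
    }
    return acc;
}

/* Hypothetical wrapper: iterate P-2 times from S_0 = 4, then compare with 0. */
int is_mersenne_prime(int p)
{
    assert(p >= 3 && p <= 15);
    int mod = (1 << p) - 1;
    int s = 4;
    for (int round = 1; round <= p - 2; round++)
        s = mod_square_minus(s, p, mod - 2);   /* offset mod - 2 acts as "-2" */
    return s == 0;
}
```

The counter and compare blocks of the design correspond loosely to the `round` loop and the final `s == 0` check here.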

The following slides contain Verilog code that was taken directly from the C code shown earlier.

    module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
        input  [15:0] x, y, mod, p;
        output [15:0] out;
        input         reset, en, clk;
        wire   [15:0] pp, ma0, temp;
        output [3:0]  itrCount;

        counter         mycount(itrCount, reset, en, clk);
        partial_product ppg(pp, x, y, itrCount, mod, p);
        mod_add         modAdder(out, pp, temp, mod);
        dff_16_lp       partial(clk, out, temp, reset, en);
    endmodule

Top level of multiplier

    module partial_product(out, x, y, i, mod, p);
        output [15:0] out;
        input  [15:0] x, y, mod, p;
        input  [3:0]  i;

        wire [15:0] diff1, diff2, added, result, corrected, final;
        wire [15:0] high, low, shifted, toadd;
        wire        cout1, cout2, ithbit, toobig;

        // bits of (y << i) that land at or above position p wrap around
        sub_16      difference1(diff1, cout1, {12'b0, i}, p);
        sub_16      difference2(diff2, cout2, p, {12'b0, i});
        shift_left  shiftL(high, y, diff1[3:0]);
        shift_right shiftR(low, y, diff2[3:0]);
        mux16       choose(high, low, shifted, cout1);

        // low p bits of (y << i)
        shift_left  shiftL2(toadd, y, i);
        and16       bigand(added, toadd, mod);

        fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));

        // NOTE: `final` is computed here, but the output select below is driven by `result` in this listing
        sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
        mux16  correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));

        // pass the partial product only when bit i of x is set
        shift_right ibit({15'b0, ithbit}, x, i);
        select16    checkfor0(.out(out), .x(result), .sel(ithbit));
    endmodule

Partial Product Unit w/ modulo reduction

    module mod_add(out, x, y, mod);
        input  [15:0] x, y, mod;
        output [15:0] out;

        wire        cout, isDouble, cin;
        wire [15:0] plus, lowbits, done, mod_bar, check;

        fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());

        invert_16 inverter(mod_bar, mod);

        and16 hihnbits(check, plus, mod_bar);
        and16 lownbits(done, plus, mod);

        or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5], check[6], check[7],
             check[8], check[9], check[10], check[11], check[12], check[13], check[14], check[15]);

        compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
        mux16      fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
    endmodule

Modulo Adder

Final Design Process Notes

• Lessons learned: Never tweak the schematics without retesting the Verilog first. Timing issues can be subtle. Verilog is better than schematics for catching them and for quickly fixing and retesting.

• Considering total time spent during this phase, roughly half was on the “core” and the FSM, the rest on the “wrapper”.

Road to verification: C

Two examples of the high-level C implementation:

    Tyrion:~/Desktop/15525 nstohs$ ./prime4 7
    round 1: (4 * 4 - 2) mod 127 = 14
    round 2: (14 * 14 - 2) mod 127 = 67
    round 3: (67 * 67 - 2) mod 127 = 42
    round 4: (42 * 42 - 2) mod 127 = 111
    round 5: (111 * 111 - 2) mod 127 = 0
    2^7-1 is prime

    Tyrion:~/Desktop/15525 nstohs$ ./prime4 11
    round 1: (4 * 4 - 2) mod 2047 = 14
    round 2: (14 * 14 - 2) mod 2047 = 194
    round 3: (194 * 194 - 2) mod 2047 = 788
    round 4: (788 * 788 - 2) mod 2047 = 701
    round 5: (701 * 701 - 2) mod 2047 = 119
    round 6: (119 * 119 - 2) mod 2047 = 1877
    round 7: (1877 * 1877 - 2) mod 2047 = 240
    round 8: (240 * 240 - 2) mod 2047 = 282
    round 9: (282 * 282 - 2) mod 2047 = 1736
    2^11-1 is not prime

Road to verification: Verilog

Samples of Verilog Verification output:

Partial Product Unit, p = 7

    380 ppOut= 56, x= 14, y= 14, i= 2, mod= 127, p= 7
    400 ppOut= 112, x= 14, y= 14, i= 3, mod= 127, p= 7
    420 ppOut= 0, x= 14, y= 14, i= 4, mod= 127, p= 7
    440 ppOut= 0, x= 14, y= 14, i= 5, mod= 127, p= 7

Top Level, p = 7

    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 67
    itrOut= 42
    itrOut= 111
    itrOut= 0

Top Level, p = 11

    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 194
    itrOut= 788
    itrOut= 701
    itrOut= 119
    itrOut= 1877
    …

Tests were either specific tests on important units such as Partial_Product

…or top level tests. Note that these are the same results generated from the C code

Road to verification: Schematic I

Schematic Test of our modular adder.

(128 + 68) mod 127 = 69

Road to verification: Schematic II

Plot of the top level output after a single iteration, p=7

Output after a single iteration is 14, the expected value.

Road to verification: Schematic III

[Waveform plot; annotated output values: 4, 14, 67, 42, 111]

Road to verification: Intermission

Disk space required for a full-length schematic test of p=7: 6 GB
Time required for a full-length schematic test of p=7: 5 hours

Disk space required for a full-length extractedRC test of p=7: 20 GB
Time required for a full-length extractedRC test of p=7: 8 hours

Simulations become lengthy due to tests needing to be “deep” to be useful.

Layout: ExtractedRC – Full Run

[Waveform plot; annotated output values: 4, 14, 67, 42, 111]

Timing

To determine the bounds of our clock, PathMill was used once major portions of the schematic were complete.

The critical path through our design is one loop through the modular multiplier, which runs through the modular adder and the partial products module.

PathMill reported a delay of 9 ns through the modular adder and 5.2 ns through the partial products module.

This puts the total critical-path delay at 14.2 ns, which limits the schematic clock to about 70 MHz.

For extractedRC, due in part to simulation issues, a conservative 50 MHz was chosen as the final clock.
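
The clock figures above follow from the two reported delays. The arithmetic is worked out below as a sketch; it assumes the critical path is just the sum of the two module delays and ignores register setup time and clock skew, which the slides do not quantify.

```latex
\begin{align*}
t_{\text{crit}} &= 9\,\text{ns} + 5.2\,\text{ns} = 14.2\,\text{ns} \\
f_{\max} &= \frac{1}{t_{\text{crit}}} = \frac{1}{14.2\,\text{ns}} \approx 70.4\,\text{MHz} \\
T_{50\,\text{MHz}} &= \frac{1}{50\,\text{MHz}} = 20\,\text{ns}
  \quad\Rightarrow\quad 20\,\text{ns} - 14.2\,\text{ns} = 5.8\,\text{ns of margin}
\end{align*}
```

Under these assumptions the 50 MHz clock leaves roughly 5.8 ns per cycle for effects that only show up after extraction.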

Issues

• extractedRC of partial_product module
• Registers switch
  – Custom design to DFFs with muxes
• Switching from parallel calculations to series
  – Transistor count vs. clock cycles
• Syncing up design between people
  – Transferring files
  – Different design styles
• LONG simulation times
• Floorplanning
  – Too much emphasis on aspect ratios and not enough on wiring
  – Couldn't decide on one set floorplan

Floorplan v1.0

Floorplan v2.0

Final Floorplan

Pin Specifications

Pin       Type     # of Pins
Vdd!      In/Out   1
Gnd!      In/Out   1
p<0:15>   In       16
clk       In       1
start     In       1
Done      Out      1
out       Out      1
Total     -        22

Initial Module Specifications

Module            Transistor Count   Area (µm²)   Transistor Density (transistors/µm²)
FSM               300                900          .33
mod_p             2,440              7,000        .35
mod_add           1,282              9,000        .14
partial_product   8,676              65,000       .13
count             1,656              6,000        .27
sub_16            704                3,500        .20
Registers         1,848              6,000        .30
compare           36                 300          .12
Total             16,942             97,700       .17

Final Module Specifications

Module            Transistor Count   Area (µm²)   Transistor Density (transistors/µm²)
FSM               152                1,200        .13
mod_p             1,280              8,603        .15
mod_add           1,168              5,603        .21
partial_product   7,520              54,680       .14
count             1,424              8,701        .16
sub_16            576                2,934        .20
Registers         896                6,028        .15
compare           56                 201          .28
Total             13,702             86,621       .16

Chip Specifications

• Transistor Count: 13,702

• Size: 296.51µm x 292.13µm

• Area: 86,621µm²

• Aspect Ratio: 1.01:1

• Density: 0.16 transistors/µm²

Final Floorplan

Partial Product

[Layout figure; labeled blocks: shift_right, shift_left, shift_right, shift_left, 16-bit and, Select16, Sub_16]

Poly Layer

Density: 7.14%

Active Layer

Density: 8.76%

Metal1 Layer

Density: 23.86%

Metal2 Layer

Density: 19.97%

Metal3 Layer

Density: 11.30%

Metal4 Layer

Density: 10.34%

Conclusions

• Plan for buffers
  – They will be hard to put in after the fact

• Your design will change dramatically from start to finish, so be flexible

• Communication is key

• Do layout in parallel
