Random Number Generator


Dmitriy Solmonov W1-1, David Levitt W1-2, Jesse Guss W1-3, Sirisha Pillalamarri W1-4, Matt Russo W1-5

Design Manager – Thiago Hersan

May 1, 2006


Why Random Numbers?

• Real-Time Simulations

• Encryption

• Gambling


Encryption
• Need random numbers for authentication
• Key generation
• Software vs. Hardware
– Less power/time per number
– Portable

Gambling
• ePoker Rooms
• SoC Deck Generation
• Other future casino games

Business Plan
• Potential markets
– Defense and Intelligence Organizations
– E-Gambling / Casinos
– Game Consoles
– Mobile Communication
• License the IP
– Our design will be part of a larger ASIC or GPP design

IBAA Algorithm

• Based on the RC4 encryption algorithm
– Cryptographically secure
– Deterministic
• 1024-bit number generated (32 words × 32 bits)
• Internally updated seed
– Not user-visible, hence secure

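The IBAA Algorithm

#include <stdint.h>

#define ALPHA (8)
#define SIZE (1<<ALPHA)                  /* 256 iterations per call */
#define ind(x) ((x)&(0x1F))              /* 5-bit index: M and R hold 32 words */
#define barrel(a) (((a)<<19)^((a)>>13))

uint32_t A, B, Y, X;                     /* accumulator, previous result, temporaries */
uint32_t M[32], R[32];                   /* internal memory and results */

void ibaa(void)
{
  uint32_t i;
  for ( i=0; i<SIZE; i++ ) {
    X = M[ind(i)];
    A = barrel(A) + M[ind(i+16)];
    M[ind(i)] = Y = M[ind(X)] + A + B;
    R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
  }
}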
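The barrel() macro is cheaper than it looks. For a 32-bit word, (a<<19) leaves bits 18:0 zero and (a>>13) lies entirely within bits 18:0, so the XOR never combines two set bits and barrel(a) is simply a left rotation by 19. A fixed rotation is only a reordering of wires in hardware, so the update of A needs just one adder. The sketch below checks this in C, assuming the slide's uint32 means uint32_t; rotl32() and barrel_halves_disjoint() are hypothetical helpers added only for the comparison.

```c
#include <stdint.h>

/* barrel() from the IBAA slide, written as a function on a 32-bit word. */
static inline uint32_t barrel(uint32_t a)
{
    return (a << 19) ^ (a >> 13);
}

/* Hypothetical helper: rotate a 32-bit word left by r bits, 0 < r < 32. */
static inline uint32_t rotl32(uint32_t a, unsigned r)
{
    return (a << r) | (a >> (32u - r));
}

/* Hypothetical helper: the two shifted copies never share a set bit,
 * so XOR and OR give the same result. */
static inline int barrel_halves_disjoint(uint32_t a)
{
    return ((a << 19) & (a >> 13)) == 0;
}
```

For example, barrel(1) is 0x00080000 and barrel(0x80000000) is 0x00040000: the top bit wraps around to bit 18.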

Architecture


IBAA Algorithm to Architecture

• 4 reads from M, 1 write to M, and 1 write to R per iteration
• Dependencies, feedback, and RAW hazards

Algorithm to Architecture

• Hardware limits
– Max. of 2 simultaneous reads from memory
• Can’t do better than two stages
• Each stage must take multiple cycles to complete

• Chosen timing
– Addition = 1 cycle
– Memory read = 0.5 cycles
– Memory is clocked ½ period out of phase
– Set address and receive data in 1 cycle
• With forwarding applied, each stage needs 4 cycles

Figure: datapath block diagram with SRAM (M), SRAM (R), FSM, control logic, counters, adders, and registers X, A, B, Y, Y1, M1, M2, M3, M4

Stage 1
• M1 = M[i+16]
• X = M[i] | A = M1 + barrel(A)
• M3 = M[X] | C1 = (X == i-1)
• Y1 = A + (C1 ? Y : M3)

Stage 2
• Y = B + Y1
• M4 = M[Yaddr] | C2 = (i == Yaddr), where Yaddr = ind(Y >> ALPHA)
• B = X + (C2 ? Y : M4)
• M[i] = Y | R[i] = B

C1 and C2 detect read-after-write hazards on M; when either is set, the freshly computed Y is forwarded instead of the stale memory value.
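The two-stage schedule can be checked in software against the sequential loop. The sketch below is a behavioral C model, not the RTL: it runs stage 1 of iteration i, then commits the M write left over from iteration i-1, then runs stage 2 of iteration i, holding back its own M[i] = Y write until after the next stage 1 has read memory. That reproduces the RAW window that C1 and C2 cover. Assumptions: uint32 is uint32_t, the C1/C2 comparisons use the 5-bit ind() value, and the function names and the forward flag (which switches forwarding off to expose the hazards) are hypothetical.

```c
#include <stdint.h>

#define ALPHA 8
#define ITER (1u << ALPHA)                   /* 256 loop iterations */
#define ind(x) ((x) & 31u)                   /* 5-bit index: 32-word memories */
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* Sequential loop from the IBAA slide: the golden reference. */
static void ibaa_ref(uint32_t M[32], uint32_t R[32], uint32_t *pa, uint32_t *pb)
{
    uint32_t A = *pa, B = *pb;
    for (uint32_t i = 0; i < ITER; i++) {
        uint32_t X = M[ind(i)];
        A = barrel(A) + M[ind(i + 16u)];
        uint32_t Y = M[ind(X)] + A + B;
        M[ind(i)] = Y;
        B = M[ind(Y >> ALPHA)] + X;
        R[ind(i)] = B;
    }
    *pa = A;
    *pb = B;
}

/* Two-stage model with deferred M writes; forward = 0 disables C1/C2. */
static void ibaa_pipe(uint32_t M[32], uint32_t R[32], uint32_t *pa, uint32_t *pb,
                      int forward)
{
    uint32_t A = *pa, B = *pb;
    uint32_t Y = 0;            /* Y of the previous iteration */
    int pending = 0;           /* previous iteration's M write not yet committed */
    uint32_t pend_idx = 0;     /* address of that pending write */

    for (uint32_t i = 0; i < ITER; i++) {
        /* Stage 1 of iteration i: M[i-1] may still hold its old value. */
        uint32_t M1 = M[ind(i + 16u)];
        uint32_t X = M[ind(i)];
        A = M1 + barrel(A);
        uint32_t M3 = M[ind(X)];
        int C1 = forward && pending && ind(X) == pend_idx;
        uint32_t Y1 = A + (C1 ? Y : M3);

        /* The previous iteration's M[i-1] = Y lands now. */
        if (pending)
            M[pend_idx] = Y;

        /* Stage 2 of iteration i: M[i] is not written until later. */
        Y = B + Y1;
        uint32_t Yaddr = ind(Y >> ALPHA);
        uint32_t M4 = M[Yaddr];
        int C2 = forward && Yaddr == ind(i);
        B = X + (C2 ? Y : M4);
        R[ind(i)] = B;
        pending = 1;           /* defer M[i] = Y past the next stage 1 */
        pend_idx = ind(i);
    }
    if (pending)
        M[pend_idx] = Y;       /* drain the last write */
    *pa = A;
    *pb = B;
}
```

With random memory contents each hazard fires on roughly 1 iteration in 32, so a single 256-iteration pass exercises both forwarding paths several times; with forwarding on, the results should match ibaa_ref() word for word.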

Design For Manufacture

Regular Fabrics


Why DFM?

• Ability to print on smaller processes
• Robust manufacturability
• Sacrifice area, speed, and metal layers for a regular design

Sample Layout: Regular Fabrics

Lithography Simulations

Hardware

Adder
• Four adders, each used once per loop iteration (256 iterations)
• Hybrid adder
• Fast and low power

Figure: 32-bit hybrid adder built from four blocks, CS4 (S[31:28]), CS18 (S[27:10]), CS6 (S[9:4]), and CS4 (S[3:0]), each adding the matching slices of A and B; inter-block carries C’[4], C[10], C’[28], and carry-out C[32]

32-Bit Adder: First 4 Bits

32-Bit Adder: CS6 Block

32-Bit Adder: CS18 Block

32-Bit Fast Adder
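The adder figure splits the 32 bits into blocks of 4, 6, 18, and 4 bits. Below is a behavioral C sketch of such a block-partitioned adder, assuming "CS" stands for carry-skip (the deck's references include a carry-skip adder paper; carry-select is another possible reading). Each block ripples internally, and when every bit in a block propagates, the block's carry-out is taken directly from its carry-in. The function names are hypothetical; the point is that the partitioning still computes a + b exactly.

```c
#include <stdint.h>

/* Block widths from LSB to MSB, matching the figure: bits 3:0, 9:4, 27:10, 31:28. */
static const int kBlockWidth[4] = {4, 6, 18, 4};

/* Add bits lo..lo+width-1 of a and b with carry-in cin, OR the sum bits into
 * *sum, and return the block's carry-out. Carry-skip: if every bit of the block
 * propagates (a_k XOR b_k == 1), the carry-out equals cin, so hardware can take
 * it through a bypass mux instead of waiting for the ripple chain. */
static uint32_t skip_block(uint32_t a, uint32_t b, int lo, int width,
                           uint32_t cin, uint32_t *sum)
{
    uint32_t carry = cin;
    int all_propagate = 1;
    for (int k = lo; k < lo + width; k++) {
        uint32_t ak = (a >> k) & 1u;
        uint32_t bk = (b >> k) & 1u;
        uint32_t p = ak ^ bk;                 /* propagate */
        *sum |= (p ^ carry) << k;             /* sum bit */
        carry = (ak & bk) | (p & carry);      /* ripple carry */
        all_propagate &= (int)p;
    }
    return all_propagate ? cin : carry;       /* skip path vs. ripple path */
}

/* 32-bit add built from the four blocks; *carry_out receives the carry out of bit 31. */
uint32_t hybrid_add32(uint32_t a, uint32_t b, uint32_t *carry_out)
{
    uint32_t sum = 0, carry = 0;
    int lo = 0;
    for (int blk = 0; blk < 4; blk++) {
        carry = skip_block(a, b, lo, kBlockWidth[blk], carry, &sum);
        lo += kBlockWidth[blk];
    }
    if (carry_out)
        *carry_out = carry;
    return sum;
}
```

In this behavioral model the skip path and the ripple path always give the same carry, which is exactly the property the skip logic relies on; the speed benefit only shows up in gate-level delay, which C does not capture.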

Adder Performance

• Delay: 1.56 ns
• Energy consumption (worst-case switching): 12.4 pJ
• Power dissipation (estimated with our switching factor): 148 μW

SRAM

Single Bus Cell

Double Bus Cell


Functional Verification

• Structural Verilog vs. C code
– Generate numbers under equal load conditions
– Compare numbers
• Schematic vs. structural Verilog
– Under equal inputs, check whether port outputs match
• LVS (layout vs. schematic)
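For the structural Verilog vs. C code check, one convenient setup is to have the testbench print each 32-bit result as a hex word, one per line (for example with $fdisplay and a %h format), and compare that dump with words produced by the C model. The helper below is a hypothetical sketch of the C side; its name and the dump format are assumptions, not part of the original flow. Unknown ("x") or missing words count as mismatches.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: compare a testbench dump of whitespace-separated hex
 * words (e.g. one 32-bit value per line) with the first n golden words from
 * the C model. Returns -1 if all n match, otherwise the index of the first
 * word that differs, is missing, or cannot be parsed (e.g. "xxxxxxxx"). */
int first_mismatch(const char *dump, const uint32_t *golden, int n)
{
    const char *p = dump;
    for (int k = 0; k < n; k++) {
        char *end;
        unsigned long v = strtoul(p, &end, 16);  /* skips leading whitespace */
        if (end == p)
            return k;                            /* no hex digits: missing or unknown */
        if ((uint32_t)v != golden[k])
            return k;                            /* value mismatch */
        p = end;
    }
    return -1;
}
```

A return value other than -1 gives the first output word to inspect in the waveform viewer.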

Verification

• Schematic and extracted-parasitic SPICE simulations of major blocks
– Check for clean signals
– Check delays and rise/fall times
• Extracted-parasitic simulation of the critical register-to-register path
– Signals are clean
– Delay = 2.1 ns
• Extracted-parasitic simulation of the chip clock distribution

Critical Delay


Final Layout


• Poly density: 7.52%
• Metal1 density: 20.85%
• Metal2 density: 19.89%
• Metal3 density: 18.76%
• Metal4 density: 9.36%
• Metal5 density: 6.8%

Analysis

Specifications

• Pins
– 36 input pins: 32-bit seed input, gen, read, rst, clk
– 34 output pins: 32-bit random output, rdy, done
– 2 power pins: vdd, gnd
• 475 MHz chip speed
• 436 kHz throughput

Putting it All Together

Part | Trans. count | Area (μm²) | Density (trans./μm²) | Prop. delay (ns), Schematic / ExtractRC | Power (1x) at 500 MHz (mW), Schematic / ExtractRC | Power (avg.) at 475 MHz (mW), Schematic / ExtractRC
Adders (4) | 5,856 (1,464 ea.) | 25,200 (6,300 ea.) | 0.232 | 1.45 / 1.56 | 0.60 / 0.62 | 0.14 / 0.148
SRAM (M & R) | 17,736 (M = 10,458; R = 7,278) | 51,000 (M = 35,000; R = 16,000) | 0.348 (M = 0.293; R = 0.456) | 0.735 / 0.845 | W: 0.51 / 3.25; R: 0.19 / 1.40 | 0.27 / 1.86
Regs (10) | 6,400 (640 ea.) | 38,400 (3,840 ea.) | 0.167 | 0.220 / 0.275 | 0.53 / 0.59 | 0.13 / 0.145
Total | 33,371 | 182,000 | 0.194 | 2.1 (475 MHz) | ----- | 4.1

Performance Comparison

Operation time for ~4,000,000 runs

Platform | Clock | Process | Time (ms)
Intel P4 | 3.20 GHz | 90 nm | 5000
W1-2006 | 475 MHz | 180 nm | 9000
AMD Opteron Blade | 1.005 GHz | – | 14000
ARM Intel XScale | 700 MHz | – | 125000
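The throughput specification and the comparison table agree if the 436 kHz throughput is read as 436 × 10³ runs per second at the 475 MHz clock (an assumption; the slides do not define a "run"):

```latex
\[
t_{4\text{M runs}} = \frac{4\times10^{6}\ \text{runs}}{436\times10^{3}\ \text{runs/s}} \approx 9.17\ \text{s},
\qquad
\frac{475\times10^{6}\ \text{cycles/s}}{436\times10^{3}\ \text{runs/s}} \approx 1089\ \text{cycles per run}.
\]
```

The first figure is consistent with the roughly 9000 ms reported for W1-2006 on about 4,000,000 runs.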

Where to Now?

• ERC, tapeout, etc.
• Thermal-noise unit to use as the input seed
• On-chip bus interface
• HyperTransport™ interface

References
• Jenkins, Robert J. “ISAAC”. http://burtleburtle.net/bob/rand/isaac.html
• Chirca, Schulte, Glossner, et al. “A Static Low-Power, High-Performance 32-bit Carry Skip Adder”. http://mesa.ece.wisc.edu/publications/cp_2004-12.pdf
• “CLA and Ling Adders”. http://umunhum.stanford.edu/~farland/notes.html

Questions
