Random Number Generator
Dmitriy Solmonov W1-1
David Levitt W1-2
Jesse Guss W1-3
Sirisha Pillalamarri W1-4
Matt Russo W1-5
Design Manager – Thiago Hersan
May 1, 2006
Why Random Numbers?
• Real-Time Simulations
• Encryption
• Gambling
Encryption
• Need random numbers for authentication
• Key generation
• Software vs. Hardware
  – Less power/time per number
  – Portable
Gambling
• ePoker Rooms
• SoC Deck Generation
• Other future casino games
Business Plan
• Potential markets
  – Defense and Intelligence Organizations
  – E-Gambling / Casinos
  – Game Consoles
  – Mobile Communication
• License the IP
  – Our design will be part of a larger ASIC or GPP design
IBAA Algorithm
• Uses RC4 encryption algorithm
  – Cryptographically secure
  – Deterministic
• 1024-bit number generated
• Internally Updated Seed
  – not user visible = secure
#define ALPHA     (8)
#define SIZE      (1<<ALPHA)              /* loop iterations per run */
#define ind(x)    ((x)&(0x1F))            /* wrap an index to the 32-entry memories */
#define barrel(a) (((a)<<19)^((a)>>13))   /* rotate left by 19 on 32-bit words */
uint32 A, B, Y, X;
uint32 M[32], R[32];
…
for ( i=0; i<SIZE; i++ ) {
  X = M[ind(i)];
  A = barrel(A) + M[ind(i+16)];
  M[ind(i)] = Y = M[ind(X)] + A + B;
  R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
The IBAA Algorithm
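The loop on this slide can be tried out as a small, self-contained C sketch. The `ibaa_state` struct, the function names, and especially `ibaa_seed` are assumptions made for illustration: the slides do not say how M is loaded, so the seeding below is a placeholder linear congruential fill and not part of the design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALPHA     8u                      /* as on the slide */
#define SIZE      (1u << ALPHA)           /* 256 loop iterations per run */
#define WORDS     32u                     /* entries in M and R */
#define IND(x)    ((x) & (WORDS - 1u))    /* same mask as ind(x): 0x1F */
#define BARREL(a) (((a) << 19) ^ ((a) >> 13))

typedef struct {
    uint32_t a, b;          /* accumulator A and previous result B */
    uint32_t m[WORDS];      /* internal state (M) */
    uint32_t r[WORDS];      /* output words (R) */
} ibaa_state;

/* Hypothetical seeding: fills M from one 32-bit seed with a linear
 * congruential step. This is an assumption, not taken from the slides. */
static void ibaa_seed(ibaa_state *s, uint32_t seed)
{
    assert(s != NULL);
    s->a = 0;
    s->b = 0;
    for (uint32_t i = 0; i < WORDS; i++) {
        seed = seed * 1664525u + 1013904223u;
        s->m[i] = seed;
        s->r[i] = 0;
    }
}

/* One run of the loop from the slide; leaves 32 x 32 = 1024 bits in r. */
static void ibaa_run(ibaa_state *s)
{
    assert(s != NULL);
    uint32_t x, y;
    for (uint32_t i = 0; i < SIZE; i++) {
        x = s->m[IND(i)];
        s->a = BARREL(s->a) + s->m[IND(i + 16u)];
        s->m[IND(i)] = y = s->m[IND(x)] + s->a + s->b;
        s->r[IND(i)] = s->b = s->m[IND(y >> ALPHA)] + x;
    }
}
```

Calling `ibaa_seed` followed by `ibaa_run` twice with the same seed yields identical `r` arrays, which is the "Deterministic" property listed earlier.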
Architecture
for ( i=0; i<SIZE; i++ ) {
X = M[ind(i)];
A = barrel(A) + M[ind(i +16)];
M[ind(i)] = Y = M[ind(X)] + A + B;
R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
IBAA Algorithm to Architecture
4 Reads from M, 1 Write to M, 1 Write to R
dependencies, feedback, and RAW hazards
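The access counts above can be checked by instrumenting one loop iteration. This is a minimal sketch; the `access_counts` struct and the `ibaa_iteration` name are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALPHA     8u
#define WORDS     32u
#define IND(x)    ((x) & (WORDS - 1u))
#define BARREL(a) (((a) << 19) ^ ((a) >> 13))

typedef struct {
    unsigned m_reads;    /* reads from M */
    unsigned m_writes;   /* writes to M */
    unsigned r_writes;   /* writes to R */
} access_counts;

/* Executes iteration i of the loop and counts its memory accesses. */
static access_counts ibaa_iteration(uint32_t *m, uint32_t *r,
                                    uint32_t *a, uint32_t *b, uint32_t i)
{
    assert(m != NULL && r != NULL && a != NULL && b != NULL);
    access_counts c = {0u, 0u, 0u};

    uint32_t x = m[IND(i)];                c.m_reads++;
    *a = BARREL(*a) + m[IND(i + 16u)];     c.m_reads++;
    /* Address depends on X, which was itself just read from M. */
    uint32_t y = m[IND(x)] + *a + *b;      c.m_reads++;
    m[IND(i)] = y;                         c.m_writes++;
    /* Address depends on Y; it may equal i, the word just written
     * (a read-after-write dependency). */
    *b = m[IND(y >> ALPHA)] + x;           c.m_reads++;
    r[IND(i)] = *b;                        c.r_writes++;
    return c;
}
```

Two of the four reads use addresses that are only known after an earlier read or addition completes, which is where the dependencies and RAW hazards come from.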
Algorithm to Architecture
• Hardware Limits
  – Max. of 2 simultaneous reads from memory
• Can’t do better than two stages
• Each stage must take multiple cycles to complete
• Chosen Timing
  – Addition = 1 cycle
  – Memory Read = 0.5 cycles
  – Memory is clocked ½ period out of phase
  – Set address and receive data in 1 cycle
• When forwarding is applied, need 4 cycles per stage
Algorithm to Architecture
[Block diagram: SRAM (M), SRAM (R), FSM, counters, control logic, adders, and registers X, Y, Y1, A, B, M1, M2, M3, M4]

Stage 1
M1 = M[i+16]
X = M[i] | A = M1 + barrel(A)
M3 = M[X] | C1 = (X==i-1)
Y1 = A + ((C1) ? Y : M3)

Stage 2
Y = B + Y1
M4 = M[Yaddr] | C2 = (i==Yaddr)
B = X + ((C2) ? Y : M4)
M[i] = Y | R[i] = B
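The two-stage schedule can be checked against the original loop with a small software model. The sketch below is an assumption-laden illustration, not the team's RTL: it models the write-back of iteration i-1 as still pending while stage 1 of iteration i reads M, and applies the C1 and C2 forwarding conditions from the stage listing. The names `run_reference` and `run_staged` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALPHA     8u
#define SIZE      (1u << ALPHA)
#define WORDS     32u
#define IND(x)    ((x) & (WORDS - 1u))
#define BARREL(a) (((a) << 19) ^ ((a) >> 13))

/* Reference: the loop as written on the algorithm slide. */
static void run_reference(uint32_t *m, uint32_t *r, uint32_t *a, uint32_t *b)
{
    assert(m != NULL && r != NULL && a != NULL && b != NULL);
    for (uint32_t i = 0; i < SIZE; i++) {
        uint32_t x = m[IND(i)];
        *a = BARREL(*a) + m[IND(i + 16u)];
        uint32_t y = m[IND(x)] + *a + *b;
        m[IND(i)] = y;
        *b = m[IND(y >> ALPHA)] + x;
        r[IND(i)] = *b;
    }
}

/* Staged model: stage 1 reads M before the previous iteration's
 * write-back lands, so M[X] is forwarded when X addresses word i-1. */
static void run_staged(uint32_t *m, uint32_t *r, uint32_t *a, uint32_t *b)
{
    assert(m != NULL && r != NULL && a != NULL && b != NULL);
    int pending = 0;                       /* is a write-back waiting? */
    uint32_t p_idx = 0, p_y = 0, p_b = 0;  /* pending index, Y and B */

    for (uint32_t i = 0; i < SIZE; i++) {
        /* Stage 1 */
        uint32_t m1 = m[IND(i + 16u)];
        uint32_t x  = m[IND(i)];
        *a = m1 + BARREL(*a);
        uint32_t m3 = m[IND(x)];
        int c1 = pending && (IND(x) == p_idx);   /* C1 = (X == i-1) */
        uint32_t y1 = *a + (c1 ? p_y : m3);

        /* Last step of stage 2 for iteration i-1: M[i-1] = Y, R[i-1] = B */
        if (pending) { m[p_idx] = p_y; r[p_idx] = p_b; }

        /* Stage 2 */
        uint32_t y = *b + y1;
        uint32_t yaddr = IND(y >> ALPHA);
        uint32_t m4 = m[yaddr];
        int c2 = (yaddr == IND(i));              /* C2 = (i == Yaddr) */
        *b = x + (c2 ? y : m4);

        p_idx = IND(i); p_y = y; p_b = *b; pending = 1;
    }
    if (pending) { m[p_idx] = p_y; r[p_idx] = p_b; }  /* drain the last write */
}
```

Running both functions on identical copies of M should leave identical M, R, A, and B, which is the property the forwarding paths exist to preserve.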
Design For Manufacture
Regular Fabrics
Why DFM?
• Ability to print on smaller processes
• Robust Manufacturability
• Sacrifice area, speed and metal layers for a regular design
Sample Layout:
Regular Fabrics
Lithography Simulations
Hardware
Adder
• Four adders execute 256 times.
• Hybrid adder
• Fast and low power.
[Figure: 32-bit adder built from four blocks, most to least significant: CS4 (A[31:28], B[31:28] → S[31:28]), CS18 (A[27:10], B[27:10] → S[27:10]), CS6 (A[9:4], B[9:4] → S[9:4]), CS4 (A[3:0], B[3:0] → S[3:0]); block carries C'[4], C[10], C'[28], C[32]]
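The block partition in the adder figure can be modelled functionally in C. This sketch assumes that "CS" stands for carry-skip, which the slides do not state outright (the references do cite a carry-skip adder paper). It models only the 4/6/18/4-bit split and the skip rule, not the circuit itself; `block_add32` and `BLOCK_WIDTH` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block widths from the figure, least significant block first:
 * S[3:0], S[9:4], S[27:10], S[31:28]. */
static const unsigned BLOCK_WIDTH[4] = {4u, 6u, 18u, 4u};

/* Adds a and b block by block. carry_out (optional) receives C[32]. */
static uint32_t block_add32(uint32_t a, uint32_t b, unsigned *carry_out)
{
    uint32_t sum = 0;
    unsigned carry = 0;   /* carry into the current block */
    unsigned lo = 0;      /* lowest bit position of the current block */

    for (size_t k = 0; k < 4; k++) {
        unsigned w = BLOCK_WIDTH[k];
        uint32_t mask = (1u << w) - 1u;          /* w <= 18, so no overflow */
        uint32_t ab = (a >> lo) & mask;
        uint32_t bb = (b >> lo) & mask;
        uint32_t s  = ab + bb + carry;           /* at most w+1 bits */
        unsigned ripple = (unsigned)(s >> w);    /* carry produced by the block */

        /* Carry-skip rule: if every bit position propagates (a XOR b is
         * all ones), the block's carry-out equals its carry-in. */
        int propagate = ((ab ^ bb) == mask);
        unsigned skip = propagate ? carry : ripple;
        assert(skip == ripple);                  /* the bypass never changes the result */

        sum |= (s & mask) << lo;
        carry = skip;
        lo += w;
    }
    if (carry_out != NULL) {
        *carry_out = carry;
    }
    return sum;
}
```

The carries handed from block to block sit at bit positions 4, 10, 28, and 32, matching the carry labels in the figure.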
32-Bit Adder: First 4 Bits
32-Bit Adder: CS6 Block
32-Bit Adder: CS18 Block
32 Bit Fast Adder
Adder Performance
• Delay: 1.56 ns
• Energy Consumption (worst-case switching): 12.4 pJ
• Power Dissipation (estimated with our switch factor): 148 μW
SRAM
Single Bus Cell
Double Bus Cell
SRAM
Functional Verification
• Structural Verilog vs. C Code:
  – Generate numbers under equal load conditions
  – Compare Numbers
• Schematic vs. Structural Verilog
  – Under equal inputs, check if port outputs match
• LVS
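The "Compare Numbers" step amounts to a word-by-word comparison of two output streams. A minimal sketch follows, assuming the C model and the Verilog simulation each produce an array of 32-bit words; `first_mismatch` is a hypothetical helper, and how the simulator output is captured is outside its scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first differing word, or n if all n words match. */
static size_t first_mismatch(const uint32_t *ref, const uint32_t *dut, size_t n)
{
    assert(ref != NULL && dut != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ref[i] != dut[i]) {
            return i;
        }
    }
    return n;
}
```

Reporting the first differing index, rather than a plain pass/fail, points at the loop iteration where the two implementations diverge.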
Verification
• Schematic and Extracted Parasitic spice simulations of major blocks
  – Check for clean signals
  – Check delays and rise/fall times
• Extracted Parasitic simulation of critical Register-Register Path
  – Signals are clean
  – Delay = 2.1 ns
• Extracted Parasitic simulation of chip clock distribution
Critical Delay
Final Layout
Poly Density 7.52%
Metal1 Density 20.85%
Metal2 Density 19.89%
Metal3 Density 18.76%
Metal4 Density 9.36%
Metal5 Density 6.8%
Analysis
Specifications
• Pins
  – 36 input pins
    • 32 bit seed input, gen, read, rst, clk
  – 34 output pins
    • 32 bit random output, rdy, done
  – 2 input/output pins
    • vdd, gnd
• 475 MHz chip speed
• 436 kHz throughput
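The speed and throughput figures can be related with a little arithmetic. The sketch below assumes that "throughput" counts complete 1024-bit runs per second, and that with both stages overlapped one loop iteration finishes every 4 cycles; neither reading is stated explicitly on the slides. The 2.1 ns figure is the register-to-register delay quoted on the verification slide, and all function names are hypothetical.

```c
#include <assert.h>

/* Cycles spent per run, given the clock and the number of runs per second. */
static double cycles_per_run(double clock_hz, double runs_per_sec)
{
    assert(runs_per_sec > 0.0);
    return clock_hz / runs_per_sec;
}

/* Lower bound on cycles per run: iterations x cycles per stage
 * (assumes the two stages overlap completely). */
static double ideal_cycles_per_run(double iterations, double cycles_per_stage)
{
    return iterations * cycles_per_stage;
}

/* Random bits produced per second. */
static double output_bits_per_sec(double runs_per_sec, double bits_per_run)
{
    return runs_per_sec * bits_per_run;
}

/* Highest clock frequency permitted by a critical-path delay. */
static double max_clock_hz(double critical_path_s)
{
    assert(critical_path_s > 0.0);
    return 1.0 / critical_path_s;
}
```

With the slide values, `cycles_per_run(475e6, 436e3)` is about 1089 against an ideal of `ideal_cycles_per_run(256, 4)` = 1024, `output_bits_per_sec(436e3, 1024)` is about 446 Mbit/s, and `max_clock_hz(2.1e-9)` is about 476 MHz, in line with the 475 MHz chip speed.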
Putting it All Together
Where a cell shows two values, they are listed as Schematic / ExtractRC.

Part | Trans. Count | Area (μm²) | Density | Prop Delay (ns) | Power (1x) (mW), 500 MHz | Power (Avg) (mW), 475 MHz
Adders (4) | 5,856 (1,464 ea.) | 25,200 (6,300 ea.) | 0.232 | 1.45 / 1.56 | 0.60 / 0.62 | 0.14 / 0.148
SRAM (M&R) | 17,736 (M=10,458, R=7,278) | 51,000 (M=35,000, R=16,000) | 0.348 (M=0.293, R=0.456) | 0.735 / 0.845 | W: 0.51 / 3.25, R: 0.19 / 1.40 | 0.27 / 1.86
Regs (10) | 6,400 (640 ea.) | 38,400 (3,840 ea.) | 0.167 | 0.220 / 0.275 | 0.53 / 0.59 | 0.13 / 0.145
Total | 33,371 | 182,000 | 0.194 | 2.1 ns (475 MHz) | ----- | 4.1 mW
Performance Comparison
Operation time for ~4,000,000 runs

Platform | Clock | Process | Time (ms)
Intel P4 | 3.20 GHz | 90 nm | 5000
W1-2006 | 475 MHz | 180 nm | 9000
AMD Opteron Blade | 1.005 GHz | not given | 14000
ARM Intel XScale | 700 MHz | not given | 125000
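Because the platforms run at very different clock rates, normalising the times to runs per second and clock cycles per run makes the comparison easier to read. The sketch below uses only the numbers in the table and treats "~4,000,000" as exactly 4,000,000; the `platform` struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RUNS 4.0e6   /* "~4,000,000 runs", taken as exact here */

typedef struct {
    const char *name;
    double clock_hz;
    double time_ms;   /* time for RUNS runs */
} platform;

static const platform PLATFORMS[4] = {
    {"Intel P4",          3.20e9,  5000.0},
    {"W1-2006",           475.0e6, 9000.0},
    {"AMD Opteron Blade", 1.005e9, 14000.0},
    {"ARM Intel XScale",  700.0e6, 125000.0},
};

/* Runs completed per second on platform p. */
static double runs_per_sec(const platform *p)
{
    assert(p != NULL && p->time_ms > 0.0);
    return RUNS / (p->time_ms / 1000.0);
}

/* Clock cycles spent per run on platform p. */
static double cycles_per_run(const platform *p)
{
    assert(p != NULL);
    return p->clock_hz * (p->time_ms / 1000.0) / RUNS;
}
```

On these figures the W1-2006 design spends roughly 1,070 cycles per run, compared with about 4,000 for the P4, about 3,500 for the Opteron, and about 21,900 for the XScale, so its longer wall-clock time relative to the P4 comes from the lower clock rate rather than from work per cycle.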
Where to Now?
• ERC, tapeout, etc.
• Thermal noise unit to use as input seed
• On-Chip Bus Interface
• HyperTransport™ Interface
References
• Jenkins, Robert J. “ISAAC”. http://burtleburtle.net/bob/rand/isaac.html
• Chirca, Schulte, Glossner, et al. “A Static Low-Power, High-Performance 32-bit Carry Skip Adder”. http://mesa.ece.wisc.edu/publications/cp_2004-12.pdf
• “CLA and Ling Adders”. http://umunhum.stanford.edu/~farland/notes.html
Questions