SCOTT MILLER, AMBROSE CHU, MIHAI SIMA, MICHAEL MCGUIRE
ReCoEng Lab
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
UNIVERSITY OF VICTORIA
VICTORIA, B.C., CANADA
VLSI Implementation of a Cryptography-Oriented Reconfigurable Array
DSD 2008 - Parma, Italy ReCoEng Lab, University of Victoria, Canada
Outline
Motivation and Problem Statement
Overview of Current FPGAs
• Limitations for Cryptography
• Carry Lookahead Addition
CryptoRA
Tile Implementation, Split LUT
Simulation Framework
Results
Conclusions
Motivation
Problem
• Cryptography on mobile, embedded systems
• ASICs are expensive: recurring engineering, quick obsolescence
• Poor long-integer arithmetic support in current FPGAs
Design Constraints
• Low added complexity
• No (negligible) impact on reconfigurability
• “Cheap” solution
Overview of FPGAs
Grid of computing units
Mesh of configurable interconnection busses
Emulate any digital logic function
Global interconnect is slow
[Figure: grid of CLBs]
Xilinx Virtex-II
• 4-input LUT
• Support for ripple-carry and carry-lookahead adders
Carry-Lookahead Addition
Ripple-carry adders have serial delay
Carry-lookahead adders calculate carries in parallel
Can use hierarchies of CLA adders to speed up long-operand calculations
OPERANDS FOR CLA
1 + 1 = Generate
1 + 0 = Propagate
0 + 1 = Propagate
0 + 0 = Nothing
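The generate/propagate rules above can be sketched in software. This is a minimal textbook illustration, not the circuit from the slides; the function name `cla_add` and its parameters are assumptions made for the example.

```python
def cla_add(a, b, width, carry_in=0):
    """Add two width-bit integers using per-bit generate/propagate signals."""
    # 1 + 1 generates a carry; 1 + 0 or 0 + 1 propagates an incoming carry.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # c[i+1] = g[i] OR (p[i] AND c[i]). Hardware expands this recurrence into
    # flat logic so all carries settle in parallel; this loop evaluates the
    # same equations one after another.
    c = [carry_in & 1]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))

    # Each sum bit is the propagate bit XOR the carry into that position.
    total = 0
    for i in range(width):
        total |= (p[i] ^ c[i]) << i
    return total, c[width]
```

For instance, `cla_add(0b1011, 0b0110, 4)` returns the 4-bit sum together with the carry out of the top bit.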
FPGAs: Limitations for Cryptography
Poor support for long-integer arithmetic
• Long ripple-carry chains (with global interconnects)
• Fast adders still require multiple stages of global interconnects
Same difficulties for comparison operations
Required in most common ECC and RSA algorithms
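One way to see why comparison shares the adder's difficulty: an unsigned comparison is commonly computed as a subtraction whose carry-out gives the answer, so it depends on the same long carry chain. The sketch below shows that standard technique; `compare_ge` is a hypothetical name, and nothing here is taken from the CryptoRA design itself.

```python
def compare_ge(a, b, width):
    """Return True when a >= b for unsigned width-bit operands."""
    mask = (1 << width) - 1
    # a - b is computed as a + (~b) + 1. The carry out of the top bit is 1
    # exactly when no borrow occurs, i.e. when a >= b, so the result is only
    # known after the carry has crossed all `width` bit positions.
    total = (a & mask) + (~b & mask) + 1
    return bool(total >> width)
```

For the long operands used in RSA and ECC, `width` runs to hundreds or thousands of bits, which is where the carry-chain delay dominates.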
Proposed Solution: CryptoRA
Based on Xilinx architecture
Additional fast-path provided for simultaneous Carry, Propagate signals
Extends the fast path across rows as well as columns
Splits LUT to handle subtraction, etc.
Simulation Framework
All designs simulated in 65nm technology
• Simulated with Cadence Spectre simulator
• Average taken of 10 Monte Carlo runs with process variation and mismatch included
Simulated simplified CLB models
• Many components outside the scope of this research
• Respective loads for omitted modules were included
Timing simulated at every point of interest in the LUT -> fast chain path to find all timing trade-offs
Results: Split LUT
Edge Delay (ps)
Scenario                Selection Path    Generate Path     LUT Output        Cout
                        Good     Poor     Good     Poor     Good     Poor     Good     Poor
Incremental
2-select/2-generate     30       71       46       56       94       129      104      169
2-select/3-generate     31       72       58       115      98       128      117      171
3-select/3-generate     38       88       60       112      102      137      123      182
Virtex-II / Spartan-3   106      157      50       111      62       147      156      260
Virtex-4                59       120      50       124      59       120      155      209
Virtex-5                141      276      121      204      121/141  204/276  299      353

Speed-up Versus
Design                  Virtex-II/Spartan-3   Virtex-4   Virtex-5
2-select/2-generate     1.54                  1.24       2.09
2-select/3-generate     1.52                  1.22       2.06
3-select/3-generate     1.42                  1.15       1.94
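The slides do not state how the speed-up figures were derived. They appear consistent with the ratio of the worst-case ("Poor") Cout delays, reference over proposed design, and the sketch below works under that assumption. The dictionary and function names are hypothetical; the delays are copied from the results above.

```python
# Worst-case ("Poor") Cout edge delays in ps, copied from the delay results.
COUT_POOR_PS = {
    "2-select/2-generate": 169,
    "2-select/3-generate": 171,
    "3-select/3-generate": 182,
    "Virtex-II/Spartan-3": 260,
    "Virtex-4": 209,
    "Virtex-5": 353,
}


def speedup(design, reference):
    """Reference delay divided by design delay (assumed definition)."""
    return COUT_POOR_PS[reference] / COUT_POOR_PS[design]


if __name__ == "__main__":
    for design in ("2-select/2-generate", "2-select/3-generate", "3-select/3-generate"):
        row = [speedup(design, ref) for ref in ("Virtex-II/Spartan-3", "Virtex-4", "Virtex-5")]
        print(design, " ".join(f"{value:.2f}" for value in row))
```

Under this assumption the computed ratios agree with the reported speed-ups to within about 0.01; the small residual on the 3-select/3-generate row versus Virtex-II/Spartan-3 may come from rounding in the source.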
[Figure: bar chart of the Split LUT results. Groups: Selection, Generate, LUT_Out, Cout. Series: 2-sel/2-gen, 2-sel/3-gen, 3-sel/3-gen, V-II/S-3, V-4, V-5. Vertical axis from 0 to 400.]
Results - Discussion
Performance boost of added carry-chain and additional fast-path cannot be directly quantified
• Dependence on physical FPGA itself, and operand word-length
Hierarchical carry-lookahead adders show promise with the new chains for increased performance
• Example calculations are given in the paper
Performance comes at 2.5% area increase over smallest reference structure
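To illustrate what a hierarchical carry-lookahead adder computes, the sketch below derives the carry into each block purely from block-level generate/propagate pairs. It is a textbook construction under assumed names (`block_gp`, `hierarchical_carries`) and an assumed 4-bit block size; it does not reproduce the example calculations from the paper.

```python
def block_gp(a, b, lo, size):
    """Group generate/propagate for bits lo .. lo+size-1 of a and b."""
    G, P = 0, 1  # identity: an empty group generates nothing, propagates everything
    for i in range(lo, lo + size):
        g = (a >> i) & (b >> i) & 1
        p = ((a >> i) ^ (b >> i)) & 1
        # Extend the group with the next more significant bit i.
        G = g | (p & G)
        P = p & P
    return G, P


def hierarchical_carries(a, b, width, block=4, carry_in=0):
    """Carry into each block (plus the final carry out), from block (G, P) pairs."""
    carries = [carry_in & 1]
    for lo in range(0, width, block):
        G, P = block_gp(a, b, lo, min(block, width - lo))
        # Same recurrence as the bit level, applied one level up.
        carries.append(G | (P & carries[-1]))
    return carries
```

With 16-bit operands and 4-bit blocks, the upper level resolves four block carries instead of sixteen bit carries, which is the property that makes fast local carry and propagate paths attractive for long operands.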
Conclusions
Split LUT structure enhances performance at minor (2.5%) area penalty
Increased carry-chain speed and avoidance of global interconnect improve long-integer operation performance
Line-loading overhead from extra fast-chains is very small
This device shows promise for performing cryptographic operations.
Thank You for Listening
Any Questions?
Scott [email protected]
http://www.ece.uvic.ca/~smiller