8/3/2019 ECE545 Lecture2 Project 6
ECE 545
Digital System Design with VHDL
Course web page:
ECE web page -> Courses -> Course web pages -> ECE 545
http://ece.gmu.edu/coursewebpages/ECE/ECE545/F10/
Kris Gaj
Office hours: Monday, 7:30-8:30 PM,
Wednesday, 6:00-7:00 PM,
and by appointment
Research and teaching interests:
- reconfigurable computing
- computer arithmetic
- cryptography
- network security
Contact: The Engineering Building, room 3225
ECE 545
Part of:
MS in Electrical Engineering
MS in Computer Engineering
Digital Systems Design
Microprocessor and Embedded Systems
Strongly suggested for two concentration areas:
Elective
Elective course in the remaining concentration areas
One of five core courses (must be passed with B or better)
[Figure: design levels and the courses that cover them]
Design levels: algorithmic, register-transfer, gate, transistor, layout, devices
Courses:
- ECE 645 Computer Arithmetic
- ECE 545 Digital System Design with VHDL
- ECE 586 Digital Integrated Circuits
- ECE 680 Physical VLSI Design
- ECE 682 VLSI Test Concepts
- ECE 684 MOS Device Electronics
- ECE 584 Semiconductor Device Fundamentals
- ECE 681 VLSI Design for ASICs
DIGITAL SYSTEMS DESIGN
Concentration advisors: Kris Gaj, Jens-Peter Kaps, Ken Hintz
1. ECE 545 Digital System Design with VHDL
   K. Gaj, project, FPGA design with VHDL, Aldec/Mentor Graphics, Xilinx/Altera
2. ECE 645 Computer Arithmetic
   K. Gaj, project, FPGA design with VHDL or Verilog, Aldec/Mentor Graphics, Xilinx/Altera
3. ECE 681 VLSI Design for ASICs
   N. Klimavicz, project/lab, back-end ASIC design with Synopsys tools
4. ECE 586 Digital Integrated Circuits
   D. Ioannou, R. Mulpuri
5. ECE 682 VLSI Test Concepts
   T. Storey
Grading Scheme
- Homework - 10%
- Project - 40%
- Midterm Exam - 20%
- Final Exam - 30%
Midterm exam 1
- 2 hours 30 minutes
- in class
- design-oriented
- open-book, open-notes
- practice exams will be available on the web
Tentative date: Monday, November 1st
Final exam
- 2 hours 45 minutes
- in class
- design-oriented
- open-book, open-notes
- practice exams will be available on the web
Date: Monday, December 20, 7:30-10:15pm
Project
- individual
- semester-long
- related to the research project conducted by the Cryptographic Engineering Research Group (CERG) at GMU
- supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard
Background
Hash Function
A hash function h maps a message m of arbitrary length to a hash value h(m) of fixed length.
It is computationally infeasible to find two different messages m and m' such that h(m) = h(m').
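The fixed-length property can be illustrated with a few lines of Python. This is only a sketch: it uses SHA-256 from the standard hashlib module as an example hash function (the slide does not name a particular function), and the helper name hash_hex is made up for illustration.

```python
import hashlib

def hash_hex(message: bytes) -> str:
    """Return the SHA-256 hash value h(m) of message m as a hex string."""
    return hashlib.sha256(message).hexdigest()

# Messages of very different lengths map to hash values of the same length.
short_digest = hash_hex(b"a")
long_digest = hash_hex(b"a" * 1_000_000)

# Each hex character encodes 4 bits, so both lines report a 256-bit hash value.
print(len(short_digest) * 4)
print(len(long_digest) * 4)
```

Changing even one character of the message gives an unrelated hash value, which is what makes the hash usable as a compact stand-in for the whole message.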
Main Application: Digital Signature
Signature: HANDWRITTEN vs. DIGITAL
Example of a digital signature:
A6E3891F2939E38C745B
25289896CA345BEF5349
245CBA653448E349EA47
Main Goals:
- unique identification
- proof of agreement to the contents of the document
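Typical Digital Signature Scheme
Alice (signer):
- Message -> hash function -> hash value
- Hash value -> public key cipher (Alice's private key) -> signature
- Alice sends the message and the signature to Bob
Bob (verifier):
- Message -> hash function -> hash value 1
- Signature -> public key cipher (Alice's public key) -> hash value 2
- Hash value 1 = hash value 2? yes / no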
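The hash-then-sign flow of this scheme can be sketched in Python. Everything specific here is an assumption made for illustration: textbook RSA stands in for the "public key cipher", SHA-256 for the hash function, and the key is built from two small Mersenne primes, which is far too weak (and lacks the padding) for any real use.

```python
import hashlib

# Toy RSA key (illustration only, not secure).
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent (Alice's public key)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Alice's private key), Python 3.8+

def hash_value(message: bytes) -> int:
    """Hash the message and reduce the result into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice: hash the message, then apply the cipher with her private key."""
    return pow(hash_value(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob: compare hash value 1 (from the message) with hash value 2 (from the signature)."""
    hash_1 = hash_value(message)
    hash_2 = pow(signature, E, N)
    return hash_1 == hash_2

signature = sign(b"I agree to the contents of this document")
print(verify(b"I agree to the contents of this document", signature))
```

Because only the fixed-length hash value is passed through the public key cipher, the cost of signing does not grow with the cost of the cipher as the document gets longer, only with the cost of hashing.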
Handwritten and Digital Signatures: Common Features
Handwritten signature Digital signature
1. Unique
2. Impossible to be forged
3. Impossible to be denied by the author
4. Easy to verify by an independent judge
5. Easy to generate
Handwritten and Digital Signatures: Differences
Handwritten signature:
6. Associated physically with the document
7. Almost identical for all documents
8. Usually at the last page
Digital signature:
6. Can be stored and transmitted independently of the document
7. Function of the document
8. Covers the entire document
Hash function algorithms
Customized (dedicated):
- MD2: Rivest, 1988
- MD4: Rivest, 1990
- MD5: Rivest, 1990
- SHA-0: NSA, 1992
- SHA-1: NSA, 1995
- RIPEMD, RIPEMD-160: European RACE Integrity Primitives Evaluation Project, 1992
- SHA-256, SHA-384, SHA-512: NSA, 2000
Based on block ciphers:
- MDC-2, MDC-4: IBM, Brachtl, Meyer, Schilling, 1988
Based on modular arithmetic:
- MASH-1: 1988-1996
Attacks against dedicated hash functions known by 2004
- MD2: partially broken
- MD4: broken, H. Dobbertin, 1995 (one hour on PC, 20 free bytes at the start of the message)
- MD5: partially broken, collisions for the compression function, Dobbertin, 1996 (10 hours on PC)
- SHA-0: weakness discovered, 1995 NSA, 1998 France
- SHA-1: no attack listed
- RIPEMD: reduced round version broken, Dobbertin 1995
- RIPEMD-160: no attack listed
- SHA-256, SHA-384, SHA-512: no attack listed
What was discovered in 2004-2005?
- MD4: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
- MD5: broken; Wang, Feng, Lai, Yu, Crypto 2004 (1 hr on a PC)
- SHA-0: attack with 2^40 operations, Crypto 2004
- SHA-1: attack with 2^63 operations; Wang, Yin, Yu, Aug 2005
- RIPEMD: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
- RIPEMD-160: no attack listed
- SHA-256, SHA-384, SHA-512: no attack listed
2^63 operations (Schneier, 2005)
In hardware:
Machine similar to the one used to break DES:
- Cost = $50,000-$70,000, Time: 18 days
or
- Cost = $0.9-$1.26M, Time: 24 hours
In software:
Computer network similar to distributed.net used to break DES (~331,252 computers):
- Cost = ~$0, Time: 7 months
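The two hardware estimates are consistent with each other if one assumes that the attack parallelizes perfectly, so that cost multiplied by time stays constant. The sketch below checks that arithmetic; the function name and the linear-scaling model are assumptions for illustration, not part of the cited estimate.

```python
def scaled_cost(cost_usd: float, time_days: float, target_days: float) -> float:
    """Cost of a machine that finishes in target_days, assuming cost x time is constant."""
    return cost_usd * time_days / target_days

# Scale the 18-day machine down to 24 hours (1 day).
low = scaled_cost(50_000, 18, 1)
high = scaled_cost(70_000, 18, 1)
print(low, high)  # 900000.0 1260000.0, i.e. $0.9M to $1.26M
```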
Cryptographic Standards
So how have cryptographic standards been created so far?
National Security Agency
(also known as No Such Agency or Never Say Anything)
Created in 1952 by President Truman
Goals:
- designing strong ciphers (to protect U.S. communications)
- breaking ciphers (to listen to non-U.S. communications)
Budget and number of employees kept secret
Largest employer of mathematicians in the world
Largest purchaser of computer hardware
NSA-developed Cryptographic Standards
[Figure: timeline, 1970-2010]
Block ciphers: DES (Data Encryption Standard), Triple DES
Hash functions: SHA-0, SHA-1 (Secure Hash Algorithm), SHA-2
Years marked on the timeline: 1977, 1999 (block ciphers); 1993, 1995, 2003 (hash functions); 2005
Cryptographic Standard Contests
[Figure: timeline, 1996-2012]
- AES: IX.1997 - X.2000, 15 block ciphers, 1 winner
- NESSIE: I.2000 - XII.2002
- CRYPTREC
- eSTREAM: XI.2004 - V.2008, 34 stream ciphers, 4 SW + 4 HW winners
- SHA-3: X.2007 - XII.2012, 51 hash functions, 1 winner
SHA-3 Contest - NIST Evaluation Criteria
- Security
- Software Efficiency
- Hardware Efficiency (FPGAs, ASICs)
- Simplicity
- Flexibility
- Licensing
Software or hardware?
SOFTWARE vs. HARDWARE, compared in terms of:
- security of data during transmission
- flexibility (new cryptoalgorithms, protection against new attacks)
- speed
- random key generation
- access control to keys
- tamper resistance
- low cost
- resistance to side-channel attacks
Primary efficiency indicators
- Software: Speed, Memory
- Hardware: Speed, Area
- Also shown: Memory, Power consumption
Efficiency parameters
- Latency: time to encrypt/decrypt a single block of data (Mi -> encryption/decryption -> Ci)
- Throughput = Speed: number of bits encrypted/decrypted in a unit of time (Mi, Mi+1, Mi+2 -> encryption/decryption -> Ci, Ci+1, Ci+2)
$$\text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$$
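The throughput formula can be turned into a small calculator. This is a minimal sketch: the function name, the choice of nanoseconds and Mbit/s as units, and the example numbers are assumptions for illustration and are not taken from any measured design.

```python
def throughput_mbit_s(block_size_bits: int, blocks_in_parallel: int, latency_ns: float) -> float:
    """Throughput = block_size * number_of_blocks_processed_simultaneously / latency.

    bits per nanosecond equals Gbit/s, so the factor 1000 converts the result to Mbit/s.
    """
    return block_size_bits * blocks_in_parallel * 1000 / latency_ns

# Hypothetical iterative design: one 128-bit block at a time, 320 ns per block.
print(throughput_mbit_s(128, 1, 320))   # 400.0

# Hypothetical pipelined design: same latency, but 4 blocks in flight at once.
print(throughput_mbit_s(128, 4, 320))   # 1600.0
```

The second call shows why latency and throughput are separate parameters: processing several blocks simultaneously raises throughput without shortening the time any single block takes.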
Advanced Encryption Standard (AES) Contest, 1997-2001
- June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica
- Round 1 criteria: Security, Software efficiency, Flexibility
- August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish
- Round 2 criteria: Security, Hardware efficiency
- October 2000: 1 winner: Rijndael (Belgium)
Speed of the final AES candidates in Xilinx FPGAs
[Figure: bar chart of Speed [Mbit/s], 0 to 500, for Serpent, Rijndael, Twofish, RC6, Mars]
K. Gaj, P. Chodowiec, AES3, April 2000
Survey filled by 167 participants of the Third AES Conference, April 2000
[Figure: bar chart of # votes, 0 to 100, for Serpent, Rijndael, Twofish, RC6, Mars]
Results of the NSA group: ASICs
Speed [Mbit/s], AES3, April 2000
Algorithm   NSA ASIC   GMU FPGA
Serpent     202        431
Rijndael    606        414
Twofish     105        177
RC6         103        143
Mars        57         61
Efficiency in software: NIST-specified platform
200 MHz Pentium Pro, Borland C++
[Figure: bar chart of Speed [Mbit/s], 0 to 30, for Serpent, Rijndael, Twofish, RC6, Mars with 128-bit, 192-bit, and 256-bit keys]
NIST Report: Security
[Figure: Security (Adequate to High) vs. Complexity (Simple to Complex) for Rijndael, MARS, Serpent, Twofish, RC6]
AES Final Report, October 2000
NIST SHA-3 Contest - Timeline
- Oct. 2008: 51 candidates, start of Round 1
- July 2009: 14 candidates, start of Round 2
- End of 2010: 5-6 candidates, start of Round 3
- Mid 2012: 1-2 candidates remaining
GMU Team Goals
- Fair and comprehensive methodology for evaluation of hardware performance in FPGAs
- High-speed, fully autonomous implementations of all 14 SHA-3 candidates & SHA-2
  - 256-bit & 512-bit variants
  - optimized for the maximum throughput to area ratio
- Open-source benchmarking tool supporting optimization of tool options and efficient generation of results for multiple FPGA families
Primary Designers of GMU Codes
- Ekawat Homsirikamol, a.k.a. Ice
- Marcin Rogawski
Developed optimized VHDL implementations of 14 Round 2 SHA-3 candidates + SHA-2 in two variants each (256 & 512-bit output), for some functions using several alternative architectures
Methodology
Comprehensive Evaluation
- two major vendors: Altera and Xilinx (~90% of the market)
- multiple high-performance and low-cost families
Technology   Altera low-cost   Altera high-performance   Xilinx low-cost   Xilinx high-performance
90 nm        Cyclone II        Stratix II                Spartan 3         Virtex 4
65 nm        Cyclone III       Stratix III               (none)            Virtex 5
Uniform Evaluation
- Language: VHDL
- Tools: FPGA vendor tools
- Interface
- Performance Metrics
- Design Methodology
- Benchmarking
Why Interface Matters?
- Pin limit: total number of I/O ports <= total number of an FPGA's I/O pins
- Support for the maximum throughput: time to load the next message block <= time to process the previous block
Interface: Two possible solutions
1. Length of the message communicated at the beginning
   + easy to implement
   + passive source circuit
   - area overhead for the counter of message bits
2. Dedicated end of message port
   - more intelligent source circuit required
   + no need for internal message bit counter
[Figure: SHA core with the two input formats; signals shown: msg_bitlen, zero_word, message, end_of_msg]
SHA Core: Interface & Typical Configuration
- SHA core is an active component; surrounding FIFOs are passive and widely available
- Input interface is separate from an output interface
- Processing a current block, reading the next block, and storing a result for the previous message can be all done in parallel
[Figure: Input FIFO -> SHA core -> Output FIFO; all three blocks share clk and rst; all data buses are w bits wide]
- Input FIFO: din (ext_idata), dout (idata), full (fifoin_full), empty (fifoin_empty), write (fifoin_write), read (fifoin_read)
- SHA core: din, dout, src_ready, src_read, dst_ready, dst_write
- Output FIFO: din (odata), dout (ext_odata), full (fifoout_full), empty (fifoout_empty), write (fifoout_write), read (fifoout_read)
SHA Core: Interface & Typical Configuration
[Figure: same configuration as on the previous slide, except that the external sides of the Input FIFO and Output FIFO are clocked by a separate clock, io_clk]
Some functions may require a faster input/output clock in order to load input data at a faster rate
Performance Metrics
Primary:
1. Throughput (single long message)
2. Area
3. Throughput / Area
Secondary:
4. Hash Time for Short Messages (up to 1000 bits)
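Performance Metrics - Area
We force these vectors to look as follows through the synthesis and implementation options:
[Figure: resource utilization vectors in which only the Area component is nonzero and all remaining components are 0]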
Choice of Optimization Target
Primary Optimization Target: Throughput to Area Ratio
Features:
- practical: good balance between speed and cost
- very reliable guide through the entire design process, facilitating the choice of
  - high-level architecture
  - implementation of basic components
  - choice of tool options
- leads to high-speed, close-to-maximum-throughput designs
Our Design Flow
[Figure: design flow]
- Specification -> Interface
- Datapath: block diagram (built from the library of basic components)
- Controller: ASM chart (built from the controller template)
- Block diagram + ASM chart -> VHDL code; formulas for throughput & hash time
- VHDL code -> max. clock frequency, resource utilization
- Results: Throughput, Area, Throughput/Area, Hash Time for Short Messages
Basic Operations of 14 SHA-3 Candidates
[Table of operations per candidate not recoverable from the extraction]
Legend: NTT = Number Theoretic Transform, GF MUL = Galois Field multiplication, MUL = integer multiplication, mADDn = multioperand addition with n operands
ATHENa: Automated Tool for Hardware EvaluatioN
Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE FPGA platforms
Under development at George Mason University.
http://cryptography.gmu.edu/athena
Basic Dataflow of ATHENa
[Figure: numbered dataflow (steps 0-8) between Designer, ATHENa Server, and User]
- Designer: interfaces + testbenches; HDL + FPGA tools; FPGA synthesis and implementation; result summary + database entries
- ATHENa Server: HDL + scripts + configuration files; database entries; download of scripts and configuration files
- User: database query; ranking of designs
[Figure: ATHENa inputs and outputs]
- Inputs: synthesizable source files, configuration files, testbench, constraint files
- Outputs: result summary (user-friendly), database entries (machine-friendly)
ATHENa Major Features (1)
- synthesis, implementation, and timing analysis in batch mode
- support for devices and tools of multiple FPGA vendors
- generation of results for multiple families of FPGAs of a given vendor
- automated choice of a best-matching device within a given family
ATHENa Major Features (2)
- automated verification of designs through simulation in batch mode
- support for multi-core processing
- automated extraction and tabulation of results
- several optimization strategies aimed at finding
  - optimum options of tools
  - best target clock frequency
  - best starting point of placement
Generation of Results Facilitated by ATHENa
- batch mode of FPGA tools
- ease of extraction and tabulation of results: Excel, CSV (available), LaTeX (coming soon)
- optimized choice of tool options
Relative Improvement of Results from Using ATHENa
Virtex 5, 256-bit Variants of Hash Functions
[Figure: bar chart of Area, Thr, and Thr/Area ratios (0 to 2.5); bars left to right: Groestl, SHAvite-3, Luffa, Keccak, Hamsi, ECHO, Skein, Fugue, SHA-2, BMW, CubeHash, BLAKE, Shabal, SIMD, JH]
Ratios of results obtained using ATHENa suggested options vs. default options of FPGA tools
Results
Throughput [Mbit/s]
Virtex 5, 256-bit variants of algorithms
[Figure: bar chart, 0 to 16000 Mbit/s; bars left to right: ECHO, Keccak, Groestl, Luffa, BMW, JH, CubeHash, Fugue, SHAvite-3, BLAKE, Skein, Hamsi, Shabal, SIMD, SHA-2]
Throughput [Mbit/s]
Virtex 5, 512-bit variants of algorithms
[Figure: bar chart, 0 to 14000 Mbit/s; bars left to right: Groestl, BMW, Luffa, Keccak, ECHO, SIMD, JH, SHAvite-3, BLAKE, CubeHash, Skein, Shabal, SHA-2, Hamsi, Fugue]
Normalization & Compression of Results
- Absolute result: e.g., throughput in Mbits/s, area in CLB slices
- Normalized result:
$$\text{normalized\_result} = \frac{\text{result\_for\_SHA-3\_candidate}}{\text{result\_for\_SHA-2}}$$
- Overall normalized result: geometric mean of normalized results for all investigated FPGA families
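The two-step compression of results (normalize to SHA-2, then take the geometric mean over FPGA families) can be sketched as follows. The function names and the example numbers are hypothetical and serve only to show the arithmetic.

```python
import math
from typing import Sequence

def normalized_result(result_for_candidate: float, result_for_sha2: float) -> float:
    """Result of a SHA-3 candidate relative to SHA-2 on the same FPGA family."""
    return result_for_candidate / result_for_sha2

def overall_normalized_result(normalized: Sequence[float]) -> float:
    """Geometric mean of the normalized results over all investigated FPGA families."""
    return math.prod(normalized) ** (1.0 / len(normalized))

# Hypothetical throughputs in Mbit/s as (candidate, SHA-2) pairs for two FPGA families.
per_family = [(4000.0, 2000.0), (8000.0, 1000.0)]
normalized = [normalized_result(c, s) for c, s in per_family]
print(normalized)                             # [2.0, 8.0]
print(overall_normalized_result(normalized))  # 4.0
```

A geometric mean is the natural choice for averaging ratios: a candidate that is 2x better on one family and 2x worse on another ends up at exactly 1, which an arithmetic mean would not give.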
Normalized Throughput & Overall Normalized Throughput
[Table not recoverable from the extraction]
Overall Normalized Throughput: 256-bit variants of algorithms
Normalized to SHA-256, Averaged over 7 FPGA families
[Figure: bar chart, 0 to 8; bars left to right: Keccak, ECHO, Luffa, BMW, Groestl, JH, CubeHash, Fugue, SHAvite-3, BLAKE, Hamsi, Skein, Shabal, SIMD]
Overall Normalized Throughput: 512-bit variants of algorithms
Normalized to SHA-512, Averaged over 7 FPGA families
[Figure: bar chart, 0 to 4; bars left to right: Groestl, Luffa, BMW, ECHO, Keccak, JH, SIMD, CubeHash, SHAvite-3, BLAKE, Skein, Shabal, Hamsi, Fugue]
Area [CLB slices]
Virtex 5, 256-bit variants of algorithms
[Figure: bar chart, 0 to 10000 CLB slices; bars left to right: SHA-2, CubeHash, Hamsi, Fugue, JH, SHAvite-3, Luffa, Keccak, Shabal, Skein, Groestl, BLAKE, BMW, ECHO, SIMD]
Area [CLB slices]
Virtex 5, 512-bit variants of algorithms
[Figure: bar chart, 0 to 18000 CLB slices; bars left to right: SHA-2, CubeHash, Fugue, JH, Keccak, Shabal, Skein, SHAvite-3, Luffa, Hamsi, Groestl, BLAKE, ECHO, BMW, SIMD]
Overall Normalized Area: 256-bit variants of algorithms
Normalized to SHA-256, Averaged over 7 FPGA families
[Figure: bar chart, 0 to 30; bars left to right: CubeHash, Hamsi, BLAKE, Luffa, Shabal, JH, Keccak, SHAvite-3, Skein, Fugue, Groestl, BMW, SIMD, ECHO]
Overall Normalized Area: 512-bit variants of algorithms
Normalized to SHA-512, Averaged over 7 FPGA families
[Figure: bar chart, 0 to 30; bars left to right: CubeHash, Fugue, Keccak, Shabal, JH, Skein, BLAKE, Hamsi, Luffa, SHAvite-3, Groestl, BMW, ECHO, SIMD]
Overall Normalized Throughput/Area: 256-bit variants
Normalized to SHA-256, Averaged over 7 FPGA families
[Figure: bar chart, 0 to 2; bars left to right: Keccak, Luffa, CubeHash, Groestl, JH, Hamsi, BLAKE, Fugue, SHAvite-3, Shabal, Skein, BMW, ECHO, SIMD]
Overall Normalized Throughput/Area: 512-bit variants
Normalized to SHA-512, Averaged over 7 FPGA families
[Figure: bar chart, 0 to 1.4; bars left to right: Keccak, CubeHash, Luffa, JH, Groestl, Shabal, BLAKE, Skein, SHAvite-3, Fugue, Hamsi, BMW, ECHO, SIMD]
Throughput vs. Area Normalized to Results for SHA-256 and Averaged over 7 FPGA Families: 256-bit variants
[Figure: scatter plot with the best and worst regions marked]
Throughput vs. Area Normalized to Results for SHA-512 and Averaged over 7 FPGA Families: 512-bit variants
[Figure: scatter plot with the best and worst regions marked]
Execution Time for Short Messages up to 1000 bits
Virtex 5, 256-bit variants of algorithms
[Figure not recoverable from the extraction]
Execution Time for Short Messages up to 1000 bits
Virtex 5, 512-bit variants of algorithms
[Figure not recoverable from the extraction]
[Table: summary of the 14 candidates by Thr/Area, Thr, Area, and Short msg., for 256-bit variants and 512-bit variants; cell contents not recoverable from the extraction]
Rows: BLAKE, BMW, CubeHash, ECHO, Fugue, Groestl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, Skein
Summary of Results
- Throughput/Area & Throughput most crucial for high-speed implementations
- Area cannot be easily traded for Throughput
Best performers so far:
1-2. Keccak & Luffa
3. Groestl
Worst performers so far:
14. SIMD
13. ECHO
12. BMW
More About our Designs & Tools
- Cryptology e-Print Archive - 2010/445 (100+ pages)
  - Detailed hierarchical block diagrams
  - Corresponding formulas for execution time and throughput
- FPL 2010 paper
  - ATHENa features
  - Case studies
- ATHENa web site
  - Most recent results
  - Comparisons with results from other groups
  - Optimum options of tools
Comparison with Other Groups
Comparison with Best Results Reported by Other Groups
Virtex 5, 256-bit variants of algorithms
                    OTHER GROUPS                                GMU
Algorithm           Area   Thr     Thr/Area  Source             Area   Thr     Thr/Area
BLAKE               1660   2676    1.61      Kobayashi et al.   1871   2854    1.53
CubeHash            590    2960    5.02      Kobayashi et al.   707    3445    4.87
ECHO                9333   14860   1.59      Lu et al.          5445   13875   2.55
Groestl             1722   10276   5.97      Gauravaram et al.  1884   8677    4.61
Hamsi               718    1680    2.34      Kobayashi et al.   946    2646    2.80
Keccak              1412   6900    4.89      Bertoni et al.     1229   10807   8.79
Luffa               1048   6343    6.05      Kobayashi et al.   1154   8008    6.94
Shabal              153    2051    13.41     Detrey et al.      1266   2624    2.07
Skein (estimated)   1632   3535    2.17      Tillich            1463   2812    1.92
Best Overall Reported Results as of Aug. 6, 2010
Virtex 5, 256-bit variants of algorithms
BEST REPORTED RESULTS
Algorithm   Area   Thr     Thr/Area  Source
BLAKE       1660   2676    1.61      Kobayashi et al.
BMW         4400   5577    1.27      GMU
CubeHash    590    2960    5.02      Kobayashi et al.
ECHO        5445   13875   2.55      GMU
Fugue       956    3151    3.30      GMU
Groestl     1722   10276   5.97      Gauravaram et al.
Hamsi       946    2646    2.80      GMU
JH          1108   3955    3.57      GMU
Keccak      1229   10807   8.79      GMU
Luffa       1154   8008    6.94      GMU
Shabal      153    2051    13.41     Detrey et al.
SHAvite-3   1130   2887    2.55      GMU
SIMD        9288   2326    0.25      GMU
Skein       1632   3535    2.17      Tillich et al.
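The Thr/Area column is simply throughput divided by area. The sketch below recomputes it for four rows of the table above and ranks them; the dictionary layout and function name are assumptions for illustration, and the units are taken to be CLB slices and Mbit/s as in the earlier Virtex 5 charts.

```python
# (area in CLB slices, throughput in Mbit/s), copied from the table above.
best_reported = {
    "BLAKE": (1660, 2676),
    "Keccak": (1229, 10807),
    "Shabal": (153, 2051),
    "SIMD": (9288, 2326),
}

def throughput_to_area(area: int, throughput: int) -> float:
    """Primary optimization target: throughput to area ratio."""
    return throughput / area

ratios = {name: round(throughput_to_area(a, t), 2) for name, (a, t) in best_reported.items()}
ranking = sorted(ratios, key=ratios.get, reverse=True)

print(ratios)   # {'BLAKE': 1.61, 'Keccak': 8.79, 'Shabal': 13.41, 'SIMD': 0.25}
print(ranking)  # ['Shabal', 'Keccak', 'BLAKE', 'SIMD']
```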
Throughput vs. Area: Best reported results
Virtex 5, 256-bit variants of algorithms
[Figure: scatter plot with the best and worst regions marked]
Your Project
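Analysis of Alternative Architectures - Unrolled
[Figure: basic architecture (round logic used r times) vs. 2x unrolled architecture (r/2 times)]
Analysis of Alternative Architectures - Folded
[Figure: three architectures]
- Basic: r times
- Folded Vertically-2x (fv2): 2r times
- Folded Horizontally-2x (fh2): 2r times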
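The iteration counts annotated on these two slides can be captured in a short helper. This is a sketch under stated assumptions: the architecture labels and the function name are made up for illustration, r = 24 is a hypothetical round count, and the model counts only loop iterations, ignoring load and finalization cycles and any change in clock period.

```python
from fractions import Fraction

# Iteration-count multipliers relative to the basic architecture, as annotated on the slides.
ITERATION_FACTOR = {
    "basic": Fraction(1),           # r times
    "unrolled_x2": Fraction(1, 2),  # r/2 times
    "fv2": Fraction(2),             # folded vertically 2x: 2r times
    "fh2": Fraction(2),             # folded horizontally 2x: 2r times
}

def iterations(r: int, architecture: str) -> int:
    """Number of passes through the round logic for an algorithm with r rounds."""
    count = r * ITERATION_FACTOR[architecture]
    if count.denominator != 1:
        raise ValueError("r must be even for the 2x unrolled architecture")
    return count.numerator

for arch in ITERATION_FACTOR:
    print(arch, iterations(24, arch))
```

Unrolling halves the number of iterations at the price of roughly doubled round logic, while folding doubles the iterations in exchange for a smaller datapath; whether either improves throughput/area depends on how the clock period changes, which is what the project asks you to measure.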
Preliminary results for CubeHash, Groestl, Keccak & Luffa in Virtex 5
[Figure: Normalized Throughput (0 to 8) vs. Normalized Area (0 to 7) for CubeHash, Groestl, Luffa, and Keccak; points labeled by architecture: x1, x2, x4, fv2, fv4, fv3^2]
Your Project
14 SHA-3 candidates left in the contest
Given:
- specification of the function
- reference implementation in C
- interface
- testbench and test vectors
- GMU implementation of the basic version including
  - block diagrams
  - ASM charts
  - short description
  - formulas for execution time & throughput
  - source codes
  - results for Xilinx and Altera FPGAs
Your Project
Develop:
- Block diagram
- ASM chart
- Formulas for execution time & throughput
- Synthesizable code in VHDL
- Results for multiple families of FPGAs from Xilinx and Altera
for at least one architecture from each of the following three classes of architectures:
- Unrolled architecture
- Folded architecture
- Architecture based on the use of embedded FPGA resources (BRAMs, multipliers, DSP units, etc.)
[256-bit only, 512-bit only, or both]
What is an FPGA?
[Figure: FPGA layout with Configurable Logic Blocks, I/O Blocks, and columns of Block RAMs & Embedded Multipliers]
RAM Blocks and Multipliers in Xilinx FPGAs
[Figure from: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com)]
Using Embedded FPGA Resources
Resource utilization vectors, first example:
- Basic design: (1536, 0, 0)
- Your design: (768, 2, 4)
Resource utilization vectors, second example:
- Basic design: (3010, 0, 0)
- Your design: (1505, 32 kbit, 4)
Block RAM
[Figure: Spartan-3 dual-port Block RAM with Port A and Port B]
- Most efficient memory implementation
  - Dedicated blocks of memory
- Ideal for most memory requirements
  - 4 to 104 memory blocks
  - 18 kbits = 18,432 bits per block (16 k without parity bits)
  - Use multiple blocks for larger memories
- Builds both single and true dual-port RAMs
- Synchronous write and read (different from distributed RAM)
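Block RAM can have various configurations (port aspect ratios)
- 16k x 1: addresses 0 to 16,383, 1-bit words
- 8k x 2: addresses 0 to 8,191, 2-bit words
- 4k x 4: addresses 0 to 4,095, 4-bit words
- 2k x (8+1): addresses 0 to 2047, 8 data bits + 1 parity bit
- 1024 x (16+2): addresses 0 to 1023, 16 data bits + 2 parity bits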
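A quick check of these aspect ratios shows where the "18 kbits per block, 16 k without parity bits" figure comes from. The data layout and function names below are assumptions for illustration; the depths and widths are the ones listed above.

```python
# name -> (depth in words, data bits per word, parity bits per word)
CONFIGURATIONS = {
    "16k x 1": (16 * 1024, 1, 0),
    "8k x 2": (8 * 1024, 2, 0),
    "4k x 4": (4 * 1024, 4, 0),
    "2k x (8+1)": (2 * 1024, 8, 1),
    "1024 x (16+2)": (1024, 16, 2),
}

def total_bits(depth: int, data_bits: int, parity_bits: int) -> int:
    """Total storage used by one configuration, parity included."""
    return depth * (data_bits + parity_bits)

def address_bits(depth: int) -> int:
    """Width of the address bus needed to select one of `depth` words."""
    return (depth - 1).bit_length()

for name, (depth, data, parity) in CONFIGURATIONS.items():
    print(name, total_bits(depth, data, parity), "bits,", address_bits(depth), "address bits")
```

Every configuration stores 16,384 data bits; only the two widest ones also reach the parity bits, giving 18,432 bits in total. The 1024 x (16+2) case needs a 10-bit address and an 18-bit data bus.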
Dual-Port Bus Flexibility
[Figure: dual-port block RAM; Port A: 1K-bit depth in, 18-bit width out; Port B: 1K-bit depth in, 18-bit width out]
- Port A signals: WEA, ENA, RSTA, ADDRA[9:0], CLKA, DIA[17:0], DOA[17:0]
- Port B signals: WEB, ENB, RSTB, ADDRB[9:0], CLKB, DIB[17:0], DOB[17:0]
Embedded Multipliers in Spartan 3
18x18 bit signed multipliers with optional input/output registers
Multiplier-Accumulator - MAC
[Figure from: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com)]
Xilinx XtremeDSP
- Starting with the Virtex 4 family, Xilinx introduced the DSP48 block for high-speed DSP on FPGAs
- Essentially a multiply-accumulate core with many other features
- Now also Spartan-3A and Virtex 5 have DSP blocks
DSP48 Slice: Virtex 4
[Figure not recoverable from the extraction]
Simplified Form of DSP48
[Figure not recoverable from the extraction]
Xilinx FPGA Devices
Technology   Low-cost    High-performance
120/150 nm   (none)      Virtex 2, 2 Pro
90 nm        Spartan 3   Virtex 4
65 nm        (none)      Virtex 5
45 nm        Spartan 6   (none)
40 nm        (none)      Virtex 6
Altera FPGA Devices
Technology   Low-cost      Mid-range   High-performance
130 nm       Cyclone       (none)      Stratix
90 nm        Cyclone II    (none)      Stratix II
65 nm        Cyclone III   Arria I     Stratix III
40 nm        Cyclone IV    Arria II    Stratix IV
All Projects - Organization
- Projects divided into phases
- Deliverables for each phase submitted through Blackboard at selected checkpoints and evaluated by the instructor and/or TA
- Feedback provided to students on a best-effort basis
- Final report and codes submitted using Blackboard at the end of the semester
Honor Code Rules
- All students are expected to write and debug their codes individually
- Students are encouraged to help and support each other in all problems related to the
  - operation of the CAD tools,
  - basic understanding of the problem.
Course Objectives
At the end of this course you should be able to:
- Code in VHDL for synthesis
- Decompose a digital system into a controller (FSM) and datapath, and code accordingly
- Write VHDL testbenches
- Synthesize and implement digital systems on FPGAs
- Effectively code digital systems for cryptography, signal processing, and microprocessor applications
This knowledge will come about through homework, exams, and an extensive project
- The project in particular will help you know VHDL and the FPGA design flow from beginning to end
Additional Skills Learned in the Project
- Reading & understanding the specification of a complex algorithm
- Design of new hardware architectures based on existing architectures (datapath & controller)
- Reading, understanding, and modifying existing VHDL code
- Using embedded resources of modern FPGAs
- Characterizing performance of your codes for multiple FPGA families
Project Task 1
- Read the following chapters from the GMU technical report published at http://eprint.iacr.org/2010/445
  - Chapter 1 Introduction & Motivation
  - Chapter 2 Methodology
  - Chapter 3 Comprehensive Designs of SHA-3 Candidates: 3.1, 3.2 + subsection concerning your algorithm
  - Chapter 4 Design Summary and Results
- Download and get familiar with the package of a hash function assigned to you
  http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/submissions_rnd2.html
- Read carefully the specification of your algorithm
Project Task 1 cont.
In one week:
Meeting with the instructor devoted to fully understanding
the GMU report, specification, block diagrams,
interface, and timing formulas.
In two weeks:
Draft block diagrams of the
- selected unrolled architecture
- selected folded architecture.
Corresponding timing formulas for execution time &
throughput.