Introduction to FPGAs
Madhura Purnaprajna
Outline
• What’s different about FPGAs
• Architecture
– Logic
– Routing
– I/O
• State-of-the-art: Xilinx Virtex 7
The Applications ...
• Medical
• Consumer
• High Performance Computing
• Communications
The Industry ...
The two domains ...
Processors
• Sequential computing
• Instruction-level parallelism
[Figure: processor datapath with Instruction Memory, Decoder, Registers, ALU, and Data Memory]
FPGAs
• User configurable
• User-defined parallelism
[Figure: array of functional units (FU) paired with flip-flops (FFs); example lookup-table contents: 00→0, 01→1, 10→1, 11→1]
Application Mapping
• Processor: N
• FPGA: < N
Temporal vs Spatial Computing
Processor (drawbacks)
• Limited parallelism
• Fixed architecture
• Scalability?
FPGA (strengths)
• User-defined parallelism
• Flexibility
• Performance per Watt
Performance vs Adaptability
[Figure: Ease of Adaptability vs. Performance for ASICs, FPGAs, and Processors]
FPGA vs. ASIC gap:
• ~35x Area
• ~5x Speed
• ~15x Power
Source: Measuring the Gap Between FPGAs and ASICs, Ian Kuon and Jonathan Rose, FPGA 2006
FPGA Architecture
• Programmable Logic
• Programmable Routing
Logic: Lookup Tables
[Figure: a slice/cluster of four LUT + FF pairs; each K-input LUT is built from $2^K$ SRAM cells feeding a $2^K$:1 MUX]
Look-up Table
• $2^K$ SRAM Cells
• $2^{2^K}$ different functions
• $2^K$:1 MUX
– K levels of 2:1 muxes
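A minimal sketch of how a K-input LUT could be read out, assuming the SRAM cells are indexed by the input word and that `inputs[0]` is the least significant select bit. The function name `lut_read` and the majority-function example are illustrative, not taken from the slides.

```python
from typing import Sequence


def lut_read(sram: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate a K-input LUT by walking K levels of 2:1 muxes.

    sram   -- 2**K configuration bits, indexed by the input word
    inputs -- K select bits, inputs[0] being the least significant
    """
    k = len(inputs)
    if len(sram) != 2 ** k:
        raise ValueError("a K-input LUT needs exactly 2**K SRAM cells")
    level = list(sram)
    for select in inputs:  # one mux level per input
        # Each 2:1 mux chooses between two neighbouring candidates.
        level = [level[i + select] for i in range(0, len(level), 2)]
    return level[0]


# 3-input majority function stored as 8 SRAM bits (index = c*4 + b*2 + a)
majority = [0, 0, 0, 1, 0, 1, 1, 1]
```

Each pass through the loop halves the number of candidate bits, so a K-input LUT needs K mux levels to reduce its SRAM contents to a single output.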
Look-up Table: 2-inputs
• $2^2 = 4$ SRAM Cells
• $2^{2^2} = 16$ different functions
• $2^2$:1 (4:1) MUX
– 2 levels of 2:1 muxes
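The function count can be checked by brute force. The sketch below treats every possible setting of the SRAM cells as one truth table; the helper names are hypothetical.

```python
from itertools import product


def count_functions(k: int) -> int:
    """Number of distinct functions a k-input LUT can implement."""
    return 2 ** (2 ** k)


def enumerate_truth_tables(k: int) -> list:
    """Every possible setting of the 2**k SRAM cells, one tuple per function."""
    return list(product((0, 1), repeat=2 ** k))


tables = enumerate_truth_tables(2)
```

For K = 2 the enumeration yields 16 truth tables, one of which is the NAND pattern (1, 1, 1, 0).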
Look-up Table: 2-input NAND
• 4 SRAM Cells
– 6 transistors each
• 4:1 MUX
– ~12 transistors
• ~40 Transistors
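A small sketch of the arithmetic behind the estimate, using the per-component figures from the slide. The SRAM contents 1 1 1 0 are read here with the two inputs forming the index `ab`, which is an assumed bit order.

```python
SRAM_CELL_TRANSISTORS = 6   # per the slide: 6 transistors per SRAM cell
MUX_4TO1_TRANSISTORS = 12   # per the slide: roughly 12 transistors

# SRAM contents for NAND, indexed by the inputs read as the binary number ab
NAND_LUT = [1, 1, 1, 0]


def nand_via_lut(a: int, b: int) -> int:
    """Look up NAND(a, b) in the 4-entry table."""
    return NAND_LUT[(a << 1) | b]


def lut2_transistors() -> int:
    """Four SRAM cells plus one 4:1 mux."""
    return 4 * SRAM_CELL_TRANSISTORS + MUX_4TO1_TRANSISTORS
```

With these figures the sum is 36 transistors, which the slide rounds to about 40.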
[Figure: 4:1 MUX driven by the 2 inputs, selecting among SRAM contents 1 1 1 0]
Look-up Table: 2-input NAND
• Dedicated NAND gate: 4 transistors
• LUT implementation: 40 transistors (HUGE!)
Design Flow: FPGA
HDL → Logic Synthesis → Technology Mapping → Pack, Place & Route → FPGA
• Inputs: Benchmark Circuits, FPGA Architecture
• Results: Area, Power, Speed
LOGIC BLOCK ARCHITECTURE
Logic: Soft
Programmable Logic Blocks
Logic: Hard Blocks
• Memory Blocks
• DSP Blocks
Design decisions
• LUT size
• Number of LUTs per cluster
• Inputs/Outputs to/from each cluster
• Area and Speed
No. of Logic Blocks vs. Logic Block Functionality
• LUT size increases exponentially with K
• Routing tracks surrounding the logic increase with the number of input pins
Total FPGA area vs. LUT size
Terminology
• Basic logic element (BLE)
• Cluster
– Size grows quadratically
– Local interconnect
– Fewer inputs (shared)
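A rough sketch of why cluster size grows quadratically, assuming a fully populated local crossbar in which every BLE input pin can select any cluster input or any BLE feedback output. The input-sharing rule I = K(N + 1)/2 is an assumed rule of thumb, not something stated on the slides, and both function names are hypothetical.

```python
def cluster_inputs(k: int, n: int) -> int:
    """Assumed rule of thumb for shared cluster inputs: I = K * (N + 1) / 2."""
    return (k * (n + 1)) // 2


def crossbar_switch_points(k: int, n: int) -> int:
    """Switch points in a fully populated local crossbar.

    Each of the N*K BLE input pins can select any of the I cluster
    inputs or any of the N BLE feedback outputs.
    """
    return n * k * (cluster_inputs(k, n) + n)
```

Under these assumptions a cluster of four 4-input BLEs needs 10 shared inputs instead of 16 separate ones, and doubling N from 4 to 8 raises the switch-point count from 224 to 832, close to a fourfold increase.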
LUTs on critical path & LUT delay vs LUT size
Functionality increases => fewer logic blocks on the critical path, but the internal delay of each block increases
Critical path: Function of LUT and Cluster size
Diminishing returns beyond 6-input LUTs (LUT6) and cluster sizes of 3 or 4
HETEROGENEOUS BLOCKS
Choice of functions
• Which function?
• Ratio of special function to generic logic?
• What to do with special function blocks when they are not used?
Hard blocks
• FFs (set, reset, enable, load,…)
• Add, sub, carry logic, …
• Use LUTs as memories
• Block RAMs/ ROMs, FIFOs
• Multipliers (fracturable)
• Processors
Challenge
• Performance, power, area
– As compared to ASICs
• Introduce other hard blocks
– Floating point units, etc.
• Shadow logic
ROUTING ARCHITECTURE
Routing in FPGAs
• Connect logic blocks and I/O
– To define a user circuit
• Flexible
– Support local and distant routing demands
• Locality
– Short, Fast, with intermediate long wires
• Global clocks and resets
Routing details
• Global routing
– Macroscopic allocation of wires
– Relative position of routing channels to logic blocks
– Wires in each channel
• Detailed routing
– Microscopic
– Length of wires
– Switching quantity
Routing Architectures
• Hierarchical
• Island style
Hierarchical Routing
• Groups of logic blocks
• Interconnected levels
• Used in:
– Altera FLEX, APEX
Hierarchical Routing
• Advantages:
– Predictable inter-logic block delay
– Superior performance for some designs
• Disadvantages:
– Overuse of logic blocks (mismatch between design and FPGA hierarchy)
– Large variation in inter-block delay
Island style
• 2-D mesh: evenly distributed routing resources
• Routing channels on all four sides of each logic block
• Each channel has W wires
• Wire segments of different lengths in each channel
• Used in present day commercial FPGAs
Island style
• Advantages:
– Efficient connection for varying net lengths
– Staggering start/end points, optimise for a tile
– Regular, min delay can be estimated
Details
Connection blocks
Switch blocks
Channel segmentation distribution
• Short wires: 1 block
• Long wires: Multiple blocks
Routing hops
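A simplified sketch of how segment length affects the hop count, assuming a straight-line connection built from wires of a single length and ignoring turns and switch delays. The function name is hypothetical.

```python
import math


def routing_hops(distance_blocks: int, segment_length: int) -> int:
    """Wire segments needed to span a distance measured in logic blocks."""
    if distance_blocks < 0 or segment_length < 1:
        raise ValueError("distance must be >= 0 and segment length >= 1")
    return math.ceil(distance_blocks / segment_length)
```

Spanning 8 blocks takes 8 hops with length-1 wires but only 2 hops with length-4 wires, while a 1-block connection still occupies a whole length-4 wire. This is one reason a channel mixes segment lengths.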
Switch block: disjoint
• Numerical designation of wire entering = wire exiting
• 0-0
• 1-1
• …
• Limits flexibility
• Distinct routing domains
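A minimal sketch of the disjoint pattern, assuming four sides per switch block and that a wire leaving one block on a given side enters the next block on the opposite side. All names are illustrative.

```python
SIDES = ("left", "top", "right", "bottom")
OPPOSITE = {"left": "right", "right": "left", "top": "bottom", "bottom": "top"}


def disjoint_connections(track: int, from_side: str) -> dict:
    """Outgoing track number on each other side: always the same number."""
    return {side: track for side in SIDES if side != from_side}


def track_after_path(start_track: int, exits: list) -> int:
    """Follow a route through successive switch blocks, entering from the left."""
    track, entry = start_track, "left"
    for exit_side in exits:
        track = disjoint_connections(track, entry)[exit_side]
        entry = OPPOSITE[exit_side]  # the next block is entered from the far side
    return track
```

However many turns the route takes, it stays on the track number it started with, which is what confines each net to one routing domain.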
Switch block: Wilton
• Allows change in domains for turns
• 0(left)-3(bottom)
• 0(left)-0(top)
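A sketch of the turn permutations for a wire entering from the left. The formulas are an assumption: they follow one commonly used formulation of the Wilton pattern and were chosen because, with W = 4 tracks, they reproduce the two examples above. Turns from the other three sides use analogous permutations and are omitted here.

```python
def wilton_from_left(to_side: str, track: int, width: int) -> int:
    """Outgoing track for a wire entering a Wilton switch block from the left.

    width is the number of tracks per channel (W).
    """
    if to_side == "right":   # straight through: stay on the same track
        return track
    if to_side == "top":     # turn up: track t maps to (W - t) mod W
        return (width - track) % width
    if to_side == "bottom":  # turn down: track t maps to (t - 1) mod W
        return (width + track - 1) % width
    raise ValueError("to_side must be 'right', 'top', or 'bottom'")
```

Because a turn can change the track number, a route is no longer confined to a single routing domain as it is in the disjoint pattern.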
I/O STANDARDS
I/O Architecture
• Sets external interface rates
• Occupies significant area
– ~40%
• Choice of I/O standard
– Performance (Pin capacitance)
– Area
Common I/O Standards
Selection
• I/O banks
– Groups of I/O cells
– Share supply/reference voltage
– Each bank can use a different I/O standard
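A hedged sketch of the constraint that a shared supply places on pin assignment: pins in one bank must agree on the supply voltage. The voltage table lists a few common standards for illustration, and the pin names, bank numbers, and function name are hypothetical.

```python
from collections import defaultdict

# Illustrative output supply voltages for a few common I/O standards
VCCO = {"LVCMOS33": 3.3, "LVCMOS25": 2.5, "LVCMOS18": 1.8, "SSTL15": 1.5}


def bank_conflicts(pin_assignments) -> dict:
    """Find banks whose pins would need more than one supply voltage.

    pin_assignments -- iterable of (pin_name, bank, standard) tuples
    Returns {bank: sorted list of conflicting voltages}.
    """
    needed = defaultdict(set)
    for _pin, bank, standard in pin_assignments:
        needed[bank].add(VCCO[standard])
    return {bank: sorted(volts) for bank, volts in needed.items() if len(volts) > 1}


pins = [
    ("clk", 0, "LVCMOS33"),
    ("led", 0, "LVCMOS33"),
    ("ddr_dq0", 1, "SSTL15"),
    ("uart_tx", 1, "LVCMOS18"),
]
```

Here bank 0 is consistent, while bank 1 mixes a 1.5 V and a 1.8 V standard and would be reported as a conflict.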
High-speed I/O
• High speed inter-chip signaling
– SERDES (serialiser/deserialiser)
• Source sync clocking
• Dynamic clock phase adjustment
• High-bandwidth memory interface
– Ethernet MAC
– DLLs/PLLs
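A behavioural sketch of what a SERDES does to the data, ignoring clocking and line encoding entirely. Sending the least significant bit first is an assumption, and the function names are illustrative.

```python
def serialize(words, width: int) -> list:
    """Parallel to serial: emit each word as `width` bits, LSB first."""
    return [(word >> i) & 1 for word in words for i in range(width)]


def deserialize(bits, width: int) -> list:
    """Serial to parallel: regroup the bit stream into `width`-bit words."""
    if len(bits) % width:
        raise ValueError("bit stream length must be a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= bit << i
        words.append(word)
    return words
```

A round trip such as `deserialize(serialize([0xA5, 0x3C], 8), 8)` returns the original words. In hardware the serial side runs at `width` times the parallel word rate, which is why fewer pins can carry the same bandwidth.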
PROGRAMMING TECH
Programming Technology
• SRAM Cells
– Reusability
– Standard CMOS
Programming Technologies
Improving FPGAs
• Reducing the gap: Area, Speed, Power
• Alternatives to FPGAs
– CGRAs
– Structured ASICs
References
• FPGA Architecture: Survey and Challenges
– Ian Kuon, Russell Tessier, Jonathan Rose
Questions?