FPGA for Complex System Implementation
National Chiao Tung University
Chun-Jen Tsai
04/14/2011
About FPGA
The FPGA was invented by Ross Freeman; the patent was granted in 1989†
SRAM-based
FPGA properties
  Standard parts
  Allow multi-level logic implementation
  Composed of programmable logic blocks and interconnects
  Some complex FPGAs also include non-programmable logic blocks (such as processor cores, MAC units, and SRAMs) to improve efficiency (“Platform FPGA”)
†R. H. Freeman, “Configurable Electrical Circuit Having Configurable Logic Elements and Configurable Interconnect,” U.S. Patent 4,870,302, Sep. 26, 1989
Electronic Logic Components
Logic
  General-purpose ICs
  Programmable logic devices
    SPLDs (PALs)
    CPLDs
    FPGAs
  ASICs
    Gate arrays
    Cell-based ICs
    Full-custom ICs
Programmable Array Logic (PAL)
PAL is a special case of sum-of-products logic in which the AND array is programmable and the OR array is fixed
Each input is buffered and drives many AND gates:
[Figure: AND gate symbols in PAL notation for inputs A, B, and C; each input buffer provides a non-inverted and an inverted output]
Function Implementation Using PAL
Combinational PALs have 10 to 20 inputs and 2 to 10 outputs, with 2 to 8 AND gates driving each OR gate
Sequential PALs have extra D flip-flops whose inputs are driven by the programmable array logic
[Figure: a full adder in PAL]
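As a rough illustration of the full adder mentioned above, the sketch below models a PAL as programmable AND gates (product terms) feeding fixed OR gates. The term encoding and the names `and_term`, `pal_output`, and `full_adder` are hypothetical choices for this example and do not correspond to any real PAL programming format.

```python
from itertools import product

# Each product term maps an input name to the literal it uses:
# 1 selects the non-inverted buffer output, 0 the inverted one.
# Inputs that are not listed are left unconnected for that AND gate.
SUM_TERMS = [
    {"A": 1, "B": 0, "Cin": 0},
    {"A": 0, "B": 1, "Cin": 0},
    {"A": 0, "B": 0, "Cin": 1},
    {"A": 1, "B": 1, "Cin": 1},
]
COUT_TERMS = [
    {"A": 1, "B": 1},
    {"A": 1, "Cin": 1},
    {"B": 1, "Cin": 1},
]

def and_term(term, inputs):
    """One programmable AND gate: true when every selected literal matches."""
    return all(inputs[name] == value for name, value in term.items())

def pal_output(terms, inputs):
    """One fixed OR gate driven by its AND gates."""
    return int(any(and_term(term, inputs) for term in terms))

def full_adder(a, b, cin):
    """Return (sum, carry_out) computed as two sum-of-products outputs."""
    inputs = {"A": a, "B": b, "Cin": cin}
    return pal_output(SUM_TERMS, inputs), pal_output(COUT_TERMS, inputs)

if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        print(a, b, cin, "->", full_adder(a, b, cin))
```

The sum output uses four AND gates and the carry output uses three, which fits within the 2 to 8 AND gates per OR gate quoted above.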
Complex Programmable Logic Devices
If several PLDs, along with some flip-flops, are put into a single IC, we have a complex programmable logic device (CPLD) that can be used to implement a small digital system
Example: Xilinx CoolRunner
[Figure: Xilinx CoolRunner structure, showing a PAL block and a macrocell (MUXs and buffers)]
Field Programmable Gate Arrays
The basic idea of FPGAs is to interconnect small “truth tables” to form complex digital circuits
[Figure: three interconnected truth tables (table 1, table 2, table 3), each mapping inputs A, B, C, D to an output Q]
Logic Design with FPGA
A digital design on an FPGA is composed of three parts:
  Logic elements
  Interconnect
  I/O blocks (IOB)
An FPGA configuration is similar to a program for a microprocessor
  Specifies “functional units” and “interconnects” between functional units
[Figure: array of logic elements (LE) joined by interconnect and surrounded by I/O blocks (IOB)]
CPU vs. FPGA
Microprocessors and FPGAs are programmed in different ways
[Figure: a CPU connected to a memory holding instructions and data; an FPGA whose logic blocks are configured by program bits]
Logic Elements
A logic element (LE) is more capable than a logic gate
  A simple LE can be programmed to behave as an n-input, m-output function (for example, n = 4, m = 1); such LEs are called “fine-grained” LEs (relatively speaking, since they are still “coarse” compared to a gate)
  Many FPGAs include distributed register bits around the LE
An FPGA may provide specialized complex LE blocks, such as multipliers, SRAMs, or processors
  These are all called coarse-grained LEs
A “platform FPGA” is composed of both fine-grained and coarse-grained LEs
Generic Logic Elements
Example of fine-grained logic element structure
[Figure: one logic element from the LE array, built from a lookup table (LUT) that maps the inputs to an output, a D flip-flop, and a configuration bit, producing the LE output]
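The logic element structure can be approximated in software as follows. This is only a sketch under stated assumptions: the class name `LogicElement`, its methods, and the choice that the configuration bit selects between the LUT output and the flip-flop output are hypothetical, not a description of any specific device.

```python
class LogicElement:
    """Hypothetical fine-grained LE: an n-input LUT, a D flip-flop, and one
    configuration bit choosing the combinational or the registered output."""

    def __init__(self, lut_bits, registered=False):
        size = len(lut_bits)
        if size < 2 or size & (size - 1):
            raise ValueError("LUT needs 2**n entries with n >= 1")
        self.num_inputs = size.bit_length() - 1
        self.lut_bits = list(lut_bits)  # configuration: truth-table contents
        self.registered = registered    # configuration: output select bit
        self.q = 0                      # flip-flop state

    def lut(self, inputs):
        """Read the table entry addressed by the inputs (first input is MSB)."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.lut_bits[index]

    def clock(self, inputs):
        """Clock edge: the flip-flop captures the current LUT output."""
        self.q = self.lut(inputs)

    def output(self, inputs):
        """LE output as selected by the configuration bit."""
        return self.q if self.registered else self.lut(inputs)


# Example configuration: a 4-input XOR (odd parity) stored in the LUT.
XOR4_BITS = [bin(i).count("1") & 1 for i in range(16)]
```

Changing only `lut_bits` turns the same element into any other 4-input function, which is the sense in which the LE is programmable.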
Function Implementation with LUT
The LUT4 that implements F = A'B'C' + A'BC + AB (with A, B, C on X1, X2, X3) has the following entries:
X1 X2 X3 X4 | F
 0  0  0  0 | 1
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 0
 .  .  .  . | .
 1  1  1  1 | 1
LUT4 table entries (X4 is a don't care)
A function of more than 4 variables can always be decomposed into a sum (OR) of terms, each built from a 4-variable function
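The following sketch tabulates a LUT4 and then decomposes a 5-variable function into two 4-variable tables. It assumes the intended function is F = A'B'C' + A'BC + AB (primes denote complements) with A, B, C on X1, X2, X3 and X4 unused. The helper names and the parity function `g` are hypothetical examples, and the split on X5 is a Shannon expansion, which is one possible decomposition rather than necessarily the one the slide has in mind.

```python
from itertools import product

def make_lut4(func):
    """Tabulate a 4-input function; entry i is the row whose bits
    X1 X2 X3 X4 spell i in binary (X1 is the MSB)."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=4)]

def read_lut4(table, x1, x2, x3, x4):
    """Look up one row of a LUT4 table."""
    return table[(x1 << 3) | (x2 << 2) | (x3 << 1) | x4]

def f(a, b, c, _x4):
    """Assumed function F = A'B'C' + A'BC + AB; the fourth input is ignored."""
    return (not a and not b and not c) or (not a and b and c) or (a and b)

F_TABLE = make_lut4(f)

def g(x1, x2, x3, x4, x5):
    """Hypothetical 5-variable function: odd parity of the inputs."""
    return x1 ^ x2 ^ x3 ^ x4 ^ x5

# Shannon expansion on X5: G = X5'.G0(X1..X4) + X5.G1(X1..X4)
G0_TABLE = make_lut4(lambda *x: g(*x, 0))
G1_TABLE = make_lut4(lambda *x: g(*x, 1))

def g_from_luts(x1, x2, x3, x4, x5):
    """Evaluate G from two LUT4 tables plus the OR of two gated terms."""
    g0 = read_lut4(G0_TABLE, x1, x2, x3, x4)
    g1 = read_lut4(G1_TABLE, x1, x2, x3, x4)
    return ((1 - x5) & g0) | (x5 & g1)

if __name__ == "__main__":
    print(F_TABLE)
```

The final combining step takes only three inputs (X5, G0, G1), so it would itself fit in another LUT4.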
Carry Chains in FPGA
Since addition is a very important operation, many FPGAs have dedicated circuitry for carry bit calculation and propagation.
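A minimal sketch of how such a carry chain can work is given below. It assumes one common arrangement in which the LUT computes the propagate signal and dedicated logic next to it produces the sum and carry bits; the function names are hypothetical, and real devices differ in detail.

```python
def add_with_carry_chain(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); return (sum_bits, cout)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                   # computed in the LUT
        sum_bits.append(propagate ^ carry)  # dedicated XOR on the chain
        carry = carry if propagate else a   # dedicated carry multiplexer
    return sum_bits, carry

def to_bits(value, width):
    """Convert an integer to a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Convert a list of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

if __name__ == "__main__":
    s, cout = add_with_carry_chain(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), cout)
```

Because the carry moves from bit to bit through the multiplexers only, it does not have to pass through the general-purpose interconnect.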
Example: Spartan 2 Architecture (1/2)
A Xilinx Spartan device is composed of a 2-D array of Configurable Logic Blocks (CLBs)
Example: Spartan 2 Architecture (2/2)
In Spartan-II, each CLB has two identical slices; each slice contains two logic cells, each with a LUT, carry logic, and a register
[Figure: a Spartan-II slice with two lookup tables (inputs G1 to G4 and F1 to F4), two carry/control logic blocks, and two D flip-flops; other inputs F5IN, BY, SR, BX, CE, CLK, and CIN; outputs COUT, Y, YB, YQ, X, XB, and XQ]
Example: Spartan 2 I/O Blocks
Supports multiple I/O standards (PCI, AGP, etc.)
Logic Implementation on FPGA
Logic synthesis
  How do we break down a function and map it to logic elements?
  How do we implement an operation within a logic element?
Logic placement
  Where do we put each piece of logic in the array of logic elements?
[Figure: array of logic elements (LE)]
Interconnect Architecture
On an FPGA, we must be able to control
  Connections from wiring channels to LEs
  Connections between wires in the wiring channels
[Figure: LEs connected to horizontal and vertical wiring channels]
Programmable Wiring
Wiring among LEs is organized into channels
  Channels are arranged horizontally and vertically on the chip
  There are many wires per channel
Connections between wires are made at programmable interconnection points
An EDA tool must choose:
  Channels from source to destination
  Wires within the channels
[Figure: LE array with horizontal channels 2 and 3 and vertical channels 1, 3, and 5]
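The two choices an EDA tool must make (which channels to follow, and which wire to take within each channel) can be illustrated with a toy router. Everything here is a simplifying assumption: the `RoutingFabric` class, the grid of switch points, and the breadth-first search are hypothetical stand-ins, and the sketch ignores the delay and congestion trade-offs a practical router has to weigh.

```python
from collections import deque

class RoutingFabric:
    """Hypothetical fabric: a width x height grid of switch points joined by
    channel segments, each holding `wires_per_channel` wires."""

    def __init__(self, width, height, wires_per_channel):
        self.width = width
        self.height = height
        self.wires = wires_per_channel
        self.used = {}  # segment -> set of wire indices already taken

    @staticmethod
    def _segment(p, q):
        """Canonical key for the channel segment between two switch points."""
        return (min(p, q), max(p, q))

    def _neighbors(self, p):
        """Adjacent switch points reachable over a segment with a free wire."""
        x, y = p
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= q[0] < self.width and 0 <= q[1] < self.height:
                if len(self.used.get(self._segment(p, q), ())) < self.wires:
                    yield q

    def route(self, src, dst):
        """Return [(segment, wire), ...] for one net, or None if blocked."""
        parent = {src: None}
        queue = deque([src])
        while queue:                      # choose channels: shortest free path
            p = queue.popleft()
            if p == dst:
                break
            for q in self._neighbors(p):
                if q not in parent:
                    parent[q] = p
                    queue.append(q)
        if dst not in parent:
            return None
        path = []
        node = dst
        while parent[node] is not None:   # choose wires: lowest free index
            seg = self._segment(parent[node], node)
            taken = self.used.setdefault(seg, set())
            wire = min(w for w in range(self.wires) if w not in taken)
            taken.add(wire)
            path.append((seg, wire))
            node = parent[node]
        path.reverse()
        return path
```

With one wire per channel, a second net between the same two points is forced onto a longer detour, and a third may not be routable at all.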
Programmable vs Fixed Interconnect
Compared to fixed-layout wiring in custom logic, FPGA interconnect has two major disadvantages:
  Switches add delay
  FPGA interconnect has extra length
The problem becomes worse as the logic becomes larger
Interconnect Strategies
Types of wires:
  Short wires: local LE connections
  Global wires: long-distance, buffered communication
  Special wires: clocks, etc.
Use design hierarchy to guide placement search
Use hard macros where possible
  A macro is a larger module designed to fit into a particular FPGA (similar to IP blocks for platform-based SoC)
  A hard macro includes placement
  A soft macro does not include placement
Add placement constraints
FPGAs and I/O Pins
Chip capacity is growing faster than package pinout
  We can now put many hardware functions in an FPGA, but the total number of I/O pins is limited
  We must try to share a small number of interface pins among functions
Alternatively, one can use multiple smaller FPGAs to compose the same functions
  It’s harder to break down a design across FPGAs
  The performance may be better due to shorter routing lengths
FPGA Configuration Technologies
An FPGA’s logic elements, interconnect switches, and I/O pins can be programmed using one of the following three technologies:
  SRAM-based
    Can be programmed many times
    Must be programmed after power-up
  Antifuse-based
    Programmed once via a burn-in step
  Flash-based
    Similar to SRAM but using flash memory
SRAM-based FPGAs
Program logic functions and interconnect using SRAM to store Boolean tables and on/off states
Advantages:
  Re-programmable
  Dynamically reconfigurable
  Uses standard processes
Disadvantages:
  SRAM burns power
  Configuration is lost at power-down (but not on reset!)
  Possible to steal or disrupt configuration bits
    Just like piracy and virus issues of software
Configuring SRAM-based FPGA
There are several ways to configure an FPGA
  JTAG interface: not good for “turn-key” systems
  FPGA in master mode: reads configuration data from a PROM
  FPGA in slave mode: a microcontroller configures the FPGA
Features of SRAM-based LUT
An n-input LUT holds 2^n entries and can implement any function of its n inputs
  All logic functions take the same amount of space
  All functions have the same delay
    With CMOS custom logic, XOR is much slower than NAND; with an SRAM LUT, XOR is as fast (slow) as NAND
SRAM is larger than the static gate equivalent of a function
  “Gate count” is not a good measure of FPGA logic cost
    For a static gate, an n-input NAND/NOR gate has 2n transistors
    For an FPGA LE, a 4-input LUT has 128 transistors in SRAM and 96 in the multiplexer
Burns power even at idle
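The cost comparison above can be checked with a few lines of arithmetic. The transistor counts are the figures quoted on this slide; the function and variable names are hypothetical, and the per-cell figure is simply derived by dividing the SRAM total by the number of LUT entries.

```python
def lut_entries(n):
    """Number of SRAM bits (table rows) in an n-input LUT."""
    return 2 ** n

def static_gate_transistors(n):
    """Transistor count of an n-input static CMOS NAND or NOR gate."""
    return 2 * n

# Figures quoted above for a 4-input LUT.
LUT4_SRAM_TRANSISTORS = 128
LUT4_MUX_TRANSISTORS = 96

lut4_total = LUT4_SRAM_TRANSISTORS + LUT4_MUX_TRANSISTORS
transistors_per_sram_cell = LUT4_SRAM_TRANSISTORS // lut_entries(4)
ratio_to_nand4 = lut4_total / static_gate_transistors(4)

if __name__ == "__main__":
    print("LUT4 entries:", lut_entries(4))
    print("transistors per SRAM cell:", transistors_per_sram_cell)
    print("LUT4 total transistors:", lut4_total)
    print("ratio to a 4-input NAND:", ratio_to_nand4)
```

A 4-input NAND needs 8 transistors while the LUT4 uses 224, a factor of 28, which is why counting "gates" says little about the real cost of FPGA logic.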
Platform FPGAs
A complex system must be composed of hardware and software components
To reduce system development/integration time, some chip companies have started to promote “Platform FPGA” visions
Two examples:
  Xilinx has Virtex-II Pro, which provides a PowerPC-based platform FPGA
  Altera has Excalibur, which features an ARM-based platform FPGA (a.k.a. System-on-Programmable-Chip, SoPC)
Xilinx Platform FPGA Vision
Processing Platform:
  PowerPC
  D/I Caches
  Controllers
  Interfaces
DSP Platform:
  Distributed RAM
  18×18 Multipliers
  600 Billion MACs/sec
Connectivity Platform:
  100+ Gb Bandwidth
  I/O interfaces of the chip
  Rocket I/O (3.125 Gbps serial port)
  Hi-speed parallel
Four Generations of Virtex Devices
[Figure: device complexity versus year (1985, 1992, 2000, 2002). Device families in order: XC2000-XC3000; XC4000, Virtex; Virtex-II; Virtex-II Pro, Virtex-4, Virtex-5. Stage labels in order: glue logic; system-level function blocks; platform FPGA; platform for programmable systems]
Example: Platform FPGA Systems
A platform implementation with remote configuration capabilities†
†K. Park and H. Kim, “Remote FPGA Reconfiguration Using MicroBlaze or PowerPC Processors,” XAPP441, Sep. 2006
FPGA Implementation Process
Step 1: Design
  Design entry methods: HDL (Verilog or VHDL) or schematic drawings
Step 2: Create netlist (synthesis)
  Translates V, VHD, and SCH files into a standard-format EDIF file
Step 3: Physical design (implementation)
  Translate, map, and place & route the netlist into the target device configuration bits
Step 4: Configure the FPGA
  Download the BIT file into the FPGA
FPGA Design Flow
In this class, Xilinx ISE Foundation is used as the logic design toolchain
[Figure: design flow from specification through design entry, testbench simulation, synthesis (with timing constraints), mapping, place & route, and static timing analysis to the bit file loaded into the FPGA]