DSP for FPGA
SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications
Miodrag Bolic
Objectives
• Comparison between PDSP and FPGA
• Virtex II Pro
• Altera Stratix FPGA
• Stratix DSP Block and its configuration
• Altera design flow
What Is an FPGA?
• Field Programmable Gate Array
• Device that Has a Regular Architecture (Set of Blocks) that Can Be Programmed for Various Functions
• “Glue” Logic
• Customizable Hardware Solution
• Configurable Processors
Why Use FPGAs in DSP Applications?
• 10x More DSP Throughput Than DSP Processors
– Parallel vs. Serial Architecture
• Cost-Effective for Multi-Channel Applications
• Flexible Hardware Implementation
• Single-Chip Solution
– System (Hardware/Software) Integration Benefits
[Figure: block diagram with the labels FPGA, Software, Embedded Processor, DSP System, and Software DSP]
High Level of Parallel Processing in FPGA
[Figure: an array of MAC units in an FPGA next to the MAC units of a high-speed DSP processor]
FPGA
• Can implement hundreds of MAC functions in an FPGA
• Parallel implementation allows for faster throughput
– A 200-tap FIR filter would need 1 clock cycle per sample
High-Speed DSP Processor
• 1 to 8 multipliers
• Needs looping for more than 8 multiplications
• Needs multiple clock cycles because of serial computation
– A 200-tap FIR filter would need 25+ clock cycles per sample with an 8-MAC-unit processor
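The cycle counts quoted for the 200-tap FIR filter follow from dividing the number of taps by the number of MAC units available in each clock cycle. The sketch below is a back-of-the-envelope model only: the function name is hypothetical, and it ignores loop overhead, memory access, and pipeline latency, which is why the slide says "25+" for the processor case.

```python
import math


def cycles_per_sample(taps: int, mac_units: int) -> int:
    """Lower bound on clock cycles per output sample of a FIR filter.

    Assumes every MAC unit performs one multiply-accumulate per clock
    and that no other overhead exists (an idealized assumption).
    """
    if taps <= 0 or mac_units <= 0:
        raise ValueError("taps and mac_units must be positive")
    return math.ceil(taps / mac_units)


# DSP processor with 8 MAC units: the 200 taps are processed 8 at a time.
print(cycles_per_sample(200, 8))    # 25

# FPGA with one hardware MAC per tap: all taps are computed in parallel.
print(cycles_per_sample(200, 200))  # 1
```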
DSP Processors vs. FPGAs
[Figure: performance axis in MMACs/sec, marked at 100 and 600, with regions labeled Embedded Processors, Embedded Processors Hardware Acceleration, and Complete Hardware Implementation; two regions are flagged "New!"]
Extending Range of Altera Reconfigurable DSP Solutions
Comparison of DSP Devices
Programmable DSP Processors
• Benefits
– Easy to Use
– Programmed via C-Code or Assembly
– Fast Development Time
• Weaknesses
– Fixed Architecture
– Inefficient for Highly Recursive Algorithms Unless Hardware Accelerated
– Potential Bus Bottlenecks
– Other Devices (FPGAs) Often Used on Board for Other Functions
Reconfigurable DSP
• Benefits
– Easy to Use
– Programmed via C-Code, Assembly, or HDL
– Efficient for Recursive Algorithms Using DSP IP Cores
– Higher Levels of Integration
• Weaknesses
– Longer Development Time (But Getting Shorter!)
Stratix EP1S10 [2]
TriMatrix™ Memory [1]
• M512 Blocks: 512 bits per block + parity
– Small FIFOs
– Shift Register
– Rake Receiver Correlator
– FIR Filter Delay Line
• M4K Blocks: 4 Kbits per block + parity
– Header / Cell Storage
– Channelized Functions
– ATM cell–packet processing
– Nios Program Memory
• M-RAM: 512 Kbits per block + parity
– Packet / Data Storage
– Nios Program Memory
– System Cache
– Video Frame Buffers
– Echo Canceller Data Storage
• Dedicated External Memory Interface
• Look-Up Schemes, Packet & Cell Buffering, Cache
• More Bits for Larger Memory Buffering
• More Data Ports for Greater Memory Bandwidth
Memory Bandwidth Summary: Stratix Device Family [1]
Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10   920,448          1              60           94            1,245,024
EP1S20   1,669,248        2              82           194           2,096,928
EP1S25   1,944,576        2              138          224           2,894,400
EP1S30   3,317,184        4              171          295           3,750,192
EP1S40   3,423,744        4              183          384           4,384,800
EP1S60   5,215,104        6              292          574           6,762,528
EP1S80   7,427,520        9              364          767           8,784,720
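The "Total RAM Bits" column can be cross-checked against the block counts. The sketch below assumes that "+ parity" means one parity bit per eight data bits, giving 576, 4,608, and 589,824 bits per M512, M4K, and M-RAM block; that ratio is an assumption of this example, not something the slides state. The dictionary and function names are illustrative.

```python
# Bits per block including parity (assumption: one parity bit per data byte).
M512_BITS = 512 + 512 // 8                  # 576
M4K_BITS = 4096 + 4096 // 8                 # 4,608
MRAM_BITS = 512 * 1024 + 512 * 1024 // 8    # 589,824

# device: (M-RAM blocks, M4K blocks, M512 blocks, total RAM bits as listed)
STRATIX = {
    "EP1S10": (1, 60, 94, 920_448),
    "EP1S20": (2, 82, 194, 1_669_248),
    "EP1S25": (2, 138, 224, 1_944_576),
    "EP1S30": (4, 171, 295, 3_317_184),
    "EP1S40": (4, 183, 384, 3_423_744),
    "EP1S60": (6, 292, 574, 5_215_104),
    "EP1S80": (9, 364, 767, 7_427_520),
}


def total_ram_bits(mram: int, m4k: int, m512: int) -> int:
    """Sum the capacity of all three TriMatrix block types, parity included."""
    return mram * MRAM_BITS + m4k * M4K_BITS + m512 * M512_BITS


for device, (mram, m4k, m512, listed) in STRATIX.items():
    computed = total_ram_bits(mram, m4k, m512)
    status = "matches" if computed == listed else "differs"
    print(f"{device}: computed {computed:,} bits, listed {listed:,} bits ({status})")
```

Under that parity assumption, the computed totals agree with every row of the table.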
Logic Element (LE) [2]
[Figure: functional diagram of a logic element. Inputs: data1, data2, data3, data4, cin, addnsub, LUT Chain Input, Register Chain Input, Register Control Signals. Blocks: 4-Input LUT, Sync Load & Clear Logic, register with Register Feedback. Outputs: LUT Chain Output, Register Chain Output, Row, Column & DirectLink Routing, Local Routing.]
Notes:
1) Functional diagram only. Please see the datasheet for more details.
2) addnsub and data1 are connected via XOR logic.
Dynamic Arithmetic Mode
[Figure: functional diagram of an LE in dynamic arithmetic mode. Inputs: data1, data2, data3, addnsub, LAB Carry-In, Carry-In0, Carry-In1, Register Chain Input, Register Control Signals. Blocks: Carry-In Logic, Sum Calculator, Carry Calculator, Carry-Out Logic, Sync Load & Clear Logic, register. Outputs: Carry-Out0, Carry-Out1, Register Chain Output, Row, Column & DirectLink Routing, Local Routing.]
Note: Functional diagram only. Please see the datasheet for more details.
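The XOR between addnsub and a data input is the usual trick for building an adder that can also subtract: inverting one operand and injecting a carry of 1 yields the two's-complement difference. The sketch below shows that general technique bit by bit. It is not a model of the exact LE wiring; the function name, the `subtract` flag (whose polarity may differ from the real addnsub signal), and the default width are assumptions.

```python
def add_sub(a: int, b: int, subtract: bool, width: int = 8) -> int:
    """Ripple-carry add/subtract on `width`-bit values.

    When `subtract` is set, every bit of b is XORed with 1 and the
    initial carry is 1, so the result is a + (~b) + 1 = a - b
    modulo 2**width.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    control = 1 if subtract else 0
    carry = control            # carry-in of 1 completes the two's complement
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ control   # XOR with the add/subtract control
        sum_bit = a_bit ^ b_bit ^ carry
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        result |= sum_bit << i
    return result


print(add_sub(5, 3, subtract=False))  # 8
print(add_sub(5, 3, subtract=True))   # 2
print(add_sub(3, 5, subtract=True))   # 254, i.e. -2 in 8-bit two's complement
```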
Logic Array Blocks (LAB) [2]
• 10 LEs
• Local Interconnect
• LAB-Wide Control Signals
[Figure: a LAB containing LE1 through LE10, each fed by 4 lines from the local interconnect, plus LAB-wide control signals. The local interconnect is driven by 30 LAB input lines and 10 LE feedback lines.]
Avalon Switch Fabric Contents
• Avalon Switch Fabric provides the following to the peripherals it connects
– Data-Path Multiplexing
– Address Decoding
– Wait-State Generation
– Dynamic Bus Sizing
– Interrupt-Priority Assignment
– Latent Transfer Capabilities
– Streaming Read and Write Capabilities
• Avalon Switch Fabric tailors transactions to the characteristics of the attached peripherals
SOPC Design Example
[Figure: system built around the Avalon Switch Fabric. Components: 32-bit CPU with instruction master and data master; DMA controller with streaming (control port as slave, read port and write port as streaming masters); instruction memory (32-bit datapath); data memory (32-bit datapath); UART; VGA controller; Avalon tri-state bridge; external FLASH (1 MB, 16-bit datapath); external SRAM (256 KB, 32-bit datapath).]
• Allows masters and slaves to communicate without knowledge of each other's interface details
Data Path Multiplexing & Slave Arbitration
[Figure: the same system as in the SOPC Design Example, with an arbiter and a multiplexer (MUX) shown inside the Avalon Switch Fabric.]
1. Data-Path Multiplexing
2. Slave Arbitration
3. Address Decoding
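Address decoding and dynamic bus sizing can be illustrated with a small behavioral model. This is a sketch only, not the logic that SOPC Builder generates: the memory sizes and datapath widths come from the design example, but the base addresses, class, and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Slave(NamedTuple):
    name: str
    base: int        # hypothetical base address, not given in the slides
    size: int        # size in bytes
    width_bits: int  # datapath width


# Sizes and widths follow the design example; base addresses are invented.
SLAVES = [
    Slave("ext_flash", 0x0000_0000, 1 << 20, 16),    # 1 MB, 16-bit datapath
    Slave("ext_sram", 0x0010_0000, 256 << 10, 32),   # 256 KB, 32-bit datapath
]


def decode(address: int) -> Optional[Slave]:
    """Address decoding: pick the slave whose address range contains `address`."""
    for slave in SLAVES:
        if slave.base <= address < slave.base + slave.size:
            return slave
    return None


def transfers_needed(master_bits: int, slave: Slave) -> int:
    """Dynamic bus sizing: slave-side transfers for one master-side access."""
    return max(1, -(-master_bits // slave.width_bits))  # ceiling division


target = decode(0x0000_0004)
print(target.name, transfers_needed(32, target))   # ext_flash 2
target = decode(0x0010_0000)
print(target.name, transfers_needed(32, target))   # ext_sram 1
```

In this model a 32-bit master reading the 16-bit FLASH needs two slave-side transfers, while the 32-bit SRAM needs one; the master itself does not need to know the difference.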
DSP Blocks
• Eight 9 × 9 bit multipliers
• Four 18 × 18 bit multipliers
• One 36 × 36 bit multiplier
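One way to see why the same hardware can offer either four 18 × 18 multipliers or one 36 × 36 multiplier is the schoolbook decomposition of a wide product into four half-width partial products. The sketch below shows that arithmetic for unsigned operands. It is an illustration of the principle, not a description of the block's internal wiring, and the function name is hypothetical.

```python
MASK18 = (1 << 18) - 1


def mul36_from_18(a: int, b: int) -> int:
    """Multiply two unsigned 36-bit values using four 18 x 18 partial products."""
    if not (0 <= a < 1 << 36 and 0 <= b < 1 << 36):
        raise ValueError("operands must be unsigned 36-bit values")
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    # Four 18 x 18 multiplications, each yielding a product of up to 36 bits.
    hi_hi = a_hi * b_hi
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    lo_lo = a_lo * b_lo
    # Shift each partial product to its weight and add.
    return (hi_hi << 36) + ((hi_lo + lo_hi) << 18) + lo_lo


a, b = 0xF_1234_5678, 0xA_BCDE_F012
print(mul36_from_18(a, b) == a * b)  # True
```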
DSP Blocks (cont.)
The DSP block consists of
• A multiplier block
• An adder/subtractor/accumulator block
• A summation block
• An output interface
• Output registers
• Routing and control signals
Stratix DSP Blocks
• High-Performance Dedicated Multiplier Circuitry
– 18x18 Functions at 280 MHz
• Variable Operand Widths with Full-Precision Outputs
– 9x9 (8 Max.)
– 18x18 (4 Max.)
– 36x36 (1 Max.)
• Add, Accumulate, or Subtract
– Signed & Unsigned Operations
– Dynamically Change between Add & Subtract
– Supports DSP Requirements Including Complex Numbers
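Complex multiplication shows why both add and subtract are needed after the multipliers: (a + jb)(c + jd) = (ac − bd) + j(ad + bc), which takes four real multiplications, one subtraction, and one addition. The sketch below spells out that arithmetic. How a design tool actually maps it onto a DSP block is not covered by the slides, and the function name is hypothetical.

```python
def complex_multiply(a_re: int, a_im: int, b_re: int, b_im: int) -> tuple[int, int]:
    """Complex product using four real multiplies, one subtract, one add."""
    p_rr = a_re * b_re
    p_ii = a_im * b_im
    p_ri = a_re * b_im
    p_ir = a_im * b_re
    real = p_rr - p_ii   # adder stage in subtract mode
    imag = p_ri + p_ir   # adder stage in add mode
    return real, imag


# (3 + 4j) * (1 - 2j) = 11 - 2j
print(complex_multiply(3, 4, 1, -2))  # (11, -2)
```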
DSP Block for 18 x 18-bit Mode
[Figure: DSP block in 18 x 18-bit mode. Labeled parts: Input Register Unit with Shift Register Chain, Optional Pipelining, add/subtract stages followed by an adder in the Adder/Output Block, Output Multiplexer, and Output Register Unit.]
Time-Domain Multiplexed FIR Filters
Operation of TDM Filter
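A time-domain multiplexed filter reuses one multiplier for several taps by running it faster than the sample rate. The sketch below is a behavioral model of that idea with a single shared MAC, so each output sample costs as many MAC cycles as there are taps. It is an assumed illustration rather than the structure drawn on the original slide, and all names are hypothetical.

```python
def tdm_fir(samples: list[int], coeffs: list[int]) -> tuple[list[int], int]:
    """FIR filter computed with one shared multiply-accumulate unit.

    Returns the output samples and the total number of MAC cycles used.
    """
    delay_line = [0] * len(coeffs)
    outputs = []
    mac_cycles = 0
    for x in samples:
        # A new sample enters the delay line; the oldest one drops out.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        # The single MAC visits one tap per (fast) clock cycle.
        for coeff, tap in zip(coeffs, delay_line):
            acc += coeff * tap
            mac_cycles += 1
        outputs.append(acc)
    return outputs, mac_cycles


# Impulse response of a 3-tap filter: the outputs reproduce the coefficients.
print(tdm_fir([1, 0, 0, 0], [1, 2, 3]))  # ([1, 2, 3, 0], 12)
```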
Resource Savings with DSP Blocks
• DSP Block
– Reduces LE Usage
– Reduces Routing Congestion
– Reduces Power
– Maintains Performance
90% of your problems are hidden under the surface!
[Figure: 18-bit operands multiplied into 36-bit products, which are added to give a 38-bit result]
Saves 652 routing nets!
Design Flow
Design Flow Overview
1) Create Design in Simulink Using Altera Libraries
2) Simulate in Simulink
3) Add SignalCompiler to Model
4) Create HDL Code & Generate Testbench
5) Perform RTL Simulation
6) Synthesize HDL Code & Place & Route
7) Program Device
8) SignalTap II Logic Analyzer
Step 1 - Create Design in Simulink Using Altera Libraries
• Drag & Drop Library Blocks into Simulink Design & Parameterize Each Block
Parameterization of IP Megacores
Step 2 - Simulate in Simulink
Step 3 - Add “Signal Compiler” to Model to Generate HDL code
• APEX20K/E/C
• APEX II
• Stratix & Stratix GX
• Cyclone & ACEX 1K
• Mercury
• FLEX10K & FLEX 6000
• DSP Boards
Speed vs. Area
Message Window
• Leonardo Spectrum
• Synplify
• Quartus II
Testbench Generation
Step 4 - Create HDL Code & Generate Testbench
AltrFir32.vhd
AltrFir32.mdl
Enable "Generate Stimuli for VHDL Testbench" Button
HDL Code Generation
DSP Builder Report File
• Lists All Converted Blocks
– Port Widths
– Sampling Frequencies
– Warnings & Messages
Step 5 – Perform RTL Simulation (ModelSim)
1) Set working directory (File => Change Directory)
2) Run TCL file (Tools => Execute Macro)
Perform Verification: ModelSim vs. Simulink
Step 6 - Synthesize HDL & Place & Route
– Synthesis
• Leonardo Spectrum
• Synplify
• Quartus II
– Quartus II Fitter
Step 7 – Program Device
Download Design to DSP Development Kits
Stratix DSP Development Board
[Figure: board photograph with the following parts labeled]
• 40-Pin Connectors for Analog Devices
• Texas Instruments Connectors on Underside of Board
• Mictor-Type Connectors for HP Logic Analyzers
• MAX 7000 Device
• Analog SMA Connectors
• D/A Converters
• A/D Converters
• Prototyping Area
• Nios Expansion Prototype Connector
Stratix DSP Board – Key Features
• Stratix EP1S25F780C5 Device (Starter Version)
• Stratix EP1S80B956C7 Device (Professional Version)
• Analog I/O
– Two 12-bit, 125 MHz A/D Converters
– Two 14-bit, 165 MHz D/A Converters
• Digital I/O
– Two 40-pin Connectors for Analog Devices A/D Converter Evaluation Boards
– Connector for TI TMS320 Cross-Platform Daughter Card
– 3.3V Expansion/Prototype Headers
– RS-232 Serial Port
• Memory
– 2 Mbytes of 7.5-ns Synchronous SRAM
– 32 Mbytes of FLASH
Step 8 - SignalTap II Logic Analyzer
• Embedded Logic Analyzer
– Downloads into Device with Design
– Captures State of Internal Nodes
– Uses JTAG for Communication
SignalTap II Logic Analyzer
Imported Data
Imported Plot
Analysis of Imported Data