RR Osorio FPGA

Field-Programmable Gate Arrays as tracking devices

Roberto Rodríguez Osorio
Javier Díaz Bruguera

Group of Computer Architecture
Dept. of Electronics and Computer Science

University of Santiago de Compostela

2

Outline

Application-specific computing machines
ASIC vs FPGA
FPGA technology basics
Hard cores in FPGAs
Performance
Design effort
Choices
Applications

3

Application-specific computing machines

[Figure: block diagrams of a microprocessor (code memory, data memory, PC, IR, control logic, register file, functional units; datapath and control section) and of an application-specific integrated circuit (control logic and a MAC unit; datapath and control section)]

Microprocessor
  Performance: 10 cycles @ 3 GHz
  Dissipated power: ~35 W

Application-Specific Integrated Circuit
  Performance: 1 cycle @ 1 GHz
  Dissipated power: ~mW
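
The cycle counts and clock rates quoted on this slide can be turned into a time per operation with a short calculation. The sketch below is only illustrative: the function name is hypothetical, and the ASIC power of 1 mW is an assumed value standing in for the slide's order-of-magnitude "~mW".

```python
def time_per_operation_ns(cycles: int, clock_ghz: float) -> float:
    """Time for one operation in nanoseconds: cycles divided by clock rate in GHz."""
    return cycles / clock_ghz


# Figures taken from the slide.
cpu_ns = time_per_operation_ns(cycles=10, clock_ghz=3.0)
asic_ns = time_per_operation_ns(cycles=1, clock_ghz=1.0)

# Energy per operation = power * time (watts * nanoseconds = nanojoules).
CPU_POWER_W = 35.0            # from the slide (~35 W)
ASSUMED_ASIC_POWER_W = 1e-3   # assumption: the slide only says "~mW"

cpu_energy_nj = CPU_POWER_W * cpu_ns
asic_energy_nj = ASSUMED_ASIC_POWER_W * asic_ns

print(f"Microprocessor: {cpu_ns:.2f} ns/op, {cpu_energy_nj:.1f} nJ/op")
print(f"ASIC:           {asic_ns:.2f} ns/op, {asic_energy_nj:.4f} nJ/op (assumed power)")
```

Despite the lower clock rate, the single-cycle ASIC finishes each operation sooner than the 10-cycle microprocessor.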

4

ASIC vs FPGA

[Figure: NRE cost (axis from $1M to $4M) versus technology (0.35, 0.25, 0.2, 0.15, 0.1, 0.05 micrometers)]

5

ASIC vs FPGA

[Figure: computational efficiency (Mops/W, logarithmic axis from 10^0 to 10^6) versus technology (2, 1, 0.5, 0.25, 0.13, 0.07 µm; years 1986, 1990, 1994, 1998, 2002, 2006). Labels: maximum efficiency (ASIC); FPGA, ASSP, MPPA, GPGPU, VLIW, ASIP, ManyCore, ...]

Source: Theo A.C.M Claasen, ISSCC 99

6

FPGA technology basics – Computing

carry input   a   b   s   carry output
     0        0   0   0        0
     0        0   1   1        0
     0        1   0   1        0
     0        1   1   0        1
     1        0   0   1        0
     1        0   1   0        1
     1        1   0   0        1
     1        1   1   1        1

[Figure: full adder (FA) block with inputs a, b, cin and outputs s, cout, together with its gate-level implementation]
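
The full-adder truth table can be reproduced from its Boolean equations. The following is a minimal sketch, assuming the usual sum-of-XOR and majority-carry formulation; the function name is illustrative.

```python
from itertools import product


def full_adder(cin: int, a: int, b: int) -> tuple[int, int]:
    """Compute (s, cout) for one bit position using logic gates."""
    s = a ^ b ^ cin                          # sum: XOR of the three inputs
    cout = (a & b) | (a & cin) | (b & cin)   # carry: majority of the three inputs
    return s, cout


# Enumerate all input combinations in the same order as the slide's table.
rows = [(cin, a, b, *full_adder(cin, a, b))
        for cin, a, b in product((0, 1), repeat=3)]

print("cin a b s cout")
for row in rows:
    print(*row)
```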

7

FPGA technology basics – Do not compute

Logic blocks

[Figure: two 8x1-bit SRAM memories addressed by cin, a and b; one outputs s, the other outputs cout]
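
The "do not compute" idea is that the adder's outputs are read from small memories rather than evaluated by gates. A minimal sketch of this, assuming the address is formed as (cin, a, b) with cin as the most significant bit; all names are illustrative:

```python
def build_lut(func, n_inputs: int = 3) -> list[int]:
    """Fill a (2**n_inputs) x 1-bit memory with the values of a Boolean function."""
    table = []
    for address in range(2 ** n_inputs):
        # Split the address into its bits, most significant bit first.
        bits = [(address >> i) & 1 for i in reversed(range(n_inputs))]
        table.append(func(*bits) & 1)
    return table


# The contents are computed once (at configuration time) ...
SUM_LUT = build_lut(lambda cin, a, b: cin ^ a ^ b)
CARRY_LUT = build_lut(lambda cin, a, b: (a & b) | (cin & (a ^ b)))


def lookup(lut: list[int], cin: int, a: int, b: int) -> int:
    """... and at run time the result is simply read from memory."""
    return lut[(cin << 2) | (a << 1) | b]


print("sum LUT:  ", SUM_LUT)
print("carry LUT:", CARRY_LUT)
print("1 + 1 + carry-in 0 ->", lookup(SUM_LUT, 0, 1, 1), lookup(CARRY_LUT, 0, 1, 1))
```

Changing the stored bits turns the same hardware into a different 3-input function, which is what makes the block reconfigurable.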

8

FPGA technology basics – Interconnect

[Figure: regular two-dimensional array of logic blocks]

9

FPGA technology basics – Interconnect

10

FPGA technology basics – Interconnect

11

FPGA technology basics – Interconnect + memory

FPGA fabric consists of a huge number of simple memory elements connected by means of a reconfigurable network
Design software must break every computing task into 1-bit operations with no more than 4, 5 or 6 variables
Operations are spatially distributed according to proximity criteria
Routing may be troublesome
  Long paths are slow
  Routing through logic blocks increases area
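
To illustrate how a wide operation is broken into 1-bit functions of at most k variables, the sketch below maps an n-input XOR (parity) onto k-input lookup tables. It is a simple greedy grouping written for illustration, not the algorithm used by any real design tool; the function name is hypothetical.

```python
import heapq


def map_xor_to_luts(n_inputs: int, k: int = 4) -> tuple[int, int]:
    """Return (number of LUTs, logic levels) for an n-input XOR built from k-input LUTs."""
    if n_inputs < 1 or k < 2:
        raise ValueError("need at least one input and LUTs with two or more inputs")
    depths = [0] * n_inputs   # each signal's logic depth; all zeros is a valid heap
    luts = 0
    while len(depths) > 1:
        # Feed the k shallowest signals into one LUT.
        group = [heapq.heappop(depths) for _ in range(min(k, len(depths)))]
        heapq.heappush(depths, max(group) + 1)   # the LUT output is one level deeper
        luts += 1
    return luts, depths[0]


for k in (4, 5, 6):
    luts, levels = map_xor_to_luts(16, k)
    print(f"16-input XOR with {k}-input LUTs: {luts} LUTs, {levels} levels")
```

More levels mean longer paths through the interconnect, which is one reason routing affects speed.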

12

Hard cores in FPGAs

Memory blocks
Multipliers
DSP blocks
Microprocessors
Floating point units?

[Figure: array of logic blocks with embedded hard cores]

13

Memory blocks

Hundreds or thousands of small memory blocks
Dual-port blocks
18 Kbit each for Xilinx
Flexible configurations
  Many short words or a few long words
Independent access
Huge aggregate bandwidth
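
The trade-off between word width and depth, and the aggregate bandwidth of many independent blocks, can be estimated with simple arithmetic. In the sketch below only the 18 Kbit block size comes from the slide; the block count, port width and clock frequency are assumed values chosen for illustration, and real devices restrict which widths are allowed.

```python
BLOCK_BITS = 18 * 1024  # 18 Kbit per block, as stated for Xilinx


def words_for_width(width_bits: int, block_bits: int = BLOCK_BITS) -> int:
    """Number of words a block holds when configured with the given word width."""
    return block_bits // width_bits


def aggregate_bandwidth_gbit_s(n_blocks: int, ports: int,
                               width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth if every port of every block transfers one word per cycle."""
    return n_blocks * ports * width_bits * clock_mhz * 1e6 / 1e9


for width in (9, 18, 36):
    print(f"{width:2d}-bit words: {words_for_width(width)} words per block")

# Assumed example: 500 dual-port blocks, 36-bit ports, 300 MHz clock.
example = aggregate_bandwidth_gbit_s(n_blocks=500, ports=2, width_bits=36, clock_mhz=300)
print(f"Aggregate bandwidth (assumed figures): {example:.0f} Gbit/s")
```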

14

Multipliers and DSP blocks

As FPGAs were becoming larger, some people tried to implement DSP algorithms on them

However: Multipliers take too much area
Therefore: Hardwired multipliers were introduced

DSP algorithms are often based on multiply & add and multiply & accumulate

DSP blocks in modern FPGAs implement hardwired:
  multiply, multiply & add, multiply & accumulate
  optional addition before multiplying
  three-input add
  1 large, 2 medium or 4 small operations on the same hardware
  shifting, comparisons, bit-wise operations, …

Up to 2000 DSP blocks in current FPGAs for massive parallelism
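
As an example of a DSP algorithm built on multiply & accumulate, the sketch below computes a FIR filter as a sequence of MAC operations. It is a plain software model for illustration, with hypothetical function names; in an FPGA each MAC could be assigned to its own DSP block and run in parallel.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply & accumulate step: acc + a * b."""
    return acc + a * b


def fir_filter(samples: list[int], coeffs: list[int]) -> list[int]:
    """y[n] = sum over k of coeffs[k] * samples[n - k], skipping samples before the start."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc = mac(acc, c, samples[n - k])
        out.append(acc)
    return out


print(fir_filter([1, 2, 3, 4], [1, 1]))
```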

15

Microprocessors

Xilinx: IBM PowerPC processors
  Virtex II Pro
  Virtex-4 FX
  Virtex-5 FX
  MicroBlaze soft processors

Altera: ARM RISC processors
  Nios soft processor

16

Floating point units

Not implemented so far
• Suggested to help accelerate scientific computing
• For engineering, fixed point arithmetic is usually enough

Will it happen?
☺ It happened with multipliers, transceivers, DSP blocks, …

GPUs already have a strong position in this field
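
Fixed point arithmetic represents fractional values as scaled integers, so it needs only integer multipliers and shifts. The sketch below assumes a format with 15 fractional bits, chosen for illustration; it does not model word-length limits or saturation.

```python
FRAC_BITS = 15            # assumed number of fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a real value to its scaled-integer representation."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a scaled integer back to a real value."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed point values; the shift removes the doubled scale factor."""
    return (a * b) >> FRAC_BITS


product = fixed_mul(to_fixed(0.5), to_fixed(0.25))
print(product, to_float(product))
```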

17

Performance

Compared to an ASIC
  10 times slower, larger and more power-hungry

Compared to a microprocessor
  Fast, depending on:
    Potential parallelism
    Required bandwidth
  Small and simple, even standalone
  Reduced power consumption (< 1 W); they may run on batteries

18

Design effort

Several scenarios:

Pure VHDL or Verilog coding
  Higher flexibility, efficiency and performance
  Long design time
  Costly debugging

Use macros combined with VHDL or Verilog
  Libraries of IP blocks ease the design process
  It is not guaranteed that the required functionality can be found

High-level languages (DSP logic (Matlab), Impulse-C, Handel-C, …)
  Efficient and simple implementation for simple algorithms
  Lack of expressiveness for complex algorithms

19

Choices

Xilinx
  Virtex
  Spartan

Altera
  Stratix
  Cyclone

Others
  Actel
  Lattice Semiconductor
  …

20

Choices - Xilinx

                         Spartan 3             Spartan 6        Virtex 6
Logic cells              1728 – 74880          3840 – 147443    74496 – 566784
Block RAM (Kbits)        12 – 1872             216 – 4824       5616 – 32832
Multipliers / DSP        4 – 104 / 84 – 126    8 – 180          288 – 2016
Evaluation board cost    < $200                $300 – $1000     $2000 – $2500

21

In the context of this application

Device choice
• Logic bounded
  • Standard logic
  • Multipliers
• IO bounded

Parallel acquisition
• Switching memory blocks for acquisition and computation

High computing speed
• Via pipelining

Results storage
• Internal or external memory

Power consumption

Configuration
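
Switching memory blocks between acquisition and computation is commonly done with two buffers that swap roles ("ping-pong" buffering). The sketch below is a sequential software model under that assumption, with hypothetical names; in hardware the filling of one block and the processing of the other would proceed at the same time.

```python
def process_stream(samples, block_size: int, compute) -> list:
    """Fill one buffer while the other, already full, is handed to `compute`."""
    buffers = [[], []]
    fill = 0        # index of the buffer currently receiving samples
    results = []
    for sample in samples:
        buffers[fill].append(sample)
        if len(buffers[fill]) == block_size:
            ready = fill
            fill = 1 - fill        # swap roles of the two memory blocks
            buffers[fill] = []     # acquisition restarts in the other block
            results.append(compute(buffers[ready]))
    return results                 # a trailing partial block is not processed


print(process_stream(range(6), block_size=3, compute=sum))
```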