Field-Programmable Gate Arrays as tracking devices

Roberto Rodríguez Osorio, Javier Díaz Bruguera

Group of Computer Architecture
Dept. of Electronics and Computer Science
University of Santiago de Compostela

RR Osorio FPGA



Outline

• Application-specific computing machines
• ASIC vs FPGA
• FPGA technology basics
• Hard cores in FPGAs
• Performance
• Design effort
• Choices
• Applications


Application-specific computing machines

[Figure: a microprocessor (code memory, data memory, PC, IR, control logic, register file and functional units, organized as a datapath and a control section) next to an application-specific integrated circuit (a control section with control logic and a datapath built around a MAC (multiply-accumulate) unit).]

Microprocessor: 10 cycles @ 3 GHz; dissipated power ~35 W.

Application-specific integrated circuit: 1 cycle @ 1 GHz; dissipated power ~mW.
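The two performance figures above can be turned into one efficiency metric, operations per second per watt (the Mops/W unit used on the efficiency chart). A minimal Python sketch, assuming one MAC counts as one operation and taking 1 mW as a stand-in for the slide's "~mW" (an assumed value, not a figure from the slide):

```python
def ops_per_second(clock_hz: float, cycles_per_op: float) -> float:
    """Throughput of a machine that completes one operation every cycles_per_op cycles."""
    return clock_hz / cycles_per_op


def mops_per_watt(clock_hz: float, cycles_per_op: float, power_w: float) -> float:
    """Energy efficiency in millions of operations per second per watt (Mops/W)."""
    return ops_per_second(clock_hz, cycles_per_op) / power_w / 1e6


# Microprocessor: 10 cycles per operation at 3 GHz, ~35 W (figures from the slide)
cpu_eff = mops_per_watt(3e9, 10, 35.0)
# ASIC: 1 cycle per operation at 1 GHz; 1 mW is an ASSUMED value for "~mW"
asic_eff = mops_per_watt(1e9, 1, 1e-3)

print(f"microprocessor: {cpu_eff:.1f} Mops/W")
print(f"ASIC:           {asic_eff:.0f} Mops/W")
print(f"ratio:          {asic_eff / cpu_eff:.0f}x")
```

Under these assumptions the ASIC comes out roughly five orders of magnitude more energy efficient; the exact ratio depends directly on the assumed ASIC power.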


ASIC vs FPGA

[Figure: NRE (non-recurring engineering) cost, from $1M to $4M, versus technology node, from 0.35 µm down to 0.05 µm.]
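NRE cost grows as the technology shrinks, while an FPGA design avoids most of it at the price of a higher cost per unit. A common way to weigh the two is a break-even volume. The sketch below assumes a simple linear cost model (NRE + unit cost × volume); every dollar figure in the example is hypothetical.

```python
from typing import Optional


def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Total project cost: one-off NRE plus per-unit manufacturing cost."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre: float, asic_unit: float,
                      fpga_nre: float, fpga_unit: float) -> Optional[float]:
    """Volume above which the ASIC becomes cheaper than the FPGA.

    Returns None when the FPGA is not more expensive per unit,
    in which case the ASIC never catches up.
    """
    if fpga_unit <= asic_unit:
        return None
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical numbers, for illustration only
volume = break_even_volume(asic_nre=2_000_000, asic_unit=20,
                           fpga_nre=50_000, fpga_unit=120)
print(f"ASIC pays off above ~{volume:,.0f} units")
```

As the NRE of newer nodes rises, the break-even volume rises with it, which widens the range of production volumes where an FPGA is the cheaper option.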


ASIC vs FPGA

[Figure: computational efficiency (Mops/W, logarithmic scale from 10^0 to 10^6) versus technology (2 µm down to 0.07 µm, years 1986 to 2006), with a curve for the maximum efficiency of an ASIC. Architectures marked on the chart: FPGA, ASSP, MPPA, GPGPU, VLIW, ASIP, ManyCore, …]

Source: Theo A.C.M Claasen, ISSCC 99


FPGA technology basics – Computing

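Full adder (FA): inputs a, b and carry input c_in; outputs sum s and carry output c_out.

carry input | a | b | s | carry output
0 | 0 | 0 | 0 | 0
0 | 0 | 1 | 1 | 0
0 | 1 | 0 | 1 | 0
0 | 1 | 1 | 0 | 1
1 | 0 | 0 | 1 | 0
1 | 0 | 1 | 0 | 1
1 | 1 | 0 | 0 | 1
1 | 1 | 1 | 1 | 1

Gate-level implementation:

$s = a \oplus b \oplus c_{in}$

$c_{out} = a\,b + a\,c_{in} + b\,c_{in}$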
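The gate equations can be checked directly against the truth table. A short Python sketch that evaluates them bit by bit and prints the rows in the same column order (cin, a, b, s, cout); `full_adder` is an illustrative name:

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Gate-level full adder on single bits (0 or 1)."""
    s = a ^ b ^ cin                          # sum: XOR of the three inputs
    cout = (a & b) | (a & cin) | (b & cin)   # carry: majority of the three inputs
    return s, cout


# Print the truth table in the slide's column order: cin, a, b, s, cout
for cin, a, b in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(cin, a, b, s, cout)
```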


FPGA technology basics – Do not compute

[Figure: logic blocks, each holding two 8×1-bit SRAM memories. The three inputs cin, a and b address both memories; one memory returns s and the other returns cout.]
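Instead of building the gates, the fabric stores the truth table and looks the result up. A minimal Python model, assuming each output gets its own 8×1-bit memory addressed by (cin, a, b) as in the figure; `build_lut`, `lut_full_adder` and `ripple_add` are illustrative names, not vendor tools, and the ripple-carry chaining is added here only to show the blocks in use:

```python
from itertools import product

# Address bits in the slide's order: cin is the most significant bit, then a, then b.
ADDRESSES = list(product((0, 1), repeat=3))


def build_lut(func) -> list[int]:
    """Precompute the contents of an 8x1-bit memory for a 3-input function."""
    return [func(cin, a, b) for cin, a, b in ADDRESSES]


# Contents are computed once (at "configuration time"), never at run time.
SUM_LUT = build_lut(lambda cin, a, b: (cin + a + b) & 1)
CARRY_LUT = build_lut(lambda cin, a, b: (cin + a + b) >> 1)


def lut_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder that only reads memory: the inputs form the address."""
    addr = (cin << 2) | (a << 1) | b
    return SUM_LUT[addr], CARRY_LUT[addr]


def ripple_add(x: int, y: int, width: int) -> int:
    """Chain `width` LUT-based full adders into a ripple-carry adder."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = lut_full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


print(SUM_LUT)    # the s column of the truth table
print(CARRY_LUT)  # the cout column of the truth table
print(ripple_add(11, 6, 4))
```

The two memory contents are exactly the s and carry-output columns of the full-adder truth table; reconfiguring the FPGA amounts to writing different contents.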


FPGA technology basics – Interconnect

[Figure: two-dimensional array of logic blocks]



FPGA technology basics – Interconnect + memory

• The FPGA fabric consists of a huge number of simple memory elements (lookup tables) connected by a reconfigurable network.
• Design software must break every computing task into 1-bit operations with no more than 4, 5 or 6 input variables.
• Operations are spatially distributed according to proximity criteria.
• Routing may be troublesome:
  – Long paths are slow.
  – Routing through logic blocks increases area.
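When a function has more inputs than a lookup table, the tools must split it. One standard technique is Shannon expansion on one input, $f = \bar{x}_4\, f|_{x_4=0} + x_4\, f|_{x_4=1}$, which needs two smaller LUTs and a 2:1 multiplexer. The Python sketch below assumes 4-input LUTs and uses a five-input majority function purely as an example; it illustrates the decomposition, not how any particular vendor's mapper works.

```python
from itertools import product
from typing import Callable

K = 4  # assumed LUT size: 4-input lookup tables


def make_lut(func: Callable[..., int], k: int) -> list[int]:
    """Tabulate a k-input Boolean function; bit i of the address is input i."""
    return [func(*[(addr >> i) & 1 for i in range(k)]) for addr in range(2 ** k)]


def lut_read(table: list[int], bits: list[int]) -> int:
    """Read a LUT: the input bits form the address."""
    return table[sum(b << i for i, b in enumerate(bits))]


def f5(x0: int, x1: int, x2: int, x3: int, x4: int) -> int:
    """Example 5-input function (majority of five); too wide for one 4-input LUT."""
    return int(x0 + x1 + x2 + x3 + x4 >= 3)


# Shannon expansion on x4: one LUT per cofactor
lut_lo = make_lut(lambda a, b, c, d: f5(a, b, c, d, 0), K)  # f with x4 = 0
lut_hi = make_lut(lambda a, b, c, d: f5(a, b, c, d, 1), K)  # f with x4 = 1


def f5_mapped(x0: int, x1: int, x2: int, x3: int, x4: int) -> int:
    """Two 4-input LUTs feeding a 2:1 multiplexer selected by x4."""
    lo = lut_read(lut_lo, [x0, x1, x2, x3])
    hi = lut_read(lut_hi, [x0, x1, x2, x3])
    return hi if x4 else lo


mismatches = sum(f5_mapped(*v) != f5(*v) for v in product((0, 1), repeat=5))
print(f"LUTs used: 2, mismatches over all 32 inputs: {mismatches}")
```

Each extra input beyond the LUT size doubles the number of LUTs in this scheme, which is one reason wide functions cost area and routing.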


Hard cores in FPGAs

• Memory blocks
• Multipliers
• DSP blocks
• Microprocessors
• Floating point units?


Memory blocks

• Hundreds or thousands of small memory blocks
• Dual-port blocks
• 18 Kbit each in Xilinx devices
• Flexible configurations
  – Many short words or a few long words
• Independent access
  – Huge aggregate bandwidth
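A quick calculation shows what "flexible configurations" and "huge aggregate bandwidth" mean in numbers. The sketch assumes an 18 Kbit (18 × 1024 bit) block whose full capacity, parity bits included, is usable at the chosen width; the block count, port width and clock frequency in the example are hypothetical, and real devices restrict which widths and port combinations are allowed.

```python
BLOCK_BITS = 18 * 1024  # one 18 Kbit block, as quoted on the slide for Xilinx


def depth_for_width(width_bits: int) -> int:
    """Number of words available when the block is organized as width_bits-wide words."""
    return BLOCK_BITS // width_bits


def aggregate_bandwidth_gbit_s(n_blocks: int, ports_per_block: int,
                               width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth if every port of every block is accessed on every cycle."""
    bits_per_cycle = n_blocks * ports_per_block * width_bits
    return bits_per_cycle * clock_mhz / 1000.0  # Mbit/s -> Gbit/s


# Many short words or a few long words, from the same block
for width in (9, 18, 36):
    print(f"{depth_for_width(width):5d} words x {width:2d} bits")

# Hypothetical device: 200 dual-port blocks, 18-bit ports, 250 MHz
print(f"{aggregate_bandwidth_gbit_s(200, 2, 18, 250):.0f} Gbit/s")
```

Because every block has its own ports, bandwidth scales with the number of blocks, unlike a single external memory whose bandwidth is fixed by one interface.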


Multipliers and DSP blocks

As FPGAs became larger, some designers tried to implement DSP algorithms on them.

However, multipliers built from general-purpose logic take too much area. Therefore, hardwired multipliers were introduced.

DSP algorithms are often based on multiply-and-add and multiply-and-accumulate operations.

DSP blocks in modern FPGAs implement in hardwired form:
• multiply, multiply-and-add, multiply-and-accumulate
• optional addition before multiplying
• three-input addition
• one large, two medium or four small operations on the same hardware
• shifting, comparisons, bit-wise operations, …

Up to 2000 DSP blocks in current FPGAs allow massive parallelism.
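Multiply-and-accumulate is the inner loop of an FIR filter, and the optional addition before multiplying pays off when the coefficients are symmetric: the two samples that share a coefficient are added first, so each pair needs one multiplication instead of two. A behavioral Python sketch on integers; the function names, coefficients and samples are made up for illustration, and the code models the arithmetic, not a specific DSP block.

```python
def fir_mac(samples: list[int], coeffs: list[int]) -> list[int]:
    """Direct-form FIR filter: each output is a chain of multiply-accumulate steps."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            x = samples[n - k] if n - k >= 0 else 0  # samples before time 0 are zero
            acc += h * x                              # one multiply-accumulate per tap
        out.append(acc)
    return out


def fir_symmetric_preadd(samples: list[int], coeffs: list[int]) -> list[int]:
    """Same filter for symmetric coefficients, adding before multiplying.

    Samples that share a coefficient are added first, so an N-tap filter
    needs ceil(N / 2) multiplications per output instead of N.
    """
    if coeffs != coeffs[::-1]:
        raise ValueError("coefficients must be symmetric")
    n_taps = len(coeffs)

    def x_at(i: int) -> int:
        return samples[i] if i >= 0 else 0

    out = []
    for n in range(len(samples)):
        acc = 0
        for k in range(n_taps // 2):
            pre = x_at(n - k) + x_at(n - (n_taps - 1 - k))  # addition before multiplying
            acc += coeffs[k] * pre                           # multiply-accumulate
        if n_taps % 2:                                       # middle tap of an odd-length filter
            mid = n_taps // 2
            acc += coeffs[mid] * x_at(n - mid)
        out.append(acc)
    return out


signal = [1, 2, 3, 4, 5]
taps = [1, 2, 3, 2, 1]   # symmetric, made-up coefficients
print(fir_mac(signal, taps))
print(fir_symmetric_preadd(signal, taps))
```

Both functions produce the same outputs; the second uses 3 multiplications per output for this 5-tap filter instead of 5, which is the saving a hardwired pre-addition exploits.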


Microprocessors

Xilinx:
• IBM PowerPC processors (hard cores): Virtex-II Pro, Virtex-4 FX, Virtex-5 FX
• MicroBlaze soft processor

Altera:
• ARM RISC processors
• Nios soft processor


Floating point units

• Not implemented so far
  – Suggested as a way to accelerate scientific computing
  – For engineering, fixed-point arithmetic is usually enough
• Will it happen? It already happened with multipliers, transceivers, DSP blocks, …
• GPUs already have a strong position in this field
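Fixed-point arithmetic stores a real number as a scaled integer, so a multiplication needs only an integer multiplier and a shift, which the hardwired multipliers and DSP blocks already provide. A minimal sketch of the common Q1.15 (16-bit) fractional format, assuming round-to-nearest and saturation; the format and helper names are illustrative choices, not taken from the slides.

```python
FRAC_BITS = 15                     # Q1.15: 1 sign bit, 15 fractional bits (assumed format)
SCALE = 1 << FRAC_BITS             # 32768
Q_MIN, Q_MAX = -SCALE, SCALE - 1   # 16-bit two's-complement range


def saturate(q: int) -> int:
    """Clamp to the 16-bit range instead of wrapping around."""
    return max(Q_MIN, min(Q_MAX, q))


def to_q15(x: float) -> int:
    """Quantize a real number in [-1, 1) to Q1.15 (round to nearest)."""
    return saturate(round(x * SCALE))


def from_q15(q: int) -> float:
    return q / SCALE


def q15_mul(a: int, b: int) -> int:
    """Fixed-point multiply: integer product, then rescale by a right shift."""
    prod = a * b                        # 30 fractional bits
    prod += 1 << (FRAC_BITS - 1)        # add half an LSB so the shift rounds
    return saturate(prod >> FRAC_BITS)  # arithmetic shift back to 15 fractional bits


x, y = 0.3, 0.7
q = q15_mul(to_q15(x), to_q15(y))
print(f"fixed point: {from_q15(q):.6f}, floating point: {x * y:.6f}")
```

The error stays around 10^-5, which is below the resolution of many sensors; that is the usual argument for fixed point in engineering applications, while wide dynamic range (as in scientific computing) is where floating point becomes necessary.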


Performance

Compared to an ASIC:
• 10 times slower, larger and more power hungry

Compared to a microprocessor:
• Fast, depending on:
  – Potential parallelism
  – Required bandwidth
• Small and simple, even standalone
• Reduced power consumption (< 1 W); they may run on batteries
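The dependence on parallelism and bandwidth can be captured by a simple bound in the spirit of the roofline model: a design cannot deliver more results than its parallel units produce per cycle, nor more than its memory interface can feed. The sketch assumes one result per unit per cycle; the FPGA numbers are hypothetical, and the processor figures reuse the 10 cycles @ 3 GHz from the first slide.

```python
def fpga_throughput(parallel_units: int, clock_hz: int,
                    bandwidth_bytes_s: int, bytes_per_op: int) -> float:
    """Achievable operations per second: the lower of the compute and memory limits."""
    compute_bound = parallel_units * clock_hz            # one result per unit per cycle
    memory_bound = bandwidth_bytes_s / bytes_per_op      # operands the interface can deliver
    return min(compute_bound, memory_bound)


def cpu_throughput(clock_hz: int, cycles_per_op: int) -> float:
    """Sequential processor completing one operation every cycles_per_op cycles."""
    return clock_hz / cycles_per_op


# Hypothetical: 100 units at 200 MHz, 6.4 GB/s memory, 4 bytes of new data per operation
fpga = fpga_throughput(100, 200_000_000, 6_400_000_000, 4)
cpu = cpu_throughput(3_000_000_000, 10)
print(f"FPGA: {fpga:.2e} ops/s, CPU: {cpu:.2e} ops/s, speed-up: {fpga / cpu:.1f}x")
```

In this example the memory interface, not the 100 parallel units, sets the limit, so adding more units would not help; keeping data in on-chip memory blocks is what raises the memory bound.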


Design effort

Several scenarios:

• Pure VHDL or Verilog coding
  – Higher flexibility, efficiency and performance
  – Long design time, costly debugging
• Macros combined with VHDL or Verilog
  – Libraries of IP blocks ease the design process
  – It is not guaranteed that the required functionality can be found
• High-level languages (DSP logic (Matlab), Impulse-C, Handel-C, …)
  – Efficient and simple implementation for simple algorithms
  – Lack of expressiveness for complex algorithms


Choices

• Xilinx: Virtex, Spartan
• Altera: Stratix, Cyclone
• Others: Actel, Lattice Semiconductor, …


Choices – Xilinx

Family | Spartan-3 | Spartan-6 | Virtex-6
Logic cells | 1,728 – 74,880 | 3,840 – 147,443 | 74,496 – 566,784
Block RAM (Kbit) | 12 – 1,872 | 216 – 4,824 | 5,616 – 32,832
Multipliers / DSP blocks | 4 – 104 / 84 – 126 | 8 – 180 | 288 – 2,016
Evaluation board cost | < $200 | $300 – $1,000 | $2,000 – $2,500


In the context of these applications

Device choice
• Logic-bound
  – Standard logic
  – Multipliers
• I/O-bound

Parallel acquisition
• Switching memory blocks for acquisition and computation

High computing speed
• Via pipelining

Results storage
• Internal or external memory

Power consumption

Configuration
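Switching memory blocks between acquisition and computation is usually called double (ping-pong) buffering: while one block is filled with new samples, the other holds the previous frame for processing, and the roles swap at each frame boundary. The Python sketch below only models the data flow; in hardware the fill and the computation run concurrently, whereas here they run one after the other within each step. The function name and frame interface are illustrative.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def ping_pong_process(frames: Sequence[Sequence[int]],
                      process: Callable[[list[int]], T]) -> list[T]:
    """Double buffering with two memory blocks that swap roles every frame.

    While frame i is written into one buffer (acquisition), frame i-1 is
    read from the other buffer (computation). In hardware both happen in
    the same frame period; this model performs them one after the other.
    """
    buffers: list[list[int]] = [[], []]
    results: list[T] = []
    fill = 0  # index of the buffer currently receiving samples
    for i, frame in enumerate(frames):
        buffers[fill] = list(frame)                     # acquisition
        if i > 0:
            results.append(process(buffers[1 - fill]))  # computation on the previous frame
        fill = 1 - fill                                 # swap roles
    if frames:
        results.append(process(buffers[1 - fill]))      # drain the last frame
    return results


print(ping_pong_process([[1, 2], [3, 4], [5, 6]], sum))
```

The cost is one extra memory block per buffer and one frame of latency; in exchange, acquisition never has to stop while the previous frame is being processed.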