Supercomputing Systems AG Phone +41 43 456 16 00 Technopark 1 Fax +41 43 456 16 10 8005 Zürich www.scs.ch Vision meets reality. Smart Automotive Sensors with acceleration on FPGA and CPU Workshop at University of Applied Sciences Ulm 26.6.2018 Felix Eberli, Department Head Embedded & Automotive

Smart Automotive Sensors with acceleration on FPGA and CPU · felix.eberli@scs.ch +41 43 456 16 19. Author: Eberli Felix Created Date: 6/6/2018 5:54:28 PM

Page 1: Smart Automotive Sensors with acceleration on FPGA and CPU · felix.eberli@scs.ch +41 43 456 16 19. Author: Eberli Felix Created Date: 6/6/2018 5:54:28 PM

Supercomputing Systems AG Phone +41 43 456 16 00

Technopark 1 Fax +41 43 456 16 10

8005 Zürich www.scs.ch

Vision meets reality.

Smart Automotive Sensors

with acceleration on

FPGA and CPU

Workshop at University of Applied Sciences Ulm 26.6.2018

Felix Eberli, Department Head Embedded & Automotive

2 Zürich 06.06.2018 © by Supercomputing Systems AG PUBLIC

Will robots drive our cars soon?

Already in series production: many driver assistance systems

• ACC (Distronic)

• Blind spot detection

• Brake assist

• Pedestrian detection

• Park pilot

• Stop & Go Pilot

• Highway Pilot (steering assist)

• …

• Let's see

Deep learning results

Why do we need accelerators for image processing?

• Low processing time

• Low latency

• Low power consumption

Software vs. Hardware

• An FPGA implementation allows massively parallel and pipelined processing

Control path vs data path

• The data path can be implemented in hardware for acceleration

• Keep the more complex control code on the processor

Partitioning SW on processor vs FPGA-Fabric

Xilinx Zynq UltraScale+

SCS Example Projects

• Acceleration on FPGA or ARM

Stereo accelerator for SGM in FPGA

Timestamping and PCIe with FPGA: measurement system recording 8 cameras and 10GigE at ~3 GByte/s

[Block diagram. Labels: Cam (x2), 12x Coax (x2), SCS measurement PC + storage, NVIDIA platform Drive PX2, 1G/10G Ethernet switch, 2x 10GBase-T, other sensors]

SCS JPEG DECODER in FPGA Fabric

• Processing rate of up to 140 MSamples/s on a Spartan-6 FPGA

• 12-bit / 8-bit versions available

• Four Huffman tables (fixed or extracted from the header)

• Up to 8 quantization tables

• Supports decoding of several interleaved image stripes

• 3 color components

• Supports single-scan configuration and YUV 4:2:0 (other formats on request)

• Supports any image size up to 64k x 64k

• Supports DNL and restart markers

[Block diagram of the decoder pipeline:
JPEG image -> Header processor (jpgd_header) -> de-stuffed Huffman-encoded data -> Huffman and run-length decoder (jpgd_dehuff) -> quantized DCT coefficients -> Dequantizer (jpgd_dequant) -> dequantized DCT coefficients -> De-zigzag and 2D IDCT (jpgd_idct_2d) -> YUV image data.
Further labels: quantization tables, Huffman tables]
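To illustrate what the dequantizer and de-zigzag stages compute, here is a minimal software sketch in C. It is not the SCS implementation, which runs in FPGA fabric; the function name `dequant_dezigzag` and the data types are assumptions chosen for this example. Only the zigzag order itself comes from the JPEG standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard JPEG zigzag scan: zigzag[i] is the row-major position (0..63)
   inside the 8x8 block of the i-th coefficient in the coded stream. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Multiply each decoded coefficient by its quantization step and store it
   at its row-major position. Both inputs are in zigzag order, which is how
   JPEG files store coefficients and quantization tables. The output block
   is then ready for a 2D IDCT. */
void dequant_dezigzag(const int16_t coef_zz[64],
                      const uint16_t quant_zz[64],
                      int32_t block[64])
{
    assert(coef_zz != NULL && quant_zz != NULL && block != NULL);
    for (int i = 0; i < 64; ++i)
        block[zigzag[i]] = (int32_t)coef_zz[i] * (int32_t)quant_zz[i];
}
```

In hardware, the same two steps map naturally onto a pipelined multiplier followed by an address permutation, which is one reason this part of a decoder suits FPGA fabric.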

SwissFEL - The PSI Future Project

Impressions

Automotive FPGA Solutions

Pedestrian Detection

Stereo movie processing time reduced from 2 s/frame to 25 frames/s

Stixel Detection

Photogrammetry Solution

• Significant speed-up in image post-processing time

• Significant power reduction

5 kW blade server, 1 core: 95 s per frame

SCS Virtex-6 board (100 W), 1 board: 1.6 s per frame

Board: Virtex-6, 8 GByte DDR3, 4x GigE, PCIe (500 MB/s)

Code Optimization on ARM

Preparation and Important Notes

• Prepare testbenches and carefully select your test cases

• Update the documentation of your code

• Always keep the unoptimized code (compile-time switches)

• Use Git or another version control system

• Check for high-level algorithm changes before analyzing the innermost loop
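One way to keep the unoptimized code behind a compile-time switch is sketched below. All names (`USE_OPTIMIZED`, `sum_of_squares_ref`, `sum_of_squares_opt`, `variants_agree`) are hypothetical and chosen only for this example. The reference version stays in the build as the golden model, and a testbench helper checks that both variants agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference implementation: kept unchanged as the golden model. */
uint64_t sum_of_squares_ref(const uint8_t *px, size_t n)
{
    assert(px != NULL || n == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (uint64_t)px[i] * px[i];
    return acc;
}

/* Optimized variant: four independent accumulators (manual unrolling). */
uint64_t sum_of_squares_opt(const uint8_t *px, size_t n)
{
    assert(px != NULL || n == 0);
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += (uint64_t)px[i]     * px[i];
        a1 += (uint64_t)px[i + 1] * px[i + 1];
        a2 += (uint64_t)px[i + 2] * px[i + 2];
        a3 += (uint64_t)px[i + 3] * px[i + 3];
    }
    for (; i < n; ++i)                 /* remaining 0..3 elements */
        a0 += (uint64_t)px[i] * px[i];
    return a0 + a1 + a2 + a3;
}

/* Compile-time switch: build with -DUSE_OPTIMIZED=0 to fall back. */
#ifndef USE_OPTIMIZED
#define USE_OPTIMIZED 1
#endif

uint64_t sum_of_squares(const uint8_t *px, size_t n)
{
#if USE_OPTIMIZED
    return sum_of_squares_opt(px, n);
#else
    return sum_of_squares_ref(px, n);
#endif
}

/* Testbench helper: returns 1 if both variants agree on the given input. */
int variants_agree(const uint8_t *px, size_t n)
{
    return sum_of_squares_ref(px, n) == sum_of_squares_opt(px, n);
}
```

Running `variants_agree` over the selected test cases after every optimization step catches regressions early, and the switch makes it easy to bisect a wrong result back to a specific optimization.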

Optimization – How To

Start Profiling on the PC

• Get the test setup running (with good test cases)

• Profile on the PC with Valgrind and Linux perf

• Both tools can report execution cost per source line

• This is also a good starting point to understand the code

-> Find bottlenecks independent of the target architecture

• At the moment there are more profiling possibilities on the PC

• Instrument the code with timestamps to get a time profile for every algorithm

-> Good benchmark to see the progress in the project
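The timestamp instrumentation could look like the following minimal sketch, assuming a POSIX system (Linux on both PC and ARM) where `clock_gettime` with `CLOCK_MONOTONIC` is available. The stage names and the `time_profile` structure are hypothetical placeholders for the real algorithm steps.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical pipeline stages, used only to have something to time. */
enum { STAGE_PREPROCESS, STAGE_DETECT, STAGE_TRACK, STAGE_COUNT };
static const char *const stage_name[STAGE_COUNT] = {
    "preprocess", "detect", "track"
};

typedef struct {
    uint64_t total_ns[STAGE_COUNT];   /* accumulated runtime per stage */
    uint32_t calls[STAGE_COUNT];      /* number of measurements per stage */
} time_profile;

/* Accumulate one measurement for a stage. */
void profile_add(time_profile *p, int stage, uint64_t t_start, uint64_t t_end)
{
    assert(stage >= 0 && stage < STAGE_COUNT);
    p->total_ns[stage] += t_end - t_start;
    p->calls[stage] += 1;
}

/* Print the average runtime per call for every stage. */
void profile_print(const time_profile *p)
{
    for (int s = 0; s < STAGE_COUNT; ++s) {
        double ms = p->calls[s] ? (double)p->total_ns[s] / 1e6 / p->calls[s] : 0.0;
        printf("%-12s %8.3f ms/call (%u calls)\n",
               stage_name[s], ms, (unsigned)p->calls[s]);
    }
}
```

A stage is measured by taking `now_ns()` before and after the call and passing both values to `profile_add`. Printing the same table after each optimization round gives the progress benchmark mentioned above.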

Optimization – How To

Techniques for Optimization on ARM

• More aggressive selection of the regions of interest. During development, code is written for maximum flexibility. Check whether this flexibility/coverage is still needed

• Check for repeated computation of the same results

-> Balance computation and memory consumption / bandwidth

• In nested loops: fragmented code is difficult to optimize, because it is not easy to avoid computing the same values several times

-> Combine / inline function calls within nested loops into one loop
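The nested-loop advice can be illustrated with a small sketch. The image-scaling task and all names (`row_gain`, `scale_pixel`, `scale_image_naive`, `scale_image_fused`) are hypothetical. In the fragmented version a helper recomputes a row-dependent value for every pixel; in the combined version that value is computed once per row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-row gain; stands in for any row-dependent term. */
static float row_gain(size_t y, size_t height)
{
    assert(height > 0);
    return 1.0f + (float)y / (float)height;
}

/* Fragmented version: the helper recomputes the row gain for every pixel. */
static float scale_pixel(float v, size_t y, size_t height)
{
    return v * row_gain(y, height);
}

void scale_image_naive(float *img, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            img[y * width + x] = scale_pixel(img[y * width + x], y, height);
}

/* Combined version: the helper is folded into the loop nest, so the
   row-dependent work is done once per row instead of once per pixel. */
void scale_image_fused(float *img, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        const float gain = row_gain(y, height);  /* once per row */
        float *row = img + y * width;            /* row base, once per row */
        for (size_t x = 0; x < width; ++x)
            row[x] *= gain;
    }
}
```

A compiler may perform this hoisting itself for a helper this small. The manual rewrite matters most when the helper lives in another translation unit or is too complex to inline.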

Optimization – How To

Techniques for Optimization - Algebra

• Replace trigonometric functions with table lookups

• Algebra simplification: simple transformations or a change of computation order, e.g. instead of using a rotation matrix, transform the data into a polar coordinate system

-> instead of computing a rotation matrix, you just do an addition

• Length comparisons: don't compute the square roots, compare the squares directly

• Multiplications are much faster than divisions
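The four techniques above can be sketched in C as follows. All function and type names are hypothetical. The sine table is deliberately coarse (15 degree steps) to keep the example short; a real table would use whatever resolution the application needs.

```c
#include <assert.h>
#include <stddef.h>

/* 1. Table lookup instead of sin(): quarter wave in 15 degree steps. */
static const float quarter_sin[7] = {
    0.0f, 0.25881905f, 0.5f, 0.70710678f, 0.8660254f, 0.96592583f, 1.0f
};

/* sin(step * 15 degrees) for any integer step, via quarter-wave symmetry. */
float lut_sin(int step)
{
    int s = step % 24;              /* 24 steps of 15 degrees = full circle */
    if (s < 0)
        s += 24;
    if (s < 12)                     /* first half wave: positive */
        return s <= 6 ? quarter_sin[s] : quarter_sin[12 - s];
    s -= 12;                        /* second half wave: mirrored, negative */
    return -(s <= 6 ? quarter_sin[s] : quarter_sin[12 - s]);
}

/* 2. Rotation in polar coordinates: one addition, no rotation matrix. */
typedef struct { float r; float phi; } polar_pt;

polar_pt rotate_polar(polar_pt p, float dphi)
{
    p.phi += dphi;
    return p;
}

/* 3. Length comparison without square roots. Valid because the square
      root is monotonic for non-negative arguments. */
int longer_than(float ax, float ay, float bx, float by)
{
    return ax * ax + ay * ay > bx * bx + by * by;
}

/* 4. One division up front, then only multiplications in the loop.
      The result can differ from a true division in the last bit. */
void divide_all(float *v, size_t n, float divisor)
{
    assert(divisor != 0.0f);
    const float inv = 1.0f / divisor;
    for (size_t i = 0; i < n; ++i)
        v[i] *= inv;
}
```

The polar representation only pays off if the data can stay in polar form for several operations; converting back and forth for every rotation would need the trigonometric functions again.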

Optimization – How To

Techniques for Optimization

• Complete rework of smaller parts of the algorithms (clean-up)

• Data structures (the cache suffers if data is fragmented in memory)

• Replace generic code with specific code once the functionality is frozen

• Don’t forget the compiler flags…

• Identify code that can be accelerated on FPGA fabric or hard IP accelerators
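The data-structure point can be made concrete with a small sketch; the names `node`, `sum_list`, and `sum_array` are hypothetical. Both functions compute the same sum, but the linked list may scatter its nodes across the heap, while the array is traversed linearly through consecutive cache lines.

```c
#include <assert.h>
#include <stddef.h>

/* Fragmented layout: every node is a separate allocation, so each step
   follows a pointer to a possibly distant address. */
typedef struct node {
    float value;
    struct node *next;
} node;

float sum_list(const node *head)
{
    float acc = 0.0f;
    for (const node *n = head; n != NULL; n = n->next)
        acc += n->value;
    return acc;
}

/* Contiguous layout: values sit next to each other in memory, so the
   hardware prefetcher and full cache lines are used effectively. */
float sum_array(const float *values, size_t count)
{
    assert(values != NULL || count == 0);
    float acc = 0.0f;
    for (size_t i = 0; i < count; ++i)
        acc += values[i];
    return acc;
}
```

The contiguous form is also the one that compilers can auto-vectorize on ARM, and it is easier to hand over to an accelerator as a single buffer.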

Example Timing of an Algorithm Before Optimization

[Chart; axis: Runtime [ms], scale 0 to 120]

Example Timing of Same Algorithm After Optimization

4x faster

[Chart; axis: Runtime [ms], scale 0 to 120]

Supercomputing Systems AG Phone +41 43 456 16 00

Technopark 1 Fax +41 43 456 16 10

8005 Zürich www.scs.ch

Vision meets reality.

Supercomputing Systems AG

felix.eberli@scs.ch +41 43 456 16 19