Dramatically Accelerate 96Board Software via an FPGA with Integrated Processors

Glenn Steiner, Sr. Manager, Xilinx, Inc.

February 2018 / March 2018

Page 2

Abstract

© Copyright 2018 Xilinx

16:00 - 16:55, 21 March 2018

Key Takeaways:

With the drive to increase integration, reduce system costs, accelerate performance, and enhance reliability, software developers are discovering that the processor they would like to target is simply not fast enough. This session will help you, the system architect or software developer, understand how to architect and develop software on an FPGA-integrated processor and accelerate software code via FPGA accelerators.

Abstract:

As a software developer, you may have realized that, in order to meet system-level performance requirements, your next software project will target a processor inside an FPGA. How will this impact your development process, and what benefits might you gain from this tight integration of processor and FPGA? Starting from the basics of what FPGAs are (in terms of software programming), this session provides a simple-to-understand primer on what modern FPGAs with embedded processors can do. We will wrap up with examples of how high-level synthesis tools can move software to programmable logic hardware, enabling dramatic software acceleration.

Page 3

Agenda

What is a Field Programmable Gate Array (FPGA)?

Why Use a Heterogeneous All Programmable MPSoC?

How Do You Program Heterogeneous All Programmable Devices?

Introducing the Xilinx Ultra96 Development Platform

Software Acceleration Design Example

– German Traffic Sign Recognition

Summary

Page 4

Complexity Continues to Increase: Designers Need Better Ways to Architect, Design & Implement

Packet Processing & Transport

Video and Vision

Cloud and Data Center

Industrial IoT

Wireless

> 400G OTN Video Over IP

Software Defined Networks Network Function Virtualization

LTE Advanced Cloud-RAN

Early 5G Heterogeneous Wireless Networks

8K/4K Resolution Immersive Display

Augmented Reality Video Analytics

Acceleration Big Data

Software Defined Data Center Public and Private Cloud

Machine to Machine Sensory Fusion

Industry 4.0 Cyber-Physical

Embedded Vision

1. Performance & Power Scalability

2. System Integration & Intelligence

3. Security, Safety & Reliability

Page 5

What is a Field Programmable Gate Array (FPGA)?

Page 6

What is an FPGA and How Does It Work?

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing – hence “field-programmable” (Wikipedia)

In their simplest form FPGAs contain:

– Configurable Logic Blocks

• AND, OR, Invert & many other logic functions

– Configurable interconnect

• Enabling Logic Blocks to be connected together

– I/O Interfaces

With these elements an arbitrary logic design may be created
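For a software developer, one way to picture a configurable logic block is as a small lookup table: the "configuration" is a truth table, and reprogramming the block means loading different table contents. The sketch below is an illustrative software model under that assumption, not a description of any particular device. The names `make_lut` and `full_adder` are hypothetical, and the function calls between blocks stand in for the configurable interconnect.

```python
from itertools import product


def make_lut(num_inputs, func):
    """Build a lookup table (truth table) for an arbitrary boolean function."""
    table = {
        bits: int(bool(func(*bits)))
        for bits in product((0, 1), repeat=num_inputs)
    }
    # The returned "logic block" only looks up a stored answer.
    return lambda *bits: table[bits]


# "Configure" three logic blocks with different functions.
and2 = make_lut(2, lambda a, b: a and b)
or2 = make_lut(2, lambda a, b: a or b)
xor2 = make_lut(2, lambda a, b: a != b)


def full_adder(a, b, carry_in):
    """Connect the blocks together to form a 1-bit full adder."""
    partial = xor2(a, b)
    total = xor2(partial, carry_in)
    carry_out = or2(and2(a, b), and2(partial, carry_in))
    return total, carry_out
```

Calling `full_adder(1, 1, 0)` looks up each intermediate value in a table rather than computing it, which mirrors how the same fabric can implement different logic once it is loaded with a different configuration.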

Page 7

FPGAs - Circa 1990

Glue Logic

Simple State Machines

Prototyping

Logic that could be connected quickly - Like LEGO blocks

Page 8

What Today’s FPGAs Can Do

Prototyping or Glue Logic

System Integration (SoC)

Connectivity – Bridging, Switching

Digital Signal Processing (DSP)

Contain Complete Embedded Processing Systems

Accelerate Software

Chips that can do any Intelligent Function - Like LEGO Mindstorms Blocks

Page 9

Why Use a Heterogeneous All Programmable MPSoC?

Page 10

All Programmable Heterogeneous Processing Diversity – The Right Processing Element for the Right Job

Application Processors – High performance general computing

Graphics Processing Unit – Graphical User Interface

Real-time Processors – Low latency deterministic operation

– Lock step for reliable operation

Platform Management Unit Processor

– Triple redundant for system management & error handling

Configuration & Security Unit Processor

– Triple redundant for reliable configuration & security operation

Programmable Logic – Software acceleration, additional or custom peripherals

Page 11

The Heterogeneous All Programmable Advantage

Technical

– Performance

– Power

– Flexibility

Business

– Time to Market

– Lower Cost

– Flexibility

Highest Level of System Integration and Intelligence

The Right Engines for the Right Task

Page 12

Why Heterogeneous Processors?

[Figure: General Purpose Processor (Linux) compared with Real-Time Processing (RTOS). Labels: non-deterministic versus deterministic; On-Chip Memory; Tightly-Coupled Memory; Critical Tasks; Compute-Intensive Non-Critical Tasks; Motor Control (Real-Time Response) interrupt; Network Interface interrupt]

Page 13

Today’s “FPGAs” Heterogeneous All Programmable Devices

Application Processors – 64-bit ARMv8 Cortex-A53, up to 1.5GHz

Real-Time Processors – 32-bit dual-core Cortex-R5, up to 600MHz

Graphics Processor – ARM Mali-400MP2, 2D/3D visualization

Memory Subsystem – DDR3/4, LPDDR3/4

High Speed Peripherals – PCIe Gen2, USB 3.0, DisplayPort

Platform & Power Management – Granular power control, functional safety

Configuration & Security Unit – Anti-tamper & trust, industry standards

Video Codec – 8K4K (15fps), 4K2K (60fps)

FPGA Acceleration – 16nm UltraScale+ fabric, customizable engines

[Block diagram: Processing System (ARM® Cortex™-A53 application cores, Dual-Core ARM Cortex™-R5, ARM Mali™-400MP2, Memory Subsystem, Platform Management Unit, Config and Security, System Functions, DisplayPort, USB 3.0, SATA, PCIe® Gen2, GigE, CAN, SPI, SD/eMMC, NAND) and Programmable Logic (PCIe® Gen4, UltraRAM, Video Codec, 16G & 33G Transceivers, 100G Ethernet*, 150G Interlaken*)]

* ZU11EG, ZU15EG, ZU19EG Only

CG Devices – Dual-Core A53s

EG Devices – Quad-Core A53s, GPU

EV Devices – Quad-Core A53s, GPU and Video Codec

Page 14

Flexibility – All Programmable Devices Accelerate Development & Enable Change Management

Rapidly Implement & Test designs

Accommodate Late Breaking Design Changes

In System Field Upgradability

Build Next Generation with New Capabilities

– Use same Board & Chipset w/ reprogrammed devices

Dynamic Partial Reconfiguration

– Re-program FPGA regions during operation

HW – SW Co-debug

– Implement Logic Analyzers inside FPGAs for better visibility into hardware-software problems

Page 15

How Do You Program Heterogeneous All Programmable Devices?

Page 16

Software Development Flow

Create Software Platform and Project

Download and Test Software Design

Debug Software Application

Profile Software Application for Performance

[Flow diagram: Create Software Platform; Create Software Projects; Library Generation; Compilation; Debug; Download to FPGA; Embedded Targeted Reference Design]

Page 17

Software Development Kit

Eclipse based IDE

Windows and Linux hosted

GUI and command line interface

Everything you need for

– Board bring-up

– Firmware development

– Bare-metal application development

– Linux application development

– Performance optimization

– System deployment

Page 18

Run Time Software

Begin Application Development with Validated OSs

Determine the Right Processor for your Application

[Figure: run-time software stack on the APU, RPU, and MicroBlaze™ Processor: High-Level OSs, RTOSs, Stand-alone Drivers & Libraries, and Hypervisors hosting multiple OSs, each with its own kernel and apps]

OS Ecosystem

Open Source / No Fee

Commercial

Safety Certifiable

Cross-Processor OS Support

Hypervisor Support

Linux: Mentor, Wind River

Xen Hypervisor

Android

Bare metal

eSol eT-kernel

FreeRTOS

GreenHills – INTEGRITY

LynxOS7, LynxSecure

Mentor Hypervisor, Nucleus

Micrium - uC/OS-II & III

QNX

Sciopta

Sysgo – PikeOS

Wind River VxWorks7/Rocket

Page 19

Introducing the Xilinx Ultra96 Development Platform

Page 20

Ultra96 Features

Linaro 96Boards Consumer Edition

– 85mm x 54mm form factor

Zynq UltraScale+ MPSoC ZU3EG

2GB LPDDR4 Memory

MicroSD Socket

WiFi

Ports

– Mini DisplayPort

– 2x USB 3.0 Type A downstream ports (Host)

– 1x USB 3.0 Micro-B upstream port (Device)

– 40-pin Low-speed expansion header

– 60-pin High speed expansion header

Page 21

Mezzanine Cards

Compatible with 96Boards Mezzanine Boards

Grove 96Boards Sensor Mezzanine (board or bundle)

– 9 Grove connectors for 96Boards IO

• 5x GPIO, and 4x I2C (mixed 3.3V and 5V; all 5V tolerant)

– 9 Grove connectors for ATMEGA328 IO

• 5x GPIO, 3x ADC, and 1x I2C (all 5V)

– 2 6-pin SPI headers

– MicroUSB interface to 96Boards console serial port

– 1 Power LED & 3 user LEDs

– Power & Reset buttons

Grove Starter Kit (Includes Mezzanine Board)

– LCD, Relay, Buzzer, LEDs, Sound Sensor, Touch Sensor, Light Sensor, …

Page 22

Ultra96 SD Card Image

Linux Kernel v4.9

– Supports Ultra96 peripherals

– MALI 400 GPU support

– Enlightenment Desktop

Easy-to-use web interface

– Access via WiFi or monitor, mouse, keyboard

Run Example Projects for Grove devices

Edit, compile and run demo applications

Add custom content

Access Tutorials

Page 23

Accelerating Software via Programmable Logic

Page 24

Now I Can Turn Software into Hardware

Profile Your Code

Identify Time Consuming Functions

Automatically Create & Attach Hardware Accelerators

Model & Optimize for Performance

Download & Run
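The first two steps, profiling and identifying the time-consuming functions, can be tried on a workstation before any hardware is involved. The sketch below is an illustration using Python's standard `cProfile` and `pstats` modules, not the vendor profiling tools. The workload (`load_frame`, `blur_row`, `process_frame`) and the helper `hottest_function` are hypothetical names made up for the example.

```python
import cProfile
import io
import pstats


def load_frame(width, height):
    """Cheap setup work: build a synthetic frame."""
    return [[(x * y) % 256 for x in range(width)] for y in range(height)]


def blur_row(row):
    """Deliberately heavy inner loop: 3-tap box filter over one row."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = (row[i - 1] + row[i] + row[i + 1]) // 3
    return out


def process_frame(width=64, height=64, passes=20):
    """Hypothetical application: repeatedly filter every row of a frame."""
    frame = load_frame(width, height)
    for _ in range(passes):
        frame = [blur_row(row) for row in frame]
    return frame


def hottest_function(func):
    """Profile func() and return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler, stream=io.StringIO())
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name
```

Here `hottest_function(process_frame)` points at `blur_row`, the regular, data-parallel inner loop. That kind of function is the natural candidate to move into programmable logic, while setup code such as `load_frame` stays on the processor.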

Page 25

Acceleration Examples

MRI Back Projection Algorithm 8x

16k FFT 10x

Optical Flow 25x

Stereo Local Block Matching 25x

2D Video Optical Filter 30x

Binary Neural Network 8,600x

Page 26

Software Acceleration Design Example

German Traffic Sign Recognition

Page 27

Neural Network Example

German Road Sign Database

– 50,000+ 32x32 bit images for training

– 44 classes (43 road signs, 1 background)

– Training via Amazon Web Services

• AWS: p2.xlarge Instance – 8 hours, $7.78 (6,5 €)

Binary Neural Network Characteristics

– 6 convolutional layers

– 2 max pool layers

– 3 fully connected layers
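The slides do not describe how the binary neural network is implemented. A common formulation, assumed here purely for illustration, restricts weights and activations to +1/-1, packs them one bit per value, and replaces each multiply-accumulate with XNOR followed by a population count. The function names below are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n +1/-1 vectors using XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


def binary_neuron(activations, weights, threshold=0):
    """Binarized neuron: compare the dot product with a threshold."""
    n = len(activations)
    total = binary_dot(pack_bits(activations), pack_bits(weights), n)
    return 1 if total >= threshold else -1
```

Bitwise operations of this kind need no multipliers and can be evaluated for many bits in parallel, which is one plausible reason a binarized network suits programmable logic so well.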

Page 28

Neural Network Performance Results

Up to 8,600 times faster when accelerated with programmable logic

Performance Metric      Software Only               Programmable Logic Accelerated
Tiles per second        2.2                         19,000
Scene rate (fps)        0.011 (92 sec per frame)    94
Overall Acceleration    -                           8,600X
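As a quick sanity check, the overall figure can be reproduced from the rows of the table. The snippet below only redoes that arithmetic with the published (rounded) numbers; the variable names are made up for the example.

```python
def speedup(accelerated, baseline):
    """Ratio of accelerated throughput to software-only throughput."""
    return accelerated / baseline


# Tiles per second: programmable logic versus software only.
tiles_speedup = speedup(19_000, 2.2)

# Scene rate in frames per second, using the rounded software figure.
scene_speedup = speedup(94, 0.011)

# Same comparison using the unrounded "92 sec per frame" software figure.
scene_speedup_from_seconds = 94 * 92
```

The three values come out at roughly 8,636, 8,545, and 8,648, all consistent with the quoted "8,600X" once rounding in the published numbers is taken into account.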

Page 29

Page 30

Summary

Page 31

Go Faster with Development Platforms

Your “Stuff”

Reference Designs

OS / Hypervisor

Software Development

Development Kits

Silicon

Leverage Proven Content

Rapid Prototyping

Reduced Time to Market

Reduced System Level Costs

Page 32

The Future is Ultra96 Contest

Submit your most creative, most out-of-the-box AI or ML application at the Xilinx or Avnet table during Demo Friday (12:00 – 16:00)

The best 25 get a FREE Ultra96 board plus software to help them realize their vision

Submit a working design within 60 days and get a T-shirt and SWAG. It’s that simple.

Winner announced through Xilinx social media channels. If it’s you, you’re invited to present your design to your peers in industry at Xilinx Developer Forum 2018

Page 33

Summary: No Need to Fear Your Processor Being Inside an FPGA

Industry Standard Development Environments

– Develop, Compile, Link & Download

– Easy On-Chip Debugging including HW-SW Co-Debug

Extensive IP Libraries, Drivers & Popular OSs Available

Automated Tools for Creating SW Accelerators in HW

Pre-Built Reference Designs Enable Quick-Starts

[Block diagram: Processing System and Programmable Logic, with the same block labels as the device diagram on Page 13]

Page 34

Questions?

Page 35

Zynq UltraScale+ MPSoC

[Block diagram, summarized by power domain]

Full Power Domain

– Application Processing Unit: ARM® Cortex™-A53 cores 1 to 4, with NEON™, Floating Point Unit, 32 KB I-Cache w/Parity, 32 KB D-Cache w/ECC, Memory Management Unit, Embedded Trace Macrocell

– Graphics Processing Unit: ARM Mali™-400 MP2, with Geometry Processor, Pixel Processors 1 and 2, Memory Management Unit, 64 KB L2 Cache

– External Memory Controller: DDRC (DDR4/3/3L, LPDDR3/4), ECC Support

– High-Speed Connectivity: DisplayPort, USB 3.0, SATA 3.1, PCIe 1.0 / 2.0

– System Functions: FPD DMA, PLLs (DPLL, VPLL, APLL), Debug, Timers (SWDT)

Low Power Domain

– Real-Time Processing Unit: ARM Cortex™-R5 cores 1 and 2, with Vector Floating Point Unit, 128 KB TCM w/ECC, 32 KB I-Cache w/ECC, 32 KB D-Cache w/ECC, Memory Protection Unit

– PMU and CSU: Processor System, ROM, 128 KB RAM, SHA3, AES-GCM, RSA

– System Functions: System Monitor, Timers (2x WDT, 4x TTC), LPD DMA, Global Registers, PLLs (RPLL, IOPLL)

– 256 KB OCM with ECC

– LP Connectivity: 4 x GEM, 2 x CAN 2.0, 2 x UART, 2 x SPI, 2 x I2C, 2 x Quad SPI, NAND, 2 x SD/eMMC, 2 x USB 2.0

Battery Power Domain

– BBRAM, RTC, OSC

– VCC_PSAUX, VCC_PSBATT

Programmable Logic

– Storage & Signal Processing

– GP I/O

– High-Speed Connectivity

– Video Codec (H.265, H.264)

Page 36

96Boards Sensors Kit

96Boards Sensors Mezzanine Card

– Integrated ATMEGA328p microcontroller (Arduino compatible)

– 18 Grove connectors

– Arduino UNO compatible shield connectors

– UART interface between baseboard and microcontroller

– USB-UART connector for baseboard console and control

Grove Sensors

– LCD RGB Backlight

– Smart Relay

– Buzzer

– Sound Sensor

– Touch Sensor

– Rotary Angle Sensor

– Temperature & Humidity Sensor

– LED

– Light Sensor

– Button

– LEDs Blue, Green, Red

– Mini Servo

– Grove Cables