Xelera Suite – Accelerated Deep Learning Inference on FPGAs

The Xelera Suite software accelerates Deep Learning model execution to enable inference with low and low-variance latency. It achieves this by offloading inference to highly efficient hardware accelerators such as FPGAs. The software is available wherever FPGAs are available: in the public cloud, in on-premises data centers, or on edge computing servers. Xelera Suite ensures that the accelerator connects seamlessly to the host CPU, network, or sensor I/O. The software can be used without application code changes because it integrates into common software frameworks. Xelera provides the Deep Learning acceleration software, plus optional customization and integration services.

Deep Learning Accelerator on FPGAs – Tool Flow
  • Model parameter extraction
  • Model quantization
  • Tailored FPGA architecture synthesis

Model Quantization
  Convert a Keras/TensorFlow model into a reduced-bitwidth fixed-point representation. A relative error metric is evaluated across tensor (activation) precisions and weight precisions, in bits, to select the optimal quantization.

Scalable FPGA Architecture
  The scalable accelerator combines control & tiling logic, parallel AI engines, and pooling & activation units.

Reference Use Case – Real-Time Speaker Recognition – 6 ms Inference
  Mic → audio signal → audio signal representation & preprocessing → neural network, running on an FPGA-enabled standard server or cloud instance → detected speaker, with 6 ms inference latency.

  Platform              Parallel AI engines per FPGA
  Xilinx® Alveo™ U200   10
  Xilinx® Alveo™ U250   16
  AWS F1.2 instance     10
  AWS F1.16 instance    80
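The quantization step above — converting a floating-point model into a reduced-bitwidth fixed-point representation and selecting the cheapest weight/tensor precisions whose relative error stays within budget — can be sketched roughly as follows. This is a minimal NumPy illustration, not Xelera's actual tooling: the function names, the uniform bit-width sweep, and the L2-based relative error metric are all assumptions made for the example.

```python
import numpy as np

def quantize_fixed_point(x, bits, frac_bits):
    """Round x onto a signed fixed-point grid with `bits` total bits,
    `frac_bits` of which are fractional (values are clipped to the
    representable range)."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

def relative_error(reference, approx):
    """Relative L2 error between float and quantized tensors."""
    return np.linalg.norm(reference - approx) / np.linalg.norm(reference)

def select_precision(weights, activations, max_rel_err=0.01,
                     bit_range=range(2, 16)):
    """Sweep weight/activation bit widths and return the pair with the
    smallest total bit count whose relative error stays under budget."""
    best = None
    for wb in bit_range:
        for ab in bit_range:
            qw = quantize_fixed_point(weights, wb, wb - 1)
            qa = quantize_fixed_point(activations, ab, ab - 1)
            err = max(relative_error(weights, qw),
                      relative_error(activations, qa))
            if err <= max_rel_err and (best is None
                                       or wb + ab < best[0] + best[1]):
                best = (wb, ab, err)
    return best  # (weight bits, activation bits, achieved error) or None
```

In a real flow the error metric would be measured on the model's outputs over calibration data rather than on raw tensors, and the fixed-point format (integer vs. fractional bits) would be chosen per layer from the observed value ranges.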

Source document: Microsoft Word – Accelerated-Deep-Learning-Inference.docx, created 8/13/2019 4:16:44 PM.

