Xelera Suite – Accelerated Deep Learning Inference on FPGAs

The Xelera Suite software accelerates Deep Learning model execution to enable inference with low and low-variance latency. It achieves this by offloading inference to highly efficient hardware accelerators such as FPGAs. The software is available wherever FPGAs are available: in the public cloud, in on-premises data centers, or on edge computing servers. Xelera Suite ensures that the accelerator connects seamlessly to the host CPU, network, or sensor I/O. The software can be used without application code changes because it integrates into common software frameworks. Xelera provides the Deep Learning acceleration software, plus optional customization and integration services.

Deep Learning Accelerator on FPGAs – Tool Flow
  • Model parameter extraction
  • Model quantization
  • Tailored FPGA architecture synthesis

Model Quantization
  Convert a Keras/TensorFlow model into a reduced-bitwidth fixed-point representation. A relative error metric is evaluated across tensor (activation) precisions and weight precisions, in bits, to select the optimal quantization.

Scalable FPGA Architecture
  The scalable accelerator combines control & tiling logic, parallel AI engines, and pooling & activation units.

Reference Use Case – Real-Time Speaker Recognition – 6 ms Inference
  Mic → audio signal → audio signal representation & preprocessing → neural network, running on an FPGA-enabled standard server or cloud instance → detected speaker, with 6 ms inference latency.

  Platform              Parallel AI engines per FPGA
  Xilinx® Alveo™ U200   10
  Xilinx® Alveo™ U250   16
  AWS F1.2 instance     10
  AWS F1.16 instance    80
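The quantization step above — converting a floating-point model into a reduced-bitwidth fixed-point representation and selecting the cheapest weight/tensor precisions whose relative error stays within budget — can be sketched roughly as follows. This is a minimal NumPy illustration, not Xelera's actual tooling: the function names, the uniform bit-width sweep, and the L2-based relative error metric are all assumptions made for the example.

```python
import numpy as np

def quantize_fixed_point(x, bits, frac_bits):
    """Round x onto a signed fixed-point grid with `bits` total bits,
    `frac_bits` of which are fractional (values are clipped to the
    representable range)."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

def relative_error(reference, approx):
    """Relative L2 error between float and quantized tensors."""
    return np.linalg.norm(reference - approx) / np.linalg.norm(reference)

def select_precision(weights, activations, max_rel_err=0.01,
                     bit_range=range(2, 16)):
    """Sweep weight/activation bit widths and return the pair with the
    smallest total bit count whose relative error stays under budget."""
    best = None
    for wb in bit_range:
        for ab in bit_range:
            qw = quantize_fixed_point(weights, wb, wb - 1)
            qa = quantize_fixed_point(activations, ab, ab - 1)
            err = max(relative_error(weights, qw),
                      relative_error(activations, qa))
            if err <= max_rel_err and (best is None
                                       or wb + ab < best[0] + best[1]):
                best = (wb, ab, err)
    return best  # (weight bits, activation bits, achieved error) or None
```

In a real flow the error metric would be measured on the model's outputs over calibration data rather than on raw tensors, and the fixed-point format (integer vs. fractional bits) would be chosen per layer from the observed value ranges.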

Source document: Microsoft Word – Accelerated-Deep-Learning-Inference.docx, created 8/13/2019 4:16:44 PM.

