InferX™ X1 Edge Inference Coprocessor (AI + eFPGA®)
Product Overview · October 2020 · www.flexlogix.com

Datacenter Acceleration at Edge Power and Price

Features and Benefits
Dynamic Winograd weight transformation; INT8 convolutions performed with INT12 accumulation
• 2.25x acceleration for supported convolutions with no precision loss (see the sketch after this list)

Multilayer parallelism (deep fusion)
• Inference activations can be kept local
• Higher compute utilization

Background loading of weights during computation
• Higher hardware utilization

GPIO x32 connectors
• Multichip, scalable solutions possible

Single- and double-byte datatypes
• INT8 datapath for low latency
• BF16 datapath for high precision

Architecture: 2x2 tile of 64 64-wide 1D systolic arrays
• Convolutional kernels map efficiently
• Wide coverage of neural network operators

Reconfigurable interconnect
• 64 MAC clusters can be configured to allow unblocked dataflow
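The 2.25x figure is the classic Winograd minimal-filtering arithmetic: the F(2x2, 3x3) transform produces a 2x2 output tile of a 3x3 convolution with 16 multiplies where direct convolution needs 36, and 36 / 16 = 2.25. A minimal 1D sketch of the underlying identity, F(2, 3), is below; it is illustrative only, not Flex Logix's implementation.

```python
# Winograd F(2, 3): two outputs of a 3-tap convolution with 4 multiplies
# instead of 6. The 2D version, F(2x2, 3x3), uses 16 multiplies instead
# of 36 for a 2x2 output tile: 36 / 16 = 2.25.

def direct_f2_3(d, g):
    """Two outputs of a 3-tap convolution, 6 multiplies."""
    return (d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2])

def winograd_f2_3(d, g):
    """Same two outputs with 4 multiplies. The g-terms are the
    transformed weights (derived dynamically on the X1)."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return (m1 + m2 + m3, m2 - m3 - m4)

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
assert direct_f2_3(d, g) == winograd_f2_3(d, g)
```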
Die Layout and Specifications

Package: 21x21 mm 624-pin fcBGA
Frequency (MHz): 933 / 800 / 667 / 533
MACs (INT8 / BF16): 4096 / 2048
On-Chip SRAM: 13 MB
DRAM Interface: x32 LPDDR4x
Host Interface: PCIe Gen3/4
Est. Worst-Case TDP (933 MHz, YOLOv3): 13.5 W
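For context, a back-of-envelope peak-throughput estimate follows from these figures, assuming the usual convention of two operations (multiply and accumulate) per MAC per cycle. These are peak numbers, not sustained; real-model utilization is lower, which is why the performance table below quotes measured YOLOv3 throughput instead.

```python
# Peak INT8 throughput implied by the spec table, assuming 2 ops
# (multiply + accumulate) per MAC per cycle.
macs = 4096  # INT8 MACs
for freq_mhz in (933, 800, 667, 533):
    tops = macs * freq_mhz * 1e6 * 2 / 1e12
    print(f"{freq_mhz} MHz: {tops:.1f} INT8 TOPS peak")
# 933 MHz -> ~7.6 TOPS peak
```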
[Die layout diagram: nnMAX array, NoC, L3 SRAM, x32 LPDDR4x interface, PCIe]
InferX X1 performs inference faster than existing edge inference chips. The X1 is optimized for the real-time, megapixel input streams fundamental to edge applications, and operates in INT8 or BF16 precision at a batch size of 1 for minimum latency.
At the center of the X1 is the nnMAX™ acceleration engine. The programmable logic in the nnMAX rewires its internal MAC clusters to create an optimal dataflow path for each layer's convolutional kernels, with intermediate activations stored in local SRAM or passed directly to the next layer. Complementing the nnMAX inference acceleration engine, the X1 contains a PCIe x4 Gen3/4 interface to connect to a host, and a x32 DRAM interface for a single DDP LPDDR4x chip.
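The payoff of keeping activations local (deep fusion) is easiest to see schematically. The toy sketch below contrasts layer-at-a-time execution, which spills activations off-chip between layers, with a fused pipeline that hands them straight to the next layer; all names are hypothetical, not the InferX toolchain's API.

```python
# Toy contrast: layer-at-a-time execution (activations round-trip through
# DRAM between layers) vs. deep fusion (activations stay in on-chip SRAM).

class ToyDRAM:
    """Counts off-chip transfers so the difference is visible."""
    def __init__(self):
        self.transfers = 0
    def round_trip(self, x):
        self.transfers += 2  # one write, one read back
        return x

def run_layer_at_a_time(layers, x, dram):
    for layer in layers:
        x = dram.round_trip(layer(x))  # spill after every layer
    return x

def run_fused(layers, x):
    for layer in layers:
        x = layer(x)  # activations passed on-chip to the next layer
    return x

layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
dram = ToyDRAM()
assert run_layer_at_a_time(layers, 5, dram) == run_fused(layers, 5) == 9
print(f"off-chip transfers avoided by fusion: {dram.transfers}")
```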
The InferX Compiler takes TensorFlow Lite or ONNX models as input and directly outputs a binary execution plan for the X1. InferX X1 is available as a standalone chip, or on PCIe boards in single- and multi-chip configurations. Performance modeling and analysis via the InferX Compiler is available for evaluation now.
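As a concrete starting point, a trained model can be exported in one of the accepted input formats with the stock TensorFlow Lite converter; the SavedModel path below is a placeholder.

```python
# Producing the kind of input the InferX Compiler consumes: a standard
# TensorFlow Lite export. The model path is a placeholder.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
# The .tflite (or an ONNX) file is then fed to the InferX Compiler,
# which emits the X1 binary execution plan.
```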
X1P1 Performance Estimates (batch = 1, no post-training optimizations)

X1 Freq. (MHz)   Model    Precision   Input Size   Throughput (fps)   Latency (ms)
933              YOLOv3   INT8        1440          14.0               71.4
933              YOLOv3   INT8         608          72.7               13.8
933              YOLOv3   INT8         416         109.8                9.1
800              YOLOv3   INT8        1440          12.0               83.3
800              YOLOv3   INT8         608          62.3               16.0
800              YOLOv3   INT8         416          94.1               10.6
667              YOLOv3   INT8        1440          10.0               99.9
667              YOLOv3   INT8         608          52.0               19.2
667              YOLOv3   INT8         416          78.5               12.7
533              YOLOv3   INT8        1440           8.0              125.0
533              YOLOv3   INT8         608          41.5               24.1
533              YOLOv3   INT8         416          62.7               15.9
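Because the X1 runs at batch size 1, a frame's latency is essentially the reciprocal of its throughput, which the table bears out. A quick consistency check against the 933 MHz and 800 MHz rows:

```python
# At batch = 1, latency (ms) ~ 1000 / throughput (fps).
rows = [(14.0, 71.4), (72.7, 13.8), (109.8, 9.1),   # 933 MHz
        (12.0, 83.3), (62.3, 16.0), (94.1, 10.6)]   # 800 MHz
for fps, latency_ms in rows:
    assert abs(latency_ms - 1000.0 / fps) < 0.1
```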