fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs

Stylianos I. Venieris, Christos-Savvas Bouganis

stylianos.venieris10@imperial.ac.uk

FCCM 2016, Washington DC, 2 May 2016

Deep Learning and AI

Deep Learning Success Stories - ConvNets

Image Recognition (Microsoft, 2015)

Image Captioning (Microsoft, 2015)

“Deep Face” (Facebook, 2014)

Deep Learning on FPGAs

• Hand-tuned implementations

• Memory I/O Optimisation

• Design Space Exploration

What is missing?

• Caffe

• TensorFlow

• Theano

• Torch

• FPGA

– ConvNet Functionality

– Optimised for High Performance

Deep Learning Developers fpgaConvNet

Our approach - fpgaConvNet

Automated Design Space Exploration

ConvNet Hardware Mapping

fpgaConvNet

FPGA Target Platform Specifications

Supplied by Deep Learning Expert

ConvNet Description

Bitstream

Convolutional Neural Networks (ConvNets)

convolutional + nonlinearity → pooling → convolutional + nonlinearity → pooling

• Synchronous Data Flow

– ConvNet as a data-driven graph

– Represented as a matrix

– Each layer mapped to a tunable set of hardware building blocks
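The matrix representation mentioned above can be illustrated with the standard SDF topology matrix, where each row is a stream between two nodes and each column is a node. The sketch below is a minimal illustration under assumptions, not the fpgaConvNet implementation: the three-node chain, its token rates and the function name `repetition_vector` are all hypothetical. It solves the SDF balance equations to find how often each node must fire per schedule period.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

# Hypothetical 3-node chain: conv -> nonlin -> pool.
# Rows are streams (edges), columns are nodes. A positive entry is the number
# of tokens the node produces per firing, a negative entry the number it consumes.
TOPOLOGY = [
    [1, -1,  0],   # conv produces 1 value, nonlin consumes 1
    [0,  1, -4],   # nonlin produces 1 value, pool consumes a 2x2 window
]


def repetition_vector(topology):
    """Return the smallest positive integer q with topology @ q == 0."""
    n_nodes = len(topology[0])
    q = [None] * n_nodes
    q[0] = Fraction(1)  # fix one node, then propagate rates along the edges
    changed = True
    while changed:
        changed = False
        for row in topology:
            src = next(i for i, r in enumerate(row) if r > 0)
            dst = next(i for i, r in enumerate(row) if r < 0)
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * row[src] / -row[dst]
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * -row[dst] / row[src]
                changed = True
    if any(v is None for v in q):
        raise ValueError("graph is not connected")
    # Scale the rational solution to integers.
    scale = lcm(*(v.denominator for v in q))
    q_int = [int(v * scale) for v in q]
    # Verify the balance equation on every edge.
    for row in topology:
        if sum(r * f for r, f in zip(row, q_int)) != 0:
            raise ValueError("inconsistent rates: no periodic schedule exists")
    return q_int


print(repetition_vector(TOPOLOGY))  # [4, 4, 1]
```

With these assumed rates, the convolution and nonlinearity nodes each fire four times for every firing of the pooling node, which is what makes a static schedule possible.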

fpgaConvNet – ConvNet Modelling Framework

Streaming

Analytical power

fpgaConvNet – Modelling ConvNets with SDF

ConvNet Hardware SDF Graph

Convolutional Layer with 4 filters → Nonlin Layer → Pooling Layer → Convolutional Layer with 4 filters

Memory Interface

fpgaConvNet – Design Space Perspective

ConvNet Hardware SDF Graph

Bottlenecks:

– Limited compute resources

– Limited off-chip memory bandwidth

– Limited on-chip memory for model parameters

[Figure: design space plot with labels Current Design Point, FPGA 1 and FPGA 2]

Define a set of actions to move around the design space

Action 1: Coarse-grained Folding

1) Exceeding the available compute resources

4 Convolutions / cycle

2) Not enough off-chip memory bandwidth

Action 1: Coarse-grained Folding

2 Convolutions / cycle
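The two slides above show the same layer at 4 and then 2 convolutions per cycle. A minimal sketch of that trade-off follows, assuming that coarse-grained folding time-multiplexes a layer's filters over fewer parallel convolution units. The function name `coarse_fold` and the unit/pass model are illustrative assumptions, not the tool's actual interface.

```python
from math import ceil


def coarse_fold(num_filters, fold):
    """Return (parallel_units, passes) for a coarse-grained folding factor.

    fold = 1 instantiates one unit per filter; a larger fold shares each
    unit between several filters, so more passes over the input are needed.
    """
    if not 1 <= fold <= num_filters:
        raise ValueError("fold must be between 1 and num_filters")
    parallel_units = ceil(num_filters / fold)
    passes = ceil(num_filters / parallel_units)
    return parallel_units, passes


# A layer with 4 filters, as in the slides.
print(coarse_fold(4, 1))  # (4, 1): 4 convolutions per cycle
print(coarse_fold(4, 2))  # (2, 2): 2 convolutions per cycle, twice the passes
```

Under this model, halving the number of units halves the compute resources and the number of results produced per cycle, at the cost of doubling the time spent on the layer.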

Action 2: Fine-grained Folding

Compute Resources

Required Bandwidth
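Fine-grained folding can be pictured as sharing multipliers inside a single convolution. The sketch below assumes that a K×K window needs K² multiply-accumulates and that folding spreads them over several cycles; `folded_dot` is a hypothetical helper written for illustration only.

```python
from math import ceil


def folded_dot(window, weights, fold):
    """Dot product of a flattened window with its weights.

    Uses ceil(len(window) / fold) multipliers and processes one chunk of
    products per cycle. Returns (result, multipliers, cycles).
    """
    if len(window) != len(weights):
        raise ValueError("window and weights must have the same length")
    if fold < 1:
        raise ValueError("fold must be at least 1")
    multipliers = ceil(len(window) / fold)
    acc = 0
    cycles = 0
    for start in range(0, len(window), multipliers):
        stop = start + multipliers
        # One cycle: at most `multipliers` products, accumulated into acc.
        acc += sum(x * w for x, w in zip(window[start:stop], weights[start:stop]))
        cycles += 1
    return acc, multipliers, cycles


window = list(range(9))   # flattened 3x3 window (K = 3)
weights = [1] * 9
print(folded_dot(window, weights, 1))  # (36, 9, 1): fully unrolled
print(folded_dot(window, weights, 3))  # (36, 3, 3): one third of the multipliers
```

The result is identical for every folding factor. Only the number of multipliers and the cycles per output change, which is the resource/throughput trade-off this action exposes.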

Input Data

ConvLayer 1

NonlinLayer 1

PoolLayer 1

ConvLayer 2

NonlinLayer 2

PoolLayer 2

Subgraph 1 Subgraph 2 Subgraph 3

Bitstream 1 Bitstream 2 Bitstream 3

Bitstream

Action 3: Partitioning through Reconfiguration

Exceeding the available compute resources

Not enough on-chip memory

FPGA Reconfiguration

Input Data

Intermediate Results

Final Results

Off-chip Memory
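Partitioning can be sketched as grouping consecutive layers into subgraphs that each fit on the device, with one bitstream per subgraph and intermediate results passed through off-chip memory between reconfigurations. The greedy grouping below is an illustrative assumption: the slides do not say how subgraphs are chosen, and the per-layer costs and the budget are made-up numbers.

```python
def partition(layers, budget):
    """Greedily split a layer sequence into subgraphs that fit the budget.

    layers: list of (name, resource_cost) in execution order.
    Returns a list of subgraphs, each a list of layer names.
    """
    subgraphs, current, used = [], [], 0
    for name, cost in layers:
        if cost > budget:
            raise ValueError(f"{name} alone exceeds the resource budget")
        if used + cost > budget:
            # Close the current subgraph; the next one needs a reconfiguration.
            subgraphs.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        subgraphs.append(current)
    return subgraphs


# Layer names follow the slide; the costs are hypothetical.
LAYERS = [
    ("ConvLayer 1", 50), ("NonlinLayer 1", 5), ("PoolLayer 1", 10),
    ("ConvLayer 2", 80), ("NonlinLayer 2", 5), ("PoolLayer 2", 20),
]

for i, subgraph in enumerate(partition(LAYERS, budget=100), start=1):
    print(f"Bitstream {i}: {subgraph}")
```

With these assumed costs the six layers fall into three subgraphs, so processing an input takes three configurations of the FPGA.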

fpgaConvNet – SDF Analytical Power

Window Size = K

Pool Size = P

Design 1

Design 2

• Synchronous Data Flow

– Actions as algebraic operations

– Any local action propagates through the network

– Static scheduling

– Analytical Performance Model

– Cast DSE to formal resource-constrained optimisation
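To make the last bullet concrete, the sketch below treats design space exploration as a small resource-constrained optimisation: choose the number of parallel units per layer so that the slowest pipeline stage is as fast as possible while the total cost stays within a budget. The exhaustive search, the cost model and all numbers are assumptions for illustration, standing in for whatever optimiser and performance model the framework actually uses.

```python
from itertools import product


def explore(layers, budget, max_units=8):
    """Exhaustively search unit counts per layer.

    layers: list of (workload_ops, cost_per_unit).
    Returns (units_per_layer, bottleneck_cycles) for the best feasible
    design, or None if no design fits the budget.
    """
    best = None
    for units in product(range(1, max_units + 1), repeat=len(layers)):
        cost = sum(u * c for u, (_, c) in zip(units, layers))
        if cost > budget:
            continue  # violates the resource constraint
        # In a streaming pipeline the slowest stage bounds the throughput.
        bottleneck = max(w / u for u, (w, _) in zip(units, layers))
        if best is None or bottleneck < best[1]:
            best = (units, bottleneck)
    return best


# Two hypothetical layers: 800 and 400 operations, 10 resource units each.
print(explore([(800, 10), (400, 10)], budget=60))  # ((4, 2), 200.0)
```

The search balances the two stages at 200 cycles each. An analytical model is what makes this practical, since every candidate can be scored without synthesising hardware.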

Hardware Stages Interconnections

Evaluation - Experimental Setup

• fpgaConvNet

– Xilinx Zynq-7000 XC7Z020 SoC with 220 DSPs at 100 MHz

– Q8.8 fixed-point precision to match existing work (also supports floating-point)

– Current toolflow supports the Vivado HLS toolchain

Performance Model Accuracy

[Figure: bar chart of Measured Performance (GOps/s) and Predicted Performance (GOps/s) for LeNet-5, the Sign Recognition ConvNet and the Scene Labelling ConvNet]

Error between 1.73% and 11.76%

fpgaConvNet vs. Existing FPGA Work

[Figure: Performance Density Comparison (GOps/s/Slice), Existing Work vs. fpgaConvNet, for the Hand-tuned [1] and Memory-optimised [2] designs; chart annotation: 1.62×]

[1] C. Farabet et al., “CNP: An FPGA-Based Processor for Convolutional Networks”, in FPL, IEEE, 2009.

[2] M. Peemen et al., “Memory-centric accelerator design for Convolutional Neural Networks”, in ICCD, IEEE, 2013.

[Figure: Performance Efficiency Comparison (GOps/s/Watt), Existing Work vs. fpgaConvNet, for the Hand-tuned Embedded GPU [3] design]

[3] L. Cavigelli et al., “Accelerating real-time embedded scene labeling with convolutional networks”, in DAC, ACM/EDAC/IEEE, 2015.

Hand-tuned Embedded GPU

• Tegra K1 at 800 MHz

• Memory Bandwidth: 12 GB/s

fpgaConvNet

• Zynq-7000 XC7Z020 at 100 MHz

• Memory Bandwidth: 4.26 GB/s

fpgaConvNet vs. Existing Embedded GPU Work

Conclusions

• Caffe

• TensorFlow

• Theano

• Torch

Deep Learning Developers fpgaConvNet
