
"Deep-learning-based Visual Perception in Mobile and Embedded Devices: Opportunities and Challenges," a Presentation from Qualcomm


1

Jeff Gehlhaar, Vice President, Qualcomm Research

May 12, 2015

Deep-learning-based visual perception in mobile and embedded devices: Opportunities and challenges

2

Qualcomm Research:

Transforming the future of mobile technology

Research → Prototype → Standardize

3

Vision for cognitive computing

More intuitive devices and things

4

Key elements of “Cognition”

Diagram: Perception (see, hear, classify) · Reasoning (infer context, concepts, relationships) · Action (plan, anticipate, autonomous)

5

On the road to a “Cognitive Platform”

On-device capabilities

Rich Connectivity
• Integrated modem & AP
• Adaptive RF front end
• LTE broadcast & service-focused modem features
• Tightly integrated Wi-Fi/BT
• Leading location/GPS

Heterogeneous Computing
• Fully customized architecture
• Superior performance at low power consumption
• Highly optimized for cutting-edge cognitive capabilities

On-device Intelligence
• On-device machine learning
• Computer vision
• Behavioral analysis
• Sensor processing and classification algorithms
• Natural language processing

Visual Perception · Speech & Audio Understanding · Natural Interactions · Intelligent Connectivity · Immersive Multimedia · Intuitive Security · Always-On Awareness

6

On-device visual perception is key

• Democratizing robotics to assist us in daily lives
• Revolutionizing transportation with autonomous cars
• Contextualizing your environment through scene understanding

7

Why fully on-device matters

Process data closest to the source, complementing the cloud:

• Low latency
• Reliability
• Efficient use of network bandwidth
• Security and user privacy

8

Deep learning solves visual perception

• Qualcomm Technologies, Inc. has been applying machine learning to mobile for many years
• Deep learning for visual perception:
  • Provides best-in-class solutions
  • Traditionally a cloud-only solution, unavailable on mobile until now
  • Presents many implementation challenges
• Our mobile-focused platform goes beyond deep learning to include RNNs and other strategies
• Applications: security, handwriting, natural language processing, etc.

Diagram: deep network pipeline with convolution layers (C), pooling, a fully connected layer, and the result (sketched below).
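That pipeline can be illustrated end to end with a minimal NumPy forward pass. Everything below (the 28×28 input, the single 5×5 filter, the random weights) is an arbitrary toy for illustration, not Qualcomm's network; it only shows how convolution, pooling, and a fully connected layer compose into a result.

```python
import numpy as np

def conv2d(x, w):
    """Valid convolution of a single-channel image x with one k x k filter w."""
    h, wd = x.shape
    k = w.shape[0]
    out = np.empty((h - k + 1, wd - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, wd = x.shape
    return x[:h - h % s, :wd - wd % s].reshape(h // s, s, wd // s, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28))        # toy input image
filt = rng.standard_normal((5, 5))         # one 5x5 convolution filter
fc_w = rng.standard_normal((10, 12 * 12))  # fully connected weights for 10 classes

feat = np.maximum(conv2d(img, filt), 0)    # convolution + ReLU -> 24x24
pooled = max_pool(feat)                    # 2x2 max pooling    -> 12x12
logits = fc_w @ pooled.ravel()             # fully connected layer -> result
print(logits.argmax())                     # index of the winning class
```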

9

Challenges to enabling deep-learning-based visual perception on mobile

10

Typical computing environment for deep learning

• Performance: teraflops
• Memory bandwidth: 100s of GB/s
• Storage: 10s of GBs of RAM
• Power: 100s of watts

Best-in-class server-based visual perception models require roughly 2 billion MAC operations per image.
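The MAC count of a convolutional layer follows directly from its geometry: MACs = H_out × W_out × C_out × (K_h × K_w × C_in). A quick Python sanity check, using AlexNet's well-known first layer as the example:

```python
def conv_macs(h_out, w_out, c_out, k_h, k_w, c_in):
    """Multiply-accumulate count for one convolutional layer."""
    return h_out * w_out * c_out * (k_h * k_w * c_in)

# AlexNet conv1: 96 filters of 11x11x3, stride 4 over a 224x224x3 image -> 55x55x96 output
print(f"{conv_macs(55, 55, 96, 11, 11, 3):,}")  # 105,415,200 MACs for one layer
```

Summing such per-layer terms over all layers of a 2015-era ImageNet network reaches hundreds of millions to billions of MACs per image, consistent with the roughly-2-billion figure above.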

11

Supporting deep learning on-device is a major challenge

Constrained mobile environment:
• Power and thermal efficiency
• Storage and memory bandwidth limitations
• Battery powered

Visual perception workloads:
• Compute intensive
• Large and complicated neural network models

12

Solving the challenge of on-device visual perception

Within the power and thermal constraints of mobile devices

13

Scene understanding video

14

Robot face tracking video

https://www.youtube.com/watch?v=0D9I0SBGAPY

15

Efficient execution on mobile SoCs

Key to deep learning on mobile is an efficient execution environment that considers all aspects of the SoC, combined with efficient library implementations.

• Careful analysis of deep learning tradeoffs
  • Consider the impact of different network architectures
  • Focus on cache performance, data locality, and DRAM utilization efficiency
• Focus on parallelism and heterogeneity
  • Take advantage of heterogeneous computing frameworks (e.g., Qualcomm MARE)
  • Span execution across Qualcomm® Snapdragon™ CPU, DSP, and GPU
• Focus on underlying optimizations
  • Convolutions implemented as highly efficient matrix multiply operations (see the sketch below)
  • Smart buffer management for GPU and fixed bit-width optimizations for DSP
  • Optimized matrix multiply for Snapdragon processors¹: 6X faster than Eigen

1. Results are based on the Snapdragon 805 processor and Eigen 3.2.2.

Qualcomm Snapdragon and Qualcomm Multicore Asynchronous Runtime Environment are products of Qualcomm Technologies, Inc.
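The matrix-multiply bullet refers to the standard im2col-plus-GEMM technique: every input patch is unrolled into a column so the whole convolution becomes one large matrix product. Qualcomm's Snapdragon-tuned kernels are not public, so the NumPy sketch below only illustrates the transformation itself.

```python
import numpy as np

def im2col(x, k):
    """Unroll all k x k patches of x (C, H, W) into columns of shape (C*k*k, n_patches)."""
    C, H, W = x.shape
    cols = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            cols.append(x[:, i:i + k, j:j + k].ravel())
    return np.stack(cols, axis=1)

def conv_as_gemm(x, w):
    """Convolution of x (C_in, H, W) with filters w (C_out, C_in, k, k) as one matmul."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = w.reshape(c_out, -1) @ im2col(x, k)  # the single GEMM that does all the work
    return out.reshape(c_out, h_out, w_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))     # toy 3-channel input
w = rng.standard_normal((4, 3, 3, 3))  # four 3x3x3 filters
print(conv_as_gemm(x, w).shape)        # (4, 6, 6)
```

The single large GEMM is exactly the operation a vendor-optimized matrix multiply (the 6X-over-Eigen result above) accelerates.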

16

Reducing model size through compression

Goal: reduce both the physical size and the number of MACs required, at equivalent precision.
• Use available memory bandwidth and compute effectively, improving power efficiency
• Smaller model size permits in-field model upgrades and improvements

Qualcomm Technologies, Inc. approach:
• Initial SVD approach based on a paper by Denton et al. of NYU¹ (sketched below)
• Replaces single layers with multiple layers
• Permits fine-tuning of all layers, not just the layers above the compressed layers

Results:
• Up to a 10X reduction in physical model size
• Up to a 35% reduction in the number of MAC operations, with minimal loss of precision

Diagram: deep network pipeline with convolution layers (C), pooling, a fully connected layer, and the result.

1. “Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation,” arXiv:1404.0736 [cs.CV]
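The SVD idea can be made concrete: a fully connected layer's m × n weight matrix W is approximated by a rank-r factorization A·B, cutting parameters from m·n to r·(m + n). The sketch below shows only this factorization; the multi-layer replacement and full fine-tuning procedure described above is Qualcomm Technologies' own extension and is not reproduced here.

```python
import numpy as np

def svd_compress(W, r):
    """Approximate W (m x n) with two factors A (m x r) and B (r x n) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * np.sqrt(s[:r])            # first replacement layer
    B = np.sqrt(s[:r])[:, None] * Vt[:r]     # second replacement layer
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096))        # AlexNet fc7-sized weight matrix
A, B = svd_compress(W, r=256)
# The layer y = W @ x becomes y ~= A @ (B @ x)
print(W.size, A.size + B.size)               # 16,777,216 vs 2,097,152 params (8X smaller)
```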

17

Size compression and error rate impact

Chart: physical size and top-5 error for the original network, the network with FC layers compressed, and the network with both FC and conv layers compressed.

Fully connected layer compression significantly impacts physical network size:
• 10X size reduction
• ~1 percentage point loss in top-5 error

18

MAC compression and error rate impact

Chart (AlexNet): MAC count and top-5 error for the original network, the network with FC layers compressed, and the network with both FC and conv layers compressed.

Convolutional layer compression significantly impacts MAC requirements:
• Compression: ~35% MAC reduction, ~1.3 percentage point loss in top-5 error
• Fine tuning: 2.5 percentage point improvement in top-5 error under max MAC constraints

19

Fixed point and reduced bit widths

Focus on reduced precision for both weights (static values) and activations (dynamic values) versus traditional 32-bit floating-point approaches:
• Physically smaller networks
• 2X improvement in memory access efficiency for network weights

16-bit values are used with no net increase in top-5 error.

Increase in top-5 error by activation and weight bit width:

Activation \ Weight    4       8       16      24      32      Float
8                      20.0%   1.4%    0.1%    0.1%    0.1%    0.1%
16                     20.1%   1.4%    0.0%    0.0%    0.0%    0.0%
24                     20.1%   1.4%    0.0%    0.0%    0.0%    0.0%
32                     20.1%   1.4%    0.0%    0.0%    0.0%    0.0%
Float                  20.1%   1.4%    0.0%    0.0%    0.0%    0.0%
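A plausible mechanism behind numbers like these is symmetric fixed-point quantization: scale a tensor so its largest value maps to the largest b-bit integer, round, then rescale at use. The scaling scheme below is an assumption for illustration; the slide does not specify the exact fixed-point format used.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric fixed-point quantization of x to the given bit width."""
    qmax = 2 ** (bits - 1) - 1        # e.g. 127 for 8-bit, 32767 for 16-bit
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int32)
    return q * scale                  # dequantized values, as the network sees them

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)       # stand-in for a layer's weights
for bits in (4, 8, 16):
    err = np.abs(quantize(w, bits) - w).max()
    print(bits, f"max abs error {err:.5f}")
```

The table's pattern, catastrophic at 4-bit weights and negligible from 16 bits up, matches the expectation that quantization noise shrinks exponentially with bit width.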

20

Conclusions

What the future holds

21

What comes next?

Expanding the frontier of visual perception
• More complex models
• Video classification
• Scene parsing, object localization, and tracking

Platform enhancements
• Evolution of the SoC

Working towards “Cognition”
• Qualcomm Research is experimenting with algorithms for “reasoning” to link perception to action

22

Additional resources

• Qualcomm Technologies, Inc. web sites:
  • Computer Vision: https://www.qualcomm.com/invention/research/projects/computer-vision
  • Cognitive Technologies: https://www.qualcomm.com/invention/cognitive-technologies
  • FastCV™ SDK: https://developer.qualcomm.com/mobile-development/add-advanced-features/computer-vision-fastcv/tools-and-resources
• Embedded Vision Alliance web sites:
  • Heterogeneous computing for CV: http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/oct-2013-embedded-vision-summit-heterogeneous
  • CV acceleration: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/september-2013-qualcomm-uplinq-conferenc
• Demo in Technology Showcase: scene detection through on-device deep learning

FastCV is a product of Qualcomm Technologies, Inc.

Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries.

FastCV is a trademark of Qualcomm Incorporated. All Qualcomm Incorporated trademarks are used with permission.

Other products and brand names may be trademarks or registered trademarks of their respective owners.