Affordable AI Connects To A Better Life
Bofu Chen, Sep 21, 2016
AGENDA
Intelligent Gateway
Affordable AI Techniques
Implementation
Example: Pepper Robot
Example: Campus Security System
Intelligent Gateway
Photo: Robert Bond
Cat Recognition
Deep Learning Inference
Cat!
No Backpropagation
Inference Essentials
Computing Time: A shorter prediction time is always welcome
Memory Usage: Device memory is limited, but deep learning models can be huge
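To make the memory point concrete, here is a rough back-of-the-envelope sketch of how weight precision affects storage. The parameter count is a hypothetical figure chosen for illustration, not a number from the slides, and `model_size_mb` is a made-up helper name.

```python
def model_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Storage needed for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


# Hypothetical model with 25 million weights (assumed, not from the slides).
NUM_PARAMS = 25_000_000

for bits in (32, 16, 8, 1):
    size = model_size_mb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.3f} MB")
```

This counts weights only; activations and framework overhead add to the footprint at run time.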
Techniques To Make AI Affordable
Inference Research
Weight Storage: Reduce weight storage size without sacrificing accuracy
Hardware Usage: Utilize as many computing components (CPU, GPU, etc.) as possible simultaneously
Binarized Neural Networks, http://arxiv.org/abs/1602.02830
XNOR-Net, http://arxiv.org/abs/1603.05279
DoReFa-Net, https://arxiv.org/abs/1606.06160
DeepX, http://niclane.org/pubs/deepx_ipsn.pdf
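As an illustration of the weight-storage idea, the sketch below approximates a weight vector as a single scale factor times a vector of signs, in the spirit of the binary-weight scheme described in the XNOR-Net paper. It uses plain Python lists rather than a deep learning framework, works on one flat vector instead of per-filter tensors, and the function names are hypothetical.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]  # one bit per weight
    return alpha, signs


def reconstruct(alpha, signs):
    """Expand the binary form back to floats for comparison."""
    return [alpha * s for s in signs]


weights = [0.5, -0.25, 0.75, -0.5]  # toy values, assumed for illustration
alpha, signs = binarize(weights)
approx = reconstruct(alpha, signs)
print(alpha, signs, approx)
```

Each weight then costs one bit plus a shared scale factor, at the price of an approximation error that the cited papers address during training.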
Approaches
Compression | Nvidia TensorRT Optimization
Throughput | Power efficiency
Memory usage | Keep accuracy
Speed up
Low-level speed up
Nvidia TensorRT
Like a model compiler
Production Deep Learning with NVIDIA GPU Inference Engine, https://devblogs.nvidia.com/parallelforall/production-deep-learning-nvidia-gpu-inference-engine/
Pruning
Learning both Weights and Connections for Efficient Neural Networks, https://arxiv.org/abs/1506.02626
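A minimal sketch of the pruning step, assuming simple magnitude-based thresholding: connections whose weight is close to zero are removed (set to zero). The cited paper follows pruning with retraining of the remaining weights, which is omitted here; the threshold and the toy weights are assumptions for illustration.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out every weight whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # toy values
pruned = prune_by_magnitude(weights, threshold=0.1)
print(pruned, sparsity(pruned))
```

The zeroed entries only save memory and compute if the result is stored and executed in a sparse format.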
Quantization
How to Quantize Neural Networks with TensorFlow, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/quantization/index.md
DNNs are noise tolerant
FP16 to INT8
Hardware speedup
FPU to ALU
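The sketch below shows one common way to map floating-point values to 8-bit integers: a linear mapping of a known [lo, hi] range onto the codes 0..255. It is written in plain Python for clarity and is not the TensorFlow implementation; the function names and sample values are assumptions.

```python
def quantize(values, lo, hi):
    """Map floats in [lo, hi] to integer codes 0..255 (linear, clamped)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = (hi - lo) / 255
    return [max(0, min(255, round((v - lo) / step))) for v in values]


def dequantize(codes, lo, hi):
    """Map integer codes 0..255 back to approximate floats."""
    step = (hi - lo) / 255
    return [lo + c * step for c in codes]


values = [-1.0, -0.3, 0.0, 0.42, 1.0]  # toy values
codes = quantize(values, lo=-1.0, hi=1.0)
restored = dequantize(codes, lo=-1.0, hi=1.0)
print(codes)
print(restored)
```

The round-trip error per value is at most half a quantization step, which is the kind of small noise the network is expected to tolerate.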
Inference Without DL Frameworks
Likely: A compiler intermediate representation for image recognition and heterogeneous computing, http://liblikely.org/
Implementation
Deep Learning Computer Vision
NVIDIA TX1 | Pre-trained | On Server
End Devices (Sender)
Architecture
End Devices (Receiver)
Intelligent Gateway
NVIDIA TX1
Ubuntu
TensorFlow
REST
TensorRT
gRPC
Inference Choices
TensorRT
Fast Object
Slow Motion
TensorFlow on TX1
DONE: Model Server
Maximize Performance
NEXT: Inference Optimization
on Ubuntu
Other Attempts
Raspberry Pi 3 | Qualcomm Snapdragon 801
0.9s/img
GoogLeNet
Real-Time
Inception v3
Pepper, The Emotional Robot
HW Specification
4-core 1.9 GHz | 4 GB | 790 MHz
Pepper motherboard specification, http://doc.aldebaran.com/2-4/family/pepper_technical/motherboard_pep.html
Vision and Speech Limitations
Face Recognition | Speech Recognition
Instead of face identification | Keywords instead of NLP
Cloud Solution Drawbacks
Cost: Huge amount of image transmission
Connectivity: Need to ensure bandwidth, stability and latency are good enough
Privacy: You might want to keep family information local
Architecture
Pepper Gateway
NVIDIA TX1
Ubuntu
TensorFlow
REST
TensorRT
gRPC
Real World Gesture Recognition Algorithm
Campus Security System
Current Solution
Cloud | End Device
NOT INTELLIGENT
NOT REAL-TIME
Architecture
Security Gateway
NVIDIA TX1
Ubuntu
TensorFlow
REST
TensorRT
gRPC
[Figure: detection labels on a campus scene: Student (multiple), Suspects]
DT42
Violent Event
Kinect v2
Update USB Firmware: Fix data transmission issue
Open Source Libraries: libfreenect2 and pylibfreenect2 make enablement easier
MS Kinect v2 on Nvidia Jetson TX1, http://jetsonhacks.com/2016/07/11/ms-kinect-v2-nvidia-jetson-tx1/
We Are DT42