Confidential and Proprietary
GPU-BASED DEEP LEARNING IN CLOUD AND EMBEDDED SYSTEMS
FREDERICK SOO, CTO
April 4, 2016
Nauto is launching a connected camera for professional drivers
• Drive more than most consumers
• Exposed to passenger and driver liability
• Driver quality unknown - small number of very bad drivers
Massive shift in transportation due to synergistic technologies
Autonomous
90% reduction in accidents
Connected
Electric
Shared
$0.08 / mile
85% efficient drivetrain
50-70% utilization
Fleet optimization
Why use deep learning?
Good at visual tasks
Scalable
Deployable
Most important for NAUTO
Small brains have a lot of functionality
Neurons       Power
26 billion    20 watts
1 million     1 mW
10 million    10 mW
100 million   100 mW
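One way to read the figures on this slide is as a linear scaling of the reference budget (about 26 billion neurons on about 20 watts) down to smaller neuron counts. The sketch below works through that arithmetic. The linear-scaling assumption and the function name `estimated_power_mw` are illustrative and not taken from the talk.

```python
# Reference figures from the slide: ~26 billion neurons running on ~20 watts.
REFERENCE_NEURONS = 26e9
REFERENCE_WATTS = 20.0


def estimated_power_mw(neurons: float) -> float:
    """Scale the reference power budget linearly with neuron count (an assumption)."""
    watts_per_neuron = REFERENCE_WATTS / REFERENCE_NEURONS
    return neurons * watts_per_neuron * 1000.0  # convert watts to milliwatts


for n in (1e6, 10e6, 100e6):
    print(f"{n:,.0f} neurons -> about {estimated_power_mw(n):.2f} mW")
```

Under this assumption the results land within the same order of magnitude as the 1 mW, 10 mW, and 100 mW budgets listed on the slide.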
Required performance depends on use case
Small changes in F1 with size
• Large networks can be used in later stages of cascade
• Order of magnitude improvements in speed with basic exploration
• Always worth measuring performance/size tradeoff
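Measuring the performance/size tradeoff could look something like the sketch below: compute F1 for each candidate network, then pick the smallest one whose F1 stays within a chosen tolerance of the best. The `Candidate` class, the `smallest_within` helper, and every number in the example are hypothetical placeholders, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    params_millions: float  # model size, in millions of parameters
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        """F1 is the harmonic mean of precision and recall."""
        if self.precision + self.recall == 0:
            return 0.0
        return 2 * self.precision * self.recall / (self.precision + self.recall)


def smallest_within(candidates, tolerance):
    """Return the smallest candidate whose F1 is within `tolerance` of the best F1."""
    best_f1 = max(c.f1 for c in candidates)
    good_enough = [c for c in candidates if best_f1 - c.f1 <= tolerance]
    return min(good_enough, key=lambda c: c.params_millions)


# Placeholder numbers for illustration only; they are NOT results from the talk.
candidates = [
    Candidate("large", 60.0, 0.92, 0.90),
    Candidate("medium", 6.0, 0.91, 0.89),
    Candidate("small", 0.6, 0.85, 0.80),
]
choice = smallest_within(candidates, tolerance=0.02)
print(choice.name, round(choice.f1, 3))
```

The tolerance is the knob that ties this back to the use case: a looser tolerance accepts a smaller, faster network at the cost of some F1.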
Test your chipsets - algorithm speed important but not entire story
[Bar chart: Nauto CNN forward pass (msec), scale 0 to 150, on embedded SoCs A through E]
• Chipsets released in 2014, 2015 and 2016
• Pricing varying from $25 to $60+
• Varying degrees of HW/SW support
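A chipset comparison like this one depends on a repeatable timing harness. The following is a minimal sketch of such a harness, assuming wall-clock timing with a few warm-up runs. `benchmark_ms` and `fake_forward_pass` are hypothetical names, and the workload is a stand-in for a real CNN forward pass on the target SoC.

```python
import statistics
import time


def benchmark_ms(fn, warmup=3, runs=20):
    """Return (median, worst) wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches before timing
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def fake_forward_pass():
    """Hypothetical stand-in for a CNN forward pass; replace with the real call."""
    total = 0
    for i in range(50_000):
        total += i * i
    return total


median_ms, worst_ms = benchmark_ms(fake_forward_pass)
print(f"median {median_ms:.2f} ms, worst {worst_ms:.2f} ms")
```

Reporting the worst case alongside the median helps expose thermal throttling or scheduling jitter, which can differ between chipsets even when median latency looks similar.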
Algorithm is not the bottleneck
Step                      Time
Image processing          30 msec
Conversion to CNN space   30 msec
CNN forward pass          … msec
Other steps               15 msec
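To see where time goes across the whole pipeline rather than in the CNN alone, each stage can be timed separately and reported as a share of the total. The sketch below assumes a simple sequential pipeline. The stage names follow the slide, but `profile_pipeline`, `placeholder_stage`, and the sleep durations are hypothetical and do not reproduce the slide's timings.

```python
import time


def profile_pipeline(stages, frame):
    """Run each (name, fn) stage in order and record its latency in milliseconds."""
    timings = {}
    data = frame
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


def placeholder_stage(seconds):
    """Build a hypothetical stage that just sleeps; swap in real processing code."""
    def run(data):
        time.sleep(seconds)
        return data
    return run


# Stage names follow the slide; the sleep durations are arbitrary placeholders.
stages = [
    ("image processing", placeholder_stage(0.003)),
    ("conversion to CNN space", placeholder_stage(0.003)),
    ("CNN forward pass", placeholder_stage(0.002)),
    ("other steps", placeholder_stage(0.001)),
]

timings = profile_pipeline(stages, frame=None)
total_ms = sum(timings.values())
for name, ms in timings.items():
    print(f"{name:<26}{ms:7.2f} ms  {100 * ms / total_ms:5.1f}% of total")
```

A per-stage breakdown like this makes it visible when pre-processing and format conversion, rather than the network itself, dominate end-to-end latency.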
Entire system must be optimized
                  Collect data   Label    Train    Deploy
Pre-GPU           years          months   months   months/years
Post-GPU          weeks          months   months   months/years
Nauto prototype   days           weeks    weeks    weeks
Nauto at-scale    ?              ?        ?        ?
Easy to think of optimization; hard to think of system
Programmers waste enormous amounts of time thinking
about, or worrying about, the speed of noncritical parts of
their programs, and these attempts at efficiency actually have
a strong negative impact when debugging and maintenance
are considered.
We should forget about small efficiencies, say about 97% of
the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical
3%.
Donald Knuth
Lessons
• Embedded pipeline is as important as raw CNN performance
• Match algorithm performance to use case
• Overall system performance (data acquisition, labeling, training) is where big progress is to be made
The future is in distributed awareness
Real world search
Team
Ludmila Levkova
Nikhil Deshmukh
Joe Virzi
Jonathan Soo