DEEP LEARNING WITH GPUs

An NVIDIA Tech Talk @

Boston Imaging and Vision Meetup

Thursday, October 22nd, 2015

Accelerated Deep Learning With GPUs

PC DATA CENTER MOBILE

DESKTOP VIRTUALIZATION

AUTONOMOUS MACHINES

HPC & CLOUD SERVICE PROVIDERS GAMING DESIGN

The World Leader in Visual Computing

Academic Collaboration

https://developer.nvidia.com/academia

CUDA Zone

AGENDA

Accelerated Deep Learning with GPUs

• Data Science & Deep Learning Overview

• The Promise of Machine Learning

• What Makes Deep Learning Deep?

• Why Is Deep Learning Hot Now?

Data science landscape (simplified)

Data Analytics

• Machine Learning

• Graph Analytics

• SQL Query

Machine Learning

• Traditional Methods: Regression, SVM, Recommendation systems

• Deep Neural Networks

How GPU Acceleration Works (highly simplified)

Application Code

GPU: Compute-Intensive Functions (5% of code, ~80% of run-time)

CPU: Rest of Sequential CPU Code
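The offload picture above is an instance of Amdahl's law: only the accelerated part of the run-time gets faster. The following is a minimal sketch, assuming the slide's ~80% of run-time is the portion moved to the GPU. The function name and the kernel speedups tried are hypothetical and not from the talk.

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only part of the run-time is accelerated."""
    serial_fraction = 1.0 - offloaded_fraction
    return 1.0 / (serial_fraction + offloaded_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical GPU kernel speedups applied to the ~80% of run-time named on the slide.
    for s in (2, 10, 100):
        print(f"kernel speedup {s:>3}x -> application speedup {overall_speedup(0.8, s):.2f}x")
```

Under this assumption the application speedup can approach but never exceed 5x, because the remaining 20% of run-time stays on the CPU.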

3 Drivers for Deep Learning

More Data Better Models Powerful GPU Accelerators

“Machine Learning” (ML) is in some sense a rebranding of AI.

The focus is now on more specific, often perceptual tasks, and there are many successes.

Today, some of the world’s largest internet companies, as well as the foremost research institutions, are using GPUs for machine learning.

GPU Accelerated Deep Learning

CUDA for Deep Learning

Neural Networks

• Inherently Parallel

• Matrix Operations
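To make the "matrix operations" point concrete, here is a minimal NumPy sketch of one fully connected layer applied to a whole batch. The layer sizes and the name `dense_forward` are assumptions chosen for illustration. Each output element is an independent dot product, which is the kind of work a GPU can spread across many cores.

```python
import numpy as np


def dense_forward(x, weights, bias):
    """Forward pass of one fully connected layer for a whole batch: one matrix multiply plus ReLU."""
    return np.maximum(x @ weights + bias, 0.0)


rng = np.random.default_rng(0)
batch = rng.standard_normal((64, 256))     # 64 inputs with 256 features each (hypothetical sizes)
weights = rng.standard_normal((256, 128))  # maps 256 inputs to 128 neurons
bias = np.zeros(128)

activations = dense_forward(batch, weights, bias)
print(activations.shape)  # (64, 128)
```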

Why are GPUs Useful for Deep Learning?

[Chart: 2010 to 2014]

GPUs deliver —

Same or better prediction accuracy

Faster results

Smaller footprint

Lower power

Deep learning with COTS HPC systems

A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro

ICML 2013

GOOGLE DATACENTER

1,000 CPU Servers • 2,000 CPUs • 16,000 cores

600 kWatts

$5,000,000

STANFORD AI LAB

3 GPU-Accelerated Servers • 12 GPUs • 18,432 cores

4 kWatts

$33,000
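The comparison above reduces to a few ratios. This is a small sketch of that arithmetic, taking the Google cost as $5,000,000 (assumed here) and the other figures as shown on the slide. The dictionary keys and the `ratios` helper are illustrative names only.

```python
# Figures as shown on the slide; the Google cost is assumed to be $5,000,000.
google = {"servers": 1000, "power_kw": 600, "cost_usd": 5_000_000}
stanford = {"servers": 3, "power_kw": 4, "cost_usd": 33_000}


def ratios(big, small):
    """Return how many times larger each figure of `big` is than the same figure of `small`."""
    return {key: big[key] / small[key] for key in big}


for name, value in ratios(google, stanford).items():
    print(f"{name}: {value:.1f}x")
```

With these inputs the power and cost ratios both come out at roughly 150x.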

“Now You Can Build Google’s $1M Artificial Brain on the Cheap”

GPUs Make Deep Learning Accessible

NVIDIA & IBM Cloud Support

“NVIDIA is excited to announce we have teamed up with IBM Cloud to provide qualifying competitors with access to the SoftLayer cloud server infrastructure. The servers are outfitted with dual Intel Xeon E5-2690 CPUs, 128GB RAM, two 1TB SATA HDD/RAID 0, and two NVIDIA Tesla K80s, NVIDIA’s fastest dual-GPU accelerators.” NVIDIA Parallel For All Blog, August 13th, 2015

The Promise of Machine Learning

ML Systems Extract Value From Big Data

• 350 million images uploaded per day

• 2.5 Petabytes of customer data hourly

• 100 hours of video uploaded every minute

“Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (Microsoft: 4.94%, Feb. 6, 2015)

“Deep Image: Scaling up Image Recognition” (Baidu: 5.98%, Jan. 13, 2015)

“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” (Google: 4.82%, Feb. 11, 2015)

[Chart: ImageNet Challenge, accuracy %, 2010 to 2014]

The Early Big Data Narrative

• 2.5 Exabytes of Web Data Created Daily

• 2.5 Petabytes of Customer Data Hourly

• 350 Million Images Uploaded a Day

• 100 Hours Video Uploaded Every Minute

How can we organize, analyze, understand, and benefit from such a trove of data?

The promise of machine learning

TRAINING DATA → MACHINE LEARNING

LIVE DATA → PREDICTION, ANALYSIS

Walmart • 2.5 PB of customer transaction data every hr

Facebook • 340 million photos uploaded every day

Orange France • >1 billion call data records every

Machine Learning Use Cases

…machine learning is pervasive!

• Image Classification, Object Detection, Localization

• Facial Recognition

• Speech & Natural Language Processing

• Medical Imaging & Interpretation

• Seismic Imaging & Interpretation

• Recommendation

Traditional ML – Hand Tuned Features

Images/video: Image → Vision features → Detection

Audio sample → Audio features → Speaker ID

Text → Text features → Text classification, Machine translation, Information retrieval, ....

Slide courtesy of Andrew Ng, Stanford University

What is Deep Learning? Systems that learn to recognize objects which are important, without us telling the system explicitly what the object is ahead of time

Key components

Features

Learning Algorithm

Deep Learning Advantages

Don’t have to figure out the features ahead of time

Use same neural net approach for many different problems

Fault tolerant

Scales well

Simplicity & Scalability

Linear classifier

Regression

Support Vector Machine Bayesian

Decision Trees Association Rules

Clustering

Convolutional Neural Networks (CNNs)

Biologically inspired

• Neuron only connected to a small region of neurons in layer below it, called the receptive field

• A given layer can have many convolutional filters/kernels

• Each filter has the same weights across the whole layer

• Bottom layers are convolutional, top layers are fully connected

Generally trained via supervised learning

Learning modes: Supervised • Unsupervised • Reinforcement

…ideal system automatically switches modes…
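The receptive-field and shared-weight ideas above can be sketched in a few lines of NumPy. This is an illustrative, unoptimized loop, not how any framework in this talk implements it. As in most CNN libraries it computes a cross-correlation (the kernel is not flipped). The name `conv2d_valid` and the toy input are assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over a 2-D image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output neuron sees only a small patch of the layer below: its receptive field.
            receptive_field = image[i:i + kh, j:j + kw]
            # The same kernel weights are reused at every position.
            out[i, j] = np.sum(receptive_field * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # hypothetical 2x2 filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

A layer with many filters would repeat this with one kernel per output feature map.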

Convolutional Networks Breakthrough

Y. LeCun et al. 1989-1998 : Handwritten digit reading

A. Krizhevsky, G. Hinton et al. 2012 : ImageNet classification winner

Image Classification with DNNs

Training → Inference

Example classes: Cars, Buses, Trucks, Motorcycles

Image Classification with DNNs: Training

Typical training run

Pick a DNN design

Input 100 million training images spanning 1,000 categories

One week of computation

Test accuracy

If bad: modify DNN, fix training set or update training parameters


What makes Deep Learning deep?

Input Result

Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009

Visual Object Recognition Using Deep Convolutional Neural Networks

Rob Fergus (New York University / Facebook) http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985

Networks can have:

• ~10 layers

• 1B parameters

• 10M images

• ~30 Exaflops

• ~30 GPU days

Human brain has trillions of parameters.

Deep Learning Framework

Training

• Forward Propagation: the network outputs “turtle”

• Compute weight update to nudge from “turtle” towards “dog”

• Backward Propagation

• Repeat

Classification

• Trained Model outputs “dog”
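The forward propagation, backward propagation, and repeat cycle can be sketched with a tiny two-layer network in NumPy. This is a toy illustration on synthetic two-class data, not the pipeline used in the talk. The layer sizes, learning rate, step count, and function names are all assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_step(params, x, labels, learning_rate):
    """One forward pass, one backward pass, one in-place weight update. Returns the pre-update loss."""
    w1, b1, w2, b2 = params
    n = len(labels)
    # Forward propagation
    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
    probs = softmax(hidden @ w2 + b2)
    loss = -np.mean(np.log(probs[np.arange(n), labels]))
    # Backward propagation: gradients of the mean cross-entropy loss
    d_logits = probs.copy()
    d_logits[np.arange(n), labels] -= 1.0
    d_logits /= n
    d_w2 = hidden.T @ d_logits
    d_b2 = d_logits.sum(axis=0)
    d_hidden = d_logits @ w2.T
    d_hidden[hidden <= 0.0] = 0.0  # ReLU passes gradient only where it was active
    d_w1 = x.T @ d_hidden
    d_b1 = d_hidden.sum(axis=0)
    # Weight update: nudge each parameter against its gradient
    for p, g in zip(params, (d_w1, d_b1, d_w2, d_b2)):
        p -= learning_rate * g
    return loss


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
labels = (x[:, 0] + x[:, 1] > 0).astype(int)  # synthetic stand-in for "turtle" vs "dog"
params = [rng.standard_normal((2, 16)) * 0.5, np.zeros(16),
          rng.standard_normal((16, 2)) * 0.5, np.zeros(2)]

losses = [train_step(params, x, labels, learning_rate=0.1) for _ in range(300)]  # repeat
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

After training, classification is just the forward pass with the learned parameters.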

CNNs dominate in perceptual tasks

Slide credit: Yann LeCun, Facebook & NYU

Why is Deep Learning Hot Now?

ConvNet Benchmark comments from github

“…9 months ago, we were ~3x slower on Alex net and ~4x slower on overfeat. Training that took 3 weeks, takes 1 week (on the same 1-GPU metric). That is a huge fundamental speedup results in several man-hours saved in waiting for experiments to finish.”

“Pushing these boundaries so far, in such a short time-frame is quite something. There’s two sets of teams who have made this happen:

• NVIDIA, with their Maxwell cards that are as fast as #*%!

• Nervana Systems (Scott Gray and team) who have pushed the CUDA kernels to the limits of the GPUs with efficiencies > 95%”

“Rejigging the marks… #56” – Soumith Chintala, August 2015 Artificial Intelligence Research Engineer at Facebook

Why is Deep Learning Hot Now? Three Driving Factors…

Big Data Availability

• 350 million images uploaded per day

• 2.5 Petabytes of customer data hourly

• 100 hours of video uploaded every minute

New ML Techniques

• Deep Neural Networks

Compute Density

• GPUs

ML systems extract value from Big Data

Deep Learning Revolutionizing Medical Research

• Detecting Mitosis in Breast Cancer Cells (IDSIA)

• Predicting the Toxicity of New Drugs (Johannes Kepler University)

• Understanding Gene Mutation to Prevent Disease (University of Toronto)

ILSVRC12 winning model: “SuperVision”

7 layers

5 convolutional layers + 2 fully-connected

ReLU, pooling, drop-out, response normalization
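Three of the building blocks named above (ReLU, pooling, drop-out) are small enough to sketch in NumPy. These are simplified illustrations under stated assumptions: non-overlapping 2x2 pooling and "inverted" dropout are used here for brevity, which may differ from the winning model's exact settings, and response normalization is left out. All function names are hypothetical.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: keep positive values, zero out the rest."""
    return np.maximum(x, 0.0)


def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D feature map with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def dropout(x, drop_prob, rng):
    """Inverted dropout for training: zero units with probability drop_prob, rescale the survivors."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep / (1.0 - drop_prob)


feature_map = np.arange(16, dtype=float).reshape(4, 4) - 8.0
pooled = max_pool_2x2(relu(feature_map))
print(pooled)

rng = np.random.default_rng(0)
print(dropout(pooled, drop_prob=0.5, rng=rng))
```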

Implemented with Caffe

GPU Acceleration

Training a Deep, Convolutional Neural Network

Batch Size | Training Time (CPU) | Training Time (GPU) | GPU Speed Up
64 images | 64 s | 7.5 s | 8.5X
128 images | 124 s | 14.5 s | 8.5X
256 images | 257 s | 28.5 s | 9.0X

Dual 10-core Ivy Bridge CPUs

1 Tesla K40 GPU

CPU times utilized Intel MKL BLAS library

GPU acceleration from CUDA matrix libraries (cuBLAS)
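The speed-up column follows directly from the two timing columns. Here is a short sketch that recomputes it from the slide's numbers and adds GPU throughput in images per second. The variable and function names are illustrative.

```python
# Batch size -> (CPU seconds, GPU seconds), as listed in the benchmark above.
timings = {64: (64.0, 7.5), 128: (124.0, 14.5), 256: (257.0, 28.5)}


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU training time to GPU training time."""
    return cpu_seconds / gpu_seconds


for batch, (cpu_s, gpu_s) in timings.items():
    print(f"{batch:>3} images: {speedup(cpu_s, gpu_s):.2f}x speed-up, "
          f"{batch / gpu_s:.1f} images/s on the GPU")
```

The recomputed ratios agree with the slide's rounded figures to within 0.1.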

GPU-Accelerated Deep Learning Frameworks

 | CAFFE | TORCH | THEANO | CUDA-CONVNET2 | KALDI
Domain | Deep Learning Framework | Scientific Computing Framework | Math Expression Compiler | Deep Learning Application | Speech Recognition Toolkit
cuDNN | R2 | R2 | R2 | -- | --
Multi-GPU | In Progress | Partial | Partial | | (nnet2)
Multi-CPU | | | | | (nnet2)
License | BSD-2 | GPL | BSD | Apache 2.0 | Apache 2.0
Interface(s) | Text-based definition files, Python, MATLAB | Python, Lua, MATLAB | Python | C++ | C++, Shell scripts
Embedded (TK1) | | | | |

http://developer.nvidia.com/deeplearning

Deep Learning Use Cases for Vision & Graphics

• Street Number Detection [Goodfellow 2014]

• Object Classification [Krizhevsky 2012]

• Image Retrieval [Krizhevsky 2012]

• Pose Estimation [Toshev, Szegedy 2014]

• Object Detection [Toshev, Szegedy 2014]

• Object Detection [Huval et al. 2015]

• Face Recognition [Taigman et al. 2014]

• Action Recognition [Simonyan et al. 2014]

Thank You!
