Deep neural networks, UC Davis (web.cs.ucdavis.edu/.../lee_lecture18_deeplearning.pdf)


Deep neural networks

June 1st, 2017

Yong Jae Lee, UC Davis

Many slides from Rob Fergus, Svetlana Lazebnik, Jia-Bin Huang, Derek Hoiem

Announcements
•  Post questions on Piazza for the review session (6/8 lecture)

Outline
•  Deep Neural Networks
•  Convolutional Neural Networks (CNNs)


Traditional Image Categorization: Training phase

Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier

Traditional Image Categorization: Testing phase

Test Image → Image Features → Trained Classifier → Prediction (e.g., “Outdoor”)

Features have been key:

•  SIFT [Lowe IJCV 04]
•  HOG [Dalal and Triggs CVPR 05]
•  SPM [Lazebnik et al. CVPR 06]
•  DPM [Felzenszwalb et al. PAMI 10]
•  Color Descriptor [Van De Sande et al. PAMI 10]

All hand-crafted.


What about learning the features?

•  Learn a feature hierarchy all the way from pixels to classifier
•  Each layer extracts features from the output of the previous layer
•  Layers have (nearly) the same structure
•  Train all layers jointly

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

Learning Feature Hierarchy

Goal: Learn useful higher-level features from images. Feature representation:

Input data (Pixels) → 1st layer “Edges” → 2nd layer “Object parts” → 3rd layer “Objects”

Lee et al., ICML 2009; CACM 2011

Slide: Rob Fergus

Learning Feature Hierarchy

•  Better performance
•  Other domains (unclear how to hand-engineer): Kinect, video, multi-spectral
•  Feature computation time: dozens of features now regularly used; getting prohibitive for large datasets (10s of sec/image)

Slide: R. Fergus


“Shallow” vs. “deep” architectures

Traditional recognition (“shallow” architecture):
Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

Deep learning (“deep” architecture):
Image/Video Pixels → Layer 1 → … → Layer N → Simple classifier → Object Class

Biological neuron and perceptrons

A biological neuron; an artificial neuron (perceptron) is a linear classifier.

Simple, complex, and hyper-complex cells

David H. Hubel and Torsten Wiesel (David Hubel’s Eye, Brain, and Vision) suggested a hierarchy of feature detectors in the visual cortex, with higher-level features responding to patterns of activation in lower-level cells, and propagating activation upwards to still higher-level cells.


Hubel/Wiesel Architecture and Multi-layer Neural Network

Hubel and Wiesel’s architecture; a multi-layer neural network is a non-linear classifier.

Neuron: Linear Perceptron
•  Inputs are feature values
•  Each feature has a weight
•  The sum is the activation
•  If the activation is:
–  Positive, output +1
–  Negative, output -1

Slide credit: Pieter Abbeel and Dan Klein
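A minimal sketch of the linear perceptron just described, plus the classic mistake-driven update rule (the update rule is standard, though this slide only defines the prediction):

```python
# Linear perceptron: activation = w . f(x); output +1 if positive, else -1.
def perceptron_predict(weights, features):
    activation = sum(w * f for w, f in zip(weights, features))
    return 1 if activation > 0 else -1

# On a mistake, nudge the weights toward the true label.
def perceptron_update(weights, features, label, lr=1.0):
    if perceptron_predict(weights, features) != label:
        weights = [w + lr * label * f for w, f in zip(weights, features)]
    return weights
```

On linearly separable data, repeating the update over the training examples drives the mistakes to zero.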

Two-layer perceptron network

Slide credit: Pieter Abbeel and Dan Klein


Two-layer perceptron network

Slide credit: Pieter Abbeel and Dan Klein

Learning w
•  Training examples
•  Objective: a misclassification loss
•  Procedure: gradient descent / hill climbing

Slide credit: Pieter Abbeel and Dan Klein


Hill climbing

•  Simple, general idea:
–  Start wherever
–  Repeat: move to the best neighboring state
–  If no neighbors are better than the current state, quit
–  Neighbors = small perturbations of w
•  What’s bad? Is it optimal?

Slide credit: Pieter Abbeel and Dan Klein
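The procedure on this slide can be sketched directly; the coordinate-wise perturbation scheme and step size below are illustrative choices, not from the slides:

```python
def hill_climb(loss, w, step=0.1, iters=1000):
    """Hill climbing on a weight vector: repeatedly try small perturbations
    (one coordinate at a time) and keep the best neighbor; quit when no
    neighbor is better than the current state."""
    best = loss(w)
    for _ in range(iters):
        # Neighbors = small perturbations of w.
        neighbors = [w[:i] + [w[i] + d] + w[i + 1:]
                     for i in range(len(w)) for d in (-step, step)]
        cand = min(neighbors, key=loss)
        if loss(cand) >= best:   # no neighbor better than current: quit
            return w
        w, best = cand, loss(cand)
    return w
```

The “what’s bad?” answer is visible in the code: it stops at the first point with no better neighbor, which may be a local optimum rather than the global one.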

Two-layer perceptron network

Slide credit: Pieter Abbeel and Dan Klein


Two-layer neural network

Slide credit: Pieter Abbeel and Dan Klein

Neural network properties
•  Theorem (universal function approximators): A two-layer network with a sufficient number of neurons can approximate any continuous function to any desired accuracy.
•  Practical considerations:
–  Can be seen as learning the features
–  A large number of neurons brings the danger of overfitting
–  The hill-climbing procedure can get stuck in bad local optima

Slide credit: Pieter Abbeel and Dan Klein. Approximation by Superpositions of a Sigmoidal Function, 1989.

Multi-layer Neural Network
•  A non-linear classifier
•  Training: find network weights w to minimize the error between the true training labels and the estimated labels
•  Minimization can be done by gradient descent, provided f is differentiable
•  This training method is called back-propagation
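A toy illustration of this training procedure, and of the two-layer approximation theorem above: fit the continuous function f(x) = x² with a tanh hidden layer and a linear output, trained by gradient descent with hand-derived back-propagation. The layer size, learning rate, and target function are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = X ** 2   # continuous target to approximate

W1 = rng.normal(0.0, 1.0, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 1.0, (16, 1)); b2 = np.zeros(1)

lr = 0.2
for _ in range(20000):
    # Forward pass: x -> tanh hidden layer -> linear output.
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    # Backward pass: gradients of the mean squared error (back-propagation).
    d_out = 2.0 * (out - y) / len(X)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)   # tanh' = 1 - tanh^2
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
```

After training, the network output tracks x² closely on [-1, 1], which is the universal-approximation claim in miniature.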


Outline
•  Deep Neural Networks
•  Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNN, ConvNet, DCN)
•  A CNN is a multi-layer neural network with:
–  Local connectivity: neurons in a layer are only connected to a small region of the layer before it
–  Shared weight parameters across spatial positions: learning shift-invariant filter kernels

Image credit: A. Karpathy

Neocognitron [Fukushima, Biological Cybernetics 1980]

Deformation-resistant recognition: S-cells (“simple”) extract local features; C-cells (“complex”) allow for positional errors.


LeNet [LeCun et al. 1998]

Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]; LeNet-1 dates from 1993.
•  Stack multiple stages of feature extractors
•  Higher stages compute more global, more invariant features
•  A classification layer at the end

Convolutional Neural Networks
Input Image → Convolution (learned) → Non-linearity → Spatial pooling → Feature maps

What is a Convolution?
•  A weighted moving sum: slide a filter over the input and take a weighted sum at every position, producing a feature (activation) map.
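A rough sketch of this weighted moving sum, assuming a single-channel NumPy image (CNN libraries actually compute cross-correlation rather than flipped convolution, but the principle is the same):

```python
import numpy as np

def conv2d(image, kernel):
    """Weighted moving sum: slide the kernel over the image and take a
    dot product at every valid position, producing a feature map."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

For example, a vertical-edge filter responds strongly where a dark-to-bright column boundary passes under it and is silent in flat regions.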


Why convolution?
•  Few parameters (filter weights)
•  Dependencies are local
•  Translation invariance

Convolutional Neural Networks
Input Image → Convolution (learned) → Non-linearity (Rectified Linear Unit, ReLU) → Spatial pooling → Feature maps

slide credit: S. Lazebnik


Non-Linearity

•  Applied per element (independently)
•  Options:
–  Tanh
–  Sigmoid: 1/(1+exp(-x))
–  Rectified linear unit (ReLU): max(0, x)
•  ReLU makes learning faster, simplifies back-propagation, and avoids saturation issues → the preferred option
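A minimal sketch of the three per-element options listed above:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)           # saturates toward -1/+1 for large |x|

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # saturates toward 0/1 for large |x|

def relu(x):
    # Preferred: cheap, and its gradient does not saturate for x > 0.
    return np.maximum(0.0, x)
```

The saturation issue is why ReLU wins: for inputs far from zero, tanh and sigmoid gradients shrink toward zero, while ReLU passes gradients through unchanged wherever the input is positive.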

Convolutional Neural Networks
Input Image → Convolution (learned) → Non-linearity → Spatial pooling → Normalization → Feature maps

Max pooling: a non-linear down-sampling that provides translation invariance.

Spatial Pooling
•  Average or max
•  Non-overlapping / overlapping regions
•  Role of pooling:
–  Invariance to small transformations
–  Larger receptive fields (see more of the input)
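A minimal sketch of non-overlapping max/average pooling on a single-channel NumPy feature map (the function name and `mode` flag are illustrative):

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping spatial pooling: downsample a feature map by taking
    the max (or average) over each size x size block."""
    H, W = fmap.shape
    # Crop any ragged border, then view the map as a grid of blocks.
    blocks = fmap[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))
```

Because only the strongest (or mean) response per block survives, shifting the input by a pixel within a block often leaves the pooled output unchanged, which is the translation-invariance role described above.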


Engineered vs. learned features

Engineered: Image → Feature extraction → Pooling → Classifier → Label

Learned: Image → Convolution/pool → Convolution/pool → Convolution/pool → Convolution/pool → Convolution/pool → Dense → Dense → Dense → Label

Convolutional filters are trained in a supervised manner by back-propagating the classification error.

Compare: SIFT descriptor [Lowe IJCV 2004]
Image pixels → Apply oriented filters → Spatial pool (sum) → Normalize to unit length → Feature vector

Compare: Spatial Pyramid Matching [Lazebnik, Schmid, Ponce CVPR 2006]
SIFT features → Filter with visual words → Take max visual-word response → Multi-scale spatial pool (sum) → Global image descriptor


Previous ConvNet successes

•  Handwritten text/digits
–  MNIST (0.17% error [Ciresan et al. 2011])
–  Arabic & Chinese [Ciresan et al. 2012]
•  Simpler recognition benchmarks
–  CIFAR-10 (9.3% error [Wan et al. 2013])
–  Traffic sign recognition: 0.56% error vs. 1.16% for humans [Ciresan et al. 2011]

ImageNet Challenge 2012 [Deng et al. CVPR 2009]

•  ~14 million labeled images, 20k classes
•  Images gathered from the Internet
•  Human labels via Amazon Mechanical Turk
•  ImageNet Challenge: 1.2 million training images, 1000 classes

AlexNet: similar framework to LeCun ’98, but:
•  Bigger model (7 hidden layers, 650,000 units, 60,000,000 parameters)
•  More data (10^6 vs. 10^3 images)
•  GPU implementation (50x speedup over CPU); trained on two GPUs for a week

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012


AlexNet for image classification

Fixed input size: 224x224x3 → AlexNet → “car”

ImageNet Classification Challenge

http://image-net.org/challenges/talks/2016/ILSVRC2016_10_09_clsloc.pdf

Industry Deployment
•  Used at Facebook, Google, Microsoft
•  Startups
•  Image recognition, speech recognition, …
•  Fast at test time

Taigman et al., DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR ’14


Visualizing CNNs
•  What input pattern originally caused a given activation in the feature maps?

Layer 1 and Layer 2 visualizations.

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]


Layer 3, 4, and 5 visualizations.

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Beyond classification
•  Detection
•  Segmentation
•  Regression
•  Pose estimation
•  Matching patches
•  Synthesis
•  and many more…


R-CNN: Regions with CNN features
•  Trained on ImageNet classification
•  Fine-tune the CNN on PASCAL

R-CNN [Girshick et al. CVPR 2014]

Labeling Pixels: Semantic Labels
Fully Convolutional Networks for Semantic Segmentation [Long et al. CVPR 2015]

Labeling Pixels: Edge Detection
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection [Bertasius et al. CVPR 2015]


CNN for Regression
DeepPose [Toshev and Szegedy CVPR 2014]

CNN as a Similarity Measure for Matching
FaceNet [Schroff et al. 2015]; stereo matching [Zbontar and LeCun CVPR 2015]; patch comparison [Zagoruyko and Komodakis 2015]; matching ground and aerial images [Lin et al. CVPR 2015]; FlowNet [Fischer et al. 2015]

CNN for Image Generation
Learning to Generate Chairs with Convolutional Neural Networks [Dosovitskiy et al. CVPR 2015]


Chair Morphing
Learning to Generate Chairs with Convolutional Neural Networks [Dosovitskiy et al. CVPR 2015]

Transfer Learning
•  Improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned
•  Used as weight initialization for CNNs

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks [Oquab et al. CVPR 2014]

Deep learning libraries
•  TensorFlow
•  Caffe
•  Torch
•  MatConvNet


Fooling CNNs

Intriguing Properties of Neural Networks [Szegedy et al. ICLR 2014]

What is going on?

The adversarial update nudges the image x along the gradient of the error with respect to the input: x ← x + α ∂E/∂x

http://karpathy.github.io/2015/03/30/breaking-convnets/
Explaining and Harnessing Adversarial Examples [Goodfellow et al., ICLR 2015]
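The sign-of-gradient variant of this update (the fast gradient sign method from the Goodfellow et al. paper) can be sketched on a made-up linear "classifier"; the weights, input, and loss below are purely illustrative, not a network from the slides:

```python
import numpy as np

def fgsm(x, grad_wrt_x, alpha=0.1):
    """Fast gradient sign method: step the input in the sign of the
    gradient of the loss with respect to the input."""
    return x + alpha * np.sign(grad_wrt_x)

# Illustrative linear classifier: score = w . x, predict +1 if score > 0.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, 0.2])   # score = 0.2 -> predicted +1
# For a loss E = -score (rises as the score falls), dE/dx = -w.
x_adv = fgsm(x, -w, alpha=0.2)  # small perturbation flips the prediction
```

Even a small alpha flips the sign of the score here, which is the linear-models explanation of adversarial examples: many small, coordinated input changes add up through the dot product.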

Questions?

See you Tuesday!