75
@graphific Roelof Pieters Deep Learning: a (non-techy) birds-eye view 20 April 2015 Stockholm Deep Learning Slides at: http://www.slideshare.net/roelofp/deep-learning-a-birdseye-view

Deep Learning: a birds eye view

Embed Size (px)

Citation preview

Page 1: Deep Learning: a birds eye view

@graphificRoelof Pieters

Deep Learning:a (non-techy) birds-eye

view20  April  2015  

Stockholm Deep Learning

Slides at:http://www.slideshare.net/roelofp/deep-learning-a-birdseye-view

Page 2: Deep Learning: a birds eye view

• Deals with “construction and study of systems that can learn from data”

Refresher: Machine Learning ???

A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P), if its performance at tasks in T, as measured by P, improves with experience E

— T. Mitchell 1997

2

Page 3: Deep Learning: a birds eye view

Improving some task T based on experience E with

respect to performance measure P.

Deep Learning = Machine Learning

Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task (or tasks drawn from a population of

similar tasks) more effectively the next time.

— H. Simon 1983 "Why Should Machines Learn?” in Mitchell 1997

— T. Mitchell 1997

3

Page 4: Deep Learning: a birds eye view

Representation learningAttempts to automatically learn

good features or representations

Deep learningAttempt to learn multiple levels of representation of increasing

complexity/abstraction

Deep Learning: What?

4

Page 5: Deep Learning: a birds eye view

Deep Learning ??

5

Page 6: Deep Learning: a birds eye view

Machine Learning ??

Traditional Programming:

Data

ProgramOutput

6

Computer

Page 7: Deep Learning: a birds eye view

Machine Learning ??

Traditional Programming:

Data

ProgramOutput

Data ProgramOutput

Machine Learning:

7

(labels)(“weights”/model)

Computer

Computer

Page 8: Deep Learning: a birds eye view

Machine Learning ??

8

• Most machine learning methods work well because of human-designed/hand-engineered features (representations)

• machine learning -> optimising weights to best make a final prediction

Page 9: Deep Learning: a birds eye view

Typical ML Regression

Deep Learning: Why?

Page 10: Deep Learning: a birds eye view

Neural NetTypical ML Regression

Deep Learning: Why?

Page 11: Deep Learning: a birds eye view

Machine -> Deep Learning ??

Page 12: Deep Learning: a birds eye view

Machine -> Deep Learning ??

DEEP NET (DEEP LEARNING)

Page 13: Deep Learning: a birds eye view
Page 14: Deep Learning: a birds eye view

Biological Inspiration

14

Page 15: Deep Learning: a birds eye view

Deep Learning: Why?

Page 16: Deep Learning: a birds eye view

Deep Learning is everywhere…

Page 17: Deep Learning: a birds eye view
Page 18: Deep Learning: a birds eye view

Deep Learning in the News

Page 19: Deep Learning: a birds eye view

(source: Google Trends)19

Page 20: Deep Learning: a birds eye view

(source: arXiv bookworm, Culturomics)

Scientific Articles:

Page 21: Deep Learning: a birds eye view
Page 22: Deep Learning: a birds eye view
Page 23: Deep Learning: a birds eye view
Page 24: Deep Learning: a birds eye view
Page 25: Deep Learning: a birds eye view

Why Now?

• Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.

• No successful attempts were reported before 2006 …Exception: convolutional neural networks, LeCun 1998

• SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (shallow architecture).

• Breakthrough in 2006!

25

Page 26: Deep Learning: a birds eye view

Renewed Interest: 1990s

• Learning multiple layers• “Back propagation”• Can “theoretically” learn any function!

But…• Very slow and inefficient• SVMs, random forests, etc. SOTA

26

Page 27: Deep Learning: a birds eye view

2006 Breakthrough

• More data

• Faster hardware: GPU’s, multi-core CPU’s

• Working ideas on how to train deep architectures

27

Page 28: Deep Learning: a birds eye view

2006 Breakthrough

• More data

• Faster hardware: GPU’s, multi-core CPU’s

• Working ideas on how to train deep architectures

28

Page 29: Deep Learning: a birds eye view

2006 Breakthrough

29

Page 30: Deep Learning: a birds eye view

2006 Breakthrough

30

Growth of datasets

Page 31: Deep Learning: a birds eye view

2006 Breakthrough

• More data

• Faster hardware: GPU’s, multi-core CPU’s

• Working ideas on how to train deep architectures

31

Page 32: Deep Learning: a birds eye view

2006 Breakthrough

32

vs

Page 33: Deep Learning: a birds eye view

Rise of Raw Computation Power

Page 34: Deep Learning: a birds eye view

2006 Breakthrough

• More data

• Faster hardware: GPU’s, multi-core CPU’s

• Working ideas on how to train deep architectures

34

Page 35: Deep Learning: a birds eye view

2006 Breakthrough

Stacked Restricted Boltzman Machines* (RBM) Hinton, G. E, Osindero, S., and Teh, Y. W. (2006).A fast learning algorithm for deep belief nets.Neural Computation, 18:1527-1554.

Stacked Autoencoders (AE) Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007).Greedy Layer-Wise Training of Deep Networks,Advances in Neural Information Processing Systems 19

* called Deep Belief Networks (DBN)35

Page 36: Deep Learning: a birds eye view

Deep Learning for the Win!

Page 37: Deep Learning: a birds eye view

• 1.2M images with 1000 object categories

• AlexNet of uni Toronto: 15% error rate vs 26% for 2th placed (traditional CV)

Impact on Computer VisionImageNet Challenge 2012

Page 38: Deep Learning: a birds eye view

Impact on Computer Vision

(from Clarifai)

Page 39: Deep Learning: a birds eye view

Impact on Computer Vision

Page 40: Deep Learning: a birds eye view

40

Classification results on ImageNet 2012

Team Year Place Error (top-5) Uses external data

SuperVision 2012 - 16.4% no

SuperVision 2012 1st 15.3% ImageNet 22k

Clarifai 2013 - 11.7% no

Clarifai 2013 1st 11.2% ImageNet 22k

MSRA 2014 3rd 7.35% no

VGG 2014 2nd 7.32% no

GoogLeNet 2014 1st 6.67% no

Final Detection Results Team Year Place mAP e x t e r n a l

data ensemble c o n t e x t u a l

model approach

UvA-Euvision 2013 1st 22.6% none ? yes F i s h e r vectors

Deep Insight 2014 3rd 40.5% I L S V R C 1 2 Classification + Localization

3 models yes ConvNet

C U H K DeepID-Net

2014 2nd 40.7% I L S V R C 1 2 Classification + Localization

? no ConvNet

GoogLeNet 2014 1st 43.9% I L S V R C 1 2 Classification

6 models no ConvNet

Detection results

source: Szegedy et al. Going deeper with convolutions (GoogLeNet ), ILSVRC2014, 19 Sep 2014

Page 41: Deep Learning: a birds eye view

41source: Szegedy et al. Going deeper with convolutions (GoogLeNet ), ILSVRC2014, 19 Sep 2014

GoogLeNet

Convolution Pooling Softmax Other

Winners of: Large Scale Visual Recognition Challenge 2014

(ILSVRC2014)19 September 2014

GoogLeNet

Convolution Pooling Softmax Other

Page 42: Deep Learning: a birds eye view

42source: Szegedy et al. Going deeper with convolutions (GoogLeNet ), ILSVRC2014, 19 Sep 2014

Inception

Width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules. Can remove fully connected layers on top completely Number of parameters is reduced to 5 million

256 480 480 512

512 512 832 832 1024

Computional cost is increased by less than 2X compared to Krizhevsky’s network. (<1.5Bn operations/evaluation)

Page 43: Deep Learning: a birds eye view

Impact on Computer Vision

Latest State of the Art:

Page 44: Deep Learning: a birds eye view

Computer Vision: Current State of the Art

Page 45: Deep Learning: a birds eye view

Impact on Audio Processing

45

First public Breakthrough with Deep Learning in 2010

Dahl et al. (2010)

Page 46: Deep Learning: a birds eye view

Impact on Audio Processing

46

First public Breakthrough with Deep Learning in 2010

Dahl et al. (2010)

-33%! -32%!

Page 47: Deep Learning: a birds eye view

Impact on Audio Processing

47

Speech Recognition

Page 48: Deep Learning: a birds eye view

Impact on Audio Processing

48

TIMIT Speech Recognition

(from: Clarifai)

Page 49: Deep Learning: a birds eye view

Impact on Audio Processing

Page 50: Deep Learning: a birds eye view

C&W 2011

Impact on Natural Language Processing

Pos: Toutanova et al. 2003)Ner: Ando & Zhang 2005

C&W 2011

Page 51: Deep Learning: a birds eye view

Impact on Natural Language Processing

Named Entity Recognition:

Page 52: Deep Learning: a birds eye view

Deep Learning: Who’s to blame?

Page 53: Deep Learning: a birds eye view

53

Deep Learning: Who’s to blame?

Page 54: Deep Learning: a birds eye view

Deep Architectures can be representationally efficient• Fewer computational units for same function

Deep Representations might allow for a hierarchy or representation

• Allows non-local generalisation• Comprehensibility

Multiple levels of latent variables allow combinatorial sharing of statistical strength

54

Deep Learning: Why?

Page 55: Deep Learning: a birds eye view

— Andrew Ng

“I’ve worked all my life in Machine Learning, and I’ve

never seen one algorithm knock over benchmarks like Deep

Learning”

Deep Learning: Why?

55

Page 56: Deep Learning: a birds eye view

Biological JustificationDeep Learning = Brain “inspired”Audio/Visual Cortex has multiple stages == Hierarchical

Page 57: Deep Learning: a birds eye view

Different Levels of Abstraction

57

Page 58: Deep Learning: a birds eye view

Hierarchical Learning• Natural progression

from low level to high level structure as seen in natural complexity

Different Levels of AbstractionFeature Representation

58

Page 59: Deep Learning: a birds eye view

Hierarchical Learning• Natural progression

from low level to high level structure as seen in natural complexity• Easier to monitor what is being learnt and to guide the machine to better subspaces

Different Levels of AbstractionFeature Representation

59

Page 60: Deep Learning: a birds eye view

Hierarchical Learning• Natural progression

from low level to high level structure as seen in natural complexity• Easier to monitor what is being learnt and to guide the machine to better subspaces

• A good lower level representation can be used for many distinct tasks

Different Levels of AbstractionFeature Representation

60

Page 61: Deep Learning: a birds eye view

Hierarchical Learning• Natural progression

from low level to high level structure as seen in natural complexity• Easier to monitor what is being learnt and to guide the machine to better subspaces

• A good lower level representation can be used for many distinct tasks

Different Levels of AbstractionFeature Representation

61

Page 62: Deep Learning: a birds eye view

Different Levels of Abstraction

Page 63: Deep Learning: a birds eye view

Classic Deep Architecture

Input layer

Hidden layers

Output layer

Page 64: Deep Learning: a birds eye view

Modern Deep Architecture

Input layer

Hidden layers

Output layer

movie time:http://www.cs.toronto.edu/~hinton/adi/index.htm

Page 65: Deep Learning: a birds eye view

Hierarchies

Efficient

Generalization

Distributed

Sharing

Unsupervised*

Black Box

Training Time

Major PWNAGE!

Much Data

Why go Deep ?

65

Page 66: Deep Learning: a birds eye view

No More Handcrafted Features !

66

Page 67: Deep Learning: a birds eye view

[Kudos to Richard Socher, for this eloquent summary :) ]

• Manually designed features are often over-specified, incomplete and take a long time to design and validate

• Learned Features are easy to adapt, fast to learn

• Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual and linguistic information.

• Deep learning can learn unsupervised (from raw text/audio/images/whatever content) and supervised (with specific labels like positive/negative)

Why Deep Learning ?

Page 68: Deep Learning: a birds eye view

Deep Learning: Future Developments

Currently an explosion of developments• Hessian-Free networks (2010)• Long Short Term Memory (2011)• Large Convolutional nets, max-pooling (2011)• Nesterov’s Gradient Descent (2013)

Currently state of the art but...• No way of doing logical inference (extrapolation)• No easy integration of abstract knowledge• Hypothetic space bias might not conform with reality

68

Page 69: Deep Learning: a birds eye view

Deep Learning: Future Challenges

a

69

Szegedy, C., Wojciech, Z., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R. (2013) Intriguing properties of neural networks

L: correctly identified, Center: added noise x10, R: “Ostrich”

Page 70: Deep Learning: a birds eye view
Page 71: Deep Learning: a birds eye view
Page 72: Deep Learning: a birds eye view

as PhD candidate KTH/CSC:“Always interested in discussing

Machine Learning, Deep Architectures, Graphs, and

Language Technology”

In touch!

[email protected]/~roelof/

Data Science ConsultancyAcademic/Research

[email protected]

72

Gve Systems Graph Technologies

Page 73: Deep Learning: a birds eye view

• Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/

• Pylearn2 - library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/

• Torch - Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/

• more info: http://deeplearning.net/software links/

Wanna Play ?

Wanna Play ? General Deep Learning

73

Page 74: Deep Learning: a birds eye view

• RNNLM (Mikolov) http://rnnlm.org

• NB-SVM https://github.com/mesnilgr/nbsvm

• Word2Vec (skipgrams/cbow)https://code.google.com/p/word2vec/ (original) http://radimrehurek.com/gensim/models/word2vec.html (python)

• GloVehttp://nlp.stanford.edu/projects/glove/ (original) https://github.com/maciejkula/glove-python (python)

• Socher et al / Stanford RNN Sentiment code:http://nlp.stanford.edu/sentiment/code.html

• Deep Learning without Magic Tutorial: http://nlp.stanford.edu/courses/NAACL2013/

Wanna Play ? NLP

74

Page 75: Deep Learning: a birds eye view

• cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/CUDA, optimized for GTX 580) https://code.google.com/p/cuda-convnet2/

• Caffe (Berkeley) (Cuda/OpenCL, Theano, Python)http://caffe.berkeleyvision.org/

• OverFeat (NYU) http://cilvr.nyu.edu/doku.php?id=code:start

Wanna Play ? Computer Vision

75