Deep Learning for Data Mining
University of Rome "La Sapienza", Dept. of Computer, Control and Management Engineering "A. Ruberti"
Valsamis Ntouskos, ALCOR Lab
Outline
• Introduction - Motivation
• Theoretical aspects
• Convolutional Neural Networks
• Recurrent Neural Networks
• Generative models
• Further directions
Deep Learning with CNNs
Compositional Models
Learned End-to-End
Hierarchy of Representations
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word
(learned representations progress from concrete to abstract)
Slides from Caffe framework tutorial @ CVPR2015
Deep Learning with CNNs
Compositional Models
Learned End-to-End
Back-propagation jointly learns all of the model parameters to optimize the output for the task.
Slides from Caffe framework tutorial @ CVPR2015
Motivation of CNNs
Inputs are usually treated as general feature vectors
In some cases inputs have special structure:
• Audio
• Images
• Videos
Signals: Numerical representations of physical quantities
Deep learning can be applied directly to signals by using suitable operators
Motivation
Audio: 1D data (variable-length vectors)
. . . 0.0468 0.0468 0.0468 0.0390 0.0390 0.0390 0.0546 0.0625 0.0625 0.0390 0.0312 0.0468 0.0625 . . .
Motivation
Images: 2D data (matrices)
Video: a sequence of images sampled through time (3D data)
What is a CNN?
Some theory
Convolution
• Image filtering is based on convolution with special kernels
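A minimal, framework-free sketch of the operation (plain NumPy; the helper name `conv2d` is illustrative). Note that CNN layers actually compute cross-correlation, i.e. the kernel is not flipped:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# a classic 3x3 edge-detection (Laplacian-like) kernel
kernel = np.array([[0, -1, 0],
                   [-1, 4, -1],
                   [0, -1, 0]], dtype=float)
image = np.random.rand(8, 8)
print(conv2d(image, kernel).shape)  # (6, 6)
```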
Some theory
Pooling
• Introduces subsampling
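A minimal sketch of non-overlapping 2x2 max pooling (plain NumPy, names illustrative), which halves each spatial dimension:

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Non-overlapping max pooling: keep the maximum of each size x size block."""
    H, W = x.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))  # [[ 5.  7.] [13. 15.]]
```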
Some theory
Activation
Standard way to model a neuron
f(x) = tanh(x) or f(x) = 1 / (1 + e⁻ˣ)
Very slow to train (saturation)
Non-saturating nonlinearity (ReLU): f(x) = max(0, x)
Quick to train
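The three activations side by side (plain NumPy sketch): tanh and the sigmoid saturate for large |x|, which is what slows training, while ReLU does not saturate for x > 0:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-4, 4, 9)
print(np.tanh(x))   # saturates near ±1 for large |x|
print(sigmoid(x))   # saturates near 0 and 1
print(relu(x))      # non-saturating for x > 0
```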
Some theory
Regularization
Dropout
• Applied on the fully-connected layers
• During training, drop nodes with probability α
• During testing, scale the weights by the retention probability 1 − α
Image from Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
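A sketch of the mechanism (plain NumPy). Frameworks typically implement the "inverted" variant below, which rescales activations at training time so that no reweighting is needed at test time; the slide above describes the original formulation:

```python
import numpy as np

def dropout(x, drop_prob=0.5, training=True):
    """Inverted dropout: zero units with prob. drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= drop_prob).astype(x.dtype)
    return x * mask / (1.0 - drop_prob)

activations = np.ones((2, 5))
print(dropout(activations, drop_prob=0.5))  # surviving units scaled to 2.0
```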
Some theory
Every convolutional layer of a CNN transforms the 3D input
volume to a 3D output volume of neuron activations.
A regular 3-layer Neural Network
Material from Fei-Fei’s group
Some theory
Each neuron is connected to a local region in the input volume spatially, but to all channels.
The neurons still compute a dot product of their weights with the input, followed by a non-linearity.
Material from Fei-Fei’s group
Algorithms
• Each* neuron/layer is differentiable!
• Just apply backpropagation (chain-rule)
• Use standard gradient-based optimization algorithms
(SGD, AdaGrad, …)
• The devil lies in the details though …
▪Choosing hyperparameters / loss-function
▪Exploding/Vanishing gradients – batch normalization
▪Overfitting – Regularization
▪Cost of performing experiments
▪Convergence
▪…
*what about max-pooling?
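To make the chain-rule bullet concrete, a minimal sketch of gradient descent on a single linear neuron with squared loss (all names illustrative):

```python
import numpy as np

# loss = 0.5 * (w.x + b - y)^2; the gradients follow from the chain rule
rng = np.random.default_rng(0)
w, b, lr = rng.normal(size=3), 0.0, 0.1
x, y = rng.normal(size=3), 1.0

for _ in range(100):
    err = (w @ x + b) - y   # d(loss)/d(prediction)
    w -= lr * err * x       # d(loss)/dw = err * x
    b -= lr * err           # d(loss)/db = err
print(w @ x + b)            # close to y = 1.0 after training
```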
Kernels and Feature maps
Material from Fei-Fei’s group
Case study: image classification
Slides from Caffe framework tutorial @ CVPR2015
Case study: image classification
Slides from Caffe framework tutorial @ CVPR2015
Brief history of CNNs
Foundational work done in the middle of the 1900s
• 1940s-1960s: Cybernetics [McCulloch and Pitts 1943,
Hebb 1949, Rosenblatt 1958]
• 1980s-mid 1990s: Connectionism [Rumelhart 1986,
Hinton 1989]
• 1990s: modern convolutional networks [LeCun et al. 1998], LSTM [Hochreiter & Schmidhuber 1997], MNIST and other large datasets
Brief history of CNNs
Hubel & Wiesel [60s]: Simple & Complex cells architecture
Fukushima’s Neocognitron [70s]
Yann LeCun’s early CNNs [80s]
Brief history of CNNs
Convolutional Networks: 1989
LeNet: a layered model composed of convolution and subsampling operations, followed by a holistic representation and ultimately a classifier for handwritten digits. [LeNet]
Recent success
• Parallel Computation (GPU)
• Larger training sets
• International Competitions
• Theoretical advancements
– Dropout
– ReLUs
– Batch Normalization
Recent success
Better Hardware – GPUs
• CUDA; Jetson TX1, TK1
• Android lib, demo
• OpenCL branch
Recent success
Larger training sets: ImageNet
• Over 15M labeled high resolution images
• Roughly 22K categories
• Collected from web and labeled by Amazon Mechanical Turk
Recent success
Competitions: ILSVRC
• Annual competition of image classification at large scale
• 1.2M images in 1K categories
• Classification: make 5 guesses about the image label
[Example: fine-grained classes such as Entlebucher vs. Appenzeller]
CNNs in Computer Vision
• Image classification
Evolution of CNNs for image classification
Convolutional Nets: 2012
AlexNet: a layered model composed of convolution, subsampling, and further operations, followed by a holistic representation; all-in-all a landmark classifier on ILSVRC12. [AlexNet]
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- GoogLeNet: composition of multi-scale dimension-reduced modules
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- VGG: 16 layers of 3x3 convolution interleaved with max pooling + 3 fully-connected layers
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2015
ResNet
ILSVRC15 Winner: ~3.6% Top-5 error
Intuition: each block adds a residual F(x) to a skip connection (output = F(x) + x), so it is easier to learn F ≈ 0 than to learn the identity function
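A toy sketch of the residual idea (plain NumPy, names illustrative): with all weights at zero the block already computes the identity, so the layers only need to learn the residual:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Output = relu(F(x) + x): the block learns the residual F, not the map itself."""
    return relu(W2 @ relu(W1 @ x) + x)

d = 4
x = np.random.rand(d)
W1 = np.zeros((d, d))
W2 = np.zeros((d, d))                             # F(x) = 0  =>  output == x
print(np.allclose(residual_block(x, W1, W2), x))  # True
```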
Reasonable questions
• Is this just for a particular dataset? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Object Localization [R-CNN, HyperColumns, OverFeat, etc.]
Pose estimation [Tompson et al., CVPR’15]
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Semantic Segmentation [Pinheiro, Collobert, Dollár, ICCV’15]
Slides from ICCV 2015 Math of Deep Learning tutorial
Fine Tuning
Take a pre-trained model and fine-tune it to new tasks [DeCAF] [Zeiler-Fergus] [OverFeat]
Example: Kaggle’s Dogs vs. Cats – top 10 in 10 minutes (© kaggle.com)
[Diagram: ImageNet (lots of data) → your task, e.g. style recognition]
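A hedged fine-tuning sketch in PyTorch (torchvision assumed available; the 2-class setup stands in for Dogs vs. Cats). Only the new classifier head is trained; newer torchvision versions select weights with `weights=...` instead of `pretrained=True`:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)       # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                    # freeze pretrained weights
model.fc = nn.Linear(model.fc.in_features, 2)  # new task-specific head

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# one dummy step on random data, just to show the shape of the training loop
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```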
RNNs
Dealing with sequences
• data points are related
• order is important!
(C)NNs disregard this information
RNNs
Recurrent Neural Networks (RNNs) address this problem by introducing cells with loops
• allow information to persist
• the value of output 𝐡𝑡 depends both on 𝐱𝑡 and the previous output 𝐡𝑡−1
Material from Colah’s blog
RNNs
Another way to see RNNs is to unfold the loop
• each cell dedicated to a different data sample
• output 𝐡𝑡 computed based also on 𝐡𝑡−1
RNNs
RNN structure
Material from Colah’s blog
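A minimal sketch of the recurrence (plain NumPy, names illustrative): 𝐡𝑡 = tanh(W𝑥 𝐱𝑡 + Wℎ 𝐡𝑡−1 + b), applied by unfolding the loop over the sequence:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

d_in, d_h, T = 3, 5, 4
rng = np.random.default_rng(0)
W_x = rng.normal(size=(d_h, d_in))
W_h = rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)
for t in range(T):          # unfolded loop: one step per sequence element
    h = rnn_step(rng.normal(size=d_in), h, W_x, W_h, b)
print(h)
```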
RNNs
RNNs (and variants) are very effective in various domains:
• Speech recognition
• Translation
• Image/Video captioning
• Video summarization
• Activity recognition
• ...
RNNs
a) "The clouds are in the _."
13/12/2018Deep Learning for Data Mining
ProblemRNNs cannot capture long-term dependencies• so-called vanishing/exploding gradients
problem
b) "I grew up in France... I speak fluently _."
LSTMs
Long Short Term Memories (LSTMs)
• RNNs with special structure
• designed to capture long-term dependencies
Hochreiter & Schmidhuber (1997) Long short-term memory
LSTMs
Main idea: Separate the cell output 𝐡𝑡 from the cell state 𝐶𝑡
• the state is modified through a series of structures called gates
LSTMs – step by step
1st gate: forget mechanism
• drop elements of 𝐶𝑡−1 based on the values of 𝐡𝑡−1 and input 𝐱𝑡
LSTMs – step by step
2nd and 3rd gate: update mechanism
i. decide which elements of 𝐶𝑡−1 to update
ii. compute the candidate update vector 𝐶̃𝑡
LSTMs – step by step
2nd and 3rd gate: update mechanism
iii. update the elements selected via 𝑖𝑡 with the values computed in 𝐶̃𝑡
LSTMs – step by step
Last step: compute output
• Based on 𝐶𝑡 , 𝐡𝑡−1, 𝐱𝑡
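Putting the gates together, a minimal NumPy sketch of one LSTM step (names illustrative; z stacks 𝐡𝑡−1 and 𝐱𝑡):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)        # forget gate: drop parts of C_{t-1}
    i_t = sigmoid(W_i @ z + b_i)        # input gate: select elements to update
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate update vector
    C_t = f_t * C_prev + i_t * C_tilde  # new cell state
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(C_t)            # new cell output
    return h_t, C_t

d_in, d_h = 3, 4
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(d_h, d_h + d_in)) for _ in range(4)]
bs = [np.zeros(d_h) for _ in range(4)]
h, C = np.zeros(d_h), np.zeros(d_h)
h, C = lstm_step(rng.normal(size=d_in), h, C, *Ws, *bs)
print(h, C)
```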
LSTM variants 1/3
Add peephole connections
All layers “see” the state 𝐶𝑡−1
Gers & Schmidhuber (2000) Recurrent nets that time and count
LSTM variants 2/3
Coupled input and forget gates:
• only forget something if you are going to input something else
LSTM variants 3/3
Gated Recurrent Units (GRUs)
• no cell state! – operates directly on the output 𝐡𝑡
• just two gates (update and reset) instead of three [Cho et al. 2014]
Yao et al. (2015) Depth-gated recurrent neural networks
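For comparison with the LSTM step above, a minimal sketch of one GRU step (plain NumPy, names illustrative): the update gate plays the role of coupled input/forget gates, and the output doubles as the state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: two gates (update z, reset r), no separate cell state."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                   # update gate
    r = sigmoid(W_r @ hx)                                   # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    return (1 - z) * h_prev + z * h_tilde

d_in, d_h = 3, 4
rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.normal(size=(d_h, d_h + d_in)) for _ in range(3))
print(gru_step(rng.normal(size=d_in), np.zeros(d_h), W_z, W_r, W_h))
```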
LSTMs
Further improvements:
• Further modifications of gates/structure
• Bi-directional LSTMs
• Attention mechanisms
• ...
LSTMs – typical use cases
One-to-one – classical NNs (no RNN)
One-to-many – sequence output (e.g. image captioning)
Many-to-one – sequence input (e.g. sentiment analysis)
Many-to-many – sequence to sequence (e.g. machine translation)
Many-to-many synced – e.g. frame per frame video analysis
LSTMs – use case examples
Visual Sequence Tasks
Jeff Donahue et al. CVPR’15
Based on Long short-term memory (LSTM) layers
Using LSTMs
Some videos:
- Facebook AI
- Music classification
- Action recognition (SecondHands project)
Beyond Regression/Classification
Generative models
• Variational Auto-Encoders (VAEs)
– focus on learning latent space structure
• Generative Adversarial Networks (GANs)
– focus on learning a distribution
– no latent space (in general)
GANs
Goal
Sample from the input data distribution 𝒳
Idea
Invert a (Convolutional) Neural Network
GANs
Encoder (conventional CNN)
• processes an image and produces a vector/code
GANs
Decoder
• receives a code and produces an image
• uses “deconvolutional” layers
Material from Frans’ blog
GANs
Problem
How to train the decoder to produce meaningful data?
Idea
Adversarial Training
GANs
GAN is a combination of two networks
1. a generator network (decoder)
2. a discriminator network (critic)
GANs
Roles
• Generator: produces realistic samples of 𝒳
• Discriminator: identifies if a sample actually comes from the (unknown) 𝒳 or not
Training
make the networks compete with each other
• the generator tries to fool the discriminator into believing that its samples are ‘real’
• the discriminator tries to distinguish ‘real’ from ‘fake’ samples as well as possible
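A conceptual sketch of the two competing objectives (plain NumPy; the stand-in D and G below are not real networks): the discriminator's loss labels real samples 1 and generated samples 0, while the generator's loss rewards fakes that the discriminator labels 1:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy, the usual GAN training criterion."""
    eps = 1e-8
    return -np.mean(target * np.log(pred + eps)
                    + (1 - target) * np.log(1 - pred + eps))

D = lambda x: 1.0 / (1.0 + np.exp(-x.sum(axis=1)))  # stand-in discriminator
G = lambda z: 2.0 * z                               # stand-in generator

real = np.random.randn(8, 2)       # samples from the data distribution X
fake = G(np.random.randn(8, 2))    # generated samples

d_loss = bce(D(real), np.ones(8)) + bce(D(fake), np.zeros(8))
g_loss = bce(D(fake), np.ones(8))  # generator wants D(fake) -> 1
print(d_loss, g_loss)
```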
GANs
Example on CIFAR dataset (32x32 images)
Are these real images or generated?
GANs
Example on CIFAR dataset (32x32 images)
Results at 300, 900 and 5700 iterations
GANs
Example: Celebrity faces (1024x1024 images)
VAEs
Goal
• modify data in specific directions
• identify meaningful directions in latent space
Examples
- ‘distort’ faces (change expression, add glasses)
- produce digits from different hand-writing styles
- distort 3D meshes
- ...
VAEs
What is an auto-encoder?
- a combination of an encoder and a decoder
- trained with a reconstruction loss
- provides a low-dimensional representation
Material from Kurita’s blog
VAEs
Goal
Feed vectors and get realistic samples of 𝒳
AE problem: we don’t know if the vectors are valid or not
VAEs
Main idea:
Encoder produces a distribution instead of a vector
Decoder operates on samples from this distribution
VAEs
Problems
1. How to produce a distribution?
2. How to prevent degeneration?
Solutions
1. parametric distributions – typically Gaussian
– produce mean 𝜇 and variance Σ
2. add a loss term based on the Kullback-Leibler divergence
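For a diagonal Gaussian posterior N(μ, σ²) against the standard normal prior, the KL term has a closed form; a minimal sketch (names illustrative):

```python
import numpy as np

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

print(kl_divergence(np.zeros(3), np.zeros(3)))  # 0.0 when posterior == prior
```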
VAEs
One more problem:
- Sampling operation is not differentiable
Solution:
- re-parametrization
Images from Keita Kurita’s blog
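A minimal sketch of the trick (plain NumPy, names illustrative): instead of sampling z ~ N(μ, σ²) directly, sample ε ~ N(0, I) and compute z = μ + σε, which is differentiable with respect to μ and σ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])        # produced by the encoder
log_var = np.array([0.0, 0.2])    # encoder outputs log-variance for stability
sigma = np.exp(0.5 * log_var)

eps = rng.standard_normal(mu.shape)  # randomness isolated from the parameters
z = mu + sigma * eps                 # sample passed to the decoder
print(z)
```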
VAEs
Example: faces (Hou et al., 2016)
VAEs
Example: 3D Mesh deformation (Tan et al., 2018)
Further directions
Few-shot and zero-shot learning
• meta-learner architectures
• Prototypical networks
• one-shot learning example: Mullet; zero-shot learning example: Zebra (from a horse)
Learning on manifolds
• learning functions defined on surfaces
• learning functions defined on graphs
Learning with constraints
• Frank-Wolfe networks
Resources
Frameworks:
• TensorFlow (Google) | C/C++, Python, Java, Go
• Torch/PyTorch (Facebook) | Lua/Python
• Caffe/Caffe 2 (UC Berkeley) | C/C++, Python, Matlab
• Theano (U Montreal) | Python
• CNTK (Microsoft) | Python, C++, C#/.Net, Java
• MxNet (DMLC) | Python, C++, R, Perl, …
• Darknet (Redmon J.) | C
• …
Resources
High-level libraries:
• Keras | Backends: TensorFlow (TF), Theano
Models:
• Depends on the framework, e.g.
– https://github.com/BVLC/caffe/wiki/Model-Zoo (Caffe)
– https://github.com/tensorflow/models/tree/master/research (TF)
Interactive Interfaces:
• DIGITS (NVIDIA) | Caffe, TF, Torch
• TensorBoard (TF)
Tools:
• http://ethereon.github.io/netscope (for networks defined in protobuf)
Thank you!