VBM683
Machine Learning
Pinar Duygulu
Slides are adapted from Dhruv Batra (Virginia Tech) and J. Elder
New Topic: Neural Networks
(C) Dhruv Batra 2
Two class discriminant function
Perceptron
Synonyms
• Neural Networks
• Artificial Neural Network (ANN)
• Feed-forward Networks
• Multilayer Perceptrons (MLP)
• Types of ANN
– Convolutional Nets
– Autoencoders
– Recurrent Neural Nets
• [Back with a new name]: Deep Nets / Deep Learning
(C) Dhruv Batra 6
Biological Neuron
(C) Dhruv Batra 7
The Neuron Metaphor
• Neurons
– accept information from multiple inputs,
– transmit information to other neurons.
• Multiply inputs by weights along edges
• Apply some function to the set of inputs at each node
Slide Credit: HKUST
Types of Neurons
Linear Neuron
Logistic Neuron
Perceptron
Potentially more; each requires a convex loss function for gradient-descent training.
Slide Credit: HKUST
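As a rough illustration (not from the slides), the three neuron types above differ only in the function applied to the weighted input z = w·x + b; a minimal sketch in Python/NumPy:

```python
# Minimal sketch of the three neuron types listed above, each applied to the
# weighted input z = w . x + b.
import numpy as np

def linear_neuron(z):
    # Linear neuron: output is the weighted sum itself.
    return z

def logistic_neuron(z):
    # Logistic (sigmoid) neuron: squashes z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_neuron(z):
    # Perceptron: hard threshold on the weighted sum.
    return np.where(z >= 0, 1, -1)

z = np.array([-2.0, 0.0, 2.0])
print(linear_neuron(z), logistic_neuron(z), perceptron_neuron(z))
```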
Generalized linear models
Sigmoid
[Figure: three sigmoid curves plotted for x in [-6, 6] with outputs in [0, 1], for the parameter settings w0=0, w1=1; w0=2, w1=1; w0=0, w1=0.5]
Slide Credit: Carlos Guestrin; (C) Dhruv Batra 11
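A short sketch (assuming the plotted function is σ(w0 + w1·x)) that evaluates the three parameter settings from the figure; the bias w0 shifts the curve and w1 controls its steepness:

```python
# Evaluate the logistic output sigma(w0 + w1 * x) for the three plotted settings.
import numpy as np

def sigmoid(x, w0, w1):
    # w0 shifts the curve left/right, w1 controls its steepness.
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

x = np.linspace(-6, 6, 7)
for w0, w1 in [(0, 1), (2, 1), (0, 0.5)]:
    print(f"w0={w0}, w1={w1}:", np.round(sigmoid(x, w0, w1), 2))
```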
Case 1: Linearly separable inputs
The Perceptron algorithm
The Perceptron criterion
Not monotonic: an update fixes one misclassified point, but the total perceptron error need not decrease.
The perceptron convergence theorem
Example
The first learning machine
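A minimal sketch of the perceptron algorithm on linearly separable toy data (the data and learning rate η are illustrative assumptions): each misclassified point triggers the update w ← w + η·t·x, and by the convergence theorem the loop terminates when the data are separable:

```python
# Perceptron learning rule on linearly separable toy data.
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated clusters, labels t in {-1, +1}; inputs get a bias feature.
X = np.vstack([rng.normal(+2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
t = np.hstack([np.ones(20), -np.ones(20)])
X = np.hstack([X, np.ones((40, 1))])    # append 1 for the bias weight

w, eta = np.zeros(3), 1.0
for epoch in range(100):
    errors = 0
    for x_n, t_n in zip(X, t):
        if t_n * (w @ x_n) <= 0:        # misclassified (perceptron criterion)
            w += eta * t_n * x_n        # perceptron update
            errors += 1
    if errors == 0:                     # convergence theorem: finitely many
        break                           # updates if the data are separable
print("converged after", epoch, "epochs, w =", w)
```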
Practical limitations
Generalization to inputs that are not linearly separable
Generalization to multiclass problems
K > 2 classes
1-of-K coding scheme
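For concreteness, a small sketch of the 1-of-K coding scheme: class k among K classes is represented by a length-K target vector that is 1 in position k and 0 elsewhere (the helper name is illustrative):

```python
# 1-of-K (one-hot) coding of integer class labels.
import numpy as np

def one_hot(labels, K):
    # labels: integer class indices in {0, ..., K-1}
    T = np.zeros((len(labels), K))
    T[np.arange(len(labels)), labels] = 1.0
    return T

print(one_hot([0, 2, 1], K=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```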
Computational limitations of perceptrons
Limitation
• A single “neuron” is still a linear decision boundary
• What to do?
• Idea: Stack a bunch of them together!
(C) Dhruv Batra 32
Multilayer Networks
• Cascade neurons together
• The output from one layer is the input to the next
• Each layer has its own set of weights
Slide Credit: HKUST
Multi-layer perceptrons
Universal Function Approximators
• Theorem
– A 3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi ’89]
(C) Dhruv Batra 35
Feed-Forward Networks
• Predictions are fed forward through the network to classify
Slide Credit: HKUST
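A minimal sketch of one feed-forward prediction pass, assuming a two-layer network with sigmoid units (the sizes and random weights are illustrative, not the slides' example):

```python
# One forward pass through a small two-layer feed-forward network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # input (2) -> hidden (4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden (4) -> output (1)

def forward(x):
    h = sigmoid(W1 @ x + b1)        # hidden layer activations
    y = sigmoid(W2 @ h + b2)        # output probability for class 1
    return y

x = np.array([0.5, -1.0])
print("P(class 1 | x) =", forward(x))
```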
Implementing logical relations
XOR problem
Combining two linear classifiers
Two-layer perceptron
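A sketch of the idea with hand-picked weights (assumed threshold units): the two hidden units are the two linear classifiers (OR and NAND of the inputs), and the output unit combines them with an AND, which solves XOR:

```python
# Two-layer perceptron with fixed weights implementing XOR.
import numpy as np

def step(z):
    return (z >= 0).astype(int)

def two_layer_xor(x1, x2):
    x = np.array([x1, x2])
    h1 = step(x @ np.array([1, 1]) - 0.5)    # OR:   x1 + x2 >= 0.5
    h2 = step(x @ np.array([-1, -1]) + 1.5)  # NAND: x1 + x2 <= 1.5
    return step(h1 + h2 - 1.5)               # AND of the two hidden units

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", two_layer_xor(a, b))
```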
Higher dimensions
Two-layer perceptron
Limitations of a two-layer perceptron
Three-layer perceptron
Learning parameters – Training data
Choosing an activation function
Smooth activation function
Output: Two classes
Output: K > 2 classes
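A brief sketch of the two output choices, assuming a single sigmoid unit for two classes and a softmax over K units for K > 2 classes:

```python
# Output-layer choices: sigmoid for two classes, softmax for K > 2 classes.
import numpy as np

def sigmoid_output(z):
    # Two classes: one output giving P(class 1); P(class 0) = 1 - P(class 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax_output(z):
    # K > 2 classes: K outputs normalized into a probability distribution.
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

print(sigmoid_output(0.3))
print(softmax_output(np.array([2.0, 1.0, 0.1])))
```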
Non-convex
Backpropagation algorithm
Notation
Input
Notation
Cost function
Gradient descent
Backpropagation: Output layer
Backpropagation: Hidden layers
Backpropagation: Summary of the algorithm
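A compact sketch of the algorithm for a two-layer network, assuming sigmoid units and a squared-error cost (notation and sizes are illustrative, not the slides' exact derivation): a forward pass, output- and hidden-layer deltas, then gradient-descent updates of both weight matrices:

```python
# Backpropagation for a two-layer network with sigmoid units and squared error.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))              # 8 training inputs, 3 features
T = rng.integers(0, 2, size=(8, 1))      # binary targets
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 1))
eta = 0.5

for step in range(200):
    # Forward pass
    H = sigmoid(X @ W1)                      # hidden activations
    Y = sigmoid(H @ W2)                      # outputs
    # Backward pass: deltas for the output and hidden layers
    delta2 = (Y - T) * Y * (1 - Y)           # output-layer error
    delta1 = (delta2 @ W2.T) * H * (1 - H)   # error propagated to hidden layer
    # Gradient-descent updates
    W2 -= eta * H.T @ delta2
    W1 -= eta * X.T @ delta1

print("final squared error:", float(np.sum((Y - T) ** 2)))
```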
Batch versus online learning
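A small sketch contrasting the two regimes on an assumed linear least-squares model: batch learning takes one step per epoch using the gradient over all examples, while online (stochastic) learning updates after every example:

```python
# Batch vs. online gradient descent for a linear least-squares model.
import numpy as np

rng = np.random.default_rng(0)
X, t = rng.normal(size=(100, 2)), rng.normal(size=100)
w_batch, w_online, eta = np.zeros(2), np.zeros(2), 0.01

for epoch in range(50):
    # Batch: one update per epoch from the gradient over the whole training set
    grad = X.T @ (X @ w_batch - t) / len(X)
    w_batch -= eta * grad
    # Online: one update per training example
    for x_n, t_n in zip(X, t):
        w_online -= eta * (x_n @ w_online - t_n) * x_n

print("batch w:", w_batch, " online w:", w_online)
```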
Neural Nets
• Best performers on OCR
– http://yann.lecun.com/exdb/lenet/index.html
• NetTalk
– Text to Speech system from 1987
– http://youtu.be/tXMaFhO6dIY?t=45m15s
• Rick Rashid speaks Mandarin
– http://youtu.be/Nu-nlQqFCKg?t=7m30s
(C) Dhruv Batra 80
Convergence of backprop
• Perceptron leads to a convex optimization
– Gradient descent reaches the global minimum
• Multilayer neural nets are not convex
– Gradient descent gets stuck in local minima
– Hard to set the learning rate
– Selecting the number of hidden units and layers is a fuzzy process
– NNs had fallen out of fashion in the 90s and early 2000s
– Back with a new name and significantly improved performance!
• Deep networks
– Dropout and training on much larger corpora
Slide Credit: Carlos Guestrin; (C) Dhruv Batra 81
Convolutional Nets
• Example:
– http://yann.lecun.com/exdb/lenet/index.html
(C) Dhruv Batra 82
[Figure: LeNet-5 architecture – INPUT 32x32 → convolutions → C1: feature maps 6@28x28 → subsampling → S2: feature maps 6@14x14 → convolutions → C3: feature maps 16@10x10 → subsampling → S4: feature maps 16@5x5 → C5: layer of 120 units → full connection → F6: layer of 84 units → Gaussian connections → OUTPUT: 10]
Image Credit: Yann LeCun, Kevin Murphy
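A rough sketch (not LeCun's code) tracing how the feature-map sizes in the LeNet-5 figure arise: each 5x5 valid convolution shrinks the side length by 4, and each 2x2 subsampling halves it:

```python
# Trace the spatial sizes of the LeNet-5 feature maps shown in the figure.
def conv(size, kernel=5):       # valid (no padding) convolution
    return size - kernel + 1

def subsample(size, factor=2):  # 2x2 subsampling / pooling
    return size // factor

s = 32                          # INPUT 32x32
s = conv(s)                     # C1: 6 maps of 28x28
s = subsample(s)                # S2: 6 maps of 14x14
s = conv(s)                     # C3: 16 maps of 10x10
s = subsample(s)                # S4: 16 maps of 5x5
s = conv(s)                     # C5: a 5x5 conv collapses each map to 1x1
print("spatial size after C5:", s)   # -> 1; then F6 (84 units) and OUTPUT (10)
```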
(C) Dhruv Batra 83–85; Slide Credit: Marc'Aurelio Ranzato
Visualizing Learned Filters
(C) Dhruv Batra 86–88; Figure Credit: [Zeiler & Fergus ECCV14]