VBM683
Machine Learning
Pinar Duygulu
Slides are adapted from Dhruv Batra (Virginia Tech) and J. Elder
New Topic: Neural Networks
(C) Dhruv Batra 2
Two class discriminant function
Perceptron
Synonyms
• Neural Networks
• Artificial Neural Network (ANN)
• Feed-forward Networks
• Multilayer Perceptrons (MLP)
• Types of ANN
– Convolutional Nets
– Autoencoders
– Recurrent Neural Nets
• [Back with a new name]: Deep Nets / Deep Learning
(C) Dhruv Batra 6
Biological Neuron
(C) Dhruv Batra 7
The Neuron Metaphor
• Neurons
– accept information from multiple inputs,
– transmit information to other neurons.
• Multiply inputs by weights along edges
• Apply some function to the set of inputs at each node
Slide Credit: HKUST
Types of Neurons
Linear Neuron
Logistic Neuron
Perceptron
Potentially more; each requires a convex loss function for gradient-descent training.
Slide Credit: HKUST
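As a rough illustration (not from the slides), the three neuron types above differ only in the function applied to the weighted input z = w·x + b; a minimal sketch in Python/NumPy:

```python
# Minimal sketch of the three neuron types listed above, each applied to the
# weighted input z = w . x + b.
import numpy as np

def linear_neuron(z):
    # Linear neuron: output is the weighted sum itself.
    return z

def logistic_neuron(z):
    # Logistic (sigmoid) neuron: squashes z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_neuron(z):
    # Perceptron: hard threshold on the weighted sum.
    return np.where(z >= 0, 1, -1)

z = np.array([-2.0, 0.0, 2.0])
print(linear_neuron(z), logistic_neuron(z), perceptron_neuron(z))
```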
Generalized linear models
Sigmoid
[Figure: three sigmoid curves plotted for x in [-6, 6] with outputs in [0, 1], for the parameter settings w0=0, w1=1; w0=2, w1=1; w0=0, w1=0.5]
Slide Credit: Carlos Guestrin; (C) Dhruv Batra 11
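A short sketch (assuming the plotted function is σ(w0 + w1·x)) that evaluates the three parameter settings from the figure; the bias w0 shifts the curve and w1 controls its steepness:

```python
# Evaluate the logistic output sigma(w0 + w1 * x) for the three plotted settings.
import numpy as np

def sigmoid(x, w0, w1):
    # w0 shifts the curve left/right, w1 controls its steepness.
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

x = np.linspace(-6, 6, 7)
for w0, w1 in [(0, 1), (2, 1), (0, 0.5)]:
    print(f"w0={w0}, w1={w1}:", np.round(sigmoid(x, w0, w1), 2))
```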
Case 1: Linearly separable inputs
The Perceptron algorithm
The Perceptron criterion
Not monotonic: an update fixes one misclassified point, but the total perceptron error need not decrease.
The perceptron convergence theorem
Example
The first learning machine
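A minimal sketch of the perceptron algorithm on linearly separable toy data (the data and learning rate η are illustrative assumptions): each misclassified point triggers the update w ← w + η·t·x, and by the convergence theorem the loop terminates when the data are separable:

```python
# Perceptron learning rule on linearly separable toy data.
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated clusters, labels t in {-1, +1}; inputs get a bias feature.
X = np.vstack([rng.normal(+2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
t = np.hstack([np.ones(20), -np.ones(20)])
X = np.hstack([X, np.ones((40, 1))])    # append 1 for the bias weight

w, eta = np.zeros(3), 1.0
for epoch in range(100):
    errors = 0
    for x_n, t_n in zip(X, t):
        if t_n * (w @ x_n) <= 0:        # misclassified (perceptron criterion)
            w += eta * t_n * x_n        # perceptron update
            errors += 1
    if errors == 0:                     # convergence theorem: finitely many
        break                           # updates if the data are separable
print("converged after", epoch, "epochs, w =", w)
```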
Practical limitations
Generalization to inputs that are not linearly separable
Generalization to multiclass problems
K > 2 classes
1-of-K coding scheme
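For concreteness, a small sketch of the 1-of-K coding scheme: class k among K classes is represented by a length-K target vector that is 1 in position k and 0 elsewhere (the helper name is illustrative):

```python
# 1-of-K (one-hot) coding of integer class labels.
import numpy as np

def one_hot(labels, K):
    # labels: integer class indices in {0, ..., K-1}
    T = np.zeros((len(labels), K))
    T[np.arange(len(labels)), labels] = 1.0
    return T

print(one_hot([0, 2, 1], K=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```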
Computational limitations of perceptrons
Limitation
• A single “neuron” is still a linear decision boundary
• What to do?
• Idea: Stack a bunch of them together!
(C) Dhruv Batra 32
Multilayer Networks
• Cascade neurons together
• The output from one layer is the input to the next
• Each layer has its own set of weights
Slide Credit: HKUST
Multi-layer perceptrons
Universal Function Approximators
• Theorem
– A 3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi ’89]
(C) Dhruv Batra 35
Feed-Forward Networks
• Predictions are fed forward through the network to classify
Slide Credit: HKUST
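A minimal sketch of one feed-forward prediction pass, assuming a two-layer network with sigmoid units (the sizes and random weights are illustrative, not the slides' example):

```python
# One forward pass through a small two-layer feed-forward network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # input (2) -> hidden (4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden (4) -> output (1)

def forward(x):
    h = sigmoid(W1 @ x + b1)        # hidden layer activations
    y = sigmoid(W2 @ h + b2)        # output probability for class 1
    return y

x = np.array([0.5, -1.0])
print("P(class 1 | x) =", forward(x))
```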
Implementing logical relations
XOR problem
Combining two linear classifiers
Two-layer perceptron
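A sketch of the idea with hand-picked weights (assumed threshold units): the two hidden units are the two linear classifiers (OR and NAND of the inputs), and the output unit combines them with an AND, which solves XOR:

```python
# Two-layer perceptron with fixed weights implementing XOR.
import numpy as np

def step(z):
    return (z >= 0).astype(int)

def two_layer_xor(x1, x2):
    x = np.array([x1, x2])
    h1 = step(x @ np.array([1, 1]) - 0.5)    # OR:   x1 + x2 >= 0.5
    h2 = step(x @ np.array([-1, -1]) + 1.5)  # NAND: x1 + x2 <= 1.5
    return step(h1 + h2 - 1.5)               # AND of the two hidden units

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", two_layer_xor(a, b))
```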
Higher dimensions
Two-layer perceptron
Limitations of a two-layer perceptron
Three-layer perceptron
Learning parameters – Training data
Choosing an activation function
Smooth activation function
Output: Two classes
Output: K > 2 classes
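A brief sketch of the two output choices, assuming a single sigmoid unit for two classes and a softmax over K units for K > 2 classes:

```python
# Output-layer choices: sigmoid for two classes, softmax for K > 2 classes.
import numpy as np

def sigmoid_output(z):
    # Two classes: one output giving P(class 1); P(class 0) = 1 - P(class 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax_output(z):
    # K > 2 classes: K outputs normalized into a probability distribution.
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

print(sigmoid_output(0.3))
print(softmax_output(np.array([2.0, 1.0, 0.1])))
```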
Non-convex
Backpropagation algorithm
Notation
Input
Notation
Cost function
Gradient descent
Backpropagation: Output layer
Backpropagation: Hidden layers
Backpropagation: Summary of the algorithm
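A compact sketch of the algorithm for a two-layer network, assuming sigmoid units and a squared-error cost (notation and sizes are illustrative, not the slides' exact derivation): a forward pass, output- and hidden-layer deltas, then gradient-descent updates of both weight matrices:

```python
# Backpropagation for a two-layer network with sigmoid units and squared error.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))              # 8 training inputs, 3 features
T = rng.integers(0, 2, size=(8, 1))      # binary targets
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 1))
eta = 0.5

for step in range(200):
    # Forward pass
    H = sigmoid(X @ W1)                      # hidden activations
    Y = sigmoid(H @ W2)                      # outputs
    # Backward pass: deltas for the output and hidden layers
    delta2 = (Y - T) * Y * (1 - Y)           # output-layer error
    delta1 = (delta2 @ W2.T) * H * (1 - H)   # error propagated to hidden layer
    # Gradient-descent updates
    W2 -= eta * H.T @ delta2
    W1 -= eta * X.T @ delta1

print("final squared error:", float(np.sum((Y - T) ** 2)))
```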
Batch versus online learning
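A small sketch contrasting the two regimes on an assumed linear least-squares model: batch learning takes one step per epoch using the gradient over all examples, while online (stochastic) learning updates after every example:

```python
# Batch vs. online gradient descent for a linear least-squares model.
import numpy as np

rng = np.random.default_rng(0)
X, t = rng.normal(size=(100, 2)), rng.normal(size=100)
w_batch, w_online, eta = np.zeros(2), np.zeros(2), 0.01

for epoch in range(50):
    # Batch: one update per epoch from the gradient over the whole training set
    grad = X.T @ (X @ w_batch - t) / len(X)
    w_batch -= eta * grad
    # Online: one update per training example
    for x_n, t_n in zip(X, t):
        w_online -= eta * (x_n @ w_online - t_n) * x_n

print("batch w:", w_batch, " online w:", w_online)
```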
Neural Nets
• Best performers on OCR
– http://yann.lecun.com/exdb/lenet/index.html
• NetTalk
– Text to Speech system from 1987
– http://youtu.be/tXMaFhO6dIY?t=45m15s
• Rick Rashid speaks Mandarin
– http://youtu.be/Nu-nlQqFCKg?t=7m30s
(C) Dhruv Batra 80
Convergence of backprop
• Perceptron leads to a convex optimization
– Gradient descent reaches the global minimum
• Multilayer neural nets are not convex
– Gradient descent gets stuck in local minima
– Hard to set the learning rate
– Selecting the number of hidden units and layers is a fuzzy process
– NNs had fallen out of fashion in the 90s and early 2000s
– Back with a new name and significantly improved performance!
• Deep networks
– Dropout and training on much larger corpora
Slide Credit: Carlos Guestrin; (C) Dhruv Batra 81
Convolutional Nets
• Example:
– http://yann.lecun.com/exdb/lenet/index.html
(C) Dhruv Batra 82
[Figure: LeNet-5 architecture – INPUT 32x32 → convolutions → C1: feature maps 6@28x28 → subsampling → S2: feature maps 6@14x14 → convolutions → C3: feature maps 16@10x10 → subsampling → S4: feature maps 16@5x5 → C5: layer of 120 units → full connection → F6: layer of 84 units → Gaussian connections → OUTPUT: 10]
Image Credit: Yann LeCun, Kevin Murphy
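A rough sketch (not LeCun's code) tracing how the feature-map sizes in the LeNet-5 figure arise: each 5x5 valid convolution shrinks the side length by 4, and each 2x2 subsampling halves it:

```python
# Trace the spatial sizes of the LeNet-5 feature maps shown in the figure.
def conv(size, kernel=5):       # valid (no padding) convolution
    return size - kernel + 1

def subsample(size, factor=2):  # 2x2 subsampling / pooling
    return size // factor

s = 32                          # INPUT 32x32
s = conv(s)                     # C1: 6 maps of 28x28
s = subsample(s)                # S2: 6 maps of 14x14
s = conv(s)                     # C3: 16 maps of 10x10
s = subsample(s)                # S4: 16 maps of 5x5
s = conv(s)                     # C5: a 5x5 conv collapses each map to 1x1
print("spatial size after C5:", s)   # -> 1; then F6 (84 units) and OUTPUT (10)
```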
(C) Dhruv Batra 83–85; Slide Credit: Marc'Aurelio Ranzato
Visualizing Learned Filters
(C) Dhruv Batra 86–88; Figure Credit: [Zeiler & Fergus ECCV14]