VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra (Virginia Tech), J. Elder

Page 1: VBM683 Machine Learning

VBM683

Machine Learning

Pinar Duygulu

Slides are adapted from

Dhruv Batra (Virginia Tech),

J. Elder

Page 2: VBM683 Machine Learning

New Topic: Neural Networks

(C) Dhruv Batra 2

Page 3: VBM683 Machine Learning

Two class discriminant function

Page 4: VBM683 Machine Learning

Two class discriminant function

Page 5: VBM683 Machine Learning

Perceptron

Page 6: VBM683 Machine Learning

Synonyms

• Neural Networks

• Artificial Neural Network (ANN)

• Feed-forward Networks

• Multilayer Perceptrons (MLP)

• Types of ANN

– Convolutional Nets

– Autoencoders

– Recurrent Neural Nets

• [Back with a new name]: Deep Nets / Deep Learning

(C) Dhruv Batra 6

Page 7: VBM683 Machine Learning

Biological Neuron

(C) Dhruv Batra 7

Page 8: VBM683 Machine Learning

The Neuron Metaphor

• Neurons

– accept information from multiple inputs,

– transmit information to other neurons.

• Multiply inputs by weights along edges

• Apply some function to the set of inputs at each node

Slide Credit: HKUST
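A minimal sketch of the neuron metaphor above: multiply each input by the weight on its edge, sum, and apply a function at the node. The function name `neuron`, the bias term, and the choice of `math.tanh` as the node function are illustrative assumptions, not taken from the slides.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

# Two inputs whose weighted contributions cancel: 0.5*1.0 + (-0.25)*2.0 = 0.
out_zero = neuron([1.0, 2.0], [0.5, -0.25], 0.0, math.tanh)

# The same neuron with a bias of 1.0 shifts the pre-activation to 1.0.
out_shifted = neuron([1.0, 2.0], [0.5, -0.25], 1.0, math.tanh)
```

Passing the activation as an argument keeps the weighted-sum step separate from the choice of node function.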

Page 9: VBM683 Machine Learning

Types of Neurons

Linear Neuron

Logistic Neuron

Perceptron

Potentially more. Gradient descent training requires a differentiable loss function; a convex loss additionally ensures that any minimum found is global.

Slide Credit: HKUST
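The three neuron types differ only in the function applied to the pre-activation z (the weighted sum of inputs). A short sketch, assuming the perceptron outputs +1 or -1 (a 0/1 convention is equally common) and that the function names are hypothetical:

```python
import math

def linear_neuron(z):
    """Linear neuron: the output is the weighted sum itself."""
    return z

def logistic_neuron(z):
    """Logistic neuron: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_neuron(z):
    """Perceptron: hard threshold on the weighted sum (assumed +1/-1 outputs)."""
    return 1 if z >= 0 else -1

for z in (-2.0, 0.0, 2.0):
    print(z, linear_neuron(z), logistic_neuron(z), perceptron_neuron(z))
```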

Page 10: VBM683 Machine Learning

Generalized linear models

Page 11: VBM683 Machine Learning

Sigmoid

[Figure: three sigmoid plots, x from -6 to 6, y from 0 to 1. Panels: w0=0, w1=1; w0=2, w1=1; w0=0, w1=0.5]

Slide Credit: Carlos Guestrin

(C) Dhruv Batra 11
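The effect of the two parameters can be checked numerically. This sketch assumes the plotted function is the logistic sigmoid of w0 + w1·x, which is the usual reading of such panels; under that assumption w0 shifts the curve along the x axis and w1 controls its steepness.

```python
import math

def sigmoid(x, w0, w1):
    """Logistic sigmoid of the linear function w0 + w1 * x."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

# The three parameter settings listed for the panels.
settings = [(0.0, 1.0), (2.0, 1.0), (0.0, 0.5)]

for w0, w1 in settings:
    midpoint = -w0 / w1            # x where the curve crosses 0.5
    slope_at_midpoint = w1 / 4.0   # derivative of the sigmoid at its midpoint
    print(w0, w1, midpoint, slope_at_midpoint, sigmoid(midpoint, w0, w1))
```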

Page 12: VBM683 Machine Learning

Case 1: Linearly separable inputs

Page 13: VBM683 Machine Learning

The Perceptron algorithm

Page 14: VBM683 Machine Learning

The Perceptron algorithm

Page 15: VBM683 Machine Learning

The Perceptron criterion

Page 16: VBM683 Machine Learning

The Perceptron criterion

Page 17: VBM683 Machine Learning

The Perceptron algorithm

Page 18: VBM683 Machine Learning

The Perceptron algorithm

Page 19: VBM683 Machine Learning

The Perceptron algorithm
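A minimal sketch of the classic perceptron update rule in its standard textbook form, not necessarily the exact variant on these slides. It assumes labels of +1 and -1, a bias folded in as a constant first feature, and a learning rate `eta`; the function names and the toy AND data are illustrative.

```python
def train_perceptron(samples, labels, eta=1.0, max_epochs=100):
    """Return weights (bias first) learned by cycling through the data."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, labels):
            phi = [1.0] + list(x)                       # constant feature for the bias
            a = sum(wi * pi for wi, pi in zip(w, phi))
            if a * t <= 0:                              # misclassified or on the boundary
                w = [wi + eta * t * pi for wi, pi in zip(w, phi)]
                mistakes += 1
        if mistakes == 0:                               # a full pass with no updates
            break
    return w

def predict(w, x):
    a = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if a >= 0 else -1

# Linearly separable toy problem: logical AND with +1/-1 labels.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [-1, -1, -1, 1]
w = train_perceptron(X, T)
predictions = [predict(w, x) for x in X]
```

Because the AND data are linearly separable, the loop stops after a finite number of updates, as the perceptron convergence theorem states. Individual updates can still increase the number of misclassified points along the way.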

Page 20: VBM683 Machine Learning

Not monotonic

Page 21: VBM683 Machine Learning

The perceptron convergence theorem

Page 22: VBM683 Machine Learning

Example

Page 23: VBM683 Machine Learning

The first learning machine

Page 24: VBM683 Machine Learning

Practical limitations

Page 25: VBM683 Machine Learning

Generalization to inputs that are not linearly separable

Page 26: VBM683 Machine Learning

Generalization to multiclass problems

Page 27: VBM683 Machine Learning

K>2 classes

Page 28: VBM683 Machine Learning

K>2 classes

Page 29: VBM683 Machine Learning

K>2 classes

Page 30: VBM683 Machine Learning

1-of-K coding scheme
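In a 1-of-K coding scheme, a class label is represented by a length-K target vector with a single 1 in the position of the true class and 0 elsewhere. A short sketch, assuming classes are numbered from 0 (the function name `one_of_k` is hypothetical):

```python
def one_of_k(label, num_classes):
    """Encode an integer class label as a 1-of-K target vector."""
    target = [0] * num_classes
    target[label] = 1
    return target

targets = [one_of_k(k, 4) for k in (2, 0, 3)]
```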

Page 31: VBM683 Machine Learning

Computational limitations of perceptrons

Page 32: VBM683 Machine Learning

Limitation

• A single “neuron” still produces only a linear decision boundary

• What to do?

• Idea: Stack a bunch of them together!

(C) Dhruv Batra 32

Page 33: VBM683 Machine Learning

Multilayer Networks

• Cascade neurons together

• The output from one layer is the input to the next

• Each layer has its own set of weights

Slide Credit: HKUST
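The cascade described above can be sketched as a loop over layers, where each layer's output list becomes the next layer's input. The layer sizes, weights, and the use of logistic hidden units with a linear output are arbitrary choices for illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    return z

def layer_forward(inputs, weights, biases, activation):
    """One layer: each row of weights defines one neuron."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def network_forward(x, layers):
    """Feed x through the layers in order; each layer has its own weights."""
    for weights, biases, activation in layers:
        x = layer_forward(x, weights, biases, activation)
    return x

layers = [
    # 2 inputs -> 3 hidden units (logistic)
    ([[1.0, -1.0], [0.5, 0.5], [0.0, 0.0]], [0.0, 0.0, 0.0], logistic),
    # 3 hidden units -> 1 output (linear)
    ([[1.0, 1.0, 1.0]], [0.0], identity),
]
y = network_forward([1.0, 1.0], layers)
```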

Page 34: VBM683 Machine Learning

Multi-layer perceptrons

Page 35: VBM683 Machine Learning

Universal Function Approximators

• Theorem

– A 3-layer network (input, one hidden layer, linear outputs) can uniformly approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units [Funahashi ’89]

(C) Dhruv Batra 35

Page 36: VBM683 Machine Learning

Feed-Forward Networks

• Inputs are fed forward through the network, layer by layer, to produce a classification

Slide Credit: HKUST


Page 42: VBM683 Machine Learning

Implementing logical relations

Page 43: VBM683 Machine Learning

XOR problem

Page 44: VBM683 Machine Learning

Combining two linear classifiers

Page 45: VBM683 Machine Learning

Combining two linear classifiers

Page 46: VBM683 Machine Learning

Combining two linear classifiers

Page 47: VBM683 Machine Learning

Two-layer perceptron

Page 48: VBM683 Machine Learning

Two-layer perceptron

Page 49: VBM683 Machine Learning

Two-layer perceptron

Page 50: VBM683 Machine Learning

Two-layer perceptron
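XOR is not linearly separable, but it can be written as the AND of two linear classifiers. The sketch below is one hand-picked construction, assuming 0/1 step units; the specific weights (OR and NAND in the hidden layer, AND at the output) are an illustrative choice and may differ from those on the slides.

```python
def step(a):
    """Hard threshold unit with 0/1 output."""
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # first linear classifier: OR
    h2 = step(-x1 - x2 + 1.5)    # second linear classifier: NAND
    return step(h1 + h2 - 1.5)   # output unit: AND of the two hidden units

table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden layer maps the four input points to a space where the two classes become linearly separable, which is what the output unit exploits.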

Page 51: VBM683 Machine Learning

Higher dimensions

Page 52: VBM683 Machine Learning

Higher dimensions

Page 53: VBM683 Machine Learning

Two-layer perceptron

Page 54: VBM683 Machine Learning

Two-layer perceptron

Page 55: VBM683 Machine Learning

Limitations of a two-layer perceptron

Page 56: VBM683 Machine Learning

Three-layer perceptron

Page 57: VBM683 Machine Learning

Three-layer perceptron

Page 58: VBM683 Machine Learning

Three-layer perceptron

Page 59: VBM683 Machine Learning

Learning parameters – Training data

Page 60: VBM683 Machine Learning

Choosing an activation function

Page 61: VBM683 Machine Learning

Smooth activation function

Page 62: VBM683 Machine Learning

Output: Two classes

Page 63: VBM683 Machine Learning

Output: K>2 classes
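A common output choice for K>2 classes is the softmax function, which turns K output activations into class probabilities that sum to 1. The sketch assumes this is the output function intended here; subtracting the maximum activation is a standard numerical-stability step and does not change the result.

```python
import math

def softmax(activations):
    """Map K real-valued activations to K probabilities summing to 1."""
    shift = max(activations)                        # avoids overflow in exp
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
uniform = softmax([0.0, 0.0])
```

With two equal activations the result is [0.5, 0.5], which matches what a single logistic output gives at a pre-activation of zero in the two-class case.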

Page 64: VBM683 Machine Learning

Non-convex

Page 65: VBM683 Machine Learning

Backpropagation algorithm

Page 66: VBM683 Machine Learning

Notation

Page 67: VBM683 Machine Learning

Notation

Page 68: VBM683 Machine Learning

Input

Page 69: VBM683 Machine Learning

Notation

Page 70: VBM683 Machine Learning

Cost function

Page 71: VBM683 Machine Learning

Cost function

Page 72: VBM683 Machine Learning

Gradient descent

Page 73: VBM683 Machine Learning

Gradient descent

Page 74: VBM683 Machine Learning

Gradient descent

Page 75: VBM683 Machine Learning

Backpropagation: Output layer

Page 76: VBM683 Machine Learning

Backpropagation: Hidden layers

Page 77: VBM683 Machine Learning

Backpropagation: Summary of the algorithm
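A compact sketch of backpropagation for one training example. It assumes a network with one hidden layer of sigmoid units, a single linear output, no bias terms, and a squared-error cost; these choices, the variable names, and the toy numbers are assumptions for illustration. The finite-difference check at the end compares one backpropagated gradient entry with a numerical estimate.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, W2):
    """Return hidden activations z and the scalar output y."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * zj for w, zj in zip(W2, z))
    return z, y

def loss(x, t, W1, W2):
    _, y = forward(x, W1, W2)
    return 0.5 * (y - t) ** 2

def backprop(x, t, W1, W2):
    z, y = forward(x, W1, W2)
    delta_out = y - t                                # output-layer error
    grad_W2 = [delta_out * zj for zj in z]
    # Hidden-layer errors: output error sent back through W2,
    # scaled by the sigmoid derivative z * (1 - z).
    delta_hidden = [zj * (1.0 - zj) * w * delta_out for zj, w in zip(z, W2)]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    return grad_W1, grad_W2

x, t = [1.0, -2.0], 0.5
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [0.7, -0.5]
grad_W1, grad_W2 = backprop(x, t, W1, W2)

# Central finite-difference estimate for the entry W1[0][1].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus[0][1] -= eps
numeric = (loss(x, t, W1_plus, W2) - loss(x, t, W1_minus, W2)) / (2 * eps)
```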

Page 78: VBM683 Machine Learning

Batch versus online learning

Page 79: VBM683 Machine Learning

Batch versus online learning
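The difference between the two modes is when the weights are updated: batch learning sums the gradient over all training examples before each update, while online learning updates after every example. A minimal sketch on a one-weight linear neuron with squared error; the data, learning rate, and function names are illustrative assumptions.

```python
def batch_epoch(w, data, eta):
    """One pass: accumulate the gradient over all examples, then update once."""
    grad = sum((w * x - t) * x for x, t in data)
    return w - eta * grad

def online_epoch(w, data, eta):
    """One pass: update the weight after each individual example."""
    for x, t in data:
        w = w - eta * (w * x - t) * x
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # targets follow t = 2 * x
w_batch = 0.0
w_online = 0.0
for _ in range(200):
    w_batch = batch_epoch(w_batch, data, eta=0.01)
    w_online = online_epoch(w_online, data, eta=0.01)
```

On this noise-free toy problem both modes approach the same weight, w = 2; on real data the online updates follow a noisier path.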

Page 80: VBM683 Machine Learning

Neural Nets

• Best performers on OCR

– http://yann.lecun.com/exdb/lenet/index.html

• NetTalk

– Text to Speech system from 1987

– http://youtu.be/tXMaFhO6dIY?t=45m15s

• Rick Rashid speaks Mandarin

– http://youtu.be/Nu-nlQqFCKg?t=7m30s

(C) Dhruv Batra 80

Page 81: VBM683 Machine Learning

Convergence of backprop

• Perceptron leads to convex optimization

– Gradient descent reaches the global minimum

• Multilayer neural nets are not convex

– Gradient descent can get stuck in local minima

– Hard to set the learning rate

– Selecting the number of hidden units and layers is a fuzzy process

– NNs fell out of fashion in the 1990s and early 2000s

– Back with a new name and significantly improved performance!

• Deep networks

– Use dropout and are trained on much larger corpora

Slide Credit: Carlos Guestrin

(C) Dhruv Batra 81

Page 82: VBM683 Machine Learning

Convolutional Nets

• Example:

– http://yann.lecun.com/exdb/lenet/index.html

(C) Dhruv Batra 82

[Figure: LeNet architecture. INPUT 32x32 → (convolutions) C1: feature maps 6@28x28 → (subsampling) S2: feature maps 6@14x14 → (convolutions) C3: feature maps 16@10x10 → (subsampling) S4: feature maps 16@5x5 → (full connection) C5: layer 120 → (full connection) F6: layer 84 → (Gaussian connections) OUTPUT 10]

Image Credit: Yann LeCun, Kevin Murphy
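The feature-map sizes in the LeNet figure follow from simple arithmetic. This sketch assumes 5x5 convolution kernels without padding and 2x2 subsampling; neither value is stated in the figure labels, but both reproduce the listed sizes.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def subsample_out(size, factor):
    """Output width after non-overlapping subsampling."""
    return size // factor

c1 = conv_out(32, 5)        # C1: 6 maps of 28x28
s2 = subsample_out(c1, 2)   # S2: 6 maps of 14x14
c3 = conv_out(s2, 5)        # C3: 16 maps of 10x10
s4 = subsample_out(c3, 2)   # S4: 16 maps of 5x5
c5 = conv_out(s4, 5)        # C5: 120 maps of 1x1, i.e. a layer of 120 units
sizes = [c1, s2, c3, s4, c5]
```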

Page 83: VBM683 Machine Learning

(C) Dhruv Batra 83

Slide Credit: Marc'Aurelio Ranzato

Page 84: VBM683 Machine Learning

(C) Dhruv Batra 84

Slide Credit: Marc'Aurelio Ranzato

Page 85: VBM683 Machine Learning

(C) Dhruv Batra 85

Slide Credit: Marc'Aurelio Ranzato

Page 86: VBM683 Machine Learning

Visualizing Learned Filters

(C) Dhruv Batra 86

Figure Credit: [Zeiler & Fergus ECCV14]

Page 87: VBM683 Machine Learning

Visualizing Learned Filters

(C) Dhruv Batra 87

Figure Credit: [Zeiler & Fergus ECCV14]

Page 88: VBM683 Machine Learning

Visualizing Learned Filters

(C) Dhruv Batra 88

Figure Credit: [Zeiler & Fergus ECCV14]