Learning neural networks with hidden layers · STAT/CSE 416: Intro to Machine Learning · ©2017 Emily Fox · 2018-05-31


Page 1

STAT/CSE 416: Intro to Machine Learning

Learning neural networks with hidden layers

©2017 Emily Fox

OPTIONAL

Page 2


Optimizing a single-layer neuron

We train to minimize the sum of squared errors:

$$\ell(w) = \frac{1}{2}\sum_i \Big[y_i - g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)\Big]^2$$

Taking gradients:

$$\frac{\partial \ell}{\partial w_j} = -\sum_i \Big[y_i - g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)\Big]\;\frac{\partial}{\partial w_j}\, g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)$$

$$\frac{\partial \ell}{\partial w_j} = -\sum_i \Big[y_i - g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)\Big]\; x_i[j]\; g'\Big(w_0 + \sum_j w_j\, x_i[j]\Big)$$

The solution just depends on $g'$, the derivative of the activation function!

[Diagram: a single neuron with inputs x[1], x[2], …, x[d], a constant bias input 1, and a summation node Σ]
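The gradient above can be sketched directly in NumPy. This is a minimal sketch assuming a sigmoid activation $g$ (so $g'(z) = g(z)(1 - g(z))$); the function and variable names are our own, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sse_gradient(w0, w, X, y):
    """Gradient of ell(w) = 1/2 * sum_i [y_i - g(w0 + w . x_i)]^2
    for a sigmoid activation g, using g'(z) = g(z) * (1 - g(z))."""
    z = w0 + X @ w                  # pre-activations, one per example
    g = sigmoid(z)                  # neuron outputs g(...)
    err = y - g                     # residuals y_i - g(...)
    gprime = g * (1.0 - g)          # sigmoid derivative g'(...)
    grad_w = -(err * gprime) @ X    # d ell / d w_j for each j
    grad_w0 = -np.sum(err * gprime) # d ell / d w_0
    return grad_w0, grad_w
```

A finite-difference check against the loss confirms the signs and the $g'$ factor in the formula.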

Page 3


Forward propagation (1 hidden layer):

Compute values left to right:
1. Inputs: x[1], …, x[d]
2. Hidden: v[1], …, v[d]
3. Output: y

$$\text{out}(x) = g\Big(w_0 + \sum_k w_k\, g\Big(w_{k0} + \sum_j w_{kj}\, x[j]\Big)\Big)$$

For fixed weights, forming predictions is easy!

[Diagram: a 1-hidden-layer network with inputs x[1], x[2] and bias 1 feeding hidden units v[1], v[2], which together with a bias 1 feed the output y]
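The forward-propagation equation can be sketched in NumPy. This is a minimal sketch assuming sigmoid activations; the parameter names (`W_hidden[k, j]` for $w_{kj}$, `b_hidden[k]` for $w_{k0}$, `w_out[k]` for $w_k$, `b_out` for $w_0$) are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """out(x) = g(w0 + sum_k w_k * g(w_k0 + sum_j w_kj * x[j])),
    computed left to right: inputs -> hidden values -> output."""
    v = sigmoid(b_hidden + W_hidden @ x)   # hidden values v[1..K]
    return sigmoid(b_out + w_out @ v)      # output y
```

For fixed weights this is just two matrix-vector products and two activations, which is why forming predictions is easy.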

Page 4


Multilayer neural networks

Inference and Learning
• Forward pass: left to right, computing each hidden layer in turn
• Gradient computation: right to left, propagating the gradient for each node

[Diagram: a multilayer network with a forward arrow (left to right) and a gradient arrow (right to left)]

Page 5


Back-propagation – Learning

• Just gradient descent!
• A recursive algorithm for computing the gradient
• For each example:
  - Perform forward propagation
  - Start from the output layer
  - Compute the gradient of node v[k] with parents u[1], u[2], …
  - Update the weight w_kj
  - Repeat (move to the preceding layer)
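The steps above can be sketched for the 1-hidden-layer network: forward pass, then gradients propagated right to left, then a gradient-descent weight update. This is a minimal NumPy sketch assuming sigmoid activations and the squared-error loss from earlier; all names are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, Wh, bh, wo, bo, lr=0.1):
    """One gradient-descent step on 1/2 * (y - out(x))^2 for a
    1-hidden-layer sigmoid network, for a single example (x, y)."""
    # Forward pass (left to right)
    v = sigmoid(bh + Wh @ x)               # hidden values
    out = sigmoid(bo + wo @ v)             # output
    # Backward pass (right to left): propagate the gradient
    d_out = -(y - out) * out * (1 - out)   # gradient at the output node
    grad_wo = d_out * v                    # output-layer weight gradients
    grad_bo = d_out
    d_hidden = d_out * wo * v * (1 - v)    # gradient at each hidden node
    grad_Wh = np.outer(d_hidden, x)        # hidden-layer weight gradients
    grad_bh = d_hidden
    # Gradient-descent update of every weight
    return (Wh - lr * grad_Wh, bh - lr * grad_bh,
            wo - lr * grad_wo, bo - lr * grad_bo)
```

One step with a small learning rate should reduce the loss on that example, which is a useful sanity check when implementing backprop by hand.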

Page 6


Convergence of backprop

The objective of a multilayer neural net is not convex, so gradient descent can get stuck in local minima.

Page 7


Deep features: deep learning + transfer learning

©2018 Emily Fox

Page 8


Transfer learning: Use data from one task to help learn on another

Lots of data: learn a neural net → great accuracy on cat vs. dog
Some data: neural net as feature extractor + simple classifier → great accuracy on 101 categories

Old idea, explored for deep learning by Donahue et al. ’14 & others


Page 9


What’s learned in a neural net

[Diagram: a neural net trained for Task 1 (cat vs. dog). The later layers are very specific to Task 1 and should be ignored for other tasks; the earlier layers are more generic and can be used as a feature extractor.]

Page 10


Transfer learning in more detail…

For Task 2, predicting 101 categories, keep the Task 1 weights fixed and learn only the end part of the neural net: on top of the extracted features, use a simple classifier, e.g., logistic regression, SVMs, nearest neighbor, …

[Diagram: the neural net trained for Task 1 (cat vs. dog), with its more generic early layers reused as a feature extractor, weights kept fixed; a simple classifier at the end predicts the class for Task 2.]
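The transfer-learning recipe can be sketched with the toy 1-hidden-layer network from earlier: freeze the Task 1 hidden layer, use its activations as features, and train only a simple classifier (here, logistic regression fit by gradient descent). This is a minimal NumPy sketch with names of our own; real deep-feature pipelines (e.g., Donahue et al. '14) do the same thing with a large pretrained net:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(X, Wh, bh):
    """Frozen Task 1 hidden layer used as a feature extractor.
    Wh and bh are kept fixed: they are never updated for Task 2."""
    return sigmoid(bh + X @ Wh.T)

def train_logreg(F, y, lr=0.5, steps=500):
    """Simple classifier (logistic regression) learned on the
    extracted features F; only these weights are trained for Task 2."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(b + F @ w)
        err = p - y                   # gradient of the log loss
        w -= lr * (F.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b
```

Only `w` and `b` are learned for Task 2; the feature extractor's weights stay fixed, exactly as in the diagram.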