Learning neural networks with hidden layers · STAT/CSE 416: Intro to Machine Learning · ©2017 Emily Fox · 2018-05-31


Page 1

STAT/CSE 416: Intro to Machine Learning

Learning neural networks with hidden layers

©2017 Emily Fox

OPTIONAL

Page 2


Optimizing a single-layer neuron

We train to minimize the sum of squared errors:

$$\ell(w) = \frac{1}{2}\sum_i \Big[y_i - g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)\Big]^2$$

Taking gradients:

$$\frac{\partial \ell}{\partial w_j} = -\sum_i \Big[y_i - g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)\Big]\;\frac{\partial}{\partial w_j}\, g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)$$

$$\frac{\partial \ell}{\partial w_j} = -\sum_i \Big[y_i - g\Big(w_0 + \sum_j w_j\, x_i[j]\Big)\Big]\; x_i[j]\; g'\Big(w_0 + \sum_j w_j\, x_i[j]\Big)$$

The solution just depends on $g'$, the derivative of the activation function!

[Diagram: a single neuron with inputs x[1], x[2], …, x[d], a constant bias input 1, and a summation node Σ]
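The gradient above can be sketched directly in NumPy. This is a minimal sketch assuming a sigmoid activation $g$ (so $g'(z) = g(z)(1 - g(z))$); the function and variable names are our own, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sse_gradient(w0, w, X, y):
    """Gradient of ell(w) = 1/2 * sum_i [y_i - g(w0 + w . x_i)]^2
    for a sigmoid activation g, using g'(z) = g(z) * (1 - g(z))."""
    z = w0 + X @ w                  # pre-activations, one per example
    g = sigmoid(z)                  # neuron outputs g(...)
    err = y - g                     # residuals y_i - g(...)
    gprime = g * (1.0 - g)          # sigmoid derivative g'(...)
    grad_w = -(err * gprime) @ X    # d ell / d w_j for each j
    grad_w0 = -np.sum(err * gprime) # d ell / d w_0
    return grad_w0, grad_w
```

A finite-difference check against the loss confirms the signs and the $g'$ factor in the formula.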

Page 3


Forward propagation (1 hidden layer):

Compute values left to right:
1. Inputs: x[1], …, x[d]
2. Hidden: v[1], …, v[d]
3. Output: y

$$\text{out}(x) = g\Big(w_0 + \sum_k w_k\, g\Big(w_{k0} + \sum_j w_{kj}\, x[j]\Big)\Big)$$

For fixed weights, forming predictions is easy!

[Diagram: a 1-hidden-layer network with inputs x[1], x[2] and bias 1 feeding hidden units v[1], v[2], which together with a bias 1 feed the output y]
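The forward-propagation equation can be sketched in NumPy. This is a minimal sketch assuming sigmoid activations; the parameter names (`W_hidden[k, j]` for $w_{kj}$, `b_hidden[k]` for $w_{k0}$, `w_out[k]` for $w_k$, `b_out` for $w_0$) are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """out(x) = g(w0 + sum_k w_k * g(w_k0 + sum_j w_kj * x[j])),
    computed left to right: inputs -> hidden values -> output."""
    v = sigmoid(b_hidden + W_hidden @ x)   # hidden values v[1..K]
    return sigmoid(b_out + w_out @ v)      # output y
```

For fixed weights this is just two matrix-vector products and two activations, which is why forming predictions is easy.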

Page 4


Multilayer neural networks

Inference and Learning
• Forward pass: left to right, computing each hidden layer in turn
• Gradient computation: right to left, propagating the gradient for each node

[Diagram: a multilayer network with a forward arrow (left to right) and a gradient arrow (right to left)]

Page 5


Back-propagation – Learning

• Just gradient descent!
• A recursive algorithm for computing the gradient
• For each example:
  - Perform forward propagation
  - Start from the output layer
  - Compute the gradient of node v[k] with parents u[1], u[2], …
  - Update the weight w_kj
  - Repeat (move to the preceding layer)
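The steps above can be sketched for the 1-hidden-layer network: forward pass, then gradients propagated right to left, then a gradient-descent weight update. This is a minimal NumPy sketch assuming sigmoid activations and the squared-error loss from earlier; all names are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, Wh, bh, wo, bo, lr=0.1):
    """One gradient-descent step on 1/2 * (y - out(x))^2 for a
    1-hidden-layer sigmoid network, for a single example (x, y)."""
    # Forward pass (left to right)
    v = sigmoid(bh + Wh @ x)               # hidden values
    out = sigmoid(bo + wo @ v)             # output
    # Backward pass (right to left): propagate the gradient
    d_out = -(y - out) * out * (1 - out)   # gradient at the output node
    grad_wo = d_out * v                    # output-layer weight gradients
    grad_bo = d_out
    d_hidden = d_out * wo * v * (1 - v)    # gradient at each hidden node
    grad_Wh = np.outer(d_hidden, x)        # hidden-layer weight gradients
    grad_bh = d_hidden
    # Gradient-descent update of every weight
    return (Wh - lr * grad_Wh, bh - lr * grad_bh,
            wo - lr * grad_wo, bo - lr * grad_bo)
```

One step with a small learning rate should reduce the loss on that example, which is a useful sanity check when implementing backprop by hand.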

Page 6


Convergence of backprop

The objective of a multilayer neural net is not convex, so gradient descent can get stuck in local minima.

Page 7


Deep features: deep learning + transfer learning

©2018 Emily Fox

Page 8


Transfer learning: Use data from one task to help learn on another

Lots of data: learn a neural net → great accuracy on cat vs. dog
Some data: neural net as feature extractor + simple classifier → great accuracy on 101 categories

Old idea, explored for deep learning by Donahue et al. ’14 & others


Page 9


What’s learned in a neural net

[Diagram: a neural net trained for Task 1 (cat vs. dog). The later layers are very specific to Task 1 and should be ignored for other tasks; the earlier layers are more generic and can be used as a feature extractor.]

Page 10


Transfer learning in more detail…

For Task 2, predicting 101 categories, keep the Task 1 weights fixed and learn only the end part of the neural net: on top of the extracted features, use a simple classifier, e.g., logistic regression, SVMs, nearest neighbor, …

[Diagram: the neural net trained for Task 1 (cat vs. dog), with its more generic early layers reused as a feature extractor, weights kept fixed; a simple classifier at the end predicts the class for Task 2.]
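The transfer-learning recipe can be sketched with the toy 1-hidden-layer network from earlier: freeze the Task 1 hidden layer, use its activations as features, and train only a simple classifier (here, logistic regression fit by gradient descent). This is a minimal NumPy sketch with names of our own; real deep-feature pipelines (e.g., Donahue et al. '14) do the same thing with a large pretrained net:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(X, Wh, bh):
    """Frozen Task 1 hidden layer used as a feature extractor.
    Wh and bh are kept fixed: they are never updated for Task 2."""
    return sigmoid(bh + X @ Wh.T)

def train_logreg(F, y, lr=0.5, steps=500):
    """Simple classifier (logistic regression) learned on the
    extracted features F; only these weights are trained for Task 2."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(b + F @ w)
        err = p - y                   # gradient of the log loss
        w -= lr * (F.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b
```

Only `w` and `b` are learned for Task 2; the feature extractor's weights stay fixed, exactly as in the diagram.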