Machine Learning Neural Networks



Page 1: Machine Learning Neural Networks Human Brain Neurons

Machine Learning

Neural Networks

Page 2: Machine Learning Neural Networks Human Brain Neurons

Human Brain

Page 3: Machine Learning Neural Networks Human Brain Neurons

Neurons

Page 4: Machine Learning Neural Networks Human Brain Neurons

Input-Output Transformation

[Figure: input spikes arriving at the neuron produce excitatory post-synaptic potentials (EPSPs); when these sum past threshold, the neuron emits an output spike, a brief pulse]

Page 5: Machine Learning Neural Networks Human Brain Neurons

Human Learning

• Number of neurons: ~10^10

• Connections per neuron: ~10^4 to 10^5

• Neuron switching time: ~0.001 second

• Scene recognition time: ~0.1 second

100 serial inference steps doesn't seem like much

Page 6: Machine Learning Neural Networks Human Brain Neurons

Machine Learning Abstraction

[Figure: the biological picture (dendrites, axon, synapses with excitatory "+" and inhibitory "−" connections) is abstracted to a graph of nodes connected by weighted edges; synapses become weights]

Page 7: Machine Learning Neural Networks Human Brain Neurons

Artificial Neural Networks

• Typically, machine-learning ANNs are very artificial, ignoring:
  – Time
  – Space
  – Biological learning processes

• More realistic neural models exist
  – Hodgkin & Huxley (1952) won a Nobel Prize for theirs (in 1963)

• Nonetheless, very artificial ANNs have been useful in many ML applications

Page 8: Machine Learning Neural Networks Human Brain Neurons

Perceptrons

• The “first wave” in neural networks
• Big in the 1960s
  – McCulloch & Pitts (1943), Widrow & Hoff (1960), Rosenblatt (1962)

Page 9: Machine Learning Neural Networks Human Brain Neurons

A single perceptron

o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0

[Figure: inputs x1…x5 enter the unit with weights w1…w5; a bias input x0 (= 1, always) enters with weight w0; the unit outputs 1 when the weighted sum exceeds 0]
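A minimal Python sketch of the threshold unit above (not from the original slides; the weight and input values are illustrative placeholders):

    # A single perceptron: output 1 if sum_i w_i * x_i > 0, else 0.
    # x[0] is the bias input and is always 1.
    def perceptron(weights, x):
        s = sum(w_i * x_i for w_i, x_i in zip(weights, x))
        return 1 if s > 0 else 0

    # Example: inputs x1..x5 with arbitrary illustrative weights.
    w = [-0.5, 0.2, 0.2, 0.2, 0.2, 0.2]    # w0 (bias weight), w1..w5
    x = [1, 1, 0, 1, 0, 1]                 # x0 = 1 (bias), x1..x5
    print(perceptron(w, x))                # 1, since -0.5 + 0.6 = 0.1 > 0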

Page 10: Machine Learning Neural Networks Human Brain Neurons

Logical Operators

Each unit computes o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0 (x0 = 1 is the bias input).

AND:  w0 = −0.8, w1 = 0.5, w2 = 0.5

OR:   w0 = −0.3, w1 = 0.5, w2 = 0.5

NOT:  w0 = 0.0, w1 = −1.0
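A quick check of the AND and OR weight settings above (a sketch, not from the slides):

    # Verify the AND and OR perceptrons: o = 1 if sum_i w_i x_i > 0 else 0,
    # with x0 = 1 as the bias input.
    def unit(w, x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

    AND = [-0.8, 0.5, 0.5]
    OR  = [-0.3, 0.5, 0.5]
    for x1 in (0, 1):
        for x2 in (0, 1):
            x = [1, x1, x2]
            print(x1, x2, "AND:", unit(AND, x), "OR:", unit(OR, x))
    # AND fires only for (1,1); OR fires for everything except (0,0).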

Page 11: Machine Learning Neural Networks Human Brain Neurons

Perceptron Hypothesis Space

• What’s the hypothesis space for a perceptron on n inputs?

H = { w : w ∈ ℝ^(n+1) }

Page 12: Machine Learning Neural Networks Human Brain Neurons

Learning Weights

• Perceptron Training Rule
• Gradient Descent
• (Other approaches: Genetic Algorithms)

[Figure: a single threshold unit (o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0) with inputs x0, x1, x2 and unknown weights "?" to be learned]

Page 13: Machine Learning Neural Networks Human Brain Neurons

Perceptron Training Rule

• Weights are modified for each training example
• Update rule:

  w_i ← w_i + Δw_i,   where Δw_i = η (t − o) x_i

  η = learning rate, t = target value, o = perceptron output, x_i = input value
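A minimal sketch of this training rule in Python; learning the OR function is an illustrative choice, and the learning rate and initial weights are assumptions, not from the slides:

    # Perceptron training rule: after each example, w_i <- w_i + eta*(t - o)*x_i.
    def output(w, x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

    examples = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]  # OR
    w, eta = [0.0, 0.0, 0.0], 0.1
    for _ in range(20):                       # a few passes over the data
        for x, t in examples:
            o = output(w, x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    print(w, [output(w, x) for x, _ in examples])  # converges, since OR is linearly separable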

Page 14: Machine Learning Neural Networks Human Brain Neurons

What weights make XOR?

• No combination of weights works
• Perceptrons can only represent linearly separable functions

[Figure: a single threshold unit (o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0) with inputs x0, x1, x2 and unknown weights "?"]

Page 15: Machine Learning Neural Networks Human Brain Neurons

Linear Separability

[Plot: OR in the (x1, x2) plane — a single line separates the positive and negative examples]

Page 16: Machine Learning Neural Networks Human Brain Neurons

Linear Separability

[Plot: AND in the (x1, x2) plane — a single line separates the positive and negative examples]

Page 17: Machine Learning Neural Networks Human Brain Neurons

Linear Separability

[Plot: XOR in the (x1, x2) plane — no single line can separate the positive and negative examples]

Page 18: Machine Learning Neural Networks Human Brain Neurons

Perceptron Training Rule

• Converges to the correct classification IF
  – Cases are linearly separable
  – Learning rate is slow enough
  – Proved by Minsky and Papert in 1969

The limits of perceptrons killed widespread interest in them until the 1980s

Page 19: Machine Learning Neural Networks Human Brain Neurons

XOR

[Figure: a network of threshold units (each computing o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0) wired to implement XOR; the weight labels visible in the figure are 0.6, 0.6, 1, 1, −0.6, −0.6, with bias weights of 0]
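One way to wire XOR from threshold units, sketched in Python. It reuses the AND and OR weights from the "Logical Operators" slide; the output unit's weights are an illustrative choice (OR and not AND), not necessarily the ones in this figure:

    def unit(w, x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

    def xor(x1, x2):
        h_or  = unit([-0.3, 0.5, 0.5], [1, x1, x2])   # OR unit
        h_and = unit([-0.8, 0.5, 0.5], [1, x1, x2])   # AND unit
        return unit([-0.3, 0.5, -0.5], [1, h_or, h_and])  # fires iff OR and not AND

    print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 1, 0]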

Page 20: Machine Learning Neural Networks Human Brain Neurons

What’s wrong with perceptrons?

• You can always plug multiple perceptrons together to calculate any Boolean function.

• BUT… who decides what the weights are?
  – Assignment of error to "parental" inputs becomes a problem…
  – This is because of the threshold…

• Who contributed the error?

Page 21: Machine Learning Neural Networks Human Brain Neurons

Problem: Assignment of error

[Figure: a single threshold unit (o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0) with inputs x0, x1, x2 and unknown weights "?"; the perceptron threshold is a step function]

• Hard to tell from the output who contributed what
• Stymies multi-layer weight learning

Page 22: Machine Learning Neural Networks Human Brain Neurons

Solution: Differentiable Function

[Figure: a unit computing the simple linear function o = Σ_{i=0}^{n} w_i x_i, with inputs x0, x1, x2 and weights "?"]

• Varying any input a little creates a perceptible change in the output
• This lets us propagate error to prior nodes

Page 23: Machine Learning Neural Networks Human Brain Neurons

Measuring error for linear units

• Output function:

  o(x) = w · x

• Error measure:

  E(w) = ½ Σ_{d∈D} (t_d − o_d)²

  t_d = target value for training example d, o_d = linear unit output for example d, D = training data

Page 24: Machine Learning Neural Networks Human Brain Neurons

Gradient Descent

Gradient:  ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]

Training rule:  Δw = −η ∇E(w),  i.e.  Δw_i = −η ∂E/∂w_i

Page 25: Machine Learning Neural Networks Human Brain Neurons

Gradient Descent Rule

∂E/∂w_i = ∂/∂w_i [ ½ Σ_{d∈D} (t_d − o_d)² ] = Σ_{d∈D} (t_d − o_d)(−x_{i,d})

Update rule:  w_i ← w_i + η Σ_{d∈D} (t_d − o_d) x_{i,d}
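A minimal sketch of this batch gradient-descent update for a linear unit o = w · x. The toy data (a noisy linear target), the learning rate, and the number of iterations are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((50, 1)), rng.uniform(-1, 1, (50, 2))])  # x0 = 1 bias column
    true_w = np.array([0.5, 2.0, -1.0])
    T = X @ true_w + rng.normal(0, 0.05, 50)          # targets with a little noise

    w, eta = np.zeros(3), 0.01
    for _ in range(500):
        o = X @ w                           # linear unit outputs for all examples
        w = w + eta * (T - o) @ X           # w_i <- w_i + eta * sum_d (t_d - o_d) x_{i,d}
    print(np.round(w, 2))                   # should end up close to [0.5, 2.0, -1.0]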

Page 26: Machine Learning Neural Networks Human Brain Neurons

Back to XOR

[Figure: the XOR network again — two threshold units (each computing o = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0) over inputs x0, x1, x2, feeding a third threshold unit whose output is XOR]

Page 27: Machine Learning Neural Networks Human Brain Neurons

Gradient Descent for Multiple Layers

[Figure: the same XOR network with each threshold unit replaced by a differentiable unit computing Σ_{i=0}^{n} w_i x_i; the connection weights are labeled w_ij]

We can compute:

  ∂E/∂w_ij

Page 28: Machine Learning Neural Networks Human Brain Neurons

Gradient Descent vs. Perceptrons

• Perceptron Rule & Threshold Units
  – Learner converges on an answer ONLY IF the data is linearly separable
  – Can't assign proper error to parent nodes

• Gradient Descent
  – Minimizes error even if the examples are not linearly separable
  – Works for multi-layer networks

• But… linear units only make linear decision surfaces (can't learn XOR even with many layers)
  – And the step function isn't differentiable…

Page 29: Machine Learning Neural Networks Human Brain Neurons

A compromise function

• Perceptron:  output = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0

• Linear:  output = net = Σ_{i=0}^{n} w_i x_i

• Sigmoid (logistic):  output = σ(net) = 1 / (1 + e^(−net))
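The three output functions side by side in Python (a sketch; net is assumed to be the weighted sum Σ_i w_i x_i computed elsewhere):

    import math

    def perceptron_output(net):
        return 1 if net > 0 else 0           # step / threshold

    def linear_output(net):
        return net                           # identity

    def sigmoid_output(net):
        return 1.0 / (1.0 + math.exp(-net))  # logistic: differentiable, non-linear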

Page 30: Machine Learning Neural Networks Human Brain Neurons

The sigmoid (logistic) unit

• Has a differentiable activation function
  – Allows gradient descent
• Can be used to learn non-linear functions

  output = 1 / (1 + e^(−Σ_{i=0}^{n} w_i x_i))

[Figure: a sigmoid unit with inputs x1, x2 and unknown weights "?"]

Page 31: Machine Learning Neural Networks Human Brain Neurons

Logistic function

[Figure: a single logistic unit used for prediction. Inputs (independent variables): Age = 34, Gender = 1, Stage = 4, with coefficients .5, .8, .4. Output (prediction): 0.6, the "probability of being alive"]

Page 32: Machine Learning Neural Networks Human Brain Neurons

Neural Network Model

[Figure: a small feed-forward network. Inputs (independent variables): Age = 34, Gender = 2, Stage = 4. A hidden layer (input-to-hidden weights .6, .5, .8, .2, .1, .3, .7, .2) feeds the output unit through hidden-to-output weights .4 and .2. Output (dependent variable / prediction): 0.6, the "probability of being alive"]

Page 33: Machine Learning Neural Networks Human Brain Neurons

Getting an answer from a NN

[Figure: the same network, stepping through the computation of the first hidden unit's activation from the inputs and its contribution to the output 0.6]

Page 34: Machine Learning Neural Networks Human Brain Neurons

Getting an answer from a NN

[Figure: the same network, stepping through the computation of the second hidden unit's activation from the inputs and its contribution to the output 0.6]

Page 35: Machine Learning Neural Networks Human Brain Neurons

Getting an answer from a NN

[Figure: the full network, combining both hidden units to produce the output 0.6, the "probability of being alive"]

Page 36: Machine Learning Neural Networks Human Brain Neurons

Minimizing the Error

[Figure: the error surface over a weight w. Starting from w_initial (initial error), gradient descent follows the negative derivative (a negative derivative means a positive change in w) down to w_trained (final error), a local minimum]

Page 37: Machine Learning Neural Networks Human Brain Neurons

Differentiability is key!

• The sigmoid is easy to differentiate:

  ∂σ(y)/∂y = σ(y) · (1 − σ(y))

• For gradient descent on multiple layers, a little dynamic programming can help:
  – Compute errors at each output node
  – Use these to compute errors at each hidden node
  – Use these to compute errors at each input node
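A quick numerical sanity check of this derivative identity (a sketch, not from the slides; the test point y = 0.7 is arbitrary):

    import math

    def sigma(y):
        return 1.0 / (1.0 + math.exp(-y))

    y, eps = 0.7, 1e-6
    numeric  = (sigma(y + eps) - sigma(y - eps)) / (2 * eps)   # finite difference
    analytic = sigma(y) * (1 - sigma(y))                       # sigma(y)(1 - sigma(y))
    print(round(numeric, 6), round(analytic, 6))               # the two agree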

Page 38: Machine Learning Neural Networks Human Brain Neurons

The Backpropagation Algorithm

For each training example ⟨x, t⟩:

1. Input the instance x to the network and compute the output o_u of every unit u in the network.

2. For each output unit k, calculate its error term:

   δ_k = o_k (1 − o_k)(t_k − o_k)

3. For each hidden unit h, calculate its error term:

   δ_h = o_h (1 − o_h) Σ_{k ∈ outputs} w_{kh} δ_k

4. Update each network weight:

   w_{ji} ← w_{ji} + Δw_{ji},  where Δw_{ji} = η δ_j x_{ji}
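A minimal numpy sketch of this algorithm for one hidden layer of sigmoid units. The network size, learning rate, random seed, and the XOR training data are illustrative assumptions, not taken from the slides:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    eta = 0.5                                   # learning rate
    W_hidden = rng.uniform(-0.5, 0.5, (2, 3))   # 2 hidden units; inputs x0 (bias), x1, x2
    W_output = rng.uniform(-0.5, 0.5, (1, 3))   # 1 output unit; inputs h0 (bias), h1, h2

    # XOR training data; x0 = 1 is the bias input
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    for epoch in range(20000):
        for x, t in zip(X, T):
            # 1. forward pass: compute every unit's output
            h = sigmoid(W_hidden @ x)            # hidden activations
            h_full = np.concatenate(([1.0], h))  # prepend bias for the output layer
            o = sigmoid(W_output @ h_full)
            # 2. output error terms: delta_k = o_k (1 - o_k)(t_k - o_k)
            delta_o = o * (1 - o) * (t - o)
            # 3. hidden error terms: delta_h = o_h (1 - o_h) sum_k w_kh delta_k
            delta_h = h * (1 - h) * (W_output[:, 1:].T @ delta_o)
            # 4. weight updates: w_ji <- w_ji + eta * delta_j * x_ji
            W_output += eta * np.outer(delta_o, h_full)
            W_hidden += eta * np.outer(delta_h, x)

    # XOR with only 2 hidden units can occasionally get stuck in a local minimum;
    # a different seed or more epochs may be needed.
    for x, t in zip(X, T):
        o = sigmoid(W_output @ np.concatenate(([1.0], sigmoid(W_hidden @ x))))
        print(x[1:], "->", round(float(o[0]), 3), "target", float(t[0]))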

Page 39: Machine Learning Neural Networks Human Brain Neurons

Getting an answer from a NN

[Figure: the example network once more — inputs Age = 34, Gender = 1, Stage = 4, one hidden layer, and output 0.6, the "probability of being alive"]

Page 40: Machine Learning Neural Networks Human Brain Neurons

Expressive Power of ANNs

• Universal function approximator:
  – Given enough hidden units, an ANN can approximate any continuous function f

• Need 2+ hidden units to learn XOR

• Why not use millions of hidden units?
  – Efficiency (neural network training is slow)
  – Overfitting

Page 41: Machine Learning Neural Networks Human Brain Neurons

Overfitting

[Figure: the real distribution vs. an overfitted model]

Page 42: Machine Learning Neural Networks Human Brain Neurons

Combating Overfitting in Neural Nets

• Many techniques

• Two popular ones:
  – Early stopping
    • Use "a lot" of hidden units
    • Just don't over-train
  – Cross-validation
    • Choose the "right" number of hidden units

Page 43: Machine Learning Neural Networks Human Brain Neurons

Early Stopping

[Figure: error vs. training epochs for the training set (b) and a held-out validation set (a). Training error (error b) keeps falling, while validation error (error a) reaches a minimum and then rises as the model overfits; the minimum of the validation error is the stopping criterion]
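A minimal early-stopping loop, sketched under the assumption that the caller supplies train_one_epoch and validation_error callables (hypothetical stand-ins for the actual training code):

    import copy

    def train_with_early_stopping(model, train_one_epoch, validation_error,
                                  max_epochs=1000, patience=10):
        best_err = float("inf")
        best_model = copy.deepcopy(model)
        epochs_since_best = 0
        for epoch in range(max_epochs):
            train_one_epoch(model)             # one pass over the training set (b)
            err = validation_error(model)      # error on the held-out validation set (a)
            if err < best_err:                 # new minimum of the validation error
                best_err, best_model = err, copy.deepcopy(model)
                epochs_since_best = 0
            else:
                epochs_since_best += 1
            if epochs_since_best >= patience:  # stopping criterion: no recent improvement
                break
        return best_model, best_err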

Page 44: Machine Learning Neural Networks Human Brain Neurons

Cross-validation

• Cross-validation: a general-purpose technique for model selection
  – E.g., "how many hidden units should I use?"

• A more extensive version of the validation-set approach.

Page 45: Machine Learning Neural Networks Human Brain Neurons

Cross-validation

• Break the training set into k sets
• For each model M:
  – For i = 1…k:
    • Train M on all sets but set i
    • Test on set i
• Output the M with the highest average test score, trained on the full training set
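A k-fold cross-validation sketch along the lines of this procedure. The models dict and the train/error callables are assumptions to be replaced by whatever training code is in use; it selects the lowest average test error (i.e., the highest score):

    import numpy as np

    def cross_validate(models, X, y, train, error, k=5):
        """models: {name: model factory}; train(model, X, y) fits in place;
        error(model, X, y) returns a test error. Returns the best model name."""
        folds = np.array_split(np.random.permutation(len(X)), k)
        scores = {}
        for name, make_model in models.items():
            errs = []
            for i in range(k):
                test_idx = folds[i]
                train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
                m = make_model()
                train(m, X[train_idx], y[train_idx])              # train on all but fold i
                errs.append(error(m, X[test_idx], y[test_idx]))   # test on fold i
            scores[name] = np.mean(errs)
        return min(scores, key=scores.get)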

Page 46: Machine Learning Neural Networks Human Brain Neurons

Summary of Neural Networks

Non-linear regression technique that is trained with gradient descent.

Question: How important is the biological metaphor?

Page 47: Machine Learning Neural Networks Human Brain Neurons

Summary of Neural Networks

When are neural networks useful?
  – Instances are represented by attribute-value pairs
    • Particularly when attributes are real-valued
  – The target function is
    • Discrete-valued
    • Real-valued
    • Vector-valued
  – Training examples may contain errors
  – Fast evaluation times are necessary

When not?
  – Fast training times are necessary
  – Understandability of the learned function is required

Page 48: Machine Learning Neural Networks Human Brain Neurons

Advanced Topics in Neural Nets

• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models

Page 49: Machine Learning Neural Networks Human Brain Neurons

Incremental vs. Batch Mode

Page 50: Machine Learning Neural Networks Human Brain Neurons

Incremental vs. Batch Mode

• In batch mode we minimize the error over the whole training set D:

  E_D(w) = ½ Σ_{d∈D} (t_d − o_d)²

• Same as computing the per-example updates:

  Δw_D = Σ_{d∈D} Δw_d

• Then setting:

  w ← w + Δw_D
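A sketch of the two regimes for the linear unit from earlier; the data layout (X contains a bias column) and the learning rate are assumptions:

    import numpy as np

    def batch_epoch(w, X, T, eta):
        # accumulate the update over the whole training set, then apply it once
        delta_w = eta * (T - X @ w) @ X        # eta * sum_d (t_d - o_d) x_d
        return w + delta_w

    def incremental_epoch(w, X, T, eta):
        # apply the update after every single example (stochastic / incremental mode)
        for x, t in zip(X, T):
            w = w + eta * (t - x @ w) * x
        return w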

Page 51: Machine Learning Neural Networks Human Brain Neurons

Advanced Topics in Neural Nets

• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models

Page 52: Machine Learning Neural Networks Human Brain Neurons

Hidden Layer Representations

• The input→hidden-layer mapping is a representation of the input vectors tailored to the task

• It can also be exploited for dimensionality reduction
  – A form of unsupervised learning in which we output a "more compact" representation of the input vectors
  – <x1, …, xn> -> <x'1, …, x'm> where m < n
  – Useful for visualization, problem simplification, data compression, etc. (a small sketch follows)
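A sketch of the idea, assuming the classic 8-3-8 "one-hot encoder" task often used to illustrate it (not necessarily the example on the following slides); the network sizes, seed, and learning rate are illustrative, and more epochs or bias inputs may be needed:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    n, m = 8, 3                        # n inputs compressed to m hidden units (m < n)
    W_enc = rng.uniform(-0.1, 0.1, (m, n))
    W_dec = rng.uniform(-0.1, 0.1, (n, m))
    X = np.eye(n)                      # eight one-hot input vectors
    eta = 0.3

    for _ in range(20000):
        for x in X:
            h = sigmoid(W_enc @ x)             # compact representation <x'1..x'm>
            o = sigmoid(W_dec @ h)             # reconstruction of <x1..xn>
            delta_o = o * (1 - o) * (x - o)    # output error terms (target = input)
            delta_h = h * (1 - h) * (W_dec.T @ delta_o)
            W_dec += eta * np.outer(delta_o, h)
            W_enc += eta * np.outer(delta_h, x)

    print(np.round(sigmoid(W_enc @ X.T).T, 2))  # the learned m-dimensional codes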

Page 53: Machine Learning Neural Networks Human Brain Neurons

Dimensionality Reduction

[Figures: the model, and the function to learn]

Page 54: Machine Learning Neural Networks Human Brain Neurons

Dimensionality Reduction: Example

Page 55: Machine Learning Neural Networks Human Brain Neurons

Dimensionality Reduction: Example

Page 56: Machine Learning Neural Networks Human Brain Neurons

Dimensionality Reduction: Example

Page 57: Machine Learning Neural Networks Human Brain Neurons

Dimensionality Reduction: Example

Page 58: Machine Learning Neural Networks Human Brain Neurons

Advanced Topics in Neural Nets

• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models

Page 59: Machine Learning Neural Networks Human Brain Neurons

Advanced Topics in Neural Nets

• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models

Page 60: Machine Learning Neural Networks Human Brain Neurons

Neural Networks on Silicon

• Currently:

  Simulation of continuous device physics (neural networks)
        runs on
  Digital computational model (thresholding)
        runs on
  Continuous device physics (voltage)

• Why not skip the middle (digital) layer?

Page 61: Machine Learning Neural Networks Human Brain Neurons

Example: Silicon Retina

• Simulates the function of a biological retina

• Single-transistor synapses adapt to luminance and temporal contrast

• Modeling the retina directly on chip requires ~100x less power!

Page 62: Machine Learning Neural Networks Human Brain Neurons

Example: Silicon Retina

• Synapses modeled with single transistors

Page 63: Machine Learning Neural Networks Human Brain Neurons

Luminance Adaptation

Page 64: Machine Learning Neural Networks Human Brain Neurons

Comparison with Mammal Data

[Figure: response curves from real mammalian retinas compared with those of the artificial silicon retina]

Page 65: Machine Learning Neural Networks Human Brain Neurons

• Graphics and results taken from:

Page 66: Machine Learning Neural Networks Human Brain Neurons

General NN learning in silicon?

• Seems less in vogue than in the late 1990s

• Interest has turned somewhat to implementing Bayesian techniques in analog silicon

Page 67: Machine Learning Neural Networks Human Brain Neurons

Advanced Topics in Neural Nets

• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models

Page 68: Machine Learning Neural Networks Human Brain Neurons

Neural Network Language Models

• Statistical language modeling:
  – Predict the probability of the next word in a sequence

  I was headed to Madrid, ____

  P(___ = "Spain") = 0.5, P(___ = "but") = 0.2, etc.

• Used in speech recognition, machine translation, and (recently) information extraction

Page 69: Machine Learning Neural Networks Human Brain Neurons

Formally

• Estimate:

  P(w_j | w_{j−1}, w_{j−2}, …, w_{j−n+1})  ≈  P(w_j | h_j)

  (h_j = the history of word j)

Page 70: Machine Learning Neural Networks Human Brain Neurons
Page 71: Machine Learning Neural Networks Human Brain Neurons

Optimizations

• Key idea – learn simultaneously:
  – vector representations for each word (50-dimensional)
  – a predictor of the next word

• Short-lists
  – Much of the complexity is in the hidden→output layer
    • The number of possible next words is large
  – Only predict a subset of words
    • Use a standard probabilistic model for the rest
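A toy sketch of the key idea — learning a vector for each word and a next-word predictor at the same time. The vocabulary, sizes, single-word context, and training corpus here are illustrative assumptions and far simpler than the models the slides refer to:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["i", "was", "headed", "to", "madrid", ",", "spain"]
    V, dim = len(vocab), 8
    E = rng.normal(0, 0.1, (V, dim))      # one embedding vector per word
    W = rng.normal(0, 0.1, (V, dim))      # predictor: scores = W @ embedding(context)

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    text = ["i", "was", "headed", "to", "madrid", ",", "spain"]
    ids = [vocab.index(w) for w in text]
    eta = 0.1
    for _ in range(500):
        for prev, nxt in zip(ids, ids[1:]):            # predict each next word
            p = softmax(W @ E[prev])
            grad = p.copy(); grad[nxt] -= 1.0          # cross-entropy gradient on the scores
            g_E = W.T @ grad                           # gradient w.r.t. the word vector
            W -= eta * np.outer(grad, E[prev])         # update the predictor...
            E[prev] -= eta * g_E                       # ...and the word vector together

    p = softmax(W @ E[vocab.index("madrid")])
    print("P(next | 'madrid'):", dict(zip(vocab, np.round(p, 2))))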

Page 72: Machine Learning Neural Networks Human Brain Neurons

Design Decisions (1)

• Number of hidden units

• Almost no difference…

Page 73: Machine Learning Neural Networks Human Brain Neurons

Design Decisions (2)

• Word representation (# of dimensions)

• They chose 120

Page 74: Machine Learning Neural Networks Human Brain Neurons

Comparison vs. state of the art