Page 1:

Artificial Neural Networks: Supervised Models

Robert J. Marks II
University of Washington
Department of Electrical Engineering
CIA Laboratory, Box 352500
Seattle, Washington 98195-2500
marks@ee.washington.edu

Page 2:

Supervised Learning

Given: input (stimulus) / output (response) data

Objective: train a machine to simulate the input/output relationship

Types:
- Classification (discrete outputs)
- Regression (continuous outputs)

Page 3:

Training a Classifier

[Figure: six training images are presented to the classifier, each paired with its target label: Marks, not Marks, not Marks, not Marks, Marks, not Marks.]

Page 4:

Recall from a Trained Classifier

[Figure: a new test image is presented to the trained classifier, which outputs: Marks.]

Note: The test image does not appear in the training data.

Learning ≠ Memorization

Page 5:

Classifier In Feature Space, After Training

[Figure: feature space after training. The classifier's learned representation (decision boundary) approximates the concept (truth). Legend: training data, labeled Marks or not Marks, and one test point (Marks).]

Page 6:

Supervised Regression (Interpolation)

Output data is continuous rather than discrete

Example: load forecasting

Training (from historical data):
- Input: temperatures, current load, day of week, holiday (?), etc.
- Output: next day's load

Test:
- Input: forecasted temperatures, current load, day of week, holiday (?), etc.
- Output: tomorrow's load forecast

Page 7:

Properties of Good Classifiers and Regression Machines

- Good accuracy outside of the training set
- Explanation facility: generate rules after training
- Fast training
- Fast testing

Page 8:

Some Classifiers and Regression Machines

- Classification and Regression Trees (CART)
- Nearest neighbor look-up
- Neural networks:
  - Layered perceptrons (MLPs)
  - Recurrent perceptrons
  - Cascade correlation neural networks
  - Radial basis function neural networks

Page 9:

A Model of an Artificial Neuron

[Figure: a single neuron. Five input states s_1, ..., s_5 feed the neuron through weights w_1, ..., w_5.]

The neuron forms a weighted sum of its input states and passes it through a squashing function:

sum = Σ_n w_n s_n

s = σ(sum) = the neuron's state

where σ(·) is the squashing function.
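To make the model concrete, here is a minimal Python sketch of this neuron; the function names and sample values are illustrative, and the sigmoid used as the squashing function is the one defined on the next slide:

```python
import math

def sigmoid(x):
    """Sigmoid squashing function (defined on the next slide)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(states, weights, squash=sigmoid):
    """Neuron state: s = squash(sum_n w_n * s_n)."""
    total = sum(w * s for w, s in zip(weights, states))
    return squash(total)

# Five inputs, as in the figure.
print(neuron([0.2, 0.9, -0.4, 0.1, 0.5], [0.5, -1.0, 0.3, 0.8, -0.2]))
```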

Page 10:

Squashing Functions

[Figure: a sigmoid squashing function σ(sum), rising smoothly from 0 toward 1, plotted against sum.]

sigmoid: σ(x) = 1 / (1 + e^(−x))
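Worth noting here, since the backpropagation slides below rely on it: the sigmoid's derivative can be written in terms of its own output,

σ'(x) = e^(−x) / (1 + e^(−x))² = σ(x) [ 1 − σ(x) ]

which is why factors of the form s (1 − s) appear throughout the weight-update formulas.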

Page 11:

A Layered Perceptron

[Figure: a feedforward network drawn bottom-up: an input layer, a hidden layer of neurons, and an output layer, joined by interconnects.]

Page 12:

Training

Given training data: an input vector set { i_n | 1 ≤ n ≤ N } and a corresponding output (target) vector set { t_n | 1 ≤ n ≤ N }.

Find the weights of the interconnects, using the training data, that minimize the error on the test data.

Page 13:

Error

Input, target, and response:
- input vector set: { i_n | 1 ≤ n ≤ N }
- target vector set: { t_n | 1 ≤ n ≤ N }
- o_n = neural network output when the input is i_n (note: in general o_n ≠ t_n)

Error:

E = (1/2) Σ_n || o_n − t_n ||²

Page 14:

Error Minimization Techniques

The error is a function of:
- the fixed training and test data
- the neural network weights

Find the weights that minimize the error (standard optimization):
- conjugate gradient descent
- random search
- genetic algorithms
- steepest descent (error backpropagation)

Page 15:

Minimizing Error Using Steepest Descent

The main idea: find the direction downhill and take a step.

[Figure: the error E plotted against a weight x, with the minimum marked; each step moves downhill toward it.]

downhill direction = − dE/dx

η = step size

x ← x − η dE/dx

Page 16:

Example of Steepest Descent

E(x) = (1/2) x² ;  minimum at x = 0

dE/dx = x,  so  − η dE/dx = − η x,  and the update is

x ← x − η x = (1 − η) x

The solution to the difference equation  x_p = (1 − η) x_{p−1}  is  x_p = (1 − η)^p x_0.

For |1 − η| < 1, x_p → 0: the iterates converge to the minimum.
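A short Python sketch of this example; the step size, starting point, and iteration count are arbitrary illustrative choices:

```python
def steepest_descent(x0, eta, steps):
    """Minimize E(x) = x**2 / 2 by repeating x <- x - eta * dE/dx."""
    x = x0
    for _ in range(steps):
        x = x - eta * x   # dE/dx = x for E(x) = x**2 / 2
    return x

# With |1 - eta| < 1 the iterate contracts geometrically toward x = 0:
print(steepest_descent(x0=1.0, eta=0.1, steps=50))   # (1 - 0.1)**50 ~ 0.005
```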

Page 17:

Training the Perceptron

[Figure: a single-layer perceptron with inputs i_1, ..., i_4, outputs o_1, o_2, and weights w_11 through w_24.]

E = (1/2) Σ_{n=1}^{2} ( o_n − t_n )²

  = (1/2) Σ_{n=1}^{2} ( Σ_{k=1}^{4} w_nk i_k − t_n )²

dE/dw_mj = ( Σ_{k=1}^{4} w_mk i_k − t_m ) i_j = ( o_m − t_m ) i_j

Page 18:

Weight Update

dE/dw_mj = ( o_m − t_m ) i_j

For m = 2 and j = 4 (the weight from input i_4 to output o_2):

w_24 ← w_24 − η ( o_2 − t_2 ) i_4

[Figure: the same single-layer perceptron, with the weight w_24 highlighted.]
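A minimal Python sketch of this rule applied to every weight of the single-layer perceptron above, assuming linear outputs to match the derivative on the previous slide; the function and variable names are illustrative, not from the slides:

```python
def delta_rule_update(W, inputs, targets, eta):
    """One steepest-descent step for a single-layer linear perceptron.

    W[m][j] is the weight from input j to output m; each weight moves by
    w_mj <- w_mj - eta * (o_m - t_m) * i_j.
    """
    outputs = [sum(w * i for w, i in zip(row, inputs)) for row in W]
    return [[w - eta * (o - t) * i for w, i in zip(row, inputs)]
            for row, o, t in zip(W, outputs, targets)]

# Two outputs and four inputs, as in the figure.
W = [[0.1, -0.2, 0.3, 0.0],
     [0.0, 0.5, -0.1, 0.2]]
W = delta_rule_update(W, inputs=[1.0, 0.5, -1.0, 2.0],
                      targets=[1.0, 0.0], eta=0.1)
```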

Page 19:

No Hidden Layers = Linear Separation

[Figure: a single neuron with inputs i_1, i_2, i_3, weights w_1, w_2, w_3, and output o.]

o = σ( Σ_n w_n i_n )

For a classifier, threshold the output:
- If o > 1/2, announce class #1.
- If o < 1/2, announce class #2.

Classification boundary: o = 1/2, or Σ_n w_n i_n = 0. This is the equation of a plane!

Page 20:

Classification Boundary

[Figure: the (i_1, i_2) plane with the boundary Σ_n w_n i_n = 0 drawn as a line through the origin.]

Page 21:

Adding Bias Term

[Figure: a single neuron with inputs i_1, i_2, i_3 plus a constant input of 1 (the bias), weights w_1, ..., w_4, and output o.]

The classification boundary is still a line, but it need not go through the origin.

[Figure: the (i_1, i_2) plane with a boundary line offset from the origin.]

Page 22:

The Minsky-Papert Objection

[Figure: the four XOR points in the (i_1, i_2) plane, with (0,0) and (1,1) in one class and (0,1) and (1,0) in the other; no single straight line separates the two classes.]

The simple operation of the exclusive or (XOR) cannot be resolved using a linear perceptron with bias. More important problems can thus probably not be resolved with a linear perceptron with bias either.

Page 23:

The Layered Perceptron

[Figure: the layered network, labeled bottom-up.]

- interconnects: weights w_jk(l)
- neurons: states s_j(l)
- input layer: l = 0
- hidden layers: l = 1, ..., L − 1
- output layer: l = L

Page 24:

Error Backpropagation

Problem: for an arbitrary weight w_jk(l), perform the update

w_jk(l) ← w_jk(l) − η dE/dw_jk(l)

A solution: error backpropagation, i.e., the chain rule for partial derivatives:

dE/dw_jk(l) = [ dE/ds_j(l) ] · [ ds_j(l)/dsum_j(l) ] · [ dsum_j(l)/dw_jk(l) ]

where sum_j(l) = Σ_k w_jk(l) s_k(l−1) and s_j(l) = σ( sum_j(l) ).

Page 25:

Each Partial is Evaluated (Beautiful Math!!!)

ds_j(l)/dsum_j(l) = d/dsum_j(l) [ 1 / ( 1 + exp(− sum_j(l)) ) ] = s_j(l) [ 1 − s_j(l) ]

dsum_j(l)/dw_jk(l) = s_k(l−1)

dE/ds_j(l) = δ_j(l) = Σ_n δ_n(l+1) s_n(l+1) [ 1 − s_n(l+1) ] w_nj(l+1)

Page 26:

Weight Update

w_jk(l) ← w_jk(l) − η dE/dw_jk(l)

dE/dw_jk(l) = [ dE/ds_j(l) ] · [ ds_j(l)/dsum_j(l) ] · [ dsum_j(l)/dw_jk(l) ]

            = δ_j(l) s_j(l) [ 1 − s_j(l) ] s_k(l−1)

Page 27:

Step #1: Input Data & Feedforward

[Figure: a 2-3-2 network. Inputs i_1, i_2 = s_1(0), s_2(0) at the bottom; hidden states s_1(1), s_2(1), s_3(1); outputs s_1(2) = o_1 and s_2(2) = o_2 at the top.]

The states of all of the neurons are determined by the states of the neurons below them and by the interconnect weights.

Page 28:

Step #2: Evaluate output error, backpropagate to find the δ's for each neuron

[Figure: the same network. At the output layer, (o_1, t_1) and (o_2, t_2) give δ_1(2) and δ_2(2); the hidden layer carries δ_1(1), δ_2(1), δ_3(1); the input layer carries δ_1(0), δ_2(0).]

Each neuron now keeps track of two numbers: its state s and its δ. The δ's for each neuron are determined by "back-propagating" the output error toward the input.

Page 29:

Step #3: Update Weights

[Figure: the same network, highlighting the weight w_32(1) between input neuron 2 and hidden neuron 3.]

For example, for w_32(1):

w_32(1) ← w_32(1) − η δ_3(1) s_3(1) [ 1 − s_3(1) ] s_2(0)

Weight updates are performed within the neural network architecture.
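Pulling the three steps together: a self-contained Python sketch of one-pair-at-a-time backpropagation training, applied to the XOR problem from the Minsky-Papert slide. The network size (2-3-1 with bias inputs), step size, epoch count, and seed are illustrative choices, not from the slides:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(eta=0.5, epochs=10000, seed=1):
    """Train a 2-3-1 layered perceptron (with bias inputs) on XOR."""
    rng = random.Random(seed)
    # W1[j][k]: weight into hidden neuron j from input k (k = 2 is the bias).
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
    # W2[k]: weight into the output neuron from hidden k (k = 3 is the bias).
    W2 = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    for _ in range(epochs):
        rng.shuffle(data)  # randomize the order to avoid structure
        for inputs, target in data:
            # Step #1: input data and feedforward.
            s0 = inputs + [1.0]
            s1 = [sigmoid(sum(w * s for w, s in zip(row, s0))) for row in W1]
            h = s1 + [1.0]
            o = sigmoid(sum(w * s for w, s in zip(W2, h)))
            # Step #2: evaluate the output error, backpropagate the deltas.
            d_out = o - target
            d_hid = [d_out * o * (1.0 - o) * W2[j] for j in range(3)]
            # Step #3: update, w <- w - eta * delta * s * (1 - s) * s_below.
            W2 = [w - eta * d_out * o * (1.0 - o) * s for w, s in zip(W2, h)]
            W1 = [[w - eta * d_hid[j] * s1[j] * (1.0 - s1[j]) * s
                   for w, s in zip(W1[j], s0)] for j in range(3)]
    return W1, W2

W1, W2 = train_xor()
for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    s1 = [sigmoid(sum(w * s for w, s in zip(row, x + [1.0]))) for row in W1]
    print(x, round(sigmoid(sum(w * s for w, s in zip(W2, s1 + [1.0]))), 3))
```

With these settings the four outputs typically approach their XOR targets, which the Minsky-Papert slide showed a linear perceptron with bias cannot achieve; backpropagation can occasionally stall in a poor local minimum, in which case a different seed or step size helps.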

Page 30:

Neural Smithing

- Bias
- Momentum
- Batch training
- Learning versus memorization
- Cross validation
- The curse of dimensionality
- Variations

Page 31:

Bias

Bias inputs are used with the MLP:
- at the input
- at hidden layers (sometimes)

Page 32:

Momentum

Steepest descent (m is the iteration number):

Δw_jk(l)[m+1] = − η dE/dw_jk(l),   w_jk(l)[m+1] = w_jk(l)[m] + Δw_jk(l)[m+1]

With momentum (coefficient α):

Δw_jk(l)[m+1] = − η dE/dw_jk(l) + α Δw_jk(l)[m]

- The new step is affected by the previous step.
- Convergence is improved.
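A minimal Python sketch of a momentum step; the coefficient name alpha is conventional rather than from the slides:

```python
def momentum_step(w, grad, prev_step, eta=0.1, alpha=0.9):
    """One update: step = -eta * grad + alpha * (previous step)."""
    step = -eta * grad + alpha * prev_step
    return w + step, step

# The running step carries a decaying memory of earlier gradients,
# smoothing the descent path and often speeding convergence.
w, step = 1.0, 0.0
for _ in range(10):
    w, step = momentum_step(w, grad=w, prev_step=step)  # e.g., E(w) = w**2 / 2
```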

Page 33:

Back-Propagation Batch Training

- Accumulate the error from all training data prior to each weight update.
- True steepest descent.
- Update the weights once per epoch.

Training the Layered Perceptron One Data Pair at a Time

- Randomize the order of the data to avoid structure.
- The Widrow-Hoff algorithm.

Page 34:

Learning versus Memorization: Both have zero training error

[Figure: two decision boundaries fit to the same training points. One stays close to the concept (truth): good generalization (learning). The other threads through every training point: bad generalization (memorization). Legend: training data and test data.]

Page 35:

Alternate View:

[Figure: curve fitting through sample points. A smooth curve near the concept illustrates learning; a wildly oscillating curve through every point illustrates memorization (over-fitting).]

Page 36:

Learning versus Memorization (cont.)

Successful learning: recognizing data outside the training set, e.g., data in the test set. That is, the neural network must successfully classify (or interpolate) inputs it has not seen before.

How can we assure learning?
- Cross validation
- Choosing the neural network structure:
  - Pruning
  - Genetic algorithms

Page 37:

Cross Validation

[Figure: error versus training iterations m. The training error decreases steadily, while the test error falls to a minimum and then rises again; training should stop at that minimum.]
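A sketch of the early-stopping logic this figure implies; train_one_epoch and test_error are assumed, illustrative callables (one pass of weight updates over the training data, and the error measured on held-out test data):

```python
def train_with_early_stopping(weights, train_one_epoch, test_error,
                              max_epochs=1000):
    """Keep the weights from the iteration where the test error is smallest."""
    best_weights, best_err = weights, test_error(weights)
    for _ in range(max_epochs):
        weights = train_one_epoch(weights)   # training error keeps falling...
        err = test_error(weights)            # ...but test error turns back up
        if err < best_err:
            best_weights, best_err = weights, err
    return best_weights, best_err
```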

Page 38:

The Curse of Dimensionality

For many problems, the required number of training data grows exponentially with the dimension of the input.

Example:

- For N = 2 inputs, suppose that 100 = 10² training data pairs are needed.
- For N = 3 inputs, 10³ = 1000 training data pairs are then needed.
- In general, 10^N training data pairs are needed for many important problems.

Page 39:

Example: Classifying a circle in a square

[Figure: a neural net maps inputs (i_1, i_2) to output o. The square input region, containing the circle to be classified, is sampled with 100 = 10² training points.]

Page 40:

Example: Classifying a sphere in a cube (N = 3)

[Figure: a neural net maps inputs (i_1, i_2, i_3) to output o. The cube is sampled as 10 layers, each with 10² points: 10³ = 10^N points in all.]

Page 41:

Variations

Architecture variations for MLPs:
- Recurrent neural networks
- Radial basis functions
- Cascade correlation
- Fuzzy MLPs

Training algorithms

Page 42:

Applications

- Power engineering
- Finance
- Bioengineering
- Control
- Industrial applications
- Politics

Page 43:

Political Applications

Robert Novak syndicated column, Washington, February 18, 1996

UNDECIDED BOWLERS

“President Clinton’s pollsters have identified the voters who will determine whether he will be elected to a second term: two-parent families whose members bowl for recreation.”

“Using a technique they call the ‘neural network,’ Clinton advisors contend that these family bowlers are the quintessential undecided voters. Therefore, these are the people who must be targeted by the president.”

Page 44:

“A footnote: Two decades ago, Illinois Democratic Gov. Dan Walker campaigned heavily in bowling alleys in the belief he would find swing voters there. Walker had national political ambitions but ended up in federal prison.”

Robert Novak syndicated column, Washington, February 18, 1996 (continued)

Page 45:

Finis