

Refresher: Perceptron Training Algorithm
Neural Networks Lecture 6: Perceptron Learning, September 23, 2010 (marc/cs672/net09-23.pdf)





Refresher: Perceptron Training Algorithm

Algorithm Perceptron;
  Start with a randomly chosen weight vector w0;
  Let k = 1;
  while there exist input vectors that are misclassified by wk-1, do
    Let ij be a misclassified input vector;
    Let xk = class(ij)·ij, implying that wk-1·xk < 0;
    Update the weight vector to wk = wk-1 + η·xk;
    Increment k;
  end-while;
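As a concrete illustration of this loop, here is a minimal Python/numpy sketch of the rule above. It assumes that every input vector already carries the leading offset component 1 and that classes are labeled +1/-1; the function name, the random seed, and the iteration cap are illustrative choices, not part of the original pseudocode.

```python
import numpy as np

def perceptron_train(inputs, classes, eta=1.0, max_iter=1000):
    """Perceptron training rule: w_k = w_{k-1} + eta * class(i_j) * i_j
    for some input i_j misclassified by w_{k-1}.

    inputs  : array of shape (n, d), each row already prefixed with offset 1
    classes : array of n labels in {+1, -1}
    """
    rng = np.random.default_rng(0)
    w = rng.standard_normal(inputs.shape[1])   # randomly chosen w_0
    for _ in range(max_iter):
        # misclassified means w_{k-1} . x_k < 0, with x_k = class(i_j) * i_j
        scores = classes * (inputs @ w)
        misclassified = np.where(scores < 0)[0]
        if misclassified.size == 0:            # every point classified correctly
            return w
        j = misclassified[0]                   # pick one misclassified input
        x = classes[j] * inputs[j]             # x_k = class(i_j) * i_j
        w = w + eta * x                        # w_k = w_{k-1} + eta * x_k
    return w
```

As stated later in this lecture, the perceptron convergence result guarantees that this loop terminates whenever the training data are linearly separable.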

Another Refresher: Linear Algebra

How can we visualize a straight line defined by an equation such as w0 + w1·i1 + w2·i2 = 0?

One possibility is to determine the points where the line crosses the coordinate axes:

i1 = 0  ⇒  w0 + w2·i2 = 0  ⇒  w2·i2 = -w0  ⇒  i2 = -w0/w2
i2 = 0  ⇒  w0 + w1·i1 = 0  ⇒  w1·i1 = -w0  ⇒  i1 = -w0/w1

Thus, the line crosses the axes at (0, -w0/w2)^T and (-w0/w1, 0)^T.
If w1 or w2 is 0, it just means that the line is horizontal or vertical, respectively.
If w0 is 0, the line passes through the origin, and its slope i2/i1 is:

w1·i1 + w2·i2 = 0  ⇒  w2·i2 = -w1·i1  ⇒  i2/i1 = -w1/w2
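For plotting, the axis crossings and slope can be computed directly from the weights. The helper below is just an illustrative sketch of that computation (the function name and return format are my own choices); with the example weight vector w0 = (2, 1, -2)^T used later in this lecture it returns the crossings (0, 1) and (-2, 0).

```python
def line_geometry(w0, w1, w2):
    """Axis crossings and slope of the line w0 + w1*i1 + w2*i2 = 0."""
    crossings = {}
    if w2 != 0:
        crossings["i2_axis"] = (0.0, -w0 / w2)   # i1 = 0  =>  i2 = -w0/w2
    if w1 != 0:
        crossings["i1_axis"] = (-w0 / w1, 0.0)   # i2 = 0  =>  i1 = -w0/w1
    slope = -w1 / w2 if w2 != 0 else None        # i2/i1 = -w1/w2 (vertical if w2 = 0)
    return crossings, slope

print(line_geometry(2, 1, -2))
# ({'i2_axis': (0.0, 1.0), 'i1_axis': (-2.0, 0.0)}, 0.5)
```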

Perceptron Learning Example


We would like our perceptron to correctly classify the five 2-dimensional data points below. Let the random initial weight vector be w0 = (2, 1, -2)^T.

Then the dividing line crosses the axes at (0, 1)^T and (-2, 0)^T.


[Figure: the five data points in the (i1, i2) plane, labeled class -1 and class 1, with the dividing line.]

Let us pick the misclassified point (-2, -1)^T for learning:
i = (1, -2, -1)^T   (include offset 1)
x1 = (-1)·(1, -2, -1)^T   (i is in class -1)
x1 = (-1, 2, 1)^T

Perceptron Learning Example


w1 = w0 + x1   (let us set η = 1 for simplicity)
w1 = (2, 1, -2)^T + (-1, 2, 1)^T = (1, 3, -1)^T

The new dividing line crosses the axes at (0, 1)^T and (-1/3, 0)^T.

Let us pick the next misclassified point (0, 2)^T for learning:
i = (1, 0, 2)^T   (include offset 1)
x2 = (1, 0, 2)^T   (i is in class 1)


[Figure: the data points and the new dividing line in the (i1, i2) plane, classes -1 and 1.]

Perceptron Learning Example


w2 = w1 + x2
w2 = (1, 3, -1)^T + (1, 0, 2)^T = (2, 3, 1)^T

Now the line crosses the axes at (0, -2)^T and (-2/3, 0)^T.

With this weight vector, the perceptron achieves perfect classification! The learning process terminates. In most cases, many more iterations are necessary than in this example.

[Figure: the five data points and the final dividing line in the (i1, i2) plane, classes -1 and 1.]
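As a quick sanity check, the two weight updates in this example are easy to reproduce numerically. The sketch below uses η = 1 and only the arithmetic given on the slides:

```python
import numpy as np

w0 = np.array([2, 1, -2])            # initial weight vector (offset weight first)

# first update: point (-2, -1)^T is in class -1
x1 = -1 * np.array([1, -2, -1])      # x1 = class(i) * i = (-1, 2, 1)^T
w1 = w0 + x1                         # eta = 1
print(w1)                            # [ 1  3 -1]

# second update: point (0, 2)^T is in class 1
x2 = 1 * np.array([1, 0, 2])         # x2 = (1, 0, 2)^T
w2 = w1 + x2
print(w2)                            # [2 3 1]
```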

Perceptron Learning Results

We proved that the perceptron learning algorithm is guaranteed to find a solution to a classification problem if it is linearly separable. But are those solutions optimal?

One of the reasons why we are interested in neural networks is that they are able to generalize, i.e., give plausible output for new (untrained) inputs. How well does a perceptron deal with new inputs?



Perceptron Learning Results

Perfect classification of the training samples, but it may not generalize well to new (untrained) samples.


Perceptron Learning Results

This function is likely to classify new samples more accurately.


Adalines

Idea behind adaptive linear elements (Adalines): Compute a continuous, differentiable error function between the net input and the desired output (before applying the threshold function). For example, compute the mean squared error (MSE) between every training vector and its class (1 or -1).


Then find those weights for which the error is minimal. With a differentiable error function, we can use the gradient descent technique to find this absolute minimum of the error function.
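For a single linear unit, that error is simply the mean squared difference between the net input w·i and the target class. A minimal sketch (the function name and averaging convention are my own choices; the lecture does not fix them):

```python
import numpy as np

def adaline_mse(w, inputs, targets):
    """Mean squared error between net inputs w . i and targets (+1 / -1),
    computed before any threshold function is applied."""
    net = inputs @ w                      # net input for every training vector
    return np.mean((targets - net) ** 2)  # MSE over the training set
```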

Gradient Descent

Gradient descent is a very common technique to find the absolute minimum of a function. It is especially useful for high-dimensional functions.

We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.

Gradient Descent

Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x):

[Figure: the curve f(x), with the slope f'(x0) at the starting point x0.]


x1 = x0 - η·f'(x0)

Repeat this iteratively until for some xi, f'(xi) is sufficiently close to 0.
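Written as a loop, the rule xk+1 = xk - η·f'(xk) looks like the sketch below. The example function, stopping tolerance, and step size are illustrative choices, not part of the lecture:

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_iter=10000):
    """Repeat x_{k+1} = x_k - eta * f'(x_k) until f'(x_k) is close to 0."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:      # f'(x) sufficiently close to 0
            break
        x = x - eta * slope       # move against the slope
    return x

# example: f(x) = (x - 3)^2, so f'(x) = 2*(x - 3); the minimum is at x = 3
print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))
```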

Gradient Descent

Gradients of two-dimensional functions:


The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
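The same "move against the gradient" step in two dimensions, sketched for an illustrative function f(x, y) = x² + 2y² (my own choice of example, not from the slides); its analytic gradient is (2x, 4y):

```python
import numpy as np

def grad_f(p):
    """Gradient of f(x, y) = x**2 + 2*y**2, pointing toward steepest increase."""
    x, y = p
    return np.array([2 * x, 4 * y])

p = np.array([3.0, -2.0])       # starting point
eta = 0.1
for _ in range(100):
    p = p - eta * grad_f(p)     # step against the gradient (steepest decrease)
print(p)                        # close to the minimum at (0, 0)
```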