Lecture 3 Perceptrons


The Linear Classifier, also known as the “Perceptron”

COMP24111 lecture 3


LAST WEEK: our first “machine learning” algorithm

Testing point x
For each training datapoint x':
    measure distance(x, x')
End
Sort distances
Select K nearest
Assign most common class

The K-Nearest Neighbour Classifier

Make your own notes on its advantages / disadvantages.

I’ll ask for volunteers next time we meet…


Model (memorize the training data)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (do nothing)

Supervised Learning Pipeline for Nearest Neighbour


The most important concept in Machine Learning


Looks good so far…

The most important concept in Machine Learning


Looks good so far…

Oh no! Mistakes! 

What happened?

The most important concept in Machine Learning


Looks good so far…

Oh no! Mistakes! 

What happened?

We didn’t have all the data.

We can never assume that we do.

This is called “OVER-FITTING” to the small dataset.

The most important concept in Machine Learning


The Linear Classifier

COMP24111 lecture 3


Model (equation)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (search for good parameters)

Supervised Learning Pipeline for Linear Classifiers


A simpler, more compact model?

(scatter plot: height vs. weight)

if (weight > t) then "player" else "dancer"


What’s an algorithm to find a good threshold?

(scatter plot: height vs. weight)

if (weight > t) then "player" else "dancer"

t = 40
while ( numMistakes != 0 ) {
    t = t + 1
    numMistakes = testRule(t)
}


We have our second Machine Learning procedure.

dancer""else player""then)(if    t weight  >

t=40while ( numMistakes != 0 )

{

t = t + 1

numMistakes = testRule(t)

}

The threshold classifier (also known as a“Decision Stump”)
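To make the procedure concrete, here is a runnable Java sketch of the decision stump; the training arrays and the body of the testRule helper are illustrative assumptions, not something given in the lecture.

public class DecisionStump {
    // weight of each person; label: 1 = "player", 0 = "dancer" (assumed data)
    static double[] weights = {95, 110, 88, 52, 60, 47};
    static int[]    labels  = {1,  1,   1,  0,  0,  0};

    // count how many training examples the rule "weight > t" gets wrong
    static int testRule(double t) {
        int mistakes = 0;
        for (int i = 0; i < weights.length; i++) {
            int predicted = (weights[i] > t) ? 1 : 0;
            if (predicted != labels[i]) mistakes++;
        }
        return mistakes;
    }

    public static void main(String[] args) {
        double t = 40;                      // start with a low threshold
        int numMistakes = testRule(t);
        while (numMistakes != 0) {          // nudge t upwards until no mistakes
            t = t + 1;                      // (assumes the data is separable by t)
            numMistakes = testRule(t);
        }
        System.out.println("threshold found: t = " + t);
    }
}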


Three “ingredients” of a Machine Learning procedure

“Model”: The final product, the thing you have to package up and send to a customer. A piece of code with some parameters that need to be set.

“Error function”: The performance criterion: the function you use to judge how well the parameters of the model are set.

“Learning algorithm”: The algorithm that optimises the model parameters, using the error function to judge how well it is doing.


Three “ingredients” of a Threshold Classifier

Model

if (x > t) then "player" else "dancer"

Learning algorithm

t = 40
while ( numMistakes != 0 ) {
    t = t + 1
    numMistakes = testRule(t)
}

Error function

General case – we’re not just talking about the weight of a rugby player – a threshold can be put on any feature ‘x’.


Model (memorize the training data)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (do nothing)

Supervised Learning Pipeline for Nearest Neighbour


What’s the “model” for the Nearest Neighbour classifier?

(scatter plot: height vs. weight)

For the k-NN, the model is the training data itself!
- very good accuracy
- very computationally intensive!

Testing point x
For each training datapoint x':
    measure distance(x, x')
End
Sort distances
Select K nearest
Assign most common class
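For reference, a minimal Java sketch of the pseudocode above; the training set, the Euclidean distance, and k = 3 are illustrative assumptions.

import java.util.Arrays;

public class KNN {
    // training points {height, weight}; labels: 1 = player, 0 = dancer (assumed)
    static double[][] train  = {{180, 95}, {185, 110}, {175, 88},
                                {165, 52}, {170, 60},  {160, 47}};
    static int[]      labels = {1, 1, 1, 0, 0, 0};

    static int classify(double[] x, int k) {
        // measure distance(x, x') for each training point
        Integer[] idx = new Integer[train.length];
        double[] dist = new double[train.length];
        for (int i = 0; i < train.length; i++) {
            idx[i] = i;
            double dh = x[0] - train[i][0], dw = x[1] - train[i][1];
            dist[i] = Math.sqrt(dh * dh + dw * dw);   // Euclidean distance
        }
        // sort by distance, select the K nearest
        Arrays.sort(idx, (a, b) -> Double.compare(dist[a], dist[b]));
        // assign the most common class among them
        int votes = 0;
        for (int j = 0; j < k; j++) votes += labels[idx[j]];
        return (votes * 2 > k) ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(classify(new double[]{178, 90}, 3)); // prints 1 (player)
    }
}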


(scatter plot: height vs. weight)

New data: what’s an algorithm to find a good threshold?

Our model does not match the problem!

1 mistake…

if (weight > t) then "player" else "dancer"


(scatter plot: height vs. weight)

New data: what’s an algorithm to find a good threshold?

But our current model cannot represent this…

if (weight > t) then "player" else "dancer"


We need a more sophisticated model…

dancer""else player""then)(if    t  x >


Input signals are sent from other neurons.

If enough signals accumulate, the neuron fires a signal.

Connection strengths determine how the signals are accumulated.


(diagram: inputs x1, x2, x3 multiplied by weights w1, w2, w3 and added)

a = Σ_{i=1}^{M} x_i w_i

output = 1 if (a > t), else output = 0

• input signals ‘x’ and coefficients ‘w’ are multiplied
• weights correspond to connection strengths
• signals are added up – if they are enough, FIRE!

incoming signal → connection strength → activation level → output signal


Sum notation (just like a loop from 1 to M):

a = Σ_{i=1}^{M} x_i w_i

double[] x = ...
double[] w = ...

Multiply corresponding elements and add them up.

if (activation > threshold) FIRE!

Calculation…
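Written out as the loop the slide describes, in Java; the example signal and weight values are assumptions, not from the slides.

public class Neuron {
    public static void main(String[] args) {
        double[] x = {0.5, -1.0, 2.0};  // incoming signals (assumed values)
        double[] w = {0.1,  0.4, 0.3};  // connection strengths / weights (assumed)
        double t = 0.2;                 // firing threshold (assumed)

        double a = 0.0;                 // activation: a = sum over i of x[i]*w[i]
        for (int i = 0; i < x.length; i++)
            a += x[i] * w[i];           // multiply corresponding elements, add up

        int output = (a > t) ? 1 : 0;   // if (activation > threshold) FIRE!
        System.out.println("a = " + a + ", output = " + output); // a = 0.25, fires
    }
}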


if Σ_{i=1}^{M} x_i w_i > t then output = 1, else output = 0

The Perceptron Decision Rule


out"ut # $out"ut # %

t >if  0else '1then ==   output output   

  

 ∑=

i

 M 

i

iw x1

Ru&'( "la(er # $)allet dancer # %


Is this a good decision boundary?

if Σ_{i=1}^{M} x_i w_i > t then output = 1, else output = 0


w1 = 1.0
w2 = 0.2
t = 0.05

if Σ_{i=1}^{M} x_i w_i > t then output = 1, else output = 0


w1 = 2.1
w2 = 0.2
t = 0.05

if Σ_{i=1}^{M} x_i w_i > t then output = 1, else output = 0


w1 = 1.4
w2 = 0.02
t = 0.05

if Σ_{i=1}^{M} x_i w_i > t then output = 1, else output = 0


han&in& the wei&htsthreshold ma/es the decision 'oundar( move.

0ointless im"ossi'le to do it '( hand 1 onl( o/ 2or sim"le *-3 case.

We need an al&orithm4.

w$ # -%.5

 

w* # %.%6

t # %.%+


w = (0.2, 0.2, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

(diagram: inputs x1, x2, x3 feed the neuron via weights w1, w2, w3)

a = Σ_{i=1}^{M} x_i w_i

Q1. What is the activation, a, of the neuron?
Q2. Does the neuron fire?
Q3. What if we set the threshold at 0.5 and weight w3 to zero?

Take a 20 minute break and think about this.


20 minute break


w = (0.2, 0.2, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

(diagram: inputs x1, x2, x3 feed the neuron via weights w1, w2, w3)

a = Σ_{i=1}^{M} x_i w_i = (1.0 × 0.2) + (0.5 × 0.2) + (2.0 × 0.5) = 1.3

1. What is the activation, a, of the neuron?
2. Does the neuron fire?

if (activation > threshold) output = 1, else output = 0

… So yes, it fires!


w = (0.2, 0.2, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

(diagram: inputs x1, x2, x3 feed the neuron via weights w1, w2, w3)

a = Σ_{i=1}^{M} x_i w_i = (1.0 × 0.2) + (0.5 × 0.2) + (2.0 × 0.0) = 0.3

3. What if we set the threshold at 0.5 and weight w3 to zero?

if (activation > threshold) output = 1, else output = 0

… So no, it does not fire!


We need a more sophisticated model…

(scatter plot: height vs. weight)

if ( weight > t ) then "player" else "dancer"

x1 = height (cm)
x2 = weight (kg)

if ( f(x) > t ) then "player" else "dancer"

f(x) = (w1 × x1) + (w2 × x2) = Σ_{i=1}^{M} w_i x_i

The Perceptron


The Perceptron

f(x) = (w1 × x1) + (w2 × x2) = Σ_{i=1}^{M} w_i x_i

(scatter plots: height vs. weight, showing the decision boundary)

if ( f(x) > t ) then "player" else "dancer"

w1, w2, and t change the position of the DECISION BOUNDARY.
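To see why: the boundary is exactly the set of points where f(x) = t, i.e. the line w1·x1 + w2·x2 = t. Rearranged, x2 = (t − w1·x1) / w2, a straight line whose slope is set by w1 and w2 and whose offset is shifted by t. That is why tuning those three numbers moves the boundary around.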


The Perceptron

Error function: Number of mistakes (a.k.a. classification error)

(scatter plot: height vs. weight)

Model:

if Σ_{i=1}^{M} w_i x_i > t then ŷ = 1 ("player"), else ŷ = 0 ("dancer")

Learning algo: …we need to optimise the w and t values…


Perceptron Learning Rule

new weight = old weight + 0.1 × ( trueLabel − output ) × input

if ( target = 1, output = 1 ) … then update = ?
if ( target = 1, output = 0 ) … then update = ?
if ( target = 0, output = 1 ) … then update = ?
if ( target = 0, output = 0 ) … then update = ?

What weight updates do these cases produce?
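Working the four cases through in Java (a sketch; the 0.1 learning rate comes from the slide, the example input value is an assumption):

public class LearningRuleCases {
    public static void main(String[] args) {
        double input = 1.0;                               // an example input value
        int[][] cases = {{1, 1}, {1, 0}, {0, 1}, {0, 0}}; // {target, output}
        for (int[] c : cases) {
            double update = 0.1 * (c[0] - c[1]) * input;  // the learning rule
            System.out.println("target=" + c[0] + " output=" + c[1]
                               + " -> update = " + update);
        }
    }
}

The two correct cases give update = 0; a missed fire (target 1, output 0) nudges the weight up by 0.1 × input, and a false fire nudges it down.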


initialise weights to random numbers in range -1 to +1
for n = 1 to NUM_ITERATIONS
    for each training example (x, y)
        calculate activation
        for each weight
            update weight by learning rule
        end
    end
end

Perceptron convergence theorem:

If the data is linearly separable, then application of the Perceptron learning rule will find a separating decision boundary, within a finite number of iterations.

Learning algorithm for the Perceptron
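A runnable Java version of the pseudocode above, as a sketch only: the AND-gate training set, 100 iterations, the 0.1 learning rate, and the threshold update (treating t like an extra weight) are assumptions beyond what the slide states.

import java.util.Random;

public class PerceptronTrain {
    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};  // linearly separable
        int[] y = {0, 0, 0, 1};                           // AND labels (assumed task)
        Random rng = new Random(0);

        // initialise weights (and threshold) to random numbers in range -1 to +1
        double[] w = {rng.nextDouble() * 2 - 1, rng.nextDouble() * 2 - 1};
        double t = rng.nextDouble() * 2 - 1;

        for (int n = 0; n < 100; n++) {                   // NUM_ITERATIONS
            for (int e = 0; e < X.length; e++) {          // each training example
                double a = 0.0;                           // calculate activation
                for (int i = 0; i < w.length; i++) a += w[i] * X[e][i];
                int output = (a > t) ? 1 : 0;
                for (int i = 0; i < w.length; i++)        // each weight
                    w[i] += 0.1 * (y[e] - output) * X[e][i];  // learning rule
                t -= 0.1 * (y[e] - output);               // threshold, updated as a bias
            }
        }
        System.out.println("w = " + w[0] + ", " + w[1] + "   t = " + t);
    }
}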


Model (if… then…)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (search for good parameters)

Supervised Learning Pipeline for Perceptron


New data… non-linearly separable

(scatter plot: height vs. weight)

Our model does not match the problem! (AGAIN!)

Many mistakes!

if Σ_{i=1}^{M} w_i x_i > t then "player" else "dancer"


Multilayer Perceptron

(diagram: a network with inputs x1, x2, x3, x4, x5)


MLP decision boundary – nonlinear problems, solved!

(scatter plot: height vs. weight, with a nonlinear decision boundary)


Neural Networks - summary

Perceptrons are a (simple) emulation of a neuron.

Layering perceptrons gives you… a multilayer perceptron. An MLP is one type of neural network – there are others.

An MLP with sigmoid activation functions can solve highly nonlinear problems.
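To see what layering buys us, here is a hedged Java sketch of a forward pass through a tiny MLP with sigmoid activations. The XOR task and the hand-picked weights are illustrative assumptions; in practice the weights come from training, not by hand.

public class TinyMLP {
    static double sigmoid(double a) { return 1.0 / (1.0 + Math.exp(-a)); }

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        // hidden layer: two sigmoid units; output layer: one sigmoid unit
        double[][] wH = {{20, 20}, {-20, -20}};
        double[] bH = {-10, 30};
        double[] wO = {20, 20};
        double bO = -30;

        for (double[] x : X) {
            double h0 = sigmoid(wH[0][0]*x[0] + wH[0][1]*x[1] + bH[0]); // acts like OR
            double h1 = sigmoid(wH[1][0]*x[0] + wH[1][1]*x[1] + bH[1]); // acts like NAND
            double out = sigmoid(wO[0]*h0 + wO[1]*h1 + bO);             // AND of the two
            System.out.println(x[0] + " XOR " + x[1] + " ≈ " + Math.round(out));
        }
    }
}

A single perceptron cannot draw a boundary that separates XOR, but two hidden units plus an output unit can, which is the nonlinear-boundary point made above.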

Downside – we cannot use the simple perceptron learning algorithm.

Instead we have the backpropagation algorithm.

This is outside the scope of this introductory course.
