
Pattern Classification


Content

General Method
K Nearest Neighbors
Decision Trees
Neural Networks

General Method

Training: learning knowledge or parameters from the training data.

Testing: applying the learned knowledge or parameters to new instances.


K Nearest Neighbors

Advantages: nonparametric architecture, simple, powerful, requires no training time.

Disadvantages: memory intensive, classification/estimation is slow.


K Nearest Neighbors

The key issues involved in training this model include setting the variable K (validation techniques such as cross validation can be used) and choosing the type of distance metric.

Euclidean measure:

Dist(X, Y) = \sqrt{\sum_{i=1}^{D} (X_i - Y_i)^2}


Figure: K Nearest Neighbors example.

[Figure: stored training-set patterns, with X marking the input pattern to be classified; the Euclidean distance measure identifies the nearest three patterns.]


Store all input data in the training set.

For each pattern in the test set:

Search for the K nearest patterns to the input pattern using a Euclidean distance measure.

For classification, compute the confidence for each class as Ci / K, where Ci is the number of patterns among the K nearest patterns belonging to class i.

The classification for the input pattern is the class with the highest confidence.
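A minimal sketch of this procedure (the NumPy representation and array shapes are assumptions made for illustration, not part of the slides):

```python
import numpy as np

def knn_classify(train_X, train_y, x, k):
    """Classify one input pattern with K nearest neighbors.

    train_X: (N, D) array of stored training patterns
    train_y: (N,) array of class labels
    x:       (D,) input pattern to classify
    """
    # Euclidean distance from x to every stored pattern
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Indices of the K nearest stored patterns
    nearest = np.argsort(dists)[:k]
    # Confidence for class i is Ci / K
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    confidences = counts / k
    # The class with the highest confidence wins
    return labels[np.argmax(confidences)]
```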


Training parameters and typical settings

Number of nearest neighbors: the number of nearest neighbors (K) should be chosen based on cross validation over a range of K settings.

K = 1 is a good baseline model to benchmark against.

A good rule of thumb is that K should be less than the square root of the total number of training patterns.
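As an illustration, one way such a cross validation over K could be run with scikit-learn (the library choice and the 5-fold setting are assumptions, not from the slides):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def pick_k(X, y, max_k=None):
    """Choose K by cross validation, keeping K below sqrt(N)."""
    max_k = max_k or int(np.sqrt(len(X)))       # rule of thumb: K < sqrt(N)
    scores = {}
    for k in range(1, max_k + 1):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(clf, X, y, cv=5).mean()
    return max(scores, key=scores.get)          # K with the best mean accuracy
```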


Training parameters and typical settings

Input compression: since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification.

Using input compression will result in slightly worse performance.

Sometimes using compression will improve performance, because it performs automatic normalization of the data, which can equalize the effect of each input in the Euclidean distance measure.
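A sketch of such a preprocessing step, here using PCA for the compression and a standard scaler for the normalization (the use of scikit-learn and the number of retained components are assumptions, not from the slides):

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize each input, compress the patterns to a lower-dimensional
# representation, then classify with K nearest neighbors on the compressed data.
model = make_pipeline(
    StandardScaler(),                     # equalizes each input's contribution
    PCA(n_components=10),                 # compresses the stored patterns
    KNeighborsClassifier(n_neighbors=5),
)
# model.fit(train_X, train_y); model.predict(test_X)
```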


Euclidean distance metric fails

[Figure: the pattern to be classified, shown alongside Prototype A and Prototype B.]

Prototype B seems more similar than Prototype A according to Euclidean distance.

Digit “9” misclassified as “4”.

A possible solution is to use a distance metric that is invariant to irrelevant transformations.


Decision trees

Decision trees are popular for pattern recognition because the models they produce are easier to understand.

[Figure: a decision tree drawn from the root node downward.
A. Nodes of the tree
B. Leaves (terminal nodes) of the tree
C. Branches (decision points) of the tree]


Decision trees: Binary decision trees

Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf.

Each node of the tree computes an inequality (e.g. BMI < 24, yes or no) based on a single input variable.

Each leaf is assigned to a particular class.

[Figure: a binary decision tree whose root node tests BMI < 24, with yes/no branches at each internal node and a class at each leaf.]
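A minimal sketch of such a traversal (the node/leaf representation below is a hypothetical one chosen for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Internal node tests one input variable against a threshold;
    a leaf leaves `feature` as None and carries a class `label`."""
    feature: Optional[int] = None      # index of the single input variable tested
    threshold: float = 0.0             # e.g. 24 for "BMI < 24"
    yes: Optional["Node"] = None       # branch taken when x[feature] < threshold
    no: Optional["Node"] = None        # branch taken otherwise
    label: Optional[str] = None        # class assigned when this node is a leaf

def classify(node: Node, x):
    """Traverse from the root node to a leaf and return its class."""
    while node.label is None:
        node = node.yes if x[node.feature] < node.threshold else node.no
    return node.label
```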


Decision trees: Binary decision trees

Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.

[Figure: the input space partitioned by axis-perpendicular decision boundaries.]


Decision trees: Linear decision trees

Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables.

[Figure: a linear decision tree whose root node compares the linear combination aX1 + bX2 against a threshold, with yes/no branches at each node.]
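Using the same hypothetical node representation as the sketch above, only the test at each node changes: instead of comparing a single input variable to a threshold, it compares a weighted sum of several inputs to a threshold.

```python
import numpy as np

def linear_test(x, weights, threshold):
    """Linear decision-tree split, e.g. a*x1 + b*x2 < threshold:
    returns True when the yes branch should be followed."""
    return float(np.dot(weights, x)) < threshold
```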

Biological Neural Systems

Neuron switching time: > 10^-3 seconds
Number of neurons in the human brain: ~10^10
Connections (synapses) per neuron: ~10^4 to 10^5
Face recognition: 0.1 seconds
High degree of distributed and parallel computation
Highly fault tolerant
Highly efficient
Learning is key

Excerpt from Russell and Norvig

A Neuron

Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.

[Figure: a single neuron. Input links carry signals I_k weighted by W_{k,j}; the neuron sums them into in_j and sends the result a_j along its output links.]

in_j = \sum_k W_{k,j} I_k

a_j = output(in_j)

Part 1. Perceptrons: Simple NN

[Figure: a perceptron. Inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed a summing unit; the resulting activation a is thresholded to give the output y.]

a = \sum_{i=1}^{n} w_i x_i

y = 1 if a \ge \theta, and y = 0 if a < \theta

Each input x_i ranges over [0, 1].
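A minimal sketch of this unit (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Threshold unit: returns 1 when the weighted sum of the inputs
    reaches the threshold theta, and 0 otherwise."""
    a = np.dot(w, x)               # activation a = sum_i w_i * x_i
    return 1 if a >= theta else 0
```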

Decision Surface of a Perceptron

[Figure: training points labeled 0 and 1 in the (x1, x2) plane, separated by the decision line.]

Decision line: w1 x1 + w2 x2 = \theta

Linear Separability

[Figure: the four input points of logical AND in the (x1, x2) plane; only (1, 1) has output 1, and it can be separated from the 0s by a single line.]

Logical AND

x1  x2  a  y
0   0   0  0
0   1   1  0
1   0   1  0
1   1   2  1

w1 = 1, w2 = 1, \theta = 1.5
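With the hypothetical perceptron_output sketch from above, these weights can be checked directly:

```python
import numpy as np

# Logical AND with w1 = w2 = 1 and theta = 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), np.array([1.0, 1.0]), 1.5))
# -> 0, 0, 0, 1: only (1, 1) reaches the threshold
```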

Logical XOR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

w1 = ?, w2 = ?, \theta = ?

[Figure: the four XOR input points in the (x1, x2) plane; the 1s at (0, 1) and (1, 0) cannot be separated from the 0s by a single line, so no choice of w1, w2, \theta works.]

Threshold as Weight: W0

[Figure: a perceptron with an extra input x0 = -1 whose weight is w0 = \theta, alongside the ordinary inputs x1, ..., xn with weights w1, ..., wn.]

a = \sum_{i=0}^{n} w_i x_i, with x0 = -1 and w0 = \theta

y = 1 if a \ge 0, and y = 0 if a < 0

Thus y = sgn(a) = 0 or 1.
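A small sketch of this re-parameterization (illustrative only):

```python
import numpy as np

def perceptron_output_bias(x, w):
    """Same unit as before, but the threshold is absorbed into the weights:
    w = [w0, w1, ..., wn] with w0 = theta, the input is augmented with
    x0 = -1, and the test becomes a >= 0."""
    x_aug = np.concatenate(([-1.0], x))    # prepend x0 = -1
    a = np.dot(w, x_aug)                   # a = sum_{i=0}^{n} w_i x_i
    return 1 if a >= 0 else 0
```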

Perceptron Learning Rule

w' = w + \eta (t - y) x

w_i := w_i + \Delta w_i = w_i + \eta (t - y) x_i   (i = 1..n)

The parameter \eta is called the learning rate (in Han's book it is a lower-case l). It determines the magnitude of the weight updates \Delta w_i.

If the output is correct (t = y), the weights are not changed (\Delta w_i = 0).

If the output is incorrect (t ≠ y), the weights w_i are changed so that the weight vector moves toward or away from the input x_i, bringing the output of the perceptron for the new weights w'_i closer to the target t.

Perceptron Training Algorithm

Repeatfor each training vector pair (x,t)

evaluate the output y when x is the inputif yt then

form a new weight vector w’ accordingto w’=w + (t-y) x

else do nothing

end if end forUntil y=t for all training vector pairs or # iterations > k
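A compact sketch of this training loop (the data format and the augmented-input convention from the sketch above are assumptions):

```python
import numpy as np

def train_perceptron(data, eta=0.1, max_iters=100):
    """data: list of (x, t) pairs with x a NumPy vector and t in {0, 1}.
    Returns the augmented weight vector [w0, w1, ..., wn], where w0 = theta."""
    n_inputs = len(data[0][0])
    w = np.zeros(n_inputs + 1)                   # start from all-zero weights
    for _ in range(max_iters):
        all_correct = True
        for x, t in data:
            x_aug = np.concatenate(([-1.0], x))  # x0 = -1 absorbs the threshold
            y = 1 if np.dot(w, x_aug) >= 0 else 0
            if y != t:
                w = w + eta * (t - y) * x_aug    # w' = w + eta (t - y) x
                all_correct = False
        if all_correct:                          # y = t for all training pairs
            break
    return w
```

For a linearly separable problem such as the logical AND above, this loop converges (given enough iterations) to a separating weight vector.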

Perceptron Learning Example

[Figure: training points with targets t = 1 and t = -1 in the plane, the current decision line x2 = 0.2 x1 - 0.5, and the regions where o = 1 and o = -1.]

Initial weights w = [0.25, -0.1, 0.5], giving the decision line x2 = 0.2 x1 - 0.5.

(x, t) = ([-1, -1], 1): o = sgn(0.25 + 0.1 - 0.5) = -1, misclassified; updated weights w = [0.2, -0.2, -0.2].

(x, t) = ([2, 1], -1): o = sgn(0.45 - 0.6 + 0.3) = 1, misclassified; updated weights w = [-0.2, -0.4, -0.2].

(x, t) = ([1, 1], 1): o = sgn(0.25 - 0.7 + 0.1) = -1, misclassified; updated weights w = [0.2, 0.2, 0.2].

Part 2. Multi Layer Networks

[Figure: a multi-layer network; an input vector feeds the input nodes, which connect through hidden nodes to the output nodes that produce the output vector.]

Multi-layer networks can be used to learn nonlinear functions.

How to set the weights?

[Figure (repeated from above): the four XOR input points in the (x1, x2) plane, with the single-perceptron weights left open: w1 = ?, w2 = ?, \theta = ?]

Logical XOR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a two-layer network for XOR; inputs x1 and x2 feed hidden units 3 and 4, which feed output unit 5, with weights such as w23 and w35 labeling the connections.]
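One possible weight assignment for such a two-layer network (these particular values are my own illustration, not taken from the slides): the hidden units compute OR and AND of the inputs, and the output unit fires when OR holds but AND does not.

```python
def step(a):
    """Threshold activation: 1 if a >= 0, else 0."""
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2 - 0.5)        # hidden unit: x1 OR x2
    h_and = step(x1 + x2 - 1.5)        # hidden unit: x1 AND x2
    return step(h_or - h_and - 0.5)    # output unit: OR and not AND = XOR

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(*x))              # -> 0, 1, 1, 0
```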

End
