
Pattern Classification



Content

General Method
K Nearest Neighbors
Decision Trees
Neural Networks


General Method

Training: learning knowledge or parameters from the data.

Testing: applying what was learned to new instances.


K Nearest Neighbors

Advantages:
Nonparametric architecture
Simple
Powerful
Requires no training time

Disadvantages:
Memory intensive
Classification/estimation is slow


K Nearest Neighbors

The key issues in training this model include setting the variable K, typically chosen with a validation technique (e.g., cross-validation), and choosing the type of distance metric.

Euclidean measure:

$$\mathrm{Dist}(X, Y) = \sqrt{\sum_{i=1}^{D} (X_i - Y_i)^2}$$
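As a minimal sketch, the Euclidean measure translates directly into code (plain Python; the function name is ours, not from the slides):

```python
import math

def euclidean_dist(x, y):
    """Euclidean distance between two D-dimensional patterns."""
    assert len(x) == len(y), "patterns must share the same dimension D"
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_dist([0.0, 0.0], [3.0, 4.0]))  # 5.0
```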


Figure: K Nearest Neighbors example. Dots: stored training set patterns. X: input pattern to classify. Dashed lines: Euclidean distances to the nearest three patterns.


K Nearest Neighbors algorithm:

Store all input data in the training set. Then, for each pattern in the test set:

Search for the K nearest patterns to the input pattern using a Euclidean distance measure.

For classification, compute the confidence for each class as Ci/K, where Ci is the number of patterns among the K nearest patterns belonging to class i.

The classification for the input pattern is the class with the highest confidence.
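A compact sketch of this procedure in Python (the names and toy data are illustrative; math.dist requires Python 3.8+):

```python
import math
from collections import Counter

def knn_classify(train, x, k):
    """Classify pattern x against a stored training set of (pattern, label) pairs."""
    # Search for the K nearest patterns using the Euclidean distance measure.
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    # Ci = number of the K nearest patterns belonging to class i; confidence = Ci/K.
    counts = Counter(label for _, label in nearest)
    best_class, ci = counts.most_common(1)[0]
    return best_class, ci / k  # class with the highest confidence

train = [([0, 0], "a"), ([0, 1], "a"), ([5, 5], "b"), ([6, 5], "b")]
print(knn_classify(train, [1, 0], k=3))  # ('a', 0.666...)
```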


Training parameters and typical settings

Number of nearest neighbors: the number of nearest neighbors (K) should be chosen by cross-validation over a range of K settings. K = 1 is a good baseline model to benchmark against. A good rule of thumb is that K should be less than the square root of the total number of training patterns. A sketch of this selection procedure follows.
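For instance, selecting K by cross-validation could look like this (scikit-learn and the Iris data are our assumptions; the slides prescribe neither a library nor a dataset):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
max_k = int(np.sqrt(len(X)))  # rule of thumb: K below sqrt(#training patterns)
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, max_k + 1)  # K = 1 serves as the baseline to beat
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```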


Training parameters and typical settings

Input compression: since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification. Using input compression will usually result in slightly worse performance. Sometimes, however, compression improves performance, because it performs an automatic normalization of the data, which can equalize the effect of each input in the Euclidean distance measure.
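One way to realize this preprocessing (PCA as the compressor and scikit-learn as the toolkit are both our assumptions; the slides name neither):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Standardizing explicitly equalizes each input's weight in the Euclidean
# distance; PCA then compresses the patterns to fewer stored dimensions.
model = make_pipeline(StandardScaler(), PCA(n_components=2),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.predict(X[:3]))
```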


Euclidean distance metric fails

[Figure: a handwritten pattern to be classified, shown alongside Prototype A and Prototype B]

Prototype B seems more similar than Prototype A according to Euclidean distance, so the digit "9" is misclassified as "4".

A possible solution is to use a distance metric that is invariant to irrelevant transformations.


Decision trees

Decision trees are popular for pattern recognition because the models they produce are easier to understand.

[Diagram: a tree growing downward from the root node, with internal nodes (A), leaves (B) at the bottom, and branches (C) connecting them]

A. Nodes of the tree
B. Leaves (terminal nodes) of the tree
C. Branches (decision points) of the tree


Decision trees: binary decision trees

Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf. Each node of the tree computes an inequality (e.g., BMI < 24, yes or no) based on a single input variable. Each leaf is assigned to a particular class.

[Diagram: binary decision tree whose root tests BMI < 24; each internal node branches on a yes/no answer down to class-labeled leaves]
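Traversal is a short loop. A minimal sketch (the node layout and the second test are invented for illustration; only the BMI < 24 root comes from the slide):

```python
class Node:
    """Internal node: tests one input variable against a threshold."""
    def __init__(self, var, thresh, yes, no):
        self.var, self.thresh, self.yes, self.no = var, thresh, yes, no

def classify(node, x):
    # Begin at the root; follow yes/no branches until a leaf (a plain label).
    while isinstance(node, Node):
        node = node.yes if x[node.var] < node.thresh else node.no
    return node

# x = [BMI, systolic blood pressure]; the second split is hypothetical.
tree = Node(0, 24, "normal", Node(1, 140, "overweight", "at risk"))
print(classify(tree, [22, 120]))  # -> normal
print(classify(tree, [30, 150]))  # -> at risk
```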


Decision trees: binary decision trees

Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be interpreted geometrically as a hyperplane perpendicular to that variable's axis.


Decision trees: linear decision trees

Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables.

[Diagram: linear decision tree; the root tests a linear inequality such as aX1 + bX2 < c, with yes/no branches as before]


Biological Neural Systems

Neuron switching time: > 10^-3 s
Number of neurons in the human brain: ~10^10
Connections (synapses) per neuron: ~10^4 to 10^5
Face recognition time: ~0.1 s
High degree of distributed and parallel computation
Highly fault tolerant
Highly efficient
Learning is key


Excerpt from Russell and Norvig


A Neuron

Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.

[Diagram: unit j receives activations a_k over weighted input links W_kj and sends its output a_j along its output links]

$$in_j = \sum_k W_{kj}\, a_k, \qquad a_j = g(in_j)$$
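In code, one unit's computation is a dot product followed by a nonlinearity (the sigmoid is one common choice of g; the numbers are arbitrary):

```python
import numpy as np

a_k = np.array([0.5, -1.0, 2.0])    # activations arriving on the input links
W_kj = np.array([0.1, 0.4, 0.2])    # weights W_kj into unit j
in_j = W_kj @ a_k                   # linear input function: sum_k W_kj * a_k
a_j = 1.0 / (1.0 + np.exp(-in_j))   # nonlinear activation: a_j = g(in_j)
print(in_j, a_j)
```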


Part 1. Perceptrons: Simple NN

[Diagram: inputs x1, x2, ..., xn, with weights w1, w2, ..., wn, feeding a single unit that computes the activation a and emits the output y]

$$a = \sum_{i=1}^{n} w_i x_i$$

Each x_i ranges over [0, 1].

$$y = \begin{cases} 1 & \text{if } a \ge \theta \\ 0 & \text{if } a < \theta \end{cases}$$


Decision Surface of a Perceptron

[Plot: training points labeled 0 and 1 in the (x1, x2) plane, separated by the decision line]

Decision line: $w_1 x_1 + w_2 x_2 = \theta$


Linear Separability

Logical AND is linearly separable, e.g., with w1 = 1, w2 = 1, θ = 1.5:

x1  x2  a  y
0   0   0  0
0   1   1  0
1   0   1  0
1   1   2  1

[Plot: AND in the (x1, x2) plane; one line separates the single 1 from the three 0s]

Logical XOR: what w1, w2, θ would work? None; XOR is not linearly separable:

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Plot: XOR in the (x1, x2) plane; no single line can separate the 1s from the 0s]


Threshold as Weight: W0

[Diagram: the perceptron with an extra constant input x0 = -1 weighted by w0, alongside x1, ..., xn with weights w1, ..., wn]

Setting x0 = -1 and w0 = θ folds the threshold into the weights:

$$a = \sum_{i=0}^{n} w_i x_i$$

$$y = \begin{cases} 1 & \text{if } a \ge 0 \\ 0 & \text{if } a < 0 \end{cases}$$

Thus y = sgn(a) = 0 or 1.


Perceptron Learning Rule

w' = w + η (t − y) x

i.e., w_i := w_i + Δw_i with Δw_i = η (t − y) x_i for i = 1..n. The parameter η is called the learning rate (in Han's book it is a lower-case l); it determines the magnitude of the weight updates Δw_i.

If the output is correct (t = y), the weights are not changed (Δw_i = 0). If the output is incorrect (t ≠ y), the weights w_i are changed such that the output of the perceptron for the new weights w'_i moves closer to the target t for the input x_i.


Perceptron Training Algorithm

Repeat
  for each training vector pair (x, t):
    evaluate the output y when x is the input
    if y ≠ t then
      form a new weight vector w' according to w' = w + η (t − y) x
    else
      do nothing
    end if
  end for
Until y = t for all training vector pairs, or #iterations > k
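A direct sketch of this algorithm (numpy, 0/1 targets, and the threshold folded in as w0 on x0 = -1, as on the earlier slide; the AND demo data is ours):

```python
import numpy as np

def train_perceptron(samples, eta=0.1, max_iters=100):
    """samples: list of (x, t) pairs with target t in {0, 1}."""
    w = np.zeros(len(samples[0][0]) + 1)      # w[0] is the threshold weight w0
    for _ in range(max_iters):                # until y = t for all pairs or > k iters
        converged = True
        for x, t in samples:
            xa = np.concatenate(([-1.0], x))  # augment with constant input x0 = -1
            y = 1 if w @ xa >= 0 else 0       # evaluate the output y
            if y != t:                        # if y != t: w' = w + eta * (t - y) * x
                w += eta * (t - y) * xa
                converged = False
        if converged:
            break
    return w

# Logical AND is linearly separable, so training converges.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(and_data)
print([1 if w @ np.concatenate(([-1.0], x)) >= 0 else 0 for x, _ in and_data])
```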


Perceptron Learning Example

Targets are t = 1 and t = −1, and outputs o = 1 and o = −1. The initial weights w = [0.25, −0.1, 0.5] give the decision line x2 = 0.2 x1 − 0.5. Update steps shown on the slide:

(x, t) = ([−1, −1], 1): o = sgn(0.25 + 0.1 − 0.5) = −1, so w is updated to [0.2, −0.2, −0.2]
(x, t) = ([2, 1], −1): o = sgn(0.45 − 0.6 + 0.3) = 1, so w is updated to [−0.2, −0.4, −0.2]
(x, t) = ([1, 1], 1): o = sgn(0.25 − 0.7 + 0.1) = −1, so w is updated to [0.2, 0.2, 0.2]


Part 2. Multi Layer Networks

[Diagram: feed-forward network; the input vector enters the input nodes, flows through hidden nodes, and emerges as the output vector at the output nodes]


Multiple layers can be used to learn nonlinear functions such as XOR. But how do we set the weights? (One hand-set solution is sketched after the diagram below.)

Logical XOR: w1 = ?, w2 = ?, θ = ?

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Diagram: a two-layer network for XOR; inputs x1 and x2 (nodes 1, 2) feed hidden nodes 3 and 4, which feed output node 5, with weights labeled such as w23 and w35]
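One concrete weight assignment (our illustration; the slide leaves the weights as question marks) uses threshold units: hidden node 3 computes OR, hidden node 4 computes AND, and node 5 fires when OR is on but AND is off:

```python
def step(a, theta):
    """Threshold unit: fire iff activation reaches theta."""
    return 1 if a >= theta else 0

def xor_net(x1, x2):
    h3 = step(x1 + x2, 0.5)      # node 3: OR  (w = 1, 1; theta = 0.5)
    h4 = step(x1 + x2, 1.5)      # node 4: AND (w = 1, 1; theta = 1.5)
    return step(h3 - h4, 0.5)    # node 5: OR and not AND = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # reproduces the XOR truth table
```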


End