
Pattern Classification


Content

General Method
K Nearest Neighbors
Decision Trees
Neural Networks

General Method

Training: learning knowledge or parameters from the training data.

Testing: applying the learned knowledge or parameters to new instances.


K Nearest Neighbors

Advantages: nonparametric architecture, simple, powerful, requires no training time.

Disadvantages: memory intensive, classification/estimation is slow.


K Nearest Neighbors

The key issues involved in training this model include setting the variable K (validation techniques such as cross validation can be used) and choosing the type of distance metric.

Euclidean measure:

Dist(X, Y) = \sqrt{\sum_{i=1}^{D} (X_i - Y_i)^2}


Figure: K Nearest Neighbors example.

[Figure: stored training-set patterns, with X marking the input pattern to be classified; the Euclidean distance measure identifies the nearest three patterns.]


Store all input data in the training set.

For each pattern in the test set:

Search for the K nearest patterns to the input pattern using a Euclidean distance measure.

For classification, compute the confidence for each class as Ci / K, where Ci is the number of patterns among the K nearest patterns belonging to class i.

The classification for the input pattern is the class with the highest confidence.
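A minimal sketch of this procedure (the NumPy representation and array shapes are assumptions made for illustration, not part of the slides):

```python
import numpy as np

def knn_classify(train_X, train_y, x, k):
    """Classify one input pattern with K nearest neighbors.

    train_X: (N, D) array of stored training patterns
    train_y: (N,) array of class labels
    x:       (D,) input pattern to classify
    """
    # Euclidean distance from x to every stored pattern
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Indices of the K nearest stored patterns
    nearest = np.argsort(dists)[:k]
    # Confidence for class i is Ci / K
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    confidences = counts / k
    # The class with the highest confidence wins
    return labels[np.argmax(confidences)]
```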


Training parameters and typical settings

Number of nearest neighbors: the number of nearest neighbors (K) should be chosen based on cross validation over a range of K settings.

K = 1 is a good baseline model to benchmark against.

A good rule of thumb is that K should be less than the square root of the total number of training patterns.
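As an illustration, one way such a cross validation over K could be run with scikit-learn (the library choice and the 5-fold setting are assumptions, not from the slides):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def pick_k(X, y, max_k=None):
    """Choose K by cross validation, keeping K below sqrt(N)."""
    max_k = max_k or int(np.sqrt(len(X)))       # rule of thumb: K < sqrt(N)
    scores = {}
    for k in range(1, max_k + 1):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(clf, X, y, cv=5).mean()
    return max(scores, key=scores.get)          # K with the best mean accuracy
```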


Training parameters and typical settings

Input compression: since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification.

Using input compression will result in slightly worse performance.

Sometimes using compression will improve performance, because it performs automatic normalization of the data, which can equalize the effect of each input in the Euclidean distance measure.
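A sketch of such a preprocessing step, here using PCA for the compression and a standard scaler for the normalization (the use of scikit-learn and the number of retained components are assumptions, not from the slides):

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize each input, compress the patterns to a lower-dimensional
# representation, then classify with K nearest neighbors on the compressed data.
model = make_pipeline(
    StandardScaler(),                     # equalizes each input's contribution
    PCA(n_components=10),                 # compresses the stored patterns
    KNeighborsClassifier(n_neighbors=5),
)
# model.fit(train_X, train_y); model.predict(test_X)
```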


Euclidean distance metric fails

[Figure: the pattern to be classified, shown alongside Prototype A and Prototype B.]

Prototype B seems more similar than Prototype A according to Euclidean distance.

Digit “9” misclassified as “4”.

A possible solution is to use a distance metric that is invariant to irrelevant transformations.


Decision trees

Decision trees are popular for pattern recognition because the models they produce are easier to understand.

[Figure: a decision tree drawn from the root node downward.
A. Nodes of the tree
B. Leaves (terminal nodes) of the tree
C. Branches (decision points) of the tree]


Decision trees: Binary decision trees

Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf.

Each node of the tree computes an inequality (e.g. BMI < 24, yes or no) based on a single input variable.

Each leaf is assigned to a particular class.

[Figure: a binary decision tree whose root node tests BMI < 24, with yes/no branches at each internal node and a class at each leaf.]
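A minimal sketch of such a traversal (the node/leaf representation below is a hypothetical one chosen for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Internal node tests one input variable against a threshold;
    a leaf leaves `feature` as None and carries a class `label`."""
    feature: Optional[int] = None      # index of the single input variable tested
    threshold: float = 0.0             # e.g. 24 for "BMI < 24"
    yes: Optional["Node"] = None       # branch taken when x[feature] < threshold
    no: Optional["Node"] = None        # branch taken otherwise
    label: Optional[str] = None        # class assigned when this node is a leaf

def classify(node: Node, x):
    """Traverse from the root node to a leaf and return its class."""
    while node.label is None:
        node = node.yes if x[node.feature] < node.threshold else node.no
    return node.label
```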


Decision trees: Binary decision trees

Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.

[Figure: the input space partitioned by axis-perpendicular decision boundaries.]


Decision trees: Linear decision trees

Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables.

[Figure: a linear decision tree whose root node compares the linear combination aX1 + bX2 against a threshold, with yes/no branches at each node.]
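Using the same hypothetical node representation as the sketch above, only the test at each node changes: instead of comparing a single input variable to a threshold, it compares a weighted sum of several inputs to a threshold.

```python
import numpy as np

def linear_test(x, weights, threshold):
    """Linear decision-tree split, e.g. a*x1 + b*x2 < threshold:
    returns True when the yes branch should be followed."""
    return float(np.dot(weights, x)) < threshold
```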

Biological Neural Systems

Neuron switching time: > 10^-3 seconds
Number of neurons in the human brain: ~10^10
Connections (synapses) per neuron: ~10^4 to 10^5
Face recognition: 0.1 seconds
High degree of distributed and parallel computation
Highly fault tolerant
Highly efficient
Learning is key

Excerpt from Russell and Norvig

A Neuron

Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.

[Figure: a single neuron. Input links carry signals I_k weighted by W_{k,j}; the neuron sums them into in_j and sends the result a_j along its output links.]

in_j = \sum_k W_{k,j} I_k

a_j = output(in_j)

Part 1. Perceptrons: Simple NN

[Figure: a perceptron. Inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed a summing unit; the resulting activation a is thresholded to give the output y.]

a = \sum_{i=1}^{n} w_i x_i

y = 1 if a \ge \theta, and y = 0 if a < \theta

Each input x_i ranges over [0, 1].
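A minimal sketch of this unit (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Threshold unit: returns 1 when the weighted sum of the inputs
    reaches the threshold theta, and 0 otherwise."""
    a = np.dot(w, x)               # activation a = sum_i w_i * x_i
    return 1 if a >= theta else 0
```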

Decision Surface of a Perceptron

[Figure: training points labeled 0 and 1 in the (x1, x2) plane, separated by the decision line.]

Decision line: w1 x1 + w2 x2 = \theta

Linear Separability

[Figure: the four input points of logical AND in the (x1, x2) plane; only (1, 1) has output 1, and it can be separated from the 0s by a single line.]

Logical AND

x1  x2  a  y
0   0   0  0
0   1   1  0
1   0   1  0
1   1   2  1

w1 = 1, w2 = 1, \theta = 1.5
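With the hypothetical perceptron_output sketch from above, these weights can be checked directly:

```python
import numpy as np

# Logical AND with w1 = w2 = 1 and theta = 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), np.array([1.0, 1.0]), 1.5))
# -> 0, 0, 0, 1: only (1, 1) reaches the threshold
```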

Logical XOR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

w1 = ?, w2 = ?, \theta = ?

[Figure: the four XOR input points in the (x1, x2) plane; the 1s at (0, 1) and (1, 0) cannot be separated from the 0s by a single line, so no choice of w1, w2, \theta works.]

Threshold as Weight: W0

[Figure: a perceptron with an extra input x0 = -1 whose weight is w0 = \theta, alongside the ordinary inputs x1, ..., xn with weights w1, ..., wn.]

a = \sum_{i=0}^{n} w_i x_i, with x0 = -1 and w0 = \theta

y = 1 if a \ge 0, and y = 0 if a < 0

Thus y = sgn(a) = 0 or 1.
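A small sketch of this re-parameterization (illustrative only):

```python
import numpy as np

def perceptron_output_bias(x, w):
    """Same unit as before, but the threshold is absorbed into the weights:
    w = [w0, w1, ..., wn] with w0 = theta, the input is augmented with
    x0 = -1, and the test becomes a >= 0."""
    x_aug = np.concatenate(([-1.0], x))    # prepend x0 = -1
    a = np.dot(w, x_aug)                   # a = sum_{i=0}^{n} w_i x_i
    return 1 if a >= 0 else 0
```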

Perceptron Learning Rule

w' = w + \eta (t - y) x

w_i := w_i + \Delta w_i = w_i + \eta (t - y) x_i   (i = 1..n)

The parameter \eta is called the learning rate (in Han's book it is a lower-case l). It determines the magnitude of the weight updates \Delta w_i.

If the output is correct (t = y), the weights are not changed (\Delta w_i = 0).

If the output is incorrect (t ≠ y), the weights w_i are changed so that the weight vector moves toward or away from the input x_i, bringing the output of the perceptron for the new weights w'_i closer to the target t.

Perceptron Training Algorithm

Repeatfor each training vector pair (x,t)

evaluate the output y when x is the inputif yt then

form a new weight vector w’ accordingto w’=w + (t-y) x

else do nothing

end if end forUntil y=t for all training vector pairs or # iterations > k
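A compact sketch of this training loop (the data format and the augmented-input convention from the sketch above are assumptions):

```python
import numpy as np

def train_perceptron(data, eta=0.1, max_iters=100):
    """data: list of (x, t) pairs with x a NumPy vector and t in {0, 1}.
    Returns the augmented weight vector [w0, w1, ..., wn], where w0 = theta."""
    n_inputs = len(data[0][0])
    w = np.zeros(n_inputs + 1)                   # start from all-zero weights
    for _ in range(max_iters):
        all_correct = True
        for x, t in data:
            x_aug = np.concatenate(([-1.0], x))  # x0 = -1 absorbs the threshold
            y = 1 if np.dot(w, x_aug) >= 0 else 0
            if y != t:
                w = w + eta * (t - y) * x_aug    # w' = w + eta (t - y) x
                all_correct = False
        if all_correct:                          # y = t for all training pairs
            break
    return w
```

For a linearly separable problem such as the logical AND above, this loop converges (given enough iterations) to a separating weight vector.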

Perceptron Learning Example

[Figure: training points with targets t = 1 and t = -1 in the plane, the current decision line x2 = 0.2 x1 - 0.5, and the regions where o = 1 and o = -1.]

Initial weights w = [0.25, -0.1, 0.5], giving the decision line x2 = 0.2 x1 - 0.5.

(x, t) = ([-1, -1], 1): o = sgn(0.25 + 0.1 - 0.5) = -1, misclassified; updated weights w = [0.2, -0.2, -0.2].

(x, t) = ([2, 1], -1): o = sgn(0.45 - 0.6 + 0.3) = 1, misclassified; updated weights w = [-0.2, -0.4, -0.2].

(x, t) = ([1, 1], 1): o = sgn(0.25 - 0.7 + 0.1) = -1, misclassified; updated weights w = [0.2, 0.2, 0.2].

Part 2. Multi Layer Networks

[Figure: a multi-layer network; an input vector feeds the input nodes, which connect through hidden nodes to the output nodes that produce the output vector.]

Multi-layer networks can be used to learn nonlinear functions.

How to set the weights?

[Figure (repeated from above): the four XOR input points in the (x1, x2) plane, with the single-perceptron weights left open: w1 = ?, w2 = ?, \theta = ?]

Logical XOR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a two-layer network for XOR; inputs x1 and x2 feed hidden units 3 and 4, which feed output unit 5, with weights such as w23 and w35 labeling the connections.]
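One possible weight assignment for such a two-layer network (these particular values are my own illustration, not taken from the slides): the hidden units compute OR and AND of the inputs, and the output unit fires when OR holds but AND does not.

```python
def step(a):
    """Threshold activation: 1 if a >= 0, else 0."""
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2 - 0.5)        # hidden unit: x1 OR x2
    h_and = step(x1 + x2 - 1.5)        # hidden unit: x1 AND x2
    return step(h_or - h_and - 0.5)    # output unit: OR and not AND = XOR

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(*x))              # -> 0, 1, 1, 0
```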

End
