
CS 478 - Instance Based Learning 1

Instance Based Learning


Classify based on local similarity
Ranges from simple nearest neighbor to case-based and analogical reasoning
Use local information near the current query instance to decide the classification of that instance
As such, can represent quite complex decision surfaces in a simple manner

k-Nearest Neighbor Approach

Simply store all (or some representative subset) of the examples in the training set
To generalize on a new instance, measure the distance from the new instance to one or more stored instances, which vote to decide the class of the new instance
No need to pre-process a specific hypothesis (Lazy vs. Eager learning)
  – Fast learning
  – Can be slow during execution and require significant storage
  – Some models index the data or reduce the instances stored


k-Nearest Neighbor (cont.)

Naturally supports real valued attributes
Typically use Euclidean distance

$$\mathrm{dist}(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$

Nominal/unknown attributes can just be a 1/0 distance (more on other distance metrics later)
The output class for the query instance is set to the most common class of its k nearest neighbors

$$\hat{f}(x_q) = \operatorname*{argmax}_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$$

where δ(x,y) = 1 if x = y, else 0
k greater than 1 is more noise resistant, but a very large k would lead to less accuracy as less relevant neighbors have more influence (common values: k=3, k=5)
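The voting rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `euclidean` and `knn_classify` and the toy training set are assumptions made for the example, and ties are broken by whichever class was counted first.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(train, query, k=3):
    """Return the most common label among the k stored instances nearest to query.

    train is a list of (features, label) pairs (hypothetical layout for this sketch).
    """
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns the label with the highest vote count;
    # ties go to the label that was counted first.
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration only.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_classify(train, (1.1, 1.0), k=3))  # prints "A"
```

Note that all work happens at query time (lazy learning): "training" is just storing the list.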


k-Nearest Neighbor (cont.)

Can also do distance weighted voting, where the strength of a neighbor's influence decreases with its distance from the query instance

$$\hat{f}(x_q) = \operatorname*{argmax}_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$$

Inverse of distance squared is a common weight

$$w_i = \frac{1}{\mathrm{dist}(x_q, x_i)^2}$$

Gaussian is another common distance weight
In this case can let k be larger (even all points if desired), because the more distant points have negligible influence
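A minimal sketch of distance weighted voting with inverse squared distance weights follows. The name `weighted_knn_classify` and the small constant `eps` are assumptions of this example; `eps` is added only to avoid division by zero when the query coincides with a stored instance.

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, query, k=3, eps=1e-12):
    """Each of the k nearest neighbors votes with weight 1 / dist^2."""
    nearest = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    for d, label in nearest:
        scores[label] += 1.0 / (d * d + eps)  # eps is an assumption, not in the slides
    return max(scores, key=scores.get)

# Toy data: one very close "A" and two more distant "B" instances.
train = [((0.0,), "A"), ((2.0,), "B"), ((2.5,), "B")]
print(weighted_knn_classify(train, (0.1,), k=3))  # prints "A"
```

With k=3 an unweighted vote on this data would return "B" (two votes to one), while the weighted vote returns "A" because the single close neighbor dominates.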


Regression with k-nn

Can also do regression by letting the output be the weighted mean of the k nearest neighbors
For distance weighted regression

$$\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{\mathrm{dist}(x_q, x_i)^2}$$

Where f(x) is the output value for instance x
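The weighted mean can be sketched as below. As before, the function name, the data layout, and the `eps` guard against a zero distance are assumptions made for illustration.

```python
import math

def weighted_knn_regress(train, query, k=3, eps=1e-12):
    """Distance weighted mean of the outputs of the k nearest neighbors."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    weights = [1.0 / (math.dist(x, query) ** 2 + eps) for x, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Toy data: (features, output value) pairs invented for the example.
train = [((0.0,), 0.0), ((1.0,), 10.0), ((3.0,), 30.0)]
# The query is equidistant from the two nearest points, so the result is their mean.
print(weighted_knn_regress(train, (0.5,), k=2))
```

When the query sits exactly on a stored instance, that instance's weight becomes very large and the prediction is essentially its stored output.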


Attribute Weighting

One of the main weaknesses of nearest neighbor is irrelevant features, since they can dominate the distance
Can create algorithms which weight the attributes (note that backprop, ID3, etc. do higher order weighting of features)
No longer lazy evaluation, since you need to come up with a portion of your hypothesis (attribute weights) before generalizing
Still an open area of research
  – Should weighting be local or global?
  – What is the best method, etc.?


Reduction Techniques

Wilson, D. R. and Martinez, T. R., Reduction Techniques for Exemplar-Based Learning Algorithms, Machine Learning Journal, vol. 38, no. 3, pp. 257-286, 2000.

Create a subset or other representative set of prototype nodes
Approaches
  – Leave-one-out reduction - Drop instance if it would still be classified correctly
  – Growth algorithm - Only add instance if it is not already classified correctly
  – Both of the above are order dependent
  – More global optimizing approaches
  – Just keep central points
  – Just keep border points (pre-process noisy instances)
  – Drop 5 (Wilson & Martinez) maintains almost full accuracy with approximately 15% of the original instances
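As a rough illustration of the growth approach listed above, the sketch below makes a single pass over the training set and stores an instance only when the instances stored so far would misclassify it. The use of a 1-nearest-neighbor check, the single pass, and the names `nearest_label` and `growth_reduce` are assumptions of this example; it is not the Drop 5 algorithm from the cited paper.

```python
import math

def nearest_label(stored, x):
    """Label of the stored instance closest to x (1-nearest neighbor)."""
    return min(stored, key=lambda pair: math.dist(pair[0], x))[1]

def growth_reduce(train):
    """Single pass: add an instance only if the current subset misclassifies it."""
    stored = []
    for x, label in train:
        if not stored or nearest_label(stored, x) != label:
            stored.append((x, label))
    return stored

# Toy data: two well separated clusters.
train = [((0, 0), "A"), ((0.1, 0), "A"), ((5, 5), "B"),
         ((5.1, 5), "B"), ((0, 0.2), "A")]
print(growth_reduce(train))  # prints [((0, 0), 'A'), ((5, 5), 'B')]
```

Presenting the same instances in a different order can yield a different stored subset, which is the order dependence noted in the list.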


Radial Basis Function (RBF) Networks

Each prototype node computes a distance based kernel function (Gaussian is common)
Prototype nodes form a hidden layer in a neural network
Train top layer with simple delta rule to get outputs
Thus, prototype nodes learn weightings for each class

[Figure: diagram with inputs x and y, prototype nodes 1-4, and output nodes A and B]


Radial Basis Function Networks

Number of nodes and placement (means)
Sphere of influence (deviation)
  – Too small - no generalization, should have some overlap
  – Too large - saturation, lose local effects, long training
Output layer weights - linear or non-linear nodes
  – Delta rule variations
  – Direct matrix weight calculation
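The pieces above (prototype means, a deviation, and delta rule training of the output layer) can be put together in a small sketch. Everything specific here is an assumption made for illustration: one prototype per training instance, a shared deviation `sigma`, a single linear output node without a bias weight, and the learning rate and epoch count.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian kernel of the distance between x and a prototype center."""
    return math.exp(-math.dist(x, center) ** 2 / (2 * sigma ** 2))

def hidden_layer(x, centers, sigma):
    """Activations of all prototype nodes for input x."""
    return [gaussian(x, c, sigma) for c in centers]

def train_output_weights(data, centers, sigma=0.5, lr=0.5, epochs=200):
    """Train a single linear output node with the delta rule."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in data:
            h = hidden_layer(x, centers, sigma)
            out = sum(w * a for w, a in zip(weights, h))
            err = target - out
            # Delta rule: move each weight in proportion to error times activation.
            weights = [w + lr * err * a for w, a in zip(weights, h)]
    return weights

def predict(x, centers, weights, sigma=0.5):
    return sum(w * a for w, a in zip(weights, hidden_layer(x, centers, sigma)))

# XOR, which a single linear unit on the raw inputs cannot represent.
xor = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]
centers = [x for x, _ in xor]  # assumption: one prototype per training instance
weights = train_output_weights(xor, centers)
for x, target in xor:
    print(x, target, round(predict(x, centers, weights), 3))
```

Shrinking `sigma` makes each prototype respond only to its own instance (no generalization), while a very large `sigma` makes all hidden activations nearly equal, which matches the trade-off in the list above.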


Node Placement

One for each instance of the training set
Random subset of instances
Clustering - Unsupervised or supervised - k-means style vs. constructive
Genetic Algorithms
Random Coverage - Curse of Dimensionality
Node adjustment - Competitive Learning style
Dynamic addition and deletion of nodes


RBF vs. BP

Line vs. Sphere - mix-and-match approaches
Potentially faster training - nearest neighbor localization - yet more data and hidden nodes typically needed
Local vs. Global, less extrapolation (ala BP), have reject capability (avoid false positives)
RBF can have problems with irrelevant features just like nearest neighbor


Distance Metrics

Wilson, D. R. and Martinez, T. R., Improved Heterogeneous Distance Functions, Journal of Artificial Intelligence Research, vol. 6, no. 1, pp. 1-34, 1997.

Normalization of features
Main question: How best to handle nominal inputs
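One simple way to combine the two points above is to divide each numeric difference by that attribute's range and to use the 1/0 distance mentioned earlier for nominal or unknown values. The sketch below is an assumption-laden illustration of that idea, not the improved distance functions proposed in the cited paper; the name `heterogeneous_distance` and the `ranges` argument are hypothetical.

```python
import math

def heterogeneous_distance(x, y, ranges):
    """Euclidean-style distance over mixed numeric and nominal attributes.

    ranges[i] is (max - min) for numeric attribute i, or None if attribute i
    is nominal. A value of None in x or y marks an unknown value.
    """
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if a is None or b is None:
            d = 1.0                        # unknown value: 1/0 distance, assume 1
        elif r is None:
            d = 0.0 if a == b else 1.0     # nominal attribute: 1/0 distance
        else:
            d = abs(a - b) / r if r > 0 else 0.0  # numeric: normalize by range
        total += d * d
    return math.sqrt(total)

# Toy attributes (age, income, color) with made-up ranges.
ranges = (50, 100000, None)
print(heterogeneous_distance((30, 50000, "red"), (40, 60000, "blue"), ranges))
```

Without the normalization, the income attribute would dominate the distance simply because of its scale.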


Instance Based Learning Assignment

See http://axon.cs.byu.edu/~martinez/classes/478/Assignments.html