
Instance-Based Learning and Clustering

R&N 20.4, a bit of 20.3


Different kinds of Inductive Learning

• Supervised learning
  – Basic idea: Learn an approximation to a function y = f(x) from labelled examples { (x1,y1), (x2,y2), …, (xn,yn) }

– E.g. Decision Trees, Bayes classifiers, Instance-based learning methods

• Unsupervised learning


Instance-based learning

• Idea: For every test data point, search database of training data for ‘similar’ points and predict according to those points

• Four elements of an instance-based learner:
  – How do we define ‘similarity’?
  – How many similar data points (neighbors) do we use?
  – (Optional) What weights do we give these neighbors?
  – How do we predict using these neighbors?

One-nearest-neighbor (1-NN)

• The simplest instance-based learning method
• Four elements of 1-NN:
  – How do we define ‘similarity’? Euclidean distance metric
  – How many similar data points (neighbors) do we use? One
  – (Optional) What weights do we give these neighbors? Unused
  – How do we predict using these neighbors? Predict the same value as the nearest neighbor
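
As a concrete illustration (not from the original slides), here is a minimal 1-NN classifier in Python/NumPy; the function and variable names are our own, chosen for clarity.

```python
import numpy as np

def one_nn_predict(X_train, y_train, x_query):
    """Predict the label of x_query as the label of its single
    nearest training point under Euclidean distance."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distance to every training point
    return y_train[np.argmin(dists)]                   # copy the nearest neighbor's label

# Tiny example: two classes in 2-D
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y_train = np.array(["A", "A", "B", "B"])
print(one_nn_predict(X_train, y_train, np.array([2.8, 3.2])))  # -> "B"
```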


1-NN Prediction

1. Classification (predicting discrete-valued labels)

[Figure: training points from class A and class B plotted against features p1 and p2; a test point is assigned the class of its single nearest neighbor (its prediction).]

1-NN Prediction

1. Classification (predicting discrete-valued labels)

• Three classes; the background color indicates the predicted class in each region
• Solid lines are the decision boundaries between classes

[ignore the dashed purple line]


1-NN Prediction

2. Regression (predicting real-valued labels)

K-nearest-neighbor (K-NN)

• A generalization of 1-NN to multiple neighbors
• Four elements of K-NN:
  – How do we define ‘similarity’? Euclidean distance metric
  – How many similar data points (neighbors) do we use? K
  – (Optional) What weights do we give these neighbors? Unused
  – How do we predict using these neighbors?
    • Classification: predict the majority label among the neighbors
    • Regression: predict the average value among the neighbors
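
A hedged sketch of K-NN showing both prediction modes (our own NumPy code and naming, not the lecture's):

```python
import numpy as np
from collections import Counter

def knn_neighbors(X_train, x_query, k):
    """Indices of the k training points nearest to x_query (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return np.argsort(dists)[:k]

def knn_classify(X_train, y_train, x_query, k):
    """Classification: predict the majority label among the k nearest neighbors."""
    idx = knn_neighbors(X_train, x_query, k)
    return Counter(y_train[idx]).most_common(1)[0][0]

def knn_regress(X_train, y_train, x_query, k):
    """Regression: predict the average target value among the k nearest neighbors."""
    idx = knn_neighbors(X_train, x_query, k)
    return y_train[idx].mean()

# Example: 3-NN classification of a point near the "B" cluster
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_classify(X, y, np.array([4.5, 5.0]), k=3))  # -> "B"
```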


K-NN Prediction

1. Classification (K=3)

[Figure: training points from class A and class B plotted against features p1 and p2; a test point is assigned the majority class among its 3 nearest neighbors (its prediction).]


K-NN Prediction

1. Classification (K=15)

• Three classes; the background color indicates the predicted class in each region
• Solid lines are the decision boundaries between classes

[ignore the dashed purple line]

K-NN Prediction

[Figure: decision boundaries obtained with K=1 vs. K=15; the larger K gives smoother boundaries.]


K-NN Prediction

2. Regression (with K=9)

K-NN Prediction

[Figure: K-NN regression fits with K=1 vs. K=9; averaging over more neighbors gives a smoother fit.]


Example: Recognition of handwritten digits

• Each 30 x 20 pixel image of a digit is flattened into a single 600-dimensional data point
• N sets of handwritten digit samples give 10 x N 600-dimensional training points (each color in the figure represents samples of a particular digit)
• A new handwritten sample is classified by K-NN in the 600-dimensional space, using the training data
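
The slide's 30 x 20 digit scans are not available here, but the same pipeline can be sketched with scikit-learn's bundled 8 x 8 digits dataset as a stand-in (an assumption of this sketch, not the lecture's data): flatten each image into a vector and classify with K-NN.

```python
# Sketch only: scikit-learn's 8x8 digits stand in for the slide's 30x20 scans.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()                      # each image is flattened to a 64-dim vector
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)   # K-NN in the flattened pixel space
knn.fit(X_train, y_train)                   # "training" is just storing the data
print("test accuracy:", knn.score(X_test, y_test))
```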


K-NN vs. other techniques

• Most instance-based methods work only for real-valued inputs
• Instance-based methods do not need a training phase, unlike decision trees and Bayes classifiers
• However, the nearest-neighbor search step can be expensive for large or high-dimensional datasets
• Instance-based learning is non-parametric, i.e. it makes no prior model assumptions
• There is no foolproof way to pre-select K … one must try different values and pick one that works well
• Discontinuities and edge effects in K-NN regression … can be addressed by introducing weights for data points that are proportional to closeness
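
The last bullet mentions weighting neighbors by closeness; one common way to do this (a sketch with our own naming, using inverse-distance weights as an illustrative choice) is:

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_query, k, eps=1e-12):
    """K-NN regression where each neighbor's vote is weighted by inverse distance,
    so closer neighbors influence the prediction more (smoothing out edge effects)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]                 # the k nearest neighbors
    w = 1.0 / (dists[idx] + eps)                # inverse-distance weights (one common choice)
    return np.sum(w * y_train[idx]) / np.sum(w)
```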

Unsupervised Learning (a.k.a. Clustering)

• Unsupervised learning
  – Basic idea: Learn an approximation to a function y = f(x) from unlabelled examples { x1, x2, …, xn }

– The goal is to uncover distinct classes of data points (clusters), which might then lead to a supervised learning scenario

– E.g. K-means, hierarchical clustering

The following slides are adapted from Andrew Moore’s slides at http://www.autonlab.org/tutorials/kmeans.html


K-means

• Even if we have no labels for a data set, there might still be interesting structure in the data in the form of distinct clusters/clumps.

• K-means is an iterative algorithm to find such clusters given the assumption that exactly K clusters exist.

K-means

1. Ask the user how many clusters they'd like (e.g. k = 5)
2. Randomly guess k cluster Center locations
3. Each datapoint finds out which Center it's closest to (thus each Center "owns" a set of datapoints)
4. Each Center finds the centroid of the points it owns…
5. …and jumps there
6. …Repeat (steps 3–5) until terminated!
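
A minimal NumPy sketch of this loop (our own code; it initializes the Centers by sampling k data points and stops when the ownership assignment no longer changes):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: assign each point to its nearest Center, move each Center
    to the centroid of the points it owns, repeat until nothing changes."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()   # step 2: initial Centers
    assign = None
    for _ in range(n_iter):
        # step 3: each datapoint finds the Center it is closest to
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                                                    # step 6: terminated
        assign = new_assign
        # steps 4-5: each Center jumps to the centroid of the points it owns
        for j in range(k):
            owned = X[assign == j]
            if len(owned) > 0:
                centers[j] = owned.mean(axis=0)
    return centers, assign
```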


K-means Questions

• What is it trying to optimize?
• Are we sure it will terminate?
• Are we sure it will find an optimal clustering?
• How should we start it?

Distortion

Given…

• an encoder function: ENCODE : ℝ^m → [1..k]
• a decoder function: DECODE : [1..k] → ℝ^m

Define…

Distortion = Σ_{i=1}^{R} ( x_i − DECODE[ENCODE(x_i)] )²

We may as well write DECODE[j] = c_j , so

Distortion = Σ_{i=1}^{R} ( x_i − c_{ENCODE(x_i)} )²
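
To make the definitions concrete, a small NumPy sketch (our own naming, 0-based indices rather than the slide's [1..k]) where ENCODE maps a point to the index of its nearest center and DECODE looks that center up:

```python
import numpy as np

def encode(x, centers):
    """ENCODE: R^m -> {0, ..., k-1}: index of the nearest center."""
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

def decode(j, centers):
    """DECODE: {0, ..., k-1} -> R^m: the center with index j."""
    return centers[j]

def distortion(X, centers):
    """Distortion = sum_i (x_i - DECODE[ENCODE(x_i)])^2  (squared Euclidean)."""
    return sum(np.sum((x - decode(encode(x, centers), centers)) ** 2) for x in X)
```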

The Minimal Distortion

What properties must centers c_1, c_2, …, c_k have when the distortion

Distortion = Σ_{i=1}^{R} ( x_i − c_{ENCODE(x_i)} )²

is minimized?


The Minimal Distortion (1)

What properties must centers c_1, c_2, …, c_k have when

Distortion = Σ_{i=1}^{R} ( x_i − c_{ENCODE(x_i)} )²

is minimized?

(1) Each x_i must be encoded by its nearest center, i.e. at the minimal distortion

c_{ENCODE(x_i)} = argmin_{c_j ∈ {c_1, c_2, …, c_k}} ( x_i − c_j )²

…why? Because otherwise the distortion could be reduced by replacing ENCODE[x_i] with the index of the nearest center.


The Minimal Distortion (2)

What properties must centers c_1, c_2, …, c_k have when distortion is minimized?

(2) The partial derivative of Distortion with respect to each center location must be zero.

Let OwnedBy(c_j) = the set of records owned by Center c_j. Then

Distortion = Σ_{i=1}^{R} ( x_i − c_{ENCODE(x_i)} )²
           = Σ_{j=1}^{k} Σ_{i ∈ OwnedBy(c_j)} ( x_i − c_j )²

∂Distortion/∂c_j = ∂/∂c_j Σ_{i ∈ OwnedBy(c_j)} ( x_i − c_j )²
                 = −2 Σ_{i ∈ OwnedBy(c_j)} ( x_i − c_j )
                 = 0  (for a minimum)


Thus, at a minimum, each center satisfies

c_j = (1 / |OwnedBy(c_j)|) Σ_{i ∈ OwnedBy(c_j)} x_i

i.e. each Center sits at the centroid of the points it owns.

At the minimum distortion

What properties must centers c_1, c_2, …, c_k have when

Distortion = Σ_{i=1}^{R} ( x_i − c_{ENCODE(x_i)} )²

is minimized?

(1) Each x_i must be encoded by its nearest center
(2) Each Center must be at the centroid of the points it owns


Improving a suboptimal configuration…

What can be changed about centers c_1, c_2, …, c_k when

Distortion = Σ_{i=1}^{R} ( x_i − c_{ENCODE(x_i)} )²

is not minimized?

(1) Change the encoding so that each x_i is encoded by its nearest center
(2) Set each Center to the centroid of the points it owns

There's no point applying either operation twice in succession, but it can be profitable to alternate.

…And that's K-means!


Will we find the optimal configuration?

• Not necessarily.
• Can you invent a configuration that has converged, but does not have the minimum distortion?


Trying to find good optima

• Idea 1: Be careful about where you start
• Idea 2: Do many runs of k-means, each from a different random start configuration
• Many other ideas are floating around.
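
Idea 2 in code form: a hedged sketch that reuses the kmeans and distortion helpers sketched earlier (hypothetical names from those sketches), running several random restarts and keeping the configuration with the lowest distortion.

```python
def kmeans_restarts(X, k, n_restarts=10):
    """Run K-means from several random initializations and keep the best result."""
    best = None
    for seed in range(n_restarts):
        centers, assign = kmeans(X, k, seed=seed)   # from the earlier sketch
        d = distortion(X, centers)                  # from the earlier sketch
        if best is None or d < best[0]:
            best = (d, centers, assign)
    return best   # (distortion, centers, assignment) of the best run
```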

Other distance metrics

• Note that we could have used the Manhattan distance metric instead of the one above.

• If so,

Distortion = Σ_{i=1}^{R} | x_i − c_{ENCODE(x_i)} |

How would you find the distortion-minimizing centers in this case?


Example: Image Segmentation

• Once K-means has been performed, the resulting cluster centers can be thought of as K labelled data points for 1-NN on the entire training set, so that each data point is labelled with its nearest center. This is called Vector Quantization.

[Figures: vector quantization on pixel intensities; vector quantization on pixel colors.]
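
A sketch of vector quantization on pixel colors (assumes an RGB image stored as a NumPy array; scikit-learn's KMeans is used here as a convenience, not the lecture's own code):

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image, k=8):
    """Vector quantization of pixel colors: cluster the pixels' RGB values with
    K-means, then replace every pixel by its nearest cluster center."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    quantized = km.cluster_centers_[km.labels_]     # 1-NN of each pixel to the K centers
    return quantized.reshape(h, w, c).astype(image.dtype)
```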

Common uses of K-means

• Often used as an exploratory data analysis tool
• In one dimension, a good way to quantize real-valued variables into k non-uniform buckets
• Used on acoustic data in speech understanding to convert waveforms into one of k categories (i.e. Vector Quantization)
• Also used for choosing color palettes on old-fashioned graphical display devices!
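
For the one-dimensional use above, a quick sketch (again leaning on scikit-learn's KMeans as an assumption) that buckets a real-valued variable into k non-uniform ranges:

```python
import numpy as np
from sklearn.cluster import KMeans

values = np.random.default_rng(0).lognormal(size=1000)          # skewed 1-D data
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(values.reshape(-1, 1))
buckets = km.labels_                                            # bucket index for each value
print(np.sort(km.cluster_centers_.ravel()))                     # non-uniform bucket centers
```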


Single Linkage Hierarchical Clustering

1. Say "Every point is its own cluster"
2. Find the "most similar" pair of clusters
3. Merge it into a parent cluster
4. Repeat… until you've merged the whole dataset into one cluster

You're left with a nice dendrogram, or taxonomy, or hierarchy of datapoints.

How do we define similarity between clusters?

• Minimum distance between points in clusters

• Maximum distance between points in clusters

• Average distance between points in clusters


Hierarchical Clustering Comments

• It's nice that you get a hierarchy instead of an amorphous collection of groups
• If you want k groups, just cut the (k−1) longest links
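
A hedged SciPy sketch of single-linkage clustering (our choice of library; method='single' corresponds to the minimum-distance similarity above, and fcluster's maxclust criterion performs the "cut into k groups" step):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

X = np.random.default_rng(0).normal(size=(30, 2))      # toy 2-D data
Z = linkage(X, method='single')                        # single linkage: min distance between clusters
labels = fcluster(Z, t=3, criterion='maxclust')        # cut the hierarchy into k=3 groups
# dendrogram(Z)   # with matplotlib, draws the full hierarchy
print(labels)
```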
