
Classification and K-Nearest Neighbors

Administrivia

o Reminder: Homework 1 is due by 5pm Friday on Moodle

o Reading Quiz associated with today’s lecture. Due before class Wednesday.


Regression vs Classification

So far, we've taken a vector of features (x₁, x₂, …, xₚ) and predicted a numerical response:

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₚxₚ

Example: $360,000 = β₀ + β₁ × (2000 sq-ft) + β₂ × (3 bathrooms)

Regression deals with quantitative predictions

But what if we're interested in making qualitative predictions?

Classification Examples

o Text Classification: Spam or Not Spam

o Text Classification: Fake News or Real News

o Text Classification: Sentiment Analysis (Tweets, Amazon Reviews)

o Image Classification: Handwritten Digits

o Image Classification: Stop Sign or No Stop Sign (Autonomous Vehicles)

o Image Classification: Hot Dog or Not Hot Dog (No real application whatsoever)

o Image Classification: Labradoodle or Fried Chicken (Important for not eating your dog)

Classification with Probability

In classification problems our outputs are discrete classes

Goal: Given features x = (x₁, x₂, …, xₚ), predict which class Y = k the example x belongs to

Conditional probability gives a nice framework for many classification models:

o Suppose our data can belong to one of K classes, k = 1, 2, …, K

o We model the conditional probability p(Y = k | x) for each k = 1, 2, …, K

o We make the prediction ŷ = argmax_k p(Y = k | x)

Example: In Spam Classification, the features are the words in the email

o The classes are SPAM and HAM

o We estimate p(SPAM | email) and p(HAM | email), e.g. if p(SPAM | email) = 0.75 and p(HAM | email) = 0.25, we predict ŷ = SPAM
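The argmax rule above is easy to state in code. Below is a minimal sketch, assuming the class probabilities p(Y = k | x) have already been estimated somehow; the helper name `predict_class` is illustrative, not from the slides.

```python
def predict_class(class_probs):
    """Return the class k that maximizes p(Y = k | x).

    class_probs: dict mapping class label -> estimated conditional probability.
    """
    return max(class_probs, key=class_probs.get)

# Spam example from the slide: p(SPAM | email) = 0.75, p(HAM | email) = 0.25
print(predict_class({"SPAM": 0.75, "HAM": 0.25}))  # -> SPAM
```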

K-Nearest Neighbors

[Figure: All Labeled Data split into a Training Set and a Validation Set]

Our first classifier is a very simple one

Assume we have a labeled training set

Features can be anything: words, pixel intensities, numerical quantities

Instead of a continuous response, we now have discrete labels (encoded as 0, 1, 2, etc.)

K-Nearest Neighbors

General Idea: Suppose we have an example x that we want to classify

o Look in the training set for the K examples that are "nearest" to x

o Predict the label for x using a majority vote of the labels of its K nearest neighbors

Questions:

o What does "nearest" mean?

o What if there's a tie in the vote?

K-Nearest Neighbors

KNN:

o Find N_K(x): the set of K training examples nearest to x

o Predict ŷ to be the majority label in N_K(x)

Question: What does "nearest" mean?

Answer: Measure closeness using the Euclidean distance ||x_i − x||₂, where x is the test example and x_i is a training vector
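A minimal sketch of this algorithm in Python, assuming a numeric feature matrix; the helper name `knn_predict` and its argument names are illustrative, not from the slides.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, K):
    """Predict the majority label among the K training points nearest to x.

    X_train: (n, p) array of training features
    y_train: length-n array of labels
    x:       length-p query point
    """
    # Euclidean distances ||x_i - x||_2 to every training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the K nearest neighbors, N_K(x)
    nearest = np.argsort(dists)[:K]
    # Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
```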

Worked example (red and blue training points on a scatter plot, Classes = {RED, BLUE}), using Euclidean distance ||x_i − x||₂:

o 1-NN predicts: blue

o 2-NN predicts: unclear (the vote is tied)

o 5-NN predicts: red

What About Probability?

o 5-NN gives the estimate p(Y = k | x) = (1/K) Σ_{i ∈ N_K(x)} I(y_i = k)

o Here I is the indicator function: I(b) = 1 if b is true, 0 if b is false

o In the example above: p(Y = BLUE | x) = 2/5 and p(Y = RED | x) = 3/5
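The neighbor-counting estimate above is a short numpy computation; this is a sketch under the same assumptions as the earlier `knn_predict` helper (names are illustrative).

```python
import numpy as np

def knn_probs(X_train, y_train, x, K):
    """Estimate p(Y = k | x) as the fraction of the K nearest neighbors with label k."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:K]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return dict(zip(labels, counts / K))
```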

What About Ties?

Lots of reasonable strategies

Example: If there is a tie (see the sketch below):

o reduce K by 1 and try again

o repeat until the tie is broken
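A sketch of that tie-breaking strategy, reusing the conventions of the earlier `knn_predict` helper (the function name is illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_predict_tiebreak(X_train, y_train, x, K):
    """Majority vote over the K nearest neighbors; on a tie, shrink K and retry."""
    dists = np.linalg.norm(X_train - x, axis=1)
    while K >= 1:
        nearest = np.argsort(dists)[:K]
        counts = Counter(y_train[nearest]).most_common()
        # Unique winner (K = 1 can never tie), otherwise reduce K and try again
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        K -= 1
```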

K-Nearest Neighbors

Another worked example, predicting ŷ for a query point x against a small labeled training set:

o K = 1: the single closest point N₁(x) votes red, so ŷ = RED

o K = 2: the two closest points N₂(x), prediction ŷ = RED

o K = 3: the three closest points N₃(x) = {(0,2), (1,4), (2,4)}, majority vote gives ŷ = BLUE

A Subtle Distinction


o KNN is an instance-based method

o Given an unseen query point, look directly at training data and then predict

o Compare to regression where we use training data to learn parameters

o When we learn parameters we call the model a parametric model

o Instance-based methods like KNN are called non-parametric models

Question: When would parametric vs non-parametric make a big difference?


Naïve Implementation

How much does KNN cost?

Suppose your training set has n examples, each with p features

Now you classify a single example x

Naïve Method: Loop over each training example, compute the distances ||x_i − x||₂, and cache the indices of the smallest

o The naïve method is O(pn) in time

o The naïve method is O(pn) in space

Tree-Based Implementation

Build a KD-tree or Ball-tree on the training data

o Higher up-front cost in time and storage to build the data structure

o The up-front cost is O(pn log n) in time and O(pn) in space

Now you classify a single example x

o Classification time for a single query is O(p log n)

o Good strategy if you have lots of data and need to do lots of queries
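For reference, scikit-learn exposes these structures directly; below is a small sketch using sklearn.neighbors.KDTree on a made-up training set (the data and variable names are illustrative, not from the slides).

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))          # 1000 examples, 2 features
y_train = rng.integers(0, 2, size=1000)       # binary labels

tree = KDTree(X_train)                        # up-front build cost
dist, ind = tree.query([[0.0, 0.0]], k=5)     # cheap per-query neighbor lookup
print(np.bincount(y_train[ind[0]]).argmax())  # majority vote among the 5 neighbors
```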

Performance Metrics and Choosing K

Question: How do we evaluate our KNN classifier?

Answer: Perform prediction on labeled data and count how many we got wrong:

Error Rate = (1/n) Σ_{i=1}^{n} I(ŷ_i ≠ y_i) = (# wrong predictions) / (# predictions)

Question: How do we choose K?
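Before turning to the choice of K: the error rate above is essentially one line of numpy, sketched here with an illustrative helper name.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))
```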

Performance Metrics and Choosing K

Like all models, KNN can overfit or underfit, depending on the choice of K

[Figure: decision boundaries for K = 1 and K = 15]

Performance Metrics and Choosing K

Choose the optimal K by varying K and computing the error rate on the training and validation sets (see the sketch below)

o Note: Plot error vs 1/K

o Lower K leads to more flexibility

o Choose K to minimize validation error

[Figure: training and validation error rate vs 1/K]
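A sketch of that model-selection loop using scikit-learn's KNeighborsClassifier; X_train, y_train, X_val, y_val are assumed to come from an existing train/validation split, and the function name is illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def choose_k(X_train, y_train, X_val, y_val, k_values=range(1, 51)):
    """Fit KNN for each candidate K and return the K with the lowest validation error."""
    errors = {}
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        errors[k] = np.mean(clf.predict(X_val) != y_val)
    return min(errors, key=errors.get), errors
```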

Feature Standardization

o Important to normalize/standardize features to similar scales

o Consider doing 2-NN on unscaled and scaled data

[Figure: 2-NN on unscaled vs scaled versions of the same data gives different predictions (BLUE vs RED)]
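One common way to do this is a scikit-learn pipeline that standardizes each feature to zero mean and unit variance before the distance computation; the toy data below is made up purely to illustrate the scaling effect.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

# Toy data: the first feature separates the classes, the second is large-scale noise
X = np.array([[1.0, 1000.0], [1.2, 1050.0], [3.0, 1100.0], [3.2, 980.0]])
y = np.array([0, 0, 1, 1])
query = np.array([[2.9, 1020.0]])

raw = KNeighborsClassifier(n_neighbors=2).fit(X, y)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=2)).fit(X, y)

print(raw.predict(query))     # noise feature dominates the distance -> predicts 0
print(scaled.predict(query))  # after standardization -> predicts 1
```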

Generalization Error and Analysis

Bias-Variance Trade-Off:

o Pretty much analogous to our experience with polynomial regression

o As 1/K → 1 (i.e. as K decreases toward 1), variance increases and bias decreases

Reducible and Irreducible Errors:

o This is more interesting, and more complicated

o What follows applies to ALL classification methods, not just KNN

The Optimal Bayes Classifier

Suppose we knew the true probability distribution of our data: p(Y = k | X)

Example:

o Equal probability a 2D point is in Q1 or Q3

o If the point is in Q1, 0.9 probability it's red

o If the point is in Q3, 0.9 probability it's blue

o If a classifier uses the true model to predict…

o it will be wrong 10% of the time on average (in each quadrant it predicts the majority color and misses the 0.1 minority)

o Bayes Error Rate = 0.10

The Optimal Bayes Classifier

The Bayes Error Rate is the theoretical measure of Irreducible Error

o In the long run, we can never do better than the Bayes Error Rate on validation data

o Of course, just like in the regression setting, we don't actually know what this error is. We just know it's there!

[Figure: training and test error rate vs 1/K, with a horizontal line marking the irreducible error (Var(ε) in the regression analogy)]

K-Nearest Neighbors Wrap-Up


o Talked about the general classification setting

o Learned how to do simple KNN

o Looked at possible implementations and their complexity

Next Time: o Look at our first parametric classifier: The Perceptron

o Kind of a weak model, but it forms the basis for Neural Networks

If-Time Bonus: Weighted KNN

Question: What should 5-NN predict in the following picture?

Standard KNN would predict red, but it seems like there are almost as many blues and they're closer.

If-Time Bonus: Weighted KNN

Weighted-KNN:

o Find N_K(x): the set of K training examples nearest to x

o Predict ŷ to be the weighted-majority label in N_K(x), weighting each neighbor's vote by its inverse distance

Improvements over KNN:

o Gives more weight to examples that are very close to the query point

o Less tie-breaking required

Question: What should weighted 5-NN predict in the picture above? (a code sketch follows this list)

o Red Distances: 1, 1, 1

o Blue Distances: 0.5, 0.5

o Red Weighted-Majority Vote: 1/1 + 1/1 + 1/1 = 3

o Blue Weighted-Majority Vote: 1/0.5 + 1/0.5 = 2 + 2 = 4

o Prediction: 4 > 3, so predict Blue
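A minimal sketch of weighted KNN with inverse-distance votes, in the same style as the earlier helpers (names are illustrative, not from the slides); scikit-learn offers the same behavior via KNeighborsClassifier(weights='distance').

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, K, eps=1e-12):
    """Each of the K nearest neighbors votes with weight 1 / distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    votes = defaultdict(float)
    for i in np.argsort(dists)[:K]:
        votes[y_train[i]] += 1.0 / (dists[i] + eps)  # eps guards against a zero distance
    return max(votes, key=votes.get)

# Slide example: 3 red neighbors at distance 1 (vote 3) vs 2 blue at 0.5 (vote 4) -> blue
```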
