Classification and K-Nearest Neighbors

Page 1: Classification and K-Nearest Neighbors

Classification and K-Nearest Neighbors

Page 2: Classification and K-Nearest Neighbors

Administrivia

o Reminder: Homework 1 is due by 5pm Friday on Moodle

o Reading Quiz associated with today's lecture. Due before class Wednesday.

Page 3: Classification and K-Nearest Neighbors

Regression vs Classification

So far, we've taken a vector of features x = (x₁, x₂, …, xₚ) and predicted a numerical response

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₚxₚ

Regression deals with quantitative predictions, e.g.

$360,000 = β₀ + β₁ × (2000 sq-ft) + β₂ × (3 bathrooms)

But what if we're interested in making qualitative predictions?

Page 4: Classification and K-Nearest Neighbors

Classification Examples

Text Classification: Spam or Not Spam

Page 5: Classification and K-Nearest Neighbors

Classification Examples

Text Classification: Fake News or Real News

Page 6: Classification and K-Nearest Neighbors

Classification Examples

Text Classification: Sentiment Analysis (Tweets, Amazon Reviews)

Page 7: Classification and K-Nearest Neighbors

Classification Examples

Image Classification: Handwritten Digits

Page 8: Classification and K-Nearest Neighbors

Classification Examples

Image Classification: Stop Sign or No Stop Sign (Autonomous Vehicles)

Page 9: Classification and K-Nearest Neighbors

Classification Examples

Image Classification: Hot Dog or Not Hot Dog (No real application whatsoever)

Page 10: Classification and K-Nearest Neighbors

Classification Examples

Image Classification: Labradoodle or Fried Chicken (Important for not eating your dog)

Page 11: Classification and K-Nearest Neighbors

Classification with Probability

In classification problems our outputs are discrete classes

Goal: Given features x = (x₁, x₂, …, xₚ), predict which class Y = k the point x belongs to

Page 12: Classification and K-Nearest Neighbors

Classification with Probability

In classification problems our outputs are discrete classes

Goal: Given features x = (x₁, x₂, …, xₚ), predict which class Y = k the point x belongs to

Conditional probability gives a nice framework for many classification models

Suppose our data can belong to one of K classes, k = 1, 2, …, K

We model the conditional probability p(Y = k | x) for each k = 1, 2, …, K

and we make the prediction ŷ = argmax_k p(Y = k | x)
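A minimal sketch of this prediction rule in Python; the class names and probabilities below anticipate the spam example on the next slide and are otherwise illustrative:

```python
import numpy as np

# Hypothetical estimated class-conditional probabilities p(Y = k | x)
classes = np.array(["SPAM", "HAM"])
probs = np.array([0.75, 0.25])

# Predict the class with the highest conditional probability
y_hat = classes[np.argmax(probs)]
print(y_hat)  # SPAM
```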

Page 13: Classification and K-Nearest Neighbors

Classification with Probability

In classification problems our outputs are discrete classes

Goal: Given features x = (x₁, x₂, …, xₚ), predict which class Y = k the point x belongs to

Example: In Spam Classification, the features are the words in the email

The classes are SPAM and HAM

We estimate p(SPAM | email) and p(HAM | email).

If p(SPAM | email) = 0.75 and p(HAM | email) = 0.25, then ŷ = SPAM

Page 14: Classification and K-Nearest Neighbors

K-Nearest Neighbors

All Labeled Data → Training Set + Validation Set

Our first classifier is a very simple one

Assume we have a labeled training set

Features can be anything: words, pixel intensities, numerical quantities

Instead of a continuous response, we now have discrete labels (encoded as 0, 1, 2, etc.)

Page 15: Classification and K-Nearest Neighbors

K-Nearest Neighbors

General Idea: Suppose we have an example x that we want to classify

o Look in the training set for the K examples that are "nearest" to x

o Predict the label for x using a majority vote of the labels of the K nearest neighbors

Questions: o What does "nearest" mean?

o What if there's a tie in the vote?

Page 16: Classification and K-Nearest Neighbors

K-Nearest Neighbors

KNN:

o Find N_K(x): the set of K training examples nearest to x

o Predict ŷ to be the majority label in N_K(x)

Question: o What does "nearest" mean?

Answer: o Measure closeness using the Euclidean Distance ‖xᵢ − x‖₂ between each training vector xᵢ and the test example x
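A minimal sketch of this rule in Python/NumPy; the function name and toy data are made up for illustration (the points echo the worked example on slides 24-26):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Predict the label of query x by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples: N_K(x)
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features, labels 0 (red) and 1 (blue)
X_train = np.array([[0.0, 2.0], [1.0, 4.0], [2.0, 4.0], [5.0, 1.0]])
y_train = np.array([0, 1, 1, 0])
print(knn_predict(X_train, y_train, x=np.array([1.0, 3.0]), k=3))  # 1 (blue)
```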

Page 17: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Euclidean Distance: ‖xᵢ − x‖₂

o 1-NN predicts: _________

o 2-NN predicts: _________

o 5-NN predicts: _________

Classes = { RED, BLUE }

Page 18: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Euclidean Distance: ‖xᵢ − x‖₂

o 1-NN predicts: blue

o 2-NN predicts: _________

o 5-NN predicts: _________

Page 19: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Euclidean Distance: ‖xᵢ − x‖₂

o 1-NN predicts: blue

o 2-NN predicts: unclear

o 5-NN predicts: _________

Page 20: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Euclidean Distance: ‖xᵢ − x‖₂

o 1-NN predicts: blue

o 2-NN predicts: unclear

o 5-NN predicts: red

Page 21: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Euclidean Distance: ‖xᵢ − x‖₂

o 1-NN predicts: blue

o 2-NN predicts: unclear

o 5-NN predicts: red

Page 22: Classification and K-Nearest Neighbors

What About Probability?

o 5-NN: p(Y = k | x) = (1/K) Σ_{i ∈ N_K(x)} I(yᵢ = k)

where I is the indicator function: I(b) = 1 if b is true, 0 if b is false

From the figure: p(Y = BLUE | x) = 2/5 and p(Y = RED | x) = 3/5
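A short sketch of this neighbor-vote probability, in the same style as the hypothetical knn_predict helper above:

```python
import numpy as np

def knn_proba(X_train, y_train, x, k):
    """Estimate p(Y = k | x) as the fraction of the k nearest neighbors with label k."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return dict(zip(labels, counts / k))
```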

Page 23: Classification and K-Nearest Neighbors

What About Ties?

Lots of reasonable strategies

Example: If there is a tie:

o reduce K by 1 and try again

o repeat until the tie is broken (sketched below)
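A hedged sketch of that strategy, building on the hypothetical knn_predict conventions above: shrink K until a single label wins the vote.

```python
import numpy as np
from collections import Counter

def knn_predict_tiebreak(X_train, y_train, x, k):
    """KNN prediction that breaks ties by reducing K by 1 and voting again."""
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)
    while k >= 1:
        votes = Counter(y_train[order[:k]]).most_common()
        # Unique winner (or only one candidate label): done
        if len(votes) == 1 or votes[0][1] > votes[1][1]:
            return votes[0][0]
        k -= 1  # tie: drop the farthest of the current neighbors and revote
```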

Page 24: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Predict ŷ with K = 1

Closest Points: N₁(x) = { (0,2) }

Prediction: ŷ = RED

Page 25: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Predict ŷ with K = 2

Closest Points: N₂(x) = { (0,2), (1,4) }

Prediction: ŷ = RED (one red and one blue vote: the tie is broken by reducing K to 1)

Page 26: Classification and K-Nearest Neighbors

K-Nearest Neighbors

Predict ŷ with K = 3

Closest Points: N₃(x) = { (0,2), (1,4), (2,4) }

Prediction: ŷ = BLUE

Page 27: Classification and K-Nearest Neighbors

A Subtle Distinction

o KNN is an instance-based method

o Given an unseen query point, look directly at training data and then predict

o Compare to regression where we use training data to learn parameters

o When we learn parameters we call the model a parametric model

o Instance-based methods like KNN are called non-parametric models

Question: When would parametric vs non-parametric make a big difference?


Page 28: Classification and K-Nearest Neighbors

Naïve Implementation

How much does KNN cost?

Suppose your training set has n examples, each with p features

Now you classify a single example x

Naïve Method: Loop over each training example, compute the distances ‖xᵢ − x‖₂, cache the indices of the K smallest

o The naïve method is O(pn) in time

o The naïve method is O(pn) in space

Page 29: Classification and K-Nearest Neighbors

Tree-Based Implementation

Build a KD- or Ball-tree on the training data

Page 30: Classification and K-Nearest Neighbors

Tree-Based Implementation

Build a KD- or Ball-tree on the training data

Higher up-front cost in time and storage to build the data structure

Up-front cost is O(pn log n) in time and O(pn) in space

Now you classify a single example x

Classification time for a single query is O(p log n)

Good strategy if you have lots of data and need to do lots of queries
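As a usage sketch, scikit-learn's KNeighborsClassifier can build a KD-tree or ball tree under the hood; the toy data below is made up, and algorithm="kd_tree" simply requests the tree-based index:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 2.0], [1.0, 4.0], [2.0, 4.0], [5.0, 1.0]])
y_train = np.array([0, 1, 1, 0])

# The tree is built once at fit time (the up-front cost); queries are then fast
clf = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")
clf.fit(X_train, y_train)
print(clf.predict([[1.0, 3.0]]))        # [1]
print(clf.predict_proba([[1.0, 3.0]]))  # neighbor-vote class probabilities
```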

Page 31: Classification and K-Nearest Neighbors

Performance Metrics and Choosing K

Question: How do we evaluate our KNN classifier?

Answer: Perform prediction on labeled data, count how many we got wrong

Error Rate = (1/n) Σᵢ₌₁ⁿ I(ŷᵢ ≠ yᵢ) = (# wrong predictions) / (# predictions)

Question: How do we choose K?
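A one-line sketch of that error rate, assuming arrays of true and predicted labels (the values are made up):

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])  # hypothetical predictions

error_rate = np.mean(y_true != y_pred)  # (1/n) * sum of I(y_hat != y)
print(error_rate)  # 0.4
```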

Page 32: Classification and K-Nearest Neighbors

Performance Metrics and Choosing K

Like all models, KNN can overfit and underfit, depending on the choice of K

[Figure: decision boundaries for K = 1 and K = 15]

Page 33: Classification and K-Nearest Neighbors

Performance Metrics and Choosing K

Choose the optimal K by varying K and computing the error rate on the training and validation sets

o Note: Plot vs 1/K

o Lower K leads to more flexibility

o Choose K to minimize validation error

[Figure: Training and Validation error rates plotted against 1/K]
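A sketch of that selection loop, assuming training and validation splits X_train, y_train, X_val, y_val already exist (scikit-learn is used here only for brevity):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

ks = range(1, 21)
errors = []
for k in ks:
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors.append(np.mean(clf.predict(X_val) != y_val))  # validation error rate

best_k = list(ks)[int(np.argmin(errors))]  # K with the lowest validation error
```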

Page 34: Classification and K-Nearest Neighbors

Feature Standardization

o Important to normalize/standardize features to similar sizes

o Consider doing 2-NN on unscaled and scaled data

Page 35: Classification and K-Nearest Neighbors

Feature Standardization

o Important to normalize/standardize features to similar sizes

o Consider doing 2-NN on unscaled and scaled data

From the figures: the 2-NN prediction flips between ŷ = BLUE and ŷ = RED depending on whether the features are scaled
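A minimal z-score standardization sketch; fit the scaling on the training data only (the variable names are placeholders):

```python
import numpy as np

# Per-feature mean and standard deviation, computed on the training set
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Standardize both splits with the training statistics
X_train_std = (X_train - mu) / sigma
X_val_std = (X_val - mu) / sigma
```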

Page 36: Classification and K-Nearest Neighbors

Generalization Error and Analysis

Bias-Variance Trade-Off:

o Pretty much analogous to our experience with polynomial regression

o As 1/K → 1 (i.e., as K shrinks), variance increases and bias decreases

Reducible and Irreducible Errors:

o This is more interesting, and more complicated

o What follows applies to ALL classification methods, not just KNN

Page 37: Classification and K-Nearest Neighbors

The Optimal Bayes Classifier

Suppose we knew the true probabilistic distribution of our data: p(Y = k | X)

Example: o Equal prob a 2D point is in Q1 or Q3

o If the point is in Q1, 0.9 probability it's red.

o If the point is in Q3, 0.9 probability it's blue.

o If a classifier uses the true model to predict…

o it will be wrong 10% of the time on average

o Bayes Error Rate = 0.10

Page 38: Classification and K-Nearest Neighbors

The Optimal Bayes Classifier

The Bayes Error Rate is the theoretical measure of Irreducible Error

o In the long run, we can never do better than the Bayes Error Rate on validation data

o Of course, just like in the regression setting, we don't actually know what this error is. We just know it's there!

[Figure: Training and Test error rates vs 1/K, with the irreducible error Var(ε) marked as the floor the test error cannot beat]

Page 39: Classification and K-Nearest Neighbors

K-Nearest Neighbors Wrap-Up

o Talked about the general classification setting

o Learned how to do simple KNN

o Looked at possible implementations and their complexity

Next Time: o Look at our first parametric classifier: The Perceptron

o Kind of a weak model, but it forms the basis for Neural Networks

Page 40: Classification and K-Nearest Neighbors

If-Time Bonus: Weighted KNN

Question: What should 5-NN predict in the following picture?

Page 41: Classification and K-Nearest Neighbors

If-Time Bonus: Weighted KNN

Question: What should 5-NN predict in the following picture?

Standard KNN would predict red, but it seems like there are almost as many blues and they're closer.

Page 42: Classification and K-Nearest Neighbors

If-Time Bonus: Weighted KNN

Weighted-KNN:

o Find N_K(x): the set of K training examples nearest to x

o Predict ŷ to be the weighted-majority label in N_K(x), weighted by inverse distance

Improvements over KNN:

o Gives more weight to examples that are very close to the query point

o Less tie-breaking required
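A hedged sketch of inverse-distance weighting, following the conventions of the earlier knn_predict helper (a small epsilon guards against division by zero when the query coincides with a training point):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k, eps=1e-12):
    """Predict by an inverse-distance-weighted vote over the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] + eps)  # closer neighbors count more
    return max(votes, key=votes.get)
```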

Page 43: Classification and K-Nearest Neighbors

If-Time Bonus: Weighted KNN

Question: What should 5-NN predict in the following picture?

Red Distances: 1, 1, 1

Blue Distances: 0.5, 0.5

Red Weighted-Majority Vote: 1/1 + 1/1 + 1/1 = 3

Blue Weighted-Majority Vote: 1/0.5 + 1/0.5 = 2 + 2 = 4

Prediction: 4 > 3, so predict BLUE
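For instance, reproducing this vote with the weighted_knn_predict sketch above (the coordinates are made up so that three red points sit at distance 1 and two blue points at distance 0.5 from the query):

```python
import numpy as np

X_train = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0],   # red, distance 1
                    [0.5, 0.0], [0.0, 0.5]])                # blue, distance 0.5
y_train = np.array(["red", "red", "red", "blue", "blue"])
print(weighted_knn_predict(X_train, y_train, x=np.array([0.0, 0.0]), k=5))  # blue
```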