57
1 Machine Learning

13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

1

Machine Learning

Page 2: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Machine  Learning  in  a  Nutshell  

Machine  Learner  

Data   Model   Performance  Measure  

2

Page 3: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Data  with  a5ributes  

Machine  Learning  in  a  Nutshell  

ID   A1   Reflex   RefLow  RefHigh  Label  1   5.6   Normal   3.4   7   No  2   5.5   Normal   2.4   5.7   No  3   5.3   Normal   2.4   5.7   Yes  4   5.3   Elevated   2.4   5.7   No  5   6.3   Normal   3.4   7   No  6   3.3   Normal   2.4   5.7   Yes  7   5.1   Decreased   2.4   5.7   Yes  8   4.2   Normal   2.4   5.7   Yes  …  …   …   …   …   …  

Machine  Learner  

Data   Model   Performance  Measure  

Instance      with  label  

xi ∈ Xyi ∈ Y

3

Page 4: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Machine  Learning  in  a  Nutshell  

Machine  Learner  

Data   Model   Performance  Measure  

Model  LogisHc  regression  Support  vector    machines  

x2 x3

x1

x4 x5

Hierarchical  Bayesian  Networks  

Mixture  Models  

f : X �→ YData  with  a5ributes  ID   A1   Reflex   RefLow  RefHigh  Label  1   5.6   Normal   3.4   7   No  2   5.5   Normal   2.4   5.7   No  3   5.3   Normal   2.4   5.7   Yes  4   5.3   Elevated   2.4   5.7   No  5   6.3   Normal   3.4   7   No  6   3.3   Normal   2.4   5.7   Yes  7   5.1   Decreased   2.4   5.7   Yes  8   4.2   Normal   2.4   5.7   Yes  …  …   …   …   …   …   Instance      with  label  

xi ∈ Xyi ∈ Y

4

Page 5: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Machine  Learning  in  a  Nutshell  

Machine  Learner  

Data   Model   Performance  Measure  

EvaluaHon  Measure  predicted  labels  vs  actual  labels  on  test  data  

#  Training  Examples  Pe

rformance  

Learning  Curve  

5

Model  LogisHc  regression  Support  vector    machines  

x2 x3

x1

x4 x5

Hierarchical  Bayesian  Networks  

Mixture  Models  

f : X �→ YData  with  a5ributes  ID   A1   Reflex   RefLow  RefHigh  Label  1   5.6   Normal   3.4   7   No  2   5.5   Normal   2.4   5.7   No  3   5.3   Normal   2.4   5.7   Yes  4   5.3   Elevated   2.4   5.7   No  5   6.3   Normal   3.4   7   No  6   3.3   Normal   2.4   5.7   Yes  7   5.1   Decreased   2.4   5.7   Yes  8   4.2   Normal   2.4   5.7   Yes  …  …   …   …   …   …   Instance      with  label  

xi ∈ Xyi ∈ Y

Page 6: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

6

A training set

Page 7: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

7

ID3-induced decision tree

Page 8: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

8

Model spaces I

+ +

- -

I

+ +

- -

I

+ +

- - Nearest neighbor

Version space

Decision tree

Page 9: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

9

Decision tree-induced partition – example

Color

Shape Size +

+ - Size

+ -

+ big

big small

small

round square

red green blue

I

Page 10: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

10

k-Nearest Neighbor Instance-Based Learning

Some material adapted from slides by Andrew Moore, CMU.

Visit http://www.autonlab.org/tutorials/ for Andrew’s repository of Data Mining tutorials.

Page 11: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

11

1-Nearest Neighbor

n  One of the simplest of all machine learning classifiers

n  Simple idea: label a new point the same as the closest known point

Label it red.!

Page 12: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

12

1-Nearest Neighbor

n  A type of instance-based learning n  Also known as “memory-based” learning

n  Forms a Voronoi tessellation of the instance space

Page 13: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

13

Distance Metrics n  Different metrics can change the decision surface

n  Standard Euclidean distance metric: n  Two-dimensional: Dist(a,b) = sqrt((a1 – b1)2 + (a2 – b2)2) n  Multivariate: Dist(a,b) = sqrt(∑ (ai – bi)2)

Dist(a,b) =(a1 – b1)2 + (a2 – b2)2! Dist(a,b) =(a1 – b1)2 + (3a2 – 3b2)2!

Adapted from “Instance-Based Learning” !lecture slides by Andrew Moore, CMU.!

Page 14: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

14

Four Aspects of an Instance-Based Learner:

1.  A distance metric 2.  How many nearby neighbors to look at? 3.  A weighting function (optional) 4.  How to fit with the local points?

Adapted from “Instance-Based Learning” !lecture slides by Andrew Moore, CMU.!

Page 15: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

15

1-NN’s Four Aspects as an Instance-Based Learner:

1.  A distance metric n  Euclidian

2.  How many nearby neighbors to look at? n  One

3.  A weighting function (optional) n  Unused

4.  How to fit with the local points? n  Just predict the same output as the nearest neighbor.

Adapted from “Instance-Based Learning” !lecture slides by Andrew Moore, CMU.!

Page 16: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

16

Zen Gardens Mystery of renowned zen garden revealed [CNN Article] Thursday, September 26, 2002 Posted: 10:11 AM EDT (1411 GMT)

LONDON (Reuters) -- For centuries visitors to the renowned Ryoanji Temple garden in Kyoto, Japan have been entranced and mystified by the simple arrangement of rocks.

The five sparse clusters on a rectangle of raked gravel are said to be pleasing to the eyes of the hundreds of thousands of tourists who visit the garden each year.

Scientists in Japan said on Wednesday they now believe they have discovered its mysterious appeal.

"We have uncovered the implicit structure of the Ryoanji garden's visual ground and have shown that it includes an abstract, minimalist depiction of natural scenery," said Gert Van Tonder of Kyoto University.

The researchers discovered that the empty space of the garden evokes a hidden image of a branching tree that is sensed by the unconscious mind.

"We believe that the unconscious perception of this pattern contributes to the enigmatic appeal of the garden," Van Tonder added.

He and his colleagues believe that whoever created the garden during the Muromachi era between 1333-1573 knew exactly what they were doing and placed the rocks around the tree image.

By using a concept called medial-axis transformation, the scientists showed that the hidden branched tree converges on the main area from which the garden is viewed.

The trunk leads to the prime viewing site in the ancient temple that once overlooked the garden. It is thought that abstract art may have a similar impact.

"There is a growing realisation that scientific analysis can reveal unexpected structural features hidden in controversial abstract paintings," Van Tonder said

Adapted from “Instance-Based Learning” lecture slides by Andrew Moore, CMU.!

Page 17: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

17

k – Nearest Neighbor

n  Generalizes 1-NN to smooth away noise in the labels

n  A new point is now assigned the most frequent label of its k nearest neighbors

Label it red, when k = 3!

Label it blue, when k = 7!

Page 18: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

18

k-Nearest Neighbor (k = 9)

A magnificent job of noise smoothing. Three cheers for 9-nearest-neighbor.!But the lack of gradients and the jerkiness isn’t good.!

Appalling behavior! Loses all the detail that 1-nearest neighbor would give. The tails are horrible!!

Fits much less of the noise, captures trends. But still, frankly, pathetic compared!with linear regression.!

Adapted from “Instance-Based Learning” !lecture slides by Andrew Moore, CMU.!

Page 19: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

19

The Naïve Bayes

Classifier

Some material adapted from slides by Tom Mitchell, CMU.

Page 20: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

20

The Naïve Bayes Classifier

)()|()(

)|(j

ijiji XP

YXPYPXYP =

n  Recall Bayes rule:

n  Which is short for:

n  We can re-write this as:

)()|()(

)|(j

ijiji xXP

yYxXPyYPxXyYP

=

======

∑ ===

======

k kkj

ijiji yYPyYxXP

yYxXPyYPxXyYP

)()|()|()(

)|(

Page 21: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

21

Deriving Naïve Bayes

n  Idea: use the training data to directly estimate:

n  Then, we can use these values to estimate using Bayes rule.

n  Recall that representing the full joint probability

is not practical.

)(YP)|( YXP and

)|( newXYP

)|,,,( 21 YXXXP n…

Page 22: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

22

Deriving Naïve Bayes

n  However, if we make the assumption that the attributes are independent, estimation is easy!

n  In other words, we assume all attributes are conditionally independent given Y. n  Often this assumption is violated in practice, but

more on that later…

∏=i

in YXPYXXP )|()|,,( 1…

Page 23: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

23

Deriving Naïve Bayes

n  Let and label Y be discrete.

n  Then, we can estimate and directly from the training data by counting!

nXXX ,,1…=

)( iYP)|( ii YXP

Sky Temp Humid Wind Water Forecast Play? sunny warm normal strong warm same yes sunny warm high strong warm same yes rainy cold high strong warm change no sunny warm high strong cool change yes

P(Sky = sunny | Play = yes) = ? P(Humid = high | Play = yes) = ?

Page 24: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

24

The Naïve Bayes Classifier n  Now we have:

which is just a one-level Bayesian Network

n  To classify a new point Xnew:

)( iHP

… … Attributes (evidence)

Labels (hypotheses)

1 n i

j

X X X

Y )( jYP)|( ji YXP

∑ ∏∏

==

====

k i kik

jiijnj yYXPyYP

yYXPyYPXXyYP

)|()()|()(

),,|( 1 …

∏ ==⎯⎯←i

kikynew yYXPyYPYk

)|()(maxarg

Page 25: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

25

The Naïve Bayes Algorithm

n  For each value yk

n  Estimate P(Y = yk) from the data. n  For each value xij of each attribute Xi

n Estimate P(Xi=xij | Y = yk)

n  Classify a new point via:

n  In practice, the independence assumption doesn’t often hold true, but Naïve Bayes performs very well despite it.

∏ ==⎯⎯←i

kikynew yYXPyYPYk

)|()(maxarg

Page 26: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

26

Naïve Bayes Applications

n  Text classification n  Which e-mails are spam? n  Which e-mails are meeting notices? n  Which author wrote a document?

n  Classifying mental states

People Words Animal Words

Learning P(BrainActivity | WordCategory)

Pairwise Classification Accuracy: 85%

Page 27: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Polynomial Curve Fitting

27

Slides  adapted  from    Pa5ern  RecogniHon  and  Machine  Learning    by  Christopher  Bishop  

Page 28: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Polynomial  Curve  FiUng    

Page 29: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Sum-­‐of-­‐Squares  Error  FuncHon  

Page 30: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

0th  Order  Polynomial  

Page 31: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

1st  Order  Polynomial  

Page 32: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

3rd  Order  Polynomial  

Page 33: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

9th  Order  Polynomial  

Page 34: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Over-­‐fiUng  

Err

or

Page 35: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Polynomial  Coefficients      

Page 36: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Data  Set  Size:    

9th  Order  Polynomial  

Page 37: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Data  Set  Size:    

9th  Order  Polynomial  

Page 38: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Penalize  large  coefficient  values  

L2  Norm  

RegularizaHon  

Measures the “complexity” of w

�w�2 =

��

i

w2i

E(w) =1

2

N�

n=1

{y(xn,w)− tn}2 +λ

2(�w�2)2

Page 39: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Penalize  large  coefficient  values  

L2  Norm  

RegularizaHon  

E(w) =1

2

N�

n=1

{y(xn,w)− tn}2 +λ

2

i

w2i

Measures the “complexity” of w

�w�2 =

��

i

w2i

E(w) =1

2

N�

n=1

{y(xn,w)− tn}2 +λ

2(�w�2)2

λ regularization parameter higher λ à more regularization

Page 40: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

RegularizaHon:    

λ=  1.5E-­‐8

λ=  1.5E-­‐8

Page 41: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

RegularizaHon:    

λ=  1

λ=  1

Page 42: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

RegularizaHon:                      vs.    

Page 43: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Polynomial  Coefficients      

Page 44: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

E(w) =1

2

N�

n=1

�y(xn,w)− tn

�2+

λ

2

j

w2j

∇jE(w) =N�

n=1

xjn

�y(xn,w)− tn

�+ λwj

wj = wj − α∇jE(w)

Choose  w randomly, where    Repeat  unHl  w  converges  (i.e.,  ||w  –  wold||  <  ε  )    wold  =  w For  j  =  0  ...  M:  

Learning  via  Gradient  Descent  

wj ∼ N(0,σ2)

Page 45: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

The  Gaussian  DistribuHon  

Page 46: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

The  MulHvariate  Gaussian  

Page 47: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Gaussian  Parameter  EsHmaHon  

Likelihood  funcHon  

Page 48: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Maximum  (Log)  Likelihood  

Page 49: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Curve  FiUng  Re-­‐visited  

Page 50: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Maximum  Likelihood  

Determine                        by  minimizing  sum-­‐of-­‐squares  error,                          .  

Page 51: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

PredicHve  DistribuHon  

Page 52: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

MAP:  A  Step  towards  Bayes  

Determine                              by  minimizing  regularized  sum-­‐of-­‐squares  error,                          .  

Page 53: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Bayesian  Curve  FiUng  

Page 54: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Bayesian  PredicHve  DistribuHon  

Page 55: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Model  SelecHon  

Cross-­‐ValidaHon  

Page 56: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Curse  of  Dimensionality  

Page 57: 13 kNN NB Regression - Bryn Mawr€¦ · nearest-neighbor.! But the lack of gradients and the jerkiness isnʼt good.! Appalling behavior! Loses all the detail that 1-nearest neighbor

Curse  of  Dimensionality  

Polynomial  curve  fiUng,  M = 3          Gaussian  DensiHes  in    higher  dimensions