39
ECE 5424: Introduction to Machine Learning Stefan Lee Virginia Tech Topics: – (Finish) Model selection – Error decomposition – BiasVariance Tradeoff – Classification: Naïve Bayes Readings: Barber 17.1, 17.2, 10.110.3

L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Embed Size (px)

Citation preview

Page 1: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

ECE  5424:  Introduction  to  Machine  Learning

Stefan  LeeVirginia  Tech

Topics:  – (Finish)  Model  selection– Error  decomposition

– Bias-­Variance  Tradeoff– Classification:  Naïve Bayes

Readings:  Barber  17.1,  17.2,  10.1-­10.3

Page 2: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Administrativia• HW2  

– Due:  Wed  09/28,  11:55pm– Implement  linear  regression,  Naïve  Bayes,  Logistic  Regression

• Project  Proposal  due  tomorrow  by  11:55pm!!

(C)  Dhruv  Batra   2

Page 3: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Recap  of  last  time

(C)  Dhruv  Batra   3

Page 4: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Regression

(C)  Dhruv  Batra   4

Page 5: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   5Slide  Credit:  Greg  Shakhnarovich

Page 6: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   6Slide  Credit:  Greg  Shakhnarovich

Page 7: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   7Slide  Credit:  Greg  Shakhnarovich

Page 8: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

What  you  need  to  know• Linear  Regression

– Model– Least  Squares  Objective– Connections  to  Max  Likelihood  with  Gaussian  Conditional– Robust  regression  with  Laplacian Likelihood– Ridge  Regression  with  priors– Polynomial  and  General  Additive  Regression

(C)  Dhruv  Batra   8

Page 9: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Plan  for  Today• (Finish)  Model  Selection

– Overfitting vs Underfitting– Bias-­Variance  trade-­off

• aka  Modeling  error  vs Estimation  error  tradeoff

• Naïve  Bayes

(C)  Dhruv  Batra   9

Page 10: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

New  Topic:  Model  Selection  and  Error  Decomposition

(C)  Dhruv  Batra   10

Page 11: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Model  Selection• How  do  we  pick  the  right  model  class?

• Similar  questions– How  do  I  pick  magic  hyper-­parameters?– How  do  I  do  feature  selection?

(C)  Dhruv  Batra   11

Page 12: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Errors• Expected  Loss/Error

• Training  Loss/Error  • Validation  Loss/Error• Test  Loss/Error

• Reporting  Training  Error  (instead  of  Test)  is  CHEATING

• Optimizing  parameters  on  Test  Error  is  CHEATING

(C)  Dhruv  Batra   12

Page 13: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   13Slide  Credit:  Greg  Shakhnarovich

Page 14: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   14Slide  Credit:  Greg  Shakhnarovich

Page 15: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   15Slide  Credit:  Greg  Shakhnarovich

Page 16: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   16Slide  Credit:  Greg  Shakhnarovich

Page 17: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   17Slide  Credit:  Greg  Shakhnarovich

Page 18: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Overfitting• Overfitting: a  learning  algorithm  overfits  the  training  data  if  it  outputs  a  solution  w when  there  exists  another  solution  w’ such  that:

18(C)  Dhruv  Batra   Slide  Credit:  Carlos  Guestrin

Page 19: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Error  Decomposition

(C)  Dhruv  Batra   19

Reality

Page 20: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Error  Decomposition

(C)  Dhruv  Batra   20

Reality

Page 21: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Error  Decomposition

(C)  Dhruv  Batra   21

Reality

Higher-­OrderPotentials

Page 22: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Error  Decomposition• Approximation/Modeling   Error

– You  approximated  reality  with  model

• Estimation  Error– You  tried  to  learn  model  with  finite  data

• Optimization  Error– You  were  lazy  and  couldn’t/didn’t  optimize  to  completion

• Bayes  Error– Reality  just  sucks  (i.e.  there  is  a  lower  bound  on  error  for  all  models,  usually  non-­zero)

(C)  Dhruv  Batra   22

Page 23: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Bias-­Variance  Tradeoff• Bias:  difference  between  what  you  expect  to  learn  and  truth– Measures  how  well  you  expect  to  represent  true  solution– Decreases  with  more  complex  model  

• Variance:  difference  between  what  you  expect  to  learn  and  what  you  learn  from  a  from  a  particular  dataset  – Measures  how  sensitive  learner  is  to  specific  dataset– Increases  with  more  complex  model

(C)  Dhruv  Batra   23Slide  Credit:  Carlos  Guestrin

Page 24: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Bias-­Variance  Tradeoff• Matlab demo

(C)  Dhruv  Batra   24

Page 25: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Bias-­Variance  Tradeoff

• Choice  of  hypothesis  class  introduces   learning  bias– More  complex  class  → less  bias– More  complex  class  → more  variance

25(C)  Dhruv  Batra   Slide  Credit:  Carlos  Guestrin

Page 26: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

(C)  Dhruv  Batra   26Slide  Credit:  Greg  Shakhnarovich

Page 27: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Learning  Curves• Error  vs size  of  dataset

• On  board– High-­bias  curves– High-­variance  curves

(C)  Dhruv  Batra   27

Page 28: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Debugging  Machine  Learning• My  algorithm  doesn’t  work

– High  test  error

• What  should  I  do?– More  training  data– Smaller  set  of  features– Larger  set  of  features– Lower  regularization– Higher  regularization

(C)  Dhruv  Batra   28

Page 29: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

What  you  need  to  know• Generalization  Error  Decomposition

– Approximation,  estimation,  optimization,  bayes error– For  squared  losses,  bias-­variance  tradeoff

• Errors– Difference  between  train  &  test  error  &  expected  error– Cross-­validation  (and  cross-­val error)– NEVER  EVER  learn  on  test  data

• Overfitting vs Underfitting

(C)  Dhruv  Batra   29

Page 30: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

New  Topic:  Naïve Bayes  

(your  first  probabilistic  classifier)

(C)  Dhruv  Batra   30

Classificationx y Discrete

Page 31: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Classification• Learn:  h:X! Y

– X – features– Y  – target  classes

• Suppose  you  know  P(Y|X)  exactly,  how  should  you  classify?– Bayes  classifier:

• Why?

Slide  Credit:  Carlos  Guestrin

Page 32: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Optimal  classification

• Theorem:  Bayes  classifier  hBayes is  optimal!

– That  is

• Proof:

Slide  Credit:  Carlos  Guestrin

Page 33: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Generative  vs.  Discriminative

• Generative  Approach– Estimate  p(x|y)  and  p(y)– Use  Bayes  Rule  to  predict  y

• Discriminative  Approach– Estimate  p(y|x)  directly  OR– Learn  “discriminant”  function  h(x)

(C)  Dhruv  Batra   33

Page 34: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Generative  vs.  Discriminative• Generative  Approach

– Assume  some  functional  form  for  P(X|Y),  P(Y)– Estimate  p(X|Y)  and  p(Y)– Use  Bayes  Rule  to  calculate  P(Y|  X=x)– Indirect computation  of  P(Y|X)  through  Bayes  rule– But,  can  generate  a  sample, P(X)  =  �y P(y)  P(X|y)

• Discriminative  Approach– Estimate  p(y|x)  directly  OR– Learn  “discriminant”  function  h(x)– Direct  but  cannot  obtain  a  sample  of  the  data,  because  P(X)  is  not  available

(C)  Dhruv  Batra   34

Page 35: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Generative  vs.  Discriminative• Generative:  

– Today:  Naïve Bayes

• Discriminative:– Next:  Logistic  Regression

• NB  &  LR  related  to  each  other.  

(C)  Dhruv  Batra   35

Page 36: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

How  hard  is  it  to  learn  the  optimal  classifier?

Slide  Credit:  Carlos  Guestrin

• Categorical  Data    

• How  do  we  represent  these?  How  many  parameters?– Class-­Prior,  P(Y):

• Suppose  Y  is  composed  of  k classes

– Likelihood,  P(X|Y):• Suppose  X is  composed  of  d binary  features

• Complex  model  à High  variance  with  limited  data!!!

Page 37: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

Independence  to  the  rescue

(C)  Dhruv  Batra   37Slide  Credit:  Sam  Roweis

Page 38: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

The  Naïve  Bayes  assumption• Naïve  Bayes  assumption:

– Features  are  independent  given  class:

– More  generally:

• How  many  parameters  now?• Suppose  X is  composed  of  d binary  features

Slide  Credit:  Carlos  Guestrin(C)  Dhruv  Batra   38

d

Page 39: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias

The  Naïve  Bayes  Classifier

Slide  Credit:  Carlos  Guestrin(C)  Dhruv  Batra   39

• Given:– Class-­Prior  P(Y)– d conditionally  independent  features  X given  the  class  Y– For  each  Xi,  we  have  likelihood  P(Xi|Y)

• Decision  rule:

• If  assumption  holds,  NB  is  optimal  classifier!