48
CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer Science Department Pacific Lutheran University Fall 2018 Special appreciation to Tom Mitchell, Ian Goodfellow, Joshua Bengio, Aaron Courville, Michael Nielsen, Andrew Ng, Katie Malone, Sebastian Thrun, Ethem Alpaydin, Christopher Bishop,

CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

CS 330 - Artificial Intelligent - Logistic and linear regression

Instructor: Renzhi Cao Computer Science Department

Pacific Lutheran University Fall 2018

1

Special appreciation to Tom Mitchell, Ian Goodfellow, Joshua Bengio, Aaron Courville, Michael Nielsen, Andrew Ng, Katie Malone, Sebastian Thrun, Ethem Alpaydin, Christopher Bishop,

Page 2: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Announcement

• Homework of decision tree is due on next Tuesday • Lab 3 is due on Sakai • Quiz on next week, study guide will be posted on Sakai • Practical Machine learning next Tuesday, bring your laptop

Page 3: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Gaussian Naive Bayes - Big Picture

Page 4: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Logistic Regression

Idea: • Naive Bayes allows computing P(Y|X) by learning P(Y) and

P(X|Y)

• Why not learn P(Y|X) directly?

Page 5: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 6: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

• What would be w0 and w1 ?

Page 7: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 8: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 9: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 10: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 11: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 12: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 13: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 14: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 15: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Gradient-Descent

iii

ii

www

iwEw

Δ+=

∀∂

∂−=Δ ,η

15

wt wt+1

η

E (wt)

E (wt+1)

Page 16: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 17: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 18: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 19: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 20: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 21: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 22: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 23: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 24: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

4n + 1

n + 1

Page 25: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 26: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Regression So far, we’ve been interested in learning P(Y|X) where Y has

discrete values (called ‘classification’) What if Y is continuous? (called ‘regression’) •  predict weight from gender, height, age, …

•  predict Google stock price today from Google, Yahoo, MSFT prices yesterday

•  predict each pixel intensity in robot’s current camera image, from previous image and previous action

Page 27: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Regression Wish to learn f:X!Y, where Y is real, given {<x1,y1>…<xn,yn>} Approach: 1.  choose some parameterized form for P(Y|X; θ)

( θ is the vector of parameters)

2.  derive learning algorithm as MCLE or MAP estimate for θ

Page 28: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

1. Choose parameterized form for P(Y|X; θ)

and the expected value of y for any given x is f(x)

Y

XAssume Y is some deterministic f(X), plus randomnoise

where

Therefore Y is a random variable that follows the distribution

Page 29: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Consider Linear Regression

E.g., assume f(x) is linear function of x Notation: to make our parameters explicit, let’s write

Page 30: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Training Linear Regression

How can we learn W from the training data?

Page 31: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Training Linear Regression

How can we learn W from the training data? Learn Maximum Conditional Likelihood Estimate! where

Page 32: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Training Linear Regression Learn Maximum Conditional Likelihood Estimate

where

Page 33: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Training Linear RegressionLearn Maximum Conditional LikelihoodEstimate

where

so:

Page 34: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Training Linear RegressionLearn Maximum Conditional LikelihoodEstimate

Can we derive gradient descent rule for training?

Page 35: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Summary

• Learning is optimization problem once we choose our objective function: • maximize data likelihood • maximize posterior prob of W

• We use gradient descent as general learning algorithm to learn the weight.

Page 36: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Discussion about progress of literature review

• Around 20 mins discussions between groups. • One group member presents the current progress, plan and

issues. • (https://www.cs.plu.edu/~caora/cs330/Materials/

fall2018/groups) • (https://www.cs.plu.edu/~caora/cs330/Materials/

fall2018/LiteratureReview_requirement.pdf)

Page 37: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 38: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Extra slides (Not required to understand)

Page 39: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 40: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 41: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 42: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 43: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 44: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 45: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer
Page 46: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

EPI 809/Spring 2008 46

YY = mX + b

b = Y-interceptX

Changein Y

Change in X

m = Slope

Linear Equations

© 1984-1994 T/Maker Co.

Page 47: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

Regression – SummaryUnder general assumption

1.  MLE corresponds to minimizing sum of squared prediction errors

2.  MAP estimate minimizes SSE plus sum of squared weights

3.  Again, learning is an optimization problem once we choose our objective function•  maximize data likelihood•  maximize posterior prob of W

4.  Again, we can use gradient descent as a general learning algorithm•  as long as our objective fn is differentiable wrt W•  though we might learn local optima ins

5.  Almost nothing we said here required that f(x) be linear in x

Page 48: CS 330 - Artificial Intelligentcaora/cs330/Materials/fall2018/Slides/Day13.pdf · CS 330 - Artificial Intelligent - Logistic and linear regression Instructor: Renzhi Cao Computer

How about MAP instead of MLE estimate?

-