
CS 330 - Artificial Intelligence - Logistic and linear regression

Instructor: Renzhi Cao Computer Science Department

Pacific Lutheran University Fall 2018


Special appreciation to Tom Mitchell, Ian Goodfellow, Yoshua Bengio, Aaron Courville, Michael Nielsen, Andrew Ng, Katie Malone, Sebastian Thrun, Ethem Alpaydın, and Christopher Bishop.

Announcement

• The decision tree homework is due next Tuesday.
• Lab 3 is due on Sakai.
• Quiz next week; a study guide will be posted on Sakai.
• Practical machine learning session next Tuesday; bring your laptop.

Gaussian Naive Bayes - Big Picture

Logistic Regression

Idea:
• Naive Bayes allows computing P(Y|X) by learning P(Y) and P(X|Y).
• Why not learn P(Y|X) directly?
• What would be w0 and w1?
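A brief sketch of the parametric form logistic regression typically assumes for P(Y|X) (standard sigmoid parameterization, not taken from this slide; in the one-feature case, w_0 is the intercept and w_1 the feature weight):

    P(Y = 1 | X) = 1 / (1 + exp(-(w_0 + Σ_i w_i X_i)))
    P(Y = 0 | X) = 1 - P(Y = 1 | X)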

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Gradient-Descent

Δw_i = -η ∂E/∂w_i, for all i
w_i ← w_i + Δw_i

[Figure: one step of gradient descent on the error surface E(w), moving from w^t to w^(t+1) with step size η and decreasing the error from E(w^t) to E(w^(t+1)).]

Number of parameters to estimate (n features): Gaussian Naive Bayes needs 4n + 1, logistic regression needs n + 1.
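A minimal Python sketch of the gradient-descent update rule above, using a made-up one-dimensional error function E(w) = (w - 3)^2; the function, step size, and iteration count are illustrative assumptions, not from the slides:

    import numpy as np

    def gradient_descent_step(w, grad_E, eta=0.1):
        # One update: delta_w_i = -eta * dE/dw_i, then w_i <- w_i + delta_w_i
        return w - eta * grad_E(w)

    # Toy error function: E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3); minimum at w = 3.
    grad_E = lambda w: 2.0 * (w - 3.0)

    w = np.array([0.0])
    for t in range(50):
        w = gradient_descent_step(w, grad_E)
    print(w)  # converges toward 3.0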

Regression

So far, we've been interested in learning P(Y|X) where Y has discrete values (called 'classification').

What if Y is continuous? (called 'regression')
• predict weight from gender, height, age, …
• predict Google stock price today from Google, Yahoo, MSFT prices yesterday
• predict each pixel intensity in the robot's current camera image, from the previous image and previous action

Regression

Wish to learn f: X → Y, where Y is real-valued, given training data {<x1,y1>, …, <xn,yn>}.

Approach:
1. choose some parameterized form for P(Y|X; θ) (θ is the vector of parameters)
2. derive a learning algorithm as the MCLE or MAP estimate for θ
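As a sketch of what steps 1-2 amount to, with superscript l indexing training examples (notation assumed here, not shown on the slide):

    θ_MCLE = argmax_θ ∏_l P(y^l | x^l, θ)
    θ_MAP  = argmax_θ P(θ) ∏_l P(y^l | x^l, θ)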

1. Choose parameterized form for P(Y|X; θ)

Assume Y is some deterministic f(X), plus random noise ε:

Y = f(X) + ε, where ε ~ N(0, σ²)

Therefore Y is a random variable that follows the distribution

P(y|x) = N(f(x), σ²)

and the expected value of y for any given x is f(x).

[Figure: training pairs (X, Y) scattered around the curve f(X).]

Consider Linear Regression

E.g., assume f(x) is a linear function of x.

Notation: to make our parameters explicit, let's write

f(x) = w_0 + Σ_i w_i x_i, with W = <w_0, w_1, …, w_n>
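For instance, with hypothetical weights w_0 = 1 and w_1 = 2 on a single feature x, f(x) = 1 + 2x, so f(3) = 7.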

Training Linear Regression

How can we learn W from the training data?

Training Linear Regression

Learn the Maximum Conditional Likelihood Estimate!

W_MCLE = argmax_W ∏_l P(y^l | x^l, W)

where

P(y^l | x^l, W) = N(f(x^l), σ²)

so maximizing the conditional likelihood is equivalent to minimizing the sum of squared prediction errors:

W_MCLE = argmin_W Σ_l (y^l − f(x^l))²

Can we derive a gradient descent rule for training?
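A minimal Python sketch of such a training rule, assuming the squared-error objective E(W) = ½ Σ_l (y^l − W·x^l)²; the synthetic data, learning rate, and epoch count are illustrative assumptions:

    import numpy as np

    # Synthetic data (illustrative): y = f(x) + Gaussian noise, with f linear in x.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # 100 examples, 3 features
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    # Gradient descent on E(W) = 0.5 * sum_l (y^l - W . x^l)^2 (intercept omitted for brevity).
    w = np.zeros(3)
    eta = 0.1
    for epoch in range(500):
        residual = y - X @ w                      # y^l - W . x^l for every example
        grad = -(X.T @ residual) / len(y)         # averaged dE/dW, for a stable step size
        w = w - eta * grad                        # delta W = -eta * dE/dW
    print(w)                                      # approaches true_w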

Summary

• Learning is an optimization problem once we choose our objective function:
  • maximize data likelihood
  • maximize posterior probability of W
• We use gradient descent as a general learning algorithm to learn the weights.

Discussion about progress of literature review

• Around 20 minutes of discussion between groups.
• One group member presents the current progress, plan, and issues.
• (https://www.cs.plu.edu/~caora/cs330/Materials/fall2018/groups)
• (https://www.cs.plu.edu/~caora/cs330/Materials/fall2018/LiteratureReview_requirement.pdf)

Extra slides (Not required to understand)

Linear Equations

Y = mX + b, where m = slope = (change in Y) / (change in X) and b = Y-intercept.

[Figure: a straight line illustrating the slope m and intercept b.]
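For example, with hypothetical values m = 2 and b = 1, Y = 2X + 1: increasing X by 1 increases Y by 2, and the line crosses the Y axis at 1.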

Regression - Summary

Under general assumptions:

1. MLE corresponds to minimizing the sum of squared prediction errors
2. MAP estimate minimizes SSE plus the sum of squared weights
3. Again, learning is an optimization problem once we choose our objective function
   • maximize data likelihood
   • maximize posterior probability of W
4. Again, we can use gradient descent as a general learning algorithm
   • as long as our objective function is differentiable with respect to W
   • though we might learn local optima instead of global optima
5. Almost nothing we said here required that f(x) be linear in x

How about MAP instead of MLE estimate?
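A sketch of what the MAP estimate looks like, assuming a zero-mean Gaussian prior on the weights (this yields the "SSE plus sum of squared weights" objective from point 2 above; λ is a constant determined by the prior's variance):

    W_MAP = argmin_W [ Σ_l (y^l − f(x^l))² + λ Σ_i w_i² ]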
