CS 330 - Artificial Intelligence - Logistic and Linear Regression
Instructor: Renzhi Cao Computer Science Department
Pacific Lutheran University Fall 2018
Special appreciation to Tom Mitchell, Ian Goodfellow, Joshua Bengio, Aaron Courville, Michael Nielsen, Andrew Ng, Katie Malone, Sebastian Thrun, Ethem Alpaydin, Christopher Bishop,
Announcement
• The decision tree homework is due next Tuesday
• Lab 3 is due on Sakai
• Quiz next week; a study guide will be posted on Sakai
• Practical machine learning next Tuesday; bring your laptop
Gaussian Naive Bayes - Big Picture
Logistic Regression
Idea:
• Naive Bayes allows computing P(Y|X) by learning P(Y) and P(X|Y)
• Why not learn P(Y|X) directly?
• What would be w0 and w1?
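One common choice for learning P(Y|X) directly is the logistic (sigmoid) form, whose parameters are exactly the w0 and w1 the slide asks about. A minimal sketch in Python for the single-feature case (function names are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_y_given_x(x, w0, w1):
    """P(Y=1 | X=x) under a logistic regression model with
    intercept w0 and weight w1 (single feature)."""
    return sigmoid(w0 + w1 * x)

# Where w0 + w1*x = 0, the model is maximally uncertain:
# p_y_given_x returns exactly 0.5 at that decision boundary.
print(p_y_given_x(2.0, -1.0, 0.5))  # -1.0 + 0.5*2.0 = 0 -> 0.5
```

The intercept w0 shifts the decision boundary; w1 controls how sharply the probability transitions from 0 to 1 as x changes.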
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
Gradient Descent

Update each weight by moving against the gradient of the error E, scaled by the learning rate η:

Δw_i = −η ∂E/∂w_i, ∀i
w_i ← w_i + Δw_i

[Figure: one gradient descent step on the error surface, moving from w^t to w^{t+1}, reducing E(w^t) to E(w^{t+1}); the step size is controlled by η]
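The gradient descent update rule can be sketched in a few lines of code. This is an illustrative, generic implementation (names and the example error function are mine, not from the lecture):

```python
def gradient_descent_step(w, grad_E, eta=0.1):
    """One update w <- w - eta * dE/dw, applied coordinate-wise.
    w is a list of weights; grad_E(w) returns the gradient list."""
    g = grad_E(w)
    return [wi - eta * gi for wi, gi in zip(w, g)]

# Example: E(w) = w0^2 + w1^2 has gradient (2*w0, 2*w1);
# each step shrinks w toward the minimum at the origin.
grad = lambda w: [2.0 * w[0], 2.0 * w[1]]
w = [4.0, -2.0]
for _ in range(50):
    w = gradient_descent_step(w, grad, eta=0.1)
# w is now very close to [0, 0]
```

With eta = 0.1, each step multiplies the weights by 0.8 in this example; too large an eta would overshoot and diverge instead.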
Regression
So far, we've been interested in learning P(Y|X) where Y has discrete values (called 'classification').
What if Y is continuous? (called 'regression')
• predict weight from gender, height, age, …
• predict Google stock price today from Google, Yahoo, MSFT prices yesterday
• predict each pixel intensity in robot's current camera image, from previous image and previous action
Regression
Wish to learn f: X → Y, where Y is real, given {<x1,y1>…<xn,yn>}
Approach:
1. choose some parameterized form for P(Y|X; θ) (θ is the vector of parameters)
2. derive learning algorithm as MCLE or MAP estimate for θ
1. Choose parameterized form for P(Y|X; θ)
Assume Y is some deterministic f(X), plus random noise:
Y = f(X) + ε, where ε ~ N(0, σ²)
Therefore Y is a random variable that follows the distribution
P(y|x) = N(f(x), σ²)
and the expected value of y for any given x is f(x).
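The assumed model Y = f(X) + ε with ε ~ N(0, σ²) can be simulated directly; the particular f and σ below are illustrative choices, not from the slides:

```python
import random

def sample_y(x, f, sigma=1.0):
    """Draw y ~ N(f(x), sigma^2): the deterministic signal f(x)
    plus zero-mean Gaussian noise."""
    return f(x) + random.gauss(0.0, sigma)

# The expected value of y at any x is f(x); averaging many
# samples recovers it approximately.
random.seed(0)  # for reproducibility of the demonstration
f = lambda x: 3.0 * x + 1.0
ys = [sample_y(2.0, f, sigma=0.5) for _ in range(10000)]
mean_y = sum(ys) / len(ys)  # close to f(2.0) = 7.0
```

This is exactly the generative story the parameterized form encodes: conditioned on x, y is Gaussian-distributed around f(x).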
Consider Linear Regression
E.g., assume f(x) is a linear function of x.
Notation: to make our parameters explicit, let's write
f(x; W) = w0 + Σ_{i=1}^{n} w_i x_i
Training Linear Regression
How can we learn W from the training data?
Learn the Maximum Conditional Likelihood Estimate!

W_MCLE = argmax_W Π_l P(y_l | x_l; W)

where, under the Gaussian noise model,

P(y_l | x_l; W) = (1/(σ√(2π))) exp(−(y_l − f(x_l; W))² / (2σ²))

Taking the log and dropping terms that do not depend on W, so:

W_MCLE = argmin_W Σ_l (y_l − f(x_l; W))²

i.e., maximizing the conditional likelihood is equivalent to minimizing the sum of squared prediction errors.

Can we derive a gradient descent rule for training?
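Yes: since MCLE for this model minimizes the sum of squared errors, differentiating that objective gives an update proportional to the prediction error times the input. A sketch for the one-feature case f(x; W) = w0 + w1·x (hyperparameters and the toy data are illustrative choices of mine):

```python
def train_linear_regression(xs, ys, eta=0.01, epochs=2000):
    """Fit y ~ w0 + w1*x by gradient descent on
    E(W) = 1/2 * sum_l (y_l - (w0 + w1*x_l))^2."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        # Gradient of E with respect to w0 and w1, summed
        # over all training examples.
        g0 = sum((w0 + w1 * x) - y for x, y in zip(xs, ys))
        g1 = sum(((w0 + w1 * x) - y) * x for x, y in zip(xs, ys))
        w0 -= eta * g0
        w1 -= eta * g1
    return w0, w1

# Noise-free data from y = 2x + 1: the fit recovers w0~1, w1~2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w0, w1 = train_linear_regression(xs, ys)
```

A larger eta can overshoot and diverge; eta = 0.01 is a safe illustrative choice for this data.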
Summary
• Learning is an optimization problem once we choose our objective function:
  • maximize data likelihood
  • maximize posterior probability of W
• We use gradient descent as a general learning algorithm to learn the weights.
Discussion about progress of literature review
• Around 20 minutes of discussion between groups.
• One group member presents the current progress, plan and issues.
• (https://www.cs.plu.edu/~caora/cs330/Materials/fall2018/groups)
• (https://www.cs.plu.edu/~caora/cs330/Materials/fall2018/LiteratureReview_requirement.pdf)
Extra slides (Not required to understand)
Linear Equations (EPI 809/Spring 2008)

Y = mX + b

where m = slope = (change in Y) / (change in X), and b = Y-intercept.
Regression – Summary
Under general assumptions:
1. MLE corresponds to minimizing the sum of squared prediction errors
2. MAP estimate minimizes SSE plus the sum of squared weights
3. Again, learning is an optimization problem once we choose our objective function:
   • maximize data likelihood
   • maximize posterior probability of W
4. Again, we can use gradient descent as a general learning algorithm
   • as long as our objective function is differentiable with respect to W
   • though we might learn local optima instead of the global optimum
5. Almost nothing we said here required that f(x) be linear in x
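Point 2 above says the MAP estimate, under a zero-mean Gaussian prior on the weights, minimizes SSE plus the sum of squared weights (ridge regression). The only change to the gradient descent sketch is an extra penalty term in the gradient; this illustrative one-feature version leaves the intercept unpenalized, as is conventional (names and hyperparameters are mine):

```python
def train_ridge(xs, ys, lam=0.1, eta=0.01, epochs=2000):
    """MAP estimate for y ~ w0 + w1*x with a Gaussian prior on w1:
    minimizes 1/2 * sum_l (y_l - (w0 + w1*x_l))^2 + lam * w1^2.
    (The intercept w0 is conventionally left unpenalized.)"""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x) - y for x, y in zip(xs, ys))
        # Same SSE gradient as MLE, plus the prior's penalty term.
        g1 = sum(((w0 + w1 * x) - y) * x
                 for x, y in zip(xs, ys)) + 2.0 * lam * w1
        w0 -= eta * g0
        w1 -= eta * g1
    return w0, w1

# Noise-free data from y = 2x + 1: the penalty pulls w1 slightly
# below 2, and w0 compensates slightly upward.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w0, w1 = train_ridge(xs, ys, lam=0.1)
```

Larger lam (a tighter prior) shrinks w1 further toward zero; lam = 0 recovers the plain MLE fit.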
How about MAP instead of MLE estimate?