Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Warm-up as You Walk InBernouli distribution:
๐ โผ ๐ต๐๐๐ ๐ง
๐ ๐ฆ = แ๐ง, ๐ฆ = 11 โ ๐ง, ๐ฆ = 0
What is the log likelihood for three i.i.d. samples, given parameter ๐ง:
๐ = {๐ฆ 1 = 1, ๐ฆ 2 = 1, ๐ฆ 3 = 0}
๐ฟ ๐ง =
โ ๐ง =
Introduction to Machine Learning
Logistic Regression
Instructor: Pat Virtue
AnnouncementsAssignments:
โช HW2 (written & programming)
โช Due Tue 2/4, 11:59 pm
Early Feedback
โช More mathematical rigor
โช Consolidated course notes
โช Lots of concepts, how does it all fit together?
Plan
Last time
โช Likelihood
โช Density Estimation
โช MLE for Density Estimation
Today
โช Wrap up MLE for linear regression
โช Classification models
โช MLE for logistic regression
MR Fingerprinting AssumptionsForgot a really important assumption!!
5002500
T1 T2
AssumptionsWhat assumptions do we make with this data?
Input x
Ou
tpu
t y
Modelling ๐(๐|๐, ๐)
MLE for Linear RegressionHow does our model of ๐(๐|๐, ๐) with the likelihood function?
๐ฟ ๐
Maximum (Conditional) Likelihood Estimate
M(C)LE for Linear Regression
๐ฟ ๐, ๐๐ =1
2๐๐2 ๐/2๐
โ ฯ๐ ๐ฆ
(๐)โ๐
๐๐(๐) 2
2๐2
M(C)LE for Linear RegressionHow does M(C)LE optimization relate to least squares optimization?
โ(๐) =
๐ฝ ๐ =
Piazza Poll 2:Does min
๐โ โ ๐ equal min
๐๐ฝ(๐ค) ?
Linear Regression with Multiple Input Features
Poll 1: Which vector is the correct ๐ฝ?
Classification ModelsLinear Regression
Classification ModelsLinear Regression with Decision Boundary
Classification ModelsLinear Regression with Probability
Modelling ๐(๐|๐, ๐)Bernoulli distribution of logistic function of linear model
MLE for BernoulliBernoulli distribution:
๐ โผ ๐ต๐๐๐ ๐ง
๐ ๐ฆ = แ๐ง, ๐ฆ = 11 โ ๐ง, ๐ฆ = 0
What is the log likelihood for three i.i.d. samples, given parameter ๐ง?
๐ = {๐ฆ 1 = 1, ๐ฆ 2 = 1, ๐ฆ 3 = 0}
๐ฟ ๐ง =
โ ๐ง =
MLE for BernoulliBernoulli distribution:
๐ โผ ๐ต๐๐๐ ๐ง
๐ ๐ฆ = แ๐ง, ๐ฆ = 11 โ ๐ง, ๐ฆ = 0
What is the log likelihood for three i.i.d. samples, given parameter ๐ง?
๐ = {๐ฆ 1 = 1, ๐ฆ 2 = 1, ๐ฆ 3 = 0}
๐ฟ ๐ง =
โ ๐ง =
MLE for BernoulliBernoulli distribution:
๐ โผ ๐ต๐๐๐ ๐ง
๐ ๐ฆ = แ๐ง, ๐ฆ = 11 โ ๐ง, ๐ฆ = 0
What is the log likelihood for three i.i.d. samples, given parameter ๐ง?
๐ = {๐ฆ 1 = 1, ๐ฆ 2 = 1, ๐ฆ 3 = 0}
๐ฟ ๐ง = ๐ง โ ๐ง โ (1 โ ๐ง) = ฯ๐ ๐ง๐ฆ๐
1 โ ๐ง 1โ๐ฆ(๐)
โ ๐ง = log ๐ง + log ๐ง + log(1 โ ๐ง) = ฯ๐ ๐ฆ(๐) log ๐ง + 1 โ ๐ฆ ๐ log 1 โ ๐ง
M(C)LE for Logistic Regression๐(๐ โฃ ๐, ๐)
๐ ๐ ๐,๐ = ฯ๐=1๐ ๐(๐ฆ(๐) โฃ ๐(๐), ๐)
Model ๐ as a Bernoulli distribution, but the temporary ๐ง is now based on the logistic function of our linear model of input ๐
๐ โผ ๐ต๐๐๐ ๐ , ๐ = ๐ ๐๐๐ , ๐ ๐ง =1
1+๐โ๐ง
What is the conditional log likelihood?
๐ฟ ๐ =
โ ๐ =
M(C)LE for Logistic Regression๐(๐ โฃ ๐, ๐)
๐ ๐ ๐,๐ = ฯ๐=1๐ ๐(๐ฆ(๐) โฃ ๐(๐), ๐)
Model ๐ as a Bernoulli distribution, but the temporary ๐ง is now based on the logistic function of our linear model of input ๐
๐ โผ ๐ต๐๐๐ ๐ , ๐ = ๐ ๐๐๐ , ๐ ๐ง =1
1+๐โ๐ง
What is the conditional log likelihood?
๐ฟ ๐ = ฯ๐ ๐ ๐๐๐(๐)๐ฆ ๐
1 โ ๐ ๐๐๐(๐)1โ๐ฆ(๐)
โ ๐ = ฯ๐ ๐ฆ ๐ log ๐ ๐๐๐(๐) + 1 โ ๐ฆ(๐) log 1 โ ๐ ๐๐๐(๐)
M(C)LE for Logistic Regression
โ ๐ = ฯ๐ ๐ฆ ๐ log ๐(๐) + 1 โ ๐ฆ(๐) log 1 โ ๐(๐)
๐โ
๐๐=
๐ง = ๐ ๐, ๐ = ๐๐๐
โ๐๐(๐, ๐) = ๐
๐ = ๐ ๐ง =1
1+๐โ๐ง
๐๐
๐๐ง= ๐ ๐ง 1 โ ๐ ๐ง = ๐(1 โ ๐)
M(C)LE for Logistic Regression
โ ๐ = ฯ๐ ๐ฆ ๐ log ๐(๐) + 1 โ ๐ฆ(๐) log 1 โ ๐(๐)
๐โ
๐๐= ฯ๐
๐ฆ ๐
๐(๐)โ
1โ๐ฆ ๐
1โ๐(๐)๐๐
๐๐
๐๐
๐๐
= ฯ๐๐ฆ ๐ โ๐ ๐
๐ ๐ 1โ๐ ๐ ๐ ๐ 1 โ ๐ ๐ ๐ ๐ ๐
= ฯ๐ ๐ฆ ๐ โ ๐ ๐ ๐ ๐ ๐
๐ง = ๐ ๐, ๐ = ๐๐๐
โ๐๐(๐, ๐) = ๐
๐ = ๐ ๐ง =1
1+๐โ๐ง
๐๐
๐๐ง= ๐ ๐ง 1 โ ๐ ๐ง = ๐(1 โ ๐)
M(C)LE for Logistic Regression
โ ๐ = ฯ๐ ๐ฆ ๐ log ๐(๐) + 1 โ ๐ฆ(๐) log 1 โ ๐(๐)
โ๐โ(๐) = ฯ๐ ๐ฆ ๐ โ ๐ ๐ ๐ ๐
โ๐โ(๐) = 0?
No closed form solution
Back to iterative methods. Solve with (stochastic) gradient descent, Newtonโs method, or Iteratively Reweighted Least Squares (IRLS)
๐ง = ๐ ๐, ๐ = ๐๐๐ ๐ = ๐ ๐ง =1
1+๐โ๐ง
Logistic FunctionCool note: Logistic function is related the invers of logit function!
Odds: Ratio of two probabilities. For ๐ โผ ๐ต๐๐๐(๐), ๐(๐=1)
๐(๐=0)=
๐
1โ๐
Logit function: Log odds. log๐(๐=1)
๐(๐=0)= log
๐
1โ๐
๐ง = ๐๐๐๐๐ก ๐ = log๐
1โ๐
๐ = ๐๐๐๐๐กโ1 ๐ง =1
1+๐โ๐ง
Log Odds and Logistic RegressionFormulate log odds as linear model of X:
log๐(๐ = 1 โฃ ๐ = ๐,๐)
๐(๐ = 0 โฃ ๐ = ๐,๐)= ๐๐๐
Equivalent to logistic representation:
๐ ๐ = 1 ๐ = ๐,๐ =1
1 + ๐โ๐๐๐
Log Odds and Logistic Regression (Multi-class!)Formulate log odds as linear model of X:
log๐(๐ = 1 โฃ ๐ = ๐,๐พ)
๐(๐ = ๐พ โฃ ๐ = ๐,๐พ)= ๐1
๐๐
log๐(๐ = 2 โฃ ๐ = ๐,๐พ)
๐(๐ = ๐พ โฃ ๐ = ๐,๐พ)= ๐2
๐๐
โฎ
log๐(๐ = ๐พ โ 1 โฃ ๐ = ๐,๐พ)
๐(๐ = ๐พ โฃ ๐ = ๐,๐พ)= ๐๐พโ1
๐ ๐
Equivalent to softmax representation:
๐ ๐ = ๐ ๐ = ๐,๐ =๐๐๐
๐๐
1+ฯ๐=1๐พโ1 ๐
๐๐๐๐
๐ ๐ = ๐พ ๐ = ๐,๐ =1
1+ฯ๐=1๐พโ1 ๐
๐๐๐๐
๐ ๐ = ๐ ๐ = ๐,๐ =๐๐๐
๐๐
ฯ๐=1๐พ ๐
๐๐๐๐OR
Multi-class Logistic Regression๐(๐ โฃ ๐, ๐)
๐ ๐ ๐,๐พ = ฯ๐=1๐ ๐(๐ฆ(๐) โฃ ๐(๐),๐พ)
๐ ๐ฆ(๐) = ๐ ๐ = ๐(๐),๐ =๐๐๐
๐๐(๐)
ฯ๐=1๐พ ๐
๐๐๐๐(๐)
What is the conditional likelihood?
๐ฟ ๐ = ฯ๐๐๐๐
๐๐(๐)
ฯ๐=1๐พ ๐
๐๐๐๐(๐)
What is the hypothesis function?
เท๐ฆ = โ๐พ ๐ =