29
Warm-up as You Walk In Bernouli distribution: โˆผ =แ‰Š , =1 1 โˆ’ , =0 What is the log likelihood for three i.i.d. samples, given parameter : = { 1 = 1, 2 = 1, 3 = 0} = โ„“ =

Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Warm-up as You Walk InBernouli distribution:

๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘› ๐‘ง

๐‘ ๐‘ฆ = แ‰Š๐‘ง, ๐‘ฆ = 11 โˆ’ ๐‘ง, ๐‘ฆ = 0

What is the log likelihood for three i.i.d. samples, given parameter ๐‘ง:

๐’Ÿ = {๐‘ฆ 1 = 1, ๐‘ฆ 2 = 1, ๐‘ฆ 3 = 0}

๐ฟ ๐‘ง =

โ„“ ๐‘ง =

Page 2: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Introduction to Machine Learning

Logistic Regression

Instructor: Pat Virtue

Page 3: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

AnnouncementsAssignments:

โ–ช HW2 (written & programming)

โ–ช Due Tue 2/4, 11:59 pm

Early Feedback

โ–ช More mathematical rigor

โ–ช Consolidated course notes

โ–ช Lots of concepts, how does it all fit together?

Page 4: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Plan

Last time

โ–ช Likelihood

โ–ช Density Estimation

โ–ช MLE for Density Estimation

Today

โ–ช Wrap up MLE for linear regression

โ–ช Classification models

โ–ช MLE for logistic regression

Page 5: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

MR Fingerprinting AssumptionsForgot a really important assumption!!

5002500

T1 T2

Page 6: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

AssumptionsWhat assumptions do we make with this data?

Input x

Ou

tpu

t y

Page 7: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Modelling ๐‘“(๐‘Œ|๐‘‹, ๐œƒ)

Page 8: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

MLE for Linear RegressionHow does our model of ๐‘“(๐‘Œ|๐‘‹, ๐œƒ) with the likelihood function?

๐ฟ ๐œƒ

Maximum (Conditional) Likelihood Estimate

Page 9: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Linear Regression

๐ฟ ๐’˜, ๐ˆ๐Ÿ =1

2๐œ‹๐œŽ2 ๐‘/2๐‘’

โˆ’ ฯƒ๐‘ ๐‘ฆ

(๐‘›)โˆ’๐’˜

๐‘‡๐’™(๐‘›) 2

2๐œŽ2

Page 10: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Linear RegressionHow does M(C)LE optimization relate to least squares optimization?

โ„“(๐’˜) =

๐ฝ ๐’˜ =

Page 11: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Piazza Poll 2:Does min

๐’˜โˆ’ โ„“ ๐’˜ equal min

๐’˜๐ฝ(๐‘ค) ?

Page 12: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Linear Regression with Multiple Input Features

Page 13: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Poll 1: Which vector is the correct ๐œฝ?

Page 14: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Classification ModelsLinear Regression

Page 15: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Classification ModelsLinear Regression with Decision Boundary

Page 16: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Classification ModelsLinear Regression with Probability

Page 17: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Modelling ๐‘(๐‘Œ|๐‘‹, ๐œƒ)Bernoulli distribution of logistic function of linear model

Page 18: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

MLE for BernoulliBernoulli distribution:

๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘› ๐‘ง

๐‘ ๐‘ฆ = แ‰Š๐‘ง, ๐‘ฆ = 11 โˆ’ ๐‘ง, ๐‘ฆ = 0

What is the log likelihood for three i.i.d. samples, given parameter ๐‘ง?

๐’Ÿ = {๐‘ฆ 1 = 1, ๐‘ฆ 2 = 1, ๐‘ฆ 3 = 0}

๐ฟ ๐‘ง =

โ„“ ๐‘ง =

Page 19: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

MLE for BernoulliBernoulli distribution:

๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘› ๐‘ง

๐‘ ๐‘ฆ = แ‰Š๐‘ง, ๐‘ฆ = 11 โˆ’ ๐‘ง, ๐‘ฆ = 0

What is the log likelihood for three i.i.d. samples, given parameter ๐‘ง?

๐’Ÿ = {๐‘ฆ 1 = 1, ๐‘ฆ 2 = 1, ๐‘ฆ 3 = 0}

๐ฟ ๐‘ง =

โ„“ ๐‘ง =

Page 20: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

MLE for BernoulliBernoulli distribution:

๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘› ๐‘ง

๐‘ ๐‘ฆ = แ‰Š๐‘ง, ๐‘ฆ = 11 โˆ’ ๐‘ง, ๐‘ฆ = 0

What is the log likelihood for three i.i.d. samples, given parameter ๐‘ง?

๐’Ÿ = {๐‘ฆ 1 = 1, ๐‘ฆ 2 = 1, ๐‘ฆ 3 = 0}

๐ฟ ๐‘ง = ๐‘ง โ‹… ๐‘ง โ‹… (1 โˆ’ ๐‘ง) = ฯ‚๐‘› ๐‘ง๐‘ฆ๐‘›

1 โˆ’ ๐‘ง 1โˆ’๐‘ฆ(๐‘›)

โ„“ ๐‘ง = log ๐‘ง + log ๐‘ง + log(1 โˆ’ ๐‘ง) = ฯƒ๐‘› ๐‘ฆ(๐‘›) log ๐‘ง + 1 โˆ’ ๐‘ฆ ๐‘› log 1 โˆ’ ๐‘ง

Page 21: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Logistic Regression๐‘(๐‘Œ โˆฃ ๐‘‹, ๐œƒ)

๐‘ ๐‘Œ ๐‘‹,๐’˜ = ฯ‚๐‘›=1๐‘ ๐‘(๐‘ฆ(๐‘›) โˆฃ ๐’™(๐‘›), ๐’˜)

Model ๐‘Œ as a Bernoulli distribution, but the temporary ๐‘ง is now based on the logistic function of our linear model of input ๐’™

๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘› ๐œ‡ , ๐œ‡ = ๐‘” ๐’˜๐‘‡๐’™ , ๐‘” ๐‘ง =1

1+๐‘’โˆ’๐‘ง

What is the conditional log likelihood?

๐ฟ ๐’˜ =

โ„“ ๐’˜ =

Page 22: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Logistic Regression๐‘(๐‘Œ โˆฃ ๐‘‹, ๐œƒ)

๐‘ ๐‘Œ ๐‘‹,๐’˜ = ฯ‚๐‘›=1๐‘ ๐‘(๐‘ฆ(๐‘›) โˆฃ ๐’™(๐‘›), ๐’˜)

Model ๐‘Œ as a Bernoulli distribution, but the temporary ๐‘ง is now based on the logistic function of our linear model of input ๐’™

๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘› ๐œ‡ , ๐œ‡ = ๐‘” ๐’˜๐‘‡๐’™ , ๐‘” ๐‘ง =1

1+๐‘’โˆ’๐‘ง

What is the conditional log likelihood?

๐ฟ ๐’˜ = ฯ‚๐‘› ๐‘” ๐’˜๐‘‡๐’™(๐‘›)๐‘ฆ ๐‘›

1 โˆ’ ๐‘” ๐’˜๐‘‡๐’™(๐‘›)1โˆ’๐‘ฆ(๐‘›)

โ„“ ๐’˜ = ฯƒ๐‘› ๐‘ฆ ๐‘› log ๐‘” ๐’˜๐‘‡๐’™(๐‘›) + 1 โˆ’ ๐‘ฆ(๐‘›) log 1 โˆ’ ๐‘” ๐’˜๐‘‡๐’™(๐‘›)

Page 23: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Logistic Regression

โ„“ ๐’˜ = ฯƒ๐‘› ๐‘ฆ ๐‘› log ๐œ‡(๐‘›) + 1 โˆ’ ๐‘ฆ(๐‘›) log 1 โˆ’ ๐œ‡(๐‘›)

๐œ•โ„“

๐œ•๐’˜=

๐‘ง = ๐‘“ ๐’˜, ๐’™ = ๐’˜๐‘‡๐’™

โˆ‡๐’˜๐‘“(๐’˜, ๐’™) = ๐’™

๐œ‡ = ๐‘” ๐‘ง =1

1+๐‘’โˆ’๐‘ง

๐‘‘๐‘”

๐‘‘๐‘ง= ๐‘” ๐‘ง 1 โˆ’ ๐‘” ๐‘ง = ๐œ‡(1 โˆ’ ๐œ‡)

Page 24: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Logistic Regression

โ„“ ๐’˜ = ฯƒ๐‘› ๐‘ฆ ๐‘› log ๐œ‡(๐‘›) + 1 โˆ’ ๐‘ฆ(๐‘›) log 1 โˆ’ ๐œ‡(๐‘›)

๐œ•โ„“

๐œ•๐’˜= ฯƒ๐‘›

๐‘ฆ ๐‘›

๐œ‡(๐‘›)โˆ’

1โˆ’๐‘ฆ ๐‘›

1โˆ’๐œ‡(๐‘›)๐œ•๐‘”

๐œ•๐‘“

๐œ•๐‘“

๐œ•๐’˜

= ฯƒ๐‘›๐‘ฆ ๐‘› โˆ’๐œ‡ ๐‘›

๐œ‡ ๐‘› 1โˆ’๐œ‡ ๐‘› ๐œ‡ ๐‘› 1 โˆ’ ๐œ‡ ๐‘› ๐’™ ๐‘› ๐‘‡

= ฯƒ๐‘› ๐‘ฆ ๐‘› โˆ’ ๐œ‡ ๐‘› ๐’™ ๐‘› ๐‘‡

๐‘ง = ๐‘“ ๐’˜, ๐’™ = ๐’˜๐‘‡๐’™

โˆ‡๐’˜๐‘“(๐’˜, ๐’™) = ๐’™

๐œ‡ = ๐‘” ๐‘ง =1

1+๐‘’โˆ’๐‘ง

๐‘‘๐‘”

๐‘‘๐‘ง= ๐‘” ๐‘ง 1 โˆ’ ๐‘” ๐‘ง = ๐œ‡(1 โˆ’ ๐œ‡)

Page 25: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

M(C)LE for Logistic Regression

โ„“ ๐’˜ = ฯƒ๐‘› ๐‘ฆ ๐‘› log ๐œ‡(๐‘›) + 1 โˆ’ ๐‘ฆ(๐‘›) log 1 โˆ’ ๐œ‡(๐‘›)

โˆ‡๐’˜โ„“(๐’˜) = ฯƒ๐‘› ๐‘ฆ ๐‘› โˆ’ ๐œ‡ ๐‘› ๐’™ ๐‘›

โˆ‡๐’˜โ„“(๐’˜) = 0?

No closed form solution

Back to iterative methods. Solve with (stochastic) gradient descent, Newtonโ€™s method, or Iteratively Reweighted Least Squares (IRLS)

๐‘ง = ๐‘“ ๐’˜, ๐’™ = ๐’˜๐‘‡๐’™ ๐œ‡ = ๐‘” ๐‘ง =1

1+๐‘’โˆ’๐‘ง

Page 26: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Logistic FunctionCool note: Logistic function is related the invers of logit function!

Odds: Ratio of two probabilities. For ๐‘Œ โˆผ ๐ต๐‘’๐‘Ÿ๐‘›(๐‘), ๐‘(๐‘Œ=1)

๐‘(๐‘Œ=0)=

๐‘

1โˆ’๐‘

Logit function: Log odds. log๐‘(๐‘Œ=1)

๐‘(๐‘Œ=0)= log

๐‘

1โˆ’๐‘

๐‘ง = ๐‘™๐‘œ๐‘”๐‘–๐‘ก ๐‘ = log๐‘

1โˆ’๐‘

๐‘ = ๐‘™๐‘œ๐‘”๐‘–๐‘กโˆ’1 ๐‘ง =1

1+๐‘’โˆ’๐‘ง

Page 27: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Log Odds and Logistic RegressionFormulate log odds as linear model of X:

log๐‘(๐‘Œ = 1 โˆฃ ๐‘‹ = ๐’™,๐’˜)

๐‘(๐‘Œ = 0 โˆฃ ๐‘‹ = ๐’™,๐’˜)= ๐’˜๐‘‡๐’™

Equivalent to logistic representation:

๐‘ ๐‘Œ = 1 ๐‘‹ = ๐’™,๐’˜ =1

1 + ๐‘’โˆ’๐’˜๐‘‡๐’™

Page 28: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Log Odds and Logistic Regression (Multi-class!)Formulate log odds as linear model of X:

log๐‘(๐‘Œ = 1 โˆฃ ๐‘‹ = ๐’™,๐‘พ)

๐‘(๐‘Œ = ๐พ โˆฃ ๐‘‹ = ๐’™,๐‘พ)= ๐’˜1

๐‘‡๐’™

log๐‘(๐‘Œ = 2 โˆฃ ๐‘‹ = ๐’™,๐‘พ)

๐‘(๐‘Œ = ๐พ โˆฃ ๐‘‹ = ๐’™,๐‘พ)= ๐’˜2

๐‘‡๐’™

โ‹ฎ

log๐‘(๐‘Œ = ๐พ โˆ’ 1 โˆฃ ๐‘‹ = ๐’™,๐‘พ)

๐‘(๐‘Œ = ๐พ โˆฃ ๐‘‹ = ๐’™,๐‘พ)= ๐’˜๐พโˆ’1

๐‘‡ ๐’™

Equivalent to softmax representation:

๐‘ ๐‘Œ = ๐‘˜ ๐‘‹ = ๐’™,๐‘Š =๐‘’๐’˜๐‘˜

๐‘‡๐’™

1+ฯƒ๐‘—=1๐พโˆ’1 ๐‘’

๐’˜๐‘—๐‘‡๐’™

๐‘ ๐‘Œ = ๐พ ๐‘‹ = ๐’™,๐‘Š =1

1+ฯƒ๐‘—=1๐พโˆ’1 ๐‘’

๐’˜๐‘—๐‘‡๐’™

๐‘ ๐‘Œ = ๐‘˜ ๐‘‹ = ๐’™,๐‘Š =๐‘’๐’˜๐‘˜

๐‘‡๐’™

ฯƒ๐‘—=1๐พ ๐‘’

๐’˜๐‘—๐‘‡๐’™OR

Page 29: Warm-up as You Walk In10315/lectures/10315_Sp20...Warm-up as You Walk In Bernouli distribution: ... Today Wrap up MLE for linear regression Classification models MLE for logistic regression

Multi-class Logistic Regression๐‘(๐‘Œ โˆฃ ๐‘‹, ๐œƒ)

๐‘ ๐‘Œ ๐‘‹,๐‘พ = ฯ‚๐‘›=1๐‘ ๐‘(๐‘ฆ(๐‘›) โˆฃ ๐’™(๐‘›),๐‘พ)

๐‘ ๐‘ฆ(๐‘›) = ๐‘˜ ๐‘‹ = ๐’™(๐‘›),๐‘Š =๐‘’๐’˜๐‘˜

๐‘‡๐’™(๐‘›)

ฯƒ๐‘—=1๐พ ๐‘’

๐’˜๐‘—๐‘‡๐’™(๐‘›)

What is the conditional likelihood?

๐ฟ ๐’˜ = ฯ‚๐‘›๐‘’๐’˜๐‘˜

๐‘‡๐’™(๐‘›)

ฯƒ๐‘—=1๐พ ๐‘’

๐’˜๐‘—๐‘‡๐’™(๐‘›)

What is the hypothesis function?

เทœ๐‘ฆ = โ„Ž๐‘พ ๐’™ =