Announcements10601/lectures/10601_Fa20...Announcements Assignments HW6 Due Mon, 11/2, 11:59 pm...

AnnouncementsAssignments

▪ HW6

▪ Due Mon, 11/2, 11:59 pm

Midterm 2

▪ Mon, 11/9, during lecture

▪ See Piazza for details

▪ Forms for conflicts / tech issues due Fri, 10/30

Fireside Chat about the CMU ML PhD Program

▪ Fri, 10/30, 8:00 pm

▪ See Piazza for details, including form to show interest

PlanLast Time

▪ PAC Criteria and Learning Theorems

▪ Bias-Variance trade-off as we change |ℋ| or 𝑁

▪ Started VC dimension for infinite |ℋ|

Today

▪ VC dimensions

▪ Learning theory and regularization

▪ MLE

▪ MLE for linear regression

▪ MAP (Maximum a posteriori) estimation

▪ MAP for linear regression

Wrap up Learning TheoryLearning theory slides

Introduction to Machine Learning

MLE & MAP

Instructor: Pat Virtue

Reminder MLETrick coin

𝜙𝑀𝐿𝐸 = argmax𝜙

ෑ

𝑖

𝑁

𝑝 𝑦 𝑖 𝜙

Previous Piazza PollWe model the outcome of a single mysterious weighted-coin flip as a Bernoulli random variable:

𝑌 ∼ 𝐵𝑒𝑟𝑛 𝜙

𝑝 𝑦 ∣ 𝜙 = ቊ𝜙, 𝑦 = 1 (𝐻𝑒𝑎𝑑𝑠)

1 − 𝜙, 𝑦 = 0 𝑇𝑎𝑖𝑙𝑠

Given the ordered sequence of coin flip outcomes:1, 0, 1, 1

What is the estimate of parameter 𝜙?

A. 0.0 B. 1/8 C. 1/4 D. 1/2 E. 3/4 F. 3/8 G. 1.0

Why? p(𝒟 ∣ 𝜙) = 𝜙3 1 − 𝜙 1

𝜙𝑀𝐿𝐸 = argmax𝜙

ς𝑖𝑁 𝑝 𝑦 𝑖 𝜙

MLE as Data IncreasesGiven the ordered sequence of coin flip outcomes:

1, 0, 1, 1

p(𝒟 ∣ 𝜙) =ෑ

𝑖

𝑁

𝑝 𝑦 𝑖 𝜙 = 𝜙𝑁𝑦=1 1 − 𝜙 𝑁𝑦=0

What happens as we flip more coins?

MLE for GaussianGaussian distribution:

𝑌 ∼ 𝒩 𝜇, 𝜎2

𝑝 𝑦 ∣ 𝜇, 𝜎2 =1

2𝜋𝜎2𝑒−

𝑦−𝜇 2

2𝜎2

What is the log likelihood for three i.i.d. samples, given parameters 𝜇, 𝜎2?

𝒟 = {𝑦 1 = 65, 𝑦 2 = 95, 𝑦 3 = 85}

𝐿 𝜇, 𝜎2 =

ℓ 𝜇, 𝜎2 =

ෑ

𝑖=1

𝑁1

2𝜋𝜎2𝑒−

𝑦(𝑖)−𝜇2

2𝜎2

𝑖=1

𝑁

−log 2𝜋𝜎2 −𝑦(𝑖) − 𝜇

2

2𝜎2

𝜃𝑀𝐿𝐸 = argmax𝜽

ෑ

𝑖

𝑁

𝑝 𝑦 𝑖 𝜽

𝜃𝑀𝐿𝐸 = argmax𝜽

𝑖

𝑁

log 𝑝 𝑦 𝑖 𝜽

Recipe for EstimationMLE

1. Formulate the likelihood, 𝑝(𝒟 ∣ 𝜃)

2. Set objective 𝐽(𝜃) equal to negative log of likelihood

J 𝜃 = − log 𝑝 𝒟 𝜃

3. Compute derivative of objective, 𝜕𝐽/𝜕𝜃

4. Find መ𝜃, either

a. Set derivate equal to zero and solve for 𝜃

b. Use (stochastic) gradient descent to step towards better 𝜃

M(C)LE for Logistic RegressionLearn to predict if a patient has cancer (𝑌 = 1) or not (𝑌 = 0) given the input of just one test result, 𝑋𝐴.

መ𝜃𝑀𝐿𝐸 = argmax𝜽

ෑ

𝑖

𝑁1

1 + 𝑒−𝜽𝑇𝒙

𝑖

𝕀 𝑦 𝑖 =1

1 −1

1 + 𝑒−𝜽𝑇𝒙

𝑖

𝕀 𝑦(𝑖)=0

M(C)LE for Logistic RegressionLearn to predict if a patient has cancer (𝑌 = 1) or not (𝑌 = 0) given the input of just one test result, 𝑋𝐴.


ෑ

𝑖

𝑁

𝑝 𝑦 𝑖 𝒙 𝑖 , 𝜽

M(C)LE for Linear RegressionProbabilistic interpretation of linear regression


ෑ

𝑖

𝑁


FROM MLE TO MAP

13

Product RuleConstruct the joint by multiplying the conditional by the appropriate marginal

𝑃 𝐴, 𝐵 = 𝑃 𝐵 𝐴 𝑃(𝐴)

𝑃 𝐴, 𝐵 = 𝑃 𝐴 𝐵 𝑃(𝐵)

Also works when something is given everywhere

𝑃 𝐴, 𝐵 ∣ 𝐶 = 𝑃 𝐴 𝐵, 𝐶 𝑃 𝐵 𝐶

𝑃 𝐴, 𝐵 ∣ 𝐶, 𝐷, 𝐸 = 𝑃 𝐴 𝐵, 𝐶, 𝐷, 𝐸 𝑃 𝐵 𝐶, 𝐷, 𝐸

Coin Flipping ExampleTrick coin: Suppose I know how many coins are in each container in the store. How can I use this information both before and after flipping coins?

Likelihood, Prior, and PosteriorLikelihood: 𝑝(𝒟 ∣ 𝜃)

Prior: 𝑝 𝜃

Posterior: 𝑝(𝜃 ∣ 𝒟)

Relating these with Bayes rule

Joint: 𝑝(𝒟, 𝜃)

MLE and MAPLikelihood: 𝑝(𝒟 ∣ 𝜃)

Prior: 𝑝 𝜃

Posterior: 𝑝(𝜃 ∣ 𝒟)

MLE: መ𝜃𝑀𝐿𝐸 = argmax𝜃

𝑝(𝒟 ∣ 𝜃)

MAP: መ𝜃𝑀𝐴𝑃 = argmax𝜃

𝑝 𝒟 𝜃 𝑝 𝜃

Maximum a posteriori estimation

Joint: 𝑝(𝒟, 𝜃)

𝑝 𝜃 𝒟 ∝ 𝑝 𝒟 𝜃 𝑝(𝜃)

Coin Flipping ExampleTrick coin: Suppose I know how many coins are in each container in the store. How can I use this information both before and after flipping coins?

መ𝜃𝑀𝐴𝑃 = argmax𝜃

ς𝑖=1𝑁 𝑝 𝑦(𝑖) 𝜃 𝑝 𝜃

Piazza Poll 1:

𝑝(𝜃 ∣ 𝒟) ∝ 𝑝(𝒟 𝜃 𝑝(𝜃)

𝑝(𝜃 ∣ 𝒟) ∝ ς 𝑝(𝒟 𝑛 𝜃 𝑝 𝜃

As the number of data points increases, which of the following are true?

Select ALL that applyA. The MAP estimate approaches the MLE estimate

B. The posterior distribution approaches the prior distribution

C. The likelihood distribution approaches the prior distribution

D. The posterior distribution approaches the likelihood distribution

E. The likelihood has a lower impact on the posterior

F. The prior has a lower impact on the posterior

posterior ∝ likelihood ⋅ prior

MAP as Data IncreasesGiven the ordered sequence of coin flip outcomes:

𝒟 = 1, 0, 1, 1

p 𝒟 𝜙 𝑝(𝜙) =ෑ

𝑖

𝑁

𝑝 𝑦 𝑖 𝜙 𝑝 𝜙 = 𝜙𝑁𝑦=1 1 − 𝜙 𝑁𝑦=0 𝑝(𝜙)

What happens as we flip more coins?

Recipe for EstimationMLE

1. Formulate the likelihood, 𝑝(𝒟 ∣ 𝜃)

2. Set objective 𝐽(𝜃) equal to negative log of likelihood

J 𝜃 = − log 𝑝 𝒟 𝜃





Recipe for EstimationMAP

1. Formulate the likelihood times the prior, 𝑝 𝒟 𝜃 𝑝(𝜃)

2. Set objective 𝐽(𝜃) equal to negative log of likelihood times the prior

J 𝜃 = − log 𝑝 𝒟 𝜃 𝑝(𝜃)





M(C)LE for Linear RegressionProbabilistic interpretation of linear regression


ෑ

𝑖

𝑁


MAP for Linear RegressionWhat assumptions are we making about our parameters?

Announcements10601/lectures/10601_Fa20...Announcements Assignments HW6 Due Mon, 11/2, 11:59 pm...

Documents

Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

BRERKS THE INTERNET SAdfð1S AINO SVH MON (6 ......BRERKS THE INTERNET SAdfð1S AINO SVH MON (6) MON dOd11101 (8) SVH MON ANOD WVÐðD d01 (Z) MON V 9NßSlW ION WVAðD (9) „OðAH

Announcements - Carnegie Mellon School of Computer Science10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW4 Wed, 10/14, 11:59 pm Survey Thanks! We’ll

L’échange international « mon corps, mon éducation, mes droits » - rapport final

S - Noida Authority Online · Web view01 Nos. B/M with Bullock in all park Mon to Sun Mon to Sun Mon to Sun Mon Tue to Wed Thu Fri to Sat Mon. to Wed. Thu to Sat. Mon. to Wed. Thu

Automated Mon

Mon Quotidien

Mon Tue Wed Thu Fri Mon Tue Wed Thu Fri

Home - Fennell Bay Public School · Mon 28th Oct Tues 29th Oct Wed 30-1st Nov Week 4 Mon 4-8th Nov Mon 4-7th Nov Fri 8th Nov Week 5 Mon 11-15th Nov Mon 11-15th Nov Week 6 Mon 18-22nd

Lecture: Astroparticle Physics and Detectors 2008 Mon Feb 18Ereditato Mon Feb 25Ereditato Mon March 3Ereditato Mon March 10Ereditato Mon March 17res Mon

T Mon: Velocity Model Prior Probability T Mon: Velocity ...quasar.as.utexas.edu/papers/EichhornPlots.pdf · T Mon: Velocity Model Prior Probability ... 1234567 T Mon: Velocity Model

City of Melville Homepage - City of Melville · ..ION '"MON TUES MON MON MON THURS THURS s" MON MON MON MON #6 #8 from Monday, 8 July from Monday, 15 July from Monday, 22 July from

T/Mon MINI T/Mon SLIM T/Mon LNX - · PDF fileT/Mon Overview A T/Mon ... Protocol mediator to multiple destinations and protocols ... We build remotes for small, medium, and large sites,

MON MOULIN mon moulin · MON MOULIN mon moulin Iris Hanke 3, rue de l`Ecole F-39350 Ougney fon 0033-384709997 fax 0033-384709998 email bonjour@mon-moulin.com

1 mon 0841 mon holtgrave

Fondapa Mon

Mon enfance

MON ARQUITECTOS

ABB Robotics Facsimile Training Guide 2007 · 1 8 15 22 29 Mon Feb 5 12 1926 Mar Wed 14 Mon Mar 26 Mon Apr 2 Mon May 7 Tue May Mon May 21 Apr 30 May 28 Jun 4 Mon Jun 11 Mon Jun 18

Announcements10601/lectures/10601_Fa20... · 2020. 12. 12. · Announcements Assignments HW8: due Thu, 12/3, 11:59 pm HW9 Out Friday Due Wed, 12/9, 11:59 pm The two slip days are