Maximum Likelihood Estimation


Probabilistic Graphical Models: Learning

Parameter Estimation: Maximum Likelihood Estimation

Daphne Koller


Biased Coin Example

• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)

P is a Bernoulli distribution: P(X = 1) = θ, P(X = 0) = 1 - θ

The tosses x[1], ..., x[M] are sampled IID from P.
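As a concrete illustration (not part of the original slides), the following sketch samples M IID tosses from a Bernoulli distribution; the value theta = 0.7, the sample size, and the use of NumPy are arbitrary choices for the example.

import numpy as np

# Illustrative only: sample M IID tosses from a biased coin.
# theta = 0.7 is an arbitrary value, not one given in the slides.
theta = 0.7
M = 100
rng = np.random.default_rng(seed=0)

# Each toss X[m] is 1 (heads) with probability theta, 0 (tails) otherwise.
tosses = rng.binomial(n=1, p=theta, size=M)
print("observed heads:", tosses.sum(), "out of", M)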


IID as a PGM

[Figure: plate model for IID sampling. The parameter θ is shared across all tosses; X[m] sits inside a plate indexed by the data instances m, equivalently unrolled as X[1], ..., X[M].]

P(x[m] | θ) = θ if x[m] = 1, and 1 - θ if x[m] = 0
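A minimal sketch of this per-toss CPD and the factorization it induces over the plate (function names are my own; 1 encodes heads, 0 tails):

def p_x_given_theta(x, theta):
    # P(x[m] | theta): theta for heads (x = 1), 1 - theta for tails (x = 0).
    return theta if x == 1 else 1.0 - theta

def p_data_given_theta(data, theta):
    # Because the tosses are IID given theta, the joint factorizes into a product.
    p = 1.0
    for x in data:
        p *= p_x_given_theta(x, theta)
    return p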


Maximum Likelihood Estimation

• Goal: find θ ∈ [0, 1] that predicts D well
• Prediction quality = likelihood of D given θ:

L(θ : D) = P(D | θ) = ∏_{m=1}^{M} P(x[m] | θ)

[Figure: the likelihood L(θ : <H, T, T, H, H>) plotted as a function of θ over [0, 1]; the curve peaks at θ = 3/5.]
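To make the curve concrete, here is a small sketch (NumPy assumed; not part of the original slides) that evaluates L(θ : <H, T, T, H, H>) on a grid of θ values and reports where the maximum falls:

import numpy as np

# Dataset from the slide: H, T, T, H, H encoded as 1 = heads, 0 = tails.
D = np.array([1, 0, 0, 1, 1])

def likelihood(theta, data):
    # L(theta : D) = prod_m P(x[m] | theta).
    return np.prod(np.where(data == 1, theta, 1.0 - theta))

thetas = np.linspace(0.0, 1.0, 101)
L = np.array([likelihood(t, D) for t in thetas])
print("grid maximizer:", thetas[np.argmax(L)])   # ~0.6 = M_H / (M_H + M_T)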


Maximum Likelihood Estimator

• Observations: M_H heads and M_T tails
• Find θ maximizing the likelihood

L(θ : M_H, M_T) = θ^{M_H} (1 - θ)^{M_T}

• Equivalent to maximizing the log-likelihood

l(θ : M_H, M_T) = M_H log θ + M_T log(1 - θ)

• Differentiating the log-likelihood and solving for θ:

θ̂ = M_H / (M_H + M_T)
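The differentiation step that the slide leaves implicit is the standard one; written out in LaTeX:

\begin{align}
\frac{\partial l}{\partial \theta}
  &= \frac{M_H}{\theta} - \frac{M_T}{1 - \theta} = 0 \\
M_H (1 - \theta) &= M_T \theta \\
\hat{\theta} &= \frac{M_H}{M_H + M_T}
\end{align}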


Sufficient Statistics

• For computing θ̂ in the coin toss example, we only needed M_H and M_T, since

L(θ : D) = θ^{M_H} (1 - θ)^{M_T}

• M_H and M_T are sufficient statistics


Sufficient Statistics

• A function s(D) is a sufficient statistic from instances to a vector in ℝ^k if, for any two datasets D and D' and any θ, we have

Σ_{x[i] ∈ D} s(x[i]) = Σ_{x[i] ∈ D'} s(x[i])  ⟹  L(θ : D) = L(θ : D')

[Figure: datasets map to statistics; many different datasets share the same sufficient statistics.]
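As an illustrative check (my example, not from the slides), two toss sequences with the same counts (M_H, M_T) produce identical likelihood values at every θ:

import numpy as np

def likelihood(theta, data):
    # L(theta : D) for coin tosses encoded as 1 = heads, 0 = tails.
    return np.prod(np.where(np.asarray(data) == 1, theta, 1.0 - theta))

D1 = [1, 0, 0, 1, 1]   # H, T, T, H, H  -> sufficient statistics (3, 2)
D2 = [1, 1, 1, 0, 0]   # H, H, H, T, T  -> same sufficient statistics (3, 2)

thetas = np.linspace(0.01, 0.99, 50)
assert np.allclose([likelihood(t, D1) for t in thetas],
                   [likelihood(t, D2) for t in thetas])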


Sufficient Statistic for Multinomial

• For a dataset D over a variable X with k values, the sufficient statistics are the counts <M_1, ..., M_k>, where M_i is the number of times that X[m] = x_i in D:

L(θ : D) = ∏_{i=1}^{k} θ_i^{M_i}

• The sufficient statistic s(x) is a tuple of dimension k:
  s(x_i) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i
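A minimal sketch (NumPy assumed; the dataset is invented for illustration) showing that summing the one-hot statistics s(x[m]) over a dataset yields exactly the counts <M_1, ..., M_k>:

import numpy as np

k = 3                                  # number of values X can take
data = np.array([0, 2, 1, 0, 0, 2])    # X[m] in {0, ..., k-1}; arbitrary example

def s(x):
    # One-hot sufficient statistic: a 1 in position x, zeros elsewhere.
    v = np.zeros(k)
    v[x] = 1.0
    return v

counts = sum(s(x) for x in data)       # equals <M_1, ..., M_k>
print(counts)                          # [3. 1. 2.]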


Sufficient Statistic for Gaussian

• Gaussian distribution: P(X) ~ N(μ; σ²) if

p(x) = (1 / (√(2π) σ)) exp( -(x - μ)² / (2σ²) )

• Rewrite as

p(x) = (1 / (√(2π) σ)) exp( -x² / (2σ²) + μx / σ² - μ² / (2σ²) )

• Sufficient statistics for the Gaussian: s(x) = <1, x, x²>
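To see the sufficiency concretely, here is a sketch (NumPy assumed; data generated arbitrarily) in which the Gaussian log-likelihood of a dataset is computed from the three sums Σ 1, Σ x[m], Σ x[m]² alone and checked against the direct per-point computation:

import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)   # arbitrary example data
mu, sigma = 2.0, 1.5

# Sufficient statistics: sums of s(x) = <1, x, x^2> over the dataset.
M, Sx, Sxx = len(data), data.sum(), (data ** 2).sum()

# Log-likelihood written purely in terms of (M, Sx, Sxx), using the rewritten exponent.
ll_from_stats = (-M * np.log(np.sqrt(2 * np.pi) * sigma)
                 - Sxx / (2 * sigma ** 2)
                 + mu * Sx / sigma ** 2
                 - M * mu ** 2 / (2 * sigma ** 2))

# Reference: sum of per-point log densities.
ll_direct = np.sum(-np.log(np.sqrt(2 * np.pi) * sigma)
                   - (data - mu) ** 2 / (2 * sigma ** 2))
assert np.isclose(ll_from_stats, ll_direct)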


Maximum Likelihood Estimation

• MLE Principle: Choose θ so as to maximize L(θ : D)

• Multinomial MLE:

θ̂_i = M_i / Σ_i M_i

• Gaussian MLE:

μ̂ = (1/M) Σ_m x[m]

σ̂ = √( (1/M) Σ_m (x[m] - μ̂)² )
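A short sketch (NumPy assumed; the datasets are invented for illustration) that computes both closed-form MLEs directly from the formulas above:

import numpy as np

# Multinomial MLE: normalized counts.
x_multi = np.array([0, 2, 1, 0, 0, 2])          # values in {0, 1, 2}
counts = np.bincount(x_multi, minlength=3)      # <M_1, ..., M_k>
theta_hat = counts / counts.sum()               # theta_i = M_i / sum_i M_i
print("multinomial MLE:", theta_hat)

# Gaussian MLE: sample mean and (biased) sample standard deviation.
x_gauss = np.random.default_rng(seed=0).normal(loc=2.0, scale=1.5, size=1000)
mu_hat = x_gauss.mean()                         # (1/M) * sum_m x[m]
sigma_hat = np.sqrt(np.mean((x_gauss - mu_hat) ** 2))
print("Gaussian MLE:", mu_hat, sigma_hat)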


Summary

• Maximum likelihood estimation is a simple principle for parameter selection given D

• Likelihood function uniquely determined by sufficient statistics that summarize D

• MLE has a closed-form solution for many parametric distributions

