Daphne Koller Parameter Estimation Max Likelihood for BNs Probabilistic Graphical Models Learning

Max Likelihood for BNs

Page 1: Max Likelihood for BNs

Daphne Koller

Parameter Estimation

Max Likelihood for BNs

Probabilistic Graphical Models

Learning

Page 2: Max Likelihood for BNs

MLE for Bayesian Networks

• Parameters: $\theta_X$, $\theta_{Y|X}$

• Data instances: <x[m], y[m]>

[Figure: two-node network X → Y]

P(X):

x0     x1
0.7    0.3

P(Y | X):

X      y0      y1
x0     0.95    0.05
x1     0.2     0.8
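The two tables on this slide fully specify the network X → Y, so they can be written down directly as data. The sketch below is only an illustration: the dictionary layout, the function names `joint` and `sample_instance`, and the random seed are assumptions made here, not part of the slides. It computes the joint probability P(x, y) = P(x) P(y | x) and draws a few data instances <x[m], y[m]>.

```python
import random

# CPD values taken from the slide: P(X) and P(Y | X).
p_x = {"x0": 0.7, "x1": 0.3}
p_y_given_x = {
    "x0": {"y0": 0.95, "y1": 0.05},
    "x1": {"y0": 0.2, "y1": 0.8},
}


def joint(x, y):
    """Return P(x, y) = P(x) * P(y | x) for the two-node network X -> Y."""
    return p_x[x] * p_y_given_x[x][y]


def sample_instance(rng):
    """Draw one instance <x[m], y[m]>: sample X first, then Y given X."""
    x = rng.choices(list(p_x), weights=list(p_x.values()))[0]
    y_dist = p_y_given_x[x]
    y = rng.choices(list(y_dist), weights=list(y_dist.values()))[0]
    return x, y


rng = random.Random(0)  # arbitrary seed, used only for repeatability
data = [sample_instance(rng) for _ in range(5)]
print(data)
print(joint("x0", "y0"))
```

In the learning setting the direction is reversed: the tables are unknown and data of this form is what is used to estimate them.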

Page 3: Max Likelihood for BNs

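MLE for Bayesian Networks

• Parameters: $\theta_X$, $\theta_{Y|X}$

$$
\begin{aligned}
L(\Theta : D) &= \prod_{m=1}^{M} P(x[m], y[m] : \Theta) \\
&= \prod_{m=1}^{M} P(x[m] : \Theta)\, P(y[m] \mid x[m] : \Theta) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \Theta) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \Theta) \right) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta_X) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta_{Y|X}) \right)
\end{aligned}
$$

[Figure: plate model with parameter nodes $\theta_X$ → X and $\theta_{Y|X}$ → Y, edge X → Y, and X, Y inside a plate over the data instances]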
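A quick numerical check of this decomposition, working in log space: the log-likelihood of the joint should equal the log-likelihood of the X term plus that of the Y | X term. The data set below is hypothetical, and the parameter values are simply the ones from the earlier example tables; the function names are illustrative.

```python
import math

# Parameter values reused from the example tables (theta_X and theta_{Y|X}).
theta_x = {"x0": 0.7, "x1": 0.3}
theta_y_given_x = {
    "x0": {"y0": 0.95, "y1": 0.05},
    "x1": {"y0": 0.2, "y1": 0.8},
}

# Hypothetical data set D of M = 5 instances <x[m], y[m]>.
data = [("x0", "y0"), ("x0", "y0"), ("x1", "y1"), ("x0", "y1"), ("x1", "y0")]


def log_lik_joint(data):
    """log L(Theta : D) computed from the joint P(x[m], y[m])."""
    return sum(math.log(theta_x[x] * theta_y_given_x[x][y]) for x, y in data)


def log_lik_x(data):
    """Local log-likelihood for X; depends only on theta_X."""
    return sum(math.log(theta_x[x]) for x, _ in data)


def log_lik_y_given_x(data):
    """Local log-likelihood for Y | X; depends only on theta_{Y|X}."""
    return sum(math.log(theta_y_given_x[x][y]) for x, y in data)


total = log_lik_joint(data)
local_sum = log_lik_x(data) + log_lik_y_given_x(data)
print(total, local_sum)
```

The two printed values agree up to floating-point rounding, which is the product decomposition restated as a sum of logs.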

Page 4: Max Likelihood for BNs

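MLE for Bayesian Networks

• Likelihood for Bayesian network

$$
\begin{aligned}
L(\Theta : D) &= \prod_{m} P(\mathbf{x}[m] : \Theta) \\
&= \prod_{m} \prod_{i} P(x_i[m] \mid \mathbf{u}_i[m] : \Theta) \\
&= \prod_{i} \prod_{m} P(x_i[m] \mid \mathbf{u}_i[m] : \theta_i) \\
&= \prod_{i} L_i(D : \theta_i)
\end{aligned}
$$

• If the parameters $\theta_i = \theta_{X_i \mid \mathbf{U}_i}$ are disjoint, then the MLE can be computed by maximizing each local likelihood separately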
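Taking logs turns the product of local likelihoods into a sum, which is one way to see why each term can be maximized on its own. A short worked form, writing $\ell$ for the log-likelihood (this symbol is notation assumed here, not taken from the slide):

$$
\ell(\Theta : D) = \log L(\Theta : D) = \sum_{i} \sum_{m} \log P(x_i[m] \mid \mathbf{u}_i[m] : \theta_i) = \sum_{i} \ell_i(D : \theta_i)
$$

When the $\theta_i$ share no components, the $i$-th term depends on $\theta_i$ alone, so $\hat{\theta}_i = \arg\max_{\theta_i} \ell_i(D : \theta_i)$ for each $i$ gives a maximizer of the whole sum.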

Page 5: Max Likelihood for BNs

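MLE for Table CPDs

$$
\begin{aligned}
\prod_{m=1}^{M} P(x[m] \mid \mathbf{u}[m] : \theta) &= \prod_{m=1}^{M} P(x[m] \mid \mathbf{u}[m] : \theta_{X \mid \mathbf{U}}) \\
&= \prod_{x, \mathbf{u}} \; \prod_{m \,:\, x[m] = x,\; \mathbf{u}[m] = \mathbf{u}} P(x[m] \mid \mathbf{u}[m] : \theta_{X \mid \mathbf{U}}) \\
&= \prod_{x, \mathbf{u}} \; \prod_{m \,:\, x[m] = x,\; \mathbf{u}[m] = \mathbf{u}} \theta_{x \mid \mathbf{u}} \\
&= \prod_{x, \mathbf{u}} \theta_{x \mid \mathbf{u}}^{M[x, \mathbf{u}]}
\end{aligned}
$$

$$
\hat{\theta}_{x \mid \mathbf{u}} = \frac{M[x, \mathbf{u}]}{\sum_{x'} M[x', \mathbf{u}]} = \frac{M[x, \mathbf{u}]}{M[\mathbf{u}]}
$$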
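The estimate for a table CPD is a ratio of counts: the number of instances with child value x and parent assignment u, divided by the number of instances with parent assignment u. A minimal sketch of that counting, assuming the data arrives as a list of (x, u) pairs; the function name `mle_table_cpd` and the sample data are hypothetical.

```python
from collections import Counter


def mle_table_cpd(data):
    """Estimate theta_{x|u} = M[x, u] / M[u] from a list of (x, u) pairs.

    Only (x, u) combinations that occur in the data appear in the result;
    combinations never observed implicitly get estimate 0.
    """
    joint_counts = Counter(data)                 # M[x, u]
    parent_counts = Counter(u for _, u in data)  # M[u]
    return {
        (x, u): count / parent_counts[u]
        for (x, u), count in joint_counts.items()
    }


# Hypothetical data: child Y with a single parent X, as (y, x) pairs.
data = [
    ("y0", "x0"), ("y0", "x0"), ("y1", "x0"), ("y0", "x0"),
    ("y1", "x1"), ("y0", "x1"), ("y1", "x1"), ("y1", "x1"),
]
theta_hat = mle_table_cpd(data)
print(theta_hat)
```

For a parent assignment u that never occurs in the data, M[u] = 0 and the ratio is undefined; this sketch simply leaves such entries out.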

Page 6: Max Likelihood for BNs

MLE for Linear Gaussians

Page 7: Max Likelihood for BNs

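Shared Parameters

[Figure: Markov chain S(0) → S(1) → S(2) → S(3); every transition uses the same CPD, with shared parameters $\theta_{S'|S}$]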
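With a single transition CPD shared across all time steps, every consecutive pair (S(t), S(t+1)) contributes to the same counts. The sketch below illustrates this for discrete states; the function name `mle_transition_cpd`, the state labels, and the trajectories are hypothetical.

```python
from collections import Counter


def mle_transition_cpd(trajectories):
    """Estimate the shared P(S' | S) by pooling counts over all time steps."""
    pair_counts = Counter()    # M[s, s'] summed over every transition
    source_counts = Counter()  # M[s] summed over every transition
    for states in trajectories:
        for s, s_next in zip(states, states[1:]):
            pair_counts[(s, s_next)] += 1
            source_counts[s] += 1
    return {
        (s, s_next): count / source_counts[s]
        for (s, s_next), count in pair_counts.items()
    }


# Hypothetical data: two trajectories S(0), ..., S(3) over states "a" and "b".
trajectories = [
    ["a", "a", "b", "a"],
    ["b", "b", "a", "a"],
]
transition_hat = mle_transition_cpd(trajectories)
print(transition_hat)
```

Each trajectory of length 4 supplies three transitions, so the six transitions here all feed one table rather than three separate per-time-step tables.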

Page 8: Max Likelihood for BNs

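Shared Parameters

[Figure: state chain S(0) → S(1) → S(2) → S(3) with an observation O(t) attached to each S(t), t = 1, 2, 3; transitions share parameters $\theta_{S'|S}$ and observations share parameters $\theta_{O'|S'}$]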
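The observation CPD is shared in the same way: every (S(t), O(t)) pair adds to one set of counts. The sketch below assumes the states are fully observed in the training data, which is an assumption made here for illustration; the function name `mle_observation_cpd` and the sequences are hypothetical.

```python
from collections import Counter


def mle_observation_cpd(state_seqs, obs_seqs):
    """Estimate the shared P(O' | S') from fully observed sequences.

    Each state sequence is S(0), ..., S(T); each observation sequence is
    O(1), ..., O(T), so O(t) is paired with S(t) for t >= 1.
    """
    pair_counts = Counter()   # M[s', o'] pooled over all time steps
    state_counts = Counter()  # M[s'] pooled over all time steps
    for states, observations in zip(state_seqs, obs_seqs):
        for s, o in zip(states[1:], observations):  # S(0) has no observation
            pair_counts[(s, o)] += 1
            state_counts[s] += 1
    return {
        (s, o): count / state_counts[s]
        for (s, o), count in pair_counts.items()
    }


# Hypothetical fully observed data: two sequences with T = 3.
state_seqs = [["a", "a", "b", "b"], ["b", "a", "a", "b"]]
obs_seqs = [["u", "v", "v"], ["v", "u", "v"]]
observation_hat = mle_observation_cpd(state_seqs, obs_seqs)
print(observation_hat)
```

The transition CPD can be estimated from the same state sequences by counting consecutive state pairs, independently of the observation counts.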

Page 9: Max Likelihood for BNs

Summary

• For a BN with disjoint sets of parameters in its CPDs, the likelihood decomposes as a product of local likelihood functions, one per variable

• For table CPDs, the local likelihood further decomposes as a product of multinomial likelihoods, one for each parent combination

• For networks with shared CPDs, sufficient statistics accumulate over all uses of the CPD
