Max Likelihood for BNs

Daphne Koller

Parameter Estimation

Max Likelihoodfor BNs

ProbabilisticGraphicalModels

Learning

Daphne Koller

MLE for Bayesian Networks• Parameters:

• Data instances: <x[m],y[m]> X

Y

Y

X y0 y1

x0 0.95 0.05

x1 0.2 0.8

X

x0 x1

0.7 0.3

Daphne Koller

MLE for Bayesian Networks• Parameters:

):][],[():(1

M

m

mymxPDL

):][|][():][(

):][|][():][(

1|

1

11M

mXY

M

mX

M

m

M

m

mxmyPmxP

mxmyPmxP

):][|][():][(1

M

m

mxmyPmxP

X

X

Y

Y|X

Data d

Daphne Koller

MLE for Bayesian Networks• Likelihood for Bayesian network

if Xi|Ui are disjoint, then MLE can be computed by

maximizing each local likelihood separately

iii

i miii

m iiii

m

DL

mmxP

mmxP

mxPDL

):(

):][|][(

):][|][(

):][():(

U

U

Daphne Koller

MLE for Table CPDs

uu

Uu

u][,][:

|,

):][|][(mxmxm

Xx

mmxP

][

],[

],'[

],[

'

| u

u

u

uu M

xM

xM

xM

x

x

):][|][():][|][(1

|1

M

mX

M

m

mmxPmmxP Uuu

uu

uu ][,][:

|, mxmxm

xx

],[

,|

u

uu

xM

xx

Daphne Koller

MLE for Linear Gaussians

Daphne Koller

Shared ParametersS(1)S(0) S(2) S(3)

S’|S

Daphne Koller

Shared Parameters

S(1)

O(1)

S(0) S(2)

O(2)

S(3)

O(3)

S’|S

O’|S’

Daphne Koller

Summary• For BN with disjoint sets of parameters in

CPDs, likelihood decomposes as product of local likelihood functions, one per variable

• For table CPDs, local likelihood further decomposes as product of likelihood for multinomials, one for each parent combination

• For networks with shared CPDs, sufficient statistics accumulate over all uses of CPD

Daphne Koller

END END END

Documents

Max Likelihood for BNs