
Bayesian network inference


Page 1

Bayesian network inference

• Given:

– Query variables: X
– Evidence (observed) variables: E = e

– Unobserved variables: Y

• Goal: calculate some useful information about the query variables
– Posterior P(X | e)
– MAP estimate arg max_x P(x | e)

• Recall: inference via the full joint distribution

P(X | E = e) = P(X, e) / P(e) ∝ Σ_y P(X, e, y)

– Since BNs can afford exponential savings in storage of joint distributions, can they afford similar savings for inference?
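To make the formula concrete, here is a minimal sketch (not from the slides) of inference via the full joint distribution, using a hypothetical joint table over three binary variables X, E, Y:

    # Hypothetical joint distribution P(X, E, Y) over three binary variables,
    # stored as a map from assignments to probabilities (entries sum to 1)
    joint = {
        (0, 0, 0): 0.20, (0, 0, 1): 0.10,
        (0, 1, 0): 0.15, (0, 1, 1): 0.05,
        (1, 0, 0): 0.10, (1, 0, 1): 0.10,
        (1, 1, 0): 0.05, (1, 1, 1): 0.25,
    }

    def posterior(joint, e):
        """Compute P(X | E = e) by summing out the unobserved variable Y."""
        # Unnormalized: P(X = x, e) = sum_y P(x, e, y)
        unnorm = {x: sum(joint[(x, e, y)] for y in (0, 1)) for x in (0, 1)}
        z = sum(unnorm.values())  # P(e), the normalization constant
        return {x: p / z for x, p in unnorm.items()}

    print(posterior(joint, e=1))  # {0: 0.4, 1: 0.6}

Note that the joint table, and hence the cost of the sum, grows exponentially with the number of variables; that is exactly the cost the question above asks whether BNs can avoid.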

Page 2

Bayesian network inference

• In full generality, NP-hard
– More precisely, #P-hard: equivalent to counting satisfying assignments

• We can reduce satisfiability to Bayesian network inference
– Decision problem: is P(Y) > 0?

Y = (u1 ∨ u2 ∨ u3) ∧ (¬u1 ∨ ¬u2 ∨ u3) ∧ (u2 ∨ ¬u3 ∨ u4)

(clauses C1, C2, C3)

G. Cooper, 1990

Page 3

Inference example

• Query: P(B | j, m)

Page 4

Inference example

• Query: P(B | j, m)

P(b | j, m) = P(b, j, m) / P(j, m) ∝ Σ_e Σ_a P(b, e, a, j, m)

= Σ_{e∈E} Σ_{a∈A} P(b) P(e) P(a | b, e) P(j | a) P(m | a)

• Are we doing any unnecessary work?

Page 5

Inference example

P(b | j, m) ∝ Σ_{e∈E} Σ_{a∈A} P(b) P(e) P(a | b, e) P(j | a) P(m | a)

Pulling factors out of the inner sums, this equals P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a).
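As an illustration, here is a minimal sketch of this nested computation in code. The CPT numbers are the standard textbook values for the burglary network; the slides do not list them, so treat the numbers (and the helper names) as assumptions:

    # Textbook CPTs for the burglary network (assumed, not from the slides)
    P_B = {True: 0.001, False: 0.999}                    # P(B)
    P_E = {True: 0.002, False: 0.998}                    # P(E)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
    P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
    P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

    def unnorm(b):
        # P(b) * sum_e P(e) * sum_a P(a | b, e) P(j | a) P(m | a):
        # the inner sum over a is evaluated once per (b, e), not once per
        # term of the flat double sum over the joint
        total = 0.0
        for e in (True, False):
            inner = sum((P_A[b, e] if a else 1 - P_A[b, e]) * P_J[a] * P_M[a]
                        for a in (True, False))
            total += P_E[e] * inner
        return P_B[b] * total

    z = unnorm(True) + unnorm(False)
    print(unnorm(True) / z)  # ≈ 0.284 with these numbers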

Page 6

Exact inference

• Basic idea: compute the results of sub-expressions in a bottom-up way and cache them for later use
– Form of dynamic programming

• Has polynomial time and space complexity for polytrees
– Polytree: at most one undirected path between any two nodes

Page 7

Representing people

Page 8

Review: Bayesian network inference

• In general, harder than satisfiability
• Efficient inference via dynamic programming is possible for polytrees
• In other practical cases, must resort to approximate methods

Page 9

Approximate inference: Sampling

• A Bayesian network is a generative model
– Allows us to efficiently generate samples from the joint distribution

• Algorithm for sampling the joint distribution (a sketch in code follows):
– While not all variables are sampled:
• Pick a variable that is not yet sampled, but whose parents are sampled
• Draw its value from P(X | parents(X))
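A minimal sketch of this loop, using a hypothetical two-node network C → R (the variable names and CPT numbers are made up for illustration):

    import random

    parents = {"C": [], "R": ["C"]}
    cpts = {
        "C": lambda: 0.5,                  # P(C = T)
        "R": lambda c: 0.8 if c else 0.2,  # P(R = T | C)
    }

    def sample_joint():
        sample = {}
        # While not all variables are sampled, pick one whose parents are
        # all sampled and draw its value from P(X | parents(X))
        while len(sample) < len(parents):
            for var in parents:
                if var not in sample and all(p in sample for p in parents[var]):
                    p_true = cpts[var](*(sample[p] for p in parents[var]))
                    sample[var] = random.random() < p_true
        return sample

    print(sample_joint())  # e.g. {'C': True, 'R': True}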

Page 10

Example of sampling from the joint distribution

(Pages 11–16 repeat this slide, stepping through the figure as each variable is sampled in turn.)

Page 17

Inference via sampling

• Suppose we drew N samples from the joint distribution

• How do we compute P(X = x | e)?

P(X = x | e) = P(x, e) / P(e) ≈ (# of times x and e happen / N) / (# of times e happens / N)

• Rejection sampling: to compute P(X = x | e), keep only the samples in which e happens and find in what proportion of them x also happens (a sketch follows)
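Building on the sample_joint() sketch from the sampling slide, a minimal rejection-sampling estimator (the evidence and query events are hypothetical):

    def rejection_estimate(n=100_000):
        kept = accepted = 0
        for _ in range(n):
            s = sample_joint()
            if s["C"]:          # evidence e: C = True
                kept += 1
                if s["R"]:      # query event x: R = True
                    accepted += 1
        # Proportion of e-samples in which x also happens
        return accepted / kept if kept else float("nan")

    print(rejection_estimate())  # approaches P(R = T | C = T) = 0.8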

Page 18

Inference via sampling

• Rejection sampling: to compute P(X = x | e), keep only the samples in which e happens and find in what proportion of them x also happens

• What if e is a rare event?
– Example: burglary ∧ earthquake
– Rejection sampling ends up throwing away most of the samples

Page 19

Inference via sampling

• Rejection sampling: to compute P(X = x | e), keep only the samples in which e happens and find in what proportion of them x also happens

• What if e is a rare event?
– Example: burglary ∧ earthquake
– Rejection sampling ends up throwing away most of the samples

• Likelihood weighting
– Sample from P(X = x | e), but weight each sample by P(e) (a sketch follows)
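The bullet above is shorthand; in the standard likelihood-weighting procedure, evidence variables are fixed rather than sampled, and each sample is weighted by the probability of its evidence values given their parents. A minimal sketch on the two-node network from the sampling slide:

    def likelihood_weighting(evidence, n=100_000):
        num = den = 0.0
        for _ in range(n):
            sample, weight = dict(evidence), 1.0
            for var in ("C", "R"):                  # topological order
                p_true = cpts[var](*(sample[p] for p in parents[var]))
                if var in evidence:
                    # Evidence is not sampled; it contributes to the weight
                    weight *= p_true if evidence[var] else 1 - p_true
                else:
                    sample[var] = random.random() < p_true
            den += weight
            if sample["R"]:                         # query event: R = True
                num += weight
        return num / den

    print(likelihood_weighting({"C": True}))  # approaches P(R = T | C = T) = 0.8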

Page 20

Inference via sampling: Summary

• Use the Bayesian network to generate samples from the joint distribution

• Approximate any desired conditional or marginal probability by empirical frequencies
– This approach is consistent: in the limit of infinitely many samples, frequencies converge to probabilities
– No free lunch: to get a good approximation of the desired probability, you may need an exponential number of samples anyway

Page 21

Example: Solving the satisfiability problem by sampling

Y = (u1 ∨ u2 ∨ u3) ∧ (¬u1 ∨ ¬u2 ∨ u3) ∧ (u2 ∨ ¬u3 ∨ u4)    (clauses C1, C2, C3)

P(ui = T) = 0.5

• Sample values of u1, …, u4 according to P(ui = T) = 0.5
– Estimate of P(Y): # of satisfying assignments / # of sampled assignments
– Not guaranteed to correctly figure out whether P(Y) > 0 unless you sample every possible assignment!
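A minimal sketch of this estimator for the formula above (clause structure as reconstructed on this slide):

    import random

    def Y(u1, u2, u3, u4):
        return ((u1 or u2 or u3) and
                ((not u1) or (not u2) or u3) and
                (u2 or (not u3) or u4))

    # Sample u1..u4 uniformly and count satisfying assignments
    n = 100_000
    hits = sum(Y(*(random.random() < 0.5 for _ in range(4))) for _ in range(n))
    print(hits / n)  # estimates P(Y) = (# satisfying assignments) / 16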

Page 22

Other approximate inference methods

• Variational methods
– Approximate the original network by a simpler one (e.g., a polytree) and try to minimize the divergence between the simplified and the exact model

• Belief propagation
– Iterative message passing: each node computes some local estimate and shares it with its neighbors. On the next iteration, it uses information from its neighbors to update its estimate.

Page 23

Parameter learning

• Suppose we know the network structure (but not the parameters), and have a training set of complete observations

Training set

Sample  C  S  R  W
1       T  F  T  T
2       F  T  F  T
3       T  F  F  F
4       T  T  T  T
5       F  T  F  T
6       T  F  T  F
…       …  …  …  …

[Figure: the network, with each CPT entry marked "?" (unknown).]

Page 24

Parameter learning

• Suppose we know the network structure (but not the parameters), and have a training set of complete observations
– P(X | Parents(X)) is given by the observed frequencies of the different values of X for each combination of parent values
– Similar to sampling, except your samples come from the training data and not from the model (whose parameters are initially unknown)
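A minimal sketch of this frequency counting, estimating P(S | C) from the six rows of the training set above (assuming, as in the usual sprinkler network, that C is the parent of S):

    from collections import Counter

    data = [  # (C, S, R, W) rows from the training set
        ("T", "F", "T", "T"), ("F", "T", "F", "T"), ("T", "F", "F", "F"),
        ("T", "T", "T", "T"), ("F", "T", "F", "T"), ("T", "F", "T", "F"),
    ]
    counts = Counter((c, s) for c, s, r, w in data)
    for c in ("T", "F"):
        total = counts[c, "T"] + counts[c, "F"]
        print(f"P(S=T | C={c}) = {counts[c, 'T'] / total:.2f}")
    # -> P(S=T | C=T) = 0.25, P(S=T | C=F) = 1.00 on this tiny sample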

Page 25

Parameter learning

• Incomplete observations

Training set

Sample  C  S  R  W
1       ?  F  T  T
2       ?  T  F  T
3       ?  F  F  F
4       ?  T  T  T
5       ?  T  F  T
6       ?  F  T  F
…       …  …  …  …

[Figure: the network, with each CPT entry marked "?" (unknown).]

Page 26

Parameter learning

• Learning with incomplete observations: EM (Expectation Maximization) algorithm

θ^(t+1) = arg max_θ Σ_z P(Z = z | x, θ^(t)) L(x, Z = z | θ)

Z: hidden variables
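A minimal sketch of one EM iteration for the hidden-C case, assuming a simplified structure in which the unobserved C is the parent of the observed S and R; all numbers are hypothetical:

    data = [("F", "T"), ("T", "F"), ("F", "F"), ("T", "T")]  # observed (S, R)

    # Current parameter estimates theta^(t)
    p_c = 0.5                          # P(C = T)
    p_s = {"T": 0.1, "F": 0.5}         # P(S = T | C)
    p_r = {"T": 0.8, "F": 0.2}         # P(R = T | C)

    def lik(c, s, r):
        """P(C = c, S = s, R = r) under the current parameters."""
        pc = p_c if c == "T" else 1 - p_c
        ps = p_s[c] if s == "T" else 1 - p_s[c]
        pr = p_r[c] if r == "T" else 1 - p_r[c]
        return pc * ps * pr

    # E-step: responsibility P(C = T | s, r, theta^(t)) for each sample
    resp = [lik("T", s, r) / (lik("T", s, r) + lik("F", s, r)) for s, r in data]

    # M-step: re-estimate P(C) from expected counts; the CPT updates for
    # P(S | C) and P(R | C) use the same responsibilities analogously
    p_c_new = sum(resp) / len(data)
    print(p_c_new)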

Page 27

Parameter learning

• What if the network structure is unknown?

– Structure learning algorithms exist, but they are pretty complicated…

Training set

Sample  C  S  R  W
1       T  F  T  T
2       F  T  F  T
3       T  F  F  F
4       T  T  T  T
5       F  T  F  T
6       T  F  T  F
…       …  …  …  …

[Figure: nodes C, S, R, W with unknown edge structure, marked "?".]