Inference in Bayesian Networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides are based on Klein and Abbeel, CS188, UC Berkeley.

Source: ce.sharif.edu/courses/96-97/2/ce417-1/resources/root... (retrieved 2018-05-17)


Page 1: Inference in Bayesian Networks

Inference in Bayesian NetworksCE417: Introduction to Artificial Intelligence

Sharif University of Technology

Spring 2018

Soleymani

Slides are based on Klein and Abbeel, CS188, UC Berkeley.

Page 2: Inference in Bayesian Networks

Bayes’ Nets

Representation

Conditional Independences

Probabilistic Inference

Enumeration (exact, exponential complexity)

Variable elimination (exact, worst-case exponential complexity, often better)

Probabilistic inference is NP-complete

Sampling (approximate)

Learning Bayes’ Nets from Data

2

Page 3: Inference in Bayesian Networks

Recap: Bayes’ Net Representation

A directed, acyclic graph, one node per random variable

A conditional probability table (CPT) for each node

A collection of distributions over X, one for each combination of parents' values

Bayes' nets implicitly encode joint distributions

As a product of local conditional distributions

To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

3

Page 4: Inference in Bayesian Networks

Example: Alarm Network

[Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

B P(B)

+b 0.001

-b 0.999

E P(E)

+e 0.002

-e 0.998

B E A P(A|B,E)

+b +e +a 0.95

+b +e -a 0.05

+b -e +a 0.94

+b -e -a 0.06

-b +e +a 0.29

-b +e -a 0.71

-b -e +a 0.001

-b -e -a 0.999

A J P(J|A)

+a +j 0.9

+a -j 0.1

-a +j 0.05

-a -j 0.95

A M P(M|A)

+a +m 0.7

+a -m 0.3

-a +m 0.01

-a -m 0.99

[Demo: BN Applet]

4
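As a sanity check on the CPTs above, a few lines of Python (an illustrative sketch, not part of the slides) multiply the relevant conditionals for the full assignment (+b, -e, +a, +j, +m):

```python
# P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a|+b,-e) P(+j|+a) P(+m|+a),
# using the alarm-network CPT entries listed above.
p_b = 0.001      # P(+b)
p_not_e = 0.998  # P(-e)
p_a = 0.94       # P(+a | +b, -e)
p_j = 0.9        # P(+j | +a)
p_m = 0.7        # P(+m | +a)

joint = p_b * p_not_e * p_a * p_j * p_m  # ≈ 5.9e-4
```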

Page 5: Inference in Bayesian Networks

Video of Demo BN Applet

5

Page 6: Inference in Bayesian Networks

Example: Alarm Network

B P(B)

+b 0.001

-b 0.999

E P(E)

+e 0.002

-e 0.998

B E A P(A|B,E)

+b +e +a 0.95

+b +e -a 0.05

+b -e +a 0.94

+b -e -a 0.06

-b +e +a 0.29

-b +e -a 0.71

-b -e +a 0.001

-b -e -a 0.999

A J P(J|A)

+a +j 0.9

+a -j 0.1

-a +j 0.05

-a -j 0.95

A M P(M|A)

+a +m 0.7

+a -m 0.3

-a +m 0.01

-a -m 0.99

[Network: B → A ← E; A → J; A → M]

6


Page 8: Inference in Bayesian Networks

Bayes’ Nets

Representation

Conditional Independences

Probabilistic Inference

Enumeration (exact, exponential complexity)

Variable elimination (exact, worst-case exponentialcomplexity, often better)

Inference is NP-complete

Sampling (approximate)

Learning Bayes’ Nets from Data

8

Page 9: Inference in Bayesian Networks

Inference

Inference: calculating some useful quantity from a joint probability distribution

Examples:

Posterior probability: P(Q | E1 = e1, …, Ek = ek)

Most likely explanation: argmax_q P(Q = q | E1 = e1, …)

9

Page 10: Inference in Bayesian Networks

Inference by Enumeration

General case:

Evidence variables: E1 = e1, …, Ek = ek

Query* variable: Q

Hidden variables: H1, …, Hr

(All variables: X1, …, Xn)

* Works fine with multiple query variables, too

We want: P(Q | e1, …, ek)

Step 1: Select the entries consistent with the evidence

Step 2: Sum out H to get joint of Query and evidence

Step 3: Normalize

10

Page 11: Inference in Bayesian Networks

Inference by Enumeration in Bayes’ Net

Given unlimited time, inference in BNs is easy

Reminder of inference by enumeration by example:

[Network: B → A ← E; A → J; A → M]

11

Page 12: Inference in Bayesian Networks

Burglary example: full joint probability

12

P(b | j, ¬m) = P(j, ¬m, b) / P(j, ¬m)

= Σ_A Σ_E P(j, ¬m, b, A, E) / Σ_B Σ_A Σ_E P(j, ¬m, B, A, E)

= Σ_A Σ_E P(j|A) P(¬m|A) P(A|b, E) P(b) P(E) / Σ_B Σ_A Σ_E P(j|A) P(¬m|A) P(A|B, E) P(B) P(E)

Short-hands: j: JohnCalls = True; ¬b: Burglary = False

Page 13: Inference in Bayesian Networks

Inference by Enumeration?

13

Page 14: Inference in Bayesian Networks

Factor Zoo

14

Page 15: Inference in Bayesian Networks

Factor Zoo I

Joint distribution: P(X,Y)

Entries P(x,y) for all x, y

Sums to 1

Selected joint: P(x,Y)

A slice of the joint distribution

Entries P(x,y) for fixed x, all y

Sums to P(x)

Number of capitals = dimensionality of the table

T W P

hot sun 0.4

hot rain 0.1

cold sun 0.2

cold rain 0.3

T W P

cold sun 0.2

cold rain 0.3

15

Page 16: Inference in Bayesian Networks

Factor Zoo II

Single conditional: P(Y | x)

Entries P(y | x) for fixed x, all y

Sums to 1

Family of conditionals:

P(X |Y)

Multiple conditionals

Entries P(x | y) for all x, y

Sums to |Y|

T W P

hot sun 0.8

hot rain 0.2

cold sun 0.4

cold rain 0.6

T W P

cold sun 0.4

cold rain 0.6

16

Page 17: Inference in Bayesian Networks

Factor Zoo III

Specified family: P( y | X )

Entries P(y | x) for fixed y,

but for all x

Sums to … who knows!

T W P

hot rain 0.2

cold rain 0.6

17

Page 18: Inference in Bayesian Networks

Factor Zoo Summary

In general, when we write P(Y1 … YN | X1 … XM)

It is a “factor,” a multi-dimensional array

Its values are P(y1 … yN | x1 … xM)

Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array

18

Page 19: Inference in Bayesian Networks

Example: Traffic Domain

Random Variables:

R: Raining

T: Traffic

L: Late for class!

[Network: R → T → L]

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

19

Page 20: Inference in Bayesian Networks

Inference by Enumeration: Procedural

Outline

Track objects called factors

Initial factors are local CPTs (one per node)

Any known values are selected

E.g. if we know L = +l, the initial factors are

Procedure: Join all factors, then eliminate all hidden variables

Initial factors (no evidence):

+r 0.1
-r 0.9

+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

With L = +l selected:

+r 0.1
-r 0.9

+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

+t +l 0.3
-t +l 0.1

20

Page 21: Inference in Bayesian Networks

Operation 1: Join Factors

First basic operation: joining factors

Combining factors:

Just like a database join

Get all factors over the joining variable

Build a new factor over the union of the variables involved

Example: Join on R

Computation for each entry: pointwise products

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

→ P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

21
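The join on R above can be sketched in a few lines of Python, with a factor encoded as a dict from assignment tuples to numbers (an illustrative encoding, not the course's code):

```python
# Join P(R) and P(T|R) on R: pointwise products over the union of variables.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T_given_R.items()}
```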

Page 22: Inference in Bayesian Networks

Example: Multiple Joins

22

Page 23: Inference in Bayesian Networks

Example: Multiple Joins

[Network: R → T → L]

Join R:

+r 0.1
-r 0.9

+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

→ P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

Join T → P(R, T, L):
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729

23

Page 24: Inference in Bayesian Networks

Operation 2: Eliminate

Second basic operation: marginalization

Take a factor and sum out a variable

Shrinks a factor to a smaller one

A projection operation

Example:

P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

→ P(T):
+t 0.17
-t 0.83

24
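Summing out a variable is just accumulation over the remaining assignments; a minimal sketch (the dict encoding is my own):

```python
# Sum R out of the joint factor P(R, T) to obtain P(T).
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

P_T = {}
for (r, t), p in P_RT.items():
    P_T[t] = P_T.get(t, 0.0) + p
```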

Page 25: Inference in Bayesian Networks

Multiple Elimination

Sum out R, then sum out T:

P(R, T, L):
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729

→ P(T, L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

→ P(L):
+l 0.134
-l 0.866

25

Page 26: Inference in Bayesian Networks

Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

26

Page 27: Inference in Bayesian Networks

Inference by Enumeration vs. Variable Elimination

Why is inference by enumeration so slow?

You join up the whole joint distribution before you sum out the hidden variables

Idea: interleave joining and marginalizing!

Called “Variable Elimination”

Still NP-hard, but usually much faster than inference by enumeration

First we’ll need some new notation: factors

27

Page 28: Inference in Bayesian Networks

Traffic Domain

[Network: R → T → L]

Inference by Enumeration: Join on r, Join on t, Eliminate r, Eliminate t

Variable Elimination: Join on r, Eliminate r, Join on t, Eliminate t

28

Page 29: Inference in Bayesian Networks

Marginalizing Early (= Variable Elimination)

29

Page 30: Inference in Bayesian Networks

Marginalizing Early! (aka VE)

Join R:

+r 0.1
-r 0.9

+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

→ P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

(P(L|T) is carried along unchanged:)
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

Sum out R → P(T):
+t 0.17
-t 0.83

Join T → P(T, L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

Sum out T → P(L):
+l 0.134
-l 0.866

30
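The marginalize-early pipeline on the traffic chain can be sketched in a few lines of Python (an illustrative sketch; the dict-based factor encoding is my own):

```python
# Variable elimination on R -> T -> L: sum out R first, then T,
# so every intermediate factor stays small.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Join on R, then sum R out: P(T)
P_T = {}
for (r, t), p in P_T_given_R.items():
    P_T[t] = P_T.get(t, 0.0) + P_R[r] * p

# Join on T, then sum T out: P(L)
P_L = {}
for (t, l), p in P_L_given_T.items():
    P_L[l] = P_L.get(l, 0.0) + P_T[t] * p
```

The result matches the enumeration answer on the previous slides: P(+l) = 0.134, P(-l) = 0.866.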

Page 31: Inference in Bayesian Networks

Evidence

If evidence, start with factors that select that evidence

With no evidence, the initial factors are:

+r 0.1
-r 0.9

+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

Computing P(L | +r), the initial factors become:

+r 0.1

+r +t 0.8
+r -t 0.2

+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

We eliminate all vars other than query + evidence

31

Page 32: Inference in Bayesian Networks

Evidence II

Result will be a selected joint of query and evidence

E.g. for P(L | +r), we would end up with:

+r +l 0.026
+r -l 0.074

Normalize:

+l 0.26
-l 0.74

To get our answer, just normalize this! That's it!

32
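A minimal sketch of the normalization step (variable names are mine):

```python
# Normalize the selected joint P(+r, L) to obtain P(L | +r).
selected = {'+l': 0.026, '-l': 0.074}
z = sum(selected.values())
posterior = {l: p / z for l, p in selected.items()}
```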

Page 33: Inference in Bayesian Networks

Distribution of products on sums

33

Exploiting the factorization properties to allow sums and products to be interchanged

a × b + a × c needs three operations, while a × (b + c) requires two

Page 34: Inference in Bayesian Networks

General Variable Elimination

Query: P(Q | E1 = e1, …, Ek = ek)

Start with initial factors:

Local CPTs (but instantiated by evidence)

While there are still hidden variables (not Q or evidence):

Pick a hidden variable H

Join all factors mentioning H

Eliminate (sum out) H

Join all remaining factors and normalize

34
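The loop above can be sketched as a general implementation, with a factor encoded as a pair (variables, table). This is an illustrative sketch in my own notation, not the course's reference code; `join`, `sum_out`, and `variable_elimination` are names I chose:

```python
from itertools import product

# A factor is (vars, table): vars is a tuple of variable names; table maps
# assignment tuples (one value per variable, in order) to numbers.

def join(f, g):
    """Pointwise product over the union of the two factors' variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    domains = {}
    for vars_, table in ((fv, ft), (gv, gt)):
        for assign in table:
            for v, x in zip(vars_, assign):
                domains.setdefault(v, set()).add(x)
    table = {}
    for assign in product(*(sorted(domains[v]) for v in out_vars)):
        env = dict(zip(out_vars, assign))
        fa = tuple(env[v] for v in fv)
        ga = tuple(env[v] for v in gv)
        if fa in ft and ga in gt:  # skip rows removed by evidence selection
            table[assign] = ft[fa] * gt[ga]
    return out_vars, table

def sum_out(var, f):
    """Marginalize var out of factor f."""
    fv, ft = f
    idx = fv.index(var)
    out_vars = fv[:idx] + fv[idx + 1:]
    table = {}
    for assign, p in ft.items():
        key = assign[:idx] + assign[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

def variable_elimination(factors, hidden):
    """Join all factors mentioning each hidden var, sum it out, normalize."""
    for h in hidden:
        related = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = related[0]
        for f in related[1:]:
            joined = join(joined, f)
        factors = rest + [sum_out(h, joined)]
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return result[0], {a: p / z for a, p in result[1].items()}

# Traffic query P(L | +r): CPTs instantiated by the evidence R = +r.
factors = [
    (('R',), {('+r',): 0.1}),
    (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2}),
    (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                  ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}),
]
vars_, posterior = variable_elimination(factors, hidden=['T'])
```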

Page 35: Inference in Bayesian Networks

Variable elimination: example

35

P(b, j) = Σ_A Σ_E Σ_M P(b) P(E) P(A|b, E) P(j|A) P(M|A)

= P(b) Σ_E P(E) Σ_A P(A|b, E) P(j|A) Σ_M P(M|A)

P(b|j) ∝ P(b, j)

Intermediate results are probability distributions

Page 36: Inference in Bayesian Networks

Variable elimination: example

36

P(B, j) = Σ_A Σ_E Σ_M P(B) P(E) P(A|B, E) P(j|A) P(M|A)

= P(B) Σ_E P(E) Σ_A P(A|B, E) P(j|A) Σ_M P(M|A)

with factors f1(B) = P(B), f2(E) = P(E), f3(A, B, E) = P(A|B, E), f4(A) = P(j|A), f5(A, M) = P(M|A), and f6(A) = Σ_M f5(A, M):

f7(B, E) = Σ_A f3(A, B, E) × f4(A) × f6(A)

f8(B) = Σ_E f2(E) × f7(B, E)

P(B|j) ∝ P(B, j)

Intermediate results are probability distributions

Page 37: Inference in Bayesian Networks

Variable elimination: Order of summations

37

An inefficient order:

P(B, j) = Σ_M Σ_E Σ_A P(B) P(E) P(A|B, E) P(j|A) P(M|A)

= P(B) Σ_M Σ_E P(E) Σ_A P(A|B, E) P(j|A) P(M|A)

The innermost product is a large intermediate factor f(A, B, E, M)

Page 38: Inference in Bayesian Networks

Variable elimination: Pruning irrelevant variables

38

Any variable that is not an ancestor of a query variable or evidence variable is irrelevant to the query.

Prune all non-ancestors of query or evidence variables: for P(b, j) with JohnCalls = True as evidence, MaryCalls is not an ancestor of the query or the evidence and is pruned.

[Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls (= True); Alarm → MaryCalls (pruned). Second example: X → Y → Z, where Z is pruned.]

Page 39: Inference in Bayesian Networks

Variable elimination algorithm

39

Given: BN, evidence x_V, a query P(Y | x_V)

Prune non-ancestors of {Y, X_V}

Choose an ordering on variables, e.g., X1, …, Xn

For i = 1 to n, if Xi ∉ {Y, X_V}:
  Collect the factors f1, …, fk that include Xi
  Generate a new factor by eliminating Xi from these factors: g = Σ_{Xi} Π_{j=1}^{k} fj
  (After this summation, Xi is eliminated)

Normalize P(Y, x_V) to obtain P(Y | x_V)

Page 40: Inference in Bayesian Networks

Variable elimination algorithm

40

• Evaluating expressions in a proper order

• Storing intermediate results

• Summation only for those portions of the expression that depend on that variable

Given: BN, evidence x_V, a query P(Y | x_V)

Prune non-ancestors of {Y, X_V}

Choose an ordering on variables, e.g., X1, …, Xn

For i = 1 to n, if Xi ∉ {Y, X_V}:
  Collect the factors f1, …, fk that include Xi
  Generate a new factor by eliminating Xi from these factors: g = Σ_{Xi} Π_{j=1}^{k} fj

Normalize P(Y, x_V) to obtain P(Y | x_V)

Page 41: Inference in Bayesian Networks

Variable elimination

41

Eliminates by summation the non-observed, non-query variables one by one, by distributing the sum over the product

Complexity is determined by the size of the largest factor

Variable elimination can lead to significant cost savings, but its efficiency depends on the network structure:

There are still cases in which this algorithm leads to exponential time.

Page 42: Inference in Bayesian Networks

Example

Choose A

42

Page 43: Inference in Bayesian Networks

Example

Choose E

Finish with B

Normalize

43

Page 44: Inference in Bayesian Networks

Same Example in Equations

marginal can be obtained from joint by summing out

use Bayes’ net joint distribution expression

use x*(y+z) = xy + xz

joining on a, and then summing out gives f1

use x*(y+z) = xy + xz

joining on e, and then summing out gives f2

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!

44

Page 45: Inference in Bayesian Networks

Inference on a chain

45

P(d) = Σ_A Σ_B Σ_C P(A, B, C, d)

P(d) = Σ_A Σ_B Σ_C P(A) P(B|A) P(C|B) P(d|C)

A naïve summation needs to enumerate over an exponential number of terms

[Chain: A → B → C → D]

Page 46: Inference in Bayesian Networks

Inference on a chain:

marginalization and elimination

46

P(d) = Σ_A Σ_B Σ_C P(A) P(B|A) P(C|B) P(d|C)

= Σ_C Σ_B Σ_A P(A) P(B|A) P(C|B) P(d|C)

= Σ_C P(d|C) Σ_B P(C|B) Σ_A P(A) P(B|A)

where f(B) = Σ_A P(A) P(B|A) and f(C) = Σ_B P(C|B) f(B)

In a chain of n nodes each having k values: O(nk²) instead of O(kⁿ)

[Chain: A → B → C → D]
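The message passing along the chain can be sketched directly; the CPT numbers below are made up for illustration, and the result is checked against the naive exponential sum:

```python
from itertools import product

# Chain A -> B -> C -> D with binary variables; illustrative CPT numbers.
P_A = {0: 0.6, 1: 0.4}
P_B_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # P(B|A)
P_C_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}  # P(C|B)
P_D_C = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}  # P(D|C)

# f(B) = sum_A P(A) P(B|A); f(C) = sum_B f(B) P(C|B): each step is O(k^2).
f_B = {b: sum(P_A[a] * P_B_A[(a, b)] for a in P_A) for b in (0, 1)}
f_C = {c: sum(f_B[b] * P_C_B[(b, c)] for b in (0, 1)) for c in (0, 1)}

# P(d) for d = 1.
P_d = sum(f_C[c] * P_D_C[(c, 1)] for c in (0, 1))

# Naive check: enumerate all 2^3 assignments of A, B, C.
naive = sum(P_A[a] * P_B_A[(a, b)] * P_C_B[(b, c)] * P_D_C[(c, 1)]
            for a, b, c in product((0, 1), repeat=3))
```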

Page 47: Inference in Bayesian Networks

Wumpus example

47

𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒 = ¬𝑏1,1 ∧ 𝑏1,2 ∧ 𝑏2,1 ∧ ¬𝑝1,1 ∧ ¬𝑝1,2 ∧ ¬𝑝2,1

𝑃 𝑃1,3 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒 =?

Page 48: Inference in Bayesian Networks

Wumpus example

48

Possible worlds with P1,3 = true; possible worlds with P1,3 = false

P(P1,3 = true | evidence) ∝ 0.2 × (0.2 × 0.2 + 0.2 × 0.8 + 0.8 × 0.2)

P(P1,3 = false | evidence) ∝ 0.8 × (0.2 × 0.2 + 0.2 × 0.8)

⇒ P(P1,3 = true | evidence) ≈ 0.31
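The arithmetic can be checked directly (assuming, as in the slides, a pit prior of 0.2 per square):

```python
# Unnormalized weights of the consistent worlds, then normalize.
p_true = 0.2 * (0.2 * 0.2 + 0.2 * 0.8 + 0.8 * 0.2)   # P(1,3) = true
p_false = 0.8 * (0.2 * 0.2 + 0.2 * 0.8)              # P(1,3) = false
posterior = p_true / (p_true + p_false)
```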

Page 49: Inference in Bayesian Networks

Another Variable Elimination Example

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In example above (assuming binary) all factors generated are of size 2 --- as they all only have one variable (Z, Z, and X3 respectively).

49

Page 50: Inference in Bayesian Networks

Variable Elimination Ordering

For the query P(Xn | y1, …, yn), work through the following two different orderings as done in the previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z.

What is the size of the maximum factor generated for each of the orderings?

Answer: 2^(n+1) versus 2^2 (assuming binary)

In general: the ordering can greatly affect efficiency.

50

Page 51: Inference in Bayesian Networks

VE: Computational and Space Complexity

The computational and space complexity of variable elimination is determined by the largest factor

The elimination ordering can greatly affect the size of the largest factor.

E.g., previous slide's example: 2^(n+1) vs. 2^2

Does there always exist an ordering that only results in small factors?

No!

51

Page 52: Inference in Bayesian Networks

Complexity of variable elimination algorithm

52

In each elimination step, the following computations are required:

f(x, x1, …, xk) = Π_{i=1}^{M} g_i(x, x_{c_i})

Σ_x f(x, x1, …, xk)

We need:

(M − 1) × |Val(X)| × Π_{i=1}^{k} |Val(Xi)| multiplications
(for each tuple (x, x1, …, xk), we need M − 1 multiplications)

|Val(X)| × Π_{i=1}^{k} |Val(Xi)| additions
(for each tuple (x1, …, xk), we need |Val(X)| additions)

Complexity is exponential in number of variables in the intermediate factor

Size of the created factors is the dominant quantity in the complexity of VE

Page 53: Inference in Bayesian Networks

Example

53

Query: P(X2 | X7 = x7)

P(X2 | x7) ∝ P(X2, x7)

P(x2, x7) = Σ_{x1} Σ_{x3} Σ_{x4} Σ_{x5} Σ_{x6} Σ_{x8} P(x1, x2, x3, x4, x5, x6, x7, x8)

Consider the elimination order X1, X3, X4, X5, X6, X8:

P(x2, x7) = Σ_{x8} Σ_{x6} Σ_{x5} Σ_{x4} Σ_{x3} Σ_{x1} P(x1) P(x2) P(x3|x1, x2) P(x4|x3) P(x5|x2) P(x6|x3, x7) P(x7|x4, x5) P(x8|x7)

[Network diagram over X1, …, X8]

Page 54: Inference in Bayesian Networks

54

P(x2, x7)

= Σ_{x8} Σ_{x6} Σ_{x5} Σ_{x4} Σ_{x3} P(x2) P(x4|x3) P(x5|x2) P(x6|x3, x7) P(x7|x4, x5) P(x8|x7) Σ_{x1} P(x1) P(x3|x1, x2)

= Σ_{x8} Σ_{x6} Σ_{x5} Σ_{x4} Σ_{x3} P(x2) P(x4|x3) P(x5|x2) P(x6|x3, x7) P(x7|x4, x5) P(x8|x7) m1(x2, x3)

= Σ_{x8} Σ_{x6} Σ_{x5} Σ_{x4} P(x2) P(x5|x2) P(x7|x4, x5) P(x8|x7) Σ_{x3} P(x4|x3) P(x6|x3, x7) m1(x2, x3)

= Σ_{x8} Σ_{x6} Σ_{x5} Σ_{x4} P(x2) P(x5|x2) P(x7|x4, x5) P(x8|x7) m3(x2, x6, x4)

= Σ_{x8} Σ_{x6} Σ_{x5} P(x2) P(x5|x2) P(x8|x7) Σ_{x4} P(x7|x4, x5) m3(x2, x6, x4)

= Σ_{x8} Σ_{x6} Σ_{x5} P(x2) P(x5|x2) P(x8|x7) m4(x2, x5, x6)

= Σ_{x8} Σ_{x6} P(x2) P(x8|x7) Σ_{x5} P(x5|x2) m4(x2, x5, x6)

= Σ_{x8} Σ_{x6} P(x2) P(x8|x7) m5(x2, x6)

= Σ_{x8} P(x2) P(x8|x7) Σ_{x6} m5(x2, x6)

= Σ_{x8} P(x2) P(x8|x7) m6(x2) = m8(x2) m6(x2)

Page 55: Inference in Bayesian Networks

Conditional probability

55

P(x2 | x7) = m8(x2) m6(x2) / Σ_{x2} m8(x2) m6(x2)

Page 56: Inference in Bayesian Networks

Graph elimination

56

Graph elimination is a simple unified treatment of inference algorithms

Moralize the graph

Graph-theoretic property: the factors that result during variable elimination are captured by recording the elimination cliques

The computational complexity of the Eliminate algorithm can be reduced to purely graph-theoretic considerations

Page 57: Inference in Bayesian Networks

Graph elimination

57

Begin with the undirected GM or moralized BN

Choose an elimination ordering (query nodes should be last)

Eliminate a node from the graph and add edges (called fill edges) between all pairs of its neighbors

Iterate until all non-query nodes are eliminated

Page 58: Inference in Bayesian Networks

Graph elimination

58

[Figure: the moralized graph over X1, …, X8 and the sequence of graphs obtained by eliminating X1, X3, X4, X5, X6, X8 in turn]

Removing a node from the graph and connecting the remaining neighbors (fill edges)

Summation ⇔ elimination

Intermediate term ⇔ elimination clique

Page 59: Inference in Bayesian Networks

Graph elimination: elimination cliques

59

Induced dependency during marginalization is captured in elimination cliques

A correspondence between maximal cliques in the induced graph and maximal factors generated in the VE algorithm

The complexity depends on the number of variables in the largest elimination clique

The size of the maximal elimination clique in the induced graph depends on the elimination ordering
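The elimination-with-fill-edges procedure can be sketched in a few lines (an illustrative sketch; the function and variable names are mine):

```python
# Eliminate nodes from an undirected graph in a given order.  Removing a
# node connects all of its remaining neighbors (fill edges); the largest
# elimination clique formed along the way bounds the cost of VE with this
# ordering.
def eliminate(adj, order):
    adj = {v: set(ns) for v, ns in adj.items()}
    max_clique = 0
    for v in order:
        nbrs = adj.pop(v)
        max_clique = max(max_clique, len(nbrs) + 1)  # elimination clique
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}  # add fill edges among the neighbors
    return max_clique

# Chain A - B - C - D: eliminating from an end keeps every clique at size 2,
# while starting in the middle creates a size-3 clique.
chain = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C'}}
good = eliminate(chain, ['A', 'B', 'C'])
bad = eliminate(chain, ['B', 'A', 'C'])
```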

Page 60: Inference in Bayesian Networks

Elimination order

60

Finding the best elimination ordering is NP-hard

Equivalent to finding the tree-width of the graph, which is NP-hard

Tree-width: one less than the smallest achievable size of the largest elimination clique, ranging over all possible elimination orderings

Good elimination orderings lead to small cliques and thus reduce complexity

What is the optimal order for trees?

Page 61: Inference in Bayesian Networks

Polytrees

A polytree is a directed graph with no undirected cycles

For polytrees you can always find an ordering that is efficient

Try it!!

Cut-set conditioning for Bayes' net inference

Choose a set of variables such that, if removed, only a polytree remains

Exercise: Think about how the specifics would work out!

61

Page 62: Inference in Bayesian Networks

Worst Case Complexity?

CSP reduction: if we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.

Hence inference in Bayes' nets is NP-hard. There is no known efficient probabilistic inference algorithm in general.

62

Page 63: Inference in Bayesian Networks

Variable Elimination: summary

Interleave joining and marginalizing

d^k entries computed for a factor over k variables with domain sizes d

Ordering of elimination of hidden variables can affect size of factors generated

Worst case: running time exponential in the size of the Bayes' net

63

Page 64: Inference in Bayesian Networks

Bayes' Nets

Representation

Conditional Independences

Probabilistic Inference

Enumeration (exact, exponential complexity)

Variable elimination (exact, worst-case exponential complexity, often better)

Inference is NP-complete

Sampling (approximate)

Learning Bayes’ Nets from Data

64

Page 65: Inference in Bayesian Networks

Approximate Inference: Sampling

65

Page 66: Inference in Bayesian Networks

Sampling

Sampling is a lot like repeated simulation

Predicting the weather, basketball games, …

Basic idea

Draw N samples from a sampling distribution S

Compute an approximate posterior probability

Show this converges to the true probability P

Why sample?

Learning: get samples from a distribution you don’t know

Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

66

Page 67: Inference in Bayesian Networks

Sampling

Sampling from given distribution

Step 1: Get sample u from uniform distribution over [0, 1)

E.g. random() in python

Step 2: Convert this sample u into an outcome for the given distribution by associating each outcome with a sub-interval of [0, 1), with sub-interval size equal to the probability of the outcome

If random() returns u = 0.83, then our sample is C = blue

E.g., after sampling 8 times:

C P(C)

red 0.6

green 0.1

blue 0.3

67
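The two steps above can be sketched directly (the helper name `sample` and the optional `u` parameter for testing are mine):

```python
import random

# Map u in [0, 1) onto sub-intervals whose sizes equal the outcome
# probabilities, e.g. red [0, 0.6), green [0.6, 0.7), blue [0.7, 1).
def sample(dist, u=None):
    if u is None:
        u = random.random()          # Step 1: uniform sample from [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():  # Step 2: find the sub-interval
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome  # guard against floating-point round-off

P_C = {'red': 0.6, 'green': 0.1, 'blue': 0.3}
```

With u = 0.83 this falls in blue's sub-interval, matching the slide's example.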

Page 68: Inference in Bayesian Networks

Sampling in Bayes’ Nets

Prior Sampling

Rejection Sampling

Likelihood Weighting

Gibbs Sampling

68

Page 69: Inference in Bayesian Networks

Prior Sampling

69

Page 70: Inference in Bayesian Networks

Prior Sampling

[Figure: the network Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass, drawn twice: once with its CPTs and once as a sample is drawn node by node]

P(C):       +c 0.5 | -c 0.5

P(S | C):   +c: +s 0.1, -s 0.9 | -c: +s 0.5, -s 0.5

P(R | C):   +c: +r 0.8, -r 0.2 | -c: +r 0.2, -r 0.8

P(W | S, R):  +s, +r: +w 0.99, -w 0.01
              +s, -r: +w 0.90, -w 0.10
              -s, +r: +w 0.90, -w 0.10
              -s, -r: +w 0.01, -w 0.99

Samples:

+c, -s, +r, +w
-c, +s, -r, +w
…


Prior Sampling

For i = 1, 2, …, n
    Sample xi from P(Xi | Parents(Xi))
Return (x1, x2, …, xn)
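As a sketch, the loop above instantiated for the Cloudy/Sprinkler/Rain/WetGrass network of the previous slide (True stands for +, False for -; function names are my own):

```python
import random

def bernoulli(p):
    return random.random() < p

def prior_sample():
    """Sample each variable in topological order from P(Xi | Parents(Xi)),
    using the CPTs from the slides."""
    c = bernoulli(0.5)                       # P(+c) = 0.5
    s = bernoulli(0.1 if c else 0.5)         # P(+s | C)
    r = bernoulli(0.8 if c else 0.2)         # P(+r | C)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = bernoulli(p_w)                       # P(+w | S, R)
    return c, s, r, w
```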


Prior Sampling

This process generates samples with probability:

S_PS(x_1, …, x_n) = ∏_i P(x_i | Parents(X_i)) = P(x_1, …, x_n)

…i.e. the BN's joint probability

Let the number of samples of an event be N_PS(x_1, …, x_n)

Then lim_{N→∞} P̂(x_1, …, x_n) = lim_{N→∞} N_PS(x_1, …, x_n) / N = S_PS(x_1, …, x_n) = P(x_1, …, x_n)

I.e., the sampling procedure is consistent


Example

We’ll get a bunch of samples from the BN:

+c, -s, +r, +w

+c, +s, +r, +w

-c, +s, +r, -w

+c, -s, +r, +w

-c, -s, -r, +w

If we want to know P(W)

We have counts <+w:4, -w:1>

Normalize to get P(W) = <+w:0.8, -w:0.2>

This will get closer to the true distribution with more samples

Can estimate anything else, too

What about P(C| +w)? P(C| +r, +w)? P(C| -r, -w)?

Fast: can use fewer samples if less time (what’s the drawback?)
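The count-and-normalize step for P(W), using the five samples listed above:

```python
from collections import Counter

samples = [("+c", "-s", "+r", "+w"),
           ("+c", "+s", "+r", "+w"),
           ("-c", "+s", "+r", "-w"),
           ("+c", "-s", "+r", "+w"),
           ("-c", "-s", "-r", "+w")]

counts = Counter(s[3] for s in samples)             # tally the W component
total = sum(counts.values())
p_w = {value: n / total for value, n in counts.items()}
# counts == {"+w": 4, "-w": 1}, so p_w == {"+w": 0.8, "-w": 0.2}
```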



Rejection Sampling


+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w

Rejection Sampling

Let’s say we want P(C)

No point keeping all samples around

Just tally counts of C as we go

Let’s say we want P(C| +s)

Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s

This is called rejection sampling

It is also consistent for conditional probabilities (i.e., correct in the limit)



Rejection Sampling

IN: evidence instantiation
For i = 1, 2, …, n
    Sample xi from P(Xi | Parents(Xi))
    If xi not consistent with evidence
        Reject: return, and no sample is generated in this cycle
Return (x1, x2, …, xn)
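The procedure above, instantiated for estimating P(+c | +s) on the sprinkler network (a sketch; prior_sample is redeclared with the slides' CPTs so the snippet stands alone, True standing for +):

```python
import random

def bernoulli(p):
    return random.random() < p

def prior_sample():
    """One full sample from the Cloudy/Sprinkler/Rain/WetGrass network."""
    c = bernoulli(0.5)
    s = bernoulli(0.1 if c else 0.5)
    r = bernoulli(0.8 if c else 0.2)
    w = bernoulli({(True, True): 0.99, (True, False): 0.90,
                   (False, True): 0.90, (False, False): 0.01}[(s, r)])
    return c, s, r, w

def rejection_sample(n):
    """Estimate P(+c | +s): tally C only over samples with S = +s."""
    accepted = []
    for _ in range(n):
        c, s, r, w = prior_sample()
        if s:                  # keep only samples consistent with S = +s
            accepted.append(c)
    return sum(accepted) / len(accepted)

# Exact value for comparison:
# P(+c | +s) = (0.5)(0.1) / ((0.5)(0.1) + (0.5)(0.5)) = 1/6
```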


Likelihood Weighting


Likelihood Weighting

Problem with rejection sampling:

If evidence is unlikely, rejects lots of samples

Evidence not exploited as you sample

Consider P(Shape | blue). Sampling from the prior draws (Shape, Color) pairs such as: pyramid, green; pyramid, red; sphere, blue; cube, red; sphere, green (only the blue one is kept)

Fixing the evidence Color = blue instead yields: pyramid, blue; pyramid, blue; sphere, blue; cube, blue; sphere, blue

Idea: fix evidence variables and sample the rest

Problem: sample distribution not consistent!

Solution: weight by probability of evidence given parents


Likelihood Weighting

P(C):       +c 0.5 | -c 0.5

P(S | C):   +c: +s 0.1, -s 0.9 | -c: +s 0.5, -s 0.5

P(R | C):   +c: +r 0.8, -r 0.2 | -c: +r 0.2, -r 0.8

P(W | S, R):  +s, +r: +w 0.99, -w 0.01
              +s, -r: +w 0.90, -w 0.10
              -s, +r: +w 0.90, -w 0.10
              -s, -r: +w 0.01, -w 0.99

Samples:

+c, +s, +r, +w
…



Likelihood Weighting

IN: evidence instantiation
w = 1.0
for i = 1, 2, …, n
    if Xi is an evidence variable
        Xi = observation xi for Xi
        Set w = w * P(xi | Parents(Xi))
    else
        Sample xi from P(Xi | Parents(Xi))
return (x1, x2, …, xn), w
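The procedure instantiated for evidence S = +s, W = +w on the sprinkler network: evidence variables are fixed and contribute factors to the weight, the rest are sampled (an illustrative sketch, names are my own):

```python
import random

def bernoulli(p):
    return random.random() < p

def weighted_sample():
    """One sample with S = +s and W = +w fixed; returns (sample, weight)."""
    weight = 1.0
    c = bernoulli(0.5)                  # sample C from its prior
    s = True                            # evidence: fix S = +s
    weight *= 0.1 if c else 0.5         # multiply in P(+s | c)
    r = bernoulli(0.8 if c else 0.2)    # sample R given c
    w = True                            # evidence: fix W = +w
    weight *= 0.99 if r else 0.90       # multiply in P(+w | +s, r)
    return (c, s, r, w), weight

def lw_estimate(n):
    """Weighted estimate of P(+c | +s, +w)."""
    num = den = 0.0
    for _ in range(n):
        (c, _, _, _), weight = weighted_sample()
        den += weight
        if c:
            num += weight
    return num / den

# Exact value for comparison:
# P(+c, +s, +w) = 0.5 * 0.1 * (0.8*0.99 + 0.2*0.90) = 0.0486
# P(-c, +s, +w) = 0.5 * 0.5 * (0.2*0.99 + 0.8*0.90) = 0.2295
# P(+c | +s, +w) = 0.0486 / 0.2781 ≈ 0.175
```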


Likelihood Weighting

Sampling distribution if z sampled and e fixed evidence:

S_WS(z, e) = ∏_i P(z_i | Parents(Z_i))

Now, samples have weights:

w(z, e) = ∏_i P(e_i | Parents(E_i))

Together, the weighted sampling distribution is consistent:

S_WS(z, e) · w(z, e) = ∏_i P(z_i | Parents(Z_i)) · ∏_i P(e_i | Parents(E_i)) = P(z, e)



Likelihood Weighting

Likelihood weighting is good

We have taken evidence into account as we generate the sample

E.g. here, W’s value will get picked based on the evidence values of S, R

More of our samples will reflect the state of the world suggested by the evidence

Likelihood weighting doesn’t solve all our problems

Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)

We would like to consider evidence when we sample every variable

Gibbs sampling


Gibbs Sampling


Gibbs Sampling

Procedure: keep track of a full instantiation x1, x2,…, xn.

Start with an arbitrary instantiation consistent with the evidence.

Sample one variable at a time, conditioned on all the rest, but keep evidence fixed.

Keep repeating this for a long time.

Property: in the limit of repeating this infinitely many times, the resulting sample comes from the correct distribution

Rationale: both upstream and downstream variables condition on evidence.

In contrast: likelihood weighting only conditions on upstream evidence, and hence the weights obtained in likelihood weighting can sometimes be very small.

Sum of weights over all samples is indicative of how many "effective" samples were obtained, so we want high weights.


Gibbs Sampling Example: P(S | +r)

Step 1: Fix evidence

R = +r

Step 2: Initialize other variables

Randomly

Step 3: Repeat

Choose a non-evidence variable X

Resample X from P( X | all other variables)
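A sketch of these steps for P(S | +r) on the sprinkler network; each resampling conditional is computed from the CPTs by normalizing over the two values of the chosen variable (an illustrative implementation, not the only way to schedule the updates):

```python
import random

def bernoulli(p):
    return random.random() < p

# CPT entries for the sprinkler network, keyed by parent values (True is +).
P_S = {True: 0.1, False: 0.5}                       # P(+s | C)
P_R = {True: 0.8, False: 0.2}                       # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(+w | S, R)

def gibbs_s_given_r(n_steps):
    """Estimate P(+s | +r): R stays fixed, C, S, W are resampled in turn."""
    r = True                                                  # Step 1: fix evidence
    c, s, w = bernoulli(0.5), bernoulli(0.5), bernoulli(0.5)  # Step 2: random init
    count_s = 0
    for _ in range(n_steps):                                  # Step 3: repeat
        # Resample C from P(C | s, +r) ∝ P(C) P(s | C) P(+r | C)
        pc  = 0.5 * (P_S[True] if s else 1 - P_S[True]) * P_R[True]
        pnc = 0.5 * (P_S[False] if s else 1 - P_S[False]) * P_R[False]
        c = bernoulli(pc / (pc + pnc))
        # Resample S from P(S | c, +r, w) ∝ P(S | c) P(w | S, +r)
        ps  = P_S[c] * (P_W[(True, r)] if w else 1 - P_W[(True, r)])
        pns = (1 - P_S[c]) * (P_W[(False, r)] if w else 1 - P_W[(False, r)])
        s = bernoulli(ps / (ps + pns))
        # Resample W directly from P(W | s, +r)
        w = bernoulli(P_W[(s, r)])
        count_s += s
    return count_s / n_steps

# Exact value for comparison:
# P(+s, +r) = 0.5*0.1*0.8 + 0.5*0.5*0.2 = 0.09; P(+r) = 0.5
# P(+s | +r) = 0.09 / 0.5 = 0.18
```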

[Figure: eight successive Gibbs states over C, S, W, with R fixed to +r throughout]


Gibbs Sampling

How is this better than sampling from the full joint?

In a Bayes' Net, sampling a variable given all the other variables (e.g. P(R | S, C, W)) is usually much easier than sampling from the full joint distribution

Only requires a join on the variable to be sampled (in this case, a join on R)

The resulting factor only depends on the variable's parents, its children, and its children's parents (this is often referred to as its Markov blanket)


Efficient Resampling of One Variable

Sample from P(S | +c, +r, -w)

P(S | +c, +r, -w) ∝ P(S | +c) · P(-w | S, +r)

Many things cancel out – only CPTs with S remain!

More generally: only CPTs that mention the resampled variable need to be considered, and joined together
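The cancellation can be checked numerically; only P(S | +c) and P(-w | S, +r) survive, since P(+c) and P(+r | +c) are constant in S and vanish on normalization (variable names are illustrative):

```python
p_s_given_c = 0.1                    # P(+s | +c)
p_notw = {True: 0.01, False: 0.10}   # P(-w | S, +r) for S = +s and S = -s

unnorm_s     = p_s_given_c * p_notw[True]          # 0.1 * 0.01 = 0.001
unnorm_not_s = (1 - p_s_given_c) * p_notw[False]   # 0.9 * 0.10 = 0.090
p_s = unnorm_s / (unnorm_s + unnorm_not_s)
# p_s == 0.001 / 0.091 ≈ 0.0110, i.e. P(+s | +c, +r, -w)
```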



Bayes’ Net Sampling Summary

Prior Sampling: P(Q)

Rejection Sampling: P(Q | e)

Likelihood Weighting: P(Q | e)

Gibbs Sampling: P(Q | e)


Further Reading on Gibbs Sampling*

Gibbs sampling produces samples from the query distribution P(Q | e) in the limit of re-sampling infinitely often

Gibbs sampling is a special case of a more general family of methods called Markov chain Monte Carlo (MCMC) methods

Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings)

You may read about Monte Carlo methods – they're just sampling
