
Page 1

22c:145 Artificial Intelligence

Bayesian Networks

• Reading: Ch 14. Russell & Norvig

Page 2

Review of Probability Theory

• Random variables
• The probability that a random variable X has value val is written as P(X = val)
• P: domain → [0, 1]
  – Sums to 1 over the domain:
    » P(Raining = true) = P(Raining) = 0.2
    » P(Raining = false) = P(¬Raining) = 0.8

• Joint distribution: P(X1, X2, …, Xn)

• Assigns a probability to every combination of values of the random variables, and so provides complete information about them.

• A JPD table for n random variables, each ranging over k distinct values, has k^n entries!

             Toothache   ¬Toothache
  Cavity     0.04        0.06
  ¬Cavity    0.01        0.89
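
A quick sanity check in Python (an added sketch, not part of the original slides): store this JPD as a dictionary and sum out variables to recover marginals.

    # Joint distribution over (Cavity, Toothache) from the table above.
    # Keys are (cavity, toothache) truth values; the four entries sum to 1.
    joint = {
        (True, True): 0.04, (True, False): 0.06,
        (False, True): 0.01, (False, False): 0.89,
    }

    # P(Cavity): sum over both values of Toothache.
    p_cavity = sum(p for (c, t), p in joint.items() if c)       # 0.04 + 0.06 = 0.10
    # P(Toothache): sum over both values of Cavity.
    p_toothache = sum(p for (c, t), p in joint.items() if t)    # 0.04 + 0.01 = 0.05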

Page 3

Review of Probability Theory

• Conditioning
  P(A) = P(A | B) P(B) + P(A | ¬B) P(¬B)
       = P(A ∧ B) + P(A ∧ ¬B)

• A and B are independent iff
  P(A ∧ B) = P(A) · P(B)
  P(A | B) = P(A)
  P(B | A) = P(B)

• A and B are conditionally independent given C iff
  P(A | B, C) = P(A | C)
  P(B | A, C) = P(B | C)
  P(A ∧ B | C) = P(A | C) · P(B | C)

• Bayes’ Rule
  P(A | B) = P(B | A) P(A) / P(B)
  P(A | B, C) = P(B | A, C) P(A | C) / P(B | C)
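
These identities can be verified numerically against the toothache JPD above; this added sketch reuses the joint dictionary from the previous example.

    # Conditioning: P(C) = P(C ∧ T) + P(C ∧ ¬T).
    p_c = joint[(True, True)] + joint[(True, False)]            # 0.10
    p_t = joint[(True, True)] + joint[(False, True)]            # 0.05

    # Bayes' rule: P(C | T) = P(T | C) P(C) / P(T).
    p_c_given_t = joint[(True, True)] / p_t                     # 0.04 / 0.05 = 0.8
    p_t_given_c = joint[(True, True)] / p_c                     # 0.04 / 0.10 = 0.4
    assert abs(p_c_given_t - p_t_given_c * p_c / p_t) < 1e-12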

Page 4

Bayesian Networks

• To do probabilistic reasoning, you need to know the joint probability distribution

• But in a domain with N propositional variables, one needs 2^N numbers to specify the joint probability distribution

• We want to exploit independences in the domain
• Two components: structure and numerical parameters

Page 5

Bayesian networks

• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

• Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (link ≈ "directly influences")
  • a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

• In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values

Page 6

Bayesian (Belief) Networks

• A set of random variables, each with a finite set of values

• A set of directed arcs between them forming an acyclic graph, representing causal relations

• Every node A with parents B1, …, Bn has P(A | B1, …, Bn) specified
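
As a concrete illustration (an added sketch, using the icy-roads numbers that appear on later slides), a discrete network can be stored as a map from each variable to its parent list and CPT:

    # Each Boolean variable maps to its parents and a CPT giving
    # P(var = true) for every tuple of parent values.
    network = {
        "Icy":    {"parents": [],      "cpt": {(): 0.7}},
        "Watson": {"parents": ["Icy"], "cpt": {(True,): 0.8, (False,): 0.1}},
        "Holmes": {"parents": ["Icy"], "cpt": {(True,): 0.8, (False,): 0.1}},
    }

    def prob(var, value, assignment):
        # P(var = value | parent values taken from assignment)
        node = network[var]
        key = tuple(assignment[p] for p in node["parents"])
        p_true = node["cpt"][key]
        return p_true if value else 1.0 - p_true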

Page 7

Key Advantage

• The conditional independencies (missing arrows) mean that we can store and compute the joint probability distribution more efficiently

How to design a belief network?
• Explore the causal relations

Page 8

Icy Roads

“Causal” Component

[Network: Icy → Holmes Crash, Icy → Watson Crash]

Inspector Smith is waiting for Holmes and Watson, who are driving (separately) to meet him. It is winter. His secretary tells him that Watson has had an accident. He says, “It must be that the roads are icy. I bet that Holmes will have an accident too. I should go to lunch.” But, his secretary says, “No, the roads are not icy, look at the window.” So, he says, “I guess I better wait for Holmes.”

H and W are dependent, but conditionally independent given I.

Page 13

Holmes and Watson in IA

Holmes and Watson have moved to IA. One morning Holmes wakes up to find his lawn wet. He wonders whether it has rained or whether he left his sprinkler on. He looks at his neighbor Watson’s lawn and sees that it is wet too. So he concludes it must have rained.

[Network: Rain → Holmes Lawn Wet, Rain → Watson Lawn Wet, Sprinkler → Holmes Lawn Wet]

Given W, P(R) goes up and P(S) goes down – “explaining away”.
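
The slides give no CPT numbers for this network, so the following added sketch uses assumed (purely illustrative) values to show the explaining-away effect numerically:

    # Assumed CPTs (NOT from the slides): P(R) = 0.2, P(S) = 0.1.
    # Holmes' lawn (H) depends on Rain and Sprinkler; Watson's lawn (W) on Rain only.
    P_R, P_S = 0.2, 0.1
    P_H = {(True, True): 1.0, (True, False): 1.0,
           (False, True): 0.9, (False, False): 0.05}
    P_W = {True: 1.0, False: 0.2}

    def posterior(observe_w):
        # Enumerate (r, s) weighted by the evidence H = true (and W = true if observed);
        # return (P(R | evidence), P(S | evidence)).
        num_r = num_s = total = 0.0
        for r in (True, False):
            for s in (True, False):
                p = (P_R if r else 1 - P_R) * (P_S if s else 1 - P_S) * P_H[(r, s)]
                if observe_w:
                    p *= P_W[r]
                total += p
                num_r += p * r
                num_s += p * s
        return num_r / total, num_s / total

    print(posterior(False))  # P(R|H) ≈ 0.65,  P(S|H) ≈ 0.30
    print(posterior(True))   # P(R|H,W) ≈ 0.90, P(S|H,W) ≈ 0.16: R up, S down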

Page 19

Inference in Bayesian Networks

Query Types

Given a Bayesian network, what questions might we want to ask?

• Conditional probability query: P(x | e)
• Maximum a posteriori (MAP) query: what value of x maximizes P(x | e)?

General question: What’s the whole probability distribution over variable X given evidence e, P(X | e)?

Page 20

Using the joint distribution

To answer any query involving a conjunction of variables, sum over the variables not involved in the query.

Pr(d) = Σ_{A,B,C} Pr(a, b, c, d)
      = Σ_{a∈dom(A)} Σ_{b∈dom(B)} Σ_{c∈dom(C)} Pr(A=a ∧ B=b ∧ C=c ∧ d)

Pr(d | b) = Pr(b, d) / Pr(b)
          = [Σ_{A,C} Pr(a, b, c, d)] / [Σ_{A,C,D} Pr(a, b, c, d)]
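
A direct implementation of this idea, as an added sketch over a toy joint table (the uniform numbers are placeholders; any joint over four Booleans works):

    import itertools

    vals = (True, False)
    # Toy joint Pr(a, b, c, d): here uniform, 1/16 per assignment.
    joint_abcd = {k: 1 / 16 for k in itertools.product(vals, repeat=4)}

    def pr_d(d):
        # Pr(d) = sum over a, b, c of Pr(a, b, c, d)
        return sum(joint_abcd[(a, b, c, d)]
                   for a, b, c in itertools.product(vals, repeat=3))

    def pr_d_given_b(d, b):
        # Pr(d | b) = Pr(b, d) / Pr(b)
        num = sum(joint_abcd[(a, b, c, d)]
                  for a, c in itertools.product(vals, repeat=2))
        den = sum(joint_abcd[(a, b, c, dd)]
                  for a, c, dd in itertools.product(vals, repeat=3))
        return num / den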

Page 22

Chain Rule

• Variables: V1, …, Vn

• Values: v1, …, vn

• P(V1=v1, V2=v2, …, Vn=vn) = ∏_i P(Vi=vi | parents(Vi))

[Network: A → C, B → C, C → D, with CPTs P(A), P(B), P(C|A,B), P(D|C)]

P(ABCD) = P(A=true, B=true, C=true, D=true)

P(ABCD) = P(D|ABC) P(ABC)            (chain rule)
        = P(D|C) P(ABC)              (D independent of A and B given C)
        = P(D|C) P(C|AB) P(AB)
        = P(D|C) P(C|AB) P(A) P(B)   (A independent of B)
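
In code, the factorization turns the joint into a product of local CPT lookups; an added sketch with hypothetical CPT numbers (the slides give none for this network):

    # Hypothetical CPTs for the network A → C ← B, C → D (values made up).
    P_A, P_B = 0.3, 0.6
    P_C = {(True, True): 0.9, (True, False): 0.5,
           (False, True): 0.4, (False, False): 0.1}
    P_D = {True: 0.7, False: 0.2}

    def joint_prob(a, b, c, d):
        # P(A=a, B=b, C=c, D=d) = P(a) P(b) P(c | a, b) P(d | c)
        pa = P_A if a else 1 - P_A
        pb = P_B if b else 1 - P_B
        pc = P_C[(a, b)] if c else 1 - P_C[(a, b)]
        pd = P_D[c] if d else 1 - P_D[c]
        return pa * pb * pc * pd

    print(joint_prob(True, True, True, True))  # 0.3 * 0.6 * 0.9 * 0.7 = 0.1134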

Page 29

Icy Roads with Numbers

[Network: Icy → Holmes Crash, Icy → Watson Crash]

P(I=t)   P(I=f)
0.7      0.3

       P(W=t | I)   P(W=f | I)
I=t    0.8          0.2
I=f    0.1          0.9

       P(H=t | I)   P(H=f | I)
I=t    0.8          0.2
I=f    0.1          0.9

(t = true, f = false)

The right-hand column in these tables is redundant, since the entries in each row must add to 1. Note: the columns need NOT add to 1.

Page 32

Probability that Watson Crashes

[Network as above: P(I) = 0.7; P(H|I) = 0.8, P(H|¬I) = 0.1; P(W|I) = 0.8, P(W|¬I) = 0.1]

P(W) = P(W|I) P(I) + P(W|¬I) P(¬I)
     = 0.8 · 0.7 + 0.1 · 0.3
     = 0.56 + 0.03
     = 0.59
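
The same number falls out of a one-line enumeration (added check):

    # P(W) = Σ_I P(W | I) P(I), with the numbers from the tables above.
    P_I = 0.7
    P_W_given_I = {True: 0.8, False: 0.1}
    p_w = sum(P_W_given_I[i] * (P_I if i else 1 - P_I) for i in (True, False))
    print(p_w)  # 0.8*0.7 + 0.1*0.3 = 0.59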

Page 34

Probability of Icy given Watson

[Network and numbers as above]

P(I | W) = P(W | I) P(I) / P(W)
         = 0.8 · 0.7 / 0.59
         ≈ 0.95

We started with P(I) = 0.7; knowing that Watson crashed raises the probability to 0.95.

Page 36

Probability of Holmes given Watson

[Network and numbers as above]

P(H | W) = P(H, I | W) + P(H, ¬I | W)
         = P(H | W, I) P(I | W) + P(H | W, ¬I) P(¬I | W)
         = P(H | I) P(I | W) + P(H | ¬I) P(¬I | W)    (H and W conditionally independent given I)
         = 0.8 · 0.95 + 0.1 · 0.05
         = 0.765

We started with P(H) = 0.59; knowing that Watson crashed raises the probability to 0.765.
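
An added check of both posteriors, continuing the sketch above:

    P_I = 0.7
    P_W_given_I = {True: 0.8, False: 0.1}   # P(W = t | I)
    P_H_given_I = {True: 0.8, False: 0.1}   # P(H = t | I)

    p_w = sum(P_W_given_I[i] * (P_I if i else 1 - P_I) for i in (True, False))
    p_i_given_w = P_W_given_I[True] * P_I / p_w             # Bayes: 0.56/0.59 ≈ 0.949
    p_h_given_w = (P_H_given_I[True] * p_i_given_w
                   + P_H_given_I[False] * (1 - p_i_given_w))
    print(p_i_given_w, p_h_given_w)  # ≈ 0.949 and ≈ 0.764 (the slides round to 0.95, 0.765)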

Page 38

Prob of Holmes given Icy and Watson

[Network and numbers as above]

P(H | W, ¬I) = P(H | ¬I) = 0.1

H and W are conditionally independent given I: once the state of the roads is known, Watson’s crash tells us nothing more about Holmes.

Page 39

Example

• Topology of network encodes conditional independence assertions:

• Weather is independent of the other variables
• Toothache and Catch are conditionally independent given Cavity

[Network: Weather (no links); Cavity → Toothache, Cavity → Catch]

Page 40

Example

• I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?

• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

• Network topology reflects "causal" knowledge:
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call

Page 41

Example contd.

[Figure: the burglary network with its CPTs (see Russell & Norvig, Ch. 14)]

Page 42

Compactness

• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values

• Each row requires one number p for Xi = true (the number for Xi = false is just 1−p)

• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers

• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
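
The parameter count follows directly from the parent sets (added two-line check, parent lists as in the burglary network):

    # One free CPT parameter per row: 2**(number of parents) rows per Boolean node.
    parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    print(sum(2 ** len(ps) for ps in parents.values()))  # 1+1+4+2+2 = 10, vs. 2**5 - 1 = 31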

Page 43

Semantics

The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
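
Evaluating this product needs the CPT values from the burglary figure, which did not survive transcription; the numbers below are the standard ones from Russell & Norvig, quoted from memory, so treat them as assumptions:

    # Burglary-network CPTs (assumed, as noted above).
    P_b, P_e = 0.001, 0.002
    P_a = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
    P_j = {True: 0.90, False: 0.05}                       # P(J | A)
    P_m = {True: 0.70, False: 0.01}                       # P(M | A)

    # P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
    print(p)  # ≈ 0.000628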

Page 44

Constructing Bayesian networks

• 1. Choose an ordering of variables X1, …, Xn
• 2. For i = 1 to n
  • add Xi to the network
  • select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees:

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)    (chain rule)
             = ∏_{i=1}^{n} P(Xi | Parents(Xi))     (by construction)

Page 45

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

Page 50

Example contd.

• Deciding conditional independence is hard in noncausal directions
• (Causal models and conditional independence seem hardwired for humans!)
• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Page 51

Exercises

P(J, M, A, B, E) = ?
P(M, A, B) = ?
P(¬M, A, B) = ?
P(A, B) = ?
P(M, B) = ?
P(A | J) = ?
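
All of these can be answered by brute-force enumeration; an added sketch reusing the assumed CPT values from the page 43 example:

    import itertools

    def joint(b, e, a, j, m):
        # P(B=b, E=e, A=a, J=j, M=m) via the chain rule.
        pb = P_b if b else 1 - P_b
        pe = P_e if e else 1 - P_e
        pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
        pj = P_j[a] if j else 1 - P_j[a]
        pm = P_m[a] if m else 1 - P_m[a]
        return pb * pe * pa * pj * pm

    def query(**fixed):
        # Sum the joint over all assignments consistent with the fixed values.
        names = ("b", "e", "a", "j", "m")
        return sum(joint(**dict(zip(names, combo)))
                   for combo in itertools.product((True, False), repeat=5)
                   if all(dict(zip(names, combo))[k] == v for k, v in fixed.items()))

    print(query(j=True, m=True, a=True, b=True, e=True))   # P(J, M, A, B, E)
    print(query(m=True, a=True, b=True))                   # P(M, A, B)
    print(query(a=True, j=True) / query(j=True))           # P(A | J)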

Page 52

Summary

• Bayesian networks provide a natural representation for (causally induced) conditional independence

• Topology + CPTs = compact representation of joint distribution

• Generally easy for domain experts to construct