Announcements
• Practice exercises on Bnets: 6a, 6b, and 6c
  – Reminder: they are helpful for staying on top of the material, and for studying for the exam
• I will post practice material for the final in a dedicated folder on Connect
• Keep an eye on Connect and the class schedule for info on new office hours for next week and the week after
• Remember to fill out student evaluations! Your feedback is invaluable
Example factor over three binary variables (for each value of X and Y, the entries over Z sum to 1, so it can represent P(Z | X, Y)):

X Y Z | val
t t t | 0.1
t t f | 0.9
t f t | 0.2
t f f | 0.8
f t t | 0.4
f t f | 0.6
f f t | 0.3
f f f | 0.7
Factors
• A factor f(X1, …, Xj) is a function from a tuple of random variables X1, …, Xj to the real numbers R
• A factor denotes one or more (possibly partial) distributions over the given tuple of variables, e.g.,
  – P(X1, X2) is a factor f(X1, X2): a single distribution
  – P(Z | X, Y) is a factor f(Z, X, Y): a set of distributions, one for each combination of values for X and Y
  – P(Z=f | X, Y) is a factor f(X, Y): a set of partial distributions
• Note: factors do not have to sum to one
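To make this concrete, here is a minimal sketch of the example factor above in Python, using a dict-based representation of our own (not a standard library API):

```python
# A factor is a table: a tuple of variables plus a value for every
# assignment. This encodes the f(X, Y, Z) table above; for each (X, Y),
# the two Z-entries sum to 1, so it can represent P(Z | X, Y).
factor = {
    "vars": ("X", "Y", "Z"),
    "table": {
        (True, True, True): 0.1,   (True, True, False): 0.9,
        (True, False, True): 0.2,  (True, False, False): 0.8,
        (False, True, True): 0.4,  (False, True, False): 0.6,
        (False, False, True): 0.3, (False, False, False): 0.7,
    },
}

# All entries together sum to 4.0, not 1.0: factors need not normalize.
print(sum(factor["table"].values()))  # 4.0
```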
Recap
• If we assign variable A=a in factor f(A,B), what is the correct form for the resulting factor?
  – f(B). When we assign variable A, we remove it from the factor's domain.
• If we marginalize variable A out of factor f(A,B), what is the correct form for the resulting factor?
  – f(B). When we marginalize out variable A, we remove it from the factor's domain.
• If we multiply factors f4(X,Y) and f6(Z,Y), what is the correct form for the resulting factor?
  – f(X,Y,Z). When multiplying factors, the resulting factor's domain is the union of the multiplicands' domains.
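These three operations are short to write down. A sketch using the dict-based factors from the previous example; the helper names `assign`, `sum_out`, and `multiply` are ours:

```python
def assign(f, var, value):
    """Restrict factor f to var=value; var leaves the factor's domain."""
    i = f["vars"].index(var)
    return {"vars": f["vars"][:i] + f["vars"][i+1:],
            "table": {k[:i] + k[i+1:]: v
                      for k, v in f["table"].items() if k[i] == value}}

def sum_out(f, var):
    """Marginalize var out of factor f; var leaves the factor's domain."""
    i = f["vars"].index(var)
    table = {}
    for k, v in f["table"].items():
        key = k[:i] + k[i+1:]
        table[key] = table.get(key, 0.0) + v
    return {"vars": f["vars"][:i] + f["vars"][i+1:], "table": table}

def multiply(f, g):
    """Pointwise product; the result's domain is the union of domains."""
    shared = [v for v in f["vars"] if v in g["vars"]]
    extra = [v for v in g["vars"] if v not in f["vars"]]
    table = {}
    for kf, vf in f["table"].items():
        for kg, vg in g["table"].items():
            # combine only rows that agree on the shared variables
            if all(kf[f["vars"].index(s)] == kg[g["vars"].index(s)]
                   for s in shared):
                key = kf + tuple(kg[g["vars"].index(e)] for e in extra)
                table[key] = vf * vg
    return {"vars": f["vars"] + tuple(extra), "table": table}
```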
The variable elimination algorithm

To compute P(Y=yi | E1=e1, …, Ej=ej), we are given P(Y, E1, …, Ej, Z1, …, Zk), where E1, …, Ej are the observed variables and Z1, …, Zk are the other variables not involved in the query.

The JPD of a Bayesian network is

P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | pa(Xi)),  where each P(Xi | pa(Xi)) is a factor f_i(Xi, pa(Xi))

so

P(Y, E1=e1, …, Ej=ej) = Σ_{Zk} ⋯ Σ_{Z1} Π_{i=1}^{n} f_i

1. Construct a factor for each conditional probability.
2. For each factor, assign the observed variables E to their observed values.
3. Given an elimination ordering, decompose the sum of products.
4. Sum out all variables Zi not involved in the query (one at a time):
   • Multiply the factors containing Zi
   • Then marginalize out Zi from the product
5. Multiply the remaining factors (which only involve Y).
6. Normalize by dividing the resulting factor f(Y) by Σ_{y ∈ dom(Y)} f(y), i.e.,

P(Y=yi | E1=e1, …, Ej=ej) = P(Y=yi, E1=e1, …, Ej=ej) / Σ_{y' ∈ dom(Y)} P(Y=y', E1=e1, …, Ej=ej)
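Putting the steps together, a minimal VE driver under the same representation might look like this (an illustrative sketch, not the course's reference implementation; it reuses the assign, sum_out, and multiply helpers above):

```python
from functools import reduce

def variable_elimination(factors, evidence, ordering):
    """Posterior over the query variable Y.

    factors:  list of dict-based factors, one per conditional probability
    evidence: dict {variable: observed value}
    ordering: elimination ordering over the hidden variables Z1..Zk
    """
    # Step 2: assign the observed variables to their observed values.
    for var, val in evidence.items():
        factors = [assign(f, var, val) if var in f["vars"] else f
                   for f in factors]
    # Steps 3-4: sum out each hidden variable, one at a time.
    for z in ordering:
        with_z = [f for f in factors if z in f["vars"]]
        if with_z:
            factors = [f for f in factors if z not in f["vars"]]
            factors.append(sum_out(reduce(multiply, with_z), z))
    # Step 5: multiply the remaining factors (they only involve Y).
    result = reduce(multiply, factors)
    # Step 6: normalize by the sum over Y's domain.
    total = sum(result["table"].values())
    return {"vars": result["vars"],
            "table": {k: v / total for k, v in result["table"].items()}}
```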
Variable elimination example
P(G,H) = Σ_{A,B,C,D,E,F,I} P(A,B,C,D,E,F,G,H,I) =
= Σ_{A,B,C,D,E,F,I} P(A) P(B|A) P(C) P(D|B,C) P(E|C) P(F|D) P(G|F,E) P(H|G) P(I|G)
Compute P(G | H=h1 ).
Step 1: Construct a factor for each cond. probability
P(G,H) = Σ_{A,B,C,D,E,F,I} P(A) P(B|A) P(C) P(D|B,C) P(E|C) P(F|D) P(G|F,E) P(H|G) P(I|G)
P(G,H) = Σ_{A,B,C,D,E,F,I} f0(A) f1(B,A) f2(C) f3(D,B,C) f4(E,C) f5(F,D) f6(G,F,E) f7(H,G) f8(I,G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
Compute P(G | H=h1 ).
Previous state:
P(G,H) = Σ_{A,B,C,D,E,F,I} f0(A) f1(B,A) f2(C) f3(D,B,C) f4(E,C) f5(F,D) f6(G,F,E) f7(H,G) f8(I,G)
Observe H = h1 (assigning H=h1 in f7(H,G) yields the new factor f9(G)):
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
Step 2: assign the observed variables to their observed values.
P(G,H=h1) = Σ_{A,B,C,D,E,F,I} f0(A) f1(B,A) f2(C) f3(D,B,C) f4(E,C) f5(F,D) f6(G,F,E) f9(G) f8(I,G)
Compute P(G | H=h1 ).
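With the assign helper sketched earlier, this observation step is a single call; the numbers in f7 are hypothetical:

```python
# Hypothetical CPT P(H | G), stored as the factor f7(H, G).
f7 = {"vars": ("H", "G"),
      "table": {("h1", "g1"): 0.7, ("h1", "g2"): 0.1,
                ("h2", "g1"): 0.3, ("h2", "g2"): 0.9}}

# Observing H = h1 drops H from the domain, leaving f9(G).
f9 = assign(f7, "H", "h1")
print(f9)  # {'vars': ('G',), 'table': {('g1',): 0.7, ('g2',): 0.1}}
```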
Step 3: Decompose sum of products
Previous state: P(G,H=h1) = Σ_{A,B,C,D,E,F,I} f0(A) f1(B,A) f2(C) f3(D,B,C) f4(E,C) f5(F,D) f6(G,F,E) f9(G) f8(I,G)
Elimination ordering A, C, E, I, B, D, F:
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B Σ_I f8(I,G) Σ_E f6(G,F,E) Σ_C f2(C) f3(D,B,C) f4(E,C) Σ_A f0(A) f1(B,A)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
Compute P(G | H=h1 ).
Step 4: sum out non-query variables (one at a time)
Previous state:
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B Σ_I f8(I,G) Σ_E f6(G,F,E) Σ_C f2(C) f3(D,B,C) f4(E,C) Σ_A f0(A) f1(B,A)
Eliminate A: perform the product and sum out A in Σ_A f0(A) f1(B,A), obtaining a new factor f10(B):
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) Σ_I f8(I,G) Σ_E f6(G,F,E) Σ_C f2(C) f3(D,B,C) f4(E,C)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Elimination order: A, C, E, I, B, D, F. Compute P(G | H=h1).
f10(B) does not depend on C, E, or I, so we can push it outside of those sums.
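In code, each elimination step is one multiply followed by one sum_out, reusing the helpers sketched earlier; the numbers in f0 and f1 are hypothetical:

```python
# f0(A) = P(A) and f1(B, A) = P(B | A), with made-up entries.
f0 = {"vars": ("A",),
      "table": {("a1",): 0.4, ("a2",): 0.6}}
f1 = {"vars": ("B", "A"),
      "table": {("b1", "a1"): 0.9, ("b1", "a2"): 0.2,
                ("b2", "a1"): 0.1, ("b2", "a2"): 0.8}}

# f10(B) = sum over A of f0(A) f1(B, A)
f10 = sum_out(multiply(f0, f1), "A")
print(f10["table"])  # ≈ {('b1',): 0.48, ('b2',): 0.52}
```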
Step 4: sum out non-query variables (one at a time)
Previous state:
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) Σ_I f8(I,G) Σ_E f6(G,F,E) Σ_C f2(C) f3(D,B,C) f4(E,C)
Eliminate C: perform the product and sum out C in Σ_C f2(C) f3(D,B,C) f4(E,C), obtaining a new factor f11(B,D,E):
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) Σ_I f8(I,G) Σ_E f6(G,F,E) f11(B,D,E)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Compute P(G | H=h1 ). Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
Step 4: sum out non-query variables (one at a time)
Previous state:
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) Σ_I f8(I,G) Σ_E f6(G,F,E) f11(B,D,E)
Eliminate E: perform the product and sum out E in Σ_E f6(G,F,E) f11(B,D,E), obtaining a new factor f12(B,D,F,G):
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) f12(B,D,F,G) Σ_I f8(I,G)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Compute P(G | H=h1 ). Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
• f12(B,D,F,G)
Previous state:
P(G,H=h1) = f9(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) f12(B,D,F,G) Σ_I f8(I,G)
Eliminate I: perform the product and sum out I in Σ_I f8(I,G), obtaining a new factor f13(G):
P(G,H=h1) = f9(G) f13(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) f12(B,D,F,G)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
• f12(B,D,F,G)
• f13(G)
Step 4: sum out non-query variables (one at a time)
Compute P(G | H=h1 ).
Previous state:
P(G,H=h1) = f9(G) f13(G) Σ_F Σ_D f5(F,D) Σ_B f10(B) f12(B,D,F,G)
Eliminate B: perform the product and sum out B in Σ_B f10(B) f12(B,D,F,G), obtaining a new factor f14(D,F,G):
P(G,H=h1) = f9(G) f13(G) Σ_F Σ_D f5(F,D) f14(D,F,G)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
• f12(B,D,F,G)
• f13(G)
• f14(D,F,G)
Step 4: sum out non-query variables (one at a time)
Compute P(G | H=h1 ).
Previous state:
P(G,H=h1) = f9(G) f13(G) Σ_F Σ_D f5(F,D) f14(D,F,G)
Eliminate D: perform the product and sum out D in Σ_D f5(F,D) f14(D,F,G), obtaining a new factor f15(F,G):
P(G,H=h1) = f9(G) f13(G) Σ_F f15(F,G)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
• f12(B,D,F,G)
• f13(G)
• f14(D,F,G)
• f15(F,G)
Step 4: sum out non-query variables (one at a time)
Compute P(G | H=h1 ).
Previous state:
P(G,H=h1) = f9(G) f13(G) Σ_F f15(F,G)
Eliminate F: perform the product and sum out F in Σ_F f15(F,G), obtaining a new factor f16(G):
P(G,H=h1) = f9(G) f13(G) f16(G)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
• f12(B,D,F,G)
• f13(G)
• f14(D,F,G)
• f15(F,G)
• f16(G)
Step 4: sum out non-query variables (one at a time)
Compute P(G | H=h1 ).
Step 5: Multiply remaining factors
Previous state:
P(G,H=h1) = f9(G) f13(G) f16(G)
Multiply remaining factors (all in G):
P(G,H=h1) = f17(G)
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Compute P(G | H=h1 ). Elimination order: A,C,E,I,B,D,F
• f11(B,D,E)
• f12(B,D,F,G)
• f13(G)
• f14(D,F,G)
• f15(F,G)
• f16(G)
• f17(G)
Step 6: Normalize
• f9(G)
• f0(A)
• f1(B,A)
• f2(C)
• f3(D,B,C)
• f4(E,C)
• f5(F, D)
• f6(G,F,E)
• f7(H,G)
• f8(I,G)
• f10(B)
Compute P(G | H=h1 ).
• f11(B,D,E)
• f12(B,D,F,G)
• f13(G)
• f14(D,F,G)
• f15(F,G)
• f16(G)
• f17(G)
P(G=g | H=h1) = P(G=g, H=h1) / P(H=h1)
              = P(G=g, H=h1) / Σ_{g' ∈ dom(G)} P(G=g', H=h1)
              = f17(g) / Σ_{g' ∈ dom(G)} f17(g')
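In code, normalization just divides each entry of f17 by the total (illustrative numbers):

```python
# Hypothetical unnormalized result: f17(G) = P(G, H=h1).
f17 = {"vars": ("G",),
       "table": {("g1",): 0.06, ("g2",): 0.14}}

total = sum(f17["table"].values())  # = P(H=h1) = 0.20
posterior = {g: v / total for g, v in f17["table"].items()}
print(posterior)  # ≈ {('g1',): 0.3, ('g2',): 0.7}
```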
VE and conditional independence
• So far, we haven't used conditional independence!
  – Before running VE, we can prune all variables Z that are conditionally independent of the query Y given evidence E: Z ╨ Y | E
  – They cannot change the belief over Y given E!
• Example: which variables can we prune for the query P(G=g | C=c1, F=f1, H=h1)?
  a. A, B, D
  b. D, E
  c. A, B, D, E
  d. None
VE and conditional independence
• Before running VE, we can prune all variables Z that are conditionally independent of the query Y given evidence E: Z ╨ Y | E
• They cannot change the belief over Y given E!
• Example: which variables can we prune for the query P(G=g | C=c1, F=f1, H=h1)?
  – Answer: a. A, B, and D. Both paths from these nodes to G are blocked:
    • F is an observed node in a chain structure
    • C is an observed common parent
Variable elimination: pruning
Thus, if the query is P(G=g | C=c1, F=f1, H=h1), we only need to consider the corresponding subnetwork.
• We can also prune unobserved leaf nodes
  – Since they are unobserved and not predecessors of the query nodes, they cannot influence the posterior probability of the query nodes
One last trick
• We can also prune unobserved leaf nodes, and we can do so recursively
• E.g., which nodes can we prune if the query is P(A)?
  – Recursively pruning unobserved leaf nodes, we can prune all nodes other than A!
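A sketch of this recursive leaf pruning; the graph encoding (child lists) and the function name are ours:

```python
def prune_leaves(children, keep):
    """Repeatedly remove leaf nodes that are neither query nor evidence
    variables; such nodes cannot affect the posterior over the query."""
    nodes = set(children)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            has_child = any(c in nodes for c in children[n])
            if not has_child and n not in keep:
                nodes.remove(n)  # unobserved leaf: prune it
                changed = True
    return nodes

# Chain A -> B -> C with query P(A) and no evidence:
children = {"A": ["B"], "B": ["C"], "C": []}
print(prune_leaves(children, keep={"A"}))  # {'A'}
```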
VE in AISpace
• To see how variable elimination works in the AIspace applet:
  – Select "Network options -> Query Models > verbose"
  – Compare what happens when you select "Prune Irrelevant variables" or not in the VE window that pops up when you query a node
  – Try different heuristics for elimination ordering
• After assigning L = F, the factor f(A) includes the rows in the original f(A,L) that correspond to L = F
Complexity of Variable Elimination (VE) (not required)
• A factor over n binary variables has to store 2^n numbers
  – The initial factors are typically quite small (variables typically have only a few parents in Bayesian networks)
  – But variable elimination constructs larger factors by multiplying factors together
• The complexity of VE is exponential in the maximum number of variables in any factor during its execution
  – This number is called the treewidth of the graph (along an ordering)
  – The elimination ordering influences the treewidth
• Finding the best ordering, i.e., the one that generates the minimum treewidth, is NP-complete
  – Heuristics work well in practice (e.g., least connected variables first)
  – Even with the best ordering, inference is sometimes infeasible
• In those cases, we need approximate inference. See CS422 & CS540.
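To see how the elimination ordering drives the cost, the sketch below simulates elimination on an undirected (moralized) graph and reports the largest number of variables in any factor created; for binary variables, that factor stores 2^width numbers. The graph and orderings are illustrative:

```python
def max_factor_width(edges, ordering):
    """Simulate elimination on an undirected (moral) graph; return the
    largest number of variables in any factor created along the way."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    width = 0
    for x in ordering:
        neighbors = adj.pop(x, set())
        # Summing out x builds a factor over x and its remaining neighbors.
        width = max(width, len(neighbors) + 1)
        for n in neighbors:
            adj[n].discard(x)
            adj[n] |= neighbors - {n}  # neighbors become interconnected
    return width

# Chain A - B - C: eliminating the middle node first is worse.
edges = [("A", "B"), ("B", "C")]
print(max_factor_width(edges, ["A", "C", "B"]))  # 2
print(max_factor_width(edges, ["B", "A", "C"]))  # 3
```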
Learning Goals For Bnets
• Build a Bayesian network for a given domain
• Identify the necessary CPTs
• Compare different network structures
• Understand dependencies and independencies
• Variable elimination
  – Understanding factors and their operations
  – Carry out variable elimination by using factors and the related operations
  – Use techniques to simplify variable elimination
Big picture: Reasoning Under Uncertainty
[Concept map: Probability Theory underlies Bayesian Networks & Variable Elimination, which extend to Dynamic Bayesian Networks and to Hidden Markov Models & Filtering. Applications: bioinformatics, natural language processing, email spam filters, motion tracking, missile tracking, monitoring (e.g., credit card fraud detection), diagnostic systems (e.g., medicine).]
Where are we?
[Course map. Environment: Deterministic vs. Stochastic; Problem Type: Static vs. Sequential. Static / Query: Constraint Satisfaction (Vars + Constraints; Search, Arc Consistency), Logics (Search), Belief Nets (Variable Elimination). Sequential / Planning: STRIPS (Search), Decision Nets (Variable Elimination).]
This concludes the module on answering queries in stochastic environments
What's Next?
[Same course map as above; the focus moves to Decision Nets with Variable Elimination in the stochastic setting.]
Now we will look at acting in stochastic environments
Decisions Under Uncertainty: Intro
• An agent's decision will depend on
  – What actions are available
  – What beliefs the agent has
  – Which goals the agent has
• Differences between the deterministic and stochastic settings
  – Obvious difference in representation: we need to represent our uncertain beliefs
  – Actions will be pretty straightforward: represented as decision variables
  – Goals will be interesting: we'll move from all-or-nothing goals to a richer notion: rating how happy the agent is in different situations
  – Putting these together, we'll extend Bayesian Networks to make a new representation called Decision Networks
Delivery Robot Example
• The robot needs to reach a certain room
• The robot can go
  – the short way: faster, but with more obstacles, thus more prone to accidents that can damage the robot and prevent it from reaching the room
  – the long way: slower, but less prone to accidents
• Which way to go? Is it more important for the robot to arrive fast, or to minimize the risk of damage?
• The robot can choose to wear pads to protect itself in case of an accident, or not to wear them. Pads make it heavier, increasing energy consumption
• Again, there is a tradeoff between reducing the risk of damage, saving resources, and arriving fast
• Possible outcomes
  – No pads, no accident
  – Pads, no accident
  – Pads, accident
  – No pads, accident
Next
• We'll see how to represent and reason about situations of this nature by using
  – Probability to measure the uncertainty in action outcomes
  – Utility to measure the agent's preferences over the various outcomes
  – Combined into a measure of expected utility that can be used to identify the action with the best expected outcome
• This is the best that an intelligent agent can do when it needs to act in a stochastic environment
Delivery Robot Example
• Decision variable 1: the robot can choose to wear pads (the agent decides)
  – Yes: protection against accidents, but extra weight
  – No: fast, but no protection
• Decision variable 2: the robot can choose the way (the agent decides)
  – Short way: quick, but higher chance of accident
  – Long way: safe, but slow
• Random variable: is there an accident? (chance decides)
Possible worlds and decision variables
• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables, the probabilities of the worlds satisfying that assignment sum to 1.

Wear pads | Which way | Accident | Probability
true      | short     | true     | 0.2
true      | short     | false    | 0.8
true      | long      | true     | 0.01
true      | long      | false    | 0.99
false     | short     | true     | 0.2
false     | short     | false    | 0.8
false     | long      | true     | 0.01
false     | long      | false    | 0.99
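A sketch that enumerates the eight possible worlds from the table above and checks the sum-to-1 property for every decision assignment (the representation is ours; probabilities are taken from the table):

```python
from itertools import product

# P(accident | which_way); wearing pads does not change this chance.
p_accident = {"short": 0.2, "long": 0.01}

# Enumerate all eight possible worlds with their probabilities.
worlds = {}
for pads, way, accident in product([True, False],
                                   ["short", "long"],
                                   [True, False]):
    p = p_accident[way] if accident else 1 - p_accident[way]
    worlds[(pads, way, accident)] = p

# For each assignment to the decision variables, the probabilities of
# the worlds satisfying that assignment sum to 1.
for pads, way in product([True, False], ["short", "long"]):
    total = sum(p for (w_pads, w_way, _), p in worlds.items()
                if (w_pads, w_way) == (pads, way))
    print(pads, way, total)  # 1.0 every time (up to float rounding)
```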
Lecture Overview
• Recap of lecture 32
• VE in AISpace, and refinements
• Intro to DT (decision theory)
• Utility and expected utility
Utility
• Utility: a measure of the desirability of possible worlds to an agent
  – Let U be a real-valued function such that U(w) represents an agent's degree of preference for world w
  – Expressed by a number in [0, 100]
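Utility combines with probability into expected utility, EU(d) = Σ_w P(w | d) U(w), and a rational agent picks the decision with the highest EU. A sketch for the delivery robot, with hypothetical utility numbers and the accident probabilities used earlier:

```python
# Hypothetical utilities U(w) in [0, 100] for (pads, way, accident).
utility = {
    (True, "short", True): 35,   (True, "short", False): 95,
    (True, "long", True): 30,    (True, "long", False): 75,
    (False, "short", True): 3,   (False, "short", False): 100,
    (False, "long", True): 0,    (False, "long", False): 80,
}
p_accident = {"short": 0.2, "long": 0.01}

def expected_utility(pads, way):
    """EU(pads, way) = sum over accident outcomes of P * U."""
    p = p_accident[way]
    return (p * utility[(pads, way, True)]
            + (1 - p) * utility[(pads, way, False)])

# Evaluate every decision and pick the best.
decisions = [(pads, way) for pads in (True, False)
             for way in ("short", "long")]
for d in decisions:
    print(d, expected_utility(*d))
print("best:", max(decisions, key=lambda d: expected_utility(*d)))
```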
Utility for the Robot Example
• What would be a reasonable utility function for our robot?
• What are the best and worst scenarios?
[Table of the eight possible worlds with their probabilities, as above; each world is to be assigned a utility.]