Bayesian Network

CVPR Winter seminar

Jaemin Kim

Page 1:

Bayesian Network

CVPR Winter seminar

Jaemin Kim

Page 2:

Outline

» Concepts in Probability

• Probability
• Random variables
• Basic properties (Bayes rule)

• Bayesian Networks
• Inference
• Decision making
• Learning networks from data
• Reasoning over time
• Applications


Page 3:

Probabilities

Probability distribution P(X | ξ)
• X is a random variable
• Discrete
• Continuous
• ξ is the background state of information


Page 4:

Discrete Random Variables

Finite set of possible outcomes: $X \in \{x_1, x_2, x_3, \dots, x_n\}$

$P(x_i) \ge 0$

$\sum_{i=1}^{n} P(x_i) = 1$

[Bar chart: an example distribution over X1, X2, X3, X4]

X binary: $P(x) + P(\bar{x}) = 1$


Page 5:

Continuous Random Variables

Probability distribution (density function) over continuous values

$X \in [0, 10]$,  $P(x) \ge 0$

$\int_0^{10} P(x)\,dx = 1$

$P(5 \le x \le 7) = \int_5^7 P(x)\,dx$

[Plot: density P(x) over x, with the area between x = 5 and x = 7 shaded]


Page 6:

More Probabilities

Joint
• Probability that both X=x and Y=y: $P(x, y) = P(X=x \wedge Y=y)$

Conditional
• Probability that X=x given we know that Y=y: $P(x \mid y) = P(X=x \mid Y=y)$


Page 7:

Rules of Probability

Product Rule

$P(X, Y) = P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X)$

Marginalization

$P(Y) = \sum_{i=1}^{n} P(Y, x_i)$

X binary: $P(Y) = P(Y, x) + P(Y, \bar{x})$


Page 8:

Bayes Rule

$P(H, E) = P(H \mid E)\,P(E) = P(E \mid H)\,P(H)$

$P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$


Page 9:

Goal: extract information (a probability distribution) about a particular variable from information about other, correlated variables.

Graph Model

Definition:
• A collection of variables (nodes) with a set of dependencies (edges) between the variables, and a set of probability distribution functions for each variable
• A Bayesian network is a special type of graph model which is a directed acyclic graph (DAG)


Page 10:

Bayesian Networks

A Graph
− nodes represent the random variables
− directed edges (arrows) between pairs of nodes
− it must be a Directed Acyclic Graph (DAG)
− the graph represents relationships between variables

Conditional probability specifications
− the conditional probability distribution (CPD) of each variable given its parents
− discrete variable: table (CPT)


Page 11:

Bayesian Networks (Belief Networks)

A Graph
− directed edges (arrows) between pairs of nodes
− causality: A “causes” B
− AI and statistics communities

Markov Random Fields (MRF)

A Graph
− undirected edges between pairs of nodes
− a simple definition of independence: if all paths between the nodes in A and B pass through the nodes in a third set C, then A and B are conditionally independent given C
− physics and vision communities


Page 12:

Bayesian Networks


Page 13:

Bayesian networks

Basics
• Structured representation

• Conditional independence

• Naïve Bayes model

• Independence facts


Page 14:

Bayesian networks

Smoking → Cancer,  $S \in \{no, light, heavy\}$,  $C \in \{none, benign, malignant\}$

P(S):
P(S=no) 0.80
P(S=light) 0.15
P(S=heavy) 0.05

P(C|S):
Smoking =    no     light  heavy
P(C=none)    0.96   0.88   0.60
P(C=benign)  0.03   0.08   0.25
P(C=malig)   0.01   0.04   0.15


Page 15:

Product Rule

P(C,S) = P(C|S) P(S)

S \ C   none   benign  malignant
no      0.768  0.024   0.008
light   0.132  0.012   0.006
heavy   0.035  0.010   0.005

P(C=none ∧ S=no) = P(C=none | S=no) P(S=no) = 0.96 × 0.8 = 0.768
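This table is easy to reproduce mechanically. A minimal Python sketch of the product rule using the slide's CPTs (the variable names are my own):

```python
# Product rule P(C, S) = P(C | S) P(S), with the numbers from these slides.
P_S = {"no": 0.80, "light": 0.15, "heavy": 0.05}

P_C_given_S = {  # outer key: S, inner key: C
    "no":    {"none": 0.96, "benign": 0.03, "malig": 0.01},
    "light": {"none": 0.88, "benign": 0.08, "malig": 0.04},
    "heavy": {"none": 0.60, "benign": 0.25, "malig": 0.15},
}

# Joint distribution P(C, S) as a dictionary keyed by (s, c)
P_CS = {(s, c): P_C_given_S[s][c] * P_S[s]
        for s in P_S for c in P_C_given_S[s]}

print(P_CS[("no", "none")])  # 0.768, matching the table
```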



Page 17:

Marginalization

P(C=mal) = P(C=mal ∧ S=no) + P(C=mal ∧ S=light) + P(C=mal ∧ S=heavy)

P(S=no) = P(S=no ∧ C=none) + P(S=no ∧ C=benign) + P(S=no ∧ C=mal)

S \ C   none   benign  malig  total
no      0.768  0.024   0.008  .80
light   0.132  0.012   0.006  .15
heavy   0.035  0.010   0.005  .05
total   0.935  0.046   0.019

Row totals give P(Smoking); column totals give P(Cancer).


Page 18:

Bayes Rule Revisited

$P(S \mid C) = \dfrac{P(C \mid S)\,P(S)}{P(C)} = \dfrac{P(C, S)}{P(C)}$

S \ C   none        benign      malig
no      0.768/.935  0.024/.046  0.008/.019
light   0.132/.935  0.012/.046  0.006/.019
heavy   0.035/.935  0.010/.046  0.005/.019

P(S|C):
Cancer =    none   benign  malignant
P(S=no)     0.821  0.522   0.421
P(S=light)  0.141  0.261   0.316
P(S=heavy)  0.037  0.217   0.263
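Continuing the sketch from the product-rule slide, marginalization and Bayes rule fall out of the same joint dictionary. Note the slide rounds the heavy-smoking joint entries, so recomputing from the CPTs shifts the third decimal slightly:

```python
# Build the joint as in the product-rule sketch, then marginalize and invert.
P_S = {"no": 0.80, "light": 0.15, "heavy": 0.05}
P_C_given_S = {"no":    {"none": 0.96, "benign": 0.03, "malig": 0.01},
               "light": {"none": 0.88, "benign": 0.08, "malig": 0.04},
               "heavy": {"none": 0.60, "benign": 0.25, "malig": 0.15}}
P_CS = {(s, c): P_C_given_S[s][c] * P_S[s]
        for s in P_S for c in P_C_given_S[s]}

P_C = {}
for (s, c), p in P_CS.items():                 # marginalization: sum out S
    P_C[c] = P_C.get(c, 0.0) + p

P_S_given_C = {(s, c): p / P_C[c]              # Bayes rule: P(S|C) = P(C,S)/P(C)
               for (s, c), p in P_CS.items()}

print(round(P_S_given_C[("no", "none")], 3))   # ~0.826 (the slide rounds to 0.821)
```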


Page 19:

A Bayesian Network

[Figure: DAG with Age → Exposure to Toxics; Age, Gender → Smoking; Exposure to Toxics, Smoking → Cancer; Cancer → Serum Calcium, Lung Tumor]


Page 20:

Problems with Large Instances

• The joint probability distribution, P(A,G,E,S,C,L,SC)

For seven binary variables there are $2^7 = 128$ values in the joint distribution (for 100 variables there are over $10^{30}$ values). How are these values to be obtained?

• Inference

To obtain posterior distributions once some evidence is available requires summation over an exponential number of terms, e.g. $2^2$ in the calculation of

$P(s_1, f_1, x_1) = \sum_{b,l} P(s_1, b, f_1, x_1, l)$

which increases to $2^{97}$ if there are 100 variables.

Page 21:

Independence

Age and Gender are independent.

[Figure: Age and Gender, with no edge between them]

P(A|G) = P(A)   (A ⊥ G)
P(G|A) = P(G)   (G ⊥ A)

P(A,G) = P(G|A) P(A) = P(G) P(A)
P(A,G) = P(A|G) P(G) = P(A) P(G)

P(A,G) = P(G) P(A)


Page 22:

Conditional Independence

[Figure: Age, Gender → Smoking → Cancer]

Cancer is independent of Age and Gender given Smoking.

P(C|A,G,S) = P(C|S)   (C ⊥ A,G | S)

The condition (Smoking=heavy) constrains the probability distribution of Age and Gender. The condition (Smoking=heavy) constrains the probability distribution of Cancer. Given (Smoking=heavy), Cancer is independent of Age and Gender.


Page 23:

More Conditional Independence: Naïve Bayes

[Figure: Cancer → Serum Calcium, Cancer → Lung Tumor]

Serum Calcium is independent of Lung Tumor, given Cancer:

P(L|SC,C) = P(L|C)

Serum Calcium and Lung Tumor are (marginally) dependent.

Page 24:

More Conditional Independence: Explaining Away

[Figure: Exposure to Toxics → Cancer ← Smoking]

Exposure to Toxics and Smoking are (marginally) independent: E ⊥ S

Exposure to Toxics is dependent on Smoking, given Cancer:

P(E = heavy | C = malignant) > P(E = heavy | C = malignant, S = heavy)


Page 25:

More Conditional Independence: Explaining Away

Exposure to Toxics is dependent on Smoking, given Cancer.

[Figures: the Exposure to Toxics → Cancer ← Smoking subgraph, and its moralized version with an undirected edge between Exposure to Toxics and Smoking]

Moralize the graph.

Page 26:

Put it all together

[Figure: the full cancer network]

$P(A, G, E, S, C, L, SC) = P(A)\,P(G)\,P(E \mid A)\,P(S \mid A, G)\,P(C \mid E, S)\,P(SC \mid C)\,P(L \mid C)$


Page 27:

General Product (Chain) Rule for Bayesian Networks

$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_i)$,   where $\mathrm{Pa}_i = \mathrm{parents}(X_i)$
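As a sketch of how this chain rule is used computationally, here is a tiny hypothetical two-node network A → B in plain Python (the CPT encoding is my own choice):

```python
# Joint probability of a full assignment via the chain rule:
# P(x1, ..., xn) = prod_i P(xi | Pa_i)
parents = {"A": (), "B": ("A",)}
cpts = {
    "A": {((), "a"): 0.3, ((), "not_a"): 0.7},
    "B": {(("a",), "b"): 0.9, (("a",), "not_b"): 0.1,
          (("not_a",), "b"): 0.2, (("not_a",), "not_b"): 0.8},
}

def joint_prob(assignment):
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)   # values of Pa_i
        p *= cpts[var][(pa_vals, assignment[var])]   # local term P(xi | Pa_i)
    return p

print(joint_prob({"A": "a", "B": "b"}))  # 0.3 * 0.9 = 0.27
```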


Page 28:

Conditional Independence

[Figure: the cancer network, with Cancer's parents (Exposure to Toxics, Smoking), its descendants (Serum Calcium, Lung Tumor), and its non-descendants (Age, Gender) marked]

Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.

A variable (node) is conditionally independent of its non-descendants given its parents.

Page 29:

Another non-descendant

[Figure: the cancer network with an added node Diet]

Cancer is independent of Diet given Exposure to Toxics and Smoking.


Page 30:

Representing the Joint Distribution

In general, for a network with nodes X1, X2, …, Xn:

$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(x_i))$

An enormous saving can be made regarding the number of values required for the joint distribution.

To determine the joint distribution directly for n binary variables, $2^n - 1$ values are required.

For a BN with n binary variables where each node has at most k parents, fewer than $2^k n$ values are required.
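A quick sanity check of the claimed saving; the values n = 100 and k = 3 are illustrative:

```python
n, k = 100, 3                  # 100 binary variables, at most 3 parents each
full_joint = 2**n - 1          # raw joint: ~1.27e30 values
bn_bound = (2**k) * n          # BN upper bound: 800 values
print(full_joint, bn_bound)
```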

Page 31:

An Example

[Figure: Smoking history (S) → Bronchitis (B), Lung Cancer (L); B, L → Fatigue (F); L → X-ray (X)]

P(s1) = 0.2

P(b1|s1) = 0.25
P(b1|s2) = 0.05

P(l1|s1) = 0.003
P(l1|s2) = 0.00005

P(f1|b1,l1) = 0.75
P(f1|b1,l2) = 0.10
P(f1|b2,l1) = 0.5
P(f1|b2,l2) = 0.05

P(x1|l1) = 0.6
P(x1|l2) = 0.02

$P(s_1, b_2, l_1, f_1, x_1) = ?$

Page 32:

Solution

Note that our joint distribution with 5 variables can be represented by the chain rule as

$P(s, b, l, f, x) = P(s)\,P(b \mid s)\,P(l \mid b, s)\,P(f \mid b, l, s)\,P(x \mid b, l, s, f)$

Using the conditional independences encoded in the network, e.g. $P(x \mid b, l, s, f) = P(x \mid l)$, the joint probability distribution can now be expressed as

$P(s, b, l, f, x) = P(s)\,P(b \mid s)\,P(l \mid s)\,P(f \mid b, l)\,P(x \mid l)$

For example, the probability that someone has a smoking history, lung cancer but not bronchitis, suffers from fatigue and tests positive in an X-ray test is

$P(s_1, b_2, l_1, f_1, x_1) = 0.2 \times 0.75 \times 0.003 \times 0.5 \times 0.6 = 0.000135$

Page 33:

Independence and Graph Separation

• Given a set of observations, is one set of variables dependent on another set?

• Observing effects can induce dependencies.

• d-separation (Pearl 1988) allows us to check conditional independence graphically.

Page 34:

Bayesian networks

• Additional structure
• Nodes as functions
• Causal independence
• Context specific dependencies
• Continuous variables
• Hierarchy and model construction

Page 35:

Nodes as functions

• A BN node is a conditional distribution function
• its parent values are the inputs
• its output is a distribution over its values

CPT for X given parents A and B, one column per parent configuration:
lo    0.1   0.4   0.5   0.7
med   0.3   0.2   0.3   0.1
hi    0.6   0.4   0.2   0.2

e.g. for parent values (¬a, ¬b) the output distribution is lo: 0.7, med: 0.1, hi: 0.2

Page 36:

Nodes as functions

A BN node can be any type of function from Val(A,B) to distributions over Val(X), e.g. mapping (¬a, ¬b) to lo: 0.7, med: 0.1, hi: 0.2.

Page 37:

Continuous variables

[Figure: Outdoor Temperature (97°) and A/C Setting (hi) → Indoor Temperature, shown with a density P(x) over x for Indoor Temperature]

A function from Val(A,B) to density functions over Val(X).

Page 38:

Gaussian (normal) distributions

$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$,   written $N(\mu, \sigma^2)$

[Plots: Gaussians with different means; Gaussians with different variances]

Page 39:

Gaussian networks

X → Y

$X \sim N(\mu_X, \sigma_X^2)$
$Y \sim N(a x + b, \sigma_Y^2)$

Each variable is a linear function of its parents, with Gaussian noise.

[Plots: the resulting joint probability density functions over X and Y]
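A minimal sampling sketch of this linear-Gaussian model; the parameter values here are made up:

```python
import random

mu_X, sigma_X = 0.0, 1.0       # X ~ N(mu_X, sigma_X^2)
a, b, sigma_Y = 2.0, 1.0, 0.5  # Y ~ N(a*x + b, sigma_Y^2)

def sample_xy():
    x = random.gauss(mu_X, sigma_X)        # sample the root
    y = random.gauss(a * x + b, sigma_Y)   # linear function of x, plus noise
    return x, y

print(sample_xy())
```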

Page 40:

Composing functions

• Recall: a BN node is a function
• We can compose functions to get more complex functions.
• The result: a hierarchically structured BN.
• Since functions can be called more than once, we can reuse a BN model fragment in multiple contexts.

Page 41:

[Figure: hierarchical car model. Car: Mileage, Maintenance, Age, Original-value, Fuel-efficiency, Braking-power; with reusable sub-models Owner (Age, Income), Engine (Power), Brakes (Power), and Tires (RF-Tire, LF-Tire, Traction, Pressure)]

Page 42:

Bayesian Networks

• Knowledge acquisition
• Variables
• Structure
• Numbers

Page 43:

What is a variable?

Values versus probabilities: “Risk of Smoking” vs. Smoking

• Collectively exhaustive, mutually exclusive values:

$x_1 \vee x_2 \vee x_3 \vee x_4$

$\neg(x_i \wedge x_j)\quad (i \ne j)$

Example values: Error Occurred / No Error

Page 44:

Clarity Test: Knowable in Principle

• Weather {Sunny, Cloudy, Rain, Snow}
• Gasoline: Cents per gallon
• Temperature {≥ 100°F, < 100°F}
• User needs help on Excel Charting {Yes, No}
• User’s personality {dominant, submissive}

Page 45:

Structuring

[Figure: Age, Gender → Smoking, Exposure to Toxics; Smoking, Exposure to Toxics, Genetic Damage → Cancer → Lung Tumor]

Extending the conversation.

Network structure corresponding to “causality” is usually good.

Page 46:

Course Contents

• Concepts in Probability
• Bayesian Networks
» Inference
• Decision making
• Learning networks from data
• Reasoning over time
• Applications

Page 47:

Inference

• Patterns of reasoning
• Basic inference
• Exact inference
• Exploiting structure
• Approximate inference

Page 48:

Predictive Inference

How likely are elderly males to get malignant cancer?

P(C=malignant | Age>60, Gender=male)

[Figure: the cancer network]

Page 49:

Combined

How likely is an elderly male patient with high Serum Calcium to have malignant cancer?

P(C=malignant | Age>60, Gender=male, Serum Calcium=high)

[Figure: the cancer network]

Page 50:

Explaining away

[Figure: the cancer network]

• If we see a lung tumor, the probability of heavy smoking and of exposure to toxics both go up.

• If we then observe heavy smoking, the probability of exposure to toxics goes back down.

Page 51:

Inference in Belief Networks

• Find P(Q=q | E=e)
• Q: the query variable
• E: set of evidence variables

$P(q \mid e) = \dfrac{P(q, e)}{P(e)}$

$P(q, e) = \sum_{x_1, \dots, x_n} P(q, e, x_1, \dots, x_n)$

where X1, …, Xn are the network variables other than Q and E.

Page 52:

Basic Inference

A → B → C

$P(b) = \sum_a P(a, b) = \sum_a P(b \mid a)\,P(a)$

$P(c) = \sum_{b,a} P(a, b, c) = \sum_b P(c \mid b)\,P(b) = \sum_{b,a} P(c \mid b)\,P(b \mid a)\,P(a)$

Page 53:

Inference in trees

Y1 → X ← Y2

$P(x) = \sum_{y_1, y_2} P(x \mid y_1, y_2)\,P(y_1, y_2)$

because of independence of Y1, Y2:

$P(x) = \sum_{y_1, y_2} P(x \mid y_1, y_2)\,P(y_1)\,P(y_2)$

Page 54:

PolytreesPolytrees

• A network is singly connected (a polytree) if it contains no undirected loops.

Theorem: Inference in a singly connected network can be done in linear time*.

Main idea: in variable elimination, need only maintain distributions over single nodes.

* in network size including table sizes.

Page 55:

The problem with loops

[Figure: Cloudy → Rain, Cloudy → Sprinkler; Rain, Sprinkler → Grass-wet]

P(c) = 0.5

P(r):  c: 0.99,  ¬c: 0.01
P(s):  c: 0.01,  ¬c: 0.99

Grass-wet is a deterministic OR of Rain and Sprinkler: the grass is dry only if there is no rain and no sprinklers.

P(¬g) = P(¬r, ¬s) ≈ 0

Page 56:

The problem with loops contd.

P(¬g) = P(¬g | r, s) P(r, s) + P(¬g | r, ¬s) P(r, ¬s) + P(¬g | ¬r, s) P(¬r, s) + P(¬g | ¬r, ¬s) P(¬r, ¬s)

The first three conditionals are 0 and the last is 1, so P(¬g) = P(¬r, ¬s) ≈ 0.

But if we treat Rain and Sprinkler as independent, ignoring the loop through Cloudy:

P(¬r, ¬s) = P(¬r) P(¬s) ≈ 0.5 · 0.5 = 0.25   (the problem)

Page 57:

Variable elimination

A → B → C

$P(c) = \sum_b P(c \mid b) \sum_a P(b \mid a)\,P(a)$

P(A) × P(B|A) → P(B,A); sum out A → P(B)
P(B) × P(C|B) → P(C,B); sum out B → P(C)

Page 58:

Inference as variable elimination

• A factor over X is a function from val(X) to numbers in [0,1]:
• a CPT is a factor
• a joint distribution is also a factor

• BN inference:
• factors are multiplied to give new ones
• variables in factors are summed out

• A variable can be summed out as soon as all factors mentioning it have been multiplied.
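A sketch of this factor view on the A → B → C chain from the "Basic Inference" slide, with made-up numbers: factors are just NumPy arrays, factor multiplication is broadcasting, and summing out is a sum over an axis:

```python
import numpy as np

P_A = np.array([0.6, 0.4])            # factor over A
P_B_given_A = np.array([[0.7, 0.3],   # rows index a, columns index b
                        [0.2, 0.8]])
P_C_given_B = np.array([[0.9, 0.1],   # rows index b, columns index c
                        [0.5, 0.5]])

P_AB = P_A[:, None] * P_B_given_A     # multiply factors -> factor over (A, B)
P_B = P_AB.sum(axis=0)                # sum out A as soon as it is fully used
P_BC = P_B[:, None] * P_C_given_B     # factor over (B, C)
P_C = P_BC.sum(axis=0)                # sum out B

print(P_C)                            # distribution over C; sums to 1
```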

Page 59:

Variable Elimination with loops

[Figure: the cancer network]

P(A) × P(G) × P(S | A,G) → P(A,G,S); sum out G → P(A,S)
P(A,S) × P(E | A) → P(A,E,S); sum out A → P(E,S)
P(E,S) × P(C | E,S) → P(E,S,C); sum out E,S → P(C)
P(C) × P(L | C) → P(C,L); sum out C → P(L)

Complexity is exponential in the size of the factors.

Page 60:

Inference in BNs and Junction Trees

The main point of BNs is to enable probabilistic inference to be performed. Inference is the task of computing the probability of each value of a node in a BN when other variables’ values are known.

The general idea is to do inference by representing the joint probability distribution on an undirected graph called the junction tree.

The junction tree has the following characteristics:

• it is an undirected tree; its nodes are clusters of variables

• given two clusters, C1 and C2, every node on the path between them contains their intersection C1 ∩ C2

• a separator, S, is associated with each edge and contains the variables in the intersection between neighbouring nodes

[Example: cliques ABC, BCD, CDE connected in a chain, with separators BC and CD]

Page 61:

Inference in BNs

1. Moralize the Bayesian network
2. Triangulate the moralized graph
3. Let the cliques of the triangulated graph be the nodes of a tree, and construct the junction tree
4. Do belief propagation throughout the junction tree to do inference

Page 62:

Constructing the Junction Tree (1)

Step 1. Form the moral graph from the DAG

Consider the BN in our example.

[Figure: the DAG over S, B, L, F, X, and its moral graph: marry parents and remove arrows]

Page 63:

Constructing the Junction Tree (2)

Step 2. Triangulate the moral graph

An undirected graph is triangulated if every cycle of length greater than 3 possesses a chord.

[Figure: the triangulated moral graph over S, B, L, F, X]

Page 64:

Constructing the Junction Tree (3)

Step 3. Identify the Cliques

A clique is a subset of nodes which is complete (i.e. there is an edge between every pair of nodes) and maximal.

Cliques: {B,S,L}, {B,L,F}, {L,X}

Page 65:

Constructing the Junction Tree (4)

Step 4. Build the Junction Tree

The cliques should be ordered (C1, C2, …, Ck) so they possess the running intersection property: for all 1 < j ≤ k, there is an i < j such that Cj ∩ (C1 ∪ … ∪ Cj−1) ⊆ Ci.

To build the junction tree, choose one such i for each j and add an edge between Cj and Ci.

Cliques: {B,S,L}, {B,L,F}, {L,X}

Junction tree: BSL and BLF joined with separator BL; BLF and LX joined with separator L

Page 66:

Potentials Initialization

To initialize the potential functions:

1. set all potentials to unity

2. for each variable, Xi, select one node in the junction tree (i.e. one clique) containing both that variable and its parents, pa(Xi), in the original DAG

3. multiply the potential by P(xi | pa(xi))

For the junction tree BSL, BLF, LX (separators BL and L) this gives:

$\phi_{BSL} = P(s)\,P(b \mid s)\,P(l \mid s)$
$\phi_{BLF} = P(f \mid b, l)$
$\phi_{LX} = P(x \mid l)$

Page 67:

Potential Representation

The joint probability distribution can now be represented in terms of potential functions, ϕ, defined on each clique and each separator of the junction tree. The joint distribution is given by

$P(x) = \dfrac{\prod_{c} \phi_c(x_c)}{\prod_{s} \phi_s(x_s)}$

The idea is to transform one representation of the joint distribution to another in which, for each clique C, the potential function gives the marginal distribution for the variables in C, i.e.

$\phi_c(x_c) = P(x_c)$

This will also apply for the separators, S.

Page 68:

Triangulation

Given a numbered graph, proceed from node n down to node 1:

• Determine the lower-numbered nodes which are adjacent to the current node, including those which may have been made adjacent to this node earlier in this algorithm

• Connect these nodes to each other.

Page 69:

Triangulation

Numbering the nodes:

• Arbitrarily number the nodes, or

• Maximum cardinality search:
  • Give any node the number 1
  • For each subsequent number, pick a new unnumbered node that neighbors the most already-numbered nodes
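A small sketch of maximum cardinality search on an adjacency-set graph; the example graph is the moralized S, B, L, F, X network from the earlier slides, and tie-breaking is arbitrary, as the slide allows:

```python
def max_cardinality_search(adj):
    order, numbered = [], set()
    while len(order) < len(adj):
        # pick an unnumbered node with the most already-numbered neighbors
        best = max((v for v in adj if v not in numbered),
                   key=lambda v: len(adj[v] & numbered))
        order.append(best)
        numbered.add(best)
    return order   # nodes in numbering order 1..n

adj = {"S": {"B", "L"}, "B": {"S", "L", "F"},
       "L": {"S", "B", "F", "X"}, "F": {"B", "L"}, "X": {"L"}}
print(max_cardinality_search(adj))
```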

Page 70:

Triangulation

[Figure: a BN and its moralized graph]

Page 71:

Triangulation

[Figure: the moralized graph with an arbitrary numbering of its 8 nodes, and the fill-in edges the triangulation algorithm adds]

Page 72:

Triangulation

[Figure: the same graph numbered by maximum cardinality search, and the resulting fill-in]

Page 73:

Course Contents

Concepts in Probability
Bayesian Networks
Inference
» Decision making
Learning networks from data
Reasoning over time
Applications

Page 74:

Decision making

Decision: an irrevocable allocation of domain resources.

Decisions should be made so as to maximize expected utility.

View decision making in terms of:
• Beliefs/Uncertainties
• Alternatives/Decisions
• Objectives/Utilities

Page 75:

Course Contents

Concepts in Probability
Bayesian Networks
Inference
Decision making
» Learning networks from data
Reasoning over time
Applications

Page 76:

Learning networks from data

• The learning task
• Parameter learning
  • Fully observable
  • Partially observable
• Structure learning
• Hidden variables

Page 77:

The learning task

Input: training data, cases over the variables B, E, A, C, N

Output: BN modeling the data

[Figure: Burglary, Earthquake → Alarm → Call; Earthquake → Newscast]

• Input: fully or partially observable data cases?
• Output: parameters, or also structure?

Page 78:

Parameter learning: one variable

Unfamiliar coin: let θ = bias of the coin (long-run fraction of heads).

If θ is known (given), then P(X = heads | θ) = θ.

Different coin tosses are independent given θ, so

$P(X_1, \dots, X_n \mid \theta) = \theta^h (1-\theta)^t$

for h heads and t tails.

Page 79:

Maximum likelihood

Input: a set of previous coin tosses
• X1, …, Xn = {H, T, H, H, H, T, T, H, …, H} with h heads and t tails

Goal: estimate θ

The likelihood is $P(X_1, \dots, X_n \mid \theta) = \theta^h (1-\theta)^t$

The maximum likelihood solution is: $\theta^* = \dfrac{h}{h+t}$

Page 80:

Conditioning on data

$P(\theta \mid D) \propto P(\theta)\,P(D \mid \theta) = P(\theta)\,\theta^h (1-\theta)^t$

for data D with h heads and t tails.

[Plots: a prior P(θ) and the posterior P(θ | D) after observing 1 head and 1 tail]

Page 81:

Conditioning on data

A good parameter distribution: the Beta distribution*.

* The Dirichlet distribution generalizes the Beta to non-binary variables.

Page 82:

General parameter learning

A multi-variable BN is composed of several independent parameters (“coins”).

Example: A → B has three parameters: $\theta_A$, $\theta_{B|a}$, $\theta_{B|\bar a}$

Can use the same techniques as the one-variable case to learn each one separately. The max likelihood estimate of $\theta_{B|a}$ would be:

$\theta^*_{B|a} = \dfrac{\#\text{data cases with } b, a}{\#\text{data cases with } a}$

Page 83:

Partially observable data

[Figure: the Burglary/Earthquake network; training cases now contain missing values, marked “?”]

• Fill in missing data with “expected” values
• expected = distribution over possible values
• use the “best guess” BN to estimate the distribution

Page 84:

Intuition

In the fully observable case:

$\theta_{n|e} = \dfrac{\#\text{data cases with } n, e}{\#\text{data cases with } e} = \dfrac{\sum_j I(n, e \mid d_j)}{\sum_j I(e \mid d_j)}$

where $I(e \mid d_j) = 1$ if E=e in data case $d_j$, and 0 otherwise.

In the partially observable case, I is unknown. The best estimate for it is:

$\hat I(n, e \mid d_j) = P(n, e \mid d_j, \theta^*)$

Problem: $\theta^*$ is unknown.

Page 85:

Expectation Maximization (EM)

Repeat until convergence:

Expectation (E) step: use the current parameters $\tilde\theta$ to estimate the filled-in data:

$\hat I(n, e \mid d_j) = P(n, e \mid d_j, \tilde\theta)$

Maximization (M) step: use the filled-in data to do max likelihood estimation, and set:

$\tilde\theta_{n|e} = \dfrac{\sum_j \hat I(n, e \mid d_j)}{\sum_j \hat I(e \mid d_j)}$
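A toy version of this loop for a single parameter θ_{n|e}, where some data cases have N missing; everything here (names, data) is illustrative:

```python
# Data cases (E, N); "?" marks a missing value of N.
data = [("e", "n"), ("e", "?"), ("e", "n"), ("e", "?"), ("not_e", "n")]

theta = 0.5                                   # initial guess for P(N=n | E=e)
for _ in range(20):
    # E step: expected counts, filling in a missing N with the current theta
    exp_ne = sum({"n": 1.0, "?": theta}.get(n, 0.0)
                 for e, n in data if e == "e")
    exp_e = sum(1.0 for e, n in data if e == "e")
    # M step: max likelihood estimate from the expected counts
    theta = exp_ne / exp_e

print(theta)  # converges to 1.0: every observed N with E=e is n
```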

Page 86:

Structure learning

Goal: find “good” BN structure (relative to data)

Solution: do heuristic search over space of network structures.

Page 87:

Search space

Space = network structures
Operators = add/reverse/delete edges

Page 88:

Heuristic search

Use a scoring function to do heuristic search (any algorithm). Greedy hill-climbing with randomness works pretty well.

[Figure: search over structures, scoring each candidate]

Page 89:

Scoring

Fill in parameters using previous techniques & score completed networks.

One possibility for the score, the likelihood function: Score(B) = P(data | B)

Example: X, Y independent coin tosses; typical data = (27 h-h, 22 h-t, 25 t-h, 26 t-t)

Maximum likelihood network structure: X → Y

The max likelihood network is typically fully connected. This is not surprising: maximum likelihood always overfits…

Page 90:

Better scoring functions

MDL formulation: balance fit to data and model complexity (# of parameters)

Score(B) = P(data | B) − model complexity

Full Bayesian formulation: place a prior on network structures & parameters; more parameters ⇒ higher dimensional space ⇒ get the balance effect as a byproduct*.

* With a Dirichlet parameter prior, MDL is an approximation to the full Bayesian score.

Page 91:

Hidden variables

There may be interesting variables that we never get to observe:
• topic of a document in information retrieval;
• user’s current task in an online help system.

Our learning algorithm should:
• hypothesize the existence of such variables;
• learn an appropriate state space for them.

Page 92:

[Scatter plot over E1, E2, E3: randomly scattered data]

Page 93:

[Scatter plot over E1, E2, E3: actual data]

Page 94:

Bayesian clustering (Autoclass)

• (hypothetical) class variable is never observed
• if we know that there are k classes, just run EM
• learned classes = clusters
• Bayesian analysis allows us to choose k, trading off fit to data against model complexity

Naïve Bayes model: Class → E1, E2, …, En

Page 95:

[Scatter plot over E1, E2, E3: resulting cluster distributions]

Page 96:

Detecting hidden variables

Unexpected correlations ⇒ hidden variables.

Hypothesized model: Cholesterolemia → Test1, Test2, Test3

Data model: Cholesterolemia → Test1, Test2, Test3, with additional dependencies among the tests

“Correct” model: Cholesterolemia and a hidden Hypothyroid variable → Test1, Test2, Test3

Page 97:

Course Contents

Concepts in Probability
Bayesian Networks
Inference
Decision making
Learning networks from data
» Reasoning over time
Applications

Page 98:

Reasoning over time

Dynamic Bayesian networks
Hidden Markov models
Decision-theoretic planning
• Markov decision problems
• Structured representation of actions
• The qualification problem & the frame problem
• Causality (and the frame problem revisited)

Page 99:

Dynamic environments

State(t) → State(t+1) → State(t+2)

Markov property: the past is independent of the future given the current state; a conditional independence assumption; implied by the fact that there are no arcs from t to t+2.

Page 100:

Dynamic Bayesian networks

The state is described via random variables.

[Figure: DBN with variables Weather, Velocity, Position, Drunk repeated at times t, t+1, t+2, …]

Page 101:

Hidden Markov model

An HMM is a simple model for a partially observable stochastic domain.

[Figure: State(t) → State(t+1) (state transition model); State(t) → Obs(t) (observation model)]

Page 102:

Hidden Markov model

Partially observable stochastic environments:

Speech recognition:
• states = phonemes
• observations = acoustic signal

Biological sequencing:
• states = protein structure
• observations = amino acids

Mobile robots:
• states = location
• observations = sensor input

[Figure: state transition diagram with probabilities 0.8, 0.15, 0.05]
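A sketch of the standard forward algorithm for such an HMM; the matrices here are made up, and `forward` returns the posterior over the current hidden state given the observations so far:

```python
import numpy as np

T = np.array([[0.8, 0.2],      # T[i, j] = P(state'=j | state=i)
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],      # O[i, k] = P(obs=k | state=i)
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])

def forward(observations):
    alpha = prior * O[:, observations[0]]   # incorporate the first observation
    for obs in observations[1:]:
        alpha = (alpha @ T) * O[:, obs]     # predict, then weight by the evidence
    return alpha / alpha.sum()              # posterior over the current state

print(forward([0, 0, 1]))
```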

Page 103:

Acting under uncertainty

Markov Decision Problem (MDP)

[Figure: State(t) → State(t+1) → State(t+2); Action(t), Action(t+1) feed the transitions (action model); Reward(t), Reward(t+1) depend on the states; the agent observes the state]

Overall utility = sum of momentary rewards.

Allows a rich preference model, e.g. rewards corresponding to “get to goal asap”:
+100 for goal states
−1 for other states

Page 104:

Partially observable MDPs

[Figure: as the MDP, but the agent observes Obs(t), not the state; Obs depends on the state]

The optimal action at time t depends on the entire history of previous observations.

Instead, a distribution over State(t) suffices.

Page 105:

Structured representation

[Figures: two action networks, “Move” and “Turn”, over Position, Holding, Direction at times t and t+1, with preconditions on the time-t variables and effects on the time-(t+1) variables]

Probabilistic action model:
• allows for exceptions & qualifications;
• persistence arcs: a solution to the frame problem.

Page 106:

Applications

Medical expert systems
• Pathfinder
• Parenting MSN

Fault diagnosis
• Ricoh FIXIT
• Decision-theoretic troubleshooting

Vista

Collaborative filtering