Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks
Nir Friedman and Daphne Koller
CS673, 04/21/2005




Roadmap

• Bayesian learning of Bayesian networks
  – Exact vs. approximate learning
• Markov Chain Monte Carlo method
  – MCMC over structures
  – MCMC over orderings
• Experimental results
• Conclusions


Bayesian Networks

• Compact representation of probability distributions via conditional independence

Qualitative part: directed acyclic graph (DAG)
• Nodes – random variables
• Edges – direct influence

Quantitative part: set of conditional probability distributions

Together: define a unique distribution in a factored form

[Figure: DAG over E, B, R, A, C with edges E → R, E → A, B → A, A → C]

E    B    P(a|E,B)   P(!a|E,B)
e    b    0.9        0.1
e    !b   0.2        0.8
!e   b    0.9        0.1
!e   !b   0.01       0.99

P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
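The factored form can be evaluated directly by multiplying one local term per variable. The sketch below is only an illustration: the table for P(A|E,B) is taken from the slide, while the numbers for P(B), P(E), P(R|E) and P(C|A) are hypothetical placeholders, and the function names are made up for this example.

```python
from itertools import product

# P(A = true | E, B), keyed by (e, b); values from the slide's table.
P_A_GIVEN_EB = {(True, True): 0.9, (True, False): 0.2,
                (False, True): 0.9, (False, False): 0.01}

# Hypothetical placeholder parameters (not given in the slides).
P_B = 0.01                              # P(B = true)
P_E = 0.02                              # P(E = true)
P_R_GIVEN_E = {True: 0.9, False: 0.001}  # P(R = true | E)
P_C_GIVEN_A = {True: 0.7, False: 0.05}   # P(C = true | A)


def bernoulli(p_true, value):
    """Probability of a boolean `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, c, r):
    """P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)."""
    return (bernoulli(P_B, b)
            * bernoulli(P_E, e)
            * bernoulli(P_A_GIVEN_EB[(e, b)], a)
            * bernoulli(P_R_GIVEN_E[e], r)
            * bernoulli(P_C_GIVEN_A[a], c))


# The 32 joint entries are generated from only 10 free parameters.
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(total)
```

The full joint table over five binary variables has 31 free entries; the factored form needs 1 + 1 + 4 + 2 + 2 = 10.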


Why Learn Bayesian Networks?

• Conditional independencies and the graphical representation capture the structure of many real-world distributions
  – Provides insights into the domain
• Graph structure allows "knowledge discovery"
  – Is there a direct connection between X and Y?
  – Does X separate two "subsystems"?
  – Does X causally affect Y?
• Bayesian networks can be used for many tasks
  – Inference, causality, etc.
• Examples: scientific data mining
  – Disease properties and symptoms
  – Interactions between the expression of genes


Learning Bayesian Networks

[Figure: Data + prior information → Inducer → Bayesian network (the DAG over E, B, R, A, C together with its table P(A|E,B))]

• The inducer needs the prior probability distribution P(B), where B here denotes a Bayesian network (not the variable B in the figure)
• Using Bayesian conditioning, update the prior: P(B) ⇒ P(B|D)


Why Struggle for Accurate Structure?

[Figure: three networks over A, E, B, S: the "true" structure, one with an added arc, and one with a missing arc]

Adding an arc
• Increases the number of parameters to be fitted
• Wrong assumptions about causality and domain structure

Missing an arc
• Cannot be compensated for by accurate fitting of parameters
• Also misses causality and domain structure


Score-based Learning

• Define a scoring function that evaluates how well a structure matches the data

Data over E, B, A:
<Y,N,N>
<Y,Y,Y>
<N,Y,Y>
...
<N,N,N>

[Figure: three candidate structures over E, B, A]

• Search for a structure that maximizes the score


Bayesian Score of a Model

$$P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)}$$

where

$$P(D \mid G) = \int P(D \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta$$

• P(D | G): marginal likelihood
• P(D | G, Θ): likelihood
• P(Θ | G): prior over parameters
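For discrete variables with Dirichlet parameter priors, the integral in the marginal likelihood has a well-known closed form that decomposes into one term per variable and its parents. The sketch below assumes a symmetric Dirichlet prior with hyperparameter `alpha` for every conditional distribution (alpha = 1 gives the K2-style score); this choice of prior, the function names, and the toy data encoding are assumptions for illustration, not something the slides specify.

```python
import math
from collections import Counter


def family_log_ml(data, child, parents, arity, alpha=1.0):
    """Log marginal likelihood of one family (child given its parents).

    data:  list of dicts mapping variable name -> value in 0..arity-1
    arity: dict mapping variable name -> number of values
    """
    counts = Counter()
    for row in data:
        counts[(tuple(row[p] for p in parents), row[child])] += 1
    r = arity[child]
    total = 0.0
    # Parent configurations that never occur contribute a factor of 1.
    for cfg in {cfg for cfg, _ in counts}:
        n_cfg = sum(counts[(cfg, k)] for k in range(r))
        total += math.lgamma(r * alpha) - math.lgamma(r * alpha + n_cfg)
        for k in range(r):
            total += math.lgamma(alpha + counts[(cfg, k)]) - math.lgamma(alpha)
    return total


def log_marginal_likelihood(data, graph, arity, alpha=1.0):
    """log P(D | G) for a graph given as {child: tuple_of_parents}."""
    return sum(family_log_ml(data, child, parents, arity, alpha)
               for child, parents in graph.items())


# Toy data over binary E, B, A (1 = Y, 0 = N); far too small for real use.
data = [
    {"E": 1, "B": 0, "A": 0},
    {"E": 1, "B": 1, "A": 1},
    {"E": 0, "B": 1, "A": 1},
    {"E": 0, "B": 0, "A": 0},
]
arity = {"E": 2, "B": 2, "A": 2}
with_arcs = {"E": (), "B": (), "A": ("E", "B")}
no_arcs = {"E": (), "B": (), "A": ()}
print(log_marginal_likelihood(data, with_arcs, arity))
print(log_marginal_likelihood(data, no_arcs, arity))
```

Adding log P(G) to either value gives the log of the Bayesian score up to the constant log P(D), which is enough to compare structures.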


Discovering Structure – Model Selection

[Figure: posterior P(G|D) with a single selected network over E, B, R, A, C]

• Current practice: model selection
  – Pick a single high-scoring model
  – Use that model to infer domain structure


Discovering Structure – Model Averaging

[Figure: posterior P(G|D) over five candidate networks on E, B, R, A, C]

Problem
• Small sample size ⇒ many high-scoring models
• An answer based on one model is often useless
• Want features common to many models


Bayesian Approach

• Estimate the probability of features
  – Edge X → Y
  – Markov edge X -- Y
  – Path X → … → Y
  – ...

$$P(f \mid D) = \sum_{G} f(G)\, P(G \mid D)$$

• f: a feature of G, e.g., X → Y
• f(G): indicator function for feature f
• P(G | D): Bayesian score for G

• Huge (super-exponential, 2^{Θ(n^2)}) number of networks G
• Exact learning is intractable
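To get a feel for how quickly the sum over G becomes infeasible, the number of labeled DAGs on n nodes can be computed with Robinson's recurrence. This is a standard combinatorial result rather than something from the slides, and the function name is chosen for this sketch.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 11):
    print(n, count_dags(n))
```

Already at n = 5 there are 29281 structures, so exhaustive averaging is only an option for very small networks.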


Approximate Bayesian Learning

• Restrict the search space to G_k, the set of graphs with indegree bounded by k
  – The space is still super-exponential
• Find a set 𝒢 of high-scoring structures
  – Estimate

$$P(f \mid D) \approx \frac{\sum_{G \in \mathcal{G}} P(G \mid D)\, f(G)}{\sum_{G \in \mathcal{G}} P(G \mid D)}$$

  – Hill-climbing gives a biased sample of structures
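The restricted estimate is a score-weighted vote among the retained structures. A minimal sketch, assuming each structure comes with a log score equal to log P(G|D) up to an additive constant (the constant cancels in the ratio); the graph representation as a frozenset of edges and the example scores are hypothetical.

```python
import math


def feature_posterior(scored_graphs, feature):
    """Approximate P(f | D) from a list of (graph, log_score) pairs.

    `feature` is an indicator: it maps a graph to True or False.
    """
    best = max(score for _, score in scored_graphs)
    # Subtract the best log score before exponentiating to avoid underflow.
    weights = [(graph, math.exp(score - best)) for graph, score in scored_graphs]
    total = sum(w for _, w in weights)
    return sum(w for graph, w in weights if feature(graph)) / total


# Hypothetical high-scoring set: graphs as frozensets of directed edges.
scored = [
    (frozenset({("E", "A"), ("B", "A")}), math.log(3.0)),
    (frozenset({("E", "A")}), math.log(1.0)),
]
print(feature_posterior(scored, lambda g: ("B", "A") in g))
```

The estimate is only as good as the retained set: structures that were never found contribute nothing, which is the bias the hill-climbing bullet warns about.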


Markov Chain Monte Carlo over Networks

• MCMC sampling
  – Define a Markov chain over BNs
  – Perform a walk through the chain to get sample structures G whose distribution converges to the posterior P(G|D)
• Possible pitfalls:
  – Still a super-exponential number of networks
  – Time for the chain to converge to the posterior is unknown
  – Islands of high posterior, connected by low bridges
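One way to define such a chain is a Metropolis-Hastings walk whose moves toggle a single directed edge. The sketch below is an assumption-laden illustration, not the authors' sampler: it uses only add/delete moves (no edge reversal), rejects proposals that create a cycle, and takes a user-supplied `log_score` standing in for log P(G|D) up to a constant.

```python
import math
import random


def is_acyclic(n_nodes, edges):
    """Repeatedly strip nodes with no incoming edge; a leftover means a cycle."""
    remaining = set(range(n_nodes))
    while remaining:
        roots = [v for v in remaining
                 if not any((u, v) in edges for u in remaining)]
        if not roots:
            return False
        remaining.difference_update(roots)
    return True


def structure_mcmc(n_nodes, log_score, n_steps, seed=0):
    """Sample DAGs (frozensets of edges) with probability proportional to exp(log_score)."""
    rng = random.Random(seed)
    pairs = [(u, v) for u in range(n_nodes) for v in range(n_nodes) if u != v]
    graph = frozenset()
    current = log_score(graph)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: pick an ordered pair and toggle that edge.
        proposal = graph ^ {rng.choice(pairs)}
        if is_acyclic(n_nodes, proposal):
            proposed = log_score(proposal)
            if rng.random() < math.exp(min(0.0, proposed - current)):
                graph, current = proposal, proposed
        samples.append(graph)
    return samples


# With a flat score the target is uniform over all DAGs on 3 nodes.
samples = structure_mcmc(3, lambda g: 0.0, 20000)
freq = sum((0, 1) in g for g in samples) / len(samples)
print(freq)
```

With the flat score, 8 of the 25 DAGs on three nodes contain the edge 0 → 1, so the printed frequency should settle near 0.32. With a peaked score, the same chain can stay on one "island" for a long time, which is the pitfall listed above.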


Better Approach to Approximate Learning

• Further constrain the search space
  – Perform model averaging over the structures consistent with some known (fixed) total ordering ≺
• Ordering of variables:
  – X1 ≺ X2 ≺ … ≺ Xn ⇒ parents of Xi must be in X1, X2, …, Xi-1
• Intuition: the order decouples the choice of parents
  – The choice of Pa(X7) does not restrict the choice of Pa(X12)
• Can compute efficiently in closed form
  – Likelihood P(D | ≺)
  – Feature probability P(f | D, ≺)
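A sketch of the closed forms referred to above, for a fixed ordering written ≺ here. It assumes a decomposable score (one factor per variable and parent set, combining the family's prior and marginal likelihood) and an indegree bound k; the symbols score(X_i, U | D) and 𝒰_{i,≺} are notation introduced for this sketch.

```latex
\[
P(D \mid \prec) \;=\; \prod_{i=1}^{n} \; \sum_{U \in \mathcal{U}_{i,\prec}} \mathrm{score}(X_i, U \mid D),
\qquad
\mathcal{U}_{i,\prec} = \{\, U : U \prec X_i,\ |U| \le k \,\}
\]

\[
P(X_j \to X_i \mid D, \prec) \;=\;
\frac{\sum_{U \in \mathcal{U}_{i,\prec},\; X_j \in U} \mathrm{score}(X_i, U \mid D)}
     {\sum_{U \in \mathcal{U}_{i,\prec}} \mathrm{score}(X_i, U \mid D)}
\]
```

Because each variable's sum ranges only over subsets of its own predecessors, the product of sums touches on the order of n^{k+1} families instead of enumerating whole graphs.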


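Sample Orderings

We can write

$$P(f \mid D) = \sum_{\prec} P(f \mid \prec, D)\, P(\prec \mid D)$$

Sample orderings ≺_1, …, ≺_n and approximate

$$P(f \mid D) \approx \frac{1}{n} \sum_{i=1}^{n} P(f \mid \prec_i, D)$$

MCMC sampling
• Define a Markov chain over orderings
• Run the chain to get samples from the posterior P(≺ | D)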
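The two steps can be sketched as a Metropolis-Hastings walk over permutations that averages the per-ordering edge probability. Everything specific here is an assumption made for illustration: a uniform prior over orderings, a proposal that swaps two random positions, a user-supplied `family_log_score(node, parents)` standing in for the log of a decomposable family score, and the function names themselves.

```python
import itertools
import math
import random


def parent_sets(predecessors, max_parents):
    """All subsets of `predecessors` with at most `max_parents` elements."""
    for size in range(min(max_parents, len(predecessors)) + 1):
        yield from itertools.combinations(predecessors, size)


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_order_likelihood(order, family_log_score, max_parents):
    """log P(D | order): per node, sum the scores of all allowed parent sets."""
    return sum(
        logsumexp([family_log_score(node, u)
                   for u in parent_sets(order[:pos], max_parents)])
        for pos, node in enumerate(order))


def edge_prob_given_order(parent, child, order, family_log_score, max_parents):
    """P(parent -> child | D, order): share of score on sets containing `parent`."""
    sets = list(parent_sets(order[:order.index(child)], max_parents))
    scores = [family_log_score(child, u) for u in sets]
    with_edge = [s for s, u in zip(scores, sets) if parent in u]
    if not with_edge:
        return 0.0
    return math.exp(logsumexp(with_edge) - logsumexp(scores))


def order_mcmc(n_vars, family_log_score, max_parents, parent, child,
               n_steps, burn_in=0, seed=0):
    """Estimate P(parent -> child | D) by averaging over sampled orderings."""
    rng = random.Random(seed)
    order = list(range(n_vars))
    current = log_order_likelihood(order, family_log_score, max_parents)
    estimates = []
    for step in range(n_steps):
        i, j = rng.sample(range(n_vars), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        proposed = log_order_likelihood(proposal, family_log_score, max_parents)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            order, current = proposal, proposed
        if step >= burn_in:
            estimates.append(edge_prob_given_order(
                parent, child, order, family_log_score, max_parents))
    return sum(estimates) / len(estimates)


def flat_score(node, parents):
    """Hypothetical family score that ignores the data entirely."""
    return 0.0


print(order_mcmc(3, flat_score, 2, parent=0, child=2, n_steps=5000))
```

Each sample contributes a full conditional probability rather than a 0/1 indicator from a single graph, which tends to reduce the variance of the estimate. Note that a uniform prior over orderings is not the same as a uniform prior over structures, since a graph consistent with many orderings is counted more than once.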


Experiments: Exact posterior over orders versus order-MCMC


Experiments: Convergence


Experiments: structure-MCMC – posterior correlation for two different runs


Experiments: order-MCMC – posterior correlation for two different runs


Conclusion

• Order-MCMC better than structure-MCMC


References

• N. Friedman and D. Koller. Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning Journal, 2002.
• N. Friedman and D. Koller. NIPS 2001 Tutorial on Learning Bayesian Networks from Data.
• N. Friedman and M. Goldszmidt. AAAI-98 Tutorial on Learning Bayesian Networks from Data.
• D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed. MIT Press, Cambridge, MA, 1999. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995. An earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79-119, 1997.
• C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
• S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.