Being Bayesian about Network Structure
Nir Friedman (Hebrew Univ.) and Daphne Koller (Stanford Univ.)
2
Structure Discovery

Current practice: model selection
- Pick a single model (of high score)
- Use that model to represent the domain structure
- With enough data, the “right” model is overwhelmingly likely

But what about the rest of the time?
- Many high-scoring models
- An answer based on one model is often useless
Bayesian model averaging is the Bayesian ideal:

P(f | D) = Σ_G f(G) P(G | D)

where f is a feature of G, e.g., “X → Y is an edge”.
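To make the averaging concrete, here is a brute-force sketch that enumerates every DAG on three nodes and computes P(f | D) exactly. The `toy_score` is a made-up stand-in for a real marginal-likelihood score; the point is that even this tiny space already holds 25 structures, which is why exact averaging does not scale.

```python
from itertools import combinations

NODES = (0, 1, 2)
EDGES = [(i, j) for i in NODES for j in NODES if i != j]  # 6 possible directed edges

def is_acyclic(edges):
    """Peel off nodes with no incoming edges; a cycle leaves nothing to peel."""
    remaining = set(NODES)
    while remaining:
        free = [n for n in remaining
                if not any(e[1] == n and e[0] in remaining for e in edges)]
        if not free:
            return False
        remaining -= set(free)
    return True

def toy_score(edges):
    """Made-up unnormalized posterior P(D | G)P(G): rewards the edge 0 -> 1
    and penalizes density (a stand-in for a real marginal likelihood)."""
    return (2.0 if (0, 1) in edges else 1.0) * 0.5 ** len(edges)

# All DAGs on 3 labeled nodes (25 of them; the count is superexponential in n)
dags = [g for r in range(len(EDGES) + 1)
        for g in combinations(EDGES, r) if is_acyclic(g)]

# P(f | D) = sum_G f(G) P(G | D), with f = "the edge 0 -> 1 is present"
z = sum(toy_score(g) for g in dags)
p_feature = sum(toy_score(g) for g in dags if (0, 1) in g) / z
print(len(dags), round(p_feature, 3))
```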
3
Model Averaging
Unfortunately, full averaging is intractable: the number of possible structures is superexponential, 2^{O(n² log n)}, so no one really does it.*

Our contribution:
- a closed-form solution for a fixed ordering over the nodes
- MCMC over orderings for the general case
- faster convergence, robust results

* Exceptions: Madigan & Raftery, Madigan & York; see below
4
Fixed Ordering
Suppose that we know the ordering of the variables:
- say, X1 > X2 > X3 > X4 > … > Xn, so the parents of Xi must come from {X1, …, Xi-1}
- limit the number of parents per node to k

Intuition: the order decouples the choices of parents
- the choice of parents for X7 does not restrict the choice of parents for X12
- we can exploit this to simplify the form of P(D)

Only 2^{k·n·log n} networks are consistent with the ordering
5
Ordering: Computing P(D)
The set U_{i,≺} of possible parent sets U for Xi consistent with ≺ has size at most O(n^k).

P(D | ≺) = Σ_{G consistent with ≺} P(D | G)
         = Σ_G Π_i Score(Xi, Pa_G(Xi) | D)
         = Π_i Σ_{U ∈ U_{i,≺}} Score(Xi, U | D)

- Small number of potential families per node
- Independence of families given the ordering
⇒ Efficient closed-form summation over an exponential number of structures
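The per-node summation can be sketched directly. Here `local_log_score` is an assumed callback standing in for a real decomposable score (e.g. log BDe), and the log-sum-exp is there only for numerical stability; this is a sketch, not the authors' implementation.

```python
import itertools
import math

def log_marginal_given_ordering(order, local_log_score, max_parents):
    """log P(D | ordering): for each node, sum the scores of all candidate
    parent sets consistent with the ordering, then multiply across nodes
    (add in log space). `local_log_score(child, parents)` is an assumed
    callback standing in for a real decomposable score such as log BDe."""
    total = 0.0
    for pos, child in enumerate(order):
        candidates = order[:pos]  # parents must precede the child in the ordering
        terms = [local_log_score(child, parents)
                 for k in range(min(max_parents, len(candidates)) + 1)
                 for parents in itertools.combinations(candidates, k)]
        m = max(terms)  # log-sum-exp over the O(n^k) candidate parent sets
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

# Toy score favoring fewer parents (purely illustrative)
toy_score = lambda child, parents: -float(len(parents))
val = log_marginal_given_ordering(["A", "B", "C"], toy_score, max_parents=2)
print(val)
```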
6
MCMC over Models
Cannot enumerate structures, so sample structures.

MCMC sampling:
- define a Markov chain over BN models
- run the chain to obtain samples from the posterior P(G | D)

Possible pitfalls:
- huge number of models
- mixing rate (and required burn-in) unknown
- islands of high posterior, connected by low bridges
P(f | D) ≈ (1/N) Σ_{i=1}^{N} f(G_i),  with G_i ~ P(G | D)
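The sampling estimate itself is a one-liner. The `samples` below are hypothetical structures represented as edge sets, not the output of a real chain:

```python
def feature_posterior(samples, feature):
    """Monte Carlo estimate: P(f | D) ~ (1/N) * sum_i f(G_i),
    where each G_i is a structure sampled from P(G | D)."""
    return sum(feature(g) for g in samples) / len(samples)

# Hypothetical sampled structures, each represented as a set of directed edges
samples = [{("A", "B")}, {("A", "B"), ("B", "C")}, {("B", "C")}, {("A", "B")}]
p = feature_posterior(samples, lambda g: ("A", "B") in g)
print(p)  # fraction of sampled structures containing the edge A -> B
```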
7
ICU Alarm BN: No Mixing
With 500 instances, however, the runs clearly do not mix.
[Figure: score of the current sample vs. MCMC iteration (0 to 600,000) for runs initialized from the empty network and from a greedy-search network; scores span roughly -9400 to -8400, and the runs settle at different levels.]
8
Effects of Non-Mixing
- Two MCMC runs over the same 500 instances
- Probability estimates for Markov features, based on 50 networks sampled from the MCMC process
- The probability estimates are highly variable and non-robust
[Figure: scatter plots of feature-probability estimates from pairs of runs (both axes 0 to 1); left: true BN vs. random initialization; right: true BN vs. true BN.]
9
Our Approach: Sample Orderings
We can write

P(G | D) = Σ_≺ P(G | ≺, D) P(≺ | D)

Comment: the structure prior P(G) changes: a uniform prior over structures becomes a uniform prior over orderings, and over the structures consistent with a given ordering.

Sample orderings and approximate

E[f(G) | D] ≈ (1/N) Σ_{i=1}^{N} E[f(G) | ≺_i, D],  with ≺_i ~ P(≺ | D)
10
MCMC Over Orderings
Use the Metropolis-Hastings algorithm: specify a proposal distribution q(≺′ | ≺)
- flip: (i1 … ij … ik … in) → (i1 … ik … ij … in)
- “cut”: (i1 … ij ij+1 … in) → (ij+1 … in i1 … ij)

Each iteration: sample ≺′ from q(≺′ | ≺) and move to ≺′ with probability

min(1, [P(≺′ | D) q(≺ | ≺′)] / [P(≺ | D) q(≺′ | ≺)])

Since the priors are uniform and the proposals are symmetric, this reduces to

P(≺′ | D) / P(≺ | D) = P(D | ≺′) / P(D | ≺)

Efficient computation!
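A minimal sketch of the sampler, using only the symmetric flip proposal. `log_p_data_given_order` is an assumed callback (in the real algorithm it would be the closed-form per-node summation from the fixed-ordering case), and the toy target here is invented purely to exercise the chain:

```python
import math
import random

def mh_over_orderings(nodes, log_p_data_given_order, n_steps, seed=0):
    """Metropolis-Hastings over orderings using the symmetric 'flip' proposal.
    With a uniform prior over orderings and symmetric q, the acceptance
    probability reduces to min(1, P(D | order') / P(D | order)).
    `log_p_data_given_order` is an assumed callback; in the real algorithm it
    would be the closed-form per-node summation over parent sets."""
    rng = random.Random(seed)
    order = list(nodes)
    cur = log_p_data_given_order(order)
    samples = []
    for _ in range(n_steps):
        i, j = rng.sample(range(len(order)), 2)  # pick two positions to flip
        proposal = list(order)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new = log_p_data_given_order(proposal)
        if math.log(rng.random()) < new - cur:  # accept with min(1, ratio)
            order, cur = proposal, new
        samples.append(list(order))
    return samples

# Invented toy target: orderings that place "A" early score higher
toy_target = lambda order: -float(order.index("A"))
chain = mh_over_orderings(["A", "B", "C", "D"], toy_target, n_steps=2000)
frac_a_first = sum(o[0] == "A" for o in chain[500:]) / len(chain[500:])
print(round(frac_a_first, 2))
```

With this target the chain should place "A" first roughly 1/(1 + e⁻¹ + e⁻² + e⁻³) of the time after burn-in.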
11
Why Ordering Helps
Smaller space: a significant reduction in the size of the sample space.

Better-structured space: we can get from one ordering to another in a (relatively) small number of steps.

Smoother posterior “landscape”: the score of an ordering is a sum over many networks, so no ordering is “horrendous”, and there are no “islands” of high posterior separated by a deep blue sea.
12
Mixing with MCMC-Orderings
4 runs on ICU-Alarm with 500 instances:
- fewer iterations than MCMC over networks
- approximately the same amount of computation
The process is clearly mixing!
[Figure: score of the current sample vs. MCMC iteration (0 to 60,000) for runs with random and greedy initialization; all scores lie in the narrow range -8450 to -8400.]
13
Mixing of MCMC Runs
- Two MCMC runs over the same 500 instances
- Probability estimates for Markov features, based on 50 networks sampled from the MCMC process
- The probability estimates are very robust

[Figure: scatter plots of feature-probability estimates from the two runs (both axes 0 to 1); left panel: 100 instances; right panel: 1000 instances.]
14
Computing Feature Posteriors: P(f | ≺, D)
Edges: for an edge Y → Xi,

P(f_{Y→Xi} | ≺, D) = [ Σ_{U ∈ U_{i,≺}: Y ∈ U} Score(Xi, U | D) ] / [ Σ_{U ∈ U_{i,≺}} Score(Xi, U | D) ]

Markov blanket: Y is in the Markov blanket of Z if Y and Z are connected by an edge, or if both are parents of some X. Given the ordering, the posteriors of these features are independent, so

P(f_{Y ∈ MB(Z)} | ≺, D) = 1 − (1 − P(f_{YZ} | ≺, D)) · Π_{X ∉ {Y,Z}} (1 − P(f_{Y→X, Z→X} | ≺, D))

Other features (e.g., existence of a causal path):
- sample networks from the ordering
- estimate the features from the sampled networks
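The edge-feature ratio can be sketched as below. Again `local_log_score` is an assumed stand-in for a real decomposable score, and a production version would keep the sums in log space:

```python
import itertools
import math

def edge_posterior(child, parent, order, local_log_score, max_parents):
    """P(parent -> child | ordering, D): the summed score of parent sets
    containing `parent`, divided by the summed score of all parent sets.
    `local_log_score` is an assumed stand-in for a real decomposable score."""
    candidates = order[:order.index(child)]  # legal parents precede the child
    num = den = 0.0
    for k in range(min(max_parents, len(candidates)) + 1):
        for parents in itertools.combinations(candidates, k):
            w = math.exp(local_log_score(child, parents))  # real code: log space
            den += w
            if parent in parents:
                num += w
    return num / den

toy_score = lambda child, parents: -float(len(parents))  # toy, favors sparsity
p = edge_posterior("C", "A", ["A", "B", "C"], toy_score, max_parents=2)
print(round(p, 3))
```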
15
Feature Reconstruction (ICU-Alarm) Markov Features
[Figure: false negatives (0 to 50) vs. false positives (0 to 30) for Markov-feature reconstruction, three panels, comparing Bootstrap, Order, and Structure.]
Reconstruct “true” features of generating network
16
Feature Reconstruction (ICU-Alarm): Path Features

[Figure: false negatives (0 to 200) vs. false positives (0 to 600) for path-feature reconstruction, three panels, comparing Bootstrap, Order, and Structure.]
17
Conclusion
- Full Bayesian model averaging is tractable for a known ordering.
- MCMC over orderings allows a robust approximation to full Bayesian averaging over Bayesian networks:
  - rapid and reliable mixing
  - robust, reliable estimates for the probabilities of structural features
- Crucial for structure discovery in domains with limited data, e.g., biological discovery.