Incorporating Prior Information in Causal Discovery
Rodney O'Donnell, Jahangir Alam,
Bin Han, Kevin Korb and Ann Nicholson
Outline
• Methods for learning causal models
  – Data mining, elicitation, hybrid approach
• Algorithms for learning causal models
  – Constraint based
  – Metric based (including our CaMML)
• Incorporating priors into CaMML
  – 5 different types of priors
• Experimental Design
• Experimental Results
Learning Causal Bayesian Networks
Elicitation                              Data mining
Requires domain knowledge                Requires a large dataset
Expensive and time-consuming             Sometimes the algorithms are "stupid"
                                         (no prior knowledge → no common sense)
Partial knowledge may be insufficient    Data only tells part of the story
A hybrid approach
• Combine the domain knowledge and the facts learned from data
• Minimize the expert’s effort in domain knowledge elicitation
Elicitation + Data mining → Causal BN
• Enhance the efficiency of the learning process
  – Reduce / bias the search space
Objectives
• Generate different prior specification methods
• Comparatively study the influence of priors on BN structural learning
• Future: apply the methods to the Heart Disease modeling project
Causal learning algorithms
• Constraint based
  – Pearl & Verma's algorithm, PC
• Metric based
  – MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML
• Priors on structure
  – Optional vs. required
  – Hard vs. soft
Priors on structure
Algorithm        Required  Optional  Hard  Soft
K2 (BNT)         yes                 yes
K2+MWST (BNT)              yes       yes
GES (Tetrad)               yes       yes
PC (Tetrad)                yes       yes
CaMML                      yes       yes   yes
CaMML
• MML metric based
• MML vs. MDL
  – MML can be derived from Bayes' Theorem (Wallace)
  – MDL is a non-Bayesian method
• Search: MCMC sampling through TOM space
  – TOM = DAG + total ordering
  – TOM is finer than DAG
A → B, A → C: two TOMs (orderings ABC and ACB)
A → B → C: one TOM (ordering ABC)
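The distinction can be made concrete by counting the total orderings consistent with a DAG: each (DAG, ordering) pair is one TOM. A minimal brute-force sketch (the function name and enumeration approach are illustrative, not CaMML's implementation):

```python
from itertools import permutations

def count_toms(nodes, arcs):
    """Count the total orderings consistent with a DAG's arcs.

    Each (DAG, ordering) pair is one TOM, so this is the number of
    TOMs the DAG corresponds to.  Brute force -- illustration only.
    """
    count = 0
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in arcs):
            count += 1
    return count

# A->B, A->C: two TOMs (orderings ABC and ACB)
two = count_toms("ABC", [("A", "B"), ("A", "C")])
# Chain A->B->C: a single TOM (ordering ABC)
one = count_toms("ABC", [("A", "B"), ("B", "C")])
```

Because several TOMs can map to the same DAG, sampling TOM space implicitly weights DAGs by their number of consistent orderings.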
Priors in CaMML: arcs
Experts may provide priors on pairwise relations:
1. Directed arcs:
   – e.g. {A→B 0.7} (soft)
   – e.g. {A→D 1.0} (hard)
2. Undirected arcs:
   – e.g. {A─C 0.6} (soft)
3. Combined: {A→B 0.7; B→A 0.8; A─C 0.6}
   – Represented by 2 adjacency matrices
Directed arcs:            Undirected arcs:
     A    B    C               A    B    C
A    –   0.7   ·          A    –    ·   0.6
B   0.8   –    ·          B    ·    –    ·
C    ·    ·    –          C   0.6   ·    –
Priors in CaMML: arcs (continued)
• MML cost for each pair (negative logs, so message lengths are positive):
  – AB: -log(0.7) - log(1-0.8)
  – AC: -log(1-0.6)
  – BC: -log(default arc prior)

Expert-specified priors: A→B 0.7, B→A 0.8, A─C 0.6
One candidate network: contains A→B, but neither B→A nor A─C
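A sketch of how these per-pair costs could be computed in Python, using the negative-log convention so message lengths come out positive. The function name, data structures, and the omission of the default arc prior for unspecified pairs (e.g. BC) are all simplifications, not CaMML's actual code:

```python
import math

def arc_prior_cost(arcs, directed_prior, undirected_prior):
    """MML cost (in nats) of a candidate DAG's arcs under expert
    pairwise priors.  `arcs` is a set of directed (parent, child)
    pairs; the prior dicts map pairs to probabilities.  Pairs with no
    expert prior (charged the default arc prior in CaMML) are
    omitted in this sketch.
    """
    cost = 0.0
    # Directed assertion X->Y: -log(p) if the arc is present, else -log(1-p).
    for (a, b), p in directed_prior.items():
        cost -= math.log(p) if (a, b) in arcs else math.log(1 - p)
    # Undirected assertion X-Y: charged on adjacency in either direction.
    for (a, b), p in undirected_prior.items():
        adjacent = (a, b) in arcs or (b, a) in arcs
        cost -= math.log(p) if adjacent else math.log(1 - p)
    return cost

# Expert: {A->B 0.7; B->A 0.8; A-C 0.6}; candidate has A->B only.
cost = arc_prior_cost(
    arcs={("A", "B")},
    directed_prior={("A", "B"): 0.7, ("B", "A"): 0.8},
    undirected_prior={("A", "C"): 0.6},
)
# cost = -log(0.7) - log(1-0.8) - log(1-0.6)
```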
Priors in CaMML: tiers
• Experts can provide priors on an additional pairwise relation
• Tier: temporal ordering of variables
  – e.g. Tier {A>>C 0.6; B>>C 0.8}
• One possible TOM, ordered A, C, B (A>>C holds; B>>C is violated):
  IMML(h) = -log(0.6) - log(1-0.8)
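The tier cost can be sketched the same way: each tier assertion satisfied by the ordering is charged -log(p), and each violated one -log(1-p). The function name and representation here are illustrative:

```python
import math

def tier_prior_cost(order, tiers):
    """MML cost of a total ordering under expert tier assertions.

    `order` lists variables earliest-first; `tiers` maps (X, Y) to
    the expert's probability that X precedes Y (X >> Y).  A satisfied
    assertion costs -log(p), a violated one -log(1-p).  Sketch only.
    """
    pos = {v: i for i, v in enumerate(order)}
    cost = 0.0
    for (x, y), p in tiers.items():
        cost -= math.log(p) if pos[x] < pos[y] else math.log(1 - p)
    return cost

# TOM ordered A, C, B with tiers {A>>C 0.6; B>>C 0.8}:
# A>>C holds (-log 0.6), B>>C is violated (-log(1-0.8)).
cost = tier_prior_cost(["A", "C", "B"], {("A", "C"): 0.6, ("B", "C"): 0.8})
```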
Priors in CaMML: edPrior
• Expert specifies a single network, plus a confidence
  – e.g. EdConf = 0.7
• Prior is based on the edit distance (ED) from this network
• For one candidate network at ED = 2 from the expert's network:
  IMML(h) = 2 * (log(0.7) - log(1-0.7))
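A sketch of the edit-distance prior cost, under the assumption that each arc difference from the expert's network is charged log(conf) - log(1-conf) nats. Measuring ED as the symmetric difference of directed arc sets (which counts a reversed arc as two edits) is a simplification of CaMML's edit distance:

```python
import math

def ed_prior_cost(candidate_arcs, expert_arcs, conf):
    """edPrior sketch: each arc present in one network but not the
    other costs log(conf) - log(1-conf) nats.  The symmetric
    difference of directed arc sets counts a reversed arc as two
    edits -- a simplification, not CaMML's exact distance."""
    ed = len(set(candidate_arcs) ^ set(expert_arcs))
    return ed * (math.log(conf) - math.log(1 - conf))

# Hypothetical example: expert network has A->B only; the candidate
# adds A->C and B->C, so ED = 2 and the cost matches the slide's form.
cost = ed_prior_cost(
    candidate_arcs={("A", "B"), ("A", "C"), ("B", "C")},
    expert_arcs={("A", "B")},
    conf=0.7,
)
```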
Priors in CaMML: KTPrior
• Again, the expert specifies a single network, plus a confidence
  – e.g. KTConf = 0.7
• Prior is based on the Kendall-Tau edit distance from this network
  – KTEditDist = KT + undirected ED
• Expert-specified dagTOM has ordering ABC; one candidate TOM has ordering ACB
  – The B-C order in the expert TOM disagrees with the candidate TOM
  – KTEditDist = KT (1) + undirected ED (2) = 3
  IMML(h) = 3 * (log(0.7) - log(1-0.7))
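The distance on this slide can be reproduced with a small sketch: Kendall-Tau counts variable pairs whose relative order differs between the two TOM orderings, and the undirected ED compares skeletons. The arc sets below and the per-unit cost form are illustrative assumptions:

```python
import math
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Count variable pairs whose relative order differs between two
    TOM orderings (earliest-first lists)."""
    pa = {v: i for i, v in enumerate(order_a)}
    pb = {v: i for i, v in enumerate(order_b)}
    return sum(1 for x, y in combinations(order_a, 2)
               if (pa[x] < pa[y]) != (pb[x] < pb[y]))

def undirected_ed(arcs_a, arcs_b):
    """Undirected edit distance: symmetric difference of skeletons."""
    skel = lambda arcs: {frozenset(a) for a in arcs}
    return len(skel(arcs_a) ^ skel(arcs_b))

# Expert dagTOM: ordering ABC, arcs A->B, B->C (hypothetical arcs).
# Candidate TOM: ordering ACB, arcs A->B, A->C.
kt = kendall_tau(["A", "B", "C"], ["A", "C", "B"])  # only the B-C pair flips
ed = undirected_ed([("A", "B"), ("B", "C")], [("A", "B"), ("A", "C")])
cost = (kt + ed) * (math.log(0.7) - math.log(1 - 0.7))
```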
Experiment 1: Design
• Prior
  – weak, strong
  – correct, incorrect
• Size of dataset
  – 100, 1000, 10k and 100k
  – For each size we randomly generate 30 datasets
• Algorithms:
  – CaMML
  – K2 (BNT)
  – K2+MWST (BNT)
  – GES (TETRAD)
  – PC (TETRAD)
• Models: AsiaNet, "Model6" (an artificial model)
Models: AsiaNet and “Model6”
Experimental Design
(Diagram: combinations of priors, algorithms, and sample sizes)
Experiment Design: Evaluation
• ED: Difference between Structures
• KL: Difference between distributions
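The KL measure can be illustrated with a minimal sketch for discrete distributions given as aligned probability lists; the real experiments compare the joint distributions of the learned and true networks:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats between two discrete distributions given as
    aligned probability lists.  A sketch of the evaluation metric,
    not the exact experimental code."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions are at KL distance zero; different ones are not.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
diff = kl_divergence([0.9, 0.1], [0.5, 0.5])
```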
(Results plots: Model6 at 1000 samples, Model6 at 10k samples, AsiaNet at 1000 samples)
Experiment 1: Results
• With default priors: CaMML is comparable to or outperforms other algorithms
• With full tiers:
  – There are no statistically significant differences between CaMML and K2
  – GES is slightly behind; PC performs poorly
• CaMML is the only method allowing soft priors:
  – With a prior of 0.7, CaMML is comparable to other algorithms given full tiers
  – With stronger priors, CaMML performs better
• CaMML performs significantly better with the expert's priors than with uniform priors
Experiment 2: Is CaMML well calibrated?
• Biased prior
  – The expert's confidence may not be consistent with the expert's skill
    e.g., the expert is 0.99 sure, but wrong, about a connection
  – A biased hard prior can never be overridden by data
  – A soft prior and data will eventually overcome the bad prior
Is CaMML well calibrated?
• Question: Does CaMML reward well-calibrated experts?
• Experimental design
  – Objective measure: how good is the proposed structure?
    • ED: 0-14
  – Subjective measure: the expert's confidence
    • 0.5 to 0.9999
  – How good is the learned structure?
    • KL distance
Effect of expert skill and confidence on quality of learned model
(Plot: expert skill on the horizontal axis, from better to worse; annotations mark "overconfidence penalized", "justified confidence rewarded", and "unconfident expert")
Experiment 2: Results
• CaMML improves on the elicited structure and approaches the true structure
• CaMML does better when the expert's confidence matches the expert's skill
Conclusions
• CaMML is comparable to other algorithms when given equivalent prior knowledge
• CaMML can incorporate more flexible prior knowledge
• CaMML’s results improve when expert is skillful or well calibrated
Thanks