
Incorporating Prior Information in Causal Discovery

Rodney O'Donnell, Jahangir Alam, Bin Han, Kevin Korb and Ann Nicholson

Outline

• Methods for learning causal models

– Data mining, Elicitation, Hybrid approach

• Algorithms for learning causal models
– Constraint based
– Metric based (including our CaMML)

• Incorporating priors into CaMML
– 5 different types of priors

• Experimental Design

• Experimental Results

Learning Causal Bayesian Networks

Elicitation:
– Requires domain knowledge
– Expensive and time-consuming
– Partial knowledge may be insufficient

Data mining:
– Requires a large dataset
– Sometimes the algorithms are "stupid" (no prior knowledge → no common sense)
– Data only tells part of the story

A hybrid approach

• Combine the domain knowledge and the facts learned from data

• Minimize the expert’s effort in domain knowledge elicitation

[Diagram: Elicitation + Data mining → Causal BN]

• Enhance the efficiency of the learning process
– Reduce / bias the search space

Objectives

• Generate different prior specification methods

• Comparatively study the influence of priors on BN structure learning

• Future: apply the methods to the Heart Disease modeling project

Causal learning algorithms

• Constraint based
– Pearl & Verma's algorithm, PC

• Metric based
– MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML

• Priors on structure
– Optional vs. Required
– Hard vs. Soft

Priors on structure

               Required   Optional   Hard   Soft
K2 (BNT)          yes                 yes
K2+MWST (BNT)                yes      yes
GES (Tetrad)                 yes      yes
PC (Tetrad)                  yes      yes
CaMML                        yes      yes    yes

CaMML

• MML metric based
• MML vs. MDL

– MML can be derived from Bayes' Theorem (Wallace)
– MDL is a non-Bayesian method

• Search: MCMC sampling through TOM space
– TOM = DAG + total ordering
– TOM is finer than DAG (see the counting sketch below)

[Diagram: the DAG A→B, A→C admits two TOMs (orderings ABC and ACB); the chain A→B→C admits a single TOM (ABC)]
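
To make the DAG vs. TOM distinction concrete, here is a minimal sketch (our illustration, not CaMML's search code): a DAG's TOM count equals its number of linear extensions, which brute-force enumeration handles for slide-sized examples.

```python
from itertools import permutations

def count_toms(nodes, arcs):
    """Count the TOMs of a DAG by brute force.

    A TOM is the DAG plus a total ordering of its variables; an
    ordering is consistent when every arc points forward in it, so
    the TOM count is the DAG's number of linear extensions.
    """
    count = 0
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[x] < pos[y] for x, y in arcs):
            count += 1
    return count

# The slide's two examples:
print(count_toms("ABC", [("A", "B"), ("A", "C")]))  # 2 TOMs: ABC, ACB
print(count_toms("ABC", [("A", "B"), ("B", "C")]))  # 1 TOM:  ABC
```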

Priors in CaMML: arcs

Experts may provide priors on pairwise relations:

1. Directed arcs:

– e.g. {A→B 0.7} (soft)
– e.g. {A→D 1.0} (hard)

2. Undirected arcs
– e.g. {A─C 0.6} (soft)

3. {A→B 0.7; B→A 0.8; A─C 0.6}
– Represented by 2 adjacency matrices (below)

Directed arcs:              Undirected arcs:

      A     B     C               A     B     C
A     ·    0.7    ·         A     ·     ·    0.6
B    0.8    ·     ·         B     ·     ·     ·
C     ·     ·     ·         C    0.6    ·     ·

Priors in CaMML: arcs (continued)

• MML cost for each pair (totalled in the sketch after the diagram):

AB: −log(0.7) − log(1−0.8)

AC: −log(1−0.6)

BC: −log(default arc prior)

[Diagram: expert-specified network over A, B, C with priors A→B 0.7, B→A 0.8, A─C 0.6, and one candidate network being costed]
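
A minimal sketch of how these pairwise costs might be totalled for a candidate network, assuming each expert statement is scored independently as on the slide; `arc_prior_cost` and `default_p` are our illustrative names, not CaMML's API.

```python
import math
from itertools import combinations

def arc_prior_cost(candidate_arcs, directed, undirected, variables, default_p=0.5):
    """MML cost (in nats) of a candidate DAG under pairwise arc priors.

    An arc the expert asserted with probability p costs -log(p) when
    present and -log(1 - p) when absent; unconstrained pairs fall
    back on a hypothetical default arc prior (default_p).
    """
    cost, constrained = 0.0, set()
    for (x, y), p in directed.items():              # e.g. {("A","B"): 0.7}
        cost -= math.log(p if (x, y) in candidate_arcs else 1 - p)
        constrained.add(frozenset((x, y)))
    for pair, p in undirected.items():              # e.g. {frozenset("AC"): 0.6}
        x, y = tuple(pair)
        linked = (x, y) in candidate_arcs or (y, x) in candidate_arcs
        cost -= math.log(p if linked else 1 - p)
        constrained.add(frozenset(pair))
    for x, y in combinations(variables, 2):         # pairs with no statement
        if frozenset((x, y)) not in constrained:
            linked = (x, y) in candidate_arcs or (y, x) in candidate_arcs
            cost -= math.log(default_p if linked else 1 - default_p)
    return cost

# Slide example: candidate has only A->B, so
# AB: -log(0.7) - log(1-0.8), AC: -log(1-0.6), BC: default arc prior
print(arc_prior_cost({("A", "B")},
                     {("A", "B"): 0.7, ("B", "A"): 0.8},
                     {frozenset("AC"): 0.6},
                     "ABC"))
```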

Priors in CaMML: Tiers

• The expert can provide a prior on an additional pairwise relation

• Tier: Temporal ordering of variables

e.g. Tier {A>>C 0.6; B>>C 0.8}

IMML(h) = −log(0.6) − log(1−0.8) (computed in the sketch below)

[Diagram: one possible TOM with ordering A, C, B — A>>C is satisfied, B>>C is violated]
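
The tier cost follows the same pattern as the arc costs. A sketch, with `tier_prior_cost` as our hypothetical helper:

```python
import math

def tier_prior_cost(ordering, tier_priors):
    """MML cost (in nats) of a total ordering under tier priors.

    tier_priors maps (X, Y) to the expert's probability p that X
    precedes Y (the slide's "X >> Y p"); a satisfied statement costs
    -log(p), a violated one -log(1 - p).
    """
    pos = {v: i for i, v in enumerate(ordering)}
    return sum(-math.log(p if pos[x] < pos[y] else 1 - p)
               for (x, y), p in tier_priors.items())

# Slide example: tiers {A >> C 0.6; B >> C 0.8}, candidate ordering A, C, B:
# A >> C holds, B >> C is violated, so IMML(h) = -log(0.6) - log(1 - 0.8)
print(tier_prior_cost("ACB", {("A", "C"): 0.6, ("B", "C"): 0.8}))
```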

Priors in CaMML: edPrior

• Expert specifies a single network, plus a confidence
– e.g. EdConf = 0.7

• Prior is based on edit distance from this network (a cost sketch follows the example below)

[Diagram: expert-specified network over A, B, C and one candidate network at edit distance ED = 2]

IMML(h) = 2 · (log(0.7) − log(1−0.7))
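
A sketch of the edit-distance prior's cost; as a simplification we count edit distance as the symmetric difference of the directed arc sets, and the two networks in the example are hypothetical.

```python
import math

def ed_prior_cost(candidate_arcs, expert_arcs, conf):
    """Edit-distance prior cost (in nats), up to a normalising constant.

    Simplification: edit distance is counted as the symmetric
    difference of directed arc sets (a reversal counts as two edits
    here, unlike a true edit distance). Each edit costs
    log(conf) - log(1 - conf), so a more confident expert penalises
    each disagreement more heavily.
    """
    ed = len(set(candidate_arcs) ^ set(expert_arcs))
    return ed * (math.log(conf) - math.log(1 - conf))

# Hypothetical example: two networks differing in two arcs, EdConf = 0.7
print(ed_prior_cost({("A", "B"), ("A", "C")},
                    {("A", "B"), ("B", "C")}, 0.7))  # 2*(log 0.7 - log 0.3)
```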

Priors in CaMML: KTPrior

• Again, the expert specifies a single network, plus a confidence
– e.g. KTConf = 0.7

• Prior is based on the Kendall-Tau edit distance from this network (sketched below)
– KTEditDist = KT + undirected ED

[Diagram: expert-specified dagTOM with ordering ABC; a candidate TOM with ordering ACB]

IMML(h) = 3 · (log(0.7) − log(1−0.7))

• The B–C order in the expert TOM disagrees with the candidate TOM
• KTEditDist = KT (1) + undirected ED (2) = 3
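
A sketch of the KT edit distance and its cost; the arc sets below are our guesses at the slide's figures and are only illustrative.

```python
import math
from itertools import combinations

def kt_edit_dist(expert_order, cand_order, expert_arcs, cand_arcs):
    """KTEditDist between two TOMs: Kendall-Tau disagreements between
    the total orderings plus the undirected (skeleton) edit distance.
    """
    pos_e = {v: i for i, v in enumerate(expert_order)}
    pos_c = {v: i for i, v in enumerate(cand_order)}
    kt = sum((pos_e[x] < pos_e[y]) != (pos_c[x] < pos_c[y])
             for x, y in combinations(expert_order, 2))
    skeleton = lambda arcs: {frozenset(a) for a in arcs}
    und_ed = len(skeleton(expert_arcs) ^ skeleton(cand_arcs))
    return kt + und_ed

# Expert TOM ABC (arcs A->B, B->C) vs candidate TOM ACB (arcs A->B, A->C):
# only the B-C pair changes order (KT = 1) and the skeletons differ in
# two edges (undirected ED = 2), so KTEditDist = 3
dist = kt_edit_dist("ABC", "ACB",
                    {("A", "B"), ("B", "C")}, {("A", "B"), ("A", "C")})
print(dist)                                         # 3
print(dist * (math.log(0.7) - math.log(1 - 0.7)))   # IMML(h) with KTConf = 0.7
```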

Experiment 1: Design

• Prior

– weak, strong
– correct, incorrect

• Size of dataset– 100,1000,10k and 100k– For each size we randomly generate 30 datasets

• Algorithms:
– CaMML
– K2 (BNT)
– K2+MWST (BNT)
– GES (TETRAD)
– PC (TETRAD)

• Models: AsiaNet, "Model6" (an artificial model)

Models: AsiaNet and "Model6"

[Figure: network structures of AsiaNet and "Model6"]

Experimental Design

[Diagram: experimental design combining priors, algorithms and sample sizes]

Experiment Design: Evaluation

• ED: difference between structures (structural edit distance)

• KL: difference between distributions (Kullback–Leibler divergence; sketched below)
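
For reference, a minimal sketch of the KL metric over discrete joint distributions (a toy example; in the experiments the joints come from the true and learned networks):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete joint distributions given as
    dicts mapping each joint state to its probability; assumes
    q[x] > 0 wherever p[x] > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Toy example over two binary variables; KL is 0 iff the distributions match
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(kl_divergence(p, q))
```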

[Result plots: Model6 (1000 samples), Model6 (10k samples), AsiaNet (1000 samples)]

Experiment 1: Results

• With default priors: CaMML is comparable to or outperforms other algorithms

• With full tiers:
– There is no statistically significant difference between CaMML and K2
– GES is slightly behind; PC performs poorly

• CaMML is the only method allowing soft priors:
– With the prior 0.7, CaMML is comparable to other algorithms given full tiers
– With a stronger prior, CaMML performs better

• CaMML performs significantly better with expert’s priors than with uniform priors

Experiment 2: Is CaMML well calibrated?

• Biased prior
– The expert's confidence may not be consistent with the expert's skill
  e.g., an expert is 0.99 sure but wrong about a connection
– A biased hard prior can never be overridden by the data
– A soft prior and data will eventually overcome the bad prior

Is CaMML well calibrated?

• Question: Does CaMML reward well-calibrated experts?

• Experimental design
– Objective measure: how good is the proposed structure? ED: 0–14
– Subjective measure: expert's confidence, 0.5 to 0.9999
– How good is the learned structure? KL distance

Effect of expert skill and confidence on quality of learned model

[Plot: x-axis runs from better to worse expert skill; annotations mark "overconfidence penalized", "justified confidence rewarded" and "unconfident expert"]

Experiment 2: Results

• CaMML improves the elicited structure and approaches the true structure

• CaMML improves when the expert's confidence matches the expert's skill

Conclusions

• CaMML is comparable to other algorithms when given equivalent prior knowledge

• CaMML can incorporate more flexible prior knowledge

• CaMML’s results improve when expert is skillful or well calibrated

Thanks