From Association Analysis to Causal Discovery
Prof Jiuyong Li
University of South Australia
Association analysis
• Diapers -> Beer
• Bread & Butter -> Milk
Positive correlation of birth rate to stork population
• increasing the stork population would increase the birth rate?
Further evidence for Causality ≠ Associations: Simpson's paradox

All patients   Recovered   Not recovered   Sum   Recovery rate
Drug           20          20              40    50%
No Drug        16          24              40    40%
Sum            36          44              80

Female         Recovered   Not recovered   Sum   Recovery rate
Drug           2           8               10    20%
No Drug        9           21              30    30%
Sum            11          29              40

Male           Recovered   Not recovered   Sum   Recovery rate
Drug           18          12              30    60%
No Drug        7           3               10    70%
Sum            25          15              40
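The reversal in the three tables can be checked directly. The following is a minimal sketch that recomputes the recovery rates from the counts on this slide; the variable and function names are illustrative only.

```python
# Counts copied from the Female and Male tables: (recovered, not recovered).
counts = {
    "Female": {"Drug": (2, 8), "No Drug": (9, 21)},
    "Male": {"Drug": (18, 12), "No Drug": (7, 3)},
}

def rate(recovered, not_recovered):
    """Recovery rate for one cell pair."""
    return recovered / (recovered + not_recovered)

def pooled(treatment):
    """Add the female and male counts to rebuild the overall table."""
    recovered = sum(counts[g][treatment][0] for g in counts)
    not_recovered = sum(counts[g][treatment][1] for g in counts)
    return recovered, not_recovered

overall = {t: rate(*pooled(t)) for t in ("Drug", "No Drug")}
by_gender = {g: {t: rate(*counts[g][t]) for t in counts[g]} for g in counts}

print("overall:", overall)
for gender, rates in by_gender.items():
    print(gender, rates)
```

Overall the drug group recovers more often, yet within each gender the drug group recovers less often, which is the paradox the slide points at.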
Association and Causal Relationship
• Two variables X and Y.
• Prob(Y | X) ≠ Prob(Y): X is associated with Y (association rules).
• Prob(Y | do X) ≠ Prob(Y | X).
• How does Y vary when X changes?
• The key question: how to estimate Prob(Y | do X)?
• In association analysis, the relationship between X and Y is analysed in isolation.
• However, the relationship between X and Y is affected by other variables.
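One common way to estimate Prob(Y | do X) from observational data is the adjustment formula Prob(Y | do X) = Σ_z Prob(Y | X, z) Prob(z). The sketch below applies it to the drug example from the Simpson's paradox slide, under the assumption (not stated in the slides) that gender is the only variable that needs to be controlled; the function name is illustrative.

```python
# Counts from the Simpson's paradox slide: (recovered, not recovered).
counts = {
    "Female": {"Drug": (2, 8), "No Drug": (9, 21)},
    "Male": {"Drug": (18, 12), "No Drug": (7, 3)},
}

def p_recover_do(treatment):
    """Adjustment formula: sum over z of P(recover | treatment, z) * P(z).

    Assumption: gender (z) is the only confounder.
    """
    total = sum(sum(cell) for table in counts.values() for cell in table.values())
    estimate = 0.0
    for table in counts.values():
        stratum_size = sum(sum(cell) for cell in table.values())
        recovered, not_recovered = table[treatment]
        p_y_given_x_z = recovered / (recovered + not_recovered)
        estimate += p_y_given_x_z * (stratum_size / total)
    return estimate

p_drug = p_recover_do("Drug")
p_no_drug = p_recover_do("No Drug")
print(p_drug, p_no_drug)
```

Under that assumption the adjusted recovery probability is lower with the drug than without it, the opposite of what the pooled Prob(Y | X) suggests.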
Causal discovery 1
• Randomised controlled trials
– Gold standard method
– Expensive
– Infeasible
• Association = causation
Causal discovery 2
• Bayesian network based causal inference
– Do-calculus (Pearl 2000)
– IDA (Maathuis et al. 2009)
– To infer causal effects in a Bayesian network.
– However
– Constructing a Bayesian network is NP hard
– Low scalability to large number of variables
Learning causal structures
• PC algorithm (Spirtes, Glymour and Scheines)
– If not (A ╨ B | Z) for every conditioning set Z, there is an edge between A and B.
– The search space increases exponentially with the number of variables.
• Constraint based search
– CCC (G. F. Cooper, 1997)
– CCU (C. Silverstein et al. 2000)
– Efficiently removing non-causal relationships.
[Figure: three-variable structures over A, B and C for the CCU and CCC rules]
Association rules
• Many efficient algorithms
• Hundreds of thousands to millions of rules.
– Many are spurious.
• Interpretability
– Association rules do not indicate causal effects.
Causal rules
• Discover causal relationships using partial association and simulated cohort study.
• Do not rely on Bayesian network structure learning. The discovery of causal rules also has strong theoretical support.
• Discover both single causes and combined causes.
• Can be discovered efficiently.
• Z. Jin, J. Li, L. Liu, T. D. Le, B. Sun, and R. Wang, Discovery of causal rules using partial association. ICDM, 2012
• J. Li, T. D. Le, L. Liu, J. Liu, Z. Jin, and B. Sun. Mining causal association rules. In Proceedings of ICDM Workshop on Causal Discovery (CD), 2013.
Problem
A B C D E F Y #repeats
1 1 1 1 1 1 1 14
1 0 1 1 1 1 1 8
1 1 0 1 0 1 1 15
0 1 1 1 1 1 1 8
0 1 0 0 0 0 0 5
0 0 0 0 1 0 1 6
1 0 0 0 0 1 0 4
1 0 1 1 1 0 0 3
0 1 0 1 1 0 0 3
0 1 0 0 1 0 0 5
Discover causal rules from large databases of binary variables
A → Y, C → Y, BF → Y, DE → Y
Partial association test
[Figure: diagrams over variables I, J and K illustrating the association between I and J given K]
M. W. Birch, 1964.
Nonzero partial association
Partial association test – an example
4. Partial association test.
A B C D E F Y G #repeat
1 1 1 1 1 1 1 0 14
1 0 1 1 1 1 1 0 8
1 1 0 1 0 1 1 0 15
0 1 1 1 1 1 1 0 8
0 1 0 0 0 0 0 0 5
0 0 0 0 1 0 1 0 6
1 0 0 0 0 1 0 0 4
1 0 1 1 1 0 0 0 3
1 1 1 1 0 1 1 1 3
0 1 0 0 1 0 0 0 5
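The partial association of X and Y given K, summed over the strata k of K:

\[
PA(X, Y, K) = \frac{\left( \left| \sum_{k} \frac{n_{11k}\, n_{22k} - n_{21k}\, n_{12k}}{n_{..k}} \right| - \frac{1}{2} \right)^{2}}{\sum_{k} \frac{n_{1.k}\, n_{2.k}\, n_{.1k}\, n_{.2k}}{n_{..k}^{2}\, (n_{..k} - 1)}}
\]

Here n_{11k}, n_{12k}, n_{21k} and n_{22k} count the records in stratum k with (X=1, Y=1), (X=1, Y=0), (X=0, Y=1) and (X=0, Y=0); a dot denotes a sum over that index.

Example: PA(BF, Y, ACDE). For the stratum ACDE = 1111 (25 records):

\[
\frac{n_{11k}\, n_{22k} - n_{21k}\, n_{12k}}{n_{..k}} = \frac{14 \times 3 - 8 \times 0}{25} = 1.68
\]

\[
\frac{n_{1.k}\, n_{2.k}\, n_{.1k}\, n_{.2k}}{n_{..k}^{2}\, (n_{..k} - 1)} = \frac{14 \times 11 \times 22 \times 3}{25^{2}\, (25 - 1)} = 0.6776
\]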
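The worked example PA(BF, Y, ACDE) can be reproduced on the table from this slide. This is a minimal sketch, assuming the statistic has the Mantel-Haenszel style form with a 1/2 continuity correction, and that a combined cause such as BF is "present" only when all of its variables equal 1; the function and constant names are illustrative, not the authors' implementation.

```python
from collections import defaultdict

COLUMNS = ["A", "B", "C", "D", "E", "F", "Y", "G"]
# (record values in COLUMNS order, number of repeats), copied from the slide.
ROWS = [
    ((1, 1, 1, 1, 1, 1, 1, 0), 14),
    ((1, 0, 1, 1, 1, 1, 1, 0), 8),
    ((1, 1, 0, 1, 0, 1, 1, 0), 15),
    ((0, 1, 1, 1, 1, 1, 1, 0), 8),
    ((0, 1, 0, 0, 0, 0, 0, 0), 5),
    ((0, 0, 0, 0, 1, 0, 1, 0), 6),
    ((1, 0, 0, 0, 0, 1, 0, 0), 4),
    ((1, 0, 1, 1, 1, 0, 0, 0), 3),
    ((1, 1, 1, 1, 0, 1, 1, 1), 3),
    ((0, 1, 0, 0, 1, 0, 0, 0), 5),
]

def partial_association(cause, outcome, controls):
    """Partial association of cause -> outcome, stratified by the controls."""
    idx = {name: i for i, name in enumerate(COLUMNS)}
    # One 2x2 table per stratum: rows are X=1, X=0; columns are Y=1, Y=0.
    strata = defaultdict(lambda: [[0, 0], [0, 0]])
    for values, repeat in ROWS:
        x = int(all(values[idx[v]] == 1 for v in cause))
        y = values[idx[outcome]]
        key = tuple(values[idx[v]] for v in controls)
        strata[key][1 - x][1 - y] += repeat
    numerator = 0.0
    denominator = 0.0
    for (n11, n12), (n21, n22) in strata.values():
        n = n11 + n12 + n21 + n22
        if n < 2:
            continue  # the variance term is undefined for a single record
        numerator += (n11 * n22 - n21 * n12) / n
        denominator += ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)) / (n * n * (n - 1))
    if denominator == 0:
        return 0.0
    return (abs(numerator) - 0.5) ** 2 / denominator

pa = partial_association(cause=("B", "F"), outcome="Y", controls=("A", "C", "D", "E"))
print(round(pa, 4))
```

Only the stratum ACDE = 1111 contains records with different values of BF, so it alone contributes the 1.68 and 0.6776 terms; every other stratum adds zero to both sums. Only strata that occur in the data are visited.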
Fast partial association test
• K denotes all possible variable combinations; their number is very large.
• Counting the frequencies of the combinations is also time consuming.
• Our solution:
– Sort data and count frequencies of the equivalence classes.
– Only use the combinations existing in the data set.
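The sort-and-count idea can be sketched as follows. The records are hypothetical, and the slides do not say how the sorting is implemented, so this only illustrates the principle: identical records become adjacent after sorting, and each run is one equivalence class.

```python
from itertools import groupby

# Hypothetical binary records over three control variables.
records = [
    (1, 1, 0), (0, 1, 1), (1, 1, 0),
    (0, 1, 1), (1, 0, 0), (1, 1, 0),
]

def equivalence_classes(rows):
    """Sort the rows, then count each run of identical rows."""
    return {key: sum(1 for _ in group) for key, group in groupby(sorted(rows))}

classes = equivalence_classes(records)
print(classes)
```

Only 3 of the 2^3 = 8 possible combinations appear here, so only 3 strata would need to be examined.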
Pruning strategies
Definition (Redundant causal rules): Assume that X ⊂ W. If X → Y is a causal rule, rule W → Y is redundant as it does not provide new information.
Definition (Condition for testing causal rules): We only test a combined causal rule XV → Y if X and Y have a zero association and V and Y have a zero association (they cannot pass the chi-square test in step 3).
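The redundancy definition amounts to a proper-subset check. A minimal sketch, with rule left-hand sides represented as sets of variable names (an assumed representation; the helper name is hypothetical):

```python
def is_redundant(candidate, known_causes):
    """True if some known cause X is a proper subset of the candidate W."""
    candidate = frozenset(candidate)
    return any(frozenset(x) < candidate for x in known_causes)

known = [{"A"}, {"C"}]
print(is_redundant({"A", "B"}, known))  # AB contains the known cause A
print(is_redundant({"B", "F"}, known))  # BF contains no known cause
```

With this check, a combination such as AB is never tested once A → Y is known, while BF remains a candidate.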
Algorithm
A B C D E F G Y #repeats
1 1 1 1 1 1 0 1 14
1 0 1 1 1 1 0 1 8
1 1 0 1 0 1 0 1 15
0 1 1 1 1 1 0 1 8
0 1 0 0 0 0 0 0 5
0 0 0 0 1 0 0 1 6
1 0 0 0 0 1 0 0 4
1 0 1 1 1 0 0 0 3
1 1 1 1 1 1 1 0 3
0 1 0 0 1 0 0 0 5
1. Prune the variable set (support)
2. Create the contingency table for each variable X

        Y=1   Y=0   Total
X=1     n11   n12   n1.
X=0     n21   n22   n2.
Total   n.1   n.2   n

3. Calculate $\chi^2_{X,Y}$:

\[
\chi^2_{X,Y} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left( n_{ij} - E(n_{ij}) \right)^{2}}{E(n_{ij})}
\]

• If $\chi^2_{X,Y}$ exceeds the critical value $\chi^2_{\alpha}$ (positive association), go to the next step.
• If it does not (zero association), move X to a set N.
4. Partial association test.
• If PA(X, Y, K) is nonzero (it exceeds the critical value $\chi^2_{\alpha}$), then X → Y is a causal rule.
5. Repeat 1-4 for each variable which is the combination of variables in set N.
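Step 3 can be sketched for a single 2×2 contingency table as below. The counts are hypothetical, and the 5% significance level (critical value 3.84 for one degree of freedom) is an assumption, since the slides do not state the level used.

```python
def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    observed = [[n11, n12], [n21, n22]]
    row_totals = [n11 + n12, n21 + n22]
    col_totals = [n11 + n21, n12 + n22]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                continue  # an empty row or column carries no evidence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

CRITICAL_VALUE = 3.84  # chi-square, 1 degree of freedom, assumed 5% level

# Hypothetical counts: X=1 rows first, Y=1 columns first.
chi2 = chi_square_2x2(30, 10, 10, 30)
associated = chi2 > CRITICAL_VALUE
print(chi2, associated)
```

A variable whose statistic stays below the critical value would be moved to the set N and considered later only as part of a combination.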
Experimental evaluations
• We use the Arrhythmia data set in the UCI machine learning repository.
– We need to classify the presence and absence of cardiac arrhythmia. The data set contains 452 records, and each record contains 279 data attributes and one class attribute.
• Our results are quite consistent with the results from the CCC method.
• Some rules in CCC are removed by our method as they cannot pass the partial association test.
• Our method can discover the combined rules. CCC and CCU methods are not set to discover these rules.
Comparison with CCC and CCU
Experimental evaluations
Figure 1: Extraction Time Comparison (20K Records)
Figure 1: Extraction Time Comparison (100K Records)
Summary 1
• Simpson's paradox
– Associations might be inconsistent in subsets
• Partial association test
– Test the persistency of associations in all possible partitions.
– Statistically sound.
– Efficiency in sparse data.
• What else?
Cohort study 1
Defined population
– Exposed: have a disease / do not have a disease
– Not exposed: have a disease / do not have a disease
• Prospective: follow up.
• Retrospective: look back. Historic study.
Cohort study 2
• Cohorts: share common characteristics but exposed or not exposed.
• Determine how the exposure causes an outcome.
• Measure: odds ratio = (a/b) / (c/d)

              Diseased   Healthy
Exposed       a          b
Not exposed   c          d
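The odds ratio measure can be computed directly from the four cells. A minimal sketch with hypothetical counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d) for exposed (a, b) and not exposed (c, d)."""
    if b == 0 or c == 0 or d == 0:
        raise ValueError("odds ratio is undefined when b, c or d is zero")
    return (a / b) / (c / d)

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed are diseased.
ratio = odds_ratio(a=30, b=70, c=10, d=90)
print(ratio)
```

A ratio above 1 means the odds of disease are higher in the exposed group; a ratio of 1 means exposure makes no difference to the odds.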
Limitations of cohort study
• Need to know a hypothesis beforehand
• Domain experts determine the control variables.
• Collect data and test the hypothesis.
• Not for data exploration.
• We need
– Given a data set without any hypotheses.
– An automatic method to find and validate hypotheses.
– For data exploration.
Control variables
• If we do not control covariates (especially those correlated to the outcome), we cannot determine the true cause.
• Too many control variables result in too few matched cases in the data.
– How many people share the same race, gender, blood type, hair colour, eye colour, education level, …?
• Irrelevant variables should not be controlled.
– Eye colour may not be relevant to the study.
[Figure: Cause, Outcome and Other factors]
Matches
• Exact matching
– Exact matches on all covariates. Infeasible.
• Limited exact matching
– Exact matches on a few key covariates.
• Nearest neighbour matching
– Find the closest neighbours.
• Propensity score matching
– Based on the predicted probability of receiving the treatment given the covariates.
Method1
A B C D E F Y
1 1 1 1 1 1 1
1 0 1 1 1 1 1
1 1 0 1 0 1 1
0 1 1 1 1 1 1
0 1 0 0 0 0 0
0 0 0 0 1 0 1
1 0 0 0 0 1 0
1 0 1 1 1 0 0
0 1 0 1 1 0 0
0 1 0 0 1 0 0
Discover causal association rules from large databases of binary variables
A → Y
A B C D E F Y
1 1 1 1 1 1 1
1 0 1 0 1 1 1
1 1 0 1 0 1 0
1 0 1 0 1 0 0
0 1 1 1 1 1 0
0 0 1 0 1 1 0
0 1 0 1 0 1 1
0 0 1 0 1 0 1
Fair dataset
Methods
A B C D E F Y
1 1 1 1 1 1 1
1 0 1 0 1 1 1
1 1 0 1 0 1 0
1 0 1 0 1 0 0
0 1 1 1 1 1 0
0 0 1 0 1 1 0
0 1 0 1 0 1 1
0 0 1 0 1 0 1
Fair dataset
• A: Exposure variable
• {B,C,D,E,F}: controlled variable set.
• Rows with the same values for the controlled variable set (shown in the same colour) are called matched record pairs.

                A=0
              Y=1   Y=0
A=1   Y=1     n11   n12
      Y=0     n21   n22

• An association rule A → Y is a causal association rule if:

\[
\mathrm{OddsRatio}_{D_f}(A \rightarrow Y) > 1
\]
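The matched-pair table can be filled in from the fair dataset on this slide. The sketch below assumes the standard matched-pair odds ratio n12 / n21 (the ratio of the two kinds of discordant pairs), which the slides do not spell out, and assumes each covariate pattern occurs once per exposure group, as it does here.

```python
# Fair dataset from the slide, columns: A, B, C, D, E, F, Y.
FAIR_DATASET = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 0, 1, 0, 1, 1, 1),
    (1, 1, 0, 1, 0, 1, 0),
    (1, 0, 1, 0, 1, 0, 0),
    (0, 1, 1, 1, 1, 1, 0),
    (0, 0, 1, 0, 1, 1, 0),
    (0, 1, 0, 1, 0, 1, 1),
    (0, 0, 1, 0, 1, 0, 1),
]

def matched_pair_table(rows):
    """Count pairs by outcome: first index is Y for A=1, second is Y for A=0."""
    exposed = {row[1:6]: row[6] for row in rows if row[0] == 1}
    control = {row[1:6]: row[6] for row in rows if row[0] == 0}
    table = {"n11": 0, "n12": 0, "n21": 0, "n22": 0}
    for covariates, y_exposed in exposed.items():
        y_control = control[covariates]
        key = "n" + ("1" if y_exposed == 1 else "2") + ("1" if y_control == 1 else "2")
        table[key] += 1
    return table

table = matched_pair_table(FAIR_DATASET)
# Assumed matched-pair odds ratio: discordant pairs n12 over n21.
pair_odds_ratio = table["n12"] / table["n21"] if table["n21"] else float("inf")
print(table, pair_odds_ratio)
```

For this toy fair dataset the two kinds of discordant pairs balance out, so the ratio is 1 and A → Y would not be reported as a causal association rule.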
Algorithm
A B C D E F G Y
1 1 1 1 1 1 0 1
… … …
1 1 0 1 0 1 0 1
1. Remove irrelevant variables (support, local support, association)
2. Find the exclusive variables of the exposure variable (support, association), i.e. G, F.
The controlled variable set = {B, C, D, E}.
3. Find the fair dataset. Search for all matched record pairs
4. Calculate the odds-ratio to identify if the testing rule is causal
5. Repeat 2-4 for each variable which is the combination of variables. Only consider combinations of non-causal factors.
For each association rule (e.g. A → Y)
A B C D E Y
1 1 1 1 1 1
… … …
0 1 1 1 1 0
… …
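Step 3, the search for matched record pairs, can be sketched as an exact match on the controlled variable set. The records and the function name below are hypothetical, and using each record in at most one pair is an assumption rather than something the slides specify.

```python
from collections import defaultdict

def find_fair_dataset(records, exposure, controlled):
    """Pair each exposed record with an unexposed record that has the
    same values on every controlled variable."""
    exposed = defaultdict(list)
    unexposed = defaultdict(list)
    for record in records:
        key = tuple(record[v] for v in controlled)
        target = exposed if record[exposure] == 1 else unexposed
        target[key].append(record)
    pairs = []
    for key, exposed_rows in exposed.items():
        # zip stops at the shorter list, so each record is used at most once.
        pairs.extend(zip(exposed_rows, unexposed.get(key, [])))
    return pairs

# Hypothetical records; the controlled variable set is {B, C, D, E}.
records = [
    {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "Y": 1},
    {"A": 0, "B": 1, "C": 1, "D": 1, "E": 1, "Y": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "Y": 1},
    {"A": 0, "B": 0, "C": 0, "D": 0, "E": 1, "Y": 0},
]
pairs = find_fair_dataset(records, exposure="A", controlled=["B", "C", "D", "E"])
print(pairs)
```

Only the first two records agree on B, C, D and E, so they form the single matched pair; the other two records are left out of the fair dataset.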
Experimental evaluations
Figure 1: Extraction Time Comparison (20K Records); methods compared: CAR, CCC, CCU
Causality – Judea Pearl
Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
X1 X2 … Xn-1 Xn
5.2 7.5 6.5 5.2
5.6 7.2 6.6 5.3
… … … … …
5.4 7.1 7.1 5.7
5.7 6.9 6.9 5.8
+1
+0.8
Methods
• IDA
– Maathuis, M. H., Colombo, D., Kalisch, M., and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4), 247–249.
Conclusions
• Association analysis has been widely used in data mining, but associations do not indicate causal relationships.
• Association rule mining can be adapted for causal relationship discovery by combining some statistical methods.
– Partial association test
– Cohort study
• They are efficient alternatives for causal Bayesian network based methods.
• They are capable of finding combined causal factors.
Discussions
• Causality and classification
– Estimate Prob(Y | do X) instead of Prob(Y | X).
• Feature selection versus controlled variable selection.
• Evaluation of causes.
– Not classification accuracy
– Bayesian networks??
Research Collaborators
• Jixue Liu
• Lin Liu
• Thuc Le
• Jin Zhou
• Bin-yu Sun
Thank you for listening
Questions please ??