1
Part 2
Automatically Identifying and Measuring Latent Variables for Causal Theorizing
2
Assumptions Throughout
• Causal Bayes Nets
• Causal Markov Condition
• Faithfulness
3
Latent Variables
Reduce Dimensionality
[Figure: latent factors F1 and F2 with measured variables X1, X2, X3, X4, ..., X200]
4
Latent Variables
Cluster of Causes
[Figure: Socioeconomic Status with the cluster Income, Education, House Size]
5
Latent Variables
Model concepts that might be “real” but which cannot be directly measured, e.g., air pollution, depression
[Figure: latent Air Pollution with measures I1, I2, ..., I20; latent Depression with measures Dep1, Dep2, ..., Dep20]
6
The Causal Theory Formation Problem for Latent Variable Models
Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts.
Example: Spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency.
Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning
7
The Most Common Automatic Solution: Exploratory Factor Analysis
• Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible.
• Great for dimensionality reduction
• Factor rotations are arbitrary
• Gives no information about the statistical, and thus the causal, dependencies among any real underlying factors.
• No general theory of the reliability of the procedure
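The point about arbitrary rotations can be checked numerically. The sketch below uses made-up loadings (the matrix `loadings`, the unique variances, and the 30-degree angle are illustrative assumptions, not values from the talk) and shows that rotating the factors changes the loadings but leaves the implied covariance matrix untouched, so the covariances alone cannot favor one rotation over another.

```python
import numpy as np

# Hypothetical loadings of 4 measured variables on 2 orthogonal factors.
loadings = np.array([[0.9, 0.1],
                     [0.8, 0.2],
                     [0.2, 0.7],
                     [0.1, 0.9]])
unique_var = np.diag([0.3, 0.4, 0.5, 0.2])  # unique (error) variances


def implied_cov(lam, theta):
    """Covariance implied by x = lam @ f + e with uncorrelated unit-variance factors."""
    return lam @ lam.T + theta


# Any orthogonal rotation of the factor axes gives a new loading matrix.
angle = np.deg2rad(30.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
rotated = loadings @ rotation

same_cov = np.allclose(implied_cov(loadings, unique_var),
                       implied_cov(rotated, unique_var))
print("loadings changed:", not np.allclose(loadings, rotated))
print("implied covariance unchanged:", same_cov)
```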
8
Other Solutions
• Independent Components, etc
• Background Theory
• Scales
9
Other Solutions: Background Theory
[Figure: Specified Model. Latents Home (measures St1, St2, ..., St21), Lead (T1, T2, ..., T20), and Cognitive Function (C1, C2, ..., C20); a "?" marks the key causal question.]
Key Causal Question
Thus, key statistical question: Lead _||_ Cog | Home ?
10
Other Solutions: Background Theory
[Figure: True Model. Latents Home (measures St1, St2, ..., St21), Lead (T1, T2, ..., T20), and Cognitive Function (C1, C2, ..., C20), plus an additional latent F and “impurities” among the measures.]
Lead _||_ Cog | Home ?
Yes, but statistical inference will say otherwise.
11
Other Solutions: Background Theory
[Figure: True Model. Latents Home, Lead, Cognitive Function, and F, with measures St1, ..., St21, T1, ..., T20, and C1, ..., C20.]
“Impure” measures: C1, C2, T2, T20
A measure is “pure” if it is d-separated from all other measures by its latent parent.
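In a linear model, d-separation by the latent parent implies a zero partial covariance given that latent, which gives a simple way to illustrate purity. The sketch below is an assumption-laden toy, not the model from the slides: one latent `L` with four measures, where the last two also share a hypothetical extra cause `U` (all loadings are invented for illustration).

```python
import numpy as np

# Hypothetical linear model: latent L with four measures; m2 and m3 also
# share an extra unmeasured cause U, which makes them impure.
lam = np.array([1.0, 0.8, 0.9, 0.7])      # loadings on L
gam = np.array([0.0, 0.0, 0.6, 0.5])      # loadings on U (the impurity)
err_var = np.array([0.5, 0.5, 0.5, 0.5])  # independent error variances

# Population covariances, with Var(L) = Var(U) = 1 and L independent of U.
cov_mm = np.outer(lam, lam) + np.outer(gam, gam) + np.diag(err_var)
cov_mL = lam  # Cov(m_i, L)

# Partial covariance of the measures given L (Var(L) = 1).
partial = cov_mm - np.outer(cov_mL, cov_mL)


def is_pure(i, tol=1e-12):
    """True if measure i is uncorrelated with every other measure given L."""
    others = [j for j in range(len(lam)) if j != i]
    return bool(np.all(np.abs(partial[i, others]) < tol))


purity = [is_pure(i) for i in range(4)]
print(purity)  # [True, True, False, False]
```

The first two measures pass because conditioning on `L` removes all of their association with the others; the last two fail because `U` still connects them.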
12
Purify
[Figure: Specified Model. Latents F1, F2, F3; measures x1, x2, x3, x4, y1, y2, y3, y4, z1, z2, z3, z4.]
13
Purify
[Figure: True Model. Latents F1, F2, F3 and an additional latent F; measures x1, x2, x3, x4, y1, y2, y3, y4, z1, z2, z3, z4.]
17
Purify
[Figure: Purified Model. Latents F1, F2, F3, and F; measures x4 and z2 have been dropped, leaving x1, x2, x3, y1, y2, y3, y4, z1, z3, z4.]
18
Other Solutions: Scales
Scale = sum(measures of a latent)
[Figure: latent Home with measures St1, St2, ..., St21, combined into Homescale]
Homescale = Σ (i = 1 to 21) Sti
19
Other Solutions: Scales
True Model
Pseudo-Random Sample: N = 2,000
20
Scales vs. Latent variable Models
Regression: Cognition on Home, Lead

Predictor   Coef       SE Coef   T       P
Constant    -0.02291   0.02224   -1.03   0.303
Home         1.22565   0.02895   42.33   0.000
Lead        -0.00575   0.02230   -0.26   0.797

S = 0.9940   R-Sq = 61.1%   R-Sq(adj) = 61.0%

Lead: Insig.
True Model
21
Scales vs. Latent variable Models
Scales
homescale = (x1 + x2 + x3)/3
leadscale = (x4 + x5 + x6)/3
cogscale = (x7 + x8 + x9)/3
True Model
22
Scales vs. Latent variable Models
Regression: Cognition on homescale, Lead

Cognition = - 0.0295 + 0.714 homescale - 0.178 Lead

Predictor   Coef       SE Coef   T       P
Constant    -0.02945   0.02516   -1.17   0.242
homescal     0.71399   0.02299   31.05   0.000
Lead        -0.17811   0.02386   -7.46   0.000

Lead: Sig.
True Model
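The contrast between the regression on the true latent and the regression on the scale can be reproduced in a small simulation. The sketch below assumes a true model in which Home causes both Lead and Cognition and Lead has no effect on Cognition; the coefficients (-0.5, 1.2), the unit error variances, the seed, and the helper `ols` are illustrative assumptions, not the parameters behind the slides. Because the scale measures Home with error, it controls for Home only partially, and Lead picks up the leftover association.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Assumed true model (illustrative parameters): Home causes both Lead and
# Cognition; Lead has NO effect on Cognition.
home = rng.normal(size=n)
lead = -0.5 * home + rng.normal(size=n)
cog = 1.2 * home + rng.normal(size=n)

# Three noisy measures of Home, averaged into a scale.
home_items = home[:, None] + rng.normal(size=(n, 3))
homescale = home_items.mean(axis=1)


def ols(y, *predictors):
    """Least-squares fit with intercept; returns coefficients and t statistics."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se


beta_true, t_true = ols(cog, home, lead)         # conditioning on the latent itself
beta_scale, t_scale = ols(cog, homescale, lead)  # conditioning on the scale

print("Lead coef given true Home: %.3f (t = %.2f)" % (beta_true[2], t_true[2]))
print("Lead coef given homescale: %.3f (t = %.2f)" % (beta_scale[2], t_scale[2]))
```

Under these assumptions the Lead coefficient should sit near zero in the first fit and be clearly negative in the second, mirroring the pattern in the two regression tables.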
23
Scales vs. Latent variable Models
Modeling Latents
True Model
Specified Model
24
Scales vs. Latent variable Models
(χ² = 29.6, df = 24, p = .19)
B5 = .0075, which at t = .23 is correctly insignificant
True Model
Estimated Model
25
Scales vs. Latent variable Models
Mixing Latents and Scales
(χ² = 14.57, df = 12, p = .26)
B5 = -.137, which at t = 5.2 is incorrectly highly significant (P < .001)
True Model
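The p-values attached to the two chi-square fit statistics can be recomputed with the standard library alone, because both degrees of freedom are even and the chi-square upper tail then has a closed form. The function name `chi2_sf_even_df` is a hypothetical helper introduced here; the results should come out close to the reported .19 and .26.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    Uses the identity P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term            # add (x/2)^j / j!
        term *= half / (j + 1)   # advance to the next term
    return math.exp(-half) * total


p_latent = chi2_sf_even_df(29.6, 24)   # latent variable model
p_mixed = chi2_sf_even_df(14.57, 12)   # model mixing latents and scales
print("latent model p = %.3f, mixed model p = %.3f" % (p_latent, p_mixed))
```

Both models pass the overall fit test, which is the point of the comparison: a good chi-square fit does not protect the mixed model from the spurious Lead coefficient.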
26
Algorithms
Washdown (Scheines and Glymour, 2000?)
Build Pure Clusters (Silva, Scheines, Glymour, 2003, 2004)
27
Build Pure Clusters
Qualitative Assumptions (Causal Grammar - Tenenbaum):
1. Two types of nodes: measured (M) and latent (L)
2. M ↛ L (measured don’t cause latents)
3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L
4. No cycles involving M
Quantitative Assumptions:
1. Each m ∈ M is a linear function of its parents plus noise
2. P(L) has second moments, positive variances, and no deterministic relations
28
Build Pure Clusters
Output - provably reliable (pointwise consistent):
Equivalence class of measurement models over a pure subset of M
For example:
[Figure: True Model with latents L1, L2, L3 and measures m1, ..., m11; Output with latents L1, L2, L3 over the measures m1, ..., m9.]
29
Build Pure Clusters
Measurement models in the equivalence class are at most refinements, but never coarsenings or permuted clusterings.
[Figure: the Output (latents L1, L2, L3 over m1, ..., m9) alongside three other clusterings of m1, ..., m9: one with latents L1, L2, L3, L4, one with L1, L2, L3, and one with only L1 and L3.]
30
Build Pure Clusters
Algorithm Sketch:
1. Use particular rank (tetrad) constraints on the measured correlations to find pairs mj, mk that do NOT share a latent parent
2. Add a latent for each subset S of M such that no pair in S was found NOT to share a latent parent in step 1.
3. Purify
4. Remove latents with no children
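To see why tetrad constraints carry information about shared latent parents, it helps to evaluate them on population covariances. The sketch below is not the Build Pure Clusters test itself; it only illustrates the underlying fact with invented loadings (the helpers `implied_cov` and `tetrads` are hypothetical names). Four pure measures of one latent satisfy all three tetrad constraints, while two pairs measuring two correlated latents satisfy only one of them.

```python
import numpy as np


def implied_cov(loadings, latent_cov, err_var):
    """Covariance of measures m = loadings @ latents + independent noise."""
    return loadings @ latent_cov @ loadings.T + np.diag(err_var)


def tetrads(cov, i, j, k, l):
    """The three tetrad differences among measures i, j, k, l."""
    return (cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l],
            cov[i, j] * cov[k, l] - cov[i, l] * cov[j, k],
            cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k])


err = np.full(4, 0.5)

# Case A: all four measures are pure indicators of a single latent.
one = implied_cov(np.array([[0.9], [0.8], [0.7], [0.6]]), np.eye(1), err)

# Case B: m0, m1 measure latent 1; m2, m3 measure latent 2 (correlation 0.5).
lam2 = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
two = implied_cov(lam2, np.array([[1.0, 0.5], [0.5, 1.0]]), err)

t_one = tetrads(one, 0, 1, 2, 3)
t_two = tetrads(two, 0, 1, 2, 3)
print("one latent: ", np.round(t_one, 4))
print("two latents:", np.round(t_two, 4))
```

In case B the only vanishing tetrad is the one that pairs each measure with a measure of the other latent, which is the kind of pattern that signals that a pair such as m0 and m2 does not share a latent parent.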
31
Limitations
• Requires large sample sizes to be really reliable (~ 500).
• Pure indicators must exist for a latent to be discovered and included
• Moderately computationally intensive (O(n^6)).
• No error probabilities.
32
Case Studies
Stress, Depression, and Religion (Lee, 2004)
Test Anxiety (Bartholomew, 2002)
33
Stress, Depression, and Religion
MSW Students (N = 127), 61-item survey (Likert Scale)
• Stress: St1 - St21
• Depression: D1 - D20
• Religious Coping: C1 - C20
[Figure: Specified Model. Latents Stress (St1, St2, ..., St21), Depression (Dep1, Dep2, ..., Dep20), and Coping (C1, C2, ..., C20), with structural edges labeled + and -. P = 0.00]
34
Stress, Depression, and Religion
Build Pure Clusters
[Figure: Build Pure Clusters output. Stress: St3, St4, St16, St18, St20. Depression: Dep9, Dep13, Dep19. Coping: C9, C12, C14, C15.]
35
Stress, Depression, and Religion
Assume Stress temporally prior:
MIMbuild to find Latent Structure:
[Figure: latents Stress (St3, St4, St16, St18, St20), Depression (Dep9, Dep13, Dep19), and Coping (C9, C12, C14, C15), with two structural edges labeled +. P = 0.28]
36
Test Anxiety
12th Grade Males in British Columbia (N = 335)
20-item survey (Likert Scale items): X1 - X20
Exploratory Factor Analysis:
[Figure: two factors. Emotionality: X2, X8, X9, X10, X15, X16, X18. Worry: X3, X4, X5, X6, X7, X14, X17, X20.]
37
Test Anxiety
Build Pure Clusters:
[Figure: three latents. Emotionality: X2, X8, X9, X10, X11, X16, X18. Cares About Achieving and Self-Defeating: X3, X5, X7, X14, X6.]
38
Test Anxiety
Build Pure Clusters:
[Figure: three latents. Emotionality: X2, X8, X9, X10, X11, X16, X18. Worries About Achieving and Self-Defeating: X3, X5, X7, X14, X6.]
Exploratory Factor Analysis:
[Figure: two factors. Emotionality: X2, X8, X9, X10, X15, X16, X18. Worry: X3, X4, X5, X6, X7, X14, X17, X20.]
P-value = 0.00, P-value = 0.47
39
Test Anxiety
MIMbuild
[Figure: latent structure over Emotionality (X2, X8, X9, X10, X11, X16, X18), Worries About Achieving, and Self-Defeating (X3, X5, X7, X14, X6). p = .43]
Scales: No Independencies or Conditional Independencies
[Figure: Emotionality-Scale, Worries About Achieving-Scale, Self-Defeating. Uninformative]
40
Future Directions
• Handle discrete items
• Incorporate background knowledge
• Apply to ETS data