Part 2

Automatically Identifying and Measuring Latent Variables for Causal Theorizing

Assumptions Throughout

• Causal Bayes Nets

• Causal Markov Condition

• Faithfulness

Latent Variables

Reduce Dimensionality

[Figure: measured variables X2, X3, X4, ..., X200]

Latent Variables

Cluster of Causes

[Figure: Socioeconomic Status, with Income, Education, and House Size]

Latent Variables

Model concepts that might be “real” but which cannot be directly measured, e.g., air pollution, depression

Air Pollution

Depression

The Causal Theory Formation Problem for Latent Variable Models

Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts.

Example: Spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency.

Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning

The Most Common Automatic Solution: Exploratory Factor Analysis

• Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible.

• Great for dimensionality reduction

• Factor rotations are arbitrary

• Gives no information about the statistical and thus the causal dependencies among any real underlying factors.

• No general theory of the reliability of the procedure
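The point about arbitrary rotations can be checked numerically. The sketch below assumes an orthogonal two-factor model with made-up loadings and error variances (none of these numbers come from the slides) and shows that rotating the loadings leaves the implied covariance matrix unchanged, so the data cannot distinguish between the rotated solutions.

```python
import numpy as np

# Hypothetical loadings of 6 measured variables on 2 factors (illustrative values).
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.8],
    [0.0, 0.9],
    [0.2, 0.7],
])
unique_var = np.diag(np.full(6, 0.3))  # assumed diagonal error covariance


def implied_cov(lam, theta):
    """Covariance implied by an orthogonal factor model: lam @ lam.T + theta."""
    return lam @ lam.T + theta


angle = np.deg2rad(37.0)  # any rotation angle gives the same conclusion
rotation = np.array([
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)],
])
rotated = loadings @ rotation

same_fit = np.allclose(implied_cov(loadings, unique_var),
                       implied_cov(rotated, unique_var))
same_loadings = np.allclose(loadings, rotated)
print("same implied covariance:", same_fit)
print("same loadings:", same_loadings)
```

Because the rotation matrix is orthogonal, it cancels inside `lam @ lam.T`, which is why the fit is identical while the loadings differ.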

Other Solutions: Scales

Pseudo-Random Sample: N = 2,000

Scales vs. Latent variable Models

Regression: Cognition on Home, Lead

Predictor      Coef   SE Coef      T      P
Constant   -0.02291   0.02224  -1.03  0.303
Home        1.22565   0.02895  42.33  0.000
Lead       -0.00575   0.02230  -0.26  0.797

S = 0.9940   R-Sq = 61.1%   R-Sq(adj) = 61.0%

Insig.

True Model

Scales

homescale = (x1 + x2 + x3)/3

leadscale = (x4 + x5 + x6)/3

cogscale = (x7 + x8 + x9)/3

True Model

Regression: Cognition on homescale, Lead

Cognition = - 0.0295 + 0.714 homescale - 0.178 Lead

Predictor      Coef   SE Coef      T      P
Constant   -0.02945   0.02516  -1.17  0.242
homescal    0.71399   0.02299  31.05  0.000
Lead       -0.17811   0.02386  -7.46  0.000
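The contrast between regressing on Home itself and regressing on a scale can be reproduced in a small simulation. This is a minimal sketch under an assumed generating model: the coefficients, noise levels, and seed are illustrative choices, not the parameters behind the slides. Lead has no direct effect on Cognition, yet replacing Home with a noisy average of its indicators makes Lead look like a significant predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Assumed generating model (illustrative, not the slides' exact parameters).
home = rng.normal(size=n)
lead = -0.5 * home + rng.normal(size=n)      # Lead is correlated with Home
cognition = 1.2 * home + rng.normal(size=n)  # Lead has NO direct effect

# Three noisy indicators of Home, averaged into a scale.
x = home[:, None] + 1.5 * rng.normal(size=(n, 3))
homescale = x.mean(axis=1)


def ols(y, *predictors):
    """Return OLS coefficients and t statistics (intercept first)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return coef, coef / se


coef_true, t_true = ols(cognition, home, lead)
coef_scale, t_scale = ols(cognition, homescale, lead)

print(f"Home regression:  Lead coef = {coef_true[2]:.3f}, t = {t_true[2]:.2f}")
print(f"Scale regression: Lead coef = {coef_scale[2]:.3f}, t = {t_scale[2]:.2f}")
```

The scale measures Home with error, so part of Home's effect is picked up by Lead, the other variable correlated with Home.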

True Model

Modeling Latents

True Model

Specified Model

(χ² = 29.6, df = 24, p = .19)

B5 = .0075, which at t=.23, is correctly insignificant
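For readers who want to see where the model-fit p-value comes from, the sketch below computes the upper-tail probability of a chi-square statistic. It uses a closed form that holds only for even degrees of freedom, which happens to cover df = 24; the function name is a hypothetical helper, and the result should land near, though not necessarily exactly on, the rounded value reported above.

```python
import math


def chi2_pvalue_even_df(chi2, df):
    """Upper-tail p-value of a chi-square statistic, valid for even df only.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p_latent = chi2_pvalue_even_df(29.6, 24)
print(f"chi-square = 29.6, df = 24, p = {p_latent:.3f}")
```

A large p-value here means the covariance constraints implied by the specified model are not rejected by the data.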

True Model

Estimated Model

Mixing Latents and Scales

(χ² = 14.57, df = 12, p = .26)

B5 = -.137, which at t = 5.2 is incorrectly highly significant (P < .001)

True Model

Algorithms

Washdown (Scheines and Glymour, 2000?)

Build Pure Clusters (Silva, Scheines, Glymour, 2003, 2004)

Build Pure Clusters

Qualitative Assumptions (Causal Grammar - Tenenbaum):

1. Two types of nodes: measured (M) and latent (L)

2. M ↛ L (measured don’t cause latents)

3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L

4. No cycles involving M

Quantitative Assumptions:

1. Each m ∈ M is a linear function of its parents plus noise

2. P(L) has second moments, positive variances, and no deterministic relations
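Written out, the linearity assumption says each measured variable is a weighted sum of its parents plus an error term. The symbols λ, ε, and pa(·) are notation assumed here for illustration and do not appear on the slides. The second expression is the consequence for two pure indicators of a single latent L with independent errors: their covariance factors into a product of loadings.

```latex
m = \sum_{p \,\in\, \mathrm{pa}(m)} \lambda_{mp}\, p + \varepsilon_m ,
\qquad
\operatorname{Cov}(m_i, m_j) = \lambda_i \lambda_j \operatorname{Var}(L)
\quad (i \neq j)
```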

Build Pure Clusters

Output - provably reliable (pointwise consistent):

Equivalence class of measurement models over a pure subset of M

For example:

[Figure: two measurement models over latents L1, L2, L3, labeled True Model and Output. One has indicators m1-m9; the other has indicators m1-m11.]

Build Pure Clusters

Measurement models in the equivalence class are at most refinements, but never coarsenings or permuted clusterings.

[Figure: the Output model and two further measurement models, each with latents L1, L2, L3 and indicators m1-m9.]

Build Pure Clusters

Algorithm Sketch:

1. Use particular rank (tetrad) constraints on the measured correlations to find pairs mj, mk that do NOT share a latent parent

2. Add a latent for each subset S of M such that no pair in S was found NOT to share a latent parent in step 1.

3. Purify

4. Remove latents with no children
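The kind of constraint used in step 1 can be illustrated with population covariances. This is a sketch of generic tetrad differences under assumed loadings and error variances, not the specific battery of tests Build Pure Clusters applies. When four indicators are pure measures of one latent, all three tetrad differences vanish; when they split across two imperfectly correlated latents, only one does.

```python
import numpy as np


def implied_cov(loadings, latent_cov, noise_var):
    """Covariance of indicators in a linear measurement model."""
    return loadings @ latent_cov @ loadings.T + np.diag(noise_var)


def tetrads(cov, i, j, k, l):
    """The three tetrad differences among variables i, j, k, l."""
    return (
        cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l],
        cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k],
        cov[i, j] * cov[k, l] - cov[i, l] * cov[j, k],
    )


noise = np.full(4, 0.5)  # assumed error variances

# Case 1: m1..m4 are all pure indicators of a single latent.
one_latent = implied_cov(
    np.array([[0.9], [0.8], [0.7], [0.6]]),
    np.array([[1.0]]),
    noise,
)

# Case 2: m1, m2 measure L1; m3, m4 measure L2; corr(L1, L2) = 0.5.
two_latents = implied_cov(
    np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]]),
    np.array([[1.0, 0.5], [0.5, 1.0]]),
    noise,
)

t_one = tetrads(one_latent, 0, 1, 2, 3)
t_two = tetrads(two_latents, 0, 1, 2, 3)
print("one latent: ", np.round(t_one, 4))
print("two latents:", np.round(t_two, 4))
```

With sample data the differences are never exactly zero, so in practice each one has to be tested statistically, which is one reason the procedure needs reasonably large samples.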

Limitations

• Requires large sample sizes to be really reliable (~ 500).

• Pure indicators must exist for a latent to be discovered and included

• Moderately computationally intensive (O(n^6)).

• No error probabilities.

Case Studies

Stress, Depression, and Religion (Lee, 2004)

Test Anxiety (Bartholomew, 2002)

Stress, Depression, and Religion

MSW Students (N = 127)

61-item survey (Likert Scale)

• Stress: St1 - St21

• Depression: D1 - D20

• Religious Coping: C1 - C20

[Figure: Specified Model with latents Stress, Coping, and Depression; Coping is measured by C1, C2, ..., C20.]

Specified Model: P = 0.00

Stress, Depression, and Religion

Build Pure Clusters

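[Figure: Build Pure Clusters output with latents Stress, Coping, and Depression. Legible indicator labels: St16 (Stress); Dep13, Dep19 (Depression); C9, C12, C15 (Coping). Two further labels survive only as "12".]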

Stress, Depression, and Religion

Assume Stress temporally prior:

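MIMbuild to find Latent Structure:

[Figure: latent structure over Stress, Coping, and Depression. Legible indicator labels: St3, St16 (Stress); Dep13, Dep19 (Depression); C9, C12, C15 (Coping). Two further labels survive only as "12".]

P = 0.28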

Test Anxiety

12th Grade Males in British Columbia (N = 335)

20-item survey (Likert Scale items): X1 - X20

Test Anxiety

Exploratory Factor Analysis: Emotionality, Worry

Build Pure Clusters: Emotionality, Cares About Achieving, Self-Defeating

Test Anxiety

Exploratory Factor Analysis: Emotionality, Worry (P-value = 0.00)

Build Pure Clusters: Emotionality, Worries About Achieving, Self-Defeating (P-value = 0.47)

Test Anxiety

Emotionality

Worries About Achieving

Self-Defeating

MIMbuild

p = .43

Emotionality-Scale

Worries About Achieving-Scale

Self-Defeating

Uninformative

Scales: No Independencies or Conditional Independencies

Future Directions

• Handle discrete items

• Incorporate background knowledge

• Apply to ETS data
