
Using Bayesian Causal Forest Models to Examine Treatment Effect Heterogeneity

Jared S. Murray

UT Austin

Multilevel Linear Models for Heterogeneous Treatment Effects

y_{ij} = \alpha_j + \sum_{h=1}^{p} \beta_h x_{ijh} + \left[ \sum_{\ell=1}^{k} \tau_\ell w_{ij\ell} + \gamma_j \right] z_{ij} + \epsilon_{ij}

where
• x_ij are controls at the student and/or school level,
• w_ij are moderators at the student and/or school level,
• α_j are school-specific intercepts (fixed or random effects), and
• γ_j is school-specific "unexplained" treatment effect heterogeneity.
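This linear specification maps onto standard mixed-model software. Below is a minimal sketch using statsmodels on simulated stand-in data; all variable names, dimensions, and coefficients (x1, w1, z, school, etc.) are illustrative assumptions, not the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated stand-in for the student-level data (illustrative names):
# y = outcome, x1 = control, w1 = school-level moderator, z = treatment.
n_schools, n_per = 30, 40
school = np.repeat(np.arange(n_schools), n_per)
df = pd.DataFrame({
    "school": school,
    "x1": rng.normal(size=school.size),
    "w1": rng.normal(size=n_schools)[school],
    "z": rng.binomial(1, 0.5, size=school.size),
})
df["y"] = (rng.normal(0, 0.3, n_schools)[school] + 0.5 * df.x1
           + (0.10 + 0.05 * df.w1) * df.z + rng.normal(0, 0.5, school.size))

# Fixed effects: control, treatment, and moderator-by-treatment interaction
# (the tau_l terms). Random effects by school: intercept (alpha_j) and
# treatment slope (gamma_j) via re_formula="~z".
fit = smf.mixedlm("y ~ x1 + z + w1:z", df, groups=df["school"], re_formula="~z").fit()
print(fit.summary())
```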


Coloring outside the lines: Multilevel Bayesian Causal Forests

y_{ij} = \alpha_j + \beta(x_{ij}) + \left[ \tau(w_{ij}) + \gamma_j \right] z_{ij} + \epsilon_{ij}

We replace the linear terms with Bayesian additive regression trees (BART). This allows for complicated functional forms (nonlinearity, interactions, etc.) without pre-specification, while carefully regularizing estimates with prior distributions (shrinkage toward additive structure and discouragement of implausibly large treatment effects).

BART in causal inference: Hill (2011), Green & Kern (2012), …

Parameterizing treatment effect heterogeneity with BART is due to Hahn, Murray, and Carvalho (2017).
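To make the decomposition concrete, here is a minimal NumPy sketch that simulates from this model; the particular β(·), τ(·), and all dimensions are illustrative assumptions, not anything specified in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

n_schools, n_per = 20, 50
school = np.repeat(np.arange(n_schools), n_per)   # school index j per student

x = rng.normal(size=school.size)                  # student-level control
w = rng.normal(size=n_schools)[school]            # school-level moderator
z = rng.binomial(1, 0.5, size=school.size)        # treatment indicator

alpha = rng.normal(0.0, 0.5, size=n_schools)      # school intercepts (alpha_j)
gamma = rng.normal(0.0, 0.1, size=n_schools)      # unexplained effect heterogeneity (gamma_j)

def beta(x):  # illustrative nonlinear prognostic function
    return np.sin(x) + 0.5 * x

def tau(w):   # illustrative nonlinear moderation function
    return 0.05 + 0.04 * (w > 0)

# y_ij = alpha_j + beta(x_ij) + [tau(w_ij) + gamma_j] * z_ij + eps_ij
y = alpha[school] + beta(x) + (tau(w) + gamma[school]) * z \
    + rng.normal(0, 0.3, size=school.size)
```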

Analyzing data with ML BCF

• Obtain posterior samples for all the parameters; compute treatment effect estimates for each unit, school, etc.
• The challenge: how do we summarize these complicated objects?
• "Roll up" treatment effect estimates to the ATE (see the sketch below)
• Subgroup search
• Counterfactual treatment effect predictions / "partial effects of moderators"
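A minimal sketch of the "roll up" step: given posterior draws of unit-level treatment effects (here simulated as a stand-in for actual ML BCF output; shapes and numbers are illustrative), average over units within each draw so that draw-to-draw variation carries posterior uncertainty about the ATE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ML BCF output: posterior draws of unit-level treatment
# effects, shape (n_draws, n_units).
tau_draws = 0.04 + 0.02 * rng.standard_normal((2000, 5000))

# "Roll up" to the ATE: average over units within each posterior draw.
ate_draws = tau_draws.mean(axis=1)

lo, hi = np.percentile(ate_draws, [2.5, 97.5])
print(f"posterior mean ATE: {ate_draws.mean():.3f}, "
      f"95% interval: ({lo:.3f}, {hi:.3f})")
```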

Application: A new analysis with NMS

• Same moderators (school mindset norms, achievement, and minority composition) + controls
• Different population (all students) and outcome (math GPA)
• Same basic process, with limited researcher degrees of freedom
• Weakly informative priors on τ(w) (< 0.5 GPA points with high prior probability) and on the random effects (a calibration sketch follows)
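For intuition, one way such a prior can be calibrated; this is an illustration, not necessarily the talk's exact construction: choose the scale of a mean-zero normal prior on the treatment effect so that |τ| < 0.5 GPA points with about 95% prior probability.

```python
from scipy import stats

# Illustrative calibration: sigma such that a N(0, sigma^2) prior puts
# ~95% of its mass on |tau| < 0.5 GPA points.
target, prob = 0.5, 0.95
sigma = target / stats.norm.ppf((1 + prob) / 2)          # ~0.255
check = 1 - 2 * stats.norm.sf(target, scale=sigma)        # ~0.95
print(f"prior sd: {sigma:.3f}, Pr(|tau| < {target}): {check:.3f}")
```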

Inference for the Average Treatment Effect

[Figure: the 95% confidence interval from the multilevel linear model alongside the 95% posterior uncertainty interval from ML BCF.]

Subgroup search

• Obtain the posterior mean of the treatment effects
• Use recursive partitioning (CART) on the posterior mean to find moderator-determined subgroups with high variation across subgroup ATEs (see the sketch below)
• Statistically kosher! We use the data once (prior -> posterior)
• Can be formalized as the Bayes estimate under a particular loss function
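A minimal sketch of this search using scikit-learn's CART implementation; the moderator matrix, its column names, the simulated effect surface, and the tree settings are all illustrative assumptions standing in for real ML BCF output.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Stand-ins: W holds school-level moderators per student; tau_draws holds
# posterior draws of unit-level effects, shape (n_draws, n_units).
W = rng.uniform(0, 1, size=(5000, 3))   # columns: norms, achievement, minority (assumed)
tau_draws = (0.02 + 0.04 * (W[:, 1] <= 0.67) * W[:, 0]
             + 0.01 * rng.standard_normal((2000, 5000)))
tau_hat = tau_draws.mean(axis=0)        # posterior mean CATE per unit

# CART on the posterior mean; shallow trees keep subgroups interpretable.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=500)
tree.fit(W, tau_hat)
print(export_text(tree, feature_names=["norms", "achievement", "minority"]))

# Posterior for a difference in subgroup ATEs: average the draws within
# each leaf, then compare draw by draw.
leaf = tree.apply(W)
a, b = np.unique(leaf)[:2]              # two example leaves
diff = tau_draws[:, leaf == a].mean(axis=1) - tau_draws[:, leaf == b].mean(axis=1)
print("Pr(diff > 0) =", (diff > 0).mean())
```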

[Figure: CART summary tree with posterior histograms of subgroup ATEs. The first split is on school achievement (≤ 0.67 vs. > 0.67); lower-achieving schools are then split on norms (≤ 0.53 vs. > 0.53).]

• High Achieving: CATE = 0.016, n = 5023
• Lower Achieving / Low Norm: CATE = 0.032, n = 3253
• Lower Achieving / High Norm: CATE = 0.073, n = 3265

[Figure: the same tree, with the posterior density of the difference in subgroup ATEs between the lower-achieving high-norm and lower-achieving low-norm schools: Pr(diff > 0) = 0.93.]

[Figure: posterior density of the difference in subgroup ATEs between the lower-achieving high-norm and high-achieving schools: Pr(diff > 0) = 0.81.]

[Figure: a deeper CART summary further splits the lower-achieving/low-norm schools on achievement (≤ 0.38 vs. > 0.38), with the posterior density of the difference in subgroup ATEs.]

• Low Achieving / Low Norm: CATE = 0.010, n = 1208
• Mid Achieving / Low Norm: CATE = 0.045, n = 2045

Counterfactual treatment effect predictions

• How do the estimated treatment effects in lower-achieving/low-norm schools change if norms increase, holding school minority composition and achievement constant? (A sketch follows this list.)
• This is not a formal causal mediation analysis (roughly, we would need "no unmeasured moderators correlated with norms")
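A minimal sketch of this counterfactual exercise. Everything here is a stand-in: `predict_tau` is a hypothetical placeholder for the fitted ML BCF model's posterior draws of τ(w), and the moderator matrix, column ordering, and numbers are illustrative; only the split thresholds come from the CART summary above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the school moderators per student.
W = rng.uniform(0, 1, size=(5000, 3))   # columns: norms, achievement, minority (assumed)
NORMS, ACH = 0, 1

def predict_tau(W_mat):
    """Placeholder for posterior draws of tau(w), shape (n_draws, len(W_mat)).
    In practice this comes from the fitted ML BCF model; here it is a toy
    surface so the sketch runs end to end."""
    return (0.02 + 0.08 * W_mat[:, NORMS]
            + 0.01 * rng.standard_normal((2000, len(W_mat))))

# Lower-achieving/low-norm subgroup (thresholds from the CART summary).
sub = (W[:, ACH] <= 0.67) & (W[:, NORMS] <= 0.53)

# Counterfactual: raise norms by one IQR in that subgroup, holding the
# other moderators (achievement, minority composition) fixed.
W_cf = W.copy()
q75, q25 = np.percentile(W[:, NORMS], [75, 25])
W_cf[sub, NORMS] += q75 - q25

# Subgroup ATE per posterior draw, observed vs. shifted norms.
tau_obs = predict_tau(W[sub]).mean(axis=1)
tau_cf = predict_tau(W_cf[sub]).mean(axis=1)
print(f"CATE shift from +1 IQR in norms: {np.mean(tau_cf - tau_obs):.3f}")
```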


[Figure: posterior CATE densities for the lower-achieving/low-norm group under counterfactual increases in norms (+10%, +0.5 IQR, +1 IQR), with the original low-norm/lower-achieving and high-norm/lower-achieving groups for reference.]

Norms increase   CATE (95% interval)
Original         0.032 (-0.011, 0.076)
+10%             0.050 (0.005, 0.097)
+Half IQR        0.051 (0.005, 0.099)
+Full IQR        0.059 (0.009, 0.114)

(1 IQR = 0.6 extra problems on the worksheet task.)

Conclusion

• Flexible models + careful regularization + posterior summarization is a winning combination
• Our approach takes the best parts of linear models, which demand lots of researcher degrees of freedom, and of "black box" machine learning methods, which afford only bank-shot regularization and summarization
• There are many "degrees of freedom" in the summarization step, but these depend on the data only through the posterior
• Unlike many ML methods, we can handle multilevel structure and prior knowledge with ease
