Quantifying Statistical Control: the Threshold of Theoretical Randomization Kenneth A. Frank Minh Duong Spiro Maroulis Michigan State University Ben Kelcey University of Michigan Presented at Groningen May 21 2008


Quantifying Statistical Control: the Threshold of Theoretical Randomization

Kenneth A. Frank, Minh Duong, Spiro Maroulis
Michigan State University

Ben Kelcey
University of Michigan

Presented at Groningen May 21 2008

Focal Example: The Effect of Kindergarten Retention on Reading and Math Achievement (Hong and Raudenbush 2005)

1. What is the average effect of kindergarten retention policy? (Example used here)

Should we expect to see a change in children’s average learning outcomes if a school changes its retention policy?

Propensity based questions (not explored here)

2. What is the average impact of a school’s retention policy on children who would be promoted if the policy were adopted?

Use principal stratification (Frangakis and Rubin 2002).

3. What is the effect of kindergarten retention on those who are retained?

How much more or less kindergarten retainees would have learned, on average, had they been promoted to the first grade rather than retained.

Data

• Early Childhood Longitudinal Study Kindergarten cohort (ECLSK)
  – US National Center for Education Statistics (NCES)
• Nationally representative
• Kindergarten and 1st grade
  – observed Fall 1998, Spring 1998, Spring 1999
• Student
  – background and educational experiences
  – Math and reading achievement (dependent variable)
  – experience in class
• Parenting information and style
• Teacher assessment of student
• School conditions
• Analytic sample (1,080 schools that do retain some children)
  – 471 kindergarten retainees
  – 10,255 promoted students

Effect of Retention on Reading Scores (Hong and Raudenbush)

Possible Confounding Variables

• Gender
• Two Parent Household
• Poverty
• Mother’s level of Education (especially relevant for reading achievement)

What is the Impact of a Confounding Variable on an Inference for a Regression Coefficient?

(Frank, K. 2000. “Impact of a Confounding Variable on the Inference of a Regression Coefficient.” Sociological Methods and Research, 29(2), 147-194.)

Impact appears in Partial Correlation

r_ty is the sample correlation between the treatment and the outcome
r_yv is the sample correlation between a confound and the outcome
r_tv is the sample correlation between a confound and the treatment

Correlation is reduced by the product of two relevant correlations (values in denominator can only increase the partial)

Inference for regression coefficient is same as that for partial correlation

$$ r_{ty|v} = \frac{r_{ty} - r_{tv}\, r_{yv}}{\sqrt{1 - r_{tv}^{2}}\,\sqrt{1 - r_{yv}^{2}}} $$
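The role of the product r_tv × r_yv can be sketched numerically. The snippet below assumes the standard first-order partial correlation, r_ty|v = (r_ty − r_tv·r_yv) / (√(1 − r_tv²)·√(1 − r_yv²)); the function name and the example correlations are hypothetical and chosen only for illustration.

```python
import math


def partial_correlation(r_ty: float, r_tv: float, r_yv: float) -> float:
    """First-order partial correlation of treatment and outcome given a confound v."""
    numerator = r_ty - r_tv * r_yv  # the "impact" r_tv * r_yv is subtracted here
    denominator = math.sqrt(1 - r_tv ** 2) * math.sqrt(1 - r_yv ** 2)
    return numerator / denominator


# Hypothetical correlations, not values from the study.
r_ty, r_tv, r_yv = 0.5, 0.3, 0.4
print("impact of confound:", r_tv * r_yv)
print("partial correlation:", round(partial_correlation(r_ty, r_tv, r_yv), 4))
```

With no confounding (r_tv = r_yv = 0) the function returns r_ty unchanged, which is the sense in which the impact is zero in an idealized randomized experiment.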

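Impacts of Covariates on Correlation between Retention and Reading Achievement

Component correlations are the correlation of each covariate with achievement and with retention.

| Covariate | Impact | Correlation with achievement | Correlation with retention |
| Mother’s education | -0.0122 | 0.189 | -0.064 |
| Female | -0.0054 | 0.102 | -0.053 |
| Two parent | -0.0025 | 0.086 | -0.025 |
| Poverty | -0.0080 | 0.135 | -0.059 |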

Negative impact would reduce the magnitude of the coefficient for retention
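As a rough check on the table, each impact can be recomputed as the product of its two component correlations. This is a sketch using the rounded values printed on the slide, so the products agree with the reported impacts only up to rounding (within about 0.0005); the dictionary names are illustrative.

```python
# (correlation with achievement, correlation with retention), as printed on the slide
components = {
    "Mother's education": (0.189, -0.064),
    "Female": (0.102, -0.053),
    "Two parent": (0.086, -0.025),
    "Poverty": (0.135, -0.059),
}

# Impacts as reported on the slide, for comparison
reported = {
    "Mother's education": -0.0122,
    "Female": -0.0054,
    "Two parent": -0.0025,
    "Poverty": -0.0080,
}

# Impact of a covariate = product of its two component correlations
impacts = {name: r_y * r_t for name, (r_y, r_t) in components.items()}

for name, value in impacts.items():
    print(f"{name:20s} computed {value:+.4f}   reported {reported[name]:+.4f}")
```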

Covariates and Absorbers (dependent variable: Reading in Spring 1999)

• Covariates
  – Mother’s education
  – Poverty
  – Gender
  – Two parent home
  – References
    • Hong and Raudenbush; Shepard; Coleman
• Absorbers
  – Schools as fixed effects
  – Pre-test Spring 1998
  – Growth trajectory: Fall 1998-Spring 1998
  – References: Shadish et al; Heckman and Hotz (1988: JASA)

Extent to which Pre-test Absorbs the Impacts of Covariates on Inference

Regarding Effect of Retention on Reading Achievement

Controlling for pre-test absorbs 87% of the impact of Mother’s Education; once controlling for pre-test there is less of a need to control for mother’s education

| Family background | Impact (no pre-test) | rv•t (no pre-test) | rv•y (no pre-test) | Impact (pre-test) | rv•t (pre-test) | rv•y (pre-test) | % Reduction (absorption) |
| Mother’s education | -0.012 | -0.065 | 0.190 | -0.002 | -0.027 | 0.057 | 87.3 |
| Female | -0.005 | -0.053 | 0.103 | -0.001 | -0.035 | 0.040 | 74.3 |
| Two parent home | -0.002 | -0.025 | 0.087 | -0.001 | -0.015 | 0.044 | 69.0 |
| Poverty | 0.008 | 0.059 | -0.135 | 0.003 | 0.037 | -0.069 | 68.8 |
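The absorption percentage can be sketched as the proportional drop in a covariate's impact once the pre-test is controlled. The snippet assumes impact = rv•t × rv•y and absorption = 100 × (1 − impact after / impact before), computed from the rounded correlations printed on the slide; function and variable names are illustrative. For mother’s education it lands close to the 87% quoted above.

```python
def impact(r_vt: float, r_vy: float) -> float:
    """Impact of covariate v: product of its correlations with treatment and outcome."""
    return r_vt * r_vy


def absorption_pct(impact_before: float, impact_after: float) -> float:
    """Percent of the impact absorbed by adding the control (here, the pre-test)."""
    return 100.0 * (1.0 - impact_after / impact_before)


# ((rv.t, rv.y) without pre-test, (rv.t, rv.y) with pre-test), rounded slide values
rows = {
    "Mother's education": ((-0.065, 0.190), (-0.027, 0.057)),
    "Female": ((-0.053, 0.103), (-0.035, 0.040)),
    "Two parent home": ((-0.025, 0.087), (-0.015, 0.044)),
    "Poverty": ((0.059, -0.135), (0.037, -0.069)),
}

absorbed = {
    name: absorption_pct(impact(*before), impact(*after))
    for name, (before, after) in rows.items()
}

for name, pct in absorbed.items():
    print(f"{name:20s} {pct:5.1f}% absorbed")
```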

Capacity of Controls to Absorb the Impacts of Covariates

[Figure: impact (y-axis, 0 to 0.016) of each covariate (momed, female, 2parent, poverty) under successive controls (x-axis): none; school; school + pre2; school + pre2 + (pre2-pre1).]

Effect of Retention on Achievement After Adding each Covariate

| Controls | Est | Se | t |
| School | -21.24 | .63 | -33.49 |
| School+Pre2 | -12.01 | .45 | -26.48 |
| School+Pre2+(Pre2-Pre1) | -12.10 | .47 | -26.28 |
| School+Pre2+(Pre2-Pre1)+Momed | -12.00 | .47 | -26.26 |
| School+Pre2+(Pre2-Pre1)+Female | -12.07 | .46 | -25.18 |
| School+Pre2+(Pre2-Pre1)+2parent | -12.01 | .46 | -26.27 |
| School+Pre2+(Pre2-Pre1)+poverty | -12.04 | .46 | -26.16 |
| Hong and Raudenbush (model based) | -9.01 | .68 | -13.27 |

n=10,065, R2 =.40
Note: 1 year’s growth is about 10 points, so retention effect > 1 year growth

Randomization as the Gold Standard

• Randomization preferred

• Works in “long run”: What is “long run”?

• Relationship between n and impact in theoretical randomized experiment
  – Alternative “Silver Standard”
  – Quantify statistical control in a quasi-experiment

Need for Simulation? Predicting Mean Impact Using Wei Pan’s Approximation (UGLY!)

Pan, W., and Frank, K.A., 2003. “A probability index of the robustness of a causal inference,” Journal of Educational and Behavioral Statistics, 28, 315-337.

ρtv = correlation between treatment and confound
ρyv = correlation between outcome and confounds, a, b coefficients to obtain approximation

Pan’s Approximation (UGLY!): But Works

Simulate mean impact
n: (20, 100, 1000)
ρtv, ρty, ρvy: (.1, .3, .5, .7)

Bias of predicted mean impact (Pan 2003) across simulations is .00094 with standard deviation of .00071

We have a function for the impacts across a range of conditions

What is the Impact of a Confounding Variable in a Randomized Experiment?

0 in RCT

Predicting Mean Impact Using Wei Pan’s Approximation Assuming ρtv=0 (no correlation between treatment and confound, as in a randomized experiment). Elegant!

$$ \text{mean impact} \approx \frac{\rho_{ty}\,(1 - \rho_{yv}^{2})}{n - 6} $$

Where
ρty = correlation between treatment and outcome
ρyv = correlation between outcome and unobserved confound
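A sketch of how the mean impact shrinks with sample size under randomization. It assumes the approximation has the form mean impact ≈ ρty(1 − ρyv²)/(n − 6); the slide's equation did not survive extraction cleanly, so this form (in particular the constant 6 and its sign) is a reading of the fragments rather than a confirmed formula. The function name is hypothetical.

```python
def mean_impact_randomized(rho_ty: float, rho_yv: float, n: int) -> float:
    """Approximate mean impact of a confound when rho_tv = 0 (assumed form)."""
    if n <= 6:
        raise ValueError("the assumed approximation requires n > 6")
    return rho_ty * (1 - rho_yv ** 2) / (n - 6)


# Sample sizes used in the simulations described on the earlier slide
for n in (20, 100, 1000):
    value = mean_impact_randomized(rho_ty=0.3, rho_yv=0.3, n=n)
    print(f"n = {n:5d}   mean impact = {value:.5f}")
```

The values fall roughly in proportion to 1/n, which is one way to read the "long run" property of randomization.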

Solving Pan’s Approximation for n (assuming randomized experiment):

$$ \text{effective } n \approx \frac{\rho_{ty}\,(1 - \rho_{yv}^{2})}{\text{mean impact}} + 6 $$

Allows us to predict effective n of a theoretical randomized experiment given a mean impact and hypothetical correlation between outcome and confound

Can predict an effective n given an impact in a quasi-experiment
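The inversion can be sketched as follows, assuming the approximation reads effective n ≈ ρty(1 − ρyv²)/(mean impact) + 6 (a reconstruction from the slide fragments, so the constant 6 and its sign are assumptions). The inputs are values quoted elsewhere in the deck: a treatment-outcome correlation of about .26, a correlation of .057 between mother’s education and the outcome after the pre-test, and an impact of .0015. The function name is hypothetical.

```python
def effective_n(mean_impact: float, rho_ty: float, rho_yv: float) -> float:
    """Sample size of a theoretical randomized experiment with this mean impact (assumed form)."""
    if mean_impact <= 0:
        raise ValueError("mean impact must be positive")
    return rho_ty * (1 - rho_yv ** 2) / mean_impact + 6


# Magnitudes taken from the deck: |r| between retention and reading, and
# mother's education after controlling for the pre-test.
n_equiv = effective_n(mean_impact=0.0015, rho_ty=0.26, rho_yv=0.057)
print(f"effective n is roughly {n_equiv:.0f}")
```

Under these assumptions the result is on the order of a couple of hundred cases; halving the impact roughly doubles the equivalent sample size.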

Predicted Sample Size as a Function of Impact

Of mother’s education

Quantitative Crosswalk between RCT and Quasi-experiment

• Quasi-experiment can achieve same or better level of control as randomized experiment
  – Red line: Hong and Raudenbush achieve control equivalent to randomized experiment of size 200, better than a small RCT
• But, with a randomized experiment
  – Guaranteed no bias in long run
  – Confidence interval captures uncertainty
• Trade off between precision versus bias
  – Quasi-experiment could be more precise, but possibly biased
  – Key assumption: impacts of measured covariates represent impacts of unmeasured covariates.

Asymptotics of Randomization

• “Elbow” in relationship between n and impact.
• Imprecise prediction for small impact (where we care the most)
• Leverage the shape by defining a single threshold (first derivative = -25/.001 = -25000): a change of 25 in n for a change of .001 in impact

$$ \frac{\partial\,(\text{effective } n)}{\partial\,(\text{mean impact})} = -\frac{(1 - \rho_{yv}^{2})\,(\rho_{yt})}{(\text{mean impact})^{2}} = -25000 $$
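A sketch of locating the threshold, assuming effective n ≈ ρyt(1 − ρyv²)/(mean impact) + 6, so that the slope is −ρyt(1 − ρyv²)/(mean impact)². Setting the slope to −25000 and solving gives the mean impact at the "elbow", and from it the effective n. The functional form is reconstructed from the slide fragments, and the grid of ρyt and ρyv values below is an assumption about which correlations the slide's panels used.

```python
import math

THRESHOLD_SLOPE = -25000.0  # change of 25 in n per .001 change in impact


def threshold_impact(rho_yt: float, rho_yv: float, slope: float = THRESHOLD_SLOPE) -> float:
    """Mean impact at which d(effective n)/d(mean impact) equals the threshold slope."""
    c = rho_yt * (1 - rho_yv ** 2)
    return math.sqrt(c / -slope)


def effective_n(impact: float, rho_yt: float, rho_yv: float) -> float:
    """Effective n of the theoretical randomized experiment (assumed form)."""
    return rho_yt * (1 - rho_yv ** 2) / impact + 6


for rho_yt in (0.1, 0.3, 0.5):
    for rho_yv in (0.1, 0.3, 0.5, 0.7):
        m = threshold_impact(rho_yt, rho_yv)
        n = effective_n(m, rho_yt, rho_yv)
        print(f"rho_yt={rho_yt:.1f} rho_yv={rho_yv:.1f}  impact={m:.4f}  effective n={n:6.1f}")
```

Because the threshold impact scales with the square root of ρyt(1 − ρyv²), the result moves more with ρyt than with ρyv over this grid.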

[Figure: grid of panels indexed by ρyv and ρyt (values 0.1, 0.3, 0.5 on one axis; 0.1, 0.3, 0.5, 0.7 on the other).]

Mean Impact and Effective N of each cell given at threshold.

Asymptotics of Precision for Randomization Across Levels of Correlation between Outcome and the Treatment (ρyt) and Outcome and a Confound (ρyv)

Interpretations

• Cut offs appear reasonable – on the way to asymptotic land

• More affected by treatment effect (can be estimated) than by relationship between outcome and unobserved confound (unknown). Good.

Discussion

• Characterize control in terms of impact
• Theoretical randomized experiment as “gold standard”
  – Departure from Cook, who used actual experiments
• Quasi-experiments (legitimacy)
  – Can equate to theoretical experiment
  – Obtain effective n
  – Use effective n as weight in meta-analysis
  – Cross threshold?
• Procedure
  – Establish impact of good covariates
  – Establish absorption due to pre-test, etc
  – Equate to randomized experiment

What must be the Impact of an Unmeasured Confounding Variable to Invalidate the Inference?

Step 1: Establish Correlation Between Retention and Score

Step 2: Define a Threshold for Inference

Step 3: Calculate the Threshold for the Impact Necessary to Invalidate the Inference

Step 4: Multivariate Extension, with measured Covariates

Step 1: Establish Correlation Between Retention and Score

$$ r = \frac{t}{\sqrt{(n - q - 1) + t^{2}}} = \frac{-26.00}{\sqrt{9012 + (-26.00)^{2}}} = -.26 $$

t is taken from the regression (t = -26.00)
n is the sample size
q is the number of parameters estimated
n-q-1 = 9012
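Step 1 can be sketched as the standard conversion from a t-ratio to a correlation, r = t/√(df + t²), using the t and degrees of freedom quoted above. The function name is illustrative.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Convert a regression t-ratio to a (partial) correlation; df = n - q - 1."""
    return t / math.sqrt(df + t ** 2)


r_retention = r_from_t(t=-26.00, df=9012)
print(f"r = {r_retention:.3f}")  # negative: retention is associated with lower scores
```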

Step 2: Define a Threshold for Inference

• Define r# as the value of r that is just statistically significant:

$$ r^{\#} = \frac{t_{critical}}{\sqrt{(n - q - 1) + t_{critical}^{2}}} $$

n is the sample size
q is the number of parameters estimated
tcritical is the critical value of the t-distribution for making an inference

$$ r^{\#} = \frac{1.96}{\sqrt{9012 + 1.96^{2}}} = .02 $$

r# can also be defined in terms of effect sizes
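A short sketch of Step 2: the threshold r# is the same t-to-r conversion applied to the critical value, r# = t_critical/√(df + t_critical²). The default of 1.96 is the two-sided 5% large-sample critical value used on the slide; the function name is illustrative.

```python
import math


def r_threshold(df: int, t_critical: float = 1.96) -> float:
    """Smallest correlation that is just statistically significant at the given critical t."""
    return t_critical / math.sqrt(df + t_critical ** 2)


r_sharp = r_threshold(df=9012)
print(f"r# = {r_sharp:.4f}")
```

A different inferential threshold (for example, one stated as an effect size) would only change the value passed in for the threshold, not the rest of the procedure.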

Step 3: Calculate the Threshold for the Impact Necessary to Invalidate the Inference

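$$ TICV = \frac{r_{x \cdot y} - r^{\#}}{1 - |r^{\#}|} $$

$$ r_{x \cdot y|cv} = \frac{r_{x \cdot y} - r_{x \cdot cv}\, r_{y \cdot cv}}{\sqrt{1 - r_{y \cdot cv}^{2}}\,\sqrt{1 - r_{x \cdot cv}^{2}}} = \frac{r_{x \cdot y} - k}{1 - k} $$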

Set rx∙y|cv =r# and solve for k to find the threshold for the impact of a confounding variable (TICV).

Define the impact: k = rx∙cv x ry∙cv and assume rx∙cv =ry∙cv (which maximizes the impact of the confounding variable – Frank 2000).

impact of an unmeasured confound > .25 → inference invalid

$$ TICV = \frac{.26 - .02}{1 - |.02|} = .25 $$
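Step 3 can be sketched with TICV = (|r| − r#)/(1 − |r#|), using the magnitude of the Step 1 correlation (.26) and the Step 2 threshold (.02). With these rounded inputs the result is about .24 to .25. The second function checks the logic: a confound with impact k = TICV, split equally between its two correlations, pulls the partial correlation down to exactly r#. Function names are illustrative.

```python
def ticv(r_xy: float, r_sharp: float) -> float:
    """Threshold for the impact of a confounding variable (magnitudes only)."""
    return (abs(r_xy) - abs(r_sharp)) / (1 - abs(r_sharp))


def partial_given_impact(r_xy: float, k: float) -> float:
    """Partial correlation when r_x.cv = r_y.cv, so both squared correlations equal k."""
    return (abs(r_xy) - k) / (1 - k)


k_threshold = ticv(r_xy=-0.26, r_sharp=0.02)
print(f"TICV = {k_threshold:.3f}")
print(f"partial r at the threshold = {partial_given_impact(-0.26, k_threshold):.3f}")
```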

Calculations made easy!

• http://www.msu.edu/~kenfrank/papers/calculating%20indices%203.xls

Step 4: Multivariate Extension, with Covariates

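$$ TICV = \frac{r_{x \cdot y|z} - r^{\#}}{1 - |r^{\#}|}\,\sqrt{(1 - r_{x \cdot z}^{2})(1 - r_{y \cdot z}^{2})} = .21 $$

k = rx∙cv|z × ry∙cv|z

Maximizing the impact with covariates z in the model implies

$$ r_{x \cdot cv}^{2} = TICV\,\sqrt{\frac{1 - r_{x \cdot z}^{2}}{1 - r_{y \cdot z}^{2}}} $$

And

$$ r_{y \cdot cv}^{2} = TICV\,\sqrt{\frac{1 - r_{y \cdot z}^{2}}{1 - r_{x \cdot z}^{2}}} $$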
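A sketch of the multivariate extension, assuming the threshold on the partial scale is rescaled by √((1 − r²x∙z)(1 − r²y∙z)) and that the component correlations follow from setting the two partialled correlations equal. The slides do not report r²x∙z or r²y∙z, so the values 0.14 and 0.18 below are hypothetical placeholders chosen only for illustration; the function names are also illustrative.

```python
import math


def ticv_multivariate(ticv_partial: float, r2_xz: float, r2_yz: float) -> float:
    """Rescale the partial-scale threshold to the zero-order scale (assumed form)."""
    return ticv_partial * math.sqrt((1 - r2_xz) * (1 - r2_yz))


def component_correlations(ticv_value: float, r2_xz: float, r2_yz: float) -> tuple[float, float]:
    """Correlations of the confound with x and y that maximize its impact (assumed form)."""
    ratio = math.sqrt((1 - r2_xz) / (1 - r2_yz))
    r_x_cv = math.sqrt(ticv_value * ratio)
    r_y_cv = math.sqrt(ticv_value / ratio)
    return r_x_cv, r_y_cv


# Hypothetical squared correlations of the covariates z with treatment and outcome
R2_XZ, R2_YZ = 0.14, 0.18

k = ticv_multivariate(ticv_partial=0.25, r2_xz=R2_XZ, r2_yz=R2_YZ)
r_x_cv, r_y_cv = component_correlations(k, R2_XZ, R2_YZ)
print(f"TICV = {k:.3f}, r_x.cv = {r_x_cv:.3f}, r_y.cv = {r_y_cv:.3f}")
```

By construction the two component correlations multiply back to the threshold, which is the property the maximization relies on.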

Multivariate Calculations

• http://www.msu.edu/~kenfrank/papers/calculating%20indices%203.xls

What must be the Impact of an Unmeasured Confound to Invalidate the Inference?

If k > .25 (or .21 without covariates) then the inference is invalid. Maximum for multivariate model occurs when rx∙cv = .46 and ry∙cv = .45.

Furthermore, correlations of unobserved confound must be partialled for covariates z.

The magnitude of the impact of mother’s education (strongest measured covariate) = .0015;

Impact of unmeasured confound would have to be more than 100 times greater than the impact of mother’s education to invalidate the inference. Hmmm….

Extensions
• Logistic Regression
  – See Imbens, Guido “Sensitivity to Exogeneity Assumptions in Program Evaluation” Recent Advances in Econometric Methodology (126-132, especially 128)
  – David J. Harding. 2003. "Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on Dropping Out and Teenage Pregnancy." American Journal of Sociology 109(3): 676-719.
  – Logistic regression (Ben Kelcey at U of M)
    • Use weighted least squares
    • Use odds ratios
• Multilevels
  – Seltzer and Frank (AERA 2007)
• Multiple thresholds
  – Statistical significance: simply redefine H0 ≠0.
  – Point estimates: define impact necessary to reduce coefficient below a series of thresholds, each one representing a separate decision. Half-way between Bayesian and Frequentist

Actual Randomized Experiment

• Effect of Technology on Teaching

• Strong Methods

• Randomization

• Still some Confounding

Relationship Between background Characteristics and Treatment Assignment in a Randomized Study of the Effect of Technology on Achievement