The Power of Replication: How (Not) to Interpret Empirical Findings Michael Price Georgia State University and NBER

The Basic Problem: Interpretation

• What parameters are measured by the study?

• Are the parameters that are measured applicable in other environments?

• How likely are the parameters that are measured to reflect the “truth”?

The Basic Problem: Interpretation

• What is the maintained theory and how does interpretation depend on the maintained theory?
  – Revealed altruism and the difference between acts of omission versus acts of commission
  – The importance of endowment on reference points and “framing” effects

• The ability of individuals (research partners) to sort and the availability of substitutes
  – Allowing individuals to avoid the ask in charity and subsequent patterns of giving
  – The effect of social comparisons on energy use in dorms/apartments versus single family homes in a small town

The Basic Problem: Interpretation

• The basic mechanics of scientific discovery…
  – The more independent researchers that are working on a problem, the less likely that the initial finding is “true”
  – The extent of researcher “bias” and the sensitivity of a finding to the maintained model decrease the likelihood that a finding is “true”

• The power of replication…
  – The more times any given study is replicated, the more likely that the findings are “true”
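The replication claim can be made concrete with a small Bayesian calculation. The following is a minimal sketch, assuming a prior probability that the association is real, a significance level `alpha`, and statistical power `power`; the function names and all numerical values are illustrative assumptions, not figures from the talk. It also assumes each replication is independent and reports a significant result.

```python
def post_study_probability(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Probability that a statistically significant finding reflects a real effect."""
    true_positive = power * prior            # effect is real and the study detects it
    false_positive = alpha * (1.0 - prior)   # no effect, but the test rejects anyway
    return true_positive / (true_positive + false_positive)


def after_replications(prior: float, n_successes: int,
                       power: float = 0.8, alpha: float = 0.05) -> float:
    """Belief after the original study plus n_successes independent significant replications."""
    belief = prior
    for _ in range(1 + n_successes):
        # Each significant result updates the belief by Bayes' rule.
        belief = post_study_probability(belief, power, alpha)
    return belief


if __name__ == "__main__":
    # Hypothetical prior of 10% that the effect exists.
    for n in range(4):
        print(n, round(after_replications(0.10, n), 3))
```

With a low prior, a single significant result leaves considerable doubt, while each additional successful replication moves the belief sharply toward “true”.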

The Importance of the Maintained Model: “Framing” Effects

• A number of studies report that seemingly innocuous changes to a game lead to dramatic differences in outcomes
  – Payoffs to the recipient in a dictator game depend on whether the choice is framed as giving to or taking from the recipient
  – Differences in final allocations in payoff equivalent common pool resource and public goods games

• Are such differences “anomalies”?
  – Answer to the question depends on the maintained model…

“Framing” Effects – Standard Model

• Any model with utility defined over final payoffs does not distinguish between acts of omission versus commission
  – Not sharing with a recipient in the dictator game is an act of omission
  – Taking from a recipient in the dictator game is an act of commission

• Consider an individual who is asked how to split $10 with another party
  – Giving $X to the recipient is the same as taking $(10 - X) from the recipient
  – Would thus expect final allocations to be independent of endowment

“Framing” Effect – Moral Costs

• Suppose that individuals feel guilty when chosen actions are deemed “selfish”
  – Such feelings would motivate giving in dictator game
  – Concept shares similarity with social pressures in DellaVigna et al. (2012)

• U.S. law makes distinction between acts of omission and acts of commission when assigning liability

• Assume that feelings of guilt are stronger for acts of commission
  – Assignment of property rights and associated action space will impact split
  – Final payoff to dictator will be lower when asked to take from recipient
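A toy utility function shows how moral costs can break the equivalence between giving and taking. The sketch below is purely illustrative: the quadratic guilt term, the two guilt weights, and the function name `best_keep` are assumptions made for the example and are not the model used in the talk.

```python
PIE = 10  # dollars to be split between dictator and recipient


def best_keep(guilt_weight: float) -> int:
    """Whole-dollar amount the dictator keeps to maximize payoff minus guilt."""
    def utility(keep: int) -> float:
        # Assumed form: own payoff minus a guilt cost that grows with the amount kept.
        return keep - guilt_weight * keep ** 2
    return max(range(PIE + 1), key=utility)


# Hypothetical weights: guilt is stronger for commission (taking) than omission (not giving).
GUILT_OMISSION = 0.0625
GUILT_COMMISSION = 0.1

give_frame_keep = best_keep(GUILT_OMISSION)    # dictator holds the $10 and decides what to give
take_frame_keep = best_keep(GUILT_COMMISSION)  # recipient holds the $10 and dictator decides what to take
standard_keep = best_keep(0.0)                 # utility over final payoffs only: frame is irrelevant

if __name__ == "__main__":
    print(give_frame_keep, take_frame_keep, standard_keep)
```

Under these assumed weights the dictator ends with less in the Take frame than in the Give frame, while the payoff-only model yields the same allocation in both.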

“Framing” the Results

• Suppose that one observes small but statistically significant differences in dictator payoff under Give and Take frames

• Interpretation of the data depends upon the maintained model
  – If one believes the “true” model is that of moral costs, differences are predicted by theory and reflect that the games are different
  – If one believes the “true” model is defined over final payoffs only, differences are at odds with theory and reflect “framing”

“Framing” the Results

• Example reflects how researcher “bias” can influence what is viewed as the “true” state of the world

• One should thus ask how likely the maintained model is to be valid

• Design replication studies that take on defining characteristics of model
  – Inequality aversion predicts that indifference curves are backward bending above the 45 degree line
  – Efficiency preferences suggest player 1 would strictly prefer bundle with payoffs (9, 7, 6) to one with payoffs (11, 7, 4)

Sorting and Non-Compliance

• Experiments may differ in ability and/or costs for subjects to sort
  – Provide dictators option to forego potential profits to avoid being asked to share versus forcing them to share
  – Warning potential donors that a solicitor will be coming to their door during a given time period versus showing up unannounced

• Sorting fundamentally alters what parameters (motives) are reflected in subsequent actions
  – Donations to an unexpected solicitor reflect social pressures and altruism
  – With sorting, the importance of social pressures is lower

Sorting and Non-Compliance

• Randomize subjects into different remuneration schemes: conditional bonus, loss framed bonuses, piece rate
  – Typical experiment will focus on contemporaneous effects of the various compensation schemes
  – But the choice of compensation scheme may impact who remains with the company over the long-run

• Long-run impacts will depend on what types of workers elect to remain with the company
  – Potential differences in the relative superiority of contract types in short and long-run…
  – Suppose that attrition is correlated with treatment, e.g., low productivity workers are less likely to remain if paid via piece rate
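A short simulation illustrates how treatment-correlated attrition can distort a long-run comparison. This is a sketch on synthetic data: the ability distribution, the true effect of 1.0, and the rule that workers with ability below 9.0 quit only under piece rate are all hypothetical assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_attrition(n_workers: int = 20000, true_effect: float = 1.0, seed: int = 0):
    """Return (difference in means for all workers, difference in means for stayers only)."""
    rng = random.Random(seed)
    everyone = {0: [], 1: []}
    stayers = {0: [], 1: []}
    for _ in range(n_workers):
        ability = rng.gauss(10.0, 2.0)
        piece_rate = rng.random() < 0.5          # randomized assignment to contract type
        arm = int(piece_rate)
        output = ability + (true_effect if piece_rate else 0.0) + rng.gauss(0.0, 1.0)
        everyone[arm].append(output)
        # Assumed attrition rule: low-ability workers leave only when paid via piece rate.
        if not (piece_rate and ability < 9.0):
            stayers[arm].append(output)
    full_sample = mean(everyone[1]) - mean(everyone[0])
    stayers_only = mean(stayers[1]) - mean(stayers[0])
    return full_sample, stayers_only


if __name__ == "__main__":
    full_sample, stayers_only = simulate_attrition()
    print(round(full_sample, 2), round(stayers_only, 2))
```

The full-sample comparison recovers the assumed effect, while the comparison among workers who remain mixes the contract's incentive effect with the change in workforce composition.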

Sorting and Non-Compliance

• Nature of scientific discovery is that research tends to focus on contemporaneous effects first

• Number of examples highlighting benefits of replication studies that examine treatment effects over longer horizon
  – Appearance of solicitor versus use of charitable raffle
  – Providing potential donors unconditional versus conditional gifts

Sorting and Non-Compliance

• A fundamental challenge in designing/interpreting experiments is issue of compliance (exposure to treatment)
  – Parents that are offered incentive to attend a parent academy but elect not to
  – Households that are sent but do not open/read letter that includes a normative appeal to conserve energy

• In such instances what experiment captures is an intent to treat effect – randomization is an imperfect instrument

Sorting and Non-Compliance

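• Recall that the estimated treatment effect under IV is given by the effect of assignment on the outcome scaled by the effect of assignment on treatment take-up

• If one cannot observe or model compliance, what is estimated is the intent to treat effect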
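A simulation can illustrate the gap between the intent to treat effect and the IV estimate under imperfect compliance. The sketch below uses synthetic data and the textbook Wald ratio (intent to treat effect divided by the difference in take-up across assignment arms); one-sided non-compliance, a 40% compliance rate, and a treatment effect of 2.0 are assumptions chosen for illustration, and the notation on the original slide may differ.

```python
import random
from statistics import mean


def simulate_itt_and_wald(n: int = 50000, effect: float = 2.0,
                          comply_rate: float = 0.4, seed: int = 1):
    """Return (intent-to-treat estimate, Wald/IV estimate) on simulated data."""
    rng = random.Random(seed)
    outcomes = {0: [], 1: []}
    take_up = {0: [], 1: []}
    for _ in range(n):
        assigned = int(rng.random() < 0.5)       # random assignment is the instrument
        complier = rng.random() < comply_rate    # unobserved willingness to take treatment
        treated = int(assigned and complier)     # one-sided non-compliance (assumed)
        outcome = rng.gauss(0.0, 1.0) + effect * treated
        outcomes[assigned].append(outcome)
        take_up[assigned].append(treated)
    itt = mean(outcomes[1]) - mean(outcomes[0])
    first_stage = mean(take_up[1]) - mean(take_up[0])
    return itt, itt / first_stage


if __name__ == "__main__":
    itt, wald = simulate_itt_and_wald()
    print(round(itt, 2), round(wald, 2))
```

When take-up cannot be observed, the first-stage term is unavailable and only the diluted intent to treat estimate can be reported.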

The Availability of Substitutes

• Growing body of work that explores the impact of social comparisons on residential energy use

• Opower reports average reductions in consumption in range of 2-3%
  – However, treatment leads to increased consumption in some utilities and up to 4-5% reductions in others

• Studies that explore the effects within dorms or apartments report effects in the 15-20% range

The Availability of Substitutes

• Intuitively the impact of such programs will depend on ability of individual to substitute away from in-home energy use

• Those living in dorms or large apartment complexes have more options to substitute away from in-home use
  – Watch TV or study in common rooms of dorm
  – Wash/dry clothes in common laundry room rather than apartment

• Cities with more amenities – movie theaters, public libraries, coffee houses, etc. – provide more substitution possibilities

The Availability of Substitutes

• Data used to analyze the impacts of such programs rarely includes controls for substitutes

• Extent to which availability of substitutes predicts variation in estimated treatment effects is unanswered question
  – Facilitate better predictions for those wishing to implement such policies
  – Facilitate deeper understanding of channels through which messages impact behavior
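One way such a question could be approached is to regress site-level treatment effects on a measure of substitute availability. The sketch below runs on synthetic data only: the substitutes index, the assumed relationship between the index and the effect, and the function names are hypothetical and carry no empirical content.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_sites(n_sites: int = 200, seed: int = 2):
    """Synthetic site-level effects (% change in energy use) and a substitutes index in [0, 1]."""
    rng = random.Random(seed)
    substitutes = [rng.random() for _ in range(n_sites)]
    # Assumed data-generating process: more substitutes, larger reduction in use.
    effects = [-2.0 - 10.0 * s + rng.gauss(0.0, 1.0) for s in substitutes]
    return substitutes, effects


if __name__ == "__main__":
    substitutes, effects = simulate_sites()
    print(round(ols_slope(substitutes, effects), 2))
```

A slope estimated this way on real replications across sites would indicate how much of the variation in treatment effects the availability of substitutes predicts.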

A Related Concern…Partner Selection

• Implementation of field experiments requires consent of a willing partner
  – Charity that is willing to test effectiveness of a given fund-raising technique
  – Utility that is willing to explore the effectiveness of price changes or targeted messages during periods of peak demand
  – School district that is willing to explore the effectiveness of teacher incentives/curriculum change

• What if those willing to implement experiments are fundamentally different from others?

A Related Concern…Partner Selection

• Utilities that are willing to explore role of targeted messages as means to manage peak demand
  – More likely to face capacity/transmission constraints during peak periods (unobserved differences in consumers/market structure)
  – More likely to have implemented other strategies to manage demand during peak periods (unobserved differences in margins that can adjust)
  – More likely to believe that consumers will respond in desired way to treatment (unobserved differences in consumers/market structure)

• Extent to which such selection would impact estimated treatment effects is unknown…but can be understood through replication

Types of Replication

• Various levels of replication
  – Re-analyze existing data to check robustness of results
  – Implement experiment using similar protocol but different subject pool
  – Employ new research design to test the interpretation/validity of prior findings

• When to implement and benefits of any given strategy depend on underlying cause of concern

Types of Replication – Re-Analysis

• Want to re-analyze existing data when you believe that results are sensitive to modeling choices
  – Functional form assumptions or choice of controls
  – Rules for selecting relevant sample

• More common with naturally occurring data where identification relies upon choice of instrument

• However, there is scope for re-analysis of experimental data
  – Power of underlying statistical tests
  – Assumptions of linear treatment effect or specification of underlying model of interest
  – Potential imbalance across observables that affect outcomes
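Two of these re-analysis checks are easy to sketch. The block below computes a minimum detectable effect for a two-arm comparison of means (normal approximation, equal arm sizes, common standard deviation) and a standardized difference for a baseline covariate. The function names and example inputs are assumptions for illustration, not procedures described in the talk.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def minimum_detectable_effect(n_per_arm: int, sd: float,
                              alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest true difference in means detectable with the given power (two-sided test)."""
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    power_term = z.inv_cdf(power)             # quantile needed to reach target power
    return (critical + power_term) * sd * sqrt(2.0 / n_per_arm)


def standardized_difference(treated, control) -> float:
    """Difference in covariate means scaled by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2.0)
    return (mean(treated) - mean(control)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical design: 100 subjects per arm, outcome standard deviation of 1.
    print(round(minimum_detectable_effect(100, 1.0), 3))
```

A reported effect well below the minimum detectable effect of the original design, or a large standardized difference on a covariate that affects outcomes, would be a reason to revisit the published estimates.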

Types of Replication – Re-run Original Design

• Maniadis et al. (2014) provide model that highlights conditions that influence the likelihood that stated research finding is “true”
  – Prior belief on the existence/magnitude of a particular association
  – Number of independent research teams working on a problem
  – Extent to which interpretation of finding is influenced by maintained model (potential for researcher “bias”)
  – Number of replication studies that report similar findings

• Framework highlights conditions under which one may want to re-run the original experiment using new subject pool

Types of Replication – Re-run Original Design

• The likelihood of a false positive is greater the lower the prior one places on the existence/magnitude of a reported effect
  – Concern is not with choice of design per se but likelihood that findings reflect “luck” or draw from a small sample
  – Concern exacerbated by tendency for journals to publish “unexpected” results

• The likelihood of a false positive for an initial finding is greater the more independent research teams are exploring a question
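Both statements can be checked numerically. The sketch below assumes that a finding is reported once at least one of `n_teams` independent teams obtains a significant result, a common way of formalizing competition among researchers; the function name and parameter values are illustrative assumptions rather than the exact specification in Maniadis et al. (2014).

```python
def psp_with_teams(prior: float, n_teams: int,
                   power: float = 0.8, alpha: float = 0.05) -> float:
    """Probability a reported positive is real when any of n_teams may report it first."""
    beta = 1.0 - power
    hit_if_true = 1.0 - beta ** n_teams              # at least one team detects a real effect
    hit_if_false = 1.0 - (1.0 - alpha) ** n_teams    # at least one team gets a false positive
    return hit_if_true * prior / (hit_if_true * prior + hit_if_false * (1.0 - prior))


if __name__ == "__main__":
    # Hypothetical priors and team counts.
    for prior in (0.05, 0.10, 0.50):
        print(prior, [round(psp_with_teams(prior, k), 3) for k in (1, 5, 10)])
```

Lower priors and more competing teams both reduce the probability that the first reported positive is real, which is the complement of the false positive likelihood discussed on this slide.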

Types of Replication – New Study Design

• The likelihood of a false positive is greater the more likely it is that the researcher is “biased”
  – Design protocol in way that “forces” result
  – Interpret data in a way that is colored by maintained model
  – Results depend on ability of subjects to sort/availability of substitutes

• When the underlying concern is researcher “bias”, want to explore new study designs
  – Introduce sorting in the dictator game
  – Examine choices in regions where models have distinct predictions
  – Examine choice across domains with more/less substitutes and control for such
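The role of researcher “bias” can be sketched by letting a fraction `bias` of results that would otherwise be null get reported as positive, whether through design or interpretation. This parameterization, the function name, and the numerical values are assumptions for illustration and may not match the model the talk draws on.

```python
def psp_with_bias(prior: float, bias: float,
                  power: float = 0.8, alpha: float = 0.05) -> float:
    """Probability a reported positive is real when a share `bias` of nulls is reported as positive."""
    beta = 1.0 - power
    reported_if_true = power + bias * beta              # detected, or missed but reported anyway
    reported_if_false = alpha + bias * (1.0 - alpha)    # false positive, or null reported as positive
    numerator = reported_if_true * prior
    return numerator / (numerator + reported_if_false * (1.0 - prior))


if __name__ == "__main__":
    # Hypothetical prior of 10% and increasing degrees of bias.
    for bias in (0.0, 0.1, 0.2, 0.4):
        print(bias, round(psp_with_bias(0.10, bias), 3))
```

Because bias inflates positives far more when the effect is absent than when it is present, even moderate bias sharply lowers the credibility of a single finding, which is why a new design that removes the source of bias is more informative than re-running the same protocol.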

Take Away Thoughts…

• Number of factors that influence what any given experiment measures and how to interpret the results

• Nature of scientific discovery suggests the power of replication
  – Tendency for journals to publish “novel” or “unexpected” findings
  – Sensitivity of results to maintained model and how that influences the design
  – Heterogeneity in treatment effects and influence of partner selection and characteristics of environment on such

Take Away Thoughts…

• Various levels of replication that address different concerns
  – Re-analyze existing data
  – Re-run original design with new subject pool
  – Design new set of experiments to explore robustness of a result

• Intuitive criteria that allow researcher/practitioner to determine which results should be replicated and what approach to take

• Replication need not be a dirty word or something we shy away from… embrace it and do not be afraid to question prior findings