
Laia Pujol Priego, Jonathan Wareham October – 2019

EN

As Predicted: Preventing p-hacking

Open Science Monitor Case Study


As Predicted: Preventing p-hacking - Open Science Monitor Case Study

European Commission
Directorate-General for Research and Innovation
Directorate G — Research and Innovation Outreach
Unit G.4 — Open Science
E-mail: [email protected]; [email protected]
European Commission, B-1049 Brussels

Manuscript completed in August 2019.

This document has been prepared for the European Commission; however, it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

More information on the European Union is available on the internet (http://europa.eu).

Luxembourg: Publications Office of the European Union, 2019

EN PDF: ISBN 978-92-76-12548-8; doi: 10.2777/527729; catalogue number KI-04-19-668-EN-N

© European Union, 2019.

Reuse is authorized provided the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39).

For any use or reproduction of photos or other material that is not under the EU copyright, permission must be sought directly from the copyright holders.


Table of contents

Acknowledgments
1 Introduction
2 Background
3 Drivers
4 Barriers
5 Impact
6 Policy conclusions
References


Acknowledgments

Disclaimer: The information and views set out in this study report are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this case study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein.

This case study is part of the Open Science Monitor, led by the Lisbon Council together with CWTS, ESADE and Elsevier.

Authors

Laia Pujol Priego – Ramon Llull University, ESADE

Jonathan Wareham – Ramon Llull University, ESADE


1 Introduction

"While currently there is a unilateral emphasis on 'first' discoveries, there should be as much emphasis on replication of discoveries."(Begley, 2013)

The scientific community has recently begun to emphasize the problem of bias in academic publications towards statistically significant effects (Ferguson & Brannick, 2012; Kühberger, Fritz, & Scherndl, 2014; Rosenthal, 1979), a bias that has eventually led to questionable research practices (J. P. Simmons, Nelson, & Simonsohn, 2011) and, in extreme cases, to research scandals such as those of Lawrence Sanna (Yong, 2012) and Diederik Stapel (Bhattacharjee, 2013) (see the overview by Gonzales & Cunningham, 2015). The tendency of the current publication system to favour studies with significant results has not only generated dishonest behaviour but has also highlighted problematic research practices such as p-hacking. P-hacking consists of manipulating data analyses in order to obtain significant effects (J. P. Simmons et al., 2011) and hypothesizing after the results are known (Kerr, 1998).

The diagnosticity of a p-value is partly contingent on the number of tests performed (Sellke, Bayarri, & Berger, 2001). The claim that a given p < .05 result is unlikely under the null hypothesis, and therefore counts as evidence against it, means something different if only one test was conducted than if 10, 100 or 1,000 tests were conducted (Nosek, Ebersole, DeHaven, & Mellor, 2017). Although there are established procedures for correcting the diagnostic validity of p-values for the number of tests conducted (Dunnett, 1955), they are rarely and inconsistently applied (Nosek et al., 2017). The central challenge, as Loken and Gelman (2017) explain, is that when the (vast) choices for analysing the data are made during the analysis itself, some analytic paths become more likely than others. In such cases, it is almost impossible to estimate which paths could have been selected or whether the analytic decisions were influenced by confirmation and outcome bias (Nosek et al., 2017). Researchers face many choices in the analysis, and selecting the tests that yield positive results over those that do not damages accuracy (Ioannidis, 2005). The p-hacking problem is distinct from the so-called "file-drawer problem", in which researchers simply do not publish their p > .05 studies; selective reporting invalidates Bayesian inference as much as it invalidates p-values (Rosenthal, 1979; J. P. Simmons et al., 2011).

As a result, psychological science and other scientific domains have started to foster research practices that seek to overcome p-hacking, for instance the provision of outlets to publish replication studies (Rovenpor & Gonzales, 2015) and a broader, more ambitious open science agenda that encourages transparency. One such practice is the preregistration of studies, which seeks to reduce the publication bias for statistically significant results. By preregistering their research, scientists specify their plan in advance and thereby separate hypothesis-generating (exploratory) from hypothesis-testing (confirmatory) research: the same data cannot be used both to generate and to test a hypothesis. This planning makes the scientist's strategy discernible and improves the overall quality and transparency of research (Nosek et al., 2017). Preregistration keeps the distinction between hypothesis generation and hypothesis testing clear, preserves accuracy in the calibration and assessment of the evidence, and commits researchers to analytical steps before the research outcomes are known. As a result, different independent registries have been launched (see next section).
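To make the multiple-testing point concrete: if every hypothesis tested is truly null, the probability of obtaining at least one p < .05 across m independent tests is 1 - (1 - .05)^m, i.e. about .05 for a single test, .40 for ten tests and .994 for one hundred. The short Python simulation below is our own illustration of this arithmetic, not part of the original study; corrections such as Dunnett's (1955) work by shrinking the per-test threshold so that this family-wise rate stays near .05.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def familywise_rate(n_tests: int, alpha: float = 0.05, n_sims: int = 10_000) -> float:
    """Fraction of simulated studies that report at least one p < alpha
    when every hypothesis tested is truly null."""
    # Under a true null hypothesis a p-value is uniform on [0, 1], so
    # drawing uniforms stands in for running n_tests honest null tests.
    p_values = rng.uniform(size=(n_sims, n_tests))
    return float((p_values < alpha).any(axis=1).mean())

for m in (1, 10, 100, 1000):
    exact = 1 - (1 - 0.05) ** m  # analytic value for comparison
    print(f"{m:>5} tests: simulated {familywise_rate(m):.3f}, exact {exact:.3f}")
```

The garden-of-forking-paths problem described by Loken and Gelman (2017) is harder still, because the "tests" are chosen after seeing the data, so no simple correction of this kind applies.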
For example, in September 2018 the University of Pennsylvania launched the Penn Wharton Credibility Lab (in short, the Credibility Lab). The lab's mission is to provide an online platform where researchers can post their research designs before the studies are conducted, in the interest of greater transparency towards both their scientific audience and the general public at large (Wharton University, 2019). The Credibility Lab launched As Predicted in December 2015, a platform that allows scientists to preregister their studies so that other researchers can read and evaluate the preregistration.


2 Background

As Simmons, Nelson, & Simonsohn (2017) clearly explain in their descriptive blog post: “Preregistrations are time-stamped documents in which researchers specify exactly how they plan to collect their data and to conduct their key confirmatory analyses. The goal of preregistration is to make it easy to distinguish between planned, confirmatory analyses – those for which statistical significance is meaningful – and unplanned exploratory analyses – those for which statistical significance is not meaningful. Because good preregistration prevents researchers from p-hacking, it also protects them from suspicions of p-hacking." There are different domain-specific and general independent registries that have emerged to allow researchers to preregister their studies. Some examples include: the World Health Organization1; Clinicaltrials.gov2; the American Economic Association Registry3; the Registry for International Development Impact Evaluations4; the Evidence in Governance and Politics registry5 for economics and political science research; or the Open Science Framework6, which provides different formats for preregistration7. As Predicted emerges within this preregistration landscape to provide a simple form for registration that, at the same time, provides enough information to allow for accuracy and transparency. In addition, As Predicted allows users to keep their completed forms private forever, the founders of As Predicted realized that requiring that all preregistrations become public makes this practice less appealing and more problematic. At the same time allowing the pre-registration to be private, this practice stills mitigate the file-drawer problem by allowing researchers to choose to make public studies that authors would otherwise not share via publications due to non-positive results; while at the same time, preventing authors from preregistering multiple alternative research plans for the same study and then only disclosing the one that is consistent with the result that they sought to be published. As Predicted works as follows: The author answers nine questions. Afterward, all co-authors receive an email asking for approval. If it is approved, the study remains private until the author agrees to make it public. When going public, a single-page .pdf is generated, which contains a unique URL allowing for one-click verification. The single-page document is automatically stored in the web archive. To preregister a study in As Predicted, a researcher is requested to answer nine questions about their research and analyses.

1) Data collection: Have any data been collected for this study already?
2) Hypothesis: What is the main question being asked or hypothesis being tested in this study?
3) Dependent variable: Describe the key dependent variable(s), specifying how they will be measured.
4) Conditions: How many and which conditions will participants be assigned to?
5) Analyses: Specify exactly which analyses you will conduct to examine the central question/hypothesis.
6) Outliers and exclusions: Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
7) Sample size: How many observations will be collected, or what will determine the sample size?
8) Other: Anything else you would like to preregister?
9) Name: Give a title for this As Predicted preregistration.

[1] http://www.who.int/ictrp/network/primary/en/
[2] http://clinicaltrials.gov/
[3] https://www.socialscienceregistry.org/
[4] http://ridie.3ieimpact.org/
[5] http://egap.org/content/registration
[6] https://osf.io
[7] http://osf.io/registries/


DreamHost hosts the website and database.
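The workflow just described can be pictured as a small state machine: an entry is drafted, waits for approval from every co-author, stays private, and is optionally disclosed as a one-page archived record with a verification URL. The sketch below is purely illustrative; the class, field and URL choices are ours, not As Predicted's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Status(Enum):
    PENDING_APPROVAL = auto()  # answers submitted, awaiting co-authors
    PRIVATE = auto()           # approved; visible only to the authors
    PUBLIC = auto()            # disclosed as a one-page archived record

@dataclass
class Preregistration:
    """Hypothetical model of an As Predicted entry (illustration only)."""
    answers: dict[str, str]    # the nine questions mapped to answers
    coauthors: list[str]
    approvals: set[str] = field(default_factory=set)
    status: Status = Status.PENDING_APPROVAL

    def approve(self, coauthor: str) -> None:
        # Every co-author must approve before the entry becomes valid.
        self.approvals.add(coauthor)
        if self.approvals >= set(self.coauthors):
            self.status = Status.PRIVATE

    def make_public(self) -> str:
        # Only the authors can decide to disclose; going public yields a
        # permanent record with a unique URL for one-click verification.
        if self.status is not Status.PRIVATE:
            raise ValueError("only an approved private entry can go public")
        self.status = Status.PUBLIC
        return f"https://example.org/prereg/{id(self):x}"  # placeholder URL
```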

3 Drivers

A growing trend towards preregistration, despite variance across communities, journals and funders

There is significant variance in preregistration practices across scientific communities, journal policies and funder attitudes, although the general trend is towards making preregistration compulsory. Some journals have already adopted such practices (see footnote [8] for a list of journals with preregistration). In those journals, researchers are requested, or have the option, to submit their research design and analysis strategy for peer review before starting the analysis. This first-stage submission can be rejected by the journal or sent for revision following the standard review process. If approved, the researchers run the study. Once the study is completed, the authors submit the complete article, which, if accepted by editors and reviewers, goes through a second round of review following the standard procedure. At this point, however, reviewers cannot reject the study because of its outcomes (e.g., non-significant effects), although it can still be rejected for problems unrelated to the outcome (Gonzales & Cunningham, 2015).

Journals are also adopting incentive mechanisms to increase study preregistration. Some are adopting 'badges', which give authors explicit credit in the published article for preregistering their study. In addition to journal policies, funding agencies are embracing preregistration and requesting it in grant applications. Some funders have signed the Transparency and Openness Promotion Guidelines [9], which include preregistration in a set of standards for transparency and reproducibility. Preregistration of clinical trials is compulsory under US law and is required for publication under the International Committee of Medical Journal Editors policy. As journals and funders increasingly adopt preregistration requirements, scientists' behaviour is expected to change accordingly. Finally, emerging initiatives are building incentives for preregistration into the publishing process. For instance, the Preregistration Challenge [10] offers $1,000 awards to researchers who publish the results of a preregistered study.

Challenges of preregistration: finding the right balance between too little and too much information

• A) The preregistration is insufficiently specific or exhaustive. Many preregistrations contain insufficient detail about data collection and analysis, sample size, rules for exclusions, or how the dependent variable will be scored. Although these preregistrations are time-stamped, they cannot effectively prevent p-hacking because they lack precise information about the plan (Simmons et al., 2017).

• B) The preregistration contains excessive information. A preregistration needs to be easy to understand: specific and concise enough to let readers distinguish a confirmatory from an exploratory analysis, but not burdened with excessive information. As J. Simmons et al. (2017) explain: "We have seen many preregistrations that are just too long, containing large sections on theoretical background and on exploratory analyses, or lots of procedural details that on the one hand will definitely be part of the paper, and on the other, are not p-hackable."

[8] List of journals with preregistration: https://docs.google.com/spreadsheets/d/1D4_k-8C_UENTRtbPzXfhjEyu3BfLxdOsn9j-otrO870/edit
[9] http://cos.io/top/
[10] About the Preregistration Challenge: http://cos.io/prereg/. Another example: https://www.erpc2016.com/

Lessons from other scientific fields and preregistration practices

The ClinicalTrials.gov experience showed that although posting results is mandatory, many preregistrations are not compliant; that is, the outcomes are never uploaded (Prayle, Hurley, & Smyth, 2012). When results are posted, they are often incomplete or inconsistent with those published (Becker & Ross, 2014). As Predicted seeks to overcome this problem through a system design that makes each study easy to find, thanks to the structure of the information provided, together with algorithms that help detect abuse.

Need for transparent planning in an optional private space

Allowing private registrations mitigates researchers' motivational barriers to preregistering their studies and gives them a useful tool to plan their research accurately while preventing p-hacking. Researchers must clearly describe the main characteristics of their study, which helps them plan their analysis more precisely.

4 Barriers

Despite the value of preregistration for transparency, accuracy and reproducibility, the practice is only starting to emerge in basic, preclinical and applied research, and several barriers are preventing its broader adoption. The most relevant barrier is insufficient training: a lack of proper methodological and statistical education (Nosek et al., 2017). A growing number of resources are available online to help scientists preregister their studies, including an online course on Coursera [11], guidelines on how to register [12] and instructions and templates [13], amongst others. Figure 1 reproduces an example shared by J. Simmons et al. (2017) of an inadequate versus an accurate preregistration.

Finally, there is also a concern about how journals can assess preregistrations for potential publication in the review process. As Gonzales and Cunningham (2015) reflect: "Without research results to help indicate the value of a study, editors and reviewers for nonblind journals may rely more on researcher prestige to make decisions about accepting articles for preregistration. This could harm the ability of graduate students and early career researchers to publish their work, impairing their career development".

[11] Course on Coursera: https://www.coursera.org/learn/statistical-inferences
[12] Examples of instructional guides: http://help.osf.io/m/registrations/l/546603-enter-the-preregistration-challenge and http://datacolada.org/64
[13] Examples of templates: https://osf.io/zab38/wiki/


Figure 1. Examples of good/bad preregistrations

5 Impact

The usage of preregistration sites is growing, particularly in psychology and the social sciences. For As Predicted, Table 1 reports the main usage metrics:

Table 1. As Predicted usage (June 2019)

Authors: 10,767
Institutions: 984
Preregistrations created: 15,477
Preregistrations already public: 2,719

The various efforts launched to encourage preregistration have been accompanied by research assessing the decline in false-positive publications as well as preregistration's strengths and weaknesses. For instance, Kaplan and Irvin (2015) observed a dramatic drop in the rate of significant results after preregistration of primary outcomes became required in a sample of clinical trials. Preregistration has also made it possible to identify and correct selection and reporting biases (see www.COMPare-trials.org) (Gonzales & Cunningham, 2015). Finally, Franco, Malhotra and Simonovits (2014) analysed nearly 250 peer-reviewed proposals of social science experiments in the Time-sharing Experiments for the Social Sciences archive. They report that 40% of the published papers in their sample failed to report outcome variables, whereas 56 of the 91 studies with strongly significant results were published in a journal. Preregistration enables transparency and makes it possible to assess whether, and how many, non-significant results are published.

6 Policy conclusions

There is increasing awareness of the benefits of preregistration in combating p-hacking, and widespread agreement that a decline in selective reporting will increase the transparency, accuracy and credibility of research outcomes. However, challenges remain in motivating preregistration practices among scientists and in providing the training and skills needed to plan and preregister studies effectively. Coordinated action is still needed in a decentralized system so that the right motivations and tools are made available to researchers to increase preregistration. The research community, with its diverse stakeholders, needs to solve the incentive problem and nudge scientists towards preregistration practices while emphasising their importance for career advancement.


References

Begley, C. G. (2013). Reproducibility: Six red flags for suspect work. Nature, 497, 433–434. https://doi.org/10.1038/497433a

Bhattacharjee, Y. (2013, April 26). The Mind of a Con Man. The New York Times. Retrieved from https://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html

Dunnett, C. W. (1955). A Multiple Comparison Procedure for Comparing Several Treatments with a Control. Journal of the American Statistical Association, 50(272), 1096–1121. https://doi.org/10.1080/01621459.1955.10501294

Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychological Methods, 17(1), 120–128. https://doi.org/10.1037/a0024445

Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502–1505. https://doi.org/10.1126/science.1255484

Gonzales, J. E., & Cunningham, C. A. (2015, August). The promise of preregistration in psychological research. American Psychological Association. Retrieved from https://www.apa.org/science/about/psa/2015/08/pre-registration

Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Kaplan, R. M., & Irvin, V. L. (2015). Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time. PLOS ONE, 10(8), e0132382. https://doi.org/10.1371/journal.pone.0132382

Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size. PLOS ONE, 9(9), e105825. https://doi.org/10.1371/journal.pone.0105825

Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585. https://doi.org/10.1126/science.aal3618

Nosek, B. A., Ebersole, C. R., DeHaven, A., & Mellor, D. (2017). The Preregistration Revolution. https://doi.org/10.31219/osf.io/2dxu5

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Rovenpor, D. R., & Gonzales, J. E. (2015). Replicability in psychological science. Retrieved July 1, 2019, from the American Psychological Association website: www.apa.org/science/about/psa/2015/01/replicability.aspx

Sellke, T., Bayarri, M. J., & Berger, J. O. (2001). Calibration of p Values for Testing Precise Null Hypotheses. The American Statistician, 55(1), 62–71. https://doi.org/10.1198/000313001300339950

Simmons, J., Nelson, L. D., & Simonsohn, U. (2017, November 6). How To Properly Preregister A Study. Retrieved July 1, 2019, from Data Colada website: http://datacolada.org/64


Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632

Wharton University. (2019). Credibility Lab. Retrieved July 1, 2019, from https://credlab.wharton.upenn.edu/

Yong, E. (2012, July 12). Uncertainty shrouds psychologist’s resignation. Nature. Retrieved from https://www.nature.com/news/uncertainty-shrouds-psychologist-s-resignation-1.10968


The scientific community has started to foster research practices that seek to overcome p-hacking. One such practice is the preregistration of studies, which seeks to reduce the publication bias towards statistically significant results. Preregistration commits researchers to analytical steps before the research outcomes are known. It makes it possible to distinguish between hypothesis generation and hypothesis testing, and it preserves accuracy in the calibration and assessment of the evidence. In recent years, we have witnessed a proliferation of independent registries. The present case study provides an overview of one such registry: As Predicted, launched by the Penn Wharton Credibility Lab of the University of Pennsylvania in December 2015.

Studies and reports
