
Open Data + Preprints = Open Science

David Mellor

@EvoMellor

[email protected]

Find this presentation at osf.io/7grp5


Everything that happens prior to the final publication

Our mission is to increase openness, integrity, and reproducibility of research.

Openness is a core value of the scientific method

“Nullius in verba” ~ “Take nobody's word for it”

• Communalism
• Universalism
• Disinterestedness
• Organized skepticism

(Merton, 1942)

Problem: The gap between scholarly values and practices

Most researchers say that they support the values of science.

Many researchers say that they practice the values of science.

But almost everyone thinks that “other researchers” do not practice the values of science.

The combination of a strong bias toward statistically significant findings and flexibility in data analysis results in irreproducible research

The combination of a strong bias toward statistically significant findings and flexibility in data analysis results in irreproducible research

https://simplystatistics.org/2017/07/26/announcing-the-tidypvals-package/

The combination of a strong bias toward statistically significant findings and flexibility in data analysis results in irreproducible research

The Garden of Forking Paths

Gelman and Loken, 2013

“Does X affect Y?”

Control for time?

Exclude outliers?

Median or mean?

How do you define X?
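To make this analytic flexibility concrete, here is a minimal, purely illustrative Python sketch (not from the talk): it simulates data with no true effect, tries a few of the forks listed above (exclude outliers or not, compare means or medians), and keeps whichever p-value comes out smallest.

    # Illustrative only: simulate a null effect, then "fork" the analysis and keep the best p-value.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    false_positives = 0
    n_sims = 1000

    for _ in range(n_sims):
        x = rng.normal(size=50)          # group 1: no true difference
        y = rng.normal(size=50)          # group 2
        p_values = []

        # Fork 1: plain t-test on all data
        p_values.append(stats.ttest_ind(x, y).pvalue)

        # Fork 2: exclude "outliers" beyond 2 SD, then t-test
        xt, yt = x[np.abs(x) < 2], y[np.abs(y) < 2]
        p_values.append(stats.ttest_ind(xt, yt).pvalue)

        # Fork 3: rank-based test (medians rather than means)
        p_values.append(stats.mannwhitneyu(x, y).pvalue)

        if min(p_values) < 0.05:         # report whichever fork "worked"
            false_positives += 1

    print(f"False-positive rate with forking: {false_positives / n_sims:.2%}")

Even with only three forks, the rate of “significant” findings noticeably exceeds the nominal 5%, which is the mechanism behind the claim above.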

The combination of a strong bias toward statistically significant findings and flexibility in data analysis results in irreproducible research

“Many analysts, one dataset: Making transparent how variations in analytical choices affect results” https://psyarxiv.com/qkwst/

Image via 538.com: https://fivethirtyeight.com/features/science-isnt-broken/#part1

The combination of a strong bias toward statistically significant findings and flexibility in data analysis results in irreproducible research

[Figure: p-values of original studies vs. replications. 97% of original studies “significant” vs. 37% of replications “significant”.]

Is there a Reproducibility Crisis?

Baker, 2016

Nature survey of 1,576 researchers


Solutions

Option A: Mandates

Culture of distrust. Continued antagonism.

Solutions

Option B: Reward actions that embody idealized scientific practice.

The “truth” is versioned. Pointing out mistakes enhances the reputations of all parties.

Getting to Option B

I. Allow peers to receive recognition for ideal behaviors.

II. Identify biases in analysis and publication; use a process that addresses those biases.

III. Collective action by key decision makers.

Transparency and Openness Promotion (TOP) Guidelines

Eight policy statements for increasing the transparency and reproducibility of published research.

• Agnostic to discipline
• Low barrier to entry
• Modular

See cos.io/top for more detailed language

Three Tiers: Disclose, Require, Verify

Eight Standards:
Data citation
Materials transparency
Data transparency
Code transparency
Design transparency
Study Preregistration
Analysis Preregistration
Replication

See cos.io/top for more detailed language


Ask authors who submit to answer two questions:

1) Are the data/code/materials available in a public repository? Yes/No
2) If yes, where? URL: ________

Make answers available in article metadata, or simply in footnotes.
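As a purely illustrative sketch, the two answers could travel with the article as structured metadata; the field names below are hypothetical, not an actual journal or Crossref schema.

    # Hypothetical example: data-availability answers stored as article metadata.
    article_metadata = {
        "doi": "10.xxxx/example",             # placeholder identifier
        "data_availability": "yes",           # answer to question 1
        "data_url": "https://osf.io/xxxxx/",  # answer to question 2 (placeholder)
    }
    print(article_metadata)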

Signals: Making Behaviors Visible Promotes Adoption

Kidwell et al., 2016

[Figure: % of articles reporting data available in a repository, 0%-40%.]

[Fig 4. Actually available, correct, usable, and complete data: % of articles with data reportedly available, by category (reportedly available, actually available, correct data, usable data, complete data).]

Preregistration increases credibility by specifying in advance how data will be analyzed, thus preventing biased reasoning from affecting data analysis.

cos.io/prereg

What is a preregistration?

A time-stamped, read-only version of your research plan, created before the study.

Study plan:
● Hypothesis
● Data collection procedures
● Manipulated and measured variables

Analysis plan:
● Statistical model
● Inference criteria
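Here is a minimal sketch (not part of the talk) of what pinning the analysis plan down in code before data collection can look like; the hypothesis, test, and alpha are hypothetical placeholders.

    # Hypothetical pre-specified analysis, written and registered before data exist.
    from scipy import stats   # requires SciPy >= 1.6 for the `alternative` argument

    ALPHA = 0.05                                      # inference criterion, fixed in advance
    PREDICTION = "treatment > control on outcome Y"   # the registered hypothesis

    def confirmatory_test(treatment_scores, control_scores):
        """Run exactly the registered test: one-sided independent-samples t-test."""
        result = stats.ttest_ind(treatment_scores, control_scores, alternative="greater")
        return {
            "hypothesis": PREDICTION,
            "t": result.statistic,
            "p": result.pvalue,
            "significant": result.pvalue < ALPHA,
        }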

What problems does preregistration fix?

1) The file drawer
2) P-hacking: unreported flexibility in data analysis
3) HARKing: Hypothesizing After Results are Known (Dataset → Hypothesis; Kerr, 1998)


Preregistration makes the distinction between confirmatory (hypothesis-testing) and exploratory (hypothesis-generating) research clearer.

Confirmatory versus exploratory analysis

Context of confirmation:
• Traditional hypothesis testing
• Results held to the highest standards of rigor
• Goal is to minimize false positives
• P-values interpretable

Context of discovery:
• Pushes knowledge into new areas / data-led discovery
• Finds unexpected relationships
• Goal is to minimize false negatives
• P-values meaningless

Presenting exploratory results as confirmatory increases publishability at the expense of credibility.

Example workflow #1 (theory driven, with a specific prediction)

Discovery Phase: exploratory research, hypothesis generating

Collect new data

Confirmation Phase: hypothesis testing

Example workflow #2 (few a-priori predictions)

Collect data, then split the data. Keep the held-out data secret!

Discovery Phase: exploratory research, hypothesis generating

Confirmation Phase: hypothesis testing
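Here is a minimal sketch of the split-data step in workflow #2 (illustrative only; the file names and 50/50 split are assumptions, not part of the talk).

    # Split a dataset once, up front: explore on one half, confirm on the held-out half.
    import pandas as pd

    df = pd.read_csv("my_study_data.csv")                  # hypothetical data file

    exploration = df.sample(frac=0.5, random_state=2024)   # discovery-phase half
    confirmation = df.drop(exploration.index)              # held-out half, kept secret

    exploration.to_csv("exploration_half.csv", index=False)
    confirmation.to_csv("holdout_half.csv", index=False)
    # Open holdout_half.csv only after the confirmatory analysis plan is preregistered.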

How do you preregister?

https://osf.io/prereg

Tips for writing up preregistered work

1. Include a link to your preregistration

2. Report the results of ALL preregistered analyses

3. ANY unregistered analyses must be transparent

What is a Registered Report?

When the research plan undergoes peer review before results are known, the preregistration becomes part of a Registered Report.

cos.io/rr

Registered Reports: Stage 1 review (before results are known)

• Are the hypotheses well founded?
• Are the methods and proposed analyses feasible and sufficiently detailed?
• Is the study well powered? (≥90%)
• Have the authors included sufficient positive controls to confirm that the study will provide a fair test?
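For the ≥90% power criterion, the required sample size can be checked with a standard power analysis; here is a sketch using statsmodels (the assumed effect size of d = 0.5 is a placeholder, not from the talk).

    # Sample size per group for 90% power to detect a medium effect (d = 0.5) at alpha = .05.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_group = analysis.solve_power(effect_size=0.5, power=0.90, alpha=0.05,
                                       alternative="two-sided")
    print(f"Required n per group: {n_per_group:.0f}")      # roughly 85-86 per group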

Registered Reports: Stage 2 review (after results are known)

• Did the authors follow the approved protocol?
• Did positive controls succeed?
• Are the conclusions justified by the data?

Whether the results are statistically significant or novel does not matter.

Chambers, 2017

What problems does preregistration fix?

1) The file drawer
2) P-hacking: unreported flexibility in data analysis
3) HARKing: Hypothesizing After Results are Known
4) Registered Reports also address publication bias.
5) Registered Reports may also improve the study design, by getting peer review into the process sooner.

FAQ: Does preregistration work?

Reported tests (122): median p-value = .02, median effect size (d) = .29, 63% with p < .05

Unreported tests (147): median p-value = .35, median effect size (d) = .13, 23% with p < .05

Franco, A., Malhotra, N., & Simonovits, G. (2015). Underreporting in Political Science Survey Experiments: Comparing Questionnaires to Published Results.


https://osf.io

Managing a research workflow: Planning, Execution, Reporting, Archiving, Discovery

• Collaboration
• Version Control
• Hub for Services
• Project Management

Collaboration: put data, materials, and code on the OSF

See the impact: file downloads, forks

https://osf.io/preprints/discover

Aggregated: powered by SHARE (share.osf.io), OSF Preprints aggregates search across local and external preprint services.

Currently over 2M preprint records available.

Unmoderated: preprints are visible immediately.
Pre-moderation: preprints are not visible until approved.
Post-moderation: preprints are visible until a moderator rejects them.

Commenting options

Visibility of comments:
1. Visible only to other preprint moderators
2. Visible to both preprint moderators AND to authors

Anonymity:
1. Comments from moderators are anonymous
2. Comments from moderators are identified

What’s Next?

Public Roadmap (http://bit.ly/2iUAFGF)

• Improved analytics
• Improved search and filtering
• Public commenting
• The Preprints Advisory Committee is providing ongoing governance, technical prioritization, best practices, and education
• A widening community that supports experimentation and innovation in scholarly communications

Open Science and Citation Impact

1) Articles that appear in preprint servers are more highly cited than those that don't.
• The Citation Impact of Digital Preprint Archives for Solar Physics Papers (Metcalfe, 2006; https://doi.org/10.1007/s11207-006-0262-7)

2) Sharing detailed research data is associated with increased citation rate.
• (Piwowar et al., 2007; https://doi.org/10.1371/journal.pone.0000308)


Thank you!

Find this presentation at osf.io/7grp5

Resources for Registered Reports, preregistration, Open Science Badges, statistical consulting, communities, and more at https://cos.io

Find me online @EvoMellor or email: [email protected]

Our mission is to provide expertise, tools, and training to help researchers create and promote open science within their teams and institutions. Promoting these practices within the research funding and publishing communities accelerates scientific progress.