Research into Open Research Data

Presentation by Heather Piwowar as part of UBC's Open Access Week 2010

Page 1: Research into Open Research Data

Open research data

Heather Piwowar
DataONE postdoc with Dryad and NESCent, UBC

@researchremix

OA week 2010
University of British Columbia

Page 2: Research into Open Research Data

#1

It matters

Page 3: Research into Open Research Data

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

Page 4: Research into Open Research Data

http://www.flickr.com/photos/jsmjr/62443357/

Page 5: Research into Open Research Data

http://www.flickr.com/photos/camilleharrington/3587294608/

Page 6: Research into Open Research Data

http://www.flickr.com/photos/rkuhnau/3318245976/

Page 7: Research into Open Research Data

http://www.flickr.com/photos/conformpdx/1796399674/

Page 8: Research into Open Research Data

http://www.flickr.com/photos/rkuhnau/3317418699/

Page 9: Research into Open Research Data

http://www.flickr.com/photos/zemlinki/261617721/

Page 10: Research into Open Research Data

http://www.flickr.com/photos/tracenmatt/3020786491/

Page 11: Research into Open Research Data

http://www.flickr.com/photos/the-o/2078239333/

Page 12: Research into Open Research Data
Page 13: Research into Open Research Data

http://www.flickr.com/photos/75166820@N00/5318468/

Page 14: Research into Open Research Data

#2

Wayfinding + progress

Page 16: Research into Open Research Data

http://www.flickr.com/photos/paulhami/1020538523//

Which data?

Page 17: Research into Open Research Data

http://www.flickr.com/photos/paulhami/1020538523//

Where?

Page 18: Research into Open Research Data

http://www.flickr.com/photos/paulhami/1020538523//

With whom?

Page 19: Research into Open Research Data

http://www.flickr.com/photos/paulhami/1020538523//

When?

Page 20: Research into Open Research Data

http://www.flickr.com/photos/paulhami/1020538523//

Under what terms?

Page 22: Research into Open Research Data

Find
Organize
Document
Deidentify
Format
Ask
Submit

Answer questions
Worry about mistakes being found
Worry about data being misinterpreted
Worry about being scooped
Forgo money and IP and prestige???

Page 23: Research into Open Research Data

not very motivating.

Page 25: Research into Open Research Data

a) policies + expectations

- NSF
- Joint Data Archiving Policy
- BioMed Central
- PLoS

Page 26: Research into Open Research Data

b) repositories

- datatype-based
- institution-based
- discipline-based
- journal-based

Page 27: Research into Open Research Data

c) standards

- data licenses
- data citation
- IDs for datasets, people, entities

Page 28: Research into Open Research Data

d) part of something bigger

- open government data
- citizen science
- supplemental materials
- dataset-based usage metrics
- awards, recognition

Page 29: Research into Open Research Data

#3

Is it working?

Page 30: Research into Open Research Data

http://www.genome.jp/en/db_growth.html

lots of data sharing!

Page 31: Research into Open Research Data

but how much isn’t shared?

what isn’t shared?

who isn’t sharing it?

why not?

what can we do about it?

how much does it matter?

Page 32: Research into Open Research Data

you can not manage what you do not measure

quote: Lord Kelvin
http://www.flickr.com/photos/archeon/2941655917/

Page 34: Research into Open Research Data

Why is it important?
Are we sure?

Page 35: Research into Open Research Data

Errors.

Gore et al. 1977, Kanter and Taylor 1994, McGuigan 1995, Hurlbert and White 1993

More than half of all papers contain errors

5‐10% contain errors that change the conclusions

Page 36: Research into Open Research Data

Ok, let’s share on request.

Page 37: Research into Open Research Data

Doesn’t work

self-reported denying a request in last 3 years

trainees self-reported denying a request

been denied access to data, materials, code

authors “not able to retrieve raw data”

not willing to release data

[Bar chart: percentage for each of the items above; x-axis from 0% to 40%]

Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.

Page 38: Research into Open Research Data

Don’t get the email

Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.

Page 39: Research into Open Research Data

Say no

Hedstrom. Society of Am Archivists Ann Meeting. 2008.

want to publish more papers first

want exclusive use

ensure data confidentiality

control

avoid cost of preparation

[Bar chart: percentage citing each reason above; x-axis from 0% to 50%]

Page 40: Research into Open Research Data

Ask why

Reidpath et al. Bioethics 2001.

`Before I send you the data could I ask what you want it for?'

`Can you be more explicit, please, about the analyses you have in mind and what you plan to do with them?'

`We'll have to discuss your request with the other coauthors.  Before we do that, I'd like to know your proposed analysis plan.' 

`We are not finished using the data, but when we are finished with it, we would be open to requests for the data.'

`Any use of the data other than for the specific purpose laid down in the contract of collaboration is effectively ruled out.'

Page 41: Research into Open Research Data

Not efficient.

Page 42: Research into Open Research Data

Not efficient. Not fair.

Campbell et al. 2000

Not random:

‐ young

‐ productive

Page 43: Research into Open Research Data

Has real costs.

Survey of doctoral students and postdocs:

28-50% reported negative effects of data withholding:
• hurt progress of their research,
• hurt rate of discovery in their lab/research group,
• hurt quality of their relationships with academic scientists,
• hurt quality of their education,
• hurt level of communication in their lab/research group.

Vogeli et al. Acad Med. 2006 Feb; 81(2):128-36

Page 44: Research into Open Research Data

Ok, then on a website?
No. URLs stop working.

Evangelou et al. FASEB J. 2006.
Wren. Bioinformatics 2008.
Wren et al. EMBO Rep 2006.

Page 45: Research into Open Research Data

Ok, in a repository?

Page 46: Research into Open Research Data

lots of data sharing!

http://www.genome.jp/en/db_growth.html

Page 47: Research into Open Research Data

http://www.flickr.com/photos/g_kat26/4255119413/

Page 49: Research into Open Research Data
Page 50: Research into Open Research Data

Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.

Page 51: Research into Open Research Data

microarray data

http://en.wikipedia.org/wiki/DNA_microarray

http://en.wikipedia.org/wiki/Image:Heatmap.png

http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

Page 52: Research into Open Research Data
Page 53: Research into Open Research Data

11,603 studies that created

gene expression microarray data

Page 54: Research into Open Research Data

Is research data shared after publication?

Funder Journal Investigator Institution Study

Page 55: Research into Open Research Data

funded by NIH?

size of grant

sharing plan req’d?

funded by non-NIH?

impact factor

strength of policy

open access?

number of microarray studies published

years since first paper

# pubs

# citations

previously shared?

previously reused?

gender

sector

size

impact rank

country

humans?

mice?

plants?

cancer?

clinical trial?

number of authors

year

Funder Journal Investigator Institution Study

Page 56: Research into Open Research Data

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”

http://www.nature.com/authors/editorial_policies/availability.html

http://www.nature.com/nature/journal/v453/n7197/index.html

journal data sharing policy

Page 57: Research into Open Research Data

journal rank

Page 58: Research into Open Research Data

institution rank

Yu et al. BMC medical informatics and decision making (2007) vol. 7 pp. 17

Page 59: Research into Open Research Data

funding level

PubMed grant lists + NIH grant details

Page 60: Research into Open Research Data

study type

Page 61: Research into Open Research Data

author gender

Page 62: Research into Open Research Data

and so on...

124 variables

Page 63: Research into Open Research Data

11,603 studies

25% had links from datasets in databases

Page 64: Research into Open Research Data

[Figure: proportion of articles with datasets found in GEO or ArrayExpress (y-axis, ticks from 0.05 to 0.35) against year article published (x-axis, 2000 to 2009)]

Proportion of articles with shared datasets, by year

Across time
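As a rough illustration of how a by-year proportion like the one in this figure can be computed, here is a minimal Python sketch. The function name `proportion_shared_by_year` and the example records are hypothetical; they are not the talk's data or code.

```python
from collections import defaultdict


def proportion_shared_by_year(studies):
    """Return {year: fraction of studies that have a linked dataset}.

    `studies` is an iterable of (publication_year, has_dataset) pairs.
    """
    totals = defaultdict(int)
    shared = defaultdict(int)
    for year, has_dataset in studies:
        totals[year] += 1
        if has_dataset:
            shared[year] += 1
    return {year: shared[year] / totals[year] for year in sorted(totals)}


# Hypothetical records, NOT the talk's data: (publication year, dataset found?)
example = [
    (2000, False), (2000, False),
    (2005, True), (2005, False),
    (2009, True), (2009, True), (2009, False), (2009, False),
]
proportions = proportion_shared_by_year(example)
```

With real data, each record would correspond to one of the studies and the flag would say whether a matching dataset was found in a public database.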

Page 65: Research into Open Research Data

What can we do about it?

Page 66: Research into Open Research Data

What can we do about it?

Funder policies.

Page 67: Research into Open Research Data

19%

Piwowar and Chapman. Journal of Informetrics 2010

Page 68: Research into Open Research Data

What can we do about it? Journal policies.

Page 69: Research into Open Research Data

We looked at data sharing policies within Instruction to Author statements of 70 journals, as they apply to gene expression microarray data.

Piwowar and Chapman. ELPUB 2008

Page 70: Research into Open Research Data

No applicable policy (43%)

Weak policy (24%)

should, recommend, request
must, but without requiring database accession number

Strong policy (33%)

must, required, condition of publication
requires database accession number

strength of data sharing policies
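The three policy categories can be approximated with a simple keyword rule. The sketch below is only an illustration of the criteria listed on this slide; the function name `classify_policy` and the sample sentences are hypothetical, and it assumes the input is already the data-sharing passage of a journal's instructions to authors.

```python
# Keyword lists taken from the slide's category descriptions.
STRONG_TERMS = ("must", "required", "condition of publication")
WEAK_TERMS = ("should", "recommend", "request")


def classify_policy(text):
    """Classify a data-sharing policy passage as 'strong', 'weak', or 'none'.

    Assumption: plain substring matching is good enough for a sketch;
    a real coding scheme would need human review of each statement.
    """
    lowered = text.lower()
    mentions_accession = "accession number" in lowered
    has_strong = any(term in lowered for term in STRONG_TERMS)
    has_weak = any(term in lowered for term in WEAK_TERMS)
    if has_strong and mentions_accession:
        return "strong"
    if has_strong or has_weak:
        # Covers "should/recommend/request" and "must" without an accession number.
        return "weak"
    return "none"


# Hypothetical policy sentences, written for this example only.
strong_example = classify_policy(
    "Authors must deposit microarray data and provide a database accession number."
)
weak_example = classify_policy("Authors should deposit data in a public database.")
```

Substring matching is deliberately crude: it will misfire on unrelated uses of words such as "must", which is why the sketch assumes the text has been narrowed to the relevant passage first.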

Page 71: Research into Open Research Data

High-impact journals

tend to have

a strong data-sharing

policy

Page 72: Research into Open Research Data

Articles published in journals with a strong data-sharing policy are more likely to have publicly

available datasets

Page 73: Research into Open Research Data

What can we do about it?

Learn

• Learn from those who do it well
• Focus on places that need it

Page 74: Research into Open Research Data

[Figure: proportion of datasets shared (y-axis, 0.0 to 1.0) by journal. Panel title: Journals (Physiological Genomics). Journals on the x-axis, in the order shown: Physiol Genomics, PLoS Genet, Genome Biol, Microbiology, PLoS One, BMC Genomics, Plant Cell, Genome Res, Eukaryot Cell, Appl Environ Microbiol, BMC Med Genomics, Hum Mol Genet, Proc Natl Acad Sci U S A, Infect Immun, Am J Respir Cell Mol Biol, Dev Biol, J Bacteriol, Mol Endocrinol, BMC Cancer, Plant Physiol, Biol Reprod, Blood, J Immunol, FASEB J, Toxicol Sci, J Exp Bot, Nucleic Acids Res, Diabetes, Mol Cell Biol, Mol Cancer Ther, BMC Bioinformatics, Stem Cells, FEBS Lett, J Neurosci, Am J Pathol, J Biol Chem, J Virol, OTHER, Cancer Res, J Clin Endocrinol Metab, Plant Mol Biol, Clin Cancer Res, Genomics, Invest Ophthalmol Vis Sci, Mol Hum Reprod, Carcinogenesis, Gene, Endocrinology, Oncogene, Cancer Lett, Biochem Biophys Res Commun]

Page 75: Research into Open Research Data

[Figure: proportion of datasets shared (y-axis, 0.0 to 1.0) by institution. Panel title: Institutions (Stanford). Institutions on the x-axis, in the order shown: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku]

Page 76: Research into Open Research Data

[Figure: proportion of datasets shared (y-axis, 0.0 to 1.0) against institution rank (x-axis, ticks from 1 to 1901)]

Page 77: Research into Open Research Data

[Figure: odds ratio plot (x-axis ticks at 0.25, 0.50, 1.00, 2.00, 4.00, 8.00; plot label 0.95), shown twice on the slide with minor reordering, for these factors:]

- Has journal policy
- Count of R01 & other NIH grants
- Authors prev GEOAE sharing & OA & microarray creation
- NO K funding or P funding
- Institution high citations & collaboration
- Journal impact
- Journal policy consequences & long halflife
- NOT animals or mice
- Institution is government & NOT higher ed
- Last author num prev pubs & first year pub
- Large NIH grant
- Humans & cancer
- NO geo reuse + YES high institution output
- First author num prev pubs & first year pub

Multivariate nonlinear regressions with interactions

Page 78: Research into Open Research Data

[Figure: odds ratio plot (x-axis ticks at 0.25, 0.50, 1.00, 2.00, 4.00, 8.00; plot label 0.95), shown twice on the slide with minor reordering, for these factors:]

- Has journal policy
- Count of R01 & other NIH grants
- Authors prev GEOAE sharing & OA & microarray creation
- NO K funding or P funding
- Institution high citations & collaboration
- Journal impact
- Journal policy consequences & long halflife
- NOT animals or mice
- Institution is government & NOT higher ed
- Last author num prev pubs & first year pub
- Large NIH grant
- Humans & cancer
- NO geo reuse + YES high institution output
- First author num prev pubs & first year pub

Multivariate nonlinear regressions with interactions

Page 79: Research into Open Research Data

[Figure: odds ratio plot (x-axis ticks at 0.25, 0.50, 1.00, 2.00, 4.00; plot label 0.95) for these factors:]

- OA journal & previous GEO-AE sharing
- Amount of NIH funding
- Journal impact factor and policy
- Higher Ed in USA
- Cancer & humans

Multivariate nonlinear regression with interactions

Page 80: Research into Open Research Data

[Figure: odds ratio plot (x-axis ticks at 0.25, 0.50, 1.00, 2.00, 4.00; plot label 0.95) for these factors:]

- OA journal & previous GEO-AE sharing
- Amount of NIH funding
- Journal impact factor and policy
- Higher Ed in USA
- Cancer & humans

Multivariate nonlinear regression with interactions
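For readers unfamiliar with odds ratios, the sketch below computes a plain, unadjusted odds ratio from a 2x2 table. It is not the multivariate regression used in the talk, which adjusts for many factors at once; the function name `odds_ratio` and the counts are hypothetical.

```python
def odds_ratio(shared_with, not_shared_with, shared_without, not_shared_without):
    """Unadjusted odds ratio of sharing for studies with vs. without a factor.

    An odds ratio above 1 means sharing is more likely when the factor is present.
    """
    odds_with = shared_with / not_shared_with
    odds_without = shared_without / not_shared_without
    return odds_with / odds_without


# Hypothetical counts, NOT the talk's data:
# with the factor: 40 shared, 60 did not; without it: 20 shared, 80 did not.
example_or = odds_ratio(40, 60, 20, 80)
```

In this made-up table the odds of sharing are (40/60) / (20/80), roughly 2.67 times higher when the factor is present. The axis ticks on the slides (0.25, 0.50, 1.00, 2.00, 4.00) double at each step, which suits odds ratios because 0.5 and 2.0 are effects of equal size in opposite directions.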

Page 81: Research into Open Research Data

Carrot?

http://www.flickr.com/photos/sunrise/35819369/

Page 82: Research into Open Research Data

currency of value?

Citations.

Page 83: Research into Open Research Data

currency of value?

Citations.

$50!

Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215

Page 84: Research into Open Research Data

dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)

citations: ISI Web of Science Citation index, citations from 2004-2005

data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine

statistics: Multivariate linear regression

Page 85: Research into Open Research Data

Note: log scale

Page 86: Research into Open Research Data

~70%
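A percentage like this can be related to a regression on a log scale as follows. The sketch assumes, as an illustration only, that the outcome is the natural log of the citation count and that the ~70% figure is the difference associated with sharing; the function names are hypothetical.

```python
import math


def percent_change_from_log_coefficient(beta):
    """Percent change in the outcome implied by coefficient `beta`
    when the outcome was modeled on a natural-log scale."""
    return (math.exp(beta) - 1.0) * 100.0


def log_coefficient_from_percent_change(percent):
    """Inverse of the function above: the log-scale coefficient for a percent change."""
    return math.log(1.0 + percent / 100.0)


# Under the stated assumption, a ~70% increase corresponds to this coefficient:
beta_70 = log_coefficient_from_percent_change(70.0)
```

Here `beta_70` is ln(1.7), about 0.53, so a coefficient of that size on a sharing indicator would read as roughly 70% more citations, other factors held equal.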

Page 87: Research into Open Research Data

Next?

http://www.flickr.com/photos/gatewaystreets/3838452287/

Page 88: Research into Open Research Data

Impact of JDAP

Abadie et al. Journal of the American Statistical Association 2010

Page 89: Research into Open Research Data

Reuse.

http://www.flickr.com/photos/boitabulle/3668162701/

Page 90: Research into Open Research Data

http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png

Page 91: Research into Open Research Data
Page 92: Research into Open Research Data
Page 93: Research into Open Research Data

#4

We are the culture.
Let’s do it.

Page 94: Research into Open Research Data

http://www.flickr.com/photos/joellevand/279468607/

Page 95: Research into Open Research Data

http://www.flickr.com/photos/huzzahvintage/4577075021/

Page 96: Research into Open Research Data

a) in our communities

- strengthening policies:
  - journal, conference, institutional
- decision-makers
- role-models and educators

Page 97: Research into Open Research Data

b) in our tools

- measure opinions
- measure use
- be transparent!

Page 98: Research into Open Research Data

c) with our data

- share it.
- ugly? incomplete? strange?

“Flawed, but out there” is a million times better than “perfect, but unattainable”

http://sciblogs.co.nz/seeing-data/2010/10/12/the-zen-of-open-data/

Page 99: Research into Open Research Data

“Does anyone want your data?

That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay.

Your data, too, may simply be awaiting an effective matchmaker.”

Got data? Nature Neuroscience (2007)

Page 100: Research into Open Research Data

I post my data, code, and statistical scripts: http://researchremix.org

Share yours too!

http://www.flickr.com/photos/myklroventine/892446624/

Page 101: Research into Open Research Data

More info?

• OATP oa.data tag on Connotea, Twitter
• FriendFeed
• Mendeley “data sharing” group

• @researchremix [email protected] 

Page 102: Research into Open Research Data

thank you

Todd Vision, Michael Whitlock, Wendy Chapman

The open science online community and those who release their articles, datasets and photos openly

Page 103: Research into Open Research Data
Page 104: Research into Open Research Data

http://www.flickr.com/photos/youraddresshere/6649228/