Beyond Data Release Mandates: Helping Authors Make Data Available
Laurie Goodman, PhD, Editor-in-Chief, GigaScience
ORCID ID: 0000-0001-9724-5976

Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors Make Data Available



Laurie Goodman at the AIBS Changing Practices in Data Pub workshop: Beyond Data Release Mandates - Helping Authors Make Data Available. 3rd December 2014


Page 1: Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors Make Data Available

Beyond Data Release Mandates

Helping Authors Make Data Available

Laurie Goodman, PhD

Editor-in-Chief GigaScience

ORCID ID: 0000-0001-9724-5976

Page 2

Beyond Reproducibility and Retraction

• I’m not going to talk about:

– The 10-30% of papers that can’t be reproduced

• I’m not going to talk about:

– The 15x increase in the number of retractions in the last decade

More eyes * More innovation * More widespread use

I’m going to talk about the other reason for making data available (in an accessible and reusable format)

Page 3

Infectious Disease
• Measles: 122,000 per year
• Hepatitis C-related liver disease: 350,000-500,000 per year
• Malaria: 627,000 per year
• HIV/AIDS: 1.4-1.7 million per year

Non-communicable, with genetic predisposition
• Prostate cancer: 307,000 per year
• Breast cancer: 522,000 per year
• Suicide: 800,000 per year
• Diabetes: 1.5 million per year
• Cancer: 8.2 million per year
• Cardiovascular Disease: 17.5 million per year

Non-genetic/Non-infectious
• Pesticide Poisoning: 250,000 per year
• Malnutrition: 2.8 million children (under 5) per year

World Health Organization Fact Sheets http://www.who.int/en/

Environment
• Extinction Rate: 1,000-10,000x higher than natural extinction rate

World Wildlife Fund http://wwf.panda.org/

Why?

Page 4

This week: Genome Biology Soap Box article on the Future of Data Publishing, by Kahn R., Goodman L., & Mittleman D. http://genomebiology.com/

Page 5

Scientific Communication Via Publication

• Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab and reproducible research, 1995

• Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives

• Lack of transparency, lack of credit for anything other than “regular” dead tree publication

Page 6

Wiley Researcher Data Insights Survey
Why Researchers Do Not Share
• Intellectual property or confidentiality issues (59%)
• Concerned research might be “scooped” (39%)
• Concerns about misinterpretation or misuse (32%)
• Concerns about attribution/citation credit (31%)
• Ethical concerns (24%)
• Insufficient time/resources (19%)
• Funder/institution does not require sharing (13%)
• Lack of funding (13%)
• Not sure where to share (5%)
• Not sure how to share (3%)

Report is underway, but see:
http://exchanges.wiley.com/blog/2014/11/03/how-and-why-researchers-share-data-and-why-they-dont/
http://scholarlykitchen.sspnet.org/2014/11/11/to-share-or-not-to-share-that-is-the-research-data-question/

Slide from Catherine Giffi, Director, Strategic Market Analysis, Global Research, Wiley


Page 8

How Can Publishers Promote Data Sharing?

Carrots and Sticks
• Sticks
– Create Journal Data Release Policies
– Check Data Release Policy is followed
• Carrots
– Find Ways to Aid Researchers in Releasing Data
– Consider ways to support/protect researchers who do share ahead of publication
– Promote Data Citation
– Data Curation
– Data Hosting (short term or long term, depending on need)

And why us?
Researchers are never so captive as when they are publishing

But we need to help — not just harass.

Page 9

How We Envision Research Publication (Communicating Science)

• Data Sets in GigaDB (Data Publishing Platform)
• Analyses in GigaGalaxy (Data Analysis Platform)
• Paper in GigaScience (Open-access journal)

Page 10

Why have Journal-linked Database?

http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/

Example #1: Direct Data Citation
Encourages data release prior to publication of data analysis article

Page 11

The polar bear DATA were released (prepublication) in 2011

Data were used and cited in at least 5 studies (below):

1. Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.
2. Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.
3. Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
4. Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014;105(3):312-23. doi:10.1093/jhered/est133.
5. Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109.

http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/

Formal Article by data producers on these data was published in 2014 in Cell
(But, data not cited in references…)

http://www.cell.com/cell/abstract/S0092-8674%2814%2900488-7

Page 12

Why have Journal-linked Database?

Example #2: Provide persistent database for data types that have no repository

Page 13

Why Journal-linked Database?
Example #3

• New Sequencing technology
– Oxford Nanopore MinION
• New Sequence Data Type
– EBI and NCBI Databases not ready
• High community interest for testing data
• >100 GB of data
• Uploaded prior to publication
• Deployed on Amazon CloudFront
• Ongoing testing/comparison/information sharing prior to publication
• When ready for the data, EBI used our cloud to upload them
• EBI transferred the data to NCBI when they were ready

Page 14

Why have Journal-linked Database?

Example #4: Provide the specific information (and forms) for accessing data in protected databases (EGA and dbGaP)

What needs to be done and who to contact for permission to use data

Forms needed to access this dataset

Page 15

Beyond Data Availability

Reviewing Data:
Issue: We can’t ask our reviewers to do that!
Our finding: Reviewers don’t mind

Reviewer Dr. Christophe Pouzat on neuroscience manuscript: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!”

Can also use specific Data Reviewers (we do)

Data Curation:
Data availability without metadata is useless

Provide or engage data curators (in your target community or elsewhere.)

Page 16

Thanks to:

Scott Edmunds, Executive Editor

Nicole Nogoy, Commissioning Editor

Peter Li, Lead Data Manager

Chris Hunter, Lead BioCurator

Rob Davidson, Data Scientist

Xiao (Jesse) Si Zhe, Database Developer

Amye Kenall, Journal Development Manager

Contact us:
[email protected]@gigasciencejournal.com

Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog

www.gigasciencejournal.com
www.gigadb.org