
Laurie Goodman at the BMC Roadshow: Transparency in Publishing and Being an Open Scientist


Transparency in Publishing and Being an Open Scientist

Laurie Goodman, PhD

Editor-in-Chief, GigaScience

[email protected]

ORCID ID: 0000-0001-9724-5976


Open Access is Only the Beginning

• Open Access serves as a foundation for a specific way of thinking about scientific communication and the role of journals and publishers

• It is a step toward changing how we communicate science — from start to finish

• But it is only one step toward complete transparency in publishing, and in science.


Open Access is One Component of Open Science

What is Open Science?

• Open Access

• Open Data

• Open Source

• Open Notebook


Open Access is One Component of Open Science

What does that mean?

Dan Gezelter explained this quite nicely in a blog post (http://www.openscience.org/blog/?p=269).

It means:

• Having transparency in experimental methodology, observation, and collection of data.

• Providing public availability and reusability of scientific data.

• Providing public accessibility and transparency of scientific communication.

• Using web-based tools to facilitate scientific collaboration.


WHY Be Open?


A Tale of Two Bacteria

1. On May 2, 2011, German doctors reported the first case of an E. coli infection accompanied by hemolytic-uremic syndrome.

2. On May 21, 2011, the first death from this bacterium (denoted E. coli O104:H4) occurred.

3. On May 26, 2011, cucumbers from Spain were declared the source of the infection, resulting in a revenue loss of 200 million euros per week.

4. On June 3, 2011, BGI completed a draft sequence of E. coli O104:H4 from a sample provided by doctors at the University Medical Centre Hamburg-Eppendorf.

5. On the evening of June 3, 2011, the leaders at BGI and Hamburg-Eppendorf discussed whether to release the sequence data immediately, and what the potential repercussions of doing so would be.


A Tale of Two Bacteria

A main question in this discussion: if the data were released now, would it affect our ability to publish later?

Will journal editors say, “Not enough of an advance over the information already out there”?

Will journal editors say, “You have broken an embargo by making this information available to the public and the press”?

Will we be scooped?!?


A Tale of Two Bacteria

In World #1: The researchers, rightly concerned about their ability to publish (remember, publication is how they obtain recognition and grants, which are essential for their work), waited.

The result: The first publication appeared on July 29th (~2 months after the first death).
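The "~2 months" interval can be checked with simple date arithmetic. A minimal Python sketch using only the standard library's `datetime`; the dates come from the outbreak timeline, the draft-genome date is assumed to fall in 2011 (the outbreak year), and the helper name `days_between` is hypothetical.

```python
from datetime import date

# Dates from the outbreak timeline (all assumed to be in 2011).
FIRST_DEATH = date(2011, 5, 21)
DRAFT_GENOME = date(2011, 6, 3)
FIRST_PUBLICATION = date(2011, 7, 29)

AVG_DAYS_PER_MONTH = 30.4  # approximate average month length (365 / 12)


def days_between(start: date, end: date) -> int:
    """Return the number of calendar days from start to end."""
    return (end - start).days


if __name__ == "__main__":
    gap = days_between(FIRST_DEATH, FIRST_PUBLICATION)
    print(f"First death to first publication: {gap} days "
          f"(~{gap / AVG_DAYS_PER_MONTH:.1f} months)")
    print("Draft genome to first publication: "
          f"{days_between(DRAFT_GENOME, FIRST_PUBLICATION)} days")
```

This reports 69 days (about 2.3 months) from the first death to the first publication, and 56 days from the draft genome to that publication.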


A Tale of Two Bacteria

In World #2: The researchers decided public health was more important than obtaining a publication, and released the data immediately.

The result with regard to publication: The first publication appeared on July 29th, but it was not from the group who released the data (even though the released data were included in that study).

We live in World #2. Although the data producers were not the first to publish, what followed was exciting and had broad repercussions.


To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

These data were put on an FTP server under a CC0 waiver and were also given a DOI to make access ‘permanent’.

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
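Dataset citations like the one above follow a regular pattern: authors, year, title, publisher, and a DOI that also resolves as a URL. The Python sketch below assembles such a string from its parts. The function name `format_dataset_citation` and the simplified DOI check are illustrative assumptions (not a GigaDB or DataCite tool), and the layout simply mirrors the example citation on this slide.

```python
import re

# Simplified DOI pattern: "10." + registrant code + "/" + suffix.
# Real DOIs can be more varied; treat this as a sanity check, not full validation.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def format_dataset_citation(authors, year, title, publisher, doi,
                            resolver="http://dx.doi.org/"):
    """Build a citation string in the style of the example above:
    Authors (Year) Title. Publisher. doi:DOI <resolver URL>
    """
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    author_list = "; ".join(authors)
    return f"{author_list} ({year}) {title}. {publisher}. doi:{doi} {resolver}{doi}"


if __name__ == "__main__":
    print(format_dataset_citation(
        ["Li, D", "Xi, F", "Zhao, M"],  # truncated author list, for illustration only
        2011,
        "Genomic data from Escherichia coli O104:H4 isolate TY-2482",
        "BGI Shenzhen",
        "10.5524/100001",
    ))
```

The DOI is the key field: the same identifier yields both the citable string and a resolvable link, which is what makes a dataset citable and countable.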


1.3 The power of intelligently open data

The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.


So: can we all agree that releasing the E. coli data ahead of publication was ‘good’ (at least from a public health perspective)?

If so, I want to put this case in perspective.

Here are the numbers for the 2011 E. coli outbreak: in total, ~4,000 people were infected and 53 died.


Then… for all research, from a public health perspective:

Infectious disease
• Measles: 122,000 per year
• Hepatitis C-related liver disease: 350,000-500,000 per year
• Malaria: 627,000 per year
• HIV/AIDS: 1.4-1.7 million per year

Non-communicable, with genetic predisposition
• Prostate cancer: 307,000 per year
• Breast cancer: 522,000 per year
• Suicide: 800,000 per year
• Diabetes: 1.5 million per year
• Cancer: 8.2 million per year
• Cardiovascular disease: 17.5 million per year

Non-genetic/non-infectious
• Pesticide poisoning: 250,000 per year
• Malnutrition: 2.8 million children (under 5) per year

Data from World Health Organization Fact Sheets: http://www.who.int/en/


But…

What’s in it for me??


Beyond Altruism: What Researchers Need

• Recognition:
– Obtained through publication and citation

• Money:
– Grants
– Promotions

Both are based on how much and where you publish.


INCENTIVES for Data Sharing

Data Citation


What we’re doing at GigaScience

1. Requiring all data supporting the work to be freely available in a publicly accessible repository

– How we’re doing this:

• A journal-dedicated data and software repository, GigaDB, that hosts ALL data types.

• Biocurator(s) to help handle metadata.

• All datasets are given a Digital Object Identifier (DOI), making them citable and countable (a reward for making data available; GigaDB is tracked by the Thomson Reuters Data Citation Index).

• All material in GigaDB is available under a CC0 waiver.

• Data for which an approved public database exists must also be submitted there.

• Direct links to all associated information are provided.


For data citation to work, it needs:

• Acceptance by journals.

• Data+Citation: inclusion in the references.

• Tracking by citation indexes.

• Usage of the metrics by the community…


These data were released at the time of publication


http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/

These data were released THREE YEARS before publication


The polar bear data were released pre-publication in 2011. They were used and cited in the following studies before the main paper on the sequencing was published:

Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.

Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.

Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.

Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014;105(3):312-23. doi:10.1093/jhered/est133.

Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109.

http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/


Even though the data had been released three years earlier and cited in other papers, the main analysis paper was published in Cell.


Cell Press journals had indicated that releasing a dataset before publication could be considered prior publication.


Data publication in databases is now being tracked by this and other tracking resources


Analysis of Data Citation Metrics

With the advent of easily available and collated information on data citation, people are beginning to assess the levels of citation metrics obtained from different servers and how these data are used.

http://arxiv.org/abs/1501.03342


Funding

• Funding agencies are now promoting open data release.

• Explicitly including a data release plan in your grants can improve your chances of obtaining funding.

• Example: the recently released open access policy of the Bill and Melinda Gates Foundation: http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy

The NIH, the Wellcome Trust, and other funding agencies are also doing this (and are specifically disregarding the impact factor of publications).


Open Source


What we’re doing at GigaScience

Requiring all software and workflows to be freely available in a publicly accessible repository

– How we’re promoting this:

• The journal-dedicated repository GigaDB hosts software so it can be downloaded.

• Software and workflows are given a DOI, making them citable and countable (a reward).

• A journal-dedicated Galaxy platform to run tools.

• A data manager and a data scientist to wrap and deploy software tools.

• All software created by authors must be open source.


Beyond Open Data and Open Source

Being an Open Scientist


USE preprint servers

• If you are not aware of preprint servers, these are places where you can post your paper prior to submission to a journal.

• Many journals (more than you think) allow researchers to do this (at GigaScience, we recommend it).

• Two preprint servers widely used by the community are bioRxiv (http://biorxiv.org/) for biology papers and arXiv (http://arxiv.org) for more mathematical/physics-based papers (but any research paper can go there as well).

• Editors are now looking at preprint servers and contacting authors of papers they are interested in!



Don’t be an Anonymous Peer Reviewer!

• There is an increasing trend toward open peer review, which includes the reviewer’s name and, after publication, access to the reviews.

• Open peer review greatly expands the transparency of the publication process.

• We have found that open peer review is more constructive and less antagonistic than anonymous peer review.


What we’re doing at GigaScience: Peer Review

• Reviews are signed (other journals are doing this too)

– Currently opt-out; we are planning to make it mandatory

• All Reviews (and all pre-publication history) are available upon publication.


Take the Reviewer’s Oath

• Dr. Mick Watson published a blog post putting forth the idea of reviewers taking an oath on how they will carry out peer review: https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath

• The Open Science Peer Review Oath (Aleksic et al.), F1000Research: http://f1000research.com/articles/3-271/v1

• Review by Jonathan Eisen http://icis.ucdavis.edu/?p=505


Giving Reviewers Credit

• We and other journals are starting to give DOIs to reviews that are open and named.

– We are doing this because researchers and teachers asked us how they could cite a peer review they found valuable.

• We are using a company called Publons (https://publons.com), which hosts reviews under the reviewers’ names, where they can be read, tracked, and credited.

• For every review we post in Publons, we mint a DOI so that these can be cited and tracked.

• Publons currently has ~35,000 registered reviewers and ~86,000 reviews from ~5,700 journals


Open Peer Review as a New Reviewer’s Tool

• Since many of you are beginning to be peer-reviewers: Open Reviews are an excellent learning tool!

• Go to any journal with open peer review and read early versions of papers and reviews.

• Learn the names of reviewers whose reviews you respect, and follow their reviews on Publons.

• Register with Publons for when you start to review openly. You can post your own reviews (or, at some journals such as GigaScience, the journal posts them for you).


Being an Open Scientist and Tenure

• It is possible! (And more and more probable.)

• Example: Dr. C. Titus Brown

• From his blog post “On Gaining Tenure as an Open Scientist”, written after being awarded tenure at UC Davis:
– I blog and tweet about our research.
– All my senior-author papers are open access and were posted as preprints.
– I post all of my single-author grants openly, as soon as I submit them.
– All of our source code is openly available on github and most of our papers are written in public on github.
– I sign almost all of my paper reviews and post many of them (the ones that I remember to post ;) on my blog.

• http://ivory.idyll.org/blog/2014-open-and-tenured.html

Titus’ Take Home Message: “It is possible to achieve some measure of traditional success while being open. Grants; publications; tenure. 'nuff said.”


Current Scientific Communication

[Figure: "Punctuated Science". Lab discussions, collaborations, and scientific conferences ("Shhhh…"); research; preprints ("Ask!"); publication ("INFORMATION!!!") with a BIG PRESS RELEASE ("We found the Cure!!!"); share data/tools/methods; "The End(ish)".]


Scientific Publication as part of a Continuum of Scientific Communication

[Figure: communication channels arranged along a continuum: conferences, preprints, press interviews, public discourse, Twitter, interactive web tools, collaboration, blogs, publication, sharing data, sharing tools, education.]


The Continuum of Scientific Communication

Publication should be just one stage of research: the point where one component of the process has been formally vetted and is available in an easily accessible, condensed format.

Promote Real-Time Science


Finally, consider… for every DAY you wait to release information:

Infectious disease
• Measles: 334 per DAY
• Hepatitis C-related liver disease: 959-1,370 per DAY
• Malaria: 1,718 per DAY
• HIV/AIDS: 3,836-4,658 per DAY

Non-communicable, with genetic predisposition
• Prostate cancer: 841 per DAY
• Breast cancer: 1,430 per DAY
• Suicide: 2,192 per DAY
• Diabetes: 4,110 per DAY
• Cancer: 22,466 per DAY
• Cardiovascular disease: 47,945 per DAY

Non-genetic/non-infectious
• Pesticide poisoning: 685 per DAY
• Malnutrition: 7,671 children (under 5) per DAY

Data from World Health Organization Fact Sheets: http://www.who.int/en/
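The per-day figures are the annual WHO figures divided by 365. A minimal Python sketch of that conversion, assuming a 365-day year and rounding to the nearest whole number (this reproduces the values on the slide); `per_day` and the `ANNUAL` table are illustrative names, and only a subset of the listed causes is included.

```python
DAYS_PER_YEAR = 365  # ignores leap years; dividing by 365 matches the slide's figures


def per_day(annual: float) -> int:
    """Average daily count for an annual total, rounded to the nearest whole number."""
    return round(annual / DAYS_PER_YEAR)


# Annual WHO figures from the earlier slide; ranges are given as (low, high).
ANNUAL = {
    "Measles": 122_000,
    "Hepatitis C-related liver disease": (350_000, 500_000),
    "Malaria": 627_000,
    "HIV/AIDS": (1_400_000, 1_700_000),
    "Cardiovascular disease": 17_500_000,
}

if __name__ == "__main__":
    for cause, value in ANNUAL.items():
        if isinstance(value, tuple):
            low, high = value
            print(f"{cause}: {per_day(low):,}-{per_day(high):,} per day")
        else:
            print(f"{cause}: {per_day(value):,} per day")
```

For HIV/AIDS, for example, the range of 1.4-1.7 million per year works out to roughly 3,836-4,658 per day.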


Several Blogs Worth Reading

• The Future of Science by Michael Nielsen– http://michaelnielsen.org/blog/the-future-of-science-2

• The Reviewer’s Oath by Mick Watson– https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath/

• How to Peer Review by Arjun Raj– http://rajlaboratory.blogspot.com/2014/04/how-to-review-paper.html

• On Gaining Tenure as an Open Scientist by C. Titus Brown– http://ivory.idyll.org/blog/2014-open-and-tenured.html


Thanks to:

Scott Edmunds, Executive Editor

Nicole Nogoy, Commissioning Editor

Peter Li, Lead Data Manager

Chris Hunter, Lead BioCurator

Rob Davidson, Data Scientist

Xiao (Jesse) Si Zhe, Database Developer

Amye Kenall, Journal Development Manager

[email protected]@gigasciencejournal.com

@GigaScience

facebook.com/GigaScience

blogs.openaccesscentral.com/blogs/gigablog

Contact us:

Follow us:

www.gigasciencejournal.com

www.gigadb.org