Laurie Goodman at the BMC Roadshow: Transparency in Publishing and Being an Open Scientist

Transparency in Publishing and Being an Open Scientist

Laurie Goodman, PhD

Editor-in-Chief, GigaScience

[email protected]

ORCID ID: 0000-0001-9724-5976


http://orcid.org/0000-0001-9724-5976

Open Access is Only the Beginning

• Open Access serves as a foundation for a specific way of thinking about scientific communication and the role of journals and publishers

• It is a step toward changing how we communicate science — from start to finish

• But it is only one step toward complete transparency in publishing, and in science.

Open Access is One Component of Open Science

What is Open Science?

• Open Access

• Open Data

• Open Source

• Open Notebook

Open Access is One Component of Open Science

What does that mean?

Dan Gezelter explained this quite nicely in a blog post (http://www.openscience.org/blog/?p=269)

It means:

• Having transparency in experimental methodology, observation, and collection of data.

• Providing public availability and reusability of scientific data.

• Providing public accessibility and transparency of scientific communication.

• Using web-based tools to facilitate scientific collaboration.

WHY Be Open?

A Tale of Two Bacteria

1. On May 2, 2011, German doctors reported the first case of an E. coli infection that was accompanied by hemolytic-uremic syndrome.

2. On May 21, 2011, the first death occurred from this bacterium (denoted E. coli O104:H4).

3. On May 26, 2011, cucumbers from Spain were declared the source of the infection, resulting in a revenue loss of 200 million euros per week.

4. On June 3, 2011, BGI completed a draft sequence of E. coli O104:H4 from a sample provided by doctors at the University Medical Centre Hamburg-Eppendorf.

5. On the evening of June 3, 2011, the leaders at BGI and Hamburg-Eppendorf held a discussion about whether to release the sequence data immediately, and what the potential repercussions of doing so would be.

A Tale of Two Bacteria

A main question in this discussion:

If the data were released now, would it affect our ability to publish later?

Will Journal Editors say: “Not enough of an advance over the information already out there.”

Will Journal Editors say, “You have broken an embargo by making this information available to the public and the press.”

Will we be scooped?!?


In World #1

The researchers, who were concerned (rightly so) about their ability to publish, waited. Remember that publication is the way to obtain recognition and grants, which are essential for them to work.

The Result

The first publication appeared on July 29th (~2 months after the first death).


In World #2

The researchers decided public health was more important than obtaining a publication, and released the data immediately.

The Result with regard to Publication

The first publication appeared on July 29th, but it was not from the group who released the data (even though the released data were included in that study).

We live in World #2. Although the data producers were not the first to publish, what followed was exciting and had broad repercussions.

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

These data were put on an FTP server under a CC0 waiver and also given a DOI to make access ‘permanent’

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.


1.3 The power of intelligently open data

The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.

So:

Can we all agree that releasing the E. coli data ahead of publication was ‘good’ (at least from a public health perspective)?

If so, I want to put this case in perspective.

Here are the numbers for the E. coli 2011 outbreak:

In total, ~4000 people were infected and 53 died

Infectious Disease

• Measles: 122,000 per year

• Hepatitis C-related liver disease: 350,000-500,000 per year

• Malaria: 627,000 per year

• HIV/AIDS: 1.4-1.7 million per year

Non-communicable, with genetic predisposition

• Prostate cancer: 307,000 per year

• Breast cancer: 522,000 per year

• Suicide: 800,000 per year

• Diabetes: 1.5 million per year

• Cancer: 8.2 million per year

• Cardiovascular Disease: 17.5 million per year

Non-genetic/Non-infectious

• Pesticide Poisoning: 250,000 per year

• Malnutrition: 2.8 million children (under 5) per year

Data from World Health Organization Fact Sheets http://www.who.int/en/

Then… For all research: From a Public Health perspective…

But…

What’s in it for me??

Beyond Altruism: What Researchers Need

• Recognition: obtained through publication and citation

• Money: grants and promotions

Both are based on how much and where you publish

INCENTIVES For Data Sharing

Data Citation

What we’re doing at GigaScience

1. Requiring all data supporting work to be freely available in a publicly available repository

– How we’re doing this:

• Journal-dedicated data and software repository GigaDB that hosts ALL data types.

• Have a Biocurator(s) to aid in handling Metadata

• All datasets are given a Digital Object Identifier (DOI), making them citable and countable (a reward for making data available: GigaDB is tracked by the Thomson Reuters Data Citation Index)

• All Material in GigaDB is available under a CC0 Waiver

• Data for which a publicly approved database exists must be submitted there as well

• Provide Direct links to all associated information

For data citation to work, it needs:

• Acceptance by journals.

• Data+Citation: inclusion in the references.

• Tracking by citation indexes.

• Usage of the metrics by the community…
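A dataset citation of the kind quoted earlier for the E. coli release can be assembled from a few metadata fields. The following is a minimal Python sketch, not GigaDB's actual tooling: the `DatasetRecord` class, the `format_citation` and `doi_url` helpers, and the three-author cutoff are all illustrative assumptions. The only outside fact relied on is that `https://doi.org/` resolves DOIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical container for the fields a dataset citation needs."""
    authors: tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link via the doi.org resolver."""
    return f"https://doi.org/{doi}"


def format_citation(rec: DatasetRecord, max_authors: int = 3) -> str:
    """Build a reference-list entry; the author cutoff is an assumption."""
    shown = "; ".join(rec.authors[:max_authors])
    if len(rec.authors) > max_authors:
        shown += " et al."
    return (
        f"{shown} ({rec.year}) {rec.title}. {rec.publisher}. "
        f"doi:{rec.doi} {doi_url(rec.doi)}"
    )


# Fields taken from the E. coli dataset citation in this talk
# (author list shortened here for illustration).
EXAMPLE = DatasetRecord(
    authors=("Li, D", "Xi, F", "Zhao, M", "Liang, Y"),
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="10.5524/100001",
)

if __name__ == "__main__":
    print(format_citation(EXAMPLE))
```

Putting the DOI in the reference list, rather than only in the text, is what lets citation indexes count the dataset.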











These data were released at the time of publication

http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/

These data were released THREE YEARS before publication

The polar bear data were released, prepublication, in 2011. They were used and cited in the following studies before the main paper on the sequencing was published:

Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.

Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.

Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.

Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014;105(3):312-23. doi:10.1093/jhered/est133.

Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109


Even though the data had been released three years earlier and cited in other papers, the main analysis paper was published in Cell

Cell Press Journals had indicated publishing a dataset prior to publication could be considered as prior publication






Data publication in databases is now being tracked by this and other tracking resources






Analysis of Data Citation Metrics

With the advent of easily available and collated information on data citation, people are beginning to assess the levels of metrics obtained from different servers and the uses of these data.

http://arxiv.org/abs/1501.03342

Funding

• Funding agencies are now promoting open data release

• Explicitly including a data release plan in your grants can improve your chances of obtaining funding.

• Example: Recent release of information from the Bill and Melinda Gates Foundation http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy

The NIH, the Wellcome Trust, and other funding agencies are also doing this (as well as specifically ignoring the impact factor of publications)

Open Source

What we’re doing at GigaScience

Requiring all software and work to be freely available in a publicly available repository

– How we’re promoting this:

• Journal-Dedicated repository GigaDB that hosts software so it can be downloaded.

• Software and Workflows are provided a DOI making them citable and countable (reward)

• Journal-dedicated Galaxy Platform to run tools

• Have a Data Manager and Data Scientist to wrap and deploy software tools

• All software created by authors must be open-source

Beyond Open Data and Open Source

Being an Open Scientist

USE preprint servers

• If you are not aware of preprint servers, these are places where you can post your paper prior to submission to a journal

• Many journals (more than you think) allow researchers to do this (At GigaScience we recommend it.)

• Two preprint servers that are widely used by the community are bioRxiv (http://biorxiv.org/) for biology papers and arXiv (http://arxiv.org) for more mathematical or physics-based papers (but any research paper can go there as well)

• Editors are now looking at preprint servers and contacting authors of papers they are interested in!


Don’t be an Anonymous Peer Reviewer!

• There is an increasing trend toward open peer review, which includes publishing the reviewer’s name and giving access to the reviews after publication

• Open peer review greatly expands the transparency of the publication process

• We have found that open peer review is more constructive and less antagonistic than anonymous peer review.

What we’re doing at GigaScience: Peer Review

• Reviews are signed (Other Journals are doing this)

– Currently Opt-Out. Planning to make it mandatory

• All Reviews (and all pre-publication history) are available upon publication.

Take the Reviewer’s Oath

• Dr. Mick Watson published a blog post putting forth the idea of reviewers taking an oath on how they will carry out peer review: https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath

• The Open Science Peer Review Oath (Aleksic et al) F1000 Research http://f1000research.com/articles/3-271/v1

• Review by Jonathan Eisen http://icis.ucdavis.edu/?p=505


Giving Reviewers Credit

• We and other journals are starting to give DOIs for reviews that are open and named

– We are doing this because we were asked by researchers and teachers how they could cite a peer review they found of value.

• We are using a company called Publons, https://publons.com, which hosts reviews under the reviewers’ names, where they can be tracked, read, and given credit.

• For every review we post in Publons, we mint a DOI so that these can be cited and tracked.

• Publons currently has ~35,000 registered reviewers and ~86,000 reviews from ~5,700 journals


Open Peer Review as a New Reviewer’s Tool

• Since many of you are beginning to be peer-reviewers: Open Reviews are an excellent learning tool!

• Go to any journal with open peer review and read early versions of papers and reviews.

• Learn the names of reviewers whose reviews you respect, and follow their reviews on Publons.

• Register in Publons for when you start to openly review. You can post your own reviews (or some journals, like GigaScience, post them for you.)

Being an Open Scientist and Tenure

• It is possible! (And more and more probable)

• Example: Dr. C. Titus Brown

• From his blog post “On Gaining Tenure as an Open Scientist”, written after being awarded tenure at UC Davis:

– I blog and tweet about our research.

– All my senior-author papers are open access and were posted as preprints.

– I post all of my single-author grants openly, as soon as I submit them.

– All of our source code is openly available on github and most of our papers are written in public on github.

– I sign almost all of my paper reviews and post many of them (the ones that I remember to post ;) on my blog.

• http://ivory.idyll.org/blog/2014-open-and-tenured.html

Titus’ Take Home Message: “It is possible to achieve some measure of traditional success while being open. Grants; publications; tenure. 'nuff said.”

http://ivory.idyll.org/blog/

https://twitter.com/ctitusbrown

http://scholar.google.com/citations?hl=en&user=O4rYanMAAAAJ&view_op=list_works&sortby=pubdate

http://ged.msu.edu/research.html

https://github.com/ged-lab/

http://ivory.idyll.org/blog/tag/reviews.html

Current Scientific Communication

Publication

INFORMATION!!!

BIG PRESS RELEASE: ‘We found the Cure!!!’

Share Data/Tools/Methods

Punctuated Science

Research

Preprints

(Ask!)

Scientific Conferences

Collaborations

Shhhh…

Lab discussions

The End(ish)

Scientific Publication as part of a Continuum of Scientific Communication

[Diagram labels: Conferences, Preprints, Press Interviews, Public Discourse, Twitter, Interactive web tools, Collaboration, Blogs, Publication, Sharing Data, Sharing Tools, Education]

The Continuum of Scientific Communication

Publication should be just a stage of research where one component of the process has been formally vetted and is available within an easily accessible and condensed format

Publication

Promote Real-Time Science

Finally, consider… For every DAY you wait to release information:

Infectious Disease

• Measles: 334 per DAY

• Hepatitis C-related liver disease: 959-1,370 per DAY

• Malaria: 1,718 per DAY

• HIV/AIDS: 3,836-4,658 per DAY

Non-communicable, with genetic predisposition

• Prostate cancer: 841 per DAY

• Breast cancer: 1,430 per DAY

• Suicide: 2,192 per DAY

• Diabetes: 4,110 per DAY

• Cancer: 22,466 per DAY

• Cardiovascular Disease: 47,945 per DAY

Non-genetic/Non-infectious

• Pesticide Poisoning: 685 per DAY

• Malnutrition: 7,671 children (under 5) per DAY

Data from World Health Organization Fact Sheets http://www.who.int/en/
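The per-day figures come from dividing each annual figure on the earlier slide by the number of days in a year. The following is a minimal Python sketch of that arithmetic, assuming a plain 365-day year and rounding to the nearest whole number. The names `ANNUAL_FIGURES`, `per_day`, `daily_table`, and `format_row` are illustrative and not part of the talk.

```python
DAYS_PER_YEAR = 365  # assumption: a plain 365-day year, leap years ignored

# Annual figures as quoted on the WHO-based slide. A single value is stored
# as a (low, high) pair with low == high so points and ranges share one shape.
ANNUAL_FIGURES = {
    "Measles": (122_000, 122_000),
    "Hepatitis C-related liver disease": (350_000, 500_000),
    "Malaria": (627_000, 627_000),
    "HIV/AIDS": (1_400_000, 1_700_000),
    "Prostate cancer": (307_000, 307_000),
    "Breast cancer": (522_000, 522_000),
    "Suicide": (800_000, 800_000),
    "Diabetes": (1_500_000, 1_500_000),
    "Cancer": (8_200_000, 8_200_000),
    "Cardiovascular disease": (17_500_000, 17_500_000),
    "Pesticide poisoning": (250_000, 250_000),
    "Malnutrition (children under 5)": (2_800_000, 2_800_000),
}


def per_day(annual: float) -> int:
    """Convert an annual count to a daily count, rounded to the nearest whole number."""
    return round(annual / DAYS_PER_YEAR)


def daily_table(annual: dict) -> dict:
    """Apply per_day to both ends of every (low, high) pair."""
    return {name: (per_day(lo), per_day(hi)) for name, (lo, hi) in annual.items()}


def format_row(name: str, low: int, high: int) -> str:
    """Render one row, collapsing a range whose ends are equal."""
    if low == high:
        return f"{name}: {low:,} per day"
    return f"{name}: {low:,}-{high:,} per day"


if __name__ == "__main__":
    for name, (low, high) in daily_table(ANNUAL_FIGURES).items():
        print(format_row(name, low, high))
```

Recomputing the table this way makes unit slips easy to catch, since every daily value must be roughly 1/365 of its annual counterpart.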

Several Blogs Worth Reading

• The Future of Science by Michael Nielsen

– http://michaelnielsen.org/blog/the-future-of-science-2

• The Reviewers Oath by Mick Watson

– https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath/

• How to Peer Review by Arjun Raj

– http://rajlaboratory.blogspot.com/2014/04/how-to-review-paper.html

• On Gaining Tenure as an Open Scientist by C. Titus Brown

– http://ivory.idyll.org/blog/2014-open-and-tenured.html

Thanks to:

Scott Edmunds, Executive Editor

Nicole Nogoy, Commissioning Editor

Peter Li, Lead Data Manager

Chris Hunter, Lead BioCurator

Rob Davidson, Data Scientist

Xiao (Jesse) Si Zhe, Database Developer

Amye Kenall, Journal Development Manager

Contact us:

[email protected]@gigasciencejournal.com

Follow us:

@GigaScience

facebook.com/GigaScience

blogs.openaccesscentral.com/blogs/gigablog

www.gigasciencejournal.com

www.gigadb.org