Transparency in Publishing and Being an Open Scientist
Laurie Goodman, PhD
Editor-in-Chief, GigaScience
ORCID ID: 0000-0001-9724-5976
Open Access is Only the Beginning
• Open Access serves as a foundation for a specific way of thinking about scientific communication and the role of journals and publishers
• It is a step toward changing how we communicate science — from start to finish
• But it is only one step toward complete transparency in publishing— and in science.
Open Access is One Component of Open Science
What is Open Science?
• Open Access
• Open Data
• Open Source
• Open Notebook
What does that mean?
Dan Gezelter explained this quite nicely in a blog post (http://www.openscience.org/blog/?p=269)
It means:
• Having transparency in experimental methodology, observation, and collection of data.
• Providing public availability and reusability of scientific data.
• Providing public accessibility and transparency of scientific communication.
• Using web-based tools to facilitate scientific collaboration.
WHY Be Open?
A Tale of Two Bacteria
1. On May 2, 2011, German doctors reported the first case of an E. coli infection accompanied by hemolytic-uremic syndrome.
2. On May 21, 2011, the first death occurred from this bacterium (denoted E. coli O104:H4).
3. On May 26, 2011, cucumbers from Spain were declared the source of the infection, resulting in a revenue loss of 200 million euros per week.
4. On June 3, 2011, BGI completed a draft sequence of E. coli O104:H4 from a sample provided by doctors at the University Medical Centre Hamburg-Eppendorf.
5. On the evening of June 3, 2011, the leaders at BGI and Hamburg-Eppendorf discussed whether to release the sequence data immediately, and what the potential repercussions of doing so would be.
A main question in this discussion:
If the data were released now, would it affect our ability to publish later?
Will journal editors say, “Not enough of an advance over the information already out there”?
Will journal editors say, “You have broken an embargo by making this information available to the public and the press”?
Will we be scooped?!?
In World #1
The researchers, who were rightly concerned about their ability to publish (remember, this is how they obtain recognition and grants, which are essential for them to work), waited.
The Result
The first publication appeared on July 29th (~2 months after the first death).
In World #2
The researchers decided public health was more important than obtaining a publication and released the data immediately.
The Result with regard to Publication
The first publication appeared on July 29th, but it was not from the group who released the data (even though the released data were included in that study).
We live in World 2. Although the data producers were not the first to publish, what followed was exciting and had broad repercussions.
To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001
These data were put on an FTP server under a CC0 waiver and also given a DOI to make access ‘permanent’
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
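As a small illustration of why a DOI makes access ‘permanent’: the identifier stays fixed while a resolver redirects to wherever the data currently live. The sketch below only builds the resolver link from the DOI string quoted in the citation above. The helper name `doi_to_url` is hypothetical, and the resolver prefix is simply the one shown on the slide.

```python
def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Turn a DOI string such as 'doi:10.5524/100001' into a resolver URL.

    Hypothetical helper for illustration; the default resolver prefix is
    the one that appears in the dataset citation on the slide.
    """
    cleaned = doi.strip()
    # Drop an optional 'doi:' prefix, as used in the citation text.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # DOIs always begin with the directory indicator '10.'.
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + cleaned


if __name__ == "__main__":
    print(doi_to_url("doi:10.5524/100001"))
```

The same DOI string can go in a reference list and be counted by citation indexes, regardless of which server hosts the files.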
1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
So:
Can we all agree that releasing the E. coli data ahead of publication was ‘good’ (at least from a public health perspective)?
If so, I want to put this case in perspective.
Here are the numbers for the 2011 E. coli outbreak: in total, ~4000 people were infected and 53 died.
Infectious Disease
• Measles: 122,000 per year
• Hepatitis C-related liver disease: 350,000-500,000 per year
• Malaria: 627,000 per year
• HIV/AIDS: 1.4-1.7 million per year
Non-communicable, with genetic predisposition
• Prostate cancer: 307,000 per year
• Breast cancer: 522,000 per year
• Suicide: 800,000 per year
• Diabetes: 1.5 million per year
• Cancer: 8.2 million per year
• Cardiovascular Disease: 17.5 million per year
Non-genetic/Non-infectious
• Pesticide Poisoning: 250,000 per year
• Malnutrition: 2.8 million children (under 5) per year
Data from World Health Organization Fact Sheets http://www.who.int/en/
Then… For all research: From a Public Health perspective…
But…
What’s in it for me??
Beyond Altruism: What Researchers Need
• Recognition:
– Obtained through Publication and Citation
• Money:
– Grants
– Promotions
Both are based on how much and where you publish
INCENTIVES For Data Sharing
Data Citation
What we’re doing at GigaScience
1. Requiring all data supporting work to be freely available in a publicly available repository
– How we’re promoting this:
• Journal-dedicated data and software repository GigaDB that hosts ALL data types.
• Have biocurator(s) to aid in handling metadata
• All datasets are given a Digital Object Identifier (DOI), making them citable and countable (a reward for making data available: GigaDB is tracked by the Thomson Reuters Data Citation Index)
• All material in GigaDB is available under a CC0 waiver
• Data that have a publicly approved database must be submitted there as well
• Provide direct links to all associated information
For data citation to work, it needs:
• Acceptance by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
These data were released at the time of publication
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
These data were released THREE YEARS before publication
The polar bear DATA were released prepublication in 2011. They were used and cited in the following studies before the main paper on the sequencing was published:
Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.
Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.
Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014;105(3):312-23. doi:10.1093/jhered/est133.
Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109
Even though the data had been released three years earlier and cited in other papers, the main analysis paper was published in Cell
Cell Press Journals had indicated publishing a dataset prior to publication could be considered as prior publication
Data publication in databases is now being tracked by this and other tracking resources
Analysis of Data Citation Metrics
With the advent of easily available and collated information on data citation, people are beginning to assess the metrics obtained from different servers and how these data are used.
http://arxiv.org/abs/1501.03342
Funding
• Funding agencies are now promoting open data release
• Explicitly including a data release plan in your grants can improve your chances of obtaining funding.
• Example: Recent release of information from the Bill and Melinda Gates Foundation http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy
The NIH, the Wellcome Trust, and other funding agencies are also doing this (as well as specifically ignoring the impact factor of publications)
Open Source
What we’re doing at GigaScience
Requiring all software and work to be freely available in a publicly available repository
– How we’re promoting this:
• Journal-Dedicated repository GigaDB that hosts software so it can be downloaded.
• Software and Workflows are provided a DOI making them citable and countable (reward)
• Journal-dedicated Galaxy Platform to run tools
• Have a Data Manager and Data Scientist to wrap and deploy software tools
• All software created by authors must be open-source
Beyond Open Data and Open Source
Being an Open Scientist
USE preprint servers
• If you are not aware of preprint servers, these are places where you can post your paper prior to submission to a journal
• Many journals (more than you think) allow researchers to do this (At GigaScience we recommend it.)
• Two preprint servers that are widely used by the community are bioRxiv (http://biorxiv.org/) for biology papers and arXiv (http://arxiv.org) for more mathematical or physics-based papers (but any research paper can go here as well)
• Editors are now looking at preprint servers and contacting authors of papers they are interested in!
Don’t be an Anonymous Peer Reviewer!
• There is an increasing trend toward open peer review, which includes the reviewer’s name and post-publication access to the reviews
• Open peer review greatly expands the transparency of the publication process
• We have found that open peer review is more constructive and less antagonistic than anonymous peer review.
What we’re doing at GigaScience: Peer Review
• Reviews are signed (Other Journals are doing this)
– Currently Opt-Out. Planning to make it mandatory
• All Reviews (and all pre-publication history) are available upon publication.
Take the Reviewer’s Oath
• Dr. Mick Watson published a blog post putting forth the idea of reviewers taking an oath on how they will carry out peer review: https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath/
• The Open Science Peer Review Oath (Aleksic et al.) F1000 Research http://f1000research.com/articles/3-271/v1
• Review by Jonathan Eisen http://icis.ucdavis.edu/?p=505
Giving Reviewers Credit
• We and other journals are starting to give DOIs for reviews that are open and named
– We are doing this because we were asked by researchers and teachers how they could cite a peer review they found of value.
• We are using a company called Publons, https://publons.com, which hosts reviews under the reviewers’ names, where they can be tracked, read, and given credit.
• For every review we post in Publons, we mint a DOI so that these can be cited and tracked.
• Publons currently has ~35,000 registered reviewers and ~86,000 reviews from ~5,700 journals
Open Peer Review as a New Reviewer’s Tool
• Since many of you are beginning to be peer-reviewers: Open Reviews are an excellent learning tool!
• Go to any journal with open peer review and read early versions of papers and reviews.
• Learn the names of reviewers whose reviews you respect, and follow their reviews on Publons.
• Register in Publons for when you start to openly review. You can post your own reviews (or some journals, like GigaScience, post them for you.)
Being an Open Scientist and Tenure
• It is possible! (And more and more probable)
• Example: Dr. C. Titus Brown
• From his blog: “On Gaining Tenure as an Open Scientist”, written after being awarded tenure at UC Davis.
– I blog and tweet about our research.
– All my senior-author papers are open access and were posted as preprints.
– I post all of my single-author grants openly, as soon as I submit them.
– All of our source code is openly available on github and most of our papers are written in public on github.
– I sign almost all of my paper reviews and post many of them (the ones that I remember to post ;) on my blog.
• http://ivory.idyll.org/blog/2014-open-and-tenured.html
Titus’ Take Home Message: “It is possible to achieve some measure of traditional success while being open. Grants; publications; tenure. 'nuff said.”
Current Scientific Communication: Punctuated Science
[Diagram labels: Research; Lab discussions; Shhhh…; Collaborations; Scientific Conferences; Preprints (Ask!); Publication; INFORMATION!!!; BIG PRESS RELEASE ‘We found the Cure!!!’; Share Data/Tools/Methods; The End(ish)]
Scientific Publication as part of a Continuum of Scientific Communication
[Diagram labels: Conferences; Preprints; Press Interviews; Public Discourse; Interactive web tools; Collaboration; Blogs; Publication; Sharing Data; Sharing Tools; Education]
The Continuum of Scientific Communication
Publication should be just a stage of research where one component of the process has been formally vetted and is available within an easily accessible and condensed format
Promote Real-Time Science
Finally, consider… For every DAY you wait to release information:
Infectious Disease
• Measles: 334 per DAY
• Hepatitis C-related liver disease: 959-1,370 per DAY
• Malaria: 1,718 per DAY
• HIV/AIDS: 3,836-4,658 per DAY
Non-communicable, with genetic predisposition
• Prostate cancer: 841 per DAY
• Breast cancer: 1,430 per DAY
• Suicide: 2,192 per DAY
• Diabetes: 4,110 per DAY
• Cancer: 22,466 per DAY
• Cardiovascular Disease: 47,945 per DAY
Non-genetic/Non-infectious
• Pesticide Poisoning: 685 per DAY
• Malnutrition: 7,671 children (under 5) per DAY
Data from World Health Organization Fact Sheets http://www.who.int/en/
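The per-day figures are the annual figures from the earlier slide divided by 365 and rounded. A minimal sketch of that arithmetic, using a few of the annual numbers quoted in this talk (the names `annual` and `per_day` are illustrative only):

```python
DAYS_PER_YEAR = 365

# Annual figures as quoted on the earlier slide (WHO fact sheets).
annual = {
    "Measles": 122_000,
    "Malaria": 627_000,
    "Diabetes": 1_500_000,
    "Cancer": 8_200_000,
    "Cardiovascular disease": 17_500_000,
}


def per_day(per_year: int) -> int:
    """Convert an annual count to a rounded daily count."""
    return round(per_year / DAYS_PER_YEAR)


daily = {name: per_day(count) for name, count in annual.items()}

if __name__ == "__main__":
    for name, count in daily.items():
        print(f"{name}: {count:,} per day")
```

The same division applies to the range endpoints, for example 1.4 to 1.7 million per year for HIV/AIDS.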
Several Blogs Worth Reading
• The Future of Science by Michael Nielsen
– http://michaelnielsen.org/blog/the-future-of-science-2
• The Reviewers Oath by Mick Watson
– https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath/
• How to Peer Review by Arjun Raj
– http://rajlaboratory.blogspot.com/2014/04/how-to-review-paper.html
• On Gaining Tenure as an Open Scientist by C. Titus Brown
– http://ivory.idyll.org/blog/2014-open-and-tenured.html
Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
[email protected]@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
www.gigasciencejournal.com
www.gigadb.org