Presentation on the changing face of scholarly communication and the interplay between data and the knowledge derived from that data.
iDASH October 18, 2013 1
In the Future Will a Biological Database Really be Different than a Biological Journal?
Philip E. Bourne
I am speaking to you today as someone who:
• Maintains a major biological database – the PDB – used by over 300,000 scientists per month
• Is the Founding Editor in Chief of PLOS Computational Biology
A Question First Posed in August 2005
PLOS Comp Biol 2005 1(3): e34
Here is one reason why the question is important:
0. Full text of PLoS papers is stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
The Paper As Experiment
1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
The answer 8 years ago, as it is now, is:
In principle there is no difference, but the way in which each is perceived is still very different…
Yet progress has been made and we will focus on what we can do to further accelerate change
Why Bother?
Better integration of data and the knowledge derived from it can accelerate discovery and improve the comprehension and dissemination of science
Let's take a step back ...
What got me thinking this way?
Data Are Becoming More Complex: Witness The World Wide Protein Data Bank
• The single worldwide repository for data on the structure of biological macromolecules
• Vital for drug discovery and the life sciences
• 43 years old
• Free to all
http://www.wwpdb.org
The World Wide Protein Data Bank Places High Value on Data
• Paper not published unless data are deposited – strong data to literature correspondence
• Highly structured data conforming to an extensive ontology
• DOIs assigned to every structure
http://www.wwpdb.org
The PLoS Corpus
• Established in 2000
• Identified as high quality publications
• Currently 8 journals with healthy growth
• Open Access – free to all
• PLOS ONE a huge success
Similar Processes Lead to Similar Resources
Journal: Author Submission via the Web → Syntax Checking → Review by Scientists & Editors → Corrections by Author → Publish – Web Accessible
Database: Depositor Submission via the Web → Syntax Checking → Review by Annotators → Corrections by Depositor → Release – Web Accessible
The scientific processes for handling data and publications are not that different, but the end products are perceived very differently
Unfortunately the Metrics of Success Remain…
[Carole Goble]
This makes no sense when you ask yourself the question:
What is more valuable: a dataset used and cited by 100 scientists, or a paper you wrote that only you cite?
Case in point…
What can you do today to change the situation?
Think Globally Act Locally
• Support emergent community commons/portals
• Be involved in the support and development of metadata standards
• Contribute to workflow development etc. to drive an open research lifecycle
• Educate your mentors on the importance of open science and scholarly communication
• Write software thinking of an App model
Pressure Your Institutions to Play a Greater Role
• We need institutional data/knowledge sharing plans
• We need digital universities
• We need data/information scientists to be better recognized by institutions – it's not all about papers – this implies new metrics
Committee on Academic Promotions
• What Counts
– Money
– Grants
– Papers
– Teaching
– Service
• What Does Not
– Sharing data
– Sharing software
– Open access
– Collaboration
– Patents
– Startups
Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia 2011 PLOS Comp Biol 7(1) e1002001
We Need to Bend the Traditional System
The Wikipedia Experiment – Topic Pages
Identify areas of Wikipedia that relate to the journal that are missing or stubs
Develop a Wikipedia page in the sandbox
Have a Topic Page Editor Review the page
Publish the copy of record with associated rewards
Release the living version into Wikipedia
We Need Innovative Contributions to the Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools
Lab Notebooks
Data Capture
Software Repositories
Analysis Tools
Visualization
Scholarly Communication
Commercial & Public Tools
Git-like Resources
By Discipline
Data Journals
Discipline-Based Metadata Standards
Community Portals
Institutional Repositories
New Reward Systems
Commercial Repositories
Training
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Example Interoperability: The Database View
BMC Bioinformatics 2010 11:220
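The database view above is addressed by structure ID, so the link from a paper back to the database can be generated rather than typed by hand. A minimal sketch, assuming the URL pattern shown on this slide (it dates from 2013 and may no longer resolve); the `http://` scheme, the function name `literature_url`, and the ID check are illustrative assumptions, not part of any RCSB API:

```python
from urllib.parse import urlencode

# Path copied from the slide; the "http://" scheme is an assumption.
LITERATURE_VIEW = "http://www.rcsb.org/pdb/explore/literature.do"


def literature_url(pdb_id: str) -> str:
    """Build the literature-view URL for a PDB entry such as 1TIM.

    Hypothetical helper: it only formats a string and makes no network call.
    """
    pdb_id = pdb_id.strip().upper()
    # Assumption: classic PDB IDs are four alphanumeric characters
    # beginning with a digit (for example 1TIM).
    if len(pdb_id) != 4 or not pdb_id[0].isdigit() or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{LITERATURE_VIEW}?{urlencode({'structureId': pdb_id})}"


print(literature_url("1tim"))
```

A journal page could call such a helper for every structure ID mentioned in an article to produce the database/literature links described earlier in the talk.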
This is asking a lot of us, but our job is being made easier by what is going on around us
Open Access to Data and the Literature is no Longer a Curiosity, but Mainstream
Conservative Bodies Are Recognizing Change
• Anyone, anything, anytime
• Publication access, data, models, source codes, resources, transparent methods, standards, formats, identifiers, APIs, licenses, education, policies
• “accessible, intelligible, assessable, reusable”
http://royalsociety.org/policy/projects/science-public-enterprise/report/
[Carole Goble]
Governments Are Recognizing Change
G8 Open Data Charter
http://opensource.com/government/13/7/open-data-charter-g8
Funding Agencies are Changing
Publishing is Changing
• Today:
• Approx 10,000 publishers
• Publishing approx 25,000 journals
• Which publish approx 1.5 million articles per year (almost 1 million of which appear in PubMed)
Witness the ‘Open Access Mega Journal’
1. Very very large
– Publishing thousands of articles per year
– and benefiting from economies of scale
2. Open Access
– Because no one will pay a subscription fee for a journal that large (and growing that fast)
– and using an OA Business Model where each article pays for its own costs
3. (Preferably) without any ‘artificial’ constraints on its ability to grow
– For example, a desire to only publish ‘high impact’ papers
[Pete Binfield]
[Figure: Publications by PLOS ONE per quarter since launch; vertical axis runs from 0 to 3500 publications]
[Pete Binfield]
“Open Access Mega Journals” – One Name, Two Flavours
• ‘Clones’ of PLoS ONE (not selective)
– SAGE Open
– BMJ Open
– Scientific Reports (Nature)
– AIP Advances (Am Inst Physics)
– G3 (Genetics Soc of America)
– Biology Open (Company of Biologists)
• ‘Pseudo-Clones’ of PLoS ONE (probably selective)
– Physical Review X (Am Physical Society)
– Open Biology (Royal Society)
– Cell Reports (Elsevier, Cell Press)
[Pete Binfield]
Attitudes are Changing
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware
Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160
Ince et al., The case for open computer programs, Nature 482, 2012
[Carole Goble]
Flaws Are Becoming More Obvious
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Out of 18 microarray papers, results from 10 could not be reproduced
More retractions: >15X increase in last decade
At the current rate of increase, by 2045 as many papers will be retracted as published
[Carole Goble]
Science is Being Deinstitutionalized
Daniel Hulshizer/Associated Press
In Summary
• Question (2005): In the Future Will a Biological Database Really be Different than a Biological Journal?
• Answer:
– Less different than they were in 2005
– We still have a long way to go to improve science
– Change is accelerating
– What one does on a daily basis as a scholar is very different from when I was in graduate school and it will be very different again