UCSD Libraries
Why is Scholarly Communication Broken and What Can Be Done?
In Celebration of Open Access Week
Philip E. Bourne, University of California San Diego
Oct. 18, 2010
Disclaimer
• I am a domain (life) scientist not a computer or information scientist
• I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground
• I am part of the long tail
• I am naïve, but I am the majority
Agenda
• Motivation
• What needs to be done?
• A few examples
• The role of the institution
The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal
Motivation
http://knol.google.com/k/plos-currents-influenza#
By the time the paper is published we could all be dead
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010; structures shown: 1RUZ (1918 H1 hemagglutinin) and 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir)]
In a time of crisis, the need for fast access to accurate data and any knowledge of that data is paramount
If that is not enough…
For some people the scientific process may be too slow to save their life
Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
Chordoma
• A rare form of brain cancer
• No known drugs
• Treatment – surgical resection followed by intense radiation therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
Adapted: http://sagecongress.org/Presentations/Sommer.pdf
If I have seen further it is only by standing on the shoulders of giants
Isaac Newton
From Josh’s point of view the climb up just takes too long
> 15 years and > $850M to be more precise
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
Now that we are all, hopefully, motivated, let us break this down into what, in my opinion, actually needs to be done
Here are a few big things …
What Needs to be Done?
A Few Things to Accelerate the Rate of Scientific Discovery
• Better communication, data and knowledge access, and new modes of discovery, which means:
– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Scale alongside the list: Easy to Hard]
We Need Data and Knowledge About That Data to Interoperate
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on content
2. Metadata and web services to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
PLoS Comp. Biol. 2005 1(3) e34
We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
• Governance – publishers vs. database providers
• Reward
• Metadata standards for provenance, privacy etc.
• Exemplars
• ….
Caveat: Each discipline is different – I speak very much from a biomedical sciences perspective
Certainly the Argument for Interoperability in the Biomedical Sciences is Strong
• PubMed contains 18,792,257 entries
• ~100,000 papers indexed per month
• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed
• 1078 databases reported in NAR 2008
• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
Data as of April 14, 2009
PLoS Comp. Biol. 2005 1(3) e34
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Example Interoperability: The Database View
BMC Bioinformatics 2010 11:220
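A minimal Python sketch of the linking step behind the database view, under stated assumptions: PDB IDs are recognized with a simple, hypothetical pattern (the word "PDB" followed by a digit and three alphanumerics), and the URL is the literature-page pattern shown on this slide. The name `link_structures` is illustrative only and is not part of any RCSB API.

```python
import re

# Assumption: mentions look like "PDB 1TIM", "PDB ID 1TIM" or "PDB: 1TIM".
# A classic PDB ID is four characters: a digit 1-9, then three alphanumerics.
PDB_MENTION = re.compile(r"\bPDB(?:\s+ID)?[:\s]+([1-9][A-Za-z0-9]{3})\b")

# URL pattern copied from the literature-view link on this slide.
LITERATURE_URL = "http://www.rcsb.org/pdb/explore/literature.do?structureId={}"


def link_structures(text):
    """Return {pdb_id: literature_url} for every PDB ID mentioned in text."""
    ids = sorted({m.group(1).upper() for m in PDB_MENTION.finditer(text)})
    return {pdb_id: LITERATURE_URL.format(pdb_id) for pdb_id in ids}


if __name__ == "__main__":
    sample = "We compared PDB ID 1RUZ with PDB: 3b7e and again PDB 1RUZ."
    for pdb_id, url in link_structures(sample).items():
        print(pdb_id, url)
```

Real article text mentions structures in many other ways, so a production system would need curated term recognition rather than a single regular expression.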
Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389
Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
Semantic Tagging of Database Content in The Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
PLoS Comp. Biol. 6(2) e1000673
The Publishers are Starting to Do It
From Anita de Waard, Elsevier
This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
Word 2007 Add-in for authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
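To make the idea of embedding metadata via XML tags concrete, here is a small Python sketch. It is not the add-in's code and does not use the real OOXML schema: the `<p>` and `<term>` elements, the `source` and `id` attributes, and the two-entry lexicon with its identifiers are all hypothetical stand-ins for ontology-backed term recognition.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon: surface term -> (source, identifier).
# The ontology identifier is a made-up placeholder, not a real OBO ID.
LEXICON = {
    "hemagglutinin": ("ontology", "EXAMPLE:0001"),
    "1RUZ": ("PDB", "1RUZ"),
}


def tag_sentence(sentence):
    """Wrap each recognized term in a <term> element inside a <p> element."""
    pattern = re.compile("|".join(re.escape(t) for t in LEXICON), re.IGNORECASE)
    para = ET.Element("p")
    last = None  # most recent <term>; text after it is stored in its tail
    pos = 0
    for match in pattern.finditer(sentence):
        plain = sentence[pos:match.start()]
        if last is None:
            para.text = (para.text or "") + plain
        else:
            last.tail = (last.tail or "") + plain
        # Look up the lexicon entry case-insensitively.
        key = next(k for k in LEXICON if k.lower() == match.group(0).lower())
        source, identifier = LEXICON[key]
        last = ET.SubElement(para, "term", {"source": source, "id": identifier})
        last.text = match.group(0)
        pos = match.end()
    rest = sentence[pos:]
    if last is None:
        para.text = (para.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return ET.tostring(para, encoding="unicode")


if __name__ == "__main__":
    print(tag_sentence("The 1918 hemagglutinin structure is 1RUZ."))
```

Because the tags travel inside the manuscript, a publisher or database could recover the identifiers later without re-running term recognition on the text.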
Challenges
• Authors
– Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
• Publishers
– Carrot: competitive advantage
Reward Systems Need to Change
What is Needed?
• Author disambiguation
• Auditing (identification and metrics) of all scholarship, which means new tools
• Seniors need to promote alternative forms of scholarship
• Juniors need to respond
Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia. PLoS Comp. Biol., to appear
Example Tools
http://pubnet.gersteinlab.org/
http://www.researcherid.com/
http://www.biomedexperts.com
What Are these Alternative Forms of Scholarship?
• Research [Grants]
• Journal Article
• Conference Paper
• Poster Session
• Reviews
• Blogs
• Community Service/Data Curation
Ideally the ID will be Tagged to Every Piece of Scholarly Communication
I Am Not a Scientist, I Am a Number. PLoS Comp. Biol. 2008 4(12) e1000247
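As a rough illustration of why a unique ID matters for auditing scholarship, the Python sketch below counts one researcher's output by type. Everything in it is an assumption made for the example: the `ScholarlyItem` record, the `audit` function, and the ID format are hypothetical and do not describe any existing identifier service.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScholarlyItem:
    author_id: str    # hypothetical unique researcher identifier
    author_name: str  # names alone are ambiguous, hence the ID
    kind: str         # e.g. "journal article", "poster session", "blog"
    title: str


def audit(items, author_id):
    """Count one author's items by kind, keyed on the ID rather than the name."""
    return Counter(item.kind for item in items if item.author_id == author_id)


if __name__ == "__main__":
    items = [
        ScholarlyItem("R-0001", "J. Smith", "journal article", "Paper A"),
        ScholarlyItem("R-0001", "J. Smith", "blog", "Post B"),
        ScholarlyItem("R-0002", "J. Smith", "journal article", "Paper C"),
        ScholarlyItem("R-0001", "J. Smith", "data curation", "Dataset D"),
    ]
    # Two different people share the name "J. Smith"; the ID keeps them apart.
    print(audit(items, "R-0001"))
```

Counting by ID rather than by name is what allows blogs, curation and posters to be attributed and measured alongside journal articles.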
The Truth About My Laboratory
• I have ?? mail folders!
• The intellectual memory of my laboratory is in those folders
• This is an unhealthy hub and spoke mentality
We Need Scientist Management Tools
The Truth About My Laboratory
• I generate way more negative than positive data, but where is it?
• Content management is a mess
– Slides, posters…
– Data, lab notebooks…
– Collaborations, journal clubs…
• Software is open but where is it?
• Farewell is for the data too
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
http://artbyvida.com/portfolio.php
Many Great Tools Out There
Taverna
Where I See the Problems
• The long tail is confused
• Lack of interoperability between the options
• The reward (publishing) is still removed from the available tools
Science is Increasingly a Digital Workflow
[Diagram labels: Scientist; Idea, Experiment, Data, Conclusions, Publish; Laboratory; Publisher]
The Role of the Institution
Maybe The Line is Somewhere Else?
[Diagram labels: Scientist; Idea, Experiment, Data, Conclusions, Publish; Laboratory; Publisher; Institution; Lab Notebook]
This Amounts to Publishing Workflows
But That Has its Problems
• Workflows are not linear
• Workflow : paper is not 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
Solutions to Publishing Workflows?
• New organizations (university as publisher?)
• Appropriate reward system
• Shared governance – author, institution, publisher
• Crowd sourcing the electronic printing press
Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF)
• Funded by DDCF, Microsoft, NCI, Sage Bionetworks
• Aims:
– Define user requirements
– Establish a specification document
– Open source the development effort
– Have a commitment from a publisher to publish a research object using the system
– Act as an exemplar for what can be done
Logistics
• UC San Diego
• Jan 19-21, 2011
• Under the auspices of W3C
• FoRC will have a follow-on meeting