The Avoidable Waste of Scholarly Publishing
Peter Murray-Rust*, ContentMine.org and the University of Cambridge
PLoS, Cambridge, UK, 2015-07-09
Scholarly Publishing un/wittingly destroys huge amounts of publicly funded research. There are solutions; what is needed is will.
Hi, I'm here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture.
In this talk, I'm going to stress the importance of data in a specific format and its utility for automated machine processing. Then I'm going to demonstrate AMI's architecture and the transformation of data as it flows through the process. I'm going to dwell a little on a core format used, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I'm going to introduce Andy's ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I will demonstrate some uses of the ChemVisitor within the realm of validation and metabolism.
Background
ContentMine aims to make large areas of scientific fact OPEN (100 million facts/year)
We're working with the Wellcome Trust, Europe PubMed Central, etc.
A politically hot area (Hargreaves legislation, EU activity)
2015 Wellcome Trust workshop on TDM and Neuroscience; rough consensus on what was needed.
Day workshop at Cochrane, UK (Amy Price, Anna Noel Storr, Ben Goldacre)
2-day workshop at Edinburgh on Systematic Reviews of Animal Test publications
In the last few months we've prototyped a unique Open starting point, continuously released.
Can PLoS and ContentMine find constructive ways forward?
PM-R's first real paper: doing science by re-using the results of others in a novel way
1974: Each point represented 1-4 hours of library discovery, volume delivery, transcription and hand calculation.
We were stunned recently when we stumbled across an article by European researchers in Annals of Virology: "The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone." In the future, the authors asserted, "medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics," referring to hospital-acquired infection.
Adage in public health: "The road to inaction is paved with research papers."
Bernice Dahn (chief medical officer of Liberia's Ministry of Health)
Vera Mussah (director of county health services)
Cameron Nutt (Ebola response adviser to Partners in Health)
A System Failure of Scholarly Publishing
MONROVIA, Liberia - The conventional wisdom among public health authorities is that the Ebola virus, which killed at least 10,000 people in Liberia, Sierra Leone and Guinea, was a new phenomenon, not seen in West Africa before 2013. (The one exception was an anomalous case in Ivory Coast in 1994, when a Swiss primatologist was infected after performing an autopsy on a chimpanzee.)
The conventional wisdom is wrong. We were stunned recently when we stumbled across an article by European researchers in Annals of Virology: "The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone." In the future, the authors asserted, "medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics," referring to hospital-acquired infection.
As members of a team drafting Liberia's Ebola recovery plan last month, we systematically reviewed the literature on Ebola surveillance since the virus's discovery in central Africa in 1976. We learned that the virologists who wrote that report, who were from Germany, had analyzed frozen blood samples taken in 1978 and 1979 from 433 Liberian citizens. They found that 26 (or 6 percent) had antibodies to the Ebola virus.
Three other studies published in 1986 documented Ebola antibody prevalence rates of 10.6, 13.4 and 14 percent, respectively, in northwestern Liberia, not far from its borders with Sierra Leone and Guinea. These articles, along with other forgotten reports from the 1980s on antibody prevalence in neighboring Sierra Leone and Guinea, suggest the possibility of what some call "sanctuary sites," or persistent, if latent, Ebola infection in humans.
Bernice Dahn is the chief medical officer of Liberia's Ministry of Health, where Vera Mussah is the director of county health services. Cameron Nutt is the Ebola response adviser to Dr. Paul Farmer at the nonprofit group Partners in Health.
Free and Open
"Free software is a matter of liberty, not price ... 'free speech', not 'free beer'." (R M Stallman)
"A piece of data or content is open if anyone is free to use, reuse, and redistribute it" (OKFN) http://opendefinition.org/
"Open (access)" has multiple incompatible definitions. The major split is human eyeballs vs copying and machine reusability.
"Open" is a marketing term for publishers, who frequently (often deliberately) do not grant full Openness.
Gratis vs Libre
http://www.budapestopenaccessinitiative.org/read
"... an unprecedented public good ..."
completely free and unrestricted access to [peer-reviewed literature] by all scientists, scholars, teachers, students, and other curious minds.
Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge. (Budapest Open Access Initiative, 2002)
Scientific and Medical publication (STM) [+]
World Citizens pay $400,000,000,000 ...
... for research in 1,500,000 articles ...
... cost $300,000 each to create ...
... $7000 each to publish [*] ...
... $10,000,000,000 from academic libraries ...
... to publishers who forbid access to 99.9% of citizens of the world
85% of medical research is wasted (not published, badly conceived, duplicated, ...)
[+] Figures probably ±50%
[*] arXiv preprint server costs $7 USD per paper
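The headline figures on this slide can be cross-checked against each other with simple arithmetic. The following is a minimal sketch, assuming all amounts are annual totals in US dollars (the slide does not state the period explicitly); the constant and variable names are illustrative only.

```python
# Figures copied from the slide; the slide itself flags them as
# "probably +- 50 %", so only orders of magnitude should be compared.
TOTAL_RESEARCH_SPEND = 400_000_000_000   # paid by world citizens
ARTICLES = 1_500_000                     # articles produced
PUBLISH_COST_PER_ARTICLE = 7_000         # cost to publish one article
ARXIV_COST_PER_PAPER = 7                 # arXiv cost per paper

# Research cost behind each article (slide rounds this to $300,000).
create_cost_per_article = TOTAL_RESEARCH_SPEND / ARTICLES

# Total paid for publishing (slide rounds this to $10,000,000,000).
total_publishing_cost = PUBLISH_COST_PER_ARTICLE * ARTICLES

# How many times more expensive a published article is than an arXiv paper.
publish_vs_arxiv = PUBLISH_COST_PER_ARTICLE / ARXIV_COST_PER_PAPER

print(f"Research cost per article: ${create_cost_per_article:,.0f}")
print(f"Total publishing cost:     ${total_publishing_cost:,.0f}")
print(f"Publish cost vs arXiv:     {publish_vs_arxiv:,.0f}x")
```

Both derived values fall well inside the slide's stated ±50% tolerance, so the rounded numbers are mutually consistent.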
"... creative use of these large data sets in the US health care sector could generate more than $300bn in value per annum" [MGI, McKinsey]
Gartner Inc. has identified 'Big Data' and 'Next-Generation Analytics' as two of the 'Top 10 Strategic Technologies' for 2012.
"Given the volume of text generated by business, academic and social activities - in for example competitor reports, research publications or customer opinions on social networking sites - text mining is, however, highly important." [JISC]
"... there are some tasks that simply could not be achieved without using text mining. For example, a major pharmaceutical company used text mining tools to evaluate 50,000 patents in 18 months. This would have taken 50 person years to achieve manually, meaning that it would not even have been contemplated." [JISC]
Big Data and Analytics (ContentMining)
Prof. Ian Hargreaves (2011): David Cameron's "exam question": "Could it be true that laws designed more than three centuries ago with the express purpose of creating economic incentives for innovation by protecting creators' rights are today obstructing innovation and economic growth?" "... yes. We have found that the UK's intellectual property framework, especially with regard to copyright, is falling behind what is needed."
"Digital Opportunity" by Prof Ian Hargreaves - http://www.ipo.gov.uk/ipreview.htm. Licensed under CC BY 3.0 via Wikipedia - https://en.wikipedia.org/wiki/File:Digital_Opportunity.jpg#/media/File:Digital_Opportunity.jpg
PUBLISHER TDM LICENCE INITIATIVES GENERALLY DO NOT HELP
Publishers have started offering their own TDM licences and policies.
Their licences often impose unfair (and, in the case of the UK, unenforceable) constraints on researchers' freedom to exploit TDM, e.g., requiring users to employ the publisher's API, or putting unnecessary restrictions on how much can be copied or how fast it can be copied.
Why unenforceable? Because, as noted earlier, UK law specifically states that any contract or licence term that prevents anyone from doing TDM in the manner prescribed in the new exception shall be deemed null and void.
We really need a test case on these attempted restrictions.
Springer and the Royal Society offer generous TDM provisions.
So why are so many publishers offering restrictive licences in the UK? Maybe they hope licensees are ignorant of the strength of the new law, or the publishers in fact don't know about it. So they are either deliberately misleading or ignorant.
Prof Charles Oppenheim and contentmine.org
Elsevier wants to control Open Data
[asked by Michelle Brook]
Front. Pharmacol., 03 October 2011 | http://dx.doi.org/10.3389/fphar.2011.00051
How data are published in the 21st C
w00000t!!!!1111!!!!ELEVEN!!!! YAYAYAYAYAYAY!!!! Damn tootin'!!!!!
Supplemental material also undermines the concept of a self-contained research report by providing a place for critical material to get lost. Methods that are essential for replicating the experiments, analyses that are central to validating the results, and awkward observations are increasingly being relegated to supplemental material. Such material is not supplemental and belongs in the body of the article, but authors can be tempted (or, with some journals, encouraged) to place essential article components in the s