Literature/data integration and
Ryan Scherle
Data Repository Architect
Dryad Digital Repository
HighWire Fall Publishers’ Meeting
November 20, 2013
You may reuse any of the original content in these slides as you wish, provided you attribute the source
Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
CC-BY Adamo, http://www.piqs.de/fotos/121272.html
Who cares if the data is lost?
By Agrant141 (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
James Cook, portrait by Nathaniel Dance-Holland, c. 1775, National Maritime Museum, Greenwich
Source: Publishing Research Consortium, http://publishingresearch.net (n=3824)
Who cares if the data is lost?
Data “available upon request”
Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.
“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied
Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) doi:10.1037/0003-066X.61.7.726
Fighting data entropy
Figure: Information Content (vertical axis) plotted against Time (horizontal axis).
Labels along the curve: Time of publication; Specific details; General details; Accident; Retirement or career change; Death
(Michener et al. 1997)
Funder policies
US funding agencies that require or strongly recommend data sharing:
o CDC
o DOD
o DOE
o EPA
o NASA
o NIH
o NIST
o NOAA
o NSF
o USDA
Joint data archiving policy
Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.
As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.
Authors may elect to embargo access to the data for a period up to a year after publication.
Exceptions may be granted at the discretion of the editor, especially for sensitive information.
http://datadryad.org/pages/jdap
Piwowar HA, Chapman WW (2008) hdl:10101/npre.2008.1700.1
Impact factor and archiving policies
Figure: impact factor (IF) values shown are 3.6, 4.5, and 6.0 (n=70)
Data archiving landscape
There are so many data repositories that we need directories of them:
o http://re3data.org
o http://DataBib.org
These repositories vary along many dimensions:
o Datatype focus
o Community focus
o Allowed file sizes
o Curation policies
o Data access policies
o Funding model
Data archiving landscape
Figure: repositories placed on two axes, Datatype Focus and Community Focus, each running from General to Focused.
Repositories shown: Figshare; Institutional Repository; Supplemental Materials; Genbank; Pangaea; Zenodo; Lab Database; Dryad
Dryad vs supplementary materials
Feature | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: do not convert reusable formats to PDF | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Support for large files: allow data files up to 10GB | ✔ | ✗
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗
* A few publisher SOM sites are exceptions to the general rule
** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
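The "Identifiable" row depends on identifiers being resolvable. As a minimal sketch, assuming the public proxy resolvers at doi.org and hdl.handle.net, the `doi:` and `hdl:` identifiers cited in these slides can be turned into links as follows. The function name `identifier_to_url` is hypothetical and is not part of any Dryad or DataCite tool.

```python
from urllib.parse import quote

# Public proxy resolvers for the two identifier schemes cited in these slides.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def identifier_to_url(identifier: str) -> str:
    """Turn 'doi:10.x/y' or 'hdl:10101/z' into a resolvable URL.

    Hypothetical helper for illustration only.
    """
    scheme, sep, value = identifier.strip().partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in RESOLVERS or not value:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return RESOLVERS[scheme] + quote(value, safe="/")


if __name__ == "__main__":
    print(identifier_to_url("doi:10.1045/january2011-smit"))
    print(identifier_to_url("hdl:10101/npre.2008.1700.1"))
```

Because the link goes through a resolver rather than pointing at a publisher page directly, it can keep working if the hosting location of the data changes.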
DataDryad.org
What makes Dryad unique
1. Tight focus on data associated with published literature
2. Data packages are curated
3. Open development process allows broad participation
4. Nonprofit organization managed by stakeholders
Data citations
Best practice is to cite both the article and the data – they are both useful research products
But limit data citations to one data package per article – this eliminates most concerns about the size/granularity of data files
Dryad uptake
>4,000 data packages containing >12,000 files associated with articles in 275 journals
200 submissions each month and growing
Some data packages have been downloaded more than 10,000 times
Fewer than 10% of authors choose to embargo their data when the journal allows this option
Price schedule
Plan | Member | Non-member | Minimum Purchase
Voucher | $65 per data package | $70 per data package | 25 vouchers
Deferred Payment | $70 per data package | $75 per data package | 1 year contract
Subscription | annual fee based on $25 per published research article | annual fee based on $30 per published research article | 2 year contract
Pay on submission | N/A | $80 per data package, payable by the submitter | 1 data package
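To compare plans for a given journal, the schedule can be expressed as a small calculation. This is a rough sketch under stated assumptions, not Dryad's billing logic: the subscription fee is taken to be exactly the per-article rate times the number of published research articles, the voucher minimum is treated as paying for at least 25 packages, and "N/A" is treated as "not offered". The function `annual_cost` and the plan keys are hypothetical names.

```python
# Per-package prices in USD, copied from the price schedule.
# None marks a cell listed as N/A (assumed here to mean "not offered").
PER_PACKAGE_USD = {
    "voucher": {"member": 65, "non-member": 70},
    "deferred": {"member": 70, "non-member": 75},
    "pay_on_submission": {"member": None, "non-member": 80},
}
# Subscription rates per published research article, from the schedule.
SUBSCRIPTION_USD_PER_ARTICLE = {"member": 25, "non-member": 30}
VOUCHER_MINIMUM = 25  # minimum purchase listed for the voucher plan


def annual_cost(plan: str, membership: str, packages: int = 0, articles: int = 0) -> int:
    """Estimate one year's cost in USD (illustrative assumptions only)."""
    if plan == "subscription":
        # Assumption: fee = rate x published research articles.
        return SUBSCRIPTION_USD_PER_ARTICLE[membership] * articles
    price = PER_PACKAGE_USD[plan][membership]
    if price is None:
        raise ValueError(f"{plan} is not offered to {membership} organizations")
    if plan == "voucher":
        # Assumption: at least 25 vouchers are bought even if fewer are used.
        packages = max(packages, VOUCHER_MINIMUM)
    return price * packages


if __name__ == "__main__":
    # Hypothetical member journal: 100 research articles, 40 data packages a year.
    for plan in ("voucher", "deferred", "subscription"):
        print(plan, annual_cost(plan, "member", packages=40, articles=100))
```

Under these assumptions, the subscription plan becomes cheaper than vouchers for a member once more than 25/65 (roughly 38%) of published articles have an associated data package.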
Sponsoring open data
Publishers, societies, and other organizations are now sponsoring deposits in 44 journals
o Functional Ecology
o Heredity
o Journal of Heredity
o Systematic Biology
o The American Naturalist
o Ecological Monographs
o Proceedings A
o Proceedings B
o Journal of Ecology
o Interface Focus
o Plant Physiology
o The Plant Cell
o Open Biology
o Ecology and Evolution
o Evolutionary Applications
o eLife
o Evolution
o Elementa
o Palaeontology
o MycoKeys
o Comparative Cytogenetics
o Subterranean Biology
o Nature Conservation
o NeoBiota
o PhytoKeys
o ZooKeys
o Paleobiology
o Biodiversity Data Journal
o BioRisk
o Molecular Ecology
o Molecular Ecology Resources
o GMS German Medical Science
o GMS Medizinische Informatik, Biometrie und Epidemiologie
o Special Papers in Palaeontology
o Journal of Evolutionary Biology
o Journal of the Royal Society Interface
o Journal of Applied Ecology
o Journal of Animal Ecology
o Methods in Ecology and Evolution
o The Journal of Paleontology
o Journal of Hymenoptera Research
o Philosophical Transactions A
o Philosophical Transactions B
In development…
Added value for journals, including a data display widget and a dashboard for editors
Integrated article & data submission
Key functionality
o Makes data deposition simple for authors (once files are prepared)
o Ensures permanent link to data within each article (and vice versa)
Options are customized to meet journal policies
o Data can be submitted prior to manuscript review or upon acceptance
o Journals may allow authors the option of embargoing data for 1 year after publication
To learn more
Repository home: http://datadryad.org
News: http://blog.datadryad.org
Twitter: @datadryad
Ryan Scherle, [email protected]