DATA MANAGEMENT PLANS & THE DATA CITATION INDEX NIGEL ROBINSON
19 MAY 2014
©20
10 T
hom
son
Reu
ters
THE INCREASING VISIBILITY OF DATA
• Grant funding agencies
• Journal publishers• Publisher website
• Electronic articles
• Data repositories & registration agencies
“Data is the new gold”
– Neellie Kroes, EU Digital Agenda Commissioner
©20
10 T
hom
son
Reu
ters
DIGITAL SCHOLARSHIP
•Very visible within the literature as a concept
•Articles, projects, university labs all devoted to digital scholarship in various ways
Digital Scholarship
•Authors/researchers
•Research administrators
•Librarians, data archivists
•Publishers
•Grant funding organizations
Interested Parties
•Discipline-specific and multidisciplinary content
•Needs and requirements vary by discipline
•Diverse content formats, with few standards
Content
©20
10 T
hom
son
Reu
ters
OVERVIEW
• Emerging landscape
• Opportunities
• Citation and attribution
©20
10 T
hom
son
Reu
ters
NIH (2003) Data Sharing Policy that all funding applications of $500,000 or more per year are expected to address data-sharing in their application.
NSF (2011) All funding proposals submitted on or after January 18, 2011, must include a “Data Management Plan” describing how the proposal will conform to NSF policy on the dissemination and sharing of research results.
THE EMERGENCE OF FUNDING MANDATES
©20
10 T
hom
son
Reu
ters
DATA MANAGEMENT REQUIREMENTS EXTEND ACROSS THE GLOBE
Aug 2011… “expectation that all our funded researchers should maximise access to their research data with as few restrictions as possible. …. submit a data management and sharing plan as part of the application process.”
2007… “Researchers are to retain research data and primary materials, manage storage of research data and primary materials, maintain confidentiality of research data and primary materials.”
©20
10 T
hom
son
Reu
ters
DATA MANAGEMENT REQUIREMENTS EXTEND ACROSS THE GLOBE
• “A further new element in Horizon 2020 is the use of Data Management Plans (DMPs) detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and re-use, and how it will be curated and preserved. The use of a Data Management Plan is required for projects participating in the Open Research Data Pilot. Other projects are invited to submit a Data Management Plan if relevant for their planned research.”
©20
10 T
hom
son
Reu
ters
IMPACT ON RESEARCH LIBRARIES
8
©20
10 T
hom
son
Reu
ters
FUNDING MANDATES BECOMING STRONGER
January 14, 2013… “failure to provide the requisite Data Management Plan will result in the application being rejected or terminated.”
©20
10 T
hom
son
Reu
ters
WHY SHARE DATA?
• Verification - Findings can be verified
• Extend original findings, address new questions
• Reduce costs
• Training – data reuse
• Increased primary publicationsA. Pienta, G. Alter, J. Lyle (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307 Data sharing leads to more science &
more knowledge
©20
10 T
hom
son
Reu
ters
DATA SHARING
• The level and timing of data sharing varies 28% only
share prior to
publication
35% only share after publication
25% share before and after
publication
©20
10 T
hom
son
Reu
ters
INCREASED CITATION WITH SHARED DATA
35% to 69% more citations
courtesy of Jon Sears (AGU)
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Bibliometrics
©20
10 T
hom
son
Reu
ters
DEPOSITION OF DATA BY RESEARCHERS
13
24%
36%
47%
51%
17%
Publisher website
Repository managed by a third party (e.g, domain-…
Department or institutional repository
Personal website
Other
Q16. Where do you place your non-traditional scholarly output to make it available to others? (n=471)
©20
10 T
hom
son
Reu
ters
RESEARCHERS NOT RECEIVING CREDIT
14
Barriers to creating and sharing data:
• Researchers are hesitant to spend time and effort to create and share data because they don’t feel the work is adequately exposed or accredited
• Researchers find it difficult to expose data they have produced because data repositories do not have clear standards or mechanisms in place for doing so
©20
10 T
hom
son
Reu
ters
RESEARCHER PROBLEMS• Access & discovery
• Citation standards
• Lack of willingness to deposit and cite
• Lack of recognition / credit
©20
10 T
hom
son
Reu
ters
DATA MANAGEMENT PLAN BENEFITS• Repository must hold data• Repository must provide access to dataData deposit
• Material added/updated• Provide statistics on deposited data• Actively curate data in the archiveActive
• Persistent IDs, DOIs or other permanent ID• Contacts available for confirmation of interpretation• Indication of intention to preserve data or provide access over the
long term• Contingency if repository was to cease to operate
• Make data accessible (or state licensing terms)• Sustainable• Funding information available for repository and deposited data
Persistence
• Links to literature• Citation in literature databasesData reuse
©20
10 T
hom
son
Reu
ters
CHALLENGES
• Metadata– Resources
– Expertise
• Citable data source
• Metadata quality– Unique & persistent identifiers
– Consistency
• Data repositories are not static– How is version control handled?
• Partnerships
©20
10 T
hom
son
Reu
ters
DATA CITATIONCurrent citation style (in full text of article as informal citations)
Desired/future citation style (as formally cited references)
U.S. Dept. of Justice, Bureau of Justice Statistics (1996): MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988. Version 1. Inter-university Consortium for Political and Social Research. http://dx.doi.org/10.3886/ICPSR09907.v1
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular a-synuclein. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574
©20
10 T
hom
son
Reu
ters
DATA CITATION
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular a-synuclein. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574
Data Citation Index
New data metrics
Scientific literature
Published data sets
©20
10 T
hom
son
Reu
ters
DATA CITATION INDEX AIMS
Launched October 2012
4M data records
• Enable the discovery of data repositories, data studies and data sets in the context of traditional literature
• Link data to research publications
• Help researchers find data sets and studies and track the full impact of their research output
• Provide expanded measurement of researcher and institutional research output and assessment
• Facilitate more accurate and comprehensive bibliometric analyses
©20
10 T
hom
son
Reu
ters
21
INDEXING A DATA REPOSITORY ON WEB OF SCIENCE
• Repository/Source: Comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.
• Data Study: Descriptions of studies or experiments with associated data which have been used in the data study. Includes serial or longitudinal studies over time.
• Data Set: A single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.
• Microcitation: (nanopublication) An assertion about concepts that have been found to be linked by scientific enquiry, and can be uniquely identified and attributed to its author. Made up of three separate parts: a subject, a predicate and an object.
Record Types
Descriptive metadata feed from repository
Repository raw
metadata is analysed
Metadata
added
Repository
Data study
Data set
Micro-citation
©20
10 T
hom
son
Reu
ters
Search Results within the Data Citation Index
present the powerful Web of Knowledge options for
exploring a body of information. Data
becomes discoverable alongside literature
Data deposition makes it possible to show related data
from the repository
Because data are accessible and able to be cited, they can be linked to publications describing research which uses them
Link out directly to the original item, in this case
a Data Study.
Start to build citation maps associated with
data through the association of data and
literature
Provide assistance in how to associate data and
literature through citation
©20
10 T
hom
son
Reu
ters
DATA CITATION INDEX & DATA MANAGEMENT PLANS
• Discovery of data most important to scholarly research
• Data linked to published research literature
• Measures of data citation, use and reuse with attribution assisted by identifiers
• New metrics for digital scholarship
©20
10 T
hom
son
Reu
ters
As we evaluate repositories for
inclusion, some of the things we
consider are:
• Editorial Content - ensuring that
material is desirable to the
research community.
• Persistence and stability of the
repository, with a steady flow of
new information.
• Thoroughness and detail of
descriptive information.
• Links from data to research
literature.
REPOSITORY SELECTION & EVALUATION
©20
10 T
hom
son
Reu
ters
DATA REPOSITORIES• Over 1000 repositories identified
©20
10 T
hom
son
Reu
ters
TYPES OF DATA BY DISCIPLINE
ART & HUMANITIES
CULTURAL HERITAGE
LANGUAGE CORPUS
IMAGE COLLECTIONS
RECORDINGS
SOCIAL SCIENCES
POLL DATA
ECONOMIC STATISTICS
LONGITUDINAL DATA
NATIONAL CENSUS
PUBLIC OPINION SURVEYS
SCIENCE & TECHNOLOGY
MAPS
ALGORITHMS
GENOMICS
SKY SURVEYS
ASTROPHYSICS
REMOTE SENSING
MUSEUM SPECIMENS