Kathleen Fear, ICPSR, University of Michigan “The impact of data reuse: a pilot study of 5 measures” Panel: Data citation and altmetrics Research Data Access & Preservation Summit 2013 Baltimore, MD April 4, 2013 #rdap13
Viable Data Citation: Expanding the Impact of Social Science Research
RDAP13 Panel on Data Citation and Altmetrics, April 5, 2013
Elizabeth Moss, [email protected]
At ICPSR
• Providing opportunities for tracking and measuring impact
• Linking data to the literature, and the challenges involved
• Aiding the cultural shift to viable citing practice (impact can be better measured if data use is readily discernible)
Top 10 Data Downloads in the Previous Six Months (non-anonymous, distinct users downloading one or more files)
ICPSR Study Title # Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 1817
National Survey on Drug Use and Health, 2010 1109
Chinese Household Income Project, 2002 648
General Social Survey, 1972-2010 [Cumulative File] 643
National Survey on Drug Use and Health, 2011 603
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] 527
Health Behavior in School-Aged Children (HBSC), 2005-2006 509
American National Election Study, 2008: Pre- and Post-Election Survey 427
India Human Development Survey (IHDS), 2005 395
School Survey on Crime and Safety (SSOCS), 2006 339
Who uses these shared data?
With what impact?
Obtaining ICPSR Metadata
ICPSR metadata are available in two formats, and for harvesting via OAI-PMH:
• DDI Codebook XML
• MARC21
• OAI-PMH
• Increase likelihood of discovery and re-use
• Aid students, instructors, researchers, and funders
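As a rough illustration of what harvesting over OAI-PMH can look like, the sketch below builds a ListRecords request URL and pulls record identifiers out of a response. The `verb` and `metadataPrefix` parameters and the `oai_dc` prefix come from the OAI-PMH protocol itself; the endpoint URL, the function names, and the sample response are placeholders made up for this example, since the slides do not give ICPSR's actual endpoint or prefixes.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the slides do not state the real ICPSR OAI-PMH URL.
BASE_URL = "https://repository.example.org/oai"

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Return the header identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [el.text for el in root.iterfind(path, OAI_NS)]


# Made-up response body, shaped like a ListRecords reply, so the parser can be
# exercised without network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:study-1</identifier></header></record>
    <record><header><identifier>oai:example.org:study-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL and follow resumption tokens for large result sets; both are left out to keep the sketch short.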
The ICPSR Bibliography of Data-related Literature
Link research data to scholarly literature about it
It’s really a searchable database . . .
. . . containing 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
. . . that resides in Oracle, with an internal UI for database management
. . . that can generate study bibliographies linking each study with the literature about it, and out to the full text
It’s useful to all stakeholders
• Instructors direct students to begin data-related research projects by reading some of the major works based on the data
• Advanced researchers also use it to conduct a focused literature review before deciding to use a dataset
• Reporters and policymakers looking for processed statistics look for reports explaining studies
• Principal investigators and funding agencies want to track how data are used after they are deposited
But challenging to provide
The state of data citation in the social science literature
Abstract?
Acknowledgements?
Charts and Tables?
Appendices?
References!
Discussion?
Footnotes?
Sample?
Methods?
Data “Sighting” (implicit) vs. Data Citing (explicit)
Typical “sightings”
• Sample described, not named, no author information, no access information, only a publication cited
• Data named in text, with some attribution, but no access information
• Cited in reference section, but with no permanent, unique identifier, so indexing scripts have difficulty finding the citation and tracking cannot be automated
ICPSR advocates the use of DOIs
• ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008
• DOIs apply at the study or collection level (a study can have multiple datasets) and resolve to the study home page, which has the richest metadata
• DOIs are of the form: doi:10.3886/ICPSR04549
A-typical “citing”: In the references, with the DOI
doi:10.3886/ICPSR21240
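To show why a DOI in the reference list makes tracking easier for scripts, here is a minimal sketch that scans text for DOIs of the form shown on these slides. It assumes the study number is simply a run of digits after "ICPSR"; the function names and the sample reference text are invented for illustration and are not ICPSR's actual tooling.

```python
import re

# Assumption: an ICPSR DOI is "10.3886/ICPSR" followed by a run of digits,
# as in the two examples on the slides.
ICPSR_DOI = re.compile(r"\b10\.3886/ICPSR(\d+)", re.IGNORECASE)


def find_icpsr_dois(text):
    """Return the ICPSR DOIs found in text, in order of first appearance."""
    found = []
    for match in ICPSR_DOI.finditer(text):
        doi = f"10.3886/ICPSR{match.group(1)}"
        if doi not in found:  # skip repeats of the same study
            found.append(doi)
    return found


def resolver_url(doi):
    """Turn a bare DOI into a link through the public doi.org resolver."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    # Invented reference text; only the two DOIs come from the slides.
    references = (
        "Study data, distributed by ICPSR. doi:10.3886/ICPSR21240\n"
        "Another dataset: https://doi.org/10.3886/ICPSR04549."
    )
    for doi in find_icpsr_dois(references):
        print(doi, "->", resolver_url(doi))
```

A "sighting" with no identifier gives a script nothing comparable to match on, which is where the human search effort comes in.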
Challenges in database search infrastructure
• Journal databases fielded for journal article discovery are not ideal for finding data “sightation”
• No field searching on methods sections
• Full-text search brings back too many bad hits
• Limiting to abstract misses too many good hits
• Tension between highly curating a manageable collection and minimally maintaining a broad collection
• Too many publications for efficient collection by humans, so we must make it easy for scripts to do it reliably
Challenges in tracking many studies
Challenges of completeness
• Data use that is too difficult/costly to find cannot be counted
• The result is a selective sample, which makes it difficult to draw accurate conclusions in broad analyses of re-use
Challenges in publishing practice, and lack of data management planning
• Publishing sequence prevents citation creation before publication
• Potential for change by educating the PI/mentor
• Consciousness raising starting to occur due to funders’ requirements
Poorly described and cited data
+ Excessive human search effort
= Too costly, too questionable for confident measure of impact

Citing data with a DOI
+ Minimal human search effort
= High hit accuracy for the cost, and better confidence of impact measures
Finding data with simple search fields
Integration with Web of Knowledge All Databases: Research data is equal to research literature
Converting journal search infrastructure to meet the needs of data, but syncing metadata is still a work in progress.
Articles linked to underlying data. Increased data discovery. Reward for data citation. Potential for automated tracking.
Building a culture of viable data citation to improve measures of impact
Provide PIs and users with citations and DOIs for all study-level data
Join groups advocating viable data citing practice
Work with partner repositories to change publishing practice
Three meetings: Journal editors, domain repositories, and funders
• Establish consistent data citation in social science journals
• Encourage transparency in research
• Optimize editorial work flows: sequencing
• Develop common standards for repositories
• Find long-term funding models for repository sustainability
Thank you
Elizabeth [email protected]