RDAP13 Elizabeth Moss: The impact of data reuse


DESCRIPTION

Kathleen Fear, ICPSR, University of Michigan. “The impact of data reuse: a pilot study of 5 measures.” Panel: Data citation and altmetrics. Research Data Access & Preservation Summit 2013, Baltimore, MD, April 4, 2013. #rdap13


Viable Data Citation: Expanding the Impact of Social Science Research

RDAP13 Panel on Data Citation and Altmetrics, April 5, 2013

Elizabeth Moss, ICPSR

eammoss@umich.edu

At ICPSR

• Providing opportunities for tracking and measuring impact

• Linking data to the literature, and the challenges involved

• Aiding the cultural shift to viable citing practice (impact can be better measured if data use is readily discernable)

Top 10 Data Downloads in the Previous Six Months (non-anonymous, distinct users downloading one or more files)

ICPSR Study Title # Downloads

National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 1817

National Survey on Drug Use and Health, 2010 1109

Chinese Household Income Project, 2002 648

General Social Survey, 1972-2010 [Cumulative File] 643

National Survey on Drug Use and Health, 2011 603

Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] 527

Health Behavior in School-Aged Children (HBSC), 2005-2006 509

American National Election Study, 2008: Pre- and Post-Election Survey 427

India Human Development Survey (IHDS), 2005 395

School Survey on Crime and Safety (SSOCS), 2006 339

Who uses these shared data?

With what impact?

• Increase likelihood of discovery and re-use

• Aid students, instructors, researchers, and funders

The ICPSR Bibliography of Data-related Literature

Link research data to scholarly literature about it

It’s really a searchable database . . .

. . . containing 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR

. . . that resides in Oracle, with an internal UI for database management

. . . that can generate study bibliographies linking each study with the literature about it, and out to the full text

It’s useful to all stakeholders

• Instructors direct students to begin data-related research projects by reading some of the major works based on the data

• Advanced researchers also use it to conduct a focused literature review before deciding to use a dataset

• Reporters and policymakers looking for processed statistics look for reports explaining studies

• Principal investigators and funding agencies want to track how data are used after they are deposited

But challenging to provide

The state of data citation in the social science literature

Abstract?

Acknowledgements?

Charts and Tables?

Appendices?

References!

Discussion?

Footnotes?

Sample?

Methods?

Data “Sighting” (implicit) vs. Data Citing (explicit)

Typical “sightings”

• Sample described, not named, no author information, no access information, only a publication cited

• Data named in text, with some attribution, but no access information

• Cited in reference section, but with no permanent, unique identifier, so difficult for indexing scripts to find to automate tracking

ICPSR advocates the use of DOIs

• ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008

• DOIs apply at the study or collection level (a study can have multiple datasets) and resolve to the study home page with richest metadata

• DOIs are of the form: doi:10.3886/ICPSR04549
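
Because the DOI has a fixed shape, a script can pick it out of a reference list without human reading. The following is a minimal Python sketch, not ICPSR's actual tooling: it assumes every ICPSR DOI is the prefix 10.3886/ICPSR followed by digits (as in the example above), the function names are hypothetical, and the sample reference string is an invented placeholder. The https://doi.org/ address is the standard public DOI resolver.

```python
import re

# Assumed pattern: prefix 10.3886, then "ICPSR" and a run of digits.
# DOIs are case-insensitive, so matching ignores case.
ICPSR_DOI = re.compile(r"10\.3886/ICPSR\d+", re.IGNORECASE)


def find_icpsr_dois(text):
    """Return the distinct ICPSR-style DOIs in text, in order of appearance."""
    seen = []
    for match in ICPSR_DOI.finditer(text):
        doi = match.group(0).upper()  # normalize case so duplicates collapse
        if doi not in seen:
            seen.append(doi)
    return seen


def resolver_url(doi):
    """Build a resolvable link for a DOI using the public DOI resolver."""
    return "https://doi.org/" + doi


# Invented placeholder reference, for illustration only.
reference = "Placeholder, A. Example study [data file]. doi:10.3886/ICPSR04549"
for doi in find_icpsr_dois(reference):
    print(doi, "->", resolver_url(doi))
```

A reference that names the data only in prose gives such a script nothing to match, which is the practical difference between a “sighting” and a citation.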

A-typical “citing”:

In the references, with the DOI

doi:10.3886/ICPSR21240

Challenges in database search infrastructure

• Journal databases fielded for journal article discovery are not ideal for finding data “sightation”

• No field searching on methods sections

• Full-text search brings back too many bad hits

• Limiting to abstract misses too many good hits

• Tension between highly curating a manageable collection and minimally maintaining a broad collection

• Too many publications for efficient collection by humans, so we must make it easy for scripts to do it reliably

Challenges in tracking many studies

Challenges of completeness

• Data use that is too difficult/costly to find cannot be counted

• A selective sample, difficult to draw accurate conclusions in broad analyses of re-use

Challenges in publishing practice, and lack of data management planning

• Publishing sequence prevents citation creation before publication

• Potential for change by educating the PI/mentor

• Consciousness raising starting to occur due to funders’ requirements

Poorly described and cited data

+ Excessive human search effort

= Too costly, too questionable for confident measure of impact

Citing data with a DOI

+ Minimal human search effort

= High hit accuracy for the cost, and better confidence of impact measures
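
The contrast between the two equations above can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the four text snippets, the study name “Example Panel Survey”, and the DOI 10.3886/ICPSR00000 are invented placeholders rather than real ICPSR data, and “hit accuracy” is interpreted here as precision (true hits divided by all hits).

```python
# Hypothetical mini-corpus: (text, whether the work actually analyzes the data).
DOCS = [
    ("We analyze the Example Panel Survey. doi:10.3886/ICPSR00000", True),
    ("Data source: Example Panel Survey, doi:10.3886/ICPSR00000", True),
    ("Unlike the Example Panel Survey, our sample is regional.", False),
    ("A review of survey methods, with a panel example.", False),
]


def precision(query):
    """Share of matching documents that really use the data (None if no hits)."""
    hits = [used for text, used in DOCS if query.lower() in text.lower()]
    if not hits:
        return None
    return sum(hits) / len(hits)


# Searching by study name also matches a paper that only mentions the study.
print("name search:", precision("Example Panel Survey"))
# Searching by DOI matches only the works that cite the data.
print("DOI search:", precision("10.3886/ICPSR00000"))
```

In this invented corpus the name search returns one false hit that a person would have to screen out, while the DOI search returns only works that cite the data.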

Finding data with simple search fields

Integration with Web of Knowledge All Databases: Research data is equal to research literature

Converting journal search infrastructure to meet the needs of data, but synching metadata still a work in progress.

• Articles linked to underlying data

• Increased data discovery

• Reward for data citation

• Potential for automated tracking

Building a culture of viable data citation to improve measures of impact

Provide PIs and users with citations and DOIs for all study-level data

Join groups advocating viable data citing practice

Work with partner repositories to change publishing practice

Three meetings: Journal editors, domain repositories, and funders

• Establish consistent data citation in social science journals

• Encourage transparency in research

• Optimize editorial work flows: sequencing

• Develop common standards for repositories

• Find long-term funding models for repository sustainability

Thank you

Elizabeth Moss

eammoss@umich.edu
