Data Access and Research Transparency: a Data Repository View
George Alter, ICPSR, University of Michigan

2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement on Research Transparency and Data Citation (George Alter - ICPSR)


2013 DataCite Summer Meeting - Making Research better DataCite. Co-sponsored by CODATA. Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30 Washington, DC. National Academy of Sciences http://datacite.eventbrite.co.uk/


About the Inter-university Consortium for Political and Social Research (ICPSR)

Mission: ICPSR provides leadership and training in data access, curation, and methods of analysis for a diverse and expanding social science research community.

• Acquire and archive social science data
• Distribute data to researchers
• Preserve data for future generations
• Provide training in quantitative methods

ICPSR Then and Now

• ICPSR History
– Established in 1962 so that social scientists could share data
– Started as a partnership among 21 universities
– Data distributed on punched cards and then magnetic reel-to-reel tape

• ICPSR Today
– More than 700 members
– 390+ U.S. institutions
– 46 national memberships
– 8,000+ data collections
– Direct downloads
– Online analysis

Data archiving and dissemination for more than 20 federal and private agencies

Summer Program 2013
1,000+ participants
42 four-week courses
37 one- to five-day courses

ICPSR Bibliography of Data Related Publications (66,000+) in the Data Citation Index

“Building Community Engagement in Data Citation and Open Access to Data”

• Funded by Alfred P. Sloan Foundation
– Challenge Grants to improve data citation and access
– Social science journals
– Domain repositories

“Building Community Engagement in Data Citation and Open Access to Data”

• Challenge grants: 4 selected from 26 applications:
– Richard Ball and Norm Medeiros, "Replication of Empirical Research: A Soup-to-Nuts Protocol for Documenting Data Management and Analysis," Haverford College
– Thomas Carsey, "Implementing a Data Citation Workflow within the State Politics and Policy Journal," University of North Carolina at Chapel Hill
– Lisa Neidert, "OPEN Data Through a Restricted Data Portal," The University of Michigan
– Jian Qin and Kevin Crowston, "Development and Dissemination of a Capability Maturity Model for Research Data Management Training and Performance Assessment," Syracuse University

Data Citation and Research Transparency Standards for the Social Sciences, June 13-14, 2013

• AERA Education Evaluation and Policy Analysis
• American Economic Journal: Applied Economics
• American Economic Review
• American Educational Research Association
• American Journal of Political Science
• American Journal of Sociology
• American Psychological Association
• American Sociological Review
• American Statistical Association
• Archives of Scientific Psychology
• Demography
• Institute for Quantitative Social Science, Harvard University
• Journal of Politics
• MIT Libraries
• Society for Research on Educational Effectiveness
• State Politics and Policy Quarterly

Sustaining Domain Repositories for Digital Data, June 24-25, 2013

• Association of Religion Data Archives
• CIESIN
• Cultural Policy and the Arts National Data Archive
• Data Conservancy
• DataONE
• Databrary
• Dryad
• Human Relations Area Files
• Linguistic Data Consortium
• National Academies of Science
• National Snow and Ice Data Center
• Odum Institute
• Roper Center
• SEAD
• tDAR Digital Archaeological Record
• UCLA Data Archive
• University of Michigan Transportation Research Institute
• US Virtual Astronomical Observatory
• Worldwide Protein Data Bank

What do we know about sharing of social science data?

Source: Pienta, Amy, Myron Gutmann, & Jared Lyle. 2009. “Research Data in The Social Sciences: How Much is Being Shared?” Research Conference on Research Integrity, Niagara Falls, NY.

Most data are not shared.

Shared Data Produce More Publications

Median # of Publications by Data Sharing Status

                                 Data Archived   Data Shared Informally   Data Not Shared
                                 (n=111)         (n=415)                  (n=409)
Primary PI Pubs (median)         6               6                        3
Secondary Pubs, No PI (median)   8               6                        3
Pubs with Students (median)      4               3                        1
Total                            18              15                       7

Source: Pienta, Amy M., George Alter, and Jared Lyle. 2010. “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.” Presented at the BRICK, DIME, STRIKE Workshop, The Organisation, Economics, and Policy of Scientific Research, Turin, Italy, April 23-24, 2010 (http://hdl.handle.net/2027.42/78307)

Why don’t researchers share their data?

The usual suspects:
• I don’t have time.
• My grant doesn’t pay for it.
• It will be used incorrectly.
• Someone might scoop me with my own data.

Our usual replies:
• You will get credit for sharing.
• More research will be done.
• Transparency and replication are good for science.

What are the weak points in this story?

Will Researcher 2 cite the data?
Will Researcher 1 deposit the data?

Publication as Seen by a Researcher

Researcher 1 collects data and publishes an article.

Researcher 1 is rewarded.

Researcher 2 reads the article and has an idea.

Researcher 2 obtains the data.

Researcher 2 writes a new manuscript.

Researcher 2 sends the manuscript to the Journal.

The Editor sends the manuscript out for reviews.

The accepted manuscript goes to the Copy Editor.

The Copy Editor sends the article to the Publisher.

The article is published and Researcher 2 is rewarded.

[Diagram labels: Journal, Editor, Reviewers, Copy Editor, Publisher, Printer]

The Researcher’s view of publication does not include a Repository or data citation with a persistent identifier.

[Diagram labels: Journal, Publisher, Repository]

Who can assure that data are sent to a repository?

Most data collection is supported by a Funding Agency.

The Funding Agency has a carrot (Awards) and a stick (Compliance).

[Diagram labels: Funding Agency, Repository]

Who can assure that data are cited?

The Journal is owned by a Professional Association.

Committees of the Professional Association oversee the Journal: Executive, Ethics, Publications.

The Professional Association issues an Ethics Code requiring data access and research transparency.

The Ethics Code informs the Journal’s Author Guide.

Someone at the Journal enforces the Author Guide requirements for data access and citation.

[Diagram labels: Professional Association, Journal, Publisher, Ethics Code, Author Guide]

Achieving Data Access and Research Transparency:
• Enforcement by funding agencies
• Ethics codes from Professional Associations
• Author guidelines from Journals
• Enforcement by journals

[Diagram labels: Funding Agency, Professional Association, Journal, Publisher, Repository, Ethics Code, Author Guide]

Why should funding agencies require data sharing?

• Data re-use is a more efficient use of funds
– Collecting data is expensive
– Data that are shared produce more science
• Funding agencies are the biggest beneficiaries of data citation.
• Political winds favor open data

Reproducibility should be the gold standard that all peer reviewers and editors aim for when assessing whether a manuscript has supplied sufficient information to allow others to repeat and build on the experiments. As such, the presumption must be that, unless there is a strong reason otherwise, data should be fully disclosed and made publicly available. In line with this principle, data associated with all publicly funded research should, where possible, be made widely and freely available. The work of researchers who expend time and effort adding value to their data, to make it usable by others, should be acknowledged and encouraged.

House of Commons, Science and Technology Committee - Eighth Report of Session 2010-12, Peer review in scientific publications. Ordered by the House of Commons to be printed 18 July 2011. http://www.publications.parliament.uk/pa/cm201012/cmselect/cmsctech/856/856.pdf

Transparency and reproducibility are politically popular

The White House has mandated public access to federally funded data

Congress favors open access to data

“The growing lack of scientific integrity and transparency has many causes but one thing is very clear: without open access to data, there can be neither integrity nor transparency from the conclusions reached by the scientific community. Furthermore, when there is no reliable access to data, the progress of science is impeded and leads to inefficiencies in the scientific discovery process. Important results cannot be verified, and confidence in scientific claims dwindles.”

Statement of Research Subcommittee Chairman Larry Bucshon (R-Ind.) Hearing on Scientific Integrity and Transparency, March 5, 2013.

Open data has bi-partisan support!

National Institutes of Health, Data and Informatics Working Group
Draft Report to The Advisory Committee to the Director, June 15, 2012

Recommendation 1: Promote Data Sharing Through Central and Federated Catalogues
1a. Establish a Minimal Metadata Framework for Data Sharing
1b. Create Catalogues and Tools to Facilitate Data Sharing
1c. Enhance and Incentivize a Data Sharing Policy for NIH-Funded Data

What is motivating Professional Associations and Journals?

• Concern about legitimacy
– Cases of fraud and misuse of data
– Failures of replication
– Public attacks on science

How can Professional Associations and Journals respond?

• Professional associations
– Ethics guidelines that emphasize data access and research transparency
• Journals
– Data citation guidelines
– Data access policies
  • Replication data
  • Codes and scripts
– Journals worry about
  • Cost
  • Compliance
  • Competition

Improving Data Citation in Journals

Data-PASS letter to the American Sociological Association, August 8, 2010

Similar letters were sent to the American Economic Association, the American Educational Research Association, and the American Political Science Association.

Data Citation

References for data sets should include a persistent identifier, such as a Digital Object Identifier (DOI). Persistent identifiers ensure future access to unique published digital objects, such as a text or data set. Persistent identifiers are assigned to data sets by digital archives, such as institutional repositories and partners in the Data Preservation Alliance for the Social Sciences (Data-PASS).

American Political Science Association “Guide to Professional Ethics” October 2012

6. Researchers have an ethical obligation to facilitate the evaluation of their evidence-based knowledge claims through data access, production transparency, and analytic transparency so that their work can be tested or replicated.

6.1 Data access: Researchers making evidence-based knowledge claims should reference the data they used to make those claims. If these are data they themselves generated or collected, researchers should provide access to those data or explain why they cannot.

6.2 Production transparency: Researchers providing access to data they themselves generated or collected, should offer a full account of the procedures used to collect or generate the data.

6.3 Analytic Transparency: Researchers making evidence-based knowledge claims should provide a full account of how they draw their analytic conclusions from the data, i.e., clearly explicate the links connecting data to conclusions.

American Political Science Association Guide to Professional Ethics, Rights and Freedoms

The American Economic Review: Data Availability Policy

It is the policy of the American Economic Review to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication. These will be posted on the AER Web site. The Editor should be notified at the time of submission if the data used in a paper are proprietary or if, for some other reason, the requirements above cannot be met. As soon as possible after acceptance, authors are expected to send their data, programs, and sufficient details to permit replication, in electronic form, to the AER office. … If a request for an exemption based on proprietary data is made, authors should inform the editors if the data can be accessed or obtained in some other way by independent researchers for purposes of replication. Authors are also asked to provide information on how the proprietary data can be obtained by others in their Readme PDF file. A copy of the programs used to create the final results is still required.

Concluding thoughts

• Changing researcher behavior is difficult
• The rewards of data citation are not enough
• Funding agencies and Journals
– have the greatest leverage for changing behavior
– are sympathetic to data access and transparency

What can we do?
Funding agencies
• Fund data stewardship
– Researchers should not be faced with a tradeoff between their scientific aims and data stewardship
• Enforce data management plans
• Improve funding of data repositories
– Recognize data repositories as scientific infrastructure
– Develop relevant evaluation criteria

What can we do?
Journals
• Guidelines to authors should include
– Data access policies
– Data citation policies
– Persistent identifiers for data
– Examples
• Keep it simple
– Focus on key elements: Author, Title, Date, Location (i.e. persistent identifier)
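As an illustration of how little a journal needs to ask for, the four key elements above can be sketched as a small record plus a formatter. This is only a sketch under stated assumptions: the class and function names, the element ordering, and the sample record (including its placeholder DOI) are hypothetical and do not come from the talk or from any journal's style guide. The only external fact relied on is that a DOI can be resolved by prefixing it with https://doi.org/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """The four key elements of a data citation (hypothetical structure)."""
    author: str
    title: str
    date: str        # publication year or full date
    identifier: str  # persistent identifier, assumed here to be a bare DOI


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link via the doi.org resolver."""
    return "https://doi.org/" + doi.strip()


def format_citation(c: DataCitation) -> str:
    """Render Author. Date. Title. Location. (the ordering is an assumption)."""
    return f"{c.author}. {c.date}. {c.title}. {doi_to_url(c.identifier)}"


# Hypothetical record; the DOI is a placeholder, not a real data set.
example = DataCitation(
    author="Doe, Jane",
    title="Example Survey of Households",
    date="2013",
    identifier="10.1234/example",
)
print(format_citation(example))
```

Keeping the record to four fields mirrors the "keep it simple" advice: anything a given journal wants beyond these (version, distributor) could be added as optional fields without changing the core.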

What can we do?
Data Archiving Community
• See the whole picture
• Train researchers in data management
– See Ball and Medeiros, “Teaching Students to Document Empirical Research” on YouTube
• Reduce the costs of capturing metadata in scientific workflows
• Rate journals on their policies and performance

Thank you!

George Alter