Data resources for the future

Since 1962

The Selection, Appraisal, and Retention of Social Science Data in the United States

Myron Gutmann

Inter-university Consortium for Political and Social Research

ERPANET/CODATA Workshop on “The Selection, Appraisal And Retention of Digital Scientific Data,”

Lisbon, Portugal, December, 2003

Main Questions

• What Constitutes Social Science Data?

• Who Archives the Data?

• How are Data Selected and Appraised?

• Prospects for the Future

Background

• Social science data have been shared for more than 150 years (e.g., published census volumes)

• Social science data have been digital since 1890 (the first Hollerith punched cards)

• Data re-use is well established in some social sciences, but not in others

• Metadata standards are well known

• There are many horror stories of lost data, such as the NORC Kennedy Assassination survey

About ICPSR

• International Consortium of >500 Higher Education Institutions worldwide

• Administered by University of Michigan

• Founded 1962 to Archive & Disseminate data & train data users

• On-line Data, Metadata, Bibliography

• Research on archiving: Metadata (DDI), Disclosure…

Question 1: What Constitutes Social Science Data?

Types of Data

• Categorical or Closed-Ended Survey Responses and Administrative Data:
  – Censuses & Administrative Data (crimes, etc.)
  – Sample Surveys: Public Opinion Polls
  – Social Research Surveys (also political/economic)

• Qualitative or Open-Ended Survey Responses:
  – Exact words of the responses are important

Who Produces Social Science Data in the U.S.?

• U.S. Federal Government Agencies
  – Administrative Data (Census, etc.)
  – Surveys (Current Population Survey, etc.)

• University Researchers
  – Government-Funded Surveys
  – University & Foundation-Funded Surveys

• Private Researchers
  – Marketing & Polling Firms
  – Private Research Firms

Who Owns the Data?

• Government-produced data: the Government (federal, state, local)

• Government-funded data:
  – Contracts: owned by the Government
  – Grants: owned by the University (= researcher)

• Privately funded data: also owned by the University (= researcher)

Final Important Points

• Non-confidential social and economic data collected by the U.S. Government are generally public and freely available

• U.S. Government data are often archived simultaneously by public and private archives, which have different goals and techniques

• Few university data owners have archiving policies, so researchers are on their own

Question 2: Who Archives the Data?

Five Major U.S. Archives

• National Archives and Records Administration (1976 – U.S. Gov’t)

• Inter-university Consortium for Political and Social Research (1962 - Michigan)

• Roper Center for Public Opinion Research (1947 - Connecticut)

• Odum Institute for Research in Social Science (1924 - North Carolina)

• Murray Research Center (1976 - Harvard-Radcliffe)

Size of Holdings (14 TB at ICPSR)

[Chart: holdings of NARA, ICPSR, Roper, Odum, and Murray, measured in number of studies and in gigabytes; vertical scale 0 to 16,000]

Each Archive Is Somewhat Specialized

• NARA – Official Government Documents & rare Government-funded social science data

• Roper Center – National public opinion polling and some other surveys

• Odum Institute – U.S. South & State & Regional public opinion polling

• Murray Center – Studies of Human Development, largely qualitative but some quantitative

ICPSR Has the Largest Holdings and Broadest Mandate

• Originally focused on political data

• Since the 1970s, has sought out and received submissions of a broad range of publicly and privately produced data

• Partnerships with government agencies and private foundations to archive and disseminate specialized data

Question 3: How Are Data Selected and Appraised?

NARA and the University-Based Archives

NARA

• The Electronic Records Division uses NARA's standard selection and appraisal criteria

• Goal: Identify & Select Permanently Valuable Records from Government Agencies

• Generally similar to the situation Terry Eastwood described

ICPSR: Two Models

• “Voluntary” Archiving: search for & select data of lasting value to preserve
  – Appraisal early in the process leads to little negative appraisal of the final data

• “Contract” Archiving: preserve & disseminate data selected by the producer or funder

• Other Issues
  – Confidentiality
  – Metadata
  – Perceived value for secondary analysis

University-Based Archives Have Collection Development Policies that Drive Selection/Appraisal

• ICPSR tracks major research grants

• Roper logs surveys that are mentioned in the press

• Odum collects regional polls

• Murray Center notes major longitudinal and qualitative studies

Bigger Challenge: Getting Data Owners to Archive Data

• NSF & NIH have active data-sharing policies, but data owners have…

• Concerns about confidentiality of survey respondents

• Concerns about maintaining research priority vs. competitors

• Researchers lack time or motivation to prepare data for archiving

• Data just “fall through the cracks”

Question 4: Prospects for the Future

Dealing with the Contradictions

• More data produced

• More pressure to share

• More concern about confidentiality

• Easier access to shared data via the web

Future Activities

• Increasing interest by research and mission agencies to support archiving and preservation

• Hybrid services proposed to link PI-disseminated with permanently archived data in a “virtual” repository

• The five major U.S. archives have joined together to ensure the preservation of 75 years of digital social science data.