Data resources for the future
Since 1962
The Selection, Appraisal, and Retention of Social Science Data in the United States
Myron Gutmann
Inter-university Consortium for Political and Social Research
ERPANET/CODATA Workshop on “The Selection, Appraisal And Retention of Digital Scientific Data,”
Lisbon, Portugal, December, 2003
Main Questions
• What constitutes Social Science Data?
• Who Archives the Data?
• How are Data Selected and Appraised?
• Prospects for the Future
Background
• Social science data shared for more than 150 years – published census volumes
• Social science data have been digital since 1890 – first Hollerith cards
• Data re-use is well-established in some social sciences, not in others
• Metadata standards well known
• Many horror stories of lost data – NORC Kennedy Assassination survey
About ICPSR
• International Consortium of >500 Higher Education Institutions worldwide
• Administered by University of Michigan
• Founded 1962 to Archive & Disseminate data & train data users
• On-line Data, Metadata, Bibliography
• Research on archiving: Metadata (DDI), Disclosure…
Question 1: What Constitutes Social Science Data?
Types of Data
• Categorical or Closed-End Survey Responses and Administrative Data:
– Censuses & Administrative Data (crimes, etc.)
– Sample surveys: Public Opinion Polls
– Social Research Surveys (also political/economic)
• Qualitative or Open-Ended Survey Responses
– Exact words of the responses are important
Who Produces Social Science Data in the U.S.?
• U.S. Federal Government Agencies
– Administrative Data (Census, etc.)
– Surveys (Current Population Survey, etc.)
• University Researchers
– Government-funded surveys
– University- & Foundation-funded surveys
• Private Researchers
– Marketing & Polling Firms
– Private Research Firms
Who Owns the Data?
• Government-produced data: the Government (federal, state, local)
• Government-funded data
– Contracts: Owned by Government
– Grants: Owned by University (=researcher)
• Privately-funded data: also owned by the University (= researcher)
Final Important Points
• Non-Confidential social and economic data collected by the U.S. Government are generally public and freely available
• U.S. Government Data are often archived simultaneously by public and private archives, which have different goals and techniques.
• Few university data owners have archiving policies, so researchers are on their own
Question 2: Who Archives the Data?
Five Major U.S. Archives
• National Archives and Records Administration (1976 – U.S. Gov’t)
• Inter-university Consortium for Political and Social Research (1962 - Michigan)
• Roper Center for Public Opinion Research (1947 - Connecticut)
• Odum Institute for Research in Social Science (1924 - North Carolina)
• Murray Research Center (1976 - Harvard-Radcliffe)
Size of Holdings – 14 TB at ICPSR
[Chart: Studies and Gigabytes held by NARA, ICPSR, Roper, Odum, and Murray; axis scale 0 to 16,000]
Each Archive is Somewhat Specialized
• NARA - Official Government Documents & rare Government-funded social science
• Roper Center – National public opinion polling and some other surveys
• Odum Institute – U.S. South & State & Regional public opinion polling
• Murray Center – Studies of Human Development, largely qualitative but some quantitative
ICPSR has the Largest Holdings and Broadest Mandate
• Originally focused on Political data
• Since 1970s, seeks out and receives submissions of a broad range of publicly- and privately-produced data
• Partnerships with government agencies and private foundations to archive and disseminate specialized data
Question 3: How are Data Selected and Appraised?
NARA and The University-Based Archives
NARA
• Electronic Records Division uses standard NARA Selection & Appraisal Standards
• Goal: Identify & Select Permanently Valuable Records from Government Agencies
• Generally like the situation Terry Eastwood described
ICPSR: Two Models
• “Voluntary” Archiving – Search for & select data of lasting value to preserve
– Appraisal early in process leads to little negative appraisal of final data
• “Contract” Archiving – P & disseminate data selected by the producer or funder
• Other Issues
– Confidentiality
– Metadata
– Perceived value for secondary analysis
University-Based Archives Have Collection Development Policies that Drive Selection/Appraisal
• ICPSR tracks major research grants
• Roper logs surveys that are mentioned in the press
• Odum collects regional polls
• Murray Center notes major longitudinal and qualitative studies
Bigger Challenge: Getting Data Owners to Archive Data
• NSF & NIH have active data sharing policies, but data owners have…
• Concerns about confidentiality of survey respondents
• Concerns about maintaining research priority vs. competitors
• Researchers lack time or motivation to prepare data for archiving
• Data just “fall through the cracks”
Question 4: Prospects for the Future
Dealing with the Contradictions
• More data produced
• More pressure to share
• More concern about confidentiality
• Easier access to shared data via the web
Future Activities
• Increasing interest by research and mission agencies to support archiving and preservation
• Hybrid services proposed to link PI-disseminated with permanently archived data in a “virtual” repository
• The five major U.S. Archives have joined together to ensure the preservation of 75 years of digital social science data.