International Digital Curation Conference, Amsterdam, February 22, 2016
A Context-driven Approach to Data Curation for Reuse
Ixchel M. Faniel, Ph.D., Research Scientist, OCLC
Elizabeth Yakel, Ph.D., Professor, University of Michigan
DATA & DOCUMENTATION: .csv, codebooks, research design, survey, images, notes, shape files, specimens, artifacts, etc.
DATA DEPOSIT REQUIREMENTS: description, creator, title, publisher, date, donor, rights, collector, taxon, documents, subject, coverage, methods, etc.
Data producer
Repository staff
Data Reuser
But based on whose needs?
DATA & DOCUMENTATION: data collection, data producer, and repository information, prior reuse, missing data, research objectives, provenance, advice on reuse, etc.
The DSpace Digital Repository Model
DIPIR
• Nancy McGovern, ICPSR/MIT
• Ixchel Faniel, OCLC Research (PI)
• Eric Kansa, Open Context
• William Fink, UM Museum of Zoology
• Elizabeth Yakel, University of Michigan (Co-PI)
Sites: ICPSR, Open Context, UMMZ
Phase 1: Project start-up
• Interview staff: 10 (ICPSR), 4 (Open Context), 10 (UMMZ)
Phase 2: Collecting and analyzing user data
• Interview reusers: 43 (ICPSR), 22 (Open Context), 27 (UMMZ)
• Survey reusers: 1480
• Web analytics server logs
• Observe reusers: 13
Phase 3: Mapping data’s context to reusers’ needs
DIPIR Methodology
Interviews and Observations
Data Collection
• 92 interviews
• 13 researchers observed at the University of Michigan Museum of Zoology
Data Analysis
• 1st cycle coding
– based on interview protocol
– more codes added as necessary
• 2nd cycle coding for context
– Detailed context needed
– Place to get context
– Reason context is needed
Cross-disciplinary Context Model
• Contextual information reusers need
• Places reusers seek contextual information
• Reasons reusers require contextual information
“our wolf data are not making a lot of sense…so going back to the field notes sometimes can help.” (Zoologist 07)
“My definition, my theory fit the definition of what they were using to code.” (Social Scientist 02)
“If it’s somebody whose training I don’t know about, I’m going to be less likely to use their dataset because I'm not sure how reliable it is.” (Archaeologist 06)
First Card Sort Exercise
Rank order your cards based on the different types of context information your designated community of users need.
OUR CONTEXT MODEL VS. DATA DEPOSIT REQUIREMENTS
Second Card Sort Exercise
Re-sort your cards with your existing data deposit requirements in mind.
Data Deposit Requirements: Methodology
• Sample repository data deposit requirements in quantitative social science, archaeology, and zoology
– Our collaborators
– Places mentioned in our interviews
– Major repositories (larger, well-known, longevity) in each field
– English language documentation
– Online and physical facilities
Data Deposit Requirements: Sample

| Field | Sample Contacted | Data Deposit Requirements |
|---|---|---|
| Quantitative social science | 9 | 6 |
| Archaeology | 6 | 6 |
| Zoology | 14 | 7 |
Data Deposit Requirements: Analyses
• Types of information collected by repositories in each field
• Comparison with a generalized set of metadata derived from the group
• Comparison with peer institutions
• Comparison with what context users want
Archaeology: Generalized Requirements

| Field | ADS | DANS | tDAR | Open Context | Institute for Archaeology | Parks Canada | Average |
|---|---|---|---|---|---|---|---|
| Description | 1.0000 | 1.0000 | 0.9000 | 0.9000 | 0.0000 | 1.0000 | 0.8000 |
| Creator | 0.9577 | 1.0000 | 0.8481 | 1.0000 | 0.7236 | 0.0000 | 0.7549 |
| Title | 0.9000 | 1.0000 | 0.9143 | 1.0000 | 0.0000 | 0.0000 | 0.6357 |
| Publisher | 1.0000 | 1.0000 | 0.9000 | 0.0000 | 0.8069 | 0.0000 | 0.6178 |
| Date | 0.9179 | 0.8533 | 0.9143 | 0.7678 | 0.0000 | 0.0000 | 0.5756 |
| Location | 0.0000 | 0.0000 | 0.8800 | 1.0000 | 0.0000 | 1.0000 | 0.4800 |
| Source | 1.0000 | 1.0000 | 0.8329 | 0.0000 | 0.0000 | 0.0000 | 0.4721 |
| Identifier | 0.9733 | 1.0000 | 0.8277 | 0.0000 | 0.0000 | 0.0000 | 0.4668 |
| Type | 0.9000 | 0.8429 | 0.8600 | 0.0000 | 0.0000 | 0.0000 | 0.4338 |
| Contributor | 0.9765 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3294 |
| Subject | 1.0000 | 0.8444 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3074 |
| Coverage | 1.0000 | 0.9200 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3200 |
| Language | 1.0000 | 0.8533 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3089 |
| Format | 1.0000 | 0.8421 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3070 |
| Rights | 0.0000 | 0.8857 | 0.9143 | 0.0000 | 0.0000 | 0.0000 | 0.3000 |
| Comments | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.7574 | 1.0000 | 0.2929 |
| Relations | 1.0000 | 0.9643 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3274 |
| File | 0.9000 | 0.0000 | 0.8800 | 0.0000 | 0.0000 | 0.0000 | 0.2967 |
| Documents | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.8273 | 0.0000 | 0.1379 |
| Average | 0.7645 | 0.7372 | 0.5090 | 0.2457 | 0.1640 | 0.1579 | |
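
The Average column in the archaeology table is consistent with a plain arithmetic mean of the six repository scores in each row; for Description, (1.0 + 1.0 + 0.9 + 0.9 + 0.0 + 1.0) / 6 = 0.8. The slides do not say how the per-repository scores were produced, so the following Python sketch reproduces only the averaging step. The names `scores` and `row_average` are illustrative and do not come from the original.

```python
from statistics import mean

# Three rows copied from the archaeology table above.
# Each list holds the six repository scores in table column order.
scores = {
    "Description": [1.0000, 1.0000, 0.9000, 0.9000, 0.0000, 1.0000],
    "Creator": [0.9577, 1.0000, 0.8481, 1.0000, 0.7236, 0.0000],
    "Title": [0.9000, 1.0000, 0.9143, 1.0000, 0.0000, 0.0000],
}


def row_average(values):
    """Assumed rule: unweighted mean of a row, rounded to four decimals."""
    return round(mean(values), 4)


averages = {field: row_average(values) for field, values in scores.items()}

for field, avg in averages.items():
    print(f"{field}: {avg:.4f}")
```

A repository that does not request a field contributes 0.0000 to the mean, which is why fields requested by only one or two repositories end up with low averages.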
Quantitative Social Science: Generalized Requirements

| Field | UK Data Service | Odum | DANS | ICPSR | ADA | Roper | Average | Rank |
|---|---|---|---|---|---|---|---|---|
| Title | 1.0000 | 0.8727 | 1.0000 | 0.8667 | 0.9143 | 0.8727 | 0.9211 | 1.0000 |
| Description | 0.9000 | 0.8889 | 1.0000 | 0.8727 | 0.8490 | 0.9200 | 0.9051 | 2.0000 |
| Dates | 0.8727 | 0.8490 | 0.8382 | 0.9000 | 0.8135 | 0.9143 | 0.8646 | 3.0000 |
| Data Collector | 0.9779 | 0.8259 | 0.0000 | 0.8825 | 0.9500 | 0.8109 | 0.7412 | 4.0000 |
| Depositor | 0.9693 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | 0.0000 | 0.6616 | 5.0000 |
| Time Period | 0.9273 | 0.0000 | 0.0000 | 0.9333 | 0.9273 | 0.0000 | 0.4646 | 6.0000 |
| Funding | 1.0000 | 0.0000 | 0.0000 | 0.8533 | 0.8800 | 0.0000 | 0.4556 | 7.0000 |
| Subject | 0.9000 | 0.8222 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.4537 | 8.0000 |
| Principal Investigator | 0.0000 | 0.8109 | 0.0000 | 0.9455 | 0.9407 | 0.0000 | 0.4495 | 9.0000 |
| Location | 0.8533 | 0.8889 | 0.0000 | 0.0000 | 0.0000 | 0.8381 | 0.4301 | 10.0000 |
| Creator | 0.8975 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3163 | |
| Contributors | 0.0000 | 0.0000 | 0.9765 | 0.0000 | 0.0000 | 0.0000 | 0.1628 | |
| Language | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1667 | |
| Rights | 0.0000 | 0.0000 | 0.9000 | 0.0000 | 0.0000 | 0.0000 | 0.1500 | |
| Confidentiality | 0.9000 | 0.0000 | 0.0000 | 0.7828 | 0.0000 | 0.0000 | 0.2805 | |
| Source | 0.8533 | 0.8889 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2904 | |
| Relations | 0.0000 | 0.0000 | 0.9643 | 0.0000 | 0.0000 | 0.0000 | 0.1607 | |
| Documents | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| Notes | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| Methodology | 1.0000 | 0.8471 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.3078 | |
| Average | 0.6026 | 0.4347 | 0.4340 | 0.4018 | 0.3637 | 0.2178 | 0.2908 | |
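
The Rank column in the quantitative social science table is consistent with ordering fields by their Average score, highest first. The slides do not state the ranking rule, so the Python sketch below assumes a simple descending sort. `rank_fields` is a hypothetical helper, and the input is a deliberately shuffled subset of the Average values from the table.

```python
# Average scores copied from six rows of the quantitative social science table.
averages = {
    "Depositor": 0.6616,
    "Title": 0.9211,
    "Time Period": 0.4646,
    "Dates": 0.8646,
    "Description": 0.9051,
    "Data Collector": 0.7412,
}


def rank_fields(avgs):
    """Assumed rule: sort by average, highest first, and number from 1."""
    ordered = sorted(avgs.items(), key=lambda item: item[1], reverse=True)
    return [(rank, field, avg) for rank, (field, avg) in enumerate(ordered, start=1)]


ranked = rank_fields(averages)
top_three = [field for _, field, _ in ranked[:3]]

for rank, field, avg in ranked:
    print(f"{rank}. {field} ({avg:.4f})")
```

Taking the first few entries of such a ranking gives a short list of the most commonly requested fields in a discipline, which can then be set against the context information reusers say they need.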
Zoology: Generalized Requirements

| Field | DANS | Canadian Polar Network | Museum of Vertebrate Zoology (Berkeley) | GenBank | American Museum of Natural History | Dryad | Protein Data Bank | Average |
|---|---|---|---|---|---|---|---|---|
| Dates | 0.8503 | 0.0000 | 0.8570 | 0.8203 | 0.0000 | 0.0000 | 0.0000 | 0.3611 |
| Title | 1.0000 | 0.8750 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2679 |
| Language | 1.0000 | 0.8667 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2667 |
| Description | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.8500 | 0.0000 | 0.2643 |
| Source | 1.0000 | 0.0000 | 0.0000 | 0.8184 | 0.0000 | 0.0000 | 0.0000 | 0.2598 |
| Location | 0.0000 | 0.0000 | 0.9000 | 0.9000 | 0.0000 | 0.0000 | 0.0000 | 0.2571 |
| Donor | 0.0000 | 0.0000 | 0.8333 | 0.0000 | 0.8222 | 0.0000 | 0.0000 | 0.2365 |
| Sequence | 0.0000 | 0.0000 | 0.0000 | 0.8667 | 0.0000 | 0.0000 | 0.8079 | 0.2392 |
| Creator | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1429 |
| Subject | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1429 |
| Keywords | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1429 |
| Contributors | 0.9765 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1395 |
| Collector | 0.0000 | 0.7849 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1121 |
| Institution | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.8222 | 0.0000 | 0.0000 | 0.1175 |
| Period | 0.0000 | 0.9000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1286 |
| Format | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1429 |
| Rights | 0.9000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1286 |
| Methods | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Notes | 0.0000 | 0.0000 | 0.8727 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1247 |
| Relations | 0.9643 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1378 |
| Taxon | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Average | 0.5091 | 0.2108 | 0.1649 | 0.1622 | 0.0783 | 0.0405 | 0.0385 | |
Archaeology: Reuser Needs vs. Repository Requirements
Reuser needs:
• Data collection information
• Specimen/artifact information
• Repository
• Provenance
• Data analysis information
• Data producer information
• Digitization/curation information
Repository requirements:
• Description
• Creator
• Title
• Publisher
• Date
• Location
Quantitative Social Science: Reuser Context vs. Repository Requirements
Reuser context:
• Data collection information
• Data analysis information
• Data producer information
• Prior reuse
• Missing data
• Repository
Repository requirements:
• Title
• Description
• Dates
• Data Collector
• Depositor
• Time period
Zoology: Reuser Context vs. Repository Requirements
Reuser context:
• Specimen or artifact information
• Data collection information
• Data producer information
• Digitization/curation information
• Repository
• Provenance
• Data analysis information
Repository requirements:
• Dates
• Title
• Language
• Description
• Source
• Location
Impact of Our Approach and Findings
• Kathleen Fear, Ph.D.
– Data Librarian, University of Rochester
• Eric Kansa, Ph.D.
– Data Publisher, Open Context
Acknowledgements
• Institute of Museum and Library Services
• Partners: Nancy McGovern, Ph.D. (MIT), Eric Kansa, Ph.D. (Open Context), William Fink, Ph.D. (University of Michigan Museum of Zoology)
• OCLC Fellow: Julianna Barrera-Gomez
• Doctoral Students: Rebecca Frank, Adam Kriesberg, Morgan Daniels, Ayoung Yoon
• Master’s Students: Alexa Hagen, Jessica Schaengold, Gavin Strassel, Michele DeLia, Kathleen Fear, Mallory Hood, Annelise Doll, Monique Lowe
• Undergraduates: Molly Haig
Thank you
Ixchel M. Faniel, Research Scientist, OCLC
Elizabeth Yakel, Professor, University of Michigan, School of Information
©2015 OCLC and Elizabeth Yakel. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: This work uses content from A Context-driven Approach to Data Curation for Reuse © OCLC, Elizabeth Yakel used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.