International Digital Curation Conference, Amsterdam, February 22, 2016 A Context-driven Approach to Data Curation for Reuse Ixchel M. Faniel, Ph.D. Research Scientist, OCLC Elizabeth Yakel, Ph.D. Professor, University of Michigan


Page 1: A Context-driven Approach to Data Curation for Reuse

International Digital Curation Conference, Amsterdam, February 22, 2016

A Context-driven Approach to Data Curation for Reuse

Ixchel M. Faniel, Ph.D., Research Scientist, OCLC

Elizabeth Yakel, Ph.D., Professor, University of Michigan

Page 2: A Context-driven Approach to Data Curation for Reuse

DATA & DOCUMENTATION: .csv, codebooks, research design, survey, images, notes, shape files, specimens, artifacts, etc.

DATA DEPOSIT REQUIREMENTS: description, creator, title, publisher, date, donor, rights, collector, taxon, documents, subject, coverage, methods, etc.

Data producer

Repository staff

Data reuser

But based on whose needs?

A Context-driven Approach to Data Curation for Reuse

DATA & DOCUMENTATION: data collection, data producer, and repository information, prior reuse, missing data, research objectives, provenance, advice on reuse, etc.

The DSpace Digital Repository Model

Page 3: A Context-driven Approach to Data Curation for Reuse

DIPIR

• Nancy McGovern, ICPSR/MIT
• Ixchel Faniel, OCLC Research (PI)
• Eric Kansa, Open Context
• William Fink, UM Museum of Zoology
• Elizabeth Yakel, University of Michigan (Co-PI)

Page 4: A Context-driven Approach to Data Curation for Reuse

DIPIR Methodology

Sites: ICPSR, Open Context, UMMZ

Phase 1: Project Start up
• Interview staff: ICPSR 10, Open Context 4, UMMZ 10

Phase 2: Collecting and analyzing user data
• Interview reusers: ICPSR 43, Open Context 22, UMMZ 27
• Survey reusers: 1480
• Web analytics: server logs
• Observe reusers: 13

Phase 3: Mapping data’s context to reusers’ needs

Page 5: A Context-driven Approach to Data Curation for Reuse

Interviews and Observations

Data Collection
• 92 interviews
• 13 researchers observed at the University of Michigan Museum of Zoology

Data Analysis
• 1st cycle coding
– based on interview protocol
– more codes added as necessary
• 2nd cycle coding for context
– Detailed context needed
– Place to get context
– Reason context is needed

Page 6: A Context-driven Approach to Data Curation for Reuse

Cross-disciplinary Context Model

• Contextual information reusers need
• Places reusers seek contextual information
• Reasons reusers require contextual information

“our wolf data are not making a lot of sense…so going back to the field notes sometimes can help.” (Zoologist 07)

“My definition, my theory fit the definition of what they were using to code.” (Social Scientist 02)

“If it’s somebody whose training I don’t know about, I’m going to be less likely to use their dataset because I’m not sure how reliable it is.” (Archaeologist 06)

Page 7: A Context-driven Approach to Data Curation for Reuse

First Card Sort Exercise

Rank order your cards based on the different types of context information your designated community of users needs.

Page 8: A Context-driven Approach to Data Curation for Reuse
Page 9: A Context-driven Approach to Data Curation for Reuse
Page 10: A Context-driven Approach to Data Curation for Reuse
Page 11: A Context-driven Approach to Data Curation for Reuse
Page 12: A Context-driven Approach to Data Curation for Reuse

OUR CONTEXT MODEL VS. DATA DEPOSIT REQUIREMENTS

Page 13: A Context-driven Approach to Data Curation for Reuse

Second Card Sort Exercise

Re-sort your cards with your existing data deposit requirements in mind.

Page 14: A Context-driven Approach to Data Curation for Reuse

Data Deposit Requirements: Methodology

• Sample repository data deposit requirements in quantitative social science, archaeology, and zoology
– Our collaborators
– Places mentioned in our interviews
– Major repositories (larger, well-known, longevity) in each field
– English language documentation
– Online and physical facilities

Page 15: A Context-driven Approach to Data Curation for Reuse

Data Deposit Requirements: Sample

Field                         Sample Contacted   Data Deposit Requirements
Quantitative social science   9                  6
Archaeology                   6                  6
Zoology                       14                 7

Page 16: A Context-driven Approach to Data Curation for Reuse

Data Deposit Requirements: Analyses

• Types of information collected by repositories in each field
• Comparison with a generalized set of metadata derived from the group
• Comparison with peer institutions
• Comparison with what context users want

Page 17: A Context-driven Approach to Data Curation for Reuse

ARCHAEOLOGY: Generalized Requirements

                                      Open     Institute for  Parks
Requirement   ADS     DANS    tDAR    Context  Archaeology    Canada  Average
Description   1.0000  1.0000  0.9000  0.9000   0.0000         1.0000  0.8000
Creator       0.9577  1.0000  0.8481  1.0000   0.7236         0.0000  0.7549
Title         0.9000  1.0000  0.9143  1.0000   0.0000         0.0000  0.6357
Publisher     1.0000  1.0000  0.9000  0.0000   0.8069         0.0000  0.6178
Date          0.9179  0.8533  0.9143  0.7678   0.0000         0.0000  0.5756
Location      0.0000  0.0000  0.8800  1.0000   0.0000         1.0000  0.4800
Source        1.0000  1.0000  0.8329  0.0000   0.0000         0.0000  0.4721
Identifier    0.9733  1.0000  0.8277  0.0000   0.0000         0.0000  0.4668
Type          0.9000  0.8429  0.8600  0.0000   0.0000         0.0000  0.4338
Contributor   0.9765  1.0000  0.0000  0.0000   0.0000         0.0000  0.3294
Subject       1.0000  0.8444  0.0000  0.0000   0.0000         0.0000  0.3074
Coverage      1.0000  0.9200  0.0000  0.0000   0.0000         0.0000  0.3200
Language      1.0000  0.8533  0.0000  0.0000   0.0000         0.0000  0.3089
Format        1.0000  0.8421  0.0000  0.0000   0.0000         0.0000  0.3070
Rights        0.0000  0.8857  0.9143  0.0000   0.0000         0.0000  0.3000
Comments      0.0000  0.0000  0.0000  0.0000   0.7574         1.0000  0.2929
Relations     1.0000  0.9643  0.0000  0.0000   0.0000         0.0000  0.3274
File          0.9000  0.0000  0.8800  0.0000   0.0000         0.0000  0.2967
Documents     0.0000  0.0000  0.0000  0.0000   0.8273         0.0000  0.1379
Average       0.7645  0.7372  0.5090  0.2457   0.1640         0.1579
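
The Average column in the archaeology table appears to be a plain mean of each row across the six repositories, with a score of 0 counted like any other value. The slides do not say how the individual scores were produced, so the sketch below only reproduces the averaging step, using the first three rows as copied from the table. The function name `row_average` is a hypothetical helper, not something taken from the presentation.

```python
# Scores copied from the first three rows of the archaeology table.
# Column order: ADS, DANS, tDAR, Open Context, Institute for Archaeology, Parks Canada.
scores = {
    "Description": [1.0000, 1.0000, 0.9000, 0.9000, 0.0000, 1.0000],
    "Creator":     [0.9577, 1.0000, 0.8481, 1.0000, 0.7236, 0.0000],
    "Title":       [0.9000, 1.0000, 0.9143, 1.0000, 0.0000, 0.0000],
}


def row_average(values):
    """Mean over all repositories; zeros are included (assumed from the table)."""
    return round(sum(values) / len(values), 4)


averages = {field: row_average(values) for field, values in scores.items()}

for field, average in averages.items():
    print(f"{field:<12} {average:.4f}")
```

Under that assumption the three results agree with the Average column shown for Description, Creator, and Title.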

Page 18: A Context-driven Approach to Data Curation for Reuse

QUANTITATIVE SOCIAL SCIENCE: Generalized Requirements

                        UK Data
Requirement             Service  Odum    DANS    ICPSR   ADA     Roper   Average  Rank
Title                   1.0000   0.8727  1.0000  0.8667  0.9143  0.8727  0.9211   1
Description             0.9000   0.8889  1.0000  0.8727  0.8490  0.9200  0.9051   2
Dates                   0.8727   0.8490  0.8382  0.9000  0.8135  0.9143  0.8646   3
Data Collector          0.9779   0.8259  0.0000  0.8825  0.9500  0.8109  0.7412   4
Depositor               0.9693   1.0000  0.0000  1.0000  1.0000  0.0000  0.6616   5
Time Period             0.9273   0.0000  0.0000  0.9333  0.9273  0.0000  0.4646   6
Funding                 1.0000   0.0000  0.0000  0.8533  0.8800  0.0000  0.4556   7
Subject                 0.9000   0.8222  1.0000  0.0000  0.0000  0.0000  0.4537   8
Principal Investigator  0.0000   0.8109  0.0000  0.9455  0.9407  0.0000  0.4495   9
Location                0.8533   0.8889  0.0000  0.0000  0.0000  0.8381  0.4301   10
Creator                 0.8975   0.0000  1.0000  0.0000  0.0000  0.0000  0.3163
Contributors            0.0000   0.0000  0.9765  0.0000  0.0000  0.0000  0.1628
Language                0.0000   0.0000  1.0000  0.0000  0.0000  0.0000  0.1667
Rights                  0.0000   0.0000  0.9000  0.0000  0.0000  0.0000  0.1500
Confidentiality         0.9000   0.0000  0.0000  0.7828  0.0000  0.0000  0.2805
Source                  0.8533   0.8889  0.0000  0.0000  0.0000  0.0000  0.2904
Relations               0.0000   0.0000  0.9643  0.0000  0.0000  0.0000  0.1607
Documents               0.0000   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
Notes                   0.0000   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
Methodology             1.0000   0.8471  0.0000  0.0000  0.0000  0.0000  0.3078
Average                 0.6026   0.4347  0.4340  0.4018  0.3637  0.2178  0.2908
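
The Rank column in the quantitative social science table is consistent with ordering the requirements by their Average value, highest first, and numbering the first ten. The sketch below illustrates that reading; the averages are copied from the table, while the cutoff `TOP_N` is an assumption based on the fact that only ten rows carry a rank.

```python
# Average column copied from the quantitative social science table.
averages = {
    "Title": 0.9211, "Description": 0.9051, "Dates": 0.8646,
    "Data Collector": 0.7412, "Depositor": 0.6616, "Time Period": 0.4646,
    "Funding": 0.4556, "Subject": 0.4537, "Principal Investigator": 0.4495,
    "Location": 0.4301, "Creator": 0.3163, "Contributors": 0.1628,
    "Language": 0.1667, "Rights": 0.1500, "Confidentiality": 0.2805,
    "Source": 0.2904, "Relations": 0.1607, "Documents": 0.0000,
    "Notes": 0.0000, "Methodology": 0.3078,
}

TOP_N = 10  # assumption: only the ten highest averages receive a rank

# Sort requirement names by average, highest first, then number the top ten.
ordered = sorted(averages, key=averages.get, reverse=True)
ranks = {field: position for position, field in enumerate(ordered[:TOP_N], start=1)}

for field, rank in ranks.items():
    print(f"{rank:>2}  {field:<24} {averages[field]:.4f}")
```

Ranking on the average rather than on any single repository keeps one unusually detailed repository from dominating the list.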

Page 19: A Context-driven Approach to Data Curation for Reuse

ZOOLOGY: Generalized Requirements

                      Canadian       Museum of Vertebrate           American Museum of          Protein
Requirement   DANS    Polar Network  Zoology (Berkeley)    GenBank  Natural History     Dryad   Data Bank  Average
Dates         0.8503  0.0000         0.8570                0.8203   0.0000              0.0000  0.0000     0.3611
Title         1.0000  0.8750         0.0000                0.0000   0.0000              0.0000  0.0000     0.2679
Language      1.0000  0.8667         0.0000                0.0000   0.0000              0.0000  0.0000     0.2667
Description   1.0000  0.0000         0.0000                0.0000   0.0000              0.8500  0.0000     0.2643
Source        1.0000  0.0000         0.0000                0.8184   0.0000              0.0000  0.0000     0.2598
Location      0.0000  0.0000         0.9000                0.9000   0.0000              0.0000  0.0000     0.2571
Donor         0.0000  0.0000         0.8333                0.0000   0.8222              0.0000  0.0000     0.2365
Sequence      0.0000  0.0000         0.0000                0.8667   0.0000              0.0000  0.8079     0.2392
Creator       1.0000  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1429
Subject       1.0000  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1429
Keywords      0.0000  1.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1429
Contributors  0.9765  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1395
Collector     0.0000  0.7849         0.0000                0.0000   0.0000              0.0000  0.0000     0.1121
Institution   0.0000  0.0000         0.0000                0.0000   0.8222              0.0000  0.0000     0.1175
Period        0.0000  0.9000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1286
Format        1.0000  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1429
Rights        0.9000  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1286
Methods       0.0000  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.0000
Notes         0.0000  0.0000         0.8727                0.0000   0.0000              0.0000  0.0000     0.1247
Relations     0.9643  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.1378
Taxon         0.0000  0.0000         0.0000                0.0000   0.0000              0.0000  0.0000     0.0000
Average       0.5091  0.2108         0.1649                0.1622   0.0783              0.0405  0.0385

Page 20: A Context-driven Approach to Data Curation for Reuse

Archaeology: Reuser Needs vs. Repository Requirements

Reuser needs:
• Data collection information
• Specimen/artifact information
• Repository
• Provenance
• Data analysis information
• Data producer information
• Digitization/curation information

Repository requirements:
• Description
• Creator
• Title
• Publisher
• Date
• Location

Page 21: A Context-driven Approach to Data Curation for Reuse

Quantitative Social Science: Reuser Context vs. Repository Requirements

Reuser context:
• Data collection information
• Data analysis information
• Data producer information
• Prior reuse
• Missing data
• Repository

Repository requirements:
• Title
• Description
• Dates
• Data Collector
• Depositor
• Time period

Page 22: A Context-driven Approach to Data Curation for Reuse

Zoology: Reuser Context vs. Repository Requirements

Reuser context:
• Specimen or artifact information
• Data collection information
• Data producer information
• Digitization/curation information
• Repository
• Provenance
• Data analysis information

Repository requirements:
• Dates
• Title
• Language
• Description
• Source
• Location

Page 23: A Context-driven Approach to Data Curation for Reuse

Impact of Our Approach and Findings

• Kathleen Fear, Ph.D.
– Data Librarian, University of Rochester
• Eric Kansa, Ph.D.
– Data Publisher, Open Context

Page 24: A Context-driven Approach to Data Curation for Reuse

Acknowledgements

• Institute of Museum and Library Services
• Partners: Nancy McGovern, Ph.D. (MIT), Eric Kansa, Ph.D. (Open Context), William Fink, Ph.D. (University of Michigan Museum of Zoology)
• OCLC Fellow: Julianna Barrera-Gomez
• Doctoral Students: Rebecca Frank, Adam Kriesberg, Morgan Daniels, Ayoung Yoon
• Master’s Students: Alexa Hagen, Jessica Schaengold, Gavin Strassel, Michele DeLia, Kathleen Fear, Mallory Hood, Annelise Doll, Monique Lowe
• Undergraduates: Molly Haig

Page 25: A Context-driven Approach to Data Curation for Reuse

Thank you

Ixchel M. Faniel
Research Scientist, OCLC

[email protected]

©2015 OCLC and Elizabeth Yakel. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: This work uses content from A Context-driven Approach to Data Curation for Reuse © OCLC, Elizabeth Yakel used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.

Elizabeth Yakel
Professor, University of Michigan, School of Information

[email protected]