Contouring Curation in Research Libraries: Defining “Working” Data Units and Communities Carole L. Palmer Center for Informatics Research in Science & Scholarship FOURTH BLOOMSBURY CONFERENCE ON E-PUBLISHING AND E-PUBLICATIONS Valued Resources: Roles and Responsibilities of Digital Curators and Publishers 24-25 JUNE 2010


Page 1

Contouring Curation in Research Libraries:

Defining “Working” Data Units and Communities

Carole L. Palmer

Center for Informatics Research in Science & Scholarship

FOURTH BLOOMSBURY CONFERENCE ON E-PUBLISHING AND E-PUBLICATIONS

Valued Resources: Roles and Responsibilities of Digital Curators and Publishers

24-25 JUNE 2010

Page 2

Data curation and the future of research libraries

Data assets vital for universities and research centers

- to produce competitive science and scholarship

- to be good stewards of the common good produced through research

Natural extension of research library mission - to provide information resources to support current and future scholarship

Flickr: stancia, rh creative commons flickr.com/photos/001fj/2907653323/

The new stacks? (W. Tabb)

The new special collections? (S. Choudhury)

Page 3

Same “metascience” & specialist responsibilities

ON THE RESEARCH TEAM & IN THE LIBRARY

(Bates 1999)

But comprehensive and functioning infrastructure and services, envisioned for interdisciplinary & multi-scale science and scholarship, require information and data expertise

Provide access and promote sharing of broad landscape of information

• across institutions and disciplines: in the tradition of union catalogs, bibliographies of bibliographies

• across generations: long-term, just-in-case collecting

Page 4

Research on range of organizational structures

Research libraries will provide direct support for some - align with and connect to others

local cross-departmental data – “faculty of the environment”

geographic site cross-disciplinary data – unique research intensive location

disciplinary “resource collections” – neuroscience case

institutional repository services – individuals, across disciplines

national research library initiative – Data Conservancy

Functionality will need to support “strategic reading” (Renear & Palmer, 2009), not just of literature, but of data sets as well.

Page 5

Information and Discovery in Neuroscience Project (NSF/CISE, 2002-2005)

Tensions managing data repository efforts & scientific research activities

Depositor & user perspectives: 341 multi-scale, multi-format data sets - cell biologists, microscopists, modelers

Used with permission from NCMIR

Discipline based repository

Important functions beyond archiving and access

Registration, certification, awareness function (see Cragin, 2009 dissertation)

Implications for moving “research” collections to “resource” level repositories

Methods development - progressive, critical materials approach to data collection from multiple information seeking, use, and management perspectives

Page 6

Institutional repository

Data Curation Profiles Project (IMLS NLG 2007-2010)

Individual scientist’s data production workflows and perspectives on sharing

Scott Brandt, PI; Collaborators: M. Witt & J. Carlson (Purdue); Palmer, Cragin, & Shreeves (Illinois)

• derive requirements for managing data sets in IRs

• develop policies for archiving and access

• articulate librarian roles & skill sets for supporting archiving & sharing

Biochemistry, Biology, Civil Engineering, Electrical Engineering, Food Sciences, Earth and Atmospheric Sciences, Soil Science

Anthropology, Geology, Plant Sciences, Kinesiology, Speech and Hearing, Earth and Atmospheric Sciences, Soil Science

Page 7

Data collection and analysis

Interviews - with scientists and data managers

Case Studies - with selected research groups in geology and civil engineering

Focus Groups - with liaison librarians on their work with academic researchers related to data issues

Needs Analysis - policy assertions for preservation and access, based on researchers as data producers, suppliers, and users

Curation Profiles - detailed disciplinary profiles; instrument for curatorial practice

Page 8

Integrated and comprehensive data curation strategy

to collect, organize, validate, and preserve data to address grand research challenges that face society

Infrastructure builds on & connects existing exemplar projects and communities

deep engagement with scientists; extensive experience with large-scale, distributed system development.

Research libraries will be a core part of the emerging, distributed network of data collections and services.

Data Conservancy - assertion and approach

Nationally scoped research library repository

Page 9

Data Conservancy.org

PI, Sayeed Choudhury, Sheridan Libraries

Network of domain and data scientists, information and computer scientists, enterprise experts, librarians, and engineers.

Carl Lagoze Cornell University

Mary Marlino National Center for Atmospheric Research (NCAR)

Carole Palmer CIRSS, GSLIS, University of Illinois at U-C

Paddy Patterson Marine Biological Laboratory

Chris Borgman University of California Los Angeles

Ruth Duerr National Snow and Ice Data Center

Mark Evans Tessella, Inc.

Eileen Fenton Portico

Sandy Payette DuraSpace / Fedora Commons

Co-PIs and Partners

Page 10

Success in data standards, practices, documentation, and associated services

Ingest astronomy data into preservation archive, connect data to existing services used by astronomers.

Demonstrate utility of hosting data in environment that supports existing scientific capabilities in a sustainable manner.

Astronomy as an exemplar community

Scope to include: life sciences, earth sciences, social sciences

Page 11

Science and library based hubs

Marine Biological Laboratory

Encyclopedia of Life - taxonomic organization, ontology indexing; species identification queries for climate change analyses

National Snow & Ice Data Center

extensive sensor network, fieldwork, aircraft and satellite data; access node on the DC network, test bed for distributed services

National Center for Atmospheric Research

civic decision making and climate science in megacities

Cornell University Library

DataStar - promotes archiving to disciplinary data centers

arXiv eprints - OAI-ORE to link research data with publications

Page 12

Data framework

Start with a common conceptualization that applies across domains - scientific observation

Examine, adapt, and adopt existing models

National Virtual Observatory; Scientific Observations Network (SONet)

Define fundamental concepts and identity conditions – collections, data sets, version, etc.

(Data Concepts team at Illinois, led by Allen Renear)

Accommodate range of disciplinary data and metadata standards

-- dozens in earth, atmospheric, soil science alone,

yet the “typical” scientist may know of none

Page 13

User requirements and research

Astronomy, Life Sciences, Earth Sciences, Social Sciences

NCAR

Task-based design and usability testing

Use cases, data requirements, system recommendations

UCLA

Ethnography, oral histories

Use cases, data reqs.

SMALL SCIENCE - reuse potentials

Curation requirements framework relating data characteristics and stages (metadata & provenance) to community data practices

ILLINOIS

Page 14

Applying quasi-profiling approach

Data kinds and stages - sharing targets, workflow/ provenance, context

Intellectual property - owner(s), stakeholders, terms of use, attribution

Ingest org /description – formal / local standards, documentation

Access - embargo, access control, mirror site

Preservation – targets, duration, migration

Tools - analytical, visualization, integration

Interoperability - needs, APIs, 3rd party data, etc.

Storage, integrity, security - audits, version control

Discovery – browse, search, external
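One way to picture the profile facets listed above is as a simple record with one field per facet. The sketch below is illustrative only: the class name `CurationProfile`, its field names, and the `missing_facets` helper are hypothetical and are not part of any Data Conservancy or Data Curation Profiles instrument.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CurationProfile:
    """Hypothetical record: one list of notes per facet named on the slide."""

    data_set: str  # label for the data "set" being profiled (assumed field)
    data_kinds_and_stages: List[str] = field(default_factory=list)
    intellectual_property: List[str] = field(default_factory=list)
    ingest_description: List[str] = field(default_factory=list)
    access: List[str] = field(default_factory=list)
    preservation: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    interoperability: List[str] = field(default_factory=list)
    storage_integrity_security: List[str] = field(default_factory=list)
    discovery: List[str] = field(default_factory=list)

    def missing_facets(self) -> List[str]:
        """Return the names of facets that have no notes recorded yet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "data_set" and not getattr(self, f.name)
        ]


# Example values are made up for illustration.
profile = CurationProfile(
    data_set="example soil measurements",
    access=["embargo"],
    preservation=["migration"],
)
print(profile.missing_facets())
```

A record like this makes it easy to see which facets still need to be covered in a follow-up session with a research group.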

Page 15

Progressive data collection

Talking shop about data - efficient exchange with the right scientists about the right things

Scientists leading research - IP, access, discovery, research context

• Pre-interview worksheets

• Semi-structured interviews

• Follow-up sessions with selected participants

Scientists managing data - stages, versions, standards, tools

(post docs, others from labs and research groups)

• Data deposit & sharing worksheet

• Data samples, related documentation

Page 16

Units of analysis

Data “sets”

aligned with research group production and dissemination

workflows and services

policies on attribution, embargoing, etc.

Data communities

Aligned with current and future interactions around data

representation, functionality, and use

policies for selection, appraisal, retention, description

Page 17

Data communities

What are the meaningful social units for organization and use of data over the long term?

• Sub-discipline focused on particular kinds of data that produce specific measurements or analysis

• Specialized domain focused on a research problem, often interdisciplinary in nature

• Developers of shared community-level data collection (i.e., “Resource Collection”, NSB 2005)

Core research challenge:

Predict and design for communities of users, which will differ from producers, and change over time

Page 18

Systems oriented “small” science: Geobiology, Volcanology, Soil ecology

Analytical data unit

Geobiology - Site-specific time series:

• reduced spreadsheets: rock, water, microbial

• microscopy images

• annotated digital photographs

Volcanology - Rock profile:

• physical rock

• thin section

• chemical analysis

• photographs

• field notes

Soil ecology - Database:

• multiple abiotic soil measurements

• associated metadata

User communities

Geobiology - Geology, Chemistry, Microbiology, Genomics, U.S. Park Service

Volcanology - Geology (igneous petrology), Geophysics, Geochemistry

Soil ecology - Geology (biogeochemistry), Earthworm ecology, Sensor network researchers

Sharing conventions

Geobiology - by request; no repository; mostly post-publication, some unpublished

Volcanology - by request; no repository

Soil ecology - public resource collection

At present, literature and conference-based sharing relationships

Individual data components required for reuse

Page 19

Research informing LIS education

Preparing information professionals for range of workforce demands:

Summer Institutes - in-service professional development, 2008 -

Biological Information Specialist - Masters in bioinformatics, 2006 -

Curation in the Humanities, Curation in the Sciences - MSLIS concentration in data curation: sciences, 2006 -; humanities, 2008 -

Page 20

6th International Digital Curation Conference

Chicago, IL, Dec. 6-8, 2010

hosted by CIRSS / GSLIS

in partnership with Digital Curation Centre, UK

pre-conference DataNet Education Summit

post-conference LIS Research Summit

Page 21

Questions & comments, please

Center for Informatics Research in Science and Scholarship

[email protected]

http://cirss.lis.uiuc.edu/

Page 23

Data curation is . . .

the active and on-going management of (research) data through its lifecycle of interest and usefulness

to scholarship, science, and education.

Tasks

• appraisal and selection

• representation

• authentication

• data integrity

• maintaining links

• format conversions

Functions

• enable discovery and retrieval

• maintain data quality

• add value

• provide for re-use over time

• archiving

• preservation