SEAD Virtual Archive: Building a Federation of Institutional Repositories for Long-Term Data Preservation in Sustainability Science

Beth Plale, Indiana University, Bloomington, Indiana, USA
Robert H. McDonald, Indiana University, Bloomington, Indiana, USA
Kavitha Chandrasekar, Indiana University, Bloomington, Indiana, USA
Inna Kouper, Indiana University, Bloomington, Indiana, USA
Stacy Konkiel, Indiana University, Bloomington, Indiana, USA
Margaret L. Hedstrom, University of Michigan, Ann Arbor, Michigan, USA
Jim Myers, Rensselaer Polytechnic Institute, Troy, New York, USA
Praveen Kumar, University of Illinois, Urbana, Illinois, USA

Cooperative agreement #OCI0940824

IDCC 2013 – Amsterdam – Jan. 16, 2013




SEAD TEAMS

Michigan: Margaret Hedstrom (PI), Marietta Van Buhler, Karen Woollams, George Alter (ICPSR), Bryan Beecher (ICPSR)

Indiana: Beth Plale (Co-PI), Katy Börner, Robert H. McDonald, Robert Light, Kavitha Chandrasekar, Stacy Kowalczyk, Inna Kouper, Stacy Konkiel, Robert Ping, Ryan Cobine

Rensselaer: James Myers (Co-PI), Ram Prasanna Govind Krishnan, Lindsay Todd

Illinois: Praveen Kumar (Co-PI), Terry McLaren (NCSA), Rob Kooper (NCSA), Luigi Marini (NCSA)


Challenge: The Data Deluge

1. Scientific data ingestion must be quick and minimally intrusive on a scientist’s time.
2. Ingesting must be flexible enough to handle the varied kinds of data: sizes // formats // composition.
3. Tools for advertising and serving data from an institutional repository need to be consistent with tools and processes of the scientific community.


Challenge: Long Tail Scientific Research

• Many research niches
  – customized methods & toolsets
  – localized storage
• Less consideration for long-term availability and data reuse


Requirements of Virtual Archive for Sustainability Science

• Must connect multiple IRs
• Must be minimally intrusive on a scientist’s time
• Must handle varied data:
  – multi-GB collection,
  – vastly heterogeneous collection of files,
  – small complex database of a thousand variables, or
  – set of files in formats that are unique to the subdiscipline
• Must be consistent with tools and processes of the community


SEAD

[Figure: SEAD architecture.
Components:
– Active Curation Repository (ACR): metadata harvest, annotation, web tools
– SEAD VIVO: social networking; links data sets and researchers
– SEAD Virtual Archive (SVA): manage sustainability science window to multiple IRs; OAIS model
Institutional repositories: IUScholarWorks IR, UIUC IDEALS IR, UMich Deep Blue IR
Arrow labels: publish, associate, discover, ingest]


[Figure: the ACR, SEAD VIVO, and SVA boxes from the SEAD architecture diagram, with the same labels]

SEAD Virtual Archive (SVA)
• Design
• Policy Decisions
• Progress to Date

[Single view into data] [Easy deposit]


SEAD Virtual Archive Workflow

[Figure: workflow diagram.
Boxes: Preview Data; Upload Data to VA; Run Virus Checking; File Characterization; Mint DOI; Deposit to IR (& cloud); Update DOI target; Index Metadata; Index Scientific Metadata; Large Dataset Decision
Legend: Ongoing work
Boxes listed after the legend: Version Data; IR Matchmaker; Index Scientific Metadata; Accept Repository Agreement]
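
The workflow can be pictured as an ordered list of steps applied to one submission. The sketch below is illustrative only and is not the SEAD implementation: the `Submission` class, every function name, the example DOI, and the `example.org`/`example.edu` URLs are assumptions made up for this example. It follows the step order shown on the matchmaker architecture slide (file check, mint DOI, deposit, update DOI target, index), and shows why the DOI target is updated after deposit: the identifier is minted before the data has a final home in an IR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Submission:
    """A collection moving through the sketch pipeline (hypothetical model)."""
    contributor: str
    files: Dict[str, bytes]
    doi: str = ""
    doi_target: str = ""
    repository: str = ""
    completed: List[str] = field(default_factory=list)


def file_check(sub: Submission) -> None:
    # Stand-in for virus checking and file characterization.
    if not sub.files:
        raise ValueError("submission contains no files")


def mint_doi(sub: Submission) -> None:
    # Hypothetical identifier; the DOI first points at a holding page in the VA.
    sub.doi = "doi:10.5072/EXAMPLE-0001"
    sub.doi_target = "https://va.example.org/pending/0001"


def deposit_to_ir(sub: Submission) -> None:
    # Stand-in for the transfer to whichever IR was selected.
    sub.repository = "ir.example.edu"


def update_doi_target(sub: Submission) -> None:
    # Once the IR holds the data, repoint the DOI at the IR's landing page.
    sub.doi_target = f"https://{sub.repository}/items/0001"


def index_metadata(sub: Submission) -> None:
    # Stand-in for metadata indexing; nothing to do in this sketch.
    pass


PIPELINE: List[Callable[[Submission], None]] = [
    file_check,
    mint_doi,
    deposit_to_ir,
    update_doi_target,
    index_metadata,
]


def run(sub: Submission) -> Submission:
    """Apply each step in order and record which steps completed."""
    for step in PIPELINE:
        step(sub)
        sub.completed.append(step.__name__)
    return sub


result = run(Submission("example-contributor", {"readme.txt": b"hello"}))
```

Keeping the steps in a plain list makes it easy to see where boxes marked as ongoing work, such as versioning or the IR matchmaker, could later be slotted in.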


Architecture: SEAD VA Matchmaker

[Figure: matchmaker architecture.
Workflow boxes: Preview Data; Upload Data to VA; File Check; Mint DOI; Deposit to IR (& cloud); Update DOI target; Index; IR Matchmaker
Components: IR Matchmaker Client; IR Matchmaker Service; VIVO; Repository Agent; VA Load Monitor Agent
Messages:
– IR Matchmaker Client and IR Matchmaker Service: QueryMatch / GetMatch
– VIVO: Query for data contributor metadata / Return data contributor’s affiliation information
– Repository Agent: Query for IRs’ details / Return all IRs’ details
– VA Load Monitor Agent: Query VA load / Return VA load constraints]
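
A minimal sketch of how a matchmaker might combine the three inputs named in the figure: the contributor's affiliation, the IRs' details, and the VA load constraint. The slides do not state the matching rules, so everything here is an assumption: the `Repository` and `Collection` classes, the `query_match` function, the size and format criteria, the preference for the contributor's home institution, and all numbers, which are made up and are not the real policies of these repositories.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Repository:
    name: str
    institution: str
    max_collection_mb: int   # hypothetical per-IR size limit
    formats: FrozenSet[str]  # hypothetical accepted file extensions


@dataclass(frozen=True)
class Collection:
    contributor: str
    size_mb: int
    formats: FrozenSet[str]


def query_match(
    collection: Collection,
    affiliations: Dict[str, str],    # stand-in for the VIVO affiliation lookup
    repositories: List[Repository],  # stand-in for the Repository Agent's answer
    va_free_mb: int,                 # stand-in for the VA Load Monitor Agent's answer
) -> List[Repository]:
    """Return eligible IRs, the contributor's home institution first."""
    # VA load constraint: refuse work the VA cannot stage right now.
    if collection.size_mb > va_free_mb:
        return []
    home = affiliations.get(collection.contributor)
    eligible = [
        repo for repo in repositories
        if collection.size_mb <= repo.max_collection_mb
        and collection.formats <= repo.formats  # every format must be accepted
    ]
    # False sorts before True, so repositories at the home institution lead.
    return sorted(eligible, key=lambda repo: (repo.institution != home, repo.name))


repositories = [
    Repository("IUScholarWorks", "Indiana University", 500, frozenset({"csv", "txt"})),
    Repository("UIUC IDEALS", "University of Illinois", 500, frozenset({"csv", "txt", "nc"})),
    Repository("UMich Deep Blue", "University of Michigan", 100, frozenset({"csv", "txt", "nc"})),
]
affiliations = {"pat": "University of Illinois"}
collection = Collection("pat", size_mb=120, formats=frozenset({"csv"}))

matches = query_match(collection, affiliations, repositories, va_free_mb=1000)
```

With these made-up figures the 120 MB collection exceeds the 100 MB limit assigned to one repository, so two candidates remain and the contributor's home institution is listed first.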


Policy: Licensing Agreements

Repository rights
• Right to store and re-format files (preservation)
• Allow editing to protect human subjects, sensitive data (protection)
• Make metadata public (discoverability)
• Ensure sponsor compliance (liability)


Policy: Licensing Agreements

Depositor rights
• Retain copyright/moral rights
• Deposits will not be changed from original intent
• Embargoes will be honored


Policy: Licensing Agreements

Single-license solution
• Satisfy all repository requirements
• Mitigate rights on behalf of depositor

Matchmaking solution
• Connect requirements of:
  – End users
  – Repositories
  – SEAD Virtual Archive


Policy: Permanent Identifiers

Author IDs
• VIVO identifiers

Dataset IDs
• Digital Object Identifiers (DOIs)


Policy: Author IDs

Author ID systems: ORCID, ResearcherID, Scopus Author ID, Pivot ID, VIVO ID

VIVO ID
• Used primarily at domain/institutional level
• Supports many researcher ID systems, including ORCID

ORCID
• Global system
• Buy-in from and integration with major publishers and institutions


Policy: Dataset IDs

Handles
• Foundation for DOIs
• Basis for DSpace PID

DOIs
• EZID integration into DSpace
• Metadata storage
• Widely used
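
The slides name EZID as the route to DOIs but do not show how the Virtual Archive talks to it. As a hedged illustration, the sketch below only builds the text body of a request in the ANVL "name: value" format that the EZID API uses for metadata; it sends nothing over the network. The `_target` and `datacite.title` element names and the percent-escaping rule reflect my understanding of the EZID API and should be checked against its documentation. The helper names `_escape` and `to_anvl`, the URLs, and the title are invented for this example.

```python
import re
from typing import Dict


def _escape(text: str, in_name: bool) -> str:
    """Percent-encode characters that would break a 'name: value' line."""
    # Names must also escape ':' because it separates name from value.
    pattern = "[%:\r\n]" if in_name else "[%\r\n]"
    return re.sub(pattern, lambda m: "%%%02X" % ord(m.group(0)), text)


def to_anvl(metadata: Dict[str, str]) -> str:
    """Serialize a metadata dictionary as ANVL, one element per line."""
    return "\n".join(
        f"{_escape(name, True)}: {_escape(value, False)}"
        for name, value in metadata.items()
    )


# Body for minting: the DOI initially resolves to a hypothetical VA holding page.
mint_body = to_anvl({
    "_target": "https://va.example.org/pending/42",
    "datacite.title": "Streamflow: 100% of runs\nraw",
})

# Body for the later update: only the target changes once the IR holds the data.
update_body = to_anvl({"_target": "https://ir.example.edu/items/42"})
```

Separating the mint body from the update body mirrors the "Mint DOI" and "Update DOI target" boxes in the workflow: the identifier is stable, and only the location it resolves to changes after deposit.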


Progress to Date

• Ingested all NCED data
  – Small-sized collection (overall < 150 MB)
  – File organization for heterogeneous collection of related files with flat or hierarchical structure
• Tested deposit between the VA, UIUC IDEALS, and IUScholarWorks


Future Work

• Address other use cases
  – Large size collections (overall > 1 GB)
  – Relational database / interconnected variables
  – Unique formats (to project, discipline, community)
• Interoperability with other DataNets
• Support for API access
• Determine how prototype fits researcher workflows


Thank you

Download this presentation at http://slidesha.re/11vqeN9

Cooperative agreement #OCI0940824

http://www.sead-data.net
@SEADdatanet