Transcript
Page 1: Indiana University

Data Publishing Service
Indiana University

Stacy Kowalczyk
April 9, 2010

Page 2: Indiana University

Questions

• Which phases of the data life cycle are managed by your repository?

• How do data management requirements differ across the data life cycle?

• What systems do you use to support the data life cycle?

• Can you generalize the mechanisms used to migrate data between different phases of the data life cycle?

Page 3: Indiana University

Data Publishing Service

• A new service of the IUScholarWorks institutional repository and the Scholarly Data Services

• Providing data management support and data access

• Data will have a persistent URL so it can be linked to publications

• The service will combine our DSpace repository with IU’s Scholarly Data system (formerly known as MDSS), a system that researchers already use

• Allows discovery over the Web

• Preservation – bit level
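Bit-level preservation is usually verified by comparing a file's current checksum with the one recorded at deposit. The following is a minimal sketch of such a fixity check, not the service's actual implementation; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """True if the file's current digest matches the digest recorded at deposit."""
    return file_checksum(path, algorithm) == recorded_digest.lower()
```

A repository would typically run a check like this on a schedule and flag any file whose digest no longer matches.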

Page 4: Indiana University

Current Data Lifecycle Model Implementation

Scholarly Data Service

Data creation
• research design
• data management planning
• data collection (surveying, experimentation, measuring, etc.)
• data checking and cleaning

↓

Data analysis
• analysis
• derived data creation
• creation of data documentation

↓

End of research
• research outputs
• preparing data for preservation

IU ScholarWorks

Preservation of data
• storage of data
• migration to suitable format/medium
• metadata creation

↓

Distribution/publication of data

↓

Re-use of data
• by same researcher
• by other researchers

http://www.data-archive.ac.uk/sharing/lifecycle.asp

Page 5: Indiana University

Scholarly Data Service

• Massive Data Storage System

• Current system for research data storage

• Installed in 1998

• Based on IBM-developed High Performance Storage System (HPSS) software

• It offers over 2.8 petabytes of disk- and tape-based storage, distributed between the Indianapolis and Bloomington campuses

Page 6: Indiana University

[Diagram: HPSS storage layout, distributed between IUB and IUPUI. Labels shown: IUB subsystem, IUPUI subsystem, HPSS core servers, HPSS movers, SAN, disk arrays, tape library, Bloomington users, Indianapolis users, research network, IUB campus network, IUPUI campus network, TCP/IP wide area network.]

Page 7: Indiana University

Data Publishing in IU Scholarworks

• Discovery and access of datasets and related publications through the IUScholarWorks Repository service

• DSpace records that are searchable, indexed, and harvested and available at stable URLs

• DSpace records that contain DSpace bitstreams for small datasets

• DSpace records that link via stable URLs to large datasets in IU MDSS
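The split between small datasets stored as bitstreams and large datasets linked by stable URL can be pictured with a small data structure. This is a sketch under stated assumptions, not DSpace's data model or API: the class names, the `attach` helper, the size cutoff, and the base URL used in the example are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the slides do not say what counts as a "small" dataset.
SMALL_DATASET_LIMIT_BYTES = 2 * 1024**3


@dataclass
class DatasetFile:
    name: str
    size_bytes: int


@dataclass
class ItemRecord:
    title: str
    bitstreams: list[str] = field(default_factory=list)     # files held in the repository
    external_urls: list[str] = field(default_factory=list)  # stable URLs into mass storage


def attach(record: ItemRecord, dataset: DatasetFile, storage_base_url: str) -> None:
    """Store small datasets with the record; link large ones by URL."""
    if dataset.size_bytes <= SMALL_DATASET_LIMIT_BYTES:
        record.bitstreams.append(dataset.name)
    else:
        record.external_urls.append(f"{storage_base_url.rstrip('/')}/{dataset.name}")
```

For example, attaching a 1 KB file and a 5 GB file to the same record with the placeholder base URL `https://example.org/data/` leaves one entry in `bitstreams` and one in `external_urls`.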

Page 8: Indiana University

IUScholarWorks Data: Linking to MDSS and delivery via HTTP

[Diagram. Labels shown: item record with URLs of datasets in MDSS; MDSS web server; HTTP server; hpssfs filesystem; IU MDSS.]

Page 9: Indiana University

Data Publishing in IU Scholarworks

• Facilitating the submission process for both the researcher and collection manager

• We facilitate the process for submitters via the DSpace Configurable Submission system

• We facilitate the data collection manager’s process via steps in the DSpace workflow system

Page 10: Indiana University

IUScholarWorks Data: Item submission user interface

Phase 2, automated workflow

[Diagram. Labels shown, in extraction order.]

DSpace Configurable Submission System
• Instructions and preparation
• Describe item metadata form(s)
• Review step
• File upload step
• MDSS and dataset info/form
• Finalize/Accept License

Non-interactive processing steps
• Initiate MDSS actions (move datasets, etc.)
• Update metadata
• Query MDSS technical metadata (checksum, etc.)

IU MDSS
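The non-interactive processing steps (initiating MDSS actions, querying technical metadata, updating item metadata) could be chained roughly as follows. This is only an illustrative sketch: `StorageClient` is a hypothetical interface, not an MDSS or HPSS API, `InMemoryStorage` exists only to make the example runnable, and the step order is an assumption.

```python
from typing import Protocol


class StorageClient(Protocol):
    """Hypothetical storage interface; not an actual MDSS or HPSS API."""

    def move(self, source: str, destination: str) -> None: ...
    def checksum(self, path: str) -> str: ...


class InMemoryStorage:
    """Stand-in storage that maps paths to checksums, for demonstration only."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def move(self, source: str, destination: str) -> None:
        self.files[destination] = self.files.pop(source)

    def checksum(self, path: str) -> str:
        return self.files[path]


def run_noninteractive_steps(
    storage: StorageClient, metadata: dict, staging_path: str, archive_path: str
) -> dict:
    """Move the dataset, read back its checksum, and return updated metadata."""
    storage.move(staging_path, archive_path)   # initiate storage action (assumed first)
    digest = storage.checksum(archive_path)    # query technical metadata
    updated = dict(metadata)                   # update metadata without mutating the input
    updated["dataset_path"] = archive_path
    updated["checksum"] = digest
    return updated
```

Keeping the storage client behind an interface lets the submission workflow be tested without access to the mass storage system.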

Page 11: Indiana University

Planning for a More Curated Life Cycle Model

http://libraries.mit.edu/guides/subjects/data-management/cycle.html

Page 12: Indiana University

Active and Social Curation

• Engage researchers during projects, not at the end

• Use immediate benefits to drive automatic capture and ‘volunteering’ of metadata

• Reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort

Page 13: Indiana University

[Diagram: Active Curation and OAIS Repository Federation, separated by a Curation Boundary. Labels shown:]

• User, Contributor
• Active Data Systems
• Data Acquisition, Analysis and Simulation
• Search, Browse, Annotation, Visualization Tools
• Use, Reuse, Repurposing Tools
• Wide-Area File System
• Metadata Management: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …
• Automated Curation Workflow/Rule Engine: operates on metadata, content objects and trigger events
• Appraisal and Selection
• Ingest, AIPs
• Trusted Digital Repository Federation (OAIS compliant)
• Preservation Actions
• Migration and Emulation Tools
• Compound Objects - OAI-ORE
• Dissemination Packages
• Access Mechanisms and E-Scholarship Services
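A workflow/rule engine that operates on metadata, content objects, and trigger events can be reduced to a very small core. The sketch below is an assumption-laden illustration of that idea, not the planned system: `ContentObject`, `Rule`, `RuleEngine`, and the event names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentObject:
    identifier: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Rule:
    event: str                                  # trigger event the rule listens for
    condition: Callable[[ContentObject], bool]  # test against the object's metadata
    action: Callable[[ContentObject], None]     # curation step to apply


class RuleEngine:
    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, obj: ContentObject) -> int:
        """Apply every rule matching the event and condition; return the count applied."""
        applied = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(obj):
                rule.action(obj)
                applied += 1
        return applied
```

A rule might, for instance, listen for a hypothetical "ingest" event and flag any object whose metadata lacks a format entry, so that a later curation step can pick it up.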