Data curation in an existing infrastructure: Stellenbosch University. 1st African Digital Curation Conference, 12 – 13 February 2008. Wouter Klapwijk, Senior software specialist, Library and Information Service


Data curation in an existing infrastructure:

Stellenbosch University

1st African Digital Curation Conference 12 – 13 February 2008

Wouter Klapwijk, Senior software specialist

Library and Information Service


Overview

1. Organisational objectives: current status

2. Example of digital curation in practice


Organisational objectives: current status

• Current digital curation practices focus on supporting the e-Science/e-Research support framework on campus

• We acknowledge that digital curation is more than just ingesting, preserving and disseminating research output

• Started experimenting with replication technology (LOCKSS)

• The Institutional Repository (IR) will interface with the Research Management System (RIMS) as part of the SA RIMS (InfoEd) project

• DSpace/RIMS interaction via a staging area needs attention


Organisational objectives: current status

• Library Service, Department of Research Development and Department of Information Technology: policy framework

• Institutional policies need to be in place

• Human resources: no dedicated programmers, no dedicated repository administrator, no dedicated systems administrator, two part-time staff, no real budget


Research Support Technology Framework

Do Research:

• Toolbox: web survey tool (SUrvey); citation tools (e.g. EndNote); SAS, SPSS, Matlab, etc.

• Federated Search (MetaLib)

• Federated Identity Management: inter-institutional ID management

• Collaboration Environment (e.g. for Centres of Excellence, inter-institution): web and video conferencing, messaging, document collaboration, blogs, wikis, websites

• High-speed Internet: SANReN, Seacom; remote connectivity (e.g. SCN)

• Institutional Repository: lab notes; preservation, security and publishing; research outputs (articles, data sets, ETDs)

• Research Lifecycle Management: concept, grant application, ethics protocol, negotiate, approve, internal review, submit, IP/ethics/contract review, manage project and contract, reporting, research output

• (Inter)National Resources: library systems, funding search/alerts, expertise directories

• e-Portfolio and e-Profile: self-maintained for marketing

• Hi-Perf Computer Cluster

Compiled by Ralph Pina, Stellenbosch University IT Division


Institutional Repository

INSTITUTIONAL RESEARCH REPOSITORY

SU FEDERATED INTERFACE: OAI-PMH service provider

• LEARNING OBJECT REPOSITORY: storage of learning objects to support the process of Learning and Teaching (interface with WebCT)

• NON-ACADEMIC REPOSITORY: for the submission, archival and retrieval of digital objects

• REPOSITORY …

• RESEARCH REPOSITORY: Centre/s of Excellence, Departmental Research, … etc. { CREST / Department of Research Support }

• ELECTRONIC THESES / DISSERTATIONS: for the submission of postgraduate research which is not part of the Research Repository
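The SU federated interface is described as an OAI-PMH service provider. As a minimal sketch of harvesting from the client side, the Python below builds OAI-PMH 2.0 `ListRecords` requests and extracts `dc:title` values and the `resumptionToken` from a response. The base URL, the helper names and the sample response are hypothetical; only the verb, argument names and namespaces come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec  # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (dc:title values, resumptionToken or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(f"{{{DC_NS}}}title")]
    token_el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Hypothetical response, trimmed to the elements parsed above (no record headers).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example thesis</dc:title>
      </oai_dc:dc>
    </metadata></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai/request"))
    print(parse_list_records(SAMPLE))
```

A harvester would keep requesting `list_records_url(base, resumption_token=token)` until `parse_list_records` returns `None` as the token. Network access and protocol errors such as `badResumptionToken` or `noRecordsMatch` are left out of this sketch.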


Implementation - technology plan

• DSpace digital repository system

• Standardize on version 1.5 for 3 years

• Need to workshop the OAIS framework (in a South African context)

• Full OAIS compliance in DSpace Release 2

• Replication of ETDs with LOCKSS: proof of concept planned for 2009

• Some work done with LOCKSS on format migration

• Multiple instances of DSpace: more flexibility, more personalization
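LOCKSS keeps several independent copies and detects damage by having peers compare hashes of their content, repairing a damaged copy from peers that agree. The sketch below is a deliberately simplified, single-process illustration of that idea (a majority vote on SHA-256 digests, then repair); it is not the LOCKSS polling protocol, and the function names, peer names and data are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 content hash used as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str | None, list[str]]:
    """Vote on replica hashes.

    Returns (majority_digest, disagreeing_peers). If no digest is held by a
    strict majority of peers, returns (None, all_peers) so a human can intervene.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {peer: digest(data) for peer, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, sorted(digests)
    return winner, sorted(p for p, d in digests.items() if d != winner)


def repair(replicas: dict[str, bytes], winner: str) -> dict[str, bytes]:
    """Replace every copy with one whose hash matches the majority digest."""
    good = next(data for data in replicas.values() if digest(data) == winner)
    return {peer: good for peer in replicas}
```

Requiring a strict majority before repairing means that with only two copies a disagreement can be detected but not resolved automatically, which is one reason replication schemes keep more than two copies.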


Implementation – virtual server setup

• Three VMware guests, each running Tomcat and a Handle server: etd.sun.ac.za, research.sun.ac.za, lib.sun.ac.za

• DSpace 1.5 server and SQL server (Linux)

• VMware server (Linux / BSD)

• Bitstream storage

• Metadata storage: PostgreSQL, MySQL

• Metadata standards: ISO 19115, EAD, Dublin Core, etc.

• SAN (optional)


Overview

1. Organisational objectives: current status

2. Example of digital curation in practice


A case study

• DST-NRF Centre of Excellence for Invasion Biology (CIB)

• Prepared a set of use cases covering metadata requirements, access, permissions, roles and responsibilities

• Dedicated collection administrator

• Metadata for datasets and publications: Dublin Core and ISO 19115 (spatial data)
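Dublin Core is one of the two schemas named for the CIB collection. A minimal sketch, assuming records are exchanged in the `oai_dc` container used by OAI-PMH, that serialises a simple (unqualified) Dublin Core record and rejects names outside the 15 elements of the Dublin Core Metadata Element Set. The function name and the field values are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The 15 elements of simple Dublin Core (DCMES 1.1).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialise {element: [values]} as an oai_dc:dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for value in values:  # every DC element is repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": ["Example invasion biology dataset"],  # hypothetical values
        "type": ["Dataset"],
    }))
```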


A Case Study - managed access

Levels (rows): L0, L1, L2, L3, L4

Object types (columns): Metadata, Authors, Projects, Theses, Publications, Datasets

Cell values (legend): No Access; Access; To be confirmed; View if it is owner

Permissions according to levels (extracted from the CIB Use Cases)
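The access matrix has five levels (L0 to L4), six object types and four possible cell values, but the individual cell assignments are not recoverable here. A minimal sketch of how such a matrix could be enforced, assuming deny-by-default for missing cells and for "To be confirmed" cells; `Permission`, `can_view` and any matrix passed to it are hypothetical, not the CIB's actual policy.

```python
from enum import Enum


class Permission(Enum):
    """Cell values from the slide legend."""
    NO_ACCESS = "No Access"
    ACCESS = "Access"
    TO_BE_CONFIRMED = "To be confirmed"
    VIEW_IF_OWNER = "View if it is owner"


LEVELS = ("L0", "L1", "L2", "L3", "L4")
OBJECT_TYPES = ("Metadata", "Authors", "Projects", "Theses",
                "Publications", "Datasets")


def can_view(matrix, level, object_type, user_id, owner_id):
    """Decide whether user_id may view an object of object_type at level.

    matrix maps (level, object_type) -> Permission. Missing cells and
    'To be confirmed' cells are denied (deny-by-default assumption).
    """
    if level not in LEVELS or object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown cell: {level}/{object_type}")
    permission = matrix.get((level, object_type), Permission.NO_ACCESS)
    if permission is Permission.ACCESS:
        return True
    if permission is Permission.VIEW_IF_OWNER:
        return user_id == owner_id
    return False
```

A cell set to `Permission.VIEW_IF_OWNER` grants access only when `user_id` equals `owner_id`. Treating "To be confirmed" as a denial keeps the collection closed until the corresponding use case is settled.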


A Case Study – metadata requirements

Conditional statements:

• language: documented if not defined by the encoding standard

• characterSet: documented if ISO 10646-1 is not used and not defined by the encoding standard

• hierarchyLevel: documented if hierarchyLevel is not equal to "dataset"

• hierarchyLevelName: documented if hierarchyLevel is not equal to "dataset"

Diagram: ISO 19115 metadata entity set, class MD_Metadata

MD_Metadata attributes:

+ fileIdentifier [0..1] : CharacterString
+ language [0..1] : CharacterString
+ characterSet [0..1] : MD_CharacterSetCode = "utf8"
+ parentIdentifier [0..1] : CharacterString
+ hierarchyLevel [0..*] : MD_ScopeCode = "dataset"
+ hierarchyLevelName [0..*] : CharacterString
+ contact : CI_ResponsibleParty
+ dateStamp : Date
+ metadataStandardName [0..1] : CharacterString
+ metadataStandardVersion [0..1] : CharacterString

MD_Metadata associations (role [multiplicity] → class):

• +spatialRepresentationInfo [0..*] → MD_SpatialRepresentation <<Abstract>> (from Spatial representation information)

• +applicationSchemaInfo [0..*] → MD_ApplicationSchemaInformation (from Application schema information)

• +portrayalCatalogueInfo [0..*] → MD_PortrayalCatalogueReference (from Portrayal catalogue information)

• +metadataMaintenance [0..1] → MD_MaintenanceInformation (from Maintenance information)

• +metadataExtensionInfo [0..*] → MD_MetadataExtensionInformation (from Metadata extension information)

• +contentInfo [0..*] → MD_ContentInformation (from Content information)

• +referenceSystemInfo [0..*] → MD_ReferenceSystem (from Reference system information)

• +dataQualityInfo [0..*] → DQ_DataQuality (from Data quality information)

• +distributionInfo [0..1] → MD_Distribution (from Distribution information)

• +metadataConstraints [0..*] → MD_Constraints (from Constraint information)

• +identificationInfo [1..*] → MD_Identification <<Abstract>> (from Identification information)

MD_Identification associations:

• +resourceMaintenance [0..*] → MD_MaintenanceInformation

• +resourceConstraints [0..*] → MD_Constraints
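The multiplicities and conditional statements for MD_Metadata are enough to drive a basic completeness check before a record is ingested. A minimal sketch, assuming a record is a Python dict mapping each MD_Metadata attribute or role name to a list of values; the function and its parameters (`resource_scope`, `actual_charset`, `encoding_declares_language`, `encoding_declares_charset`) are hypothetical, and a production validator would more likely work against the ISO 19139 XML encoding.

```python
from __future__ import annotations

# (min, max) occurrences for MD_Metadata attributes and association roles,
# taken from the class diagram; None stands for "*" (unbounded).
MULTIPLICITY = {
    "fileIdentifier": (0, 1),
    "language": (0, 1),
    "characterSet": (0, 1),
    "parentIdentifier": (0, 1),
    "hierarchyLevel": (0, None),
    "hierarchyLevelName": (0, None),
    "contact": (1, None),  # mandatory; no multiplicity drawn, at least one assumed
    "dateStamp": (1, 1),
    "metadataStandardName": (0, 1),
    "metadataStandardVersion": (0, 1),
    "spatialRepresentationInfo": (0, None),
    "applicationSchemaInfo": (0, None),
    "portrayalCatalogueInfo": (0, None),
    "metadataMaintenance": (0, 1),
    "metadataExtensionInfo": (0, None),
    "contentInfo": (0, None),
    "referenceSystemInfo": (0, None),
    "dataQualityInfo": (0, None),
    "distributionInfo": (0, 1),
    "metadataConstraints": (0, None),
    "identificationInfo": (1, None),
}


def validate_md_metadata(record: dict[str, list], *, resource_scope: str = "dataset",
                         actual_charset: str = "utf8",
                         encoding_declares_language: bool = False,
                         encoding_declares_charset: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors = [f"unknown element: {name}" for name in record if name not in MULTIPLICITY]

    # Multiplicity checks from the class diagram.
    for name, (low, high) in MULTIPLICITY.items():
        count = len(record.get(name, []))
        if count < low:
            errors.append(f"{name}: expected at least {low}, found {count}")
        if high is not None and count > high:
            errors.append(f"{name}: expected at most {high}, found {count}")

    # Conditional statements.
    if not record.get("language") and not encoding_declares_language:
        errors.append("language: not defined by the encoding standard")
    # Simplification: only the default "utf8" is treated as ISO 10646-1.
    if (actual_charset != "utf8" and not record.get("characterSet")
            and not encoding_declares_charset):
        errors.append("characterSet: ISO 10646-1 not used and not declared")
    if resource_scope != "dataset":
        if not record.get("hierarchyLevel"):
            errors.append("hierarchyLevel: required when scope is not 'dataset'")
        if not record.get("hierarchyLevelName"):
            errors.append("hierarchyLevelName: required when scope is not 'dataset'")
    return errors
```

With the defaults, a record holding one `contact`, one `dateStamp`, one `identificationInfo` and a `language` passes; dropping `language` yields one error unless `encoding_declares_language=True`.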


A Case Study – metadata requirements

Link to:

1. Interface design

2. Data Dictionary

3. DSpace Administrator view


Thank you


+27 21 808-4378