Transcript
Page 1: Open Source Options for Digital Curation

Open Source Options for Digital Curation

Library 2.012, October 4, 2012

Christinger Tomer, University of Pittsburgh

Page 2: Open Source Options for Digital Curation

Definitions of Digital Curation

According to the Digital Curation Centre in the U.K., digital curation involves maintaining, preserving, and adding value to digital research data throughout its lifecycle, covering the stewardship of data from the point of conceptualisation to its eventual disposal. It is based on the presumption that such data has multiple uses, including uses in other contexts.

Page 3: Open Source Options for Digital Curation

An Illustration of the DCC Curation Lifecycle Model

See http://www.dcc.ac.uk/resources/curation-lifecycle-model

Page 4: Open Source Options for Digital Curation

The Role of Libraries in Digital Curation

Heidorn argues that "[i]ncreasingly, data are being recognized as first-class intellectual objects that can undergo quality checks, peer review, distribution, and reuse. The reuse of data contributes as much to society as the reuse of a concept in a journal article. The data set can be cited and contribute to the reputation of the creator of the data for good or ill." He goes on to assert that "[l]ibraries ... have a duty to society to collect, preserve, and disseminate the intellectual output of the society—including this data." (From P. Bryan Heidorn (2011): The Emerging Role of Libraries in Data Curation and E-science, Journal of Library Administration, 51:7-8, 662-672.)

Page 5: Open Source Options for Digital Curation

Key Factors in Digital Curation

• Identity -- "Identity is contextual: some objects are associated with information that allows identification only within a limited context (e.g., an object may be uniquely identified only within the context of objects residing on the same server), while others have enough information to make them globally identifiable (e.g., a global identifier such as a GUID or ISBN)."

• Authenticity and Understandability

o Evaluation of the understandability of data requires that there be sufficient context (documentation, metadata, or provenance) describing the data, and that the data is usable.

• Persistence

See: Sally Vermaaten, et al. Identifying Threats to Successful Digital Preservation: the SPOT Model for Risk Assessment. D-Lib Magazine 18 (September/October 2012): 8.
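The distinction between contextual and global identity can be illustrated with a minimal Python sketch. The class name, server names, and numeric identifiers below are hypothetical and chosen only for illustration; the global identifier is a random UUID, which is one form of GUID.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalIdentifier:
    """Identifies an object only within the context of one server (hypothetical)."""
    server: str
    local_id: int


def mint_global_identifier() -> str:
    """Mint a random (version 4) UUID and return it in URN form."""
    return uuid.uuid4().urn


# The same local number can appear on two servers, so the number alone is
# ambiguous once objects leave their original context.
first = LocalIdentifier(server="archive-a.example.org", local_id=42)
second = LocalIdentifier(server="archive-b.example.org", local_id=42)
```

Here `first.local_id` and `second.local_id` are equal even though the two objects are different, whereas each call to `mint_global_identifier()` is expected to yield a value that needs no server context to be unambiguous.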

Page 6: Open Source Options for Digital Curation

More Key Factors

• Renderability -- "the property that a digital object is able to be used in a way that retains the object's significant characteristics," meaning that the hardware and software necessary to render the object are available or may be reproduced through emulation.

• Integrity

o Integrity of data assumes that the data can be proven to be identical, at the bit level, to some prior accepted or verified state. Data integrity may be required for usability, understandability, authenticity, trust, and thus overall quality.

• Access and Usability

o Data Generators

o Data Seekers

o Data Consumers
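One common way to check integrity in the sense described above is a fixity check: record a cryptographic digest when the object is in a verified state, then recompute and compare it later. The sketch below assumes SHA-256 and uses hypothetical function names; it is not drawn from any of the systems discussed here. A matching digest is very strong evidence of bit-level identity rather than a formal proof.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at a verified state."""
    return sha256_of(path) == recorded_digest
```

In practice the recorded digest would be stored apart from the object itself, so that damage to the object cannot also alter the value it is compared against.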

Page 7: Open Source Options for Digital Curation

Archon

Archon's designers refer to it as a "simple archiving" system, but its effective use does require a working knowledge of standards for archival description. Its preview feature is especially effective in the treatment of visual materials.

Archon, which has been developed by a group from the University of Illinois, supports the creation of records conforming to MARC and EAD, as well as their import and export.

Page 8: Open Source Options for Digital Curation

ICA AToM

ICA AToM allows creators to build what are effectively compound documents and users to scroll through thumbnails of the documents on the interface.

Artefactual Systems in British Columbia is developing ICA AToM 2.0, which will be available as open source, community-supported software and as a fee-based service.

Page 9: Open Source Options for Digital Curation

ResourceSpace

Page 10: Open Source Options for Digital Curation

Document View under ResourceSpace

Page 12: Open Source Options for Digital Curation

Islandora

Islandora is a hybrid, combining the Fedora Commons repository system as the back end with Drupal, the LAMP-based CMS, as the front end. This hybridized approach is gaining in popularity among developers, who believe that successful design must provide a place for narrative treatments.

The idea underlying Islandora is that a creator places an object in the repository, Fedora Commons, and then links that object to other materials, e.g., text and images, that are mounted through the CMS, which is a Drupal instance.

Page 13: Open Source Options for Digital Curation

Omeka

Omeka is based on the "LAMP" architecture. Perhaps its most important feature is its modular design.

Page 14: Open Source Options for Digital Curation

Omeka's Modularity

Omeka supports a wide array of plugins that have been designed to enhance the functionality of the system. In this illustration, one of the examples is the Creative Commons Chooser, which allows the creator of an object to select the appropriate license from the entry interface.

Page 15: Open Source Options for Digital Curation

Omeka.net

Page 16: Open Source Options for Digital Curation

DSpace

DSpace is based on Java and Apache Tomcat and will run with equal facility on *nix or Windows systems.

Page 17: Open Source Options for Digital Curation

ePrints3

ePrints is another LAMP-based system, distinguished by its reliance on Perl and by its popularity, which owes much to its ease of use, particularly in the generation of metadata.

Page 18: Open Source Options for Digital Curation

DSpace, with Manakin Interface

Page 19: Open Source Options for Digital Curation

HubZero Client Interface

HubZero, which was developed at Purdue University, is based on Joomla, the content management system, and uses MySQL as its back end. The main aim of the system is to provide a platform on which researchers can mount and annotate datasets.

Page 20: Open Source Options for Digital Curation

Penn State’s ScholarSphere

Released on September 24, 2012, ScholarSphere is another hybrid system, based on Hydra, a Ruby on Rails front end, and Fedora Commons.

Page 21: Open Source Options for Digital Curation

The Common Sense of IR Plus

IR Plus is a system developed by the University of Rochester Libraries. It is another variation on the hybrid theme: it uses Apache Tomcat's WebDAV extension to support personal file storage and public archiving, with depositors able to make materials mounted in the personal storage area available to collaborative groups and/or the public by toggling a software switch. This design is intended to reduce the friction associated with the use of other archiving/repository systems.

Page 22: Open Source Options for Digital Curation

Sharing & Publishing under IR Plus

Page 23: Open Source Options for Digital Curation

Interoperability and Related Issues

• Are Key Archival and/or Bibliographic Standards such as MARC and/or EAD Supported?

• Does the System Support the Open Archives Initiative's Metadata Harvesting Protocol?

• To What Extent is Content Exportable? To What Extent is the System Itself Portable?

• Is the system extensible?
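To make the harvesting question concrete, the sketch below builds the kind of request a harvester sends under the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The `ListRecords` verb and the `oai_dc` (unqualified Dublin Core) metadata prefix come from the protocol itself; the function name and the base URL are hypothetical, and the sketch only constructs the URL without contacting any server.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Selective harvesting: restrict the request to one set, if the repository defines sets.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
example = list_records_url("https://repository.example.org/oai")
```

A system that supports the protocol would be expected to answer such a request with an XML document containing metadata records, which is one practical test of how exportable its content is.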

Page 24: Open Source Options for Digital Curation

Ease-of-Use

• Does the creation of objects within the System require a professional level knowledge of metadata generation?

• What are the characteristics of the workflow? Does the workflow support multiple roles?

• Does the system incorporate lookup features based on Web APIs?

• How does the system support the organization of objects once they have been mounted?

Page 25: Open Source Options for Digital Curation

Documentation and Support

• Is the System Supported by an Active Documentation Project? What is the quality of the documentation that is available?

• Are there user forums through which questions about configuration, content creation, and/or bugs may be addressed?

• How often is the software updated?

• In the case of extensible systems, how productive are the developer communities providing extensions, plugins, themes, etc.?

Page 26: Open Source Options for Digital Curation

Factors in Evaluating Open Source Software

• License

• Activity and Age of the Project

• Unit Tests

• Code Quality

• Base Use Test

• Modification Test