Open Source Options for Digital Curation
Library 2.012, October 4, 2012
Christinger Tomer, University of Pittsburgh
According to the Digital Curation Centre in the U.K., digital curation involves maintaining, preserving, and adding value to digital research data throughout its lifecycle, covering the stewardship of data from the point of conceptualisation to its eventual disposal. It is based on the presumption that such data has multiple uses, including uses in contexts other than the one in which it was created.
Definitions of Digital Curation
An Illustration of the DCC Curation Lifecycle Model
See http://www.dcc.ac.uk/resources/curation-lifecycle-model
The Role of Libraries in Digital Curation
Heidorn argues that "[i]ncreasingly, data are being recognized as first-class intellectual objects that can undergo quality checks, peer review, distribution, and reuse. The reuse of data contributes as much to society as the reuse of a concept in a journal article. The data set can be cited and contribute to the reputation of the creator of the data for good or ill." He goes on to assert that "[l]ibraries ... have a duty to society to collect, preserve, and disseminate the intellectual output of the society—including this data." (From P. Bryan Heidorn (2011): The Emerging Role of Libraries in Data Curation and E-science, Journal of Library Administration, 51:7-8, 662-672.)
Key Factors in Digital Curation
• Identity -- "Identity is contextual: some objects are associated with information that allows identification only within a limited context (e.g., an object may be uniquely identified only within the context of objects residing on the same server), while others have enough information to make them globally identifiable (e.g., a global identifier such as a GUID or ISBN)."
• Authenticity and Understandability
  o Evaluation of the understandability of data requires that there be sufficient context (documentation, metadata, or provenance) describing the data, and that the data is usable.
• Persistence
See: Sally Vermaaten, et al. Identifying Threats to Successful Digital Preservation: the SPOT Model for Risk Assessment. D-Lib Magazine 18 (September/October 2012): 8.
More Key Factors
• Renderability -- "the property that a digital object is able to be used in a way that retains the object's significant characteristics," meaning that the hardware and software necessary to render the object are available or may be reproduced through emulation.
• Integrity
  o Integrity of data assumes that the data can be proven to be identical, at the bit level, to some prior accepted or verified state. Data integrity may be required for usability, understandability, authenticity, trust, and thus overall quality.
• Access and Usability
  o Data Generators
  o Data Seekers
  o Data Consumers
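The integrity factor above, proving that data is identical at the bit level to a previously verified state, is commonly checked by recording a cryptographic digest and recomputing it later. The following is a minimal sketch of that idea, not the method of any system discussed here. The function names, the file name dataset.csv, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file's current digest matches the digest recorded earlier."""
    return sha256_of_file(path) == recorded_digest


def demo():
    """Record a digest, verify it, alter the file, and verify again."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical dataset file, created only for this illustration.
        path = os.path.join(workdir, "dataset.csv")
        with open(path, "wb") as handle:
            handle.write(b"id,value\n1,42\n")

        recorded = sha256_of_file(path)          # the "verified state"
        before_change = is_unchanged(path, recorded)

        with open(path, "ab") as handle:         # simulate alteration
            handle.write(b"2,43\n")
        after_change = is_unchanged(path, recorded)

    return before_change, after_change
```

In this sketch, demo() returns a pair of booleans: whether the file matched its recorded digest before the change and whether it matched afterward. A repository would typically store the recorded digest alongside the object's metadata and rerun the comparison on a schedule.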
Archon
Archon's designers refer to it as a "simple archiving" system, but its effective use does require a working knowledge of standards for archival description. Its preview feature is especially effective in the treatment of visual materials.
Archon, which has been developed by a group from the University of Illinois, supports the creation of records conforming to MARC and EAD, as well as their import and export.
ICA AToM allows creators to build what are effectively compound documents and users to scroll through thumbnails of the documents on the interface.
ICA AToM
Artefactual Systems in British Columbia is developing ICA AToM 2.0, which will be available both as open source, community-supported software and as a fee-based service.
ResourceSpace
Document View under ResourceSpace
Islandora
Islandora is a hybrid, combining the Fedora Commons repository system as the back end with Drupal, the LAMP-based CMS, as the front end. This hybridized approach is gaining in popularity among developers, who believe that successful design must provide a place for narrative treatments.
The idea underlying Islandora is that a creator places an object in the repository, Fedora Commons, and then links that object to other materials, such as text and images, that are mounted through the CMS, which is a Drupal instance.
Omeka
Omeka is based on the "LAMP" architecture. Perhaps its most important feature is its modular design.
Omeka's Modularity
Omeka supports a wide array of plugins that have been designed to enhance the functionality of the system. In this illustration, one of the examples is the Creative Commons Chooser, which allows the creator of an object to select the appropriate license from the entry interface.
Omeka.net
DSpace
DSpace is based on Java and Apache Tomcat and will run with equal facility on *nix or Windows systems.
ePrints3
ePrints is another LAMP-based system, distinguished by its reliance on Perl and by its popularity, which owes much to its ease of use, particularly in the generation of metadata.
DSpace, with Manakin Interface
HubZero Client Interface
HubZero, which was developed at Purdue University, is based on Joomla, the content management system, and uses MySQL as its back-end. The main aim of the system is to provide a platform on which researchers can mount and annotate datasets.
Penn State’s ScholarSphere
Released on September 24, 2012, ScholarSphere is another hybrid system, based on Hydra, a Ruby on Rails front end, and Fedora Commons.
The Common Sense of IR Plus
IR Plus is a system developed by the University of Rochester Libraries. It is another variation on the hybrid theme: it uses Apache Tomcat's WebDAV extension to support personal file storage and public archiving, and depositors can make materials mounted in the personal storage area available to collaborative groups and/or the public by toggling a software switch. This design is intended to reduce the friction associated with the use of other archiving/repository systems.
Sharing & Publishing under IR Plus
Interoperability and Related Issues
• Are Key Archival and/or Bibliographic Standards such as MARC and/or EAD Supported?
• Does the System Support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)?
• To What Extent is Content Exportable? To What Extent is the System Itself Portable?
• Is the system extensible?
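One practical way to explore the metadata harvesting question above is to see whether a repository answers requests under the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The sketch below only builds request URLs and does not contact any server. The base URL in the usage note is a placeholder, the helper names are hypothetical, and the endpoint path differs from system to system. The verbs Identify and ListRecords, the metadataPrefix, set, and from arguments, and the oai_dc prefix come from the protocol itself.

```python
from urllib.parse import urlencode


def build_identify_url(base_url):
    """Build an OAI-PMH Identify request, which asks a repository to describe itself."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request for harvesting metadata records.

    metadata_prefix: the metadata format requested; oai_dc (unqualified
        Dublin Core) is the format every OAI-PMH repository must support.
    set_spec: optional set identifier for selective harvesting.
    from_date: optional lower bound on record datestamps (e.g. "2012-01-01").
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)
```

For example, build_identify_url("https://repository.example.org/oai") yields a URL ending in ?verb=Identify. Opening the equivalent URL for a real repository in a browser is a quick way to check whether harvesting is enabled.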
Ease-of-Use
• Does the creation of objects within the System require a professional level knowledge of metadata generation?
• What are the characteristics of the workflow? Does the workflow support multiple roles?
• Does the system incorporate lookup features based on Web APIs?
• How does the system support the organization of objects once they have been mounted?
Documentation and Support
• Is the System Supported by an Active Documentation Project? What is the quality of the documentation that is available?
• Are there user forums through which questions about configuration, content creation, and/or bugs may be addressed?
• How often is the software updated?
• In the case of extensible systems, how productive are the developer communities providing extensions, plugins, themes, etc.?
Factors in Evaluating Open Source Software
• License
• Activity and Age of the Project
• Unit Tests
• Code Quality
• Base Use Test
• Modification Test