DSpace 4.x Documentation
Author: The DSpace Developer Team
Date: 28 February 2014
URL: https://wiki.duraspace.org/display/DSDOC4x

DSpace Manual

    Table of Contents

    1 Introduction
      1.1 Release Notes
      1.2 Functional Overview
        1.2.1 Online access to your digital assets
        1.2.2 Metadata Management
        1.2.3 Licensing
        1.2.4 Persistent URLs and Identifiers
        1.2.5 Getting content into DSpace
        1.2.6 Getting content out of DSpace
        1.2.7 User Management
        1.2.8 Access Control
        1.2.9 Usage Metrics
        1.2.10 Digital Preservation
        1.2.11 System Design
    2 Installing DSpace
      2.1 For the Impatient
      2.2 Hardware Recommendations
      2.3 Prerequisite Software
        2.3.1 UNIX-like OS or Microsoft Windows
        2.3.2 Oracle Java JDK 7 (standard SDK is fine, you don't need J2EE) or OpenJDK 7
        2.3.3 Apache Maven 3.x (Java build tool)
        2.3.4 Apache Ant 1.8 or later (Java build tool)
        2.3.5 Relational Database: (PostgreSQL or Oracle)
        2.3.6 Servlet Engine (Apache Tomcat 7 or later, Jetty, Caucho Resin or equivalent)
        2.3.7 Perl (only required for [dspace]/bin/dspace-info.pl)
      2.4 Installation Instructions
        2.4.1 Overview of Install Options
        2.4.2 Overview of DSpace Directories
        2.4.3 Installation
      2.5 Advanced Installation
        2.5.1 'cron' jobs / scheduled tasks
        2.5.2 Multilingual Installation
        2.5.3 DSpace over HTTPS
        2.5.4 The Handle Server
        2.5.5 Google and HTML sitemaps
        2.5.6 Statistics
      2.6 Windows Installation
      2.7 Checking Your Installation
      2.8 Known Bugs

      2.9 Common Problems
        2.9.1 Common Installation Issues
        2.9.2 General DSpace Issues
    3 Upgrading DSpace
      3.1 Upgrading From 4.0 to 4.x
        3.1.1 Backup your DSpace
        3.1.2 Upgrade Steps
        3.1.3 Fixing the effects of DS-1536
      3.2 Upgrading From 3.x to 4.x
        3.2.1 Backup your DSpace
        3.2.2 Update Prerequisite Software (as necessary)
        3.2.3 Upgrade Steps
      3.3 Upgrading From 3.0 to 3.x
        3.3.1 Backup your DSpace
        3.3.2 Upgrade Steps
      3.4 Upgrading From 1.8.x to 3.x
        3.4.1 Backup your DSpace
        3.4.2 Upgrade Steps
      3.5 Upgrading From 1.8 to 1.8.x
        3.5.1 Backup your DSpace
        3.5.2 Upgrade Steps
      3.6 Upgrading From 1.7.x to 1.8.x
        3.6.1 Backup your DSpace
        3.6.2 Upgrade Steps
      3.7 Upgrading From 1.7 to 1.7.x
        3.7.1 Upgrade Steps
      3.8 Upgrading From older versions of DSpace
        3.8.1 Upgrading From 1.6.x to 1.7.x
        3.8.2 Upgrading From 1.6 to 1.6.x
        3.8.3 Upgrading From 1.5.x to 1.6.x
        3.8.4 Upgrading From 1.5 or 1.5.1 to 1.5.2
        3.8.5 Upgrading From 1.4.2 to 1.5
        3.8.6 Upgrade Steps
        3.8.7 Upgrading From 1.4 to 1.4.x
        3.8.8 Upgrading From 1.3.2 to 1.4.x
        3.8.9 Upgrading From 1.3.1 to 1.3.2
        3.8.10 Upgrading From 1.2.x to 1.3.x
        3.8.11 Upgrading From 1.2.1 to 1.2.2
        3.8.12 Upgrading From 1.2 to 1.2.1
        3.8.13 Upgrading From 1.1.x to 1.2
        3.8.14 Upgrading From 1.1 to 1.1.1
        3.8.15 Upgrading From 1.0.1 to 1.1
    4 Using DSpace

      4.1 Advanced Customisation
        4.1.1 Additions module
        4.1.2 Maven WAR Overlays
        4.1.3 DSpace Source Release
        4.1.4 DSpace Service Manager
      4.2 Authentication Plugins
        4.2.1 Stackable Authentication Method(s)
      4.3 Batch Metadata Editing
        4.3.1 Batch Metadata Editing Tool
        4.3.2 Batch Metadata Editing Configuration
      4.4 Configuration Reference
        4.4.1 General Configuration
        4.4.2 The build.properties Configuration Properties File
        4.4.3 The dspace.cfg Configuration Properties File
        4.4.4 Optional or Advanced Configuration Settings
      4.5 Curation System
        4.5.1 Changes in 1.8
        4.5.2 Tasks
        4.5.3 Activation
        4.5.4 Writing your own tasks
        4.5.5 Task Invocation
        4.5.6 Asynchronous (Deferred) Operation
        4.5.7 Task Output and Reporting
        4.5.8 Task Properties
        4.5.9 Task Annotations
        4.5.10 Scripted Tasks
        4.5.11 Bundled Tasks
        4.5.12 Curation tasks in Jython
      4.6 Discovery
        4.6.1 What is DSpace Discovery
        4.6.2 Discovery Changelist
        4.6.3 Enabling Discovery
        4.6.4 Configuration files
        4.6.5 General Discovery settings (config/modules/discovery.cfg)
        4.6.6 Modifying the Discovery User Interface (config/spring/api/discovery.xml)
        4.6.7 Discovery Solr Index Maintenance
        4.6.8 Advanced Solr Configuration
      4.7 DOI Digital Object Identifier
        4.7.1 Persistent Identifier
        4.7.2 DOI Registration Agencies
        4.7.3 Configure DSpace to use the DataCite API
        4.7.4 Configure DSpace to use EZID service for registration of DOIs
        4.7.5 Adding support for other Registration Agencies

      4.8 DSpace Statistics
        4.8.1 What is exactly being logged?
        4.8.2 Web User Interface Elements
        4.8.3 Architecture
        4.8.4 Configuration settings for Statistics
        4.8.5 Upgrade Process for Statistics
        4.8.6 Statistics Administration
        4.8.7 Statistics differences between DSpace 1.7.x and 1.8.0
        4.8.8 Statistics differences between DSpace 1.6.x and 1.7.0
        4.8.9 Web UI Statistics Modification (XMLUI Only)
        4.8.10 Custom Reporting - Querying SOLR Directly
        4.8.11 Manually Installing/Updating GeoLite Database File
        4.8.12 Managing Usage Statistics
        4.8.13 Elastic Search Usage Statistics
      4.9 Embargo
        4.9.1 What is an Embargo?
        4.9.2 DSpace 3.0 New Embargo Functionality
        4.9.3 Configuring and using Embargo in DSpace 3.0+
        4.9.4 Technical Specifications
        4.9.5 Pre-DSpace 3.0 Embargo
        4.9.6 Pre-3.0 Embargo Lifter Commands
      4.10 Exchanging Content Between Repositories
        4.10.1 Transferring Content via Export and Import
        4.10.2 Transferring Items using Simple Archive Format
        4.10.3 Transferring Items using OAI-ORE/OAI-PMH Harvester
        4.10.4 Copying Items using the SWORD Client
      4.11 Exporting Content and Metadata
        4.11.1 OAI
        4.11.2 SWORDv1 Client
      4.12 Ingesting Content and Metadata
        4.12.1 Submission User Interface
        4.12.2 Configurable Workflow
        4.12.3 Importing and Exporting Content via Packages
        4.12.4 Importing and Exporting Items via Simple Archive Format
        4.12.5 Registering Bitstreams via Simple Archive Format
        4.12.6 Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, CSV) and online services (OAI, arXiv, PubMed, CrossRef, CiNii)
        4.12.7 Importing Community and Collection Hierarchy
        4.12.8 SWORDv1 Server
        4.12.9 SWORDv2 Server
        4.12.10 Ingesting HTML Archives
      4.13 Item Level Versioning
        4.13.1 What is Item Level Versioning?

        4.13.2 Important warnings - read before enabling
        4.13.3 Enabling Item Level Versioning
        4.13.4 Initial Requirements
        4.13.5 User Interface
        4.13.6 Architecture
        4.13.7 Configuration
        4.13.8 Identified Challenges & Known Issues in DSpace 4.0
        4.13.9 Credits
      4.14 JSPUI Configuration and Customization
        4.14.1 Configuration
        4.14.2 Customizing the JSP pages
      4.15 Localization L10n
        4.15.1 Introduction
        4.15.2 Common areas of localization
        4.15.3 XMLUI specific localization
        4.15.4 JSPUI specific localization
      4.16 Mediafilters for Transforming DSpace Content
        4.16.1 MediaFilters: Transforming DSpace Content
      4.17 Metadata Recommendations
        4.17.2 Recommended Metadata Fields
        4.17.3 Local Fields
      4.18 Mapping Items
        4.18.1 Introduction
        4.18.2 Using the Item Mapper
        4.18.3 Implications
      4.19 Moving Items
        4.19.1 Moving Items via Web UI
        4.19.2 Moving Items via the Batch Metadata Editor
      4.20 Managing Community Hierarchy
        4.20.1 Sub-Community Management
      4.21 Managing User Accounts
        4.21.1 From the browser: XMLUI
        4.21.2 From the browser: JSPUI
        4.21.3 From the command line
        4.21.4 Email Subscriptions
      4.22 Request a Copy
        4.22.1 Introduction
        4.22.2 Requesting a copy using the XML User Interface
        4.22.3 Requesting a copy using the JSP User Interface
        4.22.4 Email templates
        4.22.5 Configuration parameters

      4.23 REST API
        4.23.1 What is DSpace REST API
        4.23.2 REST Endpoints
        4.23.3 Introduction to Jersey for developers
        4.23.4 Configuration for DSpace REST
        4.23.5 Recording Proxy Access by Tools
        4.23.6 Deploying the DSpace REST API in your Servlet Container
        4.23.7 Additional Information
      4.24 Updating Items via Simple Archive Format
        4.24.1 Item Update Tool
      4.25 XMLUI Configuration and Customization
        4.25.1 Overview of XMLUI / Manakin
        4.25.2 Manakin Configuration Property Keys
        4.25.3 Configuring Themes and Aspects
        4.25.4 Multilingual Support
        4.25.5 Creating a New Theme
        4.25.6 Customizing the News Document
        4.25.7 Adding Static Content
        4.25.8 Harvesting Items from XMLUI via OAI-ORE or OAI-PMH
        4.25.9 Additional XMLUI Learning Resources
        4.25.10 Mirage Configuration and Customization
        4.25.11 XMLUI Base Theme Templates (dri2xhtml)
        4.25.12 DRI Schema Reference
      4.26 Authority Control of Metadata Values
        4.26.1 WORK IN PROGRESS
        4.26.2 Introduction
        4.26.3 Simple choice management for DSpace submission forms
        4.26.4 Hierarchical Taxonomies and Controlled Vocabularies
        4.26.5 Authority Control: Enhancing DSpace metadata fields with Authority Keys
    5 System Administration
      5.1 Introduction to DSpace System Administration
      5.2 Scheduled Tasks via Cron
        5.2.1 Recommended Cron Settings
      5.3 Command Line Operations
        5.3.1 Executing command line operations
        5.3.2 Available operations
        5.3.3 Executing streams of commands
        5.3.4 Testing Database Connection
      5.4 Ant targets and options
        5.4.1 Options
        5.4.2 Targets
      5.5 AIP Backup and Restore
        5.5.1 Background & Overview

        5.5.2 Makeup and Definition of AIPs
        5.5.3 Running the Code
        5.5.4 Additional Packager Options
        5.5.5 Configuration in 'dspace.cfg'
        5.5.6 Common Issues or Error Messages
        5.5.7 DSpace AIP Format
      5.6 Performance Tuning DSpace
        5.6.1 Give Tomcat (DSpace UIs) More Memory
        5.6.2 Give the Command Line Tools More Memory
        5.6.3 Give PostgreSQL Database More Memory
        5.6.4 SOLR Statistics Performance Tuning
      5.7 Search Engine Optimization
        5.7.1 Ensuring your DSpace is indexed
        5.7.2 Google Scholar Metadata Mappings
      5.8 Validating CheckSums of Bitstreams
        5.8.1 Checksum Checker
      5.9 Legacy methods for re-indexing content
        5.9.1 Overview
        5.9.2 Re-Enabling the legacy Lucene Search and/or DBMS Browse providers
        5.9.3 Creating the Browse & Search Indexes
        5.9.4 Running the Indexing Programs
        5.9.5 Indexing Customization
      5.10 Troubleshooting Information
    6 DSpace Reference
      6.1 Directories and Files
        6.1.1 Overview
        6.1.2 Source Directory Layout
        6.1.3 Installed Directory Layout
        6.1.4 Contents of JSPUI Web Application
        6.1.5 Contents of XMLUI Web Application (aka Manakin)
        6.1.6 Log Files
      6.2 Metadata and Bitstream Format Registries
        6.2.1 Default Dublin Core Metadata Registry (DC)
        6.2.2 Dublin Core Terms Registry (DCTERMS)
        6.2.3 Default Bitstream Format Registry
      6.3 Architecture
        6.3.1 Overview
        6.3.2 Application Layer
        6.3.3 Business Logic Layer
        6.3.4 DSpace Services Framework
        6.3.5 Storage Layer
      6.4 History
        6.4.1 Changes in 4.x

        6.4.2 Changes in 3.x
        6.4.3 Changes in 1.8.x
        6.4.4 Changes in 1.7.x
        6.4.5 Changes in 1.6.x
        6.4.6 Changes in 1.5.x
        6.4.7 Changes in 1.4.x
        6.4.8 Changes in 1.3.x
        6.4.9 Changes in 1.2.x
        6.4.10 Changes in 1.1.x
      6.5 DSpace Item State Definitions

    1 Introduction

    DSpace is an open source software platform that enables organisations to:

    • capture and describe digital material using a submission workflow module, or a variety of programmatic ingest options
    • distribute an organisation's digital assets over the web through a search and retrieval system
    • preserve digital assets over the long term

    This system documentation includes a functional overview of the system, which is a good introduction to the capabilities of the system, and should be readable by non-technical folk. Everyone should read this section first because it introduces some terminology used throughout the rest of the documentation.

    For people actually running a DSpace service, there is an installation guide, and sections on configuration and the directory structure.

    Finally, for those interested in the details of how DSpace works, and those potentially interested in modifying the code for their own purposes, there is a detailed architecture and design section.

    Other good sources of information are:

    • The DSpace Public API Javadocs. Build these with the command mvn javadoc:javadoc
    • The DSpace Wiki contains stacks of useful information about the DSpace platform and the work people are doing with it. You are strongly encouraged to visit this site and add information about your own work. Useful Wiki areas are:
      • A list of DSpace resources (Web sites, mailing lists etc.)
      • Technical FAQ
      • A list of projects using DSpace
      • Guidelines for contributing back to DSpace
    • www.dspace.org has announcements and contains useful information about bringing up an instance of DSpace at your organization.
    • The DSpace General List. Join DSpace-General to ask questions or join discussions about non-technical aspects of building and running a DSpace service. It is open to all DSpace users. Ask questions, share news, and spark discussion about DSpace with people managing other DSpace sites. Watch DSpace-General for news of software releases, user conferences, and announcements from the DSpace Federation.
    • The DSpace Technical List. DSpace developers help answer installation and technology questions, share information and help each other solve technical problems through the DSpace-Tech mailing list. Post questions or contribute your expertise to other developers working with the system.

    • The DSpace Development List. Join discussions among DSpace developers. The DSpace-Devel listserv is for DSpace developers working on the DSpace platform to share ideas and discuss code changes to the open source platform. Join other developers to shape the evolution of the DSpace software. The DSpace community depends on its members to frame functional requirements and high-level architecture, and to facilitate programming, testing and documentation for the project.
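
    The Javadoc build mentioned in the list of resources above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name build_javadocs and the DSPACE_SRC variable are hypothetical and not part of DSpace, the default path is a placeholder for wherever the DSpace source release was unpacked, and Apache Maven is assumed to be installed. The helper does nothing more than check for a pom.xml and then run the command given in this documentation.

```shell
# Placeholder location of the unpacked DSpace source release
# (an assumption for illustration; adjust to your own checkout).
DSPACE_SRC="${DSPACE_SRC:-$HOME/dspace-src}"

# Hypothetical helper: run the documented Javadoc command in a source directory.
build_javadocs() {
    src_dir="$1"
    # Maven needs a pom.xml, so stop early if the directory does not contain one.
    if [ ! -f "$src_dir/pom.xml" ]; then
        echo "No pom.xml found in $src_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    (cd "$src_dir" && mvn javadoc:javadoc)
}
```

    A typical call would be build_javadocs "$DSPACE_SRC". With the Maven Javadoc plugin's default settings the generated HTML is usually written under target/site/apidocs, but check your own build output rather than relying on that path.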

    1.1 Release Notes

    Online Version of Documentation also available

    This documentation was produced with Confluence software. A PDF version was generated directly from Confluence. An online, updated version of this 4.x Documentation is also available at: https://wiki.duraspace.org/display/DSDOC4x

    Welcome to Release 4.1. DSpace 4 is the latest major release offering many new features, bug fixes and improvements. For information on upgrading to DSpace 4, please see Upgrading DSpace.

    The following is a list of the new features included for the 4.x platform (not an exhaustive list):

    DSpace 4.0 ships with a number of new features. Certain features are automatically enabled by default while others require deliberate activation.

    The following non-exhaustive list contains the major new features in 4.0 that are enabled by default:

    Discovery: Search & Browse is now enabled by default in both XMLUI and JSPUI.

    Note: The Lucene/DB-based search & browse backend is still supported, but is deprecated and might be removed in a future release. Any new features should use the Discovery API instead of tying directly to Lucene, Solr or Elastic Search.

    Discovery general enhancements
        Solr libraries were upgraded to version 4.4 (JSPUI, XMLUI and OAI modules) - by lap
        Solr search accent insensitive - by ab
        Solr-based item counter - by im, ab

    Discovery UI enhancements (both JSPUI and XMLUI)
        Query spell checking ("did you mean") - XMLUI by kv, JSPUI by lap, ab

    Contributors:
        lap - Luigi Andrea Pascarelli with the support of CINECA
        ab - Andrea Bollini with the support of CINECA
        kv - Kevin Van de Velde with the support of @mire
        im - Ivan Masár

    A new Bootstrap-based default look and feel for JSPUI (see DS-1675 for screenshots)

    Kindly contributed by Andrea Bollini & Luigi Andrea Pascarelli with the support of CINECA

    JSPUI new features

        Bibliographic import and lookup in Submission, by CINECA and Greek National Documentation Centre/EKT
        AJAX progress bar for file upload in the submission upload step (DS-1639), by Andrea Bollini with the support of CINECA
        Sherpa/Romeo integration in the submission upload step (DS-1633), by Andrea Bollini with the support of CINECA

    JSPUI porting of features previously available only on XMLUI

        Advanced Embargo feature
        Item level versioning feature
        Curation tasks administrative UI
        "Login as" feature

    Kindly contributed by Keiji Suzuki & Luigi Andrea Pascarelli with the support of CINECA

    UI support for metadata batch import from various bibliographic formats

        Update to Biblio-Transformation-Engine 0.9.2.2
        Added data loader for OAI-PMH
        New configuration format to support simultaneous input mappings from the various supported metadata formats
        New interface for administrators in JSPUI (for file data loaders like bibtex, csv, tsv, endnote and ris)

    Kindly contributed by the Greek National Documentation Centre/EKT

    SWORDv2 module update, which provides the following improvements:

        some general bug fixes including: bitstream url construction, config options, context management and connection pool, ORIGINAL bundle problem (DS-1149)
        proper METSDSpaceSIP support in both deposit and update
        proper authentication for accessing actionable bitstreams (i.e. those that can be replaced via sword), tightened security options around mediated actions, and add extra security to the access of descriptive documents (deposit receipts, statements)
        more configuration options: bundles to expose in Statements, DepositMO extensions (for individual files), and many more
        some general refactoring
        addition of 404 responses where necessary
        better support for add/replace of metadata, and how metadata updates are handled on archived items
        update to latest version of Java Server library
        new bitstream formats in the bitstream registry

    Kindly contributed by Richard Jones with the support of Cottage Labs

    Improved command line features

        Run commands in batch
        Display DSpace instance information including version and enabled modules (DS-1456)
        Create users from command line (DS-1355)

    Kindly contributed by Mark H. Wood with the support of IUPUI University Library

    Support simple embargo in XMLUI item display and in AIP import/export

        DS-1697 - Convey effective dates in METSRIGHTS information (Closed)
        DS-1514 - Embargo settings on item import (Closed)

    Kindly contributed by Ivan Masár and Terry Brady with the support of Georgetown University

    Language switch for xmlui and some basic i18n stuff

        DS-842 - Language switch for xmlui and some basic i18n stuff (Closed)

    Kindly contributed by Claudia Jürgen with the support of TU Dortmund University

    Filtering of web spiders from statistics can now match by the spider host's DNS name or the spider's User-Agent string.

        DS-790 - SOLR - Spider detection to match on hostname or useragent (Closed)

    Kindly contributed by Mark H. Wood with the support of IUPUI University Library

    Several improvements to help Google Scholar better index your content (requested by Google Scholar team). See also Search Engine Optimization recommendations, for ways to further enhance Google Scholar (and other search engine) findability.

        DS-1481 - "dc.date.issued" is often incorrectly set (reported from Google) (Closed)
        DS-1482 - Add a way for harvesters to find recently added items (request from Google) (Closed)
        DS-1483 - Store link to "primary bitstream" in citation_pdf_url for Google Scholar (request from Google) (Closed)

    Kindly contributed by several members of the DSpace Committer team (see individual tickets for more details).

    The following list contains all features that are included in the DSpace 4.0 release, but need to be enabled manually. Review the documentation for these features carefully, especially if you are upgrading from an older version of DSpace.

    DOI Support

        Support for minting new DOIs
        Support for the DataCite and EZID DOI providers

        DS-1535 - DOI support for dspace-api (Closed)
        DS-1678 - EZID DOI provider for DSpace (Closed)

    Kindly contributed by Pascal-Nicolas Becker & Mark Wood with the support of TU Berlin and IUPUI University Library

    Support running handle server and application container on separate machines

        DS-1637 - Support running handle server and application container on separate machines (Closed)

    Kindly contributed by Pascal-Nicolas Becker, Andrea Bollini & Mark Wood with the support of TU Berlin and CINECA

    Mobile Theme for XMLUI matures from beta (DS-1679)

        New feature: Documentation

    Kindly contributed by Elias Tzoc and James Russell with the support of Miami University

    Improvements to LDAP Authentication

        New option to map LDAP attribute-based group membership to internal DSpace groups

    Kindly contributed by Ivan Masár and Sam Ottenhoff of Longsight for Allegheny College (DS-1078).

    Media filter generates better-looking thumbnails

        DS-1259 - use better image downscaling method in filter-media (Closed)

    Kindly contributed by Jason Sherman with the support of University of Science and Arts of Oklahoma

    Curation Task for Consuming Web Services

        DS-1647 - Curation Task for Consuming Web Services (Closed)

    Kindly contributed by Richard Rodgers with the support of Massachusetts Institute of Technology

    Request a Copy for JSPUI and XMLUI (DS-824)

        For items with restricted access, allows users to ask the original author for a copy of the item
        [original Minho addon docs: RequestCopy]

    Original contribution of the University of Minho, improvements and porting to XMLUI by Andrea Bollini with the support of CINECA

    A new REST web service API module based on Jersey (a JAX-RS 1.0 implementation) (DS-1696)

    Provides:

        Read-only access to unrestricted communities, collections, items and bitstreams
        Handle lookup
        JSON (and XML) output formats

    Kindly contributed by Peter Dietz with the support of Ohio State University Libraries

    A full list of all changes / bug fixes in 4.x is available in the Changes in 4.x section.

    The following individuals have contributed directly to this release of DSpace: Alan Orth, Alexey Maslov, Àlex Magaz Graça, Andrea Bollini, Andrea Schweer, Andrew Waterman, Anja Le Blanc, Bram Luyten (@mire), Brian Freels-Stendel, Cedric Devaux, Claudia Jürgen, Clint Bellanger, Denis Fdz, DSpace @ Lyncode, Elias Tzoc, Fabio Bolognesi, Hardy Pottinger, Hélder Silva, Hilton Gibson, Ian Boston, Ivan Masár, james bardin, Jason Sherman, João Melo, Jonathan Blood, Juan Corrales Correyero, Keiji Suzuki, Kevin Van de Velde, Kostas Maistrelis, Kostas Stamatis, LifeH2O, Luigi Andrea Pascarelli, Marco Fabiani, Marco Weiss, Mark Diggory, Mark H. Wood, Michael White, Moises A., Moises Alvarez, Onivaldo Rosa Junior, Pascal-Nicolas Becker, Peter Dietz, Rania Stathopoulou, Raul Ruiz, Richard Jones, Richard Rodgers, Robert Ruiz, Robin Taylor, Roeland Dillen, Samuel Ottenhoff, Sara Amato, Sean Carte, Stuart Lewis, Terry Brady, Thomas Autry, Thomas Misilo, Tim Donohue, Toni Prieto, usha sharma, and others who reviewed and commented on their work. Many of these could not do this work without the support (release time and financial) of their associated institutions. We offer thanks to those institutions for supporting their staff to take time to contribute to the DSpace project.

    A big thank you also goes out to the DSpace Community Advisory Team (DCAT), who helped the developers to prioritize and plan out several of the new features that made it into this release. The current DCAT members include: Amy Lana, Augustine Gitonga, Bram Luyten, Ciarán Walsh, Claire Bundy, Dibyendra Hyoju, Elena Feinstein, Elin Stangeland, Iryna Kuchma, Jim Ottaviani, Leonie Hayes, Maureen Walsh, Michael Guthrie, Sarah Molloy, Sarah Shreeves, Sue Kunda, Valorie Hollister and Yan Han.

    We apologize to any contributor accidentally left off this list. DSpace has such a large, active development community that we sometimes lose track of all our contributors. Our ongoing list of all known people/institutions that have contributed to DSpace software can be found on our DSpace Contributors page. Acknowledgments to those left off will be made in future releases.

    Want to see your name appear in our list of contributors? All you have to do is report an issue, fix a bug, improve our documentation or help us determine the necessary requirements for a new feature! Visit our Issue Tracker to report a bug, or join the dspace-devel mailing list to take part in development work. If you'd like to help improve our current documentation, please get in touch with one of our Committers with your ideas. You don't even need to be a developer! Repository managers can also get involved by volunteering to join the DSpace Community Advisory Team and helping our developers to plan new features.

    The Release Team consisted of Mark H. Wood, Hardy Pottinger and Andrea Bollini.

    Additional thanks to Tim Donohue from DuraSpace for keeping all of us focused on the work at hand, for calming us when we got excited, and for the general support for the DSpace project.

    1.2 Functional Overview

    The following sections describe the various functional aspects of the DSpace system.

    1 Online access to your digital assets
        1.1 Full-text search
        1.2 Navigation
        1.3 Supported file types
        1.4 Optimized for Google Indexing
        1.5 OpenURL Support
    2 Metadata Management
        2.1 Metadata
        2.2 Choice Management and Authority Control
    3 Licensing
        3.1 Collection and Community Licenses
        3.2 License granted by the submitter to the repository
        3.3 Creative Commons Support for DSpace Items
    4 Persistent URLs and Identifiers
        4.1 Handles
        4.2 Bitstream 'Persistent' Identifiers
    5 Getting content into DSpace
        5.1 The Manual DSpace Submission and Workflow System
            5.1.1 Workflow Steps
            5.1.2 Submission Workflow in DSpace
        5.2 Command line import facilities
        5.3 Registration for externally hosted files
    6 Getting content out of DSpace
        6.1 OAI Support
        6.2 SWORD Support
        6.3 Command Line Export Facilities
        6.4 Packager Plugins
        6.5 Crosswalk Plugins
        6.6 Supervision and Collaboration
    7 User Management
        7.1 User Accounts (E-Person)
        7.2 Subscriptions
        7.3 Groups
    8 Access Control
        8.1 Authentication
        8.2 Authorization
    9 Usage Metrics
        9.1 Item, Collection and Community Usage Statistics
        9.2 System Statistics
    10 Digital Preservation
        10.1 Checksum Checker

    11 System Design
        11.1 Data Model
        11.2 Storage Resource Broker (SRB) Support

    1.2.1 Online access to your digital assets

    The online presentation of your content in an organized tree of Communities and Collections is a main feature of DSpace. Users can access pages for individual items, which present metadata descriptions together with files available for download.

    Full-text search

    DSpace can process uploaded text-based content for full-text searching. This means that not only the metadata you provide for a given file will be searchable, but all of its contents will be indexed as well. This allows users to search for specific keywords that only appear in the actual content and not in the provided description.

    Navigation

    DSpace allows users to find their way to relevant content in a number of ways, including:

        Searching for one or more keywords in metadata or extracted full-text
        Faceted browsing through any field provided in the item description
        Through external reference, such as a Handle

    Search is an essential component of discovery in DSpace. Users' expectations from a search engine are quite high, so a goal for DSpace is to supply as many search features as possible. DSpace's indexing and search module has a very simple API which allows for indexing new content, regenerating the index, and performing searches on the entire corpus, a community, or collection. Behind the API is the Java freeware search engine Lucene. Lucene gives us fielded searching, stop word removal, stemming, and the ability to incrementally add new indexed content without regenerating the entire index. The specific Lucene search indexes are configurable, enabling institutions to customize which DSpace metadata fields are indexed.

    Another important mechanism for discovery in DSpace is the browse. This is the process whereby the user views a particular index, such as the title index, and navigates around it in search of interesting items. The browse subsystem provides a simple API for achieving this by allowing a caller to specify an index, and a subsection of that index. The browse subsystem then discloses the portion of the index of interest. Indices that may be browsed are item title, item issue date, item author, and subject terms. Additionally, the browse can be limited to items within a particular collection or community.

    Supported file types

    DSpace can accommodate any type of uploaded file. While DSpace is most known for hosting text-based materials including scholarly communication and electronic theses and dissertations (ETDs), there are many stakeholders in the community who use DSpace for multimedia, data and learning objects. While some restrictions apply, DSpace can even serve as a store for HTML Archives.

    Files that have been uploaded to DSpace are often referred to as "Bitstreams". The reason for this is mainly historic and tracks back to the technical implementation. After ingestion, files in DSpace are stored on the file system as a stream of bits without the file extension.

    Optimized for Google Indexing

    The DuraSpace community fosters a close relation with Google to ensure optimal indexing of DSpace content, primarily in the Google Search and Google Scholar products. For the purpose of Google Scholar indexing, DSpace added specific metadata in the page head tags facilitating indexing in Scholar. More information can be retrieved on the Google Scholar Metadata Mappings page. Popular DSpace repositories often generate over 60% of their visits from Google pages.

    OpenURL Support

    DSpace supports the OpenURL protocol from SFX, in a rather simple fashion. If your institution has an SFX server, DSpace will display an OpenURL link on every item page, automatically using the Dublin Core metadata. Additionally, DSpace can respond to incoming OpenURLs. Presently it simply passes the information in the OpenURL to the search subsystem. A list of results is then displayed, which usually gives the relevant item (if it is in DSpace) at the top of the list.

    1.2.2 Metadata Management

    Metadata

    Broadly speaking, DSpace holds three sorts of metadata about archived content:

    Descriptive Metadata: DSpace can support multiple flat metadata schemas for describing an item. A qualified Dublin Core metadata schema loosely based on the Library Application Profile set of elements and qualifiers is provided by default. The set of elements and qualifiers used by MIT Libraries comes pre-configured with the DSpace source code. However, you can configure multiple schemas and select metadata fields from a mix of configured schemas to describe your items. Other descriptive metadata about items (e.g. metadata described in a hierarchical schema) may be held in serialized bitstreams.

    Communities and collections have some simple descriptive metadata (a name, and some descriptive prose), held in the DBMS.

    Administrative Metadata: This includes preservation metadata, provenance and authorization policy data. Most of this is held within DSpace's relational DBMS schema. Provenance metadata (prose) is stored in Dublin Core records. Additionally, some other administrative metadata (for example, bitstream byte sizes and MIME types) is replicated in Dublin Core records so that it is easily accessible outside of DSpace.

    Structural Metadata: This includes information about how to present an item, or bitstreams within an item, to an end-user, and the relationships between constituent parts of the item. As an example, consider a thesis consisting of a number of TIFF images, each depicting a single page of the thesis. Structural metadata would include the fact that each image is a single page, and the ordering of the TIFF images/pages. Structural metadata in DSpace is currently fairly basic; within an item, bitstreams can be arranged into separate bundles as described above. A bundle may also optionally have a primary bitstream. This is currently used by the HTML support to indicate which bitstream in the bundle is the first HTML file to send to a browser. In addition to some basic technical metadata, a bitstream also has a 'sequence ID' that uniquely identifies it within an item. This is used to produce a 'persistent' bitstream identifier for each bitstream. Additional structural metadata can be stored in serialized bitstreams, but DSpace does not currently understand this natively.

    Choice Management and Authority Control

    This is a configurable framework that lets you define plug-in classes to control the choice of values for given DSpace metadata fields. It also lets you configure fields to include "authority" values along with the textual metadata value. The choice-control system includes a user interface in both the Configurable Submission UI and the Admin UI (edit Item pages) that assists the user in choosing metadata values.

    Introduction and Motivation

    Definitions

    Choice Management

    This is a mechanism that generates a list of choices for a value to be entered in a given metadata field. Depending on your implementation, the exact choice list might be determined by a proposed value or query, or it could be a fixed list that is the same for every query. It may also be closed (limited to choices produced internally) or open, allowing the user-supplied query to be included as a choice.
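The difference between a closed and an open choice list can be illustrated with a minimal sketch. This is not the DSpace plug-in API: the function name `choices`, the substring matching rule, and the sample vocabulary are all assumptions made for illustration.

```python
def choices(query, vocabulary, closed=True):
    """Return candidate values for a metadata field.

    Hypothetical helper: matches are vocabulary entries that contain the
    query (case-insensitive). A closed list offers only those matches; an
    open list also offers the user's own query as a choice.
    """
    wanted = query.strip()
    matches = [value for value in vocabulary if wanted.lower() in value.lower()]
    if not closed and wanted and wanted not in matches:
        matches.append(wanted)
    return matches


# Sample vocabulary, invented for this example.
AUTHORS = ["Smith, John", "Jones, Ann"]
closed_result = choices("Doe, Jane", AUTHORS)              # nothing to offer
open_result = choices("Doe, Jane", AUTHORS, closed=False)  # offers the query itself
```

A fixed list that is the same for every query would simply ignore `query` and return the whole vocabulary.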

    Authority Control

    This works in addition to choice management to supply an authority key along with the chosen value, which is also assigned to the Item's metadata field entry. Any authority-controlled field is also inherently choice-controlled.

    About Authority Control

    The advantages we seek from an authority controlled metadata field are:

    1. There is a simple and positive way to test whether two values are identical, by comparing authority keys.
        Comparing plain text values can give false positive results, e.g. when two different people have a name that is written the same.
        It can also give false negative results when the same name is written different ways, e.g. "J. Smith" vs. "John Smith".
    2. Help in entering correct metadata values. The submission and admin UIs may call on the authority to check a proposed value and list possible matches to help the user select one.
    3. Improved interoperability. By sharing a name authority with another application, your DSpace can interoperate more cleanly with other applications.
        For example, a DSpace institutional repository sharing a naming authority with the campus social network would let the social network construct a list of all DSpace Items matching the shared author identifier, rather than by error-prone name matching.
        When the name authority is shared with a campus directory, DSpace can look up the email address of an author to send automatic email about works of theirs submitted by a third party. That author does not have to be an EPerson.
    4. Authority keys are normally invisible in the public web UIs. They are only seen by administrators editing metadata. The value of an authority key is not expected to be meaningful to an end-user or site visitor.

    Authority control is different from the controlled vocabulary of keywords already implemented in the submission UI:

    1. Authorities are external to DSpace. The source of authority control is typically an external database or network resource.
        Plug-in architecture makes it easy to integrate new authorities without modifying any core code.
    2. This authority proposal impacts all phases of metadata management.
        The keyword vocabularies are only for the submission UI.
        Authority control is asserted everywhere metadata values are changed, including unattended/batch submission, LNI and SWORD package submission, and the administrative UI.

    Some Terminology

    Authority: An authority is a source of fixed values for a given domain, each unique value identified by a key. For example, the OCLC LC Name Authority Service.

    Authority Record: The information associated with one of the values in an authority; may include alternate spellings and equivalent forms of the value, etc.

    Authority Key: An opaque, hopefully persistent, identifier corresponding to exactly one record in the authority.
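The false positive and false negative cases described under "About Authority Control" can be made concrete with a small sketch. The `MetadataValue` class, the `same_entity` function, and the key strings below are hypothetical and only illustrate the idea of comparing opaque authority keys instead of text; they do not reflect DSpace's internal classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataValue:
    """A textual metadata value with an optional authority key (illustrative)."""
    text: str
    authority_key: Optional[str] = None


def same_entity(a, b):
    """Compare two values by authority key.

    Returns True or False when both values carry a key, and None when the
    question cannot be settled because at least one key is missing.
    """
    if a.authority_key is None or b.authority_key is None:
        return None
    return a.authority_key == b.authority_key


# Invented keys: the first two values name the same person in different ways,
# the third is a different person whose name happens to be written the same.
short_form = MetadataValue("J. Smith", "name-0001")
long_form = MetadataValue("John Smith", "name-0001")
namesake = MetadataValue("John Smith", "name-0002")

text_says_same = long_form.text == namesake.text   # a false positive
keys_say_same = same_entity(long_form, namesake)   # the keys disagree
```

Comparing `short_form` with `long_form` shows the opposite case: the text differs, but the shared key identifies one person.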

    1.2.3 Licensing

    DSpace offers support for licenses on different levels.

    Collection and Community Licenses

    Each community and collection in the hierarchy of a DSpace repository can contain its own license terms. This allows an institution to use the repository both for collections where certain rights are reserved and others from which the content may be accessed and distributed more freely.

    License granted by the submitter to the repository

    At the end of the manual submission process, the submitter is asked to grant the repository service an appropriate distribution license. This license can be easily customized on a per collection basis. In its most common form, the submitter grants to the repository service a non-exclusive distribution license, meaning that the submitter officially gives the repository service the right to share his or her work with the world.

    Creative Commons Support for DSpace Items

    DSpace provides support for Creative Commons licenses to be attached to items in the repository. They represent an alternative to traditional copyright. To learn more about Creative Commons, visit their website. Support for license selection is controlled by a site-wide configuration option, and since license selection involves interaction with the Creative Commons website, additional parameters may be configured to work with a proxy server. If the option is enabled, users may select a Creative Commons license during the submission process, or elect to skip Creative Commons licensing. If a selection is made, metadata and (optionally) a copy of the license text is stored along with the item in the repository. There is also an indication - text and a Creative Commons icon - in the item display page of the web user interface when an item is licensed under Creative Commons. For specifics of how to configure and use Creative Commons licenses, see the configuration section.

    1.2.4 Persistent URLs and Identifiers

    Handles

    Researchers require a stable point of reference for their works. The simple evolution from sharing of citations to emailing of URLs broke when Web users learned that sites can disappear or be reconfigured without notice, and that their bookmark files containing critical links to research results couldn't be trusted in the long term. To help solve this problem, a core DSpace feature is the creation of a persistent identifier for every item, collection and community stored in DSpace. To persist identifiers, DSpace requires a storage- and location-independent mechanism for creating and maintaining identifiers. DSpace uses the CNRI Handle System for creating these identifiers. The rest of this section assumes a basic familiarity with the Handle system.

    DSpace uses Handles primarily as a means of assigning globally unique identifiers to objects. Each site running DSpace needs to obtain a unique Handle 'prefix' from CNRI, so we know that if we create identifiers with that prefix, they won't clash with identifiers created elsewhere.

    Presently, Handles are assigned to communities, collections, and items. Bundles and bitstreams are not assigned Handles, since over time, the way in which an item is encoded as bits may change, in order to allow access with future technologies and devices. Older versions may be moved to off-line storage as a new standard becomes de facto. Since it's usually the item that is being preserved, rather than the particular bit encoding, it only makes sense to persistently identify and allow access to the item, and allow users to access the appropriate bit encoding from there.

    Of course, it may be that a particular bit encoding of a file is explicitly being preserved; in this case, the bitstream could be the only one in the item, and the item's Handle would then essentially refer just to that bitstream. The same bitstream can also be included in other items, and thus would be citable as part of a greater item, or individually.

    The Handle system also features a global resolution infrastructure; that is, an end-user can enter a Handle into any service (e.g. Web page) that can resolve Handles, and the end-user will be directed to the object (in the case of DSpace, community, collection or item) identified by that Handle. In order to take advantage of this feature of the Handle system, a DSpace site must also run a 'Handle server' that can accept and resolve incoming resolution requests. All the code for this is included in the DSpace source code bundle.

    Handles can be written in two forms:

        hdl:1721.123/4567
        http://hdl.handle.net/1721.123/4567

    The above represent the same Handle. The first is possibly more convenient to use only as an identifier; however, by using the second form, any Web browser becomes capable of resolving Handles. An end-user need only access this form of the Handle as they would any other URL. It is possible to enable some browsers to resolve the first form of Handle as if they were standard URLs using CNRI's Handle Resolver plug-in, but since the first form can always be simply derived from the second, DSpace displays Handles in the second form, so that it is more useful for end-users.

    It is important to note that DSpace uses the CNRI Handle infrastructure only at the 'site' level. For example, in the above example, the DSpace site has been assigned the prefix '1721.123'. It is still the responsibility of the DSpace site to maintain the association between a full Handle (including the '4567' local part) and the community, collection or item in question.
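Because the two written forms carry the same information, converting between them is a matter of string handling. The sketch below is illustrative only; the function names are hypothetical and this is not code from DSpace or the Handle System software.

```python
from typing import Tuple

# Resolver address as it appears in the second written form above.
HANDLE_RESOLVER = "http://hdl.handle.net/"


def parse_handle(handle: str) -> Tuple[str, str]:
    """Split a Handle in either written form into (prefix, local part)."""
    if handle.startswith("hdl:"):
        bare = handle[len("hdl:"):]
    elif handle.startswith(HANDLE_RESOLVER):
        bare = handle[len(HANDLE_RESOLVER):]
    else:
        raise ValueError(f"not a recognised Handle form: {handle!r}")
    prefix, separator, local_part = bare.partition("/")
    if not (prefix and separator and local_part):
        raise ValueError(f"Handle needs a prefix and a local part: {handle!r}")
    return prefix, local_part


def to_url_form(handle: str) -> str:
    """Return the browser-resolvable form of a Handle."""
    prefix, local_part = parse_handle(handle)
    return f"{HANDLE_RESOLVER}{prefix}/{local_part}"


def to_hdl_form(handle: str) -> str:
    """Return the bare identifier form of a Handle."""
    prefix, local_part = parse_handle(handle)
    return f"hdl:{prefix}/{local_part}"
```

The prefix returned by `parse_handle` is the part assigned by CNRI to the site; the local part is what the DSpace site itself must map to a community, collection or item.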

    Bitstream 'Persistent' Identifiers

    Similar to Handles for DSpace items, bitstreams also have 'persistent' identifiers. They are more volatile than Handles, since if the content is moved to a different server or organization, they will no longer work (hence the quotes around 'persistent'). However, they are more easily persisted than the simple URLs based on database primary key previously used. This means that external systems can more reliably refer to specific bitstreams stored in a DSpace instance.

    Each bitstream has a sequence ID, unique within an item. This sequence ID is used to create a persistent ID, ofthe form:

    dspace url/bitstream/handle/sequence ID/filename

    For example:

    https://dspace.myu.edu/bitstream/123.456/789/24/foo.html

    The above refers to the bitstream with sequence ID 24 in the item with the Handle hdl:123.456/789. The foo.html is really just there as a hint to browsers: although DSpace will provide the appropriate MIME type, some browsers only function correctly if the file has an expected extension.
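The identifier pattern can be assembled mechanically from its four parts. The following is a minimal sketch with a hypothetical function name; percent-encoding the filename is an assumption added here so that names containing spaces still produce a valid URL, and is not something the text above specifies.

```python
from urllib.parse import quote


def bitstream_url(dspace_url, handle, sequence_id, filename):
    """Build 'dspace url/bitstream/handle/sequence ID/filename'.

    handle is the bare Handle (prefix/local part) without the 'hdl:' scheme.
    """
    base = dspace_url.rstrip("/")
    return f"{base}/bitstream/{handle}/{sequence_id}/{quote(filename)}"


example = bitstream_url("https://dspace.myu.edu", "123.456/789", 24, "foo.html")
```

With the values from the example above, `example` is the same URL shown in the text.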

    1.2.5 Getting content into DSpace

    The Manual DSpace Submission and Workflow System

    Rather than being a single subsystem, ingesting is a process that spans several. Below is a simple illustration of the current ingesting process in DSpace.

    DSpace Ingest Process

    The batch item importer is an application which turns an external SIP (an XML metadata document with some content files) into an "in progress submission" object. The Web submission UI is similarly used by an end-user to assemble an "in progress submission" object.

    Depending on the policy of the collection to which the submission is targeted, a workflow process may be started. This typically allows one or more human reviewers or 'gatekeepers' to check over the submission and ensure it is suitable for inclusion in the collection.

    When the Batch Ingester or Web Submit UI completes the InProgressSubmission object, and invokes the next stage of ingest (be that workflow or item installation), a provenance message is added to the Dublin Core which includes the filenames and checksums of the content of the submission. Likewise, each time a workflow changes state (e.g. a reviewer accepts the submission), a similar provenance statement is added. This allows us to track how the item has changed since a user submitted it.

    Once any workflow process is successfully and positively completed, the InProgressSubmission object is consumed by an "item installer", which converts the InProgressSubmission into a fully blown archived item in DSpace. The item installer:

        Assigns an accession date
        Adds a "date.available" value to the Dublin Core metadata record of the item
        Adds an issue date if none already present
        Adds a provenance message (including bitstream checksums)
        Assigns a Handle persistent identifier
        Adds the item to the target collection, and adds appropriate authorization policies
        Adds the new item to the search and browse index
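The metadata-related installer steps above can be sketched as a function over a plain dictionary. This is an illustration, not the DSpace item installer: apart from "date.available", the field names ("date.accessioned", "date.issued", "description.provenance", "identifier.uri"), the SHA-256 checksum, and the wording of the provenance message are assumptions, and adding the item to a collection and to the indexes is left out.

```python
import hashlib
from datetime import datetime, timezone


def install_item(metadata, files, handle, now=None):
    """Return a copy of metadata with the installer's additions applied.

    metadata: dict mapping a field name to a list of values
    files:    dict mapping a filename to its content as bytes
    handle:   bare Handle (prefix/local part) assigned to the item
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    installed = {field: list(values) for field, values in metadata.items()}

    # Accession date and availability date are both set at install time.
    installed.setdefault("date.accessioned", []).append(stamp)
    installed.setdefault("date.available", []).append(stamp)

    # The issue date is only added when the submission did not supply one.
    if not installed.get("date.issued"):
        installed["date.issued"] = [now.strftime("%Y-%m-%d")]

    # Provenance message listing each file with its checksum.
    checksums = ", ".join(
        f"{name} (SHA-256 {hashlib.sha256(data).hexdigest()})"
        for name, data in sorted(files.items())
    )
    installed.setdefault("description.provenance", []).append(
        f"Installed on {stamp}. Files: {checksums}"
    )

    # Persistent identifier in its resolvable form.
    installed.setdefault("identifier.uri", []).append(
        f"http://hdl.handle.net/{handle}"
    )
    return installed
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to check that an existing issue date is preserved while a missing one is filled in.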

    Workflow Steps

    A collection's workflow can have up to three steps. Each collection may have an associated e-person group for performing each step; if no group is associated with a certain step, that step is skipped. If a collection has no e-person groups associated with any step, submissions to that collection are installed straight into the main archive.

    In other words, the sequence is this: The collection receives a submission. If the collection has a group assigned for workflow step 1, that step is invoked, and the group is notified. Otherwise, workflow step 1 is skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a group assigned to those steps.
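
    The step-skipping rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not DSpace code: the function name and the dictionary that maps step numbers to group names are assumptions made for the example.

```python
from typing import Dict, List, Optional


def steps_to_run(step_groups: Dict[int, Optional[str]]) -> List[int]:
    """Return the workflow steps (1 to 3) that will be invoked, in order.

    step_groups maps a step number to the name of its e-person group,
    or None when no group is assigned (a hypothetical representation).
    """
    return [step for step in (1, 2, 3) if step_groups.get(step)]


# Groups assigned to steps 1 and 3 only: step 2 is skipped.
only_first_and_last = steps_to_run({1: "Reviewers", 2: None, 3: "Editors"})

# No groups at all: no step runs, so the submission is installed directly.
no_workflow = steps_to_run({})
```

    An empty result corresponds to the case where submissions go straight into the main archive.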

    When a step is invoked, the submission is put into the 'task pool' of the step's associated group. One member of that group takes the task from the pool, and it is then removed from the task pool, to avoid the situation where several people in the group may be performing the same task without realizing it.

    The member of the group who has taken the task from the pool may then perform one of three actions:

    Workflow Step   Possible actions

    1               Can accept submission for inclusion, or reject submission.

    2               Can edit metadata provided by the user with the submission, but cannot change the submitted files. Can accept submission for inclusion, or reject submission.

    3               Can edit metadata provided by the user with the submission, but cannot change the submitted files. Must then commit to archive; may not reject submission.

    Submission Workflow in DSpace

    If a submission is rejected, the reason (entered by the workflow participant) is e-mailed to the submitter, and the submission is returned to the submitter's 'My DSpace' page. The submitter can then make any necessary modifications and re-submit, whereupon the process starts again.

    If a submission is 'accepted', it is passed to the next step in the workflow. If there are no more workflow steps with associated groups, the submission is installed in the main archive.
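
    The accept and reject transitions, together with the per-step permissions listed in the table, can be modelled as a small state function. The sketch below is a simplified reading of the text, not the DSpace workflow implementation; the action names and the returned state labels are invented for the example.

```python
# Actions permitted at each workflow step, following the table above.
# Step 3 must commit to the archive, so "reject" is not offered there.
ALLOWED_ACTIONS = {
    1: {"accept", "reject"},
    2: {"edit_metadata", "accept", "reject"},
    3: {"edit_metadata", "accept"},
}


def next_state(current_step, action, active_steps):
    """Return (state, step) after a reviewer performs an action.

    active_steps lists the steps that have a group assigned, in order.
    """
    if action not in ALLOWED_ACTIONS[current_step]:
        raise ValueError(f"step {current_step} does not allow {action!r}")
    if action == "edit_metadata":
        return ("in_step", current_step)  # editing does not advance the workflow
    if action == "reject":
        return ("returned_to_submitter", None)
    later_steps = [step for step in active_steps if step > current_step]
    if later_steps:
        return ("in_step", later_steps[0])
    return ("archived", None)
```

    For a collection with groups on steps 1 and 3, accepting at step 1 moves the submission to step 3, and accepting at step 3 installs it in the archive.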

    One last possibility is that a workflow can be 'aborted' by a DSpace site administrator. This is accomplished using the administration UI.

    The reason for this apparently arbitrary design is that it was the simplest case that covered the needs of the early adopter communities at MIT. The functionality of the workflow system will no doubt be extended in the future.

    Command line import facilities

    DSpace includes batch tools to import items in a simple directory structure, where the Dublin Core metadata is stored in an XML file. This may be used as the basis for moving content between DSpace and other systems. For more information see Item Importer and Exporter.
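
    As a rough illustration of such a directory structure, the following Python sketch writes one item directory. The file names `dublin_core.xml` and `contents` and the `dcvalue` element are assumptions based on the layout usually called the DSpace Simple Archive Format; they are not spelled out in this section, so check the Item Importer and Exporter documentation before relying on them.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def dublin_core_xml(metadata):
    """Serialize (element, qualifier, value) triples as a Dublin Core XML string."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        # Assumed convention: an unqualified field uses qualifier="none".
        node = ET.SubElement(root, "dcvalue", element=element,
                             qualifier=qualifier or "none")
        node.text = value
    return ET.tostring(root, encoding="unicode")


def write_item(item_dir, metadata, files):
    """Write one item directory: metadata file, content files, and a file list."""
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)
    (item_dir / "dublin_core.xml").write_text(dublin_core_xml(metadata),
                                              encoding="utf-8")
    for name, data in files.items():
        (item_dir / name).write_bytes(data)
    # Assumed convention: "contents" lists one content file name per line.
    (item_dir / "contents").write_text("\n".join(files) + "\n", encoding="utf-8")
```

    A batch would then be a parent directory holding one such subdirectory per item.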

    DSpace also includes various package importer tools, which support many common content packaging formats like METS. For more information see Package Importer and Exporter.

    Registration for externally hosted files

    Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in accessible computer storage. An example might be that there is a repository for existing digital assets. Rather than using the normal interactive ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.

    1.2.6 Getting content out of DSpace

    OAI Support

    The Open Archives Initiative has developed a protocol for metadata harvesting. This allows sites to programmatically retrieve or 'harvest' the metadata from several sources, and offer services using that metadata, such as indexing or linking services. Such a service could allow users to access information from a large number of sites from one place.

    DSpace exposes the Dublin Core metadata for items that are publicly (anonymously) accessible. Additionally, the collection structure is also exposed via the OAI protocol's 'sets' mechanism. OCLC's open source OAICat framework is used to provide this functionality.

    You can also configure the OAI service to make use of any crosswalk plugin to offer additional metadata formats, such as MODS.

    DSpace's OAI service does support the exposing of deletion information for withdrawn items, but not for items that are 'expunged' (see the Data Model section). DSpace also supports OAI-PMH resumption tokens.
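
    From a harvester's point of view, sets, deletion information and resumption tokens look roughly as follows. The sketch uses only the argument and element names defined by the OAI-PMH 2.0 protocol; the base URL in the usage note is a placeholder, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (live identifiers, deleted identifiers, next token or None)."""
    root = ET.fromstring(xml_text)
    live, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return live, deleted, (token or None)
```

    A harvester would call `list_records_url("http://example.org/oai/request")`, parse the response, and repeat with the returned token until `parse_page` yields `None` for the token.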

    SWORD Support

    SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. SWORD was further developed in SWORD version 2 to add the ability to retrieve, update, or delete deposits. DSpace supports the SWORD protocol via the 'sword' web application and SWORD v2 via the 'swordv2' web application. The specification and further information can be found at http://swordapp.org.

    Command Line Export Facilities

    DSpace includes batch tools to export items in a simple directory structure, where the Dublin Core metadata is stored in an XML file. This may be used as the basis for moving content between DSpace and other systems. For more information see Item Importer and Exporter.

    DSpace also includes various package exporter tools, which support many common content packaging formats like METS. For more information see Package Importer and Exporter.

    Packager Plugins

    Packagers are software modules that translate between DSpace Item objects and a self-contained external representation, or "package". A Package Ingester interprets, or ingests, the package and creates an Item. A Package Disseminator writes out the contents of an Item in the package format.

    A package is typically an archive file such as a Zip or "tar" file, including a manifest document which contains metadata and a description of the package contents. The IMS Content Package is a typical packaging standard. A package might also be a single document or media file that contains its own metadata, such as a PDF document with embedded descriptive metadata.

    Package ingesters and package disseminators are each a type of named plugin (see Plugin Manager), so it is easy to add new packagers specific to the needs of your site. You do not have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement one of them.

    Most packager plugins call upon Crosswalk Plugins to translate the metadata between DSpace's object model and the package format.

    More information about calling Packagers to ingest or disseminate content can be found in the Package Importer and Exporter section of the System Administration documentation.

    Crosswalk Plugins

    Crosswalks are software modules that translate between DSpace object metadata and a specific external representation. An Ingestion Crosswalk interprets the external format and crosswalks it to DSpace's internal data structure, while a Dissemination Crosswalk does the opposite.

    For example, a MODS ingestion crosswalk translates descriptive metadata from the MODS format to the metadata fields on a DSpace Item. A MODS dissemination crosswalk generates a MODS document from the metadata on a DSpace Item.

    Crosswalk plugins are named plugins (see Plugin Manager), so it is easy to add new crosswalks. You do not have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement one of them.
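
    The idea of a named pair of crosswalks can be illustrated with a toy registry. Everything in this sketch is hypothetical: the registry, the "simple" format, its element names and the field mapping are invented for the example and do not reflect the DSpace plugin API or the MODS schema.

```python
import xml.etree.ElementTree as ET

# Registry of named crosswalk plugins, keyed by (format name, direction).
CROSSWALKS = {}


def crosswalk(name, direction):
    """Decorator that registers a function under a plugin name and direction."""
    def decorator(func):
        CROSSWALKS[(name, direction)] = func
        return func
    return decorator


# Hypothetical mapping between internal field names and external element names.
FIELD_TO_ELEMENT = {"dc.title": "title", "dc.contributor.author": "creator"}
ELEMENT_TO_FIELD = {element: field for field, element in FIELD_TO_ELEMENT.items()}


@crosswalk("simple", "disseminate")
def disseminate_simple(metadata):
    """Turn {field: [values]} into an external XML record."""
    root = ET.Element("record")
    for field, values in metadata.items():
        element = FIELD_TO_ELEMENT.get(field)
        if element is None:
            continue  # fields with no mapping are left out of the external record
        for value in values:
            ET.SubElement(root, element).text = value
    return ET.tostring(root, encoding="unicode")


@crosswalk("simple", "ingest")
def ingest_simple(xml_text):
    """Turn an external XML record back into {field: [values]}."""
    metadata = {}
    for child in ET.fromstring(xml_text):
        field = ELEMENT_TO_FIELD.get(child.tag)
        if field is not None:
            metadata.setdefault(field, []).append(child.text or "")
    return metadata
```

    A caller looks a crosswalk up by name, for example `CROSSWALKS[("simple", "ingest")]`, which mirrors the point that only one direction needs to exist for a given format.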

    There is also a special pair of crosswalk plugins which use XSL stylesheets to translate the external metadata to or from an internal DSpace format. You can add and modify XSLT crosswalks simply by editing the DSpace configuration and the stylesheets, which are stored in files in the DSpace installation directory.

    The Packager plugins and OAI-PMH server make use of crosswalk plugins.

    Supervision and Collaboration

    In order to facilitate, as a primary objective, the opportunity for thesis authors to be supervised in the preparation of their e-theses, a supervision order system exists to bind groups of other users (thesis supervisors) to an item in someone's pre-submission workspace. The bound group can have system policies associated with it that allow different levels of interaction with the student's item; a small set of default policy groups is provided:

    - Full editorial control
    - View item contents
    - No policies

    Once the default set has been applied, a system administrator may modify the policies as they would any other policy set in DSpace.

    This functionality could also be used in situations where researchers wish to collaborate on a particular submission, although there is no particular collaborative workspace functionality.

    1.2.7 User Management

    Although many of DSpace's functions such as document discovery and retrieval can be used anonymously, some features (and perhaps some documents) are only available to certain "privileged" users. E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges. This identity is bound to a session of a DSpace application such as the Web UI or one of the command-line batch programs. Both E-People and Groups are granted privileges by the authorization system described below.

    User Accounts (E-Person)

    DSpace holds the following information about each e-person:

    - E-mail address
    - First and last names
    - Whether the user is able to log in to the system via the Web UI, and whether they must use an X509 certificate to do so
    - A password (encrypted), if appropriate
    - A list of collections for which the e-person wishes to be notified of new items
    - Whether the e-person 'self-registered' with the system; that is, whether the system created the e-person record automatically as a result of the end-user independently registering with the system, as opposed to the e-person record being generated from the institution's personnel database, for example
    - The network ID for the corresponding LDAP record, if LDAP authentication is used for this E-Person

    Subscriptions

    As noted above, end-users (e-people) may 'subscribe' to collections in order to be alerted when new items appear in those collections. Each day, end-users who are subscribed to one or more collections will receive an e-mail giving brief details of all new items that appeared in any of those collections the previous day. If no new items appeared in any of the subscribed collections, no e-mail is sent. Users can unsubscribe themselves at any time. RSS feeds of new items are also available for collections and communities.
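
    The daily digest rule can be sketched as follows. The data shapes (a mapping from e-mail address to subscribed collection names, and a list of new-item tuples) are assumptions chosen for the example, not DSpace structures.

```python
from datetime import date, timedelta


def daily_digests(subscriptions, new_items, today):
    """Return {email: [titles]} for items added yesterday to subscribed collections.

    subscriptions: {email: set of collection names}
    new_items: list of (collection name, item title, date added)
    Subscribers with nothing new are omitted, so no e-mail is sent to them.
    """
    yesterday = today - timedelta(days=1)
    digests = {}
    for email, collections in subscriptions.items():
        titles = [title for collection, title, added in new_items
                  if collection in collections and added == yesterday]
        if titles:
            digests[email] = titles
    return digests
```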

    Groups

    Groups are another kind of entity that can be granted permissions in the authorization system. A group is usually an explicit list of E-People; anyone identified as one of those E-People also gains the privileges granted to the group.

    However, an application session can be assigned membership in a group without being identified as an E-Person. For example, some sites use this feature to identify users of a local network so they can read restricted materials not open to the whole world. Sessions originating from the local network are given membership in the "LocalUsers" group and gain the corresponding privileges.

    Administrators can also use groups as "roles" to manage the granting of privileges more efficiently.

    1.2.8 Access Control

    Authentication

    Authentication is when an application session positively identifies itself as belonging to an E-Person and/or Group. In DSpace 1.4 and later, it is implemented by a mechanism called Stackable Authentication: the DSpace configuration declares a "stack" of authentication methods. An application (like the Web UI) calls on the Authentication Manager, which tries each of these methods in turn to identify the E-Person to which the session belongs, as well as any extra Groups. The E-Person authentication methods are tried in turn until one succeeds. Every authenticator in the stack is given a chance to assign extra Groups. This mechanism offers the following advantages:

    - Separates authentication from the Web user interface so the same authentication methods are used for other applications such as non-interactive Web Services
    - Improved modularity: The authentication methods are all independent of each other. Custom authentication methods can be "stacked" on top of the default DSpace username/password method.
    - Cleaner support for "implicit" authentication where username is found in the environment of a Web request, e.g. in an X.509 client certificate.
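
    A minimal sketch of the stacking behaviour described above, written in Python for brevity. The class names, the `identify` and `special_groups` methods, and the dictionary used as a request are all hypothetical; they are not the DSpace authentication interfaces. Passwords are compared in plain text only to keep the example short.

```python
class PasswordAuthentication:
    """Explicit method: identifies the E-Person from submitted credentials."""

    def __init__(self, accounts):
        self.accounts = accounts  # email -> password (plain text for the sketch only)

    def identify(self, request):
        email = request.get("email")
        if email in self.accounts and self.accounts[email] == request.get("password"):
            return email
        return None

    def special_groups(self, request):
        return set()


class LocalNetworkAuthentication:
    """Implicit method: identifies nobody, but grants a group by network address."""

    def __init__(self, prefix, group):
        self.prefix = prefix
        self.group = group

    def identify(self, request):
        return None

    def special_groups(self, request):
        if request.get("remote_addr", "").startswith(self.prefix):
            return {self.group}
        return set()


def authenticate(stack, request):
    """Try each method until one identifies the E-Person; collect groups from all."""
    eperson = None
    groups = set()
    for method in stack:
        if eperson is None:
            eperson = method.identify(request)
        groups |= method.special_groups(request)
    return eperson, groups
```

    With a stack of both methods, a session from the local network gains the "LocalUsers" group even when no E-Person is identified, which matches the group behaviour described in the Groups section.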

    Authorization

    DSpace's authorization system is based on associating actions with objects and the lists of EPeople who can perform them. The associations are called Resource Policies, and the lists of EPeople are called Groups. There are two built-in groups: 'Administrators', who can do anything in a site, and 'Anonymous', which is a list that contains all users. Assigning a policy for an action on an object to anonymous means giving everyone permission to do that action. (For example, most objects in DSpace sites have a policy of 'anonymous' READ.) Permissions must be explicit - lack of an explicit permission results in the default policy of 'deny'. Permissions also do not 'commute'; for example, if an e-person has READ permission on an item, they might not necessarily have READ permission on the bundles and bitstreams in that item. Currently Collections, Communities and Items are discoverable in the browse and search systems regardless of READ authorization.
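
    The default-deny rule and the absence of inheritance between an item and its bitstreams can be shown with a small check function. This is a sketch of the rules as stated above, using invented object identifiers and a group-only policy tuple; it is not the DSpace authorization API.

```python
from collections import namedtuple

# One resource policy: a group may perform an action on an object.
Policy = namedtuple("Policy", ["obj", "action", "group"])


def is_authorized(policies, obj, action, user_groups):
    """Return True only if an explicit policy allows the action (default is deny)."""
    groups = set(user_groups) | {"Anonymous"}  # every user belongs to Anonymous
    if "Administrators" in groups:
        return True  # administrators can do anything in a site
    return any(policy.obj == obj and policy.action == action and policy.group in groups
               for policy in policies)


policies = [
    Policy("item:1", "READ", "Anonymous"),
    Policy("bitstream:9", "READ", "Staff"),
]
```

    Here anyone may READ `item:1`, but READ on `bitstream:9` requires membership of the hypothetical "Staff" group, because the item-level policy does not carry over to the bitstream.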

    The following actions are possible:

    Collection

    ADD/REMOVE                add or remove items (ADD = permission to submit items)

    DEFAULT_ITEM_READ         inherited as READ by all submitted items

    DEFAULT_BITSTREAM_READ    inherited as READ by Bitstreams of all submitted items. Note: only affects Bitstreams of an item at the time it is initially submitted. If a Bitstream is added later, it does not get the same default read policy.

    COLLECTION_ADMIN          collection admins can edit items in a collection, withdraw items, map other items into this collection.

    Item

    ADD/REMOVE                add or remove bundles

    READ                      can view item (item metadata is always viewable)

    WRITE                     can modify item

    Bundle

    ADD/REMOVE                add or remove bitstreams to a bundle

    Bitstream

    READ                      view bitstream

    WRITE                     modify bitstream

    Note that there is no 'DELETE' action. In order to 'delete' an object (e.g. an item) from the archive, one must have REMOVE permission on all objects (in this case, collection) that contain it. The 'orphaned' item is automatically deleted.

    Policies can apply to individual e-people or groups of e-people.

    1.2.9 Usage Metrics

    DSpace is equipped with Solr-based infrastructure to log and display pageviews and file downloads.

    Item, Collection and Community Usage Statistics

    Usage statistics can be retrieved from individual item, collection and community pages. These Usage Statistics pages show:

    - Total page visits (all time)
    - Total Visits per Month
    - File Downloads (all time)*
    - Top Country Views (all time)
    - Top City Views (all time)

    *File Downloads information is only displayed for item-level statistics. Note that downloads from separate bitstreams are also recorded and represented separately. DSpace is able to capture and store File Download information, even when the bitstream was downloaded from a direct link on an external website.

    System Statistics

    Various statistical reports about the contents and use of your system can be automatically generated by the system. These are generated by analyzing DSpace's log files. Statistics can be broken down monthly.

    The report includes the following sections:

    - A customizable general overview of activities in the archive, by default including:
        - Number of items archived
        - Number of bitstream views
        - Number of item page views
        - Number of collection page views
        - Number of community page views
        - Number of user logins
        - Number of searches performed
        - Number of license rejections
        - Number of OAI Requests
    - Customizable summary of archive contents
    - Broken-down list of item viewings
    - A full break-down of all performed actions
    - User logins
    - Most popular searches
    - Log Level Information
    - Processing information

    The results of statistical analysis can be presented on a by-month and an in-total report, and are available via the user interface. The reports can also either be made public or restricted to administrator access only.

    1.2.10 Digital Preservation

    Checksum Checker

    The purpose of the checker is to verify that the content in a DSpace repository has not become corrupted or been tampered with. The functionality can be invoked on an ad-hoc basis from the command line, or configured via cron or similar. Options exist to support large repositories that cannot be entirely checked in one run of the tool. The tool is extensible to new reporting and checking priority approaches.
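
    The underlying technique, comparing a freshly computed checksum with the one recorded at ingest, can be sketched as below. This is not the DSpace checker: the function names, the result labels and the `limit` parameter (standing in for a partial run over a large repository) are invented, and MD5 is used as an assumption since this section does not name the algorithm.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bitstreams(recorded, limit=None):
    """Compare stored files with their recorded checksums.

    recorded: {file path: checksum recorded at ingest}
    limit: check at most this many files in one run (None checks all)
    """
    results = {}
    for path in list(recorded)[:limit]:
        if not os.path.exists(path):
            results[path] = "missing"
        elif md5_of(path) == recorded[path]:
            results[path] = "unchanged"
        else:
            results[path] = "changed"
    return results
```

    A scheduled job could call `check_bitstreams` with a small `limit` on each run and rotate through the repository over several runs.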

    1.2.11 System Design

    Data Model

    Data Model Diagram

    The way data is organized in DSpace is intended to reflect the structure of the organization using the DSpace system. Each DSpace site is divided into communities, which can be further divided into sub-communities reflecting the typical university structure of college, department, research center, or laboratory.

    Communities contain collections, which are groupings of related content. A collection may appear in more than one community.

    Each collection is composed of items, which are the basic archival elements of the archive. Each item is owned by one collection. Additionally, an item may appear in additional collections; however every item has one and only one owning collection.

    Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files. Bitstreams that are somehow closely related, for example HTML files and images that compose a single HTML document, are organized into bundles.

    In practice, most items tend to have these named bundles:

    - ORIGINAL     the bundle with the original, deposited bitstreams
    - THUMBNAILS   thumbnails of any image bitstreams
    - TEXT         extracted full-text from bitstreams in ORIGINAL, for indexing
    - LICENSE      contains the deposit license that the submitter granted the host organization; in other words, specifies the rights that the hosting organization has
    - CC_LICENSE   contains the distribution license, if any (a Creative Commons license) associated with the item. This license specifies what end users downloading the content can do with the content
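
    The containment rules above (communities hold collections, each item has exactly one owning collection but may appear in others, and bitstreams are grouped into named bundles) can be summarized with a few Python dataclasses. The class and function names are illustrative only and do not correspond to the DSpace Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Bitstream:
    name: str
    format_name: str


@dataclass(eq=False)
class Collection:
    name: str
    items: List["Item"] = field(default_factory=list)


@dataclass(eq=False)
class Item:
    title: str
    owning_collection: Collection = field(repr=False)  # exactly one owner
    bundles: Dict[str, List[Bitstream]] = field(default_factory=dict)


@dataclass(eq=False)
class Community:
    name: str
    sub_communities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


def create_item(title, owner):
    """Create an item inside its owning collection."""
    item = Item(title, owner)
    owner.items.append(item)
    return item


def map_item(item, collection):
    """Make the item appear in another collection; its owner does not change."""
    if item not in collection.items:
        collection.items.append(item)


def add_bitstream(item, bundle_name, bitstream):
    """Add a bitstream to a named bundle such as ORIGINAL or LICENSE."""
    item.bundles.setdefault(bundle_name, []).append(bitstream)
```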

    Each bitstream is associated with one Bitstream Format. Because preservation services may be an important aspect of the DSpace service, it is important to capture the specific formats of files that users submit. In DSpace, a bitstream format is a unique and consistent way to refer to a particular file format. An integral part of a bitstream format is an either implicit or explicit notion of how material in that format can be interpreted. For example, the interpretation for bitstreams encoded in the JPEG standard for still image compression is defined explicitly in the Standard ISO/IEC 10918-1. The interpretation of bitstreams in Microsoft Word 2000 format is defined implicitly, through reference to the Microsoft Word 2000 application. Bitstream formats can be more specific than MIME types or file suffixes. For example, application/ms-word and .doc span multiple versions of the Microsoft Word application, each of which produces bitstreams with presumably different characteristics.

    Each bitstream format additionally has a support level, indicating how well the hosting institution is likely to be able to preserve content in the format in the future. There are three possible support levels that bitstream formats may be assigned by the hosting institution. The host institution should determine the exact meaning of each support level, after careful consideration of costs and requirements. MIT Libraries' interpretation is shown below:

    Supported     The format is recognized, and the hosting institution is confident it can make bitstreams of this format usable in the future, using whatever combination of techniques (such as migration, emulation, etc.) is appropriate given the context of need.

    Known         The format is recognized, and the hosting institution will promise to preserve the bitstream as-is, and allow it to be retrieved. The hosting institution will attempt to obtain enough information to enable the format to be upgraded to the 'supported' level.

    Unsupported   The format is unrecognized, but the hosting institution will undertake to preserve the bitstream as-is and allow it to be retrieved.

    Each item has one qualified Dublin Core metadata record. Other metadata might be stored in an item as a serialized bitstream, but we store Dublin Core for every item for interoperability and ease of discovery. The Dublin Core may be entered by end-users as they submit content, or it might be derived from other metadata as part of an ingest process.

    Items can be removed from DSpace in one of two ways: They may be 'withdrawn', which means they remain in the archive but are completely hidden from view. In this case, if an end-user attempts to access the withdrawn item, they are presented with a 'tombstone' that indicates the item has been removed. For whatever reason, an item may also be 'expunged' if necessary, in which case all traces of it are removed from the archive.

    Object             Example

    Community          Laboratory of Computer Science; Oceanographic Research Center

    Collection         LCS Technical Reports; ORC Statistical Data Sets

    Item               A technical report; a data set with accompanying description; a video recording of a lecture

    Bundle             A group of HTML and image bitstreams making up an HTML document

    Bitstream          A single HTML file; a single image file; a source code file

    Bitstream Format   Microsoft Word version 6.0; JPEG encoded image format

    Storage Resource Broker (SRB) Support

    DSpace offers two means for storing bitstreams. The first is in the file system on the server. The second is using SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API.

    SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system. Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially unlimited storage and straightforward means to replicate (in simple terms, backup) the content on other local or remote storage resources.

    2 Installing DSpace

    1 For the Impatient
    2 Hardware Recommendations
    3 Prerequisite Software
        3.1 UNIX-like OS or Microsoft Windows
        3.2 Oracle Java JDK 7 (standard SDK is fine, you don't need J2EE) or OpenJDK 7
        3.3 Apache Maven 3.x (Java build tool)
            3.3.1 Configuring a Proxy
        3.4 Apache Ant 1.8 or later (Java build tool)
        3.5 Relational Database: (PostgreSQL or Oracle)
        3.6 Servlet Engine (Apache Tomcat 7 or later, Jetty, Caucho Resin or equivalent)
        3.7 Perl (only required for [dspace]/bin/dspace-info.pl)
    4 Installation Instructions
        4.1 Overview of Install Options
        4.2 Overview of DSpace Directories
        4.3 Installation
    5 Advanced Installation
        5.1 'cron' jobs / scheduled tasks
        5.2 Multilingual Installation
        5.3 DSpace over HTTPS

    5.3.1