NCSU Libraries
Digital Repository Projects at the
North Carolina State University Libraries
James Jackson Sanborn
Jim Tuttle
Open Repositories/DSpace User Group ‘07
Early Repository Planning
• Digital Repository Planning Committee
• What it wouldn’t be (at least to start)
  – Distributed community structure
  – Open submission
  – ‘Institutional’ Repository
• What it would be (at least to start)
  – Library-managed collections
  – Building block for campus partnership
  – Learning opportunity
Repository Building Blocks
• NCSU Electronic Theses and Dissertations
  – Started 1997
  – Mandatory since 2002
  – Virginia Tech’s ETD-db
  – ~3,000 ETDs
• NCSU Authors Database
  – Started 1995
  – Access Database/Cold Fusion front-end
  – ~22,000 citations
Repository Building Blocks (cont’d)
• Technical Reports Print Collection
  – Campus Institutes and Departments
  – Massive fall-off in print distribution
• Special Collections Research Center
  – Digitized texts and photographs
  – Campus Newsletters
• GIS Data
  – Library managed/acquired data collection
  – Homegrown data layer database/discovery tools
Repository Plan
• Target ‘Research’ collections first
  – Technical Reports
  – ETDs
  – Faculty Publications/Citations
• Treat each collection as its own project
• Actively pursue common technological solutions
Technical Reports
• DSpace Application
• Lightly Customized
• Library Harvested
  – Local Cataloging/Metadata database
  – Scripted Ingest Object Creation
  – Batch Ingest
• Mix of ongoing submission by institute/departmental personnel and Library capture.
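
The slides do not show what a scripted ingest object looks like. As a minimal sketch, assuming the objects follow the DSpace Simple Archive Format (one directory per item holding a `dublin_core.xml` file, a `contents` file, and the bitstreams), a script could build each item as below. The function name `write_item` and the sample metadata are hypothetical.

```python
from pathlib import Path
from xml.etree import ElementTree as ET


def write_item(batch_dir, item_name, metadata, files):
    """Create one Simple Archive Format item directory.

    metadata: list of (element, qualifier, value); qualifier may be None.
    files: dict mapping bitstream filename -> bytes content.
    """
    item_dir = Path(batch_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Bitstreams, plus the contents file listing one filename per line.
    for name, data in files.items():
        (item_dir / name).write_bytes(data)
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in files), encoding="utf-8"
    )
    return item_dir
```

A directory of such items is what a batch import would then consume. The extract from the local cataloging database that supplies `metadata` is not shown here.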
Electronic Theses & Dissertations
• Partnership with Graduate School
• Hybrid System: DSpace and ETD-db
  – ETD-db submission/approval/management
  – Direct database extract for DSpace Ingest Object creation
  – Scheduled Batch Ingest process
• DSpace Considerations/Alterations
  – Metadata Mapping
  – Author Browse (exclude contributor.advisor)
  – Various interface changes
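
The metadata mapping and the author-browse exclusion can be illustrated with a small sketch. The ETD-db column names in `FIELD_MAP` are hypothetical, and the slides do not say how the browse change was made inside DSpace; the filter below only shows the intended effect of leaving `contributor.advisor` out of an author list.

```python
# Hypothetical ETD-db column names mapped to (element, qualifier) pairs.
FIELD_MAP = {
    "title": ("title", None),
    "author_name": ("contributor", "author"),
    "advisor_name": ("contributor", "advisor"),
    "abstract": ("description", "abstract"),
    "degree_date": ("date", "issued"),
}


def map_record(etd_row):
    """Translate one ETD-db row (a dict) into (element, qualifier, value) tuples."""
    mapped = []
    for column, (element, qualifier) in FIELD_MAP.items():
        value = etd_row.get(column)
        if value:  # skip empty or missing columns
            mapped.append((element, qualifier, value))
    return mapped


def author_browse_values(dc_values):
    """Names for an author browse: every contributor except contributor.advisor."""
    return [
        value
        for element, qualifier, value in dc_values
        if element == "contributor" and qualifier != "advisor"
    ]
```

Keeping the mapping in one table makes it easy to review with the Graduate School before the scheduled ingest runs.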
Faculty Publications
• Built on Existing Author Database
  – Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
    • Re-modeled data
    • Added Functionality
      – OpenURL
      – ‘Vita-like’ citation display
      – Full-text or submission links
  – Full-text stored in DSpace
    • Citation metadata and file exported by script
    • DSpace Identifier currently manually entered
Faculty Publications Schematic
[Schematic. Components: Scholar; Web interface (PHP); Web Submission Form; Oracle Faculty Publications DB (citations); DSpace, Java/JSP (full-text only), with File System (files) and PostgreSQL (metadata); DSpace Item Display; Cataloging and Coll. Mgt.; Access; sources: ISI, Ann. Reps, etc. Flow labels: View full-text; S+R Citations; Add/Edit data; Handle IDs; Submit Citations and/or Text.]
Repository Governance
• Internal
  – Digital Repository Planning Committee
  – Data Repository Architect
• External
  – Faculty Repository Advisory Committee
  – Partnerships with departments and institutes
NCGDAP: Overview
• NDIIPP: National Digital Information Infrastructure and Preservation Program
• Collaboration with Library of Congress
• One of eight three-year projects to study long-term (50+ years) digital preservation
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Project approaches: Technical and Social
Repository Requirements
• Dim archive with possible future access
  – minimal IR/access component
• Minimal repository imprint on data
  – repository agnostic ingest and export
• Simple digital curation functions
  – Periodic MD5 checksum validation
  – Structured metadata index
• Expected archived-data exchange
• Leverage existing investments
• Free Software with active community
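
Periodic MD5 checksum validation can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's script: the manifest is a plain dict of relative path to hex digest, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate(root, manifest):
    """Return relative paths that are missing or whose checksum has changed."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or md5_of(target) != expected:
            problems.append(rel)
    return problems
```

The manifest would be built once at ingest and `validate` run on a schedule; an empty list means every archived file still matches its recorded checksum.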
Automation: Threat and format analysis, validation
Python wrappers for the following:
• Anti-virus – ClamAV
• Compressed files (tar, zip, gzip, bzip)
• At-risk formats
• Executable files (magic numbers)
• JHOVE validation
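
The magic-number check for executables and compressed files can be sketched as follows. The signature table is a small illustrative subset and the function names are hypothetical; the project's actual wrappers, including those around ClamAV and JHOVE, are not shown in the slides.

```python
# (offset, magic bytes, label). A small illustrative subset, not a full registry.
SIGNATURES = [
    (0, b"\x7fELF", "executable (ELF)"),
    (0, b"MZ", "executable (DOS/Windows)"),
    (0, b"PK\x03\x04", "compressed (zip)"),
    (0, b"\x1f\x8b", "compressed (gzip)"),
    (0, b"BZh", "compressed (bzip2)"),
    (257, b"ustar", "archive (tar)"),
]


def classify_bytes(header):
    """Return a label for the first matching signature, or None if none match."""
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def classify(path):
    """Classify a file by its leading bytes rather than by its extension."""
    with open(path, "rb") as fh:
        return classify_bytes(fh.read(512))
```

Checking content rather than file extensions catches renamed executables and archives that would otherwise slip past an extension-based rule.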
Automation: Archive package organization
• Rule-based Python logic
  – filestem-extension relationships (multi-file format validation)
  – directory structure
• Manual intervention
• NOID assignment
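
A filestem-extension rule can be sketched by grouping files on their shared stem and checking that every required extension is present. The slides do not name a format; the ESRI shapefile rule below (`.shp`, `.shx`, `.dbf` sharing one stem) is an assumed example, and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed rule: a shapefile needs these three files sharing one stem.
REQUIRED = {".shp": {".shp", ".shx", ".dbf"}}


def group_by_stem(filenames):
    """Map each directory/stem to the set of extensions found for it."""
    groups = defaultdict(set)
    for name in filenames:
        p = PurePosixPath(name)
        groups[str(p.parent / p.stem)].add(p.suffix.lower())
    return groups


def missing_parts(filenames):
    """Map each incomplete multi-file set to the extensions it lacks."""
    problems = {}
    for stem, extensions in group_by_stem(filenames).items():
        for primary, required in REQUIRED.items():
            if primary in extensions and not required <= extensions:
                problems[stem] = sorted(required - extensions)
    return problems
```

File sets reported by `missing_parts` are natural candidates for the manual intervention step.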
Metadata: Seed file form
• 'Transfer set' metadata capture in 'Seed file'
  – communicates with DSpace backend, generates XML used to inform later scripts
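
The seed file's schema is not shown in the slides. As a rough sketch, assuming a flat XML document with one element per captured field, the form could write the file and later scripts could read it back as below. Every element name (`seed`, `agency`, `collection_handle`, `received`, `description`) is hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical transfer-set fields captured by the seed file form.
SEED_FIELDS = ("agency", "collection_handle", "received", "description")


def build_seed(transfer_set):
    """Serialize transfer-set metadata (a dict) to a small XML document."""
    root = ET.Element("seed")
    for field in SEED_FIELDS:
        ET.SubElement(root, field).text = transfer_set[field]
    return ET.tostring(root, encoding="unicode")


def read_seed(xml_text):
    """Parse a seed document back into a dict for downstream scripts."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}
```

Capturing these values once, at transfer time, means later ingest scripts do not have to ask for them again.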
Metadata: Communities and Collections
• Search by type for 100+ communities
• Facilitates creation and reduces errors
Curation Processing
• At-risk format migration, original retained
• Agency-specific XML templates in ArcCatalog with synchronization flags
• Provenance and curation metadata scripted
Source Metadata Translation
• Repository agnostic approach
• Spokes for each transformation
• Facilitates export from DSpace into other repositories
• Generate DSpace QDC, METS; populate Workflow database
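
The hub-and-spoke idea can be sketched as a registry of transformation functions that all read the same repository-neutral record. The record fields, spoke names, and output shapes below are hypothetical, and a METS spoke is omitted for brevity.

```python
SPOKES = {}


def spoke(name):
    """Register a transformation from the hub record to one target format."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_qdc")
def to_qdc(record):
    # Qualified Dublin Core as (element, qualifier, value) tuples.
    return [
        ("title", None, record["title"]),
        ("identifier", "other", record["noid"]),
    ]


@spoke("workflow_row")
def to_workflow_row(record):
    # Row destined for an external workflow database.
    return {"noid": record["noid"], "title": record["title"]}


def translate(record, targets=None):
    """Run the hub record through the selected spokes (all spokes by default)."""
    return {name: SPOKES[name](record) for name in (targets or SPOKES)}
```

Supporting another repository then means adding one spoke function; the hub record and the existing spokes stay untouched.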
Extra-repository AIP management
• Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub
• External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems
• Integrates with existing GIS Lookup tool
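
External tracking of identifiers can be illustrated with a small relational table. The slides do not describe the Workflow Management Database schema or its database engine; the sketch below uses SQLite only to stay self-contained, and the table, column, and function names are hypothetical.

```python
import sqlite3


def open_wmd(path=":memory:"):
    """Open (or create) a tiny tracking database with one row per AIP."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS aip (
               noid TEXT PRIMARY KEY,
               handle TEXT,
               iso_keywords TEXT
           )"""
    )
    return conn


def record_aip(conn, noid, handle, iso_keywords):
    """Insert or update the identifiers and keywords tracked for one AIP."""
    conn.execute(
        "INSERT OR REPLACE INTO aip VALUES (?, ?, ?)",
        (noid, handle, ",".join(iso_keywords)),
    )
    conn.commit()


def find_by_keyword(conn, keyword):
    """Return (noid, handle) pairs whose keyword list contains keyword."""
    rows = conn.execute(
        "SELECT noid, handle, iso_keywords FROM aip ORDER BY noid"
    ).fetchall()
    return [(n, h) for n, h, kws in rows if keyword in kws.split(",")]
```

Keeping this index outside the repository lets other systems, such as a lookup tool, resolve a NOID to a Handle without querying DSpace itself.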
Repository Architecture Overview
[Architecture diagram. Components: PostgreSQL (one shared username, separate database for each app); repository Tomcat instance; Repository (DSpace): Technical Reports, ETDs; Faculty Publications PHP/DSpace hybrid; Tomcat DSpace Internal; NDIIPP (DSpace); SCRC (DSpace); Collections (DSpace): SCRC, Course Catalogs, Green ‘N’ Growing; Asset Store/ATABeast (sub-directory for each DSpace app).]
Upcoming Repository Related Projects
• Enhancements to current system
  – XTF search interface
  – Inter-archive exchange
• Digital Collections Repository
  – Special Collections Research Center
  – Other non-faculty collections
• Data Repository
  – Scientific data
  – Statistical resources
For More Information:
• James Jackson Sanborn
  – [email protected]
• Jim Tuttle
  – [email protected]