Focus on Your Content, Not on Ingesting Your Content

Terry Brady

Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries


Goals of our Repository Managers

Create new collections

Grow collections

Accurately describe collection contents

Showcase our repository content


Our Story

Using simple tools to facilitate these goals


Imagine that you have content to load into your repository


Scenario: One Item to Add to DSpace


One Item to Add: Item Submission

Click through 7 item submission screens, authoring metadata as you go


Scenario: Three Items to Add to DSpace


Three Items to Add: Item Submission

Click through 3 x 7 item submission screens, authoring metadata as you go


50 Items

Scenario: 50 newspaper issues to add to DSpace (very similar metadata)


50 Items to Add: Individual Item Submission is impractical


Next Option: DSpace Bulk Ingest Process


DSpace Bulk Ingest

50 Items


Ingest Folder

Media File

Thumbnail (optional)

Contents File

Metadata File

License File (optional)
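
As a rough illustration of the folder listed above, here is a minimal Python sketch that writes one ingest folder in the layout DSpace's Simple Archive Format expects: the media file, a `contents` file naming each file, and a `dublin_core.xml` metadata file. The function name `write_ingest_folder` and the sample field values are assumptions made for this example; they do not come from the talk or from the File Analyzer code.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def write_ingest_folder(item_dir, media_file, metadata, thumbnail=None):
    """Create one ingest folder (hypothetical helper, not part of DSpace).

    metadata is a list of (element, qualifier, value) tuples,
    for example ("title", "none", "Issue 1").
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    # Copy the media file into the folder and list it in the contents file.
    media_file = Path(media_file)
    shutil.copy(str(media_file), str(item_dir / media_file.name))
    contents_lines = [media_file.name]

    # An optional thumbnail is listed with a tab-separated bundle hint.
    if thumbnail is not None:
        thumbnail = Path(thumbnail)
        shutil.copy(str(thumbnail), str(item_dir / thumbnail.name))
        contents_lines.append(thumbnail.name + "\tbundle:THUMBNAIL")

    # Contents file: one file name per line.
    (item_dir / "contents").write_text(
        "\n".join(contents_lines) + "\n", encoding="utf-8"
    )

    # Metadata file: one <dcvalue> element per metadata field.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )
    return item_dir
```

Doing this by hand for 50 items means repeating every one of these steps 50 times, which is the tedium the following slides describe.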


Bulk Ingest: Build a Metadata Spreadsheet

50 Items


Bulk Ingest: Build Ingest Folders

50 Items


Bulk Ingest: For Each Item, Copy Item to Folder

50 Items (.PDF)


Bulk Ingest: For Each Item, Create a Unique Contents File

50 Items (.PDF, .TXT)


Bulk Ingest: For Each Item, Create a Dublin Core File

50 Items (.PDF, .TXT, .XML)


Bulk Ingest: Initiate Import from a Terminal Window

50 Items (.PDF, .TXT, .XML)
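
For readers who have not run the import from a terminal, the sketch below assembles the kind of command line the DSpace item importer accepts. The `import --add` subcommand and the `--eperson`, `--collection`, `--source`, and `--mapfile` options are the documented importer options; the installation path, e-mail address, collection handle, and folder paths are hypothetical placeholders, and the helper name `build_import_command` is an assumption for this example.

```python
import shlex


def build_import_command(dspace_home, eperson, collection, source_dir, mapfile):
    """Return the argument list for a DSpace bulk import (illustrative helper)."""
    return [
        dspace_home + "/bin/dspace",
        "import",
        "--add",
        "--eperson=" + eperson,        # account that will own the submissions
        "--collection=" + collection,  # handle of the target collection
        "--source=" + source_dir,      # directory holding the ingest folders
        "--mapfile=" + mapfile,        # records which items were created
    ]


if __name__ == "__main__":
    # All values below are placeholders, not real Georgetown settings.
    command = build_import_command(
        "/dspace",
        "repo-admin@example.edu",
        "123456789/1",
        "/ingest/newspapers",
        "/ingest/newspapers.map",
    )
    print(shlex.join(command))
```

The map file matters later: it is the record of what a given import created, which is what makes it possible to back out a batch.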


Bulk Ingest: For Each Item, Create a Dublin Core File

50 Items (.PDF, .TXT, .XML)

What if you make a mistake?

What if you need to refine the metadata?


The Challenge

Want to grow the collections

But the ingest process is daunting


The conversation focused on HOW to ingest the content rather than on the content itself


Our Approach


Our Approach: Empower Content Owners

• Automate the tedious tasks

• Make metadata entry the focus of the effort

• Hide the command line from content owners


Our Approach: Simple Tools

Work around the tedious steps without constructing a complex workflow


Our Tools

• File Analyzer

o Desktop Application for File System Traversal

• DSpace QC Tools

o Web application for Batch Process Submission

Both of these tools are available on GitHub

• Georgetown-University-Libraries


File Analyzer: Desktop Application for File Processing


What we need

50 Items


Step 1: Automatically Generate an Ingest Inventory based on existing files

50 Items
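
The idea behind Step 1 can be sketched in a few lines of Python: walk a directory of existing files and emit one spreadsheet row per file, with metadata columns left for the content owner to fill in. This is not the File Analyzer implementation (which is a Java desktop application); the column names, the `item_001` folder naming, and the choice to seed the title from the file name are all assumptions made for illustration.

```python
import csv
from pathlib import Path

# Hypothetical inventory columns: an ingest folder name, the media file,
# and two Dublin Core fields for the content owner to edit.
COLUMNS = ["folder", "file", "dc.title", "dc.date.issued"]


def build_inventory(source_dir, suffix=".pdf"):
    """Return one inventory row per matching file, sorted by file name."""
    files = sorted(
        path for path in Path(source_dir).iterdir()
        if path.is_file() and path.suffix.lower() == suffix
    )
    rows = []
    for index, path in enumerate(files, start=1):
        rows.append({
            "folder": "item_{:03d}".format(index),
            "file": path.name,
            "dc.title": path.stem,   # placeholder title, meant to be edited
            "dc.date.issued": "",    # left blank for the content owner
        })
    return rows


def write_inventory(rows, csv_path):
    """Write the inventory as a CSV file that opens in a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Starting from the files that already exist means nobody types 50 file names by hand, so the spreadsheet work is purely about metadata.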


Export the Generated Inventory


Step 2: Edit the Ingest Inventory as a Spreadsheet


Step 3: Generate the Ingest Folders from the Inventory Spreadsheet

• Generate Contents File

• Generate Dublin Core Metadata File

• Include custom thumbnails if applicable


Create Ingest Folders

• An error message will appear if files are missing (or misspelled)

• Process can be rerun if the metadata spreadsheet needs to change
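
A minimal sketch of the two behaviours on this slide, assuming the edited inventory has been read into rows with hypothetical `folder`, `file`, and `dc.*` columns: every row whose file cannot be found produces an error message, and the check has no side effects, so it can be rerun whenever the spreadsheet changes. The function names and the message wording are assumptions, not File Analyzer output.

```python
from pathlib import Path


def split_field(column):
    """Turn 'dc.date.issued' into ('date', 'issued'); 'dc.title' into ('title', 'none')."""
    parts = column.split(".")
    element = parts[1]
    qualifier = parts[2] if len(parts) > 2 else "none"
    return element, qualifier


def check_inventory(rows, source_dir):
    """Split inventory rows into items ready for folder creation and error messages."""
    ready, errors = [], []
    # Row 1 of the spreadsheet is the header, so data rows start at 2.
    for row_number, row in enumerate(rows, start=2):
        if not (Path(source_dir) / row["file"]).is_file():
            errors.append(
                "row {}: file not found: {}".format(row_number, row["file"])
            )
            continue
        # Keep only the Dublin Core columns that actually have a value.
        metadata = [
            (*split_field(column), value)
            for column, value in row.items()
            if column.startswith("dc.") and value
        ]
        ready.append((row["folder"], row["file"], metadata))
    return ready, errors
```

Reporting the spreadsheet row number alongside the file name lets a content owner fix a misspelling in the spreadsheet and simply run the step again.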


Ingest Folder Creation Report


Step 4: Validate Ingest Folders

• Identify Missing Files

• Required Metadata

• Validate Files

o Contents

o Dublin Core
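
The checks listed above could look roughly like the following Python sketch, which inspects one ingest folder and returns a list of problems. It assumes the Simple Archive Format file names `contents` and `dublin_core.xml`; treating the title as the only required field is an assumption for this example, since the talk does not say which fields Georgetown requires.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

# Assumption for this sketch: only an unqualified title is required.
REQUIRED_FIELDS = [("title", "none")]


def validate_ingest_folder(item_dir):
    """Return a list of problems found in one ingest folder (empty if none)."""
    item_dir = Path(item_dir)
    problems = []

    # Contents file: must exist, and every file it names must be present.
    contents = item_dir / "contents"
    if not contents.is_file():
        problems.append("missing contents file")
    else:
        for line in contents.read_text(encoding="utf-8").splitlines():
            name = line.split("\t")[0].strip()
            if name and not (item_dir / name).is_file():
                problems.append("listed in contents but missing: " + name)

    # Dublin Core file: must exist and be well-formed XML.
    dublin_core = item_dir / "dublin_core.xml"
    if not dublin_core.is_file():
        problems.append("missing dublin_core.xml")
        return problems
    try:
        root = ET.parse(str(dublin_core)).getroot()
    except ET.ParseError as error:
        problems.append("dublin_core.xml is not well-formed: {}".format(error))
        return problems

    # Required metadata: each required field needs a non-empty value.
    present = {
        (value.get("element"), value.get("qualifier", "none"))
        for value in root.iter("dcvalue")
        if (value.text or "").strip()
    }
    for element, qualifier in REQUIRED_FIELDS:
        if (element, qualifier) not in present:
            problems.append(
                "required metadata missing: {}.{}".format(element, qualifier)
            )
    return problems
```

Running a check like this before anything reaches the server keeps mistakes cheap to fix.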


Validation Status Report


Step 5: Move Ingest Folders to Server and Initiate Bulk Ingest


Web Tools for Batch Process Submission


Web Tools, Tutorials co-located with tools


Collection

Folder Location


Processes run by Bulk Ingest

• import

• filter-media [collection]

• update-discovery-index

• oai-import

• stats-util

Content is visible, searchable, and thumbnails are present!
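
One way to picture the chain of processes is as an ordered list that runs one task at a time. The sketch below takes the task names exactly as they appear on the slide and leaves out their arguments, which vary by DSpace version. Stopping at the first failure and marking the rest as skipped is a design assumption of this example; the talk does not say how the web tool handles errors. The caller supplies the function that actually runs a task.

```python
# Task names copied from the slide; arguments are intentionally omitted.
BULK_INGEST_TASKS = [
    "import",
    "filter-media",
    "update-discovery-index",
    "oai-import",
    "stats-util",
]


def run_bulk_ingest(tasks, run_task):
    """Run tasks in order and stop at the first failure.

    run_task is a caller-supplied function taking a task name and
    returning True on success. Returns a list of (task, status) pairs.
    """
    report = []
    failed = False
    for task in tasks:
        if failed:
            # A later task would operate on incomplete results, so skip it.
            report.append((task, "skipped"))
            continue
        succeeded = run_task(task)
        report.append((task, "ok" if succeeded else "failed"))
        failed = not succeeded
    return report
```

Bundling the follow-up tasks with the import is what lets a content owner see finished results without asking IT to run anything.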


Results

Empowered Librarians

Iterative metadata refinement

At the right point of the workflow

Significant growth in repository content

Decreasing IT involvement

Rapid development of support tools


Derived Tools

Generate Ingest Folders for ProQuest ETDs

Filter Media


Ingest ETDs from ProQuest


ProQuest ETD Ingest Rule


Filter Media Tool for Items Submitted One by One

Collection

Filter Media Tasks

Re-index?


Benefits

Companion tools easy to learn

Users are very comfortable with them

Demystify DSpace specifics

Users trained other users!


Other Tools Created

Automation

• Undo Bulk Ingest

• Update Metadata

• Move Community/Collection

Reporting

• Data Quality Reports

• Statistics Reports


More Tools (time permitting)


Data Quality Reports

• Items with multiple media files

• Non-PDF Document Items

• Items missing a Thumbnail

• "Non-standard" Media Types

• Items modified in the last 30 days

• Items with Embargo

• Items missing a metadata field

• Item metadata containing a URL
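
Several of these reports reduce to simple per-item checks. The sketch below shows four of them against a hypothetical in-memory item record; the record shape (`media`, `thumbnail`, `metadata`) is invented for this example and is not how the DSpace QC Tools, which are PHP code, read their data.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def quality_flags(item):
    """Return the names of the data quality checks that an item trips.

    item is a hypothetical record:
    {"media": [file names], "thumbnail": file name or None,
     "metadata": {field name: value}}
    """
    flags = []
    if len(item["media"]) > 1:
        flags.append("multiple media files")
    if any(not name.lower().endswith(".pdf") for name in item["media"]):
        flags.append("non-PDF document")
    if not item.get("thumbnail"):
        flags.append("missing thumbnail")
    if any(URL_PATTERN.search(value) for value in item["metadata"].values()):
        flags.append("metadata contains a URL")
    return flags
```

A report is then just the list of items for which a given flag appears, grouped by collection.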


Collection QC Report


Item QC Report


Usage Statistics Reports

• Not confident in the out-of-the-box reports

• Wanted to understand underlying data

• Filter Stats

o On campus

o Within the library


Try it yourself

GitHub: Georgetown-University-Libraries

• File Analyzer & Metadata Harvester

o Just need a Java Compiler

o Contains several utilities for digitization workflows

o Links to tutorials

• DSpace QC Tools

o PHP Code

o Sample code, not ready to run

o Links to tutorials

Please let me know how these work for you!


Terry Brady

Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries