Digitisation: nuts & bolts at the Wellcome Library
In the picture: getting the most out of images inside & outside your collection. CILIP, September 2014
Dave Thompson, Digital Curator, Wellcome Library
The Wellcome Library
• Part of Wellcome Collection, an astonishing public venue in London developed by the Wellcome Trust, where people can learn more about medicine through the ages & across cultures.
• More than 10,000 readers visit us each year, including historians, academics, students, health professionals & consumers, journalists, artists & members of the general public.
Harvesting
Digitisation in the Wellcome Library
• Strategic approach, conscious planned decisions.
• Library transformation strategy, physical to digital.
• From ‘project’ to ‘production’.
• Digitisation as a sustainable end-to-end process.
• Sustainable activity delivering access to content.
Overview - three IT systems…
1. Workflow management system – ‘Goobi’ = PRODUCTION.
2. Digital object repository – ‘Preservica’ = STORAGE.
3. Front end - ‘the player’ = ACCESS.
Remember, this doesn’t include cataloguing or bibliographic systems. Here we’re just talking about the process of creating, storing & delivering digital content. You have to assume that those other systems are also in place.
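The division of labour between the three systems can be pictured as a one-way hand-off: production builds a package, storage keeps it, and access only ever reads from storage. This is a minimal illustrative sketch; the class and method names (`Production`, `Storage`, `Access`, `package`, `ingest`, `deliver`) and the identifier are hypothetical and are not the Goobi, Preservica or player interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """A digitised item: its identifier, image files and a METS document."""
    identifier: str
    images: list[str]
    mets: str


class Production:
    """Stands in for the workflow system (Goobi in this talk)."""

    def package(self, identifier: str, images: list[str]) -> Package:
        # Placeholder METS string; a real workflow system builds a full record.
        mets = f"<mets OBJID='{identifier}'/>"
        return Package(identifier, sorted(images), mets)


@dataclass
class Storage:
    """Stands in for the repository (Preservica in this talk)."""
    items: dict[str, Package] = field(default_factory=dict)

    def ingest(self, pkg: Package) -> None:
        self.items[pkg.identifier] = pkg


class Access:
    """Stands in for the front end ('the player')."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def deliver(self, identifier: str) -> list[str]:
        # Access reads only from storage; it never talks to production directly.
        return self.storage.items[identifier].images


# Usage with a hypothetical identifier.
store = Storage()
store.ingest(Production().package("b12345678", ["0002.jp2", "0001.jp2"]))
delivered = Access(store).deliver("b12345678")
print(delivered)  # ['0001.jp2', '0002.jp2']
```

Keeping access downstream of storage mirrors the point above that each system has one job: PRODUCTION, STORAGE, ACCESS.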
Goobi is our core digitisation system
• Goobi can be used to normalise image formats, e.g. TIFFs into JPEG2000.
• Used for reporting on volumes, numbers, etc.
• Web based, used by all staff involved in digitisation.
• Produces METS files, flexible & standards based.
Goobi is the primary interface for most staff involved in digitisation. It’s the only software that many use, which simplifies training & delivery.
Goobi workflow tracking & management
• Manages & tracks the production of content.
• Workflow driven. Already highly automated.
• Allows us to set very granular access conditions.
• Scalable & highly adaptable to different projects.
Goobi has been in production for about 3 years & has already processed some 2.5 million images, content which is publicly available in our player.
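Tracking and reporting of this kind come down to knowing which step each item is at and how many images it carries. A minimal sketch, assuming a hypothetical `Process` record with a current step name and an image count (these names and steps are illustrative, not Goobi's data model):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Process:
    """One item moving through a digitisation workflow (hypothetical model)."""
    identifier: str
    step: str          # name of the step the item is currently at
    image_count: int


def report(processes: list[Process]) -> tuple[Counter, int]:
    """Return how many items sit at each step, plus the total number of images."""
    by_step = Counter(p.step for p in processes)
    total_images = sum(p.image_count for p in processes)
    return by_step, total_images


processes = [
    Process("b0001", "imaging", 120),
    Process("b0002", "quality check", 88),
    Process("b0003", "imaging", 300),
]
by_step, total = report(processes)
print(by_step["imaging"], total)  # 2 508
```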
Digitisation: the steps
MARC records are imported from Sierra into Goobi
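The talk does not describe the Sierra export format or how Goobi imports it. As a minimal sketch, assuming the record arrives as MARCXML, the fields a workflow might key on can be read with Python's standard library. The record, its control number and its title are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A tiny, hypothetical MARCXML record; a real export carries many more fields.
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">b12345678</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def key_metadata(marcxml: str) -> dict[str, str]:
    """Pull the control number (001) and title (245 $a) out of a MARCXML record."""
    root = ET.fromstring(marcxml)
    control = root.find("marc:controlfield[@tag='001']", MARC_NS)
    title = root.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS)
    return {
        "id": control.text if control is not None else "",
        "title": title.text if title is not None else "",
    }


meta = key_metadata(RECORD)
print(meta)
```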
Digitisation – enter the humans
Digitised images are imported into Goobi & automatically associated with that metadata
We use cameras not scanners for better resolution & quicker imaging.
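How images are matched to their metadata is not spelled out here. One simple approach, assuming a hypothetical naming convention of `<bib number>_<4-digit sequence>.<extension>`, is to group captured files by identifier and sort them into capture order:

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <bib number>_<4-digit sequence>.<extension>
NAME = re.compile(r"^(?P<id>b\d{8})_(?P<seq>\d{4})\.(?:tif|tiff|jp2)$", re.IGNORECASE)


def group_images(filenames: list[str]) -> dict[str, list[str]]:
    """Group image files by record identifier, in capture order."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        match = NAME.match(name)
        if match is None:
            continue  # unrecognised files are skipped, left for a person to resolve
        groups[match["id"]].append((int(match["seq"]), name))
    return {ident: [name for _, name in sorted(items)] for ident, items in groups.items()}


files = ["b12345678_0002.tif", "b12345678_0001.tif", "notes.txt"]
print(group_images(files))
# {'b12345678': ['b12345678_0001.tif', 'b12345678_0002.tif']}
```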
METS files are created in Goobi
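Goobi's actual METS output is not shown in the talk. As a rough sketch only, a minimal METS document that pairs a file section with a physical page order can be generated with the standard library; real METS from a workflow system would also carry descriptive and administrative metadata. The `FILE_0001` ID scheme and the `USE="OBJECTS"` label are illustrative choices, not Goobi's.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_mets(identifier: str, images: list[str]) -> str:
    """Build a minimal METS document: one file group plus a physical page sequence."""
    mets = ET.Element(f"{{{METS}}}mets", OBJID=identifier)
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="OBJECTS")
    struct = ET.SubElement(mets, f"{{{METS}}}structMap", TYPE="PHYSICAL")
    sequence = ET.SubElement(struct, f"{{{METS}}}div", TYPE="physSequence")
    for order, href in enumerate(images, start=1):
        file_id = f"FILE_{order:04d}"
        # Describe the file and where to find it...
        file_el = ET.SubElement(file_grp, f"{{{METS}}}file", ID=file_id, MIMETYPE="image/jp2")
        ET.SubElement(file_el, f"{{{METS}}}FLocat", LOCTYPE="URL", **{f"{{{XLINK}}}href": href})
        # ...then place it as a page in reading order.
        page = ET.SubElement(sequence, f"{{{METS}}}div", TYPE="page", ORDER=str(order))
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)
    return ET.tostring(mets, encoding="unicode")


xml_text = build_mets("b12345678", ["0001.jp2", "0002.jp2"])
```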
Goobi initiates ingest of the JPEG2000 images & metadata into Preservica
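The talk does not say how Goobi hands packages to Preservica. One general safeguard around any ingest is fixity checking: record a checksum for each file before transfer and re-check it afterwards. The sketch below does this with SHA-256; its function names are hypothetical and it is not part of Goobi's or Preservica's interfaces.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large JP2s need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file name in a package folder to its SHA-256 checksum."""
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def verify(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files whose checksum no longer matches the manifest."""
    current = build_manifest(folder)
    return [name for name, value in manifest.items() if current.get(name) != value]


# Usage against a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "0001.jp2").write_bytes(b"example image bytes")
    manifest = build_manifest(folder)
    unchanged = verify(folder, manifest)   # nothing has changed yet
    (folder / "0001.jp2").write_bytes(b"altered")
    changed = verify(folder, manifest)     # the altered file is reported
```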
Player pulls images from Preservica using metadata in the METS file
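How the player reads the METS file is not described. A minimal sketch of the general idea: follow the physical `structMap` to get pages in reading order, then resolve each page's `fptr` to a file location in the `fileSec`. The sample METS is hypothetical and deliberately lists files out of order.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec><mets:fileGrp USE="OBJECTS">
    <mets:file ID="F2"><mets:FLocat xlink:href="0002.jp2"/></mets:file>
    <mets:file ID="F1"><mets:FLocat xlink:href="0001.jp2"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap TYPE="PHYSICAL"><mets:div TYPE="physSequence">
    <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="F2"/></mets:div>
    <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="F1"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>"""


def page_images(mets_xml: str) -> list[str]:
    """Return image locations in reading order, following the physical structMap."""
    root = ET.fromstring(mets_xml)
    href = f"{{{NS['xlink']}}}href"
    # File ID -> location, taken from the file section.
    locations = {
        f.get("ID"): f.find("mets:FLocat", NS).get(href)
        for f in root.iterfind(".//mets:fileSec//mets:file", NS)
    }
    pages = root.iterfind(".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']", NS)
    ordered = sorted(pages, key=lambda div: int(div.get("ORDER")))
    return [locations[div.find("mets:fptr", NS).get("FILEID")] for div in ordered]


print(page_images(SAMPLE))  # ['0001.jp2', '0002.jp2']
```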
Goobi – exit the humans
• Key Goobi steps are performed by humans.
• There are high levels of automation, but not everything is automated.
• Ambition is to build fully automated workflows.
• Scalable & highly adaptable to different projects.
Remember, humans are still an important part of digitisation. There are some decisions that only a human can make, & there will always be a need for human driven processes.
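A mixed workflow like this can be modelled as a list of steps, each either automatic or manual: automatic steps run one after another until the item reaches a step that needs a person. This is a minimal sketch; the step names and the `advance` helper are hypothetical, loosely following the sequence of steps described earlier.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    automatic: bool
    action: Optional[Callable[[dict], None]] = None  # only automatic steps have one


def advance(steps: list[Step], position: int, item: dict) -> int:
    """Run automatic steps from `position` until a manual step (or the end) is reached.

    Returns the index of the step now waiting for a person, or len(steps) when done.
    """
    while position < len(steps) and steps[position].automatic:
        steps[position].action(item)
        position += 1
    return position


workflow = [
    Step("import metadata", True, lambda item: item.update(metadata=True)),
    Step("imaging", False),                    # a person operates the camera
    Step("import images", True, lambda item: item.update(images=True)),
    Step("quality check", False),              # a person signs off the images
    Step("create METS", True, lambda item: item.update(mets=True)),
]
item: dict = {}
pos = advance(workflow, 0, item)        # stops at "imaging"
pos = advance(workflow, pos + 1, item)  # a person finished imaging; stops at "quality check"
print(workflow[pos].name, item)  # quality check {'metadata': True, 'images': True}
```

Under this model, moving towards fully automated workflows means turning manual steps into automatic ones, while the human steps that remain stay explicit in the workflow.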
Working with digitised content (Goobi & Preservica)
• In-house: TIFF or JP2.
• Institutions: TIFF or JP2, via HD & FTP.
• Contractors: TIFF or JP2.
• Harvesting: automatic harvesting of JP2 & DMD.
• Grey literature: handled by the Ingest Officer / Digital Curator.
• Goobi normalises TIFF to JP2; Jpylyzer validates JP2.
• Some routes are manual, others automatic.
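Since content arrives as either TIFF or JP2, a first routing decision is which files need normalising and which can go straight to validation. A minimal sketch that reads only the file signature ("magic bytes"): the JP2 signature box and the two TIFF byte-order headers are standard, but this check identifies a format and is not validation, which is what a tool such as jpylyzer is for. The `route` function and its labels are hypothetical.

```python
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # JPEG 2000 signature box
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")          # little- and big-endian TIFF


def route(header: bytes) -> str:
    """Decide what to do with an incoming file from its first bytes."""
    if header.startswith(JP2_SIGNATURE):
        return "validate"       # already JP2: hand to a validator such as jpylyzer
    if header.startswith(TIFF_SIGNATURES):
        return "normalise"      # TIFF: convert to JP2 first
    return "manual review"      # anything else needs a person to look at it


# In practice the header would come from open(path, "rb").read(12).
print(route(b"II*\x00" + b"\x00" * 8))  # normalise
```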
Snagging
Goobi – 19th century book project
• Internet Archive (IA) is digitising our 19th century books.
• They upload the content to the IA website.
• IA carry out Optical Character Recognition (OCR) on the books & create structure.
• Goobi harvests the files that the IA create to automatically process content.
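Which IA files Goobi harvests, and how, is not detailed beyond the images, scandata and ABBYY output described on the following slides. The Internet Archive's metadata API returns JSON for an item that includes a `files` list; the sketch below picks the derivatives of interest from such a response. The file-name suffixes are assumptions about IA naming, the item identifier is hypothetical, and no network call is made.

```python
import json

# Suffixes assumed for the derivatives of interest (images, page data, OCR).
WANTED = {
    "images": "_jp2.zip",
    "scandata": "_scandata.xml",
    "ocr": "_abbyy.gz",
}


def pick_files(metadata_json: str) -> dict[str, str]:
    """From an item's metadata JSON, pick the file name for each wanted derivative."""
    files = json.loads(metadata_json).get("files", [])
    chosen = {}
    for kind, suffix in WANTED.items():
        for entry in files:
            if entry["name"].endswith(suffix):
                chosen[kind] = entry["name"]
                break
    return chosen


# A trimmed, hypothetical response for an item identified as "exampleitem00".
sample = json.dumps({"files": [
    {"name": "exampleitem00_jp2.zip"},
    {"name": "exampleitem00_scandata.xml"},
    {"name": "exampleitem00_abbyy.gz"},
    {"name": "exampleitem00.pdf"},
]})
print(pick_files(sample))
```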
http://www.kuka-robotics.com/l
Looking at the IA website
https://archive.org/details/wellcomelibrary
Looking at the IA website – metadata
How the automation works
• Goobi builds a process using the MARC record.
• Against this process it imports the images.
• Uses the scandata file to create a METS file with pagination & structure.
• Uses the raw ABBYY file to create ALTO files that allow us to search for words & highlight search-term hits.
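The ABBYY-to-ALTO conversion itself is not shown. The payoff, highlighting, works because ALTO records each word as a `String` element with its position (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`). A minimal sketch of the search end, using a hypothetical ALTO fragment and a hypothetical `find_hits` helper:

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment; real files carry fuller page, block and line structure.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Cholera" HPOS="120" VPOS="300" WIDTH="160" HEIGHT="40"/>
    <String CONTENT="in" HPOS="290" VPOS="300" WIDTH="30" HEIGHT="40"/>
    <String CONTENT="London," HPOS="330" VPOS="300" WIDTH="150" HEIGHT="40"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def find_hits(alto_xml: str, term: str) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) boxes for words matching `term`.

    Matching ignores case and punctuation at the edges of each word.
    """
    wanted = term.casefold()
    boxes = []
    for el in ET.fromstring(alto_xml).iter():
        if el.tag.rsplit("}", 1)[-1] != "String":  # works whatever the ALTO namespace version
            continue
        word = el.get("CONTENT", "").strip(".,;:!?\"'()").casefold()
        if word == wanted:
            boxes.append(tuple(int(el.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")))
    return boxes


print(find_hits(SAMPLE_ALTO, "london"))  # [(330, 300, 150, 40)]
```

A viewer can then draw each returned box over the page image to highlight the hit.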
http://www.impactautomation.com.au/automation
Here’s the record in our OPAC
b20422155
Here’s the book in our player
How it all works…
So, to wrap up…
• Digitisation is a strategic activity.
• We have built an end-to-end process from selection to access.
• Working at scale so efficiency is important.
• Integrated in our OPAC. No silos.
• Well articulated architecture.
Thank you
Questions now, questions later…?
Dave Thompson, Digital Curator, Wellcome Library
@d_n_t
http://wellcomelibrary.org/