Digitisation; Nuts & bolts at the Wellcome Library In the picture: getting the most out of images inside & outside your collection. CILIP, September 2014 Dave Thompson Digital Curator, Wellcome Library

Dave's Wellcome Library digitisation presentation


Page 1: Dave's Wellcome Library digitisation presentation

Digitisation; Nuts & bolts at the Wellcome Library

In the picture: getting the most out of images inside & outside your collection. CILIP, September 2014

Dave Thompson, Digital Curator, Wellcome Library

Page 2: Dave's Wellcome Library digitisation presentation

The Wellcome Library

• Part of Wellcome Collection, an astonishing public venue in London developed by the Wellcome Trust, where people can learn more about medicine through the ages & across cultures.

• More than 10,000 readers visit us each year, including historians, academics, students, health professionals & consumers, journalists, artists & members of the general public.


Page 3: Dave's Wellcome Library digitisation presentation

Digitisation in the Wellcome Library

• Strategic approach, conscious planned decisions.

• Library transformation strategy, physical to digital.

• From ‘project’ to ‘production’.

• Digitisation as a sustainable end-to-end process.

• Sustainable activity delivering access to content.

Page 4: Dave's Wellcome Library digitisation presentation

Overview - three IT systems…

1. Workflow management system – ‘Goobi’ = PRODUCTION.

2. Digital object repository – ‘Preservica’ = STORAGE.

3. Front end - ‘the player’ = ACCESS.

Remember, this doesn’t include cataloguing or bibliographic systems. Here we’re just talking about the process of creating, storing & delivering digital content. You have to assume that those other systems are also in place.

Page 5: Dave's Wellcome Library digitisation presentation

Goobi is our core digitisation system

• Goobi can be used to normalise image formats, e.g. TIFFs into JPEG2000.

• Used for reporting, volumes, numbers, etc.

• Web based, used by all staff involved in digitisation.

• Produces METS files, flexible & standards based.

Goobi is the primary interface for most staff involved in digitisation. It’s the only software that many use, which simplifies training & delivery.

Page 6: Dave's Wellcome Library digitisation presentation

Goobi workflow tracking & management

• Manages & tracks the production of content.

• Workflow driven. Already highly automated.

• Allows us to set very granular access conditions.

• Scalable & highly adaptable to different projects.

Goobi has been in production for about 3 years now & has already processed some 2.5 million images. That content is publicly available in our player.

Page 7: Dave's Wellcome Library digitisation presentation

Digitisation – the steps

MARC records are imported from Sierra into Goobi

Page 8: Dave's Wellcome Library digitisation presentation

Digitisation – enter the humans

Digitised images are imported into Goobi & automatically associated with that metadata

We use cameras, not scanners, for better resolution & quicker imaging.

Page 9: Dave's Wellcome Library digitisation presentation

Digitisation – enter the humans

METS files are created in Goobi
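As a rough illustration of what a METS file holds, here is a minimal sketch that builds one with Python's standard library. It is not Goobi's output or its code: the file names, the `FILE_0001` style IDs and the `OBJECTS` file group label are assumptions chosen for the example, and a production METS file would also carry descriptive and administrative metadata sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id, image_names):
    """Return a minimal METS tree listing page images in reading order."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # fileSec: one <file> per image, each pointing at the stored file.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "OBJECTS"})

    # Physical structMap: one <div> per page, linked to its file by ID.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "PHYSICAL"})
    sequence = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "physSequence"})

    for order, name in enumerate(image_names, start=1):
        file_id = f"FILE_{order:04d}"  # hypothetical ID scheme
        file_el = ET.SubElement(
            file_grp, f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": "image/jp2"}
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        page = ET.SubElement(
            sequence, f"{{{METS_NS}}}div", {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets


# Hypothetical file names for a two-page object.
mets_root = build_mets("b20422155", ["b20422155_0001.jp2", "b20422155_0002.jp2"])
mets_xml = ET.tostring(mets_root, encoding="unicode")
print(mets_xml)
```

The file section says which files exist; the structural map says in what order they are read. Keeping the two separate is what lets the same file format describe pagination & structure as well as storage.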

Page 10: Dave's Wellcome Library digitisation presentation

Digitisation – enter the humans

Goobi initiates ingest of the JPEG2000 images & metadata in Preservica

Page 11: Dave's Wellcome Library digitisation presentation

Digitisation – enter the humans

Player pulls images from Preservica using metadata in the METS file
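To show how a front end could use a METS file to decide which images to request, and in what order, here is a minimal sketch. It is not the player's code: the sample METS document, its file names and the function name are assumptions, and the step that actually fetches each file from the repository is left out.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical METS document; files are deliberately listed out of order.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OBJECTS">
      <mets:file ID="FILE_0002"><mets:FLocat LOCTYPE="URL" xlink:href="page2.jp2"/></mets:file>
      <mets:file ID="FILE_0001"><mets:FLocat LOCTYPE="URL" xlink:href="page1.jp2"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="FILE_0001"/></mets:div>
      <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="FILE_0002"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def images_in_reading_order(mets_xml):
    """Return image locations ordered by the physical structMap."""
    root = ET.fromstring(mets_xml)

    # Map each file ID to the location recorded in its FLocat element.
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get(XLINK_HREF)

    # Walk the page divs in ORDER and follow each file pointer.
    pages = root.findall(
        ".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']", NS
    )
    pages.sort(key=lambda div: int(div.get("ORDER", "0")))

    ordered = []
    for div in pages:
        fptr = div.find("mets:fptr", NS)
        if fptr is not None and fptr.get("FILEID") in locations:
            ordered.append(locations[fptr.get("FILEID")])
    return ordered


print(images_in_reading_order(SAMPLE_METS))
```

The order comes from the structural map, not from the order in which files happen to be listed, so pages can be re-sequenced by editing the METS file alone.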

Page 12: Dave's Wellcome Library digitisation presentation

Goobi – exit the humans

• Key steps in Goobi are performed by humans.

• There are high levels of automation, but not everything is automated.

• Ambition is to build fully automated workflows.

• Scalable & highly adaptable to different projects.

Remember, humans are still an important part of digitisation. There are some decisions that only a human can make, & there will always be a need for human driven processes.

Page 13: Dave's Wellcome Library digitisation presentation

Working with digitised content

[Diagram: routes for digitised content into Goobi & Preservica. Labels: In-house; Institutions; Contractors; Harvesting; TIFF or JP2; HD & ftp; Manual; Automatic; Normalises TIFF to JP2; Jpylyzer validates JP2; Auto harvesting of JP2 & DMD; Grey literature; PDF; Ingest Officer / Digital Curator; Snagging.]
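The diagram names Jpylyzer as the tool that validates JP2 files. As a much smaller illustration of the idea, the sketch below only checks that a file starts with the 12-byte JP2 signature box. It is a cheap pre-check under that single assumption, not a substitute for Jpylyzer and not its API; the function names are made up for the example.

```python
# The JP2 signature box: the first 12 bytes of every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def looks_like_jp2(data: bytes) -> bool:
    """Return True if the data begins with the JP2 signature box."""
    return data[:12] == JP2_SIGNATURE


def looks_like_jp2_file(path) -> bool:
    """Read only the first 12 bytes of a file and check the signature."""
    with open(path, "rb") as handle:
        return looks_like_jp2(handle.read(12))


# A TIFF header ("II*" plus a zero byte) fails the check; a JP2 header passes.
print(looks_like_jp2(b"II*\x00" + b"\x00" * 8))
print(looks_like_jp2(JP2_SIGNATURE + b"rest of file"))
```

A check like this catches a TIFF that was renamed to .jp2 but never converted. It says nothing about whether the image data inside is well formed, which is what a full validator is for.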

Page 14: Dave's Wellcome Library digitisation presentation

Goobi – 19th century book project

• Internet Archive (IA) is digitising our 19th century books.

• Content is uploaded by them to the IA website.

• IA perform Optical Character Recognition (OCR) on the books & create structure.

• Goobi harvests the files that the IA create to automatically process content.
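As an illustration of what harvesting from the IA could involve, the sketch below only builds the URLs a harvester might request for one item. It makes no network calls and is not Goobi's harvester. The item identifier is hypothetical, and the file-name suffixes (`_scandata.xml`, `_abbyy.gz`, `_jp2.zip`) are assumed from the Internet Archive's usual naming of derived files, so they should be checked against the item's actual file list before use.

```python
from urllib.parse import quote

IA_BASE = "https://archive.org"


def harvest_urls(identifier):
    """Return candidate URLs for an IA item's metadata and derived files."""
    safe_id = quote(identifier, safe="")
    download = f"{IA_BASE}/download/{safe_id}"
    return {
        # JSON description of the item, including its list of files.
        "metadata": f"{IA_BASE}/metadata/{safe_id}",
        # Assumed names: pagination & structure, raw OCR, page images.
        "scandata": f"{download}/{safe_id}_scandata.xml",
        "abbyy": f"{download}/{safe_id}_abbyy.gz",
        "images": f"{download}/{safe_id}_jp2.zip",
    }


# "exampleitem00auth" is a made-up identifier for illustration only.
for label, url in harvest_urls("exampleitem00auth").items():
    print(label, url)
```

Reading the metadata response first and picking files from its list would be more robust than guessing names, since not every item carries every derived file.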

http://www.kuka-robotics.com/l

Page 15: Dave's Wellcome Library digitisation presentation

Looking at the IA website

https://archive.org/details/wellcomelibrary

Page 16: Dave's Wellcome Library digitisation presentation

Looking at the IA website – metadata

Page 17: Dave's Wellcome Library digitisation presentation

How the automation works

• Goobi builds a process using the MARC record.

• Against this process it imports the images.

• Uses the scandata file to create a METS file with pagination & structure.

• Uses the raw Abbyy file to create ALTO files that allow us to search for words & highlight search term hits.
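To show why ALTO files make search-term highlighting possible, here is a minimal sketch. ALTO records each recognised word as a String element with its text and its position on the page, so finding a word also gives the rectangle to highlight. The sample ALTO fragment, its coordinates and the function names are assumptions for the example; this is not the search code used by Goobi or the player.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line ALTO page; coordinates are invented.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="B1"><TextLine>
    <String CONTENT="Medicine" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
    <SP/>
    <String CONTENT="through" HPOS="300" VPOS="200" WIDTH="150" HEIGHT="40"/>
    <SP/>
    <String CONTENT="medicine," HPOS="470" VPOS="200" WIDTH="190" HEIGHT="40"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

PUNCTUATION = ".,;:!?()\"'"


def local_name(tag):
    """Strip the XML namespace so any ALTO version can be read."""
    return tag.rsplit("}", 1)[-1]


def find_hits(alto_xml, term):
    """Return (hpos, vpos, width, height) boxes for words matching term."""
    needle = term.casefold()
    hits = []
    for element in ET.fromstring(alto_xml).iter():
        if local_name(element.tag) != "String":
            continue
        # Compare case-insensitively, ignoring surrounding punctuation.
        word = element.get("CONTENT", "").strip(PUNCTUATION).casefold()
        if word == needle:
            box = tuple(
                int(float(element.get(key, "0")))
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            hits.append(box)
    return hits


print(find_hits(SAMPLE_ALTO, "medicine"))
```

Each returned box can be drawn over the page image at the matching position, which is the basis of highlighting search hits.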

http://www.impactautomation.com.au/automation

Page 18: Dave's Wellcome Library digitisation presentation

Here’s the record in our OPAC

b20422155

Page 19: Dave's Wellcome Library digitisation presentation

Here’s the book in our player

Page 20: Dave's Wellcome Library digitisation presentation

How it all works…

Page 21: Dave's Wellcome Library digitisation presentation

So, to wrap up…

• Digitisation is a strategic activity.

• We have built an end-to-end process from selection to access.

• Working at scale so efficiency is important.

• Integrated in our OPAC. No silos.

• Well articulated architecture.

Page 22: Dave's Wellcome Library digitisation presentation

Thank you

Questions now, questions later…?

Dave Thompson, Digital Curator, Wellcome Library

[email protected] - @d_n_t

http://wellcomelibrary.org/