
Newspaper digitisation workflows: presentation for cultural heritage digitisation professionals. 2008


Page 1

Newspaper Digitisation Workflows

Rose Holley, Manager ANDP
Presentation to Cultural Heritage Digitisation professionals
26 November 2008

Page 2

General Workflow

- Preparing for Digitisation
- Creation of digital images
- Adding metadata and Quality Assurance
- Optical Character Recognition
- Quality Assurance

Other information

- Access & interaction
- Statistics

Page 3

Preparing for Digitisation

- Identify title to be digitised
- Source master microfilm from owner
- Send master microfilm to scanning contractors
- Add title to Content Management System

Page 4

Add Title Screen

Page 5

Microfilm converted to digital images

Page 6

Image Reception

- Images received from scanning contractor on LTO2 tape
- Tapes added to tape robot and extracted
- Reels automatically added to Content Management System
- Reel details are checked
- Images ingested into Content Management System

Page 7

Check Reel Details

Page 8

Ingest Reels

Page 9

Quality Assurance (QA)

- QA Phase 1 – Add metadata (dates and page numbers)
- Supervisor reviews marked pages
- QA Phase 2 – Define batches
- QA Phase 2 – Resolve duplicates
- QA Phase 2 – Create missing page targets

Page 10

Adding Metadata

- Date and page sequence number added

Page 11

Supervisor Review

- Supervisor reviews pages marked for attention

Page 12

Define Batches

- Batches are defined by date
- Each batch contains 2,000 to 3,000 images
- Batches are automatically assigned a number
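The batching rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the Content Management System's actual logic: the function name `define_batches`, the `(issue_date, image_id)` input shape, and the rule that one issue date is never split across batches are all assumptions made for the example.

```python
from datetime import date
from itertools import count


def define_batches(pages, max_images=3000):
    """Group (issue_date, image_id) pairs into date-ordered, numbered batches.

    Hypothetical rule: images from one issue date stay together, and a batch
    is closed when adding the next date would exceed max_images.
    """
    by_date = {}
    for issue_date, image_id in pages:
        by_date.setdefault(issue_date, []).append(image_id)

    batches = []
    current = []
    numbers = count(1)  # batch numbers are assigned automatically, in order
    for issue_date in sorted(by_date):
        images = by_date[issue_date]
        if current and len(current) + len(images) > max_images:
            batches.append({"number": next(numbers), "images": current})
            current = []
        current.extend(images)
    if current:
        batches.append({"number": next(numbers), "images": current})
    return batches


# Small demonstration with a tiny limit so the split is easy to follow.
demo_pages = [
    (date(2008, 1, 1), "a"), (date(2008, 1, 1), "b"),
    (date(2008, 1, 2), "c"), (date(2008, 1, 2), "d"),
    (date(2008, 1, 3), "e"), (date(2008, 1, 3), "f"),
]
demo_batches = define_batches(demo_pages, max_images=4)
```

With the default limit of 3,000, a batch falls short of the stated 2,000 to 3,000 range only when too few images remain to fill it.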

Page 13

Resolve Duplicates

- Duplicate pages are compared and the best copy is selected

Page 14

Missing Pages

- Missing page targets are generated
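The slides do not say how missing pages are detected. One plausible approach, shown here purely as an assumption, is to look for gaps in the page sequence numbers added during QA Phase 1 and create a target for each gap. The function name `missing_page_targets` is hypothetical.

```python
def missing_page_targets(page_numbers):
    """Return the page sequence numbers absent between the lowest and highest seen.

    Assumes page sequence numbers are positive integers recorded in QA Phase 1.
    """
    present = set(page_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Pages 3, 5 and 6 were never scanned in this made-up issue.
targets = missing_page_targets([1, 2, 4, 7])
```

A sketch like this cannot detect pages missing from the start or end of an issue; that would need the expected page count from another source.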

Page 15

Optical Character Recognition (OCR)

- Complete batches are added to a tape
- Tapes are generated and written by IT
- Tapes sent to OCR contractor
- Contractor completes OCR processes
- OCR data (not images) is returned via FTP

Page 16

Tapes Created

- Completed batches added to a tape

Page 17

Optical Character Recognition (OCR) of pages and article zoning

Page 18

OCR Data Reception (Automated process)

- OCR contractor advises NLA server that a batch has been completed
- NLA server downloads the batch
- Batch is ingested into Content Management System
- Checks are performed on data validity
- QA derivatives are generated
- Articles may now be searched, but are not yet accessible

Page 19

Batch information

Page 20

Quality Assurance (QA)

- A random sample of Issues and Articles is checked
- Volume and Issue number are checked for accuracy
- Sample articles are checked against Quality Acceptance Criteria (QAC)
- Error rates calculated against QAC on the fly
- Supervisor checks final result and decides on accepting the batch
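The sampling and error-rate steps above might look roughly like the following. This is a minimal sketch under stated assumptions: the function names, the pass/fail representation of each checked article, and the `max_error_rate` threshold are hypothetical, and the slides give no actual QAC values. The result is only a recommendation, since the supervisor makes the final decision.

```python
import random


def sample_articles(article_ids, sample_size, seed=None):
    """Draw a random sample of article identifiers for manual checking."""
    rng = random.Random(seed)
    ids = list(article_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def error_rate(results):
    """Fraction of checked articles that failed; results holds True for a pass."""
    if not results:
        return 0.0
    return results.count(False) / len(results)


def recommend(results, max_error_rate):
    """Suggest 'accept' or 'reject' for the supervisor to confirm or override."""
    return "accept" if error_rate(results) <= max_error_rate else "reject"


# Four articles checked so far, one failure: the rate can be recomputed
# after every check, which is one way to get figures "on the fly".
checked = [True, True, False, True]
rate = error_rate(checked)
```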

Page 21

Selecting the batch

Page 22

Volume & Issue Number Check

Page 23

Article checked against QAC

Page 24

Clean fields checked for accuracy

Page 25

Supervisor checks result and makes a decision

Page 26

QA Results

- Automated email sent to supplier advising the result
- Emails for rejected batches include a summary of errors
- Summary of errors saved for all batches
- Accepted batches are immediately accessible

Page 27

Access

- Access is provided through Australian Newspapers beta
- Users can search or browse newspapers
- Search results can be refined using filters
- Users can browse by newspaper title or date

Page 28

Search Results

Page 29

Newspaper information

Page 30

User Interaction

Users are able to:

- Correct the text
- Add tags
- Add comments

User-added content is not currently moderated, but may be in future.

Page 31

Statistics

- Stats for content received and QA'd are generated on request by the Content Management System
- Stats for volume of usage of the beta are collected using Google Analytics
- Stats for user contributions to the beta are collected on an as-needed basis

Page 32

Content Statistics

Page 33

Work Statistics

Page 34

Usage Statistics

Page 35

Questions?