Newspaper Digitisation Workflows
Rose Holley, Manager ANDP
Presentation to Cultural Heritage Digitisation professionals
26 November 2008

General Workflow
- Preparing for Digitisation
- Creation of digital images
- Adding metadata and Quality Assurance
- Optical Character Recognition
- Quality Assurance
Other information
- Access & interaction
- Statistics

Preparing for Digitisation
- Identify title to be digitised
- Source master microfilm from owner
- Send master microfilm to scanning contractors
- Add title to Content Management System

Add Title Screen

Microfilm converted to digital images

Image Reception
- Images received from scanning contractor on LTO2 Tape
- Tapes added to tape robot and extracted
- Reels automatically added to Content Management System
- Reel details are checked
- Images ingested into Content Management System

Check Reel Details

Ingest Reels

Quality Assurance (QA)
- QA Phase 1 – Add metadata (dates and page numbers)
- Supervisor reviews marked pages
- QA Phase 2 – Define batches
- QA Phase 2 – Resolve duplicates
- QA Phase 2 – Create missing page targets

Adding Metadata
- Date and Page Sequence number added

Supervisor Review
- Supervisor reviews pages marked for attention

Define Batches
- Batches defined by date
- Each batch contains 2,000 to 3,000 images
- Batches are automatically assigned a number
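
The batching rule above can be sketched roughly as follows. This is a minimal illustration, not the Content Management System's actual logic: the function name `define_batches`, the `(issue_date, image_id)` input shape, the sequential numbering, and the rule that one issue date is never split across batches are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from itertools import count

MAX_BATCH_SIZE = 3000  # upper bound from the slide, treated here as a hard cap


def define_batches(pages, max_size=MAX_BATCH_SIZE):
    """Group (issue_date, image_id) pairs into date-ordered, numbered batches.

    Assumption: all images for one issue date stay in the same batch.
    """
    by_date = defaultdict(list)
    for issue_date, image_id in pages:
        by_date[issue_date].append(image_id)

    batches = {}
    batch_numbers = count(1)  # hypothetical numbering scheme: 1, 2, 3, ...
    current = []
    for issue_date in sorted(by_date):
        images = by_date[issue_date]
        # Close the current batch if this date's images would overflow it.
        if current and len(current) + len(images) > max_size:
            batches[next(batch_numbers)] = current
            current = []
        current.extend(images)
    if current:
        batches[next(batch_numbers)] = current
    return batches
```

Walking the dates in order keeps each batch a contiguous date range, which matches "batches defined by date"; a single date with more images than the cap would still form one oversized batch in this sketch.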

Resolve Duplicates
- Duplicate pages compared and the best copy is selected

Missing Pages
- Missing page targets are generated
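
One way to picture missing page target generation is as gap detection over the page sequence numbers added in QA Phase 1. The sketch below is illustrative only; the function name `missing_page_targets`, the input mapping, and the assumption that pages in an issue run 1..N without legitimate gaps are not taken from the presentation.

```python
def missing_page_targets(issues):
    """Return (issue_date, page_number) placeholders for gaps in each issue.

    `issues` maps an issue date string to the page sequence numbers present.
    Assumption: pages in an issue are numbered 1..N with no legitimate gaps.
    """
    targets = []
    for issue_date, present in sorted(issues.items()):
        if not present:
            continue  # no pages at all, so there is no range to infer
        expected = range(1, max(present) + 1)
        have = set(present)
        targets.extend((issue_date, n) for n in expected if n not in have)
    return targets
```

A missing final page cannot be detected this way, because the highest page number seen is taken as the end of the issue.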

Optical Character Recognition (OCR)
- Complete batches are added to a tape
- Tapes are generated and written by IT
- Tapes sent to OCR contractor
- Contractor completes OCR processes
- OCR data (not images) is returned via FTP

Tapes Created
- Completed batches added to a tape

Optical Character Recognition (OCR) of pages and article zoning

OCR Data Reception (Automated process)
- OCR contractor advises NLA server that a batch has been completed
- NLA server downloads the batch
- Batch is ingested into Content Management System
- Checks are performed on data validity
- QA Derivatives are generated
- Articles may now be searched, but are not yet accessible
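
The automated reception steps can be read as a simple linear state sequence. The sketch below models that reading; the `BatchState` names and helper functions are hypothetical, and the rule that a batch becomes accessible only once QA accepts it is inferred from the later QA Results slide rather than from any system documentation.

```python
from enum import Enum


class BatchState(Enum):
    COMPLETED_BY_CONTRACTOR = 1
    DOWNLOADED = 2
    INGESTED = 3
    VALIDATED = 4
    DERIVATIVES_GENERATED = 5


# Order of the automated steps as listed on the slide.
PIPELINE = list(BatchState)


def advance(state):
    """Return the next state in the pipeline, or raise if already at the end."""
    index = PIPELINE.index(state)
    if index + 1 >= len(PIPELINE):
        raise ValueError(f"{state.name} is the final automated step")
    return PIPELINE[index + 1]


def is_searchable(state):
    """Articles become searchable once the automated process has finished."""
    return state is BatchState.DERIVATIVES_GENERATED


def is_accessible(state, qa_accepted):
    """Assumption: access is granted only after the batch passes the later QA."""
    return is_searchable(state) and qa_accepted
```

Keeping "searchable" and "accessible" as separate checks mirrors the distinction on the slide: a batch can be indexed before it has been accepted.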

Batch information

Quality Assurance (QA)
- A random sample of Issues and Articles is checked
- Volume and Issue number are checked for accuracy
- Sample articles are checked against Quality Acceptance Criteria (QAC)
- Error rates calculated against QAC on the fly
- Supervisor checks final result and decides on accepting the batch
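
The sampling and error-rate steps above could look something like the following. This is a sketch under stated assumptions: the function names, the `check` callback standing in for the QAC, and the `max_error_rate` threshold are all hypothetical, since the slides do not give the actual criteria or acceptance limits.

```python
import random


def sample_error_rate(articles, check, sample_size, seed=None):
    """Check a random sample of articles and return (error_rate, failures).

    `check(article)` returns True when the article meets the criteria.
    """
    rng = random.Random(seed)
    sample = rng.sample(articles, min(sample_size, len(articles)))
    failures = [a for a in sample if not check(a)]
    rate = len(failures) / len(sample) if sample else 0.0
    return rate, failures


def recommend(error_rate, max_error_rate):
    """Suggest an outcome; on the slide the supervisor makes the final call."""
    return "accept" if error_rate <= max_error_rate else "reject"
```

The function only recommends an outcome, because the workflow leaves the decision to accept or reject with the supervisor.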

Selecting the batch

Volume & Issue Number Check

Article checked against QAC

Clean fields checked for accuracy

Supervisor checks result and makes a decision

QA Results
- Automated email sent to supplier advising the result
- Emails for rejected batches include a summary of errors
- Summary of errors saved for all batches
- Accepted batches are immediately accessible
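
As an illustration of the notification rule above, the sketch below builds a result email with Python's standard `email.message.EmailMessage`. The function name, subject wording, and addresses are invented for the example; it constructs the message without sending it and does not reflect how the NLA system actually generates its emails.

```python
from email.message import EmailMessage


def build_result_email(batch_number, accepted, errors, supplier, sender):
    """Build (but do not send) the QA result notification for one batch."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = supplier
    outcome = "accepted" if accepted else "rejected"
    msg["Subject"] = f"Batch {batch_number} {outcome}"
    lines = [f"Batch {batch_number} has been {outcome}."]
    if not accepted:
        # Only rejection emails carry the error summary, as on the slide.
        lines.append("Summary of errors:")
        lines.extend(f"- {e}" for e in errors)
    msg.set_content("\n".join(lines))
    return msg
```

Since the slide says the error summary is saved for all batches, a fuller version would persist `errors` separately regardless of the outcome; that storage step is left out here.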

Access
- Access is provided through Australian Newspapers beta
- Users can search or browse newspapers
- Search results can be refined using filters
- Users can browse by Newspaper title or Date

Search Results

Newspaper information

User Interaction
Users are able to:
- Correct the text
- Add tags
- Add comments
User-added content is not currently moderated, but may be in future.

Statistics
- Stats for content received and QA'd generated on request by the Content Management System
- Stats for volume usage of Beta collected using Google Analytics
- Stats for user contributions to beta collected on an as-needed basis

Content Statistics

Work Statistics

Usage Statistics

Questions?