Prioritizing digitization
British Library Centre for Conservation, February 23 2010
The scanning on demand system of the Amsterdam City Archives
Who am I?
Marc Holtman
Started working at the Amsterdam Archives in 2001
Project leader for the Image Bank
Project leader for the development of search and retrieval applications
Project leader for digitization
Current job
Coordination of all digitization projects
Development of the workflow
Development of workflow tools
Archiefbank
Image Bank
History
Brief history of digitization at the Amsterdam Archives
2000 - Start of digitization of highlights from the collections and three large genealogy sources
2001 - Development of the Image Bank: building an application and digitizing 25.000 photos, drawings and prints
2003 - Development of an application for online inventories: all inventories, no scans
2006 - Development of the Archiefbank: expanding the inventories with integrated scans, indexes, a scanning on demand service and a workflow for large scale digitization
2010 - Archiefbank online with more than 7 million scans and 15.000 registered users; Image Bank online with 300.000 high-end quality scans
History
From small selections to large scale digitization
Turning point in 2006
Trigger was an ongoing decline in visitors to our reading rooms
From (relatively) small scale digitization to a “scan it all” approach
And a spectacular growth of users on the website
Visitors
Year    Reading rooms    Website
1982    24.027
1988    29.788
1992    27.738
1998    26.598           40.048
2002    25.014           224.050
2006    17.958           512.592
Strategy
After the realization of the online inventories users started to ask
“Where’s the button for the images?”
Users expect to find everything digitally available
... when we have 20 miles of archives in our repositories
Everybody should be able to consult digitized documents 24/7 online
But where to start?
And how to finance?
Strategy
The pessimistic math
Q. How many scans can be made from 20 miles of archives?
1 foot = 2.000 scans
A. 739.200.001 scans
Q. How long does it take to scan it all?
Production = 10.000 scans a week
A. 406 years
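The arithmetic behind this slide can be checked with a short sketch. The scans-per-foot and weekly production figures are taken from the slide; the conversion of 5280 feet per mile and the assumption of 52 production weeks a year are added here. With these inputs the sketch reproduces the 406 years, but it yields about 211 million scans rather than the 739.200.001 quoted, so that total presumably rests on a different scans-per-foot figure.

```python
FEET_PER_MILE = 5280       # standard conversion, not stated on the slide
SCANS_PER_FOOT = 2_000     # from the slide
SCANS_PER_WEEK = 10_000    # from the slide
WEEKS_PER_YEAR = 52        # assumption: scanning runs every week of the year


def total_scans(miles: float) -> float:
    """Number of scans needed to digitize the given length of shelving."""
    return miles * FEET_PER_MILE * SCANS_PER_FOOT


def years_to_scan(miles: float) -> float:
    """Years needed to scan everything at the weekly production rate."""
    return total_scans(miles) / SCANS_PER_WEEK / WEEKS_PER_YEAR


if __name__ == "__main__":
    print(f"Scans needed: {total_scans(20):,.0f}")
    print(f"Years needed: {years_to_scan(20):.0f}")
```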
Strategy
It was clear we had to:
Rethink our policy in prioritizing digitization
Rethink our financial principles on digitization
Develop a workflow in which large scale and low costs are starting points
Develop a user-friendly web application
Prioritizing digitization
The user priorities
We stopped thinking about the 20 miles of archives in our repositories
And started thinking about the documents the users need for their research
Users only need a few documents, not everything that is being digitized
The documents needed for your research should be the first documents to be digitized, not the last
This asks for client-driven digitization
Prioritizing digitization
The user priorities
In the Archiefbank we let the user set priorities in digitization
All archive files can be requested for digitization via the online inventories
In principle all requests are honored, unless
The material cannot be digitized for material reasons
The material is under copyright
Disclosure restrictions apply
The user doesn’t commit to anything by placing a request, but neither does the archive
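The rule on this slide (every request is honored unless one of three exceptions applies) can be sketched as a simple check. This is only an illustration: the `Request` class, its field names and the reason strings are hypothetical and do not describe the Archiefbank's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """A scanning request for one inventory number (hypothetical model)."""
    inventory_number: str
    unfit_for_scanning: bool = False      # cannot be digitized for material reasons
    under_copyright: bool = False
    disclosure_restricted: bool = False


def refusal_reasons(request: Request) -> List[str]:
    """Return the exceptions from the slide that apply to this request."""
    reasons = []
    if request.unfit_for_scanning:
        reasons.append("material reasons")
    if request.under_copyright:
        reasons.append("copyright")
    if request.disclosure_restricted:
        reasons.append("disclosure restrictions")
    return reasons


def is_honored(request: Request) -> bool:
    """A request is honored when none of the exceptions applies."""
    return not refusal_reasons(request)
```

For example, `is_honored(Request("123"))` is true, while a request flagged with `under_copyright=True` is refused with the single reason `"copyright"`.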
Prioritizing digitization
The scans in the scanning on request service are made for the purpose of archival research
Not as a substitute for the originals
Nevertheless, digitization does have a real conservation function
After digitization the originals cannot be requested in the reading room anymore
Conservation of the originals remains our major concern
Prioritizing digitization
The preservation side
Damage or loss of the originals caused by use is ruled out
Basic rules:
All inventory numbers are checked before they are transported to the digitizer
If necessary, and possible, our restoration employees perform small restorations
If the material is too fragile, or asks for complex restoration, we cancel the request for digitization
Prioritizing digitization
The preservation side
We perform small preservation tasks
Removal of staples
Repackaging when necessary
The sequence of the originals is not checked or altered
We do not number the originals
Prioritizing digitization
The preservation side
Digital preservation: all scans are stored in a controlled e-repository environment (OAIS)
Prioritizing digitization
Digitization projects
Besides the selection made by users we scan on a project basis
Hundreds to millions of scans in each project
Purpose of digitization varies from accessibility to substitution of the originals
Grants from (national) programs, often on specific topics
Cooperation with Amsterdam district councils and services
Financing
In the Netherlands free access to archives is legislated
But for reproductions you have to pay
We regard reading and downloading of digitized archival documents via the web as delivery of reproductions
Users have to pay to get access to the scans
But: consulting the scans at our reading rooms is free
Grants for digitization are not enough for realizing our vision
The idea is that by buying scans the audience makes (part of) the financing of digitization possible
Financing
Pricing policy
Customers think a low price is important
Archival research easily runs into the use of dozens to hundreds of documents
100 scans should not cost € 1000
The price of an ordinary copy in our reading room should be the benchmark
The costs when purchasing scans online should be competitive with travel costs when visiting our reading room
This means that costs for producing and storing scans have to be as low as possible
Financing
Reducing costs
Digitization on a large scale is only possible when both incidental and structural costs are as low as possible
Reducing incidental costs (production of scans):
1. Standardized and efficiently organized workflow
2. Choosing quality standards that fit the purpose of the scans
Reducing structural costs (storage of scans):
3. File sizes as small as possible
Financing
Reducing costs
2. Choosing quality standards that fit the purpose of the scans
In every project we choose a quality that fits the purpose of the digitizing
Scanning a modern, printed book for means of accessibility is not the same as scanning a vulnerable charter for preservation
Price comparison scanning costs
Price rates scanning, external partner
Quality standard            Price
High-end                    € 2 - 10
“Legibility”                € 0,20 - 0,40
“Legibility”, auto-feed     € 0,10
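To see what these rates mean for a typical research order, the sketch below prices 100 scans at each quality level. It assumes the quoted rates are per scan, which the slide does not state explicitly; the dictionary keys and the function name are illustrative only. Prices are held in euro cents to avoid rounding surprises.

```python
# Price ranges in euro cents (low, high); "per scan" is an assumption.
RATES_CENTS = {
    "high-end": (200, 1000),
    "legibility": (20, 40),
    "legibility, auto-feed": (10, 10),
}


def order_cost(quality: str, scans: int) -> tuple:
    """Return the (lowest, highest) cost in euro for an order of `scans` scans."""
    low, high = RATES_CENTS[quality]
    return (low * scans / 100, high * scans / 100)


if __name__ == "__main__":
    for quality in RATES_CENTS:
        low, high = order_cost(quality, 100)
        print(f"{quality}: EUR {low:.2f} to EUR {high:.2f} for 100 scans")
```

Under that assumption, only the "legibility" rates keep an order of 100 scans far below the € 1000 mark that the pricing policy rules out.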
Financing
Reducing costs
2. Choosing quality standards that fit the purpose of the scans
Example of scan with a “legibility” standard of quality
Is this scan ok for the purpose of doing archival research: yes
Is this scan ok for publication in an art book: no
Financing
Reducing costs
3. File sizes as small as possible
Storage costs are still considerable when producing large quantities of scans
In order to bring structural costs down, the file size of the scans has to be as small as possible
This can be achieved in three ways
1. Skimping on resolution
2. Skimping on bit depth / number of colors (only possible in formats like TIFF and PNG)
3. Using (lossless or lossy) compression on the files
We use a combination of 1 and 3
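The effect of the first two options can be estimated before any compression is applied, because the uncompressed size of a scan follows directly from its pixel dimensions and bit depth. The sketch below is a rough illustration; the 8 x 10 inch page and the function name are assumptions, not figures from the presentation.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed image size in bytes for a page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Hypothetical 8 x 10 inch page.
    full = uncompressed_bytes(8, 10, dpi=300, bits_per_pixel=24)
    lower_res = uncompressed_bytes(8, 10, dpi=200, bits_per_pixel=24)
    grayscale = uncompressed_bytes(8, 10, dpi=300, bits_per_pixel=8)
    print(f"300 dpi, 24-bit colour: {full / 1e6:.1f} MB")
    print(f"200 dpi, 24-bit colour: {lower_res / 1e6:.1f} MB")
    print(f"300 dpi, 8-bit gray:    {grayscale / 1e6:.1f} MB")
```

Because the size scales with the square of the resolution, going from 300 to 200 dpi more than halves the file. Compression, the third option, comes on top of this and is not modeled here.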
Financing
Reducing costs
3. File sizes as small as possible
Storage of 500.000 images; average size per scan uncompressed = 22,1 MB
File format           Storage    Costs 1 year    Costs 10 years
TIFF uncompressed     11 TB      € 38.500        € 385.000
JPEG 10               1,1 TB     € 3.850         € 38.500
JPEG 4 (200 dpi)      124 GB     € 434           € 4.340
JPEG 2000 (part 1)    6 TB       € 21.000        € 210.000
Price rate: € 3.500 per TB per year for storage in a controlled e-repository environment on two separate locations, including IT costs (NLD, Jan 2010)
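The storage costs follow from the price rate by simple multiplication, as the sketch below shows. The price per TB and the storage volumes come from the slide; treating 1 TB as 1000 GB is an assumption, chosen because it reproduces the yearly figure for the 124 GB case.

```python
PRICE_PER_TB_YEAR = 3500   # euro per TB per year, from the slide
GB_PER_TB = 1000           # assumption: decimal units

# Storage volumes for 500.000 images, in GB, from the slide.
FORMATS_GB = {
    "TIFF uncompressed": 11_000,
    "JPEG 10": 1_100,
    "JPEG 4 (200 dpi)": 124,
    "JPEG 2000 (part 1)": 6_000,
}


def storage_cost(size_gb: float, years: int = 1) -> float:
    """Storage cost in euro for `size_gb` gigabytes kept for `years` years."""
    return size_gb * PRICE_PER_TB_YEAR * years / GB_PER_TB


if __name__ == "__main__":
    for name, size_gb in FORMATS_GB.items():
        print(f"{name}: EUR {storage_cost(size_gb):,.0f} per year, "
              f"EUR {storage_cost(size_gb, 10):,.0f} over 10 years")
```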
Financing
Costs and income Archiefbank
Calculating real costs and income is difficult
What should we put in and what not?
For example, after digitizing, logistics and a physical reading room with climate control and security aren't necessary anymore for these documents when requested
What we win by digitization is more than what we can simply measure in euros as income
Also, digitization simply is a powerful way to fulfill our mission: making our archives accessible
Costs Archiefbank (2009)
Digitization on request      € 140.000
Digitization projects        € 200.000
Web services                 € 50.000
Total                        € 390.000
Income Archiefbank (2009)
Sales of scans               € 100.000
Project funding              € 200.000
Government (digitization)    € 90.000
Total                        € 390.000
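The 2009 figures can be checked with a few lines of arithmetic. The amounts are those on the slide; setting scan sales against the costs of digitization on request is a pairing made here for illustration, not a calculation the presentation itself makes.

```python
COSTS_2009 = {
    "Digitization on request": 140_000,
    "Digitization projects": 200_000,
    "Web services": 50_000,
}
INCOME_2009 = {
    "Sales of scans": 100_000,
    "Project funding": 200_000,
    "Government (digitization)": 90_000,
}

total_costs = sum(COSTS_2009.values())
total_income = sum(INCOME_2009.values())
balance = total_income - total_costs

# Illustrative ratio: scan sales relative to the costs of digitization on request.
sales_share = INCOME_2009["Sales of scans"] / COSTS_2009["Digitization on request"]

if __name__ == "__main__":
    print(f"Total costs:  EUR {total_costs:,}")
    print(f"Total income: EUR {total_income:,}")
    print(f"Balance:      EUR {balance:,}")
    print(f"Scan sales as share of on-request costs: {sales_share:.0%}")
```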
Financing
Costs and income Archiefbank
Within our framework, the conclusion is that the scanning on request service is financially feasible
Workflow
Standardized workflow
We developed a standardized workflow for all digitization
Every type of document can be digitized in this workflow
We always work on a project basis
In every project quality standard and method are set, depending on purpose and type of material
Goals of digitization projects vary from access to substitution of the originals
1. Requesting digitization
2. Providing order number(s)
3. Assessing the originals
4. Preparing the originals
5. Transport
6. Scanning
7. Assigning filenames
8. Transport
9. Checking originals
10. Checking scans
11. Checking metadata
12. Originals back to repository
13. Import in controlled storage system
14. Import in metadata system
15. Export scans
16. Export metadata
17. Import scans
18. Import metadata
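The eighteen numbered steps of the workflow diagram can be captured as an ordered list, which makes it easy to track where an order currently stands. This is a minimal sketch that assumes the steps run strictly in numbered order; the diagram itself may allow some steps to run in parallel, and the function names are hypothetical.

```python
from typing import Optional

# Step names as numbered on the workflow diagram.
WORKFLOW_STEPS = [
    "Requesting digitization",
    "Providing order number(s)",
    "Assessing the originals",
    "Preparing the originals",
    "Transport",
    "Scanning",
    "Assigning filenames",
    "Transport",
    "Checking originals",
    "Checking scans",
    "Checking metadata",
    "Originals back to repository",
    "Import in controlled storage system",
    "Import in metadata system",
    "Export scans",
    "Export metadata",
    "Import scans",
    "Import metadata",
]


def step_name(number: int) -> str:
    """Name of the step with the given 1-based number."""
    if not 1 <= number <= len(WORKFLOW_STEPS):
        raise ValueError(f"unknown step number: {number}")
    return WORKFLOW_STEPS[number - 1]


def next_step(number: int) -> Optional[int]:
    """Number of the following step, or None after the last one."""
    step_name(number)  # validates the step number
    return number + 1 if number < len(WORKFLOW_STEPS) else None
```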
Workflow
Workflow principles
Scanning is contracted out
Always scanning of complete inventory numbers
Identification of the file and assigning of filenames by means of an order ticket
Use of workflow tools for managing the originals and performing checks on scans
Workflow
Weekly schedule scanning on demand
Task                               Hours
Retrieving the originals           4
Preparing the originals            6
Checking scans                     6
Returning the originals            4
Contact with customers             1
Coordination and administration    3
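Adding up the weekly schedule gives the in-house staff time that the scanning on demand service takes. The hours are those listed in the table; the grouping into "handling originals" is an illustrative choice made here.

```python
WEEKLY_HOURS = {
    "Retrieving the originals": 4,
    "Preparing the originals": 6,
    "Checking scans": 6,
    "Returning the originals": 4,
    "Contact with customers": 1,
    "Coordination and administration": 3,
}

total_hours = sum(WEEKLY_HOURS.values())

# Illustrative grouping: all tasks that involve handling the physical originals.
handling_hours = sum(
    hours for task, hours in WEEKLY_HOURS.items() if "originals" in task
)

if __name__ == "__main__":
    print(f"Total per week: {total_hours} hours")
    print(f"Handling originals: {handling_hours} hours "
          f"({handling_hours / total_hours:.0%} of the total)")
```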
Archiefbank
Demonstration of the Archiefbank
More:
http://www.slideshare.net/ktheimer