Digitisation and Print-on-demand
ARCADIA
Ed Chamberlain - Systems Development Librarian
Question:
How could we (better) automate the digitisation workflow?
Why is digitisation important to libraries?
1. Better expose existing collections to a wider audience
2. Better meet readers’ expectations of ‘everything online’
3. Preserve material
Why is it important to me?
1. Previous work on the Biodiversity Heritage Library project
2. I feel that libraries are still not fulfilling their tremendous potential here
3. Cambridge has no ‘Google Books project’: is there an alternative model?
4. Is a one-site UL sustainable forever?
What's happening now in Cambridge?
Digitisation is focused on special collections, Cambridge's USP!
Limited funds
Relatively slow, manual process done to exemplar standards
Not scalable (at an effective cost)
As it stands …
Barriers to digitisation
Barriers are not technology-centric
1. Copyright legislation
2. Cost / time
3. Difficulty in reading on a screen
Areas of investigation …
Examine technological responses to barriers:
1. Copyright legislation => Speed up / rationalise copyright analysis
2. Cost / time => Explore automated book scanning
3. People prefer a book to a screen => Explore print on demand
What exactly?
1. Copyright legislation => Speed up the copyright analysis => COPYRIGHT CALCULATOR
2. Cost / time => Explore automated book scanning => KIRTAS AUTOMATIC BOOK SCANNER
3. People prefer a book to a screen => Explore print on demand => ESPRESSO BOOK PRINTING MACHINE
Focus of fellowship …
1. COPYRIGHT CALCULATOR
2. KIRTAS AUTOMATIC BOOK SCANNER
3. ESPRESSO BOOK PRINTING MACHINE
… investigate them as a basis for a potential ‘on-demand’ digitisation service
Imagine …
Full or partial digitisation of a work, requested from the catalogue instead of a stack request
Straight to desktop in less than a day
Order a bound print copy as an option
If it’s a public domain work, it is then made available to all under a Creative Commons licence…
Full digitisation at readers’ request …
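The imagined service boils down to a simple routing decision per request. The Python sketch below is a minimal illustration under stated assumptions, not a description of any existing system: the `Request` fields and step names are hypothetical, and the rule that full digitisation is only offered for public domain works is taken from the copyright discussion in this talk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scope(Enum):
    FULL = "full"
    PARTIAL = "partial"


@dataclass
class Request:
    """A HYPOTHETICAL reader request raised from a catalogue record."""
    item_id: str
    scope: Scope
    public_domain: bool
    print_copy: bool = False


def plan_request(req: Request) -> List[str]:
    """Return the ordered steps for one on-demand digitisation request."""
    if req.scope is Scope.FULL and not req.public_domain:
        # Only public domain works can be digitised in full.
        raise ValueError(f"{req.item_id}: full digitisation needs a public domain work")
    steps = [f"scan {req.scope.value} copy", "deliver to reader's desktop"]
    if req.print_copy:
        # Optional bound print copy ordered alongside the digital one.
        steps.append("print and bind copy")
    if req.public_domain:
        # Public domain output is also made openly available to everyone.
        steps.append("publish openly")
    return steps


print(plan_request(Request("hypothetical-123", Scope.FULL, public_domain=True, print_copy=True)))
```

A partial request for an in-copyright item would simply stop after delivery to the reader.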
Why ‘On-Demand’?
Expectation of modern society
Self-sustaining – if the reader pays for the cost of digitisation, no large external donor is needed
‘Every book its reader’
‘Save the time of the reader’
Fellowship methodology …
Explore each area in turn …
Visit case studies
Assemble facts and figures where possible
Draw out advantages and disadvantages of each piece of technology
1) Copyright and copyright calculation
Basic problems with copyright:
Fiendish stuff
Complexity slows down decisions
Upsets risk-averse librarians
We can only fully digitise what is in the Public Domain
Scope for automation
Copyright legislation as a set of rules into which data about a work is fed
Out comes a result (yes/ no/ probably)
Sounds like a job for a machine, rather than a person …
… Exactly what others have thought
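To make the ‘rules in, result out’ idea concrete, here is a minimal Python sketch of such a rule engine. It is not the Open Knowledge Foundation / Europeana calculator or its API. It assumes a single simplified UK-style rule (copyright in a literary work expires at the end of the 70th calendar year after the author’s death) plus a hypothetical ‘safe cut-off’ of 150 years from publication when no death date is recorded; real legislation has many more cases.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: one simplified UK-style rule for literary works. Copyright
# expires at the end of the 70th calendar year after the author's death.
# Real calculators also handle anonymous, unpublished, Crown copyright and
# multi-author works, and other jurisdictions.
TERM_AFTER_DEATH = 70

# HYPOTHETICAL "safe cut-off": with no death date on record, a work published
# at least this many years ago is treated as probably public domain.
SAFE_CUTOFF_YEARS = 150


@dataclass
class Work:
    title: str
    publication_year: int
    author_death_year: Optional[int] = None


def public_domain_status(work: Work, current_year: int) -> str:
    """Feed bibliographic data through the rules: 'yes', 'no', 'probably' or 'unknown'."""
    if work.author_death_year is not None:
        expiry_year = work.author_death_year + TERM_AFTER_DEATH
        # Public domain from 1 January of the year after the term expires.
        return "yes" if current_year > expiry_year else "no"
    if current_year - work.publication_year >= SAFE_CUTOFF_YEARS:
        return "probably"
    # Data is incomplete: further research is needed.
    return "unknown"


for work in (Work("Hypothetical A", 1859, 1882), Work("Hypothetical B", 1950, 1990),
             Work("Hypothetical C", 1850), Work("Hypothetical D", 1920)):
    print(work.title, public_domain_status(work, current_year=2012))
```

Records with a death date get a definite answer; the ‘probably’ and ‘unknown’ branches reflect how incomplete catalogue data can leave a decision open.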
Open Knowledge Foundation - Public domain works project / Europeana
Now exists as a machine accessible API
Feed in bib data, get a result
Conclusions on copyright calculation
Out of the 100 samples, 76 returned an expected result given the data available (a further 8 could have been useful if a safe cut-off point had been added)
Great technology to potentially assist in decision making
As useful in asserting what is not in the public domain, as opposed to what is
Data we can provide is incomplete for the task – sometimes further research will be needed
Great feature for a library catalogue - kick off an ordering process
2) Digitisation-on-demand
Not that difficult to copy a book quickly…
Why Kirtas?
Two in Cambridge, at Cambridge University Press
Used in the Cambridge Library Collection project
CUP let me take a look!
Kirtas video …
http://www.youtube.com/watch?v=V03s5oJDwwc
Automated page turning …
But with a human watching just in case …
Cost saving?
Still quicker than ‘by hand’
Automated post processing …
But images are also sent to India for a two-week tidy-up
Quick enough for on-demand?
What level of quality is sufficient for a library surrogate?
Focus on improving access rather than preservation
Would a preservation quality image be too expensive to produce for an on-demand approach?
For the iPad and Kindle - text is as important as a scanned image
Demand for this kind of thing?
91% (56/61) of Cambridge academics surveyed would be interested in a full-text digital copy of an out-of-copyright work
62% (36/58) would be interested in a partial digital copy of an in-copyright work if available
What can we copy?
Publication date   Items        % public domain   No. public domain
1400-1850          304,587      100               304,587
1850-1860          40,970       100               40,970
1860-1870          43,734       100               43,734
1870-1880          50,564       95                48,035
1880-1890          66,857       90                60,171
1890-1900          66,883       85                56,850
1900-1910          70,360       65                45,734
1910-1920          60,489       40                24,195
1920-1930          78,670       25                19,667
1930-1940          90,576       10                9,057
1940-1950          72,692       6                 4,361
1950-1960          118,251      0                 0
1960-1970          262,974      0                 0
1970-2009          2,130,509    0                 0
Total              3,458,116    19                657,361
Estimations of University of Cambridge holdings within the public domain. R.Pollock 2009
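The ‘No. public domain’ column appears to be Items × % public domain for each band, rounded down to a whole item. That rounding rule is an inference, but it reproduces every row and the 657,361 total. The short Python check below recomputes the totals from the figures in the table.

```python
# Publication-date bands, item counts and estimated % public domain,
# transcribed from the table above (R. Pollock, 2009).
BANDS = [
    ("1400-1850", 304_587, 100),
    ("1850-1860", 40_970, 100),
    ("1860-1870", 43_734, 100),
    ("1870-1880", 50_564, 95),
    ("1880-1890", 66_857, 90),
    ("1890-1900", 66_883, 85),
    ("1900-1910", 70_360, 65),
    ("1910-1920", 60_489, 40),
    ("1920-1930", 78_670, 25),
    ("1930-1940", 90_576, 10),
    ("1940-1950", 72_692, 6),
    ("1950-1960", 118_251, 0),
    ("1960-1970", 262_974, 0),
    ("1970-2009", 2_130_509, 0),
]


def public_domain_estimate(bands):
    """Return (total items, estimated public domain items, overall % public domain)."""
    total_items = sum(count for _, count, _ in bands)
    # Each band contributes count x %PD, rounded down to a whole item.
    total_pd = sum(count * pct // 100 for _, count, pct in bands)
    return total_items, total_pd, 100 * total_pd / total_items


items, pd_items, pd_pct = public_domain_estimate(BANDS)
print(f"{items:,} items, {pd_items:,} public domain ({pd_pct:.0f}%)")
# 3,458,116 items, 657,361 public domain (19%)
```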
Around 19% of CUL’s collections fall within the public domain
Niche interest in this area: 2% of circulation transactions involved material from 1850-1920
How much does it cost?
Cheaper than current services …
Imaging option: Photocopy/Scan
Image type: A4 300 dpi (PDF)
Image production (350 images at £0.50): £175.00
Service charge (15%): £26.25
VAT (20%): £35.00
Total: £236.25 for 350 pages
But still not that cheap… about £30 for a 350-page work (cost modelling based on the Kirtas, manned by Imaging Services staff)
The model does not recoup capital costs
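The itemised imaging-service quote can be checked with a few lines of Python, using `decimal.Decimal` to avoid floating-point rounding in currency. One detail is inferred from the figures rather than stated: the £35.00 VAT line equals 20% of the £175.00 production cost alone, not of production plus service charge, so the sketch follows the quote as printed.

```python
from decimal import Decimal, ROUND_HALF_UP


def imaging_quote(pages: int,
                  price_per_image: Decimal = Decimal("0.50"),
                  service_rate: Decimal = Decimal("0.15"),
                  vat_rate: Decimal = Decimal("0.20")) -> Decimal:
    """Reproduce the itemised imaging-service quote for a given page count."""
    production = pages * price_per_image   # 350 x 0.50 = 175.00
    service = production * service_rate    # 15% service charge
    # In the quote, VAT is 20% of the production cost only,
    # not of production plus service charge; this follows the quote.
    vat = production * vat_rate
    total = production + service + vat
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


quote = imaging_quote(350)
print(quote)                   # 236.25
print(quote / Decimal("30"))   # 7.875, ratio to the ~£30 Kirtas estimate
```

On these figures the current service costs roughly eight times the modelled Kirtas cost for the same 350 pages.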
How much would readers pay?
Survey information reveals that 66% (36/54) of academic users would prefer to pay under £15 for a digitised copy
Achieving this at cost or with a small surplus would be a challenge
Attempting to recoup capital investment directly would push costs beyond a ‘sweet-spot’ price point
Should they have to pay at all?
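The tension between a sub-£15 price point and capital recovery can be shown with a simple break-even formula: price per copy = running cost + capital outlay ÷ copies sold over the recovery period. In the Python sketch below, the £30 running cost and £15 target come from the figures above; the £100,000 capital outlay and the copy volumes are purely hypothetical placeholders, not quoted Kirtas prices.

```python
def price_per_copy(running_cost: float, capital_cost: float, copies: int) -> float:
    """Break-even price when a capital outlay is spread evenly over `copies` sales."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return running_cost + capital_cost / copies


RUNNING_COST = 30.0        # approx. cost of digitising a 350-page work (no capital recoup)
TARGET_PRICE = 15.0        # most surveyed academics preferred to pay under £15
CAPITAL_COST = 100_000.0   # HYPOTHETICAL capital outlay, for illustration only

for copies in (1_000, 5_000, 20_000):  # HYPOTHETICAL demand levels
    price = price_per_copy(RUNNING_COST, CAPITAL_COST, copies)
    verdict = "within" if price <= TARGET_PRICE else "above"
    print(f"{copies:>6} copies: £{price:.2f} ({verdict} the £{TARGET_PRICE:.0f} target)")
```

Even with the capital term set to zero, the running cost alone sits at twice the £15 target, which is why hitting that price point at cost looks difficult.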
Conclusions for digitisation-on-demand
Great technology, nice idea, some demand
Somewhat limited as an effective service by size of public domain
Large upfront costs if Kirtas purchased
Other cost models available (lease hire, outsource)
3) Print-on-demand
Nothing new for publishing
Espresso Book Machine is the most exciting thing out there
EBM video
http://www.youtube.com/watch?v=Q946sfGLxm4
Blackwell’s experience
Lots of interest
Needs full time staff to run
Strong interest in self publishing (theses)
Increasing amounts of material available from a variety of sources (Project Gutenberg, Google Books, publishers)
Utah Experience
“It undermines the need for traditional subject selection, disrupting a major sub-discipline of librarianship. By doing so, it also undermines the rationale for a large research collection—if the purpose of the collection is to meet patrons’ information needs, and if they can now be met without buying and housing a large just-in-case collection, then how do we defend the unbelievably expensive and arguably quite wasteful practice of traditional collection building?”
Rick Anderson, Marriott Library University of Utah
“Undermines the need for publishers to print speculative runs of new books, thus potentially changing in a drastic way the logistics of the publishing world. In a rational marketplace, every bookstore would have an EBM or something that works on the same principle, and books would only be printed at the point of demand and purchase”
“Obviously, its full potential has yet to be realized—but the fundamental model is now in place. What are left to fix (bad metadata, incomplete catalog, rights issues, etc.) are the details. In most cases, fixing them will require only money and effort, and as roadblocks go those are relatively simple ones”
Rick Anderson, Marriott Library University of Utah
Demand?
65% would also be interested in a print facsimile
42% of academic respondents would be willing to pay £10-£15, 33% £15-£25
Costs?
£10 per 350-page volume
Blackwell’s have a pricing model that does not recoup capital
Final thoughts …
Conclusions for both print and digitisation
High upfront cost – any model that attempts to recoup capital through charges prices itself out of the market
High upfront cost – high risk of failure
‘Innovator’s dilemma’ – we are, in effect, in competition with our bread-and-butter services
Aiming to hit a moving target of user expectation
Danger of early adoption – not understanding or being aware of longer-term issues (e-journals)
Demand is high
Breakthrough technology – getting cheaper
Are libraries losing digital customers by playing fair?
Google continues to digitise despite legal setbacks, and gains the headlines
Users continue to digitise for themselves… privately in research groups, and ‘socially’ (http://library.nu/ and other academic torrent sites)
Many in academia now choose to ignore or challenge the inflexibilities of copyright to get the material they need
Remove barriers – make it easier for people to get the material they need for free (or cheaply)
Lower costs – new approaches, new models of working
How could we respond?
Ed Chamberlain
@edchamberlain
This work is licensed under a Creative Commons Attribution 3.0 Unported License.