Crowdsourcing in the Digitalkoot project, with Majlis Bremer-Laamanen from the National Library of Finland.
CROWDSOURCING IN THE DIGITALKOOT PROJECT
Majlis Bremer-Laamanen
IMPACT 24TH OF OCTOBER, 2011
Sources: Microtask.com; "Digitalkoot: Making Old Archives Accessible Using Crowdsourcing" by Otto Chrons and Sami Sundell; discussions with Managing Director Harri Holopainen, harri@microtask.com
The Centre for Preservation and Digitisation: statistics
• Established in 1990
• Digitisation started in 1998
• Over 50 employees
• Yearly average (past three years):
  – Microfilm production: 1.3 million exposures
  – Digitisation: 1.3 million pages
  – Audio digitisation and music cataloguing: 1,300 unique cassettes and their sleeves
  – Conservation: 10,000-15,000 units
ENRICHING CONTENT (http://digi.nationallibrary.fi, http://www.doria.fi/handle/10024/4194)
• Newspapers: 2 million pages, the Historical Newspaper Library
• Journals: 2.7 million pages, freely available up to 1910, available in all legal deposit libraries up to 1944
• Books: travel, novels, 17th-century dissertations, Save the Book
• Ephemera: industrial price lists
• Sound: national sound archive, C-cassettes
• Interest groups: the creators, users and contributors of the material
Mass digitisation activities in the most cost-effective manner (newspapers, books, journals, ephemera, audio):
• Logistics for physical items
• Process for digital objects: network services and long-term preservation
• Metadata METS-ALTO: captured through the process
• Metadata development: user experience and crowdsourcing
• Customising the tracking systems (CCS, Item Tracking, Scan Client)
• Operational environment: scaling architecture and implementation
Context for mass digitisation and crowdsourcing
Figure: workflow at the Centre for Preservation and Digitisation (physical objects retrieval, transferring physical objects, preparation for digitisation, digitisation, post-processing, temporary storage for digitised objects, client accessibility)
DIGITALKOOT
DIGI = TO DIGITISE
TALKOOT = PEOPLE GATHERING TO WORK TOGETHER VOLUNTARILY (WITHOUT PAYMENT)
FIRST EXPERIENCE 2011:
DIGITALKOOT: OCR correction through gamification, i.e. turning useful activities into games: "THE MOLE HUNT" by Microtask.com.
– People can spend hours on games
– Turning useful activities into games
– Activities can be rewarded with scores, achievements and social benefits
From February 8th to September 15th, 2011: about 80,000 visitors, 4,000 hours of effective game time and more than 5 million tasks.
CHALLENGES
Meaningful tasks without breaking the flow of the game
Real-time feedback – many simultaneous players doing the same task
Build a bridge to save the moles from falling down =>
– Correct typing gives you a block for the bridge
– Incorrect typing is punished by an explosion
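The Mole Bridge rule can be sketched as a mapping from one typed word to a game event. This is a minimal sketch, not Microtask's implementation: the function names, the event strings and the `known_answers` dictionary are hypothetical, and it assumes that only words whose answer is already known can be judged instantly, which is exactly the real-time feedback challenge listed above.

```python
def mole_bridge_feedback(task_id, typed, known_answers):
    """Map one typed word to a game event: "block", "explosion" or "pending"."""
    expected = known_answers.get(task_id)
    if expected is None:
        # No known answer yet: the word can only be judged after other players
        # have typed it, so instant feedback is impossible (assumption).
        return "pending"
    # Compare after trimming surrounding whitespace; case stays significant.
    return "block" if typed.strip() == expected else "explosion"


def bridge_length(events):
    """Number of bridge blocks earned from a sequence of feedback events."""
    return sum(1 for event in events if event == "block")


known = {1: "talo", 2: "kirja"}  # hypothetical known answers
events = [
    mole_bridge_feedback(1, "talo", known),
    mole_bridge_feedback(2, "kirj", known),
    mole_bridge_feedback(3, "meri", known),
]
print(events, bridge_length(events))  # ['block', 'explosion', 'pending'] 1
```

Keeping "pending" as a separate outcome makes the design tension visible: the more tasks without a known answer a player gets, the less immediate the game feels.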
DIGITALKOOT: Mole Hunt
Right or wrong?
DIGITALKOOT: Mole Bridge
A bridge has been built…
To the next level?
Changing sceneries
When a mole falls
Incorrect answer exploding
GAMIFICATION CHALLENGES
Balancing game play elements with task completion speed and accuracy
Keeping people motivated and enlarging the audience
Introduction of meaningful tasks into the game without breaking game play mechanisms
Instant feedback on players' actions (simultaneous players):
• pressure to adapt to varying feedback situations/latencies
POSITIVE EFFECT OF VERIFICATION
"The wisdom of the crowds"
• includes answers from possible spammers
Game start: verification tasks only
Accurate work shown => verification lowered in phases, never zero
Verification tasks are created automatically:
• A randomly selected task is sent to several players: all have to agree on the result => verification task
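The automatic creation of verification tasks can be sketched as follows: pick an ordinary task at random, collect several players' answers, and promote the task only if every answer is identical. This is a minimal sketch under stated assumptions: `min_players=3` stands in for the slide's "several players", and the class and function names are hypothetical.

```python
import random


def agreed_answer(answers, min_players=3):
    """Return the common answer if enough players answered and all agree, else None."""
    if len(answers) < min_players:
        return None
    # Trim whitespace but keep case: an OCR correction is case-sensitive.
    distinct = {answer.strip() for answer in answers}
    return distinct.pop() if len(distinct) == 1 else None


class VerificationPool:
    """Collects tasks whose answer the crowd has unanimously agreed on."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.known = {}  # task_id -> agreed answer

    def pick_candidate(self, task_ids):
        # A randomly selected ordinary task is sent to several players.
        return self.rng.choice(task_ids)

    def try_promote(self, task_id, answers, min_players=3):
        # Unanimous answers turn the task into a verification task.
        answer = agreed_answer(answers, min_players)
        if answer is not None:
            self.known[task_id] = answer
        return answer


pool = VerificationPool(seed=1)
candidate = pool.pick_candidate([101, 102, 103])
pool.try_promote(candidate, ["Helsinki", "Helsinki", "Helsinki"])
```

Requiring unanimity rather than a majority trades fewer verification tasks for a lower risk that a spammer's answer becomes a "known" answer.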
VERIFICATION OF THE OCR
Players and their pace cannot be synchronized.
Verification tasks in the task stream:
• The share fed to players varies according to the number of active players
• The system knows the answer: game play is improved by fast feedback
• Downside: no new information is produced
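The feeding rules on the last two slides (verification only at game start, lowered in phases but never to zero, and a share that depends on the number of active players) can be combined into one small scheduler. This is a hypothetical sketch: every constant is invented because the slides give no numbers, and the direction of the player-count adjustment (more verification tasks when few players are online) is an assumption.

```python
import random
from collections import deque

# Hypothetical tuning values: the slides give no numbers.
PHASE_SIZE = 10          # correct verification answers needed per phase
PHASE_STEP = 0.1         # drop in verification share per phase
MIN_SHARE = 0.1          # lowered in phases, but never zero
FEW_PLAYERS = 5          # below this many active players...
LOW_TRAFFIC_SHARE = 0.5  # ...feed at least this share of verification tasks


def verification_share(verified_correct, active_players):
    """Fraction of a player's task stream drawn from verification tasks."""
    # A new player starts with verification tasks only (share 1.0).
    share = 1.0 - PHASE_STEP * (verified_correct // PHASE_SIZE)
    if active_players < FEW_PLAYERS:
        # Assumption: with few simultaneous players, real tasks cannot be
        # cross-checked quickly, so known-answer tasks keep feedback fast.
        share = max(share, LOW_TRAFFIC_SHARE)
    return max(share, MIN_SHARE)


def next_task(real_queue, verification_pool, share, rng):
    """Draw a verification task with probability `share`, or if no real task is left."""
    if not real_queue or rng.random() < share:
        return "verify", rng.choice(verification_pool)
    return "real", real_queue.popleft()


rng = random.Random(42)
queue = deque(["word-1", "word-2"])
share = verification_share(verified_correct=35, active_players=20)
kind, task = next_task(queue, ["v-1", "v-2"], share, rng)
```

The `MIN_SHARE` floor keeps a trickle of known-answer tasks for every player, which is what lets the system keep detecting a player who stops working accurately.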
USERS: February 8th to March 31st, 2011
31,816 visitors, 4,768 players, 2,740 hours of game time, 2.5 million tasks.
1% via the Internet, 99% via Facebook
Half of the users were men.
Game time per player: from seconds to over 100 hours (in total).
Median game time: 9 minutes.
Women: over 13 minutes and 54% of the tasks
The four hardest-working players were all men
ACCURACY
OCR system: 0.8 confidence about accuracy => human correction in 30% of cases
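One possible reading of the 0.8 figure is a per-word OCR confidence threshold: words at or above it are kept as they are, words below it are sent to players. Under that assumption the 30% would be the share of words falling below the threshold in the real material. The sketch below is hypothetical; the function names and the sample confidences are made up for illustration.

```python
def route_by_confidence(words, threshold=0.8):
    """Split (text, confidence) pairs into accepted OCR words and words for human correction."""
    accepted, for_humans = [], []
    for text, confidence in words:
        if confidence >= threshold:
            accepted.append(text)
        else:
            for_humans.append(text)
    return accepted, for_humans


def human_share(words, threshold=0.8):
    """Fraction of words that would be sent to players."""
    if not words:
        return 0.0
    _, for_humans = route_by_confidence(words, threshold)
    return len(for_humans) / len(words)


# Made-up confidences, for illustration only.
sample = [("Helsinki", 0.95), ("kirja", 0.62), ("talo", 0.81), ("Topelius", 0.40)]
accepted, for_humans = route_by_confidence(sample)
```

Raising the threshold sends more words to the crowd (more work, higher accuracy); lowering it trusts the OCR engine more.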
Random selection of 2 articles:
• 1,467 words: only 14 mistakes in the Digitalkoot result vs. 228 in the OCR
• 516 words: 1 mistake in the Digitalkoot result vs. 118 in the OCR
• => well over 99% accuracy possible through gamification
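The "well over 99%" claim can be checked from the two samples. A minimal sketch, assuming accuracy is computed as 1 - mistakes/words (i.e. that each counted mistake affects one word; the slides do not define the metric):

```python
def word_accuracy(words, mistakes):
    """Accuracy as 1 - mistakes/words (assumes one counted mistake per word)."""
    return 1 - mistakes / words


samples = [
    # (words, Digitalkoot mistakes, OCR mistakes) from the two sampled articles
    (1467, 14, 228),
    (516, 1, 118),
]
for words, crowd, ocr in samples:
    print(f"{words} words: Digitalkoot {word_accuracy(words, crowd):.2%}, "
          f"OCR {word_accuracy(words, ocr):.2%}")
```

Under this assumption the Digitalkoot results come to about 99.05% and 99.81%, against 84.46% and 77.13% for the raw OCR.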
Spammer play:
• One player (1.5 hours, 5,692 tasks) was detected by the verification system, and only 4 of the tasks were accepted
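Spammer detection follows from the verification tasks: a player's answers to known-answer tasks reveal whether they are really typing the words. The sketch below is hypothetical: the thresholds, names and the `consensus` input are assumptions. It shows one possible reading of how a detected spammer could still end up with a handful of accepted tasks, namely when other players independently typed the same answer.

```python
def verification_score(answers, known):
    """Return (correct, checked) for one player's answers on verification tasks."""
    checked = [task_id for task_id in answers if task_id in known]
    correct = sum(answers[task_id].strip() == known[task_id] for task_id in checked)
    return correct, len(checked)


def is_spammer(answers, known, min_checked=3, min_accuracy=0.7):
    """Flag a player whose verification answers are mostly wrong."""
    correct, checked = verification_score(answers, known)
    if checked < min_checked:
        return False  # not enough evidence yet
    return correct / checked < min_accuracy


def accepted_answers(answers, known, consensus):
    """Keep a player's answers to real (unknown-answer) tasks.

    A flagged spammer keeps only answers matching what other players agreed
    on independently (`consensus`: task_id -> agreed text).
    """
    real = {tid: text.strip() for tid, text in answers.items() if tid not in known}
    if not is_spammer(answers, known):
        return real
    return {tid: text for tid, text in real.items() if consensus.get(tid) == text}


known = {1: "talo", 2: "kirja", 3: "meri"}  # hypothetical verification answers
spam = {1: "asdf", 2: "qwer", 3: "meri", 10: "zzz", 11: "sana"}
kept = accepted_answers(spam, known, {10: "koti", 11: "sana"})
```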
Enriching Digitisation Production Processes, METS Profiles: a new development platform
SOURCE MATERIAL: PHYSICAL COLLECTIONS
Structural metadata: METS, ALTO
METS EXPORT packages include:
• JPEG2000
• OCR text as ALTO XML
• JPEG (150)
• METS XML
• MARCXML
DIGITAL RESOURCE: COMPREHENSIVE DIGITAL COLLECTIONS
• Standards- and OAI-PMH-compliant METS SIP packages
• Two bibliographic records
Workflow: CATALOGUING, SCANNING, POST-PROCESSING
LEVEL OF MARK-UP: Articles, Illustrations, Poems
Descriptive metadata: MARC21/MODS
Administrative/technical metadata: MIX/PREMIS
Materials: Newspapers, Serials, Books, Parchments, Notes, Maps, Audio
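The METS export packages carry the OCR text as ALTO XML, where each word is a `<String>` element with its text in `CONTENT` and, optionally, a word confidence in `WC` (0 to 1). A minimal sketch of reading those pairs with Python's standard library, e.g. to select words for crowd correction. The function name is hypothetical, and the trimmed sample omits attributes such as `HPOS`/`VPOS` that real ALTO files carry; this is not the National Library's own tooling.

```python
import xml.etree.ElementTree as ET


def alto_words(alto_xml):
    """Return (text, confidence) for every ALTO <String> element, in document order.

    The local tag name is matched so the code works whichever ALTO namespace
    version the file declares. WC is optional in ALTO; missing values give None.
    """
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "String":
            wc = element.get("WC")
            words.append((element.get("CONTENT"), float(wc) if wc is not None else None))
    return words


SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Luonnon-kirja" WC="0.62"/><SP/><String CONTENT="1868"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""
words = alto_words(SAMPLE_ALTO)
```

Matching on the local tag name avoids hard-coding one ALTO schema version, at the cost of also matching any foreign element that happens to be called `String`.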
IN THE MEDIA
- By March 31st, over 30 articles all around the world, e.g. in the New York Times…
- Television appearances ongoing
- Helsingin Sanomat: HS talkoot, using the National Library's digitised newspaper material (Historical Newspaper Library), advertised Digitalkoot, e.g. on September 15th
- Influenced user interest => stabilised at 300 individual users per week
NEXT
1) Marking of articles and/or images
2) Indexing of articles and/or images
KUVATALKOOT (KUVA = IMAGE)
Goal: sophisticated user experience
Collections discovery and reuse of digital content by researchers and people at large:
Researchers will get better systematic coverage of images and articles in published printed material.
Image: Luonnon-kirja ala-alkeiskouluin tarpeeksi / Z. Topelius, 1868 (Book of Nature, for the needs of lower elementary schools)