Mass digitization & crowdsourcing

Maarten Heerlien

NBN Crowdsourcing Summit, Manchester 09-25-2015

Naturalis Biodiversity Center

• Merger of Naturalis, National Herbarium, Zoological Museum Amsterdam & ETI BioInformatics (2010)

• Staff: 300 (100 scientists)

• 200 peer-reviewed publications per year

• Collection: 37 million (5th in the world)

• 9 exhibition spaces

• 300.000 visitors per year (50.000 school children)

Collection digitization

• 2010-2015: FCD program (FES Collection Digitization)

• Goals:

  • 7 million objects digitized in detail

  • 30 million objects digitized on a meta-level

  • Permanent collection digitization infrastructure

• Budget: € 13 million (€ 1,87 per object)

• People: 80 temporary employees

• Funding: Economic Structure Enhancement Fund (FES)
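
The per-object figure on this slide can be checked with a line of arithmetic. The sketch below assumes that the € 1,87 is the budget divided by the 7 million objects digitized in detail; the slide does not say which denominator was used, and the variable names are illustrative.

```python
# Figures from the slide. Which object count the per-object cost
# refers to is an assumption, so both readings are computed.
budget_eur = 13_000_000
detailed_objects = 7_000_000
meta_objects = 30_000_000

cost_per_detailed_object = budget_eur / detailed_objects
cost_per_object_overall = budget_eur / (detailed_objects + meta_objects)

print(f"per detailed object: EUR {cost_per_detailed_object:.2f}")
print(f"per object overall:  EUR {cost_per_object_overall:.2f}")
```

With the rounded figures from the slide, the first reading gives about € 1,86, so the quoted € 1,87 presumably rests on a slightly higher unrounded budget.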

From tailor made to production lines

• Priority driven (policy, research, preservation)

• Digitization processes based on collection types

• Divide complicated, labor-intensive processes into a series of shorter tasks

• Standardize the processes for data entry and photography

Digitization process

Digistreets

• Herbarium sheets

• Molluscs

• Wet collections

• Entomology

• Wood

• Dry (in)vertebrates

• Library

• Geology

• Microscopic slides

Digitization and public engagement

Crowdsourcing: Glashelder!

• Transcription by online volunteers

• 100.000 microscopic glass slides

• Mites, Springtails and Aphids

• Goals:

  • Full transcription and validation by the crowd

  • 6 months

  • Costs comparable to in-house digitization

• Existing platform: VeleHanden.nl (Many Hands)

  • Dedicated platform for transcription of handwritten heritage

  • Benefit of pre-existing crowd
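
One common way to organize transcription and validation by a crowd is double entry: two volunteers transcribe the same label independently, and a validator only decides the fields on which they disagree. The Python sketch below illustrates that idea. It is an assumption about the workflow, not a description of how VeleHanden.nl works internally; the field names, the sample values, and the `compare_transcriptions` helper are hypothetical.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and case so trivial differences do not count as conflicts."""
    return " ".join(value.split()).casefold()


def compare_transcriptions(first: dict[str, str], second: dict[str, str]) -> dict[str, list[str]]:
    """Return the fields on which two independent transcriptions disagree.

    The result maps each conflicting field name to the two raw values,
    which is what a validator would need to see to make a decision.
    """
    conflicts = {}
    for field in sorted(first.keys() | second.keys()):
        a = first.get(field, "")
        b = second.get(field, "")
        if normalize(a) != normalize(b):
            conflicts[field] = [a, b]
    return conflicts


# Hypothetical label data for one microscopic slide.
volunteer_a = {"genus": "Aphis", "species": "fabae", "locality": "Leiden"}
volunteer_b = {"genus": "aphis", "species": "fabea", "locality": "Leiden "}

conflicts = compare_transcriptions(volunteer_a, volunteer_b)
print(conflicts)  # only the species field is left for a validator
```

Normalizing before comparing keeps validators from spending time on differences in capitalization or stray spaces.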

Transcription

Validation

Results Glashelder!

• 9 months

• 200.000 transcriptions

• Record: 1913 transcriptions in one day

• 100.000 validations

• 497 participants

  • More than 100 transcriptions: 73

  • More than 1000 transcriptions: 26

  • More than 10.000 transcriptions: 5

• 18 validators

• Costs comparable to in-house digitization

• Lots of media attention
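
A few ratios follow directly from these numbers and help put them in perspective. The Python sketch below derives them. It assumes that the 100.000 slides named in the project goals were all processed, and that the three "more than" counts are cumulative (so the 73 participants with more than 100 transcriptions include the 26 and the 5). Both are readings of the slides, not stated facts.

```python
# Figures from the results slide and the project goals.
slides = 100_000
transcriptions = 200_000
validations = 100_000
participants = 497
over_100, over_1000, over_10000 = 73, 26, 5

transcriptions_per_slide = transcriptions / slides
validations_per_slide = validations / slides
mean_per_participant = transcriptions / participants

# Lower bound on the work done by the five most active volunteers:
# each of them made more than 10,000 transcriptions.
top5_minimum_share = over_10000 * 10_000 / transcriptions

# Participants who stayed at or below 100 transcriptions (assuming cumulative counts).
casual_participants = participants - over_100

print(f"{transcriptions_per_slide:.0f} transcriptions and {validations_per_slide:.0f} validation per slide")
print(f"mean of {mean_per_participant:.0f} transcriptions per participant")
print(f"top 5 volunteers: at least {top5_minimum_share:.0%} of all transcriptions")
print(f"{casual_participants} participants made 100 transcriptions or fewer")
```

The two-to-one ratio of transcriptions to slides is consistent with each slide being transcribed twice before validation, and the top-five figure shows how strongly a project like this leans on a small core of volunteers.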

Results FCD

Digistreet          | Start     | End       | Objects digitized | 1 object =
Molluscs            | 2011 – Q1 | 2013 – Q1 |           650.000 | 1 sample
Entomology          | 2011 – Q2 | 2015 – Q2 |           850.000 | 1 insect
Wood                | 2011 – Q3 | 2013 – Q2 |           125.000 | 1 wood sample
Library             | 2011 – Q3 | 2015 – Q2 |           820.000 | 1 page
Alcohol specimens   | 2012 – Q1 | 2015 – Q2 |           100.000 | 1 sample
Herbarium           | 2012 – Q2 | 2015 – Q2 |         4.400.000 | 1 herbarium sheet
Dry (in)vertebrates | 2012 – Q2 | 2015 – Q2 |           275.000 | 1 specimen (part)
Glass slides        | 2012 – Q3 | 2015 – Q2 |           800.000 | 1 microscopic slide
Geology             | 2013 – Q2 | 2015 – Q2 |           200.000 | 1 sample
Total               |           |           |         8.220.000 |
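
The total can be reproduced from the individual rows, which is a quick check that the table was read correctly. The Python sketch below sums the rows and prints each digistreet's share; the dictionary is simply a transcription of the table, with the dry-specimen row written as "(in)vertebrates".

```python
# Objects digitized per digistreet, as listed in the results table.
objects_digitized = {
    "Molluscs": 650_000,
    "Entomology": 850_000,
    "Wood": 125_000,
    "Library": 820_000,
    "Alcohol specimens": 100_000,
    "Herbarium": 4_400_000,
    "Dry (in)vertebrates": 275_000,
    "Glass slides": 800_000,
    "Geology": 200_000,
}

total = sum(objects_digitized.values())
print(f"total: {total:,}")

# Share of each digistreet, largest first.
for name, count in sorted(objects_digitized.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {count:>9,} {count / total:6.1%}")
```

The rows add up to the stated 8.220.000, and herbarium sheets account for a little over half of that.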

Access to digitized collections

• 8.220.000 specimens digitized in detail

• Published as open content

  • Digitized data and scans of objects: CC0

  • Other content: CC-BY

• Bioportal.naturalis.nl

• Netherlands Biodiversity API

• Data and content aggregators

Thank you

[email protected] | +31-71-751-9387

• https://science.naturalis.nl/en/collection/digitization

• http://bioportal.naturalis.nl (digital collections portal)

• http://docs.biodiversitydata.nl (Github API documentation)

• https://youtu.be/ODtuWKoujFw (introduction to Naturalis digitization program)

• https://youtu.be/TywNYCigY0k (digitizing entomology collections)

• https://youtu.be/hmG4twyHXkE (digitizing herbarium sheets)

• https://en.wikipedia.org/wiki/Wikipedia:GLAM/Naturalis (Naturalis content donation page)

• Heerlien et al., 2015. The natural history production line: An industrial approach to the digitization of scientific collections. ACM Journal on Computing and Cultural Heritage 8, 1, Article 3 (February 2015). http://dx.doi.org/10.1145/2644822
