The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

P. Bryan Heidorn [email protected]
Steven Chong [email protected]
University of Arizona, School of Information Resources and Library Science

Semantics for Biodiversity Symposium – TDWG 2013 Annual Conference
Florence, Italy, October 30, 2013

Page 2: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Heidorn and Chong - TDWG 2013

Anyone can say anything about anything

Semantic Information Store

Museum Labels

http://vector-magz.com/fashion/glass-slipper-clip-art-item-3/

http://www.maskworld.com/english/products/make-up/--/monster-hands--640/ogre-feet--SP-3310-black

Page 3: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

iDigBio Hackathon

• Coincided with the 2013 iConference meeting in Dallas/Fort Worth, Texas
• 28 participants from a variety of backgrounds and institutions
• Goal: develop new tools to parse OCR output from specimen labels into Darwin Core. Results were compared against human-parsed gold and silver files
• Three datasets
  1. Easy – 10,000 images from the Lichens, Bryophytes and Climate Change TCN: lichen and bryophyte packet labels. Little or no handwriting present.
  2. Medium – 5,000 BRIT Herbarium and NYBG Herbarium specimen sheets. Some handwriting present.
  3. Hard – Several thousand images of entomology specimens from the Essig Museum and CalBug.
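The slides do not say how parser output was scored against the gold and silver files. As a minimal sketch of one plausible approach, the function below computes field-level precision, recall and F1 for a single parsed label. The function name `score_fields`, the normalization rule (collapse whitespace, ignore case) and the example values are assumptions for illustration, not the hackathon's actual metric.

```python
from typing import Dict


def score_fields(predicted: Dict[str, str], gold: Dict[str, str]) -> Dict[str, float]:
    """Field-level precision, recall and F1 for one parsed label (illustrative only)."""

    def norm(value: str) -> str:
        # Assumption: whitespace and letter case are not significant when comparing.
        return " ".join(value.split()).casefold()

    # Ignore empty fields on both sides.
    pred = {k: norm(v) for k, v in predicted.items() if v.strip()}
    ref = {k: norm(v) for k, v in gold.items() if v.strip()}

    # A field counts as correct only if the same term carries the same value.
    correct = sum(1 for k, v in pred.items() if ref.get(k) == v)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented example values, keyed by Darwin Core term names.
    parsed = {"scientificName": "Quercus alba L.", "recordedBy": "J. Smith", "country": "USA"}
    reference = {"scientificName": "quercus  alba L.", "recordedBy": "J. Smyth",
                 "country": "USA", "eventDate": "1923-05-04"}
    print(score_fields(parsed, reference))
```

Exact matching after normalization is deliberately strict; a real evaluation might instead use edit distance for OCR-damaged strings.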

Page 4: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

iDigBio Hackathon

Page 5: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Mapping to Darwin Core
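To illustrate what mapping OCR text to Darwin Core can look like, here is a minimal rule-based sketch. It is not one of the hackathon tools: the regular expressions, the `parse_label` function and the sample label are all invented for this example, and real labels are far less regular. Only the Darwin Core term names (`scientificName`, `recordedBy`, `recordNumber`, `verbatimEventDate`) come from the standard.

```python
import re
from typing import Dict

# Hypothetical line-oriented patterns, one per Darwin Core term.
PATTERNS = {
    # Genus + epithet, optionally followed by a short author string such as "L."
    "scientificName": re.compile(
        r"^(?P<v>[A-Z][a-z]+ [a-z]+(?: [A-Z][A-Za-z.]*)?)[ \t]*$", re.MULTILINE),
    # A line starting with "Coll." or "Leg." names the collector.
    "recordedBy": re.compile(
        r"^(?:Coll|Leg)\.?:?[ \t]+(?P<v>.+?)[ \t]*$", re.MULTILINE),
    # A line such as "No. 1234" is treated as the collector's number.
    "recordNumber": re.compile(
        r"^No\.?:?[ \t]*(?P<v>\d+)[ \t]*$", re.MULTILINE),
    # Dates written like "12 May 1923" are kept verbatim.
    "verbatimEventDate": re.compile(
        r"^(?P<v>\d{1,2} [A-Z][a-z]+\.? \d{4})[ \t]*$", re.MULTILINE),
}

# Invented OCR output for illustration.
SAMPLE_LABEL = """\
HERBARIUM OF EXAMPLE UNIVERSITY
Quercus alba L.
Pima County, Arizona
Coll. J. Smith
No. 1234
12 May 1923
"""


def parse_label(ocr_text: str) -> Dict[str, str]:
    """Return the first match of each pattern, keyed by Darwin Core term."""
    record = {}
    for term, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            record[term] = match.group("v").strip()
    return record


if __name__ == "__main__":
    print(parse_label(SAMPLE_LABEL))
```

Fields with no matching line are simply left out, so a downstream step can tell "not found" apart from an empty value.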

Page 6: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Namespaces and Modeling

• CIDOC CRM
  • International Council of Museums – International Committee for Documentation’s Conceptual Reference Model
  • http://www.cidoc-crm.org/index.html
  • Semantic framework for cultural heritage information
  • Being harmonized with Functional Requirements for Bibliographic Records (FRBR) into the FRBRoo ontology – goal: “to facilitate the integration, mediation, and interchange of bibliographic and museum information”
• Relation Ontology
  • Biology-specific relations
  • http://obofoundry.org/ro/
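When terms from several vocabularies are mixed in one store, it helps to write them as prefixed names and expand them to full URIs in one place. The sketch below shows that idea in plain Python. The namespace URIs are assumptions rather than values given on the slide: the Darwin Core terms namespace is the standard one, while the CIDOC CRM and Relation Ontology base URIs and the local names used in the example (`P1_is_identified_by`, `RO_0001000`) should be checked against the version of each ontology actually in use.

```python
# Assumed prefix-to-namespace table; verify each URI before relying on it.
NAMESPACES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "ro": "http://purl.obolibrary.org/obo/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dwc:scientificName' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local or prefix not in NAMESPACES:
        raise ValueError(f"cannot expand {curie!r}")
    return NAMESPACES[prefix] + local


if __name__ == "__main__":
    for name in ("dwc:scientificName", "crm:P1_is_identified_by", "ro:RO_0001000"):
        print(name, "->", expand(name))
```

Keeping the table in one place means a namespace change touches a single line rather than every stored triple.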

Page 7: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Some digitization support from Semantics

• Copy (some) metadata from duplicates
• Update taxonomy
• Link to literature for types
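The first bullet, copying some metadata from duplicates, can be sketched as follows. The slide does not say which fields are safe to copy, so the list `EVENT_LEVEL_TERMS` is an assumption: it holds Darwin Core terms that describe the shared collecting event, while specimen-specific terms such as `catalogNumber` are left alone. The function name and example records are hypothetical.

```python
from typing import Dict

# Assumption: these terms describe the collecting event that duplicates share.
EVENT_LEVEL_TERMS = ("recordedBy", "eventDate", "locality", "country", "stateProvince")


def fill_from_duplicate(target: Dict[str, str], source: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of target with blank event-level terms filled from source.

    Values already present in target are never overwritten.
    """
    merged = dict(target)
    for term in EVENT_LEVEL_TERMS:
        if not merged.get(term, "").strip() and source.get(term, "").strip():
            merged[term] = source[term]
    return merged


if __name__ == "__main__":
    # Invented records for two sheets of the same gathering.
    undigitized = {"catalogNumber": "A-1", "recordedBy": "J. Smith", "locality": ""}
    digitized = {"catalogNumber": "B-9", "recordedBy": "John Smith",
                 "locality": "Pima County", "eventDate": "1923-05-12"}
    print(fill_from_duplicate(undigitized, digitized))
```

Filling only blank fields keeps the operation conservative: conflicting values are left for a person to resolve.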

Page 8: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Semantics of Duplicates and Citation

• Multiple types of Specimen Duplicates
  • Different inferences licensed by type
• Multiple types of Citation
  • Different inferences licensed by type

Page 9: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Duplicate Specimens

NYBG = Fairchild Tropical Garden

Page 10: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

CIDOC CRM overlaying DwC

Diagram labels:

• Darwin Core concepts: Museum 1, Museum 2, Specimen 2, Locality, Date, Scientific Name, Collection Event, Collector, Catalog Number
• CIDOC CRM classes: E8: Acquisition, E19: Biological Object (two nodes), E39: Actor, E41: Appellation (instance), E42: Identifier (class), E50: Date, E53: Place, E78: Collection (two nodes)
• CIDOC CRM properties: P1: is identified by, P7: took place at, P12’: was present at (two edges), P14: carried out by, P52: has current owner (two edges), P78’: is identified by

Page 11: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Population Duplicate

Diagram labels:

• Nodes: E19: Biological Object (two nodes), Dwc:Collection Event / E8: Acquisition, Dwc:Identification (two nodes) / E17: Type Assignment, TaxonomicName, TaxonID
• Properties: P12: was present at (two edges), P41: Classified (two edges)
• Other annotations: dwc:basisOfRecord, http://www.ncbi.nlm.nih.gov/genbank/Sequence?

Page 12: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Valid Inferences

• The specimens/individuals/samples? come from the same population
• The DNA will be the same at the species level but not at the individual level
• If you sequence the DNA of one, the DNA of the other will be similar

Page 13: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Individual Duplicate (Material Sample)

Diagram labels:

• Nodes: E19: Biological Object (three nodes), Dwc:Collection Event / E8: Acquisition, Dwc:Identification (two nodes) / E17: Type Assignment, TaxonomicName, TaxonID
• Properties: P12: was present at (three edges), ro: derives_from (two edges), P24: Transfer?, P41: Classified (two edges)
• Other annotations: dwc:basisOfRecord, http://www.ncbi.nlm.nih.gov/genbank/Sequence?
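One reading of the two duplicate diagrams is that specimens sharing a collection event are population duplicates, while specimens that also derive from the same biological object are individual duplicates. The sketch below encodes that reading over plain (subject, predicate, object) tuples. The predicate strings are stand-ins modeled on the slide labels, and the identifiers such as `spec:1` and `event:1` are invented; this is an interpretation of the diagrams, not the authors' implementation.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Stand-in predicate labels following the slide text (not verified ontology URIs).
DERIVES_FROM = "ro:derives_from"
WAS_PRESENT_AT = "crm:P12_was_present_at"


def objects(triples: Iterable[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in triples."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def duplicate_type(triples: Iterable[Triple], a: str, b: str) -> Optional[str]:
    """Classify two specimens as 'individual' or 'population' duplicates, or None."""
    facts = list(triples)  # allow any iterable, including generators
    # Shared source organism: both specimens derive from the same object.
    if objects(facts, a, DERIVES_FROM) & objects(facts, b, DERIVES_FROM):
        return "individual"
    # Shared collecting event only: same population, different individuals.
    if objects(facts, a, WAS_PRESENT_AT) & objects(facts, b, WAS_PRESENT_AT):
        return "population"
    return None


if __name__ == "__main__":
    store = [
        ("spec:1", DERIVES_FROM, "org:1"),
        ("spec:2", DERIVES_FROM, "org:1"),
        ("spec:1", WAS_PRESENT_AT, "event:1"),
        ("spec:2", WAS_PRESENT_AT, "event:1"),
        ("spec:3", WAS_PRESENT_AT, "event:1"),
        ("spec:4", WAS_PRESENT_AT, "event:2"),
    ]
    for pair in (("spec:1", "spec:2"), ("spec:1", "spec:3"), ("spec:1", "spec:4")):
        print(pair, duplicate_type(store, *pair))
```

The stronger relation is tested first, so a pair that shares both an organism and an event is reported as an individual duplicate.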

Page 14: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

DOIs and Publications

http://biodiversitylibrary.org/page/2381678#page/363/mode/1up

Page 15: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Specimen Literature Relationships

Diagram labels:

• Nodes: E19: Biological Object (three nodes), E83: Type Creation, E17: Type Assignment, Dwc:Identification (two nodes), E55: Type (Quercus alba L), E75: Conceptual Object Appellation (DOI, ISBN needed), E75: Conceptual Object Appellation, E41: Appellation / Scientific Name, TaxonDescription, Checklist of Plants of Algonquin Park
• Properties: P41: Classified (two edges), P135: Created Type, P136: was based on, P137: Exemplifies (lectotype), P149: is identified by
• Other annotations: http://www.ncbi.nlm.nih.gov/genbank/Sequence?, See: DwC:associatedReferences

Page 16: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Anyone can say anything about anything

Semantic Information Store

Museum Labels

http://vector-magz.com/fashion/glass-slipper-clip-art-item-3/

http://www.maskworld.com/english/products/make-up/--/monster-hands--640/ogre-feet--SP-3310-black

http://www.clipartguide.com/_pages/0511-0810-0502-2909.html

Page 17: The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores

Acknowledgments

• National Science Foundation
• BiSciCol Collaborators: http://biscicol.blogspot.com/
• iDigBio Hackathon Participants
