The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores
P. Bryan Heidorn [email protected]
Steven Chong [email protected]
University of Arizona, School of Information Resources and Library Science
Semantics for Biodiversity Symposium – TDWG 2013 Annual Conference
Florence, Italy, October 30, 2013
Heidorn and Chong - TDWG 2013
Anyone can say anything about anything
Semantic Information Store
Museum Labels
http://vector-magz.com/fashion/glass-slipper-clip-art-item-3/
http://www.maskworld.com/english/products/make-up/--/monster-hands--640/ogre-feet--SP-3310-black
iDigBio Hackathon
• Coincided with the 2013 iConference meeting in Dallas/Fort Worth, Texas
• 28 participants from a variety of backgrounds and institutions
• Goal: develop new tools to parse OCR output from specimen labels into Darwin Core. Results were compared against human-parsed gold and silver files
• Three datasets
1. Easy – 10,000 images from the Lichens, Bryophytes and Climate Change TCN: lichen and bryophyte packet labels. Little or no handwriting present.
2. Medium – 5,000 BRIT Herbarium and NYBG Herbarium specimen sheets. Some handwriting present.
3. Hard – Several thousand images of entomology specimens from the Essig Museum and CalBug.
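The hackathon task can be illustrated with a minimal Python sketch: map OCR text to Darwin Core terms, then score the result against a human-parsed gold record. This is not one of the hackathon tools. The label text, the regular expressions, and the `parse_label` and `score` helpers are all assumptions made up for illustration; only the Darwin Core term names (`scientificName`, `recordedBy`, `recordNumber`, `eventDate`) are real.

```python
import re
from datetime import datetime

# Hypothetical OCR output from a clean, typed label (invented for illustration).
OCR_TEXT = """HERBARIUM OF EXAMPLE UNIVERSITY
Quercus alba L.
Coll. J. Smith
No. 1234
12 May 1987
"""

# Hypothetical human-parsed gold record for the same label.
GOLD = {
    "scientificName": "Quercus alba L.",
    "recordedBy": "J. Smith",
    "recordNumber": "1234",
    "eventDate": "1987-05-12",
}

# One deliberately naive pattern per Darwin Core term.
PATTERNS = {
    "scientificName": r"^([A-Z][a-z]+ [a-z]+)\b",   # genus + epithet only
    "recordedBy": r"^Coll\.\s*(.+)$",
    "recordNumber": r"^No\.\s*(\d+)$",
    "eventDate": r"^(\d{1,2} [A-Z][a-z]+ \d{4})$",
}

def parse_label(text):
    """Return a dict of Darwin Core terms extracted from OCR text."""
    record = {}
    for term, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            record[term] = match.group(1).strip()
    # Normalize the date to ISO 8601; drop it if it cannot be parsed.
    if "eventDate" in record:
        try:
            parsed = datetime.strptime(record["eventDate"], "%d %B %Y")
            record["eventDate"] = parsed.date().isoformat()
        except ValueError:
            del record["eventDate"]
    return record

def score(parsed, gold):
    """Field-level precision and recall of a parsed record against gold."""
    correct = sum(1 for term, value in parsed.items() if gold.get(term) == value)
    precision = correct / len(parsed) if parsed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

record = parse_label(OCR_TEXT)
print(score(record, GOLD))  # (0.75, 0.75)
```

The naive name pattern drops the authorship ("L."), so one of the four fields misses the gold value. Exact-match scoring of this kind is strict; a real evaluation might prefer a more tolerant comparison.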
Mapping to Darwin Core
Namespaces and Modeling
• CIDOC CRM
  • International Council of Museums – International Committee for Documentation's Conceptual Reference Model
  • http://www.cidoc-crm.org/index.html
  • Semantic framework for cultural heritage information
  • Being harmonized with Functional Requirements for Bibliographic Records (FRBR) into the FRBRoo ontology – goal: "to facilitate the integration, mediation, and interchange of bibliographic and museum information"
• Relation Ontology
  • Biology-specific relations
  • http://obofoundry.org/ro/
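Mixing vocabularies in one store depends on keeping their namespaces apart. The Python sketch below shows one way to expand prefixed names into full URIs. The prefix table is an assumption, not taken from the slides: it uses the commonly published CIDOC CRM and Darwin Core namespaces and the OBO PURL base for the Relation Ontology, and `expand` is a hypothetical helper.

```python
# Prefix table (assumed URIs; verify against each vocabulary's documentation).
NAMESPACES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "obo": "http://purl.obolibrary.org/obo/",
}

def expand(curie):
    """Expand a prefixed name such as 'dwc:basisOfRecord' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in NAMESPACES or not local:
        raise ValueError(f"unknown prefix or empty local name: {curie!r}")
    return NAMESPACES[prefix] + local

print(expand("dwc:basisOfRecord"))
print(expand("crm:E8_Acquisition"))
```

Keeping every term fully qualified lets statements from museum, biodiversity, and biological-relation vocabularies sit side by side without name collisions.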
Some digitization support from Semantics
• Copy (some) metadata from duplicates
• Update taxonomy
• Link to literature for types
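The first point, copying some metadata from duplicates, might look like the Python sketch below. The choice of shared fields, the sample records, and the `fill_from_duplicate` helper are assumptions for illustration. Institution-specific fields such as `catalogNumber` are deliberately left out of the shared set.

```python
# Darwin Core terms assumed to be common to duplicates of one gathering.
SHARED_TERMS = ("recordedBy", "recordNumber", "eventDate", "locality", "scientificName")

def fill_from_duplicate(target, source, shared_terms=SHARED_TERMS):
    """Return (filled copy of target, list of terms copied from source).

    Only empty or missing shared fields are filled; existing values in
    target are never overwritten, and target itself is not modified.
    """
    filled = dict(target)
    copied = []
    for term in shared_terms:
        if not filled.get(term) and source.get(term):
            filled[term] = source[term]
            copied.append(term)
    return filled, copied

# Hypothetical records for two duplicate sheets held by different herbaria.
NYBG_RECORD = {
    "catalogNumber": "A-001",
    "recordedBy": "J. Smith",
    "recordNumber": "1234",
    "eventDate": "1987-05-12",
    "locality": "Example County",
}
FTG_RECORD = {
    "catalogNumber": "B-042",
    "recordedBy": "J. Smith",
    "recordNumber": "1234",
}

filled, copied = fill_from_duplicate(FTG_RECORD, NYBG_RECORD)
print(copied)  # ['eventDate', 'locality']
```

Returning the list of copied terms keeps a trace of which values were inferred from a duplicate and which were transcribed from the sheet itself.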
Semantics of Duplicates and Citation
• Multiple types of Specimen Duplicates
  • Different inferences licensed by type
• Multiple types of Citation
  • Different inferences licensed by type
Duplicate Specimens
NYBG = Fairchild Tropical Garden
CIDOC CRM overlaying DwC
[Diagram. Node labels: Museum 1 and Museum 2 (E78: Collection); two E19: Biological Object nodes, one labeled Specimen 2; Collection Event (E8: Acquisition); Collector (E39: Actor); Date (E50: Date); Locality (E53: Place); Scientific Name (E41: Appellation); E41: Appellation (instance) / E42: Identifier (class); Catalog Number. Edge labels: P1: is identified by; P7: took place at; P12': was present at (twice); P14: carried out by; P52: has current owner (twice); P78': is identified by.]
Population Duplicate
[Diagram. Node labels: two E19: Biological Object nodes; Dwc:Collection Event (E8: Acquisition); E17: Type Assignment; Dwc:Identification (twice); TaxonomicName; TaxonID; dwc:basisOfRecord; http://www.ncbi.nlm.nih.gov/genbank/Sequence? Edge labels: P12: was present at (twice); P41: Classified (twice).]
Valid Inferences
• The specimens (individuals? samples?) come from the same population
• The DNA will be the same at the species level but not at the individual level
• If you sequence the DNA of one, the DNA of the other will be similar
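The population-duplicate inference can be sketched as a rule over triples: specimens present at the same collection event are treated as coming from the same population. The Python sketch below is a minimal illustration of that rule as read from the diagram, not the authors' implementation. The identifiers, the predicate string, and `population_duplicates` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Predicate label taken from the slide; the prefixed spelling is an assumption.
PRESENT_AT = "crm:P12_was_present_at"

# Hypothetical (subject, predicate, object) statements.
TRIPLES = [
    ("specimen:A", PRESENT_AT, "event:1"),
    ("specimen:B", PRESENT_AT, "event:1"),
    ("specimen:C", PRESENT_AT, "event:2"),
]

def population_duplicates(triples):
    """Return pairs of specimens that share a collection event."""
    by_event = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == PRESENT_AT:
            by_event[obj].add(subject)
    pairs = set()
    for specimens in by_event.values():
        # Sort so each pair has one canonical order.
        for first, second in combinations(sorted(specimens), 2):
            pairs.add((first, second))
    return pairs

print(population_duplicates(TRIPLES))  # {('specimen:A', 'specimen:B')}
```

The rule licenses only the weaker conclusion listed above: a sequence from one specimen suggests similar DNA in the other at the species level, not identity at the individual level.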
Individual Duplicate (Material Sample)
[Diagram. Node labels: three E19: Biological Object nodes; Dwc:Collection Event (E8: Acquisition); E17: Type Assignment; Dwc:Identification (twice); TaxonomicName; TaxonID; dwc:basisOfRecord; http://www.ncbi.nlm.nih.gov/genbank/Sequence? Edge labels: P12: was present at (three times); P41: Classified (twice); ro: derives_from (twice); P24: Transfer?]
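For individual duplicates the link is stronger: the samples derive from one organism. The Python sketch below follows derives_from statements back to their origin and treats two items as individual duplicates when they share that origin. It assumes each item derives from at most one parent. The identifiers and the `root` and `same_individual` helpers are hypothetical.

```python
# Predicate label taken from the slide (Relation Ontology "derives_from").
DERIVES_FROM = "ro:derives_from"

# Hypothetical statements: two sheets cut from one tree, a sequence taken
# from one of them, and an unrelated sample from another tree.
TRIPLES = [
    ("sample:sheet1", DERIVES_FROM, "organism:tree1"),
    ("sample:sheet2", DERIVES_FROM, "organism:tree1"),
    ("sequence:s1", DERIVES_FROM, "sample:sheet1"),
    ("sample:other", DERIVES_FROM, "organism:tree2"),
]

def root(entity, triples):
    """Follow derives_from links from entity to its ultimate source."""
    parents = {s: o for s, p, o in triples if p == DERIVES_FROM}
    seen = set()
    # The seen set guards against accidental cycles in the data.
    while entity in parents and entity not in seen:
        seen.add(entity)
        entity = parents[entity]
    return entity

def same_individual(first, second, triples):
    """True when both items trace back to the same source organism."""
    return root(first, triples) == root(second, triples)

print(same_individual("sample:sheet2", "sequence:s1", TRIPLES))   # True
print(same_individual("sample:sheet2", "sample:other", TRIPLES))  # False
```

Under this reading, a sequence obtained from one sheet can be attached to the other sheet as well, because both trace back to the same individual.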
DOIs and Publications
http://biodiversitylibrary.org/page/2381678#page/363/mode/1up
Specimen Literature Relationships
[Diagram. Node labels: three E19: Biological Object nodes; http://www.ncbi.nlm.nih.gov/genbank/Sequence?; E83: Type Creation; E17: Type Assignment; Dwc:Identification (twice); E55: Type (Quercus alba L.); E75: Conceptual Object Appellation (DOI, ISBN needed); E75: Conceptual Object Appellation; E41: Appellation; Scientific Name; lectotype; TaxonDescription; Checklist of Plants of Algonquin Park. Edge labels: P41: Classified (twice); P135: Created Type; P136: was based on; P137: Exemplifies; P149: is identified by. Note: See DwC:associatedReferences.]
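Linking a type specimen to its literature amounts to walking these relationships from specimen to identifier. The Python sketch below is one possible traversal, under stated assumptions: the step from a publication to the type-creation event uses CIDOC CRM's P70 documents, which does not appear on the slide; the DOI is a placeholder, not a real identifier; and all node names and helpers are hypothetical.

```python
BASED_ON = "crm:P136_was_based_on"
CREATED_TYPE = "crm:P135_created_type"
IDENTIFIED_BY = "crm:P149_is_identified_by"
DOCUMENTS = "crm:P70_documents"  # assumption: this link is not shown on the slide

# Hypothetical statements; the DOI is a placeholder.
TRIPLES = [
    ("event:typification", BASED_ON, "specimen:lectotype"),
    ("event:typification", CREATED_TYPE, "taxon:Quercus_alba"),
    ("publication:1", DOCUMENTS, "event:typification"),
    ("publication:1", IDENTIFIED_BY, "doi:10.0000/hypothetical"),
]

def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is stated."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is stated."""
    return [s for s, p, o in triples if p == predicate and o == obj]

def type_literature(specimen, triples):
    """Identifiers of publications documenting type creation from specimen."""
    identifiers = []
    for event in subjects(triples, BASED_ON, specimen):
        for publication in subjects(triples, DOCUMENTS, event):
            identifiers.extend(objects(triples, publication, IDENTIFIED_BY))
    return identifiers

print(type_literature("specimen:lectotype", TRIPLES))
```

A specimen with no type-creation event simply yields an empty list, which is one way to flag records where a DOI or ISBN is still needed.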
Anyone can say anything about anything
Semantic Information Store
Museum Labels
http://vector-magz.com/fashion/glass-slipper-clip-art-item-3/
http://www.maskworld.com/english/products/make-up/--/monster-hands--640/ogre-feet--SP-3310-black
http://www.clipartguide.com/_pages/0511-0810-0502-2909.html
Acknowledgments
• National Science Foundation
• BiSciCol Collaborators http://biscicol.blogspot.com/
• iDigBio Hackathon Participants
Thank you
[email protected]