Biological Science Collections Tagging and Tracking presented at SPNHC

Page 1: Biological Science Collections Tagging and Tracking presented at SPNHC

Brian Stucky, University of Colorado, Boulder
John Deck, University of California, Berkeley
Lukasz Ziemba, University of Florida, Gainesville
Nico Cellinese, University of Florida, Gainesville
Rob Guralnick, University of Colorado, Boulder

BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba

BiSciCol: Biological Science Collections Tracker

Tracking Biodiversity Objects to Brokering Standards

Univ. Hawai’i, Univ. Arizona, Smithsonian

Page 2

• National Science Foundation funded, 2010 – 2014
• Infrastructure to tag & track specimens & derivatives in cyberspace
• Relies on globally unique identifiers (GUIDs) to track objects
• Implements a Linked Data approach

Page 3
Page 4

QUANTITY OF DATA IS FIRST LINK IN A LARGER CHAIN OF ISSUES

Page 5

Here is the problem:

Lots of Data ….

Generates …

Page 6

Taxonomic concepts: Catalog of Life, WoRMS, ITIS, EOL, GNA

Geography: GBIF, IUCN ranges, Map of Life, WDPA

Genes/genomes: GenBank, TreeBase, ToL Web, AVAToL, BOLD

Phenotypes and traits: MorphBank, TRY, Phenoscape

Standards

Data stores:

Page 7

EOL

GBIF

NCBI

A Growing Constellation of Biodiversity Data and Knowledge

Page 8

How do we link all these data together?

Page 9

Borrowing from Facebook and social media… Can we track relationships for Biological Objects as well?

Page 10

A Biological Relationship Graph …

[Figure: relationship graph connecting Specimens, Tissues, and Sequences. Controls: Taxonomic Type Filter, Class Filter. Functions: Infer Relationships Across Providers.]

Page 11

Moorea Biocode Example: From field collection through analysis, across multiple systems

[Figure: linked objects labeled (Biocode Event), (Essig Museum Specimen), (Smithsonian Tissue), (CAMERA Gut Sample Event), (metagenomic Sequencing), and (Genbank Sequence), with Key, Blast, and Taxon nodes.]

Page 12

How to Guide: Tracking Biological Object Relationships

• Group “like” terms into classes. In Darwin Core, e.g., groups of terms: Events, Locations, Occurrences, GeologicalContext, Identification, Taxon.
• Assign identifiers to objects. Use globally unique, resolvable, persistent identifiers for each class or term.
• Link identifiers using relationship terms and specified classes. For example, “This object is related to that object.”
• Put this data on the Web.
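The four steps above can be sketched in a few lines of Python. This is only an illustration, not BiSciCol code: the host `http://mycollection.org` is borrowed from the identifier examples in this deck, while the record values, the `TERM_TO_CLASS` subset, and the helper names are assumptions made for the example.

```python
import uuid
from collections import defaultdict

BASE_URL = "http://mycollection.org"  # hypothetical host for this sketch

# Step 1: group "like" Darwin Core terms into classes (small illustrative subset).
TERM_TO_CLASS = {
    "eventDate": "Event",
    "decimalLatitude": "Location",
    "decimalLongitude": "Location",
    "catalogNumber": "Occurrence",
    "scientificName": "Taxon",
}

def group_terms(record):
    """Split one flat record into per-class groups of terms."""
    grouped = defaultdict(dict)
    for term, value in record.items():
        grouped[TERM_TO_CLASS[term]][term] = value
    return dict(grouped)

def assign_identifiers(grouped):
    """Step 2: give each class instance a globally unique, URL-shaped identifier."""
    return {cls: f"{BASE_URL}/{cls.lower()}/{uuid.uuid4()}" for cls in grouped}

def link_identifiers(ids, related_pairs):
    """Step 3: state 'this object is related to that object' as triples."""
    return [(ids[a], ":relatedTo", ids[b]) for a, b in related_pairs]

# Hypothetical record; the values are made up for the example.
record = {
    "catalogNumber": "JDeckSpecimen1",
    "eventDate": "2010-06-01",
    "decimalLatitude": "-17.5",
    "decimalLongitude": "-149.8",
    "scientificName": "Aedes increpitus",
}

grouped = group_terms(record)
ids = assign_identifiers(grouped)
triples = link_identifiers(
    ids, [("Occurrence", "Event"), ("Event", "Location"), ("Occurrence", "Taxon")]
)

# Step 4: print one statement per line; actually publishing them on the Web
# is outside the scope of this sketch.
for subject, predicate, obj in triples:
    print(f"<{subject}> {predicate} <{obj}> .")
```

Which classes get linked to which is a modeling decision; the three pairs used here are one plausible choice, not a prescribed one.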

Page 13

Globally unique identifiers:
• Globally unique (mandatory)
• Persistent (not mandatory, but very helpful)
• Resolvable (not mandatory, but very helpful)

Examples:

http://example.org/urn:lsid:example.org:specimen/7217D220-836A-11DF-8395-0800200C9A66
http://mycollection.org/specimen/JDeckSpecimen1
http://mycollection.org/specimen/uuid=7217D220-836A-11DF-8395-0800200C9A66
http://dx.doi.org/10.5072/FK2JW8GKM
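As a rough illustration of the "globally unique" and "resolvable" properties, the sketch below mints a UUID-based identifier under a base URL and checks whether an identifier is at least shaped like an HTTP URL. The function names are hypothetical, and the shape check says nothing about persistence or about whether the URL actually resolves.

```python
import uuid
from urllib.parse import urlparse

def mint_identifier(base_url, object_class):
    """Combine a base URL, an object class, and a random UUID into one identifier."""
    return f"{base_url.rstrip('/')}/{object_class}/{uuid.uuid4()}"

def looks_resolvable(identifier):
    """Cheap shape check: an http(s) URL with a host. Does not contact the network."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

first = mint_identifier("http://mycollection.org/", "specimen")
second = mint_identifier("http://mycollection.org/", "specimen")

print(first)
print(looks_resolvable(first))             # URL-shaped identifier
print(looks_resolvable("JDeckSpecimen1"))  # bare local name, not URL-shaped
```

Persistence cannot be checked in code at all: it depends on the institution keeping the identifier stable over time.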

Page 14

Simple relationship terms:

Graph relationships:

Page 15

ONE FINAL PIECE OF THE PUZZLE:

GIVING BIRTH TO DATA IN THE RIGHT FORMAT FOR LINKING

Page 16

“Triplifier” - creating the format for linking biological objects

Triplifier: create links from native data formats

[Figure: native sources (KEMU, MySQL, Darwin Core Archive) feed the Triplifier, which populates the BiSciCol Triplestore.]
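The idea of creating links from a native format can be sketched as follows. This is not the Triplifier's actual implementation or interface: it assumes a tiny CSV export with made-up column names (`occurrence_id`, `event_id`, `tissue_id`), a hypothetical base URL, and the `:relatedTo` term used elsewhere in this deck.

```python
import csv
import io

# Hypothetical native export: one row per specimen, with optional linked objects.
NATIVE_CSV = """occurrence_id,event_id,tissue_id
MBIO1,EV1,T1
MBIO2,EV1,
"""

def triplify(csv_text, base_url, subject_col, object_cols):
    """Turn each non-empty (subject, object) cell pair in a row into one triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"{base_url}/{subject_col}/{row[subject_col]}"
        for col in object_cols:
            if row[col]:  # skip empty cells, e.g. a specimen without a tissue
                triples.append((subject, ":relatedTo", f"{base_url}/{col}/{row[col]}"))
    return triples

triples = triplify(
    NATIVE_CSV, "http://mycollection.org", "occurrence_id", ["event_id", "tissue_id"]
)
for triple in triples:
    print(triple)
```

The same row-to-triples mapping would apply to a database table; only the row reader would change.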

Page 17

QUERY AND RESULTS ACROSS LINKED DATA

[Figure: a Query is sent to the BiSciCol Triplestore, which returns a Response.]

Page 18

BISCICOL – EXAMPLE SEARCH

Client Interface:
Search Scientific Name: Aedes increpitus [Run]

Taxon SERVICE (ITIS / GNUB)
http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314
http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317
http://gnub.org/8E19F1DC-74BA-47D4-A505-6498414B4CCE

BISCICOL SERVICE LOOKUP:
dwc:IdentificationID1 :relatedTo http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314
dwc:IdentificationID1 :relatedTo dwc:OccurrenceID1
dwc:IdentificationID2 :relatedTo http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317
dwc:IdentificationID2 :relatedTo dwc:OccurrenceID3

Results:
OccurrenceID1 (Aedes increpitus Dyar, 1916)
OccurrenceID3 (Aedes vittata Theobald, 1903)
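The lookup in this example can be traced with a small in-memory sketch. The triples and taxon identifiers are copied from the slide; the function name and the two-hop traversal (taxon identifier to identification, identification to occurrence) are an assumed reading of the slide, not the BiSciCol service's API.

```python
ITIS = "http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:"

# Triples as listed under "BISCICOL SERVICE LOOKUP".
TRIPLES = [
    ("dwc:IdentificationID1", ":relatedTo", ITIS + "126314"),
    ("dwc:IdentificationID1", ":relatedTo", "dwc:OccurrenceID1"),
    ("dwc:IdentificationID2", ":relatedTo", ITIS + "126317"),
    ("dwc:IdentificationID2", ":relatedTo", "dwc:OccurrenceID3"),
]

# Identifiers returned by the taxon service for the search term.
TAXON_IDS = [
    ITIS + "126314",
    ITIS + "126317",
    "http://gnub.org/8E19F1DC-74BA-47D4-A505-6498414B4CCE",
]

def occurrences_for_taxa(triples, taxon_ids):
    """Find identifications related to the taxa, then occurrences related to those."""
    wanted = set(taxon_ids)
    identifications = {s for s, p, o in triples if p == ":relatedTo" and o in wanted}
    # Assumption: occurrence identifiers are recognizable by their prefix.
    return sorted(
        o
        for s, p, o in triples
        if s in identifications and p == ":relatedTo" and o.startswith("dwc:Occurrence")
    )

results = occurrences_for_taxa(TRIPLES, TAXON_IDS)
print(results)  # ['dwc:OccurrenceID1', 'dwc:OccurrenceID3']
```

The GNUB identifier matches no identification in these triples, so it contributes nothing to the result.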

Page 19

Working with Locations:

Tracking location in space of a moving individual (whales)

[Figure: graph of IndividualID1 with EventID1, EventID2, EventID3 and GeoreferenceID1, GeoreferenceID2, GeoreferenceID3.]

Page 20

Data Impact Factor – Graph Metrics

Graphs
[ ] GBIF Relations Graph
[X] Moorea Biocode
[X] SI MSNGR System
[+] Add New Graph

Occurrences
MBIO99999 (1024 total descendants)
IMBL8888888 (723 total descendants)

Events
Biocode10234 (4234 direct children)
Expedition21234 (1023 direct children)

Collectors
Gustav Paulay (102,000 direct children)
Christopher Meyer (83,000 direct children)
Craig Moritz (523 direct children)

Cited occurrences over time
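The two metrics named on this slide, direct children and total descendants, can be computed from a parent-to-child edge list as sketched below. The edge data and node names are made up for illustration, and the functions are an assumption about how such counts might be derived, not BiSciCol's implementation.

```python
from collections import defaultdict

def build_children(edges):
    """Index (parent, child) edges as parent -> set of children."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    return children

def direct_children(children, node):
    """Count objects linked directly below the node."""
    return len(children.get(node, ()))

def total_descendants(children, node):
    """Count every object reachable below the node, at any depth."""
    seen = set()
    stack = [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    seen.discard(node)  # guard against a cycle leading back to the start
    return len(seen)

# Hypothetical graph: an event yields two specimens; one specimen yields a
# tissue, and the tissue yields a sequence.
EDGES = [
    ("Event1", "Specimen1"),
    ("Event1", "Specimen2"),
    ("Specimen1", "Tissue1"),
    ("Tissue1", "Sequence1"),
]

children = build_children(EDGES)
print(direct_children(children, "Event1"))       # 2
print(total_descendants(children, "Event1"))     # 4
print(total_descendants(children, "Specimen1"))  # 2
```

Tracking visited nodes keeps the descendant count correct even when an object is reachable by more than one path.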

Page 21

Why BiSciCol and Why SPNHC and Why Collaborations?

• New era of collections digitization
  • New & derived data objects created, replicated, annotated

• BiSciCol tackles preservation of nat. hist. collections challenge:
  • How to follow these digital objects
  • How to link together objects and derivatives back to specimens

• BiSciCol is about community, collaborative practice
  • Commitment to standards, ontologies
  • Agreement on permanent, resolvable identifiers
  • Triplification of data sources to enhance linked data