roderic-page
DESCRIPTION
Slides from 4th Global Online Biodiversity Informatics Seminar https://plus.google.com/events/clvk6nd14d9fhh7e4a6oe5mt9s0
Building the Biodiversity Knowledge Graph
@rdmpage
http://iphylo.blogspot.com
• There are known knowns, things we know that we know
• There are known unknowns, things we now know we don’t know
• But there are also unknown unknowns, things we do not know we don’t know
Grid labels: known, unknown × knowns, unknowns
Things we don’t know that we know
Melissotarsus insularis
Melissotarsus insularis: no hit
CASENT0107663-D01, DQ176312
Melissotarsus sp. BLF m1: DQ176312
CASENT0107663-D01: Melissotarsus insularis
Melissotarsus insularis = Melissotarsus sp. BLF m1
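The records above suggest that the two names are connected only through the specimen code they share. A minimal sketch of that join, assuming each record is a plain dictionary; the field names (`accession`, `voucher`, `specimen_code`, `name`) and the function `link_names_by_voucher` are hypothetical and do not reflect any real database schema:

```python
# Hypothetical, simplified records; the field names are illustrative only.
sequences = [
    {"accession": "DQ176312", "name": "Melissotarsus sp. BLF m1",
     "voucher": "CASENT0107663-D01"},
]
specimens = [
    {"specimen_code": "CASENT0107663-D01", "name": "Melissotarsus insularis"},
]

def link_names_by_voucher(sequences, specimens):
    """Pair sequence names with specimen names that share a specimen code."""
    by_code = {s["specimen_code"]: s for s in specimens}
    links = []
    for seq in sequences:
        specimen = by_code.get(seq["voucher"])
        # Only report pairs where the two sources disagree on the name.
        if specimen is not None and specimen["name"] != seq["name"]:
            links.append((seq["accession"], seq["name"], specimen["name"]))
    return links

links = link_names_by_voucher(sequences, specimens)
```

Searching by name alone would miss this link; the shared specimen code is what lets the informal name and the Linnaean name be treated as the same thing.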
We have a vast amount of “old stuff”
Numbers of new animal names (chart annotations: 1923, WWI, WWII)
We are learning new stuff
“New” and “old” are disconnected
Dark taxa
http://iphylo.blogspot.co.uk/2011/04/dark-taxa-genbank-in-post-taxonomic.html
Mammals in GenBank (chart legend: proper Linnaean names, Aus sp.)
Mammals and “Invertebrates” (chart legend: proper Linnaean names, Aus sp.)
BOLD
Challenge: linking things together
(sticky data)
Data is good
More data is better…
…but this data is not sticky
Diagram labels: location, names, tags, identifiers
Shared identifiers are sticky
Identifiers
• Globally unique
• Resolvable (for humans and machines)
• Use other people’s identifiers to link things together
Human and machine readable
machine
human
{"author": [ { "family": "Page", "given": "Roderic D.M." } ], "container-title": "PeerJ", "reference-count": 60, "page": "e190", "deposited": { "date-parts": [ [ 2013, 11, 18 ] ], "timestamp": 1384732800000 }, "title": "BioNames: linking taxonomy, texts, and trees", "type": "journal-article", "DOI": "10.7717/peerj.190", "ISSN": [ "2167-8359" ], "URL": "http://dx.doi.org/10.7717/peerj.190" }
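Metadata like the record above can be requested from the DOI resolver by asking for a machine-readable format instead of the human-readable landing page. A minimal sketch using only the Python standard library, assuming the resolver at `https://doi.org/` supports content negotiation for the CSL JSON media type (true for Crossref DOIs as far as we know, but not guaranteed for every registration agency); the function names are hypothetical:

```python
import json
import urllib.request

def build_doi_request(doi):
    """Build (but do not send) a request asking the DOI resolver for CSL JSON."""
    url = "https://doi.org/" + doi
    return urllib.request.Request(
        url, headers={"Accept": "application/vnd.citationstyles.csl+json"}
    )

def fetch_metadata(doi, timeout=10):
    """Send the request and parse the JSON response (needs network access)."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Same identifier, two audiences: a browser gets the article page,
# while this request asks for structured metadata.
request = build_doi_request("10.7717/peerj.190")
```

Calling `fetch_metadata("10.7717/peerj.190")` would perform the actual lookup; the fields returned depend on what the publisher deposited.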
Using other people’s identifiers is hard work and scary
• Hard work - you have to find their identifiers
• Scary - what happens if other person breaks their identifiers?
• Solution: make it easy to find them, and make them robust (e.g., CrossRef and DOIs)
http://dx.doi.org/10.7717/peerj.190
DOI (Digital Object Identifier)
Biodiversity Knowledge Graph (linking things together)
Our questions are “paths” in this network
Phylogeography
Taxonomy
GenBank records from Spain
MESH term
PMID:948206
http://biostor.org/reference/102054
http://data.gbif.org/occurrences/215921922/
BHL and GBIF as biomedical databases
http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html
Metrics (counting links in the knowledge graph)
In an attempt to live up to that increasing demand for documentation, the leadership of the Natural History Museum of Denmark has issued an order to its curatorial staff - The staff members are requested to document which publications from 2011, written entirely by external scientists, that in one way or another are based on material in the collections of the Museum.
http://markmail.org/message/opv2we7fkmro2nen (TAXACOM)
https://twitter.com/#!/search/10.1371%252Fjournal.pone.0036881
https://twitter.com/edwbaker/status/205595933159858176
http://www.museum-analytics.org/
Cited, linkable specimens
NMNH Vertebrate Zoology Herpetology Collections 11194
CAS Herpetology Collection Catalog 9619
MCZ Herpetology Collection 6720
Herpetology Collection (University of Kansas Biodiversity Research Center) 5818
http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html
Annotation (everyone can make the knowledge graph)
http://bionames.org/labs/bookmarklet/
How many people view annotation
Data
Fix me!
Annotation as fixing errors
Annotation as building the knowledge graph
paper specimen
paper
sequence
taxonomic name
specimen
cites
publishes
has voucher
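The node and edge labels above can be read as subject, predicate, object triples, and a question then becomes a search for a path through them. A minimal sketch, assuming invented placeholder nodes (`paper:A`, `sequence:S`, and so on); which node types each edge label connects is our assumption, not something the slide specifies:

```python
from collections import deque

# Hypothetical triples (subject, predicate, object). The pairing of edge
# labels with node types is an assumption made for illustration.
triples = [
    ("paper:A", "cites", "paper:B"),
    ("paper:A", "publishes", "sequence:S"),
    ("paper:B", "publishes", "name:N"),
    ("sequence:S", "has voucher", "specimen:X"),
]

def find_path(triples, start, goal):
    """Breadth-first search for a shortest chain of triples from start to goal."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, target in outgoing.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, predicate, target)]))
    return None  # no chain of links connects the two nodes

path = find_path(triples, "paper:A", "specimen:X")
```

In this toy graph, an annotation amounts to appending one more triple to the list, after which questions that previously had no path may become answerable.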
OK, but if the biodiversity knowledge graph is so cool, why haven’t we made it already?
Open question:
Who will build the biodiversity knowledge graph?