roderic-page
Unknown knowns, long tails, and long data
@rdmpage
http://iphylo.blogspot.com
Stressors and Drivers of Food Security: Evidence from Scientific Collections
Known knowns: We know this
Known unknowns: We know we don’t know this
Unknown unknowns: We don’t know that we don’t know this
Unknown knowns: We know this, but we don’t know that
“long data”
100,000 articles from http://biostor.org (BHL)
Timeline: 1923 to today
Long tail (Wikipedia pages for mammals)
Lots of very small articles
A few very large articles
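One way to put a number on a long tail like this is to ask what share of the total text sits in the largest few articles. The sketch below is only an illustration: the function name, the 10% cut-off, and the page sizes are all made up for the example, not taken from the Wikipedia data on the slide.

```python
def long_tail_summary(sizes, head_fraction=0.1):
    """Return (share of total size held by the largest articles, median size)."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    ranked = sorted(sizes, reverse=True)
    # Always keep at least one article in the "head".
    head_count = max(1, int(len(ranked) * head_fraction))
    total = sum(ranked)
    head_share = sum(ranked[:head_count]) / total if total else 0.0
    # Upper median of the ranked list; good enough for a rough summary.
    median = ranked[len(ranked) // 2]
    return head_share, median


# Hypothetical article sizes in bytes (not real measurements).
sizes = [90000, 40000, 5000, 3000, 2000, 1500, 1200, 1000, 800, 500]
head_share, median = long_tail_summary(sizes)
print(f"largest 10% of articles hold {head_share:.0%} of the text")
print(f"median article size: {median} bytes")
```

A high head share combined with a small median is the "few very large, lots of very small" pattern in numeric form.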
Mining the biodiversity literature
Associations between species
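A rough sketch of how such associations could be mined: find candidate scientific names on each page, then count which names turn up on the same page. This is an assumption about one possible approach, not a description of how BioStor does it. The regular expression is deliberately naive, the helper names are hypothetical, and the two example pages are invented.

```python
import re
from collections import Counter
from itertools import combinations

# Naive pattern for Latin binomials: capitalised genus, lowercase epithet.
# It also matches ordinary phrases such as "The flea", so candidates are
# filtered against a list of known names.
BINOMIAL = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")


def find_names(text, known_names):
    """Return the candidate binomials in text that appear in known_names."""
    return {name for name in BINOMIAL.findall(text) if name in known_names}


def cooccurrences(pages, known_names):
    """Count how often each pair of known names appears on the same page."""
    counts = Counter()
    for text in pages:
        names = sorted(find_names(text, known_names))
        counts.update(combinations(names, 2))
    return counts


# Invented example pages and name list, for illustration only.
known = {"Xenopsylla cheopis", "Rattus rattus", "Mus musculus"}
pages = [
    "The flea Xenopsylla cheopis was collected from Rattus rattus.",
    "Xenopsylla cheopis is a parasite of Rattus rattus and Mus musculus.",
]
counts = cooccurrences(pages, known)
for pair, n in counts.most_common():
    print(pair, n)
```

Co-occurrence on a page is only a hint that two species are associated; a real pipeline would need a proper name-finding service and some way to tell a host and parasite pair from two names that merely share a page.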
http://biostor.org/maps
Biological Diversity in the Patent System
Paul Oldham, Stephen Hall, Oscar Forero
PLoS ONE http://dx.doi.org/10.1371/journal.pone.0078737
“…human innovative activity involving biodiversity in the patent system focuses on approximately 4% of taxonomically described…”
@junglepaul
PMID:948206
http://biostor.org/reference/102054
http://data.gbif.org/occurrences/215921922/
BHL and GBIF as biomedical databases
http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html
PubMed (disease)
BioStor (publication)
GBIF (specimen)
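The linking idea can be sketched as a small record that holds one identifier from each database. The identifiers below are the ones shown on the earlier slide; treating them as a single linked triple is an assumption made for this illustration, and the class name and fields are hypothetical. The BioStor and GBIF URL patterns follow the slide; the PubMed pattern is the current public one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One disease record, one publication, one specimen, tied together."""
    pmid: str               # PubMed (disease)
    biostor_reference: str  # BioStor (publication)
    gbif_occurrence: str    # GBIF (specimen)

    def urls(self):
        """Return a resolvable URL for each of the three identifiers."""
        return {
            "PubMed": f"https://pubmed.ncbi.nlm.nih.gov/{self.pmid}/",
            "BioStor": f"http://biostor.org/reference/{self.biostor_reference}",
            "GBIF": f"http://data.gbif.org/occurrences/{self.gbif_occurrence}/",
        }


# Identifiers as they appear on the slide.
link = Link(pmid="948206", biostor_reference="102054", gbif_occurrence="215921922")
for source, url in link.urls().items():
    print(source, url)
```

Once links like this are stored, a lookup starting from any one of the three databases can reach the other two.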
Summary
• Open access literature is a potential goldmine of information (long data, long tail)
• Text mining for entities (scientific names, places, specimens, attributes) (search is still the killer app)
• Linking things together (unknown knowns)