Unknown knowns, long tails, and long data

@rdmpage

http://iphylo.blogspot.com

Stressors and Drivers of Food Security: Evidence from Scientific Collections

• Known knowns: We know this

• Known unknowns: We know we don’t know this

• Unknown unknowns: We don’t know that we don’t know this

• Unknown knowns: We know this, but we don’t know that we know it

“long data”

[Figure: 100,000 articles from http://biostor.org (BHL); axis labels “1923” and “today”]

Long tail (Wikipedia pages for mammals)

Lots of very small articles

A few very large articles
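The long-tail shape on this slide can be made concrete with two numbers: what fraction of articles are small, and what share of the total text the largest few hold. The sketch below is a minimal illustration; the page sizes are invented, and the 5 kB threshold and the top-20% cut-off are arbitrary choices, not figures from the talk.

```python
# Minimal sketch of long-tail arithmetic; the sizes are invented, not real data.


def share_below(sizes, threshold):
    """Fraction of items whose size is below `threshold`."""
    return sum(1 for s in sizes if s < threshold) / len(sizes)


def top_share(sizes, fraction):
    """Share of the total size held by the largest `fraction` of items."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical article sizes in bytes: a few large pages, many small ones.
sizes = [120_000, 45_000, 8_000, 3_000, 2_500, 2_000, 1_500, 1_200, 1_000, 800]

print(f"{share_below(sizes, 5_000):.0%} of articles are under 5 kB")
# 70% of articles are under 5 kB
print(f"top 20% of articles hold {top_share(sizes, 0.2):.0%} of the text")
# top 20% of articles hold 89% of the text
```

Most items sit in the small-size tail while most of the text sits in a handful of items; that asymmetry is what "lots of very small articles, a few very large articles" describes.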

Mining the biodiversity literature

Associations between species
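One simple way to surface associations between species from the literature is to count how often two names appear in the same article. The following is a minimal sketch, assuming a pre-compiled list of names and plain-text articles; the function name `cooccurrences` and the example sentences are hypothetical, and co-occurrence is only a proxy for a real biological association.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(documents, names):
    """Count how many documents mention each pair of names together.

    Matching is naive substring search; a real pipeline would use a name finder.
    """
    pairs = Counter()
    for text in documents:
        # Sorting makes each pair key order-independent, e.g. (A, B) never (B, A).
        found = sorted({name for name in names if name in text})
        for a, b in combinations(found, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical inputs for illustration only.
names = ["Ixodes ricinus", "Apodemus sylvaticus", "Myodes glareolus"]
docs = [
    "Ixodes ricinus was collected from Apodemus sylvaticus.",
    "Larvae of Ixodes ricinus on Myodes glareolus and Apodemus sylvaticus.",
    "A survey of Myodes glareolus populations.",
]

for pair, count in cooccurrences(docs, names).most_common():
    print(pair, count)
```

Pairs that recur across many independent articles are better candidates for a genuine association (host and parasite, predator and prey) than pairs seen once.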

http://biostor.org/maps
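Putting articles on a map depends on pulling coordinates out of the text. As a minimal sketch, assuming localities are written in degree/minute/second form such as `5°30'S, 39°15'E`, the code below converts them to decimal degrees; the regular expression and the helper names are hypothetical and will miss many real-world formats (decimal degrees, OCR errors, `º` instead of `°`).

```python
import re

# Degrees, optional minutes, optional seconds, then a hemisphere letter,
# e.g. 5°30'S or 39°15'20"E. Other notations are not handled.
DMS = re.compile(
    r"(\d{1,3})°\s*(?:(\d{1,2})['′]\s*)?(?:(\d{1,2}(?:\.\d+)?)[\"″]\s*)?([NSEW])"
)


def to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degree/minute/second parts to signed decimal degrees."""
    value = int(degrees) + int(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemisphere in "SW" else value


def extract_points(text):
    """Pair each latitude with the longitude that immediately follows it."""
    coords = [(m.group(4), to_decimal(*m.groups())) for m in DMS.finditer(text)]
    points = []
    for (h1, lat), (h2, lon) in zip(coords, coords[1:]):
        if h1 in "NS" and h2 in "EW":
            points.append((round(lat, 4), round(lon, 4)))
    return points


print(extract_points("collected at 5°30'S, 39°15'E and again at 4°N 38°45'E"))
# [(-5.5, 39.25), (4.0, 38.75)]
```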

Biological Diversity in the Patent System. Paul Oldham, Stephen Hall, Oscar Forero

PLoS ONE http://dx.doi.org/10.1371/journal.pone.0078737

“…human innovative activity involving biodiversity in the patent system focuses on approximately 4% of taxonomically described…”

@junglepaul

PMID:948206

http://biostor.org/reference/102054

http://data.gbif.org/occurrences/215921922/

BHL and GBIF as biomedical databases

http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html

PubMed (disease)

BioStor (publication)

GBIF (specimen)
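The PubMed, BioStor and GBIF chain works when records share an identifier, for example a museum catalogue number cited in a paper and recorded for a GBIF occurrence. A minimal sketch, assuming each paper has already been reduced to a list of cited specimen codes; all identifiers below are made up, and the normalisation rule is an illustrative assumption rather than how any of these services actually match records.

```python
import re


def normalise(code):
    """Crude canonical form: upper case, letters and digits only."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())


def link_records(papers, occurrences):
    """Return (paper_id, occurrence_id) pairs that share a specimen code."""
    # Index occurrences by normalised catalogue number for constant-time lookup.
    index = {}
    for occ_id, catalogue_number in occurrences.items():
        index.setdefault(normalise(catalogue_number), []).append(occ_id)
    links = []
    for paper_id, codes in papers.items():
        for code in codes:
            for occ_id in index.get(normalise(code), []):
                links.append((paper_id, occ_id))
    return links


# Made-up identifiers: a disease paper and a taxonomic paper citing one specimen.
papers = {
    "pubmed:example-1": ["USNM 123456"],
    "biostor:example-2": ["usnm-123456", "BMNH 1900.1.1"],
}
occurrences = {"gbif:example-occ": "USNM123456"}

print(link_records(papers, occurrences))
# [('pubmed:example-1', 'gbif:example-occ'), ('biostor:example-2', 'gbif:example-occ')]
```

Stripping punctuation makes `USNM 123456` and `usnm-123456` match, but it is deliberately crude: it would also merge distinct codes such as `BMNH 1900.1.1` and `BMNH 1900.11`, so a real pipeline would need per-collection rules.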

Summary

• Open access literature is a potential goldmine of information (long data, long tail)

• Text mining for entities (scientific names, places, specimens, attributes); search is still the killer app

• Linking things together (unknown knowns)
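Finding scientific names is usually the first text-mining step. A minimal sketch, assuming names appear as a full genus plus epithet and that a list of known genera is available to filter out ordinary capitalised words; the pattern and the function name are hypothetical, and the approach misses abbreviated genera (`I. ricinus`), authorities, and OCR-damaged text.

```python
import re

# Capitalised genus, a space, then a lower-case epithet of three or more letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_candidate_names(text, known_genera):
    """Return genus + epithet pairs whose genus is in `known_genera`."""
    return [
        f"{genus} {epithet}"
        for genus, epithet in BINOMIAL.findall(text)
        if genus in known_genera
    ]


text = ("The tick Ixodes ricinus was found on Apodemus sylvaticus. "
        "Ticks were common.")
print(find_candidate_names(text, {"Ixodes", "Apodemus"}))
# ['Ixodes ricinus', 'Apodemus sylvaticus']
```

Without the genus filter the pattern would also return "The tick" and "Ticks were", which is why dictionary lookup usually accompanies the regular expression.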