Unknown knowns, long tails, and long data

@rdmpage

http://iphylo.blogspot.com

Stressors and Drivers of Food Security: Evidence from Scientific Collections

Known knowns: We know this

Known unknowns: We know we don’t know this

Unknown knowns: We know this, but we don’t know that

Unknown unknowns: We don’t know that we don’t know this

“long data”

100,000 articles from http://biostor.org (BHL)

[Figure: timeline with 1923 and today marked]

Long tail (Wikipedia pages for mammals)

Lots of very small articles

A few very large articles
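A minimal sketch of how the long-tail pattern above could be quantified: rank the articles by size and ask what share of all text sits in the largest few. The function name `share_held_by_top` and the sizes are made up for illustration; they are not real Wikipedia measurements.

```python
def share_held_by_top(sizes, top_fraction):
    """Return the fraction of total size held by the largest items.

    sizes: iterable of article sizes (any unit, e.g. bytes)
    top_fraction: fraction of articles to treat as the "head" (0 < f <= 1)
    """
    ranked = sorted(sizes, reverse=True)
    # Always include at least one article in the head.
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical article sizes in bytes (illustrative only, NOT real data):
# a few very large articles followed by lots of very small ones.
example_sizes = [50_000, 20_000, 5_000, 2_000, 1_000, 800, 500, 400, 200, 100]

share = share_held_by_top(example_sizes, 0.2)
print(f"Largest 20% of articles hold {share:.1%} of the text")
```

With real page sizes in place of `example_sizes`, a high share for a small `top_fraction` would indicate the same shape the slide describes.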

Mining the biodiversity literature

Associations between species

http://biostor.org/maps

Biological Diversity in the Patent System

Paul Oldham, Stephen Hall, Oscar Forero

PLoS ONE http://dx.doi.org/10.1371/journal.pone.0078737

“…human innovative activity involving biodiversity in the patent system focuses on approximately 4% of taxonomically described…”

@junglepaul

PMID:948206

http://biostor.org/reference/102054

http://data.gbif.org/occurrences/215921922/

BHL and GBIF as biomedical databases

http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html

PubMed (disease)

BioStor (publication)

GBIF (specimen)
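One way to picture the PubMed, BioStor, and GBIF chain is as a small graph of identifiers. The sketch below uses the three identifiers shown in the slides; treating them as one linked chain is an assumption made here for illustration, and `build_graph` and `reachable` are hypothetical helper names.

```python
from collections import defaultdict

# Identifiers copied from the slides. That they link to each other in this
# order (disease paper -> publication -> specimen) is an ASSUMPTION.
PUBMED = "PMID:948206"
BIOSTOR = "http://biostor.org/reference/102054"
GBIF = "http://data.gbif.org/occurrences/215921922/"

links = [
    (PUBMED, BIOSTOR),  # assumed: the PubMed paper cites the BioStor article
    (BIOSTOR, GBIF),    # assumed: the article mentions the GBIF specimen
]


def build_graph(pairs):
    """Build an undirected adjacency map from (a, b) identifier pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph, start):
    """Return every identifier connected to `start`, including itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


graph = build_graph(links)
connected = reachable(graph, PUBMED)
print(sorted(connected))
```

Under that assumption, starting from the disease record reaches the specimen even though no single database holds both.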

Summary

• Open access literature is a potential goldmine of information (long data, long tail)

• Text mining for entities (scientific names, places, specimens, attributes); search is still the killer app

• Linking things together (unknown knowns)
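As a rough illustration of the text-mining bullet, here is a deliberately naive sketch that pulls candidate scientific names out of a sentence with a regular expression. This is not the method used by BioStor or BHL; the pattern, the function name `candidate_names`, and the sample sentence are assumptions for illustration.

```python
import re

# Naive pattern for Latin binomials: a capitalised genus followed by a
# lowercase species epithet of at least three letters. It will also match
# ordinary phrases such as "The specimens", so real name finding needs a
# dictionary of known names or a dedicated tool.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def candidate_names(text):
    """Return candidate 'Genus species' strings found in the text."""
    return [" ".join(match) for match in BINOMIAL.findall(text)]


# Hypothetical sample sentence, not taken from any article.
sample = "Specimens of Rattus rattus and Mus musculus were collected."
names = candidate_names(sample)
print(names)
```

The same shape of extractor, with different patterns or lookups, would apply to the other entity types in the bullet (places, specimen codes, attributes).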
