Kew at pro-iBiosphere
data hackathon
Nicky Nicolson, Matt Blissett
RBG Kew Biodiversity Informatics team
A map + data + tools = links
Two minute background: what we’ve done, why we
should link up our data
What is needed?
- Persistent identifiers
- Tools – to turn “strings” into “things”
What we’ve brought along:
- Map
- Data
- ... Labelled with persistent identifiers
- A rules-based matching / linking tool
specimens.kew.org/herbarium/K000525802
Cited in:
Rakotoarinivo M, Dransfield J. 2010. New species of Dypsis and Ravenea (Arecaceae) from Madagascar. Kew Bull. 65, 279–303.
doi:10.1007/s12225-010-9210-7
Data linking tool
Rules-based
Armed with a tabular dataset, you:
Define zero or more transformers for each field
Define how fields must match
This is a match configuration.
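As a rough illustration of what a match configuration could look like, here is a minimal Python sketch. The names `FieldRule`, `exact` and `records_match`, and the field names `genus` and `year`, are hypothetical and chosen only for this example; they are not the tool's actual classes or configuration format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transformer = Callable[[str], str]
Matcher = Callable[[str, str], bool]


@dataclass
class FieldRule:
    """One field of a match configuration (hypothetical structure)."""
    name: str
    matcher: Matcher
    # Zero or more transformers, applied in order before matching.
    transformers: List[Transformer] = field(default_factory=list)

    def prepare(self, value: str) -> str:
        for transform in self.transformers:
            value = transform(value)
        return value


def exact(a: str, b: str) -> bool:
    """Matcher: the transformed values must be identical."""
    return a == b


def records_match(config: List[FieldRule],
                  query: Dict[str, str],
                  candidate: Dict[str, str]) -> bool:
    """Two records match only if every configured field matches."""
    return all(
        rule.matcher(rule.prepare(query[rule.name]),
                     rule.prepare(candidate[rule.name]))
        for rule in config
    )


# Assumed example configuration: two fields, one with two transformers.
config = [
    FieldRule("genus", exact, [str.strip, str.lower]),
    FieldRule("year", exact),
]

query = {"genus": " Dypsis", "year": "2010"}
candidate = {"genus": "dypsis", "year": "2010"}
print(records_match(config, query, candidate))
```

Keeping transformers and matchers as plain functions means the same configuration can be pointed at any tabular dataset whose rows expose the named fields.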
Examples of transformers
Epithet
mediterraneum → mediterranea
NormaliseDiacrits
Déségl. → Desegl.
RemoveBracketedText, RomanNumeral
cix (1892), 57 → 109 57
CleanedPubAuthors
(L.) A.Gray in Hook.f. → A.Gray
SurnameExtracter
(A.Gray) A.Heller → (Gray) Heller
PageExtractor
37(4): 412 (1977) → 412
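A few of these transformers can be approximated in Python as follows. This is a sketch under assumptions: the function names are hypothetical, the regular expressions are guesses at the behaviour the examples imply, and the real tool may handle punctuation and edge cases differently.

```python
import re
import unicodedata

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def normalise_diacritics(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def remove_bracketed_text(text: str) -> str:
    """Drop anything in round brackets, plus the whitespace before it."""
    return re.sub(r"\s*\([^)]*\)", "", text)


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral using the usual subtractive rule."""
    values = [ROMAN_VALUES[c] for c in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (e.g. ix = 9).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def roman_numerals(text: str) -> str:
    """Replace whole tokens made only of Roman numeral letters with integers."""
    return re.sub(r"\b[ivxlcdm]+\b",
                  lambda m: str(roman_to_int(m.group())),
                  text, flags=re.IGNORECASE)


def page_extractor(text: str) -> str:
    """Return the first number following a colon, or '' if there is none."""
    found = re.search(r":\s*(\d+)", text)
    return found.group(1) if found else ""


print(normalise_diacritics("Déségl."))
print(roman_numerals(remove_bracketed_text("cix (1892), 57")))
print(page_extractor("37(4): 412 (1977)"))
```

Note that this sketch leaves the comma in place when chaining the bracket and Roman numeral steps, and it would also convert ordinary words that happen to consist of Roman numeral letters, so a production transformer would need stricter rules.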
Examples of matchers
Exact
CommonTokens
CapitalLetters
in Beitr. Aethiop. → B A
Beitr. Fl. Aethiop. → B F A = 0.67 ratio
Number
Integer
Levenshtein
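Two of these matchers can be sketched in Python. The slides do not state the exact formula behind the CapitalLetters ratio; the version below assumes it is the number of shared capital letters divided by the longer of the two capital-letter lists, which reproduces the 0.67 in the example. The Levenshtein function is the standard edit-distance algorithm; how the tool turns a distance into a match decision is not specified here. Function names are hypothetical.

```python
from collections import Counter


def capital_letters_ratio(a: str, b: str) -> float:
    """Share of capital letters the two strings have in common (assumed formula)."""
    caps_a = [c for c in a if c.isupper()]
    caps_b = [c for c in b if c.isupper()]
    longest = max(len(caps_a), len(caps_b))
    if longest == 0:
        return 0.0
    # Multiset intersection counts each shared capital at most as often
    # as it appears in both strings.
    shared = sum((Counter(caps_a) & Counter(caps_b)).values())
    return shared / longest


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(round(capital_letters_ratio("in Beitr. Aethiop.", "Beitr. Fl. Aethiop."), 2))
print(levenshtein("mediterraneum", "mediterranea"))
```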
Using the matcher
A configured match can run against any tabular dataset.
Accessible as:
- JSON web service
- Google Refine reconciliation service (work in
progress)
Transformers can be dropped into Google Refine
Proposal – link names in floras to IPNI
We’ll set up the tool with IPNI as its backend dataset
We run lists of taxa treated in floras against it and
distribute IPNI IDs for these names.
Short term gain: navigate via the IPNI ID to the
evidence about the name – protologues (Rod has
matched 120K to DOIs) and types.
Long term gain: GSPC target #1 – online world flora.
Simpler to integrate data if we’re talking about the
same name.
Proposal – link IPNI to types
We set up the tool with a botanical specimen catalogue
as its backend data-source.
We link up the IPNI cited type data with the specimens
themselves.
Proposal – link floras to
specimens
Floras use herbarium specimens as evidence for their
distribution statements.
We set up the tool with a botanical specimen catalogue
as its backend data-source.
We extract specimen references from floras and run
these against the tool to create links from flora
accounts to specimens themselves.
specimens.kew.org/herbarium/K000049118
Cited in: FZ volume:5 part:3 (2003) Rubiaceae by D.M.Bridson &
B.Verdcourt
Proposal – link duplicates
between herbaria
We set up the tool with a botanical specimen catalogue
e.g. K as its backend data-source.
We fire specimen data from another specimen
catalogue at it to look for duplicates.
Benefits:
- Geo-referencing
- Imaging
- Data capture efficiency