Taxonomic Databases Working Group Annual Meeting 2011 GBIF: Issues in providing federated access to digital information related to biological specimens. David Remsen Senior Programme Officer Global Biodiversity Information Facility (GBIF) TDWG 2011

Tdwg 2-remsen

DESCRIPTION

Presentation at TDWG Conference 2011 in New Orleans

Page 1: Tdwg 2-remsen

Taxonomic Databases Working Group Annual Meeting 2011

GBIF: Issues in providing federated access to digital information related to biological specimens.

David Remsen
Senior Programme Officer
Global Biodiversity Information Facility (GBIF)

TDWG 2011

Page 2: Tdwg 2-remsen

Issue #1: The consequences of scale

Goal – Provide timely access to a large federated network of biodiversity databases

Page 3: Tdwg 2-remsen

About GBIF

• 341 publishers
• 9290 datasets
• 310M records

The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.

• 57 countries
• 45 organisations

Primary biodiversity data

Page 4: Tdwg 2-remsen

“Wrapper” Software

PyWrapper (Python)

TAPIR Link (PHP)

DiGIR (PHP)

Your database

Insect Collection

Install one of these ‘wrappers’

ABCD

Bird Observations

Herbarium

Data

DarwinCore

DarwinCore

Page 5: Tdwg 2-remsen

The promise of federation

Insect Collection, Herbarium, Bird Observations, Herbarium

Any specimens from Thailand?

GBIF Data Portal

I will ask!

I do! I do! I do! Nope!

GBIF Data Portal as a Gateway

Page 6: Tdwg 2-remsen

The challenge of federation

Insect Collection, Herbarium, Bird Observations, Herbarium

Hello?

Server Not Available
Server Not Available

GBIF Data Portal

Hi!

Page 7: Tdwg 2-remsen

The rise of Indexing

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

Send me an index of all of your data

GBIF Data Portal (now with Data!)

GBIF Data Portal as a Data Index
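The difference between the portal as a gateway and the portal as a data index can be sketched in a few lines. This is only an illustration: the provider names come from the slides, but the records, the `ONLINE` dictionary, and both function names are hypothetical, and a provider that is offline is modelled simply as `None`.

```python
from collections import defaultdict

# Hypothetical providers and records; None would mean "Server Not Available".
ONLINE = {
    "Insect Collection": [{"id": "i1", "country": "Thailand"},
                          {"id": "i2", "country": "Peru"}],
    "Herbarium": [{"id": "h1", "country": "Thailand"}],
    "Bird Observations": [{"id": "b1", "country": "Thailand"}],
}


def federated_search(providers, country):
    """Gateway model: ask every provider at query time."""
    hits, unavailable = [], []
    for name, records in providers.items():
        if records is None:          # provider is offline right now
            unavailable.append(name)
            continue
        hits.extend(r["id"] for r in records if r["country"] == country)
    return hits, unavailable


def build_index(providers):
    """Index model: harvest ahead of time into a central country -> ids map."""
    index = defaultdict(list)
    for records in providers.values():
        for r in records or []:
            index[r["country"]].append(r["id"])
    return index


# Harvest while everyone is online, then let one provider drop out.
index = build_index(ONLINE)
later = dict(ONLINE)
later["Bird Observations"] = None

print(federated_search(later, "Thailand"))  # misses the offline provider
print(index["Thailand"])                    # still answers from the index
```

The trade-off in this sketch is that the index answers even when a provider is down, but it is only as fresh as the last harvest.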

Page 8: Tdwg 2-remsen

The wrong tools for the job

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

Send me an index of your data once per month

Here is page one.

If I go offline, start again

Not too fast!

You ask the same questions every time

GBIF Data Portal (now with Data!)
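Why page-by-page harvesting scales badly can be shown with a small simulation. This is a sketch under stated assumptions, not the behaviour of any particular wrapper: `FlakyProvider`, `harvest_paged`, the page count, and the page size are all hypothetical, and the only rule taken from the slide is "if I go offline, start again".

```python
class FlakyProvider:
    """Hypothetical provider that serves pages and drops out once."""

    def __init__(self, fail_on_call, page_size=2):
        self.calls = 0
        self.fail_on_call = fail_on_call
        self.page_size = page_size

    def fetch_page(self, page):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise ConnectionError("Server Not Available")
        return [f"rec-{page}-{i}" for i in range(self.page_size)]


def harvest_paged(fetch_page, total_pages, max_restarts=5):
    """Harvest every page; any failure discards progress and restarts.

    Returns (records, requests_made).
    """
    requests_made = 0
    for _ in range(max_restarts + 1):
        records = []
        try:
            for page in range(total_pages):
                requests_made += 1
                records.extend(fetch_page(page))
            return records, requests_made
        except ConnectionError:
            continue  # provider went offline: start again from page one
    raise RuntimeError("provider never stayed online long enough")


provider = FlakyProvider(fail_on_call=4)
records, requests_made = harvest_paged(provider.fetch_page, total_pages=5)
print(len(records), "records in", requests_made, "requests")
```

In this toy run a single outage nearly doubles the number of requests; with real datasets of many pages, the wasted work grows with the position of the failure.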

Page 9: Tdwg 2-remsen

Darwin Core Archives

A text-based solution to publishing biodiversity data

Page 10: Tdwg 2-remsen

A Refined Approach

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

This is fast!

GBIF Data Portal (now with Data!)

This is easy

URL URL URL URL
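To make "text-based" concrete, here is a minimal sketch of reading a Darwin Core Archive. It assumes the usual archive layout: a zip file containing a `meta.xml` descriptor plus a delimited text file for the core records. The sample data, file names, and the helper names `build_archive` and `read_core` are made up for illustration, and the archive is built in memory instead of being fetched from a URL.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Illustrative descriptor: one core file with an id column and two terms.
META_XML = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="\t" linesTerminatedBy="\n" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""


def build_archive():
    """Create a tiny in-memory archive with invented sample records."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr(
            "occurrence.txt",
            "id\tscientificName\tcountry\n"
            "1\tOenanthe oenanthe\tThailand\n"
            "2\tOenanthe crocata\tFrance\n",
        )
    return buf.getvalue()


def read_core(archive_bytes):
    """Return the core records as dicts keyed by short term names."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text
        # Only the tab escape is handled in this sketch.
        delimiter = core.get("fieldsTerminatedBy", "\\t").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        columns = {int(core.find("dwc:id", NS).get("index")): "id"}
        for field in core.findall("dwc:field", NS):
            columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
        text = zf.read(location).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


for record in read_core(build_archive()):
    print(record)
```

The whole dataset arrives as one file, so the harvester makes a single request per dataset instead of paging through a query protocol.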

Page 11: Tdwg 2-remsen

Growth (chart)

Time axis: 2007, 2008, 2009, 2010, Today

Values shown: 70 million, 147 million, 180 million, 201 million, 302 million

Need for a new standard identified

Page 12: Tdwg 2-remsen

Issue #2: Geospatial Integration

Goal – Provide accurate reporting of nationally-bound data

Challenge – Inaccurate recording of geospatial coordinates

Page 13: Tdwg 2-remsen

Geo-referenced USA data

Verbatim data as shared on the network

Page 14: Tdwg 2-remsen

Issue #2: Geospatial Integration

• Remediation includes
• Integration of national shapefiles to verify that coordinates fell within country boundaries
  – Including EEZ boundaries
  – Including islands
• Identified outliers
• Qualified the nature of the error (e.g., “coordinates inverted”)
• Marked and omitted these records from display
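The remediation steps above can be sketched as a point-in-polygon check followed by a simple diagnosis. This is a minimal illustration, not GBIF's implementation: the rectangle `COUNTRY` is a hypothetical stand-in for a real national shapefile, the function names are invented, and "coordinates inverted" is assumed here to mean that latitude and longitude were swapped.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical, heavily simplified boundary (a real check would load a shapefile).
COUNTRY = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]


def classify(lat, lon, polygon=COUNTRY):
    """Label a record's coordinates relative to the claimed country."""
    if point_in_polygon(lon, lat, polygon):
        return "ok"
    if point_in_polygon(lat, lon, polygon):  # assumed meaning: lat/lon swapped
        return "coordinates inverted"
    return "outside boundary"


def split_for_display(records, polygon=COUNTRY):
    """Separate records to display from records that are marked and omitted."""
    shown, flagged = [], []
    for rec in records:
        issue = classify(rec["lat"], rec["lon"], polygon)
        if issue == "ok":
            shown.append(rec)
        else:
            flagged.append({**rec, "issue": issue})
    return shown, flagged


shown, flagged = split_for_display([
    {"id": "a", "lat": 40.0, "lon": -100.0},
    {"id": "b", "lat": -100.0, "lon": 40.0},
    {"id": "c", "lat": 10.0, "lon": 10.0},
])
print([r["id"] for r in shown], [(r["id"], r["issue"]) for r in flagged])
```

Keeping the flagged records with a reason attached, instead of discarding them, matches the "marked and omitted" step and leaves room for feedback to the publisher.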

Page 15: Tdwg 2-remsen

Geo-referenced USA data

Data following interpretation
- Coastal regions recognised
- Offshore islands recognised

Page 16: Tdwg 2-remsen

Issue #3: Taxonomic Integration

• Goal – Provide access to biodiversity data according to taxonomic groups and concepts
• Challenge
  – Heterogeneous and sometimes inaccurate classification
    • Same taxon appearing in different classifications
  – Presence of homonyms that complicate reconciling the above
  – Misspellings
  – Wide range of orthographies for the same name

Page 17: Tdwg 2-remsen

Enabled taxonomic data to be published through GBIF

Page 18: Tdwg 2-remsen

Trochilidae (Hummingbirds) (today)

Misinterpretations (hummingbirds are found only in the Western Hemisphere)

Page 19: Tdwg 2-remsen

Trochilidae (Hummingbirds) (next month)

Improved interpretation

Page 20: Tdwg 2-remsen

Search for Oenanthe (water dropwort plant or wheatear bird)

Today: difficult for user to interpret

Next month: accurate search results

Page 21: Tdwg 2-remsen

Improved the means to match names
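A rough idea of what name matching has to handle (misspellings, varying orthography, and homonyms such as Oenanthe) is sketched below. It is an illustration only: the tiny `REFERENCE` list, the function names, and the similarity cutoff are assumptions, and the fuzzy step uses Python's standard `difflib`, not whatever matching service GBIF deployed.

```python
import difflib

# Hypothetical reference list: canonical name -> kingdoms where it is in use.
REFERENCE = {
    "Oenanthe": ["Plantae", "Animalia"],  # homonym: water dropwort / wheatear
    "Trochilidae": ["Animalia"],
}


def normalise(name):
    """Collapse whitespace and standardise capitalisation."""
    return " ".join(name.split()).capitalize()


def match_name(verbatim, kingdom=None, cutoff=0.8):
    """Return (canonical_name, status) for a verbatim name string."""
    name = normalise(verbatim)
    if name not in REFERENCE:
        # Fall back to fuzzy matching to absorb misspellings.
        close = difflib.get_close_matches(name, list(REFERENCE), n=1, cutoff=cutoff)
        if not close:
            return None, "no match"
        name = close[0]
    kingdoms = REFERENCE[name]
    if len(kingdoms) > 1 and kingdom not in kingdoms:
        return name, "ambiguous homonym"  # more context needed to decide
    return name, "matched"


print(match_name("Trochillidae"))            # misspelling
print(match_name("  oenanthe "))             # homonym without context
print(match_name("Oenanthe", "Plantae"))     # homonym resolved by kingdom
```

Returning a status alongside the name lets a search interface separate confident matches from homonyms that still need a higher classification to disambiguate.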

Page 22: Tdwg 2-remsen

In summary

• GBIF has had to deploy different data access strategies in order to scale effectively

• Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF

• Geospatial filtering via shapefiles provides a basis for more accurate national reporting
  – Basis for additional services later (e.g., ecosystem shapefiles, protected areas, etc.)

• Heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure
  – Comprehensive authoritative taxonomic data is a key organisational component of collections data

Page 23: Tdwg 2-remsen

Thank you