

Presentation at TDWG Conference 2011 in New Orleans


Taxonomic Databases Working Group Annual Meeting 2011

GBIF: Issues in providing federated access to digital information related to biological specimens.

David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF)

TDWG 2011

Issue #1: The consequences of scale

Goal – Provide timely access to a large federated network of biodiversity databases

About GBIF

• 341 publishers
• 9290 datasets
• 310M records

The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.

• 57 countries
• 45 organisations

Primary biodiversity data

“Wrapper” Software

Install one of these ‘wrappers’ on your database (insect collection, bird observations, herbarium) to share its data in the ABCD or Darwin Core format:

• PyWrapper (Python)
• TAPIR Link (PHP)
• DiGIR (PHP)
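A wrapper's core job is to translate a provider's local schema into a shared standard. The sketch below illustrates only that mapping step, assuming a hypothetical insect-collection table whose column names (`species_name`, `lat`, `drawer`, and so on) are invented; the target names are real Darwin Core terms. It does not reproduce how PyWrapper, TAPIR Link, or DiGIR are actually configured.

```python
# Hypothetical local schema: these column names are invented for illustration.
LOCAL_TO_DWC = {
    "species_name": "scientificName",
    "country_name": "country",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "catalog_no": "catalogNumber",
}


def to_darwin_core(row: dict) -> dict:
    """Rename one local database row's columns to Darwin Core terms.

    Columns without a mapping (e.g. internal storage locations) are dropped,
    so only the mapped fields are exposed to the network.
    """
    return {LOCAL_TO_DWC[col]: value for col, value in row.items() if col in LOCAL_TO_DWC}


if __name__ == "__main__":
    local_row = {"species_name": "Apis mellifera", "country_name": "Thailand",
                 "lat": 13.75, "lon": 100.5, "catalog_no": "INS-0001", "drawer": "B7"}
    print(to_darwin_core(local_row))
```

Keeping the mapping as data rather than code means a provider can change its local schema by editing one table.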

The promise of federation

Insect Collection, Herbarium, Bird Observations, Herbarium

Any specimens from Thailand?

GBIF Data Portal

I will ask!

I do! I do! I do! Nope!

GBIF Data Portal as a Gateway

The challenge of federation

Insect Collection, Herbarium, Bird Observations, Herbarium

Hello?

Server Not Available

Server Not Available

GBIF Data Portal

Hi!
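The failure mode on this slide can be sketched as a fan-out query with a deadline. In this minimal sketch the three providers are hypothetical in-process functions standing in for remote servers: one answers, one is down, and one is too slow. The portal merges whatever arrives before its timeout and reports the rest as failed, so the same query to a purely federated portal could return different results from one moment to the next.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


# Hypothetical providers standing in for remote databases.
def insect_collection(country):
    return [{"scientificName": "Apis mellifera", "country": country}]


def herbarium(country):
    raise ConnectionError("Server Not Available")


def bird_observations(country):
    time.sleep(0.5)  # answers, but too late for the portal's deadline
    return [{"scientificName": "Passer montanus", "country": country}]


PROVIDERS = {"insects": insect_collection, "herbarium": herbarium,
             "birds": bird_observations}


def federated_query(country, timeout=0.2):
    """Send the query to every provider in parallel and merge what returns in time."""
    pool = ThreadPoolExecutor(max_workers=len(PROVIDERS))
    futures = {pool.submit(fn, country): name for name, fn in PROVIDERS.items()}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on providers that missed the deadline
    records = []
    failed = [futures[f] for f in not_done]  # timed out
    for f in done:
        if f.exception() is not None:  # provider raised, e.g. server unavailable
            failed.append(futures[f])
        else:
            records.extend(f.result())
    return records, sorted(failed)


if __name__ == "__main__":
    records, failed = federated_query("Thailand")
    print(len(records), "record(s); no answer from:", failed)
```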

The rise of Indexing

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

Send me an index of all of your data

GBIF Data Portal (now with Data!)

GBIF Data Portal as a Data Index

The wrong tools for the job

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

Send me an index of your data once per month

Here is page one.

If I go offline, start again.

Not too fast!

You ask the same questions every time

GBIF Data Portal (now with Data!)
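The indexing approach, and the friction described on this slide, can be sketched as a paged harvest that feeds a local index. Everything here is hypothetical: `FlakyProvider` simulates a provider that goes offline once, `delay` stands in for throttling ("Not too fast!"), and a failure discards all progress and restarts from page one, mirroring "If I go offline, start again". Once harvested, a query such as "Any data records from Thailand?" is answered from the index without contacting any provider.

```python
import time
from collections import defaultdict


class FlakyProvider:
    """Hypothetical provider that serves records a page at a time and
    drops offline once, at `fail_at_page`, during the first harvest."""

    def __init__(self, records, fail_at_page=None):
        self.records = records
        self.fail_at_page = fail_at_page

    def get_page(self, page, size):
        if page == self.fail_at_page:
            self.fail_at_page = None  # fail only once
            raise ConnectionError("provider went offline")
        start = page * size
        return self.records[start:start + size]


def harvest(provider, page_size=2, delay=0.0, max_attempts=3):
    """Return (records, attempts). Any failure discards progress and the
    harvest starts again from page one."""
    for attempt in range(1, max_attempts + 1):
        harvested, page = [], 0
        try:
            while True:
                batch = provider.get_page(page, page_size)
                if not batch:  # an empty page marks the end of the dataset
                    return harvested, attempt
                harvested.extend(batch)
                page += 1
                time.sleep(delay)  # throttle requests to spare the provider
        except ConnectionError:
            continue  # start again from page one
    raise RuntimeError("harvest failed after %d attempts" % max_attempts)


def build_index(records):
    """Index harvested records by country so queries never touch providers."""
    index = defaultdict(list)
    for record in records:
        index[record["country"]].append(record)
    return index


if __name__ == "__main__":
    data = [{"scientificName": n, "country": c} for n, c in
            [("A", "Thailand"), ("B", "Laos"), ("C", "Thailand"),
             ("D", "Peru"), ("E", "Thailand")]]
    records, attempts = harvest(FlakyProvider(data, fail_at_page=1))
    print(attempts, len(build_index(records)["Thailand"]))
```

Note that the whole dataset is re-requested on every scheduled harvest, even when nothing has changed ("You ask the same questions every time").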

Darwin Core Archives

A text-based solution to publishing biodiversity data
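A Darwin Core Archive is a zip file containing delimited text data files plus a `meta.xml` descriptor that maps each column to a Darwin Core term URI, which is what lets an indexer fetch a whole dataset from a single URL instead of paging through a query protocol. The reader below is a minimal sketch under stated assumptions: it reads only the core file, assumes UTF-8 text without quoting, falls back to tab if no delimiter is declared, and ignores extensions, default values, and any `eml.xml` metadata document. The example archive and its records are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "{http://rs.tdwg.org/dwc/text/}"  # namespace used by meta.xml

# Invented example archive contents.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="\\t" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""

OCCURRENCES = ("id\tscientificName\tcountry\n"
               "occ-1\tApis mellifera\tThailand\n"
               "occ-2\tPasser montanus\tLaos\n")


def make_example_archive():
    """Build a tiny archive in memory (a real one would be downloaded from a URL)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", OCCURRENCES)
    return buf.getvalue()


def read_core(archive_bytes):
    """Return the core rows of a Darwin Core Archive as {term: value} dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(f"{NS}core")
        location = core.find(f"{NS}files/{NS}location").text
        text = zf.read(location).decode("utf-8")
    skip = int(core.get("ignoreHeaderLines", "0"))
    # meta.xml writes a tab as the two characters backslash-t; tab fallback is an assumption.
    delimiter = core.get("fieldsTerminatedBy", "\\t").replace("\\t", "\t")
    # Column index -> short term name (last segment of the term URI).
    columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
               for f in core.findall(f"{NS}field")}
    rows = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [{columns[i]: v for i, v in enumerate(row) if i in columns}
            for row in list(rows)[skip:]]


if __name__ == "__main__":
    for record in read_core(make_example_archive()):
        print(record)
```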

A Refined Approach

Insect Collection, Herbarium, Bird Observations, Herbarium

Any data records from Thailand?

This is fast!

GBIF Data Portal (now with Data!)

This is easy

URL URL URL URL

Growth

2007: 70 million
2008: 147 million
2009: 180 million
2010: 201 million
Today: 302 million

Need for a new standard identified

Issue #2: Geospatial Integration

Goal – Provide accurate reporting of nationally-bound data

Challenge – Inaccurate recording of geospatial coordinates

Geo-referenced USA data

Verbatim data as shared on the network

Issue #2: Geospatial Integration

Remediation included:

• Integrating national shapefiles to verify that coordinates fell within country boundaries
  – Including EEZ (Exclusive Economic Zone) boundaries
  – Including islands
• Identifying outliers
• Qualifying the nature of the error (e.g., “coordinates inverted”)
• Marking these records and omitting them from display
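The remediation steps above amount to a point-in-polygon test followed by a search for a simple explanation when the test fails. This sketch uses ray casting against a single polygon; the rectangle below is a crude hypothetical stand-in for a national boundary, not real data. A production check would load the national and EEZ shapefiles (multipolygons once islands are included) with a GIS library, and the list of candidate corrections here is illustrative rather than GBIF's actual rule set.

```python
# Crude hypothetical boundary as (longitude, latitude) vertices; NOT a real
# national shapefile, which would be a multipolygon including EEZ and islands.
BOUNDARY = [(97.0, 5.0), (106.0, 5.0), (106.0, 21.0), (97.0, 21.0)]

# Simple, illustrative explanations for a point that falls outside the boundary.
CANDIDATE_FIXES = [
    ("coordinates inverted", lambda lat, lon: (lon, lat)),
    ("latitude sign flipped", lambda lat, lon: (-lat, lon)),
    ("longitude sign flipped", lambda lat, lon: (lat, -lon)),
]


def point_in_polygon(x, y, polygon):
    """Ray casting: a point is inside if a ray towards +x crosses an odd number of edges."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_record(lat, lon, boundary=BOUNDARY):
    """Return None if the point is inside; otherwise a label for the first fix
    that would move it inside, or 'outside country' if none does."""
    if point_in_polygon(lon, lat, boundary):
        return None
    for label, fix in CANDIDATE_FIXES:
        new_lat, new_lon = fix(lat, lon)
        if point_in_polygon(new_lon, new_lat, boundary):
            return label
    return "outside country"


if __name__ == "__main__":
    for lat, lon in [(13.75, 100.5), (100.5, 13.75), (-13.75, 100.5), (48.85, 2.35)]:
        print((lat, lon), check_record(lat, lon))
```

Records that get a label can then be flagged and hidden from maps while keeping the qualifier for the data publisher.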

Geo-referenced USA data

Data following interpretation:
– Coastal regions recognised
– Offshore islands recognised

Issue #3: Taxonomic Integration

• Goal – Provide access to biodiversity data according to taxonomic groups and concepts

• Challenge:
  – Heterogeneous and sometimes inaccurate classification
  – Same taxon appearing in different classifications
  – Presence of homonyms that complicate reconciling the above
  – Misspellings
  – Wide range of orthographies for the same name
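Misspellings and orthographic variants can be handled in two stages: normalise the verbatim string, then look for an exact or close match in a reference list. The sketch below uses Python's standard `difflib` for the fuzzy step. The accepted-name list, the 0.85 similarity cutoff, the ASCII-only cleanup, and the rule of keeping only the first two words (which discards authorship but also infraspecific ranks) are simplifying assumptions, not a description of GBIF's name-matching service. Homonyms are not handled here.

```python
import difflib
import re

# Hypothetical reference list of accepted canonical names.
ACCEPTED = ["Apis mellifera", "Oenanthe oenanthe", "Trochilus polytmus"]


def normalise(verbatim):
    """Reduce orthographic noise: strip digits and punctuation, keep the first
    two words (genus and epithet), and apply conventional capitalisation."""
    words = re.sub(r"[^A-Za-z\s-]", " ", verbatim).split()[:2]
    if not words:
        return ""
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def match_name(verbatim, cutoff=0.85):
    """Return (accepted_name, match_type); match_type is 'exact', 'fuzzy' or 'none'."""
    name = normalise(verbatim)
    if name in ACCEPTED:
        return name, "exact"
    close = difflib.get_close_matches(name, ACCEPTED, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"


if __name__ == "__main__":
    for raw in ["apis  MELLIFERA Linnaeus, 1758", "Apis melifera", "Quercus robur"]:
        print(repr(raw), "->", match_name(raw))
```

Reporting the match type alongside the name lets a portal treat fuzzy matches more cautiously than exact ones.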

Enabled taxonomic data to be published through GBIF

Trochilidae (Hummingbirds) (today)

Misinterpretations (hummingbirds are found only in the Western Hemisphere)

Trochilidae (Hummingbirds) (next month)

Improved interpretation

Search for Oenanthe (water dropwort plant or wheatear bird)

Difficult for user to interpret

Accurate search results

Today

Next month

Improved the means to match names
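A homonym such as Oenanthe cannot be resolved from the name alone; one common approach is to use whatever higher classification the record itself carries. In this minimal sketch, assuming a hypothetical two-entry checklist, a concept is accepted only when exactly one candidate agrees with every supplied rank, and the name is otherwise left unresolved, which is safer than guessing and placing a plant record under a bird family.

```python
# Hypothetical checklist: one name may map to several concepts (homonyms).
CHECKLIST = {
    "Oenanthe": [
        {"kingdom": "Plantae", "family": "Apiaceae", "common": "water dropwort"},
        {"kingdom": "Animalia", "family": "Muscicapidae", "common": "wheatear"},
    ],
    "Trochilus": [
        {"kingdom": "Animalia", "family": "Trochilidae", "common": "hummingbird"},
    ],
}


def resolve(name, **hints):
    """Return the single concept for `name` that agrees with every supplied
    rank (e.g. kingdom="Plantae"), or None if unknown or still ambiguous."""
    candidates = CHECKLIST.get(name, [])
    consistent = [c for c in candidates
                  if all(c.get(rank) == value for rank, value in hints.items())]
    return consistent[0] if len(consistent) == 1 else None


if __name__ == "__main__":
    print(resolve("Oenanthe"))                     # ambiguous without context
    print(resolve("Oenanthe", kingdom="Plantae"))  # the plant concept
    print(resolve("Oenanthe", family="Muscicapidae"))
```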

In summary

• GBIF has had to deploy different data access strategies in order to effectively scale

• Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF

• Geospatial filtering via shapefiles provides a basis for more accurate national reporting
  – Basis for additional services later (e.g., ecosystem shapefiles, protected areas)

• Heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure
  – Comprehensive, authoritative taxonomic data is a key organisational component of collections data

Thank you
