Integrating Bio-Data Lee Belbin Manager, Infrastructure Project TDWG (Biodiversity Information Standards)

TDWG at the University of Tasmania

Page 1: TDWG at the University of Tasmania

Integrating Bio-Data

Lee Belbin

Manager, Infrastructure Project

TDWG (Biodiversity Information Standards)

Page 2: TDWG at the University of Tasmania

Who has heard of GBIF?

Page 3: TDWG at the University of Tasmania

GBIF

The Global Biodiversity Information Facility

Page 4: TDWG at the University of Tasmania

GBIF

International organisation established to share bio-data

Page 5: TDWG at the University of Tasmania

GBIF

Supported by ~42 countries (including Australia) and ~35 international organisations

Page 6: TDWG at the University of Tasmania

GBIF

The Australian hub is through ABRS (ABIF)

Page 7: TDWG at the University of Tasmania
Page 8: TDWG at the University of Tasmania

Who has heard of TDWG?

Page 9: TDWG at the University of Tasmania

…that’s what I figured

Page 10: TDWG at the University of Tasmania

TDWG

Formerly: The Taxonomic Database Working Group

Page 11: TDWG at the University of Tasmania

…but more accurately referred to as

Biodiversity Information Standards

Page 12: TDWG at the University of Tasmania

Biodiversity Information Standards

International group responsible for standards and protocols for sharing bio-data

Page 13: TDWG at the University of Tasmania

EOL?

Encyclopedia of Life

Page 14: TDWG at the University of Tasmania

ALA?

Atlas of Living Australia

Page 15: TDWG at the University of Tasmania

GBIF, EOL, ALA …

…are now or will be based on TDWG standards

Page 16: TDWG at the University of Tasmania

So what?

(…hence this talk…)

Page 17: TDWG at the University of Tasmania

Science is getting more collegiate

… a good thing.

Page 18: TDWG at the University of Tasmania

The Project

US$2 million over 2.5 years (Gordon & Betty Moore Foundation)

Page 19: TDWG at the University of Tasmania

Aim

To improve the standards for sharing 'bio-data'

Page 20: TDWG at the University of Tasmania

Why?

The whole is (far) more than the sum of the parts…

Page 21: TDWG at the University of Tasmania

People

Lee Belbin (Hobart: Manager)

Roger Hyam (Edinburgh: Systems Architect)

Ricardo Pereira (Brasilia: Software Engineer)

Donald Hobern (Copenhagen: GBIF & now Manager of the ALA)

Stan Blum (San Francisco: TDWG old timer!)

Page 22: TDWG at the University of Tasmania

Once … we had paper!

Page 23: TDWG at the University of Tasmania

and calculators!

Page 24: TDWG at the University of Tasmania
Page 25: TDWG at the University of Tasmania
Page 26: TDWG at the University of Tasmania

The attitude:

“It’s Mine!”

Page 27: TDWG at the University of Tasmania

Then…

Page 28: TDWG at the University of Tasmania
Page 29: TDWG at the University of Tasmania
Page 30: TDWG at the University of Tasmania
Page 31: TDWG at the University of Tasmania

… but we are moving to

…far more open sharing and integration of data

Page 32: TDWG at the University of Tasmania

This will enable

…more effective environmental and species conservation / management

(among many other things)

Page 33: TDWG at the University of Tasmania

To do this, we need effective standards

…using ‘web 2.0’ technologies

Page 34: TDWG at the University of Tasmania

Video

‘The web is us’

http://www.youtube.com/watch?v=6gmP4nk0EOE

Page 35: TDWG at the University of Tasmania

Standards?

…Good ones are transparent to most who use them

Page 36: TDWG at the University of Tasmania

But for your education… I’ll give you a little insight… it will be good for you.

Promise

Page 37: TDWG at the University of Tasmania

Standards to exchange bio-data have three components:

1. An ontology
2. GUIDs
3. Transport protocols

Page 38: TDWG at the University of Tasmania

1. Ontology

…is a data model that represents a formal set of concepts within a domain and the relationships between those concepts

Page 39: TDWG at the University of Tasmania

Ontologies

…are the basis of the Semantic Web, where objects are given meaning that both computers and humans can understand

Page 40: TDWG at the University of Tasmania

Ontologies

…can be used by machines to reason about the objects within that domain
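
As a rough illustration of what "reasoning" can mean here, the sketch below stores a handful of "is a" relationships and lets the machine infer ones that were never stated directly. The dictionary layout and the function names (`ancestors`, `is_kind_of`) are assumptions made for this example, not part of any TDWG standard.

```python
# Hypothetical mini-ontology: each concept points to its broader concept.
# The structure and names are illustrative only.
IS_A = {
    "Melissotarsus": "Formicidae",  # a genus of ants
    "Formicidae": "Insecta",
    "Insecta": "Animalia",
}


def ancestors(concept, is_a=IS_A):
    """Follow 'is a' links upward and return every broader concept in order."""
    found = []
    while concept in is_a:
        concept = is_a[concept]
        found.append(concept)
    return found


def is_kind_of(concept, broader, is_a=IS_A):
    """Infer a relationship that is not stated directly in the data."""
    return broader in ancestors(concept, is_a)


# Nothing above says "Melissotarsus is an insect", yet it can be derived.
print(is_kind_of("Melissotarsus", "Insecta"))
```

Only three facts are stored, but the machine can answer questions about any pair of concepts along the chain.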

Page 41: TDWG at the University of Tasmania
Page 42: TDWG at the University of Tasmania

Resource Description Framework …

RDF is the language of the Semantic Web

Page 43: TDWG at the University of Tasmania
Page 44: TDWG at the University of Tasmania

ALL data can be stored in the form of ‘RDF triples’…

subject – predicate (verb) – object

Wine – has vintage – 2005
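
A minimal sketch of the triple idea, using plain Python tuples. Real RDF identifies subjects and predicates with URIs rather than bare strings, so treat this as a simplification; the `match` helper and the two extra facts are assumptions added for illustration.

```python
# Each fact is one (subject, predicate, object) triple.
triples = [
    ("Wine", "has vintage", "2005"),  # the example from the slide
    ("Wine", "has colour", "red"),  # hypothetical extra fact
    ("Specimen", "has name", "Melissotarsus insularis"),  # hypothetical
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Everything known about "Wine":
for triple in match(triples, subject="Wine"):
    print(triple)
```

Because every fact has the same three-part shape, one generic query function can search data about wine and data about specimens alike.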

Page 45: TDWG at the University of Tasmania
Page 46: TDWG at the University of Tasmania

2. GUIDs

Globally Unique Identifiers

Page 47: TDWG at the University of Tasmania

GUIDs

Assigned by authorities to their (bio) objects

Page 48: TDWG at the University of Tasmania

GUIDs

…Remain attached to data objects (with attribution!)

Page 49: TDWG at the University of Tasmania

GUIDs

… When ‘clicked’ return ‘semantic’ metadata / data

Page 50: TDWG at the University of Tasmania

GUID of Choice …

Life Science Identifiers (LSIDs)
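
An LSID is a URN of the form `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing `:<revision>`. The sketch below splits one into its parts. The `LSID` tuple, the `parse_lsid` function, and the `example.org` identifier are hypothetical names made up for this example, and the parser only checks the overall shape, not every rule of the LSID specification.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str  # who issued the identifier, usually a domain name
    namespace: str  # a grouping chosen by the authority
    object_id: str  # the identifier within that namespace
    revision: Optional[str] = None  # optional version of the object


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts, rejecting obviously bad input."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])


# Hypothetical identifier, for illustration only.
print(parse_lsid("urn:lsid:example.org:specimens:12345"))
```

The authority component is what ties the identifier back to the institution that assigned it, which is how attribution can travel with the data object.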

Page 51: TDWG at the University of Tasmania

3. Transport Protocols

…Map local data to global standards
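
One way to picture this mapping step: each data holder keeps its own column names and publishes a lookup table that translates them to shared names. This is a sketch only; the local column names, the shared field names, and the `to_shared` function are illustrative placeholders, not taken from the talk.

```python
# Hypothetical mapping from one collection's local column names
# to the field names of a shared standard.
FIELD_MAP = {
    "sp_name": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "cat_no": "catalogNumber",
}


def to_shared(record, field_map=FIELD_MAP):
    """Rename mapped fields; fields with no mapping are left out."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


local_record = {
    "sp_name": "Melissotarsus insularis",
    "lat": -18.9,
    "notes": "internal use only",
}
print(to_shared(local_record))
```

The local database is untouched; only the view it exposes to the outside world follows the standard.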

Page 52: TDWG at the University of Tasmania

Transport Protocols

… Enable searching across geographically separated data repositories (based on different systems)
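
The sketch below imitates that kind of search in miniature: two in-memory "repositories" with different local schemas are queried through small adapters that translate each record into one shared shape. It is not the protocol itself, and the repository names, records, and `search_all` function are all assumptions invented for this example.

```python
# Two hypothetical repositories, each with its own local schema.
HOBART = [{"sp_name": "Melissotarsus insularis", "cat_no": "H-1"}]
EDINBURGH = [
    {"taxon": "Melissotarsus insularis", "id": "E-7"},
    {"taxon": "Formica rufa", "id": "E-8"},
]

# Per-repository adapters translate local records into one shared shape.
PROVIDERS = {
    "hobart": (HOBART, lambda r: {"name": r["sp_name"], "id": r["cat_no"]}),
    "edinburgh": (EDINBURGH, lambda r: {"name": r["taxon"], "id": r["id"]}),
}


def search_all(name, providers=PROVIDERS):
    """Run the same query against every repository and merge the hits."""
    hits = []
    for source, (records, adapt) in providers.items():
        for record in records:
            shared = adapt(record)
            if shared["name"] == name:
                hits.append({"source": source, **shared})
    return hits


for hit in search_all("Melissotarsus insularis"):
    print(hit)
```

The person searching never needs to know that one repository calls the field `sp_name` and the other calls it `taxon`.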

Page 53: TDWG at the University of Tasmania

The transport protocol of choice …

TAPIR

TDWG Access Protocol for Information Retrieval

Page 54: TDWG at the University of Tasmania

Transport Protocol

Video

http://www.youtube.com/watch?v=x9404is3RJ8

Page 55: TDWG at the University of Tasmania

An Example

Antbase, Google, GenBank, PubMed ‘skimmed’ for RDF and GUIDs using TAPIR

Page 56: TDWG at the University of Tasmania

… Emergent Properties

…there are specimens that have been barcoded and which are labelled in GenBank as unidentified (i.e., names like "Melissotarsus sp. BLF m1"), but the same specimen has a proper name in AntWeb (e.g., casent0107665-d01 is Melissotarsus insularis).

We can then use this information to add value to GenBank. For example, a search of GenBank for sequences for Melissotarsus insularis finds nothing, but it does have sequences for this taxon, albeit under the name "Melissotarsus sp. BLF m1".

Rod Page
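
The effect described in the quote above can be sketched as a join on a shared specimen identifier. The names and the specimen code come from the quote; the assumption that both sources key their records by that same code, and the functions `resolve_names` and `search_sequences`, are made up for this illustration.

```python
# Records keyed by a shared specimen identifier.
# Assumption: both sources carry the same specimen code.
genbank = {"casent0107665-d01": "Melissotarsus sp. BLF m1"}
antweb = {"casent0107665-d01": "Melissotarsus insularis"}


def resolve_names(provisional, identified):
    """Map each provisional label to the proper name of the same specimen."""
    return {
        provisional[sid]: identified[sid]
        for sid in provisional.keys() & identified.keys()
        if provisional[sid] != identified[sid]
    }


def search_sequences(name, provisional, identified):
    """Find specimens whose sequences belong to `name` under either label."""
    return sorted(
        sid
        for sid, label in provisional.items()
        if label == name or identified.get(sid) == name
    )


print(resolve_names(genbank, antweb))
print(search_sequences("Melissotarsus insularis", genbank, antweb))
```

Searching one source alone finds nothing for the proper name; the match only appears once the two sources are linked, which is the emergent property.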

Page 57: TDWG at the University of Tasmania