Integrating Bio-Data
Lee Belbin
Manager, Infrastructure Project
TDWG (Biodiversity Information Standards)
Who has heard of GBIF?
GBIF
The Global Biodiversity Information Facility
GBIF
International organisation established to share bio-data
GBIF
Supported by ~42 countries (including Australia) and ~35
international organisations
GBIF
The Australian hub is through ABRS (ABIF)
Who has heard of TDWG?
…that’s what I figured
TDWG
Formerly: The Taxonomic Database Working Group
…but more accurately referred to as
Biodiversity Information Standards
Biodiversity Information Standards
International group responsible for standards and protocols for
sharing bio-data
EOL?
Encyclopedia of Life
ALA?
Atlas of Living Australia
GBIF, EOL, ALA …
…are now or will be based on TDWG standards
So what?
(…hence this talk…)
Science is getting more collegiate
… a good thing.
The Project
US$2 million over 2.5 years (Gordon & Betty Moore
Foundation)
Aim
To improve the standards for sharing 'bio-data'
Why?
The whole is (far) more than the sum of the parts…
People
Lee Belbin (Hobart: Manager)
Roger Hyam (Edinburgh: Systems Architect)
Ricardo Pereira (Brasilia: Software Engineer)
Donald Hobern (Copenhagen: GBIF & now Manager of the ALA)
Stan Blum (San Francisco: TDWG old timer!)
Once … we had paper!
and calculators!
The attitude:
“It’s Mine!”
Then…
… but we are moving to
…far more open sharing and integration of data
This will enable
…more effective environmental and species conservation / management
(among many other things)
To do this, we need effective standards
…using ‘web 2.0’ technologies
Video
‘The web is us’
http://www.youtube.com/watch?v=6gmP4nk0EOE
Standards?
…Good ones are transparent to most who use them
But for your education…I’ll give you a little insight … it will be
good for you.
Promise
Standards to exchange bio-data have three components:
1. An ontology
2. GUIDs
3. Transport protocols
1. Ontology
Is a data model that represents a formal set of concepts within a domain and the relationships
between those concepts
Ontologies
…are the basis of the Semantic Web, where objects are given
meaning which computers and humans can understand
Ontologies
…can be used by machines to reason about the objects
within that domain
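To make 'machines can reason' concrete, here is a minimal sketch in Python. The tiny hierarchy and the helper names (BROADER, ancestors, is_a) are hypothetical and not part of any TDWG ontology; the point is only that a fact never stated directly (this ant is an insect) can be derived from the stated relationships.

```python
# Hypothetical mini-ontology: each concept points to the next broader concept.
BROADER = {
    "Melissotarsus insularis": "Melissotarsus",
    "Melissotarsus": "Formicidae",
    "Formicidae": "Insecta",
}

def ancestors(concept):
    """Follow the 'broader' links upward and return every broader concept."""
    found = []
    while concept in BROADER:
        concept = BROADER[concept]
        found.append(concept)
    return found

def is_a(concept, broader):
    """Infer whether `concept` falls under `broader`, even if never stated directly."""
    return broader in ancestors(concept)

if __name__ == "__main__":
    print(is_a("Melissotarsus insularis", "Insecta"))
```

Real ontologies express these relationships in a formal language rather than a Python dictionary, but the reasoning step is the same idea.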
Resource Description Framework …
RDF is the language of the Semantic Web
ALL data can be stored in the form of ‘RDF triples’ …
subject – predicate (verb) – object
Wine – has vintage – 2005
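The triple idea can be sketched in a few lines of Python. This is an illustration only: real RDF identifies subjects and predicates with URIs, whereas plain strings are used here for readability, and the `match` helper and the second wine fact are hypothetical.

```python
# Each fact is one (subject, predicate, object) triple.
triples = [
    ("Wine", "has vintage", "2005"),
    ("Wine", "has colour", "red"),  # hypothetical extra fact for illustration
    ("casent0107665-d01", "identified as", "Melissotarsus insularis"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

if __name__ == "__main__":
    # "What do we know about Wine?"
    for triple in match(triples, subject="Wine"):
        print(triple)
```

Because every fact has the same three-part shape, one generic query function serves for any kind of data.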
2. GUIDs
Globally Unique Identifiers
GUIDs
Assigned by authorities to their (bio) objects
GUIDs
…Remain attached to data objects (with attribution!)
GUIDs
… When ‘clicked’ return ‘semantic’ metadata / data
GUID of Choice …
Life Science Identifiers (LSIDs)
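An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional revision at the end. The sketch below splits one into its parts; the `parse_lsid` helper and the example identifier (authority example.org, namespace specimens) are made up for illustration and do not refer to a real resolver.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str           # who issued the identifier
    namespace: str           # the collection or database within that authority
    object_id: str           # the object itself
    revision: Optional[str]  # optional version of the object

def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

if __name__ == "__main__":
    # Hypothetical identifier, not issued by any real authority.
    print(parse_lsid("urn:lsid:example.org:specimens:casent0107665-d01"))
```

The authority part is what carries the attribution: wherever the identifier travels, it still says who issued it.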
3. Transport Protocols
…Map local data to global standards
Transport Protocols
… Enable searching across geographically separated data repositories (based on different
systems)
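The two jobs above (mapping local data to a shared vocabulary, then searching across repositories) can be sketched as follows. Everything here is a stand-in: the two repositories, their field names and the catalogue numbers are invented, and the shared term names are borrowed from Darwin Core (a TDWG standard) as an assumption about which vocabulary would be used.

```python
# Two hypothetical repositories holding the same kind of record in different shapes.
MUSEUM_A = [{"sp_name": "Melissotarsus insularis", "cat_no": "A-001"}]
MUSEUM_B = [
    {"TaxonName": "Melissotarsus insularis", "Barcode": "B-77"},
    {"TaxonName": "Melissotarsus sp. BLF m1", "Barcode": "B-78"},
]
SOURCES = {"museum_a": MUSEUM_A, "museum_b": MUSEUM_B}

# Per-repository mapping from local field names to shared terms (assumed vocabulary).
MAPPINGS = {
    "museum_a": {"sp_name": "scientificName", "cat_no": "catalogNumber"},
    "museum_b": {"TaxonName": "scientificName", "Barcode": "catalogNumber"},
}

def to_shared(record, mapping):
    """Rename a local record's fields to the shared terms, dropping unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}

def search_all(term, value):
    """Run one query, phrased in shared terms, against every repository."""
    hits = []
    for name, records in SOURCES.items():
        for record in records:
            shared = to_shared(record, MAPPINGS[name])
            if shared.get(term) == value:
                hits.append({"source": name, **shared})
    return hits

if __name__ == "__main__":
    for hit in search_all("scientificName", "Melissotarsus insularis"):
        print(hit)
```

Each repository keeps its own internal layout; only the mapping has to be written once per provider.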
The transport protocol of choice …
TAPIR
TDWG Access Protocol for Information Retrieval
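For a feel of what a TAPIR request looks like, here is a rough sketch that only builds a request URL and sends nothing. It assumes the key-value style of request with an `op` parameter naming the operation (metadata, capabilities, inventory, search or ping); check the parameter names against the TAPIR specification before relying on them. The endpoint address and the `tapir_url` helper are hypothetical.

```python
from urllib.parse import urlencode

# Operation names as understood from the TAPIR specification (assumption).
OPERATIONS = {"metadata", "capabilities", "inventory", "search", "ping"}

def tapir_url(endpoint, operation, **params):
    """Build a key-value request URL for a TAPIR-style provider (nothing is sent)."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    query = urlencode({"op": operation, **params})
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"

if __name__ == "__main__":
    # Hypothetical provider address.
    print(tapir_url("http://example.org/tapir/ants", "ping"))
```

The same URL shape works against any provider, whatever database sits behind it, which is what makes searching across separate repositories practical.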
Transport Protocol
Video
http://www.youtube.com/watch?v=x9404is3RJ8
An Example
Antbase, Google, GenBank, PubMed ‘skimmed’ for RDF
and GUIDs using TAPIR
… Emergent Properties
…there are specimens that have been barcoded and which are labelled in GenBank as unidentified (i.e., names like "Melissotarsus sp. BLF m1"), but the same specimen has a proper name in AntWeb (e.g., casent0107665-d01 is Melissotarsus insularis).
We can then use this information to add value to GenBank. For example, a search of GenBank for sequences for Melissotarsus insularis finds nothing, but it does have sequences for this taxon, albeit under the name "Melissotarsus sp. BLF m1".
Rod Page
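The emergent property in this example comes from joining two sources on a shared specimen identifier. A minimal sketch, using simplified stand-in records built from the names quoted above (the record layout and the helper names are hypothetical, not the real GenBank or AntWeb formats):

```python
# Simplified stand-ins for the two sources, linked by the specimen identifier.
genbank = [{"specimen": "casent0107665-d01", "organism": "Melissotarsus sp. BLF m1"}]
antweb = {"casent0107665-d01": "Melissotarsus insularis"}

def add_names(sequences, names_by_specimen):
    """Attach the proper name from the second source to each sequence record."""
    return [
        {**seq, "identified_as": names_by_specimen.get(seq["specimen"])}
        for seq in sequences
    ]

def find_sequences(sequences, taxon):
    """Search by taxon name, using the attached name when one is present."""
    return [
        seq for seq in sequences
        if taxon in (seq["organism"], seq.get("identified_as"))
    ]

if __name__ == "__main__":
    taxon = "Melissotarsus insularis"
    print(len(find_sequences(genbank, taxon)))                     # before linking
    print(len(find_sequences(add_names(genbank, antweb), taxon)))  # after linking
```

Neither source holds the answer on its own; the search only succeeds once the shared identifier ties the two records together.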