ANTABIF training

Introduction presentation for ANTABIF training.

ANTABIF Training: getting your data online

Bruno Danis, Anton Van de Putte and Nabil Youdjou

Wednesday 26 October 2011

Objectives

• Become familiar with ANTABIF

• Learn about the architecture, functionalities, tools and standards we offer

• Do hands-on exercises with dummy and *real* data

• Collect feedback on fitness for use for this community

On the Menu Today

• Background about ANTABIF

• Technical overview

• Standards, tools and resources

• Functionalities

• Future directions

• Hands on

Background

Antarctic Treaty

« In order to promote international cooperation in scientific investigation in Antarctica, […], the Contracting Parties agree that, to the greatest extent feasible and practicable: […]

Scientific observations and results from Antarctica shall be exchanged and made freely available. »

SCAR-MarBIN & ANTABIF

• www.scarmarbin.be

• www.antabif.be or www.biodiversity.aq

• Core funding: BELSPO.be

• International Polar Year 2007/08

• Census of Antarctic Marine Life

• Ocean Biogeographic Information System

• Global Biodiversity Information Facility

General Philosophy

• Build an electronic ecosystem

• Offer free and open access to data and technology

• Expose all the (biodiversity) data and metadata, in multiple contexts

• Remain community-driven, and collaborative

• Adopt strong standardization

• Work for science, conservation, management

Achievements

• The first RAMS

• Board of 60+ editors

• Feeds WoRMS, CoL and EoL

• 17,098 taxa (RAMS)

• Building a dynamic RAS

• 24,248 taxa (RAS)

Achievements

• 1,288,441 records

• 198 datasets

• 5,235 taxa

• Feeds OBIS, GBIF

• Downloadable

• WebGIS

• Webservices

Achievements

• Up since Oct 2005

• open access

• 909,915 visitors

• 8,093,774 hits

• 51,416,196 downloaded records

• Citations: 183

• Cited Publications: 38

Achievements

Records      SCAR-MarBIN (SMB)   ANTABIF     Progress (ANTABIF / SMB)
Metadata     198                 7,200       ×36.4
Occurrence   1,288,441           2,659,392   ×2.1
Taxonomy     17,184              30,472      ×1.8

Nuts and Bolts

100% Open Source

• Language: Ruby

• Framework: Rails (ActiveRecord) and YUI

• (smart) Search engine: Full text (Elasticsearch-Lucene)

• Database/GIS server/SpatialDB: PostgreSQL/GeoServer/PostGIS

• Mapping client: OpenLayers

• Web services: RESTish (all resources)

• Protocols/Standards: DIF, DwC, DwC-A, TAPIR, etc.

• GBIF tools: HIT, IPT

• Hosting: BeBIF (ULB/VUB joint IT Center)

• Metadata systems: GCMD API (DIF)

Data flow (your point of view)

Your data → standardize → DwC-A → upload → IPT → publish → ANTABIF

IPT → publish → Data Paper
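
The "standardize" step can be pictured with a minimal Python sketch. It assumes a spreadsheet whose local column headers ("Species", "Lat", "Lon", "Date") are purely hypothetical, as is the `demo` identifier prefix; only the Darwin Core term names on the right-hand side of the mapping are real. It is an illustration of the idea, not the tooling used by ANTABIF.

```python
import csv
import io

# Hypothetical mapping from local spreadsheet headers to Darwin Core term names.
COLUMN_TO_DWC = {
    "Species": "scientificName",
    "Lat": "decimalLatitude",
    "Lon": "decimalLongitude",
    "Date": "eventDate",
}


def standardize(rows, id_prefix="demo"):
    """Rename columns to Darwin Core terms and give each record an occurrenceID."""
    records = []
    for number, row in enumerate(rows, start=1):
        record = {"occurrenceID": f"{id_prefix}:{number}"}  # illustrative identifier scheme
        for local_name, dwc_term in COLUMN_TO_DWC.items():
            record[dwc_term] = row.get(local_name, "").strip()
        records.append(record)
    return records


def to_tab_delimited(records):
    """Serialize the records as tab-delimited text with one header row."""
    buffer = io.StringIO()
    fieldnames = ["occurrenceID", *COLUMN_TO_DWC.values()]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Dummy data, in the spirit of the hands-on exercises.
dummy_rows = [
    {"Species": "Pygoscelis adeliae ", "Lat": "-66.66", "Lon": "140.00", "Date": "2011-01-15"},
]
core_text = to_tab_delimited(standardize(dummy_rows))
print(core_text)
```

The resulting text is the kind of file that could serve as the core data file of a DwC-A before upload to an IPT.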

Data flow (our point of view)

Standards, tools, resources

Metadata

Information about datasets deteriorates over time!

Metadata

• preferred metadata catalogue = Antarctic Master Directory (a subset of GCMD)

• standard = DIF (Directory Interchange Format)

• used by the whole SCAR community

• crawled by Google, Scopus...

DarwinCore

"A vocabulary of words that biologists, hackers, and citizen scientists use to broadly describe the biodiversity of life on earth."

DarwinCore Archive

• Complete package of data

–One file

–Multiple files

• Text Files…

• Self-documenting

• Intended to be shared/distributed

DarwinCore Archive

The core data file is a text file.

Archives always have a ‘core’ data file

My_data.txt

DarwinCore Archive

meta.xml describes the mappings in the core data file (species.txt)

Darwin Core Archive (two files)
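
A two-file archive of this kind can be sketched in Python as follows. The file name `occurrence.txt`, the Occurrence row type and the three mapped terms are illustrative assumptions, not taken from the slides; the namespace and attribute names follow the Darwin Core text guide. In practice the IPT or the spreadsheet processor generates meta.xml for you, so this is only meant to show what the descriptor contains.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of the meta.xml descriptor
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"    # namespace of Darwin Core terms


def build_meta_xml(core_filename, terms):
    """Return meta.xml text: column 0 is the record id, later columns map to terms."""
    ET.register_namespace("", DWC_TEXT_NS)
    archive = ET.Element(f"{{{DWC_TEXT_NS}}}archive")
    core = ET.SubElement(archive, f"{{{DWC_TEXT_NS}}}core", {
        "rowType": DWC_TERMS + "Occurrence",   # assumed row type for this sketch
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",           # written literally as \t in the XML
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(core, f"{{{DWC_TEXT_NS}}}files")
    ET.SubElement(files, f"{{{DWC_TEXT_NS}}}location").text = core_filename
    ET.SubElement(core, f"{{{DWC_TEXT_NS}}}id", {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(core, f"{{{DWC_TEXT_NS}}}field",
                      {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


def build_archive(core_filename, core_text, terms):
    """Zip the core data file and its meta.xml into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(core_filename, core_text)
        archive.writestr("meta.xml", build_meta_xml(core_filename, terms))
    return buffer.getvalue()


# Dummy core file: one header line, one record.
core_text = ("occurrenceID\tscientificName\tdecimalLatitude\tdecimalLongitude\n"
             "demo:1\tPygoscelis adeliae\t-66.66\t140.00\n")
archive_bytes = build_archive(
    "occurrence.txt", core_text,
    ["scientificName", "decimalLatitude", "decimalLongitude"],
)
print(build_meta_xml("occurrence.txt", ["scientificName"]))
```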

DarwinCore Archive

Columns in extensions are mapped to Darwin Core using the meta.xml file

Multiple extensions are available

DarwinCore Archive

http://rs.gbif.org/extension/

Many extensions are available

Spreadsheet templates

• Metadata: describes a database or other data resource.

• Species Occurrence: stores basic species collection or observational data.

• Species Checklists: records and stores simple annotated species checklists.

Spreadsheet processor

• Web application that converts an Excel spreadsheet to a DwC-A.

• Excel files contain the data entry worksheets and the GBIF metadata profile.

• Worksheet supports publication of primary biodiversity data

• Processor performs data validation and transformation and returns a validated DwC-A

DwC-A validator

• Tests Darwin Core Archives

• Validates the content against the known extensions and terms registered within the GBIF network for sharing biodiversity data.
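
To give a feel for what such a check involves, here is a deliberately small Python sketch. It is not the GBIF validator and does not consult the registered extensions or terms; it only verifies that meta.xml and the core file are present and that every data row has enough columns for the mapped field indexes. It assumes a tab-delimited core file, and the helper `make_archive` and the file name `occurrence.txt` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def check_archive(archive_bytes):
    """Return a list of problems; an empty list means this sketch found none."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        names = archive.namelist()
        if "meta.xml" not in names:
            return ["meta.xml is missing"]
        core = ET.fromstring(archive.read("meta.xml")).find("dwc:core", NS)
        if core is None:
            return ["meta.xml has no core element"]
        location = core.findtext("dwc:files/dwc:location", namespaces=NS)
        if location not in names:
            return [f"core file {location!r} is missing"]
        indexes = [int(field.get("index"))
                   for field in core.findall("dwc:field", NS)
                   if field.get("index") is not None]
        width = max(indexes, default=0) + 1
        skip = int(core.get("ignoreHeaderLines", "0"))
        text = archive.read(location).decode(core.get("encoding", "UTF-8"))
        # Assumption: fields are tab-separated; fieldsTerminatedBy is not interpreted here.
        for number, line in enumerate(text.splitlines()[skip:], start=skip + 1):
            if len(line.split("\t")) < width:
                problems.append(f"line {number}: expected at least {width} columns")
    return problems


META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" encoding="UTF-8"
        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
  </core>
</archive>"""


def make_archive(core_text):
    """Hypothetical helper: build a tiny in-memory archive around the given core text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", META)
        archive.writestr("occurrence.txt", core_text)
    return buffer.getvalue()


good = make_archive("id\tscientificName\tdecimalLatitude\n1\tPygoscelis adeliae\t-66.66\n")
bad = make_archive("id\tscientificName\tdecimalLatitude\n1\tPygoscelis adeliae\n")
print(check_archive(good))  # []
print(check_archive(bad))   # ['line 2: expected at least 3 columns']
```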

IPT - Integrated Publishing Toolkit

• Publishing primary biodiversity data

• Resources

• Metadata

• Source Data (text, zip, SQL)

• Source Mappings

• Visibility

• Published Release

The Data Paper concept

• A scholarly journal publication whose primary purpose is to describe a dataset or group of datasets, rather than to report a research investigation.

• Benefits of the Data Paper

–Scholarly credit to Data Publishers

–Describe the data in structured human readable form

–Bring the existence of the data to the attention of the scholarly community

Data Paper: Incentivising Data Discovery

Data Paper

Metadata document

Reward data publishing

Step-by-Step

• Complete the metadata of a dataset using the metadata editor in IPT 2.0.2

• Generate the ‘Data Paper’ manuscript (menu: Manage Resource – RTF Download)

• Submit the manuscript for possible publication in one of the Pensoft journals (ZooKeys, PhytoKeys, BioRisk, NeoBiota)

• Revisions (if any) are carried out using the metadata editor in IPT 2.0.2, and the manuscript is resubmitted to the Pensoft Open Journal System

Once the paper is accepted

• A Digital Object Identifier (DOI) is assigned to the Data Paper

• The paper is published in (a) print format, (b) PDF format and (c) semantically enhanced HTML, and (d) its XML is archived in PubMed Central

• The DOI of the Data Paper is linked with the persistent identifier of the metadata document in the GBIF Registry

• The Data Paper is indexed by Web of Knowledge (ISI), PubMed Central, Scopus, Zoological Record, Google Scholar, CAB Abstracts, Directory of Open Access Journals (DOAJ) and EBSCO

Important to consider

• Metadata is complete in all respects

• All claims are adequately substantiated

• Data described in the ‘Data Paper’ are freely available at the time the manuscript is submitted

ORC

• GBIF’s Online Resource Center

• Provides access to documents, best practices, tools and links

• Wide thematic scope

• Different ways of accessing resources

• Enabling community contributions

• Different levels of resource access

• Multilanguage support

Functionalities

www.biodiversity.aq

• general website

• latest news

• contact

• sponsors

• governance

• links

data.biodiversity.aq

• find primary biodiversity data

• visualize occurrence data on map

• view taxonomic data

• download data

• view metrics

• send feedback

• access technical documentation

ipt.biodiversity.aq

• prepare and clean your data

• publish primary biodiversity data

• publish metadata

• push data and metadata to ANTABIF & GBIF

• get a Data Paper

afg.biodiversity.aq

• (nice-looking) Identification aid

• Publication/sharing platform for customized Field Guides

• High quality (useful) pictures

• Expert Descriptions

• Built dynamically from various sources

share.biodiversity.aq

• download shared resources

• reports, communication material

• original datasets, tools, resources

PIC (Polar Information Commons)

• polarcommons.org

• Emergency solution for orphan datasets

• Setup of a commons

• IT cloud

• Set of norms

• All polar data (IPY)

• Simple procedure!

Future directions

Architecture

• A network of IPTs

• Enhanced data flow

• Community involved in data management

• Enhanced interoperability

• Optimization of research efforts/resources

• Integrative, connected science

• Factual, adaptive conservation

Challenges ahead

• Data intensive science

• Data deluge

• Digital divides

• Other data types and integration

• Orphan datasets

• Cultural change

Hands on now

The rest of the day

• Using the portals

• Using data tools

• templates

• data validation

• documentation

• publishing

http://share.biodiversity.aq/training/
