iMarine Products and services delivery. 4th iMarine Board, Rome, 17-18 October 2013. Pasquale Pagano (CNR), iMarine Technical Director


DESCRIPTION

The iMarine initiative provides a data infrastructure aimed at facilitating open access, the sharing of data, collaborative analysis, processing and mining, as well as the dissemination of newly generated knowledge. The iMarine data infrastructure is developed to support decision making in high-level challenges that require policy decisions typical of the ecosystem approach. The iMarine offering can be articulated in four bundles. A “bundle” is a set of services and technologies grouped according to a family of related tasks for achieving a common objective. Bundles can be customized and/or enriched into flexible, purpose-built Virtual Research Environments (VREs). VREs offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. Each VRE in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other iMarine applications.


Page 1: iMarine Products and Services delivery

iMarine Products and services delivery

4th iMarine Board

Rome

17-18 October 2013

Pasquale Pagano (CNR), iMarine Technical Director

Page 2: iMarine Products and Services delivery

Outline

Products and services development progress report

• BiolCube

• StatsCube

• GeosCube

• ConnectCube

Products and services catalogue at project conclusion

• Tiny selection of products

Page 3: iMarine Products and Services delivery

Google Analytics iMarine portal


Page 4: iMarine Products and Services delivery

Application Bundles

• Management and interpretation of biological and ecological data in the environment

• Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools

• Storage and interpretation of geospatial explicit information, including WPS processing

• Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities

A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective

Page 5: iMarine Products and Services delivery

PRODUCTS AND SERVICES

DEVELOPMENT PROGRESS REPORT

A fraction of the products and services belonging to BiolCube


Page 6: iMarine Products and Services delivery

Species Data Discovery

Search for multiple species

Search across several data providers

Search for all occurrences of a set of species and their synonyms

Search occurrences for all species belonging to a taxon group

Page 7: iMarine Products and Services delivery

Species Data Discovery

Search in GBIF for all the occurrences of 'sarda sarda' and its synonyms found in WoRMS

• SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN GBIF RETURN Occurrence

Search in CoL for all the Taxa for 'sarda sarda' and its synonyms found in WoRMS

• SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN CoL RETURN TAXON

Search all occurrences for the species commonly recognized as 'shark' in WoRMS and their synonyms as recognized by CoL. Accept only the results with coordinates less than or equal to (15.12, 16.12).

• SEARCH BY CN 'shark' RESOLVE WITH WoRMS EXPAND WITH CoL WHERE coordinate <= 15.12, 16.12 RETURN Occurrence

Search in OBIS all the occurrences for 'sarda sarda' and 'Carcharodon carcharias' expanded with synonyms from WoRMS and CoL. Accept only the results with an event date between 2000 and 2005.

• SEARCH BY SN 'sarda sarda', 'Carcharodon carcharias' EXPAND WITH WoRMS, CoL IN OBIS WHERE eventDate >= '2000' AND eventDate <= '2005' RETURN Occurrence
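The four queries above share a visible clause order: SEARCH BY, RESOLVE WITH, EXPAND WITH, IN, WHERE, RETURN. A minimal sketch of a helper that assembles such strings follows. The function `build_query` and its parameter names are hypothetical, and the grammar is inferred only from the examples on this slide, not from a language specification.

```python
def build_query(names, by="SN", resolve_with=(), expand_with=(),
                sources=(), where=(), returns="Occurrence"):
    """Assemble a query string using the clause order seen in the slide examples.

    Hypothetical helper: the clause order and separators are inferred from
    the four example queries only.
    """
    parts = ["SEARCH BY", by, ", ".join(f"'{n}'" for n in names)]
    if resolve_with:
        parts += ["RESOLVE WITH", ", ".join(resolve_with)]
    if expand_with:
        parts += ["EXPAND WITH", ", ".join(expand_with)]
    if sources:
        parts += ["IN", ", ".join(sources)]
    if where:
        parts += ["WHERE", " AND ".join(where)]
    parts += ["RETURN", returns]
    return " ".join(parts)


# Rebuild the OBIS example: two scientific names, two synonym sources,
# and a date-range filter.
query = build_query(
    ["sarda sarda", "Carcharodon carcharias"],
    expand_with=["WoRMS", "CoL"],
    sources=["OBIS"],
    where=["eventDate >= '2000'", "eventDate <= '2005'"],
)
print(query)
```

Building the string from optional clauses keeps each example a one-line call and makes the shared structure of the queries explicit.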


Page 8: iMarine Products and Services delivery

Occurrence Points

Occurrence Data from GBIF and Occurrence Data from OBIS can be combined with four operations:

• Intersection (∩)

• Difference (-)

• Union (∪)

• Duplicates Deletion (DD)

Records A and B are compared on x,y; Event Date; Modif Date; Author; and Species Scientific Name to obtain a records similarity.
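The operations on this slide can be sketched on plain records as follows. This is an illustration under stated assumptions: the slide does not define the similarity measure, so the fraction-of-matching-fields metric, the `threshold` parameter, the field names, and all function names here are hypothetical.

```python
# Fields compared between two occurrence records (names are assumptions
# mirroring the labels on the slide).
FIELDS = ("x", "y", "event_date", "modif_date", "author", "scientific_name")


def similarity(a, b):
    """Fraction of compared fields on which two records agree (assumed metric)."""
    return sum(a[f] == b[f] for f in FIELDS) / len(FIELDS)


def is_duplicate(record, others, threshold):
    """True if the record is at least `threshold` similar to any other record."""
    return any(similarity(record, o) >= threshold for o in others)


def union(a_records, b_records, threshold=1.0):
    """All records of A plus the records of B that do not duplicate them."""
    result = list(a_records)
    for rec in b_records:
        if not is_duplicate(rec, result, threshold):
            result.append(rec)
    return result


def intersection(a_records, b_records, threshold=1.0):
    """Records of A that have a sufficiently similar record in B."""
    return [r for r in a_records if is_duplicate(r, b_records, threshold)]


def difference(a_records, b_records, threshold=1.0):
    """Records of A that have no sufficiently similar record in B."""
    return [r for r in a_records if not is_duplicate(r, b_records, threshold)]


def delete_duplicates(records, threshold=1.0):
    """Keep the first record of every group of near-identical records."""
    return union([], records, threshold)
```

With `threshold=1.0` the operations behave like exact set operations; lowering it lets records that differ in, say, only the modification date count as the same occurrence.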

Page 9: iMarine Products and Services delivery

Similarity between habitats

Habitat Representativeness Score:

1. Measures the similarity between the environmental features of two areas

2. Assesses the quality of models and environmental features

Latimeria chalumnae

HRS=10.5

Page 10: iMarine Products and Services delivery

BiOnym

A flexible workflow approach to taxon name matching

Workflow:

• Raw Input String. E.g. Gadus morua Lineus 1758

• Preprocessing and Parsing

• Taxon Matcher 1, Taxon Matcher 2, ..., Taxon Matcher n

• PostProcessing

• Correct Transcriptions. E.g. Gadus morhua (Linnaeus, 1758)

Reference Sources: ASFIS, FISHBASE, OBIS, Other in DwC-A

Accounts for:

• Variations in the spelling and interpretation of taxonomic names

• Combination of data from different sources

• Harmonization and reconciliation of Taxa names
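The chain of matchers can be illustrated with a small sketch. This is not BiOnym's implementation: the parsing rule, the two matchers, the 0.8 cutoff, and the three-name reference list are assumptions chosen for illustration, and the fuzzy step uses Python's standard `difflib` rather than whatever matchers BiOnym actually ships.

```python
import difflib

# Toy reference source; a real run would load e.g. ASFIS or FishBase names.
REFERENCE_NAMES = ["Gadus morhua", "Sarda sarda", "Carcharodon carcharias"]


def parse(raw):
    """Split a raw string into (binomial, remainder).

    Assumes the first two tokens are genus and species; the remainder
    (authorship, year) is returned untouched.
    """
    tokens = raw.split()
    return " ".join(tokens[:2]).capitalize(), " ".join(tokens[2:])


def exact_matcher(name, reference):
    """First matcher: accept only identical names."""
    return [name] if name in reference else []


def fuzzy_matcher(name, reference, cutoff=0.8):
    """Second matcher: accept the closest name above a similarity cutoff."""
    return difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)


def match(raw, reference=REFERENCE_NAMES, matchers=(exact_matcher, fuzzy_matcher)):
    """Run the matchers in order and return the first hit, or None."""
    name, _authorship = parse(raw)
    for matcher in matchers:
        hits = matcher(name, reference)
        if hits:
            return hits[0]
    return None


print(match("Gadus morua Lineus 1758"))  # Gadus morhua
```

Passing the matchers as a sequence mirrors the "Matcher 1 ... Matcher n" layout of the slide: cheap, strict matchers run first and looser ones only see the names that are still unresolved.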

Page 11: iMarine Products and Services delivery

Trendylyzer - Scope

We focus on the OBIS database

• Fill some knowledge gaps on marine species

• Account for sampling biases

• Define trends for common species

Can we recognize big changes in species presence?

• Is the Fulmar losing its common species status among the seabirds?

• Plankton regime shift

• Herring recovered after the fishing ban

Page 12: iMarine Products and Services delivery

Trendylyzer - Most Observed Taxa


Page 13: iMarine Products and Services delivery

Trendylyzer – Observation ranks on Large Marine Ecosystems


Page 14: iMarine Products and Services delivery

Trendylyzer – Observation ranks on Marine Ecoregions of the World


Page 15: iMarine Products and Services delivery

Length-Weight Relationships

Objective:

Calculate the a and b parameters for several species.

Requirements:

Account for...

• Many studies about a single species

• Single study

• Use existing studies to inform new studies

Solution:

Combine existing knowledge with new data by means of Bayesian methods.

Approach:

• Collaborative development with the ‘stakeholder’

• Integration of R Scripts

• Usage of Cloud computing for R Scripts

(Image credit: bluewatermag.com.au)
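For context, the a and b parameters are conventionally those of the length-weight relationship below, where W is body weight and L is body length. The slide names only the parameters, so the formula and the symbols W and L are the standard form, stated here as an assumption about what the slide refers to.

```latex
W = a \, L^{b}
\qquad\Longleftrightarrow\qquad
\log W = \log a + b \, \log L
```

The logarithmic form is linear in log a and b, which is why existing studies of a species can be treated as prior knowledge on those two parameters.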

Page 16: iMarine Products and Services delivery

LWR - Performance

• Porting the scripts to the D4Science Statistical Manager allowed them to run in a distributed fashion

• The scientist’s original procedure took 20 days

• After optimization on our R development machines, the sequential run was reduced to 10 days

• On the Statistical Manager the run took 11 hours!

Time reduction of 95.4% (11 hours against the 10-day sequential run)

• The script has been run periodically and currently solves LWR for 37 234 species
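The 95.4% figure can be checked with a few lines of arithmetic. The sketch below uses only the durations quoted on the slide; the function name is arbitrary.

```python
HOURS_PER_DAY = 24


def reduction_percent(before_hours, after_hours):
    """Percentage of run time saved when going from `before` to `after`."""
    return 100.0 * (1.0 - after_hours / before_hours)


original = 20 * HOURS_PER_DAY    # scientist's original procedure: 20 days
sequential = 10 * HOURS_PER_DAY  # optimized sequential run: 10 days
distributed = 11                 # Statistical Manager run: 11 hours

print(round(reduction_percent(sequential, distributed), 1))  # 95.4
print(round(reduction_percent(original, distributed), 1))    # 97.7
```

The quoted 95.4% therefore matches the comparison with the 10-day optimized sequential run; measured against the original 20-day procedure the reduction is about 97.7%.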

Page 17: iMarine Products and Services delivery

PRODUCTS AND SERVICES

DEVELOPMENT PROGRESS REPORT

A fraction of the products and services belonging to StatsCube


Page 18: iMarine Products and Services delivery

Tabular Data Manager

Completely new application for the management of data workflows. It allows users to *manage* a *flow of data* and to create reports from the management activities.

• flow of data: datasets compliant with a template that are generated and updated in chunks.

• manage: import, store, transform, validate, access, analyze, visualize, and export.

Page 19: iMarine Products and Services delivery

Tabular Data Manager: Templates

• A table template defines:

– Table definition

– Columns definition

– A set of table transformations

– A set of validation procedures

• Can be applied to any dataset

• Can be modified and shared among people
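The four parts of a table template listed above can be sketched as a small data structure. This is an illustrative model only: the class names, fields, and the `apply` method are hypothetical and do not reflect the Tabular Data Manager's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Column:
    """Column definition: a name and the Python type its values must have."""
    name: str
    type: type


@dataclass
class TableTemplate:
    """Table definition, column definitions, transformations and validations."""
    table_name: str
    columns: List[Column]
    transformations: List[Callable[[Row], Row]] = field(default_factory=list)
    validations: List[Callable[[Row], bool]] = field(default_factory=list)

    def apply(self, rows):
        """Transform then validate each row; return (valid_rows, rejected_rows)."""
        valid, rejected = [], []
        for row in rows:
            for transform in self.transformations:
                row = transform(row)
            typed_ok = all(isinstance(row.get(c.name), c.type) for c in self.columns)
            if typed_ok and all(check(row) for check in self.validations):
                valid.append(row)
            else:
                rejected.append(row)
        return valid, rejected


# Example template (made-up columns): trim species names, reject negative catches.
template = TableTemplate(
    table_name="catch",
    columns=[Column("species", str), Column("catch_tonnes", float)],
    transformations=[lambda r: {**r, "species": r["species"].strip()}],
    validations=[lambda r: r["catch_tonnes"] >= 0],
)
valid, rejected = template.apply([
    {"species": " Sarda sarda ", "catch_tonnes": 12.5},
    {"species": "Gadus morhua", "catch_tonnes": -1.0},
])
```

Because the template is independent of any particular dataset, the same object can be applied to every new chunk of data, which is the reuse the slide describes.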


Page 20: iMarine Products and Services delivery

Tabular Data Manager: Menu

Ribbon style menu

Buttons behavior depends on current document

Alt messages on mouseover

Page 21: iMarine Products and Services delivery

Tabular Data Manager: Panels


Page 22: iMarine Products and Services delivery

Tabular Data Manager: Import


Page 23: iMarine Products and Services delivery

Infrastructure: Computing as Service

Hadoop

• MapReduce

Statistical Manager

• Analysis/clustering/modeling

R clusters

• Windows and Linux

330 Cores Currently Allocated

Page 24: iMarine Products and Services delivery

PRODUCTS AND SERVICES

DEVELOPMENT PROGRESS REPORT

A fraction of the products and services belonging to GeosCube


Page 25: iMarine Products and Services delivery

Rasterization

A polygonal map is transformed into a raster map or into a point map
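A minimal sketch of the idea: a cell of the raster is marked when its centre falls inside the polygon, and the point map is the list of marked cell centres. The ray-casting test is a standard technique swapped in for illustration; the slide does not say which algorithm the service uses, and all function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell, n_cols, n_rows):
    """Return a grid of 0/1 values; a cell is 1 if its centre is inside the polygon."""
    grid = []
    for row in range(n_rows):
        y = y_min + (row + 0.5) * cell
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


def to_points(grid, x_min, y_min, cell):
    """Turn the raster into a point map: the centres of all marked cells."""
    return [(x_min + (c + 0.5) * cell, y_min + (r + 0.5) * cell)
            for r, row in enumerate(grid) for c, v in enumerate(row) if v]


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid = rasterize(square, 0, 0, 1, 4, 4)
points = to_points(grid, 0, 0, 1)
```

For the 2 x 2 square on a 4 x 4 grid of unit cells, only the four central cells are marked.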

Page 26: iMarine Products and Services delivery

Maps Comparison

Compares:

• Species Distribution maps

• Environmental layers

• SAR Images

Page 27: iMarine Products and Services delivery

Periodicity and Seasonality

Periodicity: 12 months

Extraction Tools

Fourier Analysis
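How Fourier analysis can surface a 12-month periodicity is sketched below on a synthetic monthly series. The data are made up and the plain discrete Fourier transform is written out by hand to keep the example self-contained; it is not the service's implementation.

```python
import cmath
import math


def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant DFT component.

    Returns None for a constant series, which has no periodic component.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]  # drop the constant (k = 0) term
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k is not None else None


# Synthetic monthly signal (made-up data): 4 years with an annual cycle.
monthly = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(48)]
print(dominant_period(monthly))  # 12.0
```

With 48 monthly samples the annual cycle lands exactly on frequency index 4, so the reported period is 48 / 4 = 12 months.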

Page 28: iMarine Products and Services delivery

Environmental Signal Processing

Resampling

Spectrogram


Page 29: iMarine Products and Services delivery

Environmental Enrichment: Approach

• (Oozie) workflow to optimize the processing chain:

– Extract occurrences for Carcharodon carcharias (White Shark) for a given time of interest

– Apply the dbscan algorithm (R implementation) to identify geospatial clusters

– Create bounding boxes around the clusters

– Use the bounding boxes as queryables for the WCS request

– Apply BEAM Pixel Extraction (same algorithm as BioOracle environmental enrichment service)

– Create the time series

– Visualize the time series
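The step from clusters to bounding boxes can be sketched as follows. The sketch assumes the clustering has already been done and takes the cluster labels as input; the label-0-means-noise convention, the `padding` parameter, and the coordinates are assumptions for illustration, and the workflow itself runs dbscan in R.

```python
from collections import defaultdict

NOISE = 0  # assumption: label 0 marks points that belong to no cluster


def bounding_boxes(points, labels, padding=0.0):
    """Map each cluster label to (min_lon, min_lat, max_lon, max_lat)."""
    clusters = defaultdict(list)
    for (lon, lat), label in zip(points, labels):
        if label != NOISE:
            clusters[label].append((lon, lat))
    boxes = {}
    for label, members in clusters.items():
        lons = [p[0] for p in members]
        lats = [p[1] for p in members]
        boxes[label] = (min(lons) - padding, min(lats) - padding,
                        max(lons) + padding, max(lats) + padding)
    return boxes


# Made-up occurrence coordinates (lon, lat) with made-up cluster labels.
occurrences = [(18.0, -34.0), (18.5, -34.5), (151.0, -33.0), (151.5, -34.0), (0.0, 0.0)]
labels = [1, 1, 2, 2, 0]
boxes = bounding_boxes(occurrences, labels)
```

Each resulting box can then serve as the spatial subset of one coverage request, so the number of requests grows with the number of clusters rather than with the number of occurrences.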

Page 30: iMarine Products and Services delivery

Environmental Enrichment: results


Page 31: iMarine Products and Services delivery

SPREAD

• Interactive investigation process for statisticians & scientists to confront data from different domains (e.g. Statistics vs. GIS data) and batch process of data reallocation hypotheses

Workflow (figure):

• DATA IMPORT / CURATION

• DATA SELECTION (e.g. Filter)

• GIS DATA DISCOVERY, SEARCHING & SHARING: Available Target Areas, Species distributions

• Geographic intersection: FAO Areas / EEZs – High seas

• REALLOCATION: Catch dataset by FAO area to Estimates dataset by EEZ – high seas

Page 32: iMarine Products and Services delivery

Legacy Processes (IRD)

• iX Catches per Species: per Ocean / Area, per Fishing Gear type, per Month / Year, and kernel density for biodiversity / ecological datasets (IRD+OBIS+GBIF)

(Map: 30°S to 20°N, 30°E to 110°E)

Page 33: iMarine Products and Services delivery

PRODUCTS AND SERVICES

DEVELOPMENT PROGRESS REPORT

A fraction of the products and services belonging to ConnectCube


Page 34: iMarine Products and Services delivery

MarineTLO

Version 2.0.0

– Species

– Scientific Name of Species

– FAO Species Code

– IRD Species Code

– WoRMS Species Code

– Predators and Prey

– Competitors

– Biological Classification of Species (e.g. WoRMS)

Version 3.0.0

– MarineTLO Version 2.0.0

– Water Areas

– Species connected to Water Areas

– Countries

– Countries connected to Water Areas

– Species connected to Countries

– Ecosystems

– Ecosystems connected to Countries

– Species connected to Ecosystems

– Exclusive Economical Zones

– Fishing Gears

– Fishing Vessels

– More species and more Predators

– Common Names of Species

Page 35: iMarine Products and Services delivery

Requirements as Competency Queries

For a scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me:

Q1 the biological environments (e.g. ecosystems) in which the species has been introduced and more general descriptive information of it (such as the country)

Q2 its common names and their complementary info (e.g. languages and countries where they are used)

Q3 the water areas and their FAO codes in which the species is native

Q4 the countries in which the species lives

Q5 the water areas and the FAO portioning code associated with a country

Q6 the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economical Zone (of the water area)

Q7 the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)

Q8 a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification

Q9 who discovered it, in which year, the biological classification, the identification information, and the common names, providing for each common name the language and the countries where it is used

Page 36: iMarine Products and Services delivery

The MarineTLO-based warehouse Evolution

Warehouse: RDF Triple Store, structured by MarineTLO

Sources and mappings (figure):

• FLOD (by FAO), FLOD2TLO mapping

• ECOSCOPE (by IRD), ECOSCOPE2TLO mapping

• WoRMS (part; generated by SPD & TLO wrapper), WoRMS2TLO mapping

• DBpedia (part; by DBpedia SPARQL Endpoint), DBpediaS2TLO mapping

• Fishbase (part; by Fishbase RDMS), FB2TLO mapping

Each source is copied into the warehouse through its mapping.

Page 37: iMarine Products and Services delivery

Warehouse V3

Concepts Ecoscope FLOD WoRMS DBpedia Fishbase

Species

Scientific Names

Authorships

Common Names

Predators

Ecosystems

Countries

Water Areas

Vessels

Gears

EEZ


Page 38: iMarine Products and Services delivery

TLO warehouse V2 vs V3

V2 contains information about 19,000 distinct marine species

Source      Species Number
DBpedia     14,291
FLOD        10,849
WoRMS        1,124
Ecoscope       277

Common Species (size of intersections)

            FLOD     WoRMS    Ecoscope
DBpedia     3,046    731      56
FLOD                 768      73
WoRMS                         53

V3 contains information about 37,000 distinct marine species

Source      Species Number
DBpedia     14,291
FLOD        10,849
WoRMS        1,124
Ecoscope       277
FishBase    31,277

Common Species (size of intersections)

            FLOD     WoRMS    Ecoscope    Fishbase
DBpedia     3,046    731      56          9,833
FLOD                 768      73          6,141
WoRMS                         53          1,288
Ecoscope                                  53

Page 39: iMarine Products and Services delivery

PRODUCTS AND SERVICES CATALOGUE

AT PROJECT CONCLUSION

A tiny fraction of the products and services belonging to BiolCube


Page 40: iMarine Products and Services delivery

Trendylyzer – Definition of Common Species

(Figure legend: Grey = not a common species in 1990)

• Trends for common species can be indicators of ecological changes

• A formal definition of common species is not trivial

• A definition based on the distribution of occurrences gives interesting results but is affected by sampling biases

Page 41: iMarine Products and Services delivery

Trendylyzer – Definition of Common Species

We are searching for a more formal definition of common species (C.S.) which accounts for the biases in the database.

We defined a commonness score function. The terms influencing the commonness of a species are given a weight using pattern recognition models.

For each species:

1. Nr of observations

2. Nr of individuals per observation

3. Nr of observations per dataset

4. Nr of datasets

5. Nr of geographical cells

6. Temporal frequency of the observations

Normalizing => relative commonness.

Create score or rank by taxonomic group

We are assessing the performance against the indications by FishBase and IUCN on some benchmark species
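One way to read "weighted terms, then normalizing" is a weighted sum of normalised terms, sketched below. This is an assumption, not the Trendylyzer function: the slide does not give the functional form, the weights here are placeholders rather than values learned by pattern recognition models, and the species data are made up.

```python
# The six terms listed on the slide, as hypothetical field names.
TERMS = ("observations", "individuals_per_observation", "observations_per_dataset",
         "datasets", "geographical_cells", "temporal_frequency")


def commonness_scores(species_terms, weights):
    """Return {species: score in [0, 1]} as a weighted sum of max-normalised terms.

    Each term is divided by its maximum over the group, so the score expresses
    commonness relative to the other species in the same group.
    """
    maxima = {t: max(v[t] for v in species_terms.values()) or 1.0 for t in TERMS}
    total_weight = sum(weights[t] for t in TERMS)
    return {
        name: sum(weights[t] * values[t] / maxima[t] for t in TERMS) / total_weight
        for name, values in species_terms.items()
    }


def rank(scores):
    """Species names ordered from most to least common."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up group of two species and placeholder equal weights.
group = {
    "species_a": {t: 10.0 for t in TERMS},
    "species_b": {t: 5.0 for t in TERMS},
}
equal_weights = {t: 1.0 for t in TERMS}
scores = commonness_scores(group, equal_weights)
```

Calling the function once per taxonomic group yields the per-group score or rank mentioned on the slide.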

Page 42: iMarine Products and Services delivery

Trendylyzer - Performance

A preliminary definition of CS was made using:

1. Nr of observations per dataset in one year

2. Nr of datasets containing the species in one year

On a ‘trustable’ benchmark with 255 species, the classification agreed with an expert classification in 99.21% of cases!

The complex approximating function, which also includes time and geographical extent, gave 80% agreement with an expert classification on a ‘wild’ benchmark (80 species)

The results are very promising!

Page 43: iMarine Products and Services delivery

PRODUCTS AND SERVICES CATALOGUE

AT PROJECT CONCLUSION

A tiny fraction of the products and services belonging to StatsCube


Page 44: iMarine Products and Services delivery

Tabular Data Manager

gCube Releases


Page 45: iMarine Products and Services delivery

Tabular Data Manager: 2.18

• Transformations support: table/column type, labels management

• Validation: multiple codes warning, redundant tuples, table types checks (codelist, dataset)

• Generic table metadata support

• Batch replace (according to an expression)

• Single tuple modification

• Full Workspace integration

• Support for JSON document

• Templates


Page 46: iMarine Products and Services delivery

Tabular Data Manager: 2.19

• Operations bundle

– Aggregation, Union, Filtering, Denormalisation

– Column merging

– Import Postprocessing

– Notification

– Custom codelist creation use cases

more to come…

Page 47: iMarine Products and Services delivery

Tabular Data Manager: Next releases

• 2.20

– SDMX Datasource

– Codelist georeferencing

– Maps visualization

• 2.21

– Data Analysis: Mondrian support

– Graphs

– UI: Jpivot, stpivot

Page 48: iMarine Products and Services delivery

Harmonize: Cotrix


Page 49: iMarine Products and Services delivery

Discussion time

Thank you for your attention

www.i-marine.eu