DESCRIPTION
The iMarine initiative provides a data infrastructure aimed at facilitating open access, data sharing, collaborative analysis, processing and mining, as well as the dissemination of newly generated knowledge. The iMarine data infrastructure is developed to support decision making in high-level challenges that require policy decisions typical of the ecosystem approach. The iMarine offering can be articulated in six bundles. A “bundle” is a set of services and technologies grouped according to a family of related tasks for achieving a common objective. Bundles can be customized and/or enriched into flexible, purpose-built Virtual Research Environments (VREs). Virtual research environments offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. Each VRE in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other iMarine applications.
iMarine Products and services delivery
4th iMarine Board
Rome
17-18 October 2013
Pasquale Pagano (CNR), iMarine Technical Director
Outline
Products and services development progress report
• BiolCube
• StatsCube
• GeosCube
• ConnectCube
Products and services catalogue at project conclusion
• Tiny selection of products
Google Analytics iMarine portal
Application Bundles
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
• Management and interpretation of biological and ecological data in the environment
• Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
• Storage and interpretation of geospatial explicit information, including WPS processing
• Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
PRODUCTS AND SERVICES
DEVELOPMENT PROGRESS REPORT
A fraction of the products and services belonging to BiolCube
Species Data Discovery
Search for multiple species
Search across several data providers
Search for all occurrences of a set of species and their synonyms
Search occurrences for all species belonging to a taxon group
Species Data Discovery
Search in GBIF all the occurrences of 'sarda sarda' and its synonyms found in WoRMS
• SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN GBIF RETURN Occurrence
Search in CoL all the taxa for 'sarda sarda' and its synonyms found in WoRMS
• SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN CoL RETURN TAXON
Search all occurrences for the species commonly recognized as 'shark' in WoRMS and their synonyms as recognized by CoL. Accept only the results with coordinates less than or equal to (15.12, 16.12).
• SEARCH BY CN 'shark' RESOLVE WITH WoRMS EXPAND WITH CoL WHERE coordinate <= 15.12, 16.12 RETURN Occurrence
Search in OBIS all the occurrences for 'sarda sarda' and 'Carcharodon carcharias' expanded with synonyms from WoRMS and CoL. Accept only the results with an event date between 2000 and 2005.
• SEARCH BY SN 'sarda sarda', 'Carcharodon carcharias' EXPAND WITH WoRMS, CoL IN OBIS WHERE eventDate >= '2000' AND eventDate <= '2005' RETURN Occurrence
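The four example queries share a regular clause order: SEARCH BY, then optional RESOLVE WITH, EXPAND WITH, IN and WHERE clauses, then RETURN. As a minimal sketch, the helper below assembles such strings. The function `build_query` and its parameter names are hypothetical and not part of any iMarine API; only the keywords and their order are taken from the examples above.

```python
def build_query(by, terms, returns, resolve_with=None, expand_with=None,
                sources=None, where=None):
    """Assemble a species query string in the clause order seen in the examples.

    `by` is "SN" (scientific name) or "CN" (common name). The parameter names
    are illustrative assumptions; only the keywords come from the slides.
    """
    if by not in ("SN", "CN"):
        raise ValueError("by must be 'SN' or 'CN'")
    parts = ["SEARCH BY", by, ", ".join("'%s'" % t for t in terms)]
    if resolve_with:
        parts += ["RESOLVE WITH", ", ".join(resolve_with)]
    if expand_with:
        parts += ["EXPAND WITH", ", ".join(expand_with)]
    if sources:
        parts += ["IN", ", ".join(sources)]
    if where:
        parts += ["WHERE", where]
    parts += ["RETURN", returns]
    return " ".join(parts)


# Rebuilds the first example query from the slide.
query = build_query("SN", ["sarda sarda"], "Occurrence",
                    expand_with=["WoRMS"], sources=["GBIF"])
print(query)
```

Keeping the WHERE condition as an opaque string avoids guessing at the grammar of filter expressions, which the slides only show by example.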
Occurrence Points
Inputs: Occurrence Data from GBIF, Occurrence Data from OBIS
Operations:
• ∩ Intersection
• - Difference
• ∪ Union
• DD Duplicates Deletion
Records Similarity compares the records of two sets, A and B, on:
• x,y
• Event Date
• Modif Date
• Author
• Species Scientific Name
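The slide names the compared fields but not how similarity is computed. The sketch below assumes a simple rule: similarity is the fraction of the five fields that agree, and two records count as the same when it reaches a threshold. The `Occurrence` class, the tolerance and the threshold are illustrative assumptions, not the service's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    x: float
    y: float
    event_date: str
    modif_date: str
    author: str
    scientific_name: str


def similarity(a, b, tol=0.01):
    """Fraction of the five compared fields that agree (assumed rule)."""
    checks = [
        abs(a.x - b.x) <= tol and abs(a.y - b.y) <= tol,
        a.event_date == b.event_date,
        a.modif_date == b.modif_date,
        a.author.lower() == b.author.lower(),
        a.scientific_name.lower() == b.scientific_name.lower(),
    ]
    return sum(checks) / len(checks)


def _has_match(record, others, threshold):
    return any(similarity(record, o) >= threshold for o in others)


def intersection(A, B, threshold=0.75):
    """Records of A that have a similar record in B."""
    return [a for a in A if _has_match(a, B, threshold)]


def difference(A, B, threshold=0.75):
    """Records of A that have no similar record in B."""
    return [a for a in A if not _has_match(a, B, threshold)]


def union(A, B, threshold=0.75):
    """All records of A plus the records of B not already represented in A."""
    return list(A) + difference(B, A, threshold)


def delete_duplicates(A, threshold=0.75):
    """Keep the first record of every group of similar records."""
    kept = []
    for a in A:
        if not _has_match(a, kept, threshold):
            kept.append(a)
    return kept
```

Each operation is quadratic in the number of records, which is acceptable for a sketch; a production service would need spatial or name-based indexing.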
Similarity between habitats
Habitat Representativeness Score (HRS):
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
Example: Latimeria chalumnae, HRS=10.5
BiOnym
A flexible workflow approach to taxon name matching
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of Taxa names
Workflow:
• Raw Input String. E.g. Gadus morua Lineus 1758
• Preprocessing and Parsing
• Taxon Matcher 1, Taxon Matcher 2, ..., Taxon Matcher n
• PostProcessing
• Correct Transcriptions: E.g. Gadus morhua (Linnaeus, 1758)
Reference Sources:
• ASFIS
• FISHBASE
• OBIS
• Other in DwC-A
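The workflow above can be sketched as a chain of matchers tried in order against a reference source. This is an illustration only: the tiny `REFERENCE` dictionary, the two matchers and the cutoff are assumptions, and fuzzy matching via the standard-library `difflib` stands in for whatever matchers BiOnym actually uses.

```python
import difflib
import re

# Hypothetical reference source: accepted name -> authorship.
REFERENCE = {
    "Gadus morhua": "(Linnaeus, 1758)",
    "Carcharodon carcharias": "(Linnaeus, 1758)",
}


def parse(raw):
    """Preprocessing and parsing: normalise spaces, keep the binomial."""
    tokens = re.sub(r"\s+", " ", raw.strip()).split(" ")
    return " ".join(tokens[:2]).capitalize()


def exact_matcher(name, candidates):
    """Matcher 1: case-insensitive exact match."""
    return [c for c in candidates if c.lower() == name.lower()]


def fuzzy_matcher(name, candidates, cutoff=0.8):
    """Matcher 2: tolerate spelling variations."""
    return difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)


def match(raw, matchers=(exact_matcher, fuzzy_matcher)):
    """Run the matchers in order and stop at the first one that finds hits."""
    name = parse(raw)
    for matcher in matchers:
        hits = matcher(name, list(REFERENCE))
        if hits:
            # Post-processing: attach the authorship from the reference source.
            return ["%s %s" % (h, REFERENCE[h]) for h in hits]
    return []


print(match("Gadus morua Lineus 1758"))
```

Passing the matchers as a tuple mirrors the "Matcher 1 ... Matcher n" chain: adding a new matching strategy means appending one function.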
Trendylyzer - Scope
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
We focus on the OBIS database
Can we recognize big changes in species presence?
• Is the Fulmar losing its common species status among the seabirds?
• Plankton regime shift
• Herring recovered after the fish ban
Trendylyzer - Most Observed Taxa
Trendylyzer – Observation ranks on Large Marine Ecosystems
Trendylyzer – Observation ranks on Marine Ecoregions of the World
Length-Weight Relationships
Image source: bluewatermag.com.au
Objective:
Calculate the a and b parameters for several species.
Requirements:
Account for...
• Many studies about a single species
• Single study
• Use existing studies to inform new studies
Solution:
Combine existing knowledge with new data by means of Bayesian methods.
Approach:
• Collaborative development with the ‘stakeholder’
• Integration of R Scripts
• Usage of Cloud computing for R Scripts
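In the usual length-weight relationship, weight W and length L are related by W = a * L^b, so a and b can be estimated from a straight-line fit of log10(W) against log10(L). The sketch below shows that fit, plus a deliberately simplified way of combining a prior from existing studies with a new estimate (a normal prior and normal likelihood with known standard deviations). The project used R scripts whose model is not shown in the slides, so both functions are illustrative assumptions.

```python
import math


def fit_lwr(lengths, weights):
    """Least-squares fit of log10(W) = log10(a) + b * log10(L)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = 10 ** (mean_y - b * mean_x)
    return a, b


def combine_normal(prior_mean, prior_sd, data_mean, data_sd):
    """Posterior mean and sd for a normal prior and a normal estimate.

    Each source is weighted by its precision (1 / variance). This is a
    textbook simplification, not the model used by the project.
    """
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / data_sd ** 2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_sd = math.sqrt(1.0 / (w_prior + w_data))
    return post_mean, post_sd


# Synthetic data generated from W = 0.01 * L^3.
lengths = [10, 20, 30, 40]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_lwr(lengths, weights)

# Prior on b from existing studies (made-up numbers) combined with a new study.
b_post, b_post_sd = combine_normal(3.0, 0.1, 3.2, 0.1)
```

Precision weighting captures the requirement that existing studies inform new ones: a species with a single small study is pulled toward the prior, while a species with many precise studies is dominated by its own data.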
LWR - Performance
• Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion
• The scientist’s original procedure took 20 days
• After optimization on our R development machines, the sequential run was reduced to 10 days
• The run on the Statistical Manager took 11 hours!
Time reduction of 95.4% (11 hours against the 240 hours of the 10-day sequential run)
• The script has been run periodically and currently solves LWR for 37 234 species
PRODUCTS AND SERVICES
DEVELOPMENT PROGRESS REPORT
A fraction of the products and services belonging to StatsCube
Tabular Data Manager
A completely new application for managing data workflows. It allows users to *manage* a *flow of data* and to create reports on the management activities.
• flow of data: datasets compliant with a template that are generated and updated in chunks.
• manage: import, store, transform, validate, access, analyze, visualize, and export.
Tabular Data Manager: Templates
• A table template defines:
– Table definition
– Columns definition
– A set of table transformations
– A set of validation procedures
• Can be applied to any dataset
• Can be modified and shared among people
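The four parts of a template listed above map naturally onto a small data structure. The sketch below is a hypothetical model, not the Tabular Data Manager's own API: `TableTemplate`, its field names and the example `catch` template are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class TableTemplate:
    table_name: str                      # table definition
    columns: Dict[str, Callable]         # column name -> cast function
    transformations: List[Callable[[Row], Row]] = field(default_factory=list)
    validations: List[Callable[[Row], bool]] = field(default_factory=list)

    def apply(self, rows):
        """Cast, transform and validate rows; return (valid, rejected)."""
        valid, rejected = [], []
        for raw in rows:
            try:
                row = {name: cast(raw[name]) for name, cast in self.columns.items()}
            except (KeyError, ValueError):
                # Missing column or value that cannot be cast.
                rejected.append(raw)
                continue
            for transform in self.transformations:
                row = transform(row)
            if all(check(row) for check in self.validations):
                valid.append(row)
            else:
                rejected.append(raw)
        return valid, rejected


# Hypothetical template for a catch dataset.
template = TableTemplate(
    table_name="catch",
    columns={"species": str, "tonnes": float},
    transformations=[lambda r: {**r, "species": r["species"].strip().upper()}],
    validations=[lambda r: r["tonnes"] >= 0],
)
```

Because the template holds no data itself, the same instance can be applied to each new chunk of a dataset as it arrives.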
Tabular Data Manager: Menu
Ribbon style menu
Buttons behavior depends on current document
Alt messages on mouseover
Tabular Data Manager: Panels
Tabular Data Manager: Import
Infrastructure: Computing as Service
Hadoop
• MapReduce
Statistical Manager
• Analysis/clustering/modeling
R clusters
• Windows and Linux
330 Cores Currently Allocated
PRODUCTS AND SERVICES
DEVELOPMENT PROGRESS REPORT
A fraction of the products and services belonging to GeosCube
Rasterization
A polygonal map is transformed into a raster map or into a point map
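One common way to rasterize a polygon is to test the centre of every grid cell with a ray-casting point-in-polygon check. The sketch below assumes that approach and a regular square grid; the function names and the grid parameters are illustrative, and the slides do not say which algorithm the service uses.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a horizontal ray from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell, n_cols, n_rows):
    """Raster map: grid[r][c] is 1 when the cell centre lies in the polygon."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell
            cy = y_min + (r + 0.5) * cell
            row.append(1 if point_in_polygon(cx, cy, polygon) else 0)
        grid.append(row)
    return grid


def to_points(grid, x_min, y_min, cell):
    """Point map: centres of the cells that fall inside the polygon."""
    return [(x_min + (c + 0.5) * cell, y_min + (r + 0.5) * cell)
            for r, row in enumerate(grid)
            for c, value in enumerate(row) if value]


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid = rasterize(square, x_min=0, y_min=0, cell=1, n_cols=4, n_rows=4)
points = to_points(grid, x_min=0, y_min=0, cell=1)
```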
Maps Comparison
Compares:
• Species Distribution maps
• Environmental layers
• SAR Images
Periodicity and Seasonality
Periodicity: 12 months
Extraction Tools Fourier Analysis
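A 12-month periodicity like the one reported above can be found by taking the discrete Fourier transform of a monthly series and picking the frequency with the most power. The sketch below uses a plain DFT from the standard library on a synthetic series; it illustrates the idea and is not the extraction tool itself.

```python
import cmath
import math


def dominant_period(series):
    """Period, in samples, of the strongest non-constant frequency."""
    n = len(series)
    mean = sum(series) / n
    centred = [value - mean for value in series]  # drop the constant term
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(value * cmath.exp(-2j * math.pi * k * t / n)
                    for t, value in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None


# Ten years of synthetic monthly values with a yearly cycle.
monthly = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(120)]
period = dominant_period(monthly)
```

The direct DFT costs O(n^2) operations; for long series an FFT implementation would be the usual choice.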
Environmental Signal Processing
Resampling
Spectrogram
Environmental Enrichment: Approach
• (Oozie) workflow to optimize the processing chain:
– Extract occurrences of Carcharodon carcharias (White Shark) for a given time of interest
– Apply the dbscan algorithm (R implementation) to identify geospatial clusters
– Create bounding boxes around the clusters
– Use the bounding boxes as queryables for the WCS request
– Apply BEAM Pixel Extraction (same algorithm as BioOracle environmental enrichment service)
– Create the time series
– Visualize the time series
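The clustering and bounding-box steps of the workflow above can be sketched as follows. The project used the R implementation of dbscan; this pure-Python version is a simplified stand-in that treats coordinates as planar (no great-circle distance), and `eps`, `min_pts` and the sample points are made-up values.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:    # core point: expand the cluster
                queue.extend(more)
    return labels


def bounding_boxes(points, labels):
    """Map cluster id -> (x_min, y_min, x_max, y_max), ignoring noise."""
    boxes = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue
        x0, y0, x1, y1 = boxes.get(label, (x, y, x, y))
        boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


occurrences = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(occurrences, eps=1.5, min_pts=3)
boxes = bounding_boxes(occurrences, labels)
```

Each resulting box could then serve as the spatial subset of a coverage request, so that only the pixels around observed clusters are extracted rather than the whole layer.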
Environmental Enrichment: results
SPREAD
• Interactive investigation process for statisticians & scientists to confront data from different domains (e.g. Statistics vs. GIS data) and batch processing of data reallocation hypotheses
Workflow elements:
• DATA IMPORT / CURATION
• DATA SELECTION (e.g. Filter)
• Geographic intersection: FAO Areas / EEZs – High seas
• REALLOCATION
• GIS DATA DISCOVERY, SEARCHING & SHARING: Available Target Areas, Species distributions
Input: Catch dataset by FAO area
Output: Estimates dataset by EEZ – high seas
Legacy Processes (IRD)
• iX Catches per Species: per Ocean / Area, per Fishing Gear type, per Month / Year, and kernel density for biodiversity / ecological datasets (IRD+OBIS+GBIF)
[Map: 30°E to 110°E, 30°S to 20°N]
PRODUCTS AND SERVICES
DEVELOPMENT PROGRESS REPORT
A fraction of the products and services belonging to ConnectCube
MarineTLO
Version 2.0.0
– Species
– Scientific Name of Species
– FAO Species Code
– IRD Species Code
– WoRMS Species Code
– Predators and Prey
– Competitors
– Biological Classification of Species (e.g. WoRMS)
Version 3.0.0
– MarineTLO Version 2.0.0
– Water Areas
– Species connected to Water Areas
– Countries
– Countries connected to Water Areas
– Species connected to Countries
– Ecosystems
– Ecosystems connected to Countries
– Species connected to Ecosystems
– Exclusive Economical Zones
– Fishing Gears
– Fishing Vessels
– More species and more Predators
– Common Names of Species
Requirements as Competency Queries
#Query For a scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me
Q1 the biological environments (e.g. ecosystems) in which the species has been introduced and more general descriptive information of it (such as the country)
Q2 its common names and their complementary info (e.g. languages and countries where they are used)
Q3 the water areas and their FAO codes in which the species is native
Q4 the countries in which the species lives
Q5 the water areas and the FAO portioning code associated with a country
Q6 the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economical Zone (of the water area)
Q7 the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)
Q8 a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification
Q9 who discovered it, in which year, the biological classification, the identification information, the common names, providing for each common name the language and the countries where it is used.
The MarineTLO-based warehouse Evolution
[Diagram: copies of the sources are mapped into the MarineTLO RDF Triple Store]
• FLOD (by FAO): copy, FLOD2TLO mapping
• ECOSCOPE (by IRD): copy, ECOSCOPE2TLO mapping
• WoRMS (part; generated by SPD & TLO wrapper): copy, WoRMS2TLO mapping
• DBpedia (part; by DBpedia SPARQL Endpoint): copy, DBpedia2TLO mapping
• Fishbase (part; by Fishbase RDMS): copy, FB2TLO mapping
Warehouse V3
Concepts Ecoscope FLOD WoRMS DBpedia Fishbase
Species
Scientific Names
Authorships
Common Names
Predators
Ecosystems
Countries
Water Areas
Vessels
Gears
EEZ
TLO warehouse V2 vs V3

V2 contains information about 19,000 distinct marine species

Source      Species Number
DBpedia     14,291
FLOD        10,849
WoRMS        1,124
Ecoscope       277

Common Species (size of intersections)
            FLOD    WoRMS   Ecoscope
DBpedia     3,046   731     56
FLOD                768     73
WoRMS       768             53

V3 contains information about 37,000 distinct marine species

Source      Species Number
DBpedia     14,291
FLOD        10,849
WoRMS        1,124
Ecoscope       277
FishBase    31,277

Common Species (size of intersections)
            FLOD    WoRMS   Ecoscope   Fishbase
DBpedia     3,046   731     56         9,833
FLOD                768     73         6,141
WoRMS                       53         1,288
Ecoscope                               53
PRODUCTS AND SERVICES CATALOGUE
AT PROJECT CONCLUSION
A tiny fraction of the products and services belonging to BiolCube
Trendylyzer – Definition of Common Species
Grey = not a common species in 1990
• Trends for common species can be indicators of ecological changes
• A formal definition of common species is not trivial
• A definition based on occurrences distribution gives interesting results but is affected by sampling biases
Trendylyzer – Definition of Common Species
We are searching for a more formal definition of C.S., which accounts
for the biases in the database …
We defined a commonness score function
The terms influencing the Commonness of a species are given a weight
using pattern recognition models
For each species:
1. Nr of observations
2. Nr of individuals per observation
3. Nr of observations per dataset
4. Nr of datasets
5. Nr of geographical cells
6. Temporal frequency of the observations
Normalizing => relative commonness.
Create score or rank by taxonomic group
We are assessing the performance against the indications by FishBase and IUCN on some benchmark species
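A commonness score built from the six terms listed above could be sketched as a weighted sum of normalized features, ranked within one taxonomic group. In the slides the weights come from pattern recognition models; here they are plain inputs, and the feature names, the max-normalization and the sample numbers are all assumptions made for illustration.

```python
FEATURES = [
    "observations",
    "individuals_per_observation",
    "observations_per_dataset",
    "datasets",
    "cells",
    "temporal_frequency",
]


def commonness_scores(species, weights):
    """Relative commonness in [0, 1] for the species of one taxonomic group.

    `species` maps a name to a dict with one value per feature. Each feature
    is divided by its maximum in the group, then combined with the weights.
    """
    maxima = {f: max(s[f] for s in species.values()) or 1 for f in FEATURES}
    total_weight = sum(weights[f] for f in FEATURES)
    return {
        name: sum(weights[f] * s[f] / maxima[f] for f in FEATURES) / total_weight
        for name, s in species.items()
    }


def rank(scores):
    """Species names ordered from most to least common."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up values for two species of the same group, with equal weights.
group = {
    "species A": {f: 10 for f in FEATURES},
    "species B": {f: 5 for f in FEATURES},
}
scores = commonness_scores(group, {f: 1.0 for f in FEATURES})
order = rank(scores)
```

Normalizing within the group makes the score relative, so a species is only compared with others of the same taxonomic group.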
Trendylyzer - Performance
A preliminary definition of CS was done using
1. Nr of observations per dataset in one year
2. Nr of datasets containing the species in one year
On a ‘trustable’ benchmark with 255 species, the classification agreed with an expert classification in 99.21% of cases!
The complex approximating function, which also includes time and geographical extent, gave 80% agreement with an expert classification on a ‘wild’ benchmark (80 species)
The results are very promising!
PRODUCTS AND SERVICES CATALOGUE
AT PROJECT CONCLUSION
A tiny fraction of the products and services belonging to StatsCube
Tabular Data Manager
gCube Releases
Tabular Data Manager: 2.18
• Transformations support: table/column type, labels management
• Validation: multiple codes warning, redundant tuples, table types checks (codelist, dataset)
• Generic table metadata support
• Batch replace (according to an expression)
• Single tuple modification
• Full Workspace integration
• Support for JSON document
• Templates
Tabular Data Manager: 2.19
• Operations bundle
– Aggregation, Union, Filtering, Denormalisation
– Column merging
– Import Postprocessing
– Notification
– Custom codelist creation use cases
more to come…
Tabular Data Manager: Next releases
• 2.20
– SDMX Datasource
– Codelist georeferencing
– Maps visualization
• 2.21
– Data Analysis: mondrian support
– Graphs
– UI: Jpivot, stpivot
Harmonize: Cotrix
Discussion time
Thank you for your attention
www.i-marine.eu