DESCRIPTION
On 22 July 2014, OpenChannels.org and the EBM Tools Network, two of the premier sources of information about coastal and marine planning and management tools in the United States and internationally, hosted the iMarine webinar "iMarine Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources". The webinar presented the iMarine initiative and its data e-infrastructure and services, followed by a set of use cases covering geospatial analysis, ecology, biodiversity, and life history traits. The presentations were given by Pasquale Pagano (CNR-ISTI, iMarine Technical Director) and Gianpaolo Coro (CNR-ISTI). A video of the webinar is available at https://www.youtube.com/watch?v=lgf30BPyBbk
iMarine Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources
Data access, harmonization, analysis, and
management platform
Pasquale Pagano (CNR) iMarine Technical Director pasquale.pagano@isti.cnr.it
22 July 2014
Open Channel Webinar
Gianpaolo Coro (CNR) iMarine Data Analyst
gianpaolo.coro@isti.cnr.it
Pasquale Pagano • Master Degree in Computer Science • Ph.D in Information Engineering on Distributed Systems • it.linkedin.com/in/pasqualepagano/
Gianpaolo Coro • Master Degree in Physics - Cybernetics • Ph.D in Computer Science • it.linkedin.com/pub/gianpaolo-coro/16/665/b5a
iMarine - Just an overview
Concepts
The initiative (the visionary leadership)
The e-infrastructure (the operational platform)
The system (the enabling software system)
THE INITIATIVE Distinguishing capabilities of the iMarine initiative
iMarine Objective
Launch an Initiative aimed at establishing and operating a data infrastructure supporting the principles of the Ecosystem Approach to Fisheries Management and Conservation of Marine Living Resources
Nov 2011
Sept 2014
Apr 2016
Address system harmonization
[Diagram: partners, data sources, and systems, with domain labels Fisheries, Ocean environment, Biodiversity, and Taxonomy: VLIZ, IOC, FIN, CRIA, IRD, FAO, T2, MyOcean, SeaDataNet, FIGIS, ESTAT, DG-MARE, National DOF, Ecoscope, ICES RDB, EMODnet Biology, WoRMS, OBIS, AquaMaps, niche modelling algorithms, open source software, Open SDMX - CLM, SDMX, GBIF, EoL, NEAFC, FishBase, other sources]
Courtesy of Marc Taconet (FAO)
Role of the iMarine Board
• Mobilize user community
– Core set of influential partners mobilized to work on two main business cases:
Support to implementation of the EU Common Fishery Policy
Support to FAO’s deep seas fisheries programme
• Develop governance model
– Public partnerships
– Policies
– Data sharing
– Software sharing
• Address systems’ harmonization
– Rationalize solutions among partners (cost efficiency)
– Agree on “iMarine” standards
Partners, spanning the Environment, Biodiversity, and Fisheries domains:
• FAO • DG MARE • Eurostat • NEAFC • MEDDE/DOF • IRD • ICES
• IOC/OBIS • FIN • CRIA • VLIZ • T2/GENESI-DEC
THE INFRASTRUCTURE
Distinguishing capabilities of the iMarine e-infrastructure and its enabling software
Concepts and Definitions
The D4Science infrastructure
iMarine is exploiting a Hybrid Data Infrastructure combining over 500 software components into a coherent and centrally managed system of hardware, software, and data resources.
Born from the user needs
I need to host my applications in a secure and scalable environment
I need to maintain my database
I need to back up my data
I need to securely deliver my data to a set of known people
I want to offer a flexible sharing, storage, reporting, search and retrieval tool
I need to manage and analyze biological and ecological data
I need to manage the full data life-cycle from import to validation, curation, harmonization and publication
I need to offer my team a powerful tool to manage code-lists
I need to store and analyze geospatially explicit information
I need to reduce the data maintenance costs of my dept.
I need to access authoritative biological and ecological data
I need to simplify the access to my geospatial data
I need to mash-up statistical and biodiversity data
I need to validate my datasets and provide a standard access to them
I need to analyse my big datasets
User Needs Analysis
• Needs
– Not isolated
– Not disconnected
– Not trivial
• Solutions
– Current and with an eye to the future
– Designed for individuals and looking at the community
iMarine e-infrastructure
iMarine is exploiting D4Science.org
Geographically Distributed Computing Infrastructure
Across administrative boundaries
Across private and commercial providers
Service Allocations, Deployment, Monitoring, and Operation
Uniform resource and data access
Operation: built on SLAs
Support monitoring, auditing, reporting, and notification
Trust: privacy, governance, and attribution
Security, trusted network
Infrastructure: key characteristics
• Efficient and tailored storage technologies
• Computational environments dealing with the volume of the data
• Elastic management of the resources, monitoring, alerting, recovery
• Collaborative environment to support scientific communities
• Rich portfolio of applications to perform access, validation, enriching, processing, sharing, and mash-up of data
Capacities: Storage as Service
to host and maintain data
Database: High-availability, Standard, Ready-to-use
Cloud Storage: Scalable, Reliable, Secure
Geographical DB: Scalable, OGC Standard, Privacy and Attribution
Capacities: Computing as Service
to process and extract knowledge
Scalable: Easy to Manage, Across Boundaries
Tailored: Virtual Research Environment
Elastic: Assignment of Computing, Assignment of Processors
Rich and Heterogeneous: High Throughput, Map-Reduce, Parallel R
Applications as a Service
to curate and manage data
Metadata Generation: Geospatial Data, Biodiversity Data, Statistical Data
Harmonization: Disambiguate, Validate, Integrate and Consistency Check
Data Exchange: OGC protocols, DarwinCore, SDMX
THE APPLICATIONS CATALOGUE
Distinguishing capabilities of the iMarine catalogue of applications
Management and interpretation of biological and ecological data in the environment
Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
Storage and interpretation of geospatially explicit information, including WPS processing
Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
Applications as a Service
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
• Occurrence and Taxonomic Data Discovery, Occurrence Data Processing, Species Distribution Modeling, Species Distribution Maps Discovery, Taxonomic Data Comparison, Taxonomic Data Matching
• Code List Discovery, Code List Management, Statistical Engine, Tabular Data Discovery, Tabular Data Enrichment, Tabular Data Management, Tabular Data Processing
• Geospatial Data Discovery, Geospatial Data Processing
• Enhanced Documents Management, Fact-sheets Management, Information Object Discovery, Messaging, Shared Workspace, Social Networking Facilities
THE DATA CATALOGUE
Distinguishing capabilities of the iMarine catalogue of data
[Diagram: data sources connected to iMarine: OBIS, WoRMS, WoRDS, GBIF, CoL, ITIS, IRMNG, NCBI, MyOcean, WOA, EuroStat, Data.FAO, …]
iMarine Registries
Data services: Validation, Enriching, Processing, Sharing
Data types: Ontologies and Data Warehouses, Biological and Ecological Data, GeoSpatial Data, Statistical Data, Documents
DarwinCore / ISO19139
– >35 M Observations (OBIS)
– ≈ 120 K Observed Species (OBIS)
– ≈ 500 K Taxa (WoRMS)
– >600 K Scientific Names (ITIS)
– >12 K Species Maps (AquaMaps)
– ≈ 600 Species Extent (FAO)
– … FishBase, SeaLifeBase
– … CoL, GBIF
SDMX *
– FAO CodeLists
– IRD CodeLists
– FAO datasets
– Eurostat
– …
ISO19139 (OGC W*S)
– 15 years of chemical and physical variables in 2D space
– Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
– On-demand chemical and physical variables in 3D space
– Apparent Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
> 450 variables
OAI-PMH, OpenSearch
– FAO Factsheets
– Aquatic Commons
– Bioline International
– Biodiversity Heritage
– OceanDocs
– Nature, Pensoft Journals
– …
RDF, OWL
– FAO FLOD
– Marine Top Level Ontology
– IRD Ecoscope
– FactForge, Yago2
– …
THE COLLABORATIVE ENVIRONMENT
Distinguishing capabilities of the iMarine collaborative environment
Is this enough?
• An ecosystem of participatory data e-Infrastructures
• Regulated by policies
• Enabled by standards
• Promoting not only access but mash-up of heterogeneous data
User centric
Virtual Research Environment
iMarine is user-centric and workflow-oriented thanks to the gCube VRE technology
A Virtual Research Environment (VRE) is
• a distributed and dynamically created environment
• where a subset of data, services, computational, and storage resources
• regulated by tailored policies
• are assigned to a subset of users via interfaces
• for a limited timeframe
• at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
[Diagram: several VREs built on top of one e-Infrastructure]
Virtual Research Environment
to share and collaborate
Share Database Tables
Workflow Files
Communicate Post
Favourite Connection
Organize Dynamic VRE Creation
Secure Policy Control
Infrastructure: Collaborative Environment
A single place to
• Get status and updates from applications and other users they are interested in
• Get notifications about messages, job completion, newly generated products, etc.
[Screenshot labels: Share Updates, User news feed, VREs the user is a member of, Feeds from Applications, Feed from Users]
A single place to
• Manage data, store and preserve them
• Share data
THE SOFTWARE
Distinguishing capabilities of gCube software
iMarine Technology
• iMarine is powered by gCube
openhub.net
USE CASES
A few examples of the analytics capabilities
Practical Examples
Geospatial Analysis
Ecology
Biodiversity
Life History Traits
Geospatial Analysis
Rasteriza$on
A polygonal map is transformed into a raster map or into a point map
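The slide does not say how the transformation is implemented. A minimal sketch of one common approach is given here, assuming a simple polygon and a regular grid sampled at cell centres. The function names, grid parameters, and the square test polygon are illustrative assumptions, not part of the iMarine tooling.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell_size, n_cols, n_rows):
    """Return a row-major grid of 0/1 values, sampled at cell centres."""
    grid = []
    for row in range(n_rows):
        y = y_min + (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


# Hypothetical polygon: a square covering the middle of a 4 x 4 grid
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
raster = rasterize(square, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=4, n_rows=4)

# The "point map" variant keeps only the centres of the cells that fall inside
points = [(col + 0.5, row + 0.5)
          for row, line in enumerate(raster)
          for col, value in enumerate(line) if value]
```

Sampling at cell centres is the simplest rule. A production rasterizer would also have to decide how to treat cells that are only partly covered by the polygon.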
Maps Comparison
Compares:
• Species Distribution maps
• Environmental layers
• SAR Images
Periodicity and Seasonality
Periodicity: 12 months
Extraction Tools
Fourier Analysis
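As an illustration of how Fourier analysis can recover a 12-month periodicity, the following sketch computes a plain discrete Fourier transform and reports the period of the strongest component. The function name and the synthetic monthly series are assumptions made for the example; the actual iMarine extraction tools are not described on the slide.

```python
import cmath
import math


def dominant_period(samples):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the constant (zero-frequency) term
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Hypothetical monthly series: 10 years with an annual cycle plus a slow trend
monthly = [15 + 5 * math.sin(2 * math.pi * m / 12) + 0.01 * m for m in range(120)]
period = dominant_period(monthly)  # period in months
```

The direct DFT is quadratic in the series length, which is fine for short monthly series; an FFT would be the usual choice for longer signals.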
Environmental Signal Processing
Resampling
Spectrogram
Water/Height Column
Given a layer containing 3D environmental information
• Extract the environmental information along Z given X, Y, T at resolution R
• Produce charts and ranges
Ecology
Niche Modelling
• AquaMaps – Suitable Habitat
• AquaMaps – Native Habitat
• AquaMaps for 2050 Scenario
• Artificial Neural Networks
[Map: AquaMaps - Suitable Habitat for Gadus morhua]
Outliers Detection
Presence Points
Distance-based clustering: K-Means, X-Means
Density-based clustering and outliers detection: DBScan
Cetorhinus maximus
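To make the density-based option concrete, here is a compact, unoptimized sketch of the DBSCAN algorithm applied to a handful of made-up presence points. The coordinates and the `eps` and `min_pts` values are illustrative assumptions; the slide does not give the parameters or implementation used in iMarine.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise (outlier)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical presence points: two dense groups and one isolated record
presence = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (50, 50)]
labels = dbscan(presence, eps=1.5, min_pts=3)
outliers = [p for p, label in zip(presence, labels) if label == -1]
```

Points labelled -1 belong to no dense region and are the candidates for review as outliers. Unlike K-Means, the number of clusters does not have to be chosen in advance.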
Biodiversity
Climate Change Effects on Species
Estimated impact of climate change over 20 years on 11549 species.
Pseudanthias evansi: occupancy decreases in Area 71 but increases in Area 77
Bioclimate HSpec
Overall occupancy in time
Similarity between habitats
Habitat Representativeness Score (HRS):
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
HRS=10.5
Latimeria chalumnae
Occurrence Data from GBIF, Occurrence Data from OBIS
Two sets of occurrence points, A and B, are compared by record similarity.
Fields of each record: x,y; Event Date; Modif Date; Author; Species Scientific Name
Operations:
∩ Intersection
- Difference
∪ Union
DD Duplicates Deletion
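The four operations can be sketched with ordinary set logic once a rule for "same record" is fixed. In the sketch below that rule is an assumption: two records match when their rounded coordinates, event date, and normalized scientific name agree. The field names and sample records are hypothetical, and the real tool uses a similarity measure over the record fields rather than this exact-key shortcut.

```python
def record_key(record, decimals=2):
    """Identity of an occurrence record: rounded position, event date, species name."""
    return (
        round(record["x"], decimals),
        round(record["y"], decimals),
        record["event_date"],
        record["scientific_name"].strip().lower(),
    )


def delete_duplicates(records):
    """Keep the first record seen for each key."""
    unique = {}
    for r in records:
        unique.setdefault(record_key(r), r)
    return list(unique.values())


def intersection(a, b):
    keys_b = {record_key(r) for r in b}
    return [r for r in delete_duplicates(a) if record_key(r) in keys_b]


def difference(a, b):
    keys_b = {record_key(r) for r in b}
    return [r for r in delete_duplicates(a) if record_key(r) not in keys_b]


def union(a, b):
    return delete_duplicates(a + b)


def rec(x, y, event_date, name):
    return {"x": x, "y": y, "event_date": event_date, "scientific_name": name}


# Hypothetical records standing in for GBIF (A) and OBIS (B) occurrence points
gbif = [rec(12.001, 43.5, "2010-05-01", "Gadus morhua"),
        rec(5.0, 60.0, "2011-01-01", "Gadus morhua")]
obis = [rec(12.0, 43.5, "2010-05-01", "gadus morhua "),
        rec(-3.0, 48.0, "2009-07-15", "Cetorhinus maximus")]

common = intersection(gbif, obis)
only_gbif = difference(gbif, obis)
merged = union(gbif, obis)
```

Rounding is a crude stand-in for spatial similarity, since two nearby points can fall on opposite sides of a rounding boundary. A distance threshold would be more robust at the cost of a pairwise comparison.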
BiOnym
A flexible workflow approach to taxon name matching
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of taxa names
Raw input string: Gadus morua Lineus 1758
Correct transcription: Gadus morhua (Linnaeus, 1758)
Workflow: Preprocessing and Parsing, Taxon name Matcher 1, Taxon name Matcher 2, …, Taxon name Matcher n, PostProcessing
Reference sources: ASFIS, FISHBASE, WoRMS, Other in DwC-A
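The slide's example (a misspelled "Gadus morua" resolved to "Gadus morhua") can be reproduced with a single generic fuzzy matcher. This sketch is not BiOnym's own matching chain: it uses Python's standard `difflib` as a stand-in for one matcher, and the reference list, function names, and cutoff are assumptions chosen for the example.

```python
import difflib
import re

# Hypothetical reference list; the slide names ASFIS, FishBase and WoRMS as real sources
REFERENCE_NAMES = ["Gadus morhua", "Gadus macrocephalus",
                   "Cetorhinus maximus", "Latimeria chalumnae"]


def preprocess(raw):
    """Keep the first two words (genus and species), dropping authorship and year."""
    words = re.findall(r"[A-Za-z]+", raw)
    return " ".join(words[:2]).capitalize()


def match_taxon(raw, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return (best_match, score), or (None, 0.0) when nothing clears the cutoff."""
    candidate = preprocess(raw)
    hits = difflib.get_close_matches(candidate, reference, n=1, cutoff=cutoff)
    if not hits:
        return None, 0.0
    score = difflib.SequenceMatcher(None, candidate, hits[0]).ratio()
    return hits[0], score


name, score = match_taxon("Gadus morua Lineus 1758")
```

A chain of matchers, as in the workflow above, would pass unmatched or low-scoring names on to the next matcher instead of stopping at the first cutoff. Recovering the authorship ("Linnaeus, 1758") would need a separate parsing step that this sketch leaves out.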
Trendylyzer - Recognize Big Changes in Species Presence
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
Plankton regime shift
Herring recovered after the fish ban
Life History Traits
Length-Weight Relationships
$W = aL^b$
Calculate the a and b parameters for 14 230 species by means of Bayesian methods
Approach:
– Collaborative development with the final user
– Integration of user’s R scripts
– Usage of parallel processing for R scripts
– Periodic runs
– Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion
– The run time dropped from 20 days to 11 hours (95.4% reduction)
bluewatermag.com.au
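For readers unfamiliar with the relationship, taking logarithms turns W = aL^b into the straight line log W = log a + b log L, so a and b can be estimated from paired length and weight measurements. The sketch below uses ordinary least squares on the logs, which is a simpler technique than the Bayesian methods named on the slide and is shown only to illustrate what the two parameters mean. The data are synthetic and the values a = 0.01 and b = 3.0 are assumptions.

```python
import math


def fit_length_weight(lengths, weights):
    """Estimate a and b in W = a * L**b by least squares on log W = log a + b log L."""
    xs = [math.log(length) for length in lengths]
    ys = [math.log(weight) for weight in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)      # intercept, back-transformed from log a
    return a, b


# Synthetic, noise-free data generated from the assumed values a = 0.01, b = 3.0
lengths_cm = [10.0, 20.0, 30.0, 40.0, 50.0]
weights_g = [0.01 * length ** 3.0 for length in lengths_cm]
a, b = fit_length_weight(lengths_cm, weights_g)
```

A Bayesian fit would additionally combine prior knowledge about a and b with the data, which matters for species with few measurements.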
Safe Biological Limits of Large Stocks
Estimate biological limits for 50 Northeast Atlantic fish stocks
– Use real measures
– Rely on previous expert knowledge
– Use Bayesian models to combine information
[Chart labels: Re-estimated SSB limit, Re-estimated HS, Rule-based HS, Re-estimated precautionary limit]
Resilience vs Productivity of a Species
Best Resilience and Productivity pair for the species
THE WAY TO USE IT
Distinguishing capabilities of the exploitation models
Multi-tenant Delivery Model
Infrastructure as a Service
• Dynamic deployment • Hosting • Resource Lifecycle • Monitoring • Accounting • Security
Software as a Service
• BiolCube • ConnectCube • GeosCube • StatsCube
Platform as a Service
• FeatherWeightStack • SmartGears • ApplicationSupportLayer • SOA3
Landscape
D4Science e-Infrastructure
gCube Framework
gCube Apps
Discussion
www.i-marine.eu
i-marine.d4science.org
www.openhub.net/p/gCube
info@i-marine.eu
Google Analytics iMarine portal