iMarine solutions and benefits for communities.
iMarine Catalogue of Services
Pasquale Pagano (CNR), iMarine Technical Director, pasquale.pagano@isti.cnr.it
iMarine data platform for collaborations, 7th March 2014, 09:00 – 17:30
Food and Agriculture Organization of the United Nations (FAO) Headquarters
The Catalogue of Services
iMarine is exploiting a Hybrid Data Infrastructure combining over 500 software components into a
coherent and centrally managed system of hardware, software, and data resources.
Born from the user needs
I need to host my applications in a secure and scalable environment
I need to maintain my database
I need to back up my data
I need to deliver my data to a set of known people
I need to analyse my big datasets
Born from the user needs
I need to manage and analyze biological and ecological data
I need to manage the full data life-cycle from import to validation, curation, harmonization and publication
I need to offer my team a powerful tool to manage code-lists
I need to store and analyze geospatially explicit information
I want to offer a flexible sharing, storage, reporting, search and retrieval tool
Born from the user needs
I need to access authoritative biological and ecological data
I wish to simplify access to my geospatial data
I need to mash up statistical and biodiversity data
I need to reduce my department's data maintenance costs
I need to validate my datasets and provide standard access to them
User Needs Analysis
• Needs
  – Not isolated
  – Not disconnected
  – Not trivial
• Solutions
  – Current, but with an eye to the future
  – Designed for individuals but looking at the community
Capacities: Storage as Service
• Scalability and high availability
• Across sites
• ISO 19115/19139 Metadata
• Catalogue
• Open source RDBMS
• Up to 1 TB data
• Secure
• Fault-tolerant
• Replication
Virtual Workspace
Relational Databases
Large and Active data storage
Spatial Database
Capacities: Computing as Service
Hadoop
Statistical Manager
R clusters
• MapReduce
• Analysis/clustering/modeling
• Windows and Linux
1000 CPUs currently available
Management and interpretation of biological and ecological data in the environment
Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
Storage and interpretation of geospatially explicit information, including WPS processing
Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
Applications
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
• Occurrence and Taxonomic Data Discovery
• Occurrence Data Processing
• Species Distribution Modeling
• Species Distribution Maps Discovery
• Taxonomic Data Comparison
• Taxonomic Data Matching

• Code List Discovery
• Code List Management
• Statistical Engine
• Tabular Data Discovery
• Tabular Data Enrichment
• Tabular Data Management
• Tabular Data Processing

• Geospatial Data Discovery
• Geospatial Data Processing

• Enhanced Documents Management
• Fact-sheets Management
• Information Object Discovery
• Messaging
• Shared Workspace
• Social Networking Facilities
Presence Points (FishBase + OBIS)
Density-Based Clustering: DBSCAN (with outliers)
Other methods are also available …
• K-Means
• X-Means
Features Clustering with StatsCube
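To make the density-based clustering of presence points more concrete, here is a minimal, self-contained DBSCAN sketch in plain Python. It is not the StatsCube implementation: the sample coordinates, the `eps` and `min_pts` values, and the use of plain Euclidean distance on (longitude, latitude) pairs are all assumptions chosen for illustration.

```python
import math

NOISE = -1  # label given to outliers


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical presence points as (longitude, latitude) pairs.
presence_points = [
    (10.0, 40.0), (10.1, 40.1), (10.2, 40.0), (10.1, 39.9),
    (30.0, -5.0), (30.1, -5.1), (30.0, -5.2), (30.2, -5.0),
    (-60.0, 70.0),  # isolated record, expected to be flagged as an outlier
]
labels = dbscan(presence_points, eps=0.3, min_pts=3)
print(labels)
```

Unlike K-Means, this approach does not need the number of clusters in advance and reports isolated records as outliers, which is the behaviour the slide highlights.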
Data Analysis with StatsCube
Import CodeLists
Validate Datasets
Analyse and Project
Ecological Modeling with BiolCube
FAO Eleutheronema tetradactylum VS AquaMaps Eleutheronema tetradactylum
Maps Comparison with GeosCube
MEAN=0.81
VARIANCE=0.02
NUMBER_OF_ERRORS=6691
NUMBER_OF_COMPARISONS=259200
ACCURACY=97.42
MAXIMUM_ERROR=1.0
MAXIMUM_ERROR_POINT=3005:363:1
COHENS_KAPPA=0.218
COHENS_KAPPA_CLASSIFICATION_LANDIS_KOCH=Fair
COHENS_KAPPA_CLASSIFICATION_FLEISS=Marginal
TREND=EXPANSION
RESOLUTION=0.5
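Some of the reported figures can be cross-checked with simple arithmetic. The sketch below assumes that ACCURACY is the share of comparisons without error and that the comparison runs over a global grid at the stated resolution in degrees. Both assumptions are inferred from the numbers matching, not stated in the source. The Landis and Koch bands are the commonly cited ones, and the function names are hypothetical.

```python
# Values copied from the comparison output.
NUMBER_OF_ERRORS = 6691
NUMBER_OF_COMPARISONS = 259200
RESOLUTION = 0.5  # assumed to be in degrees
COHENS_KAPPA = 0.218


def accuracy_percent(errors, comparisons):
    """Share of comparisons without error, as a percentage (assumed definition)."""
    return 100.0 * (comparisons - errors) / comparisons


def global_grid_cells(resolution_deg):
    """Number of cells in a global longitude/latitude grid at this resolution."""
    return round(360 / resolution_deg) * round(180 / resolution_deg)


def landis_koch(kappa):
    """Agreement label for Cohen's kappa using the Landis and Koch bands."""
    if kappa < 0:
        return "Poor"
    bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"), (0.80, "Substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"


accuracy = round(accuracy_percent(NUMBER_OF_ERRORS, NUMBER_OF_COMPARISONS), 2)
cells = global_grid_cells(RESOLUTION)
label = landis_koch(COHENS_KAPPA)
print(accuracy, cells, label)
```

Under these assumptions the sketch reproduces ACCURACY=97.42, a grid of 259200 cells, and the "Fair" classification. The high accuracy next to a low kappa is what one would expect when most cells agree trivially, for example because the species is absent from most of the grid in both maps.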
iMarine data sources: OBIS, WoRMS, WoRDS, GBIF, CoL, ITIS, IRMNG, NCBI, MyOcean, WOA, EuroStat, Data.FAO, …
iMarine Registries
Validation
Enriching
Processing
Sharing
Data
Ontologies and Data Warehouses
Biological and Ecological Data
GeoSpatial Data
Statistical Data
Documents
DarwinCore / ISO19139
• >35 M Observations (OBIS)
• ≈ 120 K Observed Species (OBIS)
• ≈ 500 K Taxa (WoRMS)
• >600 K Scientific Names (ITIS)
• >12 K Species Maps (AquaMaps)
• ≈ 600 Species Extent (FAO)
• … FishBase, SeaLifeBase
• … CoL, GBIF
SDMX *
• FAO CodeLists
• IRD CodeLists
• FAO datasets
• Eurostat
• …
ISO19139 (OGC W*S)
• 10 years Chemical and Physical variables in 2D space
  – Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
• On-demand Chemical and Physical variables in 3D space
  – Apparent Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
> 350 variables
OAI-PMH, OpenSearch
• FAO Factsheets
• Aquatic Commons
• Bioline International
• Biodiversity Heritage
• OceanDocs
• Nature, Pensoft Journals
• …
RDF, OWL
• FAO FLOD
• Marine Top Level Ontology
• IRD Ecoscope
• FactForge, Yago2
• …
Is this enough?
• An ecosystem of participatory data e-Infrastructures
• Regulated by policies
• Enabled by standards
• Promoting not only access but mash-up of heterogeneous data
User centric
Virtual Research Environment
iMarine is user-centric and workflow-oriented thanks to the gCube VRE technology
A Virtual Research Environment (VRE) is
• a distributed and dynamically created environment
• where a subset of data, services, computational, and storage resources
• regulated by tailored policies
• are assigned to a subset of users via interfaces
• for a limited timeframe
• at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
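The VRE definition above can be read as a small data model: a named environment that binds a subset of resources to a subset of users, under policies, for a limited timeframe. The sketch below illustrates that reading only. It is not the gCube API, and every class, field, resource, and policy name in it is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, FrozenSet, Tuple

# A policy decides whether a given user may use a given resource.
Policy = Callable[[str, str], bool]


@dataclass(frozen=True)
class VirtualResearchEnvironment:
    name: str
    resources: FrozenSet[str]     # subset of data, services, computing, storage
    users: FrozenSet[str]         # subset of users the resources are assigned to
    policies: Tuple[Policy, ...]  # tailored policies regulating use
    start: date                   # the VRE exists only for a limited timeframe
    end: date

    def is_active(self, today: date) -> bool:
        return self.start <= today <= self.end

    def allows(self, user: str, resource: str, today: date) -> bool:
        """True if the VRE is active, both parties belong to it, and all policies agree."""
        return (
            self.is_active(today)
            and user in self.users
            and resource in self.resources
            and all(policy(user, resource) for policy in self.policies)
        )


def storage_for_curators_only(user: str, resource: str) -> bool:
    # Hypothetical policy: only the curator may use the storage resource.
    return resource != "workspace-storage" or user == "curator"


vre = VirtualResearchEnvironment(
    name="ExampleVRE",
    resources=frozenset({"species-occurrences", "clustering-service", "workspace-storage"}),
    users=frozenset({"curator", "analyst"}),
    policies=(storage_for_curators_only,),
    start=date(2014, 3, 1),
    end=date(2014, 3, 31),
)

print(vre.allows("analyst", "clustering-service", date(2014, 3, 7)))
print(vre.allows("analyst", "workspace-storage", date(2014, 3, 7)))
```

Keeping the timeframe and policies inside the environment, rather than on each resource, reflects the point that a VRE is created dynamically and dissolved without changes on the provider side.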
iMarine Technology
• iMarine is powered by gCube
https://www.ohloh.net/p/gCube
iMarine e-infrastructure
iMarine is exploiting D4Science.org
Geographically Distributed Computing Infrastructure
Across administrative boundaries
Across private and commercial providers
Service Allocations, Deployment, Monitoring, and Operation
Uniform resource and data access
Operation: Built on SLAs
Support monitoring, auditing, reporting, and notification
Trust: Privacy, governance, and attribution
Security, trusted network
Landscape
D4Science e-‐Infrastructure
gCube Framework
gCube Apps
Discussion
www.i-marine.eu
i-marine.d4science.org
Google Analytics iMarine portal