Metadata & brokering - a modern approach #2


Daniele Bailo

METADATA & BROKERING: a modern approach
EPISODE #2

Previously on… Metadata & Brokering #1

Main concepts
- Digital Data
- Metadata
- Brokering system
- The triad <PID, MD, DO>
- Database
- APIs (web services)

Side concepts
- Ontologies / Semantics
- PID
- Digital Object
- Standard
- Interoperability
- Open Access

[Diagram: multiple datasets accessed through APIs]

Discovery (DC) and (CKAN, eGMS)

Contextual (CERIF metadata model)

Detailed (community specific)

Features
1. APIs
2. <PID, metadata, DO>
3. Contextualization metadata
4. Support ontologies

Data from Irpinia

<PID, metadata, DO>

request response

THE PERFECT SYSTEM #6
Metadata driven canonical Brokering with contextualization & PID

BROKERING SYSTEM

NEW & OLD CHARACTERS

Metadata

Purposes
1. Discovery (humans & machines)
2. Contextualization: what is the context of the data?
3. Use it for processing or other advanced tasks

Usually attached to the D.O. (Digital Object)
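To make the triad <PID, MD, DO> concrete, here is a minimal sketch of metadata attached to a digital object and used for discovery. The class names, the `pid:0001` identifier format and the sample values are assumptions made up for illustration, not part of any real system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DigitalObject:
    """The data itself (here just raw bytes)."""
    content: bytes


@dataclass
class Record:
    """The triad <PID, MD, DO>: identifier, metadata, digital object."""
    pid: str                                   # hypothetical identifier format
    metadata: dict = field(default_factory=dict)
    digital_object: Optional[DigitalObject] = None


def discover(records, **criteria):
    """Return the PIDs of records whose metadata match every criterion."""
    return [r.pid for r in records
            if all(r.metadata.get(k) == v for k, v in criteria.items())]


# Made-up sample records.
records = [
    Record("pid:0001", {"producer": "INGV", "format": "miniSEED"}, DigitalObject(b"...")),
    Record("pid:0002", {"producer": "INGV", "format": "CSV"}, DigitalObject(b"...")),
]
print(discover(records, format="miniSEED"))
```

Discovery only reads the metadata; the digital object is never opened.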

Interoperability

What & Why
Enables two systems to
1. Exchange information
2. Understand information

Usually achieved through:
- Agreed language
- Software “translators”: interfaces, thin layers

...are you speaking Arabic???
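A software "translator" can be as thin as a field mapping between two metadata vocabularies. The sketch below assumes two hypothetical systems that name the same fields differently; the mapping and the record values are invented for illustration.

```python
# Hypothetical field mapping between two metadata vocabularies.
SYSTEM_A_TO_B = {
    "creator": "producer",
    "created": "date_of_creation",
    "format": "data_format",
}


def translate(record, mapping):
    """Rename the keys of `record` using `mapping`; unknown keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Made-up record as system A would describe it.
record_a = {"creator": "INGV", "created": "2015-03-01", "format": "miniSEED"}
record_b = translate(record_a, SYSTEM_A_TO_B)
print(record_b)
```

This covers only the "exchange" half of interoperability. Agreeing on what the values mean is where vocabularies and ontologies come in.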

Ontologies

Why an ontology?
It is the way machines manage “meaning”

How does it work?
1. Connects concepts
2. Needs vocabulary

Issues
• Many ontologies exist
• Vocabulary mapping

[Diagram: example ontology graph. Concepts: Michelini, CNT, INGV, Gresta, Sailing, Trieste, Italy, Boat, sea. Relations: Is Director of, Is section of, Is president of, Has hobby, Is Born, Located in, use]
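One common way to let a machine "connect concepts" is to store each fact as a (subject, relation, object) triple. The sketch below uses the concept and relation labels above. How they pair up is an assumed reading of the diagram, not something the labels alone confirm.

```python
# Each fact is a (subject, relation, object) triple.
# The pairing below is an assumed reading of the diagram labels.
TRIPLES = [
    ("Michelini", "is director of", "CNT"),
    ("CNT", "is section of", "INGV"),
    ("Gresta", "is president of", "INGV"),
    ("Michelini", "has hobby", "Sailing"),
    ("Trieste", "located in", "Italy"),
]


def objects(subject, relation):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def follow(subject, *relations):
    """Chain relations: follow each one in turn starting from `subject`."""
    current = [subject]
    for relation in relations:
        current = [o for s in current for o in objects(s, relation)]
    return current


print(follow("Michelini", "is director of", "is section of"))
```

Chaining two relations answers a question that no single triple states directly. The relation names are the vocabulary that both sides must agree on.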

Metadata Catalogue #1

Purposes
Store metadata, e.g.
1. producer
2. date of creation
3. data format

Misleading Example (why?)

Metadata Catalogue #2
How to implement it?

Single table (bad habit)
One table with all data

Multi table (good habit)
- Data is stored in multiple tables (one per concept)
- Tables are linked
- Can contextualize data

Metadata catalogue = relational database *

(*) = also NoSQL... We’ll see it later.

[Diagrams: single table; multi table]
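A minimal sketch of the multi-table habit, using Python's built-in `sqlite3` module. The table names, column names and sample rows are assumptions chosen for illustration; a real catalogue would have many more concepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE producer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE dataset (
        id            INTEGER PRIMARY KEY,
        title         TEXT NOT NULL,
        data_format   TEXT,
        creation_date TEXT,
        producer_id   INTEGER REFERENCES producer(id)
    );
""")

# Made-up sample rows: the producer is stored once and referenced by id.
conn.execute("INSERT INTO producer (id, name) VALUES (1, 'INGV')")
conn.executemany(
    "INSERT INTO dataset (title, data_format, creation_date, producer_id) "
    "VALUES (?, ?, ?, ?)",
    [("Station inventory", "StationXML", "2015-01-10", 1),
     ("Waveforms", "miniSEED", "2015-02-20", 1)],
)

# The link between tables is what lets a query contextualize a dataset.
rows = conn.execute("""
    SELECT dataset.title, producer.name
    FROM dataset JOIN producer ON dataset.producer_id = producer.id
    ORDER BY dataset.title
""").fetchall()
print(rows)
```

In a single-table design the producer name would be repeated on every row and could not carry its own attributes or links.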


Catalogue Interface

Human interface (GUI)
Website or portal

Machine interface
- API or Web service, which executes scripts or queries
- Returns metadata in a given standard

(web) service

What is it?
It does something for the user (delivers value to the customer)*

A “thin layer”
We usually don’t know what’s under the hood

Examples
- FDSN stations
- FDSN dataselect

[Diagram: FDSN stations, FDSN Dataselect, Database (MD catalogue), Waveform repository]
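From the client side, both FDSN services are just URLs with query parameters. The sketch below builds such URLs following the FDSN web service path convention (`/fdsnws/<service>/1/query`). The base URL and the network and station codes are example values assumed for illustration. No request is sent.

```python
from urllib.parse import urlencode

# Assumed base URL; any data centre exposing FDSN web services would do.
BASE_URL = "https://webservices.ingv.it"


def station_query_url(network, level="station", fmt="text"):
    """Build an fdsnws-station query URL (metadata about stations)."""
    params = {"network": network, "level": level, "format": fmt}
    return f"{BASE_URL}/fdsnws/station/1/query?{urlencode(params)}"


def dataselect_query_url(network, station, start, end):
    """Build an fdsnws-dataselect query URL (the waveform data itself)."""
    params = {"network": network, "station": station,
              "starttime": start, "endtime": end}
    return f"{BASE_URL}/fdsnws/dataselect/1/query?{urlencode(params)}"


print(station_query_url("IV"))
print(dataselect_query_url("IV", "ACER", "2015-01-01T00:00:00", "2015-01-01T00:10:00"))
```

The caller never learns whether the answer comes from a database, a file system or another service. That is the "thin layer".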

CKAN

[Diagram: CKAN GUI, METADATA catalogue, CKAN APIs, EIDA stations, ISIDE stations, Metadata replication]

What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- No direct CKAN <-> sources connection

Examples
- Works with FDSN stations
- Doesn’t work with FDSN dataselect
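CKAN's machine interface is its Action API. As a rough sketch, the code below builds a `package_search` URL and reads dataset titles out of the JSON envelope that call returns. The portal URL is hypothetical and the response is a canned, made-up example, so nothing is fetched over the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical CKAN instance; replace with a real portal URL.
CKAN_URL = "https://ckan.example.org"


def package_search_url(query, rows=10):
    """URL for CKAN's Action API `package_search` call."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CKAN_URL}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a `package_search` JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Canned response with the envelope CKAN returns (content is made up).
sample = json.dumps({
    "success": True,
    "result": {"count": 1,
               "results": [{"name": "eida-stations", "title": "EIDA stations"}]},
})
print(package_search_url("stations"))
print(dataset_titles(sample))
```

The search runs against the replicated metadata only, which fits the point that CKAN holds no direct connection to the sources.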

Plugins


Brokering System (e.g. VERCE framework)

[Diagram: BROKER GUI, METADATA catalogue, BROKER APIs, EIDA stations, ISIDE stations, Metadata replication]

What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- System manager
- Other modules
- BROKER <-> sources interactive connection

Examples
- EIDA stations
- EIDA dataselect
- Processing Job at CINECA

[Diagram: System manager, Interactive access to service, EIDA dataselect, Processing facility, ? ? ?]
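The difference from a plain catalogue is the interactive connection to the sources. A toy sketch of that idea follows. The `Broker` class, its method names and the sample data are hypothetical and are not taken from VERCE or any other framework.

```python
class Broker:
    """Toy broker: a metadata catalogue plus live connections to sources."""

    def __init__(self):
        self.catalogue = {}   # pid -> metadata (replicated from the sources)
        self.sources = {}     # source name -> callable that fetches data

    def register_source(self, name, fetch):
        self.sources[name] = fetch

    def add_metadata(self, pid, metadata):
        self.catalogue[pid] = metadata

    def discover(self, **criteria):
        """Search the local catalogue only; no source is contacted."""
        return [pid for pid, md in self.catalogue.items()
                if all(md.get(k) == v for k, v in criteria.items())]

    def access(self, pid):
        """Interactive step: ask the owning source for the data itself."""
        metadata = self.catalogue[pid]
        fetch = self.sources[metadata["source"]]
        return fetch(pid)


broker = Broker()
# Stand-in for a real service call such as a dataselect request.
broker.register_source("dataselect", lambda pid: f"waveform bytes for {pid}")
broker.add_metadata("pid:0001", {"source": "dataselect", "network": "IV"})

for pid in broker.discover(network="IV"):
    print(pid, "->", broker.access(pid))
```

Discovery stays cheap and local, while access is delegated to whichever source owns the data.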

Comments & Questions

Why was the example misleading?

A global view

Data initiatives
- RDA: “regulate” data sharing/use
- EUDAT: Common data infrastructure
- EGI: Organize National Grid Infrastructures (CINECA)
- EPOS: ESFRI integrating Solid Earth data

RDA
Do for data what has been done for the internet (TCP/IP)

RDA concepts

Data Fabric

What?
Identifies mechanisms, standards, components and interfaces making data science efficient and cost effective

Data Management Plan
• Data management
• Data analysis
• Data preservation
• Data publication
• Data sharing

[UK data Archive http://www.data-archive.ac.uk/]

RDA concepts

Data Fabric

[RDA WG outputs https://indico.cern.ch/event/370271/session/2/contribution/6/material/0/0.pdf]

How to store?

How to register?

How to discover?

How to cite?

How to document processing?

How to integrate?

How to collect new DP?

How to access?

How to describe data?
How to discover data?

Metadata system

WE ALREADY KNOW EVERYTHING ABOUT IT

METADATA catalogue

How to have standards?
How to preserve data?

Registry system

What?
An agreed/legacy catalog of:
- data formats (schemas)
- metadata formats
- Vocabularies & semantic categories
- Data types
- Trusted repositories
- ….

Registry

Ahaa.. But in practice it’s a database..

…indeed…

How to register/cite data or publications?

PID system

Purpose
- DO / publication can be uniquely referenced
- Assign a PID at data creation time

Issues
- Need for a simple mechanism to implement it
- Now EUDAT can help
- Peter & Massimo comments…
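The core of a PID system is small: mint an identifier when the object is created, then resolve it to wherever the object currently lives. A minimal sketch follows. The `PidRegistry` class, the `example` prefix and the repository URLs are hypothetical and do not reflect the EUDAT or Handle APIs.

```python
import uuid


class PidRegistry:
    """Toy PID service: mint an identifier once, resolve it to a location later."""

    def __init__(self, prefix="example"):
        self.prefix = prefix       # hypothetical namespace, not a real handle prefix
        self._locations = {}

    def mint(self, location):
        """Assign a new PID at creation time and record where the object lives."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._locations[pid] = location
        return pid

    def move(self, pid, new_location):
        """The object moved; the PID stays the same."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid):
        return self._locations[pid]


registry = PidRegistry()
pid = registry.mint("https://repo-a.example.org/data/1")
registry.move(pid, "https://repo-b.example.org/data/1")
print(pid, "->", registry.resolve(pid))
```

A citation that uses the PID stays valid after the move, which is why the identifier, and not the URL, is what gets referenced.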

How to access data?

AAI system (federated & distributed)

Purpose
- Authenticate users
- Authorize users

Issues
- Delegation
- Many systems, sometimes non-interoperable
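Authentication and authorization are two separate questions: who is this user, and what may this user do. A small sketch under simplifying assumptions follows. The token store, the permission names and the status strings are invented; a real federated AAI would delegate both checks to external identity providers.

```python
# Hypothetical in-memory stores; a real AAI delegates these to identity providers.
TOKENS = {"token-abc": "alice"}                  # token -> user
PERMISSIONS = {"alice": {"read:waveforms"}}      # user -> granted permissions


def authenticate(token):
    """Who is this? Return the user name, or None if the token is unknown."""
    return TOKENS.get(token)


def authorize(user, permission):
    """May this user do that? Check the permission set."""
    return permission in PERMISSIONS.get(user, set())


def access(token, permission):
    """Run both checks in order and report the outcome."""
    user = authenticate(token)
    if user is None:
        return "401 unauthenticated"
    if not authorize(user, permission):
        return "403 forbidden"
    return f"200 ok for {user}"


print(access("token-abc", "read:waveforms"))
print(access("token-abc", "write:waveforms"))
print(access("bad-token", "read:waveforms"))
```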

How to store data?

Data repository (trusted)

What?
- Store data
- Couple with PIDs
- Ensure preservation (not curation)
- Can be trusted (DSA)

Opportunity
- INGV DSA repository…

How to document data processing?

Workflow engines

Purpose
- Tracks data transformation
- Allows versioning
- Allows reproducibility

Comments
- Interoperability among various workflow engines
- VERCE did it
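Tracking data transformation can be sketched as a provenance log that records a content hash before and after each step. This is an illustrative toy and not how VERCE or any specific workflow engine works. The step names and the sample input are made up.

```python
import hashlib


def digest(data: bytes) -> str:
    """Short content hash used to identify a version of the data."""
    return hashlib.sha256(data).hexdigest()[:12]


def run_workflow(data, steps):
    """Apply each (name, function) step and log input/output hashes."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": digest(data), "output": digest(result)}
        )
        data = result
    return data, provenance


# Two trivial stand-in processing steps.
steps = [
    ("strip", lambda b: b.strip()),
    ("upper", lambda b: b.upper()),
]
output, log = run_workflow(b"  seismic trace  ", steps)
for entry in log:
    print(entry)
```

Each hash identifies one version of the data, and rerunning the same steps on the same input yields the same log, which is the reproducibility check.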

Brokering System (e.g. VERCE framework)

[Diagram: BROKER GUI, METADATA catalogue, BROKER APIs]

Full version includes
- Metadata Catalogue
- Interfaces (GUI+API)
- System manager
- AAI system
- Workflow engine

External actors
- PID System
- Trusted repositories
- Registries
- Processing facilities

[Diagram: System manager, datasets accessed through APIs, AAI system, Workflow Engine, Trusted repositories, Registry, PID system, HPC center]

Q&A