FOCUS MEETING ON FAIR DATA DEVELOPMENTS

Luiz Olavo Bonino - luiz.bonino@dtls.nl

SUMMARY

■ What is FAIR data?

■ The FAIR ecosystem

■ Plans and how to realise

[Diagram sequence: a data producer ("Produces") and a data consumer ("Consumes"). Concerns shown: storage, sustainability, maintenance, license, privacy, security, stewardship, access. Formats, technologies and standards shown: RDF, MIAPE, DBMS, Excel, API, SQL, SPARQL, metadata, DICOM, MIRIAM, semantics. Tasks shown: access, find, query, format, license, integrate.]

WHAT IS FAIR DATA?

FAIR Data aims to support existing communities in their attempts to enable valuable scientific data and knowledge to be published and utilised in a ‘FAIR’ manner.

Findable - (meta)data is uniquely and persistently identifiable. Should have basic machine readable descriptive metadata.

Accessible - data is reachable and accessible by humans and machines using standard formats and protocols.

Interoperable - (meta)data is machine readable and annotated with resolvable vocabularies/ontologies.

Reusable - (meta)data is sufficiently well-described to allow (semi)automated integration with other compatible data sources.

THE FAIR ECOSYSTEM

FAIR Data Principles

FAIR Data Protocol

FAIR Data Resources

FAIR Data Core Technologies

FAIR Data Systems/Tools

Normative

Artefact

Software

www.nature.com/articles/sdata201618


FAIR DATA RESOURCE

Datasets expressed using one of the prescribed standards of the FAIR Data Protocol, with metadata complying with the protocol and license. The original dataset is transformed into a FAIR format and proper metadata and license are added to produce a FAIR Data Resource. The original and the FAIR version can co-exist, each one fulfilling its own purpose.
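The conversion described above can be pictured as wrapping the original data in a FAIR-format serialisation together with metadata and a license, while leaving the original untouched. The following is a minimal Python sketch under stated assumptions: the names `FairDataResource` and `fairify`, the choice of Turtle-like statements as the FAIR format, and the `example.org` identifiers are all illustrative and are not prescribed by the FAIR Data Protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FairDataResource:
    """Hypothetical container: FAIR-format data plus metadata and license."""
    data: str                                   # data serialised in a FAIR format
    fair_format: str                            # assumed here: "text/turtle"
    metadata: dict = field(default_factory=dict)
    license: str = ""


def fairify(original_rows, to_fair_format, metadata, license_uri):
    """Build a FAIR Data Resource without modifying the original dataset.

    `to_fair_format` is a caller-supplied function (an assumption of this
    sketch) that turns one original record into one FAIR-format statement.
    """
    if not license_uri:
        raise ValueError("a FAIR Data Resource needs an explicit license")
    statements = [to_fair_format(row) for row in original_rows]
    return FairDataResource(
        data="\n".join(statements),
        fair_format="text/turtle",
        metadata=dict(metadata),
        license=license_uri,
    )


# Made-up example record; the original list is left as it was.
original = [{"id": "rec1", "name": "sample"}]
resource = fairify(
    original,
    lambda r: f'<http://example.org/{r["id"]}> <http://example.org/name> "{r["name"]}" .',
    {"title": "Example dataset"},
    "http://example.org/license",
)
```

Returning a new object rather than changing the input reflects the point that the original and the FAIR version can co-exist.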

[Diagram: an original dataset goes through FAIR conversion to become a FAIR Data Resource consisting of FAIR format, metadata and license. FAIR transformations and analysis transformations are shown as separate steps.]

FAIR DATA APPLICATION ECOSYSTEM (NL APPROACH)

[Diagram: a FAIR transformation produces a FAIR Data Resource.]

BRING YOUR OWN DATA - BYOD

■ Goals:
  ■ Learn how to make data linkable “hands-on” with experts
  ■ Create a “telling story” to demonstrate its use

■ Composition:
  ■ Data owners – specialists on given datasets
  ■ Data interoperability experts
  ■ Domain experts

Source: Marcos Roos

FAIRIFIER

[Diagram: a non-FAIR dataset is the input to the FAIRifier. Its output is a FAIR Data Resource (FAIR format, metadata, license), which is published to the Data FAIRport (Find, Access, Interoperate & Re-use Data).]

FAIR DATA MODEL REGISTRY

[Diagram: the FAIR Data Model Registry, holding several pairs of dataset and data model.]

FAIRIFIER AND FAIR DATA MODEL REGISTRY

[Diagram: the data owner submits a non-FAIR dataset to the FAIRifier. The FAIRifier searches the FAIR Data Model Registry for a reference data model, and the registry returns a reference FAIR profile. The FAIRifier outputs a FAIR Data Resource (FAIR format, metadata, license).]

FAIR DATA POINT

A FAIR Data Point is a particular class of FAIR Data System that provides access to published datasets. The datasets can be external or internal to the FAIR Data Point, and the source data can be a regular (non-FAIR) dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.

[Diagram: sources behind a FAIR Data Point: a FAIR Data Resource, a non-FAIR Data Resource and a sensor.]
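The on-the-fly behaviour described for non-FAIR sources of a FAIR Data Point can be sketched as follows. This is a minimal Python illustration, not the FAIR Data Point implementation: the `Source` record, the `serve` function, the toy CSV-to-statements conversion and the `example.org` base are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical description of a dataset behind a FAIR Data Point."""
    content: str
    is_fair: bool


def csv_to_statements(csv_text, base="http://example.org/"):
    """Toy FAIR transformation: one 'subject,predicate,object' row per line."""
    statements = []
    for line in csv_text.strip().splitlines():
        subject, predicate, obj = (part.strip() for part in line.split(","))
        statements.append(f'<{base}{subject}> <{base}{predicate}> "{obj}" .')
    return "\n".join(statements)


def serve(source, transform=csv_to_statements):
    """Return FAIR data; non-FAIR sources are converted at request time."""
    if source.is_fair:
        return source.content          # already a FAIR Data Resource
    return transform(source.content)   # FAIR transformation on the fly


fair = Source('<http://example.org/a> <http://example.org/p> "1" .', True)
raw = Source("a,p,1", False)
```

In this sketch a consumer calling `serve` receives the same kind of output for both sources and does not need to know which one required a transformation.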

[Diagram: several data producers, each with its own dataset, and the data consumers that use those datasets.]

FAIR DATA POINT

Data Consumer: Who are you? Can I trust you?

FAIR Data Point: Here is information about myself. (returns FDP Metadata)

Data Consumer (reads FDP Metadata): Ok, now that I know you, tell me what you have to offer.

FAIR Data Point: Here is information about my catalog of datasets. (returns Catalog Metadata)

Data Consumer (reads Catalog Metadata): Tell me more about your genomic dataset.

FAIR Data Point: This is the detailed information about the genomic dataset. (returns Dataset & Data Record Metadata)

Data Consumer (reads Dataset & Data Record Metadata): Ok, now that I know what you have, give me the data.

FAIR Data Point: Here is my data. (returns FAIR Data)
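The FAIR Data Point exchange on these slides amounts to a client walking down the metadata layers before it requests any data. The sketch below mimics that walk against an in-memory stand-in for a FAIR Data Point. The dictionary layout, its keys, the paths and the `walk_fdp` helper are assumptions made for illustration and do not reflect the actual FDP API.

```python
# In-memory stand-in for a FAIR Data Point. A real FDP would serve these
# documents over HTTP; the paths and keys used here are hypothetical.
FDP = {
    "/": {"title": "Example FDP", "catalogs": ["/catalog"]},
    "/catalog": {"title": "Example catalog", "datasets": ["/genomic"]},
    "/genomic": {"title": "Genomic dataset", "distribution": "/genomic/data"},
    "/genomic/data": "FAIR data payload",
}


def walk_fdp(fdp, wanted_title):
    """Follow FDP -> catalog -> dataset -> data, reading metadata at each step."""
    steps = []
    repository = fdp["/"]                      # "Who are you?"
    steps.append(repository["title"])
    for catalog_path in repository["catalogs"]:
        catalog = fdp[catalog_path]            # "What do you have to offer?"
        steps.append(catalog["title"])
        for dataset_path in catalog["datasets"]:
            dataset = fdp[dataset_path]        # "Tell me more about this dataset"
            if dataset["title"] == wanted_title:
                steps.append(dataset["title"])
                data = fdp[dataset["distribution"]]   # "Give me the data"
                return steps, data
    return steps, None


steps, data = walk_fdp(FDP, "Genomic dataset")
```

Each lookup depends only on what the previous layer returned, which is the property that lets a machine client discover data without prior knowledge of the repository.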

FAIR DATA POINT - GENERAL ARCHITECTURE

[Diagram: components of a FAIR Data Point: FAIR API / GUI, Metadata Provider, FAIR Data Accessor, Metrics Gatherer and Access Controller, operating on FAIR Metadata and FAIR Data.]
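The slide names the components of a FAIR Data Point but not how they interact. One plausible reading is sketched below in Python, with loudly stated assumptions: metadata requests are open, data requests pass through the access check, and every request is counted. The class, its method names and the access rule are hypothetical and are not taken from the FDP software.

```python
from collections import Counter


class FairDataPoint:
    """Toy composition of the components named on the slide."""

    def __init__(self, metadata, data, allowed_users):
        self._metadata = metadata            # served by the Metadata Provider
        self._data = data                    # served by the FAIR Data Accessor
        self._allowed = set(allowed_users)   # consulted by the Access Controller
        self.metrics = Counter()             # filled by the Metrics Gatherer

    def get_metadata(self, dataset_id):
        # Assumption of this sketch: metadata is open, so no access check.
        self.metrics["metadata:" + dataset_id] += 1
        return self._metadata[dataset_id]

    def get_data(self, dataset_id, user):
        # Assumption of this sketch: data is restricted to known users.
        if user not in self._allowed:
            raise PermissionError(f"{user} may not read {dataset_id}")
        self.metrics["data:" + dataset_id] += 1
        return self._data[dataset_id]


fdp = FairDataPoint({"d1": {"title": "Example"}}, {"d1": "payload"}, ["alice"])
```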

EMBEDDED FAIR DATA POINT

[Diagram: the FAIR Data Point components (FAIR API / GUI, Metadata Provider, FAIR Data Accessor, Metrics Gatherer, Access Controller, FAIR Metadata, FAIR Data) embedded as B2FAIR next to the EUDAT API / GUI and the current EUDAT components.]

https://www.eudat.eu

DISTRIBUTED FAIR DATA POINTS

[Diagram: separate FAIR Data Points in front of a biobank database, a patient registry, UniProt and HPA.]

FAIR DATA POINT METADATA PROVIDER API

METADATA LAYERS

| Layer | Description | URL | Example | Standard |
|---|---|---|---|---|
| FDP (Data repository) | Information about the FDP as a data repository | http://myfdp/ | PID, title, description, license, owner, API version, etc. | OAI-PMH (extended) |
| Catalog | Information about the catalog of datasets offered | http://myfdp/catalog | PID, title, description, publisher, etc. | W3C DCAT #Catalog |
| Dataset | Information about each of the offered datasets | http://myfdp/[datasetID]/ | AccessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Dataset, #Distribution |
| Data record | Information about the actual data, types, identifiers, etc. | http://myfdp/[datarecordID] | | Community/domain, ex.: DICOM, VCF, |
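The URL patterns listed for the metadata layers can be captured in a small helper. This is a sketch only: the function name `layer_url` and the layer keywords are assumptions, and the patterns are copied from the examples given for each layer (`http://myfdp/`, `http://myfdp/catalog`, `http://myfdp/[datasetID]/`, `http://myfdp/[datarecordID]`) rather than from a published API specification.

```python
def layer_url(base, layer, identifier=None):
    """Build the example URL for one metadata layer of a FAIR Data Point."""
    base = base.rstrip("/")
    if layer == "fdp":
        return base + "/"
    if layer == "catalog":
        return base + "/catalog"
    if layer in ("dataset", "datarecord"):
        if not identifier:
            raise ValueError(f"{layer} URLs need an identifier")
        # The dataset pattern ends with a slash; the data record pattern does not.
        suffix = "/" if layer == "dataset" else ""
        return f"{base}/{identifier}{suffix}"
    raise ValueError(f"unknown layer: {layer}")


urls = [
    layer_url("http://myfdp/", "fdp"),
    layer_url("http://myfdp/", "catalog"),
    layer_url("http://myfdp/", "dataset", "breedb"),
    layer_url("http://myfdp/", "datarecord", "record-1"),
]
```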

FDP METADATA

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/fdp> a dct:Agent ;
    rdfs:label "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:description "This FDP provides metadata on plant-specific genotype/phenotype data sets"^^xsd:string ;
    dct:hasPart "catalog-01"^^xsd:string ;
    dct:identifier "FDP-WUR-PB"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:version "1.0"^^xsd:string .

CATALOG METADATA

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/catalog/catalog-01> a dcat:Catalog ;
    rdfs:label "Plant Breeding Data Catalog"^^xsd:string ;
    dct:description "Plant Breeding Data Catalog"^^xsd:string ;
    dct:hasPart <breedb> ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "Plant Breeding Data Catalog"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:dataset <breedb> .

DATASET METADATA

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/dataset/breedb> a dcat:Dataset ;
    rdfs:label "BreeDB tomato passport data"^^xsd:string ;
    dct:description "BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:distribution <breedb-sparql>, <breedb-sqldump> .

METADATA DISTRIBUTION

<http://fdp.biotools.nl:8080/distribution/breedb-sparql> a dcat:Distribution ;
    rdfs:label "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:description "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:accessURL <http://virtuoso.biotools.nl:8888/sparql> .

<http://fdp.biotools.nl:8080/distribution/breedb-sqldump> a dcat:Distribution ;
    rdfs:label "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:description "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:downloadURL <http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql> .
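The two distributions differ mainly in how the data is reached: the SPARQL endpoint is given as `dcat:accessURL`, the SQL dump as `dcat:downloadURL`. A short Python sketch of generating such a description is shown below. The function names are hypothetical, only a subset of the properties is emitted, the prefixes are assumed to be declared elsewhere, and requiring exactly one of the two URLs is a simplification of this sketch rather than a DCAT rule.

```python
def turtle_string(value):
    """Quote a Python string as an xsd:string Turtle literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:string'


def distribution_turtle(uri, title, license_uri, access_url=None, download_url=None):
    """Render one dcat:Distribution as Turtle (prefixes declared elsewhere)."""
    # Simplification: this sketch expects exactly one of the two URLs.
    if (access_url is None) == (download_url is None):
        raise ValueError("give exactly one of access_url or download_url")
    lines = [
        f"<{uri}> a dcat:Distribution ;",
        f"    dct:title {turtle_string(title)} ;",
        f"    dct:license <{license_uri}> ;",
    ]
    if access_url is not None:
        lines.append(f"    dcat:accessURL <{access_url}> .")
    else:
        lines.append(f"    dcat:downloadURL <{download_url}> .")
    return "\n".join(lines)


ttl = distribution_turtle(
    "http://fdp.biotools.nl:8080/distribution/breedb-sparql",
    "SPARQL endpoint for BreeDB tomato passport data",
    "http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0",
    access_url="http://virtuoso.biotools.nl:8888/sparql",
)
```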

CURRENT STATUS

■ FDP, Catalog and Dataset metadata tested.

■ FAIR Accessor tested.

■ Demonstration application on rare diseases with FDPs exposing patient registry and biobank datasets.

■ Working on FDP for BreeDB (WUR).

FDP DEMONSTRATION

[Diagram: a Biobank FAIR Data Point and a Patient Registry FAIR Data Point.]

■ 60 dataset metadata

■ 3 biobanks data

■ 3 patient registries data

NEXT STEPS

■ Extend the demonstration application with more types of datasets.

■ Specify a metadata description format for the data record metadata.

■ Implement the Security Enforcer and Metrics Gatherer components.

■ Release version 1.0

■ Implement subscription/notification mechanism.

DATA FAIRPORT: FIND, ACCESS, INTEROPERATE & RE-USE DATA

■ A particular class of FAIR Data System to provide support for data interoperability.

■ Supports publication of and access to FAIR data.

■ Fosters an ecosystem of applications and services.

■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable.

■ Supports citation of datasets and data items.

■ Provides metrics for data usage and citation.

FAIR DATA PUBLICATION

[Diagram: data owners/creators publish a dataset and its metadata to the Data FAIRport. Dataset and metadata are each shown linked to Concept 1, Concept 2, Concept 3 and Concept 4.]

FAIR DATA ACCESS

[Diagram: a data user accesses multiple datasets through the Data FAIRport.]

DISTRIBUTED ARCHITECTURE

[Diagram: interconnected Data FAIRports in the Netherlands. Nodes shown: DTL, VLPB/WUR, Organization X and Organization Y. Domains shown: rare diseases, plant. Components of the DTL Data FAIRport: FAIR Data search engine, FAIRifier + (meta)data publication, metadata storage, data storage (optional), transformation services registry (optional) and FAIR Data Points.]

FAIRPORT ECOSYSTEM

[Diagram: the Data FAIRport with data producers and data consumers, applications and services (e.g. BRAIN) and infrastructural services.]

FAIRPORT

[Diagram: FAIRport components. APIs: Stewardship API, FAIR Data API, data storage API. (Meta)Data Storage component: metadata storage, data storage (DataVerse, EUDAT, data repository), semantic resolver, ontology storage, metrics storage, data usage policy. Management component: GUI (data publishing, search, management). Connected parties: data producers, data consumers, data consumer apps (e.g. *APInatomy, BRAIN), data management apps, data stewardship apps and other FAIR Data Systems.]

ROADMAP 2016

■ Implement the FAIR Data search engine

■ Implement the FAIR Data publication mechanism

■ Extend the demonstration application with more types of datasets.

■ Specify a metadata description format for the data record metadata.

■ Implement the Security Enforcer and Metrics Gatherer components on FAIR Data Points.

■ Start work on encryption and pseudonymisation for the Personal Health Train.

QUESTIONS?

Luiz Olavo Bonino

luiz.bonino@dtls.nl
