FOCUS MEETING ON FAIR DATA DEVELOPMENTS
Luiz Olavo Bonino - luiz.bonino@dtls.nl
SUMMARY
■ What is FAIR data?
■ The FAIR ecosystem
■ Plans and how to realise
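[Figure: data producers and data consumers, separated by open questions ("?") about storage, sustainability, maintenance, license, privacy, security, stewardship and access]
[Figure: producers and consumers use heterogeneous formats, technologies and semantics: RDF, MIAPE, DBMS, Excel, API, SQL, SPARQL, metadata, DICOM, MIRIAM]
[Figure: consumer-side concerns: find, access, query, format, license, integrate]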
WHAT IS FAIR DATA?
FAIR Data aims to support existing communities in their attempts to enable valuable scientific data and knowledge to be published and utilised in a ‘FAIR’ manner.
Findable - (meta)data is uniquely and persistently identifiable, and should have basic machine-readable descriptive metadata.
Accessible - data is reachable and accessible by humans and machines using standard formats and protocols.
Interoperable - (meta)data is machine-readable and annotated with resolvable vocabularies/ontologies.
Reusable - (meta)data is sufficiently well-described to allow (semi)automated integration with other compatible data sources.
THE FAIR ECOSYSTEM
FAIR Data Principles
FAIR Data Protocol
FAIR Data Resources
FAIR Data Core Technologies
FAIR Data Systems/Tools
Normative
Artefact
Software
www.nature.com/articles/sdata201618
FAIR DATA RESOURCE
A FAIR Data Resource is a dataset expressed using one of the standards prescribed by the FAIR Data Protocol, with metadata that complies with the protocol and a license. The original dataset is transformed into a FAIR format, and proper metadata and a license are added to produce a FAIR Data Resource. The original and the FAIR version can co-exist, each fulfilling its own purpose.
[Figure: Original dataset → FAIR Conversion → FAIR Data Resource (FAIR format, metadata, license); arrows labelled "FAIR transformation" and "Analysis transformation"]
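The conversion step can be illustrated with a minimal Python sketch. It is not the FAIRifier itself: it assumes a CSV source with an identifier column, mints IRIs under a hypothetical namespace `http://example.org/`, writes the data as N-Triples (one possible FAIR format), and bundles the result with a metadata dictionary and a license IRI. The original CSV text is never modified, so both versions can co-exist.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for minted IRIs


@dataclass
class FairDataResource:
    """Data in a FAIR format plus its metadata and license."""
    triples: list[str]        # N-Triples lines
    metadata: dict[str, str]  # descriptive metadata (keys are illustrative)
    license: str              # license IRI


def iri(local: str) -> str:
    """Mint an IRI under BASE, percent-encoding unsafe characters."""
    return f"<{BASE}{quote(local)}>"


def literal(value: str) -> str:
    """Build an N-Triples string literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def fairify(csv_text: str, id_column: str,
            metadata: dict[str, str], license_iri: str) -> FairDataResource:
    """Turn each CSV row into triples about one record; csv_text is left untouched."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = iri(f"record/{row[id_column]}")
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id becomes the subject; empty cells are skipped
            triples.append(f"{subject} {iri('vocab/' + column)} {literal(value)} .")
    return FairDataResource(triples, dict(metadata), license_iri)


if __name__ == "__main__":
    original = "accession,species\nA1,tomato\n"  # hypothetical input
    resource = fairify(original, "accession",
                       {"title": "Example passport data"},
                       "http://example.org/license/example")
    print("\n".join(resource.triples))
```

Keeping the resource as a small record of (data, metadata, license) mirrors the three parts the slide names; a real FAIRifier would map columns to shared ontology terms rather than to a local `vocab/` namespace.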
FAIR DATA APPLICATION ECOSYSTEM (NL APPROACH)
[Figure: a FAIR transformation produces a FAIR Data Resource]
BRING YOUR OWN DATA - BYOD
■ Goals:
  ■ Learn how to make data linkable “hands-on” with experts
  ■ Create a “telling story” to demonstrate its use
■ Composition:
  ■ Data owners: specialists on given datasets
  ■ Data interoperability experts
  ■ Domain experts
Source: Marcos Roos
BYOD
FAIRIFIER
[Figure: a Non-FAIR Dataset is the input to the FAIRifier, which outputs a FAIR Data Resource (FAIR format, metadata, license) that is published to the Data FAIRport (Find, Access, Interoperate & Re-use Data)]
FAIR DATA MODEL REGISTRY
[Figure: the FAIR Data Model Registry holds data models, each associated with a dataset]
FAIRIFIER AND FAIR DATA MODEL REGISTRY
[Figure: a Data Owner submits a Non-FAIR Dataset to the FAIRifier; the FAIRifier searches the FAIR Data Model Registry for a reference data model, the registry returns a reference FAIR Profile, and the FAIRifier outputs a FAIR Data Resource (FAIR format, metadata, license)]
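The FAIRifier/registry exchange can be sketched as a lookup. This is a hypothetical model, not the registry's actual interface: each registered data model is assumed to list the field names it covers, the "search reference data model" step is approximated by picking the model whose fields overlap most with the dataset's columns, and the returned IRI stands in for the reference FAIR Profile. All names and IRIs are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataModel:
    name: str
    profile_iri: str        # reference to the FAIR Profile (hypothetical IRI)
    fields: frozenset[str]  # field names the model describes


class DataModelRegistry:
    """In-memory stand-in for the FAIR Data Model Registry."""

    def __init__(self) -> None:
        self._models: list[DataModel] = []

    def register(self, model: DataModel) -> None:
        self._models.append(model)

    def search(self, columns: set[str]) -> Optional[DataModel]:
        """Return the model sharing the most fields with `columns`, or None."""
        best: Optional[DataModel] = None
        best_score = 0
        for model in self._models:
            score = len(model.fields & columns)
            if score > best_score:
                best, best_score = model, score
        return best


def request_profile(registry: DataModelRegistry, columns: list[str]) -> str:
    """FAIRifier side: submit the dataset's columns, get a FAIR Profile reference back."""
    model = registry.search(set(columns))
    if model is None:
        raise LookupError("no reference data model matches; a new model is needed")
    return model.profile_iri


if __name__ == "__main__":
    registry = DataModelRegistry()
    registry.register(DataModel("passport", "http://example.org/profile/passport",
                                frozenset({"accession", "species", "origin"})))
    registry.register(DataModel("genotype", "http://example.org/profile/genotype",
                                frozenset({"sample", "marker", "allele"})))
    print(request_profile(registry, ["accession", "species"]))
```

Overlap counting is only a placeholder for whatever matching the registry performs; the point is the division of labour, where the FAIRifier asks and the registry answers with a reusable profile.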
FAIR DATA POINT
A FAIR Data Point is a particular class of FAIR Data System that provides access to published datasets. The datasets can be external or internal to the FAIR Data Point, and the source data can be either a regular (non-FAIR) dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point must perform the necessary FAIR transformations on the fly.
[Figure: a FAIR Data Point between several Data Producers (datasets, FAIR Data Resources, non-FAIR Data Resources, sensors) and Data Consumers]
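The difference between FAIR and non-FAIR sources can be sketched in Python. This is a hypothetical, in-memory illustration: a FAIR source already holds triples and is served as-is, while a non-FAIR source (here, sensor readings) must be registered with a transformation function that the FAIR Data Point applies at request time. The `ex:` names and the `reading_to_triples` transformation are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Source:
    name: str
    records: list                 # triples (FAIR) or raw dicts (non-FAIR)
    is_fair: bool
    transform: Optional[Callable[[dict], list[Triple]]] = None


class FairDataPoint:
    def __init__(self) -> None:
        self._sources: dict[str, Source] = {}

    def add_source(self, source: Source) -> None:
        # A non-FAIR source is only accepted if it can be made FAIR on the fly.
        if not source.is_fair and source.transform is None:
            raise ValueError(f"{source.name}: non-FAIR source needs a FAIR transformation")
        self._sources[source.name] = source

    def get_data(self, name: str) -> list[Triple]:
        source = self._sources[name]
        if source.is_fair:
            return list(source.records)  # already FAIR: serve as-is
        triples: list[Triple] = []
        for record in source.records:    # non-FAIR: transform at request time
            triples.extend(source.transform(record))  # non-None, checked in add_source
        return triples


def reading_to_triples(reading: dict) -> list[Triple]:
    """Hypothetical FAIR transformation for one sensor reading."""
    subject = f"ex:reading/{reading['id']}"
    return [(subject, "ex:temperature", reading["temp"])]


if __name__ == "__main__":
    fdp = FairDataPoint()
    fdp.add_source(Source("registry", [("ex:patient/1", "ex:diagnosis", "ex:disease/1")], True))
    fdp.add_source(Source("sensor", [{"id": "s1", "temp": "21.5"}], False, reading_to_triples))
    print(fdp.get_data("sensor"))
```

Transforming per request keeps the source system unchanged, at the cost of repeating the work on every access; caching the transformed triples would be a natural variation.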
FAIR DATA POINT
■ Data Consumer: "Who are you? Can I trust you?"
■ FAIR Data Point: "Here is information about myself." (FDP Metadata)
■ Data Consumer (reads the FDP Metadata): "OK, now that I know you, tell me what you have to offer."
■ FAIR Data Point: "Here is information about my catalog of datasets." (Catalog Metadata)
■ Data Consumer (reads the Catalog Metadata): "Tell me more about your genomic dataset."
■ FAIR Data Point: "This is the detailed information about the genomic dataset." (Dataset & Data Record Metadata)
■ Data Consumer (reads the Dataset & Data Record Metadata): "OK, now that I know what you have, give me the data."
■ FAIR Data Point: "Here is my data." (FAIR Data)
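The exchange can be sketched from the consumer's side as a walk down the metadata layers: FDP metadata, catalog, dataset, distribution, data. In this minimal sketch the FAIR Data Point is a hypothetical in-memory dictionary keyed by URL (standing in for HTTP GET); the `http://myfdp/` URLs follow the example URLs of the metadata layers, and the link properties (`dct:hasPart`, `dcat:dataset`, `dcat:distribution`, `dcat:downloadURL`) mirror those used in the BreeDB metadata examples in this deck.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical FAIR Data Point contents: URL -> metadata document (or data).
FDP = {
    "http://myfdp/": {
        "dct:title": "Example FAIR Data Point",
        "dct:hasPart": ["http://myfdp/catalog"],
    },
    "http://myfdp/catalog": {
        "dct:title": "Example catalog",
        "dcat:dataset": ["http://myfdp/genomic/"],
    },
    "http://myfdp/genomic/": {
        "dct:title": "Genomic dataset",
        "dcat:distribution": ["http://myfdp/genomic/dump"],
    },
    "http://myfdp/genomic/dump": {"dcat:downloadURL": "http://myfdp/data/genomic.nt"},
    "http://myfdp/data/genomic.nt": '<http://myfdp/gene/1> <http://myfdp/vocab/label> "example" .',
}


def fetch(url: str):
    """Stand-in for an HTTP GET against the FAIR Data Point."""
    return FDP[url]


def find_dataset(root: str, keyword: str) -> Optional[dict]:
    fdp_metadata = fetch(root)                        # "Who are you?"
    for catalog_url in fdp_metadata["dct:hasPart"]:
        catalog = fetch(catalog_url)                  # "What do you have to offer?"
        for dataset_url in catalog["dcat:dataset"]:
            dataset = fetch(dataset_url)              # "Tell me more about ..."
            if keyword.lower() in dataset["dct:title"].lower():
                return dataset
    return None


def get_data(dataset: dict) -> str:
    for distribution_url in dataset["dcat:distribution"]:  # "Give me the data."
        distribution = fetch(distribution_url)
        if "dcat:downloadURL" in distribution:
            return fetch(distribution["dcat:downloadURL"])
    raise LookupError("no downloadable distribution")


if __name__ == "__main__":
    dataset = find_dataset("http://myfdp/", "genomic")
    print(get_data(dataset))
```

Each call reads one metadata layer before deciding where to go next, which is what lets a consumer discover data without prior knowledge of the FAIR Data Point's layout.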
FAIR DATA POINT - GENERAL ARCHITECTURE
[Figure: FAIR Data Point components: FAIR API / GUI, Metadata Provider, FAIR Data Accessor, Metrics Gatherer and Access Controller, on top of FAIR Metadata and FAIR Data]
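One way to read this architecture is as a thin API that delegates to the other components. The following Python sketch is a hypothetical wiring, not the FDP implementation: it assumes metadata is open to everyone, data requests pass through the Access Controller (modelled as per-user dataset grants), and the Metrics Gatherer counts requests per target. Class names follow the diagram; their methods are assumptions.

```python
from __future__ import annotations

from collections import Counter


class MetadataProvider:
    def __init__(self, metadata: dict[str, dict]) -> None:
        self._metadata = metadata          # path -> metadata document

    def get(self, path: str) -> dict:
        return self._metadata[path]


class FairDataAccessor:
    def __init__(self, data: dict[str, str]) -> None:
        self._data = data                  # dataset id -> FAIR data

    def get(self, dataset_id: str) -> str:
        return self._data[dataset_id]


class AccessController:
    def __init__(self, grants: dict[str, set[str]]) -> None:
        self._grants = grants              # user -> dataset ids they may read

    def allowed(self, user: str, dataset_id: str) -> bool:
        return dataset_id in self._grants.get(user, set())


class MetricsGatherer:
    def __init__(self) -> None:
        self.counts = Counter()            # (kind, target) -> number of requests

    def record(self, kind: str, target: str) -> None:
        self.counts[(kind, target)] += 1


class FairApi:
    """Entry point (FAIR API / GUI) that delegates to the other components."""

    def __init__(self, metadata: MetadataProvider, accessor: FairDataAccessor,
                 access: AccessController, metrics: MetricsGatherer) -> None:
        self._metadata, self._accessor = metadata, accessor
        self._access, self._metrics = access, metrics

    def get_metadata(self, path: str) -> dict:
        self._metrics.record("metadata", path)
        return self._metadata.get(path)    # metadata is open in this sketch

    def get_data(self, user: str, dataset_id: str) -> str:
        if not self._access.allowed(user, dataset_id):
            raise PermissionError(f"{user} may not read {dataset_id}")
        self._metrics.record("data", dataset_id)
        return self._accessor.get(dataset_id)


if __name__ == "__main__":
    metrics = MetricsGatherer()
    api = FairApi(MetadataProvider({"/": {"dct:title": "Example FDP"}}),
                  FairDataAccessor({"ds1": "<s> <p> <o> ."}),
                  AccessController({"alice": {"ds1"}}), metrics)
    print(api.get_data("alice", "ds1"))
    print(metrics.counts)
```

Keeping access control and metrics out of the accessor means the same accessor can serve FAIR Data Resources and on-the-fly transformed data alike.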
EMBEDDED FAIR DATA POINT
[Figure: the FAIR Data Point components (FAIR API / GUI, Metadata Provider, FAIR Data Accessor, Metrics Gatherer, Access Controller, FAIR Metadata, FAIR Data) embedded as "B2FAIR" alongside the EUDAT API / GUI and the current EUDAT components]
https://www.eudat.eu
DISTRIBUTED FAIR DATA POINTS
[Figure: distributed FAIR Data Points for a Biobank (over the Biobank Database), a Patient Registry, UNIPROT and HPA]
FAIR DATA POINT METADATA PROVIDER API
METADATA LAYERS
Layer | Description | URL | Example | Standard
--- | --- | --- | --- | ---
FDP (data repository) | Information about the FDP as a data repository | http://myfdp/ | PID, title, description, license, owner, API version, etc. | OAI-PMH (extended)
Catalog | Information about the catalog of datasets offered | http://myfdp/catalog | PID, title, description, publisher, etc. | W3C DCAT #Catalog
Dataset | Information about each of the offered datasets | http://myfdp/[datasetID]/ | accessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Dataset, #Distribution
Data record | Information about the actual data, types, identifiers, etc. | http://myfdp/[datarecordID] | | Community/domain standards, e.g. DICOM, VCF
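The URL column of the table can be read as a simple routing scheme. This minimal Python sketch resolves a URL to its metadata layer, assuming (as the example URLs suggest, though the table does not state it) that a trailing slash distinguishes a dataset from a data record. Under this layout a data record whose identifier is literally `catalog` would be shadowed by the catalog layer; the BreeDB example in this deck instead uses type-specific paths such as `/catalog/catalog-01` and `/dataset/breedb`.

```python
from __future__ import annotations

import re
from typing import Optional
from urllib.parse import urlparse

# URL layout taken from the URL column of the metadata layers table.
# Order matters: "/catalog" must be tried before the generic data-record pattern.
LAYERS = [
    ("fdp", re.compile(r"^/$")),
    ("catalog", re.compile(r"^/catalog$")),
    ("dataset", re.compile(r"^/(?P<id>[^/]+)/$")),
    ("datarecord", re.compile(r"^/(?P<id>[^/]+)$")),
]

STANDARDS = {
    "fdp": "OAI-PMH (extended)",
    "catalog": "W3C DCAT Catalog",
    "dataset": "W3C DCAT Dataset, Distribution",
    "datarecord": "community/domain standard, e.g. DICOM, VCF",
}


def resolve(url: str) -> tuple[str, Optional[str]]:
    """Map an FDP URL to (metadata layer, identifier or None)."""
    path = urlparse(url).path or "/"
    for layer, pattern in LAYERS:
        match = pattern.match(path)
        if match:
            return layer, match.groupdict().get("id")
    raise ValueError(f"no metadata layer for {url}")


if __name__ == "__main__":
    for url in ["http://myfdp/", "http://myfdp/catalog",
                "http://myfdp/ds1/", "http://myfdp/rec1"]:
        layer, ident = resolve(url)
        print(url, layer, ident, STANDARDS[layer])
```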
FDP METADATA
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/fdp> a dct:Agent ;
    rdfs:label "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:description "This FDP provides metadata on plant-specific genotype/phenotype data sets"^^xsd:string ;
    dct:hasPart "catalog-01"^^xsd:string ;
    dct:identifier "FDP-WUR-PB"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:version "1.0"^^xsd:string .
CATALOG METADATA
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/catalog/catalog-01> a dcat:Catalog ;
    rdfs:label "Plant Breeding Data Catalog"^^xsd:string ;
    dct:description "Plant Breeding Data Catalog"^^xsd:string ;
    dct:hasPart <breedb> ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "Plant Breeding Data Catalog"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:dataset <breedb> .
DATASET METADATA
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/dataset/breedb> a dcat:Dataset ;
    rdfs:label "BreeDB tomato passport data"^^xsd:string ;
    dct:description "BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:distribution <breedb-sparql>, <breedb-sqldump> .
DISTRIBUTION METADATA
<http://fdp.biotools.nl:8080/distribution/breedb-sparql> a dcat:Distribution ;
    rdfs:label "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:description "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:accessURL <http://virtuoso.biotools.nl:8888/sparql> .

<http://fdp.biotools.nl:8080/distribution/breedb-sqldump> a dcat:Distribution ;
    rdfs:label "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:description "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:downloadURL <http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql> .
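The two distributions illustrate the DCAT distinction between `dcat:accessURL` (a service or access point, here a SPARQL endpoint) and `dcat:downloadURL` (a directly downloadable file, here an SQL dump). A client might choose between them as in this minimal Python sketch; the dictionary form of the metadata and the `choose_distribution` helper are hypothetical, while the URLs and license are copied from the BreeDB distribution metadata.

```python
from __future__ import annotations

from typing import Optional

LICENSE = "http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0"

# The two BreeDB distributions, reduced to the properties used here.
DISTRIBUTIONS = [
    {"id": "breedb-sparql", "dct:license": LICENSE,
     "dcat:accessURL": "http://virtuoso.biotools.nl:8888/sparql"},
    {"id": "breedb-sqldump", "dct:license": LICENSE,
     "dcat:downloadURL": "http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql"},
]


def choose_distribution(distributions: list[dict],
                        bulk: bool) -> Optional[tuple[str, Optional[str]]]:
    """Return (URL, license) of the first distribution suited to the use case.

    bulk=True  -> a file to download (dcat:downloadURL)
    bulk=False -> a service to query (dcat:accessURL)
    """
    key = "dcat:downloadURL" if bulk else "dcat:accessURL"
    for distribution in distributions:
        if key in distribution:
            return distribution[key], distribution.get("dct:license")
    return None


if __name__ == "__main__":
    print(choose_distribution(DISTRIBUTIONS, bulk=False))
    print(choose_distribution(DISTRIBUTIONS, bulk=True))
```

Returning the license together with the URL reflects that reuse conditions travel with each distribution rather than with the dataset as a whole.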
CURRENT STATUS
■ FDP, Catalog and Dataset metadata tested.
■ FAIR Accessor tested.
■ Demonstration application on rare diseases with FDPs exposing patient registry and biobank datasets.
■ Working on FDP for BreeDB (WUR).
FDP DEMONSTRATION
[Figure: a Biobank FAIR Data Point and a Patient Registry FAIR Data Point]
■ 60 dataset metadata records
■ Data from 3 biobanks
■ Data from 3 patient registries
NEXT STEPS
■ Extend the demonstration application with more types of datasets.
■ Specify a metadata description format for the data record metadata.
■ Implement the Security Enforcer and Metrics Gatherer components.
■ Release version 1.0
■ Implement subscription/notification mechanism.
DATA FAIRPORT: FIND, ACCESS, INTEROPERATE & RE-USE DATA
■ A particular class of FAIR Data System that provides support for data interoperability.
■ Supports publication of and access to FAIR data.
■ Fosters an ecosystem of applications and services.
■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable.
■ Supports citation of datasets and data items.
■ Provides metrics for data usage and citation.
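Dataset citation and citation metrics can be illustrated with a small Python sketch. The citation layout (publisher, year, title, version, identifier) is an assumption rather than a prescribed FAIRport format, and `CitationLog` is a hypothetical counter; the example values are taken from the BreeDB dataset metadata in this deck.

```python
from __future__ import annotations

from collections import Counter


def cite(metadata: dict[str, str]) -> str:
    """Build a human-readable citation from Dublin Core style metadata."""
    year = metadata["dct:issued"][:4]  # ISO date "YYYY-MM-DD" -> "YYYY"
    return (f"{metadata['dct:publisher']} ({year}). {metadata['dct:title']}. "
            f"Version {metadata['dct:version']}. {metadata['iri']}")


class CitationLog:
    """Counts accesses and citations per dataset IRI."""

    def __init__(self) -> None:
        self.accesses = Counter()
        self.citations = Counter()

    def record_access(self, iri: str) -> None:
        self.accesses[iri] += 1

    def record_citation(self, iri: str) -> None:
        self.citations[iri] += 1


if __name__ == "__main__":
    breedb = {
        "iri": "http://fdp.biotools.nl:8080/dataset/breedb",
        "dct:publisher": "http://orcid.org/0000-0002-4368-8058",
        "dct:title": "BreeDB tomato passport data",
        "dct:version": "1.0",
        "dct:issued": "2015-11-24",
    }
    print(cite(breedb))
```

Because the citation ends with the dataset's persistent IRI, usage and citation counts can be keyed on that same identifier.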
FAIR DATA PUBLICATION
[Figure: Data Owners/Creators publish a Dataset and its Metadata, annotated with concepts (Concept 1 to Concept 4), to the Data FAIRport]
FAIR DATA ACCESS
[Figure: a Data User finds and accesses multiple datasets through the Data FAIRport]
DISTRIBUTED ARCHITECTURE
[Figure: federated Data FAIRports in the Netherlands (DTL, VLPB/WUR, Organization X, Organization Y and others), serving domains such as rare diseases and plants. A Data FAIRport such as the DTL one combines a FAIR Data Search Engine, a FAIRifier with (meta)data publication, metadata storage, optional data storage, an optional Transformation Services Registry, and FAIR Data Points]
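A FAIR Data Search Engine in this federated setting can be sketched as a fan-out query: ask every connected FAIRport for its dataset metadata, then filter locally. In this hypothetical Python sketch each FAIRport is a function returning metadata records (standing in for a remote call), matching is a simple case-insensitive keyword test over title and description, and the FAIRport names are placeholders; a production engine would more likely harvest and index metadata ahead of time.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

FairPort = Callable[[], list[dict]]  # returns dataset metadata records


def federated_search(ports: dict[str, FairPort], query: str) -> list[tuple[str, str]]:
    """Return (FAIRport name, dataset title) for datasets matching every query term."""
    terms = query.lower().split()
    names = list(ports)
    # Query all FAIRports concurrently; map() yields results in input order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda name: ports[name](), names))
    hits = []
    for name, datasets in zip(names, results):
        for dataset in datasets:
            text = f"{dataset['title']} {dataset.get('description', '')}".lower()
            if all(term in text for term in terms):
                hits.append((name, dataset["title"]))
    return hits


if __name__ == "__main__":
    ports = {
        "plant-port": lambda: [{"title": "BreeDB tomato passport data"}],
        "rare-disease-port": lambda: [{"title": "Patient registry",
                                       "description": "rare disease cases"}],
    }
    print(federated_search(ports, "tomato"))
```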
FAIRPORT ECOSYSTEM
[Figure: the Data FAIRport ecosystem connecting Data Producers and Data Consumers, with Applications & Services (e.g. BRAIN) and Infrastructural Services]
FAIRPORT
[Figure: FAIRport architecture. Components: Stewardship API and FAIR Data API; a (Meta)Data Storage component with metadata storage and data storage (backed by DataVerse, EUDAT or another data repository) behind a data storage API / FAIR Data API; a semantic resolver with ontology storage; data usage policy; a management component; metrics storage; and a GUI for data publishing, search and management, all forming a FAIR Data System. Data Producers and Data Consumers work through Data Mgmt Apps, Data Stewardship Apps and Data Consumer Apps (e.g. APInatomy, BRAIN)]
ROADMAP 2016
■ Implement the FAIR Data search engine
■ Implement the FAIR Data publication mechanism
■ Extend the demonstration application with more types of datasets.
■ Specify a metadata description format for the data record metadata.
■ Implement the Security Enforcer and Metrics Gatherer components on FAIR Data Points.
■ Start work on the encryption and pseudonymisation of Personal Health Train