The new CIARD RING, a machine-readable directory of datasets for agriculture. Valeria Pesce, Global Forum on Agricultural Research (GFAR). Research Data Alliance 4th Plenary Meeting, 22-24 September 2014, Amsterdam. Agricultural Data Interoperability Interest Group. agINFRA project, EC 7th framework program INFRA-2011-1.2.2 - Grant agr. no: 283770

The new CIARD RING, a machine-readable directory of datasets for agriculture


DESCRIPTION

The CIARD RING, a global directory of datasets for agriculture, has been enhanced during the EC-funded agINFRA project. It has become a Linked Data hub that can be queried by other applications. Presented at the 4th RDA Plenary Meeting in Amsterdam on 22/09/2014.


Page 1: The new CIARD RING, a machine-readable directory of datasets for agriculture

The new CIARD RING, a machine-readable directory of datasets for agriculture

Valeria Pesce, Global Forum on Agricultural Research (GFAR)

Research Data Alliance 4th Plenary Meeting, 22-24 September 2014, Amsterdam

Agricultural Data Interoperability Interest Group

agINFRA project, EC 7th framework program INFRA-2011-1.2.2 - Grant agr. no: 283770

Page 2: The new CIARD RING, a machine-readable directory of datasets for agriculture

The CIARD RING

http://ring.ciard.net

The CIARD RING is a project implemented within the CIARD initiative and is led by the Global Forum on Agricultural Research (GFAR).

The CIARD RING is a global directory of web-based information services and datasets for agriculture

Page 3: The new CIARD RING, a machine-readable directory of datasets for agriculture

Why (1)

- Producers and managers of information / data need a place where their information products can be found

- Data consumers need to find suitable data sources

- IT professionals need information on the level and mode of interoperability of information services and datasets for using data in their applications

Page 4: The new CIARD RING, a machine-readable directory of datasets for agriculture

Numbers and map

• 468 data providers
• 1018 information services, of which
  – 268 exposed datasets

Page 5: The new CIARD RING, a machine-readable directory of datasets for agriculture

Definition of “dataset” in the RING

The term “datasets” has been defined in several ways, all of which further specify or extend the basic concept of “a collection of data”.

Definition given by the W3C Government Linked Data Working Group:

A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”

The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
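The dataset / distribution relationship described above can be sketched as a small data structure. This is only an illustration: the class names, field names and example URLs are assumptions made for the sketch, not the RING's internal model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete form in which a dataset can be accessed or downloaded."""
    access_url: str
    media_format: str  # e.g. "CSV", "RSS"


@dataclass
class Dataset:
    """A collection of data published or curated by a single source."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)


# Hypothetical example: one dataset exposed as a CSV download and an RSS feed.
prices = Dataset(
    title="Example crop prices",
    publisher="Example Institute",
    distributions=[
        Distribution("http://example.org/prices.csv", "CSV"),
        Distribution("http://example.org/prices.rss", "RSS"),
    ],
)

# A dataset can have many distributions; list the formats on offer.
formats = [d.media_format for d in prices.distributions]
```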

Page 6: The new CIARD RING, a machine-readable directory of datasets for agriculture

Direct submission + federation

• All datasets currently featured in the RING have been manually submitted by their owners / managers

• But we do not want to force data owners who already maintain a dataset catalog to catalog and maintain their datasets in two places

We are working on procedures to federate datasets from the most used dataset cataloguing platforms (Dataverse, CKAN…)

First experiment started with the IFPRI Dataverse dataset catalog
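As a rough idea of what federating from a cataloguing platform could involve, the sketch below maps one CKAN package description to a dataset record with distributions. It is an assumption-laden illustration, not the RING's actual federation procedure: the function name and the output record layout are hypothetical, and the input only uses the `title`, `organization` and `resources` fields commonly found in CKAN package JSON.

```python
def ckan_package_to_record(package):
    """Map a CKAN package dict to a dataset record with distributions.

    The output layout (title / publisher / distributions) is hypothetical.
    """
    organization = package.get("organization") or {}
    return {
        "title": package.get("title", ""),
        "publisher": organization.get("title", ""),
        # Each CKAN resource becomes one distribution of the dataset.
        "distributions": [
            {"url": res.get("url", ""), "format": res.get("format", "")}
            for res in package.get("resources", [])
        ],
    }


# Hypothetical package, shaped like CKAN package JSON.
sample_package = {
    "title": "Example household survey",
    "organization": {"title": "Example Institute"},
    "resources": [{"url": "http://example.org/survey.csv", "format": "CSV"}],
}

record = ckan_package_to_record(sample_package)
```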

Page 7: The new CIARD RING, a machine-readable directory of datasets for agriculture

The RING user interface

Page 8: The new CIARD RING, a machine-readable directory of datasets for agriculture

Dataset record

Page 9: The new CIARD RING, a machine-readable directory of datasets for agriculture

The RING machine interface – Why (2)

• Datasets registered in the RING have to be found by applications

• Applications have to be able to read all the metadata about datasets and filter datasets according to their needs

• Applications have to find enough technical metadata in the RING to:
  – Identify datasets with a specific coverage (type of data, thematic coverage, geographic coverage)
  – Identify datasets that comply with certain technical specifications (format, protocol etc.)
  – Access the dataset and get the data

This machine-readable layer can support the data aggregation workflows of external services
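The kind of filtering an aggregator might perform on such metadata can be sketched as follows. The record layout, key names and sample values are assumptions for illustration only; a real application would read the metadata from the RING rather than from an in-memory list.

```python
def matches(distribution, protocol=None, fmt=None):
    """True if a distribution satisfies the requested technical criteria."""
    if protocol is not None and distribution.get("protocol") != protocol:
        return False
    if fmt is not None and distribution.get("format") != fmt:
        return False
    return True


def find_datasets(records, country=None, protocol=None, fmt=None):
    """Select datasets by geographic coverage and by distribution criteria."""
    selected = []
    for record in records:
        if country is not None and country not in record.get("countries", []):
            continue
        usable = [d for d in record.get("distributions", [])
                  if matches(d, protocol, fmt)]
        if usable:
            # Keep only the access URLs an aggregator would need to call.
            selected.append((record["title"], [d["url"] for d in usable]))
    return selected


# Hypothetical records standing in for metadata read from a directory.
records = [
    {"title": "Repository A", "countries": ["Kenya"],
     "distributions": [{"url": "http://example.org/a/oai",
                        "protocol": "OAI-PMH", "format": "XML"}]},
    {"title": "Repository B", "countries": ["Peru"],
     "distributions": [{"url": "http://example.org/b.csv",
                        "protocol": "download", "format": "CSV"}]},
]

result = find_datasets(records, protocol="OAI-PMH")
```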

Page 10: The new CIARD RING, a machine-readable directory of datasets for agriculture

The RING machine interface – SPARQL

An RDF store is a way of storing data using a machine-readable "grammar" (the Resource Description Framework) and documented semantics (RDF vocabularies).

URIs

The URI for each service / dataset is built as follows: RING-domain/node/service-ID. For example: http://ring.ciard.net/node/2417

The RING database is also an accessible RDF store.

SPARQL endpoint: http://ring.ciard.net/sparql1
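A minimal sketch of how an application might build an entity URI and a request URL for the endpoint. It assumes the endpoint accepts GET requests with a `query` parameter, as described by the SPARQL Protocol; that has not been verified against the RING, and the function names are hypothetical. No network call is made.

```python
from urllib.parse import urlencode

RING_DOMAIN = "http://ring.ciard.net"
SPARQL_ENDPOINT = "http://ring.ciard.net/sparql1"


def entity_uri(node_id: int) -> str:
    """Build the URI of a service or dataset: RING-domain/node/service-ID."""
    return f"{RING_DOMAIN}/node/{node_id}"


def sparql_get_url(query: str) -> str:
    """Encode a query as a GET request URL (assumed 'query' parameter)."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


uri = entity_uri(2417)
url = sparql_get_url(f"DESCRIBE <{uri}>")
```

The resulting `url` could then be fetched with any HTTP client, typically with an `Accept` header naming the desired RDF serialization.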

Page 11: The new CIARD RING, a machine-readable directory of datasets for agriculture

SPARQL how to: vocabularies

The vocabularies used in the RDF store are:
• RDF: http://www.w3.org/1999/02/22-rdf-syntax-ns#
• RDFS: http://www.w3.org/2000/01/rdf-schema#
• DC: http://purl.org/dc/terms/
• DCAT: http://www.w3.org/ns/dcat#
• ADMS: http://www.w3.org/ns/adms#
• FOAF: http://xmlns.com/foaf/0.1/
• DOAP: http://usefulinc.com/ns/doap#
• SKOS: http://www.w3.org/2004/02/skos/core#
• VCARD: http://www.w3.org/2006/vcard/ns#

The data model chosen to describe datasets is the W3C Data Catalog Vocabulary (DCAT), designed to describe datasets and the forms in which they are exposed, their "distributions"

Page 12: The new CIARD RING, a machine-readable directory of datasets for agriculture

SPARQL how to: sample query

To get all datasets available through the OAI-PMH protocol

Query:

------------------------------
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

DESCRIBE ?dataset ?distro ?owner ?contact ?topic ?standard ?format ?protocol
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dc:title ?title .
  ?dataset dcat:distribution ?distro .
  ?dataset dc:publisher ?owner .
  ?distro dcat:accessURL ?url .
  ?distro adms:representationTechnique <http://ring.ciard.net/taxonomy_term/108> .
  OPTIONAL { ?dataset doap:maintainer ?contact }
  OPTIONAL { ?dataset dcat:theme ?topic }
  OPTIONAL { ?distro dc:conformsTo ?standard }
  OPTIONAL { ?distro dc:format ?format }
  OPTIONAL { ?distro adms:representationTechnique ?protocol }
}
------------------------------
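The same pattern can be reused for other protocols by swapping the protocol concept URI. The helper below is a hypothetical sketch that generates a trimmed-down version of the query above for any such URI; only the taxonomy term 108 used in the sample comes from the slides, and the function name and the basic URI check are assumptions.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "adms": "http://www.w3.org/ns/adms#",
}


def datasets_by_protocol_query(protocol_uri: str) -> str:
    """Build a DESCRIBE query for datasets exposed through a given protocol."""
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in protocol_uri for ch in '<>" \n'):
        raise ValueError("not a safe URI: %r" % protocol_uri)
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items())
    body = (
        "DESCRIBE ?dataset ?distro\n"
        "WHERE {\n"
        "  ?dataset rdf:type dcat:Dataset .\n"
        "  ?dataset dcat:distribution ?distro .\n"
        "  ?distro dcat:accessURL ?url .\n"
        f"  ?distro adms:representationTechnique <{protocol_uri}> .\n"
        "}"
    )
    return header + "\n" + body


query = datasets_by_protocol_query("http://ring.ciard.net/taxonomy_term/108")
```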

Page 13: The new CIARD RING, a machine-readable directory of datasets for agriculture

SPARQL how to: URIs?

All the URIs that you may need in queries are listed on the RING web site
• A list of the URIs of all the RING entities (services/datasets, organizations, KOSs etc.)
• A list of the URIs of all RING concepts (countries, topics, regions, protocols etc.)

Page 14: The new CIARD RING, a machine-readable directory of datasets for agriculture

SPARQL how to: URIs of entities

Page 15: The new CIARD RING, a machine-readable directory of datasets for agriculture

SPARQL how to: exploit linked URIs

Page 16: The new CIARD RING, a machine-readable directory of datasets for agriculture

Example of use: AGRIS RING

1. How AGRIS uses the RING Linked Data

AGRIS (http://agris.fao.org): database of more than 7 million bibliographic references on agricultural research and technology and links to related data resources on the Web.

AGRIS retrieves information on AGRIS centers through a SPARQL query run against the RING.

<http://ring.ciard.net/node/10687> is the URI of the AGRIS network in the RING.

------------------------------
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

DESCRIBE ?dataset
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dc:partOf <http://ring.ciard.net/node/10687>
}
------------------------------

Page 17: The new CIARD RING, a machine-readable directory of datasets for agriculture

Example of use: AGRIS RING

2. How to get AGRIS Linked Data bibliographic records for each AGRIS center

In the AGRIS RDF store, all bibliographic records are associated with the corresponding AGRIS center through the dcterms:source property: the URI used to identify the AGRIS center is the RING URI.

Any application can therefore retrieve all records belonging to an AGRIS center by running a query against the AGRIS SPARQL endpoint (http://202.45.139.84:10035/catalogs/fao/repositories/agris).

------------------------------------
PREFIX dcterms: <http://purl.org/dc/terms/>

DESCRIBE ?rec
WHERE {
  ?rec dcterms:source <http://ring.ciard.net/node/2754> .
}
-----------------------------------

Page 18: The new CIARD RING, a machine-readable directory of datasets for agriculture

Interoperability assessment in the RING

The technical metadata registered in the RING for each dataset provide enough information to give a good idea of the level of “interoperability” of that dataset.

“Interoperability is a feature of datasets— and of information services that give access to datasets— whereby data can easily be retrieved, processed, re-used, and re-packaged (“operated”) by other systems. The less pre-coordination required to achieve this, the more “interoperable” the dataset.”

[from: Interim Proceedings of International Expert Consultation on “Building the CIARD Framework for Data and Information Sharing”, Beijing 20-23 June 2011. 2011.]

Page 19: The new CIARD RING, a machine-readable directory of datasets for agriculture

No. | Metadata | Type | Interoperability points | Tim Berners-Lee’s stars

For the service/dataset in general
1 | Global coverage | Select list | 4 if not empty |
2 | Regional coverage (FAO) | Select list | 4 if not empty |
3 | Regional coverage (GFAR) | Select list | 4 if not empty |
4 | National coverage | Select list | 4 if not empty |
5 | Specific topic (AGROVOC) | Autocomplete multiple (authority: AGROVOC) | 8 if not empty |
6 | Type of content/data managed | Autocomplete multiple | 4 if not empty |
7 | KOSs used | Select list multiple (authority: VEST Registry) | 10 for each KOS used | 5 IF you already have 4
8 | Special instructions for getting data from this service | Text | 3 if not empty |
9 | Examples | Text multiple | 2 for each example |

For each distribution of the dataset
10 | URL / target / endpoint | Text | 30 if not empty | 1
11 | File upload | Upload | 10 if not empty | 1
12 | Access / licensing | Autocomplete | 4 if half-open; 6 if free / open; 8 if formally open (OA, CC) | 0.5 if half-open; 1 if open; 1.5 if open and known license e.g. CC
13 | License URL | Text: URL | 7 if not empty | 0.5
14 | Protocol | Select list | 10 ftp/download; 20 OAI-PMH or web service; 30 if SPARQL | 1 if ftp/download; 3 if OAI-PMH or RSS; 4 if SPARQL
15 | Format / serialization / notation | Select list (authority: subset of IANA types) | 5 Excel; 10 CSV, XML; 12 JSON; 15 RDFXML; 20 JsonLD, ntriples-n3-turtle | 2 if Excel; 3 if CSV, XML, JSON; 4 if JsonLD, RDFXML, ntriples-n3-turtle
16 | Metadata set(s) used | Select list (authority: VEST Registry) | 6 for each metadata set | 2.5
17 | Does the dataset use URIs? | Yes/No | 20 if yes; OR: multiply 15 by n. 10 | 4 (OR: 4 IF you already have 3)
18 | Does the dataset link to external URIs? | Yes/No | 20 if yes; OR: multiply 15 by n. 15 | 5 (OR: 5 IF you already have 3)
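To make the point system concrete, here is a small sketch that scores one distribution using only four of the rows listed for distributions (10: URL, 13: License URL, 14: Protocol, 15: Format). Summing the row values is an assumption, since the slides do not state how points are combined; the dictionary keys and the sample distribution are hypothetical.

```python
# Point values copied from the "Interoperability points" column.
PROTOCOL_POINTS = {
    "ftp/download": 10,
    "OAI-PMH": 20,
    "web service": 20,
    "SPARQL": 30,
}
FORMAT_POINTS = {
    "Excel": 5,
    "CSV": 10,
    "XML": 10,
    "JSON": 12,
    "RDFXML": 15,
    "JsonLD": 20,
    "ntriples-n3-turtle": 20,
}


def distribution_points(dist):
    """Sum the points for rows 10, 13, 14 and 15 (assumed additive)."""
    points = 0
    if dist.get("url"):
        points += 30  # row 10: URL / target / endpoint not empty
    if dist.get("license_url"):
        points += 7   # row 13: License URL not empty
    points += PROTOCOL_POINTS.get(dist.get("protocol"), 0)  # row 14
    points += FORMAT_POINTS.get(dist.get("format"), 0)      # row 15
    return points


# Hypothetical distribution: a SPARQL endpoint serving RDF/XML, no license URL.
example = {
    "url": "http://example.org/sparql",
    "license_url": "",
    "protocol": "SPARQL",
    "format": "RDFXML",
}
score = distribution_points(example)  # 30 + 0 + 30 + 15
```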

Page 20: The new CIARD RING, a machine-readable directory of datasets for agriculture

Example of interoperability assessment in the RING

Page 21: The new CIARD RING, a machine-readable directory of datasets for agriculture

Thank you

Thank you for your attention

Valeria Pesce

[email protected]