1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe Sébastien Martin, Muriel Foulonneau, Slim Turki

Authors: Sébastien Martin, Muriel Foulonneau, Slim Turki
Paris VIII University, France; Public Research Centre Henri Tudor, Luxembourg
http://link.springer.com/chapter/10.1007%2F978-3-319-03437-9_24
Presented during MTSR 2013 / 7th Metadata and Semantics Research Conference, http://mtsr2013.teithe.gr/

Abstract. The development of open data requires better reusability of data. Catalogs that list data dispersed across different countries therefore have a crucial role. However, the degree of openness is also a key success factor for open data. In this paper, we study the PublicData.eu catalogue, which gives access to open datasets from European countries, and analyse the metadata recorded for each dataset. The objectives are to (i) assess the quality of a sample of metadata properties that are critical to enable data reuse and (ii) study the stated level of data openness. The study uses Tim Berners-Lee's five-star evaluation scale.

Context & Objectives

• Level of reuse of open data is still disappointing.
• Development of open data requires a better reusability of data.
• Degree of openness is a key success factor.
• Catalogs listing data have a crucial role.

Analyse PublicData.eu catalogue

(i) identify the quality of a sample of metadata properties, which are critical to enable data reuse

(ii) study the stated level of data openness.

21/11/2013, 1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe

PublicData.eu

• Many local and national portals to provide access to public sector open datasets - 114 EU catalogues on datacatalogs.org

• Gather datasets across geographic and institutional boundaries

PublicData.eu
• pan-European catalogue launched under the FP7 LOD2 project.
• aggregates data from CKAN open data catalogues all over Europe.
• collects data from 26 sources
• 1st to be published in Europe in 2011
• data beyond the European Union, e.g., Serbian datasets.
• not exhaustive, it represents a unique aggregation of European datasets.

• 17.027 datasets
• UK: largest provider

Methodology

Descriptions of datasets collected in May 2013

236 distinct dataset properties identified, partially due to
• linguistic diversity; some providers translate property names into their own language
• problems of consistency in naming (upper / lower case, spaces / underscore for a single field).
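The naming inconsistencies listed above (case, spaces versus underscores) can be collapsed with a simple normalisation step. The following is a minimal sketch, not the procedure used in the study: the function names and the sample property names are hypothetical, and the approach does nothing for the linguistic diversity problem, which would need a translation table.

```python
import re
from collections import defaultdict


def normalize_property_name(name: str) -> str:
    """Collapse case and space/underscore/hyphen variants into one key."""
    key = name.strip().lower()
    # Treat any run of whitespace, underscores or hyphens as one separator.
    return re.sub(r"[\s_\-]+", "_", key)


def group_property_variants(names):
    """Map each normalised key to the set of raw spellings that produce it."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_property_name(name)].add(name)
    return dict(groups)


# Hypothetical sample of raw property names (illustrative only).
sample = [
    "content_TYPE", "Content Type", "content type",
    "licence", "License_ID", "license id",
]
groups = group_property_variants(sample)
for key, variants in sorted(groups.items()):
    print(key, sorted(variants))
```

Note that "licence" and "license_id" stay separate: spelling and language variants are outside what a purely syntactic normalisation can merge.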

Major challenge to understand the content of the PublicData.eu catalogue

Data collected and analysed to identify the information made available on data openness and reusability, in particular the licensing conditions and the data formats.

Tim Berners-Lee’s evaluation scale

★ Available on the web (whatever format) but with an open license, to be Open Data

★★ Available as machine-readable structured data

★★★ 2 + non-proprietary format

★★★★ 3 + Use open standards from W3C (RDF and SPARQL) to identify things

★★★★★ 4 + Link your data to other people’s data to provide context
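Because each level builds on the previous one, the scale can be read as a chain of cumulative criteria. The sketch below illustrates that reading; the function name and the boolean parameters are assumptions introduced here, not part of the original slides.

```python
def star_level(open_license: bool,
               machine_readable: bool = False,
               non_proprietary: bool = False,
               w3c_standards: bool = False,
               linked: bool = False) -> int:
    """Return the number of stars (0-5) under a cumulative reading of the scale.

    Each level requires all previous ones, so counting stops at the
    first criterion that is not met.
    """
    criteria = [open_license, machine_readable, non_proprietary,
                w3c_standards, linked]
    level = 0
    for met in criteria:
        if not met:
            break
        level += 1
    return level


# An openly licensed, machine-readable dataset in a proprietary format
# stops at two stars, even if other criteria happen to hold.
print(star_level(True, True, False, True))
```

Without an open license the result is zero regardless of format, which matches the first level's requirement that the data carry an open license to count as Open Data.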

★ Data Licences

13.535 / 17.027 datasets have at least 1 license indication

12.470 datasets (73,24%) can be considered to have some form of open license

769 datasets have a Creative Commons license

Significant number of datasets have a national license:
• apie v2 to publish information created by French public authorities
• UK-crown which “covers material created by civil servants, ministers and government departments and agencies” in the UK,
• UK Open Government License

128 datasets with an explicitly closed license
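The shares behind these counts can be recomputed directly from the figures on this slide. This is a small arithmetic sketch; the counts are taken from the slide, while the dictionary keys and the `share` helper are names introduced here for illustration.

```python
TOTAL_DATASETS = 17027  # written "17.027" on the slides

# Counts as reported on the licence slide.
license_counts = {
    "at least one license indication": 13535,
    "some form of open license": 12470,
    "Creative Commons license": 769,
    "explicitly closed license": 128,
}


def share(count: int, total: int = TOTAL_DATASETS) -> float:
    """Percentage of the catalogue, rounded to two decimals."""
    return round(100 * count / total, 2)


for label, count in license_counts.items():
    print(f"{label}: {count} datasets, {share(count)}%")
```

The open-license share reproduces the 73,24% quoted above.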

★★ Machine readable format

• Facilitates data reusability

• 4.051 / 17.027 with content_TYPE

• 11.285 with at least one indication about format

• 56 datasets in RDF
• Dominant proportion of spreadsheet-type formats

[Figure: Distribution of formats]

34% of datasets are available in a machine-readable format; 40% are not in a machine-readable format.

Machine readability is a condition for openness levels of 2★ and above.

★★★ Use of non-proprietary formats

Creates ambiguities, as the openness of a format can be debated in some cases:

• Certain formats are proprietary but their specifications are open.
• Some formats have been open at a certain point in time but additions and further evolutions remain proprietary.

In many cases, the value of the property was too vague to determine whether or not the format was proprietary.

It was possible to identify:
• For 49% of the datasets, a non-proprietary format
• For 21%, a proprietary format.

Use of proprietary formats is a critical issue for improving the level of openness of datasets.

★★★★ Use of open standards from W3C

Including HTML, XML, and RDF in particular.
• XML-based formats may be entirely independent from W3C (e.g. KML)

Availability in W3C standards: 9,5% of datasets

Availability in XML based formats: 10%

Information remains unknown in most cases

★★★★★ Linked data

Linked data are only mentioned in the description of a single dataset (Brandweer Amsterdam-Amstelland Uitrukberichten) for which the format is described as “linked data api, rdf json”.

58 datasets mention RDF (or RDFa) as a format or content type, i.e., 0,34%.

Level of openness (1/2)

6.891 / 17.027 datasets provide at least some information about their degree of openness.

All come from Data.gov.uk (8 689 datasets)

For a majority of datasets, the level of openness is unknown.
• Coherent with the lack of licensing information, without which it is impossible to conclude on even a ★ openness level.

[Figure: Distribution of openness levels in UK datasets]

Level of openness (2/2)

Approximate level of openness derived from licensing and format properties

• 73,24% of the datasets should have ★ or above.
• Reference to 5★ should take into consideration linkages, cannot be inferred from dataset metadata.
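The derivation described above can be sketched as a lookup on the license and format properties. This is an illustrative approximation, not the classification used in the study: the metadata keys `license_id` and `format`, the license identifiers, and the format sets are all assumptions made for the example. The result is capped at four stars because linkage cannot be inferred from the metadata, and a missing license yields no level at all.

```python
from typing import Optional

# Hypothetical lookup tables (assumptions for illustration only).
OPEN_LICENSES = {"uk-ogl", "cc-by", "cc-zero"}
MACHINE_READABLE = {"csv", "xls", "xml", "json", "rdf"}
PROPRIETARY = {"xls"}
W3C_FORMATS = {"xml", "rdf"}


def approximate_openness(metadata: dict) -> Optional[int]:
    """Approximate star level from license and format metadata.

    Returns None when no license is recorded, 0 when the license is
    not in the open list (treated here as not open), otherwise 1 to 4.
    """
    license_id = (metadata.get("license_id") or "").strip().lower()
    if not license_id:
        return None  # no conclusion possible, even for one star
    if license_id not in OPEN_LICENSES:
        return 0
    fmt = (metadata.get("format") or "").strip().lower()
    level = 1
    if fmt in MACHINE_READABLE:
        level = 2
        if fmt not in PROPRIETARY:
            level = 3
            if fmt in W3C_FORMATS:
                level = 4  # 5 stars would require evidence of links
    return level


print(approximate_openness({"license_id": "UK-OGL", "format": "CSV"}))
```

Returning `None` rather than 0 for a missing license keeps "unknown" distinct from "not open", which is the distinction the slides draw for the majority of datasets.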

Data openness mainly related to 1st level of compliance: licensing issue.

• Data providers have clearly not focused on publication of data in reusable formats.

[Figure: Level of openness according to format- and license-related properties]

Conclusion

• Limited openness of datasets advertised as open data
• Heterogeneity of associated metadata

Difficulty for reusers to (i) discover datasets, despite the creation of large catalogues of datasets, and to (ii) effectively reuse machine-readable and contextualized data.

★ may be sufficient to ensure transparency of government action, but facilitating reuse of data through services is not served below 2★.

Confirmed risks regarding major challenges that data providers have to face: (i) language barrier and (ii) lack of consistency of metadata.

Harmonization of practices, training and tools are necessary to ensure that datasets are available in relevant formats.
