ENGAGE Workshop, June 27th, 2013, Marseille. Accelerate data re-use: example of an e-infrastructure at European level. Valerie Brasse, euroCRIS / IS4RI, Strasbourg, France. Slides reproduced from presentations by ENGAGE members.

ENGAGE Workshop at OpenDataWeek2013


DESCRIPTION

The slides that supported the workshop "Accelerating data reuse: international initiatives? what role for the European Commission?", held on June 27th, 2013, in Marseille, introducing the ENGAGE platform and led by Valerie BRASSE


Page 1: ENGAGE Workshop at OpenDataWeek2013

ENGAGE Workshop, June 27th, 2013, Marseille

Accelerate data re-use: example of an e-infrastructure at European level

Valerie Brasse, euroCRIS / IS4RI

Strasbourg, France

Slides reproduced from presentations by ENGAGE members

Page 2: ENGAGE Workshop at OpenDataWeek2013

Agenda

• The ENGAGE project, an introduction
• The ENGAGE 2.0 platform, released in Beta since April 2013
• Open data for re-use in Europe, some barriers to overcome?
  – Findings from the ENGAGE project
• Discussion
  – Your suggestions to overcome the barriers

Page 3: ENGAGE Workshop at OpenDataWeek2013

ENGAGE Project Information

Acronym: ENGAGE
Title: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
Programme: Framework Programme 7 (2007-2013), Research Infrastructures
Contract no: RI-283700
Project type: CP-CSA
Start date: 01/06/2011
Duration: 36 months
Partners: 9
Website: http://www.engage-project.eu
Platform: http://www.engagedata.eu

Project participants

• NTUA, GR (Coordinator)
• TU-DELFT, NL
• MIC-GR, GR
• IBM-ISRAEL, IL
• INTRASOFT, LU
• STFC, UK
• FhG-FOKUS, DE
• AEGEAN, GR
• EUROCRIS, NL

Page 4: ENGAGE Workshop at OpenDataWeek2013

Public Sector Information

• Data produced by governmental organisations – typically referring to datasets

• Examples: geospatial, demographic, statistical, environmental, public safety, financial data

• Growing international movement: open access to PSI datasets in a way that facilitates reuse

• Opening up PSI datasets can potentially lead to substantial economic gains [1]

[1] Vickery, G. (2011): Review of recent studies on PSI re-use and related market developments.

Page 5: ENGAGE Workshop at OpenDataWeek2013

Overview of ENGAGE objectives

• Development and use of a data infrastructure, incorporating distributed and diverse public sector information (PSI) resources

• Capable of supporting scientific collaboration and research, particularly for the Social Science and Humanities (SSH) scientific communities

• Empowering the deployment of open governmental data towards citizens

Simply put, ENGAGE is a door for researchers that leads them to the world of Open Government Data. Through the ENGAGE platform, researchers and citizens will be able to search, browse, download, visualise and submit diverse and distributed Public Sector datasets from EU countries.

Page 6: ENGAGE Workshop at OpenDataWeek2013

ENGAGE Two-way Scenario

Delivering Open Data (stages and their elements):

• Public Sector Information Collection: Public Sector Organisations; Open data initiations
• Data Curation: Pre-processing; Anonymisation; Harmonisation; Annotation; Linking
• Archival: Cloud and Grid Infrastructure; Platform Independence and Interoperability
• Data Search and Retrieval: Open and intuitive access to the data collection; Context-specific search
• Advanced Data Services: Visualisation (inc. combined views); Context-specific formatting; Collaboration tools

Needs and guidelines to Public Sector Organisations (return path):

• Stages: New Problems – new Challenges; Search Data Needs; New Service Definition for open data; Utilisation of existing Infrastructures; Needs for Governmental data Provision
• Actors shown: Public Sector Organisations; ENGAGE and eInfrastructures; ENGAGE; Society; Policy; Research Communities; Policy makers

Page 7: ENGAGE Workshop at OpenDataWeek2013

ENGAGE provides a single point of access to PSI sources as well as relevant tools in order to cover the needs of researchers and citizens.

ENGAGE traverses distributed and diverse public sector information resources.

Public data sources (unstructured / “semi-structured”):

• Ministries / local public agencies websites
• Publicdata.eu
• National Statistical Offices

Page 8: ENGAGE Workshop at OpenDataWeek2013

ENGAGE aims to embrace the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models.

ENGAGE: not an isolated data silo but a vital part of the Global Data Space.

Page 9: ENGAGE Workshop at OpenDataWeek2013

ENGAGE will enable EU Researchers / Citizens to:

• Discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language
• Upload curated, enhanced or extended versions of existing datasets, originally published by public agencies, in order to address various formats, standards and scientific purposes in a crowd-sourcing manner
• Acquire the datasets
• Visualize properly structured datasets in data tables, maps and charts

Additionally:

• Utilize ENGAGE Application Programming Interfaces (APIs) for searching and acquiring the datasets
• Rate the quality of datasets on various dimensions
• Request additional datasets or information on existing datasets from the Public Agencies
• View usage statistics
• View publications and other material linked to datasets

Page 10: ENGAGE Workshop at OpenDataWeek2013

Public Agencies will be able to:

• Utilize the ENGAGE infrastructure (interface and APIs) to publish governmental data
• Register and link their datasets within the ENGAGE infrastructure
• Receive feedback on the quality of their datasets
• Review the opinions or requests of citizens and researchers
• View the applications, publications and other datasets uploaded by scientists that are linked to their original published datasets

Page 11: ENGAGE Workshop at OpenDataWeek2013

• Integration of original PSI data and derived / curated datasets created, maintained and extended by users (researchers, citizens, journalists, computer specialists) in a collaborative environment. A research / data curation community platform with focus on the SSH domain.

• The vision of the ENGAGE infrastructure is to extract, highlight and enhance the RE‐USE value of PSI data.

• HOW: Moving slowly from low‐structured, isolated, difficult to find PSI data to high‐structured, easy to link, easy to process datasets => Crowd‐sourcing.

Page 12: ENGAGE Workshop at OpenDataWeek2013

ENGAGE Crowdsourcing

Moving from low structured, low value datasets to highly structured and / or derived datasets.

Diagram labels:

• Input: Public data sources (Unstructured / Semi-structured / Structured), with Low Re-Use Value / Quality structure / metadata
• Steps: JSON; Conversion; Data Enrichment; Metadata Enrichment; Cleansing; “Snapshots”; Discovery and Context Metadata
• Output: High Re-Use Value / Quality structure / metadata

Page 13: ENGAGE Workshop at OpenDataWeek2013

ENGAGEDATA.EU

Page 14: ENGAGE Workshop at OpenDataWeek2013

ENGAGE 2.0

• On top of ENGAGE basic functions (catalog, search, visualizations, API)

Researchers / Citizens / Journalists:

• Extend other datasets (official or already extended - derived datasets)
  – Conversions (e.g. HTML/PDF to XLS, PDF to RDF)
  – Data Cleansing (e.g. duplicate records, empty rows, errors)
  – Metadata Enrichment (missing metadata, Linked Data Enablers!)
  – Data Enrichment (enrich datasets with more information)
  – Snapshots of real-time data (e.g. Diavgeia_decisions_10_2012_to_12_2012.xls)
  – Mash-ups / Interlinking (e.g. Combine Election results with UV radiation levels!)
• View the version tree of official – derived datasets (clean solution - easy to understand and manage the contributions / versions)
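
The data cleansing step mentioned above (duplicate records, empty rows) can be illustrated with a minimal sketch. It is not the ENGAGE implementation; the function name `cleanse` and the whitespace-trimming rule are assumptions made for illustration only.

```python
from typing import List


def cleanse(rows: List[List[str]]) -> List[List[str]]:
    """Drop empty rows and exact duplicate records from tabular data.

    Assumption for this sketch: cells are compared after trimming
    surrounding whitespace, and the first occurrence of a record wins.
    """
    seen = set()
    cleaned = []
    for row in rows:
        normalised = tuple(cell.strip() for cell in row)
        if not any(normalised):
            continue  # empty row: every cell is blank
        if normalised in seen:
            continue  # duplicate of a record already kept
        seen.add(normalised)
        cleaned.append(list(normalised))
    return cleaned
```

A curated version produced this way would be uploaded as a derived dataset, leaving the official dataset untouched.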

Page 15: ENGAGE Workshop at OpenDataWeek2013

ENGAGE 2.0

Researchers / Citizens / Journalists:

• Data Requests
  – Looking for a dataset (e.g. I can’t find it elsewhere. Does it exist?)
  – Looking for a curation / conversion / enrichment (e.g. I am looking for the election results in Greece in XLS.)
  – Looking for data verification (e.g. Do you think this dataset is valid?)
  – Freedom of Information Requests
• Integration of tools
  – Google Refine
  – ScraperWiki
  – Visualizations

Page 16: ENGAGE Workshop at OpenDataWeek2013

ENGAGE 2.0

Data Providers:

• Maintainers of Official Datasets

• Work as a group

• Bring the community which works on their data closer to them / direct communication

• See and take advantage of ENGAGE Data Curation Community work (e.g. cleansing, better formats)

• Easy to see / gather all the Applications that are based on their official datasets.

• See the impact of their datasets.

• Understand which datasets have RE-USE value for users.

• Community Help in the process of Digitalization and Opening of current or older Public Data (history dimension)

Page 17: ENGAGE Workshop at OpenDataWeek2013

Search for a dataset...

...use your own language

Page 18: ENGAGE Workshop at OpenDataWeek2013

Check dataset information...

...and download it

Page 19: ENGAGE Workshop at OpenDataWeek2013

Faceted search available...

...with several filters

Page 20: ENGAGE Workshop at OpenDataWeek2013

Extend the datasets...

...in several ways...

Page 21: ENGAGE Workshop at OpenDataWeek2013

...and keep the provenance information
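
One way to picture how provenance can be kept when datasets are extended is a chain of versions in which every derived dataset points back to its source. The sketch below is purely illustrative; the class `DatasetVersion`, its fields and the example names are hypothetical and do not describe the platform's internal data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetVersion:
    name: str
    kind: str  # hypothetical labels, e.g. "official", "conversion", "cleansing"
    derived_from: Optional["DatasetVersion"] = None


def provenance_chain(version: DatasetVersion) -> List[str]:
    """Walk from a derived dataset back to the official one it came from."""
    chain = []
    node: Optional[DatasetVersion] = version
    while node is not None:
        chain.append(node.name)
        node = node.derived_from
    return chain


# Hypothetical example: an official PDF, converted to XLS, then cleansed.
official = DatasetVersion("decisions.pdf", "official")
converted = DatasetVersion("decisions.xls", "conversion", derived_from=official)
cleaned = DatasetVersion("decisions_clean.xls", "cleansing", derived_from=converted)
```

Calling `provenance_chain(cleaned)` lists the derived file first and the official dataset last, which is the information a version tree needs for each branch.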

Page 22: ENGAGE Workshop at OpenDataWeek2013

www.engagedata.eu

OpenRefine

Page 23: ENGAGE Workshop at OpenDataWeek2013

Describe the metadata...

Page 24: ENGAGE Workshop at OpenDataWeek2013

Join the community...

..and create your groups

Page 25: ENGAGE Workshop at OpenDataWeek2013

Rate the datasets...

..and share your thoughts

Page 26: ENGAGE Workshop at OpenDataWeek2013

Find out about Open Data sites...

Page 27: ENGAGE Workshop at OpenDataWeek2013

...per country

Page 28: ENGAGE Workshop at OpenDataWeek2013

...or other criteria

Page 29: ENGAGE Workshop at OpenDataWeek2013

Learn more about...

...ENGAGE, the ENGAGE API and Data Curation Methods

Page 30: ENGAGE Workshop at OpenDataWeek2013

Functionalities of ENGAGE open data e-infrastructure

• Contribution of ENGAGE over existing infrastructures:

1. Service for researchers and citizens

2. Metadata specification and content organisation (embracing the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models)

3. Automation in data entry and curation

4. Crowdsourcing and interaction with and between users of the platform

5. Data curation tools and services

6. Dataset visualisation possibilities

7. Multilinguality

8. User help and training

Page 31: ENGAGE Workshop at OpenDataWeek2013

Value Proposition through individual tools

Search in diverse and dispersed data sources in EU supported by ENGAGE

Transform your datasets while keeping the valuable information, using the external tools integrated with ENGAGE (OpenRefine, ScraperWiki etc.)

See your results through visualisation tools

Structure your data according to your needs – control all the levels of your dataset (data, metadata, format)

Refine existing datasets by metadata enrichment

Page 32: ENGAGE Workshop at OpenDataWeek2013

Value Proposition through collaboration

Create your community(ies) with members who share your interests

Each community will be able to increase the value of its datasets by applying its own perspective, based on its unique needs

Upload your work and share it with your community

Find other data sets, valuable for your work, uploaded by your community (Collaborate / Exchange / Ask / Provide)

Combine their results with yours – make new datasets

Page 33: ENGAGE Workshop at OpenDataWeek2013

Platform architecture (diagram labels)

• Layers: User Interface; ENGAGE Core Components; Storage Components; Gateways and integrated tools
• Technologies shown: Elastic Search; CKAN API; ScraperWiki API; OpenRefine; DjangoWiki; Amazon S3; Python / Django Framework; Heroku; PostgreSQL; Virtuoso; Apache Solr; HTML / jQuery; Translate; CERIF

Page 34: ENGAGE Workshop at OpenDataWeek2013

Performing scenarios

Scenario 1: Searching, downloading, extending/ visualizing/ curating/ linking and uploading interesting datasets

Scenario 2: Getting information about other open data websites and comparing them via the ENGAGE website

Scenario 3: Getting information about manuals, APIs and tutorials (training)

Page 35: ENGAGE Workshop at OpenDataWeek2013

engagedata.eu

engage-project.eu → Events → Workshops → ENGAGE Online Usability Test

Verification Code = ODWM

Page 36: ENGAGE Workshop at OpenDataWeek2013

Agenda

• The ENGAGE project, an introduction
• The ENGAGE 2.0 platform, released in Beta since April 2013
• Open data for re-use in Europe, some barriers to overcome?
  – Findings from the ENGAGE project
• Discussion
  – Your suggestions to overcome the barriers

Page 37: ENGAGE Workshop at OpenDataWeek2013

From V1 evaluation

Asking for:

– More of the specific datasets that users are looking for
– Better performing advanced search functionality
– More / more open dataset formats
– More tools for visualization
– More metadata
– More metadata in the language that the user understands
– Better understandable metadata
– Easy to find metadata
– Information about the quality of the datasets
– Ability to rate and post comments on datasets

➢ Metadata are very important in solving many problems + Multilinguality + Dataset formats

Page 38: ENGAGE Workshop at OpenDataWeek2013

Challenges of data sourcing

• Great diversity and variety of datasets in terms of
  – File format
  – Encoding
  – License
  – Language
  – Metadata standard (Discovery level)
  – Metadata standard (Data ‐ Domain level)
• Some PSI sites (even new) do not provide an API
• Most sites provide an API only for discovery
• Linked Data potential still not achieved (IT‐savvy / researchers only)
• Live query of other portals' datasets has issues:
  – Schema Mapping
  – Performance
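
As an illustration of a discovery-only API, many open data portals run CKAN, whose Action API offers a `package_search` call that returns dataset descriptions rather than the data itself. The sketch below only builds the request URL and reads a response that has already been fetched; the portal address is a placeholder, the helper names are assumptions, and nothing here is specific to the ENGAGE API.

```python
import json
from typing import List
from urllib.parse import urlencode


def package_search_url(portal_base: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API search URL for a portal (base URL is a placeholder)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_base.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> List[str]:
    """Extract dataset titles from a CKAN package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [dataset.get("title", "") for dataset in payload["result"]["results"]]


# Hand-written sample in the shape CKAN returns; not real portal output.
sample = '{"success": true, "result": {"count": 1, "results": [{"title": "Election results"}]}}'
```

Each portal still describes its datasets with its own fields and licences, so the schema mapping issue listed above remains even when the search call itself is uniform.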

Page 39: ENGAGE Workshop at OpenDataWeek2013

Barriers to overcome?

• Metadata
  – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
  – How to have it filled: human vs extraction
  – Tracking of data ownership and provenance, for trust and security
• Dataset formats: from PDF/CSV toward LOD/RDF
• Multilinguality (metadata and data)
• Licences
  – Many “open” licences: CC and national licences
  – Curation: what licence for a merged and enriched A + B, depending on A and B licences?

Page 40: ENGAGE Workshop at OpenDataWeek2013

Agenda

• The ENGAGE project, an introduction
• The ENGAGE 2.0 platform, released in Beta since April 2013
• Open data for re-use in Europe, some barriers to overcome?
  – Findings from the ENGAGE project
• Discussion
  – Your suggestions to overcome the barriers

Page 41: ENGAGE Workshop at OpenDataWeek2013

Barriers to overcome?

• Metadata
  – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
  – How to have it filled: human vs extraction
  – Tracking of data ownership and provenance, for trust and security
• Dataset formats: from PDF/CSV toward LOD/RDF
• Multilinguality (metadata and data)
• Licences
  – Many “open” licences: CC and national licences
  – Curation: what licence for a merged and enriched A + B, depending on A and B licences?

Page 42: ENGAGE Workshop at OpenDataWeek2013

Rich contextual metadata is important

• Captures context, purpose, provenance, coverage, etc.
• Allows the user to:
  – Discover a dataset
  – Evaluate utility and re-use potential
  – Reuse it!
• Enables advanced services
  – Sophisticated search/discovery and navigation, mining, visualisation, reporting

11th International Conference on Current Research Information Systems (CRIS 2012), Prague, 6-9 June 2012

Page 43: ENGAGE Workshop at OpenDataWeek2013

Mapping considerations

• Need canonical form to reduce n(n‐1) conversions to n
  – PSI data has several different metadata ‘standards’
• Canonical form must be able to ingest or generate the other metadata ‘standards’
  – Implies it has to be richer than the others
• Syntax (structure) and semantics
• Support multiple semantics over canonical syntax
• Canonical form must support whatever architecture is used
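
The canonical-form argument can be made concrete with a small sketch. With n metadata standards, direct conversion needs one converter per ordered pair, n(n‐1) in total, whereas a canonical form needs only one mapping per standard (each mapping covering ingest and generate). The schema names and field names below are invented for illustration; they are not real DCAT, CKAN or CERIF mappings.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# One mapping per standard: into the canonical form (ingest) ...
TO_CANONICAL: Dict[str, Callable[[Record], Record]] = {
    "schema_a": lambda r: {"title": r["name"], "publisher": r["org"]},
    "schema_b": lambda r: {"title": r["dct_title"], "publisher": r["dct_publisher"]},
}

# ... and back out of it (generate).
FROM_CANONICAL: Dict[str, Callable[[Record], Record]] = {
    "schema_a": lambda c: {"name": c["title"], "org": c["publisher"]},
    "schema_b": lambda c: {"dct_title": c["title"], "dct_publisher": c["publisher"]},
}


def convert(record: Record, source: str, target: str) -> Record:
    """Convert between any two standards by passing through the canonical form."""
    return FROM_CANONICAL[target](TO_CANONICAL[source](record))


def pairwise_converters(n: int) -> int:
    """Number of direct converters needed between n standards (ordered pairs)."""
    return n * (n - 1)


def canonical_mappings(n: int) -> int:
    """Number of standard-to-canonical mappings needed with a canonical form."""
    return n
```

For five standards this is 20 direct converters against 5 mappings, and adding a sixth standard costs one new mapping instead of ten new converters.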

Page 44: ENGAGE Workshop at OpenDataWeek2013
Page 45: ENGAGE Workshop at OpenDataWeek2013

A 3‐level metadata approach

• Level‐1. Discovery metadata. Flat schemata (analogous to Dublin core). Enables basic search by non‐sophisticated users.

• Level‐2. Usage metadata. A structured, semantically‐rich model for contextual metadata. Enables advanced domain‐independent services.

• Level‐3. Domain metadata. Detailed domain‐specific metadata. Allows advanced services provided by specialized tools.
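
The three levels can be sketched as a nested record: a flat discovery part, a structured usage part, and an open-ended domain part. This is only an illustration of the layering; all class and field names are hypothetical and are not taken from CERIF or any other standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiscoveryMetadata:
    """Level 1: flat fields, enough for basic search."""
    title: str
    description: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class UsageMetadata:
    """Level 2: contextual information shared across domains."""
    provenance: str
    licence: str
    related_publications: List[str] = field(default_factory=list)


@dataclass
class DatasetRecord:
    discovery: DiscoveryMetadata
    usage: UsageMetadata
    # Level 3: domain-specific details, left as free key/value pairs here.
    domain: Dict[str, str] = field(default_factory=dict)


def basic_search(records: List[DatasetRecord], term: str) -> List[DatasetRecord]:
    """Level-1 search: match the term against titles and keywords only."""
    term = term.lower()
    return [
        r for r in records
        if term in r.discovery.title.lower()
        or any(term == k.lower() for k in r.discovery.keywords)
    ]
```

A basic search touches only the first level, while more advanced services would read the usage and domain parts of the same record.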

Page 46: ENGAGE Workshop at OpenDataWeek2013

A 3‐level metadata approach

• Level‐1: DC, eGMS, DCAT, CKAN
• Level‐2: CERIF (CERIF‐generated RDF/LOD)
• Level‐3:
  – CSMD: Scientific studies (Samples, parameters, …)
  – DDI: Social sciences (Surveys, Populations, questionnaires, …)
  – INSPIRE: Geospatial data (Geospatial info)
  – SDMX: Statistical data (Measures, Dimensions, …)

Page 47: ENGAGE Workshop at OpenDataWeek2013

A 3‐level metadata approach

Processing model (diagram labels): Target Dataset(s); generate; Point to

Page 48: ENGAGE Workshop at OpenDataWeek2013
Page 49: ENGAGE Workshop at OpenDataWeek2013
Page 50: ENGAGE Workshop at OpenDataWeek2013

CERIF

Common European Research Information Format – maintained by euroCRIS

From http://cerifsupport.org/2013/04/02/data-in-cerif/ , B. Joerg

Page 51: ENGAGE Workshop at OpenDataWeek2013

Barriers to overcome?

• Metadata
  – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
  – How to have it filled: human vs extraction
  – Tracking of data ownership and provenance, for trust and security
• Dataset formats: from PDF/CSV toward LOD/RDF
• Multilinguality (metadata and data)
• Licences
  – Many “open” licences: CC and national licences
  – Curation: what licence for a merged and enriched A + B, depending on A and B licences?

Page 52: ENGAGE Workshop at OpenDataWeek2013

Dataset formats

Page 53: ENGAGE Workshop at OpenDataWeek2013

Community converting dataset to another format

Page 54: ENGAGE Workshop at OpenDataWeek2013

Barriers to overcome?

• Metadata
  – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
  – How to have it filled: human vs extraction
  – Tracking of data ownership and provenance, for trust and security
• Dataset formats: from PDF/CSV toward LOD/RDF
• Multilinguality (metadata and data)
• Licences
  – Many “open” licences: CC and national licences
  – Curation: what licence for a merged and enriched A + B, depending on A and B licences?

Page 55: ENGAGE Workshop at OpenDataWeek2013

User Interface translation

Page 56: ENGAGE Workshop at OpenDataWeek2013

Metadata translation

Page 57: ENGAGE Workshop at OpenDataWeek2013

Barriers to overcome?

• Metadata
  – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
  – How to have it filled: human vs extraction
  – Tracking of data ownership and provenance, for trust and security
• Dataset formats: from PDF/CSV toward LOD/RDF
• Multilinguality (metadata and data)
• Licences
  – Many “open” licences: CC and national licences
  – Curation: what licence for a merged and enriched A + B, depending on A and B licences?

Page 58: ENGAGE Workshop at OpenDataWeek2013

Open licenses landscape

Page 59: ENGAGE Workshop at OpenDataWeek2013

Open licenses landscape – per country

Country | Portal | Licence
France | Data.gouv.fr | Licence Ouverte
United Kingdom | Data.gov.uk | Open Government Licence
Italy | Dati.gov.it | Creative Commons Attribuzione - Non commerciale 2.5 Italia (CC BY-NC 2.5)
Germany | Govdata.de | Datenlizenz Deutschland – Namensnennung; Datenlizenz Deutschland – Namensnennung – nicht kommerziell
Norway | Data.norge.no | Norsk lisens for offentlige data (NLOD)
Netherlands | Data.overheid.nl | No specific common licence but a recommendation for the agencies publishing data through the portal to use the framework of the Open Government Act, and to apply Creative Commons Zero of Public Domain if any licence is desired at all
Spain | Datos.gob.es | No specific licence but two parts in extensive legal notes that cover data re-use and are based on different pieces of Spanish national legislation
Belgium | Data.gov.be | No specific common licence. Each public service or government institution determines the terms and conditions governing access to and use of its data published through the portal.

From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information. Conference for eDemocracy & Open Government

Page 60: ENGAGE Workshop at OpenDataWeek2013

Open licenses landscape – several types

Page 61: ENGAGE Workshop at OpenDataWeek2013

Open license content – an example

Regulation components of data.gouv.fr open licence

From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information. Conference for eDemocracy & Open Government
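
The curation question raised under the barriers (what licence applies to a merged and enriched A + B) can be explored with a deliberately simplified model in which each licence is reduced to a set of conditions and the merged dataset inherits every condition of both sources. This is an assumption made for illustration, not legal guidance and not the approach of the cited paper; the licence labels and condition names are hypothetical, and real licences can conflict in ways this model ignores.

```python
from typing import Dict, FrozenSet

# Hypothetical, simplified licences expressed as sets of conditions.
LICENCES: Dict[str, FrozenSet[str]] = {
    "public-domain": frozenset(),
    "attribution": frozenset({"attribution"}),
    "attribution-noncommercial": frozenset({"attribution", "non-commercial"}),
}


def merged_conditions(licence_a: str, licence_b: str) -> FrozenSet[str]:
    """Conditions a merged dataset A + B would carry under the union assumption."""
    return LICENCES[licence_a] | LICENCES[licence_b]
```

Under this assumption, merging an attribution-only dataset with an attribution, non-commercial one yields a derived dataset that carries both conditions, so the most restrictive source determines what re-users may do.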

Page 62: ENGAGE Workshop at OpenDataWeek2013

Conclusion: Participants' suggestions

The ENGAGE platform and its features are of interest for promoting data re-use, provided the following points are addressed:

1. Have a clear view of the positioning of ENGAGE in the Open Data ecosystem, including the added value / differences with respect to HOMER and other Open Data-related EC projects

2. Ensure ENGAGE sustainability

3. For ENGAGE in particular, and for EC projects in general, the developed software should be required to be open source in order to ensure its sustainability

4. Success stories related to the use of ENGAGE should be promoted, for example demonstrating the savings in time for Researchers

5. The educational side should be strong, with the inclusion of basic information on Linked Data, video tutorials,...


Page 63: ENGAGE Workshop at OpenDataWeek2013

@valcas2000

http://www.engage-project.eu

Join Us

Thank you for your contribution!