Slides supporting the workshop "Accelerating data reuse: international initiatives? What role for the European Commission?", held on June 27th, 2013, in Marseille, introducing the ENGAGE platform and led by Valerie Brasse.
ENGAGE Workshop, June 27th, 2013, Marseille
Accelerating data re-use: an example of an e-infrastructure at European level
Valerie Brasse, euroCRIS / IS4RI
Strasbourg, France
Slides reproduced from presentations by ENGAGE members
Agenda
• The ENGAGE project, an introduction
• The ENGAGE 2.0 platform, released in Beta since April 2013
• Open data for re-use in Europe, some barriers to overcome?
  – Findings from the ENGAGE project
• Discussion
  – Your suggestions to overcome the barriers
ENGAGE Project Information
Acronym: ENGAGE
Title: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
Website: http://www.engage-project.eu
Platform: http://www.engagedata.eu
Contract no: RI-283700
Project type: CP-CSA
Start date: 01/06/2011
Duration: 36 months
Partners: 9
Framework Programme 7 (2007-2013), Research Infrastructures
Project participants
• NTUA, GR (Coordinator)
• TU-DELFT, NL
• MIC-GR, GR
• IBM-ISRAEL, IL
• INTRASOFT, LU
• STFC, UK
• FhG-FOKUS, DE
• AEGEAN, GR
• EUROCRIS, NL
Public Sector Information
• Data produced by governmental organisations, typically referring to datasets
• Examples: geospatial, demographic, statistical, environmental, public safety, financial data
• Growing international movement: open access to PSI datasets in a way that facilitates reuse
• Opening up PSI datasets can potentially lead to substantial economic gains [1]
[1] Vickery, G. (2011): Review of recent studies on PSI re-use and related market developments.
Overview of ENGAGE objectives
• Development and use of a data infrastructure, incorporating distributed and diverse public sector information (PSI) resources
• Capable of supporting scientific collaboration and research, particularly for the Social Science and Humanities (SSH) scientific communities
• Empowering the deployment of open governmental data towards citizens
Simply put, ENGAGE is a door for researchers that leads them to the world of Open Government Data. Through the ENGAGE platform, researchers and citizens will be able to search, browse, download, visualise and submit diverse and distributed Public Sector datasets from EU countries.
ENGAGE Two-way Scenario
Forward path, from the public sector to users:
• Public Sector Information Collection: Public Sector Organisations; Open data initiatives
• Data Curation: Pre-processing; Anonymisation; Harmonisation; Annotation; Linking
• Archival: Cloud and Grid Infrastructure; Platform Independence and Interoperability
• Data Search and Retrieval: Open and intuitive access to the data collection; Context-specific search
• Advanced Data Services: Visualisation (inc. combined views); Context-specific formatting; Collaboration tools
Return path, delivering Open Data needs and guidelines to Public Sector Organisations:
• Actors named in the diagram: Public Sector Organisations; ENGAGE and eInfrastructures; ENGAGE; Society; Policy; Research Communities; Policy makers
• Steps named in the diagram: New Problems, new Challenges; Search Data Needs; New Service Definition for open data; Utilisation of existing Infrastructures; Needs for Governmental data Provision
• ENGAGE provides a single point of access to PSI sources as well as relevant tools in order to cover the needs of researchers and citizens.
• Public data sources (unstructured / "semi-structured"): ministries' and local public agencies' websites, Publicdata.eu, National Statistical Offices.
• ENGAGE traverses distributed and diverse public sector information resources.
• ENGAGE aims to embrace the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models.
• ENGAGE: not an isolated data silo but a vital part of the Global Data Space.
ENGAGE will enable EU Researchers / Citizens to
• Discover and browse datasets across diverse and dispersed public sector information resources (local, National and European) in their own language
• Upload curated, enhanced or extended versions of existing datasets, originally published by public agencies, in order to address various formats, standards and scientific purposes in a crowd-sourcing manner
• Acquire the datasets
• Visualize properly structured datasets in data tables, maps and charts
Additionally
• Utilize ENGAGE Application Programming Interfaces (APIs) for searching and acquiring the datasets
• Rate the quality of datasets on various dimensions
• Request additional datasets or information on existing datasets from the Public Agencies
• View usage statistics
• View publications and other material linked to datasets
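The slides mention ENGAGE APIs for searching and acquiring datasets but do not document them. As a minimal sketch only, assuming a CKAN-style Action API (a CKAN API appears among the platform components later in the deck), a search request could be built and its response parsed as below. The base URL, the query and the canned response are illustrative assumptions, not the documented ENGAGE API.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: the real ENGAGE API endpoint is not given in the slides.
BASE_URL = "https://example.org/api/3/action"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Build a CKAN-style `package_search` URL for a free-text query."""
    return f"{base_url}/package_search?{urlencode({'q': query, 'rows': rows})}"


def dataset_titles(response_text):
    """Extract dataset titles from a CKAN-style search response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("search request was not successful")
    return [item["title"] for item in payload["result"]["results"]]


# Canned response in the CKAN Action API shape, used instead of a live call.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "election-results", "title": "Election results"}],
    },
})

if __name__ == "__main__":
    print(build_search_url("election results"))
    print(dataset_titles(SAMPLE_RESPONSE))
```

Keeping URL construction and response parsing separate from the network call makes the sketch testable offline; a real client would fetch the URL and pass the body to `dataset_titles`.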
Public Agencies will be able to
• Utilize the ENGAGE infrastructure (interface and APIs) to publish governmental data
• Register and link their datasets within the ENGAGE infrastructure
• Receive feedback on the quality of their datasets
• Review the opinions or requests of citizens and researchers
• View the applications, publications and other datasets uploaded by scientists that are linked to their original published datasets
• Integration of original PSI data and derived / curated datasets created, maintained and extended by users (researchers, citizens, journalists, computer specialists) in a collaborative environment. A research / data curation community platform with focus on the SSH domain.
• The vision of the ENGAGE infrastructure is to extract, highlight and enhance the RE-USE value of PSI data.
• HOW: Moving slowly from low-structured, isolated, difficult to find PSI data to high-structured, easy to link, easy to process datasets => Crowd-sourcing.
ENGAGE Crowdsourcing
Moving from low structured, low value datasets to highly structured and / or derived datasets
• Input: public data sources (Unstructured / Semi-structured / Structured), with low re-use value / quality structure / metadata
• Curation steps named in the diagram: JSON; Conversion; Data Enrichment; Metadata Enrichment; Cleansing; "Snapshots"
• Output: Discovery and Context Metadata; high re-use value / quality structure / metadata
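To make the cleansing step of this flow concrete, here is a minimal sketch, assuming the dataset is a small CSV and that cleansing means dropping empty rows and exact duplicate records. The function names and the sample data are hypothetical and are not part of the ENGAGE platform.

```python
import csv
import io


def cleanse_rows(rows):
    """Drop empty rows and exact duplicates, keeping first occurrences in order."""
    seen = set()
    cleaned = []
    for row in rows:
        normalised = tuple(cell.strip() for cell in row)
        if not any(normalised):
            continue  # row with no content at all
        if normalised in seen:
            continue  # exact duplicate of an earlier row
        seen.add(normalised)
        cleaned.append(list(normalised))
    return cleaned


def cleanse_csv(text):
    """Apply `cleanse_rows` to CSV text and return the cleaned CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleanse_rows(reader))
    return out.getvalue()


# Illustrative input with one empty row, one duplicate and stray whitespace.
SAMPLE = "region,votes\nAttica,120\n,\nAttica,120\nCrete, 80\n"

if __name__ == "__main__":
    print(cleanse_csv(SAMPLE))
```

Real curation work would also need to deal with near-duplicates and value errors, which this sketch deliberately leaves out.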
ENGAGEDATA.EU
ENGAGE 2.0
• On top of ENGAGE basic functions (catalog, search, visualizations, API)
Researchers / Citizens / Journalists:
• Extend other datasets (official or already extended - derived datasets)
  – Conversions (e.g. HTML or PDF to XLS, PDF to RDF)
  – Data Cleansing (e.g. duplicate records, empty rows, errors)
  – Metadata Enrichment (missing metadata, Linked Data Enablers!)
  – Data Enrichment (enrich datasets with more information)
  – Snapshots of real-time data (e.g. Diavgeia_decisions_10_2012_to_12_2012.xls)
  – Mash-ups / Interlinking (e.g. Combine Election results to UV radiation levels!)
• View the version tree of official – derived datasets (clean solution - easy to understand and manage the contributions / versions)
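The version tree of official and derived datasets can be pictured as a simple parent/child structure. The sketch below is an assumption about how such a tree might be modelled, not the platform's actual data model; the class name, the operation labels and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class DatasetVersion:
    """One node in a version tree: an official dataset or a derived one."""
    name: str
    operation: str = "official"  # e.g. "conversion", "cleansing", "enrichment"
    parent: Optional["DatasetVersion"] = field(default=None, repr=False)
    children: List["DatasetVersion"] = field(default_factory=list)

    def derive(self, name, operation):
        """Register a new version derived from this one and return it."""
        child = DatasetVersion(name=name, operation=operation, parent=self)
        self.children.append(child)
        return child

    def provenance(self):
        """Names from the official dataset down to this version."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))


def render(node, depth=0):
    """Return the tree as indented text lines, one per version."""
    lines = ["  " * depth + f"{node.name} [{node.operation}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    official = DatasetVersion("elections.pdf")
    xls = official.derive("elections.xls", "conversion")
    xls.derive("elections_clean.xls", "cleansing")
    print("\n".join(render(official)))
```

Storing the parent link on every derived version is what lets a contribution be traced back to the official dataset it started from.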
ENGAGE 2.0
Researchers / Citizens / Journalists:
• Data Requests
  – Looking for a dataset (e.g. I can't find it elsewhere. Does it exist?)
  – Looking for a curation / conversion / enrichment (e.g. I am looking for the election results in Greece in XLS.)
  – Looking for data verification (e.g. Do you think this dataset is valid?)
  – Freedom of Information Requests
• Integration of tools
  – Google Refine
  – ScraperWiki
  – Visualizations
ENGAGE 2.0
Data Providers:
• Maintainers of Official Datasets
• Work as a group
• Bring the community that works on their data closer to them / direct communication
• See and take advantage of ENGAGE Data Curation Community work (e.g. cleansing, better formats)
• Easily see / gather all the applications that are based on their official datasets
• See the impact of their datasets
• Understand which datasets have RE-USE value for users
• Community help in the process of digitising and opening current or older Public Data (history dimension)
Search for a dataset...
...use your own language
Check dataset information...
...and download it
Faceted search available...
...with several filters
Extend the datasets...
...in several ways...
...and keep the provenance information
www.engagedata.eu
OpenRefine
Describe the metadata...
Join the community...
...and create your groups
Rate the datasets...
...and share your thoughts
Find out about Open Data sites...
...per country
...or other criteria
Learn more about...
...ENGAGE, the ENGAGE API and Data Curation Methods
Functionalities of ENGAGE open data e-infrastructure
• Contribution of ENGAGE over existing infrastructures:
1. Service for researchers and citizens
2. Metadata specification and content organisation (embracing the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models)
3. Automation in data entry and curation
4. Crowdsourcing and interaction with and between users of the platform
5. Data curation tools and services
6. Dataset visualisation possibilities
7. Multilinguality
8. User help and training
Value Proposition through individual tools
Search in diverse and dispersed data sources in EU supported by ENGAGE
Transform your datasets while keeping the valuable information, using the external tools integrated with ENGAGE (OpenRefine, ScraperWiki, etc.)
See your results through visualisation tools
Structure your data according to your needs – control all the levels of your dataset (data, metadata, format)
Refine existing datasets by metadata enrichment
Value Proposition through collaboration
Create your community(ies) with members who share your interests
Each community will be able to increase the value of its datasets by applying its own perspective, based on its unique needs
Upload your work and share it with your community
Find other data sets, valuable for your work, uploaded by your community (Collaborate / Exchange / Ask / Provide)
Combine their results with yours – make new datasets
Platform architecture (labels recovered from the diagram)
• Layers: User Interface; Gateways and integrated tools; ENGAGE Core Components; Storage Components
• Technologies named: HTML / jQuery; Translate; Django Framework; Python / Django Framework; Elastic Search; Apache Solr; CKAN API; ScraperWiki API; OpenRefine; DjangoWiki; Amazon S3; Heroku PostgreSQL; Virtuoso; PostgreSQL; CERIF
Performing scenarios
Scenario 1: Searching, downloading, extending/ visualizing/ curating/ linking and uploading interesting datasets
Scenario 2: Getting information about other open data websites and comparing them via the ENGAGE website
Scenario 3: Getting information about manuals, APIs and tutorials (training)
engagedata.eu
engage-project.eu → Events → Workshops → ENGAGE Online Usability Test
Verification Code = ODWM
From V1 evaluation
Asking for:
– More of the specific datasets that users are looking for
– Better performing advanced search functionality
– More / more open dataset formats
– More tools for visualization
– More metadata
– More metadata in the language that the user understands
– More understandable metadata
– Easy to find metadata
– Information about the quality of the datasets
– Ability to rate and post comments on datasets
➢ Metadata are very important in solving many problems + Multilinguality + Dataset formats
Challenges of data sourcing
• Great diversity and variety of datasets in terms of
  – File format
  – Encoding
  – License
  – Language
  – Metadata standard (Discovery level)
  – Metadata standard (Data - Domain level)
• Some PSI sites (even new ones) do not provide an API
• Most sites provide an API only for discovery
• Linked Data potential still not achieved (IT-savvy users / researchers only)
• Live query of other portals' datasets has issues:
  – Schema Mapping
  – Performance
Barriers to overcome?
• Metadata
  – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
  – How to have it filled: human vs extraction
  – Tracking of data ownership and provenance, for trust and security
• Dataset formats: from PDF/CSV toward LOD/RDF
• Multilinguality (metadata and data)
• Licences
  – Many "open" licences: CC and national licences
  – Curation: what licence for a merged and enriched A + B, depending on A and B licences?
Rich contextual metadata is important
• Captures context, purpose, provenance, coverage, etc.
• Allows the user to:
  – Discover a dataset
  – Evaluate utility and re-use potential
  – Reuse it!
• Enables advanced services
  – Sophisticated search/discovery and navigation, mining, visualisation, reporting
11th International Conference on Current Research Information Systems (CRIS 2012), Prague, 6-9 June 2012
Mapping considerations
• A canonical form is needed to reduce n(n-1) conversions to n
  – PSI data has several different metadata 'standards'
• The canonical form must be able to ingest or generate the other metadata 'standards'
  – This implies it has to be richer than the others
• Syntax (structure) and semantics
• Support multiple semantics over canonical syntax
• The canonical form must support whatever architecture is used
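The arithmetic behind the canonical form can be sketched as follows: with n metadata standards, direct conversion needs one converter per ordered pair, n(n-1), while a canonical hub needs only one mapping per standard. The schema names and field mappings below are hypothetical placeholders, not real DCAT, CKAN or CERIF definitions.

```python
def direct_converters(n):
    """Converters needed when every standard maps directly to every other."""
    return n * (n - 1)


def canonical_mappings(n):
    """Mappings needed when every standard maps once to a canonical form."""
    return n


# Hypothetical mappings: field name in each schema -> canonical field name.
FIELD_MAPS = {
    "schema_a": {"name": "title", "owner": "publisher"},
    "schema_b": {"title": "title", "publisher": "publisher"},
    "schema_c": {"label": "title", "agency": "publisher"},
}


def to_canonical(record, schema):
    """Rename the fields of `record` to their canonical names."""
    mapping = FIELD_MAPS[schema]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_canonical(canonical, schema):
    """Rename canonical fields back to the names used by `schema`."""
    inverse = {canon: local for local, canon in FIELD_MAPS[schema].items()}
    return {inverse[key]: value for key, value in canonical.items() if key in inverse}


def convert(record, source, target):
    """Convert between two schemas by passing through the canonical form."""
    return from_canonical(to_canonical(record, source), target)


if __name__ == "__main__":
    print(direct_converters(4), canonical_mappings(4))
    print(convert({"name": "Budget 2012", "owner": "Ministry"}, "schema_a", "schema_b"))
```

Each mapping here is used in both directions, which only works because the toy mappings are one-to-one; richer standards would need the canonical form to carry fields the others lack, as the slide notes.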
A 3-level metadata approach
• Level-1. Discovery metadata. Flat schemata (analogous to Dublin Core). Enables basic search by non-sophisticated users.
• Level-2. Usage metadata. A structured, semantically-rich model for contextual metadata. Enables advanced domain-independent services.
• Level-3. Domain metadata. Detailed domain-specific metadata. Allows advanced services provided by specialized tools.
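One way to picture the three levels is as nested records, where every dataset has discovery metadata and the richer levels are optional. This is a sketch under that assumption; the class and field names are illustrative and do not reproduce CERIF or any of the standards named in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DiscoveryMetadata:
    """Level 1: flat, Dublin Core-like fields for basic search."""
    title: str
    description: str = ""
    keywords: List[str] = field(default_factory=list)


@dataclass
class UsageMetadata:
    """Level 2: contextual metadata (field names are illustrative)."""
    provenance: str = ""
    licence: str = ""
    coverage: str = ""


@dataclass
class DatasetMetadata:
    discovery: DiscoveryMetadata
    usage: Optional[UsageMetadata] = None
    domain: Dict[str, str] = field(default_factory=dict)  # Level 3, domain-specific

    def levels(self):
        """List which of the three levels carry content for this dataset."""
        present = [1]
        if self.usage is not None:
            present.append(2)
        if self.domain:
            present.append(3)
        return present


def basic_search(records, term):
    """Level-1 search: match the term against title, description and keywords."""
    term = term.lower()
    hits = []
    for record in records:
        d = record.discovery
        haystack = " ".join([d.title, d.description, *d.keywords]).lower()
        if term in haystack:
            hits.append(d.title)
    return hits
```

A basic search touches only Level 1, so it works for every dataset, while services that depend on Levels 2 and 3 can check `levels()` before they run.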
A 3-level metadata approach (diagram)
• Level-1: eGMS, DCAT, CKAN, DC
• Level-2: CERIF, and CERIF-generated RDF/LOD
• Level-3:
  – CSMD: scientific studies (samples, parameters, ...)
  – DDI: social sciences (surveys, populations, questionnaires, ...)
  – INSPIRE: geospatial data (geospatial info)
  – SDMX: statistical data (measures, dimensions, ...)
• Arrows labelled "generate" and "point to" lead to the target dataset(s); the diagram is labelled "Processing model"
CERIF
Common European Research Information Format, maintained by euroCRIS
From http://cerifsupport.org/2013/04/02/data-in-cerif/ , B. Joerg
Dataset formats
The community converting a dataset to another format
User Interface translation
Metadata translation
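Metadata translation can be pictured as storing one text per language and falling back when the user's language is missing. The sketch below assumes that model; the function name, the fallback rule and the sample titles are hypothetical and do not describe how ENGAGE implements translation.

```python
def localised(values, preferred, fallback="en"):
    """Pick the text for the preferred language, else the fallback, else any.

    `values` maps a language code to the text in that language.
    """
    if preferred in values:
        return values[preferred]
    if fallback in values:
        return values[fallback]
    # Last resort: any available translation, or an empty string if none exist.
    return next(iter(values.values()), "")


# Illustrative multilingual title for one dataset.
TITLES = {"en": "Election results", "fr": "Résultats des élections"}

if __name__ == "__main__":
    print(localised(TITLES, "fr"))
    print(localised(TITLES, "de"))
```

Falling back to a common language keeps a dataset discoverable even before its metadata has been translated.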
Open licenses landscape
Open licenses landscape – per country
Country | Portal | Licence
France | Data.gouv.fr | Licence Ouverte
United Kingdom | Data.gov.uk | Open Government Licence
Italy | Dati.gov.it | Creative Commons Attribuzione - Non commerciale 2.5 Italia (CC BY-NC 2.5)
Germany | Govdata.de | Datenlizenz Deutschland – Namensnennung; Datenlizenz Deutschland – Namensnennung – nicht kommerziell
Norway | Data.norge.no | Norsk lisens for offentlige data (NLOD)
Netherlands | Data.overheid.nl | No specific common licence but a recommendation for the agencies publishing data through the portal to use the framework of the Open Government Act, and to apply Creative Commons Zero of Public Domain if any licence is desired at all
Spain | Datos.gob.es | No specific licence but two parts in extensive legal notes that cover data re-use and are based on different pieces of Spanish national legislation
Belgium | Data.gov.be | No specific common licence. Each public service or government institution determines the terms and conditions governing access to and use of its data published through the portal.
From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information. Conference for eDemocracy & Open Government
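The earlier question of which licence applies to a merged and enriched A + B can be explored with a deliberately simplified model in which each licence is reduced to a set of conditions and a derived dataset inherits the union of its sources' conditions. Both the condition sets and the union rule are assumptions made for illustration; they are not a legal reading of these licences and are not taken from the cited paper.

```python
# Simplified, illustrative conditions per licence (assumption, not legal advice).
LICENCE_CONDITIONS = {
    "CC0": frozenset(),
    "Licence Ouverte": frozenset({"attribution"}),
    "Open Government Licence": frozenset({"attribution"}),
    "CC BY-NC 2.5": frozenset({"attribution", "non-commercial"}),
}


def combined_conditions(*licences):
    """Union of the conditions attached to every source licence."""
    result = set()
    for name in licences:
        result |= LICENCE_CONDITIONS[name]
    return result


if __name__ == "__main__":
    # Merging a French and an Italian dataset under this toy model.
    print(sorted(combined_conditions("Licence Ouverte", "CC BY-NC 2.5")))
```

Under this toy model the most restrictive source determines the result, which is one reason curation across portals with different licences is listed as a barrier.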
Open licenses landscape – several types
Open license content – an example
Regulation components of data.gouv.fr open licence
From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information. Conference for eDemocracy & Open Government
Conclusion: Participants' suggestions
The ENGAGE platform and its features are of interest for promoting data re-use, provided the following points are addressed:
1. Have a clear view of the positioning of ENGAGE in the Open Data ecosystem, including its added value and differences with respect to HOMER and other Open Data-related EC projects
2. Ensure ENGAGE sustainability
3. For ENGAGE in particular, and for EC projects in general, the developed software should be required to be open source in order to ensure its sustainability
4. Success stories related to the use of ENGAGE should be promoted, for example demonstrating the time savings for researchers
5. The educational side should be strong, with the inclusion of basic information on Linked Data, video tutorials, ...