CEDEM 2012, May 3-4
The necessity of metadata for linked open data and its contribution to policy analyses
Anneke Zuiderwijk*, Keith Jeffery**, Marijn Janssen*
*Delft University of Technology, The Netherlands
**Science and Technology Facilities Council, United Kingdom
Open governmental data
- "We are sending a strong signal to administrations today. Your data is worth more if you give it away. So start releasing it now." (European Commission Vice President Neelie Kroes, "Digital Agenda: Turning government data into gold", December 12, 2011)
- This is one of many examples showing that open governmental data have recently gained considerable attention.
The ENGAGE project
- ENGAGE (FP7): An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens (http://www.engage-project.eu)
- Main goal: the development and use of a data infrastructure, incorporating distributed and diverse public sector information (PSI) resources.
- The ENGAGE platform will enable researchers and citizens to:
  - Discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language
  - Download the datasets
  - Perform geospatial search of datasets
  - Visualize properly structured datasets in data tables, maps and charts
Open governmental data
- Open governmental data can be defined as "all stored data of the public sector which could be made accessible by government in the public interest without any restrictions on usage and distribution" (Geiger & Von Lucke, 2011, p. 185).
- For example, public sector data can be (MEPSIR study, Dekkers et al., 2006):
  - Geographic data (e.g. cadastral information)
  - Legal data (e.g. court decisions, legislation)
  - Meteorological data (e.g. climate data, weather forecasts)
  - Social data (e.g. population, public administration)
  - Transport data (e.g. traffic congestion, work on roads)
  - Business data (e.g. chamber of commerce, patents)
Figure 1: Process for creating Linked Open Data
[Diagram labels: public sector (policy) data; metadata; publication on the Semantic Web; reusing open data; linking data; Linked Open Data. The steps are numbered (1) to (5).]
Linked open data (LOD)
- Focus on turning public sector data into LOD:
  1. A public body produces data (and metadata)
  2. Data become available on the Web of Data / Semantic Web
  3. Open data can be reused
  4. Open data can be linked to other data to show relationships
  5. Data are both open and linked: Linked Open Data (LOD)
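
As an illustration of step 4, linking can be sketched with plain subject-predicate-object statements: two datasets become connected as soon as one of them refers to an identifier used in the other. The sketch below uses only Python built-ins; every identifier, property name and value in it is a made-up placeholder, and it is not taken from the ENGAGE platform.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers and values are made-up placeholders.
dataset_a = [
    ("ex:municipality/1", "ex:name", "Municipality One"),
    ("ex:municipality/1", "ex:population", "1000"),
]
dataset_b = [
    ("ex:roadwork/7", "ex:locatedIn", "ex:municipality/1"),
    ("ex:roadwork/7", "ex:status", "planned"),
]


def merge(*datasets):
    """Index all triples by subject so shared identifiers connect the datasets."""
    graph = defaultdict(list)
    for triples in datasets:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return graph


def follow(graph, subject, predicate):
    """Return the objects reached from `subject` via `predicate`."""
    return [obj for pred, obj in graph[subject] if pred == predicate]


graph = merge(dataset_a, dataset_b)

# Follow the link from the roadwork record to the municipality it refers to.
for target in follow(graph, "ex:roadwork/7", "ex:locatedIn"):
    print(target, follow(graph, target, "ex:name"))
```

The link only works because both datasets use the same identifier for the municipality, which is exactly the kind of information that has to be documented somewhere.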
Metadata
- Metadata are part of the LOD process
- Metadata are needed to make sense of the open data (Berners-Lee, 2009)
- Metadata are defined as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource." (National Information Standards Organization, 2004, p. 1).
- Metadata provision in the ideal situation:
  - Discovery metadata, e.g. identifier, title, creator, keywords.
  - Contextual metadata, e.g. organizations, projects, funding.
  - Detailed metadata, e.g. quality and domain specific parameters.
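
The three kinds of metadata listed above can be pictured as one record with three nested parts. The following is a minimal sketch, assuming Python dataclasses; the class names, the field types and the `missing_layers` helper are hypothetical and only mirror the examples given on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryMetadata:
    identifier: str
    title: str
    creator: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class ContextualMetadata:
    organizations: list[str] = field(default_factory=list)
    projects: list[str] = field(default_factory=list)
    funding: list[str] = field(default_factory=list)


@dataclass
class DetailedMetadata:
    quality: dict[str, str] = field(default_factory=dict)
    domain_parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class DatasetMetadata:
    discovery: DiscoveryMetadata
    contextual: ContextualMetadata = field(default_factory=ContextualMetadata)
    detailed: DetailedMetadata = field(default_factory=DetailedMetadata)

    def missing_layers(self) -> list[str]:
        """Name the optional layers that carry no information yet."""
        missing = []
        if not any(vars(self.contextual).values()):
            missing.append("contextual")
        if not any(vars(self.detailed).values()):
            missing.append("detailed")
        return missing


record = DatasetMetadata(
    discovery=DiscoveryMetadata(
        identifier="ds-1",  # placeholder values, not a real dataset
        title="Example dataset",
        creator="Example agency",
        keywords=["example"],
    ),
    contextual=ContextualMetadata(organizations=["Example agency"]),
)
print(record.missing_layers())
```

A check such as `missing_layers` is one simple way to see how far a given dataset is from the ideal situation described above.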
Why metadata are necessary in analyzing LOD
- Metadata for LOD can be useful in the following situations. Metadata:
  - create order within datasets;
  - improve the storing and preservation of LOD;
  - make LOD easier to find;
  - improve the accessibility of LOD;
  - may make it possible to assess and rank the quality of LOD;
  - make it easier to analyze, compare and reproduce LOD, and therefore to find inconsistencies in LOD;
  - improve the chances of a correct interpretation of LOD;
  - improve the possibilities to find patterns in LOD to generate new hypotheses;
  - may improve the visualization of LOD;
  - make it easier to link data;
  - avoid unnecessary duplication of LOD.
Problem statement
- There are discrepancies between the benefits described in the literature and the benefits obtained in reality
- The current situation is a long way from the ideal situation:
  - there are usually few, and insufficient, ways of managing metadata and interpreting LOD (for instance Hernández-Pérez et al., 2009; Schuurman et al., 2008; Xiong et al., 2011);
  - adding metadata is often viewed as an additional activity that only consumes resources.
- Statements:
  - Merely linking data is not enough to make use of open data
  - Metadata are key enablers for the effective use of LOD in policy-making
Requirements for a metadata architecture
- The metadata should:
  - be easily discovered;
  - interconvert common metadata formats used in PSI;
  - provide a LOD representation of the metadata for browsing or query;
  - maintain the capabilities of conventional information systems with structured query including convenient primitive operations.
Outline architecture
- The requirements lead to the following architecture:
Figure 2: An architecture of a portal server for the provision of metadata.
[Diagram labels: PSI dataset servers (three PSI datasets); application server (running software application); portal server (portal metadata).]
Metadata
- Metadata should be used to implement this architecture
- A 3-layer structure for metadata is used:
  a) discovery (flat) metadata; for example:
     - Dublin Core (DC);
     - e-Government Metadata Standard (e-GMS);
     - Comprehensive Knowledge Archive Network (CKAN);
     - or similar 'flat' metadata
  b) contextual metadata; uses the Common European Research Information Format (CERIF);
  c) detailed metadata.
The Vision: Metadata for Data Model
[Diagram labels: DISCOVERY (DC, eGMS…); CONTEXT (CERIF); DETAIL (subject or topic specific); arrows labelled "Generate" and "Point to"; Linked open data; Formal Information Systems.]
Design
The presented structure provides the following improved facilities:
- CERIF provides much richer metadata than the standards commonly used with PSI datasets.
- The representation of contextual metadata (CERIF) allows rich semantics to be represented, thus making the PSI datasets understandable to the end user (or software) through the metadata.
- The Structured Query Language (SQL) has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
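
The primitive operations mentioned in the last point can be illustrated with a small in-memory SQLite database from the Python standard library. This is only a sketch: the `dataset` table, its columns and the rows are invented for the example and do not come from the proposed architecture.

```python
import sqlite3

# In-memory database with a hypothetical table of dataset descriptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset ("
    "identifier TEXT PRIMARY KEY, category TEXT, record_count INTEGER)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [
        ("ds-1", "transport", 100),  # placeholder rows
        ("ds-2", "transport", 300),
        ("ds-3", "legal", 50),
    ],
)

# Count, sum and average per category in a single structured query.
rows = conn.execute(
    "SELECT category, COUNT(*), SUM(record_count), AVG(record_count) "
    "FROM dataset GROUP BY category ORDER BY category"
).fetchall()

for row in rows:
    print(row)

conn.close()
```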
Benefits of architecture
- Because of the powerful expressive semantics over formal syntax of CERIF we can:
  - Generate discovery metadata from CERIF;
  - Interconvert common metadata formats used in PSI using CERIF as the superset exchange mechanism;
  - Provide a semantic web / LOD representation of the metadata for browsing or query using SPARQL;
  - While maintaining a conventional information systems capability with structured query including convenient primitive operations.
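
The idea of a superset exchange mechanism can be sketched as follows: each flat format is mapped once to a richer intermediate record, and any conversion between flat formats goes through that record. The intermediate keys below are hypothetical and are not the CERIF schema; the flat field names are simplified and only loosely modelled on Dublin Core and CKAN, so treat all of them as assumptions.

```python
# Mappings from hypothetical superset keys to simplified flat-format fields.
SUPERSET_TO_DC = {
    "identifier": "identifier",
    "title": "title",
    "creator": "creator",
    "keywords": "subject",
}
SUPERSET_TO_CKAN = {
    "identifier": "name",
    "title": "title",
    "creator": "author",
    "keywords": "tags",
}


def export_flat(record, mapping):
    """Generate a flat (discovery) record from the superset record."""
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


def import_flat(flat, mapping):
    """Lift a flat record into the superset representation."""
    inverse = {target: source for source, target in mapping.items()}
    return {inverse[key]: value for key, value in flat.items() if key in inverse}


def convert(flat, source_mapping, target_mapping):
    """Convert between two flat formats via the superset record."""
    return export_flat(import_flat(flat, source_mapping), target_mapping)


dc_record = {
    "identifier": "ds-1",  # placeholder values
    "title": "Example dataset",
    "creator": "Example agency",
    "subject": ["transport"],
}
print(convert(dc_record, SUPERSET_TO_DC, SUPERSET_TO_CKAN))
```

With this design, supporting a further flat format needs one additional mapping to the superset rather than one converter per pair of formats.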
Models for an infrastructure
- The data model with its metadata described is only one relevant model
- The other models are:
  - User model
  - Processing model
  - Resource model
The Vision: The Models
[Diagram labels: complete ICT environment for PSI; complete cohort of users; Processing Model; User Model; Data Model; Resource Model.]
Models – User model
- User Model: controls the way in which the end-user interacts with the e-infrastructure.
  - User profile, security certification, privacy;
  - Device and interaction mode preferences (keyboard/mouse through voice and gesture to brain-connected), language preference;
  - Resource preferences (including contacts) with directories;
- METADATA
Models – Processing model
- Processing Model controls the way processes are constructed and executed in the e-infrastructure
  - Services
    - Described for discovery, and for functional and non-functional (security, privacy, performance) properties
    - Mobile (deployed in distributed / parallel execution environments)
    - Open source where possible
  - Service composition
    - Dynamically (re-)composable during execution
- METADATA
Models – Data model
- Data Model controls data representation and data (re-)use
  - Formal syntax (structure)
    - Even for text, images, streamed video
  - Declared semantics (meaning)
- METADATA
Models – Resource model
- Resource Model catalogs the available computing resources in the e-infrastructure
  - This allows virtualisation, so the user neither knows nor cares where the data come from or where the processing is done, as long as quality of service is maintained;
  - Requires updating by resource owners, together with conditions of use
- METADATA
Conclusions (1)
- Metadata are needed to make sense of the open data
- Merely linking data is not enough to make optimal use of open data
- Metadata are key enablers for policy-making
- Adding metadata can yield considerable benefits, including:
  - creating order in datasets
  - improving findability, accessibility, storing and preservation of LOD
  - making it easier to analyze, compare and reproduce LOD and to find inconsistencies
  - correct interpretation and visualization of LOD
  - finding patterns in LOD to generate new hypotheses
  - making linking of data easier
  - assessing and ranking the quality of LOD and avoiding unnecessary duplication of LOD
Conclusions (2)
- Architecture for metadata:
  - discovery metadata can be generated from CERIF
  - common metadata formats can use CERIF as the superset exchange mechanism
  - a LOD representation of the metadata for browsing or query can be made, allowing the use of SPARQL
  - while a conventional information systems capability with structured query including convenient primitive operations can be maintained
- We recommend further implementation of the proposed metadata architecture