Consuming Linked Open Data


Work distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license

Boris Villazón-Terrazas Facultad de Informática, Universidad Politécnica de Madrid

Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net

bvillazon@fi.upm.es Phone: 34.91.3366605, Fax: 34.91.3524819

@boricles Slides available at: http://www.slideshare.net/boricles/

Acknowledgements: Alexander de Leon, Filip Wisniewki, Daniel Vila-Suero, Daniel Garijo, Victor Saquicela, Michael Hausenblas, Richard Cyganiak, Sarven Capadisli, Oscar Corcho, Asunción Gómez-Pérez, all OEG members involved in the Linked Data initiatives, and Local Government Management Services Board - Ireland.

Some references

•  Wood, David (Ed.). Linking Government Data. 2011.

•  Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez. Methodological Guidelines for Publishing Government Linked Data.

•  Bernadette Hyland, Boris Villazón-Terrazas, Michael Hausenblas. Best Practices for Publishing Linked Data. W3C Editor’s Draft, Government Linked Data Working Group. https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html

•  Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli. Cookbook for Open Government Linked Data. W3C Editor’s Draft, Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

ToC

•  Introduction

•  Publishing & Consuming Linked Open Data

•  Use cases

•  Conclusions and future work


Guidelines for Publishing Linked Data

•  The process of publishing Linked Data has an iterative incremental life cycle model.

•  The guidelines are based on our experience producing Linked Data in several governmental contexts, and they have been applied in real-world scenarios.


Specification → Modelling → Generation → Linking → Publication → Exploitation


Identification and analysis of the data sources

We have to distinguish two cases:

•  Open and publish data that government agencies have not yet opened up and published
   •  This task may require contacting specific government data owners to get access to their legacy data

•  Reuse and leverage data already opened up and published by government agencies
   •  This task involves looking for these data in public government catalogs
      •  Open Government Data
      •  datacatalogs.org
      •  Open Government Catalog

Specification

Identification and analysis of the data sources

After we have identified and selected the government data sources:

•  Search and compile all the available data and documentation about those resources

•  Identify the schema of those resources, including conceptual components and their relationships

•  Identify the items in the domain, i.e., things whose properties and relations are described in the data sources

Specification

URI Design

•  Use meaningful URIs, instead of opaque URIs, when possible

•  Separate TBox (ontology model) from ABox (instances) URIs

•  Base URI
   http://data.gov.bo/
   http://health.data.gov.bo/

•  TBox URIs
   http://data.gov.bo/ontology/{class|property}

•  ABox URIs
   http://data.gov.bo/resource/
   http://data.gov.bo/resource/province/Tiraque
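The URI scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `tbox_uri` and `abox_uri` are hypothetical, and replacing spaces with underscores is one possible convention, not something the slides prescribe.

```python
from urllib.parse import quote

BASE_URI = "http://data.gov.bo/"  # base URI taken from the slide


def tbox_uri(term: str) -> str:
    """Build a TBox (ontology) URI for a class or property name."""
    return f"{BASE_URI}ontology/{quote(term, safe='')}"


def abox_uri(resource_type: str, name: str) -> str:
    """Build an ABox (instance) URI from a resource type and a readable name."""
    # Assumption: spaces become underscores so the URI stays meaningful.
    slug = quote(name.strip().replace(" ", "_"), safe="")
    return f"{BASE_URI}resource/{quote(resource_type, safe='')}/{slug}"


if __name__ == "__main__":
    print(tbox_uri("Province"))
    print(abox_uri("province", "Tiraque"))
```

Keeping the two builders separate mirrors the TBox/ABox split: ontology terms and instances never share a path prefix.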


Specification

Definition of the license

•  Several possibilities

•  The UK Open Government License

•  Open Database License

•  Public Domain Dedication and License

•  Open Data Commons Attribution License

•  The Creative Commons Licenses

It is also possible to reuse and apply an existing license of the government data sources.


Specification


Reuse available vocabularies

•  Search for suitable vocabularies, e.g., in Linked Open Vocabularies

•  Are there suitable vocabularies?
   •  Yes: build the vocabulary by reusing available vocabularies
   •  No: look for non-ontological resources to reuse

Modelling

Reuse available non-ontological resources

•  Search for suitable non-ontological resources in highly reliable Web sites, domain-related sites and government catalogs

•  Are there suitable resources?
   •  Yes: build the vocabulary by transforming available resources
   •  No: build the vocabulary from scratch

Modelling


Transformation

•  Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity

•  Some tools
   •  CSV and spreadsheets: RDF extension of Google Refine, XLWrap, RDF123, NOR2O
   •  RDB: D2R Server, ODEMapster, W3C RDB2RDF WG (R2RML)
   •  XML: GRDDL, ReDeFer

Generation

•  A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.

•  W3C RDB2RDF Working Group
   •  R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
   •  Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
   •  R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/
   •  RDB2RDF Implementation Report - http://www.w3.org/2001/sw/rdb2rdf/implementation-report/

Transformation – RDB2RDF

[Figure: a transformation description is given to a transformation engine]
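To make the idea of a transformation engine more concrete, here is a minimal sketch that turns the rows of a relational table into N-Triples, loosely in the spirit of the W3C Direct Mapping. It is a simplification, not an implementation of the specification: every value becomes a plain literal, foreign keys are ignored, and the base IRI `http://example.org/db/` and the `province` table are hypothetical.

```python
import sqlite3
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(conn, table, pk):
    """Yield N-Triples lines for every row of `table`, keyed by column `pk`."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # One subject IRI per row, built from the table name and the key value.
        subject = f"<{BASE}{quote(table)}/{quote(pk)}={quote(str(record[pk]))}>"
        yield f"{subject} <{RDF_TYPE}> <{BASE}{quote(table)}> ."
        for column, value in record.items():
            if value is None:
                continue  # NULL values produce no triple
            literal = (str(value).replace("\\", "\\\\")
                       .replace('"', '\\"').replace("\n", "\\n"))
            yield f'{subject} <{BASE}{quote(table)}#{quote(column)}> "{literal}" .'


def demo():
    """Build a tiny in-memory table and return its triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE province (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO province VALUES (1, 'Tiraque')")
    return list(rows_to_triples(conn, "province", "id"))


if __name__ == "__main__":
    for line in demo():
        print(line)
```

In a real project a mapping language such as R2RML would replace the hard-coded conventions, so that the vocabulary created in the modelling activity is used instead of table and column names.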


Linking

•  Identify suitable data sets as linking targets
   •  http://thedatahub.org

•  Discover relationships between data items
   •  Silk Framework: http://www4.wiwiss.fu-berlin.de/bizer/silk/
   •  LIMES: http://aksw.org/Projects/limes

•  Validate the relationships discovered
   •  sameAs Validator: http://oegdev.dia.fi.upm.es:8080/sameAs/
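As a rough illustration of what link discovery does, the sketch below compares labels from two datasets and proposes owl:sameAs candidates above a similarity threshold. It does not use Silk or LIMES; it swaps in Python's standard `difflib` string matcher, and the function names, the threshold of 0.9 and the example.org URI are assumptions. Candidates produced this way would still need validation.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_links(source, target, threshold=0.9):
    """Return (source_uri, target_uri, score) for label pairs at or above the threshold."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score >= threshold:
                links.append((s_uri, t_uri, round(score, 2)))
    return links


def to_ntriples(links):
    """Serialize candidate links as owl:sameAs statements in N-Triples."""
    return [f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t, _ in links]


if __name__ == "__main__":
    source = {"http://example.org/resource/Madrid": "Madrid"}  # hypothetical URI
    target = {
        "http://dbpedia.org/resource/Madrid": "Madrid",
        "http://dbpedia.org/resource/Málaga": "Málaga",
    }
    for line in to_ntriples(candidate_links(source, target)):
        print(line)
```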


Publication: Dataset publication

•  Tools for storing RDF
   •  Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM

•  SPARQL endpoint and Linked Data frontend
   •  Pubby, Talis Platform, Fuseki

Publication: Metadata publication

•  VoID allows expressing metadata about RDF datasets: http://www.w3.org/TR/void/

•  PROV Ontology: http://www.w3.org/TR/prov-o/
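A VoID description is itself a small RDF document. The following sketch builds one as a Turtle string; the function name and all argument values in the usage example (dataset URI, endpoint, triple count) are placeholders chosen for illustration, not data about any real dataset.

```python
def void_description(dataset_uri, title, sparql_endpoint, license_uri, triples):
    """Return a small VoID description of a dataset as a Turtle string."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triples)} .",
    ])


if __name__ == "__main__":
    # All values below are placeholders.
    print(void_description(
        "http://example.org/dataset",
        "Example dataset",
        "http://example.org/sparql",
        "http://creativecommons.org/licenses/by-nc-sa/3.0/",
        1000,
    ))
```

The result can be saved as a file such as void.ttl and served next to the dataset.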

Publication: Dataset discovery

•  Register the dataset in the CKAN registry (thedatahub.org): http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

•  Generate sitemap files for your dataset using sitemap4rdf: http://lab.linkeddata.deri.ie/2010/sitemap4rdf/

•  Submit the sitemap location to Google and Sindice

Exploitation

•  Effective usage: develop applications that exploit these data

Streaming resources

ToC

•  Introduction

•  Publishing & Consuming Linked Open Data

•  Use cases
   •  GeoLinkedData - ES
   •  AEMET - ES
   •  El Viajero - ES
   •  datos.bne.es - ES
   •  Service Indicators - IE

•  Conclusions and future work

GeoLinkedData


•  An open initiative whose aim is to enrich the Web of Data with Spanish geospatial data

•  It started off by publishing diverse information sources, such as those of the National Geographic Institute of Spain (IGN) and the Statistical Institute of Spain (INE).

•  http://geo.linkeddata.es/

GeoLinkedData – Identification of the data sources

•  IGN, National Geographic Institute of Spain: Oracle & MySQL databases, available through an agreement with the IGN

•  INE, National Statistics Institute of Spain: data sources available in a public data catalog

Specification

GeoLinkedData – Analysis of the data sources

[Figure: statistical data with columns Industry Production Index, Province and Year]

Specification

GeoLinkedData - URI design

•  Base URI
   http://linkeddata.es/
   http://geo.linkeddata.es/

•  TBox URIs
   http://geo.linkeddata.es/ontology/{concept|property}
   http://geo.linkeddata.es/ontology/Provincia

•  ABox URIs
   http://geo.linkeddata.es/resource/{r. type}/{r. name}
   http://geo.linkeddata.es/resource/Provincia/Madrid

Specification

GeoLinkedData ontology network

•  Reused vocabularies and resources
   •  hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
   •  FAO Geopolitical ontology: names and international code systems for territories and groups
   •  WGS84 Geo Positioning (W3C vocabulary): an RDF vocabulary
   •  GML (GML Specification): ontology for OGC Geography Markup Language
   •  Statistics ontology: SCOVO (scv:Dataset, scv:Dimension, scv:Item)
   •  Time ontology: W3C Time, vocabulary for instants, intervals, durations, etc.
   •  Thesaurus: UNESCO, EGM / ERM, GeoNames…

•  Properties connecting them: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated

•  http://neon-toolkit.org/

•  Size: 33 classes, 44 object properties, 318 data properties

Modelling

GeoLinkedData - Transformation

•  INE data: NOR2O

•  IGN data: ODEMapster

•  IGN geospatial column: Geometry2RDF

Generation

Geospatial model

•  Model used by DBpedia: http://www.w3.org/2005/Incubator/geo/
   dbpedia:Madrid  geo:lat   40.41
   dbpedia:Madrid  geo:long  -3.70

•  GeoLinkedData Geometry Model, NeoGeo Vocab: http://geo.linkeddata.es/web/guest/modelos http://geovocab.org/doc/neogeo/
   geoes:Madrid             geo:geometry  geoes:wgs84/40.41_-3.70
   geoes:wgs84/40.41_-3.70  rdf:type      geo:Point
   geoes:wgs84/40.41_-3.70  geo:lat       40.41
   geoes:wgs84/40.41_-3.70  geo:long      -3.70

Geospatial model - more complex geometries

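(values as shown on the slide)

   geoes:Ebro              geo:geometry             geoes:resource/7979707
   geoes:resource/7979707  rdf:type                 geoes:ontology/LineString
   geoes:resource/7979707  geoes:ontology/formedBy  geoes:wsg84/0.45_45.4
   geoes:resource/7979707  geoes:ontology/formedBy  geoes:wsg84/0.67_45.3
   geoes:wsg84/0.45_45.4   geo:lat 40.41, geo:long -3.70 (geo:Point)
   geoes:wsg84/0.67_45.3   geo:lat 40.67, geo:long 45.3 (geo:Point)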

GeoLinkedData - Linking

•  http://geo.linkeddata.es/.../Madrid owl:sameAs http://sws.geonames.org/6355233/ (GeoNames)

•  http://geo.linkeddata.es/.../Madrid owl:sameAs http://dbpedia.org/resource/Madrid (DBpedia)

GeoLinkedData – Dataset publication

•  SPARQL endpoint: Virtuoso 6.1.0

•  Linked Data HTML frontend: Pubby 0.3, including provenance support
   •  http://www4.wiwiss.fu-berlin.de/pubby/

Publication

map4rdf: Overview

•  Faceted browsing tool for exploring and visualizing RDF datasets enhanced with geospatial information

•  map4rdf accesses the triplestore through SPARQL

•  http://oegdev.dia.fi.upm.es/projects/map4rdf/
•  http://github.com/pejot/linkeddata-visualization-tools/

Exploitation

Basic architecture

[Figure: components of the architecture]

•  Triplestore with SPARQL endpoint

•  Web server: dispatch servlet, command handlers A to E, Data Access Object (DAO)

•  Web client: faceted browsing interface, commands A to E, invoker, event bus

•  Patterns used: Model View Presenter, Command, Dependency Injection

http://blog.hivedevelopment.co.uk/2009/08/google-web-toolkit-gwt-mvp-example.html
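The Command pattern with injected dependencies, as named on this slide, can be sketched as follows. The actual tool is written with GWT, so this Python version only illustrates the idea; every class name here is hypothetical, and a plain dictionary stands in for the Data Access Object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GetResources:
    """Hypothetical command: ask for the resources of one type."""
    resource_type: str


class GetResourcesHandler:
    """Handler that would normally query the triplestore through a DAO."""

    def __init__(self, dao):
        self.dao = dao  # dependency injected by the caller

    def execute(self, command):
        return self.dao.get(command.resource_type, [])


class Dispatcher:
    """Routes each command to the handler registered for its class."""

    def __init__(self):
        self._handlers = {}

    def register(self, command_class, handler):
        self._handlers[command_class] = handler

    def dispatch(self, command):
        return self._handlers[type(command)].execute(command)


if __name__ == "__main__":
    fake_dao = {"Provincia": ["Madrid"]}  # stand-in for a SPARQL-backed DAO
    dispatcher = Dispatcher()
    dispatcher.register(GetResources, GetResourcesHandler(fake_dao))
    print(dispatcher.dispatch(GetResources("Provincia")))
```

Because the client only creates command objects, new operations can be added by registering another command and handler pair without touching the dispatcher.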

How to use it


map4rdf.war

configuration.properties

Google Maps

OpenStreetMap

Open Layers Map

Catalogue Service Web

•  http://www.idee.es/csw-inspire-idee/srv/en/main.home
•  http://www.idee.es/csw-inspire-idee/servicio

•  Any other WMS service can also be included

Complex geometries

Edit suggestions / Data curation

•  For example, let us assume that the resource representing the Murcia Airport does not have the correct information.

Provinces

Integrate data coming from the Statistical Institute

Provinces – Industry Production Index

Specific visualizations for datasets based on SCOVO and RDF Data Cube

•  SCOVO: http://vocab.deri.ie/scovo
•  RDF Data Cube: http://www.w3.org/TR/vocab-data-cube/


Meteorological Linked Data - AEMET

•  AEMET, Spanish Meteorological Office

•  Meteorological data registered by its weather stations, radars, lightning detectors and ozone soundings.

•  http://aemet.linkeddata.es/


Specification

•  Identification of data sources
   •  250 weather stations (pressure, humidity, etc.)
   •  Data from the stations in CSV files on an FTP server

•  URI design

11/02/11

Modelling

•  Reuse existing resources
   •  Well-known ontologies
   •  Reuse our own ontologies
   •  Ontology Design Patterns
   •  Standards and classifications
   •  Repositories: Swoogle, Watson, etc.

•  Ontologies reused
   •  OWL-Time ontology
   •  SSN ontology
   •  WGS84 Geo Positioning
   •  Geobuddies Ontology Network
   •  GeoLinkedData

Generation

•  Generate instances of the ontology from the previous steps
   •  Python script to convert CSV to RDF instances
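The slides do not show the conversion script itself, so the following is only a minimal sketch of how CSV rows might become RDF. The column names (`station`, `timestamp`, `property`, `value`), the namespace `http://example.org/aemet/` and the three predicates are all hypothetical; the actual dataset is modelled with the ontologies listed in the modelling step.

```python
import csv
import io

BASE = "http://example.org/aemet/"  # hypothetical namespace
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def csv_to_ntriples(csv_text):
    """Convert rows with station, timestamp, property and value columns to N-Triples."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One observation resource per (station, timestamp, property) combination.
        obs = f"<{BASE}observation/{row['station']}/{row['timestamp']}/{row['property']}>"
        lines.append(f"{obs} <{BASE}ontology/station> <{BASE}station/{row['station']}> .")
        lines.append(f"{obs} <{BASE}ontology/property> <{BASE}property/{row['property']}> .")
        lines.append(f'{obs} <{BASE}ontology/value> "{float(row["value"])}"^^<{XSD_DOUBLE}> .')
    return lines


if __name__ == "__main__":
    sample = "station,timestamp,property,value\n08001,20110211T1200,pressure,1013.2\n"
    for line in csv_to_ntriples(sample):
        print(line)
```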


Publication

•  Virtuoso Open Source Edition
   •  http://aemet.linkeddata.es/sparql

Exploitation

•  Visualization
   •  http://aemet.linkeddata.es/browser.html
   •  Based on map4rdf

•  Gflot
   •  Flot is a pure JavaScript plotting library, and Gflot is a GWT adaptation of Flot
   •  http://code.google.com/p/gflot/

Weather stations

Observations for each station

A particular observation

•  Specific visualization for datasets based on the SSN Ontology (ongoing work): http://www.w3.org/2005/Incubator/ssn/ssnx/ssn


El Viajero – tourism and travelling

•  Content is aggregated from different platforms, such as “Suplemento El País”, “Guías Aguilar”, “Canal Viajar” and “Prisa Digital”.

•  Heterogeneous content (images, travel guides, posts, videos, news) from different sources and from people with different profiles (journalists, bloggers and regular users)

Modelling

Ontology network

•  OPM (1): centered on the description of the evolution of the resource

•  OPM profile (2): OPM extension to our specific domain

•  SIOC (3): describes the social relationships in the platforms, plus posts and blogs

•  MPEG-7 (3): image and video description

•  GEO (3): location of the resources

[Figure: three layers, (1) OPM Core, (2) OPM extension to our domain, (3) SIOC, MPEG-7, GEO]

Overview of the architecture

[Figure: architecture overview]

•  REST API: receives requests and sends responses
   •  HTTP POST request: insert XML data (user/content provider)
   •  HTTP GET request: SPARQL query, answered with an RDF response (application)

•  Parsers: post parser, blog parser, XML parser, IPTC parser
   •  Insert processed data and store it in the repository

•  Annotation interface

•  Repository: OWL model, SPARQL requests

Linking

•  SILK has been used to:
   •  Link resources to DBpedia through geolocation
   •  Link resources to GeoLinkedData through geolocation

•  Linking resources to LUF (Linked User Feedback)
   •  Guide & travel recommendation

•  Linking travel guides to hotels and restaurants of “Guía Santillana”

Exploitation

El Viajero:
•  Extension of map4rdf to our domain
   •  New queries for browsing resources
   •  Image addition
   •  Filtering and time-line plugins: http://www.simile-widgets.org/timeline/

Additional exploitation:
•  Resource searcher using the dataset
•  LARKC demo (ISOCO): http://contextmanager.isoco.net/webn1/demolarkc/

Browser

•  Initial screen
•  Selecting a type of resource shows all of the available resources on the map

Guide Browsing

•  More images of the guides
•  Link to the news in “El Viajero”
•  Pubby frontend

Year filtering

•  Plugin selection
•  Year selection

Trip Browsing

•  Trip metadata
•  Itinerary followed in the trip

Timeline

•  Trip timeline (drawn from its provenance information)
•  Trip features (price, duration, type, etc.)

Quick search - Author

•  Reference to locations
•  Guides


datos.bne.es project

•  Joint project between the National Library of Spain (BNE) and the Ontology Engineering Group (OEG)

•  Started as a small proof-of-concept project: publishing "Cervantes" datasets as LD

•  Evolved into a bigger project: publishing a significant part of the BNE catalogue

•  Published in December 2011, with a public announcement at BNE

datos.bne.es: Methodological approach

•  Derived from several experiences at OEG: geolinkeddata.es, Met agency, etc. [1]

•  Design principle: have more control over the different activities and allow for an iterative, incremental process

•  Activities: data specification, modelling, RDF generation, link generation, publication, exploitation

•  MARiMbA: www.oeg-upm.net/index.php/es/technologies/228-marimba

[1] Villazón-Terrazas, B. et al., Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed. Linking Government Data. Springer.

Specification

•  Records in the MARC 21 format
   •  3.9 million bibliographical records
   •  4.2 million authority records
   •  Version: November 2011

Modelling

•  IFLA vocabulary-based ontology

Generation

•  MARiMbA generates RDF from the BNE records using RDFS/OWL ontologies

Linking

•  Links to VIAF, DNB, SUDOC, LIBRIS and DBpedia

•  Example: the BNE resource http://datos.bne.es/resource/XX1718747 is linked (Same As) to
   •  VIAF: http://viaf.org/viaf/17220427
   •  DNB: http://d-nb.info/gnd/11851993X
   •  SUDOC: http://www.idref.fr/026774771/id
   •  LIBRIS: http://libris.kb.se/resource/auth/45369
   •  DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes

Publication

•  Data publication

•  Metadata publication using VoID

•  To facilitate discovery
   •  Register your dataset in CKAN
   •  Use sitemap4rdf to generate the sitemap
   •  Upload the sitemap to Google and Sindice

Exploitation

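SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }

•  http://datos.bne.es/resource/XX1718747 is the URI of Cervantes
•  http://iflastandards.info/ns/fr/frbr/frbrer/P2010 is the "is author" property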

SPARQL Queries: http://datos.bne.es/sparql
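A query like the one on this slide can be sent to the endpoint as a SPARQL Protocol GET request. The sketch below only builds and decodes the request URL with the standard library and does not contact the server, since the availability of the endpoint is not guaranteed. The variable names `?work` and `?total` and the helper names are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://datos.bne.es/sparql"  # endpoint given on the slide

QUERY = """
SELECT (COUNT(DISTINCT ?work) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
      <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?work .
}
"""


def build_request_url(endpoint, query):
    """Encode a query as a SPARQL Protocol GET request URL."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


def query_from_url(url):
    """Recover the query text from a request URL (useful for checking the encoding)."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
```

The resulting URL could then be fetched with any HTTP client, typically with an Accept header asking for a SPARQL results format.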

Web Interface

http://bne.linkeddata.es

Author Graph

•  http://bne.linkeddata.es/graphvis/

•  Open Graph Viz platform: http://gephi.org/


County Rank

•  Data sources
   •  Service Indicators (Local Government Management Services Board)
   •  Wikipedia/DBpedia

•  Purpose
   •  Assess county/city performance
   •  Compare counties/cities

Specification – Spreadsheet about statistics

•  Service Indicators of Ireland
   •  Data for 2009

Specification - URI design

•  Base URI
   •  http://stats.data-gov.ie

•  TBox URI
   •  We use the RDF Data Cube Vocabulary

•  ABox URI
   •  http://stats.data-gov.ie/data/{resourceType}/{resource}

RDF Data Cube – Main elements

Modelling

RDF Data Cube - Concepts

   stats:concept/f      rdf:type      skos:Concept
   stats:concept/f-1    rdf:type      skos:Concept
   stats:concept/f-2    rdf:type      skos:Concept
   stats:concept/f-1-2  rdf:type      skos:Concept
   stats:concept/f-1    skos:broader  stats:concept/f
   stats:concept/f-2    skos:broader  stats:concept/f
   stats:concept/f-1-2  skos:broader  stats:concept/f-1

Prefixes
   rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
   skos: http://www.w3.org/2004/02/skos/core#
   stats: http://stats.ull.es/resource/

RDF Data Cube - Properties

   stats:property/f-1-2  rdf:type            qb:MeasureProperty
   stats:property/f-1-2  rdfs:subPropertyOf  qb:obsValue
   stats:property/f-1-2  qb:concept          stats:concept/f-1-2
   stats:property/f-1-2  rdfs:range          xsd:double
   stats:property/f-1-2  rdfs:label          “Average time …”

Prefixes
   rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
   rdfs: http://www.w3.org/2000/01/rdf-schema#
   qb: http://purl.org/linked-data/cube#
   stats: http://stats.ull.es/resource/

RDF Data Cube – Data structure definition

   stats:dsd/f-1-2  rdf:type      qb:DataStructureDefinition
   stats:dsd/f-1-2  qb:component  stats:componet/geoArea
   stats:dsd/f-1-2  qb:component  stats:componet/refPeriod
   stats:dsd/f-1-2  qb:component  stats:componet/f-1-2

Prefixes
   rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
   qb: http://purl.org/linked-data/cube#
   stats: http://stats.ull.es/resource/

RDF Data Cube – DataSet

   stats:data/f-1-2                      rdf:type      qb:DataSet
   stats:data/f-1-2                      qb:structure  stats:dsd/f-1-2
   stats:data/f-1-2/2009/county/donegal  rdf:type      qb:Observation
   stats:data/f-1-2/2009/county/donegal  qb:dataSet    stats:data/f-1-2
   stats:data/f-1-2/2009/county/cavan    rdf:type      qb:Observation
   stats:data/f-1-2/2009/county/cavan    qb:dataSet    stats:data/f-1-2
   ……

Prefixes
   rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
   qb: http://purl.org/linked-data/cube#
   stats: http://stats.ull.es/resource/

RDF Data Cube – Observation

   stats:data/f-1-2/2009/county/donegal  rdf:type                  qb:Observation
   stats:data/f-1-2/2009/county/donegal  qb:dataSet                stats:data/f-1-2
   stats:data/f-1-2/2009/county/donegal  sdmx-dimension:refPeriod  http://reference.data.gov.uk/id/year/2009
   stats:data/f-1-2/2009/county/donegal  property:geoArea          http://geo.data-gov.ie/county/donegal
   stats:data/f-1-2/2009/county/donegal  property:f-1-2            5.29

Prefixes
   rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
   qb: http://purl.org/linked-data/cube#
   stats: http://stats.ull.es/resource/
   property: http://stats.data-gov.ie/property/
   sdmx-dimension: http://purl.org/linked-data/sdmx/2009/dimension#
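The observation on this slide can be generated from one spreadsheet cell with a short function. The sketch below emits Turtle using the properties shown above. It assumes that observation URIs follow the ABox pattern `http://stats.data-gov.ie/data/...`, which is a guess (the prefix list on the slide gives a different `stats:` namespace), and the function name is hypothetical.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix property: <http://stats.data-gov.ie/property/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

DATA_NS = "http://stats.data-gov.ie/data/"  # assumption: follows the ABox URI pattern


def observation_turtle(indicator, year, county, value):
    """Return one qb:Observation in Turtle for an indicator, year and county."""
    dataset = f"<{DATA_NS}{indicator}>"
    observation = f"<{DATA_NS}{indicator}/{year}/county/{county}>"
    return (
        f"{PREFIXES}\n"
        f"{observation} a qb:Observation ;\n"
        f"    qb:dataSet {dataset} ;\n"
        f"    sdmx-dimension:refPeriod <http://reference.data.gov.uk/id/year/{year}> ;\n"
        f"    property:geoArea <http://geo.data-gov.ie/county/{county}> ;\n"
        f'    property:{indicator} "{value}"^^xsd:double .\n'
    )


if __name__ == "__main__":
    print(observation_turtle("f-1-2", 2009, "donegal", 5.29))
```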

Publication

•  http://data-gov.ie/sparql

Metadata publication – VoID

•  VoID description
   •  void.ttl

Exploitation

•  http://county-rank.data-gov.ie/

•  Google Chart Tools: https://developers.google.com/chart/

Information about the County


Conclusions & Future Work

•  Keep working on visualizations for specific vocabularies

•  Integrate different visualizations

•  Develop applications that
   •  promote transparency,
   •  allow the creation of new, innovative, added-value services.
