Consuming Linked Open Data
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
Boris Villazón-Terrazas Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net
[email protected] Phone: 34.91.3366605, Fax: 34.91.3524819
@boricles Slides available at: http://www.slideshare.net/boricles/
Acknowledgements: Alexander de Leon, Filip Wisniewki, Daniel Vila-Suero, Daniel Garijo, Victor Saquicela, Michael Hausenblas, Richard Cyganiak, Sarven Capadisli, Oscar Corcho, Asunción Gómez-Pérez, all OEG members involved in the Linked Data initiatives, and Local Government Management Services Board - Ireland.
Some references
Wood, David (Ed.). Linking Government Data. 2011.
Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez. Methodological Guidelines for Publishing Government Linked Data.
Bernadette Hyland, Boris Villazón-Terrazas, Michael Hausenblas. Best Practices for Publishing Linked Data. W3C Editor’s Draft – Government Linked Data Working Group. https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html
Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli. Cookbook for Open Government Linked Data. W3C Editor’s Draft – Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
2
ToC
• Introduction
• Publishing & Consuming Linked Open Data
• Use cases
• Conclusions and future work
Guidelines for Publishing Linked Data
• The process of publishing Linked Data follows an iterative, incremental life cycle model.
• The guidelines are based on our experience producing Linked Data in several governmental contexts and have been applied in real-case scenarios.
Specification
Modelling
Generation
Linking
Publication
Exploitation
Identification and analysis of the data sources
We have to distinguish two cases:
• Open and publish data that government agencies have not yet opened up and published
  • This task may require contacting specific government data owners to get access to their legacy data
• Reuse and leverage data already opened up and published by government agencies
  • This task consists of looking for these data in public government catalogs
    • Open Government Data
    • datacatalogs.org
    • Open Government Catalog
9
Specification
Identification and analysis of the data sources
After we have identified and selected the government data sources
• Search and compile all the available data and documentation about those resources
• Identify the schema of those resources including conceptual components and their relationships
• Identify the items in the domain, i.e., things whose properties and relations are described in the data sources
10
Specification
URI Design
• Use meaningful URIs, instead of opaque URIs, when possible
• Separate TBox (ontology model) from ABox (instances) URIs.
  • Base URI: http://data.gov.bo/, http://health.data.gov.bo/
  • TBox URIs: http://data.gov.bo/ontology/{class|property}
  • ABox URIs: http://data.gov.bo/resource/, e.g. http://data.gov.bo/resource/province/Tiraque
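The URI design rules above can be illustrated with a minimal Python sketch. It only shows how TBox and ABox URIs might be minted from one base URI; the helper names `tbox_uri` and `abox_uri` are hypothetical, and the slug rule (spaces to underscores) is an assumption rather than part of the guidelines.

```python
from urllib.parse import quote

BASE_URI = "http://data.gov.bo/"  # base URI taken from the slide


def tbox_uri(term: str) -> str:
    """Build the URI of an ontology term (class or property)."""
    return f"{BASE_URI}ontology/{quote(term)}"


def abox_uri(resource_type: str, name: str) -> str:
    """Build a meaningful (non-opaque) URI for an instance."""
    # Assumption: spaces become underscores; other characters are percent-encoded.
    slug = quote(name.strip().replace(" ", "_"))
    return f"{BASE_URI}resource/{quote(resource_type)}/{slug}"


print(tbox_uri("Province"))
print(abox_uri("province", "Tiraque"))
```

Keeping the two builders separate makes it hard to mix ontology terms and instances under the same path by accident.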
11
Specification
Definition of the license
• Several possibilities
• The UK Open Government License
• Open Database License
• Public Domain Dedication and License
• Open Data Commons Attribution License
• The Creative Commons Licenses
It is also possible to reuse and apply an existing license of the government data sources.
12
Specification
Reuse available vocabularies
14
Search for suitable vocabularies (Linked Open Vocabularies)
Are there suitable vocabularies?
• Yes: build the vocabulary by reusing available vocabularies
• No: …
Modelling
Reuse available non-ontological resources
15
Search for suitable non-ontological resources (highly reliable Web sites, domain-related sites, government catalogs)
Are there suitable resources?
• Yes: build the vocabulary by transforming available resources
• No: build the vocabulary from scratch
Modelling
Transformation
• Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity
• Some tools
  • CSV and spreadsheets: RDF extension of Google Refine, XLWrap, RDF123, NOR2O
  • RDB: D2R Server, ODEMapster, W3C RDB2RDF WG – R2RML
  • XML: GRDDL, ReDeFer
17
Generation
• A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.
• W3C RDB2RDF Working Group
  • R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
  • Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
  • R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/
  • RDB2RDF Implementation report – http://www.w3.org/2001/sw/rdb2rdf/implementation-report/
Transformation – RDB2RDF
transformation description
transformation engine
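To make the RDB-to-RDF idea concrete, here is a small Python sketch that turns the rows of a relational table into N-Triples. It loosely follows the IRI pattern of the W3C Direct Mapping (row IRI from table and key, one predicate per column) but is not a compliant implementation and is not R2RML. The base IRI, the `Province` table, and the `direct_map` helper are hypothetical; literals are emitted untyped and without escaping for brevity.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(conn, table, pk):
    """Yield N-Triples lines for every row of `table`, keyed by column `pk`."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{pk}={record[pk]}>"
        # One triple states the row's class (the table) ...
        yield f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."
        # ... and one triple is produced per non-NULL column value.
        for col, value in record.items():
            if value is None:
                continue
            yield f'{subject} <{BASE}{table}#{col}> "{value}" .'


# Illustrative in-memory database (assumption, not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Province (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Province VALUES (1, 'Tiraque')")
triples = list(direct_map(conn, "Province", "id"))
print("\n".join(triples))
```

A mapping language such as R2RML replaces the hard-coded pattern with a declarative transformation description, which is what lets the output follow the vocabulary chosen in the modelling activity.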
18
20
• Identify suitable data sets as linking targets: http://thedatahub.org
• Discover relationships between data items: Silk Framework (http://www4.wiwiss.fu-berlin.de/bizer/silk/), LIMES (http://aksw.org/Projects/limes)
• Validate the relationships discovered: sameAs Validator (http://oegdev.dia.fi.upm.es:8080/sameAs/)
Linking
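The discovery step can be sketched in Python as a naive label comparison that proposes owl:sameAs candidates. This is not how Silk or LIMES work internally; it is a simplified stand-in using `difflib` from the standard library. The `discover_links` helper, the 0.9 threshold, and the Murcia entry are illustrative assumptions; the two Madrid URIs appear elsewhere in these slides.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Propose owl:sameAs links between two {URI: label} dictionaries."""
    links = []
    for s_uri, s_label in source.items():
        # Pick the target resource whose label is most similar.
        best_uri, best_label = max(
            target.items(), key=lambda item: similarity(s_label, item[1])
        )
        if similarity(s_label, best_label) >= threshold:
            links.append(f"<{s_uri}> <{OWL_SAME_AS}> <{best_uri}> .")
    return links


local = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid"}
remote = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://dbpedia.org/resource/Murcia": "Murcia",  # illustrative entry
}
links = discover_links(local, remote)
print("\n".join(links))
```

Links produced this way are only candidates, which is why the validation step in the list above matters.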
Dataset publication
• Tools for storing RDF: Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM
• SPARQL endpoint and Linked Data frontend: Pubby, Talis Platform, Fuseki
Metadata publication
• VoID allows expressing metadata about RDF datasets: http://www.w3.org/TR/void/
• PROV Ontology: http://www.w3.org/TR/prov-o/
Dataset discovery
• Register the dataset into the CKAN Registry – thedatahub.org: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
• Generate sitemap files for your dataset by using sitemap4rdf: http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
• Submit the sitemap location to Google and Sindice
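As an illustration of the VoID metadata mentioned for the publication activity, the following Python sketch assembles a small VoID description in Turtle. The `void_description` helper is hypothetical, and the dataset URI, title, and triple count are placeholders; only the SPARQL endpoint URL appears in these slides (AEMET use case). Titles containing quotes would need escaping, which is omitted here.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples):
    """Return a Turtle string describing a dataset with VoID."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triples)} .",
    ])


ttl = void_description(
    "http://aemet.linkeddata.es/dataset",   # placeholder dataset URI
    "AEMET meteorological Linked Data",     # placeholder title
    "http://aemet.linkeddata.es/sparql",
    1000,                                   # placeholder triple count
)
print(ttl)
```

The resulting text could be saved as a void.ttl file and published next to the dataset.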
Effective usage, develop applications that exploit these data
26
Streaming resources
ToC
• Introduction
• Publishing & Consuming Linked Open Data
• Use cases
  • GeoLinkedData - ES
  • AEMET - ES
  • El Viajero - ES
  • datos.bne.es - ES
  • Service Indicators - IE
• Conclusions and future work
27
GeoLinkedData
28
• An open initiative whose aim is to enrich the Web of Data with Spanish geospatial data
• It started off by publishing diverse information sources, such as the National Geographic Institute of Spain (IGN) and the Statistical Institute of Spain (INE).
• http://geo.linkeddata.es/
GeoLinkedData – Identification of the data sources
• IGN (National Geographic Institute of Spain): Oracle & MySQL databases, available through an agreement with the IGN
• INE (National Statistic Institute of Spain): data sources available in a public data catalog
Specification
GeoLinkedData – Analysis of the data sources
30
Specification
Example of analysed data: Industry Production Index, by Province and Year
GeoLinkedData - URI design
• Base URI: http://linkeddata.es/, http://geo.linkeddata.es/
• TBox URIs: http://geo.linkeddata.es/ontology/{concept|property}, e.g. http://geo.linkeddata.es/ontology/Provincia
• ABox URIs: http://geo.linkeddata.es/resource/{r. type}/{r. name}, e.g. http://geo.linkeddata.es/resource/Provincia/Madrid
31
Specification
GeoLinkedData ontology network (diagram). Reused vocabularies:
• hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
• FAO Geopolitical ontology: names and international code systems for territories and groups
• WGS84 Geo Positioning (W3C vocabulary): an RDF vocabulary
• GML (GML specification): ontology for OGC Geography Markup Language
• SCOVO (statistics): scv:Dimension, scv:Item, scv:Dataset
• W3C Time: vocabulary for instants, intervals, durations, etc.
• Other resources: UNESCO Thesaurus, EGM / ERM, GeoNames…
Relations shown: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated
http://neon-toolkit.org/
Classes: 33, Object Properties: 44, Data Properties: 318
32
Modelling
33
Generation GeoLinkedData - Transformation
• INE data: NOR2O
• IGN data: ODEMapster
• IGN geospatial column: Geometry2RDF
34
• Model used by DBpedia (http://www.w3.org/2005/Incubator/geo/)
dbpedia:Madrid geo:lat 40.41
dbpedia:Madrid geo:long -3.70
• GeoLinkedData Geometry Model, NeoGeo Vocab (http://geo.linkeddata.es/web/guest/modelos, http://geovocab.org/doc/neogeo/)
geoes:Madrid geo:geometry geoes:wgs84/40.41_-3.70
geoes:wgs84/40.41_-3.70 rdf:type geo:Point
geoes:wgs84/40.41_-3.70 geo:lat 40.41
geoes:wgs84/40.41_-3.70 geo:long -3.70
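The GeoLinkedData point model, where the geometry is a resource of its own rather than a pair of literals on the feature, can be sketched in Python as follows. The `point_triples` helper is hypothetical, and the triples are kept as prefixed names exactly as written on the slide, since the prefix expansions are not given there.

```python
def point_triples(resource, lat, lon):
    """Return (subject, predicate, object) tuples for a point geometry.

    `lat` and `lon` are passed as strings so they are reproduced verbatim
    in the geometry identifier.
    """
    point = f"geoes:wgs84/{lat}_{lon}"
    return [
        (resource, "geo:geometry", point),  # feature -> geometry resource
        (point, "rdf:type", "geo:Point"),
        (point, "geo:lat", lat),
        (point, "geo:long", lon),
    ]


madrid = point_triples("geoes:Madrid", "40.41", "-3.70")
for triple in madrid:
    print(" ".join(triple))
```

Giving the geometry its own identifier is what allows the same pattern to extend to line strings made of several points.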
Geospatial model - more complex geometries
35
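geoes:Ebro geo:geometry geoes:resource/7979707
geoes:resource/7979707 rdf:type geoes:ontology/LineString
geoes:resource/7979707 geoes:ontology/formedBy geoes:wsg84/0.45_45.4
geoes:resource/7979707 geoes:ontology/formedBy geoes:wsg84/0.67_45.3
geoes:resource/7979707 geoes:ontology/formedBy …
Each point is of rdf:type geo:Point and carries geo:lat and geo:long values (shown in the diagram as 40.41, -3.70 and 40.67, 45.3).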
36
GeoLinkedData - Linking
http://geo.linkeddata.es/.../Madrid owl:sameAs http://sws.geonames.org/6355233/ (GeoNames)
http://geo.linkeddata.es/.../Madrid owl:sameAs http://dbpedia.org/resource/Madrid (DBpedia)
37
GeoLinkedData – Dataset publication
• SPARQL endpoint: Virtuoso 6.1.0
• Linked Data HTML frontend: Pubby 0.3, including provenance support (http://www4.wiwiss.fu-berlin.de/pubby/)
Overview
• Faceted browsing tool for exploring and visualizing RDF datasets enhanced with geospatial information.
38
map4rdf SPARQL
Triplestore
http://oegdev.dia.fi.upm.es/projects/map4rdf/ http://github.com/pejot/linkeddata-visualization-tools/
Exploitation
Basic architecture
39
Components shown in the diagram:
• Triplestore
• Web Server
• SPARQL Endpoint
• Web Client (Faceted Browsing Interface)
• Commands A–E
• Invoker
• Event Bus
• Dispatch Servlet
• Command Handlers A–E
• Data Access Object (DAO)
Patterns used: Model View Presenter Pattern, Command Pattern, Dependency Injection Pattern
http://blog.hivedevelopment.co.uk/2009/08/google-web-toolkit-gwt-mvp-example.html
How to use it
40
map4rdf.war
configuration.properties
Google Maps
41
OpenStreetMap
42
Open Layers Map
43
Catalogue Service Web
44
• http://www.idee.es/csw-inspire-idee/srv/en/main.home • http://www.idee.es/csw-inspire-idee/servicio
• Also we can include any other WMS service
Complex geometries
45
Suggestions of Editions / Data curation
• For example, let us assume that the resource that represents the Murcia Airport does not have the correct information.
46
Provinces
47
Integrate data coming from the Statistical Institute
48
Provinces – Industry Production Index
49
Specific visualizations for datasets based on SCOVO and RDF Data Cube
http://vocab.deri.ie/scovo http://www.w3.org/TR/vocab-data-cube/
Meteorological Linked Data - AEMET
• AEMET, Spanish Meteorological Office
• Meteorological data registered by its weather stations, radars, lightning detectors and ozone soundings.
• http://aemet.linkeddata.es/
51
Specification
• Identification of data sources
  • 250 weather stations (pressure, humidity, etc.)
  • Data from the stations in CSV files on an FTP server
• URI design
11/02/11
52
• Reuse existing resources
  • Well-known ontologies
  • Reuse our own ontologies
  • Ontology Design Patterns
  • Standards and classifications
  • Repositories: Swoogle, Watson, etc.
Modelling
OWL-Time ontology
SSN ontology
• Geobuddies Ontology Network • GeoLinkedData
WGS84 Geo Positioning
53
Generation
• Generate instances of the ontology from the previous steps
  • Python script to convert CSV to RDF instances
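The slides do not show the conversion script, so the following is only a minimal sketch of what a CSV-to-RDF step of this kind might look like. The column names, the sample row, the station identifier, and the `resource/` and `ontology/` URI patterns are all assumptions made for illustration.

```python
import csv
import io

BASE = "http://aemet.linkeddata.es/resource/"  # assumed URI pattern
ONT = "http://aemet.linkeddata.es/ontology/"   # assumed URI pattern
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Hypothetical sample of a station file.
SAMPLE = "station,timestamp,pressure\n3195,2011-02-11T10:00:00,1013.2\n"


def csv_to_ntriples(text):
    """Convert CSV observations into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        station = row["station"]
        timestamp = row["timestamp"]
        pressure = row["pressure"]
        obs = f"<{BASE}observation/{station}_{timestamp}>"
        # Link the observation to its station, then attach the measured value.
        lines.append(f"{obs} <{ONT}observedBy> <{BASE}station/{station}> .")
        lines.append(f'{obs} <{ONT}pressure> "{pressure}"^^<{XSD_DOUBLE}> .')
    return lines


ntriples = csv_to_ntriples(SAMPLE)
print("\n".join(ntriples))
```

In practice the property and class names would come from the ontologies selected in the modelling activity rather than from an ad hoc namespace.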
54
Publication
• Virtuoso Open Source Edition • http://aemet.linkeddata.es/sparql
55
Exploitation
• Visualization • http://aemet.linkeddata.es/browser.html
• Based on map4RDF
• Gflot
  • Flot is a pure JavaScript plotting library, and Gflot is a GWT adaptation of Flot.
http://code.google.com/p/gflot/
56
Weather stations
57
Observations for each station
58
A particular observation
Specific visualization for datasets based on SSN Ontology – ongoing work
http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
59
El Viajero – tourism and travelling
• Content is aggregated from different platforms, such as “Suplemento El País”, “Guías Aguilar”, “Canal Viajar” or “Prisa Digital”.
• Heterogeneous content (images, travel guides, posts, videos, news) with different sources and from people with different profiles (journalists, bloggers and normal users)
61
Modelling
Ontology network
• OPM (1): centered on the description of the evolution of the resource.
• OPM profile (2): OPM extension to our specific domain.
• SIOC (3): describes the social relationships in the platforms, plus posts and blogs.
• MPEG-7 (3): image and video description.
• GEO (3): localization of the resources.
Layers in the diagram: (1) OPM Core, (2) OPM extension to our domain, (3) SIOC, MPEG-7, GEO
62
Overview of the architecture
• Parsers: Post Parser, Blog Parser, XML Parser, IPTC Parser
• Annotation interface: the user/content provider sends an HTTP POST request; the XML data is inserted, processed, and stored in the repository
• REST API: the application sends an HTTP GET request (SPARQL query) and receives an RDF response
• Repository: OWL model, accessed through SPARQL requests
63
Linking
• SILK has been used to:
  • Link resources to DBpedia through geolocation
  • Link resources to GeoLinkedData through geolocation
• Linking resources to LUF (Linked User Feedback)
  • Guide & travel recommendation
• Linking travel guides to hotels and restaurants of “Guía Santillana”
64
Exploitation
El Viajero:
• Extension of map4rdf to our domain
  • New queries for browsing resources
  • Image addition
  • Filtering and time-line plugins (http://www.simile-widgets.org/timeline/)
Additional exploitation:
• Resource searcher using the dataset
• LARKC demo (ISOCO): http://contextmanager.isoco.net/webn1/demolarkc/
65
Browser
66
Initial screen
Selecting a type of resource, we will see all of the available resources on the map
Guide Browsing
67
More images of the guides
Link to the news in “El Viajero”
Pubby frontend
Guide Browsing
68
More images of the guides
Year filtering
69
Plugin selection Year selection
Trip Browsing
70
Trip metadata Itinerary followed in the trip
Timeline
71
Trip timeline (drawn from its provenance information)
Trip features (price, duration, type, etc)
Quick search - Author
72
Reference to locations
Guides
datos.bne.es project
• Joint project between the National Library of Spain (BNE) and Ontology Engineering Group
• Started as a small proof-of-concept project: publishing "Cervantes" datasets as LD
• Evolved into a bigger project: publishing a significant part of the BNE catalogue
• Published in December 2011, with a public announcement at BNE
74
datos.bne.es: Methodological approach
• Derived from several experiences at OEG: geolinkeddata.es, Met agency, etc. [1]
• Design principle: have more control over the different activities and allow for an iterative, incremental process
75
Data specification
Modelling
RDF generation
Link generation
Publication
Exploitation
[1] Villazón-Terrazas, B. et al., Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed. Linking Government Data. Springer.
www.oeg-upm.net/index.php/es/technologies/228-marimba
Specification
• Records in the MARC 21 format
  • 3.9 million bibliographical records
  • 4.2 million authority records
  • Version: November, 2011
76
IFLA Vocabulary-based ontology
77
Modelling
MARiMbA generates RDF using RDFS/OWL ontologies
BNE
78
Generation
Links from BNE to VIAF, DNB, SUDOC, LIBRIS, and DBpedia. Example:
http://datos.bne.es/resource/XX1718747 (BNE) is linked through "same as" to:
• LIBRIS: http://libris.kb.se/resource/auth/45369
• SUDOC: http://www.idref.fr/026774771/id
• DNB: http://d-nb.info/gnd/11851993X
• DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
• VIAF: http://viaf.org/viaf/17220427
79
Linking
Publication
Data publication
Metadata publication using VoID
To facilitate the discovery:
• Register in CKAN your dataset
• Use sitemap4rdf to generate the site map
• Upload the site map to Google and Sindice
80
Exploitation
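SELECT DISTINCT (COUNT(?Obras) AS ?total) WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }
<http://datos.bne.es/resource/XX1718747>: URI of Cervantes
<http://iflastandards.info/ns/fr/frbr/frbrer/P2010>: "is author" property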
SPARQL Queries: http://datos.bne.es/sparql
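A query like the one above can be sent to the endpoint over the standard SPARQL protocol (an HTTP GET with a `query` parameter). The Python sketch below only builds the request, using the standard library; the `build_request` helper and the `?total` variable name are illustrative, the query is written in SPARQL 1.1 form, and whether the endpoint honours the requested JSON result format is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://datos.bne.es/sparql"

# Count the works linked to the Cervantes resource.
QUERY = """SELECT DISTINCT (COUNT(?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
      <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras
}"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To execute it, pass `request` to urllib.request.urlopen and parse the JSON body.
```

Building the request separately from sending it keeps the example runnable offline and makes the encoded query easy to inspect.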
Web Interface
http://bne.linkeddata.es
81
Author Graph
• http://bne.linkeddata.es/graphvis/
82
Open Graph Viz platform http://gephi.org/
County Rank
• Data sources
  • Service Indicators (Local Government Management Services Board)
  • Wikipedia/DBpedia
• Purpose
  • Assess county/city performance
  • Compare counties/cities
84
Specification – Spreadsheet about statistics
• Service Indicators of Ireland
85
Specification – Spreadsheet about statistics
• Service Indicators of Ireland • Data for 2009
86
Specification - URI design
• Base URI
• http://stats.data-gov.ie
• TBOX URI
• We use the RDF Data Cube Vocabulary
• ABOX URI
• http://stats.data-gov.ie/data/{resourceType}/{resource}
87
RDF Data Cube – Main elements
88
Modelling
RDF Data Cube - Concepts
89
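stats:concept/f rdf:type skos:Concept
stats:concept/f-1 rdf:type skos:Concept
stats:concept/f-2 rdf:type skos:Concept
stats:concept/f-1-2 rdf:type skos:Concept
stats:concept/f-1 skos:broader stats:concept/f
stats:concept/f-2 skos:broader stats:concept/f
stats:concept/f-1-2 skos:broader stats:concept/f-1
Prefixes:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
skos: http://www.w3.org/2004/02/skos/core#
stats: http://stats.ull.es/resource/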
RDF Data Cube - Properties
90
stats:property/f-1-2 rdf:type qb:MeasureProperty
stats:property/f-1-2 rdfs:subPropertyOf qb:obsValue
stats:property/f-1-2 qb:concept stats:concept/f-1-2
stats:property/f-1-2 rdfs:range xsd:double
stats:property/f-1-2 rdfs:label “Average time …”
Prefixes:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
qb: http://purl.org/linked-data/cube#
stats: http://stats.ull.es/resource/
RDF Data Cube – Data structure definition
91
stats:dsd/f-1-2 rdf:type qb:DataStructureDefinition
stats:dsd/f-1-2 qb:component stats:componet/geoArea
stats:dsd/f-1-2 qb:component stats:componet/refPeriod
stats:dsd/f-1-2 qb:component stats:componet/f-1-2
Prefixes:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
qb: http://purl.org/linked-data/cube#
stats: http://stats.ull.es/resource/
RDF Data Cube – DataSet
92
stats:data/f-1-2 rdf:type qb:DataSet
stats:data/f-1-2 qb:structure stats:dsd/f-1-2
stats:data/f-1-2/2009/county/donegal rdf:type qb:Observation
stats:data/f-1-2/2009/county/donegal qb:dataSet stats:data/f-1-2
stats:data/f-1-2/2009/county/cavan rdf:type qb:Observation
stats:data/f-1-2/2009/county/cavan qb:dataSet stats:data/f-1-2
……
Prefixes:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
qb: http://purl.org/linked-data/cube#
stats: http://stats.ull.es/resource/
RDF Data Cube – Observation
93
stats:data/f-1-2/2009/county/donegal rdf:type qb:Observation
stats:data/f-1-2/2009/county/donegal qb:dataSet stats:data/f-1-2
stats:data/f-1-2/2009/county/donegal sdmx-dimension:refPeriod http://reference.data.gov.uk/id/year/2009
stats:data/f-1-2/2009/county/donegal property:geoArea http://geo.data-gov.ie/county/donegal
stats:data/f-1-2/2009/county/donegal property:f-1-2 5.29
Prefixes:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
qb: http://purl.org/linked-data/cube#
stats: http://stats.ull.es/resource/
property: http://stats.data-gov.ie/property/
sdmx-dimension: http://purl.org/linked-data/sdmx/2009/dimension#
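The observation structure above can be generated programmatically. The following Python sketch writes one RDF Data Cube observation as Turtle; the `observation_turtle` helper is hypothetical, and the observation and dataset URIs are assumptions that combine the ABox pattern http://stats.data-gov.ie/data/{resourceType}/{resource} with the local names in the diagram. The value is typed as xsd:double to match the range declared for the measure property.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "property": "http://stats.data-gov.ie/property/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def observation_turtle(obs_uri, dataset_uri, area_uri, year_uri, measure, value):
    """Return a Turtle document with a single qb:Observation."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"<{obs_uri}> a qb:Observation ;",
        f"    qb:dataSet <{dataset_uri}> ;",
        f"    property:geoArea <{area_uri}> ;",
        f"    sdmx-dimension:refPeriod <{year_uri}> ;",
        f'    property:{measure} "{value}"^^xsd:double .',
    ]
    return "\n".join(lines)


observation = observation_turtle(
    "http://stats.data-gov.ie/data/f-1-2/2009/county/donegal",  # assumed URI
    "http://stats.data-gov.ie/data/f-1-2",                      # assumed URI
    "http://geo.data-gov.ie/county/donegal",
    "http://reference.data.gov.uk/id/year/2009",
    "f-1-2",
    "5.29",
)
print(observation)
```

One observation per county and year, all pointing to the same qb:DataSet, is what makes the county comparison in the exploitation step a simple query.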
Publication
• http://data-gov.ie/sparql
94
Metadata publication – VoID
• VoID description • void.ttl
95
Exploitation http://county-rank.data-gov.ie/
Google charts tools https://developers.google.com/chart/ 96
Information about the County
97
98
99
100
Conclusions & Future Work
• Keep working on visualizations for specific vocabularies
• Integrate different visualizations
• Develop applications that
  • promote transparency,
  • allow the creation of new, innovative, added-value services.
102