SPARQL Query Rewriting for Implementing Data Integration over Linked Data
Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, Nigel Shadbolt
Linked Data access
• Retrieving RDF content via HTTP requests
– Instance based vs. schema based access
• Accessing SPARQL endpoints
– Schema based vs. instance based access
[Figure label: SPARQL+HTTP]
Linked Data – Schema based integration
[Figure: source and target sides; labels: Data set, Ontology, (SPARQL) Query, Co-reference]
OA = <SO,TO,TD,EA>
SO: Source Ontologies
TO: Target Ontologies
TD: Target Dataset
EA: Entity Alignments
• Datasets can use more than one ontology for describing the data
• More than one dataset can use the same set of ontologies coherently (e.g. RKB)
• More than one ontology is used for defining a SPARQL query
• Ontologies contain many entities to be aligned
Query Rewriting Architecture
[Figure: the SPARQL query rewriter takes a <source> SPARQL query and produces <target> SPARQL queries (e.g. <KISTI>, <dbpedia>); other labels: voiD, Alignments]
Ontology Alignment
• DL primitives are used to describe concept alignments (i.e. Equivalent, Subsume)
– Implementation of the underlying ontological mediation is usually not provided, or relies on reasoners
• Ontological mediation is usually applied to data, not queries
– Rule systems that exploit alignments to translate data
– [Euzenat] SPARQL for integrating data:
CONSTRUCT { ?x rdf:type vc:VCard } WHERE { ?x rdf:type foaf:Person }
How to write such queries?
Anatomy of a SPARQL query
• Query type: SELECT, DESCRIBE, CONSTRUCT, ASK
• Basic Graph Pattern (or BGP): graph pattern that resulting triples must satisfy
• Filter section: additional constraints over variables present in the BGP
PREFIX id: <http://southampton.rkbexplorer.com/id/>
PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT DISTINCT ?a WHERE {
  ?paper akt:has-author id:person-02686 .
  ?paper akt:has-author ?a .
}
SPARQL BGP
PREFIX id: <http://southampton.rkbexplorer.com/id/>
PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT DISTINCT ?a WHERE {
  ?paper akt:has-author id:person-02686 , ?a .
}
[Figure: BGP graph in which node ?paper is linked by akt:has-author edges to id:person-02686 and to ?a]
• “DISTINCT ?a” is not represented in this graph
• Constraints over nodes can be represented either in the graph or within the FILTER section
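To make the graph view of a BGP concrete, here is a minimal Python sketch. It assumes triples are plain tuples of strings and that variables start with "?"; the helper name `bgp_to_graph` is hypothetical and is not part of the presented Java implementation.

```python
from collections import defaultdict

# BGP of the example query; terms starting with "?" are variables.
bgp = [
    ("?paper", "akt:has-author", "id:person-02686"),
    ("?paper", "akt:has-author", "?a"),
]

def bgp_to_graph(triples):
    """Return an adjacency map: subject node -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = bgp_to_graph(bgp)
```

The solution modifier DISTINCT has no place in this structure, which mirrors the remark that it is not represented in the graph.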
Entity Alignment as Graph Rewriting
• Query rewriting based on BGP graph rewriting
• Entity Alignment EA = <LHS, RHS, FD>
– LHS : Triple to match (open variables to bind)
– RHS : Set of triples to instantiate (depending on previous bindings on open variables)
– FD : Functional dependencies (between variables)
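As an illustration of the triple EA = <LHS, RHS, FD>, the following Python sketch shows one possible in-memory representation. The class name `EntityAlignment`, the tuple encoding of triples, and the encoding of a functional dependency as "target variable -> (function, argument variables)" are all assumptions made for this example; the slides do not specify the data structures of the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

@dataclass
class EntityAlignment:
    # LHS: the triple to match; terms written "_:n" are open variables.
    lhs: Triple
    # RHS: the triples to instantiate once the open variables are bound.
    rhs: List[Triple]
    # FD: target variable -> (function, names of the variables it depends on).
    fd: Dict[str, Tuple[Callable, Tuple[str, ...]]] = field(default_factory=dict)

# A class equivalence with no functional dependencies (illustrative names).
class_eq = EntityAlignment(
    lhs=("_:1", "rdf:type", "source:A"),
    rhs=[("_:1", "rdf:type", "target:B")],
)
```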
Entity Alignment as Graph Rewriting
• Using the graph rewriting formalism, we can rewrite queries defined for one dataset (or ontology) to integrate results from other data sets
– Beyond that, we can also generate CONSTRUCT queries to integrate entire data sets
SPARQL Rewriting
• Each triple from the BGP is matched against the LHSs (generating variable bindings in the process)
• Any functional dependencies are resolved (enriching the bindings with new associations)
• The corresponding RHS is instantiated with the resulting bindings and replaces the original triple
• Unbound variables generate new variables
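The rewriting steps listed above can be sketched in Python as follows. This is a simplified illustration, not the presented Jena-based implementation: triples are tuples of strings, an alignment is a tuple `(lhs, rhs, fd)`, the first matching alignment wins, functional dependencies are assumed to receive arguments that are already bound, and all function names are hypothetical. The `?newN` naming of generated variables follows the figures in the slides.

```python
import itertools

def is_open(term):
    """Blank nodes (_:n) in an alignment act as open variables."""
    return isinstance(term, str) and term.startswith("_:")

def match(lhs, triple):
    """Match a query triple against an LHS; return the bindings or None."""
    bindings = {}
    for pattern, term in zip(lhs, triple):
        if is_open(pattern):
            if bindings.setdefault(pattern, term) != term:
                return None  # same open variable bound to two different terms
        elif pattern != term:
            return None      # constant mismatch
    return bindings

def instantiate(rhs, bindings, fresh):
    """Instantiate the RHS triples; unbound open variables get new variables."""
    result = []
    for triple in rhs:
        new_triple = []
        for term in triple:
            if is_open(term) and term not in bindings:
                bindings[term] = next(fresh)
            new_triple.append(bindings.get(term, term))
        result.append(tuple(new_triple))
    return result

def rewrite_bgp(bgp, alignments):
    """Rewrite every triple of the BGP using the first alignment that matches."""
    fresh = (f"?new{i}" for i in itertools.count(1))
    rewritten = []
    for triple in bgp:
        for lhs, rhs, fd in alignments:
            bindings = match(lhs, triple)
            if bindings is None:
                continue
            # Resolve functional dependencies, enriching the bindings.
            for var, (function, args) in fd.items():
                bindings[var] = function(*(bindings[a] for a in args))
            rewritten.extend(instantiate(rhs, bindings, fresh))
            break
        else:
            rewritten.append(triple)  # no alignment matched: keep the triple
    return rewritten

# Alignment of akt:has-author to the two-step KISTI path, as in the slides.
has_author = (
    ("_:1", "akt:has-author", "_:2"),
    [("_:1", "kisti:CreatorInfo", "_:3"), ("_:3", "kisti:hasCreator", "_:2")],
    {},
)
query_bgp = [
    ("?paper", "akt:has-author", "id:person-02686"),
    ("?paper", "akt:has-author", "?a"),
]
result = rewrite_bgp(query_bgp, [has_author])
```

Each akt:has-author triple becomes a two-triple path through a newly generated variable, because _:3 has no binding after matching.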
SPARQL Rewriting
• Example:
– LHS1 = <_:1,rdf:type, source:A>
– RHS1 = {<_:1,rdf:type,target:B>}
– FD1 = {}
• <?p,rdf:type,source:A> = LHS1[_:1/?p]
• RHS1[_:1/?p]=<?p,rdf:type,target:B>
• _:1 is the RDF notation for a blank node; within a graph, blank nodes are treated as existentially quantified variables.
Triple(v1,rdf:type,source:A)
Triple(v1,rdf:type,target:B)
Ontology Alignments – Class Eq.
LHS: <_:1,rdf:type,source:User>
RHS: <_:1,rdf:type,target:Agent>
Source query: SELECT * WHERE { ?s a source:User . … }
Rewritten query: SELECT * WHERE { ?s a target:Agent . … }
[Figure: _:1 rdf:type source:User rewritten to _:1 rdf:type target:Agent]
Ontology Alignments – Class Partition
LHS: <_:1,rdf:type,source:WhiteWine>
RHS: <_:1,rdf:type,target:Vin> <_:1,target:has-color,"blanc"@fr>
Source query: SELECT * WHERE { ?s a source:WhiteWine . … }
Rewritten query: SELECT * WHERE { ?s a target:Vin ; target:has-color "blanc"@fr . … }
[Figure: _:1 rdf:type source:WhiteWine rewritten to _:1 rdf:type target:Vin and _:1 target:has-color "blanc"@fr]
Ontology Alignments – Property Eq.
LHS: <_:1,source:has-name,_:2>
RHS: <_:1,target:fullName,_:2>
Source query: SELECT * WHERE { ?s source:has-name ?n . … }
Rewritten query: SELECT * WHERE { ?s target:fullName ?n . … }
[Figure: _:1 source:has-name _:2 rewritten to _:1 target:fullName _:2]
Ontology Alignments – Property Eq.
LHS: <_:1,akt:has-author,_:2>
RHS: <_:1,kisti:CreatorInfo,_:3> <_:3,kisti:hasCreator,_:2>
Source query: SELECT * WHERE { ?p akt:has-author ?a . … }
Rewritten query: SELECT * WHERE { ?p kisti:CreatorInfo ?i . ?i kisti:hasCreator ?a . … }
[Figure: _:1 akt:has-author _:2 rewritten to _:1 kisti:CreatorInfo _:3 and _:3 kisti:hasCreator _:2]
Ontology Alignments – Property Eq.
LHS: <_:1,source:temp,_:2>
RHS: <_:1,target:farenheit,_:2>
Source query: SELECT * WHERE { ?p source:temp "10"^^C . … }
Rewritten query: SELECT * WHERE { ?p target:farenheit "50"^^F . … }
Binding Celsius values directly to Fahrenheit is wrong: the two values are linked by a functional dependency.
[Figure: _:1 source:temp _:2 rewritten to _:1 target:farenheit _:2, with an additional node _:3 and the function celsius2farenheit]
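The role of the functional dependency in the temperature example can be illustrated with a short Python sketch. It assumes, following the figure labels, that the rewritten triple should use a new node _:3 whose value is computed as celsius2farenheit(_:2); that reading of the figure, the numeric encoding of the literal, and the name `rewrite_temperature` are assumptions made for this example.

```python
def celsius2farenheit(celsius):
    """Convert Celsius to Fahrenheit (identifier spelled as in the slides)."""
    return celsius * 9 / 5 + 32

# Assumed alignment: the RHS object is _:3, which depends on _:2 via the FD.
lhs = ("_:1", "source:temp", "_:2")
rhs = ("_:1", "target:farenheit", "_:3")
fd = {"_:3": (celsius2farenheit, ("_:2",))}

def rewrite_temperature(triple):
    """Rewrite a source:temp triple whose object is a numeric constant."""
    subject, predicate, value = triple
    if predicate != lhs[1]:
        return triple  # not covered by this alignment
    bindings = {"_:1": subject, "_:2": value}
    # Resolve the functional dependency before instantiating the RHS.
    for var, (function, args) in fd.items():
        bindings[var] = function(*(bindings[a] for a in args))
    return tuple(bindings.get(term, term) for term in rhs)

rewritten = rewrite_temperature(("?p", "source:temp", 10))
```

This sketch only covers a constant in the object position; how a query variable in that position should be handled is not addressed here.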
SPARQL Rewriting
PREFIX id: <http://southampton.rkbexplorer.com/id/>
PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT DISTINCT ?a WHERE {
  ?paper akt:has-author id:person-02686 .
  ?paper akt:has-author ?a .
}
[Figure: the query BGP (?paper akt:has-author id:person-02686; ?paper akt:has-author ?a) shown next to the alignment that rewrites _:1 akt:has-author _:2 into _:1 kisti:CreatorInfo _:3 and _:3 kisti:hasCreator _:2]
SPARQL Rewriting
[Figure: step-by-step rewriting of the BGP; each akt:has-author edge is replaced by a kisti:CreatorInfo edge to a new variable followed by a kisti:hasCreator edge, giving ?paper kisti:CreatorInfo ?new1, ?new1 kisti:hasCreator id:person-02686, ?paper kisti:CreatorInfo ?new2, ?new2 kisti:hasCreator ?a]
Problem: in the KISTI dataset, <http://southampton.rkbexplorer.com/id/person-02686> is unknown.
Co-reference integration
• Constants in the query (such as URIs) must be translated in order to retrieve correct results
• URI equivalences are maintained by co-reference services such as http://sameas.org, accessible via a REST interface
• Modeled as a functional dependency between variables
– The function returns the equivalent URI that satisfies a regex pattern
– Datasets maintain URIs that are recognizable by a common pattern (at least a shared prefix, e.g. http://dbpedia.org/resource/*)
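The co-reference lookup described above can be sketched as a function that returns the equivalent URI matching a dataset-specific pattern. The sketch below replaces the co-reference service with a hypothetical in-memory table, so no REST call is shown. The expanded KISTI URI is an assumption derived from the pattern http://kisti.rkbexplorer.com/id/\S* and the identifier PER_000000000105047 in the slides, and the function names are illustrative.

```python
import re

# Hypothetical stand-in for a co-reference service: URI -> equivalent URIs.
COREF = {
    "http://southampton.rkbexplorer.com/id/person-02686": [
        "http://southampton.rkbexplorer.com/id/person-02686",
        "http://kisti.rkbexplorer.com/id/PER_000000000105047",  # assumed expansion
    ],
}

def sameas(uri, pattern):
    """Return the first URI equivalent to `uri` that fully matches `pattern`, or None."""
    regex = re.compile(pattern)
    for candidate in COREF.get(uri, []):
        if regex.fullmatch(candidate):
            return candidate
    return None

def translate_term(term, pattern):
    """Translate URI constants; leave variables and unknown URIs unchanged."""
    if term.startswith("?"):
        return term
    return sameas(term, pattern) or term

def translate_triple(triple, pattern):
    """Apply the co-reference translation to the subject and object of a triple."""
    subject, predicate, obj = triple
    return (translate_term(subject, pattern), predicate, translate_term(obj, pattern))

KISTI_PATTERN = r"http://kisti.rkbexplorer.com/id/\S*"  # pattern as given in the slides
translated = translate_triple(
    ("?new1", "kisti:hasCreator", "http://southampton.rkbexplorer.com/id/person-02686"),
    KISTI_PATTERN,
)
```

In the graph rewriting formalism this lookup plays the role of the function in a functional dependency: the target node is bound to sameas(source node) restricted to the target dataset's URI pattern.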
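Co-reference integration
[Figure: the alignment _:11 akt:has-author _:12 rewritten to _:21 kisti:CreatorInfo _:3 and _:3 kisti:hasCreator _:22, with sameas links between the source and target nodes; example: id:person-02686 maps to kisti:PER_000000000105047 using the pattern http://kisti.rkbexplorer.com/id/\S*]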
Implementation
• Java package based on the Jena API for SPARQL query rewriting
• Code not released yet (planning to integrate it with the INRIA ontology alignment API)
Progress report
• Contact with François Scharffe and Jérôme Euzenat
• Partial mapping to the EDOAL ontology alignment specification (work in progress)
• SPARQL query rewriter to be implemented in the Alignment API (partially done)
EDOAL - Expressive and Declarative Ontology Alignment Language
• Construction of entities from other entities can be expressed through algebraic operators
• Restrictions can be expressed on entities in order to narrow their scope
• Transformations of property values can be specified. Property values using different encodings or units can be aligned using transformations.
EDOAL - Example
<http://oms.omwg.org/wine-vin/MappingRule_3>
    :entity1 wine:Bordeaux ;
    :entity2 [ edoal:and ( vin:Vin
                 [ a edoal:AttributeValueRestriction ;
                   edoal:comparator xsd:equals ;
                   edoal:onAttribute [ edoal:compose ( vin:hasTerroir proton:locatedIn ) ;
                                       a edoal:Relation ] ;
                   edoal:value vin:Aquitaine ] ) ;
               a edoal:Class ] ;
    :measure "1."^^xsd:float ;
    :relation "SubsumedBy" ;
    a :Cell .
Internal Representation
[Figure: _:6 rdf:type wine:Bordeaux rewritten to _:6 rdf:type vin:Vin, _:6 vin:hasTerroir _:9, and _:9 proton:locatedIn vin:Aquitaine]
Progress report
• Graph pattern rewriting can also be used to create CONSTRUCT queries that translate RDF graphs between different ontologies.
CONSTRUCT {
  ?9 <http://proton.semanticweb.org/locatedIn> <http://ontology.deri.org/vin#Aquitaine> .
  ?6 <http://ontology.deri.org/vin#hasTerroir> ?9 .
  ?6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ontology.deri.org/vin#Vin> .
}
WHERE {
  ?6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#Bordeaux> .
}
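One way such a CONSTRUCT query could be derived from an alignment is to put the LHS in the WHERE clause and the RHS in the template, turning each open variable _:n into a SPARQL variable ?n. The Python sketch below illustrates this idea under those assumptions; the function names are hypothetical, prefixed names are used for brevity (a complete query would also need the PREFIX declarations), and the actual generation code is not shown in the slides.

```python
def term_to_sparql(term):
    """Render alignment blank nodes (_:n) as SPARQL variables (?n); keep other terms."""
    return "?" + term[2:] if term.startswith("_:") else term

def pattern_to_sparql(triples):
    """Render a list of triples as a space-separated sequence of triple patterns."""
    return " ".join(
        " ".join(term_to_sparql(term) for term in triple) + " ." for triple in triples
    )

def to_construct(lhs, rhs):
    """Build a CONSTRUCT query: RHS as the template, LHS as the WHERE pattern."""
    return (
        f"CONSTRUCT {{ {pattern_to_sparql(rhs)} }} "
        f"WHERE {{ {pattern_to_sparql([lhs])} }}"
    )

# Internal representation of the wine:Bordeaux mapping rule from the slides.
lhs = ("_:6", "rdf:type", "wine:Bordeaux")
rhs = [
    ("_:9", "proton:locatedIn", "vin:Aquitaine"),
    ("_:6", "vin:hasTerroir", "_:9"),
    ("_:6", "rdf:type", "vin:Vin"),
]
query = to_construct(lhs, rhs)
```

Note that ?9 is not bound by the WHERE clause. In standard SPARQL, template triples containing an unbound variable are left out of the constructed graph, so an implementation might need to emit a blank node in the template instead.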
Outline
• Linked Data
– Data topology
– Data access
• Query Rewriting
– Ontology Alignment
– Entity Alignment
– SPARQL rewriting
Linked Data topology
• Foreign URIs for referring to external entities
• Co-references for referring to instance “equivalence”