These are the slides from my ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data"
ICWE 2012 Tutorial
An Introduction to SPARQL and Queries over Linked Data
Chapter 3: Querying Linked Data
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
Olaf Hartig - ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data" - Chapter 3: Querying Linked Data 2
Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
Linked Data Queries
SPARQL Endpoints
● SPARQL query processing service
● Supports the SPARQL protocol
● Issuing a SPARQL query is an HTTP GET request with the parameter query, a URL-encoded string with the SPARQL query:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
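The request line above can be produced by URL-encoding the query text and appending it as the query parameter. The following is a minimal sketch using only the Java standard library; the class name, the method name and the sample query are illustrative assumptions, not part of the tutorial.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SparqlGetUrl {
    // Builds the URL for a SPARQL protocol GET request:
    // the query text goes, URL-encoded, into the "query" parameter.
    static String build(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return endpoint + "?query=" + encoded;
    }

    public static void main(String[] args) {
        // Hypothetical query, chosen only to show the encoding.
        String url = build("http://dbpedia.org/sparql",
                "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1");
        System.out.println(url);
    }
}
```

Spaces become "+" and characters such as "?", "{" and "}" are percent-encoded, which matches the shape of the request line shown on the slide.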
Query Result Formats
● For SELECT and ASK queries: XML, JSON, plain text
● For CONSTRUCT and DESCRIBE: RDF/XML, Turtle, ...
● How to request?
  ● Accept header:
      GET /sparql?query=PREFIX+rd... HTTP/1.1
      Host: dbpedia.org
      User-agent: my-sparql-client/0.1
      Accept: application/sparql-results+json
  ● Non-standard alternative: parameter out
      GET /sparql?out=json&query=... HTTP/1.1
      Host: dbpedia.org
      User-agent: my-sparql-client/0.1
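Requesting a result format through the Accept header can be sketched with the HTTP client API of the Java standard library (Java 11 or later). The request is only built here, not sent; the class and method names are illustrative assumptions, and the URI is expected to already carry the URL-encoded query.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SparqlRequest {
    // Builds (but does not send) a GET request that asks for
    // SELECT/ASK results in JSON via the Accept header.
    static HttpRequest selectAsJson(URI uriWithQuery) {
        return HttpRequest.newBuilder(uriWithQuery)
                .header("Accept", "application/sparql-results+json")
                .header("User-Agent", "my-sparql-client/0.1")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest r = selectAsJson(URI.create(
                "http://dbpedia.org/sparql?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D"));
        System.out.println(r.method() + " " + r.uri());
        System.out.println(r.headers().map());
    }
}
```

Replacing the header value with application/sparql-results+xml would ask for the XML result format instead.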
SPARQL Client Libraries
● More convenient than working on the protocol level:
  ● SPARQL JavaScript Library http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
  ● ARC for PHP http://arc.semsol.org/
  ● RAP – RDF API for PHP http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
  ● Jena / ARQ (Java) http://jena.sourceforge.net/
  ● Sesame (Java) http://www.openrdf.org/
  ● SPARQL Wrapper (Python) http://sparql-wrapper.sourceforge.net/
  ● PySPARQL (Python) http://code.google.com/p/pysparql/
SPARQL Client Libraries
● Example with Jena ARQ:
    import com.hp.hpl.jena.query.*;

    String service = "...";       // address of the SPARQL endpoint
    String query = "SELECT ...";  // your SPARQL query
    QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
    ResultSet results = e.execSelect();
    while ( results.hasNext() ) {
        QuerySolution s = results.nextSolution();
        // …
    }
    e.close();
SPARQL Endpoints
● Several Linked Data sets are exposed via SPARQL endpoints
  ● DBpedia http://dbpedia.org/sparql
  ● Musicbrainz http://dbtune.org/musicbrainz/sparql
  ● Semantic Web dog food http://data.semanticweb.org/sparql
  ● etc. http://esw.w3.org/topic/SparqlEndpoints
● Send your query, receive the result
Querying a single dataset is quite boring
compared to:
Issuing SPARQL queries over multiple datasets
Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution
Linked Data Queries
Querying a Given Collection
● Some public SPARQL endpoints provide access to a collection of data from multiple sources
  ● http://lod.openlinksw.com/sparql
  ● http://sparql.sindice.com/
● Pros:
  ● Nothing to set up
  ● Good query execution times
● Cons:
  ● Queried data might be out of date
  ● Not all relevant data is in the collection
Setting up Your Own Collection
● RDF-specific DBMSs:
  ● Virtuoso http://virtuoso.openlinksw.com/
  ● Allegro Graph http://www.franz.com/agraph/allegrograph/
  ● Bigdata http://www.systap.com/bigdata.htm
  ● OWLIM http://www.ontotext.com/owlim
  ● 4store http://4store.org/
  ● Jena TDB http://jena.apache.org/
  ● Sesame http://www.openrdf.org/
  ● etc.
Populating Your Own Collection
● Datasets provided as RDF dumps
● (Focused) crawling
  ● ldspider http://code.google.com/p/ldspider/
Setting up Your Own Collection
● Pros:
  ● All relevant data
  ● Independent of existence, availability, efficiency of SPARQL endpoints
  ● Good query execution times (once set up properly)
● Cons:
  ● Effort to set up
  ● Effort to operate
  ● Queried data might be out of date
SPARQL Endpoint Federation
● Idea of federated query processing:
  ● Querying a query federation service (mediator)
  ● Mediator distributes sub-queries to relevant sources
  ● Finally, mediator combines sub-results
● Prototypes:
  ● FedX
  ● SPLENDID
  ● ANAPSID
SPARQL Endpoint Federation
● Pros:
  ● Queried data is up to date
● Cons:
  ● All relevant datasets must be exposed via a SPARQL endpoint
  ● Effort to set up the mediator
SPARQL 1.1 Federation Extension
● SERVICE pattern in SPARQL 1.1
  ● Explicitly specify query patterns whose execution must be distributed to a remote SPARQL endpoint
SELECT ?v ?ve WHERE
{
?v rdf:type umbel-sc:Volcano ;
p:location dbpedia:Italy .
SERVICE <http://volcanos.example.org/query> {
?v p:lastEruption ?ve }
}
For all these approaches ...
● … you have to know the relevant data sources beforehand
  ● When selecting a SPARQL endpoint over an existing collection of datasets
  ● When setting up your own collection
  ● When configuring your federation system
  ● When using the SERVICE pattern
● … you restrict yourself to the selected sources
● … you do not tap the full potential of the Web
Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
  ● Evaluating parts of the query (triple patterns) on a continuously augmented set of data
  ● Looking up URIs in intermediate solutions and adding the retrieved data to the query-local dataset
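The alternation described above can be sketched over a small in-memory "Web". This is a simplified illustration, not the implementation discussed in the tutorial: the Web is assumed to be a map from a URI to the triples obtained by looking it up, triples and triple patterns are 3-element lists, terms starting with "?" are variables, and all example.org URIs are hypothetical.

```java
import java.util.*;

class LinkTraversalSketch {
    // Extends binding b so that pattern tp matches triple t; returns null on a mismatch.
    static Map<String, String> match(List<String> tp, List<String> t, Map<String, String> b) {
        Map<String, String> out = new HashMap<>(b);
        for (int i = 0; i < 3; i++) {
            String term = tp.get(i), value = t.get(i);
            if (term.startsWith("?")) {
                String old = out.putIfAbsent(term, value);
                if (old != null && !old.equals(value)) return null; // conflicting binding
            } else if (!term.equals(value)) {
                return null; // constant does not match
            }
        }
        return out;
    }

    // Evaluates the first k triple patterns over the data collected so far.
    static List<Map<String, String>> eval(List<List<String>> query, int k, Set<List<String>> data) {
        List<Map<String, String>> sols = new ArrayList<>();
        sols.add(new HashMap<>());
        for (List<String> tp : query.subList(0, k)) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : sols)
                for (List<String> t : data) {
                    Map<String, String> e = match(tp, t, b);
                    if (e != null) next.add(e);
                }
            sols = next;
        }
        return sols;
    }

    static List<Map<String, String>> run(Map<String, List<List<String>>> web,
                                         List<List<String>> query, Set<String> seeds) {
        Set<List<String>> data = new LinkedHashSet<>(); // query-local dataset
        Set<String> lookedUp = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(seeds);
        while (!todo.isEmpty()) {
            String uri = todo.pop();
            if (!lookedUp.add(uri)) continue;                                // already looked up
            data.addAll(web.getOrDefault(uri, List.<List<String>>of()));     // "look up" the URI
            // Evaluate parts of the query and queue the values bound in intermediate solutions.
            for (int k = 1; k <= query.size(); k++)
                for (Map<String, String> b : eval(query, k, data))
                    todo.addAll(b.values());
        }
        return eval(query, query.size(), data);
    }

    static List<Map<String, String>> demo() {
        String movie = "http://example.org/movie2449", paul = "http://example.org/Paul",
               berlin = "http://example.org/Berlin";
        Map<String, List<List<String>>> web = Map.of(
            movie, List.of(List.of(paul, "actor_in", movie),
                           List.of(movie, "filmingLocation", berlin)),
            paul, List.of(List.of(paul, "lives_in", berlin)));
        List<List<String>> query = List.of(
            List.of("?actor", "actor_in", movie),
            List.of(movie, "filmingLocation", "?loc"),
            List.of("?actor", "lives_in", "?loc"));
        return run(web, query, Set.of(movie));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch re-evaluates the query prefixes after every look-up, which is simple but wasteful; it terminates here only because the toy Web is finite.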
[Figure sequence: link traversal for an example query. The query graph connects ?actor, ?loc and http://.../movie2449 via the edges actor_in, filmingLocation and lives_in. Looking up http://.../movie2449 adds the triple (http://mdb.../Paul, actor_in, http://.../movie2449) to the queried data, which gives the intermediate solution ?actor = http://mdb.../Paul. Looking up http://mdb.../Paul adds (http://mdb.../Paul, lives_in, http://geo.../Berlin), which gives ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin.]
“Real World” Example
SELECT DISTINCT ?author ?phone WHERE {
?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
?pub swrc:author ?author .
{ ?author owl:sameAs ?authorAlt }
UNION
{ ?authorAlt owl:sameAs ?author }
?authorAlt foaf:phone ?phone
}
Return phone numbers of authors of ontology engineering papers at ESWC'09.
  Result size: 2
  # of retrieved docs: 297
  # of accessed servers: 16
  avg. execution time: 1min 30sec
Summary
O. Hartig and A. Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum 10(2), 2010
Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution
Linked Data Queries
  ➢ Foundations
  ➢ Iterator Based Implementation
  ➢ Query Planning
SPARQL Pattern Evaluation
$\mathrm{eval}(P, G) = \{ \mu_1, \mu_2, \dots \}$
[Figure: the example query pattern (edges actor_in, filmingLocation and lives_in over ?actor, ?loc and http://.../movie2449) and a solution with ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin]
SPARQL Linked Data Query
$Q^P(W) = \{ \mu_1, \mu_2, \dots \}$
Full-Web Semantics
$Q^P(W) = \mathrm{eval}(P, \mathrm{AllData}(W))$
Reachability-based Semantics
● Seed URIs S
● Reachability criterion c
● Query result: $Q_c^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^*))$, where $W^*$ is the part of $W$ that is reachable from $S$ under $c$
● Reachability criteria considered: $c_{All}$, $c_{None}$, $c_{Match}$
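The three reachability criteria can be sketched as predicates c(t, u, P) that decide whether URI u in triple t is followed for query P. The reading used here is an assumption based on the usual definitions: cAll follows every URI, cNone follows none (only the seeds are looked up), and cMatch follows the URIs of triples that match some triple pattern of P. The toy Web, the ex: names and all identifiers are hypothetical.

```java
import java.util.*;

class Reachability {
    // A reachability criterion c(t, u, P): follow URI u found in triple t for query P?
    interface Criterion {
        boolean follow(List<String> t, String u, List<List<String>> p);
    }

    static final Criterion C_ALL = (t, u, p) -> true;
    static final Criterion C_NONE = (t, u, p) -> false;
    static final Criterion C_MATCH = (t, u, p) -> p.stream().anyMatch(tp -> matches(tp, t));

    // True if triple t matches triple pattern tp (terms starting with '?' are variables).
    static boolean matches(List<String> tp, List<String> t) {
        Map<String, String> b = new HashMap<>();
        for (int i = 0; i < 3; i++) {
            String term = tp.get(i);
            if (term.startsWith("?")) {
                String old = b.putIfAbsent(term, t.get(i));
                if (old != null && !old.equals(t.get(i))) return false;
            } else if (!term.equals(t.get(i))) {
                return false;
            }
        }
        return true;
    }

    // URIs of the documents reachable from the seeds under criterion c.
    static Set<String> reachable(Map<String, List<List<String>>> web, Set<String> seeds,
                                 List<List<String>> p, Criterion c) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(seeds);
        while (!todo.isEmpty()) {
            String uri = todo.pop();
            if (!web.containsKey(uri) || !seen.add(uri)) continue;
            for (List<String> t : web.get(uri))
                for (String u : t)
                    if (c.follow(t, u, p)) todo.add(u);
        }
        return seen;
    }

    static Set<String> demo(Criterion c) {
        Map<String, List<List<String>>> web = Map.of(
            "ex:a", List.of(List.of("ex:a", "ex:knows", "ex:b"),
                            List.of("ex:a", "ex:likes", "ex:c")),
            "ex:b", List.of(List.of("ex:b", "ex:knows", "ex:d")),
            "ex:c", List.of(List.of("ex:c", "ex:knows", "ex:e")),
            "ex:d", List.<List<String>>of(),
            "ex:e", List.<List<String>>of());
        List<List<String>> p = List.of(List.of("?x", "ex:knows", "?y"));
        return reachable(web, Set.of("ex:a"), p, c);
    }

    public static void main(String[] args) {
        System.out.println("cAll:   " + demo(C_ALL));
        System.out.println("cMatch: " + demo(C_MATCH));
        System.out.println("cNone:  " + demo(C_NONE));
    }
}
```

In this toy Web, cMatch does not reach ex:c because the triple that mentions it (ex:a ex:likes ex:c) matches no triple pattern of the query.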
Computability
● (Ordinary) Turing machines are unsuitable:
  ● Limited data access capabilities are not properly captured
● Web machines
  ● Abiteboul and Vianu, 1997
  ● Mendelzon and Milo, 1997
[Figure: a Turing machine (TM) and the query $Q_{c_{Match}}^{P,S}(W)$]
LD Machine
● Multi-tape Turing machine
  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output
● Access to the Web input is restricted
  ● Only by performing a particular procedure in a particular state
Finitely Computable LD Queries
● For $Q$ there exists an LD machine $M_Q$ such that for any $W$:
  ● $M_Q$ halts after a finite number of computation steps, and
  ● $M_Q$ outputs the complete result $Q(W)$
[Figure: the tapes of an LD machine (Web Input, Input, Work, Output) over the computation steps 1, ..., k; the Web input tape holds # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙ and at step k the output tape holds # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #]
Eventually Computable LD Queries
● For $Q$ there exists an LD machine $M_Q$ such that for any $W$:
  1. The output always encodes a subset of the query result $Q(W)$, and
  2. Each $\mu \in Q(W)$ eventually appears on the output
✗ No guarantee for termination
[Figure: the tapes of an LD machine (Web Input, Input, Work, Output) over the computation steps ..., k - 3, ..., k + 2, ...; the Web input tape holds # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙ and the output tape grows over time: # enc(μ1) # enc(μ2) ∙ ∙ ∙]
Main Results for cMatch-Semantics
Theorem: Any satisfiable SPARQL based Linked Data query $Q_{c_{Match}}^{P,S}$ under cMatch-semantics that is monotonic is at least eventually computable; any non-monotonic $Q_{c_{Match}}^{P,S}$ is either finitely computable or not even eventually computable.
Problem: TERMINATION(cMatch)
  Web input: W, a (potentially infinite) Web of Linked Data
  Ordinary input: S, a finite but nonempty set of seed URIs; P, a SPARQL expression
  Question: Does an LD machine exist that computes $Q_{c_{Match}}^{P,S}(W)$ and halts?
Theorem: TERMINATION(cMatch) is not LD machine decidable.
Iterator Based Execution
Query:
    ?p ex:affiliated_with <http://.../orgaX>
    ?p ex:interested_in ?b
    ?b rdf:type <http://.../Book>
Seed: <http://.../orgaX>
Chain of iterators, one per triple pattern, evaluated over the query-local dataset:
    I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
    I2: tp2 = ( ?p , ex:interested_in , ?b )
    I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
Execution steps:
  1. The "Next?" request is passed from I3 via I2 to I1.
  2. I1 finds the triple <http://.../alice> ex:affiliated_with <http://.../orgaX> in the query-local dataset and returns { ?p = <http://.../alice> }.
  3. I2 substitutes the binding into its pattern, tp2' = ( <http://.../alice> , ex:interested_in , ?b ), finds <http://.../alice> ex:interested_in <http://.../b1> and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }.
  4. I3 substitutes the binding into its pattern, tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ), finds <http://.../b1> rdf:type <http://.../Book> and returns { ?p = <http://.../alice> , ?b = <http://.../b1> } as a solution of the query.
Between these steps, the URIs in the intermediate solutions are looked up and the retrieved data is added to the query-local dataset.
Alternative Execution Order
Query:
    ?p ex:affiliated_with <http://.../orgaX>
    ?p ex:interested_in ?b
    ?b rdf:type <http://.../Book>
Seed: <http://.../orgaX>
Chain of iterators with a different order of the triple patterns:
    I1: tp1 = ( ?b , rdf:type , <http://.../Book> )
    I2: tp2 = ( ?p , ex:interested_in , ?b )
    I3: tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
Execution steps:
  1. The query-local dataset contains <http://.../alice> ex:affiliated_with <http://.../orgaX>.
  2. The "Next?" request is passed from I3 via I2 to I1.
  3. I1 finds no triple that matches tp1 and reports END!
  4. END! is passed on by I2 and I3, so no solution is computed.
● The computed query result may depend on the order of triple patterns (= logical query execution plan)
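The effect of the execution order can be reproduced with a small simulation. This is a simplified recursive stand-in for a chain of iterators, under stated assumptions: each step evaluates its triple pattern only over the data present at that moment, the URIs bound in a solution are looked up before the solution is passed on, and nothing is re-evaluated later. The ex: names, the toy Web and all identifiers are hypothetical.

```java
import java.util.*;

class IteratorOrderDemo {
    static final List<String> TP_AFF = List.of("?p", "ex:affiliated_with", "ex:orgaX");
    static final List<String> TP_INT = List.of("?p", "ex:interested_in", "?b");
    static final List<String> TP_TYPE = List.of("?b", "rdf:type", "ex:Book");

    final Map<String, List<List<String>>> web;             // URI -> triples obtained by a look-up
    final Set<List<String>> data = new LinkedHashSet<>();  // query-local dataset
    final Set<String> seen = new HashSet<>();

    IteratorOrderDemo(Map<String, List<List<String>>> web) { this.web = web; }

    void lookUp(String uri) {
        if (seen.add(uri)) data.addAll(web.getOrDefault(uri, List.<List<String>>of()));
    }

    // Extends binding b so that pattern tp matches triple t; returns null on a mismatch.
    static Map<String, String> match(List<String> tp, List<String> t, Map<String, String> b) {
        Map<String, String> out = new HashMap<>(b);
        for (int i = 0; i < 3; i++) {
            String term = tp.get(i), value = t.get(i);
            if (term.startsWith("?")) {
                String old = out.putIfAbsent(term, value);
                if (old != null && !old.equals(value)) return null;
            } else if (!term.equals(value)) {
                return null;
            }
        }
        return out;
    }

    // Step i sees only a snapshot of the data available when it is evaluated.
    void exec(List<List<String>> plan, int i, Map<String, String> b, List<Map<String, String>> out) {
        if (i == plan.size()) { out.add(b); return; }
        for (List<String> t : new ArrayList<>(data)) {
            Map<String, String> e = match(plan.get(i), t, b);
            if (e == null) continue;
            e.values().forEach(this::lookUp);   // traverse links before passing the solution on
            exec(plan, i + 1, e, out);
        }
    }

    static int count(List<List<String>> plan) {
        Map<String, List<List<String>>> web = Map.of(
            "ex:orgaX", List.of(List.of("ex:alice", "ex:affiliated_with", "ex:orgaX")),
            "ex:alice", List.of(List.of("ex:alice", "ex:interested_in", "ex:b1")),
            "ex:b1", List.of(List.of("ex:b1", "rdf:type", "ex:Book")));
        IteratorOrderDemo d = new IteratorOrderDemo(web);
        d.lookUp("ex:orgaX");                    // seed
        List<Map<String, String>> out = new ArrayList<>();
        d.exec(plan, 0, new HashMap<>(), out);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println(count(List.of(TP_AFF, TP_INT, TP_TYPE)));
        System.out.println(count(List.of(TP_TYPE, TP_INT, TP_AFF)));
    }
}
```

With the first order the data needed by each later pattern has been retrieved by the time it is evaluated; with the second order the first pattern finds nothing in the seed data, so no further links are traversed.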
Query Plan Selection
Assumptions about $Q_{c_{Match}}^{P,S}$:
  ● P refers to instance data
  ● S = uris(P)
● Assessment criteria:
  ● Cost (query execution time)
  ● Benefit (size of the computed result)
● Cost and benefit must be estimated without plan execution
● Estimation impossible due to “zero knowledge”
● Heuristic Based Plan Selection
  ● DEPENDENCY RESPECT RULE
  ● SEED TP RULE
  ● NO VOCAB SEED RULE
  ● FILTERING TP RULE
DEPENDENCY RESPECT RULE
Use a dependency respecting query plan
Query:
    ?p ex:affiliated_with <http://.../orgaX>
    ?p ex:interested_in ?b
    ?b rdf:type <http://.../Book>
● Dependency respect: a variable from each triple pattern (except the first) already occurs in one of the preceding triple patterns
● Rationale: Avoid cartesian products
Dependency respecting plan (√):
    I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
    I2: tp2 = ( ?p , ex:interested_in , ?b )
    I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
Plan that is not dependency respecting (tp2 shares no variable with tp1):
    I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
    I2: tp2 = ( ?b , rdf:type , <http://.../Book> )
    I3: tp3 = ( ?p , ex:interested_in , ?b )
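The dependency respect condition can be checked mechanically. The sketch below assumes triple patterns are 3-element lists in which terms starting with "?" are variables; the class name, the constants and the ex: names are illustrative only.

```java
import java.util.*;

class DependencyRespect {
    static final List<String> TP_AFF = List.of("?p", "ex:affiliated_with", "ex:orgaX");
    static final List<String> TP_INT = List.of("?p", "ex:interested_in", "?b");
    static final List<String> TP_TYPE = List.of("?b", "rdf:type", "ex:Book");

    // Variables of a triple pattern.
    static Set<String> vars(List<String> tp) {
        Set<String> v = new HashSet<>();
        for (String term : tp) if (term.startsWith("?")) v.add(term);
        return v;
    }

    // True if every triple pattern after the first shares a variable
    // with at least one of the preceding triple patterns.
    static boolean isDependencyRespecting(List<List<String>> plan) {
        Set<String> bound = new HashSet<>();
        for (int i = 0; i < plan.size(); i++) {
            Set<String> v = vars(plan.get(i));
            if (i > 0 && Collections.disjoint(v, bound)) return false;
            bound.addAll(v);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isDependencyRespecting(List.of(TP_AFF, TP_INT, TP_TYPE)));
        System.out.println(isDependencyRespecting(List.of(TP_AFF, TP_TYPE, TP_INT)));
    }
}
```

The second plan fails the check because its second pattern only uses ?b, which no earlier pattern has bound; evaluating it would pair every ?p with every ?b.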
SEED TP RULE
Use a plan with a seed triple pattern
● Potential seed triple pattern: a triple pattern that contains at least one HTTP URI
● Seed triple pattern of a plan: the first triple pattern in the plan, provided it is a potential seed triple pattern
● Rationale: good starting point
Query (√ marks the potential seed triple patterns):
    ?p ex:affiliated_with <http://.../orgaX>  √
    ?p ex:interested_in ?b  √
    ?b rdf:type <http://.../Book>  √
Recall: S = uris(P)
NO VOCAB SEED RULE
Avoid a seed triple pattern with vocabulary terms
● The seed triple pattern should not contain only vocabulary term URIs
● Patterns to avoid:
    ?s ex:any_property ?o
    ?s rdf:type ex:any_class
● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data
Query (√ marks the triple pattern that remains suitable as seed triple pattern):
    ?p ex:affiliated_with <http://.../orgaX>  √
    ?p ex:interested_in ?b
    ?b rdf:type <http://.../Book>
FILTERING TP RULE
Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible
● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns
● For each result consumed as input, a filtering TP can only report 1 or 0 results as output
● Rationale: Reduce cost
Example (tp3 is a filtering triple pattern):
    I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
        returns { ?p = <http://.../alice> }
    I2: tp2 = ( ?p , ex:interested_in , ?b )
        tp2' = ( <http://.../alice> , ex:interested_in , ?b )
        returns { ?p = <http://.../alice> , ?b = <http://.../b1> }
    I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
        tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
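Whether a triple pattern acts as a filtering triple pattern depends only on its position in the plan, so it can be determined before execution. A minimal sketch, assuming triple patterns are 3-element lists whose variables start with "?"; all names are illustrative.

```java
import java.util.*;

class FilteringTp {
    static final List<String> TP_AFF = List.of("?p", "ex:affiliated_with", "ex:orgaX");
    static final List<String> TP_INT = List.of("?p", "ex:interested_in", "?b");
    static final List<String> TP_TYPE = List.of("?b", "rdf:type", "ex:Book");

    // 0-based positions of the triple patterns whose variables
    // all occur in preceding triple patterns of the plan.
    static List<Integer> filteringPositions(List<List<String>> plan) {
        Set<String> bound = new HashSet<>();
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < plan.size(); i++) {
            Set<String> v = new HashSet<>();
            for (String term : plan.get(i)) if (term.startsWith("?")) v.add(term);
            if (i > 0 && bound.containsAll(v)) positions.add(i);
            bound.addAll(v);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(filteringPositions(List.of(TP_AFF, TP_INT, TP_TYPE)));
    }
}
```

For the example plan only the last pattern qualifies, since ?b is bound by the second pattern before the third one is evaluated.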
Evaluation Procedure
● Generate all possible plans
● Execute each plan:
  ● 5 runs (+ 1 initial warm-up run)
  ● Use an initially empty query-local dataset for each run
● Measure for each plan:
  ● Avg. execution time
  ● Avg. number of RDF documents retrieved during execution
  ● Avg. number of query results
Evaluation Query (Example)
SELECT ?spec ?genus WHERE {
    geospecies:4qyn7 gs:inFamily ?fam .
    ?fam skos:narrowerTransitive ?spec .
    ?spec skos:closeMatch ?sp2 .
    ?sp2 rdfs:subClassOf ?genus .
    ?spec gs:isExpectedIn ?loc .
    geospecies:4qyn7 gs:isExpectedIn ?loc .
    ?loc rdf:type gs:State . }
Of what genus are the species that are
● classified in the same family as the American Badger,
● and expected in the same states as the American Badger?
● 2 potential seed triple patterns that satisfy our NO VOCAB SEED RULE
● 56 different dependency respecting plans, each containing 2 filtering TPs
Measurements
[Figure: two scatter plots over the query execution times (in seconds, 0 to 180), one showing the number of query results and one the number of retrieved documents; two charts (1st Filtering TP, 2nd Filtering TP) showing the percentage of plans in each group with a filtering TP in specific positions, by TP position in the ordered BGP (1 to 7)]
Summary (Linked Data Queries)
● Theoretical foundations of Linked Data queries
  ● Full-Web semantics, (family of) reachability based semantics
  ● Theoretical properties of queries (e.g. computability)
● Link traversal based query execution (LTBQE)
  ● Novel paradigm for executing Linked Data queries
  ● Sound and complete for conjunctive Linked Data queries under cMatch-semantics
● Iterator implementation of the LTBQE paradigm
  ● Trades off completeness for a termination guarantee
  ● Degree of completeness depends on execution order of TPs
● Heuristic based plan selection
These slides have been created by Olaf Hartig
http://olafhartig.de
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)