Executing SPARQL Queries over the Web of Linked Data

Olaf Hartig*, Christian Bizer˚, Johann-Christoph Freytag*

*Humboldt-Universität zu Berlin
˚Freie Universität Berlin


I used these slides to present my paper at the International Semantic Web Conference (ISWC'09), Washington, DC, USA, October 2009.


● Use URIs as names for things.
● Use HTTP URIs so that people can look up those names.
● When someone looks up a URI, provide useful information.
● Include links to other URIs so that they can discover more things.

Tim Berners-Lee, July 2006
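Looking up a name, as the second and third principles describe, amounts to an HTTP GET on the URI. The following is a minimal sketch in Python, assuming the server supports content negotiation and offers Turtle or RDF/XML (the slides do not name a serialization). The helper `build_lookup_request` and the `RDF_ACCEPT` constant are hypothetical names, and the request is only constructed here, not sent.

```python
from urllib.request import Request

# Assumed media types; the slides do not prescribe a serialization.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Build an HTTP GET request that asks for an RDF description of `uri`."""
    # A fragment identifier is never sent to the server, so strip it first.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": RDF_ACCEPT}, method="GET")


req = build_lookup_request("http://mymovie.db/movie2449")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual look-up; `mymovie.db` is the slides' example host, so that call is left out.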

[Figure, animated over several slides: the data set "My Movie DB" names movies with HTTP URIs (http://mymovie.db/movie0362, http://mymovie.db/movie2449, http://mymovie.db/movie5112, http://mymovie.db/movie1342). A look-up of http://mymovie.db/movie2449 is shown, followed by links from the movie data to URIs of a second data set (http://geo.db/cityCJ, http://geo.db/cityXA, http://geo.db/country7, http://geo.db/country21).]

● The Web: a huge, globally distributed dataspace

● Querying this dataspace opens new possibilities:
  ● Aggregating data from different sources
  ● Integrating fragmentary information
  ● Achieving a more complete view

Traditional approach 1: data centralization

● Querying a collection of copies from all relevant datasets
● Misses unknown or new sources
● Collection probably out of date
● Will it scale?

Traditional approach 2: federated query processing

● Querying a mediator which distributes subqueries to relevant sources and integrates the results
● Requires sources to provide a query service
● Requires information about the sources
● Misses unknown or new sources

Main drawback:

You have to know the relevant data sources in advance.

You restrict yourself to the selected sources.

You do not tap the full potential of the Web!


A novel approach:

Link Traversal Based Query Execution

Allows data sources to be discovered at runtime


Outline

Part I

Overview of Link Traversal based Query Execution

Part II

An Iterator based Implementation Approach

Main Idea

● Intertwine query evaluation with traversal of RDF links
● Alternately:
  ● Evaluate parts of the query on a continuously augmented set of data
  ● Look up URIs in intermediate solutions and add retrieved data to the queried data set

[Figure, animated over several slides: a query and the growing set of queried data]

Query:
  ( http://.../movie2449 filmingLocation ?loc )
  ( ?loc statistics ?stat )
  ( ?stat unemp_rate ?ur )

Steps shown:
  1. Look up http://.../movie2449 and add the retrieved data to the queried data.
  2. The queried data contains ( http://.../movie2449 filmingLocation http://geo.../Italy ), which yields the intermediate solution { ?loc → http://geo.../Italy }.
  3. Look up http://geo.../Italy and add the retrieved data to the queried data.
  4. The queried data contains ( http://geo.../Italy statistics http://stat.db/.../it ), which yields the intermediate solution { ?loc → http://geo.../Italy , ?stat → http://stat.db/.../it }.

In a Nutshell

● Link traversal based query execution:
  ● Evaluation on a continuously augmented dataset
  ● Discovery of potentially relevant data during execution
  ● Discovery driven by intermediate solutions
● Main advantage:
  ● No need to know all data sources in advance
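The alternation between evaluation and look-up can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the authors' implementation: the Web is replaced by a hypothetical in-memory dictionary `WEB`, all URIs and the literal "6.7" are made up, and triple patterns are evaluated strictly one after another, so data discovered late is never matched against earlier patterns.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

# Hypothetical stand-in for the Web: dereferencing a URI yields a set of triples.
# All URIs and the literal "6.7" are made up for this sketch.
WEB: Dict[str, Set[Triple]] = {
    "http://mymovie.db/movie2449": {
        ("http://mymovie.db/movie2449", "filmingLocation", "http://geo.db/Italy"),
    },
    "http://geo.db/Italy": {
        ("http://geo.db/Italy", "statistics", "http://stat.db/it"),
    },
    "http://stat.db/it": {
        ("http://stat.db/it", "unemp_rate", "6.7"),
    },
}


def substitute(tp, mu: Solution):
    """Replace every variable of the pattern that is already bound in mu."""
    return tuple(mu.get(term, term) for term in tp)


def match(tp, triple: Triple) -> Optional[Solution]:
    """Return the bindings that turn the pattern into the triple, or None."""
    binding: Solution = {}
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def execute(patterns) -> List[Solution]:
    data: Set[Triple] = set()   # the continuously augmented queried data
    seen: Set[str] = set()      # URIs that were already looked up
    solutions: List[Solution] = [{}]
    for tp in patterns:
        next_solutions = []
        for mu in solutions:
            tp_cur = substitute(tp, mu)
            # Look up every URI of the substituted pattern before matching.
            for term in tp_cur:
                if term.startswith("http://") and term not in seen:
                    seen.add(term)
                    data |= WEB.get(term, set())
            for triple in sorted(data):
                mu_prime = match(tp_cur, triple)
                if mu_prime is not None:
                    next_solutions.append({**mu, **mu_prime})
        solutions = next_solutions
    return solutions


query = [
    ("http://mymovie.db/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]
results = execute(query)
print(results)
```

Only the movie URI appears in the query; the other two sources are discovered through intermediate solutions, which is the point of the approach.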


Real-World Examples

SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic .
  ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}

Return phone numbers of authors of ontology engineering papers at ESWC'09.

# of query results        2
# of retrieved graphs     297
# of accessed servers     16
avg. execution time       1min 30sec

Outline

Part I

Overview of Link Traversal based Query Execution

Part II

An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation

Iterator based Query Execution

● Iterator:
  ● implements an operation
  ● is a group of functions: OPEN, GETNEXT, CLOSE
● Query execution uses a chain of iterators
● Each iterator is responsible for a single triple pattern

[Figure: the query is split across a chain of three iterators]

  $I_1$: ( http://.../movie2449 filmingLocation ?loc )
  $I_2$: ( ?loc statistics ?stat )
  $I_3$: ( ?stat unemp_rate ?ur )
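The OPEN / GETNEXT / CLOSE interface and the pull-based chain can be illustrated independently of RDF. The sketch below uses plain numbers instead of solutions; the class names `ListScan` and `Select` and the helper `run` are hypothetical and chosen only for this illustration.

```python
class ListScan:
    """Leaf iterator: returns the items of an in-memory list one by one."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = None

    def open(self):
        self._pos = 0

    def get_next(self):
        if self._pos is None:
            raise RuntimeError("iterator is not open")
        if self._pos >= len(self._items):
            return None  # signals the end of the results
        item = self._items[self._pos]
        self._pos += 1
        return item

    def close(self):
        self._pos = None


class Select:
    """Iterator that pulls from its predecessor and keeps matching items."""

    def __init__(self, predecessor, predicate):
        self._pred = predecessor
        self._predicate = predicate

    def open(self):
        self._pred.open()

    def get_next(self):
        while True:
            item = self._pred.get_next()
            if item is None or self._predicate(item):
                return item

    def close(self):
        self._pred.close()


def run(root):
    """Drive a chain by calling GETNEXT on its last iterator until exhausted."""
    root.open()
    try:
        results = []
        while (item := root.get_next()) is not None:
            results.append(item)
        return results
    finally:
        root.close()


chain = Select(Select(ListScan(range(10)), lambda n: n % 2 == 0), lambda n: n > 3)
print(run(chain))  # even numbers greater than 3
```

Each call to `get_next` on the last iterator triggers just enough work upstream to produce one result, which is why a slow step anywhere in the chain stalls the whole pipeline.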

Iterator $I_i$ for triple pattern $tp_i$ processes each solution $\mu_{cur}$ from the results of $I_{i-1}$:

1. Substitute: $tp_{cur} = \mu_{cur}[tp_i]$
2. Find matching triples match($tp_{cur}$) in the queried data set
3. Create a solution $\mu'$ for each $t$ in match($tp_{cur}$)
4. Return each $\mu_{cur} \cup \mu'$ as a result

Results from $I_{i-1}$ (one row is the current solution $\mu_{cur}$):

?c                           ?cStats
http://geo.db/country/US     http://stats.example.org/USstatistics
http://geo.db/country/IT     http://stats.example.org/ITstatistics
http://geo.db/country/IT     http://stats.db/example/It
http://example.db/ctry/DE    http://stats.example.org/Germany

Example

  $tp_i$ = ( ?loc ex:stats ?s )
  $\mu_{cur}$ = { ?p → http://ex... , ?loc → http://geo... }
  $tp_{cur}$ = ( http://geo... ex:stats ?s )
  match($tp_{cur}$) = (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
  $\mu'$ = { ?s → http://db... }
  $\mu_{cur} \cup \mu'$ = { ?p → http://ex... , ?loc → http://geo... , ?s → http://db... }
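The four steps of a triple pattern iterator (substitute, find matching triples, create a solution per match, return the merged solutions) can be written down as a small iterator class. This is a sketch over a fixed in-memory data set; the class names, the helper names, and all `.example` URIs are hypothetical, since the slides elide the real ones.

```python
from collections import deque


def substitute(mu, tp):
    """Step 1: replace every variable of tp that is bound in mu."""
    return tuple(mu.get(term, term) for term in tp)


def bindings(tp, triple):
    """Return mu' that maps tp onto triple, or None if they do not match."""
    mu_prime = {}
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if mu_prime.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu_prime


class SolutionList:
    """Stand-in for the predecessor iterator: returns fixed input solutions."""

    def __init__(self, solutions):
        self._solutions = list(solutions)

    def open(self):
        self._queue = deque(self._solutions)

    def get_next(self):
        return self._queue.popleft() if self._queue else None

    def close(self):
        self._queue.clear()


class TriplePatternIterator:
    def __init__(self, tp, predecessor, data):
        self.tp, self.pred, self.data = tp, predecessor, data

    def open(self):
        self.pred.open()
        self.pending = deque()  # results computed but not yet returned

    def get_next(self):
        while not self.pending:
            mu_cur = self.pred.get_next()
            if mu_cur is None:
                return None
            tp_cur = substitute(mu_cur, self.tp)                  # step 1
            for t in self.data:                                   # step 2
                mu_prime = bindings(tp_cur, t)                    # step 3
                if mu_prime is not None:
                    self.pending.append({**mu_cur, **mu_prime})   # step 4
        return self.pending.popleft()

    def close(self):
        self.pred.close()


# Hypothetical queried data set and input solution.
DATA = [
    ("http://geo.example/IT", "ex:stats", "http://db.example/it"),
    ("http://geo.example/IT", "ex:stats", "http://ex.example/it"),
    ("http://geo.example/US", "ex:stats", "http://db.example/us"),
]
mu_cur = {"?p": "http://ex.example/p1", "?loc": "http://geo.example/IT"}

it = TriplePatternIterator(("?loc", "ex:stats", "?s"), SolutionList([mu_cur]), DATA)
it.open()
first, second, end = it.get_next(), it.get_next(), it.get_next()
it.close()
print(first["?s"], second["?s"], end)
```

Because the input solution binds `?loc`, only the two triples about the Italian resource match, and both results carry the bindings for `?p` and `?loc` forward.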

● Results of $I_i$ are solutions for $tp_1, \dots, tp_i$

[Figure: iterator chain $I_{i-1}$ (for $tp_{i-1}$), $I_i$ (for $tp_i$), $I_{i+1}$ (for $tp_{i+1}$)]

Application to Link Traversal

● The queried data set grows
● Look-up requirement: do not evaluate $tp_{cur}$ until the queried data set contains all data that can be retrieved from all URIs in $tp_{cur}$

[Figure: iterator chain $I_{i-1}$ (for $tp_{i-1}$), $I_i$ (for $tp_i$), $I_{i+1}$ (for $tp_{i+1}$)]

Iterator $I_i$ for $tp_i$, applied to each $\mu_{cur}$ from the results of $I_{i-1}$:

1. Substitute: $tp_{cur} = \mu_{cur}[tp_i]$
2. Ensure the look-up requirement for $tp_{cur}$ (initiate look-ups and wait)
3. Find matching triples match($tp_{cur}$) in the queried data set
4. Create a solution $\mu'$ for each $t$ in match($tp_{cur}$)
5. Return each $\mu_{cur} \cup \mu'$ as a result

Blocked Query Execution

● Waiting for URI look-ups blocks query execution

[Figure: iterator chain $I_{i-1}$, $I_i$, $I_{i+1}$, annotated "Initiate look-ups and wait"]

URI Prefetching

● Waiting for URI look-ups blocks query execution
● URI prefetching: when a URI is bound to a variable, initiate its look-up in the background

[Figure: iterator chain $I_{i-1}$, $I_i$, $I_{i+1}$, annotated "Initiate look-up" and "Ensure look-up is finished"]

Iterator $I_i$ for $tp_i$, applied to each $\mu_{cur}$ from the results of $I_{i-1}$:

1. Substitute: $tp_{cur} = \mu_{cur}[tp_i]$
2. Ensure the look-up requirement for $tp_{cur}$
3. Find matching triples match($tp_{cur}$) in the queried data set
4. Create a solution $\mu'$ for each $t$ in match($tp_{cur}$)
5. Initiate a parallel look-up for each new URI in $\mu'$
6. Return each $\mu_{cur} \cup \mu'$ as a result
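Prefetching separates starting a look-up from waiting for it. A minimal sketch with Python's `concurrent.futures`, under stated assumptions: `fetch` is a hypothetical stand-in that sleeps instead of issuing HTTP requests, `FAKE_WEB` and its URIs are made up, and `LookupManager` is an illustrative name, not part of SWClLib.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Web; the URIs are made up.
FAKE_WEB = {
    "http://geo.example/IT": frozenset(
        {("http://geo.example/IT", "ex:stats", "http://db.example/it")}
    ),
    "http://geo.example/US": frozenset(
        {("http://geo.example/US", "ex:stats", "http://db.example/us")}
    ),
}
fetch_log = []  # records which URIs were actually requested


def fetch(uri):
    """Simulated look-up: sleeps to mimic network latency."""
    time.sleep(0.05)
    fetch_log.append(uri)
    return FAKE_WEB.get(uri, frozenset())


class LookupManager:
    def __init__(self, fetch_fn, max_workers=4):
        self._fetch = fetch_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}   # URI -> Future of its look-up
        self.data = set()    # the queried data set

    def initiate(self, uri):
        """Start a background look-up unless one was already started."""
        if uri not in self._futures:
            self._futures[uri] = self._pool.submit(self._fetch, uri)

    def ensure(self, uri):
        """Block until the look-up of uri is finished and its data is added."""
        self.initiate(uri)
        self.data |= self._futures[uri].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


manager = LookupManager(fetch)
for uri in FAKE_WEB:
    manager.initiate(uri)                  # step 5: start look-ups early
manager.initiate("http://geo.example/IT")  # already started, no second request
for uri in FAKE_WEB:
    manager.ensure(uri)                    # step 2: wait only if still running
manager.shutdown()
print(len(manager.data), sorted(fetch_log))
```

`ensure` still blocks when a look-up has not finished, so prefetching shortens waiting time without removing it.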

● Even with URI prefetching, query execution may block
● Possible solutions:
  ● Program parallelism
  ● Asynchronous pipeline
● Drawback: requires a major rewrite of existing query engines

[Figure: iterator chain $I_{i-1}$, $I_i$, $I_{i+1}$, annotated "Initiate look-up" and "Wait until look-up is finished"]

Postponing Iterator

● Enabled by an extension of the iterator paradigm:
  ● New function POSTPONE: take the most recently provided result back
  ● Adjusted GETNEXT: either return the next result or return a formerly postponed result
● POSTPONE allows an iterator to temporarily reject an input solution $\mu_{cur}$

Iterator $I_i$ for $tp_i$, applied to each $\mu_{cur}$ from the results of $I_{i-1}$:

1. Substitute: $tp_{cur} = \mu_{cur}[tp_i]$
2. POSTPONE $\mu_{cur}$ if the look-up requirement doesn't hold for $tp_{cur}$
3. Find matching triples match($tp_{cur}$) in the queried data set
4. Create a solution $\mu'$ for each $t$ in match($tp_{cur}$)
5. Initiate a parallel look-up for each new URI in $\mu'$
6. Return each $\mu_{cur} \cup \mu'$ as a result
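The POSTPONE extension can be sketched as one extra method on the predecessor iterator. The code below is an illustration only: the policy "return fresh solutions first, postponed ones afterwards" is an assumption (the slides just say GETNEXT returns either kind), look-up completion is simulated by a counter, and all names and URIs are hypothetical. A real engine would have to wait, rather than spin, once only postponed solutions remain.

```python
from collections import deque


class PostponableSource:
    """Iterator over fixed solutions, extended with POSTPONE."""

    def __init__(self, solutions):
        self._solutions = list(solutions)

    def open(self):
        self._fresh = deque(self._solutions)
        self._postponed = deque()
        self._last = None

    def get_next(self):
        # Assumed policy: prefer solutions that were never returned before.
        if self._fresh:
            self._last = self._fresh.popleft()
        elif self._postponed:
            self._last = self._postponed.popleft()
        else:
            self._last = None
        return self._last

    def postpone(self):
        """Take the most recently returned solution back for a later GETNEXT."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None

    def close(self):
        self._fresh.clear()
        self._postponed.clear()


# Simulated look-up state: the US look-up is unfinished for two checks.
remaining = {"http://geo.example/US": 2}


def lookup_finished(mu):
    uri = mu["?loc"]
    if remaining.get(uri, 0) > 0:
        remaining[uri] -= 1
        return False
    return True


def consume(source):
    """Process input solutions, postponing those whose look-up is pending."""
    processed = []
    source.open()
    while (mu_cur := source.get_next()) is not None:
        if not lookup_finished(mu_cur):
            source.postpone()   # step 2: reject mu_cur for now
            continue
        processed.append(mu_cur["?loc"])
    source.close()
    return processed


source = PostponableSource(
    {"?loc": uri}
    for uri in ("http://geo.example/US", "http://geo.example/IT", "http://geo.example/DE")
)
order = consume(source)
print(order)
```

The solution for the US resource is handed back twice and processed last, while the other two solutions proceed without waiting for its look-up.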

Evaluation

● Implementation: Semantic Web Client Library (SWClLib) http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
● Berlin SPARQL Benchmark (BSBM)
  ● Simulates e-commerce scenario
  ● Mix of 12 SPARQL queries
  ● Generates datasets of different sizes (scaling factor)
● Simulation of the Web of Linked Data
  ● Linked Data server publishes BSBM datasets
● Experiment
  ● Adjusted BSBM queries link to the simulation server
  ● Execute query mix with SWClLib

[Figure: average execution time per query mix in seconds (axis from 0 to 250) over the BSBM scaling factor (10 to 60), for four configurations: w/o prefetching, w/ prefetching, non-blocking + prefetching, all data retrieved in advance]

scal. factor   # of triples   # of entities
10             4,971          613
20             8,485          928
30             11,999         1,245
40             16,918         1,845
50             22,616         2,599
60             26,108         2,914

Take-away Summary

● Novel query execution approach for the Web of Data:
  ● Utilizes the characteristics of the Web
  ● Traverses RDF links during query execution
  ● Discovery of new data sources
  ● No need to know all data sources in advance
● Implementation approach:
  ● Iterator based execution with URI Prefetching
  ● Extension of the iterator paradigm (POSTPONE)
● New research challenges:
  ● Improving result completeness
  ● Investigating suitable caching strategies


Try it!

● SQUIN http://squin.org
  ● Provides SWClLib functionality as a Web service
  ● Accessible like a SPARQL endpoint
● Public SQUIN service at http://squin.informatik.hu-berlin.de/SQUIN/


These slides have been created by Olaf Hartig

http://olafhartig.de

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)