Processing Queries on Top of Linked Data and Sensor Data

Cry Distribution

Marcel Karnstedt

Digital Enterprise Research Institute www.deri.ie

© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.




Linked Data

  Use URIs as names for things (documents, people, organisations, products, …)

  Use HTTP URIs so that people can look up those names

  When someone looks up a URI, provide useful information, typically structured data in RDF

  Include links to other URIs, so that they can discover more things

http://www.w3.org/DesignIssues/LinkedData


Accessing Linked Data


Linked Data in Practice

[Figure: the User Agent looks up http://www.polleres.net/foaf.rdf#me by sending an HTTP GET for http://www.polleres.net/foaf.rdf to the Web Server, which returns RDF]
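
The lookup of a hash URI can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `build_lookup` and the `Accept` value are assumptions, and the request is only built, never sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_lookup(uri: str, accept: str = "application/rdf+xml") -> Request:
    """Build (but do not send) the HTTP GET request for a Linked Data URI.

    Hypothetical helper for illustration only.
    """
    # The fragment (#me) identifies the thing; only the document part
    # of the URI is sent to the web server.
    document_url, _fragment = urldefrag(uri)
    return Request(document_url, headers={"Accept": accept}, method="GET")


request = build_lookup("http://www.polleres.net/foaf.rdf#me")
print(request.get_method(), request.full_url)
```

A real client would pass the request to `urllib.request.urlopen` and hand the response body to an RDF parser.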


Forward References

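[Figure: the User Agent sends an HTTP GET for http://dbpedia.org/resource/Gordon_Brown; the Web Server answers with a 303 redirect; a second HTTP GET for http://dbpedia.org/data/Gordon_Brown returns RDF. The HTML counterpart is http://dbpedia.org/page/Gordon_Brown]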
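
The redirect behaviour can be simulated without any network access. The sketch below is an assumption-laden stand-in: `fake_get` is a hypothetical in-memory server, and the mapping from `Accept` values to the two target documents is chosen for illustration.

```python
RESOURCE = "http://dbpedia.org/resource/Gordon_Brown"

# Hypothetical content negotiation table (assumption, not taken from the slides).
REDIRECTS = {
    "application/rdf+xml": "http://dbpedia.org/data/Gordon_Brown",
    "text/html": "http://dbpedia.org/page/Gordon_Brown",
}


def fake_get(url, accept):
    """Tiny in-memory web server: returns (status, location_or_body)."""
    if url == RESOURCE:
        return 303, REDIRECTS[accept]          # "see other": go to a document
    if url in REDIRECTS.values():
        return 200, f"<representation of {url}>"
    return 404, ""


def dereference(url, accept, max_hops=5):
    """Follow 303 redirects until a document (or an error) is returned."""
    for _ in range(max_hops):
        status, payload = fake_get(url, accept)
        if status == 303:
            url = payload                       # second HTTP GET goes here
            continue
        return status, url, payload
    raise RuntimeError("too many redirects")


print(dereference(RESOURCE, "application/rdf+xml")[:2])
```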


Consumption - Essentials

  Linked Data provides a global data space with a uniform API (due to RDF as the data model)

  Access methods
    Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)
    SPARQL (‘the SQL of RDF’)
    Data dumps (RDF/XML, etc.)

  Metadata about LOD datasets
    voiD (http://semanticweb.org/wiki/VoiD)
    Allows selecting datasets based on their characteristics (topic, license, interlinking, formats, etc.)


Consumption - Technologies

  Basic Linked Data access mechanisms (HTTP interface & RDF parsing) are widely supported in all major platforms and languages, such as Java, PHP, C/C++/.NET, etc.

  Inspection and debugging tools
    Command line tools (curl, rapper, etc.)
    Online tools
      http://redbot.org/ (HTTP/low-level)
      http://sindice.com/developers/inspector (RDF/data-level)

  SPARQL endpoints (generic and dataset-specific): http://esw.w3.org/SparqlEndpoints


Gimme URIs…!!

  Distributed setup: need for a central point of access (indexer, aggregator)

  Sindice, an index of the Web of Data   http://sindice.com/

  Sig.ma, Web of Data aggregator & browser   http://sig.ma/

  Relationship discovery   http://relfinder.semanticweb.org/

  But where is the DB…?!   Complex queries, efficient storage, quick access, etc.


Ranking

  Huge data, millions of sources, messiness, RDF model, …
    Requires special ranking

  TF-IDF style, graph based (PageRank style), cardinality based (histogram style), etc.
    Triple level
    URI level
    Document level
    Source level
    Domain level
    …
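
As one illustration of the graph-based (PageRank style) option at source level, the following sketch runs a plain power iteration over a small link graph. The source names, the damping factor and the iteration count are assumptions made for the example; the slides do not prescribe a concrete algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each source to the list of sources it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling source: spread its rank evenly over all sources.
                share = damping * rank[node] / len(nodes)
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical sources and links between them.
source_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(source_links)
for source in sorted(ranks, key=ranks.get, reverse=True):
    print(source, round(ranks[source], 3))
```

The same iteration could be run at any of the levels listed above by changing what a node represents (triple, URI, document, source, domain).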


Skyline Ranking

  Objects that are not “dominated” by other objects
    Scoring function on multiple attributes, no weighting
    In contrast to (multidimensional) top-k
    No straightforward IR operation

[Figure: skyline plots marking the dominated objects; axis labels: price, distance, age, time]
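
A naive skyline computation makes the notion of dominance concrete. This is a quadratic-time sketch under the assumption that lower values are better in every attribute; the (price, distance) tuples are made-up sample data.

```python
def dominates(a, b):
    """a dominates b: no worse in every attribute, strictly better in one.

    Assumes lower is better for all attributes.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objects described by (price, distance).
hotels = [(50, 8), (80, 2), (60, 5), (90, 3), (70, 9)]
print(skyline(hotels))
```

No weights are needed: (90, 3) drops out because (80, 2) is better in both attributes, whereas top-k would require a scoring function that combines price and distance.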


Querying

SELECT ?f ?n
WHERE {
  an:f#ah foaf:knows ?f .
  ?f foaf:name ?n .
}

Result columns: ?f ?n

SELECT ?x1 ?x2
WHERE {
  dblppub:HoganHP08 dc:creator ?a1, ?a2 .
  ?x1 owl:sameAs ?a1 .
  ?x2 owl:sameAs ?a2 .
  ?x1 foaf:knows ?x2 .
  ?x2 foaf:knows ?x1 .
}
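
How a conjunctive query such as the first one is answered can be sketched as nested-loop matching of triple patterns against an in-memory triple set. The function names and the sample triples are assumptions for illustration; real engines use indexes and join ordering instead of scanning all triples.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check consistency with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, triples):
    """Evaluate a list of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data; prefixed names are treated as opaque strings.
TRIPLES = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
    ("ex:someone", "foaf:name", "Someone Else"),
]
QUERY = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
print(evaluate(QUERY, TRIPLES))
```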


Query Algebra

WHERE {
  ?s1 <p1> ?o1 .
  ?s2 ?p2 ?o2 .
  ?s2 <add> ?add .
  FILTER (edist(?o1, ?o2) < k) .
  FILTER (?p2 = <pred>)
}

[Figure: operator tree for the query above, built from the following nodes]

  ξ([?s1··?o1 → ·<p1>·])
  ξ([?s2·?p2·?o2 → ···])
  σ(?p2=<pred>)
  ξ([?s2·?p2·?o2 → ·<pred>·])
  (edist(?o1,?o2)<k)
  ω(?s2;[?s2·· ?add → · <add> ·])


Querying Data Across Sources

  Data warehousing or materialisation-based approaches (MAT)

  CRAWL → INDEX → SERVE

  RDBMS
    One big table

  Property tables
    Vertical storage, column stores

  Hybrid approaches, such as Virtuoso

  Native stores, such as YARS
    Special (simplified) structures, special indexes!


Indexing in YARS etc.

  Index the different parts of a triple
    Ideally: all combinations

  Optimised for read-only access
    With prefix support: only 6 (spo, sop, pso, pos, osp, ops)

  Trade-off: storage vs. performance, read vs. write

  Optional special indexes (full-text, string similarity, …)

  Example triples:
    <x> name Xavier
    <x> knows <friend>
    <x> seeAlso <link>
    <x> sameAs <y>
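
The six orderings and the prefix lookup can be sketched as follows. This is not YARS code: sorted Python lists stand in for the on-disk index structures, and the class and method names are assumptions made for the example.

```python
from bisect import bisect_left
from itertools import permutations


class TripleIndex:
    """Six sorted orderings (spo, sop, pso, pos, osp, ops) of the same triples."""

    def __init__(self, triples):
        self.indexes = {}
        for order in permutations("spo"):
            keys = sorted(tuple(t["spo".index(c)] for c in order) for t in triples)
            self.indexes["".join(order)] = keys

    def lookup(self, s=None, p=None, o=None):
        bound = {"s": s, "p": p, "o": o}
        known = [c for c in "spo" if bound[c] is not None]
        free = [c for c in "spo" if bound[c] is None]
        order = "".join(known + free)            # bound positions form the prefix
        prefix = tuple(bound[c] for c in known)
        keys = self.indexes[order]
        results = []
        for key in keys[bisect_left(keys, prefix):]:
            if key[:len(prefix)] != prefix:
                break                            # left the matching range
            triple = dict(zip(order, key))
            results.append((triple["s"], triple["p"], triple["o"]))
        return results


TRIPLES = [
    ("<x>", "name", "Xavier"),
    ("<x>", "knows", "<friend>"),
    ("<x>", "seeAlso", "<link>"),
    ("<x>", "sameAs", "<y>"),
]
index = TripleIndex(TRIPLES)
print(index.lookup(p="knows"))
```

Every combination of bound positions is a prefix of one of the six orderings, so each lookup becomes a single range scan; the price is storing every triple six times, which is the storage side of the trade-off.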


Live Queries

  Live lookups, on-demand querying

[Figure: a relational query (SELECT * FROM…) over relations R and S accessed via ODBC, next to a SPARQL query (SELECT ?s WHERE…) over triple patterns (TP) resolved via HTTP GET]


Live Queries: Approaches

Andreas Harth Data · 15.03.2010

SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n. }

[Figure: the query is split into the triple patterns TP (an:f#ah foaf:knows ?f) and TP (?f foaf:name ?n); for each pattern the engine selects source(s), issues an HTTP GET and receives RDF; result: ?f = http://danbri.org/foaf.rdf#danbri, ?n = Dan Brickley]

  Direct lookups
    Dereferencing URIs, recursive

  Data summaries
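
The direct-lookup approach can be sketched with a mocked web. Everything here is a simplification: `WEB` is a hypothetical in-memory replacement for HTTP GET plus RDF parsing, and the starting URI under example.org is invented because the slide only gives the abbreviated name.

```python
# Hypothetical documents keyed by document URI.
WEB = {
    "http://example.org/ah.rdf": [
        ("http://example.org/ah.rdf#ah", "foaf:knows",
         "http://danbri.org/foaf.rdf#danbri"),
    ],
    "http://danbri.org/foaf.rdf": [
        ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
    ],
}


def dereference(uri):
    """Stand-in for HTTP GET + RDF parsing: strip the fragment, fetch triples."""
    return WEB.get(uri.split("#", 1)[0], [])


def friends_with_names(person):
    """Answer { person foaf:knows ?f . ?f foaf:name ?n } by following links."""
    results = []
    for s, p, friend in dereference(person):          # first triple pattern
        if s == person and p == "foaf:knows":
            for s2, p2, name in dereference(friend):  # recursive lookup
                if s2 == friend and p2 == "foaf:name":
                    results.append((friend, name))
    return results


print(friends_with_names("http://example.org/ah.rdf#ah"))
```

The source selection step is implicit here: the URIs bound so far decide which documents are fetched next. A data summary would instead tell the engine up front which sources may contribute to a pattern.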


Federated/Distributed Querying

  Federation feature in SPARQL 1.1
    Directly refer to sources
    Automatically split/copy queries and forward them

  Depends on query capabilities!
    Simple sources vs. SPARQL endpoints etc.

  Similar issues as in central engines
    Indexing, local stores, …

  New challenges
    Availability, guarantees, robustness, consistency, …


Basic federated Queries (time permitting)

  http://www.w3.org/TR/sparql11-federated-query/

  Will be integrated in Query spec

  Essentially a new pattern: SERVICE
    Similar to GRAPH
    Allows delegating query parts to a specific (remote) endpoint

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N
FROM <http://www.w3.org/People/Berners-Lee/card>
WHERE {
  { <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?F .
    ?F foaf:name ?N }
  UNION
  { SERVICE <http://dblp.l3s.de/d2r/sparql>
    { [ foaf:maker <http://dblp.l3s.de/…/authors/Tim_Berners-Lee>, [ foaf:name ?N ] ] . } }
}


Total Decentralisation

[Figure: a DB-style query, “Find all reviews for movies made in 1994 in central Europe!”, that spans three separate sources: RDF: Geonames data, RDF: IMDB data, RDF: reviews]


Indexing in UniStore

  Use distributed hash tables (DHT)
    Indexing of attributes = key for hashing
    Which attributes? All!

  Example triples:
    <x> name Xavier
    <x> knows <friend>
    <x> seeAlso <link>
    <x> sameAs <y>

  h(s) for subject lookup
  h(p1 || o1) for ?s pi ?o .
  h(p2 || o2) filter (?o ≥ v) ... (prefix search)

  ...trade-off storage vs. performance
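
The placement of triples in a DHT can be sketched as below. This is not UniStore code: SHA-1 modulo the number of peers is an assumed stand-in for the hash function h, only the keys h(s) and h(p || o) are modelled, and the prefix/range search mentioned above is not supported because a cryptographic hash does not preserve order.

```python
import hashlib


def h(key: str, num_peers: int) -> int:
    """Assumed hash function: map a key to the peer responsible for it."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


def index_keys(triple):
    """Keys under which a triple is stored: subject, and predicate || object."""
    s, p, o = triple
    return [s, p + "||" + o]


def build_dht(triples, num_peers):
    peers = {i: [] for i in range(num_peers)}
    for triple in triples:
        for key in index_keys(triple):
            # Each triple is stored once per key: storage vs. performance.
            peers[h(key, num_peers)].append((key, triple))
    return peers


def lookup(peers, key):
    """Route to the responsible peer and filter its local entries."""
    return [t for k, t in peers[h(key, len(peers))] if k == key]


TRIPLES = [
    ("<x>", "name", "Xavier"),
    ("<x>", "knows", "<friend>"),
    ("<x>", "seeAlso", "<link>"),
    ("<x>", "sameAs", "<y>"),
]
peers = build_dht(TRIPLES, num_peers=4)
print(lookup(peers, "knows||<friend>"))
```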


Robustness: Parallel Execution

  Goal: stateless processing → “push” approach
    Messages contain both the plan and intermediate results (based on Mutant Query Plans [Papadimos et al. 02])
    Receiver peer is identified by applying the hash function
    Multiple instances of the plan travel through the network

[Figure: peers p0 to p5 holding the fragments {(A,1),(A,2)}, {(A,3),(A,4)}, {(A,5),(A,6)}, {(B,5),(B,6)} and {(B,2),(C,1),(B,2),(C,4)}; p0 sends instances of the plan σ(A) B out in parallel, carrying intermediate results such as {(A,2)}, {(A,5)} and {(A,3),(A,4)}; the results {(A,5),(B, 5)} and {(A,2,B,2,C,1), (A,2,B,2,C,4)} return to p0]


Less Bandwidth: Sequential Execution

[Figure: peers p0 to p5 holding the fragments {(A,1),(A,2)}, {(A,3),(A,4)}, {(A,5),(A,6)}, {(B,5),(B,6)} and {(B,2),(C,1),(B,2),(C,4)}; a single instance of the plan σ(A) B visits the peers one after another and accumulates {(A,2)}, then {(A,2),(A,3),(A,4)}, then {(A,2),(A,3),(A,4),(A,5)}, then {(A,2),(A,3),(A,4),(A,5,B,5)}; the result {(A,2,B,2,C,1),(A,2,B,2,C,4),...} returns to p0]

  All peers can be queried in a sequence

  Decision at each peer: adaptive query processing


Mixed Query Execution

[Figure: peers p0 to p5 holding the fragments {(A,1),(A,2)}, {(A,3),(A,4)}, {(A,5),(A,6)}, {(B,5),(B,6)} and {(B,2),(C,1),(B,2),(C,4)}; the plan σ(A) B combines sequential forwarding, accumulating {(A,2)}, {(A,2),(A,3),(A,4)} and {(A,2),(A,3),(A,4),(A,5)}, with parallel plan instances; the results {(A,5),(B, 5)} and {(A,2,B,2,C,1), (A,2,B,2,C,4)} return to p0]

  May result in unpredictable behavior

  “Fire and forget”
    A peer may see the same query multiple times
    Different data to process
    Different operators to process


Reasoning


In Principle

  Machine-interpretable representation of data allows for deductive reasoning
    Drawing conclusions from axioms and data

  Web Ontology Language (OWL) provides constructs supporting entity consolidation
    sameAs, inverse-functional properties (mbox_sha1sum)

  Reasoning can further be used to:
    Unite fractured data sets
    Disambiguate entities
    Check consistency of knowledge bases


Example

  General idea: answer queries with implicit answers

  Simplified example:
    :jeff rdf:type foaf:Person & :jeff foaf:knows :aidan

    query: select ?x { ?x rdf:type foaf:Agent }
      → foaf:Person rdfs:subClassOf foaf:Agent
      → :jeff as result

    query: select ?x { ?x rdf:type foaf:Person }
      → foaf:knows rdfs:range foaf:Person
      → :jeff and :aidan as result

  Inverse-functional properties, sameAs, subPropertyOf etc.
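
The two inferences in the example can be reproduced with a small forward-chaining loop. The sketch covers only the subClassOf and range rules, uses the standard `rdf:type` property and FOAF class names as plain strings, and is an illustration rather than a complete RDFS reasoner.

```python
def rdfs_closure(triples):
    """Apply the subClassOf and range rules until no new triple is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p == "rdf:type":
                # (s type C), (C subClassOf D)  =>  (s type D)
                for c, q, d in inferred:
                    if q == "rdfs:subClassOf" and c == o:
                        new.add((s, "rdf:type", d))
            # (s p o), (p range C)  =>  (o type C)
            for prop, q, cls in inferred:
                if q == "rdfs:range" and prop == p:
                    new.add((o, "rdf:type", cls))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


DATA = [
    (":jeff", "rdf:type", "foaf:Person"),
    (":jeff", "foaf:knows", ":aidan"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
]
closure = rdfs_closure(DATA)
persons = {s for s, p, o in closure if p == "rdf:type" and o == "foaf:Person"}
agents = {s for s, p, o in closure if p == "rdf:type" and o == "foaf:Agent"}
print(sorted(persons), sorted(agents))
```

With both rules active, :aidan is also returned as an agent, since the range rule first makes :aidan a person and the subclass rule then applies.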


Problems…

  Usually expensive, huge amount of “new” facts

  Potential conflicts due to inconsistencies
    Potentially infinite results

  “Ontology hijacking”
    e.g. foaf:Person subClassOf my:Person
    A new statement for each Person in the dataset

  Non-distinguished variables
    SELECT ?X { ?X :hasFather ?Y }
    No such triple in the data, but “every person has a father”?!

  08445a31a78661b5c746feff39a9db6e4e2cc5cf
    sha1-sum of ‘mailto:’
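
The last item points at a data-quality trap with inverse-functional properties: if an exporter hashes an empty mailbox, every affected person ends up with the same value. The sketch below shows how such a value arises; the helper name and the sample addresses are assumptions, and the code does not claim to reproduce the exact digest quoted above.

```python
import hashlib


def mbox_sha1sum(mailbox: str) -> str:
    """SHA-1 over the mailto: URI of a mailbox, as a hex string."""
    return hashlib.sha1(("mailto:" + mailbox).encode("utf-8")).hexdigest()


# Two unrelated people whose export tool left the address empty (hypothetical).
alice = mbox_sha1sum("")
bob = mbox_sha1sum("")
carol = mbox_sha1sum("carol@example.org")

# Identical values on an inverse-functional property would make a reasoner
# conclude that alice and bob are the same entity.
print(alice == bob, alice == carol)
```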


Implementation

  Materialise inferred triples
    Forward chaining

  Query rewriting, recursive/iterative
    Backward chaining

  On-the-fly with data summaries
    Ongoing research

  Stateless query expansion
    Parallel sub-queries


Query Expansion

[Figure: query expansion shown in steps: unexpanded query, map operators added, first mapping, expanded query]


Querying Sensor Data


Processing Paradigms


AnduIN


CQL


In-Network Query Processing


Example

  Anomalies in sensor networks   Sensors deliver measurements

[Figure: sensors s1 to s41 plotted by position in the x/y plane]


Example

  Anomalies in sensor networks   Identify anomalies from the stream



Example

  Anomalies in sensor networks   Determine anomalous regions



Example

  Anomalies in sensor networks   Respect obstacles



Example

  Anomalies in sensor networks   Obstacles change regions



Example Scenario

  Storms in California


Anomaly Degrees and Regions

  Regions for different thresholds
    Triangulated Wireframe Surface (TWS)

[Figure: degree plane at height 0.25 and degree plane at height 0.4]


IN Region Detection

  Focus on energy consumption   But cost model supports multiple dimensions


Cost Estimation

  In streams: continuous queries

  Thus, query planning should be adaptive   Not “optimise first, execute next” any more

  When and how to re-optimise

  Forecast, e.g., by exponential smoothing

  Decide between alternative query plans
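
Forecasting by exponential smoothing and choosing between alternative plans can be sketched as follows. The cost histories, the smoothing factor and the plan names are made-up values; the slides do not specify the cost model at this level of detail.

```python
def exponential_smoothing(values, alpha=0.5):
    """Return the smoothed forecast after seeing all observed values."""
    forecast = values[0]
    for value in values[1:]:
        # New forecast = weighted mix of latest observation and old forecast.
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


def choose_plan(cost_history, alpha=0.5):
    """Pick the plan with the lowest forecast cost."""
    forecasts = {
        plan: exponential_smoothing(costs, alpha)
        for plan, costs in cost_history.items()
    }
    return min(forecasts, key=forecasts.get), forecasts


# Hypothetical observed energy costs per evaluation round.
history = {
    "in_network": [10, 20, 30],
    "central": [25, 24, 23],
}
best, forecasts = choose_plan(history)
print(best, forecasts)
```

Because the query is continuous, the forecast can be recomputed after every round, and the engine can switch plans once the forecast for the running plan exceeds that of an alternative.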


Eval: Anomalies

  Anomaly rate and size of sliding window


Eval: Anomalies /2

  Number of leader nodes


Eval: Anomalous Region

  Anomaly rate


Brief Wrap-Up

  Different approaches for querying SemWeb data

  Linked Data is inherently distributed
    ...but not inherently dynamic?!

  Distributed approaches are promising:
    Support dynamic data
    Scalable

  Query processing in sensor networks shows similarities

  Scalability requirements suggest focusing on distributed approaches; resource limitations demand it