On the need for a W3C community group on RDF Stream Processing ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013 Oscar Corcho [email protected], [email protected] @ocorcho http://www.slideshare.net/ocorcho/

OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

DESCRIPTION

Keynote given at the OrdRing 2013 workshop on the need for a W3C community group on RDF Stream Processing

Page 1: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

On the need for a W3C community group on RDF Stream Processing

ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013

Oscar Corcho [email protected], [email protected]

@ocorcho http://www.slideshare.net/ocorcho/

Page 2: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Disclaimer…

This presentation expresses my own view, not necessarily that of the rest of the group (although I hope it is similar)

Page 3: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Acknowledgements

• All those that I have “stolen” slides, material and ideas from
• Emanuele Della Valle
• Daniele Dell’Aglio
• Marco Balduini
• Jean Paul Calbimonte
• And many others who have already started contributing…

Page 4: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Why set up a community group?

Heterogeneity:
• In RDF Stream models (timestamps, events, time intervals, triple-based, graph-based, …)
• In RDF Stream query languages (windows, stream selection, CEP-based operators, …)
• In implementations (RDF native, query rewriting, continuous query registration, scalability, static vs streaming data, …)
• In operational semantics (tick, window content, report)

Page 5: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

You may think that we do not like heterogeneity…

Page 6: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

But at least I love it…

• However, we need to tell people what to expect from each system, and smooth out differences when they are not crucial…

Page 7: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

The solution…

• Let’s create a W3C community group…
  • To understand better those differences
  • The requirements on which we are based
  • And explain to others
  • …
  • And maybe get some “recommendation” out

Page 8: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

The W3C RDF Stream Processing Comm. Group

• http://www.w3.org/community/rsp/

Page 9: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

W3C RSP Community Group mission

“The mission of the RDF Stream Processing Community Group (RSP) is to define a common model for producing, transmitting and continuously querying RDF Streams. This includes extensions to both RDF and SPARQL for representing streaming data, as well as their semantics. Moreover this work envisions an ecosystem of streaming and static RDF data sources whose data can be combined through standard models, languages and protocols. Complementary to related work in the area of databases, this Community Group looks at the dynamic properties of graph-based data, i.e., graphs that are produced over time and which may change their shape and data over time.”

Page 10: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Use cases

• We have started collecting them

• And I hope that by the end of my talk you will consider contributing some more…

Page 11: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

A template to describe use cases (I)

• Streaming Information
  • Type: Environmental data: temperatures, pressures, salinity, acidity, fluid velocities, etc.
  • Nature:
    • Relational stream: yes
    • Text stream: no
  • Origin: Data is produced by sensors in oil wells and on oil and gas platform equipment. Each oil platform has an average of 400.000.
  • Frequency of update:
    • From sub-second to minutes
    • In triples/minute: [10000-10] t/min
  • Quality: It varies, due to instrument/sensor issues
  • Management / access
    • Technology in use: Dedicated (relational and proprietary) stores
    • Problems: The ability of users to access data from different sources is limited by an insufficient description of the context
    • Means of improvement: Add context (metadata) to the data so it becomes meaningful, and use reasoning techniques to process that metadata

Page 12: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

A template to describe use cases (II)

• [optional] Static Information required to interpret the streaming information
  • Type: Topology of the sensor network, position of each sensor, the descriptions of the oil platform
  • Origin: Oil and gas production operations
  • Dimension:
    • 100s of MB as PostGIS dump
    • In triples: 10^8
  • Quality: Good
  • Management / access
    • Technology in use: RDBMS, proprietary technologies
    • Available Ontologies and Vocabularies: Reference Semantic Model (RSM), based on ISO 15926

Page 13: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

A tale of four heterogeneities

Page 14: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Heterogeneity #1: Representing RDF Streams

Page 15: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

What is an RDF stream?

• Several possibilities:
  • An RDF stream is an infinite sequence of timestamped events (triples or graphs), where timestamps are non-decreasing
      … <event_i, t_i> <event_i+1, t_i+1> <event_i+2, t_i+2> …
  • An RDF stream is an infinite sequence of triple occurrences <<s,p,o>, tα, tω>, where <s,p,o> is an RDF triple and tα and tω are the start and end of the interval
• How are timestamps assigned?
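
The two candidate definitions above can be written down as data types. The following is a minimal sketch only: the class names, the field names and the helper `check_non_decreasing` are hypothetical and do not come from any of the systems discussed in this talk.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedEvent:
    """Point-in-time model: one event (here a single triple) with one timestamp."""
    triple: Triple
    t: float


@dataclass(frozen=True)
class TripleOccurrence:
    """Interval model: a triple that holds from t_start to t_end."""
    triple: Triple
    t_start: float
    t_end: float


def check_non_decreasing(stream: Iterable[TimestampedEvent]) -> Iterator[TimestampedEvent]:
    """Yield events unchanged; raise ValueError if a timestamp goes backwards."""
    last = float("-inf")
    for event in stream:
        if event.t < last:
            raise ValueError(f"timestamp {event.t} is earlier than {last}")
        last = event.t
        yield event
```

Equal consecutive timestamps are accepted, since the definition only asks for non-decreasing order. The sketch does not answer the open question of who assigns the timestamps.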

Page 16: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Some examples…

• What would be the best/possible RDF stream representation for the following types of problems?
  • Does Alice meet Bob before Carl?
  • Who does Carl meet first?
  • How many people has Alice met in the last 5m?
  • Does Diana meet Bob and then Carl within 5m?
  • Which are the meetings that last less than 5m?
  • Which are the meetings with conflicts?

[Figure: the same four events drawn twice on a time axis t (ticks at 1, 3, 6, 9): e1 = :alice :isWith :bob, e2 = :alice :isWith :carl, e3 = :bob :isWith :diana, e4 = :diana :isWith :carl]

Page 17: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Data types for semantic streams - Summary

• Multiple notions of RDF stream proposed
  • Ordered sequence (implicit timestamp)
  • One timestamp per triple (point-in-time semantics)
  • Two timestamps per triple (interval-based semantics)
• Comparison between existing approaches
• More investigation is required to agree on an RDF stream model

System                | Data item | Time model    | # of timestamps
INSTANS               | triple    | Implicit      | 0
C-SPARQL              | triple    | Point in time | 1
SPARQLstream          | triple    | Point in time | 1
CQELS                 | triple    | Point in time | 1
Sparkwave             | triple    | Point in time | 1
Streaming Linked Data | RDF graph | Point in time | 1
ETALIS                | triple    | Interval      | 2

Page 18: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Heterogeneity #2: RDF Stream processors

Page 19: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Existing RDF Stream Processing systems

• C-SPARQL: RDF Store + Stream processor
  • Combined architecture
• CQELS: Implemented from scratch. Focus on performance
  • Native + adaptive joins for static data and streaming data
• CQELS-Cloud: Reusing Storm
  • Paper presentation on Thursday

[Figure: three architectures. C-SPARQL query → RDF Store (static) + Stream processor (streaming) → continuous results. CQELS query → Native RSP → continuous results. CQELS query → translator → Storm topology → continuous results.]

Page 20: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Existing RSP systems

• EP-SPARQL: Complex-event detection
  • SEQ, EQUALS operators
• SPARQLStream: Ontology-based stream query answering
  • Virtual RDF views, using R2RML mappings
  • SPARQL stream queries over the original data streams
• Instans: RETE-based evaluation

[Figure: two architectures. EP-SPARQL query → translator → Prolog engine → continuous results. SPARQLStream query → rewriter (using R2RML mappings) → DSMS/CEP → continuous results.]

Page 21: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Query languages for semantic streams - Summary

• Different architectural choices
  • It is not clear when each choice is best for which type of use case
  • Wrappers over existing systems
    • C-SPARQL, ETALIS, SPARQLstream, CQELS-Cloud
    • Better reliability and maintainability?
  • Native implementations
    • CQELS, Streaming Linked Data, INSTANS
    • Better scalability: optimizations that are not possible in other systems
• Different operational semantics
  • See later

Page 22: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Heterogeneity #3: Querying RDF Streams

Page 23: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Querying data streams (from CQL to SPARQL-X)

[Figure: the CQL model and its RDF counterpart.
  • Streams: infinite unbounded bag of timestamped elements … <s,τ> …
  • Relations: finite bag; mapping T → R, written R(t)
  • Operators: stream-to-relation (S2R), relation-to-relation (R2R), relation-to-stream (R2S)
  • RDF version: RDF Stream → S2R window operators → RDF → SPARQL operators (R2R) → R2S operators → RDF Stream]
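
To make the S2R step concrete, here is a minimal sketch of a time-based window that turns a timestamped stream into a finite relation. The function name `time_window`, the sample data and the choice of a half-open interval (now - width, now] are assumptions made for illustration; real systems differ precisely on such boundary choices.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Item = Tuple[Triple, float]     # (triple, timestamp)


def time_window(stream: List[Item], now: float, width: float) -> List[Triple]:
    """S2R: return the triples whose timestamp lies in (now - width, now]."""
    return [triple for triple, t in stream if now - width < t <= now]


# Hypothetical sample stream, for illustration only.
sample = [
    ((":alice", ":isWith", ":bob"), 1.0),
    ((":alice", ":isWith", ":carl"), 3.0),
    ((":bob", ":isWith", ":diana"), 6.0),
]

# Finite relation seen by the R2R (SPARQL) operators at time 6 with a width of 5.
relation = time_window(sample, now=6.0, width=5.0)
```

With these assumed boundaries the item stamped 1.0 falls just outside the window evaluated at time 6.0, so `relation` holds the last two triples only.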

Page 24: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Output: relation

• Case 1: the output is a set of timestamped mappings

Queries sent to the RSP:
SELECT ?a ?b … FROM … WHERE …
CONSTRUCT { ?a :prop ?b } FROM … WHERE …

Bindings returned for the SELECT query:
?a … ?b … [t1]
?a … ?b …
?a … ?b … [t3]
?a … ?b … [t5]
?a … ?b … [t7]

Triples returned for the CONSTRUCT query:
<… :prop …> [t1]
<… :prop …>
<… :prop …> [t3]
<… :prop …> [t5]
<… :prop …> [t7]

Page 25: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Output: stream

• Case 2: the output is a stream
• R2S operators

CONSTRUCT RSTREAM { ?a :prop ?b } FROM … WHERE …

Stream produced by the RSP for this query:
… <… :prop …> [t1] <… :prop …> [t1] <… :prop …> [t3] <… :prop …> [t5] <… :prop …> [t7] …

ISTREAM: stream out data in the last step that wasn’t on the previous step

DSTREAM: stream out data in the previous step that isn’t in the last step

RSTREAM: stream out all data in the last step
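
The three R2S operators above can be sketched as functions over the results of two consecutive evaluation steps. This is an illustration under assumptions, not the implementation of any listed system: results are treated as bags (multisets) via Python's `collections.Counter`, and the function names simply mirror the operator names.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def rstream(current: Iterable[Hashable]) -> List[Hashable]:
    """RSTREAM: emit everything in the last step."""
    return list(current)


def istream(previous: Iterable[Hashable], current: Iterable[Hashable]) -> List[Hashable]:
    """ISTREAM: emit what is in the last step but was not in the previous one."""
    return list((Counter(current) - Counter(previous)).elements())


def dstream(previous: Iterable[Hashable], current: Iterable[Hashable]) -> List[Hashable]:
    """DSTREAM: emit what was in the previous step but is not in the last one."""
    return list((Counter(previous) - Counter(current)).elements())
```

For example, with a previous result `["a", "b"]` and a current result `["b", "c"]`, `istream` yields `["c"]`, `dstream` yields `["a"]` and `rstream` yields `["b", "c"]`.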

Page 26: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Other operators

• Sequence operators and CEP world

[Figure: events e1 to e4 of a stream S on a time axis (ticks at 1, 3, 6, 9), illustrating the "Sequence" and "Simultaneous" cases]

SEQ: joins e[ti,tf] and e’[ti’,tf’] if e’ occurs after e

EQUALS: joins e[ti,tf] and e’[ti’,tf’] if they occur simultaneously

OPTIONALSEQ, OPTIONALEQUALS: optional join variants
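
A minimal sketch of the two join conditions over interval-stamped events follows. It rests on two assumptions that the slide does not state: "occurs after" is read as "e’ starts strictly after e ends", and "simultaneously" as "same start and same end". The `Event` class and the sample intervals used in the note below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    name: str
    t_i: float  # start of the interval
    t_f: float  # end of the interval


def seq(left: List[Event], right: List[Event]) -> List[Tuple[Event, Event]]:
    """SEQ: pair e with e2 when e2 starts strictly after e ends (assumed reading)."""
    return [(e, e2) for e in left for e2 in right if e2.t_i > e.t_f]


def equals(left: List[Event], right: List[Event]) -> List[Tuple[Event, Event]]:
    """EQUALS: pair e with e2 when both cover exactly the same interval (assumed reading)."""
    return [(e, e2) for e in left for e2 in right
            if e.t_i == e2.t_i and e.t_f == e2.t_f]
```

With `a = Event("a", 1, 3)`, `b = Event("b", 6, 9)` and `c = Event("c", 1, 3)`, `seq([a], [b, c])` pairs `a` with `b` only, while `equals([a], [b, c])` pairs `a` with `c` only.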

Page 27: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Query languages for semantic streams - Summary

• Comparison between existing approaches
• Is it time to converge on a standard?

System | S2R | R2R | Time-aware | R2S
INSTANS | Based on time events | SPARQL update | Based on time events | Ins only
C-SPARQL Engine | Logical and triple-based | SPARQL 1.1 query | timestamp function | Batch only
SPARQLstream | Logical and triple-based | SPARQL 1.1 query | no | Ins, batch, del
CQELS | Logical and triple-based | SPARQL 1.1 query | no | Ins only
Sparkwave | Logical | SPARQL 1.0 | no | Ins only
Streaming Linked Data | Logical and graph-based | SPARQL 1.1 | no | Batch only
ETALIS | no | SPARQL 1.0 | SEQ, PAR, AND, OR, DURING, STARTS, EQUALS, NOT, MEETS, FINISHES | Ins only

Page 28: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Query languages for semantic streams - Issues

• Different syntax for S2R operator
• Semantics of query languages is similar, but not identical
• Lack of R2S operator in some cases
• Different support for time-aware operators

Page 29: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Classification of existing systems

Page 30: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Heterogeneity #4: Operational Semantics

Page 31: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Operational Semantics

[Figure: stream S with four items S1 to S4 on a time axis t (ticks at 1, 3, 6, 9), carrying the triples :bob :isIn :hall, :bob :isIn :kitchen, :alice :isIn :hall and :alice :isIn :kitchen]

Where are both alice and bob in the last 5s?

System 1: :hall [5] :kitchen [10]

System 2: :hall [3] :kitchen [9]

Both correct?

ISWC 2013 evaluation track: "On Correctness in RDF stream processor benchmarking" by Daniele Dell’Aglio, Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle
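
One way two engines could both be "correct" yet answer differently is sketched below. Everything specific here is an assumption: the pairing of triples with the timestamps 1, 3, 6 and 9, the tumbling 5-second windows starting at 0, and the two reporting strategies (report when the window closes, or report as soon as the window content yields a new answer). The sketch only shows that such choices are enough to reproduce both outputs; it does not claim to be the explanation given in the cited paper.

```python
from typing import Dict, List, Set, Tuple

Item = Tuple[int, str, str]  # (timestamp, person, room)

# Hypothetical assignment of the four triples to the ticks of the figure.
stream: List[Item] = [
    (1, ":bob", ":hall"),
    (3, ":alice", ":hall"),
    (6, ":bob", ":kitchen"),
    (9, ":alice", ":kitchen"),
]


def rooms_with_both(items: List[Item]) -> Set[str]:
    """Rooms in which both :alice and :bob appear among the given items."""
    people_by_room: Dict[str, Set[str]] = {}
    for _, person, room in items:
        people_by_room.setdefault(room, set()).add(person)
    return {room for room, people in people_by_room.items()
            if {":alice", ":bob"} <= people}


def report_on_window_close(items: List[Item], width: int, end: int) -> List[Tuple[str, int]]:
    """Evaluate each tumbling window [start, start + width) when it closes."""
    results = []
    for start in range(0, end, width):
        window = [it for it in items if start <= it[0] < start + width]
        results.extend((room, start + width) for room in sorted(rooms_with_both(window)))
    return results


def report_on_content_change(items: List[Item], width: int) -> List[Tuple[str, int]]:
    """Evaluate on every arrival; report a room the first time it matches in a window."""
    results, seen = [], set()
    for i, (t, _, _) in enumerate(items):
        start = (t // width) * width
        window = [it for it in items[: i + 1] if it[0] >= start]
        for room in sorted(rooms_with_both(window)):
            if (start, room) not in seen:
                seen.add((start, room))
                results.append((room, t))
    return results
```

Under these assumptions `report_on_window_close(stream, 5, 10)` gives :hall at 5 and :kitchen at 10, like System 1, and `report_on_content_change(stream, 5)` gives :hall at 3 and :kitchen at 9, like System 2.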

Page 32: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Conclusions…

Page 33: OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing

Next steps in the community group…

• Agree on an RDF model?
  • Metamodel?
  • Timestamps in graphs?
  • Timestamp intervals
  • Compatibility with normal (static) RDF
• Additional operators for SPARQL?
  • Windows (not only time based?)
  • CEP operators
  • Semantics
• Go Web
  • Volatile URIs
  • Serialization: terse, compact
  • Protocols: HTTP, Websockets?
